* [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
@ 2021-08-20 15:58 Brijesh Singh
  2021-08-20 15:58 ` [PATCH Part2 v5 01/45] x86/cpufeatures: Add SEV-SNP CPU feature Brijesh Singh
                   ` (45 more replies)
  0 siblings, 46 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

This part of the AMD Secure Nested Paging (SEV-SNP) series focuses on the
changes required in a host OS for SEV-SNP support. The series builds upon
SEV-SNP Part-1.

This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by
SEV-SNP, such as interrupt protection.

The CCP driver is enhanced to provide new APIs that use the SEV-SNP
specific commands defined in the SEV-SNP firmware specification. The KVM
driver uses those APIs to create and manage the SEV-SNP guests.

The GHCB specification version 2 introduces a new set of NAEs that are
used by the SEV-SNP guest to communicate with the hypervisor. The series
provides support to handle the following new NAE events (a handling
sketch follows the list):
- Register GHCB GPA
- Page State Change Request
- Hypervisor feature
- Guest message request
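
As a rough sketch of where these land on the host side, a VMGEXIT handler
could dispatch the new events as below (the exit-code constants and helper
names here are illustrative placeholders, not lifted from this series):

  /* Illustrative dispatch of the new GHCB v2 NAE events. */
  static int snp_dispatch_vmgexit(struct vcpu_svm *svm, u64 exit_code)
  {
          switch (exit_code) {
          case SVM_VMGEXIT_GHCB_GPA_REGISTER:  /* Register GHCB GPA */
                  return snp_handle_ghcb_gpa_register(svm);
          case SVM_VMGEXIT_PSC:                /* Page State Change request */
                  return snp_handle_page_state_change(svm);
          case SVM_VMGEXIT_HV_FEATURES:        /* Hypervisor feature query */
                  return snp_handle_hv_features(svm);
          case SVM_VMGEXIT_GUEST_REQUEST:      /* Guest message request */
                  return snp_handle_guest_request(svm);
          default:
                  return -EINVAL;
          }
  }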

The RMP check is enforced as soon as SEV-SNP is enabled. Not every memory
access requires an RMP check. In particular, read accesses from the
hypervisor do not require RMP checks because data confidentiality is
already protected via memory encryption. When hardware encounters an RMP
check failure, it raises a page-fault exception. If the RMP check failure
is due to a page-size mismatch, the large page is split to resolve the
fault.

The series does not provide support for interrupt security and migration;
those features will be added after the base support.

The series is based on the commits:
 SNP part1 commit and
 fa7a549d321a (kvm/next, next) KVM: x86: accept userspace interrupt only if no event is injected

TODO:
  * Add support for a command to rate-limit guest message requests.

Changes since v4:
 * Move the RMP entry definition to x86 specific header file.
 * Move the dump RMP entry function to SEV specific file.
 * Use BIT_ULL while defining the #PF bit fields.
 * Add helper function to check the IOMMU support for SEV-SNP feature.
 * Add helper functions for the page state transition.
 * Map and unmap the pages from the direct map when they are added to or
   removed from the RMP table.
 * Enforce the minimum SEV-SNP firmware version.
 * Extend the LAUNCH_UPDATE to accept the base_gfn and remove the
   logic to calculate the gfn from the hva.
 * Add a check in LAUNCH_UPDATE to ensure that all the pages are
   shared before calling the PSP.
 * Mark the memory failure when failing to remove the page from the
   RMP table or clearing the immutable bit.
 * Exclude the encrypted hva range from the KSM.
 * Remove the gfn tracking during the kvm_gfn_map() and use SRCU to
   synchronize the PSC and gfn mapping.
 * Allow PSC on the registered hva range only.
 * Add support for the Preferred GPA VMGEXIT.
 * Simplify the PSC handling routines.
 * Use the static_call() for the newly added kvm_x86_ops.
 * Remove the long-lived GHCB map.
 * Move the snp enable module parameter to the end of the file.
 * Remove the kvm_x86_op for the RMP fault handling. Call the
   fault handler directly from the #NPF interception.

Changes since v3:
 * Add support for extended guest message request.
 * Add ioctl to query the SNP Platform status.
 * Add ioctl to get and set the SNP config.
 * Add check to verify that memory reserved for the RMP covers the full system RAM.
 * Start the SNP specific commands from 256 instead of 255.
 * Multiple cleanup and fixes based on the review feedback.

Changes since v2:
 * Add AP creation support.
 * Drop the patch to handle the RMP fault for the kernel address.
 * Add functions to track the write access from the hypervisor.
 * Do not enable the SNP feature when IOMMU is disabled or is in passthrough mode.
 * Dump the RMP entry on RMP violation for the debug.
 * Shorten the GHCB macro names.
 * Start the SNP_INIT command id from 255 to give some gap for the legacy SEV.
 * Sync the header with the latest 0.9 SNP spec.
 
Changes since v1:
 * Add AP reset MSR protocol VMGEXIT NAE.
 * Add Hypervisor features VMGEXIT NAE.
 * Move the RMP table initialization and the RMPUPDATE/PSMASH helpers into
   arch/x86/kernel/sev.c.
 * Add support to map/unmap SEV legacy command buffer to firmware state when
   SNP is active.
 * Enhance PSP driver to provide helper to allocate/free memory used for the
   firmware context page.
 * Add support to handle RMP fault for the kernel address.
 * Add support to handle GUEST_REQUEST NAE event for attestation.
 * Rename RMP table lookup helper.
 * Drop typedef from rmpentry struct definition.
 * Drop SNP static key and use cpu_feature_enabled() to check whether SEV-SNP
   is active.
 * Multiple cleanup/fixes to address Boris review feedback.

Brijesh Singh (40):
  x86/cpufeatures: Add SEV-SNP CPU feature
  iommu/amd: Introduce function to check SEV-SNP support
  x86/sev: Add the host SEV-SNP initialization support
  x86/sev: Add RMP entry lookup helpers
  x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  x86/traps: Define RMP violation #PF error code
  x86/fault: Add support to handle the RMP fault for user address
  x86/fault: Add support to dump RMP entry on fault
  crypto: ccp: shutdown SEV firmware on kexec
  crypto:ccp: Define the SEV-SNP commands
  crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
  crypto:ccp: Provide APIs to issue SEV-SNP commands
  crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  crypto: ccp: Handle the legacy SEV command when SNP is enabled
  crypto: ccp: Add the SNP_PLATFORM_STATUS command
  crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  crypto: ccp: Provide APIs to query extended attestation report
  KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
  KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  KVM: SVM: Add initial SEV-SNP support
  KVM: SVM: Add KVM_SNP_INIT command
  KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
  KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  KVM: SVM: Mark the private vma unmergeable for SEV-SNP guests
  KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
  KVM: X86: Keep the NPT and RMP page level in sync
  KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
  KVM: x86: Define RMP page fault error bits for #NPF
  KVM: x86: Update page-fault trace to log full 64-bit error code
  KVM: SVM: Do not use long-lived GHCB map while setting scratch area
  KVM: SVM: Remove the long-lived GHCB host map
  KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
  KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  KVM: SVM: Add support to handle Page State Change VMGEXIT
  KVM: SVM: Introduce ops for the post gfn map and unmap
  KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
  KVM: SVM: Add support to handle the RMP nested page fault
  KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  KVM: SVM: Add module parameter to enable the SEV-SNP

Sean Christopherson (2):
  KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
  KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP

Tom Lendacky (3):
  KVM: SVM: Add support to handle AP reset MSR protocol
  KVM: SVM: Use a VMSA physical address variable for populating VMCB
  KVM: SVM: Support SEV-SNP AP Creation NAE event

 Documentation/virt/coco/sevguest.rst          |   55 +
 .../virt/kvm/amd-memory-encryption.rst        |  102 +
 arch/x86/include/asm/cpufeatures.h            |    1 +
 arch/x86/include/asm/disabled-features.h      |    8 +-
 arch/x86/include/asm/kvm-x86-ops.h            |    5 +
 arch/x86/include/asm/kvm_host.h               |   20 +
 arch/x86/include/asm/msr-index.h              |    6 +
 arch/x86/include/asm/sev-common.h             |   28 +
 arch/x86/include/asm/sev.h                    |   45 +
 arch/x86/include/asm/svm.h                    |    7 +
 arch/x86/include/asm/trap_pf.h                |   18 +-
 arch/x86/kernel/cpu/amd.c                     |    3 +-
 arch/x86/kernel/sev.c                         |  361 ++++
 arch/x86/kvm/lapic.c                          |    5 +-
 arch/x86/kvm/mmu.h                            |    7 +-
 arch/x86/kvm/mmu/mmu.c                        |   84 +-
 arch/x86/kvm/svm/sev.c                        | 1676 ++++++++++++++++-
 arch/x86/kvm/svm/svm.c                        |   62 +-
 arch/x86/kvm/svm/svm.h                        |   74 +-
 arch/x86/kvm/trace.h                          |   40 +-
 arch/x86/kvm/x86.c                            |   92 +-
 arch/x86/mm/fault.c                           |   84 +-
 drivers/crypto/ccp/sev-dev.c                  |  924 ++++++++-
 drivers/crypto/ccp/sev-dev.h                  |   17 +
 drivers/crypto/ccp/sp-pci.c                   |   12 +
 drivers/iommu/amd/init.c                      |   30 +
 include/linux/iommu.h                         |    9 +
 include/linux/mm.h                            |    6 +-
 include/linux/psp-sev.h                       |  346 ++++
 include/linux/sev.h                           |   32 +
 include/uapi/linux/kvm.h                      |   56 +
 include/uapi/linux/psp-sev.h                  |   60 +
 mm/memory.c                                   |   13 +
 tools/arch/x86/include/asm/cpufeatures.h      |    1 +
 34 files changed, 4088 insertions(+), 201 deletions(-)
 create mode 100644 include/linux/sev.h

-- 
2.17.1



* [PATCH Part2 v5 01/45] x86/cpufeatures: Add SEV-SNP CPU feature
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-16 16:56   ` Borislav Petkov
  2021-08-20 15:58 ` [PATCH Part2 v5 02/45] iommu/amd: Introduce function to check SEV-SNP support Brijesh Singh
                   ` (44 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging (SEV-SNP). This feature adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks like
data replay, memory re-mapping, and more.
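
With the feature bit in place, host code can gate SNP-specific paths on
it; a minimal sketch:

  /* Bail out of an SNP-only path when the feature is unsupported or disabled. */
  if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
          return -ENODEV;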

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/cpufeatures.h       | 1 +
 arch/x86/kernel/cpu/amd.c                | 3 ++-
 tools/arch/x86/include/asm/cpufeatures.h | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d0ce5cfd3ac1..62f458680772 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -398,6 +398,7 @@
 #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
 #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
 #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP		(19*32+4)  /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
 #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
 
 /*
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index b7c003013d41..3e6a586fb589 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -586,7 +586,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 	 *	      If BIOS has not enabled SME then don't advertise the
 	 *	      SME feature (set in scattered.c).
 	 *   For SEV: If BIOS has not enabled SEV then don't advertise the
-	 *            SEV and SEV_ES feature (set in scattered.c).
+	 *            SEV, SEV_ES and SEV_SNP feature.
 	 *
 	 *   In all cases, since support for SME and SEV requires long mode,
 	 *   don't advertise the feature under CONFIG_X86_32.
@@ -618,6 +618,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 clear_sev:
 		setup_clear_cpu_cap(X86_FEATURE_SEV);
 		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+		setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
 	}
 }
 
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index d0ce5cfd3ac1..62f458680772 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -398,6 +398,7 @@
 #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
 #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
 #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP		(19*32+4)  /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
 #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
 
 /*
-- 
2.17.1



* [PATCH Part2 v5 02/45] iommu/amd: Introduce function to check SEV-SNP support
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
  2021-08-20 15:58 ` [PATCH Part2 v5 01/45] x86/cpufeatures: Add SEV-SNP CPU feature Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-16 17:26   ` Borislav Petkov
  2021-08-20 15:58 ` [PATCH Part2 v5 03/45] x86/sev: Add the host SEV-SNP initialization support Brijesh Singh
                   ` (43 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

The SEV-SNP support requires that the IOMMU is enabled; see section 2.12
of the IOMMU spec for further details. If the IOMMU is not enabled, or
the SNPSup extended feature bit is not set, then the SNP_INIT command
(used for initializing the firmware) will fail.

iommu_sev_snp_supported() can be used to check whether the IOMMU supports
the SEV-SNP feature.
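
Patch 03 in this series consumes this helper when deciding whether to
enable SNP; in outline:

  /* Do not enable SNP unless every IOMMU reports SNPSup. */
  if (!iommu_sev_snp_supported())
          goto nosnp;  /* clear X86_FEATURE_SEV_SNP, continue without SNP */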

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/iommu/amd/init.c | 30 ++++++++++++++++++++++++++++++
 include/linux/iommu.h    |  9 +++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 46280e6e1535..bd420fb71126 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3320,3 +3320,33 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
 
 	return iommu_pc_get_set_reg(iommu, bank, cntr, fxn, value, true);
 }
+
+bool iommu_sev_snp_supported(void)
+{
+	struct amd_iommu *iommu;
+
+	/*
+	 * SEV-SNP support requires that the IOMMU is enabled and is not
+	 * configured in passthrough mode.
+	 */
+	if (no_iommu || iommu_default_passthrough()) {
+		pr_err("SEV-SNP: IOMMU is either disabled or configured in passthrough mode.\n");
+		return false;
+	}
+
+	/*
+	 * Iterate through all the IOMMUs and verify the SNPSup feature is
+	 * enabled.
+	 */
+	for_each_iommu(iommu) {
+		if (!iommu_feature(iommu, FEATURE_SNP)) {
+			pr_err("SNPSup is disabled (devid: %02x:%02x.%x)\n",
+			       PCI_BUS_NUM(iommu->devid), PCI_SLOT(iommu->devid),
+			       PCI_FUNC(iommu->devid));
+			return false;
+		}
+	}
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(iommu_sev_snp_supported);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..269abc17b2c3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -604,6 +604,12 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
 void iommu_sva_unbind_device(struct iommu_sva *handle);
 u32 iommu_sva_get_pasid(struct iommu_sva *handle);
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+bool iommu_sev_snp_supported(void);
+#else
+static inline bool iommu_sev_snp_supported(void) { return false; }
+#endif
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -999,6 +1005,9 @@ static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
 	return NULL;
 }
+
+static inline bool iommu_sev_snp_supported(void) { return false; }
+
 #endif /* CONFIG_IOMMU_API */
 
 /**
-- 
2.17.1



* [PATCH Part2 v5 03/45] x86/sev: Add the host SEV-SNP initialization support
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
  2021-08-20 15:58 ` [PATCH Part2 v5 01/45] x86/cpufeatures: Add SEV-SNP CPU feature Brijesh Singh
  2021-08-20 15:58 ` [PATCH Part2 v5 02/45] iommu/amd: Introduce function to check SEV-SNP support Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-24  8:58   ` Borislav Petkov
  2021-08-20 15:58 ` [PATCH Part2 v5 04/45] x86/sev: Add RMP entry lookup helpers Brijesh Singh
                   ` (42 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

The memory integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). The RMP is a single data
structure shared across the system that contains one entry for every 4K
page of DRAM that may be used by SEV-SNP VMs. The goal of the RMP is to
track the owner of each page of memory. Pages of memory can be owned by
the hypervisor, owned by a specific VM, or owned by the AMD-SP. See APM2
section 15.36.3 for more detail on the RMP.

The RMP table is used to enforce access control to memory. The table
itself is not directly writable by software. New CPU instructions
(RMPUPDATE, PVALIDATE, RMPADJUST) are used to manipulate the RMP entries.

Based on the platform configuration, the BIOS reserves the memory used
for the RMP table. The start and end addresses of the RMP table are
queried by reading the RMP_BASE and RMP_END MSRs. If RMP_BASE and
RMP_END are not set, the SEV-SNP feature is disabled.

The SEV-SNP feature is enabled only after the RMP table is successfully
initialized.
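
As a rough worked example of the sizing check implemented below (each RMP
entry is 16 bytes, and the table is preceded by a 16KB processor
bookkeeping area):

  16 GiB RAM        = 4,194,304 4KB pages -> 4,194,304 * 16 = 64 MiB of entries
  RMP table itself  = 64 MiB = 16,384 4KB pages -> 16,384 * 16 = 256 KiB more
  bookkeeping area  = 16 KiB

  => the BIOS must reserve at least ~64.27 MiB for the RMP table.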

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/disabled-features.h |   8 +-
 arch/x86/include/asm/msr-index.h         |   6 +
 arch/x86/kernel/sev.c                    | 144 +++++++++++++++++++++++
 3 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 8f28fafa98b3..30a760e19c35 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -65,6 +65,12 @@
 # define DISABLE_SGX	(1 << (X86_FEATURE_SGX & 31))
 #endif
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+# define DISABLE_SEV_SNP	0
+#else
+# define DISABLE_SEV_SNP	(1 << (X86_FEATURE_SEV_SNP & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -88,7 +94,7 @@
 			 DISABLE_ENQCMD)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
-#define DISABLED_MASK19	0
+#define DISABLED_MASK19	(DISABLE_SEV_SNP)
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)
 
 #endif /* _ASM_X86_DISABLED_FEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 37589da0282e..410359a9512c 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -485,6 +485,8 @@
 #define MSR_AMD64_SEV_ENABLED		BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
 #define MSR_AMD64_SEV_ES_ENABLED	BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
 #define MSR_AMD64_SEV_SNP_ENABLED	BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
+#define MSR_AMD64_RMP_BASE		0xc0010132
+#define MSR_AMD64_RMP_END		0xc0010133
 
 #define MSR_AMD64_VIRT_SPEC_CTRL	0xc001011f
 
@@ -542,6 +544,10 @@
 #define MSR_AMD64_SYSCFG		0xc0010010
 #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT	23
 #define MSR_AMD64_SYSCFG_MEM_ENCRYPT	BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_SNP_EN_BIT		24
+#define MSR_AMD64_SYSCFG_SNP_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT	25
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN	BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
 #define MSR_K8_INT_PENDING_MSG		0xc0010055
 /* C1E active bits in int pending message */
 #define K8_INTP_C1E_ACTIVE_MASK		0x18000000
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index ab17c93634e9..7936c8139c74 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -24,6 +24,8 @@
 #include <linux/sev-guest.h>
 #include <linux/platform_device.h>
 #include <linux/io.h>
+#include <linux/cpumask.h>
+#include <linux/iommu.h>
 
 #include <asm/cpu_entry_area.h>
 #include <asm/stacktrace.h>
@@ -40,11 +42,19 @@
 #include <asm/efi.h>
 #include <asm/cpuid.h>
 #include <asm/setup.h>
+#include <asm/apic.h>
+#include <asm/iommu.h>
 
 #include "sev-internal.h"
 
 #define DR7_RESET_VALUE        0x400
 
+/*
+ * The first 16KB from RMP_BASE is used by the processor for
+ * bookkeeping; the range needs to be added during the RMP entry lookup.
+ */
+#define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
+
 /* For early boot hypervisor communication in SEV-ES enabled guests */
 static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
 
@@ -56,6 +66,9 @@ static struct ghcb __initdata *boot_ghcb;
 
 static u64 snp_secrets_phys;
 
+static unsigned long rmptable_start __ro_after_init;
+static unsigned long rmptable_end __ro_after_init;
+
 /* #VC handler runtime per-CPU data */
 struct sev_es_runtime_data {
 	struct ghcb ghcb_page;
@@ -2232,3 +2245,134 @@ static int __init add_snp_guest_request(void)
 	return 0;
 }
 device_initcall(add_snp_guest_request);
+
+#undef pr_fmt
+#define pr_fmt(fmt)	"SEV-SNP: " fmt
+
+static int __snp_enable(unsigned int cpu)
+{
+	u64 val;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return 0;
+
+	rdmsrl(MSR_AMD64_SYSCFG, val);
+
+	val |= MSR_AMD64_SYSCFG_SNP_EN;
+	val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
+
+	wrmsrl(MSR_AMD64_SYSCFG, val);
+
+	return 0;
+}
+
+static __init void snp_enable(void *arg)
+{
+	__snp_enable(smp_processor_id());
+}
+
+static bool get_rmptable_info(u64 *start, u64 *len)
+{
+	u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end, nr_pages;
+
+	rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
+	rdmsrl(MSR_AMD64_RMP_END, rmp_end);
+
+	if (!rmp_base || !rmp_end) {
+		pr_info("Memory for the RMP table has not been reserved by BIOS\n");
+		return false;
+	}
+
+	rmp_sz = rmp_end - rmp_base + 1;
+
+	/*
+	 * Calculate the amount of memory that must be reserved by the BIOS to
+	 * cover the full system RAM. The reserved memory should also cover the
+	 * RMP table itself.
+	 *
+	 * See PPR Family 19h Model 01h, Revision B1 section 2.1.5.2 for more
+	 * information on the memory requirements.
+	 */
+	nr_pages = totalram_pages();
+	calc_rmp_sz = (((rmp_sz >> PAGE_SHIFT) + nr_pages) << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
+
+	if (calc_rmp_sz > rmp_sz) {
+		pr_info("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
+			calc_rmp_sz, rmp_sz);
+		return false;
+	}
+
+	*start = rmp_base;
+	*len = rmp_sz;
+
+	pr_info("RMP table physical address 0x%016llx - 0x%016llx\n", rmp_base, rmp_end);
+
+	return true;
+}
+
+static __init int __snp_rmptable_init(void)
+{
+	u64 rmp_base, sz;
+	void *start;
+	u64 val;
+
+	if (!get_rmptable_info(&rmp_base, &sz))
+		return 1;
+
+	start = memremap(rmp_base, sz, MEMREMAP_WB);
+	if (!start) {
+		pr_err("Failed to map RMP table 0x%llx+0x%llx\n", rmp_base, sz);
+		return 1;
+	}
+
+	/*
+	 * Check if SEV-SNP is already enabled; this can happen when coming
+	 * from a kexec boot.
+	 */
+	rdmsrl(MSR_AMD64_SYSCFG, val);
+	if (val & MSR_AMD64_SYSCFG_SNP_EN)
+		goto skip_enable;
+
+	/* Initialize the RMP table to zero */
+	memset(start, 0, sz);
+
+	/* Flush the caches to ensure that data is written before SNP is enabled. */
+	wbinvd_on_all_cpus();
+
+	/* Enable SNP on all CPUs. */
+	on_each_cpu(snp_enable, NULL, 1);
+
+skip_enable:
+	rmptable_start = (unsigned long)start;
+	rmptable_end = rmptable_start + sz;
+
+	return 0;
+}
+
+static int __init snp_rmptable_init(void)
+{
+	if (!boot_cpu_has(X86_FEATURE_SEV_SNP))
+		return 0;
+
+	if (!iommu_sev_snp_supported())
+		goto nosnp;
+
+	if (__snp_rmptable_init())
+		goto nosnp;
+
+	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
+
+	return 0;
+
+nosnp:
+	setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+	return 1;
+}
+
+/*
+ * This must be called after the PCI subsystem is initialized, because
+ * before enabling the SNP feature we need to ensure that the IOMMU
+ * supports SEV-SNP. iommu_sev_snp_supported() is used for checking the
+ * feature, and it is available after subsys_initcall().
+ */
+fs_initcall(snp_rmptable_init);
-- 
2.17.1



* [PATCH Part2 v5 04/45] x86/sev: Add RMP entry lookup helpers
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (2 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 03/45] x86/sev: Add the host SEV-SNP initialization support Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-24  9:49   ` Borislav Petkov
  2022-06-02 11:57   ` Jarkko Sakkinen
  2021-08-20 15:58 ` [PATCH Part2 v5 05/45] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Brijesh Singh
                   ` (41 subsequent siblings)
  45 siblings, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

The snp_lookup_rmpentry() helper can be used by the host to read the RMP
entry for a given page. The RMP entry format is documented in the AMD PPR;
see https://bugzilla.kernel.org/attachment.cgi?id=296015.
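
A minimal sketch of how a caller classifies a pfn with the new helper:

  int level, rc;

  rc = snp_lookup_rmpentry(pfn, &level);
  if (rc < 0) {
          /* No usable RMP entry (invalid pfn, or SNP not enabled). */
  } else if (rc) {
          /*
           * Assigned: guest/firmware-owned, host writes will fault.
           * 'level' reports the x86 page level (PG_LEVEL_4K/PG_LEVEL_2M)
           * of the covering RMP entry.
           */
  } else {
          /* Shared: the host may access the page normally. */
  }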

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev.h | 27 ++++++++++++++++++++++++
 arch/x86/kernel/sev.c      | 43 ++++++++++++++++++++++++++++++++++++++
 include/linux/sev.h        | 30 ++++++++++++++++++++++++++
 3 files changed, 100 insertions(+)
 create mode 100644 include/linux/sev.h

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index a5f0a1c3ccbe..5b1a6a075c47 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -9,6 +9,7 @@
 #define __ASM_ENCRYPTED_STATE_H
 
 #include <linux/types.h>
+#include <linux/sev.h>
 #include <asm/insn.h>
 #include <asm/sev-common.h>
 #include <asm/bootparam.h>
@@ -77,6 +78,32 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 
 /* RMP page size */
 #define RMP_PG_SIZE_4K			0
+#define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+
+/*
+ * The RMP entry format is not architectural. The format is defined in PPR
+ * Family 19h Model 01h, Rev B1 processors.
+ */
+struct __packed rmpentry {
+	union {
+		struct {
+			u64	assigned	: 1,
+				pagesize	: 1,
+				immutable	: 1,
+				rsvd1		: 9,
+				gpa		: 39,
+				asid		: 10,
+				vmsa		: 1,
+				validated	: 1,
+				rsvd2		: 1;
+		} info;
+		u64 low;
+	};
+	u64 high;
+};
+
+#define rmpentry_assigned(x)	((x)->info.assigned)
+#define rmpentry_pagesize(x)	((x)->info.pagesize)
 
 #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
 
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 7936c8139c74..f383d2a89263 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -54,6 +54,8 @@
  * bookkeeping; the range needs to be added during the RMP entry lookup.
  */
 #define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
+#define RMPENTRY_SHIFT			8
+#define rmptable_page_offset(x)	(RMPTABLE_CPU_BOOKKEEPING_SZ + (((unsigned long)x) >> RMPENTRY_SHIFT))
 
 /* For early boot hypervisor communication in SEV-ES enabled guests */
 static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
@@ -2376,3 +2378,44 @@ static int __init snp_rmptable_init(void)
  * available after subsys_initcall().
  */
 fs_initcall(snp_rmptable_init);
+
+static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
+{
+	unsigned long vaddr, paddr = pfn << PAGE_SHIFT;
+	struct rmpentry *entry, *large_entry;
+
+	if (!pfn_valid(pfn))
+		return ERR_PTR(-EINVAL);
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return ERR_PTR(-ENXIO);
+
+	vaddr = rmptable_start + rmptable_page_offset(paddr);
+	if (unlikely(vaddr > rmptable_end))
+		return ERR_PTR(-ENXIO);
+
+	entry = (struct rmpentry *)vaddr;
+
+	/* Read the large-page RMP entry to get the correct page level for this entry. */
+	vaddr = rmptable_start + rmptable_page_offset(paddr & PMD_MASK);
+	large_entry = (struct rmpentry *)vaddr;
+	*level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(large_entry));
+
+	return entry;
+}
+
+/*
+ * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
+ * and -errno if there is no corresponding RMP entry.
+ */
+int snp_lookup_rmpentry(u64 pfn, int *level)
+{
+	struct rmpentry *e;
+
+	e = __snp_lookup_rmpentry(pfn, level);
+	if (IS_ERR(e))
+		return PTR_ERR(e);
+
+	return !!rmpentry_assigned(e);
+}
+EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
diff --git a/include/linux/sev.h b/include/linux/sev.h
new file mode 100644
index 000000000000..1a68842789e1
--- /dev/null
+++ b/include/linux/sev.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD Secure Encrypted Virtualization
+ *
+ * Author: Brijesh Singh <brijesh.singh@amd.com>
+ */
+
+#ifndef __LINUX_SEV_H
+#define __LINUX_SEV_H
+
+/* RMPUPDATE detected a 4K-page and 2MB-page overlap. */
+#define RMPUPDATE_FAIL_OVERLAP		7
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+int snp_lookup_rmpentry(u64 pfn, int *level);
+int psmash(u64 pfn);
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
+int rmp_make_shared(u64 pfn, enum pg_level level);
+#else
+static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
+static inline int psmash(u64 pfn) { return -ENXIO; }
+static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
+				   bool immutable)
+{
+	return -ENODEV;
+}
+static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+#endif /* __LINUX_SEV_H */
-- 
2.17.1



* [PATCH Part2 v5 05/45] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (3 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 04/45] x86/sev: Add RMP entry lookup helpers Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-24 14:04   ` Borislav Petkov
  2021-10-15 18:05   ` Sean Christopherson
  2021-08-20 15:58 ` [PATCH Part2 v5 06/45] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Brijesh Singh
                   ` (40 subsequent siblings)
  45 siblings, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

The RMPUPDATE instruction writes a new RMP entry in the RMP table. The
hypervisor will use the instruction to add pages to the RMP table. See
APM3 for details on the instruction operations.

The PSMASH instruction expands a 2MB RMP entry into a corresponding set
of contiguous 4KB-page RMP entries. The hypervisor will use this
instruction to adjust the RMP entry without invalidating the previous
RMP entry.
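
Put together, a typical hypervisor flow might look like the sketch below
(error handling elided; 'pfn_2mb' is a hypothetical 2MB-aligned pfn):

  /* Assign a 4KB page at 'pfn' to the guest with 'asid', at guest physical 'gpa'. */
  ret = rmp_make_private(pfn, gpa, PG_LEVEL_4K, asid, false);

  /* Split a 2MB RMP entry into 512 4KB entries, e.g. on a page-size mismatch. */
  ret = psmash(pfn_2mb);

  /* Return the page to the host (shared) state. */
  ret = rmp_make_shared(pfn, PG_LEVEL_4K);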

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev.h | 11 ++++++
 arch/x86/kernel/sev.c      | 72 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 5b1a6a075c47..92ced9626e95 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -78,7 +78,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 
 /* RMP page size */
 #define RMP_PG_SIZE_4K			0
+#define RMP_PG_SIZE_2M			1
 #define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+#define X86_TO_RMP_PG_LEVEL(level)	(((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
 
 /*
  * The RMP entry format is not architectural. The format is defined in PPR
@@ -107,6 +109,15 @@ struct __packed rmpentry {
 
 #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
 
+struct rmpupdate {
+	u64 gpa;
+	u8 assigned;
+	u8 pagesize;
+	u8 immutable;
+	u8 rsvd;
+	u32 asid;
+} __packed;
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 extern struct static_key_false sev_es_enable_key;
 extern void __sev_es_ist_enter(struct pt_regs *regs);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index f383d2a89263..8627c49666c9 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2419,3 +2419,75 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
 	return !!rmpentry_assigned(e);
 }
 EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
+
+int psmash(u64 pfn)
+{
+	unsigned long paddr = pfn << PAGE_SHIFT;
+	int ret;
+
+	if (!pfn_valid(pfn))
+		return -EINVAL;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENXIO;
+
+	/* Binutils version 2.36 supports the PSMASH mnemonic. */
+	asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+		      : "=a"(ret)
+		      : "a"(paddr)
+		      : "memory", "cc");
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(psmash);
+
+static int rmpupdate(u64 pfn, struct rmpupdate *val)
+{
+	unsigned long paddr = pfn << PAGE_SHIFT;
+	int ret;
+
+	if (!pfn_valid(pfn))
+		return -EINVAL;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENXIO;
+
+	/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
+	asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+		     : "=a"(ret)
+		     : "a"(paddr), "c"((unsigned long)val)
+		     : "memory", "cc");
+	return ret;
+}
+
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
+{
+	struct rmpupdate val;
+
+	if (!pfn_valid(pfn))
+		return -EINVAL;
+
+	memset(&val, 0, sizeof(val));
+	val.assigned = 1;
+	val.asid = asid;
+	val.immutable = immutable;
+	val.gpa = gpa;
+	val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+	return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_private);
+
+int rmp_make_shared(u64 pfn, enum pg_level level)
+{
+	struct rmpupdate val;
+
+	if (!pfn_valid(pfn))
+		return -EINVAL;
+
+	memset(&val, 0, sizeof(val));
+	val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+	return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_shared);
-- 
2.17.1



* [PATCH Part2 v5 06/45] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (4 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 05/45] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-29 14:34   ` Borislav Petkov
  2021-08-20 15:58 ` [PATCH Part2 v5 07/45] x86/traps: Define RMP violation #PF error code Brijesh Singh
                   ` (39 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

The integrity guarantee of SEV-SNP is enforced through the RMP table.
The RMP is used with standard x86 and IOMMU page tables to enforce memory
restrictions and page access rights. The RMP check is enforced as soon as
SEV-SNP is enabled globally in the system. When hardware encounters an
RMP check failure, it raises a page-fault exception.

The rmp_make_private() and rmp_make_shared() helpers are used to add or
remove pages from the RMP table. Improve rmp_make_private() to invalidate
the pages in the direct map when they are added to the RMP table, and
restore their default valid permissions after the pages are removed from
the RMP table.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kernel/sev.c | 61 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 60 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 8627c49666c9..bad41deb8335 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2441,10 +2441,42 @@ int psmash(u64 pfn)
 }
 EXPORT_SYMBOL_GPL(psmash);
 
+static int restore_direct_map(u64 pfn, int npages)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < npages; i++) {
+		ret = set_direct_map_default_noflush(pfn_to_page(pfn + i));
+		if (ret)
+			goto cleanup;
+	}
+
+cleanup:
+	WARN(ret, "Failed to restore direct map for pfn 0x%llx\n", pfn + i);
+	return ret;
+}
+
+static int invalid_direct_map(unsigned long pfn, int npages)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < npages; i++) {
+		ret = set_direct_map_invalid_noflush(pfn_to_page(pfn + i));
+		if (ret)
+			goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	restore_direct_map(pfn, i);
+	return ret;
+}
+
 static int rmpupdate(u64 pfn, struct rmpupdate *val)
 {
 	unsigned long paddr = pfn << PAGE_SHIFT;
-	int ret;
+	int ret, level, npages;
 
 	if (!pfn_valid(pfn))
 		return -EINVAL;
@@ -2452,11 +2484,38 @@ static int rmpupdate(u64 pfn, struct rmpupdate *val)
 	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
 		return -ENXIO;
 
+	level = RMP_TO_X86_PG_LEVEL(val->pagesize);
+	npages = page_level_size(level) / PAGE_SIZE;
+
+	/*
+	 * If the page is being assigned in the RMP table, unmap it from the
+	 * direct map.
+	 */
+	if (val->assigned) {
+		if (invalid_direct_map(pfn, npages)) {
+			pr_err("Failed to unmap pfn 0x%llx pages %d from direct_map\n",
+			       pfn, npages);
+			return -EFAULT;
+		}
+	}
+
 	/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
 	asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
 		     : "=a"(ret)
 		     : "a"(paddr), "c"((unsigned long)val)
 		     : "memory", "cc");
+
+	/*
+	 * Restore the direct map after the page is removed from the RMP table.
+	 */
+	if (!ret && !val->assigned) {
+		if (restore_direct_map(pfn, npages)) {
+			pr_err("Failed to map pfn 0x%llx pages %d in direct_map\n",
+			       pfn, npages);
+			return -EFAULT;
+		}
+	}
+
 	return ret;
 }
 
-- 
2.17.1



* [PATCH Part2 v5 07/45] x86/traps: Define RMP violation #PF error code
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (5 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 06/45] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-29 17:25   ` Borislav Petkov
  2021-08-20 15:58 ` [PATCH Part2 v5 08/45] x86/fault: Add support to handle the RMP fault for user address Brijesh Singh
                   ` (38 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

Bit 31 in the page-fault error code will be set when the processor
encounters an RMP violation.

While at it, use the BIT_ULL() macro.
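
With the new bit in place, a raw page-fault error code can be decoded at
a glance; for example (values follow from the enum below):

  0x80000003 = X86_PF_RMP | X86_PF_WRITE | X86_PF_PROT
  -> a write to a present page was blocked by an RMP check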

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/trap_pf.h | 18 +++++++++++-------
 arch/x86/mm/fault.c            |  1 +
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..89b705114b3f 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_TRAP_PF_H
 #define _ASM_X86_TRAP_PF_H
 
+#include <linux/bits.h>  /* BIT() macro */
+
 /*
  * Page fault error code bits:
  *
@@ -12,15 +14,17 @@
  *   bit 4 ==				1: fault was an instruction fetch
  *   bit 5 ==				1: protection keys block access
  *   bit 15 ==				1: SGX MMU page-fault
+ *   bit 31 ==				1: fault was due to RMP violation
  */
 enum x86_pf_error_code {
-	X86_PF_PROT	=		1 << 0,
-	X86_PF_WRITE	=		1 << 1,
-	X86_PF_USER	=		1 << 2,
-	X86_PF_RSVD	=		1 << 3,
-	X86_PF_INSTR	=		1 << 4,
-	X86_PF_PK	=		1 << 5,
-	X86_PF_SGX	=		1 << 15,
+	X86_PF_PROT	=		BIT_ULL(0),
+	X86_PF_WRITE	=		BIT_ULL(1),
+	X86_PF_USER	=		BIT_ULL(2),
+	X86_PF_RSVD	=		BIT_ULL(3),
+	X86_PF_INSTR	=		BIT_ULL(4),
+	X86_PF_PK	=		BIT_ULL(5),
+	X86_PF_SGX	=		BIT_ULL(15),
+	X86_PF_RMP	=		BIT_ULL(31),
 };
 
 #endif /* _ASM_X86_TRAP_PF_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index b2eefdefc108..8b7a5757440e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -545,6 +545,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 		 !(error_code & X86_PF_PROT) ? "not-present page" :
 		 (error_code & X86_PF_RSVD)  ? "reserved bit violation" :
 		 (error_code & X86_PF_PK)    ? "protection keys violation" :
+		 (error_code & X86_PF_RMP)   ? "RMP violation" :
 					       "permissions violation");
 
 	if (!(error_code & X86_PF_USER) && user_mode(regs)) {
-- 
2.17.1



* [PATCH Part2 v5 08/45] x86/fault: Add support to handle the RMP fault for user address
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (6 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 07/45] x86/traps: Define RMP violation #PF error code Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-08-23 14:20   ` Dave Hansen
  2021-09-29 18:19   ` Borislav Petkov
  2021-08-20 15:58 ` [PATCH Part2 v5 09/45] x86/fault: Add support to dump RMP entry on fault Brijesh Singh
                   ` (37 subsequent siblings)
  45 siblings, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

When SEV-SNP is enabled globally, a write from the host goes through the
RMP check. When the host writes to pages, hardware checks the following
conditions at the end of the page walk:

1. Assigned bit in the RMP table is zero (i.e. the page is shared).
2. If the page table entry that gives the sPA indicates that the target
   page size is a large page, then all RMP entries for the 4KB
   constituting pages of the target must have the assigned bit 0.
3. Immutable bit in the RMP table is zero.

The hardware raises a page fault if one of the above conditions is not
met. Try resolving the fault instead of taking the fault again and again.
If the host attempts to write to guest private memory, send a SIGBUS
signal to kill the process. If the page level of the host mapping does
not match the RMP entry, split the host page to keep the RMP and host
page levels in sync.
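
For the large-page case handled below, the faulting 4KB pfn is derived
from the mapping's base pfn; e.g. for a 2MB mapping:

  pages_per_hpage(PG_LEVEL_2M) = 512, pages_per_hpage(PG_LEVEL_4K) = 1
  mask = 512 - 1 = 511
  pfn |= (address >> PAGE_SHIFT) & 511

  -> the low 9 bits of the virtual page number select the exact 4KB page
     within the 2MB mapping that triggered the RMP fault.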

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/mm/fault.c | 66 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h  |  6 ++++-
 mm/memory.c         | 13 +++++++++
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 8b7a5757440e..f2d543b92f43 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -19,6 +19,7 @@
 #include <linux/uaccess.h>		/* faulthandler_disabled()	*/
 #include <linux/efi.h>			/* efi_crash_gracefully_on_page_fault()*/
 #include <linux/mm_types.h>
+#include <linux/sev.h>			/* snp_lookup_rmpentry()	*/
 
 #include <asm/cpufeature.h>		/* boot_cpu_has, ...		*/
 #include <asm/traps.h>			/* dotraplinkage, ...		*/
@@ -1202,6 +1203,60 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 }
 NOKPROBE_SYMBOL(do_kern_addr_fault);
 
+static inline size_t pages_per_hpage(int level)
+{
+	return page_level_size(level) / PAGE_SIZE;
+}
+
+/*
+ * Return 1 if the caller needs to retry, 0 if the address needs to be split
+ * in order to resolve the fault.
+ */
+static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_code,
+				      unsigned long address)
+{
+	int rmp_level, level;
+	pte_t *pte;
+	u64 pfn;
+
+	pte = lookup_address_in_mm(current->mm, address, &level);
+
+	/*
+	 * This can happen if there was a race between an unmap event and
+	 * the RMP fault delivery.
+	 */
+	if (!pte || !pte_present(*pte))
+		return 1;
+
+	pfn = pte_pfn(*pte);
+
+	/* If it's a large page, calculate the faulting pfn */
+	if (level > PG_LEVEL_4K) {
+		unsigned long mask;
+
+		mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
+		pfn |= (address >> PAGE_SHIFT) & mask;
+	}
+
+	/*
+	 * If it's a guest private page, the fault cannot be resolved.
+	 * Send a SIGBUS to terminate the process.
+	 */
+	if (snp_lookup_rmpentry(pfn, &rmp_level)) {
+		do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
+		return 1;
+	}
+
+	/*
+	 * The backing page level is higher than the RMP page level; request
+	 * to split the page.
+	 */
+	if (level > rmp_level)
+		return 0;
+
+	return 1;
+}
+
 /*
  * Handle faults in the user portion of the address space.  Nothing in here
  * should check X86_PF_USER without a specific justification: for almost
@@ -1299,6 +1354,17 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+	/*
+	 * If it's an RMP violation, try resolving it.
+	 */
+	if (error_code & X86_PF_RMP) {
+		if (handle_user_rmp_page_fault(regs, error_code, address))
+			return;
+
+		/* Ask to split the page */
+		flags |= FAULT_FLAG_PAGE_SPLIT;
+	}
+
 #ifdef CONFIG_X86_64
 	/*
 	 * Faults in the vsyscall page might need emulation.  The
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7ca22e6e694a..74a53c146365 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -447,6 +447,8 @@ extern pgprot_t protection_map[16];
  * @FAULT_FLAG_REMOTE: The fault is not for current task/mm.
  * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch.
  * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals.
+ * @FAULT_FLAG_PAGE_SPLIT: The fault was due to a page-size mismatch; split
+ *  the region to a smaller page size and retry.
  *
  * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
  * whether we would allow page faults to retry by specifying these two
@@ -478,6 +480,7 @@ enum fault_flag {
 	FAULT_FLAG_REMOTE =		1 << 7,
 	FAULT_FLAG_INSTRUCTION =	1 << 8,
 	FAULT_FLAG_INTERRUPTIBLE =	1 << 9,
+	FAULT_FLAG_PAGE_SPLIT =		1 << 10,
 };
 
 /*
@@ -517,7 +520,8 @@ static inline bool fault_flag_allow_retry_first(enum fault_flag flags)
 	{ FAULT_FLAG_USER,		"USER" }, \
 	{ FAULT_FLAG_REMOTE,		"REMOTE" }, \
 	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }, \
-	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }
+	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }, \
+	{ FAULT_FLAG_PAGE_SPLIT,	"PAGESPLIT" }
 
 /*
  * vm_fault is filled by the pagefault handler and passed to the vma's
diff --git a/mm/memory.c b/mm/memory.c
index 747a01d495f2..27e6ccec3fc1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4589,6 +4589,15 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	return 0;
 }
 
+static int handle_split_page_fault(struct vm_fault *vmf)
+{
+	if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+		return VM_FAULT_SIGBUS;
+
+	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
+	return 0;
+}
+
 /*
  * By the time we get here, we already hold the mm semaphore
  *
@@ -4666,6 +4675,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 				pmd_migration_entry_wait(mm, vmf.pmd);
 			return 0;
 		}
+
+		if (flags & FAULT_FLAG_PAGE_SPLIT)
+			return handle_split_page_fault(&vmf);
+
 		if (pmd_trans_huge(vmf.orig_pmd) || pmd_devmap(vmf.orig_pmd)) {
 			if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
 				return do_huge_pmd_numa_page(&vmf);
-- 
2.17.1



* [PATCH Part2 v5 09/45] x86/fault: Add support to dump RMP entry on fault
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (7 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 08/45] x86/fault: Add support to handle the RMP fault for user address Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-29 18:38   ` Borislav Petkov
  2021-08-20 15:58 ` [PATCH Part2 v5 10/45] crypto: ccp: shutdown SEV firmware on kexec Brijesh Singh
                   ` (36 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

When SEV-SNP is enabled globally, a write from the host goes through the
RMP check. If the hardware encounters a check failure, it raises a #PF
(with the RMP bit set in the error code). Dump the RMP entry at the
faulting pfn to help with debugging.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev.h |  7 +++++++
 arch/x86/kernel/sev.c      | 43 ++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/fault.c        | 17 +++++++++++----
 include/linux/sev.h        |  2 ++
 4 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 92ced9626e95..569294f687e6 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -106,6 +106,11 @@ struct __packed rmpentry {
 
 #define rmpentry_assigned(x)	((x)->info.assigned)
 #define rmpentry_pagesize(x)	((x)->info.pagesize)
+#define rmpentry_vmsa(x)	((x)->info.vmsa)
+#define rmpentry_asid(x)	((x)->info.asid)
+#define rmpentry_validated(x)	((x)->info.validated)
+#define rmpentry_gpa(x)		((unsigned long)(x)->info.gpa)
+#define rmpentry_immutable(x)	((x)->info.immutable)
 
 #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
 
@@ -165,6 +170,7 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
 void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
 void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
 void snp_set_wakeup_secondary_cpu(void);
+void dump_rmpentry(u64 pfn);
 #ifdef __BOOT_COMPRESSED
 bool sev_snp_enabled(void);
 #else
@@ -188,6 +194,7 @@ static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npage
 static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
 static inline void snp_set_wakeup_secondary_cpu(void) { }
 static inline void sev_snp_cpuid_init(struct boot_params *bp) { }
+static inline void dump_rmpentry(u64 pfn) {}
 #ifdef __BOOT_COMPRESSED
 static inline bool sev_snp_enabled { return false; }
 #else
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index bad41deb8335..8b3e83e50468 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2404,6 +2404,49 @@ static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
 	return entry;
 }
 
+void dump_rmpentry(u64 pfn)
+{
+	unsigned long pfn_end;
+	struct rmpentry *e;
+	int level;
+
+	e = __snp_lookup_rmpentry(pfn, &level);
+	if (!e) {
+		pr_alert("failed to read RMP entry pfn 0x%llx\n", pfn);
+		return;
+	}
+
+	if (rmpentry_assigned(e)) {
+		pr_alert("RMPEntry paddr 0x%llx [assigned=%d immutable=%d pagesize=%d gpa=0x%lx"
+			" asid=%d vmsa=%d validated=%d]\n", pfn << PAGE_SHIFT,
+			rmpentry_assigned(e), rmpentry_immutable(e), rmpentry_pagesize(e),
+			rmpentry_gpa(e), rmpentry_asid(e), rmpentry_vmsa(e),
+			rmpentry_validated(e));
+		return;
+	}
+
+	/*
+	 * If the RMP entry at the faulting pfn was not assigned, then we do not
+	 * iterate through the entire 2MB region and dump the RMP entries where
+	 * any bit in the entry is set.
+	 * one of the bit in the RMP entry is set.
+	 */
+	pfn = pfn & ~(PTRS_PER_PMD - 1);
+	pfn_end = pfn + PTRS_PER_PMD;
+
+	while (pfn < pfn_end) {
+		e = __snp_lookup_rmpentry(pfn, &level);
+		if (!e)
+			return;
+
+		if (e->low || e->high)
+			pr_alert("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
+				 pfn << PAGE_SHIFT, e->high, e->low);
+		pfn++;
+	}
+}
+EXPORT_SYMBOL_GPL(dump_rmpentry);
+
 /*
  * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
  * and -errno if there is no corresponding RMP entry.
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f2d543b92f43..9cd33169dfb5 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -33,6 +33,7 @@
 #include <asm/pgtable_areas.h>		/* VMALLOC_START, ...		*/
 #include <asm/kvm_para.h>		/* kvm_handle_async_pf		*/
 #include <asm/vdso.h>			/* fixup_vdso_exception()	*/
+#include <asm/sev.h>			/* dump_rmpentry()		*/
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -289,7 +290,7 @@ static bool low_pfn(unsigned long pfn)
 	return pfn < max_low_pfn;
 }
 
-static void dump_pagetable(unsigned long address)
+static void dump_pagetable(unsigned long address, bool show_rmpentry)
 {
 	pgd_t *base = __va(read_cr3_pa());
 	pgd_t *pgd = &base[pgd_index(address)];
@@ -345,10 +346,11 @@ static int bad_address(void *p)
 	return get_kernel_nofault(dummy, (unsigned long *)p);
 }
 
-static void dump_pagetable(unsigned long address)
+static void dump_pagetable(unsigned long address, bool show_rmpentry)
 {
 	pgd_t *base = __va(read_cr3_pa());
 	pgd_t *pgd = base + pgd_index(address);
+	unsigned long pfn;
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
@@ -366,6 +368,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(p4d))
 		goto bad;
 
+	pfn = p4d_pfn(*p4d);
 	pr_cont("P4D %lx ", p4d_val(*p4d));
 	if (!p4d_present(*p4d) || p4d_large(*p4d))
 		goto out;
@@ -374,6 +377,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pud))
 		goto bad;
 
+	pfn = pud_pfn(*pud);
 	pr_cont("PUD %lx ", pud_val(*pud));
 	if (!pud_present(*pud) || pud_large(*pud))
 		goto out;
@@ -382,6 +386,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pmd))
 		goto bad;
 
+	pfn = pmd_pfn(*pmd);
 	pr_cont("PMD %lx ", pmd_val(*pmd));
 	if (!pmd_present(*pmd) || pmd_large(*pmd))
 		goto out;
@@ -390,9 +395,13 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pte))
 		goto bad;
 
+	pfn = pte_pfn(*pte);
 	pr_cont("PTE %lx", pte_val(*pte));
 out:
 	pr_cont("\n");
+
+	if (show_rmpentry)
+		dump_rmpentry(pfn);
 	return;
 bad:
 	pr_info("BAD\n");
@@ -578,7 +587,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 		show_ldttss(&gdt, "TR", tr);
 	}
 
-	dump_pagetable(address);
+	dump_pagetable(address, error_code & X86_PF_RMP);
 }
 
 static noinline void
@@ -595,7 +604,7 @@ pgtable_bad(struct pt_regs *regs, unsigned long error_code,
 
 	printk(KERN_ALERT "%s: Corrupted page table at address %lx\n",
 	       tsk->comm, address);
-	dump_pagetable(address);
+	dump_pagetable(address, false);
 
 	if (__die("Bad pagetable", regs, error_code))
 		sig = 0;
diff --git a/include/linux/sev.h b/include/linux/sev.h
index 1a68842789e1..734b13a69c54 100644
--- a/include/linux/sev.h
+++ b/include/linux/sev.h
@@ -16,6 +16,7 @@ int snp_lookup_rmpentry(u64 pfn, int *level);
 int psmash(u64 pfn);
 int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
 int rmp_make_shared(u64 pfn, enum pg_level level);
+void dump_rmpentry(u64 pfn);
 #else
 static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
 static inline int psmash(u64 pfn) { return -ENXIO; }
@@ -25,6 +26,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
 	return -ENODEV;
 }
 static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
+static inline void dump_rmpentry(u64 pfn) { }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 #endif /* __LINUX_SEV_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 10/45] crypto: ccp: shutdown SEV firmware on kexec
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (8 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 09/45] x86/fault: Add support to dump RMP entry on fault Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-08-20 15:58 ` [PATCH Part2 v5 11/45] crypto:ccp: Define the SEV-SNP commands Brijesh Singh
                   ` (35 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh, stable,
	Herbert Xu

Commit 97f9ac3db6612 ("crypto: ccp - Add support for SEV-ES to the
PSP driver") added support to allocate the Trusted Memory Region (TMR)
used during the SEV-ES firmware initialization. The TMR gets locked
during the firmware initialization and unlocked during the shutdown.
While the TMR is locked, access to it is disallowed.

Currently, the CCP driver does not shut down the firmware during a
kexec reboot, leaving the TMR memory locked.

Register a callback to shut down the SEV firmware on kexec reboot.
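
During kexec the PCI core invokes each driver's ->shutdown() hook rather
than ->remove(), so the teardown must be wired there. A minimal sketch of
the wiring this patch adds (names as in the diff below):

	static void sp_pci_shutdown(struct pci_dev *pdev)
	{
		struct sp_device *sp = dev_get_drvdata(&pdev->dev);

		if (!sp)
			return;

		/*
		 * sp_destroy() ends up in the SEV firmware shutdown path,
		 * which issues SEV_CMD_SHUTDOWN and unlocks the TMR.
		 */
		sp_destroy(sp);
	}

	static struct pci_driver sp_pci_driver = {
		/* ... existing fields ... */
		.shutdown = sp_pci_shutdown,
	};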

Fixes: 97f9ac3db6612 ("crypto: ccp - Add support for SEV-ES to the PSP driver")
Reported-by: Lucas Nussbaum <lucas.nussbaum@inria.fr>
Tested-by: Lucas Nussbaum <lucas.nussbaum@inria.fr>
Cc: <stable@kernel.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 49 +++++++++++++++++-------------------
 drivers/crypto/ccp/sp-pci.c  | 12 +++++++++
 2 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 91808402e0bf..2ecb0e1f65d8 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -300,6 +300,9 @@ static int __sev_platform_shutdown_locked(int *error)
 	struct sev_device *sev = psp_master->sev_data;
 	int ret;
 
+	if (sev->state == SEV_STATE_UNINIT)
+		return 0;
+
 	ret = __sev_do_cmd_locked(SEV_CMD_SHUTDOWN, NULL, error);
 	if (ret)
 		return ret;
@@ -1019,6 +1022,20 @@ int sev_dev_init(struct psp_device *psp)
 	return ret;
 }
 
+static void sev_firmware_shutdown(struct sev_device *sev)
+{
+	sev_platform_shutdown(NULL);
+
+	if (sev_es_tmr) {
+		/* The TMR area was encrypted, flush it from the cache */
+		wbinvd_on_all_cpus();
+
+		free_pages((unsigned long)sev_es_tmr,
+			   get_order(SEV_ES_TMR_SIZE));
+		sev_es_tmr = NULL;
+	}
+}
+
 void sev_dev_destroy(struct psp_device *psp)
 {
 	struct sev_device *sev = psp->sev_data;
@@ -1026,6 +1043,8 @@ void sev_dev_destroy(struct psp_device *psp)
 	if (!sev)
 		return;
 
+	sev_firmware_shutdown(sev);
+
 	if (sev->misc)
 		kref_put(&misc_dev->refcount, sev_exit);
 
@@ -1056,21 +1075,6 @@ void sev_pci_init(void)
 	if (sev_get_api_version())
 		goto err;
 
-	/*
-	 * If platform is not in UNINIT state then firmware upgrade and/or
-	 * platform INIT command will fail. These command require UNINIT state.
-	 *
-	 * In a normal boot we should never run into case where the firmware
-	 * is not in UNINIT state on boot. But in case of kexec boot, a reboot
-	 * may not go through a typical shutdown sequence and may leave the
-	 * firmware in INIT or WORKING state.
-	 */
-
-	if (sev->state != SEV_STATE_UNINIT) {
-		sev_platform_shutdown(NULL);
-		sev->state = SEV_STATE_UNINIT;
-	}
-
 	if (sev_version_greater_or_equal(0, 15) &&
 	    sev_update_firmware(sev->dev) == 0)
 		sev_get_api_version();
@@ -1115,17 +1119,10 @@ void sev_pci_init(void)
 
 void sev_pci_exit(void)
 {
-	if (!psp_master->sev_data)
-		return;
-
-	sev_platform_shutdown(NULL);
+	struct sev_device *sev = psp_master->sev_data;
 
-	if (sev_es_tmr) {
-		/* The TMR area was encrypted, flush it from the cache */
-		wbinvd_on_all_cpus();
+	if (!sev)
+		return;
 
-		free_pages((unsigned long)sev_es_tmr,
-			   get_order(SEV_ES_TMR_SIZE));
-		sev_es_tmr = NULL;
-	}
+	sev_firmware_shutdown(sev);
 }
diff --git a/drivers/crypto/ccp/sp-pci.c b/drivers/crypto/ccp/sp-pci.c
index 6fb6ba35f89d..9bcc1884c06a 100644
--- a/drivers/crypto/ccp/sp-pci.c
+++ b/drivers/crypto/ccp/sp-pci.c
@@ -241,6 +241,17 @@ static int sp_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	return ret;
 }
 
+static void sp_pci_shutdown(struct pci_dev *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct sp_device *sp = dev_get_drvdata(dev);
+
+	if (!sp)
+		return;
+
+	sp_destroy(sp);
+}
+
 static void sp_pci_remove(struct pci_dev *pdev)
 {
 	struct device *dev = &pdev->dev;
@@ -371,6 +382,7 @@ static struct pci_driver sp_pci_driver = {
 	.id_table = sp_pci_table,
 	.probe = sp_pci_probe,
 	.remove = sp_pci_remove,
+	.shutdown = sp_pci_shutdown,
 	.driver.pm = &sp_pci_pm_ops,
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 11/45] crypto:ccp: Define the SEV-SNP commands
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (9 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 10/45] crypto: ccp: shutdown SEV firmware on kexec Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-08-20 15:58 ` [PATCH Part2 v5 12/45] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Brijesh Singh
                   ` (34 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
while adding new hardware security protection.

Define the commands and structures used to communicate with the AMD-SP
when creating and managing the SEV-SNP guests. The SEV-SNP firmware spec
is available at developer.amd.com/sev.
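
As a usage sketch only (a hypothetical caller inside the CCP driver;
sev_do_cmd() is the driver's existing command helper), activating a guest
context with one of the new commands would look roughly like:

	struct sev_data_snp_activate data = {};
	int error, rc;

	data.gctx_paddr = gctx_paddr;	/* donated guest context page */
	data.asid = asid;		/* ASID to bind to the guest */

	rc = sev_do_cmd(SEV_CMD_SNP_ACTIVATE, &data, &error);
	if (rc)
		pr_err("SNP_ACTIVATE failed, fw error %#x\n", error);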

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/crypto/ccp/sev-dev.c |  16 ++-
 include/linux/psp-sev.h      | 222 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/psp-sev.h |  42 +++++++
 3 files changed, 279 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 2ecb0e1f65d8..f5dbadba82ff 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -134,7 +134,21 @@ static int sev_cmd_buffer_len(int cmd)
 	case SEV_CMD_DOWNLOAD_FIRMWARE:		return sizeof(struct sev_data_download_firmware);
 	case SEV_CMD_GET_ID:			return sizeof(struct sev_data_get_id);
 	case SEV_CMD_ATTESTATION_REPORT:	return sizeof(struct sev_data_attestation_report);
-	case SEV_CMD_SEND_CANCEL:			return sizeof(struct sev_data_send_cancel);
+	case SEV_CMD_SEND_CANCEL:		return sizeof(struct sev_data_send_cancel);
+	case SEV_CMD_SNP_GCTX_CREATE:		return sizeof(struct sev_data_snp_gctx_create);
+	case SEV_CMD_SNP_LAUNCH_START:		return sizeof(struct sev_data_snp_launch_start);
+	case SEV_CMD_SNP_LAUNCH_UPDATE:		return sizeof(struct sev_data_snp_launch_update);
+	case SEV_CMD_SNP_ACTIVATE:		return sizeof(struct sev_data_snp_activate);
+	case SEV_CMD_SNP_DECOMMISSION:		return sizeof(struct sev_data_snp_decommission);
+	case SEV_CMD_SNP_PAGE_RECLAIM:		return sizeof(struct sev_data_snp_page_reclaim);
+	case SEV_CMD_SNP_GUEST_STATUS:		return sizeof(struct sev_data_snp_guest_status);
+	case SEV_CMD_SNP_LAUNCH_FINISH:		return sizeof(struct sev_data_snp_launch_finish);
+	case SEV_CMD_SNP_DBG_DECRYPT:		return sizeof(struct sev_data_snp_dbg);
+	case SEV_CMD_SNP_DBG_ENCRYPT:		return sizeof(struct sev_data_snp_dbg);
+	case SEV_CMD_SNP_PAGE_UNSMASH:		return sizeof(struct sev_data_snp_page_unsmash);
+	case SEV_CMD_SNP_PLATFORM_STATUS:	return sizeof(struct sev_data_snp_platform_status_buf);
+	case SEV_CMD_SNP_GUEST_REQUEST:		return sizeof(struct sev_data_snp_guest_request);
+	case SEV_CMD_SNP_CONFIG:		return sizeof(struct sev_user_data_snp_config);
 	default:				return 0;
 	}
 
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index d48a7192e881..c3755099ab55 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -85,6 +85,34 @@ enum sev_cmd {
 	SEV_CMD_DBG_DECRYPT		= 0x060,
 	SEV_CMD_DBG_ENCRYPT		= 0x061,
 
+	/* SNP specific commands */
+	SEV_CMD_SNP_INIT		= 0x81,
+	SEV_CMD_SNP_SHUTDOWN		= 0x82,
+	SEV_CMD_SNP_PLATFORM_STATUS	= 0x83,
+	SEV_CMD_SNP_DF_FLUSH		= 0x84,
+	SEV_CMD_SNP_INIT_EX		= 0x85,
+	SEV_CMD_SNP_DECOMMISSION	= 0x90,
+	SEV_CMD_SNP_ACTIVATE		= 0x91,
+	SEV_CMD_SNP_GUEST_STATUS	= 0x92,
+	SEV_CMD_SNP_GCTX_CREATE		= 0x93,
+	SEV_CMD_SNP_GUEST_REQUEST	= 0x94,
+	SEV_CMD_SNP_ACTIVATE_EX		= 0x95,
+	SEV_CMD_SNP_LAUNCH_START	= 0xA0,
+	SEV_CMD_SNP_LAUNCH_UPDATE	= 0xA1,
+	SEV_CMD_SNP_LAUNCH_FINISH	= 0xA2,
+	SEV_CMD_SNP_DBG_DECRYPT		= 0xB0,
+	SEV_CMD_SNP_DBG_ENCRYPT		= 0xB1,
+	SEV_CMD_SNP_PAGE_SWAP_OUT	= 0xC0,
+	SEV_CMD_SNP_PAGE_SWAP_IN	= 0xC1,
+	SEV_CMD_SNP_PAGE_MOVE		= 0xC2,
+	SEV_CMD_SNP_PAGE_MD_INIT	= 0xC3,
+	SEV_CMD_SNP_PAGE_MD_RECLAIM	= 0xC4,
+	SEV_CMD_SNP_PAGE_RO_RECLAIM	= 0xC5,
+	SEV_CMD_SNP_PAGE_RO_RESTORE	= 0xC6,
+	SEV_CMD_SNP_PAGE_RECLAIM	= 0xC7,
+	SEV_CMD_SNP_PAGE_UNSMASH	= 0xC8,
+	SEV_CMD_SNP_CONFIG		= 0xC9,
+
 	SEV_CMD_MAX,
 };
 
@@ -510,6 +538,200 @@ struct sev_data_attestation_report {
 	u32 len;				/* In/Out */
 } __packed;
 
+/**
+ * struct sev_data_snp_platform_status_buf - SNP_PLATFORM_STATUS command params
+ *
+ * @status_paddr: system physical address where the status should be copied
+ */
+struct sev_data_snp_platform_status_buf {
+	u64 status_paddr;			/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_download_firmware - SNP_DOWNLOAD_FIRMWARE command params
+ *
+ * @address: physical address of firmware image
+ * @len: length of the firmware image
+ */
+struct sev_data_snp_download_firmware {
+	u64 address;				/* In */
+	u32 len;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_gctx_create - SNP_GCTX_CREATE command params
+ *
+ * @gctx_paddr: system physical address of the page donated to firmware by
+ *		the hypervisor to contain the guest context.
+ */
+struct sev_data_snp_gctx_create {
+	u64 gctx_paddr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_activate - SNP_ACTIVATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @asid: ASID to bind to the guest
+ */
+struct sev_data_snp_activate {
+	u64 gctx_paddr;				/* In */
+	u32 asid;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_decommission - SNP_DECOMMISSION command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ */
+struct sev_data_snp_decommission {
+	u64 gctx_paddr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_start - SNP_LAUNCH_START command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @policy: guest policy
+ * @ma_gctx_paddr: system physical address of migration agent
+ * @imi_en: launch flow is launching an IMI for the purpose of
+ *   guest-assisted migration.
+ * @ma_en: the guest is associated with a migration agent
+ */
+struct sev_data_snp_launch_start {
+	u64 gctx_paddr;				/* In */
+	u64 policy;				/* In */
+	u64 ma_gctx_paddr;			/* In */
+	u32 ma_en:1;				/* In */
+	u32 imi_en:1;				/* In */
+	u32 rsvd:30;
+	u8 gosvw[16];				/* In */
+} __packed;
+
+/* SNP support page type */
+enum {
+	SNP_PAGE_TYPE_NORMAL		= 0x1,
+	SNP_PAGE_TYPE_VMSA		= 0x2,
+	SNP_PAGE_TYPE_ZERO		= 0x3,
+	SNP_PAGE_TYPE_UNMEASURED	= 0x4,
+	SNP_PAGE_TYPE_SECRET		= 0x5,
+	SNP_PAGE_TYPE_CPUID		= 0x6,
+
+	SNP_PAGE_TYPE_MAX
+};
+
+/**
+ * struct sev_data_snp_launch_update - SNP_LAUNCH_UPDATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @imi_page: indicates that this page is part of the IMI of the guest
+ * @page_type: encoded page type
+ * @page_size: page size 0 indicates 4K and 1 indicates 2MB page
+ * @address: system physical address of destination page to encrypt
+ * @vmpl3_perms: VMPL permission mask for VMPL3
+ * @vmpl2_perms: VMPL permission mask for VMPL2
+ * @vmpl1_perms: VMPL permission mask for VMPL1
+ */
+struct sev_data_snp_launch_update {
+	u64 gctx_paddr;				/* In */
+	u32 page_size:1;			/* In */
+	u32 page_type:3;			/* In */
+	u32 imi_page:1;				/* In */
+	u32 rsvd:27;
+	u32 rsvd2;
+	u64 address;				/* In */
+	u32 rsvd3:8;
+	u32 vmpl3_perms:8;			/* In */
+	u32 vmpl2_perms:8;			/* In */
+	u32 vmpl1_perms:8;			/* In */
+	u32 rsvd4;
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_finish - SNP_LAUNCH_FINISH command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ */
+struct sev_data_snp_launch_finish {
+	u64 gctx_paddr;
+	u64 id_block_paddr;
+	u64 id_auth_paddr;
+	u8 id_block_en:1;
+	u8 auth_key_en:1;
+	u64 rsvd:62;
+	u8 host_data[32];
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_status - SNP_GUEST_STATUS command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @address: system physical address of guest status page
+ */
+struct sev_data_snp_guest_status {
+	u64 gctx_paddr;
+	u64 address;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_reclaim - SNP_PAGE_RECLAIM command params
+ *
+ * @paddr: system physical address of the page to be reclaimed. Bit 0 indicates
+ *	the page size: 0h indicates a 4KB page and 1h indicates a 2MB page.
+ */
+struct sev_data_snp_page_reclaim {
+	u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_unsmash - SNP_PAGE_UNSMASH command params
+ *
+ * @paddr: system physical address of the page to be unsmashed. Bit 0 indicates
+ *	the page size: 0h indicates a 4KB page and 1h indicates a 2MB page.
+ */
+struct sev_data_snp_page_unsmash {
+	u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_dbg - SNP_DBG_ENCRYPT/SNP_DBG_DECRYPT command parameters
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @src_addr: source address of data to operate on
+ * @dst_addr: destination address of data to operate on
+ * @len: length of data to operate on
+ */
+struct sev_data_snp_dbg {
+	u64 gctx_paddr;				/* In */
+	u64 src_addr;				/* In */
+	u64 dst_addr;				/* In */
+	u32 len;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_request - SNP_GUEST_REQUEST command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @req_paddr: system physical address of request page
+ * @res_paddr: system physical address of response page
+ */
+struct sev_data_snp_guest_request {
+	u64 gctx_paddr;				/* In */
+	u64 req_paddr;				/* In */
+	u64 res_paddr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_init_ex - SNP_INIT_EX command params
+ *
+ * @init_rmp: indicates that the RMP should be initialized.
+ */
+struct sev_data_snp_init_ex {
+	u32 init_rmp:1;
+	u32 rsvd:31;
+	u8 rsvd1[60];
+} __packed;
+
 #ifdef CONFIG_CRYPTO_DEV_SP_PSP
 
 /**
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 91b4c63d5cbf..bed65a891223 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -61,6 +61,13 @@ typedef enum {
 	SEV_RET_INVALID_PARAM,
 	SEV_RET_RESOURCE_LIMIT,
 	SEV_RET_SECURE_DATA_INVALID,
+	SEV_RET_INVALID_PAGE_SIZE,
+	SEV_RET_INVALID_PAGE_STATE,
+	SEV_RET_INVALID_MDATA_ENTRY,
+	SEV_RET_INVALID_PAGE_OWNER,
+	SEV_RET_INVALID_PAGE_AEAD_OFLOW,
+	SEV_RET_RMP_INIT_REQUIRED,
+
 	SEV_RET_MAX,
 } sev_ret_code;
 
@@ -147,6 +154,41 @@ struct sev_user_data_get_id2 {
 	__u32 length;				/* In/Out */
 } __packed;
 
+/**
+ * struct sev_user_data_snp_status - SNP status
+ *
+ * @api_major: API major version
+ * @api_minor: API minor version
+ * @state: current platform state
+ * @build_id: firmware build id for the API version
+ * @guest_count: the number of guests currently managed by the firmware
+ * @tcb_version: current TCB version
+ */
+struct sev_user_data_snp_status {
+	__u8 api_major;		/* Out */
+	__u8 api_minor;		/* Out */
+	__u8 state;		/* Out */
+	__u8 rsvd;
+	__u32 build_id;		/* Out */
+	__u32 rsvd1;
+	__u32 guest_count;	/* Out */
+	__u64 tcb_version;	/* Out */
+	__u64 rsvd2;
+} __packed;
+
+/*
+ * struct sev_user_data_snp_config - system-wide configuration value for SNP.
+ *
+ * @reported_tcb: The TCB version to report in the guest attestation report.
+ * @mask_chip_id: Indicates that the CHIP_ID field in the attestation report
+ * will always be zero.
+ */
+struct sev_user_data_snp_config {
+	__u64 reported_tcb;     /* In */
+	__u32 mask_chip_id;     /* In */
+	__u8 rsvd[52];
+} __packed;
+
 /**
  * struct sev_issue_cmd - SEV ioctl parameters
  *
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 12/45] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (10 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 11/45] crypto:ccp: Define the SEV-SNP commands Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-08-20 15:58 ` [PATCH Part2 v5 13/45] crypto:ccp: Provide APIs to issue SEV-SNP commands Brijesh Singh
                   ` (33 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

Before SNP VMs can be launched, the platform must be appropriately
configured and initialized. Platform initialization is accomplished via
the SNP_INIT command.
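
A condensed sketch of the initialization sequence implemented below (the
full sequence, including the psp_master checks, is in the diff):

	/* SNP_INIT requires MSR_VM_HSAVE_PA to be 0h on every core. */
	on_each_cpu(snp_set_hsave_pa, NULL, 1);

	/* Prepare for the first guest launch after INIT. */
	wbinvd_on_all_cpus();

	rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT, NULL, &error);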

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 123 +++++++++++++++++++++++++++++++++--
 drivers/crypto/ccp/sev-dev.h |   2 +
 include/linux/psp-sev.h      |  16 +++++
 3 files changed, 136 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index f5dbadba82ff..1321f6fb07c5 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -32,6 +32,10 @@
 #define SEV_FW_FILE		"amd/sev.fw"
 #define SEV_FW_NAME_SIZE	64
 
+/* Minimum firmware version required for the SEV-SNP support */
+#define SNP_MIN_API_MAJOR	1
+#define SNP_MIN_API_MINOR	30
+
 static DEFINE_MUTEX(sev_cmd_mutex);
 static struct sev_misc_dev *misc_dev;
 
@@ -598,6 +602,95 @@ static int sev_update_firmware(struct device *dev)
 	return ret;
 }
 
+static void snp_set_hsave_pa(void *arg)
+{
+	wrmsrl(MSR_VM_HSAVE_PA, 0);
+}
+
+static int __sev_snp_init_locked(int *error)
+{
+	struct psp_device *psp = psp_master;
+	struct sev_device *sev;
+	int rc = 0;
+
+	if (!psp || !psp->sev_data)
+		return -ENODEV;
+
+	sev = psp->sev_data;
+
+	if (sev->snp_inited)
+		return 0;
+
+	/*
+	 * SNP_INIT requires MSR_VM_HSAVE_PA to be set to 0h across all
+	 * cores.
+	 */
+	on_each_cpu(snp_set_hsave_pa, NULL, 1);
+
+	/* Prepare for first SEV guest launch after INIT */
+	wbinvd_on_all_cpus();
+
+	/* Issue the SNP_INIT firmware command. */
+	rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT, NULL, error);
+	if (rc)
+		return rc;
+
+	sev->snp_inited = true;
+	dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
+
+	return rc;
+}
+
+int sev_snp_init(int *error)
+{
+	int rc;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENODEV;
+
+	mutex_lock(&sev_cmd_mutex);
+	rc = __sev_snp_init_locked(error);
+	mutex_unlock(&sev_cmd_mutex);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(sev_snp_init);
+
+static int __sev_snp_shutdown_locked(int *error)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	int ret;
+
+	if (!sev->snp_inited)
+		return 0;
+
+	/* SHUTDOWN requires the DF_FLUSH */
+	wbinvd_on_all_cpus();
+	__sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
+
+	ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN, NULL, error);
+	if (ret) {
+		dev_err(sev->dev, "SEV-SNP firmware shutdown failed\n");
+		return ret;
+	}
+
+	sev->snp_inited = false;
+	dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
+
+	return ret;
+}
+
+static int sev_snp_shutdown(int *error)
+{
+	int rc;
+
+	mutex_lock(&sev_cmd_mutex);
+	rc = __sev_snp_shutdown_locked(NULL);
+	mutex_unlock(&sev_cmd_mutex);
+
+	return rc;
+}
+
 static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -1048,6 +1141,8 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 			   get_order(SEV_ES_TMR_SIZE));
 		sev_es_tmr = NULL;
 	}
+
+	sev_snp_shutdown(NULL);
 }
 
 void sev_dev_destroy(struct psp_device *psp)
@@ -1093,6 +1188,26 @@ void sev_pci_init(void)
 	    sev_update_firmware(sev->dev) == 0)
 		sev_get_api_version();
 
+	/*
+	 * If the boot CPU supports SNP, then first attempt to initialize
+	 * the SNP firmware.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP)) {
+		if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
+			dev_err(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
+				SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
+		} else {
+			rc = sev_snp_init(&error);
+			if (rc) {
+				/*
+				 * If we failed to INIT SNP then don't abort the probe.
+				 * Continue to initialize the legacy SEV firmware.
+				 */
+				dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
+			}
+		}
+	}
+
 	/* Obtain the TMR memory area for SEV-ES use */
 	tmr_page = alloc_pages(GFP_KERNEL, get_order(SEV_ES_TMR_SIZE));
 	if (tmr_page) {
@@ -1117,13 +1232,11 @@ void sev_pci_init(void)
 		rc = sev_platform_init(&error);
 	}
 
-	if (rc) {
+	if (rc)
 		dev_err(sev->dev, "SEV: failed to INIT error %#x\n", error);
-		return;
-	}
 
-	dev_info(sev->dev, "SEV API:%d.%d build:%d\n", sev->api_major,
-		 sev->api_minor, sev->build);
+	dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_inited ?
+		"-SNP" : "", sev->api_major, sev->api_minor, sev->build);
 
 	return;
 
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 666c21eb81ab..186ad20cbd24 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -52,6 +52,8 @@ struct sev_device {
 	u8 build;
 
 	void *cmd_buf;
+
+	bool snp_inited;
 };
 
 int sev_dev_init(struct psp_device *psp);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index c3755099ab55..1b53e8782250 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -748,6 +748,20 @@ struct sev_data_snp_init_ex {
  */
 int sev_platform_init(int *error);
 
+/**
+ * sev_snp_init - perform SEV SNP_INIT command
+ *
+ * @error: SEV command return code
+ *
+ * Returns:
+ * 0 if the SEV successfully processed the command
+ * -%ENODEV    if the SEV device is not available
+ * -%ENOTSUPP  if the SEV does not support SNP
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO       if the SEV returned a non-zero return code
+ */
+int sev_snp_init(int *error);
+
 /**
  * sev_platform_status - perform SEV PLATFORM_STATUS command
  *
@@ -855,6 +869,8 @@ sev_platform_status(struct sev_user_data_status *status, int *error) { return -E
 
 static inline int sev_platform_init(int *error) { return -ENODEV; }
 
+static inline int sev_snp_init(int *error) { return -ENODEV; }
+
 static inline int
 sev_guest_deactivate(struct sev_data_deactivate *data, int *error) { return -ENODEV; }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 13/45] crypto:ccp: Provide APIs to issue SEV-SNP commands
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (11 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 12/45] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-08-20 15:58 ` [PATCH Part2 v5 14/45] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Brijesh Singh
                   ` (32 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

Provide the APIs for the hypervisor to manage an SEV-SNP guest. The
commands for SEV-SNP are defined in the SEV-SNP firmware specification.
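
As a usage sketch (a hypothetical KVM-side caller; error handling trimmed),
tearing down a guest with the new APIs would look roughly like:

	struct sev_data_snp_decommission data = {};
	int error;

	data.gctx_paddr = gctx_paddr;	/* guest context page to release */

	if (snp_guest_decommission(&data, &error))
		pr_err("SNP_DECOMMISSION failed, fw error %#x\n", error);

	/* Flush the data fabric before the ASID can be reused. */
	snp_guest_df_flush(&error);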

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 24 ++++++++++++
 include/linux/psp-sev.h      | 73 ++++++++++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 1321f6fb07c5..01edad9116f2 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1025,6 +1025,30 @@ int sev_guest_df_flush(int *error)
 }
 EXPORT_SYMBOL_GPL(sev_guest_df_flush);
 
+int snp_guest_decommission(struct sev_data_snp_decommission *data, int *error)
+{
+	return sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, data, error);
+}
+EXPORT_SYMBOL_GPL(snp_guest_decommission);
+
+int snp_guest_df_flush(int *error)
+{
+	return sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+}
+EXPORT_SYMBOL_GPL(snp_guest_df_flush);
+
+int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error)
+{
+	return sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, data, error);
+}
+EXPORT_SYMBOL_GPL(snp_guest_page_reclaim);
+
+int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
+{
+	return sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, data, error);
+}
+EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt);
+
 static void sev_exit(struct kref *ref)
 {
 	misc_deregister(&misc_dev->misc);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 1b53e8782250..f2105a8755f9 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -860,6 +860,64 @@ int sev_guest_df_flush(int *error);
  */
 int sev_guest_decommission(struct sev_data_decommission *data, int *error);
 
+/**
+ * snp_guest_df_flush - perform SNP DF_FLUSH command
+ *
+ * @sev_ret: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV    if the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO       if the sev returned a non-zero return code
+ */
+int snp_guest_df_flush(int *error);
+
+/**
+ * snp_guest_decommission - perform SNP_DECOMMISSION command
+ *
+ * @data: sev_data_snp_decommission structure to be processed
+ * @sev_ret: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV    if the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO       if the sev returned a non-zero return code
+ */
+int snp_guest_decommission(struct sev_data_snp_decommission *data, int *error);
+
+/**
+ * snp_guest_page_reclaim - perform SNP_PAGE_RECLAIM command
+ *
+ * @data: sev_data_snp_page_reclaim structure to be processed
+ * @sev_ret: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV    if the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO       if the sev returned a non-zero return code
+ */
+int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error);
+
+/**
+ * snp_guest_dbg_decrypt - perform SEV SNP_DBG_DECRYPT command
+ *
+ * @data: sev_data_snp_dbg structure to be processed
+ * @sev_ret: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV    if the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO       if the sev returned a non-zero return code
+ */
+int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error);
+
 void *psp_copy_user_blob(u64 uaddr, u32 len);
 
 #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
@@ -887,6 +945,21 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int
 
 static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }
 
+static inline int
+snp_guest_decommission(struct sev_data_snp_decommission *data, int *error) { return -ENODEV; }
+
+static inline int snp_guest_df_flush(int *error) { return -ENODEV; }
+
+static inline int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error)
+{
+	return -ENODEV;
+}
+
+static inline int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
+{
+	return -ENODEV;
+}
+
 #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
 
 #endif	/* __PSP_SEV_H__ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 14/45] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (12 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 13/45] crypto:ccp: Provide APIs to issue SEV-SNP commands Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2022-02-25 18:03   ` Alper Gun
  2022-06-14  0:10   ` Alper Gun
  2021-08-20 15:58 ` [PATCH Part2 v5 15/45] crypto: ccp: Handle the legacy SEV command " Brijesh Singh
                   ` (31 subsequent siblings)
  45 siblings, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

The behavior and requirements for the SEV-legacy commands are altered when
the SNP firmware is in the INIT state. See the SEV-SNP firmware
specification for more details.

Allocate the Trusted Memory Region (TMR) as a 2MB-sized and 2MB-aligned
region when SNP is enabled to satisfy the new SNP requirements. Continue
allocating a 1MB region for the !SNP configuration (see the sketch below).
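
The size selection reduces to the following (constants as in the diff
below; the 2MB size takes effect only after SNP_INIT succeeds):

	#define SEV_ES_TMR_SIZE		(1024 * 1024)		/* !SNP: 1MB */
	#define SEV_SNP_ES_TMR_SIZE	(2 * 1024 * 1024)	/* SNP: 2MB, 2MB-aligned */

	static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;

	/* In __sev_snp_init_locked(), after SNP_INIT succeeds: */
	sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;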

While at it, provide an API that others can use to allocate a page that
the firmware can write to. The immediate user for this API will be the
KVM driver, which needs to allocate a firmware context page during guest
creation. The context page needs to be updated by the firmware. See the
SEV-SNP specification for further details.
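
A usage sketch for the new page-allocation API (hypothetical caller; the
KVM driver would use it this way for the firmware-owned context page):

	/* Page is moved to the firmware state in the RMP when SNP is active. */
	void *gctx_page = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
	if (!gctx_page)
		return -ENOMEM;

	/* ... hand the page to the firmware, e.g. via SNP_GCTX_CREATE ... */

	/* Reclaims the page and transitions it back to the shared state. */
	snp_free_firmware_page(gctx_page);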

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 169 ++++++++++++++++++++++++++++++++++-
 include/linux/psp-sev.h      |  11 +++
 2 files changed, 176 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 01edad9116f2..34dc358b13b9 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -62,6 +62,14 @@ static int psp_timeout;
 #define SEV_ES_TMR_SIZE		(1024 * 1024)
 static void *sev_es_tmr;
 
+/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB size. */
+#define SEV_SNP_ES_TMR_SIZE	(2 * 1024 * 1024)
+
+static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
+
+static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
+static int sev_do_cmd(int cmd, void *data, int *psp_ret);
+
 static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -159,6 +167,156 @@ static int sev_cmd_buffer_len(int cmd)
 	return 0;
 }
 
+static void snp_leak_pages(unsigned long pfn, unsigned int npages)
+{
+	WARN(1, "psc failed, pfn 0x%lx pages %d (leaking)\n", pfn, npages);
+	while (npages--) {
+		memory_failure(pfn, 0);
+		dump_rmpentry(pfn);
+		pfn++;
+	}
+}
+
+static int snp_reclaim_pages(unsigned long pfn, unsigned int npages, bool locked)
+{
+	struct sev_data_snp_page_reclaim data;
+	int ret, err, i, n = 0;
+
+	for (i = 0; i < npages; i++) {
+		memset(&data, 0, sizeof(data));
+		data.paddr = pfn << PAGE_SHIFT;
+
+		if (locked)
+			ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+		else
+			ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+		if (ret)
+			goto cleanup;
+
+		ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+		if (ret)
+			goto cleanup;
+
+		pfn++;
+		n++;
+	}
+
+	return 0;
+
+cleanup:
+	/*
+	 * If failed to reclaim the page then page is no longer safe to
+	 * be released, leak it.
+	 */
+	snp_leak_pages(pfn, npages - n);
+	return ret;
+}
+
+static inline int rmp_make_firmware(unsigned long pfn, int level)
+{
+	return rmp_make_private(pfn, 0, level, 0, true);
+}
+
+static int snp_set_rmp_state(unsigned long paddr, unsigned int npages, bool to_fw, bool locked,
+			     bool need_reclaim)
+{
+	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT; /* C-bit may be set in the paddr */
+	int rc, n = 0, i;
+
+	for (i = 0; i < npages; i++) {
+		if (to_fw)
+			rc = rmp_make_firmware(pfn, PG_LEVEL_4K);
+		else
+			rc = need_reclaim ? snp_reclaim_pages(pfn, 1, locked) :
+					    rmp_make_shared(pfn, PG_LEVEL_4K);
+		if (rc)
+			goto cleanup;
+
+		pfn++;
+		n++;
+	}
+
+	return 0;
+
+cleanup:
+	/* Try unrolling the firmware state changes */
+	if (to_fw) {
+		/*
+		 * Reclaim the pages which were already changed to the
+		 * firmware state.
+		 */
+		snp_reclaim_pages(paddr >> PAGE_SHIFT, n, locked);
+
+		return rc;
+	}
+
+	/*
+	 * If we failed to change the page state to shared, then it is not
+	 * safe to release the page back to the system, so leak it.
+	 */
+	snp_leak_pages(pfn, npages - n);
+
+	return rc;
+}
+
+static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
+{
+	unsigned long npages = 1ul << order, paddr;
+	struct sev_device *sev;
+	struct page *page;
+
+	if (!psp_master || !psp_master->sev_data)
+		return ERR_PTR(-EINVAL);
+
+	page = alloc_pages(gfp_mask, order);
+	if (!page)
+		return NULL;
+
+	/* If SEV-SNP is initialized then add the page in RMP table. */
+	sev = psp_master->sev_data;
+	if (!sev->snp_inited)
+		return page;
+
+	paddr = __pa((unsigned long)page_address(page));
+	if (snp_set_rmp_state(paddr, npages, true, locked, false))
+		return NULL;
+
+	return page;
+}
+
+void *snp_alloc_firmware_page(gfp_t gfp_mask)
+{
+	struct page *page;
+
+	page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
+
+	return page ? page_address(page) : NULL;
+}
+EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
+
+static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
+{
+	unsigned long paddr, npages = 1ul << order;
+
+	if (!page)
+		return;
+
+	paddr = __pa((unsigned long)page_address(page));
+	if (snp_set_rmp_state(paddr, npages, false, locked, true))
+		return;
+
+	__free_pages(page, order);
+}
+
+void snp_free_firmware_page(void *addr)
+{
+	if (!addr)
+		return;
+
+	__snp_free_firmware_pages(virt_to_page(addr), 0, false);
+}
+EXPORT_SYMBOL(snp_free_firmware_page);
+
 static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 {
 	struct psp_device *psp = psp_master;
@@ -281,7 +439,7 @@ static int __sev_platform_init_locked(int *error)
 
 		data.flags |= SEV_INIT_FLAGS_SEV_ES;
 		data.tmr_address = tmr_pa;
-		data.tmr_len = SEV_ES_TMR_SIZE;
+		data.tmr_len = sev_es_tmr_size;
 	}
 
 	rc = __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
@@ -638,6 +796,8 @@ static int __sev_snp_init_locked(int *error)
 	sev->snp_inited = true;
 	dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
 
+	sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
+
 	return rc;
 }
 
@@ -1161,8 +1321,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 		/* The TMR area was encrypted, flush it from the cache */
 		wbinvd_on_all_cpus();
 
-		free_pages((unsigned long)sev_es_tmr,
-			   get_order(SEV_ES_TMR_SIZE));
+		__snp_free_firmware_pages(virt_to_page(sev_es_tmr),
+					  get_order(sev_es_tmr_size),
+					  false);
 		sev_es_tmr = NULL;
 	}
 
@@ -1233,7 +1394,7 @@ void sev_pci_init(void)
 	}
 
 	/* Obtain the TMR memory area for SEV-ES use */
-	tmr_page = alloc_pages(GFP_KERNEL, get_order(SEV_ES_TMR_SIZE));
+	tmr_page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(sev_es_tmr_size), false);
 	if (tmr_page) {
 		sev_es_tmr = page_address(tmr_page);
 	} else {
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index f2105a8755f9..00bd684dc094 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -12,6 +12,8 @@
 #ifndef __PSP_SEV_H__
 #define __PSP_SEV_H__
 
+#include <linux/sev.h>
+
 #include <uapi/linux/psp-sev.h>
 
 #ifdef CONFIG_X86
@@ -919,6 +921,8 @@ int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error);
 int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error);
 
 void *psp_copy_user_blob(u64 uaddr, u32 len);
+void *snp_alloc_firmware_page(gfp_t mask);
+void snp_free_firmware_page(void *addr);
 
 #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
 
@@ -960,6 +964,13 @@ static inline int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *erro
 	return -ENODEV;
 }
 
+static inline void *snp_alloc_firmware_page(gfp_t mask)
+{
+	return NULL;
+}
+
+static inline void snp_free_firmware_page(void *addr) { }
+
 #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
 
 #endif	/* __PSP_SEV_H__ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 15/45] crypto: ccp: Handle the legacy SEV command when SNP is enabled
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (13 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 14/45] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-08-20 15:58 ` [PATCH Part2 v5 16/45] crypto: ccp: Add the SNP_PLATFORM_STATUS command Brijesh Singh
                   ` (30 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

The behavior of the SEV-legacy commands is altered when the SNP firmware
is in the INIT state. When SNP is in the INIT state, all the memory that
the SEV-legacy commands cause the firmware to write to must be in the
firmware page state before the command is issued.

A command buffer may contain a system physical address that the firmware
may write to. There are two cases that need to be handled:

1) the system physical address points to guest memory
2) the system physical address points to host memory

To handle case #1, change the page state to firmware in the RMP table
before issuing the command and restore the state to shared after the
command completes.

For case #2, use a bounce buffer to complete the request (see the sketch
below).
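
In code form, a sketch of that case split (mirroring map_firmware_writeable()
in the diff; the snp_set_rmp_state() arguments are (paddr, npages, to_fw,
locked, need_reclaim)):

	static int prep_fw_writable_sketch(u64 *paddr, u32 len, bool guest,
					   struct snp_host_map *map)
	{
		unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;

		/* case #1: transition the guest pages to the firmware state */
		if (guest)
			return snp_set_rmp_state(*paddr, npages, true, true, false) ?
				-EFAULT : 0;

		/*
		 * case #2: point the firmware at a pre-allocated bounce
		 * buffer; the data is copied back to the caller's buffer
		 * when the buffer is unmapped.
		 */
		if (snp_set_rmp_state(__pa(map->host), npages, true, true, false))
			return -EFAULT;

		*paddr = __psp_pa(map->host);
		return 0;
	}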

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 346 ++++++++++++++++++++++++++++++++++-
 drivers/crypto/ccp/sev-dev.h |  12 ++
 2 files changed, 348 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 34dc358b13b9..4cd7d803a624 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -317,12 +317,295 @@ void snp_free_firmware_page(void *addr)
 }
 EXPORT_SYMBOL(snp_free_firmware_page);
 
+static int alloc_snp_host_map(struct sev_device *sev)
+{
+	struct page *page;
+	int i;
+
+	for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+		struct snp_host_map *map = &sev->snp_host_map[i];
+
+		memset(map, 0, sizeof(*map));
+
+		page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
+		if (!page)
+			return -ENOMEM;
+
+		map->host = page_address(page);
+	}
+
+	return 0;
+}
+
+static void free_snp_host_map(struct sev_device *sev)
+{
+	int i;
+
+	for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+		struct snp_host_map *map = &sev->snp_host_map[i];
+
+		if (map->host) {
+			__free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
+			memset(map, 0, sizeof(*map));
+		}
+	}
+}
+
+static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+	unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+	map->active = false;
+
+	if (!paddr || !len)
+		return 0;
+
+	map->paddr = *paddr;
+	map->len = len;
+
+	/* If paddr points to guest memory then change the page state to firmware. */
+	if (guest) {
+		if (snp_set_rmp_state(*paddr, npages, true, true, false))
+			return -EFAULT;
+
+		goto done;
+	}
+
+	if (!map->host)
+		return -ENOMEM;
+
+	/* Check if the pre-allocated buffer can be used to fulfill the request. */
+	if (len > SEV_FW_BLOB_MAX_SIZE)
+		return -EINVAL;
+
+	/* Transition the pre-allocated buffer to the firmware state. */
+	if (snp_set_rmp_state(__pa(map->host), npages, true, true, false))
+		return -EFAULT;
+
+	/* Set the paddr to use pre-allocated firmware buffer */
+	*paddr = __psp_pa(map->host);
+
+done:
+	map->active = true;
+	return 0;
+}
+
+static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+	unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+	if (!map->active)
+		return 0;
+
+	/* If paddr points to a guest memory then restore the page state to hypervisor. */
+	if (guest) {
+		if (snp_set_rmp_state(*paddr, npages, false, true, true))
+			return -EFAULT;
+
+		goto done;
+	}
+
+	/*
+	 * Transition the pre-allocated buffer to hypervisor state before the access.
+	 *
+	 * This is because while changing the page state to firmware, the kernel unmaps
+	 * the pages from the direct map, and to restore the direct map we must
+	 * transition the pages to shared state.
+	 */
+	if (snp_set_rmp_state(__pa(map->host), npages, false, true, true))
+		return -EFAULT;
+
+	/* Copy the response data firmware buffer to the callers buffer. */
+	memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
+	*paddr = map->paddr;
+
+done:
+	map->active = false;
+	return 0;
+}
+
+static bool sev_legacy_cmd_buf_writable(int cmd)
+{
+	switch (cmd) {
+	case SEV_CMD_PLATFORM_STATUS:
+	case SEV_CMD_GUEST_STATUS:
+	case SEV_CMD_LAUNCH_START:
+	case SEV_CMD_RECEIVE_START:
+	case SEV_CMD_LAUNCH_MEASURE:
+	case SEV_CMD_SEND_START:
+	case SEV_CMD_SEND_UPDATE_DATA:
+	case SEV_CMD_SEND_UPDATE_VMSA:
+	case SEV_CMD_PEK_CSR:
+	case SEV_CMD_PDH_CERT_EXPORT:
+	case SEV_CMD_GET_ID:
+	case SEV_CMD_ATTESTATION_REPORT:
+		return true;
+	default:
+		return false;
+	}
+}
+
+#define prep_buffer(name, addr, len, guest, map)  \
+   func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
+
+static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
+{
+	int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
+	struct sev_device *sev = psp_master->sev_data;
+	bool from_fw = !to_fw;
+
+	/*
+	 * After the command is completed, change the command buffer memory to
+	 * hypervisor state.
+	 *
+	 * The immutable bit is automatically cleared by the firmware, so
+	 * there is no need to reclaim the page.
+	 */
+	if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
+		if (snp_set_rmp_state(__pa(cmd_buf), 1, false, true, false))
+			return -EFAULT;
+
+		/* No need to go further if firmware failed to execute command. */
+		if (fw_err)
+			return 0;
+	}
+
+	if (to_fw)
+		func = map_firmware_writeable;
+	else
+		func = unmap_firmware_writeable;
+
+	/*
+	 * A command buffer may contain a system physical address. If the
+	 * address points to host memory then use an intermediate firmware
+	 * page, otherwise change the page state in the RMP table.
+	 */
+	switch (cmd) {
+	case SEV_CMD_PDH_CERT_EXPORT:
+		if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
+				pdh_cert_len, false, &sev->snp_host_map[0]))
+			goto err;
+		if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
+				cert_chain_len, false, &sev->snp_host_map[1]))
+			goto err;
+		break;
+	case SEV_CMD_GET_ID:
+		if (prep_buffer(struct sev_data_get_id, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_PEK_CSR:
+		if (prep_buffer(struct sev_data_pek_csr, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_UPDATE_DATA:
+		if (prep_buffer(struct sev_data_launch_update_data, address, len,
+				true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_UPDATE_VMSA:
+		if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
+				true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_MEASURE:
+		if (prep_buffer(struct sev_data_launch_measure, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_UPDATE_SECRET:
+		if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
+				true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_DBG_DECRYPT:
+		if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
+				&sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_DBG_ENCRYPT:
+		if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
+				&sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_ATTESTATION_REPORT:
+		if (prep_buffer(struct sev_data_attestation_report, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_SEND_START:
+		if (prep_buffer(struct sev_data_send_start, session_address,
+				session_len, false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_SEND_UPDATE_DATA:
+		if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		if (prep_buffer(struct sev_data_send_update_data, trans_address,
+				trans_len, false, &sev->snp_host_map[1]))
+			goto err;
+		break;
+	case SEV_CMD_SEND_UPDATE_VMSA:
+		if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
+				trans_len, false, &sev->snp_host_map[1]))
+			goto err;
+		break;
+	case SEV_CMD_RECEIVE_UPDATE_DATA:
+		if (prep_buffer(struct sev_data_receive_update_data, guest_address,
+				guest_len, true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_RECEIVE_UPDATE_VMSA:
+		if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
+				guest_len, true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	default:
+		break;
+	}
+
+	/* The command buffer needs to be in the firmware state. */
+	if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
+		if (snp_set_rmp_state(__pa(cmd_buf), 1, true, true, false))
+			return -EFAULT;
+	}
+
+	return 0;
+
+err:
+	return -EINVAL;
+}
+
+static inline bool need_firmware_copy(int cmd)
+{
+	struct sev_device *sev = psp_master->sev_data;
+
+	/* After SNP is INIT'ed, the behavior of the legacy SEV commands is changed. */
+	return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_inited) ? true : false;
+}
+
+static int snp_aware_copy_to_firmware(int cmd, void *data)
+{
+	return __snp_cmd_buf_copy(cmd, data, true, 0);
+}
+
+static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
+{
+	return __snp_cmd_buf_copy(cmd, data, false, fw_err);
+}
+
 static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 {
 	struct psp_device *psp = psp_master;
 	struct sev_device *sev;
 	unsigned int phys_lsb, phys_msb;
 	unsigned int reg, ret = 0;
+	void *cmd_buf;
 	int buf_len;
 
 	if (!psp || !psp->sev_data)
@@ -342,12 +625,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 	 * work for some memory, e.g. vmalloc'd addresses, and @data may not be
 	 * physically contiguous.
 	 */
-	if (data)
-		memcpy(sev->cmd_buf, data, buf_len);
+	if (data) {
+		if (sev->cmd_buf_active > 2)
+			return -EBUSY;
+
+		cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
+
+		memcpy(cmd_buf, data, buf_len);
+		sev->cmd_buf_active++;
+
+		/*
+		 * The behavior of the SEV-legacy commands is altered when the
+		 * SNP firmware is in the INIT state.
+		 */
+		if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, sev->cmd_buf))
+			return -EFAULT;
+	} else {
+		cmd_buf = sev->cmd_buf;
+	}
 
 	/* Get the physical address of the command buffer */
-	phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
-	phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
+	phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
+	phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
 
 	dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
 		cmd, phys_msb, phys_lsb, psp_timeout);
@@ -388,15 +687,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 		ret = -EIO;
 	}
 
-	print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
-			     buf_len, false);
-
 	/*
 	 * Copy potential output from the PSP back to data.  Do this even on
 	 * failure in case the caller wants to glean something from the error.
 	 */
-	if (data)
-		memcpy(data, sev->cmd_buf, buf_len);
+	if (data) {
+		/*
+		 * Restore the page state after the command completes.
+		 */
+		if (need_firmware_copy(cmd) &&
+		    snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
+			return -EFAULT;
+
+		memcpy(data, cmd_buf, buf_len);
+		sev->cmd_buf_active--;
+	}
+
+	print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
+			     buf_len, false);
 
 	return ret;
 }
@@ -1271,10 +1579,12 @@ int sev_dev_init(struct psp_device *psp)
 	if (!sev)
 		goto e_err;
 
-	sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
+	sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
 	if (!sev->cmd_buf)
 		goto e_sev;
 
+	sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
+
 	psp->sev_data = sev;
 
 	sev->dev = dev;
@@ -1327,6 +1637,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 		sev_es_tmr = NULL;
 	}
 
+	/*
+	 * The host map needs to clear the immutable bit, so it must be freed
+	 * before the SNP firmware shutdown.
+	 */
+	free_snp_host_map(sev);
+
 	sev_snp_shutdown(NULL);
 }
 
@@ -1391,6 +1707,14 @@ void sev_pci_init(void)
 				dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
 			}
 		}
+
+		/*
+		 * Allocate the intermediate buffers used for the legacy command handling.
+		 */
+		if (alloc_snp_host_map(sev)) {
+			dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
+			goto skip_legacy;
+		}
 	}
 
 	/* Obtain the TMR memory area for SEV-ES use */
@@ -1420,12 +1744,14 @@ void sev_pci_init(void)
 	if (rc)
 		dev_err(sev->dev, "SEV: failed to INIT error %#x\n", error);
 
+skip_legacy:
 	dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_inited ?
 		"-SNP" : "", sev->api_major, sev->api_minor, sev->build);
 
 	return;
 
 err:
+	free_snp_host_map(sev);
 	psp_master->sev_data = NULL;
 }
 
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 186ad20cbd24..fe5d7a3ebace 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -29,11 +29,20 @@
 #define SEV_CMDRESP_CMD_SHIFT		16
 #define SEV_CMDRESP_IOC			BIT(0)
 
+#define MAX_SNP_HOST_MAP_BUFS		2
+
 struct sev_misc_dev {
 	struct kref refcount;
 	struct miscdevice misc;
 };
 
+struct snp_host_map {
+	u64 paddr;
+	u32 len;
+	void *host;
+	bool active;
+};
+
 struct sev_device {
 	struct device *dev;
 	struct psp_device *psp;
@@ -52,8 +61,11 @@ struct sev_device {
 	u8 build;
 
 	void *cmd_buf;
+	void *cmd_buf_backup;
+	int cmd_buf_active;
 
 	bool snp_inited;
+	struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
 };
 
 int sev_dev_init(struct psp_device *psp);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 16/45] crypto: ccp: Add the SNP_PLATFORM_STATUS command
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (14 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 15/45] crypto: ccp: Handle the legacy SEV command " Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-10  3:18   ` Marc Orr
  2021-09-22 17:35   ` Dr. David Alan Gilbert
  2021-08-20 15:58 ` [PATCH Part2 v5 17/45] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Brijesh Singh
                   ` (29 subsequent siblings)
  45 siblings, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

The command can be used by userspace to query the SNP platform status
report. See the SEV-SNP spec for more details.
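
As a rough usage illustration only: a userspace tool could issue the new
command through the existing SEV_ISSUE_CMD ioctl on /dev/sev. The
api_major/api_minor field names below are assumed from the series'
definition of struct sev_user_data_snp_status, which is not part of this
patch.

  /* Hypothetical userspace sketch, not part of this patch. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/psp-sev.h>

  int main(void)
  {
          struct sev_user_data_snp_status status;
          struct sev_issue_cmd cmd;
          int fd, ret;

          fd = open("/dev/sev", O_RDWR);
          if (fd < 0) {
                  perror("/dev/sev");
                  return 1;
          }

          memset(&status, 0, sizeof(status));
          memset(&cmd, 0, sizeof(cmd));
          cmd.cmd = SNP_PLATFORM_STATUS;
          cmd.data = (__u64)(unsigned long)&status;

          ret = ioctl(fd, SEV_ISSUE_CMD, &cmd);
          if (ret)
                  fprintf(stderr, "SNP_PLATFORM_STATUS: fw_err=%#x\n", cmd.error);
          else
                  printf("SNP API %d.%d\n", status.api_major, status.api_minor);

          close(fd);
          return ret ? 1 : 0;
  }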

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 Documentation/virt/coco/sevguest.rst | 27 +++++++++++++++++
 drivers/crypto/ccp/sev-dev.c         | 45 ++++++++++++++++++++++++++++
 include/uapi/linux/psp-sev.h         |  1 +
 3 files changed, 73 insertions(+)

diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
index 7acb8696fca4..7c51da010039 100644
--- a/Documentation/virt/coco/sevguest.rst
+++ b/Documentation/virt/coco/sevguest.rst
@@ -52,6 +52,22 @@ to execute due to the firmware error, then fw_err code will be set.
                 __u64 fw_err;
         };
 
+The host ioctls are issued on the /dev/sev device. Each ioctl accepts a
+command ID and a command input structure.
+
+::
+        struct sev_issue_cmd {
+                /* Command ID */
+                __u32 cmd;
+
+                /* Command request structure */
+                __u64 data;
+
+                /* firmware error code on failure (see psp-sev.h) */
+                __u32 error;
+        };
+
+
 2.1 SNP_GET_REPORT
 ------------------
 
@@ -107,3 +123,14 @@ length of the blob is lesser than expected then snp_ext_report_req.certs_len wil
 be updated with the expected value.
 
 See GHCB specification for further detail on how to parse the certificate blob.
+
+2.3 SNP_PLATFORM_STATUS
+-----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_platform_status
+:Returns (out): 0 on success, -negative on error
+
+The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
+status includes API major, minor version and more. See the SEV-SNP
+specification for further details.
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 4cd7d803a624..16c6df5d412c 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1394,6 +1394,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
 	return ret;
 }
 
+static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_data_snp_platform_status_buf buf;
+	struct page *status_page;
+	void *data;
+	int ret;
+
+	if (!sev->snp_inited || !argp->data)
+		return -EINVAL;
+
+	status_page = alloc_page(GFP_KERNEL_ACCOUNT);
+	if (!status_page)
+		return -ENOMEM;
+
+	data = page_address(status_page);
+	if (snp_set_rmp_state(__pa(data), 1, true, true, false)) {
+		__free_pages(status_page, 0);
+		return -EFAULT;
+	}
+
+	buf.status_paddr = __psp_pa(data);
+	ret = __sev_do_cmd_locked(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &argp->error);
+
+	/* Change the page state before accessing it */
+	if (snp_set_rmp_state(__pa(data), 1, false, true, true)) {
+		snp_leak_pages(__pa(data) >> PAGE_SHIFT, 1);
+		return -EFAULT;
+	}
+
+	if (ret)
+		goto cleanup;
+
+	if (copy_to_user((void __user *)argp->data, data,
+			 sizeof(struct sev_user_data_snp_status)))
+		ret = -EFAULT;
+
+cleanup:
+	__free_pages(status_page, 0);
+	return ret;
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -1445,6 +1487,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 	case SEV_GET_ID2:
 		ret = sev_ioctl_do_get_id2(&input);
 		break;
+	case SNP_PLATFORM_STATUS:
+		ret = sev_ioctl_snp_platform_status(&input);
+		break;
 	default:
 		ret = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index bed65a891223..ffd60e8b0a31 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -28,6 +28,7 @@ enum {
 	SEV_PEK_CERT_IMPORT,
 	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
 	SEV_GET_ID2,
+	SNP_PLATFORM_STATUS,
 
 	SEV_MAX,
 };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 17/45] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (15 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 16/45] crypto: ccp: Add the SNP_PLATFORM_STATUS command Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-01 21:02   ` Connor Kuehl
  2021-09-10  3:27   ` Marc Orr
  2021-08-20 15:58 ` [PATCH Part2 v5 18/45] crypto: ccp: Provide APIs to query extended attestation report Brijesh Singh
                   ` (28 subsequent siblings)
  45 siblings, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

The SEV-SNP firmware provides the SNP_CONFIG command used to set the
system-wide configuration value for SNP guests. The information includes
the TCB version string to be reported in guest attestation reports.

Version 2 of the GHCB specification adds an NAE (SNP extended guest
request) that a guest can use to query the reports that include additional
certificates.

In both cases, userspace-provided additional data is included in the
attestation reports. Userspace will use the SNP_SET_EXT_CONFIG command to
provide the certificate blob and the reported TCB version string in one
call. Note that the specification defines the certificate blob in a
specific GUID format; userspace is responsible for building the proper
certificate blob. The ioctl treats it as an opaque blob.

While it is not defined in the spec, also add an SNP_GET_EXT_CONFIG
command that can be used to obtain the data programmed through
SNP_SET_EXT_CONFIG.
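
As a usage illustration only, a VMM could program a certificate blob as in
the sketch below. It relies only on the uapi added here
(SNP_SET_EXT_CONFIG, struct sev_user_data_ext_snp_config) plus the
existing SEV_ISSUE_CMD ioctl, and leaves the reported TCB untouched by
passing config_address == 0.

  /* Hypothetical userspace sketch, not part of this patch. */
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/psp-sev.h>

  static int snp_set_certs(int sev_fd, void *certs, __u32 certs_len)
  {
          struct sev_user_data_ext_snp_config ext = {
                  .config_address = 0,    /* keep the reported TCB as-is */
                  .certs_address  = (__u64)(unsigned long)certs,
                  .certs_len      = certs_len,    /* must be page aligned */
          };
          struct sev_issue_cmd cmd = {
                  .cmd  = SNP_SET_EXT_CONFIG,
                  .data = (__u64)(unsigned long)&ext,
          };

          if (ioctl(sev_fd, SEV_ISSUE_CMD, &cmd)) {
                  fprintf(stderr, "SNP_SET_EXT_CONFIG: fw_err=%#x\n", cmd.error);
                  return -1;
          }

          return 0;
  }

The certs buffer would typically come from something like
aligned_alloc(page_size, certs_len), since the ioctl rejects a certs_len
that is not a multiple of the page size.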

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 Documentation/virt/coco/sevguest.rst |  28 +++++++
 drivers/crypto/ccp/sev-dev.c         | 115 +++++++++++++++++++++++++++
 drivers/crypto/ccp/sev-dev.h         |   3 +
 include/uapi/linux/psp-sev.h         |  17 ++++
 4 files changed, 163 insertions(+)

diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
index 7c51da010039..64a1b5167b33 100644
--- a/Documentation/virt/coco/sevguest.rst
+++ b/Documentation/virt/coco/sevguest.rst
@@ -134,3 +134,31 @@ See GHCB specification for further detail on how to parse the certificate blob.
 The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
 status includes API major, minor version and more. See the SEV-SNP
 specification for further details.
+
+2.4 SNP_SET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_user_data_ext_snp_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
+reported TCB version in the attestation report. The command is similar to
+SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
+command also accepts an additional certificate blob defined in the GHCB
+specification.
+
+If the certs_address is zero, then the previous certificate blob will be
+deleted. For more information on the certificate blob layout, see the GHCB
+spec (extended guest request message).
+
+
+2.5 SNP_GET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_user_data_ext_snp_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_GET_EXT_CONFIG is used to query the system-wide configuration set
+through the SNP_SET_EXT_CONFIG.
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 16c6df5d412c..9ba194acbe85 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1132,6 +1132,10 @@ static int __sev_snp_shutdown_locked(int *error)
 	if (!sev->snp_inited)
 		return 0;
 
+	/* Free the memory used for caching the certificate data */
+	kfree(sev->snp_certs_data);
+	sev->snp_certs_data = NULL;
+
 	/* SHUTDOWN requires the DF_FLUSH */
 	wbinvd_on_all_cpus();
 	__sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
@@ -1436,6 +1440,111 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
 	return ret;
 }
 
+static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_user_data_ext_snp_config input;
+	int ret;
+
+	if (!sev->snp_inited || !argp->data)
+		return -EINVAL;
+
+	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+		return -EFAULT;
+
+	/* Copy the TCB version programmed through the SET_CONFIG to userspace */
+	if (input.config_address) {
+		if (copy_to_user((void __user *)input.config_address,
+				 &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
+			return -EFAULT;
+	}
+
+	/* Copy the extended certs programmed through the SNP_SET_CONFIG */
+	if (input.certs_address && sev->snp_certs_data) {
+		if (input.certs_len < sev->snp_certs_len) {
+			/* Return the certs length to userspace */
+			input.certs_len = sev->snp_certs_len;
+
+			ret = -ENOSR;
+			goto e_done;
+		}
+
+		if (copy_to_user((void __user *)input.certs_address,
+				 sev->snp_certs_data, sev->snp_certs_len))
+			return -EFAULT;
+	}
+
+	ret = 0;
+
+e_done:
+	if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
+		ret = -EFAULT;
+
+	return ret;
+}
+
+static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_user_data_ext_snp_config input;
+	struct sev_user_data_snp_config config;
+	void *certs = NULL;
+	int ret = 0;
+
+	if (!sev->snp_inited || !argp->data)
+		return -EINVAL;
+
+	if (!writable)
+		return -EPERM;
+
+	if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+		return -EFAULT;
+
+	/* Copy the certs from userspace */
+	if (input.certs_address) {
+		if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
+			return -EINVAL;
+
+		certs = psp_copy_user_blob(input.certs_address, input.certs_len);
+		if (IS_ERR(certs))
+			return PTR_ERR(certs);
+	}
+
+	/* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
+	if (input.config_address) {
+		if (copy_from_user(&config,
+				   (void __user *)input.config_address, sizeof(config))) {
+			ret = -EFAULT;
+			goto e_free;
+		}
+
+		ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
+		if (ret)
+			goto e_free;
+
+		memcpy(&sev->snp_config, &config, sizeof(config));
+	}
+
+	/*
+	 * Free the old certs and, if a new certs blob was passed, cache it.
+	 */
+	if (certs) {
+		kfree(sev->snp_certs_data);
+		sev->snp_certs_data = certs;
+		sev->snp_certs_len = input.certs_len;
+	} else {
+		kfree(sev->snp_certs_data);
+		sev->snp_certs_data = NULL;
+		sev->snp_certs_len = 0;
+	}
+
+	return 0;
+
+e_free:
+	kfree(certs);
+	return ret;
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -1490,6 +1599,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 	case SNP_PLATFORM_STATUS:
 		ret = sev_ioctl_snp_platform_status(&input);
 		break;
+	case SNP_SET_EXT_CONFIG:
+		ret = sev_ioctl_snp_set_config(&input, writable);
+		break;
+	case SNP_GET_EXT_CONFIG:
+		ret = sev_ioctl_snp_get_config(&input);
+		break;
 	default:
 		ret = -EINVAL;
 		goto out;
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index fe5d7a3ebace..d2fe1706311a 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -66,6 +66,9 @@ struct sev_device {
 
 	bool snp_inited;
 	struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
+	void *snp_certs_data;
+	u32 snp_certs_len;
+	struct sev_user_data_snp_config snp_config;
 };
 
 int sev_dev_init(struct psp_device *psp);
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index ffd60e8b0a31..60e7a8d1a18e 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -29,6 +29,8 @@ enum {
 	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
 	SEV_GET_ID2,
 	SNP_PLATFORM_STATUS,
+	SNP_SET_EXT_CONFIG,
+	SNP_GET_EXT_CONFIG,
 
 	SEV_MAX,
 };
@@ -190,6 +192,21 @@ struct sev_user_data_snp_config {
 	__u8 rsvd[52];
 } __packed;
 
+/**
+ * struct sev_user_data_ext_snp_config - system wide configuration value for SNP.
+ *
+ * @config_address: address of the struct sev_user_data_snp_config or 0 when
+ *		reported_tcb does not need to be updated.
+ * @certs_address: address of extended guest request certificate chain or
+ *              0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
+ * @certs_len: length of the certs
+ */
+struct sev_user_data_ext_snp_config {
+	__u64 config_address;		/* In */
+	__u64 certs_address;		/* In */
+	__u32 certs_len;		/* In */
+};
+
 /**
  * struct sev_issue_cmd - SEV ioctl parameters
  *
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 18/45] crypto: ccp: Provide APIs to query extended attestation report
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (16 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 17/45] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-10  3:30   ` Marc Orr
  2021-08-20 15:58 ` [PATCH Part2 v5 19/45] KVM: SVM: Add support to handle AP reset MSR protocol Brijesh Singh
                   ` (27 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

Version 2 of the GHCB specification defines a VMGEXIT that is used to get
the extended attestation report. The extended attestation report includes
the certificate blobs provided through the SNP_SET_EXT_CONFIG.

snp_guest_ext_guest_request() will be used by the hypervisor to get
the extended attestation report. See the GHCB specification for more
details.
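
For illustration, a caller (KVM, in a later patch) would be expected to
follow a pattern like this sketch, where an undersized buffer is reported
back through the updated page count:

  /* Illustrative calling pattern only, not part of this patch. */
  static int ext_guest_request_example(struct sev_data_snp_guest_request *req,
                                       unsigned long vaddr, unsigned long npages)
  {
          unsigned long fw_err = 0;
          int rc;

          rc = snp_guest_ext_guest_request(req, vaddr, &npages, &fw_err);
          if (rc && fw_err == SNP_GUEST_REQ_INVALID_LEN) {
                  /*
                   * npages now holds the page count required for the
                   * certificate blob; resize the buffer and retry, or
                   * report the size to the guest per the GHCB spec.
                   */
                  return -ENOSPC;
          }

          return rc;
  }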

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 43 ++++++++++++++++++++++++++++++++++++
 include/linux/psp-sev.h      | 24 ++++++++++++++++++++
 2 files changed, 67 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 9ba194acbe85..e2650c3d0d0a 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -22,6 +22,7 @@
 #include <linux/firmware.h>
 #include <linux/gfp.h>
 #include <linux/cpufeature.h>
+#include <linux/sev-guest.h>
 
 #include <asm/smp.h>
 
@@ -1677,6 +1678,48 @@ int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
 }
 EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt);
 
+int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
+				unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
+{
+	unsigned long expected_npages;
+	struct sev_device *sev;
+	int rc;
+
+	if (!psp_master || !psp_master->sev_data)
+		return -ENODEV;
+
+	sev = psp_master->sev_data;
+
+	if (!sev->snp_inited)
+		return -EINVAL;
+
+	/*
+	 * Check if there is enough space to copy the certificate chain. Otherwise,
+	 * return the error code defined in the GHCB specification.
+	 */
+	expected_npages = sev->snp_certs_len >> PAGE_SHIFT;
+	if (*npages < expected_npages) {
+		*npages = expected_npages;
+		*fw_err = SNP_GUEST_REQ_INVALID_LEN;
+		return -EINVAL;
+	}
+
+	rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)fw_err);
+	if (rc)
+		return rc;
+
+	/* Copy the certificate blob */
+	if (sev->snp_certs_data) {
+		*npages = expected_npages;
+		memcpy((void *)vaddr, sev->snp_certs_data, *npages << PAGE_SHIFT);
+	} else {
+		*npages = 0;
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);
+
 static void sev_exit(struct kref *ref)
 {
 	misc_deregister(&misc_dev->misc);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 00bd684dc094..ea94ce4d834a 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -924,6 +924,23 @@ void *psp_copy_user_blob(u64 uaddr, u32 len);
 void *snp_alloc_firmware_page(gfp_t mask);
 void snp_free_firmware_page(void *addr);
 
+/**
+ * snp_guest_ext_guest_request - perform the SNP extended guest request command
+ *  defined in the GHCB specification.
+ *
+ * @data: the input guest request structure
+ * @vaddr: address where the certificate blob needs to be copied.
+ * @npages: number of pages for the certificate blob.
+ *    If the specified page count is less than the certificate blob size, then the
+ *    required page count is returned with error code defined in the GHCB spec.
+ *    If the specified page count is more than the certificate blob size, then
+ *    page count is updated to reflect the amount of valid data copied in the
+ *    vaddr.
+ * @fw_err: firmware error code on failure (see the GHCB specification).
+ */
+int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
+				unsigned long vaddr, unsigned long *npages,
+				unsigned long *fw_err);
+
 #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
 
 static inline int
@@ -971,6 +988,13 @@ static inline void *snp_alloc_firmware_page(gfp_t mask)
 
 static inline void snp_free_firmware_page(void *addr) { }
 
+static inline int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
+					      unsigned long vaddr, unsigned long *n,
+					      unsigned long *error)
+{
+	return -ENODEV;
+}
+
 #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
 
 #endif	/* __PSP_SEV_H__ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 19/45] KVM: SVM: Add support to handle AP reset MSR protocol
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (17 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 18/45] crypto: ccp: Provide APIs to query extended attestation report Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-08-20 15:58 ` [PATCH Part2 v5 20/45] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT Brijesh Singh
                   ` (26 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.
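
From the guest's point of view, the handshake looks roughly like the
sketch below. This is illustrative only (this patch is host-side); it
assumes the existing guest GHCB MSR accessors and the constants from
<asm/sev-common.h>.

  /* Illustrative guest-side sketch, not part of this patch. */
  static bool ap_reset_hold_via_msr_proto(void)
  {
          u64 val;

          sev_es_wr_ghcb_msr(GHCB_MSR_AP_RESET_HOLD_REQ);
          VMGEXIT();              /* the hypervisor parks the vCPU here */

          val = sev_es_rd_ghcb_msr();
          if ((val & GHCB_MSR_INFO_MASK) != GHCB_MSR_AP_RESET_HOLD_RESP)
                  return false;

          /* A non-zero result field means a SIPI was delivered. */
          return !!((val >> GHCB_MSR_AP_RESET_HOLD_RESULT_POS) &
                    GHCB_MSR_AP_RESET_HOLD_RESULT_MASK);
  }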

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev-common.h |  2 ++
 arch/x86/kvm/svm/sev.c            | 56 ++++++++++++++++++++++++++-----
 arch/x86/kvm/svm/svm.h            |  1 +
 3 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 5f134c172dbf..d70a19000953 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -56,6 +56,8 @@
 /* AP Reset Hold */
 #define GHCB_MSR_AP_RESET_HOLD_REQ	0x006
 #define GHCB_MSR_AP_RESET_HOLD_RESP	0x007
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS	12
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK	GENMASK_ULL(51, 0)
 
 /* GHCB GPA Register */
 #define GHCB_MSR_REG_GPA_REQ		0x012
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6ce9bafe768c..0ca5b5b9aeef 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -58,6 +58,10 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
 #define sev_es_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
+#define AP_RESET_HOLD_NONE		0
+#define AP_RESET_HOLD_NAE_EVENT		1
+#define AP_RESET_HOLD_MSR_PROTO		2
+
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2210,6 +2214,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 
 void sev_es_unmap_ghcb(struct vcpu_svm *svm)
 {
+	/* Clear any indication that the vCPU is in a type of AP Reset Hold */
+	svm->ap_reset_hold_type = AP_RESET_HOLD_NONE;
+
 	if (!svm->ghcb)
 		return;
 
@@ -2415,6 +2422,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_AP_RESET_HOLD_REQ:
+		svm->ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
+		ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+
+		/*
+		 * Preset the result to a non-SIPI return and then only set
+		 * the result to non-zero when delivering a SIPI.
+		 */
+		set_ghcb_msr_bits(svm, 0,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
+
+		set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+				  GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -2502,6 +2525,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
 		break;
 	case SVM_VMGEXIT_AP_HLT_LOOP:
+		svm->ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
 		ret = kvm_emulate_ap_reset_hold(vcpu);
 		break;
 	case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -2639,13 +2663,29 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 		return;
 	}
 
-	/*
-	 * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
-	 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
-	 * non-zero value.
-	 */
-	if (!svm->ghcb)
-		return;
+	/* Subsequent SIPI */
+	switch (svm->ap_reset_hold_type) {
+	case AP_RESET_HOLD_NAE_EVENT:
+		/*
+		 * Return from an AP Reset Hold VMGEXIT, where the guest will
+		 * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+		 */
+		ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+		break;
+	case AP_RESET_HOLD_MSR_PROTO:
+		/*
+		 * Return from an AP Reset Hold VMGEXIT, where the guest will
+		 * set the CS and RIP. Set GHCB data field to a non-zero value.
+		 */
+		set_ghcb_msr_bits(svm, 1,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
 
-	ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+		set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+				  GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	default:
+		break;
+	}
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8f4cdb98d8ee..5b8d9dec8028 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -188,6 +188,7 @@ struct vcpu_svm {
 	struct ghcb *ghcb;
 	struct kvm_host_map ghcb_map;
 	bool received_first_sipi;
+	unsigned int ap_reset_hold_type;
 
 	/* SEV-ES scratch area support */
 	void *ghcb_sa;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 20/45] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (18 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 19/45] KVM: SVM: Add support to handle AP reset MSR protocol Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-10-12 20:38   ` Sean Christopherson
  2021-08-20 15:58 ` [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Brijesh Singh
                   ` (25 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

Version 2 of the GHCB specification introduced advertisement of features
that are supported by the Hypervisor.

Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.
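
For reference, a guest would discover the advertised feature bitmap over
the MSR protocol along these lines (illustrative only; the
request/response codes and GHCB_MSR_HV_FT_RESP_VAL() come from
<asm/sev-common.h>):

  /* Illustrative guest-side sketch, not part of this patch. */
  static u64 query_hv_features(void)
  {
          u64 val;

          sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
          VMGEXIT();

          val = sev_es_rd_ghcb_msr();
          if ((val & GHCB_MSR_INFO_MASK) != GHCB_MSR_HV_FT_RESP)
                  return 0;

          /* GHCBData[63:12] carries the hypervisor feature bitmap. */
          return GHCB_MSR_HV_FT_RESP_VAL(val);
  }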

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev-common.h |  2 ++
 arch/x86/kvm/svm/sev.c            | 14 ++++++++++++++
 arch/x86/kvm/svm/svm.h            |  3 ++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index d70a19000953..779c7e8f836c 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -97,6 +97,8 @@ enum psc_op {
 /* GHCB Hypervisor Feature Request/Response */
 #define GHCB_MSR_HV_FT_REQ		0x080
 #define GHCB_MSR_HV_FT_RESP		0x081
+#define GHCB_MSR_HV_FT_POS		12
+#define GHCB_MSR_HV_FT_MASK		GENMASK_ULL(51, 0)
 #define GHCB_MSR_HV_FT_RESP_VAL(v)			\
 	/* GHCBData[63:12] */				\
 	(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0ca5b5b9aeef..1644da5fc93f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2184,6 +2184,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+	case SVM_VMGEXIT_HV_FEATURES:
 		break;
 	default:
 		goto vmgexit_err;
@@ -2438,6 +2439,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_MASK,
 				  GHCB_MSR_INFO_POS);
 		break;
+	case GHCB_MSR_HV_FT_REQ: {
+		set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
+				  GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
+				  GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+		break;
+	}
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -2553,6 +2561,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	}
+	case SVM_VMGEXIT_HV_FEATURES: {
+		ghcb_set_sw_exit_info_2(ghcb, GHCB_HV_FT_SUPPORTED);
+
+		ret = 1;
+		break;
+	}
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5b8d9dec8028..d1f1512a4b47 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -548,9 +548,10 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 
 /* sev.c */
 
-#define GHCB_VERSION_MAX	1ULL
+#define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
+#define GHCB_HV_FT_SUPPORTED	0
 
 extern unsigned int max_sev_asid;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (19 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 20/45] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-22 18:55   ` Dr. David Alan Gilbert
  2021-10-12 20:44   ` Sean Christopherson
  2021-08-20 15:58 ` [PATCH Part2 v5 22/45] KVM: SVM: Add initial SEV-SNP support Brijesh Singh
                   ` (24 subsequent siblings)
  45 siblings, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.

When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" in the RMP after a successful VMRUN. This is
done for _all_ VMs, not just SNP-Active VMs.

If the hypervisor accesses an in-use page through a writable translation,
the CPU will throw an RMP violation #PF.  On early SNP hardware, if an
in-use page is 2mb aligned and software accesses any part of the associated
2mb region with a hugepage, the CPU will incorrectly treat the entire 2mb
region as in-use and signal a spurious RMP violation #PF.

The recommended workaround is to not use a hugepage for the VMCB, VMSA or
AVIC backing page. Add a generic allocator that will ensure that the page
returned is not a hugepage (2mb or 1gb) and is safe to be used when SEV-SNP
is enabled.

Co-developed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/lapic.c               |  5 ++++-
 arch/x86/kvm/svm/sev.c             | 35 ++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c             | 16 ++++++++++++--
 arch/x86/kvm/svm/svm.h             |  1 +
 6 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index a12a4987154e..36a9c23a4b27 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -122,6 +122,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush)
 KVM_X86_OP_NULL(migrate_timers)
 KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP_NULL(complete_emulated_msr)
+KVM_X86_OP(alloc_apic_backing_page)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_NULL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 974cbfb1eefe..5ad6255ff5d5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1453,6 +1453,7 @@ struct kvm_x86_ops {
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
+	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ba5a27879f1d..05b45747b20b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2457,7 +2457,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
 
 	vcpu->arch.apic = apic;
 
-	apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	if (kvm_x86_ops.alloc_apic_backing_page)
+		apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
+	else
+		apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
 	if (!apic->regs) {
 		printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
 		       vcpu->vcpu_id);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1644da5fc93f..8771b878193f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2703,3 +2703,38 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 		break;
 	}
 }
+
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
+{
+	unsigned long pfn;
+	struct page *p;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+
+	/*
+	 * Allocate an SNP-safe page to work around the SNP erratum where
+	 * the CPU will incorrectly signal an RMP violation #PF if a
+	 * hugepage (2mb or 1gb) collides with the RMP entry of a VMCB, VMSA
+	 * or AVIC backing page. The recommended workaround is to not use a
+	 * hugepage.
+	 *
+	 * Allocate one extra page, use a page which is not 2mb aligned
+	 * and free the other.
+	 */
+	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
+	if (!p)
+		return NULL;
+
+	split_page(p, 1);
+
+	pfn = page_to_pfn(p);
+	if (IS_ALIGNED(__pfn_to_phys(pfn), PMD_SIZE)) {
+		pfn++;
+		__free_page(p);
+	} else {
+		__free_page(pfn_to_page(pfn + 1));
+	}
+
+	return pfn_to_page(pfn);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 25773bf72158..058eea8353c9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1368,7 +1368,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	svm = to_svm(vcpu);
 
 	err = -ENOMEM;
-	vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	vmcb01_page = snp_safe_alloc_page(vcpu);
 	if (!vmcb01_page)
 		goto out;
 
@@ -1377,7 +1377,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 		 * SEV-ES guests require a separate VMSA page used to contain
 		 * the encrypted register state of the guest.
 		 */
-		vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+		vmsa_page = snp_safe_alloc_page(vcpu);
 		if (!vmsa_page)
 			goto error_free_vmcb_page;
 
@@ -4539,6 +4539,16 @@ static int svm_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
+{
+	struct page *page = snp_safe_alloc_page(vcpu);
+
+	if (!page)
+		return NULL;
+
+	return page_address(page);
+}
+
 static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.hardware_unsetup = svm_hardware_teardown,
 	.hardware_enable = svm_hardware_enable,
@@ -4667,6 +4677,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
+
+	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d1f1512a4b47..e40800e9c998 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -575,6 +575,7 @@ void sev_es_create_vcpu(struct vcpu_svm *svm);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 
 /* vmenter.S */
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 22/45] KVM: SVM: Add initial SEV-SNP support
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (20 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-08-20 15:58 ` [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command Brijesh Singh
                   ` (23 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

The next generation of SEV is called SEV-SNP (Secure Nested Paging).
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding new
hardware-based security protection. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks
such as data replay, memory re-mapping, and more, to create an isolated
execution environment.

The SNP feature is added incrementally; later patches add a new module
parameter that can be used to enable SEV-SNP in KVM.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 10 +++++++++-
 arch/x86/kvm/svm/svm.h |  8 ++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 8771b878193f..50fddbe56981 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -58,6 +58,9 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
 #define sev_es_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled;
+
 #define AP_RESET_HOLD_NONE		0
 #define AP_RESET_HOLD_NAE_EVENT		1
 #define AP_RESET_HOLD_MSR_PROTO		2
@@ -1836,6 +1839,7 @@ void __init sev_hardware_setup(void)
 {
 #ifdef CONFIG_KVM_AMD_SEV
 	unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+	bool sev_snp_supported = false;
 	bool sev_es_supported = false;
 	bool sev_supported = false;
 
@@ -1896,12 +1900,16 @@ void __init sev_hardware_setup(void)
 	if (misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count))
 		goto out;
 
-	pr_info("SEV-ES supported: %u ASIDs\n", sev_es_asid_count);
 	sev_es_supported = true;
+	sev_snp_supported = sev_snp_enabled && cpu_feature_enabled(X86_FEATURE_SEV_SNP);
+
+	pr_info("SEV-ES %ssupported: %u ASIDs\n",
+		sev_snp_supported ? "and SEV-SNP " : "", sev_es_asid_count);
 
 out:
 	sev_enabled = sev_supported;
 	sev_es_enabled = sev_es_supported;
+	sev_snp_enabled = sev_snp_supported;
 #endif
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e40800e9c998..01953522097d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -72,6 +72,7 @@ enum {
 struct kvm_sev_info {
 	bool active;		/* SEV enabled guest */
 	bool es_active;		/* SEV-ES enabled guest */
+	bool snp_active;	/* SEV-SNP enabled guest */
 	unsigned int asid;	/* ASID used for this guest */
 	unsigned int handle;	/* SEV firmware handle */
 	int fd;			/* SEV device fd */
@@ -246,6 +247,13 @@ static inline bool sev_es_guest(struct kvm *kvm)
 #endif
 }
 
+static inline bool sev_snp_guest(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	return sev_es_guest(kvm) && sev->snp_active;
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (21 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 22/45] KVM: SVM: Add initial SEV-SNP support Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-05  6:56   ` Dov Murik
                     ` (3 more replies)
  2021-08-20 15:58 ` [PATCH Part2 v5 24/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Brijesh Singh
                   ` (22 subsequent siblings)
  45 siblings, 4 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh,
	Pavan Kumar Paluri

The KVM_SNP_INIT command is used by the hypervisor to initialize the
SEV-SNP platform context. In a typical workflow, this command should be the
first command issued. When creating SEV-SNP guest, the VMM must use this
command instead of the KVM_SEV_INIT or KVM_SEV_ES_INIT.

The flags value must be zero; it will be extended in future SNP support to
communicate optional features (such as restricted interrupt injection).
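
As a rough VMM-side sketch, the command is issued through the existing
KVM_MEMORY_ENCRYPT_OP ioctl on the VM fd. struct kvm_sev_cmd and
KVM_MEMORY_ENCRYPT_OP are existing uapi; struct kvm_snp_init is added
below.

  /* Hypothetical VMM-side sketch, not part of this patch. */
  #include <errno.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int snp_init_vm(int vm_fd, int sev_fd)
  {
          struct kvm_snp_init init = { .flags = 0 };
          struct kvm_sev_cmd cmd = {
                  .id     = KVM_SEV_SNP_INIT,
                  .data   = (__u64)(unsigned long)&init,
                  .sev_fd = sev_fd,
          };
          int ret;

          ret = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
          if (ret && errno == EOPNOTSUPP)
                  fprintf(stderr, "unsupported flags, kernel supports %#llx\n",
                          (unsigned long long)init.flags);

          return ret;
  }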

Co-developed-by: Pavan Kumar Paluri <papaluri@amd.com>
Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        | 27 ++++++++++++
 arch/x86/include/asm/svm.h                    |  2 +
 arch/x86/kvm/svm/sev.c                        | 44 ++++++++++++++++++-
 arch/x86/kvm/svm/svm.h                        |  4 ++
 include/uapi/linux/kvm.h                      | 13 ++++++
 5 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 5c081c8c7164..7b1d32fb99a8 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -427,6 +427,33 @@ issued by the hypervisor to make the guest ready for execution.
 
 Returns: 0 on success, -negative on error
 
+18. KVM_SNP_INIT
+----------------
+
+The KVM_SNP_INIT command can be used by the hypervisor to initialize SEV-SNP
+context. In a typical workflow, this command should be the first command issued.
+
+Parameters (in/out): struct kvm_snp_init
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_snp_init {
+                __u64 flags;
+        };
+
+The flags bitmap is defined as::
+
+   /* enable the restricted injection */
+   #define KVM_SEV_SNP_RESTRICTED_INJET   (1<<0)
+
+   /* enable the restricted injection timer */
+   #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1<<1)
+
+If the specified flags are not supported, -EOPNOTSUPP is returned and the
+supported flags are reported back in the structure.
+
 References
 ==========
 
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 44a3f920f886..a39e31845a33 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -218,6 +218,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define SVM_NESTED_CTL_SEV_ENABLE	BIT(1)
 #define SVM_NESTED_CTL_SEV_ES_ENABLE	BIT(2)
 
+#define SVM_SEV_FEAT_SNP_ACTIVE		BIT(0)
+
 struct vmcb_seg {
 	u16 selector;
 	u16 attrib;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 50fddbe56981..93da463545ef 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -235,10 +235,30 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
 	sev_decommission(handle);
 }
 
+static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_snp_init params;
+	int ret = 0;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
+		ret = -EOPNOTSUPP;
+
+	params.flags = SEV_SNP_SUPPORTED_FLAGS;
+
+	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, sizeof(params)))
+		ret = -EFAULT;
+
+	return ret;
+}
+
 static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
+	bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
-	bool es_active = argp->id == KVM_SEV_ES_INIT;
+	bool snp_active = argp->id == KVM_SEV_SNP_INIT;
 	int asid, ret;
 
 	if (kvm->created_vcpus)
@@ -249,12 +269,22 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		return ret;
 
 	sev->es_active = es_active;
+	sev->snp_active = snp_active;
 	asid = sev_asid_new(sev);
 	if (asid < 0)
 		goto e_no_asid;
 	sev->asid = asid;
 
-	ret = sev_platform_init(&argp->error);
+	if (snp_active) {
+		ret = verify_snp_init_flags(kvm, argp);
+		if (ret)
+			goto e_free;
+
+		ret = sev_snp_init(&argp->error);
+	} else {
+		ret = sev_platform_init(&argp->error);
+	}
+
 	if (ret)
 		goto e_free;
 
@@ -600,6 +630,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	save->pkru = svm->vcpu.arch.pkru;
 	save->xss  = svm->vcpu.arch.ia32_xss;
 
+	/* Enable the SEV-SNP feature */
+	if (sev_snp_guest(svm->vcpu.kvm))
+		save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
 	return 0;
 }
 
@@ -1532,6 +1566,12 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	}
 
 	switch (sev_cmd.id) {
+	case KVM_SEV_SNP_INIT:
+		if (!sev_snp_enabled) {
+			r = -ENOTTY;
+			goto out;
+		}
+		fallthrough;
 	case KVM_SEV_ES_INIT:
 		if (!sev_es_enabled) {
 			r = -ENOTTY;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 01953522097d..57c3c404b0b3 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -69,6 +69,9 @@ enum {
 /* TPR and CR2 are always written before VMRUN */
 #define VMCB_ALWAYS_DIRTY_MASK	((1U << VMCB_INTR) | (1U << VMCB_CR2))
 
+/* Supported init feature flags */
+#define SEV_SNP_SUPPORTED_FLAGS		0x0
+
 struct kvm_sev_info {
 	bool active;		/* SEV enabled guest */
 	bool es_active;		/* SEV-ES enabled guest */
@@ -81,6 +84,7 @@ struct kvm_sev_info {
 	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
 	struct kvm *enc_context_owner; /* Owner of copied encryption context */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
+	u64 snp_init_flags;
 };
 
 struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9e4aabcb31a..944e2bf601fe 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1712,6 +1712,9 @@ enum sev_cmd_id {
 	/* Guest Migration Extension */
 	KVM_SEV_SEND_CANCEL,
 
+	/* SNP specific commands */
+	KVM_SEV_SNP_INIT,
+
 	KVM_SEV_NR_MAX,
 };
 
@@ -1808,6 +1811,16 @@ struct kvm_sev_receive_update_data {
 	__u32 trans_len;
 };
 
+/* enable the restricted injection */
+#define KVM_SEV_SNP_RESTRICTED_INJET   (1 << 0)
+
+/* enable the restricted injection timer */
+#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1 << 1)
+
+struct kvm_snp_init {
+	__u64 flags;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 24/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (22 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-08-20 15:58 ` [PATCH Part2 v5 25/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Brijesh Singh
                   ` (21 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. If the guest is expected to be migrated,
the command also binds a migration agent (MA) to the guest.

For more information see the SEV-SNP specification.
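
A minimal VMM-side sketch follows. The policy value is an assumption for
illustration (per the SEV-SNP spec, bit 17 is reserved-must-be-one and bit
16 allows SMT); everything else reuses the KVM_MEMORY_ENCRYPT_OP pattern.

  /* Hypothetical VMM-side sketch, not part of this patch. */
  static int snp_launch_start(int vm_fd, int sev_fd)
  {
          struct kvm_sev_snp_launch_start start = {
                  .policy = (1ULL << 17) | (1ULL << 16), /* assumed minimal policy */
          };
          struct kvm_sev_cmd cmd = {
                  .id     = KVM_SEV_SNP_LAUNCH_START,
                  .data   = (__u64)(unsigned long)&start,
                  .sev_fd = sev_fd,
          };

          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }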

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        |  24 ++++
 arch/x86/kvm/svm/sev.c                        | 116 +++++++++++++++++-
 arch/x86/kvm/svm/svm.h                        |   1 +
 include/uapi/linux/kvm.h                      |  10 ++
 4 files changed, 148 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 7b1d32fb99a8..937af3447954 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -454,6 +454,30 @@ The flags bitmap is defined as::
 If the specified flags is not supported then return -EOPNOTSUPP, and the supported
 flags are returned.
 
+19. KVM_SNP_LAUNCH_START
+------------------------
+
+The KVM_SNP_LAUNCH_START command is used for creating the memory encryption
+context for the SEV-SNP guest. To create the encryption context, the user
+must provide a guest policy, a migration agent (if any) and a guest OS
+visible workarounds value as defined in the SEV-SNP specification.
+
+Parameters (in): struct  kvm_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_start {
+                __u64 policy;           /* Guest policy to use. */
+                __u64 ma_uaddr;         /* userspace address of migration agent */
+                __u8 ma_en;             /* 1 if the migration agent is enabled */
+                __u8 imi_en;            /* set IMI to 1. */
+                __u8 gosvw[16];         /* guest OS visible workarounds */
+        };
+
+See the SEV-SNP specification for further detail on the launch input.
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 93da463545ef..dbf04a52b23d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -21,6 +21,7 @@
 
 #include <asm/pkru.h>
 #include <asm/trapnr.h>
+#include <asm/sev.h>
 
 #include "x86.h"
 #include "svm.h"
@@ -74,6 +75,8 @@ static unsigned long sev_me_mask;
 static unsigned long *sev_asid_bitmap;
 static unsigned long *sev_reclaim_asid_bitmap;
 
+static int snp_decommission_context(struct kvm *kvm);
+
 struct enc_region {
 	struct list_head list;
 	unsigned long npages;
@@ -85,7 +88,7 @@ struct enc_region {
 /* Called with the sev_bitmap_lock held, or on shutdown  */
 static int sev_flush_asids(int min_asid, int max_asid)
 {
-	int ret, pos, error = 0;
+	int ret, pos, error = 0, ret_snp = 0, error_snp = 0;
 
 	/* Check if there are any ASIDs to reclaim before performing a flush */
 	pos = find_next_bit(sev_reclaim_asid_bitmap, max_asid, min_asid);
@@ -101,12 +104,18 @@ static int sev_flush_asids(int min_asid, int max_asid)
 	wbinvd_on_all_cpus();
 	ret = sev_guest_df_flush(&error);
 
+	if (sev_snp_enabled)
+		ret_snp = snp_guest_df_flush(&error_snp);
+
 	up_write(&sev_deactivate_lock);
 
 	if (ret)
 		pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
 
-	return ret;
+	if (ret_snp)
+		pr_err("SEV: SNP_DF_FLUSH failed, ret=%d, error=%#x\n", ret_snp, error_snp);
+
+	return ret || ret_snp;
 }
 
 static inline bool is_mirroring_enc_context(struct kvm *kvm)
@@ -1543,6 +1552,74 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, &data, &argp->error);
 }
 
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct sev_data_snp_gctx_create data = {};
+	void *context;
+	int rc;
+
+	/* Allocate memory for context page */
+	context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+	if (!context)
+		return NULL;
+
+	data.gctx_paddr = __psp_pa(context);
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+	if (rc) {
+		snp_free_firmware_page(context);
+		return NULL;
+	}
+
+	return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_activate data = {0};
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.asid   = sev_get_asid(kvm);
+	return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_start start = {0};
+	struct kvm_sev_snp_launch_start params;
+	int rc;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	sev->snp_context = snp_context_create(kvm, argp);
+	if (!sev->snp_context)
+		return -ENOTTY;
+
+	start.gctx_paddr = __psp_pa(sev->snp_context);
+	start.policy = params.policy;
+	memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+	if (rc)
+		goto e_free_context;
+
+	sev->fd = argp->sev_fd;
+	rc = snp_bind_asid(kvm, &argp->error);
+	if (rc)
+		goto e_free_context;
+
+	return 0;
+
+e_free_context:
+	snp_decommission_context(kvm);
+
+	return rc;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1632,6 +1709,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_RECEIVE_FINISH:
 		r = sev_receive_finish(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_START:
+		r = snp_launch_start(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -1825,6 +1905,28 @@ int svm_vm_copy_asid_from(struct kvm *kvm, unsigned int source_fd)
 	return ret;
 }
 
+static int snp_decommission_context(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_decommission data = {};
+	int ret;
+
+	/* If context is not created then do nothing */
+	if (!sev->snp_context)
+		return 0;
+
+	data.gctx_paddr = __sme_pa(sev->snp_context);
+	ret = snp_guest_decommission(&data, NULL);
+	if (WARN_ONCE(ret, "failed to release guest context"))
+		return ret;
+
+	/* free the context page now */
+	snp_free_firmware_page(sev->snp_context);
+	sev->snp_context = NULL;
+
+	return 0;
+}
+
 void sev_vm_destroy(struct kvm *kvm)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -1863,7 +1965,15 @@ void sev_vm_destroy(struct kvm *kvm)
 
 	mutex_unlock(&kvm->lock);
 
-	sev_unbind_asid(kvm, sev->handle);
+	if (sev_snp_guest(kvm)) {
+		if (snp_decommission_context(kvm)) {
+			WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
+			return;
+		}
+	} else {
+		sev_unbind_asid(kvm, sev->handle);
+	}
+
 	sev_asid_free(sev);
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 57c3c404b0b3..85417c44812d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -85,6 +85,7 @@ struct kvm_sev_info {
 	struct kvm *enc_context_owner; /* Owner of copied encryption context */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
 	u64 snp_init_flags;
+	void *snp_context;      /* SNP guest context page */
 };
 
 struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 944e2bf601fe..e6416e58cd9a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1714,6 +1714,7 @@ enum sev_cmd_id {
 
 	/* SNP specific commands */
 	KVM_SEV_SNP_INIT,
+	KVM_SEV_SNP_LAUNCH_START,
 
 	KVM_SEV_NR_MAX,
 };
@@ -1821,6 +1822,15 @@ struct kvm_snp_init {
 	__u64 flags;
 };
 
+struct kvm_sev_snp_launch_start {
+	__u64 policy;
+	__u64 ma_uaddr;
+	__u8 ma_en;
+	__u8 imi_en;
+	__u8 gosvw[16];
+	__u8 pad[6];
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 25/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (23 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 24/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-27 16:43   ` Peter Gonda
  2021-08-20 15:58 ` [PATCH Part2 v5 26/45] KVM: SVM: Mark the private vma unmergeable for SEV-SNP guests Brijesh Singh
                   ` (20 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
guest's memory. The data is encrypted with the cryptographic context
created with the KVM_SEV_SNP_LAUNCH_START.

In addition to inserting data, it can insert two special pages into the
guest's memory: the secrets page and the CPUID page.

When the guest is terminated, reclaim the guest pages added in the RMP
table. If the reclaim fails, then the pages are no longer safe to be
released back to the system, so leak them.

For more information see the SEV-SNP specification.
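
As an illustration only (not part of this patch), a userspace VMM could
drive this command through the KVM_MEMORY_ENCRYPT_OP ioctl roughly as
below; vm_fd, sev_fd, gfn, hva and len are placeholders for a VM fd, an
open /dev/sev fd and a memory region already registered with
KVM_MEMORY_ENCRYPT_REG_REGION:

  #include <stdio.h>
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int vmm_snp_launch_update(int vm_fd, int sev_fd,
                                   __u64 gfn, void *hva, __u32 len)
  {
          struct kvm_sev_snp_launch_update update = {
                  .start_gfn = gfn,
                  .uaddr     = (__u64)(uintptr_t)hva,
                  .len       = len,       /* multiple of 4KB */
                  .page_type = KVM_SEV_SNP_PAGE_TYPE_NORMAL,
          };
          struct kvm_sev_cmd cmd = {
                  .id     = KVM_SEV_SNP_LAUNCH_UPDATE,
                  .data   = (__u64)(uintptr_t)&update,
                  .sev_fd = sev_fd,
          };

          if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0) {
                  fprintf(stderr, "LAUNCH_UPDATE: fw error %#x\n", cmd.error);
                  return -1;
          }
          return 0;
  }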

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        |  29 +++
 arch/x86/kvm/svm/sev.c                        | 187 ++++++++++++++++++
 include/uapi/linux/kvm.h                      |  19 ++
 3 files changed, 235 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index 937af3447954..ddcd94e9ffed 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -478,6 +478,35 @@ Returns: 0 on success, -negative on error
 
 See the SEV-SNP specification for further detail on the launch input.
 
+20. KVM_SNP_LAUNCH_UPDATE
+-------------------------
+
+The KVM_SNP_LAUNCH_UPDATE command is used for encrypting a memory region. It
+also calculates a measurement of the memory contents: a signature that can be
+sent to the guest owner as an attestation that the memory was encrypted
+correctly by the firmware.
+
+Parameters (in): struct kvm_sev_snp_launch_update
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_update {
+                __u64 start_gfn;        /* Guest page number to start from. */
+                __u64 uaddr;            /* userspace address of memory to encrypt */
+                __u32 len;              /* length of memory region */
+                __u8 imi_page;          /* 1 if memory is part of the IMI */
+                __u8 page_type;         /* page type */
+                __u8 vmpl3_perms;       /* VMPL3 permission mask */
+                __u8 vmpl2_perms;       /* VMPL2 permission mask */
+                __u8 vmpl1_perms;       /* VMPL1 permission mask */
+        };
+
+See the SEV-SNP spec for further details on how to build the VMPL permission
+mask and page type.
+
+
 References
 ==========
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index dbf04a52b23d..4b126598b7aa 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -17,6 +17,7 @@
 #include <linux/misc_cgroup.h>
 #include <linux/processor.h>
 #include <linux/trace_events.h>
+#include <linux/sev.h>
 #include <asm/fpu/internal.h>
 
 #include <asm/pkru.h>
@@ -227,6 +228,49 @@ static void sev_decommission(unsigned int handle)
 	sev_guest_decommission(&decommission, NULL);
 }
 
+static inline void snp_leak_pages(u64 pfn, enum pg_level level)
+{
+	unsigned int npages = page_level_size(level) >> PAGE_SHIFT;
+
+	WARN(1, "psc failed pfn 0x%llx pages %d (leaking)\n", pfn, npages);
+
+	while (npages) {
+		memory_failure(pfn, 0);
+		dump_rmpentry(pfn);
+		npages--;
+		pfn++;
+	}
+}
+
+static int snp_page_reclaim(u64 pfn)
+{
+	struct sev_data_snp_page_reclaim data = {0};
+	int err, rc;
+
+	data.paddr = __sme_set(pfn << PAGE_SHIFT);
+	rc = snp_guest_page_reclaim(&data, &err);
+	if (rc) {
+		/*
+		 * If the reclaim failed, then page is no longer safe
+		 * to use.
+		 */
+		snp_leak_pages(pfn, PG_LEVEL_4K);
+	}
+
+	return rc;
+}
+
+static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
+{
+	int rc;
+
+	rc = rmp_make_shared(pfn, level);
+	if (rc && leak)
+		snp_leak_pages(pfn, level);
+
+	return rc;
+}
+
 static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
 {
 	struct sev_data_deactivate deactivate;
@@ -1620,6 +1664,123 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return rc;
 }
 
+static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct list_head *head = &sev->regions_list;
+	struct enc_region *i;
+
+	lockdep_assert_held(&kvm->lock);
+
+	list_for_each_entry(i, head, list) {
+		u64 start = i->uaddr;
+		u64 end = start + i->size;
+
+		if (start <= hva && end >= (hva + len))
+			return true;
+	}
+
+	return false;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_update data = {0};
+	struct kvm_sev_snp_launch_update params;
+	unsigned long npages, pfn, n = 0;
+	int *error = &argp->error;
+	struct page **inpages;
+	int ret, i, level;
+	u64 gfn;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	/* Verify that the specified address range is registered. */
+	if (!is_hva_registered(kvm, params.uaddr, params.len))
+		return -EINVAL;
+
+	/*
+	 * The userspace memory is already locked, so technically we don't
+	 * need to lock it again. A later part of the function needs the
+	 * pfns, so call sev_pin_memory() to get the list of pages to
+	 * iterate through.
+	 */
+	inpages = sev_pin_memory(kvm, params.uaddr, params.len, &npages, 1);
+	if (!inpages)
+		return -ENOMEM;
+
+	/*
+	 * Verify that all the pages are marked shared in the RMP table before
+	 * going further. This avoids the case where userspace may try to
+	 * update the same page twice.
+	 */
+	for (i = 0; i < npages; i++) {
+		if (snp_lookup_rmpentry(page_to_pfn(inpages[i]), &level) != 0) {
+			sev_unpin_memory(kvm, inpages, npages);
+			return -EFAULT;
+		}
+	}
+
+	gfn = params.start_gfn;
+	level = PG_LEVEL_4K;
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+
+	for (i = 0; i < npages; i++) {
+		pfn = page_to_pfn(inpages[i]);
+
+		ret = rmp_make_private(pfn, gfn << PAGE_SHIFT, level, sev_get_asid(kvm), true);
+		if (ret) {
+			ret = -EFAULT;
+			goto e_unpin;
+		}
+
+		n++;
+		data.address = __sme_page_pa(inpages[i]);
+		data.page_size = X86_TO_RMP_PG_LEVEL(level);
+		data.page_type = params.page_type;
+		data.vmpl3_perms = params.vmpl3_perms;
+		data.vmpl2_perms = params.vmpl2_perms;
+		data.vmpl1_perms = params.vmpl1_perms;
+		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, &data, error);
+		if (ret) {
+			/*
+			 * If the command failed, the page needs to be reclaimed.
+			 */
+			snp_page_reclaim(pfn);
+			goto e_unpin;
+		}
+
+		gfn++;
+	}
+
+e_unpin:
+	/* Content of memory is updated, mark pages dirty */
+	for (i = 0; i < n; i++) {
+		set_page_dirty_lock(inpages[i]);
+		mark_page_accessed(inpages[i]);
+
+		/*
+		 * If there was an error, update the RMP entry of each page
+		 * that was transitioned, returning ownership to the hypervisor.
+		 */
+		if (ret)
+			host_rmp_make_shared(page_to_pfn(inpages[i]), level, true);
+	}
+
+	/* Unlock the user pages */
+	sev_unpin_memory(kvm, inpages, npages);
+
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1712,6 +1873,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_START:
 		r = snp_launch_start(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_UPDATE:
+		r = snp_launch_update(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -1794,6 +1958,29 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
 static void __unregister_enc_region_locked(struct kvm *kvm,
 					   struct enc_region *region)
 {
+	unsigned long i, pfn;
+	int level;
+
+	/*
+	 * The guest memory pages are assigned in the RMP table. Unassign it
+	 * before releasing the memory.
+	 */
+	if (sev_snp_guest(kvm)) {
+		for (i = 0; i < region->npages; i++) {
+			pfn = page_to_pfn(region->pages[i]);
+
+			if (!snp_lookup_rmpentry(pfn, &level))
+				continue;
+
+			cond_resched();
+
+			if (level > PG_LEVEL_4K)
+				pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+			host_rmp_make_shared(pfn, level, true);
+		}
+	}
+
 	sev_unpin_memory(kvm, region->pages, region->npages);
 	list_del(&region->list);
 	kfree(region);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e6416e58cd9a..0681be4bdfdf 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1715,6 +1715,7 @@ enum sev_cmd_id {
 	/* SNP specific commands */
 	KVM_SEV_SNP_INIT,
 	KVM_SEV_SNP_LAUNCH_START,
+	KVM_SEV_SNP_LAUNCH_UPDATE,
 
 	KVM_SEV_NR_MAX,
 };
@@ -1831,6 +1832,24 @@ struct kvm_sev_snp_launch_start {
 	__u8 pad[6];
 };
 
+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL		0x1
+#define KVM_SEV_SNP_PAGE_TYPE_VMSA		0x2
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO		0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED	0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS		0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID		0x6
+
+struct kvm_sev_snp_launch_update {
+	__u64 start_gfn;
+	__u64 uaddr;
+	__u32 len;
+	__u8 imi_page;
+	__u8 page_type;
+	__u8 vmpl3_perms;
+	__u8 vmpl2_perms;
+	__u8 vmpl1_perms;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 26/45] KVM: SVM: Mark the private vma unmergeable for SEV-SNP guests
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (24 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 25/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Brijesh Singh
@ 2021-08-20 15:58 ` Brijesh Singh
  2021-09-23 17:18   ` Dr. David Alan Gilbert
  2021-10-12 18:46   ` Sean Christopherson
  2021-08-20 15:59 ` [PATCH Part2 v5 27/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Brijesh Singh
                   ` (19 subsequent siblings)
  45 siblings, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:58 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

When SEV-SNP is enabled, the guest private pages are added in the RMP
table; while adding the pages, rmp_make_private() unmaps the pages from
the direct map. If KSM attempts to access those unmapped pages then it
will trigger a #PF (page-not-present).

Encrypted guest pages cannot be shared between processes, so userspace
should not mark the region mergeable; but to be safe, mark the process
VMA unmergeable before adding the pages in the RMP table.
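
For reference, the userspace-side counterpart of this safety net is a
plain madvise() call on the guest backing memory before launch (a
minimal sketch; guest_mem and guest_size are placeholders):

  #include <sys/mman.h>

  /* Ensure KSM never merges the pages backing an SNP guest. */
  if (madvise(guest_mem, guest_size, MADV_UNMERGEABLE))
          perror("madvise(MADV_UNMERGEABLE)");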

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4b126598b7aa..dcef0ae5f8e4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -18,11 +18,13 @@
 #include <linux/processor.h>
 #include <linux/trace_events.h>
 #include <linux/sev.h>
+#include <linux/ksm.h>
 #include <asm/fpu/internal.h>
 
 #include <asm/pkru.h>
 #include <asm/trapnr.h>
 #include <asm/sev.h>
+#include <asm/mman.h>
 
 #include "x86.h"
 #include "svm.h"
@@ -1683,6 +1685,30 @@ static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
 	return false;
 }
 
+static int snp_mark_unmergable(struct kvm *kvm, u64 start, u64 size)
+{
+	struct vm_area_struct *vma;
+	u64 end = start + size;
+	int ret;
+
+	do {
+		vma = find_vma_intersection(kvm->mm, start, end);
+		if (!vma) {
+			ret = -EINVAL;
+			break;
+		}
+
+		ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
+				  MADV_UNMERGEABLE, &vma->vm_flags);
+		if (ret)
+			break;
+
+		start = vma->vm_end;
+	} while (end > vma->vm_end);
+
+	return ret;
+}
+
 static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -1707,6 +1733,12 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (!is_hva_registered(kvm, params.uaddr, params.len))
 		return -EINVAL;
 
+	mmap_write_lock(kvm->mm);
+	ret = snp_mark_unmergable(kvm, params.uaddr, params.len);
+	mmap_write_unlock(kvm->mm);
+	if (ret)
+		return -EFAULT;
+
 	/*
 	 * The userspace memory is already locked so technically we don't
 	 * need to lock it again. Later part of the function needs to know
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 27/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (25 preceding siblings ...)
  2021-08-20 15:58 ` [PATCH Part2 v5 26/45] KVM: SVM: Mark the private vma unmergeable for SEV-SNP guests Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2022-05-18 20:21   ` Marc Orr
  2021-08-20 15:59 ` [PATCH Part2 v5 28/45] KVM: X86: Keep the NPT and RMP page level in sync Brijesh Singh
                   ` (18 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

The KVM_SEV_SNP_LAUNCH_FINISH command finalizes the cryptographic digest
and stores it as the measurement of the guest at launch.

While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
to encrypt the VMSA pages.
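
As an illustrative sketch only (vm_fd, sev_fd and host_data are
placeholders), a VMM would finalize the launch through the
KVM_MEMORY_ENCRYPT_OP ioctl like this:

  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static int vmm_snp_launch_finish(int vm_fd, int sev_fd,
                                   const __u8 host_data[32])
  {
          struct kvm_sev_snp_launch_finish finish = {};
          struct kvm_sev_cmd cmd = {
                  .id     = KVM_SEV_SNP_LAUNCH_FINISH,
                  .data   = (__u64)(uintptr_t)&finish,
                  .sev_fd = sev_fd,
          };

          /* Opaque host-supplied data, reflected in attestation reports. */
          memcpy(finish.host_data, host_data, sizeof(finish.host_data));

          if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0) {
                  fprintf(stderr, "LAUNCH_FINISH: fw error %#x\n", cmd.error);
                  return -1;
          }
          return 0;
  }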

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 .../virt/kvm/amd-memory-encryption.rst        |  22 ++++
 arch/x86/kvm/svm/sev.c                        | 116 ++++++++++++++++++
 include/uapi/linux/kvm.h                      |  14 +++
 3 files changed, 152 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
index ddcd94e9ffed..c7332e0e0baa 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -506,6 +506,28 @@ Returns: 0 on success, -negative on error
 See the SEV-SNP spec for further details on how to build the VMPL permission
 mask and page type.
 
+21. KVM_SNP_LAUNCH_FINISH
+-------------------------
+
+After completion of the SNP guest launch flow, the KVM_SNP_LAUNCH_FINISH
+command can be issued to make the guest ready for execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+        struct kvm_sev_snp_launch_finish {
+                __u64 id_block_uaddr;
+                __u64 id_auth_uaddr;
+                __u8 id_block_en;
+                __u8 auth_key_en;
+                __u8 host_data[32];
+        };
+
+
+See the SEV-SNP specification for further details on launch finish input parameters.
 
 References
 ==========
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index dcef0ae5f8e4..248096a5c307 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1813,6 +1813,106 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_update data = {};
+	int i, ret;
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.page_type = SNP_PAGE_TYPE_VMSA;
+
+	for (i = 0; i < kvm->created_vcpus; i++) {
+		struct vcpu_svm *svm = to_svm(kvm->vcpus[i]);
+		u64 pfn = __pa(svm->vmsa) >> PAGE_SHIFT;
+
+		/* Perform some pre-encryption checks against the VMSA */
+		ret = sev_es_sync_vmsa(svm);
+		if (ret)
+			return ret;
+
+		/* Transition the VMSA page to a firmware state. */
+		ret = rmp_make_private(pfn, -1, PG_LEVEL_4K, sev->asid, true);
+		if (ret)
+			return ret;
+
+		/* Issue the SNP command to encrypt the VMSA */
+		data.address = __sme_pa(svm->vmsa);
+		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+				      &data, &argp->error);
+		if (ret) {
+			snp_page_reclaim(pfn);
+			return ret;
+		}
+
+		svm->vcpu.arch.guest_state_protected = true;
+	}
+
+	return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_finish *data;
+	void *id_block = NULL, *id_auth = NULL;
+	struct kvm_sev_snp_launch_finish params;
+	int ret;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	/* Measure all vCPUs using LAUNCH_UPDATE before we finalize the launch flow. */
+	ret = snp_launch_update_vmsa(kvm, argp);
+	if (ret)
+		return ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+	if (!data)
+		return -ENOMEM;
+
+	if (params.id_block_en) {
+		id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+		if (IS_ERR(id_block)) {
+			ret = PTR_ERR(id_block);
+			goto e_free;
+		}
+
+		data->id_block_en = 1;
+		data->id_block_paddr = __sme_pa(id_block);
+	}
+
+	if (params.auth_key_en) {
+		id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+		if (IS_ERR(id_auth)) {
+			ret = PTR_ERR(id_auth);
+			goto e_free_id_block;
+		}
+
+		data->auth_key_en = 1;
+		data->id_auth_paddr = __sme_pa(id_auth);
+	}
+
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+	kfree(id_auth);
+
+e_free_id_block:
+	kfree(id_block);
+
+e_free:
+	kfree(data);
+
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1908,6 +2008,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_UPDATE:
 		r = snp_launch_update(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_FINISH:
+		r = snp_launch_finish(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -2364,16 +2467,29 @@ static void sev_flush_guest_memory(struct vcpu_svm *svm, void *va,
 void sev_free_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm;
+	u64 pfn;
 
 	if (!sev_es_guest(vcpu->kvm))
 		return;
 
 	svm = to_svm(vcpu);
+	pfn = __pa(svm->vmsa) >> PAGE_SHIFT;
 
 	if (vcpu->arch.guest_state_protected)
 		sev_flush_guest_memory(svm, svm->vmsa, PAGE_SIZE);
+
+	/*
+	 * If it's an SNP guest, then the VMSA was added in the RMP entry as
+	 * a guest-owned page. Transition the page to hypervisor state
+	 * before releasing it back to the system.
+	 */
+	if (sev_snp_guest(vcpu->kvm) &&
+	    host_rmp_make_shared(pfn, PG_LEVEL_4K, false))
+		goto skip_vmsa_free;
+
 	__free_page(virt_to_page(svm->vmsa));
 
+skip_vmsa_free:
 	if (svm->ghcb_sa_free)
 		kfree(svm->ghcb_sa);
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0681be4bdfdf..ab9b1c82b0ee 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1716,6 +1716,7 @@ enum sev_cmd_id {
 	KVM_SEV_SNP_INIT,
 	KVM_SEV_SNP_LAUNCH_START,
 	KVM_SEV_SNP_LAUNCH_UPDATE,
+	KVM_SEV_SNP_LAUNCH_FINISH,
 
 	KVM_SEV_NR_MAX,
 };
@@ -1850,6 +1851,19 @@ struct kvm_sev_snp_launch_update {
 	__u8 vmpl1_perms;
 };
 
+#define KVM_SEV_SNP_ID_BLOCK_SIZE	96
+#define KVM_SEV_SNP_ID_AUTH_SIZE	4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE	32
+
+struct kvm_sev_snp_launch_finish {
+	__u64 id_block_uaddr;
+	__u64 id_auth_uaddr;
+	__u8 id_block_en;
+	__u8 auth_key_en;
+	__u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+	__u8 pad[6];
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 28/45] KVM: X86: Keep the NPT and RMP page level in sync
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (26 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 27/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-08-20 15:59 ` [PATCH Part2 v5 29/45] KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault() Brijesh Singh
                   ` (17 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

When running an SEV-SNP VM, the SPA used to index the RMP entry is
obtained through the NPT translation (gva->gpa->spa). The NPT page
level is checked against the page level programmed in the RMP entry.
If the page levels do not match, the access will cause a nested page
fault with the RMP bit set to indicate the RMP violation.
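
Concretely, the reconciliation below reduces to clamping the proposed
NPT level, with one carve-out (a minimal sketch of the logic, not the
patch code itself):

  /*
   * rmp_level comes from snp_lookup_rmpentry(); *level is the page
   * level the NPT fault handler proposed for the mapping.
   */
  if (rmp_level == PG_LEVEL_2M && *level == PG_LEVEL_1G)
          return;         /* hardware splinters 1G into 2MB TLB entries */

  *level = min_t(size_t, *level, rmp_level);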

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/mmu/mmu.c             |  5 ++++
 arch/x86/kvm/svm/sev.c             | 46 ++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c             |  1 +
 arch/x86/kvm/svm/svm.h             |  1 +
 6 files changed, 56 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 36a9c23a4b27..371756c7f8f4 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -123,6 +123,7 @@ KVM_X86_OP_NULL(migrate_timers)
 KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP_NULL(complete_emulated_msr)
 KVM_X86_OP(alloc_apic_backing_page)
+KVM_X86_OP_NULL(rmp_page_level_adjust)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_NULL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5ad6255ff5d5..109e80167f11 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1453,7 +1453,9 @@ struct kvm_x86_ops {
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
+
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
+	void (*rmp_page_level_adjust)(struct kvm *kvm, kvm_pfn_t pfn, int *level);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 66f7f5bc3482..f9aaf6e1e51e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -43,6 +43,7 @@
 #include <linux/hash.h>
 #include <linux/kern_levels.h>
 #include <linux/kthread.h>
+#include <linux/sev.h>
 
 #include <asm/page.h>
 #include <asm/memtype.h>
@@ -2818,6 +2819,10 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 	if (unlikely(!pte))
 		return PG_LEVEL_4K;
 
+	/* Adjust the page level based on the SEV-SNP RMP page level. */
+	if (kvm_x86_ops.rmp_page_level_adjust)
+		static_call(kvm_x86_rmp_page_level_adjust)(kvm, pfn, &level);
+
 	return level;
 }
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 248096a5c307..2ad186d7e7b0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3231,3 +3231,49 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
 
 	return pfn_to_page(pfn);
 }
+
+static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+{
+	int level;
+
+	while (end > start) {
+		if (snp_lookup_rmpentry(start, &level) != 0)
+			return false;
+		start++;
+	}
+
+	return true;
+}
+
+void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level)
+{
+	int rmp_level, assigned;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return;
+
+	assigned = snp_lookup_rmpentry(pfn, &rmp_level);
+	if (unlikely(assigned < 0))
+		return;
+
+	if (!assigned) {
+		/*
+		 * If all the pages are shared then no need to keep the RMP
+		 * and NPT in sync.
+		 */
+		pfn = pfn & ~(PTRS_PER_PMD - 1);
+		if (is_pfn_range_shared(pfn, pfn + PTRS_PER_PMD))
+			return;
+	}
+
+	/*
+	 * The hardware installs 2MB TLB entries to access 1GB pages,
+	 * therefore allow the NPT to use 1GB pages when the pfn was added
+	 * as 2MB in the RMP table.
+	 */
+	if (rmp_level == PG_LEVEL_2M && (*level == PG_LEVEL_1G))
+		return;
+
+	/* Adjust the level to keep the NPT and RMP in sync */
+	*level = min_t(size_t, *level, rmp_level);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 058eea8353c9..0c8510ad63f1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4679,6 +4679,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+	.rmp_page_level_adjust = sev_rmp_page_level_adjust,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 85417c44812d..27c0c7b265b8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -589,6 +589,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level);
 
 /* vmenter.S */
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 29/45] KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (27 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 28/45] KVM: X86: Keep the NPT and RMP page level in sync Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-08-20 15:59 ` [PATCH Part2 v5 30/45] KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP Brijesh Singh
                   ` (16 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Sean Christopherson,
	Isaku Yamahata, Brijesh Singh

From: Sean Christopherson <sean.j.christopherson@intel.com>

When adding pages prior to boot, TDX will need the resulting host pfn so
that it can be passed to TDADDPAGE (TDX-SEAM always works with physical
addresses as it has its own page tables).  Start plumbing pfn back up
the page fault stack.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/mmu/mmu.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f9aaf6e1e51e..5cbcbedcaaa6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3818,7 +3818,8 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 }
 
 static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-			     bool prefault, int max_level, bool is_tdp)
+			     bool prefault, int max_level, bool is_tdp,
+			     kvm_pfn_t *pfn)
 {
 	bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
 	bool write = error_code & PFERR_WRITE_MASK;
@@ -3826,7 +3827,6 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	unsigned long mmu_seq;
-	kvm_pfn_t pfn;
 	hva_t hva;
 	int r;
 
@@ -3846,11 +3846,11 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
-	if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, &hva,
+	if (try_async_pf(vcpu, prefault, gfn, gpa, pfn, &hva,
 			 write, &map_writable))
 		return RET_PF_RETRY;
 
-	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
+	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, *pfn, ACC_ALL, &r))
 		return r;
 
 	r = RET_PF_RETRY;
@@ -3860,7 +3860,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	else
 		write_lock(&vcpu->kvm->mmu_lock);
 
-	if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
+	if (!is_noslot_pfn(*pfn) &&
+	    mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
 		goto out_unlock;
 	r = make_mmu_pages_available(vcpu);
 	if (r)
@@ -3868,9 +3869,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 
 	if (is_tdp_mmu_fault)
 		r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, max_level,
-				    pfn, prefault);
+				    *pfn, prefault);
 	else
-		r = __direct_map(vcpu, gpa, error_code, map_writable, max_level, pfn,
+		r = __direct_map(vcpu, gpa, error_code, map_writable, max_level, *pfn,
 				 prefault, is_tdp);
 
 out_unlock:
@@ -3878,18 +3879,20 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		read_unlock(&vcpu->kvm->mmu_lock);
 	else
 		write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(pfn);
+	kvm_release_pfn_clean(*pfn);
 	return r;
 }
 
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa,
 				u32 error_code, bool prefault)
 {
+	kvm_pfn_t pfn;
+
 	pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code);
 
 	/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
 	return direct_page_fault(vcpu, gpa & PAGE_MASK, error_code, prefault,
-				 PG_LEVEL_2M, false);
+				 PG_LEVEL_2M, false, &pfn);
 }
 
 int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
@@ -3928,6 +3931,7 @@ EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		       bool prefault)
 {
+	kvm_pfn_t pfn;
 	int max_level;
 
 	for (max_level = KVM_MAX_HUGEPAGE_LEVEL;
@@ -3941,7 +3945,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	}
 
 	return direct_page_fault(vcpu, gpa, error_code, prefault,
-				 max_level, true);
+				 max_level, true, &pfn);
 }
 
 static void nonpaging_init_context(struct kvm_mmu *context)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 30/45] KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (28 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 29/45] KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault() Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-08-20 15:59 ` [PATCH Part2 v5 31/45] KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use Brijesh Singh
                   ` (15 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Sean Christopherson,
	Isaku Yamahata, Brijesh Singh

From: Sean Christopherson <sean.j.christopherson@intel.com>

Introduce a helper to directly (pun intended) fault-in a TDP page
without having to go through the full page fault path.  This allows
TDX to get the resulting pfn and also allows the RET_PF_* enums to
stay in mmu.c where they belong.
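
A hypothetical caller (the actual call sites arrive in later patches)
would pre-fault a GPA and consume the resulting pfn as below; the
error-code choice here is illustrative:

  kvm_pfn_t pfn;

  /* Fault the GPA into the TDP, allowing up to a 2M mapping. */
  pfn = kvm_mmu_map_tdp_page(vcpu, gpa, PFERR_WRITE_MASK, PG_LEVEL_2M);
  if (is_error_noslot_pfn(pfn))
          return -EINVAL;         /* the fault could not be fixed */

  /* pfn is now valid and can be used to program the RMP entry. */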

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/mmu.h     |  3 +++
 arch/x86/kvm/mmu/mmu.c | 25 +++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 83e6c6965f1e..af063188d073 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -127,6 +127,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	return vcpu->arch.mmu->page_fault(vcpu, cr2_or_gpa, err, prefault);
 }
 
+kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
+			       u32 error_code, int max_level);
+
 /*
  * Currently, we have two sorts of write-protection, a) the first one
  * write-protects guest page to sync the guest modification, b) another one is
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5cbcbedcaaa6..a21e64ec048b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3948,6 +3948,31 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 				 max_level, true, &pfn);
 }
 
+kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
+			       u32 error_code, int max_level)
+{
+	kvm_pfn_t pfn;
+	int r;
+
+	if (mmu_topup_memory_caches(vcpu, false))
+		return KVM_PFN_ERR_FAULT;
+
+	/*
+	 * Loop on the page fault path to handle the case where an mmu_notifier
+	 * invalidation triggers RET_PF_RETRY.  In the normal page fault path,
+	 * KVM needs to resume the guest in case the invalidation changed any
+	 * of the page fault properties, i.e. the gpa or error code.  For this
+	 * path, the gpa and error code are fixed by the caller, and the caller
+	 * expects failure if and only if the page fault can't be fixed.
+	 */
+	do {
+		r = direct_page_fault(vcpu, gpa, error_code, false, max_level,
+				      true, &pfn);
+	} while (r == RET_PF_RETRY && !is_error_noslot_pfn(pfn));
+	return pfn;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_map_tdp_page);
+
 static void nonpaging_init_context(struct kvm_mmu *context)
 {
 	context->page_fault = nonpaging_page_fault;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 31/45] KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (29 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 30/45] KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-08-20 15:59 ` [PATCH Part2 v5 32/45] KVM: x86: Define RMP page fault error bits for #NPF Brijesh Singh
                   ` (14 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

An SEV-SNP VM may call the page state change VMGEXIT to add a GPA as
private or shared in the RMP table. The page state change VMGEXIT
contains the RMP page level to be used in the RMP entry. If the page
levels in the TDP and the RMP do not match, it will result in a
nested page fault (RMP violation).

The SEV-SNP VMGEXIT handler will use kvm_mmu_get_tdp_walk() to get the
current page level in the TDP for the given GPA and calculate a
workable page level. If a GPA is mapped as a 4K page in the TDP, but
the guest requested to add the GPA as a 2M page in the RMP entry, the
2M request will be broken into 4K pages to keep the RMP and TDP
page levels in sync.
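
A hypothetical use in the page state change handler (names are
placeholders) might look like:

  kvm_pfn_t pfn;
  int tdp_level;
  int req_level = PG_LEVEL_2M;    /* guest asked for a 2M RMP entry */

  if (!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &tdp_level))
          return -ENOENT;         /* GPA is not mapped in the TDP yet */

  /* Never program the RMP at a larger level than the TDP mapping. */
  req_level = min_t(int, req_level, tdp_level);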

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/mmu.h     |  2 ++
 arch/x86/kvm/mmu/mmu.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index af063188d073..7c4fac53183d 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -117,6 +117,8 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		       bool prefault);
 
+bool kvm_mmu_get_tdp_walk(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t *pfn, int *level);
+
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 					u32 err, bool prefault)
 {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a21e64ec048b..e660d832e235 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3973,6 +3973,35 @@ kvm_pfn_t kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa,
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_map_tdp_page);
 
+bool kvm_mmu_get_tdp_walk(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t *pfn, int *level)
+{
+	u64 sptes[PT64_ROOT_MAX_LEVEL + 1];
+	int leaf, root;
+
+	if (is_tdp_mmu(vcpu->arch.mmu))
+		leaf = kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, &root);
+	else
+		leaf = get_walk(vcpu, gpa, sptes, &root);
+
+	if (unlikely(leaf < 0))
+		return false;
+
+	/* Check if the leaf SPTE is present */
+	if (!is_shadow_present_pte(sptes[leaf]))
+		return false;
+
+	*pfn = spte_to_pfn(sptes[leaf]);
+	if (leaf > PG_LEVEL_4K) {
+		u64 page_mask = KVM_PAGES_PER_HPAGE(leaf) - KVM_PAGES_PER_HPAGE(leaf - 1);
+		*pfn |= (gpa_to_gfn(gpa) & page_mask);
+	}
+
+	*level = leaf;
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_get_tdp_walk);
+
 static void nonpaging_init_context(struct kvm_mmu *context)
 {
 	context->page_fault = nonpaging_page_fault;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 32/45] KVM: x86: Define RMP page fault error bits for #NPF
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (30 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 31/45] KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-09-30 23:41   ` Marc Orr
  2021-08-20 15:59 ` [PATCH Part2 v5 33/45] KVM: x86: Update page-fault trace to log full 64-bit error code Brijesh Singh
                   ` (13 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

When SEV-SNP is enabled globally, the hardware places restrictions on all
memory accesses based on the RMP entry, whether the hypervisor or a VM
performs the access. When hardware encounters an RMP access violation
during a guest access, it will cause a #VMEXIT(NPF).

See APM2 section 16.36.10 for more details.
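
For illustration, a #NPF handler can decode an RMP violation from the
64-bit error code as below (a sketch, not code from this patch):

  /* On #NPF VM-exits, exit_info_1 holds the page fault error code. */
  u64 error_code = svm->vmcb->control.exit_info_1;

  if (error_code & PFERR_GUEST_RMP_MASK) {
          /* SIZEM indicates a page-size mismatch RMP violation. */
          if (error_code & PFERR_GUEST_SIZEM_MASK)
                  pr_debug("RMP #NPF: page-size mismatch\n");
          else
                  pr_debug("RMP #NPF: access violation\n");
  }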

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 109e80167f11..a6e764458f3e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -239,8 +239,12 @@ enum x86_intercept_stage;
 #define PFERR_FETCH_BIT 4
 #define PFERR_PK_BIT 5
 #define PFERR_SGX_BIT 15
+#define PFERR_GUEST_RMP_BIT 31
 #define PFERR_GUEST_FINAL_BIT 32
 #define PFERR_GUEST_PAGE_BIT 33
+#define PFERR_GUEST_ENC_BIT 34
+#define PFERR_GUEST_SIZEM_BIT 35
+#define PFERR_GUEST_VMPL_BIT 36
 
 #define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
 #define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
@@ -251,6 +255,10 @@ enum x86_intercept_stage;
 #define PFERR_SGX_MASK (1U << PFERR_SGX_BIT)
 #define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
 #define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)
+#define PFERR_GUEST_RMP_MASK (1ULL << PFERR_GUEST_RMP_BIT)
+#define PFERR_GUEST_ENC_MASK (1ULL << PFERR_GUEST_ENC_BIT)
+#define PFERR_GUEST_SIZEM_MASK (1ULL << PFERR_GUEST_SIZEM_BIT)
+#define PFERR_GUEST_VMPL_MASK (1ULL << PFERR_GUEST_VMPL_BIT)
 
 #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
 				 PFERR_WRITE_MASK |		\
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 33/45] KVM: x86: Update page-fault trace to log full 64-bit error code
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (31 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 32/45] KVM: x86: Define RMP page fault error bits for #NPF Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-10-13 21:23   ` Sean Christopherson
  2021-08-20 15:59 ` [PATCH Part2 v5 34/45] KVM: SVM: Do not use long-lived GHCB map while setting scratch area Brijesh Singh
                   ` (12 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh, stable

The #NPF error code is a 64-bit value but the trace prints only the
lower 32 bits. Some of the fault error code bits (e.g.
PFERR_GUEST_FINAL_MASK) are available only in the upper 32 bits.

Cc: <stable@kernel.org>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/trace.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b484141ea15b..1c360e07856f 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -365,12 +365,12 @@ TRACE_EVENT(kvm_inj_exception,
  * Tracepoint for page fault.
  */
 TRACE_EVENT(kvm_page_fault,
-	TP_PROTO(unsigned long fault_address, unsigned int error_code),
+	TP_PROTO(unsigned long fault_address, u64 error_code),
 	TP_ARGS(fault_address, error_code),
 
 	TP_STRUCT__entry(
 		__field(	unsigned long,	fault_address	)
-		__field(	unsigned int,	error_code	)
+		__field(	u64,		error_code	)
 	),
 
 	TP_fast_assign(
@@ -378,7 +378,7 @@ TRACE_EVENT(kvm_page_fault,
 		__entry->error_code	= error_code;
 	),
 
-	TP_printk("address %lx error_code %x",
+	TP_printk("address %lx error_code %llx",
 		  __entry->fault_address, __entry->error_code)
 );
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 34/45] KVM: SVM: Do not use long-lived GHCB map while setting scratch area
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (32 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 33/45] KVM: x86: Update page-fault trace to log full 64-bit error code Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-10-13 21:20   ` Sean Christopherson
  2021-08-20 15:59 ` [PATCH Part2 v5 35/45] KVM: SVM: Remove the long-lived GHCB host map Brijesh Singh
                   ` (11 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

The setup_vmgexit_scratch() function may rely on a long-lived GHCB
mapping if the GHCB shared buffer area was used for the scratch area.
In preparation for eliminating the long-lived GHCB mapping, always
allocate a buffer for the scratch area so it can be accessed without
the GHCB mapping.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 70 +++++++++++++++++++-----------------------
 arch/x86/kvm/svm/svm.h |  3 +-
 2 files changed, 34 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2ad186d7e7b0..7dfb68e06334 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2490,8 +2490,7 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_page(virt_to_page(svm->vmsa));
 
 skip_vmsa_free:
-	if (svm->ghcb_sa_free)
-		kfree(svm->ghcb_sa);
+	kfree(svm->ghcb_sa);
 }
 
 static void dump_ghcb(struct vcpu_svm *svm)
@@ -2579,6 +2578,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
 	control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
 	control->exit_info_2 = ghcb_get_sw_exit_info_2(ghcb);
 
+	/* Copy the GHCB scratch area GPA */
+	svm->ghcb_sa_gpa = ghcb_get_sw_scratch(ghcb);
+
 	/* Clear the valid entries fields */
 	memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
 }
@@ -2714,22 +2716,12 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
 	if (!svm->ghcb)
 		return;
 
-	if (svm->ghcb_sa_free) {
-		/*
-		 * The scratch area lives outside the GHCB, so there is a
-		 * buffer that, depending on the operation performed, may
-		 * need to be synced, then freed.
-		 */
-		if (svm->ghcb_sa_sync) {
-			kvm_write_guest(svm->vcpu.kvm,
-					ghcb_get_sw_scratch(svm->ghcb),
-					svm->ghcb_sa, svm->ghcb_sa_len);
-			svm->ghcb_sa_sync = false;
-		}
-
-		kfree(svm->ghcb_sa);
-		svm->ghcb_sa = NULL;
-		svm->ghcb_sa_free = false;
+	/* Sync the scratch buffer area. */
+	if (svm->ghcb_sa_sync) {
+		kvm_write_guest(svm->vcpu.kvm,
+				ghcb_get_sw_scratch(svm->ghcb),
+				svm->ghcb_sa, svm->ghcb_sa_len);
+		svm->ghcb_sa_sync = false;
 	}
 
 	trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->ghcb);
@@ -2767,12 +2759,11 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
 static bool setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
-	struct ghcb *ghcb = svm->ghcb;
 	u64 ghcb_scratch_beg, ghcb_scratch_end;
 	u64 scratch_gpa_beg, scratch_gpa_end;
 	void *scratch_va;
 
-	scratch_gpa_beg = ghcb_get_sw_scratch(ghcb);
+	scratch_gpa_beg = svm->ghcb_sa_gpa;
 	if (!scratch_gpa_beg) {
 		pr_err("vmgexit: scratch gpa not provided\n");
 		return false;
@@ -2802,9 +2793,6 @@ static bool setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
 			       scratch_gpa_beg, scratch_gpa_end);
 			return false;
 		}
-
-		scratch_va = (void *)svm->ghcb;
-		scratch_va += (scratch_gpa_beg - control->ghcb_gpa);
 	} else {
 		/*
 		 * The guest memory must be read into a kernel buffer, so
@@ -2815,29 +2803,35 @@ static bool setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
 			       len, GHCB_SCRATCH_AREA_LIMIT);
 			return false;
 		}
+	}
+
+	if (svm->ghcb_sa_alloc_len < len) {
 		scratch_va = kzalloc(len, GFP_KERNEL_ACCOUNT);
 		if (!scratch_va)
 			return false;
 
-		if (kvm_read_guest(svm->vcpu.kvm, scratch_gpa_beg, scratch_va, len)) {
-			/* Unable to copy scratch area from guest */
-			pr_err("vmgexit: kvm_read_guest for scratch area failed\n");
-
-			kfree(scratch_va);
-			return false;
-		}
-
 		/*
-		 * The scratch area is outside the GHCB. The operation will
-		 * dictate whether the buffer needs to be synced before running
-		 * the vCPU next time (i.e. a read was requested so the data
-		 * must be written back to the guest memory).
+		 * Free the old scratch area and switch to using newly
+		 * allocated.
 		 */
-		svm->ghcb_sa_sync = sync;
-		svm->ghcb_sa_free = true;
+		kfree(svm->ghcb_sa);
+
+		svm->ghcb_sa_alloc_len = len;
+		svm->ghcb_sa = scratch_va;
 	}
 
-	svm->ghcb_sa = scratch_va;
+	if (kvm_read_guest(svm->vcpu.kvm, scratch_gpa_beg, svm->ghcb_sa, len)) {
+		/* Unable to copy scratch area from guest */
+		pr_err("vmgexit: kvm_read_guest for scratch area failed\n");
+		return false;
+	}
+
+	/*
+	 * The operation will dictate whether the buffer needs to be synced
+	 * before running the vCPU next time (i.e. a read was requested so
+	 * the data must be written back to the guest memory).
+	 */
+	svm->ghcb_sa_sync = sync;
 	svm->ghcb_sa_len = len;
 
 	return true;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 27c0c7b265b8..85c852bb548a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -199,8 +199,9 @@ struct vcpu_svm {
 	/* SEV-ES scratch area support */
 	void *ghcb_sa;
 	u64 ghcb_sa_len;
+	u64 ghcb_sa_gpa;
+	u32 ghcb_sa_alloc_len;
 	bool ghcb_sa_sync;
-	bool ghcb_sa_free;
 
 	bool guest_state_loaded;
 };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 35/45] KVM: SVM: Remove the long-lived GHCB host map
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (33 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 34/45] KVM: SVM: Do not use long-lived GHCB map while setting scratch area Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-08-20 15:59 ` [PATCH Part2 v5 36/45] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT Brijesh Singh
                   ` (10 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

On VMGEXIT, sev_handle_vmgexit() creates a host mapping for the GHCB GPA,
and unmaps it just before VM-entry. This long-lived GHCB map is used by
the VMGEXIT handler through accessors such as ghcb_{set_get}_xxx().

A long-lived GHCB map can cause issues when SEV-SNP is enabled, because
the mapped GPA then needs to be protected against a page state change.

To eliminate the long-lived GHCB mapping, update the GHCB sync operations
to explicitly map the GHCB before access and unmap it after access is
complete. This requires that the setting of the GHCB's sw_exit_info_{1,2}
fields be done during sev_es_sync_to_ghcb(), so create two new fields in
the vcpu_svm struct to hold these values when required to be set outside
of the GHCB mapping.
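
The resulting access pattern is a map/use/unmap bracket; every GHCB
access after this patch follows this shape (a recap sketch built on the
helpers introduced below):

  struct kvm_host_map map;
  struct ghcb *ghcb;

  if (svm_map_ghcb(svm, &map))
          return -EFAULT;         /* guest GHCB GPA was not mappable */

  ghcb = map.hva;
  /* ... read or write GHCB fields while the mapping is held ... */

  svm_unmap_ghcb(svm, &map);      /* drop the mapping before VM-entry */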

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 129 ++++++++++++++++++++++++++---------------
 arch/x86/kvm/svm/svm.c |  12 ++--
 arch/x86/kvm/svm/svm.h |  24 +++++++-
 3 files changed, 111 insertions(+), 54 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7dfb68e06334..c41d972dadc3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2493,15 +2493,40 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
 	kfree(svm->ghcb_sa);
 }
 
+static inline int svm_map_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
+{
+	struct vmcb_control_area *control = &svm->vmcb->control;
+	u64 gfn = gpa_to_gfn(control->ghcb_gpa);
+
+	if (kvm_vcpu_map(&svm->vcpu, gfn, map)) {
+		/* Unable to map GHCB from guest */
+		pr_err("error mapping GHCB GFN [%#llx] from guest\n", gfn);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static inline void svm_unmap_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
+{
+	kvm_vcpu_unmap(&svm->vcpu, map, true);
+}
+
 static void dump_ghcb(struct vcpu_svm *svm)
 {
-	struct ghcb *ghcb = svm->ghcb;
+	struct kvm_host_map map;
 	unsigned int nbits;
+	struct ghcb *ghcb;
+
+	if (svm_map_ghcb(svm, &map))
+		return;
+
+	ghcb = map.hva;
 
 	/* Re-use the dump_invalid_vmcb module parameter */
 	if (!dump_invalid_vmcb) {
 		pr_warn_ratelimited("set kvm_amd.dump_invalid_vmcb=1 to dump internal KVM state.\n");
-		return;
+		goto e_unmap;
 	}
 
 	nbits = sizeof(ghcb->save.valid_bitmap) * 8;
@@ -2516,12 +2541,21 @@ static void dump_ghcb(struct vcpu_svm *svm)
 	pr_err("%-20s%016llx is_valid: %u\n", "sw_scratch",
 	       ghcb->save.sw_scratch, ghcb_sw_scratch_is_valid(ghcb));
 	pr_err("%-20s%*pb\n", "valid_bitmap", nbits, ghcb->save.valid_bitmap);
+
+e_unmap:
+	svm_unmap_ghcb(svm, &map);
 }
 
-static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
+static bool sev_es_sync_to_ghcb(struct vcpu_svm *svm)
 {
 	struct kvm_vcpu *vcpu = &svm->vcpu;
-	struct ghcb *ghcb = svm->ghcb;
+	struct kvm_host_map map;
+	struct ghcb *ghcb;
+
+	if (svm_map_ghcb(svm, &map))
+		return false;
+
+	ghcb = map.hva;
 
 	/*
 	 * The GHCB protocol so far allows for the following data
@@ -2535,13 +2569,24 @@ static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
 	ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
 	ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
 	ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);
+
+	/*
+	 * Copy the return values from the exit_info_{1,2}.
+	 */
+	ghcb_set_sw_exit_info_1(ghcb, svm->ghcb_sw_exit_info_1);
+	ghcb_set_sw_exit_info_2(ghcb, svm->ghcb_sw_exit_info_2);
+
+	trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, ghcb);
+
+	svm_unmap_ghcb(svm, &map);
+
+	return true;
 }
 
-static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
+static void sev_es_sync_from_ghcb(struct vcpu_svm *svm, struct ghcb *ghcb)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
 	struct kvm_vcpu *vcpu = &svm->vcpu;
-	struct ghcb *ghcb = svm->ghcb;
 	u64 exit_code;
 
 	/*
@@ -2585,13 +2630,18 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
 	memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
 }
 
-static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
+static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
 {
-	struct kvm_vcpu *vcpu;
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm_host_map map;
 	struct ghcb *ghcb;
-	u64 exit_code = 0;
 
-	ghcb = svm->ghcb;
+	if (svm_map_ghcb(svm, &map))
+		return -EFAULT;
+
+	ghcb = map.hva;
+
+	trace_kvm_vmgexit_enter(vcpu->vcpu_id, ghcb);
 
 	/* Only GHCB Usage code 0 is supported */
 	if (ghcb->ghcb_usage)
@@ -2601,7 +2651,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	 * Retrieve the exit code now even though it may not be marked valid
 	 * as it could help with debugging.
 	 */
-	exit_code = ghcb_get_sw_exit_code(ghcb);
+	*exit_code = ghcb_get_sw_exit_code(ghcb);
 
 	if (!ghcb_sw_exit_code_is_valid(ghcb) ||
 	    !ghcb_sw_exit_info_1_is_valid(ghcb) ||
@@ -2685,6 +2735,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 		goto vmgexit_err;
 	}
 
+	sev_es_sync_from_ghcb(svm, ghcb);
+
+	svm_unmap_ghcb(svm, &map);
 	return 0;
 
 vmgexit_err:
@@ -2695,16 +2748,17 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 			    ghcb->ghcb_usage);
 	} else {
 		vcpu_unimpl(vcpu, "vmgexit: exit reason %#llx is not valid\n",
-			    exit_code);
+			    *exit_code);
 		dump_ghcb(svm);
 	}
 
 	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
 	vcpu->run->internal.ndata = 2;
-	vcpu->run->internal.data[0] = exit_code;
+	vcpu->run->internal.data[0] = *exit_code;
 	vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
 
+	svm_unmap_ghcb(svm, &map);
 	return -EINVAL;
 }
 
@@ -2713,23 +2767,20 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
 	/* Clear any indication that the vCPU is in a type of AP Reset Hold */
 	svm->ap_reset_hold_type = AP_RESET_HOLD_NONE;
 
-	if (!svm->ghcb)
+	if (!svm->ghcb_in_use)
 		return;
 
 	 /* Sync the scratch buffer area. */
 	if (svm->ghcb_sa_sync) {
 		kvm_write_guest(svm->vcpu.kvm,
-				ghcb_get_sw_scratch(svm->ghcb),
+				svm->ghcb_sa_gpa,
 				svm->ghcb_sa, svm->ghcb_sa_len);
 		svm->ghcb_sa_sync = false;
 	}
 
-	trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->ghcb);
-
 	sev_es_sync_to_ghcb(svm);
 
-	kvm_vcpu_unmap(&svm->vcpu, &svm->ghcb_map, true);
-	svm->ghcb = NULL;
+	svm->ghcb_in_use = false;
 }
 
 void pre_sev_run(struct vcpu_svm *svm, int cpu)
@@ -2961,7 +3012,6 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb_control_area *control = &svm->vmcb->control;
 	u64 ghcb_gpa, exit_code;
-	struct ghcb *ghcb;
 	int ret;
 
 	/* Validate the GHCB */
@@ -2974,27 +3024,14 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		return -EINVAL;
 	}
 
-	if (kvm_vcpu_map(vcpu, ghcb_gpa >> PAGE_SHIFT, &svm->ghcb_map)) {
-		/* Unable to map GHCB from guest */
-		vcpu_unimpl(vcpu, "vmgexit: error mapping GHCB [%#llx] from guest\n",
-			    ghcb_gpa);
-		return -EINVAL;
-	}
-
-	svm->ghcb = svm->ghcb_map.hva;
-	ghcb = svm->ghcb_map.hva;
-
-	trace_kvm_vmgexit_enter(vcpu->vcpu_id, ghcb);
-
-	exit_code = ghcb_get_sw_exit_code(ghcb);
-
-	ret = sev_es_validate_vmgexit(svm);
+	ret = sev_es_validate_vmgexit(svm, &exit_code);
 	if (ret)
 		return ret;
 
-	sev_es_sync_from_ghcb(svm);
-	ghcb_set_sw_exit_info_1(ghcb, 0);
-	ghcb_set_sw_exit_info_2(ghcb, 0);
+	svm->ghcb_in_use = true;
+
+	svm_set_ghcb_sw_exit_info_1(vcpu, 0);
+	svm_set_ghcb_sw_exit_info_2(vcpu, 0);
 
 	ret = -EINVAL;
 	switch (exit_code) {
@@ -3033,23 +3070,23 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 			break;
 		case 1:
 			/* Get AP jump table address */
-			ghcb_set_sw_exit_info_2(ghcb, sev->ap_jump_table);
+			svm_set_ghcb_sw_exit_info_2(vcpu, sev->ap_jump_table);
 			break;
 		default:
 			pr_err("svm: vmgexit: unsupported AP jump table request - exit_info_1=%#llx\n",
 			       control->exit_info_1);
-			ghcb_set_sw_exit_info_1(ghcb, 1);
-			ghcb_set_sw_exit_info_2(ghcb,
-						X86_TRAP_UD |
-						SVM_EVTINJ_TYPE_EXEPT |
-						SVM_EVTINJ_VALID);
+			svm_set_ghcb_sw_exit_info_1(vcpu, 1);
+			svm_set_ghcb_sw_exit_info_2(vcpu,
+						    X86_TRAP_UD |
+						    SVM_EVTINJ_TYPE_EXEPT |
+						    SVM_EVTINJ_VALID);
 		}
 
 		ret = 1;
 		break;
 	}
 	case SVM_VMGEXIT_HV_FEATURES: {
-		ghcb_set_sw_exit_info_2(ghcb, GHCB_HV_FT_SUPPORTED);
+		svm_set_ghcb_sw_exit_info_2(vcpu, GHCB_HV_FT_SUPPORTED);
 
 		ret = 1;
 		break;
@@ -3171,7 +3208,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 		 * Return from an AP Reset Hold VMGEXIT, where the guest will
 		 * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
 		 */
-		ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+		svm_set_ghcb_sw_exit_info_2(vcpu, 1);
 		break;
 	case AP_RESET_HOLD_MSR_PROTO:
 		/*
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0c8510ad63f1..5f73f21a37a1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2786,14 +2786,14 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
-	if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->ghcb))
+	if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->ghcb_in_use))
 		return kvm_complete_insn_gp(vcpu, err);
 
-	ghcb_set_sw_exit_info_1(svm->ghcb, 1);
-	ghcb_set_sw_exit_info_2(svm->ghcb,
-				X86_TRAP_GP |
-				SVM_EVTINJ_TYPE_EXEPT |
-				SVM_EVTINJ_VALID);
+	svm_set_ghcb_sw_exit_info_1(vcpu, 1);
+	svm_set_ghcb_sw_exit_info_2(vcpu,
+				    X86_TRAP_GP |
+				    SVM_EVTINJ_TYPE_EXEPT |
+				    SVM_EVTINJ_VALID);
 	return 1;
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 85c852bb548a..22c01d958898 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -191,8 +191,7 @@ struct vcpu_svm {
 
 	/* SEV-ES support */
 	struct sev_es_save_area *vmsa;
-	struct ghcb *ghcb;
-	struct kvm_host_map ghcb_map;
+	bool ghcb_in_use;
 	bool received_first_sipi;
 	unsigned int ap_reset_hold_type;
 
@@ -204,6 +203,13 @@ struct vcpu_svm {
 	bool ghcb_sa_sync;
 
 	bool guest_state_loaded;
+
+	/*
+	 * SEV-ES support to hold the sw_exit_info return values to be
+	 * sync'ed to the GHCB when mapped.
+	 */
+	u64 ghcb_sw_exit_info_1;
+	u64 ghcb_sw_exit_info_2;
 };
 
 struct svm_cpu_data {
@@ -503,6 +509,20 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
 void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
 void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
 
+static inline void svm_set_ghcb_sw_exit_info_1(struct kvm_vcpu *vcpu, u64 val)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->ghcb_sw_exit_info_1 = val;
+}
+
+static inline void svm_set_ghcb_sw_exit_info_2(struct kvm_vcpu *vcpu, u64 val)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	svm->ghcb_sw_exit_info_2 = val;
+}
+
 extern struct kvm_x86_nested_ops svm_nested_ops;
 
 /* avic.c */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 36/45] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (34 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 35/45] KVM: SVM: Remove the long-lived GHCB host map Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-08-20 15:59 ` [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Brijesh Singh
                   ` (9 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

SEV-SNP guests are required to perform a GHCB GPA registration. Before
using a GHCB GPA for a vCPU for the first time, a guest must register the
vCPU GHCB GPA. If the hypervisor can work with the guest-requested GPA,
it must respond with the same GPA; otherwise it returns -1.

On VMGEXIT, verify that the GHCB GPA matches the registered value. If a
mismatch is detected, abort the guest.
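
For reference, a small self-contained sketch of how the request and
response values are packed in the GHCB MSR protocol (GHCBInfo lives in
bits 11:0 and the GFN in bits 63:12; the 0x012/0x013 info codes follow
the GHCB specification; illustrative only, not kernel code):

  #include <stdint.h>
  #include <stdio.h>

  #define GHCB_MSR_INFO_MASK	0xfffULL	/* GHCBData[11:0] */
  #define GHCB_MSR_REG_GPA_REQ	0x012ULL
  #define GHCB_MSR_REG_GPA_RESP	0x013ULL
  #define GHCB_MSR_GPA_POS	12

  int main(void)
  {
  	uint64_t gfn = 0x12345;

  	/* Guest: request registration of its GHCB GFN. */
  	uint64_t req = GHCB_MSR_REG_GPA_REQ | (gfn << GHCB_MSR_GPA_POS);

  	/* Hypervisor: accept by echoing the same GFN in the response. */
  	uint64_t resp = GHCB_MSR_REG_GPA_RESP | (gfn << GHCB_MSR_GPA_POS);

  	printf("req=%#llx info=%#llx gfn=%#llx\n",
  	       (unsigned long long)req,
  	       (unsigned long long)(resp & GHCB_MSR_INFO_MASK),
  	       (unsigned long long)(resp >> GHCB_MSR_GPA_POS));
  	return 0;
  }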

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev-common.h |  8 ++++++++
 arch/x86/kvm/svm/sev.c            | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h            |  7 +++++++
 3 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 779c7e8f836c..91089967ab09 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
 #define GHCB_MSR_AP_RESET_HOLD_RESULT_POS	12
 #define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK	GENMASK_ULL(51, 0)
 
+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ		0x010
+#define GHCB_MSR_GPA_VALUE_POS		12
+#define GHCB_MSR_GPA_VALUE_MASK		GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP		0x011
+#define GHCB_MSR_PREF_GPA_NONE		0xfffffffffffff
+
 /* GHCB GPA Register */
 #define GHCB_MSR_REG_GPA_REQ		0x012
 #define GHCB_MSR_REG_GPA_REQ_VAL(v)			\
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c41d972dadc3..991b8c996fc1 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2984,6 +2984,27 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_PREF_GPA_REQ: {
+		set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+				  GHCB_MSR_GPA_VALUE_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	}
+	case GHCB_MSR_REG_GPA_REQ: {
+		u64 gfn;
+
+		gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+					GHCB_MSR_GPA_VALUE_POS);
+
+		svm->ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+		set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+				  GHCB_MSR_GPA_VALUE_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	}
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -3024,6 +3045,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		return -EINVAL;
 	}
 
+	/* An SEV-SNP guest requires that the GHCB GPA be registered */
+	if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+		vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+		return -EINVAL;
+	}
+
 	ret = sev_es_validate_vmgexit(svm, &exit_code);
 	if (ret)
 		return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 22c01d958898..d10f7166b39d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -210,6 +210,8 @@ struct vcpu_svm {
 	 */
 	u64 ghcb_sw_exit_info_1;
 	u64 ghcb_sw_exit_info_2;
+
+	u64 ghcb_registered_gpa;
 };
 
 struct svm_cpu_data {
@@ -266,6 +268,11 @@ static inline bool sev_snp_guest(struct kvm *kvm)
 	return sev_es_guest(kvm) && sev->snp_active;
 }
 
+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+	return svm->ghcb_registered_gpa == val;
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (35 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 36/45] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-09-28  9:56   ` Dr. David Alan Gilbert
  2021-10-12 21:48   ` Sean Christopherson
  2021-08-20 15:59 ` [PATCH Part2 v5 38/45] KVM: SVM: Add support to handle " Brijesh Singh
                   ` (8 subsequent siblings)
  45 siblings, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.

Before changing the page state in the RMP entry, look up the page in the
NPT to make sure that there is a valid mapping for it. If the mapping
exists, try to find a workable page level between the NPT and RMP for
the page. If the page is not mapped in the NPT, then create a fault such
that it gets mapped before the page state in the RMP entry is changed.
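
To make the MSR request layout concrete, here is a small self-contained
sketch that decodes a PSC request using the GHCB_MSR_PSC_* fields added
below (illustrative only, not kernel code):

  #include <stdint.h>
  #include <stdio.h>

  #define GHCB_MSR_PSC_GFN_POS	12
  #define GHCB_MSR_PSC_GFN_MASK	0xffffffffffULL	/* GENMASK_ULL(39, 0) */
  #define GHCB_MSR_PSC_OP_POS	52
  #define GHCB_MSR_PSC_OP_MASK	0xfULL

  int main(void)
  {
  	/* Example request: operation 1 on gfn 0x12345 (made-up values). */
  	uint64_t msr = (1ULL << GHCB_MSR_PSC_OP_POS) |
  		       (0x12345ULL << GHCB_MSR_PSC_GFN_POS) | 0x014ULL;

  	uint64_t gfn = (msr >> GHCB_MSR_PSC_GFN_POS) & GHCB_MSR_PSC_GFN_MASK;
  	uint64_t op  = (msr >> GHCB_MSR_PSC_OP_POS) & GHCB_MSR_PSC_OP_MASK;

  	printf("op=%llu gfn=%#llx\n",
  	       (unsigned long long)op, (unsigned long long)gfn);
  	return 0;
  }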

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev-common.h |   9 ++
 arch/x86/kvm/svm/sev.c            | 197 ++++++++++++++++++++++++++++++
 arch/x86/kvm/trace.h              |  34 ++++++
 arch/x86/kvm/x86.c                |   1 +
 4 files changed, 241 insertions(+)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 91089967ab09..4980f77aa1d5 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -89,6 +89,10 @@ enum psc_op {
 };
 
 #define GHCB_MSR_PSC_REQ		0x014
+#define GHCB_MSR_PSC_GFN_POS		12
+#define GHCB_MSR_PSC_GFN_MASK		GENMASK_ULL(39, 0)
+#define GHCB_MSR_PSC_OP_POS		52
+#define GHCB_MSR_PSC_OP_MASK		0xf
 #define GHCB_MSR_PSC_REQ_GFN(gfn, op)			\
 	/* GHCBData[55:52] */				\
 	(((u64)((op) & 0xf) << 52) |			\
@@ -98,6 +102,11 @@ enum psc_op {
 	GHCB_MSR_PSC_REQ)
 
 #define GHCB_MSR_PSC_RESP		0x015
+#define GHCB_MSR_PSC_ERROR_POS		32
+#define GHCB_MSR_PSC_ERROR_MASK		GENMASK_ULL(31, 0)
+#define GHCB_MSR_PSC_ERROR		GENMASK_ULL(31, 0)
+#define GHCB_MSR_PSC_RSVD_POS		12
+#define GHCB_MSR_PSC_RSVD_MASK		GENMASK_ULL(19, 0)
 #define GHCB_MSR_PSC_RESP_VAL(val)			\
 	/* GHCBData[63:32] */				\
 	(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 991b8c996fc1..6d9483ec91ab 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -31,6 +31,7 @@
 #include "svm_ops.h"
 #include "cpuid.h"
 #include "trace.h"
+#include "mmu.h"
 
 #define __ex(x) __kvm_handle_fault_on_reboot(x)
 
@@ -2905,6 +2906,181 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
 	svm->vmcb->control.ghcb_gpa = value;
 }
 
+static int snp_rmptable_psmash(struct kvm *kvm, kvm_pfn_t pfn)
+{
+	pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+	return psmash(pfn);
+}
+
+static int snp_make_page_shared(struct kvm *kvm, gpa_t gpa, kvm_pfn_t pfn, int level)
+{
+	int rc, rmp_level;
+
+	rc = snp_lookup_rmpentry(pfn, &rmp_level);
+	if (rc < 0)
+		return -EINVAL;
+
+	/* If page is not assigned then do nothing */
+	if (!rc)
+		return 0;
+
+	/*
+	 * Is the page part of an existing 2MB RMP entry? Split the 2MB entry
+	 * into multiple 4K entries before making the memory shared.
+	 */
+	if (level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M) {
+		rc = snp_rmptable_psmash(kvm, pfn);
+		if (rc)
+			return rc;
+	}
+
+	return rmp_make_shared(pfn, level);
+}
+
+static int snp_check_and_build_npt(struct kvm_vcpu *vcpu, gpa_t gpa, int level)
+{
+	struct kvm *kvm = vcpu->kvm;
+	int rc, npt_level;
+	kvm_pfn_t pfn;
+
+	/*
+	 * Get the pfn and level for the gpa from the nested page table.
+	 *
+	 * If the tdp walk fails, then it's safe to say that there is no
+	 * valid mapping for this gpa. Create a fault to build the map.
+	 */
+	write_lock(&kvm->mmu_lock);
+	rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);
+	write_unlock(&kvm->mmu_lock);
+	if (!rc) {
+		pfn = kvm_mmu_map_tdp_page(vcpu, gpa, PFERR_USER_MASK, level);
+		if (is_error_noslot_pfn(pfn))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int snp_gpa_to_hva(struct kvm *kvm, gpa_t gpa, hva_t *hva)
+{
+	struct kvm_memory_slot *slot;
+	gfn_t gfn = gpa_to_gfn(gpa);
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		return -EINVAL;
+	}
+
+	/*
+	 * Note, using the __gfn_to_hva_memslot() is not solely for performance,
+	 * Note, using __gfn_to_hva_memslot() is not solely for performance;
+	 * which will always fail on read-only memslots due to gfn_to_hva() assuming
+	 * writes.
+	 */
+	*hva = __gfn_to_hva_memslot(slot, gfn);
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return 0;
+}
+
+static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op, gpa_t gpa,
+					  int level)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
+	struct kvm *kvm = vcpu->kvm;
+	int rc, npt_level;
+	kvm_pfn_t pfn;
+	gpa_t gpa_end;
+
+	gpa_end = gpa + page_level_size(level);
+
+	while (gpa < gpa_end) {
+		/*
+		 * If the gpa is not present in the NPT then build the NPT.
+		 */
+		rc = snp_check_and_build_npt(vcpu, gpa, level);
+		if (rc)
+			return -EINVAL;
+
+		if (op == SNP_PAGE_STATE_PRIVATE) {
+			hva_t hva;
+
+			if (snp_gpa_to_hva(kvm, gpa, &hva))
+				return -EINVAL;
+
+			/*
+			 * Verify that the hva range is registered. This enforcement is
+			 * required to avoid the cases where a page is marked private
+			 * in the RMP table but never gets cleaned up during the VM
+			 * termination path.
+			 */
+			mutex_lock(&kvm->lock);
+			rc = is_hva_registered(kvm, hva, page_level_size(level));
+			mutex_unlock(&kvm->lock);
+			if (!rc)
+				return -EINVAL;
+
+			/*
+			 * Mark the userspace range unmerable before adding the pages
+			 * Mark the userspace range unmergeable before adding the pages
+			 */
+			mmap_write_lock(kvm->mm);
+			rc = snp_mark_unmergable(kvm, hva, page_level_size(level));
+			mmap_write_unlock(kvm->mm);
+			if (rc)
+				return -EINVAL;
+		}
+
+		write_lock(&kvm->mmu_lock);
+
+		rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);
+		if (!rc) {
+			/*
+			 * This may happen if another vCPU unmapped the page
+			 * before we acquire the lock. Retry the PSC.
+			 */
+			write_unlock(&kvm->mmu_lock);
+			return 0;
+		}
+
+		/*
+		 * Adjust the level so that we don't go higher than the backing
+		 * page level.
+		 */
+		level = min_t(size_t, level, npt_level);
+
+		trace_kvm_snp_psc(vcpu->vcpu_id, pfn, gpa, op, level);
+
+		switch (op) {
+		case SNP_PAGE_STATE_SHARED:
+			rc = snp_make_page_shared(kvm, gpa, pfn, level);
+			break;
+		case SNP_PAGE_STATE_PRIVATE:
+			rc = rmp_make_private(pfn, gpa, level, sev->asid, false);
+			break;
+		default:
+			rc = -EINVAL;
+			break;
+		}
+
+		write_unlock(&kvm->mmu_lock);
+
+		if (rc) {
+			pr_err_ratelimited("Error op %d gpa %llx pfn %llx level %d rc %d\n",
+					   op, gpa, pfn, level, rc);
+			return rc;
+		}
+
+		gpa = gpa + page_level_size(level);
+	}
+
+	return 0;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3005,6 +3181,27 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_PSC_REQ: {
+		gfn_t gfn;
+		int ret;
+		enum psc_op op;
+
+		gfn = get_ghcb_msr_bits(svm, GHCB_MSR_PSC_GFN_MASK, GHCB_MSR_PSC_GFN_POS);
+		op = get_ghcb_msr_bits(svm, GHCB_MSR_PSC_OP_MASK, GHCB_MSR_PSC_OP_POS);
+
+		ret = __snp_handle_page_state_change(vcpu, op, gfn_to_gpa(gfn), PG_LEVEL_4K);
+
+		if (ret)
+			set_ghcb_msr_bits(svm, GHCB_MSR_PSC_ERROR,
+					  GHCB_MSR_PSC_ERROR_MASK, GHCB_MSR_PSC_ERROR_POS);
+		else
+			set_ghcb_msr_bits(svm, 0,
+					  GHCB_MSR_PSC_ERROR_MASK, GHCB_MSR_PSC_ERROR_POS);
+
+		set_ghcb_msr_bits(svm, 0, GHCB_MSR_PSC_RSVD_MASK, GHCB_MSR_PSC_RSVD_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_PSC_RESP, GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+		break;
+	}
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 1c360e07856f..35ca1cf8440a 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -7,6 +7,7 @@
 #include <asm/svm.h>
 #include <asm/clocksource.h>
 #include <asm/pvclock-abi.h>
+#include <asm/sev-common.h>
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvm
@@ -1711,6 +1712,39 @@ TRACE_EVENT(kvm_vmgexit_msr_protocol_exit,
 		  __entry->vcpu_id, __entry->ghcb_gpa, __entry->result)
 );
 
+/*
+ * Tracepoint for the SEV-SNP page state change processing
+ */
+#define psc_operation					\
+	{SNP_PAGE_STATE_PRIVATE, "private"},		\
+	{SNP_PAGE_STATE_SHARED,  "shared"}		\
+
+TRACE_EVENT(kvm_snp_psc,
+	TP_PROTO(unsigned int vcpu_id, u64 pfn, u64 gpa, u8 op, int level),
+	TP_ARGS(vcpu_id, pfn, gpa, op, level),
+
+	TP_STRUCT__entry(
+		__field(int, vcpu_id)
+		__field(u64, pfn)
+		__field(u64, gpa)
+		__field(u8, op)
+		__field(int, level)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu_id;
+		__entry->pfn = pfn;
+		__entry->gpa = gpa;
+		__entry->op = op;
+		__entry->level = level;
+	),
+
+	TP_printk("vcpu %u, pfn %llx, gpa %llx, op %s, level %d",
+		  __entry->vcpu_id, __entry->pfn, __entry->gpa,
+		  __print_symbolic(__entry->op, psc_operation),
+		  __entry->level)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5d5c5ed7dd4..afcdc75a99f2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12371,3 +12371,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_snp_psc);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 38/45] KVM: SVM: Add support to handle Page State Change VMGEXIT
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (36 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-09-28 10:17   ` Dr. David Alan Gilbert
  2021-08-20 15:59 ` [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap Brijesh Singh
                   ` (7 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification version 2.
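
For orientation, a simplified standalone sketch of how the handler below
walks the guest-supplied descriptor and reports progress back through
hdr.cur_entry (struct layouts abbreviated from sev-common.h; illustrative
only, not kernel code):

  #include <stdint.h>

  #define VMGEXIT_PSC_MAX_ENTRY	253
  #define PSC_INVALID_ENTRY	2

  /* Simplified stand-ins for the real sev-common.h layouts. */
  struct psc_entry { uint64_t gfn; int pagesize; int operation; };
  struct psc_hdr { uint16_t cur_entry; uint16_t end_entry; };
  struct snp_psc_desc {
  	struct psc_hdr hdr;
  	struct psc_entry entries[VMGEXIT_PSC_MAX_ENTRY];
  };

  static int walk_psc_desc(struct snp_psc_desc *desc)
  {
  	uint16_t cur = desc->hdr.cur_entry;
  	uint16_t end = desc->hdr.end_entry;

  	if (cur >= VMGEXIT_PSC_MAX_ENTRY ||
  	    end >= VMGEXIT_PSC_MAX_ENTRY || cur > end)
  		return PSC_INVALID_ENTRY;

  	for (; cur <= end; cur++) {
  		/* ... validate and apply desc->entries[cur] ... */
  	}

  	/* Report how far the walk got so the guest can resume. */
  	desc->hdr.cur_entry = cur;
  	return 0;
  }

  int main(void)
  {
  	struct snp_psc_desc d = { .hdr = { 0, 2 } };
  	return walk_psc_desc(&d);
  }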

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev-common.h |  7 +++
 arch/x86/kvm/svm/sev.c            | 82 +++++++++++++++++++++++++++++--
 2 files changed, 84 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 4980f77aa1d5..5ee30bb2cdb8 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -126,6 +126,13 @@ enum psc_op {
 /* SNP Page State Change NAE event */
 #define VMGEXIT_PSC_MAX_ENTRY		253
 
+/* The page state change hdr structure is not valid */
+#define PSC_INVALID_HDR			1
+/* The hdr.cur_entry or hdr.end_entry is not valid */
+#define PSC_INVALID_ENTRY		2
+/* Page state change encountered undefined error */
+#define PSC_UNDEF_ERR			3
+
 struct psc_hdr {
 	u16 cur_entry;
 	u16 end_entry;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6d9483ec91ab..0de85ed63e9b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2731,6 +2731,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 	case SVM_VMGEXIT_HV_FEATURES:
+	case SVM_VMGEXIT_PSC:
 		break;
 	default:
 		goto vmgexit_err;
@@ -3004,13 +3005,13 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
 		 */
 		rc = snp_check_and_build_npt(vcpu, gpa, level);
 		if (rc)
-			return -EINVAL;
+			return PSC_UNDEF_ERR;
 
 		if (op == SNP_PAGE_STATE_PRIVATE) {
 			hva_t hva;
 
 			if (snp_gpa_to_hva(kvm, gpa, &hva))
-				return -EINVAL;
+				return PSC_UNDEF_ERR;
 
 			/*
 			 * Verify that the hva range is registered. This enforcement is
@@ -3022,7 +3023,7 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
 			rc = is_hva_registered(kvm, hva, page_level_size(level));
 			mutex_unlock(&kvm->lock);
 			if (!rc)
-				return -EINVAL;
+				return PSC_UNDEF_ERR;
 
 			/*
 			 * Mark the userspace range unmergeable before adding the pages
@@ -3032,7 +3033,7 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
 			rc = snp_mark_unmergable(kvm, hva, page_level_size(level));
 			mmap_write_unlock(kvm->mm);
 			if (rc)
-				return -EINVAL;
+				return PSC_UNDEF_ERR;
 		}
 
 		write_lock(&kvm->mmu_lock);
@@ -3062,8 +3063,11 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
 		case SNP_PAGE_STATE_PRIVATE:
 			rc = rmp_make_private(pfn, gpa, level, sev->asid, false);
 			break;
+		case SNP_PAGE_STATE_PSMASH:
+		case SNP_PAGE_STATE_UNSMASH:
+			/* TODO: Add support to handle it */
 		default:
-			rc = -EINVAL;
+			rc = PSC_INVALID_ENTRY;
 			break;
 		}
 
@@ -3081,6 +3085,65 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
 	return 0;
 }
 
+static inline unsigned long map_to_psc_vmgexit_code(int rc)
+{
+	switch (rc) {
+	case PSC_INVALID_HDR:
+		return ((1ul << 32) | 1);
+	case PSC_INVALID_ENTRY:
+		return ((1ul << 32) | 2);
+	case RMPUPDATE_FAIL_OVERLAP:
+		return ((3ul << 32) | 2);
+	default: return (4ul << 32);
+	}
+}
+
+static unsigned long snp_handle_page_state_change(struct vcpu_svm *svm)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	int level, op, rc = PSC_UNDEF_ERR;
+	struct snp_psc_desc *info;
+	struct psc_entry *entry;
+	u16 cur, end;
+	gpa_t gpa;
+
+	if (!sev_snp_guest(vcpu->kvm))
+		return PSC_INVALID_HDR;
+
+	if (!setup_vmgexit_scratch(svm, true, sizeof(*info))) {
+		pr_err("vmgexit: scratch area is not set up.\n");
+		return PSC_INVALID_HDR;
+	}
+
+	info = (struct snp_psc_desc *)svm->ghcb_sa;
+	cur = info->hdr.cur_entry;
+	end = info->hdr.end_entry;
+
+	if (cur >= VMGEXIT_PSC_MAX_ENTRY ||
+	    end >= VMGEXIT_PSC_MAX_ENTRY || cur > end)
+		return PSC_INVALID_ENTRY;
+
+	for (; cur <= end; cur++) {
+		entry = &info->entries[cur];
+		gpa = gfn_to_gpa(entry->gfn);
+		level = RMP_TO_X86_PG_LEVEL(entry->pagesize);
+		op = entry->operation;
+
+		if (!IS_ALIGNED(gpa, page_level_size(level))) {
+			rc = PSC_INVALID_ENTRY;
+			goto out;
+		}
+
+		rc = __snp_handle_page_state_change(vcpu, op, gpa, level);
+		if (rc)
+			goto out;
+	}
+
+out:
+	info->hdr.cur_entry = cur;
+	return rc ? map_to_psc_vmgexit_code(rc) : 0;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3315,6 +3378,15 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	}
+	case SVM_VMGEXIT_PSC: {
+		unsigned long rc;
+
+		ret = 1;
+
+		rc = snp_handle_page_state_change(svm);
+		svm_set_ghcb_sw_exit_info_2(vcpu, rc);
+		break;
+	}
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (37 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 38/45] KVM: SVM: Add support to handle " Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-10-13  0:23   ` Sean Christopherson
  2021-08-20 15:59 ` [PATCH Part2 v5 40/45] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Brijesh Singh
                   ` (6 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

When SEV-SNP is enabled in the guest VM, the guest memory pages can
be either private or shared. A write from the hypervisor goes through
the RMP checks. If hardware sees that the hypervisor is attempting to
write to a guest private page, then it triggers an RMP violation #PF.

To avoid the RMP violation, add post_{map,unmap}_gfn() ops that can be
used to verify that it is safe to map a given guest page. Use SRCU to
protect against a page state change for existing mapped pages.
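
The synchronization scheme is plain SRCU: every host mapping of a guest
page is a reader, and the page state change path is the writer. A sketch
of the pattern using the kernel's SRCU primitives, as the patch below
does (fragment for illustration, not a complete function):

  /* Reader: taken in sev_post_map_gfn(), dropped in sev_post_unmap_gfn(). */
  token = srcu_read_lock(&sev->psc_srcu);
  /* ... the mapped pfn's state cannot change while the lock is held ... */
  srcu_read_unlock(&sev->psc_srcu, token);

  /* Writer: the PSC path waits for in-flight mappings to drain before
   * changing the RMP entry. */
  synchronize_srcu_expedited(&sev->psc_srcu);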

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 +
 arch/x86/include/asm/kvm_host.h    |  4 ++
 arch/x86/kvm/svm/sev.c             | 69 +++++++++++++++++++++-----
 arch/x86/kvm/svm/svm.c             |  4 ++
 arch/x86/kvm/svm/svm.h             |  8 +++
 arch/x86/kvm/x86.c                 | 78 +++++++++++++++++++++++++++---
 6 files changed, 146 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 371756c7f8f4..c09bd40e0160 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -124,6 +124,8 @@ KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP_NULL(complete_emulated_msr)
 KVM_X86_OP(alloc_apic_backing_page)
 KVM_X86_OP_NULL(rmp_page_level_adjust)
+KVM_X86_OP(post_map_gfn)
+KVM_X86_OP(post_unmap_gfn)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_NULL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a6e764458f3e..5ac1ff097e8c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1463,7 +1463,11 @@ struct kvm_x86_ops {
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
 
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
+
 	void (*rmp_page_level_adjust)(struct kvm *kvm, kvm_pfn_t pfn, int *level);
+
+	int (*post_map_gfn)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token);
+	void (*post_unmap_gfn)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int token);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0de85ed63e9b..65b578463271 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -336,6 +336,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		if (ret)
 			goto e_free;
 
+		init_srcu_struct(&sev->psc_srcu);
 		ret = sev_snp_init(&argp->error);
 	} else {
 		ret = sev_platform_init(&argp->error);
@@ -2293,6 +2294,7 @@ void sev_vm_destroy(struct kvm *kvm)
 			WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
 			return;
 		}
+		cleanup_srcu_struct(&sev->psc_srcu);
 	} else {
 		sev_unbind_asid(kvm, sev->handle);
 	}
@@ -2494,23 +2496,32 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
 	kfree(svm->ghcb_sa);
 }
 
-static inline int svm_map_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
+static inline int svm_map_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map, int *token)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
 	u64 gfn = gpa_to_gfn(control->ghcb_gpa);
+	struct kvm_vcpu *vcpu = &svm->vcpu;
 
-	if (kvm_vcpu_map(&svm->vcpu, gfn, map)) {
+	if (kvm_vcpu_map(vcpu, gfn, map)) {
 		/* Unable to map GHCB from guest */
 		pr_err("error mapping GHCB GFN [%#llx] from guest\n", gfn);
 		return -EFAULT;
 	}
 
+	if (sev_post_map_gfn(vcpu->kvm, map->gfn, map->pfn, token)) {
+		kvm_vcpu_unmap(vcpu, map, false);
+		return -EBUSY;
+	}
+
 	return 0;
 }
 
-static inline void svm_unmap_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map)
+static inline void svm_unmap_ghcb(struct vcpu_svm *svm, struct kvm_host_map *map, int token)
 {
-	kvm_vcpu_unmap(&svm->vcpu, map, true);
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+
+	kvm_vcpu_unmap(vcpu, map, true);
+	sev_post_unmap_gfn(vcpu->kvm, map->gfn, map->pfn, token);
 }
 
 static void dump_ghcb(struct vcpu_svm *svm)
@@ -2518,8 +2529,9 @@ static void dump_ghcb(struct vcpu_svm *svm)
 	struct kvm_host_map map;
 	unsigned int nbits;
 	struct ghcb *ghcb;
+	int token;
 
-	if (svm_map_ghcb(svm, &map))
+	if (svm_map_ghcb(svm, &map, &token))
 		return;
 
 	ghcb = map.hva;
@@ -2544,7 +2556,7 @@ static void dump_ghcb(struct vcpu_svm *svm)
 	pr_err("%-20s%*pb\n", "valid_bitmap", nbits, ghcb->save.valid_bitmap);
 
 e_unmap:
-	svm_unmap_ghcb(svm, &map);
+	svm_unmap_ghcb(svm, &map, token);
 }
 
 static bool sev_es_sync_to_ghcb(struct vcpu_svm *svm)
@@ -2552,8 +2564,9 @@ static bool sev_es_sync_to_ghcb(struct vcpu_svm *svm)
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 	struct kvm_host_map map;
 	struct ghcb *ghcb;
+	int token;
 
-	if (svm_map_ghcb(svm, &map))
+	if (svm_map_ghcb(svm, &map, &token))
 		return false;
 
 	ghcb = map.hva;
@@ -2579,7 +2592,7 @@ static bool sev_es_sync_to_ghcb(struct vcpu_svm *svm)
 
 	trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, ghcb);
 
-	svm_unmap_ghcb(svm, &map);
+	svm_unmap_ghcb(svm, &map, token);
 
 	return true;
 }
@@ -2636,8 +2649,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 	struct kvm_host_map map;
 	struct ghcb *ghcb;
+	int token;
 
-	if (svm_map_ghcb(svm, &map))
+	if (svm_map_ghcb(svm, &map, &token))
 		return -EFAULT;
 
 	ghcb = map.hva;
@@ -2739,7 +2753,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
 
 	sev_es_sync_from_ghcb(svm, ghcb);
 
-	svm_unmap_ghcb(svm, &map);
+	svm_unmap_ghcb(svm, &map, token);
 	return 0;
 
 vmgexit_err:
@@ -2760,7 +2774,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
 	vcpu->run->internal.data[0] = *exit_code;
 	vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
 
-	svm_unmap_ghcb(svm, &map);
+	svm_unmap_ghcb(svm, &map, token);
 	return -EINVAL;
 }
 
@@ -3036,6 +3050,9 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
 				return PSC_UNDEF_ERR;
 		}
 
+		/* Wait for all existing mapped gfns to be unmapped */
+		synchronize_srcu_expedited(&sev->psc_srcu);
+
 		write_lock(&kvm->mmu_lock);
 
 		rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);
@@ -3604,3 +3621,33 @@ void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level)
 	/* Adjust the level to keep the NPT and RMP in sync */
 	*level = min_t(size_t, *level, rmp_level);
 }
+
+int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	int level;
+
+	if (!sev_snp_guest(kvm))
+		return 0;
+
+	*token = srcu_read_lock(&sev->psc_srcu);
+
+	/* If pfn is not added as private then fail */
+	if (snp_lookup_rmpentry(pfn, &level) == 1) {
+		srcu_read_unlock(&sev->psc_srcu, *token);
+		pr_err_ratelimited("failed to map private gfn 0x%llx pfn 0x%llx\n", gfn, pfn);
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int token)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	if (!sev_snp_guest(kvm))
+		return;
+
+	srcu_read_unlock(&sev->psc_srcu, token);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5f73f21a37a1..3784d389247b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4679,7 +4679,11 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+
 	.rmp_page_level_adjust = sev_rmp_page_level_adjust,
+
+	.post_map_gfn = sev_post_map_gfn,
+	.post_unmap_gfn = sev_post_unmap_gfn,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d10f7166b39d..ff91184f9b4a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -76,16 +76,22 @@ struct kvm_sev_info {
 	bool active;		/* SEV enabled guest */
 	bool es_active;		/* SEV-ES enabled guest */
 	bool snp_active;	/* SEV-SNP enabled guest */
+
 	unsigned int asid;	/* ASID used for this guest */
 	unsigned int handle;	/* SEV firmware handle */
 	int fd;			/* SEV device fd */
+
 	unsigned long pages_locked; /* Number of pages locked */
 	struct list_head regions_list;  /* List of registered regions */
+
 	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
+
 	struct kvm *enc_context_owner; /* Owner of copied encryption context */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
+
 	u64 snp_init_flags;
 	void *snp_context;      /* SNP guest context page */
+	struct srcu_struct psc_srcu;
 };
 
 struct kvm_svm {
@@ -618,6 +624,8 @@ void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level);
+int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token);
+void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int token);
 
 /* vmenter.S */
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index afcdc75a99f2..bf4389ffc88f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3095,6 +3095,65 @@ static inline bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu->arch.apf.msr_en_val & mask) == mask;
 }
 
+static int kvm_map_gfn_protected(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
+				 struct gfn_to_pfn_cache *cache, bool atomic, int *token)
+{
+	int ret;
+
+	ret = kvm_map_gfn(vcpu, gfn, map, cache, atomic);
+	if (ret)
+		return ret;
+
+	if (kvm_x86_ops.post_map_gfn) {
+		ret = static_call(kvm_x86_post_map_gfn)(vcpu->kvm, map->gfn, map->pfn, token);
+		if (ret)
+			kvm_unmap_gfn(vcpu, map, cache, false, atomic);
+	}
+
+	return ret;
+}
+
+static int kvm_unmap_gfn_protected(struct kvm_vcpu *vcpu, struct kvm_host_map *map,
+				   struct gfn_to_pfn_cache *cache, bool dirty,
+				   bool atomic, int token)
+{
+	int ret;
+
+	ret = kvm_unmap_gfn(vcpu, map, cache, dirty, atomic);
+
+	if (kvm_x86_ops.post_unmap_gfn)
+		static_call(kvm_x86_post_unmap_gfn)(vcpu->kvm, map->gfn, map->pfn, token);
+
+	return ret;
+}
+
+static int kvm_vcpu_map_protected(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map,
+				  int *token)
+{
+	int ret;
+
+	ret = kvm_vcpu_map(vcpu, gpa, map);
+	if (ret)
+		return ret;
+
+	if (kvm_x86_ops.post_map_gfn) {
+		ret = static_call(kvm_x86_post_map_gfn)(vcpu->kvm, map->gfn, map->pfn, token);
+		if (ret)
+			kvm_vcpu_unmap(vcpu, map, false);
+	}
+
+	return ret;
+}
+
+static void kvm_vcpu_unmap_protected(struct kvm_vcpu *vcpu, struct kvm_host_map *map,
+				     bool dirty, int token)
+{
+	kvm_vcpu_unmap(vcpu, map, dirty);
+
+	if (kvm_x86_ops.post_unmap_gfn)
+		static_call(kvm_x86_post_unmap_gfn)(vcpu->kvm, map->gfn, map->pfn, token);
+}
+
 static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 {
 	gpa_t gpa = data & ~0x3f;
@@ -3185,6 +3244,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 {
 	struct kvm_host_map map;
 	struct kvm_steal_time *st;
+	int token;
 
 	if (kvm_xen_msr_enabled(vcpu->kvm)) {
 		kvm_xen_runstate_set_running(vcpu);
@@ -3195,8 +3255,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 		return;
 
 	/* -EAGAIN is returned in atomic context so we can just return. */
-	if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT,
-			&map, &vcpu->arch.st.cache, false))
+	if (kvm_map_gfn_protected(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT,
+				  &map, &vcpu->arch.st.cache, false, &token))
 		return;
 
 	st = map.hva +
@@ -3234,7 +3294,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 
 	st->version += 1;
 
-	kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, false);
+	kvm_unmap_gfn_protected(vcpu, &map, &vcpu->arch.st.cache, true, false, token);
 }
 
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -4271,6 +4331,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 {
 	struct kvm_host_map map;
 	struct kvm_steal_time *st;
+	int token;
 
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
@@ -4278,8 +4339,8 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.st.preempted)
 		return;
 
-	if (kvm_map_gfn(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT, &map,
-			&vcpu->arch.st.cache, true))
+	if (kvm_map_gfn_protected(vcpu, vcpu->arch.st.msr_val >> PAGE_SHIFT,
+				  &map, &vcpu->arch.st.cache, true, &token))
 		return;
 
 	st = map.hva +
@@ -4287,7 +4348,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 
 	st->preempted = vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
 
-	kvm_unmap_gfn(vcpu, &map, &vcpu->arch.st.cache, true, true);
+	kvm_unmap_gfn_protected(vcpu, &map, &vcpu->arch.st.cache, true, true, token);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -6816,6 +6877,7 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 	gpa_t gpa;
 	char *kaddr;
 	bool exchanged;
+	int token;
 
 	/* guests cmpxchg8b have to be emulated atomically */
 	if (bytes > 8 || (bytes & (bytes - 1)))
@@ -6839,7 +6901,7 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 	if (((gpa + bytes - 1) & page_line_mask) != (gpa & page_line_mask))
 		goto emul_write;
 
-	if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
+	if (kvm_vcpu_map_protected(vcpu, gpa_to_gfn(gpa), &map, &token))
 		goto emul_write;
 
 	kaddr = map.hva + offset_in_page(gpa);
@@ -6861,7 +6923,7 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 		BUG();
 	}
 
-	kvm_vcpu_unmap(vcpu, &map, true);
+	kvm_vcpu_unmap_protected(vcpu, &map, true, token);
 
 	if (!exchanged)
 		return X86EMUL_CMPXCHG_FAILED;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 40/45] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (38 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-08-20 15:59 ` [PATCH Part2 v5 41/45] KVM: SVM: Add support to handle the RMP nested page fault Brijesh Singh
                   ` (5 subsequent siblings)
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

While resolving the RMP page fault, we may run into cases where the page
level between the RMP entry and TDP does not match and the 2M RMP entry
must be split into 4K RMP entries, or a 2M TDP page needs to be broken
into multiple 4K pages.

To keep the RMP and TDP page levels in sync, zap the gfn range after
splitting the pages in the RMP entry. The zap should force the TDP to
get rebuilt with the new page level.
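
For reference, a standalone sketch of how a caller computes the
2M-aligned gfn span to zap (this mirrors the use in the RMP fault
handler added later in the series; illustrative only, not kernel code):

  #include <stdint.h>
  #include <stdio.h>

  #define PTRS_PER_PMD	512	/* 4K gfns covered by one 2M mapping */

  int main(void)
  {
  	uint64_t gfn = 0x12345;	/* example faulting gfn */
  	uint64_t start = gfn & ~((uint64_t)PTRS_PER_PMD - 1);

  	/* Zap the whole 2M-aligned block so the TDP gets rebuilt at the
  	 * page level now recorded in the RMP table.
  	 */
  	printf("kvm_zap_gfn_range(kvm, %#llx, %#llx)\n",
  	       (unsigned long long)start,
  	       (unsigned long long)(start + PTRS_PER_PMD));
  	return 0;
  }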

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/mmu.h              | 2 --
 arch/x86/kvm/mmu/mmu.c          | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5ac1ff097e8c..8773c1f9e45e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1561,6 +1561,8 @@ void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+
 
 int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 7c4fac53183d..f767a52f9178 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -228,8 +228,6 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return -(u32)fault & errcode;
 }
 
-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e660d832e235..56a7da49092d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5748,6 +5748,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 
 	return need_tlb_flush;
 }
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);
 
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 41/45] KVM: SVM: Add support to handle the RMP nested page fault
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (39 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 40/45] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-09-29 12:24   ` Dr. David Alan Gilbert
  2021-10-13 17:57   ` Sean Christopherson
  2021-08-20 15:59 ` [PATCH Part2 v5 42/45] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Brijesh Singh
                   ` (4 subsequent siblings)
  45 siblings, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

When SEV-SNP is enabled in the guest, the hardware places restrictions on
all memory accesses based on the contents of the RMP table. When hardware
encounters an RMP check failure caused by a guest memory access, it raises
a #NPF. The error code contains additional information on the access type.
See the APM volume 2 for additional information.
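
A standalone sketch of how the relevant error code bits decode (bit
positions follow the APM and the definitions added in Part 1 of this
series; treat this as an illustration, not kernel code):

  #include <stdint.h>
  #include <stdio.h>

  #define PFERR_GUEST_RMP_MASK	(1ULL << 31)	/* RMP check failed */
  #define PFERR_GUEST_ENC_MASK	(1ULL << 34)	/* access used the C-bit */
  #define PFERR_GUEST_SIZEM_MASK	(1ULL << 35)	/* page-size mismatch */

  int main(void)
  {
  	uint64_t error_code = PFERR_GUEST_RMP_MASK | PFERR_GUEST_SIZEM_MASK;

  	if (error_code & PFERR_GUEST_RMP_MASK)
  		printf("RMP violation: private=%d size-mismatch=%d\n",
  		       !!(error_code & PFERR_GUEST_ENC_MASK),
  		       !!(error_code & PFERR_GUEST_SIZEM_MASK));
  	return 0;
  }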

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 76 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c | 14 +++++---
 arch/x86/kvm/svm/svm.h |  1 +
 3 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 65b578463271..712e8907bc39 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3651,3 +3651,79 @@ void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int token)
 
 	srcu_read_unlock(&sev->psc_srcu, token);
 }
+
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+	int rmp_level, npt_level, rc, assigned;
+	struct kvm *kvm = vcpu->kvm;
+	gfn_t gfn = gpa_to_gfn(gpa);
+	bool need_psc = false;
+	enum psc_op psc_op;
+	kvm_pfn_t pfn;
+	bool private;
+
+	write_lock(&kvm->mmu_lock);
+
+	if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level)))
+		goto unlock;
+
+	assigned = snp_lookup_rmpentry(pfn, &rmp_level);
+	if (unlikely(assigned < 0))
+		goto unlock;
+
+	private = !!(error_code & PFERR_GUEST_ENC_MASK);
+
+	/*
+	 * If the fault was due to a size mismatch, or the NPT and RMP page levels
+	 * are not in sync, then use PSMASH to split the RMP entry into 4K.
+	 */
+	if ((error_code & PFERR_GUEST_SIZEM_MASK) ||
+	    (npt_level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M && private)) {
+		rc = snp_rmptable_psmash(kvm, pfn);
+		if (rc)
+			pr_err_ratelimited("psmash failed, gpa 0x%llx pfn 0x%llx rc %d\n",
+					   gpa, pfn, rc);
+		goto out;
+	}
+
+	/*
+	 * If it's a private access, and the page is not assigned in the
+	 * RMP table, create a new private RMP entry. This can happen if
+	 * the guest did not use the PSC VMGEXIT to transition the page state
+	 * before the access.
+	 */
+	if (!assigned && private) {
+		need_psc = true;
+		psc_op = SNP_PAGE_STATE_PRIVATE;
+		goto out;
+	}
+
+	/*
+	 * If it's a shared access, but the page is private in the RMP table
+	 * then make the page shared in the RMP table. This can happen if
+	 * the guest did not use the PSC VMGEXIT to transition the page
+	 * state before the access.
+	 */
+	if (assigned && !private) {
+		need_psc = true;
+		psc_op = SNP_PAGE_STATE_SHARED;
+	}
+
+out:
+	write_unlock(&kvm->mmu_lock);
+
+	if (need_psc)
+		rc = __snp_handle_page_state_change(vcpu, psc_op, gpa, PG_LEVEL_4K);
+
+	/*
+	 * The fault handler has updated the RMP pagesize, zap the existing
+	 * rmaps for large entry ranges so that nested page table gets rebuilt
+	 * with the updated RMP pagesize.
+	 */
+	gfn = gpa_to_gfn(gpa) & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+	kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+	return;
+
+unlock:
+	write_unlock(&kvm->mmu_lock);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3784d389247b..3ba62f21b113 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1933,15 +1933,21 @@ static int pf_interception(struct kvm_vcpu *vcpu)
 static int npf_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	int rc;
 
 	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
 	trace_kvm_page_fault(fault_address, error_code);
-	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
-			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
-			svm->vmcb->control.insn_bytes : NULL,
-			svm->vmcb->control.insn_len);
+	rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+				svm->vmcb->control.insn_bytes : NULL,
+				svm->vmcb->control.insn_len);
+
+	if (error_code & PFERR_GUEST_RMP_MASK)
+		handle_rmp_page_fault(vcpu, fault_address, error_code);
+
+	return rc;
 }
 
 static int db_interception(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ff91184f9b4a..280072995306 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -626,6 +626,7 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level);
 int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token);
 void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int token);
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 
 /* vmenter.S */
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 42/45] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (40 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 41/45] KVM: SVM: Add support to handle the RMP nested page fault Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-09-29 21:33   ` Peter Gonda
  2021-09-29 22:00   ` Peter Gonda
  2021-08-20 15:59 ` [PATCH Part2 v5 43/45] KVM: SVM: Use a VMSA physical address variable for populating VMCB Brijesh Singh
                   ` (3 subsequent siblings)
  45 siblings, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

Version 2 of GHCB specification added the support for two SNP Guest
Request Message NAE events. The events allows for an SEV-SNP guest to
make request to the SEV-SNP firmware through hypervisor using the
SNP_GUEST_REQUEST API define in the SEV-SNP firmware specification.

SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST, with the
difference that an additional certificate blob can be passed through
the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
provides snp_guest_ext_guest_request(), which is used by KVM to get
both the report and the certificate data at once.
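
For reference, the parameter passing used by these NAE events (as
implemented by the handlers in this patch) is: sw_exit_info_1 carries
the GPA of the request page, sw_exit_info_2 the GPA of the response
page, and, for the extended variant, RAX/RBX carry the GPA and size (in
4K pages) of the guest's certificate buffer. A minimal guest-side
sketch using the existing GHCB accessor helpers (illustrative only; the
function and flow shown are simplified assumptions, not part of this
patch):

	/* Hypothetical guest-side helper, for illustration only. */
	static void issue_ext_guest_request(struct ghcb *ghcb, u64 req_gpa,
					    u64 resp_gpa, u64 certs_gpa,
					    u64 certs_npages)
	{
		ghcb_set_rax(ghcb, certs_gpa);      /* certificate buffer GPA */
		ghcb_set_rbx(ghcb, certs_npages);   /* buffer size in 4K pages */
		ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_EXT_GUEST_REQUEST);
		ghcb_set_sw_exit_info_1(ghcb, req_gpa);  /* request page GPA */
		ghcb_set_sw_exit_info_2(ghcb, resp_gpa); /* response page GPA */
		/* VMGEXIT then transfers control to the hypervisor handler. */
	}

If the firmware returns SNP_GUEST_REQ_INVALID_LEN, the guest can re-read
RBX to learn the required number of certificate pages and retry with a
larger buffer.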

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 197 +++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/svm/svm.h |   2 +
 2 files changed, 193 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 712e8907bc39..81ccad412e55 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -19,6 +19,7 @@
 #include <linux/trace_events.h>
 #include <linux/sev.h>
 #include <linux/ksm.h>
+#include <linux/sev-guest.h>
 #include <asm/fpu/internal.h>
 
 #include <asm/pkru.h>
@@ -338,6 +339,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 		init_srcu_struct(&sev->psc_srcu);
 		ret = sev_snp_init(&argp->error);
+		mutex_init(&sev->guest_req_lock);
 	} else {
 		ret = sev_platform_init(&argp->error);
 	}
@@ -1602,23 +1604,39 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
+	void *context = NULL, *certs_data = NULL, *resp_page = NULL;
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 	struct sev_data_snp_gctx_create data = {};
-	void *context;
 	int rc;
 
+	/* Allocate memory used for the certs data in SNP guest request */
+	certs_data = kmalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
+	if (!certs_data)
+		return NULL;
+
 	/* Allocate memory for context page */
 	context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
 	if (!context)
-		return NULL;
+		goto e_free;
+
+	/* Allocate a firmware buffer used during the guest command handling. */
+	resp_page = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+	if (!resp_page)
+		goto e_free;
 
 	data.gctx_paddr = __psp_pa(context);
 	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
-	if (rc) {
-		snp_free_firmware_page(context);
-		return NULL;
-	}
+	if (rc)
+		goto e_free;
+
+	sev->snp_certs_data = certs_data;
 
 	return context;
+
+e_free:
+	snp_free_firmware_page(context);
+	kfree(certs_data);
+	return NULL;
 }
 
 static int snp_bind_asid(struct kvm *kvm, int *error)
@@ -2248,6 +2266,8 @@ static int snp_decommission_context(struct kvm *kvm)
 	snp_free_firmware_page(sev->snp_context);
 	sev->snp_context = NULL;
 
+	kfree(sev->snp_certs_data);
+
 	return 0;
 }
 
@@ -2746,6 +2766,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 	case SVM_VMGEXIT_HV_FEATURES:
 	case SVM_VMGEXIT_PSC:
+	case SVM_VMGEXIT_GUEST_REQUEST:
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
 		break;
 	default:
 		goto vmgexit_err;
@@ -3161,6 +3183,155 @@ static unsigned long snp_handle_page_state_change(struct vcpu_svm *svm)
 	return rc ? map_to_psc_vmgexit_code(rc) : 0;
 }
 
+static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
+					 struct sev_data_snp_guest_request *data,
+					 gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	kvm_pfn_t req_pfn, resp_pfn;
+	struct kvm_sev_info *sev;
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
+		return SEV_RET_INVALID_PARAM;
+
+	req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+	if (is_error_noslot_pfn(req_pfn))
+		return SEV_RET_INVALID_ADDRESS;
+
+	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+	if (is_error_noslot_pfn(resp_pfn))
+		return SEV_RET_INVALID_ADDRESS;
+
+	if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+		return SEV_RET_INVALID_ADDRESS;
+
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+	data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+	return 0;
+}
+
+static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
+{
+	u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+	int ret;
+
+	ret = snp_page_reclaim(pfn);
+	if (ret)
+		*rc = SEV_RET_INVALID_ADDRESS;
+
+	ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+	if (ret)
+		*rc = SEV_RET_INVALID_ADDRESS;
+}
+
+static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct sev_data_snp_guest_request data = {0};
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_sev_info *sev;
+	unsigned long rc;
+	int err;
+
+	if (!sev_snp_guest(vcpu->kvm)) {
+		rc = SEV_RET_INVALID_GUEST;
+		goto e_fail;
+	}
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	mutex_lock(&sev->guest_req_lock);
+
+	rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
+	if (rc)
+		goto unlock;
+
+	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
+	if (rc)
+		/* use the firmware error code */
+		rc = err;
+
+	snp_cleanup_guest_buf(&data, &rc);
+
+unlock:
+	mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+	svm_set_ghcb_sw_exit_info_2(vcpu, rc);
+}
+
+static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct sev_data_snp_guest_request req = {0};
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	unsigned long data_npages;
+	struct kvm_sev_info *sev;
+	unsigned long rc, err;
+	u64 data_gpa;
+
+	if (!sev_snp_guest(vcpu->kvm)) {
+		rc = SEV_RET_INVALID_GUEST;
+		goto e_fail;
+	}
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+	data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+	if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
+		rc = SEV_RET_INVALID_ADDRESS;
+		goto e_fail;
+	}
+
+	/* Verify that requested blob will fit in certificate buffer */
+	if ((data_npages << PAGE_SHIFT) > SEV_FW_BLOB_MAX_SIZE) {
+		rc = SEV_RET_INVALID_PARAM;
+		goto e_fail;
+	}
+
+	mutex_lock(&sev->guest_req_lock);
+
+	rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
+	if (rc)
+		goto unlock;
+
+	rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
+					 &data_npages, &err);
+	if (rc) {
+		/*
+		 * If the supplied buffer is too small, return the expected
+		 * length in RBX.
+		 */
+		if (err == SNP_GUEST_REQ_INVALID_LEN)
+			vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
+
+		/* pass the firmware error code */
+		rc = err;
+		goto cleanup;
+	}
+
+	/* Copy the certificate blob in the guest memory */
+	if (data_npages &&
+	    kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
+		rc = SEV_RET_INVALID_ADDRESS;
+
+cleanup:
+	snp_cleanup_guest_buf(&req, &rc);
+
+unlock:
+	mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+	svm_set_ghcb_sw_exit_info_2(vcpu, rc);
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3404,6 +3575,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		svm_set_ghcb_sw_exit_info_2(vcpu, rc);
 		break;
 	}
+	case SVM_VMGEXIT_GUEST_REQUEST: {
+		snp_handle_guest_request(svm, control->exit_info_1, control->exit_info_2);
+
+		ret = 1;
+		break;
+	}
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
+		snp_handle_ext_guest_request(svm,
+					     control->exit_info_1,
+					     control->exit_info_2);
+
+		ret = 1;
+		break;
+	}
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 280072995306..71fe46a778f3 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -92,6 +92,8 @@ struct kvm_sev_info {
 	u64 snp_init_flags;
 	void *snp_context;      /* SNP guest context page */
 	struct srcu_struct psc_srcu;
+	void *snp_certs_data;
+	struct mutex guest_req_lock;
 };
 
 struct kvm_svm {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 43/45] KVM: SVM: Use a VMSA physical address variable for populating VMCB
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (41 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 42/45] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-10-15 18:58   ` Sean Christopherson
  2021-08-20 15:59 ` [PATCH Part2 v5 44/45] KVM: SVM: Support SEV-SNP AP Creation NAE event Brijesh Singh
                   ` (2 subsequent siblings)
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

From: Tom Lendacky <thomas.lendacky@amd.com>

In preparation to support SEV-SNP AP Creation, use a variable that holds
the VMSA physical address rather than converting the virtual address.
This will allow SEV-SNP AP Creation to set the new physical address that
will be used should the vCPU reset path be taken.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm/sev.c | 5 ++---
 arch/x86/kvm/svm/svm.c | 9 ++++++++-
 arch/x86/kvm/svm/svm.h | 1 +
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 81ccad412e55..05f795c30816 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3619,10 +3619,9 @@ void sev_es_init_vmcb(struct vcpu_svm *svm)
 
 	/*
 	 * An SEV-ES guest requires a VMSA area that is a separate from the
-	 * VMCB page. Do not include the encryption mask on the VMSA physical
-	 * address since hardware will access it using the guest key.
+	 * VMCB page.
 	 */
-	svm->vmcb->control.vmsa_pa = __pa(svm->vmsa);
+	svm->vmcb->control.vmsa_pa = svm->vmsa_pa;
 
 	/* Can't intercept CR register access, HV can't modify CR registers */
 	svm_clr_intercept(svm, INTERCEPT_CR0_READ);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3ba62f21b113..be820eb999fb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1409,9 +1409,16 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	svm->vmcb01.ptr = page_address(vmcb01_page);
 	svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
 
-	if (vmsa_page)
+	if (vmsa_page) {
 		svm->vmsa = page_address(vmsa_page);
 
+		/*
+		 * Do not include the encryption mask on the VMSA physical
+		 * address since hardware will access it using the guest key.
+		 */
+		svm->vmsa_pa = __pa(svm->vmsa);
+	}
+
 	svm->guest_state_loaded = false;
 
 	svm_switch_vmcb(svm, &svm->vmcb01);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 71fe46a778f3..9bf6404142dd 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -199,6 +199,7 @@ struct vcpu_svm {
 
 	/* SEV-ES support */
 	struct sev_es_save_area *vmsa;
+	hpa_t vmsa_pa;
 	bool ghcb_in_use;
 	bool received_first_sipi;
 	unsigned int ap_reset_hold_type;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 44/45] KVM: SVM: Support SEV-SNP AP Creation NAE event
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (42 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 43/45] KVM: SVM: Use a VMSA physical address variable for populating VMCB Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-10-15 19:50   ` Sean Christopherson
  2021-08-20 15:59 ` [PATCH Part2 v5 45/45] KVM: SVM: Add module parameter to enable the SEV-SNP Brijesh Singh
  2021-11-12 15:43 ` [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Peter Gonda
  45 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own, giving the
guest a way of simulating INIT-SIPI.

A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
so as to avoid updating the VMSA pointer while the vCPU is running.

For CREATE:
  The guest supplies the GPA of the VMSA to be used for the vCPU with the
  specified APIC ID. The GPA is saved in the svm struct of the target
  vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added to the
  vCPU and then the vCPU is kicked.

For CREATE_ON_INIT:
  The guest supplies the GPA of the VMSA to be used for the vCPU with the
  specified APIC ID the next time an INIT is performed. The GPA is saved
  in the svm struct of the target vCPU.

For DESTROY:
  The guest indicates it wishes to stop the vCPU. The GPA is cleared from
  the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
  to vCPU and then the vCPU is kicked.


The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked as
a result of the event or as a result of an INIT. The handler sets the vCPU
to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will leave the
vCPU as not runnable. Any previous VMSA pages that were installed as
part of an SEV-SNP AP Creation NAE event are un-pinned. If a new VMSA is
to be installed, the VMSA guest page is pinned and set as the VMSA in the
vCPU VMCB and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new
VMSA is not to be installed, the VMSA is cleared in the vCPU VMCB and the
vCPU state is left as KVM_MP_STATE_UNINITIALIZED to prevent it from being
run.
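
For clarity, the VMGEXIT parameter encoding consumed by the AP Creation
handler added in this patch is (a restatement of the code below, not a
new definition):

	/*
	 * exit_info_1[31:0]  - request: AP_CREATE_ON_INIT, AP_CREATE or
	 *                      AP_DESTROY
	 * exit_info_1[63:32] - APIC ID of the target vCPU
	 * exit_info_2        - GPA of the new VMSA (CREATE requests only)
	 * RAX                - SEV features of the vCPU (CREATE requests
	 *                      only); the interrupt injection mode bits
	 *                      must match those recorded at VMSA creation
	 */
	request = lower_32_bits(svm->vmcb->control.exit_info_1);
	apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);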

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |   1 +
 arch/x86/include/asm/kvm_host.h    |   3 +
 arch/x86/include/asm/svm.h         |   7 +-
 arch/x86/kvm/svm/sev.c             | 211 +++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c             |   6 +-
 arch/x86/kvm/svm/svm.h             |   9 ++
 arch/x86/kvm/x86.c                 |  13 +-
 7 files changed, 247 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c09bd40e0160..01f31957bd7d 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -126,6 +126,7 @@ KVM_X86_OP(alloc_apic_backing_page)
 KVM_X86_OP_NULL(rmp_page_level_adjust)
 KVM_X86_OP(post_map_gfn)
 KVM_X86_OP(post_unmap_gfn)
+KVM_X86_OP(update_protected_guest_state)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_NULL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8773c1f9e45e..11ce66fe1656 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -91,6 +91,7 @@
 #define KVM_REQ_MSR_FILTER_CHANGED	KVM_ARCH_REQ(29)
 #define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \
 	KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE	KVM_ARCH_REQ(31)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -1468,6 +1469,8 @@ struct kvm_x86_ops {
 
 	int (*post_map_gfn)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token);
 	void (*post_unmap_gfn)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int token);
+
+	int (*update_protected_guest_state)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index a39e31845a33..cf7c88a0d60a 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -218,7 +218,12 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define SVM_NESTED_CTL_SEV_ENABLE	BIT(1)
 #define SVM_NESTED_CTL_SEV_ES_ENABLE	BIT(2)
 
-#define SVM_SEV_FEAT_SNP_ACTIVE		BIT(0)
+#define SVM_SEV_FEAT_SNP_ACTIVE			BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION	BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION	BIT(4)
+#define SVM_SEV_FEAT_INT_INJ_MODES		\
+	(SVM_SEV_FEAT_RESTRICTED_INJECTION |	\
+	 SVM_SEV_FEAT_ALTERNATE_INJECTION)
 
 struct vmcb_seg {
 	u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 05f795c30816..151747ec0809 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -649,6 +649,7 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 {
+	struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
 	struct sev_es_save_area *save = svm->vmsa;
 
 	/* Check some debug related fields before encrypting the VMSA */
@@ -693,6 +694,12 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	if (sev_snp_guest(svm->vcpu.kvm))
 		save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
 
+	/*
+	 * Save the VMSA synced SEV features. For now, they are the same for
+	 * all vCPUs, so just save each time.
+	 */
+	sev->sev_features = save->sev_features;
+
 	return 0;
 }
 
@@ -2760,6 +2767,10 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
 		if (!ghcb_sw_scratch_is_valid(ghcb))
 			goto vmgexit_err;
 		break;
+	case SVM_VMGEXIT_AP_CREATION:
+		if (!ghcb_rax_is_valid(ghcb))
+			goto vmgexit_err;
+		break;
 	case SVM_VMGEXIT_NMI_COMPLETE:
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3332,6 +3343,191 @@ static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gp
 	svm_set_ghcb_sw_exit_info_2(vcpu, rc);
 }
 
+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	kvm_pfn_t pfn;
+
+	WARN_ON(!mutex_is_locked(&svm->snp_vmsa_mutex));
+
+	/* Mark the vCPU as offline and not runnable */
+	vcpu->arch.pv.pv_unhalted = false;
+	vcpu->arch.mp_state = KVM_MP_STATE_STOPPED;
+
+	/* Clear use of the VMSA in the sev_es_init_vmcb() path */
+	svm->vmsa_pa = INVALID_PAGE;
+
+	/* Clear use of the VMSA from the VMCB */
+	svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+	if (VALID_PAGE(svm->snp_vmsa_pfn)) {
+		/*
+		 * The snp_vmsa_pfn field holds the PFN of the VMSA page
+		 * that is about to be replaced, which will no longer be
+		 * used or referenced, so un-pin it.
+		 */
+		kvm_release_pfn_dirty(svm->snp_vmsa_pfn);
+		svm->snp_vmsa_pfn = INVALID_PAGE;
+	}
+
+	if (VALID_PAGE(svm->snp_vmsa_gpa)) {
+		/*
+		 * The VMSA is referenced by the hypervisor physical address,
+		 * so retrieve the PFN and pin it.
+		 */
+		pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(svm->snp_vmsa_gpa));
+		if (is_error_pfn(pfn))
+			return -EINVAL;
+
+		svm->snp_vmsa_pfn = pfn;
+
+		/* Use the new VMSA in the sev_es_init_vmcb() path */
+		svm->vmsa_pa = pfn_to_hpa(pfn);
+		svm->vmcb->control.vmsa_pa = svm->vmsa_pa;
+
+		/* Mark the vCPU as runnable */
+		vcpu->arch.pv.pv_unhalted = false;
+		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+	}
+
+	return 0;
+}
+
+/*
+ * Invoked as part of vcpu_enter_guest() event processing.
+ * Expected return values are:
+ *   0 - exit to userspace
+ *   1 - continue vcpu_run() execution loop
+ */
+int sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret;
+
+	mutex_lock(&svm->snp_vmsa_mutex);
+
+	ret = __sev_snp_update_protected_guest_state(vcpu);
+	if (ret)
+		vcpu_unimpl(vcpu, "snp: AP state update failed\n");
+
+	mutex_unlock(&svm->snp_vmsa_mutex);
+
+	return ret ? 0 : 1;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret;
+
+	if (!sev_snp_guest(vcpu->kvm))
+		return;
+
+	mutex_lock(&svm->snp_vmsa_mutex);
+
+	if (!svm->snp_vmsa_update_on_init)
+		goto unlock;
+
+	svm->snp_vmsa_update_on_init = false;
+
+	ret = __sev_snp_update_protected_guest_state(vcpu);
+	if (ret)
+		vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+	mutex_unlock(&svm->snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm_vcpu *target_vcpu;
+	struct vcpu_svm *target_svm;
+	unsigned int request;
+	unsigned int apic_id;
+	bool kick;
+	int ret;
+
+	request = lower_32_bits(svm->vmcb->control.exit_info_1);
+	apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+	/* Validate the APIC ID */
+	target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+	if (!target_vcpu) {
+		vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+			    apic_id);
+		return -EINVAL;
+	}
+
+	ret = 0;
+
+	target_svm = to_svm(target_vcpu);
+
+	/*
+	 * We have a valid target vCPU, so the vCPU will be kicked unless the
+	 * request is for CREATE_ON_INIT. For any errors at this stage, the
+	 * kick will place the vCPU in a non-runnable state.
+	 */
+	kick = true;
+
+	mutex_lock(&target_svm->snp_vmsa_mutex);
+
+	target_svm->snp_vmsa_gpa = INVALID_PAGE;
+	target_svm->snp_vmsa_update_on_init = false;
+
+	/* Interrupt injection mode shouldn't change for AP creation */
+	if (request < SVM_VMGEXIT_AP_DESTROY) {
+		u64 sev_features;
+
+		sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+		sev_features ^= sev->sev_features;
+		if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+			vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+				    vcpu->arch.regs[VCPU_REGS_RAX]);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	switch (request) {
+	case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+		kick = false;
+		target_svm->snp_vmsa_update_on_init = true;
+		fallthrough;
+	case SVM_VMGEXIT_AP_CREATE:
+		if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+			vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+				    svm->vmcb->control.exit_info_2);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		target_svm->snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+		break;
+	case SVM_VMGEXIT_AP_DESTROY:
+		break;
+	default:
+		vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+			    request);
+		ret = -EINVAL;
+		break;
+	}
+
+out:
+	mutex_unlock(&target_svm->snp_vmsa_mutex);
+
+	if (kick) {
+		kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+		kvm_vcpu_kick(target_vcpu);
+	}
+
+	return ret;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3589,6 +3785,18 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	}
+	case SVM_VMGEXIT_AP_CREATION:
+		ret = sev_snp_ap_creation(svm);
+		if (ret) {
+			svm_set_ghcb_sw_exit_info_1(vcpu, 1);
+			svm_set_ghcb_sw_exit_info_2(vcpu,
+						    X86_TRAP_GP |
+						    SVM_EVTINJ_TYPE_EXEPT |
+						    SVM_EVTINJ_VALID);
+		}
+
+		ret = 1;
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -3663,6 +3871,9 @@ void sev_es_create_vcpu(struct vcpu_svm *svm)
 	set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
 					    GHCB_VERSION_MIN,
 					    sev_enc_bit));
+
+	mutex_init(&svm->snp_vmsa_mutex);
+	svm->snp_vmsa_pfn = INVALID_PAGE;
 }
 
 void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index be820eb999fb..29e7666a710b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1336,7 +1336,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	svm->spec_ctrl = 0;
 	svm->virt_spec_ctrl = 0;
 
-	if (!init_event) {
+	if (init_event) {
+		sev_snp_init_protected_guest_state(vcpu);
+	} else {
 		vcpu->arch.apic_base = APIC_DEFAULT_PHYS_BASE |
 				       MSR_IA32_APICBASE_ENABLE;
 		if (kvm_vcpu_is_reset_bsp(vcpu))
@@ -4697,6 +4699,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.post_map_gfn = sev_post_map_gfn,
 	.post_unmap_gfn = sev_post_unmap_gfn,
+
+	.update_protected_guest_state = sev_snp_update_protected_guest_state,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9bf6404142dd..59044b3a7c7a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -94,6 +94,8 @@ struct kvm_sev_info {
 	struct srcu_struct psc_srcu;
 	void *snp_certs_data;
 	struct mutex guest_req_lock;
+
+	u64 sev_features;	/* Features set at VMSA creation */
 };
 
 struct kvm_svm {
@@ -221,6 +223,11 @@ struct vcpu_svm {
 	u64 ghcb_sw_exit_info_2;
 
 	u64 ghcb_registered_gpa;
+
+	struct mutex snp_vmsa_mutex;
+	gpa_t snp_vmsa_gpa;
+	kvm_pfn_t snp_vmsa_pfn;
+	bool snp_vmsa_update_on_init;	/* SEV-SNP AP Creation on INIT-SIPI */
 };
 
 struct svm_cpu_data {
@@ -630,6 +637,8 @@ void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level);
 int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token);
 void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int token);
 void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu);
 
 /* vmenter.S */
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bf4389ffc88f..dbb8362cc576 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9576,6 +9576,16 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
 			static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+
+		if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+			r = static_call(kvm_x86_update_protected_guest_state)(vcpu);
+			if (!r) {
+				vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+				goto out;
+			} else if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
+				goto out;
+			}
+		}
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -11656,7 +11666,8 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 	if (!list_empty_careful(&vcpu->async_pf.done))
 		return true;
 
-	if (kvm_apic_has_events(vcpu))
+	if (kvm_apic_has_events(vcpu) ||
+	    kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
 		return true;
 
 	if (vcpu->arch.pv.pv_unhalted)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* [PATCH Part2 v5 45/45] KVM: SVM: Add module parameter to enable the SEV-SNP
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (43 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 44/45] KVM: SVM: Support SEV-SNP AP Creation NAE event Brijesh Singh
@ 2021-08-20 15:59 ` Brijesh Singh
  2021-11-12 15:43 ` [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Peter Gonda
  45 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-08-20 15:59 UTC (permalink / raw)
  To: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Brijesh Singh

Add a module parameter that can be used to enable or disable the SEV-SNP
feature. Now that KVM contains the support for SNP, set the GHCB
hypervisor feature flags to indicate that SNP is supported.
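
Illustrative usage (the parameter name and 0444 permissions are taken
from the patch below): SNP support can be disabled at module load time
with

	modprobe kvm_amd sev_snp=0

or with kvm_amd.sev_snp=0 on the kernel command line; the current value
is readable from /sys/module/kvm_amd/parameters/sev_snp.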

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 7 ++++---
 arch/x86/kvm/svm/svm.h | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 151747ec0809..0c834df3f63c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -59,14 +59,15 @@ module_param_named(sev, sev_enabled, bool, 0444);
 /* enable/disable SEV-ES support */
 static bool sev_es_enabled = true;
 module_param_named(sev_es, sev_es_enabled, bool, 0444);
+
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
 #else
 #define sev_enabled false
 #define sev_es_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
-/* enable/disable SEV-SNP support */
-static bool sev_snp_enabled;
-
 #define AP_RESET_HOLD_NONE		0
 #define AP_RESET_HOLD_NAE_EVENT		1
 #define AP_RESET_HOLD_MSR_PROTO		2
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 59044b3a7c7a..9d6c51e92a79 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -608,7 +608,7 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 #define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
-#define GHCB_HV_FT_SUPPORTED	0
+#define GHCB_HV_FT_SUPPORTED	(GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)
 
 extern unsigned int max_sev_asid;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 08/45] x86/fault: Add support to handle the RMP fault for user address
  2021-08-20 15:58 ` [PATCH Part2 v5 08/45] x86/fault: Add support to handle the RMP fault for user address Brijesh Singh
@ 2021-08-23 14:20   ` Dave Hansen
  2021-08-23 14:36     ` Brijesh Singh
  2021-09-29 18:19   ` Borislav Petkov
  1 sibling, 1 reply; 239+ messages in thread
From: Dave Hansen @ 2021-08-23 14:20 UTC (permalink / raw)
  To: Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On 8/20/21 8:58 AM, Brijesh Singh wrote:
> +static int handle_split_page_fault(struct vm_fault *vmf)
> +{
> +	if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
> +		return VM_FAULT_SIGBUS;
> +
> +	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
> +	return 0;
> +}

We had a whole conversation the last time this was posted about huge
page types *other* than THP.  I don't see any comprehension of those
types or what would happen if one of those was used with SEV-SNP.

What was the result of those review comments?

I'm still worried that hugetlbfs (and others) are not properly handled
by this series.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 08/45] x86/fault: Add support to handle the RMP fault for user address
  2021-08-23 14:20   ` Dave Hansen
@ 2021-08-23 14:36     ` Brijesh Singh
  2021-08-23 14:50       ` Dave Hansen
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-08-23 14:36 UTC (permalink / raw)
  To: Dave Hansen, x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: brijesh.singh, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

Hi Dave,

On 8/23/21 9:20 AM, Dave Hansen wrote:
> On 8/20/21 8:58 AM, Brijesh Singh wrote:
>> +static int handle_split_page_fault(struct vm_fault *vmf)
>> +{
>> +	if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
>> +		return VM_FAULT_SIGBUS;
>> +
>> +	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
>> +	return 0;
>> +}
> 
> We had a whole conversation the last time this was posted about huge
> page types *other* than THP.  I don't see any comprehension of those
> types or what would happen if one of those was used with SEV-SNP.
> 
> What was the result of those review comments?
> 

Based on previous review comments, Sean was not keen on KVM performing 
this detection and aborting the guest SEV-SNP VM launch. So, I have not 
implemented the check and am waiting for more discussion before going 
at it.

An SEV-SNP guest requires the VMM to register the guest backing pages 
before the VM launch. Personally, I would prefer KVM to check the 
backing page type during the registration and fail to register if it is 
hugetlbfs (or similar), to avoid getting into a situation where we could 
not split the hugepage.

thanks

> I'm still worried that hugetlbfs (and others) are not properly handled
> by this series.
> 

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 08/45] x86/fault: Add support to handle the RMP fault for user address
  2021-08-23 14:36     ` Brijesh Singh
@ 2021-08-23 14:50       ` Dave Hansen
  2021-08-24 16:42         ` Joerg Roedel
  0 siblings, 1 reply; 239+ messages in thread
From: Dave Hansen @ 2021-08-23 14:50 UTC (permalink / raw)
  To: Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On 8/23/21 7:36 AM, Brijesh Singh wrote:
> On 8/23/21 9:20 AM, Dave Hansen wrote:
>> On 8/20/21 8:58 AM, Brijesh Singh wrote:
>>> +static int handle_split_page_fault(struct vm_fault *vmf)
>>> +{
>>> +    if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
>>> +        return VM_FAULT_SIGBUS;
>>> +
>>> +    __split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
>>> +    return 0;
>>> +}
>>
>> We had a whole conversation the last time this was posted about huge
>> page types *other* than THP.  I don't see any comprehension of those
>> types or what would happen if one of those was used with SEV-SNP.
>>
>> What was the result of those review comments?
> 
> Based on previous review comments, Sean was not keen on KVM performing
> this detection and aborting the guest SEV-SNP VM launch. So, I have not
> implemented the check and am waiting for more discussion before
> going at it.

OK.  But, you need to *acknowledge* the situation somewhere.  Maybe the
cover letter of the series, maybe in this changelog.

As it stands, it looks like you're simply ignoring _some_ reviewer feedback.

> An SEV-SNP guest requires the VMM to register the guest backing pages
> before the VM launch. Personally, I would prefer KVM to check the
> backing page type during the registration and fail to register if it is
> hugetlbfs (or similar), to avoid getting into a situation where we could
> not split the hugepage.

It *has* to be done in KVM, IMNHO.

The core kernel really doesn't know much about SEV.  It *really* doesn't
know when its memory is being exposed to a virtualization architecture
that doesn't know how to split TLBs like every single one before it.

This essentially *must* be done at the time that the KVM code realizes
that it's being asked to shove a non-splittable page mapping into the
SEV hardware structures.

The only other alternative is raising a signal from the fault handler
when the page can't be split.  That's a *LOT* nastier because it's so
much later in the process.

It's either that, or figure out a way to split hugetlbfs (and DAX)
mappings in a failsafe way.
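
A minimal sketch of what such a registration-time check could look like
(hypothetical code, not part of this series; is_vm_hugetlb_page() and
vma_is_dax() are existing helpers, the function itself is assumed to run
where KVM first registers/pins the guest backing memory):

	#include <linux/fs.h>
	#include <linux/hugetlb_inline.h>

	/* Hypothetical: reject backings that cannot be split to 4K today. */
	static int snp_check_backing_vma(struct vm_area_struct *vma)
	{
		if (is_vm_hugetlb_page(vma) || vma_is_dax(vma))
			return -EINVAL;
		return 0;
	}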

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 08/45] x86/fault: Add support to handle the RMP fault for user address
  2021-08-23 14:50       ` Dave Hansen
@ 2021-08-24 16:42         ` Joerg Roedel
  2021-08-25  9:16           ` Vlastimil Babka
  0 siblings, 1 reply; 239+ messages in thread
From: Joerg Roedel @ 2021-08-24 16:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On Mon, Aug 23, 2021 at 07:50:22AM -0700, Dave Hansen wrote:
> It *has* to be done in KVM, IMNHO.
> 
> The core kernel really doesn't know much about SEV.  It *really* doesn't
> know when its memory is being exposed to a virtualization architecture
> that doesn't know how to split TLBs like every single one before it.
> 
> This essentially *must* be done at the time that the KVM code realizes
> that it's being asked to shove a non-splittable page mapping into the
> SEV hardware structures.
> 
> The only other alternative is raising a signal from the fault handler
> when the page can't be split.  That's a *LOT* nastier because it's so
> much later in the process.
> 
> It's either that, or figure out a way to split hugetlbfs (and DAX)
> mappings in a failsafe way.

Yes, I agree with that. KVM needs a check to disallow HugeTLB pages in
SEV-SNP guests, at least as a temporary workaround. When HugeTLBfs
mappings can be split into smaller pages the check can be removed.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 08/45] x86/fault: Add support to handle the RMP fault for user address
  2021-08-24 16:42         ` Joerg Roedel
@ 2021-08-25  9:16           ` Vlastimil Babka
  2021-08-25 13:50             ` Tom Lendacky
  0 siblings, 1 reply; 239+ messages in thread
From: Vlastimil Babka @ 2021-08-25  9:16 UTC (permalink / raw)
  To: Joerg Roedel, Dave Hansen
  Cc: Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, Mike Kravetz

On 8/24/21 18:42, Joerg Roedel wrote:
> On Mon, Aug 23, 2021 at 07:50:22AM -0700, Dave Hansen wrote:
>> It *has* to be done in KVM, IMNHO.
>> 
>> The core kernel really doesn't know much about SEV.  It *really* doesn't
>> know when its memory is being exposed to a virtualization architecture
>> that doesn't know how to split TLBs like every single one before it.
>> 
>> This essentially *must* be done at the time that the KVM code realizes
>> that it's being asked to shove a non-splittable page mapping into the
>> SEV hardware structures.
>> 
>> The only other alternative is raising a signal from the fault handler
>> when the page can't be split.  That's a *LOT* nastier because it's so
>> much later in the process.
>> 
>> It's either that, or figure out a way to split hugetlbfs (and DAX)
>> mappings in a failsafe way.
> 
> Yes, I agree with that. KVM needs a check to disallow HugeTLB pages in
> SEV-SNP guests, at least as a temporary workaround. When HugeTLBfs
> mappings can be split into smaller pages the check can be removed.

FTR, this is Sean's reply with concerns in v4:
https://lore.kernel.org/linux-coco/YPCuTiNET%2FhJHqOY@google.com/

I think there are two main arguments there:
- it's not KVM business to decide
- guest may do all page state changes with 2mb granularity so it might be fine
with hugetlb

The latter might become true, but I think it's more probable that hugetlbfs
will sooner learn to split its mappings to base pages - I know people plan to
work on that. At that point qemu will have to recognize whether the host kernel
is a new one that can do this splitting or an older one that can't, preferably
without relying on the kernel version number, as backports exist. Thus, trying
to register a hugetlbfs range that is either rejected (kernel can't split) or
passes (kernel can split) seems like a straightforward way (a sketch follows at
the end of this message). So I'm also in favor of adding that, hopefully
temporary, check.

Vlastimil

> Regards,
> 
> 	Joerg
> 
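
A hedged userspace sketch of that probe (KVM_MEMORY_ENCRYPT_REG_REGION
and struct kvm_enc_region are the existing SEV registration ABI; the
fallback policy is an assumption, not part of this series):

	struct kvm_enc_region region = {
		.addr = (__u64)(unsigned long)hugetlb_buf, /* hugetlbfs-backed */
		.size = region_size,
	};

	if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_REG_REGION, &region) < 0) {
		/*
		 * Kernel presumably cannot split hugetlbfs mappings; fall
		 * back to base-page backing for SNP guest memory.
		 */
	}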


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 08/45] x86/fault: Add support to handle the RMP fault for user address
  2021-08-25  9:16           ` Vlastimil Babka
@ 2021-08-25 13:50             ` Tom Lendacky
  0 siblings, 0 replies; 239+ messages in thread
From: Tom Lendacky @ 2021-08-25 13:50 UTC (permalink / raw)
  To: Vlastimil Babka, Joerg Roedel, Dave Hansen
  Cc: Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Ard Biesheuvel, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, Mike Kravetz

On 8/25/21 4:16 AM, Vlastimil Babka wrote:
> On 8/24/21 18:42, Joerg Roedel wrote:
>> On Mon, Aug 23, 2021 at 07:50:22AM -0700, Dave Hansen wrote:
>>> It *has* to be done in KVM, IMNHO.
>>>
>>> The core kernel really doesn't know much about SEV.  It *really* doesn't
>>> know when its memory is being exposed to a virtualization architecture
>>> that doesn't know how to split TLBs like every single one before it.
>>>
>>> This essentially *must* be done at the time that the KVM code realizes
>>> that it's being asked to shove a non-splittable page mapping into the
>>> SEV hardware structures.
>>>
>>> The only other alternative is raising a signal from the fault handler
>>> when the page can't be split.  That's a *LOT* nastier because it's so
>>> much later in the process.
>>>
>>> It's either that, or figure out a way to split hugetlbfs (and DAX)
>>> mappings in a failsafe way.
>>
>> Yes, I agree with that. KVM needs a check to disallow HugeTLB pages in
>> SEV-SNP guests, at least as a temporary workaround. When HugeTLBfs
>> mappings can be split into smaller pages the check can be removed.
> 
> FTR, this is Sean's reply with concerns in v4:
> https://lore.kernel.org/linux-coco/YPCuTiNET%2FhJHqOY@google.com/
> 
> I think there are two main arguments there:
> - it's not KVM business to decide
> - guest may do all page state changes with 2mb granularity so it might be fine
> with hugetlb
> 
> The latter might become true, but I think it's more probable that sooner
> hugetlbfs will learn to split the mappings to base pages - I know people plan to
> work on that. At that point qemu will have to recognize if the host kernel is
> the new one that can do this splitting vs older one that can't. Preferably
> without relying on kernel version number, as backports exist. Thus, trying to
> register a hugetlbfs range that either is rejected (kernel can't split) or
> passes (kernel can split) seems like a straightforward way. So I'm also in favor
> of adding that, hopefuly temporary, check.

If that's the direction taken, I think we'd be able to use a KVM_CAP_
value that can be queried by the VMM to make the determination.

Thanks,
Tom

> 
> Vlastimil
> 
>> Regards,
>>
>> 	Joerg
>>
> 
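
A hedged sketch of that query (KVM_CHECK_EXTENSION is the existing
mechanism; the capability name and value are hypothetical placeholders,
no such KVM_CAP_* is defined by this series):

	#define KVM_CAP_SNP_SPLITTABLE_HUGETLB	9999	/* hypothetical */

	if (ioctl(vm_fd, KVM_CHECK_EXTENSION,
		  KVM_CAP_SNP_SPLITTABLE_HUGETLB) <= 0) {
		/* Host cannot split hugetlbfs; avoid it for SNP guests. */
	}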

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 17/45] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  2021-08-20 15:58 ` [PATCH Part2 v5 17/45] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Brijesh Singh
@ 2021-09-01 21:02   ` Connor Kuehl
  2021-09-01 23:06     ` Brijesh Singh
  2021-09-10  3:27   ` Marc Orr
  1 sibling, 1 reply; 239+ messages in thread
From: Connor Kuehl @ 2021-09-01 21:02 UTC (permalink / raw)
  To: Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, tfanelli

On 8/20/21 10:58 AM, Brijesh Singh wrote:
> +2.4 SNP_SET_EXT_CONFIG
> +----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_ext_config
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
> +reported TCB version in the attestation report. The command is similar to
> +SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
> +command also accepts an additional certificate blob defined in the GHCB
> +specification.
> +
> +If the certs_address is zero, then the previous certificate blob will be deleted.
> +For more information on the certificate blob layout, see the GHCB spec
> +(extended guest request message).

Hi Brijesh,

Just to be clear, is the documentation you're referring to regarding the
layout of the certificate blob specified on page 47 of the GHCB spec?
More specifically, is it the `struct cert_table` on that page?

https://developer.amd.com/wp-content/resources/56421.pdf

If so, where is the VCEK certificate layout documented?

Connor

> +/**
> + * struct sev_user_data_ext_snp_config - system-wide configuration value for SNP.
> + *
> + * @config_address: address of the struct sev_user_data_snp_config or 0 when
> + *		reported_tcb does not need to be updated.
> + * @certs_address: address of extended guest request certificate chain or
> + *              0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
> + * @certs_len: length of the certs
> + */
> +struct sev_user_data_ext_snp_config {
> +	__u64 config_address;		/* In */
> +	__u64 certs_address;		/* In */
> +	__u32 certs_len;		/* In */
> +};


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 17/45] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  2021-09-01 21:02   ` Connor Kuehl
@ 2021-09-01 23:06     ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-09-01 23:06 UTC (permalink / raw)
  To: Connor Kuehl, x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, tfanelli


On 9/1/21 4:02 PM, Connor Kuehl wrote:
> On 8/20/21 10:58 AM, Brijesh Singh wrote:
>> +2.4 SNP_SET_EXT_CONFIG
>> +----------------------
>> +:Technology: sev-snp
>> +:Type: hypervisor ioctl cmd
>> +:Parameters (in): struct sev_data_snp_ext_config
>> +:Returns (out): 0 on success, -negative on error
>> +
>> +The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
>> +reported TCB version in the attestation report. The command is similar to
>> +SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
>> +command also accepts an additional certificate blob defined in the GHCB
>> +specification.
>> +
>> +If the certs_address is zero, then the previous certificate blob will be deleted.
>> +For more information on the certificate blob layout, see the GHCB spec
>> +(extended guest request message).
> Hi Brijesh,
>
> Just to be clear, is the documentation you're referring to regarding the
> layout of the certificate blob specified on page 47 of the GHCB spec?
> More specifically, is it the `struct cert_table` on that page?

Yes that is correct.


>
> https://developer.amd.com/wp-content/resources/56421.pdf
>
> If so, where is the VCEK certificate layout documented?

You can get the VCEK from the KDS using the chip id. The certificate is
standard X.509.
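
For illustration, AMD's separate KDS interface specification documents
fetching the VCEK over HTTPS using the chip ID and the reported TCB
components; the URL shape is roughly (an assumption based on that
separate spec, not on this patch set):

	https://kdsintf.amd.com/vcek/v1/<product_name>/<hwid>?blSPL=..&teeSPL=..&snpSPL=..&ucodeSPL=..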

thanks

>
> Connor
>
>> +/**
>> + * struct sev_user_data_ext_snp_config - system-wide configuration value for SNP.
>> + *
>> + * @config_address: address of the struct sev_user_data_snp_config or 0 when
>> + *		reported_tcb does not need to be updated.
>> + * @certs_address: address of extended guest request certificate chain or
>> + *              0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
>> + * @certs_len: length of the certs
>> + */
>> +struct sev_user_data_ext_snp_config {
>> +	__u64 config_address;		/* In */
>> +	__u64 certs_address;		/* In */
>> +	__u32 certs_len;		/* In */
>> +};

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2021-08-20 15:58 ` [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command Brijesh Singh
@ 2021-09-05  6:56   ` Dov Murik
  2021-09-05 13:59     ` Brijesh Singh
  2021-09-10  3:32   ` Marc Orr
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 239+ messages in thread
From: Dov Murik @ 2021-09-05  6:56 UTC (permalink / raw)
  To: Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Pavan Kumar Paluri,
	Dov Murik

Hi Brijesh,

On 20/08/2021 18:58, Brijesh Singh wrote:
> The KVM_SNP_INIT command is used by the hypervisor to initialize the
> SEV-SNP platform context. In a typical workflow, this command should be the
> first command issued. When creating SEV-SNP guest, the VMM must use this
> command instead of the KVM_SEV_INIT or KVM_SEV_ES_INIT.
> 
> The flags value must be zero, it will be extended in future SNP support to
> communicate the optional features (such as restricted INT injection etc).
> 
> Co-developed-by: Pavan Kumar Paluri <papaluri@amd.com>
> Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  .../virt/kvm/amd-memory-encryption.rst        | 27 ++++++++++++
>  arch/x86/include/asm/svm.h                    |  2 +
>  arch/x86/kvm/svm/sev.c                        | 44 ++++++++++++++++++-
>  arch/x86/kvm/svm/svm.h                        |  4 ++
>  include/uapi/linux/kvm.h                      | 13 ++++++
>  5 files changed, 88 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 5c081c8c7164..7b1d32fb99a8 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -427,6 +427,33 @@ issued by the hypervisor to make the guest ready for execution.
>  
>  Returns: 0 on success, -negative on error
>  
> +18. KVM_SNP_INIT
> +----------------
> +
> +The KVM_SNP_INIT command can be used by the hypervisor to initialize SEV-SNP
> +context. In a typical workflow, this command should be the first command issued.
> +
> +Parameters (in/out): struct kvm_snp_init
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_snp_init {
> +                __u64 flags;
> +        };
> +
> +The flags bitmap is defined as::
> +
> +   /* enable the restricted injection */
> +   #define KVM_SEV_SNP_RESTRICTED_INJET   (1<<0)
> +
> +   /* enable the restricted injection timer */
> +   #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1<<1)

Typo in these flags: s/INJET/INJECT/

Also in the actual .h file below.


-Dov


> +
> +If the specified flags is not supported then return -EOPNOTSUPP, and the supported
> +flags are returned.
> +
>  References
>  ==========
>  
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 44a3f920f886..a39e31845a33 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -218,6 +218,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  #define SVM_NESTED_CTL_SEV_ENABLE	BIT(1)
>  #define SVM_NESTED_CTL_SEV_ES_ENABLE	BIT(2)
>  
> +#define SVM_SEV_FEAT_SNP_ACTIVE		BIT(0)
> +
>  struct vmcb_seg {
>  	u16 selector;
>  	u16 attrib;
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 50fddbe56981..93da463545ef 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -235,10 +235,30 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
>  	sev_decommission(handle);
>  }
>  
> +static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +	struct kvm_snp_init params;
> +	int ret = 0;
> +
> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> +		return -EFAULT;
> +
> +	if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
> +		ret = -EOPNOTSUPP;
> +
> +	params.flags = SEV_SNP_SUPPORTED_FLAGS;
> +
> +	if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, sizeof(params)))
> +		ret = -EFAULT;
> +
> +	return ret;
> +}
> +
>  static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  {
> +	bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
>  	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> -	bool es_active = argp->id == KVM_SEV_ES_INIT;
> +	bool snp_active = argp->id == KVM_SEV_SNP_INIT;
>  	int asid, ret;
>  
>  	if (kvm->created_vcpus)
> @@ -249,12 +269,22 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  		return ret;
>  
>  	sev->es_active = es_active;
> +	sev->snp_active = snp_active;
>  	asid = sev_asid_new(sev);
>  	if (asid < 0)
>  		goto e_no_asid;
>  	sev->asid = asid;
>  
> -	ret = sev_platform_init(&argp->error);
> +	if (snp_active) {
> +		ret = verify_snp_init_flags(kvm, argp);
> +		if (ret)
> +			goto e_free;
> +
> +		ret = sev_snp_init(&argp->error);
> +	} else {
> +		ret = sev_platform_init(&argp->error);
> +	}
> +
>  	if (ret)
>  		goto e_free;
>  
> @@ -600,6 +630,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>  	save->pkru = svm->vcpu.arch.pkru;
>  	save->xss  = svm->vcpu.arch.ia32_xss;
>  
> +	/* Enable the SEV-SNP feature */
> +	if (sev_snp_guest(svm->vcpu.kvm))
> +		save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
> +
>  	return 0;
>  }
>  
> @@ -1532,6 +1566,12 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  	}
>  
>  	switch (sev_cmd.id) {
> +	case KVM_SEV_SNP_INIT:
> +		if (!sev_snp_enabled) {
> +			r = -ENOTTY;
> +			goto out;
> +		}
> +		fallthrough;
>  	case KVM_SEV_ES_INIT:
>  		if (!sev_es_enabled) {
>  			r = -ENOTTY;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 01953522097d..57c3c404b0b3 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -69,6 +69,9 @@ enum {
>  /* TPR and CR2 are always written before VMRUN */
>  #define VMCB_ALWAYS_DIRTY_MASK	((1U << VMCB_INTR) | (1U << VMCB_CR2))
>  
> +/* Supported init feature flags */
> +#define SEV_SNP_SUPPORTED_FLAGS		0x0
> +
>  struct kvm_sev_info {
>  	bool active;		/* SEV enabled guest */
>  	bool es_active;		/* SEV-ES enabled guest */
> @@ -81,6 +84,7 @@ struct kvm_sev_info {
>  	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
>  	struct kvm *enc_context_owner; /* Owner of copied encryption context */
>  	struct misc_cg *misc_cg; /* For misc cgroup accounting */
> +	u64 snp_init_flags;
>  };
>  
>  struct kvm_svm {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d9e4aabcb31a..944e2bf601fe 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1712,6 +1712,9 @@ enum sev_cmd_id {
>  	/* Guest Migration Extension */
>  	KVM_SEV_SEND_CANCEL,
>  
> +	/* SNP specific commands */
> +	KVM_SEV_SNP_INIT,
> +
>  	KVM_SEV_NR_MAX,
>  };
>  
> @@ -1808,6 +1811,16 @@ struct kvm_sev_receive_update_data {
>  	__u32 trans_len;
>  };
>  
> +/* enable the restricted injection */
> +#define KVM_SEV_SNP_RESTRICTED_INJET   (1 << 0)
> +
> +/* enable the restricted injection timer */
> +#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1 << 1)
> +
> +struct kvm_snp_init {
> +	__u64 flags;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
> 


* Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2021-09-05  6:56   ` Dov Murik
@ 2021-09-05 13:59     ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-09-05 13:59 UTC (permalink / raw)
  To: Dov Murik, x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto
  Cc: Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy, Pavan Kumar Paluri

Hi Dov,


On 9/5/21 1:56 AM, Dov Murik wrote:
> + /* enable the restricted injection */
>> +   #define KVM_SEV_SNP_RESTRICTED_INJET   (1<<0)
>> +
>> +   /* enable the restricted injection timer */
>> +   #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1<<1)
> Typo in these flags: s/INJET/INJECT/
>
> Also in the actual .h file below.

Noted, thanks.



* Re: [PATCH Part2 v5 16/45] crypto: ccp: Add the SNP_PLATFORM_STATUS command
  2021-08-20 15:58 ` [PATCH Part2 v5 16/45] crypto: ccp: Add the SNP_PLATFORM_STATUS command Brijesh Singh
@ 2021-09-10  3:18   ` Marc Orr
  2021-09-13 11:17     ` Brijesh Singh
  2021-09-22 17:35   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 239+ messages in thread
From: Marc Orr @ 2021-09-10  3:18 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, LKML, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 9:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> The command can be used by the userspace to query the SNP platform status
> report. See the SEV-SNP spec for more details.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  Documentation/virt/coco/sevguest.rst | 27 +++++++++++++++++
>  drivers/crypto/ccp/sev-dev.c         | 45 ++++++++++++++++++++++++++++
>  include/uapi/linux/psp-sev.h         |  1 +
>  3 files changed, 73 insertions(+)
>
> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
> index 7acb8696fca4..7c51da010039 100644
> --- a/Documentation/virt/coco/sevguest.rst
> +++ b/Documentation/virt/coco/sevguest.rst
> @@ -52,6 +52,22 @@ to execute due to the firmware error, then fw_err code will be set.
>                  __u64 fw_err;
>          };
>
> +The host ioctls should be issued on the /dev/sev device. The ioctl accepts a
> +command ID and a command input structure.
> +
> +::
> +        struct sev_issue_cmd {
> +                /* Command ID */
> +                __u32 cmd;
> +
> +                /* Command request structure */
> +                __u64 data;
> +
> +                /* firmware error code on failure (see psp-sev.h) */
> +                __u32 error;
> +        };
> +
> +
>  2.1 SNP_GET_REPORT
>  ------------------
>
> @@ -107,3 +123,14 @@ length of the blob is lesser than expected then snp_ext_report_req.certs_len wil
>  be updated with the expected value.
>
>  See GHCB specification for further detail on how to parse the certificate blob.
> +
> +2.3 SNP_PLATFORM_STATUS
> +-----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_platform_status
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
> +status includes API major, minor version and more. See the SEV-SNP
> +specification for further details.
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 4cd7d803a624..16c6df5d412c 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -1394,6 +1394,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
>         return ret;
>  }
>
> +static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
> +{
> +       struct sev_device *sev = psp_master->sev_data;
> +       struct sev_data_snp_platform_status_buf buf;
> +       struct page *status_page;
> +       void *data;
> +       int ret;
> +
> +       if (!sev->snp_inited || !argp->data)
> +               return -EINVAL;
> +
> +       status_page = alloc_page(GFP_KERNEL_ACCOUNT);
> +       if (!status_page)
> +               return -ENOMEM;
> +
> +       data = page_address(status_page);
> +       if (snp_set_rmp_state(__pa(data), 1, true, true, false)) {
> +               __free_pages(status_page, 0);
> +               return -EFAULT;
> +       }
> +
> +       buf.status_paddr = __psp_pa(data);
> +       ret = __sev_do_cmd_locked(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &argp->error);
> +
> +       /* Change the page state before accessing it */
> +       if (snp_set_rmp_state(__pa(data), 1, false, true, true)) {
> +               snp_leak_pages(__pa(data) >> PAGE_SHIFT, 1);

Calling `snp_leak_pages()` here seems wrong, because
`snp_set_rmp_state()` calls `snp_leak_pages()` when it returns an
error.
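
Untested sketch of what I mean -- assuming snp_set_rmp_state() already
leaks the page itself when it fails, the caller can simply bail out:

	/* Change the page state before accessing it */
	if (snp_set_rmp_state(__pa(data), 1, false, true, true)) {
		/* page already leaked by snp_set_rmp_state(); don't free it */
		return -EFAULT;
	}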

> +               return -EFAULT;
> +       }
> +
> +       if (ret)
> +               goto cleanup;
> +
> +       if (copy_to_user((void __user *)argp->data, data,
> +                        sizeof(struct sev_user_data_snp_status)))
> +               ret = -EFAULT;
> +
> +cleanup:
> +       __free_pages(status_page, 0);
> +       return ret;
> +}
> +
>  static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>  {
>         void __user *argp = (void __user *)arg;
> @@ -1445,6 +1487,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>         case SEV_GET_ID2:
>                 ret = sev_ioctl_do_get_id2(&input);
>                 break;
> +       case SNP_PLATFORM_STATUS:
> +               ret = sev_ioctl_snp_platform_status(&input);
> +               break;
>         default:
>                 ret = -EINVAL;
>                 goto out;
> diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
> index bed65a891223..ffd60e8b0a31 100644
> --- a/include/uapi/linux/psp-sev.h
> +++ b/include/uapi/linux/psp-sev.h
> @@ -28,6 +28,7 @@ enum {
>         SEV_PEK_CERT_IMPORT,
>         SEV_GET_ID,     /* This command is deprecated, use SEV_GET_ID2 */
>         SEV_GET_ID2,
> +       SNP_PLATFORM_STATUS,
>
>         SEV_MAX,
>  };
> --
> 2.17.1
>
>


* Re: [PATCH Part2 v5 17/45] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  2021-08-20 15:58 ` [PATCH Part2 v5 17/45] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Brijesh Singh
  2021-09-01 21:02   ` Connor Kuehl
@ 2021-09-10  3:27   ` Marc Orr
  2021-09-13 11:29     ` Brijesh Singh
  1 sibling, 1 reply; 239+ messages in thread
From: Marc Orr @ 2021-09-10  3:27 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, LKML, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy


On Fri, Aug 20, 2021 at 9:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> The SEV-SNP firmware provides the SNP_CONFIG command used to set the
> system-wide configuration value for SNP guests. The information includes
> the TCB version string to be reported in guest attestation reports.
>
> Version 2 of the GHCB specification adds an NAE (SNP extended guest
> request) that a guest can use to query the reports that include additional
> certificates.
>
> In both cases, userspace provided additional data is included in the
> attestation reports. The userspace will use the SNP_SET_EXT_CONFIG
> command to give the certificate blob and the reported TCB version string
> at once. Note that the specification defines certificate blob with a
> specific GUID format; the userspace is responsible for building the
> proper certificate blob. The ioctl treats it as an opaque blob.
>
> While it is not defined in the spec, let's add an SNP_GET_EXT_CONFIG
> command that can be used to obtain the data programmed through the
> SNP_SET_EXT_CONFIG.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  Documentation/virt/coco/sevguest.rst |  28 +++++++
>  drivers/crypto/ccp/sev-dev.c         | 115 +++++++++++++++++++++++++++
>  drivers/crypto/ccp/sev-dev.h         |   3 +
>  include/uapi/linux/psp-sev.h         |  17 ++++
>  4 files changed, 163 insertions(+)
>
> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
> index 7c51da010039..64a1b5167b33 100644
> --- a/Documentation/virt/coco/sevguest.rst
> +++ b/Documentation/virt/coco/sevguest.rst
> @@ -134,3 +134,31 @@ See GHCB specification for further detail on how to parse the certificate blob.
>  The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
>  status includes API major, minor version and more. See the SEV-SNP
>  specification for further details.
> +
> +2.4 SNP_SET_EXT_CONFIG
> +----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_ext_config
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
> +reported TCB version in the attestation report. The command is similar to
> +SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
> +command also accepts an additional certificate blob defined in the GHCB
> +specification.
> +
> +If the certs_address is zero, then the previous certificate blob will be deleted.
> +For more information on the certificate blob layout, see the GHCB spec
> +(extended guest request message).
> +
> +
> +2.5 SNP_GET_EXT_CONFIG
> +----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_ext_config
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_GET_EXT_CONFIG is used to query the system-wide configuration set
> +through the SNP_SET_EXT_CONFIG.
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 16c6df5d412c..9ba194acbe85 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -1132,6 +1132,10 @@ static int __sev_snp_shutdown_locked(int *error)
>         if (!sev->snp_inited)
>                 return 0;
>
> +       /* Free the memory used for caching the certificate data */
> +       kfree(sev->snp_certs_data);
> +       sev->snp_certs_data = NULL;
> +
>         /* SHUTDOWN requires the DF_FLUSH */
>         wbinvd_on_all_cpus();
>         __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
> @@ -1436,6 +1440,111 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
>         return ret;
>  }
>
> +static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
> +{
> +       struct sev_device *sev = psp_master->sev_data;
> +       struct sev_user_data_ext_snp_config input;
> +       int ret;
> +
> +       if (!sev->snp_inited || !argp->data)
> +               return -EINVAL;
> +
> +       if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
> +               return -EFAULT;
> +
> +       /* Copy the TCB version programmed through the SET_CONFIG to userspace */
> +       if (input.config_address) {
> +               if (copy_to_user((void * __user)input.config_address,
> +                                &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
> +                       return -EFAULT;
> +       }
> +
> +       /* Copy the extended certs programmed through the SNP_SET_CONFIG */
> +       if (input.certs_address && sev->snp_certs_data) {
> +               if (input.certs_len < sev->snp_certs_len) {
> +                       /* Return the certs length to userspace */
> +                       input.certs_len = sev->snp_certs_len;

This API to retrieve the length of the certs seems pretty odd. We only
return the length if input.certs_address is non-NULL. But if we don't
already know the length, how did we allocate the buffer whose address we
pass in `input.certs_address`?

> +
> +                       ret = -ENOSR;
> +                       goto e_done;
> +               }
> +
> +               if (copy_to_user((void * __user)input.certs_address,
> +                                sev->snp_certs_data, sev->snp_certs_len))
> +                       return -EFAULT;
> +       }
> +
> +       ret = 0;
> +
> +e_done:
> +       if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
> +               ret = -EFAULT;
> +
> +       return ret;
> +}
> +
> +static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
> +{
> +       struct sev_device *sev = psp_master->sev_data;
> +       struct sev_user_data_ext_snp_config input;
> +       struct sev_user_data_snp_config config;
> +       void *certs = NULL;
> +       int ret = 0;
> +
> +       if (!sev->snp_inited || !argp->data)
> +               return -EINVAL;
> +
> +       if (!writable)
> +               return -EPERM;
> +
> +       if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
> +               return -EFAULT;
> +
> +       /* Copy the certs from userspace */
> +       if (input.certs_address) {
> +               if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
> +                       return -EINVAL;
> +
> +               certs = psp_copy_user_blob(input.certs_address, input.certs_len);

Is `psp_copy_user_blob()` implemented in this patch series? When I
searched through the patches, I only found an implementation that
always returns an error. But maybe I missed the implementation?

Also, out of curiosity, any reason we cannot use copy_from_user here?

> +               if (IS_ERR(certs))
> +                       return PTR_ERR(certs);
> +       }
> +
> +       /* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
> +       if (input.config_address) {
> +               if (copy_from_user(&config,
> +                                  (void __user *)input.config_address, sizeof(config))) {
> +                       ret = -EFAULT;
> +                       goto e_free;
> +               }
> +
> +               ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
> +               if (ret)
> +                       goto e_free;
> +
> +               memcpy(&sev->snp_config, &config, sizeof(config));
> +       }
> +
> +       /*
> +        * If the new certs are passed then cache it else free the old certs.
> +        */
> +       if (certs) {
> +               kfree(sev->snp_certs_data);
> +               sev->snp_certs_data = certs;
> +               sev->snp_certs_len = input.certs_len;
> +       } else {
> +               kfree(sev->snp_certs_data);
> +               sev->snp_certs_data = NULL;
> +               sev->snp_certs_len = 0;
> +       }
> +
> +       return 0;
> +
> +e_free:
> +       kfree(certs);
> +       return ret;
> +}
> +
>  static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>  {
>         void __user *argp = (void __user *)arg;
> @@ -1490,6 +1599,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>         case SNP_PLATFORM_STATUS:
>                 ret = sev_ioctl_snp_platform_status(&input);
>                 break;
> +       case SNP_SET_EXT_CONFIG:
> +               ret = sev_ioctl_snp_set_config(&input, writable);
> +               break;
> +       case SNP_GET_EXT_CONFIG:
> +               ret = sev_ioctl_snp_get_config(&input);
> +               break;

What is the intended use of `SNP_GET_EXT_CONFIG`? Yes, I get that it
returns the "EXT config" previously set via `SNP_SET_EXT_CONFIG`. But
presumably the caller can keep track of what it's previously passed to
`SNP_SET_EXT_CONFIG`. Does it really need to call into the kernel to
get these certs?

>         default:
>                 ret = -EINVAL;
>                 goto out;
> diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
> index fe5d7a3ebace..d2fe1706311a 100644
> --- a/drivers/crypto/ccp/sev-dev.h
> +++ b/drivers/crypto/ccp/sev-dev.h
> @@ -66,6 +66,9 @@ struct sev_device {
>
>         bool snp_inited;
>         struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
> +       void *snp_certs_data;
> +       u32 snp_certs_len;
> +       struct sev_user_data_snp_config snp_config;
>  };
>
>  int sev_dev_init(struct psp_device *psp);
> diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
> index ffd60e8b0a31..60e7a8d1a18e 100644
> --- a/include/uapi/linux/psp-sev.h
> +++ b/include/uapi/linux/psp-sev.h
> @@ -29,6 +29,8 @@ enum {
>         SEV_GET_ID,     /* This command is deprecated, use SEV_GET_ID2 */
>         SEV_GET_ID2,
>         SNP_PLATFORM_STATUS,
> +       SNP_SET_EXT_CONFIG,
> +       SNP_GET_EXT_CONFIG,
>
>         SEV_MAX,
>  };
> @@ -190,6 +192,21 @@ struct sev_user_data_snp_config {
>         __u8 rsvd[52];
>  } __packed;
>
> +/**
> + * struct sev_data_snp_ext_config - system wide configuration value for SNP.
> + *
> + * @config_address: address of the struct sev_user_data_snp_config or 0 when
> + *             reported_tcb does not need to be updated.
> + * @certs_address: address of extended guest request certificate chain or
> + *              0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
> + * @certs_len: length of the certs
> + */
> +struct sev_user_data_ext_snp_config {
> +       __u64 config_address;           /* In */
> +       __u64 certs_address;            /* In */
> +       __u32 certs_len;                /* In */
> +};
> +
>  /**
>   * struct sev_issue_cmd - SEV ioctl parameters
>   *
> --
> 2.17.1
>


* Re: [PATCH Part2 v5 18/45] crypto: ccp: Provide APIs to query extended attestation report
  2021-08-20 15:58 ` [PATCH Part2 v5 18/45] crypto: ccp: Provide APIs to query extended attestation report Brijesh Singh
@ 2021-09-10  3:30   ` Marc Orr
  2021-09-12  7:46     ` Dov Murik
  0 siblings, 1 reply; 239+ messages in thread
From: Marc Orr @ 2021-09-10  3:30 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, LKML, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 9:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> Version 2 of the GHCB specification defines VMGEXIT that is used to get
> the extended attestation report. The extended attestation report includes
> the certificate blobs provided through the SNP_SET_EXT_CONFIG.
>
> The snp_guest_ext_guest_request() will be used by the hypervisor to get
> the extended attestation report. See the GHCB specification for more
> details.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  drivers/crypto/ccp/sev-dev.c | 43 ++++++++++++++++++++++++++++++++++++
>  include/linux/psp-sev.h      | 24 ++++++++++++++++++++
>  2 files changed, 67 insertions(+)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 9ba194acbe85..e2650c3d0d0a 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -22,6 +22,7 @@
>  #include <linux/firmware.h>
>  #include <linux/gfp.h>
>  #include <linux/cpufeature.h>
> +#include <linux/sev-guest.h>
>
>  #include <asm/smp.h>
>
> @@ -1677,6 +1678,48 @@ int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
>  }
>  EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt);
>
> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> +                               unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
> +{
> +       unsigned long expected_npages;
> +       struct sev_device *sev;
> +       int rc;
> +
> +       if (!psp_master || !psp_master->sev_data)
> +               return -ENODEV;
> +
> +       sev = psp_master->sev_data;
> +
> +       if (!sev->snp_inited)
> +               return -EINVAL;
> +
> +       /*
> +        * Check if there is enough space to copy the certificate chain. Otherwise
> +        * return ERROR code defined in the GHCB specification.
> +        */
> +       expected_npages = sev->snp_certs_len >> PAGE_SHIFT;

Is this calculation for `expected_npages` correct? Assume that
`sev->snp_certs_len` is less than a page (e.g., 2000). Then, this
calculation will return `0` for `expected_npages`, rather than round
up to 1.

> +       if (*npages < expected_npages) {
> +               *npages = expected_npages;
> +               *fw_err = SNP_GUEST_REQ_INVALID_LEN;
> +               return -EINVAL;
> +       }
> +
> +       rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)&fw_err);
> +       if (rc)
> +               return rc;
> +
> +       /* Copy the certificate blob */
> +       if (sev->snp_certs_data) {
> +               *npages = expected_npages;
> +               memcpy((void *)vaddr, sev->snp_certs_data, *npages << PAGE_SHIFT);
> +       } else {
> +               *npages = 0;
> +       }
> +
> +       return rc;
> +}
> +EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);
> +
>  static void sev_exit(struct kref *ref)
>  {
>         misc_deregister(&misc_dev->misc);
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 00bd684dc094..ea94ce4d834a 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -924,6 +924,23 @@ void *psp_copy_user_blob(u64 uaddr, u32 len);
>  void *snp_alloc_firmware_page(gfp_t mask);
>  void snp_free_firmware_page(void *addr);
>
> +/**
> + * snp_guest_ext_guest_request - perform the SNP extended guest request command
> + *  defined in the GHCB specification.
> + *
> + * @data: the input guest request structure
> + * @vaddr: address where the certificate blob needs to be copied.
> + * @npages: number of pages for the certificate blob.
> + *    If the specified page count is less than the certificate blob size, then the
> + *    required page count is returned with error code defined in the GHCB spec.
> + *    If the specified page count is more than the certificate blob size, then
> + *    page count is updated to reflect the amount of valid data copied in the
> + *    vaddr.
> + */
> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> +                               unsigned long vaddr, unsigned long *npages,
> +                               unsigned long *error);
> +
>  #else  /* !CONFIG_CRYPTO_DEV_SP_PSP */
>
>  static inline int
> @@ -971,6 +988,13 @@ static inline void *snp_alloc_firmware_page(gfp_t mask)
>
>  static inline void snp_free_firmware_page(void *addr) { }
>
> +static inline int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
> +                                             unsigned long vaddr, unsigned long *n,
> +                                             unsigned long *error)
> +{
> +       return -ENODEV;
> +}
> +
>  #endif /* CONFIG_CRYPTO_DEV_SP_PSP */
>
>  #endif /* __PSP_SEV_H__ */
> --
> 2.17.1
>


* Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2021-08-20 15:58 ` [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command Brijesh Singh
  2021-09-05  6:56   ` Dov Murik
@ 2021-09-10  3:32   ` Marc Orr
  2021-09-13 11:32     ` Brijesh Singh
  2021-09-16 15:50   ` Peter Gonda
  2022-06-13 20:58   ` Alper Gun
  3 siblings, 1 reply; 239+ messages in thread
From: Marc Orr @ 2021-09-10  3:32 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, LKML, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy, Pavan Kumar Paluri

On Fri, Aug 20, 2021 at 9:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> The KVM_SNP_INIT command is used by the hypervisor to initialize the
> SEV-SNP platform context. In a typical workflow, this command should be the
> first command issued. When creating an SEV-SNP guest, the VMM must use this
> command instead of the KVM_SEV_INIT or KVM_SEV_ES_INIT.
>
> The flags value must be zero; it will be extended in future SNP support to
> communicate the optional features (such as restricted INT injection, etc.).
>
> Co-developed-by: Pavan Kumar Paluri <papaluri@amd.com>
> Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  .../virt/kvm/amd-memory-encryption.rst        | 27 ++++++++++++
>  arch/x86/include/asm/svm.h                    |  2 +
>  arch/x86/kvm/svm/sev.c                        | 44 ++++++++++++++++++-
>  arch/x86/kvm/svm/svm.h                        |  4 ++
>  include/uapi/linux/kvm.h                      | 13 ++++++
>  5 files changed, 88 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 5c081c8c7164..7b1d32fb99a8 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -427,6 +427,33 @@ issued by the hypervisor to make the guest ready for execution.
>
>  Returns: 0 on success, -negative on error
>
> +18. KVM_SNP_INIT
> +----------------
> +
> +The KVM_SNP_INIT command can be used by the hypervisor to initialize SEV-SNP
> +context. In a typical workflow, this command should be the first command issued.
> +
> +Parameters (in/out): struct kvm_snp_init
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_snp_init {
> +                __u64 flags;
> +        };
> +
> +The flags bitmap is defined as::
> +
> +   /* enable the restricted injection */
> +   #define KVM_SEV_SNP_RESTRICTED_INJET   (1<<0)
> +
> +   /* enable the restricted injection timer */
> +   #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1<<1)
> +
> +If the specified flags are not supported, then -EOPNOTSUPP is returned and the
> +supported flags are returned.
> +
>  References
>  ==========
>
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 44a3f920f886..a39e31845a33 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -218,6 +218,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  #define SVM_NESTED_CTL_SEV_ENABLE      BIT(1)
>  #define SVM_NESTED_CTL_SEV_ES_ENABLE   BIT(2)
>
> +#define SVM_SEV_FEAT_SNP_ACTIVE                BIT(0)
> +
>  struct vmcb_seg {
>         u16 selector;
>         u16 attrib;
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 50fddbe56981..93da463545ef 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -235,10 +235,30 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
>         sev_decommission(handle);
>  }
>
> +static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +       struct kvm_snp_init params;
> +       int ret = 0;
> +
> +       if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> +               return -EFAULT;
> +
> +       if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
> +               ret = -EOPNOTSUPP;
> +
> +       params.flags = SEV_SNP_SUPPORTED_FLAGS;
> +
> +       if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, sizeof(params)))
> +               ret = -EFAULT;
> +
> +       return ret;
> +}
> +
>  static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  {
> +       bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
>         struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> -       bool es_active = argp->id == KVM_SEV_ES_INIT;
> +       bool snp_active = argp->id == KVM_SEV_SNP_INIT;
>         int asid, ret;
>
>         if (kvm->created_vcpus)
> @@ -249,12 +269,22 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>                 return ret;
>
>         sev->es_active = es_active;
> +       sev->snp_active = snp_active;
>         asid = sev_asid_new(sev);
>         if (asid < 0)
>                 goto e_no_asid;
>         sev->asid = asid;
>
> -       ret = sev_platform_init(&argp->error);
> +       if (snp_active) {
> +               ret = verify_snp_init_flags(kvm, argp);
> +               if (ret)
> +                       goto e_free;
> +
> +               ret = sev_snp_init(&argp->error);
> +       } else {
> +               ret = sev_platform_init(&argp->error);
> +       }
> +
>         if (ret)
>                 goto e_free;
>
> @@ -600,6 +630,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>         save->pkru = svm->vcpu.arch.pkru;
>         save->xss  = svm->vcpu.arch.ia32_xss;
>
> +       /* Enable the SEV-SNP feature */
> +       if (sev_snp_guest(svm->vcpu.kvm))
> +               save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
> +
>         return 0;
>  }
>
> @@ -1532,6 +1566,12 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>         }
>
>         switch (sev_cmd.id) {
> +       case KVM_SEV_SNP_INIT:
> +               if (!sev_snp_enabled) {
> +                       r = -ENOTTY;
> +                       goto out;
> +               }
> +               fallthrough;
>         case KVM_SEV_ES_INIT:
>                 if (!sev_es_enabled) {
>                         r = -ENOTTY;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 01953522097d..57c3c404b0b3 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -69,6 +69,9 @@ enum {
>  /* TPR and CR2 are always written before VMRUN */
>  #define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))
>
> +/* Supported init feature flags */
> +#define SEV_SNP_SUPPORTED_FLAGS                0x0
> +
>  struct kvm_sev_info {
>         bool active;            /* SEV enabled guest */
>         bool es_active;         /* SEV-ES enabled guest */
> @@ -81,6 +84,7 @@ struct kvm_sev_info {
>         u64 ap_jump_table;      /* SEV-ES AP Jump Table address */
>         struct kvm *enc_context_owner; /* Owner of copied encryption context */
>         struct misc_cg *misc_cg; /* For misc cgroup accounting */
> +       u64 snp_init_flags;

This field never gets set anywhere. Should it get set in
`verify_snp_init_flags()`?
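
E.g., something like this (untested, and assuming the field is meant to
record the flags the VMM requested) right after the
`params.flags & ~SEV_SNP_SUPPORTED_FLAGS` check, before params.flags is
overwritten:

	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

	/* remember which init flags the VMM asked for */
	sev->snp_init_flags = params.flags;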

>  };
>
>  struct kvm_svm {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d9e4aabcb31a..944e2bf601fe 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1712,6 +1712,9 @@ enum sev_cmd_id {
>         /* Guest Migration Extension */
>         KVM_SEV_SEND_CANCEL,
>
> +       /* SNP specific commands */
> +       KVM_SEV_SNP_INIT,
> +
>         KVM_SEV_NR_MAX,
>  };
>
> @@ -1808,6 +1811,16 @@ struct kvm_sev_receive_update_data {
>         __u32 trans_len;
>  };
>
> +/* enable the restricted injection */
> +#define KVM_SEV_SNP_RESTRICTED_INJET   (1 << 0)
> +
> +/* enable the restricted injection timer */
> +#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1 << 1)
> +
> +struct kvm_snp_init {
> +       __u64 flags;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3         (1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX       (1 << 2)
> --
> 2.17.1
>
>


* Re: [PATCH Part2 v5 18/45] crypto: ccp: Provide APIs to query extended attestation report
  2021-09-10  3:30   ` Marc Orr
@ 2021-09-12  7:46     ` Dov Murik
  0 siblings, 0 replies; 239+ messages in thread
From: Dov Murik @ 2021-09-12  7:46 UTC (permalink / raw)
  To: Marc Orr, Brijesh Singh
  Cc: x86, LKML, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy, Dov Murik



On 10/09/2021 6:30, Marc Orr wrote:
> On Fri, Aug 20, 2021 at 9:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>>
>> Version 2 of the GHCB specification defines VMGEXIT that is used to get
>> the extended attestation report. The extended attestation report includes
>> the certificate blobs provided through the SNP_SET_EXT_CONFIG.
>>
>> The snp_guest_ext_guest_request() will be used by the hypervisor to get
>> the extended attestation report. See the GHCB specification for more
>> details.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  drivers/crypto/ccp/sev-dev.c | 43 ++++++++++++++++++++++++++++++++++++
>>  include/linux/psp-sev.h      | 24 ++++++++++++++++++++
>>  2 files changed, 67 insertions(+)
>>
>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>> index 9ba194acbe85..e2650c3d0d0a 100644
>> --- a/drivers/crypto/ccp/sev-dev.c
>> +++ b/drivers/crypto/ccp/sev-dev.c
>> @@ -22,6 +22,7 @@
>>  #include <linux/firmware.h>
>>  #include <linux/gfp.h>
>>  #include <linux/cpufeature.h>
>> +#include <linux/sev-guest.h>
>>
>>  #include <asm/smp.h>
>>
>> @@ -1677,6 +1678,48 @@ int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
>>  }
>>  EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt);
>>
>> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
>> +                               unsigned long vaddr, unsigned long *npages, unsigned long *fw_err)
>> +{
>> +       unsigned long expected_npages;
>> +       struct sev_device *sev;
>> +       int rc;
>> +
>> +       if (!psp_master || !psp_master->sev_data)
>> +               return -ENODEV;
>> +
>> +       sev = psp_master->sev_data;
>> +
>> +       if (!sev->snp_inited)
>> +               return -EINVAL;
>> +
>> +       /*
>> +        * Check if there is enough space to copy the certificate chain. Otherwise
>> +        * return ERROR code defined in the GHCB specification.
>> +        */
>> +       expected_npages = sev->snp_certs_len >> PAGE_SHIFT;
> 
> Is this calculation for `expected_npages` correct? Assume that
> `sev->snp_certs_len` is less than a page (e.g., 2000). Then, this
> calculation will return `0` for `expected_npages`, rather than round
> up to 1.
> 

In patch 17, in sev_ioctl_snp_set_config(), there's a check that certs_len
is aligned to PAGE_SIZE (and this value is later in that function assigned
to sev->snp_certs_len):

+	/* Copy the certs from userspace */
+	if (input.certs_address) {
+		if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
+			return -EINVAL;

but I agree that rounding up (DIV_ROUND_UP) or asserting that
sev->snp_certs_len is aligned to PAGE_SIZE (and non-zero) makes
sense here.
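
I.e., roughly:

	expected_npages = DIV_ROUND_UP(sev->snp_certs_len, PAGE_SIZE);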


-Dov


>> +       if (*npages < expected_npages) {
>> +               *npages = expected_npages;
>> +               *fw_err = SNP_GUEST_REQ_INVALID_LEN;
>> +               return -EINVAL;
>> +       }
>> +
>> +       rc = sev_do_cmd(SEV_CMD_SNP_GUEST_REQUEST, data, (int *)&fw_err);
>> +       if (rc)
>> +               return rc;
>> +
>> +       /* Copy the certificate blob */
>> +       if (sev->snp_certs_data) {
>> +               *npages = expected_npages;
>> +               memcpy((void *)vaddr, sev->snp_certs_data, *npages << PAGE_SHIFT);
>> +       } else {
>> +               *npages = 0;
>> +       }
>> +
>> +       return rc;
>> +}
>> +EXPORT_SYMBOL_GPL(snp_guest_ext_guest_request);
>> +
>>  static void sev_exit(struct kref *ref)
>>  {
>>         misc_deregister(&misc_dev->misc);
>> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
>> index 00bd684dc094..ea94ce4d834a 100644
>> --- a/include/linux/psp-sev.h
>> +++ b/include/linux/psp-sev.h
>> @@ -924,6 +924,23 @@ void *psp_copy_user_blob(u64 uaddr, u32 len);
>>  void *snp_alloc_firmware_page(gfp_t mask);
>>  void snp_free_firmware_page(void *addr);
>>
>> +/**
>> + * snp_guest_ext_guest_request - perform the SNP extended guest request command
>> + *  defined in the GHCB specification.
>> + *
>> + * @data: the input guest request structure
>> + * @vaddr: address where the certificate blob needs to be copied.
>> + * @npages: number of pages for the certificate blob.
>> + *    If the specified page count is less than the certificate blob size, then the
>> + *    required page count is returned with error code defined in the GHCB spec.
>> + *    If the specified page count is more than the certificate blob size, then
>> + *    page count is updated to reflect the amount of valid data copied in the
>> + *    vaddr.
>> + */
>> +int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
>> +                               unsigned long vaddr, unsigned long *npages,
>> +                               unsigned long *error);
>> +
>>  #else  /* !CONFIG_CRYPTO_DEV_SP_PSP */
>>
>>  static inline int
>> @@ -971,6 +988,13 @@ static inline void *snp_alloc_firmware_page(gfp_t mask)
>>
>>  static inline void snp_free_firmware_page(void *addr) { }
>>
>> +static inline int snp_guest_ext_guest_request(struct sev_data_snp_guest_request *data,
>> +                                             unsigned long vaddr, unsigned long *n,
>> +                                             unsigned long *error)
>> +{
>> +       return -ENODEV;
>> +}
>> +
>>  #endif /* CONFIG_CRYPTO_DEV_SP_PSP */
>>
>>  #endif /* __PSP_SEV_H__ */
>> --
>> 2.17.1
>>


* Re: [PATCH Part2 v5 16/45] crypto: ccp: Add the SNP_PLATFORM_STATUS command
  2021-09-10  3:18   ` Marc Orr
@ 2021-09-13 11:17     ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-09-13 11:17 UTC (permalink / raw)
  To: Marc Orr
  Cc: x86, LKML, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy


On 9/9/21 10:18 PM, Marc Orr wrote:
> On Fri, Aug 20, 2021 at 9:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>> The command can be used by the userspace to query the SNP platform status
>> report. See the SEV-SNP spec for more details.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  Documentation/virt/coco/sevguest.rst | 27 +++++++++++++++++
>>  drivers/crypto/ccp/sev-dev.c         | 45 ++++++++++++++++++++++++++++
>>  include/uapi/linux/psp-sev.h         |  1 +
>>  3 files changed, 73 insertions(+)
>>
>> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
>> index 7acb8696fca4..7c51da010039 100644
>> --- a/Documentation/virt/coco/sevguest.rst
>> +++ b/Documentation/virt/coco/sevguest.rst
>> @@ -52,6 +52,22 @@ to execute due to the firmware error, then fw_err code will be set.
>>                  __u64 fw_err;
>>          };
>>
>> +The host ioctls should be issued on the /dev/sev device. The ioctl accepts a
>> +command ID and a command input structure.
>> +
>> +::
>> +        struct sev_issue_cmd {
>> +                /* Command ID */
>> +                __u32 cmd;
>> +
>> +                /* Command request structure */
>> +                __u64 data;
>> +
>> +                /* firmware error code on failure (see psp-sev.h) */
>> +                __u32 error;
>> +        };
>> +
>> +
>>  2.1 SNP_GET_REPORT
>>  ------------------
>>
>> @@ -107,3 +123,14 @@ length of the blob is lesser than expected then snp_ext_report_req.certs_len wil
>>  be updated with the expected value.
>>
>>  See GHCB specification for further detail on how to parse the certificate blob.
>> +
>> +2.3 SNP_PLATFORM_STATUS
>> +-----------------------
>> +:Technology: sev-snp
>> +:Type: hypervisor ioctl cmd
>> +:Parameters (in): struct sev_data_snp_platform_status
>> +:Returns (out): 0 on success, -negative on error
>> +
>> +The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
>> +status includes API major, minor version and more. See the SEV-SNP
>> +specification for further details.
>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>> index 4cd7d803a624..16c6df5d412c 100644
>> --- a/drivers/crypto/ccp/sev-dev.c
>> +++ b/drivers/crypto/ccp/sev-dev.c
>> @@ -1394,6 +1394,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
>>         return ret;
>>  }
>>
>> +static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
>> +{
>> +       struct sev_device *sev = psp_master->sev_data;
>> +       struct sev_data_snp_platform_status_buf buf;
>> +       struct page *status_page;
>> +       void *data;
>> +       int ret;
>> +
>> +       if (!sev->snp_inited || !argp->data)
>> +               return -EINVAL;
>> +
>> +       status_page = alloc_page(GFP_KERNEL_ACCOUNT);
>> +       if (!status_page)
>> +               return -ENOMEM;
>> +
>> +       data = page_address(status_page);
>> +       if (snp_set_rmp_state(__pa(data), 1, true, true, false)) {
>> +               __free_pages(status_page, 0);
>> +               return -EFAULT;
>> +       }
>> +
>> +       buf.status_paddr = __psp_pa(data);
>> +       ret = __sev_do_cmd_locked(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &argp->error);
>> +
>> +       /* Change the page state before accessing it */
>> +       if (snp_set_rmp_state(__pa(data), 1, false, true, true)) {
>> +               snp_leak_pages(__pa(data) >> PAGE_SHIFT, 1);
> Calling `snp_leak_pages()` here seems wrong, because
> `snp_set_rmp_state()` calls `snp_leak_pages()` when it returns an
> error.

Agreed, I will fix it in the next rev.


>
>> +               return -EFAULT;
>> +       }
>> +
>> +       if (ret)
>> +               goto cleanup;
>> +
>> +       if (copy_to_user((void __user *)argp->data, data,
>> +                        sizeof(struct sev_user_data_snp_status)))
>> +               ret = -EFAULT;
>> +
>> +cleanup:
>> +       __free_pages(status_page, 0);
>> +       return ret;
>> +}
>> +
>>  static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>>  {
>>         void __user *argp = (void __user *)arg;
>> @@ -1445,6 +1487,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>>         case SEV_GET_ID2:
>>                 ret = sev_ioctl_do_get_id2(&input);
>>                 break;
>> +       case SNP_PLATFORM_STATUS:
>> +               ret = sev_ioctl_snp_platform_status(&input);
>> +               break;
>>         default:
>>                 ret = -EINVAL;
>>                 goto out;
>> diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
>> index bed65a891223..ffd60e8b0a31 100644
>> --- a/include/uapi/linux/psp-sev.h
>> +++ b/include/uapi/linux/psp-sev.h
>> @@ -28,6 +28,7 @@ enum {
>>         SEV_PEK_CERT_IMPORT,
>>         SEV_GET_ID,     /* This command is deprecated, use SEV_GET_ID2 */
>>         SEV_GET_ID2,
>> +       SNP_PLATFORM_STATUS,
>>
>>         SEV_MAX,
>>  };
>> --
>> 2.17.1
>>
>>


* Re: [PATCH Part2 v5 17/45] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
  2021-09-10  3:27   ` Marc Orr
@ 2021-09-13 11:29     ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-09-13 11:29 UTC (permalink / raw)
  To: Marc Orr
  Cc: x86, LKML, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy


On 9/9/21 10:27 PM, Marc Orr wrote:
> `
>
> On Fri, Aug 20, 2021 at 9:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>> The SEV-SNP firmware provides the SNP_CONFIG command used to set the
>> system-wide configuration value for SNP guests. The information includes
>> the TCB version string to be reported in guest attestation reports.
>>
>> Version 2 of the GHCB specification adds an NAE (SNP extended guest
>> request) that a guest can use to query the reports that include additional
>> certificates.
>>
>> In both cases, userspace provided additional data is included in the
>> attestation reports. The userspace will use the SNP_SET_EXT_CONFIG
>> command to give the certificate blob and the reported TCB version string
>> at once. Note that the specification defines certificate blob with a
>> specific GUID format; the userspace is responsible for building the
>> proper certificate blob. The ioctl treats it as an opaque blob.
>>
>> While it is not defined in the spec, let's add an SNP_GET_EXT_CONFIG
>> command that can be used to obtain the data programmed through the
>> SNP_SET_EXT_CONFIG.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  Documentation/virt/coco/sevguest.rst |  28 +++++++
>>  drivers/crypto/ccp/sev-dev.c         | 115 +++++++++++++++++++++++++++
>>  drivers/crypto/ccp/sev-dev.h         |   3 +
>>  include/uapi/linux/psp-sev.h         |  17 ++++
>>  4 files changed, 163 insertions(+)
>>
>> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
>> index 7c51da010039..64a1b5167b33 100644
>> --- a/Documentation/virt/coco/sevguest.rst
>> +++ b/Documentation/virt/coco/sevguest.rst
>> @@ -134,3 +134,31 @@ See GHCB specification for further detail on how to parse the certificate blob.
>>  The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
>>  status includes API major, minor version and more. See the SEV-SNP
>>  specification for further details.
>> +
>> +2.4 SNP_SET_EXT_CONFIG
>> +----------------------
>> +:Technology: sev-snp
>> +:Type: hypervisor ioctl cmd
>> +:Parameters (in): struct sev_data_snp_ext_config
>> +:Returns (out): 0 on success, -negative on error
>> +
>> +The SNP_SET_EXT_CONFIG is used to set the system-wide configuration such as
>> +reported TCB version in the attestation report. The command is similar to
>> +SNP_CONFIG command defined in the SEV-SNP spec. The main difference is the
>> +command also accepts an additional certificate blob defined in the GHCB
>> +specification.
>> +
>> +If the certs_address is zero, then the previous certificate blob will be deleted.
>> +For more information on the certificate blob layout, see the GHCB spec
>> +(extended guest request message).
>> +
>> +
>> +2.5 SNP_GET_EXT_CONFIG
>> +----------------------
>> +:Technology: sev-snp
>> +:Type: hypervisor ioctl cmd
>> +:Parameters (in): struct sev_data_snp_ext_config
>> +:Returns (out): 0 on success, -negative on error
>> +
>> +The SNP_GET_EXT_CONFIG is used to query the system-wide configuration set
>> +through the SNP_SET_EXT_CONFIG.
>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>> index 16c6df5d412c..9ba194acbe85 100644
>> --- a/drivers/crypto/ccp/sev-dev.c
>> +++ b/drivers/crypto/ccp/sev-dev.c
>> @@ -1132,6 +1132,10 @@ static int __sev_snp_shutdown_locked(int *error)
>>         if (!sev->snp_inited)
>>                 return 0;
>>
>> +       /* Free the memory used for caching the certificate data */
>> +       kfree(sev->snp_certs_data);
>> +       sev->snp_certs_data = NULL;
>> +
>>         /* SHUTDOWN requires the DF_FLUSH */
>>         wbinvd_on_all_cpus();
>>         __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
>> @@ -1436,6 +1440,111 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
>>         return ret;
>>  }
>>
>> +static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
>> +{
>> +       struct sev_device *sev = psp_master->sev_data;
>> +       struct sev_user_data_ext_snp_config input;
>> +       int ret;
>> +
>> +       if (!sev->snp_inited || !argp->data)
>> +               return -EINVAL;
>> +
>> +       if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
>> +               return -EFAULT;
>> +
>> +       /* Copy the TCB version programmed through the SET_CONFIG to userspace */
>> +       if (input.config_address) {
>> +               if (copy_to_user((void * __user)input.config_address,
>> +                                &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
>> +                       return -EFAULT;
>> +       }
>> +
>> +       /* Copy the extended certs programmed through the SNP_SET_CONFIG */
>> +       if (input.certs_address && sev->snp_certs_data) {
>> +               if (input.certs_len < sev->snp_certs_len) {
>> +                       /* Return the certs length to userspace */
>> +                       input.certs_len = sev->snp_certs_len;
> This API to retrieve the length of the certs seems pretty odd. We only
> return the length if input.certs_address is non-NULL. But if we don't
> already know the length, how did we allocate the buffer whose address we
> pass in `input.certs_address`?

Ah, good point. I should provide an option to query the length when
input.certs_address == 0. That will make it much cleaner; there will then
be two clear approaches to get the length.
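
Something like this (untested sketch) in the certs handling path of
sev_ioctl_snp_get_config() should do it:

	/* a pure length query: no destination buffer was supplied */
	if (!input.certs_address && sev->snp_certs_data) {
		input.certs_len = sev->snp_certs_len;
		goto e_done;
	}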


>> +
>> +                       ret = -ENOSR;
>> +                       goto e_done;
>> +               }
>> +
>> +               if (copy_to_user((void * __user)input.certs_address,
>> +                                sev->snp_certs_data, sev->snp_certs_len))
>> +                       return -EFAULT;
>> +       }
>> +
>> +       ret = 0;
>> +
>> +e_done:
>> +       if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
>> +               ret = -EFAULT;
>> +
>> +       return ret;
>> +}
>> +
>> +static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
>> +{
>> +       struct sev_device *sev = psp_master->sev_data;
>> +       struct sev_user_data_ext_snp_config input;
>> +       struct sev_user_data_snp_config config;
>> +       void *certs = NULL;
>> +       int ret = 0;
>> +
>> +       if (!sev->snp_inited || !argp->data)
>> +               return -EINVAL;
>> +
>> +       if (!writable)
>> +               return -EPERM;
>> +
>> +       if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
>> +               return -EFAULT;
>> +
>> +       /* Copy the certs from userspace */
>> +       if (input.certs_address) {
>> +               if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
>> +                       return -EINVAL;
>> +
>> +               certs = psp_copy_user_blob(input.certs_address, input.certs_len);
> Is `psp_copy_user_blob()` implemented in this patch series? When I
> searched through the patches, I only found an implementation that
> always returns an error. But maybe I missed the implementation?
>
> Also, out of curiosity, any reason we cannot use copy_from_user here?
psp_copy_user_blob() is a wrapper around memdup_user() -- which does
kmalloc() + copy_from_user(). The wrapper performs some additional checks,
such as validating the blob size; we limit the blob size to 16K to avoid
copying arbitrarily large data from userspace.
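
For reference, the implementation is roughly:

	void *psp_copy_user_blob(u64 uaddr, u32 len)
	{
		if (!uaddr || !len)
			return ERR_PTR(-EINVAL);

		/* verify that blob length does not exceed our limit (16K) */
		if (len > SEV_FW_BLOB_MAX_SIZE)
			return ERR_PTR(-EINVAL);

		return memdup_user((void __user *)uaddr, len);
	}
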
>> +               if (IS_ERR(certs))
>> +                       return PTR_ERR(certs);
>> +       }
>> +
>> +       /* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
>> +       if (input.config_address) {
>> +               if (copy_from_user(&config,
>> +                                  (void __user *)input.config_address, sizeof(config))) {
>> +                       ret = -EFAULT;
>> +                       goto e_free;
>> +               }
>> +
>> +               ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
>> +               if (ret)
>> +                       goto e_free;
>> +
>> +               memcpy(&sev->snp_config, &config, sizeof(config));
>> +       }
>> +
>> +       /*
>> +        * If the new certs are passed then cache it else free the old certs.
>> +        */
>> +       if (certs) {
>> +               kfree(sev->snp_certs_data);
>> +               sev->snp_certs_data = certs;
>> +               sev->snp_certs_len = input.certs_len;
>> +       } else {
>> +               kfree(sev->snp_certs_data);
>> +               sev->snp_certs_data = NULL;
>> +               sev->snp_certs_len = 0;
>> +       }
>> +
>> +       return 0;
>> +
>> +e_free:
>> +       kfree(certs);
>> +       return ret;
>> +}
>> +
>>  static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>>  {
>>         void __user *argp = (void __user *)arg;
>> @@ -1490,6 +1599,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>>         case SNP_PLATFORM_STATUS:
>>                 ret = sev_ioctl_snp_platform_status(&input);
>>                 break;
>> +       case SNP_SET_EXT_CONFIG:
>> +               ret = sev_ioctl_snp_set_config(&input, writable);
>> +               break;
>> +       case SNP_GET_EXT_CONFIG:
>> +               ret = sev_ioctl_snp_get_config(&input);
>> +               break;
> What is the intended use of `SNP_GET_EXT_CONFIG`? Yes, I get that it
> returns the "EXT config" previously set via `SNP_SET_EXT_CONFIG`. But
> presumably the caller can keep track of what it's previously passed to
> `SNP_SET_EXT_CONFIG`. Does it really need to call into the kernel to
> get these certs?

This is mainly to help in cases where the provisioning tools may not
keep track of the programmed TCB version and would prefer to read the
current value before updating.
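
A hypothetical userspace flow illustrating that use case (a sketch only:
sev_fd is an assumed open fd on /dev/sev, and the certs buffer handling is
elided; the command ids and struct layouts come from the quoted patches):

        struct sev_user_data_snp_config config = {};
        struct sev_user_data_ext_snp_config ext = {
                .config_address = (__u64)(uintptr_t)&config,
        };
        struct sev_issue_cmd cmd = {
                .cmd  = SNP_GET_EXT_CONFIG,
                .data = (__u64)(uintptr_t)&ext,
        };

        /* Read back the currently programmed TCB before updating it. */
        if (ioctl(sev_fd, SEV_ISSUE_CMD, &cmd) == 0) {
                /* inspect config, then issue SNP_SET_EXT_CONFIG with new values */
        }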



^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2021-09-10  3:32   ` Marc Orr
@ 2021-09-13 11:32     ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-09-13 11:32 UTC (permalink / raw)
  To: Marc Orr
  Cc: x86, LKML, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy


On 9/9/21 10:32 PM, Marc Orr wrote:
> On Fri, Aug 20, 2021 at 9:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>> The KVM_SNP_INIT command is used by the hypervisor to initialize the
>> SEV-SNP platform context. In a typical workflow, this command should be the
>> first command issued. When creating an SEV-SNP guest, the VMM must use this
>> command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.
>>
>> The flags value must be zero; it will be extended in future SNP support to
>> communicate optional features (such as restricted INT injection, etc.).
>>
>> Co-developed-by: Pavan Kumar Paluri <papaluri@amd.com>
>> Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  .../virt/kvm/amd-memory-encryption.rst        | 27 ++++++++++++
>>  arch/x86/include/asm/svm.h                    |  2 +
>>  arch/x86/kvm/svm/sev.c                        | 44 ++++++++++++++++++-
>>  arch/x86/kvm/svm/svm.h                        |  4 ++
>>  include/uapi/linux/kvm.h                      | 13 ++++++
>>  5 files changed, 88 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
>> index 5c081c8c7164..7b1d32fb99a8 100644
>> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
>> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
>> @@ -427,6 +427,33 @@ issued by the hypervisor to make the guest ready for execution.
>>
>>  Returns: 0 on success, -negative on error
>>
>> +18. KVM_SNP_INIT
>> +----------------
>> +
>> +The KVM_SNP_INIT command can be used by the hypervisor to initialize SEV-SNP
>> +context. In a typical workflow, this command should be the first command issued.
>> +
>> +Parameters (in/out): struct kvm_snp_init
>> +
>> +Returns: 0 on success, -negative on error
>> +
>> +::
>> +
>> +        struct kvm_snp_init {
>> +                __u64 flags;
>> +        };
>> +
>> +The flags bitmap is defined as::
>> +
>> +   /* enable the restricted injection */
>> +   #define KVM_SEV_SNP_RESTRICTED_INJET   (1<<0)
>> +
>> +   /* enable the restricted injection timer */
>> +   #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1<<1)
>> +
>> +If the specified flags are not supported then -EOPNOTSUPP is returned, and
>> +the supported flags are written back to the structure.
>> +
>>  References
>>  ==========
>>
>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
>> index 44a3f920f886..a39e31845a33 100644
>> --- a/arch/x86/include/asm/svm.h
>> +++ b/arch/x86/include/asm/svm.h
>> @@ -218,6 +218,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>>  #define SVM_NESTED_CTL_SEV_ENABLE      BIT(1)
>>  #define SVM_NESTED_CTL_SEV_ES_ENABLE   BIT(2)
>>
>> +#define SVM_SEV_FEAT_SNP_ACTIVE                BIT(0)
>> +
>>  struct vmcb_seg {
>>         u16 selector;
>>         u16 attrib;
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 50fddbe56981..93da463545ef 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -235,10 +235,30 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
>>         sev_decommission(handle);
>>  }
>>
>> +static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
>> +{
>> +       struct kvm_snp_init params;
>> +       int ret = 0;
>> +
>> +       if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
>> +               return -EFAULT;
>> +
>> +       if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
>> +               ret = -EOPNOTSUPP;
>> +
>> +       params.flags = SEV_SNP_SUPPORTED_FLAGS;
>> +
>> +       if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, sizeof(params)))
>> +               ret = -EFAULT;
>> +
>> +       return ret;
>> +}
>> +
>>  static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>  {
>> +       bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
>>         struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> -       bool es_active = argp->id == KVM_SEV_ES_INIT;
>> +       bool snp_active = argp->id == KVM_SEV_SNP_INIT;
>>         int asid, ret;
>>
>>         if (kvm->created_vcpus)
>> @@ -249,12 +269,22 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>                 return ret;
>>
>>         sev->es_active = es_active;
>> +       sev->snp_active = snp_active;
>>         asid = sev_asid_new(sev);
>>         if (asid < 0)
>>                 goto e_no_asid;
>>         sev->asid = asid;
>>
>> -       ret = sev_platform_init(&argp->error);
>> +       if (snp_active) {
>> +               ret = verify_snp_init_flags(kvm, argp);
>> +               if (ret)
>> +                       goto e_free;
>> +
>> +               ret = sev_snp_init(&argp->error);
>> +       } else {
>> +               ret = sev_platform_init(&argp->error);
>> +       }
>> +
>>         if (ret)
>>                 goto e_free;
>>
>> @@ -600,6 +630,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>>         save->pkru = svm->vcpu.arch.pkru;
>>         save->xss  = svm->vcpu.arch.ia32_xss;
>>
>> +       /* Enable the SEV-SNP feature */
>> +       if (sev_snp_guest(svm->vcpu.kvm))
>> +               save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
>> +
>>         return 0;
>>  }
>>
>> @@ -1532,6 +1566,12 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>         }
>>
>>         switch (sev_cmd.id) {
>> +       case KVM_SEV_SNP_INIT:
>> +               if (!sev_snp_enabled) {
>> +                       r = -ENOTTY;
>> +                       goto out;
>> +               }
>> +               fallthrough;
>>         case KVM_SEV_ES_INIT:
>>                 if (!sev_es_enabled) {
>>                         r = -ENOTTY;
>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>> index 01953522097d..57c3c404b0b3 100644
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -69,6 +69,9 @@ enum {
>>  /* TPR and CR2 are always written before VMRUN */
>>  #define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))
>>
>> +/* Supported init feature flags */
>> +#define SEV_SNP_SUPPORTED_FLAGS                0x0
>> +
>>  struct kvm_sev_info {
>>         bool active;            /* SEV enabled guest */
>>         bool es_active;         /* SEV-ES enabled guest */
>> @@ -81,6 +84,7 @@ struct kvm_sev_info {
>>         u64 ap_jump_table;      /* SEV-ES AP Jump Table address */
>>         struct kvm *enc_context_owner; /* Owner of copied encryption context */
>>         struct misc_cg *misc_cg; /* For misc cgroup accounting */
>> +       u64 snp_init_flags;
> This field never gets set anywhere. Should it get set in
> `verify_snp_init_flags()`?

Actually the supported flags value is zero, so I didn't update it. But to
make the code cleaner I will set the flags after the negotiation.
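
i.e., roughly this, in verify_snp_init_flags() with
sev = &to_kvm_svm(kvm)->sev_info (a sketch of the described follow-up, not
the posted code):

        if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
                ret = -EOPNOTSUPP;
        else
                sev->snp_init_flags = params.flags;  /* record the negotiated flags */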


>
>>  };
>>
>>  struct kvm_svm {
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index d9e4aabcb31a..944e2bf601fe 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1712,6 +1712,9 @@ enum sev_cmd_id {
>>         /* Guest Migration Extension */
>>         KVM_SEV_SEND_CANCEL,
>>
>> +       /* SNP specific commands */
>> +       KVM_SEV_SNP_INIT,
>> +
>>         KVM_SEV_NR_MAX,
>>  };
>>
>> @@ -1808,6 +1811,16 @@ struct kvm_sev_receive_update_data {
>>         __u32 trans_len;
>>  };
>>
>> +/* enable the restricted injection */
>> +#define KVM_SEV_SNP_RESTRICTED_INJET   (1 << 0)
>> +
>> +/* enable the restricted injection timer */
>> +#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1 << 1)
>> +
>> +struct kvm_snp_init {
>> +       __u64 flags;
>> +};
>> +
>>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
>>  #define KVM_DEV_ASSIGN_PCI_2_3         (1 << 1)
>>  #define KVM_DEV_ASSIGN_MASK_INTX       (1 << 2)
>> --
>> 2.17.1
>>
>>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2021-08-20 15:58 ` [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command Brijesh Singh
  2021-09-05  6:56   ` Dov Murik
  2021-09-10  3:32   ` Marc Orr
@ 2021-09-16 15:50   ` Peter Gonda
  2022-06-13 20:58   ` Alper Gun
  3 siblings, 0 replies; 239+ messages in thread
From: Peter Gonda @ 2021-09-16 15:50 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	Marc Orr, sathyanarayanan.kuppuswamy, Pavan Kumar Paluri

On Fri, Aug 20, 2021 at 10:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> The KVM_SNP_INIT command is used by the hypervisor to initialize the
> SEV-SNP platform context. In a typical workflow, this command should be the
> first command issued. When creating an SEV-SNP guest, the VMM must use this
> command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.
>
> The flags value must be zero; it will be extended in future SNP support to
> communicate optional features (such as restricted INT injection, etc.).
>
> Co-developed-by: Pavan Kumar Paluri <papaluri@amd.com>
> Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  .../virt/kvm/amd-memory-encryption.rst        | 27 ++++++++++++
>  arch/x86/include/asm/svm.h                    |  2 +
>  arch/x86/kvm/svm/sev.c                        | 44 ++++++++++++++++++-
>  arch/x86/kvm/svm/svm.h                        |  4 ++
>  include/uapi/linux/kvm.h                      | 13 ++++++
>  5 files changed, 88 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 5c081c8c7164..7b1d32fb99a8 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -427,6 +427,33 @@ issued by the hypervisor to make the guest ready for execution.
>
>  Returns: 0 on success, -negative on error
>
> +18. KVM_SNP_INIT
> +----------------
> +
> +The KVM_SNP_INIT command can be used by the hypervisor to initialize SEV-SNP
> +context. In a typical workflow, this command should be the first command issued.
> +
> +Parameters (in/out): struct kvm_snp_init
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_snp_init {
> +                __u64 flags;
> +        };
> +
> +The flags bitmap is defined as::
> +
> +   /* enable the restricted injection */
> +   #define KVM_SEV_SNP_RESTRICTED_INJET   (1<<0)
> +
> +   /* enable the restricted injection timer */
> +   #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1<<1)
> +
> +If the specified flags are not supported then -EOPNOTSUPP is returned, and
> +the supported flags are written back to the structure.
> +
>  References
>  ==========
>
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 44a3f920f886..a39e31845a33 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -218,6 +218,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  #define SVM_NESTED_CTL_SEV_ENABLE      BIT(1)
>  #define SVM_NESTED_CTL_SEV_ES_ENABLE   BIT(2)
>
> +#define SVM_SEV_FEAT_SNP_ACTIVE                BIT(0)
> +
>  struct vmcb_seg {
>         u16 selector;
>         u16 attrib;
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 50fddbe56981..93da463545ef 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -235,10 +235,30 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
>         sev_decommission(handle);
>  }
>
> +static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +       struct kvm_snp_init params;
> +       int ret = 0;
> +
> +       if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> +               return -EFAULT;
> +
> +       if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
> +               ret = -EOPNOTSUPP;
> +
> +       params.flags = SEV_SNP_SUPPORTED_FLAGS;
> +
> +       if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, sizeof(params)))
> +               ret = -EFAULT;
> +
> +       return ret;
> +}
> +
>  static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  {
> +       bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
>         struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> -       bool es_active = argp->id == KVM_SEV_ES_INIT;
> +       bool snp_active = argp->id == KVM_SEV_SNP_INIT;
>         int asid, ret;

Not sure if this patch is the right place for this, but I think you want to
disallow svm_vm_copy_asid_from() if snp_active == true.
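
E.g., a guard along these lines in svm_vm_copy_asid_from(), using the
sev_snp_guest() helper this series adds (a sketch of the suggestion; the
exact placement and error code are assumptions):

        /* SNP guests cannot share their encryption context with another VM. */
        if (sev_snp_guest(source_kvm))
                return -EINVAL;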

>
>         if (kvm->created_vcpus)
> @@ -249,12 +269,22 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>                 return ret;
>
>         sev->es_active = es_active;
> +       sev->snp_active = snp_active;
>         asid = sev_asid_new(sev);
>         if (asid < 0)
>                 goto e_no_asid;
>         sev->asid = asid;
>
> -       ret = sev_platform_init(&argp->error);
> +       if (snp_active) {
> +               ret = verify_snp_init_flags(kvm, argp);
> +               if (ret)
> +                       goto e_free;
> +
> +               ret = sev_snp_init(&argp->error);
> +       } else {
> +               ret = sev_platform_init(&argp->error);
> +       }
> +
>         if (ret)
>                 goto e_free;
>
> @@ -600,6 +630,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>         save->pkru = svm->vcpu.arch.pkru;
>         save->xss  = svm->vcpu.arch.ia32_xss;
>
> +       /* Enable the SEV-SNP feature */
> +       if (sev_snp_guest(svm->vcpu.kvm))
> +               save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
> +
>         return 0;
>  }
>
> @@ -1532,6 +1566,12 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>         }
>
>         switch (sev_cmd.id) {
> +       case KVM_SEV_SNP_INIT:
> +               if (!sev_snp_enabled) {
> +                       r = -ENOTTY;
> +                       goto out;
> +               }
> +               fallthrough;
>         case KVM_SEV_ES_INIT:
>                 if (!sev_es_enabled) {
>                         r = -ENOTTY;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 01953522097d..57c3c404b0b3 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -69,6 +69,9 @@ enum {
>  /* TPR and CR2 are always written before VMRUN */
>  #define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))
>
> +/* Supported init feature flags */
> +#define SEV_SNP_SUPPORTED_FLAGS                0x0
> +
>  struct kvm_sev_info {
>         bool active;            /* SEV enabled guest */
>         bool es_active;         /* SEV-ES enabled guest */
> @@ -81,6 +84,7 @@ struct kvm_sev_info {
>         u64 ap_jump_table;      /* SEV-ES AP Jump Table address */
>         struct kvm *enc_context_owner; /* Owner of copied encryption context */
>         struct misc_cg *misc_cg; /* For misc cgroup accounting */
> +       u64 snp_init_flags;
>  };
>
>  struct kvm_svm {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d9e4aabcb31a..944e2bf601fe 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1712,6 +1712,9 @@ enum sev_cmd_id {
>         /* Guest Migration Extension */
>         KVM_SEV_SEND_CANCEL,
>
> +       /* SNP specific commands */
> +       KVM_SEV_SNP_INIT,
> +
>         KVM_SEV_NR_MAX,
>  };
>
> @@ -1808,6 +1811,16 @@ struct kvm_sev_receive_update_data {
>         __u32 trans_len;
>  };
>
> +/* enable the restricted injection */
> +#define KVM_SEV_SNP_RESTRICTED_INJET   (1 << 0)
> +
> +/* enable the restricted injection timer */
> +#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1 << 1)
> +
> +struct kvm_snp_init {
> +       __u64 flags;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3         (1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX       (1 << 2)
> --
> 2.17.1
>
>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 01/45] x86/cpufeatures: Add SEV-SNP CPU feature
  2021-08-20 15:58 ` [PATCH Part2 v5 01/45] x86/cpufeatures: Add SEV-SNP CPU feature Brijesh Singh
@ 2021-09-16 16:56   ` Borislav Petkov
  2021-09-16 17:35     ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Borislav Petkov @ 2021-09-16 16:56 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 10:58:34AM -0500, Brijesh Singh wrote:
> Add CPU feature detection for Secure Encrypted Virtualization with
> Secure Nested Paging. This feature adds a strong memory integrity
> protection to help prevent malicious hypervisor-based attacks like
> data replay, memory re-mapping, and more.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/include/asm/cpufeatures.h       | 1 +
>  arch/x86/kernel/cpu/amd.c                | 3 ++-
>  tools/arch/x86/include/asm/cpufeatures.h | 1 +
>  3 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index d0ce5cfd3ac1..62f458680772 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -398,6 +398,7 @@
>  #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
>  #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
>  #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
> +#define X86_FEATURE_SEV_SNP		(19*32+4)  /* AMD Secure Encrypted Virtualization - Secure Nested Paging */

s/AMD Secure Encrypted Virtualization/AMD SEV/g

Bit 1 above already has that string - no need for repeating it
everywhere.

Also, note the vertical alignment (space after the '+'):

					(19*32+ 4)
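
Taken together, the quoted line would then read something like (a sketch of
the requested fixup, not the final patch):

        #define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD SEV - Secure Nested Paging */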

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 02/45] iommu/amd: Introduce function to check SEV-SNP support
  2021-08-20 15:58 ` [PATCH Part2 v5 02/45] iommu/amd: Introduce function to check SEV-SNP support Brijesh Singh
@ 2021-09-16 17:26   ` Borislav Petkov
  0 siblings, 0 replies; 239+ messages in thread
From: Borislav Petkov @ 2021-09-16 17:26 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 10:58:35AM -0500, Brijesh Singh wrote:
> The SEV-SNP support requires that IOMMU must to enabled, see the IOMMU

s/must to/is/

> spec section 2.12 for further details. If IOMMU is not enabled or the
> SNPSup extended feature register is not set then the SNP_INIT command
> (used for initializing firmware) will fail.
> 
> The iommu_sev_snp_supported() can be used to check if IOMMU supports the

"can be used"?

Just say what is going to use it.

> SEV-SNP feature.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  drivers/iommu/amd/init.c | 30 ++++++++++++++++++++++++++++++
>  include/linux/iommu.h    |  9 +++++++++
>  2 files changed, 39 insertions(+)
> 
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 46280e6e1535..bd420fb71126 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -3320,3 +3320,33 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
>  
>  	return iommu_pc_get_set_reg(iommu, bank, cntr, fxn, value, true);
>  }
> +
> +bool iommu_sev_snp_supported(void)
> +{
> +	struct amd_iommu *iommu;
> +
> +	/*
> +	 * The SEV-SNP support requires that IOMMU must be enabled, and is
> +	 * not configured in the passthrough mode.
> +	 */
> +	if (no_iommu || iommu_default_passthrough()) {
> +		pr_err("SEV-SNP: IOMMU is either disabled or configured in passthrough mode.\n");
> +		return false;
> +	}
> +
> +	/*
> +	 * Iterate through all the IOMMUs and verify the SNPSup feature is
> +	 * enabled.
> +	 */
> +	for_each_iommu(iommu) {
> +		if (!iommu_feature(iommu, FEATURE_SNP)) {
> +			pr_err("SNPSup is disabled (devid: %02x:%02x.%x)\n",
> +			       PCI_BUS_NUM(iommu->devid), PCI_SLOT(iommu->devid),
> +			       PCI_FUNC(iommu->devid));
> +			return false;
> +		}
> +	}
> +
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sev_snp_supported);

That export is not needed.

> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 32d448050bf7..269abc17b2c3 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -604,6 +604,12 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
>  void iommu_sva_unbind_device(struct iommu_sva *handle);
>  u32 iommu_sva_get_pasid(struct iommu_sva *handle);
>  
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +bool iommu_sev_snp_supported(void);
> +#else
> +static inline bool iommu_sev_snp_supported(void) { return false; }
> +#endif
> +
>  #else /* CONFIG_IOMMU_API */
>  
>  struct iommu_ops {};
> @@ -999,6 +1005,9 @@ static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
>  {
>  	return NULL;
>  }
> +
> +static inline bool iommu_sev_snp_supported(void) { return false; }
> +

Most of those stubs and ifdeffery is not needed if you put the function
itself in

#ifdef CONFIG_AMD_MEM_ENCRYPT

...

#endif

as it is called by sev.c only, AFAICT, and the latter is enabled by
CONFIG_AMD_MEM_ENCRYPT anyway.
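
Applied to the quoted function, that would look roughly like this (a sketch;
same body as the hunk above with the per-IOMMU pr_err trimmed, only the
placement changes):

        #ifdef CONFIG_AMD_MEM_ENCRYPT
        bool iommu_sev_snp_supported(void)
        {
                struct amd_iommu *iommu;

                if (no_iommu || iommu_default_passthrough()) {
                        pr_err("SEV-SNP: IOMMU is either disabled or configured in passthrough mode.\n");
                        return false;
                }

                /* Verify that every IOMMU reports the SNPSup feature. */
                for_each_iommu(iommu) {
                        if (!iommu_feature(iommu, FEATURE_SNP))
                                return false;
                }

                return true;
        }
        #endif /* CONFIG_AMD_MEM_ENCRYPT */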

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 01/45] x86/cpufeatures: Add SEV-SNP CPU feature
  2021-09-16 16:56   ` Borislav Petkov
@ 2021-09-16 17:35     ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-09-16 17:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy



On 9/16/21 11:56 AM, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:58:34AM -0500, Brijesh Singh wrote:
>> Add CPU feature detection for Secure Encrypted Virtualization with
>> Secure Nested Paging. This feature adds a strong memory integrity
>> protection to help prevent malicious hypervisor-based attacks like
>> data replay, memory re-mapping, and more.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>   arch/x86/include/asm/cpufeatures.h       | 1 +
>>   arch/x86/kernel/cpu/amd.c                | 3 ++-
>>   tools/arch/x86/include/asm/cpufeatures.h | 1 +
>>   3 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index d0ce5cfd3ac1..62f458680772 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -398,6 +398,7 @@
>>   #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
>>   #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
>>   #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
>> +#define X86_FEATURE_SEV_SNP		(19*32+4)  /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
> 
> s/AMD Secure Encrypted Virtualization/AMD SEV/g
> 
> Bit 1 above already has that string - no need for repeating it
> everywhere.
> 
> Also, note the vertical alignment (space after the '+'):
> 
> 					(19*32+ 4)
> 

Noted. I will fix it in the next rev. Thanks.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 16/45] crypto: ccp: Add the SNP_PLATFORM_STATUS command
  2021-08-20 15:58 ` [PATCH Part2 v5 16/45] crypto: ccp: Add the SNP_PLATFORM_STATUS command Brijesh Singh
  2021-09-10  3:18   ` Marc Orr
@ 2021-09-22 17:35   ` Dr. David Alan Gilbert
  2021-09-23 18:01     ` Brijesh Singh
  1 sibling, 1 reply; 239+ messages in thread
From: Dr. David Alan Gilbert @ 2021-09-22 17:35 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

* Brijesh Singh (brijesh.singh@amd.com) wrote:
> The command can be used by the userspace to query the SNP platform status
> report. See the SEV-SNP spec for more details.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  Documentation/virt/coco/sevguest.rst | 27 +++++++++++++++++
>  drivers/crypto/ccp/sev-dev.c         | 45 ++++++++++++++++++++++++++++
>  include/uapi/linux/psp-sev.h         |  1 +
>  3 files changed, 73 insertions(+)
> 
> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
> index 7acb8696fca4..7c51da010039 100644
> --- a/Documentation/virt/coco/sevguest.rst
> +++ b/Documentation/virt/coco/sevguest.rst
> @@ -52,6 +52,22 @@ to execute due to the firmware error, then fw_err code will be set.
>                  __u64 fw_err;
>          };
>  
> +The host ioctl should be issued on the /dev/sev device. The ioctl accepts a
> +command id and a command input structure.
> +
> +::
> +        struct sev_issue_cmd {
> +                /* Command ID */
> +                __u32 cmd;
> +
> +                /* Command request structure */
> +                __u64 data;
> +
> +                /* firmware error code on failure (see psp-sev.h) */
> +                __u32 error;
> +        };
> +
> +
>  2.1 SNP_GET_REPORT
>  ------------------
>  
> @@ -107,3 +123,14 @@ length of the blob is lesser than expected then snp_ext_report_req.certs_len wil
>  be updated with the expected value.
>  
>  See GHCB specification for further detail on how to parse the certificate blob.
> +
> +2.3 SNP_PLATFORM_STATUS
> +-----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_platform_status
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
> +status includes API major, minor version and more. See the SEV-SNP
> +specification for further details.
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 4cd7d803a624..16c6df5d412c 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -1394,6 +1394,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
>  	return ret;
>  }
>  
> +static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
> +{
> +	struct sev_device *sev = psp_master->sev_data;
> +	struct sev_data_snp_platform_status_buf buf;
> +	struct page *status_page;
> +	void *data;
> +	int ret;
> +
> +	if (!sev->snp_inited || !argp->data)
> +		return -EINVAL;
> +
> +	status_page = alloc_page(GFP_KERNEL_ACCOUNT);
> +	if (!status_page)
> +		return -ENOMEM;
> +
> +	data = page_address(status_page);
> +	if (snp_set_rmp_state(__pa(data), 1, true, true, false)) {
> +		__free_pages(status_page, 0);
> +		return -EFAULT;
> +	}
> +
> +	buf.status_paddr = __psp_pa(data);
> +	ret = __sev_do_cmd_locked(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &argp->error);
> +
> +	/* Change the page state before accessing it */
> +	if (snp_set_rmp_state(__pa(data), 1, false, true, true)) {
> +		snp_leak_pages(__pa(data) >> PAGE_SHIFT, 1);
> +		return -EFAULT;
> +	}

Could we find a way of returning some of these errors into the output,
rather than interpreting them from errno values?

Dave

> +	if (ret)
> +		goto cleanup;
> +
> +	if (copy_to_user((void __user *)argp->data, data,
> +			 sizeof(struct sev_user_data_snp_status)))
> +		ret = -EFAULT;
> +
> +cleanup:
> +	__free_pages(status_page, 0);
> +	return ret;
> +}
> +
>  static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>  {
>  	void __user *argp = (void __user *)arg;
> @@ -1445,6 +1487,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
>  	case SEV_GET_ID2:
>  		ret = sev_ioctl_do_get_id2(&input);
>  		break;
> +	case SNP_PLATFORM_STATUS:
> +		ret = sev_ioctl_snp_platform_status(&input);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		goto out;
> diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
> index bed65a891223..ffd60e8b0a31 100644
> --- a/include/uapi/linux/psp-sev.h
> +++ b/include/uapi/linux/psp-sev.h
> @@ -28,6 +28,7 @@ enum {
>  	SEV_PEK_CERT_IMPORT,
>  	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
>  	SEV_GET_ID2,
> +	SNP_PLATFORM_STATUS,
>  
>  	SEV_MAX,
>  };
> -- 
> 2.17.1
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2021-08-20 15:58 ` [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Brijesh Singh
@ 2021-09-22 18:55   ` Dr. David Alan Gilbert
  2021-09-23 18:09     ` Brijesh Singh
  2021-10-12 20:44   ` Sean Christopherson
  1 sibling, 1 reply; 239+ messages in thread
From: Dr. David Alan Gilbert @ 2021-09-22 18:55 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

* Brijesh Singh (brijesh.singh@amd.com) wrote:
> Implement a workaround for an SNP erratum where the CPU will incorrectly
> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> RMP entry of a VMCB, VMSA or AVIC backing page.
> 
> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
> backing   pages as "in-use" in the RMP after a successful VMRUN.  This is
> done for _all_ VMs, not just SNP-Active VMs.

Can you explain what 'globally enabled' means?

Or more specifically, can we trip this bug on public hardware that has
the SNP enabled in the bios, but no SNP init in the host OS?

Dave

> If the hypervisor accesses an in-use page through a writable translation,
> the CPU will throw an RMP violation #PF.  On early SNP hardware, if an
> in-use page is 2mb aligned and software accesses any part of the associated
> 2mb region with a hugepage, the CPU will incorrectly treat the entire 2mb
> region as in-use and signal a spurious RMP violation #PF.
> 
> The recommended workaround is to not use a hugepage for the VMCB, VMSA or
> AVIC backing page. Add a generic allocator that will ensure that the page
> returned is not a hugepage (2mb or 1gb) and is safe to be used when SEV-SNP
> is enabled.
> 
> Co-developed-by: Marc Orr <marcorr@google.com>
> Signed-off-by: Marc Orr <marcorr@google.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  1 +
>  arch/x86/kvm/lapic.c               |  5 ++++-
>  arch/x86/kvm/svm/sev.c             | 35 ++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c             | 16 ++++++++++++--
>  arch/x86/kvm/svm/svm.h             |  1 +
>  6 files changed, 56 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index a12a4987154e..36a9c23a4b27 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -122,6 +122,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush)
>  KVM_X86_OP_NULL(migrate_timers)
>  KVM_X86_OP(msr_filter_changed)
>  KVM_X86_OP_NULL(complete_emulated_msr)
> +KVM_X86_OP(alloc_apic_backing_page)
>  
>  #undef KVM_X86_OP
>  #undef KVM_X86_OP_NULL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 974cbfb1eefe..5ad6255ff5d5 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1453,6 +1453,7 @@ struct kvm_x86_ops {
>  	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
>  
>  	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> +	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
>  };
>  
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index ba5a27879f1d..05b45747b20b 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2457,7 +2457,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
>  
>  	vcpu->arch.apic = apic;
>  
> -	apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> +	if (kvm_x86_ops.alloc_apic_backing_page)
> +		apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
> +	else
> +		apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
>  	if (!apic->regs) {
>  		printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
>  		       vcpu->vcpu_id);
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 1644da5fc93f..8771b878193f 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2703,3 +2703,38 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
>  		break;
>  	}
>  }
> +
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long pfn;
> +	struct page *p;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +
> +	/*
> +	 * Allocate an SNP safe page to workaround the SNP erratum where
> +	 * the CPU will incorrectly signal an RMP violation  #PF if a
> +	 * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
> +	 * or AVIC backing page. The recommended workaround is to not use the
> +	 * hugepage.
> +	 *
> +	 * Allocate one extra page, use a page which is not 2mb aligned
> +	 * and free the other.
> +	 */
> +	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> +	if (!p)
> +		return NULL;
> +
> +	split_page(p, 1);
> +
> +	pfn = page_to_pfn(p);
> +	if (IS_ALIGNED(__pfn_to_phys(pfn), PMD_SIZE)) {
> +		pfn++;
> +		__free_page(p);
> +	} else {
> +		__free_page(pfn_to_page(pfn + 1));
> +	}
> +
> +	return pfn_to_page(pfn);
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 25773bf72158..058eea8353c9 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1368,7 +1368,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
>  	svm = to_svm(vcpu);
>  
>  	err = -ENOMEM;
> -	vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +	vmcb01_page = snp_safe_alloc_page(vcpu);
>  	if (!vmcb01_page)
>  		goto out;
>  
> @@ -1377,7 +1377,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
>  		 * SEV-ES guests require a separate VMSA page used to contain
>  		 * the encrypted register state of the guest.
>  		 */
> -		vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +		vmsa_page = snp_safe_alloc_page(vcpu);
>  		if (!vmsa_page)
>  			goto error_free_vmcb_page;
>  
> @@ -4539,6 +4539,16 @@ static int svm_vm_init(struct kvm *kvm)
>  	return 0;
>  }
>  
> +static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
> +{
> +	struct page *page = snp_safe_alloc_page(vcpu);
> +
> +	if (!page)
> +		return NULL;
> +
> +	return page_address(page);
> +}
> +
>  static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.hardware_unsetup = svm_hardware_teardown,
>  	.hardware_enable = svm_hardware_enable,
> @@ -4667,6 +4677,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.complete_emulated_msr = svm_complete_emulated_msr,
>  
>  	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> +
> +	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
>  };
>  
>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index d1f1512a4b47..e40800e9c998 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -575,6 +575,7 @@ void sev_es_create_vcpu(struct vcpu_svm *svm);
>  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
>  void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu);
>  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
>  
>  /* vmenter.S */
>  
> -- 
> 2.17.1
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 26/45] KVM: SVM: Mark the private vma unmerable for SEV-SNP guests
  2021-08-20 15:58 ` [PATCH Part2 v5 26/45] KVM: SVM: Mark the private vma unmerable for SEV-SNP guests Brijesh Singh
@ 2021-09-23 17:18   ` Dr. David Alan Gilbert
  2021-10-12 18:46   ` Sean Christopherson
  1 sibling, 0 replies; 239+ messages in thread
From: Dr. David Alan Gilbert @ 2021-09-23 17:18 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

* Brijesh Singh (brijesh.singh@amd.com) wrote:
> When SEV-SNP is enabled, the guest private pages are added in the RMP
> table; while adding the pages, the rmp_make_private() unmaps the pages
> from the direct map. If KSM attempts to access those unmapped pages then
> it will trigger #PF (page-not-present).
> 
> Encrypted guest pages cannot be shared between processes, so
> userspace should not mark the region mergeable; but to be safe, mark the
> process vma unmerable before adding the pages in the RMP table.
              ^^^^^^^^^

(and in the subject) -> unmergeable

> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 4b126598b7aa..dcef0ae5f8e4 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -18,11 +18,13 @@
>  #include <linux/processor.h>
>  #include <linux/trace_events.h>
>  #include <linux/sev.h>
> +#include <linux/ksm.h>
>  #include <asm/fpu/internal.h>
>  
>  #include <asm/pkru.h>
>  #include <asm/trapnr.h>
>  #include <asm/sev.h>
> +#include <asm/mman.h>
>  
>  #include "x86.h"
>  #include "svm.h"
> @@ -1683,6 +1685,30 @@ static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
>  	return false;
>  }
>  
> +static int snp_mark_unmergable(struct kvm *kvm, u64 start, u64 size)
                       ^^^^^^^^^^

> +{
> +	struct vm_area_struct *vma;
> +	u64 end = start + size;

Do you need to worry about wrap there? (User supplied start/size?)

Dave
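
If it is a concern, a guard along these lines would close it (a sketch;
check_add_overflow() is from <linux/overflow.h>):

        u64 end;

        if (check_add_overflow(start, size, &end))
                return -EINVAL;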

> +	int ret;
> +
> +	do {
> +		vma = find_vma_intersection(kvm->mm, start, end);
> +		if (!vma) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> +				  MADV_UNMERGEABLE, &vma->vm_flags);
> +		if (ret)
> +			break;
> +
> +		start = vma->vm_end;
> +	} while (end > vma->vm_end);
> +
> +	return ret;
> +}
> +
>  static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  {
>  	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> @@ -1707,6 +1733,12 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	if (!is_hva_registered(kvm, params.uaddr, params.len))
>  		return -EINVAL;
>  
> +	mmap_write_lock(kvm->mm);
> +	ret = snp_mark_unmergable(kvm, params.uaddr, params.len);
> +	mmap_write_unlock(kvm->mm);
> +	if (ret)
> +		return -EFAULT;
> +
>  	/*
>  	 * The userspace memory is already locked so technically we don't
>  	 * need to lock it again. Later part of the function needs to know
> -- 
> 2.17.1
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 16/45] crypto: ccp: Add the SNP_PLATFORM_STATUS command
  2021-09-22 17:35   ` Dr. David Alan Gilbert
@ 2021-09-23 18:01     ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-09-23 18:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 9/22/21 12:35 PM, Dr. David Alan Gilbert wrote:
> * Brijesh Singh (brijesh.singh@amd.com) wrote:
>> The command can be used by the userspace to query the SNP platform status
>> report. See the SEV-SNP spec for more details.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  Documentation/virt/coco/sevguest.rst | 27 +++++++++++++++++
>>  drivers/crypto/ccp/sev-dev.c         | 45 ++++++++++++++++++++++++++++
>>  include/uapi/linux/psp-sev.h         |  1 +
>>  3 files changed, 73 insertions(+)
>>
>> diff --git a/Documentation/virt/coco/sevguest.rst b/Documentation/virt/coco/sevguest.rst
>> index 7acb8696fca4..7c51da010039 100644
>> --- a/Documentation/virt/coco/sevguest.rst
>> +++ b/Documentation/virt/coco/sevguest.rst
>> @@ -52,6 +52,22 @@ to execute due to the firmware error, then fw_err code will be set.
>>                  __u64 fw_err;
>>          };
>>  
>> +The host ioctl should be issued on the /dev/sev device. The ioctl accepts a
>> +command id and a command input structure.
>> +
>> +::
>> +        struct sev_issue_cmd {
>> +                /* Command ID */
>> +                __u32 cmd;
>> +
>> +                /* Command request structure */
>> +                __u64 data;
>> +
>> +                /* firmware error code on failure (see psp-sev.h) */
>> +                __u32 error;
>> +        };
>> +
>> +
>>  2.1 SNP_GET_REPORT
>>  ------------------
>>  
>> @@ -107,3 +123,14 @@ length of the blob is lesser than expected then snp_ext_report_req.certs_len wil
>>  be updated with the expected value.
>>  
>>  See GHCB specification for further detail on how to parse the certificate blob.
>> +
>> +2.3 SNP_PLATFORM_STATUS
>> +-----------------------
>> +:Technology: sev-snp
>> +:Type: hypervisor ioctl cmd
>> +:Parameters (in): struct sev_data_snp_platform_status
>> +:Returns (out): 0 on success, -negative on error
>> +
>> +The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
>> +status includes API major, minor version and more. See the SEV-SNP
>> +specification for further details.
>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>> index 4cd7d803a624..16c6df5d412c 100644
>> --- a/drivers/crypto/ccp/sev-dev.c
>> +++ b/drivers/crypto/ccp/sev-dev.c
>> @@ -1394,6 +1394,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
>>  	return ret;
>>  }
>>  
>> +static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
>> +{
>> +	struct sev_device *sev = psp_master->sev_data;
>> +	struct sev_data_snp_platform_status_buf buf;
>> +	struct page *status_page;
>> +	void *data;
>> +	int ret;
>> +
>> +	if (!sev->snp_inited || !argp->data)
>> +		return -EINVAL;
>> +
>> +	status_page = alloc_page(GFP_KERNEL_ACCOUNT);
>> +	if (!status_page)
>> +		return -ENOMEM;
>> +
>> +	data = page_address(status_page);
>> +	if (snp_set_rmp_state(__pa(data), 1, true, true, false)) {
>> +		__free_pages(status_page, 0);
>> +		return -EFAULT;
>> +	}
>> +
>> +	buf.status_paddr = __psp_pa(data);
>> +	ret = __sev_do_cmd_locked(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &argp->error);
>> +
>> +	/* Change the page state before accessing it */
>> +	if (snp_set_rmp_state(__pa(data), 1, false, true, true)) {
>> +		snp_leak_pages(__pa(data) >> PAGE_SHIFT, 1);
>> +		return -EFAULT;
>> +	}
> Could we find a way of returning some of these errors into the output,
> rather than interpreting them from errno values?

I can try to find a corresponding fw error code and use it where
applicable, e.g., in this case I can use SEV_RET_INVALID_PAGE_STATE.
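
E.g., on the quoted hunk (a sketch of that change; assumes
SEV_RET_INVALID_PAGE_STATE is exposed via psp-sev.h):

        if (snp_set_rmp_state(__pa(data), 1, false, true, true)) {
                snp_leak_pages(__pa(data) >> PAGE_SHIFT, 1);
                argp->error = SEV_RET_INVALID_PAGE_STATE;
                return -EFAULT;
        }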

thanks


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2021-09-22 18:55   ` Dr. David Alan Gilbert
@ 2021-09-23 18:09     ` Brijesh Singh
  2021-09-23 18:39       ` Dr. David Alan Gilbert
  2021-09-23 19:17       ` Marc Orr
  0 siblings, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-09-23 18:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 9/22/21 1:55 PM, Dr. David Alan Gilbert wrote:
> * Brijesh Singh (brijesh.singh@amd.com) wrote:
>> Implement a workaround for an SNP erratum where the CPU will incorrectly
>> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
>> RMP entry of a VMCB, VMSA or AVIC backing page.
>>
>> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
>> backing   pages as "in-use" in the RMP after a successful VMRUN.  This is
>> done for _all_ VMs, not just SNP-Active VMs.
> Can you explain what 'globally enabled' means?

This means that SNP is enabled in the host via SYSCFG_MSR.Snp=1. Once it's
enabled, RMP checks are enforced.


> Or more specifically, can we trip this bug on public hardware that has
> the SNP enabled in the bios, but no SNP init in the host OS?

Enabling the SNP support on host is 3 step process:

step1 (bios): reserve memory for the RMP table.

step2 (host): initialize the RMP table memory, set the SYSCFG msr to
enable the SNP feature

step3 (host): call SNP_INIT to initialize the SNP firmware (this is
needed only if you ever plan to launch an SNP guest from this host).

"SNP globally enabled" means that steps 1 and 2 have been completed. The
RMP checks are enforced as soon as step 2 is completed.
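
For illustration, a host kernel can probe the "globally enabled" state
roughly like this (a sketch; the MSR_AMD64_SYSCFG* names are assumptions
based on later kernels):

        u64 syscfg;

        rdmsrl(MSR_AMD64_SYSCFG, syscfg);
        if (syscfg & MSR_AMD64_SYSCFG_SNP_EN)
                pr_info("SNP globally enabled: RMP checks are enforced\n");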

thanks

>
> Dave
>
>> If the hypervisor accesses an in-use page through a writable translation,
>> the CPU will throw an RMP violation #PF.  On early SNP hardware, if an
>> in-use page is 2mb aligned and software accesses any part of the associated
> >> 2mb region with a hugepage, the CPU will incorrectly treat the entire 2mb
>> region as in-use and signal a spurious RMP violation #PF.
>>
> >> The recommended workaround is to not use a hugepage for the VMCB, VMSA or
> >> AVIC backing page. Add a generic allocator that will ensure that the page
> >> returned is not a hugepage (2mb or 1gb) and is safe to be used when SEV-SNP
>> is enabled.
>>
>> Co-developed-by: Marc Orr <marcorr@google.com>
>> Signed-off-by: Marc Orr <marcorr@google.com>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>>  arch/x86/include/asm/kvm_host.h    |  1 +
>>  arch/x86/kvm/lapic.c               |  5 ++++-
>>  arch/x86/kvm/svm/sev.c             | 35 ++++++++++++++++++++++++++++++
>>  arch/x86/kvm/svm/svm.c             | 16 ++++++++++++--
>>  arch/x86/kvm/svm/svm.h             |  1 +
>>  6 files changed, 56 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
>> index a12a4987154e..36a9c23a4b27 100644
>> --- a/arch/x86/include/asm/kvm-x86-ops.h
>> +++ b/arch/x86/include/asm/kvm-x86-ops.h
>> @@ -122,6 +122,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush)
>>  KVM_X86_OP_NULL(migrate_timers)
>>  KVM_X86_OP(msr_filter_changed)
>>  KVM_X86_OP_NULL(complete_emulated_msr)
>> +KVM_X86_OP(alloc_apic_backing_page)
>>  
>>  #undef KVM_X86_OP
>>  #undef KVM_X86_OP_NULL
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 974cbfb1eefe..5ad6255ff5d5 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1453,6 +1453,7 @@ struct kvm_x86_ops {
>>  	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
>>  
>>  	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
>> +	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
>>  };
>>  
>>  struct kvm_x86_nested_ops {
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index ba5a27879f1d..05b45747b20b 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -2457,7 +2457,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
>>  
>>  	vcpu->arch.apic = apic;
>>  
>> -	apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
>> +	if (kvm_x86_ops.alloc_apic_backing_page)
>> +		apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
>> +	else
>> +		apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
>>  	if (!apic->regs) {
>>  		printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
>>  		       vcpu->vcpu_id);
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 1644da5fc93f..8771b878193f 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -2703,3 +2703,38 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
>>  		break;
>>  	}
>>  }
>> +
>> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
>> +{
>> +	unsigned long pfn;
>> +	struct page *p;
>> +
>> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>> +		return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
>> +
>> +	/*
>> +	 * Allocate an SNP safe page to workaround the SNP erratum where
>> +	 * the CPU will incorrectly signal an RMP violation  #PF if a
>> +	 * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
>> +	 * or AVIC backing page. The recommeded workaround is to not use the
>> +	 * hugepage.
>> +	 *
>> +	 * Allocate one extra page, use a page which is not 2mb aligned
>> +	 * and free the other.
>> +	 */
>> +	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
>> +	if (!p)
>> +		return NULL;
>> +
>> +	split_page(p, 1);
>> +
>> +	pfn = page_to_pfn(p);
>> +	if (IS_ALIGNED(__pfn_to_phys(pfn), PMD_SIZE)) {
>> +		pfn++;
>> +		__free_page(p);
>> +	} else {
>> +		__free_page(pfn_to_page(pfn + 1));
>> +	}
>> +
>> +	return pfn_to_page(pfn);
>> +}
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 25773bf72158..058eea8353c9 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -1368,7 +1368,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
>>  	svm = to_svm(vcpu);
>>  
>>  	err = -ENOMEM;
>> -	vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
>> +	vmcb01_page = snp_safe_alloc_page(vcpu);
>>  	if (!vmcb01_page)
>>  		goto out;
>>  
>> @@ -1377,7 +1377,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
>>  		 * SEV-ES guests require a separate VMSA page used to contain
>>  		 * the encrypted register state of the guest.
>>  		 */
>> -		vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
>> +		vmsa_page = snp_safe_alloc_page(vcpu);
>>  		if (!vmsa_page)
>>  			goto error_free_vmcb_page;
>>  
>> @@ -4539,6 +4539,16 @@ static int svm_vm_init(struct kvm *kvm)
>>  	return 0;
>>  }
>>  
>> +static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
>> +{
>> +	struct page *page = snp_safe_alloc_page(vcpu);
>> +
>> +	if (!page)
>> +		return NULL;
>> +
>> +	return page_address(page);
>> +}
>> +
>>  static struct kvm_x86_ops svm_x86_ops __initdata = {
>>  	.hardware_unsetup = svm_hardware_teardown,
>>  	.hardware_enable = svm_hardware_enable,
>> @@ -4667,6 +4677,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>>  	.complete_emulated_msr = svm_complete_emulated_msr,
>>  
>>  	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
>> +
>> +	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
>>  };
>>  
>>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>> index d1f1512a4b47..e40800e9c998 100644
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -575,6 +575,7 @@ void sev_es_create_vcpu(struct vcpu_svm *svm);
>>  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
>>  void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu);
>>  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
>> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
>>  
>>  /* vmenter.S */
>>  
>> -- 
>> 2.17.1
>>
>>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2021-09-23 18:09     ` Brijesh Singh
@ 2021-09-23 18:39       ` Dr. David Alan Gilbert
  2021-09-23 22:23         ` Brijesh Singh
  2021-09-23 19:17       ` Marc Orr
  1 sibling, 1 reply; 239+ messages in thread
From: Dr. David Alan Gilbert @ 2021-09-23 18:39 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

* Brijesh Singh (brijesh.singh@amd.com) wrote:
> 
> On 9/22/21 1:55 PM, Dr. David Alan Gilbert wrote:
> > * Brijesh Singh (brijesh.singh@amd.com) wrote:
> >> Implement a workaround for an SNP erratum where the CPU will incorrectly
> >> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> >> RMP entry of a VMCB, VMSA or AVIC backing page.
> >>
> >> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
>> backing pages as "in-use" in the RMP after a successful VMRUN. This is
> >> done for _all_ VMs, not just SNP-Active VMs.
> > Can you explain what 'globally enabled' means?
> 
> This means that SNP is enabled on the host, i.e. SYSCFG_MSR.Snp=1. Once it's
> enabled, RMP checks are enforced.
> 
> 
> > Or more specifically, can we trip this bug on public hardware that has
> > the SNP enabled in the bios, but no SNP init in the host OS?
> 
> Enabling SNP support on the host is a 3-step process:
>
> step 1 (BIOS): reserve memory for the RMP table.
>
> step 2 (host): initialize the RMP table memory and set the SYSCFG MSR to
> enable the SNP feature.
>
> step 3 (host): call SNP_INIT to initialize the SNP firmware (this is
> needed only if you ever plan to launch an SNP guest from this host).
>
> "SNP globally enabled" means steps 1 and 2 are done. The RMP checks are
> enforced as soon as step 2 is completed.

So I think that means we don't need to backport this to older kernels
that don't know about SNP but might run on SNP-enabled hardware (1), since
those kernels won't do step 2.
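
For reference, a minimal hypothetical check for this "globally enabled"
state (assuming the SYSCFG bit definitions added in patch 03 of this
series; this is an illustration, not code from the series):

  static bool snp_globally_enabled(void)
  {
          u64 syscfg;

          rdmsrl(MSR_AMD64_SYSCFG, syscfg);
          return syscfg & MSR_AMD64_SYSCFG_SNP_EN;
  }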

Dave

> thanks
> 
> >
> > Dave
> >
> >> If the hypervisor accesses an in-use page through a writable translation,
> >> the CPU will throw an RMP violation #PF.  On early SNP hardware, if an
> >> in-use page is 2mb aligned and software accesses any part of the associated
> >> 2mb region with a hugepage, the CPU will incorrectly treat the entire 2mb
> >> region as in-use and signal a spurious RMP violation #PF.
> >>
> >> The recommended workaround is to not use a hugepage for the VMCB, VMSA or
> >> AVIC backing page. Add a generic allocator that will ensure that the page
> >> returned is not a hugepage (2mb or 1gb) and is safe to be used when SEV-SNP
> >> is enabled.
> >>
> >> Co-developed-by: Marc Orr <marcorr@google.com>
> >> Signed-off-by: Marc Orr <marcorr@google.com>
> >> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> >> ---
> >>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
> >>  arch/x86/include/asm/kvm_host.h    |  1 +
> >>  arch/x86/kvm/lapic.c               |  5 ++++-
> >>  arch/x86/kvm/svm/sev.c             | 35 ++++++++++++++++++++++++++++++
> >>  arch/x86/kvm/svm/svm.c             | 16 ++++++++++++--
> >>  arch/x86/kvm/svm/svm.h             |  1 +
> >>  6 files changed, 56 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> >> index a12a4987154e..36a9c23a4b27 100644
> >> --- a/arch/x86/include/asm/kvm-x86-ops.h
> >> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> >> @@ -122,6 +122,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush)
> >>  KVM_X86_OP_NULL(migrate_timers)
> >>  KVM_X86_OP(msr_filter_changed)
> >>  KVM_X86_OP_NULL(complete_emulated_msr)
> >> +KVM_X86_OP(alloc_apic_backing_page)
> >>  
> >>  #undef KVM_X86_OP
> >>  #undef KVM_X86_OP_NULL
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index 974cbfb1eefe..5ad6255ff5d5 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -1453,6 +1453,7 @@ struct kvm_x86_ops {
> >>  	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
> >>  
> >>  	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> >> +	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
> >>  };
> >>  
> >>  struct kvm_x86_nested_ops {
> >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> >> index ba5a27879f1d..05b45747b20b 100644
> >> --- a/arch/x86/kvm/lapic.c
> >> +++ b/arch/x86/kvm/lapic.c
> >> @@ -2457,7 +2457,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
> >>  
> >>  	vcpu->arch.apic = apic;
> >>  
> >> -	apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> >> +	if (kvm_x86_ops.alloc_apic_backing_page)
> >> +		apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
> >> +	else
> >> +		apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> >>  	if (!apic->regs) {
> >>  		printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
> >>  		       vcpu->vcpu_id);
> >> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> >> index 1644da5fc93f..8771b878193f 100644
> >> --- a/arch/x86/kvm/svm/sev.c
> >> +++ b/arch/x86/kvm/svm/sev.c
> >> @@ -2703,3 +2703,38 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> >>  		break;
> >>  	}
> >>  }
> >> +
> >> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> >> +{
> >> +	unsigned long pfn;
> >> +	struct page *p;
> >> +
> >> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> >> +		return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> >> +
> >> +	/*
> >> +	 * Allocate an SNP safe page to work around the SNP erratum where
> >> +	 * the CPU will incorrectly signal an RMP violation #PF if a
> >> +	 * hugepage (2mb or 1gb) collides with the RMP entry of the VMCB, VMSA
> >> +	 * or AVIC backing page. The recommended workaround is to not use a
> >> +	 * hugepage.
> >> +	 *
> >> +	 * Allocate one extra page, use a page which is not 2mb aligned
> >> +	 * and free the other.
> >> +	 */
> >> +	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> >> +	if (!p)
> >> +		return NULL;
> >> +
> >> +	split_page(p, 1);
> >> +
> >> +	pfn = page_to_pfn(p);
> >> +	if (IS_ALIGNED(__pfn_to_phys(pfn), PMD_SIZE)) {
> >> +		pfn++;
> >> +		__free_page(p);
> >> +	} else {
> >> +		__free_page(pfn_to_page(pfn + 1));
> >> +	}
> >> +
> >> +	return pfn_to_page(pfn);
> >> +}
> >> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> >> index 25773bf72158..058eea8353c9 100644
> >> --- a/arch/x86/kvm/svm/svm.c
> >> +++ b/arch/x86/kvm/svm/svm.c
> >> @@ -1368,7 +1368,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
> >>  	svm = to_svm(vcpu);
> >>  
> >>  	err = -ENOMEM;
> >> -	vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> >> +	vmcb01_page = snp_safe_alloc_page(vcpu);
> >>  	if (!vmcb01_page)
> >>  		goto out;
> >>  
> >> @@ -1377,7 +1377,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
> >>  		 * SEV-ES guests require a separate VMSA page used to contain
> >>  		 * the encrypted register state of the guest.
> >>  		 */
> >> -		vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> >> +		vmsa_page = snp_safe_alloc_page(vcpu);
> >>  		if (!vmsa_page)
> >>  			goto error_free_vmcb_page;
> >>  
> >> @@ -4539,6 +4539,16 @@ static int svm_vm_init(struct kvm *kvm)
> >>  	return 0;
> >>  }
> >>  
> >> +static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
> >> +{
> >> +	struct page *page = snp_safe_alloc_page(vcpu);
> >> +
> >> +	if (!page)
> >> +		return NULL;
> >> +
> >> +	return page_address(page);
> >> +}
> >> +
> >>  static struct kvm_x86_ops svm_x86_ops __initdata = {
> >>  	.hardware_unsetup = svm_hardware_teardown,
> >>  	.hardware_enable = svm_hardware_enable,
> >> @@ -4667,6 +4677,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> >>  	.complete_emulated_msr = svm_complete_emulated_msr,
> >>  
> >>  	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> >> +
> >> +	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
> >>  };
> >>  
> >>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
> >> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> >> index d1f1512a4b47..e40800e9c998 100644
> >> --- a/arch/x86/kvm/svm/svm.h
> >> +++ b/arch/x86/kvm/svm/svm.h
> >> @@ -575,6 +575,7 @@ void sev_es_create_vcpu(struct vcpu_svm *svm);
> >>  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
> >>  void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu);
> >>  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> >> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
> >>  
> >>  /* vmenter.S */
> >>  
> >> -- 
> >> 2.17.1
> >>
> >>
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2021-09-23 18:09     ` Brijesh Singh
  2021-09-23 18:39       ` Dr. David Alan Gilbert
@ 2021-09-23 19:17       ` Marc Orr
  2021-09-23 20:44         ` Brijesh Singh
  1 sibling, 1 reply; 239+ messages in thread
From: Marc Orr @ 2021-09-23 19:17 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Dr. David Alan Gilbert, x86, LKML, kvm list, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Gonda, Peter Zijlstra, Srinivas Pandruvada, David Rientjes,
	Dov Murik, Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Thu, Sep 23, 2021 at 11:09 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
>
> On 9/22/21 1:55 PM, Dr. David Alan Gilbert wrote:
> > * Brijesh Singh (brijesh.singh@amd.com) wrote:
> >> Implement a workaround for an SNP erratum where the CPU will incorrectly
> >> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> >> RMP entry of a VMCB, VMSA or AVIC backing page.
> >>
> >> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
> >> backing pages as "in-use" in the RMP after a successful VMRUN. This is
> >> done for _all_ VMs, not just SNP-Active VMs.
> > Can you explain what 'globally enabled' means?
>
> This means that SNP is enabled on the host, i.e. SYSCFG_MSR.Snp=1. Once it's
> enabled, RMP checks are enforced.
>
>
> > Or more specifically, can we trip this bug on public hardware that has
> > the SNP enabled in the bios, but no SNP init in the host OS?
>
> Enabling SNP support on the host is a 3-step process:
>
> step 1 (BIOS): reserve memory for the RMP table.
>
> step 2 (host): initialize the RMP table memory and set the SYSCFG MSR to
> enable the SNP feature.
>
> step 3 (host): call SNP_INIT to initialize the SNP firmware (this is
> needed only if you ever plan to launch an SNP guest from this host).
>
> "SNP globally enabled" means steps 1 and 2 are done. The RMP checks are
> enforced as soon as step 2 is completed.

It might be good to update the code to reflect this reply, such that
the new `snp_safe_alloc_page()` function introduced in this patch only
deviates from "normal" page allocation logic when these two conditions
are met. We could introduce a global variable, `snp_globally_enabled`,
that defaults to false and gets set to true after we write the SYSCFG
MSR.
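
A minimal sketch of that idea (hypothetical; the name, placement and
attributes are illustrative only):

  /*
   * Set once the host has initialized the RMP table and written SYSCFG
   * to enable SNP (step 2 above); RMP checks are enforced from then on.
   */
  static bool snp_globally_enabled __read_mostly;

  static void snp_enable_rmp_checks(void)
  {
          u64 syscfg;

          rdmsrl(MSR_AMD64_SYSCFG, syscfg);
          wrmsrl(MSR_AMD64_SYSCFG, syscfg | MSR_AMD64_SYSCFG_SNP_EN);
          snp_globally_enabled = true;
  }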

> thanks
>
> >
> > Dave
> >
> >> If the hypervisor accesses an in-use page through a writable translation,
> >> the CPU will throw an RMP violation #PF.  On early SNP hardware, if an
> >> in-use page is 2mb aligned and software accesses any part of the associated
> >> 2mb region with a hupage, the CPU will incorrectly treat the entire 2mb
> >> region as in-use and signal a spurious RMP violation #PF.
> >>
> >> The recommended is to not use the hugepage for the VMCB, VMSA or
> >> AVIC backing page. Add a generic allocator that will ensure that the page
> >> returns is not hugepage (2mb or 1gb) and is safe to be used when SEV-SNP
> >> is enabled.
> >>
> >> Co-developed-by: Marc Orr <marcorr@google.com>
> >> Signed-off-by: Marc Orr <marcorr@google.com>
> >> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> >> ---
> >>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
> >>  arch/x86/include/asm/kvm_host.h    |  1 +
> >>  arch/x86/kvm/lapic.c               |  5 ++++-
> >>  arch/x86/kvm/svm/sev.c             | 35 ++++++++++++++++++++++++++++++
> >>  arch/x86/kvm/svm/svm.c             | 16 ++++++++++++--
> >>  arch/x86/kvm/svm/svm.h             |  1 +
> >>  6 files changed, 56 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> >> index a12a4987154e..36a9c23a4b27 100644
> >> --- a/arch/x86/include/asm/kvm-x86-ops.h
> >> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> >> @@ -122,6 +122,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush)
> >>  KVM_X86_OP_NULL(migrate_timers)
> >>  KVM_X86_OP(msr_filter_changed)
> >>  KVM_X86_OP_NULL(complete_emulated_msr)
> >> +KVM_X86_OP(alloc_apic_backing_page)
> >>
> >>  #undef KVM_X86_OP
> >>  #undef KVM_X86_OP_NULL
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index 974cbfb1eefe..5ad6255ff5d5 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -1453,6 +1453,7 @@ struct kvm_x86_ops {
> >>      int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
> >>
> >>      void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> >> +    void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
> >>  };
> >>
> >>  struct kvm_x86_nested_ops {
> >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> >> index ba5a27879f1d..05b45747b20b 100644
> >> --- a/arch/x86/kvm/lapic.c
> >> +++ b/arch/x86/kvm/lapic.c
> >> @@ -2457,7 +2457,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
> >>
> >>      vcpu->arch.apic = apic;
> >>
> >> -    apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> >> +    if (kvm_x86_ops.alloc_apic_backing_page)
> >> +            apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
> >> +    else
> >> +            apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> >>      if (!apic->regs) {
> >>              printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
> >>                     vcpu->vcpu_id);
> >> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> >> index 1644da5fc93f..8771b878193f 100644
> >> --- a/arch/x86/kvm/svm/sev.c
> >> +++ b/arch/x86/kvm/svm/sev.c
> >> @@ -2703,3 +2703,38 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
> >>              break;
> >>      }
> >>  }
> >> +
> >> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> >> +{
> >> +    unsigned long pfn;
> >> +    struct page *p;
> >> +
> >> +    if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> >> +            return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);

Continuing my other comment, above: if we introduce a
`snp_globally_enabled` var, we could use that here, rather than
`cpu_feature_enabled(X86_FEATURE_SEV_SNP)`.

> >> +
> >> +    /*
> >> +     * Allocate an SNP safe page to work around the SNP erratum where
> >> +     * the CPU will incorrectly signal an RMP violation #PF if a
> >> +     * hugepage (2mb or 1gb) collides with the RMP entry of the VMCB, VMSA
> >> +     * or AVIC backing page. The recommended workaround is to not use a
> >> +     * hugepage.
> >> +     *
> >> +     * Allocate one extra page, use a page which is not 2mb aligned
> >> +     * and free the other.
> >> +     */
> >> +    p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> >> +    if (!p)
> >> +            return NULL;
> >> +
> >> +    split_page(p, 1);
> >> +
> >> +    pfn = page_to_pfn(p);
> >> +    if (IS_ALIGNED(__pfn_to_phys(pfn), PMD_SIZE)) {
> >> +            pfn++;
> >> +            __free_page(p);
> >> +    } else {
> >> +            __free_page(pfn_to_page(pfn + 1));
> >> +    }
> >> +
> >> +    return pfn_to_page(pfn);
> >> +}
> >> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> >> index 25773bf72158..058eea8353c9 100644
> >> --- a/arch/x86/kvm/svm/svm.c
> >> +++ b/arch/x86/kvm/svm/svm.c
> >> @@ -1368,7 +1368,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
> >>      svm = to_svm(vcpu);
> >>
> >>      err = -ENOMEM;
> >> -    vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> >> +    vmcb01_page = snp_safe_alloc_page(vcpu);
> >>      if (!vmcb01_page)
> >>              goto out;
> >>
> >> @@ -1377,7 +1377,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
> >>               * SEV-ES guests require a separate VMSA page used to contain
> >>               * the encrypted register state of the guest.
> >>               */
> >> -            vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> >> +            vmsa_page = snp_safe_alloc_page(vcpu);
> >>              if (!vmsa_page)
> >>                      goto error_free_vmcb_page;
> >>
> >> @@ -4539,6 +4539,16 @@ static int svm_vm_init(struct kvm *kvm)
> >>      return 0;
> >>  }
> >>
> >> +static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
> >> +{
> >> +    struct page *page = snp_safe_alloc_page(vcpu);
> >> +
> >> +    if (!page)
> >> +            return NULL;
> >> +
> >> +    return page_address(page);
> >> +}
> >> +
> >>  static struct kvm_x86_ops svm_x86_ops __initdata = {
> >>      .hardware_unsetup = svm_hardware_teardown,
> >>      .hardware_enable = svm_hardware_enable,
> >> @@ -4667,6 +4677,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> >>      .complete_emulated_msr = svm_complete_emulated_msr,
> >>
> >>      .vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> >> +
> >> +    .alloc_apic_backing_page = svm_alloc_apic_backing_page,
> >>  };
> >>
> >>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
> >> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> >> index d1f1512a4b47..e40800e9c998 100644
> >> --- a/arch/x86/kvm/svm/svm.h
> >> +++ b/arch/x86/kvm/svm/svm.h
> >> @@ -575,6 +575,7 @@ void sev_es_create_vcpu(struct vcpu_svm *svm);
> >>  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
> >>  void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu);
> >>  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> >> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
> >>
> >>  /* vmenter.S */
> >>
> >> --
> >> 2.17.1
> >>
> >>


* Re: [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2021-09-23 19:17       ` Marc Orr
@ 2021-09-23 20:44         ` Brijesh Singh
  2021-09-23 20:55           ` Marc Orr
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-09-23 20:44 UTC (permalink / raw)
  To: Marc Orr
  Cc: Dr. David Alan Gilbert, x86, LKML, kvm list, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Gonda, Peter Zijlstra, Srinivas Pandruvada, David Rientjes,
	Dov Murik, Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy


On 9/23/21 2:17 PM, Marc Orr wrote:

>>>> +
>>>> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +    unsigned long pfn;
>>>> +    struct page *p;
>>>> +
>>>> +    if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>>>> +            return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> Continuing my other comment, above: if we introduce a
> `snp_globally_enabled` var, we could use that here, rather than
> `cpu_feature_enabled(X86_FEATURE_SEV_SNP)`.


Maybe I am missing something; what is wrong with the
cpu_feature_enabled(...) check? It's the same as creating a global
variable. The feature bit is not set if SNP is not
enabled. See patch #3 [1] in this series.

[1]
https://lore.kernel.org/linux-mm/YUN+L0dlFMbC3bd4@zn.tnic/T/#m2ac1242b33abfcd0d9fb22a89f4c103eacf67ea7

thanks



* Re: [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2021-09-23 20:44         ` Brijesh Singh
@ 2021-09-23 20:55           ` Marc Orr
  0 siblings, 0 replies; 239+ messages in thread
From: Marc Orr @ 2021-09-23 20:55 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Dr. David Alan Gilbert, x86, LKML, kvm list, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Gonda, Peter Zijlstra, Srinivas Pandruvada, David Rientjes,
	Dov Murik, Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Thu, Sep 23, 2021 at 1:44 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
>
> On 9/23/21 2:17 PM, Marc Orr wrote:
>
> >>>> +
> >>>> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +    unsigned long pfn;
> >>>> +    struct page *p;
> >>>> +
> >>>> +    if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> >>>> +            return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> > Continuing my other comment, above: if we introduce a
> > `snp_globally_enabled` var, we could use that here, rather than
> > `cpu_feature_enabled(X86_FEATURE_SEV_SNP)`.
>
>
> Maybe I am missing something; what is wrong with the
> cpu_feature_enabled(...) check? It's the same as creating a global
> variable. The feature bit is not set if SNP is not
> enabled. See patch #3 [1] in this series.
>
> [1]
> https://lore.kernel.org/linux-mm/YUN+L0dlFMbC3bd4@zn.tnic/T/#m2ac1242b33abfcd0d9fb22a89f4c103eacf67ea7
>
> thanks

You are right. Patch #3 does exactly what I was asking for in
`snp_rmptable_init()`. Thanks!


* Re: [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2021-09-23 18:39       ` Dr. David Alan Gilbert
@ 2021-09-23 22:23         ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-09-23 22:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 9/23/21 1:39 PM, Dr. David Alan Gilbert wrote:
> * Brijesh Singh (brijesh.singh@amd.com) wrote:
>> On 9/22/21 1:55 PM, Dr. David Alan Gilbert wrote:
>>> * Brijesh Singh (brijesh.singh@amd.com) wrote:
>>>> Implement a workaround for an SNP erratum where the CPU will incorrectly
>>>> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
>>>> RMP entry of a VMCB, VMSA or AVIC backing page.
>>>>
>>>> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
>>>> backing pages as "in-use" in the RMP after a successful VMRUN. This is
>>>> done for _all_ VMs, not just SNP-Active VMs.
>>> Can you explain what 'globally enabled' means?
>> This means that SNP is enabled on the host, i.e. SYSCFG_MSR.Snp=1. Once it's
>> enabled, RMP checks are enforced.
>>
>>
>>> Or more specifically, can we trip this bug on public hardware that has
>>> the SNP enabled in the bios, but no SNP init in the host OS?
>> Enabling SNP support on the host is a 3-step process:
>>
>> step 1 (BIOS): reserve memory for the RMP table.
>>
>> step 2 (host): initialize the RMP table memory and set the SYSCFG MSR to
>> enable the SNP feature.
>>
>> step 3 (host): call SNP_INIT to initialize the SNP firmware (this is
>> needed only if you ever plan to launch an SNP guest from this host).
>>
>> "SNP globally enabled" means steps 1 and 2 are done. The RMP checks are
>> enforced as soon as step 2 is completed.
> So I think that means we don't need to backport this to older kernels
> that don't know about SNP but might run on SNP-enabled hardware (1), since
> those kernels won't do step 2.

Correct.

thanks



* Re: [PATCH Part2 v5 03/45] x86/sev: Add the host SEV-SNP initialization support
  2021-08-20 15:58 ` [PATCH Part2 v5 03/45] x86/sev: Add the host SEV-SNP initialization support Brijesh Singh
@ 2021-09-24  8:58   ` Borislav Petkov
  0 siblings, 0 replies; 239+ messages in thread
From: Borislav Petkov @ 2021-09-24  8:58 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 10:58:36AM -0500, Brijesh Singh wrote:
> @@ -542,6 +544,10 @@
>  #define MSR_AMD64_SYSCFG		0xc0010010
>  #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT	23
>  #define MSR_AMD64_SYSCFG_MEM_ENCRYPT	BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_EN_BIT		24
> +#define MSR_AMD64_SYSCFG_SNP_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT	25
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN	BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)

While you're here, move all defines that belong to that MSR which
we decided it is architectural after all, above the definition of
MSR_AMD64_NB_CFG above.

>  #define MSR_K8_INT_PENDING_MSG		0xc0010055
>  /* C1E active bits in int pending message */
>  #define K8_INTP_C1E_ACTIVE_MASK		0x18000000
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index ab17c93634e9..7936c8139c74 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -24,6 +24,8 @@
>  #include <linux/sev-guest.h>
>  #include <linux/platform_device.h>
>  #include <linux/io.h>
> +#include <linux/cpumask.h>
> +#include <linux/iommu.h>
>  
>  #include <asm/cpu_entry_area.h>
>  #include <asm/stacktrace.h>
> @@ -40,11 +42,19 @@
>  #include <asm/efi.h>
>  #include <asm/cpuid.h>
>  #include <asm/setup.h>
> +#include <asm/apic.h>
> +#include <asm/iommu.h>
>  
>  #include "sev-internal.h"
>  
>  #define DR7_RESET_VALUE        0x400
>  
> +/*
> + * The first 16KB from the RMP_BASE is used by the processor for the

s/for the/for/

> + * bookkeeping, the range need to be added during the RMP entry lookup.

s/need/needs/ ... s/during the/during /

> + */
> +#define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000

Too long: just call it RMP_BASE_OFFSET.

> +static bool get_rmptable_info(u64 *start, u64 *len)
> +{
> +	u64 calc_rmp_sz, rmp_sz, rmp_base, rmp_end, nr_pages;
> +
> +	rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
> +	rdmsrl(MSR_AMD64_RMP_END, rmp_end);
> +
> +	if (!rmp_base || !rmp_end) {
> +		pr_info("Memory for the RMP table has not been reserved by BIOS\n");
> +		return false;
> +	}
> +
> +	rmp_sz = rmp_end - rmp_base + 1;
> +
> +	/*
> +	 * Calculate the amount of memory that must be reserved by the BIOS to
> +	 * address the full system RAM. The reserved memory should also cover the
> +	 * RMP table itself.
> +	 *
> +	 * See PPR Family 19h Model 01h, Revision B1 section 2.1.5.2 for more
> +	 * information on memory requirement.
> +	 */
> +	nr_pages = totalram_pages();
> +	calc_rmp_sz = (((rmp_sz >> PAGE_SHIFT) + nr_pages) << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;

use totalram_pages() directly here - nr_pages is assigned to only once.

> +
> +	if (calc_rmp_sz > rmp_sz) {
> +		pr_info("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
> +			calc_rmp_sz, rmp_sz);
> +		return false;
> +	}
> +
> +	*start = rmp_base;
> +	*len = rmp_sz;
> +
> +	pr_info("RMP table physical address 0x%016llx - 0x%016llx\n", rmp_base, rmp_end);

		"RMP table physical address range: [ ... - ... ]\n", ...

...

> +/*
> + * This must be called after the PCI subsystem. This is because before enabling
> + * the SNP feature we need to ensure that IOMMU supports the SEV-SNP feature.

"... it needs to be ensured that... "

> + * The iommu_sev_snp_support() is used for checking the feature, and it is
> + * available after subsys_initcall().
> + */
> +fs_initcall(snp_rmptable_init);
> -- 
> 2.17.1
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH Part2 v5 04/45] x86/sev: Add RMP entry lookup helpers
  2021-08-20 15:58 ` [PATCH Part2 v5 04/45] x86/sev: Add RMP entry lookup helpers Brijesh Singh
@ 2021-09-24  9:49   ` Borislav Petkov
  2021-09-27 16:01     ` Brijesh Singh
  2022-06-02 11:57   ` Jarkko Sakkinen
  1 sibling, 1 reply; 239+ messages in thread
From: Borislav Petkov @ 2021-09-24  9:49 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 10:58:37AM -0500, Brijesh Singh wrote:
> +struct __packed rmpentry {
> +	union {
> +		struct {
> +			u64	assigned	: 1,
> +				pagesize	: 1,
> +				immutable	: 1,
> +				rsvd1		: 9,
> +				gpa		: 39,
> +				asid		: 10,
> +				vmsa		: 1,
> +				validated	: 1,
> +				rsvd2		: 1;
> +		} info;
> +		u64 low;
> +	};
> +	u64 high;
> +};

__packed goes at the end of the struct definition.

> +
> +#define rmpentry_assigned(x)	((x)->info.assigned)
> +#define rmpentry_pagesize(x)	((x)->info.pagesize)

Inline functions pls so that you can get typechecking too.
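
E.g., a quick sketch of what that could look like (illustrative only):

  static inline bool rmpentry_assigned(const struct rmpentry *e)
  {
          return e->info.assigned;
  }

  static inline unsigned int rmpentry_pagesize(const struct rmpentry *e)
  {
          return e->info.pagesize;
  }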

>  
>  #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
>  
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 7936c8139c74..f383d2a89263 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -54,6 +54,8 @@
>   * bookkeeping, the range need to be added during the RMP entry lookup.
>   */
>  #define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
> +#define RMPENTRY_SHIFT			8
> +#define rmptable_page_offset(x)	(RMPTABLE_CPU_BOOKKEEPING_SZ + (((unsigned long)x) >> RMPENTRY_SHIFT))
>  
>  /* For early boot hypervisor communication in SEV-ES enabled guests */
>  static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
> @@ -2376,3 +2378,44 @@ static int __init snp_rmptable_init(void)
>   * available after subsys_initcall().
>   */
>  fs_initcall(snp_rmptable_init);
> +
> +static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
> +{
> +	unsigned long vaddr, paddr = pfn << PAGE_SHIFT;
> +	struct rmpentry *entry, *large_entry;
> +
> +	if (!pfn_valid(pfn))
> +		return ERR_PTR(-EINVAL);
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return ERR_PTR(-ENXIO);

I think that check should happen first.

> +
> +	vaddr = rmptable_start + rmptable_page_offset(paddr);
> +	if (unlikely(vaddr > rmptable_end))
> +		return ERR_PTR(-ENXIO);

Maybe the above -E should be -ENOENT instead so that you have unique
error types for each check to facilitate debugging.

> +
> +	entry = (struct rmpentry *)vaddr;
> +
> +	/* Read a large RMP entry to get the correct page level used in RMP entry. */

That comment needs rewriting.
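
Perhaps something along the lines of (suggestion only):

  /*
   * Also read the RMP entry for the 2MB-aligned base of this PFN: its
   * pagesize field tells us the page level the RMP actually uses for
   * this range.
   */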

> +	vaddr = rmptable_start + rmptable_page_offset(paddr & PMD_MASK);
> +	large_entry = (struct rmpentry *)vaddr;
> +	*level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(large_entry));
> +
> +	return entry;
> +}
> +
> +/*
> + * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
> + * and -errno if there is no corresponding RMP entry.
> + */

kernel-doc format since it is being exported.
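
E.g., a possible kernel-doc sketch (wording illustrative only):

  /**
   * snp_lookup_rmpentry() - query the RMP entry for a given PFN
   * @pfn:   physical frame number to look up
   * @level: on return, the x86 page level recorded in the RMP entry
   *
   * Return: 1 if the RMP entry is assigned, 0 if it exists but is not
   * assigned, -errno if there is no corresponding RMP entry.
   */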

> +int snp_lookup_rmpentry(u64 pfn, int *level)
> +{
> +	struct rmpentry *e;
> +
> +	e = __snp_lookup_rmpentry(pfn, level);
> +	if (IS_ERR(e))
> +		return PTR_ERR(e);
> +
> +	return !!rmpentry_assigned(e);
> +}
> +EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);

This export is for kvm, I presume?

> diff --git a/include/linux/sev.h b/include/linux/sev.h
> new file mode 100644
> index 000000000000..1a68842789e1
> --- /dev/null
> +++ b/include/linux/sev.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * AMD Secure Encrypted Virtualization
> + *
> + * Author: Brijesh Singh <brijesh.singh@amd.com>
> + */
> +
> +#ifndef __LINUX_SEV_H
> +#define __LINUX_SEV_H
> +
> +/* RMPUPDATE detected 4K page and 2MB page overlap. */
> +#define RMPUPDATE_FAIL_OVERLAP		7
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +int snp_lookup_rmpentry(u64 pfn, int *level);
> +int psmash(u64 pfn);
> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
> +int rmp_make_shared(u64 pfn, enum pg_level level);
> +#else
> +static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
> +static inline int psmash(u64 pfn) { return -ENXIO; }
> +static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
> +				   bool immutable)
> +{
> +	return -ENODEV;
> +}
> +static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
> +
> +#endif /* CONFIG_AMD_MEM_ENCRYPT */
> +#endif /* __LINUX_SEV_H */
> -- 

What is going to use this linux/ namespace header?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH Part2 v5 05/45] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  2021-08-20 15:58 ` [PATCH Part2 v5 05/45] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Brijesh Singh
@ 2021-09-24 14:04   ` Borislav Petkov
  2021-09-27 16:06     ` Brijesh Singh
  2021-10-15 18:05   ` Sean Christopherson
  1 sibling, 1 reply; 239+ messages in thread
From: Borislav Petkov @ 2021-09-24 14:04 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 10:58:38AM -0500, Brijesh Singh wrote:
> The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
> hypervisor will use the instruction to add pages to the RMP table. See
> APM3 for details on the instruction operations.
> 
> The PSMASH instruction expands a 2MB RMP entry into a corresponding set of
> contiguous 4KB-Page RMP entries. The hypervisor will use this instruction
> to adjust the RMP entry without invalidating the previous RMP entry.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/include/asm/sev.h | 11 ++++++
>  arch/x86/kernel/sev.c      | 72 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 83 insertions(+)
> 
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 5b1a6a075c47..92ced9626e95 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -78,7 +78,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>  
>  /* RMP page size */
>  #define RMP_PG_SIZE_4K			0
> +#define RMP_PG_SIZE_2M			1
>  #define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
> +#define X86_TO_RMP_PG_LEVEL(level)	(((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
>  
>  /*
>   * The RMP entry format is not architectural. The format is defined in PPR
> @@ -107,6 +109,15 @@ struct __packed rmpentry {
>  
>  #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
>  
> +struct rmpupdate {

Function is called the same way - maybe this should be called
rmpupdate_desc or so.

> +	u64 gpa;
> +	u8 assigned;
> +	u8 pagesize;
> +	u8 immutable;
> +	u8 rsvd;
> +	u32 asid;
> +} __packed;
> +
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>  extern struct static_key_false sev_es_enable_key;
>  extern void __sev_es_ist_enter(struct pt_regs *regs);
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index f383d2a89263..8627c49666c9 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2419,3 +2419,75 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
>  	return !!rmpentry_assigned(e);
>  }
>  EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> +

<--- kernel-doc comment.

> +int psmash(u64 pfn)
> +{
> +	unsigned long paddr = pfn << PAGE_SHIFT;
> +	int ret;
> +
> +	if (!pfn_valid(pfn))
> +		return -EINVAL;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return -ENXIO;

Make that the first check pls.

> +
> +	/* Binutils version 2.36 supports the PSMASH mnemonic. */
> +	asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
> +		      : "=a"(ret)
> +		      : "a"(paddr)
> +		      : "memory", "cc");
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(psmash);

That's for kvm?

> +static int rmpupdate(u64 pfn, struct rmpupdate *val)
> +{
> +	unsigned long paddr = pfn << PAGE_SHIFT;
> +	int ret;
> +
> +	if (!pfn_valid(pfn))
> +		return -EINVAL;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return -ENXIO;

Also first check.

> +
> +	/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> +	asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> +		     : "=a"(ret)
> +		     : "a"(paddr), "c"((unsigned long)val)
> +		     : "memory", "cc");
> +	return ret;
> +}
> +
> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
> +{
> +	struct rmpupdate val;
> +
> +	if (!pfn_valid(pfn))
> +		return -EINVAL;

rmpupdate() does that check too so choose one place please and kill the
other.

> +	memset(&val, 0, sizeof(val));
> +	val.assigned = 1;
> +	val.asid = asid;
> +	val.immutable = immutable;
> +	val.gpa = gpa;
> +	val.pagesize = X86_TO_RMP_PG_LEVEL(level);
> +
> +	return rmpupdate(pfn, &val);
> +}
> +EXPORT_SYMBOL_GPL(rmp_make_private);
> +
> +int rmp_make_shared(u64 pfn, enum pg_level level)
> +{
> +	struct rmpupdate val;
> +
> +	if (!pfn_valid(pfn))
> +		return -EINVAL;
> +
> +	memset(&val, 0, sizeof(val));
> +	val.pagesize = X86_TO_RMP_PG_LEVEL(level);
> +
> +	return rmpupdate(pfn, &val);
> +}
> +EXPORT_SYMBOL_GPL(rmp_make_shared);

Both exports for kvm I assume?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH Part2 v5 04/45] x86/sev: Add RMP entry lookup helpers
  2021-09-24  9:49   ` Borislav Petkov
@ 2021-09-27 16:01     ` Brijesh Singh
  2021-09-27 16:04       ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-09-27 16:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

Hi Boris,

I agree with all of your comments; responding to your specific questions below.

On 9/24/21 4:49 AM, Borislav Petkov wrote:
...

>> +}
>> +EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> 
> This export is for kvm, I presume?

Yes, both the KVM and CCP (i.e. PSP) drivers will need to look up RMP entries.

> 
>> diff --git a/include/linux/sev.h b/include/linux/sev.h
>> new file mode 100644
>> index 000000000000..1a68842789e1
>> --- /dev/null
>> +++ b/include/linux/sev.h
>> @@ -0,0 +1,30 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * AMD Secure Encrypted Virtualization
>> + *
>> + * Author: Brijesh Singh <brijesh.singh@amd.com>
>> + */
>> +
>> +#ifndef __LINUX_SEV_H
>> +#define __LINUX_SEV_H
>> +
>> +/* RMUPDATE detected 4K page and 2MB page overlap. */
>> +#define RMPUPDATE_FAIL_OVERLAP		7
>> +
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +int snp_lookup_rmpentry(u64 pfn, int *level);
>> +int psmash(u64 pfn);
>> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
>> +int rmp_make_shared(u64 pfn, enum pg_level level);
>> +#else
>> +static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
>> +static inline int psmash(u64 pfn) { return -ENXIO; }
>> +static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
>> +				   bool immutable)
>> +{
>> +	return -ENODEV;
>> +}
>> +static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
>> +
>> +#endif /* CONFIG_AMD_MEM_ENCRYPT */
>> +#endif /* __LINUX_SEV_H */
>> -- 
> 
> What is going to use this linux/ namespace header?
> 

The kvm and ccp drivers.


* Re: [PATCH Part2 v5 04/45] x86/sev: Add RMP entry lookup helpers
  2021-09-27 16:01     ` Brijesh Singh
@ 2021-09-27 16:04       ` Brijesh Singh
  2021-09-29 12:56         ` Borislav Petkov
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-09-27 16:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy



On 9/27/21 11:01 AM, Brijesh Singh wrote:

>>> -- 
>>
>> What is going to use this linux/ namespace header?
>>
> 
> The kvm and ccp drivers.

Currently, we have only x86 drivers using it; are you thinking of moving 
this to arch/x86/include/asm/?

thanks


* Re: [PATCH Part2 v5 05/45] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  2021-09-24 14:04   ` Borislav Petkov
@ 2021-09-27 16:06     ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-09-27 16:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy



On 9/24/21 9:04 AM, Borislav Petkov wrote:
> On Fri, Aug 20, 2021 at 10:58:38AM -0500, Brijesh Singh wrote:
>> The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
>> hypervisor will use the instruction to add pages to the RMP table. See
>> APM3 for details on the instruction operations.
>>
>> The PSMASH instruction expands a 2MB RMP entry into a corresponding set of
>> contiguous 4KB-Page RMP entries. The hypervisor will use this instruction
>> to adjust the RMP entry without invalidating the previous RMP entry.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>   arch/x86/include/asm/sev.h | 11 ++++++
>>   arch/x86/kernel/sev.c      | 72 ++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 83 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
>> index 5b1a6a075c47..92ced9626e95 100644
>> --- a/arch/x86/include/asm/sev.h
>> +++ b/arch/x86/include/asm/sev.h
>> @@ -78,7 +78,9 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>>   
>>   /* RMP page size */
>>   #define RMP_PG_SIZE_4K			0
>> +#define RMP_PG_SIZE_2M			1
>>   #define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
>> +#define X86_TO_RMP_PG_LEVEL(level)	(((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
>>   
>>   /*
>>    * The RMP entry format is not architectural. The format is defined in PPR
>> @@ -107,6 +109,15 @@ struct __packed rmpentry {
>>   
>>   #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
>>   
>> +struct rmpupdate {
> 
> Function is called the same way - maybe this should be called
> rmpupdate_desc or so.

Noted.


>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(psmash);
> 
> That's for kvm?

Yes.

...

>> +EXPORT_SYMBOL_GPL(rmp_make_shared);
> 
> Both exports for kvm I assume?

Yes, both the KVM and CCP drivers need them.

thanks


* Re: [PATCH Part2 v5 25/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2021-08-20 15:58 ` [PATCH Part2 v5 25/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Brijesh Singh
@ 2021-09-27 16:43   ` Peter Gonda
  2021-09-27 19:33     ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Peter Gonda @ 2021-09-27 16:43 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, LKML, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	Marc Orr, sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 10:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
> guest's memory. The data is encrypted with the cryptographic context
> created with the KVM_SEV_SNP_LAUNCH_START.
>
> In addition to inserting data, it can insert two special pages
> into the guest's memory: the secrets page and the CPUID page.
>
> While terminating the guest, reclaim the guest pages added to the RMP
> table. If the reclaim fails, then the page is no longer safe to be
> released back to the system, so leak it.
>
> For more information see the SEV-SNP specification.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  .../virt/kvm/amd-memory-encryption.rst        |  29 +++
>  arch/x86/kvm/svm/sev.c                        | 187 ++++++++++++++++++
>  include/uapi/linux/kvm.h                      |  19 ++
>  3 files changed, 235 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 937af3447954..ddcd94e9ffed 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -478,6 +478,35 @@ Returns: 0 on success, -negative on error
>
>  See the SEV-SNP specification for further detail on the launch input.
>
> +20. KVM_SNP_LAUNCH_UPDATE
> +-------------------------
> +
> +The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also
> +calculates a measurement of the memory contents. The measurement is a signature
> +of the memory contents that can be sent to the guest owner as an attestation
> +that the memory was encrypted correctly by the firmware.
> +
> +Parameters (in): struct  kvm_snp_launch_update
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_sev_snp_launch_update {
> +                __u64 start_gfn;        /* Guest page number to start from. */
> +                __u64 uaddr;            /* userspace address need to be encrypted */
> +                __u32 len;              /* length of memory region */
> +                __u8 imi_page;          /* 1 if memory is part of the IMI */
> +                __u8 page_type;         /* page type */
> +                __u8 vmpl3_perms;       /* VMPL3 permission mask */
> +                __u8 vmpl2_perms;       /* VMPL2 permission mask */
> +                __u8 vmpl1_perms;       /* VMPL1 permission mask */
> +        };
> +
> +See the SEV-SNP spec for further details on how to build the VMPL permission
> +mask and page type.
> +
> +
>  References
>  ==========
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index dbf04a52b23d..4b126598b7aa 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -17,6 +17,7 @@
>  #include <linux/misc_cgroup.h>
>  #include <linux/processor.h>
>  #include <linux/trace_events.h>
> +#include <linux/sev.h>
>  #include <asm/fpu/internal.h>
>
>  #include <asm/pkru.h>
> @@ -227,6 +228,49 @@ static void sev_decommission(unsigned int handle)
>         sev_guest_decommission(&decommission, NULL);
>  }
>
> +static inline void snp_leak_pages(u64 pfn, enum pg_level level)
> +{
> +       unsigned int npages = page_level_size(level) >> PAGE_SHIFT;
> +
> +       WARN(1, "psc failed pfn 0x%llx pages %d (leaking)\n", pfn, npages);
> +
> +       while (npages) {
> +               memory_failure(pfn, 0);
> +               dump_rmpentry(pfn);
> +               npages--;
> +               pfn++;
> +       }
> +}
> +
> +static int snp_page_reclaim(u64 pfn)
> +{
> +       struct sev_data_snp_page_reclaim data = {0};
> +       int err, rc;
> +
> +       data.paddr = __sme_set(pfn << PAGE_SHIFT);
> +       rc = snp_guest_page_reclaim(&data, &err);
> +       if (rc) {
> +               /*
> +                * If the reclaim failed, then page is no longer safe
> +                * to use.
> +                */
> +               snp_leak_pages(pfn, PG_LEVEL_4K);
> +       }
> +
> +       return rc;
> +}
> +
> +static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
> +{
> +       int rc;
> +
> +       rc = rmp_make_shared(pfn, level);
> +       if (rc && leak)
> +               snp_leak_pages(pfn, level);
> +
> +       return rc;
> +}
> +
>  static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
>  {
>         struct sev_data_deactivate deactivate;
> @@ -1620,6 +1664,123 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>         return rc;
>  }
>
> +static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +       struct list_head *head = &sev->regions_list;
> +       struct enc_region *i;
> +
> +       lockdep_assert_held(&kvm->lock);
> +
> +       list_for_each_entry(i, head, list) {
> +               u64 start = i->uaddr;
> +               u64 end = start + i->size;
> +
> +               if (start <= hva && end >= (hva + len))
> +                       return true;
> +       }
> +
> +       return false;
> +}

Internally we actually register the guest memory in chunks for various
reasons. So for our largest SEV VM we have 768 1 GB entries in
|sev->regions_list|. This was OK before because no lookups were done.
Now that we are performing lookups, a linked list with linear-time
lookups seems not ideal; could we switch the backing data structure here
to something more conducive to fast lookups?
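
For instance, a rough sketch using an rbtree keyed by uaddr (assuming the
registered regions never overlap; field and function names here are
illustrative, not from the series):

  struct enc_region {
          struct rb_node node;
          u64 uaddr;
          u64 size;
          /* ... */
  };

  static struct enc_region *find_enc_region(struct rb_root *root, u64 hva)
  {
          struct rb_node *n = root->rb_node;

          while (n) {
                  struct enc_region *r = rb_entry(n, struct enc_region, node);

                  if (hva < r->uaddr)
                          n = n->rb_left;
                  else if (hva >= r->uaddr + r->size)
                          n = n->rb_right;
                  else
                          return r;       /* hva falls inside this region */
          }
          return NULL;
  }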
> +
> +static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +       struct sev_data_snp_launch_update data = {0};
> +       struct kvm_sev_snp_launch_update params;
> +       unsigned long npages, pfn, n = 0;

Could we have a slightly more descriptive name for |n|? nprivate
maybe? Also, why not zero it in the loop below?

for (i = 0, n = 0; i < npages; ++i)

> +       int *error = &argp->error;
> +       struct page **inpages;
> +       int ret, i, level;

Should |i| be an unsigned long since it is tracked in a for loop
with "i < npages", npages being an unsigned long? (|n| too)

> +       u64 gfn;
> +
> +       if (!sev_snp_guest(kvm))
> +               return -ENOTTY;
> +
> +       if (!sev->snp_context)
> +               return -EINVAL;
> +
> +       if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> +               return -EFAULT;
> +
> +       /* Verify that the specified address range is registered. */
> +       if (!is_hva_registered(kvm, params.uaddr, params.len))
> +               return -EINVAL;
> +
> +       /*
> +        * The userspace memory is already locked, so technically we don't
> +        * need to lock it again. A later part of the function needs to know
> +        * the pfn, so call sev_pin_memory() so that we can get the list of
> +        * pages to iterate through.
> +        */
> +       inpages = sev_pin_memory(kvm, params.uaddr, params.len, &npages, 1);
> +       if (!inpages)
> +               return -ENOMEM;
> +
> +       /*
> +        * Verify that all the pages are marked shared in the RMP table before
> +        * going further. This is avoid the cases where the userspace may try

This is *to* avoid cases...

> +        * updating the same page twice.
> +        */
> +       for (i = 0; i < npages; i++) {
> +               if (snp_lookup_rmpentry(page_to_pfn(inpages[i]), &level) != 0) {
> +                       sev_unpin_memory(kvm, inpages, npages);
> +                       return -EFAULT;
> +               }
> +       }
> +
> +       gfn = params.start_gfn;
> +       level = PG_LEVEL_4K;
> +       data.gctx_paddr = __psp_pa(sev->snp_context);
> +
> +       for (i = 0; i < npages; i++) {
> +               pfn = page_to_pfn(inpages[i]);
> +
> +               ret = rmp_make_private(pfn, gfn << PAGE_SHIFT, level, sev_get_asid(kvm), true);
> +               if (ret) {
> +                       ret = -EFAULT;
> +                       goto e_unpin;
> +               }
> +
> +               n++;
> +               data.address = __sme_page_pa(inpages[i]);
> +               data.page_size = X86_TO_RMP_PG_LEVEL(level);
> +               data.page_type = params.page_type;
> +               data.vmpl3_perms = params.vmpl3_perms;
> +               data.vmpl2_perms = params.vmpl2_perms;
> +               data.vmpl1_perms = params.vmpl1_perms;
> +               ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, &data, error);
> +               if (ret) {
> +                       /*
> +                        * If the command failed, then we need to reclaim the page.
> +                        */
> +                       snp_page_reclaim(pfn);
> +                       goto e_unpin;
> +               }

Hmm, if this call fails after the first iteration of this loop, it will
lead to a hard-to-reproduce launch digest, right? Say we are
SNP_LAUNCH_UPDATE'ing just 2 pages, A and B. If we first call this ioctl
and A is SNP_LAUNCH_UPDATE'd but B fails, we then make A shared again
in the RMP. So we must call the ioctl with both pages again, after fixing
the issue with page B. Now the launch digest has something like
Hash(A) then Hash(A, B) (overly simplified), so A will be
included twice, right? I am not sure if anything better can be done
here, but it might be worth documenting, IIUC.
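
(Very roughly, per my reading of the SEV-SNP ABI spec -- see the PAGE_INFO
description there for the real layout -- the launch measurement is a chained
digest, which is why a replayed page contributes twice:

    LD = 0
    for each LAUNCH_UPDATE'd page P:
            LD = Hash(LD || info(P))

so after the retry the digest covers A, then A again, then B.)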

> +
> +               gfn++;
> +       }
> +
> +e_unpin:
> +       /* Content of memory is updated, mark pages dirty */
> +       for (i = 0; i < n; i++) {
> +               set_page_dirty_lock(inpages[i]);
> +               mark_page_accessed(inpages[i]);
> +
> +               /*
> +                 * If it's an error, then update the RMP entry to change page ownership
> +                * to the hypervisor.
> +                */
> +               if (ret)
> +                       host_rmp_make_shared(pfn, level, true);
> +       }
> +
> +       /* Unlock the user pages */
> +       sev_unpin_memory(kvm, inpages, npages);
> +
> +       return ret;
> +}
> +
>  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>         struct kvm_sev_cmd sev_cmd;
> @@ -1712,6 +1873,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>         case KVM_SEV_SNP_LAUNCH_START:
>                 r = snp_launch_start(kvm, &sev_cmd);
>                 break;
> +       case KVM_SEV_SNP_LAUNCH_UPDATE:
> +               r = snp_launch_update(kvm, &sev_cmd);
> +               break;
>         default:
>                 r = -EINVAL;
>                 goto out;
> @@ -1794,6 +1958,29 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
>  static void __unregister_enc_region_locked(struct kvm *kvm,
>                                            struct enc_region *region)
>  {
> +       unsigned long i, pfn;
> +       int level;
> +
> +       /*
> +        * The guest memory pages are assigned in the RMP table. Unassign them
> +        * before releasing the memory.
> +        */
> +       if (sev_snp_guest(kvm)) {
> +               for (i = 0; i < region->npages; i++) {
> +                       pfn = page_to_pfn(region->pages[i]);
> +
> +                       if (!snp_lookup_rmpentry(pfn, &level))
> +                               continue;
> +
> +                       cond_resched();
> +
> +                       if (level > PG_LEVEL_4K)
> +                               pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
> +
> +                       host_rmp_make_shared(pfn, level, true);
> +               }
> +       }
> +
>         sev_unpin_memory(kvm, region->pages, region->npages);
>         list_del(&region->list);
>         kfree(region);
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index e6416e58cd9a..0681be4bdfdf 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1715,6 +1715,7 @@ enum sev_cmd_id {
>         /* SNP specific commands */
>         KVM_SEV_SNP_INIT,
>         KVM_SEV_SNP_LAUNCH_START,
> +       KVM_SEV_SNP_LAUNCH_UPDATE,
>
>         KVM_SEV_NR_MAX,
>  };
> @@ -1831,6 +1832,24 @@ struct kvm_sev_snp_launch_start {
>         __u8 pad[6];
>  };
>
> +#define KVM_SEV_SNP_PAGE_TYPE_NORMAL           0x1
> +#define KVM_SEV_SNP_PAGE_TYPE_VMSA             0x2
> +#define KVM_SEV_SNP_PAGE_TYPE_ZERO             0x3
> +#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED       0x4
> +#define KVM_SEV_SNP_PAGE_TYPE_SECRETS          0x5
> +#define KVM_SEV_SNP_PAGE_TYPE_CPUID            0x6
> +
> +struct kvm_sev_snp_launch_update {
> +       __u64 start_gfn;
> +       __u64 uaddr;
> +       __u32 len;
> +       __u8 imi_page;
> +       __u8 page_type;
> +       __u8 vmpl3_perms;
> +       __u8 vmpl2_perms;
> +       __u8 vmpl1_perms;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3         (1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX       (1 << 2)
> --
> 2.17.1
>
>


* Re: [PATCH Part2 v5 25/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2021-09-27 16:43   ` Peter Gonda
@ 2021-09-27 19:33     ` Brijesh Singh
  2021-10-05 15:01       ` Peter Gonda
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-09-27 19:33 UTC (permalink / raw)
  To: Peter Gonda
  Cc: brijesh.singh, x86, LKML, kvm list, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	Marc Orr, sathyanarayanan.kuppuswamy



On 9/27/21 11:43 AM, Peter Gonda wrote:
...
>>
>> +static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
>> +{
>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +       struct list_head *head = &sev->regions_list;
>> +       struct enc_region *i;
>> +
>> +       lockdep_assert_held(&kvm->lock);
>> +
>> +       list_for_each_entry(i, head, list) {
>> +               u64 start = i->uaddr;
>> +               u64 end = start + i->size;
>> +
>> +               if (start <= hva && end >= (hva + len))
>> +                       return true;
>> +       }
>> +
>> +       return false;
>> +}
> 
> Internally we actually register the guest memory in chunks for various
> reasons. So for our largest SEV VM we have 768 1 GB entries in
> |sev->regions_list|. This was OK before because no lookups were done.
> Now that we are performing lookups, a linked list with linear-time
> lookups seems not ideal; could we switch the backing data structure here
> to something more conducive to fast lookups?
>> +

Interesting; for qemu we had a very small number of regions, so there 
was no strong reason for me to think otherwise. Do you have any 
preference on which data structure to use? A sketch of one option follows.
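
For what it's worth, a minimal sketch of an interval-tree-backed lookup
using the kernel's generic interval tree (include/linux/interval_tree.h);
the embedded node and the regions_itree root in kvm_sev_info are
illustrative assumptions, not part of this series:

	#include <linux/interval_tree.h>

	struct enc_region {
		struct interval_tree_node node;	/* node.start/.last = hva range */
		/* ... existing enc_region fields ... */
	};

	static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
	{
		struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
		struct interval_tree_node *n;

		lockdep_assert_held(&kvm->lock);

		/* O(log n): find any region overlapping [hva, hva + len - 1] */
		n = interval_tree_iter_first(&sev->regions_itree, hva,
					     hva + len - 1);

		/* a single region must cover the whole requested range */
		return n && n->start <= hva && n->last >= hva + len - 1;
	}

Registration would set node.start/node.last and call
interval_tree_insert(); unregistration calls interval_tree_remove().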

>> +static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
>> +{
>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +       struct sev_data_snp_launch_update data = {0};
>> +       struct kvm_sev_snp_launch_update params;
>> +       unsigned long npages, pfn, n = 0;
> 
> Could we have a slightly more descriptive name for |n|? nprivate
> maybe? Also, why not zero it in the loop below?
> 

Sure, I will pick a better name and no need to zero above. I will fix it.

> for (i = 0, n = 0; i < npages; ++i)
> 
>> +       int *error = &argp->error;
>> +       struct page **inpages;
>> +       int ret, i, level;
> 
> Should |i| be an unsigned long, since it is tracked in a for loop
> with "i < npages", npages being an unsigned long? (|n| too)
> 

Noted.

>> +       u64 gfn;
>> +
>> +       if (!sev_snp_guest(kvm))
>> +               return -ENOTTY;
>> +
>> +       if (!sev->snp_context)
>> +               return -EINVAL;
>> +
>> +       if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
>> +               return -EFAULT;
>> +
>> +       /* Verify that the specified address range is registered. */
>> +       if (!is_hva_registered(kvm, params.uaddr, params.len))
>> +               return -EINVAL;
>> +
>> +       /*
>> +        * The userspace memory is already locked so technically we don't
>> +        * need to lock it again. A later part of the function needs to know
>> +        * the pfn, so call sev_pin_memory() so that we can get the list of
>> +        * pages to iterate through.
>> +        */
>> +       inpages = sev_pin_memory(kvm, params.uaddr, params.len, &npages, 1);
>> +       if (!inpages)
>> +               return -ENOMEM;
>> +
>> +       /*
>> +        * Verify that all the pages are marked shared in the RMP table before
>> +        * going further. This is avoid the cases where the userspace may try
> 
> This is *to* avoid cases...
> 
Noted

>> +        * updating the same page twice.
>> +        */
>> +       for (i = 0; i < npages; i++) {
>> +               if (snp_lookup_rmpentry(page_to_pfn(inpages[i]), &level) != 0) {
>> +                       sev_unpin_memory(kvm, inpages, npages);
>> +                       return -EFAULT;
>> +               }
>> +       }
>> +
>> +       gfn = params.start_gfn;
>> +       level = PG_LEVEL_4K;
>> +       data.gctx_paddr = __psp_pa(sev->snp_context);
>> +
>> +       for (i = 0; i < npages; i++) {
>> +               pfn = page_to_pfn(inpages[i]);
>> +
>> +               ret = rmp_make_private(pfn, gfn << PAGE_SHIFT, level, sev_get_asid(kvm), true);
>> +               if (ret) {
>> +                       ret = -EFAULT;
>> +                       goto e_unpin;
>> +               }
>> +
>> +               n++;
>> +               data.address = __sme_page_pa(inpages[i]);
>> +               data.page_size = X86_TO_RMP_PG_LEVEL(level);
>> +               data.page_type = params.page_type;
>> +               data.vmpl3_perms = params.vmpl3_perms;
>> +               data.vmpl2_perms = params.vmpl2_perms;
>> +               data.vmpl1_perms = params.vmpl1_perms;
>> +               ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, &data, error);
>> +               if (ret) {
>> +                       /*
>> +                        * If the command failed then we need to reclaim the page.
>> +                        */
>> +                       snp_page_reclaim(pfn);
>> +                       goto e_unpin;
>> +               }
> 
> Hmm if this call fails after the first iteration of this loop it will
> lead to a hard to reproduce LaunchDigest right? Say if we are
> SnpLaunchUpdating just 2 pages A and B. If we first call this ioctl
> and A is SNP_LAUNCH_UPDATED'd but B fails, we then make A shared again
> in the RMP. So we must call the ioctl with 2 pages again, after fixing
> the issue with page B. Now the Launch digest has something like
> Hash(A) then HASH(A & B) right (overly simplified) so A will be
> included twice right? I am not sure if anything better can be done
> here but might be worth documenting IIUC.
> 

I can add a comment in the documentation that if a LAUNCH_UPDATE fails 
then the user needs to destroy the existing context and start from the 
beginning. I am not sure if we want to support the partial-update case. 
But if we do, we have two choices: a) decommission the context on 
failure, or b) add a new command to destroy the existing context. A 
sketch of option (a) follows.
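
Purely as an illustration of option (a), assuming the
snp_decommission_context() helper introduced elsewhere in this series
can safely be reused at this point:

	ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, &data, error);
	if (ret) {
		snp_page_reclaim(pfn);
		/*
		 * The launch digest now reflects a partial update that
		 * cannot be replayed; tear down the context so that
		 * userspace must restart from LAUNCH_START.
		 */
		snp_decommission_context(kvm);
		goto e_unpin;
	}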


>> +
>> +               gfn++;
>> +       }
>> +
>> +e_unpin:
>> +       /* Content of memory is updated, mark pages dirty */
>> +       for (i = 0; i < n; i++) {
>> +               set_page_dirty_lock(inpages[i]);
>> +               mark_page_accessed(inpages[i]);
>> +
>> +               /*
>> +                * If it's an error, then update the RMP entry to change page ownership
>> +                * to the hypervisor.
>> +                */
>> +               if (ret)
>> +                       host_rmp_make_shared(pfn, level, true);
>> +       }
>> +
>> +       /* Unlock the user pages */
>> +       sev_unpin_memory(kvm, inpages, npages);
>> +
>> +       return ret;
>> +}
>> +
>>   int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>   {
>>          struct kvm_sev_cmd sev_cmd;
>> @@ -1712,6 +1873,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>          case KVM_SEV_SNP_LAUNCH_START:
>>                  r = snp_launch_start(kvm, &sev_cmd);
>>                  break;
>> +       case KVM_SEV_SNP_LAUNCH_UPDATE:
>> +               r = snp_launch_update(kvm, &sev_cmd);
>> +               break;
>>          default:
>>                  r = -EINVAL;
>>                  goto out;
>> @@ -1794,6 +1958,29 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
>>   static void __unregister_enc_region_locked(struct kvm *kvm,
>>                                             struct enc_region *region)
>>   {
>> +       unsigned long i, pfn;
>> +       int level;
>> +
>> +       /*
>> +        * The guest memory pages are assigned in the RMP table. Unassign them
>> +        * before releasing the memory.
>> +        */
>> +       if (sev_snp_guest(kvm)) {
>> +               for (i = 0; i < region->npages; i++) {
>> +                       pfn = page_to_pfn(region->pages[i]);
>> +
>> +                       if (!snp_lookup_rmpentry(pfn, &level))
>> +                               continue;
>> +
>> +                       cond_resched();
>> +
>> +                       if (level > PG_LEVEL_4K)
>> +                               pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
>> +
>> +                       host_rmp_make_shared(pfn, level, true);
>> +               }
>> +       }
>> +
>>          sev_unpin_memory(kvm, region->pages, region->npages);
>>          list_del(&region->list);
>>          kfree(region);
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index e6416e58cd9a..0681be4bdfdf 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1715,6 +1715,7 @@ enum sev_cmd_id {
>>          /* SNP specific commands */
>>          KVM_SEV_SNP_INIT,
>>          KVM_SEV_SNP_LAUNCH_START,
>> +       KVM_SEV_SNP_LAUNCH_UPDATE,
>>
>>          KVM_SEV_NR_MAX,
>>   };
>> @@ -1831,6 +1832,24 @@ struct kvm_sev_snp_launch_start {
>>          __u8 pad[6];
>>   };
>>
>> +#define KVM_SEV_SNP_PAGE_TYPE_NORMAL           0x1
>> +#define KVM_SEV_SNP_PAGE_TYPE_VMSA             0x2
>> +#define KVM_SEV_SNP_PAGE_TYPE_ZERO             0x3
>> +#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED       0x4
>> +#define KVM_SEV_SNP_PAGE_TYPE_SECRETS          0x5
>> +#define KVM_SEV_SNP_PAGE_TYPE_CPUID            0x6
>> +
>> +struct kvm_sev_snp_launch_update {
>> +       __u64 start_gfn;
>> +       __u64 uaddr;
>> +       __u32 len;
>> +       __u8 imi_page;
>> +       __u8 page_type;
>> +       __u8 vmpl3_perms;
>> +       __u8 vmpl2_perms;
>> +       __u8 vmpl1_perms;
>> +};
>> +
>>   #define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
>>   #define KVM_DEV_ASSIGN_PCI_2_3         (1 << 1)
>>   #define KVM_DEV_ASSIGN_MASK_INTX       (1 << 2)
>> --
>> 2.17.1
>>
>>


* Re: [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2021-08-20 15:59 ` [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Brijesh Singh
@ 2021-09-28  9:56   ` Dr. David Alan Gilbert
  2021-10-12 21:48   ` Sean Christopherson
  1 sibling, 0 replies; 239+ messages in thread
From: Dr. David Alan Gilbert @ 2021-09-28  9:56 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

* Brijesh Singh (brijesh.singh@amd.com) wrote:
> SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
> table to be private or shared using the Page State Change MSR protocol
> as defined in the GHCB specification.
> 
> Before changing the page state in the RMP entry, look up the page in the
> NPT to make sure that there is a valid mapping for it. If the mapping
> exists then try to find a workable page level between the NPT and RMP for
> the page. If the page is not mapped in the NPT, then create a fault such
> that it gets mapped before we change the page state in the RMP entry.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/include/asm/sev-common.h |   9 ++
>  arch/x86/kvm/svm/sev.c            | 197 ++++++++++++++++++++++++++++++
>  arch/x86/kvm/trace.h              |  34 ++++++
>  arch/x86/kvm/x86.c                |   1 +
>  4 files changed, 241 insertions(+)
> 
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 91089967ab09..4980f77aa1d5 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -89,6 +89,10 @@ enum psc_op {
>  };
>  
>  #define GHCB_MSR_PSC_REQ		0x014
> +#define GHCB_MSR_PSC_GFN_POS		12
> +#define GHCB_MSR_PSC_GFN_MASK		GENMASK_ULL(39, 0)
> +#define GHCB_MSR_PSC_OP_POS		52
> +#define GHCB_MSR_PSC_OP_MASK		0xf
>  #define GHCB_MSR_PSC_REQ_GFN(gfn, op)			\
>  	/* GHCBData[55:52] */				\
>  	(((u64)((op) & 0xf) << 52) |			\
> @@ -98,6 +102,11 @@ enum psc_op {
>  	GHCB_MSR_PSC_REQ)
>  
>  #define GHCB_MSR_PSC_RESP		0x015
> +#define GHCB_MSR_PSC_ERROR_POS		32
> +#define GHCB_MSR_PSC_ERROR_MASK		GENMASK_ULL(31, 0)
> +#define GHCB_MSR_PSC_ERROR		GENMASK_ULL(31, 0)
> +#define GHCB_MSR_PSC_RSVD_POS		12
> +#define GHCB_MSR_PSC_RSVD_MASK		GENMASK_ULL(19, 0)
>  #define GHCB_MSR_PSC_RESP_VAL(val)			\
>  	/* GHCBData[63:32] */				\
>  	(((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 991b8c996fc1..6d9483ec91ab 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -31,6 +31,7 @@
>  #include "svm_ops.h"
>  #include "cpuid.h"
>  #include "trace.h"
> +#include "mmu.h"
>  
>  #define __ex(x) __kvm_handle_fault_on_reboot(x)
>  
> @@ -2905,6 +2906,181 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
>  	svm->vmcb->control.ghcb_gpa = value;
>  }
>  
> +static int snp_rmptable_psmash(struct kvm *kvm, kvm_pfn_t pfn)
> +{
> +	pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
> +
> +	return psmash(pfn);
> +}
> +
> +static int snp_make_page_shared(struct kvm *kvm, gpa_t gpa, kvm_pfn_t pfn, int level)

....

> +
> +			/*
> +			 * Mark the userspace range unmerable before adding the pages

                                                    ^^^^^^^^^ typo

> +			 * in the RMP table.
> +			 */
> +			mmap_write_lock(kvm->mm);
> +			rc = snp_mark_unmergable(kvm, hva, page_level_size(level));
> +			mmap_write_unlock(kvm->mm);
> +			if (rc)
> +				return -EINVAL;
> +		}
> +
> +		write_lock(&kvm->mmu_lock);
> +
> +		rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);
> +		if (!rc) {
> +			/*
> +			 * This may happen if another vCPU unmapped the page
> +			 * before we acquire the lock. Retry the PSC.
> +			 */
> +			write_unlock(&kvm->mmu_lock);
> +			return 0;
> +		}
> +
> +		/*
> +		 * Adjust the level so that we don't go higher than the backing
> +		 * page level.
> +		 */
> +		level = min_t(size_t, level, npt_level);
> +
> +		trace_kvm_snp_psc(vcpu->vcpu_id, pfn, gpa, op, level);
> +
> +		switch (op) {
> +		case SNP_PAGE_STATE_SHARED:
> +			rc = snp_make_page_shared(kvm, gpa, pfn, level);
> +			break;
> +		case SNP_PAGE_STATE_PRIVATE:
> +			rc = rmp_make_private(pfn, gpa, level, sev->asid, false);

Minor nit: it seems a shame that snp_make_page_shared() and
rmp_make_private() both take gpa, pfn, and level, but in different orders.

Dave

> +			break;
> +		default:
> +			rc = -EINVAL;
> +			break;
> +		}
> +
> +		write_unlock(&kvm->mmu_lock);
> +
> +		if (rc) {
> +			pr_err_ratelimited("Error op %d gpa %llx pfn %llx level %d rc %d\n",
> +					   op, gpa, pfn, level, rc);
> +			return rc;
> +		}
> +
> +		gpa = gpa + page_level_size(level);
> +	}
> +
> +	return 0;
> +}
> +
>  static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>  {
>  	struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -3005,6 +3181,27 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>  				  GHCB_MSR_INFO_POS);
>  		break;
>  	}
> +	case GHCB_MSR_PSC_REQ: {
> +		gfn_t gfn;
> +		int ret;
> +		enum psc_op op;
> +
> +		gfn = get_ghcb_msr_bits(svm, GHCB_MSR_PSC_GFN_MASK, GHCB_MSR_PSC_GFN_POS);
> +		op = get_ghcb_msr_bits(svm, GHCB_MSR_PSC_OP_MASK, GHCB_MSR_PSC_OP_POS);
> +
> +		ret = __snp_handle_page_state_change(vcpu, op, gfn_to_gpa(gfn), PG_LEVEL_4K);
> +
> +		if (ret)
> +			set_ghcb_msr_bits(svm, GHCB_MSR_PSC_ERROR,
> +					  GHCB_MSR_PSC_ERROR_MASK, GHCB_MSR_PSC_ERROR_POS);
> +		else
> +			set_ghcb_msr_bits(svm, 0,
> +					  GHCB_MSR_PSC_ERROR_MASK, GHCB_MSR_PSC_ERROR_POS);
> +
> +		set_ghcb_msr_bits(svm, 0, GHCB_MSR_PSC_RSVD_MASK, GHCB_MSR_PSC_RSVD_POS);
> +		set_ghcb_msr_bits(svm, GHCB_MSR_PSC_RESP, GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
> +		break;
> +	}
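
For reference, here is the guest-side packing that this handler undoes
with get_ghcb_msr_bits(); the helper below is illustrative only (its
name is made up), mirroring the GHCB_MSR_PSC_REQ_GFN() macro from
sev-common.h:

	/* GHCBData: [11:0] = 0x014 (PSC request), [51:12] = gfn, [55:52] = op */
	static u64 ghcb_msr_psc_req(u64 gfn, enum psc_op op)
	{
		return GHCB_MSR_PSC_REQ |
		       ((gfn & GHCB_MSR_PSC_GFN_MASK) << GHCB_MSR_PSC_GFN_POS) |
		       ((u64)(op & GHCB_MSR_PSC_OP_MASK) << GHCB_MSR_PSC_OP_POS);
	}
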
>  	case GHCB_MSR_TERM_REQ: {
>  		u64 reason_set, reason_code;
>  
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 1c360e07856f..35ca1cf8440a 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -7,6 +7,7 @@
>  #include <asm/svm.h>
>  #include <asm/clocksource.h>
>  #include <asm/pvclock-abi.h>
> +#include <asm/sev-common.h>
>  
>  #undef TRACE_SYSTEM
>  #define TRACE_SYSTEM kvm
> @@ -1711,6 +1712,39 @@ TRACE_EVENT(kvm_vmgexit_msr_protocol_exit,
>  		  __entry->vcpu_id, __entry->ghcb_gpa, __entry->result)
>  );
>  
> +/*
> + * Tracepoint for the SEV-SNP page state change processing
> + */
> +#define psc_operation					\
> +	{SNP_PAGE_STATE_PRIVATE, "private"},		\
> +	{SNP_PAGE_STATE_SHARED,  "shared"}		\
> +
> +TRACE_EVENT(kvm_snp_psc,
> +	TP_PROTO(unsigned int vcpu_id, u64 pfn, u64 gpa, u8 op, int level),
> +	TP_ARGS(vcpu_id, pfn, gpa, op, level),
> +
> +	TP_STRUCT__entry(
> +		__field(int, vcpu_id)
> +		__field(u64, pfn)
> +		__field(u64, gpa)
> +		__field(u8, op)
> +		__field(int, level)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu_id = vcpu_id;
> +		__entry->pfn = pfn;
> +		__entry->gpa = gpa;
> +		__entry->op = op;
> +		__entry->level = level;
> +	),
> +
> +	TP_printk("vcpu %u, pfn %llx, gpa %llx, op %s, level %d",
> +		  __entry->vcpu_id, __entry->pfn, __entry->gpa,
> +		  __print_symbolic(__entry->op, psc_operation),
> +		  __entry->level)
> +);
> +
>  #endif /* _TRACE_KVM_H */
>  
>  #undef TRACE_INCLUDE_PATH
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e5d5c5ed7dd4..afcdc75a99f2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12371,3 +12371,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_snp_psc);
> -- 
> 2.17.1
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH Part2 v5 38/45] KVM: SVM: Add support to handle Page State Change VMGEXIT
  2021-08-20 15:59 ` [PATCH Part2 v5 38/45] KVM: SVM: Add support to handle " Brijesh Singh
@ 2021-09-28 10:17   ` Dr. David Alan Gilbert
  2021-09-28 23:20     ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Dr. David Alan Gilbert @ 2021-09-28 10:17 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

* Brijesh Singh (brijesh.singh@amd.com) wrote:
> SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
> table to be private or shared using the Page State Change NAE event
> as defined in the GHCB specification version 2.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/include/asm/sev-common.h |  7 +++
>  arch/x86/kvm/svm/sev.c            | 82 +++++++++++++++++++++++++++++--
>  2 files changed, 84 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 4980f77aa1d5..5ee30bb2cdb8 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -126,6 +126,13 @@ enum psc_op {
>  /* SNP Page State Change NAE event */
>  #define VMGEXIT_PSC_MAX_ENTRY		253
>  
> +/* The page state change hdr structure is not valid */
> +#define PSC_INVALID_HDR			1
> +/* The hdr.cur_entry or hdr.end_entry is not valid */
> +#define PSC_INVALID_ENTRY		2
> +/* Page state change encountered undefined error */
> +#define PSC_UNDEF_ERR			3
> +
>  struct psc_hdr {
>  	u16 cur_entry;
>  	u16 end_entry;
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 6d9483ec91ab..0de85ed63e9b 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2731,6 +2731,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
>  	case SVM_VMGEXIT_AP_JUMP_TABLE:
>  	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>  	case SVM_VMGEXIT_HV_FEATURES:
> +	case SVM_VMGEXIT_PSC:
>  		break;
>  	default:
>  		goto vmgexit_err;
> @@ -3004,13 +3005,13 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
>  		 */
>  		rc = snp_check_and_build_npt(vcpu, gpa, level);
>  		if (rc)
> -			return -EINVAL;
> +			return PSC_UNDEF_ERR;
>  
>  		if (op == SNP_PAGE_STATE_PRIVATE) {
>  			hva_t hva;
>  
>  			if (snp_gpa_to_hva(kvm, gpa, &hva))
> -				return -EINVAL;
> +				return PSC_UNDEF_ERR;
>  
>  			/*
>  			 * Verify that the hva range is registered. This enforcement is
> @@ -3022,7 +3023,7 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
>  			rc = is_hva_registered(kvm, hva, page_level_size(level));
>  			mutex_unlock(&kvm->lock);
>  			if (!rc)
> -				return -EINVAL;
> +				return PSC_UNDEF_ERR;
>  
>  			/*
>  			 * Mark the userspace range unmerable before adding the pages
> @@ -3032,7 +3033,7 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
>  			rc = snp_mark_unmergable(kvm, hva, page_level_size(level));
>  			mmap_write_unlock(kvm->mm);
>  			if (rc)
> -				return -EINVAL;
> +				return PSC_UNDEF_ERR;
>  		}
>  
>  		write_lock(&kvm->mmu_lock);
> @@ -3062,8 +3063,11 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
>  		case SNP_PAGE_STATE_PRIVATE:
>  			rc = rmp_make_private(pfn, gpa, level, sev->asid, false);
>  			break;
> +		case SNP_PAGE_STATE_PSMASH:
> +		case SNP_PAGE_STATE_UNSMASH:
> +			/* TODO: Add support to handle it */
>  		default:
> -			rc = -EINVAL;
> +			rc = PSC_INVALID_ENTRY;
>  			break;
>  		}
>  
> @@ -3081,6 +3085,65 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
>  	return 0;
>  }
>  
> +static inline unsigned long map_to_psc_vmgexit_code(int rc)
> +{
> +	switch (rc) {
> +	case PSC_INVALID_HDR:
> +		return ((1ul << 32) | 1);
> +	case PSC_INVALID_ENTRY:
> +		return ((1ul << 32) | 2);
> +	case RMPUPDATE_FAIL_OVERLAP:
> +		return ((3ul << 32) | 2);
> +	default: return (4ul << 32);
> +	}

Are these the values defined in 56421 section 4.1.6?
If so, that says:
  SW_EXITINFO2[63:32] == 0x00000100
      The hypervisor encountered some other error situation and was not able to complete the
      request identified by page_state_change_header.cur_entry. It is left to the guest to decide how
      to proceed in this situation.

so it looks like the default should be 0x100 rather than 4?

(It's a shame they're all magical constants; it would be nice if the
standard gave them names.)

Dave


> +}
> +
> +static unsigned long snp_handle_page_state_change(struct vcpu_svm *svm)
> +{
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	int level, op, rc = PSC_UNDEF_ERR;
> +	struct snp_psc_desc *info;
> +	struct psc_entry *entry;
> +	u16 cur, end;
> +	gpa_t gpa;
> +
> +	if (!sev_snp_guest(vcpu->kvm))
> +		return PSC_INVALID_HDR;
> +
> +	if (!setup_vmgexit_scratch(svm, true, sizeof(*info))) {
> +		pr_err("vmgexit: scratch area is not setup.\n");
> +		return PSC_INVALID_HDR;
> +	}
> +
> +	info = (struct snp_psc_desc *)svm->ghcb_sa;
> +	cur = info->hdr.cur_entry;
> +	end = info->hdr.end_entry;
> +
> +	if (cur >= VMGEXIT_PSC_MAX_ENTRY ||
> +	    end >= VMGEXIT_PSC_MAX_ENTRY || cur > end)
> +		return PSC_INVALID_ENTRY;
> +
> +	for (; cur <= end; cur++) {
> +		entry = &info->entries[cur];
> +		gpa = gfn_to_gpa(entry->gfn);
> +		level = RMP_TO_X86_PG_LEVEL(entry->pagesize);
> +		op = entry->operation;
> +
> +		if (!IS_ALIGNED(gpa, page_level_size(level))) {
> +			rc = PSC_INVALID_ENTRY;
> +			goto out;
> +		}
> +
> +		rc = __snp_handle_page_state_change(vcpu, op, gpa, level);
> +		if (rc)
> +			goto out;
> +	}
> +
> +out:
> +	info->hdr.cur_entry = cur;
> +	return rc ? map_to_psc_vmgexit_code(rc) : 0;
> +}
> +
>  static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>  {
>  	struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -3315,6 +3378,15 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>  		ret = 1;
>  		break;
>  	}
> +	case SVM_VMGEXIT_PSC: {
> +		unsigned long rc;
> +
> +		ret = 1;
> +
> +		rc = snp_handle_page_state_change(svm);
> +		svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> +		break;
> +	}
>  	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>  		vcpu_unimpl(vcpu,
>  			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
> -- 
> 2.17.1
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH Part2 v5 38/45] KVM: SVM: Add support to handle Page State Change VMGEXIT
  2021-09-28 10:17   ` Dr. David Alan Gilbert
@ 2021-09-28 23:20     ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-09-28 23:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 9/28/21 5:17 AM, Dr. David Alan Gilbert wrote:
> * Brijesh Singh (brijesh.singh@amd.com) wrote:
>> SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
>> table to be private or shared using the Page State Change NAE event
>> as defined in the GHCB specification version 2.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  arch/x86/include/asm/sev-common.h |  7 +++
>>  arch/x86/kvm/svm/sev.c            | 82 +++++++++++++++++++++++++++++--
>>  2 files changed, 84 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
>> index 4980f77aa1d5..5ee30bb2cdb8 100644
>> --- a/arch/x86/include/asm/sev-common.h
>> +++ b/arch/x86/include/asm/sev-common.h
>> @@ -126,6 +126,13 @@ enum psc_op {
>>  /* SNP Page State Change NAE event */
>>  #define VMGEXIT_PSC_MAX_ENTRY		253
>>  
>> +/* The page state change hdr structure is not valid */
>> +#define PSC_INVALID_HDR			1
>> +/* The hdr.cur_entry or hdr.end_entry is not valid */
>> +#define PSC_INVALID_ENTRY		2
>> +/* Page state change encountered undefined error */
>> +#define PSC_UNDEF_ERR			3
>> +
>>  struct psc_hdr {
>>  	u16 cur_entry;
>>  	u16 end_entry;
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 6d9483ec91ab..0de85ed63e9b 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -2731,6 +2731,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
>>  	case SVM_VMGEXIT_AP_JUMP_TABLE:
>>  	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>>  	case SVM_VMGEXIT_HV_FEATURES:
>> +	case SVM_VMGEXIT_PSC:
>>  		break;
>>  	default:
>>  		goto vmgexit_err;
>> @@ -3004,13 +3005,13 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
>>  		 */
>>  		rc = snp_check_and_build_npt(vcpu, gpa, level);
>>  		if (rc)
>> -			return -EINVAL;
>> +			return PSC_UNDEF_ERR;
>>  
>>  		if (op == SNP_PAGE_STATE_PRIVATE) {
>>  			hva_t hva;
>>  
>>  			if (snp_gpa_to_hva(kvm, gpa, &hva))
>> -				return -EINVAL;
>> +				return PSC_UNDEF_ERR;
>>  
>>  			/*
>>  			 * Verify that the hva range is registered. This enforcement is
>> @@ -3022,7 +3023,7 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
>>  			rc = is_hva_registered(kvm, hva, page_level_size(level));
>>  			mutex_unlock(&kvm->lock);
>>  			if (!rc)
>> -				return -EINVAL;
>> +				return PSC_UNDEF_ERR;
>>  
>>  			/*
>>  			 * Mark the userspace range unmerable before adding the pages
>> @@ -3032,7 +3033,7 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
>>  			rc = snp_mark_unmergable(kvm, hva, page_level_size(level));
>>  			mmap_write_unlock(kvm->mm);
>>  			if (rc)
>> -				return -EINVAL;
>> +				return PSC_UNDEF_ERR;
>>  		}
>>  
>>  		write_lock(&kvm->mmu_lock);
>> @@ -3062,8 +3063,11 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
>>  		case SNP_PAGE_STATE_PRIVATE:
>>  			rc = rmp_make_private(pfn, gpa, level, sev->asid, false);
>>  			break;
>> +		case SNP_PAGE_STATE_PSMASH:
>> +		case SNP_PAGE_STATE_UNSMASH:
>> +			/* TODO: Add support to handle it */
>>  		default:
>> -			rc = -EINVAL;
>> +			rc = PSC_INVALID_ENTRY;
>>  			break;
>>  		}
>>  
>> @@ -3081,6 +3085,65 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op,
>>  	return 0;
>>  }
>>  
>> +static inline unsigned long map_to_psc_vmgexit_code(int rc)
>> +{
>> +	switch (rc) {
>> +	case PSC_INVALID_HDR:
>> +		return ((1ul << 32) | 1);
>> +	case PSC_INVALID_ENTRY:
>> +		return ((1ul << 32) | 2);
>> +	case RMPUPDATE_FAIL_OVERLAP:
>> +		return ((3ul << 32) | 2);
>> +	default: return (4ul << 32);
>> +	}
> Are these the values defined in 56421 section 4.1.6 ?
> If so, that says:
>   SW_EXITINFO2[63:32] == 0x00000100
>       The hypervisor encountered some other error situation and was not able to complete the
>       request identified by page_state_change_header.cur_entry. It is left to the guest to decide how
>       to proceed in this situation.
>
> so it looks like the default should be 0x100 rather than 4?

Ah good catch, it should be 0x100. I will fix it.


> (It's a shame they're all magical constants; it would be nice if the
> standard gave them names.)

In early RFCs I had macros to build the error code, but based on
feedback I went with this function with open-coded values. For
reference, named constants might look like the sketch below.
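
Something along these lines would keep the values readable without
going back to full macros (the PSC_EXITINFO2 name is made up here, not
from the GHCB spec; the default folds in the 0x100 fix from above):

	#define PSC_EXITINFO2(err_class, sub_code) \
		(((unsigned long)(err_class) << 32) | (sub_code))

	static inline unsigned long map_to_psc_vmgexit_code(int rc)
	{
		switch (rc) {
		case PSC_INVALID_HDR:
			return PSC_EXITINFO2(1, 1);
		case PSC_INVALID_ENTRY:
			return PSC_EXITINFO2(1, 2);
		case RMPUPDATE_FAIL_OVERLAP:
			return PSC_EXITINFO2(3, 2);
		default:
			/* "some other error situation", per the spec */
			return PSC_EXITINFO2(0x100, 0);
		}
	}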

thanks




* Re: [PATCH Part2 v5 41/45] KVM: SVM: Add support to handle the RMP nested page fault
  2021-08-20 15:59 ` [PATCH Part2 v5 41/45] KVM: SVM: Add support to handle the RMP nested page fault Brijesh Singh
@ 2021-09-29 12:24   ` Dr. David Alan Gilbert
  2021-10-13 17:57   ` Sean Christopherson
  1 sibling, 0 replies; 239+ messages in thread
From: Dr. David Alan Gilbert @ 2021-09-29 12:24 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

* Brijesh Singh (brijesh.singh@amd.com) wrote:
> When SEV-SNP is enabled in the guest, the hardware places restrictions on
> all memory accesses based on the contents of the RMP table. When hardware
> encounters RMP check failure caused by the guest memory access it raises
> the #NPF. The error code contains additional information on the access
> type. See the APM volume 2 for additional information.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 76 ++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c | 14 +++++---
>  arch/x86/kvm/svm/svm.h |  1 +
>  3 files changed, 87 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 65b578463271..712e8907bc39 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3651,3 +3651,79 @@ void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int token)
>  
>  	srcu_read_unlock(&sev->psc_srcu, token);
>  }
> +
> +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
> +{
> +	int rmp_level, npt_level, rc, assigned;
> +	struct kvm *kvm = vcpu->kvm;
> +	gfn_t gfn = gpa_to_gfn(gpa);
> +	bool need_psc = false;
> +	enum psc_op psc_op;
> +	kvm_pfn_t pfn;
> +	bool private;
> +
> +	write_lock(&kvm->mmu_lock);
> +
> +	if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level)))
> +		goto unlock;
> +
> +	assigned = snp_lookup_rmpentry(pfn, &rmp_level);
> +	if (unlikely(assigned < 0))
> +		goto unlock;
> +
> +	private = !!(error_code & PFERR_GUEST_ENC_MASK);
> +
> +	/*
> +	 * If the fault was due to a size mismatch, or the NPT and RMP page levels
> +	 * are not in sync, then use PSMASH to split the RMP entry into 4K.
> +	 */
> +	if ((error_code & PFERR_GUEST_SIZEM_MASK) ||
> +	    (npt_level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M && private)) {
> +		rc = snp_rmptable_psmash(kvm, pfn);
> +		if (rc)
> +			pr_err_ratelimited("psmash failed, gpa 0x%llx pfn 0x%llx rc %d\n",
> +					   gpa, pfn, rc);
> +		goto out;
> +	}
> +
> +	/*
> +	 * If it's a private access, and the page is not assigned in the
> +	 * RMP table, create a new private RMP entry. This can happen if
> +	 * the guest did not use the PSC VMGEXIT to transition the page state
> +	 * before the access.
> +	 */
> +	if (!assigned && private) {
> +		need_psc = 1;
> +		psc_op = SNP_PAGE_STATE_PRIVATE;
> +		goto out;
> +	}
> +
> +	/*
> +	 * If it's a shared access, but the page is private in the RMP table
> +	 * then make the page shared in the RMP table. This can happen if
> +	 * the guest did not use the PSC VMGEXIT to transition the page
> +	 * state before the access.
> +	 */
> +	if (assigned && !private) {
> +		need_psc = 1;
> +		psc_op = SNP_PAGE_STATE_SHARED;
> +	}
> +
> +out:
> +	write_unlock(&kvm->mmu_lock);
> +
> +	if (need_psc)
> +		rc = __snp_handle_page_state_change(vcpu, psc_op, gpa, PG_LEVEL_4K);

That 'rc' never goes anywhere - should it?
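
If it should, the minimal change might be to at least surface the
failure (a sketch; whether a failed implicit PSC should instead kill
the guest is a separate question):

	if (need_psc) {
		rc = __snp_handle_page_state_change(vcpu, psc_op, gpa, PG_LEVEL_4K);
		if (rc)
			pr_warn_ratelimited("implicit PSC failed, gpa 0x%llx op %d rc %d\n",
					    gpa, psc_op, rc);
	}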

> +	/*
> +	 * The fault handler has updated the RMP pagesize, zap the existing
> +	 * rmaps for large entry ranges so that nested page table gets rebuilt
> +	 * with the updated RMP pagesize.
> +	 */
> +	gfn = gpa_to_gfn(gpa) & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
> +	kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
> +	return;
> +
> +unlock:
> +	write_unlock(&kvm->mmu_lock);
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 3784d389247b..3ba62f21b113 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1933,15 +1933,21 @@ static int pf_interception(struct kvm_vcpu *vcpu)
>  static int npf_interception(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +	int rc;
>  
>  	u64 fault_address = svm->vmcb->control.exit_info_2;
>  	u64 error_code = svm->vmcb->control.exit_info_1;
>  
>  	trace_kvm_page_fault(fault_address, error_code);
> -	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
> -			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
> -			svm->vmcb->control.insn_bytes : NULL,
> -			svm->vmcb->control.insn_len);
> +	rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
> +				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
> +				svm->vmcb->control.insn_bytes : NULL,
> +				svm->vmcb->control.insn_len);

If kvm_mmu_page_fault() failed (rc != 0), do you still want to call your
handler?

Dave
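
If the answer is no, one possible shape (a sketch only;
kvm_mmu_page_fault() returns a negative errno on failure, 0 to exit to
userspace, and a positive value to resume the guest):

	/* only attempt the RMP fixup when the generic handler succeeded */
	if (rc > 0 && (error_code & PFERR_GUEST_RMP_MASK))
		handle_rmp_page_fault(vcpu, fault_address, error_code);

	return rc;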

> +	if (error_code & PFERR_GUEST_RMP_MASK)
> +		handle_rmp_page_fault(vcpu, fault_address, error_code);
> +
> +	return rc;
>  }
>  
>  static int db_interception(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index ff91184f9b4a..280072995306 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -626,6 +626,7 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
>  void sev_rmp_page_level_adjust(struct kvm *kvm, kvm_pfn_t pfn, int *level);
>  int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token);
>  void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int token);
> +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
>  
>  /* vmenter.S */
>  
> -- 
> 2.17.1
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [PATCH Part2 v5 04/45] x86/sev: Add RMP entry lookup helpers
  2021-09-27 16:04       ` Brijesh Singh
@ 2021-09-29 12:56         ` Borislav Petkov
  0 siblings, 0 replies; 239+ messages in thread
From: Borislav Petkov @ 2021-09-29 12:56 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Mon, Sep 27, 2021 at 11:04:04AM -0500, Brijesh Singh wrote:
> Currently, we have only x86 drivers using it, are you thinking to move
> this to arch/x86/include/asm/ ?

arch/x86/include/asm/sev.h, yap.

We can always create it later if needed by other architectures.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH Part2 v5 06/45] x86/sev: Invalid pages from direct map when adding it to RMP table
  2021-08-20 15:58 ` [PATCH Part2 v5 06/45] x86/sev: Invalid pages from direct map when adding it to RMP table Brijesh Singh
@ 2021-09-29 14:34   ` Borislav Petkov
  2021-09-30 16:19     ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Borislav Petkov @ 2021-09-29 14:34 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 10:58:39AM -0500, Brijesh Singh wrote:
> Subject: Re: [PATCH Part2 v5 06/45] x86/sev: Invalid pages from direct map when adding it to RMP table

That subject needs to have a verb. I think that verb should be
"Invalidate".

> The integrity guarantee of SEV-SNP is enforced through the RMP table.
> The RMP is used with standard x86 and IOMMU page tables to enforce memory
> restrictions and page access rights. The RMP check is enforced as soon as
> SEV-SNP is enabled globally in the system. When hardware encounters an
> RMP checks failure, it raises a page-fault exception.
> 
> The rmp_make_private() and rmp_make_shared() helpers are used to add
> or remove the pages from the RMP table.

> Improve the rmp_make_private() to
> invalid state so that pages cannot be used in the direct-map after its
> added in the RMP table, and restore to its default valid permission after
> the pages are removed from the RMP table.

That sentence needs rewriting into proper English.

The more important thing is, though, this doesn't talk about *why*
you're doing this: you want to remove pages from the direct map when
they're in the RMP table because something might modify the page and
then the RMP check will fail?

Also, set_direct_map_invalid_noflush() simply clears the Present and RW
bit of a pte.

So what's up?

> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/kernel/sev.c | 61 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 8627c49666c9..bad41deb8335 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2441,10 +2441,42 @@ int psmash(u64 pfn)
>  }
>  EXPORT_SYMBOL_GPL(psmash);
>  
> +static int restore_direct_map(u64 pfn, int npages)

restore_pages_in_direct_map()

> +{
> +	int i, ret = 0;
> +
> +	for (i = 0; i < npages; i++) {
> +		ret = set_direct_map_default_noflush(pfn_to_page(pfn + i));
> +		if (ret)
> +			goto cleanup;
> +	}

So this is looping over a set of virtually contiguous pages, I presume,
and if so, you should add a function called

	set_memory_p_rw()

to arch/x86/mm/pat/set_memory.c which does

	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT | _PAGE_RW), 0);

so that you can do all pages in one go.

> +
> +cleanup:
> +	WARN(ret > 0, "Failed to restore direct map for pfn 0x%llx\n", pfn + i);
> +	return ret;
> +}
> +
> +static int invalid_direct_map(unsigned long pfn, int npages)

invalidate_pages_in_direct_map()

or so.

> +{
> +	int i, ret = 0;
> +
> +	for (i = 0; i < npages; i++) {
> +		ret = set_direct_map_invalid_noflush(pfn_to_page(pfn + i));

Same as above but that helper should do the reverse:

set_memory_np_ro()
{
	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT | _PAGE_RW), 0);
}

Btw, please add those helpers in a separate patch.
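
For concreteness, a sketch of both helpers, assuming they sit next to
the existing set_memory_*() family in arch/x86/mm/pat/set_memory.c
(where change_page_attr_set()/change_page_attr_clear() are visible):

	int set_memory_p_rw(unsigned long addr, int numpages)
	{
		/* set Present and RW on all numpages mappings in one walk */
		return change_page_attr_set(&addr, numpages,
					    __pgprot(_PAGE_PRESENT | _PAGE_RW), 0);
	}

	int set_memory_np_ro(unsigned long addr, int numpages)
	{
		/* clear Present and RW on all numpages mappings in one walk */
		return change_page_attr_clear(&addr, numpages,
					      __pgprot(_PAGE_PRESENT | _PAGE_RW), 0);
	}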

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH Part2 v5 07/45] x86/traps: Define RMP violation #PF error code
  2021-08-20 15:58 ` [PATCH Part2 v5 07/45] x86/traps: Define RMP violation #PF error code Brijesh Singh
@ 2021-09-29 17:25   ` Borislav Petkov
  0 siblings, 0 replies; 239+ messages in thread
From: Borislav Petkov @ 2021-09-29 17:25 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 10:58:40AM -0500, Brijesh Singh wrote:
>  enum x86_pf_error_code {
> -	X86_PF_PROT	=		1 << 0,
> -	X86_PF_WRITE	=		1 << 1,
> -	X86_PF_USER	=		1 << 2,
> -	X86_PF_RSVD	=		1 << 3,
> -	X86_PF_INSTR	=		1 << 4,
> -	X86_PF_PK	=		1 << 5,
> -	X86_PF_SGX	=		1 << 15,
> +	X86_PF_PROT	=		BIT_ULL(0),
> +	X86_PF_WRITE	=		BIT_ULL(1),
> +	X86_PF_USER	=		BIT_ULL(2),
> +	X86_PF_RSVD	=		BIT_ULL(3),
> +	X86_PF_INSTR	=		BIT_ULL(4),
> +	X86_PF_PK	=		BIT_ULL(5),
> +	X86_PF_SGX	=		BIT_ULL(15),
> +	X86_PF_RMP	=		BIT_ULL(31),

Those are tested against error_code mostly, which is unsigned long so it
looks like you wanna use _BITUL() here. Not that it matters on x86-64
but if we want to be precise...
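
I.e., with _BITUL() from include/uapi/linux/const.h the constants stay
unsigned long, matching the error_code they are tested against (sketch
of the hunk as suggested):

	enum x86_pf_error_code {
		X86_PF_PROT	=	_BITUL(0),
		X86_PF_WRITE	=	_BITUL(1),
		X86_PF_USER	=	_BITUL(2),
		X86_PF_RSVD	=	_BITUL(3),
		X86_PF_INSTR	=	_BITUL(4),
		X86_PF_PK	=	_BITUL(5),
		X86_PF_SGX	=	_BITUL(15),
		X86_PF_RMP	=	_BITUL(31),
	};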

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH Part2 v5 08/45] x86/fault: Add support to handle the RMP fault for user address
  2021-08-20 15:58 ` [PATCH Part2 v5 08/45] x86/fault: Add support to handle the RMP fault for user address Brijesh Singh
  2021-08-23 14:20   ` Dave Hansen
@ 2021-09-29 18:19   ` Borislav Petkov
  1 sibling, 0 replies; 239+ messages in thread
From: Borislav Petkov @ 2021-09-29 18:19 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 10:58:41AM -0500, Brijesh Singh wrote:
> +static int handle_user_rmp_page_fault(struct pt_regs *regs, unsigned long error_code,
> +				      unsigned long address)
> +{

#ifdef CONFIG_AMD_MEM_ENCRYPT

> +	int rmp_level, level;
> +	pte_t *pte;
> +	u64 pfn;
> +
> +	pte = lookup_address_in_mm(current->mm, address, &level);
> +
> +	/*
> +	 * It can happen if there was a race between an unmap event and
> +	 * the RMP fault delivery.
> +	 */
> +	if (!pte || !pte_present(*pte))
> +		return 1;
> +
> +	pfn = pte_pfn(*pte);
> +
> +	/* If it's a large page then calculate the fault pfn */
> +	if (level > PG_LEVEL_4K) {
> +		unsigned long mask;
> +
> +		mask = pages_per_hpage(level) - pages_per_hpage(level - 1);

Just use two helper variables named properly instead of this oneliner:

		pages_level 	 = page_level_size(level) / PAGE_SIZE;
		pages_prev_level = page_level_size(level - 1) / PAGE_SIZE;
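
Spelled out in the surrounding hunk, that might read (sketch only,
same arithmetic as the original oneliner):

	if (level > PG_LEVEL_4K) {
		unsigned long pages_level      = page_level_size(level) / PAGE_SIZE;
		unsigned long pages_prev_level = page_level_size(level - 1) / PAGE_SIZE;
		unsigned long mask = pages_level - pages_prev_level;

		pfn |= (address >> PAGE_SHIFT) & mask;
	}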

> +		pfn |= (address >> PAGE_SHIFT) & mask;
> +	}
> +
> +	/*
> +	 * If its a guest private page, then the fault cannot be resolved.
> +	 * Send a SIGBUS to terminate the process.
> +	 */
> +	if (snp_lookup_rmpentry(pfn, &rmp_level)) {
> +		do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
> +		return 1;
> +	}
> +
> +	/*
> +	 * The backing page level is higher than the RMP page level, request
> +	 * to split the page.
> +	 */
> +	if (level > rmp_level)
> +		return 0;
> +
> +	return 1;

#else
	WARN_ON_ONCE(1);
	return -1;
#endif

and also handle that -1 negative value at the call site.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH Part2 v5 09/45] x86/fault: Add support to dump RMP entry on fault
  2021-08-20 15:58 ` [PATCH Part2 v5 09/45] x86/fault: Add support to dump RMP entry on fault Brijesh Singh
@ 2021-09-29 18:38   ` Borislav Petkov
  0 siblings, 0 replies; 239+ messages in thread
From: Borislav Petkov @ 2021-09-29 18:38 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 10:58:42AM -0500, Brijesh Singh wrote:
> When SEV-SNP is enabled globally, a write from the host goes through the
> RMP check. If the hardware encounters the check failure, then it raises
> the #PF (with RMP set). Dump the RMP entry at the faulting pfn to help
> the debug.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/include/asm/sev.h |  7 +++++++
>  arch/x86/kernel/sev.c      | 43 ++++++++++++++++++++++++++++++++++++++
>  arch/x86/mm/fault.c        | 17 +++++++++++----
>  include/linux/sev.h        |  2 ++
>  4 files changed, 65 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 92ced9626e95..569294f687e6 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -106,6 +106,11 @@ struct __packed rmpentry {
>  
>  #define rmpentry_assigned(x)	((x)->info.assigned)
>  #define rmpentry_pagesize(x)	((x)->info.pagesize)
> +#define rmpentry_vmsa(x)	((x)->info.vmsa)
> +#define rmpentry_asid(x)	((x)->info.asid)
> +#define rmpentry_validated(x)	((x)->info.validated)
> +#define rmpentry_gpa(x)		((unsigned long)(x)->info.gpa)
> +#define rmpentry_immutable(x)	((x)->info.immutable)

If some of those accessors are going to be used only in dump_rmpentry(),
then you don't really need them.

>  
>  #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
>  
> @@ -165,6 +170,7 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
>  void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
>  void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
>  void snp_set_wakeup_secondary_cpu(void);
> +void dump_rmpentry(u64 pfn);
>  #ifdef __BOOT_COMPRESSED
>  bool sev_snp_enabled(void);
>  #else
> @@ -188,6 +194,7 @@ static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npage
>  static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
>  static inline void snp_set_wakeup_secondary_cpu(void) { }
>  static inline void sev_snp_cpuid_init(struct boot_params *bp) { }
> +static inline void dump_rmpentry(u64 pfn) {}
>  #ifdef __BOOT_COMPRESSED
>  static inline bool sev_snp_enabled { return false; }
>  #else
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index bad41deb8335..8b3e83e50468 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2404,6 +2404,49 @@ static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
>  	return entry;
>  }
>  
> +void dump_rmpentry(u64 pfn)

snp_dump_rmpentry()

> +{
> +	unsigned long pfn_end;
> +	struct rmpentry *e;
> +	int level;
> +
> +	e = __snp_lookup_rmpentry(pfn, &level);
> +	if (!e) {
> +		pr_alert("failed to read RMP entry pfn 0x%llx\n", pfn);
> +		return;
> +	}
> +
> +	if (rmpentry_assigned(e)) {
> +		pr_alert("RMPEntry paddr 0x%llx [assigned=%d immutable=%d pagesize=%d gpa=0x%lx"
> +			" asid=%d vmsa=%d validated=%d]\n", pfn << PAGE_SHIFT,

WARNING: quoted string split across lines
#174: FILE: arch/x86/kernel/sev.c:2421:
+		pr_alert("RMPEntry paddr 0x%llx [assigned=%d immutable=%d pagesize=%d gpa=0x%lx"
+			" asid=%d vmsa=%d validated=%d]\n", pfn << PAGE_SHIFT,

> +			rmpentry_assigned(e), rmpentry_immutable(e), rmpentry_pagesize(e),
> +			rmpentry_gpa(e), rmpentry_asid(e), rmpentry_vmsa(e),
> +			rmpentry_validated(e));
> +		return;
> +	}
> +
> +	/*
> +	 * If the RMP entry at the faulting pfn was not assigned, then we do not
> +	 * know what caused the RMP violation. To get some useful debug information,
> +	 * let's iterate through the entire 2MB region, and dump the RMP entries if
> +	 * one of the bits in the RMP entry is set.
> +	 */
> +	pfn = pfn & ~(PTRS_PER_PMD - 1);
> +	pfn_end = pfn + PTRS_PER_PMD;
> +
> +	while (pfn < pfn_end) {
> +		e = __snp_lookup_rmpentry(pfn, &level);
> +		if (!e)
> +			return;

return? You mean "continue;"?
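
Presumably with the pfn increment kept, since a bare continue here
would spin forever; roughly (a sketch):

	while (pfn < pfn_end) {
		e = __snp_lookup_rmpentry(pfn, &level);
		if (!e) {
			pfn++;	/* skip unreadable entries, keep scanning */
			continue;
		}

		if (e->low || e->high)
			pr_alert("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
				 pfn << PAGE_SHIFT, e->high, e->low);
		pfn++;
	}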

> +
> +		if (e->low || e->high)
> +			pr_alert("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> +				 pfn << PAGE_SHIFT, e->high, e->low);
> +		pfn++;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(dump_rmpentry);

Why is that exported? Some module will be calling it too?

> +
>  /*
>   * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
>   * and -errno if there is no corresponding RMP entry.
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index f2d543b92f43..9cd33169dfb5 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -33,6 +33,7 @@
>  #include <asm/pgtable_areas.h>		/* VMALLOC_START, ...		*/
>  #include <asm/kvm_para.h>		/* kvm_handle_async_pf		*/
>  #include <asm/vdso.h>			/* fixup_vdso_exception()	*/
> +#include <asm/sev.h>			/* dump_rmpentry()		*/
>  
>  #define CREATE_TRACE_POINTS
>  #include <asm/trace/exceptions.h>
> @@ -289,7 +290,7 @@ static bool low_pfn(unsigned long pfn)
>  	return pfn < max_low_pfn;
>  }
>  
> -static void dump_pagetable(unsigned long address)
> +static void dump_pagetable(unsigned long address, bool show_rmpentry)

I think passing in error_code and testing X86_PF_RMP inside should
make this a bit more palatable than simply "grafting" SNP-specific
functionality to generic paths.

Thx. 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH Part2 v5 42/45] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  2021-08-20 15:59 ` [PATCH Part2 v5 42/45] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Brijesh Singh
@ 2021-09-29 21:33   ` Peter Gonda
  2021-09-29 22:00   ` Peter Gonda
  1 sibling, 0 replies; 239+ messages in thread
From: Peter Gonda @ 2021-09-29 21:33 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, LKML, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	Marc Orr, sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 10:01 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> Version 2 of the GHCB specification added support for two SNP Guest
> Request Message NAE events. These events allow an SEV-SNP guest to
> make requests to the SEV-SNP firmware through the hypervisor using the
> SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
>
> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST with the
> difference of an additional certificate blob that can be passed through
> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
> provides snp_guest_ext_guest_request() that is used by the KVM to get
> both the report and certificate data at once.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 197 +++++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/svm/svm.h |   2 +
>  2 files changed, 193 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 712e8907bc39..81ccad412e55 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -19,6 +19,7 @@
>  #include <linux/trace_events.h>
>  #include <linux/sev.h>
>  #include <linux/ksm.h>
> +#include <linux/sev-guest.h>
>  #include <asm/fpu/internal.h>
>
>  #include <asm/pkru.h>
> @@ -338,6 +339,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
>                 init_srcu_struct(&sev->psc_srcu);
>                 ret = sev_snp_init(&argp->error);
> +               mutex_init(&sev->guest_req_lock);
>         } else {
>                 ret = sev_platform_init(&argp->error);
>         }
> @@ -1602,23 +1604,39 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
>  static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  {
> +       void *context = NULL, *certs_data = NULL, *resp_page = NULL;
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>         struct sev_data_snp_gctx_create data = {};
> -       void *context;
>         int rc;
>
> +       /* Allocate memory used for the certs data in SNP guest request */
> +       certs_data = kmalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
> +       if (!certs_data)
> +               return NULL;
> +
>         /* Allocate memory for context page */
>         context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
>         if (!context)
> -               return NULL;
> +               goto e_free;
> +
> +       /* Allocate a firmware buffer used during the guest command handling. */
> +       resp_page = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
> +       if (!resp_page)
> +               goto e_free;
>
>         data.gctx_paddr = __psp_pa(context);
>         rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
> -       if (rc) {
> -               snp_free_firmware_page(context);
> -               return NULL;
> -       }
> +       if (rc)
> +               goto e_free;
> +
> +       sev->snp_certs_data = certs_data;
>
>         return context;
> +
> +e_free:
> +       snp_free_firmware_page(context);
> +       kfree(certs_data);
> +       return NULL;
>  }
>
>  static int snp_bind_asid(struct kvm *kvm, int *error)
> @@ -2248,6 +2266,8 @@ static int snp_decommission_context(struct kvm *kvm)
>         snp_free_firmware_page(sev->snp_context);
>         sev->snp_context = NULL;
>
> +       kfree(sev->snp_certs_data);
> +
>         return 0;
>  }
>
> @@ -2746,6 +2766,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
>         case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>         case SVM_VMGEXIT_HV_FEATURES:
>         case SVM_VMGEXIT_PSC:
> +       case SVM_VMGEXIT_GUEST_REQUEST:
> +       case SVM_VMGEXIT_EXT_GUEST_REQUEST:
>                 break;
>         default:
>                 goto vmgexit_err;
> @@ -3161,6 +3183,155 @@ static unsigned long snp_handle_page_state_change(struct vcpu_svm *svm)
>         return rc ? map_to_psc_vmgexit_code(rc) : 0;
>  }
>
> +static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
> +                                        struct sev_data_snp_guest_request *data,
> +                                        gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> +       struct kvm *kvm = vcpu->kvm;
> +       kvm_pfn_t req_pfn, resp_pfn;
> +       struct kvm_sev_info *sev;
> +
> +       sev = &to_kvm_svm(kvm)->sev_info;
> +
> +       if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
> +               return SEV_RET_INVALID_PARAM;
> +
> +       req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
> +       if (is_error_noslot_pfn(req_pfn))
> +               return SEV_RET_INVALID_ADDRESS;
> +
> +       resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
> +       if (is_error_noslot_pfn(resp_pfn))
> +               return SEV_RET_INVALID_ADDRESS;
> +
> +       if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
> +               return SEV_RET_INVALID_ADDRESS;
> +
> +       data->gctx_paddr = __psp_pa(sev->snp_context);
> +       data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
> +       data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
> +
> +       return 0;
> +}
> +
> +static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
> +{
> +       u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
> +       int ret;
> +
> +       ret = snp_page_reclaim(pfn);
> +       if (ret)
> +               *rc = SEV_RET_INVALID_ADDRESS;
> +
> +       ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> +       if (ret)
> +               *rc = SEV_RET_INVALID_ADDRESS;
> +}
> +
> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +       struct sev_data_snp_guest_request data = {0};
> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> +       struct kvm *kvm = vcpu->kvm;
> +       struct kvm_sev_info *sev;
> +       unsigned long rc;
> +       int err;
> +
> +       if (!sev_snp_guest(vcpu->kvm)) {
> +               rc = SEV_RET_INVALID_GUEST;
> +               goto e_fail;
> +       }
> +
> +       sev = &to_kvm_svm(kvm)->sev_info;
> +
> +       mutex_lock(&sev->guest_req_lock);
> +
> +       rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
> +       if (rc)
> +               goto unlock;
> +
> +       rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> +       if (rc)
> +               /* use the firmware error code */
> +               rc = err;
> +
> +       snp_cleanup_guest_buf(&data, &rc);
> +
> +unlock:
> +       mutex_unlock(&sev->guest_req_lock);
> +
> +e_fail:
> +       svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> +}
> +
> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +       struct sev_data_snp_guest_request req = {0};
> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> +       struct kvm *kvm = vcpu->kvm;
> +       unsigned long data_npages;
> +       struct kvm_sev_info *sev;
> +       unsigned long rc, err;
> +       u64 data_gpa;
> +
> +       if (!sev_snp_guest(vcpu->kvm)) {
> +               rc = SEV_RET_INVALID_GUEST;
> +               goto e_fail;
> +       }
> +
> +       sev = &to_kvm_svm(kvm)->sev_info;
> +
> +       data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> +       data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
> +
> +       if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> +               rc = SEV_RET_INVALID_ADDRESS;
> +               goto e_fail;
> +       }
> +
> +       /* Verify that requested blob will fit in certificate buffer */
> +       if ((data_npages << PAGE_SHIFT) > SEV_FW_BLOB_MAX_SIZE) {
> +               rc = SEV_RET_INVALID_PARAM;
> +               goto e_fail;
> +       }
> +
> +       mutex_lock(&sev->guest_req_lock);
> +
> +       rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
> +       if (rc)
> +               goto unlock;
> +
> +       rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
> +                                        &data_npages, &err);
> +       if (rc) {
> +               /*
> +                * If buffer length is small then return the expected
> +                * length in rbx.
> +                */
> +               if (err == SNP_GUEST_REQ_INVALID_LEN)
> +                       vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
> +
> +               /* pass the firmware error code */
> +               rc = err;
> +               goto cleanup;
> +       }
> +
> +       /* Copy the certificate blob in the guest memory */
> +       if (data_npages &&
> +           kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
> +               rc = SEV_RET_INVALID_ADDRESS;
> +
> +cleanup:
> +       snp_cleanup_guest_buf(&req, &rc);
> +
> +unlock:
> +       mutex_unlock(&sev->guest_req_lock);
> +
> +e_fail:
> +       svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> +}
> +
>  static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>  {
>         struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -3404,6 +3575,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>                 svm_set_ghcb_sw_exit_info_2(vcpu, rc);
>                 break;
>         }
> +       case SVM_VMGEXIT_GUEST_REQUEST: {
> +               snp_handle_guest_request(svm, control->exit_info_1, control->exit_info_2);
> +
> +               ret = 1;
> +               break;
> +       }
> +       case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
> +               snp_handle_ext_guest_request(svm,
> +                                            control->exit_info_1,
> +                                            control->exit_info_2);
> +
> +               ret = 1;
> +               break;
> +       }

Don't we need some sort of rate limiting here? Since the PSP is a
shared resource that the host, as well as other guests, needs access
to, it seems a little dangerous to let a guest spam this interface.
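
As a strawman, the kernel's generic ratelimit helper could gate the
handler; the per-VM state, the one-second/100-request budget, and the
error code returned to the guest are all assumptions here, not part of
this series:

	#include <linux/ratelimit.h>

	/* Assumed new field in struct kvm_sev_info: */
	struct ratelimit_state guest_req_rs;

	/* At SNP guest init: interval of HZ jiffies, burst of 100. */
	ratelimit_state_init(&sev->guest_req_rs, HZ, 100);

	/* At the top of snp_handle_guest_request(): */
	if (!__ratelimit(&sev->guest_req_rs)) {
		/* Placeholder error code; the GHCB spec would need to define one. */
		svm_set_ghcb_sw_exit_info_2(vcpu, SEV_RET_INVALID_PARAM);
		return;
	}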

>         case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>                 vcpu_unimpl(vcpu,
>                             "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 280072995306..71fe46a778f3 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -92,6 +92,8 @@ struct kvm_sev_info {
>         u64 snp_init_flags;
>         void *snp_context;      /* SNP guest context page */
>         struct srcu_struct psc_srcu;
> +       void *snp_certs_data;
> +       struct mutex guest_req_lock;
>  };
>
>  struct kvm_svm {
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 42/45] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  2021-08-20 15:59 ` [PATCH Part2 v5 42/45] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Brijesh Singh
  2021-09-29 21:33   ` Peter Gonda
@ 2021-09-29 22:00   ` Peter Gonda
  1 sibling, 0 replies; 239+ messages in thread
From: Peter Gonda @ 2021-09-29 22:00 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, LKML, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	Marc Orr, sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 10:01 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> Version 2 of GHCB specification added the support for two SNP Guest
> Request Message NAE events. The events allows for an SEV-SNP guest to
> make request to the SEV-SNP firmware through hypervisor using the
> SNP_GUEST_REQUEST API define in the SEV-SNP firmware specification.
>
> The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST with the
> difference of an additional certificate blob that can be passed through
> the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP driver
> provides snp_guest_ext_guest_request() that is used by the KVM to get
> both the report and certificate data at once.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 197 +++++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/svm/svm.h |   2 +
>  2 files changed, 193 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 712e8907bc39..81ccad412e55 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -19,6 +19,7 @@
>  #include <linux/trace_events.h>
>  #include <linux/sev.h>
>  #include <linux/ksm.h>
> +#include <linux/sev-guest.h>
>  #include <asm/fpu/internal.h>
>
>  #include <asm/pkru.h>
> @@ -338,6 +339,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
>                 init_srcu_struct(&sev->psc_srcu);
>                 ret = sev_snp_init(&argp->error);
> +               mutex_init(&sev->guest_req_lock);
>         } else {
>                 ret = sev_platform_init(&argp->error);
>         }
> @@ -1602,23 +1604,39 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
>  static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  {
> +       void *context = NULL, *certs_data = NULL, *resp_page = NULL;
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>         struct sev_data_snp_gctx_create data = {};
> -       void *context;
>         int rc;
>
> +       /* Allocate memory used for the certs data in SNP guest request */
> +       certs_data = kmalloc(SEV_FW_BLOB_MAX_SIZE, GFP_KERNEL_ACCOUNT);
> +       if (!certs_data)
> +               return NULL;
> +
>         /* Allocate memory for context page */
>         context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
>         if (!context)
> -               return NULL;
> +               goto e_free;
> +
> +       /* Allocate a firmware buffer used during the guest command handling. */
> +       resp_page = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
> +       if (!resp_page)
> +               goto e_free;
>
>         data.gctx_paddr = __psp_pa(context);
>         rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
> -       if (rc) {
> -               snp_free_firmware_page(context);
> -               return NULL;
> -       }
> +       if (rc)
> +               goto e_free;
> +
> +       sev->snp_certs_data = certs_data;
>
>         return context;
> +
> +e_free:
> +       snp_free_firmware_page(context);
> +       kfree(certs_data);
> +       return NULL;
>  }
>
>  static int snp_bind_asid(struct kvm *kvm, int *error)
> @@ -2248,6 +2266,8 @@ static int snp_decommission_context(struct kvm *kvm)
>         snp_free_firmware_page(sev->snp_context);
>         sev->snp_context = NULL;
>
> +       kfree(sev->snp_certs_data);
> +
>         return 0;
>  }
>
> @@ -2746,6 +2766,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm, u64 *exit_code)
>         case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>         case SVM_VMGEXIT_HV_FEATURES:
>         case SVM_VMGEXIT_PSC:
> +       case SVM_VMGEXIT_GUEST_REQUEST:
> +       case SVM_VMGEXIT_EXT_GUEST_REQUEST:
>                 break;
>         default:
>                 goto vmgexit_err;
> @@ -3161,6 +3183,155 @@ static unsigned long snp_handle_page_state_change(struct vcpu_svm *svm)
>         return rc ? map_to_psc_vmgexit_code(rc) : 0;
>  }
>
> +static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
> +                                        struct sev_data_snp_guest_request *data,
> +                                        gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> +       struct kvm *kvm = vcpu->kvm;
> +       kvm_pfn_t req_pfn, resp_pfn;
> +       struct kvm_sev_info *sev;
> +
> +       sev = &to_kvm_svm(kvm)->sev_info;
> +
> +       if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
> +               return SEV_RET_INVALID_PARAM;
> +
> +       req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
> +       if (is_error_noslot_pfn(req_pfn))
> +               return SEV_RET_INVALID_ADDRESS;
> +
> +       resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
> +       if (is_error_noslot_pfn(resp_pfn))
> +               return SEV_RET_INVALID_ADDRESS;
> +
> +       if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
> +               return SEV_RET_INVALID_ADDRESS;
> +
> +       data->gctx_paddr = __psp_pa(sev->snp_context);
> +       data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
> +       data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
> +
> +       return 0;
> +}
> +
> +static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
> +{
> +       u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
> +       int ret;
> +
> +       ret = snp_page_reclaim(pfn);
> +       if (ret)
> +               *rc = SEV_RET_INVALID_ADDRESS;
> +
> +       ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> +       if (ret)
> +               *rc = SEV_RET_INVALID_ADDRESS;
> +}
> +
> +static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +       struct sev_data_snp_guest_request data = {0};
> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> +       struct kvm *kvm = vcpu->kvm;
> +       struct kvm_sev_info *sev;
> +       unsigned long rc;
> +       int err;
> +
> +       if (!sev_snp_guest(vcpu->kvm)) {
> +               rc = SEV_RET_INVALID_GUEST;
> +               goto e_fail;
> +       }
> +
> +       sev = &to_kvm_svm(kvm)->sev_info;
> +
> +       mutex_lock(&sev->guest_req_lock);
> +
> +       rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
> +       if (rc)
> +               goto unlock;
> +
> +       rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> +       if (rc)
> +               /* use the firmware error code */
> +               rc = err;
> +
> +       snp_cleanup_guest_buf(&data, &rc);
> +
> +unlock:
> +       mutex_unlock(&sev->guest_req_lock);
> +
> +e_fail:
> +       svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> +}
> +
> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +       struct sev_data_snp_guest_request req = {0};
> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> +       struct kvm *kvm = vcpu->kvm;
> +       unsigned long data_npages;
> +       struct kvm_sev_info *sev;
> +       unsigned long rc, err;
> +       u64 data_gpa;
> +
> +       if (!sev_snp_guest(vcpu->kvm)) {
> +               rc = SEV_RET_INVALID_GUEST;
> +               goto e_fail;
> +       }
> +
> +       sev = &to_kvm_svm(kvm)->sev_info;
> +
> +       data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> +       data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
> +
> +       if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> +               rc = SEV_RET_INVALID_ADDRESS;
> +               goto e_fail;
> +       }
> +
> +       /* Verify that requested blob will fit in certificate buffer */
> +       if ((data_npages << PAGE_SHIFT) > SEV_FW_BLOB_MAX_SIZE) {
> +               rc = SEV_RET_INVALID_PARAM;
> +               goto e_fail;
> +       }
> +
> +       mutex_lock(&sev->guest_req_lock);
> +
> +       rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
> +       if (rc)
> +               goto unlock;
> +
> +       rc = snp_guest_ext_guest_request(&req, (unsigned long)sev->snp_certs_data,
> +                                        &data_npages, &err);

What's the point of caching the certs from the sev_device in the
sev_info struct if the snp_guest_ext_guest_request() function always
seems to re-copy the certs back into |sev->snp_certs_data| on every
call? I am probably missing something.


> +       if (rc) {
> +               /*
> +                * If buffer length is small then return the expected
> +                * length in rbx.
> +                */
> +               if (err == SNP_GUEST_REQ_INVALID_LEN)
> +                       vcpu->arch.regs[VCPU_REGS_RBX] = data_npages;
> +
> +               /* pass the firmware error code */
> +               rc = err;
> +               goto cleanup;
> +       }
> +
> +       /* Copy the certificate blob in the guest memory */
> +       if (data_npages &&
> +           kvm_write_guest(kvm, data_gpa, sev->snp_certs_data, data_npages << PAGE_SHIFT))
> +               rc = SEV_RET_INVALID_ADDRESS;
> +
> +cleanup:
> +       snp_cleanup_guest_buf(&req, &rc);
> +
> +unlock:
> +       mutex_unlock(&sev->guest_req_lock);
> +
> +e_fail:
> +       svm_set_ghcb_sw_exit_info_2(vcpu, rc);
> +}
> +
>  static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>  {
>         struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -3404,6 +3575,20 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>                 svm_set_ghcb_sw_exit_info_2(vcpu, rc);
>                 break;
>         }
> +       case SVM_VMGEXIT_GUEST_REQUEST: {
> +               snp_handle_guest_request(svm, control->exit_info_1, control->exit_info_2);
> +
> +               ret = 1;
> +               break;
> +       }
> +       case SVM_VMGEXIT_EXT_GUEST_REQUEST: {
> +               snp_handle_ext_guest_request(svm,
> +                                            control->exit_info_1,
> +                                            control->exit_info_2);
> +
> +               ret = 1;
> +               break;
> +       }
>         case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>                 vcpu_unimpl(vcpu,
>                             "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 280072995306..71fe46a778f3 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -92,6 +92,8 @@ struct kvm_sev_info {
>         u64 snp_init_flags;
>         void *snp_context;      /* SNP guest context page */
>         struct srcu_struct psc_srcu;
> +       void *snp_certs_data;
> +       struct mutex guest_req_lock;
>  };
>
>  struct kvm_svm {
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 06/45] x86/sev: Invalid pages from direct map when adding it to RMP table
  2021-09-29 14:34   ` Borislav Petkov
@ 2021-09-30 16:19     ` Brijesh Singh
  2021-10-01 11:06       ` Borislav Petkov
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-09-30 16:19 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

Hi Boris,


On 9/29/21 9:34 AM, Borislav Petkov wrote:

...


>> Improve the rmp_make_private() to
>> invalid state so that pages cannot be used in the direct-map after its
>> added in the RMP table, and restore to its default valid permission after
>> the pages are removed from the RMP table.
> That sentence needs rewriting into proper english.
>
> The more important thing is, though, this doesn't talk about *why*
> you're doing this: you want to remove pages from the direct map when
> they're in the RMP table because something might modify the page and
> then the RMP check will fail?

I'll work to improve the commit description.


> Also, set_direct_map_invalid_noflush() simply clears the Present and RW
> bit of a pte.
>
> So what's up?

set_direct_map_invalid_noflush() does two steps:

1) Split the host page table (if needed)

2) Clear the Present and RW bits from the PTE

In previous patches I was using set_memory_4k() to split the direct
map before adding the pages in the RMP table. Based on Sean's review
comment [1], I switched to using set_direct_map_invalid_noflush() so
that the page is not just split but also removed from the direct map.
The thought process is that if, in the future,
set_direct_map_default_noflush() is improved to restore the large
mapping, then it will all work transparently.

[1] https://lore.kernel.org/lkml/YO9kP1v0TAFXISHD@google.com/#t
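
In code form, a minimal sketch of the invalidate side; the helper name
is an assumption for illustration, not the patch's actual wrapper:

	static int invalidate_pages_in_direct_map(u64 pfn, int npages)
	{
		int i, ret;

		for (i = 0; i < npages; i++) {
			/* Splits the mapping if needed, then clears Present/RW. */
			ret = set_direct_map_invalid_noflush(pfn_to_page(pfn + i));
			if (ret)
				return ret;
		}

		return 0;
	}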


>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  arch/x86/kernel/sev.c | 61 ++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 60 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>> index 8627c49666c9..bad41deb8335 100644
>> --- a/arch/x86/kernel/sev.c
>> +++ b/arch/x86/kernel/sev.c
>> @@ -2441,10 +2441,42 @@ int psmash(u64 pfn)
>>  }
>>  EXPORT_SYMBOL_GPL(psmash);
>>  
>> +static int restore_direct_map(u64 pfn, int npages)
> restore_pages_in_direct_map()
>
>> +{
>> +	int i, ret = 0;
>> +
>> +	for (i = 0; i < npages; i++) {
>> +		ret = set_direct_map_default_noflush(pfn_to_page(pfn + i));
>> +		if (ret)
>> +			goto cleanup;
>> +	}
> So this is looping over a set of virtually contiguous pages, I presume,
> and if so, you should add a function called
>
> 	set_memory_p_rw()
>
> to arch/x86/mm/pat/set_memory.c which does
>
> 	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT | _PAGE_RW), 0);
>
> so that you can do all pages in one go.

I will look into it.
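
Spelled out, the suggested helper plus a caller would look roughly like
this; the caller-side name follows the rename suggested above, and both
are sketches rather than the actual patch:

	/* arch/x86/mm/pat/set_memory.c */
	int set_memory_p_rw(unsigned long addr, int numpages)
	{
		return change_page_attr_set(&addr, numpages,
					    __pgprot(_PAGE_PRESENT | _PAGE_RW), 0);
	}

	/* arch/x86/kernel/sev.c: one CPA call for the whole range. */
	static int restore_pages_in_direct_map(u64 pfn, int npages)
	{
		return set_memory_p_rw((unsigned long)pfn_to_kaddr(pfn), npages);
	}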

thanks

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 32/45] KVM: x86: Define RMP page fault error bits for #NPF
  2021-08-20 15:59 ` [PATCH Part2 v5 32/45] KVM: x86: Define RMP page fault error bits for #NPF Brijesh Singh
@ 2021-09-30 23:41   ` Marc Orr
  2021-10-01 13:03     ` Borislav Petkov
  0 siblings, 1 reply; 239+ messages in thread
From: Marc Orr @ 2021-09-30 23:41 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, LKML, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 9:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> When SEV-SNP is enabled globally, the hardware places restrictions on all
> memory accesses based on the RMP entry, whether the hypervisor or a VM,
> performs the accesses. When hardware encounters an RMP access violation
> during a guest access, it will cause a #VMEXIT(NPF).
>
> See APM2 section 16.36.10 for more details.

nit: Section # should be 15.36.10 (rather than 16.36.10). Also, is it
better to put section headings, rather than numbers, in the commit logs
and comments? Someone mentioned to me that the section numbering in
APM and SDM can move around over time, but the section titles tend to
be more stable. I'm not sure how true this is, so feel free to
disregard this comment.

>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 109e80167f11..a6e764458f3e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -239,8 +239,12 @@ enum x86_intercept_stage;
>  #define PFERR_FETCH_BIT 4
>  #define PFERR_PK_BIT 5
>  #define PFERR_SGX_BIT 15
> +#define PFERR_GUEST_RMP_BIT 31
>  #define PFERR_GUEST_FINAL_BIT 32
>  #define PFERR_GUEST_PAGE_BIT 33
> +#define PFERR_GUEST_ENC_BIT 34
> +#define PFERR_GUEST_SIZEM_BIT 35
> +#define PFERR_GUEST_VMPL_BIT 36
>
>  #define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
>  #define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
> @@ -251,6 +255,10 @@ enum x86_intercept_stage;
>  #define PFERR_SGX_MASK (1U << PFERR_SGX_BIT)
>  #define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
>  #define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)
> +#define PFERR_GUEST_RMP_MASK (1ULL << PFERR_GUEST_RMP_BIT)
> +#define PFERR_GUEST_ENC_MASK (1ULL << PFERR_GUEST_ENC_BIT)
> +#define PFERR_GUEST_SIZEM_MASK (1ULL << PFERR_GUEST_SIZEM_BIT)
> +#define PFERR_GUEST_VMPL_MASK (1ULL << PFERR_GUEST_VMPL_BIT)
>
>  #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |       \
>                                  PFERR_WRITE_MASK |             \
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 06/45] x86/sev: Invalid pages from direct map when adding it to RMP table
  2021-09-30 16:19     ` Brijesh Singh
@ 2021-10-01 11:06       ` Borislav Petkov
  0 siblings, 0 replies; 239+ messages in thread
From: Borislav Petkov @ 2021-10-01 11:06 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Thu, Sep 30, 2021 at 09:19:52AM -0700, Brijesh Singh wrote:
> The thought process is that if, in the future,
> set_direct_map_default_noflush() is improved to restore the large
> mapping, then it will all work transparently.

That's only scratching the surface of the *why*, so please explain why
this dance is being done in a comment above the code, so that it is
clear.

It is not really obvious why that hiding from the direct map is being
done.

Good reasons from that memfd_secret mail are:

"* Prevent cross-process secret userspace memory exposures. Once the secret
memory is allocated, the user can't accidentally pass it into the kernel to
be transmitted somewhere. The secretmem pages cannot be accessed via the
direct map and they are disallowed in GUP."

and in general hiding RMP pages from the direct map is a nice additional
protection.
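
Something along these lines above the invalidation would capture it
(the wording is a sketch, not the eventual patch):

	/*
	 * Unmap the pages from the kernel direct map while they are
	 * assigned in the RMP table: a stray hypervisor write through
	 * the direct map to a guest-owned page would trigger an RMP
	 * #PF, and, as with memfd_secret, pages absent from the direct
	 * map cannot be accidentally passed into the kernel or exposed
	 * across processes.
	 */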

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 32/45] KVM: x86: Define RMP page fault error bits for #NPF
  2021-09-30 23:41   ` Marc Orr
@ 2021-10-01 13:03     ` Borislav Petkov
  0 siblings, 0 replies; 239+ messages in thread
From: Borislav Petkov @ 2021-10-01 13:03 UTC (permalink / raw)
  To: Marc Orr
  Cc: Brijesh Singh, x86, LKML, kvm list, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Thu, Sep 30, 2021 at 04:41:54PM -0700, Marc Orr wrote:
> On Fri, Aug 20, 2021 at 9:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
> >
> > When SEV-SNP is enabled globally, the hardware places restrictions on all
> > memory accesses based on the RMP entry, whether the hypervisor or a VM,
> > performs the accesses. When hardware encounters an RMP access violation
> > during a guest access, it will cause a #VMEXIT(NPF).
> >
> > See APM2 section 16.36.10 for more details.
> 
> nit: Section # should be 15.36.10 (rather than 16.36.10). Also, is it
> better to put section headings, rather than numbers in the commit logs
> and comments? Someone mentioned to me that the section numbering in
> APM and SDM can move around over time, but the section titles tend to
> be more stable. I'm not sure how true this is, so feel free to
> disregard this comment.

No, that comment is correct; please make it unambiguous so that if
someone's looking later, they can find the section even in future
docs. (I'm hoping they don't change headings, that is...)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 25/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2021-09-27 19:33     ` Brijesh Singh
@ 2021-10-05 15:01       ` Peter Gonda
  0 siblings, 0 replies; 239+ messages in thread
From: Peter Gonda @ 2021-10-05 15:01 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, LKML, kvm list, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	Marc Orr, sathyanarayanan.kuppuswamy

On Mon, Sep 27, 2021 at 1:33 PM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
>
>
> On 9/27/21 11:43 AM, Peter Gonda wrote:
> ...
> >>
> >> +static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
> >> +{
> >> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >> +       struct list_head *head = &sev->regions_list;
> >> +       struct enc_region *i;
> >> +
> >> +       lockdep_assert_held(&kvm->lock);
> >> +
> >> +       list_for_each_entry(i, head, list) {
> >> +               u64 start = i->uaddr;
> >> +               u64 end = start + i->size;
> >> +
> >> +               if (start <= hva && end >= (hva + len))
> >> +                       return true;
> >> +       }
> >> +
> >> +       return false;
> >> +}
> >
> > Internally we actually register the guest memory in chunks for various
> > reasons. So for our largest SEV VM we have 768 1 GB entries in
> > |sev->regions_list|. This was OK before because no lookups were done.
> > Now that we are performing lookups, a linked list with linear-time
> > lookups seems not ideal; could we switch the backing data structure here
> > to something more conducive to fast lookups?
> >> +
>
> Interesting, for qemu we had a very small number of regions so there was no
> strong reason for me to think otherwise. Do you have any
> preference on what data structure you would use?

Chatted offline. I think this is fine for now; we won't want to use
our userspace demand pinning with SNP yet.
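
For reference, a sketch of the interval-tree alternative floated above;
the regions_itree root and the it_node member of struct enc_region are
assumptions, not existing code:

	#include <linux/interval_tree.h>

	static void enc_region_insert(struct kvm_sev_info *sev,
				      struct enc_region *region)
	{
		region->it_node.start = region->uaddr;
		region->it_node.last  = region->uaddr + region->size - 1;
		interval_tree_insert(&region->it_node, &sev->regions_itree);
	}

	static bool is_hva_registered(struct kvm_sev_info *sev, hva_t hva, size_t len)
	{
		struct interval_tree_node *n;

		/* O(log n): find any region overlapping [hva, hva + len - 1]. */
		n = interval_tree_iter_first(&sev->regions_itree, hva, hva + len - 1);

		/* Require a single region to cover the whole range. */
		return n && n->start <= hva && n->last >= hva + len - 1;
	}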

>
> >> +static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >> +{
> >> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >> +       struct sev_data_snp_launch_update data = {0};
> >> +       struct kvm_sev_snp_launch_update params;
> >> +       unsigned long npages, pfn, n = 0;
> >
> > Could we have a slightly more descriptive name for |n|? nprivate
> > maybe? Also why not zero in the loop below?
> >
>
> Sure, I will pick a better name and no need to zero above. I will fix it.
>
> > for (i = 0, n = 0; i < npages; ++i)
> >
> >> +       int *error = &argp->error;
> >> +       struct page **inpages;
> >> +       int ret, i, level;
> >
> > Should |i| be an unsigned long since it can is tracked in a for loop
> > with "i < npages" npages being an unsigned long? (|n| too)
> >
>
> Noted.
>
> >> +       u64 gfn;
> >> +
> >> +       if (!sev_snp_guest(kvm))
> >> +               return -ENOTTY;
> >> +
> >> +       if (!sev->snp_context)
> >> +               return -EINVAL;
> >> +
> >> +       if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> >> +               return -EFAULT;
> >> +
> >> +       /* Verify that the specified address range is registered. */
> >> +       if (!is_hva_registered(kvm, params.uaddr, params.len))
> >> +               return -EINVAL;
> >> +
> >> +       /*
> >> +        * The userspace memory is already locked so technically we don't
> >> +        * need to lock it again. Later part of the function needs to know
> >> +        * pfn so call the sev_pin_memory() so that we can get the list of
> >> +        * pages to iterate through.
> >> +        */
> >> +       inpages = sev_pin_memory(kvm, params.uaddr, params.len, &npages, 1);
> >> +       if (!inpages)
> >> +               return -ENOMEM;
> >> +
> >> +       /*
> >> +        * Verify that all the pages are marked shared in the RMP table before
> >> +        * going further. This is avoid the cases where the userspace may try
> >
> > This is *too* avoid cases...
> >
> Noted
>
> >> +        * updating the same page twice.
> >> +        */
> >> +       for (i = 0; i < npages; i++) {
> >> +               if (snp_lookup_rmpentry(page_to_pfn(inpages[i]), &level) != 0) {
> >> +                       sev_unpin_memory(kvm, inpages, npages);
> >> +                       return -EFAULT;
> >> +               }
> >> +       }
> >> +
> >> +       gfn = params.start_gfn;
> >> +       level = PG_LEVEL_4K;
> >> +       data.gctx_paddr = __psp_pa(sev->snp_context);
> >> +
> >> +       for (i = 0; i < npages; i++) {
> >> +               pfn = page_to_pfn(inpages[i]);
> >> +
> >> +               ret = rmp_make_private(pfn, gfn << PAGE_SHIFT, level, sev_get_asid(kvm), true);
> >> +               if (ret) {
> >> +                       ret = -EFAULT;
> >> +                       goto e_unpin;
> >> +               }
> >> +
> >> +               n++;
> >> +               data.address = __sme_page_pa(inpages[i]);
> >> +               data.page_size = X86_TO_RMP_PG_LEVEL(level);
> >> +               data.page_type = params.page_type;
> >> +               data.vmpl3_perms = params.vmpl3_perms;
> >> +               data.vmpl2_perms = params.vmpl2_perms;
> >> +               data.vmpl1_perms = params.vmpl1_perms;
> >> +               ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, &data, error);
> >> +               if (ret) {
> >> +                       /*
> >> +                        * If the command failed then need to reclaim the page.
> >> +                        */
> >> +                       snp_page_reclaim(pfn);
> >> +                       goto e_unpin;
> >> +               }
> >
> > Hmm if this call fails after the first iteration of this loop it will
> > lead to a hard to reproduce LaunchDigest right? Say if we are
> > SnpLaunchUpdating just 2 pages A and B. If we first call this ioctl
> > and A is SNP_LAUNCH_UPDATED'd but B fails, we then make A shared again
> > in the RMP. So we must call the ioctl with 2 pages again, after fixing
> > the issue with page B. Now the Launch digest has something like
> > Hash(A) then HASH(A & B) right (overly simplified) so A will be
> > included twice right? I am not sure if anything better can be done
> > here but might be worth documenting IIUC.
> >
>
> I can add a comment in the documentation that if a LAUNCH_UPDATE fails then
> the user needs to destroy the existing context and start from the beginning.
> I am not sure if we want to support the partial update cases. But in
> that case we have two choices: a) decommission the context on failure or b)
> add a new command to destroy the existing context.
>

Agreed supporting the partial update case seems very tricky.

>
> >> +
> >> +               gfn++;
> >> +       }
> >> +
> >> +e_unpin:
> >> +       /* Content of memory is updated, mark pages dirty */
> >> +       for (i = 0; i < n; i++) {
> >> +               set_page_dirty_lock(inpages[i]);
> >> +               mark_page_accessed(inpages[i]);
> >> +
> >> +               /*
> >> +                * If its an error, then update RMP entry to change page ownership
> >> +                * to the hypervisor.
> >> +                */
> >> +               if (ret)
> >> +                       host_rmp_make_shared(pfn, level, true);
> >> +       }
> >> +
> >> +       /* Unlock the user pages */
> >> +       sev_unpin_memory(kvm, inpages, npages);
> >> +
> >> +       return ret;
> >> +}
> >> +
> >>   int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >>   {
> >>          struct kvm_sev_cmd sev_cmd;
> >> @@ -1712,6 +1873,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >>          case KVM_SEV_SNP_LAUNCH_START:
> >>                  r = snp_launch_start(kvm, &sev_cmd);
> >>                  break;
> >> +       case KVM_SEV_SNP_LAUNCH_UPDATE:
> >> +               r = snp_launch_update(kvm, &sev_cmd);
> >> +               break;
> >>          default:
> >>                  r = -EINVAL;
> >>                  goto out;
> >> @@ -1794,6 +1958,29 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
> >>   static void __unregister_enc_region_locked(struct kvm *kvm,
> >>                                             struct enc_region *region)
> >>   {
> >> +       unsigned long i, pfn;
> >> +       int level;
> >> +
> >> +       /*
> >> +        * The guest memory pages are assigned in the RMP table. Unassign it
> >> +        * before releasing the memory.
> >> +        */
> >> +       if (sev_snp_guest(kvm)) {
> >> +               for (i = 0; i < region->npages; i++) {
> >> +                       pfn = page_to_pfn(region->pages[i]);
> >> +
> >> +                       if (!snp_lookup_rmpentry(pfn, &level))
> >> +                               continue;
> >> +
> >> +                       cond_resched();
> >> +
> >> +                       if (level > PG_LEVEL_4K)
> >> +                               pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
> >> +
> >> +                       host_rmp_make_shared(pfn, level, true);
> >> +               }
> >> +       }
> >> +
> >>          sev_unpin_memory(kvm, region->pages, region->npages);
> >>          list_del(&region->list);
> >>          kfree(region);
> >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >> index e6416e58cd9a..0681be4bdfdf 100644
> >> --- a/include/uapi/linux/kvm.h
> >> +++ b/include/uapi/linux/kvm.h
> >> @@ -1715,6 +1715,7 @@ enum sev_cmd_id {
> >>          /* SNP specific commands */
> >>          KVM_SEV_SNP_INIT,
> >>          KVM_SEV_SNP_LAUNCH_START,
> >> +       KVM_SEV_SNP_LAUNCH_UPDATE,
> >>
> >>          KVM_SEV_NR_MAX,
> >>   };
> >> @@ -1831,6 +1832,24 @@ struct kvm_sev_snp_launch_start {
> >>          __u8 pad[6];
> >>   };
> >>
> >> +#define KVM_SEV_SNP_PAGE_TYPE_NORMAL           0x1
> >> +#define KVM_SEV_SNP_PAGE_TYPE_VMSA             0x2
> >> +#define KVM_SEV_SNP_PAGE_TYPE_ZERO             0x3
> >> +#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED       0x4
> >> +#define KVM_SEV_SNP_PAGE_TYPE_SECRETS          0x5
> >> +#define KVM_SEV_SNP_PAGE_TYPE_CPUID            0x6
> >> +
> >> +struct kvm_sev_snp_launch_update {
> >> +       __u64 start_gfn;
> >> +       __u64 uaddr;
> >> +       __u32 len;
> >> +       __u8 imi_page;
> >> +       __u8 page_type;
> >> +       __u8 vmpl3_perms;
> >> +       __u8 vmpl2_perms;
> >> +       __u8 vmpl1_perms;
> >> +};
> >> +
> >>   #define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
> >>   #define KVM_DEV_ASSIGN_PCI_2_3         (1 << 1)
> >>   #define KVM_DEV_ASSIGN_MASK_INTX       (1 << 2)
> >> --
> >> 2.17.1
> >>
> >>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 26/45] KVM: SVM: Mark the private vma unmerable for SEV-SNP guests
  2021-08-20 15:58 ` [PATCH Part2 v5 26/45] KVM: SVM: Mark the private vma unmerable for SEV-SNP guests Brijesh Singh
  2021-09-23 17:18   ` Dr. David Alan Gilbert
@ 2021-10-12 18:46   ` Sean Christopherson
  2021-10-13 12:39     ` Brijesh Singh
  1 sibling, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-10-12 18:46 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021, Brijesh Singh wrote:
> When SEV-SNP is enabled, the guest private pages are added in the RMP
> table; while adding the pages, the rmp_make_private() unmaps the pages
> from the direct map. If KSM attempts to access those unmapped pages then
> it will trigger #PF (page-not-present).
> 
> Encrypted guest pages cannot be shared between the process, so an
> userspace should not mark the region mergeable but to be safe, mark the
> process vma unmerable before adding the pages in the RMP table.

To be safe from what?  Does the !PRESENT #PF crash the kernel?

> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 4b126598b7aa..dcef0ae5f8e4 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -18,11 +18,13 @@
>  #include <linux/processor.h>
>  #include <linux/trace_events.h>
>  #include <linux/sev.h>
> +#include <linux/ksm.h>
>  #include <asm/fpu/internal.h>
>  
>  #include <asm/pkru.h>
>  #include <asm/trapnr.h>
>  #include <asm/sev.h>
> +#include <asm/mman.h>
>  
>  #include "x86.h"
>  #include "svm.h"
> @@ -1683,6 +1685,30 @@ static bool is_hva_registered(struct kvm *kvm, hva_t hva, size_t len)
>  	return false;
>  }
>  
> +static int snp_mark_unmergable(struct kvm *kvm, u64 start, u64 size)
> +{
> +	struct vm_area_struct *vma;
> +	u64 end = start + size;
> +	int ret;
> +
> +	do {
> +		vma = find_vma_intersection(kvm->mm, start, end);
> +		if (!vma) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> +				  MADV_UNMERGEABLE, &vma->vm_flags);
> +		if (ret)
> +			break;
> +
> +		start = vma->vm_end;
> +	} while (end > vma->vm_end);
> +
> +	return ret;
> +}
> +
>  static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  {
>  	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> @@ -1707,6 +1733,12 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	if (!is_hva_registered(kvm, params.uaddr, params.len))
>  		return -EINVAL;
>  
> +	mmap_write_lock(kvm->mm);
> +	ret = snp_mark_unmergable(kvm, params.uaddr, params.len);
> +	mmap_write_unlock(kvm->mm);

This does not, and practically speaking cannot, work.  There are multiple TOCTOU
bugs, here and in __snp_handle_page_state_change().  Userspace can madvise() the
range at any later point, munmap()/mmap() the entire range, mess with the memslots
in the PSC case, and so on and so forth.  Relying on MADV_UNMERGEABLE for functional
correctness simply cannot work in KVM, barring mmu_notifier and a big pile of code.

> +	if (ret)
> +		return -EFAULT;
> +
>  	/*
>  	 * The userspace memory is already locked so technically we don't
>  	 * need to lock it again. Later part of the function needs to know
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 20/45] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
  2021-08-20 15:58 ` [PATCH Part2 v5 20/45] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT Brijesh Singh
@ 2021-10-12 20:38   ` Sean Christopherson
  0 siblings, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-10-12 20:38 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021, Brijesh Singh wrote:
> Version 2 of the GHCB specification introduced advertisement of features
> that are supported by the Hypervisor.
> 
> Now that KVM supports version 2 of the GHCB specification, bump the
> maximum supported protocol version.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/include/asm/sev-common.h |  2 ++
>  arch/x86/kvm/svm/sev.c            | 14 ++++++++++++++
>  arch/x86/kvm/svm/svm.h            |  3 ++-
>  3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index d70a19000953..779c7e8f836c 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -97,6 +97,8 @@ enum psc_op {
>  /* GHCB Hypervisor Feature Request/Response */
>  #define GHCB_MSR_HV_FT_REQ		0x080
>  #define GHCB_MSR_HV_FT_RESP		0x081
> +#define GHCB_MSR_HV_FT_POS		12
> +#define GHCB_MSR_HV_FT_MASK		GENMASK_ULL(51, 0)
>  #define GHCB_MSR_HV_FT_RESP_VAL(v)			\
>  	/* GHCBData[63:12] */				\
>  	(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 0ca5b5b9aeef..1644da5fc93f 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2184,6 +2184,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>  	case SVM_VMGEXIT_AP_HLT_LOOP:
>  	case SVM_VMGEXIT_AP_JUMP_TABLE:
>  	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> +	case SVM_VMGEXIT_HV_FEATURES:
>  		break;
>  	default:
>  		goto vmgexit_err;
> @@ -2438,6 +2439,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>  				  GHCB_MSR_INFO_MASK,
>  				  GHCB_MSR_INFO_POS);
>  		break;
> +	case GHCB_MSR_HV_FT_REQ: {

Unnecessary braces.
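
i.e., with the braces dropped, the case reads:

	case GHCB_MSR_HV_FT_REQ:
		set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
				  GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
		set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
				  GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
		break;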

> +		set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
> +				  GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
> +		set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
> +				  GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
> +		break;
> +	}
>  	case GHCB_MSR_TERM_REQ: {
>  		u64 reason_set, reason_code;
>  
> @@ -2553,6 +2561,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>  		ret = 1;
>  		break;
>  	}
> +	case SVM_VMGEXIT_HV_FEATURES: {

Same here.

> +		ghcb_set_sw_exit_info_2(ghcb, GHCB_HV_FT_SUPPORTED);
> +
> +		ret = 1;
> +		break;
> +	}
>  	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>  		vcpu_unimpl(vcpu,
>  			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2021-08-20 15:58 ` [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Brijesh Singh
  2021-09-22 18:55   ` Dr. David Alan Gilbert
@ 2021-10-12 20:44   ` Sean Christopherson
  1 sibling, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-10-12 20:44 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021, Brijesh Singh wrote:
> Implement a workaround for an SNP erratum where the CPU will incorrectly
> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> RMP entry of a VMCB, VMSA or AVIC backing page.

...

> @@ -4539,6 +4539,16 @@ static int svm_vm_init(struct kvm *kvm)
>  	return 0;
>  }
>  
> +static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
> +{
> +	struct page *page = snp_safe_alloc_page(vcpu);
> +
> +	if (!page)
> +		return NULL;
> +
> +	return page_address(page);
> +}
> +
>  static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.hardware_unsetup = svm_hardware_teardown,
>  	.hardware_enable = svm_hardware_enable,
> @@ -4667,6 +4677,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.complete_emulated_msr = svm_complete_emulated_msr,
>  
>  	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> +
> +	.alloc_apic_backing_page = svm_alloc_apic_backing_page,

IMO, this should be guarded by a module param or X86_BUG_* to make it clear that
this is a bug and not working as intended.

And doesn't the APIC page need these shenanigans iff AVIC is enabled? (the module
param, not necessarily in the VM)
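
A sketch of the X86_BUG_* flavor; the bug flag name and the fallback
helper are invented here for illustration:

	/* arch/x86/kernel/cpu/amd.c: flag the erratum on affected parts. */
	setup_force_cpu_bug(X86_BUG_SNP_RMP_HUGEPAGE);

	/* svm.c: take the workaround path only when the bug is present. */
	static struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
	{
		if (!boot_cpu_has_bug(X86_BUG_SNP_RMP_HUGEPAGE))
			return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);

		return __snp_safe_alloc_page(vcpu);	/* erratum-avoiding path */
	}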

>  };
>  
>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index d1f1512a4b47..e40800e9c998 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -575,6 +575,7 @@ void sev_es_create_vcpu(struct vcpu_svm *svm);
>  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
>  void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu);
>  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
>  
>  /* vmenter.S */
>  
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2021-08-20 15:59 ` [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Brijesh Singh
  2021-09-28  9:56   ` Dr. David Alan Gilbert
@ 2021-10-12 21:48   ` Sean Christopherson
  2021-10-13 17:04     ` Sean Christopherson
  2021-10-13 17:05     ` Brijesh Singh
  1 sibling, 2 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-10-12 21:48 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021, Brijesh Singh wrote:
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 991b8c996fc1..6d9483ec91ab 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -31,6 +31,7 @@
>  #include "svm_ops.h"
>  #include "cpuid.h"
>  #include "trace.h"
> +#include "mmu.h"
>  
>  #define __ex(x) __kvm_handle_fault_on_reboot(x)
>  
> @@ -2905,6 +2906,181 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
>  	svm->vmcb->control.ghcb_gpa = value;
>  }
>  
> +static int snp_rmptable_psmash(struct kvm *kvm, kvm_pfn_t pfn)
> +{
> +	pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
> +
> +	return psmash(pfn);
> +}
> +
> +static int snp_make_page_shared(struct kvm *kvm, gpa_t gpa, kvm_pfn_t pfn, int level)
> +{
> +	int rc, rmp_level;
> +
> +	rc = snp_lookup_rmpentry(pfn, &rmp_level);
> +	if (rc < 0)
> +		return -EINVAL;
> +
> +	/* If page is not assigned then do nothing */
> +	if (!rc)
> +		return 0;
> +
> +	/*
> +	 * Is the page part of an existing 2MB RMP entry ? Split the 2MB into
> +	 * multiple of 4K-page before making the memory shared.
> +	 */
> +	if (level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M) {
> +		rc = snp_rmptable_psmash(kvm, pfn);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	return rmp_make_shared(pfn, level);
> +}
> +
> +static int snp_check_and_build_npt(struct kvm_vcpu *vcpu, gpa_t gpa, int level)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	int rc, npt_level;
> +	kvm_pfn_t pfn;
> +
> +	/*
> +	 * Get the pfn and level for the gpa from the nested page table.
> +	 *
> +	 * If the tdp walk fails, then its safe to say that there is no
> +	 * valid mapping for this gpa. Create a fault to build the map.
> +	 */
> +	write_lock(&kvm->mmu_lock);

SEV (or any vendor code for that matter) should not be taking mmu_lock.  All of
KVM has somewhat fungible borders between the various components, but IMO this
crosses firmly into "this belongs in the MMU" territory.

For example, I highly doubt this actually needs to take mmu_lock for write.  More
below.

> +	rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);
> +	write_unlock(&kvm->mmu_lock);

What's the point of this walk?  As soon as mmu_lock is dropped, all bets are off.
At best this is a strong hint.  It doesn't hurt anything per se, it's just a waste
of cycles.

> +	if (!rc) {
> +		pfn = kvm_mmu_map_tdp_page(vcpu, gpa, PFERR_USER_MASK, level);

Same here.

> +		if (is_error_noslot_pfn(pfn))
> +			return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int snp_gpa_to_hva(struct kvm *kvm, gpa_t gpa, hva_t *hva)
> +{
> +	struct kvm_memory_slot *slot;
> +	gfn_t gfn = gpa_to_gfn(gpa);
> +	int idx;
> +
> +	idx = srcu_read_lock(&kvm->srcu);
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot) {
> +		srcu_read_unlock(&kvm->srcu, idx);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Note, using the __gfn_to_hva_memslot() is not solely for performance,
> +	 * it's also necessary to avoid the "writable" check in __gfn_to_hva_many(),
> +	 * which will always fail on read-only memslots due to gfn_to_hva() assuming
> +	 * writes.
> +	 */
> +	*hva = __gfn_to_hva_memslot(slot, gfn);
> +	srcu_read_unlock(&kvm->srcu, idx);

*hva is effectively invalidated the instant kvm->srcu is unlocked, e.g. a pending
memslot update can complete immediately after and delete/move the backing memslot.

> +
> +	return 0;
> +}
> +
> +static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op, gpa_t gpa,
> +					  int level)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
> +	struct kvm *kvm = vcpu->kvm;
> +	int rc, npt_level;
> +	kvm_pfn_t pfn;
> +	gpa_t gpa_end;
> +
> +	gpa_end = gpa + page_level_size(level);
> +
> +	while (gpa < gpa_end) {
> +		/*
> +		 * If the gpa is not present in the NPT then build the NPT.
> +		 */
> +		rc = snp_check_and_build_npt(vcpu, gpa, level);
> +		if (rc)
> +			return -EINVAL;
> +
> +		if (op == SNP_PAGE_STATE_PRIVATE) {
> +			hva_t hva;
> +
> +			if (snp_gpa_to_hva(kvm, gpa, &hva))
> +				return -EINVAL;
> +
> +			/*
> +			 * Verify that the hva range is registered. This enforcement is
> +			 * required to avoid the cases where a page is marked private
> +			 * in the RMP table but never gets cleanup during the VM
> +			 * termination path.
> +			 */
> +			mutex_lock(&kvm->lock);
> +			rc = is_hva_registered(kvm, hva, page_level_size(level));

This will get a false negative if a hva+size spans two contiguous regions.

Also, storing a boolean return in a variable that is an int _and_ was already used
for the kernel's standard 0/-errno returns is confusing.

> +			mutex_unlock(&kvm->lock);

This is also subject to races, e.g. userspace unregisters the hva immediately
after this check, before KVM makes whatever conversion it makes below.

A linear walk through a list to find a range is also a bad idea, e.g. pathological
worst case scenario is that userspace has created tens of thousands of individual
regions.  There is no restriction on the number of regions, just the number of
pages that can be pinned.

I dislike the svm_(un)register_enc_region() scheme in general, but at least for
SEV and SEV-ES the code is isolated, e.g. KVM is little more than a dumb pipe to
let userspace pin pages.  I would like to go the opposite direction and work towards
eliminating regions_list (or at least making it optional), not build more stuff
on top.

The more I look at this, the more strongly I feel that private <=> shared conversions
belong in the MMU, and that KVM's SPTEs should be the single source of truth for
shared vs. private.  E.g. add a SPTE_TDP_PRIVATE_MASK in the software available bits.
I believe the only hiccup is the snafu where not zapping _all_ SPTEs on memslot
deletion breaks QEMU+VFIO+GPU, i.e. KVM would lose its canonical info on unrelated
memslot deletion.

But that is a solvable problem.  Ideally the bug, wherever it is, would be root
caused and fixed.  I believe Peter (and Marc?) is going to work on reproducing
the bug.

If we are unable to root cause and fix the bug, I think a viable workaround would
be to clear the hardware present bit in unrelated SPTEs, but keep the SPTEs
themselves.  The idea is mostly the same as the ZAPPED_PRIVATE concept from the initial
TDX RFC.  MMU notifier invalidations, memslot removal, RMP restoration, etc... would
all continue to work since the SPTEs are still there, and KVM's page fault handler
could audit any "blocked" SPTE when it's refaulted (I'm pretty sure it'd be
impossible for the PFN to change, since any PFN change would require a memslot
update or mmu_notifier invalidation).

The downside to that approach is that it would require walking all SPTEs to do a
memslot deletion, i.e. we'd lose the "fast zap" behavior.  If that's a performance
issue, the behavior could be opt-in (but not for SNP/TDX).
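
To make the "blocked SPTE" idea concrete, a rough sketch (the bit position
and names here are made up for illustration, not from any real patch):

	/* A software-available bit marking a "blocked" SPTE. */
	#define SPTE_BLOCKED_MASK	BIT_ULL(60)

	static u64 make_spte_blocked(u64 spte)
	{
		/*
		 * Clear the hardware present/valid bit (bit 0 for NPT), but
		 * keep the PFN and protections so the SPTE can be audited
		 * and unblocked when it's refaulted.
		 */
		return (spte & ~1ULL) | SPTE_BLOCKED_MASK;
	}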

> +			if (!rc)
> +				return -EINVAL;
> +
> +			/*
> +			 * Mark the userspace range unmerable before adding the pages
> +			 * in the RMP table.
> +			 */
> +			mmap_write_lock(kvm->mm);
> +			rc = snp_mark_unmergable(kvm, hva, page_level_size(level));
> +			mmap_write_unlock(kvm->mm);

As mentioned in an earlier patch, this simply cannot work.

> +			if (rc)
> +				return -EINVAL;
> +		}
> +
> +		write_lock(&kvm->mmu_lock);
> +
> +		rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);

Same comment about the bool into int. Though in this case I'd say have
kvm_mmu_get_tdp_walk() return 0/-errno, not a bool.  Boolean returns for helpers
without "is_", "test_", etc... are generally confusing.

> +		if (!rc) {
> +			/*
> +			 * This may happen if another vCPU unmapped the page
> +			 * before we acquire the lock. Retry the PSC.
> +			 */
> +			write_unlock(&kvm->mmu_lock);
> +			return 0;

How will the caller (guest?) know to retry the PSC if KVM returns "success"?

> +		}
> +
> +		/*
> +		 * Adjust the level so that we don't go higher than the backing
> +		 * page level.
> +		 */
> +		level = min_t(size_t, level, npt_level);
> +
> +		trace_kvm_snp_psc(vcpu->vcpu_id, pfn, gpa, op, level);
> +
> +		switch (op) {
> +		case SNP_PAGE_STATE_SHARED:
> +			rc = snp_make_page_shared(kvm, gpa, pfn, level);
> +			break;
> +		case SNP_PAGE_STATE_PRIVATE:
> +			rc = rmp_make_private(pfn, gpa, level, sev->asid, false);
> +			break;
> +		default:
> +			rc = -EINVAL;

Not that it really matters, because I don't think the MADV_* approach is viable,
but this neglects to undo snp_mark_unmergable() on failure.

> +			break;
> +		}
> +
> +		write_unlock(&kvm->mmu_lock);
> +
> +		if (rc) {
> +			pr_err_ratelimited("Error op %d gpa %llx pfn %llx level %d rc %d\n",
> +					   op, gpa, pfn, level, rc);
> +			return rc;
> +		}
> +
> +		gpa = gpa + page_level_size(level);
> +	}
> +
> +	return 0;
> +}
> +
>  static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>  {
>  	struct vmcb_control_area *control = &svm->vmcb->control;

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2021-08-20 15:59 ` [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap Brijesh Singh
@ 2021-10-13  0:23   ` Sean Christopherson
  2021-10-13 18:10     ` Brijesh Singh
  2021-10-13 20:16     ` Sean Christopherson
  0 siblings, 2 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-10-13  0:23 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021, Brijesh Singh wrote:
> When SEV-SNP is enabled in the guest VM, the guest memory pages can
> either be a private or shared. A write from the hypervisor goes through
> the RMP checks. If hardware sees that hypervisor is attempting to write
> to a guest private page, then it triggers an RMP violation #PF.
> 
> To avoid the RMP violation, add post_{map,unmap}_gfn() ops that can be
> used to verify that its safe to map a given guest page. Use the SRCU to
> protect against the page state change for existing mapped pages.

SRCU isn't protecting anything.  The synchronize_srcu_expedited() in the PSC code
forces it to wait for existing maps to go away, but it doesn't prevent new maps
from being created while the actual RMP updates are in-flight.  Most telling is
that the RMP updates happen _after_ the synchronize_srcu_expedited() call.
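
Nothing prevents this interleaving (a sketch based on the flow of this
patch and the PSC patch):

	PSC path (vCPU0)                        map path (vCPU1)
	----------------                        ----------------
	synchronize_srcu_expedited()
	                                        srcu_read_lock(&sev->psc_srcu)
	                                        map the gfn
	rmpupdate()  <-- RMP changes underneath a live mapping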

This also doesn't handle kvm_{read,write}_guest_cached().

I can't help but note that the private memslots idea[*] would handle this gracefully,
e.g. the memslot lookup would fail, and any change in private memslots would
invalidate the cache due to a generation mismatch.

KSM is another mess that would Just Work.

I'm not saying that SNP should be blocked on support for unmapping guest private
memory, but I do think we should strongly consider focusing on that effort rather
than trying to fix things piecemeal throughout KVM.  I don't think it's too absurd
to say that it might actually be faster overall.  And I 100% think that having a
cohesive design and uABI for SNP and TDX would be hugely beneficial to KVM.

[*] https://lkml.kernel.org/r/20210824005248.200037-1-seanjc@google.com

> +int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	int level;
> +
> +	if (!sev_snp_guest(kvm))
> +		return 0;
> +
> +	*token = srcu_read_lock(&sev->psc_srcu);
> +
> +	/* If pfn is not added as private then fail */

This comment and the pr_err() are backwards, and confused the heck out of me.
snp_lookup_rmpentry() returns '1' if the pfn is assigned, a.k.a. private.  That
means this code throws an error if the page is private, i.e. requires the page
to be shared.  Which makes sense given the use cases, it's just incredibly
confusing.

> +	if (snp_lookup_rmpentry(pfn, &level) == 1) {

Any reason not to provide e.g. rmp_is_shared() and rmp_is_private() so that
callers don't have to care as much about the return values?  The -errno/0/1
semantics are all but guaranteed to bite us in the rear at some point.
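
Something like this, completely untested and assuming the -errno/0/1
semantics described above:

	static inline bool rmp_is_private(kvm_pfn_t pfn, int *level)
	{
		return snp_lookup_rmpentry(pfn, level) == 1;
	}

	static inline bool rmp_is_shared(kvm_pfn_t pfn, int *level)
	{
		return snp_lookup_rmpentry(pfn, level) == 0;
	}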

Actually, peeking at other patches, I think it already has.  This usage in
__unregister_enc_region_locked() is wrong:

	/*
	 * The guest memory pages are assigned in the RMP table. Unassign it
	 * before releasing the memory.
	 */
	if (sev_snp_guest(kvm)) {
		for (i = 0; i < region->npages; i++) {
			pfn = page_to_pfn(region->pages[i]);

			if (!snp_lookup_rmpentry(pfn, &level))  <-- attempts make_shared() on non-existent entry
				continue;

			cond_resched();

			if (level > PG_LEVEL_4K)
				pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);

			host_rmp_make_shared(pfn, level, true);
		}
	}
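
The check needs to skip lookup failures rather than treating them as
assigned, e.g. (untested):

	/* Skip both "not assigned" (0) and lookup failure (-errno). */
	if (snp_lookup_rmpentry(pfn, &level) != 1)
		continue;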


> +		srcu_read_unlock(&sev->psc_srcu, *token);
> +		pr_err_ratelimited("failed to map private gfn 0x%llx pfn 0x%llx\n", gfn, pfn);
> +		return -EBUSY;
> +	}
> +
> +	return 0;
> +}
>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index d10f7166b39d..ff91184f9b4a 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -76,16 +76,22 @@ struct kvm_sev_info {
>  	bool active;		/* SEV enabled guest */
>  	bool es_active;		/* SEV-ES enabled guest */
>  	bool snp_active;	/* SEV-SNP enabled guest */
> +
>  	unsigned int asid;	/* ASID used for this guest */
>  	unsigned int handle;	/* SEV firmware handle */
>  	int fd;			/* SEV device fd */
> +
>  	unsigned long pages_locked; /* Number of pages locked */
>  	struct list_head regions_list;  /* List of registered regions */
> +
>  	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
> +
>  	struct kvm *enc_context_owner; /* Owner of copied encryption context */
>  	struct misc_cg *misc_cg; /* For misc cgroup accounting */
> +

Unrelated whitespace changes.

>  	u64 snp_init_flags;
>  	void *snp_context;      /* SNP guest context page */
> +	struct srcu_struct psc_srcu;
>  };

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 26/45] KVM: SVM: Mark the private vma unmerable for SEV-SNP guests
  2021-10-12 18:46   ` Sean Christopherson
@ 2021-10-13 12:39     ` Brijesh Singh
  2021-10-13 14:34       ` Sean Christopherson
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-10-13 12:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 10/12/21 11:46 AM, Sean Christopherson wrote:
> On Fri, Aug 20, 2021, Brijesh Singh wrote:
>> When SEV-SNP is enabled, the guest private pages are added in the RMP
>> table; while adding the pages, the rmp_make_private() unmaps the pages
>> from the direct map. If KSM attempts to access those unmapped pages then
>> it will trigger #PF (page-not-present).
>>
>> Encrypted guest pages cannot be shared between the process, so an
>> userspace should not mark the region mergeable but to be safe, mark the
>> process vma unmerable before adding the pages in the RMP table.
> To be safe from what?  Does the !PRESENT #PF crash the kernel?

Yes, the kernel crashes when KSM attempts to access an unmapped pfn.

[...]
>> +	mmap_write_lock(kvm->mm);
>> +	ret = snp_mark_unmergable(kvm, params.uaddr, params.len);
>> +	mmap_write_unlock(kvm->mm);
> This does not, and practically speaking cannot, work.  There are multiple TOCTOU
> bugs, here and in __snp_handle_page_state_change().  Userspace can madvise() the
> range at any later point, munmap()/mmap() the entire range, mess with the memslots
> in the PSC case, and so on and so forth.  Relying on MADV_UNMERGEABLE for functional
> correctness simply cannot work in KVM, barring mmu_notifier and a big pile of code.

AFAICT, ksm does not exclude the unmapped pfn from its scan list. We
need to tell ksm somehow to exclude the unmapped pfn from its scan list.
I understand that if userspace is messing with us, we have an issue, but
it's a userspace bug ;) To fix it right, we need to enhance ksm to
exclude the pfn when it is getting unmapped from the direct map. I
believe that work can be done outside of the SNP series. I am okay to
drop snp_mark_unmergable(), and until then, we just run with KSM
disabled. Thoughts?

thanks

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 26/45] KVM: SVM: Mark the private vma unmerable for SEV-SNP guests
  2021-10-13 12:39     ` Brijesh Singh
@ 2021-10-13 14:34       ` Sean Christopherson
  2021-10-13 14:51         ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-10-13 14:34 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Wed, Oct 13, 2021, Brijesh Singh wrote:
> 
> On 10/12/21 11:46 AM, Sean Christopherson wrote:
> > On Fri, Aug 20, 2021, Brijesh Singh wrote:
> >> When SEV-SNP is enabled, the guest private pages are added in the RMP
> >> table; while adding the pages, the rmp_make_private() unmaps the pages
> >> from the direct map. If KSM attempts to access those unmapped pages then
> >> it will trigger #PF (page-not-present).
> >>
> >> Encrypted guest pages cannot be shared between the process, so an
> >> userspace should not mark the region mergeable but to be safe, mark the
> >> process vma unmerable before adding the pages in the RMP table.
> > To be safe from what?  Does the !PRESENT #PF crash the kernel?
> 
> Yes, the kernel crashes when KSM attempts to access an unmapped pfn.

Is this problem unique to nuking the direct map (patch 05), or would it also be
a problem (in the form of an RMP violation) if the direct map were demoted to 4k
pages?
 
> [...]
> >> +	mmap_write_lock(kvm->mm);
> >> +	ret = snp_mark_unmergable(kvm, params.uaddr, params.len);
> >> +	mmap_write_unlock(kvm->mm);
> > This does not, and practically speaking cannot, work.  There are multiple TOCTOU
> > bugs, here and in __snp_handle_page_state_change().  Userspace can madvise() the
> > range at any later point, munmap()/mmap() the entire range, mess with the memslots
> > in the PSC case, and so on and so forth.  Relying on MADV_UNMERGEABLE for functional
> > correctness simply cannot work in KVM, barring mmu_notifier and a big pile of code.
> 
> AFAICT, ksm does not exclude the unmapped pfn from its scan list. We
> need to tell ksm somehow to exclude the unmapped pfn from its scan list.
> I understand that if userspace is messing with us, we have an issue, but
> it's a userspace bug ;) To fix it right, we need to enhance ksm to
> exclude the pfn when it is getting unmapped from the direct map. I
> believe that work can be done outside of the SNP series. I am okay to
> drop snp_mark_unmergable(), and until then, we just run with KSM
> disabled. Thoughts?
> 
> thanks

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 26/45] KVM: SVM: Mark the private vma unmerable for SEV-SNP guests
  2021-10-13 14:34       ` Sean Christopherson
@ 2021-10-13 14:51         ` Brijesh Singh
  2021-10-13 15:33           ` Sean Christopherson
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-10-13 14:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 10/13/21 7:34 AM, Sean Christopherson wrote:
> On Wed, Oct 13, 2021, Brijesh Singh wrote:
>> On 10/12/21 11:46 AM, Sean Christopherson wrote:
>>> On Fri, Aug 20, 2021, Brijesh Singh wrote:
>>>> When SEV-SNP is enabled, the guest private pages are added in the RMP
>>>> table; while adding the pages, the rmp_make_private() unmaps the pages
>>>> from the direct map. If KSM attempts to access those unmapped pages then
>>>> it will trigger #PF (page-not-present).
>>>>
>>>> Encrypted guest pages cannot be shared between the process, so an
>>>> userspace should not mark the region mergeable but to be safe, mark the
>>>> process vma unmerable before adding the pages in the RMP table.
>>> To be safe from what?  Does the !PRESENT #PF crash the kernel?
>> Yes, the kernel crashes when KSM attempts to access an unmapped pfn.
> Is this problem unique to nuking the direct map (patch 05), 

Yes. This problem didn't exist in the previous series because we were
not nuking the page from the direct map, and KSM was able to read the
memory just fine. Now, with the page removed from the direct map, the
access causes a #PF (not-present).


> or would it also be
> a problem (in the form of an RMP violation) if the direct map were demoted to 4k
> pages?
>  

No, this problem does not happen due to the demotion. In the previous
series, we were demoting the pages to 4k and everyone was happy
(including ksm). In the case of ksm, the page will *never* be merged
because the ciphertext for two private pages will never be the same.
Removing the pages from the direct map certainly brings additional
complexity in KVM and other places in the kernel. From an architecture
point of view, there is actually no need to mark the page *not present*
in the direct map. I believe in TDX that is a must, but for SEV-SNP
it's not required at all. A hypervisor can read the guest private pages
just fine; only a write will cause an RMP fault.


>> [...]
>>>> +	mmap_write_lock(kvm->mm);
>>>> +	ret = snp_mark_unmergable(kvm, params.uaddr, params.len);
>>>> +	mmap_write_unlock(kvm->mm);
>>> This does not, and practically speaking cannot, work.  There are multiple TOCTOU
>>> bugs, here and in __snp_handle_page_state_change().  Userspace can madvise() the
>>> range at any later point, munmap()/mmap() the entire range, mess with the memslots
>>> in the PSC case, and so on and so forth.  Relying on MADV_UNMERGEABLE for functional
>>> correctness simply cannot work in KVM, barring mmu_notifier and a big pile of code.
>> AFAICT, ksm does not exclude the unmapped pfn from its scan list. We
>> need to tell ksm somehow to exclude the unmapped pfn from its scan list.
>> I understand that if userspace is messing with us, we have an issue, but
>> it's a userspace bug ;) To fix it right, we need to enhance ksm to
>> exclude the pfn when it is getting unmapped from the direct map. I
>> believe that work can be done outside of the SNP series. I am okay to
>> drop snp_mark_unmergable(), and until then, we just run with KSM
>> disabled. Thoughts?
>>
>> thanks

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 26/45] KVM: SVM: Mark the private vma unmerable for SEV-SNP guests
  2021-10-13 14:51         ` Brijesh Singh
@ 2021-10-13 15:33           ` Sean Christopherson
  0 siblings, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-10-13 15:33 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Wed, Oct 13, 2021, Brijesh Singh wrote:
> 
> On 10/13/21 7:34 AM, Sean Christopherson wrote:
> > On Wed, Oct 13, 2021, Brijesh Singh wrote:
> >> On 10/12/21 11:46 AM, Sean Christopherson wrote:
> >>> On Fri, Aug 20, 2021, Brijesh Singh wrote:
> >>>> When SEV-SNP is enabled, the guest private pages are added in the RMP
> >>>> table; while adding the pages, the rmp_make_private() unmaps the pages
> >>>> from the direct map. If KSM attempts to access those unmapped pages then
> >>>> it will trigger #PF (page-not-present).
> >>>>
> >>>> Encrypted guest pages cannot be shared between the process, so an
> >>>> userspace should not mark the region mergeable but to be safe, mark the
> >>>> process vma unmerable before adding the pages in the RMP table.
> >>> To be safe from what?  Does the !PRESENT #PF crash the kernel?
> >> Yes, the kernel crashes when KSM attempts to access an unmapped pfn.
> > Is this problem unique to nuking the direct map (patch 05), 
> 
> Yes. This problem didn't exist in the previous series because we were
> not nuking the page from the direct map, and KSM was able to read the
> memory just fine. Now, with the page removed from the direct map, the
> access causes a #PF (not-present).

Hrm, so regardless of what manipulations are done to the direct map, any errant
write to guest private memory via the direct map would be fatal to the kernel.
That's both mildly terrifying and oddly encouraging, as it means silent guest data
corruption is no longer a thing, at least for private memory.

One concrete takeaway for me is that "silently" nuking the direct map on RMP
assignment is not an option.  Nuking the direct map if the kernel has a way to
determine that the backing store is for guest private memory is perfectly ok,
but pulling the rug out so to speak is setting us up for maintenance hell.

> > or would it also be a problem (in the form of an RMP violation) if the
> > direct map were demoted to 4k pages?
> >  
> 
> No, this problem does not happen due to the demotion. In the previous
> series, we were demoting the pages to 4k and everyone was happy
> (including ksm). In the case of ksm, the page will *never* be merged
> because the ciphertext for two private pages will never be the same.
> Removing the pages from the direct map certainly brings additional
> complexity in KVM and other places in the kernel. From an architecture
> point of view, there is actually no need to mark the page *not present*
> in the direct map. I believe in TDX that is a must, but for SEV-SNP
> it's not required at all.

Nuking the direct map is not strictly required for TDX either, as reads do not
compromise the integrity of the memory, i.e. don't poison memory and lead to
#MC.  Like SNP, writes via the direct map would be fatal.

The issue with TDX that is not shared by SNP is that writes through _user_ mappings
can be fatal to the system.  With SNP, those generate RMP violations, but because they
are "just" page faults, the normal uaccess machinery happily eats them and SIGBUSes
the VMM.

> A hypervisor can read the guest private pages just fine; only a write
> will cause an RMP fault.

Well, for some definitions of "read".  I'm kinda joking, kinda serious.  KSM may
"work" when it reads garbage, but the same is likely not true for other kernel
code that wanders into guest private memory.  Ideally, the kernel would provide
a mechanism to _prevent_ any such reads/writes, and violations would be treated
as kernel bugs.  Given that SEV has been successfully deployed, the probability
of lurking bugs is quite low, but I still dislike the idea of latent bugs going
unnoticed or manifesting in weird ways.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2021-10-12 21:48   ` Sean Christopherson
@ 2021-10-13 17:04     ` Sean Christopherson
  2021-10-13 17:05     ` Brijesh Singh
  1 sibling, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-10-13 17:04 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Tue, Oct 12, 2021, Sean Christopherson wrote:
> If we are unable to root cause and fix the bug, I think a viable workaround would
> be to clear the hardware present bit in unrelated SPTEs, but keep the SPTEs
> themselves.  The idea is mostly the same as the ZAPPED_PRIVATE concept from the initial
> TDX RFC.  MMU notifier invalidations, memslot removal, RMP restoration, etc... would
> all continue to work since the SPTEs are still there, and KVM's page fault handler
> could audit any "blocked" SPTE when it's refaulted (I'm pretty sure it'd be
> impossible for the PFN to change, since any PFN change would require a memslot
> update or mmu_notifier invalidation).
> 
> The downside to that approach is that it would require walking all SPTEs to do a
> memslot deletion, i.e. we'd lose the "fast zap" behavior.  If that's a performance
> issue, the behavior could be opt-in (but not for SNP/TDX).

Another option if we introduce private memslots is to preserve private memslots
on unrelated deletions.  The argument being that (a) private memslots are a new
feature so there's no prior uABI to break, and (b) if not zapping private memslot
SPTEs in response to the guest remapping a BAR somehow breaks GPU pass-through,
then the bug is all but guaranteed to be somewhere besides KVM's memslot logic.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2021-10-12 21:48   ` Sean Christopherson
  2021-10-13 17:04     ` Sean Christopherson
@ 2021-10-13 17:05     ` Brijesh Singh
  2021-10-13 17:24       ` Sean Christopherson
  1 sibling, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-10-13 17:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 10/12/21 2:48 PM, Sean Christopherson wrote:
> On Fri, Aug 20, 2021, Brijesh Singh wrote:
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 991b8c996fc1..6d9483ec91ab 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -31,6 +31,7 @@
>>  #include "svm_ops.h"
>>  #include "cpuid.h"
>>  #include "trace.h"
>> +#include "mmu.h"
>>  
>>  #define __ex(x) __kvm_handle_fault_on_reboot(x)
>>  
>> @@ -2905,6 +2906,181 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
>>  	svm->vmcb->control.ghcb_gpa = value;
>>  }
>>  
>> +static int snp_rmptable_psmash(struct kvm *kvm, kvm_pfn_t pfn)
>> +{
>> +	pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
>> +
>> +	return psmash(pfn);
>> +}
>> +
>> +static int snp_make_page_shared(struct kvm *kvm, gpa_t gpa, kvm_pfn_t pfn, int level)
>> +{
>> +	int rc, rmp_level;
>> +
>> +	rc = snp_lookup_rmpentry(pfn, &rmp_level);
>> +	if (rc < 0)
>> +		return -EINVAL;
>> +
>> +	/* If page is not assigned then do nothing */
>> +	if (!rc)
>> +		return 0;
>> +
>> +	/*
>> +	 * Is the page part of an existing 2MB RMP entry ? Split the 2MB into
>> +	 * multiple of 4K-page before making the memory shared.
>> +	 */
>> +	if (level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M) {
>> +		rc = snp_rmptable_psmash(kvm, pfn);
>> +		if (rc)
>> +			return rc;
>> +	}
>> +
>> +	return rmp_make_shared(pfn, level);
>> +}
>> +
>> +static int snp_check_and_build_npt(struct kvm_vcpu *vcpu, gpa_t gpa, int level)
>> +{
>> +	struct kvm *kvm = vcpu->kvm;
>> +	int rc, npt_level;
>> +	kvm_pfn_t pfn;
>> +
>> +	/*
>> +	 * Get the pfn and level for the gpa from the nested page table.
>> +	 *
>> +	 * If the tdp walk fails, then its safe to say that there is no
>> +	 * valid mapping for this gpa. Create a fault to build the map.
>> +	 */
>> +	write_lock(&kvm->mmu_lock);
> SEV (or any vendor code for that matter) should not be taking mmu_lock.  All of
> KVM has somewhat fungible borders between the various components, but IMO this
> crosses firmly into "this belongs in the MMU" territory.
>
> For example, I highly doubt this actually needs to take mmu_lock for write.  More
> below.
>
>> +	rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);
>> +	write_unlock(&kvm->mmu_lock);
> What's the point of this walk?  As soon as mmu_lock is dropped, all bets are off.
> At best this is a strong hint.  It doesn't hurt anything per se, it's just a waste
> of cycles.

I can avoid the walk because kvm_mmu_map_tdp_page() will return a
pfn if the NPT is already built.


>> +	if (!rc) {
>> +		pfn = kvm_mmu_map_tdp_page(vcpu, gpa, PFERR_USER_MASK, level);
> Same here.
>
>> +		if (is_error_noslot_pfn(pfn))
>> +			return -EINVAL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int snp_gpa_to_hva(struct kvm *kvm, gpa_t gpa, hva_t *hva)
>> +{
>> +	struct kvm_memory_slot *slot;
>> +	gfn_t gfn = gpa_to_gfn(gpa);
>> +	int idx;
>> +
>> +	idx = srcu_read_lock(&kvm->srcu);
>> +	slot = gfn_to_memslot(kvm, gfn);
>> +	if (!slot) {
>> +		srcu_read_unlock(&kvm->srcu, idx);
>> +		return -EINVAL;
>> +	}
>> +
>> +	/*
>> +	 * Note, using the __gfn_to_hva_memslot() is not solely for performance,
>> +	 * it's also necessary to avoid the "writable" check in __gfn_to_hva_many(),
>> +	 * which will always fail on read-only memslots due to gfn_to_hva() assuming
>> +	 * writes.
>> +	 */
>> +	*hva = __gfn_to_hva_memslot(slot, gfn);
>> +	srcu_read_unlock(&kvm->srcu, idx);
> *hva is effectively invalidated the instant kvm->srcu is unlocked, e.g. a pending
> memslot update can complete immediately after and delete/move the backing memslot.

Fair point, I can rework the code to do all the hva-related updates
while holding kvm->srcu. More on this below.
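
I.e. do the memslot lookup, the hva conversion, and all the hva-based
work inside a single SRCU critical section, roughly:

	idx = srcu_read_lock(&kvm->srcu);
	slot = gfn_to_memslot(kvm, gfn);
	if (slot) {
		hva = __gfn_to_hva_memslot(slot, gfn);
		/* ... everything that consumes the hva goes here ... */
	}
	srcu_read_unlock(&kvm->srcu, idx);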


>> +
>> +	return 0;
>> +}
>> +
>> +static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, enum psc_op op, gpa_t gpa,
>> +					  int level)
>> +{
>> +	struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
>> +	struct kvm *kvm = vcpu->kvm;
>> +	int rc, npt_level;
>> +	kvm_pfn_t pfn;
>> +	gpa_t gpa_end;
>> +
>> +	gpa_end = gpa + page_level_size(level);
>> +
>> +	while (gpa < gpa_end) {
>> +		/*
>> +		 * If the gpa is not present in the NPT then build the NPT.
>> +		 */
>> +		rc = snp_check_and_build_npt(vcpu, gpa, level);
>> +		if (rc)
>> +			return -EINVAL;
>> +
>> +		if (op == SNP_PAGE_STATE_PRIVATE) {
>> +			hva_t hva;
>> +
>> +			if (snp_gpa_to_hva(kvm, gpa, &hva))
>> +				return -EINVAL;
>> +
>> +			/*
>> +			 * Verify that the hva range is registered. This enforcement is
>> +			 * required to avoid the cases where a page is marked private
>> +			 * in the RMP table but never gets cleanup during the VM
>> +			 * termination path.
>> +			 */
>> +			mutex_lock(&kvm->lock);
>> +			rc = is_hva_registered(kvm, hva, page_level_size(level));
> This will get a false negative if a hva+size spans two contiguous regions.
>
> Also, storing a boolean return in a variable that is an int _and_ was already used
> for the kernel's standard 0/-errno returns is confusing.
>
>> +			mutex_unlock(&kvm->lock);
> This is also subject to races, e.g. userspace unregisters the hva immediately
> after this check, before KVM makes whatever conversion it makes below.
>
> A linear walk through a list to find a range is also a bad idea, e.g. pathological
> worst case scenario is that userspace has created tens of thousands of individual
> regions.  There is no restriction on the number of regions, just the number of
> pages that can be pinned.
>
> I dislike the svm_(un)register_enc_region() scheme in general, but at least for
> SEV and SEV-ES the code is isolated, e.g. KVM is little more than a dumb pipe to
> let userspace pin pages.  I would like to go the opposite direction and work towards
> eliminating regions_list (or at least making it optional), not build more stuff
> on top.

If we don't nuke the page from the direct map and the region_list is
removed, then we no longer need all the above complexity in the PSC.
The PSC can be as simple as: 1) map the page in the NPF handler and
2) update the RMP table.
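
Roughly, reusing the helpers from this patch (just a sketch):

	while (gpa < gpa_end) {
		/* 1) build the NPT mapping and get the backing pfn */
		pfn = kvm_mmu_map_tdp_page(vcpu, gpa, PFERR_USER_MASK, level);
		if (is_error_noslot_pfn(pfn))
			return -EINVAL;

		/* 2) update the RMP table */
		rc = (op == SNP_PAGE_STATE_PRIVATE) ?
			rmp_make_private(pfn, gpa, level, sev->asid, false) :
			snp_make_page_shared(kvm, gpa, pfn, level);
		if (rc)
			return rc;

		gpa += page_level_size(level);
	}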


> The more I look at this, the more strongly I feel that private <=> shared conversions
> belong in the MMU, and that KVM's SPTEs should be the single source of truth for
> shared vs. private.  E.g. add a SPTE_TDP_PRIVATE_MASK in the software available bits.
> I believe the only hiccup is the snafu where not zapping _all_ SPTEs on memslot
> deletion breaks QEMU+VFIO+GPU, i.e. KVM would lose its canonical info on unrelated
> memslot deletion.
>
> But that is a solvable problem.  Ideally the bug, wherever it is, would be root
> caused and fixed.  I believe Peter (and Marc?) is going to work on reproducing
> the bug.
We have also been setting up a VM with the Qemu + VFIO + GPU use case
to repro the bug on AMD HW, and so far we have had no luck reproducing
it. We will continue stressing the system to recreate it. Let's hope
that Peter (and Marc) can easily recreate it on Intel HW so that we can
work towards fixing it.
>
> If we are unable to root cause and fix the bug, I think a viable workaround would
> be to clear the hardware present bit in unrelated SPTEs, but keep the SPTEs
> themselves.  The idea is mostly the same as the ZAPPED_PRIVATE concept from the initial
> TDX RFC.  MMU notifier invalidations, memslot removal, RMP restoration, etc... would
> all continue to work since the SPTEs are still there, and KVM's page fault handler
> could audit any "blocked" SPTE when it's refaulted (I'm pretty sure it'd be
> impossible for the PFN to change, since any PFN change would require a memslot
> update or mmu_notifier invalidation).
>
> The downside to that approach is that it would require walking all SPTEs to do a
> memslot deletion, i.e. we'd lose the "fast zap" behavior.  If that's a performance
> issue, the behavior could be opt-in (but not for SNP/TDX).
>
>> +			if (!rc)
>> +				return -EINVAL;
>> +
>> +			/*
>> +			 * Mark the userspace range unmerable before adding the pages
>> +			 * in the RMP table.
>> +			 */
>> +			mmap_write_lock(kvm->mm);
>> +			rc = snp_mark_unmergable(kvm, hva, page_level_size(level));
>> +			mmap_write_unlock(kvm->mm);
> As mentioned in an earlier patch, this simply cannot work.
As discussed in the previous patches, I will drop the support for
nuking the page from the direct map; this will keep ksm happy, and
there will be no need to mark the vma unmergeable.
>
>> +			if (rc)
>> +				return -EINVAL;
>> +		}
>> +
>> +		write_lock(&kvm->mmu_lock);
>> +
>> +		rc = kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level);
> Same comment about the bool into int. Though in this case I'd say have
> kvm_mmu_get_tdp_walk() return 0/-errno, not a bool.  Boolean returns for helpers
> without "is_", "test_", etc... are generally confusing.
>
>> +		if (!rc) {
>> +			/*
>> +			 * This may happen if another vCPU unmapped the page
>> +			 * before we acquire the lock. Retry the PSC.
>> +			 */
>> +			write_unlock(&kvm->mmu_lock);
>> +			return 0;
> How will the caller (guest?) know to retry the PSC if KVM returns "success"?

If a guest is adhering to the GHCB spec then it will see that the
hypervisor has not processed all the entries, and it should retry the
PSC.


>> +		}
>> +
>> +		/*
>> +		 * Adjust the level so that we don't go higher than the backing
>> +		 * page level.
>> +		 */
>> +		level = min_t(size_t, level, npt_level);
>> +
>> +		trace_kvm_snp_psc(vcpu->vcpu_id, pfn, gpa, op, level);
>> +
>> +		switch (op) {
>> +		case SNP_PAGE_STATE_SHARED:
>> +			rc = snp_make_page_shared(kvm, gpa, pfn, level);
>> +			break;
>> +		case SNP_PAGE_STATE_PRIVATE:
>> +			rc = rmp_make_private(pfn, gpa, level, sev->asid, false);
>> +			break;
>> +		default:
>> +			rc = -EINVAL;
> Not that it really matters, because I don't think the MADV_* approach is viable,
> but this neglects to undo snp_mark_unmergable() on failure.
>
>> +			break;
>> +		}
>> +
>> +		write_unlock(&kvm->mmu_lock);
>> +
>> +		if (rc) {
>> +			pr_err_ratelimited("Error op %d gpa %llx pfn %llx level %d rc %d\n",
>> +					   op, gpa, pfn, level, rc);
>> +			return rc;
>> +		}
>> +
>> +		gpa = gpa + page_level_size(level);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>>  {
>>  	struct vmcb_control_area *control = &svm->vmcb->control;

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2021-10-13 17:05     ` Brijesh Singh
@ 2021-10-13 17:24       ` Sean Christopherson
  2021-10-13 17:49         ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-10-13 17:24 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Wed, Oct 13, 2021, Brijesh Singh wrote:
> > The more I look at this, the more strongly I feel that private <=> shared conversions
> > belong in the MMU, and that KVM's SPTEs should be the single source of truth for
> > shared vs. private.  E.g. add a SPTE_TDP_PRIVATE_MASK in the software available bits.
> > I believe the only hiccup is the snafu where not zapping _all_ SPTEs on memslot
> > deletion breaks QEMU+VFIO+GPU, i.e. KVM would lose its canonical info on unrelated
> > memslot deletion.
> >
> > But that is a solvable problem.  Ideally the bug, wherever it is, would be root
> > caused and fixed.  I believe Peter (and Marc?) is going to work on reproducing
> > the bug.
> We have also been setting up a VM with the Qemu + VFIO + GPU use case
> to repro the bug on AMD HW, and so far we have had no luck reproducing
> it. We will continue stressing the system to recreate it. Let's hope
> that Peter (and Marc) can easily recreate it on Intel HW so that we can
> work towards fixing it.

Are you trying on a modern kernel?  If so, double check that nx_huge_pages is off,
turning that on caused the bug to disappear.  It should be off for AMD systems,
but it's worth checking.

> >> +		if (!rc) {
> >> +			/*
> >> +			 * This may happen if another vCPU unmapped the page
> >> +			 * before we acquire the lock. Retry the PSC.
> >> +			 */
> >> +			write_unlock(&kvm->mmu_lock);
> >> +			return 0;
> > How will the caller (guest?) know to retry the PSC if KVM returns "success"?
> 
> If a guest is adhering to the GHCB spec then it will see that the
> hypervisor has not processed all the entries, and it should retry the
> PSC.

But AFAICT that information isn't passed to the guest.  Even in this single-page
MSR-based case, the caller will say "all good" on a return of 0.

The "full" path is more obvious, as the caller clearly continues to process
entries unless there's an actual failure.

+       for (; cur <= end; cur++) {
+               entry = &info->entries[cur];
+               gpa = gfn_to_gpa(entry->gfn);
+               level = RMP_TO_X86_PG_LEVEL(entry->pagesize);
+               op = entry->operation;
+
+               if (!IS_ALIGNED(gpa, page_level_size(level))) {
+                       rc = PSC_INVALID_ENTRY;
+                       goto out;
+               }
+
+               rc = __snp_handle_page_state_change(vcpu, op, gpa, level);
+               if (rc)
+                       goto out;
+       }

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2021-10-13 17:24       ` Sean Christopherson
@ 2021-10-13 17:49         ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-10-13 17:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 10/13/21 10:24 AM, Sean Christopherson wrote:
> On Wed, Oct 13, 2021, Brijesh Singh wrote:
>>> The more I look at this, the more strongly I feel that private <=> shared conversions
>>> belong in the MMU, and that KVM's SPTEs should be the single source of truth for
>>> shared vs. private.  E.g. add a SPTE_TDP_PRIVATE_MASK in the software available bits.
>>> I believe the only hiccup is the snafu where not zapping _all_ SPTEs on memslot
>>> deletion breaks QEMU+VFIO+GPU, i.e. KVM would lose its canonical info on unrelated
>>> memslot deletion.
>>>
>>> But that is a solvable problem.  Ideally the bug, wherever it is, would be root
>>> caused and fixed.  I believe Peter (and Marc?) is going to work on reproducing
>>> the bug.
>> We have also been setting up a VM with the Qemu + VFIO + GPU use case
>> to repro the bug on AMD HW, and so far we have had no luck reproducing
>> it. We will continue stressing the system to recreate it. Let's hope
>> that Peter (and Marc) can easily recreate it on Intel HW so that we can
>> work towards fixing it.
> Are you trying on a modern kernel?  If so, double check that nx_huge_pages is off,
> turning that on caused the bug to disappear.  It should be off for AMD systems,
> but it's worth checking.

Yes, this is a recent kernel. I will double-check that nx_huge_pages is off.


>>>> +		if (!rc) {
>>>> +			/*
>>>> +			 * This may happen if another vCPU unmapped the page
>>>> +			 * before we acquire the lock. Retry the PSC.
>>>> +			 */
>>>> +			write_unlock(&kvm->mmu_lock);
>>>> +			return 0;
>>> How will the caller (guest?) know to retry the PSC if KVM returns "success"?
>> If a guest is adhering to the GHCB spec then it will see that the
>> hypervisor has not processed all the entries, and it should retry the
>> PSC.
> But AFAICT that information isn't passed to the guest.  Even in this single-page
> MSR-based case, the caller will say "all good" on a return of 0.
>
> The "full" path is more obvious, as the caller clearly continues to process
> entries unless there's an actual failure.
>
> +       for (; cur <= end; cur++) {
> +               entry = &info->entries[cur];
> +               gpa = gfn_to_gpa(entry->gfn);
> +               level = RMP_TO_X86_PG_LEVEL(entry->pagesize);
> +               op = entry->operation;
> +
> +               if (!IS_ALIGNED(gpa, page_level_size(level))) {
> +                       rc = PSC_INVALID_ENTRY;
> +                       goto out;
> +               }
> +
> +               rc = __snp_handle_page_state_change(vcpu, op, gpa, level);
> +               if (rc)
> +                       goto out;
> +       }
>
Please see the guest kernel patch #19 [1]. In the spec there is no
special code for the retry. The guest will look at the PSC header to
determine how many entries were processed by the hypervisor (in this
particular case, 0), and at that point the guest can do whatever it
wants. In the case of the Linux guest, we retry the PSC.

[1]
https://lore.kernel.org/linux-mm/20211008180453.462291-20-brijesh.singh@amd.com/
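
Roughly, on the guest side (names as in that patch, from memory):

	/*
	 * The hypervisor advances hdr.cur_entry as it processes entries, so
	 * the guest just re-issues the VMGEXIT until everything is done.
	 */
	while (data->hdr.cur_entry <= data->hdr.end_entry) {
		ret = vmgexit_psc(data);
		if (ret)
			break;
	}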

thanks


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 41/45] KVM: SVM: Add support to handle the RMP nested page fault
  2021-08-20 15:59 ` [PATCH Part2 v5 41/45] KVM: SVM: Add support to handle the RMP nested page fault Brijesh Singh
  2021-09-29 12:24   ` Dr. David Alan Gilbert
@ 2021-10-13 17:57   ` Sean Christopherson
  1 sibling, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-10-13 17:57 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021, Brijesh Singh wrote:
> When SEV-SNP is enabled in the guest, the hardware places restrictions on
> all memory accesses based on the contents of the RMP table. When hardware
> encounters RMP check failure caused by the guest memory access it raises
> the #NPF. The error code contains additional information on the access
> type. See the APM volume 2 for additional information.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 76 ++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c | 14 +++++---
>  arch/x86/kvm/svm/svm.h |  1 +
>  3 files changed, 87 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 65b578463271..712e8907bc39 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3651,3 +3651,79 @@ void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int token)
>  
>  	srcu_read_unlock(&sev->psc_srcu, token);
>  }
> +
> +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
> +{
> +	int rmp_level, npt_level, rc, assigned;

Really silly nit: can you use 'r' or 'ret' instead of 'rc'?  Outside of the
emulator, which should never be the gold standard for code ;-), 'rc' isn't used
in x86 KVM.

> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 3784d389247b..3ba62f21b113 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1933,15 +1933,21 @@ static int pf_interception(struct kvm_vcpu *vcpu)
>  static int npf_interception(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +	int rc;
>  
>  	u64 fault_address = svm->vmcb->control.exit_info_2;
>  	u64 error_code = svm->vmcb->control.exit_info_1;
>  
>  	trace_kvm_page_fault(fault_address, error_code);
> -	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
> -			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
> -			svm->vmcb->control.insn_bytes : NULL,
> -			svm->vmcb->control.insn_len);
> +	rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
> +				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
> +				svm->vmcb->control.insn_bytes : NULL,
> +				svm->vmcb->control.insn_len);
> +
> +	if (error_code & PFERR_GUEST_RMP_MASK)

Don't think it will matter in the end, but shouldn't this consult 'rc' before
diving into RMP handling?

> +		handle_rmp_page_fault(vcpu, fault_address, error_code);

Similar to my comments on the PSC patches, I think this is backwards.  The MMU
should provide the control logic to convert between private and shared, and vendor
code should only get involved when there are side effects from the conversion.

Once I've made a dent in my review backlog, I'll fiddle with this and some of the
other MMU interactions, and hopefully come up with a workable sketch for the MMU
stuff.  Yell at me if you haven't gotten an update by Wednesday or so.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2021-10-13  0:23   ` Sean Christopherson
@ 2021-10-13 18:10     ` Brijesh Singh
  2021-10-13 20:10       ` Sean Christopherson
  2021-10-13 20:16     ` Sean Christopherson
  1 sibling, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-10-13 18:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 10/12/21 5:23 PM, Sean Christopherson wrote:
> On Fri, Aug 20, 2021, Brijesh Singh wrote:
>> When SEV-SNP is enabled in the guest VM, the guest memory pages can
>> either be a private or shared. A write from the hypervisor goes through
>> the RMP checks. If hardware sees that hypervisor is attempting to write
>> to a guest private page, then it triggers an RMP violation #PF.
>>
>> To avoid the RMP violation, add post_{map,unmap}_gfn() ops that can be
>> used to verify that its safe to map a given guest page. Use the SRCU to
>> protect against the page state change for existing mapped pages.
> SRCU isn't protecting anything.  The synchronize_srcu_expedited() in the PSC code
> forces it to wait for existing maps to go away, but it doesn't prevent new maps
> from being created while the actual RMP updates are in-flight.  Most telling is
> that the RMP updates happen _after_ the synchronize_srcu_expedited() call.
>
> This also doesn't handle kvm_{read,write}_guest_cached().

Hmm, I thought kvm_{read,write}_guest_cached() uses
copy_{to,from}_user(). Writing to a private page will cause a #PF and
fail the copy_to_user(). Am I missing something?
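
From memory, kvm_write_guest_page() boils down to roughly:

	r = __copy_to_user((void __user *)addr + offset, data, len);
	if (r)
		return -EFAULT;

so an RMP #PF on the write is eaten by the uaccess fixup and surfaces as
-EFAULT rather than touching the private page.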


>
> I can't help but note that the private memslots idea[*] would handle this gracefully,
> e.g. the memslot lookup would fail, and any change in private memslots would
> invalidate the cache due to a generation mismatch.
>
> KSM is another mess that would Just Work.
>
> I'm not saying that SNP should be blocked on support for unmapping guest private
> memory, but I do think we should strongly consider focusing on that effort rather
> than trying to fix things piecemeal throughout KVM.  I don't think it's too absurd
> to say that it might actually be faster overall.  And I 100% think that having a
> cohesive design and uABI for SNP and TDX would be hugely beneficial to KVM.
>
> [*] https://lkml.kernel.org/r/20210824005248.200037-1-seanjc@google.com
>
>> +int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token)
>> +{
>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +	int level;
>> +
>> +	if (!sev_snp_guest(kvm))
>> +		return 0;
>> +
>> +	*token = srcu_read_lock(&sev->psc_srcu);
>> +
>> +	/* If pfn is not added as private then fail */
> This comment and the pr_err() are backwards, and confused the heck out of me.
> snp_lookup_rmpentry() returns '1' if the pfn is assigned, a.k.a. private.  That
> means this code throws an error if the page is private, i.e. requires the page
> to be shared.  Which makes sense given the use cases, it's just incredibly
> confusing.
Actually, I followed your recommendation from the previous feedback that
snp_lookup_rmpentry() should return 1 for an assigned page, 0 for a
shared page, and a negative errno for invalid. I can clarify it here
again.
>
>> +	if (snp_lookup_rmpentry(pfn, &level) == 1) {
> Any reason not to provide e.g. rmp_is_shared() and rmp_is_private() so that
> callers don't have to care as much about the return values?  The -errno/0/1
> semantics are all but guaranteed to bite us in the rear at some point.

If we look at the previous series, I had a macro rmp_is_assigned() for
exactly the same purpose, but the feedback was to drop those macros and
return the state indirectly through snp_lookup_rmpentry(). I can
certainly add a helper rmp_is_{shared,private}() if it makes the code
more readable.


> Actually, peeking at other patches, I think it already has.  This usage in
> __unregister_enc_region_locked() is wrong:
>
> 	/*
> 	 * The guest memory pages are assigned in the RMP table. Unassign it
> 	 * before releasing the memory.
> 	 */
> 	if (sev_snp_guest(kvm)) {
> 		for (i = 0; i < region->npages; i++) {
> 			pfn = page_to_pfn(region->pages[i]);
>
> 			if (!snp_lookup_rmpentry(pfn, &level))  <-- attempts make_shared() on non-existent entry
> 				continue;
>
> 			cond_resched();
>
> 			if (level > PG_LEVEL_4K)
> 				pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
>
> 			host_rmp_make_shared(pfn, level, true);
> 		}
> 	}
>
>
>> +		srcu_read_unlock(&sev->psc_srcu, *token);
>> +		pr_err_ratelimited("failed to map private gfn 0x%llx pfn 0x%llx\n", gfn, pfn);
>> +		return -EBUSY;
>> +	}
>> +
>> +	return 0;
>> +}
>>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>> index d10f7166b39d..ff91184f9b4a 100644
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -76,16 +76,22 @@ struct kvm_sev_info {
>>  	bool active;		/* SEV enabled guest */
>>  	bool es_active;		/* SEV-ES enabled guest */
>>  	bool snp_active;	/* SEV-SNP enabled guest */
>> +
>>  	unsigned int asid;	/* ASID used for this guest */
>>  	unsigned int handle;	/* SEV firmware handle */
>>  	int fd;			/* SEV device fd */
>> +
>>  	unsigned long pages_locked; /* Number of pages locked */
>>  	struct list_head regions_list;  /* List of registered regions */
>> +
>>  	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
>> +
>>  	struct kvm *enc_context_owner; /* Owner of copied encryption context */
>>  	struct misc_cg *misc_cg; /* For misc cgroup accounting */
>> +
> Unrelated whitespace changes.
>
>>  	u64 snp_init_flags;
>>  	void *snp_context;      /* SNP guest context page */
>> +	struct srcu_struct psc_srcu;
>>  };

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2021-10-13 18:10     ` Brijesh Singh
@ 2021-10-13 20:10       ` Sean Christopherson
  2021-10-13 21:49         ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-10-13 20:10 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Wed, Oct 13, 2021, Brijesh Singh wrote:
> 
> On 10/12/21 5:23 PM, Sean Christopherson wrote:
> > On Fri, Aug 20, 2021, Brijesh Singh wrote:
> >> When SEV-SNP is enabled in the guest VM, the guest memory pages can
> >> either be private or shared. A write from the hypervisor goes through
> >> the RMP checks. If hardware sees that the hypervisor is attempting to
> >> write to a guest private page, then it triggers an RMP violation #PF.
> >>
> >> To avoid the RMP violation, add post_{map,unmap}_gfn() ops that can be
> >> used to verify that it's safe to map a given guest page. Use SRCU to
> >> protect against page state changes for existing mapped pages.
> > SRCU isn't protecting anything.  The synchronize_srcu_expedited() in the PSC code
> > forces it to wait for existing maps to go away, but it doesn't prevent new maps
> > from being created while the actual RMP updates are in-flight.  Most telling is
> > that the RMP updates happen _after_ the synchronize_srcu_expedited() call.
> >
> > This also doesn't handle kvm_{read,write}_guest_cached().
> 
Hmm, I thought kvm_{read,write}_guest_cached() uses
copy_{to,from}_user(). Writing to a private page will cause a #PF and
will fail the copy_to_user(). Am I missing something?

Doh, right you are.  I was thinking they cached the kmap, but it's just the
gpa->hva that gets cached.
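
For reference, a simplified view of why the cached variants are ok here
(illustrative only, not the actual KVM code): struct gfn_to_hva_cache only
stores the translated hva, so the access itself still goes through a user
copy that takes a #PF and fails cleanly if the target page is private in
the RMP:

	/* Roughly what kvm_write_guest_cached() boils down to. */
	if (__copy_to_user((void __user *)ghc->hva, data, len))
		return -EFAULT;	/* an RMP write violation lands here */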

> > I can't help but note that the private memslots idea[*] would handle this gracefully,
> > e.g. the memslot lookup would fail, and any change in private memslots would
> > invalidate the cache due to a generation mismatch.
> >
> > KSM is another mess that would Just Work.
> >
> > I'm not saying that SNP should be blocked on support for unmapping guest private
> > memory, but I do think we should strongly consider focusing on that effort rather
> > than trying to fix things piecemeal throughout KVM.  I don't think it's too absurd
> > to say that it might actually be faster overall.  And I 100% think that having a
> > cohesive design and uABI for SNP and TDX would be hugely beneficial to KVM.
> >
> > [*] https://lkml.kernel.org/r/20210824005248.200037-1-seanjc@google.com
> >
> >> +int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token)
> >> +{
> >> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >> +	int level;
> >> +
> >> +	if (!sev_snp_guest(kvm))
> >> +		return 0;
> >> +
> >> +	*token = srcu_read_lock(&sev->psc_srcu);
> >> +
> >> +	/* If pfn is not added as private then fail */
> > This comment and the pr_err() are backwards, and confused the heck out of me.
> > snp_lookup_rmpentry() returns '1' if the pfn is assigned, a.k.a. private.  That
> > means this code throws an error if the page is private, i.e. requires the page
> > to be shared.  Which makes sense given the use cases, it's just incredibly
> > confusing.
> Actually, I followed your recommendation from the previous feedback that
> snp_lookup_rmpentry() should return 1 for an assigned page, 0 for a
> shared page, and a negative value for invalid. I can clarify it here again.
>
> >> +	if (snp_lookup_rmpentry(pfn, &level) == 1) {
> > Any reason not to provide e.g. rmp_is_shared() and rmp_is_private() so that
> > callers don't have to care as much about the return values?  The -errno/0/1
> > semantics are all but guaranteed to bite us in the rear at some point.
> 
> If we look at the previous series, I had a macro rmp_is_assigned() for
> exactly the same purpose, but the feedback was to drop those macros and
> return the state indirectly through snp_lookup_rmpentry(). I can
> certainly add a helper rmp_is_{shared,private}() if it makes the code
> more readable.

Ah rats.  Bad communication on my side.  I didn't intend to have non-RMP code
directly consume snp_lookup_rmpentry() for simple checks.  The goal behind the
helper was to bury "struct rmpentry" so that it wasn't visible to the kernel at
large.  I.e. my objection was that rmp_assigned() was looking directly at a
non-architectural struct.

My full thought for snp_lookup_rmpentry() was that it _could_ be consumed directly
without exposing "struct rmpentry", but that simple, common use cases would provide
wrappers, e.g.

static inline bool snp_is_rmp_private(...)
{
	return snp_lookup_rmpentry(...) == 1;
}

static inline bool snp_is_rmp_shared(...)
{
	return snp_lookup_rmpentry(...) < 1;
}
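
For example, sev_post_map_gfn() from this patch could then read (just a
sketch against the helpers above, assuming they take the same (pfn, &level)
arguments as snp_lookup_rmpentry()):

	/* Only shared pages may be mapped by the host. */
	if (snp_is_rmp_private(pfn, &level)) {
		srcu_read_unlock(&sev->psc_srcu, *token);
		pr_err_ratelimited("cannot map private gfn 0x%llx pfn 0x%llx\n", gfn, pfn);
		return -EBUSY;
	}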


Side topic, what do you think about s/assigned/private for the "public" APIs, as
suggested/implied above?  I actually like the terminology when talking specifically
about the RMP, but it doesn't fit the abstractions that tend to be used when talking
about these things in other contexts, e.g. in KVM.


* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2021-10-13  0:23   ` Sean Christopherson
  2021-10-13 18:10     ` Brijesh Singh
@ 2021-10-13 20:16     ` Sean Christopherson
  2021-10-15 16:31       ` Brijesh Singh
  1 sibling, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-10-13 20:16 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Wed, Oct 13, 2021, Sean Christopherson wrote:
> On Fri, Aug 20, 2021, Brijesh Singh wrote:
> > When SEV-SNP is enabled in the guest VM, the guest memory pages can
> > either be private or shared. A write from the hypervisor goes through
> > the RMP checks. If hardware sees that the hypervisor is attempting to
> > write to a guest private page, then it triggers an RMP violation #PF.
> >
> > To avoid the RMP violation, add post_{map,unmap}_gfn() ops that can be
> > used to verify that it's safe to map a given guest page. Use SRCU to
> > protect against page state changes for existing mapped pages.
> 
> SRCU isn't protecting anything.  The synchronize_srcu_expedited() in the PSC code
> forces it to wait for existing maps to go away, but it doesn't prevent new maps
> from being created while the actual RMP updates are in-flight.  Most telling is
> that the RMP updates happen _after_ the synchronize_srcu_expedited() call.

Argh, another goof on my part.  Rereading prior feedback, I see that I loosely
suggested SRCU as a possible solution.  That was a bad, bad suggestion.  I think
(hope) I made it offhand without really thinking it through.  SRCU can't work in
this case, because the whole premise of Read-Copy-Update is that there can be
multiple copies of the data.  That simply can't be true for the RMP as hardware
operates on a single table.

In the future, please don't hesitate to push back on and/or question suggestions,
especially those that are made without concrete examples, i.e. are likely off the
cuff.  My goal isn't to set you up for failure :-/
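
To make the race concrete, here is a minimal sketch of the pattern being
described (rmpupdate() stands in for whatever helper actually rewrites the
RMP entry; the name and signature are illustrative, not the posted code):

	static void snp_handle_psc(struct kvm_sev_info *sev, u64 pfn, int level)
	{
		/* Waits for in-flight sev_post_map_gfn() readers... */
		synchronize_srcu_expedited(&sev->psc_srcu);

		/*
		 * ...but a new srcu_read_lock(&sev->psc_srcu) can succeed
		 * right here, so a fresh mapping can be created while the
		 * RMP entry below is being rewritten.
		 */
		rmpupdate(pfn, level);
	}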


* Re: [PATCH Part2 v5 34/45] KVM: SVM: Do not use long-lived GHCB map while setting scratch area
  2021-08-20 15:59 ` [PATCH Part2 v5 34/45] KVM: SVM: Do not use long-lived GHCB map while setting scratch area Brijesh Singh
@ 2021-10-13 21:20   ` Sean Christopherson
  2021-10-15 16:11     ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-10-13 21:20 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021, Brijesh Singh wrote:
> The setup_vmgexit_scratch() function may rely on a long-lived GHCB
> mapping if the GHCB shared buffer area was used for the scratch area.
> In preparation for eliminating the long-lived GHCB mapping, always
> allocate a buffer for the scratch area so it can be accessed without
> the GHCB mapping.

Would it make sense to post this patch and the next (Remove the long-lived GHCB
host map) in a separate mini-series?  It's needed for SNP, but AFAICT there's
nothing that depends on SNP.  Getting this merged ahead of time would reduce the
size of the SNP series by a smidge.


* Re: [PATCH Part2 v5 33/45] KVM: x86: Update page-fault trace to log full 64-bit error code
  2021-08-20 15:59 ` [PATCH Part2 v5 33/45] KVM: x86: Update page-fault trace to log full 64-bit error code Brijesh Singh
@ 2021-10-13 21:23   ` Sean Christopherson
  0 siblings, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-10-13 21:23 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, stable

On Fri, Aug 20, 2021, Brijesh Singh wrote:
> The #NPF error code is a 64-bit value, but the trace prints only the
> lower 32 bits. Some of the fault error code bits (e.g.
> PFERR_GUEST_FINAL_MASK) are only available in the upper 32 bits.
> 
> Cc: <stable@kernel.org>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>

Paolo, can you grab this one sooner than later?  Alternatively, maybe post this
separately Brijesh?


* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2021-10-13 20:10       ` Sean Christopherson
@ 2021-10-13 21:49         ` Brijesh Singh
  2021-10-13 22:10           ` Sean Christopherson
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-10-13 21:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 10/13/21 1:10 PM, Sean Christopherson wrote:
> On Wed, Oct 13, 2021, Brijesh Singh wrote:
>> On 10/12/21 5:23 PM, Sean Christopherson wrote:
>>> On Fri, Aug 20, 2021, Brijesh Singh wrote:
>>>> When SEV-SNP is enabled in the guest VM, the guest memory pages can
>>>> either be private or shared. A write from the hypervisor goes through
>>>> the RMP checks. If hardware sees that the hypervisor is attempting to
>>>> write to a guest private page, then it triggers an RMP violation #PF.
>>>>
>>>> To avoid the RMP violation, add post_{map,unmap}_gfn() ops that can be
>>>> used to verify that it's safe to map a given guest page. Use SRCU to
>>>> protect against page state changes for existing mapped pages.
>>> SRCU isn't protecting anything.  The synchronize_srcu_expedited() in the PSC code
>>> forces it to wait for existing maps to go away, but it doesn't prevent new maps
>>> from being created while the actual RMP updates are in-flight.  Most telling is
>>> that the RMP updates happen _after_ the synchronize_srcu_expedited() call.
>>>
>>> This also doesn't handle kvm_{read,write}_guest_cached().
>> Hmm, I thought kvm_{read,write}_guest_cached() uses
>> copy_{to,from}_user(). Writing to a private page will cause a #PF and
>> will fail the copy_to_user(). Am I missing something?
> Doh, right you are.  I was thinking they cached the kmap, but it's just the
> gpa->hva that gets cached.
>
>>> I can't help but note that the private memslots idea[*] would handle this gracefully,
>>> e.g. the memslot lookup would fail, and any change in private memslots would
>>> invalidate the cache due to a generation mismatch.
>>>
>>> KSM is another mess that would Just Work.
>>>
>>> I'm not saying that SNP should be blocked on support for unmapping guest private
>>> memory, but I do think we should strongly consider focusing on that effort rather
>>> than trying to fix things piecemeal throughout KVM.  I don't think it's too absurd
>>> to say that it might actually be faster overall.  And I 100% think that having a
>>> cohesive design and uABI for SNP and TDX would be hugely beneficial to KVM.
>>>
>>> [*] https://lkml.kernel.org/r/20210824005248.200037-1-seanjc@google.com
>>>
>>>> +int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token)
>>>> +{
>>>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> +	int level;
>>>> +
>>>> +	if (!sev_snp_guest(kvm))
>>>> +		return 0;
>>>> +
>>>> +	*token = srcu_read_lock(&sev->psc_srcu);
>>>> +
>>>> +	/* If pfn is not added as private then fail */
>>> This comment and the pr_err() are backwards, and confused the heck out of me.
>>> snp_lookup_rmpentry() returns '1' if the pfn is assigned, a.k.a. private.  That
>>> means this code throws an error if the page is private, i.e. requires the page
>>> to be shared.  Which makes sense given the use cases, it's just incredibly
>>> confusing.
>> Actually, I followed your recommendation from the previous feedback that
>> snp_lookup_rmpentry() should return 1 for an assigned page, 0 for a
>> shared page, and a negative value for invalid. I can clarify it here again.
>>
>>>> +	if (snp_lookup_rmpentry(pfn, &level) == 1) {
>>> Any reason not to provide e.g. rmp_is_shared() and rmp_is_private() so that
>>> callers don't have to care as much about the return values?  The -errno/0/1
>>> semantics are all but guaranteed to bite us in the rear at some point.
>> If we look at the previous series, I had a macro rmp_is_assigned() for
>> exactly the same purpose, but the feedback was to drop those macros and
>> return the state indirectly through snp_lookup_rmpentry(). I can
>> certainly add a helper rmp_is_{shared,private}() if it makes the code
>> more readable.
> Ah rats.  Bad communication on my side.  I didn't intend to have non-RMP code
> directly consume snp_lookup_rmpentry() for simple checks.  The goal behind the
> helper was to bury "struct rmpentry" so that it wasn't visible to the kernel at
> large.  I.e. my objection was that rmp_assigned() was looking directly at a
> non-architectural struct.
>
> My full thought for snp_lookup_rmpentry() was that it _could_ be consumed directly
> without exposing "struct rmpentry", but that simple, common use cases would provide
> wrappers, e.g.
>
> static inline bool snp_is_rmp_private(...)
> {
> 	return snp_lookup_rmpentry(...) == 1;
> }
>
> static inline bool snp_is_rmp_shared(...)
> {
> 	return snp_lookup_rmpentry(...) < 1;
> }

Yep, that's what I was going to do for the helper.


>
> Side topic, what do you think about s/assigned/private for the "public" APIs, as
> suggested/implied above?  I actually like the terminology when talking specifically
> about the RMP, but it doesn't fit the abstractions that tend to be used when talking
> about these things in other contexts, e.g. in KVM.

I can float the idea to see if the docs folks are okay with the changes,
but generally speaking we have all been referring to assigned == private
in the Linux SNP support patches.

thanks



* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2021-10-13 21:49         ` Brijesh Singh
@ 2021-10-13 22:10           ` Sean Christopherson
  2021-10-13 22:31             ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-10-13 22:10 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Wed, Oct 13, 2021, Brijesh Singh wrote:
> 
> On 10/13/21 1:10 PM, Sean Christopherson wrote:
> > Side topic, what do you think about s/assigned/private for the "public" APIs, as
> > suggested/implied above?  I actually like the terminology when talking specifically
> > about the RMP, but it doesn't fit the abstractions that tend to be used when talking
> > about these things in other contexts, e.g. in KVM.
> 
> I can float the idea to see if the docs folks are okay with the changes,
> but generally speaking we have all been referring to assigned == private
> in the Linux SNP support patches.

Oh, I wasn't suggesting anything remotely close to an "official" change, it was
purely a suggestion for the host-side patches.  It might even be unique to this
one helper.


* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2021-10-13 22:10           ` Sean Christopherson
@ 2021-10-13 22:31             ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-10-13 22:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 10/13/21 3:10 PM, Sean Christopherson wrote:
> On Wed, Oct 13, 2021, Brijesh Singh wrote:
>> On 10/13/21 1:10 PM, Sean Christopherson wrote:
>>> Side topic, what do you think about s/assigned/private for the "public" APIs, as
>>> suggested/implied above?  I actually like the terminology when talking specifically
>>> about the RMP, but it doesn't fit the abstractions that tend to be used when talking
>>> about these things in other contexts, e.g. in KVM.
>> I can float the idea to see if the docs folks are okay with the changes,
>> but generally speaking we have all been referring to assigned == private
>> in the Linux SNP support patches.
> Oh, I wasn't suggesting anything remotely close to an "official" change, it was
> purely a suggestion for the host-side patches.  It might even be unique to this
> one helper.

Ah, sure, I am good with calling it private in the helper.

-Brijesh



* Re: [PATCH Part2 v5 34/45] KVM: SVM: Do not use long-lived GHCB map while setting scratch area
  2021-10-13 21:20   ` Sean Christopherson
@ 2021-10-15 16:11     ` Brijesh Singh
  2021-10-15 16:44       ` Sean Christopherson
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-10-15 16:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 10/13/21 2:20 PM, Sean Christopherson wrote:
> On Fri, Aug 20, 2021, Brijesh Singh wrote:
>> The setup_vmgexit_scratch() function may rely on a long-lived GHCB
>> mapping if the GHCB shared buffer area was used for the scratch area.
>> In preparation for eliminating the long-lived GHCB mapping, always
>> allocate a buffer for the scratch area so it can be accessed without
>> the GHCB mapping.
> Would it make sense to post this patch and the next (Remove the long-lived GHCB
> host map) in a separate mini-series?  It's needed for SNP, but AFAICT there's
> nothing that depends on SNP.  Getting this merged ahead of time would reduce the
> size of the SNP series by a smidge.

While testing with random configs, I am seeing some might_sleep() warnings.
This is happening mainly because during the vmrun the GHCB is accessed
with preemption disabled. The kvm_vcpu_map() -> kmap() reports the warning.
I am leaning towards creating a cache on the vmgexit and using that cache
instead of doing a kmap() on every access. Does that sound okay to you?

-Brijesh



* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2021-10-13 20:16     ` Sean Christopherson
@ 2021-10-15 16:31       ` Brijesh Singh
  2021-10-15 17:16         ` Sean Christopherson
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-10-15 16:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 10/13/21 1:16 PM, Sean Christopherson wrote:
> On Wed, Oct 13, 2021, Sean Christopherson wrote:
>> On Fri, Aug 20, 2021, Brijesh Singh wrote:
>>> When SEV-SNP is enabled in the guest VM, the guest memory pages can
>>> either be private or shared. A write from the hypervisor goes through
>>> the RMP checks. If hardware sees that the hypervisor is attempting to
>>> write to a guest private page, then it triggers an RMP violation #PF.
>>>
>>> To avoid the RMP violation, add post_{map,unmap}_gfn() ops that can be
>>> used to verify that it's safe to map a given guest page. Use SRCU to
>>> protect against page state changes for existing mapped pages.
>> SRCU isn't protecting anything.  The synchronize_srcu_expedited() in the PSC code
>> forces it to wait for existing maps to go away, but it doesn't prevent new maps
>> from being created while the actual RMP updates are in-flight.  Most telling is
>> that the RMP updates happen _after_ the synchronize_srcu_expedited() call.
> Argh, another goof on my part.  Rereading prior feedback, I see that I loosely
> suggested SRCU as a possible solution.  That was a bad, bad suggestion.  I think
> (hope) I made it offhand without really thinking it through.  SRCU can't work in
> this case, because the whole premise of Read-Copy-Update is that there can be
> multiple copies of the data.  That simply can't be true for the RMP as hardware
> operates on a single table.
>
> In the future, please don't hesitate to push back on and/or question suggestions,
> especially those that are made without concrete examples, i.e. are likely off the
> cuff.  My goal isn't to set you up for failure :-/

What do you think about going back to my initial proposal of per-gfn
tracking [1]? We can limit the changes to just kvm_vcpu_map() and let
the copy_to_user() take a fault and return an error (if it attempts to
write to guest private memory). If a PSC happens while the lock is held,
then simply return and let the guest retry the PSC.

[1]
https://lore.kernel.org/lkml/20210707183616.5620-36-brijesh.singh@amd.com/




* Re: [PATCH Part2 v5 34/45] KVM: SVM: Do not use long-lived GHCB map while setting scratch area
  2021-10-15 16:11     ` Brijesh Singh
@ 2021-10-15 16:44       ` Sean Christopherson
  0 siblings, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-10-15 16:44 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Oct 15, 2021, Brijesh Singh wrote:
> 
> On 10/13/21 2:20 PM, Sean Christopherson wrote:
> > On Fri, Aug 20, 2021, Brijesh Singh wrote:
> >> The setup_vmgexit_scratch() function may rely on a long-lived GHCB
> >> mapping if the GHCB shared buffer area was used for the scratch area.
> >> In preparation for eliminating the long-lived GHCB mapping, always
> >> allocate a buffer for the scratch area so it can be accessed without
> >> the GHCB mapping.
> > Would it make sense to post this patch and the next (Remove the long-lived GHCB
> > host map) in a separate mini-series?  It's needed for SNP, but AFAICT there's
> > nothing that depends on SNP.  Getting this merged ahead of time would reduce the
> > size of the SNP series by a smidge.
> 
> While testing with random configs, I am seeing some might_sleep() warnings.
> This is happening mainly because during the vmrun the GHCB is accessed
> with preemption disabled. The kvm_vcpu_map() -> kmap() reports the warning.
> I am leaning towards creating a cache on the vmgexit and using that cache
> instead of doing a kmap() on every access. Does that sound okay to you?

Since SEV is 64-bit only, it should be ok to add a kvm_vcpu_map_atomic() variant.
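
Something along these lines, perhaps (purely a sketch; the real
kvm_host_map handling also deals with memremap'd PFNs, which is glossed
over here):

	int kvm_vcpu_map_atomic(struct kvm_vcpu *vcpu, gfn_t gfn,
				struct kvm_host_map *map)
	{
		kvm_pfn_t pfn = gfn_to_pfn(vcpu->kvm, gfn);

		if (is_error_noslot_pfn(pfn))
			return -EINVAL;

		map->gfn = gfn;
		map->pfn = pfn;
		map->page = pfn_to_page(pfn);
		/* Never sleeps on 64-bit, so safe with preemption disabled. */
		map->hva = kmap_atomic(map->page);

		return 0;
	}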


* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2021-10-15 16:31       ` Brijesh Singh
@ 2021-10-15 17:16         ` Sean Christopherson
  2022-09-08 21:21           ` Michael Roth
  0 siblings, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-10-15 17:16 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Oct 15, 2021, Brijesh Singh wrote:
> 
> On 10/13/21 1:16 PM, Sean Christopherson wrote:
> > On Wed, Oct 13, 2021, Sean Christopherson wrote:
> >> On Fri, Aug 20, 2021, Brijesh Singh wrote:
> >>> When SEV-SNP is enabled in the guest VM, the guest memory pages can
> >>> either be private or shared. A write from the hypervisor goes through
> >>> the RMP checks. If hardware sees that the hypervisor is attempting to
> >>> write to a guest private page, then it triggers an RMP violation #PF.
> >>>
> >>> To avoid the RMP violation, add post_{map,unmap}_gfn() ops that can be
> >>> used to verify that it's safe to map a given guest page. Use SRCU to
> >>> protect against page state changes for existing mapped pages.
> >> SRCU isn't protecting anything.  The synchronize_srcu_expedited() in the PSC code
> >> forces it to wait for existing maps to go away, but it doesn't prevent new maps
> >> from being created while the actual RMP updates are in-flight.  Most telling is
> >> that the RMP updates happen _after_ the synchronize_srcu_expedited() call.
> > Argh, another goof on my part.  Rereading prior feedback, I see that I loosely
> > suggested SRCU as a possible solution.  That was a bad, bad suggestion.  I think
> > (hope) I made it offhand without really thinking it through.  SRCU can't work in
> > this case, because the whole premise of Read-Copy-Update is that there can be
> > multiple copies of the data.  That simply can't be true for the RMP as hardware
> > operates on a single table.
> >
> > In the future, please don't hesitate to push back on and/or question suggestions,
> > especially those that are made without concrete examples, i.e. are likely off the
> > cuff.  My goal isn't to set you up for failure :-/
> 
> What do you think about going back to my initial proposal of per-gfn
> tracking [1]? We can limit the changes to just kvm_vcpu_map() and let
> the copy_to_user() take a fault and return an error (if it attempts to
> write to guest private memory). If a PSC happens while the lock is held,
> then simply return and let the guest retry the PSC.

That approach is also broken as it doesn't hold a lock when updating host_write_track,
e.g. the count can be corrupted if writers collide, and nothing blocks writers on
in-progress readers.

I'm not opposed to a scheme that blocks PSC while KVM is reading, but I don't want
to spend time iterating on the KVM case until consensus has been reached on how
exactly RMP updates will be handled, and in general how the kernel will manage
guest private memory.


* Re: [PATCH Part2 v5 05/45] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  2021-08-20 15:58 ` [PATCH Part2 v5 05/45] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Brijesh Singh
  2021-09-24 14:04   ` Borislav Petkov
@ 2021-10-15 18:05   ` Sean Christopherson
  2021-10-15 20:18     ` Brijesh Singh
  1 sibling, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-10-15 18:05 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021, Brijesh Singh wrote:
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index f383d2a89263..8627c49666c9 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2419,3 +2419,75 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
>  	return !!rmpentry_assigned(e);
>  }
>  EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> +
> +int psmash(u64 pfn)
> +{
> +	unsigned long paddr = pfn << PAGE_SHIFT;

Probably better to use __pfn_to_phys()?

> +	int ret;
> +
> +	if (!pfn_valid(pfn))
> +		return -EINVAL;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))

Shouldn't this be a WARN_ON_ONCE()?

> +		return -ENXIO;
> +
> +	/* Binutils version 2.36 supports the PSMASH mnemonic. */
> +	asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
> +		      : "=a"(ret)
> +		      : "a"(paddr)
> +		      : "memory", "cc");
> +
> +	return ret;

I don't like returning the raw result from hardware; it mostly works because
hardware also uses '0' for success, but it will cause confusion should hardware
ever set bit 31.  There are also failures that likely should never happen unless
there's a kernel bug, e.g. I suspect we can do:

	if (WARN_ON_ONCE(ret == FAIL_INPUT))
		return -EINVAL;
	if (WARN_ON_ONCE(ret == FAIL_PERMISSION))
		return -EPERM;
	
	....

or if all errors are "impossible"

	if (WARN_ON_ONCE(ret))
		return snp_error_code_to_errno(ret);

FAIL_INUSE and FAIL_OVERLAP also need further discussion.  It's not clear to me
that two well-behaved callers can't encounter collisions due to the 2mb <=> 4kb
interactions.  If collisions between well-behaved callers are possible, then this
helper likely needs some form of serialization.  Either way, the concurrency
rules for RMP access need explicit and lengthy documentation.
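
For completeness, the snp_error_code_to_errno() helper sketched above could
look something like this (the FAIL_* names follow the usage earlier in this
mail; the exact error codes are defined by the SNP spec):

	static int snp_error_code_to_errno(int ret)
	{
		switch (ret) {
		case FAIL_INPUT:
			return -EINVAL;
		case FAIL_PERMISSION:
			return -EPERM;
		case FAIL_INUSE:
			return -EBUSY;
		case FAIL_OVERLAP:
			return -EEXIST;
		default:
			return -EIO;
		}
	}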


* Re: [PATCH Part2 v5 43/45] KVM: SVM: Use a VMSA physical address variable for populating VMCB
  2021-08-20 15:59 ` [PATCH Part2 v5 43/45] KVM: SVM: Use a VMSA physical address variable for populating VMCB Brijesh Singh
@ 2021-10-15 18:58   ` Sean Christopherson
  0 siblings, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-10-15 18:58 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021, Brijesh Singh wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> In preparation to support SEV-SNP AP Creation, use a variable that holds
> the VMSA physical address rather than converting the virtual address.
> This will allow SEV-SNP AP Creation to set the new physical address that
> will be used should the vCPU reset path be taken.

The use of "variable" in the changelog and shortlog is really confusing.  I read
them multiple times and still didn't fully understand the change until I sussed
out that the change is to track the PA in vcpu_svm separately from vcpu_svm.vmsa.

It's somewhat of a moot point though, because I think this can and should be
simplified.

In the SEV-ES case, svm->vmcb->control.vmsa_pa is always __pa(svm->vmsa).  And
in the SNP case, svm->vmcb->control.vmsa_pa defaults to __pa(svm->vmsa), but is
not changed on INIT.  Rather than do this crazy 3-way dance, simply don't write
svm->vmcb->control.vmsa_pa on INIT.  Then SNP can change it at will without having
an unnecessary and confusing field.

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1e8b26b93b4f..0bec0b71577e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2593,13 +2593,6 @@ void sev_es_init_vmcb(struct vcpu_svm *svm)
        svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
        svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;

-       /*
-        * An SEV-ES guest requires a VMSA area that is a separate from the
-        * VMCB page. Do not include the encryption mask on the VMSA physical
-        * address since hardware will access it using the guest key.
-        */
-       svm->vmcb->control.vmsa_pa = __pa(svm->vmsa);
-
        /* Can't intercept CR register access, HV can't modify CR registers */
        svm_clr_intercept(svm, INTERCEPT_CR0_READ);
        svm_clr_intercept(svm, INTERCEPT_CR4_READ);
@@ -2633,6 +2626,13 @@ void sev_es_init_vmcb(struct vcpu_svm *svm)

 void sev_es_vcpu_reset(struct vcpu_svm *svm)
 {
+       /*
+        * An SEV-ES guest requires a VMSA area that is a separate from the
+        * VMCB page. Do not include the encryption mask on the VMSA physical
+        * address since hardware will access it using the guest key.
+        */
+       svm->vmcb->control.vmsa_pa = __pa(svm->vmsa);
+
        /*
         * Set the GHCB MSR value as per the GHCB specification when emulating
         * vCPU RESET for an SEV-ES guest.

> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>

This needs your SoB.


* Re: [PATCH Part2 v5 44/45] KVM: SVM: Support SEV-SNP AP Creation NAE event
  2021-08-20 15:59 ` [PATCH Part2 v5 44/45] KVM: SVM: Support SEV-SNP AP Creation NAE event Brijesh Singh
@ 2021-10-15 19:50   ` Sean Christopherson
  2021-10-20 21:48     ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-10-15 19:50 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021, Brijesh Singh wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
> guests to alter the register state of the APs on their own. This allows
> the guest a way of simulating INIT-SIPI.
> 
> A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
> so as to avoid updating the VMSA pointer while the vCPU is running.
> 
> For CREATE:
>   The guest supplies the GPA of the VMSA to be used for the vCPU with the
>   specified APIC ID. The GPA is saved in the svm struct of the target
>   vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added to the
>   vCPU and then the vCPU is kicked.
> 
> For CREATE_ON_INIT:
>   The guest supplies the GPA of the VMSA to be used for the vCPU with the
>   specified APIC ID the next time an INIT is performed. The GPA is saved
>   in the svm struct of the target vCPU.
> 
> For DESTROY:
>   The guest indicates it wishes to stop the vCPU. The GPA is cleared from
>   the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
>   to vCPU and then the vCPU is kicked.
> 
> 
> The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked as
> a result of the event or as a result of an INIT. The handler sets the vCPU
> to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will leave the
> vCPU as not runnable. Any previous VMSA pages that were installed as
> part of an SEV-SNP AP Creation NAE event are un-pinned. If a new VMSA is
> to be installed, the VMSA guest page is pinned and set as the VMSA in the
> vCPU VMCB and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new
> VMSA is not to be installed, the VMSA is cleared in the vCPU VMCB and the
> vCPU state is left as KVM_MP_STATE_UNINITIALIZED to prevent it from being
> run.

LOL, this part of the GHCB is debatable, though I guess it does say "may"...

  Using VMGEXIT SW_EXITCODE 0x8000_0013, an SEV-SNP guest can create or update the
  vCPU state of an AP, which may allow for a simpler and more secure method of
                                             ^^^^^^^
  booting an AP.

> +	if (VALID_PAGE(svm->snp_vmsa_pfn)) {

KVM's VMSA page should be freed on a successful "switch", because AFAICT it's
incorrect for KVM to ever go back to the original VMSA.

> +		/*
> +		 * The snp_vmsa_pfn fields holds the hypervisor physical address
> +		 * of the about to be replaced VMSA which will no longer be used
> +		 * or referenced, so un-pin it.
> +		 */
> +		kvm_release_pfn_dirty(svm->snp_vmsa_pfn);
> +		svm->snp_vmsa_pfn = INVALID_PAGE;
> +	}
> +
> +	if (VALID_PAGE(svm->snp_vmsa_gpa)) {
> +		/*
> +		 * The VMSA is referenced by the hypervisor physical address,
> +		 * so retrieve the PFN and pin it.
> +		 */
> +		pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(svm->snp_vmsa_gpa));

Oh yay, a gfn.  That means that the page is subject to memslot movement.  I don't
think the code will break per se, but it's a wrinkle that's not handled.

I'm also pretty sure the page will effectively be leaked, I don't see a

	kvm_release_pfn_dirty(svm->snp_vmsa_pfn);

in vCPU teardown.

Furthermore, letting the guest specify the page would open up to exploits of the
erratum where a spurious RMP violation is signaled if an in-use page, a.k.a. VMSA
page, is 2mb aligned.  That also means the _guest_ needs to somehow be aware
of the erratum.

And digging through the guest patches, this gives the guest _full_ control over
the VMSA contents.  That is bonkers.  At _best_ it gives the guest the ability to
fuzz VMRUN ucode by stuffing garbage into the VMSA.

Honestly, why should KVM even support guest-provided VMSAs?  It's far, far simpler
to handle this fully in the guest with a BIOS<=>kernel mailbox; see the MP wakeup
protocol being added for TDX.  That would allow improving the security for SEV-ES
as well, though I'm guessing no one actually cares about that in practice.

IIUC, the use case for VMPLs is that VMPL0 would be fully trusted by both the host
and guest, i.e. attacks via the VMSA are out-of-scope.  That is very much not the
case here.


* Re: [PATCH Part2 v5 05/45] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  2021-10-15 18:05   ` Sean Christopherson
@ 2021-10-15 20:18     ` Brijesh Singh
  2021-10-15 20:27       ` Sean Christopherson
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-10-15 20:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 10/15/21 1:05 PM, Sean Christopherson wrote:
> On Fri, Aug 20, 2021, Brijesh Singh wrote:
>> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
>> index f383d2a89263..8627c49666c9 100644
>> --- a/arch/x86/kernel/sev.c
>> +++ b/arch/x86/kernel/sev.c
>> @@ -2419,3 +2419,75 @@ int snp_lookup_rmpentry(u64 pfn, int *level)
>>  	return !!rmpentry_assigned(e);
>>  }
>>  EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
>> +
>> +int psmash(u64 pfn)
>> +{
>> +	unsigned long paddr = pfn << PAGE_SHIFT;
> Probably better to use __pfn_to_phys()?

Sure, I can use that instead of the direct shift.


>
>> +	int ret;
>> +
>> +	if (!pfn_valid(pfn))
>> +		return -EINVAL;
>> +
>> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> Shouldn't this be a WARN_ON_ONCE()?

Since some of these functions are called while handling the PSC, I
tried to avoid using WARN -- mainly because if panic_on_warn=1 is set
on the host, it will result in a kernel panic.

>
>> +		return -ENXIO;
>> +
>> +	/* Binutils version 2.36 supports the PSMASH mnemonic. */
>> +	asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
>> +		      : "=a"(ret)
>> +		      : "a"(paddr)
>> +		      : "memory", "cc");
>> +
>> +	return ret;
> I don't like returning the raw result from hardware; it mostly works because
> hardware also uses '0' for success, but it will cause confusion should hardware
> ever set bit 31.  There are also failures that likely should never happen unless
> there's a kernel bug, e.g. I suspect we can do:
>
> 	if (WARN_ON_ONCE(ret == FAIL_INPUT))
> 		return -EINVAL;
> 	if (WARN_ON_ONCE(ret == FAIL_PERMISSION))
> 		return -EPERM;
> 	
> 	....
>
> or if all errors are "impossible"
>
> 	if (WARN_ON_ONCE(ret))
> 		return snp_error_code_to_errno(ret);
>
> FAIL_INUSE and FAIL_OVERLAP also need further discussion.  It's not clear to me
> that two well-behaved callers can't encounter collisions due to the 2mb <=> 4kb
> interactions.  If collisions between well-behaved callers are possible, then this
> helper likely needs some form of serialization.  Either, the concurrency rules
> for RMP access need explicit and lengthy documentation.

I don't think we need to serialize the calls. The hardware acquires the
lock before changing the RMP table, and if another processor tries to
access the same RMP table entry, the hardware will return FAIL_INUSE or
#NPF. The FAIL_INUSE will happen on the RMPUPDATE, whereas the #NPF will
occur if the guest attempts to modify the RMP entry (such as using the
RMPADJUST).

As far as FAIL_OVERLAP is concerned, it is the case where the guest is
asking to create an invalid situation and hardware detects it.  In other
words, it is a guest bug, e.g., the guest issued a PSC to add a page as
2MB and then asked to add the same page (or one of its subpages) as 4K.
Hardware sees the overlap condition and fails to change the state on the
second request.


thanks



* Re: [PATCH Part2 v5 05/45] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  2021-10-15 20:18     ` Brijesh Singh
@ 2021-10-15 20:27       ` Sean Christopherson
  2021-10-15 20:36         ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-10-15 20:27 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Oct 15, 2021, Brijesh Singh wrote:
> On 10/15/21 1:05 PM, Sean Christopherson wrote:
> > On Fri, Aug 20, 2021, Brijesh Singh wrote:
> >> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> > Shouldn't this be a WARN_ON_ONCE()?
> 
> Since some of these functions are called while handling the PSC, I
> tried to avoid using WARN -- mainly because if panic_on_warn=1 is set
> on the host, it will result in a kernel panic.

But why would KVM be handling PSC requests if SNP is disabled?


* Re: [PATCH Part2 v5 05/45] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  2021-10-15 20:27       ` Sean Christopherson
@ 2021-10-15 20:36         ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-10-15 20:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 10/15/21 3:27 PM, Sean Christopherson wrote:
> On Fri, Oct 15, 2021, Brijesh Singh wrote:
>> On 10/15/21 1:05 PM, Sean Christopherson wrote:
>>> On Fri, Aug 20, 2021, Brijesh Singh wrote:
>>>> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>>> Shouldn't this be a WARN_ON_ONCE()?
>> Since some of these functions are called while handling the PSC, I
>> tried to avoid using WARN -- mainly because if panic_on_warn=1 is set
>> on the host, it will result in a kernel panic.
> But why would KVM be handling PSC requests if SNP is disabled?

The RMPUPDATE is also used by the CCP driver to change the page state
during initialization. You are right that neither KVM nor CCP should
be using these functions when SNP is not enabled. In your pseudocode you
used WARN_ON_ONCE() for both the cpu_feature_enabled() and return code
checks. I was more concerned about the return code WARN_ON_ONCE() because
that will be called during the PSC. I am okay with using
WARN_ON_ONCE() for the cpu_feature_enabled() check.
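
i.e. something like this for the feature check (sketch):

	if (WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_SEV_SNP)))
		return -ENXIO;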

thanks



* Re: [PATCH Part2 v5 44/45] KVM: SVM: Support SEV-SNP AP Creation NAE event
  2021-10-15 19:50   ` Sean Christopherson
@ 2021-10-20 21:48     ` Brijesh Singh
  2021-10-20 23:01       ` Sean Christopherson
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-10-20 21:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 10/15/21 2:50 PM, Sean Christopherson wrote:
> On Fri, Aug 20, 2021, Brijesh Singh wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
>> guests to alter the register state of the APs on their own. This allows
>> the guest a way of simulating INIT-SIPI.
>>
>> A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
>> so as to avoid updating the VMSA pointer while the vCPU is running.
>>
>> For CREATE:
>>   The guest supplies the GPA of the VMSA to be used for the vCPU with the
>>   specified APIC ID. The GPA is saved in the svm struct of the target
>>   vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added to the
>>   vCPU and then the vCPU is kicked.
>>
>> For CREATE_ON_INIT:
>>   The guest supplies the GPA of the VMSA to be used for the vCPU with the
>>   specified APIC ID the next time an INIT is performed. The GPA is saved
>>   in the svm struct of the target vCPU.
>>
>> For DESTROY:
>>   The guest indicates it wishes to stop the vCPU. The GPA is cleared from
>>   the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
>>   to vCPU and then the vCPU is kicked.
>>
>>
>> The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked as
>> a result of the event or as a result of an INIT. The handler sets the vCPU
>> to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will leave the
>> vCPU as not runnable. Any previous VMSA pages that were installed as
>> part of an SEV-SNP AP Creation NAE event are un-pinned. If a new VMSA is
>> to be installed, the VMSA guest page is pinned and set as the VMSA in the
>> vCPU VMCB and the vCPU state is set to KVM_MP_STATE_RUNNABLE. If a new
>> VMSA is not to be installed, the VMSA is cleared in the vCPU VMCB and the
>> vCPU state is left as KVM_MP_STATE_UNINITIALIZED to prevent it from being
>> run.
> LOL, this part of the GHCB is debatable, though I guess it does say "may"...
>
>   Using VMGEXIT SW_EXITCODE 0x8000_0013, an SEV-SNP guest can create or update the
>   vCPU state of an AP, which may allow for a simpler and more secure method of
>                                              ^^^^^^^
>   booting an AP.
>
>> +	if (VALID_PAGE(svm->snp_vmsa_pfn)) {
> KVM's VMSA page should be freed on a successful "switch", because AFAICT it's
> incorrect for KVM to ever go back to the original VMSA.
>
>> +		/*
>> +		 * The snp_vmsa_pfn fields holds the hypervisor physical address
>> +		 * of the about to be replaced VMSA which will no longer be used
>> +		 * or referenced, so un-pin it.
>> +		 */
>> +		kvm_release_pfn_dirty(svm->snp_vmsa_pfn);
>> +		svm->snp_vmsa_pfn = INVALID_PAGE;
>> +	}
>> +
>> +	if (VALID_PAGE(svm->snp_vmsa_gpa)) {
>> +		/*
>> +		 * The VMSA is referenced by the hypervisor physical address,
>> +		 * so retrieve the PFN and pin it.
>> +		 */
>> +		pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(svm->snp_vmsa_gpa));
> Oh yay, a gfn.  That means that the page is subject to memslot movement.  I don't
> think the code will break per se, but it's a wrinkle that's not handled.
>
> I'm also pretty sure the page will effectively be leaked, I don't see a
>
> 	kvm_release_pfn_dirty(svm->snp_vmsa_pfn);
>
> in vCPU teardown.
>
> Furthermore, letting the guest specify the page would open up to exploits of the
> erratum where a spurious RMP violation is signaled if an in-use page, a.k.a. VMSA
> page, is 2mb aligned.  That also means the _guest_ needs to somehow be aware
> of the erratum.

Good point Sean, a guest could exploit the IN_USE erratum in this case.
We need to somehow communicate this to the guest so that it does not
allocate the VMSA at a 2MB boundary. It would be nice if the GHCB spec
could add a requirement that the VMSA must not be 2MB aligned. I will
see what we can do to address this.
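
One possible guest-side mitigation (a sketch only; the helper name is
hypothetical): allocate an order-1 block and hand out the second, odd page,
so the VMSA can never sit on a 2MB boundary:

	static struct page *snp_alloc_vmsa_page(void)
	{
		struct page *p;

		p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
		if (!p)
			return NULL;

		/* Free the first page and keep the (unaligned) second one. */
		split_page(p, 1);
		__free_page(p);

		return p + 1;
	}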


> And digging through the guest patches, this gives the guest _full_ control over
> the VMSA contents.  That is bonkers.  At _best_ it gives the guest the ability to
> fuzz VMRUN ucode by stuffing garbage into the VMSA.

If the guest puts garbage in the VMSA, then VMRUN will fail. I am sure
the ucode does all kinds of sanity checks to ensure that the VMSA does
not contain invalid values before the run.


> Honestly, why should KVM even support guest-provided VMSAs?  It's far, far simpler
> to handle this fully in the guest with a BIOS<=>kernel mailbox; see the MP wakeup
> protocol being added for TDX.  That would allow improving the security for SEV-ES
> as well, though I'm guessing no one actually cares about that in practice.
> IIUC, the use case for VMPLs is that VMPL0 would be fully trusted by both the host
> and guest, i.e. attacks via the VMSA are out-of-scope.  That is very much not the
> case here.


* Re: [PATCH Part2 v5 44/45] KVM: SVM: Support SEV-SNP AP Creation NAE event
  2021-10-20 21:48     ` Brijesh Singh
@ 2021-10-20 23:01       ` Sean Christopherson
  0 siblings, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-10-20 23:01 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Wed, Oct 20, 2021, Brijesh Singh wrote:
> 
> On 10/15/21 2:50 PM, Sean Christopherson wrote:
> > And digging through the guest patches, this gives the guest _full_ control over
> > the VMSA contents.  That is bonkers.  At _best_ it gives the guest the ability to
> > fuzz VMRUN ucode by stuffing garbage into the VMSA.
> 
> If the guest puts garbage in the VMSA, then VMRUN will fail. I am sure
> the ucode does all kinds of sanity checks to ensure that the VMSA does
> not contain invalid values before the run.

Oh, I'm well aware of the number of sanity checks that are in VM-Enter ucode, and
that's precisely why I'm of the opinion that letting the guest fuzz VMRUN is a
non-trivial security risk for the host.  I know of at least two VMX bugs
(one erratum that I could find, one that must have been fixed with a ucode patch?)
where ucode failed to detect invalid state.  Those were "benign" in that they
caused a missed VM-Fail but didn't corrupt CPU state, but it's not a stretch to
imagine a ucode bug that leads to corruption of CPU state and a system crash.

The sheer number of checks involved, combined with the fact that there likely
hasn't been much fuzzing of VM-Enter outside of the hardware vendor's own
validation, means I'm not exactly brimming with confidence that VMRUN's ucode
is perfect.

I fully acknowledge that the host kernel obviously "trusts" CPU ucode to a great
extent.  My point here is that the design exposes the host to unnecessary risk.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (44 preceding siblings ...)
  2021-08-20 15:59 ` [PATCH Part2 v5 45/45] KVM: SVM: Add module parameter to enable the SEV-SNP Brijesh Singh
@ 2021-11-12 15:43 ` Peter Gonda
  2021-11-12 17:59   ` Dave Hansen
  2021-11-22 15:23   ` Brijesh Singh
  45 siblings, 2 replies; 239+ messages in thread
From: Peter Gonda @ 2021-11-12 15:43 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

Hi Brijesh,

There is one high-level design question I'd like to discuss on these SNP KVM patches.

In these patches (V5), if a host userspace process writes a guest
private page, a SIGBUS is issued to that process. If the kernel writes
a guest private page, the kernel panics due to the unhandled RMP page
fault. This is an issue because not all writes into guest memory stem
from a bug in the host. For instance, a malicious or even buggy guest
could easily point the host at writing a private page during the
emulation of many virtual devices (virtio, NVMe, etc). For example, a
well-behaved guest's flow is to start up a driver, select some pages
to share with the host, ask the host to convert them to shared, and
then use those pages for virtual device DMA; if a buggy guest forgets
the step of requesting the pages be converted to shared, it's easy to
see how the host could rightfully write to private memory. I think we
can better guarantee host reliability when running SNP guests without
changing SNP's security properties.

Here is an alternative to the current approach: On RMP violation (host
or userspace) the page fault handler converts the page from private to
shared to allow the write to continue. This pulls from s390’s error
handling which does exactly this. See ‘arch_make_page_accessible()’.
Additionally it adds less complexity to the SNP kernel patches, and
requires no new ABI.

In the current (V5) KVM implementation, if a userspace process
generates an RMP violation (writes to guest private memory), the
process receives a SIGBUS. At first glance, it would appear that
user-space shouldn't write to private memory. However, guaranteeing
this in a generic fashion requires locking the RMP entries (via locks
external to the RMP). Otherwise, a user-space process emulating guest
device I/O may be vulnerable to having the guest memory (maliciously
or by guest bug) converted to private while user-space emulation is
happening. This results in a well-behaved userspace process receiving
a SIGBUS.

This proposal allows buggy and malicious guests to run under SNP
without jeopardizing the reliability / safety of host processes. This
is very important to a cloud service provider (CSP) since it's common
to have host-wide daemons that read/write all guests' memory, i.e. a
single process could manage the networking for all VMs on the host.
Crashing that singleton process kills networking for all VMs on the
system.

This proposal also allows for minimal changes to the kexec flow and
kdump. The new kexec kernel can simply update private pages to shared
as it encounters them during boot. This avoids needing to propagate
the RMP state from kernel to kernel. Of course this doesn't preserve
any running VMs, but it is still useful for kdump crash dumps or for
quicker rebooting into a new kernel during development with kexec.

This proposal does cause guest memory corruption for some bugs, but one
of SEV-SNP's goals, extended from SEV-ES's goals, is for a guest to be
able to detect when its memory has been corrupted / replayed by the
host. So SNP already has features allowing guests to detect this kind
of memory corruption. Additionally, this is very similar to a page of
memory generating a machine check because of 2-bit memory corruption.
In other words, SNP guests must be enlightened and ready for these
kinds of errors.

For an SNP guest running under this proposal the flow would look like
this (a rough sketch of the host-side handler follows the list):
* Host gets a #PF because it's trying to write to a private page.
* Host #PF handler updates the page to shared.
* Write continues normally.
* Guest accesses the memory (r/w).
* Guest gets a #VC error because the page is not PVALIDATED.
* Guest is now in control. The guest can terminate because its memory
has been corrupted, or it could try to continue and log the error to
its owner.
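
As promised above, a rough sketch of the host-side #PF handler for this
flow; the helper and error-code names below are invented stand-ins, not
this series' actual API:

	/*
	 * Sketch of the convert-to-shared handler.  snp_make_page_shared()
	 * stands in for an RMPUPDATE wrapper and X86_PF_RMP for the RMP
	 * violation bit in the #PF error code.
	 */
	static int handle_host_rmp_fault(unsigned long hw_error_code,
					 unsigned long pfn)
	{
		if (!(hw_error_code & X86_PF_RMP))
			return 0;	/* not an RMP violation */

		/*
		 * Flip the page to shared so the write can continue; the
		 * guest will later take a #VC on access because the page
		 * is no longer validated.
		 */
		return snp_make_page_shared(pfn, PG_LEVEL_4K);
	}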

A similar approach was introduced in the SNP patches V1 and V2 for
kernel page fault handling. The pushback around this convert-to-shared
approach was largely focused on the idea that the kernel has all the
information about which pages are shared vs. private, so it should be
able to check shared status before writing to pages. After V2 the
patches were updated to not have a kernel page fault handler for RMP
violations (other than dumping state during a panic). The current
patches protect the host with new post_{map,unmap}_gfn() functions
that check if a page is shared before mapping it, then lock the page
shared until unmapped. Given the discussions on '[Part2,v5,39/45] KVM:
SVM: Introduce ops for the post gfn map and unmap', building a
solution to do this is non-trivial and adds new overheads to KVM.
Additionally, the current solution is local to the kernel, so a new
ABI must now be created to allow the userspace VMM to access the
kernel-side locks for this to work generically for the whole host.
This is more complicated than this proposal, and adding more lock
holders seems like it could reduce performance further.
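
For readers without that thread open, the rough shape of those ops
(signatures approximate, paraphrased from patch 39/45):

	/*
	 * Approximate shape only: the map hook refuses a private gfn and
	 * otherwise holds the page shared until the matching unmap.
	 */
	int  (*post_map_gfn)(struct kvm *kvm, gfn_t gfn, int *token);
	void (*post_unmap_gfn)(struct kvm *kvm, gfn_t gfn, int token);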

There are a couple of corner cases with this approach. Under SNP,
guests can request that their memory be changed into a VMSA. This VMSA
page cannot be changed to shared while the vCPU associated with it is
running, so KVM plus the #PF handler will need some way to kick vCPUs
out of guest mode. Joerg believes that a possible fix for this could
be a new MMU notifier in the kernel; then on the #PF we can go through
the RMP and execute this vCPU kick callback.

Another corner case is that the RMPUPDATE instruction is not
guaranteed to succeed on the first attempt. As noted above, if the
page is a VMSA it cannot be updated while its vCPU is running.
Likewise, if the guest is running RMPADJUST on a page, it cannot be
RMPUPDATEd at that time. There is a lock for each RMP entry, so these
instructions can race. Kicking all guest vCPUs resolves this as well,
since it removes the chance of the race.
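
The retry loop implied here might be shaped roughly as follows; the
error code and request name are invented, though
kvm_make_all_cpus_request() itself is existing KVM infrastructure:

	/*
	 * Sketch: RMPUPDATE can fail transiently if the target page is a
	 * running vCPU's VMSA or the guest holds the RMP entry lock via
	 * RMPADJUST.  Kick every vCPU out of guest mode and retry.
	 */
	static int rmpupdate_retry(struct kvm *kvm, u64 pfn,
				   struct rmp_state *state)
	{
		int ret;

		do {
			ret = rmpupdate(pfn, state);
			if (ret == RMPUPDATE_FAIL_INUSE)	/* invented name */
				kvm_make_all_cpus_request(kvm, KVM_REQ_RMP_KICK);
		} while (ret == RMPUPDATE_FAIL_INUSE);

		return ret;
	}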

Since this proposal probably results in SNP guests terminating due to
a page unexpectedly needing PVALIDATE, the approach could be
simplified to having KVM just kill the guest. But rather than
unilaterally killing the guest, I think it's nicer to users to deliver
the unvalidated-page #VC, which lets them collect some additional
debug information and perform any clean-up work they would like.

Thanks
Peter

On Fri, Aug 20, 2021 at 9:59 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> This part of the Secure Nested Paging (SEV-SNP) series focuses on the
> changes required in a host OS for SEV-SNP support. The series builds upon
> SEV-SNP Part-1.
>
> This series provides the basic building blocks to support booting SEV-SNP
> VMs; it does not cover all the security enhancements introduced by SEV-SNP,
> such as interrupt protection.
>
> The CCP driver is enhanced to provide new APIs that use the SEV-SNP
> specific commands defined in the SEV-SNP firmware specification. The KVM
> driver uses those APIs to create and manage the SEV-SNP guests.
>
> The GHCB specification version 2 introduces a new set of NAEs that are
> used by the SEV-SNP guest to communicate with the hypervisor. The series
> provides support to handle the following new NAE events:
> - Register GHCB GPA
> - Page State Change Request
> - Hypervisor feature
> - Guest message request
>
> The RMP check is enforced as soon as SEV-SNP is enabled. Not every memory
> access requires an RMP check. In particular, the read accesses from the
> hypervisor do not require RMP checks because the data confidentiality is
> already protected via memory encryption. When hardware encounters an RMP
> check failure, it raises a page-fault exception. If the RMP check
> failure is due to a page-size mismatch, the large page is split to
> resolve the fault.
>
> The series does not provide support for interrupt security and migration;
> those features will be added after the base support.
>
> The series is based on the commit:
>  SNP part1 commit and
>  fa7a549d321a (kvm/next, next) KVM: x86: accept userspace interrupt only if no event is injected
>
> TODO:
>   * Add support for command to ratelimit the guest message request.
>
> Changes since v4:
>  * Move the RMP entry definition to x86 specific header file.
>  * Move the dump RMP entry function to SEV specific file.
>  * Use BIT_ULL while defining the #PF bit fields.
>  * Add helper function to check the IOMMU support for SEV-SNP feature.
>  * Add helper functions for the page state transition.
>  * Map and unmap the pages from the direct map after page is added or
>    removed in RMP table.
>  * Enforce the minimum SEV-SNP firmware version.
>  * Extend the LAUNCH_UPDATE to accept the base_gfn and remove the
>    logic to calculate the gfn from the hva.
>  * Add a check in LAUNCH_UPDATE to ensure that all the pages are
>    shared before calling the PSP.
>  * Mark the memory failure when failing to remove the page from the
>    RMP table or clearing the immutable bit.
>  * Exclude the encrypted hva range from the KSM.
>  * Remove the gfn tracking during the kvm_gfn_map() and use SRCU to
>    synchronize the PSC and gfn mapping.
>  * Allow PSC on the registered hva range only.
>  * Add support for the Preferred GPA VMGEXIT.
>  * Simplify the PSC handling routines.
>  * Use the static_call() for the newly added kvm_x86_ops.
>  * Remove the long-lived GHCB map.
>  * Move the snp enable module parameter to the end of the file.
>  * Remove the kvm_x86_op for the RMP fault handling. Call the
>    fault handler directly from the #NPF interception.
>
> Changes since v3:
>  * Add support for extended guest message request.
>  * Add ioctl to query the SNP Platform status.
>  * Add ioctl to get and set the SNP config.
>  * Add check to verify that memory reserved for the RMP covers the full system RAM.
>  * Start the SNP specific commands from 256 instead of 255.
>  * Multiple cleanup and fixes based on the review feedback.
>
> Changes since v2:
>  * Add AP creation support.
>  * Drop the patch to handle the RMP fault for the kernel address.
>  * Add functions to track the write access from the hypervisor.
>  * Do not enable the SNP feature when IOMMU is disabled or is in passthrough mode.
>  * Dump the RMP entry on RMP violation for the debug.
>  * Shorten the GHCB macro names.
>  * Start the SNP_INIT command id from 255 to give some gap for the legacy SEV.
>  * Sync the header with the latest 0.9 SNP spec.
>
> Changes since v1:
>  * Add AP reset MSR protocol VMGEXIT NAE.
>  * Add Hypervisor features VMGEXIT NAE.
>  * Move the RMP table initialization and RMPUPDATE/PSMASH helper in
>    arch/x86/kernel/sev.c.
>  * Add support to map/unmap SEV legacy command buffer to firmware state when
>    SNP is active.
>  * Enhance PSP driver to provide helper to allocate/free memory used for the
>    firmware context page.
>  * Add support to handle RMP fault for the kernel address.
>  * Add support to handle GUEST_REQUEST NAE event for attestation.
>  * Rename RMP table lookup helper.
>  * Drop typedef from rmpentry struct definition.
>  * Drop SNP static key and use cpu_feature_enabled() to check whether SEV-SNP
>    is active.
>  * Multiple cleanup/fixes to address Boris review feedback.
>
> Brijesh Singh (40):
>   x86/cpufeatures: Add SEV-SNP CPU feature
>   iommu/amd: Introduce function to check SEV-SNP support
>   x86/sev: Add the host SEV-SNP initialization support
>   x86/sev: Add RMP entry lookup helpers
>   x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
>   x86/sev: Invalid pages from direct map when adding it to RMP table
>   x86/traps: Define RMP violation #PF error code
>   x86/fault: Add support to handle the RMP fault for user address
>   x86/fault: Add support to dump RMP entry on fault
>   crypto: ccp: shutdown SEV firmware on kexec
>   crypto:ccp: Define the SEV-SNP commands
>   crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
>   crypto:ccp: Provide APIs to issue SEV-SNP commands
>   crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
>   crypto: ccp: Handle the legacy SEV command when SNP is enabled
>   crypto: ccp: Add the SNP_PLATFORM_STATUS command
>   crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
>   crypto: ccp: Provide APIs to query extended attestation report
>   KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
>   KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
>   KVM: SVM: Add initial SEV-SNP support
>   KVM: SVM: Add KVM_SNP_INIT command
>   KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
>   KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
>   KVM: SVM: Mark the private vma unmerable for SEV-SNP guests
>   KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
>   KVM: X86: Keep the NPT and RMP page level in sync
>   KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
>   KVM: x86: Define RMP page fault error bits for #NPF
>   KVM: x86: Update page-fault trace to log full 64-bit error code
>   KVM: SVM: Do not use long-lived GHCB map while setting scratch area
>   KVM: SVM: Remove the long-lived GHCB host map
>   KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
>   KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
>   KVM: SVM: Add support to handle Page State Change VMGEXIT
>   KVM: SVM: Introduce ops for the post gfn map and unmap
>   KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
>   KVM: SVM: Add support to handle the RMP nested page fault
>   KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
>   KVM: SVM: Add module parameter to enable the SEV-SNP
>
> Sean Christopherson (2):
>   KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
>   KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP
>
> Tom Lendacky (3):
>   KVM: SVM: Add support to handle AP reset MSR protocol
>   KVM: SVM: Use a VMSA physical address variable for populating VMCB
>   KVM: SVM: Support SEV-SNP AP Creation NAE event
>
>  Documentation/virt/coco/sevguest.rst          |   55 +
>  .../virt/kvm/amd-memory-encryption.rst        |  102 +
>  arch/x86/include/asm/cpufeatures.h            |    1 +
>  arch/x86/include/asm/disabled-features.h      |    8 +-
>  arch/x86/include/asm/kvm-x86-ops.h            |    5 +
>  arch/x86/include/asm/kvm_host.h               |   20 +
>  arch/x86/include/asm/msr-index.h              |    6 +
>  arch/x86/include/asm/sev-common.h             |   28 +
>  arch/x86/include/asm/sev.h                    |   45 +
>  arch/x86/include/asm/svm.h                    |    7 +
>  arch/x86/include/asm/trap_pf.h                |   18 +-
>  arch/x86/kernel/cpu/amd.c                     |    3 +-
>  arch/x86/kernel/sev.c                         |  361 ++++
>  arch/x86/kvm/lapic.c                          |    5 +-
>  arch/x86/kvm/mmu.h                            |    7 +-
>  arch/x86/kvm/mmu/mmu.c                        |   84 +-
>  arch/x86/kvm/svm/sev.c                        | 1676 ++++++++++++++++-
>  arch/x86/kvm/svm/svm.c                        |   62 +-
>  arch/x86/kvm/svm/svm.h                        |   74 +-
>  arch/x86/kvm/trace.h                          |   40 +-
>  arch/x86/kvm/x86.c                            |   92 +-
>  arch/x86/mm/fault.c                           |   84 +-
>  drivers/crypto/ccp/sev-dev.c                  |  924 ++++++++-
>  drivers/crypto/ccp/sev-dev.h                  |   17 +
>  drivers/crypto/ccp/sp-pci.c                   |   12 +
>  drivers/iommu/amd/init.c                      |   30 +
>  include/linux/iommu.h                         |    9 +
>  include/linux/mm.h                            |    6 +-
>  include/linux/psp-sev.h                       |  346 ++++
>  include/linux/sev.h                           |   32 +
>  include/uapi/linux/kvm.h                      |   56 +
>  include/uapi/linux/psp-sev.h                  |   60 +
>  mm/memory.c                                   |   13 +
>  tools/arch/x86/include/asm/cpufeatures.h      |    1 +
>  34 files changed, 4088 insertions(+), 201 deletions(-)
>  create mode 100644 include/linux/sev.h
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 15:43 ` [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Peter Gonda
@ 2021-11-12 17:59   ` Dave Hansen
  2021-11-12 18:35     ` Borislav Petkov
  2021-11-22 15:23   ` Brijesh Singh
  1 sibling, 1 reply; 239+ messages in thread
From: Dave Hansen @ 2021-11-12 17:59 UTC (permalink / raw)
  To: Peter Gonda, Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On 11/12/21 7:43 AM, Peter Gonda wrote:
...
> Here is an alternative to the current approach: On RMP violation (host
> or userspace) the page fault handler converts the page from private to
> shared to allow the write to continue. This pulls from s390’s error
> handling which does exactly this. See ‘arch_make_page_accessible()’.
> Additionally it adds less complexity to the SNP kernel patches, and
> requires no new ABI.

I think it's important to very carefully describe where these RMP page
faults can occur within the kernel.  They can obviously occur on things like:

	copy_to_user(&user_buf, &kernel_buf, len);

That's not a big deal.  Those can obviously handle page faults.  We know
exactly the instruction on which the fault can occur and we handle it
gracefully.

*But*, these are harder:

	get_user_pages(addr, len, &pages);
	spin_lock(&lock);
	ptr = kmap_atomic(pages[0]);
	*ptr = foo; // #PF here
	kunmap_atomic(ptr);
	spin_unlock(&lock);
	put_page(pages[0]);

In this case, the places where the fault can happen are not especially
well bounded.  It can be in compiler-generated code.  It can happen on
any random instruction in there.
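
One way to keep the fault location bounded in cases like this would be
to route host writes to guest memory through a fault-tolerant accessor;
copy_to_kernel_nofault() has roughly the right shape, although whether
the RMP fault path would honor its exception-table fixup is an open
question:

	/*
	 * Sketch: the store goes through an exception-table-covered copy,
	 * so a fault surfaces as an error return instead of an unhandled
	 * kernel #PF.  Safe in atomic context.
	 */
	ptr = kmap_atomic(pages[0]);
	if (copy_to_kernel_nofault(ptr, &foo, sizeof(foo)))
		ret = -EFAULT;	/* page went private under us */
	kunmap_atomic(ptr);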

Or, is there some mechanism that prevents guest-private memory from being
accessed in random host kernel code?

> This proposal does cause guest memory corruption for some bugs but one
> of SEV-SNP’s goals extended from SEV-ES’s goals is for guest’s to be
> able to detect when its memory has been corrupted / replayed by the
> host. So SNP already has features for allowing guests to detect this
> kind of memory corruption. Additionally this is very similar to a page
> of memory generating a machine check because of 2-bit memory
> corruption. In other words SNP guests must be enlightened and ready
> for these kinds of errors.
> 
> For an SNP guest running under this proposal the flow would look like this:
> * Host gets a #PF because its trying to write to a private page.
> * Host #PF handler updates the page to shared.
> * Write continues normally.
> * Guest accesses memory (r/w).
> * Guest gets a #VC error because the page is not PVALIDATED
> * Guest is now in control. Guest can terminate because its memory has
> been corrupted. Guest could try and continue to log the error to its
> owner.

This sounds like a _possible_ opportunity for the guest to do some extra
handling.  It's also quite possible that this #VC happens in a place
that the guest can't handle.


> A similar approach was introduced in the SNP patches V1 and V2 for
> kernel page fault handling. The pushback around this convert to shared
> approach was largely focused around the idea that the kernel has all
> the information about which pages are shared vs private so it should
> be able to check shared status before write to pages. After V2 the
> patches were updated to not have a kernel page fault handler for RMP
> violations (other than dumping state during a panic). The current
> patches protect the host with new post_{map,unmap}_gfn() function that
> checks if a page is shared before mapping it, then locks the page
> shared until unmapped. Given the discussions on ‘[Part2,v5,39/45] KVM:
> SVM: Introduce ops for the post gfn map and unmap’ building a solution
> to do this is non trivial and adds new overheads to KVM. Additionally
> the current solution is local to the kernel. So a new ABI just now be
> created to allow the userspace VMM to access the kernel-side locks for
> this to work generically for the whole host. This is more complicated
> than this proposal and adding more lock holders seems like it could
> reduce performance further.

The locking is complicated.  But, I think it is necessary.  Once the
kernel can map private memory, it can access it in random spots.  It's
hard to make random kernel code recoverable from faults.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 17:59   ` Dave Hansen
@ 2021-11-12 18:35     ` Borislav Petkov
  2021-11-12 19:48       ` Sean Christopherson
  0 siblings, 1 reply; 239+ messages in thread
From: Borislav Petkov @ 2021-11-12 18:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Peter Gonda, Brijesh Singh, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
> Or, is there some mechanism that prevents guest-private memory from being
> accessed in random host kernel code?

So I'm currently under the impression that random host->guest accesses
should not happen if not previously agreed upon by both.

Because, as explained on IRC, if the host touches a private guest page,
whatever the host does to that page, the next time the guest runs, it'll
get a #VC where it will see that that page doesn't belong to it anymore
and then, out of paranoia, it will simply terminate to protect itself.

So cloud providers should have an interest to prevent such random stray
accesses if they wanna have guests. :)

> This sounds like a _possible_ opportunity for the guest to do some extra
> handling.  It's also quite possible that this #VC happens in a place
> that the guest can't handle.

How? It'll get a #VC when it first touches that page.

I'd say the #VC handler should be able to deal with it...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 18:35     ` Borislav Petkov
@ 2021-11-12 19:48       ` Sean Christopherson
  2021-11-12 20:04         ` Borislav Petkov
                           ` (3 more replies)
  0 siblings, 4 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-11-12 19:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Peter Gonda, Brijesh Singh, x86, linux-kernel, kvm,
	linux-coco, linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021, Borislav Petkov wrote:
> On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
> > Or, is there some mechanism that prevents guest-private memory from being
> > accessed in random host kernel code?

Or random host userspace code...

> So I'm currently under the impression that random host->guest accesses
> should not happen if not previously agreed upon by both.

Key word "should".

> Because, as explained on IRC, if host touches a private guest page,
> whatever the host does to that page, the next time the guest runs, it'll
> get a #VC where it will see that that page doesn't belong to it anymore
> and then, out of paranoia, it will simply terminate to protect itself.
> 
> So cloud providers should have an interest to prevent such random stray
> accesses if they wanna have guests. :)

Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.

On Fri, Nov 12, 2021, Peter Gonda wrote:
> Here is an alternative to the current approach: On RMP violation (host
> or userspace) the page fault handler converts the page from private to
> shared to allow the write to continue. This pulls from s390’s error
> handling which does exactly this. See ‘arch_make_page_accessible()’.

Ah, after further reading, s390 does _not_ do implicit private=>shared conversions.

s390's arch_make_page_accessible() is somewhat similar, but it is not a direct
comparison.  IIUC, it exports and integrity protects the data and thus preserves
the guest's data in an encrypted form, e.g. so that it can be swapped to disk.
And if the host corrupts the data, attempting to convert it back to secure on a
subsequent guest access will fail.

The host kernel's handling of the "convert to secure" failures doesn't appear to
be all that robust, e.g. it looks like there are multiple paths where the error
is dropped on the floor and the guest is resumed, but IMO soft hanging the guest
is still better than inducing a fault in the guest, and far better than potentially
coercing the guest into reading corrupted memory ("spurious" PVALIDATE).  And s390's
behavior is fixable since it's purely a host error handling problem.

To truly make a page shared, s390 requires the guest to call into the ultravisor
to make a page shared.  And on the host side, the host can pin a page as shared
to prevent the guest from unsharing it while the host is accessing it as a shared
page.
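
For reference, the host-side hook is already wired into the core kernel;
condensed from mm/gup.c (arch_make_page_accessible() is real, a no-op
stub on architectures without secure guests):

	/*
	 * Before the host pins a page for access, give the architecture
	 * a chance to export/encrypt a secure page; on failure the
	 * access is refused rather than the page being implicitly
	 * converted.
	 */
	if (flags & FOLL_PIN) {
		ret = arch_make_page_accessible(page);
		if (ret) {
			unpin_user_page(page);
			return ERR_PTR(ret);
		}
	}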

So, inducing #VC is similar in the sense that a malicious s390 can also DoS itself,
but is quite different in that (AFAICT) s390 does not create an attack surface where
a malicious or buggy host userspace can induce faults in the guest, or worst case in
SNP, exploit a buggy guest into accepting and accessing corrupted data.

It's also different in that s390 doesn't implicitly convert between shared and
private.  Functionally, it doesn't really change the end result because a buggy
host that writes guest private memory will DoS the guest (by inducing a #VC or
corrupting exported data), but at least for s390 there's a sane, legitimate use
case for accessing guest private memory (swap and maybe migration?), whereas for
SNP, IMO implicitly converting to shared on a host access is straight up wrong.

> Additionally it adds less complexity to the SNP kernel patches, and
> requires no new ABI.

I disagree, this would require "new" ABI in the sense that it commits KVM to
supporting SNP without requiring userspace to initiate any and all conversions
between shared and private.  Which in my mind is the big elephant in the room:
do we want to require new KVM (and kernel?) ABI to allow/force userspace to
explicitly declare guest private memory for TDX _and_ SNP, or just TDX?

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 19:48       ` Sean Christopherson
@ 2021-11-12 20:04         ` Borislav Petkov
  2021-11-12 20:37           ` Sean Christopherson
  2021-11-12 21:16         ` Marc Orr
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 239+ messages in thread
From: Borislav Petkov @ 2021-11-12 20:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dave Hansen, Peter Gonda, Brijesh Singh, x86, linux-kernel, kvm,
	linux-coco, linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021 at 07:48:17PM +0000, Sean Christopherson wrote:
> Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.

What do you suggest instead?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 20:04         ` Borislav Petkov
@ 2021-11-12 20:37           ` Sean Christopherson
  2021-11-12 20:53             ` Borislav Petkov
                               ` (2 more replies)
  0 siblings, 3 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-11-12 20:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Peter Gonda, Brijesh Singh, x86, linux-kernel, kvm,
	linux-coco, linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021, Borislav Petkov wrote:
> On Fri, Nov 12, 2021 at 07:48:17PM +0000, Sean Christopherson wrote:
> > Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
> 
> What do you suggest instead?

Let userspace decide what is mapped shared and what is mapped private.  The kernel
and KVM provide the APIs/infrastructure to do the actual conversions in a thread-safe
fashion and also to enforce the current state, but userspace is the control plane.

It would require non-trivial changes in userspace if there are multiple processes
accessing guest memory, e.g. Peter's networking daemon example, but it _is_ fully
solvable.  The exit to userspace means all three components (guest, kernel, 
and userspace) have full knowledge of what is shared and what is private.  There
is zero ambiguity:

  - if userspace accesses guest private memory, it gets SIGSEGV or whatever.  
  - if kernel accesses guest private memory, it does BUG/panic/oops[*]
  - if guest accesses memory with the incorrect C/SHARED-bit, it gets killed.

This is the direction KVM TDX support is headed, though it's obviously still a WIP.

And ideally, to avoid implicit conversions at any level, hardware vendors' ABIs
define that:

  a) All convertible memory, i.e. RAM, starts as private.
  b) Conversions between private and shared must be done via explicit hypercall.

Without (b), userspace and thus KVM have to treat guest accesses to the incorrect
type as implicit conversions.

[*] Sadly, fully preventing kernel access to guest private is not possible with
    TDX, especially if the direct map is left intact.  But maybe in the future
    TDX will signal a fault instead of poisoning memory and leaving a #MC mine.
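
To make the control-plane idea concrete, the VMM side might eventually
look something like the sketch below; every name here (exit type, ioctl,
struct, helper) is invented for illustration, as no such ABI exists at
this point:

	/*
	 * Hypothetical VMM-side handling of an explicit conversion
	 * hypercall that KVM forwarded to userspace.
	 */
	static void handle_memory_convert(int vm_fd, struct kvm_run *run)
	{
		struct kvm_mem_attr attr = {
			.gpa     = run->mem_convert.gpa,
			.size    = run->mem_convert.size,
			.private = !run->mem_convert.to_shared,
		};

		/* Userspace updates its own tracking first: it is the
		 * control plane and the source of truth. */
		vmm_track_conversion(attr.gpa, attr.size, !attr.private);

		if (ioctl(vm_fd, KVM_SET_MEMORY_ATTR, &attr) < 0)
			err(1, "memory conversion failed");
	}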

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 20:37           ` Sean Christopherson
@ 2021-11-12 20:53             ` Borislav Petkov
  2021-11-12 21:12               ` Peter Gonda
  2021-11-12 21:30             ` Marc Orr
  2021-11-15 16:18             ` Brijesh Singh
  2 siblings, 1 reply; 239+ messages in thread
From: Borislav Petkov @ 2021-11-12 20:53 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dave Hansen, Peter Gonda, Brijesh Singh, x86, linux-kernel, kvm,
	linux-coco, linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021 at 08:37:59PM +0000, Sean Christopherson wrote:
> Let userspace decide what is mapped shared and what is mapped private. 

With "userspace", you mean the *host* userspace?

> The kernel and KVM provide the APIs/infrastructure to do the actual
> conversions in a thread-safe fashion and also to enforce the current
> state, but userspace is the control plane.
>
> It would require non-trivial changes in userspace if there are multiple processes
> accessing guest memory, e.g. Peter's networking daemon example, but it _is_ fully
> solvable.  The exit to userspace means all three components (guest, kernel, 
> and userspace) have full knowledge of what is shared and what is private.  There
> is zero ambiguity:
> 
>   - if userspace accesses guest private memory, it gets SIGSEGV or whatever.  

That SIGSEGV is generated by the host kernel, I presume, after it checks
whether the memory belongs to the guest?

>   - if kernel accesses guest private memory, it does BUG/panic/oops[*]

If *it* is the host kernel, then you probably shouldn't do that -
otherwise you just killed the host kernel on which all those guests are
running.

>   - if guest accesses memory with the incorrect C/SHARED-bit, it gets killed.

Yah, that's the easy one.

> This is the direction KVM TDX support is headed, though it's obviously still a WIP.
> 
> And ideally, to avoid implicit conversions at any level, hardware vendors' ABIs
> define that:
> 
>   a) All convertible memory, i.e. RAM, starts as private.
>   b) Conversions between private and shared must be done via explicit hypercall.

I like the explicit nature of this but devil's in the detail and I'm no
virt guy...

> Without (b), userspace and thus KVM have to treat guest accesses to the incorrect
> type as implicit conversions.
> 
> [*] Sadly, fully preventing kernel access to guest private is not possible with
>     TDX, especially if the direct map is left intact.  But maybe in the future
>     TDX will signal a fault instead of poisoning memory and leaving a #MC mine.

Yah, the #MC thing sounds like someone didn't think things through. ;-\

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 20:53             ` Borislav Petkov
@ 2021-11-12 21:12               ` Peter Gonda
  2021-11-12 21:20                 ` Andy Lutomirski
  2021-11-13  0:00                 ` Sean Christopherson
  0 siblings, 2 replies; 239+ messages in thread
From: Peter Gonda @ 2021-11-12 21:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sean Christopherson, Dave Hansen, Brijesh Singh, x86,
	linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021 at 1:55 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Fri, Nov 12, 2021 at 08:37:59PM +0000, Sean Christopherson wrote:
> > Let userspace decide what is mapped shared and what is mapped private.
>
> With "userspace", you mean the *host* userspace?
>
> > The kernel and KVM provide the APIs/infrastructure to do the actual
> > conversions in a thread-safe fashion and also to enforce the current
> > state, but userspace is the control plane.
> >
> > It would require non-trivial changes in userspace if there are multiple processes
> > accessing guest memory, e.g. Peter's networking daemon example, but it _is_ fully
> > solvable.  The exit to userspace means all three components (guest, kernel,
> > and userspace) have full knowledge of what is shared and what is private.  There
> > is zero ambiguity:
> >
> >   - if userspace accesses guest private memory, it gets SIGSEGV or whatever.
>
> That SIGSEGV is generated by the host kernel, I presume, after it checks
> whether the memory belongs to the guest?
>
> >   - if kernel accesses guest private memory, it does BUG/panic/oops[*]
>
> If *it* is the host kernel, then you probably shouldn't do that -
> otherwise you just killed the host kernel on which all those guests are
> running.

I agree, it seems better to terminate the single guest with an issue
rather than killing the host (and therefore all guests). So I'd
suggest that even in this case we do the 'convert to shared' approach
or just outright terminate the guest.

Are there already examples in KVM of a KVM bug in servicing a VM's
request resulting in a BUG/panic/oops? That seems not ideal ever.

>
> >   - if guest accesses memory with the incorrect C/SHARED-bit, it gets killed.
>
> Yah, that's the easy one.
>
> > This is the direction KVM TDX support is headed, though it's obviously still a WIP.
> >
> > And ideally, to avoid implicit conversions at any level, hardware vendors' ABIs
> > define that:
> >
> >   a) All convertible memory, i.e. RAM, starts as private.
> >   b) Conversions between private and shared must be done via explicit hypercall.
>
> I like the explicit nature of this but devil's in the detail and I'm no
> virt guy...

This seems like a reasonable approach that can help with the issue of
terminating the entity behaving poorly. Could this feature be an
improvement that comes later? This improvement could be gated behind a
per-VM KVM CAP, a KVM module param, or some other mechanism, so as not
to blindside userspace with this change?

>
> > Without (b), userspace and thus KVM have to treat guest accesses to the incorrect
> > type as implicit conversions.
> >
> > [*] Sadly, fully preventing kernel access to guest private is not possible with
> >     TDX, especially if the direct map is left intact.  But maybe in the future
> >     TDX will signal a fault instead of poisoning memory and leaving a #MC mine.
>
> Yah, the #MC thing sounds like someone didn't think things through. ;-\

Yes, #MC in TDX seems much harder to deal with than the #PF for SNP.
I'd propose TDX keep the host kernel safe but not have that solution
block SNP. As suggested above, I like the idea, but it seems like it
can come as a future improvement to SNP support.

>
> Thx.
>
> --
> Regards/Gruss,
>     Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 19:48       ` Sean Christopherson
  2021-11-12 20:04         ` Borislav Petkov
@ 2021-11-12 21:16         ` Marc Orr
  2021-11-12 21:23           ` Andy Lutomirski
  2021-11-15 12:30         ` Dr. David Alan Gilbert
  2021-11-15 16:16         ` Joerg Roedel
  3 siblings, 1 reply; 239+ messages in thread
From: Marc Orr @ 2021-11-12 21:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Borislav Petkov, Dave Hansen, Peter Gonda, Brijesh Singh, x86,
	linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

> > So cloud providers should have an interest to prevent such random stray
> > accesses if they wanna have guests. :)
>
> Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.

I want to push back on "inducing a fault in the guest because of
_host_ bug is wrong". The guest is _required_ to be robust against the
host maliciously (or accidentally) writing its memory. SNP security
depends on the guest detecting such writes. Therefore, why is it wrong
to leverage this system property that the guest will detect when its
private memory has been written? Especially when it's orders of
magnitude simpler than the alternative of having everything in the
system -- kernel, user-space, and guest -- coordinate to agree on
what's private and what's shared. Such a complex approach is likely to
bring a lot of bugs, vulnerabilities, and limitations on future design
into the picture.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 21:12               ` Peter Gonda
@ 2021-11-12 21:20                 ` Andy Lutomirski
  2021-11-12 22:04                   ` Borislav Petkov
  2021-11-12 22:52                   ` Peter Gonda
  2021-11-13  0:00                 ` Sean Christopherson
  1 sibling, 2 replies; 239+ messages in thread
From: Andy Lutomirski @ 2021-11-12 21:20 UTC (permalink / raw)
  To: Peter Gonda, Borislav Petkov
  Cc: Sean Christopherson, Dave Hansen, Brijesh Singh, x86,
	linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Dave Hansen, Sergio Lopez,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On 11/12/21 13:12, Peter Gonda wrote:
> On Fri, Nov 12, 2021 at 1:55 PM Borislav Petkov <bp@alien8.de> wrote:
>>
>> On Fri, Nov 12, 2021 at 08:37:59PM +0000, Sean Christopherson wrote:
>>> Let userspace decide what is mapped shared and what is mapped private.
>>
>> With "userspace", you mean the *host* userspace?
>>
>>> The kernel and KVM provide the APIs/infrastructure to do the actual
>>> conversions in a thread-safe fashion and also to enforce the current
>>> state, but userspace is the control plane.
>>>
>>> It would require non-trivial changes in userspace if there are multiple processes
>>> accessing guest memory, e.g. Peter's networking daemon example, but it _is_ fully
>>> solvable.  The exit to userspace means all three components (guest, kernel,
>>> and userspace) have full knowledge of what is shared and what is private.  There
>>> is zero ambiguity:
>>>
>>>    - if userspace accesses guest private memory, it gets SIGSEGV or whatever.
>>
>> That SIGSEGV is generated by the host kernel, I presume, after it checks
>> whether the memory belongs to the guest?
>>
>>>    - if kernel accesses guest private memory, it does BUG/panic/oops[*]
>>
>> If *it* is the host kernel, then you probably shouldn't do that -
>> otherwise you just killed the host kernel on which all those guests are
>> running.
> 
> I agree, it seems better to terminate the single guest with an issue.
> Rather than killing the host (and therefore all guests). So I'd
> suggest even in this case we do the 'convert to shared' approach or
> just outright terminate the guest.
> 
> Are there already examples in KVM of a KVM bug in servicing a VM's
> request results in a BUG/panic/oops? That seems not ideal ever.
> 
>>
>>>    - if guest accesses memory with the incorrect C/SHARED-bit, it gets killed.
>>
>> Yah, that's the easy one.
>>
>>> This is the direction KVM TDX support is headed, though it's obviously still a WIP.
>>>
>>> And ideally, to avoid implicit conversions at any level, hardware vendors' ABIs
>>> define that:
>>>
>>>    a) All convertible memory, i.e. RAM, starts as private.
>>>    b) Conversions between private and shared must be done via explicit hypercall.
>>
>> I like the explicit nature of this but devil's in the detail and I'm no
>> virt guy...
> 
> This seems like a reasonable approach that can help with the issue of
> terminating the entity behaving poorly. Could this feature be an
> improvement that comes later? This improvement could be gated behind a
> per VM KVM CAP, a kvm module param, or insert other solution here, to
> not blind side userspace with this change?
> 
>>
>>> Without (b), userspace and thus KVM have to treat guest accesses to the incorrect
>>> type as implicit conversions.
>>>
>>> [*] Sadly, fully preventing kernel access to guest private is not possible with
>>>      TDX, especially if the direct map is left intact.  But maybe in the future
>>>      TDX will signal a fault instead of poisoning memory and leaving a #MC mine.
>>
>> Yah, the #MC thing sounds like someone didn't think things through. ;-\
> 
> Yes #MC in TDX seems much harder to deal with than the #PF for SNP.
> I'd propose TDX keeps the host kernel safe bug not have that solution
> block SNP. As suggested above I like the idea but it seems like it can
> come as a future improvement to SNP support.
> 

Can you clarify what type of host bug you're talking about here?  We're 
talking about the host kernel reading or writing guest private memory 
through the direct map or kmap, right?  It seems to me that the way this 
happens is:

struct page *guest_page = (get and refcount a guest page);

...

guest switches the page to private;

...

read or write the page via direct map or kmap.


This naively *looks* like something a malicious or buggy guest could 
induce the host kernel to do.  But I think that, if this actually 
occurs, it's an example of a much more severe host bug.  In particular, 
there is nothing special about the guest switching a page to private -- 
the guest or QEMU could just as easily have freed (hotunplugged) a 
shared guest page or ballooned it or swapped it or otherwise deallocated 
it.  And, if the host touches a page/pfn that the guest has freed, this 
is UAF, is a huge security hole if the guest has any degree of control, 
and needs to kill the host kernel if detected.

Now we can kind of sweep it under the rug by saying that changing a page 
from shared to private isn't really freeing the page -- it's just 
modifying the usage of the page.  But to me that sounds like saying 
"reusing a former user memory page as a pagetable isn't *really* freeing 
it if the kernel kept a pointer around the whole time and the page 
allocator wasn't involved".

So let's please just consider these mode transitions to be equivalent
to actually freeing the page, at least from a locking perspective, and
then the way to prevent these bugs and the correct action to take if
they occur is clear.

(On TDX, changing a page's shared/private state is very much like
freeing and reallocating -- they're in different page tables and, in
principle anyway, there is no actual relationship between a shared page
and a supposedly matching private page except that some bits of the GPA
happen to match.  The hardware and the TD module fully support both
being mapped at once at the same "address", at least according to the
specs I've read.)

--Andy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 21:16         ` Marc Orr
@ 2021-11-12 21:23           ` Andy Lutomirski
  2021-11-12 21:35             ` Borislav Petkov
  0 siblings, 1 reply; 239+ messages in thread
From: Andy Lutomirski @ 2021-11-12 21:23 UTC (permalink / raw)
  To: Marc Orr, Sean Christopherson
  Cc: Borislav Petkov, Dave Hansen, Peter Gonda, Brijesh Singh, x86,
	linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Dave Hansen, Sergio Lopez,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On 11/12/21 13:16, Marc Orr wrote:
>>> So cloud providers should have an interest to prevent such random stray
>>> accesses if they wanna have guests. :)
>>
>> Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
> 
> I want to push back on "inducing a fault in the guest because of
> _host_ bug is wrong". The guest is _required_ to be robust against
> the host maliciously (or accidentally) writing its memory. SNP
> security depends on the guest detecting such writes. Therefore, why
> is it wrong to leverage this system property that the guest will
> detect when its private memory has been written?

 >
> Especially when it's orders of
> magnitude simpler than the alternative of having everything in the
> system -- kernel, user-space, and guest -- coordinate to agree on
> what's private and what's shared. Such a complex approach is likely to
> bring a lot of bugs, vulnerabilities, and limitations on future design
> into the picture.
> 

SEV-SNP, TDX, and any reasonable software solution all require that the 
host know which pages are private and which pages are shared.  Sure, the 
old SEV-ES Linux host implementation was very simple, but it's nasty and 
fundamentally can't support migration.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 20:37           ` Sean Christopherson
  2021-11-12 20:53             ` Borislav Petkov
@ 2021-11-12 21:30             ` Marc Orr
  2021-11-12 21:37               ` Dave Hansen
  2021-11-12 21:39               ` Andy Lutomirski
  2021-11-15 16:18             ` Brijesh Singh
  2 siblings, 2 replies; 239+ messages in thread
From: Marc Orr @ 2021-11-12 21:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Borislav Petkov, Dave Hansen, Peter Gonda, Brijesh Singh, x86,
	linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021 at 12:38 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Nov 12, 2021, Borislav Petkov wrote:
> > On Fri, Nov 12, 2021 at 07:48:17PM +0000, Sean Christopherson wrote:
> > > Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
> >
> > What do you suggest instead?
>
> Let userspace decide what is mapped shared and what is mapped private.  The kernel
> and KVM provide the APIs/infrastructure to do the actual conversions in a thread-safe
> fashion and also to enforce the current state, but userspace is the control plane.
>
> It would require non-trivial changes in userspace if there are multiple processes
> accessing guest memory, e.g. Peter's networking daemon example, but it _is_ fully
> solvable.  The exit to userspace means all three components (guest, kernel,
> and userspace) have full knowledge of what is shared and what is private.  There
> is zero ambiguity:
>
>   - if userspace accesses guest private memory, it gets SIGSEGV or whatever.
>   - if kernel accesses guest private memory, it does BUG/panic/oops[*]
>   - if guest accesses memory with the incorrect C/SHARED-bit, it gets killed.
>
> This is the direction KVM TDX support is headed, though it's obviously still a WIP.
>
> And ideally, to avoid implicit conversions at any level, hardware vendors' ABIs
> define that:
>
>   a) All convertible memory, i.e. RAM, starts as private.
>   b) Conversions between private and shared must be done via explicit hypercall.
>
> Without (b), userspace and thus KVM have to treat guest accesses to the incorrect
> type as implicit conversions.
>
> [*] Sadly, fully preventing kernel access to guest private is not possible with
>     TDX, especially if the direct map is left intact.  But maybe in the future
>     TDX will signal a fault instead of poisoning memory and leaving a #MC mine.

In this proposal, consider a guest driver instructing a device to DMA
write a 1 GB memory buffer. A well-behaved guest driver will ensure
that the entire 1 GB is marked shared. But what about a malicious or
buggy guest? Let's assume a bad guest driver instructs the device to
write guest private memory.

So now, the virtual device, which might be implemented as some
host-side process, needs to (1) check and lock all 4K constituent RMP
entries (so they're not converted to private while the DMA write is
taking place), (2) write the 1 GB buffer, and (3) unlock all 4K
constituent RMP entries? If I'm understanding this correctly, then the
synchronization will be prohibitively expensive.
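
Spelled out with an invented per-entry lock API, the naive host-side
sequence is 262,144 lock/unlock pairs per 1 GB write:

	/*
	 * Sketch of the synchronization described above; the
	 * rmp_{lock,unlock}_shared() helpers are invented names.
	 */
	for (i = 0; i < SZ_1G / PAGE_SIZE; i++) {
		if (rmp_lock_shared(gfn + i))
			goto unwind;	/* guest flipped a page private */
	}
	memcpy(host_va, dma_src, SZ_1G);
	for (i = 0; i < SZ_1G / PAGE_SIZE; i++)
		rmp_unlock_shared(gfn + i);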

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 21:23           ` Andy Lutomirski
@ 2021-11-12 21:35             ` Borislav Petkov
  0 siblings, 0 replies; 239+ messages in thread
From: Borislav Petkov @ 2021-11-12 21:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Marc Orr, Sean Christopherson, Dave Hansen, Peter Gonda,
	Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021 at 01:23:25PM -0800, Andy Lutomirski wrote:
> SEV-SNP, TDX, and any reasonable software solution all require that the host
> know which pages are private and which pages are shared.  Sure, the old
> SEV-ES Linux host implementation was very simple, but it's nasty and
> fundamentally can't support migration.

Right, so at least SNP guests need to track which pages have already
been PVALIDATEd by them so that they don't validate them again. So if
we track that somewhere in struct page or wherever, that same bit can
be used to state whether a page is private or shared.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 21:30             ` Marc Orr
@ 2021-11-12 21:37               ` Dave Hansen
  2021-11-12 21:40                 ` Marc Orr
  2021-11-12 21:39               ` Andy Lutomirski
  1 sibling, 1 reply; 239+ messages in thread
From: Dave Hansen @ 2021-11-12 21:37 UTC (permalink / raw)
  To: Marc Orr, Sean Christopherson
  Cc: Borislav Petkov, Peter Gonda, Brijesh Singh, x86, linux-kernel,
	kvm, linux-coco, linux-mm, linux-crypto, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, Tom Lendacky, H. Peter Anvin,
	Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On 11/12/21 1:30 PM, Marc Orr wrote:
> In this proposal, consider a guest driver instructing a device to DMA
> write a 1 GB memory buffer. A well-behaved guest driver will ensure
> that the entire 1 GB is marked shared. But what about a malicious or
> buggy guest? Let's assume a bad guest driver instructs the device to
> write guest private memory.
> 
> So now, the virtual device, which might be implemented as some host
> side process, needs to (1) check and lock all 4k constituent RMP
> entries (so they're not converted to private while the DMA write is
> taking place), (2) write the 1 GB buffer, and (3) unlock all 4k
> constituent RMP entries? If I'm understanding this correctly, then the
> synchronization will be prohibitively expensive.

Are you talking about a 1GB *mapping* here?  As in, something is using a
1GB page table entry to map the 1GB memory buffer?  That was the only
case where I knew we needed coordination between neighbor RMP entries
and host memory accesses.

That 1GB problem _should_ be impossible.  I thought we settled on
disabling hugetlbfs and fracturing the whole of the direct map down to 4k.
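
The fracturing referred to here is along the lines of this sketch,
which splits the direct-map alias of a page down to 4k before the page
is handed to the RMP (set_memory_4k() is a real x86 helper; the
surrounding flow and rmpupdate_private() are illustrative):

    static int snp_make_page_private(u64 pfn)
    {
            unsigned long vaddr = (unsigned long)pfn_to_kaddr(pfn);
            int ret;

            /* ensure no 2M/1G kernel mapping overlaps the private page */
            ret = set_memory_4k(vaddr, 1);
            if (ret)
                    return ret;

            /* hypothetical wrapper around the RMPUPDATE instruction */
            return rmpupdate_private(pfn);
    }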

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 21:30             ` Marc Orr
  2021-11-12 21:37               ` Dave Hansen
@ 2021-11-12 21:39               ` Andy Lutomirski
  2021-11-12 21:43                 ` Marc Orr
  1 sibling, 1 reply; 239+ messages in thread
From: Andy Lutomirski @ 2021-11-12 21:39 UTC (permalink / raw)
  To: Marc Orr, Sean Christopherson
  Cc: Borislav Petkov, Dave Hansen, Peter Gonda, Brijesh Singh,
	the arch/x86 maintainers, Linux Kernel Mailing List, kvm list,
	linux-coco, linux-mm, Linux Crypto Mailing List, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, Tom Lendacky, H. Peter Anvin,
	Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Dave Hansen, Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy



On Fri, Nov 12, 2021, at 1:30 PM, Marc Orr wrote:
> On Fri, Nov 12, 2021 at 12:38 PM Sean Christopherson <seanjc@google.com> wrote:
>>
>> On Fri, Nov 12, 2021, Borislav Petkov wrote:
>> > On Fri, Nov 12, 2021 at 07:48:17PM +0000, Sean Christopherson wrote:
>> > > Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
>> >
>> > What do you suggest instead?
>>
>> Let userspace decide what is mapped shared and what is mapped private.  The kernel
>> and KVM provide the APIs/infrastructure to do the actual conversions in a thread-safe
>> fashion and also to enforce the current state, but userspace is the control plane.
>>
>> It would require non-trivial changes in userspace if there are multiple processes
>> accessing guest memory, e.g. Peter's networking daemon example, but it _is_ fully
>> solvable.  The exit to userspace means all three components (guest, kernel,
>> and userspace) have full knowledge of what is shared and what is private.  There
>> is zero ambiguity:
>>
>>   - if userspace accesses guest private memory, it gets SIGSEGV or whatever.
>>   - if kernel accesses guest private memory, it does BUG/panic/oops[*]
>>   - if guest accesses memory with the incorrect C/SHARED-bit, it gets killed.
>>
>> This is the direction KVM TDX support is headed, though it's obviously still a WIP.
>>
>> And ideally, to avoid implicit conversions at any level, hardware vendors' ABIs
>> define that:
>>
>>   a) All convertible memory, i.e. RAM, starts as private.
>>   b) Conversions between private and shared must be done via explicit hypercall.
>>
>> Without (b), userspace and thus KVM have to treat guest accesses to the incorrect
>> type as implicit conversions.
>>
>> [*] Sadly, fully preventing kernel access to guest private is not possible with
>>     TDX, especially if the direct map is left intact.  But maybe in the future
>>     TDX will signal a fault instead of poisoning memory and leaving a #MC mine.
>
> In this proposal, consider a guest driver instructing a device to DMA
> write a 1 GB memory buffer. A well-behaved guest driver will ensure
> that the entire 1 GB is marked shared. But what about a malicious or
> buggy guest? Let's assume a bad guest driver instructs the device to
> write guest private memory.
>
> So now, the virtual device, which might be implemented as some host
> side process, needs to (1) check and lock all 4k constituent RMP
> entries (so they're not converted to private while the DMA write is
> > taking place), (2) write the 1 GB buffer, and (3) unlock all 4k
> constituent RMP entries? If I'm understanding this correctly, then the
> synchronization will be prohibitively expensive.

Let's consider a very very similar scenario: consider a guest driver setting up a 1 GB DMA buffer.  The virtual device, implemented as host process, needs to (1) map (and thus lock *or* be prepared for faults) in 1GB / 4k pages of guest memory (so they're not *freed* while the DMA write is taking place), (2) write the buffer, and (3) unlock all the pages.  Or it can lock them at setup time and keep them locked for a long time if that's appropriate.

Sure, the locking is expensive, but it's nonnegotiable.  The RMP issue is just a special case of the more general issue that the host MUST NOT ACCESS GUEST MEMORY AFTER IT'S FREED.
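
In userspace terms, the device process's obligation is the usual
map/write/unmap discipline, e.g. as below (guest_mem_fd and the offset
layout are assumptions about how the VMM exposes guest RAM to helper
processes):

    #include <string.h>
    #include <sys/mman.h>

    /*
     * Sketch of a host-side device process writing a guest DMA buffer.
     * Holding the MAP_SHARED mapping keeps the backing pages live for
     * the duration of the write; if the VMM can truncate the backing
     * file, the process must instead be prepared to handle SIGBUS.
     */
    static int dma_write(int guest_mem_fd, off_t buf_off, size_t len,
                         const void *src)
    {
            void *dst = mmap(NULL, len, PROT_WRITE, MAP_SHARED,
                             guest_mem_fd, buf_off);   /* (1) map/lock */
            if (dst == MAP_FAILED)
                    return -1;

            memcpy(dst, src, len);                     /* (2) write */

            return munmap(dst, len);                   /* (3) unlock */
    }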

--Andy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 21:37               ` Dave Hansen
@ 2021-11-12 21:40                 ` Marc Orr
  0 siblings, 0 replies; 239+ messages in thread
From: Marc Orr @ 2021-11-12 21:40 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sean Christopherson, Borislav Petkov, Peter Gonda, Brijesh Singh,
	x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021 at 1:38 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 11/12/21 1:30 PM, Marc Orr wrote:
> > In this proposal, consider a guest driver instructing a device to DMA
> > write a 1 GB memory buffer. A well-behaved guest driver will ensure
> > that the entire 1 GB is marked shared. But what about a malicious or
> > buggy guest? Let's assume a bad guest driver instructs the device to
> > write guest private memory.
> >
> > So now, the virtual device, which might be implemented as some host
> > side process, needs to (1) check and lock all 4k constituent RMP
> > entries (so they're not converted to private while the DMA write is
> > taking place), (2) write the 1 GB buffer, and (3) unlock all 4k
> > constituent RMP entries? If I'm understanding this correctly, then the
> > synchronization will be prohibitively expensive.
>
> Are you talking about a 1GB *mapping* here?  As in, something is using a
> 1GB page table entry to map the 1GB memory buffer?  That was the only
> case where I knew we needed coordination between neighbor RMP entries
> and host memory accesses.
>
> That 1GB problem _should_ be impossible.  I thought we settled on
> disabling hugetlbfs and fracturing the whole of the direct map down to 4k.

No. I was trying to give an example where a host-side process is
virtualizing a DMA write over a large buffer that consists of a lot of
4k or 2MB RMP entries. I picked 1 GB as an arbitrary example.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 21:39               ` Andy Lutomirski
@ 2021-11-12 21:43                 ` Marc Orr
  2021-11-12 22:54                   ` Peter Gonda
  0 siblings, 1 reply; 239+ messages in thread
From: Marc Orr @ 2021-11-12 21:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Sean Christopherson, Borislav Petkov, Dave Hansen, Peter Gonda,
	Brijesh Singh, the arch/x86 maintainers,
	Linux Kernel Mailing List, kvm list, linux-coco, linux-mm,
	Linux Crypto Mailing List, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Dave Hansen, Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy

On Fri, Nov 12, 2021 at 1:39 PM Andy Lutomirski <luto@kernel.org> wrote:
>
>
>
> On Fri, Nov 12, 2021, at 1:30 PM, Marc Orr wrote:
> > On Fri, Nov 12, 2021 at 12:38 PM Sean Christopherson <seanjc@google.com> wrote:
> >>
> >> On Fri, Nov 12, 2021, Borislav Petkov wrote:
> >> > On Fri, Nov 12, 2021 at 07:48:17PM +0000, Sean Christopherson wrote:
> >> > > Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
> >> >
> >> > What do you suggest instead?
> >>
> >> Let userspace decide what is mapped shared and what is mapped private.  The kernel
> >> and KVM provide the APIs/infrastructure to do the actual conversions in a thread-safe
> >> fashion and also to enforce the current state, but userspace is the control plane.
> >>
> >> It would require non-trivial changes in userspace if there are multiple processes
> >> accessing guest memory, e.g. Peter's networking daemon example, but it _is_ fully
> >> solvable.  The exit to userspace means all three components (guest, kernel,
> >> and userspace) have full knowledge of what is shared and what is private.  There
> >> is zero ambiguity:
> >>
> >>   - if userspace accesses guest private memory, it gets SIGSEGV or whatever.
> >>   - if kernel accesses guest private memory, it does BUG/panic/oops[*]
> >>   - if guest accesses memory with the incorrect C/SHARED-bit, it gets killed.
> >>
> >> This is the direction KVM TDX support is headed, though it's obviously still a WIP.
> >>
> >> And ideally, to avoid implicit conversions at any level, hardware vendors' ABIs
> >> define that:
> >>
> >>   a) All convertible memory, i.e. RAM, starts as private.
> >>   b) Conversions between private and shared must be done via explicit hypercall.
> >>
> >> Without (b), userspace and thus KVM have to treat guest accesses to the incorrect
> >> type as implicit conversions.
> >>
> >> [*] Sadly, fully preventing kernel access to guest private is not possible with
> >>     TDX, especially if the direct map is left intact.  But maybe in the future
> >>     TDX will signal a fault instead of poisoning memory and leaving a #MC mine.
> >
> > In this proposal, consider a guest driver instructing a device to DMA
> > write a 1 GB memory buffer. A well-behaved guest driver will ensure
> > that the entire 1 GB is marked shared. But what about a malicious or
> > buggy guest? Let's assume a bad guest driver instructs the device to
> > write guest private memory.
> >
> > So now, the virtual device, which might be implemented as some host
> > side process, needs to (1) check and lock all 4k constituent RMP
> > entries (so they're not converted to private while the DMA write is
> > > taking place), (2) write the 1 GB buffer, and (3) unlock all 4k
> > constituent RMP entries? If I'm understanding this correctly, then the
> > synchronization will be prohibitively expensive.
>
> Let's consider a very very similar scenario: consider a guest driver setting up a 1 GB DMA buffer.  The virtual device, implemented as host process, needs to (1) map (and thus lock *or* be prepared for faults) in 1GB / 4k pages of guest memory (so they're not *freed* while the DMA write is taking place), (2) write the buffer, and (3) unlock all the pages.  Or it can lock them at setup time and keep them locked for a long time if that's appropriate.
>
> Sure, the locking is expensive, but it's nonnegotiable.  The RMP issue is just a special case of the more general issue that the host MUST NOT ACCESS GUEST MEMORY AFTER IT'S FREED.

Good point.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 21:20                 ` Andy Lutomirski
@ 2021-11-12 22:04                   ` Borislav Petkov
  2021-11-12 22:52                   ` Peter Gonda
  1 sibling, 0 replies; 239+ messages in thread
From: Borislav Petkov @ 2021-11-12 22:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Gonda, Sean Christopherson, Dave Hansen, Brijesh Singh,
	x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Dave Hansen, Sergio Lopez,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021 at 01:20:53PM -0800, Andy Lutomirski wrote:
> Can you clarify what type of host bug you're talking about here?  We're
> talking about the host kernel reading or writing guest private memory
> through the direct map or kmap, right?  It seems to me that the way this
> happens is:
> 
> struct page *guest_page = (get and refcount a guest page);
> 
> ...
> 
> guest switches the page to private;
> 
> ...
> 
> read or write the page via direct map or kmap.

Maybe I don't understand your example properly but on this access here,
the page is not gonna be validated anymore in the RMP table, i.e., on
the next access, the guest will get a #VC.

Or are you talking about a case where the host gets a reference to a
guest page and the guest - in the meantime - PVALIDATEs that page and
thus converts it into a private page?

So that case is simple: if the guest really deliberately gave that page
to the host to do stuff to it and it converted it to private in the
meantime, guest dies.

That's why it would make sense to have explicit consensus between guest
and host which pages get shared. Everything else is not going to be
allowed and the entity at fault pays the consequences.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 21:20                 ` Andy Lutomirski
  2021-11-12 22:04                   ` Borislav Petkov
@ 2021-11-12 22:52                   ` Peter Gonda
  1 sibling, 0 replies; 239+ messages in thread
From: Peter Gonda @ 2021-11-12 22:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Borislav Petkov, Sean Christopherson, Dave Hansen, Brijesh Singh,
	x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Dave Hansen, Sergio Lopez,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021 at 2:20 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On 11/12/21 13:12, Peter Gonda wrote:
> > On Fri, Nov 12, 2021 at 1:55 PM Borislav Petkov <bp@alien8.de> wrote:
> >>
> >> On Fri, Nov 12, 2021 at 08:37:59PM +0000, Sean Christopherson wrote:
> >>> Let userspace decide what is mapped shared and what is mapped private.
> >>
> >> With "userspace", you mean the *host* userspace?
> >>
> >>> The kernel and KVM provide the APIs/infrastructure to do the actual
> >>> conversions in a thread-safe fashion and also to enforce the current
> >>> state, but userspace is the control plane.
> >>>
> >>> It would require non-trivial changes in userspace if there are multiple processes
> >>> accessing guest memory, e.g. Peter's networking daemon example, but it _is_ fully
> >>> solvable.  The exit to userspace means all three components (guest, kernel,
> >>> and userspace) have full knowledge of what is shared and what is private.  There
> >>> is zero ambiguity:
> >>>
> >>>    - if userspace accesses guest private memory, it gets SIGSEGV or whatever.
> >>
> >> That SIGSEGV is generated by the host kernel, I presume, after it checks
> >> whether the memory belongs to the guest?
> >>
> >>>    - if kernel accesses guest private memory, it does BUG/panic/oops[*]
> >>
> >> If *it* is the host kernel, then you probably shouldn't do that -
> >> otherwise you just killed the host kernel on which all those guests are
> >> running.
> >
> > I agree, it seems better to terminate the single guest with an issue.
> > Rather than killing the host (and therefore all guests). So I'd
> > suggest even in this case we do the 'convert to shared' approach or
> > just outright terminate the guest.
> >
> > Are there already examples in KVM where a KVM bug in servicing a VM's
> > request results in a BUG/panic/oops? That seems not ideal ever.
> >
> >>
> >>>    - if guest accesses memory with the incorrect C/SHARED-bit, it gets killed.
> >>
> >> Yah, that's the easy one.
> >>
> >>> This is the direction KVM TDX support is headed, though it's obviously still a WIP.
> >>>
> >>> And ideally, to avoid implicit conversions at any level, hardware vendors' ABIs
> >>> define that:
> >>>
> >>>    a) All convertible memory, i.e. RAM, starts as private.
> >>>    b) Conversions between private and shared must be done via explicit hypercall.
> >>
> >> I like the explicit nature of this but devil's in the detail and I'm no
> >> virt guy...
> >
> > This seems like a reasonable approach that can help with the issue of
> > terminating the entity behaving poorly. Could this feature be an
> > improvement that comes later? This improvement could be gated behind a
> > per VM KVM CAP, a kvm module param, or insert other solution here, to
> > not blind side userspace with this change?
> >
> >>
> >>> Without (b), userspace and thus KVM have to treat guest accesses to the incorrect
> >>> type as implicit conversions.
> >>>
> >>> [*] Sadly, fully preventing kernel access to guest private is not possible with
> >>>      TDX, especially if the direct map is left intact.  But maybe in the future
> >>>      TDX will signal a fault instead of poisoning memory and leaving a #MC mine.
> >>
> >> Yah, the #MC thing sounds like someone didn't think things through. ;-\
> >
> > Yes #MC in TDX seems much harder to deal with than the #PF for SNP.
> > I'd propose TDX keeps the host kernel safe bug not have that solution
> > block SNP. As suggested above I like the idea but it seems like it can
> > come as a future improvement to SNP support.
> >
>
> Can you clarify what type of host bug you're talking about here?  We're
> talking about the host kernel reading or writing guest private memory
> through the direct map or kmap, right?  It seems to me that the way this
> happens is:

An example bug I am discussing looks like this:
* Guest wants to use virt device on some shared memory.
* Guest forgets to ask host to RMPUPDATE page to shared state (hypervisor owned)
* Guest points virt device at pages
* Virt device writes to the pages; since they are still private, the
userspace write causes an RMP fault #PF.

This seems like a trivial attack malicious guests could do.
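
For reference, the step the buggy guest skipped corresponds to
something like this on the Linux guest side; set_memory_decrypted() is
the real guest API that clears the C-bit (and on SNP would perform the
page state change), while the allocation and error handling here are
only a sketch:

    static void *alloc_shared_dma_buf(size_t len)
    {
            unsigned long npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
            void *buf = alloc_pages_exact(len, GFP_KERNEL);

            if (!buf)
                    return NULL;

            /* convert to shared *before* pointing the device at it */
            if (set_memory_decrypted((unsigned long)buf, npages)) {
                    free_pages_exact(buf, len);  /* illustrative only */
                    return NULL;
            }
            return buf;
    }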

>
> struct page *guest_page = (get and refcount a guest page);
>
> ...
>
> guest switches the page to private;
>
> ...
>
> read or write the page via direct map or kmap.
>
>
> This naively *looks* like something a malicious or buggy guest could
> induce the host kernel to do.  But I think that, if this actually
> occurs, it's an example of a much more severe host bug.  In particular,
> there is nothing special about the guest switching a page to private --
> the guest or QEMU could just as easily have freed (hotunplugged) a
> shared guest page or ballooned it or swapped it or otherwise deallocated
> it.  And, if the host touches a page/pfn that the guest has freed, this
> is UAF, is a huge security hole if the guest has any degree of control,
> and needs to kill the host kernel if detected.
>
> Now we can kind of sweep it under the rug by saying that changing a page
> from shared to private isn't really freeing the page -- it's just
> modifying the usage of the page.  But to me that sounds like saying
> "reusing a former user memory page as a pagetable isn't *really* freeing
> it if the kernel kept a pointer around the whole time and the page
> allocator wasn't involved".

This is different from the QEMU hotunplug because in that case QEMU is
aware of the memory change. Currently QEMU (userspace) is completely
unaware of which pages are shared and which are private. So how does
userspace know the memory has been "freed"?

>
> So let's please just consider these mode transitions to be equivalent to
> actually freeing the page at least from a locking perspective, and then
> the way to prevent these bugs and the correct action to take if they
> occur is clear.
>
> (On TDX, changing a page shared/private state is very much like freeing
> and reallocating -- they're in different page tables and, in principle
> anyway, there is no actual relationship between a shared page and a
> supposedly matching private page except that some bits of the GPA happen
to match.  The hardware and the TD module fully support both being mapped at
> once at the same "address", at least according to the specs I've read.)
>
> --Andy

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 21:43                 ` Marc Orr
@ 2021-11-12 22:54                   ` Peter Gonda
  2021-11-13  0:53                     ` Sean Christopherson
  0 siblings, 1 reply; 239+ messages in thread
From: Peter Gonda @ 2021-11-12 22:54 UTC (permalink / raw)
  To: Marc Orr
  Cc: Andy Lutomirski, Sean Christopherson, Borislav Petkov,
	Dave Hansen, Brijesh Singh, the arch/x86 maintainers,
	Linux Kernel Mailing List, kvm list, linux-coco, linux-mm,
	Linux Crypto Mailing List, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Dave Hansen, Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy

On Fri, Nov 12, 2021 at 2:43 PM Marc Orr <marcorr@google.com> wrote:
>
> On Fri, Nov 12, 2021 at 1:39 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> >
> >
> > On Fri, Nov 12, 2021, at 1:30 PM, Marc Orr wrote:
> > > On Fri, Nov 12, 2021 at 12:38 PM Sean Christopherson <seanjc@google.com> wrote:
> > >>
> > >> On Fri, Nov 12, 2021, Borislav Petkov wrote:
> > >> > On Fri, Nov 12, 2021 at 07:48:17PM +0000, Sean Christopherson wrote:
> > >> > > Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
> > >> >
> > >> > What do you suggest instead?
> > >>
> > >> Let userspace decide what is mapped shared and what is mapped private.  The kernel
> > >> and KVM provide the APIs/infrastructure to do the actual conversions in a thread-safe
> > >> fashion and also to enforce the current state, but userspace is the control plane.
> > >>
> > >> It would require non-trivial changes in userspace if there are multiple processes
> > >> accessing guest memory, e.g. Peter's networking daemon example, but it _is_ fully
> > >> solvable.  The exit to userspace means all three components (guest, kernel,
> > >> and userspace) have full knowledge of what is shared and what is private.  There
> > >> is zero ambiguity:
> > >>
> > >>   - if userspace accesses guest private memory, it gets SIGSEGV or whatever.
> > >>   - if kernel accesses guest private memory, it does BUG/panic/oops[*]
> > >>   - if guest accesses memory with the incorrect C/SHARED-bit, it gets killed.
> > >>
> > >> This is the direction KVM TDX support is headed, though it's obviously still a WIP.
> > >>
> > >> And ideally, to avoid implicit conversions at any level, hardware vendors' ABIs
> > >> define that:
> > >>
> > >>   a) All convertible memory, i.e. RAM, starts as private.
> > >>   b) Conversions between private and shared must be done via explicit hypercall.
> > >>
> > >> Without (b), userspace and thus KVM have to treat guest accesses to the incorrect
> > >> type as implicit conversions.
> > >>
> > >> [*] Sadly, fully preventing kernel access to guest private is not possible with
> > >>     TDX, especially if the direct map is left intact.  But maybe in the future
> > >>     TDX will signal a fault instead of poisoning memory and leaving a #MC mine.
> > >
> > > In this proposal, consider a guest driver instructing a device to DMA
> > > write a 1 GB memory buffer. A well-behaved guest driver will ensure
> > > that the entire 1 GB is marked shared. But what about a malicious or
> > > buggy guest? Let's assume a bad guest driver instructs the device to
> > > write guest private memory.
> > >
> > > So now, the virtual device, which might be implemented as some host
> > > side process, needs to (1) check and lock all 4k constituent RMP
> > > entries (so they're not converted to private while the DMA write is
> > > taking place), (2) write the 1 GB buffer, and (3) unlock all 4k
> > > constituent RMP entries? If I'm understanding this correctly, then the
> > > synchronization will be prohibitively expensive.
> >
> > Let's consider a very very similar scenario: consider a guest driver setting up a 1 GB DMA buffer.  The virtual device, implemented as host process, needs to (1) map (and thus lock *or* be prepared for faults) in 1GB / 4k pages of guest memory (so they're not *freed* while the DMA write is taking place), (2) write the buffer, and (3) unlock all the pages.  Or it can lock them at setup time and keep them locked for a long time if that's appropriate.
> >
> > Sure, the locking is expensive, but it's nonnegotiable.  The RMP issue is just a special case of the more general issue that the host MUST NOT ACCESS GUEST MEMORY AFTER IT'S FREED.
>
> Good point.

Thanks for the responses Andy.

Having a way for userspace to lock pages as shared was an idea I just
proposed the simplest solution to start the conversation. So what we
could do here is change the map operation to error out if the selected
region has private pages; if the region is mapped, we can then lock the
pages shared. Now well-behaved processes mapping guest memory can be
safe from RMP violations. That seems like a reasonable solution for
allowing userspace to know whether guest memory is accessible or not. Or do
you have other ideas to meet your other comment:

> SEV-SNP, TDX, and any reasonable software solution all require that the
> host know which pages are private and which pages are shared.  Sure, the
> old SEV-ES Linux host implementation was very simple, but it's nasty and
> fundamentally can't support migration.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 21:12               ` Peter Gonda
  2021-11-12 21:20                 ` Andy Lutomirski
@ 2021-11-13  0:00                 ` Sean Christopherson
  2021-11-13  0:10                   ` Marc Orr
  1 sibling, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-11-13  0:00 UTC (permalink / raw)
  To: Peter Gonda
  Cc: Borislav Petkov, Dave Hansen, Brijesh Singh, x86, linux-kernel,
	kvm, linux-coco, linux-mm, linux-crypto, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, Tom Lendacky, H. Peter Anvin,
	Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021, Peter Gonda wrote:
> On Fri, Nov 12, 2021 at 1:55 PM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Fri, Nov 12, 2021 at 08:37:59PM +0000, Sean Christopherson wrote:
> > > Let userspace decide what is mapped shared and what is mapped private.
> >
> > With "userspace", you mean the *host* userspace?

Yep.

> > > The kernel and KVM provide the APIs/infrastructure to do the actual
> > > conversions in a thread-safe fashion and also to enforce the current
> > > state, but userspace is the control plane.
> > >
> > > It would require non-trivial changes in userspace if there are multiple processes
> > > accessing guest memory, e.g. Peter's networking daemon example, but it _is_ fully
> > > solvable.  The exit to userspace means all three components (guest, kernel,
> > > and userspace) have full knowledge of what is shared and what is private.  There
> > > is zero ambiguity:
> > >
> > >   - if userspace accesses guest private memory, it gets SIGSEGV or whatever.
> >
> > That SIGSEGV is generated by the host kernel, I presume, after it checks
> > whether the memory belongs to the guest?

Yep.

> > >   - if kernel accesses guest private memory, it does BUG/panic/oops[*]
> >
> > If *it* is the host kernel, then you probably shouldn't do that -
> > otherwise you just killed the host kernel on which all those guests are
> > running.
> 
> I agree, it seems better to terminate the single guest with an issue.
> Rather than killing the host (and therefore all guests). So I'd
> suggest even in this case we do the 'convert to shared' approach or
> just outright terminate the guest.
> 
> Are there already examples in KVM where a KVM bug in servicing a VM's
> request results in a BUG/panic/oops? That seems not ideal ever.

Plenty of examples.  kvm_spurious_fault() is the obvious one.  Any NULL pointer
deref will lead to a BUG, etc...  And it's not just KVM, e.g. it's possible, if
unlikely, for the core kernel to run into guest private memory (e.g. if the kernel
botches an RMP change), and if that happens there's no guarantee that the kernel
can recover.

I fully agree that ideally KVM would have a better sense of self-preservation,
but IMO that's an orthogonal discussion.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-13  0:00                 ` Sean Christopherson
@ 2021-11-13  0:10                   ` Marc Orr
  2021-11-13 18:34                     ` Sean Christopherson
  0 siblings, 1 reply; 239+ messages in thread
From: Marc Orr @ 2021-11-13  0:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Gonda, Borislav Petkov, Dave Hansen, Brijesh Singh, x86,
	linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

> > > If *it* is the host kernel, then you probably shouldn't do that -
> > > otherwise you just killed the host kernel on which all those guests are
> > > running.
> >
> > I agree, it seems better to terminate the single guest with an issue.
> > Rather than killing the host (and therefore all guests). So I'd
> > suggest even in this case we do the 'convert to shared' approach or
> > just outright terminate the guest.
> >
> > Are there already examples in KVM where a KVM bug in servicing a VM's
> > request results in a BUG/panic/oops? That seems not ideal ever.
>
> Plenty of examples.  kvm_spurious_fault() is the obvious one.  Any NULL pointer
> deref will lead to a BUG, etc...  And it's not just KVM, e.g. it's possible, if
> unlikely, for the core kernel to run into guest private memory (e.g. if the kernel
> botches an RMP change), and if that happens there's no guarantee that the kernel
> can recover.
>
> I fully agree that ideally KVM would have a better sense of self-preservation,
> but IMO that's an orthogonal discussion.

I don't think we should treat the possibility of crashing the host
with live VMs nonchalantly. It's a big deal. Doing so has big
implications on the probability that any cloud vendor will be able to
deploy this code to production. And aren't cloud vendors one of the
main use cases for all of this confidential compute stuff? I'm
honestly surprised that so many people are OK with crashing the host.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 22:54                   ` Peter Gonda
@ 2021-11-13  0:53                     ` Sean Christopherson
  2021-11-13  1:04                       ` Marc Orr
  0 siblings, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-11-13  0:53 UTC (permalink / raw)
  To: Peter Gonda
  Cc: Marc Orr, Andy Lutomirski, Borislav Petkov, Dave Hansen,
	Brijesh Singh, the arch/x86 maintainers,
	Linux Kernel Mailing List, kvm list, linux-coco, linux-mm,
	Linux Crypto Mailing List, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Dave Hansen, Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy

On Fri, Nov 12, 2021, Peter Gonda wrote:
> On Fri, Nov 12, 2021 at 2:43 PM Marc Orr <marcorr@google.com> wrote:
> >
> > On Fri, Nov 12, 2021 at 1:39 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > Let's consider a very very similar scenario: consider a guest driver
> > > setting up a 1 GB DMA buffer.  The virtual device, implemented as host
> > > process, needs to (1) map (and thus lock *or* be prepared for faults) in
> > > 1GB / 4k pages of guest memory (so they're not *freed* while the DMA
> > > write is taking place), (2) write the buffer, and (3) unlock all the
> > > pages.  Or it can lock them at setup time and keep them locked for a long
> > > time if that's appropriate.
> > >
> > > Sure, the locking is expensive, but it's nonnegotiable.  The RMP issue is
> > > just a special case of the more general issue that the host MUST NOT
> > > ACCESS GUEST MEMORY AFTER IT'S FREED.
> >
> > Good point.
> 
> Thanks for the responses Andy.
> 
> Having a way for userspace to lock pages as shared was an idea I just
> proposed the simplest solution to start the conversation.

Assuming you meant that to read:

  Having a way for userspace to lock pages as shared is an alternative idea; I
  just proposed the simplest solution to start the conversation.

The unmapping[*] guest private memory proposal is essentially that, a way for userspace
to "lock" the state of a page by requiring all conversions to be initiated by userspace
and by providing APIs to associate a pfn 1:1 with a KVM instance, i.e. lock a pfn to
a guest.

Andy's DMA example brings up a very good point though.  If the shared and private
variants of a given GPA are _not_ required to point at a single PFN, which is the
case in the current unmapping proposal, userspace doesn't need to do any additional
juggling to track guest conversions across multiple processes.

Any process that's accessing guest (shared!) memory simply does its locking as normal,
which as Andy pointed out, is needed for correctness today.  If the guest requests a
conversion from shared=>private without first ensuring the gfn is unused (by a host
"device"), the host will side will continue accessing the old, shared memory, which it
locked, while the guest will be doing who knows what.  And if the guest provides a GPA
that isn't mapped shared in the VMM's address space, it's conceptually no different
than if the guest provided a completely bogus GPA, which again needs to be handled today.

In other words, if done properly, differentiating private from shared shouldn't be a
heavy lift for host userspace.

[*] Actually unmapping memory may not be strictly necessary for SNP because a
    #PF(RMP) is likely just as good as a #PF(!PRESENT) when both are treated as
    fatal, but the rest of the proposal that allows KVM to understand the state
    of a page and exit to userspace accordingly applies.
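
The control-plane flow described above can be sketched as follows;
KVM_EXIT_PAGE_STATE and KVM_SET_PAGE_STATE are made-up names for the
proposed, not-yet-existing uAPI, and vcpu_fd/vm_fd/run are assumed to
be set up as usual for KVM_RUN:

    /*
     * Hypothetical VMM loop: every conversion is an explicit guest
     * hypercall that KVM forwards to userspace, which either performs
     * the conversion via ioctl or kills the guest.
     */
    for (;;) {
            ioctl(vcpu_fd, KVM_RUN, 0);

            if (run->exit_reason == KVM_EXIT_PAGE_STATE) {
                    struct kvm_page_state_change psc = {
                            .gfn     = run->page_state.gfn,
                            .npages  = run->page_state.npages,
                            .private = run->page_state.private,
                    };

                    /* userspace decides; kernel/KVM enforce + convert */
                    if (vmm_approve_conversion(&psc))
                            ioctl(vm_fd, KVM_SET_PAGE_STATE, &psc);
                    else
                            vmm_kill_guest();
            }
    }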


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-13  0:53                     ` Sean Christopherson
@ 2021-11-13  1:04                       ` Marc Orr
  2021-11-13 18:28                         ` Sean Christopherson
  0 siblings, 1 reply; 239+ messages in thread
From: Marc Orr @ 2021-11-13  1:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Gonda, Andy Lutomirski, Borislav Petkov, Dave Hansen,
	Brijesh Singh, the arch/x86 maintainers,
	Linux Kernel Mailing List, kvm list, linux-coco, linux-mm,
	Linux Crypto Mailing List, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Dave Hansen, Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy

On Fri, Nov 12, 2021 at 4:53 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Nov 12, 2021, Peter Gonda wrote:
> > On Fri, Nov 12, 2021 at 2:43 PM Marc Orr <marcorr@google.com> wrote:
> > >
> > > On Fri, Nov 12, 2021 at 1:39 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > > Let's consider a very very similar scenario: consider a guest driver
> > > > setting up a 1 GB DMA buffer.  The virtual device, implemented as host
> > > > process, needs to (1) map (and thus lock *or* be prepared for faults) in
> > > > 1GB / 4k pages of guest memory (so they're not *freed* while the DMA
> > > > write is taking place), (2) write the buffer, and (3) unlock all the
> > > > pages.  Or it can lock them at setup time and keep them locked for a long
> > > > time if that's appropriate.
> > > >
> > > > Sure, the locking is expensive, but it's nonnegotiable.  The RMP issue is
> > > > just a special case of the more general issue that the host MUST NOT
> > > > ACCESS GUEST MEMORY AFTER IT'S FREED.
> > >
> > > Good point.
> >
> > Thanks for the responses Andy.
> >
> > Having a way for userspace to lock pages as shared was an idea I just
> > proposed the simplest solution to start the conversation.
>
> Assuming you meant that to read:
>
>   Having a way for userspace to lock pages as shared is an alternative idea; I
>   just proposed the simplest solution to start the conversation.
>
> The unmapping[*] guest private memory proposal is essentially that, a way for userspace
> to "lock" the state of a page by requiring all conversions to be initiated by userspace
> and by providing APIs to associate a pfn 1:1 with a KVM instance, i.e. lock a pfn to
> a guest.
>
> Andy's DMA example brings up a very good point though.  If the shared and private
> variants of a given GPA are _not_ required to point at a single PFN, which is the
> case in the current unmapping proposal, userspace doesn't need to do any additional
> juggling to track guest conversions across multiple processes.
>
> Any process that's accessing guest (shared!) memory simply does its locking as normal,
> which as Andy pointed out, is needed for correctness today.  If the guest requests a
> conversion from shared=>private without first ensuring the gfn is unused (by a host
> "device"), the host will side will continue accessing the old, shared memory, which it
> locked, while the guest will be doing who knows what.  And if the guest provides a GPA
> that isn't mapped shared in the VMM's address space, it's conceptually no different
> than if the guest provided a completely bogus GPA, which again needs to be handled today.
>
> In other words, if done properly, differentiating private from shared shouldn't be a
> heavy lift for host userspace.
>
> [*] Actually unmapping memory may not be strictly necessary for SNP because a
>     #PF(RMP) is likely just as good as a #PF(!PRESENT) when both are treated as
>     fatal, but the rest of the proposal that allows KVM to understand the state
>     of a page and exit to userspace accordingly applies.

Thanks for this explanation. When you write "while the guest will be
doing who knows what":

Isn't that a large weakness of this proposal? To me, it seems better
for debuggability to corrupt the private memory (i.e., convert the
page to shared) so the guest can detect the issue via a PVALIDATE
failure.

The main issue I see with corrupting the guest memory is that we may
not know whether the host is at fault or the guest. Though, we can
probably in many cases be sure it's the host, if the pid associated
with the page fault is NOT a process associated with virtualization.
But if it is a process associated with virtualization, we legitimately
might not know. (I think if the pid is the kernel itself, it's
probably a host-side bug, but I'm still not confident on this; for
example, the guest might be able to coerce KVM's built-in emulator to
write guest private memory.)

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-13  1:04                       ` Marc Orr
@ 2021-11-13 18:28                         ` Sean Christopherson
  2021-11-14  7:41                           ` Marc Orr
  2021-11-15 16:52                           ` Joerg Roedel
  0 siblings, 2 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-11-13 18:28 UTC (permalink / raw)
  To: Marc Orr
  Cc: Peter Gonda, Andy Lutomirski, Borislav Petkov, Dave Hansen,
	Brijesh Singh, the arch/x86 maintainers,
	Linux Kernel Mailing List, kvm list, linux-coco, linux-mm,
	Linux Crypto Mailing List, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Dave Hansen, Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy

On Fri, Nov 12, 2021, Marc Orr wrote:
> On Fri, Nov 12, 2021 at 4:53 PM Sean Christopherson <seanjc@google.com> wrote:
> > On Fri, Nov 12, 2021, Peter Gonda wrote:
> > > Having a way for userspace to lock pages as shared was an idea I just
> > > proposed the simplest solution to start the conversation.
> >
> > Assuming you meant that to read:
> >
> >   Having a way for userspace to lock pages as shared is an alternative idea; I
> >   just proposed the simplest solution to start the conversation.
> >
> > The unmapping[*] guest private memory proposal is essentially that, a way for userspace
> > to "lock" the state of a page by requiring all conversions to be initiated by userspace
> > and by providing APIs to associate a pfn 1:1 with a KVM instance, i.e. lock a pfn to
> > a guest.
> >
> > Andy's DMA example brings up a very good point though.  If the shared and private
> > variants of a given GPA are _not_ required to point at a single PFN, which is the
> > case in the current unmapping proposal, userspace doesn't need to do any additional
> > juggling to track guest conversions across multiple processes.
> >
> > Any process that's accessing guest (shared!) memory simply does its locking as normal,
> > which as Andy pointed out, is needed for correctness today.  If the guest requests a
> > conversion from shared=>private without first ensuring the gfn is unused (by a host
> > "device"), the host will side will continue accessing the old, shared memory, which it
> > locked, while the guest will be doing who knows what.  And if the guest provides a GPA
> > that isn't mapped shared in the VMM's address space, it's conceptually no different
> > than if the guest provided a completely bogus GPA, which again needs to be handled today.
> >
> > In other words, if done properly, differentiating private from shared shouldn't be a
> > heavy lift for host userspace.
> >
> > [*] Actually unmapping memory may not be strictly necessary for SNP because a
> >     #PF(RMP) is likely just as good as a #PF(!PRESENT) when both are treated as
> >     fatal, but the rest of the proposal that allows KVM to understand the state
> >     of a page and exit to userspace accordingly applies.
> 
> Thanks for this explanation. When you write "while the guest will be
> doing who knows what":
> 
> Isn't that a large weakness of this proposal? To me, it seems better
> for debuggability to corrupt the private memory (i.e., convert the
> page to shared) so the guest can detect the issue via a PVALIDATE
> failure.

The behavior is no different than it is today for regular VMs.

> The main issue I see with corrupting the guest memory is that we may
> not know whether the host is at fault or the guest.

Yes, one issue is that bugs in the host will result in downstream errors in the
guest, as opposed to immediate, synchronous detection in the guest.  IMO that is
a significant flaw.

Another issue is that the host kernel, which despite being "untrusted", absolutely
should be acting in the best interests of the guest.  Allowing userspace to inject
#VC, e.g. to attempt to attack the guest by triggering a spurious PVALIDATE, means
the kernel is failing miserably on that front.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-13  0:10                   ` Marc Orr
@ 2021-11-13 18:34                     ` Sean Christopherson
  2021-11-14  7:54                       ` Marc Orr
  2021-11-15 16:36                       ` Joerg Roedel
  0 siblings, 2 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-11-13 18:34 UTC (permalink / raw)
  To: Marc Orr
  Cc: Peter Gonda, Borislav Petkov, Dave Hansen, Brijesh Singh, x86,
	linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021, Marc Orr wrote:
> > > > If *it* is the host kernel, then you probably shouldn't do that -
> > > > otherwise you just killed the host kernel on which all those guests are
> > > > running.
> > >
> > > I agree, it seems better to terminate the single guest with an issue.
> > > Rather than killing the host (and therefore all guests). So I'd
> > > suggest even in this case we do the 'convert to shared' approach or
> > > just outright terminate the guest.
> > >
> > > Are there already examples in KVM where a KVM bug in servicing a VM's
> > > request results in a BUG/panic/oops? That seems not ideal ever.
> >
> > Plenty of examples.  kvm_spurious_fault() is the obvious one.  Any NULL pointer
> > deref will lead to a BUG, etc...  And it's not just KVM, e.g. it's possible, if
> > unlikely, for the core kernel to run into guest private memory (e.g. if the kernel
> > botches an RMP change), and if that happens there's no guarantee that the kernel
> > can recover.
> >
> > I fully agree that ideally KVM would have a better sense of self-preservation,
> > but IMO that's an orthogonal discussion.
> 
> I don't think we should treat the possibility of crashing the host
> with live VMs nonchalantly. It's a big deal. Doing so has big
> implications on the probability that any cloud vendor will be able to
> deploy this code to production. And aren't cloud vendors one of the
> main use cases for all of this confidential compute stuff? I'm
> honestly surprised that so many people are OK with crashing the host.

I'm not treating it nonchalantly, merely acknowledging that (a) some flavors of kernel
bugs (or hardware issues!) are inherently fatal to the system, and (b) crashing the
host may be preferable to continuing on in certain cases, e.g. if continuing on has a
high probability of corrupting guest data.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-13 18:28                         ` Sean Christopherson
@ 2021-11-14  7:41                           ` Marc Orr
  2021-11-15 18:17                             ` Sean Christopherson
  2021-11-15 16:52                           ` Joerg Roedel
  1 sibling, 1 reply; 239+ messages in thread
From: Marc Orr @ 2021-11-14  7:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Gonda, Andy Lutomirski, Borislav Petkov, Dave Hansen,
	Brijesh Singh, the arch/x86 maintainers,
	Linux Kernel Mailing List, kvm list, linux-coco, linux-mm,
	Linux Crypto Mailing List, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Dave Hansen, Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy

On Sat, Nov 13, 2021 at 10:28 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Nov 12, 2021, Marc Orr wrote:
> > On Fri, Nov 12, 2021 at 4:53 PM Sean Christopherson <seanjc@google.com> wrote:
> > > On Fri, Nov 12, 2021, Peter Gonda wrote:
> > > > Having a way for userspace to lock pages as shared was an idea I just
> > > > proposed the simplest solution to start the conversation.
> > >
> > > Assuming you meant that to read:
> > >
> > >   Having a way for userspace to lock pages as shared is an alternative idea; I
> > >   just proposed the simplest solution to start the conversation.
> > >
> > > The unmapping[*] guest private memory proposal is essentially that, a way for userspace
> > > to "lock" the state of a page by requiring all conversions to be initiated by userspace
> > > and by providing APIs to associate a pfn 1:1 with a KVM instance, i.e. lock a pfn to
> > > a guest.
> > >
> > > Andy's DMA example brings up a very good point though.  If the shared and private
> > > variants of a given GPA are _not_ required to point at a single PFN, which is the
> > > case in the current unmapping proposal, userspace doesn't need to do any additional
> > > juggling to track guest conversions across multiple processes.
> > >
> > > Any process that's accessing guest (shared!) memory simply does its locking as normal,
> > > which as Andy pointed out, is needed for correctness today.  If the guest requests a
> > > conversion from shared=>private without first ensuring the gfn is unused (by a host
> > > "device"), the host will side will continue accessing the old, shared memory, which it
> > > locked, while the guest will be doing who knows what.  And if the guest provides a GPA
> > > that isn't mapped shared in the VMM's address space, it's conceptually no different
> > > than if the guest provided a completely bogus GPA, which again needs to be handled today.
> > >
> > > In other words, if done properly, differentiating private from shared shouldn't be a
> > > heavy lift for host userspace.
> > >
> > > [*] Actually unmapping memory may not be strictly necessary for SNP because a
> > >     #PF(RMP) is likely just as good as a #PF(!PRESENT) when both are treated as
> > >     fatal, but the rest of the proposal that allows KVM to understand the state
> > >     of a page and exit to userspace accordingly applies.
> >
> > Thanks for this explanation. When you write "while the guest will be
> > doing who knows what":
> >
> > Isn't that a large weakness of this proposal? To me, it seems better
> > for debuggability to corrupt the private memory (i.e., convert the
> > page to shared) so the guest can detect the issue via a PVALIDATE
> > failure.
>
> The behavior is no different than it is today for regular VMs.

Isn't this counter to the sketch you laid out earlier where you wrote:

--- QUOTE START ---
  - if userspace accesses guest private memory, it gets SIGSEGV or whatever.
  - if kernel accesses guest private memory, it does BUG/panic/oops[*]
  - if guest accesses memory with the incorrect C/SHARED-bit, it gets killed.
--- QUOTE END ---

Here, the guest does not get killed. Which seems hard to debug.

> > The main issue I see with corrupting the guest memory is that we may
> > not know whether the host is at fault or the guest.
>
> Yes, one issue is that bugs in the host will result in downstream errors in the
> guest, as opposed to immediate, synchronous detection in the guest.  IMO that is
> a significant flaw.

Nobody wants bugs in the host. Once we've hit one -- and we will --
we're in a bad situation. The question is how will we handle these
bugs. I'm arguing that we should design the system to be as robust as
possible.

I agree that immediate, synchronous detection in the guest is ideal.
But at what cost? Is it worth killing _ALL_ VMs on the system? That's
what we're doing when we crash host-wide processes, or even worse, the
kernel itself.

The reality is that guests must be able to detect writes to their
private memory long after they have occurred. Both SNP and TDX are
designed this way! For SNP the guest gets a PVALIDATE failure when it
reads the corrupted memory. For TDX, my understanding is that the
hardware fails to verify an HMAC over a page when it's read by the
guest.

The argument I'm making is that the success of confidential VMs -- in
both SNP and TDX -- depends on the guest being able to detect that its
private memory has been corrupted in a delayed and asynchronous
fashion. We should be leveraging this fact to make the entire system
more robust and reliable.
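
From the guest's point of view, that delayed detection looks roughly
like this (pvalidate() mirrors the helper in the SNP guest patches;
treating "no update needed" as the only healthy result is the
illustrative part of the sketch):

    static bool snp_page_still_private(unsigned long vaddr)
    {
            /*
             * Re-validating an already-validated page should be a no-op.
             * If the host flipped the page behind our back, the RMP entry
             * changes (or the access itself traps with a #VC), so any
             * other result means our private memory was corrupted.
             */
            int rc = pvalidate(vaddr, RMP_PG_SIZE_4K, 1);

            return rc == PVALIDATE_FAIL_NOUPDATE;
    }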

> Another issue is that the host kernel, which despite being "untrusted", absolutely
> should be acting in the best interests of the guest.  Allowing userspace to inject
> #VC, e.g. to attempt to attack the guest by triggering a spurious PVALIDATE, means
> the kernel is failing miserably on that front.

The host is acting in the best interests of the guest. That's why
we're having this debate :-). No-one here is trying to write host code
to sabotage the guest. Quite the opposite.

But what happens when we mess up? That's what this conversation is really about.

If allowing userspace to inject #VC into the guest means that the host
can continue to serve other guests, that seems like a win. The
alternative, to blow up the host, essentially expands the blast radius
from a single guest to all guests.

Also, these attacks go both ways. As we already discussed, the guest
may try to trick the host into writing its own private memory. Yes,
the entire idea to literally make that impossible is a good one -- in
theory. But it's also very complicated. And what happens if we get
that wrong? Now our entire host is at risk from a single guest.

Leveraging the system-wide design of SNP and TDX -- where detecting
writes to private memory asynchronously is table stakes -- increases
the reliability of the entire system. And, as Peter mentioned earlier
on, we can always incorporate any future work to make writing private
memory impossible into SNP, on top of the code to convert the shared
page to private. This way, we get reliability in depth, and minimize
the odds of crashing the host -- and its VMs.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-13 18:34                     ` Sean Christopherson
@ 2021-11-14  7:54                       ` Marc Orr
  2021-11-15 17:16                         ` Sean Christopherson
  2021-11-15 16:36                       ` Joerg Roedel
  1 sibling, 1 reply; 239+ messages in thread
From: Marc Orr @ 2021-11-14  7:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Gonda, Borislav Petkov, Dave Hansen, Brijesh Singh, x86,
	linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Sat, Nov 13, 2021 at 10:35 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Nov 12, 2021, Marc Orr wrote:
> > > > > If *it* is the host kernel, then you probably shouldn't do that -
> > > > > otherwise you just killed the host kernel on which all those guests are
> > > > > running.
> > > >
> > > > I agree, it seems better to terminate the single guest with an issue.
> > > > Rather than killing the host (and therefore all guests). So I'd
> > > > suggest even in this case we do the 'convert to shared' approach or
> > > > just outright terminate the guest.
> > > >
> > > > Are there already examples in KVM where a KVM bug in servicing a VM's
> > > > request results in a BUG/panic/oops? That seems not ideal ever.
> > >
> > > Plenty of examples.  kvm_spurious_fault() is the obvious one.  Any NULL pointer
> > > deref will lead to a BUG, etc...  And it's not just KVM, e.g. it's possible, if
> > > unlikely, for the core kernel to run into guest private memory (e.g. if the kernel
> > > botches an RMP change), and if that happens there's no guarantee that the kernel
> > > can recover.
> > >
> > > I fully agree that ideally KVM would have a better sense of self-preservation,
> > > but IMO that's an orthogonal discussion.
> >
> > I don't think we should treat the possibility of crashing the host
> > with live VMs nonchalantly. It's a big deal. Doing so has big
> > implications on the probability that any cloud vendor wil bee able to
> > deploy this code to production. And aren't cloud vendors one of the
> > main use cases for all of this confidential compute stuff? I'm
> > honestly surprised that so many people are OK with crashing the host.
>
> I'm not treating it nonchalantly, merely acknowledging that (a) some flavors of kernel
> bugs (or hardware issues!) are inherently fatal to the system, and (b) crashing the
> host may be preferable to continuing on in certain cases, e.g. if continuing on has a
> high probability of corrupting guest data.

I disagree. Crashing the host -- and _ALL_ of its VMs (including
non-confidential VMs) -- is not preferable to crashing a single SNP
VM. Especially when that SNP VM is guaranteed to detect the memory
corruption and react accordingly.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 19:48       ` Sean Christopherson
  2021-11-12 20:04         ` Borislav Petkov
  2021-11-12 21:16         ` Marc Orr
@ 2021-11-15 12:30         ` Dr. David Alan Gilbert
  2021-11-15 14:42           ` Joerg Roedel
  2021-11-15 18:26           ` Sean Christopherson
  2021-11-15 16:16         ` Joerg Roedel
  3 siblings, 2 replies; 239+ messages in thread
From: Dr. David Alan Gilbert @ 2021-11-15 12:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Borislav Petkov, Dave Hansen, Peter Gonda, Brijesh Singh, x86,
	linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

* Sean Christopherson (seanjc@google.com) wrote:
> On Fri, Nov 12, 2021, Borislav Petkov wrote:
> > On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
> > > Or, is there some mechanism that prevent guest-private memory from being
> > > accessed in random host kernel code?
> 
> Or random host userspace code...
> 
> > So I'm currently under the impression that random host->guest accesses
> > should not happen if not previously agreed upon by both.
> 
> Key word "should".
> 
> > Because, as explained on IRC, if host touches a private guest page,
> > whatever the host does to that page, the next time the guest runs, it'll
> > get a #VC where it will see that that page doesn't belong to it anymore
> > and then, out of paranoia, it will simply terminate to protect itself.
> > 
> > So cloud providers should have an interest to prevent such random stray
> > accesses if they wanna have guests. :)
> 
> Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.

Would it necessarily have been a host bug?  A guest telling the host a
bad GPA to DMA into would trigger this wouldn't it?

Still; I wonder if it's best to kill the guest - maybe it's best for
the host to kill the guest and leave behind diagnostics of what
happened; for someone debugging the crash, it's going to be less useful
to know that page X was wrongly accessed (which is what the guest would
see), and more useful to know that it was the kernel's vhost-... driver
that accessed it.

Dave

> On Fri, Nov 12, 2021, Peter Gonda wrote:
> > Here is an alternative to the current approach: On RMP violation (host
> > or userspace) the page fault handler converts the page from private to
> > shared to allow the write to continue. This pulls from s390’s error
> > handling which does exactly this. See ‘arch_make_page_accessible()’.
> 
> Ah, after further reading, s390 does _not_ do implicit private=>shared conversions.
> 
> s390's arch_make_page_accessible() is somewhat similar, but it is not a direct
> comparison.  IIUC, it exports and integrity protects the data and thus preserves
> the guest's data in an encrypted form, e.g. so that it can be swapped to disk.
> And if the host corrupts the data, attempting to convert it back to secure on a
> subsequent guest access will fail.
> 
> The host kernel's handling of the "convert to secure" failures doesn't appear to
> be all that robust, e.g. it looks like there are multiple paths where the error
> is dropped on the floor and the guest is resumed, but IMO soft hanging the guest
> is still better than inducing a fault in the guest, and far better than potentially
> coercing the guest into reading corrupted memory ("spurious" PVALIDATE).  And s390's
> behavior is fixable since it's purely a host error handling problem.
> 
> To truly make a page shared, s390 requires the guest to call into the ultravisor
> to make a page shared.  And on the host side, the host can pin a page as shared
> to prevent the guest from unsharing it while the host is accessing it as a shared
> page.
> 
> So, inducing #VC is similar in the sense that a malicious s390 can also DoS itself,
> but is quite different in that (AFAICT) s390 does not create an attack surface where
> a malicious or buggy host userspace can induce faults in the guest, or worst case in
> SNP, exploit a buggy guest into accepting and accessing corrupted data.
> 
> It's also different in that s390 doesn't implicitly convert between shared and
> private.  Functionally, it doesn't really change the end result because a buggy
> host that writes guest private memory will DoS the guest (by inducing a #VC or
> corrupting exported data), but at least for s390 there's a sane, legitimate use
> case for accessing guest private memory (swap and maybe migration?), whereas for
> SNP, IMO implicitly converting to shared on a host access is straight up wrong.
> 
> > Additionally it adds less complexity to the SNP kernel patches, and
> > requires no new ABI.
> 
> I disagree, this would require "new" ABI in the sense that it commits KVM to
> supporting SNP without requiring userspace to initiate any and all conversions
> between shared and private.  Which in my mind is the big elephant in the room:
> do we want to require new KVM (and kernel?) ABI to allow/force userspace to
> explicitly declare guest private memory for TDX _and_ SNP, or just TDX?
> 
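
(For reference, the s390 pattern described above looks roughly like
this on the host side. arch_make_page_accessible() is the real hook;
the surrounding code is a simplified illustration:

    /* Called before the host reads/writes a page that may be secure. */
    static int host_access_prepare(struct page *page)
    {
            /*
             * Export + integrity-protect the page; fails if it cannot
             * be made accessible.  Note there is no implicit
             * private=>shared conversion here -- on failure the host
             * simply must not touch the page.
             */
            if (arch_make_page_accessible(page))
                    return -EFAULT;
            return 0;
    }
)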
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 12:30         ` Dr. David Alan Gilbert
@ 2021-11-15 14:42           ` Joerg Roedel
  2021-11-15 15:33             ` Dr. David Alan Gilbert
  2021-11-15 18:26           ` Sean Christopherson
  1 sibling, 1 reply; 239+ messages in thread
From: Joerg Roedel @ 2021-11-15 14:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Sean Christopherson, Borislav Petkov, Dave Hansen, Peter Gonda,
	Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On Mon, Nov 15, 2021 at 12:30:59PM +0000, Dr. David Alan Gilbert wrote:
> Still; I wonder if it's best to kill the guest - maybe it's best for
> the host to kill the guest and leave behind diagnostics of what
> happened; for someone debugging the crash, it's going to be less useful
> to know that page X was wrongly accessed (which is what the guest would
> see), and more useful to know that it was the kernel's vhost-... driver
> that accessed it.

It is best to let the guest #VC on the page when this happens. If it
happened because of a guest bug, all necessary debugging data is in the
guest and only the guest owner can obtain it.

Then the guest owner can do a kdump on this unexpected #VC and collect
the data to debug the issue. With just killing the guest from the host
side this data would be lost.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 14:42           ` Joerg Roedel
@ 2021-11-15 15:33             ` Dr. David Alan Gilbert
  2021-11-15 16:20               ` Joerg Roedel
  0 siblings, 1 reply; 239+ messages in thread
From: Dr. David Alan Gilbert @ 2021-11-15 15:33 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Sean Christopherson, Borislav Petkov, Dave Hansen, Peter Gonda,
	Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

* Joerg Roedel (jroedel@suse.de) wrote:
> On Mon, Nov 15, 2021 at 12:30:59PM +0000, Dr. David Alan Gilbert wrote:
> > Still; I wonder if it's best to kill the guest - maybe it's best for
> > the host to kill the guest and leave behind diagnostics of what
> > happened; for someone debugging the crash, it's going to be less useful
> > to know that page X was wrongly accessed (which is what the guest would
> > see), and more useful to know that it was the kernel's vhost-... driver
> > that accessed it.
> 
> It is best to let the guest #VC on the page when this happens. If it
> happened because of a guest bug, all necessary debugging data is in the
> guest and only the guest owner can obtain it.
> 
> Then the guest owner can do a kdump on this unexpected #VC and collect
> the data to debug the issue. With just killing the guest from the host
> side this data would be lost.

How would you debug an unexpected access by the host kernel using a
guest's kdump?

Dave

> Regards,
> 
> 	Joerg
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 19:48       ` Sean Christopherson
                           ` (2 preceding siblings ...)
  2021-11-15 12:30         ` Dr. David Alan Gilbert
@ 2021-11-15 16:16         ` Joerg Roedel
  3 siblings, 0 replies; 239+ messages in thread
From: Joerg Roedel @ 2021-11-15 16:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Borislav Petkov, Dave Hansen, Peter Gonda, Brijesh Singh, x86,
	linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Tom Lendacky, H. Peter Anvin,
	Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Fri, Nov 12, 2021 at 07:48:17PM +0000, Sean Christopherson wrote:
> Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.

And what is the plan for handling this host bug? Can it be handled in a
way that keeps the guest running?

IMO the best way to handle this is to do it the way Peter proposed:

	* Convert the page from private to shared on host write access
	  and log this event on the host side (e.g. via a trace event)
	* The guest will notice what happened and can decide on its own
	  what to do: either poison the page or panic and do a
	  kdump that can be used for bug analysis by the guest and host
	  owners

At the time the fault happens we cannot reliably determine the reason. It
can be a host bug, a guest bug (or attack), or whatnot. So the best that
can be done is collecting debug data without impacting other guests.

This also saves lots of code for avoiding these faults when the outcome
would be the same: A dead VM.
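
A minimal sketch of that fault-time handling (the helper and trace
event names below are assumed for illustration, not final code from
this series):

    /* Host #PF path, on an RMP write violation against a guest page: */
    static int snp_handle_implicit_conversion(unsigned long hva, u64 pfn)
    {
            /* Leave a breadcrumb for post-mortem analysis. */
            trace_snp_rmp_implicit_convert(hva, pfn);  /* assumed event */

            /*
             * Make the page shared so the host write can complete.
             * The guest notices the lost validation on its next access
             * and can poison the page or panic/kdump.
             */
            return rmp_make_shared(pfn, PG_LEVEL_4K);  /* assumed helper */
    }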

> I disagree, this would require "new" ABI in the sense that it commits KVM to
> supporting SNP without requiring userspace to initiate any and all conversions
> between shared and private.  Which in my mind is the big elephant in the room:
> do we want to require new KVM (and kernel?) ABI to allow/force userspace to
> explicitly declare guest private memory for TDX _and_ SNP, or just TDX?

No, not for SNP. User-space has no say in what guest memory is private
and shared; that should fundamentally be the guest's decision. The host
has no idea about the guests workload and how much shared memory it
needs. It might be that the guest temporarily needs to share more
memory. I see no reason to cut this flexibility out for SNP guests.

Regards,

-- 
Jörg Rödel
jroedel@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 20:37           ` Sean Christopherson
  2021-11-12 20:53             ` Borislav Petkov
  2021-11-12 21:30             ` Marc Orr
@ 2021-11-15 16:18             ` Brijesh Singh
  2021-11-15 18:44               ` Sean Christopherson
  2 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-11-15 16:18 UTC (permalink / raw)
  To: Sean Christopherson, Borislav Petkov
  Cc: brijesh.singh, Dave Hansen, Peter Gonda, x86, linux-kernel, kvm,
	linux-coco, linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy


On 11/12/21 2:37 PM, Sean Christopherson wrote:
> On Fri, Nov 12, 2021, Borislav Petkov wrote:
>> On Fri, Nov 12, 2021 at 07:48:17PM +0000, Sean Christopherson wrote:
>>> Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
>>

In the automatic change proposal, both a host and a guest bug will
cause the guest to get a #VC, and then the guest can decide whether it
wants to proceed or terminate. If it chooses to proceed, it can poison the
page and log it for future examination.

>> What do you suggest instead?
> 
> Let userspace decide what is mapped shared and what is mapped private.  The kernel
> and KVM provide the APIs/infrastructure to do the actual conversions in a thread-safe
> fashion and also to enforce the current state, but userspace is the control plane.
> 
> It would require non-trivial changes in userspace if there are multiple processes
> accessing guest memory, e.g. Peter's networking daemon example, but it _is_ fully
> solvable.  The exit to userspace means all three components (guest, kernel,
> and userspace) have full knowledge of what is shared and what is private.  There
> is zero ambiguity:
> 
>    - if userspace accesses guest private memory, it gets SIGSEGV or whatever.
>    - if kernel accesses guest private memory, it does BUG/panic/oops[*]
>    - if guest accesses memory with the incorrect C/SHARED-bit, it gets killed.
> 
> This is the direction KVM TDX support is headed, though it's obviously still a WIP.
> 
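
(As an illustration of that control-plane flow, host userspace might
commit a guest-requested conversion along these lines. The ioctl,
struct, and flag below are purely hypothetical; no such ABI exists yet:

    struct kvm_memory_attributes attr = {      /* hypothetical struct */
            .address = gpa,     /* GPA the guest asked to convert */
            .size    = size,
            .flags   = KVM_MEMORY_ATTRIBUTE_PRIVATE, /* hypothetical */
    };

    /*
     * The guest issued a shared->private conversion hypercall and KVM
     * exited to userspace.  Userspace verifies that no host "device"
     * still uses the range, then commits the conversion.
     */
    if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attr)) /* hypothetical */
            err(1, "conversion rejected");
)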

Just curious, in this approach, how do you propose handling the host 
kexec/kdump? If a kexec/kdump occurs while the VM is still active, the 
new kernel will encounter the #PF (RMP violation) because some pages are 
still marked 'private' in the RMP table.



> And ideally, to avoid implicit conversions at any level, hardware vendors' ABIs
> define that:
> 
>    a) All convertible memory, i.e. RAM, starts as private.
>    b) Conversions between private and shared must be done via explicit hypercall.
> 
> Without (b), userspace and thus KVM have to treat guest accesses to the incorrect
> type as implicit conversions.
> 
> [*] Sadly, fully preventing kernel access to guest private is not possible with
>      TDX, especially if the direct map is left intact.  But maybe in the future
>      TDX will signal a fault instead of poisoning memory and leaving a #MC mine.
> 

thanks

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 15:33             ` Dr. David Alan Gilbert
@ 2021-11-15 16:20               ` Joerg Roedel
  2021-11-15 16:32                 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 239+ messages in thread
From: Joerg Roedel @ 2021-11-15 16:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Sean Christopherson, Borislav Petkov, Dave Hansen, Peter Gonda,
	Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On Mon, Nov 15, 2021 at 03:33:03PM +0000, Dr. David Alan Gilbert wrote:
> How would you debug an unexpected access by the host kernel using a
> > guest's kdump?

The host needs to log the access in some way (trace-event, pr_err) with
relevant information.
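
For example (illustrative only, not code from this series):

    pr_err("SNP: RMP write violation: pfn 0x%llx hva 0x%lx comm %s\n",
           pfn, hva, current->comm);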

And with the guest-side kdump you can rule out that it was a guest bug
which triggered the access. And you learn what the guest was trying to
do when it triggered the access. This also helps finding the issue on
the host side (if it is a host-side issue).

(Guest and host owner need to work together for debugging, of course).

Regards,

-- 
Jörg Rödel
jroedel@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 16:20               ` Joerg Roedel
@ 2021-11-15 16:32                 ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 239+ messages in thread
From: Dr. David Alan Gilbert @ 2021-11-15 16:32 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Sean Christopherson, Borislav Petkov, Dave Hansen, Peter Gonda,
	Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

* Joerg Roedel (jroedel@suse.de) wrote:
> On Mon, Nov 15, 2021 at 03:33:03PM +0000, Dr. David Alan Gilbert wrote:
> > How would you debug an unexpected access by the host kernel using a
> > guest's kdump?
> 
> The host needs to log the access in some way (trace-event, pr_err) with
> relevant information.

Yeh OK that makes sense if the host-owner could then enable some type of
debugging based on that event for the unfortunate guest owner.

> And with the guest-side kdump you can rule out that it was a guest bug
> which triggered the access. And you learn what the guest was trying to
> do when it triggered the access. This also helps finding the issue on
> the host side (if it is a host-side issue).
> 
> (Guest and host owner need to work together for debugging, of course).

Yeh.

Dave

> Regards,
> 
> -- 
> Jörg Rödel
> jroedel@suse.de
> 
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5
> 90409 Nürnberg
> Germany
>  
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-13 18:34                     ` Sean Christopherson
  2021-11-14  7:54                       ` Marc Orr
@ 2021-11-15 16:36                       ` Joerg Roedel
  2021-11-15 17:25                         ` Sean Christopherson
  1 sibling, 1 reply; 239+ messages in thread
From: Joerg Roedel @ 2021-11-15 16:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marc Orr, Peter Gonda, Borislav Petkov, Dave Hansen,
	Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Sat, Nov 13, 2021 at 06:34:52PM +0000, Sean Christopherson wrote:
> I'm not treating it nonchalantly, merely acknowledging that (a) some flavors of kernel
> bugs (or hardware issues!) are inherently fatal to the system, and (b) crashing the
> host may be preferable to continuing on in certain cases, e.g. if continuing on has a
> high probability of corrupting guest data.

The problem here is that for SNP host-side RMP faults it will often not
be clear at fault-time if it was caused by wrong guest or host behavior. 

I agree with Marc that crashing the host is not the right thing to do in
this situation. Instead debug data should be collected to do further
post-mortem analysis.

Regards,

-- 
Jörg Rödel
jroedel@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-13 18:28                         ` Sean Christopherson
  2021-11-14  7:41                           ` Marc Orr
@ 2021-11-15 16:52                           ` Joerg Roedel
  1 sibling, 0 replies; 239+ messages in thread
From: Joerg Roedel @ 2021-11-15 16:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marc Orr, Peter Gonda, Andy Lutomirski, Borislav Petkov,
	Dave Hansen, Brijesh Singh, the arch/x86 maintainers,
	Linux Kernel Mailing List, kvm list, linux-coco, linux-mm,
	Linux Crypto Mailing List, Thomas Gleixner, Ingo Molnar,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Dave Hansen,
	Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy

On Sat, Nov 13, 2021 at 06:28:16PM +0000, Sean Christopherson wrote:
> Another issue is that the host kernel, which despite being "untrusted", absolutely
> should be acting in the best interests of the guest.  Allowing userspace to inject
> #VC, e.g. to attempt to attack the guest by triggering a spurious PVALIDATE, means
> the kernel is failing miserably on that front.

Well, no. The kernel is only a part of the hypervisor; KVM userspace is
another. It is possible today for the userspace part(s) to interact in bad
ways with the guest and trick or kill it. Allowing user-space to cause a
#VC in the guest is no different from that.

Regards,

-- 
Jörg Rödel
jroedel@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-14  7:54                       ` Marc Orr
@ 2021-11-15 17:16                         ` Sean Christopherson
  0 siblings, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-11-15 17:16 UTC (permalink / raw)
  To: Marc Orr
  Cc: Peter Gonda, Borislav Petkov, Dave Hansen, Brijesh Singh, x86,
	linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Sat, Nov 13, 2021, Marc Orr wrote:
> On Sat, Nov 13, 2021 at 10:35 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Nov 12, 2021, Marc Orr wrote:
> > > > > > If *it* is the host kernel, then you probably shouldn't do that -
> > > > > > otherwise you just killed the host kernel on which all those guests are
> > > > > > running.
> > > > >
> > > > > I agree, it seems better to terminate the single guest with an issue.
> > > > > Rather than killing the host (and therefore all guests). So I'd
> > > > > suggest even in this case we do the 'convert to shared' approach or
> > > > > just outright terminate the guest.
> > > > >
> > > > > Are there already examples in KVM where a KVM bug in servicing a VM's
> > > > > request results in a BUG/panic/oops? That seems not ideal ever.
> > > >
> > > > Plenty of examples.  kvm_spurious_fault() is the obvious one.  Any NULL pointer
> > > > deref will lead to a BUG, etc...  And it's not just KVM, e.g. it's possible, if
> > > > unlikely, for the core kernel to run into guest private memory (e.g. if the kernel
> > > > botches an RMP change), and if that happens there's no guarantee that the kernel
> > > > can recover.
> > > >
> > > > I fully agree that ideally KVM would have a better sense of self-preservation,
> > > > but IMO that's an orthogonal discussion.
> > >
> > > I don't think we should treat the possibility of crashing the host
> > > with live VMs nonchalantly. It's a big deal. Doing so has big
> > > implications on the probability that any cloud vendor will be able to
> > > deploy this code to production. And aren't cloud vendors one of the
> > > main use cases for all of this confidential compute stuff? I'm
> > > honestly surprised that so many people are OK with crashing the host.
> >
> > I'm not treating it nonchalantly, merely acknowledging that (a) some flavors of kernel
> > bugs (or hardware issues!) are inherently fatal to the system, and (b) crashing the
> > host may be preferable to continuing on in certain cases, e.g. if continuing on has a
> > high probability of corrupting guest data.
> 
> I disagree. Crashing the host -- and _ALL_ of its VMs (including
> non-confidential VMs) -- is not preferable to crashing a single SNP
> VM.

We're in violent agreement.  I fully agree that, when allowed by the architecture,
injecting an error into the guest is preferable to killing the VM, which is in turn
preferable to crashing the host.

What I'm saying is that there are classes of bugs where injecting an error is not
allowed/feasible, and where killing an individual VM is not correct/feasible.

The canonical example of this escalating behavior is an uncorrectable ECC #MC.  If
the bad page is guest memory and the guest vCPU model supports MCA, then userspace
can inject an #MC into the guest so that the guest can take action and hopefully
not simply die.  If the bad page is in the guest but the guest doesn't support #MC
injection, the guest effectively gets killed.  And if the #MC is in host kernel
memory that can't be offlined, e.g. hits the IDT, then the whole system comes
crashing down.
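
For the guest-memory case, userspace can use the existing
KVM_X86_SET_MCE vCPU ioctl; roughly (field values abbreviated, macro
names as in the kernel's MCE headers -- userspace would define
equivalents):

    struct kvm_x86_mce mce = {
            .status = MCI_STATUS_VAL | MCI_STATUS_UC, /* uncorrected */
            .addr   = gpa,      /* the poisoned guest address */
            .bank   = 9,        /* example bank */
    };

    /* Let the guest react (e.g. offline the page) instead of dying. */
    ioctl(vcpu_fd, KVM_X86_SET_MCE, &mce);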

> Especially when that SNP VM is guaranteed to detect the memory corruption and
> react accordingly.

For the record, we can make no such guarantees about the SNP VM.  Yes, the VM
_should_ do the right thing when handed a #VC, but bugs happen, otherwise we
wouldn't be having this discussion.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 16:36                       ` Joerg Roedel
@ 2021-11-15 17:25                         ` Sean Christopherson
  0 siblings, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-11-15 17:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Marc Orr, Peter Gonda, Borislav Petkov, Dave Hansen,
	Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Mon, Nov 15, 2021, Joerg Roedel wrote:
> On Sat, Nov 13, 2021 at 06:34:52PM +0000, Sean Christopherson wrote:
> > I'm not treating it nonchalantly, merely acknowledging that (a) some flavors of kernel
> > bugs (or hardware issues!) are inherently fatal to the system, and (b) crashing the
> > host may be preferable to continuing on in certain cases, e.g. if continuing on has a
> > high probability of corrupting guest data.
> 
> The problem here is that for SNP host-side RMP faults it will often not
> be clear at fault-time if it was caused by wrong guest or host behavior. 
> 
> I agree with Marc that crashing the host is not the right thing to do in
> this situation. Instead debug data should be collected to do further
> post-mortem analysis.

Again, I am not saying that any RMP #PF violation is an immediate, "crash the
host".  It should be handled exactly like any other #PF due to permission violation.
The only wrinkle added by the RMP is that the #PF can be due to permissions on the
GPA itself, but even that is not unique, e.g. see the proposed KVM XO support that
will hopefully still land someday.

If the guest violates the current permissions, it (indirectly) gets a #VC.  If host
userspace violates permissions, it gets SIGSEGV.  If the host kernel violates
permissions, then it reacts to the #PF in whatever way it can.  What I am saying is
that in some cases, there is _zero_ chance of recovery in the host and so crashing
the entire system is inevitable.  E.g. if the host kernel hits an RMP #PF when
vectoring a #GP because the IDT lookup somehow triggers an RMP violation, then the
host is going into triple fault shutdown.

[*] https://lore.kernel.org/linux-mm/20191003212400.31130-1-rick.p.edgecombe@intel.com/
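
As a rough sketch of that reaction ladder (not the actual x86 fault
handler; the error paths are simplified):

    /* Host #PF on an RMP permission violation: */
    if (user_mode(regs)) {
            /* Host userspace violated permissions: kill the process. */
            force_sig_fault(SIGSEGV, SEGV_ACCERR, (void __user *)address);
    } else if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) {
            /* Kernel access with a recovery path (e.g. uaccess): the
             * operation fails gracefully. */
    } else {
            /* No recovery possible, e.g. the fault hit while vectoring
             * another exception: the system is going down regardless. */
            die("RMP #PF", regs, error_code);
    }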

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-14  7:41                           ` Marc Orr
@ 2021-11-15 18:17                             ` Sean Christopherson
  0 siblings, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-11-15 18:17 UTC (permalink / raw)
  To: Marc Orr
  Cc: Peter Gonda, Andy Lutomirski, Borislav Petkov, Dave Hansen,
	Brijesh Singh, the arch/x86 maintainers,
	Linux Kernel Mailing List, kvm list, linux-coco, linux-mm,
	Linux Crypto Mailing List, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Dave Hansen, Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy

On Sat, Nov 13, 2021, Marc Orr wrote:
> On Sat, Nov 13, 2021 at 10:28 AM Sean Christopherson <seanjc@google.com> wrote:
> > The behavior is no different than it is today for regular VMs.
> 
> Isn't this counter to the sketch you laid out earlier where you wrote:
> 
> --- QUOTE START ---
>   - if userspace accesses guest private memory, it gets SIGSEGV or whatever.
>   - if kernel accesses guest private memory, it does BUG/panic/oops[*]
>   - if guest accesses memory with the incorrect C/SHARED-bit, it gets killed.
> --- QUOTE END ---
> 
> Here, the guest does not get killed. Which seems hard to debug.

No, it does not contradict that statement.
 
  If the guest requests a conversion from shared=>private without first ensuring
  the gfn is unused (by a host "device"), the host will side will continue accessing
  the old, shared memory, which it locked, while the guest will be doing who knows
  what.

In this case, the guest will have converted a GPA from shared=>private, i.e. changed
the effective GFN for a e.g. a shared queue, without informing host userspace that
the GFN, and thus the associated HVA in the host, has changed. For TDX that is
literally the same bug as the guest changing the GFN without informing the host, as
the SHARED bit is just an address bit with some extra meaning piled on top.  For SNP,
it's slightly different because the C-bit isn't strictly required to be an address
bit, but for all intents and purposes it's the same type of bug.

I phrased it "guest will be doing who knows what" because from a host userspace
perspective, it can't know what the guest behavior will be, and more importantly,
it doesn't care because (a) the guest is buggy and (b) the host itself is _not_ in
danger.

Yes, those types of bugs suck to debug.  But they really should be few and far
between.  The only reason I called out this specific scenario was to note that host
userspace doesn't need to take extra steps to guard against bogus shared=>private
conversions, because host userspace already needs to have such guards in place.  In
prior (offline?) conversations, we had assumed that host userspace would need to
propagate the shared vs. private status to any and all processes that map guest
memory, i.e. would require substantial enabling, but that assumption was wrong.

> If allowing userspace to inject #VC into the guest means that the host
> can continue to serve other guests, that seems like a win. The
> alternative, to blow up the host, essentially expands the blast radius
> from a single guest to all guests.

As mentioned in other threads of this conversation, when I say "host crashes", I
am specifically talking about scenarios where it is simply not possible for the
host kernel to recover, e.g. an RMP #PF violation on the IDT.

Setting that aside, injecting a #VC into the guest is not in any way necessary for
a buggy host userspace to terminate a single guest, host userspace can simply stop
running that specific guest.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 12:30         ` Dr. David Alan Gilbert
  2021-11-15 14:42           ` Joerg Roedel
@ 2021-11-15 18:26           ` Sean Christopherson
  2021-11-15 18:41             ` Marc Orr
  2021-11-16 13:02             ` Joerg Roedel
  1 sibling, 2 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-11-15 18:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Borislav Petkov, Dave Hansen, Peter Gonda, Brijesh Singh, x86,
	linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On Mon, Nov 15, 2021, Dr. David Alan Gilbert wrote:
> * Sean Christopherson (seanjc@google.com) wrote:
> > On Fri, Nov 12, 2021, Borislav Petkov wrote:
> > > On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
> > > > Or, is there some mechanism that prevent guest-private memory from being
> > > > accessed in random host kernel code?
> > 
> > Or random host userspace code...
> > 
> > > So I'm currently under the impression that random host->guest accesses
> > > should not happen if not previously agreed upon by both.
> > 
> > Key word "should".
> > 
> > > Because, as explained on IRC, if host touches a private guest page,
> > > whatever the host does to that page, the next time the guest runs, it'll
> > > get a #VC where it will see that that page doesn't belong to it anymore
> > > and then, out of paranoia, it will simply terminate to protect itself.
> > > 
> > > So cloud providers should have an interest to prevent such random stray
> > > accesses if they wanna have guests. :)
> > 
> > Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
> 
> Would it necessarily have been a host bug?  A guest telling the host a
> bad GPA to DMA into would trigger this wouldn't it?

No, because as Andy pointed out, host userspace must already guard against a bad
GPA, i.e. this is just a variant of the guest telling the host to DMA to a GPA
that is completely bogus.  The shared vs. private behavior just means that when
host userspace is doing a GPA=>HVA lookup, it needs to incorporate the "shared"
state of the GPA.  If the host goes and DMAs into the completely wrong HVA=>PFN,
then that is a host bug; that the bug happened to be exploited by a buggy/malicious
guest doesn't change the fact that the host messed up.
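
A sketch of the lookup being described (the struct and field names are
made up; only the shape matters):

    static void *gpa_to_hva(struct vm *vm, uint64_t gpa, size_t len)
    {
            struct memslot *slot = find_memslot(vm, gpa, len); /* made up */

            if (!slot)
                    return NULL;    /* bogus GPA: guest bug, refuse */
            if (slot->private)
                    return NULL;    /* guest-private GPA: refuse to DMA */

            return slot->hva + (gpa - slot->base_gpa);
    }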

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 18:26           ` Sean Christopherson
@ 2021-11-15 18:41             ` Marc Orr
  2021-11-15 19:15               ` Sean Christopherson
  2021-11-16  5:00               ` Andy Lutomirski
  2021-11-16 13:02             ` Joerg Roedel
  1 sibling, 2 replies; 239+ messages in thread
From: Marc Orr @ 2021-11-15 18:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dr. David Alan Gilbert, Borislav Petkov, Dave Hansen,
	Peter Gonda, Brijesh Singh, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy

On Mon, Nov 15, 2021 at 10:26 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Nov 15, 2021, Dr. David Alan Gilbert wrote:
> > * Sean Christopherson (seanjc@google.com) wrote:
> > > On Fri, Nov 12, 2021, Borislav Petkov wrote:
> > > > On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
> > > > > Or, is there some mechanism that prevent guest-private memory from being
> > > > > accessed in random host kernel code?
> > >
> > > Or random host userspace code...
> > >
> > > > So I'm currently under the impression that random host->guest accesses
> > > > should not happen if not previously agreed upon by both.
> > >
> > > Key word "should".
> > >
> > > > Because, as explained on IRC, if host touches a private guest page,
> > > > whatever the host does to that page, the next time the guest runs, it'll
> > > > get a #VC where it will see that that page doesn't belong to it anymore
> > > > and then, out of paranoia, it will simply terminate to protect itself.
> > > >
> > > > So cloud providers should have an interest to prevent such random stray
> > > > accesses if they wanna have guests. :)
> > >
> > > Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
> >
> > Would it necessarily have been a host bug?  A guest telling the host a
> > bad GPA to DMA into would trigger this wouldn't it?
>
> No, because as Andy pointed out, host userspace must already guard against a bad
> GPA, i.e. this is just a variant of the guest telling the host to DMA to a GPA
> that is completely bogus.  The shared vs. private behavior just means that when
> host userspace is doing a GPA=>HVA lookup, it needs to incorporate the "shared"
> state of the GPA.  If the host goes and DMAs into the completely wrong HVA=>PFN,
> then that is a host bug; that the bug happened to be exploited by a buggy/malicious
> guest doesn't change the fact that the host messed up.

"If the host goes and DMAs into the completely wrong HVA=>PFN, then
that is a host bug; that the bug happened to be exploited by a
buggy/malicious guest doesn't change the fact that the host messed
up."
^^^
Again, I'm flabbergasted that you are arguing that it's OK for a guest
to exploit a host bug to take down host-side processes or the host
itself, either of which could bring down all other VMs on the machine.

I'm going to repeat -- this is not OK! Period.

Again, if the community wants to layer some orchestration scheme
between host userspace, host kernel, and guest, on top of the code to
inject the #VC into the guest, that's fine. This proposal is not
stopping that. In fact, the two approaches are completely orthogonal
and compatible.

But so far I have heard zero reasons why injecting a #VC into the
guest is wrong. Other than just stating that it's wrong.

Again, the guest must be able to detect buggy and malicious host-side
writes to private memory. Or else "confidential computing" doesn't
work. Assuming that's not true is not a valid argument to dismiss
injecting a #VC exception into the guest.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 16:18             ` Brijesh Singh
@ 2021-11-15 18:44               ` Sean Christopherson
  2021-11-15 18:58                 ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-11-15 18:44 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Borislav Petkov, Dave Hansen, Peter Gonda, x86, linux-kernel,
	kvm, linux-coco, linux-mm, linux-crypto, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, Tom Lendacky, H. Peter Anvin,
	Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Mon, Nov 15, 2021, Brijesh Singh wrote:
> 
> On 11/12/21 2:37 PM, Sean Christopherson wrote:
> > This is the direction KVM TDX support is headed, though it's obviously still a WIP.
> > 
> 
> Just curious, in this approach, how do you propose handling the host
> kexec/kdump? If a kexec/kdump occurs while the VM is still active, the new
> kernel will encounter the #PF (RMP violation) because some pages are still
> marked 'private' in the RMP table.

There are two basic options: a) eagerly purge the RMP or b) lazily fixup the RMP
on #PF.  Either approach can be made to work.  I'm not opposed to fixing up the RMP
on #PF in the kexec/kdump case, I'm opposed to blindly updating the RMP on _all_
RMP #PFs, i.e. the kernel should modify the RMP if and only if it knows that doing
so is correct.  E.g. a naive lazy-fixup solution would be to track which pages have
been sanitized and adjust the RMP on #PF to a page that hasn't yet been sanitized.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 18:44               ` Sean Christopherson
@ 2021-11-15 18:58                 ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-11-15 18:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, Borislav Petkov, Dave Hansen, Peter Gonda, x86,
	linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy



On 11/15/21 12:44 PM, Sean Christopherson wrote:
> On Mon, Nov 15, 2021, Brijesh Singh wrote:
>>
>> On 11/12/21 2:37 PM, Sean Christopherson wrote:
>>> This is the direction KVM TDX support is headed, though it's obviously still a WIP.
>>>
>>
>> Just curious, in this approach, how do you propose handling the host
>> kexec/kdump? If a kexec/kdump occurs while the VM is still active, the new
>> kernel will encounter the #PF (RMP violation) because some pages are still
>> marked 'private' in the RMP table.
> 
> There are two basic options: a) eagerly purge the RMP or b) lazily fixup the RMP
> on #PF.  Either approach can be made to work.  I'm not opposed to fixing up the RMP
> on #PF in the kexec/kdump case, I'm opposed to blindly updating the RMP on _all_
> RMP #PFs, i.e. the kernel should modify the RMP if and only if it knows that doing
> so is correct.  E.g. a naive lazy-fixup solution would be to track which pages have
> been sanitized and adjust the RMP on #PF to a page that hasn't yet been sanitized.
> 

Yap, I think option #a will require the current kernel to iterate
through all of memory and make it shared before booting the kexec
kernel. It may bring another ask: tracking the guest private/shared state
on the host to minimize the iterations.
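
Roughly, something like this (helper names assumed, not from this
series):

    /* Option (a): eagerly purge the RMP before kexec'ing. */
    static void snp_make_all_pages_shared(void)
    {
            unsigned long pfn;

            for (pfn = 0; pfn < max_pfn; pfn++) {
                    /* Skip pages already shared/hypervisor-owned. */
                    if (!snp_rmp_entry_assigned(pfn))  /* assumed helper */
                            continue;
                    rmp_make_shared(pfn, PG_LEVEL_4K); /* assumed helper */
            }
    }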

thanks

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 18:41             ` Marc Orr
@ 2021-11-15 19:15               ` Sean Christopherson
  2021-11-16  3:07                 ` Marc Orr
  2021-11-16 13:30                 ` Joerg Roedel
  2021-11-16  5:00               ` Andy Lutomirski
  1 sibling, 2 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-11-15 19:15 UTC (permalink / raw)
  To: Marc Orr
  Cc: Dr. David Alan Gilbert, Borislav Petkov, Dave Hansen,
	Peter Gonda, Brijesh Singh, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy, Marc Zyngier, Will Deacon,
	Quentin Perret

+arm64 KVM folks

On Mon, Nov 15, 2021, Marc Orr wrote:
> On Mon, Nov 15, 2021 at 10:26 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, Nov 15, 2021, Dr. David Alan Gilbert wrote:
> > > * Sean Christopherson (seanjc@google.com) wrote:
> > > > On Fri, Nov 12, 2021, Borislav Petkov wrote:
> > > > > On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
> > > > > > Or, is there some mechanism that prevent guest-private memory from being
> > > > > > accessed in random host kernel code?
> > > >
> > > > Or random host userspace code...
> > > >
> > > > > So I'm currently under the impression that random host->guest accesses
> > > > > should not happen if not previously agreed upon by both.
> > > >
> > > > Key word "should".
> > > >
> > > > > Because, as explained on IRC, if host touches a private guest page,
> > > > > whatever the host does to that page, the next time the guest runs, it'll
> > > > > get a #VC where it will see that that page doesn't belong to it anymore
> > > > > and then, out of paranoia, it will simply terminate to protect itself.
> > > > >
> > > > > So cloud providers should have an interest to prevent such random stray
> > > > > accesses if they wanna have guests. :)
> > > >
> > > > Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
> > >
> > > Would it necessarily have been a host bug?  A guest telling the host a
> > > bad GPA to DMA into would trigger this wouldn't it?
> >
> > No, because as Andy pointed out, host userspace must already guard against a bad
> > GPA, i.e. this is just a variant of the guest telling the host to DMA to a GPA
> > that is completely bogus.  The shared vs. private behavior just means that when
> > host userspace is doing a GPA=>HVA lookup, it needs to incorporate the "shared"
> > state of the GPA.  If the host goes and DMAs into the completely wrong HVA=>PFN,
> > then that is a host bug; that the bug happened to be exploited by a buggy/malicious
> > guest doesn't change the fact that the host messed up.
> 
> "If the host goes and DMAs into the completely wrong HVA=>PFN, then
> that is a host bug; that the bug happened to be exploited by a
> buggy/malicious guest doesn't change the fact that the host messed
> up."
> ^^^
> Again, I'm flabbergasted that you are arguing that it's OK for a guest
> to exploit a host bug to take down host-side processes or the host
> itself, either of which could bring down all other VMs on the machine.
> 
> I'm going to repeat -- this is not OK! Period.

Huh?  At which point did I suggest it's ok to ship software with bugs?  Of course
it's not ok to introduce host bugs that let the guest crash the host (or host
processes).  But _if_ someone does ship buggy host software, it's not like we can
wave a magic wand and stop the guest from exploiting the bug.  That's why they're
such a big deal.

Yes, in this case a very specific flavor of host userspace bug could be morphed
into a guest exception, but as mentioned ad nauseum, _if_ host userspace has bug
where it does not properly validate a GPA=>HVA, then any such bug exists and is
exploitable today irrespective of SNP.

> Again, if the community wants to layer some orchestration scheme
> between host userspace, host kernel, and guest, on top of the code to
> inject the #VC into the guest, that's fine. This proposal is not
> stopping that. In fact, the two approaches are completely orthogonal
> and compatible.
> 
> But so far I have heard zero reasons why injecting a #VC into the
> guest is wrong. Other than just stating that it's wrong.

It creates a new attack surface, e.g. if the guest mishandles the #VC and does
PVALIDATE on memory that it previously accepted, then userspace can attack the
guest by accessing guest private memory to coerce the guest into consuming corrupted
data.

> Again, the guest must be able to detect buggy and malicious host-side
> writes to private memory. Or else "confidential computing" doesn't
> work.

That assertion assumes the host _hypervisor_ is untrusted, which does not hold true
for all use cases.  The Cc'd arm64 folks are working on a protected VM model where
the host kernel at large is untrusted, but the "hypervisor" (KVM plus a few other
bits), is still trusted by the guest.  Because the hypervisor is trusted, the guest
doesn't need to be hardened against event injection attacks from the host.

Note, SNP already has a similar concept in its VMPLs.  VMPL3 runs a confidential VM
that is not hardened in any way, and fully trusts VMPL0 to not inject bogus faults.

And along the lines of arm64's pKVM, I would very much like to get KVM to a point
where it can remove host userspace from the guest's TCB without relying on hardware.
Anything that can be in hardware absolutely can be done in the kernel, and likely
can be done with significantly less performance overhead.  Confidential computing is
not a binary thing where the only valid use case is removing the host kernel from
the TCB and trusting only hardware.  There are undoubtedly use cases where trusting
the host kernel but not host userspace brings tangible value, but the overhead of
TDX/SNP to get the host kernel out of the TCB is the wrong tradeoff for performance
vs. security.

> Assuming that's not true is not a valid argument to dismiss
> injecting a #VC exception into the guest.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 19:15               ` Sean Christopherson
@ 2021-11-16  3:07                 ` Marc Orr
  2021-11-16  5:14                   ` Andy Lutomirski
  2021-11-16 13:30                 ` Joerg Roedel
  1 sibling, 1 reply; 239+ messages in thread
From: Marc Orr @ 2021-11-16  3:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dr. David Alan Gilbert, Borislav Petkov, Dave Hansen,
	Peter Gonda, Brijesh Singh, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy, Marc Zyngier, Will Deacon,
	Quentin Perret

On Mon, Nov 15, 2021 at 11:15 AM Sean Christopherson <seanjc@google.com> wrote:
>
> +arm64 KVM folks
>
> On Mon, Nov 15, 2021, Marc Orr wrote:
> > On Mon, Nov 15, 2021 at 10:26 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Mon, Nov 15, 2021, Dr. David Alan Gilbert wrote:
> > > > * Sean Christopherson (seanjc@google.com) wrote:
> > > > > On Fri, Nov 12, 2021, Borislav Petkov wrote:
> > > > > > On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
> > > > > > > Or, is there some mechanism that prevent guest-private memory from being
> > > > > > > accessed in random host kernel code?
> > > > >
> > > > > Or random host userspace code...
> > > > >
> > > > > > So I'm currently under the impression that random host->guest accesses
> > > > > > should not happen if not previously agreed upon by both.
> > > > >
> > > > > Key word "should".
> > > > >
> > > > > > Because, as explained on IRC, if host touches a private guest page,
> > > > > > whatever the host does to that page, the next time the guest runs, it'll
> > > > > > get a #VC where it will see that that page doesn't belong to it anymore
> > > > > > and then, out of paranoia, it will simply terminate to protect itself.
> > > > > >
> > > > > > So cloud providers should have an interest to prevent such random stray
> > > > > > accesses if they wanna have guests. :)
> > > > >
> > > > > Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
> > > >
> > > > Would it necessarily have been a host bug?  A guest telling the host a
> > > > bad GPA to DMA into would trigger this wouldn't it?
> > >
> > > No, because as Andy pointed out, host userspace must already guard against a bad
> > > GPA, i.e. this is just a variant of the guest telling the host to DMA to a GPA
> > > that is completely bogus.  The shared vs. private behavior just means that when
> > > host userspace is doing a GPA=>HVA lookup, it needs to incorporate the "shared"
> > > state of the GPA.  If the host goes and DMAs into the completely wrong HVA=>PFN,
> > > then that is a host bug; that the bug happened to be exploited by a buggy/malicious
> > > guest doesn't change the fact that the host messed up.
> >
> > "If the host goes and DMAs into the completely wrong HVA=>PFN, then
> > that is a host bug; that the bug happened to be exploited by a
> > buggy/malicious guest doesn't change the fact that the host messed
> > up."
> > ^^^
> > Again, I'm flabbergasted that you are arguing that it's OK for a guest
> > to exploit a host bug to take down host-side processes or the host
> > itself, either of which could bring down all other VMs on the machine.
> >
> > I'm going to repeat -- this is not OK! Period.
>
> Huh?  At which point did I suggest it's ok to ship software with bugs?  Of course
> it's not ok to introduce host bugs that let the guest crash the host (or host
> processes).  But _if_ someone does ship buggy host software, it's not like we can
> wave a magic wand and stop the guest from exploiting the bug.  That's why they're
> such a big deal.
>
> Yes, in this case a very specific flavor of host userspace bug could be morphed
> into a guest exception, but as mentioned ad nauseum, _if_ host userspace has bug
> where it does not properly validate a GPA=>HVA, then any such bug exists and is
> exploitable today irrespective of SNP.

If I'm understanding you correctly, you're saying that we will never
get into the host's page fault handler due to an RMP violation if we
implement the unmapping guest private memory proposal (without bugs).

However, bugs do happen. And the host-side page fault handler will
have code to react to an RMP violation (even if it's supposedly
impossible to hit). I'm saying that the host-side page fault handler
should NOT handle an RMP violation by killing host-side processes or
the kernel itself. This is detrimental to host reliability.

There are two ways to handle this: (1) convert the private page
causing the RMP violation to shared, or (2) kill the guest.

Converting the private page to shared is a good solution in SNP's
threat model. And overall, it's better for debuggability than
immediately terminating the guest.
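
To make (1) concrete, here is a rough sketch of the convert-to-shared
flow (names are illustrative only: kvm_kick_vcpus_for_pfn() is a
hypothetical hook, and rmp_make_shared()/PG_LEVEL_4K are my
approximation of the series' helpers, not a confirmed API):

/*
 * Sketch only: resolve a write RMP violation by converting the
 * private page to shared and retrying the access, instead of
 * killing the faulting process or panicking.
 */
static int rmp_fault_make_shared(u64 pfn)
{
	/* A VMSA page cannot be converted while its vCPU runs. */
	kvm_kick_vcpus_for_pfn(pfn);		/* hypothetical hook */

	/* Private -> shared; the guest takes a #VC on its next use. */
	return rmp_make_shared(pfn, PG_LEVEL_4K);
}

This glosses over RMPUPDATE retries and locking, but it shows how small
the fault-handler side of option (1) could be.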

> > Again, if the community wants to layer some orchestration scheme
> > between host userspace, host kernel, and guest, on top of the code to
> > inject the #VC into the guest, that's fine. This proposal is not
> > stopping that. In fact, the two approaches are completely orthogonal
> > and compatible.
> >
> > But so far I have heard zero reasons why injecting a #VC into the
> > guest is wrong. Other than just stating that it's wrong.
>
> It creates a new attack surface, e.g. if the guest mishandles the #VC and does
> PVALIDATE on memory that it previously accepted, then userspace can attack the
> guest by accessing guest private memory to coerce the guest into consuming corrupted
> data.

We should handle RMP violations as well as possible from within the
host-side page fault handler, independent of the proposal to unmap
private guest memory for all CVM architectures. Otherwise, if someone
figures out how to trigger an RMP violation by writing guest private
memory (despite the unmapping proposal's goal of making this
impossible), the attack surface has now increased: we're either
killing host processes or the kernel, which is worse than killing
the single guest.

Second, I don't think it's correct to say that the host-side
implementation changes the attack surface. The guest must already be
hardened against host-side bugs and attacks. From the guest's
perspective, the attack surface is the same. Also, unmapping private
guest memory is going to require coordination across host userspace,
host kernel, and guest. That's a lot of code. And typically more code
means more attack surface. That being said, if all that code is
implemented perfectly, you might be right that unmapping guest
private memory makes it harder for the host to attack the guest. But I
think it's a moot point. Because in the end, converting the page from
private to shared is entirely reasonable to solve this problem for
SNP, and we should be doing it regardless, even if we do get
unmapping private memory working on SNP.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 18:41             ` Marc Orr
  2021-11-15 19:15               ` Sean Christopherson
@ 2021-11-16  5:00               ` Andy Lutomirski
  1 sibling, 0 replies; 239+ messages in thread
From: Andy Lutomirski @ 2021-11-16  5:00 UTC (permalink / raw)
  To: Marc Orr, Sean Christopherson
  Cc: Dr. David Alan Gilbert, Borislav Petkov, Dave Hansen,
	Peter Gonda, Brijesh Singh, the arch/x86 maintainers,
	Linux Kernel Mailing List, kvm list, linux-coco, linux-mm,
	Linux Crypto Mailing List, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Dave Hansen, Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy



On Mon, Nov 15, 2021, at 10:41 AM, Marc Orr wrote:
> On Mon, Nov 15, 2021 at 10:26 AM Sean Christopherson <seanjc@google.com> wrote:
>>
>> On Mon, Nov 15, 2021, Dr. David Alan Gilbert wrote:
>> > * Sean Christopherson (seanjc@google.com) wrote:
>> > > On Fri, Nov 12, 2021, Borislav Petkov wrote:
>> > > > On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
>> > > > > Or, is there some mechanism that prevent guest-private memory from being
>> > > > > accessed in random host kernel code?
>> > >
>> > > Or random host userspace code...
>> > >
>> > > > So I'm currently under the impression that random host->guest accesses
>> > > > should not happen if not previously agreed upon by both.
>> > >
>> > > Key word "should".
>> > >
>> > > > Because, as explained on IRC, if host touches a private guest page,
>> > > > whatever the host does to that page, the next time the guest runs, it'll
>> > > > get a #VC where it will see that that page doesn't belong to it anymore
>> > > > and then, out of paranoia, it will simply terminate to protect itself.
>> > > >
>> > > > So cloud providers should have an interest to prevent such random stray
>> > > > accesses if they wanna have guests. :)
>> > >
>> > > Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
>> >
>> > Would it necessarily have been a host bug?  A guest telling the host a
>> > bad GPA to DMA into would trigger this wouldn't it?
>>
>> No, because as Andy pointed out, host userspace must already guard against a bad
>> GPA, i.e. this is just a variant of the guest telling the host to DMA to a GPA
>> that is completely bogus.  The shared vs. private behavior just means that when
>> host userspace is doing a GPA=>HVA lookup, it needs to incorporate the "shared"
>> state of the GPA.  If the host goes and DMAs into the completely wrong HVA=>PFN,
>> then that is a host bug; that the bug happened to be exploited by a buggy/malicious
>> guest doesn't change the fact that the host messed up.
>
> "If the host goes and DMAs into the completely wrong HVA=>PFN, then
> that is a host bug; that the bug happened to be exploited by a
> buggy/malicious guest doesn't change the fact that the host messed
> up."
> ^^^
> Again, I'm flabbergasted that you are arguing that it's OK for a guest
> to exploit a host bug to take down host-side processes or the host
> itself, either of which could bring down all other VMs on the machine.
>
> I'm going to repeat -- this is not OK! Period.

I don’t understand the point you’re trying to make. If the host _kernel_ has a bug that allows a guest to trigger an invalid host memory access, this is bad. We want to know about it and fix it, and the security folks want to minimize the chance that such a bug exists.

If host _userspace_ has such a bug, the kernel should not crash if it’s exploited.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-16  3:07                 ` Marc Orr
@ 2021-11-16  5:14                   ` Andy Lutomirski
  2021-11-16 13:21                     ` Joerg Roedel
  0 siblings, 1 reply; 239+ messages in thread
From: Andy Lutomirski @ 2021-11-16  5:14 UTC (permalink / raw)
  To: Marc Orr, Sean Christopherson
  Cc: Dr. David Alan Gilbert, Borislav Petkov, Dave Hansen,
	Peter Gonda, Brijesh Singh, the arch/x86 maintainers,
	Linux Kernel Mailing List, kvm list, linux-coco, linux-mm,
	Linux Crypto Mailing List, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Dave Hansen, Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy, Marc Zyngier, Will Deacon,
	Quentin Perret



On Mon, Nov 15, 2021, at 7:07 PM, Marc Orr wrote:
> On Mon, Nov 15, 2021 at 11:15 AM Sean Christopherson <seanjc@google.com> wrote:
>>
>> +arm64 KVM folks
>>
>> On Mon, Nov 15, 2021, Marc Orr wrote:
>> > On Mon, Nov 15, 2021 at 10:26 AM Sean Christopherson <seanjc@google.com> wrote:
>> > >
>> > > On Mon, Nov 15, 2021, Dr. David Alan Gilbert wrote:
>> > > > * Sean Christopherson (seanjc@google.com) wrote:
>> > > > > On Fri, Nov 12, 2021, Borislav Petkov wrote:
>> > > > > > On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
>> > > > > > > Or, is there some mechanism that prevent guest-private memory from being
>> > > > > > > accessed in random host kernel code?
>> > > > >
>> > > > > Or random host userspace code...
>> > > > >
>> > > > > > So I'm currently under the impression that random host->guest accesses
>> > > > > > should not happen if not previously agreed upon by both.
>> > > > >
>> > > > > Key word "should".
>> > > > >
>> > > > > > Because, as explained on IRC, if host touches a private guest page,
>> > > > > > whatever the host does to that page, the next time the guest runs, it'll
>> > > > > > get a #VC where it will see that that page doesn't belong to it anymore
>> > > > > > and then, out of paranoia, it will simply terminate to protect itself.
>> > > > > >
>> > > > > > So cloud providers should have an interest to prevent such random stray
>> > > > > > accesses if they wanna have guests. :)
>> > > > >
>> > > > > Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.
>> > > >
>> > > > Would it necessarily have been a host bug?  A guest telling the host a
>> > > > bad GPA to DMA into would trigger this wouldn't it?
>> > >
>> > > No, because as Andy pointed out, host userspace must already guard against a bad
>> > > GPA, i.e. this is just a variant of the guest telling the host to DMA to a GPA
>> > > that is completely bogus.  The shared vs. private behavior just means that when
>> > > host userspace is doing a GPA=>HVA lookup, it needs to incorporate the "shared"
>> > > state of the GPA.  If the host goes and DMAs into the completely wrong HVA=>PFN,
>> > > then that is a host bug; that the bug happened to be exploited by a buggy/malicious
>> > > guest doesn't change the fact that the host messed up.
>> >
>> > "If the host goes and DMAs into the completely wrong HVA=>PFN, then
>> > that is a host bug; that the bug happened to be exploited by a
>> > buggy/malicious guest doesn't change the fact that the host messed
>> > up."
>> > ^^^
>> > Again, I'm flabbergasted that you are arguing that it's OK for a guest
>> > to exploit a host bug to take down host-side processes or the host
>> > itself, either of which could bring down all other VMs on the machine.
>> >
>> > I'm going to repeat -- this is not OK! Period.
>>
>> Huh?  At which point did I suggest it's ok to ship software with bugs?  Of course
>> it's not ok to introduce host bugs that let the guest crash the host (or host
>> processes).  But _if_ someone does ship buggy host software, it's not like we can
>> wave a magic wand and stop the guest from exploiting the bug.  That's why they're
>> such a big deal.
>>
>> Yes, in this case a very specific flavor of host userspace bug could be morphed
>> into a guest exception, but as mentioned ad nauseum, _if_ host userspace has bug
>> where it does not properly validate a GPA=>HVA, then any such bug exists and is
>> exploitable today irrespective of SNP.
>
> If I'm understanding you correctly, you're saying that we will never
> get into the host's page fault handler due to an RMP violation if we
> implement the unmapping guest private memory proposal (without bugs).
>
> However, bugs do happen. And the host-side page fault handler will
> have code to react to an RMP violation (even if it's supposedly
> impossible to hit). I'm saying that the host-side page fault handler
> should NOT handle an RMP violation by killing host-side processes or
> the kernel itself. This is detrimental to host reliability.
>
> There are two ways to handle this. (1) Convert the private page
> causing the RMP violation to shared, (2) Kill the guest.

It’s time to put on my maintainer hat. This is solidly in my territory, and NAK.  A kernel privilege fault, from who-knows-what context (interrupts off? NMI? locks held?) that gets an RMP violation with no exception handler is *not* going to blindly write the RMP and retry.  It’s not going to send flush IPIs or call into KVM to “fix” things.  Just the locking issues alone are probably showstopping, even ignoring the security and sanity issues.

You are welcome to submit patches to make the panic_on_oops=0 case as robust as practical, and you are welcome to keep running other VMs after we die() and taint the kernel, but that is strictly best effort and is a bad idea in a highly security sensitive environment.

And that goes for everyone else here too. If you all have a brilliant idea to lazily fix RMP faults (and the actual semantics are reasonable) and you want my NAK to go away, I want to see a crystal clear explanation of what you plan to put in fault.c, how the locking is supposed to work, and how you bound retries such that the system will reliably OOPS if something goes wrong instead of retrying forever or deadlocking.

Otherwise, can we please get on with designing a reasonable model for guest-private memory?

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 18:26           ` Sean Christopherson
  2021-11-15 18:41             ` Marc Orr
@ 2021-11-16 13:02             ` Joerg Roedel
  2021-11-16 20:08               ` Sean Christopherson
  1 sibling, 1 reply; 239+ messages in thread
From: Joerg Roedel @ 2021-11-16 13:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dr. David Alan Gilbert, Borislav Petkov, Dave Hansen,
	Peter Gonda, Brijesh Singh, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On Mon, Nov 15, 2021 at 06:26:16PM +0000, Sean Christopherson wrote:
> No, because as Andy pointed out, host userspace must already guard against a bad
> GPA, i.e. this is just a variant of the guest telling the host to DMA to a GPA
> that is completely bogus.  The shared vs. private behavior just means that when
> host userspace is doing a GPA=>HVA lookup, it needs to incorporate the "shared"
> state of the GPA.  If the host goes and DMAs into the completely wrong HVA=>PFN,
> then that is a host bug; that the bug happened to be exploited by a buggy/malicious
> guest doesn't change the fact that the host messed up.

The thing is that the usual checking mechanisms can't be applied to
guest-private pages. For user-space the GPA is valid if it fits into the
guest memory layout user-space set up before. But whether a page is
shared or private is the guests business. And without an expensive
reporting/query mechanism user-space doesn't have the information to do
the check.

A mechanism to lock pages to shared is also needed, and that creates the
next problems:

	* Who can release the lock, only the process which created it or
	  anyone who has the memory mapped?

	* What happens when a process has locked guest regions and then
	  dies with SIGSEGV? Will its locks on guest memory be released
	  or stay around forever?

And this is only what comes to mind immediately; I'm sure there are
more problematic details in such an interface.

Regards,

-- 
Jörg Rödel
jroedel@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-16  5:14                   ` Andy Lutomirski
@ 2021-11-16 13:21                     ` Joerg Roedel
  2021-11-16 18:26                       ` Sean Christopherson
  0 siblings, 1 reply; 239+ messages in thread
From: Joerg Roedel @ 2021-11-16 13:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Marc Orr, Sean Christopherson, Dr. David Alan Gilbert,
	Borislav Petkov, Dave Hansen, Peter Gonda, Brijesh Singh,
	the arch/x86 maintainers, Linux Kernel Mailing List, kvm list,
	linux-coco, linux-mm, Linux Crypto Mailing List, Thomas Gleixner,
	Ingo Molnar, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Dave Hansen, Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy, Marc Zyngier, Will Deacon,
	Quentin Perret

On Mon, Nov 15, 2021 at 09:14:14PM -0800, Andy Lutomirski wrote:
> It’s time to put on my maintainer hat. This is solidly in my
> territory, and NAK.  A kernel privilege fault, from who-knows-what
> context (interrupts off? NMI? locks held?) that gets an RMP violation
> with no exception handler is *not* going to blindly write the RMP and
> retry.  It’s not going to send flush IPIs or call into KVM to “fix”
> things.  Just the locking issues alone are probably showstopping, even
> ignoring the security and sanity issues.

RMP faults are expected from two contexts:

	* User-space
	* KVM running in task context

The only situation where RMP faults could happen outside of these
contexts is when running a kexec'ed kernel, which was launched while SNP
guests were still running (that needs to be taken care of as well).

And from the locking side, which lock does the #PF handler need to take?
Processors supporting SNP also have hardware support for flushing remote
TLBs, so locks taken in the flush path are not strictly required.

Calling into KVM is another story and needs some more thought, I agree
with that.
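
For reference, the userspace-fault side reduces to a small dispatch in
the #PF handler; roughly (a sketch assuming the series' X86_PF_RMP
error-code bit, with dump_rmpentry() as a stand-in for the series'
dump helper):

/* Sketch: RMP-violation dispatch in the x86 #PF handler. */
if (unlikely(error_code & X86_PF_RMP)) {
	if (user_mode(regs)) {
		/* Userspace touched guest-private memory. */
		force_sig_fault(SIGBUS, BUS_ADRERR,
				(void __user *)address);
		return;
	}
	/* Unexpected kernel-context RMP fault: dump the entry and die. */
	dump_rmpentry(address);
	page_fault_oops(regs, error_code, address);
}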

> Otherwise, can we please get on with designing a reasonable model for
> guest-private memory?

It is fine to unmap guest-private memory from the host kernel, even if
it is not required by SNP. TDX needs to do that because of the #MC thing
that happens otherwise, but that is also just a way to emulate an
RMP-like fault with TDX.

But as Marc already pointed out, the kernel needs a plan B for when an
RMP fault happens anyway due to some bug.

Regards,

-- 
Jörg Rödel
jroedel@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-15 19:15               ` Sean Christopherson
  2021-11-16  3:07                 ` Marc Orr
@ 2021-11-16 13:30                 ` Joerg Roedel
  1 sibling, 0 replies; 239+ messages in thread
From: Joerg Roedel @ 2021-11-16 13:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marc Orr, Dr. David Alan Gilbert, Borislav Petkov, Dave Hansen,
	Peter Gonda, Brijesh Singh, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	sathyanarayanan.kuppuswamy, Marc Zyngier, Will Deacon,
	Quentin Perret

On Mon, Nov 15, 2021 at 07:15:07PM +0000, Sean Christopherson wrote:
> It creates a new attack surface, e.g. if the guest mishandles the #VC and does
> PVALIDATE on memory that it previously accepted, then userspace can attack the
> guest by accessing guest private memory to coerce the guest into consuming corrupted
> data.

If a guest can be tricked into a double PVALIDATE or otherwise
misbehaves on a #VC exception, then it is a guest bug and needs to be
fixed there.

It is a core requirement of the #VC handler that it cannot be tricked
that way.

Regards,

-- 
Jörg Rödel
jroedel@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-16 13:21                     ` Joerg Roedel
@ 2021-11-16 18:26                       ` Sean Christopherson
  2021-11-16 18:39                         ` Peter Gonda
  0 siblings, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-11-16 18:26 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andy Lutomirski, Marc Orr, Dr. David Alan Gilbert,
	Borislav Petkov, Dave Hansen, Peter Gonda, Brijesh Singh,
	the arch/x86 maintainers, Linux Kernel Mailing List, kvm list,
	linux-coco, linux-mm, Linux Crypto Mailing List, Thomas Gleixner,
	Ingo Molnar, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Dave Hansen, Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy, Marc Zyngier, Will Deacon,
	Quentin Perret

On Tue, Nov 16, 2021, Joerg Roedel wrote:
> But as Marc already pointed out, the kernel needs a plan B for when an
> RMP fault happens anyway due to some bug.

I don't see why unexpected RMP #PF is a special snowflake that needs a different
plan than literally every other type of unexpected #PF in the kernel.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-16 18:26                       ` Sean Christopherson
@ 2021-11-16 18:39                         ` Peter Gonda
  0 siblings, 0 replies; 239+ messages in thread
From: Peter Gonda @ 2021-11-16 18:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Joerg Roedel, Andy Lutomirski, Marc Orr, Dr. David Alan Gilbert,
	Borislav Petkov, Dave Hansen, Brijesh Singh,
	the arch/x86 maintainers, Linux Kernel Mailing List, kvm list,
	linux-coco, linux-mm, Linux Crypto Mailing List, Thomas Gleixner,
	Ingo Molnar, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Dave Hansen, Sergio Lopez, Peter Zijlstra (Intel),
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy, Marc Zyngier, Will Deacon,
	Quentin Perret

On Tue, Nov 16, 2021 at 11:26 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Nov 16, 2021, Joerg Roedel wrote:
> > But as Marc already pointed out, the kernel needs a plan B for when an
> > RMP fault happens anyway due to some bug.
>
> I don't see why unexpected RMP #PF is a special snowflake that needs a different
> plan than literally every other type of unexpected #PF in the kernel.

When I started this thread I was not trying to say we *need* to do
something different for RMP faults, but that we *could* improve host
reliability by doing something. Since it is possible to special case
an RMP fault and prevent a panic, I thought it was worth discussing.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-16 13:02             ` Joerg Roedel
@ 2021-11-16 20:08               ` Sean Christopherson
  0 siblings, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-11-16 20:08 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Dr. David Alan Gilbert, Borislav Petkov, Dave Hansen,
	Peter Gonda, Brijesh Singh, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On Tue, Nov 16, 2021, Joerg Roedel wrote:
> On Mon, Nov 15, 2021 at 06:26:16PM +0000, Sean Christopherson wrote:
> > No, because as Andy pointed out, host userspace must already guard against a bad
> > GPA, i.e. this is just a variant of the guest telling the host to DMA to a GPA
> > that is completely bogus.  The shared vs. private behavior just means that when
> > host userspace is doing a GPA=>HVA lookup, it needs to incorporate the "shared"
> > state of the GPA.  If the host goes and DMAs into the completely wrong HVA=>PFN,
> > then that is a host bug; that the bug happened to be exploited by a buggy/malicious
> > guest doesn't change the fact that the host messed up.
> 
> The thing is that the usual checking mechanisms can't be applied to
> guest-private pages. For user-space the GPA is valid if it fits into the
> guest memory layout user-space set up before. But whether a page is
> shared or private is the guest's business.

And that's where we fundamentally disagree.  Whether a page is shared or private
is very much the host's business.  The guest can _ask_ to convert a page, but the
host ultimately owns the state of a page.  Even in this proposed auto-convert
approach, the host has final say over the state of the page.

The main difference between auto-converting in the #PF handler and an unmapping
approach is that, as you note below, the latter requires an explicit action from
host userspace.  But again, the host kernel has final say over the state of any
given page.

> And without an expensive reporting/query mechanism user-space doesn't have the
> information to do the check.

Exiting to userspace isn't all that expensive relative to the cost of
the RMP update itself, e.g. IIRC updating the RMP is several thousand cycles.
TDX will have a similar cost to modify S-EPT entries.

Actually updating the backing store (see below) might be relatively expensive, but
I highly doubt it will be orders of magnitude slower than updating the RMP, or that
it will have a meaningful impact on guest performance.

> A mechanism to lock pages to shared is also needed, and that creates the
> next problems:

The most recent proposal for unmapping guest private memory doesn't require new
locking at the page level.  The high level idea is to treat shared and private
variations of GPAs as two unrelated addresses from a host memory management
perspective.  They are only a "single" address in the context of KVM's MMU, i.e.
the NPT for SNP.

For shared pages, no new locking is required as the PFN associated with a VMA will
not be freed until all mappings go away.  Any access after all mappings/references
have been dropped is nothing more than a use-after-free bug, and the guilty party
is punished accordingly.

For private pages, the proposed idea is to require that all guest private memory
be backed by an enlightened backing store, e.g. the initial RFC enhances memfd and
shmem to support sealing the file as guest-only:

  : The new seal is only allowed if there's no pre-existing pages in the fd
  : and there's no existing mapping of the file. After the seal is set, no
  : read/write/mmap from userspace is allowed.

It is KVM's responsibility to ensure it doesn't map a shared PFN into a private
GPA and vice versa, and that TDP entries are unmapped appropriately, e.g. when
userspace punches a hole in the backing store, but that can all be done using
existing locks, e.g. KVM's mmu_lock.  No new locking mechanisms are required.
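
For reference, the userspace side of that RFC looks roughly like the
sketch below. F_SEAL_GUEST and its value here are taken from the
proposal, not from a merged API, so treat this strictly as an
illustration:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* From the RFC; not in mainline uapi headers, value is a guess. */
#ifndef F_SEAL_GUEST
#define F_SEAL_GUEST	0x0020
#endif

/* Sketch: back guest private memory with a guest-only memfd. */
static int alloc_guest_private_fd(size_t size)
{
	int fd = memfd_create("guest-private", MFD_ALLOW_SEALING);

	if (fd < 0 || ftruncate(fd, size) < 0)
		return -1;

	/*
	 * Seal before populating or mapping anything; afterwards,
	 * read/write/mmap from userspace fail and only KVM can map
	 * the pages into the guest.
	 */
	if (fcntl(fd, F_ADD_SEALS, F_SEAL_GUEST) < 0)
		return -1;

	return fd;
}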

> 	* Who can release the lock, only the process which created it or
> 	  anyone who has the memory mapped?
> 
> 	* What happens when a process has locked guest regions and then
> 	  dies with SIGSEGV? Will its locks on guest memory be released
> 	  or stay around forever?

> And this is only what comes to mind immediately; I'm sure there are
> more problematic details in such an interface.

Please read through this proposal/RFC; more eyeballs would certainly be welcome.

https://lkml.kernel.org/r/20211111141352.26311-1-chao.p.peng@linux.intel.com

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-12 15:43 ` [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Peter Gonda
  2021-11-12 17:59   ` Dave Hansen
@ 2021-11-22 15:23   ` Brijesh Singh
  2021-11-22 17:03     ` Vlastimil Babka
  2021-11-22 18:30     ` Dave Hansen
  1 sibling, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-11-22 15:23 UTC (permalink / raw)
  To: Peter Gonda
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

Hi Peter,

On 11/12/21 9:43 AM, Peter Gonda wrote:
> Hi Brijesh,
> 
> One high level discussion I'd like to have on these SNP KVM patches.
> 
> In these patches (V5) if a host userspace process writes to a guest
> private page, a SIGBUS is issued to that process. If the kernel writes
> to a guest private page, then the kernel panics due to the unhandled
> RMP page fault. This is an issue because not all writes into guest
> memory may come from a bug in the host. For instance a malicious or
> even buggy guest could easily point the host to writing a private page
> during the emulation of many virtual devices (virtio, NVMe, etc). For
> example if a well behaved guest's behavior is to: start up a driver,
> select some pages to share with the guest, ask the host to convert
> them to shared, then use those pages for virtual device DMA, if a
> buggy guest forgets the step to request the pages be converted to
> shared, it's easy to see how the host could rightfully write to private
> memory. I think we can better guarantee host reliability when running
> SNP guests without changing SNP’s security properties.
> 
> Here is an alternative to the current approach: On RMP violation (host
> or userspace) the page fault handler converts the page from private to
> shared to allow the write to continue. This pulls from s390’s error
> handling which does exactly this. See ‘arch_make_page_accessible()’.
> Additionally it adds less complexity to the SNP kernel patches, and
> requires no new ABI.
> 
> In the current (V5) KVM implementation if a userspace process
> generates an RMP violation (writes to guest private memory) the
> process receives a SIGBUS. At first glance, it would appear that
> user-space shouldn’t write to private memory. However, guaranteeing
> this in a generic fashion requires locking the RMP entries (via locks
> external to the RMP). Otherwise, a user-space process emulating a
> guest device IO may be vulnerable to having the guest memory
> (maliciously or by guest bug) converted to private while user-space
> emulation is happening. This results in a well behaved userspace
> process receiving a SIGBUS.
> 
> This proposal allows buggy and malicious guests to run under SNP
> without jeopardizing the reliability / safety of host processes. This
> is very important to a cloud service provider (CSP) since it’s common
> to have host wide daemons that write/read all guests, i.e. a single
> process could manage the networking for all VMs on the host. Crashing
> that singleton process kills networking for all VMs on the system.
> 
Thank you for starting the thread; based on the discussion, I am keeping 
the current implementation as-is and *not* going with the auto 
conversion from private to shared. To summarize what we are doing in the 
current SNP series:

- If userspace accesses guest private memory, it gets SIGBUS.
- If kernel accesses[*] guest private memory, it does panic.

[*] Kernel consults the RMP table for the page ownership before the 
access. If the page is shared, then it uses the locking mechanism to 
ensure that a guest will not be able to change the page ownership while 
the kernel has it mapped.
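
For illustration, the access pattern that locking enables looks roughly
like this (simplified, with placeholder names modeled on the
post_{map,unmap}_gfn() ops in the series, so do not take the signatures
literally):

/* Sketch: kernel write into guest memory with the shared-state lock. */
static int snp_safe_write_guest(struct kvm *kvm, gfn_t gfn,
				const void *data, int len)
{
	kvm_pfn_t pfn;
	int ret;

	/* Fails if the gfn is private; otherwise locks it shared. */
	ret = snp_post_map_gfn(kvm, gfn, &pfn);	/* placeholder */
	if (ret)
		return ret;

	ret = kvm_write_guest_page(kvm, gfn, data, 0, len);

	/* Unlock so the guest can request a page state change again. */
	snp_post_unmap_gfn(kvm, gfn, pfn);	/* placeholder */
	return ret;
}

The real series wires this through kvm_x86_ops rather than direct
calls, so treat the above purely as the shape of the flow.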

thanks

> This proposal also allows for minimal changes to the kexec flow and
> kdump. The new kexec kernel can simply update private pages to shared
> as it encounters them during its boot. This avoids needing to
> propagate the RMP state from kernel to kernel. Of course this doesn’t
> preserve any running VMs but is still useful for kdump crash dumps or
> quicker rekerneling for development with kexec.
> 
> This proposal does cause guest memory corruption for some bugs but one
> of SEV-SNP’s goals, extended from SEV-ES’s goals, is for guests to be
> able to detect when its memory has been corrupted / replayed by the
> host. So SNP already has features for allowing guests to detect this
> kind of memory corruption. Additionally this is very similar to a page
> of memory generating a machine check because of 2-bit memory
> corruption. In other words SNP guests must be enlightened and ready
> for these kinds of errors.
> 
> For an SNP guest running under this proposal the flow would look like this:
> * Host gets a #PF because it's trying to write to a private page.
> * Host #PF handler updates the page to shared.
> * Write continues normally.
> * Guest accesses memory (r/w).
> * Guest gets a #VC error because the page is not PVALIDATED
> * Guest is now in control. Guest can terminate because its memory has
> been corrupted. Guest could instead try to continue and log the error to its
> owner.
> 
> A similar approach was introduced in the SNP patches V1 and V2 for
> kernel page fault handling. The pushback around this convert to shared
> approach was largely focused around the idea that the kernel has all
> the information about which pages are shared vs private so it should
> be able to check shared status before write to pages. After V2 the
> patches were updated to not have a kernel page fault handler for RMP
> violations (other than dumping state during a panic). The current
> patches protect the host with new post_{map,unmap}_gfn() function that
> checks if a page is shared before mapping it, then locks the page
> shared until unmapped. Given the discussions on ‘[Part2,v5,39/45] KVM:
> SVM: Introduce ops for the post gfn map and unmap’ building a solution
> to do this is non trivial and adds new overheads to KVM. Additionally
> the current solution is local to the kernel. So a new ABI must now be
> created to allow the userspace VMM to access the kernel-side locks for
> this to work generically for the whole host. This is more complicated
> than this proposal and adding more lock holders seems like it could
> reduce performance further.
> 
> There are a couple corner cases with this approach. Under SNP guests
> can request their memory be changed into a VMSA. This VMSA page cannot
> be changed to shared while the vCPU associated with it is running. So
> KVM + the #PF handler will need something to kick vCPUs from running.
> Joerg believes that a possible fix for this could be a new MMU
> notifier in the kernel, then on the #PF we can go through the rmp and
> execute this vCPU kick callback.
> 
> Another corner case is the RMPUPDATE instruction is not guaranteed to
> succeed on first iteration. As noted above if the page is a VMSA it
> cannot be updated while the vCPU is running. Another issue is if the
> guest is running RMPADJUST on a page, it cannot be RMPUPDATED at that
> time. There is a lock for each RMP Entry so there is a race for these
> instructions. The vCPU kicking can solve this issue by kicking all
> guest vCPUs, which removes the chance for the race.
> 
> Since this proposal probably results in SNP guests terminating due to
> a page unexpectedly needing PVALIDATE, the approach could be
> simplified to just KVM killing the guest. But instead of unilaterally
> killing the guest, I think it's nicer to users to allow the
> unvalidated #VC exception, so that they can collect some additional
> debug information and perform any additional clean up work they would
> like.
> 
> Thanks
> Peter
> 
> On Fri, Aug 20, 2021 at 9:59 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>>
>> This part of the Secure Encrypted Paging (SEV-SNP) series focuses on the
>> changes required in a host OS for SEV-SNP support. The series builds upon
>> SEV-SNP Part-1.
>>
>> This series provides the basic building blocks to support booting the SEV-SNP
>> VMs, it does not cover all the security enhancement introduced by the SEV-SNP
>> such as interrupt protection.
>>
>> The CCP driver is enhanced to provide new APIs that use the SEV-SNP
>> specific commands defined in the SEV-SNP firmware specification. The KVM
>> driver uses those APIs to create and managed the SEV-SNP guests.
>>
>> The GHCB specification version 2 introduces new set of NAE's that is
>> used by the SEV-SNP guest to communicate with the hypervisor. The series
>> provides support to handle the following new NAE events:
>> - Register GHCB GPA
>> - Page State Change Request
>> - Hypevisor feature
>> - Guest message request
>>
>> The RMP check is enforced as soon as SEV-SNP is enabled. Not every memory
>> access requires an RMP check. In particular, the read accesses from the
>> hypervisor do not require RMP checks because the data confidentiality is
>> already protected via memory encryption. When hardware encounters an RMP
>> checks failure, it raises a page-fault exception. If RMP check failure
>> is due to the page-size mismatch, then split the large page to resolve
>> the fault.
>>
>> The series does not provide support for the interrupt security and migration
>> and those feature will be added after the base support.
>>
>> The series is based on the commit:
>>   SNP part1 commit and
>>   fa7a549d321a (kvm/next, next) KVM: x86: accept userspace interrupt only if no event is injected
>>
>> TODO:
>>    * Add support for command to ratelimit the guest message request.
>>
>> Changes since v4:
>>   * Move the RMP entry definition to x86 specific header file.
>>   * Move the dump RMP entry function to SEV specific file.
>>   * Use BIT_ULL while defining the #PF bit fields.
>>   * Add helper function to check the IOMMU support for SEV-SNP feature.
>>   * Add helper functions for the page state transition.
>>   * Map and unmap the pages from the direct map after page is added or
>>     removed in RMP table.
>>   * Enforce the minimum SEV-SNP firmware version.
>>   * Extend the LAUNCH_UPDATE to accept the base_gfn and remove the
>>     logic to calculate the gfn from the hva.
>>   * Add a check in LAUNCH_UPDATE to ensure that all the pages are
>>     shared before calling the PSP.
>>   * Mark the memory failure when failing to remove the page from the
>>     RMP table or clearing the immutable bit.
>>   * Exclude the encrypted hva range from the KSM.
>>   * Remove the gfn tracking during the kvm_gfn_map() and use SRCU to
>>     syncronize the PSC and gfn mapping.
>>   * Allow PSC on the registered hva range only.
>>   * Add support for the Preferred GPA VMGEXIT.
>>   * Simplify the PSC handling routines.
>>   * Use the static_call() for the newly added kvm_x86_ops.
>>   * Remove the long-lived GHCB map.
>>   * Move the snp enable module parameter to the end of the file.
>>   * Remove the kvm_x86_op for the RMP fault handling. Call the
>>     fault handler directly from the #NPF interception.
>>
>> Changes since v3:
>>   * Add support for extended guest message request.
>>   * Add ioctl to query the SNP Platform status.
>>   * Add ioctl to get and set the SNP config.
>>   * Add check to verify that memory reserved for the RMP covers the full system RAM.
>>   * Start the SNP specific commands from 256 instead of 255.
>>   * Multiple cleanup and fixes based on the review feedback.
>>
>> Changes since v2:
>>   * Add AP creation support.
>>   * Drop the patch to handle the RMP fault for the kernel address.
>>   * Add functions to track the write access from the hypervisor.
>>   * Do not enable the SNP feature when IOMMU is disabled or is in passthrough mode.
>>   * Dump the RMP entry on RMP violation for the debug.
>>   * Shorten the GHCB macro names.
>>   * Start the SNP_INIT command id from 255 to give some gap for the legacy SEV.
>>   * Sync the header with the latest 0.9 SNP spec.
>>
>> Changes since v1:
>>   * Add AP reset MSR protocol VMGEXIT NAE.
>>   * Add Hypervisor features VMGEXIT NAE.
>>   * Move the RMP table initialization and RMPUPDATE/PSMASH helper in
>>     arch/x86/kernel/sev.c.
>>   * Add support to map/unmap SEV legacy command buffer to firmware state when
>>     SNP is active.
>>   * Enhance PSP driver to provide helper to allocate/free memory used for the
>>     firmware context page.
>>   * Add support to handle RMP fault for the kernel address.
>>   * Add support to handle GUEST_REQUEST NAE event for attestation.
>>   * Rename RMP table lookup helper.
>>   * Drop typedef from rmpentry struct definition.
>>   * Drop SNP static key and use cpu_feature_enabled() to check whether SEV-SNP
>>     is active.
>>   * Multiple cleanup/fixes to address Boris review feedback.
>>
>> Brijesh Singh (40):
>>    x86/cpufeatures: Add SEV-SNP CPU feature
>>    iommu/amd: Introduce function to check SEV-SNP support
>>    x86/sev: Add the host SEV-SNP initialization support
>>    x86/sev: Add RMP entry lookup helpers
>>    x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
>>    x86/sev: Invalid pages from direct map when adding it to RMP table
>>    x86/traps: Define RMP violation #PF error code
>>    x86/fault: Add support to handle the RMP fault for user address
>>    x86/fault: Add support to dump RMP entry on fault
>>    crypto: ccp: shutdown SEV firmware on kexec
>>    crypto:ccp: Define the SEV-SNP commands
>>    crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
>>    crypto:ccp: Provide APIs to issue SEV-SNP commands
>>    crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
>>    crypto: ccp: Handle the legacy SEV command when SNP is enabled
>>    crypto: ccp: Add the SNP_PLATFORM_STATUS command
>>    crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
>>    crypto: ccp: Provide APIs to query extended attestation report
>>    KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
>>    KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
>>    KVM: SVM: Add initial SEV-SNP support
>>    KVM: SVM: Add KVM_SNP_INIT command
>>    KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
>>    KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
>>    KVM: SVM: Mark the private vma unmerable for SEV-SNP guests
>>    KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
>>    KVM: X86: Keep the NPT and RMP page level in sync
>>    KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
>>    KVM: x86: Define RMP page fault error bits for #NPF
>>    KVM: x86: Update page-fault trace to log full 64-bit error code
>>    KVM: SVM: Do not use long-lived GHCB map while setting scratch area
>>    KVM: SVM: Remove the long-lived GHCB host map
>>    KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
>>    KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
>>    KVM: SVM: Add support to handle Page State Change VMGEXIT
>>    KVM: SVM: Introduce ops for the post gfn map and unmap
>>    KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
>>    KVM: SVM: Add support to handle the RMP nested page fault
>>    KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
>>    KVM: SVM: Add module parameter to enable the SEV-SNP
>>
>> Sean Christopherson (2):
>>    KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
>>    KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP
>>
>> Tom Lendacky (3):
>>    KVM: SVM: Add support to handle AP reset MSR protocol
>>    KVM: SVM: Use a VMSA physical address variable for populating VMCB
>>    KVM: SVM: Support SEV-SNP AP Creation NAE event
>>
>>   Documentation/virt/coco/sevguest.rst          |   55 +
>>   .../virt/kvm/amd-memory-encryption.rst        |  102 +
>>   arch/x86/include/asm/cpufeatures.h            |    1 +
>>   arch/x86/include/asm/disabled-features.h      |    8 +-
>>   arch/x86/include/asm/kvm-x86-ops.h            |    5 +
>>   arch/x86/include/asm/kvm_host.h               |   20 +
>>   arch/x86/include/asm/msr-index.h              |    6 +
>>   arch/x86/include/asm/sev-common.h             |   28 +
>>   arch/x86/include/asm/sev.h                    |   45 +
>>   arch/x86/include/asm/svm.h                    |    7 +
>>   arch/x86/include/asm/trap_pf.h                |   18 +-
>>   arch/x86/kernel/cpu/amd.c                     |    3 +-
>>   arch/x86/kernel/sev.c                         |  361 ++++
>>   arch/x86/kvm/lapic.c                          |    5 +-
>>   arch/x86/kvm/mmu.h                            |    7 +-
>>   arch/x86/kvm/mmu/mmu.c                        |   84 +-
>>   arch/x86/kvm/svm/sev.c                        | 1676 ++++++++++++++++-
>>   arch/x86/kvm/svm/svm.c                        |   62 +-
>>   arch/x86/kvm/svm/svm.h                        |   74 +-
>>   arch/x86/kvm/trace.h                          |   40 +-
>>   arch/x86/kvm/x86.c                            |   92 +-
>>   arch/x86/mm/fault.c                           |   84 +-
>>   drivers/crypto/ccp/sev-dev.c                  |  924 ++++++++-
>>   drivers/crypto/ccp/sev-dev.h                  |   17 +
>>   drivers/crypto/ccp/sp-pci.c                   |   12 +
>>   drivers/iommu/amd/init.c                      |   30 +
>>   include/linux/iommu.h                         |    9 +
>>   include/linux/mm.h                            |    6 +-
>>   include/linux/psp-sev.h                       |  346 ++++
>>   include/linux/sev.h                           |   32 +
>>   include/uapi/linux/kvm.h                      |   56 +
>>   include/uapi/linux/psp-sev.h                  |   60 +
>>   mm/memory.c                                   |   13 +
>>   tools/arch/x86/include/asm/cpufeatures.h      |    1 +
>>   34 files changed, 4088 insertions(+), 201 deletions(-)
>>   create mode 100644 include/linux/sev.h
>>
>> --
>> 2.17.1
>>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-22 15:23   ` Brijesh Singh
@ 2021-11-22 17:03     ` Vlastimil Babka
  2021-11-22 18:01       ` Brijesh Singh
  2021-11-22 18:30     ` Dave Hansen
  1 sibling, 1 reply; 239+ messages in thread
From: Vlastimil Babka @ 2021-11-22 17:03 UTC (permalink / raw)
  To: Brijesh Singh, Peter Gonda
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On 11/22/21 16:23, Brijesh Singh wrote:
> Hi Peter,
> 
> On 11/12/21 9:43 AM, Peter Gonda wrote:
>> Hi Brijesh,
>>
>> One high level discussion I'd like to have on these SNP KVM patches.
>>
>> In these patches (V5) if a host userspace process writes to a guest
>> private page, a SIGBUS is issued to that process. If the kernel writes
>> to a guest private page, then the kernel panics due to the unhandled
>> RMP page fault. This is an issue because not all writes into guest
>> memory may come from a bug in the host. For instance a malicious or
>> even buggy guest could easily point the host to writing a private page
>> during the emulation of many virtual devices (virtio, NVMe, etc). For
>> example if a well behaved guest's behavior is to: start up a driver,
>> select some pages to share with the guest, ask the host to convert
>> them to shared, then use those pages for virtual device DMA, if a
>> buggy guest forgets the step to request the pages be converted to
>> shared, it's easy to see how the host could rightfully write to private
>> memory. I think we can better guarantee host reliability when running
>> SNP guests without changing SNP’s security properties.
>>
>> Here is an alternative to the current approach: On RMP violation (host
>> or userspace) the page fault handler converts the page from private to
>> shared to allow the write to continue. This pulls from s390’s error
>> handling which does exactly this. See ‘arch_make_page_accessible()’.
>> Additionally it adds less complexity to the SNP kernel patches, and
>> requires no new ABI.
>>
>> In the current (V5) KVM implementation if a userspace process
>> generates an RMP violation (writes to guest private memory) the
>> process receives a SIGBUS. At first glance, it would appear that
>> user-space shouldn’t write to private memory. However, guaranteeing
>> this in a generic fashion requires locking the RMP entries (via locks
>> external to the RMP). Otherwise, a user-space process emulating a
>> guest device IO may be vulnerable to having the guest memory
>> (maliciously or by guest bug) converted to private while user-space
>> emulation is happening. This results in a well behaved userspace
>> process receiving a SIGBUS.
>>
>> This proposal allows buggy and malicious guests to run under SNP
>> without jeopardizing the reliability / safety of host processes. This
>> is very important to a cloud service provider (CSP) since it’s common
>> to have host wide daemons that write/read all guests, i.e. a single
>> process could manage the networking for all VMs on the host. Crashing
>> that singleton process kills networking for all VMs on the system.
>>
> Thank you for starting the thread; based on the discussion, I am keeping the
> current implementation as-is and *not* going with the auto conversion from
> private to shared. To summarize what we are doing in the current SNP series:
> 
> - If userspace accesses guest private memory, it gets SIGBUS.

So, is there anything protecting host userspace processes from malicious guests?

> - If kernel accesses[*] guest private memory, it does panic.
> 
> [*] Kernel consults the RMP table for the page ownership before the access.
> If the page is shared, then it uses the locking mechanism to ensure that a
> guest will not be able to change the page ownership while the kernel has it mapped.
> 
> thanks
> 

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-22 17:03     ` Vlastimil Babka
@ 2021-11-22 18:01       ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-11-22 18:01 UTC (permalink / raw)
  To: Vlastimil Babka, Peter Gonda
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy



On 11/22/21 11:03 AM, Vlastimil Babka wrote:
> On 11/22/21 16:23, Brijesh Singh wrote:
>> Hi Peter,
>>
>> On 11/12/21 9:43 AM, Peter Gonda wrote:
>>> Hi Brijesh,
>>>
>>> One high level discussion I'd like to have on these SNP KVM patches.
>>>
>>> In these patches (V5) if a host userspace process writes to a guest
>>> private page, a SIGBUS is issued to that process. If the kernel writes
>>> to a guest private page, then the kernel panics due to the unhandled
>>> RMP page fault. This is an issue because not all writes into guest
>>> memory may come from a bug in the host. For instance a malicious or
>>> even buggy guest could easily point the host to writing a private page
>>> during the emulation of many virtual devices (virtio, NVMe, etc). For
>>> example if a well behaved guest's behavior is to: start up a driver,
>>> select some pages to share with the guest, ask the host to convert
>>> them to shared, then use those pages for virtual device DMA, if a
>>> buggy guest forgets the step to request the pages be converted to
>>> shared, it's easy to see how the host could rightfully write to private
>>> memory. I think we can better guarantee host reliability when running
>>> SNP guests without changing SNP’s security properties.
>>>
>>> Here is an alternative to the current approach: On RMP violation (host
>>> or userspace) the page fault handler converts the page from private to
>>> shared to allow the write to continue. This pulls from s390’s error
>>> handling which does exactly this. See ‘arch_make_page_accessible()’.
>>> Additionally it adds less complexity to the SNP kernel patches, and
>>> requires no new ABI.
>>>
>>> In the current (V5) KVM implementation if a userspace process
>>> generates an RMP violation (writes to guest private memory) the
>>> process receives a SIGBUS. At first glance, it would appear that
>>> user-space shouldn’t write to private memory. However, guaranteeing
>>> this in a generic fashion requires locking the RMP entries (via locks
>>> external to the RMP). Otherwise, a user-space process emulating a
>>> guest device IO may be vulnerable to having the guest memory
>>> (maliciously or by guest bug) converted to private while user-space
>>> emulation is happening. This results in a well behaved userspace
>>> process receiving a SIGBUS.
>>>
>>> This proposal allows buggy and malicious guests to run under SNP
>>> without jeopardizing the reliability / safety of host processes. This
>>> is very important to a cloud service provider (CSP) since it’s common
>>> to have host-wide daemons that read/write all guests' memory, i.e. a
>>> single process could manage the networking for all VMs on the host.
>>> Crashing that singleton process kills networking for all VMs on the
>>> system.
>>>
>> Thank you for starting the thread; based on the discussion, I am keeping the
>> current implementation as-is and *not* going with the auto conversion from
>> private to shared. To summarize what we are doing in the current SNP series:
>>
>> - If userspace accesses guest private memory, it gets SIGBUS.
> 
> So, is there anything protecting host userspace processes from malicious guests?
> 

Unfortunately, no.

In the future, we could look into Sean's suggestion to come up with an 
ABI that userspace can use to lock the guest pages before the access and 
notify the caller of the access violation. It seems that TDX may need 
something similar, but I cannot tell for sure. This proposal seems good 
at first glance, but the devil is in the details; once implemented, we 
also need to measure its performance implications.

Should we consider using SIGSEGV (SEGV_ACCERR) instead of SIGBUS? In 
other words, treat a guest's private pages as read-only so that writing 
to them generates a standard SIGSEGV.
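
As a purely illustrative userspace sketch (not from the series) of what 
a VMM daemon could then do with SA_SIGINFO; a real handler would recover 
via siglongjmp() or by remapping the page instead of exiting:

#include <signal.h>
#include <unistd.h>

static void fault_handler(int sig, siginfo_t *info, void *uctx)
{
        /* async-signal-safe reporting only */
        if (sig == SIGSEGV && info->si_code == SEGV_ACCERR)
                write(2, "write to read-only (private) page\n", 34);
        else
                write(2, "SIGBUS\n", 7);
        _exit(1);       /* returning would retry the faulting store */
}

int main(void)
{
        struct sigaction sa = {
                .sa_sigaction   = fault_handler,
                .sa_flags       = SA_SIGINFO,
        };

        sigaction(SIGSEGV, &sa, NULL);
        sigaction(SIGBUS, &sa, NULL);
        /* ... emulate device I/O against guest memory ... */
        return 0;
}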

thanks


>> - If kernel accesses[*] guest private memory, it does panic.
>>
>> [*] Kernel consults the RMP table for the page ownership before the access.
>> If the page is shared, then it uses the locking mechanism to ensure that a
>> guest will not be able to change the page ownership while the kernel has it mapped.
>>
>> thanks
>>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-22 15:23   ` Brijesh Singh
  2021-11-22 17:03     ` Vlastimil Babka
@ 2021-11-22 18:30     ` Dave Hansen
  2021-11-22 19:06       ` Brijesh Singh
  1 sibling, 1 reply; 239+ messages in thread
From: Dave Hansen @ 2021-11-22 18:30 UTC (permalink / raw)
  To: Brijesh Singh, Peter Gonda
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On 11/22/21 7:23 AM, Brijesh Singh wrote:
> Thank you for starting the thread; based on the discussion, I am keeping
> the current implementation as-is and *not* going with the auto
> conversion from private to shared. To summarize what we are doing in the
> current SNP series:
> 
> - If userspace accesses guest private memory, it gets SIGBUS.
> - If kernel accesses[*] guest private memory, it does panic.

There's a subtlety here, though.  There are really three *different*
kinds of kernel accesses that matter:

1. Kernel bugs.  Kernel goes off and touches some guest private memory
   when it didn't mean to.  Say, it runs off the end of a slab page and
   runs into a guest page.  panic() is expected here.
2. Kernel accesses guest private memory via a userspace mapping, in a
   place where it is known to be accessing userspace and is prepared to
   fault.  copy_to_user() is the most straightforward example.  Kernel
   must *not* panic().  Returning an error to the syscall is a good
   way to handle these (if in a syscall).
3. Kernel accesses guest private memory via a kernel mapping.  This one
   is tricky.  These probably *do* result in a panic() today, but
   ideally shouldn't.
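
To make case 2 concrete, a hedged sketch (my_dev_read() and 
my_dev_buffer are made-up names; copy_to_user() is the real API): the 
access goes through the uaccess exception tables, so an RMP fault can 
be fixed up and surfaced as -EFAULT instead of a panic():

#include <linux/fs.h>
#include <linux/uaccess.h>

static char my_dev_buffer[256];

/* hypothetical driver read path, illustrating case 2 above */
static ssize_t my_dev_read(struct file *file, char __user *buf,
                           size_t len, loff_t *off)
{
        if (len > sizeof(my_dev_buffer))
                len = sizeof(my_dev_buffer);

        /* may fault if buf points at guest private memory */
        if (copy_to_user(buf, my_dev_buffer, len))
                return -EFAULT;         /* recoverable, no panic() */

        return len;
}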

Could you explicitly clarify what the current behavior is?

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-22 18:30     ` Dave Hansen
@ 2021-11-22 19:06       ` Brijesh Singh
  2021-11-22 19:14         ` Dave Hansen
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-11-22 19:06 UTC (permalink / raw)
  To: Dave Hansen, Peter Gonda
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy


On 11/22/21 12:30 PM, Dave Hansen wrote:
> On 11/22/21 7:23 AM, Brijesh Singh wrote:
>> Thank you for starting the thread; based on the discussion, I am keeping
>> the current implementation as-is and *not* going with the auto
>> conversion from private to shared. To summarize what we are doing in the
>> current SNP series:
>>
>> - If userspace accesses guest private memory, it gets SIGBUS.
>> - If kernel accesses[*] guest private memory, it does panic.
> There's a subtlety here, though.  There are really three *different*
> kinds of kernel accesses that matter:
> 1. Kernel bugs.  Kernel goes off and touches some guest private memory
>    when it didn't mean to.  Say, it runs off the end of a slab page and
>    runs into a guest page.  panic() is expected here.

In the current implementation, a write to guest private memory will
trigger a kernel panic().


> 2. Kernel accesses guest private memory via a userspace mapping, in a
>    place where it is known to be accessing userspace and is prepared to
>    fault.  copy_to_user() is the most straightforward example.  Kernel
>    must *not* panic().  Returning an error to the syscall is a good
>    way to handle these (if in a syscall).

In the current implementation, copy_to_user() on guest private memory
will fail with -EFAULT.


> 3. Kernel accesses guest private memory via a kernel mapping.  This one
>    is tricky.  These probably *do* result in a panic() today, but
>    ideally shouldn't.

KVM has defined some helper functions to map and unmap the guest pages.
Those helper functions do the GPA to PFN lookup before calling the
kmap(). Those helpers are enhanced such that they check the RMP table
before the kmap() and acquire a lock to prevent a page state change
until the kunmap() is called. So, in the current implementation, we
should *not* see a panic() unless there is a KVM driver bug that didn't
use the helper functions or a bug in the helper functions themselves.
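
Roughly, the pattern looks like the sketch below; the snp_* helpers are 
illustrative stand-ins, not the actual names from the series, while 
gfn_to_pfn() and kmap() are the existing APIs:

#include <linux/highmem.h>
#include <linux/kvm_host.h>

/* hypothetical: RMP lookup and per-page state lock */
bool snp_rmp_is_shared(kvm_pfn_t pfn);
void snp_page_state_lock(kvm_pfn_t pfn);
void snp_page_state_unlock(kvm_pfn_t pfn);

static void *snp_safe_map_gfn(struct kvm *kvm, gfn_t gfn)
{
        kvm_pfn_t pfn = gfn_to_pfn(kvm, gfn);

        if (is_error_noslot_pfn(pfn))
                return NULL;

        snp_page_state_lock(pfn);       /* block page state changes */
        if (!snp_rmp_is_shared(pfn)) {  /* consult the RMP table */
                snp_page_state_unlock(pfn);
                return NULL;            /* guest private: refuse to map */
        }

        /* the page stays shared until the matching kunmap()/unlock */
        return kmap(pfn_to_page(pfn));
}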


> Could you explicitly clarify what the current behavior is?

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-22 19:06       ` Brijesh Singh
@ 2021-11-22 19:14         ` Dave Hansen
  2021-11-22 20:33           ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Dave Hansen @ 2021-11-22 19:14 UTC (permalink / raw)
  To: Brijesh Singh, Peter Gonda
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On 11/22/21 11:06 AM, Brijesh Singh wrote:
>> 3. Kernel accesses guest private memory via a kernel mapping.  This one
>>    is tricky.  These probably *do* result in a panic() today, but
>>    ideally shouldn't.
> KVM has defined some helper functions to map and unmap the guest pages.
> Those helper functions do the GPA to PFN lookup before calling the
> kmap(). Those helpers are enhanced such that they check the RMP table
> before the kmap() and acquire a lock to prevent a page state change
> until the kunmap() is called. So, in the current implementation, we
> should *not* see a panic() unless there is a KVM driver bug that didn't
> use the helper functions or a bug in the helper functions themselves.

I don't think this is really KVM specific.

Think of a remote process doing ptrace(PTRACE_POKEUSER) or pretty much
any generic get_user_pages() instance.  As long as the memory is mapped
into the page tables, you're exposed to users that walk the page tables.

How do we, for example, prevent ptrace() from inducing a panic()?

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-22 19:14         ` Dave Hansen
@ 2021-11-22 20:33           ` Brijesh Singh
  2021-11-22 21:34             ` Sean Christopherson
  2021-11-22 22:51             ` Dave Hansen
  0 siblings, 2 replies; 239+ messages in thread
From: Brijesh Singh @ 2021-11-22 20:33 UTC (permalink / raw)
  To: Dave Hansen, Peter Gonda
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy



On 11/22/21 1:14 PM, Dave Hansen wrote:
> On 11/22/21 11:06 AM, Brijesh Singh wrote:
>>> 3. Kernel accesses guest private memory via a kernel mapping.  This one
>>>     is tricky.  These probably *do* result in a panic() today, but
>>>     ideally shouldn't.
>> KVM has defined some helper functions to map and unmap the guest pages.
>> Those helper functions do the GPA to PFN lookup before calling the
>> kmap(). Those helpers are enhanced such that they check the RMP table
>> before the kmap() and acquire a lock to prevent a page state change
>> until the kunmap() is called. So, in the current implementation, we
>> should *not* see a panic() unless there is a KVM driver bug that didn't
>> use the helper functions or a bug in the helper functions themselves.
> 
> I don't think this is really KVM specific.
> 
> Think of a remote process doing ptrace(PTRACE_POKEUSER) or pretty much
> any generic get_user_pages() instance.  As long as the memory is mapped
> into the page tables, you're exposed to users that walk the page tables.
> 
> How do we, for example, prevent ptrace() from inducing a panic()?
> 

In the current approach, this access will induce a panic(). In general, 
supporting ptrace() for the encrypted VM region is going to be 
difficult. The upcoming TDX work to unmap the guest memory region from 
the current process page table can easily be extended to SNP to cover 
the current limitations.

thanks

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-22 20:33           ` Brijesh Singh
@ 2021-11-22 21:34             ` Sean Christopherson
  2021-11-22 22:51             ` Dave Hansen
  1 sibling, 0 replies; 239+ messages in thread
From: Sean Christopherson @ 2021-11-22 21:34 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Dave Hansen, Peter Gonda, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On Mon, Nov 22, 2021, Brijesh Singh wrote:
> 
> On 11/22/21 1:14 PM, Dave Hansen wrote:
> > On 11/22/21 11:06 AM, Brijesh Singh wrote:
> > > > 3. Kernel accesses guest private memory via a kernel mapping.  This one
> > > >     is tricky.  These probably *do* result in a panic() today, but
> > > >     ideally shouldn't.
> > > KVM has defined some helper functions to map and unmap the guest pages.
> > > Those helper functions do the GPA to PFN lookup before calling the
> > > kmap(). Those helpers are enhanced such that they check the RMP table
> > > before the kmap() and acquire a lock to prevent a page state change
> > > until the kunmap() is called. So, in the current implementation, we
> > > should *not* see a panic() unless there is a KVM driver bug that didn't
> > > use the helper functions or a bug in the helper functions themselves.
> > 
> > I don't think this is really KVM specific.
> > 
> > Think of a remote process doing ptrace(PTRACE_POKEUSER) or pretty much
> > any generic get_user_pages() instance.  As long as the memory is mapped
> > into the page tables, you're exposed to users that walk the page tables.
> > 
> > How do we, for example, prevent ptrace() from inducing a panic()?
> > 
> 
> In the current approach, this access will induce a panic(). In general,
> supporting ptrace() for the encrypted VM region is going to be
> difficult.

But ptrace() is just an example; any path in the kernel that accesses a gup'd
page through a kernel mapping will explode if handed a guest private page.

> The upcoming TDX work to unmap the guest memory region from the current process
> page table can easily be extended to SNP to cover the current limitations.

That represents an ABI change though.  If KVM allows userspace to create SNP guests
without any guarantees that userspace cannot coerce the kernel into accessing guest
private memory, then we are stuck supporting that behavior even if KVM later gains
the ability to provide such guarantees through new APIs.

If allowing this behavior was only a matter of the system admin opting into a
dangerous configuration, I would probably be ok merging SNP with it buried behind
EXPERT or something scarier, but this impacts KVM's ABI as well as kernel internals,
e.g. the hooks in kvm_vcpu_map() and friends are unnecessary if KVM can differentiate
between shared and private gfns in its memslots, as gfn_to_pfn() will either fail or
point at memory that is guaranteed to be in the shared state.
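
Concretely, the direction would look something like this sketch, where 
kvm_slot_gfn_is_private() is hypothetical and gfn_to_pfn() / 
KVM_PFN_ERR_FAULT exist today:

#include <linux/kvm_host.h>

/* hypothetical memslot query */
bool kvm_slot_gfn_is_private(struct kvm *kvm, gfn_t gfn);

static kvm_pfn_t gfn_to_pfn_shared_only(struct kvm *kvm, gfn_t gfn)
{
        /* never hand out a private gfn for a kernel mapping */
        if (kvm_slot_gfn_is_private(kvm, gfn))
                return KVM_PFN_ERR_FAULT;

        return gfn_to_pfn(kvm, gfn);
}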

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-22 20:33           ` Brijesh Singh
  2021-11-22 21:34             ` Sean Christopherson
@ 2021-11-22 22:51             ` Dave Hansen
  2021-11-23  5:15               ` Luck, Tony
                                 ` (3 more replies)
  1 sibling, 4 replies; 239+ messages in thread
From: Dave Hansen @ 2021-11-22 22:51 UTC (permalink / raw)
  To: Brijesh Singh, Peter Gonda
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On 11/22/21 12:33 PM, Brijesh Singh wrote:
>> How do we, for example, prevent ptrace() from inducing a panic()?
> 
> In the current approach, this access will induce a panic(). 

That needs to get fixed before SEV-SNP is merged, IMNHO.  This behavior
would effectively mean that any userspace given access to create SNP
guests would panic the kernel.

> In general, supporting ptrace() for the encrypted VM region is
> going to be difficult.

By "supporting", do you mean doing something functional?  I don't really
care if ptrace() to guest private memory returns -EINVAL or whatever.
The most important thing is not crashing the host.

Also, as Sean mentioned, this isn't really about ptrace() itself.  It's
really about ensuring that no kernel or devices accesses to guest
private memory can induce bad behavior.

> The upcoming TDX work to unmap the guest memory region from the
> current process page table can easily be extended to SNP to cover the
> current limitations.

My preference would be that we never have SEV-SNP code in the kernel
that can panic() the host from guest userspace.  If that means waiting
until there's common guest unmapping infrastructure around, then I think
we should wait.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* RE: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-22 22:51             ` Dave Hansen
@ 2021-11-23  5:15               ` Luck, Tony
  2021-11-23  7:18               ` Borislav Petkov
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 239+ messages in thread
From: Luck, Tony @ 2021-11-23  5:15 UTC (permalink / raw)
  To: Hansen, Dave, Brijesh Singh, Peter Gonda
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, marcorr,
	sathyanarayanan.kuppuswamy

> My preference would be that we never have SEV-SNP code in the kernel
> that can panic() the host from guest userspace.  If that means waiting
> until there's common guest unmapping infrastructure around, then I think
> we should wait.

Perhaps I'm missing some context ... but guests must NEVER be allowed to
panic the host.

-Tony

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-22 22:51             ` Dave Hansen
  2021-11-23  5:15               ` Luck, Tony
@ 2021-11-23  7:18               ` Borislav Petkov
  2021-11-23 15:36                 ` Sean Christopherson
  2021-11-23  8:55               ` Vlastimil Babka
  2021-11-24 16:03               ` Joerg Roedel
  3 siblings, 1 reply; 239+ messages in thread
From: Borislav Petkov @ 2021-11-23  7:18 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Brijesh Singh, Peter Gonda, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Mon, Nov 22, 2021 at 02:51:35PM -0800, Dave Hansen wrote:
> By "supporting", do you mean doing something functional?  I don't really
> care if ptrace() to guest private memory returns -EINVAL or whatever.
> The most important thing is not crashing the host.
> 
> Also, as Sean mentioned, this isn't really about ptrace() itself.  It's
> really about ensuring that no kernel or devices accesses to guest
> private memory can induce bad behavior.

I keep repeating this suggestion of mine that we should treat
guest-private pages as hw-poisoned pages which have experienced an
uncorrectable error in the past.

mm already knows how to stay away from those.
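
As a loose sketch of that idea (SetPageHWPoison()/ClearPageHWPoison() 
exist today under CONFIG_MEMORY_FAILURE; the snp_* wrappers are made 
up):

#include <linux/page-flags.h>

static void snp_mark_page_private(struct page *page)
{
        SetPageHWPoison(page);          /* mm keeps its hands off */
}

static void snp_mark_page_shared(struct page *page)
{
        ClearPageHWPoison(page);
}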

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-22 22:51             ` Dave Hansen
  2021-11-23  5:15               ` Luck, Tony
  2021-11-23  7:18               ` Borislav Petkov
@ 2021-11-23  8:55               ` Vlastimil Babka
  2021-11-24 16:03               ` Joerg Roedel
  3 siblings, 0 replies; 239+ messages in thread
From: Vlastimil Babka @ 2021-11-23  8:55 UTC (permalink / raw)
  To: Dave Hansen, Brijesh Singh, Peter Gonda
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On 11/22/21 23:51, Dave Hansen wrote:
> On 11/22/21 12:33 PM, Brijesh Singh wrote:
>>> How do we, for example, prevent ptrace() from inducing a panic()?
>> 
>> In the current approach, this access will induce a panic(). 
> 
> That needs to get fixed before SEV-SNP is merged, IMNHO.  This behavior
> would effectively mean that any userspace given access to create SNP
> guests would panic the kernel.
> 
>> In general, supporting ptrace() for the encrypted VM region is
>> going to be difficult.
> 
> By "supporting", do you mean doing something functional?  I don't really
> care if ptrace() to guest private memory returns -EINVAL or whatever.
> The most important thing is not crashing the host.
> 
> Also, as Sean mentioned, this isn't really about ptrace() itself.  It's
> really about ensuring that no kernel or devices accesses to guest
> private memory can induce bad behavior.

Then we need gup to block any changes from shared to guest private? I assume
there will be the usual issues of recognizing a temporarily elevated refcount
vs. a long-term gup, etc.

>> The upcoming TDX work to unmap the guest memory region from the
>> current process page table can easily be extended to SNP to cover the
>> current limitations.

By "current process page table" you mean userspace page tables?

> My preference would be that we never have SEV-SNP code in the kernel
> that can panic() the host from guest userspace.  If that means waiting
> until there's common guest unmapping infrastructure around, then I think
> we should wait.
> 


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-23  7:18               ` Borislav Petkov
@ 2021-11-23 15:36                 ` Sean Christopherson
  2021-11-23 16:26                   ` Borislav Petkov
  0 siblings, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2021-11-23 15:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Brijesh Singh, Peter Gonda, x86, linux-kernel, kvm,
	linux-coco, linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Tue, Nov 23, 2021, Borislav Petkov wrote:
> On Mon, Nov 22, 2021 at 02:51:35PM -0800, Dave Hansen wrote:
> > By "supporting", do you mean doing something functional?  I don't really
> > care if ptrace() to guest private memory returns -EINVAL or whatever.
> > The most important thing is not crashing the host.
> > 
> > Also, as Sean mentioned, this isn't really about ptrace() itself.  It's
> > really about ensuring that no kernel or devices accesses to guest
> > private memory can induce bad behavior.
> 
> I keep repeating this suggestion of mine that we should treat
> guest-private pages as hw-poisoned pages which have experienced an
> uncorrectable error in the past.
> 
> mm already knows how to stay away from those.

Kirill posted a few RFCs that did exactly that.  It's definitely a viable approach,
but it's a bit of a dead end, e.g. doesn't help solve page migration, is limited to
struct page, doesn't capture which KVM guest owns the memory, etc...

https://lore.kernel.org/kvm/20210416154106.23721-1-kirill.shutemov@linux.intel.com/

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-23 15:36                 ` Sean Christopherson
@ 2021-11-23 16:26                   ` Borislav Petkov
  0 siblings, 0 replies; 239+ messages in thread
From: Borislav Petkov @ 2021-11-23 16:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Dave Hansen, Brijesh Singh, Peter Gonda, x86, linux-kernel, kvm,
	linux-coco, linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Michael Roth, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On Tue, Nov 23, 2021 at 03:36:35PM +0000, Sean Christopherson wrote:
> Kirill posted a few RFCs that did exactly that.  It's definitely a viable approach,
> but it's a bit of a dead end,

One thing at a time...

> e.g. doesn't help solve page migration,

AFAICR, that needs a whole explicit and concerted effort with the
migration helper - that was one of the approaches, at least, guest's
explicit involvement, remote attestation and a bunch of other things...

> is limited to struct page

I'm no mm guy so maybe you can elaborate further.

> doesn't capture which KVM guest owns the memory, etc...

So I don't think we need this for the problem at hand. But from the
sound of it, it probably is a good idea to be able to map the guest
owner to the memory anyway.

> https://lore.kernel.org/kvm/20210416154106.23721-1-kirill.shutemov@linux.intel.com/

Right, there it is in the last patch.

Hmmkay, so we need some generic machinery which unmaps memory from
the host kernel's pagetables so that it doesn't do any stray/unwanted
accesses to it. I'd look in the direction of mm folks for what to do
exactly, though.
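
As a sketch, the existing direct-map helpers could be the building 
blocks here (set_direct_map_*() are real; the snp_* wrappers are 
hypothetical):

#include <linux/set_memory.h>

static int snp_unmap_from_direct_map(struct page *page)
{
        /* stray kernel accesses now fault instead of tripping the RMP */
        return set_direct_map_invalid_noflush(page);
}

static int snp_restore_direct_map(struct page *page)
{
        return set_direct_map_default_noflush(page);
}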

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-22 22:51             ` Dave Hansen
                                 ` (2 preceding siblings ...)
  2021-11-23  8:55               ` Vlastimil Babka
@ 2021-11-24 16:03               ` Joerg Roedel
  2021-11-24 17:48                 ` Dave Hansen
  3 siblings, 1 reply; 239+ messages in thread
From: Joerg Roedel @ 2021-11-24 16:03 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Brijesh Singh, Peter Gonda, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On Mon, Nov 22, 2021 at 02:51:35PM -0800, Dave Hansen wrote:
> My preference would be that we never have SEV-SNP code in the kernel
> that can panic() the host from guest userspace.  If that means waiting
> until there's common guest unmapping infrastructure around, then I think
> we should wait.

Can you elaborate on how to crash the host kernel from guest user-space?
If I understood correctly, it was about crashing the host kernel from
_host_ user-space.

I think the RMP-fault path in the page-fault handler needs to take the
uaccess exception tables into account before actually causing a panic.
This should solve most of the problems discussed here.
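
Something like the following in the #PF path, as a sketch; it assumes 
the RMP violation is identifiable from the error code and reuses the 
existing fixup_exception() machinery:

/*
 * Called from the #PF handler when the error code indicates an RMP
 * violation (e.g. a hypothetical X86_PF_RMP bit).
 */
static void do_kernel_rmp_fault(struct pt_regs *regs,
                                unsigned long error_code,
                                unsigned long address)
{
        /* uaccess site: fix up, the caller sees -EFAULT */
        if (fixup_exception(regs, X86_TRAP_PF, error_code, address))
                return;

        /* kernel touched guest private memory directly */
        panic("unhandled kernel RMP fault at %#lx\n", address);
}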

Maybe we also need the previously suggested copy_from/to_guest()
interfaces.

Regards,

-- 
Jörg Rödel
jroedel@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-24 16:03               ` Joerg Roedel
@ 2021-11-24 17:48                 ` Dave Hansen
  2021-11-24 19:34                   ` Vlastimil Babka
  2021-11-25 10:05                   ` Joerg Roedel
  0 siblings, 2 replies; 239+ messages in thread
From: Dave Hansen @ 2021-11-24 17:48 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Brijesh Singh, Peter Gonda, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On 11/24/21 8:03 AM, Joerg Roedel wrote:
> On Mon, Nov 22, 2021 at 02:51:35PM -0800, Dave Hansen wrote:
>> My preference would be that we never have SEV-SNP code in the kernel
>> that can panic() the host from guest userspace.  If that means waiting
>> until there's common guest unmapping infrastructure around, then I think
>> we should wait.
> Can you elaborate on how to crash the host kernel from guest user-space?
> If I understood correctly, it was about crashing the host kernel from
> _host_ user-space.

Sorry, I misspoke there.

My concern is about crashing the host kernel.  It appears that *host*
userspace can do that quite easily by inducing the host kernel to access
some guest private memory via a kernel mapping.

> I think the RMP-fault path in the page-fault handler needs to take the
> uaccess exception tables into account before actually causing a panic.
> This should solve most of the problems discussed here.

That covers things like copy_from_user().  It does not account for
things where kernel mappings are used, like where a
get_user_pages()/kmap() is in play.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-24 17:48                 ` Dave Hansen
@ 2021-11-24 19:34                   ` Vlastimil Babka
  2021-11-25 10:05                   ` Joerg Roedel
  1 sibling, 0 replies; 239+ messages in thread
From: Vlastimil Babka @ 2021-11-24 19:34 UTC (permalink / raw)
  To: Dave Hansen, Joerg Roedel
  Cc: Brijesh Singh, Peter Gonda, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On 11/24/21 18:48, Dave Hansen wrote:
> On 11/24/21 8:03 AM, Joerg Roedel wrote:
>> On Mon, Nov 22, 2021 at 02:51:35PM -0800, Dave Hansen wrote:
>>> My preference would be that we never have SEV-SNP code in the kernel
>>> that can panic() the host from guest userspace.  If that means waiting
>>> until there's common guest unmapping infrastructure around, then I think
>>> we should wait.
>> Can you elaborate on how to crash the host kernel from guest user-space?
>> If I understood correctly, it was about crashing the host kernel from
>> _host_ user-space.
> 
> Sorry, I misspoke there.
> 
> My concern is about crashing the host kernel.  It appears that *host*
> userspace can do that quite easily by inducing the host kernel to access
> some guest private memory via a kernel mapping.

I thought some of the scenarios discussed here also went along the lines
of "guest (doesn't matter if userspace or kernel) shares a page with the
host, invokes some host kernel operation and in parallel makes the page
private again".

>> I think the RMP-fault path in the page-fault handler needs to take the
>> uaccess exception tables into account before actually causing a panic.
>> This should solve most of the problems discussed here.
> 
> That covers things like copy_from_user().  It does not account for
> things where kernel mappings are used, like where a
> get_user_pages()/kmap() is in play.
> 


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-24 17:48                 ` Dave Hansen
  2021-11-24 19:34                   ` Vlastimil Babka
@ 2021-11-25 10:05                   ` Joerg Roedel
  2021-11-29 14:44                     ` Brijesh Singh
  2021-11-29 16:41                     ` Dave Hansen
  1 sibling, 2 replies; 239+ messages in thread
From: Joerg Roedel @ 2021-11-25 10:05 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Brijesh Singh, Peter Gonda, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On Wed, Nov 24, 2021 at 09:48:14AM -0800, Dave Hansen wrote:
> That covers things like copy_from_user().  It does not account for
> things where kernel mappings are used, like where a
> get_user_pages()/kmap() is in play.

The kmap case is guarded by KVM code, which locks the page first so that
the guest can't change the page state, then checks the page state, and
if it is shared does the kmap and the access.

This should turn an RMP fault in the kernel which is not covered in the
uaccess exception table into a fatal error.

Regards,

-- 
Jörg Rödel
jroedel@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-25 10:05                   ` Joerg Roedel
@ 2021-11-29 14:44                     ` Brijesh Singh
  2021-11-29 14:58                       ` Vlastimil Babka
  2021-11-29 16:41                     ` Dave Hansen
  1 sibling, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-11-29 14:44 UTC (permalink / raw)
  To: Joerg Roedel, Dave Hansen
  Cc: brijesh.singh, Peter Gonda, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy



On 11/25/21 4:05 AM, Joerg Roedel wrote:
> On Wed, Nov 24, 2021 at 09:48:14AM -0800, Dave Hansen wrote:
>> That covers things like copy_from_user().  It does not account for
>> things where kernel mappings are used, like where a
>> get_user_pages()/kmap() is in play.
> 
> The kmap case is guarded by KVM code, which locks the page first so that
> the guest can't change the page state, then checks the page state, and
> if it is shared does the kmap and the access.


The KVM use-case is well covered in the series, but I believe Dave is 
highlighting what happens if the access comes from outside the KVM 
driver (such as ptrace() or others).

One possible approach to fix this is to enlighten kmap()/kunmap(). 
Basically, move the per-page locking mechanism used by KVM into the 
arch-specific code and have kmap()/kunmap() call the arch hooks. The 
arch hooks will do this:

Before the map, check whether the page is marked shared in the RMP 
table. If it is not shared, return an error.
Acquire a per-page map_lock.
Release the per-page map_lock on kunmap().

The current patch set provides helpers to change a page from private 
to shared. Enhance those helpers to check the per-page map_lock: if 
the map_lock is held, do not allow changing the page from shared to 
private.
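
A loose sketch of those hooks; every snp_* identifier below is made up 
for illustration:

/* hypothetical per-page map_lock and RMP query */
bool snp_rmp_is_shared(unsigned long pfn);
void snp_page_map_lock(unsigned long pfn);
void snp_page_map_unlock(unsigned long pfn);

int arch_kmap_begin(struct page *page)          /* called from kmap() */
{
        unsigned long pfn = page_to_pfn(page);

        if (!snp_rmp_is_shared(pfn))            /* RMP table lookup */
                return -EPERM;                  /* guest private */

        snp_page_map_lock(pfn);                 /* held until kunmap() */
        return 0;
}

void arch_kmap_end(struct page *page)           /* called from kunmap() */
{
        snp_page_map_unlock(page_to_pfn(page));
}

/*
 * The shared->private helper would then refuse the transition while
 * the per-page map_lock is held.
 */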

Thoughts ?

> 
> This should turn an RMP fault in the kernel which is not covered in the
> uaccess exception table into a fatal error.
> 
> Regards,
> 

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-29 14:44                     ` Brijesh Singh
@ 2021-11-29 14:58                       ` Vlastimil Babka
  2021-11-29 16:13                         ` Brijesh Singh
  0 siblings, 1 reply; 239+ messages in thread
From: Vlastimil Babka @ 2021-11-29 14:58 UTC (permalink / raw)
  To: Brijesh Singh, Joerg Roedel, Dave Hansen
  Cc: Peter Gonda, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On 11/29/21 15:44, Brijesh Singh wrote:
> 
> 
> On 11/25/21 4:05 AM, Joerg Roedel wrote:
>> On Wed, Nov 24, 2021 at 09:48:14AM -0800, Dave Hansen wrote:
>>> That covers things like copy_from_user().  It does not account for
>>> things where kernel mappings are used, like where a
>>> get_user_pages()/kmap() is in play.
>>
>> The kmap case is guarded by KVM code, which locks the page first so that
>> the guest can't change the page state, then checks the page state, and
>> if it is shared does the kmap and the access.
> 
> 
> The KVM use-case is well covered in the series, but I believe Dave is
> highlighting what happens if the access comes from outside the KVM driver
> (such as ptrace() or others).

AFAIU ptrace() is a scenario where the userspace mapping is being gup-ped,
not a kernel page being kmap()ed?

> One possible approach to fix this is to enlighten kmap()/kunmap().
> Basically, move the per-page locking mechanism used by KVM into the
> arch-specific code and have kmap()/kunmap() call the arch hooks. The arch
> hooks will do this:
> 
> Before the map, check whether the page is marked shared in the RMP
> table. If it is not shared, return an error.
> Acquire a per-page map_lock.
> Release the per-page map_lock on kunmap().
> 
> The current patch set provides helpers to change a page from private to
> shared. Enhance those helpers to check the per-page map_lock: if the
> map_lock is held, do not allow changing the page from shared to private.

That could work for the kmap() context.
What to do for the userspace context (host userspace)?
- shared->private transition - page has to be unmapped from all userspace,
elevated refcount (gup() in progress) can block this unmap until it goes
away - could be doable
- still, what to do if host userspace then tries to access the unmapped
page? SIGSEGV instead of SIGBUS and it can recover?



> Thoughts ?
> 
>>
>> This should turn an RMP fault in the kernel which is not covered in the
>> uaccess exception table into a fatal error.
>>
>> Regards,
>>


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-29 14:58                       ` Vlastimil Babka
@ 2021-11-29 16:13                         ` Brijesh Singh
  2021-11-30 19:40                           ` Vlastimil Babka
  0 siblings, 1 reply; 239+ messages in thread
From: Brijesh Singh @ 2021-11-29 16:13 UTC (permalink / raw)
  To: Vlastimil Babka, Joerg Roedel, Dave Hansen
  Cc: brijesh.singh, Peter Gonda, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy



On 11/29/21 8:58 AM, Vlastimil Babka wrote:
> On 11/29/21 15:44, Brijesh Singh wrote:
>>
>>
>> On 11/25/21 4:05 AM, Joerg Roedel wrote:
>>> On Wed, Nov 24, 2021 at 09:48:14AM -0800, Dave Hansen wrote:
>>>> That covers things like copy_from_user().  It does not account for
>>>> things where kernel mappings are used, like where a
>>>> get_user_pages()/kmap() is in play.
>>>
>>> The kmap case is guarded by KVM code, which locks the page first so that
>>> the guest can't change the page state, then checks the page state, and
>>> if it is shared does the kmap and the access.
>>
>>
>> The KVM use-case is well covered in the series, but I believe Dave is
>> highlighting what happens if the access comes from outside the KVM driver
>> (such as ptrace() or others).
> 
> AFAIU ptrace() is a scenario where the userspace mapping is being gup-ped,
> not a kernel page being kmap()ed?
> 

Yes that is correct.

>> One possible approach to fix this is to enlighten kmap()/kunmap().
>> Basically, move the per-page locking mechanism used by KVM into the
>> arch-specific code and have kmap()/kunmap() call the arch hooks. The arch
>> hooks will do this:
>>
>> Before the map, check whether the page is marked shared in the RMP
>> table. If it is not shared, return an error.
>> Acquire a per-page map_lock.
>> Release the per-page map_lock on kunmap().
>>
>> The current patch set provides helpers to change a page from private to
>> shared. Enhance those helpers to check the per-page map_lock: if the
>> map_lock is held, do not allow changing the page from shared to private.
> 
> That could work for the kmap() context.
> What to do for the userspace context (host userspace)?
> - shared->private transition - page has to be unmapped from all userspace,
> elevated refcount (gup() in progress) can block this unmap until it goes
> away - could be doable

An unmap of the page from all the userspace processes during the page 
state transition will be great. If we can somehow store the state 
information in the 'struct page' then it can later be used to make 
better decisions. I am not sure that relying on the elevated refcount 
is the correct approach, e.g. in the case of encrypted guests, the HV 
may pin the page to prevent it from migrating.

Thoughts on how you want to approach unmapping the page from the 
userspace page tables?


> - still, what to do if host userspace then tries to access the unmapped
> page? SIGSEGV instead of SIGBUS and it can recover?
> 

Yes, SIGSEGV makes sense to me.


> 
> 
>> Thoughts ?
>>
>>>
>>> This should turn an RMP fault in the kernel which is not covered in the
>>> uaccess exception table into a fatal error.
>>>
>>> Regards,
>>>
> 

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-25 10:05                   ` Joerg Roedel
  2021-11-29 14:44                     ` Brijesh Singh
@ 2021-11-29 16:41                     ` Dave Hansen
  1 sibling, 0 replies; 239+ messages in thread
From: Dave Hansen @ 2021-11-29 16:41 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Brijesh Singh, Peter Gonda, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On 11/25/21 2:05 AM, Joerg Roedel wrote:
> On Wed, Nov 24, 2021 at 09:48:14AM -0800, Dave Hansen wrote:
>> That covers things like copy_from_user().  It does not account for
>> things where kernel mappings are used, like where a
>> get_user_pages()/kmap() is in play.
> The kmap case is guarded by KVM code, which locks the page first so that
> the guest can't change the page state, then checks the page state, and
> if it is shared does the kmap and the access.
> 
> This should turn an RMP fault in the kernel which is not covered in the
> uaccess exception table into a fatal error.

Let's say something does process_vm_readv() where the pid is a qemu
process and it is writing to a guest private memory area.  The syscall
will eventually end up in process_vm_rw_single_vec() which does:

>                 pinned_pages = pin_user_pages_remote(mm, pa, pinned_pages,
>                                                      flags, process_pages,
>                                                      NULL, &locked);
...
>                 rc = process_vm_rw_pages(process_pages,
>                                          start_offset, bytes, iter,
>                                          vm_write);


and eventually in copy_page_from_iter():

>                 void *kaddr = kmap_local_page(page);
>                 size_t wanted = _copy_from_iter(kaddr + offset, bytes, i);
>                 kunmap_local(kaddr);

The kernel access to 'kaddr+offset' shouldn't fault.  How does the KVM
code thwart that kmap_local_page()?

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
  2021-11-29 16:13                         ` Brijesh Singh
@ 2021-11-30 19:40                           ` Vlastimil Babka
  0 siblings, 0 replies; 239+ messages in thread
From: Vlastimil Babka @ 2021-11-30 19:40 UTC (permalink / raw)
  To: Brijesh Singh, Joerg Roedel, Dave Hansen
  Cc: Peter Gonda, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy

On 11/29/21 17:13, Brijesh Singh wrote:
>>
>> That could work for the kmap() context.
>> What to do for the userspace context (host userspace)?
>> - shared->private transition - page has to be unmapped from all userspace,
>> elevated refcount (gup() in progress) can block this unmap until it goes
>> away - could be doable
> 
> An unmap of the page from all the userspace processes during the page state
> transition will be great. If we can somehow store the state information in
> the 'struct page' then it can later be used to make better decisions. I am
> not sure that relying on the elevated refcount is the correct approach, e.g.
> in the case of encrypted guests, the HV may pin the page to prevent it from
> migrating.
> 
> Thoughts on how you want to approach unmapping the page from the userspace
> page tables?

After giving it more thought and rereading the threads here, it seems I
thought it would be easier than it really is; it would have to be
something at least like Kirill's hwpoison-based approach.

>> - still, what to do if host userspace then tries to access the unmapped
>> page? SIGSEGV instead of SIGBUS and it can recover?
>>
> 
> Yes, SIGSEGV makes sense to me.

OTOH, IIUC the newer fd-based proposal also takes care of this part
better: the host userspace controls the guest's shared->private
conversion requests, so it can't be tricked into accessing a page that's
changed under it.

>>
>>
>>> Thoughts ?
>>>
>>>>
>>>> This should turn an RMP fault in the kernel which is not covered in the
>>>> uaccess exception table into a fatal error.
>>>>
>>>> Regards,
>>>>
>>


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 14/45] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2021-08-20 15:58 ` [PATCH Part2 v5 14/45] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Brijesh Singh
@ 2022-02-25 18:03   ` Alper Gun
  2022-03-01 14:12     ` Brijesh Singh
  2022-06-14  0:10   ` Alper Gun
  1 sibling, 1 reply; 239+ messages in thread
From: Alper Gun @ 2022-02-25 18:03 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 9:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> The behavior and requirements for the SEV-legacy commands are altered when
> the SNP firmware is in the INIT state. See the SEV-SNP firmware specification
> for more details.
>
> Allocate the Trusted Memory Region (TMR) as a 2MB sized/aligned region
> when SNP is enabled to satisfy the new requirements for SNP. Continue
> allocating a 1MB region for the !SNP configuration.
>
> While at it, provide an API that can be used by others to allocate a page
> that can be used by the firmware. The immediate user for this API will
> be the KVM driver. The KVM driver needs to allocate a firmware context
> page during guest creation. The context page needs to be updated
> by the firmware. See the SEV-SNP specification for further details.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  drivers/crypto/ccp/sev-dev.c | 169 ++++++++++++++++++++++++++++++++++-
>  include/linux/psp-sev.h      |  11 +++
>  2 files changed, 176 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 01edad9116f2..34dc358b13b9 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -62,6 +62,14 @@ static int psp_timeout;
>  #define SEV_ES_TMR_SIZE                (1024 * 1024)
>  static void *sev_es_tmr;
>
> +/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB size. */
> +#define SEV_SNP_ES_TMR_SIZE    (2 * 1024 * 1024)
> +
> +static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
> +
> +static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
> +static int sev_do_cmd(int cmd, void *data, int *psp_ret);
> +
>  static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
>  {
>         struct sev_device *sev = psp_master->sev_data;
> @@ -159,6 +167,156 @@ static int sev_cmd_buffer_len(int cmd)
>         return 0;
>  }
>
> +static void snp_leak_pages(unsigned long pfn, unsigned int npages)
> +{
> +       WARN(1, "psc failed, pfn 0x%lx pages %d (leaking)\n", pfn, npages);
> +       while (npages--) {
> +               memory_failure(pfn, 0);
> +               dump_rmpentry(pfn);
> +               pfn++;
> +       }
> +}
> +
> +static int snp_reclaim_pages(unsigned long pfn, unsigned int npages, bool locked)
> +{
> +       struct sev_data_snp_page_reclaim data;
> +       int ret, err, i, n = 0;
> +
> +       for (i = 0; i < npages; i++) {
> +               memset(&data, 0, sizeof(data));
> +               data.paddr = pfn << PAGE_SHIFT;
> +
> +               if (locked)
> +                       ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> +               else
> +                       ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> +               if (ret)
> +                       goto cleanup;
> +
> +               ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> +               if (ret)
> +                       goto cleanup;
> +
> +               pfn++;
> +               n++;
> +       }
> +
> +       return 0;
> +
> +cleanup:
> +       /*
> +        * If we failed to reclaim the page then the page is no longer
> +        * safe to be released; leak it.
> +        */
> +       snp_leak_pages(pfn, npages - n);
> +       return ret;
> +}
> +
> +static inline int rmp_make_firmware(unsigned long pfn, int level)
> +{
> +       return rmp_make_private(pfn, 0, level, 0, true);
> +}
> +
> +static int snp_set_rmp_state(unsigned long paddr, unsigned int npages, bool to_fw, bool locked,
> +                            bool need_reclaim)
> +{
> +       unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT; /* C-bit may be set in the paddr */
> +       int rc, n = 0, i;
> +
> +       for (i = 0; i < npages; i++) {
> +               if (to_fw)
> +                       rc = rmp_make_firmware(pfn, PG_LEVEL_4K);
> +               else
> +                       rc = need_reclaim ? snp_reclaim_pages(pfn, 1, locked) :
> +                                           rmp_make_shared(pfn, PG_LEVEL_4K);
> +               if (rc)
> +                       goto cleanup;
> +
> +               pfn++;
> +               n++;
> +       }
> +
> +       return 0;
> +
> +cleanup:
> +       /* Try unrolling the firmware state changes */
> +       if (to_fw) {
> +               /*
> +                * Reclaim the pages which were already changed to the
> +                * firmware state.
> +                */
> +               snp_reclaim_pages(paddr >> PAGE_SHIFT, n, locked);
> +
> +               return rc;
> +       }
> +
> +       /*
> +        * If we failed to change the page state to shared, then it's not
> +        * safe to release the page back to the system; leak it.
> +        */
> +       snp_leak_pages(pfn, npages - n);
> +
> +       return rc;
> +}
> +
> +static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
> +{
> +       unsigned long npages = 1ul << order, paddr;
> +       struct sev_device *sev;
> +       struct page *page;
> +
> +       if (!psp_master || !psp_master->sev_data)
> +               return ERR_PTR(-EINVAL);
> +
> +       page = alloc_pages(gfp_mask, order);
> +       if (!page)
> +               return NULL;
> +
> +       /* If SEV-SNP is initialized then add the page in RMP table. */
> +       sev = psp_master->sev_data;
> +       if (!sev->snp_inited)
> +               return page;
> +
> +       paddr = __pa((unsigned long)page_address(page));
> +       if (snp_set_rmp_state(paddr, npages, true, locked, false))
> +               return NULL;
> +
> +       return page;
> +}
> +
> +void *snp_alloc_firmware_page(gfp_t gfp_mask)
> +{
> +       struct page *page;
> +
> +       page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
> +
> +       return page ? page_address(page) : NULL;
> +}
> +EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
> +
> +static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
> +{
> +       unsigned long paddr, npages = 1ul << order;
> +
> +       if (!page)
> +               return;
> +
> +       paddr = __pa((unsigned long)page_address(page));
> +       if (snp_set_rmp_state(paddr, npages, false, locked, true))
> +               return;
> +
> +       __free_pages(page, order);
> +}
> +
> +void snp_free_firmware_page(void *addr)
> +{
> +       if (!addr)
> +               return;
> +
> +       __snp_free_firmware_pages(virt_to_page(addr), 0, false);
> +}
> +EXPORT_SYMBOL(snp_free_firmware_page);
> +
>  static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>  {
>         struct psp_device *psp = psp_master;
> @@ -281,7 +439,7 @@ static int __sev_platform_init_locked(int *error)
>
>                 data.flags |= SEV_INIT_FLAGS_SEV_ES;
>                 data.tmr_address = tmr_pa;
> -               data.tmr_len = SEV_ES_TMR_SIZE;
> +               data.tmr_len = sev_es_tmr_size;
>         }
>
>         rc = __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
> @@ -638,6 +796,8 @@ static int __sev_snp_init_locked(int *error)
>         sev->snp_inited = true;
>         dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
>
> +       sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
> +
>         return rc;
>  }
>
> @@ -1161,8 +1321,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
>                 /* The TMR area was encrypted, flush it from the cache */
>                 wbinvd_on_all_cpus();
>
> -               free_pages((unsigned long)sev_es_tmr,
> -                          get_order(SEV_ES_TMR_SIZE));
> +               __snp_free_firmware_pages(virt_to_page(sev_es_tmr),
> +                                         get_order(sev_es_tmr_size),
> +                                         false);
Shouldn't there be a check here for snp_inited before calling RMPUPDATE?
The TMR page can exist even if SNP is not supported.

>                 sev_es_tmr = NULL;
>         }
>
> @@ -1233,7 +1394,7 @@ void sev_pci_init(void)
>         }
>
>         /* Obtain the TMR memory area for SEV-ES use */
> -       tmr_page = alloc_pages(GFP_KERNEL, get_order(SEV_ES_TMR_SIZE));
> +       tmr_page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(sev_es_tmr_size), false);
>         if (tmr_page) {
>                 sev_es_tmr = page_address(tmr_page);
>         } else {
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index f2105a8755f9..00bd684dc094 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -12,6 +12,8 @@
>  #ifndef __PSP_SEV_H__
>  #define __PSP_SEV_H__
>
> +#include <linux/sev.h>
> +
>  #include <uapi/linux/psp-sev.h>
>
>  #ifdef CONFIG_X86
> @@ -919,6 +921,8 @@ int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error);
>  int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error);
>
>  void *psp_copy_user_blob(u64 uaddr, u32 len);
> +void *snp_alloc_firmware_page(gfp_t mask);
> +void snp_free_firmware_page(void *addr);
>
>  #else  /* !CONFIG_CRYPTO_DEV_SP_PSP */
>
> @@ -960,6 +964,13 @@ static inline int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *erro
>         return -ENODEV;
>  }
>
> +static inline void *snp_alloc_firmware_page(gfp_t mask)
> +{
> +       return NULL;
> +}
> +
> +static inline void snp_free_firmware_page(void *addr) { }
> +
>  #endif /* CONFIG_CRYPTO_DEV_SP_PSP */
>
>  #endif /* __PSP_SEV_H__ */
> --
> 2.17.1
>
>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 14/45] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2022-02-25 18:03   ` Alper Gun
@ 2022-03-01 14:12     ` Brijesh Singh
  0 siblings, 0 replies; 239+ messages in thread
From: Brijesh Singh @ 2022-03-01 14:12 UTC (permalink / raw)
  To: Alper Gun
  Cc: brijesh.singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

Hi Alper,

On 2/25/22 12:03, Alper Gun wrote:

>>
>> -               free_pages((unsigned long)sev_es_tmr,
>> -                          get_order(SEV_ES_TMR_SIZE));
>> +               __snp_free_firmware_pages(virt_to_page(sev_es_tmr),
>> +                                         get_order(sev_es_tmr_size),
>> +                                         false);
> Shouldn't there be a check here for snp_inited before calling RMPUPDATE?
> The TMR page can exist even if SNP is not supported.
> 

Yes, I have this fix in my WIP branch and it will be included in v6.
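
Roughly, the idea is to skip the RMP state transition in
__snp_free_firmware_pages() when SNP has not been initialized, e.g.
(untested sketch against the v5 code quoted above; the exact form may
differ in v6):

static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
{
	unsigned long paddr, npages = 1ul << order;
	struct sev_device *sev = psp_master ? psp_master->sev_data : NULL;

	if (!page)
		return;

	paddr = __pa((unsigned long)page_address(page));

	/*
	 * Only transition the pages in the RMP table when SNP was actually
	 * initialized; the TMR page can exist without SNP.
	 */
	if (sev && sev->snp_inited &&
	    snp_set_rmp_state(paddr, npages, false, locked, true))
		return;

	__free_pages(page, order);
}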

thanks

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 27/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
  2021-08-20 15:59 ` [PATCH Part2 v5 27/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Brijesh Singh
@ 2022-05-18 20:21   ` Marc Orr
  2022-05-18 20:35     ` Kalra, Ashish
  0 siblings, 1 reply; 239+ messages in thread
From: Marc Orr @ 2022-05-18 20:21 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: x86, LKML, kvm list, linux-coco, linux-mm,
	Linux Crypto Mailing List, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Gonda, Peter Zijlstra, Srinivas Pandruvada, David Rientjes,
	Dov Murik, Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Sathyanarayanan Kuppuswamy, Alper Gun

> @@ -2364,16 +2467,29 @@ static void sev_flush_guest_memory(struct vcpu_svm *svm, void *va,
>  void sev_free_vcpu(struct kvm_vcpu *vcpu)
>  {
>         struct vcpu_svm *svm;
> +       u64 pfn;
>
>         if (!sev_es_guest(vcpu->kvm))
>                 return;
>
>         svm = to_svm(vcpu);
> +       pfn = __pa(svm->vmsa) >> PAGE_SHIFT;
>
>         if (vcpu->arch.guest_state_protected)
>                 sev_flush_guest_memory(svm, svm->vmsa, PAGE_SIZE);
> +
> +       /*
> +        * If it's an SNP guest, then the VMSA was added in the RMP entry as
> +        * a guest-owned page. Transition the page to hypervisor state
> +        * before releasing it back to the system.
> +        */
> +       if (sev_snp_guest(vcpu->kvm) &&
> +           host_rmp_make_shared(pfn, PG_LEVEL_4K, false))
> +               goto skip_vmsa_free;
> +
>         __free_page(virt_to_page(svm->vmsa));
>
> +skip_vmsa_free:
>         if (svm->ghcb_sa_free)
>                 kfree(svm->ghcb_sa);
>  }

Hi Ashish. We're still working with this patch set internally. We
found a bug that I wanted to report in this patch. Above, we need to
flush the VMSA page, `svm->vmsa`, _after_ we call
`host_rmp_make_shared()` to mark the page as shared. Otherwise, the
host gets an RMP violation when it tries to flush the guest-owned VMSA
page.

The bug was silent, at least on our Milan platforms, before
d45829b351ee6 ("KVM: SVM: Flush when freeing encrypted pages even on
SME_COHERENT CPUs"), because the `sev_flush_guest_memory()` helper was
a noop on platforms with the SME_COHERENT feature. However, after
d45829b351ee6, we unconditionally do the flush to keep the IO address
space coherent. And then we hit this bug.

Thanks,
Marc

^ permalink raw reply	[flat|nested] 239+ messages in thread

* RE: [PATCH Part2 v5 27/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
  2022-05-18 20:21   ` Marc Orr
@ 2022-05-18 20:35     ` Kalra, Ashish
  0 siblings, 0 replies; 239+ messages in thread
From: Kalra, Ashish @ 2022-05-18 20:35 UTC (permalink / raw)
  To: Marc Orr
  Cc: x86, LKML, kvm list, linux-coco, linux-mm,
	Linux Crypto Mailing List, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Lendacky, Thomas, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Gonda, Peter Zijlstra, Srinivas Pandruvada, David Rientjes,
	Dov Murik, Tobin Feldman-Fitzthum, Borislav Petkov, Roth,
	Michael, Vlastimil Babka, Kirill A . Shutemov, Andi Kleen,
	Tony Luck, Sathyanarayanan Kuppuswamy, Alper Gun

Hello Marc,

> @@ -2364,16 +2467,29 @@ static void sev_flush_guest_memory(struct 
> vcpu_svm *svm, void *va,  void sev_free_vcpu(struct kvm_vcpu *vcpu)  {
>         struct vcpu_svm *svm;
> +       u64 pfn;
>
>         if (!sev_es_guest(vcpu->kvm))
>                 return;
>
>         svm = to_svm(vcpu);
> +       pfn = __pa(svm->vmsa) >> PAGE_SHIFT;
>
>         if (vcpu->arch.guest_state_protected)
>                 sev_flush_guest_memory(svm, svm->vmsa, PAGE_SIZE);
> +
> +       /*
> +        * If it's an SNP guest, then the VMSA was added in the RMP entry as
> +        * a guest-owned page. Transition the page to hypervisor state
> +        * before releasing it back to the system.
> +        */
> +       if (sev_snp_guest(vcpu->kvm) &&
> +           host_rmp_make_shared(pfn, PG_LEVEL_4K, false))
> +               goto skip_vmsa_free;
> +
>         __free_page(virt_to_page(svm->vmsa));
>
> +skip_vmsa_free:
>         if (svm->ghcb_sa_free)
>                 kfree(svm->ghcb_sa);
>  }

> Hi Ashish. We're still working with this patch set internally. We
> found a bug that I wanted to report in this patch. Above, we need to
> flush the VMSA page, `svm->vmsa`, _after_ we call
> `host_rmp_make_shared()` to mark the page as shared. Otherwise, the
> host gets an RMP violation when it tries to flush the guest-owned VMSA
> page.

> The bug was silent, at least on our Milan platforms, before
> d45829b351ee6 ("KVM: SVM: Flush when freeing encrypted pages even on
> SME_COHERENT CPUs"), because the `sev_flush_guest_memory()` helper was
> a noop on platforms with the SME_COHERENT feature. However, after
> d45829b351ee6, we unconditionally do the flush to keep the IO address
> space coherent. And then we hit this bug.

Yes, I have already hit this bug and added a fix as below:

commit 944fba38cbd3baf1ece76197630bd45e83089f14
Author: Ashish Kalra <ashish.kalra@amd.com>
Date:   Tue May 3 14:33:29 2022 +0000

    KVM: SVM: Fix VMSA flush for an SNP guest.
    
    If it's an SNP guest, then the VMSA was added in the RMP entry as
    a guest-owned page and also removed from the kernel direct map,
    so flush it later, after it is transitioned back to hypervisor
    state and restored in the direct map.
    
    Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index cc7c34d8b0db..0f772a0f1d35 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2840,27 +2840,23 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
 
        svm = to_svm(vcpu);
 
-       if (vcpu->arch.guest_state_protected)
-               sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
-
        /*
         * If it's an SNP guest, then the VMSA was added in the RMP entry as
         * a guest-owned page. Transition the page to hypervisor state
         * before releasing it back to the system.
+        * Also the page is removed from the kernel direct map, so flush it
+        * later after it is transitioned back to hypervisor state and
+        * restored in the direct map.
         */
        if (sev_snp_guest(vcpu->kvm)) {
                u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
                if (host_rmp_make_shared(pfn, PG_LEVEL_4K, false))
                        goto skip_vmsa_free;
        }
 
+       if (vcpu->arch.guest_state_protected)
+               sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
+
        __free_page(virt_to_page(svm->sev_es.vmsa));
 
 skip_vmsa_free:


This will be part of the next set of hypervisor patches, which we will be posting soon.
Thanks,
Ashish

^ permalink raw reply related	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 04/45] x86/sev: Add RMP entry lookup helpers
  2021-08-20 15:58 ` [PATCH Part2 v5 04/45] x86/sev: Add RMP entry lookup helpers Brijesh Singh
  2021-09-24  9:49   ` Borislav Petkov
@ 2022-06-02 11:57   ` Jarkko Sakkinen
  1 sibling, 0 replies; 239+ messages in thread
From: Jarkko Sakkinen @ 2022-06-02 11:57 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	marcorr, sathyanarayanan.kuppuswamy

On Fri, Aug 20, 2021 at 10:58:37AM -0500, Brijesh Singh wrote:
> The snp_lookup_page_in_rmptable() can be used by the host to read the RMP
> entry for a given page. The RMP entry format is documented in AMD PPR, see
> https://bugzilla.kernel.org/attachment.cgi?id=296015.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/include/asm/sev.h | 27 ++++++++++++++++++++++++
>  arch/x86/kernel/sev.c      | 43 ++++++++++++++++++++++++++++++++++++++
>  include/linux/sev.h        | 30 ++++++++++++++++++++++++++
>  3 files changed, 100 insertions(+)
>  create mode 100644 include/linux/sev.h
> 
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index a5f0a1c3ccbe..5b1a6a075c47 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -9,6 +9,7 @@
>  #define __ASM_ENCRYPTED_STATE_H
>  
>  #include <linux/types.h>
> +#include <linux/sev.h>
>  #include <asm/insn.h>
>  #include <asm/sev-common.h>
>  #include <asm/bootparam.h>
> @@ -77,6 +78,32 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
>  
>  /* RMP page size */
>  #define RMP_PG_SIZE_4K			0
> +#define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
> +
> +/*
> + * The RMP entry format is not architectural. The format is defined in PPR
> + * Family 19h Model 01h, Rev B1 processor.
> + */
> +struct __packed rmpentry {
> +	union {
> +		struct {
> +			u64	assigned	: 1,
> +				pagesize	: 1,
> +				immutable	: 1,
> +				rsvd1		: 9,
> +				gpa		: 39,
> +				asid		: 10,
> +				vmsa		: 1,
> +				validated	: 1,
> +				rsvd2		: 1;
> +		} info;
> +		u64 low;
> +	};
> +	u64 high;
> +};
> +
> +#define rmpentry_assigned(x)	((x)->info.assigned)
> +#define rmpentry_pagesize(x)	((x)->info.pagesize)
>  
>  #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
>  
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 7936c8139c74..f383d2a89263 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -54,6 +54,8 @@
>   * bookkeeping, the range need to be added during the RMP entry lookup.
>   */
>  #define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
> +#define RMPENTRY_SHIFT			8
> +#define rmptable_page_offset(x)	(RMPTABLE_CPU_BOOKKEEPING_SZ + (((unsigned long)x) >> RMPENTRY_SHIFT))
>  
>  /* For early boot hypervisor communication in SEV-ES enabled guests */
>  static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
> @@ -2376,3 +2378,44 @@ static int __init snp_rmptable_init(void)
>   * available after subsys_initcall().
>   */
>  fs_initcall(snp_rmptable_init);
> +
> +static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
> +{
> +	unsigned long vaddr, paddr = pfn << PAGE_SHIFT;
> +	struct rmpentry *entry, *large_entry;
> +
> +	if (!pfn_valid(pfn))
> +		return ERR_PTR(-EINVAL);
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return ERR_PTR(-ENXIO);
> +
> +	vaddr = rmptable_start + rmptable_page_offset(paddr);
> +	if (unlikely(vaddr > rmptable_end))
> +		return ERR_PTR(-ENXIO);
> +
> +	entry = (struct rmpentry *)vaddr;
> +
> +	/* Read a large RMP entry to get the correct page level used in RMP entry. */
> +	vaddr = rmptable_start + rmptable_page_offset(paddr & PMD_MASK);
> +	large_entry = (struct rmpentry *)vaddr;
> +	*level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(large_entry));
> +
> +	return entry;
> +}
> +
> +/*
> + * Return 1 if the RMP entry is assigned, 0 if it exists but is not assigned,
> + * and -errno if there is no corresponding RMP entry.

According to /usr/include/asm-generic/errno.h, there are 133
error codes. Why do you need so many just to say that there
is no entry?

> + */
> +int snp_lookup_rmpentry(u64 pfn, int *level)
> +{
> +	struct rmpentry *e;
> +
> +	e = __snp_lookup_rmpentry(pfn, level);
> +	if (IS_ERR(e))
> +		return PTR_ERR(e);
> +
> +	return !!rmpentry_assigned(e);
> +}
> +EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> diff --git a/include/linux/sev.h b/include/linux/sev.h
> new file mode 100644
> index 000000000000..1a68842789e1
> --- /dev/null
> +++ b/include/linux/sev.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * AMD Secure Encrypted Virtualization
> + *
> + * Author: Brijesh Singh <brijesh.singh@amd.com>

nit: this is redundant.

Each git commit has an author-field.

> + */
> +
> +#ifndef __LINUX_SEV_H
> +#define __LINUX_SEV_H
> +
> +/* RMPUPDATE detected 4K page and 2MB page overlap. */
> +#define RMPUPDATE_FAIL_OVERLAP		7
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +int snp_lookup_rmpentry(u64 pfn, int *level);
> +int psmash(u64 pfn);
> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
> +int rmp_make_shared(u64 pfn, enum pg_level level);
> +#else
> +static inline int snp_lookup_rmpentry(u64 pfn, int *level) { return 0; }
> +static inline int psmash(u64 pfn) { return -ENXIO; }
> +static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
> +				   bool immutable)
> +{
> +	return -ENODEV;
> +}
> +static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
> +
> +#endif /* CONFIG_AMD_MEM_ENCRYPT */
> +#endif /* __LINUX_SEV_H */
> -- 
> 2.17.1
> 

BR, Jarkko

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2021-08-20 15:58 ` [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command Brijesh Singh
                     ` (2 preceding siblings ...)
  2021-09-16 15:50   ` Peter Gonda
@ 2022-06-13 20:58   ` Alper Gun
  2022-06-13 23:15     ` Ashish Kalra
  3 siblings, 1 reply; 239+ messages in thread
From: Alper Gun @ 2022-06-13 20:58 UTC (permalink / raw)
  To: Brijesh Singh, Ashish.Kalra
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	Marc Orr, sathyanarayanan.kuppuswamy, Pavan Kumar Paluri

On Fri, Aug 20, 2021 at 9:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> The KVM_SNP_INIT command is used by the hypervisor to initialize the
> SEV-SNP platform context. In a typical workflow, this command should be the
> first command issued. When creating SEV-SNP guest, the VMM must use this
> command instead of the KVM_SEV_INIT or KVM_SEV_ES_INIT.
>
> The flags value must be zero, it will be extended in future SNP support to
> communicate the optional features (such as restricted INT injection etc).
>
> Co-developed-by: Pavan Kumar Paluri <papaluri@amd.com>
> Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  .../virt/kvm/amd-memory-encryption.rst        | 27 ++++++++++++
>  arch/x86/include/asm/svm.h                    |  2 +
>  arch/x86/kvm/svm/sev.c                        | 44 ++++++++++++++++++-
>  arch/x86/kvm/svm/svm.h                        |  4 ++
>  include/uapi/linux/kvm.h                      | 13 ++++++
>  5 files changed, 88 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 5c081c8c7164..7b1d32fb99a8 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -427,6 +427,33 @@ issued by the hypervisor to make the guest ready for execution.
>
>  Returns: 0 on success, -negative on error
>
> +18. KVM_SNP_INIT
> +----------------
> +
> +The KVM_SNP_INIT command can be used by the hypervisor to initialize SEV-SNP
> +context. In a typical workflow, this command should be the first command issued.
> +
> +Parameters (in/out): struct kvm_snp_init
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +        struct kvm_snp_init {
> +                __u64 flags;
> +        };
> +
> +The flags bitmap is defined as::
> +
> +   /* enable the restricted injection */
> +   #define KVM_SEV_SNP_RESTRICTED_INJET   (1<<0)
> +
> +   /* enable the restricted injection timer */
> +   #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1<<1)
> +
> +If the specified flags is not supported then return -EOPNOTSUPP, and the supported
> +flags are returned.
> +
>  References
>  ==========
>
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 44a3f920f886..a39e31845a33 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -218,6 +218,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  #define SVM_NESTED_CTL_SEV_ENABLE      BIT(1)
>  #define SVM_NESTED_CTL_SEV_ES_ENABLE   BIT(2)
>
> +#define SVM_SEV_FEAT_SNP_ACTIVE                BIT(0)
> +
>  struct vmcb_seg {
>         u16 selector;
>         u16 attrib;
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 50fddbe56981..93da463545ef 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -235,10 +235,30 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
>         sev_decommission(handle);
>  }
>
> +static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +       struct kvm_snp_init params;
> +       int ret = 0;
> +
> +       if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
> +               return -EFAULT;
> +
> +       if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
> +               ret = -EOPNOTSUPP;
> +
> +       params.flags = SEV_SNP_SUPPORTED_FLAGS;
> +
> +       if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, sizeof(params)))
> +               ret = -EFAULT;
> +
> +       return ret;
> +}
> +
>  static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  {
> +       bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
>         struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> -       bool es_active = argp->id == KVM_SEV_ES_INIT;
> +       bool snp_active = argp->id == KVM_SEV_SNP_INIT;
>         int asid, ret;
>
>         if (kvm->created_vcpus)
> @@ -249,12 +269,22 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>                 return ret;
>
>         sev->es_active = es_active;
> +       sev->snp_active = snp_active;
>         asid = sev_asid_new(sev);
>         if (asid < 0)
>                 goto e_no_asid;
>         sev->asid = asid;
>
> -       ret = sev_platform_init(&argp->error);
> +       if (snp_active) {
> +               ret = verify_snp_init_flags(kvm, argp);
> +               if (ret)
> +                       goto e_free;
> +
> +               ret = sev_snp_init(&argp->error);
> +       } else {
> +               ret = sev_platform_init(&argp->error);

After SEV INIT_EX support patches, SEV may be initialized in the platform late.
In my tests, if SEV has not been initialized in the platform yet, SNP
VMs fail with SEV_DF_FLUSH required error. I tried calling
SEV_DF_FLUSH right after the SNP platform init but this time it failed
later on the SNP launch update command with SEV_RET_INVALID_PARAM
error. Looks like there is another dependency on SEV platform
initialization.

Calling sev_platform_init for SNP VMs fixes the problem in our tests.

> +       }
> +
>         if (ret)
>                 goto e_free;
>
> @@ -600,6 +630,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>         save->pkru = svm->vcpu.arch.pkru;
>         save->xss  = svm->vcpu.arch.ia32_xss;
>
> +       /* Enable the SEV-SNP feature */
> +       if (sev_snp_guest(svm->vcpu.kvm))
> +               save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
> +
>         return 0;
>  }
>
> @@ -1532,6 +1566,12 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>         }
>
>         switch (sev_cmd.id) {
> +       case KVM_SEV_SNP_INIT:
> +               if (!sev_snp_enabled) {
> +                       r = -ENOTTY;
> +                       goto out;
> +               }
> +               fallthrough;
>         case KVM_SEV_ES_INIT:
>                 if (!sev_es_enabled) {
>                         r = -ENOTTY;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 01953522097d..57c3c404b0b3 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -69,6 +69,9 @@ enum {
>  /* TPR and CR2 are always written before VMRUN */
>  #define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))
>
> +/* Supported init feature flags */
> +#define SEV_SNP_SUPPORTED_FLAGS                0x0
> +
>  struct kvm_sev_info {
>         bool active;            /* SEV enabled guest */
>         bool es_active;         /* SEV-ES enabled guest */
> @@ -81,6 +84,7 @@ struct kvm_sev_info {
>         u64 ap_jump_table;      /* SEV-ES AP Jump Table address */
>         struct kvm *enc_context_owner; /* Owner of copied encryption context */
>         struct misc_cg *misc_cg; /* For misc cgroup accounting */
> +       u64 snp_init_flags;
>  };
>
>  struct kvm_svm {
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d9e4aabcb31a..944e2bf601fe 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1712,6 +1712,9 @@ enum sev_cmd_id {
>         /* Guest Migration Extension */
>         KVM_SEV_SEND_CANCEL,
>
> +       /* SNP specific commands */
> +       KVM_SEV_SNP_INIT,
> +
>         KVM_SEV_NR_MAX,
>  };
>
> @@ -1808,6 +1811,16 @@ struct kvm_sev_receive_update_data {
>         __u32 trans_len;
>  };
>
> +/* enable the restricted injection */
> +#define KVM_SEV_SNP_RESTRICTED_INJET   (1 << 0)
> +
> +/* enable the restricted injection timer */
> +#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET   (1 << 1)
> +
> +struct kvm_snp_init {
> +       __u64 flags;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3         (1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX       (1 << 2)
> --
> 2.17.1
>
>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2022-06-13 20:58   ` Alper Gun
@ 2022-06-13 23:15     ` Ashish Kalra
  2022-06-13 23:33       ` Alper Gun
  0 siblings, 1 reply; 239+ messages in thread
From: Ashish Kalra @ 2022-06-13 23:15 UTC (permalink / raw)
  To: Alper Gun, Brijesh Singh, Ashish.Kalra
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	Marc Orr, sathyanarayanan.kuppuswamy, Pavan Kumar Paluri

Hello Alper,

On 6/13/22 20:58, Alper Gun wrote:
> static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>   {
>> +       bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
>>          struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> -       bool es_active = argp->id == KVM_SEV_ES_INIT;
>> +       bool snp_active = argp->id == KVM_SEV_SNP_INIT;
>>          int asid, ret;
>>
>>          if (kvm->created_vcpus)
>> @@ -249,12 +269,22 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>                  return ret;
>>
>>          sev->es_active = es_active;
>> +       sev->snp_active = snp_active;
>>          asid = sev_asid_new(sev);
>>          if (asid < 0)
>>                  goto e_no_asid;
>>          sev->asid = asid;
>>
>> -       ret = sev_platform_init(&argp->error);
>> +       if (snp_active) {
>> +               ret = verify_snp_init_flags(kvm, argp);
>> +               if (ret)
>> +                       goto e_free;
>> +
>> +               ret = sev_snp_init(&argp->error);
>> +       } else {
>> +               ret = sev_platform_init(&argp->error);
> After SEV INIT_EX support patches, SEV may be initialized in the platform late.
> In my tests, if SEV has not been initialized in the platform yet, SNP
> VMs fail with SEV_DF_FLUSH required error. I tried calling
> SEV_DF_FLUSH right after the SNP platform init but this time it failed
> later on the SNP launch update command with SEV_RET_INVALID_PARAM
> error. Looks like there is another dependency on SEV platform
> initialization.
>
> Calling sev_platform_init for SNP VMs fixes the problem in our tests.

Trying to get some more context for this issue.

When you say after SEV_INIT_EX support patches, SEV may be initialized 
in the platform late, do you mean sev_pci_init()->sev_snp_init() ... 
sev_platform_init() code path has still not executed on the host BSP ?

Before the first SNP/SEV guest launch after INIT, we need to
issue the SEV_CMD_DF_FLUSH command.

I assume that we will always be initially doing SNP firmware 
initialization with SNP_INIT command followed by sev_platform_init(), if 
SNP is enabled on boot CPU.

Thanks, Ashish


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2022-06-13 23:15     ` Ashish Kalra
@ 2022-06-13 23:33       ` Alper Gun
  2022-06-14  0:21         ` Ashish Kalra
  0 siblings, 1 reply; 239+ messages in thread
From: Alper Gun @ 2022-06-13 23:33 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Brijesh Singh, Ashish.Kalra, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Gonda, Peter Zijlstra, Srinivas Pandruvada, David Rientjes,
	Dov Murik, Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	Marc Orr, sathyanarayanan.kuppuswamy, Pavan Kumar Paluri

On Mon, Jun 13, 2022 at 4:15 PM Ashish Kalra <ashkalra@amd.com> wrote:
>
> Hello Alper,
>
> On 6/13/22 20:58, Alper Gun wrote:
> > static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >>   {
> >> +       bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
> >>          struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >> -       bool es_active = argp->id == KVM_SEV_ES_INIT;
> >> +       bool snp_active = argp->id == KVM_SEV_SNP_INIT;
> >>          int asid, ret;
> >>
> >>          if (kvm->created_vcpus)
> >> @@ -249,12 +269,22 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >>                  return ret;
> >>
> >>          sev->es_active = es_active;
> >> +       sev->snp_active = snp_active;
> >>          asid = sev_asid_new(sev);
> >>          if (asid < 0)
> >>                  goto e_no_asid;
> >>          sev->asid = asid;
> >>
> >> -       ret = sev_platform_init(&argp->error);
> >> +       if (snp_active) {
> >> +               ret = verify_snp_init_flags(kvm, argp);
> >> +               if (ret)
> >> +                       goto e_free;
> >> +
> >> +               ret = sev_snp_init(&argp->error);
> >> +       } else {
> >> +               ret = sev_platform_init(&argp->error);
> > After SEV INIT_EX support patches, SEV may be initialized in the platform late.
> > In my tests, if SEV has not been initialized in the platform yet, SNP
> > VMs fail with SEV_DF_FLUSH required error. I tried calling
> > SEV_DF_FLUSH right after the SNP platform init but this time it failed
> > later on the SNP launch update command with SEV_RET_INVALID_PARAM
> > error. Looks like there is another dependency on SEV platform
> > initialization.
> >
> > Calling sev_platform_init for SNP VMs fixes the problem in our tests.
>
> Trying to get some more context for this issue.
>
> When you say after SEV_INIT_EX support patches, SEV may be initialized
> in the platform late, do you mean sev_pci_init()->sev_snp_init() ...
> sev_platform_init() code path has still not executed on the host BSP ?
>

Correct, INIT_EX requires the file system to be ready and there is a
ccp module param to call it only when needed.

MODULE_PARM_DESC(psp_init_on_probe, " if true, the PSP will be
initialized on module init. Else the PSP will be initialized on the
first command requiring it");

If this module param is false, it won't initialize SEV on the platform
until the first SEV VM.
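
For example (illustrative, assuming ccp is built as a module):

  # defer SEV/PSP initialization until the first command that needs it
  modprobe ccp psp_init_on_probe=0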

> Before the first SNP/SEV guest launch after INIT, we need to
> issue the SEV_CMD_DF_FLUSH command.
>
> I assume that we will always be initially doing SNP firmware
> initialization with SNP_INIT command followed by sev_platform_init(), if
> SNP is enabled on boot CPU.
>
> Thanks, Ashish
>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 14/45] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2021-08-20 15:58 ` [PATCH Part2 v5 14/45] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Brijesh Singh
  2022-02-25 18:03   ` Alper Gun
@ 2022-06-14  0:10   ` Alper Gun
  1 sibling, 0 replies; 239+ messages in thread
From: Alper Gun @ 2022-06-14  0:10 UTC (permalink / raw)
  To: Ashish.Kalra
  Cc: x86, linux-kernel, kvm, linux-coco, linux-mm, linux-crypto,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Gonda,
	Peter Zijlstra, Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	Marc Orr, sathyanarayanan.kuppuswamy, Brijesh Singh

Similar to the TMR page, sev_init_ex_buffer should be owned by the
firmware; otherwise INIT_EX won't work with SNP. Since the v5 patches
were prepared before the INIT_EX work, I wanted to bring this to your
attention.
One difference from the TMR page: sev_init_ex_buffer has to stay in the
direct map. Firmware pages are removed from the direct map in the v5
patches, but the kernel later reads sev_init_ex_buffer to write it into
a persistent file. I have a version that makes this work; if you're
interested, I can share it.

On Fri, Aug 20, 2021 at 9:00 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> The behavior and requirement for the SEV-legacy command is altered when
> the SNP firmware is in the INIT state. See SEV-SNP firmware specification
> for more details.
>
> Allocate the Trusted Memory Region (TMR) as a 2mb sized/aligned region
> when SNP is enabled to satisfy new requirements for SNP. Continue
> allocating a 1mb region for !SNP configuration.
>
> While at it, provide an API that can be used by others to allocate a page
> that can be used by the firmware. The immediate user for this API will
> be the KVM driver. The KVM driver will need to allocate a firmware context
> page during guest creation. The context page needs to be updated
> by the firmware. See the SEV-SNP specification for further details.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  drivers/crypto/ccp/sev-dev.c | 169 ++++++++++++++++++++++++++++++++++-
>  include/linux/psp-sev.h      |  11 +++
>  2 files changed, 176 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 01edad9116f2..34dc358b13b9 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -62,6 +62,14 @@ static int psp_timeout;
>  #define SEV_ES_TMR_SIZE                (1024 * 1024)
>  static void *sev_es_tmr;
>
> +/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB size. */
> +#define SEV_SNP_ES_TMR_SIZE    (2 * 1024 * 1024)
> +
> +static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
> +
> +static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
> +static int sev_do_cmd(int cmd, void *data, int *psp_ret);
> +
>  static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
>  {
>         struct sev_device *sev = psp_master->sev_data;
> @@ -159,6 +167,156 @@ static int sev_cmd_buffer_len(int cmd)
>         return 0;
>  }
>
> +static void snp_leak_pages(unsigned long pfn, unsigned int npages)
> +{
> +       WARN(1, "psc failed, pfn 0x%lx pages %d (leaking)\n", pfn, npages);
> +       while (npages--) {
> +               memory_failure(pfn, 0);
> +               dump_rmpentry(pfn);
> +               pfn++;
> +       }
> +}
> +
> +static int snp_reclaim_pages(unsigned long pfn, unsigned int npages, bool locked)
> +{
> +       struct sev_data_snp_page_reclaim data;
> +       int ret, err, i, n = 0;
> +
> +       for (i = 0; i < npages; i++) {
> +               memset(&data, 0, sizeof(data));
> +               data.paddr = pfn << PAGE_SHIFT;
> +
> +               if (locked)
> +                       ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> +               else
> +                       ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> +               if (ret)
> +                       goto cleanup;
> +
> +               ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> +               if (ret)
> +                       goto cleanup;
> +
> +               pfn++;
> +               n++;
> +       }
> +
> +       return 0;
> +
> +cleanup:
> +       /*
> +        * If we failed to reclaim the page, then the page is no longer
> +        * safe to be released; leak it.
> +        */
> +       snp_leak_pages(pfn, npages - n);
> +       return ret;
> +}
> +
> +static inline int rmp_make_firmware(unsigned long pfn, int level)
> +{
> +       return rmp_make_private(pfn, 0, level, 0, true);
> +}
> +
> +static int snp_set_rmp_state(unsigned long paddr, unsigned int npages, bool to_fw, bool locked,
> +                            bool need_reclaim)
> +{
> +       unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT; /* C-bit may be set in the paddr */
> +       int rc, n = 0, i;
> +
> +       for (i = 0; i < npages; i++) {
> +               if (to_fw)
> +                       rc = rmp_make_firmware(pfn, PG_LEVEL_4K);
> +               else
> +                       rc = need_reclaim ? snp_reclaim_pages(pfn, 1, locked) :
> +                                           rmp_make_shared(pfn, PG_LEVEL_4K);
> +               if (rc)
> +                       goto cleanup;
> +
> +               pfn++;
> +               n++;
> +       }
> +
> +       return 0;
> +
> +cleanup:
> +       /* Try unrolling the firmware state changes */
> +       if (to_fw) {
> +               /*
> +                * Reclaim the pages which were already changed to the
> +                * firmware state.
> +                */
> +               snp_reclaim_pages(paddr >> PAGE_SHIFT, n, locked);
> +
> +               return rc;
> +       }
> +
> +       /*
> +        * If we failed to change the page state to shared, then it is not
> +        * safe to release the page back to the system; leak it instead.
> +        */
> +       snp_leak_pages(pfn, npages - n);
> +
> +       return rc;
> +}
> +
> +static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
> +{
> +       unsigned long npages = 1ul << order, paddr;
> +       struct sev_device *sev;
> +       struct page *page;
> +
> +       if (!psp_master || !psp_master->sev_data)
> +               return ERR_PTR(-EINVAL);
> +
> +       page = alloc_pages(gfp_mask, order);
> +       if (!page)
> +               return NULL;
> +
> +       /* If SEV-SNP is initialized then add the page to the RMP table. */
> +       sev = psp_master->sev_data;
> +       if (!sev->snp_inited)
> +               return page;
> +
> +       paddr = __pa((unsigned long)page_address(page));
> +       if (snp_set_rmp_state(paddr, npages, true, locked, false))
> +               return NULL;
> +
> +       return page;
> +}
> +
> +void *snp_alloc_firmware_page(gfp_t gfp_mask)
> +{
> +       struct page *page;
> +
> +       page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
> +
> +       return page ? page_address(page) : NULL;
> +}
> +EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
> +
> +static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
> +{
> +       unsigned long paddr, npages = 1ul << order;
> +
> +       if (!page)
> +               return;
> +
> +       paddr = __pa((unsigned long)page_address(page));
> +       if (snp_set_rmp_state(paddr, npages, false, locked, true))
> +               return;
> +
> +       __free_pages(page, order);
> +}
> +
> +void snp_free_firmware_page(void *addr)
> +{
> +       if (!addr)
> +               return;
> +
> +       __snp_free_firmware_pages(virt_to_page(addr), 0, false);
> +}
> +EXPORT_SYMBOL(snp_free_firmware_page);
> +
>  static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>  {
>         struct psp_device *psp = psp_master;
> @@ -281,7 +439,7 @@ static int __sev_platform_init_locked(int *error)
>
>                 data.flags |= SEV_INIT_FLAGS_SEV_ES;
>                 data.tmr_address = tmr_pa;
> -               data.tmr_len = SEV_ES_TMR_SIZE;
> +               data.tmr_len = sev_es_tmr_size;
>         }
>
>         rc = __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
> @@ -638,6 +796,8 @@ static int __sev_snp_init_locked(int *error)
>         sev->snp_inited = true;
>         dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
>
> +       sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
> +
>         return rc;
>  }
>
> @@ -1161,8 +1321,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
>                 /* The TMR area was encrypted, flush it from the cache */
>                 wbinvd_on_all_cpus();
>
> -               free_pages((unsigned long)sev_es_tmr,
> -                          get_order(SEV_ES_TMR_SIZE));
> +               __snp_free_firmware_pages(virt_to_page(sev_es_tmr),
> +                                         get_order(sev_es_tmr_size),
> +                                         false);
>                 sev_es_tmr = NULL;
>         }
>
> @@ -1233,7 +1394,7 @@ void sev_pci_init(void)
>         }
>
>         /* Obtain the TMR memory area for SEV-ES use */
> -       tmr_page = alloc_pages(GFP_KERNEL, get_order(SEV_ES_TMR_SIZE));
> +       tmr_page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(sev_es_tmr_size), false);
>         if (tmr_page) {
>                 sev_es_tmr = page_address(tmr_page);
>         } else {
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index f2105a8755f9..00bd684dc094 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -12,6 +12,8 @@
>  #ifndef __PSP_SEV_H__
>  #define __PSP_SEV_H__
>
> +#include <linux/sev.h>
> +
>  #include <uapi/linux/psp-sev.h>
>
>  #ifdef CONFIG_X86
> @@ -919,6 +921,8 @@ int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error);
>  int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error);
>
>  void *psp_copy_user_blob(u64 uaddr, u32 len);
> +void *snp_alloc_firmware_page(gfp_t mask);
> +void snp_free_firmware_page(void *addr);
>
>  #else  /* !CONFIG_CRYPTO_DEV_SP_PSP */
>
> @@ -960,6 +964,13 @@ static inline int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *erro
>         return -ENODEV;
>  }
>
> +static inline void *snp_alloc_firmware_page(gfp_t mask)
> +{
> +       return NULL;
> +}
> +
> +static inline void snp_free_firmware_page(void *addr) { }
> +
>  #endif /* CONFIG_CRYPTO_DEV_SP_PSP */
>
>  #endif /* __PSP_SEV_H__ */
> --
> 2.17.1
>
>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2022-06-13 23:33       ` Alper Gun
@ 2022-06-14  0:21         ` Ashish Kalra
  2022-06-14 15:37           ` Peter Gonda
  0 siblings, 1 reply; 239+ messages in thread
From: Ashish Kalra @ 2022-06-14  0:21 UTC (permalink / raw)
  To: Alper Gun
  Cc: Brijesh Singh, Ashish.Kalra, x86, linux-kernel, kvm, linux-coco,
	linux-mm, linux-crypto, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, Tom Lendacky, H. Peter Anvin, Ard Biesheuvel,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Andy Lutomirski, Dave Hansen, Sergio Lopez,
	Peter Gonda, Peter Zijlstra, Srinivas Pandruvada, David Rientjes,
	Dov Murik, Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, tony.luck,
	Marc Orr, sathyanarayanan.kuppuswamy, Pavan Kumar Paluri


On 6/13/22 23:33, Alper Gun wrote:
> On Mon, Jun 13, 2022 at 4:15 PM Ashish Kalra <ashkalra@amd.com> wrote:
>> Hello Alper,
>>
>> On 6/13/22 20:58, Alper Gun wrote:
>>> static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>>    {
>>>> +       bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
>>>>           struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> -       bool es_active = argp->id == KVM_SEV_ES_INIT;
>>>> +       bool snp_active = argp->id == KVM_SEV_SNP_INIT;
>>>>           int asid, ret;
>>>>
>>>>           if (kvm->created_vcpus)
>>>> @@ -249,12 +269,22 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>>                   return ret;
>>>>
>>>>           sev->es_active = es_active;
>>>> +       sev->snp_active = snp_active;
>>>>           asid = sev_asid_new(sev);
>>>>           if (asid < 0)
>>>>                   goto e_no_asid;
>>>>           sev->asid = asid;
>>>>
>>>> -       ret = sev_platform_init(&argp->error);
>>>> +       if (snp_active) {
>>>> +               ret = verify_snp_init_flags(kvm, argp);
>>>> +               if (ret)
>>>> +                       goto e_free;
>>>> +
>>>> +               ret = sev_snp_init(&argp->error);
>>>> +       } else {
>>>> +               ret = sev_platform_init(&argp->error);
>>> After SEV INIT_EX support patches, SEV may be initialized in the platform late.
>>> In my tests, if SEV has not been initialized in the platform yet, SNP
>>> VMs fail with SEV_DF_FLUSH required error. I tried calling
>>> SEV_DF_FLUSH right after the SNP platform init but this time it failed
>>> later on the SNP launch update command with SEV_RET_INVALID_PARAM
>>> error. Looks like there is another dependency on SEV platform
>>> initialization.
>>>
>>> Calling sev_platform_init for SNP VMs fixes the problem in our tests.
>> Trying to get some more context for this issue.
>>
>> When you say after SEV_INIT_EX support patches, SEV may be initialized
>> in the platform late, do you mean sev_pci_init()->sev_snp_init() ...
>> sev_platform_init() code path has still not executed on the host BSP ?
>>
> Correct, INIT_EX requires the file system to be ready and there is a
> ccp module param to call it only when needed.
>
> MODULE_PARM_DESC(psp_init_on_probe, " if true, the PSP will be
> initialized on module init. Else the PSP will be initialized on the
> first command requiring it");
>
> If this module param is false, it won't initialize SEV on the platform
> until the first SEV VM.
>
Ok, that makes sense.

So the fix will be to call sev_platform_init() unconditionally here in
sev_guest_init(). Both sev_snp_init() and sev_platform_init() are
protected from being called again, so there won't be any issues if these
functions are invoked again at SNP/SEV VM launch after having already
been invoked during module init.
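
Something along these lines (untested sketch against the v5 code quoted
earlier in this thread; exact placement may differ):

	if (snp_active) {
		ret = verify_snp_init_flags(kvm, argp);
		if (ret)
			goto e_free;

		/* Make sure the SNP firmware context is initialized first. */
		ret = sev_snp_init(&argp->error);
		if (ret)
			goto e_free;
	}

	/*
	 * Initialize the SEV platform unconditionally. Both init helpers
	 * return early if they have already run, so this is safe even when
	 * ccp did the initialization at module probe time.
	 */
	ret = sev_platform_init(&argp->error);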

Thanks, Ashish


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2022-06-14  0:21         ` Ashish Kalra
@ 2022-06-14 15:37           ` Peter Gonda
  2022-06-14 16:11             ` Kalra, Ashish
  0 siblings, 1 reply; 239+ messages in thread
From: Peter Gonda @ 2022-06-14 15:37 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Alper Gun, Brijesh Singh, Ashish Kalra, the arch/x86 maintainers,
	LKML, kvm list, linux-coco, linux-mm, Linux Crypto Mailing List,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Michael Roth,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Marc Orr, Sathyanarayanan Kuppuswamy, Pavan Kumar Paluri

On Mon, Jun 13, 2022 at 6:21 PM Ashish Kalra <ashkalra@amd.com> wrote:
>
>
> On 6/13/22 23:33, Alper Gun wrote:
> > On Mon, Jun 13, 2022 at 4:15 PM Ashish Kalra <ashkalra@amd.com> wrote:
> >> Hello Alper,
> >>
> >> On 6/13/22 20:58, Alper Gun wrote:
> >>> static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >>>>    {
> >>>> +       bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
> >>>>           struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >>>> -       bool es_active = argp->id == KVM_SEV_ES_INIT;
> >>>> +       bool snp_active = argp->id == KVM_SEV_SNP_INIT;
> >>>>           int asid, ret;
> >>>>
> >>>>           if (kvm->created_vcpus)
> >>>> @@ -249,12 +269,22 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >>>>                   return ret;
> >>>>
> >>>>           sev->es_active = es_active;
> >>>> +       sev->snp_active = snp_active;
> >>>>           asid = sev_asid_new(sev);
> >>>>           if (asid < 0)
> >>>>                   goto e_no_asid;
> >>>>           sev->asid = asid;
> >>>>
> >>>> -       ret = sev_platform_init(&argp->error);
> >>>> +       if (snp_active) {
> >>>> +               ret = verify_snp_init_flags(kvm, argp);
> >>>> +               if (ret)
> >>>> +                       goto e_free;
> >>>> +
> >>>> +               ret = sev_snp_init(&argp->error);
> >>>> +       } else {
> >>>> +               ret = sev_platform_init(&argp->error);
> >>> After SEV INIT_EX support patches, SEV may be initialized in the platform late.
> >>> In my tests, if SEV has not been initialized in the platform yet, SNP
> >>> VMs fail with SEV_DF_FLUSH required error. I tried calling
> >>> SEV_DF_FLUSH right after the SNP platform init but this time it failed
> >>> later on the SNP launch update command with SEV_RET_INVALID_PARAM
> >>> error. Looks like there is another dependency on SEV platform
> >>> initialization.
> >>>
> >>> Calling sev_platform_init for SNP VMs fixes the problem in our tests.
> >> Trying to get some more context for this issue.
> >>
> >> When you say after SEV_INIT_EX support patches, SEV may be initialized
> >> in the platform late, do you mean sev_pci_init()->sev_snp_init() ...
> >> sev_platform_init() code path has still not executed on the host BSP ?
> >>
> > Correct, INIT_EX requires the file system to be ready and there is a
> > ccp module param to call it only when needed.
> >
> > MODULE_PARM_DESC(psp_init_on_probe, " if true, the PSP will be
> > initialized on module init. Else the PSP will be initialized on the
> > first command requiring it");
> >
> > If this module param is false, it won't initialize SEV on the platform
> > until the first SEV VM.
> >
> Ok, that makes sense.
>
> So the fix will be to call sev_platform_init() unconditionally here in
> sev_guest_init(). Both sev_snp_init() and sev_platform_init() are
> protected from being called again, so there won't be any issues if these
> functions are invoked again at SNP/SEV VM launch after having already
> been invoked during module init.

That's one solution. I don't know if there is a downside to the system
of enabling SEV when only SNP is being enabled, but another solution
could be to just issue a DF_FLUSH command directly instead of calling
sev_platform_init().
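
Something like the below (untested; it assumes the exported
sev_guest_df_flush() helper is usable in this context, and per Alper's
earlier report a bare DF_FLUSH may still not be sufficient):

	if (snp_active) {
		ret = verify_snp_init_flags(kvm, argp);
		if (ret)
			goto e_free;

		ret = sev_snp_init(&argp->error);
		if (!ret)
			/* Flush directly instead of relying on SEV INIT. */
			ret = sev_guest_df_flush(&argp->error);
	} else {
		ret = sev_platform_init(&argp->error);
	}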

>
> Thanks, Ashish
>

^ permalink raw reply	[flat|nested] 239+ messages in thread

* RE: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2022-06-14 15:37           ` Peter Gonda
@ 2022-06-14 16:11             ` Kalra, Ashish
  2022-06-14 16:30               ` Peter Gonda
  0 siblings, 1 reply; 239+ messages in thread
From: Kalra, Ashish @ 2022-06-14 16:11 UTC (permalink / raw)
  To: Peter Gonda
  Cc: Alper Gun, Brijesh Singh, the arch/x86 maintainers, LKML,
	kvm list, linux-coco, linux-mm, Linux Crypto Mailing List,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Lendacky, Thomas,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Roth, Michael,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Marc Orr, Sathyanarayanan Kuppuswamy, Pavan Kumar Paluri

On Mon, Jun 13, 2022 at 6:21 PM Ashish Kalra <ashkalra@amd.com> wrote:
>
>
> On 6/13/22 23:33, Alper Gun wrote:
> > On Mon, Jun 13, 2022 at 4:15 PM Ashish Kalra <ashkalra@amd.com> wrote:
> >> Hello Alper,
> >>
> >> On 6/13/22 20:58, Alper Gun wrote:
> >>> static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd 
> >>> *argp)
> >>>>    {
> >>>> +       bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id 
> >>>> + == KVM_SEV_SNP_INIT);
> >>>>           struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> >>>> -       bool es_active = argp->id == KVM_SEV_ES_INIT;
> >>>> +       bool snp_active = argp->id == KVM_SEV_SNP_INIT;
> >>>>           int asid, ret;
> >>>>
> >>>>           if (kvm->created_vcpus) @@ -249,12 +269,22 @@ static 
> >>>> int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
> >>>>                   return ret;
> >>>>
> >>>>           sev->es_active = es_active;
> >>>> +       sev->snp_active = snp_active;
> >>>>           asid = sev_asid_new(sev);
> >>>>           if (asid < 0)
> >>>>                   goto e_no_asid;
> >>>>           sev->asid = asid;
> >>>>
> >>>> -       ret = sev_platform_init(&argp->error);
> >>>> +       if (snp_active) {
> >>>> +               ret = verify_snp_init_flags(kvm, argp);
> >>>> +               if (ret)
> >>>> +                       goto e_free;
> >>>> +
> >>>> +               ret = sev_snp_init(&argp->error);
> >>>> +       } else {
> >>>> +               ret = sev_platform_init(&argp->error);
> >>> After SEV INIT_EX support patches, SEV may be initialized in the platform late.
> >>> In my tests, if SEV has not been initialized in the platform yet, 
> >>> SNP VMs fail with SEV_DF_FLUSH required error. I tried calling 
> >>> SEV_DF_FLUSH right after the SNP platform init but this time it 
> >>> failed later on the SNP launch update command with 
> >>> SEV_RET_INVALID_PARAM error. Looks like there is another 
> >>> dependency on SEV platform initialization.
> >>>
> >>> Calling sev_platform_init for SNP VMs fixes the problem in our tests.
> >> Trying to get some more context for this issue.
> >>
> >> When you say after SEV_INIT_EX support patches, SEV may be 
> >> initialized in the platform late, do you mean sev_pci_init()->sev_snp_init() ...
> >> sev_platform_init() code path has still not executed on the host BSP ?
> >>
> > Correct, INIT_EX requires the file system to be ready and there is a 
> > ccp module param to call it only when needed.
> >
> > MODULE_PARM_DESC(psp_init_on_probe, " if true, the PSP will be 
> > initialized on module init. Else the PSP will be initialized on the 
> > first command requiring it");
> >
> > If this module param is false, it won't initialize SEV on the 
> > platform until the first SEV VM.
> >
> Ok, that makes sense.
>
> So the fix will be to call sev_platform_init() unconditionally here in 
> sev_guest_init(). Both sev_snp_init() and sev_platform_init() are 
> protected from being called again, so there won't be any issues if 
> these functions are invoked again at SNP/SEV VM launch after having 
> been invoked earlier during module init.

>That's one solution. I don't know if there is a downside to the system of enabling SEV when SNP is being enabled, but another solution could be to just directly issue a DF_FLUSH command instead of calling sev_platform_init().

Actually, sev_platform_init() is already called at module init unless psp_init_on_probe is false. We only need to ensure that the SNP firmware is initialized first, with the SNP_INIT command.
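
Concretely, that ordering would look something like this in sev_guest_init() (a rough sketch built from the quoted diff, not the final patch; error handling abbreviated):

    if (snp_active) {
            ret = verify_snp_init_flags(kvm, argp);
            if (ret)
                    goto e_free;

            /* SNP_INIT must reach the firmware before SEV INIT. */
            ret = sev_snp_init(&argp->error);
            if (ret)
                    goto e_free;
    }

    /*
     * Unconditional: per the above, both init helpers are protected
     * against being called twice, so this is a no-op when the platform
     * was already initialized at module probe time.
     */
    ret = sev_platform_init(&argp->error);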

Thanks,
Ashish 

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2022-06-14 16:11             ` Kalra, Ashish
@ 2022-06-14 16:30               ` Peter Gonda
  2022-06-14 17:16                 ` Kalra, Ashish
  0 siblings, 1 reply; 239+ messages in thread
From: Peter Gonda @ 2022-06-14 16:30 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: Alper Gun, Brijesh Singh, the arch/x86 maintainers, LKML,
	kvm list, linux-coco, linux-mm, Linux Crypto Mailing List,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Lendacky, Thomas,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Andy Lutomirski, Dave Hansen, Sergio Lopez, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Roth, Michael,
	Vlastimil Babka, Kirill A . Shutemov, Andi Kleen, Tony Luck,
	Marc Orr, Sathyanarayanan Kuppuswamy, Pavan Kumar Paluri

On Tue, Jun 14, 2022 at 10:11 AM Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
>
>
> >That's one solution. I don't know if there is a downside to the system for enabling SEV if SNP is being enabled but another solution could be to just directly place a DF_FLUSH command instead of calling sev_platform_init().
>
> Actually sev_platform_init() is already called on module init if psp_init_on_probe is not false. Only need to ensure that SNP firmware is initialized first with SNP_INIT command.

But if psp_init_on_probe is false, sev_platform_init() isn't called
down this path. Alper has suggested we always call sev_platform_init(),
but we could just issue an SEV_DF_FLUSH command instead. Or am I still
missing something?
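
If we went the DF_FLUSH route, KVM already uses the exported helper for
that elsewhere (e.g. when recycling ASIDs), so the alternative would be
roughly (a sketch, assuming it is legal to issue the flush at this
point in guest init):

    /* Flush caches and the data fabric instead of a full SEV INIT. */
    wbinvd_on_all_cpus();
    ret = sev_guest_df_flush(&argp->error);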


^ permalink raw reply	[flat|nested] 239+ messages in thread

* RE: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2022-06-14 16:30               ` Peter Gonda
@ 2022-06-14 17:16                 ` Kalra, Ashish
  2022-06-14 18:58                   ` Alper Gun
  0 siblings, 1 reply; 239+ messages in thread
From: Kalra, Ashish @ 2022-06-14 17:16 UTC (permalink / raw)
  To: Peter Gonda
  Cc: Alper Gun, the arch/x86 maintainers, LKML, kvm list, linux-coco,
	linux-mm, Linux Crypto Mailing List, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, Lendacky, Thomas, H. Peter Anvin,
	Ard Biesheuvel, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Roth, Michael, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck, Marc Orr,
	Sathyanarayanan Kuppuswamy

[AMD Official Use Only - General]

Hello Alper, Peter,

> Actually, sev_platform_init() is already called at module init unless psp_init_on_probe is false. We only need to ensure that the SNP firmware is initialized first, with the SNP_INIT command.

> But if psp_init_on_probe is false, sev_platform_init() isn't called down this path. Alper has suggested we always call sev_platform_init(), but we could just issue an SEV_DF_FLUSH command instead. Or am I still missing something?

>After SEV INIT_EX support patches, SEV may be initialized in the platform late.
> In my tests, if SEV has not been initialized in the platform 
> yet, SNP VMs fail with SEV_DF_FLUSH required error. I tried 
> calling SEV_DF_FLUSH right after the SNP platform init.

Are you getting the DF_FLUSH_REQUIRED error after the SNP activate command?

Also, did you use the SEV_DF_FLUSH command or the SNP_DF_FLUSH command?

With SNP you need to use the SNP_DF_FLUSH command.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2022-06-14 17:16                 ` Kalra, Ashish
@ 2022-06-14 18:58                   ` Alper Gun
  2022-06-14 20:23                     ` Kalra, Ashish
  0 siblings, 1 reply; 239+ messages in thread
From: Alper Gun @ 2022-06-14 18:58 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: Peter Gonda, the arch/x86 maintainers, LKML, kvm list,
	linux-coco, linux-mm, Linux Crypto Mailing List, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, Lendacky, Thomas, H. Peter Anvin,
	Ard Biesheuvel, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Roth, Michael, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck, Marc Orr,
	Sathyanarayanan Kuppuswamy

Let me summarize what I tried.

1- when using psp_init_on_probe=false, the SNP VM fails in the
SNP_LAUNCH_START step with error SEV_RET_DFFLUSH_REQUIRED (15).
2- added SEV_DF_FLUSH just after SNP platform init; it didn't fail
in launch start but failed later during SNP_LAUNCH_UPDATE with
SEV_RET_INVALID_PARAM (22).
3- added SNP_DF_FLUSH just after SNP platform init; it failed again
during SNP_LAUNCH_UPDATE with SEV_RET_INVALID_PARAM (22).
4- added sev_platform_init for SNP VMs, and it worked.

For me, a DF_FLUSH alone didn't help boot a VM. I don't know yet why
the SEV platform status impacts the SNP VM, but sev_platform_init
fixes the problem.
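
(For reference, the ccp parameter in case 1 is declared along these
lines in drivers/crypto/ccp/sev-dev.c — quoted from memory, so treat
the exact default as illustrative:

    static bool psp_init_on_probe = true;
    module_param(psp_init_on_probe, bool, 0444);
    MODULE_PARM_DESC(psp_init_on_probe,
                     " if true, the PSP will be initialized on module init. Else the PSP will be initialized on the first command requiring it");
)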


On Tue, Jun 14, 2022 at 10:16 AM Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
>
> Are you getting the DF_FLUSH_REQUIRED error after the SNP activate command?
>
> Also, did you use the SEV_DF_FLUSH command or the SNP_DF_FLUSH command?
>
> With SNP you need to use the SNP_DF_FLUSH command.
>
> Thanks,
> Ashish

^ permalink raw reply	[flat|nested] 239+ messages in thread

* RE: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2022-06-14 18:58                   ` Alper Gun
@ 2022-06-14 20:23                     ` Kalra, Ashish
  2022-06-14 20:29                       ` Peter Gonda
  0 siblings, 1 reply; 239+ messages in thread
From: Kalra, Ashish @ 2022-06-14 20:23 UTC (permalink / raw)
  To: Alper Gun
  Cc: Peter Gonda, the arch/x86 maintainers, LKML, kvm list,
	linux-coco, linux-mm, Linux Crypto Mailing List, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, Lendacky, Thomas, H. Peter Anvin,
	Ard Biesheuvel, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Roth, Michael, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck, Marc Orr,
	Sathyanarayanan Kuppuswamy

[AMD Official Use Only - General]

Hello Alper,

Here is the feedback from the SEV/SNP firmware team:

The SNP spec has this line in SNP_INIT_EX:

“The firmware marks all encryption capable ASIDs as unusable for encrypted virtualization.”

This is a back-handed way of saying that after SNP_INIT, all of the ASIDs are considered “dirty”. None are in use, but the FW can’t know if there’s any data in the caches left over from some prior guest. So after doing an SNP_INIT (or SNP_INIT_EX), a WBINVD (on all threads) and DF_FLUSH sequence is required. It doesn’t matter whether it’s an SEV DF_FLUSH or an SNP_DF_FLUSH… they both do EXACTLY the same thing.

I don’t understand offhand why SNP_LAUNCH_START would require a DF_FLUSH… usually it’s only an “activate” command that requires the DF_FLUSH: ACTIVATE/ACTIVATE_EX/SNP_ACTIVATE/SNP_ACTIVATE_EX.

[Ashish] That's why I asked whether you are getting the DF_FLUSH_REQUIRED error after the SNP activate command.

It’s when you attempt to (re-)use an ASID that’s “dirty” that you should get the DF_FLUSH_REQUIRED error.

[Ashish] And we do SNP_DF_FLUSH/SEV_DF_FLUSH whenever ASIDs are reused/re-cycled.

If the host only wants to run SNP guests, there’s no need to do an SEV INIT or any other SEV operations. If the host DOES want to run SEV AND SNP guests, then the required sequence is to do the SNP_INIT before the SEV INIT. Doing the WBINVD/DF_FLUSH any time between the SNP_INIT and any ACTIVATE command should be sufficient.
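
So in command terms, the required bring-up order for a host that wants both guest types is roughly (a sketch; the helper names here are illustrative, not the driver API):

    snp_init();             /* SNP_INIT/SNP_INIT_EX: all ASIDs now "dirty" */
    wbinvd_on_all_cpus();   /* flush caches on every thread */
    df_flush();             /* SEV_DF_FLUSH or SNP_DF_FLUSH; they do the same thing */
    sev_init();             /* SEV INIT/INIT_EX, only if SEV guests are also needed */
    /* ...only after this may any (SNP_)ACTIVATE command be issued */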

Thanks,
Ashish


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2022-06-14 20:23                     ` Kalra, Ashish
@ 2022-06-14 20:29                       ` Peter Gonda
  2022-06-14 20:39                         ` Kalra, Ashish
  0 siblings, 1 reply; 239+ messages in thread
From: Peter Gonda @ 2022-06-14 20:29 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: Alper Gun, the arch/x86 maintainers, LKML, kvm list, linux-coco,
	linux-mm, Linux Crypto Mailing List, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, Lendacky, Thomas, H. Peter Anvin,
	Ard Biesheuvel, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Roth, Michael, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck, Marc Orr,
	Sathyanarayanan Kuppuswamy

On Tue, Jun 14, 2022 at 2:23 PM Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
>
> If the host only wants to run SNP guests, there’s no need to do an SEV INIT or any other SEV operations. If the host DOES want to run SEV AND SNP guests, then the required sequence is to do the SNP_INIT before the SEV INIT. Doing the WBINVD/DF_FLUSH any time between the SNP_INIT and any ACTIVATE command should be sufficient.

Thanks for the follow-up, Ashish! I was assuming there was some issue
with the ASIDs here but didn't have time to experiment yet. Is there
any downside to enabling SEV? If not, these patches make sense, but if
there is some cost, should we make KVM capable of running SNP VMs
without enabling SEV?



^ permalink raw reply	[flat|nested] 239+ messages in thread

* RE: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
  2022-06-14 20:29                       ` Peter Gonda
@ 2022-06-14 20:39                         ` Kalra, Ashish
  0 siblings, 0 replies; 239+ messages in thread
From: Kalra, Ashish @ 2022-06-14 20:39 UTC (permalink / raw)
  To: Peter Gonda
  Cc: Alper Gun, the arch/x86 maintainers, LKML, kvm list, linux-coco,
	linux-mm, Linux Crypto Mailing List, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, Lendacky, Thomas, H. Peter Anvin,
	Ard Biesheuvel, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Roth, Michael, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, Tony Luck, Marc Orr,
	Sathyanarayanan Kuppuswamy

[AMD Official Use Only - General]

Hello Peter,

>> If the host only wants to run SNP guests, there’s no need to do an SEV INIT or any other SEV operations. If the host DOES want to run SEV AND SNP guests, then the required sequence is to do the SNP_INIT before the SEV INIT. Doing the WBINVD/DF_FLUSH any time between the SNP_INIT and any ACTIVATE command should be sufficient.

>Thanks for the follow-up, Ashish! I was assuming there was some issue with the ASIDs here but didn't have time to experiment yet.

Yes, it looks to be some issue with the ASIDs.

> Is there any downside to enabling SEV? If not, these patches make sense, but if there is some cost, should we make KVM capable of running SNP VMs without enabling SEV?

As mentioned above, we only need to do this if the host wants to run both SEV and SNP guests. I assume the cost is negligible, as it is only an init-time or guest-launch-time addition.
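
For an SNP-only host that skipped the SEV INIT, the guest launch path would then need to perform the flush sequence itself, roughly (a hypothetical sketch reusing the existing helpers):

    ret = sev_snp_init(&argp->error);       /* SNP_INIT */
    if (ret)
            goto e_free;

    /* All ASIDs are "dirty" after SNP_INIT; flush before any ACTIVATE. */
    wbinvd_on_all_cpus();
    ret = sev_guest_df_flush(&argp->error);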

Thanks,
Ashish

> -----Original Message-----
> From: Alper Gun <alpergun@google.com>
> Sent: Tuesday, June 14, 2022 1:58 PM
> To: Kalra, Ashish <Ashish.Kalra@amd.com>
> Subject: Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command
>
> Let me summarize what I tried.
>
> 1- When using psp_init_on_probe=false, the SNP VM fails in the SNP_LAUNCH_START step with error SEV_RET_DFFLUSH_REQUIRED(15).
> 2- added SEV_DF_FLUSH just after SNP platform init and it didn't fail 
> in launch start but failed later during SNP_LAUNCH_UPDATE with
> SEV_RET_INVALID_PARAM(22)
> 3- added SNP_DF_FLUSH just after SNP platform init and it failed again 
> during SNP_LAUNCH_UPDATE with SEV_RET_INVALID_PARAM(22)
> 4- added sev_platform_init for SNP VMs and it worked.
>
> For me, DF_FLUSH alone didn't help boot a VM. I don't know yet why the SEV platform status impacts the SNP VM, but sev_platform_init fixes the problem.
>
>
> On Tue, Jun 14, 2022 at 10:16 AM Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
> >
> > Hello Alper, Peter,
> >
> > -----Original Message-----
> > From: Peter Gonda <pgonda@google.com>
> > Sent: Tuesday, June 14, 2022 11:30 AM
> > To: Kalra, Ashish <Ashish.Kalra@amd.com>
> > Subject: Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT 
> > command
> >
> > On Tue, Jun 14, 2022 at 10:11 AM Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
> > >
> > >
> > > -----Original Message-----
> > > From: Peter Gonda <pgonda@google.com>
> > > Sent: Tuesday, June 14, 2022 10:38 AM
> > > To: Kalra, Ashish <Ashish.Kalra@amd.com>
> > > Subject: Re: [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT 
> > > command
> > >
> > > On Mon, Jun 13, 2022 at 6:21 PM Ashish Kalra <ashkalra@amd.com> wrote:
> > > >
> > > >
> > > > On 6/13/22 23:33, Alper Gun wrote:
> > > > > On Mon, Jun 13, 2022 at 4:15 PM Ashish Kalra <ashkalra@amd.com> wrote:
> > > > >> Hello Alper,
> > > > >>
> > > > >> On 6/13/22 20:58, Alper Gun wrote:
> > > > >>> static int sev_guest_init(struct kvm *kvm, struct 
> > > > >>> kvm_sev_cmd
> > > > >>> *argp)
> > > > >>>>    {
> > > > >>>> +       bool es_active = (argp->id == KVM_SEV_ES_INIT ||
> > > > >>>> + argp->id == KVM_SEV_SNP_INIT);
> > > > >>>>           struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > >>>> -       bool es_active = argp->id == KVM_SEV_ES_INIT;
> > > > >>>> +       bool snp_active = argp->id == KVM_SEV_SNP_INIT;
> > > > >>>>           int asid, ret;
> > > > >>>>
> > > > >>>>           if (kvm->created_vcpus) @@ -249,12 +269,22 @@ 
> > > > >>>> static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > > > >>>>                   return ret;
> > > > >>>>
> > > > >>>>           sev->es_active = es_active;
> > > > >>>> +       sev->snp_active = snp_active;
> > > > >>>>           asid = sev_asid_new(sev);
> > > > >>>>           if (asid < 0)
> > > > >>>>                   goto e_no_asid;
> > > > >>>>           sev->asid = asid;
> > > > >>>>
> > > > >>>> -       ret = sev_platform_init(&argp->error);
> > > > >>>> +       if (snp_active) {
> > > > >>>> +               ret = verify_snp_init_flags(kvm, argp);
> > > > >>>> +               if (ret)
> > > > >>>> +                       goto e_free;
> > > > >>>> +
> > > > >>>> +               ret = sev_snp_init(&argp->error);
> > > > >>>> +       } else {
> > > > >>>> +               ret = sev_platform_init(&argp->error);
> > > > >>> After SEV INIT_EX support patches, SEV may be initialized in the platform late.
> > > > >>> In my tests, if SEV has not been initialized in the platform 
> > > > >>> yet, SNP VMs fail with SEV_DF_FLUSH required error. I tried 
> > > > >>> calling SEV_DF_FLUSH right after the SNP platform init but 
> > > > >>> this time it failed later on the SNP launch update command 
> > > > >>> with SEV_RET_INVALID_PARAM error. Looks like there is 
> > > > >>> another dependency on SEV platform initialization.
> > > > >>>
> > > > >>> Calling sev_platform_init for SNP VMs fixes the problem in our tests.
> > > > >> Trying to get some more context for this issue.
> > > > >>
> > > > >> When you say after SEV_INIT_EX support patches, SEV may be 
> > > > >> initialized in the platform late, do you mean sev_pci_init()->sev_snp_init() ...
> > > > >> sev_platform_init() code path has still not executed on the host BSP ?
> > > > >>
> > > > > Correct, INIT_EX requires the file system to be ready and 
> > > > > there is a ccp module param to call it only when needed.
> > > > >
> > > > > MODULE_PARM_DESC(psp_init_on_probe, " if true, the PSP will be 
> > > > > initialized on module init. Else the PSP will be initialized 
> > > > > on the first command requiring it");
> > > > >
> > > > > If this module param is false, it won't initialize SEV on the 
> > > > > platform until the first SEV VM.
> > > > >
> > > > Ok, that makes sense.
> > > >
> > > > So the fix will be to call sev_platform_init() unconditionally
> > > > here in sev_guest_init(). Both sev_snp_init() and
> > > > sev_platform_init() are protected from being called twice, so
> > > > there won't be any issues if they are invoked again at SNP/SEV
> > > > VM launch after having been invoked earlier during module init.
> > >
> > > >That's one solution. I don't know if there is a downside to the system for enabling SEV if SNP is being enabled but another solution could be to just directly place a DF_FLUSH command instead of calling sev_platform_init().
> > >
> > > Actually sev_platform_init() is already called on module init if psp_init_on_probe is not false. We only need to ensure that the SNP firmware is initialized first with the SNP_INIT command.
> >
> > > But if psp_init_on_probe is false, sev_platform_init() isn't called down this path. Alper has suggested we always call sev_platform_init() but we could just place an SEV_DF_FLUSH command instead. Or am I still missing something?
> >
> > >After SEV INIT_EX support patches, SEV may be initialized in the platform late.
> > > In my tests, if SEV has not been initialized in the platform  yet, 
> > >SNP VMs fail with SEV_DF_FLUSH required error. I tried  calling 
> > >SEV_DF_FLUSH right after the SNP platform init.
> >
> > Are you getting the DF_FLUSH_REQUIRED error after the SNP activate command?
> >
> > Also did you use the SEV_DF_FLUSH command or the SNP_DF_FLUSH command ?
> >
> > With SNP you need to use SNP_DF_FLUSH command.
> >
> > Thanks,
> > Ashish

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2021-10-15 17:16         ` Sean Christopherson
@ 2022-09-08 21:21           ` Michael Roth
  2022-09-08 22:28             ` Michael Roth
  2022-09-14  8:05             ` Sean Christopherson
  0 siblings, 2 replies; 239+ messages in thread
From: Michael Roth @ 2022-09-08 21:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, jarkko

On Fri, Oct 15, 2021 at 05:16:28PM +0000, Sean Christopherson wrote:
> On Fri, Oct 15, 2021, Brijesh Singh wrote:
> > 
> > On 10/13/21 1:16 PM, Sean Christopherson wrote:
> > > On Wed, Oct 13, 2021, Sean Christopherson wrote:
> > >> On Fri, Aug 20, 2021, Brijesh Singh wrote:
> > >>> When SEV-SNP is enabled in the guest VM, the guest memory pages can
> > >>> be either private or shared. A write from the hypervisor goes through
> > >>> the RMP checks. If hardware sees that the hypervisor is attempting to
> > >>> write to a guest private page, then it triggers an RMP violation #PF.
> > >>>
> > >>> To avoid the RMP violation, add post_{map,unmap}_gfn() ops that can be
> > >>> used to verify that it's safe to map a given guest page. Use SRCU to
> > >>> protect against page state changes for existing mapped pages.
> > >> SRCU isn't protecting anything.  The synchronize_srcu_expedited() in the PSC code
> > >> forces it to wait for existing maps to go away, but it doesn't prevent new maps
> > >> from being created while the actual RMP updates are in-flight.  Most telling is
> > >> that the RMP updates happen _after_ the synchronize_srcu_expedited() call.
> > > Argh, another goof on my part.  Rereading prior feedback, I see that I loosely
> > > suggested SRCU as a possible solution.  That was a bad, bad suggestion.  I think
> > > (hope) I made it offhand without really thinking it through.  SRCU can't work in
> > > this case, because the whole premise of Read-Copy-Update is that there can be
> > > multiple copies of the data.  That simply can't be true for the RMP as hardware
> > > operates on a single table.
> > >
> > > In the future, please don't hesitate to push back on and/or question suggestions,
> > > especially those that are made without concrete examples, i.e. are likely off the
> > > cuff.  My goal isn't to set you up for failure :-/
> > 
> > What do you think about going back to my initial proposal of per-gfn
> > tracking [1]? We can limit the changes to just kvm_vcpu_map()
> > and let copy_to_user() take a fault and return an error (if it
> > attempts to write to guest private memory). If a PSC happens while the
> > lock is held, then simply return and let the guest retry the PSC.
> 
> That approach is also broken as it doesn't hold a lock when updating host_write_track,
> e.g. the count can be corrupted if writers collide, and nothing blocks writers on
> in-progress readers.
> 
> I'm not opposed to a scheme that blocks PSC while KVM is reading, but I don't want
> to spend time iterating on the KVM case until consensus has been reached on how
> exactly RMP updates will be handled, and in general how the kernel will manage
> guest private memory.

Hi Sean,

(Sorry in advance for the long read, but it touches on a couple of
intertwined topics that I think are important for this series.)

While we do still remain committed to working with the community toward
implementing a UPM solution, we would still like to have a 'complete'
solution in the meantime, at the very least to use as a basis for
building out other parts of the stack, or for enabling early adopters
downstream doing the same.

Toward that end, there are a couple of areas that need to be addressed
(and remain unaddressed with v6, since we were planning to leverage UPM
to handle them and so left them as open TODOs at the time):

 1) this issue of guarding against shared->private conversions for pages
    that are in use by the kernel
 2) how to deal with unmapping/splitting the direct map

(These 2 things end up being fairly tightly coupled, which is why I'm
trying to cover them both in this email. Will explain more in a bit)

You mentioned elsewhere how 1) would be nicely addressed with UPM, since
the conversion would only point the guest to an entirely new page, while
leaving the shared page intact (at least until the normal notifiers do
their thing in response to any subsequent discard operations from
userspace). If the guest breaks because it doesn't see the write, that's
okay, because it's not supposed to be switching in-use shared pages to
private while they are being used. (though the pKVM folks did note in v7
of private memslot patches that they actually use the same physical page
for both shared/private, so I'm not sure if that approach will still
stack up in that context, if it ends up being needed there)

So in the context of this interim solution, we're trying to look for a
solution that's simple enough that it can be used reliably, without
introducing too much additional complexity into KVM. There is one
approach that seems to fit that bill, which Brijesh attempted in an
earlier version of this series (I'm not sure what exactly was the
catalyst for changing the approach, as I wasn't really in the loop at
the time, but AIUI there weren't any showstoppers there; please
correct me if I'm missing anything):

 - if the host is writing to a page that it thinks is supposed to be
   shared, and the guest switches it to private, we get an RMP fault
   (actually, we will get a !PRESENT fault, since as of v5 we now
   remove the mapping from the directmap as part of conversion)
 - in the host #PF handler, if we see that the page is marked private
   in the RMP table, simply switch it back to shared
 - if this was a bug on the part of the host, then the guest will see
   the validated bit was cleared, and get a #VC where it can
   terminate accordingly
 - if this was a bug (or maliciousness) on the part of the guest,
   then the behavior is unexpected anyway, and it will be made
   aware of this by the same mechanism

We've done some testing with this approach and it seems to hold up,
but since we also need the handler to deal with the !PRESENT case
(because of the current approach of nuking directmap entry for
private pages), there's a situation that can't be easily resolved
with this approach:

  # CASE1: recoverable
  
    THREAD 1              THREAD 2
    - #PF(!PRESENT)       - #PF(!PRESENT)
    - restore directmap
    - RMPUPDATE(shared)
                          - not private? okay to retry since maybe it was
                            fixed up already
  
  # CASE2: unrecoverable
  
    THREAD 1
    - broken kernel breaks directmap/init-mm, not related to SEV
    - #PF(!PRESENT)
    - not private? we should oops, since retrying might cause a loop
  
The reason we can't distinguish between recoverable/unrecoverable is
that the kernel #PF handling here needs to happen both for !PRESENT
faults and for RMP violations. This is due to how we unmap from the
directmap as part of the shared->private conversion.

So we have to take the pessimistic approach due to CASE2, which
means that if CASE1 pops up, we'll still have a case where the guest
can cause a crash/loop.

That's where 2) comes into play. If we go back to the original approach
(as of v4) of simply splitting the directmap when we do shared->private
conversions, then we can limit the #PF handling to only those cases where
there's an RMP violation and X86_PF_RMP is set, which makes it safe to
retry for spurious cases like case #1, as in the sketch below.
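
Roughly, the handler logic I have in mind (untested sketch;
snp_lookup_rmpentry() follows this series' naming, rmp_make_shared()
is a stand-in for the actual fixup helper, and error handling is
elided):

  static bool try_fixup_rmp_fault(unsigned long error_code, u64 pfn)
  {
  	int level;

  	/* With the directmap only ever split, never unmapped, a
  	 * !PRESENT fault keeps its normal meaning and doesn't reach
  	 * this path.
  	 */
  	if (!(error_code & X86_PF_RMP))
  		return false;

  	/* Raced with another fixup: the page is no longer private,
  	 * so the faulting access can simply be retried.
  	 */
  	if (snp_lookup_rmpentry(pfn, &level) != 1)
  		return true;

  	/* Flip the page back to shared; the guest will observe the
  	 * Validated bit cleared and take a #VC, per the flow above.
  	 */
  	return !rmp_make_shared(pfn, PG_LEVEL_4K);
  }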

AIUI, this is still sort of an open question, but you noted how nuking
the directmap without any formalized interface for letting the kernel
know about it could be problematic down the road, which also sounds
like the sort of thing better suited to having UPM address it at a more
general level, since there are similar requirements for TDX as well.

AIUI there are 2 main arguments against splitting the directmap:
 a) we can't easily rebuild it atm
 b) things like KSM might still try to access private pages

But unmapping also suffers from a), since we still end up splitting the
directmap unless we remove pages in blocks of 2M. But nothing prevents
a guest from switching a single 4K page to private, in which case we
are forced to split. That would be normal behavior on the part of the
guest for setting up GHCB pages/etc, so we still end up splitting the
directmap over time.

And for b), as you noted, this is already something that SEV/SEV-ES
need to deal with, and at least in the case of SNP guests things are
a bit better since:

  - if some errant kernel thread tried to write to guest private
    memory, the guest would notice, since the #PF handler will flip
    the page back to shared and unset the validated bit on that
    memory.

  - for reads, they will get ciphertext, which isn't any worse than
    reading unencrypted guest memory for anything outside of KVM that
    specifically knows what it is expecting to read there (KSM being
    one exception, since it'll waste cycles cmp'ing ciphertext, but
    there's no way to make it work anyway, and we would be fine with
    requiring it to be disabled if that is a concern)

So that's sort of the context for why we're considering these 2 changes.
Any input on these is welcome, but at the very least we wanted to provide
our rationale for past reviewers.

Thanks!

-Mike

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2022-09-08 21:21           ` Michael Roth
@ 2022-09-08 22:28             ` Michael Roth
  2022-09-14  8:05             ` Sean Christopherson
  1 sibling, 0 replies; 239+ messages in thread
From: Michael Roth @ 2022-09-08 22:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, jarkko

On Thu, Sep 08, 2022 at 04:21:14PM -0500, Michael Roth wrote:
> [...]
>
> AIUI there are 2 main arguments against splitting the directmap:
>  a) we can't easily rebuild it atm
>  b) things like KSM might still try to access private pages
>
> But unmapping also suffers from a), since we still end up splitting the
> directmap unless we remove pages in blocks of 2M. But nothing prevents
> a guest from switching a single 4K page to private, in which case we
> are forced to split. That would be normal behavior on the part of the
> guest for setting up GHCB pages/etc, so we still end up splitting the
> directmap over time.

One more thing to note here: with the splitting approach, we also
have the option of doing all the splitting when the region is registered
via KVM_MEMORY_ENCRYPT_REG_REGION. If unsplitting becomes possible in
the future, we could then handle that in KVM_MEMORY_ENCRYPT_UNREG_REGION
all at once, where we'd have a bit more flexibility for going
about that.

-Mike


^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2022-09-08 21:21           ` Michael Roth
  2022-09-08 22:28             ` Michael Roth
@ 2022-09-14  8:05             ` Sean Christopherson
  2022-09-14 11:02               ` Marc Orr
  2022-09-19 17:56               ` Michael Roth
  1 sibling, 2 replies; 239+ messages in thread
From: Sean Christopherson @ 2022-09-14  8:05 UTC (permalink / raw)
  To: Michael Roth
  Cc: Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, jarkko

On Thu, Sep 08, 2022, Michael Roth wrote:
> On Fri, Oct 15, 2021 at 05:16:28PM +0000, Sean Christopherson wrote:
> So in the context of this interim solution, we're trying to look for a
> solution that's simple enough that it can be used reliably, without
> introducing too much additional complexity into KVM. There is one
> approach that seems to fit that bill, that Brijesh attempted in an
> earlier version of this series (I'm not sure what exactly was the
> catalyst to changing the approach, as I wasn't really in the loop at
> the time, but AIUI there weren't any showstoppers there, but please
> correct me if I'm missing anything):
> 
>  - if the host is writing to a page that it thinks is supposed to be
>    shared, and the guest switches it to private, we get an RMP fault
>    (actually, we will get a !PRESENT fault, since as of v5 we now
>    remove the mapping from the directmap as part of conversion)
>  - in the host #PF handler, if we see that the page is marked private
>    in the RMP table, simply switch it back to shared
>  - if this was a bug on the part of the host, then the guest will see

As discussed off-list, attempting to fix up RMP violations in the host #PF handler
is not a viable approach.  There was also extensive discussion on-list a while back:

https://lore.kernel.org/all/8a244d34-2b10-4cf8-894a-1bf12b59cf92@www.fastmail.com

> AIUI, this is still sort of an open question, but you noted how nuking
> the directmap without any formalized interface for letting the kernel
> know about it could be problematic down the road, which also sounds
> like the sort of thing more suited for having UPM address at a more
> general level, since there are similar requirements for TDX as well.
> 
> AIUI there are 2 main arguments against splitting the directmap:
>  a) we can't easily rebuild it atm
>  b) things like KSM might still tries to access private pages
> 
> But unmapping also suffers from a), since we still end up splitting the
> directmap unless we remove pages in blocks of 2M.

But for UPM, it's easy to rebuild the direct map since there will be an explicit,
kernel-controlled point where the "inaccessible" memfd releases the private page.
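
E.g., something like this hypothetical release hook (the name is made
up here; the real UPM API is still being designed):

  /* Called when the private memfd releases a page back to the kernel;
   * a single kernel-controlled spot to restore the direct map entry.
   */
  static void inaccessible_memfd_free_page(struct page *page)
  {
  	WARN_ON_ONCE(set_direct_map_default_noflush(page));
  	__free_page(page);
  }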

> But nothing prevents a guest from switching a single 4K page to private, in
> which case we are forced to split. That would be normal behavior on the part
> of the guest for setting up GHCB pages/etc, so we still end up splitting the
> directmap over time.

The host actually isn't _forced_ to split with UPM.  One option would be to refuse
to split the direct map and instead force userspace to eat the 2MB allocation even
though it only wants to map a single 4KB chunk into the guest.  I don't know that
that's a _good_ option, but it is an option.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2022-09-14  8:05             ` Sean Christopherson
@ 2022-09-14 11:02               ` Marc Orr
  2022-09-14 16:15                 ` Sean Christopherson
  2022-09-19 17:56               ` Michael Roth
  1 sibling, 1 reply; 239+ messages in thread
From: Marc Orr @ 2022-09-14 11:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Michael Roth, Brijesh Singh, x86, LKML, kvm list, linux-coco,
	Linux Memory Management List, Linux Crypto Mailing List,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Vlastimil Babka, Kirill A . Shutemov,
	Andi Kleen, Tony Luck, Sathyanarayanan Kuppuswamy, jarkko

On Wed, Sep 14, 2022 at 9:05 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Sep 08, 2022, Michael Roth wrote:
> > On Fri, Oct 15, 2021 at 05:16:28PM +0000, Sean Christopherson wrote:
> > So in the context of this interim solution, we're trying to look for a
> > solution that's simple enough that it can be used reliably, without
> > introducing too much additional complexity into KVM. There is one
> > approach that seems to fit that bill, that Brijesh attempted in an
> > earlier version of this series (I'm not sure what exactly was the
> > catalyst to changing the approach, as I wasn't really in the loop at
> > the time, but AIUI there weren't any showstoppers there, but please
> > correct me if I'm missing anything):
> >
> >  - if the host is writing to a page that it thinks is supposed to be
> >    shared, and the guest switches it to private, we get an RMP fault
> >    (actually, we will get a !PRESENT fault, since as of v5 we now
> >    remove the mapping from the directmap as part of conversion)
> >  - in the host #PF handler, if we see that the page is marked private
> >    in the RMP table, simply switch it back to shared
> >  - if this was a bug on the part of the host, then the guest will see
>
> As discussed off-list, attempting to fix up RMP violations in the host #PF handler
> is not a viable approach.  There was also extensive discussion on-list a while back:
>
> https://lore.kernel.org/all/8a244d34-2b10-4cf8-894a-1bf12b59cf92@www.fastmail.com

I mentioned this during Mike's talk at the micro-conference: for pages
mapped in by the kernel, can we disallow conversion to private?
Note, userspace accesses are already handled by UPM.

In pseudo-code, I'm thinking something like this:

kmap_helper() {
  // And all other interfaces where the kernel can map a GPA
  // into the kernel page tables
  mapped_into_kernel_mem_set[hpa] = true;
}

kunmap_helper() {
  // And all other interfaces where the kernel can unmap a GPA
  // from the kernel page tables
  mapped_into_kernel_mem_set[hpa] = false;

  // Except it's not this simple because we probably need ref counting
  // for multiple mappings. Sigh. But you get the idea.
}

rmpupdate_helper() {
  if (conversion == SHARED_TO_PRIVATE && mapped_into_kernel_mem_set[hpa])
    return -EINVAL;  // Or whatever the appropriate error code here is.
}

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2022-09-14 11:02               ` Marc Orr
@ 2022-09-14 16:15                 ` Sean Christopherson
  2022-09-14 16:32                   ` Marc Orr
  0 siblings, 1 reply; 239+ messages in thread
From: Sean Christopherson @ 2022-09-14 16:15 UTC (permalink / raw)
  To: Marc Orr
  Cc: Michael Roth, Brijesh Singh, x86, LKML, kvm list, linux-coco,
	Linux Memory Management List, Linux Crypto Mailing List,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Vlastimil Babka, Kirill A . Shutemov,
	Andi Kleen, Tony Luck, Sathyanarayanan Kuppuswamy, jarkko

On Wed, Sep 14, 2022, Marc Orr wrote:
> On Wed, Sep 14, 2022 at 9:05 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Sep 08, 2022, Michael Roth wrote:
> > > On Fri, Oct 15, 2021 at 05:16:28PM +0000, Sean Christopherson wrote:
> > > So in the context of this interim solution, we're trying to look for a
> > > solution that's simple enough that it can be used reliably, without
> > > introducing too much additional complexity into KVM. There is one
> > > approach that seems to fit that bill, that Brijesh attempted in an
> > > earlier version of this series (I'm not sure what exactly was the
> > > catalyst to changing the approach, as I wasn't really in the loop at
> > > the time, but AIUI there weren't any showstoppers there, but please
> > > correct me if I'm missing anything):
> > >
> > >  - if the host is writing to a page that it thinks is supposed to be
> > >    shared, and the guest switches it to private, we get an RMP fault
> > >    (actually, we will get a !PRESENT fault, since as of v5 we now
> > >    remove the mapping from the directmap as part of conversion)
> > >  - in the host #PF handler, if we see that the page is marked private
> > >    in the RMP table, simply switch it back to shared
> > >  - if this was a bug on the part of the host, then the guest will see
> >
> > As discussed off-list, attempting to fix up RMP violations in the host #PF handler
> > is not a viable approach.  There was also extensive discussion on-list a while back:
> >
> > https://lore.kernel.org/all/8a244d34-2b10-4cf8-894a-1bf12b59cf92@www.fastmail.com
> 
> I mentioned this during Mike's talk at the micro-conference: For pages
> mapped in by the kernel can we disallow them to be converted to
> private?

In theory, yes.  Do we want to do something like this?  No.  kmap() does something
vaguely similar for 32-bit PAE/PSE kernels, but that's a lot of complexity and
overhead to take on.  And this issue goes far beyond a kmap(); when the kernel gup()s
a page, the kernel expects the pfn to be available, no exceptions (pun intended).

> Note, userspace accesses are already handled by UPM.

I'm confused by the UPM comment.  Isn't the gist of this thread about the ability
to merge SNP _without_ UPM?  Or am I out in left field?

> In pseudo-code, I'm thinking something like this:
> 
> kmap_helper() {
>   // And all other interfaces where the kernel can map a GPA
>   // into the kernel page tables
>   mapped_into_kernel_mem_set[hpa] = true;
> }
> 
> kunmap_helper() {
>   // And all other interfaces where the kernel can unmap a GPA
>   // into the kernel page tables
>   mapped_into_kernel_mem_set[hpa] = false;
> 
>   // Except it's not this simple because we probably need ref counting
>   // for multiple mappings. Sigh. But you get the idea.

A few issues off the top of my head:

  - It's not just refcounting, there would also likely need to be locking to
    guarantee sane behavior.
  - kmap() isn't allowed to fail and RMPUPDATE isn't strictly guaranteed to succeed,
    which is problematic if the kernel attempts to kmap() a page that's already
    private, especially for kmap_atomic(), which isn't allowed to sleep.
  - Not all kernel code is well behaved and bounces through kmap(); undoubtedly
    some of the 1200+ users of page_address() will be problematic.
    
    $ git grep page_address | wc -l
    1267
  - It's not sufficient for TDX.  Merging something this complicated when we know
    we still need UPM would be irresponsible from a maintenance perspective.
  - KVM would need to support two separate APIs for SNP, which I very much don't
    want to do.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2022-09-14 16:15                 ` Sean Christopherson
@ 2022-09-14 16:32                   ` Marc Orr
  2022-09-14 16:39                     ` Marc Orr
  0 siblings, 1 reply; 239+ messages in thread
From: Marc Orr @ 2022-09-14 16:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Michael Roth, Brijesh Singh, x86, LKML, kvm list, linux-coco,
	Linux Memory Management List, Linux Crypto Mailing List,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Vlastimil Babka, Kirill A . Shutemov,
	Andi Kleen, Tony Luck, Sathyanarayanan Kuppuswamy, jarkko

On Wed, Sep 14, 2022 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Sep 14, 2022, Marc Orr wrote:
> > On Wed, Sep 14, 2022 at 9:05 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Thu, Sep 08, 2022, Michael Roth wrote:
> > > > On Fri, Oct 15, 2021 at 05:16:28PM +0000, Sean Christopherson wrote:
> > > > So in the context of this interim solution, we're trying to look for a
> > > > solution that's simple enough that it can be used reliably, without
> > > > introducing too much additional complexity into KVM. There is one
> > > > approach that seems to fit that bill, that Brijesh attempted in an
> > > > earlier version of this series (I'm not sure what exactly was the
> > > > catalyst to changing the approach, as I wasn't really in the loop at
> > > > the time, but AIUI there weren't any showstoppers there, but please
> > > > correct me if I'm missing anything):
> > > >
> > > >  - if the host is writing to a page that it thinks is supposed to be
> > > >    shared, and the guest switches it to private, we get an RMP fault
> > > >    (actually, we will get a !PRESENT fault, since as of v5 we now
> > > >    remove the mapping from the directmap as part of conversion)
> > > >  - in the host #PF handler, if we see that the page is marked private
> > > >    in the RMP table, simply switch it back to shared
> > > >  - if this was a bug on the part of the host, then the guest will see
> > >
> > > As discussed off-list, attempting to fix up RMP violations in the host #PF handler
> > > is not a viable approach.  There was also extensive discussion on-list a while back:
> > >
> > > https://lore.kernel.org/all/8a244d34-2b10-4cf8-894a-1bf12b59cf92@www.fastmail.com
> >
> > I mentioned this during Mike's talk at the micro-conference: For pages
> > mapped in by the kernel can we disallow them to be converted to
> > private?
>
> In theory, yes.  Do we want to do something like this?  No.  kmap() does something
> vaguely similar for 32-bit PAE/PSE kernels, but that's a lot of complexity and
> overhead to take on.  And this issue goes far beyond a kmap(); when the kernel gup()s
> a page, the kernel expects the pfn to be available, no exceptions (pun intended).
>
> > Note, userspace accesses are already handled by UPM.
>
> I'm confused by the UPM comment.  Isn't the gist of this thread about the ability
> to merge SNP _without_ UPM?  Or am I out in left field?

I think that was the overall gist: yes. But it's not what I was trying
to comment on :-).

HOWEVER, thinking about this more: I was confused when I wrote out my
last reply. I had thought that the issue that Michael brought up
applied even with UPM. That is, I was thinking it was still possible
for a guest to maliciously convert to private a page that the kernel
had mapped in and assumed to be shared.

But I now realize that is not what will actually happen. To be
concrete, let's assume the GHCB page. What will happen is:
- KVM has GHCB page mapped in. GHCB is always assumed to be shared. So
far so good.
- Malicious guest converts GHCB page to private (e.g., via Page State
Change request)
- Guest exits to KVM
- KVM exits to userspace VMM
- Userspace VMM allocates the page in the private FD.

Now, what happens here depends on how UPM works. If we allow double
allocation, then our host kernel is safe. However, now we have the
"double allocation problem".

If, on the other hand, we deallocate the page in the shared FD, the
host kernel can segfault. And now we actually do have essentially the
same problem Michael was describing without UPM, because
we'll end up in fault.c in kernel context and likely panic the
host.

I hope I got this right this time. Sorry for the confusion in my last reply.

> > In pseudo-code, I'm thinking something like this:
> >
> > kmap_helper() {
> >   // And all other interfaces where the kernel can map a GPA
> >   // into the kernel page tables
> >   mapped_into_kernel_mem_set[hpa] = true;
> > }
> >
> > kunmap_helper() {
> >   // And all other interfaces where the kernel can unmap a GPA
> >   // into the kernel page tables
> >   mapped_into_kernel_mem_set[hpa] = false;
> >
> >   // Except it's not this simple because we probably need ref counting
> >   // for multiple mappings. Sigh. But you get the idea.
>
> A few issues off the top of my head:
>
>   - It's not just refcounting, there would also likely need to be locking to
>     guarantee sane behavior.
>   - kmap() isn't allowed to fail and RMPUPDATE isn't strictly guaranteed to succeed,
>     which is problematic if the kernel attempts to kmap() a page that's already
>     private, especially for kmap_atomic(), which isn't allowed to sleep.
>   - Not all kernel code is well behaved and bounces through kmap(); undoubtedly
>     some of the 1200+ users of page_address() will be problematic.
>
>     $ git grep page_address | wc -l
>     1267
>   - It's not sufficient for TDX.  Merging something this complicated when we know
>     we still need UPM would be irresponsible from a maintenance perspective.
>   - KVM would need to support two separate APIs for SNP, which I very much don't
>     want to do.

Ack on merging without UPM. I wasn't trying to chime in on merging
before/after UPM. See my other comment above. Sorry for the confusion.
Ack on other concerns about "enlightening kmap" as well. I agree.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2022-09-14 16:32                   ` Marc Orr
@ 2022-09-14 16:39                     ` Marc Orr
  0 siblings, 0 replies; 239+ messages in thread
From: Marc Orr @ 2022-09-14 16:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Michael Roth, Brijesh Singh, x86, LKML, kvm list, linux-coco,
	Linux Memory Management List, Linux Crypto Mailing List,
	Thomas Gleixner, Ingo Molnar, Joerg Roedel, Tom Lendacky,
	H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Andy Lutomirski, Dave Hansen,
	Sergio Lopez, Peter Gonda, Peter Zijlstra, Srinivas Pandruvada,
	David Rientjes, Dov Murik, Tobin Feldman-Fitzthum,
	Borislav Petkov, Vlastimil Babka, Kirill A . Shutemov,
	Andi Kleen, Tony Luck, Sathyanarayanan Kuppuswamy, jarkko

On Wed, Sep 14, 2022 at 5:32 PM Marc Orr <marcorr@google.com> wrote:
>
> On Wed, Sep 14, 2022 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Sep 14, 2022, Marc Orr wrote:
> > > On Wed, Sep 14, 2022 at 9:05 AM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Thu, Sep 08, 2022, Michael Roth wrote:
> > > > > On Fri, Oct 15, 2021 at 05:16:28PM +0000, Sean Christopherson wrote:
> > > > > So in the context of this interim solution, we're trying to look for a
> > > > > solution that's simple enough that it can be used reliably, without
> > > > > introducing too much additional complexity into KVM. There is one
> > > > > approach that seems to fit that bill, that Brijesh attempted in an
> > > > > earlier version of this series (I'm not sure what exactly was the
> > > > > catalyst to changing the approach, as I wasn't really in the loop at
> > > > > the time, but AIUI there weren't any showstoppers there, but please
> > > > > correct me if I'm missing anything):
> > > > >
> > > > >  - if the host is writing to a page that it thinks is supposed to be
> > > > >    shared, and the guest switches it to private, we get an RMP fault
> > > > >    (actually, we will get a !PRESENT fault, since as of v5 we now
> > > > >    remove the mapping from the directmap as part of conversion)
> > > > >  - in the host #PF handler, if we see that the page is marked private
> > > > >    in the RMP table, simply switch it back to shared
> > > > >  - if this was a bug on the part of the host, then the guest will see
> > > >
> > > > As discussed off-list, attempting to fix up RMP violations in the host #PF handler
> > > > is not a viable approach.  There was also extensive discussion on-list a while back:
> > > >
> > > > https://lore.kernel.org/all/8a244d34-2b10-4cf8-894a-1bf12b59cf92@www.fastmail.com
> > >
> > > I mentioned this during Mike's talk at the micro-conference: For pages
> > > mapped in by the kernel can we disallow them to be converted to
> > > private?
> >
> > In theory, yes.  Do we want to do something like this?  No.  kmap() does something
> > vaguely similar for 32-bit PAE/PSE kernels, but that's a lot of complexity and
> > overhead to take on.  And this issue goes far beyond a kmap(); when the kernel gup()s
> > a page, the kernel expects the pfn to be available, no exceptions (pun intended).
> >
> > > Note, userspace accesses are already handled by UPM.
> >
> > I'm confused by the UPM comment.  Isn't the gist of this thread about the ability
> > to merge SNP _without_ UPM?  Or am I out in left field?
>
> I think that was the overall gist: yes. But it's not what I was trying
> to comment on :-).
>
> HOWEVER, thinking about this more: I was confused when I wrote out my
> last reply. I had thought that the issue that Michael brought up
> applied even with UPM. That is, I was thinking it was still possibly
> for a guest to maliciously convert a page to private mapped in by the
> kernel and assumed to be shared.
>
> But I now realize that is not what will actually happen. To be
> concrete, let's assume the GHCB page. What will happen is:
> - KVM has GHCB page mapped in. GHCB is always assumed to be shared. So
> far so good.
> - Malicious guest converts GHCB page to private (e.g., via Page State
> Change request)
> - Guest exits to KVM
> - KVM exits to userspace VMM
> - Userspace VM allocates page in private FD.
>
> Now, what happens here depends on how UPM works. If we allow double
> allocation then our host kernel is safe. However, now we have the
> "double allocation problem".
>
> If on the other hand, we deallocate the page in the shared FD, the
> host kernel can segfault. And now we actually do have essentially the
> same problem Michael was describing that we have without UPM. Because
> we'll end up in fault.c in the kernel context and likely panic the
> host.

Thinking about this even more... Even if we deallocate in the
userspace VMM's shared FD, the kernel has its own page tables --
right? So maybe we are actually 100% OK under UPM then regardless of
the userspace VMM's policy around managing the private and shared FDs.

^ permalink raw reply	[flat|nested] 239+ messages in thread

* Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap
  2022-09-14  8:05             ` Sean Christopherson
  2022-09-14 11:02               ` Marc Orr
@ 2022-09-19 17:56               ` Michael Roth
  1 sibling, 0 replies; 239+ messages in thread
From: Michael Roth @ 2022-09-19 17:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Brijesh Singh, x86, linux-kernel, kvm, linux-coco, linux-mm,
	linux-crypto, Thomas Gleixner, Ingo Molnar, Joerg Roedel,
	Tom Lendacky, H. Peter Anvin, Ard Biesheuvel, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Andy Lutomirski,
	Dave Hansen, Sergio Lopez, Peter Gonda, Peter Zijlstra,
	Srinivas Pandruvada, David Rientjes, Dov Murik,
	Tobin Feldman-Fitzthum, Borislav Petkov, Vlastimil Babka,
	Kirill A . Shutemov, Andi Kleen, tony.luck, marcorr,
	sathyanarayanan.kuppuswamy, jarkko

On Wed, Sep 14, 2022 at 08:05:49AM +0000, Sean Christopherson wrote:
> On Thu, Sep 08, 2022, Michael Roth wrote:
> > On Fri, Oct 15, 2021 at 05:16:28PM +0000, Sean Christopherson wrote:
> > So in the context of this interim solution, we're trying to look for a
> > solution that's simple enough that it can be used reliably, without
> > introducing too much additional complexity into KVM. There is one
> > approach that seems to fit that bill, that Brijesh attempted in an
> > earlier version of this series (I'm not sure what exactly was the
> > catalyst to changing the approach, as I wasn't really in the loop at
> > the time, but AIUI there weren't any showstoppers there, but please
> > correct me if I'm missing anything):
> > 
> >  - if the host is writing to a page that it thinks is supposed to be
> >    shared, and the guest switches it to private, we get an RMP fault
> >    (actually, we will get a !PRESENT fault, since as of v5 we now
> >    remove the mapping from the directmap as part of conversion)
> >  - in the host #PF handler, if we see that the page is marked private
> >    in the RMP table, simply switch it back to shared
> >  - if this was a bug on the part of the host, then the guest will see

Hi Sean,

Thanks for the input here and at KVM Forum.

> 
> As discussed off-list, attempting to fix up RMP violations in the host #PF handler
> is not a viable approach.  There was also extensive discussion on-list a while back:
> 
https://lore.kernel.org/all/8a244d34-2b10-4cf8-894a-1bf12b59cf92@www.fastmail.com

I think that was likely the only hope for a non-UPM approach, as
anything else would require a good bit of infrastructure in KVM and
elsewhere to avoid that situation occurring to begin with, and it
probably would not be worth the effort outside the context of a
general/platform-independent solution like UPM. I was hoping it would be
possible to work through Andy's concerns, but the concerns you and Paolo
raised about potential surprise #PFs in other parts of the kernel are
something I'm less optimistic about, so I agree UPM is probably the right
place to focus efforts.
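
To make that concrete for anyone finding this later in the archive, the
fixup approach quoted above would have looked something like the sketch
below. Untested and illustrative only; the helper names
(snp_lookup_rmpentry_assigned(), rmp_make_shared()) and the X86_PF_RMP
bit are placeholders loosely modeled on this series, not settled
interfaces.

/*
 * Sketch of the abandoned approach: from the host #PF handler,
 * detect an RMP violation on a page the guest converted to private,
 * and flip it back to shared so the host access can proceed.
 * Assumes a kernel direct-map address; locking and error handling
 * omitted.
 */
static bool try_fixup_rmp_fault(unsigned long hw_error_code,
				unsigned long address)
{
	u64 pfn = __pa(address) >> PAGE_SHIFT;
	int level;

	if (!(hw_error_code & X86_PF_RMP))
		return false;	/* not an RMP violation */

	/* Did the guest flip this page to private behind our back? */
	if (!snp_lookup_rmpentry_assigned(pfn, &level))
		return false;

	/*
	 * Convert the page back to shared: if this was a host bug,
	 * the guest observes corrupted data rather than the host
	 * panicking in kernel context.
	 */
	return rmp_make_shared(pfn, PG_LEVEL_4K) == 0;
}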

> 
> > AIUI, this is still sort of an open question, but you noted how nuking
> > the directmap without any formalized interface for letting the kernel
> > know about it could be problematic down the road, which also sounds
> > like the sort of thing better handled by UPM at a more general
> > level, since there are similar requirements for TDX as well.
> > 
> > AIUI there are 2 main arguments against splitting the directmap:
> >  a) we can't easily rebuild it atm
> >  b) things like KSM might still try to access private pages
> > 
> > But unmapping also suffers from a), since we still end up splitting the
> > directmap unless we remove pages in blocks of 2M.
> 
> But for UPM, it's easy to rebuild the direct map since there will be an explicit,
> kernel-controlled point where the "inaccessible" memfd releases the private page.

I was thinking it would be possible to do something similar by doing page
splitting/restore in bulk as part of MEM_ENCRYPT_{REG,UNREG}_REGION, but
yes UPM also allows for a convenient point in time to split/unsplit.
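
Something along these lines on the REG side, e.g. (rough sketch;
set_memory_4k() exists today, but merging the directmap back together
on the UNREG side is exactly the piece that's missing):

/*
 * Pre-split the directmap covering a guest region down to 4K once,
 * at MEM_ENCRYPT_REG_REGION time, so later per-page private
 * conversions don't each trigger an on-demand split. The UNREG side
 * would want to merge these back to 2M, which we can't easily do
 * today.
 */
static int snp_presplit_directmap(u64 start_pfn, int npages)
{
	unsigned long vaddr = (unsigned long)pfn_to_kaddr(start_pfn);

	return set_memory_4k(vaddr, npages);
}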

> 
> > But nothing prevents a guest from switching a single 4K page to private, in
> > which case we are forced to split. That would be normal behavior on the part
> > of the guest for setting up GHCB pages/etc, so we still end up splitting the
> > directmap over time.
> 
> The host actually isn't _forced_ to split with UPM.  One option would be to refuse
> to split the direct map and instead force userspace to eat the 2MB allocation even
> though it only wants to map a single 4KB chunk into the guest.  I don't know that
> that's a _good_ option, but it is an option.

That does seem like a reasonable option. Maybe it also opens up a path
for hugetlbfs support of sorts. In practice I wouldn't expect too many
of those pages to be wasted; worst case would be 2MB per shared page in
the guest... I suppose that could add up for GHCB pages and whatnot if
there are lots of vCPUs, but at that point you're likely dealing with
large guests with enough memory to spare. It could be another pain point
when calculating appropriate memory limits for userspace, though.
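
(Back-of-the-envelope: a 256-vCPU guest with one GHCB page per vCPU
would waste at most 256 * (2MB - 4KB) ~= 512MB under that scheme --
noticeable, but probably tolerable for a guest of that size.)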

Thanks!

-Mike


end of thread

Thread overview: 239+ messages
2021-08-20 15:58 [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
2021-08-20 15:58 ` [PATCH Part2 v5 01/45] x86/cpufeatures: Add SEV-SNP CPU feature Brijesh Singh
2021-09-16 16:56   ` Borislav Petkov
2021-09-16 17:35     ` Brijesh Singh
2021-08-20 15:58 ` [PATCH Part2 v5 02/45] iommu/amd: Introduce function to check SEV-SNP support Brijesh Singh
2021-09-16 17:26   ` Borislav Petkov
2021-08-20 15:58 ` [PATCH Part2 v5 03/45] x86/sev: Add the host SEV-SNP initialization support Brijesh Singh
2021-09-24  8:58   ` Borislav Petkov
2021-08-20 15:58 ` [PATCH Part2 v5 04/45] x86/sev: Add RMP entry lookup helpers Brijesh Singh
2021-09-24  9:49   ` Borislav Petkov
2021-09-27 16:01     ` Brijesh Singh
2021-09-27 16:04       ` Brijesh Singh
2021-09-29 12:56         ` Borislav Petkov
2022-06-02 11:57   ` Jarkko Sakkinen
2021-08-20 15:58 ` [PATCH Part2 v5 05/45] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Brijesh Singh
2021-09-24 14:04   ` Borislav Petkov
2021-09-27 16:06     ` Brijesh Singh
2021-10-15 18:05   ` Sean Christopherson
2021-10-15 20:18     ` Brijesh Singh
2021-10-15 20:27       ` Sean Christopherson
2021-10-15 20:36         ` Brijesh Singh
2021-08-20 15:58 ` [PATCH Part2 v5 06/45] x86/sev: Invalidate pages from direct map when adding them to RMP table Brijesh Singh
2021-09-29 14:34   ` Borislav Petkov
2021-09-30 16:19     ` Brijesh Singh
2021-10-01 11:06       ` Borislav Petkov
2021-08-20 15:58 ` [PATCH Part2 v5 07/45] x86/traps: Define RMP violation #PF error code Brijesh Singh
2021-09-29 17:25   ` Borislav Petkov
2021-08-20 15:58 ` [PATCH Part2 v5 08/45] x86/fault: Add support to handle the RMP fault for user address Brijesh Singh
2021-08-23 14:20   ` Dave Hansen
2021-08-23 14:36     ` Brijesh Singh
2021-08-23 14:50       ` Dave Hansen
2021-08-24 16:42         ` Joerg Roedel
2021-08-25  9:16           ` Vlastimil Babka
2021-08-25 13:50             ` Tom Lendacky
2021-09-29 18:19   ` Borislav Petkov
2021-08-20 15:58 ` [PATCH Part2 v5 09/45] x86/fault: Add support to dump RMP entry on fault Brijesh Singh
2021-09-29 18:38   ` Borislav Petkov
2021-08-20 15:58 ` [PATCH Part2 v5 10/45] crypto: ccp: shutdown SEV firmware on kexec Brijesh Singh
2021-08-20 15:58 ` [PATCH Part2 v5 11/45] crypto: ccp: Define the SEV-SNP commands Brijesh Singh
2021-08-20 15:58 ` [PATCH Part2 v5 12/45] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Brijesh Singh
2021-08-20 15:58 ` [PATCH Part2 v5 13/45] crypto: ccp: Provide APIs to issue SEV-SNP commands Brijesh Singh
2021-08-20 15:58 ` [PATCH Part2 v5 14/45] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Brijesh Singh
2022-02-25 18:03   ` Alper Gun
2022-03-01 14:12     ` Brijesh Singh
2022-06-14  0:10   ` Alper Gun
2021-08-20 15:58 ` [PATCH Part2 v5 15/45] crypto: ccp: Handle the legacy SEV command " Brijesh Singh
2021-08-20 15:58 ` [PATCH Part2 v5 16/45] crypto: ccp: Add the SNP_PLATFORM_STATUS command Brijesh Singh
2021-09-10  3:18   ` Marc Orr
2021-09-13 11:17     ` Brijesh Singh
2021-09-22 17:35   ` Dr. David Alan Gilbert
2021-09-23 18:01     ` Brijesh Singh
2021-08-20 15:58 ` [PATCH Part2 v5 17/45] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Brijesh Singh
2021-09-01 21:02   ` Connor Kuehl
2021-09-01 23:06     ` Brijesh Singh
2021-09-10  3:27   ` Marc Orr
2021-09-13 11:29     ` Brijesh Singh
2021-08-20 15:58 ` [PATCH Part2 v5 18/45] crypto: ccp: Provide APIs to query extended attestation report Brijesh Singh
2021-09-10  3:30   ` Marc Orr
2021-09-12  7:46     ` Dov Murik
2021-08-20 15:58 ` [PATCH Part2 v5 19/45] KVM: SVM: Add support to handle AP reset MSR protocol Brijesh Singh
2021-08-20 15:58 ` [PATCH Part2 v5 20/45] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT Brijesh Singh
2021-10-12 20:38   ` Sean Christopherson
2021-08-20 15:58 ` [PATCH Part2 v5 21/45] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Brijesh Singh
2021-09-22 18:55   ` Dr. David Alan Gilbert
2021-09-23 18:09     ` Brijesh Singh
2021-09-23 18:39       ` Dr. David Alan Gilbert
2021-09-23 22:23         ` Brijesh Singh
2021-09-23 19:17       ` Marc Orr
2021-09-23 20:44         ` Brijesh Singh
2021-09-23 20:55           ` Marc Orr
2021-10-12 20:44   ` Sean Christopherson
2021-08-20 15:58 ` [PATCH Part2 v5 22/45] KVM: SVM: Add initial SEV-SNP support Brijesh Singh
2021-08-20 15:58 ` [PATCH Part2 v5 23/45] KVM: SVM: Add KVM_SNP_INIT command Brijesh Singh
2021-09-05  6:56   ` Dov Murik
2021-09-05 13:59     ` Brijesh Singh
2021-09-10  3:32   ` Marc Orr
2021-09-13 11:32     ` Brijesh Singh
2021-09-16 15:50   ` Peter Gonda
2022-06-13 20:58   ` Alper Gun
2022-06-13 23:15     ` Ashish Kalra
2022-06-13 23:33       ` Alper Gun
2022-06-14  0:21         ` Ashish Kalra
2022-06-14 15:37           ` Peter Gonda
2022-06-14 16:11             ` Kalra, Ashish
2022-06-14 16:30               ` Peter Gonda
2022-06-14 17:16                 ` Kalra, Ashish
2022-06-14 18:58                   ` Alper Gun
2022-06-14 20:23                     ` Kalra, Ashish
2022-06-14 20:29                       ` Peter Gonda
2022-06-14 20:39                         ` Kalra, Ashish
2021-08-20 15:58 ` [PATCH Part2 v5 24/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Brijesh Singh
2021-08-20 15:58 ` [PATCH Part2 v5 25/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Brijesh Singh
2021-09-27 16:43   ` Peter Gonda
2021-09-27 19:33     ` Brijesh Singh
2021-10-05 15:01       ` Peter Gonda
2021-08-20 15:58 ` [PATCH Part2 v5 26/45] KVM: SVM: Mark the private vma unmergeable for SEV-SNP guests Brijesh Singh
2021-09-23 17:18   ` Dr. David Alan Gilbert
2021-10-12 18:46   ` Sean Christopherson
2021-10-13 12:39     ` Brijesh Singh
2021-10-13 14:34       ` Sean Christopherson
2021-10-13 14:51         ` Brijesh Singh
2021-10-13 15:33           ` Sean Christopherson
2021-08-20 15:59 ` [PATCH Part2 v5 27/45] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Brijesh Singh
2022-05-18 20:21   ` Marc Orr
2022-05-18 20:35     ` Kalra, Ashish
2021-08-20 15:59 ` [PATCH Part2 v5 28/45] KVM: X86: Keep the NPT and RMP page level in sync Brijesh Singh
2021-08-20 15:59 ` [PATCH Part2 v5 29/45] KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault() Brijesh Singh
2021-08-20 15:59 ` [PATCH Part2 v5 30/45] KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP Brijesh Singh
2021-08-20 15:59 ` [PATCH Part2 v5 31/45] KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use Brijesh Singh
2021-08-20 15:59 ` [PATCH Part2 v5 32/45] KVM: x86: Define RMP page fault error bits for #NPF Brijesh Singh
2021-09-30 23:41   ` Marc Orr
2021-10-01 13:03     ` Borislav Petkov
2021-08-20 15:59 ` [PATCH Part2 v5 33/45] KVM: x86: Update page-fault trace to log full 64-bit error code Brijesh Singh
2021-10-13 21:23   ` Sean Christopherson
2021-08-20 15:59 ` [PATCH Part2 v5 34/45] KVM: SVM: Do not use long-lived GHCB map while setting scratch area Brijesh Singh
2021-10-13 21:20   ` Sean Christopherson
2021-10-15 16:11     ` Brijesh Singh
2021-10-15 16:44       ` Sean Christopherson
2021-08-20 15:59 ` [PATCH Part2 v5 35/45] KVM: SVM: Remove the long-lived GHCB host map Brijesh Singh
2021-08-20 15:59 ` [PATCH Part2 v5 36/45] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT Brijesh Singh
2021-08-20 15:59 ` [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Brijesh Singh
2021-09-28  9:56   ` Dr. David Alan Gilbert
2021-10-12 21:48   ` Sean Christopherson
2021-10-13 17:04     ` Sean Christopherson
2021-10-13 17:05     ` Brijesh Singh
2021-10-13 17:24       ` Sean Christopherson
2021-10-13 17:49         ` Brijesh Singh
2021-08-20 15:59 ` [PATCH Part2 v5 38/45] KVM: SVM: Add support to handle " Brijesh Singh
2021-09-28 10:17   ` Dr. David Alan Gilbert
2021-09-28 23:20     ` Brijesh Singh
2021-08-20 15:59 ` [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap Brijesh Singh
2021-10-13  0:23   ` Sean Christopherson
2021-10-13 18:10     ` Brijesh Singh
2021-10-13 20:10       ` Sean Christopherson
2021-10-13 21:49         ` Brijesh Singh
2021-10-13 22:10           ` Sean Christopherson
2021-10-13 22:31             ` Brijesh Singh
2021-10-13 20:16     ` Sean Christopherson
2021-10-15 16:31       ` Brijesh Singh
2021-10-15 17:16         ` Sean Christopherson
2022-09-08 21:21           ` Michael Roth
2022-09-08 22:28             ` Michael Roth
2022-09-14  8:05             ` Sean Christopherson
2022-09-14 11:02               ` Marc Orr
2022-09-14 16:15                 ` Sean Christopherson
2022-09-14 16:32                   ` Marc Orr
2022-09-14 16:39                     ` Marc Orr
2022-09-19 17:56               ` Michael Roth
2021-08-20 15:59 ` [PATCH Part2 v5 40/45] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Brijesh Singh
2021-08-20 15:59 ` [PATCH Part2 v5 41/45] KVM: SVM: Add support to handle the RMP nested page fault Brijesh Singh
2021-09-29 12:24   ` Dr. David Alan Gilbert
2021-10-13 17:57   ` Sean Christopherson
2021-08-20 15:59 ` [PATCH Part2 v5 42/45] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Brijesh Singh
2021-09-29 21:33   ` Peter Gonda
2021-09-29 22:00   ` Peter Gonda
2021-08-20 15:59 ` [PATCH Part2 v5 43/45] KVM: SVM: Use a VMSA physical address variable for populating VMCB Brijesh Singh
2021-10-15 18:58   ` Sean Christopherson
2021-08-20 15:59 ` [PATCH Part2 v5 44/45] KVM: SVM: Support SEV-SNP AP Creation NAE event Brijesh Singh
2021-10-15 19:50   ` Sean Christopherson
2021-10-20 21:48     ` Brijesh Singh
2021-10-20 23:01       ` Sean Christopherson
2021-08-20 15:59 ` [PATCH Part2 v5 45/45] KVM: SVM: Add module parameter to enable the SEV-SNP Brijesh Singh
2021-11-12 15:43 ` [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Peter Gonda
2021-11-12 17:59   ` Dave Hansen
2021-11-12 18:35     ` Borislav Petkov
2021-11-12 19:48       ` Sean Christopherson
2021-11-12 20:04         ` Borislav Petkov
2021-11-12 20:37           ` Sean Christopherson
2021-11-12 20:53             ` Borislav Petkov
2021-11-12 21:12               ` Peter Gonda
2021-11-12 21:20                 ` Andy Lutomirski
2021-11-12 22:04                   ` Borislav Petkov
2021-11-12 22:52                   ` Peter Gonda
2021-11-13  0:00                 ` Sean Christopherson
2021-11-13  0:10                   ` Marc Orr
2021-11-13 18:34                     ` Sean Christopherson
2021-11-14  7:54                       ` Marc Orr
2021-11-15 17:16                         ` Sean Christopherson
2021-11-15 16:36                       ` Joerg Roedel
2021-11-15 17:25                         ` Sean Christopherson
2021-11-12 21:30             ` Marc Orr
2021-11-12 21:37               ` Dave Hansen
2021-11-12 21:40                 ` Marc Orr
2021-11-12 21:39               ` Andy Lutomirski
2021-11-12 21:43                 ` Marc Orr
2021-11-12 22:54                   ` Peter Gonda
2021-11-13  0:53                     ` Sean Christopherson
2021-11-13  1:04                       ` Marc Orr
2021-11-13 18:28                         ` Sean Christopherson
2021-11-14  7:41                           ` Marc Orr
2021-11-15 18:17                             ` Sean Christopherson
2021-11-15 16:52                           ` Joerg Roedel
2021-11-15 16:18             ` Brijesh Singh
2021-11-15 18:44               ` Sean Christopherson
2021-11-15 18:58                 ` Brijesh Singh
2021-11-12 21:16         ` Marc Orr
2021-11-12 21:23           ` Andy Lutomirski
2021-11-12 21:35             ` Borislav Petkov
2021-11-15 12:30         ` Dr. David Alan Gilbert
2021-11-15 14:42           ` Joerg Roedel
2021-11-15 15:33             ` Dr. David Alan Gilbert
2021-11-15 16:20               ` Joerg Roedel
2021-11-15 16:32                 ` Dr. David Alan Gilbert
2021-11-15 18:26           ` Sean Christopherson
2021-11-15 18:41             ` Marc Orr
2021-11-15 19:15               ` Sean Christopherson
2021-11-16  3:07                 ` Marc Orr
2021-11-16  5:14                   ` Andy Lutomirski
2021-11-16 13:21                     ` Joerg Roedel
2021-11-16 18:26                       ` Sean Christopherson
2021-11-16 18:39                         ` Peter Gonda
2021-11-16 13:30                 ` Joerg Roedel
2021-11-16  5:00               ` Andy Lutomirski
2021-11-16 13:02             ` Joerg Roedel
2021-11-16 20:08               ` Sean Christopherson
2021-11-15 16:16         ` Joerg Roedel
2021-11-22 15:23   ` Brijesh Singh
2021-11-22 17:03     ` Vlastimil Babka
2021-11-22 18:01       ` Brijesh Singh
2021-11-22 18:30     ` Dave Hansen
2021-11-22 19:06       ` Brijesh Singh
2021-11-22 19:14         ` Dave Hansen
2021-11-22 20:33           ` Brijesh Singh
2021-11-22 21:34             ` Sean Christopherson
2021-11-22 22:51             ` Dave Hansen
2021-11-23  5:15               ` Luck, Tony
2021-11-23  7:18               ` Borislav Petkov
2021-11-23 15:36                 ` Sean Christopherson
2021-11-23 16:26                   ` Borislav Petkov
2021-11-23  8:55               ` Vlastimil Babka
2021-11-24 16:03               ` Joerg Roedel
2021-11-24 17:48                 ` Dave Hansen
2021-11-24 19:34                   ` Vlastimil Babka
2021-11-25 10:05                   ` Joerg Roedel
2021-11-29 14:44                     ` Brijesh Singh
2021-11-29 14:58                       ` Vlastimil Babka
2021-11-29 16:13                         ` Brijesh Singh
2021-11-30 19:40                           ` Vlastimil Babka
2021-11-29 16:41                     ` Dave Hansen
