linux-kernel.vger.kernel.org archive mirror
* [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support
@ 2023-12-30 16:19 Michael Roth
  2023-12-30 16:19 ` [PATCH v1 01/26] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
                   ` (25 more replies)
  0 siblings, 26 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang

This patchset is also available at:

  https://github.com/amdese/linux/commits/snp-host-init-v1

and is based on top of linux-next tag next-20231222

These patches were originally included in v10 of the SNP KVM/hypervisor
patches[1], but have been split off from the general KVM support for easier
review and eventual merging into the x86 tree. They are based on linux-next
to help stay in sync with both tip and kvm-next.

There is one KVM-specific patch included here, since it is needed to avoid
regressions when running legacy SEV guests while the RMP table is enabled.

== OVERVIEW ==

AMD EPYC systems utilizing Zen 3 and newer microarchitectures add support
for a new feature called SEV-SNP, which adds Secure Nested Paging support
on top of the SEV/SEV-ES support already present on existing EPYC systems.

One of the main features of SNP is the addition of an RMP (Reverse Map)
table to enforce additional security protections for private guest memory.
This series primarily focuses on the various host initialization
requirements for enabling SNP on the system, while the actual KVM support
for running SNP guests is added as a separate series based on top of these
patches.

The basic requirements to initialize SNP support on a host when the feature
has been enabled in the BIOS are:

  - Discovering and initializing the RMP table
  - Initializing various MSRs to enable the capability across CPUs
  - Various tasks to maintain legacy functionality on the system, such as:
    - Setting up hooks for handling RMP-related changes for IOMMU pages
    - Initializing SNP in the firmware via the CCP driver, and implementing
      the additional requirements needed for continued operation of legacy
      SEV/SEV-ES guests

Additionally, some basic SEV ioctl interfaces are added to configure various
aspects of SNP-enabled firmware via the CCP driver.

More details are available in the SEV-SNP Firmware ABI[2].

== KNOWN ISSUES ==

  - When SNP is enabled, it has been observed that TMR allocation may fail
    when reloading the CCP driver after kexec. This is being investigated.

Feedback/review is very much appreciated!

-Mike

[1] https://lore.kernel.org/kvm/20231016132819.1002933-1-michael.roth@amd.com/ 
[2] https://www.amd.com/en/developer/sev.html 


Changes since being split off from v10 hypervisor patches:

 * Move all host initialization patches to beginning of overall series and
   post as a separate patchset. Include only KVM patches that are necessary
   for maintaining legacy SVM/SEV functionality with SNP enabled.
   (Paolo, Boris)
 * Don't enable X86_FEATURE_SEV_SNP until all patches are in place to maintain
   legacy SVM/SEV/SEV-ES functionality when RMP table and SNP firmware support
   are enabled. (Paolo)
 * Re-write how firmware-owned buffers are handled when dealing with legacy
   SEV commands. Allocate on-demand rather than relying on pre-allocated pool,
   use a descriptor format for handling nested/pointer params instead of
   relying on macros. (Boris)
 * Don't introduce sev-host.h, re-use sev.h (Boris)
 * Various renames, cleanups, refactorings throughout the tree (Boris)
 * Rework leaked pages handling (Vlastimil)
 * Fix kernel-doc errors introduced by series (Boris)
 * Fix warnings when onlining/offlining CPUs (Jeremi, Ashish)
 * Ensure X86_FEATURE_SEV_SNP is cleared early enough that AutoIBRS will still
   be enabled if RMP table support is not available/configured. (Tom)
 * Only read the RMP base/end MSR values once via BSP (Boris)
 * Handle IOMMU SNP setup automatically based on state machine rather than via
   external caller (Boris)

----------------------------------------------------------------
Ashish Kalra (5):
      iommu/amd: Don't rely on external callers to enable IOMMU SNP support
      x86/mtrr: Don't print errors if MtrrFixDramModEn is set when SNP enabled
      x86/sev: Introduce snp leaked pages list
      iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown
      crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump

Brijesh Singh (16):
      x86/cpufeatures: Add SEV-SNP CPU feature
      x86/sev: Add the host SEV-SNP initialization support
      x86/sev: Add RMP entry lookup helpers
      x86/fault: Add helper for dumping RMP entries
      x86/traps: Define RMP violation #PF error code
      x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
      x86/sev: Invalidate pages from the direct map when adding them to the RMP table
      crypto: ccp: Define the SEV-SNP commands
      crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
      crypto: ccp: Provide API to issue SEV and SNP commands
      crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
      crypto: ccp: Handle legacy SEV commands when SNP is enabled
      crypto: ccp: Add debug support for decrypting pages
      KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
      crypto: ccp: Add the SNP_PLATFORM_STATUS command
      crypto: ccp: Add the SNP_SET_CONFIG command

Kim Phillips (1):
      x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled

Michael Roth (2):
      x86/fault: Dump RMP table information when RMP page faults occur
      x86/cpufeatures: Enable/unmask SEV-SNP CPU feature

Tom Lendacky (2):
      crypto: ccp: Handle non-volatile INIT_EX data when SNP is enabled
      crypto: ccp: Add the SNP_COMMIT command

 Documentation/virt/coco/sev-guest.rst    |   51 ++
 arch/x86/Kbuild                          |    2 +
 arch/x86/include/asm/cpufeatures.h       |    1 +
 arch/x86/include/asm/disabled-features.h |    8 +-
 arch/x86/include/asm/iommu.h             |    1 +
 arch/x86/include/asm/kvm-x86-ops.h       |    1 +
 arch/x86/include/asm/kvm_host.h          |    1 +
 arch/x86/include/asm/msr-index.h         |   11 +-
 arch/x86/include/asm/sev.h               |   36 +
 arch/x86/include/asm/trap_pf.h           |   20 +-
 arch/x86/kernel/cpu/amd.c                |   20 +-
 arch/x86/kernel/cpu/common.c             |    7 +-
 arch/x86/kernel/cpu/mtrr/generic.c       |    3 +
 arch/x86/kernel/crash.c                  |    7 +
 arch/x86/kvm/lapic.c                     |    5 +-
 arch/x86/kvm/svm/nested.c                |    2 +-
 arch/x86/kvm/svm/sev.c                   |   37 +-
 arch/x86/kvm/svm/svm.c                   |   17 +-
 arch/x86/kvm/svm/svm.h                   |    1 +
 arch/x86/mm/fault.c                      |    5 +
 arch/x86/virt/svm/Makefile               |    3 +
 arch/x86/virt/svm/sev.c                  |  513 +++++++++++++
 drivers/crypto/ccp/sev-dev.c             | 1216 +++++++++++++++++++++++++++---
 drivers/crypto/ccp/sev-dev.h             |    5 +
 drivers/iommu/amd/amd_iommu.h            |    1 -
 drivers/iommu/amd/init.c                 |  120 ++-
 include/linux/amd-iommu.h                |    6 +-
 include/linux/psp-sev.h                  |  337 ++++++++-
 include/uapi/linux/psp-sev.h             |   59 ++
 tools/arch/x86/include/asm/cpufeatures.h |    1 +
 30 files changed, 2359 insertions(+), 138 deletions(-)
 create mode 100644 arch/x86/virt/svm/Makefile
 create mode 100644 arch/x86/virt/svm/sev.c




^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH v1 01/26] x86/cpufeatures: Add SEV-SNP CPU feature
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2023-12-31 11:50   ` Borislav Petkov
  2023-12-30 16:19 ` [PATCH v1 02/26] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled Michael Roth
                   ` (24 subsequent siblings)
  25 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh, Jarkko Sakkinen, Ashish Kalra

From: Brijesh Singh <brijesh.singh@amd.com>

Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging. This feature adds a strong memory integrity
protection to help prevent malicious hypervisor-based attacks like
data replay, memory re-mapping, and more.

Since enabling the SNP CPU feature imposes a number of additional
requirements on host initialization and on handling legacy firmware APIs
for SEV/SEV-ES guests, only introduce the CPU feature bit so that the
relevant handling can be added, but leave it disabled via a
disabled-features mask.

Once all the necessary changes needed to maintain legacy SEV/SEV-ES
support are introduced in subsequent patches, the SNP feature bit will
be unmasked/enabled.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/cpufeatures.h       | 1 +
 arch/x86/include/asm/disabled-features.h | 4 +++-
 arch/x86/kernel/cpu/amd.c                | 5 +++--
 tools/arch/x86/include/asm/cpufeatures.h | 1 +
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 29cb275a219d..9492dcad560d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -442,6 +442,7 @@
 #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
 #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
 #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
 #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
 #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
 #define X86_FEATURE_DEBUG_SWAP		(19*32+14) /* AMD SEV-ES full debug state swap support */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 702d93fdd10e..a864a5b208fa 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -117,6 +117,8 @@
 #define DISABLE_IBT	(1 << (X86_FEATURE_IBT & 31))
 #endif
 
+#define DISABLE_SEV_SNP		(1 << (X86_FEATURE_SEV_SNP & 31))
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -141,7 +143,7 @@
 			 DISABLE_ENQCMD)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	(DISABLE_IBT)
-#define DISABLED_MASK19	0
+#define DISABLED_MASK19	(DISABLE_SEV_SNP)
 #define DISABLED_MASK20	0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21)
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 9f42d1c59e09..9a17165dfe84 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -592,8 +592,8 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 	 *	      SME feature (set in scattered.c).
 	 *	      If the kernel has not enabled SME via any means then
 	 *	      don't advertise the SME feature.
-	 *   For SEV: If BIOS has not enabled SEV then don't advertise the
-	 *            SEV and SEV_ES feature (set in scattered.c).
+	 *   For SEV: If BIOS has not enabled SEV then don't advertise SEV and
+	 *	      any additional functionality based on it.
 	 *
 	 *   In all cases, since support for SME and SEV requires long mode,
 	 *   don't advertise the feature under CONFIG_X86_32.
@@ -628,6 +628,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 clear_sev:
 		setup_clear_cpu_cap(X86_FEATURE_SEV);
 		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+		setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
 	}
 }
 
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index f4542d2718f4..e58bd69356ee 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -437,6 +437,7 @@
 #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
 #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
 #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
 #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
 #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
 #define X86_FEATURE_DEBUG_SWAP		(19*32+14) /* AMD SEV-ES full debug state swap support */
-- 
2.25.1



* [PATCH v1 02/26] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
  2023-12-30 16:19 ` [PATCH v1 01/26] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2023-12-30 16:19 ` [PATCH v1 03/26] iommu/amd: Don't rely on external callers to enable IOMMU SNP support Michael Roth
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Kim Phillips, Dave Hansen

From: Kim Phillips <kim.phillips@amd.com>

Without SEV-SNP, Automatic IBRS protects only the kernel. But when
SEV-SNP is enabled, the Automatic IBRS protection umbrella widens to all
host-side code, including userspace. This protection comes at a cost:
reduced userspace indirect branch performance.

To avoid this performance loss, don't use Automatic IBRS on SEV-SNP
hosts. Fall back to retpolines instead.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
[mdr: squash in changes from review discussion]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kernel/cpu/common.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8f367d376520..6b253440ea72 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1355,8 +1355,13 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
 	/*
 	 * AMD's AutoIBRS is equivalent to Intel's eIBRS - use the Intel feature
 	 * flag and protect from vendor-specific bugs via the whitelist.
+	 *
+	 * Don't use AutoIBRS when SNP is enabled because it degrades host
+	 * userspace indirect branch performance.
 	 */
-	if ((ia32_cap & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) {
+	if ((ia32_cap & ARCH_CAP_IBRS_ALL) ||
+	    (cpu_has(c, X86_FEATURE_AUTOIBRS) &&
+	     !cpu_feature_enabled(X86_FEATURE_SEV_SNP))) {
 		setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
 		if (!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
 		    !(ia32_cap & ARCH_CAP_PBRSB_NO))
-- 
2.25.1



* [PATCH v1 03/26] iommu/amd: Don't rely on external callers to enable IOMMU SNP support
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
  2023-12-30 16:19 ` [PATCH v1 01/26] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
  2023-12-30 16:19 ` [PATCH v1 02/26] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-04 10:30   ` Borislav Petkov
  2024-01-04 10:58   ` Joerg Roedel
  2023-12-30 16:19 ` [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support Michael Roth
                   ` (22 subsequent siblings)
  25 siblings, 2 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang

From: Ashish Kalra <ashish.kalra@amd.com>

Currently the expectation is that the kernel will call
amd_iommu_snp_enable() to perform various checks and set the
amd_iommu_snp_en flag that the IOMMU uses to adjust its setup routines
to account for additional requirements on hosts where SNP is enabled.

This is somewhat fragile as it relies on this call being done prior to
IOMMU setup. It is more robust to just do this automatically as part of
IOMMU initialization, so rework the code accordingly.

There is still a need to export information about whether or not the
IOMMU is configured in a manner compatible with SNP, so relocate the
existing amd_iommu_snp_en flag so it can be used to convey that
information in place of the return code that was previously provided by
calls to amd_iommu_snp_enable().

While here, also adjust the kernel messages related to IOMMU SNP
enablement for consistency/grammar/clarity.

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/iommu.h  |  1 +
 drivers/iommu/amd/amd_iommu.h |  1 -
 drivers/iommu/amd/init.c      | 69 ++++++++++++++++-------------------
 include/linux/amd-iommu.h     |  4 --
 4 files changed, 32 insertions(+), 43 deletions(-)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index 2fd52b65deac..3be2451e7bc8 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -10,6 +10,7 @@ extern int force_iommu, no_iommu;
 extern int iommu_detected;
 extern int iommu_merge;
 extern int panic_on_overflow;
+extern bool amd_iommu_snp_en;
 
 #ifdef CONFIG_SWIOTLB
 extern bool x86_swiotlb_enable;
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 8b3601f285fd..c970eae2313d 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -164,5 +164,4 @@ void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
 				  u64 *root, int mode);
 struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
 
-extern bool amd_iommu_snp_en;
 #endif
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index c83bd0c2a1c9..96a1a7fed470 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3221,6 +3221,36 @@ static bool __init detect_ivrs(void)
 	return true;
 }
 
+static void iommu_snp_enable(void)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return;
+	/*
+	 * SNP support requires that the IOMMU be enabled and not configured
+	 * in passthrough mode.
+	 */
+	if (no_iommu || iommu_default_passthrough()) {
+		pr_err("SNP: IOMMU is disabled or configured in passthrough mode, SNP cannot be supported.\n");
+		return;
+	}
+
+	amd_iommu_snp_en = check_feature(FEATURE_SNP);
+	if (!amd_iommu_snp_en) {
+		pr_err("SNP: IOMMU SNP feature is not enabled, SNP cannot be supported.\n");
+		return;
+	}
+
+	pr_info("IOMMU SNP support is enabled.\n");
+
+	/* Enforce IOMMU v1 pagetable when SNP is enabled. */
+	if (amd_iommu_pgtable != AMD_IOMMU_V1) {
+		pr_warn("Forcing use of AMD IOMMU v1 page table due to SNP.\n");
+		amd_iommu_pgtable = AMD_IOMMU_V1;
+	}
+#endif
+}
+
 /****************************************************************************
  *
  * AMD IOMMU Initialization State Machine
@@ -3256,6 +3286,7 @@ static int __init state_next(void)
 		break;
 	case IOMMU_ENABLED:
 		register_syscore_ops(&amd_iommu_syscore_ops);
+		iommu_snp_enable();
 		ret = amd_iommu_init_pci();
 		init_state = ret ? IOMMU_INIT_ERROR : IOMMU_PCI_INIT;
 		break;
@@ -3766,41 +3797,3 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
 
 	return iommu_pc_get_set_reg(iommu, bank, cntr, fxn, value, true);
 }
-
-#ifdef CONFIG_AMD_MEM_ENCRYPT
-int amd_iommu_snp_enable(void)
-{
-	/*
-	 * The SNP support requires that IOMMU must be enabled, and is
-	 * not configured in the passthrough mode.
-	 */
-	if (no_iommu || iommu_default_passthrough()) {
-		pr_err("SNP: IOMMU is disabled or configured in passthrough mode, SNP cannot be supported");
-		return -EINVAL;
-	}
-
-	/*
-	 * Prevent enabling SNP after IOMMU_ENABLED state because this process
-	 * affect how IOMMU driver sets up data structures and configures
-	 * IOMMU hardware.
-	 */
-	if (init_state > IOMMU_ENABLED) {
-		pr_err("SNP: Too late to enable SNP for IOMMU.\n");
-		return -EINVAL;
-	}
-
-	amd_iommu_snp_en = check_feature(FEATURE_SNP);
-	if (!amd_iommu_snp_en)
-		return -EINVAL;
-
-	pr_info("SNP enabled\n");
-
-	/* Enforce IOMMU v1 pagetable when SNP is enabled. */
-	if (amd_iommu_pgtable != AMD_IOMMU_V1) {
-		pr_warn("Force to using AMD IOMMU v1 page table due to SNP\n");
-		amd_iommu_pgtable = AMD_IOMMU_V1;
-	}
-
-	return 0;
-}
-#endif
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index dc7ed2f46886..7365be00a795 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -85,8 +85,4 @@ int amd_iommu_pc_get_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn,
 		u64 *value);
 struct amd_iommu *get_amd_iommu(unsigned int idx);
 
-#ifdef CONFIG_AMD_MEM_ENCRYPT
-int amd_iommu_snp_enable(void);
-#endif
-
 #endif /* _ASM_X86_AMD_IOMMU_H */
-- 
2.25.1



* [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (2 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 03/26] iommu/amd: Don't rely on external callers to enable IOMMU SNP support Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-04 11:05   ` Jeremi Piotrowski
                     ` (4 more replies)
  2023-12-30 16:19 ` [PATCH v1 05/26] x86/mtrr: Don't print errors if MtrrFixDramModEn is set when SNP enabled Michael Roth
                   ` (21 subsequent siblings)
  25 siblings, 5 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The memory integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). The RMP is a single data
structure shared across the system that contains one entry for every 4K
page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
a number of steps needed to detect/enable SEV-SNP and RMP table support
on the host:

 - Detect SEV-SNP support based on CPUID bit
 - Initialize the RMP table memory reported by the RMP base/end MSR
   registers and configure IOMMU to be compatible with RMP access
   restrictions
 - Set the MtrrFixDramModEn bit in SYSCFG MSR
 - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
 - Configure IOMMU

The RMP table entry format is non-architectural and can vary by
processor; it is defined by the PPR. Restrict SNP support to CPU
models/families which are compatible with the current RMP table entry
format to guard against any undefined behavior when running on other
system types. Future models will handle this through an architectural
mechanism to allow for broader compatibility.

SNP host code depends on the CONFIG_KVM_AMD_SEV config flag, which may be
enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
instead of CONFIG_AMD_MEM_ENCRYPT.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/Kbuild                  |   2 +
 arch/x86/include/asm/msr-index.h |  11 +-
 arch/x86/include/asm/sev.h       |   6 +
 arch/x86/kernel/cpu/amd.c        |  15 +++
 arch/x86/virt/svm/Makefile       |   3 +
 arch/x86/virt/svm/sev.c          | 219 +++++++++++++++++++++++++++++++
 6 files changed, 255 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/virt/svm/Makefile
 create mode 100644 arch/x86/virt/svm/sev.c

diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
index 5a83da703e87..6a1f36df6a18 100644
--- a/arch/x86/Kbuild
+++ b/arch/x86/Kbuild
@@ -28,5 +28,7 @@ obj-y += net/
 
 obj-$(CONFIG_KEXEC_FILE) += purgatory/
 
+obj-y += virt/svm/
+
 # for cleaning
 subdir- += boot tools
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index f1bd7b91b3c6..15ce1269f270 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -599,6 +599,8 @@
 #define MSR_AMD64_SEV_ENABLED		BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
 #define MSR_AMD64_SEV_ES_ENABLED	BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
 #define MSR_AMD64_SEV_SNP_ENABLED	BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
+#define MSR_AMD64_RMP_BASE		0xc0010132
+#define MSR_AMD64_RMP_END		0xc0010133
 
 /* SNP feature bits enabled by the hypervisor */
 #define MSR_AMD64_SNP_VTOM			BIT_ULL(3)
@@ -709,7 +711,14 @@
 #define MSR_K8_TOP_MEM2			0xc001001d
 #define MSR_AMD64_SYSCFG		0xc0010010
 #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT	23
-#define MSR_AMD64_SYSCFG_MEM_ENCRYPT	BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_MEM_ENCRYPT		BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_SNP_EN_BIT		24
+#define MSR_AMD64_SYSCFG_SNP_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT	25
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
+#define MSR_AMD64_SYSCFG_MFDM_BIT		19
+#define MSR_AMD64_SYSCFG_MFDM			BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
+
 #define MSR_K8_INT_PENDING_MSG		0xc0010055
 /* C1E active bits in int pending message */
 #define K8_INTP_C1E_ACTIVE_MASK		0x18000000
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 5b4a1ce3d368..1f59d8ba9776 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -243,4 +243,10 @@ static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
 static inline u64 sev_get_status(void) { return 0; }
 #endif
 
+#ifdef CONFIG_KVM_AMD_SEV
+bool snp_probe_rmptable_info(void);
+#else
+static inline bool snp_probe_rmptable_info(void) { return false; }
+#endif
+
 #endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 9a17165dfe84..0f0d425f0440 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -20,6 +20,7 @@
 #include <asm/delay.h>
 #include <asm/debugreg.h>
 #include <asm/resctrl.h>
+#include <asm/sev.h>
 
 #ifdef CONFIG_X86_64
 # include <asm/mmconfig.h>
@@ -574,6 +575,20 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
 		break;
 	}
 
+	if (cpu_has(c, X86_FEATURE_SEV_SNP)) {
+		/*
+		 * The RMP table entry format is not architectural; it can vary
+		 * by processor and is defined by the per-processor PPR.
+		 * Restrict SNP support to the known CPU models and families
+		 * for which the RMP table entry format is currently defined.
+		 */
+		if (!(c->x86 == 0x19 && c->x86_model <= 0xaf) &&
+		    !(c->x86 == 0x1a && c->x86_model <= 0xf))
+			setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+		else if (!snp_probe_rmptable_info())
+			setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+	}
+
 	return;
 
 warn:
diff --git a/arch/x86/virt/svm/Makefile b/arch/x86/virt/svm/Makefile
new file mode 100644
index 000000000000..ef2a31bdcc70
--- /dev/null
+++ b/arch/x86/virt/svm/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_KVM_AMD_SEV) += sev.o
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
new file mode 100644
index 000000000000..ce7ede9065ed
--- /dev/null
+++ b/arch/x86/virt/svm/sev.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD SVM-SEV Host Support.
+ *
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ *
+ * Author: Ashish Kalra <ashish.kalra@amd.com>
+ *
+ */
+
+#include <linux/cc_platform.h>
+#include <linux/printk.h>
+#include <linux/mm_types.h>
+#include <linux/set_memory.h>
+#include <linux/memblock.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/cpumask.h>
+#include <linux/iommu.h>
+#include <linux/amd-iommu.h>
+
+#include <asm/sev.h>
+#include <asm/processor.h>
+#include <asm/setup.h>
+#include <asm/svm.h>
+#include <asm/smp.h>
+#include <asm/cpu.h>
+#include <asm/apic.h>
+#include <asm/cpuid.h>
+#include <asm/cmdline.h>
+#include <asm/iommu.h>
+
+/*
+ * The RMP entry format is not architectural. The format is defined in PPR
+ * Family 19h Model 01h, Rev B1 processor.
+ */
+struct rmpentry {
+	u64	assigned	: 1,
+		pagesize	: 1,
+		immutable	: 1,
+		rsvd1		: 9,
+		gpa		: 39,
+		asid		: 10,
+		vmsa		: 1,
+		validated	: 1,
+		rsvd2		: 1;
+	u64 rsvd3;
+} __packed;
+
+/*
+ * The first 16KB from RMP_BASE is used by the processor for bookkeeping;
+ * this range must be accounted for when doing RMP entry lookups.
+ */
+#define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
+
+static u64 probed_rmp_base, probed_rmp_size;
+static struct rmpentry *rmptable __ro_after_init;
+static u64 rmptable_max_pfn __ro_after_init;
+
+#undef pr_fmt
+#define pr_fmt(fmt)	"SEV-SNP: " fmt
+
+static int __mfd_enable(unsigned int cpu)
+{
+	u64 val;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return 0;
+
+	rdmsrl(MSR_AMD64_SYSCFG, val);
+
+	val |= MSR_AMD64_SYSCFG_MFDM;
+
+	wrmsrl(MSR_AMD64_SYSCFG, val);
+
+	return 0;
+}
+
+static __init void mfd_enable(void *arg)
+{
+	__mfd_enable(smp_processor_id());
+}
+
+static int __snp_enable(unsigned int cpu)
+{
+	u64 val;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return 0;
+
+	rdmsrl(MSR_AMD64_SYSCFG, val);
+
+	val |= MSR_AMD64_SYSCFG_SNP_EN;
+	val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
+
+	wrmsrl(MSR_AMD64_SYSCFG, val);
+
+	return 0;
+}
+
+static __init void snp_enable(void *arg)
+{
+	__snp_enable(smp_processor_id());
+}
+
+#define RMP_ADDR_MASK GENMASK_ULL(51, 13)
+
+bool snp_probe_rmptable_info(void)
+{
+	u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
+
+	rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
+	rdmsrl(MSR_AMD64_RMP_END, rmp_end);
+
+	if (!(rmp_base & RMP_ADDR_MASK) || !(rmp_end & RMP_ADDR_MASK)) {
+		pr_err("Memory for the RMP table has not been reserved by BIOS\n");
+		return false;
+	}
+
+	if (rmp_base > rmp_end) {
+		pr_err("RMP configuration not valid: base=%#llx, end=%#llx\n", rmp_base, rmp_end);
+		return false;
+	}
+
+	rmp_sz = rmp_end - rmp_base + 1;
+
+	/*
+	 * Calculate the amount of memory that must be reserved by the BIOS to
+	 * address the whole RAM, including the bookkeeping area. The RMP itself
+	 * must also be covered.
+	 */
+	max_rmp_pfn = max_pfn;
+	if (PHYS_PFN(rmp_end) > max_pfn)
+		max_rmp_pfn = PHYS_PFN(rmp_end);
+
+	calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
+
+	if (calc_rmp_sz > rmp_sz) {
+		pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
+		       calc_rmp_sz, rmp_sz);
+		return false;
+	}
+
+	probed_rmp_base = rmp_base;
+	probed_rmp_size = rmp_sz;
+
+	pr_info("RMP table physical range [0x%016llx - 0x%016llx]\n",
+		probed_rmp_base, probed_rmp_base + probed_rmp_size - 1);
+
+	return true;
+}
+
+static int __init __snp_rmptable_init(void)
+{
+	u64 rmptable_size;
+	void *rmptable_start;
+	u64 val;
+
+	if (!probed_rmp_size)
+		return 1;
+
+	rmptable_start = memremap(probed_rmp_base, probed_rmp_size, MEMREMAP_WB);
+	if (!rmptable_start) {
+		pr_err("Failed to map RMP table\n");
+		return 1;
+	}
+
+	/*
+	 * Check if SEV-SNP is already enabled; this can happen in the case
+	 * of a kexec boot.
+	 */
+	rdmsrl(MSR_AMD64_SYSCFG, val);
+	if (val & MSR_AMD64_SYSCFG_SNP_EN)
+		goto skip_enable;
+
+	memset(rmptable_start, 0, probed_rmp_size);
+
+	/* Flush the caches to ensure that data is written before SNP is enabled. */
+	wbinvd_on_all_cpus();
+
+	/* MtrrFixDramModEn must be enabled on all the CPUs prior to enabling SNP. */
+	on_each_cpu(mfd_enable, NULL, 1);
+
+	on_each_cpu(snp_enable, NULL, 1);
+
+skip_enable:
+	rmptable_start += RMPTABLE_CPU_BOOKKEEPING_SZ;
+	rmptable_size = probed_rmp_size - RMPTABLE_CPU_BOOKKEEPING_SZ;
+
+	rmptable = (struct rmpentry *)rmptable_start;
+	rmptable_max_pfn = rmptable_size / sizeof(struct rmpentry) - 1;
+
+	return 0;
+}
+
+static int __init snp_rmptable_init(void)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return 0;
+
+	if (!amd_iommu_snp_en)
+		return 0;
+
+	if (__snp_rmptable_init())
+		goto nosnp;
+
+	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
+
+	return 0;
+
+nosnp:
+	setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+	return -ENOSYS;
+}
+
+/*
+ * This must be called after the IOMMU has been initialized.
+ */
+device_initcall(snp_rmptable_init);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread
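For readers sizing the BIOS reservation, the check in snp_probe_rmptable_info()
above reduces to simple arithmetic: each 4K page frame needs one 16-byte RMP
entry, plus the fixed 16KB bookkeeping area. A user-space sketch of the entry
layout and the size calculation (the struct mirrors the patch; calc_rmp_size
is an illustrative name, not a kernel function):

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the (non-architectural) RMP entry layout from the patch. */
struct rmpentry {
	uint64_t assigned	: 1,
		 pagesize	: 1,
		 immutable	: 1,
		 rsvd1		: 9,
		 gpa		: 39,
		 asid		: 10,
		 vmsa		: 1,
		 validated	: 1,
		 rsvd2		: 1;
	uint64_t rsvd3;
} __attribute__((packed));

#define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000ULL

/* Minimum reservation to cover PFNs [0, max_pfn): 16 bytes per frame. */
static uint64_t calc_rmp_size(uint64_t max_pfn)
{
	return (max_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
}
```

A 1TB system (max_pfn = 0x10000000) therefore needs a 4GB + 16KB reservation,
matching the calc_rmp_sz comparison in the patch.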

* [PATCH v1 05/26] x86/mtrr: Don't print errors if MtrrFixDramModEn is set when SNP enabled
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (3 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2023-12-30 16:19 ` [PATCH v1 06/26] x86/sev: Add RMP entry lookup helpers Michael Roth
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Jeremi Piotrowski

From: Ashish Kalra <ashish.kalra@amd.com>

SNP-enabled platforms require the MtrrFixDramModEn bit to be set across
all CPUs when SNP is enabled. Therefore, don't print error messages
about MtrrFixDramModEn being set when bringing CPUs online.

Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Closes: https://lore.kernel.org/kvm/68b2d6bf-bce7-47f9-bebb-2652cc923ff9@linux.microsoft.com/
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kernel/cpu/mtrr/generic.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index d3524778a545..422a4ddc2ab7 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -108,6 +108,9 @@ static inline void k8_check_syscfg_dram_mod_en(void)
 	      (boot_cpu_data.x86 >= 0x0f)))
 		return;
 
+	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return;
+
 	rdmsr(MSR_AMD64_SYSCFG, lo, hi);
 	if (lo & K8_MTRRFIXRANGE_DRAM_MODIFY) {
 		pr_err(FW_WARN "MTRR: CPU %u: SYSCFG[MtrrFixDramModEn]"
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v1 06/26] x86/sev: Add RMP entry lookup helpers
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (4 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 05/26] x86/mtrr: Don't print errors if MtrrFixDramModEn is set when SNP enabled Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2023-12-30 16:19 ` [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries Michael Roth
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Add a helper that can be used to access information contained in the RMP
entry corresponding to a particular PFN. This will be needed to decide
how to set up mappings in the NPT in response to guest page faults, and
to handle tasks like cleaning up pages and setting them back to the
default hypervisor-owned state when they are no longer being used for
private data.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: separate 'assigned' indicator from return code, and simplify
      function signatures for various helpers]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev.h |  3 +++
 arch/x86/virt/svm/sev.c    | 49 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 1f59d8ba9776..01ce61b283a3 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -90,6 +90,7 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 /* RMP page size */
 #define RMP_PG_SIZE_4K			0
 #define RMP_PG_SIZE_2M			1
+#define RMP_TO_PG_LEVEL(level)		(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
 
 #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
 
@@ -245,8 +246,10 @@ static inline u64 sev_get_status(void) { return 0; }
 
 #ifdef CONFIG_KVM_AMD_SEV
 bool snp_probe_rmptable_info(void);
+int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
 #else
 static inline bool snp_probe_rmptable_info(void) { return false; }
+static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
 #endif
 
 #endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index ce7ede9065ed..49fdfbf4e518 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -53,6 +53,9 @@ struct rmpentry {
  */
 #define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
 
+/* Mask to apply to a PFN to get the first PFN of a 2MB page */
+#define PFN_PMD_MASK	GENMASK_ULL(63, PMD_SHIFT - PAGE_SHIFT)
+
 static u64 probed_rmp_base, probed_rmp_size;
 static struct rmpentry *rmptable __ro_after_init;
 static u64 rmptable_max_pfn __ro_after_init;
@@ -217,3 +220,49 @@ static int __init snp_rmptable_init(void)
  * This must be called after the IOMMU has been initialized.
  */
 device_initcall(snp_rmptable_init);
+
+static struct rmpentry *get_rmpentry(u64 pfn)
+{
+	if (WARN_ON_ONCE(pfn > rmptable_max_pfn))
+		return ERR_PTR(-EFAULT);
+
+	return &rmptable[pfn];
+}
+
+static struct rmpentry *__snp_lookup_rmpentry(u64 pfn, int *level)
+{
+	struct rmpentry *large_entry, *entry;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return ERR_PTR(-ENODEV);
+
+	entry = get_rmpentry(pfn);
+	if (IS_ERR(entry))
+		return entry;
+
+	/*
+	 * Find the authoritative RMP entry for a PFN. This can be either a 4K
+	 * RMP entry or a special large RMP entry that is authoritative for a
+	 * whole 2M area.
+	 */
+	large_entry = get_rmpentry(pfn & PFN_PMD_MASK);
+	if (IS_ERR(large_entry))
+		return large_entry;
+
+	*level = RMP_TO_PG_LEVEL(large_entry->pagesize);
+
+	return entry;
+}
+
+int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
+{
+	struct rmpentry *e;
+
+	e = __snp_lookup_rmpentry(pfn, level);
+	if (IS_ERR(e))
+		return PTR_ERR(e);
+
+	*assigned = !!e->assigned;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread
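The lookup helper above consults two entries: the 4K entry for the PFN itself,
and the large entry at the 2MB-aligned base PFN, which is authoritative for
the page size. A sketch of the mask arithmetic, assuming the usual x86_64
values PAGE_SHIFT = 12 and PMD_SHIFT = 21 (pmd_base_pfn is an illustrative
name):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PMD_SHIFT	21

/* Bits 63:9 set -- clearing the low 9 PFN bits selects the 2MB base PFN. */
#define PFN_PMD_MASK	(~0ULL << (PMD_SHIFT - PAGE_SHIFT))

static uint64_t pmd_base_pfn(uint64_t pfn)
{
	return pfn & PFN_PMD_MASK;
}
```

Any of the 512 PFNs inside a given 2MB region maps to the same large-entry
index, which is what lets __snp_lookup_rmpentry() report the effective page
level for the whole region.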

* [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (5 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 06/26] x86/sev: Add RMP entry lookup helpers Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-10  9:59   ` Borislav Petkov
                     ` (2 more replies)
  2023-12-30 16:19 ` [PATCH v1 08/26] x86/traps: Define RMP violation #PF error code Michael Roth
                   ` (18 subsequent siblings)
  25 siblings, 3 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

This information will be useful for debugging things like page faults
due to RMP access violations and RMPUPDATE failures.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: move helper to standalone patch, rework dump logic to reduce
      verbosity]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev.h |  2 +
 arch/x86/virt/svm/sev.c    | 77 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 01ce61b283a3..2c53e3de0b71 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -247,9 +247,11 @@ static inline u64 sev_get_status(void) { return 0; }
 #ifdef CONFIG_KVM_AMD_SEV
 bool snp_probe_rmptable_info(void);
 int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
+void snp_dump_hva_rmpentry(unsigned long address);
 #else
 static inline bool snp_probe_rmptable_info(void) { return false; }
 static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
+static inline void snp_dump_hva_rmpentry(unsigned long address) {}
 #endif
 
 #endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 49fdfbf4e518..7c9ced8911e9 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -266,3 +266,80 @@ int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
+
+/*
+ * Dump the raw RMP entry for a particular PFN. These bits are documented in the
+ * PPR for a particular CPU model and provide useful information about how a
+ * particular PFN is being utilized by the kernel/firmware at the time certain
+ * unexpected events occur, such as RMP faults.
+ */
+static void dump_rmpentry(u64 pfn)
+{
+	u64 pfn_current, pfn_end;
+	struct rmpentry *e;
+	u64 *e_data;
+	int level;
+
+	e = __snp_lookup_rmpentry(pfn, &level);
+	if (IS_ERR(e)) {
+		pr_info("Failed to read RMP entry for PFN 0x%llx, error %ld\n",
+			pfn, PTR_ERR(e));
+		return;
+	}
+
+	e_data = (u64 *)e;
+	if (e->assigned) {
+		pr_info("RMP entry for PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
+			pfn, e_data[1], e_data[0]);
+		return;
+	}
+
+	/*
+	 * If the RMP entry for a particular PFN is not in an assigned state,
+	 * then it is sometimes useful to get an idea of whether or not any RMP
+	 * entries for other PFNs within the same 2MB region are assigned, since
+	 * those too can affect the ability to access a particular PFN in
+	 * certain situations, such as when the PFN is being accessed via a 2MB
+	 * mapping in the host page table.
+	 */
+	pfn_current = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+	pfn_end = pfn_current + PTRS_PER_PMD;
+
+	while (pfn_current < pfn_end) {
+		e = __snp_lookup_rmpentry(pfn_current, &level);
+		if (IS_ERR(e)) {
+			pfn_current++;
+			continue;
+		}
+
+		e_data = (u64 *)e;
+		if (e_data[0] || e_data[1]) {
+			pr_info("No assigned RMP entry for PFN 0x%llx, but the 2MB region contains populated RMP entries, e.g.: PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
+				pfn, pfn_current, e_data[1], e_data[0]);
+			return;
+		}
+		pfn_current++;
+	}
+
+	pr_info("No populated RMP entries in the 2MB region containing PFN 0x%llx\n",
+		pfn);
+}
+
+void snp_dump_hva_rmpentry(unsigned long hva)
+{
+	unsigned int level;
+	pgd_t *pgd;
+	pte_t *pte;
+
+	pgd = __va(read_cr3_pa());
+	pgd += pgd_index(hva);
+	pte = lookup_address_in_pgd(pgd, hva, &level);
+
+	if (!pte) {
+		pr_info("Can't dump RMP entry for HVA %lx: no PTE/PFN found\n", hva);
+		return;
+	}
+
+	dump_rmpentry(pte_pfn(*pte));
+}
+EXPORT_SYMBOL_GPL(snp_dump_hva_rmpentry);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread
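The dump helper above scans all 512 PFNs of the 2MB region containing the
faulting PFN, so the starting PFN is the region base, ALIGN_DOWN(pfn,
PTRS_PER_PMD); rounding up instead would scan the *following* region whenever
the PFN is unaligned. A sketch of the scan bounds, with PTRS_PER_PMD = 512 as
on x86_64 (the function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define PTRS_PER_PMD	512ULL

/* ALIGN_DOWN(pfn, PTRS_PER_PMD): first PFN of the containing 2MB region. */
static uint64_t pmd_region_start(uint64_t pfn)
{
	return pfn & ~(PTRS_PER_PMD - 1);
}

/* One past the last PFN of that region. */
static uint64_t pmd_region_end(uint64_t pfn)
{
	return pmd_region_start(pfn) + PTRS_PER_PMD;
}
```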

* [PATCH v1 08/26] x86/traps: Define RMP violation #PF error code
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (6 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2023-12-30 16:19 ` [PATCH v1 09/26] x86/fault: Dump RMP table information when RMP page faults occur Michael Roth
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh, Dave Hansen

From: Brijesh Singh <brijesh.singh@amd.com>

Bit 31 in the page fault error code will be set when the processor
encounters an RMP violation.

While at it, use the BIT() macro.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/trap_pf.h | 20 ++++++++++++--------
 arch/x86/mm/fault.c            |  1 +
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index afa524325e55..7fc4db774dd6 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_TRAP_PF_H
 #define _ASM_X86_TRAP_PF_H
 
+#include <linux/bits.h>  /* BIT() macro */
+
 /*
  * Page fault error code bits:
  *
@@ -13,16 +15,18 @@
  *   bit 5 ==				1: protection keys block access
  *   bit 6 ==				1: shadow stack access fault
  *   bit 15 ==				1: SGX MMU page-fault
+ *   bit 31 ==				1: fault was due to RMP violation
  */
 enum x86_pf_error_code {
-	X86_PF_PROT	=		1 << 0,
-	X86_PF_WRITE	=		1 << 1,
-	X86_PF_USER	=		1 << 2,
-	X86_PF_RSVD	=		1 << 3,
-	X86_PF_INSTR	=		1 << 4,
-	X86_PF_PK	=		1 << 5,
-	X86_PF_SHSTK	=		1 << 6,
-	X86_PF_SGX	=		1 << 15,
+	X86_PF_PROT	=		BIT(0),
+	X86_PF_WRITE	=		BIT(1),
+	X86_PF_USER	=		BIT(2),
+	X86_PF_RSVD	=		BIT(3),
+	X86_PF_INSTR	=		BIT(4),
+	X86_PF_PK	=		BIT(5),
+	X86_PF_SHSTK	=		BIT(6),
+	X86_PF_SGX	=		BIT(15),
+	X86_PF_RMP	=		BIT(31),
 };
 
 #endif /* _ASM_X86_TRAP_PF_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index ab778eac1952..7858b9515d4a 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -547,6 +547,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 		 !(error_code & X86_PF_PROT) ? "not-present page" :
 		 (error_code & X86_PF_RSVD)  ? "reserved bit violation" :
 		 (error_code & X86_PF_PK)    ? "protection keys violation" :
+		 (error_code & X86_PF_RMP)   ? "RMP violation" :
 					       "permissions violation");
 
 	if (!(error_code & X86_PF_USER) && user_mode(regs)) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread
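The show_fault_oops() selection that patch 09 extends is a priority-ordered
decode of these error code bits: a clear PROT bit means not-present, then
RSVD, PK, and RMP are checked in turn. A user-space sketch of the same
cascade (fault_reason is an illustrative name):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define X86_PF_PROT	(1U << 0)
#define X86_PF_RSVD	(1U << 3)
#define X86_PF_PK	(1U << 5)
#define X86_PF_RMP	(1U << 31)

/* Same priority order as show_fault_oops(): present check first, RMP last. */
static const char *fault_reason(uint32_t error_code)
{
	return !(error_code & X86_PF_PROT) ? "not-present page" :
	       (error_code & X86_PF_RSVD)  ? "reserved bit violation" :
	       (error_code & X86_PF_PK)    ? "protection keys violation" :
	       (error_code & X86_PF_RMP)   ? "RMP violation" :
					     "permissions violation";
}
```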

* [PATCH v1 09/26] x86/fault: Dump RMP table information when RMP page faults occur
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (7 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 08/26] x86/traps: Define RMP violation #PF error code Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2023-12-30 16:19 ` [PATCH v1 10/26] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang

RMP faults on kernel addresses are fatal and should never happen in
practice. They indicate a bug in the host kernel somewhere. Userspace
RMP faults shouldn't occur either, since even for VMs the memory used
for private pages is handled by guest_memfd and by design is not
mappable by userspace.

Dump RMP table information about the PFN corresponding to the faulting
HVA to help diagnose any issues of this sort when show_fault_oops() is
triggered by an RMP fault.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/mm/fault.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7858b9515d4a..8f33ec12984e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -34,6 +34,7 @@
 #include <asm/kvm_para.h>		/* kvm_handle_async_pf		*/
 #include <asm/vdso.h>			/* fixup_vdso_exception()	*/
 #include <asm/irq_stack.h>
+#include <asm/sev.h>			/* snp_dump_hva_rmpentry()	*/
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -580,6 +581,9 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 	}
 
 	dump_pagetable(address);
+
+	if (error_code & X86_PF_RMP)
+		snp_dump_hva_rmpentry(address);
 }
 
 static noinline void
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v1 10/26] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (8 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 09/26] x86/fault: Dump RMP table information when RMP page faults occur Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-12 14:49   ` Borislav Petkov
  2023-12-30 16:19 ` [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
                   ` (15 subsequent siblings)
  25 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
hypervisor will use the instruction to add pages to the RMP table. See
APM3 for details on the instruction operations.

The PSMASH instruction expands a 2MB RMP entry into a corresponding set
of contiguous 4KB RMP entries. The hypervisor will use this instruction
to adjust the RMP entry without invalidating it.

Add helpers to make use of these instructions.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev.h | 23 +++++++++++
 arch/x86/virt/svm/sev.c    | 79 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 2c53e3de0b71..d3ccb7a0c7e9 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -87,10 +87,23 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 /* Software defined (when rFlags.CF = 1) */
 #define PVALIDATE_FAIL_NOUPDATE		255
 
+/* RMPUPDATE detected 4K page and 2MB page overlap. */
+#define RMPUPDATE_FAIL_OVERLAP		4
+
 /* RMP page size */
 #define RMP_PG_SIZE_4K			0
 #define RMP_PG_SIZE_2M			1
 #define RMP_TO_PG_LEVEL(level)		(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+#define PG_LEVEL_TO_RMP(level)		(((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
+
+struct rmp_state {
+	u64 gpa;
+	u8 assigned;
+	u8 pagesize;
+	u8 immutable;
+	u8 rsvd;
+	u32 asid;
+} __packed;
 
 #define RMPADJUST_VMSA_PAGE_BIT		BIT(16)
 
@@ -248,10 +261,20 @@ static inline u64 sev_get_status(void) { return 0; }
 bool snp_probe_rmptable_info(void);
 int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
 void snp_dump_hva_rmpentry(unsigned long address);
+int psmash(u64 pfn);
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
+int rmp_make_shared(u64 pfn, enum pg_level level);
 #else
 static inline bool snp_probe_rmptable_info(void) { return false; }
 static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
 static inline void snp_dump_hva_rmpentry(unsigned long address) {}
+static inline int psmash(u64 pfn) { return -ENODEV; }
+static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
+				   bool immutable)
+{
+	return -ENODEV;
+}
+static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
 #endif
 
 #endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 7c9ced8911e9..ff9fa0a85a7f 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -343,3 +343,82 @@ void snp_dump_hva_rmpentry(unsigned long hva)
 	dump_rmpentry(pte_pfn(*pte));
 }
 EXPORT_SYMBOL_GPL(snp_dump_hva_rmpentry);
+
+/*
+ * PSMASH a 2MB aligned page into 4K pages in the RMP table while preserving the
+ * Validated bit.
+ */
+int psmash(u64 pfn)
+{
+	unsigned long paddr = pfn << PAGE_SHIFT;
+	int ret;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENODEV;
+
+	if (!pfn_valid(pfn))
+		return -EINVAL;
+
+	/* Binutils version 2.36 supports the PSMASH mnemonic. */
+	asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+		      : "=a" (ret)
+		      : "a" (paddr)
+		      : "memory", "cc");
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(psmash);
+
+static int rmpupdate(u64 pfn, struct rmp_state *state)
+{
+	unsigned long paddr = pfn << PAGE_SHIFT;
+	int ret;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENODEV;
+
+	do {
+		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
+		asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+			     : "=a" (ret)
+			     : "a" (paddr), "c" ((unsigned long)state)
+			     : "memory", "cc");
+	} while (ret == RMPUPDATE_FAIL_OVERLAP);
+
+	if (ret) {
+		pr_err("RMPUPDATE failed for PFN %llx, ret: %d\n", pfn, ret);
+		dump_rmpentry(pfn);
+		dump_stack();
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+/* Transition a page to guest-owned/private state in the RMP table. */
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
+{
+	struct rmp_state state;
+
+	memset(&state, 0, sizeof(state));
+	state.assigned = 1;
+	state.asid = asid;
+	state.immutable = immutable;
+	state.gpa = gpa;
+	state.pagesize = PG_LEVEL_TO_RMP(level);
+
+	return rmpupdate(pfn, &state);
+}
+EXPORT_SYMBOL_GPL(rmp_make_private);
+
+/* Transition a page to hypervisor-owned/shared state in the RMP table. */
+int rmp_make_shared(u64 pfn, enum pg_level level)
+{
+	struct rmp_state state;
+
+	memset(&state, 0, sizeof(state));
+	state.pagesize = PG_LEVEL_TO_RMP(level);
+
+	return rmpupdate(pfn, &state);
+}
+EXPORT_SYMBOL_GPL(rmp_make_shared);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread
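The rmp_state structure passed to RMPUPDATE above is a fixed 16-byte layout,
and rmp_make_private()/rmp_make_shared() differ only in which fields are
populated. A user-space sketch of the construction (the struct mirrors the
patch; make_private_state is an illustrative name):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the rmp_state layout passed to RMPUPDATE in the patch. */
struct rmp_state {
	uint64_t gpa;
	uint8_t  assigned;
	uint8_t  pagesize;	/* RMP_PG_SIZE_4K = 0, RMP_PG_SIZE_2M = 1 */
	uint8_t  immutable;
	uint8_t  rsvd;
	uint32_t asid;
} __attribute__((packed));

/* Same field setup as rmp_make_private() in the patch. */
static struct rmp_state make_private_state(uint64_t gpa, uint8_t pagesize,
					   uint32_t asid, uint8_t immutable)
{
	struct rmp_state state;

	memset(&state, 0, sizeof(state));
	state.assigned = 1;
	state.gpa = gpa;
	state.pagesize = pagesize;
	state.asid = asid;
	state.immutable = immutable;

	return state;
}
```

rmp_make_shared() is the degenerate case: an all-zero state (unassigned,
ASID 0) with only the page size filled in.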

* [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (9 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 10/26] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-12 19:48   ` Borislav Petkov
                     ` (2 more replies)
  2023-12-30 16:19 ` [PATCH v1 12/26] crypto: ccp: Define the SEV-SNP commands Michael Roth
                   ` (14 subsequent siblings)
  25 siblings, 3 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The integrity guarantee of SEV-SNP is enforced through the RMP table.
The RMP is used with standard x86 and IOMMU page tables to enforce
memory restrictions and page access rights. The RMP check is enforced as
soon as SEV-SNP is enabled globally in the system. When hardware
encounters an RMP-check failure, it raises a page-fault exception.

If the kernel uses a 2MB directmap mapping to write to an address, and
that 2MB range happens to contain a 4KB page that is set to private in
the RMP table, that will also lead to a page-fault exception.

Prevent this by removing pages from the directmap prior to setting them
as private in the RMP table, and then re-adding them to the directmap
when they are set back to shared.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/virt/svm/sev.c | 58 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 56 insertions(+), 2 deletions(-)

diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index ff9fa0a85a7f..ee182351d93a 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -369,14 +369,64 @@ int psmash(u64 pfn)
 }
 EXPORT_SYMBOL_GPL(psmash);
 
+static int restore_direct_map(u64 pfn, int npages)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < npages; i++) {
+		ret = set_direct_map_default_noflush(pfn_to_page(pfn + i));
+		if (ret)
+			break;
+	}
+
+	if (ret)
+		pr_warn("Failed to restore direct map for pfn 0x%llx, ret: %d\n",
+			pfn + i, ret);
+
+	return ret;
+}
+
+static int invalidate_direct_map(u64 pfn, int npages)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < npages; i++) {
+		ret = set_direct_map_invalid_noflush(pfn_to_page(pfn + i));
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		pr_warn("Failed to invalidate direct map for pfn 0x%llx, ret: %d\n",
+			pfn + i, ret);
+		restore_direct_map(pfn, i);
+	}
+
+	return ret;
+}
+
 static int rmpupdate(u64 pfn, struct rmp_state *state)
 {
 	unsigned long paddr = pfn << PAGE_SHIFT;
-	int ret;
+	int ret, level, npages;
 
 	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
 		return -ENODEV;
 
+	level = RMP_TO_PG_LEVEL(state->pagesize);
+	npages = page_level_size(level) / PAGE_SIZE;
+
+	/*
+	 * If the kernel uses a 2MB directmap mapping to write to an address,
+	 * and that 2MB range happens to contain a 4KB page that is set to
+	 * private in the RMP table, an RMP #PF will trigger and cause a host crash.
+	 *
+	 * Prevent this by removing pages from the directmap prior to setting
+	 * them as private in the RMP table.
+	 */
+	if (state->assigned && invalidate_direct_map(pfn, npages))
+		return -EFAULT;
+
 	do {
 		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
 		asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
@@ -386,12 +436,16 @@ static int rmpupdate(u64 pfn, struct rmp_state *state)
 	} while (ret == RMPUPDATE_FAIL_OVERLAP);
 
 	if (ret) {
-		pr_err("RMPUPDATE failed for PFN %llx, ret: %d\n", pfn, ret);
+		pr_err("RMPUPDATE failed for PFN %llx, pg_level: %d, ret: %d\n",
+		       pfn, level, ret);
 		dump_rmpentry(pfn);
 		dump_stack();
 		return -EFAULT;
 	}
 
+	if (!state->assigned && restore_direct_map(pfn, npages))
+		return -EFAULT;
+
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread
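The invalidate/restore pair above follows a standard unwind pattern: if
invalidation fails partway through, restore exactly the pages already
invalidated so the direct map is never left in a mixed state. A toy sketch of
that pattern (page_invalidate and the mapped[] array are illustrative
stand-ins for set_direct_map_invalid_noflush()/set_direct_map_default_noflush()
and the real direct map):

```c
#include <assert.h>

#define NPAGES 8

static int mapped[NPAGES] = { 1, 1, 1, 1, 1, 1, 1, 1 };

/* Toy invalidation that fails when it reaches page 'fail_at'. */
static int page_invalidate(int i, int fail_at)
{
	if (i == fail_at)
		return -1;
	mapped[i] = 0;
	return 0;
}

/*
 * Mirror the patch's rollback: stop at the first failure and re-map
 * every page invalidated so far before returning the error.
 */
static int invalidate_range(int npages, int fail_at)
{
	int i, ret = 0;

	for (i = 0; i < npages; i++) {
		ret = page_invalidate(i, fail_at);
		if (ret)
			break;
	}

	if (ret) {
		while (i--)
			mapped[i] = 1;	/* restore_direct_map() equivalent */
	}

	return ret;
}
```

On the shared-transition side, the patch restores the direct map only after
RMPUPDATE succeeds, so a failed transition never re-exposes a still-private
page.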

* [PATCH v1 12/26] crypto: ccp: Define the SEV-SNP commands
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (10 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-15  9:41   ` Borislav Petkov
  2023-12-30 16:19 ` [PATCH v1 13/26] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
                   ` (13 subsequent siblings)
  25 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
while adding new hardware security protection.

Define the commands and structures used to communicate with the AMD-SP
when creating and managing the SEV-SNP guests. The SEV-SNP firmware spec
is available at developer.amd.com/sev.

Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: update SNP command list and SNP status struct based on current
      spec, use C99 flexible arrays, fix kernel-doc issues]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c |  16 +++
 include/linux/psp-sev.h      | 264 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/psp-sev.h |  56 ++++++++
 3 files changed, 336 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index fcaccd0b5a65..b2672234f386 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -130,6 +130,8 @@ static int sev_cmd_buffer_len(int cmd)
 	switch (cmd) {
 	case SEV_CMD_INIT:			return sizeof(struct sev_data_init);
 	case SEV_CMD_INIT_EX:                   return sizeof(struct sev_data_init_ex);
+	case SEV_CMD_SNP_SHUTDOWN_EX:		return sizeof(struct sev_data_snp_shutdown_ex);
+	case SEV_CMD_SNP_INIT_EX:		return sizeof(struct sev_data_snp_init_ex);
 	case SEV_CMD_PLATFORM_STATUS:		return sizeof(struct sev_user_data_status);
 	case SEV_CMD_PEK_CSR:			return sizeof(struct sev_data_pek_csr);
 	case SEV_CMD_PEK_CERT_IMPORT:		return sizeof(struct sev_data_pek_cert_import);
@@ -158,6 +160,20 @@ static int sev_cmd_buffer_len(int cmd)
 	case SEV_CMD_GET_ID:			return sizeof(struct sev_data_get_id);
 	case SEV_CMD_ATTESTATION_REPORT:	return sizeof(struct sev_data_attestation_report);
 	case SEV_CMD_SEND_CANCEL:		return sizeof(struct sev_data_send_cancel);
+	case SEV_CMD_SNP_GCTX_CREATE:		return sizeof(struct sev_data_snp_addr);
+	case SEV_CMD_SNP_LAUNCH_START:		return sizeof(struct sev_data_snp_launch_start);
+	case SEV_CMD_SNP_LAUNCH_UPDATE:		return sizeof(struct sev_data_snp_launch_update);
+	case SEV_CMD_SNP_ACTIVATE:		return sizeof(struct sev_data_snp_activate);
+	case SEV_CMD_SNP_DECOMMISSION:		return sizeof(struct sev_data_snp_addr);
+	case SEV_CMD_SNP_PAGE_RECLAIM:		return sizeof(struct sev_data_snp_page_reclaim);
+	case SEV_CMD_SNP_GUEST_STATUS:		return sizeof(struct sev_data_snp_guest_status);
+	case SEV_CMD_SNP_LAUNCH_FINISH:		return sizeof(struct sev_data_snp_launch_finish);
+	case SEV_CMD_SNP_DBG_DECRYPT:		return sizeof(struct sev_data_snp_dbg);
+	case SEV_CMD_SNP_DBG_ENCRYPT:		return sizeof(struct sev_data_snp_dbg);
+	case SEV_CMD_SNP_PAGE_UNSMASH:		return sizeof(struct sev_data_snp_page_unsmash);
+	case SEV_CMD_SNP_PLATFORM_STATUS:	return sizeof(struct sev_data_snp_addr);
+	case SEV_CMD_SNP_GUEST_REQUEST:		return sizeof(struct sev_data_snp_guest_request);
+	case SEV_CMD_SNP_CONFIG:		return sizeof(struct sev_user_data_snp_config);
 	default:				return 0;
 	}
 
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 7fd17e82bab4..983d314b5ff5 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -78,6 +78,36 @@ enum sev_cmd {
 	SEV_CMD_DBG_DECRYPT		= 0x060,
 	SEV_CMD_DBG_ENCRYPT		= 0x061,
 
+	/* SNP specific commands */
+	SEV_CMD_SNP_INIT		= 0x081,
+	SEV_CMD_SNP_SHUTDOWN		= 0x082,
+	SEV_CMD_SNP_PLATFORM_STATUS	= 0x083,
+	SEV_CMD_SNP_DF_FLUSH		= 0x084,
+	SEV_CMD_SNP_INIT_EX		= 0x085,
+	SEV_CMD_SNP_SHUTDOWN_EX		= 0x086,
+	SEV_CMD_SNP_DECOMMISSION	= 0x090,
+	SEV_CMD_SNP_ACTIVATE		= 0x091,
+	SEV_CMD_SNP_GUEST_STATUS	= 0x092,
+	SEV_CMD_SNP_GCTX_CREATE		= 0x093,
+	SEV_CMD_SNP_GUEST_REQUEST	= 0x094,
+	SEV_CMD_SNP_ACTIVATE_EX		= 0x095,
+	SEV_CMD_SNP_LAUNCH_START	= 0x0A0,
+	SEV_CMD_SNP_LAUNCH_UPDATE	= 0x0A1,
+	SEV_CMD_SNP_LAUNCH_FINISH	= 0x0A2,
+	SEV_CMD_SNP_DBG_DECRYPT		= 0x0B0,
+	SEV_CMD_SNP_DBG_ENCRYPT		= 0x0B1,
+	SEV_CMD_SNP_PAGE_SWAP_OUT	= 0x0C0,
+	SEV_CMD_SNP_PAGE_SWAP_IN	= 0x0C1,
+	SEV_CMD_SNP_PAGE_MOVE		= 0x0C2,
+	SEV_CMD_SNP_PAGE_MD_INIT	= 0x0C3,
+	SEV_CMD_SNP_PAGE_SET_STATE	= 0x0C6,
+	SEV_CMD_SNP_PAGE_RECLAIM	= 0x0C7,
+	SEV_CMD_SNP_PAGE_UNSMASH	= 0x0C8,
+	SEV_CMD_SNP_CONFIG		= 0x0C9,
+	SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX	= 0x0CA,
+	SEV_CMD_SNP_COMMIT		= 0x0CB,
+	SEV_CMD_SNP_VLEK_LOAD		= 0x0CD,
+
 	SEV_CMD_MAX,
 };
 
@@ -523,6 +553,240 @@ struct sev_data_attestation_report {
 	u32 len;				/* In/Out */
 } __packed;
 
+/**
+ * struct sev_data_snp_download_firmware - SNP_DOWNLOAD_FIRMWARE command params
+ *
+ * @address: physical address of firmware image
+ * @len: length of the firmware image
+ */
+struct sev_data_snp_download_firmware {
+	u64 address;				/* In */
+	u32 len;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_activate - SNP_ACTIVATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @asid: ASID to bind to the guest
+ */
+struct sev_data_snp_activate {
+	u64 gctx_paddr;				/* In */
+	u32 asid;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_addr - generic SNP command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ */
+struct sev_data_snp_addr {
+	u64 gctx_paddr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_start - SNP_LAUNCH_START command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @policy: guest policy
+ * @ma_gctx_paddr: system physical address of migration agent
+ * @ma_en: the guest is associated with a migration agent
+ * @imi_en: launch flow is launching an Incoming Migration Image for the
+ *	purpose of guest-assisted migration.
+ * @rsvd: reserved
+ * @gosvw: guest OS-visible workarounds, as defined by hypervisor
+ */
+struct sev_data_snp_launch_start {
+	u64 gctx_paddr;				/* In */
+	u64 policy;				/* In */
+	u64 ma_gctx_paddr;			/* In */
+	u32 ma_en:1;				/* In */
+	u32 imi_en:1;				/* In */
+	u32 rsvd:30;
+	u8 gosvw[16];				/* In */
+} __packed;
+
+/* SNP support page type */
+enum {
+	SNP_PAGE_TYPE_NORMAL		= 0x1,
+	SNP_PAGE_TYPE_VMSA		= 0x2,
+	SNP_PAGE_TYPE_ZERO		= 0x3,
+	SNP_PAGE_TYPE_UNMEASURED	= 0x4,
+	SNP_PAGE_TYPE_SECRET		= 0x5,
+	SNP_PAGE_TYPE_CPUID		= 0x6,
+
+	SNP_PAGE_TYPE_MAX
+};
+
+/**
+ * struct sev_data_snp_launch_update - SNP_LAUNCH_UPDATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @page_size: page size, 0 indicates a 4KB page and 1 a 2MB page
+ * @page_type: encoded page type
+ * @imi_page: indicates that this page is part of the IMI of the guest
+ * @rsvd: reserved
+ * @rsvd2: reserved
+ * @address: system physical address of destination page to encrypt
+ * @rsvd3: reserved
+ * @vmpl1_perms: VMPL permission mask for VMPL1
+ * @vmpl2_perms: VMPL permission mask for VMPL2
+ * @vmpl3_perms: VMPL permission mask for VMPL3
+ * @rsvd4: reserved
+ */
+struct sev_data_snp_launch_update {
+	u64 gctx_paddr;				/* In */
+	u32 page_size:1;			/* In */
+	u32 page_type:3;			/* In */
+	u32 imi_page:1;				/* In */
+	u32 rsvd:27;
+	u32 rsvd2;
+	u64 address;				/* In */
+	u32 rsvd3:8;
+	u32 vmpl1_perms:8;			/* In */
+	u32 vmpl2_perms:8;			/* In */
+	u32 vmpl3_perms:8;			/* In */
+	u32 rsvd4;
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_finish - SNP_LAUNCH_FINISH command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @id_block_paddr: system physical address of ID block
+ * @id_auth_paddr: system physical address of ID block authentication structure
+ * @id_block_en: indicates whether ID block is present
+ * @auth_key_en: indicates whether author key is present in authentication structure
+ * @rsvd: reserved
+ * @host_data: host-supplied data for guest, not interpreted by firmware
+ */
+struct sev_data_snp_launch_finish {
+	u64 gctx_paddr;
+	u64 id_block_paddr;
+	u64 id_auth_paddr;
+	u8 id_block_en:1;
+	u8 auth_key_en:1;
+	u64 rsvd:62;
+	u8 host_data[32];
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_status - SNP_GUEST_STATUS command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @address: system physical address of guest status page
+ */
+struct sev_data_snp_guest_status {
+	u64 gctx_paddr;
+	u64 address;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_reclaim - SNP_PAGE_RECLAIM command params
+ *
+ * @paddr: system physical address of page to be reclaimed. Bit 0
+ *	of the address indicates the page size: 0h indicates a 4KB
+ *	page and 1h indicates a 2MB page.
+ */
+struct sev_data_snp_page_reclaim {
+	u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_unsmash - SNP_PAGE_UNSMASH command params
+ *
+ * @paddr: system physical address of page to be unsmashed. Bit 0
+ *	of the address indicates the page size: 0h indicates a 4KB
+ *	page and 1h indicates a 2MB page.
+ */
+struct sev_data_snp_page_unsmash {
+	u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_dbg - DBG_ENCRYPT/DBG_DECRYPT command parameters
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @src_addr: source address of data to operate on
+ * @dst_addr: destination address of data to operate on
+ */
+struct sev_data_snp_dbg {
+	u64 gctx_paddr;				/* In */
+	u64 src_addr;				/* In */
+	u64 dst_addr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_request - SNP_GUEST_REQUEST command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @req_paddr: system physical address of request page
+ * @res_paddr: system physical address of response page
+ */
+struct sev_data_snp_guest_request {
+	u64 gctx_paddr;				/* In */
+	u64 req_paddr;				/* In */
+	u64 res_paddr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_init_ex - SNP_INIT_EX structure
+ *
+ * @init_rmp: indicate that the RMP should be initialized.
+ * @list_paddr_en: indicate that list_paddr is valid
+ * @rsvd: reserved
+ * @rsvd1: reserved
+ * @list_paddr: system physical address of range list
+ * @rsvd2: reserved
+ */
+struct sev_data_snp_init_ex {
+	u32 init_rmp:1;
+	u32 list_paddr_en:1;
+	u32 rsvd:30;
+	u32 rsvd1;
+	u64 list_paddr;
+	u8  rsvd2[48];
+} __packed;
+
+/**
+ * struct sev_data_range - RANGE structure
+ *
+ * @base: system physical address of first byte of range
+ * @page_count: number of 4KB pages in this range
+ * @rsvd: reserved
+ */
+struct sev_data_range {
+	u64 base;
+	u32 page_count;
+	u32 rsvd;
+} __packed;
+
+/**
+ * struct sev_data_range_list - RANGE_LIST structure
+ *
+ * @num_elements: number of elements in RANGE_ARRAY
+ * @rsvd: reserved
+ * @ranges: array of num_elements of type RANGE
+ */
+struct sev_data_range_list {
+	u32 num_elements;
+	u32 rsvd;
+	struct sev_data_range ranges[];
+} __packed;
+
+/**
+ * struct sev_data_snp_shutdown_ex - SNP_SHUTDOWN_EX structure
+ *
+ * @length: length of the command buffer read by the PSP
+ * @iommu_snp_shutdown: Disable enforcement of SNP in the IOMMU
+ * @rsvd1: reserved
+ */
+struct sev_data_snp_shutdown_ex {
+	u32 length;
+	u32 iommu_snp_shutdown:1;
+	u32 rsvd1:31;
+} __packed;
+
 #ifdef CONFIG_CRYPTO_DEV_SP_PSP
 
 /**
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index b44ba7dcdefc..71ba5f9f90a8 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -69,6 +69,12 @@ typedef enum {
 	SEV_RET_RESOURCE_LIMIT,
 	SEV_RET_SECURE_DATA_INVALID,
 	SEV_RET_INVALID_KEY = 0x27,
+	SEV_RET_INVALID_PAGE_SIZE,
+	SEV_RET_INVALID_PAGE_STATE,
+	SEV_RET_INVALID_MDATA_ENTRY,
+	SEV_RET_INVALID_PAGE_OWNER,
+	SEV_RET_INVALID_PAGE_AEAD_OFLOW,
+	SEV_RET_RMP_INIT_REQUIRED,
 	SEV_RET_MAX,
 } sev_ret_code;
 
@@ -155,6 +161,56 @@ struct sev_user_data_get_id2 {
 	__u32 length;				/* In/Out */
 } __packed;
 
+/**
+ * struct sev_user_data_snp_status - SNP status
+ *
+ * @api_major: API major version
+ * @api_minor: API minor version
+ * @state: current platform state
+ * @is_rmp_initialized: whether RMP is initialized or not
+ * @rsvd: reserved
+ * @build_id: firmware build id for the API version
+ * @mask_chip_id: whether chip id is present in attestation reports or not
+ * @mask_chip_key: whether attestation reports are signed or not
+ * @vlek_en: VLEK hashstick is loaded
+ * @rsvd1: reserved
+ * @guest_count: the number of guests currently managed by the firmware
+ * @current_tcb_version: current TCB version
+ * @reported_tcb_version: reported TCB version
+ */
+struct sev_user_data_snp_status {
+	__u8 api_major;			/* Out */
+	__u8 api_minor;			/* Out */
+	__u8 state;			/* Out */
+	__u8 is_rmp_initialized:1;	/* Out */
+	__u8 rsvd:7;
+	__u32 build_id;			/* Out */
+	__u32 mask_chip_id:1;		/* Out */
+	__u32 mask_chip_key:1;		/* Out */
+	__u32 vlek_en:1;		/* Out */
+	__u32 rsvd1:29;
+	__u32 guest_count;		/* Out */
+	__u64 current_tcb_version;	/* Out */
+	__u64 reported_tcb_version;	/* Out */
+} __packed;
+
+/**
+ * struct sev_user_data_snp_config - system wide configuration value for SNP.
+ *
+ * @reported_tcb: the TCB version to report in the guest attestation report.
+ * @mask_chip_id: whether chip id is present in attestation reports or not
+ * @mask_chip_key: whether attestation reports are signed or not
+ * @rsvd: reserved
+ * @rsvd1: reserved
+ */
+struct sev_user_data_snp_config {
+	__u64 reported_tcb;     /* In */
+	__u32 mask_chip_id:1;   /* In */
+	__u32 mask_chip_key:1;  /* In */
+	__u32 rsvd:30;          /* In */
+	__u8 rsvd1[52];
+} __packed;
+
 /**
  * struct sev_issue_cmd - SEV ioctl parameters
  *
-- 
2.25.1



* [PATCH v1 13/26] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (11 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 12/26] crypto: ccp: Define the SEV-SNP commands Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-15 11:19   ` Borislav Petkov
  2024-01-15 19:53   ` Borislav Petkov
  2023-12-30 16:19 ` [PATCH v1 14/26] crypto: ccp: Provide API to issue SEV and SNP commands Michael Roth
                   ` (12 subsequent siblings)
  25 siblings, 2 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh, Jarkko Sakkinen

From: Brijesh Singh <brijesh.singh@amd.com>

Before SNP VMs can be launched, the platform must be appropriately
configured and initialized. Platform initialization is accomplished via
the SNP_INIT command. Make sure to do a WBINVD and issue the DF_FLUSH
command to prepare for the first SNP guest launch after INIT.

During the execution of the SNP_INIT command, the firmware configures
and enables SNP security policy enforcement in many system components.
Some system components write to regions of memory reserved by early
x86 firmware (e.g. UEFI). Other system components write to regions
provided by the operating system, hypervisor, or x86 firmware.
Such system components can only write to HV-fixed pages or Default
pages. They will error when attempting to write to pages in other
states after SNP_INIT enables their SNP enforcement.

Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
system physical address ranges to convert into the HV-fixed page states
during the RMP initialization. If INIT_RMP is 1, hypervisors should
provide all system physical address ranges that the hypervisor will
never assign to a guest until the next RMP re-initialization.
For instance, the memory that UEFI reserves should be included in the
range list. This allows system components that occasionally write to
memory (e.g. logging to UEFI reserved regions) to not fail due to
RMP initialization and SNP enablement.

Note that SNP_INIT(_EX) must not be executed while non-SEV guests are
executing, otherwise it is possible that the system could reset or hang.
The psp_init_on_probe module parameter was added for SEV/SEV-ES support,
and the init_ex_path module parameter was added to allow time for the
necessary file system to be mounted/available. SNP_INIT(_EX) does not
use the file associated with init_ex_path. So, to avoid running into
issues where SNP_INIT(_EX) is called while there are other running
guests, issue it during module probe regardless of the psp_init_on_probe
setting, but maintain the previous deferrable handling for SEV/SEV-ES
initialization.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[mdr: squash in psp_init_on_probe changes from Tom, reduce
 proliferation of 'probe' function parameter where possible]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kvm/svm/sev.c       |   5 +-
 drivers/crypto/ccp/sev-dev.c | 282 ++++++++++++++++++++++++++++++++---
 drivers/crypto/ccp/sev-dev.h |   2 +
 include/linux/psp-sev.h      |  17 ++-
 4 files changed, 283 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d0c580607f00..58e19d023d70 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -246,6 +246,7 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
 static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_platform_init_args init_args = {0};
 	int asid, ret;
 
 	if (kvm->created_vcpus)
@@ -262,7 +263,8 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		goto e_no_asid;
 	sev->asid = asid;
 
-	ret = sev_platform_init(&argp->error);
+	init_args.probe = false;
+	ret = sev_platform_init(&init_args);
 	if (ret)
 		goto e_free;
 
@@ -274,6 +276,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return 0;
 
 e_free:
+	argp->error = init_args.error;
 	sev_asid_free(sev);
 	sev->asid = 0;
 e_no_asid:
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index b2672234f386..85634d4f8cfe 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -29,6 +29,7 @@
 
 #include <asm/smp.h>
 #include <asm/cacheflush.h>
+#include <asm/e820/types.h>
 
 #include "psp-dev.h"
 #include "sev-dev.h"
@@ -37,6 +38,10 @@
 #define SEV_FW_FILE		"amd/sev.fw"
 #define SEV_FW_NAME_SIZE	64
 
+/* Minimum firmware version required for the SEV-SNP support */
+#define SNP_MIN_API_MAJOR	1
+#define SNP_MIN_API_MINOR	51
+
 static DEFINE_MUTEX(sev_cmd_mutex);
 static struct sev_misc_dev *misc_dev;
 
@@ -80,6 +85,13 @@ static void *sev_es_tmr;
 #define NV_LENGTH (32 * 1024)
 static void *sev_init_ex_buffer;
 
+/*
+ * SEV_DATA_RANGE_LIST:
+ *   Array containing the ranges of pages that the firmware transitions to
+ *   the HV-fixed page state.
+ */
+struct sev_data_range_list *snp_range_list;
+
 static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -480,20 +492,165 @@ static inline int __sev_do_init_locked(int *psp_ret)
 		return __sev_init_locked(psp_ret);
 }
 
-static int __sev_platform_init_locked(int *error)
+static void snp_set_hsave_pa(void *arg)
+{
+	wrmsrl(MSR_VM_HSAVE_PA, 0);
+}
+
+static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg)
+{
+	struct sev_data_range_list *range_list = arg;
+	struct sev_data_range *range = &range_list->ranges[range_list->num_elements];
+	size_t size;
+
+	/*
+	 * Ensure the list of HV_FIXED pages that will be passed to firmware
+	 * do not exceed the page-sized argument buffer.
+	 */
+	if ((range_list->num_elements * sizeof(struct sev_data_range) +
+	     sizeof(struct sev_data_range_list)) > PAGE_SIZE)
+		return -E2BIG;
+
+	switch (rs->desc) {
+	case E820_TYPE_RESERVED:
+	case E820_TYPE_PMEM:
+	case E820_TYPE_ACPI:
+		range->base = rs->start & PAGE_MASK;
+		size = (rs->end + 1) - rs->start;
+		range->page_count = size >> PAGE_SHIFT;
+		range_list->num_elements++;
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static int __sev_snp_init_locked(int *error)
 {
-	int rc = 0, psp_ret = SEV_RET_NO_FW_CALL;
 	struct psp_device *psp = psp_master;
+	struct sev_data_snp_init_ex data;
 	struct sev_device *sev;
+	void *arg = &data;
+	int cmd, rc = 0;
 
-	if (!psp || !psp->sev_data)
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
 		return -ENODEV;
 
 	sev = psp->sev_data;
 
+	if (sev->snp_initialized)
+		return 0;
+
+	if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
+		dev_dbg(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
+			SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
+		return 0;
+	}
+
+	/*
+	 * SNP_INIT requires MSR_VM_HSAVE_PA to be set to 0h across
+	 * all cores.
+	 */
+	on_each_cpu(snp_set_hsave_pa, NULL, 1);
+
+	/*
+	 * Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
+	 * system physical address ranges to convert into the HV-fixed page states
+	 * during the RMP initialization.  For instance, the memory that UEFI
+	 * reserves should be included in the range list. This allows system
+	 * components that occasionally write to memory (e.g. logging to UEFI
+	 * reserved regions) to not fail due to RMP initialization and SNP enablement.
+	 */
+	if (sev_version_greater_or_equal(SNP_MIN_API_MAJOR, 52)) {
+		/*
+		 * Firmware checks that the pages containing the ranges enumerated
+		 * in the RANGES structure are either in the Default page state or in the
+		 * firmware page state.
+		 */
+		snp_range_list = kzalloc(PAGE_SIZE, GFP_KERNEL);
+		if (!snp_range_list) {
+			dev_err(sev->dev,
+				"SEV: SNP_INIT_EX range list memory allocation failed\n");
+			return -ENOMEM;
+		}
+
+		/*
+		 * Retrieve all reserved memory regions setup by UEFI from the e820 memory map
+		 * to be setup as HV-fixed pages.
+		 */
+		rc = walk_iomem_res_desc(IORES_DESC_NONE, IORESOURCE_MEM, 0, ~0,
+					 snp_range_list, snp_filter_reserved_mem_regions);
+		if (rc) {
+			dev_err(sev->dev,
+				"SEV: SNP_INIT_EX walk_iomem_res_desc failed rc = %d\n", rc);
+			return rc;
+		}
+
+		memset(&data, 0, sizeof(data));
+		data.init_rmp = 1;
+		data.list_paddr_en = 1;
+		data.list_paddr = __psp_pa(snp_range_list);
+		cmd = SEV_CMD_SNP_INIT_EX;
+	} else {
+		cmd = SEV_CMD_SNP_INIT;
+		arg = NULL;
+	}
+
+	/*
+	 * The following sequence must be issued before launching the
+	 * first SNP guest to ensure all dirty cache lines are flushed,
+	 * including from updates to the RMP table itself via RMPUPDATE
+	 * instructions:
+	 *
+	 * - WBINVD on all running CPUs
+	 * - SEV_CMD_SNP_INIT[_EX] firmware command
+	 * - WBINVD on all running CPUs
+	 * - SEV_CMD_SNP_DF_FLUSH firmware command
+	 */
+	wbinvd_on_all_cpus();
+
+	rc = __sev_do_cmd_locked(cmd, arg, error);
+	if (rc)
+		return rc;
+
+	/* Prepare for first SNP guest launch after INIT. */
+	wbinvd_on_all_cpus();
+	rc = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+	if (rc)
+		return rc;
+
+	sev->snp_initialized = true;
+	dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
+
+	return rc;
+}
+
+static int __sev_platform_init_locked(int *error)
+{
+	int rc, psp_ret = SEV_RET_NO_FW_CALL;
+	struct sev_device *sev;
+
+	if (!psp_master || !psp_master->sev_data)
+		return -ENODEV;
+
+	sev = psp_master->sev_data;
+
 	if (sev->state == SEV_STATE_INIT)
 		return 0;
 
+	if (!sev_es_tmr) {
+		/* Obtain the TMR memory area for SEV-ES use */
+		sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
+		if (sev_es_tmr)
+			/* Must flush the cache before giving it to the firmware */
+			clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
+		else
+			dev_warn(sev->dev,
+				 "SEV: TMR allocation failed, SEV-ES support unavailable\n");
+	}
+
 	if (sev_init_ex_buffer) {
 		rc = sev_read_init_ex_file();
 		if (rc)
@@ -536,12 +693,46 @@ static int __sev_platform_init_locked(int *error)
 	return 0;
 }
 
-int sev_platform_init(int *error)
+static int _sev_platform_init_locked(struct sev_platform_init_args *args)
+{
+	struct sev_device *sev;
+	int rc;
+
+	if (!psp_master || !psp_master->sev_data)
+		return -ENODEV;
+
+	sev = psp_master->sev_data;
+
+	if (sev->state == SEV_STATE_INIT)
+		return 0;
+
+	/*
+	 * Legacy guests cannot be running while SNP_INIT(_EX) is executing,
+	 * so perform SEV-SNP initialization at probe time.
+	 */
+	rc = __sev_snp_init_locked(&args->error);
+	if (rc && rc != -ENODEV) {
+		/*
+		 * Don't abort the probe if SNP INIT failed,
+		 * continue to initialize the legacy SEV firmware.
+		 */
+		dev_err(sev->dev, "SEV-SNP: failed to INIT rc %d, error %#x\n",
+			rc, args->error);
+	}
+
+	/* Defer legacy SEV/SEV-ES support if allowed by caller/module. */
+	if (args->probe && !psp_init_on_probe)
+		return 0;
+
+	return __sev_platform_init_locked(&args->error);
+}
+
+int sev_platform_init(struct sev_platform_init_args *args)
 {
 	int rc;
 
 	mutex_lock(&sev_cmd_mutex);
-	rc = __sev_platform_init_locked(error);
+	rc = _sev_platform_init_locked(args);
 	mutex_unlock(&sev_cmd_mutex);
 
 	return rc;
@@ -852,6 +1043,55 @@ static int sev_update_firmware(struct device *dev)
 	return ret;
 }
 
+static int __sev_snp_shutdown_locked(int *error)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_data_snp_shutdown_ex data;
+	int ret;
+
+	if (!sev->snp_initialized)
+		return 0;
+
+	memset(&data, 0, sizeof(data));
+	data.length = sizeof(data);
+	data.iommu_snp_shutdown = 1;
+
+	wbinvd_on_all_cpus();
+
+	ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data, error);
+	/* SHUTDOWN may require DF_FLUSH */
+	if (*error == SEV_RET_DFFLUSH_REQUIRED) {
+		ret = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
+		if (ret) {
+			dev_err(sev->dev, "SEV-SNP DF_FLUSH failed\n");
+			return ret;
+		}
+		/* reissue the shutdown command */
+		ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data,
+					  error);
+	}
+	if (ret) {
+		dev_err(sev->dev, "SEV-SNP firmware shutdown failed\n");
+		return ret;
+	}
+
+	sev->snp_initialized = false;
+	dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
+
+	return ret;
+}
+
+static int sev_snp_shutdown(int *error)
+{
+	int rc;
+
+	mutex_lock(&sev_cmd_mutex);
+	rc = __sev_snp_shutdown_locked(error);
+	mutex_unlock(&sev_cmd_mutex);
+
+	return rc;
+}
+
 static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -1299,6 +1539,8 @@ int sev_dev_init(struct psp_device *psp)
 
 static void sev_firmware_shutdown(struct sev_device *sev)
 {
+	int error;
+
 	sev_platform_shutdown(NULL);
 
 	if (sev_es_tmr) {
@@ -1315,6 +1557,13 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 			   get_order(NV_LENGTH));
 		sev_init_ex_buffer = NULL;
 	}
+
+	if (snp_range_list) {
+		kfree(snp_range_list);
+		snp_range_list = NULL;
+	}
+
+	sev_snp_shutdown(&error);
 }
 
 void sev_dev_destroy(struct psp_device *psp)
@@ -1345,7 +1594,8 @@ EXPORT_SYMBOL_GPL(sev_issue_cmd_external_user);
 void sev_pci_init(void)
 {
 	struct sev_device *sev = psp_master->sev_data;
-	int error, rc;
+	struct sev_platform_init_args args = {0};
+	int rc;
 
 	if (!sev)
 		return;
@@ -1370,23 +1620,15 @@ void sev_pci_init(void)
 		}
 	}
 
-	/* Obtain the TMR memory area for SEV-ES use */
-	sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
-	if (sev_es_tmr)
-		/* Must flush the cache before giving it to the firmware */
-		clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
-	else
-		dev_warn(sev->dev,
-			 "SEV: TMR allocation failed, SEV-ES support unavailable\n");
-
-	if (!psp_init_on_probe)
-		return;
-
 	/* Initialize the platform */
-	rc = sev_platform_init(&error);
+	args.probe = true;
+	rc = sev_platform_init(&args);
 	if (rc)
 		dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
-			error, rc);
+			args.error, rc);
+
+	dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
+		"-SNP" : "", sev->api_major, sev->api_minor, sev->build);
 
 	return;
 
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 778c95155e74..85506325051a 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -52,6 +52,8 @@ struct sev_device {
 	u8 build;
 
 	void *cmd_buf;
+
+	bool snp_initialized;
 };
 
 int sev_dev_init(struct psp_device *psp);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 983d314b5ff5..a39a9e5b5bc4 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -789,10 +789,23 @@ struct sev_data_snp_shutdown_ex {
 
 #ifdef CONFIG_CRYPTO_DEV_SP_PSP
 
+/**
+ * struct sev_platform_init_args
+ *
+ * @error: SEV firmware error code
+ * @probe: True if this is being called as part of CCP module probe, which
+ *  will defer SEV_INIT/SEV_INIT_EX firmware initialization until needed
+ *  unless psp_init_on_probe module param is set
+ */
+struct sev_platform_init_args {
+	int error;
+	bool probe;
+};
+
 /**
  * sev_platform_init - perform SEV INIT command
  *
- * @error: SEV command return code
+ * @args: struct sev_platform_init_args to pass in arguments
  *
  * Returns:
  * 0 if the SEV successfully processed the command
@@ -801,7 +814,7 @@ struct sev_data_snp_shutdown_ex {
  * -%ETIMEDOUT if the SEV command timed out
  * -%EIO       if the SEV returned a non-zero return code
  */
-int sev_platform_init(int *error);
+int sev_platform_init(struct sev_platform_init_args *args);
 
 /**
  * sev_platform_status - perform SEV PLATFORM_STATUS command
-- 
2.25.1



* [PATCH v1 14/26] crypto: ccp: Provide API to issue SEV and SNP commands
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (12 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 13/26] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-17  9:48   ` Borislav Petkov
  2023-12-30 16:19 ` [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list Michael Roth
                   ` (11 subsequent siblings)
  25 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Export sev_do_cmd() as a generic API for the hypervisor to issue
commands to manage an SEV and SNP guest. The commands for SEV and SNP
are defined in the SEV and SEV-SNP firmware specifications.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: kernel-doc fixups]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c |  3 ++-
 include/linux/psp-sev.h      | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 85634d4f8cfe..767f0ec3d5bb 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -431,7 +431,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 	return ret;
 }
 
-static int sev_do_cmd(int cmd, void *data, int *psp_ret)
+int sev_do_cmd(int cmd, void *data, int *psp_ret)
 {
 	int rc;
 
@@ -441,6 +441,7 @@ static int sev_do_cmd(int cmd, void *data, int *psp_ret)
 
 	return rc;
 }
+EXPORT_SYMBOL_GPL(sev_do_cmd);
 
 static int __sev_init_locked(int *error)
 {
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index a39a9e5b5bc4..0581f194cdd0 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -914,6 +914,22 @@ int sev_guest_df_flush(int *error);
  */
 int sev_guest_decommission(struct sev_data_decommission *data, int *error);
 
+/**
+ * sev_do_cmd - issue an SEV or an SEV-SNP command
+ *
+ * @cmd: SEV or SEV-SNP firmware command to issue
+ * @data: arguments for firmware command
+ * @psp_ret: SEV command return code
+ *
+ * Returns:
+ * 0 if the SEV device successfully processed the command
+ * -%ENODEV    if the PSP device is not available
+ * -%ENOTSUPP  if PSP device does not support SEV
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO       if PSP device returned a non-zero return code
+ */
+int sev_do_cmd(int cmd, void *data, int *psp_ret);
+
 void *psp_copy_user_blob(u64 uaddr, u32 len);
 
 #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
@@ -929,6 +945,9 @@ sev_guest_deactivate(struct sev_data_deactivate *data, int *error) { return -ENO
 static inline int
 sev_guest_decommission(struct sev_data_decommission *data, int *error) { return -ENODEV; }
 
+static inline int
+sev_do_cmd(int cmd, void *data, int *psp_ret) { return -ENODEV; }
+
 static inline int
 sev_guest_activate(struct sev_data_activate *data, int *error) { return -ENODEV; }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (13 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 14/26] crypto: ccp: Provide API to issue SEV and SNP commands Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-08 10:45   ` Vlastimil Babka
  2023-12-30 16:19 ` [PATCH v1 16/26] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
                   ` (10 subsequent siblings)
  25 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang

From: Ashish Kalra <ashish.kalra@amd.com>

Pages are unsafe to release back to the page allocator if they have
been transitioned to firmware/guest state and cannot be reclaimed or
transitioned back to hypervisor/shared state. In that case, add them
to an internal leaked pages list to ensure that they are never freed
or otherwise touched/accessed, which would cause fatal page faults.
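The bookkeeping amounts to recording each unreclaimable page on a list and bumping a counter. A self-contained user-space sketch of that idea (PFNs stand in for struct page, and locking is omitted; this is a model, not the patch's code):

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal model of a leaked-pages list: pages that cannot be
 * transitioned back to hypervisor/shared state are recorded here so
 * they are never handed back to the allocator. */
struct leaked_page {
	unsigned long pfn;
	struct leaked_page *next;
};

static struct leaked_page *leaked_list;
static unsigned long nr_leaked;

static void leak_pages(unsigned long pfn, unsigned int npages)
{
	while (npages--) {
		struct leaked_page *leak = malloc(sizeof(*leak));

		if (!leak)
			return;	/* can't track further pages */
		leak->pfn = pfn++;
		leak->next = leaked_list;	/* push onto the list */
		leaked_list = leak;
		nr_leaked++;
	}
}
```

Leaking memory is deliberate here: losing a few pages is preferable to letting the kernel touch a firmware-owned page and take an RMP fault.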

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: relocate to arch/x86/virt/svm/sev.c]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/sev.h |  2 ++
 arch/x86/virt/svm/sev.c    | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index d3ccb7a0c7e9..435ba9bc4510 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -264,6 +264,7 @@ void snp_dump_hva_rmpentry(unsigned long address);
 int psmash(u64 pfn);
 int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
 int rmp_make_shared(u64 pfn, enum pg_level level);
+void snp_leak_pages(u64 pfn, unsigned int npages);
 #else
 static inline bool snp_probe_rmptable_info(void) { return false; }
 static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -275,6 +276,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
 	return -ENODEV;
 }
 static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
+static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
 #endif
 
 #endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index ee182351d93a..0f2e1ce241b5 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -60,6 +60,17 @@ static u64 probed_rmp_base, probed_rmp_size;
 static struct rmpentry *rmptable __ro_after_init;
 static u64 rmptable_max_pfn __ro_after_init;
 
+/* List of pages which are leaked and cannot be reclaimed */
+struct leaked_page {
+	struct page *page;
+	struct list_head list;
+};
+
+static LIST_HEAD(snp_leaked_pages_list);
+static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
+
+static unsigned long snp_nr_leaked_pages;
+
 #undef pr_fmt
 #define pr_fmt(fmt)	"SEV-SNP: " fmt
 
@@ -476,3 +487,27 @@ int rmp_make_shared(u64 pfn, enum pg_level level)
 	return rmpupdate(pfn, &state);
 }
 EXPORT_SYMBOL_GPL(rmp_make_shared);
+
+void snp_leak_pages(u64 pfn, unsigned int npages)
+{
+	struct page *page = pfn_to_page(pfn);
+	struct leaked_page *leak;
+
+	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
+
+	spin_lock(&snp_leaked_pages_list_lock);
+	while (npages--) {
+		leak = kzalloc(sizeof(*leak), GFP_KERNEL_ACCOUNT);
+		if (!leak)
+			goto unlock;
+		leak->page = page;
+		list_add_tail(&leak->list, &snp_leaked_pages_list);
+		dump_rmpentry(pfn);
+		snp_nr_leaked_pages++;
+		pfn++;
+		page++;
+	}
+unlock:
+	spin_unlock(&snp_leaked_pages_list_lock);
+}
+EXPORT_SYMBOL_GPL(snp_leak_pages);
-- 
2.25.1



* [PATCH v1 16/26] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (14 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2023-12-30 16:19 ` [PATCH v1 17/26] crypto: ccp: Handle non-volatile INIT_EX data " Michael Roth
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The behavior and requirements for SEV-legacy commands are altered when
the SNP firmware is in the INIT state. See the SEV-SNP firmware
specification for more details.

Allocate the Trusted Memory Region (TMR) as a 2MB-sized/aligned region
when SNP is enabled to satisfy the new SNP requirements. Continue
allocating a 1MB region for !SNP configurations.

While at it, provide an API to allocate a firmware page. The KVM driver
needs to allocate a firmware context page during the guest creation. The
context page needs to be updated by the firmware. See the SEV-SNP
specification for further details.
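The alignment requirement falls out of the buddy allocator: an order-N allocation of 4K pages is naturally aligned to its own size, so a 2MB (order-9) allocation is 2MB-aligned. A small sketch of the order computation (a user-space model of get_order() for a 4K page size, not the kernel macro itself):

```c
#include <assert.h>

/* Model of get_order() for 4K pages: the smallest order whose
 * page count covers the requested size. Order-N allocations from the
 * buddy allocator are aligned to (4K << N), which is how a 2MB TMR
 * allocation ends up 2MB-aligned. */
#define PAGE_SHIFT_4K	12

static int order_for(unsigned long size)
{
	unsigned long pages = (size + (1UL << PAGE_SHIFT_4K) - 1) >> PAGE_SHIFT_4K;
	int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}
```

So the legacy 1MB TMR is an order-8 allocation, and the SNP 2MB TMR is order-9.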

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: use struct sev_data_snp_page_reclaim instead of passing paddr
      directly to SEV_CMD_SNP_PAGE_RECLAIM]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 176 +++++++++++++++++++++++++++++++----
 include/linux/psp-sev.h      |   9 ++
 2 files changed, 165 insertions(+), 20 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 767f0ec3d5bb..307eb3e7c354 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -30,6 +30,7 @@
 #include <asm/smp.h>
 #include <asm/cacheflush.h>
 #include <asm/e820/types.h>
+#include <asm/sev.h>
 
 #include "psp-dev.h"
 #include "sev-dev.h"
@@ -73,9 +74,14 @@ static int psp_timeout;
  *   The TMR is a 1MB area that must be 1MB aligned.  Use the page allocator
  *   to allocate the memory, which will return aligned memory for the specified
  *   allocation order.
+ *
+ * When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB sized.
  */
-#define SEV_ES_TMR_SIZE		(1024 * 1024)
+#define SEV_TMR_SIZE		(1024 * 1024)
+#define SNP_TMR_SIZE		(2 * 1024 * 1024)
+
 static void *sev_es_tmr;
+static size_t sev_es_tmr_size = SEV_TMR_SIZE;
 
 /* INIT_EX NV Storage:
  *   The NV Storage is a 32Kb area and must be 4Kb page aligned.  Use the page
@@ -192,17 +198,6 @@ static int sev_cmd_buffer_len(int cmd)
 	return 0;
 }
 
-static void *sev_fw_alloc(unsigned long len)
-{
-	struct page *page;
-
-	page = alloc_pages(GFP_KERNEL, get_order(len));
-	if (!page)
-		return NULL;
-
-	return page_address(page);
-}
-
 static struct file *open_file_as_root(const char *filename, int flags, umode_t mode)
 {
 	struct file *fp;
@@ -333,6 +328,142 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
 	return sev_write_init_ex_file();
 }
 
+/*
+ * snp_reclaim_pages() needs __sev_do_cmd_locked(), and __sev_do_cmd_locked()
+ * needs snp_reclaim_pages(), so a forward declaration is needed.
+ */
+static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
+
+static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
+{
+	int ret, err, i;
+
+	paddr = __sme_clr(paddr);
+
+	for (i = 0; i < npages; i++, paddr += PAGE_SIZE) {
+		struct sev_data_snp_page_reclaim data = {0};
+
+		data.paddr = paddr;
+
+		if (locked)
+			ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+		else
+			ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+
+		if (ret)
+			goto cleanup;
+
+		ret = rmp_make_shared(__phys_to_pfn(paddr), PG_LEVEL_4K);
+		if (ret)
+			goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	/*
+	 * If there was a failure reclaiming the page then it is no longer safe
+	 * to release it back to the system; leak it instead.
+	 */
+	snp_leak_pages(__phys_to_pfn(paddr), npages - i);
+	return ret;
+}
+
+static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)
+{
+	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+	int rc, i;
+
+	for (i = 0; i < npages; i++, pfn++) {
+		rc = rmp_make_private(pfn, 0, PG_LEVEL_4K, 0, true);
+		if (rc)
+			goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	/*
+	 * Try unrolling the firmware state changes by
+	 * reclaiming the pages which were already changed to the
+	 * firmware state.
+	 */
+	snp_reclaim_pages(paddr, i, locked);
+
+	return rc;
+}
+
+static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order)
+{
+	unsigned long npages = 1ul << order, paddr;
+	struct sev_device *sev;
+	struct page *page;
+
+	if (!psp_master || !psp_master->sev_data)
+		return NULL;
+
+	page = alloc_pages(gfp_mask, order);
+	if (!page)
+		return NULL;
+
+	/* If SEV-SNP is initialized then add the page in RMP table. */
+	sev = psp_master->sev_data;
+	if (!sev->snp_initialized)
+		return page;
+
+	paddr = __pa((unsigned long)page_address(page));
+	if (rmp_mark_pages_firmware(paddr, npages, false))
+		return NULL;
+
+	return page;
+}
+
+void *snp_alloc_firmware_page(gfp_t gfp_mask)
+{
+	struct page *page;
+
+	page = __snp_alloc_firmware_pages(gfp_mask, 0);
+
+	return page ? page_address(page) : NULL;
+}
+EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
+
+static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	unsigned long paddr, npages = 1ul << order;
+
+	if (!page)
+		return;
+
+	paddr = __pa((unsigned long)page_address(page));
+	if (sev->snp_initialized &&
+	    snp_reclaim_pages(paddr, npages, locked))
+		return;
+
+	__free_pages(page, order);
+}
+
+void snp_free_firmware_page(void *addr)
+{
+	if (!addr)
+		return;
+
+	__snp_free_firmware_pages(virt_to_page(addr), 0, false);
+}
+EXPORT_SYMBOL_GPL(snp_free_firmware_page);
+
+static void *sev_fw_alloc(unsigned long len)
+{
+	struct page *page;
+
+	page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(len));
+	if (!page)
+		return NULL;
+
+	return page_address(page);
+}
+
 static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 {
 	struct psp_device *psp = psp_master;
@@ -456,7 +587,7 @@ static int __sev_init_locked(int *error)
 		data.tmr_address = __pa(sev_es_tmr);
 
 		data.flags |= SEV_INIT_FLAGS_SEV_ES;
-		data.tmr_len = SEV_ES_TMR_SIZE;
+		data.tmr_len = sev_es_tmr_size;
 	}
 
 	return __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
@@ -479,7 +610,7 @@ static int __sev_init_ex_locked(int *error)
 		data.tmr_address = __pa(sev_es_tmr);
 
 		data.flags |= SEV_INIT_FLAGS_SEV_ES;
-		data.tmr_len = SEV_ES_TMR_SIZE;
+		data.tmr_len = sev_es_tmr_size;
 	}
 
 	return __sev_do_cmd_locked(SEV_CMD_INIT_EX, &data, error);
@@ -625,6 +756,8 @@ static int __sev_snp_init_locked(int *error)
 	sev->snp_initialized = true;
 	dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
 
+	sev_es_tmr_size = SNP_TMR_SIZE;
+
 	return rc;
 }
 
@@ -643,14 +776,16 @@ static int __sev_platform_init_locked(int *error)
 
 	if (!sev_es_tmr) {
 		/* Obtain the TMR memory area for SEV-ES use */
-		sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
-		if (sev_es_tmr)
+		sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
+		if (sev_es_tmr) {
 			/* Must flush the cache before giving it to the firmware */
-			clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
-		else
+			if (!sev->snp_initialized)
+				clflush_cache_range(sev_es_tmr, sev_es_tmr_size);
+		} else {
 			dev_warn(sev->dev,
 				 "SEV: TMR allocation failed, SEV-ES support unavailable\n");
 		}
+	}
 
 	if (sev_init_ex_buffer) {
 		rc = sev_read_init_ex_file();
@@ -1548,8 +1683,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 		/* The TMR area was encrypted, flush it from the cache */
 		wbinvd_on_all_cpus();
 
-		free_pages((unsigned long)sev_es_tmr,
-			   get_order(SEV_ES_TMR_SIZE));
+		__snp_free_firmware_pages(virt_to_page(sev_es_tmr),
+					  get_order(sev_es_tmr_size),
+					  false);
 		sev_es_tmr = NULL;
 	}
 
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 0581f194cdd0..16d0da895680 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -931,6 +931,8 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
 int sev_do_cmd(int cmd, void *data, int *psp_ret);
 
 void *psp_copy_user_blob(u64 uaddr, u32 len);
+void *snp_alloc_firmware_page(gfp_t mask);
+void snp_free_firmware_page(void *addr);
 
 #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
 
@@ -958,6 +960,13 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int
 
 static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }
 
+static inline void *snp_alloc_firmware_page(gfp_t mask)
+{
+	return NULL;
+}
+
+static inline void snp_free_firmware_page(void *addr) { }
+
 #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
 
 #endif	/* __PSP_SEV_H__ */
-- 
2.25.1



* [PATCH v1 17/26] crypto: ccp: Handle non-volatile INIT_EX data when SNP is enabled
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (15 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 16/26] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-18 14:03   ` Borislav Petkov
  2023-12-30 16:19 ` [PATCH v1 18/26] crypto: ccp: Handle legacy SEV commands " Michael Roth
                   ` (8 subsequent siblings)
  25 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang

From: Tom Lendacky <thomas.lendacky@amd.com>

For SEV/SEV-ES, a buffer can be used to access non-volatile data so it
can be initialized from a file specified by the init_ex_path CCP module
parameter instead of relying on the SPI bus for NV storage; afterward,
the buffer can be read to sync new data back to the file.

When SNP is enabled, the pages comprising this buffer need to be set to
firmware-owned in the RMP table before they can be accessed by firmware
for subsequent updates to the initial contents.

Implement that handling here.

Setting these pages to firmware-owned will also result in them being
removed from the kernel direct map, since generally the hypervisor does
not access firmware-owned pages. However, in this exceptional case,
the hypervisor does need to read the buffer to transfer updated contents
back to the file at init_ex_path.

Support this by using vmap() to create temporary mappings to the
firmware-owned buffer whenever accesses are made, and rework the
existing code to track the struct page corresponding to the start of the
buffer rather than the virtual address that was previously provided via
the direct map.
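Since the buffer is physically contiguous, the temporary mapping only requires walking the pages starting from the first page's PFN and handing that array to vmap(). A hypothetical user-space model of building that page list (real kernel code would use pfn_to_page() and vmap(); here plain PFNs stand in for struct page pointers):

```c
#include <assert.h>

/* Model of preparing a vmap() page array for the 32K INIT_EX buffer:
 * starting from the base PFN of a physically contiguous allocation,
 * collect one entry per page of the buffer. */
#define NV_LEN		(32 * 1024)
#define PG_SHIFT	12
#define NV_PGS		(NV_LEN >> PG_SHIFT)	/* 8 pages */

static int build_page_list(unsigned long base_pfn, unsigned long *pfns, int max)
{
	int i;

	if (max < NV_PGS)
		return -1;	/* caller's array is too small */
	for (i = 0; i < NV_PGS; i++)
		pfns[i] = base_pfn + i;	/* contiguous pages backing the buffer */
	return NV_PGS;
}
```

The read-only PAGE_KERNEL_RO mapping in the patch reflects that the hypervisor only ever reads the firmware-owned buffer to write it out to the file.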

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 104 ++++++++++++++++++++++++++---------
 1 file changed, 79 insertions(+), 25 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 307eb3e7c354..dfe7f7afc411 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -88,8 +88,9 @@ static size_t sev_es_tmr_size = SEV_TMR_SIZE;
  *   allocator to allocate the memory, which will return aligned memory for the
  *   specified allocation order.
  */
-#define NV_LENGTH (32 * 1024)
-static void *sev_init_ex_buffer;
+#define NV_LENGTH	(32 * 1024)
+#define NV_PAGES	(NV_LENGTH >> PAGE_SHIFT)
+static struct page *sev_init_ex_page;
 
 /*
  * SEV_DATA_RANGE_LIST:
@@ -231,7 +232,7 @@ static int sev_read_init_ex_file(void)
 
 	lockdep_assert_held(&sev_cmd_mutex);
 
-	if (!sev_init_ex_buffer)
+	if (!sev_init_ex_page)
 		return -EOPNOTSUPP;
 
 	fp = open_file_as_root(init_ex_path, O_RDONLY, 0);
@@ -251,7 +252,7 @@ static int sev_read_init_ex_file(void)
 		return ret;
 	}
 
-	nread = kernel_read(fp, sev_init_ex_buffer, NV_LENGTH, NULL);
+	nread = kernel_read(fp, page_to_virt(sev_init_ex_page), NV_LENGTH, NULL);
 	if (nread != NV_LENGTH) {
 		dev_info(sev->dev,
 			"SEV: could not read %u bytes to non volatile memory area, ret %ld\n",
@@ -264,16 +265,44 @@ static int sev_read_init_ex_file(void)
 	return 0;
 }
 
+/*
+ * When SNP is enabled, the pages comprising the buffer used to populate
+ * the file specified by the init_ex_path module parameter needs to be set
+ * to firmware-owned, which removes the mapping from the kernel direct
+ * mapping since generally the hypervisor does not access firmware-owned
+ * pages. However, in this case the hypervisor does need to read the
+ * buffer to transfer the contents to the file at init_ex_path, so this
+ * function is used to create a temporary virtual mapping to be used for
+ * this purpose.
+ */
+static void *vmap_sev_init_ex_buffer(void)
+{
+	struct page *pages[NV_PAGES];
+	unsigned long base_pfn;
+	int i;
+
+	if (WARN_ON_ONCE(!sev_init_ex_page))
+		return NULL;
+
+	base_pfn = page_to_pfn(sev_init_ex_page);
+
+	for (i = 0; i < NV_PAGES; i++)
+		pages[i] = pfn_to_page(base_pfn + i);
+
+	return vmap(pages, NV_PAGES, VM_MAP, PAGE_KERNEL_RO);
+}
+
 static int sev_write_init_ex_file(void)
 {
 	struct sev_device *sev = psp_master->sev_data;
+	void *sev_init_ex_buffer;
 	struct file *fp;
 	loff_t offset = 0;
 	ssize_t nwrite;
 
 	lockdep_assert_held(&sev_cmd_mutex);
 
-	if (!sev_init_ex_buffer)
+	if (!sev_init_ex_page)
 		return 0;
 
 	fp = open_file_as_root(init_ex_path, O_CREAT | O_WRONLY, 0600);
@@ -286,6 +315,12 @@ static int sev_write_init_ex_file(void)
 		return ret;
 	}
 
+	sev_init_ex_buffer = vmap_sev_init_ex_buffer();
+	if (!sev_init_ex_buffer) {
+		dev_err(sev->dev, "SEV: failed to map non-volatile memory area\n");
+		return -EIO;
+	}
+
 	nwrite = kernel_write(fp, sev_init_ex_buffer, NV_LENGTH, &offset);
 	vfs_fsync(fp, 0);
 	filp_close(fp, NULL);
@@ -294,10 +329,12 @@ static int sev_write_init_ex_file(void)
 		dev_err(sev->dev,
 			"SEV: failed to write %u bytes to non volatile memory area, ret %ld\n",
 			NV_LENGTH, nwrite);
+		vunmap(sev_init_ex_buffer);
 		return -EIO;
 	}
 
 	dev_dbg(sev->dev, "SEV: write successful to NV file\n");
+	vunmap(sev_init_ex_buffer);
 
 	return 0;
 }
@@ -306,7 +343,7 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
 {
 	lockdep_assert_held(&sev_cmd_mutex);
 
-	if (!sev_init_ex_buffer)
+	if (!sev_init_ex_page)
 		return 0;
 
 	/*
@@ -599,7 +636,7 @@ static int __sev_init_ex_locked(int *error)
 
 	memset(&data, 0, sizeof(data));
 	data.length = sizeof(data);
-	data.nv_address = __psp_pa(sev_init_ex_buffer);
+	data.nv_address = sme_me_mask | PFN_PHYS(page_to_pfn(sev_init_ex_page));
 	data.nv_len = NV_LENGTH;
 
 	if (sev_es_tmr) {
@@ -618,7 +655,7 @@ static int __sev_init_ex_locked(int *error)
 
 static inline int __sev_do_init_locked(int *psp_ret)
 {
-	if (sev_init_ex_buffer)
+	if (sev_init_ex_page)
 		return __sev_init_ex_locked(psp_ret);
 	else
 		return __sev_init_locked(psp_ret);
@@ -787,10 +824,38 @@ static int __sev_platform_init_locked(int *error)
 		}
 	}
 
-	if (sev_init_ex_buffer) {
+	/*
+	 * If an init_ex_path is provided allocate a buffer for the file and
+	 * read in the contents. Additionally, if SNP is initialized, convert
+	 * the buffer pages to firmware pages.
+	 */
+	if (init_ex_path && !sev_init_ex_page) {
+		struct page *page;
+
+		page = alloc_pages(GFP_KERNEL, get_order(NV_LENGTH));
+		if (!page) {
+			dev_err(sev->dev, "SEV: INIT_EX NV memory allocation failed\n");
+			return -ENOMEM;
+		}
+
+		sev_init_ex_page = page;
+
 		rc = sev_read_init_ex_file();
 		if (rc)
 			return rc;
+
+		/* If SEV-SNP is initialized, transition to firmware page. */
+		if (sev->snp_initialized) {
+			unsigned long npages;
+
+			npages = 1UL << get_order(NV_LENGTH);
+			if (rmp_mark_pages_firmware(PFN_PHYS(page_to_pfn(sev_init_ex_page)),
+						    npages, false)) {
+				dev_err(sev->dev,
+					"SEV: INIT_EX NV memory page state change failed.\n");
+				return -ENOMEM;
+			}
+		}
 	}
 
 	rc = __sev_do_init_locked(&psp_ret);
@@ -1689,10 +1754,11 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 		sev_es_tmr = NULL;
 	}
 
-	if (sev_init_ex_buffer) {
-		free_pages((unsigned long)sev_init_ex_buffer,
-			   get_order(NV_LENGTH));
-		sev_init_ex_buffer = NULL;
+	if (sev_init_ex_page) {
+		__snp_free_firmware_pages(sev_init_ex_page,
+					  get_order(NV_LENGTH),
+					  true);
+		sev_init_ex_page = NULL;
 	}
 
 	if (snp_range_list) {
@@ -1745,18 +1811,6 @@ void sev_pci_init(void)
 	if (sev_update_firmware(sev->dev) == 0)
 		sev_get_api_version();
 
-	/* If an init_ex_path is provided rely on INIT_EX for PSP initialization
-	 * instead of INIT.
-	 */
-	if (init_ex_path) {
-		sev_init_ex_buffer = sev_fw_alloc(NV_LENGTH);
-		if (!sev_init_ex_buffer) {
-			dev_err(sev->dev,
-				"SEV: INIT_EX NV memory allocation failed\n");
-			goto err;
-		}
-	}
-
 	/* Initialize the platform */
 	args.probe = true;
 	rc = sev_platform_init(&args);
-- 
2.25.1



* [PATCH v1 18/26] crypto: ccp: Handle legacy SEV commands when SNP is enabled
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (16 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 17/26] crypto: ccp: Handle non-volatile INIT_EX data " Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-19 17:18   ` Borislav Petkov
  2023-12-30 16:19 ` [PATCH v1 19/26] iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown Michael Roth
                   ` (7 subsequent siblings)
  25 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

The behavior of legacy SEV commands is altered when the firmware is
initialized for SNP support. In that case, all command buffer memory
that may get written to by legacy SEV commands must be marked as
firmware-owned in the RMP table prior to issuing the command.

Additionally, when a command buffer contains a system physical address
that points to additional buffers that firmware may write to, special
handling is needed depending on whether:

  1) the system physical address points to guest memory
  2) the system physical address points to host memory

To handle case #1, the pages of these buffers are changed to
firmware-owned in the RMP table before issuing the command, and restored
to hypervisor-owned after the command completes.

For case #2, a bounce buffer is used instead of the original address.
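The bounce-buffer handling for case #2 boils down to: save the original address parameter, point the command buffer at a freshly allocated bounce buffer, let firmware write into it, then copy the results back and restore the original address. A user-space model of that save/substitute/restore dance (a sketch with simplified names; "firmware" is simulated by the caller writing into the buffer):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Model of the case #2 bounce-buffer flow: when firmware may write to
 * host-owned memory, the address parameter inside the command buffer
 * is swapped for a bounce buffer and results are copied back after. */
struct desc {
	char **addr_ptr;	/* address parameter inside the command buffer */
	char *addr_orig;	/* saved original destination */
	size_t len;
};

static void map_desc(struct desc *d)
{
	d->addr_orig = *d->addr_ptr;	/* remember where results belong */
	*d->addr_ptr = malloc(d->len);	/* substitute a bounce buffer */
}

static void unmap_desc(struct desc *d)
{
	memcpy(d->addr_orig, *d->addr_ptr, d->len);	/* copy results back */
	free(*d->addr_ptr);
	*d->addr_ptr = d->addr_orig;	/* restore the command buffer */
}
```

In the real driver the bounce buffer is additionally transitioned to firmware-owned before the command and reclaimed afterward; only the address bookkeeping is modeled here.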

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 421 ++++++++++++++++++++++++++++++++++-
 drivers/crypto/ccp/sev-dev.h |   3 +
 2 files changed, 414 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index dfe7f7afc411..8cfb376ca2e7 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -43,6 +43,15 @@
 #define SNP_MIN_API_MAJOR	1
 #define SNP_MIN_API_MINOR	51
 
+/*
+ * Maximum number of firmware-writable buffers that might be specified
+ * in the parameters of a legacy SEV command buffer.
+ */
+#define CMD_BUF_FW_WRITABLE_MAX 2
+
+/* Leave room in the descriptor array for an end-of-list indicator. */
+#define CMD_BUF_DESC_MAX (CMD_BUF_FW_WRITABLE_MAX + 1)
+
 static DEFINE_MUTEX(sev_cmd_mutex);
 static struct sev_misc_dev *misc_dev;
 
@@ -501,13 +510,351 @@ static void *sev_fw_alloc(unsigned long len)
 	return page_address(page);
 }
 
+/**
+ * struct cmd_buf_desc - descriptors for managing legacy SEV command address
+ * parameters corresponding to buffers that may be written to by firmware.
+ *
+ * @paddr_ptr: pointer the address parameter in the command buffer, which may
+ *	need to be saved/restored depending on whether a bounce buffer is used.
+ *	Must be NULL if this descriptor is only an end-of-list indicator.
+ * @paddr_orig: storage for the original address parameter, which can be used to
+ *	restore the original value in @paddr_ptr in cases where it is replaced
+ *	with the address of a bounce buffer.
+ * @len: length of buffer located at the address originally stored at @paddr_ptr
+ * @guest_owned: true if the address corresponds to guest-owned pages, in which
+ *	case bounce buffers are not needed.
+ */
+struct cmd_buf_desc {
+	u64 *paddr_ptr;
+	u64 paddr_orig;
+	u32 len;
+	bool guest_owned;
+};
+
+/*
+ * If a legacy SEV command parameter is a memory address, those pages in
+ * turn need to be transitioned to/from firmware-owned before/after
+ * executing the firmware command.
+ *
+ * Additionally, in cases where those pages are not guest-owned, a bounce
+ * buffer is needed in place of the original memory address parameter.
+ *
+ * A set of descriptors are used to keep track of this handling, and
+ * initialized here based on the specific commands being executed.
+ */
+static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
+					   struct cmd_buf_desc *desc_list)
+{
+	switch (cmd) {
+	case SEV_CMD_PDH_CERT_EXPORT: {
+		struct sev_data_pdh_cert_export *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->pdh_cert_address;
+		desc_list[0].len = data->pdh_cert_len;
+		desc_list[1].paddr_ptr = &data->cert_chain_address;
+		desc_list[1].len = data->cert_chain_len;
+		break;
+	}
+	case SEV_CMD_GET_ID: {
+		struct sev_data_get_id *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->address;
+		desc_list[0].len = data->len;
+		break;
+	}
+	case SEV_CMD_PEK_CSR: {
+		struct sev_data_pek_csr *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->address;
+		desc_list[0].len = data->len;
+		break;
+	}
+	case SEV_CMD_LAUNCH_UPDATE_DATA: {
+		struct sev_data_launch_update_data *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->address;
+		desc_list[0].len = data->len;
+		desc_list[0].guest_owned = true;
+		break;
+	}
+	case SEV_CMD_LAUNCH_UPDATE_VMSA: {
+		struct sev_data_launch_update_vmsa *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->address;
+		desc_list[0].len = data->len;
+		desc_list[0].guest_owned = true;
+		break;
+	}
+	case SEV_CMD_LAUNCH_MEASURE: {
+		struct sev_data_launch_measure *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->address;
+		desc_list[0].len = data->len;
+		break;
+	}
+	case SEV_CMD_LAUNCH_UPDATE_SECRET: {
+		struct sev_data_launch_secret *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->guest_address;
+		desc_list[0].len = data->guest_len;
+		desc_list[0].guest_owned = true;
+		break;
+	}
+	case SEV_CMD_DBG_DECRYPT: {
+		struct sev_data_dbg *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->dst_addr;
+		desc_list[0].len = data->len;
+		desc_list[0].guest_owned = true;
+		break;
+	}
+	case SEV_CMD_DBG_ENCRYPT: {
+		struct sev_data_dbg *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->dst_addr;
+		desc_list[0].len = data->len;
+		desc_list[0].guest_owned = true;
+		break;
+	}
+	case SEV_CMD_ATTESTATION_REPORT: {
+		struct sev_data_attestation_report *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->address;
+		desc_list[0].len = data->len;
+		break;
+	}
+	case SEV_CMD_SEND_START: {
+		struct sev_data_send_start *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->session_address;
+		desc_list[0].len = data->session_len;
+		break;
+	}
+	case SEV_CMD_SEND_UPDATE_DATA: {
+		struct sev_data_send_update_data *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->hdr_address;
+		desc_list[0].len = data->hdr_len;
+		desc_list[1].paddr_ptr = &data->trans_address;
+		desc_list[1].len = data->trans_len;
+		break;
+	}
+	case SEV_CMD_SEND_UPDATE_VMSA: {
+		struct sev_data_send_update_vmsa *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->hdr_address;
+		desc_list[0].len = data->hdr_len;
+		desc_list[1].paddr_ptr = &data->trans_address;
+		desc_list[1].len = data->trans_len;
+		break;
+	}
+	case SEV_CMD_RECEIVE_UPDATE_DATA: {
+		struct sev_data_receive_update_data *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->guest_address;
+		desc_list[0].len = data->guest_len;
+		desc_list[0].guest_owned = true;
+		break;
+	}
+	case SEV_CMD_RECEIVE_UPDATE_VMSA: {
+		struct sev_data_receive_update_vmsa *data = cmd_buf;
+
+		desc_list[0].paddr_ptr = &data->guest_address;
+		desc_list[0].len = data->guest_len;
+		desc_list[0].guest_owned = true;
+		break;
+	}
+	default:
+		break;
+	}
+}
+
+static int snp_map_cmd_buf_desc(struct cmd_buf_desc *desc)
+{
+	unsigned long paddr;
+	unsigned int npages;
+
+	if (!desc->len)
+		return 0;
+
+	/* Allocate a bounce buffer if this isn't a guest owned page. */
+	if (!desc->guest_owned) {
+		struct page *page;
+
+		page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(desc->len));
+		if (!page) {
+			pr_warn("Failed to allocate bounce buffer for SEV legacy command.\n");
+			return -ENOMEM;
+		}
+
+		desc->paddr_orig = *desc->paddr_ptr;
+		*desc->paddr_ptr = __psp_pa(page_to_virt(page));
+	}
+
+	paddr = *desc->paddr_ptr;
+	npages = PAGE_ALIGN(desc->len) >> PAGE_SHIFT;
+
+	/* Transition the buffer to firmware-owned. */
+	if (rmp_mark_pages_firmware(paddr, npages, true)) {
+		pr_warn("Failed to move pages to firmware-owned state for SEV legacy command.\n");
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int snp_unmap_cmd_buf_desc(struct cmd_buf_desc *desc)
+{
+	unsigned long paddr;
+	unsigned int npages;
+
+	if (!desc->len)
+		return 0;
+
+	paddr = *desc->paddr_ptr;
+	npages = PAGE_ALIGN(desc->len) >> PAGE_SHIFT;
+
+	/* Transition the buffers back to hypervisor-owned. */
+	if (snp_reclaim_pages(paddr, npages, true)) {
+		pr_warn("Failed to reclaim firmware-owned pages while issuing SEV legacy command.\n");
+		return -EFAULT;
+	}
+
+	/* Copy data from bounce buffer and then free it. */
+	if (!desc->guest_owned) {
+		void *bounce_buf = __va(__sme_clr(paddr));
+		void *dst_buf = __va(__sme_clr(desc->paddr_orig));
+
+		memcpy(dst_buf, bounce_buf, desc->len);
+		__free_pages(virt_to_page(bounce_buf), get_order(desc->len));
+
+		/* Restore the original address in the command buffer. */
+		*desc->paddr_ptr = desc->paddr_orig;
+	}
+
+	return 0;
+}
+
+static int snp_map_cmd_buf_desc_list(int cmd, void *cmd_buf, struct cmd_buf_desc *desc_list)
+{
+	int i, n;
+
+	snp_populate_cmd_buf_desc_list(cmd, cmd_buf, desc_list);
+
+	for (i = 0; i < CMD_BUF_DESC_MAX; i++) {
+		struct cmd_buf_desc *desc = &desc_list[i];
+
+		if (!desc->paddr_ptr)
+			break;
+
+		if (snp_map_cmd_buf_desc(desc))
+			goto err_unmap;
+	}
+
+	return 0;
+
+err_unmap:
+	n = i;
+	for (i = 0; i < n; i++)
+		snp_unmap_cmd_buf_desc(&desc_list[i]);
+
+	return -EFAULT;
+}
+
+static int snp_unmap_cmd_buf_desc_list(struct cmd_buf_desc *desc_list)
+{
+	int i;
+
+	for (i = 0; i < CMD_BUF_DESC_MAX; i++) {
+		struct cmd_buf_desc *desc = &desc_list[i];
+
+		if (!desc->paddr_ptr)
+			break;
+
+		if (snp_unmap_cmd_buf_desc(desc))
+			return -EFAULT;
+	}
+
+	return 0;
+}
+
+static bool sev_cmd_buf_writable(int cmd)
+{
+	switch (cmd) {
+	case SEV_CMD_PLATFORM_STATUS:
+	case SEV_CMD_GUEST_STATUS:
+	case SEV_CMD_LAUNCH_START:
+	case SEV_CMD_RECEIVE_START:
+	case SEV_CMD_LAUNCH_MEASURE:
+	case SEV_CMD_SEND_START:
+	case SEV_CMD_SEND_UPDATE_DATA:
+	case SEV_CMD_SEND_UPDATE_VMSA:
+	case SEV_CMD_PEK_CSR:
+	case SEV_CMD_PDH_CERT_EXPORT:
+	case SEV_CMD_GET_ID:
+	case SEV_CMD_ATTESTATION_REPORT:
+		return true;
+	default:
+		return false;
+	}
+}
+
+/* After SNP is INIT'ed, the behavior of legacy SEV commands is changed. */
+static bool snp_legacy_handling_needed(int cmd)
+{
+	struct sev_device *sev = psp_master->sev_data;
+
+	return cmd < SEV_CMD_SNP_INIT && sev->snp_initialized;
+}
+
+static int snp_prep_cmd_buf(int cmd, void *cmd_buf, struct cmd_buf_desc *desc_list)
+{
+	if (!snp_legacy_handling_needed(cmd))
+		return 0;
+
+	if (snp_map_cmd_buf_desc_list(cmd, cmd_buf, desc_list))
+		return -EFAULT;
+
+	/*
+	 * Before command execution, the command buffer needs to be put into
+	 * the firmware-owned state.
+	 */
+	if (sev_cmd_buf_writable(cmd)) {
+		if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true))
+			return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int snp_reclaim_cmd_buf(int cmd, void *cmd_buf, struct cmd_buf_desc *desc_list)
+{
+	if (!snp_legacy_handling_needed(cmd))
+		return 0;
+
+	/*
+	 * After command completion, the command buffer needs to be put back
+	 * into the hypervisor-owned state.
+	 */
+	if (sev_cmd_buf_writable(cmd))
+		if (snp_reclaim_pages(__pa(cmd_buf), 1, true))
+			return -EFAULT;
+
+	if (snp_unmap_cmd_buf_desc_list(desc_list))
+		return -EFAULT;
+
+	return 0;
+}
+
 static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 {
+	struct cmd_buf_desc desc_list[CMD_BUF_DESC_MAX] = {0};
 	struct psp_device *psp = psp_master;
 	struct sev_device *sev;
 	unsigned int cmdbuff_hi, cmdbuff_lo;
 	unsigned int phys_lsb, phys_msb;
 	unsigned int reg, ret = 0;
+	void *cmd_buf;
 	int buf_len;
 
 	if (!psp || !psp->sev_data)
@@ -527,12 +874,47 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 	 * work for some memory, e.g. vmalloc'd addresses, and @data may not be
 	 * physically contiguous.
 	 */
-	if (data)
-		memcpy(sev->cmd_buf, data, buf_len);
+	if (data) {
+		/*
+		 * Commands are generally issued one at a time and require the
+		 * sev_cmd_mutex, but there could be recursive firmware requests
+		 * due to SEV_CMD_SNP_PAGE_RECLAIM needing to be issued while
+		 * preparing buffers for another command. This is the only known
+		 * case of nesting in the current code, so exactly one
+		 * additional command buffer is available for that purpose.
+		 */
+		if (!sev->cmd_buf_active) {
+			cmd_buf = sev->cmd_buf;
+			sev->cmd_buf_active = true;
+		} else if (!sev->cmd_buf_backup_active) {
+			cmd_buf = sev->cmd_buf_backup;
+			sev->cmd_buf_backup_active = true;
+		} else {
+			dev_err(sev->dev,
+				"SEV: too many firmware commands are in-progress, no command buffers available.\n");
+			return -EBUSY;
+		}
+
+		memcpy(cmd_buf, data, buf_len);
+
+		/*
+		 * The behavior of the SEV-legacy commands is altered when the
+		 * SNP firmware is in the INIT state.
+		 */
+		ret = snp_prep_cmd_buf(cmd, cmd_buf, desc_list);
+		if (ret) {
+			dev_err(sev->dev,
+				"SEV: failed to prepare buffer for legacy command %#x. Error: %d\n",
+				cmd, ret);
+			return ret;
+		}
+	} else {
+		cmd_buf = sev->cmd_buf;
+	}
 
 	/* Get the physical address of the command buffer */
-	phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
-	phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
+	phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
+	phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
 
 	dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
 		cmd, phys_msb, phys_lsb, psp_timeout);
@@ -586,15 +968,32 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 		ret = sev_write_init_ex_file_if_required(cmd);
 	}
 
-	print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
-			     buf_len, false);
-
 	/*
 	 * Copy potential output from the PSP back to data.  Do this even on
 	 * failure in case the caller wants to glean something from the error.
 	 */
-	if (data)
-		memcpy(data, sev->cmd_buf, buf_len);
+	if (data) {
+		/*
+		 * Restore the page state after the command completes.
+		 */
+		ret = snp_reclaim_cmd_buf(cmd, cmd_buf, desc_list);
+		if (ret) {
+			dev_err(sev->dev,
+				"SEV: failed to reclaim buffer for legacy command %#x. Error: %d\n",
+				cmd, ret);
+			return ret;
+		}
+
+		memcpy(data, cmd_buf, buf_len);
+
+		if (sev->cmd_buf_backup_active)
+			sev->cmd_buf_backup_active = false;
+		else
+			sev->cmd_buf_active = false;
+	}
+
+	print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
+			     buf_len, false);
 
 	return ret;
 }
@@ -1696,10 +2095,12 @@ int sev_dev_init(struct psp_device *psp)
 	if (!sev)
 		goto e_err;
 
-	sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
+	sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
 	if (!sev->cmd_buf)
 		goto e_sev;
 
+	sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
+
 	psp->sev_data = sev;
 
 	sev->dev = dev;
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 85506325051a..3e4e5574e88a 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -52,6 +52,9 @@ struct sev_device {
 	u8 build;
 
 	void *cmd_buf;
+	void *cmd_buf_backup;
+	bool cmd_buf_active;
+	bool cmd_buf_backup_active;
 
 	bool snp_initialized;
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v1 19/26] iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (17 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 18/26] crypto: ccp: Handle legacy SEV commands " Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2023-12-30 16:19 ` [PATCH v1 20/26] crypto: ccp: Add debug support for decrypting pages Michael Roth
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang

From: Ashish Kalra <ashish.kalra@amd.com>

Add a new IOMMU API interface, amd_iommu_snp_disable(), to transition
IOMMU pages from the Reclaim state to the Hypervisor state after the
SNP_SHUTDOWN_EX command, and invoke it from the CCP driver once that
command completes.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 20 +++++++++
 drivers/iommu/amd/init.c     | 79 ++++++++++++++++++++++++++++++++++++
 include/linux/amd-iommu.h    |  6 +++
 3 files changed, 105 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 8cfb376ca2e7..47fc58ed9e6a 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -26,6 +26,7 @@
 #include <linux/fs.h>
 #include <linux/fs_struct.h>
 #include <linux/psp.h>
+#include <linux/amd-iommu.h>
 
 #include <asm/smp.h>
 #include <asm/cacheflush.h>
@@ -1675,6 +1676,25 @@ static int __sev_snp_shutdown_locked(int *error)
 		return ret;
 	}
 
+	/*
+	 * SNP_SHUTDOWN_EX with IOMMU_SNP_SHUTDOWN set to 1 disables SNP
+	 * enforcement by the IOMMU and also transitions all pages
+	 * associated with the IOMMU to the Reclaim state.
+	 * Prior to version 1.53, firmware transitioned the IOMMU pages
+	 * directly to the Hypervisor state. However, it did not correctly
+	 * account for the number of assigned 4KB pages in a 2MB page and
+	 * skipped the Reclaim state, which resulted in an RMP #PF when the
+	 * containing 2MB page was later accessed during kexec boot. The
+	 * firmware now transitions these pages to the Reclaim state, and
+	 * the hypervisor must then transition them to the shared state.
+	 * SNP firmware version 1.53 or later is needed for kexec boot.
+	 */
+	ret = amd_iommu_snp_disable();
+	if (ret) {
+		dev_err(sev->dev, "SNP IOMMU shutdown failed\n");
+		return ret;
+	}
+
 	sev->snp_initialized = false;
 	dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 96a1a7fed470..3d95b2e67784 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -30,6 +30,7 @@
 #include <asm/io_apic.h>
 #include <asm/irq_remapping.h>
 #include <asm/set_memory.h>
+#include <asm/sev.h>
 
 #include <linux/crash_dump.h>
 
@@ -3797,3 +3798,81 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
 
 	return iommu_pc_get_set_reg(iommu, bank, cntr, fxn, value, true);
 }
+
+#ifdef CONFIG_KVM_AMD_SEV
+static int iommu_page_make_shared(void *page)
+{
+	unsigned long paddr, pfn;
+
+	paddr = iommu_virt_to_phys(page);
+	/* The C-bit may be set in the paddr */
+	pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+
+	if (!(pfn % PTRS_PER_PMD)) {
+		int ret, level;
+		bool assigned;
+
+		ret = snp_lookup_rmpentry(pfn, &assigned, &level);
+		if (ret)
+			pr_warn("IOMMU PFN %lx RMP lookup failed, ret %d\n",
+				pfn, ret);
+
+		if (!assigned)
+			pr_warn("IOMMU PFN %lx not assigned in RMP table\n",
+				pfn);
+
+		if (level > PG_LEVEL_4K) {
+			ret = psmash(pfn);
+			if (ret) {
+				pr_warn("IOMMU PFN %lx had a huge RMP entry, but attempted psmash failed, ret: %d, level: %d\n",
+					pfn, ret, level);
+			}
+		}
+	}
+
+	return rmp_make_shared(pfn, PG_LEVEL_4K);
+}
+
+static int iommu_make_shared(void *va, size_t size)
+{
+	void *page;
+	int ret;
+
+	if (!va)
+		return 0;
+
+	for (page = va; page < (va + size); page += PAGE_SIZE) {
+		ret = iommu_page_make_shared(page);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+int amd_iommu_snp_disable(void)
+{
+	struct amd_iommu *iommu;
+	int ret;
+
+	if (!amd_iommu_snp_en)
+		return 0;
+
+	for_each_iommu(iommu) {
+		ret = iommu_make_shared(iommu->evt_buf, EVT_BUFFER_SIZE);
+		if (ret)
+			return ret;
+
+		ret = iommu_make_shared(iommu->ppr_log, PPR_LOG_SIZE);
+		if (ret)
+			return ret;
+
+		ret = iommu_make_shared((void *)iommu->cmd_sem, PAGE_SIZE);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(amd_iommu_snp_disable);
+#endif
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 7365be00a795..2b90c48a6a87 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -85,4 +85,10 @@ int amd_iommu_pc_get_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn,
 		u64 *value);
 struct amd_iommu *get_amd_iommu(unsigned int idx);
 
+#ifdef CONFIG_KVM_AMD_SEV
+int amd_iommu_snp_disable(void);
+#else
+static inline int amd_iommu_snp_disable(void) { return 0; }
+#endif
+
 #endif /* _ASM_X86_AMD_IOMMU_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v1 20/26] crypto: ccp: Add debug support for decrypting pages
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (18 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 19/26] iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-10 14:59   ` Sean Christopherson
  2023-12-30 16:19 ` [PATCH v1 21/26] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump Michael Roth
                   ` (5 subsequent siblings)
  25 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

Add support for decrypting guest encrypted memory. These API interfaces
can be used, for example, to dump VMCBs on SNP guest exit.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: minor commit fixups]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 32 ++++++++++++++++++++++++++++++++
 include/linux/psp-sev.h      | 19 +++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 47fc58ed9e6a..9792c7af3005 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2053,6 +2053,38 @@ int sev_guest_df_flush(int *error)
 }
 EXPORT_SYMBOL_GPL(sev_guest_df_flush);
 
+int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
+{
+	struct sev_data_snp_dbg data = {0};
+	struct sev_device *sev;
+	int ret;
+
+	if (!psp_master || !psp_master->sev_data)
+		return -ENODEV;
+
+	sev = psp_master->sev_data;
+
+	if (!sev->snp_initialized)
+		return -EINVAL;
+
+	data.gctx_paddr = sme_me_mask | (gctx_pfn << PAGE_SHIFT);
+	data.src_addr = sme_me_mask | (src_pfn << PAGE_SHIFT);
+	data.dst_addr = sme_me_mask | (dst_pfn << PAGE_SHIFT);
+
+	/* The destination page must be in the firmware state. */
+	if (rmp_mark_pages_firmware(data.dst_addr, 1, false))
+		return -EIO;
+
+	ret = sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, &data, error);
+
+	/* Restore the page state */
+	if (snp_reclaim_pages(data.dst_addr, 1, false))
+		ret = -EIO;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt_page);
+
 static void sev_exit(struct kref *ref)
 {
 	misc_deregister(&misc_dev->misc);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 16d0da895680..b14008388a37 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -930,6 +930,20 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
  */
 int sev_do_cmd(int cmd, void *data, int *psp_ret);
 
+/**
+ * snp_guest_dbg_decrypt_page - perform SEV SNP_DBG_DECRYPT command
+ *
+ * @error: SEV command return code
+ *
+ * Returns:
+ * 0 if the SEV device successfully processed the command
+ * -%ENODEV    if the SEV device is not available
+ * -%EINVAL    if SNP is not initialized
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO       if the SEV device returned a non-zero return code
+ */
+int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error);
+
 void *psp_copy_user_blob(u64 uaddr, u32 len);
 void *snp_alloc_firmware_page(gfp_t mask);
 void snp_free_firmware_page(void *addr);
@@ -960,6 +974,11 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int
 
 static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }
 
+static inline int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
+{
+	return -ENODEV;
+}
+
 static inline void *snp_alloc_firmware_page(gfp_t mask)
 {
 	return NULL;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v1 21/26] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (19 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 20/26] crypto: ccp: Add debug support for decrypting pages Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-21 11:49   ` Borislav Petkov
  2023-12-30 16:19 ` [PATCH v1 22/26] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
                   ` (4 subsequent siblings)
  25 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang

From: Ashish Kalra <ashish.kalra@amd.com>

Add a kdump safe version of sev_firmware_shutdown() registered as a
crash_kexec_post_notifier, which is invoked during panic/crash to do
SEV/SNP shutdown. This is required for transitioning all IOMMU pages
to reclaim/hypervisor state, otherwise re-init of IOMMU pages during
crashdump kernel boot fails and panics the crashdump kernel. Since
this panic notifier runs in atomic context, it does not acquire any
locks/mutexes, and it polls for PSP command completion rather than
relying on the PSP command completion interrupt.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: remove use of "we" in comments]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/kernel/crash.c      |   7 +++
 drivers/crypto/ccp/sev-dev.c | 112 +++++++++++++++++++++++++----------
 2 files changed, 89 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cbffb27f6468..b8c44c492d43 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -59,6 +59,13 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
 	 */
 	cpu_emergency_stop_pt();
 
+	/*
+	 * For SNP, do wbinvd() on the remote CPUs so that
+	 * SNP_SHUTDOWN can safely be done on the local CPU.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		wbinvd();
+
 	disable_local_APIC();
 }
 
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 9792c7af3005..598878e760bc 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -21,6 +21,7 @@
 #include <linux/hw_random.h>
 #include <linux/ccp.h>
 #include <linux/firmware.h>
+#include <linux/panic_notifier.h>
 #include <linux/gfp.h>
 #include <linux/cpufeature.h>
 #include <linux/fs.h>
@@ -144,6 +145,26 @@ static int sev_wait_cmd_ioc(struct sev_device *sev,
 {
 	int ret;
 
+	/*
+	 * If invoked during panic handling, local interrupts are disabled,
+	 * so the PSP command completion interrupt can't be used. Poll for
+	 * PSP command completion instead.
+	 */
+	if (irqs_disabled()) {
+		unsigned long timeout_usecs = (timeout * USEC_PER_SEC) / 10;
+
+		/* Poll for SEV command completion: */
+		while (timeout_usecs--) {
+			*reg = ioread32(sev->io_regs + sev->vdata->cmdresp_reg);
+			if (*reg & PSP_CMDRESP_RESP)
+				return 0;
+
+			udelay(10);
+		}
+
+		return -ETIMEDOUT;
+	}
+
 	ret = wait_event_timeout(sev->int_queue,
 			sev->int_rcvd, timeout * HZ);
 	if (!ret)
@@ -1358,17 +1379,6 @@ static int __sev_platform_shutdown_locked(int *error)
 	return ret;
 }
 
-static int sev_platform_shutdown(int *error)
-{
-	int rc;
-
-	mutex_lock(&sev_cmd_mutex);
-	rc = __sev_platform_shutdown_locked(NULL);
-	mutex_unlock(&sev_cmd_mutex);
-
-	return rc;
-}
-
 static int sev_get_platform_state(int *state, int *error)
 {
 	struct sev_user_data_status data;
@@ -1644,7 +1654,7 @@ static int sev_update_firmware(struct device *dev)
 	return ret;
 }
 
-static int __sev_snp_shutdown_locked(int *error)
+static int __sev_snp_shutdown_locked(int *error, bool in_panic)
 {
 	struct sev_device *sev = psp_master->sev_data;
 	struct sev_data_snp_shutdown_ex data;
@@ -1657,7 +1667,16 @@ static int __sev_snp_shutdown_locked(int *error)
 	data.length = sizeof(data);
 	data.iommu_snp_shutdown = 1;
 
-	wbinvd_on_all_cpus();
+	/*
+	 * If invoked during panic handling, local interrupts are disabled
+	 * and all CPUs are stopped, so wbinvd_on_all_cpus() can't be called.
+	 * In that case, a wbinvd() is done on remote CPUs via the NMI
+	 * callback, so only a local wbinvd() is needed here.
+	 */
+	if (!in_panic)
+		wbinvd_on_all_cpus();
+	else
+		wbinvd();
 
 	ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data, error);
 	/* SHUTDOWN may require DF_FLUSH */
@@ -1701,17 +1720,6 @@ static int __sev_snp_shutdown_locked(int *error)
 	return ret;
 }
 
-static int sev_snp_shutdown(int *error)
-{
-	int rc;
-
-	mutex_lock(&sev_cmd_mutex);
-	rc = __sev_snp_shutdown_locked(error);
-	mutex_unlock(&sev_cmd_mutex);
-
-	return rc;
-}
-
 static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -2191,19 +2199,29 @@ int sev_dev_init(struct psp_device *psp)
 	return ret;
 }
 
-static void sev_firmware_shutdown(struct sev_device *sev)
+static void __sev_firmware_shutdown(struct sev_device *sev, bool in_panic)
 {
 	int error;
 
-	sev_platform_shutdown(NULL);
+	__sev_platform_shutdown_locked(NULL);
 
 	if (sev_es_tmr) {
-		/* The TMR area was encrypted, flush it from the cache */
-		wbinvd_on_all_cpus();
+		/*
+		 * The TMR area was encrypted, flush it from the cache
+		 *
+		 * If invoked during panic handling, local interrupts are
+		 * disabled and all CPUs are stopped, so wbinvd_on_all_cpus()
+		 * can't be used. In that case, wbinvd() is done on remote CPUs
+		 * via the NMI callback, so a local wbinvd() is sufficient here.
+		 */
+		if (!in_panic)
+			wbinvd_on_all_cpus();
+		else
+			wbinvd();
 
 		__snp_free_firmware_pages(virt_to_page(sev_es_tmr),
 					  get_order(sev_es_tmr_size),
-					  false);
+					  true);
 		sev_es_tmr = NULL;
 	}
 
@@ -2219,7 +2237,14 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 		snp_range_list = NULL;
 	}
 
-	sev_snp_shutdown(&error);
+	__sev_snp_shutdown_locked(&error, in_panic);
+}
+
+static void sev_firmware_shutdown(struct sev_device *sev)
+{
+	mutex_lock(&sev_cmd_mutex);
+	__sev_firmware_shutdown(sev, false);
+	mutex_unlock(&sev_cmd_mutex);
 }
 
 void sev_dev_destroy(struct psp_device *psp)
@@ -2237,6 +2262,28 @@ void sev_dev_destroy(struct psp_device *psp)
 	psp_clear_sev_irq_handler(psp);
 }
 
+static int sev_snp_shutdown_on_panic(struct notifier_block *nb,
+				     unsigned long reason, void *arg)
+{
+	struct sev_device *sev = psp_master->sev_data;
+
+	/*
+	 * Panic callbacks are executed with all other CPUs stopped,
+	 * so don't wait for sev_cmd_mutex to be released since it
+	 * would block here forever.
+	 */
+	if (mutex_is_locked(&sev_cmd_mutex))
+		return NOTIFY_DONE;
+
+	__sev_firmware_shutdown(sev, true);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block sev_snp_panic_notifier = {
+	.notifier_call = sev_snp_shutdown_on_panic,
+};
+
 int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
 				void *data, int *error)
 {
@@ -2274,6 +2321,8 @@ void sev_pci_init(void)
 	dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
 		"-SNP" : "", sev->api_major, sev->api_minor, sev->build);
 
+	atomic_notifier_chain_register(&panic_notifier_list,
+				       &sev_snp_panic_notifier);
 	return;
 
 err:
@@ -2288,4 +2337,7 @@ void sev_pci_exit(void)
 		return;
 
 	sev_firmware_shutdown(sev);
+
+	atomic_notifier_chain_unregister(&panic_notifier_list,
+					 &sev_snp_panic_notifier);
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v1 22/26] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (20 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 21/26] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-21 11:51   ` Borislav Petkov
  2023-12-30 16:19 ` [PATCH v1 23/26] x86/cpufeatures: Enable/unmask SEV-SNP CPU feature Michael Roth
                   ` (3 subsequent siblings)
  25 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh, Marc Orr

From: Brijesh Singh <brijesh.singh@amd.com>

Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2MB or 1GB) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.

When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" via a reserved bit in the corresponding RMP
entry after a successful VMRUN. This is done for _all_ VMs, not just
SNP-Active VMs.

If the hypervisor accesses an in-use page through a writable
translation, the CPU will throw an RMP violation #PF. On early SNP
hardware, if an in-use page is 2MB-aligned and software accesses any
part of the associated 2MB region with a hugepage, the CPU will
incorrectly treat the entire 2MB region as in-use and signal an RMP
violation #PF.

To avoid this, the recommendation is to not use a 2MB-aligned page for
the VMCB, VMSA or AVIC pages. Add a generic allocator that will ensure
that the page returned is not 2MB-aligned and is safe to be used when
SEV-SNP is enabled. Also implement similar handling for the VMCB/VMSA
pages of nested guests.

Co-developed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Reported-by: Alper Gun <alpergun@google.com> # for nested VMSA case
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
[mdr: squash in nested guest handling from Ashish, commit msg fixups]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/lapic.c               |  5 ++++-
 arch/x86/kvm/svm/nested.c          |  2 +-
 arch/x86/kvm/svm/sev.c             | 32 ++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c             | 17 +++++++++++++---
 arch/x86/kvm/svm/svm.h             |  1 +
 7 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 378ed944b849..ab24ce207988 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -138,6 +138,7 @@ KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL(get_untagged_addr)
+KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7bc1daf68741..9b0f18d096ed 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1794,6 +1794,7 @@ struct kvm_x86_ops {
 	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
 
 	gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
+	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 3242f3da2457..1edf93ee3395 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2815,7 +2815,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
 
 	vcpu->arch.apic = apic;
 
-	apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	if (kvm_x86_ops.alloc_apic_backing_page)
+		apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
+	else
+		apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
 	if (!apic->regs) {
 		printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
 		       vcpu->vcpu_id);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dee62362a360..55b9a6d96bcf 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1181,7 +1181,7 @@ int svm_allocate_nested(struct vcpu_svm *svm)
 	if (svm->nested.initialized)
 		return 0;
 
-	vmcb02_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	vmcb02_page = snp_safe_alloc_page(&svm->vcpu);
 	if (!vmcb02_page)
 		return -ENOMEM;
 	svm->nested.vmcb02.ptr = page_address(vmcb02_page);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 58e19d023d70..2efe3ed89808 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3144,3 +3144,35 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 
 	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
 }
+
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
+{
+	unsigned long pfn;
+	struct page *p;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+
+	/*
+	 * Allocate an SNP-safe page to work around the SNP erratum where
+	 * the CPU will incorrectly signal an RMP violation #PF if a
+	 * hugepage (2MB or 1GB) collides with the RMP entry of a
+	 * 2MB-aligned VMCB, VMSA, or AVIC backing page.
+	 *
+	 * Allocate one extra page, choose a page which is not
+	 * 2MB-aligned, and free the other.
+	 */
+	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
+	if (!p)
+		return NULL;
+
+	split_page(p, 1);
+
+	pfn = page_to_pfn(p);
+	if (IS_ALIGNED(pfn, PTRS_PER_PMD))
+		__free_page(p++);
+	else
+		__free_page(p + 1);
+
+	return p;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 13cacaba229c..b6179696861a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -702,7 +702,7 @@ static int svm_cpu_init(int cpu)
 	int ret = -ENOMEM;
 
 	memset(sd, 0, sizeof(struct svm_cpu_data));
-	sd->save_area = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	sd->save_area = snp_safe_alloc_page(NULL);
 	if (!sd->save_area)
 		return ret;
 
@@ -1420,7 +1420,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 	svm = to_svm(vcpu);
 
 	err = -ENOMEM;
-	vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	vmcb01_page = snp_safe_alloc_page(vcpu);
 	if (!vmcb01_page)
 		goto out;
 
@@ -1429,7 +1429,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 		 * SEV-ES guests require a separate VMSA page used to contain
 		 * the encrypted register state of the guest.
 		 */
-		vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+		vmsa_page = snp_safe_alloc_page(vcpu);
 		if (!vmsa_page)
 			goto error_free_vmcb_page;
 
@@ -4899,6 +4899,16 @@ static int svm_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
+{
+	struct page *page = snp_safe_alloc_page(vcpu);
+
+	if (!page)
+		return NULL;
+
+	return page_address(page);
+}
+
 static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,
 
@@ -5030,6 +5040,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
+	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 59adff7bbf55..9ed9d72546b3 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -694,6 +694,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 
 /* vmenter.S */
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v1 23/26] x86/cpufeatures: Enable/unmask SEV-SNP CPU feature
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (21 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 22/26] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2023-12-30 16:19 ` [PATCH v1 24/26] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang

With all the required host changes in place, it should now be possible
to initialize SNP-related MSR bits, set up RMP table enforcement, and
initialize SNP support in firmware while maintaining legacy support for
running SEV/SEV-ES guests. Go ahead and enable the SNP feature now.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/include/asm/disabled-features.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index a864a5b208fa..3332d2940020 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -117,7 +117,11 @@
 #define DISABLE_IBT	(1 << (X86_FEATURE_IBT & 31))
 #endif
 
+#ifdef CONFIG_KVM_AMD_SEV
 #define DISABLE_SEV_SNP		0
+#else
+#define DISABLE_SEV_SNP		(1 << (X86_FEATURE_SEV_SNP & 31))
+#endif
 
 /*
  * Make sure to add features to the correct mask
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v1 24/26] crypto: ccp: Add the SNP_PLATFORM_STATUS command
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (22 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 23/26] x86/cpufeatures: Enable/unmask SEV-SNP CPU feature Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-21 12:29   ` Borislav Petkov
  2023-12-30 16:19 ` [PATCH v1 25/26] crypto: ccp: Add the SNP_COMMIT command Michael Roth
  2023-12-30 16:19 ` [PATCH v1 26/26] crypto: ccp: Add the SNP_SET_CONFIG command Michael Roth
  25 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

From: Brijesh Singh <brijesh.singh@amd.com>

This command is used to query the SNP platform status. See the SEV-SNP
spec for more details.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/coco/sev-guest.rst | 27 ++++++++++++++++
 drivers/crypto/ccp/sev-dev.c          | 45 +++++++++++++++++++++++++++
 include/uapi/linux/psp-sev.h          |  1 +
 3 files changed, 73 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index 68b0d2363af8..6d3d5d336e5f 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -67,6 +67,22 @@ counter (e.g. counter overflow), then -EIO will be returned.
                 };
         };
 
+The host ioctls are issued to a file descriptor of the /dev/sev device.
+The ioctl accepts the command ID/input structure documented below.
+
+::
+        struct sev_issue_cmd {
+                /* Command ID */
+                __u32 cmd;
+
+                /* Command request structure */
+                __u64 data;
+
+                /* Firmware error code on failure (see psp-sev.h) */
+                __u32 error;
+        };
+
+
 2.1 SNP_GET_REPORT
 ------------------
 
@@ -124,6 +140,17 @@ be updated with the expected value.
 
 See GHCB specification for further detail on how to parse the certificate blob.
 
+2.4 SNP_PLATFORM_STATUS
+-----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (out): struct sev_user_data_snp_status
+:Returns (out): 0 on success, -negative on error
+
+The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
+status includes API major, minor version and more. See the SEV-SNP
+specification for further details.
+
 3. SEV-SNP CPUID Enforcement
 ============================
 
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 598878e760bc..e663175cfa44 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1962,6 +1962,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
 	return ret;
 }
 
+static int sev_ioctl_do_snp_platform_status(struct sev_issue_cmd *argp)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_data_snp_addr buf;
+	struct page *status_page;
+	void *data;
+	int ret;
+
+	if (!sev->snp_initialized || !argp->data)
+		return -EINVAL;
+
+	status_page = alloc_page(GFP_KERNEL_ACCOUNT);
+	if (!status_page)
+		return -ENOMEM;
+
+	data = page_address(status_page);
+	if (rmp_mark_pages_firmware(__pa(data), 1, true)) {
+		ret = -EFAULT;
+		goto cleanup;
+	}
+
+	buf.gctx_paddr = __psp_pa(data);
+	ret = __sev_do_cmd_locked(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &argp->error);
+
+	/* Change the page state before accessing it */
+	if (snp_reclaim_pages(__pa(data), 1, true)) {
+		snp_leak_pages(__pa(data) >> PAGE_SHIFT, 1);
+		return -EFAULT;
+	}
+
+	if (ret)
+		goto cleanup;
+
+	if (copy_to_user((void __user *)argp->data, data,
+			 sizeof(struct sev_user_data_snp_status)))
+		ret = -EFAULT;
+
+cleanup:
+	__free_pages(status_page, 0);
+	return ret;
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -2013,6 +2055,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 	case SEV_GET_ID2:
 		ret = sev_ioctl_do_get_id2(&input);
 		break;
+	case SNP_PLATFORM_STATUS:
+		ret = sev_ioctl_do_snp_platform_status(&input);
+		break;
 	default:
 		ret = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 71ba5f9f90a8..1feba7d08099 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -28,6 +28,7 @@ enum {
 	SEV_PEK_CERT_IMPORT,
 	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
 	SEV_GET_ID2,
+	SNP_PLATFORM_STATUS,
 
 	SEV_MAX,
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v1 25/26] crypto: ccp: Add the SNP_COMMIT command
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (23 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 24/26] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-21 12:35   ` Borislav Petkov
  2023-12-30 16:19 ` [PATCH v1 26/26] crypto: ccp: Add the SNP_SET_CONFIG command Michael Roth
  25 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang

From: Tom Lendacky <thomas.lendacky@amd.com>

The SNP_COMMIT command is used to commit the currently installed version
of the SEV firmware. Once committed, the firmware cannot be replaced
with a previous firmware version (cannot be rolled back). This command
will also update the reported TCB to match that of the currently
installed firmware.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[mdr: note the reported TCB update in the documentation/commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/coco/sev-guest.rst | 11 +++++++++++
 drivers/crypto/ccp/sev-dev.c          | 17 +++++++++++++++++
 include/linux/psp-sev.h               |  9 +++++++++
 include/uapi/linux/psp-sev.h          |  1 +
 4 files changed, 38 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index 6d3d5d336e5f..007ae828aa2a 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -151,6 +151,17 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
 status includes API major, minor version and more. See the SEV-SNP
 specification for further details.
 
+2.5 SNP_COMMIT
+--------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Returns (out): 0 on success, -negative on error
+
+SNP_COMMIT is used to commit the currently installed firmware using the
+SEV-SNP firmware SNP_COMMIT command. This prevents roll-back to a previously
+committed firmware version. This will also update the reported TCB to match
+that of the currently installed firmware.
+
 3. SEV-SNP CPUID Enforcement
 ============================
 
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index e663175cfa44..9c051a9b43e2 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -224,6 +224,7 @@ static int sev_cmd_buffer_len(int cmd)
 	case SEV_CMD_SNP_PLATFORM_STATUS:	return sizeof(struct sev_data_snp_addr);
 	case SEV_CMD_SNP_GUEST_REQUEST:		return sizeof(struct sev_data_snp_guest_request);
 	case SEV_CMD_SNP_CONFIG:		return sizeof(struct sev_user_data_snp_config);
+	case SEV_CMD_SNP_COMMIT:		return sizeof(struct sev_data_snp_commit);
 	default:				return 0;
 	}
 
@@ -2004,6 +2005,19 @@ static int sev_ioctl_do_snp_platform_status(struct sev_issue_cmd *argp)
 	return ret;
 }
 
+static int sev_ioctl_do_snp_commit(struct sev_issue_cmd *argp)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_data_snp_commit buf;
+
+	if (!sev->snp_initialized)
+		return -EINVAL;
+
+	buf.length = sizeof(buf);
+
+	return __sev_do_cmd_locked(SEV_CMD_SNP_COMMIT, &buf, &argp->error);
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -2058,6 +2072,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 	case SNP_PLATFORM_STATUS:
 		ret = sev_ioctl_do_snp_platform_status(&input);
 		break;
+	case SNP_COMMIT:
+		ret = sev_ioctl_do_snp_commit(&input);
+		break;
 	default:
 		ret = -EINVAL;
 		goto out;
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index b14008388a37..11af3dd9126d 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -787,6 +787,15 @@ struct sev_data_snp_shutdown_ex {
 	u32 rsvd1:31;
 } __packed;
 
+/**
+ * struct sev_data_snp_commit - SNP_COMMIT structure
+ *
+ * @length: len of the command buffer read by the PSP
+ */
+struct sev_data_snp_commit {
+	u32 length;
+} __packed;
+
 #ifdef CONFIG_CRYPTO_DEV_SP_PSP
 
 /**
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 1feba7d08099..01aab4b340f4 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -29,6 +29,7 @@ enum {
 	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
 	SEV_GET_ID2,
 	SNP_PLATFORM_STATUS,
+	SNP_COMMIT,
 
 	SEV_MAX,
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH v1 26/26] crypto: ccp: Add the SNP_SET_CONFIG command
  2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
                   ` (24 preceding siblings ...)
  2023-12-30 16:19 ` [PATCH v1 25/26] crypto: ccp: Add the SNP_COMMIT command Michael Roth
@ 2023-12-30 16:19 ` Michael Roth
  2024-01-21 12:41   ` Borislav Petkov
  25 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2023-12-30 16:19 UTC (permalink / raw)
  To: x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh, Alexey Kardashevskiy, Dionna Glaze

From: Brijesh Singh <brijesh.singh@amd.com>

The SEV-SNP firmware provides the SNP_CONFIG command used to set various
system-wide configuration values for SNP guests, such as the TCB version
to be reported in guest attestation reports. Add an ioctl interface so
userspace can set these values.

Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: squash in doc patch from Dionna, drop extended request/certificate
 handling and simplify this to a simple wrapper around SNP_CONFIG fw
 cmd]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/coco/sev-guest.rst | 13 +++++++++++++
 drivers/crypto/ccp/sev-dev.c          | 20 ++++++++++++++++++++
 include/uapi/linux/psp-sev.h          |  1 +
 3 files changed, 34 insertions(+)

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index 007ae828aa2a..4f696aacc866 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -162,6 +162,19 @@ SEV-SNP firmware SNP_COMMIT command. This prevents roll-back to a previously
 committed firmware version. This will also update the reported TCB to match
 that of the currently installed firmware.
 
+2.6 SNP_SET_CONFIG
+------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_user_data_snp_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_SET_CONFIG is used to set the system-wide configuration such as
+reported TCB version in the attestation report. The command is similar to
+SNP_CONFIG command defined in the SEV-SNP spec. The current values of the
+firmware parameters affected by this command can be queried via
+SNP_PLATFORM_STATUS.
+
 3. SEV-SNP CPUID Enforcement
 ============================
 
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 9c051a9b43e2..c5b26b3fe7ff 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2018,6 +2018,23 @@ static int sev_ioctl_do_snp_commit(struct sev_issue_cmd *argp)
 	return __sev_do_cmd_locked(SEV_CMD_SNP_COMMIT, &buf, &argp->error);
 }
 
+static int sev_ioctl_do_snp_set_config(struct sev_issue_cmd *argp, bool writable)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	struct sev_user_data_snp_config config;
+
+	if (!sev->snp_initialized || !argp->data)
+		return -EINVAL;
+
+	if (!writable)
+		return -EPERM;
+
+	if (copy_from_user(&config, (void __user *)argp->data, sizeof(config)))
+		return -EFAULT;
+
+	return __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -2075,6 +2092,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 	case SNP_COMMIT:
 		ret = sev_ioctl_do_snp_commit(&input);
 		break;
+	case SNP_SET_CONFIG:
+		ret = sev_ioctl_do_snp_set_config(&input, writable);
+		break;
 	default:
 		ret = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 01aab4b340f4..f28d4fb5bc21 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -30,6 +30,7 @@ enum {
 	SEV_GET_ID2,
 	SNP_PLATFORM_STATUS,
 	SNP_COMMIT,
+	SNP_SET_CONFIG,
 
 	SEV_MAX,
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 01/26] x86/cpufeatures: Add SEV-SNP CPU feature
  2023-12-30 16:19 ` [PATCH v1 01/26] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
@ 2023-12-31 11:50   ` Borislav Petkov
  2023-12-31 16:44     ` Michael Roth
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2023-12-31 11:50 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh, Jarkko Sakkinen

On Sat, Dec 30, 2023 at 10:19:29AM -0600, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> Add CPU feature detection for Secure Encrypted Virtualization with
> Secure Nested Paging. This feature adds a strong memory integrity
> protection to help prevent malicious hypervisor-based attacks like
> data replay, memory re-mapping, and more.
> 
> Since enabling the SNP CPU feature imposes a number of additional
> requirements on host initialization and handling legacy firmware APIs
> for SEV/SEV-ES guests, only introduce the CPU feature bit so that the
> relevant handling can be added, but leave it disabled via a
> disabled-features mask.
> 
> Once all the necessary changes needed to maintain legacy SEV/SEV-ES
> support are introduced in subsequent patches, the SNP feature bit will
> be unmasked/enabled.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
> Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/include/asm/cpufeatures.h       | 1 +
>  arch/x86/include/asm/disabled-features.h | 4 +++-
>  arch/x86/kernel/cpu/amd.c                | 5 +++--
>  tools/arch/x86/include/asm/cpufeatures.h | 1 +
>  4 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 29cb275a219d..9492dcad560d 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -442,6 +442,7 @@
>  #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
>  #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
>  #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
> +#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
>  #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
>  #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
>  #define X86_FEATURE_DEBUG_SWAP		(19*32+14) /* AMD SEV-ES full debug state swap support */
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index 702d93fdd10e..a864a5b208fa 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -117,6 +117,8 @@
>  #define DISABLE_IBT	(1 << (X86_FEATURE_IBT & 31))
>  #endif
>  
> +#define DISABLE_SEV_SNP		0

I think you want this here if SEV_SNP should be initially disabled:

diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index a864a5b208fa..5b2fab8ad262 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -117,7 +117,7 @@
 #define DISABLE_IBT	(1 << (X86_FEATURE_IBT & 31))
 #endif
 
-#define DISABLE_SEV_SNP		0
+#define DISABLE_SEV_SNP	(1 << (X86_FEATURE_SEV_SNP & 31))
 
 /*
  * Make sure to add features to the correct mask

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 01/26] x86/cpufeatures: Add SEV-SNP CPU feature
  2023-12-31 11:50   ` Borislav Petkov
@ 2023-12-31 16:44     ` Michael Roth
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2023-12-31 16:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh, Jarkko Sakkinen

On Sun, Dec 31, 2023 at 12:50:12PM +0100, Borislav Petkov wrote:
> On Sat, Dec 30, 2023 at 10:19:29AM -0600, Michael Roth wrote:
> > From: Brijesh Singh <brijesh.singh@amd.com>
> > 
> > Add CPU feature detection for Secure Encrypted Virtualization with
> > Secure Nested Paging. This feature adds a strong memory integrity
> > protection to help prevent malicious hypervisor-based attacks like
> > data replay, memory re-mapping, and more.
> > 
> > Since enabling the SNP CPU feature imposes a number of additional
> > requirements on host initialization and handling legacy firmware APIs
> > for SEV/SEV-ES guests, only introduce the CPU feature bit so that the
> > relevant handling can be added, but leave it disabled via a
> > disabled-features mask.
> > 
> > Once all the necessary changes needed to maintain legacy SEV/SEV-ES
> > support are introduced in subsequent patches, the SNP feature bit will
> > be unmasked/enabled.
> > 
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
> > Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  arch/x86/include/asm/cpufeatures.h       | 1 +
> >  arch/x86/include/asm/disabled-features.h | 4 +++-
> >  arch/x86/kernel/cpu/amd.c                | 5 +++--
> >  tools/arch/x86/include/asm/cpufeatures.h | 1 +
> >  4 files changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index 29cb275a219d..9492dcad560d 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -442,6 +442,7 @@
> >  #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
> >  #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
> >  #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
> > +#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
> >  #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
> >  #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
> >  #define X86_FEATURE_DEBUG_SWAP		(19*32+14) /* AMD SEV-ES full debug state swap support */
> > diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> > index 702d93fdd10e..a864a5b208fa 100644
> > --- a/arch/x86/include/asm/disabled-features.h
> > +++ b/arch/x86/include/asm/disabled-features.h
> > @@ -117,6 +117,8 @@
> >  #define DISABLE_IBT	(1 << (X86_FEATURE_IBT & 31))
> >  #endif
> >  
> > +#define DISABLE_SEV_SNP		0
> 
> I think you want this here if SEV_SNP should be initially disabled:
> 
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index a864a5b208fa..5b2fab8ad262 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -117,7 +117,7 @@
>  #define DISABLE_IBT	(1 << (X86_FEATURE_IBT & 31))
>  #endif
>  
> -#define DISABLE_SEV_SNP		0
> +#define DISABLE_SEV_SNP	(1 << (X86_FEATURE_SEV_SNP & 31))
>  
>  /*
>   * Make sure to add features to the correct mask

Sorry, I must have inverted things when I was squashing in the changes =\

I've gone ahead and force-pushed your fixup to the snp-host-init-v1
branch.

Thanks,

Mike

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 03/26] iommu/amd: Don't rely on external callers to enable IOMMU SNP support
  2023-12-30 16:19 ` [PATCH v1 03/26] iommu/amd: Don't rely on external callers to enable IOMMU SNP support Michael Roth
@ 2024-01-04 10:30   ` Borislav Petkov
  2024-01-04 10:58   ` Joerg Roedel
  1 sibling, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-04 10:30 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang

On Sat, Dec 30, 2023 at 10:19:31AM -0600, Michael Roth wrote:
> +static void iommu_snp_enable(void)
> +{
> +#ifdef CONFIG_KVM_AMD_SEV
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return;
> +	/*
> +	 * The SNP support requires that IOMMU must be enabled, and is
> +	 * not configured in the passthrough mode.
> +	 */
> +	if (no_iommu || iommu_default_passthrough()) {
> +		pr_err("SNP: IOMMU is disabled or configured in passthrough mode, SNP cannot be supported.\n");
> +		return;
> +	}
> +
> +	amd_iommu_snp_en = check_feature(FEATURE_SNP);
> +	if (!amd_iommu_snp_en) {
> +		pr_err("SNP: IOMMU SNP feature is not enabled, SNP cannot be supported.\n");
> +		return;
> +	}
> +
> +	pr_info("IOMMU SNP support is enabled.\n");
> +
> +	/* Enforce IOMMU v1 pagetable when SNP is enabled. */
> +	if (amd_iommu_pgtable != AMD_IOMMU_V1) {
> +		pr_warn("Forcing use of AMD IOMMU v1 page table due to SNP.\n");
> +		amd_iommu_pgtable = AMD_IOMMU_V1;
> +	}

Kernel code usually says simply "<bla> enabled", not "<bla> is enabled".
Other than that, LGTM.

---

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1ed2ef22a0fb..2f1517acaba0 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3231,17 +3231,17 @@ static void iommu_snp_enable(void)
 	 * not configured in the passthrough mode.
 	 */
 	if (no_iommu || iommu_default_passthrough()) {
-		pr_err("SNP: IOMMU is disabled or configured in passthrough mode, SNP cannot be supported.\n");
+		pr_err("SNP: IOMMU disabled or configured in passthrough mode, SNP cannot be supported.\n");
 		return;
 	}
 
 	amd_iommu_snp_en = check_feature(FEATURE_SNP);
 	if (!amd_iommu_snp_en) {
-		pr_err("SNP: IOMMU SNP feature is not enabled, SNP cannot be supported.\n");
+		pr_err("SNP: IOMMU SNP feature not enabled, SNP cannot be supported.\n");
 		return;
 	}
 
-	pr_info("IOMMU SNP support is enabled.\n");
+	pr_info("IOMMU SNP support enabled.\n");
 
 	/* Enforce IOMMU v1 pagetable when SNP is enabled. */
 	if (amd_iommu_pgtable != AMD_IOMMU_V1) {


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 03/26] iommu/amd: Don't rely on external callers to enable IOMMU SNP support
  2023-12-30 16:19 ` [PATCH v1 03/26] iommu/amd: Don't rely on external callers to enable IOMMU SNP support Michael Roth
  2024-01-04 10:30   ` Borislav Petkov
@ 2024-01-04 10:58   ` Joerg Roedel
  1 sibling, 0 replies; 102+ messages in thread
From: Joerg Roedel @ 2024-01-04 10:58 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, thomas.lendacky, hpa, ardb, pbonzini, seanjc, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang

On Sat, Dec 30, 2023 at 10:19:31AM -0600, Michael Roth wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Currently the expectation is that the kernel will call
> amd_iommu_snp_enable() to perform various checks and set the
> amd_iommu_snp_en flag that the IOMMU uses to adjust its setup routines
> to account for additional requirements on hosts where SNP is enabled.
> 
> This is somewhat fragile as it relies on this call being done prior to
> IOMMU setup. It is more robust to just do this automatically as part of
> IOMMU initialization, so rework the code accordingly.
> 
> There is still a need to export information about whether or not the
> IOMMU is configured in a manner compatible with SNP, so relocate the
> existing amd_iommu_snp_en flag so it can be used to convey that
> information in place of the return code that was previously provided by
> calls to amd_iommu_snp_enable().
> 
> While here, also adjust the kernel messages related to IOMMU SNP
> enablement for consistency/grammar/clarity.
> 
> Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Co-developed-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>

Acked-by: Joerg Roedel <jroedel@suse.de>

-- 
Jörg Rödel
jroedel@suse.de

SUSE Software Solutions Germany GmbH
Frankenstraße 146
90461 Nürnberg
Germany

(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2023-12-30 16:19 ` [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support Michael Roth
@ 2024-01-04 11:05   ` Jeremi Piotrowski
  2024-01-05 16:09     ` Borislav Petkov
  2024-01-04 11:16   ` Borislav Petkov
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 102+ messages in thread
From: Jeremi Piotrowski @ 2024-01-04 11:05 UTC (permalink / raw)
  To: Michael Roth, x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On 30/12/2023 17:19, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The memory integrity guarantees of SEV-SNP are enforced through a new
> structure called the Reverse Map Table (RMP). The RMP is a single data
> structure shared across the system that contains one entry for every 4K
> page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
> a number of steps needed to detect/enable SEV-SNP and RMP table support
> on the host:
> 
>  - Detect SEV-SNP support based on CPUID bit
>  - Initialize the RMP table memory reported by the RMP base/end MSR
>    registers and configure IOMMU to be compatible with RMP access
>    restrictions
>  - Set the MtrrFixDramModEn bit in SYSCFG MSR
>  - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
>  - Configure IOMMU
> 
> RMP table entry format is non-architectural and it can vary by
> processor. It is defined by the PPR. Restrict SNP support to CPU
> models/families which are compatible with the current RMP table entry
> format to guard against any undefined behavior when running on other
> system types. Future models/support will handle this through an
> architectural mechanism to allow for broader compatibility.
> 
> SNP host code depends on CONFIG_KVM_AMD_SEV config flag, which may be
> enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
> SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
> instead of CONFIG_AMD_MEM_ENCRYPT.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Co-developed-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/Kbuild                  |   2 +
>  arch/x86/include/asm/msr-index.h |  11 +-
>  arch/x86/include/asm/sev.h       |   6 +
>  arch/x86/kernel/cpu/amd.c        |  15 +++
>  arch/x86/virt/svm/Makefile       |   3 +
>  arch/x86/virt/svm/sev.c          | 219 +++++++++++++++++++++++++++++++
>  6 files changed, 255 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/virt/svm/Makefile
>  create mode 100644 arch/x86/virt/svm/sev.c
> 
> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
> index 5a83da703e87..6a1f36df6a18 100644
> --- a/arch/x86/Kbuild
> +++ b/arch/x86/Kbuild
> @@ -28,5 +28,7 @@ obj-y += net/
>  
>  obj-$(CONFIG_KEXEC_FILE) += purgatory/
>  
> +obj-y += virt/svm/
> +
>  # for cleaning
>  subdir- += boot tools
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index f1bd7b91b3c6..15ce1269f270 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -599,6 +599,8 @@
>  #define MSR_AMD64_SEV_ENABLED		BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
>  #define MSR_AMD64_SEV_ES_ENABLED	BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
>  #define MSR_AMD64_SEV_SNP_ENABLED	BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> +#define MSR_AMD64_RMP_BASE		0xc0010132
> +#define MSR_AMD64_RMP_END		0xc0010133
>  
>  /* SNP feature bits enabled by the hypervisor */
>  #define MSR_AMD64_SNP_VTOM			BIT_ULL(3)
> @@ -709,7 +711,14 @@
>  #define MSR_K8_TOP_MEM2			0xc001001d
>  #define MSR_AMD64_SYSCFG		0xc0010010
>  #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT	23
> -#define MSR_AMD64_SYSCFG_MEM_ENCRYPT	BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_MEM_ENCRYPT		BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_EN_BIT		24
> +#define MSR_AMD64_SYSCFG_SNP_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT	25
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
> +#define MSR_AMD64_SYSCFG_MFDM_BIT		19
> +#define MSR_AMD64_SYSCFG_MFDM			BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
> +
>  #define MSR_K8_INT_PENDING_MSG		0xc0010055
>  /* C1E active bits in int pending message */
>  #define K8_INTP_C1E_ACTIVE_MASK		0x18000000
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 5b4a1ce3d368..1f59d8ba9776 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -243,4 +243,10 @@ static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
>  static inline u64 sev_get_status(void) { return 0; }
>  #endif
>  
> +#ifdef CONFIG_KVM_AMD_SEV
> +bool snp_probe_rmptable_info(void);
> +#else
> +static inline bool snp_probe_rmptable_info(void) { return false; }
> +#endif
> +
>  #endif
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 9a17165dfe84..0f0d425f0440 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -20,6 +20,7 @@
>  #include <asm/delay.h>
>  #include <asm/debugreg.h>
>  #include <asm/resctrl.h>
> +#include <asm/sev.h>
>  
>  #ifdef CONFIG_X86_64
>  # include <asm/mmconfig.h>
> @@ -574,6 +575,20 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
>  		break;
>  	}
>  
> +	if (cpu_has(c, X86_FEATURE_SEV_SNP)) {
> +		/*
> +		 * RMP table entry format is not architectural and it can vary by processor
> +		 * and is defined by the per-processor PPR. Restrict SNP support on the
> +		 * known CPU model and family for which the RMP table entry format is
> +		 * currently defined for.
> +		 */
> +		if (!(c->x86 == 0x19 && c->x86_model <= 0xaf) &&
> +		    !(c->x86 == 0x1a && c->x86_model <= 0xf))
> +			setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> +		else if (!snp_probe_rmptable_info())
> +			setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);

Is there a really good reason to perform the snp_probe_rmptable_info() check at this
point (instead of in snp_rmptable_init())? snp_rmptable_init() will also clear the cap
on failure, and bsp_init_amd() runs too early to allow the kernel to allocate the
rmptable itself. I pointed out in the previous review that kernel allocation of the
rmptable is necessary in SNP-host-capable VMs in Azure.

> +	}
> +
>  	return;
>  
>  warn:
> diff --git a/arch/x86/virt/svm/Makefile b/arch/x86/virt/svm/Makefile
> new file mode 100644
> index 000000000000..ef2a31bdcc70
> --- /dev/null
> +++ b/arch/x86/virt/svm/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_KVM_AMD_SEV) += sev.o
> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> new file mode 100644
> index 000000000000..ce7ede9065ed
> --- /dev/null
> +++ b/arch/x86/virt/svm/sev.c
> @@ -0,0 +1,219 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * AMD SVM-SEV Host Support.
> + *
> + * Copyright (C) 2023 Advanced Micro Devices, Inc.
> + *
> + * Author: Ashish Kalra <ashish.kalra@amd.com>
> + *
> + */
> +
> +#include <linux/cc_platform.h>
> +#include <linux/printk.h>
> +#include <linux/mm_types.h>
> +#include <linux/set_memory.h>
> +#include <linux/memblock.h>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/cpumask.h>
> +#include <linux/iommu.h>
> +#include <linux/amd-iommu.h>
> +
> +#include <asm/sev.h>
> +#include <asm/processor.h>
> +#include <asm/setup.h>
> +#include <asm/svm.h>
> +#include <asm/smp.h>
> +#include <asm/cpu.h>
> +#include <asm/apic.h>
> +#include <asm/cpuid.h>
> +#include <asm/cmdline.h>
> +#include <asm/iommu.h>
> +
> +/*
> + * The RMP entry format is not architectural. The format is defined in PPR
> + * Family 19h Model 01h, Rev B1 processor.
> + */
> +struct rmpentry {
> +	u64	assigned	: 1,
> +		pagesize	: 1,
> +		immutable	: 1,
> +		rsvd1		: 9,
> +		gpa		: 39,
> +		asid		: 10,
> +		vmsa		: 1,
> +		validated	: 1,
> +		rsvd2		: 1;
> +	u64 rsvd3;
> +} __packed;
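As a cross-check on the layout above: the bitfields sum to exactly 64 bits, so each
RMP entry is 16 bytes, which is what the rmptable_max_pfn computation later in the
patch assumes. A standalone user-space sketch (rmp_entry_for_pfn() is an illustrative
helper, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the non-architectural RMP entry layout quoted above. */
struct rmpentry {
	uint64_t assigned	: 1,
		 pagesize	: 1,
		 immutable	: 1,
		 rsvd1		: 9,
		 gpa		: 39,
		 asid		: 10,
		 vmsa		: 1,
		 validated	: 1,
		 rsvd2		: 1;
	uint64_t rsvd3;
} __attribute__((packed));

/* Illustrative lookup: one 16-byte entry per 4K PFN. */
static inline struct rmpentry *rmp_entry_for_pfn(struct rmpentry *table, uint64_t pfn)
{
	return &table[pfn];
}
```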
> +
> +/*
> + * The first 16KB from the RMP_BASE is used by the processor for the
> + * bookkeeping, the range needs to be added during the RMP entry lookup.
> + */
> +#define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000
> +
> +static u64 probed_rmp_base, probed_rmp_size;
> +static struct rmpentry *rmptable __ro_after_init;
> +static u64 rmptable_max_pfn __ro_after_init;
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt)	"SEV-SNP: " fmt
> +
> +static int __mfd_enable(unsigned int cpu)
> +{
> +	u64 val;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return 0;
> +
> +	rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> +	val |= MSR_AMD64_SYSCFG_MFDM;
> +
> +	wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> +	return 0;
> +}
> +
> +static __init void mfd_enable(void *arg)
> +{
> +	__mfd_enable(smp_processor_id());
> +}
> +
> +static int __snp_enable(unsigned int cpu)
> +{
> +	u64 val;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return 0;
> +
> +	rdmsrl(MSR_AMD64_SYSCFG, val);
> +
> +	val |= MSR_AMD64_SYSCFG_SNP_EN;
> +	val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
> +
> +	wrmsrl(MSR_AMD64_SYSCFG, val);
> +
> +	return 0;
> +}
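For reference, the SYSCFG bits ORed in here (and the MFDM bit above) resolve to the
following masks, taken from the msr-index.h additions in this patch; a quick
standalone check:

```c
#include <assert.h>
#include <stdint.h>

#define BIT_ULL(n)			(1ULL << (n))
#define MSR_AMD64_SYSCFG_MFDM		BIT_ULL(19)
#define MSR_AMD64_SYSCFG_SNP_EN		BIT_ULL(24)
#define MSR_AMD64_SYSCFG_SNP_VMPL_EN	BIT_ULL(25)

/* The bits __snp_enable() sets in the SYSCFG MSR value. */
static inline uint64_t syscfg_after_snp_enable(uint64_t val)
{
	return val | MSR_AMD64_SYSCFG_SNP_EN | MSR_AMD64_SYSCFG_SNP_VMPL_EN;
}
```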
> +
> +static __init void snp_enable(void *arg)
> +{
> +	__snp_enable(smp_processor_id());
> +}
> +
> +#define RMP_ADDR_MASK GENMASK_ULL(51, 13)
> +
> +bool snp_probe_rmptable_info(void)
> +{
> +	u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
> +
> +	rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
> +	rdmsrl(MSR_AMD64_RMP_END, rmp_end);
> +
> +	if (!(rmp_base & RMP_ADDR_MASK) || !(rmp_end & RMP_ADDR_MASK)) {
> +		pr_err("Memory for the RMP table has not been reserved by BIOS\n");
> +		return false;
> +	}
> +
> +	if (rmp_base > rmp_end) {
> +		pr_err("RMP configuration not valid: base=%#llx, end=%#llx\n", rmp_base, rmp_end);
> +		return false;
> +	}
> +
> +	rmp_sz = rmp_end - rmp_base + 1;
> +
> +	/*
> +	 * Calculate the amount of memory that must be reserved by the BIOS to
> +	 * address the whole RAM, including the bookkeeping area. The RMP itself
> +	 * must also be covered.
> +	 */
> +	max_rmp_pfn = max_pfn;
> +	if (PHYS_PFN(rmp_end) > max_pfn)
> +		max_rmp_pfn = PHYS_PFN(rmp_end);
> +
> +	calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
> +
> +	if (calc_rmp_sz > rmp_sz) {
> +		pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
> +		       calc_rmp_sz, rmp_sz);
> +		return false;
> +	}
> +
> +	probed_rmp_base = rmp_base;
> +	probed_rmp_size = rmp_sz;
> +
> +	pr_info("RMP table physical range [0x%016llx - 0x%016llx]\n",
> +		probed_rmp_base, probed_rmp_base + probed_rmp_size - 1);
> +
> +	return true;
> +}
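The coverage check above boils down to 16 bytes of RMP per 4K page plus the 16KB
bookkeeping area, with the RMP's own pages folded in via max_rmp_pfn. A standalone
sketch of that arithmetic (hypothetical helper, not in the patch):

```c
#include <assert.h>
#include <stdint.h>

#define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000ULL

/* Minimum BIOS reservation needed to cover every 4K PFN up to max_rmp_pfn. */
static inline uint64_t rmp_required_size(uint64_t max_rmp_pfn)
{
	/* 16 bytes (1 << 4) of RMP entry per PFN, plus the bookkeeping area. */
	return (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
}
```

For example, a system with 1M pages (4GB) would need a 16MB + 16KB reservation.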
> +
> +static int __init __snp_rmptable_init(void)
> +{
> +	u64 rmptable_size;
> +	void *rmptable_start;
> +	u64 val;
> +
> +	if (!probed_rmp_size)
> +		return 1;
> +
> +	rmptable_start = memremap(probed_rmp_base, probed_rmp_size, MEMREMAP_WB);
> +	if (!rmptable_start) {
> +		pr_err("Failed to map RMP table\n");
> +		return 1;
> +	}
> +
> +	/*
> +	 * Check if SEV-SNP is already enabled, this can happen in case of
> +	 * kexec boot.
> +	 */
> +	rdmsrl(MSR_AMD64_SYSCFG, val);
> +	if (val & MSR_AMD64_SYSCFG_SNP_EN)
> +		goto skip_enable;
> +
> +	memset(rmptable_start, 0, probed_rmp_size);
> +
> +	/* Flush the caches to ensure that data is written before SNP is enabled. */
> +	wbinvd_on_all_cpus();
> +
> +	/* MtrrFixDramModEn must be enabled on all the CPUs prior to enabling SNP. */
> +	on_each_cpu(mfd_enable, NULL, 1);
> +
> +	on_each_cpu(snp_enable, NULL, 1);
> +
> +skip_enable:
> +	rmptable_start += RMPTABLE_CPU_BOOKKEEPING_SZ;
> +	rmptable_size = probed_rmp_size - RMPTABLE_CPU_BOOKKEEPING_SZ;
> +
> +	rmptable = (struct rmpentry *)rmptable_start;
> +	rmptable_max_pfn = rmptable_size / sizeof(struct rmpentry) - 1;
> +
> +	return 0;
> +}
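A quick cross-check of the max-PFN computation above: with 16-byte entries, a
reservation sized for N pages plus bookkeeping yields a maximum covered PFN of
N - 1 (standalone sketch, not in the patch):

```c
#include <assert.h>
#include <stdint.h>

#define RMPTABLE_CPU_BOOKKEEPING_SZ	0x4000ULL
#define RMP_ENTRY_SZ			16ULL	/* sizeof(struct rmpentry) */

/* Highest PFN covered by an RMP reservation of the given total size. */
static inline uint64_t rmp_covered_max_pfn(uint64_t probed_rmp_size)
{
	uint64_t entries_size = probed_rmp_size - RMPTABLE_CPU_BOOKKEEPING_SZ;

	return entries_size / RMP_ENTRY_SZ - 1;
}
```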
> +
> +static int __init snp_rmptable_init(void)
> +{
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return 0;
> +
> +	if (!amd_iommu_snp_en)
> +		return 0;

Looks better - do you think it'll be OK to add an X86_FEATURE_HYPERVISOR check at this point
later to account for SNP-host-capable VMs with no access to an IOMMU?

Jeremi

> +
> +	if (__snp_rmptable_init())
> +		goto nosnp;
> +
> +	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
> +
> +	return 0;
> +
> +nosnp:
> +	setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> +	return -ENOSYS;
> +}
> +
> +/*
> + * This must be called after the IOMMU has been initialized.
> + */
> +device_initcall(snp_rmptable_init);


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2023-12-30 16:19 ` [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support Michael Roth
  2024-01-04 11:05   ` Jeremi Piotrowski
@ 2024-01-04 11:16   ` Borislav Petkov
  2024-01-04 14:42   ` Borislav Petkov
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-04 11:16 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023 at 10:19:32AM -0600, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The memory integrity guarantees of SEV-SNP are enforced through a new
> structure called the Reverse Map Table (RMP). The RMP is a single data
> structure shared across the system that contains one entry for every 4K
> page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
> a number of steps needed to detect/enable SEV-SNP and RMP table support
> on the host:
> 
>  - Detect SEV-SNP support based on CPUID bit
>  - Initialize the RMP table memory reported by the RMP base/end MSR
>    registers and configure IOMMU to be compatible with RMP access
>    restrictions
>  - Set the MtrrFixDramModEn bit in SYSCFG MSR
>  - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
>  - Configure IOMMU
> 
> RMP table entry format is non-architectural and it can vary by
> processor. It is defined by the PPR. Restrict SNP support to CPU
> models/families which are compatible with the current RMP table entry
> format to guard against any undefined behavior when running on other
> system types. Future models/support will handle this through an
> architectural mechanism to allow for broader compatibility.
> 
> SNP host code depends on CONFIG_KVM_AMD_SEV config flag, which may be
> enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
> SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
> instead of CONFIG_AMD_MEM_ENCRYPT.

Small fixups to the commit message:

    The memory integrity guarantees of SEV-SNP are enforced through a new
    structure called the Reverse Map Table (RMP). The RMP is a single data
    structure shared across the system that contains one entry for every 4K
    page of DRAM that may be used by SEV-SNP VMs. The APM v2 section on
    Secure Nested Paging (SEV-SNP) details a number of steps needed to
    detect/enable SEV-SNP and RMP table support on the host:
    
     - Detect SEV-SNP support based on CPUID bit
     - Initialize the RMP table memory reported by the RMP base/end MSR
       registers and configure IOMMU to be compatible with RMP access
       restrictions
     - Set the MtrrFixDramModEn bit in SYSCFG MSR
     - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
     - Configure IOMMU
    
    The RMP table entry format is non-architectural and it can vary by
    processor. It is defined by the PPR document for each respective CPU
    family. Restrict SNP support to CPU models/families which are compatible
    with the current RMP table entry format to guard against any undefined
    behavior when running on other system types. Future models/support will
    handle this through an architectural mechanism to allow for broader
    compatibility.
    
    The SNP host code depends on CONFIG_KVM_AMD_SEV config flag which may
    be enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
    SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
    instead of CONFIG_AMD_MEM_ENCRYPT.

> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index f1bd7b91b3c6..15ce1269f270 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -599,6 +599,8 @@
>  #define MSR_AMD64_SEV_ENABLED		BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
>  #define MSR_AMD64_SEV_ES_ENABLED	BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
>  #define MSR_AMD64_SEV_SNP_ENABLED	BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> +#define MSR_AMD64_RMP_BASE		0xc0010132
> +#define MSR_AMD64_RMP_END		0xc0010133
>  
>  /* SNP feature bits enabled by the hypervisor */
>  #define MSR_AMD64_SNP_VTOM			BIT_ULL(3)
> @@ -709,7 +711,14 @@
>  #define MSR_K8_TOP_MEM2			0xc001001d
>  #define MSR_AMD64_SYSCFG		0xc0010010
>  #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT	23
> -#define MSR_AMD64_SYSCFG_MEM_ENCRYPT	BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_MEM_ENCRYPT		BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_EN_BIT		24
> +#define MSR_AMD64_SYSCFG_SNP_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT	25
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
> +#define MSR_AMD64_SYSCFG_MFDM_BIT		19
> +#define MSR_AMD64_SYSCFG_MFDM			BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
> +

Fix the vertical alignment:

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 15ce1269f270..f482bc6a5ae7 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -710,14 +710,14 @@
 #define MSR_K8_TOP_MEM1			0xc001001a
 #define MSR_K8_TOP_MEM2			0xc001001d
 #define MSR_AMD64_SYSCFG		0xc0010010
-#define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT	23
-#define MSR_AMD64_SYSCFG_MEM_ENCRYPT		BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
-#define MSR_AMD64_SYSCFG_SNP_EN_BIT		24
+#define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT 23
+#define MSR_AMD64_SYSCFG_MEM_ENCRYPT	BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_SNP_EN_BIT	24
 #define MSR_AMD64_SYSCFG_SNP_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
-#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT	25
-#define MSR_AMD64_SYSCFG_SNP_VMPL_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
-#define MSR_AMD64_SYSCFG_MFDM_BIT		19
-#define MSR_AMD64_SYSCFG_MFDM			BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN	BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
+#define MSR_AMD64_SYSCFG_MFDM_BIT	19
+#define MSR_AMD64_SYSCFG_MFDM		BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
 
 #define MSR_K8_INT_PENDING_MSG		0xc0010055
 /* C1E active bits in int pending message */

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2023-12-30 16:19 ` [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support Michael Roth
  2024-01-04 11:05   ` Jeremi Piotrowski
  2024-01-04 11:16   ` Borislav Petkov
@ 2024-01-04 14:42   ` Borislav Petkov
  2024-01-05 19:19   ` Borislav Petkov
  2024-01-05 21:27   ` Borislav Petkov
  4 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-04 14:42 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023 at 10:19:32AM -0600, Michael Roth wrote:
> +	if (cpu_has(c, X86_FEATURE_SEV_SNP)) {
> +		/*
> +		 * RMP table entry format is not architectural and it can vary by processor
> +		 * and is defined by the per-processor PPR. Restrict SNP support on the
> +		 * known CPU model and family for which the RMP table entry format is
> +		 * currently defined for.
> +		 */
> +		if (!(c->x86 == 0x19 && c->x86_model <= 0xaf) &&
> +		    !(c->x86 == 0x1a && c->x86_model <= 0xf))
> +			setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> +		else if (!snp_probe_rmptable_info())
> +			setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> +	}

IOW, this below.

Lemme send the ZEN5 thing as a separate patch.

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 9492dcad560d..0fa702673e73 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -81,10 +81,8 @@
 #define X86_FEATURE_K6_MTRR		( 3*32+ 1) /* AMD K6 nonstandard MTRRs */
 #define X86_FEATURE_CYRIX_ARR		( 3*32+ 2) /* Cyrix ARRs (= MTRRs) */
 #define X86_FEATURE_CENTAUR_MCR		( 3*32+ 3) /* Centaur MCRs (= MTRRs) */
-
-/* CPU types for specific tunings: */
 #define X86_FEATURE_K8			( 3*32+ 4) /* "" Opteron, Athlon64 */
-/* FREE, was #define X86_FEATURE_K7			( 3*32+ 5) "" Athlon */
+#define X86_FEATURE_ZEN5		( 3*32+ 5) /* "" CPU based on Zen5 microarchitecture */
 #define X86_FEATURE_P3			( 3*32+ 6) /* "" P3 */
 #define X86_FEATURE_P4			( 3*32+ 7) /* "" P4 */
 #define X86_FEATURE_CONSTANT_TSC	( 3*32+ 8) /* TSC ticks at a constant rate */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 0f0d425f0440..46335c2df083 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -539,7 +539,7 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
 
 	/* Figure out Zen generations: */
 	switch (c->x86) {
-	case 0x17: {
+	case 0x17:
 		switch (c->x86_model) {
 		case 0x00 ... 0x2f:
 		case 0x50 ... 0x5f:
@@ -555,8 +555,8 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
 			goto warn;
 		}
 		break;
-	}
-	case 0x19: {
+
+	case 0x19:
 		switch (c->x86_model) {
 		case 0x00 ... 0x0f:
 		case 0x20 ... 0x5f:
@@ -570,20 +570,31 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
 			goto warn;
 		}
 		break;
-	}
+
+	case 0x1a:
+		switch (c->x86_model) {
+		case 0x00 ... 0x0f:
+			setup_force_cpu_cap(X86_FEATURE_ZEN5);
+			break;
+		default:
+			goto warn;
+		}
+		break;
+
 	default:
 		break;
 	}
 
 	if (cpu_has(c, X86_FEATURE_SEV_SNP)) {
 		/*
-		 * RMP table entry format is not architectural and it can vary by processor
+		 * RMP table entry format is not architectural, can vary by processor
 		 * and is defined by the per-processor PPR. Restrict SNP support on the
 		 * known CPU model and family for which the RMP table entry format is
 		 * currently defined for.
 		 */
-		if (!(c->x86 == 0x19 && c->x86_model <= 0xaf) &&
-		    !(c->x86 == 0x1a && c->x86_model <= 0xf))
+		if (!boot_cpu_has(X86_FEATURE_ZEN3) &&
+		    !boot_cpu_has(X86_FEATURE_ZEN4) &&
+		    !boot_cpu_has(X86_FEATURE_ZEN5))
 			setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
 		else if (!snp_probe_rmptable_info())
 			setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
@@ -1055,6 +1066,11 @@ static void init_amd_zen4(struct cpuinfo_x86 *c)
 		msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
 }
 
+static void init_amd_zen5(struct cpuinfo_x86 *c)
+{
+	init_amd_zen_common();
+}
+
 static void init_amd(struct cpuinfo_x86 *c)
 {
 	u64 vm_cr;
@@ -1100,6 +1116,8 @@ static void init_amd(struct cpuinfo_x86 *c)
 		init_amd_zen3(c);
 	else if (boot_cpu_has(X86_FEATURE_ZEN4))
 		init_amd_zen4(c);
+	else if (boot_cpu_has(X86_FEATURE_ZEN5))
+		init_amd_zen5(c);
 
 	/*
 	 * Enable workaround for FXSAVE leak on CPUs


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2024-01-04 11:05   ` Jeremi Piotrowski
@ 2024-01-05 16:09     ` Borislav Petkov
  2024-01-05 16:21       ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-05 16:09 UTC (permalink / raw)
  To: Jeremi Piotrowski
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Thu, Jan 04, 2024 at 12:05:27PM +0100, Jeremi Piotrowski wrote:
> Is there a really good reason to perform the snp_probe_rmptable_info() check at this
> point (instead of in snp_rmptable_init())? snp_rmptable_init() will also clear the cap
> on failure, and bsp_init_amd() runs too early to allow the kernel to allocate the
> rmptable itself. I pointed out in the previous review that kernel allocation of the
> rmptable is necessary in SNP-host-capable VMs in Azure.

What does that even mean?

That function is doing some calculations after reading two MSRs. What
can possibly go wrong?!

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2024-01-05 16:09     ` Borislav Petkov
@ 2024-01-05 16:21       ` Borislav Petkov
  2024-01-08 16:49         ` Jeremi Piotrowski
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-05 16:21 UTC (permalink / raw)
  To: Jeremi Piotrowski
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Fri, Jan 05, 2024 at 05:09:16PM +0100, Borislav Petkov wrote:
> On Thu, Jan 04, 2024 at 12:05:27PM +0100, Jeremi Piotrowski wrote:
> > Is there a really good reason to perform the snp_probe_rmptable_info() check at this
> > point (instead of in snp_rmptable_init())? snp_rmptable_init() will also clear the cap
> > on failure, and bsp_init_amd() runs too early to allow the kernel to allocate the
> > rmptable itself. I pointed out in the previous review that kernel allocation of the
> > rmptable is necessary in SNP-host-capable VMs in Azure.
> 
> What does that even mean?
> 
> That function is doing some calculations after reading two MSRs. What
> can possibly go wrong?!

That could be one reason perhaps:

"It needs to be called early enough to allow for AutoIBRS to not be disabled
just because SNP is supported. By calling it where it is currently called, the
SNP feature can be cleared if, even though supported, SNP can't be used,
allowing AutoIBRS to be used as a more performant Spectre mitigation."

https://lore.kernel.org/r/8ec38db1-5ccf-4684-bc0d-d48579ebf0d0@amd.com

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2023-12-30 16:19 ` [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support Michael Roth
                     ` (2 preceding siblings ...)
  2024-01-04 14:42   ` Borislav Petkov
@ 2024-01-05 19:19   ` Borislav Petkov
  2024-01-05 21:27   ` Borislav Petkov
  4 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-05 19:19 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023 at 10:19:32AM -0600, Michael Roth wrote:
> +static int __init __snp_rmptable_init(void)
> +{
> +	u64 rmptable_size;
> +	void *rmptable_start;
> +	u64 val;

...

Ontop:

diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index ce7ede9065ed..566bb6f39665 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -150,6 +150,11 @@ bool snp_probe_rmptable_info(void)
 	return true;
 }
 
+/*
+ * Do the necessary preparations which are verified by the firmware as
+ * described in the SNP_INIT_EX firmware command description in the SNP
+ * firmware ABI spec.
+ */
 static int __init __snp_rmptable_init(void)
 {
 	u64 rmptable_size;

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2023-12-30 16:19 ` [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support Michael Roth
                     ` (3 preceding siblings ...)
  2024-01-05 19:19   ` Borislav Petkov
@ 2024-01-05 21:27   ` Borislav Petkov
  4 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-05 21:27 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023 at 10:19:32AM -0600, Michael Roth wrote:
> +static int __init __snp_rmptable_init(void)

I already asked a year ago:

https://lore.kernel.org/all/Y9ubi0i4Z750gdMm@zn.tnic/

why is the __ version - __snp_rmptable_init - carved out but crickets.
It simply gets ignored. :-\

So let me do it myself, diff below.

Please add to the next version:

Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>

after incorporating all the changes.

Thx.

---
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 566bb6f39665..feed65f80776 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -155,19 +155,25 @@ bool snp_probe_rmptable_info(void)
  * described in the SNP_INIT_EX firmware command description in the SNP
  * firmware ABI spec.
  */
-static int __init __snp_rmptable_init(void)
+static int __init snp_rmptable_init(void)
 {
-	u64 rmptable_size;
 	void *rmptable_start;
+	u64 rmptable_size;
 	u64 val;
 
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return 0;
+
+	if (!amd_iommu_snp_en)
+		return 0;
+
 	if (!probed_rmp_size)
-		return 1;
+		goto nosnp;
 
 	rmptable_start = memremap(probed_rmp_base, probed_rmp_size, MEMREMAP_WB);
 	if (!rmptable_start) {
 		pr_err("Failed to map RMP table\n");
-		return 1;
+		goto nosnp;
 	}
 
 	/*
@@ -195,20 +201,6 @@ static int __init __snp_rmptable_init(void)
 	rmptable = (struct rmpentry *)rmptable_start;
 	rmptable_max_pfn = rmptable_size / sizeof(struct rmpentry) - 1;
 
-	return 0;
-}
-
-static int __init snp_rmptable_init(void)
-{
-	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
-		return 0;
-
-	if (!amd_iommu_snp_en)
-		return 0;
-
-	if (__snp_rmptable_init())
-		goto nosnp;
-
 	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
 
 	return 0;

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list
  2023-12-30 16:19 ` [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list Michael Roth
@ 2024-01-08 10:45   ` Vlastimil Babka
  2024-01-09 22:19     ` Kalra, Ashish
  0 siblings, 1 reply; 102+ messages in thread
From: Vlastimil Babka @ 2024-01-08 10:45 UTC (permalink / raw)
  To: Michael Roth, x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, kirill, ak, tony.luck,
	sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
	nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang

On 12/30/23 17:19, Michael Roth wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Pages are unsafe to release back to the page allocator if they have
> been transitioned to firmware/guest state and can't be reclaimed or
> transitioned back to hypervisor/shared state. In this case, add them
> to an internal leaked-pages list to ensure that they are not freed or
> touched/accessed, which would cause fatal page faults.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: relocate to arch/x86/virt/svm/sev.c]
> Signed-off-by: Michael Roth <michael.roth@amd.com>

Hi, sorry I didn't respond in time to the last mail discussing previous
version in
https://lore.kernel.org/all/8c1fd8da-912a-a9ce-9547-107ba8a450fc@amd.com/
due to upcoming holidays.

I would rather avoid the approach of allocating container objects:
- it's allocating memory when effectively losing memory, a dangerous thing
- are all the callers and their context ok with GFP_KERNEL?
- GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
current process, it's probably not its fault the pages are leaked? Also the
charging can fail?
- given the benefit of having leaked pages on a list is basically just
debugging (i.e. crash dump or drgn inspection) this seems too heavy

I think it would be better and sufficient to use page->lru for order-0 and
head pages, and simply skip tail pages (possibly with adjusted warning
message for that case).

Vlastimil

<snip>

> +
> +void snp_leak_pages(u64 pfn, unsigned int npages)
> +{
> +	struct page *page = pfn_to_page(pfn);
> +	struct leaked_page *leak;
> +
> +	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
> +
> +	spin_lock(&snp_leaked_pages_list_lock);
> +	while (npages--) {
> +		leak = kzalloc(sizeof(*leak), GFP_KERNEL_ACCOUNT);
> +		if (!leak)
> +			goto unlock;

Should we skip the dump_rmpentry() in such a case?

> +		leak->page = page;
> +		list_add_tail(&leak->list, &snp_leaked_pages_list);
> +		dump_rmpentry(pfn);
> +		snp_nr_leaked_pages++;
> +		pfn++;
> +		page++;
> +	}
> +unlock:
> +	spin_unlock(&snp_leaked_pages_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(snp_leak_pages);


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2024-01-05 16:21       ` Borislav Petkov
@ 2024-01-08 16:49         ` Jeremi Piotrowski
  2024-01-08 17:04           ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Jeremi Piotrowski @ 2024-01-08 16:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On 05/01/2024 17:21, Borislav Petkov wrote:
> On Fri, Jan 05, 2024 at 05:09:16PM +0100, Borislav Petkov wrote:
>> On Thu, Jan 04, 2024 at 12:05:27PM +0100, Jeremi Piotrowski wrote:
>>> Is there a really good reason to perform the snp_probe_rmptable_info() check at this
>>> point (instead of in snp_rmptable_init)? snp_rmptable_init will also clear the cap
>>> on failure, and bsp_init_amd() runs too early to allow for the kernel to allocate the
>>> rmptable itself. I pointed out in the previous review that kernel allocation of the
>>> rmptable is necessary in SNP-host-capable VMs in Azure.
>>
>> What does that even mean?
>>
>> That function is doing some calculations after reading two MSRs. What
>> can possibly go wrong?!
> 

What I wrote: "allow for the kernel to allocate the rmptable". Until the kernel allocates a
rmptable, the two MSRs are not initialized in a VM. This is specific to SNP-host VMs because
they don't have access to the system-wide rmptable (or a virtualized version of it), and the
rmptable is only useful for kernel-internal tracking in this case. So we don't strictly need
one and could save the overhead, but not having one would complicate the KVM SNP code, so I'd
rather allocate one for now.
 
It makes the most sense to perform the rmptable allocation later in kernel init, after platform
detection and e820 setup. It isn't really used until device_initcall.

https://lore.kernel.org/lkml/20230213103402.1189285-2-jpiotrowski@linux.microsoft.com/
(I'll be posting updated patches soon).


> That could be one reason perhaps:
> 
> "It needs to be called early enough to allow for AutoIBRS to not be disabled
> just because SNP is supported. By calling it where it is currently called, the
> SNP feature can be cleared if, even though supported, SNP can't be used,
> allowing AutoIBRS to be used as a more performant Spectre mitigation."
> 
> https://lore.kernel.org/r/8ec38db1-5ccf-4684-bc0d-d48579ebf0d0@amd.com
> 

This logic seems twisted. Why use firmware rmptable allocation as a proxy for SEV-SNP
enablement when the BIOS provides an explicit flag to enable/disable SEV-SNP support? That
would be a better signal to use to control AutoIBRS enablement.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2024-01-08 16:49         ` Jeremi Piotrowski
@ 2024-01-08 17:04           ` Borislav Petkov
  2024-01-09 11:56             ` Jeremi Piotrowski
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-08 17:04 UTC (permalink / raw)
  To: Jeremi Piotrowski
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Mon, Jan 08, 2024 at 05:49:01PM +0100, Jeremi Piotrowski wrote:
> What I wrote: "allow for the kernel to allocate the rmptable".

What?!

"15.36.5 Hypervisor RMP Management

...

Because the RMP is initialized by the AMD-SP to prevent direct access to
the RMP, the hypervisor must use the RMPUPDATE instruction to alter the
entries of the RMP. RMPUPDATE allows the hypervisor to alter the
Guest_Physical_Address, Assigned, Page_Size, Immutable, and ASID fields
of an RMP entry."

What you want is something that you should keep far and away from the
upstream kernel.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2024-01-08 17:04           ` Borislav Petkov
@ 2024-01-09 11:56             ` Jeremi Piotrowski
  2024-01-09 12:29               ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Jeremi Piotrowski @ 2024-01-09 11:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On 08/01/2024 18:04, Borislav Petkov wrote:
> On Mon, Jan 08, 2024 at 05:49:01PM +0100, Jeremi Piotrowski wrote:
>> What I wrote: "allow for the kernel to allocate the rmptable".
> 
> What?!
> 
> "15.36.5 Hypervisor RMP Management
> 
> ...
> 
> Because the RMP is initialized by the AMD-SP to prevent direct access to
> the RMP, the hypervisor must use the RMPUPDATE instruction to alter the
> entries of the RMP. RMPUPDATE allows the hypervisor to alter the
> Guest_Physical_Address, Assigned, Page_Size, Immutable, and ASID fields
> of an RMP entry."
> 
> What you want is something that you should keep far and away from the
> upstream kernel.
>

Can we please not assume I am acting in bad faith? I am explicitly trying to
integrate nicely with AMD's KVM SNP host patches, to cover an additional use case
and get something upstreamable.

Let's separate RMP allocation from who (and how) maintains the entries.

"""
15.36.4 Initializing the RMP
...
Software must program RMP_BASE and RMP_END identically for each core in the
system and before enabling SEV-SNP globally.
"""

KVM expects UEFI to do this, Hyper-V does the allocation itself (on bare-metal).
Both are valid. Afaik it is the SNP_INIT command that hands over control of the
RMP from software to AMD-SP.

When it comes to who maintains the RMP entries and how: that is of course the AMD-SP,
with the hypervisor issuing RMPUPDATE instructions. The paragraph you cite talks about
the physical RMP and AMD-SP - not virtualized SNP (aka "SNP-host VM"/nested SNP).
AMD specified an MSR-based RMPUPDATE for us for that usecase (15.36.19 SEV-SNP
Instruction Virtualization). The RMP inside the SNP-host VM is not related to
the physical RMP and is an entirely software-based construct.

The RMP in nested SNP is only used for kernel bookkeeping, and so its allocation
is optional. KVM could do without reading the RMP directly altogether (by tracking
the assigned bit somewhere), but that would be a design change, and I'd rather see
the KVM SNP host patches merged in their current shape. Which is why the patch
I linked allocates a (shadow) RMP from the kernel.

I would very much appreciate if we would not prevent that usecase from working -
that's why I've been reviewing and testing multiple revisions of these patches
and providing feedback all along.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2024-01-09 11:56             ` Jeremi Piotrowski
@ 2024-01-09 12:29               ` Borislav Petkov
  2024-01-09 12:44                 ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-09 12:29 UTC (permalink / raw)
  To: Jeremi Piotrowski
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Tue, Jan 09, 2024 at 12:56:17PM +0100, Jeremi Piotrowski wrote:
> Can we please not assume I am acting in bad faith.

No, you're not acting in bad faith.

What you're doing, in my experience so far, is: you come with some weird
HV + guest model which has been invented somewhere, behind some closed
doors, then you come with some desire that the upstream kernel should
support it, and you're not even documenting it properly, and I'm left
asking questions all the time: what is this, what's the use case,
blabla.

Don't take this personally - I guess this is all due to NDAs,
development schedules, and whatever else and yes, I've heard it all.

But just because you want this, we're not going to jump on it and
support it unconditionally. It needs to integrate properly with the rest
of the kernel and if it doesn't, it is not going upstream. That simple.

> I am explicitly trying to integrate nicely with AMD's KVM SNP host
> patches to cover an additional usecase and get something upstreamable.

And yet I still have no clue what your use case is. I always have to go
ask behind the scenes and get some half-answers about *maybe* this is
what they support.

Looking at the patch you pointed at I see there a proper explanation of
your nested SNP stuff. Finally!

From now on, please make sure your use case is properly explained
before you come with patches.

> The RMP in nested SNP is only used for kernel bookkeeping and so its
> allocation is optional. KVM could do without reading the RMP directly
> altogether (by tracking the assigned bit somewhere) but that would be
> a design change and I'd rather see the KVM SNP host patches merged in
> their current shape. Which is why the patch I linked allocates
> a (shadow) RMP from the kernel.

At least three issues I see with that:

- the allocation can fail, so it is a lot more convenient when the
  firmware prepares it

- the RMP_BASE and RMP_END writes need to be verified to ensure they actually did
  set up the RMP range, because if they haven't, you might as well
  throw SNP security out of the window. In general, letting the kernel
  do the RMP allocation needs to be verified very very thoroughly.

- a future feature might make this more complicated

> I would very much appreciate if we would not prevent that usecase from
> working - that's why I've been reviewing and testing multiple
> revisions of these patches and providing feedback all along.

I very much appreciate the help but we need to get the main SNP host
stuff in first and then we can talk about modifications.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2024-01-09 12:29               ` Borislav Petkov
@ 2024-01-09 12:44                 ` Borislav Petkov
  2024-02-14 16:56                   ` Jeremi Piotrowski
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-09 12:44 UTC (permalink / raw)
  To: Jeremi Piotrowski
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Tue, Jan 09, 2024 at 01:29:06PM +0100, Borislav Petkov wrote:
> At least three issues I see with that:
> 
> - the allocation can fail, so it is a lot more convenient when the
>   firmware prepares it
> 
> - the RMP_BASE and RMP_END writes need to be verified to ensure they actually did
>   set up the RMP range, because if they haven't, you might as well
>   throw SNP security out of the window. In general, letting the kernel
>   do the RMP allocation needs to be verified very very thoroughly.
> 
> - a future feature might make this more complicated

- What do you do if you boot on a system which has the RMP already
  allocated in the BIOS?

- How do you detect that it is the L1 kernel that must allocate the RMP?

- Why can't you use the BIOS allocated RMP in your scenario too instead
  of the L1 kernel allocating it?

- ...

I might think of more.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list
  2024-01-08 10:45   ` Vlastimil Babka
@ 2024-01-09 22:19     ` Kalra, Ashish
  2024-01-10  8:59       ` Vlastimil Babka
  0 siblings, 1 reply; 102+ messages in thread
From: Kalra, Ashish @ 2024-01-09 22:19 UTC (permalink / raw)
  To: Vlastimil Babka, Michael Roth, x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, kirill, ak, tony.luck,
	sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
	pankaj.gupta, liam.merwick, zhi.a.wang

Hello Vlastimil,

On 1/8/2024 4:45 AM, Vlastimil Babka wrote:
> On 12/30/23 17:19, Michael Roth wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> Pages are unsafe to be released back to the page-allocator, if they
>> have been transitioned to firmware/guest state and can't be reclaimed
>> or transitioned back to hypervisor/shared state. In this case add
>> them to an internal leaked pages list to ensure that they are not freed
>> or touched/accessed to cause fatal page faults.
>>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> [mdr: relocate to arch/x86/virt/svm/sev.c]
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Hi, sorry I didn't respond in time to the last mail discussing previous
> version in
> https://lore.kernel.org/all/8c1fd8da-912a-a9ce-9547-107ba8a450fc@amd.com/
> due to upcoming holidays.
>
> I would rather avoid the approach of allocating container objects:
> - it's allocating memory when effectively losing memory, a dangerous thing
> - are all the callers and their context ok with GFP_KERNEL?
> - GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
> current process, it's probably not its fault the pages are leaked? Also the
> charging can fail?
> - given the benefit of having leaked pages on a list is basically just
> debugging (i.e. crash dump or drgn inspection) this seems too heavy
>
> I think it would be better and sufficient to use page->lru for order-0 and
> head pages, and simply skip tail pages (possibly with adjusted warning
> message for that case).
>
> Vlastimil
>
> <snip>

Considering the above thoughts, this is the updated version of
snp_leak_pages(); looking forward to any review comments/feedback you
have on it:

void snp_leak_pages(u64 pfn, unsigned int npages)
{
	struct page *page = pfn_to_page(pfn);

	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);

	spin_lock(&snp_leaked_pages_list_lock);
	while (npages--) {
		/*
		 * Reuse the page's buddy list for chaining into the leaked
		 * pages list. This page should not be on a free list currently
		 * and is also unsafe to be added to a free list.
		 */
		if (likely(!PageCompound(page)) || (PageCompound(page) &&
		    !PageTail(page) && compound_head(page) == page))
			/*
			 * Skip inserting tail pages of compound page as
			 * page->buddy_list of tail pages is not usable.
			 */
			list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
		sev_dump_rmpentry(pfn);
		snp_nr_leaked_pages++;
		pfn++;
		page++;
	}
	spin_unlock(&snp_leaked_pages_list_lock);
}

Thanks, Ashish


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list
  2024-01-09 22:19     ` Kalra, Ashish
@ 2024-01-10  8:59       ` Vlastimil Babka
  0 siblings, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2024-01-10  8:59 UTC (permalink / raw)
  To: Kalra, Ashish, Michael Roth, x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, kirill, ak, tony.luck,
	sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
	pankaj.gupta, liam.merwick, zhi.a.wang

On 1/9/24 23:19, Kalra, Ashish wrote:
> Hello Vlastimil,
> 
> On 1/8/2024 4:45 AM, Vlastimil Babka wrote:
>> On 12/30/23 17:19, Michael Roth wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>>
>>> Pages are unsafe to be released back to the page-allocator, if they
>>> have been transitioned to firmware/guest state and can't be reclaimed
>>> or transitioned back to hypervisor/shared state. In this case add
>>> them to an internal leaked pages list to ensure that they are not freed
>>> or touched/accessed to cause fatal page faults.
>>>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> [mdr: relocate to arch/x86/virt/svm/sev.c]
>>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> Hi, sorry I didn't respond in time to the last mail discussing previous
>> version in
>> https://lore.kernel.org/all/8c1fd8da-912a-a9ce-9547-107ba8a450fc@amd.com/
>> due to upcoming holidays.
>>
>> I would rather avoid the approach of allocating container objects:
>> - it's allocating memory when effectively losing memory, a dangerous thing
>> - are all the callers and their context ok with GFP_KERNEL?
>> - GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
>> current process, it's probably not its fault the pages are leaked? Also the
>> charging can fail?
>> - given the benefit of having leaked pages on a list is basically just
>> debugging (i.e. crash dump or drgn inspection) this seems too heavy
>>
>> I think it would be better and sufficient to use page->lru for order-0 and
>> head pages, and simply skip tail pages (possibly with adjusted warning
>> message for that case).
>>
>> Vlastimil
>>
>> <snip
> 
> Considering the above thoughts, this is the updated version of
> snp_leak_pages(); looking forward to any review comments/feedback you
> have on it:
> 
> void snp_leak_pages(u64 pfn, unsigned int npages)
> {
>          struct page *page = pfn_to_page(pfn);
> 
>          pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
> 
>          spin_lock(&snp_leaked_pages_list_lock);
>          while (npages--) {
>                  /*
>                   * Reuse the page's buddy list for chaining into the leaked
>                   * pages list. This page should not be on a free list currently
>                   * and is also unsafe to be added to a free list.
>                   */
>                  if (likely(!PageCompound(page)) || (PageCompound(page) &&
>                      !PageTail(page) && compound_head(page) == page))

This is unnecessarily paranoid wrt the compound_head(page) test, but OTOH
it doesn't handle the weird case where we're leaking less than a whole
compound page (if that can even happen). So I'd suggest:

while (npages) {
  if (likely(!PageCompound(page)) || (PageHead(page) && compound_nr(page) <= npages))
	list_add_tail(&page->buddy_list, ...);

  ... (no change from yours)

  npages--;
}

(or an equivalent for() loop) perhaps

>                          /*
>                           * Skip inserting tail pages of compound page as
>                           * page->buddy_list of tail pages is not usable.
>                           */
>                          list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
>                  sev_dump_rmpentry(pfn);
>                  snp_nr_leaked_pages++;
>                  pfn++;
>                  page++;
>          }
>          spin_unlock(&snp_leaked_pages_list_lock);
> }
> 
> Thanks, Ashish
> 


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries
  2023-12-30 16:19 ` [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries Michael Roth
@ 2024-01-10  9:59   ` Borislav Petkov
  2024-01-10 20:18     ` Jarkko Sakkinen
  2024-01-10 11:13   ` Borislav Petkov
  2024-01-10 15:10   ` Tom Lendacky
  2 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-10  9:59 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023 at 10:19:35AM -0600, Michael Roth wrote:
> +void snp_dump_hva_rmpentry(unsigned long hva)
> +{
> +	unsigned int level;
> +	pgd_t *pgd;
> +	pte_t *pte;
> +
> +	pgd = __va(read_cr3_pa());
> +	pgd += pgd_index(hva);
> +	pte = lookup_address_in_pgd(pgd, hva, &level);
> +
> +	if (!pte) {
> +		pr_info("Can't dump RMP entry for HVA %lx: no PTE/PFN found\n", hva);
> +		return;
> +	}
> +
> +	dump_rmpentry(pte_pfn(*pte));
> +}
> +EXPORT_SYMBOL_GPL(snp_dump_hva_rmpentry);

show_fault_oops() - the only caller of this - is builtin code and thus
doesn't need symbol exports. Symbol exports are only for module code.

---
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index a8cf33b7da71..31154f087fb0 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -339,4 +339,3 @@ void snp_dump_hva_rmpentry(unsigned long hva)
 
 	dump_rmpentry(pte_pfn(*pte));
 }
-EXPORT_SYMBOL_GPL(snp_dump_hva_rmpentry);

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries
  2023-12-30 16:19 ` [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries Michael Roth
  2024-01-10  9:59   ` Borislav Petkov
@ 2024-01-10 11:13   ` Borislav Petkov
  2024-01-10 15:20     ` Tom Lendacky
  2024-01-10 15:10   ` Tom Lendacky
  2 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-10 11:13 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023 at 10:19:35AM -0600, Michael Roth wrote:
> +	while (pfn_current < pfn_end) {
> +		e = __snp_lookup_rmpentry(pfn_current, &level);
> +		if (IS_ERR(e)) {
> +			pfn_current++;
> +			continue;
> +		}
> +
> +		e_data = (u64 *)e;
> +		if (e_data[0] || e_data[1]) {
> +			pr_info("No assigned RMP entry for PFN 0x%llx, but the 2MB region contains populated RMP entries, e.g.: PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> +				pfn, pfn_current, e_data[1], e_data[0]);
> +			return;
> +		}
> +		pfn_current++;
> +	}
> +
> +	pr_info("No populated RMP entries in the 2MB region containing PFN 0x%llx\n",
> +		pfn);
> +}

Ok, I went and reworked this, see below.

Yes, I think it is important - at least in the beginning - to dump the
whole 2M PFN region for debugging purposes. If that output starts
becoming too unwieldy and overflowing terminals or log files, we'd
shorten it or put it behind a debug option or so.

Thx.

---
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index a8cf33b7da71..259a1dd655a7 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -35,16 +35,21 @@
  * Family 19h Model 01h, Rev B1 processor.
  */
 struct rmpentry {
-	u64	assigned	: 1,
-		pagesize	: 1,
-		immutable	: 1,
-		rsvd1		: 9,
-		gpa		: 39,
-		asid		: 10,
-		vmsa		: 1,
-		validated	: 1,
-		rsvd2		: 1;
-	u64 rsvd3;
+	union {
+		struct {
+			u64	assigned	: 1,
+				pagesize	: 1,
+				immutable	: 1,
+				rsvd1		: 9,
+				gpa		: 39,
+				asid		: 10,
+				vmsa		: 1,
+				validated	: 1,
+				rsvd2		: 1;
+		};
+		u64 lo;
+	};
+	u64 hi;
 } __packed;
 
 /*
@@ -272,22 +277,20 @@ EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
  */
 static void dump_rmpentry(u64 pfn)
 {
-	u64 pfn_current, pfn_end;
+	u64 pfn_i, pfn_end;
 	struct rmpentry *e;
-	u64 *e_data;
 	int level;
 
 	e = __snp_lookup_rmpentry(pfn, &level);
 	if (IS_ERR(e)) {
-		pr_info("Failed to read RMP entry for PFN 0x%llx, error %ld\n",
-			pfn, PTR_ERR(e));
+		pr_err("Error %ld reading RMP entry for PFN 0x%llx\n",
+			PTR_ERR(e), pfn);
 		return;
 	}
 
-	e_data = (u64 *)e;
 	if (e->assigned) {
-		pr_info("RMP entry for PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
-			pfn, e_data[1], e_data[0]);
+		pr_info("PFN 0x%llx, RMP entry: [0x%016llx - 0x%016llx]\n",
+			pfn, e->lo, e->hi);
 		return;
 	}
 
@@ -299,27 +302,28 @@ static void dump_rmpentry(u64 pfn)
 	 * certain situations, such as when the PFN is being accessed via a 2MB
 	 * mapping in the host page table.
 	 */
-	pfn_current = ALIGN(pfn, PTRS_PER_PMD);
-	pfn_end = pfn_current + PTRS_PER_PMD;
+	pfn_i = ALIGN(pfn, PTRS_PER_PMD);
+	pfn_end = pfn_i + PTRS_PER_PMD;
 
-	while (pfn_current < pfn_end) {
-		e = __snp_lookup_rmpentry(pfn_current, &level);
+	pr_info("PFN 0x%llx unassigned, dumping the whole 2M PFN region: [0x%llx - 0x%llx]\n",
+		pfn, pfn_i, pfn_end);
+
+	while (pfn_i < pfn_end) {
+		e = __snp_lookup_rmpentry(pfn_i, &level);
 		if (IS_ERR(e)) {
-			pfn_current++;
+			pr_err("Error %ld reading RMP entry for PFN 0x%llx\n",
+				PTR_ERR(e), pfn_i);
+			pfn_i++;
 			continue;
 		}
 
-		e_data = (u64 *)e;
-		if (e_data[0] || e_data[1]) {
-			pr_info("No assigned RMP entry for PFN 0x%llx, but the 2MB region contains populated RMP entries, e.g.: PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
-				pfn, pfn_current, e_data[1], e_data[0]);
-			return;
-		}
-		pfn_current++;
-	}
+		if (e->lo || e->hi)
+			pr_info("PFN: 0x%llx, [0x%016llx - 0x%016llx]\n", pfn_i, e->lo, e->hi);
+		else
+			pr_info("PFN: 0x%llx ...\n", pfn_i);
 
-	pr_info("No populated RMP entries in the 2MB region containing PFN 0x%llx\n",
-		pfn);
+		pfn_i++;
+	}
 }
 
 void snp_dump_hva_rmpentry(unsigned long hva)
@@ -339,4 +343,3 @@ void snp_dump_hva_rmpentry(unsigned long hva)
 
 	dump_rmpentry(pte_pfn(*pte));
 }
-EXPORT_SYMBOL_GPL(snp_dump_hva_rmpentry);

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 20/26] crypto: ccp: Add debug support for decrypting pages
  2023-12-30 16:19 ` [PATCH v1 20/26] crypto: ccp: Add debug support for decrypting pages Michael Roth
@ 2024-01-10 14:59   ` Sean Christopherson
  2024-01-11  0:50     ` Michael Roth
  0 siblings, 1 reply; 102+ messages in thread
From: Sean Christopherson @ 2024-01-10 14:59 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> Add support to decrypt guest encrypted memory. These API interfaces can
> be used for example to dump VMCBs on SNP guest exit.

By who?  Nothing in this series, or the KVM series, ever invokes
snp_guest_dbg_decrypt_page().

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries
  2023-12-30 16:19 ` [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries Michael Roth
  2024-01-10  9:59   ` Borislav Petkov
  2024-01-10 11:13   ` Borislav Petkov
@ 2024-01-10 15:10   ` Tom Lendacky
  2 siblings, 0 replies; 102+ messages in thread
From: Tom Lendacky @ 2024-01-10 15:10 UTC (permalink / raw)
  To: Michael Roth, x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, tobin, bp, vbabka, kirill, ak, tony.luck,
	sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
	nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
	Brijesh Singh

On 12/30/23 10:19, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> This information will be useful for debugging things like page faults
> due to RMP access violations and RMPUPDATE failures.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: move helper to standalone patch, rework dump logic to reduce
>        verbosity]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>   arch/x86/include/asm/sev.h |  2 +
>   arch/x86/virt/svm/sev.c    | 77 ++++++++++++++++++++++++++++++++++++++
>   2 files changed, 79 insertions(+)
> 
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 01ce61b283a3..2c53e3de0b71 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -247,9 +247,11 @@ static inline u64 sev_get_status(void) { return 0; }
>   #ifdef CONFIG_KVM_AMD_SEV
>   bool snp_probe_rmptable_info(void);
>   int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
> +void snp_dump_hva_rmpentry(unsigned long address);
>   #else
>   static inline bool snp_probe_rmptable_info(void) { return false; }
>   static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
> +static inline void snp_dump_hva_rmpentry(unsigned long address) {}
>   #endif
>   
>   #endif
> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> index 49fdfbf4e518..7c9ced8911e9 100644
> --- a/arch/x86/virt/svm/sev.c
> +++ b/arch/x86/virt/svm/sev.c
> @@ -266,3 +266,80 @@ int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
>   	return 0;
>   }
>   EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> +
> +/*
> + * Dump the raw RMP entry for a particular PFN. These bits are documented in the
> + * PPR for a particular CPU model and provide useful information about how a
> + * particular PFN is being utilized by the kernel/firmware at the time certain
> + * unexpected events occur, such as RMP faults.
> + */
> +static void dump_rmpentry(u64 pfn)
> +{
> +	u64 pfn_current, pfn_end;
> +	struct rmpentry *e;
> +	u64 *e_data;
> +	int level;
> +
> +	e = __snp_lookup_rmpentry(pfn, &level);
> +	if (IS_ERR(e)) {
> +		pr_info("Failed to read RMP entry for PFN 0x%llx, error %ld\n",
> +			pfn, PTR_ERR(e));
> +		return;
> +	}
> +
> +	e_data = (u64 *)e;
> +	if (e->assigned) {
> +		pr_info("RMP entry for PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> +			pfn, e_data[1], e_data[0]);
> +		return;
> +	}
> +
> +	/*
> +	 * If the RMP entry for a particular PFN is not in an assigned state,
> +	 * then it is sometimes useful to get an idea of whether or not any RMP
> +	 * entries for other PFNs within the same 2MB region are assigned, since
> +	 * those too can affect the ability to access a particular PFN in
> +	 * certain situations, such as when the PFN is being accessed via a 2MB
> +	 * mapping in the host page table.
> +	 */
> +	pfn_current = ALIGN(pfn, PTRS_PER_PMD);
> +	pfn_end = pfn_current + PTRS_PER_PMD;
> +
> +	while (pfn_current < pfn_end) {
> +		e = __snp_lookup_rmpentry(pfn_current, &level);
> +		if (IS_ERR(e)) {
> +			pfn_current++;
> +			continue;
> +		}
> +
> +		e_data = (u64 *)e;
> +		if (e_data[0] || e_data[1]) {
> +			pr_info("No assigned RMP entry for PFN 0x%llx, but the 2MB region contains populated RMP entries, e.g.: PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> +				pfn, pfn_current, e_data[1], e_data[0]);
> +			return;
> +		}
> +		pfn_current++;
> +	}
> +
> +	pr_info("No populated RMP entries in the 2MB region containing PFN 0x%llx\n",
> +		pfn);
> +}
> +
> +void snp_dump_hva_rmpentry(unsigned long hva)
> +{
> +	unsigned int level;
> +	pgd_t *pgd;
> +	pte_t *pte;
> +
> +	pgd = __va(read_cr3_pa());
> +	pgd += pgd_index(hva);
> +	pte = lookup_address_in_pgd(pgd, hva, &level);
> +
> +	if (!pte) {
> +		pr_info("Can't dump RMP entry for HVA %lx: no PTE/PFN found\n", hva);
> +		return;
> +	}
> +
> +	dump_rmpentry(pte_pfn(*pte));

Already worked with Mike offline when I was running into issues using this 
function. Net of that conversation is that the PFN needs to be adjusted 
using the address offset if the PTE level indicates a huge page.

Additionally the loop in dump_rmpentry() needs to use ALIGN_DOWN() in 
order to get the PFN of the starting 2MB area.

Thanks,
Tom


> +}
> +EXPORT_SYMBOL_GPL(snp_dump_hva_rmpentry);

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries
  2024-01-10 11:13   ` Borislav Petkov
@ 2024-01-10 15:20     ` Tom Lendacky
  2024-01-10 15:27       ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Tom Lendacky @ 2024-01-10 15:20 UTC (permalink / raw)
  To: Borislav Petkov, Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, tobin, vbabka, kirill, ak, tony.luck,
	sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
	nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
	Brijesh Singh

On 1/10/24 05:13, Borislav Petkov wrote:
> On Sat, Dec 30, 2023 at 10:19:35AM -0600, Michael Roth wrote:
>> +	while (pfn_current < pfn_end) {
>> +		e = __snp_lookup_rmpentry(pfn_current, &level);
>> +		if (IS_ERR(e)) {
>> +			pfn_current++;
>> +			continue;
>> +		}
>> +
>> +		e_data = (u64 *)e;
>> +		if (e_data[0] || e_data[1]) {
>> +			pr_info("No assigned RMP entry for PFN 0x%llx, but the 2MB region contains populated RMP entries, e.g.: PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
>> +				pfn, pfn_current, e_data[1], e_data[0]);
>> +			return;
>> +		}
>> +		pfn_current++;
>> +	}
>> +
>> +	pr_info("No populated RMP entries in the 2MB region containing PFN 0x%llx\n",
>> +		pfn);
>> +}
> 
> Ok, I went and reworked this, see below.
> 
> Yes, I think it is important - at least in the beginning - to dump the
> whole 2M PFN region for debugging purposes. If that output starts
> becoming too unwieldy and overflowing terminals or log files, we'd
> shorten it or put it behind a debug option or so.
> 
> Thx.
> 
> ---
> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> index a8cf33b7da71..259a1dd655a7 100644
> --- a/arch/x86/virt/svm/sev.c
> +++ b/arch/x86/virt/svm/sev.c

> +	pr_info("PFN 0x%llx unassigned, dumping the whole 2M PFN region: [0x%llx - 0x%llx]\n",
> +		pfn, pfn_i, pfn_end);

How about saying "... dumping all non-zero entries in the whole ..."

and then removing the print below that prints the PFN and "..."

> +
> +	while (pfn_i < pfn_end) {
> +		e = __snp_lookup_rmpentry(pfn_i, &level);
>   		if (IS_ERR(e)) {
> -			pfn_current++;
> +			pr_err("Error %ld reading RMP entry for PFN 0x%llx\n",
> +				PTR_ERR(e), pfn_i);
> +			pfn_i++;
>   			continue;
>   		}
>   
> -		e_data = (u64 *)e;
> -		if (e_data[0] || e_data[1]) {
> -			pr_info("No assigned RMP entry for PFN 0x%llx, but the 2MB region contains populated RMP entries, e.g.: PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> -				pfn, pfn_current, e_data[1], e_data[0]);
> -			return;
> -		}
> -		pfn_current++;
> -	}
> +		if (e->lo || e->hi)
> +			pr_info("PFN: 0x%llx, [0x%016llx - 0x%016llx]\n", pfn_i, e->lo, e->hi);
> +		else
> +			pr_info("PFN: 0x%llx ...\n", pfn_i);

Remove this one.

That should cut down on excess output since you are really only concerned 
with non-zero RMP entries when the input PFN RMP entry is not assigned.

Thanks,
Tom

>   
> -	pr_info("No populated RMP entries in the 2MB region containing PFN 0x%llx\n",
> -		pfn);
> +		pfn_i++;
> +	}
>   }
>   
>   void snp_dump_hva_rmpentry(unsigned long hva)
> @@ -339,4 +343,3 @@ void snp_dump_hva_rmpentry(unsigned long hva)
>   
>   	dump_rmpentry(pte_pfn(*pte));
>   }
> -EXPORT_SYMBOL_GPL(snp_dump_hva_rmpentry);
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries
  2024-01-10 15:20     ` Tom Lendacky
@ 2024-01-10 15:27       ` Borislav Petkov
  2024-01-10 15:51         ` Tom Lendacky
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-10 15:27 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Wed, Jan 10, 2024 at 09:20:44AM -0600, Tom Lendacky wrote:
> How about saying "... dumping all non-zero entries in the whole ..."

I'm trying not to have long stories in printk statements :)

> and then removing the print below that prints the PFN and "..."

Why remove the print? You want to print every non-null RMP entry in the
2M range, no?

And the "..." says that it is a null entry.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries
  2024-01-10 15:27       ` Borislav Petkov
@ 2024-01-10 15:51         ` Tom Lendacky
  2024-01-10 15:55           ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Tom Lendacky @ 2024-01-10 15:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh



On 1/10/24 09:27, Borislav Petkov wrote:
> On Wed, Jan 10, 2024 at 09:20:44AM -0600, Tom Lendacky wrote:
>> How about saying "... dumping all non-zero entries in the whole ..."
> 
> I'm trying not to have long stories in printk statements :)

Well it only adds "non-zero"

> 
>> and then removing the print below that prints the PFN and "..."
> 
> Why remove the print? You want to print every non-null RMP entry in the
> 2M range, no?

I'm only suggesting getting rid of the else that prints "..." when the 
entry is all zeroes. Printing the non-zero entries would still occur.

Thanks,
Tom

> 
> And the "..." says that it is a null entry.
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries
  2024-01-10 15:51         ` Tom Lendacky
@ 2024-01-10 15:55           ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-10 15:55 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Wed, Jan 10, 2024 at 09:51:04AM -0600, Tom Lendacky wrote:
> I'm only suggesting getting rid of the else that prints "..." when the entry
> is all zeroes. Printing the non-zero entries would still occur.

Sure, one should be able to infer that the missing entries are null.

:-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries
  2024-01-10  9:59   ` Borislav Petkov
@ 2024-01-10 20:18     ` Jarkko Sakkinen
  2024-01-10 22:14       ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Jarkko Sakkinen @ 2024-01-10 20:18 UTC (permalink / raw)
  To: Borislav Petkov, Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, ashish.kalra,
	nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
	Brijesh Singh

On Wed Jan 10, 2024 at 11:59 AM EET, Borislav Petkov wrote:
> On Sat, Dec 30, 2023 at 10:19:35AM -0600, Michael Roth wrote:
> > +void snp_dump_hva_rmpentry(unsigned long hva)
> > +{
> > +	unsigned int level;
> > +	pgd_t *pgd;
> > +	pte_t *pte;
> > +
> > +	pgd = __va(read_cr3_pa());
> > +	pgd += pgd_index(hva);
> > +	pte = lookup_address_in_pgd(pgd, hva, &level);
> > +
> > +	if (!pte) {
> > +		pr_info("Can't dump RMP entry for HVA %lx: no PTE/PFN found\n", hva);
                ~~~~~~~
		is this correct log level?

BR, Jarkko

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries
  2024-01-10 20:18     ` Jarkko Sakkinen
@ 2024-01-10 22:14       ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-10 22:14 UTC (permalink / raw)
  To: Jarkko Sakkinen, Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, ashish.kalra,
	nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
	Brijesh Singh

On Wed, Jan 10, 2024 at 10:18:37PM +0200, Jarkko Sakkinen wrote:
> > > +	if (!pte) {
> > > +		pr_info("Can't dump RMP entry for HVA %lx: no PTE/PFN found\n", hva);
>                 ~~~~~~~
> 		is this correct log level?

No, and I caught a couple of those already but missed this one, thanks.

Mike, please make sure all your error prints are pr_err.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 20/26] crypto: ccp: Add debug support for decrypting pages
  2024-01-10 14:59   ` Sean Christopherson
@ 2024-01-11  0:50     ` Michael Roth
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2024-01-11  0:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
	jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Wed, Jan 10, 2024 at 06:59:56AM -0800, Sean Christopherson wrote:
> On Sat, Dec 30, 2023, Michael Roth wrote:
> > From: Brijesh Singh <brijesh.singh@amd.com>
> > 
> > Add support to decrypt guest encrypted memory. These API interfaces can
> > be used for example to dump VMCBs on SNP guest exit.
> 
> By who?  Nothing in this series, or the KVM series, ever invokes
> snp_guest_dbg_decrypt_page().

There's a VMSA dump patch we've been using internally, but yah, this should
be dropped from the series until that patch is actually submitted upstream.

-Mike

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 10/26] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  2023-12-30 16:19 ` [PATCH v1 10/26] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
@ 2024-01-12 14:49   ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-12 14:49 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023 at 10:19:38AM -0600, Michael Roth wrote:
> +static int rmpupdate(u64 pfn, struct rmp_state *state)
> +{
> +	unsigned long paddr = pfn << PAGE_SHIFT;
> +	int ret;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return -ENODEV;
> +

Let's document this deal here and leave the door open for potential
future improvements:

---

diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index c9156ce47c77..1fbf9843c163 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -374,6 +374,21 @@ static int rmpupdate(u64 pfn, struct rmp_state *state)
 	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
 		return -ENODEV;
 
+	/*
+	 * It is expected that those operations are seldom enough so
+	 * that no mutual exclusion of updaters is needed and thus the
+	 * overlap error condition below should happen very seldomly and
+	 * would get resolved relatively quickly by the firmware.
+	 *
+	 * If not, one could consider introducing a mutex or so here to
+	 * sync concurrent RMP updates and thus diminish the amount of
+	 * cases where firmware needs to lock 2M ranges to protect
+	 * against concurrent updates.
+	 *
+	 * The optimal solution would be range locking to avoid locking
+	 * disjoint regions unnecessarily but there's no support for
+	 * that yet.
+	 */
 	do {
 		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
 		asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"

> +	do {
> +		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> +		asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> +			     : "=a" (ret)
> +			     : "a" (paddr), "c" ((unsigned long)state)
> +			     : "memory", "cc");
> +	} while (ret == RMPUPDATE_FAIL_OVERLAP);
> +
> +	if (ret) {
> +		pr_err("RMPUPDATE failed for PFN %llx, ret: %d\n", pfn, ret);
> +		dump_rmpentry(pfn);
> +		dump_stack();
> +		return -EFAULT;
> +	}
> +
> +	return 0;
> +}


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2023-12-30 16:19 ` [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
@ 2024-01-12 19:48   ` Borislav Petkov
  2024-01-12 20:00   ` Dave Hansen
  2024-01-15  9:01   ` Borislav Petkov
  2 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-12 19:48 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023 at 10:19:39AM -0600, Michael Roth wrote:
>  static int rmpupdate(u64 pfn, struct rmp_state *state)
>  {
>  	unsigned long paddr = pfn << PAGE_SHIFT;
> -	int ret;
> +	int ret, level, npages;
>  
>  	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>  		return -ENODEV;
>  
> +	level = RMP_TO_PG_LEVEL(state->pagesize);
> +	npages = page_level_size(level) / PAGE_SIZE;
> +
> +	/*
> +	 * If the kernel uses a 2MB directmap mapping to write to an address,
> +	 * and that 2MB range happens to contain a 4KB page that set to private
> +	 * in the RMP table, an RMP #PF will trigger and cause a host crash.
> +	 *
> +	 * Prevent this by removing pages from the directmap prior to setting
> +	 * them as private in the RMP table.
> +	 */
> +	if (state->assigned && invalidate_direct_map(pfn, npages))
> +		return -EFAULT;

Well, the comment says one thing but the code does not do quite what the
text says:

* where is the check that pfn is part of the kernel direct map?

* where is the check that this address is part of a 2M PMD in the direct
map so that the situation is even given that you can have a 4K private
PTE there?

What this does is simply invalidate @npages unconditionally.

Then, __set_pages_np() already takes a number of pages and a start
address. Why is this thing iterating instead of sending *all* npages in
one go?

Yes, you'd need two new helpers, something like:

set_pages_present(struct page *page, int numpages)
set_pages_non_present(struct page *page, int numpages)

which call the lowlevel counterparts.

For some reason I remember asking this a while ago...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2023-12-30 16:19 ` [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
  2024-01-12 19:48   ` Borislav Petkov
@ 2024-01-12 20:00   ` Dave Hansen
  2024-01-12 20:07     ` Borislav Petkov
  2024-01-15  9:09     ` Borislav Petkov
  2024-01-15  9:01   ` Borislav Petkov
  2 siblings, 2 replies; 102+ messages in thread
From: Dave Hansen @ 2024-01-12 20:00 UTC (permalink / raw)
  To: Michael Roth, x86
  Cc: kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, bp, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On 12/30/23 08:19, Michael Roth wrote:
> If the kernel uses a 2MB directmap mapping to write to an address, and
> that 2MB range happens to contain a 4KB page that set to private in the
> RMP table, that will also lead to a page-fault exception.

I thought we agreed long ago to just demote the whole direct map to 4k
on kernels that might need to act as SEV-SNP hosts.  That should be step
one and this can be discussed as an optimization later.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-12 20:00   ` Dave Hansen
@ 2024-01-12 20:07     ` Borislav Petkov
  2024-01-12 20:27       ` Vlastimil Babka
  2024-01-12 20:28       ` Tom Lendacky
  2024-01-15  9:09     ` Borislav Petkov
  1 sibling, 2 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-12 20:07 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Fri, Jan 12, 2024 at 12:00:01PM -0800, Dave Hansen wrote:
> On 12/30/23 08:19, Michael Roth wrote:
> > If the kernel uses a 2MB directmap mapping to write to an address, and
> > that 2MB range happens to contain a 4KB page that set to private in the
> > RMP table, that will also lead to a page-fault exception.
> 
> I thought we agreed long ago to just demote the whole direct map to 4k
> on kernels that might need to act as SEV-SNP hosts.  That should be step
> one and this can be discussed as an optimization later.

What would be the disadvantage here? Higher TLB pressure when running
kernel code I guess...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-12 20:07     ` Borislav Petkov
@ 2024-01-12 20:27       ` Vlastimil Babka
  2024-01-15  9:06         ` Borislav Petkov
  2024-01-12 20:28       ` Tom Lendacky
  1 sibling, 1 reply; 102+ messages in thread
From: Vlastimil Babka @ 2024-01-12 20:27 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On 1/12/24 21:07, Borislav Petkov wrote:
> On Fri, Jan 12, 2024 at 12:00:01PM -0800, Dave Hansen wrote:
>> On 12/30/23 08:19, Michael Roth wrote:
>> > If the kernel uses a 2MB directmap mapping to write to an address, and
>> > that 2MB range happens to contain a 4KB page that set to private in the
>> > RMP table, that will also lead to a page-fault exception.
>> 
>> I thought we agreed long ago to just demote the whole direct map to 4k
>> on kernels that might need to act as SEV-SNP hosts.  That should be step
>> one and this can be discussed as an optimization later.
> 
> What would be the disadvantage here? Higher TLB pressure when running
> kernel code I guess...

Yeah, and at last LSF/MM we concluded that it's not as big a disadvantage as we
previously thought https://lwn.net/Articles/931406/

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-12 20:07     ` Borislav Petkov
  2024-01-12 20:27       ` Vlastimil Babka
@ 2024-01-12 20:28       ` Tom Lendacky
  2024-01-12 20:37         ` Dave Hansen
  1 sibling, 1 reply; 102+ messages in thread
From: Tom Lendacky @ 2024-01-12 20:28 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh



On 1/12/24 14:07, Borislav Petkov wrote:
> On Fri, Jan 12, 2024 at 12:00:01PM -0800, Dave Hansen wrote:
>> On 12/30/23 08:19, Michael Roth wrote:
>>> If the kernel uses a 2MB directmap mapping to write to an address, and
>>> that 2MB range happens to contain a 4KB page that set to private in the
>>> RMP table, that will also lead to a page-fault exception.

I thought there was also a desire to remove the direct map for any pages 
assigned to a guest as private, not just the case that the comment says. 
So updating the comment would probably be the best action.

>>
>> I thought we agreed long ago to just demote the whole direct map to 4k
>> on kernels that might need to act as SEV-SNP hosts.  That should be step
>> one and this can be discussed as an optimization later.

Won't this accomplish that without actually demoting a lot of long-lived 
kernel-related mappings that would never be demoted? I don't think we need 
to demote the whole mapping to 4K.

Thanks,
Tom

> 
> What would be the disadvantage here? Higher TLB pressure when running
> kernel code I guess...
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-12 20:28       ` Tom Lendacky
@ 2024-01-12 20:37         ` Dave Hansen
  2024-01-15  9:23           ` Vlastimil Babka
  2024-01-16 16:19           ` Michael Roth
  0 siblings, 2 replies; 102+ messages in thread
From: Dave Hansen @ 2024-01-12 20:37 UTC (permalink / raw)
  To: Tom Lendacky, Borislav Petkov
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On 1/12/24 12:28, Tom Lendacky wrote:
> I thought there was also a desire to remove the direct map for any pages
> assigned to a guest as private, not just the case that the comment says.
> So updating the comment would probably the best action.

I'm not sure who desires that.

It's sloooooooow to remove things from the direct map.  There's almost
certainly a frequency cutoff where running the whole direct mapping as
4k is better than the cost of mapping/unmapping.

Actually, where _is_ the TLB flushing here?

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2023-12-30 16:19 ` [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
  2024-01-12 19:48   ` Borislav Petkov
  2024-01-12 20:00   ` Dave Hansen
@ 2024-01-15  9:01   ` Borislav Petkov
  2 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-15  9:01 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023 at 10:19:39AM -0600, Michael Roth wrote:
> +	/*
> +	 * If the kernel uses a 2MB directmap mapping to write to an address,
> +	 * and that 2MB range happens to contain a 4KB page that set to private
> +	 * in the RMP table, an RMP #PF will trigger and cause a host crash.

Also:

diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 7d294d1a620b..2ad83e7fb2da 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -415,8 +415,9 @@ static int rmpupdate(u64 pfn, struct rmp_state *state)
 
 	/*
 	 * If the kernel uses a 2MB directmap mapping to write to an address,
-	 * and that 2MB range happens to contain a 4KB page that set to private
-	 * in the RMP table, an RMP #PF will trigger and cause a host crash.
+	 * and that 2MB range happens to contain a 4KB page that has been set
+	 * to private in the RMP table, an RMP #PF will trigger and cause a
+	 * host crash.
 	 *
 	 * Prevent this by removing pages from the directmap prior to setting
 	 * them as private in the RMP table.
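The hazard the comment describes is plain alignment arithmetic: a single 4K
pfn marked private in the RMP taints the entire 2MB directmap mapping that
covers it. A minimal userspace sketch of that arithmetic (helper names here
are illustrative, not the kernel's):

```c
#include <stdint.h>

#define PAGE_SHIFT   12
#define PMD_SHIFT    21
#define PFNS_PER_PMD (1ULL << (PMD_SHIFT - PAGE_SHIFT)) /* 512 4K pages per 2MB */

/* First pfn of the 2MB directmap mapping covering @pfn. */
static uint64_t pmd_first_pfn(uint64_t pfn)
{
	return pfn & ~(PFNS_PER_PMD - 1);
}

/*
 * If @private_pfn is guest-owned in the RMP table, a host write to
 * @write_pfn through a 2MB directmap mapping can trigger an RMP #PF
 * whenever both pfns fall under the same 2MB mapping -- which is why
 * that mapping must be split or removed before the RMPUPDATE.
 */
static int pfn_taints_2m_mapping(uint64_t private_pfn, uint64_t write_pfn)
{
	return pmd_first_pfn(private_pfn) == pmd_first_pfn(write_pfn);
}
```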

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-12 20:27       ` Vlastimil Babka
@ 2024-01-15  9:06         ` Borislav Petkov
  2024-01-15  9:14           ` Vlastimil Babka
  2024-01-15  9:16           ` Mike Rapoport
  0 siblings, 2 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-15  9:06 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Dave Hansen, Michael Roth, x86, kvm, linux-coco, linux-mm,
	linux-crypto, linux-kernel, tglx, mingo, jroedel,
	thomas.lendacky, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, tobin, kirill, ak, tony.luck,
	sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
	nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
	Brijesh Singh

On Fri, Jan 12, 2024 at 09:27:45PM +0100, Vlastimil Babka wrote:
> Yeah and last LSF/MM we concluded that it's not as a big disadvantage as we
> previously thought https://lwn.net/Articles/931406/

How nice, thanks for that!

Do you have some refs to Mike's tests so that we could run them here too
with SNP guests to see how big - if any - the fragmentation impact is.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-12 20:00   ` Dave Hansen
  2024-01-12 20:07     ` Borislav Petkov
@ 2024-01-15  9:09     ` Borislav Petkov
  2024-01-16 16:21       ` Dave Hansen
  1 sibling, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-15  9:09 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Fri, Jan 12, 2024 at 12:00:01PM -0800, Dave Hansen wrote:
> I thought we agreed long ago to just demote the whole direct map to 4k
> on kernels that might need to act as SEV-SNP hosts.  That should be step
> one and this can be discussed as an optimization later.

Do we have a link to that agreement somewhere?

I'd like to read why we agreed. And looking at Mike's talk:
https://lwn.net/Articles/931406/ - what do we wanna do in general with
the direct map granularity?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-15  9:06         ` Borislav Petkov
@ 2024-01-15  9:14           ` Vlastimil Babka
  2024-01-15  9:16           ` Mike Rapoport
  1 sibling, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2024-01-15  9:14 UTC (permalink / raw)
  To: Borislav Petkov, Mike Rapoport
  Cc: Dave Hansen, Michael Roth, x86, kvm, linux-coco, linux-mm,
	linux-crypto, linux-kernel, tglx, mingo, jroedel,
	thomas.lendacky, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, tobin, kirill, ak, tony.luck,
	sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
	nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
	Brijesh Singh

On 1/15/24 10:06, Borislav Petkov wrote:
> On Fri, Jan 12, 2024 at 09:27:45PM +0100, Vlastimil Babka wrote:
>> Yeah and last LSF/MM we concluded that it's not as a big disadvantage as we
>> previously thought https://lwn.net/Articles/931406/
> 
> How nice, thanks for that!
> 
> Do you have some refs to Mike's tests so that we could run them here too
> with SNP guests to see how big - if any - the fragmentation impact is.

Let me Cc him...

> Thx.
> 


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-15  9:06         ` Borislav Petkov
  2024-01-15  9:14           ` Vlastimil Babka
@ 2024-01-15  9:16           ` Mike Rapoport
  2024-01-15  9:20             ` Borislav Petkov
  1 sibling, 1 reply; 102+ messages in thread
From: Mike Rapoport @ 2024-01-15  9:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Vlastimil Babka, Dave Hansen, Michael Roth, x86, kvm, linux-coco,
	linux-mm, linux-crypto, linux-kernel, tglx, mingo, jroedel,
	thomas.lendacky, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, tobin, kirill, ak, tony.luck,
	sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
	nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
	Brijesh Singh

On Mon, Jan 15, 2024 at 10:06:39AM +0100, Borislav Petkov wrote:
> On Fri, Jan 12, 2024 at 09:27:45PM +0100, Vlastimil Babka wrote:
> > Yeah and last LSF/MM we concluded that it's not as a big disadvantage as we
> > previously thought https://lwn.net/Articles/931406/
> 
> How nice, thanks for that!
> 
> Do you have some refs to Mike's tests so that we could run them here too
> > with SNP guests to see how big - if any - the fragmentation impact is.

I ran several tests from mmtests suite.

The tests results are here:
https://rppt.io/gfp-unmapped-bench/
 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-15  9:16           ` Mike Rapoport
@ 2024-01-15  9:20             ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-15  9:20 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Vlastimil Babka, Dave Hansen, Michael Roth, x86, kvm, linux-coco,
	linux-mm, linux-crypto, linux-kernel, tglx, mingo, jroedel,
	thomas.lendacky, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, tobin, kirill, ak, tony.luck,
	sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
	nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
	Brijesh Singh

On Mon, Jan 15, 2024 at 11:16:19AM +0200, Mike Rapoport wrote:
> I ran several tests from mmtests suite.
> 
> The tests results are here:
> https://rppt.io/gfp-unmapped-bench/

Cool, thanks Mike!

:-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-12 20:37         ` Dave Hansen
@ 2024-01-15  9:23           ` Vlastimil Babka
  2024-01-16 16:19           ` Michael Roth
  1 sibling, 0 replies; 102+ messages in thread
From: Vlastimil Babka @ 2024-01-15  9:23 UTC (permalink / raw)
  To: Dave Hansen, Tom Lendacky, Borislav Petkov
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, kirill, ak, tony.luck,
	sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
	nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
	Brijesh Singh

On 1/12/24 21:37, Dave Hansen wrote:
> On 1/12/24 12:28, Tom Lendacky wrote:
>> I thought there was also a desire to remove the direct map for any pages
>> assigned to a guest as private, not just the case that the comment says.
>> So updating the comment would probably be the best action.
> 
> I'm not sure who desires that.
> 
> It's sloooooooow to remove things from the direct map.  There's almost
> certainly a frequency cutoff where running the whole direct mapping as
> 4k is better than the cost of mapping/unmapping.
> 
> Actually, where _is_ the TLB flushing here?

Hm yeah it seems to be using the _noflush version? Maybe the RMP issues this
avoids are only triggered with actual page tables and a stray outdated TLB
hit doesn't trigger it? Needs documenting though if that's the case.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 12/26] crypto: ccp: Define the SEV-SNP commands
  2023-12-30 16:19 ` [PATCH v1 12/26] crypto: ccp: Define the SEV-SNP commands Michael Roth
@ 2024-01-15  9:41   ` Borislav Petkov
  2024-01-26  1:56     ` Michael Roth
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-15  9:41 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023 at 10:19:40AM -0600, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
> Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
> while adding new hardware security protection.
> 
> Define the commands and structures used to communicate with the AMD-SP
> when creating and managing the SEV-SNP guests. The SEV-SNP firmware spec
> is available at developer.amd.com/sev.
> 
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> [mdr: update SNP command list and SNP status struct based on current
>       spec, use C99 flexible arrays, fix kernel-doc issues]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  drivers/crypto/ccp/sev-dev.c |  16 +++
>  include/linux/psp-sev.h      | 264 +++++++++++++++++++++++++++++++++++
>  include/uapi/linux/psp-sev.h |  56 ++++++++
>  3 files changed, 336 insertions(+)

More ignored feedback:

https://lore.kernel.org/r/20231124143630.GKZWC07hjqxkf60ni4@fat_crate.local

Lemme send it to you as a diff then - it'll work then perhaps.

diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 983d314b5ff5..1a76b5297f03 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -104,7 +104,7 @@ enum sev_cmd {
 	SEV_CMD_SNP_PAGE_RECLAIM	= 0x0C7,
 	SEV_CMD_SNP_PAGE_UNSMASH	= 0x0C8,
 	SEV_CMD_SNP_CONFIG		= 0x0C9,
-	SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX	= 0x0CA,
+	SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX = 0x0CA,
 	SEV_CMD_SNP_COMMIT		= 0x0CB,
 	SEV_CMD_SNP_VLEK_LOAD		= 0x0CD,
 
@@ -624,7 +624,8 @@ enum {
  * @gctx_paddr: system physical address of guest context page
  * @page_size: page size 0 indicates 4K and 1 indicates 2MB page
  * @page_type: encoded page type
- * @imi_page: indicates that this page is part of the IMI of the guest
+ * @imi_page: indicates that this page is part of the IMI (Incoming
+ * Migration Image) of the guest
  * @rsvd: reserved
  * @rsvd2: reserved
  * @address: system physical address of destination page to encrypt

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 13/26] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
  2023-12-30 16:19 ` [PATCH v1 13/26] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
@ 2024-01-15 11:19   ` Borislav Petkov
  2024-01-15 19:53   ` Borislav Petkov
  1 sibling, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-15 11:19 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh, Jarkko Sakkinen

On Sat, Dec 30, 2023 at 10:19:41AM -0600, Michael Roth wrote:
> +	/*
> +	 * The following sequence must be issued before launching the
> +	 * first SNP guest to ensure all dirty cache lines are flushed,
> +	 * including from updates to the RMP table itself via RMPUPDATE
> +	 * instructions:
> +	 *
> +	 * - WBINDV on all running CPUs
> +	 * - SEV_CMD_SNP_INIT[_EX] firmware command
> +	 * - WBINDV on all running CPUs
> +	 * - SEV_CMD_SNP_DF_FLUSH firmware command
> +	 */

Typos:

---
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 85634d4f8cfe..ce0c56006900 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -604,9 +604,9 @@ static int __sev_snp_init_locked(int *error)
 	 * including from updates to the RMP table itself via RMPUPDATE
 	 * instructions:
 	 *
-	 * - WBINDV on all running CPUs
+	 * - WBINVD on all running CPUs
 	 * - SEV_CMD_SNP_INIT[_EX] firmware command
-	 * - WBINDV on all running CPUs
+	 * - WBINVD on all running CPUs
 	 * - SEV_CMD_SNP_DF_FLUSH firmware command
 	 */
 	wbinvd_on_all_cpus();

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 13/26] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
  2023-12-30 16:19 ` [PATCH v1 13/26] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
  2024-01-15 11:19   ` Borislav Petkov
@ 2024-01-15 19:53   ` Borislav Petkov
  2024-01-26  2:48     ` Michael Roth
  1 sibling, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-15 19:53 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh, Jarkko Sakkinen

On Sat, Dec 30, 2023 at 10:19:41AM -0600, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> Before SNP VMs can be launched, the platform must be appropriately
> configured and initialized. Platform initialization is accomplished via
> the SNP_INIT command. Make sure to do a WBINVD and issue DF_FLUSH
> command to prepare for the first SNP guest launch after INIT.
							  ^^^^^^
Which "INIT"?

Sounds like after the hypervisor's init...

> During the execution of SNP_INIT command, the firmware configures
> and enables SNP security policy enforcement in many system components.
> Some system components write to regions of memory reserved by early
> x86 firmware (e.g. UEFI). Other system components write to regions
> provided by the operation system, hypervisor, or x86 firmware.
> Such system components can only write to HV-fixed pages or Default
> pages. They will error when attempting to write to other page states

"... to pages in other page states... "

> after SNP_INIT enables their SNP enforcement.

And yes, this version looks much better. Some text cleanups ontop:

---
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 85634d4f8cfe..7942ec730525 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -549,24 +549,22 @@ static int __sev_snp_init_locked(int *error)
 		return 0;
 	}
 
-	/*
-	 * The SNP_INIT requires the MSR_VM_HSAVE_PA must be set to 0h
-	 * across all cores.
-	 */
+	/* SNP_INIT requires MSR_VM_HSAVE_PA to be cleared on all CPUs. */
 	on_each_cpu(snp_set_hsave_pa, NULL, 1);
 
 	/*
-	 * Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
-	 * system physical address ranges to convert into the HV-fixed page states
-	 * during the RMP initialization.  For instance, the memory that UEFI
-	 * reserves should be included in the range list. This allows system
+	 * Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list
+	 * of system physical address ranges to convert into HV-fixed page
+	 * states during the RMP initialization.  For instance, the memory that
+	 * UEFI reserves should be included in that list. This allows system
 	 * components that occasionally write to memory (e.g. logging to UEFI
-	 * reserved regions) to not fail due to RMP initialization and SNP enablement.
+	 * reserved regions) to not fail due to RMP initialization and SNP
+	 * enablement.
 	 */
 	if (sev_version_greater_or_equal(SNP_MIN_API_MAJOR, 52)) {
 		/*
 		 * Firmware checks that the pages containing the ranges enumerated
-		 * in the RANGES structure are either in the Default page state or in the
+		 * in the RANGES structure are either in the default page state or in the
 		 * firmware page state.
 		 */
 		snp_range_list = kzalloc(PAGE_SIZE, GFP_KERNEL);
@@ -577,7 +575,7 @@ static int __sev_snp_init_locked(int *error)
 		}
 
 		/*
-		 * Retrieve all reserved memory regions setup by UEFI from the e820 memory map
+		 * Retrieve all reserved memory regions from the e820 memory map
 		 * to be setup as HV-fixed pages.
 		 */
 		rc = walk_iomem_res_desc(IORES_DESC_NONE, IORESOURCE_MEM, 0, ~0,
@@ -599,14 +597,13 @@ static int __sev_snp_init_locked(int *error)
 	}
 
 	/*
-	 * The following sequence must be issued before launching the
-	 * first SNP guest to ensure all dirty cache lines are flushed,
-	 * including from updates to the RMP table itself via RMPUPDATE
-	 * instructions:
+	 * The following sequence must be issued before launching the first SNP
+	 * guest to ensure all dirty cache lines are flushed, including from
+	 * updates to the RMP table itself via the RMPUPDATE instruction:
 	 *
-	 * - WBINDV on all running CPUs
+	 * - WBINVD on all running CPUs
 	 * - SEV_CMD_SNP_INIT[_EX] firmware command
-	 * - WBINDV on all running CPUs
+	 * - WBINVD on all running CPUs
 	 * - SEV_CMD_SNP_DF_FLUSH firmware command
 	 */
 	wbinvd_on_all_cpus();



-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-12 20:37         ` Dave Hansen
  2024-01-15  9:23           ` Vlastimil Babka
@ 2024-01-16 16:19           ` Michael Roth
  2024-01-16 16:50             ` Michael Roth
                               ` (2 more replies)
  1 sibling, 3 replies; 102+ messages in thread
From: Michael Roth @ 2024-01-16 16:19 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Tom Lendacky, Borislav Petkov, x86, kvm, linux-coco, linux-mm,
	linux-crypto, linux-kernel, tglx, mingo, jroedel, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh, rppt

On Fri, Jan 12, 2024 at 12:37:31PM -0800, Dave Hansen wrote:
> On 1/12/24 12:28, Tom Lendacky wrote:
> > I thought there was also a desire to remove the direct map for any pages
> > assigned to a guest as private, not just the case that the comment says.
> > So updating the comment would probably be the best action.
> 
> I'm not sure who desires that.
> 
> It's sloooooooow to remove things from the direct map.  There's almost
> certainly a frequency cutoff where running the whole direct mapping as
> 4k is better than the cost of mapping/unmapping.

One area we'd been looking to optimize[1] is lots of vCPUs/guests
faulting in private memory on-demand via kernels with lazy acceptance
support. The lazy acceptance / #NPF path for each 4K/2M guest page
involves updating the corresponding RMP table entry to set it to
Guest-owned state, and as part of that we remove the PFNs from the
directmap. There is indeed potential for scalability issues due to how
directmap updates are currently handled, since they involve holding a
global cpa_lock for every update.

At the time, there were investigations on how to remove the cpa_lock,
and there's an RFC patch[2] that implements this change, but I don't
know where this stands today.

There was also some work[3] on being able to restore 2MB entries in the
directmap (currently entries are only re-added as 4K) that Mike Rapoport
pointed out to me a while back. With that, since the bulk of private
guest pages are 2MB, we'd be able to avoid splitting the directmap to 4K over
time.

There was also some indication[4] that UPM/guest_memfd would eventually
manage directmap invalidations internally, so it made sense to have
SNP continue to handle this up until the point that management was
moved to gmem.

Those 3 things, paired with a platform-independent way of catching
unexpected kernel accesses to private guest memory, are what I think
nudged us all toward the current implementation.

But AFAIK none of these 3 things are being actively upstreamed today,
so it makes sense to re-consider how we handle the directmap in the
interim.

I did some performance tests which do seem to indicate that
pre-splitting the directmap to 4K can substantially improve certain
SNP guest workloads. This test involves running a single 1TB SNP guest
with 128 vCPUs running "stress --vm 128 --vm-bytes 5G --vm-keep" to
rapidly fault in all of its memory via lazy acceptance, and then
measuring the rate that gmem pages are being allocated on the host by
monitoring "FileHugePages" from /proc/meminfo to get some rough gauge
of how quickly a guest can fault in its initial working set prior to
reaching steady state. The data is a bit noisy but seems to indicate
significant improvement by taking the directmap updates out of the
lazy acceptance path, and I would only expect that to become more
significant as you scale up the number of guests / vCPUs.

  # Average fault-in rate across 3 runs, measured in GB/s
                    unpinned | pinned to NUMA node 0
  DirectMap4K           12.9 | 12.1
             stddev      2.2 |  1.3
  DirectMap2M+split      8.0 |  8.9
             stddev      1.3 |  0.8
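For reference, a rough sketch of the measurement loop described above:
sample FileHugePages from /proc/meminfo and convert the delta between two
samples into a fault-in rate. Only the field name comes from the
description; the helpers are illustrative:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the FileHugePages value in kB from a /proc/meminfo snapshot,
 * or -1 if the field is missing.
 */
static long filehugepages_kb(const char *meminfo)
{
	const char *p = strstr(meminfo, "FileHugePages:");

	if (!p)
		return -1;
	return strtol(p + strlen("FileHugePages:"), NULL, 10);
}

/* Fault-in rate in GB/s given two samples taken @interval_ms apart. */
static double fault_in_rate_gbps(long kb_before, long kb_after, long interval_ms)
{
	double gb = (double)(kb_after - kb_before) / (1024.0 * 1024.0);

	return gb / ((double)interval_ms / 1000.0);
}
```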

The downside of course is potential impact for non-SNP workloads
resulting from splitting the directmap. Mike Rapoport's numbers make
me feel a little better about it, but I don't think they apply directly
to the notion of splitting the entire directmap. Even the LWN article
summarizes:

  "The conclusion from all of this, Rapoport continued, was that
  direct-map fragmentation just does not matter — for data access, at
  least. Using huge-page mappings does still appear to make a difference
  for memory containing the kernel code, so allocator changes should
  focus on code allocations — improving the layout of allocations for
  loadable modules, for example, or allowing vmalloc() to allocate huge
  pages for code. But, for kernel-data allocations, direct-map
  fragmentation simply appears to not be worth worrying about."

So at the very least, if we went down this path, it would be worth
investigating the following areas in addition to general perf testing:

  1) Only splitting directmap regions corresponding to kernel-allocatable
     *data* (hopefully that's even feasible...)
  2) Potentially deferring the split until an SNP guest is actually
     run, so there isn't any impact just from having SNP enabled (though
     you still take a hit from RMP checks in that case so maybe it's not
     worthwhile, but that itself has been noted as a concern for users
     so it would be nice to not make things even worse).

[1] https://lore.kernel.org/linux-mm/20231103000105.m3z4eijcxlxciyzd@amd.com/
[2] https://lore.kernel.org/lkml/Y7f9ZuPcIMk37KnN@gmail.com/T/#m15b74841f5319c0d1177f118470e9714d4ea96c8
[3] https://lore.kernel.org/linux-kernel/20200416213229.19174-1-kirill.shutemov@linux.intel.com/
[4] https://lore.kernel.org/all/YyGLXXkFCmxBfu5U@google.com/

> 
> Actually, where _is_ the TLB flushing here?

Boris pointed that out in v6, and we implemented it in v7, but it
completely cratered performance:

   https://lore.kernel.org/linux-mm/20221219150026.bltiyk72pmdc2ic3@amd.com/

After further discussion I think we'd concluded it wasn't necessary. Maybe
that's worth revisiting though. If it is necessary, then that would be
another reason to just pre-split the directmap because the above-mentioned
lazy acceptance workload/bottleneck would likely get substantially worse.

-Mike

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-15  9:09     ` Borislav Petkov
@ 2024-01-16 16:21       ` Dave Hansen
  2024-01-17  9:34         ` Borislav Petkov
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Hansen @ 2024-01-16 16:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On 1/15/24 01:09, Borislav Petkov wrote:
> On Fri, Jan 12, 2024 at 12:00:01PM -0800, Dave Hansen wrote:
>> I thought we agreed long ago to just demote the whole direct map to 4k
>> on kernels that might need to act as SEV-SNP hosts.  That should be step
>> one and this can be discussed as an optimization later.
> 
> Do we have a link to that agreement somewhere?

I seem to be remembering it wrong.  I have some memory of disabling 2M
mappings in the same spots that debug_pagealloc_enabled() does, but I
must have hallucinated it.

> https://lore.kernel.org/lkml/535400b4-0593-a7ca-1548-532ee1fefbd7@intel.com/

The original approach tried to fix up the page faults by doing the
demotion there and didn't seem to work with hugetlbfs at all.

> I'd like to read why we agreed. And looking at Mike's talk:
> https://lwn.net/Articles/931406/ - what do we wanna do in general with
> the direct map granularity?

I think fracturing it is much less of a problem than we once thought it
was.  There are _definitely_ people that will pay the cost for added
security.

The problem will be the first time someone sees a regression when their
direct-map-fracturing kernel feature (secret pages, SNP host, etc...)
collides with some workload that's sensitive to kernel TLB pressure.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-16 16:19           ` Michael Roth
@ 2024-01-16 16:50             ` Michael Roth
  2024-01-16 20:12               ` Mike Rapoport
  2024-01-16 18:22             ` Borislav Petkov
  2024-01-16 20:22             ` Dave Hansen
  2 siblings, 1 reply; 102+ messages in thread
From: Michael Roth @ 2024-01-16 16:50 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Tom Lendacky, Borislav Petkov, x86, kvm, linux-coco, linux-mm,
	linux-crypto, linux-kernel, tglx, mingo, jroedel, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh, rppt

On Tue, Jan 16, 2024 at 10:19:09AM -0600, Michael Roth wrote:
> I did some performance tests which do seem to indicate that
> pre-splitting the directmap to 4K can substantially improve certain
> SNP guest workloads. This test involves running a single 1TB SNP guest
> with 128 vCPUs running "stress --vm 128 --vm-bytes 5G --vm-keep" to
> rapidly fault in all of its memory via lazy acceptance, and then
> measuring the rate that gmem pages are being allocated on the host by
> monitoring "FileHugePages" from /proc/meminfo to get some rough gauge
> of how quickly a guest can fault in its initial working set prior to
> reaching steady state. The data is a bit noisy but seems to indicate
> significant improvement by taking the directmap updates out of the
> lazy acceptance path, and I would only expect that to become more
> significant as you scale up the number of guests / vCPUs.
> 
>   # Average fault-in rate across 3 runs, measured in GB/s
>                     unpinned | pinned to NUMA node 0
>   DirectMap4K           12.9 | 12.1
>              stddev      2.2 |  1.3
>   DirectMap2M+split      8.0 |  8.9
>              stddev      1.3 |  0.8
> 
> The downside of course is potential impact for non-SNP workloads
> resulting from splitting the directmap. Mike Rapoport's numbers make
> me feel a little better about it, but I don't think they apply directly
> to the notion of splitting the entire directmap. Even the LWN article
> summarizes:
> 
>   "The conclusion from all of this, Rapoport continued, was that
>   direct-map fragmentation just does not matter — for data access, at
>   least. Using huge-page mappings does still appear to make a difference
>   for memory containing the kernel code, so allocator changes should
>   focus on code allocations — improving the layout of allocations for
>   loadable modules, for example, or allowing vmalloc() to allocate huge
>   pages for code. But, for kernel-data allocations, direct-map
>   fragmentation simply appears to not be worth worrying about."
> 
> So at the very least, if we went down this path, it would be worth
> investigating the following areas in addition to general perf testing:
> 
>   1) Only splitting directmap regions corresponding to kernel-allocatable
>      *data* (hopefully that's even feasible...)
>   2) Potentially deferring the split until an SNP guest is actually
>      run, so there isn't any impact just from having SNP enabled (though
>      you still take a hit from RMP checks in that case so maybe it's not
>      worthwhile, but that itself has been noted as a concern for users
>      so it would be nice to not make things even worse).

There's another potential area of investigation I forgot to mention that
doesn't involve pre-splitting the directmap. It makes use of the fact
that the kernel should never be accessing a 2MB mapping that overlaps with
private guest memory if the backing PFN for the guest memory is a 2MB page.
Since there's no chance for overlap (well, maybe via a 1GB directmap entry,
but forcing those down to 2M is a less dramatic change), there's no need to
actually split the directmap entry in these cases since they won't
result in unexpected RMP faults.

So if pre-splitting the directmap ends up having too many downsides, then
there may still be some potential for optimizing the current approach to a
fair degree.
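A sketch of that check (names and the @pfn/@npages calling convention are
hypothetical): if the private range being passed to RMPUPDATE is itself a
whole, 2MB-aligned page, it exactly covers the corresponding 2MB directmap
mapping, so no shared 4K page can live under the same mapping and no split
is required.

```c
#include <stdint.h>

#define PAGE_SHIFT   12
#define PMD_SHIFT    21
#define PFNS_PER_PMD (1ULL << (PMD_SHIFT - PAGE_SHIFT)) /* 512 4K pages per 2MB */

/*
 * A whole, 2MB-aligned private range fully covers its 2MB directmap
 * mapping: the kernel can never legitimately write through that mapping
 * to a shared page, so the directmap entry can stay at 2MB.
 */
static int can_keep_2m_directmap(uint64_t pfn, uint64_t npages)
{
	return !(pfn & (PFNS_PER_PMD - 1)) && npages == PFNS_PER_PMD;
}
```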

-Mike

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-16 16:19           ` Michael Roth
  2024-01-16 16:50             ` Michael Roth
@ 2024-01-16 18:22             ` Borislav Petkov
  2024-01-16 20:22             ` Dave Hansen
  2 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-16 18:22 UTC (permalink / raw)
  To: Michael Roth
  Cc: Dave Hansen, Tom Lendacky, x86, kvm, linux-coco, linux-mm,
	linux-crypto, linux-kernel, tglx, mingo, jroedel, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh, rppt

On Tue, Jan 16, 2024 at 10:19:09AM -0600, Michael Roth wrote:
> So at the very least, if we went down this path, it would be worth
> investigating the following areas in addition to general perf testing:
> 
>   1) Only splitting directmap regions corresponding to kernel-allocatable
>      *data* (hopefully that's even feasible...)
>   2) Potentially deferring the split until an SNP guest is actually
>      run, so there isn't any impact just from having SNP enabled (though
>      you still take a hit from RMP checks in that case so maybe it's not
>      worthwhile, but that itself has been noted as a concern for users
>      so it would be nice to not make things even worse).

So the gist of this whole explanation of why we end up doing what we
eventually do should be in the commit message, so that it is clear
*why* we did it.

> After further discussion I think we'd concluded it wasn't necessary. Maybe
> that's worth revisiting though. If it is necessary, then that would be
> another reason to just pre-split the directmap because the above-mentioned
> lazy acceptance workload/bottleneck would likely get substantially worse.

The reason for that should also be in the commit message.

And to answer:

https://lore.kernel.org/linux-mm/20221219150026.bltiyk72pmdc2ic3@amd.com/

yes, you should add a @npages variant.

See if you could use/extend this, for example:

https://lore.kernel.org/r/20240116022008.1023398-3-mhklinux@outlook.com

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-16 16:50             ` Michael Roth
@ 2024-01-16 20:12               ` Mike Rapoport
  2024-01-26  1:49                 ` Michael Roth
  0 siblings, 1 reply; 102+ messages in thread
From: Mike Rapoport @ 2024-01-16 20:12 UTC (permalink / raw)
  To: Michael Roth
  Cc: Dave Hansen, Tom Lendacky, Borislav Petkov, x86, kvm, linux-coco,
	linux-mm, linux-crypto, linux-kernel, tglx, mingo, jroedel, hpa,
	ardb, pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen,
	slp, pgonda, peterz, srinivas.pandruvada, rientjes, tobin,
	vbabka, kirill, ak, tony.luck, sathyanarayanan.kuppuswamy,
	alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Tue, Jan 16, 2024 at 10:50:25AM -0600, Michael Roth wrote:
> On Tue, Jan 16, 2024 at 10:19:09AM -0600, Michael Roth wrote:
> > 
> > The downside of course is potential impact for non-SNP workloads
> > resulting from splitting the directmap. Mike Rapoport's numbers make
> > me feel a little better about it, but I don't think they apply directly
> > to the notion of splitting the entire directmap. Even the LWN article
> > summarizes:

When I ran the tests, I had the entire direct map forced to 4k or 2M pages.
There is indeed some impact, and some tests suffer more than others, but
there were also runs that benefited from the smaller page size in the
direct map, at least if I remember correctly the results the Intel folks
posted a while ago.
 
> >   "The conclusion from all of this, Rapoport continued, was that
> >   direct-map fragmentation just does not matter — for data access, at
> >   least. Using huge-page mappings does still appear to make a difference
> >   for memory containing the kernel code, so allocator changes should
> >   focus on code allocations — improving the layout of allocations for
> >   loadable modules, for example, or allowing vmalloc() to allocate huge
> >   pages for code. But, for kernel-data allocations, direct-map
> >   fragmentation simply appears to not be worth worrying about."
> > 
> > So at the very least, if we went down this path, it would be worth
> > investigating the following areas in addition to general perf testing:
> > 
> >   1) Only splitting directmap regions corresponding to kernel-allocatable
> >      *data* (hopefully that's even feasible...)

Forcing the direct map to 4k pages does not affect the kernel text
mappings; they are created separately and are not part of the kernel's
mapping of physical memory.
Well, except for modules, but those are mapped with 4k pages anyway.

> -Mike

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-16 16:19           ` Michael Roth
  2024-01-16 16:50             ` Michael Roth
  2024-01-16 18:22             ` Borislav Petkov
@ 2024-01-16 20:22             ` Dave Hansen
  2024-01-26  1:35               ` Michael Roth
  2 siblings, 1 reply; 102+ messages in thread
From: Dave Hansen @ 2024-01-16 20:22 UTC (permalink / raw)
  To: Michael Roth
  Cc: Tom Lendacky, Borislav Petkov, x86, kvm, linux-coco, linux-mm,
	linux-crypto, linux-kernel, tglx, mingo, jroedel, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh, rppt

On 1/16/24 08:19, Michael Roth wrote:
> 
> So at the very least, if we went down this path, it would be worth
> investigating the following areas in addition to general perf testing:
> 
>   1) Only splitting directmap regions corresponding to kernel-allocatable
>      *data* (hopefully that's even feasible...)

Take a look at the 64-bit memory map in here:

	https://www.kernel.org/doc/Documentation/x86/x86_64/mm.rst

We already have separate mappings for kernel data and (normal) kernel text.

>   2) Potentially deferring the split until an SNP guest is actually
>      run, so there isn't any impact just from having SNP enabled (though
>      you still take a hit from RMP checks in that case so maybe it's not
>      worthwhile, but that itself has been noted as a concern for users
>      so it would be nice to not make things even worse).

Yes, this would be nice too.

>> Actually, where _is_ the TLB flushing here?
> Boris pointed that out in v6, and we implemented it in v7, but it
> completely cratered performance:

That *desperately* needs to be documented.

How can it be safe to skip the TLB flush?  Is this akin to a page
permission promotion where you go from RO->RW but can skip the TLB
flush?  In that case, the CPU will see the RO TLB entry violation, drop
it, and re-walk the page tables, discovering the RW entry.

Does something similar happen here where the CPU sees the 2M/4k conflict
in the TLB, drops the 2M entry, does a re-walk then picks up the
newly-split 2M->4k entries?

I can see how something like that would work, but it's _awfully_ subtle
to go unmentioned.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-16 16:21       ` Dave Hansen
@ 2024-01-17  9:34         ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-17  9:34 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick

On Tue, Jan 16, 2024 at 08:21:09AM -0800, Dave Hansen wrote:
> The problem will be the first time someone sees a regression when their
> direct-map-fracturing kernel feature (secret pages, SNP host, etc...)
> collides with some workload that's sensitive to kernel TLB pressure.

Yeah, and that "workload" will be some microbenchmark crap which people
would pay too much attention to, without it having any relevance to
real-world scenarios. Completely hypothetically speaking, ofc.

:-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 14/26] crypto: ccp: Provide API to issue SEV and SNP commands
  2023-12-30 16:19 ` [PATCH v1 14/26] crypto: ccp: Provide API to issue SEV and SNP commands Michael Roth
@ 2024-01-17  9:48   ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-17  9:48 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh

On Sat, Dec 30, 2023 at 10:19:42AM -0600, Michael Roth wrote:
> +/**
> + * sev_do_cmd - issue an SEV or an SEV-SNP command
> + *
> + * @cmd: SEV or SEV-SNP firmware command to issue
> + * @data: arguments for firmware command
> + * @psp_ret: SEV command return code
> + *
> + * Returns:
> + * 0 if the SEV successfully processed the command

More forgotten feedback:

---
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 0581f194cdd0..a356a7b7408e 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -922,7 +922,7 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
  * @psp_ret: SEV command return code
  *
  * Returns:
- * 0 if the SEV successfully processed the command
+ * 0 if the SEV device successfully processed the command
  * -%ENODEV    if the PSP device is not available
  * -%ENOTSUPP  if PSP device does not support SEV
  * -%ETIMEDOUT if the SEV command timed out

---

Also, pls add it to your TODO list as a very low prio item to fixup all
this complains about:

./scripts/kernel-doc -none include/linux/psp-sev.h

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 17/26] crypto: ccp: Handle non-volatile INIT_EX data when SNP is enabled
  2023-12-30 16:19 ` [PATCH v1 17/26] crypto: ccp: Handle non-volatile INIT_EX data " Michael Roth
@ 2024-01-18 14:03   ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-18 14:03 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick

On Sat, Dec 30, 2023 at 10:19:45AM -0600, Michael Roth wrote:
>  drivers/crypto/ccp/sev-dev.c | 104 ++++++++++++++++++++++++++---------
>  1 file changed, 79 insertions(+), 25 deletions(-)

Some minor cleanups ontop:

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index dfe7f7afc411..a72ed4466d7b 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -266,16 +266,15 @@ static int sev_read_init_ex_file(void)
 }
 
 /*
- * When SNP is enabled, the pages comprising the buffer used to populate
- * the file specified by the init_ex_path module parameter needs to be set
- * to firmware-owned, which removes the mapping from the kernel direct
- * mapping since generally the hypervisor does not access firmware-owned
- * pages. However, in this case the hypervisor does need to read the
- * buffer to transfer the contents to the file at init_ex_path, so this
- * function is used to create a temporary virtual mapping to be used for
- * this purpose.
+ * When SNP is enabled, the pages comprising the buffer used to populate the
+ * file specified by the init_ex_path module parameter needs to be set to
+ * firmware-owned. This removes the mapping from the kernel direct mapping since
+ * generally the hypervisor does not access firmware-owned pages. However, in
+ * this case the hypervisor does need to read the buffer to transfer the
+ * contents to the file at init_ex_path, so create a temporary virtual mapping
+ * to be used for this purpose.
  */
-static void *vmap_sev_init_ex_buffer(void)
+static void *vmap_init_ex_buf(void)
 {
 	struct page *pages[NV_PAGES];
 	unsigned long base_pfn;
@@ -292,6 +291,11 @@ static void *vmap_sev_init_ex_buffer(void)
 	return vmap(pages, NV_PAGES, VM_MAP, PAGE_KERNEL_RO);
 }
 
+static void destroy_init_ex_buf(void *buf)
+{
+	vunmap(buf);
+}
+
 static int sev_write_init_ex_file(void)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -315,7 +319,7 @@ static int sev_write_init_ex_file(void)
 		return ret;
 	}
 
-	sev_init_ex_buffer = vmap_sev_init_ex_buffer();
+	sev_init_ex_buffer = vmap_init_ex_buf();
 	if (!sev_init_ex_buffer) {
 		dev_err(sev->dev, "SEV: failed to map non-volative memory area\n");
 		return -EIO;
@@ -329,12 +333,12 @@ static int sev_write_init_ex_file(void)
 		dev_err(sev->dev,
 			"SEV: failed to write %u bytes to non volatile memory area, ret %ld\n",
 			NV_LENGTH, nwrite);
-		vunmap(sev_init_ex_buffer);
+		destroy_init_ex_buf(sev_init_ex_buffer);
 		return -EIO;
 	}
 
 	dev_dbg(sev->dev, "SEV: write successful to NV file\n");
-	vunmap(sev_init_ex_buffer);
+	destroy_init_ex_buf(sev_init_ex_buffer);
 
 	return 0;
 }

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 18/26] crypto: ccp: Handle legacy SEV commands when SNP is enabled
  2023-12-30 16:19 ` [PATCH v1 18/26] crypto: ccp: Handle legacy SEV commands " Michael Roth
@ 2024-01-19 17:18   ` Borislav Petkov
  2024-01-19 17:36     ` Tom Lendacky
  2024-01-26 13:29     ` Michael Roth
  0 siblings, 2 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-19 17:18 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick

On Sat, Dec 30, 2023 at 10:19:46AM -0600, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> The behavior of legacy SEV commands is altered when the firmware is
> initialized for SNP support. In that case, all command buffer memory
> that may get written to by legacy SEV commands must be marked as
> firmware-owned in the RMP table prior to issuing the command.
> 
> Additionally, when a command buffer contains a system physical address
> that points to additional buffers that firmware may write to, special
> handling is needed depending on whether:
> 
>   1) the system physical address points to guest memory
>   2) the system physical address points to host memory
> 
> To handle case #1, the pages of these buffers are changed to
> firmware-owned in the RMP table before issuing the command, and restored
> after the command completes.
> 
> For case #2, a bounce buffer is used instead of the original address.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Co-developed-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  drivers/crypto/ccp/sev-dev.c | 421 ++++++++++++++++++++++++++++++++++-
>  drivers/crypto/ccp/sev-dev.h |   3 +
>  2 files changed, 414 insertions(+), 10 deletions(-)

Definitely better, thanks.

Some cleanups ontop:

---

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 8cfb376ca2e7..7681c094c7ff 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -514,18 +514,21 @@ static void *sev_fw_alloc(unsigned long len)
  * struct cmd_buf_desc - descriptors for managing legacy SEV command address
  * parameters corresponding to buffers that may be written to by firmware.
  *
- * @paddr_ptr: pointer the address parameter in the command buffer, which may
- *	need to be saved/restored depending on whether a bounce buffer is used.
- *	Must be NULL if this descriptor is only an end-of-list indicator.
+ * @paddr: address which may need to be saved/restored depending on whether
+ * a bounce buffer is used. Must be NULL if this descriptor is only an
+ * end-of-list indicator.
+ *
  * @paddr_orig: storage for the original address parameter, which can be used to
- *	restore the original value in @paddr_ptr in cases where it is replaced
- *	with the address of a bounce buffer.
- * @len: length of buffer located at the address originally stored at @paddr_ptr
+ * restore the original value in @paddr in cases where it is replaced with
+ * the address of a bounce buffer.
+ *
+ * @len: length of buffer located at the address originally stored at @paddr
+ *
  * @guest_owned: true if the address corresponds to guest-owned pages, in which
- *	case bounce buffers are not needed.
+ * case bounce buffers are not needed.
  */
 struct cmd_buf_desc {
-	u64 *paddr_ptr;
+	u64 paddr;
 	u64 paddr_orig;
 	u32 len;
 	bool guest_owned;
@@ -549,30 +552,30 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
 	case SEV_CMD_PDH_CERT_EXPORT: {
 		struct sev_data_pdh_cert_export *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->pdh_cert_address;
+		desc_list[0].paddr = data->pdh_cert_address;
 		desc_list[0].len = data->pdh_cert_len;
-		desc_list[1].paddr_ptr = &data->cert_chain_address;
+		desc_list[1].paddr = data->cert_chain_address;
 		desc_list[1].len = data->cert_chain_len;
 		break;
 	}
 	case SEV_CMD_GET_ID: {
 		struct sev_data_get_id *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->address;
+		desc_list[0].paddr = data->address;
 		desc_list[0].len = data->len;
 		break;
 	}
 	case SEV_CMD_PEK_CSR: {
 		struct sev_data_pek_csr *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->address;
+		desc_list[0].paddr = data->address;
 		desc_list[0].len = data->len;
 		break;
 	}
 	case SEV_CMD_LAUNCH_UPDATE_DATA: {
 		struct sev_data_launch_update_data *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->address;
+		desc_list[0].paddr = data->address;
 		desc_list[0].len = data->len;
 		desc_list[0].guest_owned = true;
 		break;
@@ -580,7 +583,7 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
 	case SEV_CMD_LAUNCH_UPDATE_VMSA: {
 		struct sev_data_launch_update_vmsa *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->address;
+		desc_list[0].paddr = data->address;
 		desc_list[0].len = data->len;
 		desc_list[0].guest_owned = true;
 		break;
@@ -588,14 +591,14 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
 	case SEV_CMD_LAUNCH_MEASURE: {
 		struct sev_data_launch_measure *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->address;
+		desc_list[0].paddr = data->address;
 		desc_list[0].len = data->len;
 		break;
 	}
 	case SEV_CMD_LAUNCH_UPDATE_SECRET: {
 		struct sev_data_launch_secret *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->guest_address;
+		desc_list[0].paddr = data->guest_address;
 		desc_list[0].len = data->guest_len;
 		desc_list[0].guest_owned = true;
 		break;
@@ -603,7 +606,7 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
 	case SEV_CMD_DBG_DECRYPT: {
 		struct sev_data_dbg *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->dst_addr;
+		desc_list[0].paddr = data->dst_addr;
 		desc_list[0].len = data->len;
 		desc_list[0].guest_owned = true;
 		break;
@@ -611,7 +614,7 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
 	case SEV_CMD_DBG_ENCRYPT: {
 		struct sev_data_dbg *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->dst_addr;
+		desc_list[0].paddr = data->dst_addr;
 		desc_list[0].len = data->len;
 		desc_list[0].guest_owned = true;
 		break;
@@ -619,39 +622,39 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
 	case SEV_CMD_ATTESTATION_REPORT: {
 		struct sev_data_attestation_report *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->address;
+		desc_list[0].paddr = data->address;
 		desc_list[0].len = data->len;
 		break;
 	}
 	case SEV_CMD_SEND_START: {
 		struct sev_data_send_start *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->session_address;
+		desc_list[0].paddr = data->session_address;
 		desc_list[0].len = data->session_len;
 		break;
 	}
 	case SEV_CMD_SEND_UPDATE_DATA: {
 		struct sev_data_send_update_data *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->hdr_address;
+		desc_list[0].paddr = data->hdr_address;
 		desc_list[0].len = data->hdr_len;
-		desc_list[1].paddr_ptr = &data->trans_address;
+		desc_list[1].paddr = data->trans_address;
 		desc_list[1].len = data->trans_len;
 		break;
 	}
 	case SEV_CMD_SEND_UPDATE_VMSA: {
 		struct sev_data_send_update_vmsa *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->hdr_address;
+		desc_list[0].paddr = data->hdr_address;
 		desc_list[0].len = data->hdr_len;
-		desc_list[1].paddr_ptr = &data->trans_address;
+		desc_list[1].paddr = data->trans_address;
 		desc_list[1].len = data->trans_len;
 		break;
 	}
 	case SEV_CMD_RECEIVE_UPDATE_DATA: {
 		struct sev_data_receive_update_data *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->guest_address;
+		desc_list[0].paddr = data->guest_address;
 		desc_list[0].len = data->guest_len;
 		desc_list[0].guest_owned = true;
 		break;
@@ -659,7 +662,7 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
 	case SEV_CMD_RECEIVE_UPDATE_VMSA: {
 		struct sev_data_receive_update_vmsa *data = cmd_buf;
 
-		desc_list[0].paddr_ptr = &data->guest_address;
+		desc_list[0].paddr = data->guest_address;
 		desc_list[0].len = data->guest_len;
 		desc_list[0].guest_owned = true;
 		break;
@@ -687,16 +690,16 @@ static int snp_map_cmd_buf_desc(struct cmd_buf_desc *desc)
 			return -ENOMEM;
 		}
 
-		desc->paddr_orig = *desc->paddr_ptr;
-		*desc->paddr_ptr = __psp_pa(page_to_virt(page));
+		desc->paddr_orig = desc->paddr;
+		desc->paddr = __psp_pa(page_to_virt(page));
 	}
 
-	paddr = *desc->paddr_ptr;
+	paddr = desc->paddr;
 	npages = PAGE_ALIGN(desc->len) >> PAGE_SHIFT;
 
 	/* Transition the buffer to firmware-owned. */
 	if (rmp_mark_pages_firmware(paddr, npages, true)) {
-		pr_warn("Failed move pages to firmware-owned state for SEV legacy command.\n");
+		pr_warn("Error moving pages to firmware-owned state for SEV legacy command.\n");
 		return -EFAULT;
 	}
 
@@ -705,31 +708,29 @@ static int snp_map_cmd_buf_desc(struct cmd_buf_desc *desc)
 
 static int snp_unmap_cmd_buf_desc(struct cmd_buf_desc *desc)
 {
-	unsigned long paddr;
 	unsigned int npages;
 
 	if (!desc->len)
 		return 0;
 
-	paddr = *desc->paddr_ptr;
 	npages = PAGE_ALIGN(desc->len) >> PAGE_SHIFT;
 
 	/* Transition the buffers back to hypervisor-owned. */
-	if (snp_reclaim_pages(paddr, npages, true)) {
+	if (snp_reclaim_pages(desc->paddr, npages, true)) {
 		pr_warn("Failed to reclaim firmware-owned pages while issuing SEV legacy command.\n");
 		return -EFAULT;
 	}
 
 	/* Copy data from bounce buffer and then free it. */
 	if (!desc->guest_owned) {
-		void *bounce_buf = __va(__sme_clr(paddr));
+		void *bounce_buf = __va(__sme_clr(desc->paddr));
 		void *dst_buf = __va(__sme_clr(desc->paddr_orig));
 
 		memcpy(dst_buf, bounce_buf, desc->len);
 		__free_pages(virt_to_page(bounce_buf), get_order(desc->len));
 
 		/* Restore the original address in the command buffer. */
-		*desc->paddr_ptr = desc->paddr_orig;
+		desc->paddr = desc->paddr_orig;
 	}
 
 	return 0;
@@ -737,14 +738,14 @@ static int snp_unmap_cmd_buf_desc(struct cmd_buf_desc *desc)
 
 static int snp_map_cmd_buf_desc_list(int cmd, void *cmd_buf, struct cmd_buf_desc *desc_list)
 {
-	int i, n;
+	int i;
 
 	snp_populate_cmd_buf_desc_list(cmd, cmd_buf, desc_list);
 
 	for (i = 0; i < CMD_BUF_DESC_MAX; i++) {
 		struct cmd_buf_desc *desc = &desc_list[i];
 
-		if (!desc->paddr_ptr)
+		if (!desc->paddr)
 			break;
 
 		if (snp_map_cmd_buf_desc(desc))
@@ -754,8 +755,7 @@ static int snp_map_cmd_buf_desc_list(int cmd, void *cmd_buf, struct cmd_buf_desc
 	return 0;
 
 err_unmap:
-	n = i;
-	for (i = 0; i < n; i++)
+	for (i--; i >= 0; i--)
 		snp_unmap_cmd_buf_desc(&desc_list[i]);
 
 	return -EFAULT;
@@ -768,7 +768,7 @@ static int snp_unmap_cmd_buf_desc_list(struct cmd_buf_desc *desc_list)
 	for (i = 0; i < CMD_BUF_DESC_MAX; i++) {
 		struct cmd_buf_desc *desc = &desc_list[i];
 
-		if (!desc->paddr_ptr)
+		if (!desc->paddr)
 			break;
 
 		if (snp_unmap_cmd_buf_desc(desc))
@@ -799,8 +799,8 @@ static bool sev_cmd_buf_writable(int cmd)
 	}
 }
 
-/* After SNP is INIT'ed, the behavior of legacy SEV commands is changed. */
-static bool snp_legacy_handling_needed(int cmd)
+/* After SNP is initialized, the behavior of legacy SEV commands is changed. */
+static inline bool snp_legacy_handling_needed(int cmd)
 {
 	struct sev_device *sev = psp_master->sev_data;
 
@@ -891,7 +891,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 			sev->cmd_buf_backup_active = true;
 		} else {
 			dev_err(sev->dev,
-				"SEV: too many firmware commands are in-progress, no command buffers available.\n");
+				"SEV: too many firmware commands in progress, no command buffers available.\n");
 			return -EBUSY;
 		}
 
@@ -904,7 +904,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 		ret = snp_prep_cmd_buf(cmd, cmd_buf, desc_list);
 		if (ret) {
 			dev_err(sev->dev,
-				"SEV: failed to prepare buffer for legacy command %#x. Error: %d\n",
+				"SEV: failed to prepare buffer for legacy command 0x%#x. Error: %d\n",
 				cmd, ret);
 			return ret;
 		}


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 18/26] crypto: ccp: Handle legacy SEV commands when SNP is enabled
  2024-01-19 17:18   ` Borislav Petkov
@ 2024-01-19 17:36     ` Tom Lendacky
  2024-01-19 17:48       ` Borislav Petkov
  2024-01-26 13:29     ` Michael Roth
  1 sibling, 1 reply; 102+ messages in thread
From: Tom Lendacky @ 2024-01-19 17:36 UTC (permalink / raw)
  To: Borislav Petkov, Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
	luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
	rientjes, tobin, vbabka, kirill, ak, tony.luck,
	sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
	nikunj.dadhania, pankaj.gupta, liam.merwick

On 1/19/24 11:18, Borislav Petkov wrote:
> On Sat, Dec 30, 2023 at 10:19:46AM -0600, Michael Roth wrote:
>> From: Brijesh Singh <brijesh.singh@amd.com>
>>
>> The behavior of legacy SEV commands is altered when the firmware is
>> initialized for SNP support. In that case, all command buffer memory
>> that may get written to by legacy SEV commands must be marked as
>> firmware-owned in the RMP table prior to issuing the command.
>>
>> Additionally, when a command buffer contains a system physical address
>> that points to additional buffers that firmware may write to, special
>> handling is needed depending on whether:
>>
>>    1) the system physical address points to guest memory
>>    2) the system physical address points to host memory
>>
>> To handle case #1, the pages of these buffers are changed to
>> firmware-owned in the RMP table before issuing the command, and restored
>> after the command completes.
>>
>> For case #2, a bounce buffer is used instead of the original address.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> Co-developed-by: Michael Roth <michael.roth@amd.com>
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> ---
>>   drivers/crypto/ccp/sev-dev.c | 421 ++++++++++++++++++++++++++++++++++-
>>   drivers/crypto/ccp/sev-dev.h |   3 +
>>   2 files changed, 414 insertions(+), 10 deletions(-)
> 
> Definitely better, thanks.
> 
> Some cleanups ontop:
> 
> ---
> 

> @@ -904,7 +904,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>   		ret = snp_prep_cmd_buf(cmd, cmd_buf, desc_list);
>   		if (ret) {
>   			dev_err(sev->dev,
> -				"SEV: failed to prepare buffer for legacy command %#x. Error: %d\n",
> +				"SEV: failed to prepare buffer for legacy command 0x%#x. Error: %d\n",

Using %#x will produce the 0x in the output (except if the value is zero 
for some reason). So I would say make that 0x%x.

Thanks,
Tom

>   				cmd, ret);
>   			return ret;
>   		}
> 
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 18/26] crypto: ccp: Handle legacy SEV commands when SNP is enabled
  2024-01-19 17:36     ` Tom Lendacky
@ 2024-01-19 17:48       ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-19 17:48 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Michael Roth, x86, kvm, linux-coco, linux-mm, linux-crypto,
	linux-kernel, tglx, mingo, jroedel, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick

On Fri, Jan 19, 2024 at 11:36:44AM -0600, Tom Lendacky wrote:
> Using %#x will produce the 0x in the output (except if the value is zero for
> some reason).

printf-compatibility:

       #      The value should be converted to an "alternate form".  For o conversions,
              the first character of the output string is made zero (by prefixing  a  0
              if  it  was not zero already).  For x and X conversions, a nonzero result
              has the string "0x" (or "0X" for X conversions) prepended to it.

> So I would say make that 0x%x.

Yap, better, unambiguous.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 21/26] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
  2023-12-30 16:19 ` [PATCH v1 21/26] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump Michael Roth
@ 2024-01-21 11:49   ` Borislav Petkov
  2024-01-26  3:03     ` Kalra, Ashish
  2024-01-26 13:38     ` Michael Roth
  0 siblings, 2 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-21 11:49 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick

On Sat, Dec 30, 2023 at 10:19:49AM -0600, Michael Roth wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Add a kdump safe version of sev_firmware_shutdown() registered as a
> crash_kexec_post_notifier, which is invoked during panic/crash to do
> SEV/SNP shutdown. This is required for transitioning all IOMMU pages
> to reclaim/hypervisor state, otherwise re-init of IOMMU pages during
> crashdump kernel boot fails and panics the crashdump kernel. This
> panic notifier runs in atomic context, hence it ensures not to
> acquire any locks/mutexes and polls for PSP command completion
> instead of depending on PSP command completion interrupt.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: remove use of "we" in comments]
> Signed-off-by: Michael Roth <michael.roth@amd.com>

Cleanups ontop, see if the below works too. Especially:

* I've zapped the WBINVD before the TMR pages are freed because
__sev_snp_shutdown_locked() will WBINVD anyway.

* The mutex_is_locked() check in snp_shutdown_on_panic() is silly
because the panic notifier runs on one CPU anyway.

Thx.

---

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 435ba9bc4510..27323203e593 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -227,6 +227,7 @@ int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct sn
 void snp_accept_memory(phys_addr_t start, phys_addr_t end);
 u64 snp_get_unsupported_features(u64 status);
 u64 sev_get_status(void);
+void kdump_sev_callback(void);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
@@ -255,6 +256,7 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
 static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
 static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
 static inline u64 sev_get_status(void) { return 0; }
+static inline void kdump_sev_callback(void) {  }
 #endif
 
 #ifdef CONFIG_KVM_AMD_SEV
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 23ede774d31b..64ae3a1e5c30 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -40,6 +40,7 @@
 #include <asm/intel_pt.h>
 #include <asm/crash.h>
 #include <asm/cmdline.h>
+#include <asm/sev.h>
 
 /* Used while preparing memory map entries for second kernel */
 struct crash_memmap_data {
@@ -59,12 +60,7 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
 	 */
 	cpu_emergency_stop_pt();
 
-	/*
-	 * for SNP do wbinvd() on remote CPUs to
-	 * safely do SNP_SHUTDOWN on the local CPU.
-	 */
-	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
-		wbinvd();
+	kdump_sev_callback();
 
 	disable_local_APIC();
 }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index c67285824e82..dbb2cc6b5666 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2262,3 +2262,13 @@ static int __init snp_init_platform_device(void)
 	return 0;
 }
 device_initcall(snp_init_platform_device);
+
+void kdump_sev_callback(void)
+{
+	/*
+	 * Do wbinvd() on remote CPUs when SNP is enabled in order to
+	 * safely do SNP_SHUTDOWN on the local CPU.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		wbinvd();
+}
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 598878e760bc..c342e5e54e45 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -161,7 +161,6 @@ static int sev_wait_cmd_ioc(struct sev_device *sev,
 
 			udelay(10);
 		}
-
 		return -ETIMEDOUT;
 	}
 
@@ -1654,7 +1653,7 @@ static int sev_update_firmware(struct device *dev)
 	return ret;
 }
 
-static int __sev_snp_shutdown_locked(int *error, bool in_panic)
+static int __sev_snp_shutdown_locked(int *error, bool panic)
 {
 	struct sev_device *sev = psp_master->sev_data;
 	struct sev_data_snp_shutdown_ex data;
@@ -1673,7 +1672,7 @@ static int __sev_snp_shutdown_locked(int *error, bool in_panic)
 	 * In that case, a wbinvd() is done on remote CPUs via the NMI
 	 * callback, so only a local wbinvd() is needed here.
 	 */
-	if (!in_panic)
+	if (!panic)
 		wbinvd_on_all_cpus();
 	else
 		wbinvd();
@@ -2199,26 +2198,13 @@ int sev_dev_init(struct psp_device *psp)
 	return ret;
 }
 
-static void __sev_firmware_shutdown(struct sev_device *sev, bool in_panic)
+static void __sev_firmware_shutdown(struct sev_device *sev, bool panic)
 {
 	int error;
 
 	__sev_platform_shutdown_locked(NULL);
 
 	if (sev_es_tmr) {
-		/*
-		 * The TMR area was encrypted, flush it from the cache
-		 *
-		 * If invoked during panic handling, local interrupts are
-		 * disabled and all CPUs are stopped, so wbinvd_on_all_cpus()
-		 * can't be used. In that case, wbinvd() is done on remote CPUs
-		 * via the NMI callback, so a local wbinvd() is sufficient here.
-		 */
-		if (!in_panic)
-			wbinvd_on_all_cpus();
-		else
-			wbinvd();
-
 		__snp_free_firmware_pages(virt_to_page(sev_es_tmr),
 					  get_order(sev_es_tmr_size),
 					  true);
@@ -2237,7 +2223,7 @@ static void __sev_firmware_shutdown(struct sev_device *sev, bool in_panic)
 		snp_range_list = NULL;
 	}
 
-	__sev_snp_shutdown_locked(&error, in_panic);
+	__sev_snp_shutdown_locked(&error, panic);
 }
 
 static void sev_firmware_shutdown(struct sev_device *sev)
@@ -2262,26 +2248,18 @@ void sev_dev_destroy(struct psp_device *psp)
 	psp_clear_sev_irq_handler(psp);
 }
 
-static int sev_snp_shutdown_on_panic(struct notifier_block *nb,
-				     unsigned long reason, void *arg)
+static int snp_shutdown_on_panic(struct notifier_block *nb,
+				 unsigned long reason, void *arg)
 {
 	struct sev_device *sev = psp_master->sev_data;
 
-	/*
-	 * Panic callbacks are executed with all other CPUs stopped,
-	 * so don't wait for sev_cmd_mutex to be released since it
-	 * would block here forever.
-	 */
-	if (mutex_is_locked(&sev_cmd_mutex))
-		return NOTIFY_DONE;
-
 	__sev_firmware_shutdown(sev, true);
 
 	return NOTIFY_DONE;
 }
 
-static struct notifier_block sev_snp_panic_notifier = {
-	.notifier_call = sev_snp_shutdown_on_panic,
+static struct notifier_block snp_panic_notifier = {
+	.notifier_call = snp_shutdown_on_panic,
 };
 
 int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
@@ -2322,7 +2300,7 @@ void sev_pci_init(void)
 		"-SNP" : "", sev->api_major, sev->api_minor, sev->build);
 
 	atomic_notifier_chain_register(&panic_notifier_list,
-				       &sev_snp_panic_notifier);
+				       &snp_panic_notifier);
 	return;
 
 err:
@@ -2339,5 +2317,5 @@ void sev_pci_exit(void)
 	sev_firmware_shutdown(sev);
 
 	atomic_notifier_chain_unregister(&panic_notifier_list,
-					 &sev_snp_panic_notifier);
+					 &snp_panic_notifier);
 }

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 22/26] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2023-12-30 16:19 ` [PATCH v1 22/26] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
@ 2024-01-21 11:51   ` Borislav Petkov
  2024-01-26  3:44     ` Michael Roth
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-21 11:51 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh, Marc Orr

On Sat, Dec 30, 2023 at 10:19:50AM -0600, Michael Roth wrote:
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  1 +
>  arch/x86/kvm/lapic.c               |  5 ++++-
>  arch/x86/kvm/svm/nested.c          |  2 +-
>  arch/x86/kvm/svm/sev.c             | 32 ++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c             | 17 +++++++++++++---
>  arch/x86/kvm/svm/svm.h             |  1 +
>  7 files changed, 54 insertions(+), 5 deletions(-)

This one belongs in the second part, the KVM set.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 24/26] crypto: ccp: Add the SNP_PLATFORM_STATUS command
  2023-12-30 16:19 ` [PATCH v1 24/26] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
@ 2024-01-21 12:29   ` Borislav Petkov
  2024-01-26  3:32     ` Michael Roth
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-21 12:29 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh

On Sat, Dec 30, 2023 at 10:19:52AM -0600, Michael Roth wrote:
> +	/* Change the page state before accessing it */
> +	if (snp_reclaim_pages(__pa(data), 1, true)) {
> +		snp_leak_pages(__pa(data) >> PAGE_SHIFT, 1);
> +		return -EFAULT;
> +	}

This looks weird and it doesn't explain why this needs to happen.
SNP_PLATFORM_STATUS text doesn't explain either.

So, what's up?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 25/26] crypto: ccp: Add the SNP_COMMIT command
  2023-12-30 16:19 ` [PATCH v1 25/26] crypto: ccp: Add the SNP_COMMIT command Michael Roth
@ 2024-01-21 12:35   ` Borislav Petkov
  0 siblings, 0 replies; 102+ messages in thread
From: Borislav Petkov @ 2024-01-21 12:35 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang

On Sat, Dec 30, 2023 at 10:19:53AM -0600, Michael Roth wrote:
> +/**
> + * struct sev_data_snp_commit - SNP_COMMIT structure
> + *
> + * @length: len of the command buffer read by the PSP
> + */
> +struct sev_data_snp_commit {
> +	u32 length;
> +} __packed;
> +

diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 11af3dd9126d..31b9266b26ce 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -790,7 +790,7 @@ struct sev_data_snp_shutdown_ex {
 /**
  * struct sev_data_snp_commit - SNP_COMMIT structure
  *
- * @length: len of the command buffer read by the PSP
+ * @length: length of the command buffer read by the PSP
  */
 struct sev_data_snp_commit {
 	u32 length;

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 26/26] crypto: ccp: Add the SNP_SET_CONFIG command
  2023-12-30 16:19 ` [PATCH v1 26/26] crypto: ccp: Add the SNP_SET_CONFIG command Michael Roth
@ 2024-01-21 12:41   ` Borislav Petkov
  2024-01-26 13:30     ` Michael Roth
  0 siblings, 1 reply; 102+ messages in thread
From: Borislav Petkov @ 2024-01-21 12:41 UTC (permalink / raw)
  To: Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	Brijesh Singh, Alexey Kardashevskiy, Dionna Glaze

On Sat, Dec 30, 2023 at 10:19:54AM -0600, Michael Roth wrote:
> +The SNP_SET_CONFIG is used to set the system-wide configuration such as
> +reported TCB version in the attestation report. The command is similar to
> +SNP_CONFIG command defined in the SEV-SNP spec. The current values of the
> +firmware parameters affected by this command can be queried via
> +SNP_PLATFORM_STATUS.

diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index 4f696aacc866..14c9de997b7d 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -169,10 +169,10 @@ that of the currently installed firmware.
 :Parameters (in): struct sev_user_data_snp_config
 :Returns (out): 0 on success, -negative on error
 
-The SNP_SET_CONFIG is used to set the system-wide configuration such as
-reported TCB version in the attestation report. The command is similar to
-SNP_CONFIG command defined in the SEV-SNP spec. The current values of the
-firmware parameters affected by this command can be queried via
+SNP_SET_CONFIG is used to set the system-wide configuration such as
+reported TCB version in the attestation report. The command is similar
+to SNP_CONFIG command defined in the SEV-SNP spec. The current values of
+the firmware parameters affected by this command can be queried via
 SNP_PLATFORM_STATUS.
 
 3. SEV-SNP CPUID Enforcement

---

Ok, you're all reviewed. Please send a new revision with *all* feedback
addressed so that I can queue it.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-16 20:22             ` Dave Hansen
@ 2024-01-26  1:35               ` Michael Roth
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2024-01-26  1:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Tom Lendacky, Borislav Petkov, x86, kvm, linux-coco, linux-mm,
	linux-crypto, linux-kernel, tglx, mingo, jroedel, hpa, ardb,
	pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
	pgonda, peterz, srinivas.pandruvada, rientjes, tobin, vbabka,
	kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
	jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh, rppt

On Tue, Jan 16, 2024 at 12:22:39PM -0800, Dave Hansen wrote:
> On 1/16/24 08:19, Michael Roth wrote:
> > 
> > So at the very least, if we went down this path, we would be worth
> > investigating the following areas in addition to general perf testing:
> > 
> >   1) Only splitting directmap regions corresponding to kernel-allocatable
> >      *data* (hopefully that's even feasible...)
> 
> Take a look at the 64-bit memory map in here:
> 
> 	https://www.kernel.org/doc/Documentation/x86/x86_64/mm.rst
> 
> We already have separate mappings for kernel data and (normal) kernel text.
> 
> >   2) Potentially deferring the split until an SNP guest is actually
> >      run, so there isn't any impact just from having SNP enabled (though
> >      you still take a hit from RMP checks in that case so maybe it's not
> >      worthwhile, but that itself has been noted as a concern for users
> >      so it would be nice to not make things even worse).
> 
> Yes, this would be nice too.
> 
> >> Actually, where _is_ the TLB flushing here?
> > Boris pointed that out in v6, and we implemented it in v7, but it
> > completely cratered performance:
> 
> That *desperately* needs to be documented.
> 
> How can it be safe to skip the TLB flush?  Is this akin to a page
> permission promotion where you go from RO->RW but can skip the TLB
> flush?  In that case, the CPU will see the RO TLB entry violation, drop
> it, and re-walk the page tables, discovering the RW entry.
> 
> Does something similar happen here where the CPU sees the 2M/4k conflict
> in the TLB, drops the 2M entry, does a re-walk then picks up the
> newly-split 2M->4k entries?
> 
> I can see how something like that would work, but it's _awfully_ subtle
> to go unmentioned.

I think the current logic relies on the fact that we don't actually have
to remove entries from the RMP table to avoid unexpected RMP #PFs,
because the owners of private PFNs are aware of what access restrictions
would be active, and non-owners of private PFNs shouldn't be trying to
write to them. 

It's only the case where a non-owner does a write via a 2MB+ mapping that
happens to overlap a private PFN that needs to be guarded against. When
a private PFN is removed from the directmap it will end up splitting the
2MB mapping, and in that case there *is* a TLB flush that would wipe out
the 2MB TLB entry.

After that point, there may still be stale 4KB TLB entries that accrue,
but by the time there is any sensible reason to use them to perform a
write, the PFN would have been transitioned back to shared state and
re-added to the directmap.

But yes, it's ugly, and we've ended up re-working this to simply split
the directmap via set_memory_4k(), which also does the expected TLB
flushing of 2MB+ mappings, and we only call it when absolutely
necessary, benefitting from the following optimizations:

  - if lookup_address() tells us the mapping is already split, no
    splitting work is needed
  - when the rmpupdate() is for a 2MB private page/RMP entry, there's no
    need to split the directmap, because there's no risk of a non-owner's
    writes overlapping with the private PFNs (RMP checks are only done at
    4KB/2MB granularity, so even a write via a 1GB directmap mapping
    would not trigger an RMP fault since the non-owner's writes would be
    to a different 2MB PFN range).

Since KVM/gmem generally allocate 2MB backing pages for private guest
memory, the above optimizations result in the directmap only needing to
be split, and only needing to acquire the cpa_lock, a handful of times
for each guest.

This also sort of gives us what we were looking for earlier: a way to
defer the directmap split until it's actually needed. But rather than in
bulk, it's amortized over time, and the more it's done, the more likely it
is that the host is being used primarily to run SNP guests, and so the less
likely it is that splitting the directmap is going to cause some sort of
noticeable regression for non-SNP/non-guest workloads.

I did some basic testing to compare this against pre-splitting the
directmap, and I think it's pretty close performance-wise, so it seems like
a good, safe default. It's also fairly trivial to add a kernel cmdline
params or something later if someone wants to optimize specifically for
SNP guests.

  System: EPYC, 512 cores, 1.5TB memory
  
  == Page-in/acceptance rate for 1 guests, each with 128 vCPUs/512M ==
  
     directmap pre-split to 4K:
     13.77 GB/s
     11.88 GB/s
     15.37 GB/s

     directmap split on-demand:
     14.77 GB/s
     16.57 GB/s
     15.53 GB/s
  
  == Page-in/acceptance rate for 4 guests, each with 128 vCPUs/256GB ==
  
     directmap pre-split to 4K:
     37.35 GB/s
     33.37 GB/s
     29.55 GB/s
    
     directmap split on-demand:
     28.88 GB/s
     25.13 GB/s
     29.28 GB/s
  
  == Page-in/acceptance rate for 8 guests, each with 128 vCPUs/160GB ==
  
     directmap pre-split to 4K:
     24.39 GB/s
     27.16 GB/s
     24.70 GB/s
     
     directmap split on-demand:
     26.12 GB/s
     24.44 GB/s
     22.66 GB/s

So for v2 we're using this on-demand approach, and I've tried to capture
some of this rationale in the commit log / comments for the relevant
patch.

Thanks,

Mike

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
  2024-01-16 20:12               ` Mike Rapoport
@ 2024-01-26  1:49                 ` Michael Roth
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2024-01-26  1:49 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Dave Hansen, Tom Lendacky, Borislav Petkov, x86, kvm, linux-coco,
	linux-mm, linux-crypto, linux-kernel, tglx, mingo, jroedel, hpa,
	ardb, pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen,
	slp, pgonda, peterz, srinivas.pandruvada, rientjes, tobin,
	vbabka, kirill, ak, tony.luck, sathyanarayanan.kuppuswamy,
	alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, zhi.a.wang, Brijesh Singh

On Tue, Jan 16, 2024 at 10:12:26PM +0200, Mike Rapoport wrote:
> On Tue, Jan 16, 2024 at 10:50:25AM -0600, Michael Roth wrote:
> > On Tue, Jan 16, 2024 at 10:19:09AM -0600, Michael Roth wrote:
> > > 
> > > The downside of course is potential impact for non-SNP workloads
> > > resulting from splitting the directmap. Mike Rapoport's numbers make
> > > me feel a little better about it, but I don't think they apply directly
> > > to the notion of splitting the entire directmap. Even the LWN article
> > > summarizes:
> 
> When I ran the tests, I had the entire direct map forced to 4k or 2M pages.
> There is indeed some impact and some tests suffer more than others but
> there were also runs that benefit from smaller page size in the direct map,
> at least if I remember correctly the results Intel folks posted a while
> ago.

I see, thanks for clarifying. Certainly helps to have this data if someone
ends up wanting to investigate pre-splitting directmap to optimize SNP
use-cases in the future. Still feel more comfortable introducing this as an
optional tuneable though; can't help but worry about that *1* workload that
somehow suffers a perf regression and has us frantically re-working the SNP
implementation if we were to start off with making a 4k directmap a requirement.

>  
> > >   "The conclusion from all of this, Rapoport continued, was that
> > >   direct-map fragmentation just does not matter — for data access, at
> > >   least. Using huge-page mappings does still appear to make a difference
> > >   for memory containing the kernel code, so allocator changes should
> > >   focus on code allocations — improving the layout of allocations for
> > >   loadable modules, for example, or allowing vmalloc() to allocate huge
> > >   pages for code. But, for kernel-data allocations, direct-map
> > >   fragmentation simply appears to not be worth worrying about."
> > > 
> > > So at the very least, if we went down this path, we would be worth
> > > investigating the following areas in addition to general perf testing:
> > > 
> > >   1) Only splitting directmap regions corresponding to kernel-allocatable
> > >      *data* (hopefully that's even feasible...)
> 
> Forcing the direct map to 4k pages does not affect the kernel text
> mappings; they are created separately and they are not part of the
> kernel mapping of the physical memory.
> Well, except the modules, but they are mapped with 4k pages anyway.

Thanks!

-Mike

> 
> > -Mike
> 
> -- 
> Sincerely yours,
> Mike.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 12/26] crypto: ccp: Define the SEV-SNP commands
  2024-01-15  9:41   ` Borislav Petkov
@ 2024-01-26  1:56     ` Michael Roth
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2024-01-26  1:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh

On Mon, Jan 15, 2024 at 10:41:12AM +0100, Borislav Petkov wrote:
> On Sat, Dec 30, 2023 at 10:19:40AM -0600, Michael Roth wrote:
> > From: Brijesh Singh <brijesh.singh@amd.com>
> > 
> > AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
> > Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
> > while adding new hardware security protection.
> > 
> > Define the commands and structures used to communicate with the AMD-SP
> > when creating and managing the SEV-SNP guests. The SEV-SNP firmware spec
> > is available at developer.amd.com/sev.
> > 
> > Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > [mdr: update SNP command list and SNP status struct based on current
> >       spec, use C99 flexible arrays, fix kernel-doc issues]
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  drivers/crypto/ccp/sev-dev.c |  16 +++
> >  include/linux/psp-sev.h      | 264 +++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/psp-sev.h |  56 ++++++++
> >  3 files changed, 336 insertions(+)
> 
> More ignored feedback:
> 
> https://lore.kernel.org/r/20231124143630.GKZWC07hjqxkf60ni4@fat_crate.local
> 
> Lemme send it to you as a diff then - it'll work then perhaps.
> 
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 983d314b5ff5..1a76b5297f03 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -104,7 +104,7 @@ enum sev_cmd {
>  	SEV_CMD_SNP_PAGE_RECLAIM	= 0x0C7,
>  	SEV_CMD_SNP_PAGE_UNSMASH	= 0x0C8,
>  	SEV_CMD_SNP_CONFIG		= 0x0C9,
> -	SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX	= 0x0CA,
> +	SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX = 0x0CA,
>  	SEV_CMD_SNP_COMMIT		= 0x0CB,
>  	SEV_CMD_SNP_VLEK_LOAD		= 0x0CD,
>  
> @@ -624,7 +624,8 @@ enum {
>   * @gctx_paddr: system physical address of guest context page
>   * @page_size: page size 0 indicates 4K and 1 indicates 2MB page
>   * @page_type: encoded page type
> - * @imi_page: indicates that this page is part of the IMI of the guest
> + * @imi_page: indicates that this page is part of the IMI (Incoming
> + * Migration Image) of the guest

I'd added it in the sev_data_snp_launch_start kernel-doc where you originally
mentioned it. I've gone ahead and added clarification for all kernel-doc
occurrences of IMI.

-Mike

>   * @rsvd: reserved
>   * @rsvd2: reserved
>   * @address: system physical address of destination page to encrypt
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 13/26] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
  2024-01-15 19:53   ` Borislav Petkov
@ 2024-01-26  2:48     ` Michael Roth
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2024-01-26  2:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh, Jarkko Sakkinen

On Mon, Jan 15, 2024 at 08:53:46PM +0100, Borislav Petkov wrote:
> On Sat, Dec 30, 2023 at 10:19:41AM -0600, Michael Roth wrote:
> > From: Brijesh Singh <brijesh.singh@amd.com>
> > 
> > Before SNP VMs can be launched, the platform must be appropriately
> > configured and initialized. Platform initialization is accomplished via
> > the SNP_INIT command. Make sure to do a WBINVD and issue DF_FLUSH
> > command to prepare for the first SNP guest launch after INIT.
> 							  ^^^^^^
> Which "INIT"?
> 
> Sounds like after hypervisor's init...

This is referring to the WBINVD/DF_FLUSH needed after SNP_INIT and before
the launch of the first SNP guest. I'd actually already removed this line from
the commit msg since it's explained in better detail in comments below
and it seemed out of place where it originally was.

-Mike

> 
> > During the execution of SNP_INIT command, the firmware configures
> > and enables SNP security policy enforcement in many system components.
> > Some system components write to regions of memory reserved by early
> > x86 firmware (e.g. UEFI). Other system components write to regions
> > provided by the operating system, hypervisor, or x86 firmware.
> > Such system components can only write to HV-fixed pages or Default
> > pages. They will error when attempting to write to other page states
> 
> "... to pages in other page states... "
> 
> > after SNP_INIT enables their SNP enforcement.
> 
> And yes, this version looks much better. Some text cleanups ontop:
> 
> ---
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 85634d4f8cfe..7942ec730525 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -549,24 +549,22 @@ static int __sev_snp_init_locked(int *error)
>  		return 0;
>  	}
>  
> -	/*
> -	 * The SNP_INIT requires the MSR_VM_HSAVE_PA must be set to 0h
> -	 * across all cores.
> -	 */
> +	/* SNP_INIT requires MSR_VM_HSAVE_PA to be cleared on all CPUs. */
>  	on_each_cpu(snp_set_hsave_pa, NULL, 1);
>  
>  	/*
> -	 * Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
> -	 * system physical address ranges to convert into the HV-fixed page states
> -	 * during the RMP initialization.  For instance, the memory that UEFI
> -	 * reserves should be included in the range list. This allows system
> +	 * Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list
> +	 * of system physical address ranges to convert into HV-fixed page
> +	 * states during the RMP initialization.  For instance, the memory that
> +	 * UEFI reserves should be included in that list. This allows system
>  	 * components that occasionally write to memory (e.g. logging to UEFI
> -	 * reserved regions) to not fail due to RMP initialization and SNP enablement.
> +	 * reserved regions) to not fail due to RMP initialization and SNP
> +	 * enablement.
>  	 */
>  	if (sev_version_greater_or_equal(SNP_MIN_API_MAJOR, 52)) {
>  		/*
>  		 * Firmware checks that the pages containing the ranges enumerated
> -		 * in the RANGES structure are either in the Default page state or in the
> +		 * in the RANGES structure are either in the default page state or in the
>  		 * firmware page state.
>  		 */
>  		snp_range_list = kzalloc(PAGE_SIZE, GFP_KERNEL);
> @@ -577,7 +575,7 @@ static int __sev_snp_init_locked(int *error)
>  		}
>  
>  		/*
> -		 * Retrieve all reserved memory regions setup by UEFI from the e820 memory map
> +		 * Retrieve all reserved memory regions from the e820 memory map
>  		 * to be setup as HV-fixed pages.
>  		 */
>  		rc = walk_iomem_res_desc(IORES_DESC_NONE, IORESOURCE_MEM, 0, ~0,
> @@ -599,14 +597,13 @@ static int __sev_snp_init_locked(int *error)
>  	}
>  
>  	/*
> -	 * The following sequence must be issued before launching the
> -	 * first SNP guest to ensure all dirty cache lines are flushed,
> -	 * including from updates to the RMP table itself via RMPUPDATE
> -	 * instructions:
> +	 * The following sequence must be issued before launching the first SNP
> +	 * guest to ensure all dirty cache lines are flushed, including from
> +	 * updates to the RMP table itself via the RMPUPDATE instruction:
>  	 *
> -	 * - WBINDV on all running CPUs
> +	 * - WBINVD on all running CPUs
>  	 * - SEV_CMD_SNP_INIT[_EX] firmware command
> -	 * - WBINDV on all running CPUs
> +	 * - WBINVD on all running CPUs
>  	 * - SEV_CMD_SNP_DF_FLUSH firmware command
>  	 */
>  	wbinvd_on_all_cpus();
> 
> 
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 21/26] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
  2024-01-21 11:49   ` Borislav Petkov
@ 2024-01-26  3:03     ` Kalra, Ashish
  2024-01-26 13:38     ` Michael Roth
  1 sibling, 0 replies; 102+ messages in thread
From: Kalra, Ashish @ 2024-01-26  3:03 UTC (permalink / raw)
  To: Borislav Petkov, Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	nikunj.dadhania, pankaj.gupta, liam.merwick

Hello Boris,

On 1/21/2024 5:49 AM, Borislav Petkov wrote:
> On Sat, Dec 30, 2023 at 10:19:49AM -0600, Michael Roth wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> Add a kdump safe version of sev_firmware_shutdown() registered as a
>> crash_kexec_post_notifier, which is invoked during panic/crash to do
>> SEV/SNP shutdown. This is required for transitioning all IOMMU pages
>> to reclaim/hypervisor state, otherwise re-init of IOMMU pages during
>> crashdump kernel boot fails and panics the crashdump kernel. This
>> panic notifier runs in atomic context, hence it ensures not to
>> acquire any locks/mutexes and polls for PSP command completion
>> instead of depending on PSP command completion interrupt.
>>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> [mdr: remove use of "we" in comments]
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Cleanups ontop, see if the below works too. Especially:
>
> * I've zapped the WBINVD before the TMR pages are freed because
> __sev_snp_shutdown_locked() will WBINVD anyway.

This flush is required for the TMR: the TMR is encrypted, and it needs to 
be flushed from the cache before being reclaimed and freed.

SNP_SHUTDOWN_EX may additionally require a WBINVD + DF_FLUSH, which is 
why there is another WBINVD in __sev_snp_shutdown_locked().

>
> * The mutex_is_locked() check in snp_shutdown_on_panic() is silly
> because the panic notifier runs on one CPU anyway.

But what if there is an active command on the PSP when the panic occurs? 
The mutex will already be acquired in that case, and we can't issue 
another PSP command while one is still active, so I believe this check 
is required.
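
For what it's worth, the non-blocking behavior being discussed can be
sketched in plain userspace C, using pthread_mutex_trylock() as a
stand-in for the kernel's mutex_is_locked() guard; all names below are
illustrative, not the actual ccp driver symbols:

```c
#include <pthread.h>
#include <stdbool.h>

/* Illustrative stand-in for the driver's command mutex. */
static pthread_mutex_t psp_cmd_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Panic-path shutdown sketch: must never block. If another task held
 * the command mutex when the panic hit (i.e. a PSP command was in
 * flight), skip issuing further PSP commands instead of racing it.
 * Returns true if the shutdown work was actually performed.
 */
static bool snp_shutdown_on_panic_sketch(void)
{
	if (pthread_mutex_trylock(&psp_cmd_mutex) != 0)
		return false;	/* command in flight: bail out */

	/* ... stand-in for __sev_firmware_shutdown() ... */
	pthread_mutex_unlock(&psp_cmd_mutex);
	return true;
}
```

In the real driver the check is mutex_is_locked() rather than a trylock
(the panic notifier runs with all other CPUs stopped, so it never needs
to take the mutex itself), but the effect, skipping the shutdown when a
command was in flight, is the same.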

Thanks, Ashish


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 24/26] crypto: ccp: Add the SNP_PLATFORM_STATUS command
  2024-01-21 12:29   ` Borislav Petkov
@ 2024-01-26  3:32     ` Michael Roth
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2024-01-26  3:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh

On Sun, Jan 21, 2024 at 01:29:20PM +0100, Borislav Petkov wrote:
> On Sat, Dec 30, 2023 at 10:19:52AM -0600, Michael Roth wrote:
> > +	/* Change the page state before accessing it */
> > +	if (snp_reclaim_pages(__pa(data), 1, true)) {
> > +		snp_leak_pages(__pa(data) >> PAGE_SHIFT, 1);
> > +		return -EFAULT;
> > +	}
> 
> This looks weird and it doesn't explain why this needs to happen.
> SNP_PLATFORM_STATUS text doesn't explain either.
> 
> So, what's up?

I've added a clarifying comment in v2, but in short: the page that
firmware writes to first needs to be switched to the Firmware-owned
state, and after successful completion it will be put in the Reclaim
state. But it's possible a failure occurs before firmware makes that
transition, e.g. if the command fails somewhere in the call stack before
it even reaches firmware.

If that happens the page might still be in firmware-owned state, and
need to go through snp_reclaim_pages()/SNP_PAGE_RECLAIM before it can
be switched back to Default state.

Rather than trying to special-case all these possibilities, it's simpler
to just always use snp_reclaim_pages(), which will handle both Reclaim
and Firmware-owned pages.

However, snp_reclaim_pages() will already leak the page when necessary,
so I've dropped that bit.
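
The "always reclaim" approach described above can be sketched with a toy
state machine. This is a rough, hypothetical simplification of the RMP
page states, not the real snp_reclaim_pages() implementation:

```c
#include <stdbool.h>

/* Simplified, illustrative RMP page states. */
enum rmp_state { RMP_DEFAULT, RMP_FIRMWARE, RMP_RECLAIM };

/*
 * Toy stand-in for snp_reclaim_pages(): transition a page back to the
 * Default (hypervisor-owned) state whether the command completed
 * (firmware left the page in the Reclaim state) or failed early (the
 * page is still Firmware-owned). Handling both cases in one helper
 * avoids special-casing every possible failure point.
 */
static bool reclaim_page(enum rmp_state *state)
{
	switch (*state) {
	case RMP_RECLAIM:	/* command completed normally */
	case RMP_FIRMWARE:	/* command failed before completion */
		*state = RMP_DEFAULT;
		return true;
	default:
		return false;	/* already Default: nothing to reclaim */
	}
}
```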

-Mike

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 22/26] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2024-01-21 11:51   ` Borislav Petkov
@ 2024-01-26  3:44     ` Michael Roth
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2024-01-26  3:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	zhi.a.wang, Brijesh Singh, Marc Orr

On Sun, Jan 21, 2024 at 12:51:21PM +0100, Borislav Petkov wrote:
> On Sat, Dec 30, 2023 at 10:19:50AM -0600, Michael Roth wrote:
> >  arch/x86/include/asm/kvm-x86-ops.h |  1 +
> >  arch/x86/include/asm/kvm_host.h    |  1 +
> >  arch/x86/kvm/lapic.c               |  5 ++++-
> >  arch/x86/kvm/svm/nested.c          |  2 +-
> >  arch/x86/kvm/svm/sev.c             | 32 ++++++++++++++++++++++++++++++
> >  arch/x86/kvm/svm/svm.c             | 17 +++++++++++++---
> >  arch/x86/kvm/svm/svm.h             |  1 +
> >  7 files changed, 54 insertions(+), 5 deletions(-)
> 
> This one belongs in the second part, the KVM set.

If we enable the RMP table (the following patch) without this patch in
place, it can still cause crashes for legacy guests.

I'd moved it earlier into this part of the series based on Paolo's concerns
about that, so my hope was that he'd be willing to give it an Acked-by if
needed so it can go through your tree.

-Mike

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 18/26] crypto: ccp: Handle legacy SEV commands when SNP is enabled
  2024-01-19 17:18   ` Borislav Petkov
  2024-01-19 17:36     ` Tom Lendacky
@ 2024-01-26 13:29     ` Michael Roth
  1 sibling, 0 replies; 102+ messages in thread
From: Michael Roth @ 2024-01-26 13:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick

On Fri, Jan 19, 2024 at 06:18:30PM +0100, Borislav Petkov wrote:
> On Sat, Dec 30, 2023 at 10:19:46AM -0600, Michael Roth wrote:
> > From: Brijesh Singh <brijesh.singh@amd.com>
> > 
> > The behavior of legacy SEV commands is altered when the firmware is
> > initialized for SNP support. In that case, all command buffer memory
> > that may get written to by legacy SEV commands must be marked as
> > firmware-owned in the RMP table prior to issuing the command.
> > 
> > Additionally, when a command buffer contains a system physical address
> > that points to additional buffers that firmware may write to, special
> > handling is needed depending on whether:
> > 
> >   1) the system physical address points to guest memory
> >   2) the system physical address points to host memory
> > 
> > To handle case #1, the pages of these buffers are changed to
> > firmware-owned in the RMP table before issuing the command, and restored
> > to after the command completes.
> > 
> > For case #2, a bounce buffer is used instead of the original address.
> > 
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Co-developed-by: Michael Roth <michael.roth@amd.com>
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >  drivers/crypto/ccp/sev-dev.c | 421 ++++++++++++++++++++++++++++++++++-
> >  drivers/crypto/ccp/sev-dev.h |   3 +
> >  2 files changed, 414 insertions(+), 10 deletions(-)
> 
> Definitely better, thanks.
> 
> Some cleanups ontop:
> 
> ---
> 
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 8cfb376ca2e7..7681c094c7ff 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -514,18 +514,21 @@ static void *sev_fw_alloc(unsigned long len)
>   * struct cmd_buf_desc - descriptors for managing legacy SEV command address
>   * parameters corresponding to buffers that may be written to by firmware.
>   *
> - * @paddr_ptr: pointer the address parameter in the command buffer, which may
> - *	need to be saved/restored depending on whether a bounce buffer is used.
> - *	Must be NULL if this descriptor is only an end-of-list indicator.
> + * @paddr: address which may need to be saved/restored depending on whether
> + * a bounce buffer is used. Must be NULL if this descriptor is only an
> + * end-of-list indicator.
> + *
>   * @paddr_orig: storage for the original address parameter, which can be used to
> - *	restore the original value in @paddr_ptr in cases where it is replaced
> - *	with the address of a bounce buffer.
> - * @len: length of buffer located at the address originally stored at @paddr_ptr
> + * restore the original value in @paddr in cases where it is replaced with
> + * the address of a bounce buffer.
> + *
> + * @len: length of buffer located at the address originally stored at @paddr
> + *
>   * @guest_owned: true if the address corresponds to guest-owned pages, in which
> - *	case bounce buffers are not needed.
> + * case bounce buffers are not needed.

In v2 I've fixed up the alignments in accordance with
Documentation/doc-guide/kernel-doc.rst, which asks for the starting column of
a multi-line description to be the same as the first line, and for newlines
not to be inserted between parameters/members. I've gone back and updated
other occurrences in the series as well.

>   */
>  struct cmd_buf_desc {
> -	u64 *paddr_ptr;
> +	u64 paddr;

Unfortunately the logic here doesn't really work with this approach. If
a descriptor needs to be remapped to a bounce buffer, the command
buffer that contains the original address needs to be updated to point
to the bounce buffer, and that's where the pointer comes into play. So
I've kept this in place for v2, but worked on all your other
suggestions.
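
To illustrate the point (with stand-in field and function names, not the
actual sev-dev.c code): storing a pointer into the command buffer lets
the bounce-buffer swap update the address firmware will actually see,
which a plain u64 copy of the address cannot do:

```c
#include <stdint.h>

/* Illustrative command buffer with an address parameter, loosely
 * modeled on something like struct sev_data_get_id. */
struct cmd_buf {
	uint64_t address;
	uint32_t len;
};

struct cmd_desc {
	uint64_t *paddr_ptr;	/* points INTO the command buffer */
	uint64_t paddr_orig;
};

/* Swap in a bounce buffer: because the descriptor holds a pointer to
 * the field itself, the command buffer firmware will read is updated
 * in place. */
static void map_bounce(struct cmd_desc *d, uint64_t bounce_pa)
{
	d->paddr_orig = *d->paddr_ptr;
	*d->paddr_ptr = bounce_pa;
}

/* Restore the original address after the command completes. */
static void unmap_bounce(struct cmd_desc *d)
{
	*d->paddr_ptr = d->paddr_orig;
}
```

With `u64 paddr` instead of `u64 *paddr_ptr`, map_bounce() would only
update the descriptor's private copy and firmware would still see the
original address.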

Thanks,

Mike

>  	u64 paddr_orig;
>  	u32 len;
>  	bool guest_owned;
> @@ -549,30 +552,30 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
>  	case SEV_CMD_PDH_CERT_EXPORT: {
>  		struct sev_data_pdh_cert_export *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->pdh_cert_address;
> +		desc_list[0].paddr = data->pdh_cert_address;
>  		desc_list[0].len = data->pdh_cert_len;
> -		desc_list[1].paddr_ptr = &data->cert_chain_address;
> +		desc_list[1].paddr = data->cert_chain_address;
>  		desc_list[1].len = data->cert_chain_len;
>  		break;
>  	}
>  	case SEV_CMD_GET_ID: {
>  		struct sev_data_get_id *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->address;
> +		desc_list[0].paddr = data->address;
>  		desc_list[0].len = data->len;
>  		break;
>  	}
>  	case SEV_CMD_PEK_CSR: {
>  		struct sev_data_pek_csr *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->address;
> +		desc_list[0].paddr = data->address;
>  		desc_list[0].len = data->len;
>  		break;
>  	}
>  	case SEV_CMD_LAUNCH_UPDATE_DATA: {
>  		struct sev_data_launch_update_data *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->address;
> +		desc_list[0].paddr = data->address;
>  		desc_list[0].len = data->len;
>  		desc_list[0].guest_owned = true;
>  		break;
> @@ -580,7 +583,7 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
>  	case SEV_CMD_LAUNCH_UPDATE_VMSA: {
>  		struct sev_data_launch_update_vmsa *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->address;
> +		desc_list[0].paddr = data->address;
>  		desc_list[0].len = data->len;
>  		desc_list[0].guest_owned = true;
>  		break;
> @@ -588,14 +591,14 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
>  	case SEV_CMD_LAUNCH_MEASURE: {
>  		struct sev_data_launch_measure *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->address;
> +		desc_list[0].paddr = data->address;
>  		desc_list[0].len = data->len;
>  		break;
>  	}
>  	case SEV_CMD_LAUNCH_UPDATE_SECRET: {
>  		struct sev_data_launch_secret *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->guest_address;
> +		desc_list[0].paddr = data->guest_address;
>  		desc_list[0].len = data->guest_len;
>  		desc_list[0].guest_owned = true;
>  		break;
> @@ -603,7 +606,7 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
>  	case SEV_CMD_DBG_DECRYPT: {
>  		struct sev_data_dbg *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->dst_addr;
> +		desc_list[0].paddr = data->dst_addr;
>  		desc_list[0].len = data->len;
>  		desc_list[0].guest_owned = true;
>  		break;
> @@ -611,7 +614,7 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
>  	case SEV_CMD_DBG_ENCRYPT: {
>  		struct sev_data_dbg *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->dst_addr;
> +		desc_list[0].paddr = data->dst_addr;
>  		desc_list[0].len = data->len;
>  		desc_list[0].guest_owned = true;
>  		break;
> @@ -619,39 +622,39 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
>  	case SEV_CMD_ATTESTATION_REPORT: {
>  		struct sev_data_attestation_report *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->address;
> +		desc_list[0].paddr = data->address;
>  		desc_list[0].len = data->len;
>  		break;
>  	}
>  	case SEV_CMD_SEND_START: {
>  		struct sev_data_send_start *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->session_address;
> +		desc_list[0].paddr = data->session_address;
>  		desc_list[0].len = data->session_len;
>  		break;
>  	}
>  	case SEV_CMD_SEND_UPDATE_DATA: {
>  		struct sev_data_send_update_data *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->hdr_address;
> +		desc_list[0].paddr = data->hdr_address;
>  		desc_list[0].len = data->hdr_len;
> -		desc_list[1].paddr_ptr = &data->trans_address;
> +		desc_list[1].paddr = data->trans_address;
>  		desc_list[1].len = data->trans_len;
>  		break;
>  	}
>  	case SEV_CMD_SEND_UPDATE_VMSA: {
>  		struct sev_data_send_update_vmsa *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->hdr_address;
> +		desc_list[0].paddr = data->hdr_address;
>  		desc_list[0].len = data->hdr_len;
> -		desc_list[1].paddr_ptr = &data->trans_address;
> +		desc_list[1].paddr = data->trans_address;
>  		desc_list[1].len = data->trans_len;
>  		break;
>  	}
>  	case SEV_CMD_RECEIVE_UPDATE_DATA: {
>  		struct sev_data_receive_update_data *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->guest_address;
> +		desc_list[0].paddr = data->guest_address;
>  		desc_list[0].len = data->guest_len;
>  		desc_list[0].guest_owned = true;
>  		break;
> @@ -659,7 +662,7 @@ static void snp_populate_cmd_buf_desc_list(int cmd, void *cmd_buf,
>  	case SEV_CMD_RECEIVE_UPDATE_VMSA: {
>  		struct sev_data_receive_update_vmsa *data = cmd_buf;
> 
> -		desc_list[0].paddr_ptr = &data->guest_address;
> +		desc_list[0].paddr = data->guest_address;
>  		desc_list[0].len = data->guest_len;
>  		desc_list[0].guest_owned = true;
>  		break;
> @@ -687,16 +690,16 @@ static int snp_map_cmd_buf_desc(struct cmd_buf_desc *desc)
>  			return -ENOMEM;
>  		}
> 
> -		desc->paddr_orig = *desc->paddr_ptr;
> -		*desc->paddr_ptr = __psp_pa(page_to_virt(page));
> +		desc->paddr_orig = desc->paddr;
> +		desc->paddr = __psp_pa(page_to_virt(page));
>  	}
> 
> -	paddr = *desc->paddr_ptr;
> +	paddr = desc->paddr;
>  	npages = PAGE_ALIGN(desc->len) >> PAGE_SHIFT;
> 
>  	/* Transition the buffer to firmware-owned. */
>  	if (rmp_mark_pages_firmware(paddr, npages, true)) {
> -		pr_warn("Failed move pages to firmware-owned state for SEV legacy command.\n");
> +		pr_warn("Error moving pages to firmware-owned state for SEV legacy command.\n");
>  		return -EFAULT;
>  	}
> 
> @@ -705,31 +708,29 @@ static int snp_map_cmd_buf_desc(struct cmd_buf_desc *desc)
> 
>  static int snp_unmap_cmd_buf_desc(struct cmd_buf_desc *desc)
>  {
> -	unsigned long paddr;
>  	unsigned int npages;
> 
>  	if (!desc->len)
>  		return 0;
> 
> -	paddr = *desc->paddr_ptr;
>  	npages = PAGE_ALIGN(desc->len) >> PAGE_SHIFT;
> 
>  	/* Transition the buffers back to hypervisor-owned. */
> -	if (snp_reclaim_pages(paddr, npages, true)) {
> +	if (snp_reclaim_pages(desc->paddr, npages, true)) {
>  		pr_warn("Failed to reclaim firmware-owned pages while issuing SEV legacy command.\n");
>  		return -EFAULT;
>  	}
> 
>  	/* Copy data from bounce buffer and then free it. */
>  	if (!desc->guest_owned) {
> -		void *bounce_buf = __va(__sme_clr(paddr));
> +		void *bounce_buf = __va(__sme_clr(desc->paddr));
>  		void *dst_buf = __va(__sme_clr(desc->paddr_orig));
> 
>  		memcpy(dst_buf, bounce_buf, desc->len);
>  		__free_pages(virt_to_page(bounce_buf), get_order(desc->len));
> 
>  		/* Restore the original address in the command buffer. */
> -		*desc->paddr_ptr = desc->paddr_orig;
> +		desc->paddr = desc->paddr_orig;
>  	}
> 
>  	return 0;
> @@ -737,14 +738,14 @@ static int snp_unmap_cmd_buf_desc(struct cmd_buf_desc *desc)
> 
>  static int snp_map_cmd_buf_desc_list(int cmd, void *cmd_buf, struct cmd_buf_desc *desc_list)
>  {
> -	int i, n;
> +	int i;
> 
>  	snp_populate_cmd_buf_desc_list(cmd, cmd_buf, desc_list);
> 
>  	for (i = 0; i < CMD_BUF_DESC_MAX; i++) {
>  		struct cmd_buf_desc *desc = &desc_list[i];
> 
> -		if (!desc->paddr_ptr)
> +		if (!desc->paddr)
>  			break;
> 
>  		if (snp_map_cmd_buf_desc(desc))
> @@ -754,8 +755,7 @@ static int snp_map_cmd_buf_desc_list(int cmd, void *cmd_buf, struct cmd_buf_desc
>  	return 0;
> 
>  err_unmap:
> -	n = i;
> -	for (i = 0; i < n; i++)
> +	for (i--; i >= 0; i--)
>  		snp_unmap_cmd_buf_desc(&desc_list[i]);
> 
>  	return -EFAULT;
> @@ -768,7 +768,7 @@ static int snp_unmap_cmd_buf_desc_list(struct cmd_buf_desc *desc_list)
>  	for (i = 0; i < CMD_BUF_DESC_MAX; i++) {
>  		struct cmd_buf_desc *desc = &desc_list[i];
> 
> -		if (!desc->paddr_ptr)
> +		if (!desc->paddr)
>  			break;
> 
>  		if (snp_unmap_cmd_buf_desc(desc))
> @@ -799,8 +799,8 @@ static bool sev_cmd_buf_writable(int cmd)
>  	}
>  }
> 
> -/* After SNP is INIT'ed, the behavior of legacy SEV commands is changed. */
> -static bool snp_legacy_handling_needed(int cmd)
> +/* After SNP is initialized, the behavior of legacy SEV commands is changed. */
> +static inline bool snp_legacy_handling_needed(int cmd)
>  {
>  	struct sev_device *sev = psp_master->sev_data;
> 
> @@ -891,7 +891,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>  			sev->cmd_buf_backup_active = true;
>  		} else {
>  			dev_err(sev->dev,
> -				"SEV: too many firmware commands are in-progress, no command buffers available.\n");
> +				"SEV: too many firmware commands in progress, no command buffers available.\n");
>  			return -EBUSY;
>  		}
> 
> @@ -904,7 +904,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
>  		ret = snp_prep_cmd_buf(cmd, cmd_buf, desc_list);
>  		if (ret) {
>  			dev_err(sev->dev,
> -				"SEV: failed to prepare buffer for legacy command %#x. Error: %d\n",
> +				"SEV: failed to prepare buffer for legacy command 0x%#x. Error: %d\n",
>  				cmd, ret);
>  			return ret;
>  		}
> 
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 26/26] crypto: ccp: Add the SNP_SET_CONFIG command
  2024-01-21 12:41   ` Borislav Petkov
@ 2024-01-26 13:30     ` Michael Roth
  0 siblings, 0 replies; 102+ messages in thread
From: Michael Roth @ 2024-01-26 13:30 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
	Brijesh Singh, Alexey Kardashevskiy, Dionna Glaze

On Sun, Jan 21, 2024 at 01:41:02PM +0100, Borislav Petkov wrote:
> On Sat, Dec 30, 2023 at 10:19:54AM -0600, Michael Roth wrote:
> > +The SNP_SET_CONFIG is used to set the system-wide configuration such as
> > +reported TCB version in the attestation report. The command is similar to
> > +SNP_CONFIG command defined in the SEV-SNP spec. The current values of the
> > +firmware parameters affected by this command can be queried via
> > +SNP_PLATFORM_STATUS.
> 
> diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
> index 4f696aacc866..14c9de997b7d 100644
> --- a/Documentation/virt/coco/sev-guest.rst
> +++ b/Documentation/virt/coco/sev-guest.rst
> @@ -169,10 +169,10 @@ that of the currently installed firmware.
>  :Parameters (in): struct sev_user_data_snp_config
>  :Returns (out): 0 on success, -negative on error
>  
> -The SNP_SET_CONFIG is used to set the system-wide configuration such as
> -reported TCB version in the attestation report. The command is similar to
> -SNP_CONFIG command defined in the SEV-SNP spec. The current values of the
> -firmware parameters affected by this command can be queried via
> +SNP_SET_CONFIG is used to set the system-wide configuration such as
> +reported TCB version in the attestation report. The command is similar
> +to SNP_CONFIG command defined in the SEV-SNP spec. The current values of
> +the firmware parameters affected by this command can be queried via
>  SNP_PLATFORM_STATUS.
>  
>  3. SEV-SNP CPUID Enforcement
> 
> ---
> 
> Ok, you're all reviewed. Please send a new revision with *all* feedback
> addressed so that I can queue it.

Thanks! Unless otherwise noted, I *think* I got everything this time. :)

-Mike

> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 21/26] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
  2024-01-21 11:49   ` Borislav Petkov
  2024-01-26  3:03     ` Kalra, Ashish
@ 2024-01-26 13:38     ` Michael Roth
  1 sibling, 0 replies; 102+ messages in thread
From: Michael Roth @ 2024-01-26 13:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick

On Sun, Jan 21, 2024 at 12:49:00PM +0100, Borislav Petkov wrote:
> On Sat, Dec 30, 2023 at 10:19:49AM -0600, Michael Roth wrote:
> > From: Ashish Kalra <ashish.kalra@amd.com>
> > 
> > Add a kdump safe version of sev_firmware_shutdown() registered as a
> > crash_kexec_post_notifier, which is invoked during panic/crash to do
> > SEV/SNP shutdown. This is required for transitioning all IOMMU pages
> > to reclaim/hypervisor state, otherwise re-init of IOMMU pages during
> > crashdump kernel boot fails and panics the crashdump kernel. This
> > panic notifier runs in atomic context, hence it ensures not to
> > acquire any locks/mutexes and polls for PSP command completion
> > instead of depending on PSP command completion interrupt.
> > 
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > [mdr: remove use of "we" in comments]
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> 
> Cleanups ontop, see if the below works too. Especially:
> 
> * I've zapped the WBINVD before the TMR pages are freed because
> __sev_snp_shutdown_locked() will WBINVD anyway.

As Ashish mentioned, the wbinvd_on_all_cpus() is still needed for general
TMR handling even outside of panic/SNP, but I think you're right about
the panic==true case: there the handling is SNP-specific, so the wbinvd()
that gets called later during SNP shutdown will take care of it. I've
adjusted things accordingly.

> 
> * The mutex_is_locked() check in snp_shutdown_on_panic() is silly
> because the panic notifier runs on one CPU anyway.

Based on discussion with Ashish I've updated the comments in v2
regarding this to make it a little clearer why it might still be a good
idea to keep that in to avoid weird unexpected behavior if we try to
issue PSP commands while another one was already in-flight by another
task before the panic.

But I squashed in all the other changes here as-is.

-Mike

> 
> Thx.
> 
> ---
> 
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 435ba9bc4510..27323203e593 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -227,6 +227,7 @@ int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct sn
>  void snp_accept_memory(phys_addr_t start, phys_addr_t end);
>  u64 snp_get_unsupported_features(u64 status);
>  u64 sev_get_status(void);
> +void kdump_sev_callback(void);
>  #else
>  static inline void sev_es_ist_enter(struct pt_regs *regs) { }
>  static inline void sev_es_ist_exit(void) { }
> @@ -255,6 +256,7 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
>  static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
>  static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
>  static inline u64 sev_get_status(void) { return 0; }
> +static inline void kdump_sev_callback(void) {  }
>  #endif
>  
>  #ifdef CONFIG_KVM_AMD_SEV
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 23ede774d31b..64ae3a1e5c30 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -40,6 +40,7 @@
>  #include <asm/intel_pt.h>
>  #include <asm/crash.h>
>  #include <asm/cmdline.h>
> +#include <asm/sev.h>
>  
>  /* Used while preparing memory map entries for second kernel */
>  struct crash_memmap_data {
> @@ -59,12 +60,7 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
>  	 */
>  	cpu_emergency_stop_pt();
>  
> -	/*
> -	 * for SNP do wbinvd() on remote CPUs to
> -	 * safely do SNP_SHUTDOWN on the local CPU.
> -	 */
> -	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> -		wbinvd();
> +	kdump_sev_callback();
>  
>  	disable_local_APIC();
>  }
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index c67285824e82..dbb2cc6b5666 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2262,3 +2262,13 @@ static int __init snp_init_platform_device(void)
>  	return 0;
>  }
>  device_initcall(snp_init_platform_device);
> +
> +void kdump_sev_callback(void)
> +{
> +	/*
> +	 * Do wbinvd() on remote CPUs when SNP is enabled in order to
> +	 * safely do SNP_SHUTDOWN on the local CPU.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		wbinvd();
> +}
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 598878e760bc..c342e5e54e45 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -161,7 +161,6 @@ static int sev_wait_cmd_ioc(struct sev_device *sev,
>  
>  			udelay(10);
>  		}
> -
>  		return -ETIMEDOUT;
>  	}
>  
> @@ -1654,7 +1653,7 @@ static int sev_update_firmware(struct device *dev)
>  	return ret;
>  }
>  
> -static int __sev_snp_shutdown_locked(int *error, bool in_panic)
> +static int __sev_snp_shutdown_locked(int *error, bool panic)
>  {
>  	struct sev_device *sev = psp_master->sev_data;
>  	struct sev_data_snp_shutdown_ex data;
> @@ -1673,7 +1672,7 @@ static int __sev_snp_shutdown_locked(int *error, bool in_panic)
>  	 * In that case, a wbinvd() is done on remote CPUs via the NMI
>  	 * callback, so only a local wbinvd() is needed here.
>  	 */
> -	if (!in_panic)
> +	if (!panic)
>  		wbinvd_on_all_cpus();
>  	else
>  		wbinvd();
> @@ -2199,26 +2198,13 @@ int sev_dev_init(struct psp_device *psp)
>  	return ret;
>  }
>  
> -static void __sev_firmware_shutdown(struct sev_device *sev, bool in_panic)
> +static void __sev_firmware_shutdown(struct sev_device *sev, bool panic)
>  {
>  	int error;
>  
>  	__sev_platform_shutdown_locked(NULL);
>  
>  	if (sev_es_tmr) {
> -		/*
> -		 * The TMR area was encrypted, flush it from the cache
> -		 *
> -		 * If invoked during panic handling, local interrupts are
> -		 * disabled and all CPUs are stopped, so wbinvd_on_all_cpus()
> -		 * can't be used. In that case, wbinvd() is done on remote CPUs
> -		 * via the NMI callback, so a local wbinvd() is sufficient here.
> -		 */
> -		if (!in_panic)
> -			wbinvd_on_all_cpus();
> -		else
> -			wbinvd();
> -
>  		__snp_free_firmware_pages(virt_to_page(sev_es_tmr),
>  					  get_order(sev_es_tmr_size),
>  					  true);
> @@ -2237,7 +2223,7 @@ static void __sev_firmware_shutdown(struct sev_device *sev, bool in_panic)
>  		snp_range_list = NULL;
>  	}
>  
> -	__sev_snp_shutdown_locked(&error, in_panic);
> +	__sev_snp_shutdown_locked(&error, panic);
>  }
>  
>  static void sev_firmware_shutdown(struct sev_device *sev)
> @@ -2262,26 +2248,18 @@ void sev_dev_destroy(struct psp_device *psp)
>  	psp_clear_sev_irq_handler(psp);
>  }
>  
> -static int sev_snp_shutdown_on_panic(struct notifier_block *nb,
> -				     unsigned long reason, void *arg)
> +static int snp_shutdown_on_panic(struct notifier_block *nb,
> +				 unsigned long reason, void *arg)
>  {
>  	struct sev_device *sev = psp_master->sev_data;
>  
> -	/*
> -	 * Panic callbacks are executed with all other CPUs stopped,
> -	 * so don't wait for sev_cmd_mutex to be released since it
> -	 * would block here forever.
> -	 */
> -	if (mutex_is_locked(&sev_cmd_mutex))
> -		return NOTIFY_DONE;
> -
>  	__sev_firmware_shutdown(sev, true);
>  
>  	return NOTIFY_DONE;
>  }
>  
> -static struct notifier_block sev_snp_panic_notifier = {
> -	.notifier_call = sev_snp_shutdown_on_panic,
> +static struct notifier_block snp_panic_notifier = {
> +	.notifier_call = snp_shutdown_on_panic,
>  };
>  
>  int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
> @@ -2322,7 +2300,7 @@ void sev_pci_init(void)
>  		"-SNP" : "", sev->api_major, sev->api_minor, sev->build);
>  
>  	atomic_notifier_chain_register(&panic_notifier_list,
> -				       &sev_snp_panic_notifier);
> +				       &snp_panic_notifier);
>  	return;
>  
>  err:
> @@ -2339,5 +2317,5 @@ void sev_pci_exit(void)
>  	sev_firmware_shutdown(sev);
>  
>  	atomic_notifier_chain_unregister(&panic_notifier_list,
> -					 &sev_snp_panic_notifier);
> +					 &snp_panic_notifier);
>  }
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
> 

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support
  2024-01-09 12:44                 ` Borislav Petkov
@ 2024-02-14 16:56                   ` Jeremi Piotrowski
  0 siblings, 0 replies; 102+ messages in thread
From: Jeremi Piotrowski @ 2024-02-14 16:56 UTC (permalink / raw)
  To: Borislav Petkov, Michael Roth
  Cc: x86, kvm, linux-coco, linux-mm, linux-crypto, linux-kernel, tglx,
	mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
	vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
	srinivas.pandruvada, rientjes, tobin, vbabka, kirill, ak,
	tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
	ashish.kalra, nikunj.dadhania, pankaj.gupta,
	liam.merwick, Brijesh Singh

On 09/01/2024 13:44, Borislav Petkov wrote:
> On Tue, Jan 09, 2024 at 01:29:06PM +0100, Borislav Petkov wrote:
>> At least three issues I see with that:
>>
>> - the allocation can fail so it is a lot more convenient when the
>>   firmware prepares it
>>
>> - the RMP_BASE and RMP_END writes need to be verified to confirm they
>>   actually did set up the RMP range because if they haven't, you might
>>   as well throw SNP security out of the window. In general, letting the
>>   kernel do the RMP allocation needs to be verified very very thoroughly.
>>
>> - a future feature might make this more complicated
> 
> - What do you do if you boot on a system which has the RMP already
>   allocated in the BIOS?
> 
> - How do you detect that it is the L1 kernel that must allocate the RMP?
> 
> - Why can't you use the BIOS allocated RMP in your scenario too instead
>   of the L1 kernel allocating it?
> 
> - ...
> 
> I might think of more.
> 

Sorry for not replying back sooner.

I agree, let's get the base SNP stuff in and then talk about extensions.

I want to sync up with Michael to make sure he's on board with what I'm
proposing. I'll add more design/documentation/use-case descriptions with the
next submission and will make sure to address all the issues you brought up.

Jeremi


end of thread, other threads:[~2024-02-14 16:57 UTC | newest]

Thread overview: 102+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-30 16:19 [PATCH v1 00/26] Add AMD Secure Nested Paging (SEV-SNP) Initialization Support Michael Roth
2023-12-30 16:19 ` [PATCH v1 01/26] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
2023-12-31 11:50   ` Borislav Petkov
2023-12-31 16:44     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 02/26] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled Michael Roth
2023-12-30 16:19 ` [PATCH v1 03/26] iommu/amd: Don't rely on external callers to enable IOMMU SNP support Michael Roth
2024-01-04 10:30   ` Borislav Petkov
2024-01-04 10:58   ` Joerg Roedel
2023-12-30 16:19 ` [PATCH v1 04/26] x86/sev: Add the host SEV-SNP initialization support Michael Roth
2024-01-04 11:05   ` Jeremi Piotrowski
2024-01-05 16:09     ` Borislav Petkov
2024-01-05 16:21       ` Borislav Petkov
2024-01-08 16:49         ` Jeremi Piotrowski
2024-01-08 17:04           ` Borislav Petkov
2024-01-09 11:56             ` Jeremi Piotrowski
2024-01-09 12:29               ` Borislav Petkov
2024-01-09 12:44                 ` Borislav Petkov
2024-02-14 16:56                   ` Jeremi Piotrowski
2024-01-04 11:16   ` Borislav Petkov
2024-01-04 14:42   ` Borislav Petkov
2024-01-05 19:19   ` Borislav Petkov
2024-01-05 21:27   ` Borislav Petkov
2023-12-30 16:19 ` [PATCH v1 05/26] x86/mtrr: Don't print errors if MtrrFixDramModEn is set when SNP enabled Michael Roth
2023-12-30 16:19 ` [PATCH v1 06/26] x86/sev: Add RMP entry lookup helpers Michael Roth
2023-12-30 16:19 ` [PATCH v1 07/26] x86/fault: Add helper for dumping RMP entries Michael Roth
2024-01-10  9:59   ` Borislav Petkov
2024-01-10 20:18     ` Jarkko Sakkinen
2024-01-10 22:14       ` Borislav Petkov
2024-01-10 11:13   ` Borislav Petkov
2024-01-10 15:20     ` Tom Lendacky
2024-01-10 15:27       ` Borislav Petkov
2024-01-10 15:51         ` Tom Lendacky
2024-01-10 15:55           ` Borislav Petkov
2024-01-10 15:10   ` Tom Lendacky
2023-12-30 16:19 ` [PATCH v1 08/26] x86/traps: Define RMP violation #PF error code Michael Roth
2023-12-30 16:19 ` [PATCH v1 09/26] x86/fault: Dump RMP table information when RMP page faults occur Michael Roth
2023-12-30 16:19 ` [PATCH v1 10/26] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
2024-01-12 14:49   ` Borislav Petkov
2023-12-30 16:19 ` [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
2024-01-12 19:48   ` Borislav Petkov
2024-01-12 20:00   ` Dave Hansen
2024-01-12 20:07     ` Borislav Petkov
2024-01-12 20:27       ` Vlastimil Babka
2024-01-15  9:06         ` Borislav Petkov
2024-01-15  9:14           ` Vlastimil Babka
2024-01-15  9:16           ` Mike Rapoport
2024-01-15  9:20             ` Borislav Petkov
2024-01-12 20:28       ` Tom Lendacky
2024-01-12 20:37         ` Dave Hansen
2024-01-15  9:23           ` Vlastimil Babka
2024-01-16 16:19           ` Michael Roth
2024-01-16 16:50             ` Michael Roth
2024-01-16 20:12               ` Mike Rapoport
2024-01-26  1:49                 ` Michael Roth
2024-01-16 18:22             ` Borislav Petkov
2024-01-16 20:22             ` Dave Hansen
2024-01-26  1:35               ` Michael Roth
2024-01-15  9:09     ` Borislav Petkov
2024-01-16 16:21       ` Dave Hansen
2024-01-17  9:34         ` Borislav Petkov
2024-01-15  9:01   ` Borislav Petkov
2023-12-30 16:19 ` [PATCH v1 12/26] crypto: ccp: Define the SEV-SNP commands Michael Roth
2024-01-15  9:41   ` Borislav Petkov
2024-01-26  1:56     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 13/26] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
2024-01-15 11:19   ` Borislav Petkov
2024-01-15 19:53   ` Borislav Petkov
2024-01-26  2:48     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 14/26] crypto: ccp: Provide API to issue SEV and SNP commands Michael Roth
2024-01-17  9:48   ` Borislav Petkov
2023-12-30 16:19 ` [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list Michael Roth
2024-01-08 10:45   ` Vlastimil Babka
2024-01-09 22:19     ` Kalra, Ashish
2024-01-10  8:59       ` Vlastimil Babka
2023-12-30 16:19 ` [PATCH v1 16/26] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
2023-12-30 16:19 ` [PATCH v1 17/26] crypto: ccp: Handle non-volatile INIT_EX data " Michael Roth
2024-01-18 14:03   ` Borislav Petkov
2023-12-30 16:19 ` [PATCH v1 18/26] crypto: ccp: Handle legacy SEV commands " Michael Roth
2024-01-19 17:18   ` Borislav Petkov
2024-01-19 17:36     ` Tom Lendacky
2024-01-19 17:48       ` Borislav Petkov
2024-01-26 13:29     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 19/26] iommu/amd: Clean up RMP entries for IOMMU pages during SNP shutdown Michael Roth
2023-12-30 16:19 ` [PATCH v1 20/26] crypto: ccp: Add debug support for decrypting pages Michael Roth
2024-01-10 14:59   ` Sean Christopherson
2024-01-11  0:50     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 21/26] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump Michael Roth
2024-01-21 11:49   ` Borislav Petkov
2024-01-26  3:03     ` Kalra, Ashish
2024-01-26 13:38     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 22/26] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
2024-01-21 11:51   ` Borislav Petkov
2024-01-26  3:44     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 23/26] x86/cpufeatures: Enable/unmask SEV-SNP CPU feature Michael Roth
2023-12-30 16:19 ` [PATCH v1 24/26] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
2024-01-21 12:29   ` Borislav Petkov
2024-01-26  3:32     ` Michael Roth
2023-12-30 16:19 ` [PATCH v1 25/26] crypto: ccp: Add the SNP_COMMIT command Michael Roth
2024-01-21 12:35   ` Borislav Petkov
2023-12-30 16:19 ` [PATCH v1 26/26] crypto: ccp: Add the SNP_SET_CONFIG command Michael Roth
2024-01-21 12:41   ` Borislav Petkov
2024-01-26 13:30     ` Michael Roth
