kvm.vger.kernel.org archive mirror
* [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support
@ 2021-04-30 12:37 Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 01/37] KVM: SVM: Add support to handle AP reset MSR protocol Brijesh Singh
                   ` (36 more replies)
  0 siblings, 37 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required in a host OS for SEV-SNP support. The series builds upon
SEV-SNP Part-1 https://marc.info/?l=kvm&m=161978500619624&w=2 .

This series provides the basic building blocks needed to boot SEV-SNP
VMs. It does not cover all the security enhancements introduced by SEV-SNP,
such as interrupt protection.

The CCP driver is enhanced to provide new APIs that use the SEV-SNP
specific commands defined in the SEV-SNP firmware specification. The KVM
driver uses those APIs to create and manage the SEV-SNP guests.

The GHCB specification version 2 introduces a new set of NAE events that
are used by the SEV-SNP guest to communicate with the hypervisor. The series
provides support to handle the following new NAE events:
- Register GHCB GPA
- Page State Change Request
- Hypervisor feature
- Guest message request

The RMP check is enforced as soon as SEV-SNP is enabled. Not every memory
access requires an RMP check. In particular, the read accesses from the
hypervisor do not require RMP checks because the data confidentiality is
already protected via memory encryption. When hardware encounters an RMP
check failure, it raises a page-fault exception. If the RMP check failure
is due to a page-size mismatch, the large page is split to resolve
the fault.

The series does not yet provide support for the following SEV-SNP specific
NAE events:

* Extended guest request
* AP bring up
* Interrupt security

The series is based on the commit:
 3bf0fcd75434 (tag: kvm-5.13-1, origin/next, next) KVM: selftests: Speed up set_memory_region_test

Changes since v1:
 * Add AP reset MSR protocol VMGEXIT NAE.
 * Add Hypervisor features VMGEXIT NAE.
 * Move the RMP table initialization and RMPUPDATE/PSMASH helpers to
   arch/x86/kernel/sev.c.
 * Add support to map/unmap SEV legacy command buffer to firmware state when
   SNP is active.
 * Enhance PSP driver to provide helper to allocate/free memory used for the
   firmware context page.
 * Add support to handle RMP fault for the kernel address.
 * Add support to handle GUEST_REQUEST NAE event for attestation.
 * Rename RMP table lookup helper.
 * Drop typedef from rmpentry struct definition.
 * Drop SNP static key and use cpu_feature_enabled() to check whether SEV-SNP
   is active.
 * Multiple cleanups/fixes to address Boris's review feedback.

Brijesh Singh (36):
  KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
  KVM: SVM: Increase the GHCB protocol version
  x86/cpufeatures: Add SEV-SNP CPU feature
  x86/sev: Add the host SEV-SNP initialization support
  x86/sev: Add RMP entry lookup helpers
  x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  x86/sev: Split the physmap when adding the page in RMP table
  x86/traps: Define RMP violation #PF error code
  x86/fault: Add support to handle the RMP fault for kernel address
  x86/fault: Add support to handle the RMP fault for user address
  crypto:ccp: Define the SEV-SNP commands
  crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
  crypto: ccp: Shutdown SNP firmware on kexec
  crypto:ccp: Provide APIs to issue SEV-SNP commands
  crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  crypto: ccp: Handle the legacy SEV command when SNP is enabled
  KVM: SVM: make AVIC backing, VMSA and VMCB memory allocation SNP safe
  KVM: SVM: Add initial SEV-SNP support
  KVM: SVM: define new SEV_FEATURES field in the VMCB Save State Area
  KVM: SVM: Add KVM_SNP_INIT command
  KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
  KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  KVM: SVM: Reclaim the guest pages when SEV-SNP VM terminates
  KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
  KVM: X86: Add kvm_x86_ops to get the max page level for the TDP
  KVM: X86: Introduce kvm_mmu_map_tdp_page() for use by SEV
  KVM: X86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
  KVM: X86: Define new RMP check related #NPF error bits
  KVM: X86: update page-fault trace to log the 64-bit error code
  KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
  KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  KVM: SVM: Add support to handle Page State Change VMGEXIT
  KVM: X86: Export the kvm_zap_gfn_range() for the SNP use
  KVM: SVM: Add support to handle the RMP nested page fault
  KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  KVM: SVM: Advertise the SEV-SNP feature support

Tom Lendacky (1):
  KVM: SVM: Add support to handle AP reset MSR protocol

 arch/x86/include/asm/cpufeatures.h       |   1 +
 arch/x86/include/asm/disabled-features.h |   8 +-
 arch/x86/include/asm/kvm_host.h          |  14 +
 arch/x86/include/asm/msr-index.h         |   6 +
 arch/x86/include/asm/sev.h               |   5 +-
 arch/x86/include/asm/svm.h               |  15 +-
 arch/x86/include/asm/trap_pf.h           |  18 +-
 arch/x86/kernel/cpu/amd.c                |   3 +-
 arch/x86/kernel/sev.c                    | 167 ++++
 arch/x86/kvm/lapic.c                     |   5 +-
 arch/x86/kvm/mmu.h                       |   5 +-
 arch/x86/kvm/mmu/mmu.c                   |  76 +-
 arch/x86/kvm/svm/sev.c                   | 961 ++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.c                   |  22 +-
 arch/x86/kvm/svm/svm.h                   |  31 +-
 arch/x86/kvm/trace.h                     |   6 +-
 arch/x86/kvm/vmx/vmx.c                   |   8 +
 arch/x86/mm/fault.c                      | 207 +++++
 drivers/crypto/ccp/sev-dev.c             | 647 ++++++++++++++-
 drivers/crypto/ccp/sev-dev.h             |  14 +
 drivers/crypto/ccp/sp-pci.c              |  12 +
 include/linux/mm.h                       |   6 +-
 include/linux/psp-sev.h                  | 323 ++++++++
 include/linux/sev.h                      |  81 ++
 include/uapi/linux/kvm.h                 |  43 +
 include/uapi/linux/psp-sev.h             |  44 ++
 mm/memory.c                              |  13 +
 tools/arch/x86/include/asm/cpufeatures.h |   1 +
 28 files changed, 2664 insertions(+), 78 deletions(-)
 create mode 100644 include/linux/sev.h

-- 
2.17.1



* [PATCH Part2 RFC v2 01/37] KVM: SVM: Add support to handle AP reset MSR protocol
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 02/37] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT Brijesh Singh
                   ` (35 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.
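
As a rough illustration of the response encoding, the sketch below applies
the same mask-and-shift update that set_ghcb_msr_bits() is used for in this
patch, but on a plain 64-bit value. The helper names are hypothetical, and
the numeric constants (response code 0x007 in bits 11:0, result in bits
63:12) are assumptions based on the GHCB specification, not values shown in
this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: info code in bits 11:0, AP Reset Hold result in bits 63:12. */
#define INFO_POS              0
#define INFO_MASK             0xfffULL
#define AP_RESET_HOLD_RESP    0x007ULL
#define AP_RESET_RESULT_POS   12
#define AP_RESET_RESULT_MASK  0xfffffffffffffULL

/* Hypothetical pure-function analogue of a masked MSR field update. */
static uint64_t msr_set_bits(uint64_t msr, uint64_t value, uint64_t mask,
			     unsigned int pos)
{
	msr &= ~(mask << pos);          /* clear the field */
	msr |= (value & mask) << pos;   /* write the new value */
	return msr;
}

/* Build the AP Reset Hold response; result is non-zero only for a SIPI. */
uint64_t ap_reset_hold_response(uint64_t msr, uint64_t result)
{
	msr = msr_set_bits(msr, result, AP_RESET_RESULT_MASK, AP_RESET_RESULT_POS);
	return msr_set_bits(msr, AP_RESET_HOLD_RESP, INFO_MASK, INFO_POS);
}

/* Example: the two values this flow can produce. */
void ap_reset_hold_example(void)
{
	assert(ap_reset_hold_response(0, 0) == 0x007ULL);   /* preset, no SIPI */
	assert(ap_reset_hold_response(0, 1) == 0x1007ULL);  /* SIPI delivered */
}
```

The response is preset with a zero result when the hold begins, and only the
SIPI delivery path rewrites it with a non-zero result.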

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 56 ++++++++++++++++++++++++++++++++++++------
 arch/x86/kvm/svm/svm.h |  1 +
 2 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a9d8d6aafdb8..7bf4c2354a5a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -57,6 +57,10 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
 #define sev_es_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
+#define AP_RESET_HOLD_NONE		0
+#define AP_RESET_HOLD_NAE_EVENT		1
+#define AP_RESET_HOLD_MSR_PROTO		2
+
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2200,6 +2204,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 
 static void pre_sev_es_run(struct vcpu_svm *svm)
 {
+	/* Clear any indication that the vCPU is in a type of AP Reset Hold */
+	svm->ap_reset_hold_type = AP_RESET_HOLD_NONE;
+
 	if (!svm->ghcb)
 		return;
 
@@ -2408,6 +2415,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_AP_RESET_HOLD_REQ:
+		svm->ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
+		ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+
+		/*
+		 * Preset the result to a non-SIPI return and then only set
+		 * the result to non-zero when delivering a SIPI.
+		 */
+		set_ghcb_msr_bits(svm, 0,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
+
+		set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+				  GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -2495,6 +2518,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
 		break;
 	case SVM_VMGEXIT_AP_HLT_LOOP:
+		svm->ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
 		ret = kvm_emulate_ap_reset_hold(vcpu);
 		break;
 	case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -2632,13 +2656,29 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 		return;
 	}
 
-	/*
-	 * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
-	 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
-	 * non-zero value.
-	 */
-	if (!svm->ghcb)
-		return;
+	/* Subsequent SIPI */
+	switch (svm->ap_reset_hold_type) {
+	case AP_RESET_HOLD_NAE_EVENT:
+		/*
+		 * Return from an AP Reset Hold VMGEXIT, where the guest will
+		 * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+		 */
+		ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+		break;
+	case AP_RESET_HOLD_MSR_PROTO:
+		/*
+		 * Return from an AP Reset Hold VMGEXIT, where the guest will
+		 * set the CS and RIP. Set GHCB data field to a non-zero value.
+		 */
+		set_ghcb_msr_bits(svm, 1,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+				  GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
 
-	ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+		set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+				  GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	default:
+		break;
+	}
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 42f8a7b9048f..dad528d9f08f 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -174,6 +174,7 @@ struct vcpu_svm {
 	struct ghcb *ghcb;
 	struct kvm_host_map ghcb_map;
 	bool received_first_sipi;
+	unsigned int ap_reset_hold_type;
 
 	/* SEV-ES scratch area support */
 	void *ghcb_sa;
-- 
2.17.1



* [PATCH Part2 RFC v2 02/37] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 01/37] KVM: SVM: Add support to handle AP reset MSR protocol Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 03/37] KVM: SVM: Increase the GHCB protocol version Brijesh Singh
                   ` (34 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

Version 2 of the GHCB specification introduced advertisement of features
that are supported by the Hypervisor.

Now that KVM supports the basic SEV-SNP, advertise the support through
the hypervisor feature request MSR protocol and NAE VMGEXITs.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 14 ++++++++++++++
 arch/x86/kvm/svm/svm.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7bf4c2354a5a..5f0034e0dacc 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2174,6 +2174,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+	case SVM_VMGEXIT_HYPERVISOR_FEATURES:
 		break;
 	default:
 		goto vmgexit_err;
@@ -2431,6 +2432,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_MASK,
 				  GHCB_MSR_INFO_POS);
 		break;
+	case GHCB_MSR_HV_FEATURES_REQ: {
+		set_ghcb_msr_bits(svm, GHCB_HV_FEATURES_SUPPORTED,
+				GHCB_MSR_HV_FEATURES_MASK, GHCB_MSR_HV_FEATURES_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_HV_FEATURES_RESP,
+				GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+		break;
+	}
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -2546,6 +2554,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	}
+	case SVM_VMGEXIT_HYPERVISOR_FEATURES: {
+		ghcb_set_sw_exit_info_2(ghcb, GHCB_HV_FEATURES_SUPPORTED);
+
+		ret = 1;
+		break;
+	}
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index dad528d9f08f..2b0083753812 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -530,6 +530,7 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 #define GHCB_VERSION_MAX	1ULL
 #define GHCB_VERSION_MIN	1ULL
 
+#define GHCB_HV_FEATURES_SUPPORTED	0
 
 extern unsigned int max_sev_asid;
 
-- 
2.17.1



* [PATCH Part2 RFC v2 03/37] KVM: SVM: Increase the GHCB protocol version
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 01/37] KVM: SVM: Add support to handle AP reset MSR protocol Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 02/37] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 04/37] x86/cpufeatures: Add SEV-SNP CPU feature Brijesh Singh
                   ` (33 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/svm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 2b0083753812..053f2505a738 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -527,7 +527,7 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 
 /* sev.c */
 
-#define GHCB_VERSION_MAX	1ULL
+#define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
 #define GHCB_HV_FEATURES_SUPPORTED	0
-- 
2.17.1



* [PATCH Part2 RFC v2 04/37] x86/cpufeatures: Add SEV-SNP CPU feature
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (2 preceding siblings ...)
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 03/37] KVM: SVM: Increase the GHCB protocol version Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 05/37] x86/sev: Add the host SEV-SNP initialization support Brijesh Singh
                   ` (32 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging. This feature adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks like
data replay, memory re-mapping, and more.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/cpufeatures.h       | 1 +
 arch/x86/kernel/cpu/amd.c                | 3 ++-
 tools/arch/x86/include/asm/cpufeatures.h | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dddc746b5455..88b21de977d8 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -393,6 +393,7 @@
 #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
 #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
 #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
 #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
 
 /*
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index daf6c6e74ff9..15b389ebb019 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -586,7 +586,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 	 *	      If BIOS has not enabled SME then don't advertise the
 	 *	      SME feature (set in scattered.c).
 	 *   For SEV: If BIOS has not enabled SEV then don't advertise the
-	 *            SEV and SEV_ES feature (set in scattered.c).
+	 *            SEV, SEV_ES and SEV_SNP feature.
 	 *
 	 *   In all cases, since support for SME and SEV requires long mode,
 	 *   don't advertise the feature under CONFIG_X86_32.
@@ -618,6 +618,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 clear_sev:
 		setup_clear_cpu_cap(X86_FEATURE_SEV);
 		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+		setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
 	}
 }
 
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index cc96e26d69f7..2e78ab5b92ab 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -390,6 +390,7 @@
 #define X86_FEATURE_SEV			(19*32+ 1) /* AMD Secure Encrypted Virtualization */
 #define X86_FEATURE_VM_PAGE_FLUSH	(19*32+ 2) /* "" VM Page Flush MSR is supported */
 #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP		(19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
 #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
 
 /*
-- 
2.17.1



* [PATCH Part2 RFC v2 05/37] x86/sev: Add the host SEV-SNP initialization support
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (3 preceding siblings ...)
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 04/37] x86/cpufeatures: Add SEV-SNP CPU feature Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 06/37] x86/sev: Add RMP entry lookup helpers Brijesh Singh
                   ` (31 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

The memory integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). The RMP is a single data
structure shared across the system that contains one entry for every 4K
page of DRAM that may be used by SEV-SNP VMs. The goal of RMP is to
track the owner of each page of memory. Pages of memory can be owned by
the hypervisor, owned by a specific VM or owned by the AMD-SP. See APM2
section 15.36.3 for more detail on RMP.

The RMP table is used to enforce access control to memory. The table itself
is not directly writable by the software. New CPU instructions (RMPUPDATE,
PVALIDATE, RMPADJUST) are used to manipulate the RMP entries.

Based on the platform configuration, the BIOS reserves the memory used
for the RMP table. The start and end address of the RMP table must be
queried by reading the RMP_BASE and RMP_END MSRs. If RMP_BASE and
RMP_END are not set, the SEV-SNP feature is disabled.

The SEV-SNP feature is enabled only after the RMP table is successfully
initialized.
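
The range check described above can be sketched in isolation as follows.
This is a minimal sketch, not the code in this patch: rmp_table_size() is a
hypothetical helper, and the rmp_end < rmp_base check is an extra sanity test
that the patch itself does not perform. RMP_END is treated as inclusive, which
matches the "rmp_end - rmp_base + 1" size calculation in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: validate the RMP_BASE/RMP_END values and compute the
 * table size in bytes. RMP_END is treated as an inclusive address.
 */
bool rmp_table_size(uint64_t rmp_base, uint64_t rmp_end, uint64_t *size)
{
	/* BIOS did not reserve memory for the RMP table. */
	if (!rmp_base || !rmp_end)
		return false;

	/* Extra sanity check, not part of the patch. */
	if (rmp_end < rmp_base)
		return false;

	*size = rmp_end - rmp_base + 1;
	return true;
}

/* Example with made-up addresses. */
void rmp_table_size_example(void)
{
	uint64_t size = 0;

	assert(rmp_table_size(0x100000000ULL, 0x1003fffffULL, &size));
	assert(size == 0x400000ULL);             /* 4 MiB */
	assert(!rmp_table_size(0, 0, &size));    /* MSRs not programmed */
}
```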

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/disabled-features.h |  8 ++-
 arch/x86/include/asm/msr-index.h         |  6 ++
 arch/x86/kernel/sev.c                    | 91 ++++++++++++++++++++++++
 3 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index b7dd944dc867..0d5c8d08185c 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -68,6 +68,12 @@
 # define DISABLE_SGX	(1 << (X86_FEATURE_SGX & 31))
 #endif
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+# define DISABLE_SEV_SNP	0
+#else
+# define DISABLE_SEV_SNP	(1 << (X86_FEATURE_SEV_SNP & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -91,7 +97,7 @@
 			 DISABLE_ENQCMD)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
-#define DISABLED_MASK19	0
+#define DISABLED_MASK19	(DISABLE_SEV_SNP)
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 20)
 
 #endif /* _ASM_X86_DISABLED_FEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 79f7a926476a..862cd2e777d9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -481,6 +481,8 @@
 #define MSR_AMD64_SEV_ENABLED		BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
 #define MSR_AMD64_SEV_ES_ENABLED	BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
 #define MSR_AMD64_SEV_SNP_ENABLED	BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
+#define MSR_AMD64_RMP_BASE		0xc0010132
+#define MSR_AMD64_RMP_END		0xc0010133
 
 #define MSR_AMD64_VIRT_SPEC_CTRL	0xc001011f
 
@@ -538,6 +540,10 @@
 #define MSR_AMD64_SYSCFG		0xc0010010
 #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT	23
 #define MSR_AMD64_SYSCFG_MEM_ENCRYPT	BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_SNP_EN_BIT		24
+#define MSR_AMD64_SYSCFG_SNP_EN		BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT	25
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN	BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
 #define MSR_K8_INT_PENDING_MSG		0xc0010055
 /* C1E active bits in int pending message */
 #define K8_INTP_C1E_ACTIVE_MASK		0x18000000
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index e54a497877e1..126fa441c0f8 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -23,6 +23,7 @@
 #include <linux/efi.h>
 #include <linux/mm.h>
 #include <linux/io.h>
+#include <linux/io.h>
 
 #include <asm/cpu_entry_area.h>
 #include <asm/stacktrace.h>
@@ -48,6 +49,9 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
  */
 static struct ghcb __initdata *boot_ghcb;
 
+static unsigned long rmptable_start __ro_after_init;
+static unsigned long rmptable_end __ro_after_init;
+
 /* #VC handler runtime per-CPU data */
 struct sev_es_runtime_data {
 	struct ghcb ghcb_page;
@@ -1782,3 +1786,90 @@ unsigned long snp_issue_guest_request(int type, struct snp_guest_request_data *i
 	return ret;
 }
 EXPORT_SYMBOL_GPL(snp_issue_guest_request);
+
+#undef pr_fmt
+#define pr_fmt(fmt)	"SEV-SNP: " fmt
+
+static void __snp_enable(void)
+{
+	u64 val;
+
+	rdmsrl(MSR_AMD64_SYSCFG, val);
+
+	val |= MSR_AMD64_SYSCFG_SNP_EN;
+	val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
+
+	wrmsrl(MSR_AMD64_SYSCFG, val);
+}
+
+static int snp_enable(unsigned int cpu)
+{
+	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		__snp_enable();
+
+	return 0;
+}
+
+static __init int __snp_rmptable_init(void)
+{
+	u64 rmp_base, rmp_end;
+	unsigned long sz;
+	void *start;
+	u64 val;
+
+	rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
+	rdmsrl(MSR_AMD64_RMP_END, rmp_end);
+
+	if (!rmp_base || !rmp_end) {
+		pr_info("Memory for the RMP table has not been reserved by BIOS\n");
+		return 1;
+	}
+
+	sz = rmp_end - rmp_base + 1;
+
+	start = memremap(rmp_base, sz, MEMREMAP_WB);
+	if (!start) {
+		pr_err("Failed to map RMP table 0x%llx-0x%llx\n", rmp_base, rmp_end);
+		return 1;
+	}
+
+	/*
+	 * Check if SEV-SNP is already enabled, this can happen if we are coming from
+	 * kexec boot.
+	 */
+	rdmsrl(MSR_AMD64_SYSCFG, val);
+	if (val & MSR_AMD64_SYSCFG_SNP_EN)
+		goto skip_enable;
+
+	/* Initialize the RMP table to zero */
+	memset(start, 0, sz);
+
+	/* Flush the caches to ensure that data is written before SNP is enabled. */
+	wbinvd_on_all_cpus();
+
+	__snp_enable();
+
+skip_enable:
+	rmptable_start = (unsigned long)start;
+	rmptable_end = rmptable_start + sz;
+
+	pr_info("RMP table physical address 0x%016llx - 0x%016llx\n", rmp_base, rmp_end);
+
+	return 0;
+}
+
+static int __init snp_rmptable_init(void)
+{
+	if (!boot_cpu_has(X86_FEATURE_SEV_SNP))
+		return 0;
+
+	if (__snp_rmptable_init()) {
+		setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+		return 1;
+	}
+
+	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", snp_enable, NULL);
+
+	return 0;
+}
+early_initcall(snp_rmptable_init);
-- 
2.17.1



* [PATCH Part2 RFC v2 06/37] x86/sev: Add RMP entry lookup helpers
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (4 preceding siblings ...)
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 05/37] x86/sev: Add the host SEV-SNP initialization support Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 07/37] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Brijesh Singh
                   ` (30 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

The snp_lookup_page_in_rmptable() can be used by the host to read the RMP
entry for a given page. The RMP entry format is documented in AMD PPR, see
https://bugzilla.kernel.org/attachment.cgi?id=296015.
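
To make the lookup arithmetic concrete, here is a small standalone sketch of
the offset calculation. It relies only on facts visible in this patch: the
entries start at offset 0x4000 in the table, and struct rmpentry is 16 bytes,
so shifting a page-aligned physical address right by RMPENTRY_SHIFT (8) is
the same as multiplying the PFN by 16. The helper names and the PFN-based
interface are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the patch; the helper names are hypothetical. */
#define RMPTABLE_ENTRIES_OFFSET 0x4000ULL
#define RMPENTRY_SIZE           16ULL   /* two u64 fields in struct rmpentry */
#define PFNS_PER_2M             512ULL  /* 4K pages covered by one 2M region */

/* Byte offset of the RMP entry that describes a given 4K page frame. */
uint64_t rmp_entry_offset(uint64_t pfn)
{
	return RMPTABLE_ENTRIES_OFFSET + pfn * RMPENTRY_SIZE;
}

/* Byte offset of the entry for the 2M-aligned region containing the page. */
uint64_t rmp_large_entry_offset(uint64_t pfn)
{
	return rmp_entry_offset(pfn & ~(PFNS_PER_2M - 1));
}

void rmp_entry_offset_example(void)
{
	assert(rmp_entry_offset(0) == 0x4000ULL);
	assert(rmp_entry_offset(1) == 0x4010ULL);
	/* Same result as the (phys >> RMPENTRY_SHIFT) form used in the patch. */
	assert(rmp_entry_offset(0x3ff) == 0x4000ULL + ((0x3ffULL << 12) >> 8));
	/* PFN 0x3ff lies in the 2M region that starts at PFN 0x200. */
	assert(rmp_large_entry_offset(0x3ff) == rmp_entry_offset(0x200));
}
```

The second helper corresponds to the "phys & PMD_MASK" lookup, which reads
the page size from the entry at the start of the 2M region.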

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev.h |  5 +---
 arch/x86/kernel/sev.c      | 28 ++++++++++++++++++++
 include/linux/sev.h        | 54 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/sev.h

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 7f4c34dd84e1..a65e78fa3d51 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -9,6 +9,7 @@
 #define __ASM_ENCRYPTED_STATE_H
 
 #include <linux/types.h>
+#include <linux/sev.h>
 #include <asm/insn.h>
 #include <asm/sev-common.h>
 
@@ -65,10 +66,6 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 #define PVALIDATE_FAIL_SIZEMISMATCH	6
 #define PVALIDATE_FAIL_NOUPDATE		255 /* Software defined (when rFlags.CF = 1) */
 
-/* RMP page size */
-#define RMP_PG_SIZE_2M			1
-#define RMP_PG_SIZE_4K			0
-
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 extern struct static_key_false sev_es_enable_key;
 extern void __sev_es_ist_enter(struct pt_regs *regs);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 126fa441c0f8..dec4f423e232 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -40,6 +40,10 @@
 
 #define DR7_RESET_VALUE        0x400
 
+#define RMPTABLE_ENTRIES_OFFSET	0x4000
+#define RMPENTRY_SHIFT		8
+#define rmptable_page_offset(x)	(RMPTABLE_ENTRIES_OFFSET + (((unsigned long)(x)) >> RMPENTRY_SHIFT))
+
 /* For early boot hypervisor communication in SEV-ES enabled guests */
 static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
 
@@ -1873,3 +1877,27 @@ static int __init snp_rmptable_init(void)
 	return 0;
 }
 early_initcall(snp_rmptable_init);
+
+struct rmpentry *snp_lookup_page_in_rmptable(struct page *page, int *level)
+{
+	unsigned long phys = page_to_pfn(page) << PAGE_SHIFT;
+	struct rmpentry *entry, *large_entry;
+	unsigned long vaddr;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return NULL;
+
+	vaddr = rmptable_start + rmptable_page_offset(phys);
+	if (unlikely(vaddr >= rmptable_end))
+		return NULL;
+
+	entry = (struct rmpentry *)vaddr;
+
+	/* Read a large RMP entry to get the correct page level used in RMP entry. */
+	vaddr = rmptable_start + rmptable_page_offset(phys & PMD_MASK);
+	large_entry = (struct rmpentry *)vaddr;
+	*level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(large_entry));
+
+	return entry;
+}
+EXPORT_SYMBOL_GPL(snp_lookup_page_in_rmptable);
diff --git a/include/linux/sev.h b/include/linux/sev.h
new file mode 100644
index 000000000000..ee038d466786
--- /dev/null
+++ b/include/linux/sev.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD Secure Encrypted Virtualization
+ *
+ * Author: Brijesh Singh <brijesh.singh@amd.com>
+ */
+
+#ifndef __LINUX_SEV_H
+#define __LINUX_SEV_H
+
+struct __packed rmpentry {
+	union {
+		struct {
+			u64 assigned:1;
+			u64 pagesize:1;
+			u64 immutable:1;
+			u64 rsvd1:9;
+			u64 gpa:39;
+			u64 asid:10;
+			u64 vmsa:1;
+			u64 validated:1;
+			u64 rsvd2:1;
+		} info;
+		u64 low;
+	};
+	u64 high;
+};
+
+#define rmpentry_assigned(x)	((x)->info.assigned)
+#define rmpentry_pagesize(x)	((x)->info.pagesize)
+#define rmpentry_vmsa(x)	((x)->info.vmsa)
+#define rmpentry_asid(x)	((x)->info.asid)
+#define rmpentry_validated(x)	((x)->info.validated)
+#define rmpentry_gpa(x)		((unsigned long)(x)->info.gpa)
+#define rmpentry_immutable(x)	((x)->info.immutable)
+
+/* RMP page size */
+#define RMP_PG_SIZE_2M			1
+#define RMP_PG_SIZE_4K			0
+
+/* Macro to convert the x86 page level to the RMP level and vice versa */
+#define X86_TO_RMP_PG_LEVEL(level)	(((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
+#define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+struct rmpentry *snp_lookup_page_in_rmptable(struct page *page, int *level);
+#else
+static inline struct rmpentry *snp_lookup_page_in_rmptable(struct page *page, int *level)
+{
+	return NULL;
+}
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+#endif /* __LINUX_SEV_H */
-- 
2.17.1



* [PATCH Part2 RFC v2 07/37] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (5 preceding siblings ...)
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 06/37] x86/sev: Add RMP entry lookup helpers Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 08/37] x86/sev: Split the physmap when adding the page in RMP table Brijesh Singh
                   ` (29 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
hypervisor will use the instruction to add pages to the RMP table. See
APM3 for details on the instruction operations.

The PSMASH instruction expands a 2MB RMP entry into a corresponding set of
contiguous 4KB-Page RMP entries. The hypervisor will use this instruction
to adjust the RMP entry without invalidating the previous RMP entry.
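
As a rough illustration of the PSMASH expansion described above, the following
userspace sketch models an RMP as a plain array. Everything in it is
hypothetical: the model_* names, the simplified entry layout, and the return
codes are made up for the example. The assumption that each 4KB entry inherits
the attributes of the 2MB entry, with the GPA advanced by 4KB per entry, should
be checked against the APM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGES_PER_2M	512u
#define MODEL_PG_SIZE_4K	0
#define MODEL_PG_SIZE_2M	1

/* Hypothetical, simplified RMP entry; this is not the hardware layout. */
struct model_rmpentry {
	uint64_t gpa;
	uint32_t asid;
	uint8_t assigned;
	uint8_t pagesize;
};

/* Hypothetical return codes for the model only. */
enum model_psmash_rc {
	MODEL_PSMASH_OK = 0,
	MODEL_PSMASH_BAD_INPUT,	/* pfn is not 2MB aligned */
	MODEL_PSMASH_NOT_2M,	/* entry is not a 2MB entry */
};

/* Expand the 2MB entry at 'pfn' into 512 contiguous 4KB entries. */
static int model_psmash(struct model_rmpentry *table, uint64_t pfn)
{
	struct model_rmpentry head;
	uint64_t i;

	assert(table != NULL);

	if (pfn % MODEL_PAGES_PER_2M)
		return MODEL_PSMASH_BAD_INPUT;

	head = table[pfn];
	if (head.pagesize != MODEL_PG_SIZE_2M)
		return MODEL_PSMASH_NOT_2M;

	for (i = 0; i < MODEL_PAGES_PER_2M; i++) {
		table[pfn + i] = head;			/* inherit attributes */
		table[pfn + i].pagesize = MODEL_PG_SIZE_4K;
		table[pfn + i].gpa = head.gpa + i * 4096;	/* assumed GPA layout */
	}

	return MODEL_PSMASH_OK;
}
```

The model omits the in-use retry that the real helpers perform when another
processor is modifying the same RMP entry.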

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kernel/sev.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/sev.h   | 27 +++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index dec4f423e232..a8a0c6cd22ca 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1901,3 +1901,45 @@ struct rmpentry *snp_lookup_page_in_rmptable(struct page *page, int *level)
 	return entry;
 }
 EXPORT_SYMBOL_GPL(snp_lookup_page_in_rmptable);
+
+int psmash(struct page *page)
+{
+	unsigned long spa = page_to_pfn(page) << PAGE_SHIFT;
+	int ret;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENXIO;
+
+	/* Retry if another processor is modifying the RMP entry. */
+	do {
+		/* Binutils version 2.36 supports the PSMASH mnemonic. */
+		asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+			      : "=a"(ret)
+			      : "a"(spa)
+			      : "memory", "cc");
+	} while (ret == PSMASH_FAIL_INUSE);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(psmash);
+
+int rmpupdate(struct page *page, struct rmpupdate *val)
+{
+	unsigned long spa = page_to_pfn(page) << PAGE_SHIFT;
+	int ret;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENXIO;
+
+	/* Retry if another processor is modifying the RMP entry. */
+	do {
+		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
+		asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+			     : "=a"(ret)
+			     : "a"(spa), "c"((unsigned long)val)
+			     : "memory", "cc");
+	} while (ret == RMPUPDATE_FAIL_INUSE);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(rmpupdate);
diff --git a/include/linux/sev.h b/include/linux/sev.h
index ee038d466786..9855e881e542 100644
--- a/include/linux/sev.h
+++ b/include/linux/sev.h
@@ -42,13 +42,40 @@ struct __packed rmpentry {
 #define X86_TO_RMP_PG_LEVEL(level)	(((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
 #define RMP_TO_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
 
+/* Return code of RMPUPDATE */
+#define RMPUPDATE_SUCCESS		0
+#define RMPUPDATE_FAIL_INPUT		1
+#define RMPUPDATE_FAIL_PERMISSION	2
+#define RMPUPDATE_FAIL_INUSE		3
+#define RMPUPDATE_FAIL_OVERLAP		4
+
+struct rmpupdate {
+	u64 gpa;
+	u8 assigned;
+	u8 pagesize;
+	u8 immutable;
+	u8 rsvd;
+	u32 asid;
+} __packed;
+
+/* Return code of PSMASH */
+#define PSMASH_SUCCESS			0
+#define PSMASH_FAIL_INPUT		1
+#define PSMASH_FAIL_PERMISSION		2
+#define PSMASH_FAIL_INUSE		3
+#define PSMASH_FAIL_BADADDR		4
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 struct rmpentry *snp_lookup_page_in_rmptable(struct page *page, int *level);
+int psmash(struct page *page);
+int rmpupdate(struct page *page, struct rmpupdate *e);
 #else
 static inline struct rmpentry *snp_lookup_page_in_rmptable(struct page *page, int *level)
 {
 	return NULL;
 }
+static inline int psmash(struct page *page) { return -ENXIO; }
+static inline int rmpupdate(struct page *page, struct rmpupdate *e) { return -ENXIO; }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 #endif /* __LINUX_SEV_H */
-- 
2.17.1



* [PATCH Part2 RFC v2 08/37] x86/sev: Split the physmap when adding the page in RMP table
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (6 preceding siblings ...)
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 07/37] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-05-03 15:07   ` Peter Zijlstra
  2021-05-03 15:15   ` Andy Lutomirski
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 09/37] x86/traps: Define RMP violation #PF error code Brijesh Singh
                   ` (28 subsequent siblings)
  36 siblings, 2 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

The integrity guarantee of SEV-SNP is enforced through the RMP table.
The RMP is used in conjunction with standard x86 and IOMMU page
tables to enforce memory restrictions and page access rights. The
RMP is indexed by system physical address, and is checked at the end
of CPU and IOMMU table walks. The RMP check is enforced as soon as
SEV-SNP is enabled globally in the system. Not every memory access
requires an RMP check. In particular, read accesses from the
hypervisor do not require RMP checks because data confidentiality
is already protected via memory encryption. When hardware encounters
an RMP check failure, it raises a page-fault exception. The RMP bit in
the fault error code can be used to determine whether the fault was due
to an RMP check failure.

A write from the hypervisor goes through the RMP checks. When the
hypervisor writes to a page, hardware checks that the assigned
bit in the RMP is zero (i.e., the page is shared). If the page table entry
that gives the sPA indicates that the target page is a large page, then
all RMP entries for the constituent 4KB pages of the target must have the
assigned bit 0. If any of those entries has the assigned bit set, hardware
raises an RMP violation. To resolve it, split the page table entry
leading to the target page into 4K mappings.
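
The write check described above can be modeled in userspace roughly as
follows. This is a sketch under stated assumptions, not the hardware
algorithm: the RMP is reduced to an array of assigned bits, and the
model_host_write_ok() name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGES_PER_2M	512u

/*
 * Model of the host write check. 'assigned' holds one assigned bit per 4KB
 * page, indexed by pfn. 'mapping_pages' is the size of the host mapping used
 * for the access: 1 for a 4K mapping, 512 for a 2M mapping.
 */
static bool model_host_write_ok(const uint8_t *assigned, uint64_t pfn,
				uint64_t mapping_pages)
{
	uint64_t start, i;

	assert(assigned != NULL);
	assert(mapping_pages == 1 || mapping_pages == MODEL_PAGES_PER_2M);

	/* Every 4KB page covered by the mapping must be unassigned. */
	start = pfn - (pfn % mapping_pages);
	for (i = 0; i < mapping_pages; i++) {
		if (assigned[start + i])
			return false;
	}

	return true;
}
```

With a single guest-owned page at pfn 700, a write to pfn 600 fails in this
model through a 2M mapping but succeeds through a 4K mapping, which is the
effect the split is meant to achieve.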

This poses a challenge in the Linux memory model. The Linux kernel
creates a direct mapping of all the physical memory -- referred to as
the physmap. The physmap may contain a valid mapping of guest-owned pages.
During the page table walk, a host access may hit a large page in which
one of the constituent pages is owned by the guest (i.e., the assigned
bit is set in the RMP). A write to a non-guest page within that large page
will raise an RMP violation. To work around it, call set_memory_4k() to
split the physmap before adding the page to the RMP table. This ensures that
the pages added to the RMP table are mapped as 4K in the physmap.

Splitting the physmap is a temporary solution until the kernel page
fault handler is improved to split the kernel address on demand. One
disadvantage of splitting is that it will eventually break down
the entire physmap unless it is coalesced back into large pages. I am open
to suggestions on approaches we could take to address this problem.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kernel/sev.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a8a0c6cd22ca..60d62c66778b 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1931,6 +1931,12 @@ int rmpupdate(struct page *page, struct rmpupdate *val)
 	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
 		return -ENXIO;
 
+	ret = set_memory_4k((unsigned long)page_to_virt(page), 1);
+	if (ret) {
+		pr_err("Failed to split physical address 0x%lx (%d)\n", spa, ret);
+		return ret;
+	}
+
 	/* Retry if another processor is modifying the RMP entry. */
 	do {
 		/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
-- 
2.17.1



* [PATCH Part2 RFC v2 09/37] x86/traps: Define RMP violation #PF error code
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (7 preceding siblings ...)
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 08/37] x86/sev: Split the physmap when adding the page in RMP table Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address Brijesh Singh
                   ` (27 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

Bit 31 in the page-fault error code will be set when the processor encounters
an RMP violation.

While at it, use the BIT() macro.
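
As a small userspace illustration of the new bit, the sketch below decodes an
error code in the same order as the show_fault_oops() change in this patch.
The MODEL_* names are hypothetical stand-ins for the kernel definitions.
MODEL_BIT() uses an unsigned long constant because, in ISO C, shifting 1 into
bit 31 of a 32-bit int is not well defined; that is presumably one reason to
prefer BIT() here.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's BIT() macro. */
#define MODEL_BIT(nr)	(1UL << (nr))

#define MODEL_PF_PROT	MODEL_BIT(0)
#define MODEL_PF_WRITE	MODEL_BIT(1)
#define MODEL_PF_RSVD	MODEL_BIT(3)
#define MODEL_PF_PK	MODEL_BIT(5)
#define MODEL_PF_RMP	MODEL_BIT(31)

static_assert(MODEL_PF_RMP == 0x80000000UL, "RMP violation is bit 31");

/* Mirror the order of the checks in the show_fault_oops() hunk. */
static const char *model_pf_reason(unsigned long error_code)
{
	if (!(error_code & MODEL_PF_PROT))
		return "not-present page";
	if (error_code & MODEL_PF_RSVD)
		return "reserved bit violation";
	if (error_code & MODEL_PF_PK)
		return "protection keys violation";
	if (error_code & MODEL_PF_RMP)
		return "rmp violation";
	return "permissions violation";
}
```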

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/trap_pf.h | 18 +++++++++++-------
 arch/x86/mm/fault.c            |  1 +
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..29f678701753 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_TRAP_PF_H
 #define _ASM_X86_TRAP_PF_H
 
+#include <vdso/bits.h>  /* BIT() macro */
+
 /*
  * Page fault error code bits:
  *
@@ -12,15 +14,17 @@
  *   bit 4 ==				1: fault was an instruction fetch
  *   bit 5 ==				1: protection keys block access
  *   bit 15 ==				1: SGX MMU page-fault
+ *   bit 31 ==				1: fault was an RMP violation
  */
 enum x86_pf_error_code {
-	X86_PF_PROT	=		1 << 0,
-	X86_PF_WRITE	=		1 << 1,
-	X86_PF_USER	=		1 << 2,
-	X86_PF_RSVD	=		1 << 3,
-	X86_PF_INSTR	=		1 << 4,
-	X86_PF_PK	=		1 << 5,
-	X86_PF_SGX	=		1 << 15,
+	X86_PF_PROT	=		BIT(0),
+	X86_PF_WRITE	=		BIT(1),
+	X86_PF_USER	=		BIT(2),
+	X86_PF_RSVD	=		BIT(3),
+	X86_PF_INSTR	=		BIT(4),
+	X86_PF_PK	=		BIT(5),
+	X86_PF_SGX	=		BIT(15),
+	X86_PF_RMP	=		BIT(31),
 };
 
 #endif /* _ASM_X86_TRAP_PF_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a73347e2cdfc..39d22f6870e1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -545,6 +545,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 		 !(error_code & X86_PF_PROT) ? "not-present page" :
 		 (error_code & X86_PF_RSVD)  ? "reserved bit violation" :
 		 (error_code & X86_PF_PK)    ? "protection keys violation" :
+		 (error_code & X86_PF_RMP)   ? "rmp violation" :
 					       "permissions violation");
 
 	if (!(error_code & X86_PF_USER) && user_mode(regs)) {
-- 
2.17.1



* [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (8 preceding siblings ...)
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 09/37] x86/traps: Define RMP violation #PF error code Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-05-03 14:44   ` Dave Hansen
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 11/37] x86/fault: Add support to handle the RMP fault for user address Brijesh Singh
                   ` (26 subsequent siblings)
  36 siblings, 1 reply; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

When SEV-SNP is enabled globally, a write from the host goes through the
RMP check. When the host writes to pages, hardware checks the following
conditions at the end of page walk:

1. Assigned bit in the RMP table is zero (i.e., the page is shared).
2. If the page table entry that gives the sPA indicates that the target
   page size is a large page, then all RMP entries for the 4KB
   constituting pages of the target must have the assigned bit 0.
3. Immutable bit in the RMP table is not zero.

The hardware will raise a page fault if one of the above conditions is not
met. A host should not encounter an RMP fault in normal execution, but
a malicious guest could trick the hypervisor into one. For example, if a
guest does not make the GHCB page shared, then on #VMGEXIT the hypervisor
will attempt to write to the GHCB page.

Try resolving the fault instead of crashing the host. To resolve it,
forcefully clear the assigned bit from the RMP entry to make the page
shared so that the write succeeds.  If the fault handler cannot resolve
the RMP violation, then dump the RMP entry for debugging purposes.
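
The order of the checks in the handler added below can be summarized with the
following userspace sketch. It only models the decision logic. The model_*
names, the plan structure, and the level values are hypothetical, and no RMP
instruction is executed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page levels; only the 4K < 2M ordering matters here. */
enum model_level { MODEL_LEVEL_4K = 1, MODEL_LEVEL_2M = 2 };

enum model_kern_result { MODEL_KERN_RETRY, MODEL_KERN_KILL };

/* What the handler would have to do before the access can be retried. */
struct model_kern_plan {
	bool need_psmash;	/* split a 2M RMP entry into 4K entries first */
	bool clear_assigned;	/* rewrite the RMP entry as shared */
};

static enum model_kern_result
model_kern_rmp_fault(int host_level, int rmp_level, bool immutable,
		     struct model_kern_plan *plan)
{
	assert(plan != NULL);
	plan->need_psmash = false;
	plan->clear_assigned = false;

	/* An immutable page cannot be converted to shared. */
	if (immutable)
		return MODEL_KERN_KILL;

	/* Splitting a kernel mapping in the fault path is not supported. */
	if (host_level > rmp_level)
		return MODEL_KERN_KILL;

	/* The RMP entry is larger than the host mapping: smash it first. */
	if (rmp_level > host_level)
		plan->need_psmash = true;

	plan->clear_assigned = true;
	return MODEL_KERN_RETRY;
}
```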

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/mm/fault.c | 146 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 39d22f6870e1..d833fe84010f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -19,6 +19,7 @@
 #include <linux/uaccess.h>		/* faulthandler_disabled()	*/
 #include <linux/efi.h>			/* efi_crash_gracefully_on_page_fault()*/
 #include <linux/mm_types.h>
+#include <linux/sev.h>			/* snp_lookup_page_in_rmptable() */
 
 #include <asm/cpufeature.h>		/* boot_cpu_has, ...		*/
 #include <asm/traps.h>			/* dotraplinkage, ...		*/
@@ -1132,6 +1133,145 @@ bool fault_in_kernel_space(unsigned long address)
 	return address >= TASK_SIZE_MAX;
 }
 
+#define RMP_FAULT_RETRY		0
+#define RMP_FAULT_KILL		1
+#define RMP_FAULT_PAGE_SPLIT	2
+
+static inline size_t pages_per_hpage(int level)
+{
+	return page_level_size(level) / PAGE_SIZE;
+}
+
+static void dump_rmpentry(unsigned long pfn)
+{
+	struct rmpentry *e;
+	int level;
+
+	e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &level);
+
+	/*
+	 * If the RMP entry at the faulting address was not assigned, then the dump may
+	 * not provide any useful debug information. Iterate through the entire 2MB
+	 * region, and dump every RMP entry that has at least one bit set.
+	 */
+	if (rmpentry_assigned(e)) {
+		pr_alert("RMPEntry paddr 0x%lx [assigned=%d immutable=%d pagesize=%d gpa=0x%lx"
+			" asid=%d vmsa=%d validated=%d]\n", pfn << PAGE_SHIFT,
+			rmpentry_assigned(e), rmpentry_immutable(e), rmpentry_pagesize(e),
+			rmpentry_gpa(e), rmpentry_asid(e), rmpentry_vmsa(e),
+			rmpentry_validated(e));
+
+		pr_alert("RMPEntry paddr 0x%lx %016llx %016llx\n", pfn << PAGE_SHIFT,
+			e->high, e->low);
+	} else {
+		unsigned long pfn_end;
+
+		pfn = pfn & ~0x1ff;
+		pfn_end = pfn + PTRS_PER_PMD;
+
+		while (pfn < pfn_end) {
+			e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &level);
+
+			if (unlikely(!e))
+				return;
+
+			if (e->low || e->high)
+				pr_alert("RMPEntry paddr 0x%lx: %016llx %016llx\n",
+					pfn << PAGE_SHIFT, e->high, e->low);
+			pfn++;
+		}
+	}
+}
+
+/*
+ * Called for all faults where 'address' is part of the kernel address space.
+ * The function returns RMP_FAULT_RETRY when it is able to resolve the fault and
+ * it is safe to retry.
+ */
+static int handle_kern_rmp_fault(unsigned long hw_error_code, unsigned long address)
+{
+	int ret, level, rmp_level, mask;
+	struct rmpupdate val = {};
+	struct rmpentry *e;
+	unsigned long pfn;
+	pgd_t *pgd;
+	pte_t *pte;
+
+	if (unlikely(!cpu_feature_enabled(X86_FEATURE_SEV_SNP)))
+		return RMP_FAULT_KILL;
+
+	pgd = __va(read_cr3_pa());
+	pgd += pgd_index(address);
+
+	pte = lookup_address_in_pgd(pgd, address, &level);
+
+	if (unlikely(!pte))
+		return RMP_FAULT_KILL;
+
+	switch(level) {
+	case PG_LEVEL_4K: pfn = pte_pfn(*pte); break;
+	case PG_LEVEL_2M: pfn = pmd_pfn(*(pmd_t *)pte); break;
+	case PG_LEVEL_1G: pfn = pud_pfn(*(pud_t *)pte); break;
+	case PG_LEVEL_512G: pfn = p4d_pfn(*(p4d_t *)pte); break;
+	default: return RMP_FAULT_KILL;
+	}
+
+	/* Calculate the PFN within large page. */
+	if (level > PG_LEVEL_4K) {
+		mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
+		pfn |= (address >> PAGE_SHIFT) & mask;
+	}
+
+	e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
+	if (unlikely(!e))
+		return RMP_FAULT_KILL;
+
+	/*
+	 * If the immutable bit is set, we cannot convert the page to shared
+	 * to resolve the fault.
+	 */
+	if (rmpentry_immutable(e))
+		goto e_dump_rmpentry;
+
+	/*
+	 * If the host page level is greater than the RMP page level, then the only way
+	 * to resolve the fault is to split the address. We don't support splitting
+	 * kernel addresses in the fault path yet.
+	 */
+	if (level > rmp_level)
+		goto e_dump_rmpentry;
+
+	/*
+	 * If the RMP page level is higher than host page level then use the PSMASH
+	 * to split the RMP large entry into 512 4K entries.
+	 */
+	if (rmp_level > level) {
+		ret = psmash(pfn_to_page(pfn & ~0x1FF));
+		if (ret) {
+			pr_alert("Failed to psmash pfn 0x%lx (rc %d)\n", pfn, ret);
+			goto e_dump_rmpentry;
+		}
+	}
+
+	/* Log that the RMP fault handler is clearing the assigned bit. */
+	if (rmpentry_assigned(e))
+		pr_alert("Force address %lx from assigned -> unassigned in RMP table\n", address);
+
+	/* Clear the assigned bit from the RMP table. */
+	ret = rmpupdate(pfn_to_page(pfn), &val);
+	if (ret) {
+		pr_alert("Failed to unassign address 0x%lx in RMP table\n", address);
+		goto e_dump_rmpentry;
+	}
+
+	return RMP_FAULT_RETRY;
+
+e_dump_rmpentry:
+
+	dump_rmpentry(pfn);
+	return RMP_FAULT_KILL;
+}
+
 /*
  * Called for all faults where 'address' is part of the kernel address
  * space.  Might get called for faults that originate from *code* that
@@ -1179,6 +1319,12 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 	}
 #endif
 
+	/* Try resolving the RMP fault. */
+	if (hw_error_code & X86_PF_RMP) {
+		if (handle_kern_rmp_fault(hw_error_code, address) == RMP_FAULT_RETRY)
+			return;
+	}
+
 	if (is_f00f_bug(regs, hw_error_code, address))
 		return;
 
-- 
2.17.1



* [PATCH Part2 RFC v2 11/37] x86/fault: Add support to handle the RMP fault for user address
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (9 preceding siblings ...)
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 12/37] crypto:ccp: Define the SEV-SNP commands Brijesh Singh
                   ` (25 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

When SEV-SNP is enabled globally, a write from the host goes through the
RMP check. When the host writes to pages, hardware checks the following
conditions at the end of page walk:

1. Assigned bit in the RMP table is zero (i.e., the page is shared).
2. If the page table entry that gives the sPA indicates that the target
   page size is a large page, then all RMP entries for the 4KB
   constituting pages of the target must have the assigned bit 0.
3. Immutable bit in the RMP table is not zero.

The hardware will raise a page fault if one of the above conditions is not
met. Try resolving the fault instead of taking the fault again and again. If
the host attempts to write to guest private memory, then send the
SIGBUS signal to kill the process. If the page levels of the host mapping and
the RMP entry do not match, then split the address to keep the RMP and host
page levels in sync.
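
The three outcomes described above map onto a small decision function. The
following userspace sketch mirrors the order of the checks in the handler
added by this patch. The model_* names and the numeric level encoding are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum model_user_result {
	MODEL_USER_RETRY,	/* nothing to fix, retry the access */
	MODEL_USER_KILL,	/* write to guest private memory: SIGBUS */
	MODEL_USER_PAGE_SPLIT,	/* host mapping is larger than the RMP entry */
};

/* Levels are small positive integers where a larger value is a larger page. */
static enum model_user_result
model_user_rmp_fault(int host_level, int rmp_level, bool assigned)
{
	assert(host_level >= 1 && rmp_level >= 1);

	/* A guest private page cannot be written by the host. */
	if (assigned)
		return MODEL_USER_KILL;

	/* Shared page backed by a larger host mapping: ask for a split. */
	if (host_level > rmp_level)
		return MODEL_USER_PAGE_SPLIT;

	return MODEL_USER_RETRY;
}
```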

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/mm/fault.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h  |  6 ++++-
 mm/memory.c         | 13 ++++++++++
 3 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d833fe84010f..4441f5332c2c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1348,6 +1348,49 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 }
 NOKPROBE_SYMBOL(do_kern_addr_fault);
 
+static int handle_user_rmp_page_fault(unsigned long hw_error_code, unsigned long address)
+{
+	unsigned long pfn, mask;
+	int rmp_level, level;
+	struct rmpentry *e;
+	pte_t *pte;
+
+	if (unlikely(!cpu_feature_enabled(X86_FEATURE_SEV_SNP)))
+		return RMP_FAULT_KILL;
+
+	/* Get the native page level */
+	pte = lookup_address_in_mm(current->mm, address, &level);
+	if (unlikely(!pte))
+		return RMP_FAULT_KILL;
+
+	pfn = pte_pfn(*pte);
+	if (level > PG_LEVEL_4K) {
+		mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
+		pfn |= (address >> PAGE_SHIFT) & mask;
+	}
+
+	/* Get the page level from the RMP entry. */
+	e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
+	if (!e)
+		return RMP_FAULT_KILL;
+
+	/*
+	 * Check if the RMP violation is due to the guest private page access. We can
+	 * not resolve this RMP fault, ask to kill the guest.
+	 */
+	if (rmpentry_assigned(e))
+		return RMP_FAULT_KILL;
+
+	/*
+	 * It's a guest shared page. If the backing page level is higher than the RMP
+	 * page level, request a page split.
+	 */
+	if (level > rmp_level)
+		return RMP_FAULT_PAGE_SPLIT;
+
+	return RMP_FAULT_RETRY;
+}
+
 /*
  * Handle faults in the user portion of the address space.  Nothing in here
  * should check X86_PF_USER without a specific justification: for almost
@@ -1365,6 +1408,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	vm_fault_t fault;
+	int ret;
 	unsigned int flags = FAULT_FLAG_DEFAULT;
 
 	tsk = current;
@@ -1445,6 +1489,22 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+	/*
+	 * If it's an RMP violation, try resolving it.
+	 */
+	if (error_code & X86_PF_RMP) {
+		ret = handle_user_rmp_page_fault(error_code, address);
+		if (ret == RMP_FAULT_PAGE_SPLIT) {
+			flags |= FAULT_FLAG_PAGE_SPLIT;
+		} else if (ret == RMP_FAULT_KILL) {
+			fault |= VM_FAULT_SIGBUS;
+			do_sigbus(regs, error_code, address, fault);
+			return;
+		} else {
+			return;
+		}
+	}
+
 #ifdef CONFIG_X86_64
 	/*
 	 * Faults in the vsyscall page might need emulation.  The
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8ba434287387..b37d9d8aae3b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -434,6 +434,8 @@ extern pgprot_t protection_map[16];
  * @FAULT_FLAG_REMOTE: The fault is not for current task/mm.
  * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch.
  * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals.
+ * @FAULT_FLAG_PAGE_SPLIT: The fault was due to a page size mismatch; split the
+ *  region to a smaller page size and retry.
  *
  * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
  * whether we would allow page faults to retry by specifying these two
@@ -464,6 +466,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_REMOTE			0x80
 #define FAULT_FLAG_INSTRUCTION  		0x100
 #define FAULT_FLAG_INTERRUPTIBLE		0x200
+#define FAULT_FLAG_PAGE_SPLIT			0x400
 
 /*
  * The default fault flags that should be used by most of the
@@ -501,7 +504,8 @@ static inline bool fault_flag_allow_retry_first(unsigned int flags)
 	{ FAULT_FLAG_USER,		"USER" }, \
 	{ FAULT_FLAG_REMOTE,		"REMOTE" }, \
 	{ FAULT_FLAG_INSTRUCTION,	"INSTRUCTION" }, \
-	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }
+	{ FAULT_FLAG_INTERRUPTIBLE,	"INTERRUPTIBLE" }, \
+	{ FAULT_FLAG_PAGE_SPLIT,	"PAGESPLIT" }
 
 /*
  * vm_fault is filled by the pagefault handler and passed to the vma's
diff --git a/mm/memory.c b/mm/memory.c
index 550405fc3b5e..21ec049e21ad 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4358,6 +4358,15 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	return 0;
 }
 
+static int handle_split_page_fault(struct vm_fault *vmf)
+{
+	if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+		return VM_FAULT_SIGBUS;
+
+	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
+	return 0;
+}
+
 /*
  * By the time we get here, we already hold the mm semaphore
  *
@@ -4435,6 +4444,10 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 				pmd_migration_entry_wait(mm, vmf.pmd);
 			return 0;
 		}
+
+		if (flags & FAULT_FLAG_PAGE_SPLIT)
+			return handle_split_page_fault(&vmf);
+
 		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
 			if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
 				return do_huge_pmd_numa_page(&vmf, orig_pmd);
-- 
2.17.1



* [PATCH Part2 RFC v2 12/37] crypto:ccp: Define the SEV-SNP commands
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (10 preceding siblings ...)
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 11/37] x86/fault: Add support to handle the RMP fault for user address Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 13/37] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Brijesh Singh
                   ` (24 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
while adding new hardware security protection.

Define the commands and structures used to communicate with the AMD-SP
when creating and managing the SEV-SNP guests. The SEV-SNP firmware spec
is available at developer.amd.com/sev.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/crypto/ccp/sev-dev.c |  16 ++-
 include/linux/psp-sev.h      | 222 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/psp-sev.h |  44 +++++++
 3 files changed, 281 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 6ee703176049..09d117b99bf5 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -129,7 +129,21 @@ static int sev_cmd_buffer_len(int cmd)
 	case SEV_CMD_DOWNLOAD_FIRMWARE:		return sizeof(struct sev_data_download_firmware);
 	case SEV_CMD_GET_ID:			return sizeof(struct sev_data_get_id);
 	case SEV_CMD_ATTESTATION_REPORT:	return sizeof(struct sev_data_attestation_report);
-	case SEV_CMD_SEND_CANCEL:			return sizeof(struct sev_data_send_cancel);
+	case SEV_CMD_SEND_CANCEL:		return sizeof(struct sev_data_send_cancel);
+	case SEV_CMD_SNP_GCTX_CREATE:		return sizeof(struct sev_data_snp_gctx_create);
+	case SEV_CMD_SNP_LAUNCH_START:		return sizeof(struct sev_data_snp_launch_start);
+	case SEV_CMD_SNP_LAUNCH_UPDATE:		return sizeof(struct sev_data_snp_launch_update);
+	case SEV_CMD_SNP_ACTIVATE:		return sizeof(struct sev_data_snp_activate);
+	case SEV_CMD_SNP_DECOMMISSION:		return sizeof(struct sev_data_snp_decommission);
+	case SEV_CMD_SNP_PAGE_RECLAIM:		return sizeof(struct sev_data_snp_page_reclaim);
+	case SEV_CMD_SNP_GUEST_STATUS:		return sizeof(struct sev_data_snp_guest_status);
+	case SEV_CMD_SNP_LAUNCH_FINISH:		return sizeof(struct sev_data_snp_launch_finish);
+	case SEV_CMD_SNP_DBG_DECRYPT:		return sizeof(struct sev_data_snp_dbg);
+	case SEV_CMD_SNP_DBG_ENCRYPT:		return sizeof(struct sev_data_snp_dbg);
+	case SEV_CMD_SNP_PAGE_UNSMASH:		return sizeof(struct sev_data_snp_page_unsmash);
+	case SEV_CMD_SNP_PLATFORM_STATUS:	return sizeof(struct sev_data_snp_platform_status_buf);
+	case SEV_CMD_SNP_GUEST_REQUEST:		return sizeof(struct sev_data_snp_guest_request);
+	case SEV_CMD_SNP_CONFIG:		return sizeof(struct sev_data_snp_config);
 	default:				return 0;
 	}
 
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index d48a7192e881..c3755099ab55 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -85,6 +85,34 @@ enum sev_cmd {
 	SEV_CMD_DBG_DECRYPT		= 0x060,
 	SEV_CMD_DBG_ENCRYPT		= 0x061,
 
+	/* SNP specific commands */
+	SEV_CMD_SNP_INIT		= 0x81,
+	SEV_CMD_SNP_SHUTDOWN		= 0x82,
+	SEV_CMD_SNP_PLATFORM_STATUS	= 0x83,
+	SEV_CMD_SNP_DF_FLUSH		= 0x84,
+	SEV_CMD_SNP_INIT_EX		= 0x85,
+	SEV_CMD_SNP_DECOMMISSION	= 0x90,
+	SEV_CMD_SNP_ACTIVATE		= 0x91,
+	SEV_CMD_SNP_GUEST_STATUS	= 0x92,
+	SEV_CMD_SNP_GCTX_CREATE		= 0x93,
+	SEV_CMD_SNP_GUEST_REQUEST	= 0x94,
+	SEV_CMD_SNP_ACTIVATE_EX		= 0x95,
+	SEV_CMD_SNP_LAUNCH_START	= 0xA0,
+	SEV_CMD_SNP_LAUNCH_UPDATE	= 0xA1,
+	SEV_CMD_SNP_LAUNCH_FINISH	= 0xA2,
+	SEV_CMD_SNP_DBG_DECRYPT		= 0xB0,
+	SEV_CMD_SNP_DBG_ENCRYPT		= 0xB1,
+	SEV_CMD_SNP_PAGE_SWAP_OUT	= 0xC0,
+	SEV_CMD_SNP_PAGE_SWAP_IN	= 0xC1,
+	SEV_CMD_SNP_PAGE_MOVE		= 0xC2,
+	SEV_CMD_SNP_PAGE_MD_INIT	= 0xC3,
+	SEV_CMD_SNP_PAGE_MD_RECLAIM	= 0xC4,
+	SEV_CMD_SNP_PAGE_RO_RECLAIM	= 0xC5,
+	SEV_CMD_SNP_PAGE_RO_RESTORE	= 0xC6,
+	SEV_CMD_SNP_PAGE_RECLAIM	= 0xC7,
+	SEV_CMD_SNP_PAGE_UNSMASH	= 0xC8,
+	SEV_CMD_SNP_CONFIG		= 0xC9,
+
 	SEV_CMD_MAX,
 };
 
@@ -510,6 +538,200 @@ struct sev_data_attestation_report {
 	u32 len;				/* In/Out */
 } __packed;
 
+/**
+ * struct sev_data_snp_platform_status_buf - SNP_PLATFORM_STATUS command params
+ *
+ * @status_paddr: physical address where the status should be copied
+ */
+struct sev_data_snp_platform_status_buf {
+	u64 status_paddr;			/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_download_firmware - SNP_DOWNLOAD_FIRMWARE command params
+ *
+ * @address: physical address of firmware image
+ * @len: len of the firmware image
+ */
+struct sev_data_snp_download_firmware {
+	u64 address;				/* In */
+	u32 len;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_gctx_create - SNP_GCTX_CREATE command params
+ *
+ * @gctx_paddr: system physical address of the page donated to firmware by
+ *		the hypervisor to contain the guest context.
+ */
+struct sev_data_snp_gctx_create {
+	u64 gctx_paddr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_activate - SNP_ACTIVATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @asid: ASID to bind to the guest
+ */
+struct sev_data_snp_activate {
+	u64 gctx_paddr;				/* In */
+	u32 asid;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_decommission - SNP_DECOMMISSION command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ */
+struct sev_data_snp_decommission {
+	u64 gctx_paddr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_start - SNP_LAUNCH_START command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @policy: guest policy
+ * @ma_gctx_paddr: system physical address of migration agent
+ * @imi_en: launch flow is launching an IMI for the purpose of
+ *   guest-assisted migration.
+ * @ma_en: the guest is associated with a migration agent
+ */
+struct sev_data_snp_launch_start {
+	u64 gctx_paddr;				/* In */
+	u64 policy;				/* In */
+	u64 ma_gctx_paddr;			/* In */
+	u32 ma_en:1;				/* In */
+	u32 imi_en:1;				/* In */
+	u32 rsvd:30;
+	u8 gosvw[16];				/* In */
+} __packed;
+
+/* SNP support page type */
+enum {
+	SNP_PAGE_TYPE_NORMAL		= 0x1,
+	SNP_PAGE_TYPE_VMSA		= 0x2,
+	SNP_PAGE_TYPE_ZERO		= 0x3,
+	SNP_PAGE_TYPE_UNMEASURED	= 0x4,
+	SNP_PAGE_TYPE_SECRET		= 0x5,
+	SNP_PAGE_TYPE_CPUID		= 0x6,
+
+	SNP_PAGE_TYPE_MAX
+};
+
+/**
+ * struct sev_data_snp_launch_update - SNP_LAUNCH_UPDATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @imi_page: indicates that this page is part of the IMI of the guest
+ * @page_type: encoded page type
+ * @page_size: page size 0 indicates 4K and 1 indicates 2MB page
+ * @address: system physical address of destination page to encrypt
+ * @vmpl3_perms: VMPL permission mask for VMPL3
+ * @vmpl2_perms: VMPL permission mask for VMPL2
+ * @vmpl1_perms: VMPL permission mask for VMPL1
+ */
+struct sev_data_snp_launch_update {
+	u64 gctx_paddr;				/* In */
+	u32 page_size:1;			/* In */
+	u32 page_type:3;			/* In */
+	u32 imi_page:1;				/* In */
+	u32 rsvd:27;
+	u32 rsvd2;
+	u64 address;				/* In */
+	u32 rsvd3:8;
+	u32 vmpl3_perms:8;			/* In */
+	u32 vmpl2_perms:8;			/* In */
+	u32 vmpl1_perms:8;			/* In */
+	u32 rsvd4;
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_finish - SNP_LAUNCH_FINISH command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ */
+struct sev_data_snp_launch_finish {
+	u64 gctx_paddr;
+	u64 id_block_paddr;
+	u64 id_auth_paddr;
+	u8 id_block_en:1;
+	u8 auth_key_en:1;
+	u64 rsvd:62;
+	u8 host_data[32];
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_status - SNP_GUEST_STATUS command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @address: system physical address of guest status page
+ */
+struct sev_data_snp_guest_status {
+	u64 gctx_paddr;
+	u64 address;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_reclaim - SNP_PAGE_RECLAIM command params
+ *
+ * @paddr: system physical address of page to be reclaimed. BIT0 indicates
+ *	the page size. 0h indicates 4 kB and 1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_reclaim {
+	u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_unsmash - SNP_PAGE_UNSMASH command params
+ *
+ * @paddr: system physical address of page to be unsmashed. BIT0 indicates
+ *	the page size. 0h indicates 4 kB and 1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_unsmash {
+	u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_dbg - DBG_ENCRYPT/DBG_DECRYPT command parameters
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @src_addr: source address of data to operate on
+ * @dst_addr: destination address of data to operate on
+ * @len: len of data to operate on
+ */
+struct sev_data_snp_dbg {
+	u64 gctx_paddr;				/* In */
+	u64 src_addr;				/* In */
+	u64 dst_addr;				/* In */
+	u32 len;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_request - SNP_GUEST_REQUEST command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @req_paddr: system physical address of request page
+ * @res_paddr: system physical address of response page
+ */
+struct sev_data_snp_guest_request {
+	u64 gctx_paddr;				/* In */
+	u64 req_paddr;				/* In */
+	u64 res_paddr;				/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_init_ex - SNP_INIT_EX structure
+ *
+ * @init_rmp: indicate that the RMP should be initialized.
+ */
+struct sev_data_snp_init_ex {
+	u32 init_rmp:1;
+	u32 rsvd:31;
+	u8 rsvd1[60];
+} __packed;
+
 #ifdef CONFIG_CRYPTO_DEV_SP_PSP
 
 /**
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 91b4c63d5cbf..f6d02d4dd014 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -28,6 +28,8 @@ enum {
 	SEV_PEK_CERT_IMPORT,
 	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
 	SEV_GET_ID2,
+	SNP_PLATFORM_STATUS,
+	SNP_CONFIG,
 
 	SEV_MAX,
 };
@@ -61,6 +63,13 @@ typedef enum {
 	SEV_RET_INVALID_PARAM,
 	SEV_RET_RESOURCE_LIMIT,
 	SEV_RET_SECURE_DATA_INVALID,
+	SEV_RET_INVALID_PAGE_SIZE,
+	SEV_RET_INVALID_PAGE_STATE,
+	SEV_RET_INVALID_MDATA_ENTRY,
+	SEV_RET_INVALID_PAGE_OWNER,
+	SEV_RET_INVALID_PAGE_AEAD_OFLOW,
+	SEV_RET_RMP_INIT_REQUIRED,
+
 	SEV_RET_MAX,
 } sev_ret_code;
 
@@ -147,6 +156,41 @@ struct sev_user_data_get_id2 {
 	__u32 length;				/* In/Out */
 } __packed;
 
+/**
+ * struct sev_user_snp_status - Platform status
+ *
+ * @api_major: API major version
+ * @api_minor: API minor version
+ * @state: current platform state
+ * @build_id: firmware build id for the API version
+ * @guest_count: the number of guests currently managed by the firmware
+ * @tcb_version: current TCB version
+ */
+struct sev_user_snp_status {
+	__u8 api_major;		/* Out */
+	__u8 api_minor;		/* Out */
+	__u8 state;		/* Out */
+	__u8 rsvd;
+	__u32 build_id;		/* Out */
+	__u32 rsvd1;
+	__u32 guest_count;	/* Out */
+	__u64 tcb_version;	/* Out */
+	__u64 rsvd2;
+} __packed;
+
+/**
+ * struct sev_data_snp_config - system wide configuration value for SNP.
+ *
+ * @reported_tcb: The TCB version to report in the guest attestation report.
+ * @mask_chip_id: Indicates that the CHIP_ID field in the attestation report
+ *  will always be zero.
+ */
+struct sev_data_snp_config {
+	__u64 reported_tcb;	/* In */
+	__u32 mask_chip_id;	/* In */
+	__u8 rsvd[52];
+} __packed;
+
 /**
  * struct sev_issue_cmd - SEV ioctl parameters
  *
-- 
2.17.1
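
The comment on struct sev_data_snp_page_reclaim above says that BIT0 of
paddr carries the page size (0h for 4 kB, 1h for 2 MB). A minimal
user-space sketch of that encoding follows. The helper names, the SKETCH_*
constants, and the alignment assertion are assumptions made for
illustration; none of them appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for this sketch; not taken from the patch. */
#define SKETCH_PAGE_SIZE_4K	(1ULL << 12)
#define SKETCH_PAGE_SIZE_2M	(1ULL << 21)
#define SKETCH_RECLAIM_2M_BIT	1ULL	/* BIT0: 0h = 4 kB, 1h = 2 MB */

/* Build the paddr value for a reclaim request from an aligned address. */
static uint64_t sketch_reclaim_paddr(uint64_t spa, bool is_2mb)
{
	uint64_t align = is_2mb ? SKETCH_PAGE_SIZE_2M : SKETCH_PAGE_SIZE_4K;

	/* Assumption: the address is aligned, so BIT0 is free for the flag. */
	assert((spa & (align - 1)) == 0);

	return is_2mb ? (spa | SKETCH_RECLAIM_2M_BIT) : spa;
}

/* Recover the page size flag from an encoded value. */
static bool sketch_reclaim_is_2mb(uint64_t paddr)
{
	return (paddr & SKETCH_RECLAIM_2M_BIT) != 0;
}
```

Folding the size into the address keeps the command buffer at a single
u64, which matches the one-field layout of the struct in the patch.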



* [PATCH Part2 RFC v2 13/37] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (11 preceding siblings ...)
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 12/37] crypto:ccp: Define the SEV-SNP commands Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 14/37] crypto: ccp: Shutdown SNP firmware on kexec Brijesh Singh
                   ` (23 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

Before SNP VMs can be launched, the platform must be appropriately
configured and initialized. Platform initialization is accomplished via
the SNP_INIT command.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 107 ++++++++++++++++++++++++++++++++++-
 drivers/crypto/ccp/sev-dev.h |   2 +
 include/linux/psp-sev.h      |  16 ++++++
 3 files changed, 123 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 09d117b99bf5..852bbeac1019 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -590,6 +590,92 @@ static int sev_update_firmware(struct device *dev)
 	return ret;
 }
 
+static void snp_set_hsave_pa(void *arg)
+{
+	wrmsrl(MSR_VM_HSAVE_PA, 0);
+}
+
+static int __sev_snp_init_locked(int *error)
+{
+	struct psp_device *psp = psp_master;
+	struct sev_device *sev;
+	int rc = 0;
+
+	if (!psp || !psp->sev_data)
+		return -ENODEV;
+
+	sev = psp->sev_data;
+
+	if (sev->snp_inited)
+		return 0;
+
+	/* SNP_INIT requires that MSR_VM_HSAVE_PA be set to 0h across all cores. */
+	on_each_cpu(snp_set_hsave_pa, NULL, 1);
+
+	/* Prepare for first SEV guest launch after INIT */
+	wbinvd_on_all_cpus();
+
+	/* Issue the SNP_INIT firmware command. */
+	rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT, NULL, error);
+	if (rc)
+		return rc;
+
+	sev->snp_inited = true;
+	dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
+
+	return rc;
+}
+
+int sev_snp_init(int *error)
+{
+	int rc;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return -ENODEV;
+
+	mutex_lock(&sev_cmd_mutex);
+	rc = __sev_snp_init_locked(error);
+	mutex_unlock(&sev_cmd_mutex);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(sev_snp_init);
+
+static int __sev_snp_shutdown_locked(int *error)
+{
+	struct sev_device *sev = psp_master->sev_data;
+	int ret;
+
+	if (!sev->snp_inited)
+		return 0;
+
+	ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN, NULL, error);
+	if (ret)
+		return ret;
+
+	wbinvd_on_all_cpus();
+
+	ret = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+	if (ret)
+		dev_err(sev->dev, "SEV-SNP firmware DF_FLUSH failed\n");
+
+	sev->snp_inited = false;
+	dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
+
+	return ret;
+}
+
+static int sev_snp_shutdown(int *error)
+{
+	int rc;
+
+	mutex_lock(&sev_cmd_mutex);
+	rc = __sev_snp_shutdown_locked(error);
+	mutex_unlock(&sev_cmd_mutex);
+
+	return rc;
+}
+
 static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -1089,6 +1175,21 @@ void sev_pci_init(void)
 			 "SEV: TMR allocation failed, SEV-ES support unavailable\n");
 	}
 
+	/*
+	 * If boot CPU supports the SNP, then let first attempt to initialize
+	 * the SNP firmware.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SEV_SNP)) {
+		rc = sev_snp_init(&error);
+		if (rc) {
+			/*
+			 * If we failed to INIT SNP then don't abort the probe.
+			 * Continue to initialize the legacy SEV firmware.
+			 */
+			dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
+		}
+	}
+
 	/* Initialize the platform */
 	rc = sev_platform_init(&error);
 	if (rc && (error == SEV_RET_SECURE_DATA_INVALID)) {
@@ -1108,8 +1209,8 @@ void sev_pci_init(void)
 		return;
 	}
 
-	dev_info(sev->dev, "SEV API:%d.%d build:%d\n", sev->api_major,
-		 sev->api_minor, sev->build);
+	dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_inited ?
+		"-SNP" : "", sev->api_major, sev->api_minor, sev->build);
 
 	return;
 
@@ -1132,4 +1233,6 @@ void sev_pci_exit(void)
 			   get_order(SEV_ES_TMR_SIZE));
 		sev_es_tmr = NULL;
 	}
+
+	sev_snp_shutdown(NULL);
 }
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 666c21eb81ab..186ad20cbd24 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -52,6 +52,8 @@ struct sev_device {
 	u8 build;
 
 	void *cmd_buf;
+
+	bool snp_inited;
 };
 
 int sev_dev_init(struct psp_device *psp);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index c3755099ab55..1b53e8782250 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -748,6 +748,20 @@ struct sev_data_snp_init_ex {
  */
 int sev_platform_init(int *error);
 
+/**
+ * sev_snp_init - perform SEV SNP_INIT command
+ *
+ * @error: SEV command return code
+ *
+ * Returns:
+ * 0 if the SEV successfully processed the command
+ * -%ENODEV    if the SEV device is not available
+ * -%ENOTSUPP  if the SEV does not support SEV
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO       if the SEV returned a non-zero return code
+ */
+int sev_snp_init(int *error);
+
 /**
  * sev_platform_status - perform SEV PLATFORM_STATUS command
  *
@@ -855,6 +869,8 @@ sev_platform_status(struct sev_user_data_status *status, int *error) { return -E
 
 static inline int sev_platform_init(int *error) { return -ENODEV; }
 
+static inline int sev_snp_init(int *error) { return -ENODEV; }
+
 static inline int
 sev_guest_deactivate(struct sev_data_deactivate *data, int *error) { return -ENODEV; }
 
-- 
2.17.1
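
The init path in this patch follows a check, call, latch shape: return
early if SNP is already initialized, issue the firmware command, and set
snp_inited only on success. The following user-space sketch models just
that control flow. struct sketch_sev and the simulated firmware call are
hypothetical stand-ins, not driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state; only the flag matters here. */
struct sketch_sev {
	bool snp_inited;
	int fw_init_calls;	/* counts simulated SNP_INIT submissions */
	int fw_result;		/* value the simulated firmware returns */
};

/* Simulated firmware call; a real driver would submit SNP_INIT here. */
static int sketch_fw_snp_init(struct sketch_sev *sev)
{
	assert(sev != NULL);
	sev->fw_init_calls++;
	return sev->fw_result;
}

/* Same shape as __sev_snp_init_locked(): check, call, then latch. */
static int sketch_snp_init(struct sketch_sev *sev)
{
	int rc;

	if (!sev)
		return -ENODEV;

	/* Already initialized: nothing to do, and not an error. */
	if (sev->snp_inited)
		return 0;

	rc = sketch_fw_snp_init(sev);
	if (rc)
		return rc;	/* flag stays false, so a retry is possible */

	sev->snp_inited = true;
	return 0;
}
```

Because the flag is set only after the firmware call succeeds, a failed
attempt leaves the state unchanged and the caller can carry on with the
legacy SEV initialization, as sev_pci_init() does in the patch.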



* [PATCH Part2 RFC v2 14/37] crypto: ccp: Shutdown SNP firmware on kexec
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (12 preceding siblings ...)
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 13/37] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Brijesh Singh
@ 2021-04-30 12:37 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 15/37] crypto:ccp: Provide APIs to issue SEV-SNP commands Brijesh Singh
                   ` (22 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:37 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

When the kernel is getting ready to kexec, it calls device_shutdown() to
allow drivers to clean up before the kexec. If the SEV firmware is
initialized, shut it down before kexec'ing the new kernel.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 50 ++++++++++++++++--------------------
 drivers/crypto/ccp/sp-pci.c  | 12 +++++++++
 2 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 852bbeac1019..23ad6e7696df 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1109,6 +1109,22 @@ int sev_dev_init(struct psp_device *psp)
 	return ret;
 }
 
+static void sev_firmware_shutdown(struct sev_device *sev)
+{
+	sev_platform_shutdown(NULL);
+
+	if (sev_es_tmr) {
+		/* The TMR area was encrypted, flush it from the cache */
+		wbinvd_on_all_cpus();
+
+		free_pages((unsigned long)sev_es_tmr,
+			   get_order(SEV_ES_TMR_SIZE));
+		sev_es_tmr = NULL;
+	}
+
+	sev_snp_shutdown(NULL);
+}
+
 void sev_dev_destroy(struct psp_device *psp)
 {
 	struct sev_device *sev = psp->sev_data;
@@ -1116,6 +1132,8 @@ void sev_dev_destroy(struct psp_device *psp)
 	if (!sev)
 		return;
 
+	sev_firmware_shutdown(sev);
+
 	if (sev->misc)
 		kref_put(&misc_dev->refcount, sev_exit);
 
@@ -1146,21 +1164,6 @@ void sev_pci_init(void)
 	if (sev_get_api_version())
 		goto err;
 
-	/*
-	 * If platform is not in UNINIT state then firmware upgrade and/or
-	 * platform INIT command will fail. These command require UNINIT state.
-	 *
-	 * In a normal boot we should never run into case where the firmware
-	 * is not in UNINIT state on boot. But in case of kexec boot, a reboot
-	 * may not go through a typical shutdown sequence and may leave the
-	 * firmware in INIT or WORKING state.
-	 */
-
-	if (sev->state != SEV_STATE_UNINIT) {
-		sev_platform_shutdown(NULL);
-		sev->state = SEV_STATE_UNINIT;
-	}
-
 	if (sev_version_greater_or_equal(0, 15) &&
 	    sev_update_firmware(sev->dev) == 0)
 		sev_get_api_version();
@@ -1220,19 +1223,10 @@ void sev_pci_init(void)
 
 void sev_pci_exit(void)
 {
-	if (!psp_master->sev_data)
-		return;
-
-	sev_platform_shutdown(NULL);
-
-	if (sev_es_tmr) {
-		/* The TMR area was encrypted, flush it from the cache */
-		wbinvd_on_all_cpus();
+	struct sev_device *sev = psp_master->sev_data;
 
-		free_pages((unsigned long)sev_es_tmr,
-			   get_order(SEV_ES_TMR_SIZE));
-		sev_es_tmr = NULL;
-	}
+	if (!sev)
+		return;
 
-	sev_snp_shutdown(NULL);
+	sev_firmware_shutdown(sev);
 }
diff --git a/drivers/crypto/ccp/sp-pci.c b/drivers/crypto/ccp/sp-pci.c
index f471dbaef1fb..9210bfda91a2 100644
--- a/drivers/crypto/ccp/sp-pci.c
+++ b/drivers/crypto/ccp/sp-pci.c
@@ -239,6 +239,17 @@ static int sp_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	return ret;
 }
 
+static void sp_pci_shutdown(struct pci_dev *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct sp_device *sp = dev_get_drvdata(dev);
+
+	if (!sp)
+		return;
+
+	sp_destroy(sp);
+}
+
 static void sp_pci_remove(struct pci_dev *pdev)
 {
 	struct device *dev = &pdev->dev;
@@ -368,6 +379,7 @@ static struct pci_driver sp_pci_driver = {
 	.id_table = sp_pci_table,
 	.probe = sp_pci_probe,
 	.remove = sp_pci_remove,
+	.shutdown = sp_pci_shutdown,
 	.driver.pm = &sp_pci_pm_ops,
 };
 
-- 
2.17.1
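
With this patch, one teardown helper is reachable from more than one
path (device destroy and PCI exit), so it has to tolerate being called
twice. The sketch below models that property in user space. The struct,
the counters, and the two callback names are hypothetical; only the
guard-then-clear pattern mirrors sev_firmware_shutdown().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_dev {
	void *tmr;		/* stands in for sev_es_tmr */
	bool snp_inited;	/* stands in for sev->snp_inited */
	int tmr_frees;		/* how many times the TMR was released */
	int snp_shutdowns;	/* how many times SNP was shut down */
};

/* Each resource is guarded and cleared, so a second call is a no-op. */
static void sketch_firmware_shutdown(struct sketch_dev *dev)
{
	assert(dev != NULL);

	if (dev->tmr) {
		free(dev->tmr);
		dev->tmr = NULL;
		dev->tmr_frees++;
	}

	if (dev->snp_inited) {
		dev->snp_inited = false;
		dev->snp_shutdowns++;
	}
}

/* Hypothetical callbacks: both paths funnel into the same helper. */
static void sketch_remove(struct sketch_dev *dev)
{
	sketch_firmware_shutdown(dev);
}

static void sketch_shutdown(struct sketch_dev *dev)
{
	sketch_firmware_shutdown(dev);
}
```

Calling sketch_shutdown() followed by sketch_remove() releases each
resource exactly once, which is the behavior a kexec path needs when the
normal removal path may also run.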



* [PATCH Part2 RFC v2 15/37] crypto:ccp: Provide APIs to issue SEV-SNP commands
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (13 preceding siblings ...)
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 14/37] crypto: ccp: Shutdown SNP firmware on kexec Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 16/37] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Brijesh Singh
                   ` (21 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

Provide the APIs for the hypervisor to manage an SEV-SNP guest. The
commands for SEV-SNP are defined in the SEV-SNP firmware specification.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 24 ++++++++++++
 include/linux/psp-sev.h      | 74 ++++++++++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 23ad6e7696df..75ec67ba2b55 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1010,6 +1010,30 @@ int sev_guest_df_flush(int *error)
 }
 EXPORT_SYMBOL_GPL(sev_guest_df_flush);
 
+int snp_guest_decommission(struct sev_data_snp_decommission *data, int *error)
+{
+	return sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, data, error);
+}
+EXPORT_SYMBOL_GPL(snp_guest_decommission);
+
+int snp_guest_df_flush(int *error)
+{
+	return sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+}
+EXPORT_SYMBOL_GPL(snp_guest_df_flush);
+
+int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error)
+{
+	return sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, data, error);
+}
+EXPORT_SYMBOL_GPL(snp_guest_page_reclaim);
+
+int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
+{
+	return sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, data, error);
+}
+EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt);
+
 static void sev_exit(struct kref *ref)
 {
 	misc_deregister(&misc_dev->misc);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 1b53e8782250..63ef766cbd7a 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -860,6 +860,65 @@ int sev_guest_df_flush(int *error);
  */
 int sev_guest_decommission(struct sev_data_decommission *data, int *error);
 
+/**
+ * snp_guest_df_flush - perform SNP DF_FLUSH command
+ *
+ * @error: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV    if the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO       if the sev returned a non-zero return code
+ */
+int snp_guest_df_flush(int *error);
+
+/**
+ * snp_guest_decommission - perform SNP_DECOMMISSION command
+ *
+ * @data: sev_data_snp_decommission structure to be processed
+ * @error: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV    if the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO       if the sev returned a non-zero return code
+ */
+int snp_guest_decommission(struct sev_data_snp_decommission *data, int *error);
+
+/**
+ * snp_guest_page_reclaim - perform SNP_PAGE_RECLAIM command
+ *
+ * @data: sev_data_snp_page_reclaim structure to be processed
+ * @error: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV    if the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO       if the sev returned a non-zero return code
+ */
+int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error);
+
+/**
+ * snp_guest_dbg_decrypt - perform SEV SNP_DBG_DECRYPT command
+ *
+ * @error: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEV    if the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO       if the sev returned a non-zero return code
+ */
+int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error);
+
+
 void *psp_copy_user_blob(u64 uaddr, u32 len);
 
 #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
@@ -887,6 +946,21 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int
 
 static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }
 
+static inline int
+snp_guest_decommission(struct sev_data_snp_decommission *data, int *error) { return -ENODEV; }
+
+static inline int snp_guest_df_flush(int *error) { return -ENODEV; }
+
+static inline int snp_guest_page_reclaim(struct sev_data_snp_page_reclaim *data, int *error)
+{
+	return -ENODEV;
+}
+
+static inline int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
+{
+	return -ENODEV;
+}
+
 #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
 
 #endif	/* __PSP_SEV_H__ */
-- 
2.17.1
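
The header in this patch pairs each exported wrapper with a static inline
stub that returns -ENODEV when the PSP driver is not built, so callers
compile unchanged in both configurations. A minimal sketch of that pattern
follows. SKETCH_HAVE_PSP and the sketch_* names are hypothetical stand-ins
for CONFIG_CRYPTO_DEV_SP_PSP and the real wrappers.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical switch standing in for CONFIG_CRYPTO_DEV_SP_PSP. */
#ifdef SKETCH_HAVE_PSP
int sketch_snp_df_flush(int *error);	/* real implementation lives elsewhere */
#else
static inline int sketch_snp_df_flush(int *error)
{
	(void)error;	/* never written when the driver is absent */
	return -ENODEV;
}
#endif

/* A caller that builds the same way with or without the driver. */
static int sketch_flush_caches(void)
{
	int fw_error = 0;
	int rc = sketch_snp_df_flush(&fw_error);

	/* Wrappers in this style return 0 or a negative errno value. */
	assert(rc <= 0);
	return rc;
}
```

Without SKETCH_HAVE_PSP defined, sketch_flush_caches() reports -ENODEV and
the firmware error code is left untouched, so callers should inspect it
only when the wrapper reports a firmware failure.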



* [PATCH Part2 RFC v2 16/37] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (14 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 15/37] crypto:ccp: Provide APIs to issue SEV-SNP commands Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-05-10 18:23   ` Peter Gonda
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 17/37] crypto: ccp: Handle the legacy SEV command " Brijesh Singh
                   ` (20 subsequent siblings)
  36 siblings, 1 reply; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

The behavior and requirements of the SEV-legacy commands are altered when
the SNP firmware is in the INIT state. See the SEV-SNP firmware
specification for more details.

When SNP is in the INIT state, any memory that an SEV-legacy command
causes the firmware to write must be in the firmware state. The TMR memory
is allocated by the host but updated by the firmware, so it must be in
the firmware state. Additionally, the TMR memory must be 2MB aligned
instead of 1MB aligned, and the TMR length needs to be 2MB instead of 1MB.
The helpers __snp_{alloc,free}_firmware_pages() can be used for allocating
and freeing the memory used by the firmware.

While at it, provide an API that others can use to allocate a page that
can be used by the firmware. The immediate user of this API will be the
KVM driver, which needs to allocate a firmware context page during guest
creation. The context page needs to be updated by the firmware. See the
SEV-SNP specification for further details.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 130 +++++++++++++++++++++++++++++++----
 include/linux/psp-sev.h      |  11 +++
 2 files changed, 128 insertions(+), 13 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 75ec67ba2b55..fe104d50d83d 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -53,6 +53,14 @@ static int psp_timeout;
 #define SEV_ES_TMR_SIZE		(1024 * 1024)
 static void *sev_es_tmr;
 
+/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB in size. */
+#define SEV_SNP_ES_TMR_SIZE	(2 * 1024 * 1024)
+
+static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
+
+static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
+static int sev_do_cmd(int cmd, void *data, int *psp_ret);
+
 static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
 {
 	struct sev_device *sev = psp_master->sev_data;
@@ -150,6 +158,100 @@ static int sev_cmd_buffer_len(int cmd)
 	return 0;
 }
 
+static int snp_reclaim_page(struct page *page, bool locked)
+{
+	struct sev_data_snp_page_reclaim data = {};
+	int ret, err;
+
+	data.paddr = page_to_pfn(page) << PAGE_SHIFT;
+
+	if (locked)
+		ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+	else
+		ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+
+	return ret;
+}
+
+static int snp_set_rmptable_state(unsigned long paddr, int npages,
+				  struct rmpupdate *val, bool locked, bool need_reclaim)
+{
+	unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+	unsigned long pfn_end = pfn + npages;
+	int rc;
+
+	while (pfn < pfn_end) {
+		if (need_reclaim)
+			if (snp_reclaim_page(pfn_to_page(pfn), locked))
+				return -EFAULT;
+
+		rc = rmpupdate(pfn_to_page(pfn), val);
+		if (rc)
+			return rc;
+
+		pfn++;
+	}
+
+	return 0;
+}
+
+static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order)
+{
+	struct rmpupdate val = {};
+	unsigned long paddr;
+	struct page *page;
+
+	page = alloc_pages(gfp_mask, order);
+	if (!page)
+		return NULL;
+
+	val.assigned = 1;
+	val.immutable = 1;
+	paddr = __pa((unsigned long)page_address(page));
+
+	if (snp_set_rmptable_state(paddr, 1 << order, &val, false, true)) {
+		__free_pages(page, order);
+		return NULL;
+	}
+
+	return page;
+}
+
+void *snp_alloc_firmware_page(gfp_t gfp_mask)
+{
+	struct page *page;
+
+	page = __snp_alloc_firmware_pages(gfp_mask, 0);
+
+	return page ? page_address(page) : NULL;
+}
+EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
+
+static void __snp_free_firmware_pages(struct page *page, int order)
+{
+	struct rmpupdate val = {};
+	unsigned long paddr;
+
+	if (!page)
+		return;
+
+	paddr = __pa((unsigned long)page_address(page));
+
+	if (snp_set_rmptable_state(paddr, 1 << order, &val, false, true))
+		return;
+
+	__free_pages(page, order);
+}
+
+void snp_free_firmware_page(void *addr)
+{
+	if (!addr)
+		return;
+
+	__snp_free_firmware_pages(virt_to_page(addr), 0);
+}
+EXPORT_SYMBOL(snp_free_firmware_page);
+
 static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 {
 	struct psp_device *psp = psp_master;
@@ -272,7 +374,7 @@ static int __sev_platform_init_locked(int *error)
 
 		data.flags |= SEV_INIT_FLAGS_SEV_ES;
 		data.tmr_address = tmr_pa;
-		data.tmr_len = SEV_ES_TMR_SIZE;
+		data.tmr_len = sev_es_tmr_size;
 	}
 
 	rc = __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
@@ -623,6 +725,8 @@ static int __sev_snp_init_locked(int *error)
 	sev->snp_inited = true;
 	dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
 
+	sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
+
 	return rc;
 }
 
@@ -1141,8 +1245,8 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 		/* The TMR area was encrypted, flush it from the cache */
 		wbinvd_on_all_cpus();
 
-		free_pages((unsigned long)sev_es_tmr,
-			   get_order(SEV_ES_TMR_SIZE));
+
+		__snp_free_firmware_pages(virt_to_page(sev_es_tmr), get_order(sev_es_tmr_size));
 		sev_es_tmr = NULL;
 	}
 
@@ -1192,16 +1296,6 @@ void sev_pci_init(void)
 	    sev_update_firmware(sev->dev) == 0)
 		sev_get_api_version();
 
-	/* Obtain the TMR memory area for SEV-ES use */
-	tmr_page = alloc_pages(GFP_KERNEL, get_order(SEV_ES_TMR_SIZE));
-	if (tmr_page) {
-		sev_es_tmr = page_address(tmr_page);
-	} else {
-		sev_es_tmr = NULL;
-		dev_warn(sev->dev,
-			 "SEV: TMR allocation failed, SEV-ES support unavailable\n");
-	}
-
 	/*
 	 * If boot CPU supports the SNP, then let first attempt to initialize
 	 * the SNP firmware.
@@ -1217,6 +1311,16 @@ void sev_pci_init(void)
 		}
 	}
 
+	/* Obtain the TMR memory area for SEV-ES use */
+	tmr_page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(sev_es_tmr_size));
+	if (tmr_page) {
+		sev_es_tmr = page_address(tmr_page);
+	} else {
+		sev_es_tmr = NULL;
+		dev_warn(sev->dev,
+			 "SEV: TMR allocation failed, SEV-ES support unavailable\n");
+	}
+
 	/* Initialize the platform */
 	rc = sev_platform_init(&error);
 	if (rc && (error == SEV_RET_SECURE_DATA_INVALID)) {
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 63ef766cbd7a..b72a74f6a4e9 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -12,6 +12,8 @@
 #ifndef __PSP_SEV_H__
 #define __PSP_SEV_H__
 
+#include <linux/sev.h>
+
 #include <uapi/linux/psp-sev.h>
 
 #ifdef CONFIG_X86
@@ -920,6 +922,8 @@ int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *error);
 
 
 void *psp_copy_user_blob(u64 uaddr, u32 len);
+void *snp_alloc_firmware_page(gfp_t mask);
+void snp_free_firmware_page(void *addr);
 
 #else	/* !CONFIG_CRYPTO_DEV_SP_PSP */
 
@@ -961,6 +965,13 @@ static inline int snp_guest_dbg_decrypt(struct sev_data_snp_dbg *data, int *erro
 	return -ENODEV;
 }
 
+static inline void *snp_alloc_firmware_page(gfp_t mask)
+{
+	return NULL;
+}
+
+static inline void snp_free_firmware_page(void *addr) { }
+
 #endif	/* CONFIG_CRYPTO_DEV_SP_PSP */
 
 #endif	/* __PSP_SEV_H__ */
-- 
2.17.1
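
The patch sizes the TMR with get_order() and then walks 1 << order pages
when changing RMP state. The arithmetic can be sketched in user space as
below. sketch_get_order() is a hypothetical re-implementation written for
this note, and the 4 kB base page size is an assumption that holds on x86.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 kB base pages, as on x86 */

/* Smallest order such that (1 << order) pages cover size bytes. */
static unsigned int sketch_get_order(size_t size)
{
	size_t pages;
	unsigned int order = 0;

	assert(size > 0);

	/* Round the byte count up to whole pages. */
	pages = (size + ((size_t)1 << SKETCH_PAGE_SHIFT) - 1) >> SKETCH_PAGE_SHIFT;
	while (((size_t)1 << order) < pages)
		order++;

	return order;
}

/* Number of 4 kB pages whose state changes for an allocation of this order. */
static size_t sketch_npages(unsigned int order)
{
	return (size_t)1 << order;
}
```

Under these assumptions the 1MB legacy TMR is an order-8 allocation (256
pages) and the 2MB SNP TMR is order 9 (512 pages). The page allocator
returns blocks naturally aligned to their size, which is how an order-9
allocation satisfies the 2MB alignment requirement.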



* [PATCH Part2 RFC v2 17/37] crypto: ccp: Handle the legacy SEV command when SNP is enabled
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (15 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 16/37] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 18/37] KVM: SVM: make AVIC backing, VMSA and VMCB memory allocation SNP safe Brijesh Singh
                   ` (19 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

The behavior of the SEV-legacy commands is altered when the SNP firmware
is in the INIT state. When SNP is in the INIT state, any memory that an
SEV-legacy command causes the firmware to write must be in the firmware
state before the command is issued.

A command buffer may contain a system physical address that the firmware
may write to. There are two cases that need to be handled:

1) the system physical address points to guest memory
2) the system physical address points to host memory

To handle case #1, the map_firmware_writeable() helper changes the page
state in the RMP table before the command is sent to the firmware, and
unmap_firmware_writeable() restores it after the command completes.

For case #2, map_firmware_writeable() replaces the host system physical
address with that of a pre-allocated firmware page, and after the command
completes, unmap_firmware_writeable() copies the content from the
pre-allocated firmware page to the original host system physical address.

unmap_firmware_writeable() calls __sev_do_cmd_locked() to clear the
immutable bit from the memory page. To support this nested call, a
separate command buffer is required. Allocate a backup command buffer
and keep a reference count of it. If a nested call is detected, use the
backup cmd_buf to complete the command submission.
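
The host-memory case described above is a bounce buffer: redirect the
address to a firmware-owned page before the command, then copy the result
back and restore the address afterwards. A user-space sketch of that idea
follows. The sketch_* names and SKETCH_BLOB_MAX are hypothetical, plain
pointers stand in for system physical addresses, and no RMP state is
modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOB_MAX 64	/* hypothetical stand-in for SEV_FW_BLOB_MAX_SIZE */

struct sketch_map {
	unsigned char fw_buf[SKETCH_BLOB_MAX];	/* pre-allocated "firmware" buffer */
	void *orig;				/* caller's buffer, saved across the command */
	size_t len;
	bool active;
};

/* Redirect *bufp to the firmware-owned buffer before the command. */
static int sketch_map(void **bufp, size_t len, struct sketch_map *map)
{
	assert(bufp != NULL && map != NULL);

	map->active = false;
	if (!*bufp || !len)
		return 0;

	/* The request must fit in the pre-allocated buffer. */
	if (len > SKETCH_BLOB_MAX)
		return -EINVAL;

	map->orig = *bufp;
	map->len = len;
	*bufp = map->fw_buf;
	map->active = true;
	return 0;
}

/* Copy the result back and restore the caller's pointer afterwards. */
static void sketch_unmap(void **bufp, size_t len, struct sketch_map *map)
{
	if (!map->active)
		return;

	memcpy(map->orig, map->fw_buf, len < map->len ? len : map->len);
	*bufp = map->orig;
	map->active = false;
}
```

The copy is bounded by the smaller of the two lengths, matching the
min_t() in the patch, so a command that reports a larger length on
return cannot overrun the caller's buffer.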

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 drivers/crypto/ccp/sev-dev.c | 348 ++++++++++++++++++++++++++++++++++-
 drivers/crypto/ccp/sev-dev.h |  12 ++
 2 files changed, 350 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index fe104d50d83d..5b3f3f718cfb 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -252,12 +252,299 @@ void snp_free_firmware_page(void *addr)
 }
 EXPORT_SYMBOL(snp_free_firmware_page);
 
+static int alloc_snp_host_map(struct sev_device *sev)
+{
+	struct page *page;
+	int i;
+
+	for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+		struct snp_host_map *map = &sev->snp_host_map[i];
+
+		memset(map, 0, sizeof(*map));
+
+		page = __snp_alloc_firmware_pages(GFP_KERNEL_ACCOUNT,
+						  get_order(SEV_FW_BLOB_MAX_SIZE));
+		if (!page)
+			return -ENOMEM;
+
+		map->host = page_address(page);
+	}
+
+	return 0;
+}
+
+static void free_snp_host_map(struct sev_device *sev)
+{
+	int i;
+
+	for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+		struct snp_host_map *map = &sev->snp_host_map[i];
+
+		if (map->host) {
+			__snp_free_firmware_pages(virt_to_page(map->host),
+						  get_order(SEV_FW_BLOB_MAX_SIZE));
+			memset(map, 0, sizeof(*map));
+		}
+	}
+}
+
+static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+	unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+	int ret;
+
+	map->active = false;
+
+	if (!paddr || !len)
+		return 0;
+
+	map->paddr = *paddr;
+	map->len = len;
+
+	/* If paddr points to guest memory then change the page state to firmware. */
+	if (guest) {
+		struct rmpupdate val = {};
+
+		val.immutable = true;
+		val.assigned = true;
+		ret = snp_set_rmptable_state(*paddr, npages, &val, true, false);
+		if (ret)
+			return ret;
+
+		goto done;
+	}
+
+	if (unlikely(!map->host))
+		return -EINVAL;
+
+	/* Check if the pre-allocated buffer can be used to fulfill the request. */
+	if (unlikely(len > SEV_FW_BLOB_MAX_SIZE))
+		return -EINVAL;
+
+	/* Set the paddr to use an intermediate firmware buffer */
+	*paddr = __psp_pa(map->host);
+
+done:
+	map->active = true;
+	return 0;
+}
+
+static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+	unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+	int ret;
+
+	if (!map->active)
+		return 0;
+
+	/* If paddr points to guest memory, restore the page state to hypervisor. */
+	if (guest) {
+		struct rmpupdate val = {};
+
+		ret = snp_set_rmptable_state(*paddr, npages, &val, true, true);
+		if (ret)
+			return ret;
+
+		goto done;
+	}
+
+	/* Copy the response data from the firmware buffer to the caller's buffer. */
+	memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
+	*paddr = map->paddr;
+
+done:
+	map->active = false;
+	return 0;
+}
+
+static bool sev_legacy_cmd_buf_writable(int cmd)
+{
+	switch (cmd) {
+	case SEV_CMD_PLATFORM_STATUS:
+	case SEV_CMD_GUEST_STATUS:
+	case SEV_CMD_LAUNCH_START:
+	case SEV_CMD_RECEIVE_START:
+	case SEV_CMD_LAUNCH_MEASURE:
+	case SEV_CMD_SEND_START:
+	case SEV_CMD_SEND_UPDATE_DATA:
+	case SEV_CMD_SEND_UPDATE_VMSA:
+	case SEV_CMD_PEK_CSR:
+	case SEV_CMD_PDH_CERT_EXPORT:
+	case SEV_CMD_GET_ID:
+	case SEV_CMD_ATTESTATION_REPORT:
+		return true;
+	default:
+		return false;
+	}
+}
+
+#define prep_buffer(name, addr, len, guest, map)  \
+	   func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
+
+static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
+{
+	int (*func)(u64 *, u32, bool, struct snp_host_map *);
+	struct sev_device *sev = psp_master->sev_data;
+	struct rmpupdate val = {};
+	bool from_fw = !to_fw;
+	int ret;
+
+	/*
+	 * After the command is completed, change the command buffer memory to
+	 * hypervisor state.
+	 *
+	 * The immutable bit is automatically cleared by the firmware, so
+	 * there is no need to reclaim the page.
+	 */
+	if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
+		ret = snp_set_rmptable_state(__pa(cmd_buf), 1, &val, true, false);
+		if (ret)
+			return ret;
+
+		/* No need to go further if firmware failed to execute command. */
+		if (fw_err)
+			return 0;
+	}
+
+	if (to_fw)
+		func = map_firmware_writeable;
+	else
+		func = unmap_firmware_writeable;
+
+	/*
+	 * A command buffer may contain a system physical address. If the address
+	 * points to host memory, use an intermediate firmware page; otherwise
+	 * change the page state in the RMP table.
+	 */
+	switch (cmd) {
+	case SEV_CMD_PDH_CERT_EXPORT:
+		if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
+				pdh_cert_len, false, &sev->snp_host_map[0]))
+			goto err;
+		if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
+				cert_chain_len, false, &sev->snp_host_map[1]))
+			goto err;
+		break;
+	case SEV_CMD_GET_ID:
+		if (prep_buffer(struct sev_data_get_id, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_PEK_CSR:
+		if (prep_buffer(struct sev_data_pek_csr, address, len,
+				    false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_UPDATE_DATA:
+		if (prep_buffer(struct sev_data_launch_update_data, address, len,
+				    true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_UPDATE_VMSA:
+		if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
+				true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_MEASURE:
+		if (prep_buffer(struct sev_data_launch_measure, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_LAUNCH_UPDATE_SECRET:
+		if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
+				true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_DBG_DECRYPT:
+		if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
+				&sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_DBG_ENCRYPT:
+		if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
+				&sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_ATTESTATION_REPORT:
+		if (prep_buffer(struct sev_data_attestation_report, address, len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_SEND_START:
+		if (prep_buffer(struct sev_data_send_start, session_address,
+				session_len, false, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_SEND_UPDATE_DATA:
+		if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		if (prep_buffer(struct sev_data_send_update_data, trans_address,
+				trans_len, false, &sev->snp_host_map[1]))
+			goto err;
+		break;
+	case SEV_CMD_SEND_UPDATE_VMSA:
+		if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
+				false, &sev->snp_host_map[0]))
+			goto err;
+		if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
+				trans_len, false, &sev->snp_host_map[1]))
+			goto err;
+		break;
+	case SEV_CMD_RECEIVE_UPDATE_DATA:
+		if (prep_buffer(struct sev_data_receive_update_data, guest_address,
+				guest_len, true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	case SEV_CMD_RECEIVE_UPDATE_VMSA:
+		if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
+				guest_len, true, &sev->snp_host_map[0]))
+			goto err;
+		break;
+	default:
+		break;
+	}
+
+	/* The command buffer needs to be in the firmware state. */
+	if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
+		val.assigned = true;
+		val.immutable = true;
+		ret = snp_set_rmptable_state(__pa(cmd_buf), 1, &val, true, false);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+
+err:
+	return -EINVAL;
+}
+
+static inline bool need_firmware_copy(int cmd)
+{
+	struct sev_device *sev = psp_master->sev_data;
+
+	/* After SNP is INIT'ed, the behavior of legacy SEV commands is changed. */
+	return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_inited) ? true : false;
+}
+
+static int snp_aware_copy_to_firmware(int cmd, void *data)
+{
+	return __snp_cmd_buf_copy(cmd, data, true, 0);
+}
+
+static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
+{
+	return __snp_cmd_buf_copy(cmd, data, false, fw_err);
+}
+
 static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 {
 	struct psp_device *psp = psp_master;
 	struct sev_device *sev;
 	unsigned int phys_lsb, phys_msb;
 	unsigned int reg, ret = 0;
+	void *cmd_buf;
 	int buf_len;
 
 	if (!psp || !psp->sev_data)
@@ -277,12 +564,26 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 	 * work for some memory, e.g. vmalloc'd addresses, and @data may not be
 	 * physically contiguous.
 	 */
-	if (data)
-		memcpy(sev->cmd_buf, data, buf_len);
+	if (data) {
+		if (unlikely(sev->cmd_buf_active > 2))
+			return -EBUSY;
+
+		cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
+
+		memcpy(cmd_buf, data, buf_len);
+		sev->cmd_buf_active++;
+
+		/*
+		 * The behavior of the SEV-legacy commands is altered when the
+		 * SNP firmware is in the INIT state.
+		 */
+		if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, cmd_buf))
+			return -EFAULT;
+	}
 
 	/* Get the physical address of the command buffer */
-	phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
-	phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
+	phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
+	phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
 
 	dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
 		cmd, phys_msb, phys_lsb, psp_timeout);
@@ -323,15 +624,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 		ret = -EIO;
 	}
 
-	print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
-			     buf_len, false);
-
 	/*
 	 * Copy potential output from the PSP back to data.  Do this even on
 	 * failure in case the caller wants to glean something from the error.
 	 */
-	if (data)
-		memcpy(data, sev->cmd_buf, buf_len);
+	if (data) {
+		/*
+		 * Restore the page state after the command completes.
+		 */
+		if (need_firmware_copy(cmd) &&
+		    snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
+			return -EFAULT;
+
+		memcpy(data, cmd_buf, buf_len);
+		sev->cmd_buf_active--;
+	}
+
+	print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
+			     buf_len, false);
 
 	return ret;
 }
@@ -1195,10 +1505,12 @@ int sev_dev_init(struct psp_device *psp)
 	if (!sev)
 		goto e_err;
 
-	sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
+	sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
 	if (!sev->cmd_buf)
 		goto e_sev;
 
+	sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
+
 	psp->sev_data = sev;
 
 	sev->dev = dev;
@@ -1250,6 +1562,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
 		sev_es_tmr = NULL;
 	}
 
+	/*
+	 * Freeing the host map needs to clear the immutable bit, so it must be
+	 * freed before the SNP firmware shutdown.
+	 */
+	free_snp_host_map(sev);
+
 	sev_snp_shutdown(NULL);
 }
 
@@ -1309,6 +1627,14 @@ void sev_pci_init(void)
 			 */
 			dev_err(sev->dev, "SEV-SNP: failed to INIT error %#x\n", error);
 		}
+
+		/*
+		 * Allocate the intermediate buffers used for the legacy command handling.
+		 */
+		if (alloc_snp_host_map(sev)) {
+			dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
+			goto skip_legacy;
+		}
 	}
 
 	/* Obtain the TMR memory area for SEV-ES use */
@@ -1340,12 +1666,14 @@ void sev_pci_init(void)
 		return;
 	}
 
+skip_legacy:
 	dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_inited ?
 		"-SNP" : "", sev->api_major, sev->api_minor, sev->build);
 
 	return;
 
 err:
+	free_snp_host_map(sev);
 	psp_master->sev_data = NULL;
 }
 
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 186ad20cbd24..fe5d7a3ebace 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -29,11 +29,20 @@
 #define SEV_CMDRESP_CMD_SHIFT		16
 #define SEV_CMDRESP_IOC			BIT(0)
 
+#define MAX_SNP_HOST_MAP_BUFS		2
+
 struct sev_misc_dev {
 	struct kref refcount;
 	struct miscdevice misc;
 };
 
+struct snp_host_map {
+	u64 paddr;
+	u32 len;
+	void *host;
+	bool active;
+};
+
 struct sev_device {
 	struct device *dev;
 	struct psp_device *psp;
@@ -52,8 +61,11 @@ struct sev_device {
 	u8 build;
 
 	void *cmd_buf;
+	void *cmd_buf_backup;
+	int cmd_buf_active;
 
 	bool snp_inited;
+	struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
 };
 
 int sev_dev_init(struct psp_device *psp);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH Part2 RFC v2 18/37] KVM: SVM: make AVIC backing, VMSA and VMCB memory allocation SNP safe
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (16 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 17/37] crypto: ccp: Handle the legacy SEV command " Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 19/37] KVM: SVM: Add initial SEV-SNP support Brijesh Singh
                   ` (18 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

When SEV-SNP is globally enabled on a system, the VMRUN instruction
performs additional security checks on the AVIC backing, VMSA, and VMCB
pages. On a successful VMRUN, these pages are marked "in-use" by the
hardware in the RMP entry, and any attempt to modify the RMP entry for
these pages will result in a page fault (RMP violation check).

While performing the RMP check, hardware will try to create a 2MB TLB
entry for large page accesses. When it does this, it first reads the RMP
entry for the base of the 2MB region and verifies that all of this memory
is safe. If the AVIC backing, VMSA, or VMCB page happens to be at the base
of a 2MB region, then the RMP check will fail because of the "in-use"
marking for the base entry of this 2MB region.

e.g.

1. A VMCB was allocated on a 2MB-aligned address.
2. The VMRUN instruction marks this RMP entry as "in-use".
3. Another process allocated some other page of memory that happened to be
   within the same 2MB region.
4. That process tried to write its page using physmap.

If the physmap entry in step #4 uses a large (1G/2M) page, then the
hardware will attempt to create a 2M TLB entry. The hardware will find
that the "in-use" bit is set in the RMP entry (because it was a
VMCB page) and will cause an RMP violation check.

See APM2 section 15.36.12 for more information on VMRUN checks when
SEV-SNP is globally active.

A generic allocator can return a page that is 2M aligned, which is not
safe to use when SEV-SNP is globally enabled. Add a snp_safe_alloc_page()
helper that can be used for allocating SNP-safe memory. The helper
allocates an order-1 block (two pages), splits it into two order-0 pages,
frees one of them, and keeps the page that is not 2M aligned.
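
The selection rule described above can be sketched in plain userspace C.
This is only an illustration of the alignment arithmetic, not the kernel
helper: the sketch_* names and the SKETCH_* constants are hypothetical,
and 4KB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                    /* assumes 4KB pages */
#define SKETCH_PMD_SIZE   (UINT64_C(1) << 21)   /* 2MB */

/* A 2MB region holds 512 pages of 4KB each. */
static_assert(SKETCH_PMD_SIZE == 512 * (UINT64_C(1) << SKETCH_PAGE_SHIFT),
	      "2MB must equal 512 4KB pages");

/* True when the page frame starts exactly on a 2MB boundary. */
static bool sketch_pfn_is_2m_aligned(uint64_t pfn)
{
	return ((pfn << SKETCH_PAGE_SHIFT) & (SKETCH_PMD_SIZE - 1)) == 0;
}

/*
 * Given the first PFN of an order-1 (two-page) block, return the PFN to
 * keep. The other page of the pair is the one the caller would free.
 */
static uint64_t sketch_pick_snp_safe_pfn(uint64_t first_pfn)
{
	return sketch_pfn_is_2m_aligned(first_pfn) ? first_pfn + 1 : first_pfn;
}
```

Because an order-1 block starts on an even PFN, at most its first page can
sit on a 2MB boundary, so one of the two pages is always usable.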

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/lapic.c            |  5 ++++-
 arch/x86/kvm/svm/sev.c          | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          | 16 ++++++++++++++--
 arch/x86/kvm/svm/svm.h          |  1 +
 5 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ad22d4839bcc..71e79a1998ad 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1381,6 +1381,7 @@ struct kvm_x86_ops {
 	int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
+	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 152591f9243a..897ce6ebdd7c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2441,7 +2441,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
 
 	vcpu->arch.apic = apic;
 
-	apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	if (kvm_x86_ops.alloc_apic_backing_page)
+		apic->regs = kvm_x86_ops.alloc_apic_backing_page(vcpu);
+	else
+		apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
 	if (!apic->regs) {
 		printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
 		       vcpu->vcpu_id);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5f0034e0dacc..b750e435626a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2696,3 +2696,30 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 		break;
 	}
 }
+
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
+{
+	unsigned long pfn;
+	struct page *p;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+
+	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
+	if (!p)
+		return NULL;
+
+	/* split the page order */
+	split_page(p, 1);
+
+	/* Find a non-2M aligned page */
+	pfn = page_to_pfn(p);
+	if (IS_ALIGNED(__pfn_to_phys(pfn), PMD_SIZE)) {
+		pfn++;
+		__free_page(p);
+	} else {
+		__free_page(pfn_to_page(pfn + 1));
+	}
+
+	return pfn_to_page(pfn);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 392d44a2756d..ede3cf460894 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1323,7 +1323,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	svm = to_svm(vcpu);
 
 	err = -ENOMEM;
-	vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	vmcb01_page = snp_safe_alloc_page(vcpu);
 	if (!vmcb01_page)
 		goto out;
 
@@ -1332,7 +1332,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 		 * SEV-ES guests require a separate VMSA page used to contain
 		 * the encrypted register state of the guest.
 		 */
-		vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+		vmsa_page = snp_safe_alloc_page(vcpu);
 		if (!vmsa_page)
 			goto error_free_vmcb_page;
 
@@ -4480,6 +4480,16 @@ static int svm_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
+{
+	struct page *page = snp_safe_alloc_page(vcpu);
+
+	if (!page)
+		return NULL;
+
+	return page_address(page);
+}
+
 static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.hardware_unsetup = svm_hardware_teardown,
 	.hardware_enable = svm_hardware_enable,
@@ -4605,6 +4615,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.complete_emulated_msr = svm_complete_emulated_msr,
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
+
+	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 053f2505a738..894e828227d9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -553,6 +553,7 @@ void sev_es_init_vmcb(struct vcpu_svm *svm);
 void sev_es_create_vcpu(struct vcpu_svm *svm);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu);
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 
 /* vmenter.S */
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH Part2 RFC v2 19/37] KVM: SVM: Add initial SEV-SNP support
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (17 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 18/37] KVM: SVM: make AVIC backing, VMSA and VMCB memory allocation SNP safe Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 20/37] KVM: SVM: define new SEV_FEATURES field in the VMCB Save State Area Brijesh Singh
                   ` (17 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

The next generation of SEV is called SEV-SNP (Secure Nested Paging).
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding new
hardware-based security protection. SEV-SNP adds strong memory encryption
integrity protection to help prevent malicious hypervisor-based attacks
such as data replay, memory re-mapping, and more, to create an isolated
execution environment.

SNP support in KVM is controlled by the sev_snp module parameter, which
defaults to enabled in this patch.
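
The way the requested features are narrowed down to the supported ones can
be sketched as follows. This is a simplified userspace model of the cascade
in sev_hardware_setup(), where each level is only considered when the
previous one succeeded; the sketch_* names are hypothetical and the real
function performs additional checks (ASID ranges, CPUID) not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_sev_caps {
	bool sev;
	bool sev_es;
	bool sev_snp;
};

/*
 * Combine what the administrator requested (module parameters) with what
 * the hardware offers. A later level is only reached when every earlier
 * level is both requested and available.
 */
static struct sketch_sev_caps sketch_resolve(struct sketch_sev_caps requested,
					     struct sketch_sev_caps hw)
{
	struct sketch_sev_caps out = { false, false, false };

	if (!requested.sev || !hw.sev)
		return out;
	out.sev = true;

	if (!requested.sev_es || !hw.sev_es)
		return out;
	out.sev_es = true;

	if (!requested.sev_snp || !hw.sev_snp)
		return out;
	out.sev_snp = true;

	/* SNP is never reported without SEV and SEV-ES. */
	assert(out.sev && out.sev_es);
	return out;
}
```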

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 18 ++++++++++++++++++
 arch/x86/kvm/svm/svm.h | 12 ++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b750e435626a..200d227f9232 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -52,9 +52,14 @@ module_param_named(sev, sev_enabled, bool, 0444);
 /* enable/disable SEV-ES support */
 static bool sev_es_enabled = true;
 module_param_named(sev_es, sev_es_enabled, bool, 0444);
+
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
 #else
 #define sev_enabled false
 #define sev_es_enabled false
+#define sev_snp_enabled  false
 #endif /* CONFIG_KVM_AMD_SEV */
 
 #define AP_RESET_HOLD_NONE		0
@@ -1826,6 +1831,7 @@ void __init sev_hardware_setup(void)
 {
 #ifdef CONFIG_KVM_AMD_SEV
 	unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+	bool sev_snp_supported = false;
 	bool sev_es_supported = false;
 	bool sev_supported = false;
 
@@ -1889,9 +1895,21 @@ void __init sev_hardware_setup(void)
 	pr_info("SEV-ES supported: %u ASIDs\n", sev_es_asid_count);
 	sev_es_supported = true;
 
+	/* SEV-SNP support requested? */
+	if (!sev_snp_enabled)
+		goto out;
+
+	/* Is SEV-SNP enabled? */
+	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+		goto out;
+
+	pr_info("SEV-SNP supported: %u ASIDs\n", min_sev_asid - 1);
+	sev_snp_supported = true;
+
 out:
 	sev_enabled = sev_supported;
 	sev_es_enabled = sev_es_supported;
+	sev_snp_enabled = sev_snp_supported;
 #endif
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 894e828227d9..85a2d5857ffb 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -58,6 +58,7 @@ enum {
 struct kvm_sev_info {
 	bool active;		/* SEV enabled guest */
 	bool es_active;		/* SEV-ES enabled guest */
+	bool snp_active;	/* SEV-SNP enabled guest */
 	unsigned int asid;	/* ASID used for this guest */
 	unsigned int handle;	/* SEV firmware handle */
 	int fd;			/* SEV device fd */
@@ -232,6 +233,17 @@ static inline bool sev_es_guest(struct kvm *kvm)
 #endif
 }
 
+static inline bool sev_snp_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	return sev_es_guest(kvm) && sev->snp_active;
+#else
+	return false;
+#endif
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH Part2 RFC v2 20/37] KVM: SVM: define new SEV_FEATURES field in the VMCB Save State Area
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (18 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 19/37] KVM: SVM: Add initial SEV-SNP support Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 21/37] KVM: SVM: Add KVM_SNP_INIT command Brijesh Singh
                   ` (16 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

The hypervisor uses the SEV_FEATURES field (offset 3B0h) in the Save State
Area to control the SEV-SNP guest features such as SNPActive, vTOM,
ReflectVC, etc. An SEV-SNP guest can read the SEV_FEATURES field through
the SEV_STATUS MSR.

While at it, define the VMPL field and update the dump_vmcb().

See APM2 Table 15-34 and B-4 for more details.
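
As a rough illustration, the feature bits and the layout change can be
checked in standalone C. The bit positions are taken from the definitions
added by this patch; the SKETCH_* and sketch_* names are hypothetical
stand-ins so the snippet compiles outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BIT(n)                   (UINT64_C(1) << (n))
#define SKETCH_SEV_FEATURES_SNP_ACTIVE  SKETCH_BIT(0)
#define SKETCH_SEV_FEATURES_VTOM        SKETCH_BIT(1)
#define SKETCH_SEV_FEATURES_REFLECT_VC  SKETCH_BIT(2)

/*
 * The new fields are carved out of reserved space, so the save area size
 * must not change: u64 sev_features + reserved_11[48] replaces
 * reserved_11[56], and reserved_1[42] + u8 vmpl replaces reserved_1[43].
 */
static_assert(sizeof(uint64_t) + 48 == 56, "sev_features must fit reserved_11");
static_assert(42 + sizeof(uint8_t) == 43, "vmpl must fit reserved_1");

/* Set the SNPActive bit while preserving any other feature bits. */
static uint64_t sketch_enable_snp(uint64_t sev_features)
{
	return sev_features | SKETCH_SEV_FEATURES_SNP_ACTIVE;
}

/* Report whether SNPActive is set in a SEV_FEATURES value. */
static bool sketch_snp_active(uint64_t sev_features)
{
	return (sev_features & SKETCH_SEV_FEATURES_SNP_ACTIVE) != 0;
}
```

The size checks only restate the arithmetic of the struct change; they do
not verify the absolute offsets given in the APM.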
---
 arch/x86/include/asm/svm.h | 15 +++++++++++++--
 arch/x86/kvm/svm/svm.c     |  4 ++--
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 772e60efe243..ff614cdcf628 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -212,6 +212,15 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define SVM_NESTED_CTL_SEV_ENABLE	BIT(1)
 #define SVM_NESTED_CTL_SEV_ES_ENABLE	BIT(2)
 
+#define SVM_SEV_FEATURES_SNP_ACTIVE		BIT(0)
+#define SVM_SEV_FEATURES_VTOM			BIT(1)
+#define SVM_SEV_FEATURES_REFLECT_VC		BIT(2)
+#define SVM_SEV_FEATURES_RESTRICTED_INJECTION	BIT(3)
+#define SVM_SEV_FEATURES_ALTERNATE_INJECTION	BIT(4)
+#define SVM_SEV_FEATURES_DEBUG_SWAP		BIT(5)
+#define SVM_SEV_FEATURES_PREVENT_HOST_IBS	BIT(6)
+#define SVM_SEV_FEATURES_BTB_ISOLATION		BIT(7)
+
 struct vmcb_seg {
 	u16 selector;
 	u16 attrib;
@@ -230,7 +239,8 @@ struct vmcb_save_area {
 	struct vmcb_seg ldtr;
 	struct vmcb_seg idtr;
 	struct vmcb_seg tr;
-	u8 reserved_1[43];
+	u8 reserved_1[42];
+	u8 vmpl;
 	u8 cpl;
 	u8 reserved_2[4];
 	u64 efer;
@@ -295,7 +305,8 @@ struct vmcb_save_area {
 	u64 sw_exit_info_1;
 	u64 sw_exit_info_2;
 	u64 sw_scratch;
-	u8 reserved_11[56];
+	u64 sev_features;
+	u8 reserved_11[48];
 	u64 xcr0;
 	u8 valid_bitmap[16];
 	u64 x87_state_gpa;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ede3cf460894..1b9091d750fc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3191,8 +3191,8 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	       "tr:",
 	       save01->tr.selector, save01->tr.attrib,
 	       save01->tr.limit, save01->tr.base);
-	pr_err("cpl:            %d                efer:         %016llx\n",
-		save->cpl, save->efer);
+	pr_err("vmpl: %d   cpl:  %d               efer:          %016llx\n",
+		save->vmpl, save->cpl, save->efer);
 	pr_err("%-15s %016llx %-13s %016llx\n",
 	       "cr0:", save->cr0, "cr2:", save->cr2);
 	pr_err("%-15s %016llx %-13s %016llx\n",
-- 
2.17.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH Part2 RFC v2 21/37] KVM: SVM: Add KVM_SNP_INIT command
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (19 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 20/37] KVM: SVM: define new SEV_FEATURES field in the VMCB Save State Area Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-05-06 20:25   ` Peter Gonda
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 22/37] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Brijesh Singh
                   ` (15 subsequent siblings)
  36 siblings, 1 reply; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

The KVM_SEV_SNP_INIT command is used by the hypervisor to initialize the
SEV-SNP platform context. In a typical workflow, this command should be the
first command issued. When creating an SEV-SNP guest, the VMM must use this
command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.
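
A minimal sketch of how the init command maps to the per-guest flags,
assuming the behavior in this patch where an SNP guest is also treated as
an SEV-ES guest. The enum values and sketch_* names are hypothetical and do
not match the UAPI numbering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the KVM_SEV_*INIT command ids. */
enum sketch_init_cmd {
	SKETCH_SEV_INIT,
	SKETCH_SEV_ES_INIT,
	SKETCH_SEV_SNP_INIT,
};

struct sketch_guest_flags {
	bool es_active;
	bool snp_active;
};

/* Derive the guest flags from the init command used by the VMM. */
static struct sketch_guest_flags sketch_flags_for(enum sketch_init_cmd id)
{
	struct sketch_guest_flags f;

	f.snp_active = (id == SKETCH_SEV_SNP_INIT);
	f.es_active = (id == SKETCH_SEV_ES_INIT) || f.snp_active;

	/* An SNP guest is always an SEV-ES guest as well. */
	assert(!f.snp_active || f.es_active);
	return f;
}
```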

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c   | 18 ++++++++++++++++--
 include/uapi/linux/kvm.h |  3 +++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 200d227f9232..ea74dd9e03d3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -230,8 +230,9 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
 
 static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
+	bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
-	bool es_active = argp->id == KVM_SEV_ES_INIT;
+	bool snp_active = argp->id == KVM_SEV_SNP_INIT;
 	int asid, ret;
 
 	if (kvm->created_vcpus)
@@ -242,12 +243,16 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		return ret;
 
 	sev->es_active = es_active;
+	sev->snp_active = snp_active;
 	asid = sev_asid_new(sev);
 	if (asid < 0)
 		goto e_no_asid;
 	sev->asid = asid;
 
-	ret = sev_platform_init(&argp->error);
+	if (snp_active)
+		ret = sev_snp_init(&argp->error);
+	else
+		ret = sev_platform_init(&argp->error);
 	if (ret)
 		goto e_free;
 
@@ -583,6 +588,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	save->pkru = svm->vcpu.arch.pkru;
 	save->xss  = svm->vcpu.arch.ia32_xss;
 
+	if (sev_snp_guest(svm->vcpu.kvm))
+		save->sev_features |= SVM_SEV_FEATURES_SNP_ACTIVE;
+
 	/*
 	 * SEV-ES will use a VMSA that is pointed to by the VMCB, not
 	 * the traditional VMSA that is part of the VMCB. Copy the
@@ -1525,6 +1533,12 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	}
 
 	switch (sev_cmd.id) {
+	case KVM_SEV_SNP_INIT:
+		if (!sev_snp_enabled) {
+			r = -ENOTTY;
+			goto out;
+		}
+		fallthrough;
 	case KVM_SEV_ES_INIT:
 		if (!sev_es_enabled) {
 			r = -ENOTTY;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3fd9a7e9d90c..aaa2d62f09b5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1678,6 +1678,9 @@ enum sev_cmd_id {
 	/* Guest Migration Extension */
 	KVM_SEV_SEND_CANCEL,
 
+	/* SNP specific commands */
+	KVM_SEV_SNP_INIT,
+
 	KVM_SEV_NR_MAX,
 };
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH Part2 RFC v2 22/37] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (20 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 21/37] KVM: SVM: Add KVM_SNP_INIT command Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 23/37] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Brijesh Singh
                   ` (14 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. If the guest is expected to be migrated,
the command also binds a migration agent (MA) to the guest.

For more information see the SEV-SNP specification.
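
The ASID binding step in this patch retries activation once when the
firmware reports that a DF_FLUSH is required. The control flow can be
sketched in userspace C with fake firmware callbacks. All sketch_* names
and the SKETCH_RET_DFFLUSH_REQUIRED value are hypothetical; locking and the
WBINVD step are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_RET_DFFLUSH_REQUIRED 1	/* hypothetical error code */

typedef int (*sketch_cmd_fn)(void *ctx, int *fw_error);

/* Fake firmware state used to exercise the retry logic. */
struct sketch_fake_fw {
	int flush_needed;
	int activate_calls;
	int flush_calls;
};

/* Fails with "flush required" until a flush has been performed. */
static int sketch_fake_activate(void *ctx, int *fw_error)
{
	struct sketch_fake_fw *fw = ctx;

	fw->activate_calls++;
	if (fw->flush_needed) {
		*fw_error = SKETCH_RET_DFFLUSH_REQUIRED;
		return -1;
	}
	*fw_error = 0;
	return 0;
}

/* Clears the pending flush requirement. */
static int sketch_fake_flush(void *ctx, int *fw_error)
{
	struct sketch_fake_fw *fw = ctx;

	fw->flush_calls++;
	fw->flush_needed = 0;
	*fw_error = 0;
	return 0;
}

/* Activate; on "flush required", flush and retry exactly once. */
static int sketch_activate_with_retry(sketch_cmd_fn activate,
				      sketch_cmd_fn flush,
				      void *ctx, int *fw_error)
{
	int retried = 0;
	int ret;

	assert(activate != NULL && flush != NULL && fw_error != NULL);

	for (;;) {
		ret = activate(ctx, fw_error);
		if (!ret || *fw_error != SKETCH_RET_DFFLUSH_REQUIRED || retried)
			return ret;

		ret = flush(ctx, fw_error);
		if (ret)
			return ret;

		retried = 1;
	}
}
```

Limiting the retry to a single attempt keeps a firmware that keeps asking
for a flush from stalling guest launch indefinitely.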

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c   | 132 ++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h   |   1 +
 include/uapi/linux/kvm.h |   9 +++
 3 files changed, 141 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ea74dd9e03d3..90d70038b607 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -20,6 +20,7 @@
 #include <asm/fpu/internal.h>
 
 #include <asm/trapnr.h>
+#include <asm/sev.h>
 
 #include "x86.h"
 #include "svm.h"
@@ -75,6 +76,8 @@ static unsigned long sev_me_mask;
 static unsigned long *sev_asid_bitmap;
 static unsigned long *sev_reclaim_asid_bitmap;
 
+static int snp_decommission_context(struct kvm *kvm);
+
 struct enc_region {
 	struct list_head list;
 	unsigned long npages;
@@ -1510,6 +1513,100 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, &data, &argp->error);
 }
 
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct sev_data_snp_gctx_create data = {};
+	void *context;
+	int rc;
+
+	/* Allocate memory for context page */
+	context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+	if (!context)
+		return NULL;
+
+	data.gctx_paddr = __psp_pa(context);
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+	if (rc) {
+		snp_free_firmware_page(context);
+		return NULL;
+	}
+
+	return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_activate data = {};
+	int asid = sev_get_asid(kvm);
+	int ret, retry_count = 0;
+
+	/* Activate ASID on the given context */
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.asid   = asid;
+again:
+	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+
+	/* Check if the DF_FLUSH is required, and try again */
+	if (ret && (*error == SEV_RET_DFFLUSH_REQUIRED) && (!retry_count)) {
+		/* Guard DEACTIVATE against WBINVD/DF_FLUSH used in ASID recycling */
+		down_read(&sev_deactivate_lock);
+		wbinvd_on_all_cpus();
+		ret = snp_guest_df_flush(error);
+		up_read(&sev_deactivate_lock);
+
+		if (ret)
+			return ret;
+
+		/* only one retry */
+		retry_count = 1;
+
+		goto again;
+	}
+
+	return ret;
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_start start = {};
+	struct kvm_sev_snp_launch_start params;
+	int rc;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	/* Initialize the guest context */
+	sev->snp_context = snp_context_create(kvm, argp);
+	if (!sev->snp_context)
+		return -ENOTTY;
+
+	/* Issue the LAUNCH_START command */
+	start.gctx_paddr = __psp_pa(sev->snp_context);
+	start.policy = params.policy;
+	memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+	if (rc)
+		goto e_free_context;
+
+	/* Bind ASID to this guest */
+	sev->fd = argp->sev_fd;
+	rc = snp_bind_asid(kvm, &argp->error);
+	if (rc)
+		goto e_free_context;
+
+	return 0;
+
+e_free_context:
+	snp_decommission_context(kvm);
+
+	return rc;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1599,6 +1696,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_RECEIVE_FINISH:
 		r = sev_receive_finish(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_START:
+		r = snp_launch_start(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -1791,6 +1891,28 @@ int svm_vm_copy_asid_from(struct kvm *kvm, unsigned int source_fd)
 	return ret;
 }
 
+static int snp_decommission_context(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_decommission data = {};
+	int ret;
+
+	/* If context is not created then do nothing */
+	if (!sev->snp_context)
+		return 0;
+
+	data.gctx_paddr = __sme_pa(sev->snp_context);
+	ret = snp_guest_decommission(&data, NULL);
+	if (ret)
+		return ret;
+
+	/* free the context page now */
+	snp_free_firmware_page(sev->snp_context);
+	sev->snp_context = NULL;
+
+	return 0;
+}
+
 void sev_vm_destroy(struct kvm *kvm)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -1829,7 +1951,15 @@ void sev_vm_destroy(struct kvm *kvm)
 
 	mutex_unlock(&kvm->lock);
 
-	sev_unbind_asid(kvm, sev->handle);
+	if (sev_snp_guest(kvm)) {
+		if (snp_decommission_context(kvm)) {
+			pr_err("Failed to free SNP guest context, leaking asid!\n");
+			return;
+		}
+	} else {
+		sev_unbind_asid(kvm, sev->handle);
+	}
+
 	sev_asid_free(sev);
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 85a2d5857ffb..a870bbb64ce7 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -67,6 +67,7 @@ struct kvm_sev_info {
 	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
 	struct kvm *enc_context_owner; /* Owner of copied encryption context */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
+	void *snp_context;      /* SNP guest context page */
 };
 
 struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index aaa2d62f09b5..00427707d053 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1680,6 +1680,7 @@ enum sev_cmd_id {
 
 	/* SNP specific commands */
 	KVM_SEV_SNP_INIT,
+	KVM_SEV_SNP_LAUNCH_START,
 
 	KVM_SEV_NR_MAX,
 };
@@ -1777,6 +1778,14 @@ struct kvm_sev_receive_update_data {
 	__u32 trans_len;
 };
 
+struct kvm_sev_snp_launch_start {
+	__u64 policy;
+	__u64 ma_uaddr;
+	__u8 ma_en;
+	__u8 imi_en;
+	__u8 gosvw[16];
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1



* [PATCH Part2 RFC v2 23/37] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (21 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 22/37] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 24/37] KVM: SVM: Reclaim the guest pages when SEV-SNP VM terminates Brijesh Singh
                   ` (13 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
guest's memory. The data is encrypted with the cryptographic context
created by KVM_SEV_SNP_LAUNCH_START.

In addition to inserting data, it can insert two special pages into the
guest's memory: the secrets page and the CPUID page.

For more information see the SEV-SNP specification.
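
As a rough illustration of the address handling (a userspace sketch, not the
kernel code): the command receives a host virtual address, while the RMP
entry records a guest physical address, so the patch resolves one from the
other through the memslots. The struct memslot type below is a simplified,
hypothetical stand-in for struct kvm_memory_slot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical, simplified stand-in for struct kvm_memory_slot. */
struct memslot {
	uint64_t base_gfn;        /* first guest frame number covered by the slot */
	uint64_t npages;          /* slot length in 4K pages */
	uint64_t userspace_addr;  /* host virtual address backing the slot */
};

/* Translate a host virtual address to a guest physical address. */
static bool hva_to_gpa(const struct memslot *slots, size_t nslots,
		       uint64_t hva, uint64_t *gpa)
{
	for (size_t i = 0; i < nslots; i++) {
		const struct memslot *s = &slots[i];
		uint64_t end = s->userspace_addr + (s->npages << PAGE_SHIFT);

		if (hva >= s->userspace_addr && hva < end) {
			/* GPA = slot base + byte offset of the HVA into the slot */
			*gpa = (s->base_gfn << PAGE_SHIFT) +
			       (hva - s->userspace_addr);
			return true;
		}
	}

	return false; /* the HVA is not backed by any slot */
}
```

An address outside every slot makes the lookup fail, which is the case the
patch reports as -EINVAL.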

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c   | 139 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h |  18 +++++
 2 files changed, 157 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 90d70038b607..d97f37df1f3b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -17,6 +17,7 @@
 #include <linux/misc_cgroup.h>
 #include <linux/processor.h>
 #include <linux/trace_events.h>
+#include <linux/sev.h>
 #include <asm/fpu/internal.h>
 
 #include <asm/trapnr.h>
@@ -1607,6 +1608,141 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return rc;
 }
 
+static struct kvm_memory_slot *hva_to_memslot(struct kvm *kvm, unsigned long hva)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *memslot;
+
+	kvm_for_each_memslot(memslot, slots) {
+		if (hva >= memslot->userspace_addr &&
+		    hva < memslot->userspace_addr + (memslot->npages << PAGE_SHIFT))
+			return memslot;
+	}
+
+	return NULL;
+}
+
+static bool hva_to_gpa(struct kvm *kvm, unsigned long hva, gpa_t *gpa)
+{
+	struct kvm_memory_slot *memslot;
+	gpa_t gpa_offset;
+
+	memslot = hva_to_memslot(kvm, hva);
+	if (!memslot)
+		return false;
+
+	gpa_offset = hva - memslot->userspace_addr;
+	*gpa = ((memslot->base_gfn << PAGE_SHIFT) + gpa_offset);
+
+	return true;
+}
+
+static int snp_page_reclaim(struct page *page, int rmppage_size)
+{
+	struct sev_data_snp_page_reclaim data = {};
+	struct rmpupdate e = {};
+	int rc, err;
+
+	data.paddr = __sme_page_pa(page) | rmppage_size;
+	rc = snp_guest_page_reclaim(&data, &err);
+	if (rc)
+		return rc;
+
+	return rmpupdate(page, &e);
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	unsigned long npages, vaddr, vaddr_end, i, next_vaddr;
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_update data = {};
+	struct kvm_sev_snp_launch_update params;
+	int *error = &argp->error;
+	struct kvm_vcpu *vcpu;
+	struct page **inpages;
+	struct rmpupdate e;
+	int ret;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+
+	/* Lock the user memory. */
+	inpages = sev_pin_memory(kvm, params.uaddr, params.len, &npages, 1);
+	if (!inpages)
+		return -ENOMEM;
+
+	vcpu = kvm_get_vcpu(kvm, 0);
+	vaddr = params.uaddr;
+	vaddr_end = vaddr + params.len;
+
+	for (i = 0; vaddr < vaddr_end; vaddr = next_vaddr, i++) {
+		unsigned long psize, pmask;
+		int level = PG_LEVEL_4K;
+		gpa_t gpa;
+
+		if (!hva_to_gpa(kvm, vaddr, &gpa)) {
+			ret = -EINVAL;
+			goto e_unpin;
+		}
+
+		psize = page_level_size(level);
+		pmask = page_level_mask(level);
+		gpa = gpa & pmask;
+
+		/* Transition the page state to pre-guest */
+		memset(&e, 0, sizeof(e));
+		e.assigned = 1;
+		e.gpa = gpa;
+		e.asid = sev_get_asid(kvm);
+		e.immutable = true;
+		e.pagesize = X86_TO_RMP_PG_LEVEL(level);
+		ret = rmpupdate(inpages[i], &e);
+		if (ret) {
+			ret = -EFAULT;
+			goto e_unpin;
+		}
+
+		data.address = __sme_page_pa(inpages[i]);
+		data.page_size = e.pagesize;
+		data.page_type = params.page_type;
+		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, &data, error);
+		if (ret) {
+			snp_page_reclaim(inpages[i], e.pagesize);
+			goto e_unpin;
+		}
+
+		next_vaddr = (vaddr & pmask) + psize;
+	}
+
+e_unpin:
+	/* Content of memory is updated, mark pages dirty */
+	memset(&e, 0, sizeof(e));
+	for (i = 0; i < npages; i++) {
+		set_page_dirty_lock(inpages[i]);
+		mark_page_accessed(inpages[i]);
+
+		/*
+		 * If it's an error, update the RMP entry to change page ownership
+		 * to the hypervisor.
+		 */
+		if (ret)
+			rmpupdate(inpages[i], &e);
+	}
+
+	/* Unlock the user pages */
+	sev_unpin_memory(kvm, inpages, npages);
+
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1699,6 +1835,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_START:
 		r = snp_launch_start(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_UPDATE:
+		r = snp_launch_update(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 00427707d053..dfc4975820d6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1681,6 +1681,7 @@ enum sev_cmd_id {
 	/* SNP specific commands */
 	KVM_SEV_SNP_INIT,
 	KVM_SEV_SNP_LAUNCH_START,
+	KVM_SEV_SNP_LAUNCH_UPDATE,
 
 	KVM_SEV_NR_MAX,
 };
@@ -1786,6 +1787,23 @@ struct kvm_sev_snp_launch_start {
 	__u8 gosvw[16];
 };
 
+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL		0x1
+#define KVM_SEV_SNP_PAGE_TYPE_VMSA		0x2
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO		0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED	0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS		0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID		0x6
+
+struct kvm_sev_snp_launch_update {
+	__u64 uaddr;
+	__u32 len;
+	__u8 imi_page;
+	__u8 page_type;
+	__u8 vmpl3_perms;
+	__u8 vmpl2_perms;
+	__u8 vmpl1_perms;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-- 
2.17.1



* [PATCH Part2 RFC v2 24/37] KVM: SVM: Reclaim the guest pages when SEV-SNP VM terminates
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (22 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 23/37] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 25/37] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Brijesh Singh
                   ` (12 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

The guest pages of an SEV-SNP VM may be added as private pages in the
RMP table (assigned bit is set). The guest private pages must be
transitioned to the hypervisor state before they are freed.
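
A minimal sketch of the one piece of arithmetic involved, assuming 4K base
pages: when a page is covered by a 2M RMP entry, the update has to target
the first pfn of that 2M region. The enum and the helper name are
hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define PAGES_PER_2M 512ULL /* number of 4K pages in a 2M region */

/* Hypothetical page-size tags for this sketch. */
enum rmp_level { RMP_LEVEL_4K = 0, RMP_LEVEL_2M = 1 };

/* Return the pfn that the RMP update should target for a given page. */
static uint64_t rmp_entry_base_pfn(uint64_t pfn, enum rmp_level level)
{
	if (level == RMP_LEVEL_2M)
		return pfn & ~(PAGES_PER_2M - 1); /* round down to 2M boundary */

	return pfn; /* a 4K entry covers exactly this page */
}
```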

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d97f37df1f3b..4ce91c2583a3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1920,6 +1920,45 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
 static void __unregister_enc_region_locked(struct kvm *kvm,
 					   struct enc_region *region)
 {
+	struct rmpupdate val = {};
+	unsigned long i, pfn;
+	struct rmpentry *e;
+	int level, rc;
+
+	/*
+	 * The guest memory pages are assigned in the RMP table. Unassign it
+	 * before releasing the memory.
+	 */
+	if (sev_snp_guest(kvm)) {
+		for (i = 0; i < region->npages; i++) {
+			pfn = page_to_pfn(region->pages[i]);
+
+			if (need_resched())
+				schedule();
+
+			e = snp_lookup_page_in_rmptable(region->pages[i], &level);
+			if (unlikely(!e))
+				continue;
+
+			/* If it's not a guest-assigned page, skip it. */
+			if (!rmpentry_assigned(e))
+				continue;
+
+			/* Is the page part of a 2MB RMP entry? */
+			if (level == PG_LEVEL_2M) {
+				val.pagesize = RMP_PG_SIZE_2M;
+				pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+			} else {
+				val.pagesize = RMP_PG_SIZE_4K;
+			}
+
+			/* Transition the page to hypervisor owned. */
+			rc = rmpupdate(pfn_to_page(pfn), &val);
+			if (rc)
+				pr_err("Failed to release pfn 0x%lx ret=%d\n", pfn, rc);
+		}
+	}
+
 	sev_unpin_memory(kvm, region->pages, region->npages);
 	list_del(&region->list);
 	kfree(region);
-- 
2.17.1



* [PATCH Part2 RFC v2 25/37] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (23 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 24/37] KVM: SVM: Reclaim the guest pages when SEV-SNP VM terminates Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 26/37] KVM: X86: Add kvm_x86_ops to get the max page level for the TDP Brijesh Singh
                   ` (11 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

The KVM_SEV_SNP_LAUNCH_FINISH command finalizes the cryptographic digest
and stores it as the measurement of the guest at launch.

While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
to encrypt the VMSA pages.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c   | 125 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h |  13 ++++
 2 files changed, 138 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4ce91c2583a3..7da24b3600c4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1743,6 +1743,111 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return ret;
 }
 
+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_update data = {};
+	int i, ret;
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.page_type = SNP_PAGE_TYPE_VMSA;
+
+	for (i = 0; i < kvm->created_vcpus; i++) {
+		struct vcpu_svm *svm = to_svm(kvm->vcpus[i]);
+		struct rmpupdate e = {};
+
+		/* Perform some pre-encryption checks against the VMSA */
+		ret = sev_es_sync_vmsa(svm);
+		if (ret)
+			return ret;
+
+		/* Transition the VMSA page to a firmware state. */
+		e.assigned = 1;
+		e.immutable = 1;
+		e.asid = sev->asid;
+		e.gpa = -1;
+		e.pagesize = RMP_PG_SIZE_4K;
+		ret = rmpupdate(virt_to_page(svm->vmsa), &e);
+		if (ret)
+			return ret;
+
+		/* Issue the SNP command to encrypt the VMSA */
+		data.address = __sme_pa(svm->vmsa);
+		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+				      &data, &argp->error);
+		if (ret) {
+			snp_page_reclaim(virt_to_page(svm->vmsa), RMP_PG_SIZE_4K);
+			return ret;
+		}
+
+		svm->vcpu.arch.guest_state_protected = true;
+	}
+
+	return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct sev_data_snp_launch_finish *data;
+	void *id_block = NULL, *id_auth = NULL;
+	struct kvm_sev_snp_launch_finish params;
+	int ret;
+
+	if (!sev_snp_guest(kvm))
+		return -ENOTTY;
+
+	if (!sev->snp_context)
+		return -EINVAL;
+
+	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+		return -EFAULT;
+
+	/* Measure all vCPUs using LAUNCH_UPDATE before we finalize the launch flow. */
+	ret = snp_launch_update_vmsa(kvm, argp);
+	if (ret)
+		return ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+	if (!data)
+		return -ENOMEM;
+
+	if (params.id_block_en) {
+		id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+		if (IS_ERR(id_block)) {
+			ret = PTR_ERR(id_block);
+			goto e_free;
+		}
+
+		data->id_block_en = 1;
+		data->id_block_paddr = __sme_pa(id_block);
+	}
+
+	if (params.auth_key_en) {
+		id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+		if (IS_ERR(id_auth)) {
+			ret = PTR_ERR(id_auth);
+			goto e_free_id_block;
+		}
+
+		data->auth_key_en = 1;
+		data->id_auth_paddr = __sme_pa(id_auth);
+	}
+
+	data->gctx_paddr = __psp_pa(sev->snp_context);
+	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+	kfree(id_auth);
+
+e_free_id_block:
+	kfree(id_block);
+
+e_free:
+	kfree(data);
+
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
@@ -1838,6 +1943,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 	case KVM_SEV_SNP_LAUNCH_UPDATE:
 		r = snp_launch_update(kvm, &sev_cmd);
 		break;
+	case KVM_SEV_SNP_LAUNCH_FINISH:
+		r = snp_launch_finish(kvm, &sev_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
@@ -2325,8 +2433,25 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
 
 	if (vcpu->arch.guest_state_protected)
 		sev_flush_guest_memory(svm, svm->vmsa, PAGE_SIZE);
+
+	/*
+	 * If it's an SNP guest, the VMSA was added to the RMP table as a guest-owned page.
+	 * Transition the page to hypervisor state before releasing it back to the system.
+	 */
+	if (sev_snp_guest(vcpu->kvm)) {
+		struct rmpupdate e = {};
+		int rc;
+
+		rc = rmpupdate(virt_to_page(svm->vmsa), &e);
+		if (rc) {
+			pr_err("Failed to release SNP guest VMSA page (rc %d), leaking it\n", rc);
+			goto skip_vmsa_free;
+		}
+	}
+
 	__free_page(virt_to_page(svm->vmsa));
 
+skip_vmsa_free:
 	if (svm->ghcb_sa_free)
 		kfree(svm->ghcb_sa);
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index dfc4975820d6..33f8919afac2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1682,6 +1682,7 @@ enum sev_cmd_id {
 	KVM_SEV_SNP_INIT,
 	KVM_SEV_SNP_LAUNCH_START,
 	KVM_SEV_SNP_LAUNCH_UPDATE,
+	KVM_SEV_SNP_LAUNCH_FINISH,
 
 	KVM_SEV_NR_MAX,
 };
@@ -1693,6 +1694,18 @@ struct kvm_sev_cmd {
 	__u32 sev_fd;
 };
 
+#define KVM_SEV_SNP_ID_BLOCK_SIZE	96
+#define KVM_SEV_SNP_ID_AUTH_SIZE	4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE	32
+
+struct kvm_sev_snp_launch_finish {
+	__u64 id_block_uaddr;
+	__u64 id_auth_uaddr;
+	__u8 id_block_en;
+	__u8 auth_key_en;
+	__u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+};
+
 struct kvm_sev_launch_start {
 	__u32 handle;
 	__u32 policy;
-- 
2.17.1



* [PATCH Part2 RFC v2 26/37] KVM: X86: Add kvm_x86_ops to get the max page level for the TDP
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (24 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 25/37] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 27/37] KVM: X86: Introduce kvm_mmu_map_tdp_page() for use by SEV Brijesh Singh
                   ` (10 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

When running an SEV-SNP VM, the sPA used to index the RMP entry is
obtained through the TDP translation (gva->gpa->spa). The TDP page
level is checked against the page level programmed in the RMP entry.
If the page level does not match, then it will cause a nested page
fault with the RMP bit set to indicate the RMP violation.

To keep the TDP and RMP page levels in sync, the KVM fault handlers
nonpaging_page_fault() and kvm_tdp_page_fault() call
get_tdp_max_page_level() to get the maximum allowed page level so that
they can limit the TDP level.

For an SEV-SNP guest, get_tdp_max_page_level() consults the RMP table to
compute the maximum allowed page level for a given GPA.
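
The clamping rule can be sketched in isolation as follows. This is a
simplified model, not the kernel implementation: the RMP lookup is replaced
by two plain parameters (rmp_entry_found, rmp_level), which are assumptions
made for this example, and the level values mirror the x86 PG_LEVEL_*
numbering.

```c
#include <stdbool.h>

/* Page levels, numbered as in the x86 PG_LEVEL_* enum. */
enum pg_level { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

/*
 * Return the largest TDP level allowed for a fault. Non-SNP guests and
 * pages without an RMP entry keep the caller's limit; otherwise the
 * level recorded in the RMP entry caps it.
 */
static int tdp_max_page_level(bool is_snp_guest, bool rmp_entry_found,
			      int rmp_level, int max_level)
{
	if (!is_snp_guest || !rmp_entry_found)
		return max_level;

	return rmp_level < max_level ? rmp_level : max_level;
}
```

With this rule a page tracked as 4K in the RMP is never mapped with a 2M
TDP entry, which is the mismatch that would otherwise raise an RMP fault.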

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu/mmu.c          |  6 ++++--
 arch/x86/kvm/svm/sev.c          | 20 ++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |  1 +
 arch/x86/kvm/svm/svm.h          |  1 +
 arch/x86/kvm/vmx/vmx.c          |  8 ++++++++
 6 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 71e79a1998ad..88033147a6bf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1382,6 +1382,7 @@ struct kvm_x86_ops {
 
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
+	int (*get_tdp_max_page_level)(struct kvm_vcpu *vcpu, gpa_t gpa, int max_level);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 930ac8a7e7c9..fe2c5a704a16 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3781,11 +3781,13 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa,
 				u32 error_code, bool prefault)
 {
+	int max_level = kvm_x86_ops.get_tdp_max_page_level(vcpu, gpa, PG_LEVEL_2M);
+
 	pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code);
 
 	/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
 	return direct_page_fault(vcpu, gpa & PAGE_MASK, error_code, prefault,
-				 PG_LEVEL_2M, false);
+				 max_level, false);
 }
 
 int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
@@ -3826,7 +3828,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 {
 	int max_level;
 
-	for (max_level = KVM_MAX_HUGEPAGE_LEVEL;
+	for (max_level = kvm_x86_ops.get_tdp_max_page_level(vcpu, gpa, KVM_MAX_HUGEPAGE_LEVEL);
 	     max_level > PG_LEVEL_4K;
 	     max_level--) {
 		int page_num = KVM_PAGES_PER_HPAGE(max_level);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7da24b3600c4..3203abbd22f3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3188,3 +3188,23 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
 
 	return pfn_to_page(pfn);
 }
+
+int sev_get_tdp_max_page_level(struct kvm_vcpu *vcpu, gpa_t gpa, int max_level)
+{
+	struct rmpentry *e;
+	kvm_pfn_t pfn;
+	int level;
+
+	if (!sev_snp_guest(vcpu->kvm))
+		return max_level;
+
+	pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
+	if (is_error_noslot_pfn(pfn))
+		return max_level;
+
+	e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &level);
+	if (unlikely(!e))
+		return max_level;
+
+	return min_t(uint32_t, level, max_level);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1b9091d750fc..81a83a7c1229 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4617,6 +4617,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+	.get_tdp_max_page_level = sev_get_tdp_max_page_level,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a870bbb64ce7..cf9f0e6c6827 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -567,6 +567,7 @@ void sev_es_create_vcpu(struct vcpu_svm *svm);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+int sev_get_tdp_max_page_level(struct kvm_vcpu *vcpu, gpa_t gpa, int max_level);
 
 /* vmenter.S */
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 10b610fc7bbc..6733f1557016 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7670,6 +7670,12 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
 	return supported & BIT(bit);
 }
 
+
+static int vmx_get_tdp_max_page_level(struct kvm_vcpu *vcpu, gpa_t gpa, int max_level)
+{
+	return max_level;
+}
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.hardware_unsetup = hardware_unsetup,
 
@@ -7800,6 +7806,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.complete_emulated_msr = kvm_complete_insn_gp,
 
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
+
+	.get_tdp_max_page_level = vmx_get_tdp_max_page_level,
 };
 
 static __init int hardware_setup(void)
-- 
2.17.1



* [PATCH Part2 RFC v2 27/37] KVM: X86: Introduce kvm_mmu_map_tdp_page() for use by SEV
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (25 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 26/37] KVM: X86: Add kvm_x86_ops to get the max page level for the TDP Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 28/37] KVM: X86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use Brijesh Singh
                   ` (9 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

Introduce a helper to directly fault-in a TDP page without going through
the full page fault path.  This allows SEV-SNP to build the nested page
table while handling the page state change VMGEXIT. A guest may issue a
page state change VMGEXIT before accessing the page. Create a fault so
that the VMGEXIT handler can get the TDP page level and keep the TDP and
RMP page levels in sync.
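
The retry behaviour of the helper can be modelled outside the kernel as
below. The result codes and the callback type are hypothetical stand-ins
(the kernel uses RET_PF_* values and calls direct_page_fault() directly);
the sketch only shows the loop-until-not-retry pattern.

```c
/* Hypothetical result codes; the kernel uses RET_PF_* values instead. */
enum fault_result { FAULT_RETRY, FAULT_FIXED, FAULT_ERROR };

typedef enum fault_result (*fault_fn)(void *ctx);

/* Re-run the fault handler until it stops asking for a retry. */
static enum fault_result map_page_with_retry(fault_fn fault, void *ctx)
{
	enum fault_result r;

	do {
		r = fault(ctx);
	} while (r == FAULT_RETRY);

	return r;
}

/* Example handler: asks for a retry until the counter in ctx reaches zero. */
static enum fault_result retry_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return FAULT_RETRY;
	}

	return FAULT_FIXED;
}
```

Because the inputs are fixed by the caller, looping is safe here: the
caller only sees a failure when the fault genuinely cannot be resolved.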

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/mmu.h     |  2 ++
 arch/x86/kvm/mmu/mmu.c | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 88d0ed5225a4..005ce139c97d 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -114,6 +114,8 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		       bool prefault);
 
+int kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, int max_level);
+
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 					u32 err, bool prefault)
 {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fe2c5a704a16..d150201cf10c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3842,6 +3842,26 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 				 max_level, true);
 }
 
+int kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, int max_level)
+{
+	int r;
+
+	/*
+	 * Loop on the page fault path to handle the case where an mmu_notifier
+	 * invalidation triggers RET_PF_RETRY.  In the normal page fault path,
+	 * KVM needs to resume the guest in case the invalidation changed any
+	 * of the page fault properties, i.e. the gpa or error code.  For this
+	 * path, the gpa and error code are fixed by the caller, and the caller
+	 * expects failure if and only if the page fault can't be fixed.
+	 */
+	do {
+		r = direct_page_fault(vcpu, gpa, error_code, false, max_level, true);
+	} while (r == RET_PF_RETRY);
+
+	return r;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_map_tdp_page);
+
 static void nonpaging_init_context(struct kvm_vcpu *vcpu,
 				   struct kvm_mmu *context)
 {
-- 
2.17.1



* [PATCH Part2 RFC v2 28/37] KVM: X86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (26 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 27/37] KVM: X86: Introduce kvm_mmu_map_tdp_page() for use by SEV Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 29/37] KVM: X86: Define new RMP check related #NPF error bits Brijesh Singh
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

SEV-SNP VMs may issue the page state change VMGEXIT to add a GPA as
private or shared in the RMP table. The page state change VMGEXIT
contains the RMP page level to be used in the RMP entry. If the page
levels of the TDP and RMP do not match, the result is a nested page
fault (RMP violation).

The SEV-SNP VMGEXIT handler will use kvm_mmu_get_tdp_walk() to get
the current page level in the TDP for the given GPA and calculate a
workable page level. If a GPA is mapped as a 4K page in the TDP, but
the guest requested to add the GPA as a 2M page in the RMP entry, then
the 2M request will be broken into 4K pages to keep the RMP and TDP
page levels in sync.
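
One way to picture the "workable page level" calculation, as a sketch
under the assumption that only 4K and 2M levels are involved: take the
smaller of the TDP level and the requested level, and issue one RMP update
per page at that level. The helper names are hypothetical and are not part
of this series.

```c
/* Page levels, numbered as in the x86 PG_LEVEL_* enum. */
enum pg_level { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2 };

#define PAGES_PER_2M 512UL /* number of 4K pages in a 2M region */

/* The level both the TDP mapping and the RMP entry can agree on. */
static int psc_workable_level(int tdp_level, int requested_level)
{
	return tdp_level < requested_level ? tdp_level : requested_level;
}

/* Number of RMP updates needed to satisfy one page state change request. */
static unsigned long psc_rmp_updates(int tdp_level, int requested_level)
{
	int level = psc_workable_level(tdp_level, requested_level);

	/* A 2M request backed by 4K TDP mappings is split into 4K updates. */
	if (requested_level == PG_LEVEL_2M && level == PG_LEVEL_4K)
		return PAGES_PER_2M;

	return 1;
}
```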

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/mmu.h     |  1 +
 arch/x86/kvm/mmu/mmu.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 005ce139c97d..147e76ab1536 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -115,6 +115,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		       bool prefault);
 
 int kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, int max_level);
+bool kvm_mmu_get_tdp_walk(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t *pfn, int *level);
 
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 					u32 err, bool prefault)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d150201cf10c..956bbc747167 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3862,6 +3862,35 @@ int kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, int m
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_map_tdp_page);
 
+bool kvm_mmu_get_tdp_walk(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t *pfn, int *level)
+{
+	u64 sptes[PT64_ROOT_MAX_LEVEL + 1];
+	int leaf, root;
+
+	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
+		leaf = kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, &root);
+	else
+		leaf = get_walk(vcpu, gpa, sptes, &root);
+
+	if (unlikely(leaf < 0))
+		return false;
+
+	/* Check if the leaf SPTE is present */
+	if (!is_shadow_present_pte(sptes[leaf]))
+		return false;
+
+	*pfn = spte_to_pfn(sptes[leaf]);
+	if (leaf > PG_LEVEL_4K) {
+		u64 page_mask = KVM_PAGES_PER_HPAGE(leaf) - KVM_PAGES_PER_HPAGE(leaf - 1);
+		*pfn |= (gpa_to_gfn(gpa) & page_mask);
+	}
+
+	*level = leaf;
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_get_tdp_walk);
+
 static void nonpaging_init_context(struct kvm_vcpu *vcpu,
 				   struct kvm_mmu *context)
 {
-- 
2.17.1



* [PATCH Part2 RFC v2 29/37] KVM: X86: Define new RMP check related #NPF error bits
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (27 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 28/37] KVM: X86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 30/37] KVM: X86: update page-fault trace to log the 64-bit error code Brijesh Singh
                   ` (7 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

When SEV-SNP is enabled globally, the hardware places restrictions on all
memory accesses based on the RMP entry, regardless of whether the
hypervisor or a VM performs the access. When hardware encounters an RMP
access violation during a guest access, it will cause a #VMEXIT(NPF).

See APM2 section 16.36.10 for more details.
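
A small sketch of how the new bits might be tested, using the bit
positions defined by this patch. The helper names are hypothetical. Note
that three of the four new bits sit above bit 31, so they are lost if the
error code is truncated to 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined by this patch. */
#define PFERR_GUEST_RMP_BIT	31
#define PFERR_GUEST_ENC_BIT	34
#define PFERR_GUEST_SIZEM_BIT	35
#define PFERR_GUEST_VMPL_BIT	36

#define PFERR_GUEST_RMP_MASK	(1ULL << PFERR_GUEST_RMP_BIT)
#define PFERR_GUEST_ENC_MASK	(1ULL << PFERR_GUEST_ENC_BIT)
#define PFERR_GUEST_SIZEM_MASK	(1ULL << PFERR_GUEST_SIZEM_BIT)
#define PFERR_GUEST_VMPL_MASK	(1ULL << PFERR_GUEST_VMPL_BIT)

/* True if the #NPF error code has the RMP bit set. */
static bool npf_is_rmp_fault(uint64_t error_code)
{
	return (error_code & PFERR_GUEST_RMP_MASK) != 0;
}

/* True if the #NPF error code has the size-mismatch bit set. */
static bool npf_is_size_mismatch(uint64_t error_code)
{
	return (error_code & PFERR_GUEST_SIZEM_MASK) != 0;
}
```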

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 88033147a6bf..ad01fe9f4c43 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -237,8 +237,12 @@ enum x86_intercept_stage;
 #define PFERR_FETCH_BIT 4
 #define PFERR_PK_BIT 5
 #define PFERR_SGX_BIT 15
+#define PFERR_GUEST_RMP_BIT 31
 #define PFERR_GUEST_FINAL_BIT 32
 #define PFERR_GUEST_PAGE_BIT 33
+#define PFERR_GUEST_ENC_BIT 34
+#define PFERR_GUEST_SIZEM_BIT 35
+#define PFERR_GUEST_VMPL_BIT 36
 
 #define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
 #define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
@@ -249,6 +253,10 @@ enum x86_intercept_stage;
 #define PFERR_SGX_MASK (1U << PFERR_SGX_BIT)
 #define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
 #define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)
+#define PFERR_GUEST_RMP_MASK (1ULL << PFERR_GUEST_RMP_BIT)
+#define PFERR_GUEST_ENC_MASK (1ULL << PFERR_GUEST_ENC_BIT)
+#define PFERR_GUEST_SIZEM_MASK (1ULL << PFERR_GUEST_SIZEM_BIT)
+#define PFERR_GUEST_VMPL_MASK (1ULL << PFERR_GUEST_VMPL_BIT)
 
 #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
 				 PFERR_WRITE_MASK |		\
-- 
2.17.1



* [PATCH Part2 RFC v2 30/37] KVM: X86: update page-fault trace to log the 64-bit error code
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (28 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 29/37] KVM: X86: Define new RMP check related #NPF error bits Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 31/37] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT Brijesh Singh
                   ` (6 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

The page-fault error code is a 64-bit value, but the trace prints only
the lower 32 bits. Some of the SEV-SNP RMP fault error codes are
available in the upper 32 bits.
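
To make the problem concrete, here is a small user-space sketch (not kernel
code; both helper names are made up for this example). It shows what an
"unsigned int" field keeps of a 64-bit error code, and how the full value is
printed with "%llx".

```c
#include <stdint.h>
#include <stdio.h>

/* What a 32-bit trace field would retain of the 64-bit error code. */
static inline uint32_t truncated_error_code(uint64_t error_code)
{
	return (uint32_t)error_code;
}

/* Print the full 64-bit value, using the same "%llx" conversion as the updated trace. */
static inline int format_error_code(char *buf, size_t len, uint64_t error_code)
{
	return snprintf(buf, len, "error_code %llx",
			(unsigned long long)error_code);
}
```

For an error code with bit 35 and bit 0 set, the truncated value is just 0x1,
so the RMP-related information is lost from the trace.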

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/trace.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index a61c015870e3..78cbf53bf412 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -365,12 +365,12 @@ TRACE_EVENT(kvm_inj_exception,
  * Tracepoint for page fault.
  */
 TRACE_EVENT(kvm_page_fault,
-	TP_PROTO(unsigned long fault_address, unsigned int error_code),
+	TP_PROTO(unsigned long fault_address, u64 error_code),
 	TP_ARGS(fault_address, error_code),
 
 	TP_STRUCT__entry(
 		__field(	unsigned long,	fault_address	)
-		__field(	unsigned int,	error_code	)
+		__field(	u64,		error_code	)
 	),
 
 	TP_fast_assign(
@@ -378,7 +378,7 @@ TRACE_EVENT(kvm_page_fault,
 		__entry->error_code	= error_code;
 	),
 
-	TP_printk("address %lx error_code %x",
+	TP_printk("address %lx error_code %llx",
 		  __entry->fault_address, __entry->error_code)
 );
 
-- 
2.17.1



* [PATCH Part2 RFC v2 31/37] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (29 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 30/37] KVM: X86: update page-fault trace to log the 64-bit error code Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 32/37] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Brijesh Singh
                   ` (5 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

SEV-SNP guests are required to perform a GHCB GPA registration (see
section 2.5.2 in the GHCB specification). Before using a GHCB GPA for a
vCPU for the first time, a guest must register the vCPU GHCB GPA. If the
hypervisor can work with the guest-requested GPA, it must respond with the
same GPA; otherwise it returns -1.

On VMEXIT, verify that the GHCB GPA matches the registered value. If a
mismatch is detected, abort the guest.
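
The protocol described above can be modelled in a few lines of user-space C.
This is only a sketch: struct vcpu_model, the helper names and both constants
are hypothetical, and the explicit "nothing registered yet" sentinel is an
assumption of the sketch, not something this patch implements (the patch
compares against the stored field directly).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values; the real constants live in the kernel headers. */
#define GPA_REG_ERROR		UINT64_MAX	/* the "-1" response */
#define GPA_UNREGISTERED	UINT64_MAX	/* sentinel: no GPA registered */

/* Minimal stand-in for the per-vCPU state that holds the registered GPA. */
struct vcpu_model {
	uint64_t registered_gpa;
};

static inline void vcpu_model_init(struct vcpu_model *v)
{
	v->registered_gpa = GPA_UNREGISTERED;
}

/*
 * Handle a registration request. 'usable' stands in for the hypervisor's
 * check that it can work with the requested GPA. Echo the GPA on success,
 * return the error value otherwise.
 */
static inline uint64_t register_ghcb_gpa(struct vcpu_model *v, uint64_t gpa,
					 bool usable)
{
	if (!usable)
		return GPA_REG_ERROR;

	v->registered_gpa = gpa;
	return gpa;
}

/* Check performed on every exit that uses the GHCB. */
static inline bool ghcb_gpa_allowed(const struct vcpu_model *v, uint64_t gpa)
{
	return v->registered_gpa != GPA_UNREGISTERED &&
	       v->registered_gpa == gpa;
}
```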

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 25 +++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h |  7 +++++++
 2 files changed, 32 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3203abbd22f3..1cba9d770860 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2904,6 +2904,25 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_GPA_REG_REQ: {
+		kvm_pfn_t pfn;
+		u64 gfn;
+
+		gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_REG_VALUE_MASK,
+					GHCB_MSR_GPA_REG_VALUE_POS);
+
+		pfn = kvm_vcpu_gfn_to_pfn(vcpu, gfn);
+		if (is_error_noslot_pfn(pfn))
+			gfn = GHCB_MSR_GPA_REG_ERROR;
+		else
+			svm->ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+		set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_REG_VALUE_MASK,
+				  GHCB_MSR_GPA_REG_VALUE_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_GPA_REG_RESP, GHCB_MSR_INFO_MASK,
+				  GHCB_MSR_INFO_POS);
+		break;
+	}
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
@@ -2952,6 +2971,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		return -EINVAL;
 	}
 
+	/* SEV-SNP guest requires that the GHCB GPA must be registered */
+	if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+		vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+		return -EINVAL;
+	}
+
 	svm->ghcb = svm->ghcb_map.hva;
 	ghcb = svm->ghcb_map.hva;
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index cf9f0e6c6827..243503fa3fd6 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -185,6 +185,8 @@ struct vcpu_svm {
 	bool ghcb_sa_free;
 
 	bool guest_state_loaded;
+
+	u64 ghcb_registered_gpa;
 };
 
 struct svm_cpu_data {
@@ -245,6 +247,11 @@ static inline bool sev_snp_guest(struct kvm *kvm)
 #endif
 }
 
+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+	return svm->ghcb_registered_gpa == val;
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
 	vmcb->control.clean = 0;
-- 
2.17.1



* [PATCH Part2 RFC v2 32/37] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (30 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 31/37] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-05-10 17:30   ` Peter Gonda
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 33/37] KVM: SVM: Add support to handle " Brijesh Singh
                   ` (4 subsequent siblings)
  36 siblings, 1 reply; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.

Before changing the page state in the RMP entry, look up the page in
the TDP to make sure that there is a valid mapping for it. If the mapping
exists, try to find a workable page level between the TDP and RMP for
the page. If the page is not mapped in the TDP, create a fault so that it
gets mapped before the page state is changed in the RMP entry.
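
The level negotiation can be sketched in user-space C as follows. This is a
simplified model, not the code in this patch: psc_walk(), the callback type
and the example callbacks are hypothetical, and where the patch faults the
page into the TDP the sketch simply gives up.

```c
#include <stdint.h>

enum { LEVEL_4K = 1, LEVEL_2M = 2 };	/* same values as PG_LEVEL_4K/2M */

static inline uint64_t level_size(int level)
{
	return level == LEVEL_2M ? (2ULL << 20) : (4ULL << 10);
}

/* Hypothetical lookup: TDP mapping level for gpa, or 0 if not mapped. */
typedef int (*tdp_level_fn)(uint64_t gpa);

/* Example lookups used to exercise the walk. */
static inline int tdp_always_4k(uint64_t gpa) { (void)gpa; return LEVEL_4K; }
static inline int tdp_always_2m(uint64_t gpa) { (void)gpa; return LEVEL_2M; }
static inline int tdp_unmapped(uint64_t gpa)  { (void)gpa; return 0; }

/*
 * Walk [gpa, gpa + size of the requested level) and count how many RMP
 * updates would be issued. Returns -1 if a gpa has no TDP mapping.
 */
static inline int psc_walk(uint64_t gpa, int level, tdp_level_fn tdp_level)
{
	uint64_t end = gpa + level_size(level);
	int updates = 0;

	while (gpa < end) {
		int tdp = tdp_level(gpa);

		if (tdp <= 0)
			return -1;

		/* Never use a level larger than the backing TDP mapping. */
		if (tdp < level)
			level = tdp;

		/* An RMP update for this gpa at 'level' would go here. */
		updates++;
		gpa += level_size(level);
	}

	return updates;
}
```

A 2M request backed by 4K TDP mappings is therefore handled as 512 separate
4K updates, while a 2M request backed by a 2M mapping needs a single update.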

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 137 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1cba9d770860..cc2628d8ef1b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -28,6 +28,7 @@
 #include "svm_ops.h"
 #include "cpuid.h"
 #include "trace.h"
+#include "mmu.h"
 
 #define __ex(x) __kvm_handle_fault_on_reboot(x)
 
@@ -2825,6 +2826,127 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
 	svm->vmcb->control.ghcb_gpa = value;
 }
 
+static int snp_rmptable_psmash(struct kvm_vcpu *vcpu, kvm_pfn_t pfn)
+{
+	pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+	return psmash(pfn_to_page(pfn));
+}
+
+static int snp_make_page_shared(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn, int level)
+{
+	struct rmpupdate val;
+	int rc, rmp_level;
+	struct rmpentry *e;
+
+	e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
+	if (!e)
+		return -EINVAL;
+
+	if (!rmpentry_assigned(e))
+		return 0;
+
+	/* Log if the entry is validated */
+	if (rmpentry_validated(e))
+		pr_debug_ratelimited("Remove RMP entry for a validated gpa 0x%llx\n", gpa);
+
+	/*
+	 * Is the page part of an existing 2M RMP entry? Split the 2M entry into
+	 * multiple 4K entries before making the memory shared.
+	 */
+	if ((level == PG_LEVEL_4K) && (rmp_level == PG_LEVEL_2M)) {
+		rc = snp_rmptable_psmash(vcpu, pfn);
+		if (rc)
+			return rc;
+	}
+
+	memset(&val, 0, sizeof(val));
+	val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+	return rmpupdate(pfn_to_page(pfn), &val);
+}
+
+static int snp_make_page_private(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn, int level)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
+	struct rmpupdate val;
+	struct rmpentry *e;
+	int rmp_level;
+
+	e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
+	if (!e)
+		return -EINVAL;
+
+	/* Log if the entry is validated */
+	if (rmpentry_validated(e))
+		pr_err_ratelimited("Asked to make a pre-validated gpa %llx private\n", gpa);
+
+	memset(&val, 0, sizeof(val));
+	val.gpa = gpa;
+	val.asid = sev->asid;
+	val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+	val.assigned = true;
+
+	return rmpupdate(pfn_to_page(pfn), &val);
+}
+
+static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, int op, gpa_t gpa, int level)
+{
+	struct kvm *kvm = vcpu->kvm;
+	int rc = -EINVAL, tdp_level;
+	kvm_pfn_t pfn;
+	gpa_t gpa_end;
+
+	gpa_end = gpa + page_level_size(level);
+
+	while (gpa < gpa_end) {
+		/*
+		 * Get the pfn and level for the gpa from the nested page table.
+		 *
+		 * If the TDP walk failed, then it's safe to say that we don't have a valid
+		 * mapping for the gpa in the nested page table. Create a fault to map the
+		 * page in the nested page table.
+		 */
+		if (!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &tdp_level)) {
+			pfn = kvm_mmu_map_tdp_page(vcpu, gpa, PFERR_USER_MASK, level);
+			if (is_error_noslot_pfn(pfn))
+				goto out;
+
+			if (!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &tdp_level))
+				goto out;
+		}
+
+		/* Adjust the level so that we don't go higher than the backing page level */
+		level = min_t(size_t, level, tdp_level);
+
+		write_lock(&kvm->mmu_lock);
+
+		switch (op) {
+		case SNP_PAGE_STATE_SHARED:
+			rc = snp_make_page_shared(vcpu, gpa, pfn, level);
+			break;
+		case SNP_PAGE_STATE_PRIVATE:
+			rc = snp_make_page_private(vcpu, gpa, pfn, level);
+			break;
+		default:
+			rc = -EINVAL;
+			break;
+		}
+
+		write_unlock(&kvm->mmu_lock);
+
+		if (rc) {
+			pr_err_ratelimited("Error op %d gpa %llx pfn %llx level %d rc %d\n",
+					   op, gpa, pfn, level, rc);
+			goto out;
+		}
+
+		gpa = gpa + page_level_size(level);
+	}
+
+out:
+	return rc;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -2923,6 +3045,21 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_POS);
 		break;
 	}
+	case GHCB_MSR_PSC_REQ: {
+		gfn_t gfn;
+		int ret;
+		u8 op;
+
+		gfn = get_ghcb_msr_bits(svm, GHCB_MSR_PSC_GFN_MASK, GHCB_MSR_PSC_GFN_POS);
+		op = get_ghcb_msr_bits(svm, GHCB_MSR_PSC_OP_MASK, GHCB_MSR_PSC_OP_POS);
+
+		ret = __snp_handle_page_state_change(vcpu, op, gfn_to_gpa(gfn), PG_LEVEL_4K);
+
+		set_ghcb_msr_bits(svm, ret, GHCB_MSR_PSC_ERROR_MASK, GHCB_MSR_PSC_ERROR_POS);
+		set_ghcb_msr_bits(svm, 0, GHCB_MSR_PSC_RSVD_MASK, GHCB_MSR_PSC_RSVD_POS);
+		set_ghcb_msr_bits(svm, GHCB_MSR_PSC_RESP, GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+		break;
+	}
 	case GHCB_MSR_TERM_REQ: {
 		u64 reason_set, reason_code;
 
-- 
2.17.1



* [PATCH Part2 RFC v2 33/37] KVM: SVM: Add support to handle Page State Change VMGEXIT
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (31 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 32/37] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 34/37] KVM: X86: Export the kvm_zap_gfn_range() for the SNP use Brijesh Singh
                   ` (3 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification section 4.1.6.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 58 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index cc2628d8ef1b..bd71ece35597 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2641,6 +2641,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 	case SVM_VMGEXIT_HYPERVISOR_FEATURES:
+	case SVM_VMGEXIT_SNP_PAGE_STATE_CHANGE:
 		break;
 	default:
 		goto vmgexit_err;
@@ -2927,6 +2928,7 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, int op, gpa_t g
 		case SNP_PAGE_STATE_PRIVATE:
 			rc = snp_make_page_private(vcpu, gpa, pfn, level);
 			break;
+		/* TODO: Add USMASH and PSMASH support */
 		default:
 			rc = -EINVAL;
 			break;
@@ -2947,6 +2949,53 @@ static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, int op, gpa_t g
 	return rc;
 }
 
+static unsigned long snp_handle_page_state_change(struct vcpu_svm *svm, struct ghcb *ghcb)
+{
+	struct snp_page_state_entry *entry;
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct snp_page_state_change *info;
+	unsigned long rc;
+	int level, op;
+	gpa_t gpa;
+
+	if (!sev_snp_guest(vcpu->kvm))
+		return -ENXIO;
+
+	if (!setup_vmgexit_scratch(svm, true, sizeof(ghcb->save.sw_scratch))) {
+		pr_err("vmgexit: scratch area is not setup.\n");
+		return -EINVAL;
+	}
+
+	info = (struct snp_page_state_change *)svm->ghcb_sa;
+	entry = &info->entry[info->header.cur_entry];
+
+	if ((info->header.cur_entry >= VMGEXIT_PSC_MAX_ENTRY) ||
+	    (info->header.end_entry >= VMGEXIT_PSC_MAX_ENTRY) ||
+	    (info->header.cur_entry > info->header.end_entry))
+		return VMGEXIT_PSC_INVALID_HEADER;
+
+	while (info->header.cur_entry <= info->header.end_entry) {
+		entry = &info->entry[info->header.cur_entry];
+		gpa = gfn_to_gpa(entry->gfn);
+		level = RMP_TO_X86_PG_LEVEL(entry->pagesize);
+		op = entry->operation;
+
+		if (!IS_ALIGNED(gpa, page_level_size(level))) {
+			rc = VMGEXIT_PSC_INVALID_ENTRY;
+			goto out;
+		}
+
+		rc = __snp_handle_page_state_change(vcpu, op, gpa, level);
+		if (rc)
+			goto out;
+
+		info->header.cur_entry++;
+	}
+
+out:
+	return rc;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3187,6 +3236,15 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = 1;
 		break;
 	}
+	case SVM_VMGEXIT_SNP_PAGE_STATE_CHANGE: {
+		unsigned long rc;
+
+		ret = 1;
+
+		rc = snp_handle_page_state_change(svm, ghcb);
+		ghcb_set_sw_exit_info_2(ghcb, rc);
+		break;
+	}
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
-- 
2.17.1



* [PATCH Part2 RFC v2 34/37] KVM: X86: Export the kvm_zap_gfn_range() for the SNP use
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (32 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 33/37] KVM: SVM: Add support to handle " Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 35/37] KVM: SVM: Add support to handle the RMP nested page fault Brijesh Singh
                   ` (2 subsequent siblings)
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

While resolving the RMP page fault, we may run into cases where the page
level of the RMP entry and the TDP do not match and the 2M RMP entry
must be split into 4K RMP entries, or a 2M TDP page needs to be broken
into multiple 4K pages.

To keep the RMP and TDP page levels in sync, zap the gfn range after
splitting the pages in the RMP entry. The zap should force the TDP to
get rebuilt with the new page level.
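
For reference, the gfn range that covers the 2M region around a faulting
address can be computed as in the sketch below. The structure and function
names are hypothetical; the arithmetic mirrors how a later patch in this
series derives the arguments it passes to kvm_zap_gfn_range().

```c
#include <stdint.h>

#define PAGE_SHIFT_4K	12
#define PAGES_PER_2M	512ULL	/* 2 MiB / 4 KiB */

/* Half-open gfn range [start, end). */
struct gfn_range {
	uint64_t start;
	uint64_t end;
};

/* Return the 2M-aligned gfn range that contains gpa. */
static inline struct gfn_range zap_range_for_gpa(uint64_t gpa)
{
	uint64_t gfn = gpa >> PAGE_SHIFT_4K;
	struct gfn_range r;

	r.start = gfn & ~(PAGES_PER_2M - 1);
	r.end = r.start + PAGES_PER_2M;
	return r;
}
```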

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/mmu.h              | 2 --
 arch/x86/kvm/mmu/mmu.c          | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ad01fe9f4c43..7d4db88b94f3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1481,6 +1481,8 @@ void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+
 
 int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
 bool pdptrs_changed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 147e76ab1536..eec62011bb2e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -228,8 +228,6 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return -(u32)fault & errcode;
 }
 
-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 956bbc747167..d484f9e8a6b5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5657,6 +5657,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 
 	return need_tlb_flush;
 }
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);
 
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot)
-- 
2.17.1



* [PATCH Part2 RFC v2 35/37] KVM: SVM: Add support to handle the RMP nested page fault
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (33 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 34/37] KVM: X86: Export the kvm_zap_gfn_range() for the SNP use Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 36/37] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Brijesh Singh
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 37/37] KVM: SVM: Advertise the SEV-SNP feature support Brijesh Singh
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

Follow the recommendation from APM2 sections 15.36.10 and 15.36.11 to
resolve the RMP violation encountered during the nested page table walk.
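
The decision table that this patch documents in snp_handle_rmp_page_fault()
can be restated as a pure function. The sketch below is only a restatement
for readers: the enum and function names are hypothetical, and in the patch
every case is followed by a zap of the surrounding 2M gfn range.

```c
#include <stdbool.h>

enum { LEVEL_4K = 1, LEVEL_2M = 2 };	/* same values as PG_LEVEL_4K/2M */

enum rmp_fault_action {
	RMP_ACT_NONE,		/* nothing to change in the RMP */
	RMP_ACT_PSMASH,		/* split the 2M RMP entry into 4K entries */
	RMP_ACT_MAKE_PRIVATE,	/* add a private (assigned) RMP entry */
	RMP_ACT_MAKE_SHARED,	/* clear the assigned RMP entry */
};

/* Pick the RMP operation for a fault, checking the cases in table order. */
static inline enum rmp_fault_action
pick_rmp_fault_action(bool size_mismatch, int npt_level, int rmp_level,
		      bool assigned, bool private_access)
{
	if (size_mismatch ||
	    (npt_level == LEVEL_4K && rmp_level == LEVEL_2M && private_access))
		return RMP_ACT_PSMASH;

	if (!assigned && private_access)
		return RMP_ACT_MAKE_PRIVATE;

	if (assigned && !private_access)
		return RMP_ACT_MAKE_SHARED;

	return RMP_ACT_NONE;
}
```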

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/mmu/mmu.c          | 20 ++++++++++++
 arch/x86/kvm/svm/sev.c          | 57 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |  1 +
 arch/x86/kvm/svm/svm.h          |  2 ++
 5 files changed, 82 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7d4db88b94f3..8413220f3a83 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1391,6 +1391,8 @@ struct kvm_x86_ops {
 	void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*get_tdp_max_page_level)(struct kvm_vcpu *vcpu, gpa_t gpa, int max_level);
+	int (*handle_rmp_page_fault)(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn,
+			int level, u64 error_code);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d484f9e8a6b5..0a2bf3c2af14 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5096,6 +5096,18 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	write_unlock(&vcpu->kvm->mmu_lock);
 }
 
+static int handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+	kvm_pfn_t pfn;
+	int level;
+
+	if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &level)))
+		return RET_PF_RETRY;
+
+	kvm_x86_ops.handle_rmp_page_fault(vcpu, gpa, pfn, level, error_code);
+	return RET_PF_RETRY;
+}
+
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
 		       void *insn, int insn_len)
 {
@@ -5112,6 +5124,14 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
 			goto emulate;
 	}
 
+	if (unlikely(error_code & PFERR_GUEST_RMP_MASK)) {
+		r = handle_rmp_page_fault(vcpu, cr2_or_gpa, error_code);
+		if (r == RET_PF_RETRY)
+			return 1;
+		else
+			return r;
+	}
+
 	if (r == RET_PF_INVALID) {
 		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
 					  lower_32_bits(error_code), false);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index bd71ece35597..8e4783e105b9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3428,3 +3428,60 @@ int sev_get_tdp_max_page_level(struct kvm_vcpu *vcpu, gpa_t gpa, int max_level)
 
 	return min_t(uint32_t, level, max_level);
 }
+
+int snp_handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn,
+			      int level, u64 error_code)
+{
+	struct rmpentry *e;
+	int rlevel, rc = 0;
+	bool private;
+	gfn_t gfn;
+
+	e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rlevel);
+	if (!e)
+		return 1;
+
+	private = !!(error_code & PFERR_GUEST_ENC_MASK);
+
+	/*
+	 * See APM section 15.36.11 on how to handle the RMP fault for the large pages.
+	 *
+	 *  npt	     rmp    access      action
+	 *  --------------------------------------------------
+	 *  4k       2M     C=1       psmash
+	 *  x        x      C=1       if page is not private then add a new RMP entry
+	 *  x        x      C=0       if page is private then make it shared
+	 *  2M       4k     C=x       zap
+	 */
+	if ((error_code & PFERR_GUEST_SIZEM_MASK) ||
+	    ((level == PG_LEVEL_4K) && (rlevel == PG_LEVEL_2M) && private)) {
+		rc = snp_rmptable_psmash(vcpu, pfn);
+		goto zap_gfn;
+	}
+
+	/*
+	 * If it's a private access, and the page is not assigned in the RMP table, create a
+	 * new private RMP entry.
+	 */
+	if (!rmpentry_assigned(e) && private) {
+		rc = snp_make_page_private(vcpu, gpa, pfn, PG_LEVEL_4K);
+		goto zap_gfn;
+	}
+
+	/*
+	 * If it's a shared access, then make the page shared in the RMP table.
+	 */
+	if (rmpentry_assigned(e) && !private)
+		rc = snp_make_page_shared(vcpu, gpa, pfn, PG_LEVEL_4K);
+
+zap_gfn:
+	/*
+	 * Now that we have updated the RMP pagesize, zap the existing rmaps for
+	 * large entry ranges so that nested page table gets rebuilt with the updated RMP
+	 * pagesize.
+	 */
+	gfn = gpa_to_gfn(gpa) & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+	kvm_zap_gfn_range(vcpu->kvm, gfn, gfn + 512);
+
+	return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 81a83a7c1229..fd58f7358386 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4618,6 +4618,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 	.get_tdp_max_page_level = sev_get_tdp_max_page_level,
+	.handle_rmp_page_fault = snp_handle_rmp_page_fault,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 243503fa3fd6..011374e6b2b2 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -575,6 +575,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 int sev_get_tdp_max_page_level(struct kvm_vcpu *vcpu, gpa_t gpa, int max_level);
+int snp_handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn,
+			      int level, u64 error_code);
 
 /* vmenter.S */
 
-- 
2.17.1



* [PATCH Part2 RFC v2 36/37] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (34 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 35/37] KVM: SVM: Add support to handle the RMP nested page fault Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  2021-05-10 18:57   ` Peter Gonda
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 37/37] KVM: SVM: Advertise the SEV-SNP feature support Brijesh Singh
  36 siblings, 1 reply; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

Version 2 of the GHCB specification added support for the SNP Guest Request
Message NAE event. The event allows an SEV-SNP guest to make requests to the
SEV-SNP firmware through the hypervisor using the SNP_GUEST_REQUEST API
defined in the SEV-SNP firmware specification.

The SNP_GUEST_REQUEST requires two unique pages, one page for the request
and one page for the response. The GHCB specification says that both pages
need to be in the hypervisor state, but before executing the SEV-SNP
command the response page needs to be in the firmware state.

In order to minimize the page state transitions during command handling,
pre-allocate a firmware page on guest creation. Use the pre-allocated
firmware page to complete the command execution and copy the result into
the guest response page.

Ratelimit the handling of the SNP_GUEST_REQUEST NAE event to avoid the
possibility of a guest mounting a denial-of-service attack against the SNP
firmware.
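
For readers not familiar with interval/burst rate limiting, the idea is
sketched below in user-space C. This is not the kernel's ratelimit
implementation; struct msg_ratelimit and both helpers are hypothetical, and
time is passed in explicitly (in caller-defined ticks) to keep the example
deterministic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Allow at most 'burst' requests per 'interval' ticks. */
struct msg_ratelimit {
	uint64_t interval;	/* window length in ticks */
	unsigned int burst;	/* requests allowed per window */
	uint64_t window_start;	/* tick at which the current window began */
	unsigned int used;	/* requests admitted in the current window */
};

static inline void msg_ratelimit_init(struct msg_ratelimit *rl,
				      uint64_t interval, unsigned int burst)
{
	rl->interval = interval;
	rl->burst = burst;
	rl->window_start = 0;
	rl->used = 0;
}

/* Return true if a request arriving at tick 'now' may proceed. */
static inline bool msg_ratelimit_allow(struct msg_ratelimit *rl, uint64_t now)
{
	/* Start a fresh window once the current one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->used = 0;
	}

	if (rl->used >= rl->burst)
		return false;

	rl->used++;
	return true;
}
```

In the patch, a request that is refused by the limiter is not forwarded to
the firmware and the guest gets -EAGAIN back in sw_exit_info_2.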

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 106 +++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/svm/svm.h |   3 ++
 2 files changed, 106 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 8e4783e105b9..da4158da644b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -18,6 +18,7 @@
 #include <linux/processor.h>
 #include <linux/trace_events.h>
 #include <linux/sev.h>
+#include <linux/kvm_host.h>
 #include <asm/fpu/internal.h>
 
 #include <asm/trapnr.h>
@@ -1517,6 +1518,7 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 	struct sev_data_snp_gctx_create data = {};
 	void *context;
 	int rc;
@@ -1526,14 +1528,24 @@ static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (!context)
 		return NULL;
 
-	data.gctx_paddr = __psp_pa(context);
-	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
-	if (rc) {
+	/* Allocate a firmware buffer used during the guest command handling. */
+	sev->snp_resp_page = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+	if (!sev->snp_resp_page) {
 		snp_free_firmware_page(context);
 		return NULL;
 	}
 
+	data.gctx_paddr = __psp_pa(context);
+	rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+	if (rc)
+		goto e_free;
+
 	return context;
+
+e_free:
+	snp_free_firmware_page(context);
+	snp_free_firmware_page(sev->snp_resp_page);
+	return NULL;
 }
 
 static int snp_bind_asid(struct kvm *kvm, int *error)
@@ -1601,6 +1613,9 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (rc)
 		goto e_free_context;
 
+	/* Used for rate limiting SNP guest message request, use the default settings */
+	ratelimit_default_init(&sev->snp_guest_msg_rs);
+
 	return 0;
 
 e_free_context:
@@ -2197,6 +2212,9 @@ static int snp_decommission_context(struct kvm *kvm)
 	snp_free_firmware_page(sev->snp_context);
 	sev->snp_context = NULL;
 
+	/* Free the response page. */
+	snp_free_firmware_page(sev->snp_resp_page);
+
 	return 0;
 }
 
@@ -2642,6 +2660,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 	case SVM_VMGEXIT_HYPERVISOR_FEATURES:
 	case SVM_VMGEXIT_SNP_PAGE_STATE_CHANGE:
+	case SVM_VMGEXIT_SNP_GUEST_REQUEST:
 		break;
 	default:
 		goto vmgexit_err;
@@ -2996,6 +3015,81 @@ static unsigned long snp_handle_page_state_change(struct vcpu_svm *svm, struct g
 	return rc;
 }
 
+static void snp_handle_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
+				    gpa_t req_gpa, gpa_t resp_gpa)
+{
+	struct sev_data_snp_guest_request data = {};
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
+	kvm_pfn_t req_pfn, resp_pfn;
+	struct kvm_sev_info *sev;
+	int rc, err = 0;
+
+	if (!sev_snp_guest(vcpu->kvm)) {
+		rc = -ENODEV;
+		goto e_fail;
+	}
+
+	sev = &to_kvm_svm(kvm)->sev_info;
+
+	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
+		pr_info_ratelimited("svm: too many guest message requests\n");
+		rc = -EAGAIN;
+		goto e_fail;
+	}
+
+	if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE)) {
+		pr_err("svm: guest request (%#llx) or response (%#llx) is not page aligned\n",
+			req_gpa, resp_gpa);
+		goto e_term;
+	}
+
+	req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+	if (is_error_noslot_pfn(req_pfn)) {
+		pr_err("svm: guest request invalid gpa=%#llx\n", req_gpa);
+		goto e_term;
+	}
+
+	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+	if (is_error_noslot_pfn(resp_pfn)) {
+		pr_err("svm: guest response invalid gpa=%#llx\n", resp_gpa);
+		goto e_term;
+	}
+
+	data.gctx_paddr = __psp_pa(sev->snp_context);
+	data.req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+	data.res_paddr = __psp_pa(sev->snp_resp_page);
+
+	mutex_lock(&kvm->lock);
+
+	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
+	if (rc) {
+		mutex_unlock(&kvm->lock);
+
+		/* If we have a firmware error code then use it. */
+		if (err)
+			rc = err;
+
+		goto e_fail;
+	}
+
+	/* Copy the response after the firmware returns success. */
+	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
+
+	mutex_unlock(&kvm->lock);
+
+e_fail:
+	ghcb_set_sw_exit_info_2(ghcb, rc);
+	return;
+
+e_term:
+	ghcb_set_sw_exit_info_1(ghcb, 1);
+	ghcb_set_sw_exit_info_2(ghcb,
+				X86_TRAP_GP |
+				SVM_EVTINJ_TYPE_EXEPT |
+				SVM_EVTINJ_VALID);
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3245,6 +3339,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ghcb_set_sw_exit_info_2(ghcb, rc);
 		break;
 	}
+	case SVM_VMGEXIT_SNP_GUEST_REQUEST: {
+		snp_handle_guest_request(svm, ghcb, control->exit_info_1, control->exit_info_2);
+
+		ret = 1;
+		break;
+	}
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 011374e6b2b2..ecd466721c23 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -18,6 +18,7 @@
 #include <linux/kvm_types.h>
 #include <linux/kvm_host.h>
 #include <linux/bits.h>
+#include <linux/ratelimit.h>
 
 #include <asm/svm.h>
 #include <asm/sev-common.h>
@@ -68,6 +69,8 @@ struct kvm_sev_info {
 	struct kvm *enc_context_owner; /* Owner of copied encryption context */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
 	void *snp_context;      /* SNP guest context page */
+	void *snp_resp_page;	/* SNP guest response page */
+	struct ratelimit_state snp_guest_msg_rs; /* Rate limit the SNP guest message */
 };
 
 struct kvm_svm {
-- 
2.17.1
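
The handler above refuses a guest request with -EAGAIN once __ratelimit() says the budget is used up. As a rough userspace illustration of that interval/burst idea (a simplified model, not the kernel's struct ratelimit_state; the msg_ratelimit names and the tick-based clock are assumptions made for this sketch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an interval/burst limiter; not the kernel implementation. */
struct msg_ratelimit {
	uint64_t interval;     /* window length, in caller-defined ticks */
	unsigned int burst;    /* requests admitted per window */
	uint64_t window_start; /* tick at which the current window began */
	unsigned int used;     /* requests already admitted in this window */
};

static void msg_ratelimit_init(struct msg_ratelimit *rs, uint64_t interval,
			       unsigned int burst)
{
	rs->interval = interval;
	rs->burst = burst;
	rs->window_start = 0;
	rs->used = 0;
}

/* Returns true if the request at tick 'now' may proceed, false if it is refused. */
static bool msg_ratelimit_allow(struct msg_ratelimit *rs, uint64_t now)
{
	/* Start a fresh window once the previous one has expired. */
	if (now - rs->window_start >= rs->interval) {
		rs->window_start = now;
		rs->used = 0;
	}
	if (rs->used >= rs->burst)
		return false;
	rs->used++;
	return true;
}
```

A caller in this model would map a false return to an error such as -EAGAIN, which is what the handler in the patch reports back through sw_exit_info_2.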



* [PATCH Part2 RFC v2 37/37] KVM: SVM: Advertise the SEV-SNP feature support
  2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
                   ` (35 preceding siblings ...)
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 36/37] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Brijesh Singh
@ 2021-04-30 12:38 ` Brijesh Singh
  36 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-04-30 12:38 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, dave.hansen,
	rientjes, seanjc, peterz, hpa, tony.luck, Brijesh Singh

Now that KVM supports all the VMGEXIT NAEs required for the base SEV-SNP
feature, set the hypervisor feature to advertise it.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/svm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ecd466721c23..f344ffd5afd6 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -553,7 +553,7 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 #define GHCB_VERSION_MAX	2ULL
 #define GHCB_VERSION_MIN	1ULL
 
-#define GHCB_HV_FEATURES_SUPPORTED	0
+#define GHCB_HV_FEATURES_SUPPORTED	GHCB_HV_FEATURES_SNP
 
 extern unsigned int max_sev_asid;
 
-- 
2.17.1



* Re: [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address Brijesh Singh
@ 2021-05-03 14:44   ` Dave Hansen
  2021-05-03 15:03     ` Andy Lutomirski
  2021-05-03 15:37     ` Brijesh Singh
  0 siblings, 2 replies; 67+ messages in thread
From: Dave Hansen @ 2021-05-03 14:44 UTC (permalink / raw)
  To: Brijesh Singh, x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, rientjes,
	seanjc, peterz, hpa, tony.luck

On 4/30/21 5:37 AM, Brijesh Singh wrote:
> When SEV-SNP is enabled globally, a write from the host goes through the
> RMP check. When the host writes to pages, hardware checks the following
> conditions at the end of page walk:
> 
> 1. Assigned bit in the RMP table is zero (i.e page is shared).
> 2. If the page table entry that gives the sPA indicates that the target
>    page size is a large page, then all RMP entries for the 4KB
>    constituting pages of the target must have the assigned bit 0.
> 3. Immutable bit in the RMP table is not zero.
> 
> The hardware will raise page fault if one of the above conditions is not
> met. A host should not encounter the RMP fault in normal execution, but
> a malicious guest could trick the hypervisor into it. e.g., a guest does
> not make the GHCB page shared, on #VMGEXIT, the hypervisor will attempt
> to write to GHCB page.

Is that the only case which is left?  If so, why don't you simply split
the direct map for GHCB pages before giving them to the guest?  Or, map
them with vmap() so that the mapping is always 4k?

Or, worst case, you could use exception tables and something like
copy_to_user() to write to the GHCB.  That way, the thread doing the
write can safely recover from the fault without the instruction actually
ever finishing execution.

BTW, I went looking through the spec.  I didn't see anything about the
guest being able to write the "Assigned" RMP bit.  Did I miss that?
Which of the above three conditions is triggered by the guest failing to
make the GHCB page shared?


* Re: [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-05-03 14:44   ` Dave Hansen
@ 2021-05-03 15:03     ` Andy Lutomirski
  2021-05-03 15:49       ` Brijesh Singh
  2021-05-03 15:37     ` Brijesh Singh
  1 sibling, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2021-05-03 15:03 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Brijesh Singh, X86 ML, LKML, kvm list, Thomas Gleixner,
	Borislav Petkov, Joerg Roedel, Tom Lendacky, Paolo Bonzini,
	Ingo Molnar, David Rientjes, Sean Christopherson, Peter Zijlstra,
	H. Peter Anvin, Tony Luck

On Mon, May 3, 2021 at 7:44 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 4/30/21 5:37 AM, Brijesh Singh wrote:
> > When SEV-SNP is enabled globally, a write from the host goes through the
> > RMP check. When the host writes to pages, hardware checks the following
> > conditions at the end of page walk:
> >
> > 1. Assigned bit in the RMP table is zero (i.e page is shared).
> > 2. If the page table entry that gives the sPA indicates that the target
> >    page size is a large page, then all RMP entries for the 4KB
> >    constituting pages of the target must have the assigned bit 0.
> > 3. Immutable bit in the RMP table is not zero.
> >
> > The hardware will raise page fault if one of the above conditions is not
> > met. A host should not encounter the RMP fault in normal execution, but
> > a malicious guest could trick the hypervisor into it. e.g., a guest does
> > not make the GHCB page shared, on #VMGEXIT, the hypervisor will attempt
> > to write to GHCB page.
>
> Is that the only case which is left?  If so, why don't you simply split
> the direct map for GHCB pages before giving them to the guest?  Or, map
> them with vmap() so that the mapping is always 4k?

If I read Brijesh's message right, this isn't about 4k.  It's about
the guest violating host expectations about the page type.

I need to go and do a full read of all the relevant specs, but I think
there's an analogous situation in TDX: if the host touches guest
private memory, the TDX hardware will get extremely angry (more so
than AMD hardware).  And, if I have understood this patch correctly,
it's fudging around the underlying bug by intentionally screwing up
the RMP contents to avoid a page fault.  Assuming I've understood
everything correctly (a big if!), then I think this is backwards.  The
host kernel should not ever access guest memory without a plan in
place to handle failure.  We need real accessors, along the lines of
copy_from_guest() and copy_to_guest().
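
To make the accessor idea concrete, here is a minimal userspace sketch of what a copy_to_guest() contract could look like. Everything in it is an assumption for illustration: struct guest_model, the per-page assigned flag standing in for the RMP, and the choice to check the flag up front where real hardware would raise a fault that an exception-table fixup turns into an error.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_NR_PAGES  4u

/* Hypothetical model: guest memory plus one "assigned" (guest-private) flag per page. */
struct guest_model {
	uint8_t mem[MODEL_NR_PAGES][MODEL_PAGE_SIZE];
	bool assigned[MODEL_NR_PAGES];
};

/*
 * Copy len bytes to guest physical address gpa. Returns 0 on success, or
 * -EFAULT if the range is out of bounds or touches a guest-private page.
 * Nothing is written on failure.
 */
static int copy_to_guest(struct guest_model *g, uint64_t gpa,
			 const void *src, size_t len)
{
	uint64_t size = (uint64_t)MODEL_NR_PAGES * MODEL_PAGE_SIZE;
	uint64_t first, last, i;

	if (len == 0)
		return 0;
	if (gpa >= size || len > size - gpa)
		return -EFAULT;

	/* Every page the copy touches must be shared with the host. */
	first = gpa / MODEL_PAGE_SIZE;
	last = (gpa + len - 1) / MODEL_PAGE_SIZE;
	for (i = first; i <= last; i++) {
		if (g->assigned[i])
			return -EFAULT;
	}

	memcpy((uint8_t *)g->mem + gpa, src, len);
	return 0;
}
```

The point of the sketch is only the calling convention: the caller always has a failure path, so a guest that leaves a page private gets an error back instead of taking the host down.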


* Re: [PATCH Part2 RFC v2 08/37] x86/sev: Split the physmap when adding the page in RMP table
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 08/37] x86/sev: Split the physmap when adding the page in RMP table Brijesh Singh
@ 2021-05-03 15:07   ` Peter Zijlstra
  2021-05-03 15:15   ` Andy Lutomirski
  1 sibling, 0 replies; 67+ messages in thread
From: Peter Zijlstra @ 2021-05-03 15:07 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, tglx, bp, jroedel, thomas.lendacky,
	pbonzini, mingo, dave.hansen, rientjes, seanjc, hpa, tony.luck

On Fri, Apr 30, 2021 at 07:37:53AM -0500, Brijesh Singh wrote:

> This poses a challenge in the Linux memory model. The Linux kernel
> creates a direct mapping of all the physical memory -- referred to as
> the physmap. The physmap may contain a valid mapping of guest owned pages.
> During the page table walk, the host access may get into the situation where
> one of the pages within the large page is owned by the guest (i.e. the assigned
> bit is set in the RMP). A write to a non-guest page within the large page will
> raise an RMP violation. To work around it, call set_memory_4k() to split
> the physmap before adding the page to the RMP table. This ensures that the
> pages added to the RMP table are mapped as 4K in the physmap.

What's an RMP violation and why are they a problem?

> The splitting of the physmap is a temporary solution until the kernel page
> fault handler is improved to split the kernel address on demand.

How is that an improvement? Fracturing the physmap sucks whichever way
around.

> One of the
> disadvantages of splitting is that eventually, it will end up breaking down
> the entire physmap unless it is coalesced back to a large page. I am open to
> suggestions on various approaches we could take to address this problem.

Have the hardware fracture the TLB entry internally?


* Re: [PATCH Part2 RFC v2 08/37] x86/sev: Split the physmap when adding the page in RMP table
  2021-04-30 12:37 ` [PATCH Part2 RFC v2 08/37] x86/sev: Split the physmap when adding the page in RMP table Brijesh Singh
  2021-05-03 15:07   ` Peter Zijlstra
@ 2021-05-03 15:15   ` Andy Lutomirski
  2021-05-03 15:41     ` Dave Hansen
  1 sibling, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2021-05-03 15:15 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: X86 ML, LKML, kvm list, Thomas Gleixner, Borislav Petkov,
	Joerg Roedel, Tom Lendacky, Paolo Bonzini, Ingo Molnar,
	Dave Hansen, David Rientjes, Sean Christopherson, Peter Zijlstra,
	H. Peter Anvin, Tony Luck

On Fri, Apr 30, 2021 at 5:39 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> The integrity guarantee of SEV-SNP is enforced through the RMP table.
> The RMP is used in conjunction with standard x86 and IOMMU page
> tables to enforce memory restrictions and page access rights. The
> RMP is indexed by system physical address, and is checked at the end
> of CPU and IOMMU table walks. The RMP check is enforced as soon as
> SEV-SNP is enabled globally in the system. Not every memory access
> requires an RMP check. In particular, the read accesses from the
> hypervisor do not require RMP checks because the data confidentiality
> is already protected via memory encryption. When hardware encounters
> an RMP check failure, it raises a page-fault exception. The RMP bit in
> the fault error code can be used to determine if the fault was due to an
> RMP check failure.
>
> A write from the hypervisor goes through the RMP checks. When the
> hypervisor writes to pages, hardware checks to ensure that the assigned
> bit in the RMP is zero (i.e. the page is shared). If the page table entry that
> gives the sPA indicates that the target page size is a large page, then
> all RMP entries for the constituent 4KB pages of the target must have the
> assigned bit 0. If any entry does not have the assigned bit 0, then hardware
> will raise an RMP violation. To resolve it, split the page table entry
> leading to the target page into 4K.
>
> This poses a challenge in the Linux memory model. The Linux kernel
> creates a direct mapping of all the physical memory -- referred to as
> the physmap. The physmap may contain a valid mapping of guest owned pages.
> During the page table walk, the host access may get into the situation where
> one of the pages within the large page is owned by the guest (i.e. the assigned
> bit is set in the RMP). A write to a non-guest page within the large page will
> raise an RMP violation. To work around it, call set_memory_4k() to split
> the physmap before adding the page to the RMP table. This ensures that the
> pages added to the RMP table are mapped as 4K in the physmap.
>
> The splitting of the physmap is a temporary solution until the kernel page
> fault handler is improved to split the kernel address on demand.

Not happening.  The pages to be split might be critical to fault
handling, e.g. stack, GDT, IDT, etc.

How much performance do we get back if we add a requirement that only
2M pages (hugetlbfs, etc) may be used for private guest memory?
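
For anyone following along, the large-page rule in the quoted description can be modeled in a few lines of userspace C. This is a sketch only: struct rmp_model and host_write_allowed() are hypothetical names, the RMP is reduced to one assigned bit per 4KB pfn, and the immutable-bit condition mentioned elsewhere in the series is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_2M 512u

/* Hypothetical RMP model: one assigned bit per 4KB page frame. */
struct rmp_model {
	const bool *assigned; /* indexed by 4KB pfn */
	size_t nr_pfns;
};

/*
 * Model of the host write check. With a 4KB mapping only the target pfn is
 * checked; with a 2MB mapping all 512 constituent 4KB entries must have the
 * assigned bit clear. Returns true if the write would be allowed.
 */
static bool host_write_allowed(const struct rmp_model *rmp, uint64_t pfn,
			       bool mapped_2m)
{
	uint64_t start = mapped_2m ? pfn - (pfn % PAGES_PER_2M) : pfn;
	uint64_t count = mapped_2m ? PAGES_PER_2M : 1;
	uint64_t i;

	if (start >= rmp->nr_pfns || count > rmp->nr_pfns - start)
		return false;

	for (i = 0; i < count; i++) {
		if (rmp->assigned[start + i])
			return false;
	}
	return true;
}
```

In this model a write to a shared pfn succeeds through a 4KB mapping but fails through a 2MB mapping as soon as any neighbour in the same 2MB region is assigned, which is the situation the physmap split is meant to avoid.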


* Re: [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-05-03 14:44   ` Dave Hansen
  2021-05-03 15:03     ` Andy Lutomirski
@ 2021-05-03 15:37     ` Brijesh Singh
  2021-05-03 16:15       ` Dave Hansen
  1 sibling, 1 reply; 67+ messages in thread
From: Brijesh Singh @ 2021-05-03 15:37 UTC (permalink / raw)
  To: Dave Hansen, x86, linux-kernel, kvm
  Cc: brijesh.singh, tglx, bp, jroedel, thomas.lendacky, pbonzini,
	mingo, rientjes, seanjc, peterz, hpa, tony.luck

Hi Dave,

On 5/3/21 9:44 AM, Dave Hansen wrote:
> On 4/30/21 5:37 AM, Brijesh Singh wrote:
>> When SEV-SNP is enabled globally, a write from the host goes through the
>> RMP check. When the host writes to pages, hardware checks the following
>> conditions at the end of page walk:
>>
>> 1. Assigned bit in the RMP table is zero (i.e page is shared).
>> 2. If the page table entry that gives the sPA indicates that the target
>>    page size is a large page, then all RMP entries for the 4KB
>>    constituting pages of the target must have the assigned bit 0.
>> 3. Immutable bit in the RMP table is not zero.
>>
>> The hardware will raise page fault if one of the above conditions is not
>> met. A host should not encounter the RMP fault in normal execution, but
>> a malicious guest could trick the hypervisor into it. e.g., a guest does
>> not make the GHCB page shared, on #VMGEXIT, the hypervisor will attempt
>> to write to GHCB page.
> Is that the only case which is left?  If so, why don't you simply split
> the direct map for GHCB pages before giving them to the guest?  Or, map
> them with vmap() so that the mapping is always 4k?

GHCB was just an example. Another example is a vfio driver accessing the
shared page. If those pages are not marked shared then kernel access
will cause an RMP fault. Ideally we should not be running into this
situation, but if we do, then I am trying to see how best we can avoid
the host crashes.

Another reason for having this is to catch hypervisor bugs. During
SNP guest creation, KVM allocates a few backing pages and sets the
assigned bit for them (examples are the VMSA and the firmware context page).
If the hypervisor accidentally frees these pages without clearing the
assigned bit in the RMP table, then it will result in an RMP fault and thus
a kernel crash.


>
> Or, worst case, you could use exception tables and something like
> copy_to_user() to write to the GHCB.  That way, the thread doing the
> write can safely recover from the fault without the instruction actually
> ever finishing execution.
>
> BTW, I went looking through the spec.  I didn't see anything about the
> guest being able to write the "Assigned" RMP bit.  Did I miss that?
> Which of the above three conditions is triggered by the guest failing to
> make the GHCB page shared?

The GHCB spec section "Page State Change" provides an interface for the
guest to request a page state change. During bootup, the guest uses
the Page State Change VMGEXIT to ask the hypervisor to make the page
shared. The hypervisor uses the RMPUPDATE instruction to write the
"assigned" bit in the RMP table.

On VMGEXIT, the very first thing the vmgexit handler does is map
the GHCB page for access; later it uses copy_to_user() to
sync the GHCB updates from the hypervisor to the guest. The copy_to_user() will
cause an RMP fault if the GHCB is not mapped shared. As I explained
above, the GHCB page was just an example; vfio or others may also get into
this situation.


-Brijesh




* Re: [PATCH Part2 RFC v2 08/37] x86/sev: Split the physmap when adding the page in RMP table
  2021-05-03 15:15   ` Andy Lutomirski
@ 2021-05-03 15:41     ` Dave Hansen
  2021-05-07 11:28       ` Vlastimil Babka
  0 siblings, 1 reply; 67+ messages in thread
From: Dave Hansen @ 2021-05-03 15:41 UTC (permalink / raw)
  To: Andy Lutomirski, Brijesh Singh
  Cc: X86 ML, LKML, kvm list, Thomas Gleixner, Borislav Petkov,
	Joerg Roedel, Tom Lendacky, Paolo Bonzini, Ingo Molnar,
	David Rientjes, Sean Christopherson, Peter Zijlstra,
	H. Peter Anvin, Tony Luck

On 5/3/21 8:15 AM, Andy Lutomirski wrote:
> How much performance do we get back if we add a requirement that only
> 2M pages (hugetlbfs, etc) may be used for private guest memory?

Are you generally asking about the performance overhead of using 4k
pages instead of 2M for the direct map?  We looked at that recently and
pulled together some data:

> https://lore.kernel.org/lkml/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/


* Re: [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-05-03 15:03     ` Andy Lutomirski
@ 2021-05-03 15:49       ` Brijesh Singh
  0 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-05-03 15:49 UTC (permalink / raw)
  To: Andy Lutomirski, Dave Hansen
  Cc: brijesh.singh, X86 ML, LKML, kvm list, Thomas Gleixner,
	Borislav Petkov, Joerg Roedel, Tom Lendacky, Paolo Bonzini,
	Ingo Molnar, David Rientjes, Sean Christopherson, Peter Zijlstra,
	H. Peter Anvin, Tony Luck


On 5/3/21 10:03 AM, Andy Lutomirski wrote:
> On Mon, May 3, 2021 at 7:44 AM Dave Hansen <dave.hansen@intel.com> wrote:
>> On 4/30/21 5:37 AM, Brijesh Singh wrote:
>>> When SEV-SNP is enabled globally, a write from the host goes through the
>>> RMP check. When the host writes to pages, hardware checks the following
>>> conditions at the end of page walk:
>>>
>>> 1. Assigned bit in the RMP table is zero (i.e page is shared).
>>> 2. If the page table entry that gives the sPA indicates that the target
>>>    page size is a large page, then all RMP entries for the 4KB
>>>    constituting pages of the target must have the assigned bit 0.
>>> 3. Immutable bit in the RMP table is not zero.
>>>
>>> The hardware will raise page fault if one of the above conditions is not
>>> met. A host should not encounter the RMP fault in normal execution, but
>>> a malicious guest could trick the hypervisor into it. e.g., a guest does
>>> not make the GHCB page shared, on #VMGEXIT, the hypervisor will attempt
>>> to write to GHCB page.
>> Is that the only case which is left?  If so, why don't you simply split
>> the direct map for GHCB pages before giving them to the guest?  Or, map
>> them with vmap() so that the mapping is always 4k?
> If I read Brijesh's message right, this isn't about 4k.  It's about
> the guest violating host expectations about the page type.
>
> I need to go and do a full read of all the relevant specs, but I think
> there's an analogous situation in TDX: if the host touches guest
> private memory, the TDX hardware will get extremely angry (more so
> than AMD hardware).  And, if I have understood this patch correctly,
> it's fudging around the underlying bug by intentionally screwing up
> the RMP contents to avoid a page fault.  Assuming I've understood
> everything correctly (a big if!), then I think this is backwards.  The
> host kernel should not ever access guest memory without a plan in
> place to handle failure.  We need real accessors, along the lines of
> copy_from_guest() and copy_to_guest().

You understood it correctly. It's an underlying bug either in the host or
the guest which may cause the host to access guest private pages. If it
happens, avoiding the host crash is much preferred (especially when it's a
guest kernel bug).




* Re: [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-05-03 15:37     ` Brijesh Singh
@ 2021-05-03 16:15       ` Dave Hansen
  2021-05-03 17:19         ` Brijesh Singh
  0 siblings, 1 reply; 67+ messages in thread
From: Dave Hansen @ 2021-05-03 16:15 UTC (permalink / raw)
  To: Brijesh Singh, x86, linux-kernel, kvm
  Cc: tglx, bp, jroedel, thomas.lendacky, pbonzini, mingo, rientjes,
	seanjc, peterz, hpa, tony.luck

On 5/3/21 8:37 AM, Brijesh Singh wrote:
> GHCB was just an example. Another example is a vfio driver accessing the
> shared page. If those pages are not marked shared then kernel access
> will cause an RMP fault. Ideally we should not be running into this
> situation, but if we do, then I am trying to see how best we can avoid
> the host crashes.

I'm confused.  Are you suggesting that the VFIO driver could be passed
an address such that the host kernel would blindly try to write private
guest memory?

The host kernel *knows* which memory is guest private and what is
shared.  It had to set it up in the first place.  It can also consult
the RMP at any time if it somehow forgot.

So, this scenario seems to be that the host got a guest physical address
(gpa) from the guest, it did a gpa->hpa->hva conversion and then wrote
the page all without bothering to consult the RMP.  Shouldn't the
gpa->hpa conversion point offer a perfect place to determine if the page
is shared or private?

> Another reason for having this is to catch hypervisor bugs. During
> SNP guest creation, KVM allocates a few backing pages and sets the
> assigned bit for them (examples are the VMSA and the firmware context page).
> If the hypervisor accidentally frees these pages without clearing the
> assigned bit in the RMP table, then it will result in an RMP fault and thus
> a kernel crash.

I think I'd be just fine with a BUG_ON() in those cases instead of an
attempt to paper over the issue.  Kernel crashes are fine in the case of
kernel bugs.

>> Or, worst case, you could use exception tables and something like
>> copy_to_user() to write to the GHCB.  That way, the thread doing the
>> write can safely recover from the fault without the instruction actually
>> ever finishing execution.
>>
>> BTW, I went looking through the spec.  I didn't see anything about the
>> guest being able to write the "Assigned" RMP bit.  Did I miss that?
>> Which of the above three conditions is triggered by the guest failing to
>> make the GHCB page shared?
> 
> The GHCB spec section "Page State Change" provides an interface for the
> guest to request a page state change. During bootup, the guest uses
> the Page State Change VMGEXIT to ask the hypervisor to make the page
> shared. The hypervisor uses the RMPUPDATE instruction to write the
> "assigned" bit in the RMP table.

Right...  So the *HOST* is in control.  Why should the host ever be
surprised by a page transitioning from shared to private?

> On VMGEXIT, the very first thing the vmgexit handler does is map
> the GHCB page for access; later it uses copy_to_user() to
> sync the GHCB updates from the hypervisor to the guest. The copy_to_user() will
> cause an RMP fault if the GHCB is not mapped shared. As I explained
> above, the GHCB page was just an example; vfio or others may also get into
> this situation.

Causing an RMP fault is fine.  The problem is shoving a whole bunch of
*recovery* code in the kernel when recovery isn't necessary.  Just look
for the -EFAULT from copy_to_user() and move on with life.


* Re: [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-05-03 16:15       ` Dave Hansen
@ 2021-05-03 17:19         ` Brijesh Singh
  2021-05-03 17:31           ` Brijesh Singh
  2021-05-03 17:40           ` Andy Lutomirski
  0 siblings, 2 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-05-03 17:19 UTC (permalink / raw)
  To: Dave Hansen, x86, linux-kernel, kvm
  Cc: brijesh.singh, tglx, bp, jroedel, thomas.lendacky, pbonzini,
	mingo, rientjes, seanjc, peterz, hpa, tony.luck


On 5/3/21 11:15 AM, Dave Hansen wrote:
> On 5/3/21 8:37 AM, Brijesh Singh wrote:
>> GHCB was just an example. Another example is a vfio driver accessing the
>> shared page. If those pages are not marked shared then kernel access
>> will cause an RMP fault. Ideally we should not be running into this
>> situation, but if we do, then I am trying to see how best we can avoid
>> the host crashes.
> I'm confused.  Are you suggesting that the VFIO driver could be passed
> an address such that the host kernel would blindly try to write private
> guest memory?

Not blindly. But a guest could trick a VMM (qemu) into asking the host driver
to access a GPA which is a guest private page (it's a hypothetical case, so
it's possible that I may be missing something). Let's see with an example:

- A guest provides a GPA to the VMM to write to (e.g. a DMA operation).

- The VMM translates the GPA->HVA and calls down to the host kernel with the HVA.

- The host kernel may pin the HVA to get the PFN for it and then kmap() it.
A write to the mapped PFN will cause an RMP fault if the guest-provided
GPA was not marked shared in the RMP table. In an ideal world, a guest
should *never* do this, but what if it does?


> The host kernel *knows* which memory is guest private and what is
> shared.  It had to set it up in the first place.  It can also consult
> the RMP at any time if it somehow forgot.
>
> So, this scenario seems to be that the host got a guest physical address
> (gpa) from the guest, it did a gpa->hpa->hva conversion and then wrote
> the page all without bothering to consult the RMP.  Shouldn't the
> gpa->hpa conversion point offer a perfect place to determine if the page
> is shared or private?

The GPA->HVA is typically done by the VMM, and HVA->HPA is done by the
host drivers. So, the only time we could verify is after the HVA->HPA. One
of my patches provides a snp_lookup_page_in_rmptable() helper that can be
used to query the page state in the RMP table. This means that all the
host backend drivers need to be enlightened to always read the RMP table
before making a write access to a guest-provided GPA. A good guest should
*never* be using a private page for a DMA operation, and if it does,
then the fault handler introduced in this patch can avoid the host crash
and eliminate the need to enlighten the drivers to check for
permission before the access.

I felt it is a good idea to have some kind of recovery, especially when a
malicious guest could lead us into this path.
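
To illustrate what "enlightening" a backend would mean in practice, here is a small userspace model of the check-before-write alternative. It is only a sketch under stated assumptions: struct model_memslot, struct model_rmp and model_prepare_host_write() are made-up names, the GPA->HVA->PFN chain is collapsed into a single linear slot, and the RMP lookup is reduced to one assigned bit per pfn rather than the real helper.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_NR_PFNS    8u

/* Hypothetical stand-ins: a single memslot and a per-pfn RMP assigned bit. */
struct model_memslot {
	uint64_t base_gpa; /* first guest physical address covered */
	uint64_t npages;   /* number of 4KB pages covered */
	uint64_t base_pfn; /* host pfn backing base_gpa */
};

struct model_rmp {
	bool assigned[MODEL_NR_PFNS];
};

/*
 * Resolve a guest-provided GPA to a host pfn and refuse the write up front if
 * the backing page is guest-private. Returns 0 and stores the pfn on success,
 * -EINVAL if the GPA is outside the slot, -EPERM if the page is assigned.
 */
static int model_prepare_host_write(const struct model_memslot *slot,
				    const struct model_rmp *rmp,
				    uint64_t gpa, uint64_t *pfn_out)
{
	uint64_t gfn = gpa >> MODEL_PAGE_SHIFT;
	uint64_t base_gfn = slot->base_gpa >> MODEL_PAGE_SHIFT;
	uint64_t pfn;

	if (gfn < base_gfn || gfn - base_gfn >= slot->npages)
		return -EINVAL;

	pfn = slot->base_pfn + (gfn - base_gfn);
	if (pfn >= MODEL_NR_PFNS)
		return -EINVAL;

	/* This is the extra lookup every backend would have to remember. */
	if (rmp->assigned[pfn])
		return -EPERM;

	*pfn_out = pfn;
	return 0;
}
```

Every write path would need a call like this before touching the page, which is the per-driver burden the fault-handler approach is trying to avoid.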


>
>> Another reason for having this is to catch hypervisor bugs. During
>> SNP guest creation, KVM allocates a few backing pages and sets the
>> assigned bit for them (examples are the VMSA and the firmware context page).
>> If the hypervisor accidentally frees these pages without clearing the
>> assigned bit in the RMP table, then it will result in an RMP fault and thus
>> a kernel crash.
> I think I'd be just fine with a BUG_ON() in those cases instead of an
> attempt to paper over the issue.  Kernel crashes are fine in the case of
> kernel bugs.

Yes, fine with me.


>
>>> Or, worst case, you could use exception tables and something like
>>> copy_to_user() to write to the GHCB.  That way, the thread doing the
>>> write can safely recover from the fault without the instruction actually
>>> ever finishing execution.
>>>
>>> BTW, I went looking through the spec.  I didn't see anything about the
>>> guest being able to write the "Assigned" RMP bit.  Did I miss that?
>>> Which of the above three conditions is triggered by the guest failing to
>>> make the GHCB page shared?
>> The GHCB spec section "Page State Change" provides an interface for the
>> guest to request a page state change. During bootup, the guest uses
>> the Page State Change VMGEXIT to ask the hypervisor to make the page
>> shared. The hypervisor uses the RMPUPDATE instruction to write the
>> "assigned" bit in the RMP table.
> Right...  So the *HOST* is in control.  Why should the host ever be
> surprised by a page transitioning from shared to private?

I am trying to cover the malicious guest case. A good guest should
follow the GHCB spec and change the page state before the access.

>
>> On VMGEXIT, the very first thing the vmgexit handler does is map
>> the GHCB page for access; later it uses copy_to_user() to
>> sync the GHCB updates from the hypervisor to the guest. The copy_to_user() will
>> cause an RMP fault if the GHCB is not mapped shared. As I explained
>> above, the GHCB page was just an example; vfio or others may also get into
>> this situation.
> Causing an RMP fault is fine.  The problem is shoving a whole bunch of
> *recovery* code in the kernel when recovery isn't necessary.  Just look
> for the -EFAULT from copy_to_user() and move on with life.


* Re: [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-05-03 17:19         ` Brijesh Singh
@ 2021-05-03 17:31           ` Brijesh Singh
  2021-05-03 17:40           ` Andy Lutomirski
  1 sibling, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-05-03 17:31 UTC (permalink / raw)
  To: Dave Hansen, x86, linux-kernel, kvm
  Cc: brijesh.singh, tglx, bp, jroedel, thomas.lendacky, pbonzini,
	mingo, rientjes, seanjc, peterz, hpa, tony.luck


On 5/3/21 12:19 PM, Brijesh Singh wrote:
> On 5/3/21 11:15 AM, Dave Hansen wrote:
>> On 5/3/21 8:37 AM, Brijesh Singh wrote:
>>> GHCB was just an example. Another example is a vfio driver accessing the
>>> shared page. If those pages are not marked shared then kernel access
>>> will cause an RMP fault. Ideally we should not be running into this
>>> situation, but if we do, then I am trying to see how best we can avoid
>>> the host crashes.
>> I'm confused.  Are you suggesting that the VFIO driver could be passed

One small correction: I meant to say VIRTIO but typed VFIO. Sorry
for the confusion.


>> an address such that the host kernel would blindly try to write private
>> guest memory?
> Not blindly. But a guest could trick a VMM (qemu) into asking the host driver
> to access a GPA which is a guest private page (it's a hypothetical case, so
> it's possible that I may be missing something). Let's see with an example:
>
> - A guest provides a GPA to the VMM to write to (e.g. a DMA operation).
>
> - The VMM translates the GPA->HVA and calls down to the host kernel with the HVA.
>
> - The host kernel may pin the HVA to get the PFN for it and then kmap() it.
> A write to the mapped PFN will cause an RMP fault if the guest-provided
> GPA was not marked shared in the RMP table. In an ideal world, a guest
> should *never* do this, but what if it does?
>
>
>> The host kernel *knows* which memory is guest private and what is
>> shared.  It had to set it up in the first place.  It can also consult
>> the RMP at any time if it somehow forgot.
>>
>> So, this scenario seems to be that the host got a guest physical address
>> (gpa) from the guest, it did a gpa->hpa->hva conversion and then wrote
>> the page all without bothering to consult the RMP.  Shouldn't the
>> gpa->hpa conversion point offer a perfect place to determine if the page
>> is shared or private?
> The GPA->HVA is typically done by the VMM, and HVA->HPA is done by the
> host drivers. So, only time we could verify is after the HVA->HPA. One
> of my patch provides a snp_lookup_page_in_rmptable() helper that can be
> used to query the page state in the RMP table. This means the all the
> host backend drivers need to enlightened to always read the RMP table
> before making a write access to guest provided GPA. A good guest should
> *never* be using a private page for the DMA operation and if it does
> then the fault handler introduced in this patch can avoid the host crash
> and eliminate the need to enlightened the drivers to check for the
> permission before the access.
>
> I felt it is good idea to have some kind of recovery specially when a
> malicious guest could lead us into this path.
>
>
>>> Another reason for having this is to catch  the hypervisor bug, during
>>> the SNP guest create, the KVM allocates few backing pages and sets the
>>> assigned bit for it (the examples are VMSA, and firmware context page).
>>> If hypervisor accidentally free's these pages without clearing the
>>> assigned bit in the RMP table then it will result in RMP fault and thus
>>> a kernel crash.
>> I think I'd be just fine with a BUG_ON() in those cases instead of an
>> attempt to paper over the issue.  Kernel crashes are fine in the case of
>> kernel bugs.
> Yes, fine with me.
>
>
>>>> Or, worst case, you could use exception tables and something like
>>>> copy_to_user() to write to the GHCB.  That way, the thread doing the
>>>> write can safely recover from the fault without the instruction actually
>>>> ever finishing execution.
>>>>
>>>> BTW, I went looking through the spec.  I didn't see anything about the
>>>> guest being able to write the "Assigned" RMP bit.  Did I miss that?
>>>> Which of the above three conditions is triggered by the guest failing to
>>>> make the GHCB page shared?
>>> The GHCB spec section "Page State Change" provides an interface for the
>>> guest to request the page state change. During bootup, the guest uses
>>> the Page State Change VMGEXIT to request hypervisor to make the page
>>> shared. The hypervisor uses the RMPUPDATE instruction to write to
>>> "assigned" bit in the RMP table.
>> Right...  So the *HOST* is in control.  Why should the host ever be
>> surprised by a page transitioning from shared to private?
> I am trying is a cover a malicious guest cases. A good guest should
> follow the GHCB spec and change the page state before the access.
>
>>> On VMGEXIT, the very first thing which vmgexit handler does is to map
>>> the GHCB page for the access and then later using the copy_to_user() to
>>> sync the GHCB updates from hypervisor to guest. The copy_to_user() will
>>> cause a RMP fault if the GHCB is not mapped shared. As I explained
>>> above, GHCB page was just an example, vfio or other may also get into
>>> this situation.
>> Causing an RMP fault is fine.  The problem is shoving a whole bunch of
>> *recovery* code in the kernel when recovery isn't necessary.  Just look
>> for the -EFAULT from copy_to_user() and move on with life.


* Re: [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-05-03 17:19         ` Brijesh Singh
  2021-05-03 17:31           ` Brijesh Singh
@ 2021-05-03 17:40           ` Andy Lutomirski
  2021-05-03 19:41             ` Brijesh Singh
  1 sibling, 1 reply; 67+ messages in thread
From: Andy Lutomirski @ 2021-05-03 17:40 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Dave Hansen, x86, linux-kernel, kvm, tglx, bp, jroedel,
	Thomas.Lendacky, pbonzini, mingo, rientjes, seanjc, peterz, hpa,
	tony.luck



> On May 3, 2021, at 10:19 AM, Brijesh Singh <brijesh.singh@amd.com> wrote:
> 
> 
>> On 5/3/21 11:15 AM, Dave Hansen wrote:
>>> On 5/3/21 8:37 AM, Brijesh Singh wrote:
>>> GHCB was just an example. Another example is a vfio driver accessing the
>>> shared page. If those pages are not marked shared then kernel access
>>> will cause an RMP fault. Ideally we should not be running into this
>>> situation, but if we do, then I am trying to see how best we can avoid
>>> the host crashes.
>> I'm confused.  Are you suggesting that the VFIO driver could be passed
>> an address such that the host kernel would blindly try to write private
>> guest memory?
> 
> Not blindly. But a guest could trick a VMM (qemu) to ask the host driver
> to access a GPA which is guest private page (Its a hypothetical case, so
> its possible that I may missing something). Let's see with an example:
> 
> - A guest provides a GPA to VMM to write to (e.g DMA operation).
> 
> - VMM translates the GPA->HVA and calls down to host kernel with the HVA.
> 
> - The host kernel may pin the HVA to get the PFN for it and then kmap().
> Write to the mapped PFN will cause an RMP fault if the guest provided
> GPA was not a marked shared in the RMP table. In an ideal world, a guest
> should *never* do this but what if it does ?
> 
> 
>> The host kernel *knows* which memory is guest private and what is
>> shared.  It had to set it up in the first place.  It can also consult
>> the RMP at any time if it somehow forgot.
>> 
>> So, this scenario seems to be that the host got a guest physical address
>> (gpa) from the guest, it did a gpa->hpa->hva conversion and then wrote
>> the page all without bothering to consult the RMP.  Shouldn't the the
>> gpa->hpa conversion point offer a perfect place to determine if the page
>> is shared or private?
> 
> The GPA->HVA is typically done by the VMM, and HVA->HPA is done by the
> host drivers. So, only time we could verify is after the HVA->HPA. One
> of my patch provides a snp_lookup_page_in_rmptable() helper that can be
> used to query the page state in the RMP table. This means the all the
> host backend drivers need to enlightened to always read the RMP table
> before making a write access to guest provided GPA. A good guest should
> *never* be using a private page for the DMA operation and if it does
> then the fault handler introduced in this patch can avoid the host crash
> and eliminate the need to enlightened the drivers to check for the
> permission before the access.

Can we arrange for the page walk plus kmap process to fail?

> 
> I felt it is good idea to have some kind of recovery specially when a
> malicious guest could lead us into this path.
> 
> 
>> 
>>> Another reason for having this is to catch  the hypervisor bug, during
>>> the SNP guest create, the KVM allocates few backing pages and sets the
>>> assigned bit for it (the examples are VMSA, and firmware context page).
>>> If hypervisor accidentally free's these pages without clearing the
>>> assigned bit in the RMP table then it will result in RMP fault and thus
>>> a kernel crash.
>> I think I'd be just fine with a BUG_ON() in those cases instead of an
>> attempt to paper over the issue.  Kernel crashes are fine in the case of
>> kernel bugs.
> 
> Yes, fine with me.
> 
> 
>> 
>>>> Or, worst case, you could use exception tables and something like
>>>> copy_to_user() to write to the GHCB.  That way, the thread doing the
>>>> write can safely recover from the fault without the instruction actually
>>>> ever finishing execution.
>>>> 
>>>> BTW, I went looking through the spec.  I didn't see anything about the
>>>> guest being able to write the "Assigned" RMP bit.  Did I miss that?
>>>> Which of the above three conditions is triggered by the guest failing to
>>>> make the GHCB page shared?
>>> The GHCB spec section "Page State Change" provides an interface for the
>>> guest to request the page state change. During bootup, the guest uses
>>> the Page State Change VMGEXIT to request hypervisor to make the page
>>> shared. The hypervisor uses the RMPUPDATE instruction to write to
>>> "assigned" bit in the RMP table.
>> Right...  So the *HOST* is in control.  Why should the host ever be
>> surprised by a page transitioning from shared to private?
> 
> I am trying is a cover a malicious guest cases. A good guest should
> follow the GHCB spec and change the page state before the access.
> 
>> 
>>> On VMGEXIT, the very first thing which vmgexit handler does is to map
>>> the GHCB page for the access and then later using the copy_to_user() to
>>> sync the GHCB updates from hypervisor to guest. The copy_to_user() will
>>> cause a RMP fault if the GHCB is not mapped shared. As I explained
>>> above, GHCB page was just an example, vfio or other may also get into
>>> this situation.
>> Causing an RMP fault is fine.  The problem is shoving a whole bunch of
>> *recovery* code in the kernel when recovery isn't necessary.  Just look
>> for the -EFAULT from copy_to_user() and move on with life.


* Re: [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-05-03 17:40           ` Andy Lutomirski
@ 2021-05-03 19:41             ` Brijesh Singh
  2021-05-03 19:43               ` Dave Hansen
  0 siblings, 1 reply; 67+ messages in thread
From: Brijesh Singh @ 2021-05-03 19:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: brijesh.singh, Dave Hansen, x86, linux-kernel, kvm, tglx, bp,
	jroedel, Thomas.Lendacky, pbonzini, mingo, rientjes, seanjc,
	peterz, hpa, tony.luck


On 5/3/21 12:40 PM, Andy Lutomirski wrote:
>
>> On May 3, 2021, at 10:19 AM, Brijesh Singh <brijesh.singh@amd.com> wrote:
>>
>> 
>>> On 5/3/21 11:15 AM, Dave Hansen wrote:
>>>> On 5/3/21 8:37 AM, Brijesh Singh wrote:
>>>> GHCB was just an example. Another example is a vfio driver accessing the
>>>> shared page. If those pages are not marked shared then kernel access
>>>> will cause an RMP fault. Ideally we should not be running into this
>>>> situation, but if we do, then I am trying to see how best we can avoid
>>>> the host crashes.
>>> I'm confused.  Are you suggesting that the VFIO driver could be passed
>>> an address such that the host kernel would blindly try to write private
>>> guest memory?
>> Not blindly. But a guest could trick a VMM (qemu) to ask the host driver
>> to access a GPA which is guest private page (Its a hypothetical case, so
>> its possible that I may missing something). Let's see with an example:
>>
>> - A guest provides a GPA to VMM to write to (e.g DMA operation).
>>
>> - VMM translates the GPA->HVA and calls down to host kernel with the HVA.
>>
>> - The host kernel may pin the HVA to get the PFN for it and then kmap().
>> Write to the mapped PFN will cause an RMP fault if the guest provided
>> GPA was not a marked shared in the RMP table. In an ideal world, a guest
>> should *never* do this but what if it does ?
>>
>>
>>> The host kernel *knows* which memory is guest private and what is
>>> shared.  It had to set it up in the first place.  It can also consult
>>> the RMP at any time if it somehow forgot.
>>>
>>> So, this scenario seems to be that the host got a guest physical address
>>> (gpa) from the guest, it did a gpa->hpa->hva conversion and then wrote
>>> the page all without bothering to consult the RMP.  Shouldn't the the
>>> gpa->hpa conversion point offer a perfect place to determine if the page
>>> is shared or private?
>> The GPA->HVA is typically done by the VMM, and HVA->HPA is done by the
>> host drivers. So, only time we could verify is after the HVA->HPA. One
>> of my patch provides a snp_lookup_page_in_rmptable() helper that can be
>> used to query the page state in the RMP table. This means the all the
>> host backend drivers need to enlightened to always read the RMP table
>> before making a write access to guest provided GPA. A good guest should
>> *never* be using a private page for the DMA operation and if it does
>> then the fault handler introduced in this patch can avoid the host crash
>> and eliminate the need to enlightened the drivers to check for the
>> permission before the access.
> Can we arrange for the page walk plus kmap process to fail?

Sure, I will look into all the drivers which do a walk plus kmap to make
sure that they fail instead of going into the fault path. Should I drop
this patch or keep it just in case we miss something?





* Re: [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-05-03 19:41             ` Brijesh Singh
@ 2021-05-03 19:43               ` Dave Hansen
  2021-05-04 12:31                 ` Brijesh Singh
  0 siblings, 1 reply; 67+ messages in thread
From: Dave Hansen @ 2021-05-03 19:43 UTC (permalink / raw)
  To: Brijesh Singh, Andy Lutomirski
  Cc: x86, linux-kernel, kvm, tglx, bp, jroedel, Thomas.Lendacky,
	pbonzini, mingo, rientjes, seanjc, peterz, hpa, tony.luck

On 5/3/21 12:41 PM, Brijesh Singh wrote:
> Sure, I will look into all the drivers which do a walk plus kmap to make
> sure that they fail instead of going into the fault path. Should I drop
> this patch or keep it just in the case we miss something?

I think you should drop it, and just ensure that the existing page fault
oops code can produce a coherent, descriptive error message about what
went wrong.
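
For illustration, a descriptive message could be derived from the
page-fault error code roughly as follows. This is a minimal userspace
sketch, not the kernel's oops path: every sk_* name is made up for the
example, and the RMP bit is assumed to be bit 31 of the error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* x86 page-fault error code bits; the SK_ names are local to this sketch. */
#define SK_PF_PROT   (1u << 0)   /* 0: not-present page, 1: protection violation */
#define SK_PF_WRITE  (1u << 1)   /* 0: read access, 1: write access */
#define SK_PF_USER   (1u << 2)   /* 0: kernel-mode access, 1: user-mode access */
#define SK_PF_RMP    (1u << 31)  /* assumption: set on an RMP check failure */

/* True when the error code reports an RMP check failure. */
static bool sk_is_rmp_fault(uint32_t error_code)
{
	return (error_code & SK_PF_RMP) != 0;
}

/*
 * Write a one-line description of the fault into buf.
 * Returns what snprintf() returns: the length the full message needs.
 */
static int sk_describe_fault(uint32_t error_code, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len, "%s-mode %s, %s%s",
			(error_code & SK_PF_USER) ? "user" : "kernel",
			(error_code & SK_PF_WRITE) ? "write" : "read",
			(error_code & SK_PF_PROT) ? "protection violation"
						  : "not-present page",
			sk_is_rmp_fault(error_code) ? ", RMP check failed" : "");
}
```

With a message like this, an oops on a host write to a guest-private
page says so directly instead of looking like an ordinary protection
fault.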


* Re: [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-05-03 19:43               ` Dave Hansen
@ 2021-05-04 12:31                 ` Brijesh Singh
  2021-05-04 14:33                   ` Dave Hansen
  0 siblings, 1 reply; 67+ messages in thread
From: Brijesh Singh @ 2021-05-04 12:31 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski
  Cc: brijesh.singh, x86, linux-kernel, kvm, tglx, bp, jroedel,
	Thomas.Lendacky, pbonzini, mingo, rientjes, seanjc, peterz, hpa,
	tony.luck


On 5/3/21 2:43 PM, Dave Hansen wrote:
> On 5/3/21 12:41 PM, Brijesh Singh wrote:
>> Sure, I will look into all the drivers which do a walk plus kmap to make
>> sure that they fail instead of going into the fault path. Should I drop
>> this patch or keep it just in the case we miss something?
> I think you should drop it, and just ensure that the existing page fault
> oops code can produce a coherent, descriptive error message about what
> went wrong.

A malicious guest could still trick the host into accessing a guest
private page unless we make sure that the host kernel *never* does
kmap() on a GPA. The example I was thinking of is:

1. Guest provides a GPA to host.

2. Host queries the RMP table and finds that GPA is shared and allows
the kmap() to happen.

3. Guest later changes the page to private.

4. A host write to the mapped address will then trigger a page fault.

KVM provides kvm_map_gfn() and kvm_vcpu_map() to map a GPA; these APIs
will no longer be safe to use. In addition, some shared pages are
registered once by the guest, and KVM updates the contents of the page
on vcpu entry (e.g., CPU steal time).

IMHO, we should add the RMP table check before kmap'ing a GPA but still
keep this patch to mitigate the cases where a malicious guest changes
the page state after the kmap().

-Brijesh




* Re: [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-05-04 12:31                 ` Brijesh Singh
@ 2021-05-04 14:33                   ` Dave Hansen
  2021-05-04 15:16                     ` Brijesh Singh
  0 siblings, 1 reply; 67+ messages in thread
From: Dave Hansen @ 2021-05-04 14:33 UTC (permalink / raw)
  To: Brijesh Singh, Andy Lutomirski
  Cc: x86, linux-kernel, kvm, tglx, bp, jroedel, Thomas.Lendacky,
	pbonzini, mingo, rientjes, seanjc, peterz, hpa, tony.luck

On 5/4/21 5:31 AM, Brijesh Singh wrote:
> On 5/3/21 2:43 PM, Dave Hansen wrote:
>> On 5/3/21 12:41 PM, Brijesh Singh wrote:
>>> Sure, I will look into all the drivers which do a walk plus kmap to make
>>> sure that they fail instead of going into the fault path. Should I drop
>>> this patch or keep it just in the case we miss something?
>> I think you should drop it, and just ensure that the existing page fault
>> oops code can produce a coherent, descriptive error message about what
>> went wrong.
> 
> A malicious guest could still trick the host into accessing a guest
> private page unless we make sure that host kernel *never* does kmap() on
> GPA. The example I was thinking is:
> 
> 1. Guest provides a GPA to host.
> 
> 2. Host queries the RMP table and finds that GPA is shared and allows
> the kmap() to happen.
> 
> 3. Guest later changes the page to private.

This literally isn't possible in the SEV-SNP architecture.  I really
wish you would stop stating it.  It's horribly confusing.

The guest cannot directly change the page to private.  Only the host
can change the page to private.  The guest must _ask_ the host to do it.
 That's *CRITICALLY* important because what you need to do later is
prevent specific *HOST* behavior.

When those guest requests come in, the host has to ensure that the
request is refused or stalled until there is no chance that the host
will write to the page.  That means that the host needs some locks and
some metadata.

It's also why Andy has been suggesting that you need something along the
lines of copy_to/from_guest().  Those functions would take and release
locks to ensure that shared->private guest page transitions are
impossible while host access to the memory is in flight.

> 4. Host write to mapped address will trigger a page-fault.
> 
> KVM provides kvm_map_gfn(), kvm_vcpu_map() to map a GPA; these APIs will
> no longer be safe to be used.

Yes, it sounds like there is some missing KVM infrastructure that needs
to accompany this series.

> In addition, some shared pages are registered once by the guest and
> KVM updates the contents of the page on vcpu enter (e.g, CPU steal
> time).
Are you suggesting that the host would honor a guest request to convert
to private the shared page used for communicating CPU steal time?  That
seems like a bug to me.

> IMHO, we should add the RMP table check before kmap'ing GPA but still
> keep this patch to mitigate the cases where a malicious guest changes
> the page state after the kmap().

I much prefer a solution where guest requests are placed under
sufficient scrutiny and not blindly followed by the host.
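
To make the copy_to/from_guest() idea above concrete, here is a minimal
userspace sketch, not kernel code. The sk_* names, the per-page
structure and the spinlock built on a C11 atomic_flag are all
assumptions for illustration; a real version would use KVM's own
locking and metadata. The point is only that the copy and the page
state change take the same lock, so a shared->private transition cannot
happen while a host write is in flight, and a write to a private page
is refused with -EFAULT instead of faulting.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SK_PAGE_SIZE 4096

/* Hypothetical per-page metadata kept by the host; not a KVM structure. */
struct sk_guest_page {
	atomic_flag lock;                 /* serializes copies against state changes */
	bool shared;                      /* host's record of the page state */
	unsigned char data[SK_PAGE_SIZE]; /* stands in for the guest page contents */
};

static void sk_page_init(struct sk_guest_page *p, bool shared)
{
	assert(p != NULL);
	atomic_flag_clear(&p->lock);
	p->shared = shared;
	memset(p->data, 0, sizeof(p->data));
}

static void sk_lock(struct sk_guest_page *p)
{
	/* Spin until the flag was previously clear; a kernel version would sleep or use a real lock. */
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;
}

static void sk_unlock(struct sk_guest_page *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Copy into a guest page only while the host knows it to be shared. */
static int sk_copy_to_guest(struct sk_guest_page *p, size_t off,
			    const void *src, size_t len)
{
	int ret = 0;

	if (off > SK_PAGE_SIZE || len > SK_PAGE_SIZE - off)
		return -EINVAL;

	sk_lock(p);
	if (p->shared)
		memcpy(p->data + off, src, len);
	else
		ret = -EFAULT;	/* page is private: refuse instead of faulting */
	sk_unlock(p);

	return ret;
}

/* Apply a guest-requested page state change under the same lock. */
static void sk_set_page_state(struct sk_guest_page *p, bool shared)
{
	sk_lock(p);
	p->shared = shared;
	sk_unlock(p);
}
```

A caller then only has to check for -EFAULT from sk_copy_to_guest() and
carry on; no recovery code in the fault handler is needed.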


* Re: [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address
  2021-05-04 14:33                   ` Dave Hansen
@ 2021-05-04 15:16                     ` Brijesh Singh
  0 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-05-04 15:16 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski
  Cc: brijesh.singh, x86, linux-kernel, kvm, tglx, bp, jroedel,
	Thomas.Lendacky, pbonzini, mingo, rientjes, seanjc, peterz, hpa,
	tony.luck

Hi Dave,


On 5/4/21 9:33 AM, Dave Hansen wrote:
> On 5/4/21 5:31 AM, Brijesh Singh wrote:
>> On 5/3/21 2:43 PM, Dave Hansen wrote:
>>> On 5/3/21 12:41 PM, Brijesh Singh wrote:
>>>> Sure, I will look into all the drivers which do a walk plus kmap to make
>>>> sure that they fail instead of going into the fault path. Should I drop
>>>> this patch or keep it just in the case we miss something?
>>> I think you should drop it, and just ensure that the existing page fault
>>> oops code can produce a coherent, descriptive error message about what
>>> went wrong.
>> A malicious guest could still trick the host into accessing a guest
>> private page unless we make sure that host kernel *never* does kmap() on
>> GPA. The example I was thinking is:
>>
>> 1. Guest provides a GPA to host.
>>
>> 2. Host queries the RMP table and finds that GPA is shared and allows
>> the kmap() to happen.
>>
>> 3. Guest later changes the page to private.
> This literally isn't possible in the SEV-SNP architecture.  I really
> wish you would stop stating it.  It's horribly confusing.
>
> The guest can not directly change the page to private.  Only the host
> can change the page to private.  The guest must _ask_ the host to do it.
>  That's *CRITICALLY* important because what you need to do later is
> prevent specific *HOST* behavior.
>
> When those guest requests come it, the host has to ensure that the
> request is refused or stalled until there is no chance that the host
> will write to the page.  That means that the host needs some locks and
> some metadata.

Ah, this message clarifies what you and Andy are asking. I was not able
to follow how the kmap'ed addresses will be protected, but now things
are much clearer and I feel better about dropping this patch. Basically,
we want the host to keep track of the kmap'ed pages and to stall or
reject the guest request to change the page state if the page is
already mapped by the host.


> It's also why Andy has been suggesting that you need something along the
> lines of copy_to/from_guest().  Those functions would take and release
> locks to ensure that shared->private guest page transitions are
> impossible while host access to the memory is in flight.

Now it all makes sense.

>
>> 4. Host write to mapped address will trigger a page-fault.
>>
>> KVM provides kvm_map_gfn(), kvm_vcpu_map() to map a GPA; these APIs will
>> no longer be safe to be used.
> Yes, it sounds like there is some missing KVM infrastructure that needs
> to accompany this series.

Yes, I will enhance these APIs to ensure that map'ed GPAs are tracked.
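
As a rough illustration of what tracking map'ed GPAs could look like,
the sketch below keeps a count of outstanding host mappings per guest
frame and rejects a shared->private request while that count is
non-zero. It is a single-threaded userspace model under stated
assumptions: struct sk_gfn_track and the sk_* helpers are hypothetical,
they are not the kvm_map_gfn()/kvm_vcpu_map() interfaces, and a real
implementation would need locking around every field.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-GFN tracking record; not an existing KVM structure. */
struct sk_gfn_track {
	unsigned int host_maps;	/* outstanding host mappings of this GFN */
	bool shared;		/* page state as last set by the host */
};

/* Map a guest page for host access; only shared pages may be mapped. */
static int sk_map_gfn(struct sk_gfn_track *t)
{
	if (!t->shared)
		return -EFAULT;
	t->host_maps++;
	return 0;
}

/* Drop one host mapping taken with sk_map_gfn(). */
static void sk_unmap_gfn(struct sk_gfn_track *t)
{
	assert(t->host_maps > 0);
	t->host_maps--;
}

/* Guest asks for shared->private; reject while the host still maps the page. */
static int sk_request_private(struct sk_gfn_track *t)
{
	if (t->host_maps > 0)
		return -EBUSY;
	t->shared = false;
	return 0;
}
```

Returning -EBUSY models the "reject" option; the "stall" option would
instead wait for host_maps to drop to zero before changing the state.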




* Re: [PATCH Part2 RFC v2 21/37] KVM: SVM: Add KVM_SNP_INIT command
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 21/37] KVM: SVM: Add KVM_SNP_INIT command Brijesh Singh
@ 2021-05-06 20:25   ` Peter Gonda
  2021-05-06 22:29     ` Brijesh Singh
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Gonda @ 2021-05-06 20:25 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm list, Thomas Gleixner, Borislav Petkov,
	jroedel, Lendacky, Thomas, Paolo Bonzini, Ingo Molnar,
	Dave Hansen, David Rientjes, Sean Christopherson, peterz,
	H. Peter Anvin, tony.luck

On Fri, Apr 30, 2021 at 6:44 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> The KVM_SNP_INIT command is used by the hypervisor to initialize the
> SEV-SNP platform context. In a typical workflow, this command should be the
> first command issued. When creating SEV-SNP guest, the VMM must use this
> command instead of the KVM_SEV_INIT or KVM_SEV_ES_INIT.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c   | 18 ++++++++++++++++--
>  include/uapi/linux/kvm.h |  3 +++
>  2 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 200d227f9232..ea74dd9e03d3 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -230,8 +230,9 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
>
>  static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  {
> +       bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
>         struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> -       bool es_active = argp->id == KVM_SEV_ES_INIT;
> +       bool snp_active = argp->id == KVM_SEV_SNP_INIT;
>         int asid, ret;
>
>         if (kvm->created_vcpus)
> @@ -242,12 +243,16 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>                 return ret;
>
>         sev->es_active = es_active;
> +       sev->snp_active = snp_active;
>         asid = sev_asid_new(sev);
>         if (asid < 0)
>                 goto e_no_asid;
>         sev->asid = asid;
>
> -       ret = sev_platform_init(&argp->error);
> +       if (snp_active)
> +               ret = sev_snp_init(&argp->error);
> +       else
> +               ret = sev_platform_init(&argp->error);
>         if (ret)
>                 goto e_free;
>
> @@ -583,6 +588,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>         save->pkru = svm->vcpu.arch.pkru;
>         save->xss  = svm->vcpu.arch.ia32_xss;
>
> +       if (sev_snp_guest(svm->vcpu.kvm))
> +               save->sev_features |= SVM_SEV_FEATURES_SNP_ACTIVE;
> +
>         /*
>          * SEV-ES will use a VMSA that is pointed to by the VMCB, not
>          * the traditional VMSA that is part of the VMCB. Copy the
> @@ -1525,6 +1533,12 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>         }
>
>         switch (sev_cmd.id) {
> +       case KVM_SEV_SNP_INIT:
> +               if (!sev_snp_enabled) {
> +                       r = -ENOTTY;
> +                       goto out;
> +               }
> +               fallthrough;
>         case KVM_SEV_ES_INIT:
>                 if (!sev_es_enabled) {
>                         r = -ENOTTY;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 3fd9a7e9d90c..aaa2d62f09b5 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1678,6 +1678,9 @@ enum sev_cmd_id {
>         /* Guest Migration Extension */
>         KVM_SEV_SEND_CANCEL,
>
> +       /* SNP specific commands */
> +       KVM_SEV_SNP_INIT,
> +

Do you want to reserve some more enum values for SEV in case
additional functionality is added, or is this very unlikely?

>         KVM_SEV_NR_MAX,
>  };
>
> --
> 2.17.1
>


* Re: [PATCH Part2 RFC v2 21/37] KVM: SVM: Add KVM_SNP_INIT command
  2021-05-06 20:25   ` Peter Gonda
@ 2021-05-06 22:29     ` Brijesh Singh
  0 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-05-06 22:29 UTC (permalink / raw)
  To: Peter Gonda
  Cc: brijesh.singh, x86, linux-kernel, kvm list, Thomas Gleixner,
	Borislav Petkov, jroedel, Lendacky, Thomas, Paolo Bonzini,
	Ingo Molnar, Dave Hansen, David Rientjes, Sean Christopherson,
	peterz, H. Peter Anvin, tony.luck


On 5/6/21 3:25 PM, Peter Gonda wrote:
> On Fri, Apr 30, 2021 at 6:44 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>> The KVM_SNP_INIT command is used by the hypervisor to initialize the
>> SEV-SNP platform context. In a typical workflow, this command should be the
>> first command issued. When creating SEV-SNP guest, the VMM must use this
>> command instead of the KVM_SEV_INIT or KVM_SEV_ES_INIT.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  arch/x86/kvm/svm/sev.c   | 18 ++++++++++++++++--
>>  include/uapi/linux/kvm.h |  3 +++
>>  2 files changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 200d227f9232..ea74dd9e03d3 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -230,8 +230,9 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
>>
>>  static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>  {
>> +       bool es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
>>         struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> -       bool es_active = argp->id == KVM_SEV_ES_INIT;
>> +       bool snp_active = argp->id == KVM_SEV_SNP_INIT;
>>         int asid, ret;
>>
>>         if (kvm->created_vcpus)
>> @@ -242,12 +243,16 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>                 return ret;
>>
>>         sev->es_active = es_active;
>> +       sev->snp_active = snp_active;
>>         asid = sev_asid_new(sev);
>>         if (asid < 0)
>>                 goto e_no_asid;
>>         sev->asid = asid;
>>
>> -       ret = sev_platform_init(&argp->error);
>> +       if (snp_active)
>> +               ret = sev_snp_init(&argp->error);
>> +       else
>> +               ret = sev_platform_init(&argp->error);
>>         if (ret)
>>                 goto e_free;
>>
>> @@ -583,6 +588,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>>         save->pkru = svm->vcpu.arch.pkru;
>>         save->xss  = svm->vcpu.arch.ia32_xss;
>>
>> +       if (sev_snp_guest(svm->vcpu.kvm))
>> +               save->sev_features |= SVM_SEV_FEATURES_SNP_ACTIVE;
>> +
>>         /*
>>          * SEV-ES will use a VMSA that is pointed to by the VMCB, not
>>          * the traditional VMSA that is part of the VMCB. Copy the
>> @@ -1525,6 +1533,12 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>>         }
>>
>>         switch (sev_cmd.id) {
>> +       case KVM_SEV_SNP_INIT:
>> +               if (!sev_snp_enabled) {
>> +                       r = -ENOTTY;
>> +                       goto out;
>> +               }
>> +               fallthrough;
>>         case KVM_SEV_ES_INIT:
>>                 if (!sev_es_enabled) {
>>                         r = -ENOTTY;
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 3fd9a7e9d90c..aaa2d62f09b5 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1678,6 +1678,9 @@ enum sev_cmd_id {
>>         /* Guest Migration Extension */
>>         KVM_SEV_SEND_CANCEL,
>>
>> +       /* SNP specific commands */
>> +       KVM_SEV_SNP_INIT,
>> +
> Do you want to reserve some more enum values for SEV in case
> additional functionality is added, or is this very unlikely?

Good idea, I will reserve some enum values.
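
One common way to reserve values is to start the new group at an
explicit base, leaving a gap for future commands of the older group.
The sketch below only illustrates the technique: the enum, its names
and the base value 256 are hypothetical and are not the values used in
include/uapi/linux/kvm.h.

```c
#include <assert.h>

/* Hypothetical command IDs; names and values are not the real uapi ones. */
enum sk_cmd_id {
	SK_CMD_INIT = 0,
	SK_CMD_SEND_CANCEL,		/* last existing SEV command: 1 */

	/* 2..255 are left unused for future SEV commands */

	SK_CMD_SNP_INIT = 256,		/* SNP commands start at a fixed base */

	SK_CMD_NR_MAX,
};

/* Pin the base at compile time so that reordering cannot silently change it. */
static_assert(SK_CMD_SNP_INIT == 256, "SNP command base is part of the ABI");
```

With the gap in place, a new SEV command can be appended after
SK_CMD_SEND_CANCEL without renumbering the SNP commands that userspace
may already depend on.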




* Re: [PATCH Part2 RFC v2 08/37] x86/sev: Split the physmap when adding the page in RMP table
  2021-05-03 15:41     ` Dave Hansen
@ 2021-05-07 11:28       ` Vlastimil Babka
  0 siblings, 0 replies; 67+ messages in thread
From: Vlastimil Babka @ 2021-05-07 11:28 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Brijesh Singh
  Cc: X86 ML, LKML, kvm list, Thomas Gleixner, Borislav Petkov,
	Joerg Roedel, Tom Lendacky, Paolo Bonzini, Ingo Molnar,
	David Rientjes, Sean Christopherson, Peter Zijlstra,
	H. Peter Anvin, Tony Luck

On 5/3/21 5:41 PM, Dave Hansen wrote:
> On 5/3/21 8:15 AM, Andy Lutomirski wrote:
>> How much performance do we get back if we add a requirement that only
>> 2M pages (hugetlbfs, etc) may be used for private guest memory?
> 
> Are you generally asking about the performance overhead of using 4k
> pages instead of 2M for the direct map?  We looked at that recently and
> pulled together some data:

IIUC, using 2M pages for private guest memory wouldn't by itself be
sufficient, as the guest would also have to share pages with the host at
2MB granularity, and that might be too restrictive?

>> https://lore.kernel.org/lkml/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/
> 



* Re: [PATCH Part2 RFC v2 32/37] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 32/37] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Brijesh Singh
@ 2021-05-10 17:30   ` Peter Gonda
  2021-05-10 17:51     ` Brijesh Singh
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Gonda @ 2021-05-10 17:30 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm list, Thomas Gleixner, Borislav Petkov,
	jroedel, Lendacky, Thomas, Paolo Bonzini, Ingo Molnar,
	Dave Hansen, David Rientjes, Sean Christopherson, peterz,
	H. Peter Anvin, tony.luck

> +static int snp_make_page_shared(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn, int level)
> +{
> +       struct rmpupdate val;
> +       int rc, rmp_level;
> +       struct rmpentry *e;
> +
> +       e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
> +       if (!e)
> +               return -EINVAL;
> +
> +       if (!rmpentry_assigned(e))
> +               return 0;
> +
> +       /* Log if the entry is validated */
> +       if (rmpentry_validated(e))
> +               pr_debug_ratelimited("Remove RMP entry for a validated gpa 0x%llx\n", gpa);
> +
> +       /*
> +        * Is the page part of an existing 2M RMP entry ? Split the 2MB into multiple
> +        * of 4K-page before making the memory shared.
> +        */
> +       if ((level == PG_LEVEL_4K) && (rmp_level == PG_LEVEL_2M)) {
> +               rc = snp_rmptable_psmash(vcpu, pfn);
> +               if (rc)
> +                       return rc;
> +       }
> +
> +       memset(&val, 0, sizeof(val));
> +       val.pagesize = X86_TO_RMP_PG_LEVEL(level);

This is slightly different from Rev 2.00 of the GHCB spec. This
defaults to 2MB page sizes, when the spec says the only valid settings
for level are 0 -> 4k pages or 1 -> 2MB pages. Should this enforce the
same strictness as the spec?
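
For illustration, a strict conversion along the lines asked about here might
look like the sketch below. It is not the patch's macro: the helper name
x86_to_rmp_pg_level_strict() and the RMP_PG_SIZE_* names are hypothetical, and
the PG_LEVEL_* values are local stand-ins for the kernel's page-level
constants. Any level other than 4K or 2M is rejected instead of being treated
as 2MB.

```c
#include <stdbool.h>

/* Local stand-ins for the kernel's page-level constants (assumed values). */
enum { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2 };

/* Page-size encodings named in the comment above: 0 -> 4K, 1 -> 2MB. */
enum { RMP_PG_SIZE_4K = 0, RMP_PG_SIZE_2M = 1 };

/*
 * Hypothetical strict conversion: returns false for any x86 level that has
 * no page-size encoding, rather than silently falling back to 2MB.
 */
static bool x86_to_rmp_pg_level_strict(int level, int *rmp_level)
{
    switch (level) {
    case PG_LEVEL_4K:
        *rmp_level = RMP_PG_SIZE_4K;
        return true;
    case PG_LEVEL_2M:
        *rmp_level = RMP_PG_SIZE_2M;
        return true;
    default:
        return false;
    }
}
```

A caller would check the return value and fail the request before touching
the RMP entry.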

> +       return rmpupdate(pfn_to_page(pfn), &val);
> +}
> +
> +static int snp_make_page_private(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn, int level)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
> +       struct rmpupdate val;
> +       struct rmpentry *e;
> +       int rmp_level;
> +
> +       e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
> +       if (!e)
> +               return -EINVAL;
> +
> +       /* Log if the entry is validated */
> +       if (rmpentry_validated(e))
> +               pr_err_ratelimited("Asked to make a pre-validated gpa %llx private\n", gpa);
> +
> +       memset(&val, 0, sizeof(val));
> +       val.gpa = gpa;
> +       val.asid = sev->asid;
> +       val.pagesize = X86_TO_RMP_PG_LEVEL(level);

Same comment as above.

> +       val.assigned = true;
> +
> +       return rmpupdate(pfn_to_page(pfn), &val);
> +}


* Re: [PATCH Part2 RFC v2 32/37] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2021-05-10 17:30   ` Peter Gonda
@ 2021-05-10 17:51     ` Brijesh Singh
  2021-05-10 19:59       ` Peter Gonda
  0 siblings, 1 reply; 67+ messages in thread
From: Brijesh Singh @ 2021-05-10 17:51 UTC (permalink / raw)
  To: Peter Gonda
  Cc: brijesh.singh, x86, linux-kernel, kvm list, Thomas Gleixner,
	Borislav Petkov, jroedel, Lendacky, Thomas, Paolo Bonzini,
	Ingo Molnar, Dave Hansen, David Rientjes, Sean Christopherson,
	peterz, H. Peter Anvin, tony.luck

Hi Peter,

On 5/10/21 12:30 PM, Peter Gonda wrote:
>> +static int snp_make_page_shared(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn, int level)
>> +{
>> +       struct rmpupdate val;
>> +       int rc, rmp_level;
>> +       struct rmpentry *e;
>> +
>> +       e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
>> +       if (!e)
>> +               return -EINVAL;
>> +
>> +       if (!rmpentry_assigned(e))
>> +               return 0;
>> +
>> +       /* Log if the entry is validated */
>> +       if (rmpentry_validated(e))
>> +               pr_debug_ratelimited("Remove RMP entry for a validated gpa 0x%llx\n", gpa);
>> +
>> +       /*
>> +        * Is the page part of an existing 2M RMP entry ? Split the 2MB into multiple
>> +        * of 4K-page before making the memory shared.
>> +        */
>> +       if ((level == PG_LEVEL_4K) && (rmp_level == PG_LEVEL_2M)) {
>> +               rc = snp_rmptable_psmash(vcpu, pfn);
>> +               if (rc)
>> +                       return rc;
>> +       }
>> +
>> +       memset(&val, 0, sizeof(val));
>> +       val.pagesize = X86_TO_RMP_PG_LEVEL(level);
> This is slightly different from Rev 2.00 of the GHCB spec. This
> defaults to 2MB page sizes, when the spec says the only valid settings
> for level are 0 -> 4k pages or 1 -> 2MB pages. Should this enforce the
> same strictness as the spec?


The caller of snp_make_page_shared() must pass the x86 page level.
We should reach here only after all the guest-provided values have passed
through checks.

The call sequence in this case should be:

snp_handle_vmgexit_msr_protocol()

 __snp_handle_page_state_change(vcpu, gfn_to_gpa(gfn), PG_LEVEL_4K)

  snp_make_page_shared(..., level)

Am I missing something?

>> +       return rmpupdate(pfn_to_page(pfn), &val);
>> +}
>> +
>> +static int snp_make_page_private(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn, int level)
>> +{
>> +       struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
>> +       struct rmpupdate val;
>> +       struct rmpentry *e;
>> +       int rmp_level;
>> +
>> +       e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
>> +       if (!e)
>> +               return -EINVAL;
>> +
>> +       /* Log if the entry is validated */
>> +       if (rmpentry_validated(e))
>> +               pr_err_ratelimited("Asked to make a pre-validated gpa %llx private\n", gpa);
>> +
>> +       memset(&val, 0, sizeof(val));
>> +       val.gpa = gpa;
>> +       val.asid = sev->asid;
>> +       val.pagesize = X86_TO_RMP_PG_LEVEL(level);
> Same comment as above.

See my above response.


>
>> +       val.assigned = true;
>> +
>> +       return rmpupdate(pfn_to_page(pfn), &val);
>> +}


* Re: [PATCH Part2 RFC v2 16/37] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 16/37] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Brijesh Singh
@ 2021-05-10 18:23   ` Peter Gonda
  2021-05-10 20:07     ` Brijesh Singh
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Gonda @ 2021-05-10 18:23 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm list, Thomas Gleixner, Borislav Petkov,
	jroedel, Lendacky, Thomas, Paolo Bonzini, Ingo Molnar,
	Dave Hansen, David Rientjes, Sean Christopherson, peterz,
	H. Peter Anvin, tony.luck

> +
> +static int snp_set_rmptable_state(unsigned long paddr, int npages,
> +                                 struct rmpupdate *val, bool locked, bool need_reclaim)
> +{
> +       unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
> +       unsigned long pfn_end = pfn + npages;
> +       int rc;
> +
> +       while (pfn < pfn_end) {
> +               if (need_reclaim)
> +                       if (snp_reclaim_page(pfn_to_page(pfn), locked))
> +                               return -EFAULT;
> +
> +               rc = rmpupdate(pfn_to_page(pfn), val);
> +               if (rc)
> +                       return rc;

This function can return an error after having partially converted some
of the npages requested by the caller. Should this function return the
number of affected pages or something to allow the caller to know if
some pages need to be reverted? Or should the function attempt to do
that itself?
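
As a rough sketch of the second option (the function reverting its own partial
work), the loop could remember how far it got and undo those pages on failure.
Everything below is a user-space stand-in, not the CCP driver code:
struct demo_table, demo_update() and demo_set_range() are hypothetical names,
and demo_update() only simulates an RMP update that fails for one chosen pfn.

```c
#include <stddef.h>

#define DEMO_NPAGES 8

/* Fake page-state table standing in for the RMP; fail_pfn simulates a failure. */
struct demo_table {
    int state[DEMO_NPAGES];
    unsigned long fail_pfn;
};

/* Simulated per-page update: returns 0 on success, -1 on failure. */
static int demo_update(struct demo_table *t, unsigned long pfn, int state)
{
    if (pfn >= DEMO_NPAGES || pfn == t->fail_pfn)
        return -1;
    t->state[pfn] = state;
    return 0;
}

/*
 * Convert npages starting at pfn to new_state. If any page fails, revert the
 * pages converted so far to old_state and return the original error.
 */
static int demo_set_range(struct demo_table *t, unsigned long pfn, size_t npages,
                          int new_state, int old_state)
{
    size_t done;
    int rc = 0;

    for (done = 0; done < npages; done++) {
        rc = demo_update(t, pfn + done, new_state);
        if (rc)
            break;
    }
    if (!rc)
        return 0;

    /* Best-effort rollback of the pages converted before the failure. */
    while (done-- > 0)
        demo_update(t, pfn + done, old_state);

    return rc;
}
```

The rollback here is best effort; a real implementation would also have to
decide what to do if reverting a page fails.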

> +
> +               pfn++;
> +       }
> +
> +       return 0;
> +}

> +
> +static void __snp_free_firmware_pages(struct page *page, int order)
> +{
> +       struct rmpupdate val = {};
> +       unsigned long paddr;
> +
> +       if (!page)
> +               return;
> +
> +       paddr = __pa((unsigned long)page_address(page));
> +
> +       if (snp_set_rmptable_state(paddr, 1 << order, &val, false, true))
> +               return;

We have now leaked the given pages, right? Should some warning be
logged or should we track these leaked pages and maybe try and free
them with a kworker?

> +
> +       __free_pages(page, order);
> +}
> +


* Re: [PATCH Part2 RFC v2 36/37] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  2021-04-30 12:38 ` [PATCH Part2 RFC v2 36/37] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Brijesh Singh
@ 2021-05-10 18:57   ` Peter Gonda
  2021-05-10 20:14     ` Brijesh Singh
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Gonda @ 2021-05-10 18:57 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm list, Thomas Gleixner, Borislav Petkov,
	jroedel, Lendacky, Thomas, Paolo Bonzini, Ingo Molnar,
	Dave Hansen, David Rientjes, Sean Christopherson, peterz,
	H. Peter Anvin, tony.luck

>
> +static void snp_handle_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
> +                                   gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +       struct sev_data_snp_guest_request data = {};
> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> +       struct kvm *kvm = vcpu->kvm;
> +       kvm_pfn_t req_pfn, resp_pfn;
> +       struct kvm_sev_info *sev;
> +       int rc, err = 0;
> +
> +       if (!sev_snp_guest(vcpu->kvm)) {
> +               rc = -ENODEV;
> +               goto e_fail;
> +       }
> +
> +       sev = &to_kvm_svm(kvm)->sev_info;
> +
> +       if (!__ratelimit(&sev->snp_guest_msg_rs)) {
> +               pr_info_ratelimited("svm: too many guest message requests\n");
> +               rc = -EAGAIN;
> +               goto e_fail;
> +       }
> +
> +       if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE)) {
> +               pr_err("svm: guest request (%#llx) or response (%#llx) is not page aligned\n",
> +                       req_gpa, resp_gpa);
> +               goto e_term;
> +       }
> +
> +       req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
> +       if (is_error_noslot_pfn(req_pfn)) {
> +               pr_err("svm: guest request invalid gpa=%#llx\n", req_gpa);
> +               goto e_term;
> +       }
> +
> +       resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
> +       if (is_error_noslot_pfn(resp_pfn)) {
> +               pr_err("svm: guest response invalid gpa=%#llx\n", resp_gpa);
> +               goto e_term;
> +       }
> +
> +       data.gctx_paddr = __psp_pa(sev->snp_context);
> +       data.req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
> +       data.res_paddr = __psp_pa(sev->snp_resp_page);
> +
> +       mutex_lock(&kvm->lock);
> +
> +       rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> +       if (rc) {
> +               mutex_unlock(&kvm->lock);
> +
> +               /* If we have a firmware error code then use it. */
> +               if (err)
> +                       rc = err;
> +
> +               goto e_fail;
> +       }
> +
> +       /* Copy the response after the firmware returns success. */
> +       rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
> +
> +       mutex_unlock(&kvm->lock);
> +
> +e_fail:
> +       ghcb_set_sw_exit_info_2(ghcb, rc);
> +       return;
> +
> +e_term:
> +       ghcb_set_sw_exit_info_1(ghcb, 1);
> +       ghcb_set_sw_exit_info_2(ghcb,
> +                               X86_TRAP_GP |
> +                               SVM_EVTINJ_TYPE_EXEPT |
> +                               SVM_EVTINJ_VALID);
> +}

I am probably missing something in the spec but I don't see any
references to #GP in the '4.1.7 SNP Guest Request' section. Why is
this different from e_fail?


* Re: [PATCH Part2 RFC v2 32/37] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2021-05-10 17:51     ` Brijesh Singh
@ 2021-05-10 19:59       ` Peter Gonda
  2021-05-10 20:50         ` Brijesh Singh
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Gonda @ 2021-05-10 19:59 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm list, Thomas Gleixner, Borislav Petkov,
	jroedel, Lendacky, Thomas, Paolo Bonzini, Ingo Molnar,
	Dave Hansen, David Rientjes, Sean Christopherson, peterz,
	H. Peter Anvin, tony.luck

On Mon, May 10, 2021 at 11:51 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>
> Hi Peter,
>
> On 5/10/21 12:30 PM, Peter Gonda wrote:
> >> +static int snp_make_page_shared(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn, int level)
> >> +{
> >> +       struct rmpupdate val;
> >> +       int rc, rmp_level;
> >> +       struct rmpentry *e;
> >> +
> >> +       e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
> >> +       if (!e)
> >> +               return -EINVAL;
> >> +
> >> +       if (!rmpentry_assigned(e))
> >> +               return 0;
> >> +
> >> +       /* Log if the entry is validated */
> >> +       if (rmpentry_validated(e))
> >> +               pr_debug_ratelimited("Remove RMP entry for a validated gpa 0x%llx\n", gpa);
> >> +
> >> +       /*
> >> +        * Is the page part of an existing 2M RMP entry ? Split the 2MB into multiple
> >> +        * of 4K-page before making the memory shared.
> >> +        */
> >> +       if ((level == PG_LEVEL_4K) && (rmp_level == PG_LEVEL_2M)) {
> >> +               rc = snp_rmptable_psmash(vcpu, pfn);
> >> +               if (rc)
> >> +                       return rc;
> >> +       }
> >> +
> >> +       memset(&val, 0, sizeof(val));
> >> +       val.pagesize = X86_TO_RMP_PG_LEVEL(level);
> > This is slightly different from Rev 2.00 of the GHCB spec. This
> > defaults to 2MB page sizes, when the spec says the only valid settings
> > for level are 0 -> 4k pages or 1 -> 2MB pages. Should this enforce the
> > same strictness as the spec?
>
>
> The caller of the snp_make_page_shared() must pass the x86 page level.
> We should reach here after all the guest provide value have passed
> through checks.
>
> The call sequence in this case should be:
>
> snp_handle_vmgexit_msr_protocol()
>
>  __snp_handle_page_state_change(vcpu, gfn_to_gpa(gfn), PG_LEVEL_4K)
>
>   snp_make_page_shared(..., level)
>
> Am I missing something  ?

Thanks Brijesh. I think my comment was misplaced. Looking at 33/37

+static unsigned long snp_handle_page_state_change(struct vcpu_svm *svm, struct ghcb *ghcb)
+{
...
+       while (info->header.cur_entry <= info->header.end_entry) {
+               entry = &info->entry[info->header.cur_entry];
+               gpa = gfn_to_gpa(entry->gfn);
+               level = RMP_TO_X86_PG_LEVEL(entry->pagesize);
+               op = entry->operation;

This call to RMP_TO_X86_PG_LEVEL is not as strict as the spec. Is that OK?

>
> >> +       return rmpupdate(pfn_to_page(pfn), &val);
> >> +}
> >> +
> >> +static int snp_make_page_private(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn, int level)
> >> +{
> >> +       struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
> >> +       struct rmpupdate val;
> >> +       struct rmpentry *e;
> >> +       int rmp_level;
> >> +
> >> +       e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
> >> +       if (!e)
> >> +               return -EINVAL;
> >> +
> >> +       /* Log if the entry is validated */
> >> +       if (rmpentry_validated(e))
> >> +               pr_err_ratelimited("Asked to make a pre-validated gpa %llx private\n", gpa);
> >> +
> >> +       memset(&val, 0, sizeof(val));
> >> +       val.gpa = gpa;
> >> +       val.asid = sev->asid;
> >> +       val.pagesize = X86_TO_RMP_PG_LEVEL(level);
> > Same comment as above.
>
> See my above response.
>
>
> >
> >> +       val.assigned = true;
> >> +
> >> +       return rmpupdate(pfn_to_page(pfn), &val);
> >> +}


* Re: [PATCH Part2 RFC v2 16/37] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
  2021-05-10 18:23   ` Peter Gonda
@ 2021-05-10 20:07     ` Brijesh Singh
  0 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-05-10 20:07 UTC (permalink / raw)
  To: Peter Gonda
  Cc: brijesh.singh, x86, linux-kernel, kvm list, Thomas Gleixner,
	Borislav Petkov, jroedel, Lendacky, Thomas, Paolo Bonzini,
	Ingo Molnar, Dave Hansen, David Rientjes, Sean Christopherson,
	peterz, H. Peter Anvin, tony.luck


On 5/10/21 1:23 PM, Peter Gonda wrote:
>> +
>> +static int snp_set_rmptable_state(unsigned long paddr, int npages,
>> +                                 struct rmpupdate *val, bool locked, bool need_reclaim)
>> +{
>> +       unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
>> +       unsigned long pfn_end = pfn + npages;
>> +       int rc;
>> +
>> +       while (pfn < pfn_end) {
>> +               if (need_reclaim)
>> +                       if (snp_reclaim_page(pfn_to_page(pfn), locked))
>> +                               return -EFAULT;
>> +
>> +               rc = rmpupdate(pfn_to_page(pfn), val);
>> +               if (rc)
>> +                       return rc;
> This functional can return an error but have partially converted some
> of the npages requested by the caller. Should this function return the
> number of affected pages or something to allow the caller to know if
> some pages need to be reverted? Or should the function attempt to do
> that itself?

I will look into improving this function to clean up the partial updates
on failure. Thanks.


>
>> +
>> +               pfn++;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static void __snp_free_firmware_pages(struct page *page, int order)
>> +{
>> +       struct rmpupdate val = {};
>> +       unsigned long paddr;
>> +
>> +       if (!page)
>> +               return;
>> +
>> +       paddr = __pa((unsigned long)page_address(page));
>> +
>> +       if (snp_set_rmptable_state(paddr, 1 << order, &val, false, true))
>> +               return;
> We now have leaked the given pages right? Should some warning be
> logged or should we track these leaked pages and maybe try and free
> them with a kworker?

I will add a log message for it. The only reason I can think of for this
function failing is that the firmware fails to clear the immutable bit from
the page. If that happens, I don't see any reason why a kworker retry would
succeed. Per the SNP firmware spec, the firmware should be able to clear the
immutable bit as long as the firmware is in the INIT state.
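
A minimal sketch of the "log it" approach, assuming a simple counter is enough
to make the leak visible. This is a user-space stand-in: leaked_pages and
demo_free_firmware_pages() are hypothetical, and the reclaim_failed argument
stands in for a nonzero return from the RMP-state reset.

```c
#include <stdio.h>

/* Hypothetical counter of pages that were never returned to the allocator. */
static unsigned long leaked_pages;

/*
 * Returns 1 if the pages may be freed, 0 if they must be leaked because the
 * reclaim step failed. The leak is logged and counted instead of being silent.
 */
static int demo_free_firmware_pages(unsigned long pfn, int order, int reclaim_failed)
{
    unsigned long npages = 1UL << order;

    if (reclaim_failed) {
        leaked_pages += npages;
        fprintf(stderr, "leaking %lu page(s) starting at pfn 0x%lx\n", npages, pfn);
        return 0;
    }

    return 1;
}
```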


>
>> +
>> +       __free_pages(page, order);
>> +}
>> +


* Re: [PATCH Part2 RFC v2 36/37] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  2021-05-10 18:57   ` Peter Gonda
@ 2021-05-10 20:14     ` Brijesh Singh
  2021-05-10 21:17       ` Sean Christopherson
  0 siblings, 1 reply; 67+ messages in thread
From: Brijesh Singh @ 2021-05-10 20:14 UTC (permalink / raw)
  To: Peter Gonda
  Cc: brijesh.singh, x86, linux-kernel, kvm list, Thomas Gleixner,
	Borislav Petkov, jroedel, Lendacky, Thomas, Paolo Bonzini,
	Ingo Molnar, Dave Hansen, David Rientjes, Sean Christopherson,
	peterz, H. Peter Anvin, tony.luck


On 5/10/21 1:57 PM, Peter Gonda wrote:
>> +static void snp_handle_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
>> +                                   gpa_t req_gpa, gpa_t resp_gpa)
>> +{
>> +       struct sev_data_snp_guest_request data = {};
>> +       struct kvm_vcpu *vcpu = &svm->vcpu;
>> +       struct kvm *kvm = vcpu->kvm;
>> +       kvm_pfn_t req_pfn, resp_pfn;
>> +       struct kvm_sev_info *sev;
>> +       int rc, err = 0;
>> +
>> +       if (!sev_snp_guest(vcpu->kvm)) {
>> +               rc = -ENODEV;
>> +               goto e_fail;
>> +       }
>> +
>> +       sev = &to_kvm_svm(kvm)->sev_info;
>> +
>> +       if (!__ratelimit(&sev->snp_guest_msg_rs)) {
>> +               pr_info_ratelimited("svm: too many guest message requests\n");
>> +               rc = -EAGAIN;
>> +               goto e_fail;
>> +       }
>> +
>> +       if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE)) {
>> +               pr_err("svm: guest request (%#llx) or response (%#llx) is not page aligned\n",
>> +                       req_gpa, resp_gpa);
>> +               goto e_term;
>> +       }
>> +
>> +       req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
>> +       if (is_error_noslot_pfn(req_pfn)) {
>> +               pr_err("svm: guest request invalid gpa=%#llx\n", req_gpa);
>> +               goto e_term;
>> +       }
>> +
>> +       resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
>> +       if (is_error_noslot_pfn(resp_pfn)) {
>> +               pr_err("svm: guest response invalid gpa=%#llx\n", resp_gpa);
>> +               goto e_term;
>> +       }
>> +
>> +       data.gctx_paddr = __psp_pa(sev->snp_context);
>> +       data.req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
>> +       data.res_paddr = __psp_pa(sev->snp_resp_page);
>> +
>> +       mutex_lock(&kvm->lock);
>> +
>> +       rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
>> +       if (rc) {
>> +               mutex_unlock(&kvm->lock);
>> +
>> +               /* If we have a firmware error code then use it. */
>> +               if (err)
>> +                       rc = err;
>> +
>> +               goto e_fail;
>> +       }
>> +
>> +       /* Copy the response after the firmware returns success. */
>> +       rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
>> +
>> +       mutex_unlock(&kvm->lock);
>> +
>> +e_fail:
>> +       ghcb_set_sw_exit_info_2(ghcb, rc);
>> +       return;
>> +
>> +e_term:
>> +       ghcb_set_sw_exit_info_1(ghcb, 1);
>> +       ghcb_set_sw_exit_info_2(ghcb,
>> +                               X86_TRAP_GP |
>> +                               SVM_EVTINJ_TYPE_EXEPT |
>> +                               SVM_EVTINJ_VALID);
>> +}
> I am probably missing something in the spec but I don't see any
> references to #GP in the '4.1.7 SNP Guest Request' section. Why is
> this different from e_fail?

The spec does not say to inject a #GP. I chose this because the guest is
not adhering to the spec and there was not a good error code in the
GHCB spec to communicate this condition. Per the spec, both the request
and response pages must be valid GPAs. If we detect that the guest is not
following the spec, then it's a guest BUG. IIRC, other places in KVM
do something similar when the guest attempts an invalid operation.

-Brijesh



* Re: [PATCH Part2 RFC v2 32/37] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
  2021-05-10 19:59       ` Peter Gonda
@ 2021-05-10 20:50         ` Brijesh Singh
  0 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-05-10 20:50 UTC (permalink / raw)
  To: Peter Gonda
  Cc: brijesh.singh, x86, linux-kernel, kvm list, Thomas Gleixner,
	Borislav Petkov, jroedel, Lendacky, Thomas, Paolo Bonzini,
	Ingo Molnar, Dave Hansen, David Rientjes, Sean Christopherson,
	peterz, H. Peter Anvin, tony.luck


On 5/10/21 2:59 PM, Peter Gonda wrote:
> On Mon, May 10, 2021 at 11:51 AM Brijesh Singh <brijesh.singh@amd.com> wrote:
>> Hi Peter,
>>
>> On 5/10/21 12:30 PM, Peter Gonda wrote:
>>>> +static int snp_make_page_shared(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn, int level)
>>>> +{
>>>> +       struct rmpupdate val;
>>>> +       int rc, rmp_level;
>>>> +       struct rmpentry *e;
>>>> +
>>>> +       e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
>>>> +       if (!e)
>>>> +               return -EINVAL;
>>>> +
>>>> +       if (!rmpentry_assigned(e))
>>>> +               return 0;
>>>> +
>>>> +       /* Log if the entry is validated */
>>>> +       if (rmpentry_validated(e))
>>>> +               pr_debug_ratelimited("Remove RMP entry for a validated gpa 0x%llx\n", gpa);
>>>> +
>>>> +       /*
>>>> +        * Is the page part of an existing 2M RMP entry ? Split the 2MB into multiple
>>>> +        * of 4K-page before making the memory shared.
>>>> +        */
>>>> +       if ((level == PG_LEVEL_4K) && (rmp_level == PG_LEVEL_2M)) {
>>>> +               rc = snp_rmptable_psmash(vcpu, pfn);
>>>> +               if (rc)
>>>> +                       return rc;
>>>> +       }
>>>> +
>>>> +       memset(&val, 0, sizeof(val));
>>>> +       val.pagesize = X86_TO_RMP_PG_LEVEL(level);
>>> This is slightly different from Rev 2.00 of the GHCB spec. This
>>> defaults to 2MB page sizes, when the spec says the only valid settings
>>> for level are 0 -> 4k pages or 1 -> 2MB pages. Should this enforce the
>>> same strictness as the spec?
>>
>> The caller of the snp_make_page_shared() must pass the x86 page level.
>> We should reach here after all the guest provide value have passed
>> through checks.
>>
>> The call sequence in this case should be:
>>
>> snp_handle_vmgexit_msr_protocol()
>>
>>  __snp_handle_page_state_change(vcpu, gfn_to_gpa(gfn), PG_LEVEL_4K)
>>
>>   snp_make_page_shared(..., level)
>>
>> Am I missing something  ?
> Thanks Brijesh. I think my comment was misplaced. Looking at 33/37
>
> +static unsigned long snp_handle_page_state_change(struct vcpu_svm
> *svm, struct ghcb *ghcb)
> +{
> ...
> + while (info->header.cur_entry <= info->header.end_entry) {
> + entry = &info->entry[info->header.cur_entry];
> + gpa = gfn_to_gpa(entry->gfn);
> + level = RMP_TO_X86_PG_LEVEL(entry->pagesize);
> + op = entry->operation;
>
> This call to RMP_TO_X86_PG_LEVEL is not as strict as the spec. Is that OK?

I am not able to follow which part of the spec we are missing here. Can
you please elaborate a bit more? Thanks.

The entry->pagesize is boolean, so the level returned by the macro is
either 4K or 2MB.
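
To illustrate the point, assuming pagesize is stored as a one-bit field: such
a field can only hold 0 or 1, so a conversion from it has exactly two possible
results. The struct layout and the helper below are hypothetical stand-ins,
not the definitions from the series, and the PG_LEVEL_* values are local
stand-ins for the kernel's constants.

```c
/* Local stand-ins for the kernel's page-level constants (assumed values). */
enum { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2 };

/* Hypothetical entry layout: only the one-bit pagesize field matters here. */
struct demo_psc_entry {
    unsigned int pagesize : 1;
    unsigned int reserved : 31;
};

/* Hypothetical conversion: a one-bit input yields either 4K or 2M. */
static int demo_rmp_to_x86_level(unsigned int pagesize)
{
    return pagesize ? PG_LEVEL_2M : PG_LEVEL_4K;
}
```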



* Re: [PATCH Part2 RFC v2 36/37] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  2021-05-10 20:14     ` Brijesh Singh
@ 2021-05-10 21:17       ` Sean Christopherson
  2021-05-11 18:34         ` Brijesh Singh
  0 siblings, 1 reply; 67+ messages in thread
From: Sean Christopherson @ 2021-05-10 21:17 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: Peter Gonda, x86, linux-kernel, kvm list, Thomas Gleixner,
	Borislav Petkov, jroedel, Lendacky, Thomas, Paolo Bonzini,
	Ingo Molnar, Dave Hansen, David Rientjes, peterz, H. Peter Anvin,
	tony.luck

On Mon, May 10, 2021, Brijesh Singh wrote:
> 
> On 5/10/21 1:57 PM, Peter Gonda wrote:
> >> +e_fail:
> >> +       ghcb_set_sw_exit_info_2(ghcb, rc);
> >> +       return;
> >> +
> >> +e_term:
> >> +       ghcb_set_sw_exit_info_1(ghcb, 1);
> >> +       ghcb_set_sw_exit_info_2(ghcb,
> >> +                               X86_TRAP_GP |
> >> +                               SVM_EVTINJ_TYPE_EXEPT |
> >> +                               SVM_EVTINJ_VALID);
> >> +}
> > I am probably missing something in the spec but I don't see any
> > references to #GP in the '4.1.7 SNP Guest Request' section. Why is
> > this different from e_fail?
> 
> The spec does not say to inject the #GP, I chose this because guest is
> not adhering to the spec and there was a not a good error code in the
> GHCB spec to communicate this condition. Per the spec, both the request
> and response page must be a valid GPA. If we detect that guest is not
> following the spec then its a guest BUG. IIRC, other places in the KVM
> does something similar when guest is trying invalid operation.

The GHCB spec should be updated to define an error code, even if it's a blanket
statement for all VMGEXITs that fail to adhere to the spec.  Arbitrarily choosing
an error code and/or exception number makes the information useless to the guest
because the guest can't take specific action for those failures.  E.g. if there
is a return code specifically for GHCB spec violation, then the guest can panic,
WARN, etc... knowing that it done messed up.
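
A sketch of what that could look like on the guest side, purely to make the
argument concrete. No such return code exists in the GHCB spec discussed here;
DEMO_RESP_SPEC_VIOLATION, its value, and the function name are all
hypothetical.

```c
#include <stdint.h>

/* Hypothetical return codes; the value chosen for the violation is a placeholder. */
#define DEMO_RESP_OK             0u
#define DEMO_RESP_SPEC_VIOLATION 0xffffffffu

enum demo_action { DEMO_CONTINUE, DEMO_RETRY_OR_FAIL, DEMO_PANIC };

/*
 * With a dedicated code, the guest can tell "I violated the spec" apart from
 * ordinary failures and react accordingly (panic, WARN, ...).
 */
static enum demo_action demo_guest_handle_response(uint64_t exit_info_2)
{
    if (exit_info_2 == DEMO_RESP_OK)
        return DEMO_CONTINUE;
    if (exit_info_2 == DEMO_RESP_SPEC_VIOLATION)
        return DEMO_PANIC;

    /* Any other value is an ordinary error the caller may retry or report. */
    return DEMO_RETRY_OR_FAIL;
}
```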

"Injecting" an exception is particularly bad, because if the guest kernel takes
that request literally and emulates a #GP, then we can end up in a situation
where someone files a bug report because VMGEXIT is hitting a #GP and confusion
ensues.


* Re: [PATCH Part2 RFC v2 36/37] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
  2021-05-10 21:17       ` Sean Christopherson
@ 2021-05-11 18:34         ` Brijesh Singh
  0 siblings, 0 replies; 67+ messages in thread
From: Brijesh Singh @ 2021-05-11 18:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: brijesh.singh, Peter Gonda, x86, linux-kernel, kvm list,
	Thomas Gleixner, Borislav Petkov, jroedel, Lendacky, Thomas,
	Paolo Bonzini, Ingo Molnar, Dave Hansen, David Rientjes, peterz,
	H. Peter Anvin, tony.luck


On 5/10/21 4:17 PM, Sean Christopherson wrote:
> On Mon, May 10, 2021, Brijesh Singh wrote:
>> On 5/10/21 1:57 PM, Peter Gonda wrote:
>>>> +e_fail:
>>>> +       ghcb_set_sw_exit_info_2(ghcb, rc);
>>>> +       return;
>>>> +
>>>> +e_term:
>>>> +       ghcb_set_sw_exit_info_1(ghcb, 1);
>>>> +       ghcb_set_sw_exit_info_2(ghcb,
>>>> +                               X86_TRAP_GP |
>>>> +                               SVM_EVTINJ_TYPE_EXEPT |
>>>> +                               SVM_EVTINJ_VALID);
>>>> +}
>>> I am probably missing something in the spec but I don't see any
>>> references to #GP in the '4.1.7 SNP Guest Request' section. Why is
>>> this different from e_fail?
>> The spec does not say to inject the #GP, I chose this because guest is
>> not adhering to the spec and there was a not a good error code in the
>> GHCB spec to communicate this condition. Per the spec, both the request
>> and response page must be a valid GPA. If we detect that guest is not
>> following the spec then its a guest BUG. IIRC, other places in the KVM
>> does something similar when guest is trying invalid operation.
> The GHCB spec should be updated to define an error code, even if it's a blanket
> statement for all VMGEXITs that fail to adhere to the spec.  Arbitrarily choosing
> an error code and/or exception number makes the information useless to the guest
> because the guest can't take specific action for those failures.  E.g. if there
> is a return code specifically for GHCB spec violation, then the guest can panic,
> WARN, etc... knowing that it done messed up.

The GHCB spec is finalized and released, so I don't think this will be covered
in v2. But I will go ahead and file a report so that it is considered
in the next update. Having said that, I do see that for other commands
(e.g., the HV doorbell page) the spec spells out that if the guest provides an
invalid GPA then a #GP is injected. I guess we need to move that statement
so that it applies to all the commands. Until then I am fine with not injecting
a #GP, so as not to deviate from the spec.


> "Injecting" an exception is particularly bad, because if the guest kernel takes
> that request literally and emulates a #GP, then we can end up in a situation
> where someone files a bug report because VMGEXIT is hitting a #GP and confusion
> ensues.


end of thread, other threads:[~2021-05-11 18:34 UTC | newest]

Thread overview: 67+ messages
2021-04-30 12:37 [PATCH Part2 RFC v2 00/37] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Brijesh Singh
2021-04-30 12:37 ` [PATCH Part2 RFC v2 01/37] KVM: SVM: Add support to handle AP reset MSR protocol Brijesh Singh
2021-04-30 12:37 ` [PATCH Part2 RFC v2 02/37] KVM: SVM: Provide the Hypervisor Feature support VMGEXIT Brijesh Singh
2021-04-30 12:37 ` [PATCH Part2 RFC v2 03/37] KVM: SVM: Increase the GHCB protocol version Brijesh Singh
2021-04-30 12:37 ` [PATCH Part2 RFC v2 04/37] x86/cpufeatures: Add SEV-SNP CPU feature Brijesh Singh
2021-04-30 12:37 ` [PATCH Part2 RFC v2 05/37] x86/sev: Add the host SEV-SNP initialization support Brijesh Singh
2021-04-30 12:37 ` [PATCH Part2 RFC v2 06/37] x86/sev: Add RMP entry lookup helpers Brijesh Singh
2021-04-30 12:37 ` [PATCH Part2 RFC v2 07/37] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Brijesh Singh
2021-04-30 12:37 ` [PATCH Part2 RFC v2 08/37] x86/sev: Split the physmap when adding the page in RMP table Brijesh Singh
2021-05-03 15:07   ` Peter Zijlstra
2021-05-03 15:15   ` Andy Lutomirski
2021-05-03 15:41     ` Dave Hansen
2021-05-07 11:28       ` Vlastimil Babka
2021-04-30 12:37 ` [PATCH Part2 RFC v2 09/37] x86/traps: Define RMP violation #PF error code Brijesh Singh
2021-04-30 12:37 ` [PATCH Part2 RFC v2 10/37] x86/fault: Add support to handle the RMP fault for kernel address Brijesh Singh
2021-05-03 14:44   ` Dave Hansen
2021-05-03 15:03     ` Andy Lutomirski
2021-05-03 15:49       ` Brijesh Singh
2021-05-03 15:37     ` Brijesh Singh
2021-05-03 16:15       ` Dave Hansen
2021-05-03 17:19         ` Brijesh Singh
2021-05-03 17:31           ` Brijesh Singh
2021-05-03 17:40           ` Andy Lutomirski
2021-05-03 19:41             ` Brijesh Singh
2021-05-03 19:43               ` Dave Hansen
2021-05-04 12:31                 ` Brijesh Singh
2021-05-04 14:33                   ` Dave Hansen
2021-05-04 15:16                     ` Brijesh Singh
2021-04-30 12:37 ` [PATCH Part2 RFC v2 11/37] x86/fault: Add support to handle the RMP fault for user address Brijesh Singh
2021-04-30 12:37 ` [PATCH Part2 RFC v2 12/37] crypto:ccp: Define the SEV-SNP commands Brijesh Singh
2021-04-30 12:37 ` [PATCH Part2 RFC v2 13/37] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Brijesh Singh
2021-04-30 12:37 ` [PATCH Part2 RFC v2 14/37] crypto: ccp: Shutdown SNP firmware on kexec Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 15/37] crypto:ccp: Provide APIs to issue SEV-SNP commands Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 16/37] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Brijesh Singh
2021-05-10 18:23   ` Peter Gonda
2021-05-10 20:07     ` Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 17/37] crypto: ccp: Handle the legacy SEV command " Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 18/37] KVM: SVM: make AVIC backing, VMSA and VMCB memory allocation SNP safe Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 19/37] KVM: SVM: Add initial SEV-SNP support Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 20/37] KVM: SVM: define new SEV_FEATURES field in the VMCB Save State Area Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 21/37] KVM: SVM: Add KVM_SNP_INIT command Brijesh Singh
2021-05-06 20:25   ` Peter Gonda
2021-05-06 22:29     ` Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 22/37] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 23/37] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 24/37] KVM: SVM: Reclaim the guest pages when SEV-SNP VM terminates Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 25/37] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 26/37] KVM: X86: Add kvm_x86_ops to get the max page level for the TDP Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 27/37] KVM: X86: Introduce kvm_mmu_map_tdp_page() for use by SEV Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 28/37] KVM: X86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 29/37] KVM: X86: Define new RMP check related #NPF error bits Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 30/37] KVM: X86: update page-fault trace to log the 64-bit error code Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 31/37] KVM: SVM: Add support to handle GHCB GPA register VMGEXIT Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 32/37] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT Brijesh Singh
2021-05-10 17:30   ` Peter Gonda
2021-05-10 17:51     ` Brijesh Singh
2021-05-10 19:59       ` Peter Gonda
2021-05-10 20:50         ` Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 33/37] KVM: SVM: Add support to handle " Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 34/37] KVM: X86: Export the kvm_zap_gfn_range() for the SNP use Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 35/37] KVM: SVM: Add support to handle the RMP nested page fault Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 36/37] KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event Brijesh Singh
2021-05-10 18:57   ` Peter Gonda
2021-05-10 20:14     ` Brijesh Singh
2021-05-10 21:17       ` Sean Christopherson
2021-05-11 18:34         ` Brijesh Singh
2021-04-30 12:38 ` [PATCH Part2 RFC v2 37/37] KVM: SVM: Advertise the SEV-SNP feature support Brijesh Singh
