linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support
@ 2021-03-24 16:44 Brijesh Singh
  2021-03-24 16:44 ` [RFC Part1 PATCH 01/13] x86/cpufeatures: Add SEV-SNP CPU feature Brijesh Singh
                   ` (12 more replies)
  0 siblings, 13 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

This part of the Secure Nested Paging (SEV-SNP) series focuses on the changes
required in a guest OS for SEV-SNP support.

SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based memory protections. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks like data
replay, memory re-mapping and more in order to create an isolated memory
encryption environment.
 
This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by SEV-SNP,
such as interrupt protection.

Many of the integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). Adding a new page to an SEV-SNP
VM is a two-step process. First, the hypervisor assigns the page to the
guest using the new RMPUPDATE instruction, which transitions the page to the
guest-invalid state. Second, the guest validates the page using the new
PVALIDATE instruction. An SEV-SNP VM can use the new Page State Change NAE
event defined in the GHCB specification to ask the hypervisor to add or
remove pages from the RMP table.
 
Each page assigned to an SEV-SNP VM can be either validated or unvalidated,
as indicated by the Validated flag in the page's RMP entry. There are two
approaches to page validation: pre-validation and lazy validation.
  
Under pre-validation, pages are validated prior to first use. Under lazy
validation, pages are validated when first accessed; an access to an
unvalidated page results in a #VC exception, at which point the exception
handler may validate the page. Lazy validation requires careful tracking of
the validated pages to avoid validating the same GPA more than once. The
recently introduced "Unaccepted" memory type can be used to communicate the
unvalidated memory ranges to the guest OS.

At this time, only pre-validation is supported: the OVMF guest BIOS validates
the entire RAM before control is handed over to the guest kernel. The
early_set_memory_{encrypted,decrypted}() and set_memory_{encrypted,decrypted}()
helpers are enlightened to perform the page validation or invalidation while
setting or clearing the encryption attribute in the page tables.

This series does not provide support for the following SEV-SNP features yet:

* CPUID filtering
* Driver to query attestation report
* AP bring up using the new SEV-SNP NAE
* Lazy validation
* Interrupt security

The series is based on kvm/master commit:
  87aa9ec939ec KVM: x86/mmu: Fix TDP MMU zap collapsible SPTEs

The complete source is available at
 https://github.com/AMDESE/linux/tree/sev-snp-part-1-rfc1

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org

Additional resources
---------------------
SEV-SNP whitepaper
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
 
APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf
(section 15.36)

GHCB spec v2:
  The draft specification is posted on AMD-SEV-SNP mailing list:
   https://lists.suse.com/mailman/private/amd-sev-snp/

  Copy of draft spec is also available at 
  https://github.com/AMDESE/AMDSEV/blob/sev-snp-devel/docs/56421-Guest_Hypervisor_Communication_Block_Standardization.pdf

GHCB spec v1:
SEV-SNP firmware specification:
 https://developer.amd.com/sev/

Brijesh Singh (13):
  x86/cpufeatures: Add SEV-SNP CPU feature
  x86/mm: add sev_snp_active() helper
  x86: add a helper routine for the PVALIDATE instruction
  x86/sev-snp: define page state change VMGEXIT structure
  X86/sev-es: move few helper functions in common file
  x86/compressed: rescinds and validate the memory used for the GHCB
  x86/compressed: register GHCB memory when SNP is active
  x86/sev-es: register GHCB memory when SEV-SNP is active
  x86/kernel: add support to validate memory in early enc attribute
    change
  X86: kernel: make the bss.decrypted section shared in RMP table
  x86/kernel: validate rom memory before accessing when SEV-SNP is
    active
  x86/sev-es: make GHCB get and put helper accessible outside
  x86/kernel: add support to validate memory when changing C-bit

 arch/x86/boot/compressed/Makefile       |   1 +
 arch/x86/boot/compressed/ident_map_64.c |  18 ++
 arch/x86/boot/compressed/sev-common.c   |  32 +++
 arch/x86/boot/compressed/sev-es.c       |  26 +--
 arch/x86/boot/compressed/sev-snp.c      | 141 +++++++++++++
 arch/x86/boot/compressed/sev-snp.h      |  25 +++
 arch/x86/include/asm/cpufeatures.h      |   1 +
 arch/x86/include/asm/mem_encrypt.h      |   2 +
 arch/x86/include/asm/msr-index.h        |   2 +
 arch/x86/include/asm/sev-es.h           |  11 +
 arch/x86/include/asm/sev-snp.h          | 121 +++++++++++
 arch/x86/include/uapi/asm/svm.h         |   1 +
 arch/x86/kernel/Makefile                |   3 +
 arch/x86/kernel/cpu/amd.c               |   3 +-
 arch/x86/kernel/cpu/scattered.c         |   1 +
 arch/x86/kernel/head64.c                |  14 ++
 arch/x86/kernel/probe_roms.c            |  15 ++
 arch/x86/kernel/sev-common-shared.c     |  31 +++
 arch/x86/kernel/sev-es-shared.c         |  21 +-
 arch/x86/kernel/sev-es.c                |  32 ++-
 arch/x86/kernel/sev-snp.c               | 269 ++++++++++++++++++++++++
 arch/x86/mm/mem_encrypt.c               |  49 ++++-
 arch/x86/mm/pat/set_memory.c            |  19 ++
 23 files changed, 792 insertions(+), 46 deletions(-)
 create mode 100644 arch/x86/boot/compressed/sev-common.c
 create mode 100644 arch/x86/boot/compressed/sev-snp.c
 create mode 100644 arch/x86/boot/compressed/sev-snp.h
 create mode 100644 arch/x86/include/asm/sev-snp.h
 create mode 100644 arch/x86/kernel/sev-common-shared.c
 create mode 100644 arch/x86/kernel/sev-snp.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC Part1 PATCH 01/13] x86/cpufeatures: Add SEV-SNP CPU feature
  2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
@ 2021-03-24 16:44 ` Brijesh Singh
  2021-03-25 10:54   ` Borislav Petkov
  2021-03-24 16:44 ` [RFC Part1 PATCH 02/13] x86/mm: add sev_snp_active() helper Brijesh Singh
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging. This feature adds a strong memory integrity
protection to help prevent malicious hypervisor-based attacks like
data replay, memory re-mapping, and more.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/amd.c          | 3 ++-
 arch/x86/kernel/cpu/scattered.c    | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..a5b369f10bcd 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -238,6 +238,7 @@
 #define X86_FEATURE_VMW_VMMCALL		( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
 #define X86_FEATURE_SEV_ES		( 8*32+20) /* AMD Secure Encrypted Virtualization - Encrypted State */
 #define X86_FEATURE_VM_PAGE_FLUSH	( 8*32+21) /* "" VM Page Flush MSR is supported */
+#define X86_FEATURE_SEV_SNP		( 8*32+22) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f8ca66f3d861..39f7a4b5b04c 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -586,7 +586,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 	 *	      If BIOS has not enabled SME then don't advertise the
 	 *	      SME feature (set in scattered.c).
 	 *   For SEV: If BIOS has not enabled SEV then don't advertise the
-	 *            SEV and SEV_ES feature (set in scattered.c).
+	 *            SEV, SEV_ES and SEV_SNP feature (set in scattered.c).
 	 *
 	 *   In all cases, since support for SME and SEV requires long mode,
 	 *   don't advertise the feature under CONFIG_X86_32.
@@ -618,6 +618,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 clear_sev:
 		setup_clear_cpu_cap(X86_FEATURE_SEV);
 		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+		setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 236924930bf0..eaec1278dc2e 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_SEV_ES,		CPUID_EAX,  3, 0x8000001f, 0 },
 	{ X86_FEATURE_SME_COHERENT,	CPUID_EAX, 10, 0x8000001f, 0 },
 	{ X86_FEATURE_VM_PAGE_FLUSH,	CPUID_EAX,  2, 0x8000001f, 0 },
+	{ X86_FEATURE_SEV_SNP,		CPUID_EAX,  4, 0x8000001f, 0 },
 	{ 0, 0, 0, 0, 0 }
 };
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC Part1 PATCH 02/13] x86/mm: add sev_snp_active() helper
  2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
  2021-03-24 16:44 ` [RFC Part1 PATCH 01/13] x86/cpufeatures: Add SEV-SNP CPU feature Brijesh Singh
@ 2021-03-24 16:44 ` Brijesh Singh
  2021-03-24 16:44 ` [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction Brijesh Singh
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

The sev_snp_active() helper can be used by the guest to query whether the
Secure Nested Paging (SEV-SNP) feature is active.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h | 2 ++
 arch/x86/include/asm/msr-index.h   | 2 ++
 arch/x86/mm/mem_encrypt.c          | 9 +++++++++
 3 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 31c4df123aa0..d99aa260d328 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -54,6 +54,7 @@ void __init sev_es_init_vc_handling(void);
 bool sme_active(void);
 bool sev_active(void);
 bool sev_es_active(void);
+bool sev_snp_active(void);
 
 #define __bss_decrypted __section(".bss..decrypted")
 
@@ -79,6 +80,7 @@ static inline void sev_es_init_vc_handling(void) { }
 static inline bool sme_active(void) { return false; }
 static inline bool sev_active(void) { return false; }
 static inline bool sev_es_active(void) { return false; }
+static inline bool sev_snp_active(void) { return false; }
 
 static inline int __init
 early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 546d6ecf0a35..b03694e116fe 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -477,8 +477,10 @@
 #define MSR_AMD64_SEV			0xc0010131
 #define MSR_AMD64_SEV_ENABLED_BIT	0
 #define MSR_AMD64_SEV_ES_ENABLED_BIT	1
+#define MSR_AMD64_SEV_SNP_ENABLED_BIT	2
 #define MSR_AMD64_SEV_ENABLED		BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
 #define MSR_AMD64_SEV_ES_ENABLED	BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
+#define MSR_AMD64_SEV_SNP_ENABLED	BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
 
 #define MSR_AMD64_VIRT_SPEC_CTRL	0xc001011f
 
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index c3d5f0236f35..5bd50008fc9a 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -390,6 +390,11 @@ bool noinstr sev_es_active(void)
 	return sev_status & MSR_AMD64_SEV_ES_ENABLED;
 }
 
+bool sev_snp_active(void)
+{
+	return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
+}
+
 /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
 bool force_dma_unencrypted(struct device *dev)
 {
@@ -462,6 +467,10 @@ static void print_mem_encrypt_feature_info(void)
 	if (sev_es_active())
 		pr_cont(" SEV-ES");
 
+	/* Secure Nested Paging */
+	if (sev_snp_active())
+		pr_cont(" SEV-SNP");
+
 	pr_cont("\n");
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction
  2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
  2021-03-24 16:44 ` [RFC Part1 PATCH 01/13] x86/cpufeatures: Add SEV-SNP CPU feature Brijesh Singh
  2021-03-24 16:44 ` [RFC Part1 PATCH 02/13] x86/mm: add sev_snp_active() helper Brijesh Singh
@ 2021-03-24 16:44 ` Brijesh Singh
  2021-03-26 14:30   ` Borislav Petkov
  2021-03-24 16:44 ` [RFC Part1 PATCH 04/13] x86/sev-snp: define page state change VMGEXIT structure Brijesh Singh
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

An SNP-active guest uses the PVALIDATE instruction to validate or
rescind the validation of a guest page's RMP entry. Upon completion,
a return code is stored in EAX and rFLAGS bits are set based on the
return code. If the instruction completed successfully, the CF flag
indicates whether the contents of the RMP entry were changed or not.

See AMD APM Volume 3 for additional details.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev-snp.h | 52 ++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 arch/x86/include/asm/sev-snp.h

diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
new file mode 100644
index 000000000000..5a6d1367cab7
--- /dev/null
+++ b/arch/x86/include/asm/sev-snp.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD SEV Secure Nested Paging Support
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <brijesh.singh@amd.com>
+ */
+
+#ifndef __ASM_SECURE_NESTED_PAGING_H
+#define __ASM_SECURE_NESTED_PAGING_H
+
+#ifndef __ASSEMBLY__
+#include <asm/irqflags.h> /* native_save_fl() */
+
+/* Return code of __pvalidate */
+#define PVALIDATE_SUCCESS		0
+#define PVALIDATE_FAIL_INPUT		1
+#define PVALIDATE_FAIL_SIZEMISMATCH	6
+
+/* RMP page size */
+#define RMP_PG_SIZE_2M			1
+#define RMP_PG_SIZE_4K			0
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
+			      unsigned long *rflags)
+{
+	unsigned long flags;
+	int rc;
+
+	asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
+		     "pushf; pop %0\n\t"
+		     : "=rm"(flags), "=a"(rc)
+		     : "a"(vaddr), "c"(rmp_psize), "d"(validate)
+		     : "memory", "cc");
+
+	*rflags = flags;
+	return rc;
+}
+
+#else	/* !CONFIG_AMD_MEM_ENCRYPT */
+
+static inline int __pvalidate(unsigned long vaddr, int psize, int validate, unsigned long *eflags)
+{
+	return 0;
+}
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
+#endif	/* __ASSEMBLY__ */
+#endif  /* __ASM_SECURE_NESTED_PAGING_H */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC Part1 PATCH 04/13] x86/sev-snp: define page state change VMGEXIT structure
  2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
                   ` (2 preceding siblings ...)
  2021-03-24 16:44 ` [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction Brijesh Singh
@ 2021-03-24 16:44 ` Brijesh Singh
  2021-04-01 10:32   ` Borislav Petkov
  2021-03-24 16:44 ` [RFC Part1 PATCH 05/13] X86/sev-es: move few helper functions in common file Brijesh Singh
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

An SNP-active guest will use the Page State Change NAE VMGEXIT defined in
the GHCB specification section 4.1.6 to ask the hypervisor to make a
guest page private or shared in the RMP table. In addition to the
private/shared transitions, the guest can also ask the hypervisor to split
or combine multiple validated 4K pages into a single 2M page, or vice versa.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev-snp.h  | 34 +++++++++++++++++++++++++++++++++
 arch/x86/include/uapi/asm/svm.h |  1 +
 2 files changed, 35 insertions(+)

diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
index 5a6d1367cab7..f514dad276f2 100644
--- a/arch/x86/include/asm/sev-snp.h
+++ b/arch/x86/include/asm/sev-snp.h
@@ -22,6 +22,40 @@
 #define RMP_PG_SIZE_2M			1
 #define RMP_PG_SIZE_4K			0
 
+/* Page State Change MSR Protocol */
+#define GHCB_SNP_PAGE_STATE_CHANGE_REQ	0x0014
+#define		GHCB_SNP_PAGE_STATE_REQ_GFN(v, o)	(GHCB_SNP_PAGE_STATE_CHANGE_REQ | \
+							 ((unsigned long)((o) & 0xf) << 52) | \
+							 (((v) << 12) & 0xffffffffffffff))
+#define	SNP_PAGE_STATE_PRIVATE		1
+#define	SNP_PAGE_STATE_SHARED		2
+#define	SNP_PAGE_STATE_PSMASH		3
+#define	SNP_PAGE_STATE_UNSMASH		4
+
+#define GHCB_SNP_PAGE_STATE_CHANGE_RESP	0x0015
+#define		GHCB_SNP_PAGE_STATE_RESP_VAL(val)	((val) >> 32)
+
+/* Page State Change NAE event */
+#define SNP_PAGE_STATE_CHANGE_MAX_ENTRY		253
+struct __packed snp_page_state_header {
+	uint16_t cur_entry;
+	uint16_t end_entry;
+	uint32_t reserved;
+};
+
+struct __packed snp_page_state_entry {
+	uint64_t cur_page:12;
+	uint64_t gfn:40;
+	uint64_t operation:4;
+	uint64_t pagesize:1;
+	uint64_t reserved:7;
+};
+
+struct __packed snp_page_state_change {
+	struct snp_page_state_header header;
+	struct snp_page_state_entry entry[SNP_PAGE_STATE_CHANGE_MAX_ENTRY];
+};
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
 			      unsigned long *rflags)
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 554f75fe013c..751867aa432f 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -108,6 +108,7 @@
 #define SVM_VMGEXIT_AP_JUMP_TABLE		0x80000005
 #define SVM_VMGEXIT_SET_AP_JUMP_TABLE		0
 #define SVM_VMGEXIT_GET_AP_JUMP_TABLE		1
+#define SVM_VMGEXIT_PAGE_STATE_CHANGE		0x80000010
 #define SVM_VMGEXIT_UNSUPPORTED_EVENT		0x8000ffff
 
 #define SVM_EXIT_ERR           -1
-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC Part1 PATCH 05/13] X86/sev-es: move few helper functions in common file
  2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
                   ` (3 preceding siblings ...)
  2021-03-24 16:44 ` [RFC Part1 PATCH 04/13] x86/sev-snp: define page state change VMGEXIT structure Brijesh Singh
@ 2021-03-24 16:44 ` Brijesh Singh
  2021-04-02 19:27   ` Borislav Petkov
  2021-03-24 16:44 ` [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB Brijesh Singh
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

Move the sev_es_terminate() and sev_es_{wr,rd}_ghcb_msr() helper functions
into a common file so that they can be used by both SEV-ES and SEV-SNP.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/boot/compressed/sev-common.c | 32 +++++++++++++++++++++++++++
 arch/x86/boot/compressed/sev-es.c     | 22 ++----------------
 arch/x86/kernel/sev-common-shared.c   | 31 ++++++++++++++++++++++++++
 arch/x86/kernel/sev-es-shared.c       | 21 +++---------------
 4 files changed, 68 insertions(+), 38 deletions(-)
 create mode 100644 arch/x86/boot/compressed/sev-common.c
 create mode 100644 arch/x86/kernel/sev-common-shared.c

diff --git a/arch/x86/boot/compressed/sev-common.c b/arch/x86/boot/compressed/sev-common.c
new file mode 100644
index 000000000000..d81ff7a3a67d
--- /dev/null
+++ b/arch/x86/boot/compressed/sev-common.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD Encrypted Register State Support
+ *
+ * Author: Brijesh Singh <brijesh.singh@amd.com>
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * This file is not compiled stand-alone. It is included directly into
+ * sev-es.c and sev-snp.c.
+ */
+
+static inline u64 sev_es_rd_ghcb_msr(void)
+{
+	unsigned long low, high;
+
+	asm volatile("rdmsr" : "=a" (low), "=d" (high) :
+			"c" (MSR_AMD64_SEV_ES_GHCB));
+
+	return ((high << 32) | low);
+}
+
+static inline void sev_es_wr_ghcb_msr(u64 val)
+{
+	u32 low, high;
+
+	low  = val & 0xffffffffUL;
+	high = val >> 32;
+
+	asm volatile("wrmsr" : : "c" (MSR_AMD64_SEV_ES_GHCB),
+			"a"(low), "d" (high) : "memory");
+}
diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
index 27826c265aab..58b15b7c1aa7 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -54,26 +54,8 @@ static unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx)
 	return 0UL;
 }
 
-static inline u64 sev_es_rd_ghcb_msr(void)
-{
-	unsigned long low, high;
-
-	asm volatile("rdmsr" : "=a" (low), "=d" (high) :
-			"c" (MSR_AMD64_SEV_ES_GHCB));
-
-	return ((high << 32) | low);
-}
-
-static inline void sev_es_wr_ghcb_msr(u64 val)
-{
-	u32 low, high;
-
-	low  = val & 0xffffffffUL;
-	high = val >> 32;
-
-	asm volatile("wrmsr" : : "c" (MSR_AMD64_SEV_ES_GHCB),
-			"a"(low), "d" (high) : "memory");
-}
+/* Provides sev_es_{wr,rd}_ghcb_msr() */
+#include "sev-common.c"
 
 static enum es_result vc_decode_insn(struct es_em_ctxt *ctxt)
 {
diff --git a/arch/x86/kernel/sev-common-shared.c b/arch/x86/kernel/sev-common-shared.c
new file mode 100644
index 000000000000..6229566add6f
--- /dev/null
+++ b/arch/x86/kernel/sev-common-shared.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD Encrypted Register State Support
+ *
+ * Author: Brijesh Singh <brijesh.singh@amd.com>
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * This file is not compiled stand-alone. It contains code shared
+ * between the pre-decompression boot code and the running Linux kernel
+ * and is included directly into both code-bases.
+ */
+
+static void sev_es_terminate(unsigned int reason)
+{
+	u64 val = GHCB_SEV_TERMINATE;
+
+	/*
+	 * Tell the hypervisor what went wrong - only reason-set 0 is
+	 * currently supported.
+	 */
+	val |= GHCB_SEV_TERMINATE_REASON(0, reason);
+
+	/* Request Guest Termination from Hypervisor */
+	sev_es_wr_ghcb_msr(val);
+	VMGEXIT();
+
+	while (true)
+		asm volatile("hlt\n" : : : "memory");
+}
+
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index cdc04d091242..669e15678387 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -14,6 +14,9 @@
 #define has_cpuflag(f)	boot_cpu_has(f)
 #endif
 
+/* Provides sev_es_terminate() */
+#include "sev-common-shared.c"
+
 static bool __init sev_es_check_cpu_features(void)
 {
 	if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -24,24 +27,6 @@ static bool __init sev_es_check_cpu_features(void)
 	return true;
 }
 
-static void sev_es_terminate(unsigned int reason)
-{
-	u64 val = GHCB_SEV_TERMINATE;
-
-	/*
-	 * Tell the hypervisor what went wrong - only reason-set 0 is
-	 * currently supported.
-	 */
-	val |= GHCB_SEV_TERMINATE_REASON(0, reason);
-
-	/* Request Guest Termination from Hypvervisor */
-	sev_es_wr_ghcb_msr(val);
-	VMGEXIT();
-
-	while (true)
-		asm volatile("hlt\n" : : : "memory");
-}
-
 static bool sev_es_negotiate_protocol(void)
 {
 	u64 val;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB
  2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
                   ` (4 preceding siblings ...)
  2021-03-24 16:44 ` [RFC Part1 PATCH 05/13] X86/sev-es: move few helper functions in common file Brijesh Singh
@ 2021-03-24 16:44 ` Brijesh Singh
  2021-04-06 10:33   ` Borislav Petkov
  2021-03-24 16:44 ` [RFC Part1 PATCH 07/13] x86/compressed: register GHCB memory when SNP is active Brijesh Singh
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

Many of the integrity guarantees of SEV-SNP are enforced through the
Reverse Map Table (RMP). Each RMP entry contains the GPA at which a
particular page of DRAM should be mapped. A VM can request that the
hypervisor add pages to the RMP table via the Page State Change VMGEXIT
defined in the GHCB specification sections 2.5.1 and 4.1.6. Inside each RMP
entry is a Validated flag; this flag is automatically cleared to 0 by the
CPU hardware when a new RMP entry is created for a guest. Each VM page
can be either validated or invalidated, as indicated by the Validated
flag in the RMP entry. A memory access to a private page that is not
validated generates a #VC exception. A VM can use the PVALIDATE instruction
to validate a private page before using it.

To maintain the security guarantees of SEV-SNP guests, when transitioning
memory from private to shared, the guest must invalidate the memory range
before asking the hypervisor to change the page state to shared in the RMP
table.

After the page is mapped private in the page table, the guest must issue a
page state change VMGEXIT to make the memory private in the RMP table and
validate it. If the memory is not validated after it is added to the RMP
table as private, a #VC exception (page-not-validated) will be raised. We do
not support the page-not-validated exception yet, so it will crash the guest.

On boot, the BIOS should have validated the entire system memory. During
the kernel decompression stage, the #VC handler uses set_memory_decrypted()
to make the GHCB page shared (i.e., clear the encryption attribute), and
while exiting the decompression stage, it calls set_memory_encrypted() to
make the page private again.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/boot/compressed/Makefile       |   1 +
 arch/x86/boot/compressed/ident_map_64.c |  18 ++++
 arch/x86/boot/compressed/sev-snp.c      | 115 ++++++++++++++++++++++++
 arch/x86/boot/compressed/sev-snp.h      |  25 ++++++
 4 files changed, 159 insertions(+)
 create mode 100644 arch/x86/boot/compressed/sev-snp.c
 create mode 100644 arch/x86/boot/compressed/sev-snp.h

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3fa..4d422aae8a86 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -93,6 +93,7 @@ ifdef CONFIG_X86_64
 	vmlinux-objs-y += $(obj)/mem_encrypt.o
 	vmlinux-objs-y += $(obj)/pgtable_64.o
 	vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-es.o
+	vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-snp.o
 endif
 
 vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index f7213d0943b8..0a420ce5550f 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -37,6 +37,8 @@
 #include <asm/setup.h>	/* For COMMAND_LINE_SIZE */
 #undef _SETUP
 
+#include "sev-snp.h"
+
 extern unsigned long get_cmd_line_ptr(void);
 
 /* Used by PAGE_KERN* macros: */
@@ -278,12 +280,28 @@ static int set_clr_page_flags(struct x86_mapping_info *info,
 	if ((set | clr) & _PAGE_ENC)
 		clflush_page(address);
 
+	/*
+	 * If the encryption attribute is being cleared, then change the page state
+	 * to shared in the RMP entry. The page state change must be done before
+	 * the PTE update.
+	 */
+	if (clr & _PAGE_ENC)
+		sev_snp_set_page_shared(pte_pfn(*ptep) << PAGE_SHIFT);
+
 	/* Update PTE */
 	pte = *ptep;
 	pte = pte_set_flags(pte, set);
 	pte = pte_clear_flags(pte, clr);
 	set_pte(ptep, pte);
 
+	/*
+	 * If the encryption attribute is being set, then change the page state to
+	 * private in the RMP entry. The page state change must be done after the
+	 * PTE is updated.
+	 */
+	if (set & _PAGE_ENC)
+		sev_snp_set_page_private(pte_pfn(*ptep) << PAGE_SHIFT);
+
 	/* Flush TLB after changing encryption attribute */
 	write_cr3(top_level_pgt);
 
diff --git a/arch/x86/boot/compressed/sev-snp.c b/arch/x86/boot/compressed/sev-snp.c
new file mode 100644
index 000000000000..5c25103b0df1
--- /dev/null
+++ b/arch/x86/boot/compressed/sev-snp.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD SEV SNP support
+ *
+ * Author: Brijesh Singh <brijesh.singh@amd.com>
+ *
+ */
+
+#include "misc.h"
+#include "error.h"
+
+#include <asm/msr-index.h>
+#include <asm/sev-snp.h>
+#include <asm/sev-es.h>
+
+#include "sev-snp.h"
+
+static bool sev_snp_enabled(void)
+{
+	unsigned long low, high;
+	u64 val;
+
+	asm volatile("rdmsr\n" : "=a" (low), "=d" (high) :
+			"c" (MSR_AMD64_SEV));
+
+	val = (high << 32) | low;
+
+	if (val & MSR_AMD64_SEV_SNP_ENABLED)
+		return true;
+
+	return false;
+}
+
+/* Provides sev_snp_{wr,rd}_ghcb_msr() */
+#include "sev-common.c"
+
+/* Provides sev_es_terminate() */
+#include "../../kernel/sev-common-shared.c"
+
+static void sev_snp_pages_state_change(unsigned long paddr, int op)
+{
+	u64 pfn = paddr >> PAGE_SHIFT;
+	u64 old, val;
+
+	/* save the old GHCB MSR */
+	old = sev_es_rd_ghcb_msr();
+
+	/* Issue VMGEXIT to change the page state */
+	sev_es_wr_ghcb_msr(GHCB_SNP_PAGE_STATE_REQ_GFN(pfn, op));
+	VMGEXIT();
+
+	/* Read the response of the VMGEXIT */
+	val = sev_es_rd_ghcb_msr();
+	if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SNP_PAGE_STATE_CHANGE_RESP) ||
+	    (GHCB_SNP_PAGE_STATE_RESP_VAL(val) != 0))
+		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+
+	/* Restore the GHCB MSR value */
+	sev_es_wr_ghcb_msr(old);
+}
+
+static void sev_snp_issue_pvalidate(unsigned long paddr, bool validate)
+{
+	unsigned long eflags;
+	int rc;
+
+	rc = __pvalidate(paddr, RMP_PG_SIZE_4K, validate, &eflags);
+	if (rc) {
+		error("Failed to validate address");
+		goto e_fail;
+	}
+
+	/* Check for the double validation and assert on failure */
+	if (eflags & X86_EFLAGS_CF) {
+		error("Double validation detected");
+		goto e_fail;
+	}
+
+	return;
+e_fail:
+	sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+}
+
+static void sev_snp_set_page_private_shared(unsigned long paddr, int op)
+{
+	if (!sev_snp_enabled())
+		return;
+
+	/*
+	 * When changing the page state from private to shared, invalidate the page
+	 * before making the page state change in the RMP table.
+	 */
+	if (op == SNP_PAGE_STATE_SHARED)
+		sev_snp_issue_pvalidate(paddr, false);
+
+	/* Request the page state change in the RMP table. */
+	sev_snp_pages_state_change(paddr, op);
+
+	/*
+	 * Now that the page is added in the RMP table as private memory, validate
+	 * the memory range so that it is consistent with the RMP entry.
+	 */
+	if (op == SNP_PAGE_STATE_PRIVATE)
+		sev_snp_issue_pvalidate(paddr, true);
+}
+
+void sev_snp_set_page_private(unsigned long paddr)
+{
+	sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_PRIVATE);
+}
+
+void sev_snp_set_page_shared(unsigned long paddr)
+{
+	sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_SHARED);
+}
diff --git a/arch/x86/boot/compressed/sev-snp.h b/arch/x86/boot/compressed/sev-snp.h
new file mode 100644
index 000000000000..12fe9581a255
--- /dev/null
+++ b/arch/x86/boot/compressed/sev-snp.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD SEV Secure Nested Paging Support
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh <brijesh.singh@amd.com>
+ */
+
+#ifndef __COMPRESSED_SECURE_NESTED_PAGING_H
+#define __COMPRESSED_SECURE_NESTED_PAGING_H
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+void sev_snp_set_page_private(unsigned long paddr);
+void sev_snp_set_page_shared(unsigned long paddr);
+
+#else
+
+static inline void sev_snp_set_page_private(unsigned long paddr) { }
+static inline void sev_snp_set_page_shared(unsigned long paddr) { }
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
+#endif /* __COMPRESSED_SECURE_NESTED_PAGING_H */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC Part1 PATCH 07/13] x86/compressed: register GHCB memory when SNP is active
  2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
                   ` (5 preceding siblings ...)
  2021-03-24 16:44 ` [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB Brijesh Singh
@ 2021-03-24 16:44 ` Brijesh Singh
  2021-04-07 11:59   ` Borislav Petkov
  2021-03-24 16:44 ` [RFC Part1 PATCH 08/13] x86/sev-es: register GHCB memory when SEV-SNP " Brijesh Singh
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

The SEV-SNP guest is required to perform GHCB GPA registration. This is
because the hypervisor may prefer that a guest use a consistent and/or
specific GPA for the GHCB associated with a vCPU. For more information,
see the GHCB specification section 2.5.2.

Working with a hypervisor-preferred GPA is not currently supported. If
the hypervisor cannot work with our provided GPA, then we terminate the
boot.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/boot/compressed/sev-es.c  |  4 ++++
 arch/x86/boot/compressed/sev-snp.c | 26 ++++++++++++++++++++++++++
 arch/x86/include/asm/sev-snp.h     | 11 +++++++++++
 3 files changed, 41 insertions(+)

diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
index 58b15b7c1aa7..c85d3d9ec57a 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -20,6 +20,7 @@
 #include <asm/fpu/xcr.h>
 #include <asm/ptrace.h>
 #include <asm/svm.h>
+#include <asm/sev-snp.h>
 
 #include "error.h"
 
@@ -118,6 +119,9 @@ static bool early_setup_sev_es(void)
 	/* Initialize lookup tables for the instruction decoder */
 	inat_init_tables();
 
+	/* SEV-SNP guest requires that the GHCB GPA be registered */
+	sev_snp_register_ghcb(__pa(&boot_ghcb_page));
+
 	return true;
 }
 
diff --git a/arch/x86/boot/compressed/sev-snp.c b/arch/x86/boot/compressed/sev-snp.c
index 5c25103b0df1..a4c5e85699a7 100644
--- a/arch/x86/boot/compressed/sev-snp.c
+++ b/arch/x86/boot/compressed/sev-snp.c
@@ -113,3 +113,29 @@ void sev_snp_set_page_shared(unsigned long paddr)
 {
 	sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_SHARED);
 }
+
+void sev_snp_register_ghcb(unsigned long paddr)
+{
+	u64 pfn = paddr >> PAGE_SHIFT;
+	u64 old, val;
+
+	if (!sev_snp_enabled())
+		return;
+
+	/* save the old GHCB MSR */
+	old = sev_es_rd_ghcb_msr();
+
+	/* Issue VMGEXIT */
+	sev_es_wr_ghcb_msr(GHCB_REGISTER_GPA_REQ_VAL(pfn));
+	VMGEXIT();
+
+	val = sev_es_rd_ghcb_msr();
+
+	/* If the response GPA is not ours then abort the guest */
+	if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_REGISTER_GPA_RESP) ||
+	    (GHCB_REGISTER_GPA_RESP_VAL(val) != pfn))
+		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+
+	/* Restore the GHCB MSR value */
+	sev_es_wr_ghcb_msr(old);
+}
diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
index f514dad276f2..0523eb21abd7 100644
--- a/arch/x86/include/asm/sev-snp.h
+++ b/arch/x86/include/asm/sev-snp.h
@@ -56,6 +56,13 @@ struct __packed snp_page_state_change {
 	struct snp_page_state_entry entry[SNP_PAGE_STATE_CHANGE_MAX_ENTRY];
 };
 
+/* GHCB GPA register */
+#define GHCB_REGISTER_GPA_REQ	0x012UL
+#define		GHCB_REGISTER_GPA_REQ_VAL(v)		(GHCB_REGISTER_GPA_REQ | ((v) << 12))
+
+#define GHCB_REGISTER_GPA_RESP	0x013UL
+#define		GHCB_REGISTER_GPA_RESP_VAL(val)		((val) >> 12)
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
 			      unsigned long *rflags)
@@ -73,6 +80,8 @@ static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
 	return rc;
 }
 
+void sev_snp_register_ghcb(unsigned long paddr);
+
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
 
 static inline int __pvalidate(unsigned long vaddr, int psize, int validate, unsigned long *eflags)
@@ -80,6 +89,8 @@ static inline int __pvalidate(unsigned long vaddr, int psize, int validate, unsi
 	return 0;
 }
 
+static inline void sev_snp_register_ghcb(unsigned long paddr) { }
+
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
 #endif	/* __ASSEMBLY__ */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC Part1 PATCH 08/13] x86/sev-es: register GHCB memory when SEV-SNP is active
  2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
                   ` (6 preceding siblings ...)
  2021-03-24 16:44 ` [RFC Part1 PATCH 07/13] x86/compressed: register GHCB memory when SNP is active Brijesh Singh
@ 2021-03-24 16:44 ` Brijesh Singh
  2021-04-08  8:38   ` Borislav Petkov
  2021-03-24 16:44 ` [RFC Part1 PATCH 09/13] x86/kernel: add support to validate memory in early enc attribute change Brijesh Singh
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

The SEV-SNP guest is required to perform GHCB GPA registration. This is
because the hypervisor may prefer that a guest use a consistent and/or
specific GPA for the GHCB associated with a vCPU. For more information,
see the GHCB specification section 2.5.2.

During boot, init_ghcb() allocates a per-CPU GHCB page. On the very first
#VC exception, the exception handler switches to using the per-CPU GHCB
page allocated during init_ghcb(). The GHCB page must be registered in
the current vCPU context.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kernel/Makefile  |  3 ++
 arch/x86/kernel/sev-es.c  | 19 +++++++++++++
 arch/x86/kernel/sev-snp.c | 58 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+)
 create mode 100644 arch/x86/kernel/sev-snp.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 5eeb808eb024..2fb24c49d2e3 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -21,6 +21,7 @@ CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
 CFLAGS_REMOVE_sev-es.o = -pg
+CFLAGS_REMOVE_sev-snp.o = -pg
 endif
 
 KASAN_SANITIZE_head$(BITS).o				:= n
@@ -29,6 +30,7 @@ KASAN_SANITIZE_dumpstack_$(BITS).o			:= n
 KASAN_SANITIZE_stacktrace.o				:= n
 KASAN_SANITIZE_paravirt.o				:= n
 KASAN_SANITIZE_sev-es.o					:= n
+KASAN_SANITIZE_sev-snp.o				:= n
 
 # With some compiler versions the generated code results in boot hangs, caused
 # by several compilation units. To be safe, disable all instrumentation.
@@ -151,6 +153,7 @@ obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)		+= sev-es.o
+obj-$(CONFIG_AMD_MEM_ENCRYPT)		+= sev-snp.o
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 0bd1a0fc587e..004bf1102dc1 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -23,6 +23,7 @@
 #include <asm/cpu_entry_area.h>
 #include <asm/stacktrace.h>
 #include <asm/sev-es.h>
+#include <asm/sev-snp.h>
 #include <asm/insn-eval.h>
 #include <asm/fpu/internal.h>
 #include <asm/processor.h>
@@ -88,6 +89,13 @@ struct sev_es_runtime_data {
 	 * is currently unsupported in SEV-ES guests.
 	 */
 	unsigned long dr7;
+
+	/*
+	 * SEV-SNP requires that the GHCB must be registered before using it.
+	 * The flag below indicates whether the GHCB is registered. If it is
+	 * not registered, then sev_es_get_ghcb() will perform the registration.
+	 */
+	bool ghcb_registered;
 };
 
 struct ghcb_state {
@@ -196,6 +204,12 @@ static __always_inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state)
 		data->ghcb_active = true;
 	}
 
+	/* SEV-SNP guest requires that the GHCB be registered before using it. */
+	if (sev_snp_active() && !data->ghcb_registered) {
+		sev_snp_register_ghcb(__pa(ghcb));
+		data->ghcb_registered = true;
+	}
+
 	return ghcb;
 }
 
@@ -569,6 +583,10 @@ static bool __init sev_es_setup_ghcb(void)
 	/* Alright - Make the boot-ghcb public */
 	boot_ghcb = &boot_ghcb_page;
 
+	/* SEV-SNP guest requires that the GHCB GPA be registered */
+	if (sev_snp_active())
+		sev_snp_register_ghcb(__pa(&boot_ghcb_page));
+
 	return true;
 }
 
@@ -658,6 +676,7 @@ static void __init init_ghcb(int cpu)
 
 	data->ghcb_active = false;
 	data->backup_ghcb_active = false;
+	data->ghcb_registered = false;
 }
 
 void __init sev_es_init_vc_handling(void)
diff --git a/arch/x86/kernel/sev-snp.c b/arch/x86/kernel/sev-snp.c
new file mode 100644
index 000000000000..d32225c2b653
--- /dev/null
+++ b/arch/x86/kernel/sev-snp.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2021 Advanced Micro Devices
+ *
+ * Author: Brijesh Singh <brijesh.singh@amd.com>
+ */
+
+#define pr_fmt(fmt)	"SEV-SNP: " fmt
+
+#include <linux/mem_encrypt.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+
+#include <asm/sev-es.h>
+#include <asm/sev-snp.h>
+
+static inline u64 sev_es_rd_ghcb_msr(void)
+{
+	return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
+}
+
+static inline void sev_es_wr_ghcb_msr(u64 val)
+{
+	u32 low, high;
+
+	low  = (u32)(val);
+	high = (u32)(val >> 32);
+
+	native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
+}
+
+/* Provides sev_es_terminate() */
+#include "sev-common-shared.c"
+
+void sev_snp_register_ghcb(unsigned long paddr)
+{
+	u64 pfn = paddr >> PAGE_SHIFT;
+	u64 old, val;
+
+	/* save the old GHCB MSR */
+	old = sev_es_rd_ghcb_msr();
+
+	/* Issue VMGEXIT */
+	sev_es_wr_ghcb_msr(GHCB_REGISTER_GPA_REQ_VAL(pfn));
+	VMGEXIT();
+
+	val = sev_es_rd_ghcb_msr();
+
+	/* If the response GPA is not ours then abort the guest */
+	if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_REGISTER_GPA_RESP) ||
+	    (GHCB_REGISTER_GPA_RESP_VAL(val) != pfn))
+		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+
+	/* Restore the GHCB MSR value */
+	sev_es_wr_ghcb_msr(old);
+}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC Part1 PATCH 09/13] x86/kernel: add support to validate memory in early enc attribute change
  2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
                   ` (7 preceding siblings ...)
  2021-03-24 16:44 ` [RFC Part1 PATCH 08/13] x86/sev-es: register GHCB memory when SEV-SNP " Brijesh Singh
@ 2021-03-24 16:44 ` Brijesh Singh
  2021-04-08 11:40   ` Borislav Petkov
  2021-03-24 16:44 ` [RFC Part1 PATCH 10/13] X86: kernel: make the bss.decrypted section shared in RMP table Brijesh Singh
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

The early_set_memory_{encrypt,decrypt}() helpers are used to change a
page from decrypted (shared) to encrypted (private) and vice versa.
When SEV-SNP is active, the page state transition needs to go through
additional steps.

If the page is transitioned from shared to private, then perform the
following after the encryption attribute is set in the page table:

1. Issue the page state change VMGEXIT to add the page as private
   in the RMP table.
2. Validate the page after it is successfully added in the RMP table.

To maintain the security guarantees, if the page is transitioned from
private to shared, then perform the following before clearing the
encryption attribute from the page table.

1. Invalidate the page.
2. Issue the page state change VMGEXIT to make the page shared in the
   RMP table.

The early_set_memory_{encrypt,decrypt}() helpers can be called before the
full GHCB is set up, so use the SNP page state MSR protocol VMGEXIT,
defined in GHCB specification section 2.3.1, to request the page state
change in the RMP table.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev-snp.h |  20 +++++++
 arch/x86/kernel/sev-snp.c      | 105 +++++++++++++++++++++++++++++++++
 arch/x86/mm/mem_encrypt.c      |  40 ++++++++++++-
 3 files changed, 163 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
index 0523eb21abd7..c4b096206062 100644
--- a/arch/x86/include/asm/sev-snp.h
+++ b/arch/x86/include/asm/sev-snp.h
@@ -63,6 +63,10 @@ struct __packed snp_page_state_change {
 #define GHCB_REGISTER_GPA_RESP	0x013UL
 #define		GHCB_REGISTER_GPA_RESP_VAL(val)		((val) >> 12)
 
+/* Macro to convert the x86 page level to the RMP level and vice versa */
+#define X86_RMP_PG_LEVEL(level)	(((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
+#define RMP_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
 			      unsigned long *rflags)
@@ -82,6 +86,11 @@ static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
 
 void sev_snp_register_ghcb(unsigned long paddr);
 
+void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
+		unsigned int npages);
+void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+		unsigned int npages);
+
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
 
 static inline int __pvalidate(unsigned long vaddr, int psize, int validate, unsigned long *eflags)
@@ -91,6 +100,17 @@ static inline int __pvalidate(unsigned long vaddr, int psize, int validate, unsi
 
 static inline void sev_snp_register_ghcb(unsigned long paddr) { }
 
+static inline void __init
+early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages)
+{
+}
+
+static inline void __init
+early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages)
+{
+}
+
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
 #endif	/* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/sev-snp.c b/arch/x86/kernel/sev-snp.c
index d32225c2b653..ff9b35bfb05c 100644
--- a/arch/x86/kernel/sev-snp.c
+++ b/arch/x86/kernel/sev-snp.c
@@ -56,3 +56,108 @@ void sev_snp_register_ghcb(unsigned long paddr)
 	/* Restore the GHCB MSR value */
 	sev_es_wr_ghcb_msr(old);
 }
+
+static void sev_snp_issue_pvalidate(unsigned long vaddr, unsigned int npages, bool validate)
+{
+	unsigned long eflags, vaddr_end, vaddr_next;
+	int rc;
+
+	vaddr = vaddr & PAGE_MASK;
+	vaddr_end = vaddr + (npages << PAGE_SHIFT);
+
+	for (; vaddr < vaddr_end; vaddr = vaddr_next) {
+		rc = __pvalidate(vaddr, RMP_PG_SIZE_4K, validate, &eflags);
+
+		if (rc) {
+			pr_err("Failed to validate address 0x%lx ret %d\n", vaddr, rc);
+			goto e_fail;
+		}
+
+		/* Check for the double validation condition */
+		if (eflags & X86_EFLAGS_CF) {
+			pr_err("Double %salidation detected (address 0x%lx)\n",
+					validate ? "v" : "inv", vaddr);
+			goto e_fail;
+		}
+
+		vaddr_next = vaddr + PAGE_SIZE;
+	}
+
+	return;
+
+e_fail:
+	/* Dump stack for debugging purposes */
+	dump_stack();
+
+	/* Ask to terminate the guest */
+	sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+}
+
+static void __init early_snp_set_page_state(unsigned long paddr, unsigned int npages, int op)
+{
+	unsigned long paddr_end, paddr_next;
+	u64 old, val;
+
+	paddr = paddr & PAGE_MASK;
+	paddr_end = paddr + (npages << PAGE_SHIFT);
+
+	/* save the old GHCB MSR */
+	old = sev_es_rd_ghcb_msr();
+
+	for (; paddr < paddr_end; paddr = paddr_next) {
+
+		/*
+		 * Use the MSR protocol VMGEXIT to request the page state change.
+		 * The MSR protocol is used because in early boot we may not have
+		 * the full GHCB set up yet.
+		 */
+		sev_es_wr_ghcb_msr(GHCB_SNP_PAGE_STATE_REQ_GFN(paddr >> PAGE_SHIFT, op));
+		VMGEXIT();
+
+		val = sev_es_rd_ghcb_msr();
+
+		/* Read the response, if the page state change failed then terminate the guest. */
+		if (GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SNP_PAGE_STATE_CHANGE_RESP)
+			sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+
+		if (GHCB_SNP_PAGE_STATE_RESP_VAL(val) != 0) {
+			pr_err("Failed to change page state to '%s' paddr 0x%lx error 0x%llx\n",
+					op == SNP_PAGE_STATE_PRIVATE ? "private" : "shared",
+					paddr, GHCB_SNP_PAGE_STATE_RESP_VAL(val));
+
+			/* Dump stack for debugging purposes */
+			dump_stack();
+
+			/* Ask to terminate the guest */
+			sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+		}
+
+		paddr_next = paddr + PAGE_SIZE;
+	}
+
+	/* Restore the GHCB MSR value */
+	sev_es_wr_ghcb_msr(old);
+}
+
+void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
+					 unsigned int npages)
+{
+	/* Ask the hypervisor to add the memory in the RMP table as private. */
+	early_snp_set_page_state(paddr, npages, SNP_PAGE_STATE_PRIVATE);
+
+	/* Validate the memory region after it is added in the RMP table. */
+	sev_snp_issue_pvalidate(vaddr, npages, true);
+}
+
+void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+					unsigned int npages)
+{
+	/*
+	 * We are changing the memory from private to shared; invalidate the memory region
+	 * before making it shared in the RMP table.
+	 */
+	sev_snp_issue_pvalidate(vaddr, npages, false);
+
+	/* Ask the hypervisor to make the memory shared in the RMP table. */
+	early_snp_set_page_state(paddr, npages, SNP_PAGE_STATE_SHARED);
+}
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 5bd50008fc9a..35af2f21b8f1 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -29,6 +29,7 @@
 #include <asm/processor-flags.h>
 #include <asm/msr.h>
 #include <asm/cmdline.h>
+#include <asm/sev-snp.h>
 
 #include "mm_internal.h"
 
@@ -49,6 +50,27 @@ bool sev_enabled __section(".data");
 /* Buffer used for early in-place encryption by BSP, no locking needed */
 static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
 
+/*
+ * When SNP is active, this routine changes the page state from private to shared before
+ * copying the data from the source to the destination, and restores it after the copy.
+ * This is required because the source address is mapped as decrypted by the caller.
+ */
+static inline void __init snp_aware_memcpy(void *dst, void *src, size_t sz,
+					   unsigned long paddr, bool dec)
+{
+	unsigned long npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+
+	/* If the paddr needs to be accessed decrypted, make the page shared before the memcpy. */
+	if (sev_snp_active() && dec)
+		early_snp_set_memory_shared((unsigned long)__va(paddr), paddr, npages);
+
+	memcpy(dst, src, sz);
+
+	/* Restore the page state after the memcpy. */
+	if (sev_snp_active() && dec)
+		early_snp_set_memory_private((unsigned long)__va(paddr), paddr, npages);
+}
+
 /*
  * This routine does not change the underlying encryption setting of the
  * page(s) that map this memory. It assumes that eventually the memory is
@@ -97,8 +119,8 @@ static void __init __sme_early_enc_dec(resource_size_t paddr,
 		 * Use a temporary buffer, of cache-line multiple size, to
 		 * avoid data corruption as documented in the APM.
 		 */
-		memcpy(sme_early_buffer, src, len);
-		memcpy(dst, sme_early_buffer, len);
+		snp_aware_memcpy(sme_early_buffer, src, len, paddr, enc);
+		snp_aware_memcpy(dst, sme_early_buffer, len, paddr, !enc);
 
 		early_memunmap(dst, len);
 		early_memunmap(src, len);
@@ -278,9 +300,23 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
 	else
 		sme_early_decrypt(pa, size);
 
+	/*
+	 * If SEV-SNP is active, rescind validation of the page in the RMP entry before encryption
+	 * attribute is changed from C=1 to C=0.
+	 */
+	if (sev_snp_active() && !enc)
+		early_snp_set_memory_shared((unsigned long)__va(pa), pa, 1);
+
 	/* Change the page encryption mask. */
 	new_pte = pfn_pte(pfn, new_prot);
 	set_pte_atomic(kpte, new_pte);
+
+	/*
+	 * If SEV-SNP is active, validate the page in the RMP entry after encryption attribute is
+	 * changed from C=0 to C=1.
+	 */
+	if (sev_snp_active() && enc)
+		early_snp_set_memory_private((unsigned long)__va(pa), pa, 1);
 }
 
 static int __init early_set_memory_enc_dec(unsigned long vaddr,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC Part1 PATCH 10/13] X86: kernel: make the bss.decrypted section shared in RMP table
  2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
                   ` (8 preceding siblings ...)
  2021-03-24 16:44 ` [RFC Part1 PATCH 09/13] x86/kernel: add support to validate memory in early enc attribute change Brijesh Singh
@ 2021-03-24 16:44 ` Brijesh Singh
  2021-03-24 16:44 ` [RFC Part1 PATCH 11/13] x86/kernel: validate rom memory before accessing when SEV-SNP is active Brijesh Singh
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

The encryption attribute for the bss.decrypted region is cleared in the
initial page table build. This is because the section contains data
that needs to be shared between the guest and the hypervisor.

When SEV-SNP is active, just clearing the encryption attribute in the
page table is not enough. We also need to make the page shared in the
RMP table.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kernel/head64.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 5e9beb77cafd..1bf005d38ebc 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -40,6 +40,7 @@
 #include <asm/extable.h>
 #include <asm/trapnr.h>
 #include <asm/sev-es.h>
+#include <asm/sev-snp.h>
 
 /*
  * Manage page tables very early on.
@@ -288,6 +289,19 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	if (mem_encrypt_active()) {
 		vaddr = (unsigned long)__start_bss_decrypted;
 		vaddr_end = (unsigned long)__end_bss_decrypted;
+
+		/*
+		 * The bss.decrypted region is mapped decrypted in the initial page table.
+		 * If SEV-SNP is active then transition the page to shared in the RMP table
+		 * so that it is consistent with the page table attribute change below.
+		 */
+		if (sev_snp_active()) {
+			unsigned long npages;
+
+			npages = PAGE_ALIGN(vaddr_end - vaddr) >> PAGE_SHIFT;
+			early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), npages);
+		}
+
 		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
 			i = pmd_index(vaddr);
 			pmd[i] -= sme_get_me_mask();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC Part1 PATCH 11/13] x86/kernel: validate rom memory before accessing when SEV-SNP is active
  2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
                   ` (9 preceding siblings ...)
  2021-03-24 16:44 ` [RFC Part1 PATCH 10/13] X86: kernel: make the bss.decrypted section shared in RMP table Brijesh Singh
@ 2021-03-24 16:44 ` Brijesh Singh
  2021-04-09 16:53   ` Borislav Petkov
  2021-03-24 16:44 ` [RFC Part1 PATCH 12/13] x86/sev-es: make GHCB get and put helper accessible outside Brijesh Singh
  2021-03-24 16:44 ` [RFC Part1 PATCH 13/13] x86/kernel: add support to validate memory when changing C-bit Brijesh Singh
  12 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

The probe_roms() function accesses the memory range (0xc0000 - 0x100000)
to probe various ROMs. The memory range is not part of the E820 system
RAM range. The memory range is mapped as private (i.e., encrypted) in the
page table.

When SEV-SNP is active, all private memory must be validated before it is
accessed. The ROM range is not part of the E820 map, so the guest BIOS
did not validate it. An access to unvalidated memory will cause a #VC
exception, and the #VC exception handler is not yet ready to validate
memory on-demand. Validate the ROM memory region before it is accessed.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kernel/probe_roms.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c
index 9e1def3744f2..65640b401b9c 100644
--- a/arch/x86/kernel/probe_roms.c
+++ b/arch/x86/kernel/probe_roms.c
@@ -21,6 +21,8 @@
 #include <asm/sections.h>
 #include <asm/io.h>
 #include <asm/setup_arch.h>
+#include <asm/mem_encrypt.h>
+#include <asm/sev-snp.h>
 
 static struct resource system_rom_resource = {
 	.name	= "System ROM",
@@ -202,6 +204,19 @@ void __init probe_roms(void)
 	unsigned char c;
 	int i;
 
+	/*
+	 * The ROM memory is not part of the E820 system RAM and is not prevalidated by the BIOS.
+	 * The kernel page table maps the ROM region as encrypted memory, and SEV-SNP requires
+	 * that all encrypted memory be validated before it is accessed.
+	 */
+	if (sev_snp_active()) {
+		unsigned long n, paddr;
+
+		n = ((system_rom_resource.end + 1) - video_rom_resource.start) >> PAGE_SHIFT;
+		paddr = video_rom_resource.start;
+		early_snp_set_memory_private((unsigned long)__va(paddr), paddr, n);
+	}
+
 	/* video rom */
 	upper = adapter_rom_resources[0].start;
 	for (start = video_rom_resource.start; start < upper; start += 2048) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC Part1 PATCH 12/13] x86/sev-es: make GHCB get and put helper accessible outside
  2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
                   ` (10 preceding siblings ...)
  2021-03-24 16:44 ` [RFC Part1 PATCH 11/13] x86/kernel: validate rom memory before accessing when SEV-SNP is active Brijesh Singh
@ 2021-03-24 16:44 ` Brijesh Singh
  2021-03-24 16:44 ` [RFC Part1 PATCH 13/13] x86/kernel: add support to validate memory when changing C-bit Brijesh Singh
  12 siblings, 0 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

The SEV-SNP support extends the GHCB specification with a few SNP-specific
VMGEXITs. Those VMGEXITs will be implemented in sev-snp.c. Make the GHCB
get/put helpers available outside sev-es.c so that the SNP VMGEXITs can
avoid duplicating the GHCB get/put logic.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev-es.h | 9 +++++++++
 arch/x86/kernel/sev-es.c      | 8 ++------
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index cf1d957c7091..33838a8f8495 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -81,6 +81,10 @@ extern void vc_no_ghcb(void);
 extern void vc_boot_ghcb(void);
 extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 
+struct ghcb_state {
+	struct ghcb *ghcb;
+};
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 extern struct static_key_false sev_es_enable_key;
 extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -103,12 +107,17 @@ static __always_inline void sev_es_nmi_complete(void)
 		__sev_es_nmi_complete();
 }
 extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
+extern struct ghcb *sev_es_get_ghcb(struct ghcb_state *state);
+extern void sev_es_put_ghcb(struct ghcb_state *state);
+
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
 static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { return 0; }
 static inline void sev_es_nmi_complete(void) { }
 static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
+static inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state) { return NULL; }
+static inline void sev_es_put_ghcb(struct ghcb_state *state) { }
 #endif
 
 #endif
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 004bf1102dc1..d4957b3fc43f 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -98,10 +98,6 @@ struct sev_es_runtime_data {
 	bool ghcb_registered;
 };
 
-struct ghcb_state {
-	struct ghcb *ghcb;
-};
-
 static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
 DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
 
@@ -178,7 +174,7 @@ void noinstr __sev_es_ist_exit(void)
 	this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
 }
 
-static __always_inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state)
+struct ghcb *sev_es_get_ghcb(struct ghcb_state *state)
 {
 	struct sev_es_runtime_data *data;
 	struct ghcb *ghcb;
@@ -213,7 +209,7 @@ static __always_inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state)
 	return ghcb;
 }
 
-static __always_inline void sev_es_put_ghcb(struct ghcb_state *state)
+void sev_es_put_ghcb(struct ghcb_state *state)
 {
 	struct sev_es_runtime_data *data;
 	struct ghcb *ghcb;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC Part1 PATCH 13/13] x86/kernel: add support to validate memory when changing C-bit
  2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
                   ` (11 preceding siblings ...)
  2021-03-24 16:44 ` [RFC Part1 PATCH 12/13] x86/sev-es: make GHCB get and put helper accessible outside Brijesh Singh
@ 2021-03-24 16:44 ` Brijesh Singh
  2021-04-12 11:49   ` Borislav Petkov
  12 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-03-24 16:44 UTC (permalink / raw)
  To: linux-kernel, x86, kvm
  Cc: ak, Brijesh Singh, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

The set_memory_{encrypted,decrypted}() functions are used for changing
pages from decrypted (shared) to encrypted (private) and vice versa.
When SEV-SNP is active, the page state transition needs to go through
additional steps.

If the page is transitioned from shared to private, then perform the
following after the encryption attribute is set in the page table:

1. Issue the page state change VMGEXIT to add the memory region in
   the RMP table.
2. Validate the memory region after the RMP entry is added.

To maintain the security guarantees, if the page is transitioned from
private to shared, then perform the following before the encryption
attribute is removed from the page table:

1. Invalidate the page.
2. Issue the page state change VMGEXIT to remove the page from the RMP table.

To change the page state in the RMP table, use the Page State Change
VMGEXIT defined in the GHCB spec section 4.1.6.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/include/asm/sev-es.h  |   2 +
 arch/x86/include/asm/sev-snp.h |   4 ++
 arch/x86/kernel/sev-es.c       |   7 +++
 arch/x86/kernel/sev-snp.c      | 106 +++++++++++++++++++++++++++++++++
 arch/x86/mm/pat/set_memory.c   |  19 ++++++
 5 files changed, 138 insertions(+)

diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index 33838a8f8495..8715e41e2c8f 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -109,6 +109,7 @@ static __always_inline void sev_es_nmi_complete(void)
 extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
 extern struct ghcb *sev_es_get_ghcb(struct ghcb_state *state);
 extern void sev_es_put_ghcb(struct ghcb_state *state);
+extern int vmgexit_page_state_change(struct ghcb *ghcb, void *data);
 
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
@@ -118,6 +119,7 @@ static inline void sev_es_nmi_complete(void) { }
 static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
 static inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state) { return NULL; }
 static inline void sev_es_put_ghcb(struct ghcb_state *state) { }
+static inline int vmgexit_page_state_change(struct ghcb *ghcb, void *data) { return 0; }
 #endif
 
 #endif
diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
index c4b096206062..59b57a5f6524 100644
--- a/arch/x86/include/asm/sev-snp.h
+++ b/arch/x86/include/asm/sev-snp.h
@@ -90,6 +90,8 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
 		unsigned int npages);
 void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
 		unsigned int npages);
+int snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
+int snp_set_memory_private(unsigned long vaddr, unsigned int npages);
 
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
 
@@ -110,6 +112,8 @@ early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned i
 {
 	return 0;
 }
+static inline int snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { return 0; }
+static inline int snp_set_memory_private(unsigned long vaddr, unsigned int npages) { return 0; }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index d4957b3fc43f..7309be685440 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -586,6 +586,13 @@ static bool __init sev_es_setup_ghcb(void)
 	return true;
 }
 
+int vmgexit_page_state_change(struct ghcb *ghcb, void *data)
+{
+	ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
+
+	return sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_PAGE_STATE_CHANGE, 0, 0);
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 static void sev_es_ap_hlt_loop(void)
 {
diff --git a/arch/x86/kernel/sev-snp.c b/arch/x86/kernel/sev-snp.c
index ff9b35bfb05c..d236089c0739 100644
--- a/arch/x86/kernel/sev-snp.c
+++ b/arch/x86/kernel/sev-snp.c
@@ -15,6 +15,7 @@
 
 #include <asm/sev-es.h>
 #include <asm/sev-snp.h>
+#include <asm/svm.h>
 
 static inline u64 sev_es_rd_ghcb_msr(void)
 {
@@ -161,3 +162,108 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
 	 /* Ask hypervisor to make the memory shared in the RMP table. */
 	early_snp_set_page_state(paddr, npages, SNP_PAGE_STATE_SHARED);
 }
+
+static int snp_page_state_vmgexit(struct ghcb *ghcb, struct snp_page_state_change *data)
+{
+	struct snp_page_state_header *hdr;
+	int ret = 0;
+
+	hdr = &data->header;
+
+	/*
+	 * The hypervisor can return before processing all the entries, so the loop below
+	 * retries until all the entries are processed.
+	 */
+	while (hdr->cur_entry <= hdr->end_entry) {
+		ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
+		ret = vmgexit_page_state_change(ghcb, data);
+		/* Page State Change VMGEXIT can pass error code through exit_info_2. */
+		if (ret || ghcb->save.sw_exit_info_2)
+			break;
+	}
+
+	return ret;
+}
+
+static void snp_set_page_state(unsigned long paddr, unsigned int npages, int op)
+{
+	unsigned long paddr_end, paddr_next;
+	struct snp_page_state_change *data;
+	struct snp_page_state_header *hdr;
+	struct snp_page_state_entry *e;
+	struct ghcb_state state;
+	struct ghcb *ghcb;
+	int ret, idx;
+
+	paddr = paddr & PAGE_MASK;
+	paddr_end = paddr + (npages << PAGE_SHIFT);
+
+	ghcb = sev_es_get_ghcb(&state);
+
+	data = (struct snp_page_state_change *)ghcb->shared_buffer;
+	hdr = &data->header;
+	e = &(data->entry[0]);
+	memset(data, 0, sizeof (*data));
+
+	for (idx = 0; paddr < paddr_end; paddr = paddr_next) {
+		int level = PG_LEVEL_4K;
+
+		/* If we cannot fit any more requests, issue the VMGEXIT before going further. */
+		if (hdr->end_entry == (SNP_PAGE_STATE_CHANGE_MAX_ENTRY - 1)) {
+			ret = snp_page_state_vmgexit(ghcb, data);
+			if (ret)
+				goto e_fail;
+
+			idx = 0;
+			memset(data, 0, sizeof (*data));
+			e = &(data->entry[0]);
+		}
+
+		hdr->end_entry = idx;
+		e->gfn = paddr >> PAGE_SHIFT;
+		e->operation = op;
+		e->pagesize = X86_RMP_PG_LEVEL(level);
+		e++;
+		idx++;
+		paddr_next = paddr + page_level_size(level);
+	}
+
+	/*
+	 * The loop above can exit before issuing the VMGEXIT for the last batch of
+	 * entries; if it did, issue the VMGEXIT now.
+	 */
+	if (idx)
+		ret = snp_page_state_vmgexit(ghcb, data);
+
+	sev_es_put_ghcb(&state);
+	return;
+
+e_fail:
+	/* Dump stack for the debugging purpose */
+	dump_stack();
+
+	/* Ask to terminate the guest */
+	sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+}
+
+int snp_set_memory_shared(unsigned long vaddr, unsigned int npages)
+{
+	/* Invalidate the memory before changing the page state in the RMP table. */
+	sev_snp_issue_pvalidate(vaddr, npages, false);
+
+	/* Change the page state in the RMP table. */
+	snp_set_page_state(__pa(vaddr), npages, SNP_PAGE_STATE_SHARED);
+
+	return 0;
+}
+
+int snp_set_memory_private(unsigned long vaddr, unsigned int npages)
+{
+	/* Change the page state in the RMP table. */
+	snp_set_page_state(__pa(vaddr), npages, SNP_PAGE_STATE_PRIVATE);
+
+	/* Validate the memory after the memory is made private in the RMP table. */
+	sev_snp_issue_pvalidate(vaddr, npages, true);
+
+	return 0;
+}
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 16f878c26667..19ee18ddbc37 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -27,6 +27,8 @@
 #include <asm/proto.h>
 #include <asm/memtype.h>
 #include <asm/set_memory.h>
+#include <asm/mem_encrypt.h>
+#include <asm/sev-snp.h>
 
 #include "../mm_internal.h"
 
@@ -2001,8 +2003,25 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 	 */
 	cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
 
+	/*
+	 * To maintain the security guarantees of the SEV-SNP guest, invalidate the memory before
+	 * clearing the encryption attribute.
+	 */
+	if (sev_snp_active() && !enc) {
+		ret = snp_set_memory_shared(addr, numpages);
+		if (ret)
+			return ret;
+	}
+
 	ret = __change_page_attr_set_clr(&cpa, 1);
 
+	/*
+	 * Now that memory is mapped encrypted in the page table, validate the memory range before
+	 * we return from here.
+	 */
+	if (!ret && sev_snp_active() && enc)
+		ret = snp_set_memory_private(addr, numpages);
+
 	/*
 	 * After changing the encryption attribute, we need to flush TLBs again
 	 * in case any speculative TLB caching occurred (but no need to flush
-- 
2.17.1


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 01/13] x86/cpufeatures: Add SEV-SNP CPU feature
  2021-03-24 16:44 ` [RFC Part1 PATCH 01/13] x86/cpufeatures: Add SEV-SNP CPU feature Brijesh Singh
@ 2021-03-25 10:54   ` Borislav Petkov
  2021-03-25 14:50     ` Brijesh Singh
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-03-25 10:54 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Wed, Mar 24, 2021 at 11:44:12AM -0500, Brijesh Singh wrote:
> Add CPU feature detection for Secure Encrypted Virtualization with
> Secure Nested Paging. This feature adds a strong memory integrity
> protection to help prevent malicious hypervisor-based attacks like
> data replay, memory re-mapping, and more.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Joerg Roedel <jroedel@suse.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/include/asm/cpufeatures.h | 1 +
>  arch/x86/kernel/cpu/amd.c          | 3 ++-
>  arch/x86/kernel/cpu/scattered.c    | 1 +
>  3 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 84b887825f12..a5b369f10bcd 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -238,6 +238,7 @@
>  #define X86_FEATURE_VMW_VMMCALL		( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
>  #define X86_FEATURE_SEV_ES		( 8*32+20) /* AMD Secure Encrypted Virtualization - Encrypted State */
>  #define X86_FEATURE_VM_PAGE_FLUSH	( 8*32+21) /* "" VM Page Flush MSR is supported */
> +#define X86_FEATURE_SEV_SNP		( 8*32+22) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */

That leaf got a separate word now: word 19.

For the future: pls redo your patches against tip/master because it has
the latest state of affairs in tip-land.

>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
>  #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index f8ca66f3d861..39f7a4b5b04c 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -586,7 +586,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>  	 *	      If BIOS has not enabled SME then don't advertise the
>  	 *	      SME feature (set in scattered.c).
>  	 *   For SEV: If BIOS has not enabled SEV then don't advertise the
> -	 *            SEV and SEV_ES feature (set in scattered.c).
> +	 *            SEV, SEV_ES and SEV_SNP feature (set in scattered.c).

So you can remove the "scattered.c" references in the comments here.

>  	 *
>  	 *   In all cases, since support for SME and SEV requires long mode,
>  	 *   don't advertise the feature under CONFIG_X86_32.
> @@ -618,6 +618,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>  clear_sev:
>  		setup_clear_cpu_cap(X86_FEATURE_SEV);
>  		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
> +		setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>  	}
>  }
>  
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 236924930bf0..eaec1278dc2e 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>  	{ X86_FEATURE_SEV_ES,		CPUID_EAX,  3, 0x8000001f, 0 },
>  	{ X86_FEATURE_SME_COHERENT,	CPUID_EAX, 10, 0x8000001f, 0 },
>  	{ X86_FEATURE_VM_PAGE_FLUSH,	CPUID_EAX,  2, 0x8000001f, 0 },
> +	{ X86_FEATURE_SEV_SNP,		CPUID_EAX,  4, 0x8000001f, 0 },
>  	{ 0, 0, 0, 0, 0 }
>  };

And this too.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 01/13] x86/cpufeatures: Add SEV-SNP CPU feature
  2021-03-25 10:54   ` Borislav Petkov
@ 2021-03-25 14:50     ` Brijesh Singh
  2021-03-25 16:29       ` Borislav Petkov
  0 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-03-25 14:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 3/25/21 5:54 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:12AM -0500, Brijesh Singh wrote:
>> Add CPU feature detection for Secure Encrypted Virtualization with
>> Secure Nested Paging. This feature adds a strong memory integrity
>> protection to help prevent malicious hypervisor-based attacks like
>> data replay, memory re-mapping, and more.
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Joerg Roedel <jroedel@suse.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Tony Luck <tony.luck@intel.com>
>> Cc: Dave Hansen <dave.hansen@intel.com>
>> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Sean Christopherson <seanjc@google.com>
>> Cc: x86@kernel.org
>> Cc: kvm@vger.kernel.org
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  arch/x86/include/asm/cpufeatures.h | 1 +
>>  arch/x86/kernel/cpu/amd.c          | 3 ++-
>>  arch/x86/kernel/cpu/scattered.c    | 1 +
>>  3 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index 84b887825f12..a5b369f10bcd 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -238,6 +238,7 @@
>>  #define X86_FEATURE_VMW_VMMCALL		( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
>>  #define X86_FEATURE_SEV_ES		( 8*32+20) /* AMD Secure Encrypted Virtualization - Encrypted State */
>>  #define X86_FEATURE_VM_PAGE_FLUSH	( 8*32+21) /* "" VM Page Flush MSR is supported */
>> +#define X86_FEATURE_SEV_SNP		( 8*32+22) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
> That leaf got a separate word now: word 19.
>
> For the future: pls redo your patches against tip/master because it has
> the latest state of affairs in tip-land.


For the early feedback, I was trying to find one tree which could be
used for building both the guest and the hypervisor at once. In the
future, I will submit part 1 against tip/master and part 2 against
kvm/master. Thanks.


>
>>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
>>  #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
>> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
>> index f8ca66f3d861..39f7a4b5b04c 100644
>> --- a/arch/x86/kernel/cpu/amd.c
>> +++ b/arch/x86/kernel/cpu/amd.c
>> @@ -586,7 +586,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>>  	 *	      If BIOS has not enabled SME then don't advertise the
>>  	 *	      SME feature (set in scattered.c).
>>  	 *   For SEV: If BIOS has not enabled SEV then don't advertise the
>> -	 *            SEV and SEV_ES feature (set in scattered.c).
>> +	 *            SEV, SEV_ES and SEV_SNP feature (set in scattered.c).
> So you can remove the "scattered.c" references in the comments here.
>
>>  	 *
>>  	 *   In all cases, since support for SME and SEV requires long mode,
>>  	 *   don't advertise the feature under CONFIG_X86_32.
>> @@ -618,6 +618,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>>  clear_sev:
>>  		setup_clear_cpu_cap(X86_FEATURE_SEV);
>>  		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
>> +		setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>>  	}
>>  }
>>  
>> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
>> index 236924930bf0..eaec1278dc2e 100644
>> --- a/arch/x86/kernel/cpu/scattered.c
>> +++ b/arch/x86/kernel/cpu/scattered.c
>> @@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>>  	{ X86_FEATURE_SEV_ES,		CPUID_EAX,  3, 0x8000001f, 0 },
>>  	{ X86_FEATURE_SME_COHERENT,	CPUID_EAX, 10, 0x8000001f, 0 },
>>  	{ X86_FEATURE_VM_PAGE_FLUSH,	CPUID_EAX,  2, 0x8000001f, 0 },
>> +	{ X86_FEATURE_SEV_SNP,		CPUID_EAX,  4, 0x8000001f, 0 },
>>  	{ 0, 0, 0, 0, 0 }
>>  };
> And this too.
>
> Thx.
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 01/13] x86/cpufeatures: Add SEV-SNP CPU feature
  2021-03-25 14:50     ` Brijesh Singh
@ 2021-03-25 16:29       ` Borislav Petkov
  0 siblings, 0 replies; 52+ messages in thread
From: Borislav Petkov @ 2021-03-25 16:29 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Thu, Mar 25, 2021 at 09:50:20AM -0500, Brijesh Singh wrote:
> For the early feedback I was trying to find one tree which can be used
> for building both the guest and hypervisor at once. In future, I will
> submit the part-1 against the tip/master and part-2 against the
> kvm/master. thanks

Then I think you could base on top of current linux-next because it has
both trees. I presume test-applying the patches on our trees then should
work. I think...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction
  2021-03-24 16:44 ` [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction Brijesh Singh
@ 2021-03-26 14:30   ` Borislav Petkov
  2021-03-26 15:42     ` Brijesh Singh
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-03-26 14:30 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Wed, Mar 24, 2021 at 11:44:14AM -0500, Brijesh Singh wrote:
>  arch/x86/include/asm/sev-snp.h | 52 ++++++++++++++++++++++++++++++++++

Hmm, a separate header.

Yeah, I know we did sev-es.h but I think it all should be in a single
sev.h which contains all AMD-specific memory encryption declarations.
It's not like it is going to be huge or so, by the looks of how big
sev-es.h is.

Or is there a particular need to have a separate snp header?

If not, please do a pre-patch which renames sev-es.h to sev.h and then
add the SNP stuff to it.

>  1 file changed, 52 insertions(+)
>  create mode 100644 arch/x86/include/asm/sev-snp.h
> 
> diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
> new file mode 100644
> index 000000000000..5a6d1367cab7
> --- /dev/null
> +++ b/arch/x86/include/asm/sev-snp.h
> @@ -0,0 +1,52 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * AMD SEV Secure Nested Paging Support
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Brijesh Singh <brijesh.singh@amd.com>
> + */
> +
> +#ifndef __ASM_SECURE_NESTED_PAGING_H
> +#define __ASM_SECURE_NESTED_PAGING_H
> +
> +#ifndef __ASSEMBLY__
> +#include <asm/irqflags.h> /* native_save_fl() */

Where is that used? Looks like leftovers.

> +
> +/* Return code of __pvalidate */
> +#define PVALIDATE_SUCCESS		0
> +#define PVALIDATE_FAIL_INPUT		1
> +#define PVALIDATE_FAIL_SIZEMISMATCH	6
> +
> +/* RMP page size */
> +#define RMP_PG_SIZE_2M			1
> +#define RMP_PG_SIZE_4K			0
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,

Why the "__" prefix?

> +			      unsigned long *rflags)
> +{
> +	unsigned long flags;
> +	int rc;
> +
> +	asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
> +		     "pushf; pop %0\n\t"

Ewww, PUSHF is expensive.

> +		     : "=rm"(flags), "=a"(rc)
> +		     : "a"(vaddr), "c"(rmp_psize), "d"(validate)
> +		     : "memory", "cc");
> +
> +	*rflags = flags;
> +	return rc;

Hmm, rc *and* rflags. Manual says "Upon completion, a return code is
stored in EAX. rFLAGS bits OF, ZF, AF, PF and SF are set based on this
return code."

So what exactly does that mean and is the return code duplicated in
rFLAGS?

If so, can you return a single value which has everything you need to
know?

I see that you're using the retval only for the carry flag to check
whether the page has already been validated so I think you could define
a set of return value defines from that function which callers can
check.

And looking above again, you do have PVALIDATE_* defines except that
nothing's using them. Use them please.

Also, for how to do condition code checks properly, see how the
CC_SET/CC_OUT macros are used.

> +}
> +
> +#else	/* !CONFIG_AMD_MEM_ENCRYPT */

This else-ifdeffery can go too if you move the ifdeffery inside the
function:

static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate)
{
	int rc = 0;

#ifdef CONFIG_AMD_MEM_ENCRYPT

	...

#endif

	return rc;
}

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction
  2021-03-26 14:30   ` Borislav Petkov
@ 2021-03-26 15:42     ` Brijesh Singh
  2021-03-26 18:22       ` Brijesh Singh
  2021-03-26 19:22       ` Borislav Petkov
  0 siblings, 2 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-03-26 15:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 3/26/21 9:30 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:14AM -0500, Brijesh Singh wrote:
>>  arch/x86/include/asm/sev-snp.h | 52 ++++++++++++++++++++++++++++++++++
> Hmm, a separate header.
>
> Yeah, I know we did sev-es.h but I think it all should be in a single
> sev.h which contains all AMD-specific memory encryption declarations.
> It's not like it is going to be huge or so, by the looks of how big
> sev-es.h is.
>
> Or is there a particular need to have a separate snp header?
>
> If not, please do a pre-patch which renames sev-es.h to sev.h and then
> add the SNP stuff to it.


There is no strong reason for a separate sev-snp.h. I will add a
pre-patch to rename sev-es.h to sev.h and add SNP stuff to it.


>
>>  1 file changed, 52 insertions(+)
>>  create mode 100644 arch/x86/include/asm/sev-snp.h
>>
>> diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
>> new file mode 100644
>> index 000000000000..5a6d1367cab7
>> --- /dev/null
>> +++ b/arch/x86/include/asm/sev-snp.h
>> @@ -0,0 +1,52 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * AMD SEV Secure Nested Paging Support
>> + *
>> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
>> + *
>> + * Author: Brijesh Singh <brijesh.singh@amd.com>
>> + */
>> +
>> +#ifndef __ASM_SECURE_NESTED_PAGING_H
>> +#define __ASM_SECURE_NESTED_PAGING_H
>> +
>> +#ifndef __ASSEMBLY__
>> +#include <asm/irqflags.h> /* native_save_fl() */
> Where is that used? Looks like leftovers.


Initially I was thinking of using native_save_fl() to read rFlags, but
then realized that rFlags could change between the PVALIDATE
instruction and the call to native_save_fl(). I will remove this header
inclusion. Thank you for pointing it out.

>
>> +
>> +/* Return code of __pvalidate */
>> +#define PVALIDATE_SUCCESS		0
>> +#define PVALIDATE_FAIL_INPUT		1
>> +#define PVALIDATE_FAIL_SIZEMISMATCH	6
>> +
>> +/* RMP page size */
>> +#define RMP_PG_SIZE_2M			1
>> +#define RMP_PG_SIZE_4K			0
>> +
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
> Why the "__" prefix?

I was trying to adhere to existing functions which use a direct
instruction opcode. Most of those functions have a "__" prefix (e.g.
__mwait, __tpause, ...).

Should I drop the "__" prefix?

 

>
>> +			      unsigned long *rflags)
>> +{
>> +	unsigned long flags;
>> +	int rc;
>> +
>> +	asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
>> +		     "pushf; pop %0\n\t"
> Ewww, PUSHF is expensive.
>
>> +		     : "=rm"(flags), "=a"(rc)
>> +		     : "a"(vaddr), "c"(rmp_psize), "d"(validate)
>> +		     : "memory", "cc");
>> +
>> +	*rflags = flags;
>> +	return rc;
> Hmm, rc *and* rflags. Manual says "Upon completion, a return code is
> stored in EAX. rFLAGS bits OF, ZF, AF, PF and SF are set based on this
> return code."
>
> So what exactly does that mean and is the return code duplicated in
> rFLAGS?


It's not a duplicate error code. EAX returns the actual error code;
rFlags contains additional information. We want both available to the
caller so that it can make a proper decision.

e.g.

1. A caller validates address 0x1000. The instruction validates it and
returns success.

2. The caller asks to validate the same address again. The instruction
will return success, but since the address was validated before,
rFlags.CF will be set to indicate that the PVALIDATE instruction did
not make any change in the RMP table.

> If so, can you return a single value which has everything you need to
> know?
>
> I see that you're using the retval only for the carry flag to check
> whether the page has already been validated so I think you could define
> a set of return value defines from that function which callers can
> check.


You are correct that currently I am using only the carry flag. So far
we don't need the other flags. What do you think about something like
this:

* Add a new user-defined error code

 #define PVALIDATE_FAIL_NOUPDATE        255 /* Returned if rFlags.CF is
set */

* Remove the rflags parameter from __pvalidate()

* Update __pvalidate() to check rFlags.CF and, if set, return the new
user-defined error code.
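For illustration only, that proposal would give the helper roughly this shape (a sketch, not the posted patch; CC_SET()/CC_OUT() are the kernel's condition-code macros, and PVALIDATE_FAIL_NOUPDATE is the hypothetical new define from the list above):

```c
/* Sketch: fold rFlags.CF into the return value so callers never have
 * to look at rFLAGS themselves (hypothetical next-version helper). */
static inline int pvalidate(unsigned long vaddr, int rmp_psize, int validate)
{
	bool no_rmpupdate;
	int rc;

	/* PVALIDATE encoded as raw bytes, as in the posted patch */
	asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
		     CC_SET(c)
		     : CC_OUT(c) (no_rmpupdate), "=a" (rc)
		     : "a" (vaddr), "c" (rmp_psize), "d" (validate)
		     : "memory", "cc");

	if (no_rmpupdate)
		return PVALIDATE_FAIL_NOUPDATE;	/* rFlags.CF was set */

	return rc;
}
```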


> And looking above again, you do have PVALIDATE_* defines except that
> nothing's using them. Use them please.

Actually, the later patches do make use of the error codes (e.g.
SIZEMISMATCH). The caller should check the error code and can take
action to resolve it; e.g. if a size mismatch is seen, it can retry
the validation at the lower page level.
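As an illustration of that recovery path (hypothetical caller code; only the names come from the quoted patch, PMD_SIZE/PAGE_SIZE being the usual kernel constants):

```c
/* Validate a 2M region in one shot; on SIZEMISMATCH, fall back to
 * validating its constituent 4K pages (illustrative only). */
rc = __pvalidate(vaddr, RMP_PG_SIZE_2M, 1, &rflags);
if (rc == PVALIDATE_FAIL_SIZEMISMATCH) {
	unsigned long addr;

	for (addr = vaddr; addr < vaddr + PMD_SIZE; addr += PAGE_SIZE) {
		rc = __pvalidate(addr, RMP_PG_SIZE_4K, 1, &rflags);
		if (rc)
			break;
	}
}
```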


>
> Also, for how to do condition code checks properly, see how the
> CC_SET/CC_OUT macros are used.


I will look into it. Thanks.


>> +}
>> +
>> +#else	/* !CONFIG_AMD_MEM_ENCRYPT */
> This else-ifdeffery can go too if you move the ifdeffery inside the
> function:
>
> static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
> 			      unsigned long *rflags)
> {
> 	int rc = 0;
>
> #ifdef CONFIG_AMD_MEM_ENCRYPT
>
> 	...
>
> #endif
>
> 	return rc;
> }


Noted. Thanks.


>
> Thx.
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction
  2021-03-26 15:42     ` Brijesh Singh
@ 2021-03-26 18:22       ` Brijesh Singh
  2021-03-26 19:12         ` Borislav Petkov
  2021-03-26 19:22       ` Borislav Petkov
  1 sibling, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-03-26 18:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 3/26/21 10:42 AM, Brijesh Singh wrote:
> On 3/26/21 9:30 AM, Borislav Petkov wrote:
>> On Wed, Mar 24, 2021 at 11:44:14AM -0500, Brijesh Singh wrote:
>>>  arch/x86/include/asm/sev-snp.h | 52 ++++++++++++++++++++++++++++++++++
>> Hmm, a separate header.
>>
>> Yeah, I know we did sev-es.h but I think it all should be in a single
>> sev.h which contains all AMD-specific memory encryption declarations.
>> It's not like it is going to be huge or so, by the looks of how big
>> sev-es.h is.
>>
>> Or is there a particular need to have a separate snp header?
>>
>> If not, please do a pre-patch which renames sev-es.h to sev.h and then
>> add the SNP stuff to it.
>
> There is no strong reason for a separate sev-snp.h. I will add a
> pre-patch to rename sev-es.h to sev.h and add SNP stuff to it.


Should I do the same for sev-es.c? Currently, I am keeping all the
SEV-SNP specific changes in sev-snp.{c,h}. After renaming
sev-es.{c,h} in both arch/x86/kernel and arch/x86/boot/compressed,
I can add the SNP specific stuff to it.

Thoughts?

>
>>>  1 file changed, 52 insertions(+)
>>>  create mode 100644 arch/x86/include/asm/sev-snp.h
>>>
>>> diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
>>> new file mode 100644
>>> index 000000000000..5a6d1367cab7
>>> --- /dev/null
>>> +++ b/arch/x86/include/asm/sev-snp.h
>>> @@ -0,0 +1,52 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +/*
>>> + * AMD SEV Secure Nested Paging Support
>>> + *
>>> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
>>> + *
>>> + * Author: Brijesh Singh <brijesh.singh@amd.com>
>>> + */
>>> +
>>> +#ifndef __ASM_SECURE_NESTED_PAGING_H
>>> +#define __ASM_SECURE_NESTED_PAGING_H
>>> +
>>> +#ifndef __ASSEMBLY__
>>> +#include <asm/irqflags.h> /* native_save_fl() */
>> Where is that used? Looks like leftovers.
>
> Initially I was thinking of using native_save_fl() to read rFlags, but
> then realized that rFlags could change between the PVALIDATE
> instruction and the call to native_save_fl(). I will remove this header
> inclusion. Thank you for pointing it out.
>
>>> +
>>> +/* Return code of __pvalidate */
>>> +#define PVALIDATE_SUCCESS		0
>>> +#define PVALIDATE_FAIL_INPUT		1
>>> +#define PVALIDATE_FAIL_SIZEMISMATCH	6
>>> +
>>> +/* RMP page size */
>>> +#define RMP_PG_SIZE_2M			1
>>> +#define RMP_PG_SIZE_4K			0
>>> +
>>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>>> +static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
>> Why the "__" prefix?
> I was trying to adhere to existing functions which use a direct
> instruction opcode. Most of those functions have a "__" prefix (e.g.
> __mwait, __tpause, ...).
>
> Should I drop the "__" prefix?
>
>  
>
>>> +			      unsigned long *rflags)
>>> +{
>>> +	unsigned long flags;
>>> +	int rc;
>>> +
>>> +	asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
>>> +		     "pushf; pop %0\n\t"
>> Ewww, PUSHF is expensive.
>>
>>> +		     : "=rm"(flags), "=a"(rc)
>>> +		     : "a"(vaddr), "c"(rmp_psize), "d"(validate)
>>> +		     : "memory", "cc");
>>> +
>>> +	*rflags = flags;
>>> +	return rc;
>> Hmm, rc *and* rflags. Manual says "Upon completion, a return code is
>> stored in EAX. rFLAGS bits OF, ZF, AF, PF and SF are set based on this
>> return code."
>>
>> So what exactly does that mean and is the return code duplicated in
>> rFLAGS?
>
> It's not a duplicate error code. EAX returns the actual error code;
> rFlags contains additional information. We want both available to the
> caller so that it can make a proper decision.
>
> e.g.
>
> 1. A caller validates address 0x1000. The instruction validates it and
> returns success.
>
> 2. The caller asks to validate the same address again. The instruction
> will return success, but since the address was validated before,
> rFlags.CF will be set to indicate that the PVALIDATE instruction did
> not make any change in the RMP table.
>
>> If so, can you return a single value which has everything you need to
>> know?
>>
>> I see that you're using the retval only for the carry flag to check
>> whether the page has already been validated so I think you could define
>> a set of return value defines from that function which callers can
>> check.
>
> You are correct that currently I am using only the carry flag. So far
> we don't need the other flags. What do you think about something like
> this:
>
> * Add a new user-defined error code
>
>  #define PVALIDATE_FAIL_NOUPDATE        255 /* Returned if rFlags.CF is
> set */
>
> * Remove the rflags parameter from __pvalidate()
>
> * Update __pvalidate() to check rFlags.CF and, if set, return the new
> user-defined error code.
>
>
>> And looking above again, you do have PVALIDATE_* defines except that
>> nothing's using them. Use them please.
> Actually, the later patches do make use of the error codes (e.g.
> SIZEMISMATCH). The caller should check the error code and can take
> action to resolve it; e.g. if a size mismatch is seen, it can retry
> the validation at the lower page level.
>
>
>> Also, for how to do condition code checks properly, see how the
>> CC_SET/CC_OUT macros are used.
>
> I will look into it. thanks.
>
>
>>> +}
>>> +
>>> +#else	/* !CONFIG_AMD_MEM_ENCRYPT */
>> This else-ifdeffery can go too if you move the ifdeffery inside the
>> function:
>>
>> static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
>> 			      unsigned long *rflags)
>> {
>> 	int rc = 0;
>>
>> #ifdef CONFIG_AMD_MEM_ENCRYPT
>>
>> 	...
>>
>> #endif
>>
>> 	return rc;
>> }
>
> Noted. thanks
>
>
>> Thx.
>>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction
  2021-03-26 18:22       ` Brijesh Singh
@ 2021-03-26 19:12         ` Borislav Petkov
  2021-03-26 20:04           ` Brijesh Singh
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-03-26 19:12 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Fri, Mar 26, 2021 at 01:22:24PM -0500, Brijesh Singh wrote:
> Should I do the same for sev-es.c? Currently, I am keeping all the
> SEV-SNP specific changes in sev-snp.{c,h}. After renaming
> sev-es.{c,h} in both arch/x86/kernel and arch/x86/boot/compressed,
> I can add the SNP specific stuff to it.
> 
> Thoughts ?

SNP depends on the whole functionality in SEV-ES, right? Which means,
SNP will need all the functionality of sev-es.c.

But sev-es.c is a lot more code than the header and snp is

 arch/x86/kernel/sev-snp.c               | 269 ++++++++++++++++++++++++

oh well, not so much.

I guess a single

arch/x86/kernel/sev.c

is probably ok.

We can always do arch/x86/kernel/sev/ later and split stuff then when it
starts getting real fat and impacts compilation times.

Btw, there's also arch/x86/kernel/sev-es-shared.c and that can be

arch/x86/kernel/sev-shared.c

then.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction
  2021-03-26 15:42     ` Brijesh Singh
  2021-03-26 18:22       ` Brijesh Singh
@ 2021-03-26 19:22       ` Borislav Petkov
  2021-03-26 20:01         ` Brijesh Singh
  1 sibling, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-03-26 19:22 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Fri, Mar 26, 2021 at 10:42:56AM -0500, Brijesh Singh wrote:
> There is no strong reason for a separate sev-snp.h. I will add a
> pre-patch to rename sev-es.h to sev.h and add SNP stuff to it.

Thx.

> I was trying to adhere to existing functions which use a direct
> instruction opcode.

That's not really always the case:

arch/x86/include/asm/special_insns.h

The "__" prefixed things should mean lower abstraction level helpers and
we drop the ball on those sometimes.

> It's not a duplicate error code. EAX returns the actual error code;
> rFlags contains additional information. We want both available to the
> caller so that it can make a proper decision.
> 
> e.g.
> 
> 1. A caller validates address 0x1000. The instruction validates it and
> returns success.

Your function returns PVALIDATE_SUCCESS.

> 2. The caller asks to validate the same address again. The instruction
> will return success, but since the address was validated before,
> rFlags.CF will be set to indicate that the PVALIDATE instruction did
> not make any change in the RMP table.

Your function returns PVALIDATE_VALIDATED_ALREADY or so.

> You are correct that currently I am using only the carry flag. So far
> we don't need the other flags. What do you think about something like
> this:
>
> * Add a new user-defined error code
>
>  #define PVALIDATE_FAIL_NOUPDATE        255 /* Returned if rFlags.CF is
> set */

Or that.

> * Remove the rflags parameter from __pvalidate()

Yes, it seems unnecessary at the moment. And I/O function arguments are
always yuck.

> * Update __pvalidate() to check rFlags.CF and, if set, return the new
> user-defined error code.

Yap, you can convert all that to pvalidate() return values, methinks,
and then make that function simpler for callers because they should
not have to deal with rFLAGS - only return values to denote what the
function did.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction
  2021-03-26 19:22       ` Borislav Petkov
@ 2021-03-26 20:01         ` Brijesh Singh
  0 siblings, 0 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-03-26 20:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 3/26/21 2:22 PM, Borislav Petkov wrote:
> On Fri, Mar 26, 2021 at 10:42:56AM -0500, Brijesh Singh wrote:
>> There is no strong reason for a separate sev-snp.h. I will add a
>> pre-patch to rename sev-es.h to sev.h and add SNP stuff to it.
> Thx.
>
>> I was trying to adhere to existing functions which use a direct
>> instruction opcode.
> That's not really always the case:
>
> arch/x86/include/asm/special_insns.h
>
> The "__" prefixed things should mean lower abstraction level helpers and
> we drop the ball on those sometimes.
>
>> It's not a duplicate error code. EAX returns the actual error code;
>> rFlags contains additional information. We want both available to the
>> caller so that it can make a proper decision.
>>
>> e.g.
>>
>> 1. A caller validates address 0x1000. The instruction validates it and
>> returns success.
> Your function returns PVALIDATE_SUCCESS.
>
>> 2. The caller asks to validate the same address again. The instruction
>> will return success, but since the address was validated before,
>> rFlags.CF will be set to indicate that the PVALIDATE instruction did
>> not make any change in the RMP table.
> Your function returns PVALIDATE_VALIDATED_ALREADY or so.
>
>> You are correct that currently I am using only the carry flag. So far
>> we don't need the other flags. What do you think about something like
>> this:
>>
>> * Add a new user-defined error code
>>
>>  #define PVALIDATE_FAIL_NOUPDATE        255 /* Returned if rFlags.CF is
>> set */
> Or that.
>
>> * Remove the rflags parameter from __pvalidate()
> Yes, it seems unnecessary at the moment. And I/O function arguments are
> always yuck.
>
>> * Update __pvalidate() to check rFlags.CF and, if set, return the new
>> user-defined error code.
> Yap, you can convert all that to pvalidate() return values, methinks,
> and then make that function simpler for callers because they should
> not have to deal with rFLAGS - only return values to denote what the
> function did.


Ack. I will make the required changes in the next version.

>
> Thx.
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction
  2021-03-26 19:12         ` Borislav Petkov
@ 2021-03-26 20:04           ` Brijesh Singh
  0 siblings, 0 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-03-26 20:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 3/26/21 2:12 PM, Borislav Petkov wrote:
> On Fri, Mar 26, 2021 at 01:22:24PM -0500, Brijesh Singh wrote:
>> Should I do the same for sev-es.c? Currently, I am keeping all the
>> SEV-SNP specific changes in sev-snp.{c,h}. After renaming
>> sev-es.{c,h} in both arch/x86/kernel and arch/x86/boot/compressed,
>> I can add the SNP specific stuff to it.
>>
>> Thoughts ?
> SNP depends on the whole functionality in SEV-ES, right? Which means,
> SNP will need all the functionality of sev-es.c.

Yes, SEV-SNP needs the whole SEV-ES functionality. I will add a
pre-patch to rename sev-es to sev, then add the SNP changes to sev.c.

thanks

>
> But sev-es.c is a lot more code than the header and snp is
>
>  arch/x86/kernel/sev-snp.c               | 269 ++++++++++++++++++++++++
>
> oh well, not so much.
>
> I guess a single
>
> arch/x86/kernel/sev.c
>
> is probably ok.
>
> We can always do arch/x86/kernel/sev/ later and split stuff then when it
> starts getting real fat and impacts compilation times.
>
> Btw, there's also arch/x86/kernel/sev-es-shared.c and that can be
>
> arch/x86/kernel/sev-shared.c
>
> then.
>
> Thx.
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 04/13] x86/sev-snp: define page state change VMGEXIT structure
  2021-03-24 16:44 ` [RFC Part1 PATCH 04/13] x86/sev-snp: define page state change VMGEXIT structure Brijesh Singh
@ 2021-04-01 10:32   ` Borislav Petkov
  2021-04-01 14:11     ` Brijesh Singh
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-04-01 10:32 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Wed, Mar 24, 2021 at 11:44:15AM -0500, Brijesh Singh wrote:
> An SNP-active guest will use the page state change VNAE MGEXIT defined in

I guess this was supposed to mean "NAE VMGEXIT" but pls write "NAE" out
at least once so that reader can find its way around the spec.

> the GHCB specification section 4.1.6 to ask the hypervisor to make the
> guest page private or shared in the RMP table. In addition to the
> private/shared, the guest can also ask the hypervisor to split or
> combine multiple 4K validated pages as a single 2M page or vice versa.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Joerg Roedel <jroedel@suse.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/include/asm/sev-snp.h  | 34 +++++++++++++++++++++++++++++++++
>  arch/x86/include/uapi/asm/svm.h |  1 +
>  2 files changed, 35 insertions(+)
> 
> diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
> index 5a6d1367cab7..f514dad276f2 100644
> --- a/arch/x86/include/asm/sev-snp.h
> +++ b/arch/x86/include/asm/sev-snp.h
> @@ -22,6 +22,40 @@
>  #define RMP_PG_SIZE_2M			1
>  #define RMP_PG_SIZE_4K			0
>  
> +/* Page State Change MSR Protocol */
> +#define GHCB_SNP_PAGE_STATE_CHANGE_REQ	0x0014
> +#define		GHCB_SNP_PAGE_STATE_REQ_GFN(v, o)	(GHCB_SNP_PAGE_STATE_CHANGE_REQ | \
> +							 ((unsigned long)((o) & 0xf) << 52) | \
> +							 (((v) << 12) & 0xffffffffffffff))

This macro needs to be more readable and I'm not sure the masking is
correct. IOW, something like this perhaps:

#define GHCB_SNP_PAGE_STATE_REQ_GFN(va, operation)	\
	((((operation) & 0xf) << 52) | ((va) & GENMASK_ULL(51, 12)) | GHCB_SNP_PAGE_STATE_CHANGE_REQ)

where you have each GHCBData element at the proper place, msb to lsb.
Now, GHCB spec says:

	"GHCBData[51:12] – Guest physical frame number"

and I'm not clear as to what this macro takes: a virtual address or a
pfn. If it is a pfn, then you need to do:

	(((pfn) << 12) & GENMASK_ULL(51, 0))

but if it is a virtual address you need to do what I have above. Since
you do "<< 12" then it must be a pfn already but then you should call
the argument "pfn" so that it is clear what it takes.

Your mask above covers [55:0] but [55:52] is the page operation so the
high bit in that mask needs to be 51.

AFAICT, ofc.

> +#define	SNP_PAGE_STATE_PRIVATE		1
> +#define	SNP_PAGE_STATE_SHARED		2
> +#define	SNP_PAGE_STATE_PSMASH		3
> +#define	SNP_PAGE_STATE_UNSMASH		4
> +
> +#define GHCB_SNP_PAGE_STATE_CHANGE_RESP	0x0015
> +#define		GHCB_SNP_PAGE_STATE_RESP_VAL(val)	((val) >> 32)
	  ^^^^^^^^^^^^

Some stray tabs here and above, pls pay attention to vertical alignment too.

> +
> +/* Page State Change NAE event */
> +#define SNP_PAGE_STATE_CHANGE_MAX_ENTRY		253
> +struct __packed snp_page_state_header {
> +	uint16_t cur_entry;
> +	uint16_t end_entry;
> +	uint32_t reserved;
> +};
> +
> +struct __packed snp_page_state_entry {
> +	uint64_t cur_page:12;
> +	uint64_t gfn:40;
> +	uint64_t operation:4;
> +	uint64_t pagesize:1;
> +	uint64_t reserved:7;
> +};

Any particular reason for the uint<width>_t types or can you use our
shorter u<width> types?

> +
> +struct __packed snp_page_state_change {
> +	struct snp_page_state_header header;
> +	struct snp_page_state_entry entry[SNP_PAGE_STATE_CHANGE_MAX_ENTRY];

Also, looking further into the patchset, I wonder if it would make sense
to do:

s/PAGE_STATE_CHANGE/PSC/g

to avoid breaking lines of huge statements:

+	if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SNP_PAGE_STATE_CHANGE_RESP) ||
+	    (GHCB_SNP_PAGE_STATE_RESP_VAL(val) != 0))
+		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);

and turn them into something more readable

+	if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SNP_PSC_RESP) ||
+	    (GHCB_SNP_PSC_RESP_VAL(val)))
+		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);

Also, you have GHCB_SEV and GHCB_SNP prefixes and I wonder whether we
even need them and have everything be prefixed simply GHCB_ because that
is the protocol, after all.

Which will then turn the above into:

+	if ((GHCB_RESP_CODE(val) != GHCB_PSC_RESP) || (GHCB_PSC_RESP_VAL(val)))
+		sev_es_terminate(GHCB_REASON_GENERAL_REQUEST);

Oh yeah baby! :-)

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 04/13] x86/sev-snp: define page state change VMGEXIT structure
  2021-04-01 10:32   ` Borislav Petkov
@ 2021-04-01 14:11     ` Brijesh Singh
  2021-04-02 15:44       ` Borislav Petkov
  0 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-04-01 14:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 4/1/21 5:32 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:15AM -0500, Brijesh Singh wrote:
>> An SNP-active guest will use the page state change VNAE MGEXIT defined in
> I guess this was supposed to mean "NAE VMGEXIT" but pls write "NAE" out
> at least once so that reader can find its way around the spec.


Noted. I will fix it in the next rev.

>> the GHCB specification section 4.1.6 to ask the hypervisor to make the
>> guest page private or shared in the RMP table. In addition to the
>> private/shared, the guest can also ask the hypervisor to split or
>> combine multiple 4K validated pages as a single 2M page or vice versa.
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Joerg Roedel <jroedel@suse.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Tony Luck <tony.luck@intel.com>
>> Cc: Dave Hansen <dave.hansen@intel.com>
>> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Sean Christopherson <seanjc@google.com>
>> Cc: x86@kernel.org
>> Cc: kvm@vger.kernel.org
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  arch/x86/include/asm/sev-snp.h  | 34 +++++++++++++++++++++++++++++++++
>>  arch/x86/include/uapi/asm/svm.h |  1 +
>>  2 files changed, 35 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
>> index 5a6d1367cab7..f514dad276f2 100644
>> --- a/arch/x86/include/asm/sev-snp.h
>> +++ b/arch/x86/include/asm/sev-snp.h
>> @@ -22,6 +22,40 @@
>>  #define RMP_PG_SIZE_2M			1
>>  #define RMP_PG_SIZE_4K			0
>>  
>> +/* Page State Change MSR Protocol */
>> +#define GHCB_SNP_PAGE_STATE_CHANGE_REQ	0x0014
>> +#define		GHCB_SNP_PAGE_STATE_REQ_GFN(v, o)	(GHCB_SNP_PAGE_STATE_CHANGE_REQ | \
>> +							 ((unsigned long)((o) & 0xf) << 52) | \
>> +							 (((v) << 12) & 0xffffffffffffff))
> This macro needs to be more readable and I'm not sure the masking is
> correct. IOW, something like this perhaps:
>
> #define GHCB_SNP_PAGE_STATE_REQ_GFN(va, operation)	\
> 	((((operation) & 0xf) << 52) | ((va) & GENMASK_ULL(51, 12)) | GHCB_SNP_PAGE_STATE_CHANGE_REQ)


I guess I was trying to keep it consistent with the sev-es.h macro
definitions, in which the command comes before the fields. In the next
version, I will use msb-to-lsb ordering.


>
> where you have each GHCBData element at the proper place, msb to lsb.
> Now, GHCB spec says:
>
> 	"GHCBData[51:12] – Guest physical frame number"
>
> and I'm not clear as to what this macro takes: a virtual address or a
> pfn. If it is a pfn, then you need to do:
>
> 	(((pfn) << 12) & GENMASK_ULL(51, 0))
>
> but if it is a virtual address you need to do what I have above. Since
> you do "<< 12" then it must be a pfn already but then you should call
> the argument "pfn" so that it is clear what it takes.


Yes, the macro takes the pfn.

> Your mask above covers [55:0] but [55:52] is the page operation so the
> high bit in that mask needs to be 51.

Ack. I will limit the mask to 51 so that we don't go outside the pfn
field width. Thank you for pointing it out.


> AFAICT, ofc.
>
>> +#define	SNP_PAGE_STATE_PRIVATE		1
>> +#define	SNP_PAGE_STATE_SHARED		2
>> +#define	SNP_PAGE_STATE_PSMASH		3
>> +#define	SNP_PAGE_STATE_UNSMASH		4
>> +
>> +#define GHCB_SNP_PAGE_STATE_CHANGE_RESP	0x0015
>> +#define		GHCB_SNP_PAGE_STATE_RESP_VAL(val)	((val) >> 32)
> 	  ^^^^^^^^^^^^
>
> Some stray tabs here and above, pls pay attention to vertical alignment too.


I noticed that sev-es.h uses a tab when defining the macros that build
commands; another example of where I tried to keep a bit of consistency
with sev-es.h. I will drop it in the next rev.


>
>> +
>> +/* Page State Change NAE event */
>> +#define SNP_PAGE_STATE_CHANGE_MAX_ENTRY		253
>> +struct __packed snp_page_state_header {
>> +	uint16_t cur_entry;
>> +	uint16_t end_entry;
>> +	uint32_t reserved;
>> +};
>> +
>> +struct __packed snp_page_state_entry {
>> +	uint64_t cur_page:12;
>> +	uint64_t gfn:40;
>> +	uint64_t operation:4;
>> +	uint64_t pagesize:1;
>> +	uint64_t reserved:7;
>> +};
> Any particular reason for the uint<width>_t types or can you use our
> shorter u<width> types?

IIRC, the spec structure uses uint<width>_t, so I used it as-is. No
strong reason for keeping it. I will switch to the u<width> types in the
next version.


>
>> +
>> +struct __packed snp_page_state_change {
>> +	struct snp_page_state_header header;
>> +	struct snp_page_state_entry entry[SNP_PAGE_STATE_CHANGE_MAX_ENTRY];
> Also, looking further into the patchset, I wonder if it would make sense
> to do:
>
> s/PAGE_STATE_CHANGE/PSC/g
>
> to avoid breaking lines of huge statements:
>
> +	if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SNP_PAGE_STATE_CHANGE_RESP) ||
> +	    (GHCB_SNP_PAGE_STATE_RESP_VAL(val) != 0))
> +		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
>
> and turn them into something more readable
>
> +	if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SNP_PSC_RESP) ||
> +	    (GHCB_SNP_PSC_RESP_VAL(val)))
> +		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
>
> Also, you have GHCB_SEV and GHCB_SNP prefixes and I wonder whether we
> even need them and have everything be prefixed simply GHCB_ because that
> is the protocol, after all.
>
> Which will then turn the above into:
>
> +	if ((GHCB_RESP_CODE(val) != GHCB_PSC_RESP) || (GHCB_PSC_RESP_VAL(val)))
> +		sev_es_terminate(GHCB_REASON_GENERAL_REQUEST);
>
> Oh yeah baby! :-)

It looks much better :-). I am good with dropping the GHCB_{SEV,SNP}
prefixes and also renaming PAGE_STATE_CHANGE to PSC. Thanks.

>
> Thx.
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 04/13] x86/sev-snp: define page state change VMGEXIT structure
  2021-04-01 14:11     ` Brijesh Singh
@ 2021-04-02 15:44       ` Borislav Petkov
  0 siblings, 0 replies; 52+ messages in thread
From: Borislav Petkov @ 2021-04-02 15:44 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Thu, Apr 01, 2021 at 09:11:34AM -0500, Brijesh Singh wrote:
> I guess I was trying to keep it consistent with the sev-es.h macro
> definitions, in which the command comes before the fields. In the next
> version, I will use msb-to-lsb ordering.

Yes pls. And then you could fix the sev-es.h macro too, in a prepatch
maybe or in the same one, to do the same so that when reading the GHCB
doc, it maps directly to the macros.

> IIRC, the spec structure uses uint<width>_t, so I used it as-is. No
> strong reason for keeping it. I will switch to the u<width> types in
> the next version.

Yeah, the uint* things are in the C spec but we don't need this
definition outside the kernel, right?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [RFC Part1 PATCH 05/13] X86/sev-es: move few helper functions in common file
  2021-03-24 16:44 ` [RFC Part1 PATCH 05/13] X86/sev-es: move few helper functions in common file Brijesh Singh
@ 2021-04-02 19:27   ` Borislav Petkov
  2021-04-02 21:33     ` Brijesh Singh
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-04-02 19:27 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Wed, Mar 24, 2021 at 11:44:16AM -0500, Brijesh Singh wrote:
> The sev_es_terminate() and sev_es_{wr,rd}_ghcb_msr() helper functions
> in a common file so that it can be used by both the SEV-ES and SEV-SNP.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Joerg Roedel <jroedel@suse.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/boot/compressed/sev-common.c | 32 +++++++++++++++++++++++++++
>  arch/x86/boot/compressed/sev-es.c     | 22 ++----------------
>  arch/x86/kernel/sev-common-shared.c   | 31 ++++++++++++++++++++++++++
>  arch/x86/kernel/sev-es-shared.c       | 21 +++---------------
>  4 files changed, 68 insertions(+), 38 deletions(-)
>  create mode 100644 arch/x86/boot/compressed/sev-common.c
>  create mode 100644 arch/x86/kernel/sev-common-shared.c

Yeah, once you merge it all into sev.c and sev-shared.c, that patch is
not needed anymore.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [RFC Part1 PATCH 05/13] X86/sev-es: move few helper functions in common file
  2021-04-02 19:27   ` Borislav Petkov
@ 2021-04-02 21:33     ` Brijesh Singh
  0 siblings, 0 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-04-02 21:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 4/2/21 2:27 PM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:16AM -0500, Brijesh Singh wrote:
>> The sev_es_terminate() and sev_es_{wr,rd}_ghcb_msr() helper functions
>> in a common file so that it can be used by both the SEV-ES and SEV-SNP.
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Joerg Roedel <jroedel@suse.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Tony Luck <tony.luck@intel.com>
>> Cc: Dave Hansen <dave.hansen@intel.com>
>> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Sean Christopherson <seanjc@google.com>
>> Cc: x86@kernel.org
>> Cc: kvm@vger.kernel.org
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  arch/x86/boot/compressed/sev-common.c | 32 +++++++++++++++++++++++++++
>>  arch/x86/boot/compressed/sev-es.c     | 22 ++----------------
>>  arch/x86/kernel/sev-common-shared.c   | 31 ++++++++++++++++++++++++++
>>  arch/x86/kernel/sev-es-shared.c       | 21 +++---------------
>>  4 files changed, 68 insertions(+), 38 deletions(-)
>>  create mode 100644 arch/x86/boot/compressed/sev-common.c
>>  create mode 100644 arch/x86/kernel/sev-common-shared.c
> Yeah, once you merge it all into sev.c and sev-shared.c, that patch is
> not needed anymore.


Agreed. Renaming sev-es.{c,h} -> sev.{c,h} will certainly help.
Additionally, I noticed that the GHCB MSR helper macros are duplicated
between arch/x86/include/asm/sev-es.h and arch/x86/kvm/svm/svm.h. I am
creating a new file (arch/x86/include/asm/sev-common.h) that will
consolidate all the helper macros common to the guest and the
hypervisor.

>
> Thx.
>


* Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB
  2021-03-24 16:44 ` [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB Brijesh Singh
@ 2021-04-06 10:33   ` Borislav Petkov
  2021-04-06 15:47     ` Brijesh Singh
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-04-06 10:33 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Wed, Mar 24, 2021 at 11:44:17AM -0500, Brijesh Singh wrote:
> Many of the integrity guarantees of SEV-SNP are enforced through the
> Reverse Map Table (RMP). Each RMP entry contains the GPA at which a
> particular page of DRAM should be mapped. The VMs can request the
> hypervisor to add pages in the RMP table via the Page State Change VMGEXIT
> defined in the GHCB specification section 2.5.1 and 4.1.6. Inside each RMP
> entry is a Validated flag; this flag is automatically cleared to 0 by the
> CPU hardware when a new RMP entry is created for a guest. Each VM page
> can be either validated or invalidated, as indicated by the Validated
> flag in the RMP entry. Memory access to a private page that is not
> validated generates a #VC. A VM can use PVALIDATE instruction to validate
> the private page before using it.

I guess this should say "A VM must use the PVALIDATE insn to validate
that private page before using it." Otherwise it can't use it, right.
Thus the "must" and not "can".

> To maintain the security guarantee of SEV-SNP guests, when transitioning
> a memory from private to shared, the guest must invalidate the memory range
> before asking the hypervisor to change the page state to shared in the RMP
> table.

So first you talk about memory pages, now about memory range...

> After the page is mapped private in the page table, the guest must issue a

... and now about pages again. Let's talk pages only pls.

> page state change VMGEXIT to make the memory private in the RMP table and
> validate it. If the memory is not validated after its added in the RMP table
> as private, then a VC exception (page-not-validated) will be raised.

Didn't you just say this already above?

> We do

Who's "we"?

> not support the page-not-validated exception yet, so it will crash the guest.
> 
> On boot, BIOS should have validated the entire system memory. During
> the kernel decompression stage, the VC handler uses the
> set_memory_decrypted() to make the GHCB page shared (i.e clear encryption
> attribute). And while exiting from the decompression, it calls the
> set_memory_encyrpted() to make the page private.

Hmm, that commit message needs reorganizing, from
Documentation/process/submitting-patches.rst:

 "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
  instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
  to do frotz", as if you are giving orders to the codebase to change
  its behaviour."

So this should say something along the lines of "Add helpers for validating
pages in the decompression stage" or so.

> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Joerg Roedel <jroedel@suse.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org

Btw, you don't really need to add those CCs to the patch - it is enough
if you Cc the folks when you send the patches with git.

> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> ---
>  arch/x86/boot/compressed/Makefile       |   1 +
>  arch/x86/boot/compressed/ident_map_64.c |  18 ++++
>  arch/x86/boot/compressed/sev-snp.c      | 115 ++++++++++++++++++++++++
>  arch/x86/boot/compressed/sev-snp.h      |  25 ++++++
>  4 files changed, 159 insertions(+)
>  create mode 100644 arch/x86/boot/compressed/sev-snp.c
>  create mode 100644 arch/x86/boot/compressed/sev-snp.h
> 
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e0bc3988c3fa..4d422aae8a86 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -93,6 +93,7 @@ ifdef CONFIG_X86_64
>  	vmlinux-objs-y += $(obj)/mem_encrypt.o
>  	vmlinux-objs-y += $(obj)/pgtable_64.o
>  	vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-es.o
> +	vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-snp.o

Yeah, as before, make that a single sev.o and put everything in it.

>  endif
>  
>  vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
> diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
> index f7213d0943b8..0a420ce5550f 100644
> --- a/arch/x86/boot/compressed/ident_map_64.c
> +++ b/arch/x86/boot/compressed/ident_map_64.c
> @@ -37,6 +37,8 @@
>  #include <asm/setup.h>	/* For COMMAND_LINE_SIZE */
>  #undef _SETUP
>  
> +#include "sev-snp.h"
> +
>  extern unsigned long get_cmd_line_ptr(void);
>  
>  /* Used by PAGE_KERN* macros: */
> @@ -278,12 +280,28 @@ static int set_clr_page_flags(struct x86_mapping_info *info,
>  	if ((set | clr) & _PAGE_ENC)
>  		clflush_page(address);
>  
> +	/*
> +	 * If the encryption attribute is being cleared, then change the page state to
> +	 * shared in the RMP entry. Change of the page state must be done before the
> +	 * PTE updates.
> +	 */
> +	if (clr & _PAGE_ENC)
> +		sev_snp_set_page_shared(pte_pfn(*ptep) << PAGE_SHIFT);

The statement above already looks at clr - just merge the two together.

> +
>  	/* Update PTE */
>  	pte = *ptep;
>  	pte = pte_set_flags(pte, set);
>  	pte = pte_clear_flags(pte, clr);
>  	set_pte(ptep, pte);
>  
> +	/*
> +	 * If the encryption attribute is being set, then change the page state to
> +	 * private in the RMP entry. The page state must be done after the PTE
> +	 * is updated.
> +	 */
> +	if (set & _PAGE_ENC)
> +		sev_snp_set_page_private(pte_pfn(*ptep) << PAGE_SHIFT);
> +
>  	/* Flush TLB after changing encryption attribute */
>  	write_cr3(top_level_pgt);
>  
> diff --git a/arch/x86/boot/compressed/sev-snp.c b/arch/x86/boot/compressed/sev-snp.c
> new file mode 100644
> index 000000000000..5c25103b0df1
> --- /dev/null
> +++ b/arch/x86/boot/compressed/sev-snp.c
> @@ -0,0 +1,115 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * AMD SEV SNP support
> + *
> + * Author: Brijesh Singh <brijesh.singh@amd.com>
> + *
> + */
> +
> +#include "misc.h"
> +#include "error.h"
> +
> +#include <asm/msr-index.h>
> +#include <asm/sev-snp.h>
> +#include <asm/sev-es.h>
> +
> +#include "sev-snp.h"
> +
> +static bool sev_snp_enabled(void)
> +{
> +	unsigned long low, high;
> +	u64 val;
> +
> +	asm volatile("rdmsr\n" : "=a" (low), "=d" (high) :
> +			"c" (MSR_AMD64_SEV));
> +
> +	val = (high << 32) | low;
> +
> +	if (val & MSR_AMD64_SEV_SNP_ENABLED)
> +		return true;
> +
> +	return false;
> +}

arch/x86/boot/compressed/mem_encrypt.S already touches
MSR_AMD64_SEV - you can extend that function there and cache the
MSR_AMD64_SEV_SNP_ENABLED too, depending on where you need it. That
function is called in .code32 though.

If not, you should at least cache the MSR so that you don't have to read
it each time.

> +
> +/* Provides sev_snp_{wr,rd}_ghcb_msr() */
> +#include "sev-common.c"
> +
> +/* Provides sev_es_terminate() */
> +#include "../../kernel/sev-common-shared.c"
> +
> +static void sev_snp_pages_state_change(unsigned long paddr, int op)

no need for too many prefixes on static functions - just call this one
__change_page_state() or so, so that the below one can be called...

> +{
> +	u64 pfn = paddr >> PAGE_SHIFT;
> +	u64 old, val;
> +
> +	/* save the old GHCB MSR */
> +	old = sev_es_rd_ghcb_msr();

Why do you need to save/restore GHCB MSR? Other callers simply go and
write into it the new command...

> +
> +	/* Issue VMGEXIT to change the page state */
> +	sev_es_wr_ghcb_msr(GHCB_SNP_PAGE_STATE_REQ_GFN(pfn, op));
> +	VMGEXIT();
> +
> +	/* Read the response of the VMGEXIT */
> +	val = sev_es_rd_ghcb_msr();
> +	if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SNP_PAGE_STATE_CHANGE_RESP) ||
> +	    (GHCB_SNP_PAGE_STATE_RESP_VAL(val) != 0))

No need for the "!= 0"

> +		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);

So what does that mean?

*Any* and *all* page state changes which fail immediately terminate a
guest? Why?

Then, how do we communicate this to the guest user what has happened?

Can GHCB_SEV_ES_REASON_GENERAL_REQUEST be something special like

GHCB_SEV_ES_REASON_PSC_FAILURE

or so, so that users know what has happened?

> +	/* Restore the GHCB MSR value */
> +	sev_es_wr_ghcb_msr(old);
> +}
> +
> +static void sev_snp_issue_pvalidate(unsigned long paddr, bool validate)

That one you can call simply "pvalidate" and then the layering with
__pvalidate works too.

> +{
> +	unsigned long eflags;
> +	int rc;
> +
> +	rc = __pvalidate(paddr, RMP_PG_SIZE_4K, validate, &eflags);
> +	if (rc) {
> +		error("Failed to validate address");
> +		goto e_fail;
> +	}
> +
> +	/* Check for the double validation and assert on failure */
> +	if (eflags & X86_EFLAGS_CF) {
> +		error("Double validation detected");
> +		goto e_fail;
> +	}
> +
> +	return;
> +e_fail:
> +	sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
> +}
> +
> +static void sev_snp_set_page_private_shared(unsigned long paddr, int op)

... change_page_state()

> +{
> +	if (!sev_snp_enabled())
> +		return;
> +
> +	/*
> +	 * We are change the page state from private to shared, invalidate the pages before

s/We are//

> +	 * making the page state change in the RMP table.
> +	 */
> +	if (op == SNP_PAGE_STATE_SHARED)
> +		sev_snp_issue_pvalidate(paddr, false);

The new RMP Validated bit is specified in EDX[0]. The C standard defines

	false == 0
	true == 1

but make that explicit pls:

	pvalidate(paddr, 0);
	pvalidate(paddr, 1);

> +
> +	/* Request the page state change in the RMP table. */
> +	sev_snp_pages_state_change(paddr, op);
> +
> +	/*
> +	 * Now that pages are added in the RMP table as a private memory, validate the
> +	 * memory range so that it is consistent with the RMP entry.
> +	 */
> +	if (op == SNP_PAGE_STATE_PRIVATE)
> +		sev_snp_issue_pvalidate(paddr, true);
> +}
> +
> +void sev_snp_set_page_private(unsigned long paddr)
> +{
> +	sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_PRIVATE);
> +}
> +
> +void sev_snp_set_page_shared(unsigned long paddr)
> +{
> +	sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_SHARED);
> +}
> diff --git a/arch/x86/boot/compressed/sev-snp.h b/arch/x86/boot/compressed/sev-snp.h
> new file mode 100644
> index 000000000000..12fe9581a255
> --- /dev/null
> +++ b/arch/x86/boot/compressed/sev-snp.h

A single sev.h I guess.

> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * AMD SEV Secure Nested Paging Support
> + *
> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + *
> + * Author: Brijesh Singh <brijesh.singh@amd.com>
> + */
> +
> +#ifndef __COMPRESSED_SECURE_NESTED_PAGING_H
> +#define __COMPRESSED_SECURE_NESTED_PAGING_H

Look at how other x86 headers define their guards' format.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB
  2021-04-06 10:33   ` Borislav Petkov
@ 2021-04-06 15:47     ` Brijesh Singh
  2021-04-06 19:42       ` Tom Lendacky
  2021-04-07 11:16       ` Borislav Petkov
  0 siblings, 2 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-04-06 15:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 4/6/21 5:33 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:17AM -0500, Brijesh Singh wrote:
>> Many of the integrity guarantees of SEV-SNP are enforced through the
>> Reverse Map Table (RMP). Each RMP entry contains the GPA at which a
>> particular page of DRAM should be mapped. The VMs can request the
>> hypervisor to add pages in the RMP table via the Page State Change VMGEXIT
>> defined in the GHCB specification section 2.5.1 and 4.1.6. Inside each RMP
>> entry is a Validated flag; this flag is automatically cleared to 0 by the
>> CPU hardware when a new RMP entry is created for a guest. Each VM page
>> can be either validated or invalidated, as indicated by the Validated
>> flag in the RMP entry. Memory access to a private page that is not
>> validated generates a #VC. A VM can use PVALIDATE instruction to validate
>> the private page before using it.
> I guess this should say "A VM must use the PVALIDATE insn to validate
> that private page before using it." Otherwise it can't use it, right.
> Thus the "must" and not "can".


Noted, I should have used "must".

>
>> To maintain the security guarantee of SEV-SNP guests, when transitioning
>> a memory from private to shared, the guest must invalidate the memory range
>> before asking the hypervisor to change the page state to shared in the RMP
>> table.
> So first you talk about memory pages, now about memory range...
>
>> After the page is mapped private in the page table, the guest must issue a
> ... and now about pages again. Let's talk pages only pls.


Noted, I will stick to memory pages. Thanks.


>
>> page state change VMGEXIT to make the memory private in the RMP table and
>> validate it. If the memory is not validated after its added in the RMP table
>> as private, then a VC exception (page-not-validated) will be raised.
> Didn't you just say this already above?


Yes, I said it at the start of the commit message; I will work to avoid
the repetition.


>> We do
> Who's "we"?
>
>> not support the page-not-validated exception yet, so it will crash the guest.
>>
>> On boot, BIOS should have validated the entire system memory. During
>> the kernel decompression stage, the VC handler uses the
>> set_memory_decrypted() to make the GHCB page shared (i.e clear encryption
>> attribute). And while exiting from the decompression, it calls the
>> set_memory_encyrpted() to make the page private.
> Hmm, that commit message needs reorganizing, from
> Documentation/process/submitting-patches.rst:
>
>  "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
>   instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
>   to do frotz", as if you are giving orders to the codebase to change
>   its behaviour."
>
> So this should say something along the lines of "Add helpers for validating
> pages in the decompression stage" or so.

I will improve the commit message to avoid using "we" or "I".
"Add helpers" looks good, thanks.


>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Joerg Roedel <jroedel@suse.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Tony Luck <tony.luck@intel.com>
>> Cc: Dave Hansen <dave.hansen@intel.com>
>> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Sean Christopherson <seanjc@google.com>
>> Cc: x86@kernel.org
>> Cc: kvm@vger.kernel.org
> Btw, you don't really need to add those CCs to the patch - it is enough
> if you Cc the folks when you send the patches with git.


Noted.

>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>  arch/x86/boot/compressed/Makefile       |   1 +
>>  arch/x86/boot/compressed/ident_map_64.c |  18 ++++
>>  arch/x86/boot/compressed/sev-snp.c      | 115 ++++++++++++++++++++++++
>>  arch/x86/boot/compressed/sev-snp.h      |  25 ++++++
>>  4 files changed, 159 insertions(+)
>>  create mode 100644 arch/x86/boot/compressed/sev-snp.c
>>  create mode 100644 arch/x86/boot/compressed/sev-snp.h
>>
>> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>> index e0bc3988c3fa..4d422aae8a86 100644
>> --- a/arch/x86/boot/compressed/Makefile
>> +++ b/arch/x86/boot/compressed/Makefile
>> @@ -93,6 +93,7 @@ ifdef CONFIG_X86_64
>>  	vmlinux-objs-y += $(obj)/mem_encrypt.o
>>  	vmlinux-objs-y += $(obj)/pgtable_64.o
>>  	vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-es.o
>> +	vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-snp.o
> Yeah, as before, make that a single sev.o and put everything in it.


Yes, all the SNP changes will be merged into sev.c.

>
>>  endif
>>  
>>  vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
>> diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
>> index f7213d0943b8..0a420ce5550f 100644
>> --- a/arch/x86/boot/compressed/ident_map_64.c
>> +++ b/arch/x86/boot/compressed/ident_map_64.c
>> @@ -37,6 +37,8 @@
>>  #include <asm/setup.h>	/* For COMMAND_LINE_SIZE */
>>  #undef _SETUP
>>  
>> +#include "sev-snp.h"
>> +
>>  extern unsigned long get_cmd_line_ptr(void);
>>  
>>  /* Used by PAGE_KERN* macros: */
>> @@ -278,12 +280,28 @@ static int set_clr_page_flags(struct x86_mapping_info *info,
>>  	if ((set | clr) & _PAGE_ENC)
>>  		clflush_page(address);
>>  
>> +	/*
>> +	 * If the encryption attribute is being cleared, then change the page state to
>> +	 * shared in the RMP entry. Change of the page state must be done before the
>> +	 * PTE updates.
>> +	 */
>> +	if (clr & _PAGE_ENC)
>> +		sev_snp_set_page_shared(pte_pfn(*ptep) << PAGE_SHIFT);
> The statement above already looks at clr - just merge the two together.


Noted.

>
>> +
>>  	/* Update PTE */
>>  	pte = *ptep;
>>  	pte = pte_set_flags(pte, set);
>>  	pte = pte_clear_flags(pte, clr);
>>  	set_pte(ptep, pte);
>>  
>> +	/*
>> +	 * If the encryption attribute is being set, then change the page state to
>> +	 * private in the RMP entry. The page state must be done after the PTE
>> +	 * is updated.
>> +	 */
>> +	if (set & _PAGE_ENC)
>> +		sev_snp_set_page_private(pte_pfn(*ptep) << PAGE_SHIFT);
>> +
>>  	/* Flush TLB after changing encryption attribute */
>>  	write_cr3(top_level_pgt);
>>  
>> diff --git a/arch/x86/boot/compressed/sev-snp.c b/arch/x86/boot/compressed/sev-snp.c
>> new file mode 100644
>> index 000000000000..5c25103b0df1
>> --- /dev/null
>> +++ b/arch/x86/boot/compressed/sev-snp.c
>> @@ -0,0 +1,115 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * AMD SEV SNP support
>> + *
>> + * Author: Brijesh Singh <brijesh.singh@amd.com>
>> + *
>> + */
>> +
>> +#include "misc.h"
>> +#include "error.h"
>> +
>> +#include <asm/msr-index.h>
>> +#include <asm/sev-snp.h>
>> +#include <asm/sev-es.h>
>> +
>> +#include "sev-snp.h"
>> +
>> +static bool sev_snp_enabled(void)
>> +{
>> +	unsigned long low, high;
>> +	u64 val;
>> +
>> +	asm volatile("rdmsr\n" : "=a" (low), "=d" (high) :
>> +			"c" (MSR_AMD64_SEV));
>> +
>> +	val = (high << 32) | low;
>> +
>> +	if (val & MSR_AMD64_SEV_SNP_ENABLED)
>> +		return true;
>> +
>> +	return false;
>> +}
> arch/x86/boot/compressed/mem_encrypt.S already touches
> MSR_AMD64_SEV - you can extend that function there and cache the
> MSR_AMD64_SEV_SNP_ENABLED too, depending on where you need it. That
> function is called in .code32 though.
>
> If not, you should at least cache the MSR so that you don't have to read
> it each time.
I think we will not be able to use the status saved from mem_encrypt.S.

The call sequence is something like this:

arch/x86/boot/compressed/head_64.S

   call get_sev_encryption_bit

arch/x86/boot/compressed/mem_encrypt.S

get_sev_encryption_bit:
    // Read the memory encryption CPUID Fn8000_001F
    // Check if SEV is available
    // If SEV is available, rdmsr MSR_AMD64_SEV

When SEV-{ES,SNP} is enabled, reading CPUID Fn8000_001F will cause a #VC.
Inside the #VC handler, the first thing we do is set up the early GHCB.
While setting up the GHCB we check whether SNP is enabled; if it is, we
perform additional setup (e.g. the GHCB GPA request etc.).

I will try caching the value on the first read.


>
>> +
>> +/* Provides sev_snp_{wr,rd}_ghcb_msr() */
>> +#include "sev-common.c"
>> +
>> +/* Provides sev_es_terminate() */
>> +#include "../../kernel/sev-common-shared.c"
>> +
>> +static void sev_snp_pages_state_change(unsigned long paddr, int op)
> no need for too many prefixes on static functions - just call this one
> __change_page_state() or so, so that the below one can be called...


Noted.

>> +{
>> +	u64 pfn = paddr >> PAGE_SHIFT;
>> +	u64 old, val;
>> +
>> +	/* save the old GHCB MSR */
>> +	old = sev_es_rd_ghcb_msr();
> Why do you need to save/restore GHCB MSR? Other callers simply go and
> write into it the new command...


Before the GHCB is established the caller does not need to save and
restore the MSR. But page_state_change() uses the GHCB MSR protocol and
can be called both before and after the GHCB is established, hence I am
saving and restoring the GHCB MSR.

>
>> +
>> +	/* Issue VMGEXIT to change the page state */
>> +	sev_es_wr_ghcb_msr(GHCB_SNP_PAGE_STATE_REQ_GFN(pfn, op));
>> +	VMGEXIT();
>> +
>> +	/* Read the response of the VMGEXIT */
>> +	val = sev_es_rd_ghcb_msr();
>> +	if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SNP_PAGE_STATE_CHANGE_RESP) ||
>> +	    (GHCB_SNP_PAGE_STATE_RESP_VAL(val) != 0))
> No need for the "!= 0"


Noted.

>
>> +		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
> So what does that mean?
>
> *Any* and *all* page state changes which fail immediately terminate a
> guest? Why?


The hypervisor uses the RMPUPDATE instruction to add the pages in the
RMP table. If RMPUPDATE fails, then the failure will be communicated to
the guest. Now it's up to the guest what it wants to do. I chose to
terminate because the guest can't resolve this step on its own. It needs
help from the hypervisor, and the hypervisor has bailed on it. Depending
on the request type, the next step will either fail or we go into an
infinite loop. Let's consider an example:

1. The guest asks to add a page as private in the RMP table.

2. The hypervisor fails to add the page in the RMP table and returns an
error.

3. The guest ignores the error code and moves on to validate the page.

4. The page validation instruction expects that the page has been added
in the RMP table. In our case the page was not added, so it will cause a
#NPF (RMP violation).

5. On the #NPF, the hypervisor will try adding the page as private but
it will fail (same as #2). This will keep repeating and the guest will
not make any progress.

I chose to return "void" from page_state_change() because the caller
can't do anything with the error code. Some of the failures may have
security implications, so terminate the guest as soon as we detect an
error condition.


> Then, how do we communicate this to the guest user what has happened?
>
> Can GHCB_SEV_ES_REASON_GENERAL_REQUEST be something special like
>
> GHCB_SEV_ES_REASON_PSC_FAILURE
>
> or so, so that users know what has happened?


The current GHCB spec does not have a special code for this. But I think
the Linux guest can define a special code which can be used to indicate
the termination reason.

Tom,

Any other suggestion ?


>
>> +	/* Restore the GHCB MSR value */
>> +	sev_es_wr_ghcb_msr(old);
>> +}
>> +
>> +static void sev_snp_issue_pvalidate(unsigned long paddr, bool validate)
> That one you can call simply "pvalidate" and then the layering with
> __pvalidate works too.


Yes.

>> +{
>> +	unsigned long eflags;
>> +	int rc;
>> +
>> +	rc = __pvalidate(paddr, RMP_PG_SIZE_4K, validate, &eflags);
>> +	if (rc) {
>> +		error("Failed to validate address");
>> +		goto e_fail;
>> +	}
>> +
>> +	/* Check for the double validation and assert on failure */
>> +	if (eflags & X86_EFLAGS_CF) {
>> +		error("Double validation detected");
>> +		goto e_fail;
>> +	}
>> +
>> +	return;
>> +e_fail:
>> +	sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
>> +}
>> +
>> +static void sev_snp_set_page_private_shared(unsigned long paddr, int op)
> ... change_page_state()
>
>> +{
>> +	if (!sev_snp_enabled())
>> +		return;
>> +
>> +	/*
>> +	 * We are change the page state from private to shared, invalidate the pages before
> s/We are//


Noted.


>
>> +	 * making the page state change in the RMP table.
>> +	 */
>> +	if (op == SNP_PAGE_STATE_SHARED)
>> +		sev_snp_issue_pvalidate(paddr, false);
> The new RMP Validated bit is specified in EDX[0]. The C standard defines
>
> 	false == 0
> 	true == 1
>
> but make that explicit pls:
>
> 	pvalidate(paddr, 0);
> 	pvalidate(paddr, 1);


Noted.


>
>> +
>> +	/* Request the page state change in the RMP table. */
>> +	sev_snp_pages_state_change(paddr, op);
>> +
>> +	/*
>> +	 * Now that pages are added in the RMP table as a private memory, validate the
>> +	 * memory range so that it is consistent with the RMP entry.
>> +	 */
>> +	if (op == SNP_PAGE_STATE_PRIVATE)
>> +		sev_snp_issue_pvalidate(paddr, true);
>> +}
>> +
>> +void sev_snp_set_page_private(unsigned long paddr)
>> +{
>> +	sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_PRIVATE);
>> +}
>> +
>> +void sev_snp_set_page_shared(unsigned long paddr)
>> +{
>> +	sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_SHARED);
>> +}
>> diff --git a/arch/x86/boot/compressed/sev-snp.h b/arch/x86/boot/compressed/sev-snp.h
>> new file mode 100644
>> index 000000000000..12fe9581a255
>> --- /dev/null
>> +++ b/arch/x86/boot/compressed/sev-snp.h
> A single sev.h I guess.
>
>> @@ -0,0 +1,25 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * AMD SEV Secure Nested Paging Support
>> + *
>> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
>> + *
>> + * Author: Brijesh Singh <brijesh.singh@amd.com>
>> + */
>> +
>> +#ifndef __COMPRESSED_SECURE_NESTED_PAGING_H
>> +#define __COMPRESSED_SECURE_NESTED_PAGING_H
> Look at how other x86 headers define their guards' format.


Noted.


> Thx.
>


* Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB
  2021-04-06 15:47     ` Brijesh Singh
@ 2021-04-06 19:42       ` Tom Lendacky
  2021-04-07 11:25         ` Borislav Petkov
  2021-04-07 11:16       ` Borislav Petkov
  1 sibling, 1 reply; 52+ messages in thread
From: Tom Lendacky @ 2021-04-06 19:42 UTC (permalink / raw)
  To: Brijesh Singh, Borislav Petkov
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, David Rientjes, Sean Christopherson

On 4/6/21 10:47 AM, Brijesh Singh wrote:
> 
> On 4/6/21 5:33 AM, Borislav Petkov wrote:
>> On Wed, Mar 24, 2021 at 11:44:17AM -0500, Brijesh Singh wrote:
>>

...

>> *Any* and *all* page state changes which fail immediately terminate a
>> guest? Why?
> 
> 
> The hypervisor uses the RMPUPDATE instruction to add pages to the
> RMP table. If RMPUPDATE fails, the failure is communicated to the
> guest. Now it's up to the guest what it wants to do. I chose to terminate
> because the guest can't resolve this step on its own. It needs help from the
> hypervisor, and the hypervisor has bailed on it. Depending on the request
> type, the next step will either fail or we go into an infinite loop. Let's
> consider an example:
> 
> 1. The guest asks to add a page as private in the RMP table.
> 
> 2. The hypervisor fails to add the page in the RMP table and returns an error.
> 
> 3. The guest ignores the error code and moves on to validate the page.
> 
> 4. The page validation instruction expects the page to have been added to
> the RMP table. In our case the page was not added to the RMP table, so
> validation will cause an #NPF (RMP violation).
> 
> 5. On the #NPF, the hypervisor will try adding the page as private, but it
> will fail (same as #2). This will keep repeating and the guest will not
> make any progress.
> 
> I chose to return "void" from page_state_change() because the caller can't
> do anything with the error code. Some of the failures may have security
> implications, so terminate the guest as soon as we detect an error condition.
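The failure loop described in steps 1-5 can be sketched as a small standalone simulation. Everything here is a hypothetical model (plain C globals standing in for the hypervisor's RMPUPDATE result and the #NPF retry path), not the kernel code under review:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the interaction described above: if RMPUPDATE
 * fails and the guest ignores the error, PVALIDATE just re-triggers the
 * same failing path via #NPF, so the guest can never make progress. */

static bool rmpupdate_ok;	/* does the hypervisor's RMPUPDATE succeed? */

static bool hv_rmpupdate(void) { return rmpupdate_ok; }

/* Returns 0 on success, -1 if the guest would loop forever on #NPF. */
static int guest_pvalidate_ignoring_errors(void)
{
	/* Steps 1-2: ask the hypervisor to add the page as private. */
	bool added = hv_rmpupdate();
	int npf_retries;

	/* Step 3: the guest ignores 'added' and validates anyway. */
	for (npf_retries = 0; npf_retries < 16; npf_retries++) {
		if (added)
			return 0;	/* PVALIDATE would succeed */
		/* Steps 4-5: #NPF, hypervisor retries RMPUPDATE, fails again. */
		added = hv_rmpupdate();
	}
	return -1;	/* no forward progress: the guest must terminate */
}
```

This is why page_state_change() terminates rather than returning an error: once the hypervisor bails, every retry path leads back to the same failure.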
> 
> 
>> Then, how do we communicate this to the guest user what has happened?
>>
>> Can GHCB_SEV_ES_REASON_GENERAL_REQUEST be something special like
>>
>> GHCB_SEV_ES_REASON_PSC_FAILURE
>>
>> or so, so that users know what has happened?
> 
> 
> The current GHCB spec does not define a special code for this. But I think a
> Linux guest can define a special code which can be used to indicate the
> termination reason.
> 
> Tom,
> 
> Any other suggestion ?

The GHCB spec only defines the "0" reason code set. We could provide Linux
its own reason code set with some more specific reason codes for
failures, if that is needed.

Thanks,
Tom

> 
> 
>>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB
  2021-04-06 15:47     ` Brijesh Singh
  2021-04-06 19:42       ` Tom Lendacky
@ 2021-04-07 11:16       ` Borislav Petkov
  2021-04-07 13:35         ` Brijesh Singh
  1 sibling, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-04-07 11:16 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Tue, Apr 06, 2021 at 10:47:18AM -0500, Brijesh Singh wrote:
> Before the GHCB is established the caller does not need to save and
> restore MSRs. The page_state_change() uses the GHCB MSR protocol and it
> can be called before and after the GHCB is established hence I am saving
> and restoring GHCB MSRs.

I think you need to elaborate on that, maybe with an example. What the
other sites using the GHCB MSR currently do is:

1. request by writing it
2. read the response

None of them save and restore it.

So why here?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB
  2021-04-06 19:42       ` Tom Lendacky
@ 2021-04-07 11:25         ` Borislav Petkov
  2021-04-07 19:45           ` Borislav Petkov
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-04-07 11:25 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Brijesh Singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, David Rientjes, Sean Christopherson

On Tue, Apr 06, 2021 at 02:42:43PM -0500, Tom Lendacky wrote:
> The GHCB spec only defines the "0" reason code set. We could provide Linux
> its own reason code set with some more specific reason codes for
> failures, if that is needed.

Why Linux only?

Don't we want to have a generalized set of error codes which say what
has happened so that people can debug?

Let's take the above case Brijesh explains: guest tries a page state
change, HV cannot manage for whatever reason and guest terminates with a
"general request".

Wouldn't you want to at least have a *hint* as to why the guest
terminated instead of just "guest terminated"?

I.e., none of those:

https://duckduckgo.com/?q=dumb+error+messages&iax=images&ia=images

:-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 07/13] x86/compressed: register GHCB memory when SNP is active
  2021-03-24 16:44 ` [RFC Part1 PATCH 07/13] x86/compressed: register GHCB memory when SNP is active Brijesh Singh
@ 2021-04-07 11:59   ` Borislav Petkov
  2021-04-07 17:34     ` Brijesh Singh
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-04-07 11:59 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Wed, Mar 24, 2021 at 11:44:18AM -0500, Brijesh Singh wrote:
> The SEV-SNP guest is required to perform GHCB GPA registration. This is

Why does it need to do that? Some additional security so as to not allow
changing the GHCB once it is established?

I'm guessing that's enforced by the SNP fw and we cannot do that
retroactively for SEV...? Because it sounds like a nice little thing we
could do additionally.

> because the hypervisor may prefer that a guest use a consistent and/or
> specific GPA for the GHCB associated with a vCPU. For more information,
> see the GHCB specification section 2.5.2.

I think you mean

"2.3.2 GHCB GPA Registration"

Please use the section name too because that doc changes from time to
time.

Also, you probably should update it here:

https://bugzilla.kernel.org/show_bug.cgi?id=206537

> diff --git a/arch/x86/boot/compressed/sev-snp.c b/arch/x86/boot/compressed/sev-snp.c
> index 5c25103b0df1..a4c5e85699a7 100644
> --- a/arch/x86/boot/compressed/sev-snp.c
> +++ b/arch/x86/boot/compressed/sev-snp.c
> @@ -113,3 +113,29 @@ void sev_snp_set_page_shared(unsigned long paddr)
>  {
>  	sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_SHARED);
>  }
> +
> +void sev_snp_register_ghcb(unsigned long paddr)

Right and let's prefix SNP-specific functions with "snp_" only so that
it is clear which is which when looking at the code.

> +{
> +	u64 pfn = paddr >> PAGE_SHIFT;
> +	u64 old, val;
> +
> +	if (!sev_snp_enabled())
> +		return;
> +
> +	/* save the old GHCB MSR */
> +	old = sev_es_rd_ghcb_msr();
> +
> +	/* Issue VMGEXIT */

No need for that comment.

> +	sev_es_wr_ghcb_msr(GHCB_REGISTER_GPA_REQ_VAL(pfn));
> +	VMGEXIT();
> +
> +	val = sev_es_rd_ghcb_msr();
> +
> +	/* If the response GPA is not ours then abort the guest */
> +	if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_REGISTER_GPA_RESP) ||
> +	    (GHCB_REGISTER_GPA_RESP_VAL(val) != pfn))
> +		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);

Yet another example where using a specific termination reason could help
with debugging guests. Looking at the GHCB spec, I hope GHCBData[23:16]
is big enough for all reasons. I'm sure it can be extended ofc ...

:-)

> +	/* Restore the GHCB MSR value */
> +	sev_es_wr_ghcb_msr(old);
> +}
> diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
> index f514dad276f2..0523eb21abd7 100644
> --- a/arch/x86/include/asm/sev-snp.h
> +++ b/arch/x86/include/asm/sev-snp.h
> @@ -56,6 +56,13 @@ struct __packed snp_page_state_change {
>  	struct snp_page_state_entry entry[SNP_PAGE_STATE_CHANGE_MAX_ENTRY];
>  };
>  
> +/* GHCB GPA register */
> +#define GHCB_REGISTER_GPA_REQ	0x012UL
> +#define		GHCB_REGISTER_GPA_REQ_VAL(v)		(GHCB_REGISTER_GPA_REQ | ((v) << 12))
> +
> +#define GHCB_REGISTER_GPA_RESP	0x013UL

Let's append "UL" to the other request numbers for consistency.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB
  2021-04-07 11:16       ` Borislav Petkov
@ 2021-04-07 13:35         ` Brijesh Singh
  2021-04-07 14:21           ` Tom Lendacky
  0 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-04-07 13:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 4/7/21 6:16 AM, Borislav Petkov wrote:
> On Tue, Apr 06, 2021 at 10:47:18AM -0500, Brijesh Singh wrote:
>> Before the GHCB is established the caller does not need to save and
>> restore MSRs. The page_state_change() uses the GHCB MSR protocol and it
>> can be called before and after the GHCB is established hence I am saving
>> and restoring GHCB MSRs.
> I think you need to elaborate on that, maybe with an example. What the
> other sites using the GHCB MSR currently do is:
>
> 1. request by writing it
> 2. read the response
>
> None of them save and restore it.
>
> So why here?

The GHCB provides two ways to exit from the guest to the hypervisor: the MSR
protocol and NAEs. The MSR protocol is generally used before the GHCB is
established. After the GHCB is established, the guest typically uses
NAEs. All of the current call sites use the MSR protocol before the
GHCB is established, so they do not need to save and restore the GHCB MSR.
The GHCB is established on the first #VC -
arch/x86/boot/compressed/sev-es.c early_setup_sev_es(). The GHCB page
must be a shared page:

early_setup_sev_es()

  set_page_decrypted()

   sev_snp_set_page_shared()

sev_snp_set_page_shared() is called before the GHCB is established.
While exiting from the decompression stage, sev_es_shutdown_ghcb() is
called to deinit the GHCB.

sev_es_shutdown_ghcb()

  set_page_encrypted()

    sev_snp_set_page_private()

Note that sev_snp_set_page_private() is called after the GHCB is established.

Since both sev_snp_set_page_{shared,private}() use a common
routine to request the page change, I chose the Page State Change
MSR protocol. The page state request can thus happen both before and after
the GHCB is established. We need to save and restore the GHCB MSR, otherwise
we will lose the previously established GHCB GPA.

If needed, we can avoid the save and restore. The GHCB provides a
page state change NAE that can be used after the GHCB is established. If
we go with it, the code may look like this:

1. Read the GHCB MSR to determine whether the GHCB is established.

2. If GHCB is established then use the page state change NAE

3. If GHCB is not established then use the page state change MSR protocol.

We can eliminate the restore, but we still need the rdmsr. The code for
using the page state NAE is going to be a bit larger. Since it is not in
the hot path, I felt we should stick with the MSR protocol for the page
state change.
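The save/restore discipline being debated can be shown with a stubbed GHCB MSR. Here sev_es_rd/wr_ghcb_msr() are modeled as a plain variable and the page-state-change request value and fake response are placeholders, just to make the pattern visible; this is a sketch, not the patch code:

```c
#include <assert.h>
#include <stdint.h>

/* Stub model of the GHCB MSR so the save/restore pattern is visible.
 * In the real code these are rdmsr/wrmsr on MSR_AMD64_SEV_ES_GHCB. */
static uint64_t ghcb_msr;
static uint64_t rd_ghcb_msr(void) { return ghcb_msr; }
static void wr_ghcb_msr(uint64_t v) { ghcb_msr = v; }

/* Hypothetical MSR-protocol exchange: the hypervisor's response
 * overwrites the MSR, which is why the caller must restore it. */
static void vmgexit_psc(uint64_t req) { wr_ghcb_msr(req | 1); /* fake resp */ }

static void page_state_change(uint64_t psc_req)
{
	uint64_t old = rd_ghcb_msr();	/* may hold the established GHCB GPA */

	wr_ghcb_msr(psc_req);
	vmgexit_psc(psc_req);
	(void)rd_ghcb_msr();		/* response would be checked here */

	wr_ghcb_msr(old);		/* restore so the GHCB GPA survives */
}
```

Without the final restore, a page state change performed after GHCB registration would leave the response value in the MSR and lose the registered GPA.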

I am open to suggestions. 

-Brijesh


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB
  2021-04-07 13:35         ` Brijesh Singh
@ 2021-04-07 14:21           ` Tom Lendacky
  2021-04-07 17:15             ` Brijesh Singh
  0 siblings, 1 reply; 52+ messages in thread
From: Tom Lendacky @ 2021-04-07 14:21 UTC (permalink / raw)
  To: Brijesh Singh, Borislav Petkov
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, David Rientjes, Sean Christopherson

On 4/7/21 8:35 AM, Brijesh Singh wrote:
> 
> On 4/7/21 6:16 AM, Borislav Petkov wrote:
>> On Tue, Apr 06, 2021 at 10:47:18AM -0500, Brijesh Singh wrote:
>>> Before the GHCB is established the caller does not need to save and
>>> restore MSRs. The page_state_change() uses the GHCB MSR protocol and it
>>> can be called before and after the GHCB is established hence I am saving
>>> and restoring GHCB MSRs.
>> I think you need to elaborate on that, maybe with an example. What the
>> other sites using the GHCB MSR currently do is:
>>
>> 1. request by writing it
>> 2. read the response
>>
>> None of them save and restore it.
>>
>> So why here?
> 
> The GHCB provides two ways to exit from the guest to the hypervisor: the MSR
> protocol and NAEs. The MSR protocol is generally used before the GHCB is
> established. After the GHCB is established, the guest typically uses
> NAEs. All of the current call sites use the MSR protocol before the
> GHCB is established, so they do not need to save and restore the GHCB MSR.
> The GHCB is established on the first #VC -
> arch/x86/boot/compressed/sev-es.c early_setup_sev_es(). The GHCB page
> must be a shared page:
> 
> early_setup_sev_es()
> 
>   set_page_decrypted()
> 
>    sev_snp_set_page_shared()
> 
> sev_snp_set_page_shared() is called before the GHCB is established.
> While exiting from the decompression stage, sev_es_shutdown_ghcb() is
> called to deinit the GHCB.
> 
> sev_es_shutdown_ghcb()
> 
>   set_page_encrypted()
> 
>     sev_snp_set_page_private()
> 
> Note that sev_snp_set_page_private() is called after the GHCB is established.

I believe the current SEV-ES code always sets the GHCB address in the GHCB
MSR before invoking VMGEXIT, so I think you're safe either way. Worth
testing at least.

Thanks,
Tom

> 
> Since both sev_snp_set_page_{shared,private}() use a common
> routine to request the page change, I chose the Page State Change
> MSR protocol. The page state request can thus happen both before and after
> the GHCB is established. We need to save and restore the GHCB MSR, otherwise
> we will lose the previously established GHCB GPA.
> 
> If needed then we can avoid the save and restore. The GHCB  provides a
> page state change NAE that can be used after the GHCB is established. If
> we go with it then code may look like this:
> 
> 1. Read the GHCB MSR to determine whether the GHCB is established.
> 
> 2. If GHCB is established then use the page state change NAE
> 
> 3. If GHCB is not established then use the page state change MSR protocol.
> 
> We can eliminate the restore, but we still need the rdmsr. The code for
> using the page state NAE is going to be a bit larger. Since it is not in
> the hot path, I felt we should stick with the MSR protocol for the page state change.
> 
> I am open to suggestions. 
> 
> -Brijesh
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB
  2021-04-07 14:21           ` Tom Lendacky
@ 2021-04-07 17:15             ` Brijesh Singh
  0 siblings, 0 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-04-07 17:15 UTC (permalink / raw)
  To: Tom Lendacky, Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, David Rientjes, Sean Christopherson


On 4/7/21 9:21 AM, Tom Lendacky wrote:
> On 4/7/21 8:35 AM, Brijesh Singh wrote:
>> On 4/7/21 6:16 AM, Borislav Petkov wrote:
>>> On Tue, Apr 06, 2021 at 10:47:18AM -0500, Brijesh Singh wrote:
>>>> Before the GHCB is established the caller does not need to save and
>>>> restore MSRs. The page_state_change() uses the GHCB MSR protocol and it
>>>> can be called before and after the GHCB is established hence I am saving
>>>> and restoring GHCB MSRs.
>>> I think you need to elaborate on that, maybe with an example. What the
>>> other sites using the GHCB MSR currently do is:
>>>
>>> 1. request by writing it
>>> 2. read the response
>>>
>>> None of them save and restore it.
>>>
>>> So why here?
>> The GHCB provides two ways to exit from the guest to the hypervisor: the MSR
>> protocol and NAEs. The MSR protocol is generally used before the GHCB is
>> established. After the GHCB is established, the guest typically uses
>> NAEs. All of the current call sites use the MSR protocol before the
>> GHCB is established, so they do not need to save and restore the GHCB MSR.
>> The GHCB is established on the first #VC -
>> arch/x86/boot/compressed/sev-es.c early_setup_sev_es(). The GHCB page
>> must be a shared page:
>>
>> early_setup_sev_es()
>>
>>   set_page_decrypted()
>>
>>    sev_snp_set_page_shared()
>>
>> sev_snp_set_page_shared() is called before the GHCB is established.
>> While exiting from the decompression stage, sev_es_shutdown_ghcb() is
>> called to deinit the GHCB.
>>
>> sev_es_shutdown_ghcb()
>>
>>   set_page_encrypted()
>>
>>     sev_snp_set_page_private()
>>
>> Note that sev_snp_set_page_private() is called after the GHCB is established.
> I believe the current SEV-ES code always sets the GHCB address in the GHCB
> MSR before invoking VMGEXIT, so I think you're safe either way. Worth
> testing at least.


Ah, I didn't realize that the sev_es_ghcb_hv_call() helper sets the GHCB
MSR before invoking VMGEXIT. I should be able to drop the save and
restore during the page state change. Thanks, Tom.



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 07/13] x86/compressed: register GHCB memory when SNP is active
  2021-04-07 11:59   ` Borislav Petkov
@ 2021-04-07 17:34     ` Brijesh Singh
  2021-04-07 17:54       ` Tom Lendacky
  2021-04-08  8:17       ` Borislav Petkov
  0 siblings, 2 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-04-07 17:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 4/7/21 6:59 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:18AM -0500, Brijesh Singh wrote:
>> The SEV-SNP guest is required to perform GHCB GPA registration. This is
> Why does it need to do that? Some additional security so as to not allow
> changing the GHCB once it is established?
>
> I'm guessing that's enforced by the SNP fw and we cannot do that
> retroactively for SEV...? Because it sounds like a nice little thing we
> could do additionally.

The feature is part of GHCB version 2 and is enforced by the
hypervisor. I guess it can be extended for ES. Since this feature
was not available in GHCB version 1 (base ES), should it be presented
as optional for ES?


>
>> because the hypervisor may prefer that a guest use a consistent and/or
>> specific GPA for the GHCB associated with a vCPU. For more information,
>> see the GHCB specification section 2.5.2.
> I think you mean
>
> "2.3.2 GHCB GPA Registration"
>
> Please use the section name too because that doc changes from time to
> time.
>
> Also, you probably should update it here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=206537
>

Yes, the section may have changed since I wrote the description. Noted.
I will refer to the section name.


>> diff --git a/arch/x86/boot/compressed/sev-snp.c b/arch/x86/boot/compressed/sev-snp.c
>> index 5c25103b0df1..a4c5e85699a7 100644
>> --- a/arch/x86/boot/compressed/sev-snp.c
>> +++ b/arch/x86/boot/compressed/sev-snp.c
>> @@ -113,3 +113,29 @@ void sev_snp_set_page_shared(unsigned long paddr)
>>  {
>>  	sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_SHARED);
>>  }
>> +
>> +void sev_snp_register_ghcb(unsigned long paddr)
> Right and let's prefix SNP-specific functions with "snp_" only so that
> it is clear which is which when looking at the code.
>
>> +{
>> +	u64 pfn = paddr >> PAGE_SHIFT;
>> +	u64 old, val;
>> +
>> +	if (!sev_snp_enabled())
>> +		return;
>> +
>> +	/* save the old GHCB MSR */
>> +	old = sev_es_rd_ghcb_msr();
>> +
>> +	/* Issue VMGEXIT */
> No need for that comment.
>
>> +	sev_es_wr_ghcb_msr(GHCB_REGISTER_GPA_REQ_VAL(pfn));
>> +	VMGEXIT();
>> +
>> +	val = sev_es_rd_ghcb_msr();
>> +
>> +	/* If the response GPA is not ours then abort the guest */
>> +	if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_REGISTER_GPA_RESP) ||
>> +	    (GHCB_REGISTER_GPA_RESP_VAL(val) != pfn))
>> +		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
> Yet another example where using a specific termination reason could help
> with debugging guests. Looking at the GHCB spec, I hope GHCBData[23:16]
> is big enough for all reasons. I'm sure it can be extended ofc ...


Maybe we can request that GHCB version 3 add an extended error code.


> :-)
>
>> +	/* Restore the GHCB MSR value */
>> +	sev_es_wr_ghcb_msr(old);
>> +}
>> diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
>> index f514dad276f2..0523eb21abd7 100644
>> --- a/arch/x86/include/asm/sev-snp.h
>> +++ b/arch/x86/include/asm/sev-snp.h
>> @@ -56,6 +56,13 @@ struct __packed snp_page_state_change {
>>  	struct snp_page_state_entry entry[SNP_PAGE_STATE_CHANGE_MAX_ENTRY];
>>  };
>>  
>> +/* GHCB GPA register */
>> +#define GHCB_REGISTER_GPA_REQ	0x012UL
>> +#define		GHCB_REGISTER_GPA_REQ_VAL(v)		(GHCB_REGISTER_GPA_REQ | ((v) << 12))
>> +
>> +#define GHCB_REGISTER_GPA_RESP	0x013UL
> Let's append "UL" to the other request numbers for consistency.
>
> Thx.
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 07/13] x86/compressed: register GHCB memory when SNP is active
  2021-04-07 17:34     ` Brijesh Singh
@ 2021-04-07 17:54       ` Tom Lendacky
  2021-04-08  8:17       ` Borislav Petkov
  1 sibling, 0 replies; 52+ messages in thread
From: Tom Lendacky @ 2021-04-07 17:54 UTC (permalink / raw)
  To: Brijesh Singh, Borislav Petkov
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, David Rientjes, Sean Christopherson

On 4/7/21 12:34 PM, Brijesh Singh wrote:
> 
> On 4/7/21 6:59 AM, Borislav Petkov wrote:
>> On Wed, Mar 24, 2021 at 11:44:18AM -0500, Brijesh Singh wrote:
>>> The SEV-SNP guest is required to perform GHCB GPA registration. This is
>> Why does it need to do that? Some additional security so as to not allow
>> changing the GHCB once it is established?
>>
>> I'm guessing that's enforced by the SNP fw and we cannot do that
>> retroactively for SEV...? Because it sounds like a nice little thing we
>> could do additionally.
> 
> The feature is part of GHCB version 2 and is enforced by the
> hypervisor. I guess it can be extended for ES. Since this feature
> was not available in GHCB version 1 (base ES), should it be presented
> as optional for ES?

GHCB GPA registration is only supported and required for SEV-SNP guests.
The final version of the spec documents that and should be published
within the next few days.

Thanks,
Tom

> 
> 
>>
>>> because the hypervisor may prefer that a guest use a consistent and/or
>>> specific GPA for the GHCB associated with a vCPU. For more information,
>>> see the GHCB specification section 2.5.2.
>> I think you mean
>>
>> "2.3.2 GHCB GPA Registration"
>>
>> Please use the section name too because that doc changes from time to
>> time.
>>
>> Also, you probably should update it here:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
> 
> Yes, the section may have changed since I wrote the description. Noted.
> I will refer to the section name.
> 
> 
>>> diff --git a/arch/x86/boot/compressed/sev-snp.c b/arch/x86/boot/compressed/sev-snp.c
>>> index 5c25103b0df1..a4c5e85699a7 100644
>>> --- a/arch/x86/boot/compressed/sev-snp.c
>>> +++ b/arch/x86/boot/compressed/sev-snp.c
>>> @@ -113,3 +113,29 @@ void sev_snp_set_page_shared(unsigned long paddr)
>>>  {
>>>  	sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_SHARED);
>>>  }
>>> +
>>> +void sev_snp_register_ghcb(unsigned long paddr)
>> Right and let's prefix SNP-specific functions with "snp_" only so that
>> it is clear which is which when looking at the code.
>>
>>> +{
>>> +	u64 pfn = paddr >> PAGE_SHIFT;
>>> +	u64 old, val;
>>> +
>>> +	if (!sev_snp_enabled())
>>> +		return;
>>> +
>>> +	/* save the old GHCB MSR */
>>> +	old = sev_es_rd_ghcb_msr();
>>> +
>>> +	/* Issue VMGEXIT */
>> No need for that comment.
>>
>>> +	sev_es_wr_ghcb_msr(GHCB_REGISTER_GPA_REQ_VAL(pfn));
>>> +	VMGEXIT();
>>> +
>>> +	val = sev_es_rd_ghcb_msr();
>>> +
>>> +	/* If the response GPA is not ours then abort the guest */
>>> +	if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_REGISTER_GPA_RESP) ||
>>> +	    (GHCB_REGISTER_GPA_RESP_VAL(val) != pfn))
>>> +		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
>> Yet another example where using a specific termination reason could help
>> with debugging guests. Looking at the GHCB spec, I hope GHCBData[23:16]
>> is big enough for all reasons. I'm sure it can be extended ofc ...
> 
> 
> Maybe we can request that GHCB version 3 add an extended error code.
> 
> 
>> :-)
>>
>>> +	/* Restore the GHCB MSR value */
>>> +	sev_es_wr_ghcb_msr(old);
>>> +}
>>> diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
>>> index f514dad276f2..0523eb21abd7 100644
>>> --- a/arch/x86/include/asm/sev-snp.h
>>> +++ b/arch/x86/include/asm/sev-snp.h
>>> @@ -56,6 +56,13 @@ struct __packed snp_page_state_change {
>>>  	struct snp_page_state_entry entry[SNP_PAGE_STATE_CHANGE_MAX_ENTRY];
>>>  };
>>>  
>>> +/* GHCB GPA register */
>>> +#define GHCB_REGISTER_GPA_REQ	0x012UL
>>> +#define		GHCB_REGISTER_GPA_REQ_VAL(v)		(GHCB_REGISTER_GPA_REQ | ((v) << 12))
>>> +
>>> +#define GHCB_REGISTER_GPA_RESP	0x013UL
>> Let's append "UL" to the other request numbers for consistency.
>>
>> Thx.
>>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB
  2021-04-07 11:25         ` Borislav Petkov
@ 2021-04-07 19:45           ` Borislav Petkov
  2021-04-08 13:57             ` Tom Lendacky
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-04-07 19:45 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Brijesh Singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, David Rientjes, Sean Christopherson

On Wed, Apr 07, 2021 at 01:25:55PM +0200, Borislav Petkov wrote:
> On Tue, Apr 06, 2021 at 02:42:43PM -0500, Tom Lendacky wrote:
> > The GHCB spec only defines the "0" reason code set. We could provide Linux
> > its own reason code set with some more specific reason codes for
> > failures, if that is needed.
> 
> Why Linux only?
> 
> Don't we want to have a generalized set of error codes which say what
> has happened so that people can debug?

To quote Tom from IRC - and that is perfectly fine too, AFAIC:

<tlendacky> i'm ok with it, but i don't think it should be something dictated by the spec.  the problem is if you want to provide a new error code then the spec has to be updated constantly
<tlendacky> that's why i said, pick a "reason code set" value and say those are what Linux will use. We could probably document them in Documentation/
<tlendacky> the error code thing was an issue when introduced as part of the first spec.  that's why only a small number of reason codes are specified

Yap, makes sense. What we should do in the spec, though, is say: "This
range is for vendor-specific error codes".

Also, is GHCBData[23:16] big enough and can we extend it simply? Or do
we need the spec to at least dictate some ranges so that it can use some bits
above, say, bit 32 or whatever the upper range of the extension is...

Hmmm.
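For reference, the vendor reason-code-set idea amounts to packing two small fields into the termination request value. The sketch below follows my reading of the GHCB MSR termination request layout (request 0x100 in GHCBData[11:0], reason code set in [15:12], reason code in [23:16]); treat the bit positions, the "Linux" set number, and the code name as assumptions, not spec text:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the GHCB MSR termination request encoding. Bit positions
 * are my reading of the spec and should be verified against it. */
#define GHCB_MSR_TERM_REQ	0x100ULL
#define GHCB_MSR_TERM_SET(s)	(((uint64_t)(s) & 0xf) << 12)	/* [15:12] */
#define GHCB_MSR_TERM_CODE(c)	(((uint64_t)(c) & 0xff) << 16)	/* [23:16] */

/* A hypothetical Linux-owned reason code set, as Tom suggests: pick a
 * set number and document its codes in Documentation/. */
#define TERM_SET_LINUX		1
#define TERM_LINUX_PSC_FAIL	0x01	/* page state change failed */

static uint64_t term_val(unsigned int set, unsigned int code)
{
	return GHCB_MSR_TERM_REQ | GHCB_MSR_TERM_SET(set) |
	       GHCB_MSR_TERM_CODE(code);
}
```

With 4 bits of set and 8 bits of code, each vendor set gets 256 reason codes, which is the GHCBData[23:16] width Boris is asking about; extending beyond that would indeed need the spec to carve out higher bits.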

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 07/13] x86/compressed: register GHCB memory when SNP is active
  2021-04-07 17:34     ` Brijesh Singh
  2021-04-07 17:54       ` Tom Lendacky
@ 2021-04-08  8:17       ` Borislav Petkov
  1 sibling, 0 replies; 52+ messages in thread
From: Borislav Petkov @ 2021-04-08  8:17 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Wed, Apr 07, 2021 at 12:34:59PM -0500, Brijesh Singh wrote:
> The feature is part of GHCB version 2 and is enforced by the
> hypervisor. I guess it can be extended for ES. Since this feature
> was not available in GHCB version 1 (base ES), should it be presented
> as optional for ES?

Yeah, it probably is not worth the effort. If an attacker controls the
guest kernel, then it can re-register a new GHCB so it doesn't really
matter.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 08/13] x86/sev-es: register GHCB memory when SEV-SNP is active
  2021-03-24 16:44 ` [RFC Part1 PATCH 08/13] x86/sev-es: register GHCB memory when SEV-SNP " Brijesh Singh
@ 2021-04-08  8:38   ` Borislav Petkov
  0 siblings, 0 replies; 52+ messages in thread
From: Borislav Petkov @ 2021-04-08  8:38 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Wed, Mar 24, 2021 at 11:44:19AM -0500, Brijesh Singh wrote:
> @@ -88,6 +89,13 @@ struct sev_es_runtime_data {
>  	 * is currently unsupported in SEV-ES guests.
>  	 */
>  	unsigned long dr7;
> +
> +	/*
> +	 * SEV-SNP requires that the GHCB must be registered before using it.
> +	 * The flag below will indicate whether the GHCB is registered, if its
> +	 * not registered then sev_es_get_ghcb() will perform the registration.
> +	 */
> +	bool ghcb_registered;

snp_ghcb_registered

because it is SNP-specific.

>  };
>  
>  struct ghcb_state {
> @@ -196,6 +204,12 @@ static __always_inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state)
>  		data->ghcb_active = true;
>  	}
>  
> +	/* SEV-SNP guest requires that GHCB must be registered before using it. */
> +	if (sev_snp_active() && !data->ghcb_registered) {
> +		sev_snp_register_ghcb(__pa(ghcb));
> +		data->ghcb_registered = true;

This needs to be set to true in the function itself, in the success
case.
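Boris's suggestion, setting the flag inside the registration helper only on success, could be shaped like this. The MSR helpers, the VMGEXIT, and the response check are all stubbed simplifications so the control flow stands alone; none of this is the actual patch code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stubbed environment so the control flow can be shown standalone. */
static uint64_t ghcb_msr;
static bool registration_accepted;	/* does the hypervisor accept? */
static uint64_t rd_ghcb_msr(void) { return ghcb_msr; }
static void wr_ghcb_msr(uint64_t v) { ghcb_msr = v; }
static void vmgexit(void) { if (registration_accepted) ghcb_msr |= 1; }

static bool ghcb_registered;

/* The registration helper owns the flag: it is only set to true after
 * the (simplified) response check passes, per the review comment. */
static void snp_register_ghcb(uint64_t pfn)
{
	wr_ghcb_msr(pfn << 12);		/* hypothetical GPA request encoding */
	vmgexit();
	if (!(rd_ghcb_msr() & 1))	/* stand-in for the real resp check */
		return;			/* real code would terminate here */
	ghcb_registered = true;
}
```

The caller then never touches the flag, which removes the window where the flag claims registration succeeded before the response was actually verified.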

> +static inline u64 sev_es_rd_ghcb_msr(void)
> +{
> +	return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
> +}
> +
> +static inline void sev_es_wr_ghcb_msr(u64 val)
> +{
> +	u32 low, high;
> +
> +	low  = (u32)(val);
> +	high = (u32)(val >> 32);
> +
> +	native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
> +}

Those copies will go away once you create the common sev.c

> +
> +/* Provides sev_es_terminate() */
> +#include "sev-common-shared.c"
> +
> +void sev_snp_register_ghcb(unsigned long paddr)
> +{
> +	u64 pfn = paddr >> PAGE_SHIFT;
> +	u64 old, val;
> +
> +	/* save the old GHCB MSR */
> +	old = sev_es_rd_ghcb_msr();
> +
> +	/* Issue VMGEXIT */
> +	sev_es_wr_ghcb_msr(GHCB_REGISTER_GPA_REQ_VAL(pfn));
> +	VMGEXIT();
> +
> +	val = sev_es_rd_ghcb_msr();
> +
> +	/* If the response GPA is not ours then abort the guest */
> +	if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_REGISTER_GPA_RESP) ||
> +	    (GHCB_REGISTER_GPA_RESP_VAL(val) != pfn))
> +		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
> +
> +	/* Restore the GHCB MSR value */
> +	sev_es_wr_ghcb_msr(old);
> +}

This is an almost identical copy of the version in compressed/. Move to
the shared file?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 09/13] x86/kernel: add support to validate memory in early enc attribute change
  2021-03-24 16:44 ` [RFC Part1 PATCH 09/13] x86/kernel: add support to validate memory in early enc attribute change Brijesh Singh
@ 2021-04-08 11:40   ` Borislav Petkov
  2021-04-08 12:25     ` Brijesh Singh
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-04-08 11:40 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Wed, Mar 24, 2021 at 11:44:20AM -0500, Brijesh Singh wrote:
> @@ -63,6 +63,10 @@ struct __packed snp_page_state_change {
>  #define GHCB_REGISTER_GPA_RESP	0x013UL
>  #define		GHCB_REGISTER_GPA_RESP_VAL(val)		((val) >> 12)
>  
> +/* Macro to convert the x86 page level to the RMP level and vice versa */
> +#define X86_RMP_PG_LEVEL(level)	(((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
> +#define RMP_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)

Please add those with the patch which uses them for the first time.

Also, it seems to me the names should be

X86_TO_RMP_PG_LEVEL
RMP_TO_X86_PG_LEVEL

...

> @@ -56,3 +56,108 @@ void sev_snp_register_ghcb(unsigned long paddr)
>  	/* Restore the GHCB MSR value */
>  	sev_es_wr_ghcb_msr(old);
>  }
> +
> +static void sev_snp_issue_pvalidate(unsigned long vaddr, unsigned int npages, bool validate)

pvalidate_pages() I guess.

> +{
> +	unsigned long eflags, vaddr_end, vaddr_next;
> +	int rc;
> +
> +	vaddr = vaddr & PAGE_MASK;
> +	vaddr_end = vaddr + (npages << PAGE_SHIFT);
> +
> +	for (; vaddr < vaddr_end; vaddr = vaddr_next) {

Yuck, that vaddr_next gets initialized at the end of the loop. How about
using a while loop here instead?

	while (vaddr < vaddr_end) {

		...

		vaddr += PAGE_SIZE;
	}

then you don't need vaddr_next at all. Ditto for all the other loops in
this patch which iterate over pages.

> +		rc = __pvalidate(vaddr, RMP_PG_SIZE_4K, validate, &eflags);

So this function gets only 4K pages to pvalidate?

> +

^ Superfluous newline.

> +		if (rc) {
> +			pr_err("Failed to validate address 0x%lx ret %d\n", vaddr, rc);

You can combine the pr_err and dump_stack() below into a WARN() here:

		WARN(rc, ...);

> +			goto e_fail;
> +		}
> +
> +		/* Check for the double validation condition */
> +		if (eflags & X86_EFLAGS_CF) {
> +			pr_err("Double %salidation detected (address 0x%lx)\n",
> +					validate ? "v" : "inv", vaddr);
> +			goto e_fail;
> +		}

As before - this should be communicated by a special retval from
__pvalidate().

> +
> +		vaddr_next = vaddr + PAGE_SIZE;
> +	}
> +
> +	return;
> +
> +e_fail:
> +	/* Dump stack for the debugging purpose */
> +	dump_stack();
> +
> +	/* Ask to terminate the guest */
> +	sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);

Another termination reason to #define.

> +}
> +
> +static void __init early_snp_set_page_state(unsigned long paddr, unsigned int npages, int op)
> +{
> +	unsigned long paddr_end, paddr_next;
> +	u64 old, val;
> +
> +	paddr = paddr & PAGE_MASK;
> +	paddr_end = paddr + (npages << PAGE_SHIFT);
> +
> +	/* save the old GHCB MSR */
> +	old = sev_es_rd_ghcb_msr();
> +
> +	for (; paddr < paddr_end; paddr = paddr_next) {
> +
> +		/*
> +		 * Use the MSR protocol VMGEXIT to request the page state change. We use the MSR
> +		 * protocol VMGEXIT because in early boot we may not have the full GHCB setup
> +		 * yet.
> +		 */
> +		sev_es_wr_ghcb_msr(GHCB_SNP_PAGE_STATE_REQ_GFN(paddr >> PAGE_SHIFT, op));
> +		VMGEXIT();

Yeah, I know we don't always strictly adhere to 80 columns but there's
no real need not to fit that in 80 cols here so please shorten names and
comments. Ditto for the rest.

> +
> +		val = sev_es_rd_ghcb_msr();
> +
> +		/* Read the response, if the page state change failed then terminate the guest. */
> +		if (GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SNP_PAGE_STATE_CHANGE_RESP)
> +			sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);

if (...)
	goto fail;

and add that fail label at the end so that all the error handling path
is out of the way.

> +
> +		if (GHCB_SNP_PAGE_STATE_RESP_VAL(val) != 0) {

s/!= 0//

> +			pr_err("Failed to change page state to '%s' paddr 0x%lx error 0x%llx\n",
> +					op == SNP_PAGE_STATE_PRIVATE ? "private" : "shared",
> +					paddr, GHCB_SNP_PAGE_STATE_RESP_VAL(val));
> +
> +			/* Dump stack for the debugging purpose */
> +			dump_stack();

WARN as above.

> @@ -49,6 +50,27 @@ bool sev_enabled __section(".data");
>  /* Buffer used for early in-place encryption by BSP, no locking needed */
>  static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
>  
> +/*
> + * When SNP is active, this routine changes the page state from private to shared before
> + * copying the data from the source to destination and restore after the copy. This is required
> + * because the source address is mapped as decrypted by the caller of the routine.
> + */
> +static inline void __init snp_aware_memcpy(void *dst, void *src, size_t sz,
> +					   unsigned long paddr, bool dec)

snp_memcpy() simply.

> +{
> +	unsigned long npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
> +
> +	/* If the paddr need to accessed decrypted, make the page shared before memcpy. */

*needs*

> +	if (sev_snp_active() && dec)

Flip that test so that you don't have it twice in the code:

	if (!sev_snp_active()) {
		memcpy(dst, src, sz);
	} else {
		...
	}


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [RFC Part1 PATCH 09/13] x86/kernel: add support to validate memory in early enc attribute change
  2021-04-08 11:40   ` Borislav Petkov
@ 2021-04-08 12:25     ` Brijesh Singh
  0 siblings, 0 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-04-08 12:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 4/8/21 6:40 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:20AM -0500, Brijesh Singh wrote:
>> @@ -63,6 +63,10 @@ struct __packed snp_page_state_change {
>>  #define GHCB_REGISTER_GPA_RESP	0x013UL
>>  #define		GHCB_REGISTER_GPA_RESP_VAL(val)		((val) >> 12)
>>  
>> +/* Macro to convert the x86 page level to the RMP level and vice versa */
>> +#define X86_RMP_PG_LEVEL(level)	(((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
>> +#define RMP_X86_PG_LEVEL(level)	(((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
> Please add those with the patch which uses them for the first time.
>
> Also, it seems to me the names should be
>
> X86_TO_RMP_PG_LEVEL
> RMP_TO_X86_PG_LEVEL
>
> ...

Noted.

>> @@ -56,3 +56,108 @@ void sev_snp_register_ghcb(unsigned long paddr)
>>  	/* Restore the GHCB MSR value */
>>  	sev_es_wr_ghcb_msr(old);
>>  }
>> +
>> +static void sev_snp_issue_pvalidate(unsigned long vaddr, unsigned int npages, bool validate)
> pvalidate_pages() I guess.

Noted.

>
>> +{
>> +	unsigned long eflags, vaddr_end, vaddr_next;
>> +	int rc;
>> +
>> +	vaddr = vaddr & PAGE_MASK;
>> +	vaddr_end = vaddr + (npages << PAGE_SHIFT);
>> +
>> +	for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> Yuck, that vaddr_next gets initialized at the end of the loop. How about
> using a while loop here instead?
>
> 	while (vaddr < vaddr_end) {
>
> 		...
>
> 		vaddr += PAGE_SIZE;
> 	}
>
> then you don't need vaddr_next at all. Ditto for all the other loops in
> this patch which iterate over pages.
Yes, I will switch to use a while loop() in next rev.
>
>> +		rc = __pvalidate(vaddr, RMP_PG_SIZE_4K, validate, &eflags);
> So this function gets only 4K pages to pvalidate?

The early routines use the GHCB MSR protocol for the validation. The
GHCB MSR protocol supports 4K pages only. The early routines can be
called before the GHCB is established.


>
>> +
> ^ Superfluous newline.
Noted.
>> +		if (rc) {
>> +			pr_err("Failed to validate address 0x%lx ret %d\n", vaddr, rc);
> You can combine the pr_err and dump_stack() below into a WARN() here:
>
> 		WARN(rc, ...);
Noted.
>> +			goto e_fail;
>> +		}
>> +
>> +		/* Check for the double validation condition */
>> +		if (eflags & X86_EFLAGS_CF) {
>> +			pr_err("Double %salidation detected (address 0x%lx)\n",
>> +					validate ? "v" : "inv", vaddr);
>> +			goto e_fail;
>> +		}
> As before - this should be communicated by a special retval from
> __pvalidate().
Yes.
>
>> +
>> +		vaddr_next = vaddr + PAGE_SIZE;
>> +	}
>> +
>> +	return;
>> +
>> +e_fail:
>> +	/* Dump stack for the debugging purpose */
>> +	dump_stack();
>> +
>> +	/* Ask to terminate the guest */
>> +	sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
> Another termination reason to #define.
>
>> +}
>> +
>> +static void __init early_snp_set_page_state(unsigned long paddr, unsigned int npages, int op)
>> +{
>> +	unsigned long paddr_end, paddr_next;
>> +	u64 old, val;
>> +
>> +	paddr = paddr & PAGE_MASK;
>> +	paddr_end = paddr + (npages << PAGE_SHIFT);
>> +
>> +	/* save the old GHCB MSR */
>> +	old = sev_es_rd_ghcb_msr();
>> +
>> +	for (; paddr < paddr_end; paddr = paddr_next) {
>> +
>> +		/*
>> +		 * Use the MSR protocol VMGEXIT to request the page state change. We use the MSR
>> +		 * protocol VMGEXIT because in early boot we may not have the full GHCB setup
>> +		 * yet.
>> +		 */
>> +		sev_es_wr_ghcb_msr(GHCB_SNP_PAGE_STATE_REQ_GFN(paddr >> PAGE_SHIFT, op));
>> +		VMGEXIT();
> Yeah, I know we don't always strictly adhere to 80 columns but there's
> no real need not to fit that in 80 cols here so please shorten names and
> comments. Ditto for the rest.
Noted.
>
>> +
>> +		val = sev_es_rd_ghcb_msr();
>> +
>> +		/* Read the response, if the page state change failed then terminate the guest. */
>> +		if (GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SNP_PAGE_STATE_CHANGE_RESP)
>> +			sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
> if (...)
> 	goto fail;
>
> and add that fail label at the end so that all the error handling path
> is out of the way.
Noted.
>
>> +
>> +		if (GHCB_SNP_PAGE_STATE_RESP_VAL(val) != 0) {
> s/!= 0//
Noted.
>
>> +			pr_err("Failed to change page state to '%s' paddr 0x%lx error 0x%llx\n",
>> +					op == SNP_PAGE_STATE_PRIVATE ? "private" : "shared",
>> +					paddr, GHCB_SNP_PAGE_STATE_RESP_VAL(val));
>> +
>> +			/* Dump stack for the debugging purpose */
>> +			dump_stack();
> WARN as above.
Noted.
>
>> @@ -49,6 +50,27 @@ bool sev_enabled __section(".data");
>>  /* Buffer used for early in-place encryption by BSP, no locking needed */
>>  static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
>>  
>> +/*
>> + * When SNP is active, this routine changes the page state from private to shared before
>> + * copying the data from the source to destination and restore after the copy. This is required
>> + * because the source address is mapped as decrypted by the caller of the routine.
>> + */
>> +static inline void __init snp_aware_memcpy(void *dst, void *src, size_t sz,
>> +					   unsigned long paddr, bool dec)
> snp_memcpy() simply.
Noted.
>
>> +{
>> +	unsigned long npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
>> +
>> +	/* If the paddr need to accessed decrypted, make the page shared before memcpy. */
> *needs*
>
>> +	if (sev_snp_active() && dec)
> Flip that test so that you don't have it twice in the code:
>
> 	if (!sev_snp_active()) {
> 		memcpy(dst, src, sz);
> 	} else {
> 		...
> 	}
>
>
I will look into it. thanks


* Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB
  2021-04-07 19:45           ` Borislav Petkov
@ 2021-04-08 13:57             ` Tom Lendacky
  0 siblings, 0 replies; 52+ messages in thread
From: Tom Lendacky @ 2021-04-08 13:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Brijesh Singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, David Rientjes, Sean Christopherson

On 4/7/21 2:45 PM, Borislav Petkov wrote:
> On Wed, Apr 07, 2021 at 01:25:55PM +0200, Borislav Petkov wrote:
>> On Tue, Apr 06, 2021 at 02:42:43PM -0500, Tom Lendacky wrote:
>>> The GHCB spec only defines the "0" reason code set. We could provide Linux
>>> it's own reason code set with some more specific reason codes for
>>> failures, if that is needed.
>>
>> Why Linux only?
>>
>> Don't we want to have a generalized set of error codes which say what
>> has happened so that people can debug?
> 
> To quote Tom from IRC - and that is perfectly fine too, AFAIC:
> 
> <tlendacky> i'm ok with it, but i don't think it should be something dictated by the spec.  the problem is if you want to provide a new error code then the spec has to be updated constantly
> <tlendacky> that's why i said, pick a "reason code set" value and say those are what Linux will use. We could probably document them in Documentation/
> <tlendacky> the error code thing was an issue when introduced as part of the first spec.  that's why only a small number of reason codes are specified
> 
> Yap, makes sense. What we should do in the spec, though, is say: "This
> range is for vendor-specific error codes".
> 
> Also, is GHCBData[23:16] big enough and can we extend it simply? Or do
> we need the spec to at least dictate some ranges so that it can use some bits
> above, say, bit 32 or whatever the upper range of the extension is...

Hopefully we won't have 255 different reason codes. But if we get to that
point we should be able to expand the reason code field to 16-bits. Just
need to be sure that if any new fields are added between now and then,
they are added at bit 32 or above.

Thanks,
Tom

> 
> Hmmm.
> 


* Re: [RFC Part1 PATCH 11/13] x86/kernel: validate rom memory before accessing when SEV-SNP is active
  2021-03-24 16:44 ` [RFC Part1 PATCH 11/13] x86/kernel: validate rom memory before accessing when SEV-SNP is active Brijesh Singh
@ 2021-04-09 16:53   ` Borislav Petkov
  2021-04-09 17:40     ` Brijesh Singh
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-04-09 16:53 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Wed, Mar 24, 2021 at 11:44:22AM -0500, Brijesh Singh wrote:
> +	/*
> +	 * The ROM memory is not part of the E820 system RAM and is not prevalidated by the BIOS.
> +	 * The kernel page table maps the ROM region as encrypted memory, the SEV-SNP requires
> +	 * the all the encrypted memory must be validated before the access.
> +	 */
> +	if (sev_snp_active()) {
> +		unsigned long n, paddr;
> +
> +		n = ((system_rom_resource.end + 1) - video_rom_resource.start) >> PAGE_SHIFT;
> +		paddr = video_rom_resource.start;
> +		early_snp_set_memory_private((unsigned long)__va(paddr), paddr, n);
> +	}

I don't like this sprinkling of SNP-special stuff that needs to be done,
around the tree. Instead, pls define a function called

	snp_prep_memory(unsigned long pa, unsigned int num_pages, enum operation);

or so which does all the manipulation needed and the callsites only
simply unconditionally call that function so that all detail is
extracted and optimized away when not config-enabled.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [RFC Part1 PATCH 11/13] x86/kernel: validate rom memory before accessing when SEV-SNP is active
  2021-04-09 16:53   ` Borislav Petkov
@ 2021-04-09 17:40     ` Brijesh Singh
  0 siblings, 0 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-04-09 17:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 4/9/21 11:53 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:22AM -0500, Brijesh Singh wrote:
>> +	/*
>> +	 * The ROM memory is not part of the E820 system RAM and is not prevalidated by the BIOS.
>> +	 * The kernel page table maps the ROM region as encrypted memory, the SEV-SNP requires
>> +	 * the all the encrypted memory must be validated before the access.
>> +	 */
>> +	if (sev_snp_active()) {
>> +		unsigned long n, paddr;
>> +
>> +		n = ((system_rom_resource.end + 1) - video_rom_resource.start) >> PAGE_SHIFT;
>> +		paddr = video_rom_resource.start;
>> +		early_snp_set_memory_private((unsigned long)__va(paddr), paddr, n);
>> +	}
> I don't like this sprinkling of SNP-special stuff that needs to be done,
> around the tree. Instead, pls define a function called
>
> 	snp_prep_memory(unsigned long pa, unsigned int num_pages, enum operation);
>
> or so which does all the manipulation needed and the callsites only
> simply unconditionally call that function so that all detail is
> extracted and optimized away when not config-enabled.


Sure, I will do this in the next rev.


>
> Thx.
>


* Re: [RFC Part1 PATCH 13/13] x86/kernel: add support to validate memory when changing C-bit
  2021-03-24 16:44 ` [RFC Part1 PATCH 13/13] x86/kernel: add support to validate memory when changing C-bit Brijesh Singh
@ 2021-04-12 11:49   ` Borislav Petkov
  2021-04-12 12:55     ` Brijesh Singh
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-04-12 11:49 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Wed, Mar 24, 2021 at 11:44:24AM -0500, Brijesh Singh wrote:
> @@ -161,3 +162,108 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
>  	 /* Ask hypervisor to make the memory shared in the RMP table. */
>  	early_snp_set_page_state(paddr, npages, SNP_PAGE_STATE_SHARED);
>  }
> +
> +static int snp_page_state_vmgexit(struct ghcb *ghcb, struct snp_page_state_change *data)

That function name definitely needs changing. The
vmgexit_page_state_change() one too. They're currently confusing as hell
and I can't know what each one does without looking at its function
body.

> +{
> +	struct snp_page_state_header *hdr;
> +	int ret = 0;
> +
> +	hdr = &data->header;
> +
> +	/*
> +	 * The hypervisor can return before processing all the entries, the loop below retries
> +	 * until all the entries are processed.
> +	 */
> +	while (hdr->cur_entry <= hdr->end_entry) {

This doesn't make any sense: snp_set_page_state() builds a "set" of
pages to change their state in a loop and this one iterates *again* over
*something* which I'm not even clear on what.

Is something setting cur_entry to end_entry eventually?

In any case, why not issue those page state changes one-by-one in
snp_set_page_state() or is it possible that HV can do a couple of
them in one go so you have to poke it here until it sets cur_entry ==
end_entry?

> +		ghcb_set_sw_scratch(ghcb, (u64)__pa(data));

Why do you have to call that here for every loop iteration...

> +		ret = vmgexit_page_state_change(ghcb, data);

.... and in that function too?!

> +		/* Page State Change VMGEXIT can pass error code through exit_info_2. */
> +		if (ret || ghcb->save.sw_exit_info_2)
> +			break;
> +	}
> +
> +	return ret;

You don't need that ret variable - just return value directly.

> +}
> +
> +static void snp_set_page_state(unsigned long paddr, unsigned int npages, int op)
> +{
> +	unsigned long paddr_end, paddr_next;
> +	struct snp_page_state_change *data;
> +	struct snp_page_state_header *hdr;
> +	struct snp_page_state_entry *e;
> +	struct ghcb_state state;
> +	struct ghcb *ghcb;
> +	int ret, idx;
> +
> +	paddr = paddr & PAGE_MASK;
> +	paddr_end = paddr + (npages << PAGE_SHIFT);
> +
> +	ghcb = sev_es_get_ghcb(&state);

That function can return NULL.

> +	data = (struct snp_page_state_change *)ghcb->shared_buffer;
> +	hdr = &data->header;
> +	e = &(data->entry[0]);

So

	e = data->entry;

?

> +	memset(data, 0, sizeof (*data));
> +
> +	for (idx = 0; paddr < paddr_end; paddr = paddr_next) {

As before, a while loop pls.

> +		int level = PG_LEVEL_4K;

Why does this needs to happen on each loop iteration? It looks to me you
wanna do below:

	e->pagesize = X86_RMP_PG_LEVEL(PG_LEVEL_4K);

instead.

> +
> +		/* If we cannot fit more request then issue VMGEXIT before going further.  */
				   any more requests

No "we" pls.


> +		if (hdr->end_entry == (SNP_PAGE_STATE_CHANGE_MAX_ENTRY - 1)) {
> +			ret = snp_page_state_vmgexit(ghcb, data);
> +			if (ret)
> +				goto e_fail;

WARN

> +
> +			idx = 0;
> +			memset(data, 0, sizeof (*data));
> +			e = &(data->entry[0]);
> +		}

The order of the operations in this function looks weird: what you
should do is:

	- clear stuff, memset etc.
	- build shared buffer
	- issue vmgexit

so that you don't have the memset and e reinit twice and the flow can
be more clear and you don't have two snp_page_state_vmgexit() function
calls when there's a trailing set of entries which haven't reached
SNP_PAGE_STATE_CHANGE_MAX_ENTRY.

Maybe a double-loop or so.

...

> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 16f878c26667..19ee18ddbc37 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -27,6 +27,8 @@
>  #include <asm/proto.h>
>  #include <asm/memtype.h>
>  #include <asm/set_memory.h>
> +#include <asm/mem_encrypt.h>
> +#include <asm/sev-snp.h>
>  
>  #include "../mm_internal.h"
>  
> @@ -2001,8 +2003,25 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>  	 */
>  	cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
>  
> +	/*
> +	 * To maintain the security gurantees of SEV-SNP guest invalidate the memory before
> +	 * clearing the encryption attribute.
> +	 */

Align that comment on 80 cols, just like the rest of the comments do in
this file. Below too.

> +	if (sev_snp_active() && !enc) {

Push that sev_snp_active() inside the function. Below too.

> +		ret = snp_set_memory_shared(addr, numpages);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	ret = __change_page_attr_set_clr(&cpa, 1);
>  
> +	/*
> +	 * Now that memory is mapped encrypted in the page table, validate the memory range before
> +	 * we return from here.
> +	 */
> +	if (!ret && sev_snp_active() && enc)
> +		ret = snp_set_memory_private(addr, numpages);
> +
>  	/*
>  	 * After changing the encryption attribute, we need to flush TLBs again
>  	 * in case any speculative TLB caching occurred (but no need to flush

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [RFC Part1 PATCH 13/13] x86/kernel: add support to validate memory when changing C-bit
  2021-04-12 11:49   ` Borislav Petkov
@ 2021-04-12 12:55     ` Brijesh Singh
  2021-04-12 13:05       ` Borislav Petkov
  0 siblings, 1 reply; 52+ messages in thread
From: Brijesh Singh @ 2021-04-12 12:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 4/12/21 6:49 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:24AM -0500, Brijesh Singh wrote:
>> @@ -161,3 +162,108 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
>>  	 /* Ask hypervisor to make the memory shared in the RMP table. */
>>  	early_snp_set_page_state(paddr, npages, SNP_PAGE_STATE_SHARED);
>>  }
>> +
>> +static int snp_page_state_vmgexit(struct ghcb *ghcb, struct snp_page_state_change *data)
> That function name definitely needs changing. The
> vmgexit_page_state_change() one too. They're currenty confusing as hell
> and I can't know what each one does without looking at its function
> body.
>
>> +{
>> +	struct snp_page_state_header *hdr;
>> +	int ret = 0;
>> +
>> +	hdr = &data->header;
>> +
>> +	/*
>> +	 * The hypervisor can return before processing all the entries, the loop below retries
>> +	 * until all the entries are processed.
>> +	 */
>> +	while (hdr->cur_entry <= hdr->end_entry) {
> This doesn't make any sense: snp_set_page_state() builds a "set" of
> pages to change their state in a loop and this one iterates *again* over
> *something* which I'm not even clear on what.
>
> Is something setting cur_entry to end_entry eventually?
>
> In any case, why not issue those page state changes one-by-one in
> snp_set_page_state() or is it possible that HV can do a couple of
> them in one go so you have to poke it here until it sets cur_entry ==
> end_entry?


The cur_entry is updated by the hypervisor. While building the psc
buffer the guest sets cur_entry=0 and end_entry points to the last
valid entry. The cur_entry is incremented by the hypervisor after it
successfully processes one 4K page. As per the spec, the hypervisor
could get interrupted in the middle of the page state change, and
cur_entry allows the guest to resume the page state change from the
point where it was interrupted.

>
>> +		ghcb_set_sw_scratch(ghcb, (u64)__pa(data));


Since we can get interrupted while executing the PSC, just to be safe
I re-initialize the scratch area with our buffer instead of relying on
old values.


> Why do you have to call that here for every loop iteration...
>
>> +		ret = vmgexit_page_state_change(ghcb, data);


As per the spec, the caller must check that cur_entry > end_entry to
determine whether all the entries have been processed. If not, it must
retry the state change; the hypervisor will skip the previously
processed entries. The snp_page_state_vmgexit() is implemented to
return only after all the entries are changed.


> .... and in that function too?!
>
>> +		/* Page State Change VMGEXIT can pass error code through exit_info_2. */
>> +		if (ret || ghcb->save.sw_exit_info_2)
>> +			break;
>> +	}
>> +
>> +	return ret;
> You don't need that ret variable - just return value directly.


Noted.

>
>> +}
>> +
>> +static void snp_set_page_state(unsigned long paddr, unsigned int npages, int op)
>> +{
>> +	unsigned long paddr_end, paddr_next;
>> +	struct snp_page_state_change *data;
>> +	struct snp_page_state_header *hdr;
>> +	struct snp_page_state_entry *e;
>> +	struct ghcb_state state;
>> +	struct ghcb *ghcb;
>> +	int ret, idx;
>> +
>> +	paddr = paddr & PAGE_MASK;
>> +	paddr_end = paddr + (npages << PAGE_SHIFT);
>> +
>> +	ghcb = sev_es_get_ghcb(&state);
> That function can return NULL.


Ah good point. Will fix in next rev.

>
>> +	data = (struct snp_page_state_change *)ghcb->shared_buffer;
>> +	hdr = &data->header;
>> +	e = &(data->entry[0]);
> So
>
> 	e = data->entry;
>
> ?


Sure I can do that. It reads better that way.


>> +	memset(data, 0, sizeof (*data));
>> +
>> +	for (idx = 0; paddr < paddr_end; paddr = paddr_next) {
> As before, a while loop pls.


Noted.

>
>> +		int level = PG_LEVEL_4K;
> Why does this needs to happen on each loop iteration? It looks to me you
> wanna do below:
>
> 	e->pagesize = X86_RMP_PG_LEVEL(PG_LEVEL_4K);
>
> instead.


Noted. I will remove the local variable.


>> +
>> +		/* If we cannot fit more request then issue VMGEXIT before going further.  */
> 				   any more requests
>
> No "we" pls.


Noted.

>
>> +		if (hdr->end_entry == (SNP_PAGE_STATE_CHANGE_MAX_ENTRY - 1)) {
>> +			ret = snp_page_state_vmgexit(ghcb, data);
>> +			if (ret)
>> +				goto e_fail;
> WARN


Based on your feedback on the previous patches, I am going to replace
it with a WARN() followed by a special termination code to indicate
that the guest failed to change the page state.


>
>> +
>> +			idx = 0;
>> +			memset(data, 0, sizeof (*data));
>> +			e = &(data->entry[0]);
>> +		}
> The order of the operations in this function looks weird: what you
> should do is:
>
> 	- clear stuff, memset etc.
> 	- build shared buffer
> 	- issue vmgexit
>
> so that you don't have the memset and e reinit twice and the flow can
> be more clear and you don't have two snp_page_state_vmgexit() function
> calls when there's a trailing set of entries which haven't reached
> SNP_PAGE_STATE_CHANGE_MAX_ENTRY.
>
> Maybe a double-loop or so.


Yes, with a double loop I can rearrange it a bit better. I will look
into it for v2. Thanks.


> ...
>
>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>> index 16f878c26667..19ee18ddbc37 100644
>> --- a/arch/x86/mm/pat/set_memory.c
>> +++ b/arch/x86/mm/pat/set_memory.c
>> @@ -27,6 +27,8 @@
>>  #include <asm/proto.h>
>>  #include <asm/memtype.h>
>>  #include <asm/set_memory.h>
>> +#include <asm/mem_encrypt.h>
>> +#include <asm/sev-snp.h>
>>  
>>  #include "../mm_internal.h"
>>  
>> @@ -2001,8 +2003,25 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>>  	 */
>>  	cpa_flush(&cpa, !this_cpu_has(X86_FEATURE_SME_COHERENT));
>>  
>> +	/*
>> +	 * To maintain the security gurantees of SEV-SNP guest invalidate the memory before
>> +	 * clearing the encryption attribute.
>> +	 */
> Align that comment on 80 cols, just like the rest of the comments do in
> this file. Below too.
>

Noted.

>> +	if (sev_snp_active() && !enc) {
> Push that sev_snp_active() inside the function. Below too.


Noted.

>
>> +		ret = snp_set_memory_shared(addr, numpages);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>>  	ret = __change_page_attr_set_clr(&cpa, 1);
>>  
>> +	/*
>> +	 * Now that memory is mapped encrypted in the page table, validate the memory range before
>> +	 * we return from here.
>> +	 */
>> +	if (!ret && sev_snp_active() && enc)
>> +		ret = snp_set_memory_private(addr, numpages);
>> +
>>  	/*
>>  	 * After changing the encryption attribute, we need to flush TLBs again
>>  	 * in case any speculative TLB caching occurred (but no need to flush


* Re: [RFC Part1 PATCH 13/13] x86/kernel: add support to validate memory when changing C-bit
  2021-04-12 12:55     ` Brijesh Singh
@ 2021-04-12 13:05       ` Borislav Petkov
  2021-04-12 14:31         ` Brijesh Singh
  0 siblings, 1 reply; 52+ messages in thread
From: Borislav Petkov @ 2021-04-12 13:05 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: linux-kernel, x86, kvm, ak, Thomas Gleixner, Ingo Molnar,
	Joerg Roedel, H. Peter Anvin, Tony Luck, Dave Hansen,
	Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson

On Mon, Apr 12, 2021 at 07:55:01AM -0500, Brijesh Singh wrote:
> The cur_entry is updated by the hypervisor. While building the psc
> buffer, the guest sets cur_entry=0 and end_entry points to the
> last valid entry. The cur_entry is incremented by the hypervisor after
> it successfully processes one 4K page. As per the spec, the hypervisor
> could get interrupted in the middle of the page state change, and cur_entry
> allows the guest to resume the page state change from the point where it
> was interrupted.

This is non-obvious and belongs in a comment above it. Otherwise it
looks weird.

> Since we can get interrupted while executing the PSC, just to be safe
> I re-initialized the scratch area with our buffer instead of
> relying on old values.

Ditto.

> As per the spec, the caller must check that cur_entry > end_entry to
> determine whether all the entries are processed. If not, it retries the
> state change; the hypervisor will skip the previously processed entries.
> The snp_page_state_vmgexit() is implemented to return only after all the
> entries are changed.

Ditto.

This whole mechanism of what the guest does and what the HV does, needs
to be explained in a bigger comment somewhere around there so that it is
clear what's going on.
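For illustration only, the guest/hypervisor contract described in this exchange (cur_entry advanced by the hypervisor, guest retries until cur_entry > end_entry) can be modeled in userspace with a stubbed hypervisor. All names here are hypothetical stand-ins, not the actual kernel or GHCB interfaces:

```c
/* Simplified stand-in for the GHCB page state change buffer. */
struct psc_desc {
	unsigned int cur_entry;	/* advanced by the "hypervisor" */
	unsigned int end_entry;	/* last valid entry, set by the guest */
};

/*
 * Fake hypervisor: processes at most 'budget' entries per exit,
 * modeling a page state change that gets interrupted partway through.
 * It resumes from cur_entry, so already-processed entries are skipped.
 */
static void hv_handle_psc(struct psc_desc *desc, unsigned int budget)
{
	while (desc->cur_entry <= desc->end_entry && budget--)
		desc->cur_entry++;
}

/*
 * Guest-side loop: per the contract described above, repeat the
 * exit until cur_entry > end_entry, i.e. all entries are processed.
 */
static void psc_vmgexit_sim(struct psc_desc *desc)
{
	desc->cur_entry = 0;
	while (desc->cur_entry <= desc->end_entry)
		hv_handle_psc(desc, 2); /* "interrupted" every 2 pages */
}
```

The guest never tracks progress itself; cur_entry in the shared buffer is the single source of truth for where to resume.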

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC Part1 PATCH 13/13] x86/kernel: add support to validate memory when changing C-bit
  2021-04-12 13:05       ` Borislav Petkov
@ 2021-04-12 14:31         ` Brijesh Singh
  0 siblings, 0 replies; 52+ messages in thread
From: Brijesh Singh @ 2021-04-12 14:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, linux-kernel, x86, kvm, ak, Thomas Gleixner,
	Ingo Molnar, Joerg Roedel, H. Peter Anvin, Tony Luck,
	Dave Hansen, Peter Zijlstra (Intel),
	Paolo Bonzini, Tom Lendacky, David Rientjes, Sean Christopherson


On 4/12/21 8:05 AM, Borislav Petkov wrote:
> On Mon, Apr 12, 2021 at 07:55:01AM -0500, Brijesh Singh wrote:
>> The cur_entry is updated by the hypervisor. While building the psc
>> buffer, the guest sets cur_entry=0 and end_entry points to the
>> last valid entry. The cur_entry is incremented by the hypervisor after
>> it successfully processes one 4K page. As per the spec, the hypervisor
>> could get interrupted in the middle of the page state change, and cur_entry
>> allows the guest to resume the page state change from the point where it
>> was interrupted.
> This is non-obvious and belongs in a comment above it. Otherwise it
> looks weird.


Sure, I will add the comment and provide a reference to the GHCB section.

Thanks




^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2021-04-12 14:31 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-24 16:44 [RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support Brijesh Singh
2021-03-24 16:44 ` [RFC Part1 PATCH 01/13] x86/cpufeatures: Add SEV-SNP CPU feature Brijesh Singh
2021-03-25 10:54   ` Borislav Petkov
2021-03-25 14:50     ` Brijesh Singh
2021-03-25 16:29       ` Borislav Petkov
2021-03-24 16:44 ` [RFC Part1 PATCH 02/13] x86/mm: add sev_snp_active() helper Brijesh Singh
2021-03-24 16:44 ` [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction Brijesh Singh
2021-03-26 14:30   ` Borislav Petkov
2021-03-26 15:42     ` Brijesh Singh
2021-03-26 18:22       ` Brijesh Singh
2021-03-26 19:12         ` Borislav Petkov
2021-03-26 20:04           ` Brijesh Singh
2021-03-26 19:22       ` Borislav Petkov
2021-03-26 20:01         ` Brijesh Singh
2021-03-24 16:44 ` [RFC Part1 PATCH 04/13] x86/sev-snp: define page state change VMGEXIT structure Brijesh Singh
2021-04-01 10:32   ` Borislav Petkov
2021-04-01 14:11     ` Brijesh Singh
2021-04-02 15:44       ` Borislav Petkov
2021-03-24 16:44 ` [RFC Part1 PATCH 05/13] X86/sev-es: move few helper functions in common file Brijesh Singh
2021-04-02 19:27   ` Borislav Petkov
2021-04-02 21:33     ` Brijesh Singh
2021-03-24 16:44 ` [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB Brijesh Singh
2021-04-06 10:33   ` Borislav Petkov
2021-04-06 15:47     ` Brijesh Singh
2021-04-06 19:42       ` Tom Lendacky
2021-04-07 11:25         ` Borislav Petkov
2021-04-07 19:45           ` Borislav Petkov
2021-04-08 13:57             ` Tom Lendacky
2021-04-07 11:16       ` Borislav Petkov
2021-04-07 13:35         ` Brijesh Singh
2021-04-07 14:21           ` Tom Lendacky
2021-04-07 17:15             ` Brijesh Singh
2021-03-24 16:44 ` [RFC Part1 PATCH 07/13] x86/compressed: register GHCB memory when SNP is active Brijesh Singh
2021-04-07 11:59   ` Borislav Petkov
2021-04-07 17:34     ` Brijesh Singh
2021-04-07 17:54       ` Tom Lendacky
2021-04-08  8:17       ` Borislav Petkov
2021-03-24 16:44 ` [RFC Part1 PATCH 08/13] x86/sev-es: register GHCB memory when SEV-SNP " Brijesh Singh
2021-04-08  8:38   ` Borislav Petkov
2021-03-24 16:44 ` [RFC Part1 PATCH 09/13] x86/kernel: add support to validate memory in early enc attribute change Brijesh Singh
2021-04-08 11:40   ` Borislav Petkov
2021-04-08 12:25     ` Brijesh Singh
2021-03-24 16:44 ` [RFC Part1 PATCH 10/13] X86: kernel: make the bss.decrypted section shared in RMP table Brijesh Singh
2021-03-24 16:44 ` [RFC Part1 PATCH 11/13] x86/kernel: validate rom memory before accessing when SEV-SNP is active Brijesh Singh
2021-04-09 16:53   ` Borislav Petkov
2021-04-09 17:40     ` Brijesh Singh
2021-03-24 16:44 ` [RFC Part1 PATCH 12/13] x86/sev-es: make GHCB get and put helper accessible outside Brijesh Singh
2021-03-24 16:44 ` [RFC Part1 PATCH 13/13] x86/kernel: add support to validate memory when changing C-bit Brijesh Singh
2021-04-12 11:49   ` Borislav Petkov
2021-04-12 12:55     ` Brijesh Singh
2021-04-12 13:05       ` Borislav Petkov
2021-04-12 14:31         ` Brijesh Singh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).