* [PATCH v6 0/5] x86: Fix SEV guest regression
@ 2018-09-07 17:57 Brijesh Singh
  2018-09-07 17:57 ` [PATCH v6 1/5] x86/mm: Restructure sme_encrypt_kernel() Brijesh Singh
                   ` (4 more replies)
  0 siblings, 5 replies; 37+ messages in thread
From: Brijesh Singh @ 2018-09-07 17:57 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: Brijesh Singh, Tom Lendacky, Thomas Gleixner, Borislav Petkov,
	Paolo Bonzini, Sean Christopherson, Radim Krčmář

The following commit

  368a540e0232 ("x86/kvmclock: Remove memblock dependency")
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=368a540e0232ad446931f5a4e8a5e06f69f21343

introduced an SEV guest regression.

The guest physical addresses holding the wall_clock and hv_clock_boot
variables are shared with the hypervisor and must be mapped with C=0
when SEV is active. To clear the C-bit we use
kernel_physical_mapping_init() to split the large pages. The above
commit moved the kvmclock initialization very early, and at that point
kernel_physical_mapping_init() fails to allocate memory while splitting
the large page.

To solve this, we add a special .data..decrypted section that can be
used to hold the shared variables. Early boot code maps this section
with C=0. The section is PMD-aligned and sized to avoid the need to
split the pages. Callers can use the __decrypted attribute to place
variables in the .data..decrypted section.
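
For illustration, a shared variable is then declared like this (a
minimal sketch; the __decrypted attribute is introduced in patch 3/5
and used this way for kvmclock's wall_clock in patch 4/5):

	/* Placed in .data..decrypted, which early boot maps with C=0. */
	static struct pvclock_wall_clock wall_clock __decrypted;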

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>

Changes since v5:
 - rename hvclock_boot_dec -> hvclock_boot_aux.
 - rename section from .data..decrypted_aux -> .data..decrypted.aux.
 - use NR_CPUS instead of an arbitrary number of elements in the
   hv_clock_aux variable.

Changes since v4:
 - define a few static pages in .data..decrypted which can be used
   for cpus > HVC_BOOT_ARRAY_SIZE when SEV is active.

Changes since v3:
 - commit message improvements (based on Sean's feedback)

Changes since v2:
 - commit message and code comment improvements (based on Boris' feedback)
 - move the sme_populate_pgd() fixes into a new patch.
 - drop stable Cc - will submit to stable after the patch is upstreamed.

Changes since v1:
 - move the logic to re-arrange the mapping into a new patch
 - move the definition of __start_data_* into mem_encrypt.h
 - map the workarea buffer as encrypted when SEV is enabled
 - enhance sme_populate_pgd() to update the pte/pmd flags when a mapping exists

Brijesh Singh (5):
  x86/mm: Restructure sme_encrypt_kernel()
  x86/mm: fix sme_populate_pgd() to update page flags
  x86/mm: add .data..decrypted section to hold shared variables
  x86/kvm: use __decrypted attribute in shared variables
  x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active

 arch/x86/include/asm/mem_encrypt.h |  10 ++
 arch/x86/kernel/head64.c           |  11 ++
 arch/x86/kernel/kvmclock.c         |  18 ++-
 arch/x86/kernel/vmlinux.lds.S      |  20 ++++
 arch/x86/mm/init.c                 |   3 +
 arch/x86/mm/mem_encrypt.c          |  10 ++
 arch/x86/mm/mem_encrypt_identity.c | 232 +++++++++++++++++++++++++++----------
 7 files changed, 240 insertions(+), 64 deletions(-)

-- 
2.7.4



* [PATCH v6 1/5] x86/mm: Restructure sme_encrypt_kernel()
  2018-09-07 17:57 [PATCH v6 0/5] x86: Fix SEV guest regression Brijesh Singh
@ 2018-09-07 17:57 ` Brijesh Singh
  2018-09-10 11:32   ` Borislav Petkov
  2018-09-07 17:57 ` [PATCH v6 2/5] x86/mm: fix sme_populate_pgd() to update page flags Brijesh Singh
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 37+ messages in thread
From: Brijesh Singh @ 2018-09-07 17:57 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: Brijesh Singh, Tom Lendacky, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Sean Christopherson,
	Radim Krčmář

Re-arrange sme_encrypt_kernel() by moving the workarea map/unmap logic
into separate static functions. There are no logical changes in this
patch. The restructuring will allow us to expand sme_encrypt_kernel()
in the future.
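
For orientation, the resulting top-level flow, condensed from the hunks
below (the sme_encrypt_execute() calls are elided):

	void __init sme_encrypt_kernel(struct boot_params *bp)
	{
		struct sme_populate_pgd_data ppd;
		struct sme_workarea_data wa;

		if (!sme_active())
			return;

		build_workarea_map(bp, &wa, &ppd);

		/* sme_encrypt_execute() on the kernel and, if present,
		 * the initrd, using wa and ppd.pgd */

		teardown_workarea_map(&wa, &ppd);
	}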

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: kvm@vger.kernel.org
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
---
 arch/x86/mm/mem_encrypt_identity.c | 160 ++++++++++++++++++++++++-------------
 1 file changed, 104 insertions(+), 56 deletions(-)

diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 7ae3686..92265d3 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -72,6 +72,22 @@ struct sme_populate_pgd_data {
 	unsigned long vaddr_end;
 };
 
+struct sme_workarea_data {
+	unsigned long kernel_start;
+	unsigned long kernel_end;
+	unsigned long kernel_len;
+
+	unsigned long initrd_start;
+	unsigned long initrd_end;
+	unsigned long initrd_len;
+
+	unsigned long workarea_start;
+	unsigned long workarea_end;
+	unsigned long workarea_len;
+
+	unsigned long decrypted_base;
+};
+
 static char sme_cmdline_arg[] __initdata = "mem_encrypt";
 static char sme_cmdline_on[]  __initdata = "on";
 static char sme_cmdline_off[] __initdata = "off";
@@ -266,19 +282,17 @@ static unsigned long __init sme_pgtable_calc(unsigned long len)
 	return entries + tables;
 }
 
-void __init sme_encrypt_kernel(struct boot_params *bp)
+static void __init build_workarea_map(struct boot_params *bp,
+				      struct sme_workarea_data *wa,
+				      struct sme_populate_pgd_data *ppd)
 {
 	unsigned long workarea_start, workarea_end, workarea_len;
 	unsigned long execute_start, execute_end, execute_len;
 	unsigned long kernel_start, kernel_end, kernel_len;
 	unsigned long initrd_start, initrd_end, initrd_len;
-	struct sme_populate_pgd_data ppd;
 	unsigned long pgtable_area_len;
 	unsigned long decrypted_base;
 
-	if (!sme_active())
-		return;
-
 	/*
 	 * Prepare for encrypting the kernel and initrd by building new
 	 * pagetables with the necessary attributes needed to encrypt the
@@ -358,17 +372,17 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
 	 * pagetables and when the new encrypted and decrypted kernel
 	 * mappings are populated.
 	 */
-	ppd.pgtable_area = (void *)execute_end;
+	ppd->pgtable_area = (void *)execute_end;
 
 	/*
 	 * Make sure the current pagetable structure has entries for
 	 * addressing the workarea.
 	 */
-	ppd.pgd = (pgd_t *)native_read_cr3_pa();
-	ppd.paddr = workarea_start;
-	ppd.vaddr = workarea_start;
-	ppd.vaddr_end = workarea_end;
-	sme_map_range_decrypted(&ppd);
+	ppd->pgd = (pgd_t *)native_read_cr3_pa();
+	ppd->paddr = workarea_start;
+	ppd->vaddr = workarea_start;
+	ppd->vaddr_end = workarea_end;
+	sme_map_range_decrypted(ppd);
 
 	/* Flush the TLB - no globals so cr3 is enough */
 	native_write_cr3(__native_read_cr3());
@@ -379,9 +393,9 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
 	 * then be populated with new PUDs and PMDs as the encrypted and
 	 * decrypted kernel mappings are created.
 	 */
-	ppd.pgd = ppd.pgtable_area;
-	memset(ppd.pgd, 0, sizeof(pgd_t) * PTRS_PER_PGD);
-	ppd.pgtable_area += sizeof(pgd_t) * PTRS_PER_PGD;
+	ppd->pgd = ppd->pgtable_area;
+	memset(ppd->pgd, 0, sizeof(pgd_t) * PTRS_PER_PGD);
+	ppd->pgtable_area += sizeof(pgd_t) * PTRS_PER_PGD;
 
 	/*
 	 * A different PGD index/entry must be used to get different
@@ -399,75 +413,109 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
 	decrypted_base <<= PGDIR_SHIFT;
 
 	/* Add encrypted kernel (identity) mappings */
-	ppd.paddr = kernel_start;
-	ppd.vaddr = kernel_start;
-	ppd.vaddr_end = kernel_end;
-	sme_map_range_encrypted(&ppd);
+	ppd->paddr = kernel_start;
+	ppd->vaddr = kernel_start;
+	ppd->vaddr_end = kernel_end;
+	sme_map_range_encrypted(ppd);
 
 	/* Add decrypted, write-protected kernel (non-identity) mappings */
-	ppd.paddr = kernel_start;
-	ppd.vaddr = kernel_start + decrypted_base;
-	ppd.vaddr_end = kernel_end + decrypted_base;
-	sme_map_range_decrypted_wp(&ppd);
+	ppd->paddr = kernel_start;
+	ppd->vaddr = kernel_start + decrypted_base;
+	ppd->vaddr_end = kernel_end + decrypted_base;
+	sme_map_range_decrypted_wp(ppd);
 
 	if (initrd_len) {
 		/* Add encrypted initrd (identity) mappings */
-		ppd.paddr = initrd_start;
-		ppd.vaddr = initrd_start;
-		ppd.vaddr_end = initrd_end;
-		sme_map_range_encrypted(&ppd);
+		ppd->paddr = initrd_start;
+		ppd->vaddr = initrd_start;
+		ppd->vaddr_end = initrd_end;
+		sme_map_range_encrypted(ppd);
 		/*
 		 * Add decrypted, write-protected initrd (non-identity) mappings
 		 */
-		ppd.paddr = initrd_start;
-		ppd.vaddr = initrd_start + decrypted_base;
-		ppd.vaddr_end = initrd_end + decrypted_base;
-		sme_map_range_decrypted_wp(&ppd);
+		ppd->paddr = initrd_start;
+		ppd->vaddr = initrd_start + decrypted_base;
+		ppd->vaddr_end = initrd_end + decrypted_base;
+		sme_map_range_decrypted_wp(ppd);
 	}
 
 	/* Add decrypted workarea mappings to both kernel mappings */
-	ppd.paddr = workarea_start;
-	ppd.vaddr = workarea_start;
-	ppd.vaddr_end = workarea_end;
-	sme_map_range_decrypted(&ppd);
+	ppd->paddr = workarea_start;
+	ppd->vaddr = workarea_start;
+	ppd->vaddr_end = workarea_end;
+	sme_map_range_decrypted(ppd);
 
-	ppd.paddr = workarea_start;
-	ppd.vaddr = workarea_start + decrypted_base;
-	ppd.vaddr_end = workarea_end + decrypted_base;
-	sme_map_range_decrypted(&ppd);
+	ppd->paddr = workarea_start;
+	ppd->vaddr = workarea_start + decrypted_base;
+	ppd->vaddr_end = workarea_end + decrypted_base;
+	sme_map_range_decrypted(ppd);
 
-	/* Perform the encryption */
-	sme_encrypt_execute(kernel_start, kernel_start + decrypted_base,
-			    kernel_len, workarea_start, (unsigned long)ppd.pgd);
+	wa->kernel_start = kernel_start;
+	wa->kernel_end = kernel_end;
+	wa->kernel_len = kernel_len;
 
-	if (initrd_len)
-		sme_encrypt_execute(initrd_start, initrd_start + decrypted_base,
-				    initrd_len, workarea_start,
-				    (unsigned long)ppd.pgd);
+	wa->initrd_start = initrd_start;
+	wa->initrd_end = initrd_end;
+	wa->initrd_len = initrd_len;
+
+	wa->workarea_start = workarea_start;
+	wa->workarea_end = workarea_end;
+	wa->workarea_len = workarea_len;
+
+	wa->decrypted_base = decrypted_base;
+}
 
+static void __init teardown_workarea_map(struct sme_workarea_data *wa,
+				         struct sme_populate_pgd_data *ppd)
+{
 	/*
 	 * At this point we are running encrypted.  Remove the mappings for
 	 * the decrypted areas - all that is needed for this is to remove
 	 * the PGD entry/entries.
 	 */
-	ppd.vaddr = kernel_start + decrypted_base;
-	ppd.vaddr_end = kernel_end + decrypted_base;
-	sme_clear_pgd(&ppd);
-
-	if (initrd_len) {
-		ppd.vaddr = initrd_start + decrypted_base;
-		ppd.vaddr_end = initrd_end + decrypted_base;
-		sme_clear_pgd(&ppd);
+	ppd->vaddr = wa->kernel_start + wa->decrypted_base;
+	ppd->vaddr_end = wa->kernel_end + wa->decrypted_base;
+	sme_clear_pgd(ppd);
+
+	if (wa->initrd_len) {
+		ppd->vaddr = wa->initrd_start + wa->decrypted_base;
+		ppd->vaddr_end = wa->initrd_end + wa->decrypted_base;
+		sme_clear_pgd(ppd);
 	}
 
-	ppd.vaddr = workarea_start + decrypted_base;
-	ppd.vaddr_end = workarea_end + decrypted_base;
-	sme_clear_pgd(&ppd);
+	ppd->vaddr = wa->workarea_start + wa->decrypted_base;
+	ppd->vaddr_end = wa->workarea_end + wa->decrypted_base;
+	sme_clear_pgd(ppd);
 
 	/* Flush the TLB - no globals so cr3 is enough */
 	native_write_cr3(__native_read_cr3());
 }
 
+void __init sme_encrypt_kernel(struct boot_params *bp)
+{
+	struct sme_populate_pgd_data ppd;
+	struct sme_workarea_data wa;
+
+	if (!sme_active())
+		return;
+
+	build_workarea_map(bp, &wa, &ppd);
+
+	/* When SEV is active, encrypt kernel and initrd */
+	sme_encrypt_execute(wa.kernel_start,
+			    wa.kernel_start + wa.decrypted_base,
+			    wa.kernel_len, wa.workarea_start,
+			    (unsigned long)ppd.pgd);
+
+	if (wa.initrd_len)
+		sme_encrypt_execute(wa.initrd_start,
+				    wa.initrd_start + wa.decrypted_base,
+				    wa.initrd_len, wa.workarea_start,
+				    (unsigned long)ppd.pgd);
+
+	teardown_workarea_map(&wa, &ppd);
+}
+
 void __init sme_enable(struct boot_params *bp)
 {
 	const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
-- 
2.7.4



* [PATCH v6 2/5] x86/mm: fix sme_populate_pgd() to update page flags
  2018-09-07 17:57 [PATCH v6 0/5] x86: Fix SEV guest regression Brijesh Singh
  2018-09-07 17:57 ` [PATCH v6 1/5] x86/mm: Restructure sme_encrypt_kernel() Brijesh Singh
@ 2018-09-07 17:57 ` Brijesh Singh
  2018-09-10 11:36   ` Borislav Petkov
  2018-09-07 17:57 ` [PATCH v6 3/5] x86/mm: add .data..decrypted section to hold shared variables Brijesh Singh
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 37+ messages in thread
From: Brijesh Singh @ 2018-09-07 17:57 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: Brijesh Singh, Tom Lendacky, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Sean Christopherson,
	Radim Krčmář

Fix sme_populate_pgd() to update page flags if the PMD/PTE entry
already exists.
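
The effect, restated from the PTE hunk below (the PMD path is
analogous):

	/* Before: an existing PTE was skipped (pte_none() check), so a
	 * later mapping pass could not change the encryption flags.
	 * After: the entry is written unconditionally, so the flags of
	 * an existing mapping are updated as well. */
	pte = pte_offset_map(pmd, ppd->vaddr);
	set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));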

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: kvm@vger.kernel.org
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
---
 arch/x86/mm/mem_encrypt_identity.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 92265d3..7659e65 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -154,9 +154,6 @@ static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
 		return;
 
 	pmd = pmd_offset(pud, ppd->vaddr);
-	if (pmd_large(*pmd))
-		return;
-
 	set_pmd(pmd, __pmd(ppd->paddr | ppd->pmd_flags));
 }
 
@@ -182,8 +179,7 @@ static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
 		return;
 
 	pte = pte_offset_map(pmd, ppd->vaddr);
-	if (pte_none(*pte))
-		set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));
+	set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));
 }
 
 static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
-- 
2.7.4



* [PATCH v6 3/5] x86/mm: add .data..decrypted section to hold shared variables
  2018-09-07 17:57 [PATCH v6 0/5] x86: Fix SEV guest regression Brijesh Singh
  2018-09-07 17:57 ` [PATCH v6 1/5] x86/mm: Restructure sme_encrypt_kernel() Brijesh Singh
  2018-09-07 17:57 ` [PATCH v6 2/5] x86/mm: fix sme_populate_pgd() to update page flags Brijesh Singh
@ 2018-09-07 17:57 ` Brijesh Singh
  2018-09-10 11:54   ` Borislav Petkov
  2018-09-07 17:57 ` [PATCH v6 4/5] x86/kvm: use __decrypted attribute in " Brijesh Singh
  2018-09-07 17:57 ` [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active Brijesh Singh
  4 siblings, 1 reply; 37+ messages in thread
From: Brijesh Singh @ 2018-09-07 17:57 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: Brijesh Singh, Tom Lendacky, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Sean Christopherson,
	Radim Krčmář

kvmclock defines few static variables which are shared with the
hypervisor during the kvmclock initialization.

When SEV is active, memory is encrypted with a guest-specific key, and
if guest OS wants to share the memory region with hypervisor then it must
clear the C-bit before sharing it. Currently, we use
kernel_physical_mapping_init() to split large pages before clearing the
C-bit on shared pages. But it fails when called from the kvmclock
initialization (mainly because memblock allocator is not ready that early
during boot).

Add a __decrypted section attribute which can be used when defining
such shared variable. The so-defined variables will be placed in the
.data..decrypted section. This section is mapped with C=0 early
during boot, we also ensure that the initialized values are updated
to match with C=0 (i.e perform an in-place decryption). The
.data..decrypted section is PMD-aligned and sized so that we avoid
the need to split the large pages when mapping the section.

The sme_encrypt_kernel() was used to perform the in-place encryption
of the Linux kernel and initrd when SME is active. The routine has been
enhanced to decrypt the .data..decrypted section for both SME and SEV
cases.
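
Because the section is PMD-aligned and PMD-sized, the early fixup only
has to rewrite whole PMD entries; a sketch condensed from the head64.c
hunk below (it runs only when memory encryption is active):

	for (vaddr = (unsigned long)__start_data_decrypted;
	     vaddr < (unsigned long)__end_data_decrypted;
	     vaddr += PMD_SIZE) {
		/* No large-page split is needed; just clear the
		 * encryption mask (C-bit) from each PMD entry. */
		i = pmd_index(vaddr);
		pmd[i] -= sme_get_me_mask();
	}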

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: kvm@vger.kernel.org
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
---
 arch/x86/include/asm/mem_encrypt.h |  6 +++
 arch/x86/kernel/head64.c           | 11 +++++
 arch/x86/kernel/vmlinux.lds.S      | 17 +++++++
 arch/x86/mm/mem_encrypt_identity.c | 94 ++++++++++++++++++++++++++++++++------
 4 files changed, 113 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index c064383..802b2eb 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -52,6 +52,8 @@ void __init mem_encrypt_init(void);
 bool sme_active(void);
 bool sev_active(void);
 
+#define __decrypted __attribute__((__section__(".data..decrypted")))
+
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
 
 #define sme_me_mask	0ULL
@@ -77,6 +79,8 @@ early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0;
 static inline int __init
 early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0; }
 
+#define __decrypted
+
 #endif	/* CONFIG_AMD_MEM_ENCRYPT */
 
 /*
@@ -88,6 +92,8 @@ early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0;
 #define __sme_pa(x)		(__pa(x) | sme_me_mask)
 #define __sme_pa_nodebug(x)	(__pa_nodebug(x) | sme_me_mask)
 
+extern char __start_data_decrypted[], __end_data_decrypted[];
+
 #endif	/* __ASSEMBLY__ */
 
 #endif	/* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 8047379..af39d68 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -112,6 +112,7 @@ static bool __head check_la57_support(unsigned long physaddr)
 unsigned long __head __startup_64(unsigned long physaddr,
 				  struct boot_params *bp)
 {
+	unsigned long vaddr, vaddr_end;
 	unsigned long load_delta, *p;
 	unsigned long pgtable_flags;
 	pgdval_t *pgd;
@@ -234,6 +235,16 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	/* Encrypt the kernel and related (if SME is active) */
 	sme_encrypt_kernel(bp);
 
+	/* Clear the memory encryption mask from the .data..decrypted section. */
+	if (mem_encrypt_active()) {
+		vaddr = (unsigned long)__start_data_decrypted;
+		vaddr_end = (unsigned long)__end_data_decrypted;
+		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
+			i = pmd_index(vaddr);
+			pmd[i] -= sme_get_me_mask();
+		}
+	}
+
 	/*
 	 * Return the SME encryption mask (if SME is active) to be used as a
 	 * modifier for the initial pgdir entry programmed into CR3.
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 8bde0a4..4cb1064 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -65,6 +65,21 @@ jiffies_64 = jiffies;
 #define ALIGN_ENTRY_TEXT_BEGIN	. = ALIGN(PMD_SIZE);
 #define ALIGN_ENTRY_TEXT_END	. = ALIGN(PMD_SIZE);
 
+/*
+ * This section contains data which will be mapped as decrypted. Memory
+ * encryption operates on a page basis. Make this section PMD-aligned
+ * to avoid spliting the pages while mapping the section early.
+ *
+ * Note: We use a separate section so that only this section gets
+ * decrypted to avoid exposing more than we wish.
+ */
+#define DATA_DECRYPTED						\
+	. = ALIGN(PMD_SIZE);					\
+	__start_data_decrypted = .;				\
+	*(.data..decrypted);					\
+	. = ALIGN(PMD_SIZE);					\
+	__end_data_decrypted = .;				\
+
 #else
 
 #define X86_ALIGN_RODATA_BEGIN
@@ -74,6 +89,7 @@ jiffies_64 = jiffies;
 
 #define ALIGN_ENTRY_TEXT_BEGIN
 #define ALIGN_ENTRY_TEXT_END
+#define DATA_DECRYPTED
 
 #endif
 
@@ -171,6 +187,7 @@ SECTIONS
 		/* rarely changed data like cpu maps */
 		READ_MOSTLY_DATA(INTERNODE_CACHE_BYTES)
 
+		DATA_DECRYPTED
 		/* End of data section */
 		_edata = .;
 	} :data
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 7659e65..08e70ba 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -51,6 +51,8 @@
 				 (_PAGE_PAT | _PAGE_PWT))
 
 #define PMD_FLAGS_ENC		(PMD_FLAGS_LARGE | _PAGE_ENC)
+#define PMD_FLAGS_ENC_WP	((PMD_FLAGS_ENC & ~_PAGE_CACHE_MASK) | \
+				 (_PAGE_PAT | _PAGE_PWT))
 
 #define PTE_FLAGS		(__PAGE_KERNEL_EXEC & ~_PAGE_GLOBAL)
 
@@ -59,6 +61,8 @@
 				 (_PAGE_PAT | _PAGE_PWT))
 
 #define PTE_FLAGS_ENC		(PTE_FLAGS | _PAGE_ENC)
+#define PTE_FLAGS_ENC_WP	((PTE_FLAGS_ENC & ~_PAGE_CACHE_MASK) | \
+				 (_PAGE_PAT | _PAGE_PWT))
 
 struct sme_populate_pgd_data {
 	void    *pgtable_area;
@@ -231,6 +235,11 @@ static void __init sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
 	__sme_map_range(ppd, PMD_FLAGS_ENC, PTE_FLAGS_ENC);
 }
 
+static void __init sme_map_range_encrypted_wp(struct sme_populate_pgd_data *ppd)
+{
+	__sme_map_range(ppd, PMD_FLAGS_ENC_WP, PTE_FLAGS_ENC_WP);
+}
+
 static void __init sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
 {
 	__sme_map_range(ppd, PMD_FLAGS_DEC, PTE_FLAGS_DEC);
@@ -378,7 +387,10 @@ static void __init build_workarea_map(struct boot_params *bp,
 	ppd->paddr = workarea_start;
 	ppd->vaddr = workarea_start;
 	ppd->vaddr_end = workarea_end;
-	sme_map_range_decrypted(ppd);
+	if (sev_active())
+		sme_map_range_encrypted(ppd);
+	else
+		sme_map_range_decrypted(ppd);
 
 	/* Flush the TLB - no globals so cr3 is enough */
 	native_write_cr3(__native_read_cr3());
@@ -435,16 +447,27 @@ static void __init build_workarea_map(struct boot_params *bp,
 		sme_map_range_decrypted_wp(ppd);
 	}
 
-	/* Add decrypted workarea mappings to both kernel mappings */
+	/*
+	 * When SEV is active, kernel is already encrypted hence mapping
+	 * the initial workarea_start as encrypted. When SME is active,
+	 * the kernel is not encrypted hence add decrypted workarea
+	 * mappings to both kernel mappings.
+	 */
 	ppd->paddr = workarea_start;
 	ppd->vaddr = workarea_start;
 	ppd->vaddr_end = workarea_end;
-	sme_map_range_decrypted(ppd);
+	if (sev_active())
+		sme_map_range_encrypted(ppd);
+	else
+		sme_map_range_decrypted(ppd);
 
 	ppd->paddr = workarea_start;
 	ppd->vaddr = workarea_start + decrypted_base;
 	ppd->vaddr_end = workarea_end + decrypted_base;
-	sme_map_range_decrypted(ppd);
+	if (sev_active())
+		sme_map_range_encrypted(ppd);
+	else
+		sme_map_range_decrypted(ppd);
 
 	wa->kernel_start = kernel_start;
 	wa->kernel_end = kernel_end;
@@ -487,28 +510,69 @@ static void __init teardown_workarea_map(struct sme_workarea_data *wa,
 	native_write_cr3(__native_read_cr3());
 }
 
+static void __init decrypt_shared_data(struct sme_workarea_data *wa,
+				       struct sme_populate_pgd_data *ppd)
+{
+	unsigned long decrypted_start, decrypted_end, decrypted_len;
+
+	/* Physical addresses of decrypted data section */
+	decrypted_start = __pa_symbol(__start_data_decrypted);
+	decrypted_end = ALIGN(__pa_symbol(__end_data_decrypted), PMD_PAGE_SIZE);
+	decrypted_len = decrypted_end - decrypted_start;
+
+	if (!decrypted_len)
+		return;
+
+	/* Add decrypted mapping for the section (identity) */
+	ppd->paddr = decrypted_start;
+	ppd->vaddr = decrypted_start;
+	ppd->vaddr_end = decrypted_end;
+	sme_map_range_decrypted(ppd);
+
+	/* Add encrypted-wp mapping for the section (non-identity) */
+	ppd->paddr = decrypted_start;
+	ppd->vaddr = decrypted_start + wa->decrypted_base;
+	ppd->vaddr_end = decrypted_end + wa->decrypted_base;
+	sme_map_range_encrypted_wp(ppd);
+
+	/* Perform in-place decryption */
+	sme_encrypt_execute(decrypted_start,
+			    decrypted_start + wa->decrypted_base,
+			    decrypted_len, wa->workarea_start,
+			    (unsigned long)ppd->pgd);
+
+	ppd->vaddr = decrypted_start + wa->decrypted_base;
+	ppd->vaddr_end = decrypted_end + wa->decrypted_base;
+	sme_clear_pgd(ppd);
+}
+
 void __init sme_encrypt_kernel(struct boot_params *bp)
 {
 	struct sme_populate_pgd_data ppd;
 	struct sme_workarea_data wa;
 
-	if (!sme_active())
+	if (!mem_encrypt_active())
 		return;
 
 	build_workarea_map(bp, &wa, &ppd);
 
-	/* When SEV is active, encrypt kernel and initrd */
-	sme_encrypt_execute(wa.kernel_start,
-			    wa.kernel_start + wa.decrypted_base,
-			    wa.kernel_len, wa.workarea_start,
-			    (unsigned long)ppd.pgd);
-
-	if (wa.initrd_len)
-		sme_encrypt_execute(wa.initrd_start,
-				    wa.initrd_start + wa.decrypted_base,
-				    wa.initrd_len, wa.workarea_start,
+	/* When SME is active, encrypt kernel and initrd */
+	if (sme_active()) {
+		sme_encrypt_execute(wa.kernel_start,
+				    wa.kernel_start + wa.decrypted_base,
+				    wa.kernel_len, wa.workarea_start,
 				    (unsigned long)ppd.pgd);
 
+		if (wa.initrd_len)
+			sme_encrypt_execute(wa.initrd_start,
+					    wa.initrd_start + wa.decrypted_base,
+					    wa.initrd_len, wa.workarea_start,
+					    (unsigned long)ppd.pgd);
+	}
+
+	/* Decrypt the contents of .data..decrypted section */
+	decrypt_shared_data(&wa, &ppd);
+
 	teardown_workarea_map(&wa, &ppd);
 }
 
-- 
2.7.4



* [PATCH v6 4/5] x86/kvm: use __decrypted attribute in shared variables
  2018-09-07 17:57 [PATCH v6 0/5] x86: Fix SEV guest regression Brijesh Singh
                   ` (2 preceding siblings ...)
  2018-09-07 17:57 ` [PATCH v6 3/5] x86/mm: add .data..decrypted section to hold shared variables Brijesh Singh
@ 2018-09-07 17:57 ` Brijesh Singh
  2018-09-10 12:04   ` Borislav Petkov
  2018-09-10 12:29   ` Paolo Bonzini
  2018-09-07 17:57 ` [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active Brijesh Singh
  4 siblings, 2 replies; 37+ messages in thread
From: Brijesh Singh @ 2018-09-07 17:57 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: Brijesh Singh, Tom Lendacky, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Sean Christopherson,
	Radim Krčmář

Commit: 368a540e0232 (x86/kvmclock: Remove memblock dependency)
caused SEV guest regression. When SEV is active, we map the shared
variables (wall_clock and hv_clock_boot) with C=0 to ensure that both
the guest and the hypervisor are able to access the data. To map the
variables we use kernel_physical_mapping_init() to split the large pages,
but splitting large pages requires allocating a new PMD, which fails now
that kvmclock initialization is called early during boot.

Recently we added a special .data..decrypted section to hold the shared
variables. This section is mapped with C=0 early during boot. Use
__decrypted attribute to put the wall_clock and hv_clock_boot in
.data..decrypted section so that they are mapped with C=0.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Fixes: 368a540e0232 ("x86/kvmclock: Remove memblock dependency")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: kvm@vger.kernel.org
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
---
 arch/x86/kernel/kvmclock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 1e67646..376fd3a 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -61,8 +61,8 @@ early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
 	(PAGE_SIZE / sizeof(struct pvclock_vsyscall_time_info))
 
 static struct pvclock_vsyscall_time_info
-			hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __aligned(PAGE_SIZE);
-static struct pvclock_wall_clock wall_clock;
+			hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
+static struct pvclock_wall_clock wall_clock __decrypted;
 static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
 
 static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
-- 
2.7.4



* [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-07 17:57 [PATCH v6 0/5] x86: Fix SEV guest regression Brijesh Singh
                   ` (3 preceding siblings ...)
  2018-09-07 17:57 ` [PATCH v6 4/5] x86/kvm: use __decrypted attribute in " Brijesh Singh
@ 2018-09-07 17:57 ` Brijesh Singh
  2018-09-10 12:27   ` Borislav Petkov
  2018-09-10 12:28   ` Paolo Bonzini
  4 siblings, 2 replies; 37+ messages in thread
From: Brijesh Singh @ 2018-09-07 17:57 UTC (permalink / raw)
  To: x86, linux-kernel, kvm
  Cc: Brijesh Singh, Tom Lendacky, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Sean Christopherson,
	Radim Krčmář

Currently, the per-cpu pvclock data is allocated dynamically when
cpu > HVC_BOOT_ARRAY_SIZE. The physical address of this variable is
shared between the guest and the hypervisor hence it must be mapped as
unencrypted (ie. C=0) when SEV is active.

The C-bit works on a page, hence we will be required to perform a
full 4k page allocation to store a single 32-byte pvclock variable. It
will waste fairly sizeable amount of memory since each CPU will be doing
a separate 4k allocation. Let's define a second array for the SEV case to
statically allocate for NR_CPUS and put this array in .data..decrypted
section so that its mapped with C=0 during boot. The .data..decrypted
section has a big chunk of memory that is currently unused. And since
second array will be used only when memory encryption is active hence
free it when encryption is not active.
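
The per-CPU slot selection then becomes (condensed from the kvmclock.c
hunk below; the CONFIG_AMD_MEM_ENCRYPT #ifdefs are omitted):

	if (cpu < HVC_BOOT_ARRAY_SIZE)
		p = &hv_clock_boot[cpu];
	else if (sev_active())
		/* static slot, already mapped with C=0 - avoids a full
		 * 4k page allocation per CPU */
		p = &hv_clock_aux[cpu - HVC_BOOT_ARRAY_SIZE];
	else
		p = kzalloc(sizeof(*p), GFP_KERNEL);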

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: kvm@vger.kernel.org
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
---
 arch/x86/include/asm/mem_encrypt.h |  4 ++++
 arch/x86/kernel/kvmclock.c         | 14 ++++++++++++++
 arch/x86/kernel/vmlinux.lds.S      |  3 +++
 arch/x86/mm/init.c                 |  3 +++
 arch/x86/mm/mem_encrypt.c          | 10 ++++++++++
 5 files changed, 34 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 802b2eb..cc46584 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -48,11 +48,13 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
 
 /* Architecture __weak replacement functions */
 void __init mem_encrypt_init(void);
+void __init free_decrypted_mem(void);
 
 bool sme_active(void);
 bool sev_active(void);
 
 #define __decrypted __attribute__((__section__(".data..decrypted")))
+#define __decrypted_aux __attribute__((__section__(".data..decrypted.aux")))
 
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
 
@@ -80,6 +82,7 @@ static inline int __init
 early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0; }
 
 #define __decrypted
+#define __decrypted_aux
 
 #endif	/* CONFIG_AMD_MEM_ENCRYPT */
 
@@ -93,6 +96,7 @@ early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0;
 #define __sme_pa_nodebug(x)	(__pa_nodebug(x) | sme_me_mask)
 
 extern char __start_data_decrypted[], __end_data_decrypted[];
+extern char __start_data_decrypted_aux[];
 
 #endif	/* __ASSEMBLY__ */
 
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 376fd3a..6086b56 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -65,6 +65,15 @@ static struct pvclock_vsyscall_time_info
 static struct pvclock_wall_clock wall_clock __decrypted;
 static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+/*
+ * The auxiliary array will be used when SEV is active. In non-SEV case,
+ * it will be freed by free_decrypted_mem().
+ */
+static struct pvclock_vsyscall_time_info
+			hv_clock_aux[NR_CPUS] __decrypted_aux;
+#endif
+
 static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
 {
 	return &this_cpu_read(hv_clock_per_cpu)->pvti;
@@ -269,6 +278,11 @@ static int kvmclock_setup_percpu(unsigned int cpu)
 	/* Use the static page for the first CPUs, allocate otherwise */
 	if (cpu < HVC_BOOT_ARRAY_SIZE)
 		p = &hv_clock_boot[cpu];
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	/* Use the static page from auxiliary array instead of allocating it. */
+	else if (sev_active())
+		p = &hv_clock_aux[cpu - HVC_BOOT_ARRAY_SIZE];
+#endif
 	else
 		p = kzalloc(sizeof(*p), GFP_KERNEL);
 
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4cb1064..bde287a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -77,6 +77,9 @@ jiffies_64 = jiffies;
 	. = ALIGN(PMD_SIZE);					\
 	__start_data_decrypted = .;				\
 	*(.data..decrypted);					\
+	. = ALIGN(PAGE_SIZE);					\
+	__start_data_decrypted_aux = .;				\
+	*(.data..decrypted.aux);				\
 	. = ALIGN(PMD_SIZE);					\
 	__end_data_decrypted = .;				\
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 7a8fc26..052b279 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -815,9 +815,12 @@ void free_kernel_image_pages(void *begin, void *end)
 		set_memory_np_noalias(begin_ul, len_pages);
 }
 
+void __weak free_decrypted_mem(void) { }
+
 void __ref free_initmem(void)
 {
 	e820__reallocate_tables();
+	free_decrypted_mem();
 
 	free_kernel_image_pages(&__init_begin, &__init_end);
 }
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index b2de398..9a08c52 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -348,6 +348,16 @@ bool sev_active(void)
 EXPORT_SYMBOL(sev_active);
 
 /* Architecture __weak replacement functions */
+void __init free_decrypted_mem(void)
+{
+	if (mem_encrypt_active())
+		return;
+
+	free_init_pages("unused decrypted",
+			(unsigned long)__start_data_decrypted_aux,
+			(unsigned long)__end_data_decrypted);
+}
+
 void __init mem_encrypt_init(void)
 {
 	if (!sme_me_mask)
-- 
2.7.4



* Re: [PATCH v6 1/5] x86/mm: Restructure sme_encrypt_kernel()
  2018-09-07 17:57 ` [PATCH v6 1/5] x86/mm: Restructure sme_encrypt_kernel() Brijesh Singh
@ 2018-09-10 11:32   ` Borislav Petkov
  0 siblings, 0 replies; 37+ messages in thread
From: Borislav Petkov @ 2018-09-10 11:32 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, Tom Lendacky, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Sean Christopherson,
	Radim Krčmář

On Fri, Sep 07, 2018 at 12:57:26PM -0500, Brijesh Singh wrote:
> Re-arrange sme_encrypt_kernel() by moving the workarea map/unmap logic
> into separate static functions. There are no logical changes in this
> patch. The restructuring will allow us to expand sme_encrypt_kernel()
> in the future.

...

> -	/* Perform the encryption */
> -	sme_encrypt_execute(kernel_start, kernel_start + decrypted_base,
> -			    kernel_len, workarea_start, (unsigned long)ppd.pgd);
> +	wa->kernel_start = kernel_start;
> +	wa->kernel_end = kernel_end;
> +	wa->kernel_len = kernel_len;
>  
> -	if (initrd_len)
> -		sme_encrypt_execute(initrd_start, initrd_start + decrypted_base,
> -				    initrd_len, workarea_start,
> -				    (unsigned long)ppd.pgd);
> +	wa->initrd_start = initrd_start;
> +	wa->initrd_end = initrd_end;
> +	wa->initrd_len = initrd_len;
> +
> +	wa->workarea_start = workarea_start;
> +	wa->workarea_end = workarea_end;
> +	wa->workarea_len = workarea_len;
> +
> +	wa->decrypted_base = decrypted_base;
> +}
>  
> +static void __init teardown_workarea_map(struct sme_workarea_data *wa,
> +				         struct sme_populate_pgd_data *ppd)

ERROR: code indent should use tabs where possible
#214: FILE: arch/x86/mm/mem_encrypt_identity.c:469:
+^I^I^I^I         struct sme_populate_pgd_data *ppd)$

I think I've already said that to you but here it is again:

Please integrate scripts/checkpatch.pl into your patch creation
workflow. Some of the warnings/errors *actually* make sense.

Otherwise:

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v6 2/5] x86/mm: fix sme_populate_pgd() to update page flags
  2018-09-07 17:57 ` [PATCH v6 2/5] x86/mm: fix sme_populate_pgd() to update page flags Brijesh Singh
@ 2018-09-10 11:36   ` Borislav Petkov
  2018-09-10 12:28     ` Brijesh Singh
  0 siblings, 1 reply; 37+ messages in thread
From: Borislav Petkov @ 2018-09-10 11:36 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, Tom Lendacky, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Sean Christopherson,
	Radim Krčmář

On Fri, Sep 07, 2018 at 12:57:27PM -0500, Brijesh Singh wrote:
> Fix sme_populate_pgd() to update page flags if the PMD/PTE entry
> already exists.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: kvm@vger.kernel.org
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: kvm@vger.kernel.org
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> ---
>  arch/x86/mm/mem_encrypt_identity.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index 92265d3..7659e65 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -154,9 +154,6 @@ static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
>  		return;
>  
>  	pmd = pmd_offset(pud, ppd->vaddr);
> -	if (pmd_large(*pmd))
> -		return;
> -
>  	set_pmd(pmd, __pmd(ppd->paddr | ppd->pmd_flags));
>  }
>  
> @@ -182,8 +179,7 @@ static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
>  		return;
>  
>  	pte = pte_offset_map(pmd, ppd->vaddr);
> -	if (pte_none(*pte))
> -		set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));
> +	set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));
>  }
>  
>  static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
> -- 

This looks like a bugfix to me and as such should be:

* at the beginning of the series
* contain a Fixes: tag
* contain Cc: <stable@vger.kernel.org>

Right?

With that addressed:

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v6 3/5] x86/mm: add .data..decrypted section to hold shared variables
  2018-09-07 17:57 ` [PATCH v6 3/5] x86/mm: add .data..decrypted section to hold shared variables Brijesh Singh
@ 2018-09-10 11:54   ` Borislav Petkov
  2018-09-10 12:33     ` Brijesh Singh
  0 siblings, 1 reply; 37+ messages in thread
From: Borislav Petkov @ 2018-09-10 11:54 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, Tom Lendacky, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Sean Christopherson,
	Radim Krčmář

On Fri, Sep 07, 2018 at 12:57:28PM -0500, Brijesh Singh wrote:
> kvmclock defines few static variables which are shared with the
> hypervisor during the kvmclock initialization.
> 
> When SEV is active, memory is encrypted with a guest-specific key, and
> if guest OS wants to share the memory region with hypervisor then it must

"... if *the* guest OS ... with *the* hypervisor ..."

> clear the C-bit before sharing it.

<---- newline here.

> Currently, we use
> kernel_physical_mapping_init() to split large pages before clearing the
> C-bit on shared pages. But it fails when called from the kvmclock
> initialization (mainly because memblock allocator is not ready that early

		... because *the* memblock allocator...

hmm, those definite articles are kinda all lost... :)

> during boot).
> 
> Add a __decrypted section attribute which can be used when defining
> such shared variable. The so-defined variables will be placed in the
> .data..decrypted section. This section is mapped with C=0 early
> during boot, we also ensure that the initialized values are updated
> to match with C=0 (i.e perform an in-place decryption). The
> .data..decrypted section is PMD-aligned and sized so that we avoid
> the need to split the large pages when mapping the section.
> 
> The sme_encrypt_kernel() was used to perform the in-place encryption

Here, of all things, you don't need a definite article :)

"sme_encrypt_kernel() performs the in-place encryption...

> of the Linux kernel and initrd when SME is active. The routine has been
> enhanced to decrypt the .data..decrypted section for both SME and SEV
> cases.

Otherwise, that's a much better commit message!

> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: kvm@vger.kernel.org
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: kvm@vger.kernel.org
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> ---
>  arch/x86/include/asm/mem_encrypt.h |  6 +++
>  arch/x86/kernel/head64.c           | 11 +++++
>  arch/x86/kernel/vmlinux.lds.S      | 17 +++++++
>  arch/x86/mm/mem_encrypt_identity.c | 94 ++++++++++++++++++++++++++++++++------
>  4 files changed, 113 insertions(+), 15 deletions(-)

...

> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 8bde0a4..4cb1064 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -65,6 +65,21 @@ jiffies_64 = jiffies;
>  #define ALIGN_ENTRY_TEXT_BEGIN	. = ALIGN(PMD_SIZE);
>  #define ALIGN_ENTRY_TEXT_END	. = ALIGN(PMD_SIZE);
>  
> +/*
> + * This section contains data which will be mapped as decrypted. Memory
> + * encryption operates on a page basis. Make this section PMD-aligned
> + * to avoid spliting the pages while mapping the section early.

"splitting" - damn, that spell checker of yours is still not working... ;-\

> + *
> + * Note: We use a separate section so that only this section gets
> + * decrypted to avoid exposing more than we wish.
> + */
> +#define DATA_DECRYPTED						\
> +	. = ALIGN(PMD_SIZE);					\
> +	__start_data_decrypted = .;				\
> +	*(.data..decrypted);					\
> +	. = ALIGN(PMD_SIZE);					\
> +	__end_data_decrypted = .;				\
> +
>  #else
>  
>  #define X86_ALIGN_RODATA_BEGIN

...

> @@ -487,28 +510,69 @@ static void __init teardown_workarea_map(struct sme_workarea_data *wa,
>  	native_write_cr3(__native_read_cr3());
>  }
>  
> +static void __init decrypt_shared_data(struct sme_workarea_data *wa,
> +				       struct sme_populate_pgd_data *ppd)
> +{
> +	unsigned long decrypted_start, decrypted_end, decrypted_len;
> +
> +	/* Physical addresses of decrypted data section */
> +	decrypted_start = __pa_symbol(__start_data_decrypted);
> +	decrypted_end = ALIGN(__pa_symbol(__end_data_decrypted), PMD_PAGE_SIZE);

Why?

You have:

+       . = ALIGN(PMD_SIZE);                                    \
+       __end_data_decrypted = .;                               \

It is already aligned by definition?!?

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v6 4/5] x86/kvm: use __decrypted attribute in shared variables
  2018-09-07 17:57 ` [PATCH v6 4/5] x86/kvm: use __decrypted attribute in " Brijesh Singh
@ 2018-09-10 12:04   ` Borislav Petkov
  2018-09-10 13:15     ` Sean Christopherson
  2018-09-10 12:29   ` Paolo Bonzini
  1 sibling, 1 reply; 37+ messages in thread
From: Borislav Petkov @ 2018-09-10 12:04 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, Tom Lendacky, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Sean Christopherson,
	Radim Krčmář

On Fri, Sep 07, 2018 at 12:57:29PM -0500, Brijesh Singh wrote:
> Commit: 368a540e0232 (x86/kvmclock: Remove memblock dependency)
> caused SEV guest regression.

When mentioning a commit in the commit message, put it on a separate
line, like this:

"Commit

  368a540e0232 (x86/kvmclock: Remove memblock dependency)

caused a SEV guest regression."

> When SEV is active, we map the shared

Use passive tone in your commit message: no "we", etc...

> variables (wall_clock and hv_clock_boot) with C=0 to ensure that both
> the guest and the hypervisor are able to access the data. To map the
> variables we use kernel_physical_mapping_init() to split the large pages,

"... to potentially split large pages used for that mapping... "

> but splitting large pages requires allocating a new PMD, which fails now
> that kvmclock initialization is called early during boot.

"... before the memblock allocator is initialized."

> Recently we added a special .data..decrypted section to hold the shared
> variables.

You don't really need that sentence.

> This section is mapped with C=0 early during boot. Use
> __decrypted attribute to put the wall_clock and hv_clock_boot in
> .data..decrypted section so that they are mapped with C=0.

"... so that they're mapped decrypted." Readers don't care about C=0
- they simply wanna know what C=0 represents, i.e., memory is not
encrypted.

With that:

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-07 17:57 ` [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active Brijesh Singh
@ 2018-09-10 12:27   ` Borislav Petkov
  2018-09-10 13:15     ` Brijesh Singh
  2018-09-10 12:28   ` Paolo Bonzini
  1 sibling, 1 reply; 37+ messages in thread
From: Borislav Petkov @ 2018-09-10 12:27 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, Tom Lendacky, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Sean Christopherson,
	Radim Krčmář

On Fri, Sep 07, 2018 at 12:57:30PM -0500, Brijesh Singh wrote:
> Currently, the per-cpu pvclock data is allocated dynamically when
> cpu > HVC_BOOT_ARRAY_SIZE.

Well no, you need to write this correctly - what is "cpu >
HVC_BOOT_ARRAY_SIZE" ?!

( I know what it is but I know it only because I've looked at that code before.  )

So no, please explain it in English not in code.

> The physical address of this variable is
> shared between the guest and the hypervisor hence it must be mapped as
> unencrypted (ie. C=0) when SEV is active.

This sentence is a good example about how to explain stuff in commit
messages.

> The C-bit works on a page,

"The C-bit determines the encryption status of a 4K page."

> hence we will be required to perform a

Use passive tone in your commit message: no "we", etc...

> full 4k page allocation to store a single 32-byte pvclock variable. It
> will waste fairly sizeable amount of memory since each CPU will be doing

"... will waste *a* fairly sizeable amount of ..."

> a separate 4k allocation.

Start new paragraph here and use passive tone.

> Let's define a second array for the SEV case to
> statically allocate for NR_CPUS and put this array in .data..decrypted

NR_CPUS needs explaining for the unenlightened reader. Also,

	"... put this array in *the* .data..decrypted section... "

> section so that its mapped with C=0 during boot.

<---- newline here.

> The .data..decrypted
> section has a big chunk of memory that is currently unused. And since
> second array will be used only when memory encryption is active hence

"... since *the* second array... "

s/hence //

> free it when encryption is not active.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: kvm@vger.kernel.org
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: kvm@vger.kernel.org
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> ---
>  arch/x86/include/asm/mem_encrypt.h |  4 ++++
>  arch/x86/kernel/kvmclock.c         | 14 ++++++++++++++
>  arch/x86/kernel/vmlinux.lds.S      |  3 +++
>  arch/x86/mm/init.c                 |  3 +++
>  arch/x86/mm/mem_encrypt.c          | 10 ++++++++++
>  5 files changed, 34 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 802b2eb..cc46584 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -48,11 +48,13 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
>  
>  /* Architecture __weak replacement functions */
>  void __init mem_encrypt_init(void);
> +void __init free_decrypted_mem(void);

Proper prefixing:

"mem_encrypt_free_decrypted"

or so

>  bool sme_active(void);
>  bool sev_active(void);
>  
>  #define __decrypted __attribute__((__section__(".data..decrypted")))
> +#define __decrypted_aux __attribute__((__section__(".data..decrypted.aux")))
>  
>  #else	/* !CONFIG_AMD_MEM_ENCRYPT */
>  
> @@ -80,6 +82,7 @@ static inline int __init
>  early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0; }
>  
>  #define __decrypted
> +#define __decrypted_aux
>  
>  #endif	/* CONFIG_AMD_MEM_ENCRYPT */
>  
> @@ -93,6 +96,7 @@ early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0;
>  #define __sme_pa_nodebug(x)	(__pa_nodebug(x) | sme_me_mask)
>  
>  extern char __start_data_decrypted[], __end_data_decrypted[];
> +extern char __start_data_decrypted_aux[];
>  
>  #endif	/* __ASSEMBLY__ */
>  
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 376fd3a..6086b56 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -65,6 +65,15 @@ static struct pvclock_vsyscall_time_info
>  static struct pvclock_wall_clock wall_clock __decrypted;
>  static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
>  
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +/*
> + * The auxiliary array will be used when SEV is active. In non-SEV case,
> + * it will be freed by free_decrypted_mem().
> + */
> +static struct pvclock_vsyscall_time_info
> +			hv_clock_aux[NR_CPUS] __decrypted_aux;

Hmm, so worst case that's 64 4K pages:

(8192*32)/4096 = 64 4K pages.

Now, the real question from all this SNAFU is, why can't all those point
to a single struct pvclock_vsyscall_time_info and all CPUs read a single
thing? Why do they have to be per-CPU and thus waste so much memory?

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [PATCH v6 2/5] x86/mm: fix sme_populate_pgd() to update page flags
  2018-09-10 11:36   ` Borislav Petkov
@ 2018-09-10 12:28     ` Brijesh Singh
  2018-09-10 12:32       ` Borislav Petkov
  0 siblings, 1 reply; 37+ messages in thread
From: Brijesh Singh @ 2018-09-10 12:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, x86, linux-kernel, kvm, Tom Lendacky,
	Thomas Gleixner, H. Peter Anvin, Paolo Bonzini,
	Sean Christopherson, Radim Krčmář



On 9/10/18 6:36 AM, Borislav Petkov wrote:
> On Fri, Sep 07, 2018 at 12:57:27PM -0500, Brijesh Singh wrote:
>> Fix sme_populate_pgd() to update page flags if the PMD/PTE entry
>> already exists.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>> Cc: kvm@vger.kernel.org
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Borislav Petkov <bp@suse.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: linux-kernel@vger.kernel.org
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
>> Cc: kvm@vger.kernel.org
>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
>> ---
>>  arch/x86/mm/mem_encrypt_identity.c | 6 +-----
>>  1 file changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
>> index 92265d3..7659e65 100644
>> --- a/arch/x86/mm/mem_encrypt_identity.c
>> +++ b/arch/x86/mm/mem_encrypt_identity.c
>> @@ -154,9 +154,6 @@ static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
>>  		return;
>>  
>>  	pmd = pmd_offset(pud, ppd->vaddr);
>> -	if (pmd_large(*pmd))
>> -		return;
>> -
>>  	set_pmd(pmd, __pmd(ppd->paddr | ppd->pmd_flags));
>>  }
>>  
>> @@ -182,8 +179,7 @@ static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
>>  		return;
>>  
>>  	pte = pte_offset_map(pmd, ppd->vaddr);
>> -	if (pte_none(*pte))
>> -		set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));
>> +	set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));
>>  }
>>  
>>  static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
>> -- 
> This looks like a bugfix to me and as such should be:
>
> * at the beginning of the series
> * contain a Fixes: tag
> * contain Cc: <stable@vger.kernel.org>
>
> Right?

Based on your advice I was going to submit this series to stable
separately. This particular issue is not affecting anyone right now. The
only user of this function is sme_encrypt_kernel(), which never updates
the PTE/PMD entries. I can update the commit message like this to
clarify it:

Enhance the sme_populate_pgd() to update page flags if the PMD/PTE entry
already exists.



> With that addressed:
>
> Reviewed-by: Borislav Petkov <bp@suse.de>
>



* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-07 17:57 ` [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active Brijesh Singh
  2018-09-10 12:27   ` Borislav Petkov
@ 2018-09-10 12:28   ` Paolo Bonzini
  1 sibling, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2018-09-10 12:28 UTC (permalink / raw)
  To: Brijesh Singh, x86, linux-kernel, kvm
  Cc: Tom Lendacky, Thomas Gleixner, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Radim Krčmář

On 07/09/2018 19:57, Brijesh Singh wrote:
> Currently, the per-cpu pvclock data is allocated dynamically when
> cpu > HVC_BOOT_ARRAY_SIZE. The physical address of this variable is
> shared between the guest and the hypervisor hence it must be mapped as
> unencrypted (ie. C=0) when SEV is active.
> 
> The C-bit works on a page, hence we will be required to perform a
> full 4k page allocation to store a single 32-byte pvclock variable. It
> will waste fairly sizeable amount of memory since each CPU will be doing
> a separate 4k allocation. Let's define a second array for the SEV case to
> statically allocate for NR_CPUS and put this array in .data..decrypted
> section so that its mapped with C=0 during boot. The .data..decrypted
> section has a big chunk of memory that is currently unused. And since
> second array will be used only when memory encryption is active hence
> free it when encryption is not active.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: kvm@vger.kernel.org
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: kvm@vger.kernel.org
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> ---
>  arch/x86/include/asm/mem_encrypt.h |  4 ++++
>  arch/x86/kernel/kvmclock.c         | 14 ++++++++++++++
>  arch/x86/kernel/vmlinux.lds.S      |  3 +++
>  arch/x86/mm/init.c                 |  3 +++
>  arch/x86/mm/mem_encrypt.c          | 10 ++++++++++
>  5 files changed, 34 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 802b2eb..cc46584 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -48,11 +48,13 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
>  
>  /* Architecture __weak replacement functions */
>  void __init mem_encrypt_init(void);
> +void __init free_decrypted_mem(void);
>  
>  bool sme_active(void);
>  bool sev_active(void);
>  
>  #define __decrypted __attribute__((__section__(".data..decrypted")))
> +#define __decrypted_aux __attribute__((__section__(".data..decrypted.aux")))
>  
>  #else	/* !CONFIG_AMD_MEM_ENCRYPT */
>  
> @@ -80,6 +82,7 @@ static inline int __init
>  early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0; }
>  
>  #define __decrypted
> +#define __decrypted_aux
>  
>  #endif	/* CONFIG_AMD_MEM_ENCRYPT */
>  
> @@ -93,6 +96,7 @@ early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0;
>  #define __sme_pa_nodebug(x)	(__pa_nodebug(x) | sme_me_mask)
>  
>  extern char __start_data_decrypted[], __end_data_decrypted[];
> +extern char __start_data_decrypted_aux[];
>  
>  #endif	/* __ASSEMBLY__ */
>  
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 376fd3a..6086b56 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -65,6 +65,15 @@ static struct pvclock_vsyscall_time_info
>  static struct pvclock_wall_clock wall_clock __decrypted;
>  static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
>  
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +/*
> + * The auxiliary array will be used when SEV is active. In non-SEV case,
> + * it will be freed by free_decrypted_mem().
> + */
> +static struct pvclock_vsyscall_time_info
> +			hv_clock_aux[NR_CPUS] __decrypted_aux;
> +#endif
> +
>  static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
>  {
>  	return &this_cpu_read(hv_clock_per_cpu)->pvti;
> @@ -269,6 +278,11 @@ static int kvmclock_setup_percpu(unsigned int cpu)
>  	/* Use the static page for the first CPUs, allocate otherwise */
>  	if (cpu < HVC_BOOT_ARRAY_SIZE)
>  		p = &hv_clock_boot[cpu];
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +	/* Use the static page from auxiliary array instead of allocating it. */
> +	else if (sev_active())
> +		p = &hv_clock_aux[cpu - HVC_BOOT_ARRAY_SIZE];
> +#endif
>  	else
>  		p = kzalloc(sizeof(*p), GFP_KERNEL);
>  
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 4cb1064..bde287a 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -77,6 +77,9 @@ jiffies_64 = jiffies;
>  	. = ALIGN(PMD_SIZE);					\
>  	__start_data_decrypted = .;				\
>  	*(.data..decrypted);					\
> +	. = ALIGN(PAGE_SIZE);					\
> +	__start_data_decrypted_aux = .;				\
> +	*(.data..decrypted.aux);				\
>  	. = ALIGN(PMD_SIZE);					\
>  	__end_data_decrypted = .;				\
>  
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 7a8fc26..052b279 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -815,9 +815,12 @@ void free_kernel_image_pages(void *begin, void *end)
>  		set_memory_np_noalias(begin_ul, len_pages);
>  }
>  
> +void __weak free_decrypted_mem(void) { }
> +
>  void __ref free_initmem(void)
>  {
>  	e820__reallocate_tables();
> +	free_decrypted_mem();
>  
>  	free_kernel_image_pages(&__init_begin, &__init_end);
>  }
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index b2de398..9a08c52 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -348,6 +348,16 @@ bool sev_active(void)
>  EXPORT_SYMBOL(sev_active);
>  
>  /* Architecture __weak replacement functions */
> +void __init free_decrypted_mem(void)
> +{
> +	if (mem_encrypt_active())
> +		return;
> +
> +	free_init_pages("unused decrypted",
> +			(unsigned long)__start_data_decrypted_aux,
> +			(unsigned long)__end_data_decrypted);
> +}
> +
>  void __init mem_encrypt_init(void)
>  {
>  	if (!sme_me_mask)
> 

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 4/5] x86/kvm: use __decrypted attribute in shared variables
  2018-09-07 17:57 ` [PATCH v6 4/5] x86/kvm: use __decrypted attribute in " Brijesh Singh
  2018-09-10 12:04   ` Borislav Petkov
@ 2018-09-10 12:29   ` Paolo Bonzini
  2018-09-10 12:33     ` Borislav Petkov
  1 sibling, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2018-09-10 12:29 UTC (permalink / raw)
  To: Brijesh Singh, x86, linux-kernel, kvm
  Cc: Tom Lendacky, Thomas Gleixner, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Radim Krčmář

On 07/09/2018 19:57, Brijesh Singh wrote:
> Commit: 368a540e0232 (x86/kvmclock: Remove memblock dependency)
> caused SEV guest regression. When SEV is active, we map the shared
> variables (wall_clock and hv_clock_boot) with C=0 to ensure that both
> the guest and the hypervisor are able to access the data. To map the
> variables we use kernel_physical_mapping_init() to split the large pages,
> but splitting large pages requires allocating a new PMD, which fails now
> that kvmclock initialization is called early during boot.
> 
> Recently we added a special .data..decrypted section to hold the shared
> variables. This section is mapped with C=0 early during boot. Use
> __decrypted attribute to put the wall_clock and hv_clock_boot in
> .data..decrypted section so that they are mapped with C=0.
> 
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> Fixes: 368a540e0232 ("x86/kvmclock: Remove memblock dependency")
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: kvm@vger.kernel.org
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: kvm@vger.kernel.org
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> ---
>  arch/x86/kernel/kvmclock.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 1e67646..376fd3a 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -61,8 +61,8 @@ early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
>  	(PAGE_SIZE / sizeof(struct pvclock_vsyscall_time_info))
>  
>  static struct pvclock_vsyscall_time_info
> -			hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __aligned(PAGE_SIZE);
> -static struct pvclock_wall_clock wall_clock;
> +			hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __decrypted __aligned(PAGE_SIZE);
> +static struct pvclock_wall_clock wall_clock __decrypted;
>  static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
>  
>  static inline struct pvclock_vcpu_time_info *this_cpu_pvti(void)
> 

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

(Though perhaps __noencrypt or __unencrypted would be a bit more
accurate; likewise for the freeing function added in patch 5).

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 2/5] x86/mm: fix sme_populate_pgd() to update page flags
  2018-09-10 12:28     ` Brijesh Singh
@ 2018-09-10 12:32       ` Borislav Petkov
  0 siblings, 0 replies; 37+ messages in thread
From: Borislav Petkov @ 2018-09-10 12:32 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, Tom Lendacky, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Sean Christopherson,
	Radim Krčmář

On Mon, Sep 10, 2018 at 07:28:08AM -0500, Brijesh Singh wrote:
> Based on your advice I was going to submit this series to stable
> separately. This particular issue is not affecting anyone right now: the
> only user of this function is sme_encrypt_kernel(), which never updates
> the PTE/PMD entries. I can update the commit message like this to
> clarify it.
>
> Enhance sme_populate_pgd() to update the page flags if the PMD/PTE entry
> already exists.

Agreed, except you still need the Fixes: tag to go with the patch. And
that Fixes: tag should point to the patch which added the checks you're
removing, I'd say.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 3/5] x86/mm: add .data..decrypted section to hold shared variables
  2018-09-10 11:54   ` Borislav Petkov
@ 2018-09-10 12:33     ` Brijesh Singh
  0 siblings, 0 replies; 37+ messages in thread
From: Brijesh Singh @ 2018-09-10 12:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, x86, linux-kernel, kvm, Tom Lendacky,
	Thomas Gleixner, H. Peter Anvin, Paolo Bonzini,
	Sean Christopherson, Radim Krčmář



On 9/10/18 6:54 AM, Borislav Petkov wrote:
...

>> @@ -487,28 +510,69 @@ static void __init teardown_workarea_map(struct sme_workarea_data *wa,
>>  	native_write_cr3(__native_read_cr3());
>>  }
>>  
>> +static void __init decrypt_shared_data(struct sme_workarea_data *wa,
>> +				       struct sme_populate_pgd_data *ppd)
>> +{
>> +	unsigned long decrypted_start, decrypted_end, decrypted_len;
>> +
>> +	/* Physical addresses of decrypted data section */
>> +	decrypted_start = __pa_symbol(__start_data_decrypted);
>> +	decrypted_end = ALIGN(__pa_symbol(__end_data_decrypted), PMD_PAGE_SIZE);
> Why?
>
> You have:
>
> +       . = ALIGN(PMD_SIZE);                                    \
> +       __end_data_decrypted = .;                               \
>
> It is already aligned by definition?!?
>

Right, there is no need to use ALIGN. I will remove it.
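
For reference, the resulting assignment in decrypt_shared_data() would
then simply be (a sketch, assuming nothing else in the helper changes):

	decrypted_end = __pa_symbol(__end_data_decrypted);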

-Brijesh

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 4/5] x86/kvm: use __decrypted attribute in shared variables
  2018-09-10 12:29   ` Paolo Bonzini
@ 2018-09-10 12:33     ` Borislav Petkov
  2018-09-10 12:46       ` Paolo Bonzini
  0 siblings, 1 reply; 37+ messages in thread
From: Borislav Petkov @ 2018-09-10 12:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Brijesh Singh, x86, linux-kernel, kvm, Tom Lendacky,
	Thomas Gleixner, H. Peter Anvin, Sean Christopherson,
	Radim Krčmář

On Mon, Sep 10, 2018 at 02:29:07PM +0200, Paolo Bonzini wrote:
> (Though perhaps __noencrypt or __unencrypted would be a bit more
> accurate; likewise for the freeing function added in patch 5).

It has been raised already:

https://lkml.kernel.org/r/20180830092606.GC18459@nazgul.tnic

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 4/5] x86/kvm: use __decrypted attribute in shared variables
  2018-09-10 12:33     ` Borislav Petkov
@ 2018-09-10 12:46       ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2018-09-10 12:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Brijesh Singh, x86, linux-kernel, kvm, Tom Lendacky,
	Thomas Gleixner, H. Peter Anvin, Sean Christopherson,
	Radim Krčmář

On 10/09/2018 14:33, Borislav Petkov wrote:
> On Mon, Sep 10, 2018 at 02:29:07PM +0200, Paolo Bonzini wrote:
>> (Though perhaps __noencrypt or __unencrypted would be a bit more
>> accurate; likewise for the freeing function added in patch 5).
> 
> It has been raised already:
> 
> https://lkml.kernel.org/r/20180830092606.GC18459@nazgul.tnic

Fair enough (I am just back from vacation and still in delete-happy
mode, so I missed that).

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 4/5] x86/kvm: use __decrypted attribute in shared variables
  2018-09-10 12:04   ` Borislav Petkov
@ 2018-09-10 13:15     ` Sean Christopherson
  2018-09-10 13:29       ` Thomas Gleixner
  2018-09-10 15:34       ` Borislav Petkov
  0 siblings, 2 replies; 37+ messages in thread
From: Sean Christopherson @ 2018-09-10 13:15 UTC (permalink / raw)
  To: Borislav Petkov, Brijesh Singh
  Cc: x86, linux-kernel, kvm, Tom Lendacky, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Radim Krčmář

On Mon, 2018-09-10 at 14:04 +0200, Borislav Petkov wrote:
> On Fri, Sep 07, 2018 at 12:57:29PM -0500, Brijesh Singh wrote:
> > 
> > Commit: 368a540e0232 (x86/kvmclock: Remove memblock dependency)
> > caused SEV guest regression.
> When mentioning a commit in the commit message, put it on a separate
> line, like this:
> 
> "Commit
> 
>   368a540e0232 (x86/kvmclock: Remove memblock dependency)
> 
> caused a SEV guest regression."

Heh, that was the original formatting until I asked him to change it:
https://lkml.org/lkml/2018/8/29/809.  Checkpatch throws an error if
there's a newline after 'Commit'.  Though looking at this again, there
shouldn't be a colon after 'Commit' and there should be quotes inside
the parentheses, e.g. this satisfies checkpatch:

Commit 368a540e0232 ("x86/kvmclock: Remove memblock dependency")

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-10 12:27   ` Borislav Petkov
@ 2018-09-10 13:15     ` Brijesh Singh
  2018-09-10 13:29       ` Sean Christopherson
  2018-09-10 15:53       ` Borislav Petkov
  0 siblings, 2 replies; 37+ messages in thread
From: Brijesh Singh @ 2018-09-10 13:15 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, x86, linux-kernel, kvm, Tom Lendacky,
	Thomas Gleixner, H. Peter Anvin, Paolo Bonzini,
	Sean Christopherson, Radim Krčmář



On 9/10/18 7:27 AM, Borislav Petkov wrote:
> On Fri, Sep 07, 2018 at 12:57:30PM -0500, Brijesh Singh wrote:
>> Currently, the per-cpu pvclock data is allocated dynamically when
>> cpu > HVC_BOOT_ARRAY_SIZE.
> Well no, you need to write this correctly - what is "cpu >
> HVC_BOOT_ARRAY_SIZE" ?!
>
> ( I know what it is but I know it only because I've looked at that code before.  )
>
> So no, please explain it in English not in code.


>> The physical address of this variable is
>> shared between the guest and the hypervisor hence it must be mapped as
>> unencrypted (ie. C=0) when SEV is active.
> This sentence is a good example about how to explain stuff in commit
> messages.
>
>> The C-bit works on a page,
> "The C-bit determines the encryption status of a 4K page."
>
>> hence we will be required to perform a
> Use passive tone in your commit message: no "we", etc...
>
>> full 4k page allocation to store a single 32-byte pvclock variable. It
>> will waste fairly sizeable amount of memory since each CPU will be doing
> "... will waste *a* fairly sizeable amount of ..."
>
>> a separate 4k allocation.
> Start new paragraph here and use passive tone.
>
>> Let's define a second array for the SEV case to
>> statically allocate for NR_CPUS and put this array in .data..decrypted
> NR_CPUS needs explaining for the unenlightened reader. Also,
>
> 	"... put this array in *the* .data..decrypted section... "
>
>> section so that its mapped with C=0 during boot.
> <---- newline here.
>
>> The .data..decrypted
>> section has a big chunk of memory that is currently unused. And since
>> second array will be used only when memory encryption is active hence
> "... since *the* second array... "
>
> s/hence //
>
>> free it when encryption is not active.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>> Cc: kvm@vger.kernel.org
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Borislav Petkov <bp@suse.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: linux-kernel@vger.kernel.org
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
>> Cc: kvm@vger.kernel.org
>> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
>> ---
>>  arch/x86/include/asm/mem_encrypt.h |  4 ++++
>>  arch/x86/kernel/kvmclock.c         | 14 ++++++++++++++
>>  arch/x86/kernel/vmlinux.lds.S      |  3 +++
>>  arch/x86/mm/init.c                 |  3 +++
>>  arch/x86/mm/mem_encrypt.c          | 10 ++++++++++
>>  5 files changed, 34 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> index 802b2eb..cc46584 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -48,11 +48,13 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
>>  
>>  /* Architecture __weak replacement functions */
>>  void __init mem_encrypt_init(void);
>> +void __init free_decrypted_mem(void);
> Proper prefixing:
>
> "mem_encrypt_free_decrypted"
>
> or so
>
>>  bool sme_active(void);
>>  bool sev_active(void);
>>  
>>  #define __decrypted __attribute__((__section__(".data..decrypted")))
>> +#define __decrypted_aux __attribute__((__section__(".data..decrypted.aux")))
>>  
>>  #else	/* !CONFIG_AMD_MEM_ENCRYPT */
>>  
>> @@ -80,6 +82,7 @@ static inline int __init
>>  early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0; }
>>  
>>  #define __decrypted
>> +#define __decrypted_aux
>>  
>>  #endif	/* CONFIG_AMD_MEM_ENCRYPT */
>>  
>> @@ -93,6 +96,7 @@ early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0;
>>  #define __sme_pa_nodebug(x)	(__pa_nodebug(x) | sme_me_mask)
>>  
>>  extern char __start_data_decrypted[], __end_data_decrypted[];
>> +extern char __start_data_decrypted_aux[];
>>  
>>  #endif	/* __ASSEMBLY__ */
>>  
>> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
>> index 376fd3a..6086b56 100644
>> --- a/arch/x86/kernel/kvmclock.c
>> +++ b/arch/x86/kernel/kvmclock.c
>> @@ -65,6 +65,15 @@ static struct pvclock_vsyscall_time_info
>>  static struct pvclock_wall_clock wall_clock __decrypted;
>>  static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
>>  
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +/*
>> + * The auxiliary array will be used when SEV is active. In non-SEV case,
>> + * it will be freed by free_decrypted_mem().
>> + */
>> +static struct pvclock_vsyscall_time_info
>> +			hv_clock_aux[NR_CPUS] __decrypted_aux;
> Hmm, so worst case that's 64 4K pages:
>
> (8192*32)/4096 = 64 4K pages.

We can minimize the worst case memory usage. The number of vCPUs
supported by KVM may be less than NR_CPUS, e.g. currently KVM_MAX_VCPUS
is set to 288.

(288 * 64)/4096 = 4.5, i.e. at most 5 4K pages.

(pvclock_vsyscall_time_info is cache aligned, so each entry is 64 bytes)


#if NR_CPUS > KVM_MAX_VCPUS
#define HV_AUX_ARRAY_SIZE  KVM_MAX_VCPUS
#else
#define HV_AUX_ARRAY_SIZE NR_CPUS
#endif

static struct pvclock_vsyscall_time_info
                        hv_clock_aux[HV_AUX_ARRAY_SIZE] __decrypted_aux;


> Now, the real question from all this SNAFU is, why can't all those point
> to a single struct pvclock_vsyscall_time_info and all CPUs read a single
> thing? Why do they have to be per-CPU and thus waste so much memory?
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-10 13:15     ` Brijesh Singh
@ 2018-09-10 13:29       ` Sean Christopherson
  2018-09-10 15:10         ` Brijesh Singh
  2018-09-10 15:53       ` Borislav Petkov
  1 sibling, 1 reply; 37+ messages in thread
From: Sean Christopherson @ 2018-09-10 13:29 UTC (permalink / raw)
  To: Brijesh Singh, Borislav Petkov
  Cc: x86, linux-kernel, kvm, Tom Lendacky, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Radim Krčmář

On Mon, 2018-09-10 at 08:15 -0500, Brijesh Singh wrote:
> 
> On 9/10/18 7:27 AM, Borislav Petkov wrote:
> > 
> > On Fri, Sep 07, 2018 at 12:57:30PM -0500, Brijesh Singh wrote:
> > > 
> > > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> > > index 376fd3a..6086b56 100644
> > > --- a/arch/x86/kernel/kvmclock.c
> > > +++ b/arch/x86/kernel/kvmclock.c
> > > @@ -65,6 +65,15 @@ static struct pvclock_vsyscall_time_info
> > >  static struct pvclock_wall_clock wall_clock __decrypted;
> > >  static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
> > >  
> > > +#ifdef CONFIG_AMD_MEM_ENCRYPT
> > > +/*
> > > + * The auxiliary array will be used when SEV is active. In non-SEV case,
> > > + * it will be freed by free_decrypted_mem().
> > > + */
> > > +static struct pvclock_vsyscall_time_info
> > > +			hv_clock_aux[NR_CPUS] __decrypted_aux;
> > Hmm, so worst case that's 64 4K pages:
> > 
> > (8192*32)/4096 = 64 4K pages.
> We can minimize the worst case memory usage. The number of vCPUs
> supported by KVM may be less than NR_CPUS, e.g. currently KVM_MAX_VCPUS
> is set to 288.

KVM_MAX_VCPUS is a property of the host, whereas this code runs in the
guest, e.g. KVM_MAX_VCPUS could be 2048 in the host for all we know.

> (288 * 64)/4096 = 4.5, i.e. at most 5 4K pages.
> 
> (pvclock_vsyscall_time_info is cache aligned, so each entry is 64 bytes)

Ah, I was wondering why my calculations were always different than
yours.  I was looking at struct pvclock_vcpu_time_info, which is 32
bytes.

> #if NR_CPUS > KVM_MAX_VCPUS
> #define HV_AUX_ARRAY_SIZE  KVM_MAX_VCPUS
> #else
> #define HV_AUX_ARRAY_SIZE NR_CPUS
> #endif
> 
> static struct pvclock_vsyscall_time_info
>                         hv_clock_aux[HV_AUX_ARRAY_SIZE] __decrypted_aux;


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 4/5] x86/kvm: use __decrypted attribute in shared variables
  2018-09-10 13:15     ` Sean Christopherson
@ 2018-09-10 13:29       ` Thomas Gleixner
  2018-09-10 15:34       ` Borislav Petkov
  1 sibling, 0 replies; 37+ messages in thread
From: Thomas Gleixner @ 2018-09-10 13:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Borislav Petkov, Brijesh Singh, x86, linux-kernel, kvm,
	Tom Lendacky, H. Peter Anvin, Paolo Bonzini,
	Radim Krčmář

On Mon, 10 Sep 2018, Sean Christopherson wrote:

> On Mon, 2018-09-10 at 14:04 +0200, Borislav Petkov wrote:
> > On Fri, Sep 07, 2018 at 12:57:29PM -0500, Brijesh Singh wrote:
> > > 
> > > Commit: 368a540e0232 (x86/kvmclock: Remove memblock dependency)
> > > caused SEV guest regression.
> > When mentioning a commit in the commit message, put it on a separate
> > line, like this:
> > 
> > "Commit
> > 
> >   368a540e0232 (x86/kvmclock: Remove memblock dependency)
> > 
> > caused a SEV guest regression."
> 
> Heh, that was the original formatting until I asked him to change it:
> https://lkml.org/lkml/2018/8/29/809.  Checkpatch throws an error if
> there's a newline after 'Commit'.  Though looking at this again, there
> shouldn't be a colon after 'Commit' and there should be quotes inside
> the parentheses, e.g. this satisfies checkpatch:
> 
> Commit 368a540e0232 ("x86/kvmclock: Remove memblock dependency")

The whole 5 lines of 'commit bla broke' are bogus to me. They don't
convey any really useful information, and the commit line will repeat
itself in the Fixes: tag. So something like:

  The recent removal of the memblock dependency caused a SEV guest
  regression because bla....


  Fixes: 368a540e0232 ("x86/kvmclock: Remove memblock dependency")

is much more useful than the above, independent of its formatting.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-10 13:29       ` Sean Christopherson
@ 2018-09-10 15:10         ` Brijesh Singh
  2018-09-10 15:28           ` Sean Christopherson
  0 siblings, 1 reply; 37+ messages in thread
From: Brijesh Singh @ 2018-09-10 15:10 UTC (permalink / raw)
  To: Sean Christopherson, Borislav Petkov
  Cc: brijesh.singh, x86, linux-kernel, kvm, Tom Lendacky,
	Thomas Gleixner, H. Peter Anvin, Paolo Bonzini,
	Radim Krčmář



On 09/10/2018 08:29 AM, Sean Christopherson wrote:
...

>>>> + */
>>>> +static struct pvclock_vsyscall_time_info
>>>> +			hv_clock_aux[NR_CPUS] __decrypted_aux;
>>> Hmm, so worst case that's 64 4K pages:
>>>
>>> (8192*32)/4096 = 64 4K pages.
>> We can minimize the worst case memory usage. The number of vCPUs
>> supported by KVM may be less than NR_CPUS, e.g. currently KVM_MAX_VCPUS
>> is set to 288.
> 
> KVM_MAX_VCPUS is a property of the host, whereas this code runs in the
> guest, e.g. KVM_MAX_VCPUS could be 2048 in the host for all we know.
> 


IIRC, at guest creation time qemu checks the host's supported vCPU
count. If the count is greater than KVM_MAX_VCPUS then it will fail to
launch the guest (or fail to hot-plug vCPUs). In other words, the
number of vCPUs in a KVM guest will never be > KVM_MAX_VCPUS.

Am I missing something?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-10 15:10         ` Brijesh Singh
@ 2018-09-10 15:28           ` Sean Christopherson
  2018-09-10 15:30             ` Brijesh Singh
  0 siblings, 1 reply; 37+ messages in thread
From: Sean Christopherson @ 2018-09-10 15:28 UTC (permalink / raw)
  To: Brijesh Singh, Borislav Petkov
  Cc: x86, linux-kernel, kvm, Tom Lendacky, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Radim Krčmář

On Mon, 2018-09-10 at 10:10 -0500, Brijesh Singh wrote:
> 
> On 09/10/2018 08:29 AM, Sean Christopherson wrote:
> ...
> 
> > > > > + */
> > > > > +static struct pvclock_vsyscall_time_info
> > > > > +			hv_clock_aux[NR_CPUS] __decrypted_aux;
> > > > Hmm, so worst case that's 64 4K pages:
> > > > 
> > > > (8192*32)/4096 = 64 4K pages.
> > > We can minimize the worst case memory usage. The number of vCPUs
> > > supported by KVM may be less than NR_CPUS, e.g. currently KVM_MAX_VCPUS
> > > is set to 288.
> > KVM_MAX_VCPUS is a property of the host, whereas this code runs in the
> > guest, e.g. KVM_MAX_VCPUS could be 2048 in the host for all we know.
> > 
> 
> IIRC, at guest creation time qemu checks the host's supported vCPU
> count. If the count is greater than KVM_MAX_VCPUS then it will fail to
> launch the guest (or fail to hot-plug vCPUs). In other words, the
> number of vCPUs in a KVM guest will never be > KVM_MAX_VCPUS.
> 
> Am I missing something?

KVM_MAX_VCPUS is a definition for use in the *host*; it's even defined
in kvm_host.h.  The guest's pvclock code won't get magically recompiled
if KVM_MAX_VCPUS is changed in the host.  KVM_MAX_VCPUS is an arbitrary
value in the sense that there isn't a fundamental hard limit, i.e. the
value can be changed, either for a custom KVM build or in mainline,
e.g. it was bumped in 2016:

commit 682f732ecf7396e9d6fe24d44738966699fae6c0
Author: Radim Krčmář <rkrcmar@redhat.com>
Date:   Tue Jul 12 22:09:29 2016 +0200

    KVM: x86: bump MAX_VCPUS to 288
    
    288 is in high demand because of Knights Landing CPU.
    We cannot set the limit to 640k, because that would be wasting space.
    
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 074b5c760327..21a40dc7aad6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -34,7 +34,7 @@
 #include <asm/asm.h>
 #include <asm/kvm_page_track.h>
 
-#define KVM_MAX_VCPUS 255
+#define KVM_MAX_VCPUS 288
 #define KVM_SOFT_MAX_VCPUS 240
 #define KVM_USER_MEM_SLOTS 509
 /* memory slots that are not exposed to userspace */

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-10 15:28           ` Sean Christopherson
@ 2018-09-10 15:30             ` Brijesh Singh
  2018-09-10 16:48               ` Borislav Petkov
  0 siblings, 1 reply; 37+ messages in thread
From: Brijesh Singh @ 2018-09-10 15:30 UTC (permalink / raw)
  To: Sean Christopherson, Borislav Petkov
  Cc: brijesh.singh, x86, linux-kernel, kvm, Tom Lendacky,
	Thomas Gleixner, H. Peter Anvin, Paolo Bonzini,
	Radim Krčmář



On 09/10/2018 10:28 AM, Sean Christopherson wrote:
...

>>
>> IIRC, at guest creation time qemu checks the host's supported vCPU
>> count. If the count is greater than KVM_MAX_VCPUS then it will fail to
>> launch the guest (or fail to hot-plug vCPUs). In other words, the
>> number of vCPUs in a KVM guest will never be > KVM_MAX_VCPUS.
>>
>> Am I missing something?
> 
> KVM_MAX_VCPUS is a definition for use in the *host*; it's even defined
> in kvm_host.h.  The guest's pvclock code won't get magically recompiled
> if KVM_MAX_VCPUS is changed in the host.  KVM_MAX_VCPUS is an arbitrary
> value in the sense that there isn't a fundamental hard limit, i.e. the
> value can be changed, either for a custom KVM build or in mainline,
> e.g. it was bumped in 2016:
> 

Ah I see your point. thanks for clarifying it.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 4/5] x86/kvm: use __decrypted attribute in shared variables
  2018-09-10 13:15     ` Sean Christopherson
  2018-09-10 13:29       ` Thomas Gleixner
@ 2018-09-10 15:34       ` Borislav Petkov
  1 sibling, 0 replies; 37+ messages in thread
From: Borislav Petkov @ 2018-09-10 15:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Brijesh Singh, x86, linux-kernel, kvm, Tom Lendacky,
	Thomas Gleixner, H. Peter Anvin, Paolo Bonzini,
	Radim Krčmář

On Mon, Sep 10, 2018 at 06:15:10AM -0700, Sean Christopherson wrote:
> Heh, that was the original formatting until I asked him to change it:
> https://lkml.org/lkml/2018/8/29/809.  Checkpatch throws an error if
> there's a newline after 'Commit'.  Though looking at this again, there
> shouldn't be a colon after 'Commit' and there should be quotes inside
> the parentheses, e.g. this satisfies checkpatch:

Yeah, we don't take checkpatch too seriously - only when it makes sense
to a human brain, roughly speaking. In this case, the logic should be
that the culprit commit is visible immediately.

Although after the Fixes: tag thing got introduced, that argument is not
as important anymore, I'd say.

As to the formatting, I think adding this to your git config helps in
that regard:

[alias]
	...
        one = show -s --pretty='format:%h (\"%s\")'

so that you can do:

$ git one 368a540e0232
368a540e0232 ("x86/kvmclock: Remove memblock dependency")

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-10 13:15     ` Brijesh Singh
  2018-09-10 13:29       ` Sean Christopherson
@ 2018-09-10 15:53       ` Borislav Petkov
  2018-09-10 16:13         ` Sean Christopherson
  2018-09-10 16:14         ` Brijesh Singh
  1 sibling, 2 replies; 37+ messages in thread
From: Borislav Petkov @ 2018-09-10 15:53 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: x86, linux-kernel, kvm, Tom Lendacky, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Sean Christopherson,
	Radim Krčmář

On Mon, Sep 10, 2018 at 08:15:38AM -0500, Brijesh Singh wrote:
> > Now, the real question from all this SNAFU is, why can't all those point
> > to a single struct pvclock_vsyscall_time_info and all CPUs read a single
> > thing? Why do they have to be per-CPU and thus waste so much memory?

You forgot to answer to the real question - why do we need those things
to be perCPU and why can't we use a single instance to share with *all*
CPUs?

Because if we can, then we're golden!

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-10 15:53       ` Borislav Petkov
@ 2018-09-10 16:13         ` Sean Christopherson
  2018-09-10 16:14         ` Brijesh Singh
  1 sibling, 0 replies; 37+ messages in thread
From: Sean Christopherson @ 2018-09-10 16:13 UTC (permalink / raw)
  To: Borislav Petkov, Brijesh Singh
  Cc: x86, linux-kernel, kvm, Tom Lendacky, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Radim Krčmář

On Mon, 2018-09-10 at 17:53 +0200, Borislav Petkov wrote:
> On Mon, Sep 10, 2018 at 08:15:38AM -0500, Brijesh Singh wrote:
> > 
> > > 
> > > Now, the real question from all this SNAFU is, why can't all those point
> > > to a single struct pvclock_vsyscall_time_info and all CPUs read a single
> > > thing? Why do they have to be per-CPU and thus waste so much memory?
> You forgot to answer to the real question - why do we need those things
> to be perCPU and why can't we use a single instance to share with *all*
> CPUs?

I can't speak to the actual TSC stuff, but...

The pvclock ABI includes a per-vCPU bit, PVCLOCK_GUEST_STOPPED, to indicate
that the VM has been paused by the host.  The guest uses this information to
update its watchdogs to avoid false positives.  I have no idea if there are
use cases for setting STOPPED on a subset of vCPUs, but the ABI allows it so
here we are...
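
For context, roughly how the guest consumes that bit: a simplified
sketch modeled on kvm_check_and_clear_guest_paused() in kvmclock.c (not
the exact upstream code; this_cpu_pvti() is the helper from the patch):

	bool kvm_check_and_clear_guest_paused(void)
	{
		struct pvclock_vcpu_time_info *src = this_cpu_pvti();

		if (!(src->flags & PVCLOCK_GUEST_STOPPED))
			return false;

		/* Clear the per-vCPU bit and quiet the watchdogs. */
		src->flags &= ~PVCLOCK_GUEST_STOPPED;
		pvclock_touch_watchdogs();
		return true;
	}

Since the bit lives in each vCPU's own pvti, a single struct shared by
all vCPUs could not represent it for a subset of them.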

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-10 15:53       ` Borislav Petkov
  2018-09-10 16:13         ` Sean Christopherson
@ 2018-09-10 16:14         ` Brijesh Singh
  1 sibling, 0 replies; 37+ messages in thread
From: Brijesh Singh @ 2018-09-10 16:14 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: brijesh.singh, x86, linux-kernel, kvm, Tom Lendacky,
	Thomas Gleixner, H. Peter Anvin, Paolo Bonzini,
	Sean Christopherson, Radim Krčmář



On 09/10/2018 10:53 AM, Borislav Petkov wrote:
> On Mon, Sep 10, 2018 at 08:15:38AM -0500, Brijesh Singh wrote:
>>> Now, the real question from all this SNAFU is, why can't all those point
>>> to a single struct pvclock_vsyscall_time_info and all CPUs read a single
>>> thing? Why do they have to be per-CPU and thus waste so much memory?
> 
> You forgot to answer to the real question - why do we need those things
> to be perCPU and why can't we use a single instance to share with *all*
> CPUs?
> 
> Because if we can, then we're golden!
> 


I did not forget to answer it. Maybe Paolo or someone more knowledgeable
in that area of the code can comment on why those are per-CPU.

-Brijesh

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-10 15:30             ` Brijesh Singh
@ 2018-09-10 16:48               ` Borislav Petkov
  2018-09-11  9:26                 ` Paolo Bonzini
  0 siblings, 1 reply; 37+ messages in thread
From: Borislav Petkov @ 2018-09-10 16:48 UTC (permalink / raw)
  To: Brijesh Singh, Sean Christopherson
  Cc: x86, linux-kernel, kvm, Tom Lendacky, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Radim Krčmář

Ok,

so *maybe* - and I pretty much have no clue about virt - but just
*maybe*, that kvmclock thing can be shared by all CPUs in a *guest*
then. As in: the guest should see stable clocks which are the same
regardless of which vCPU they're read from, and so on...

Just a dumb idea anyway - this is me thinking about baremetal and trying
to convert that to virt.

But I'd say the memory savings aspect is something we can discuss later,
when there's time and after the regression at hand has been addressed...

Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-10 16:48               ` Borislav Petkov
@ 2018-09-11  9:26                 ` Paolo Bonzini
  2018-09-11 10:01                   ` Borislav Petkov
  0 siblings, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2018-09-11  9:26 UTC (permalink / raw)
  To: Borislav Petkov, Brijesh Singh, Sean Christopherson
  Cc: x86, linux-kernel, kvm, Tom Lendacky, Thomas Gleixner,
	H. Peter Anvin, Radim Krčmář

On 10/09/2018 18:48, Borislav Petkov wrote:
> 
> so *maybe* - and I pretty much have no clue about virt - but just
> *maybe*, that kvmclock thing can be shared by all CPUs in a *guest*
> then. As in: the guest should see stable clocks which are the same
> regardless from which vCPU they're read and so on...

Usually the kvmclock structs are all the same, but there is support for
old machines with inconsistent TSCs (across different sockets typically).

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-11  9:26                 ` Paolo Bonzini
@ 2018-09-11 10:01                   ` Borislav Petkov
  2018-09-11 10:19                     ` Paolo Bonzini
  0 siblings, 1 reply; 37+ messages in thread
From: Borislav Petkov @ 2018-09-11 10:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Brijesh Singh, Sean Christopherson, x86, linux-kernel, kvm,
	Tom Lendacky, Thomas Gleixner, H. Peter Anvin,
	Radim Krčmář

On Tue, Sep 11, 2018 at 11:26:21AM +0200, Paolo Bonzini wrote:
> Usually the kvmclock structs are all the same, but there is support for
> old machines with inconsistent TSCs (across different sockets typically).

Would that be a problem, though? Sounds like an "improvement" to me. :-)

I mean, if we keep using the same TSC across all vCPUs, the guest will
actually see a single TSC and thus have stable and synchronized TSCs.
Unlike the host.

I.e., the guest will be better than the host! :-)

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-11 10:01                   ` Borislav Petkov
@ 2018-09-11 10:19                     ` Paolo Bonzini
  2018-09-11 10:25                       ` Borislav Petkov
  0 siblings, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2018-09-11 10:19 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Brijesh Singh, Sean Christopherson, x86, linux-kernel, kvm,
	Tom Lendacky, Thomas Gleixner, H. Peter Anvin,
	Radim Krčmář

On 11/09/2018 12:01, Borislav Petkov wrote:
> On Tue, Sep 11, 2018 at 11:26:21AM +0200, Paolo Bonzini wrote:
>> Usually the kvmclock structs are all the same, but there is support for
>> old machines with inconsistent TSCs (across different sockets typically).
> 
> Would that be a problem, though? Sounds like an "improvement" to me. :-)
> 
> I mean, if we keep using the same TSC across all vCPUs, the guest will
> actually see a single TSC and thus have stable and synchronized TSCs.
> Unlike the host.

That's exactly what kvmclock is for, it provides a stable and
synchronized clock on top of unsynchronized TSCs.  But that's also why
you need one struct per vCPU, at least in the synchronized case.

Paolo

> I.e., the guest will be better than the host! :-)
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-11 10:19                     ` Paolo Bonzini
@ 2018-09-11 10:25                       ` Borislav Petkov
  2018-09-11 11:07                         ` Paolo Bonzini
  0 siblings, 1 reply; 37+ messages in thread
From: Borislav Petkov @ 2018-09-11 10:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Brijesh Singh, Sean Christopherson, x86, linux-kernel, kvm,
	Tom Lendacky, Thomas Gleixner, H. Peter Anvin,
	Radim Krčmář

On Tue, Sep 11, 2018 at 12:19:13PM +0200, Paolo Bonzini wrote:
> That's exactly what kvmclock is for, it provides a stable and
> synchronized clock on top of unsynchronized TSCs.  But that's also why
> you need one struct per vCPU, at least in the synchronized case.

Why?

Why can't it be a single pointer to a struct pvclock_vsyscall_time_info
shared between all vCPUs?

Or does each vCPU write its own specific stuff into it so it has to be
per-vCPU?

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-11 10:25                       ` Borislav Petkov
@ 2018-09-11 11:07                         ` Paolo Bonzini
  2018-09-11 13:55                           ` Borislav Petkov
  0 siblings, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2018-09-11 11:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Brijesh Singh, Sean Christopherson, x86, linux-kernel, kvm,
	Tom Lendacky, Thomas Gleixner, H. Peter Anvin,
	Radim Krčmář

On 11/09/2018 12:25, Borislav Petkov wrote:
> On Tue, Sep 11, 2018 at 12:19:13PM +0200, Paolo Bonzini wrote:
>> That's exactly what kvmclock is for, it provides a stable and
>> synchronized clock on top of unsynchronized TSCs.  But that's also why
>> you need one struct per vCPU, at least in the synchronized case.
                                                 ^^^^^^^^^^^^

unsynchronized

> 
> Why?
> 
> Why can't it be a single pointer to a struct pvclock_vsyscall_time_info
> shared between all vCPUs?
> 
> Or does each vCPU write its own specific stuff into it so it has to be
> per-vCPU?

If the host TSCs are unsynchronized then yes, that's what happens.  And
you can do live migration from synchronized to unsynchronized.

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-11 11:07                         ` Paolo Bonzini
@ 2018-09-11 13:55                           ` Borislav Petkov
  2018-09-11 14:00                             ` Paolo Bonzini
  0 siblings, 1 reply; 37+ messages in thread
From: Borislav Petkov @ 2018-09-11 13:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Brijesh Singh, Sean Christopherson, x86, linux-kernel, kvm,
	Tom Lendacky, Thomas Gleixner, H. Peter Anvin,
	Radim Krčmář

On Tue, Sep 11, 2018 at 01:07:06PM +0200, Paolo Bonzini wrote:
> If the host TSCs are unsynchronized then yes, that's what happens.  And
> you can do live migration from synchronized to unsynchronized.

Which brings us back to my original question: why would we *ever* want
to support unsynchronized TSCs in a guest? Such machines are a real
abomination for baremetal - it doesn't make *any* sense to me to have
that in guests too, if it can be helped...

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active
  2018-09-11 13:55                           ` Borislav Petkov
@ 2018-09-11 14:00                             ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2018-09-11 14:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Brijesh Singh, Sean Christopherson, x86, linux-kernel, kvm,
	Tom Lendacky, Thomas Gleixner, H. Peter Anvin,
	Radim Krčmář

On 11/09/2018 15:55, Borislav Petkov wrote:
> On Tue, Sep 11, 2018 at 01:07:06PM +0200, Paolo Bonzini wrote:
>> If the host TSCs are unsynchronized then yes, that's what happens.  And
>> you can do live migration from synchronized to unsynchronized.
> 
> Which brings us back to my original question: why would we *ever* want
> to support unsynchronized TSCs in a guest? Such machines are a real
> abomination for baremetal - it doesn't make *any* sense to me to have
> that in guests too, if it can be helped...

No, wait.  The host TSC is unsynchronized, _so_ you need one kvmclock
struct per vCPU.  The resulting kvmclock is synchronized.
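
To make that concrete, a condensed sketch of the guest-side read path
(simplified from the pvclock ABI and pvclock_clocksource_read(); the
barriers and error handling are trimmed):

	static u64 pvclock_read(struct pvclock_vcpu_time_info *src)
	{
		u32 version;
		u64 ns;

		do {
			/* An odd version means the host is mid-update. */
			version = src->version;
			rmb();
			ns = src->system_time +
			     pvclock_scale_delta(rdtsc() - src->tsc_timestamp,
						 src->tsc_to_system_mul,
						 src->tsc_shift);
			rmb();
		} while ((version & 1) || version != src->version);

		return ns;
	}

tsc_timestamp and system_time are sampled against the TSC of the host
CPU a vCPU last ran on, which is why each vCPU needs its own struct when
the host TSCs are not synchronized.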

Paolo


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2018-09-11 14:00 UTC | newest]

Thread overview: 37+ messages
2018-09-07 17:57 [PATCH v6 0/5] x86: Fix SEV guest regression Brijesh Singh
2018-09-07 17:57 ` [PATCH v6 1/5] x86/mm: Restructure sme_encrypt_kernel() Brijesh Singh
2018-09-10 11:32   ` Borislav Petkov
2018-09-07 17:57 ` [PATCH v6 2/5] x86/mm: fix sme_populate_pgd() to update page flags Brijesh Singh
2018-09-10 11:36   ` Borislav Petkov
2018-09-10 12:28     ` Brijesh Singh
2018-09-10 12:32       ` Borislav Petkov
2018-09-07 17:57 ` [PATCH v6 3/5] x86/mm: add .data..decrypted section to hold shared variables Brijesh Singh
2018-09-10 11:54   ` Borislav Petkov
2018-09-10 12:33     ` Brijesh Singh
2018-09-07 17:57 ` [PATCH v6 4/5] x86/kvm: use __decrypted attribute in " Brijesh Singh
2018-09-10 12:04   ` Borislav Petkov
2018-09-10 13:15     ` Sean Christopherson
2018-09-10 13:29       ` Thomas Gleixner
2018-09-10 15:34       ` Borislav Petkov
2018-09-10 12:29   ` Paolo Bonzini
2018-09-10 12:33     ` Borislav Petkov
2018-09-10 12:46       ` Paolo Bonzini
2018-09-07 17:57 ` [PATCH v6 5/5] x86/kvm: Avoid dynamic allocation of pvclock data when SEV is active Brijesh Singh
2018-09-10 12:27   ` Borislav Petkov
2018-09-10 13:15     ` Brijesh Singh
2018-09-10 13:29       ` Sean Christopherson
2018-09-10 15:10         ` Brijesh Singh
2018-09-10 15:28           ` Sean Christopherson
2018-09-10 15:30             ` Brijesh Singh
2018-09-10 16:48               ` Borislav Petkov
2018-09-11  9:26                 ` Paolo Bonzini
2018-09-11 10:01                   ` Borislav Petkov
2018-09-11 10:19                     ` Paolo Bonzini
2018-09-11 10:25                       ` Borislav Petkov
2018-09-11 11:07                         ` Paolo Bonzini
2018-09-11 13:55                           ` Borislav Petkov
2018-09-11 14:00                             ` Paolo Bonzini
2018-09-10 15:53       ` Borislav Petkov
2018-09-10 16:13         ` Sean Christopherson
2018-09-10 16:14         ` Brijesh Singh
2018-09-10 12:28   ` Paolo Bonzini
