linux-coco.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Kexec enabling in TDX guest
@ 2023-02-13 23:48 Kirill A. Shutemov
  2023-02-13 23:48 ` [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec Kirill A. Shutemov
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2023-02-13 23:48 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Isaku Yamahata, x86,
	linux-coco, kexec, linux-kernel, Kirill A. Shutemov

The patch brings basic enabling of kexec in TDX guests.

By "basic enabling" I mean, kexec in the guests with a single CPU.
TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
doesn't allow to put a CPU back offline if it has woken up.

We are looking into this, but it might take time.

Kirill A. Shutemov (2):
  x86/kexec: Preserve CR4.MCE during kexec
  x86/tdx: Convert shared memory back to private on kexec

 arch/x86/coco/tdx/Makefile           |  1 +
 arch/x86/coco/tdx/kexec.c            | 82 ++++++++++++++++++++++++++++
 arch/x86/include/asm/tdx.h           |  4 ++
 arch/x86/kernel/machine_kexec_64.c   |  2 +
 arch/x86/kernel/relocate_kernel_64.S |  6 +-
 5 files changed, 94 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/coco/tdx/kexec.c

-- 
2.39.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec
  2023-02-13 23:48 [PATCH 0/2] Kexec enabling in TDX guest Kirill A. Shutemov
@ 2023-02-13 23:48 ` Kirill A. Shutemov
  2023-02-16  1:49   ` Edgecombe, Rick P
  2023-02-16 19:46   ` Andrew Cooper
  2023-02-13 23:48 ` [PATCH 2/2] x86/tdx: Convert shared memory back to private on kexec Kirill A. Shutemov
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2023-02-13 23:48 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Isaku Yamahata, x86,
	linux-coco, kexec, linux-kernel, Kirill A. Shutemov

TDX guests are not allowed to clear CR4.MCE. Attempt to clear it leads
to #VE.

Preserve the flag during kexec.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/relocate_kernel_64.S | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 4a73351f87f8..18f19dcc40e9 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -145,8 +145,12 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	 * Set cr4 to a known state:
 	 *  - physical address extension enabled
 	 *  - 5-level paging, if it was enabled before
+	 *  - Preserve MCE, if it was set. Clearing MCE may fault in some
+	 *    environments.
 	 */
-	movl	$X86_CR4_PAE, %eax
+	movq	%cr4, %rax
+	andl	$X86_CR4_MCE, %eax
+	orl	$X86_CR4_PAE, %eax
 	testq	$X86_CR4_LA57, %r13
 	jz	1f
 	orl	$X86_CR4_LA57, %eax
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] x86/tdx: Convert shared memory back to private on kexec
  2023-02-13 23:48 [PATCH 0/2] Kexec enabling in TDX guest Kirill A. Shutemov
  2023-02-13 23:48 ` [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec Kirill A. Shutemov
@ 2023-02-13 23:48 ` Kirill A. Shutemov
  2023-02-15  1:53   ` Huang, Kai
  2023-02-16 17:50 ` [PATCH 0/2] Kexec enabling in TDX guest Dave Hansen
  2023-02-22 10:26 ` David Woodhouse
  3 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2023-02-13 23:48 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Isaku Yamahata, x86,
	linux-coco, kexec, linux-kernel, Kirill A. Shutemov

TDX guests allocate shared buffers to perform I/O. It is done by
allocating pages normally from the buddy allocator and converting them
to shared with set_memory_decrypted().

The target kernel has no idea what memory is converted this way. It only
sees E820_TYPE_RAM.

Accessing shared memory via private mapping is fatal. It leads to
unrecoverable TD exit.

Walk direct mapping and convert all shared memory back to private.
It makes all RAM private again and target kernel may use it normally.

Skip the conversion on kexec of crashkernel. It uses own pool of memory
and will not accidentally allocate from the memory the first kernel made
shared.

For crash investigation, it might be useful to access data in the shared
buffers.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/coco/tdx/Makefile         |  1 +
 arch/x86/coco/tdx/kexec.c          | 82 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/tdx.h         |  4 ++
 arch/x86/kernel/machine_kexec_64.c |  2 +
 4 files changed, 89 insertions(+)
 create mode 100644 arch/x86/coco/tdx/kexec.c

diff --git a/arch/x86/coco/tdx/Makefile b/arch/x86/coco/tdx/Makefile
index 46c55998557d..a5daa2a33531 100644
--- a/arch/x86/coco/tdx/Makefile
+++ b/arch/x86/coco/tdx/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-y += tdx.o tdcall.o
+obj-$(CONFIG_KEXEC_CORE) += kexec.o
diff --git a/arch/x86/coco/tdx/kexec.c b/arch/x86/coco/tdx/kexec.c
new file mode 100644
index 000000000000..f1f31515f372
--- /dev/null
+++ b/arch/x86/coco/tdx/kexec.c
@@ -0,0 +1,82 @@
+#define pr_fmt(fmt)  "tdx: " fmt
+
+#include <linux/pagewalk.h>
+#include <asm/tdx.h>
+#include <asm/x86_init.h>
+
+static inline bool pud_decrypted(pud_t pud)
+{
+	return cc_mkdec(pud_val(pud)) == pud_val(pud);
+}
+
+static inline bool pmd_decrypted(pmd_t pmd)
+{
+	return cc_mkdec(pmd_val(pmd)) == pmd_val(pmd);
+}
+
+static inline bool pte_decrypted(pte_t pte)
+{
+	return cc_mkdec(pte_val(pte)) == pte_val(pte);
+}
+
+static inline void unshare_range(unsigned long start, unsigned long end)
+{
+	int pages = (end - start) / PAGE_SIZE;
+
+	if (!x86_platform.guest.enc_status_change_finish(start, pages, true))
+		pr_err("Failed to unshare range %#lx-%#lx\n", start, end);
+}
+
+static int unshare_pud(pud_t *pud, unsigned long addr, unsigned long next,
+		       struct mm_walk *walk)
+{
+	if (pud_decrypted(*pud))
+		unshare_range(addr, next);
+
+	return 0;
+}
+
+static int unshare_pmd(pmd_t *pmd, unsigned long addr, unsigned long next,
+		       struct mm_walk *walk)
+{
+	if (pmd_decrypted(*pmd))
+		unshare_range(addr, next);
+
+	return 0;
+}
+
+static int unshare_pte(pte_t *pte, unsigned long addr, unsigned long next,
+		       struct mm_walk *walk)
+{
+	if (pte_decrypted(*pte))
+		unshare_range(addr, next);
+
+	return 0;
+}
+
+static const struct mm_walk_ops unshare_ops = {
+	.pud_entry = unshare_pud,
+	.pmd_entry = unshare_pmd,
+	.pte_entry = unshare_pte,
+};
+
+void tdx_kexec_prepare(bool crash)
+{
+	/*
+	 * Crash kernel may want to see data in the shared buffers.
+	 * Do not revert them to private on kexec of crash kernel.
+	 */
+	if (crash)
+		return;
+
+	/*
+	 * Walk direct mapping and convert all shared memory back to private,
+	 * so the target kernel will be able use it normally.
+	 */
+	mmap_write_lock(&init_mm);
+	walk_page_range_novma(&init_mm,
+			      PAGE_OFFSET,
+			      PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT),
+			      &unshare_ops, init_mm.pgd, NULL);
+	mmap_write_unlock(&init_mm);
+}
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 28d889c9aa16..7cdbf10e9f7d 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -69,6 +69,8 @@ bool tdx_early_handle_ve(struct pt_regs *regs);
 
 int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport);
 
+void tdx_kexec_prepare(bool crash);
+
 #else
 
 static inline void tdx_early_init(void) { };
@@ -76,6 +78,8 @@ static inline void tdx_safe_halt(void) { };
 
 static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return false; }
 
+static inline void tdx_kexec_prepare(bool crash) {}
+
 #endif /* CONFIG_INTEL_TDX_GUEST */
 
 #if defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 0611fd83858e..adbb3e5347da 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -28,6 +28,7 @@
 #include <asm/setup.h>
 #include <asm/set_memory.h>
 #include <asm/cpu.h>
+#include <asm/tdx.h>
 
 #ifdef CONFIG_ACPI
 /*
@@ -312,6 +313,7 @@ void machine_kexec(struct kimage *image)
 	local_irq_disable();
 	hw_breakpoint_disable();
 	cet_disable();
+	tdx_kexec_prepare(image->type == KEXEC_TYPE_CRASH);
 
 	if (image->preserve_context) {
 #ifdef CONFIG_X86_IO_APIC
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] x86/tdx: Convert shared memory back to private on kexec
  2023-02-13 23:48 ` [PATCH 2/2] x86/tdx: Convert shared memory back to private on kexec Kirill A. Shutemov
@ 2023-02-15  1:53   ` Huang, Kai
  0 siblings, 0 replies; 15+ messages in thread
From: Huang, Kai @ 2023-02-15  1:53 UTC (permalink / raw)
  To: kirill.shutemov, Hansen, Dave, bp
  Cc: sathyanarayanan.kuppuswamy, tglx, linux-coco, kexec,
	linux-kernel, Yamahata, Isaku, x86


> +void tdx_kexec_prepare(bool crash)
> +{
> +	/*
> +	 * Crash kernel may want to see data in the shared buffers.
> +	 * Do not revert them to private on kexec of crash kernel.
> +	 */
> +	if (crash)
> +		return;
> +
> +	/*
> +	 * Walk direct mapping and convert all shared memory back to private,
> +	 * so the target kernel will be able use it normally.
> +	 */
> +	mmap_write_lock(&init_mm);
> +	walk_page_range_novma(&init_mm,
> +			      PAGE_OFFSET,
> +			      PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT),
> +			      &unshare_ops, init_mm.pgd, NULL);
> +	mmap_write_unlock(&init_mm);
> +}

Looks the page table walk is done unconditionally when !crash.

I think it's better to check whether this is TDX guest (either this function, or
below in machine_kexec()) and just return early if it's not a TDX guest?

[..]


>  /*
> @@ -312,6 +313,7 @@ void machine_kexec(struct kimage *image)
>  	local_irq_disable();
>  	hw_breakpoint_disable();
>  	cet_disable();
> +	tdx_kexec_prepare(image->type == KEXEC_TYPE_CRASH);
>  
>  	if (image->preserve_context) {
>  #ifdef CONFIG_X86_IO_APIC


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec
  2023-02-13 23:48 ` [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec Kirill A. Shutemov
@ 2023-02-16  1:49   ` Edgecombe, Rick P
  2023-02-16  9:43     ` Kirill A. Shutemov
  2023-02-16 19:46   ` Andrew Cooper
  1 sibling, 1 reply; 15+ messages in thread
From: Edgecombe, Rick P @ 2023-02-16  1:49 UTC (permalink / raw)
  To: kirill.shutemov, bp, Hansen, Dave
  Cc: sathyanarayanan.kuppuswamy, tglx, linux-kernel, linux-coco,
	Yamahata, Isaku, x86, kexec

On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote:
> TDX guests are not allowed to clear CR4.MCE. Attempt to clear it
> leads
> to #VE.
> 
> Preserve the flag during kexec.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

I wonder whats going on with the pre-existing switching between eax and
rax in this code for the cr0 and cr4 manipulations. Do you know what
the reason is?

Also, for a simple non-tdx kexec regression test:
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec
  2023-02-16  1:49   ` Edgecombe, Rick P
@ 2023-02-16  9:43     ` Kirill A. Shutemov
  2023-02-16 17:33       ` Edgecombe, Rick P
  0 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2023-02-16  9:43 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: kirill.shutemov, bp, Hansen, Dave, sathyanarayanan.kuppuswamy,
	tglx, linux-kernel, linux-coco, Yamahata, Isaku, x86, kexec

On Thu, Feb 16, 2023 at 01:49:39AM +0000, Edgecombe, Rick P wrote:
> On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote:
> > TDX guests are not allowed to clear CR4.MCE. Attempt to clear it
> > leads
> > to #VE.
> > 
> > Preserve the flag during kexec.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> I wonder whats going on with the pre-existing switching between eax and
> rax in this code for the cr0 and cr4 manipulations. Do you know what
> the reason is?

32-bit ORs and ANDs save one byte per instruction. And there's no 32-bit
MOV to/from control registers in 64-bit mode.

> 
> Also, for a simple non-tdx kexec regression test:
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

Thanks!

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec
  2023-02-16  9:43     ` Kirill A. Shutemov
@ 2023-02-16 17:33       ` Edgecombe, Rick P
  0 siblings, 0 replies; 15+ messages in thread
From: Edgecombe, Rick P @ 2023-02-16 17:33 UTC (permalink / raw)
  To: kirill
  Cc: kexec, kirill.shutemov, linux-kernel, tglx,
	sathyanarayanan.kuppuswamy, x86, linux-coco, bp, Hansen, Dave,
	Yamahata, Isaku

On Thu, 2023-02-16 at 12:43 +0300, Kirill A. Shutemov wrote:
> On Thu, Feb 16, 2023 at 01:49:39AM +0000, Edgecombe, Rick P wrote:
> > On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote:
> > > TDX guests are not allowed to clear CR4.MCE. Attempt to clear it
> > > leads
> > > to #VE.
> > > 
> > > Preserve the flag during kexec.
> > > 
> > > Signed-off-by: Kirill A. Shutemov <
> > > kirill.shutemov@linux.intel.com>
> > 
> > I wonder whats going on with the pre-existing switching between eax
> > and
> > rax in this code for the cr0 and cr4 manipulations. Do you know
> > what
> > the reason is?
> 
> 32-bit ORs and ANDs save one byte per instruction. And there's no 32-
> bit
> MOV to/from control registers in 64-bit mode.

Oh right, I think I recall now. There is a 64 bit AND in the CR0 piece
here too, which of course is outside of these changes.

But otherwise, it's not clear from the patch what the implications are
of leaving CR4.MCE set for the non-TDX environment. I see in head_64.S
it will clear it during boot if the kernel doesn't support machine
check. So it leaves a little window where CR4.MCE is set where it
wasn't before.

The piece in head_64.S talks about how an #MC will crash the system if
it happens before the machine check stuff is fully setup anyway, so it
doesn't hurt to leave it on. Is that the reasoning for this change as
well? If so it might help to add a little more about the reasoning in
the commit log.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Kexec enabling in TDX guest
  2023-02-13 23:48 [PATCH 0/2] Kexec enabling in TDX guest Kirill A. Shutemov
  2023-02-13 23:48 ` [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec Kirill A. Shutemov
  2023-02-13 23:48 ` [PATCH 2/2] x86/tdx: Convert shared memory back to private on kexec Kirill A. Shutemov
@ 2023-02-16 17:50 ` Dave Hansen
  2023-02-16 18:12   ` Kirill A. Shutemov
  2023-02-22 10:26 ` David Woodhouse
  3 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2023-02-16 17:50 UTC (permalink / raw)
  To: Kirill A. Shutemov, Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Isaku Yamahata, x86,
	linux-coco, kexec, linux-kernel

On 2/13/23 15:48, Kirill A. Shutemov wrote:
> The patch brings basic enabling of kexec in TDX guests.
> 
> By "basic enabling" I mean, kexec in the guests with a single CPU.
> TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
> doesn't allow to put a CPU back offline if it has woken up.
> 
> We are looking into this, but it might take time.

This is simple enough.  But, nobody will _actually_ use this code as-is,
right?  What's the point of applying it now?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Kexec enabling in TDX guest
  2023-02-16 17:50 ` [PATCH 0/2] Kexec enabling in TDX guest Dave Hansen
@ 2023-02-16 18:12   ` Kirill A. Shutemov
  2023-02-16 18:32     ` Dave Hansen
  0 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2023-02-16 18:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Borislav Petkov, Kuppuswamy Sathyanarayanan,
	Thomas Gleixner, Isaku Yamahata, x86, linux-coco, kexec,
	linux-kernel

On Thu, Feb 16, 2023 at 09:50:32AM -0800, Dave Hansen wrote:
> On 2/13/23 15:48, Kirill A. Shutemov wrote:
> > The patch brings basic enabling of kexec in TDX guests.
> > 
> > By "basic enabling" I mean, kexec in the guests with a single CPU.
> > TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
> > doesn't allow to put a CPU back offline if it has woken up.
> > 
> > We are looking into this, but it might take time.
> 
> This is simple enough.  But, nobody will _actually_ use this code as-is,
> right?  What's the point of applying it now?

Why nobody? Single CPU VMs are not that uncommon.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Kexec enabling in TDX guest
  2023-02-16 18:12   ` Kirill A. Shutemov
@ 2023-02-16 18:32     ` Dave Hansen
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Hansen @ 2023-02-16 18:32 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Borislav Petkov, Kuppuswamy Sathyanarayanan,
	Thomas Gleixner, Isaku Yamahata, x86, linux-coco, kexec,
	linux-kernel

On 2/16/23 10:12, Kirill A. Shutemov wrote:
> On Thu, Feb 16, 2023 at 09:50:32AM -0800, Dave Hansen wrote:
>> On 2/13/23 15:48, Kirill A. Shutemov wrote:
>>> The patch brings basic enabling of kexec in TDX guests.
>>>
>>> By "basic enabling" I mean, kexec in the guests with a single CPU.
>>> TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
>>> doesn't allow to put a CPU back offline if it has woken up.
>>>
>>> We are looking into this, but it might take time.
>> This is simple enough.  But, nobody will _actually_ use this code as-is,
>> right?  What's the point of applying it now?
> Why nobody? Single CPU VMs are not that uncommon.

Here's one data point: the only "General Purpose" ones I see AWS
offering are Haswell era:

	https://aws.amazon.com/ec2/instance-types/

That _might_ be because of concerns about SMT side-channel exposure on
anything newer.

So, we can argue about what "uncommon" means.  But, a minority of folks
care about 1-cpu VMs.  Also, a separate minority of folks care about
kexec().  I'm worried that the overlap between the two will be an
*OVERWHELMING* minority of folks.  In other words, so few people will
use this code that it'll just bitrot.

I'm looking for compelling arguments why mainline should carry this.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec
  2023-02-13 23:48 ` [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec Kirill A. Shutemov
  2023-02-16  1:49   ` Edgecombe, Rick P
@ 2023-02-16 19:46   ` Andrew Cooper
  1 sibling, 0 replies; 15+ messages in thread
From: Andrew Cooper @ 2023-02-16 19:46 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dave Hansen, Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Isaku Yamahata, x86,
	linux-coco, kexec, linux-kernel

On 13/02/2023 11:48 pm, Kirill A. Shutemov wrote:
> TDX guests are not allowed to clear CR4.MCE. Attempt to clear it leads
> to #VE.
>
> Preserve the flag during kexec.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

That's unfortunate for TDX, but in a non-TDX system you must never
maintain CR4.MCE into kexec.

Async events, including NMIs, cannot be taken between this point and the
target having set itself up into it's intended operating mode.  During
this period you get all kinds of fun with type confusion in the IDT/TSS
and/or not having a safe stack to service the event.

A clean shutdown from not having machine checks enabled is far
preferable to trying to deliver an #MC in purgatory.

That, or you're welcome to debug the next bug report I get where
(amazingly) an NMI managed to hit with a good stack in the new context,
but most of the old context (IDT, TSS and .text) still intact enough to
start emitting a very confused oops onto serial...

~Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Kexec enabling in TDX guest
  2023-02-13 23:48 [PATCH 0/2] Kexec enabling in TDX guest Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2023-02-16 17:50 ` [PATCH 0/2] Kexec enabling in TDX guest Dave Hansen
@ 2023-02-22 10:26 ` David Woodhouse
  2023-02-24 14:30   ` Kirill A. Shutemov
  3 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2023-02-22 10:26 UTC (permalink / raw)
  To: Kirill A. Shutemov, Dave Hansen, Borislav Petkov
  Cc: Kuppuswamy Sathyanarayanan, Thomas Gleixner, Isaku Yamahata, x86,
	linux-coco, kexec, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 690 bytes --]

On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote:
> The patch brings basic enabling of kexec in TDX guests.
> 
> By "basic enabling" I mean, kexec in the guests with a single CPU.
> TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
> doesn't allow to put a CPU back offline if it has woken up.
> 
> We are looking into this, but it might take time.

Can't we park the secondary CPUs in a purgatory-like thing of their own
and wake them from there when we want them?

Patches for that were floating around once, although the primary reason
then was latency, and we decided to address that differently by doing
the bringup in parallel instead.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Kexec enabling in TDX guest
  2023-02-22 10:26 ` David Woodhouse
@ 2023-02-24 14:30   ` Kirill A. Shutemov
  2023-02-24 15:22     ` Dave Hansen
  0 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2023-02-24 14:30 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Dave Hansen, Borislav Petkov, Kuppuswamy Sathyanarayanan,
	Thomas Gleixner, Isaku Yamahata, x86, linux-coco, kexec,
	linux-kernel

On Wed, Feb 22, 2023 at 10:26:22AM +0000, David Woodhouse wrote:
> On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote:
> > The patch brings basic enabling of kexec in TDX guests.
> > 
> > By "basic enabling" I mean, kexec in the guests with a single CPU.
> > TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
> > doesn't allow to put a CPU back offline if it has woken up.
> > 
> > We are looking into this, but it might take time.
> 
> Can't we park the secondary CPUs in a purgatory-like thing of their own
> and wake them from there when we want them?
> 
> Patches for that were floating around once, although the primary reason
> then was latency, and we decided to address that differently by doing
> the bringup in parallel instead.

That's plan B. It is suboptimal. kexec() can happen into something that is
not Linux which will not be able to wake up CPUs.

Ideally, it has to be addressed on BIOS level: it has to provide a way to
offline CPUs, putting it back to pre-wakeup state.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Kexec enabling in TDX guest
  2023-02-24 14:30   ` Kirill A. Shutemov
@ 2023-02-24 15:22     ` Dave Hansen
  2023-02-24 16:12       ` Kirill A. Shutemov
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2023-02-24 15:22 UTC (permalink / raw)
  To: Kirill A. Shutemov, David Woodhouse
  Cc: Borislav Petkov, Kuppuswamy Sathyanarayanan, Thomas Gleixner,
	Isaku Yamahata, x86, linux-coco, kexec, linux-kernel

On 2/24/23 06:30, Kirill A. Shutemov wrote:
> Ideally, it has to be addressed on BIOS level: it has to provide a way to
> offline CPUs, putting it back to pre-wakeup state.

Is there anything stopping us from just parking the CPUs in a loop
looking at 'acpi_mp_wake_mailbox_paddr'?  Basically park them in a way
which is indistinguishable from what the BIOS did.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Kexec enabling in TDX guest
  2023-02-24 15:22     ` Dave Hansen
@ 2023-02-24 16:12       ` Kirill A. Shutemov
  0 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2023-02-24 16:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: David Woodhouse, Borislav Petkov, Kuppuswamy Sathyanarayanan,
	Thomas Gleixner, Isaku Yamahata, x86, linux-coco, kexec,
	linux-kernel, rafael

On Fri, Feb 24, 2023 at 07:22:18AM -0800, Dave Hansen wrote:
> On 2/24/23 06:30, Kirill A. Shutemov wrote:
> > Ideally, it has to be addressed on BIOS level: it has to provide a way to
> > offline CPUs, putting it back to pre-wakeup state.
> 
> Is there anything stopping us from just parking the CPUs in a loop
> looking at 'acpi_mp_wake_mailbox_paddr'?  Basically park them in a way
> which is indistinguishable from what the BIOS did.

+Rafael.

 - Forward compatibility can be an issue. Version 0 of mailbox supports
   only single Wakeup command. Future specs may define a new command that
   kernel implementation doesn't support.

 - BIOS owns the mailbox page and can re-use for something else after the
   last CPU has woken up. (I know it is very theoretical, but still.)

 - We can patch ACPI table to point to mailbox page in kernel allocated
   memory, but it brings other problem. If the first kernel didn't wake up
   all CPUs for some reason (CONFIG_SMP=n or nr_cpus= or something) the
   second kernel would not be able to wake up them too since they looping
   around the old address.

But ultimately, I think it is clearly missing BIOS functionality and has
to be addressed there. Hacking around it in kernel will lead to more
problems down the road.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-02-24 16:12 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-13 23:48 [PATCH 0/2] Kexec enabling in TDX guest Kirill A. Shutemov
2023-02-13 23:48 ` [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec Kirill A. Shutemov
2023-02-16  1:49   ` Edgecombe, Rick P
2023-02-16  9:43     ` Kirill A. Shutemov
2023-02-16 17:33       ` Edgecombe, Rick P
2023-02-16 19:46   ` Andrew Cooper
2023-02-13 23:48 ` [PATCH 2/2] x86/tdx: Convert shared memory back to private on kexec Kirill A. Shutemov
2023-02-15  1:53   ` Huang, Kai
2023-02-16 17:50 ` [PATCH 0/2] Kexec enabling in TDX guest Dave Hansen
2023-02-16 18:12   ` Kirill A. Shutemov
2023-02-16 18:32     ` Dave Hansen
2023-02-22 10:26 ` David Woodhouse
2023-02-24 14:30   ` Kirill A. Shutemov
2023-02-24 15:22     ` Dave Hansen
2023-02-24 16:12       ` Kirill A. Shutemov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).