All of lore.kernel.org
 help / color / mirror / Atom feed
* [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests
@ 2022-04-12 17:33 Varad Gautam
  2022-04-12 17:33 ` [kvm-unit-tests PATCH v2 01/10] x86: Move ap_init() to smp.c Varad Gautam
                   ` (10 more replies)
  0 siblings, 11 replies; 27+ messages in thread
From: Varad Gautam @ 2022-04-12 17:33 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	seanjc, brijesh.singh, Thomas.Lendacky, jroedel, bp,
	varad.gautam

This series brings multi-vcpu support to UEFI tests on x86.

Most of the necessary AP bringup code already exists within kvm-unit-tests'
cstart64.S, and has now been either rewritten in C or moved to a common location
to be shared between EFI and non-EFI test builds.

A call gate is used to transition from 16-bit to 32-bit mode, since EFI may
not load the 32-bit entrypoint low enough to be reachable from the SIPI vector.

Changes in v2:
- rebase onto kvm-unit-tests@1a4529ce83 + seanjc's percpu apic_ops series [1].
- split some commits for readability.

The series has been tested with this patch [2] which fixes EFI pagetable setup.

Git branch: https://github.com/varadgautam/kvm-unit-tests/commits/ap-boot-v2/
[1] https://lore.kernel.org/all/20220121231852.1439917-1-seanjc@google.com/
[2] https://lore.kernel.org/kvm/20220406123312.12986-1-varad.gautam@suse.com/
v1: https://lore.kernel.org/all/20220408103127.19219-1-varad.gautam@suse.com/

Varad Gautam (10):
  x86: Move ap_init() to smp.c
  x86: Move load_idt() to desc.c
  x86: desc: Split IDT entry setup into a generic helper
  x86: Move load_gdt_tss() to desc.c
  x86: efi: Stop using UEFI-provided stack
  x86: efi: Stop using UEFI-provided %gs for percpu storage
  x86: efi, smp: Transition APs from 16-bit to 32-bit mode
  x86: Move 32-bit bringup routines to start32.S
  x86: efi, smp: Transition APs from 32-bit to 64-bit mode
  x86: Move ap_start64 and save_id to setup.c

 lib/x86/asm/setup.h       |   3 ++
 lib/x86/desc.c            |  39 +++++++++++---
 lib/x86/desc.h            |   3 ++
 lib/x86/setup.c           |  59 ++++++++++++++++-----
 lib/x86/smp.c             |  88 +++++++++++++++++++++++++++++++-
 lib/x86/smp.h             |   1 +
 x86/cstart64.S            | 105 ++------------------------------------
 x86/efi/crt0-efi-x86_64.S |   3 ++
 x86/efi/efistart64.S      |  69 ++++++++++++++++++++-----
 x86/start32.S             | 102 ++++++++++++++++++++++++++++++++++++
 10 files changed, 336 insertions(+), 136 deletions(-)
 create mode 100644 x86/start32.S

-- 
2.32.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [kvm-unit-tests PATCH v2 01/10] x86: Move ap_init() to smp.c
  2022-04-12 17:33 [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Varad Gautam
@ 2022-04-12 17:33 ` Varad Gautam
  2022-04-13 15:16   ` Sean Christopherson
  2022-04-13 18:44   ` Sean Christopherson
  2022-04-12 17:33 ` [kvm-unit-tests PATCH v2 02/10] x86: Move load_idt() to desc.c Varad Gautam
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 27+ messages in thread
From: Varad Gautam @ 2022-04-12 17:33 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	seanjc, brijesh.singh, Thomas.Lendacky, jroedel, bp,
	varad.gautam

ap_init() copies the SIPI vector to lowmem, sends INIT/SIPI to APs
and waits on the APs to come up.

Port this routine to C from asm and move it to smp.c to allow sharing
this functionality between the EFI (-fPIC) and non-EFI builds.

Call ap_init() from the EFI setup path to reset the APs to a known
location.

Signed-off-by: Varad Gautam <varad.gautam@suse.com>
---
 lib/x86/setup.c      |  1 +
 lib/x86/smp.c        | 28 ++++++++++++++++++++++++++--
 lib/x86/smp.h        |  1 +
 x86/cstart64.S       | 20 ++------------------
 x86/efi/efistart64.S |  9 +++++++++
 5 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/lib/x86/setup.c b/lib/x86/setup.c
index 2d63a44..86ba6de 100644
--- a/lib/x86/setup.c
+++ b/lib/x86/setup.c
@@ -323,6 +323,7 @@ efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo)
 	load_idt();
 	mask_pic_interrupts();
 	enable_apic();
+	ap_init();
 	enable_x2apic();
 	smp_init();
 	setup_page_table();
diff --git a/lib/x86/smp.c b/lib/x86/smp.c
index 683b25d..d7f5aba 100644
--- a/lib/x86/smp.c
+++ b/lib/x86/smp.c
@@ -18,6 +18,9 @@ static volatile int ipi_done;
 static volatile bool ipi_wait;
 static int _cpu_count;
 static atomic_t active_cpus;
+extern u8 sipi_entry;
+extern u8 sipi_end;
+volatile unsigned cpu_online_count = 1;
 
 static __attribute__((used)) void ipi(void)
 {
@@ -114,8 +117,6 @@ void smp_init(void)
 	int i;
 	void ipi_entry(void);
 
-	_cpu_count = fwcfg_get_nb_cpus();
-
 	setup_idt();
 	init_apic_map();
 	set_idt_entry(IPI_VECTOR, ipi_entry, 0);
@@ -142,3 +143,26 @@ void smp_reset_apic(void)
 
 	atomic_inc(&active_cpus);
 }
+
+void ap_init(void)
+{
+	u8 *dst_addr = 0;
+	size_t sipi_sz = (&sipi_end - &sipi_entry) + 1;
+
+	asm volatile("cld");
+
+	/* Relocate SIPI vector to dst_addr so it can run in 16-bit mode. */
+	memcpy(dst_addr, &sipi_entry, sipi_sz);
+
+	/* INIT */
+	apic_icr_write(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_INIT | APIC_INT_ASSERT, 0);
+
+	/* SIPI */
+	apic_icr_write(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_STARTUP, 0);
+
+	_cpu_count = fwcfg_get_nb_cpus();
+
+	while (_cpu_count != cpu_online_count) {
+		;
+	}
+}
diff --git a/lib/x86/smp.h b/lib/x86/smp.h
index bd303c2..9c92853 100644
--- a/lib/x86/smp.h
+++ b/lib/x86/smp.h
@@ -78,5 +78,6 @@ void on_cpu(int cpu, void (*function)(void *data), void *data);
 void on_cpu_async(int cpu, void (*function)(void *data), void *data);
 void on_cpus(void (*function)(void *data), void *data);
 void smp_reset_apic(void);
+void ap_init(void);
 
 #endif
diff --git a/x86/cstart64.S b/x86/cstart64.S
index 7272452..f371d06 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -157,6 +157,7 @@ gdt32:
 gdt32_end:
 
 .code16
+.globl sipi_entry
 sipi_entry:
 	mov %cr0, %eax
 	or $1, %eax
@@ -168,6 +169,7 @@ gdt32_descr:
 	.word gdt32_end - gdt32 - 1
 	.long gdt32
 
+.globl sipi_end
 sipi_end:
 
 .code32
@@ -240,21 +242,3 @@ lvl5:
 
 online_cpus:
 	.fill (max_cpus + 7) / 8, 1, 0
-
-ap_init:
-	cld
-	lea sipi_entry, %rsi
-	xor %rdi, %rdi
-	mov $(sipi_end - sipi_entry), %rcx
-	rep movsb
-	mov $APIC_DEFAULT_PHYS_BASE, %eax
-	movl $(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_INIT | APIC_INT_ASSERT), APIC_ICR(%rax)
-	movl $(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_STARTUP), APIC_ICR(%rax)
-	call fwcfg_get_nb_cpus
-1:	pause
-	cmpw %ax, cpu_online_count
-	jne 1b
-	ret
-
-.align 2
-cpu_online_count:	.word 1
diff --git a/x86/efi/efistart64.S b/x86/efi/efistart64.S
index 017abba..0425153 100644
--- a/x86/efi/efistart64.S
+++ b/x86/efi/efistart64.S
@@ -57,3 +57,12 @@ load_gdt_tss:
 	pushq $0x08 /* 2nd entry in gdt64: 64-bit code segment */
 	pushq %rdi
 	lretq
+
+.code16
+
+.globl sipi_entry
+sipi_entry:
+	jmp sipi_entry
+
+.globl sipi_end
+sipi_end:
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [kvm-unit-tests PATCH v2 02/10] x86: Move load_idt() to desc.c
  2022-04-12 17:33 [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Varad Gautam
  2022-04-12 17:33 ` [kvm-unit-tests PATCH v2 01/10] x86: Move ap_init() to smp.c Varad Gautam
@ 2022-04-12 17:33 ` Varad Gautam
  2022-04-13 15:19   ` Sean Christopherson
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 03/10] x86: desc: Split IDT entry setup into a generic helper Varad Gautam
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Varad Gautam @ 2022-04-12 17:33 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	seanjc, brijesh.singh, Thomas.Lendacky, jroedel, bp,
	varad.gautam

This allows sharing IDT setup code between EFI (-fPIC) and
non-EFI builds.

Signed-off-by: Varad Gautam <varad.gautam@suse.com>
---
 lib/x86/desc.c       | 5 +++++
 lib/x86/desc.h       | 1 +
 lib/x86/setup.c      | 1 -
 x86/cstart64.S       | 3 ++-
 x86/efi/efistart64.S | 5 -----
 5 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index 0677fcd..087e85c 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -294,6 +294,11 @@ void setup_idt(void)
 	handle_exception(13, check_exception_table);
 }
 
+void load_idt(void)
+{
+	lidt(&idt_descr);
+}
+
 unsigned exception_vector(void)
 {
 	return this_cpu_read_exception_vector();
diff --git a/lib/x86/desc.h b/lib/x86/desc.h
index 5224b58..3044409 100644
--- a/lib/x86/desc.h
+++ b/lib/x86/desc.h
@@ -4,6 +4,7 @@
 #include <setjmp.h>
 
 void setup_idt(void);
+void load_idt(void);
 void setup_alt_stack(void);
 
 struct ex_regs {
diff --git a/lib/x86/setup.c b/lib/x86/setup.c
index 86ba6de..94e9f86 100644
--- a/lib/x86/setup.c
+++ b/lib/x86/setup.c
@@ -170,7 +170,6 @@ void setup_multiboot(struct mbi_bootinfo *bi)
 #ifdef CONFIG_EFI
 
 /* From x86/efi/efistart64.S */
-extern void load_idt(void);
 extern void load_gdt_tss(size_t tss_offset);
 
 static efi_status_t setup_memory_allocator(efi_bootinfo_t *efi_bootinfo)
diff --git a/x86/cstart64.S b/x86/cstart64.S
index f371d06..30012ca 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -66,7 +66,6 @@ MSR_GS_BASE = 0xc0000101
 .endm
 
 .macro load_tss
-	lidtq idt_descr
 	movq %rsp, %rdi
 	call setup_tss
 	ltr %ax
@@ -191,6 +190,7 @@ save_id:
 
 ap_start64:
 	call reset_apic
+	call load_idt
 	load_tss
 	call enable_apic
 	call save_id
@@ -204,6 +204,7 @@ ap_start64:
 
 start64:
 	call reset_apic
+	call load_idt
 	load_tss
 	call mask_pic_interrupts
 	call enable_apic
diff --git a/x86/efi/efistart64.S b/x86/efi/efistart64.S
index 0425153..ea3d1c0 100644
--- a/x86/efi/efistart64.S
+++ b/x86/efi/efistart64.S
@@ -26,11 +26,6 @@ ptl4:
 .code64
 .text
 
-.globl load_idt
-load_idt:
-	lidtq idt_descr(%rip)
-	retq
-
 .globl load_gdt_tss
 load_gdt_tss:
 	/* Load GDT */
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [kvm-unit-tests PATCH v2 03/10] x86: desc: Split IDT entry setup into a generic helper
  2022-04-12 17:33 [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Varad Gautam
  2022-04-12 17:33 ` [kvm-unit-tests PATCH v2 01/10] x86: Move ap_init() to smp.c Varad Gautam
  2022-04-12 17:33 ` [kvm-unit-tests PATCH v2 02/10] x86: Move load_idt() to desc.c Varad Gautam
@ 2022-04-12 17:34 ` Varad Gautam
  2022-04-13 16:35   ` Sean Christopherson
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 04/10] x86: Move load_gdt_tss() to desc.c Varad Gautam
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Varad Gautam @ 2022-04-12 17:34 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	seanjc, brijesh.singh, Thomas.Lendacky, jroedel, bp,
	varad.gautam

EFI bootstrapping code configures a call gate in a later commit to jump
from 16-bit to 32-bit code.

Introduce a set_idt_entry_t() routine which can be used to fill both
an interrupt descriptor and a call gate descriptor on x86.

Signed-off-by: Varad Gautam <varad.gautam@suse.com>
---
 lib/x86/desc.c | 28 ++++++++++++++++++++++------
 lib/x86/desc.h |  1 +
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index 087e85c..049adeb 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -57,22 +57,38 @@ __attribute__((regparm(1)))
 #endif
 void do_handle_exception(struct ex_regs *regs);
 
-void set_idt_entry(int vec, void *addr, int dpl)
+/*
+ * Fill an idt_entry_t, clearing e_sz bytes first.
+ *
+ * This can also be used to set up x86 call gates, since the gate
+ * descriptor layout is identical to idt_entry_t, except for the
+ * absence of .offset2 and .reserved fields. To do so, pass in e_sz
+ * according to the gate descriptor size.
+ */
+void set_idt_entry_t(idt_entry_t *e, size_t e_sz, void *addr,
+		u16 sel, u16 type, u16 dpl)
 {
-	idt_entry_t *e = &boot_idt[vec];
-	memset(e, 0, sizeof *e);
+	memset(e, 0, e_sz);
 	e->offset0 = (unsigned long)addr;
-	e->selector = read_cs();
+	e->selector = sel;
 	e->ist = 0;
-	e->type = 14;
+	e->type = type;
 	e->dpl = dpl;
 	e->p = 1;
 	e->offset1 = (unsigned long)addr >> 16;
 #ifdef __x86_64__
-	e->offset2 = (unsigned long)addr >> 32;
+	if (e_sz == sizeof(*e)) {
+		e->offset2 = (unsigned long)addr >> 32;
+	}
 #endif
 }
 
+void set_idt_entry(int vec, void *addr, int dpl)
+{
+	idt_entry_t *e = &boot_idt[vec];
+	set_idt_entry_t(e, sizeof *e, addr, read_cs(), 14, dpl);
+}
+
 void set_idt_dpl(int vec, u16 dpl)
 {
 	idt_entry_t *e = &boot_idt[vec];
diff --git a/lib/x86/desc.h b/lib/x86/desc.h
index 3044409..ae0928f 100644
--- a/lib/x86/desc.h
+++ b/lib/x86/desc.h
@@ -217,6 +217,7 @@ unsigned exception_vector(void);
 int write_cr4_checking(unsigned long val);
 unsigned exception_error_code(void);
 bool exception_rflags_rf(void);
+void set_idt_entry_t(idt_entry_t *e, size_t e_sz, void *addr, u16 sel, u16 type, u16 dpl);
 void set_idt_entry(int vec, void *addr, int dpl);
 void set_idt_sel(int vec, u16 sel);
 void set_idt_dpl(int vec, u16 dpl);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [kvm-unit-tests PATCH v2 04/10] x86: Move load_gdt_tss() to desc.c
  2022-04-12 17:33 [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Varad Gautam
                   ` (2 preceding siblings ...)
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 03/10] x86: desc: Split IDT entry setup into a generic helper Varad Gautam
@ 2022-04-12 17:34 ` Varad Gautam
  2022-04-13 15:58   ` Sean Christopherson
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 05/10] x86: efi: Stop using UEFI-provided stack Varad Gautam
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Varad Gautam @ 2022-04-12 17:34 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	seanjc, brijesh.singh, Thomas.Lendacky, jroedel, bp,
	varad.gautam

Split load_gdt_tss() functionality into:
1. Load gdt/tss
2. Setup segments in 64-bit mode and update %cs via far-return

and move load_gdt_tss() to desc.c to share this code between
EFI and non-EFI tests.

Signed-off-by: Varad Gautam <varad.gautam@suse.com>
---
 lib/x86/desc.c       |  6 ++++++
 lib/x86/desc.h       |  1 +
 lib/x86/setup.c      |  4 +++-
 x86/efi/efistart64.S | 11 ++---------
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/lib/x86/desc.c b/lib/x86/desc.c
index 049adeb..3b1208b 100644
--- a/lib/x86/desc.c
+++ b/lib/x86/desc.c
@@ -362,6 +362,12 @@ void set_gdt_entry(int sel, unsigned long base,  u32 limit, u8 type, u8 flags)
 #endif
 }
 
+void load_gdt_tss(size_t tss_offset)
+{
+	lgdt(&gdt_descr);
+	ltr(tss_offset);
+}
+
 #ifndef __x86_64__
 void set_gdt_task_gate(u16 sel, u16 tss_sel)
 {
diff --git a/lib/x86/desc.h b/lib/x86/desc.h
index ae0928f..83a16dd 100644
--- a/lib/x86/desc.h
+++ b/lib/x86/desc.h
@@ -222,6 +222,7 @@ void set_idt_entry(int vec, void *addr, int dpl);
 void set_idt_sel(int vec, u16 sel);
 void set_idt_dpl(int vec, u16 dpl);
 void set_gdt_entry(int sel, unsigned long base, u32 limit, u8 access, u8 gran);
+void load_gdt_tss(size_t tss_offset);
 void set_intr_alt_stack(int e, void *fn);
 void print_current_tss_info(void);
 handler handle_exception(u8 v, handler fn);
diff --git a/lib/x86/setup.c b/lib/x86/setup.c
index 94e9f86..7dd6677 100644
--- a/lib/x86/setup.c
+++ b/lib/x86/setup.c
@@ -170,7 +170,7 @@ void setup_multiboot(struct mbi_bootinfo *bi)
 #ifdef CONFIG_EFI
 
 /* From x86/efi/efistart64.S */
-extern void load_gdt_tss(size_t tss_offset);
+extern void setup_segments64(void);
 
 static efi_status_t setup_memory_allocator(efi_bootinfo_t *efi_bootinfo)
 {
@@ -275,6 +275,8 @@ static void setup_gdt_tss(void)
 	/* 64-bit setup_tss does not use the stacktop argument.  */
 	tss_offset = setup_tss(NULL);
 	load_gdt_tss(tss_offset);
+
+	setup_segments64();
 }
 
 efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo)
diff --git a/x86/efi/efistart64.S b/x86/efi/efistart64.S
index ea3d1c0..8eadca5 100644
--- a/x86/efi/efistart64.S
+++ b/x86/efi/efistart64.S
@@ -26,15 +26,8 @@ ptl4:
 .code64
 .text
 
-.globl load_gdt_tss
-load_gdt_tss:
-	/* Load GDT */
-	lgdt gdt_descr(%rip)
-
-	/* Load TSS */
-	mov %rdi, %rax
-	ltr %ax
-
+.globl setup_segments64
+setup_segments64:
 	/* Update data segments */
 	mov $0x10, %ax /* 3rd entry in gdt64: 32/64-bit data segment */
 	mov %ax, %ds
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [kvm-unit-tests PATCH v2 05/10] x86: efi: Stop using UEFI-provided stack
  2022-04-12 17:33 [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Varad Gautam
                   ` (3 preceding siblings ...)
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 04/10] x86: Move load_gdt_tss() to desc.c Varad Gautam
@ 2022-04-12 17:34 ` Varad Gautam
  2022-04-13 16:20   ` Sean Christopherson
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 06/10] x86: efi: Stop using UEFI-provided %gs for percpu storage Varad Gautam
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Varad Gautam @ 2022-04-12 17:34 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	seanjc, brijesh.singh, Thomas.Lendacky, jroedel, bp,
	varad.gautam

UEFI test builds currently use the stack pointer configured by the
UEFI implementation.

Reserve stack space in .data for EFI testcases and switch %rsp to
use this memory on early boot. This provides one 4K page per CPU
to store its stack / percpu data.

Signed-off-by: Varad Gautam <varad.gautam@suse.com>
---
 x86/efi/crt0-efi-x86_64.S | 3 +++
 x86/efi/efistart64.S      | 8 ++++++++
 2 files changed, 11 insertions(+)

diff --git a/x86/efi/crt0-efi-x86_64.S b/x86/efi/crt0-efi-x86_64.S
index eaf1656..1708ed5 100644
--- a/x86/efi/crt0-efi-x86_64.S
+++ b/x86/efi/crt0-efi-x86_64.S
@@ -58,6 +58,9 @@ _start:
 	popq %rdi
 	popq %rsi
 
+	/* Switch away from EFI stack. */
+	lea stacktop(%rip), %rsp
+
 	call efi_main
 	addq $8, %rsp
 
diff --git a/x86/efi/efistart64.S b/x86/efi/efistart64.S
index 8eadca5..cb08230 100644
--- a/x86/efi/efistart64.S
+++ b/x86/efi/efistart64.S
@@ -6,6 +6,14 @@
 
 .data
 
+max_cpus = MAX_TEST_CPUS
+
+/* Reserve stack in .data */
+	. = . + 4096 * max_cpus
+	.align 16
+.globl stacktop
+stacktop:
+
 .align PAGE_SIZE
 .globl ptl2
 ptl2:
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [kvm-unit-tests PATCH v2 06/10] x86: efi: Stop using UEFI-provided %gs for percpu storage
  2022-04-12 17:33 [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Varad Gautam
                   ` (4 preceding siblings ...)
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 05/10] x86: efi: Stop using UEFI-provided stack Varad Gautam
@ 2022-04-12 17:34 ` Varad Gautam
  2022-04-13 16:23   ` Sean Christopherson
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 07/10] x86: efi, smp: Transition APs from 16-bit to 32-bit mode Varad Gautam
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Varad Gautam @ 2022-04-12 17:34 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	seanjc, brijesh.singh, Thomas.Lendacky, jroedel, bp,
	varad.gautam

UEFI tests do not update MSR_GS_BASE during bringup, and continue
using the GS_BASE set up by the UEFI implementation for percpu
storage.

Update this MSR during setup_segments64() to allow storing percpu
data at a sane location reserved by the testcase, and ensure that
this happens before any operation that ends up storing to the percpu
space.

Signed-off-by: Varad Gautam <varad.gautam@suse.com>
---
 lib/x86/setup.c      | 9 ++++++---
 x86/efi/efistart64.S | 7 +++++++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/lib/x86/setup.c b/lib/x86/setup.c
index 7dd6677..5d32d3f 100644
--- a/lib/x86/setup.c
+++ b/lib/x86/setup.c
@@ -170,7 +170,8 @@ void setup_multiboot(struct mbi_bootinfo *bi)
 #ifdef CONFIG_EFI
 
 /* From x86/efi/efistart64.S */
-extern void setup_segments64(void);
+extern void setup_segments64(u64 gs_base);
+extern u8 stacktop;
 
 static efi_status_t setup_memory_allocator(efi_bootinfo_t *efi_bootinfo)
 {
@@ -271,12 +272,14 @@ static void setup_page_table(void)
 static void setup_gdt_tss(void)
 {
 	size_t tss_offset;
+	u64 gs_base;
 
 	/* 64-bit setup_tss does not use the stacktop argument.  */
 	tss_offset = setup_tss(NULL);
 	load_gdt_tss(tss_offset);
 
-	setup_segments64();
+	gs_base = (u64)(&stacktop) - (PAGE_SIZE * (pre_boot_apic_id() + 1));
+	setup_segments64(gs_base);
 }
 
 efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo)
@@ -318,8 +321,8 @@ efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo)
 		return status;
 	}
 
-	reset_apic();
 	setup_gdt_tss();
+	reset_apic();
 	setup_idt();
 	load_idt();
 	mask_pic_interrupts();
diff --git a/x86/efi/efistart64.S b/x86/efi/efistart64.S
index cb08230..1c38355 100644
--- a/x86/efi/efistart64.S
+++ b/x86/efi/efistart64.S
@@ -44,6 +44,13 @@ setup_segments64:
 	mov %ax, %gs
 	mov %ax, %ss
 
+	/* Setup percpu base */
+	MSR_GS_BASE = 0xc0000101
+	mov %rdi, %rax
+	mov $0, %edx
+	mov $MSR_GS_BASE, %ecx
+	wrmsr
+
 	/*
 	 * Update the code segment by putting it on the stack before the return
 	 * address, then doing a far return: this will use the new code segment
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [kvm-unit-tests PATCH v2 07/10] x86: efi, smp: Transition APs from 16-bit to 32-bit mode
  2022-04-12 17:33 [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Varad Gautam
                   ` (5 preceding siblings ...)
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 06/10] x86: efi: Stop using UEFI-provided %gs for percpu storage Varad Gautam
@ 2022-04-12 17:34 ` Varad Gautam
  2022-04-13 19:17   ` Sean Christopherson
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 08/10] x86: Move 32-bit bringup routines to start32.S Varad Gautam
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Varad Gautam @ 2022-04-12 17:34 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	seanjc, brijesh.singh, Thomas.Lendacky, jroedel, bp,
	varad.gautam

Sending INIT/SIPI to APs from ap_init() resets them into 16-bit mode
to loop into sipi_entry().

To drive the APs into 32-bit mode, the SIPI vector needs:
1. A GDT descriptor reachable from 16-bit code (gdt32_descr).
2. A 32-bit entrypoint reachable from 16-bit code (ap_start32).
3. The locations of GDT and the 32-bit entrypoint.

Setting these up at compile time (like on non-EFI builds) is not
possible since EFI builds with -shared -fPIC and efistart64.S cannot
reference any absolute addresses.

Relative addressing is unavailable on 16-bit mode.

Moreover, EFI may not load the 32-bit entrypoint to be reachable from
16-bit mode.

To overcome these problems,
1. Fill the GDT descriptor at runtime after relocating
   [sipi_entry-sipi_end] to lowmem. Since sipi_entry does not know the
   address of this descriptor, use the last two bytes of SIPI page to
   communicate it.
2. Place a call gate in the GDT to point to ap_start32.
3. Popluate sipi_entry() to lcall to ap_start32.

With this, the APs can transition to 32-bit mode and loop at a known
location.

Signed-off-by: Varad Gautam <varad.gautam@suse.com>
---
 lib/x86/smp.c        | 56 ++++++++++++++++++++++++++++++++++++++++++++
 x86/efi/efistart64.S | 29 ++++++++++++++++++++++-
 2 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/lib/x86/smp.c b/lib/x86/smp.c
index d7f5aba..5cc1648 100644
--- a/lib/x86/smp.c
+++ b/lib/x86/smp.c
@@ -6,6 +6,7 @@
 #include "apic.h"
 #include "fwcfg.h"
 #include "desc.h"
+#include "asm/page.h"
 
 #define IPI_VECTOR 0x20
 
@@ -144,16 +145,71 @@ void smp_reset_apic(void)
 	atomic_inc(&active_cpus);
 }
 
+#ifdef CONFIG_EFI
+extern u8 gdt32_descr, gdt32, gdt32_end;
+extern u8 ap_start32;
+#endif
+
 void ap_init(void)
 {
 	u8 *dst_addr = 0;
 	size_t sipi_sz = (&sipi_end - &sipi_entry) + 1;
 
+	assert(sipi_sz < PAGE_SIZE);
+
 	asm volatile("cld");
 
 	/* Relocate SIPI vector to dst_addr so it can run in 16-bit mode. */
+	memset(dst_addr, 0, PAGE_SIZE);
 	memcpy(dst_addr, &sipi_entry, sipi_sz);
 
+#ifdef CONFIG_EFI
+	volatile struct descriptor_table_ptr *gdt32_descr_rel;
+	idt_entry_t *gate_descr;
+	u16 *gdt32_descr_reladdr = (u16 *) (PAGE_SIZE - sizeof(u16));
+
+	/*
+	 * gdt32_descr for CONFIG_EFI needs to be filled here dynamically
+	 * since compile time calculation of offsets is not allowed when
+	 * building with -shared, and rip-relative addressing is not supported
+	 * in 16-bit mode.
+	 *
+	 * Use the last two bytes of SIPI page to store relocated gdt32_descr
+	 * addr.
+	 */
+	*gdt32_descr_reladdr = (&gdt32_descr - &sipi_entry);
+
+	gdt32_descr_rel = (struct descriptor_table_ptr *) ((u64) *gdt32_descr_reladdr);
+	gdt32_descr_rel->limit = (u16) (&gdt32_end - &gdt32 - 1);
+	gdt32_descr_rel->base = (ulong) ((u32) (&gdt32 - &sipi_entry));
+
+	/*
+	 * EFI may not load the 32-bit AP entrypoint (ap_start32) low enough
+	 * to be reachable from the SIPI vector. Since we build with -shared, this
+	 * location needs to be fetched at runtime, and rip-relative addressing is
+	 * not supported in 16-bit mode.
+	 * To perform 16-bit -> 32-bit far jump, our options are:
+	 * - ljmpl $cs, $label : unusable since $label is not known at build time.
+	 * - push $cs; push $label; lret : requires an intermediate trampoline since
+	 *	 $label must still be within 0 - 0xFFFF for 16-bit far return to work.
+	 * - lcall into a call-gate : best suited.
+	 *
+	 * Set up call gate to ap_start32 within GDT.
+	 *
+	 * gdt32 layout:
+	 *
+	 * Entry | Segment
+	 * 0	 | NULL descr
+	 * 1	 | Code segment descr
+	 * 2	 | Data segment descr
+	 * 3	 | Call gate descr
+	 */
+	gate_descr = (idt_entry_t *) ((u8 *)(&gdt32 - &sipi_entry)
+		+ 3 * sizeof(gdt_entry_t));
+	set_idt_entry_t(gate_descr, sizeof(gdt_entry_t), (void *) &ap_start32,
+		0x8 /* sel */, 0xc /* type */, 0 /* dpl */);
+#endif
+
 	/* INIT */
 	apic_icr_write(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_INIT | APIC_INT_ASSERT, 0);
 
diff --git a/x86/efi/efistart64.S b/x86/efi/efistart64.S
index 1c38355..00279b8 100644
--- a/x86/efi/efistart64.S
+++ b/x86/efi/efistart64.S
@@ -65,7 +65,34 @@ setup_segments64:
 
 .globl sipi_entry
 sipi_entry:
-	jmp sipi_entry
+	mov %cr0, %eax
+	or $1, %eax
+	mov %eax, %cr0
+
+	/* Retrieve relocated gdt32_descr address at (PAGE_SIZE - 2). */
+	mov (PAGE_SIZE - 2), %ebx
+	lgdtl (%ebx)
+
+	lcall $0x18, $0x0
+
+.globl gdt32
+gdt32:
+	.quad 0
+	.quad 0x00cf9b000000ffff // flat 32-bit code segment
+	.quad 0x00cf93000000ffff // flat 32-bit data segment
+	.quad 0                  // call gate to 32-bit AP entrypoint
+.globl gdt32_end
+gdt32_end:
+
+.globl gdt32_descr
+gdt32_descr:
+	.word 0
+	.long 0
 
 .globl sipi_end
 sipi_end:
+
+.code32
+.globl ap_start32
+ap_start32:
+	jmp ap_start32
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [kvm-unit-tests PATCH v2 08/10] x86: Move 32-bit bringup routines to start32.S
  2022-04-12 17:33 [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Varad Gautam
                   ` (6 preceding siblings ...)
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 07/10] x86: efi, smp: Transition APs from 16-bit to 32-bit mode Varad Gautam
@ 2022-04-12 17:34 ` Varad Gautam
  2022-04-13 19:33   ` Sean Christopherson
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 09/10] x86: efi, smp: Transition APs from 32-bit to 64-bit mode Varad Gautam
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Varad Gautam @ 2022-04-12 17:34 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	seanjc, brijesh.singh, Thomas.Lendacky, jroedel, bp,
	varad.gautam

These can be shared across EFI and non-EFI builds.

Signed-off-by: Varad Gautam <varad.gautam@suse.com>
---
 x86/cstart64.S | 60 +-----------------------------------------------
 x86/start32.S  | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 59 deletions(-)
 create mode 100644 x86/start32.S

diff --git a/x86/cstart64.S b/x86/cstart64.S
index 30012ca..6eb109d 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -56,35 +56,13 @@ mb_flags = 0x0
 	.long mb_magic, mb_flags, 0 - (mb_magic + mb_flags)
 mb_cmdline = 16
 
-MSR_GS_BASE = 0xc0000101
-
-.macro setup_percpu_area
-	lea -4096(%esp), %eax
-	mov $0, %edx
-	mov $MSR_GS_BASE, %ecx
-	wrmsr
-.endm
-
 .macro load_tss
 	movq %rsp, %rdi
 	call setup_tss
 	ltr %ax
 .endm
 
-.macro setup_segments
-	mov $MSR_GS_BASE, %ecx
-	rdmsr
-
-	mov $0x10, %bx
-	mov %bx, %ds
-	mov %bx, %es
-	mov %bx, %fs
-	mov %bx, %gs
-	mov %bx, %ss
-
-	/* restore MSR_GS_BASE */
-	wrmsr
-.endm
+#include "start32.S"
 
 .globl start
 start:
@@ -118,33 +96,6 @@ switch_to_5level:
 	call enter_long_mode
 	jmpl $8, $lvl5
 
-prepare_64:
-	lgdt gdt_descr
-	setup_segments
-
-	xor %eax, %eax
-	mov %eax, %cr4
-
-enter_long_mode:
-	mov %cr4, %eax
-	bts $5, %eax  // pae
-	mov %eax, %cr4
-
-	mov pt_root, %eax
-	mov %eax, %cr3
-
-efer = 0xc0000080
-	mov $efer, %ecx
-	rdmsr
-	bts $8, %eax
-	wrmsr
-
-	mov %cr0, %eax
-	bts $0, %eax
-	bts $31, %eax
-	mov %eax, %cr0
-	ret
-
 smp_stacktop:	.long stacktop - 4096
 
 .align 16
@@ -171,15 +122,6 @@ gdt32_descr:
 .globl sipi_end
 sipi_end:
 
-.code32
-ap_start32:
-	setup_segments
-	mov $-4096, %esp
-	lock xaddl %esp, smp_stacktop
-	setup_percpu_area
-	call prepare_64
-	ljmpl $8, $ap_start64
-
 .code64
 save_id:
 	movl $(APIC_DEFAULT_PHYS_BASE + APIC_ID), %eax
diff --git a/x86/start32.S b/x86/start32.S
new file mode 100644
index 0000000..9e00474
--- /dev/null
+++ b/x86/start32.S
@@ -0,0 +1,62 @@
+/* Common 32-bit code between EFI and non-EFI bootstrapping. */
+
+.code32
+
+MSR_GS_BASE = 0xc0000101
+
+.macro setup_percpu_area
+	lea -4096(%esp), %eax
+	mov $0, %edx
+	mov $MSR_GS_BASE, %ecx
+	wrmsr
+.endm
+
+.macro setup_segments
+	mov $MSR_GS_BASE, %ecx
+	rdmsr
+
+	mov $0x10, %bx
+	mov %bx, %ds
+	mov %bx, %es
+	mov %bx, %fs
+	mov %bx, %gs
+	mov %bx, %ss
+
+	/* restore MSR_GS_BASE */
+	wrmsr
+.endm
+
+prepare_64:
+	lgdt gdt_descr
+	setup_segments
+
+	xor %eax, %eax
+	mov %eax, %cr4
+
+enter_long_mode:
+	mov %cr4, %eax
+	bts $5, %eax  // pae
+	mov %eax, %cr4
+
+	mov pt_root, %eax
+	mov %eax, %cr3
+
+efer = 0xc0000080
+	mov $efer, %ecx
+	rdmsr
+	bts $8, %eax
+	wrmsr
+
+	mov %cr0, %eax
+	bts $0, %eax
+	bts $31, %eax
+	mov %eax, %cr0
+	ret
+
+ap_start32:
+	setup_segments
+	mov $-4096, %esp
+	lock xaddl %esp, smp_stacktop
+	setup_percpu_area
+	call prepare_64
+	ljmpl $8, $ap_start64
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [kvm-unit-tests PATCH v2 09/10] x86: efi, smp: Transition APs from 32-bit to 64-bit mode
  2022-04-12 17:33 [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Varad Gautam
                   ` (7 preceding siblings ...)
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 08/10] x86: Move 32-bit bringup routines to start32.S Varad Gautam
@ 2022-04-12 17:34 ` Varad Gautam
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 10/10] x86: Provide a common 64-bit AP entrypoint for EFI and non-EFI Varad Gautam
  2022-04-13 19:57 ` [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Sean Christopherson
  10 siblings, 0 replies; 27+ messages in thread
From: Varad Gautam @ 2022-04-12 17:34 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	seanjc, brijesh.singh, Thomas.Lendacky, jroedel, bp,
	varad.gautam

Reaching 64-bit mode requires setting up a valid stack and percpu
regions for each CPU and configuring a page table before far-jumping to
the 64-bit entrypoint.

This functionality is already present as prepare_64() and ap_start32()
routines in start32.S for non-EFI test builds.

However since EFI builds (-fPIC) cannot use absolute addressing, and
32-bit mode does not allow RIP-relative addressing, these routines need
some changes.

Modify prepare_64() and ap_start32() asm routines to calculate label
addresses during runtime on CONFIG_EFI. To ease the common case, replace
the far-jump to ap_start64() with a far-return.

This brings the APs into 64-bit mode to loop at a known location.

Signed-off-by: Varad Gautam <varad.gautam@suse.com>
---
 lib/x86/setup.c      |  2 +-
 lib/x86/smp.c        |  3 +++
 x86/efi/efistart64.S | 13 ++++++++++---
 x86/start32.S        | 46 +++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/lib/x86/setup.c b/lib/x86/setup.c
index 5d32d3f..e2f7967 100644
--- a/lib/x86/setup.c
+++ b/lib/x86/setup.c
@@ -326,11 +326,11 @@ efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo)
 	setup_idt();
 	load_idt();
 	mask_pic_interrupts();
+	setup_page_table();
 	enable_apic();
 	ap_init();
 	enable_x2apic();
 	smp_init();
-	setup_page_table();
 
 	return EFI_SUCCESS;
 }
diff --git a/lib/x86/smp.c b/lib/x86/smp.c
index 5cc1648..cee82ac 100644
--- a/lib/x86/smp.c
+++ b/lib/x86/smp.c
@@ -148,6 +148,8 @@ void smp_reset_apic(void)
 #ifdef CONFIG_EFI
 extern u8 gdt32_descr, gdt32, gdt32_end;
 extern u8 ap_start32;
+extern u32 smp_stacktop;
+extern u8 stacktop;
 #endif
 
 void ap_init(void)
@@ -168,6 +170,7 @@ void ap_init(void)
 	idt_entry_t *gate_descr;
 	u16 *gdt32_descr_reladdr = (u16 *) (PAGE_SIZE - sizeof(u16));
 
+	smp_stacktop = ((u64) (&stacktop)) - 4096;
 	/*
 	 * gdt32_descr for CONFIG_EFI needs to be filled here dynamically
 	 * since compile time calculation of offsets is not allowed when
diff --git a/x86/efi/efistart64.S b/x86/efi/efistart64.S
index 00279b8..149e3f7 100644
--- a/x86/efi/efistart64.S
+++ b/x86/efi/efistart64.S
@@ -14,6 +14,9 @@ max_cpus = MAX_TEST_CPUS
 .globl stacktop
 stacktop:
 
+.globl smp_stacktop
+smp_stacktop:	.long 0
+
 .align PAGE_SIZE
 .globl ptl2
 ptl2:
@@ -93,6 +96,10 @@ gdt32_descr:
 sipi_end:
 
 .code32
-.globl ap_start32
-ap_start32:
-	jmp ap_start32
+
+#include "../start32.S"
+
+.code64:
+
+ap_start64:
+	jmp ap_start64
diff --git a/x86/start32.S b/x86/start32.S
index 9e00474..2089be7 100644
--- a/x86/start32.S
+++ b/x86/start32.S
@@ -27,7 +27,16 @@ MSR_GS_BASE = 0xc0000101
 .endm
 
 prepare_64:
-	lgdt gdt_descr
+#ifdef CONFIG_EFI
+	call prepare_64_1
+prepare_64_1:
+	pop %edx
+	add $gdt_descr - prepare_64_1, %edx
+#else
+	mov $gdt_descr, %edx
+#endif
+	lgdtl (%edx)
+
 	setup_segments
 
 	xor %eax, %eax
@@ -38,7 +47,14 @@ enter_long_mode:
 	bts $5, %eax  // pae
 	mov %eax, %cr4
 
+#ifdef CONFIG_EFI
+	call prepare_64_2
+prepare_64_2:
+	pop %eax
+	add $ptl4 - prepare_64_2, %eax
+#else
 	mov pt_root, %eax
+#endif
 	mov %eax, %cr3
 
 efer = 0xc0000080
@@ -53,10 +69,34 @@ efer = 0xc0000080
 	mov %eax, %cr0
 	ret
 
+.globl ap_start32
 ap_start32:
 	setup_segments
+
+#ifdef CONFIG_EFI
+	call ap_start32_1
+ap_start32_1:
+	pop %edx
+	add $smp_stacktop - ap_start32_1, %edx
+#else
+	mov $smp_stacktop, %edx
+#endif
 	mov $-4096, %esp
-	lock xaddl %esp, smp_stacktop
+	lock xaddl %esp, (%edx)
+
 	setup_percpu_area
 	call prepare_64
-	ljmpl $8, $ap_start64
+
+#ifdef CONFIG_EFI
+	call ap_start32_2
+ap_start32_2:
+	pop %edx
+	add $ap_start64 - ap_start32_2, %edx
+#else
+	mov $ap_start64, %edx
+#endif
+
+	pushl $0x08
+	pushl %edx
+
+	lretl
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [kvm-unit-tests PATCH v2 10/10] x86: Provide a common 64-bit AP entrypoint for EFI and non-EFI
  2022-04-12 17:33 [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Varad Gautam
                   ` (8 preceding siblings ...)
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 09/10] x86: efi, smp: Transition APs from 32-bit to 64-bit mode Varad Gautam
@ 2022-04-12 17:34 ` Varad Gautam
  2022-04-13 19:55   ` Sean Christopherson
  2022-04-13 19:57 ` [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Sean Christopherson
  10 siblings, 1 reply; 27+ messages in thread
From: Varad Gautam @ 2022-04-12 17:34 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	seanjc, brijesh.singh, Thomas.Lendacky, jroedel, bp,
	varad.gautam

ap_start64() currently serves as the 64-bit entrypoint for non-EFI
tests.

Having ap_start64() and save_id() written in asm prevents sharing these
routines between EFI and non-EFI tests.

Rewrite them in C and use ap_start64 as the 64-bit entrypoint in the EFI
boot flow.

With this, EFI tests support -smp > 1. smptest.efi now passes.

Signed-off-by: Varad Gautam <varad.gautam@suse.com>
---
 lib/x86/asm/setup.h  |  3 +++
 lib/x86/setup.c      | 54 +++++++++++++++++++++++++++++++++-----------
 lib/x86/smp.c        |  1 +
 x86/cstart64.S       | 24 --------------------
 x86/efi/efistart64.S |  5 ----
 5 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/lib/x86/asm/setup.h b/lib/x86/asm/setup.h
index 24d4fa9..8502e7d 100644
--- a/lib/x86/asm/setup.h
+++ b/lib/x86/asm/setup.h
@@ -16,4 +16,7 @@ efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo);
 void setup_5level_page_table(void);
 #endif /* CONFIG_EFI */
 
+void save_id(void);
+void ap_start64(void);
+
 #endif /* _X86_ASM_SETUP_H_ */
diff --git a/lib/x86/setup.c b/lib/x86/setup.c
index e2f7967..a0e0b0c 100644
--- a/lib/x86/setup.c
+++ b/lib/x86/setup.c
@@ -14,8 +14,12 @@
 #include "apic.h"
 #include "apic-defs.h"
 #include "asm/setup.h"
+#include "processor.h"
+#include "atomic.h"
 
 extern char edata;
+extern unsigned char online_cpus[(MAX_TEST_CPUS + 7) / 8];
+extern unsigned cpu_online_count;
 
 struct mbi_bootinfo {
 	u32 flags;
@@ -172,7 +176,22 @@ void setup_multiboot(struct mbi_bootinfo *bi)
 /* From x86/efi/efistart64.S */
 extern void setup_segments64(u64 gs_base);
 extern u8 stacktop;
+#endif
+
+static void setup_gdt_tss(void)
+{
+	size_t tss_offset;
 
+	/* 64-bit setup_tss does not use the stacktop argument.  */
+	tss_offset = setup_tss(NULL);
+	load_gdt_tss(tss_offset);
+#ifdef CONFIG_EFI
+	u64 gs_base = (u64)(&stacktop) - (PAGE_SIZE * (pre_boot_apic_id() + 1));
+	setup_segments64(gs_base);
+#endif
+}
+
+#ifdef CONFIG_EFI
 static efi_status_t setup_memory_allocator(efi_bootinfo_t *efi_bootinfo)
 {
 	int i;
@@ -269,19 +288,6 @@ static void setup_page_table(void)
 	write_cr3((ulong)&ptl4);
 }
 
-static void setup_gdt_tss(void)
-{
-	size_t tss_offset;
-	u64 gs_base;
-
-	/* 64-bit setup_tss does not use the stacktop argument.  */
-	tss_offset = setup_tss(NULL);
-	load_gdt_tss(tss_offset);
-
-	gs_base = (u64)(&stacktop) - (PAGE_SIZE * (pre_boot_apic_id() + 1));
-	setup_segments64(gs_base);
-}
-
 efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo)
 {
 	efi_status_t status;
@@ -328,6 +334,7 @@ efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo)
 	mask_pic_interrupts();
 	setup_page_table();
 	enable_apic();
+	save_id();
 	ap_init();
 	enable_x2apic();
 	smp_init();
@@ -350,3 +357,24 @@ void setup_libcflat(void)
 			add_setup_arg("bootloader");
 	}
 }
+
+void save_id(void)
+{
+	u32 id = apic_id();
+
+	/* atomic_fetch_or() emits `lock or %dl, (%eax)` */
+	atomic_fetch_or(&online_cpus[id / 8], (1 << (id % 8)));
+}
+
+void ap_start64(void)
+{
+	setup_gdt_tss();
+	reset_apic();
+	load_idt();
+	save_id();
+	enable_apic();
+	enable_x2apic();
+	sti();
+	atomic_fetch_inc(&cpu_online_count);
+	asm volatile("1: hlt; jmp 1b");
+}
diff --git a/lib/x86/smp.c b/lib/x86/smp.c
index cee82ac..779d346 100644
--- a/lib/x86/smp.c
+++ b/lib/x86/smp.c
@@ -22,6 +22,7 @@ static atomic_t active_cpus;
 extern u8 sipi_entry;
 extern u8 sipi_end;
 volatile unsigned cpu_online_count = 1;
+unsigned char online_cpus[(MAX_TEST_CPUS + 7) / 8];
 
 static __attribute__((used)) void ipi(void)
 {
diff --git a/x86/cstart64.S b/x86/cstart64.S
index 6eb109d..6363293 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -123,27 +123,6 @@ gdt32_descr:
 sipi_end:
 
 .code64
-save_id:
-	movl $(APIC_DEFAULT_PHYS_BASE + APIC_ID), %eax
-	movl (%rax), %eax
-	shrl $24, %eax
-	lock btsl %eax, online_cpus
-	retq
-
-ap_start64:
-	call reset_apic
-	call load_idt
-	load_tss
-	call enable_apic
-	call save_id
-	call enable_x2apic
-	sti
-	nop
-	lock incw cpu_online_count
-
-1:	hlt
-	jmp 1b
-
 start64:
 	call reset_apic
 	call load_idt
@@ -182,6 +161,3 @@ setup_5level_page_table:
 	lretq
 lvl5:
 	retq
-
-online_cpus:
-	.fill (max_cpus + 7) / 8, 1, 0
diff --git a/x86/efi/efistart64.S b/x86/efi/efistart64.S
index 149e3f7..a5d7219 100644
--- a/x86/efi/efistart64.S
+++ b/x86/efi/efistart64.S
@@ -98,8 +98,3 @@ sipi_end:
 .code32
 
 #include "../start32.S"
-
-.code64:
-
-ap_start64:
-	jmp ap_start64
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 01/10] x86: Move ap_init() to smp.c
  2022-04-12 17:33 ` [kvm-unit-tests PATCH v2 01/10] x86: Move ap_init() to smp.c Varad Gautam
@ 2022-04-13 15:16   ` Sean Christopherson
  2022-04-13 18:44   ` Sean Christopherson
  1 sibling, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 15:16 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Tue, Apr 12, 2022, Varad Gautam wrote:
> ap_init() copies the SIPI vector to lowmem, sends INIT/SIPI to APs
> and waits on the APs to come up.
> 
> Port this routine to C from asm and move it to smp.c to allow sharing
> this functionality between the EFI (-fPIC) and non-EFI builds.
> 
> Call ap_init() from the EFI setup path to reset the APs to a known
> location.
> 
> Signed-off-by: Varad Gautam <varad.gautam@suse.com>
> ---
>  lib/x86/setup.c      |  1 +
>  lib/x86/smp.c        | 28 ++++++++++++++++++++++++++--
>  lib/x86/smp.h        |  1 +
>  x86/cstart64.S       | 20 ++------------------

x86/cstart.S needs to join the party, without being kept in the loop KUT fails to
build on 32-bit targets.

	x86/vmexit.o x86/cstart.o lib/libcflat.a
lib/libcflat.a(smp.o): In function `ap_init':
/home/sean/go/src/kernel.org/kvm-unit-tests/lib/x86/smp.c:150: undefined reference to `sipi_end'
/home/sean/go/src/kernel.org/kvm-unit-tests/lib/x86/smp.c:150: undefined reference to `sipi_entry'
/home/sean/go/src/kernel.org/kvm-unit-tests/lib/x86/smp.c:155: undefined reference to `sipi_entry'
collect2: error: ld returned 1 exit status
/home/sean/go/src/kernel.org/kvm-unit-tests/x86/Makefile.common:65: recipe for target 'x86/vmexit.elf' failed

>  x86/efi/efistart64.S |  9 +++++++++
>  5 files changed, 39 insertions(+), 20 deletions(-)
> 
> diff --git a/lib/x86/setup.c b/lib/x86/setup.c
> index 2d63a44..86ba6de 100644
> --- a/lib/x86/setup.c
> +++ b/lib/x86/setup.c
> @@ -323,6 +323,7 @@ efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo)
>  	load_idt();
>  	mask_pic_interrupts();
>  	enable_apic();
> +	ap_init();
>  	enable_x2apic();
>  	smp_init();
>  	setup_page_table();
> diff --git a/lib/x86/smp.c b/lib/x86/smp.c
> index 683b25d..d7f5aba 100644
> --- a/lib/x86/smp.c
> +++ b/lib/x86/smp.c
> @@ -18,6 +18,9 @@ static volatile int ipi_done;
>  static volatile bool ipi_wait;
>  static int _cpu_count;
>  static atomic_t active_cpus;
> +extern u8 sipi_entry;
> +extern u8 sipi_end;
> +volatile unsigned cpu_online_count = 1;

Please no bare "unsigned".  But that's a moot point because it actually needs to
be a u16, the asm code does incw.  That's also sort of a moot point because there's
zero chance any of this will work with 65536+ vCPUs, but it's still odd to see.

There's also a declaration in x86/svm_tests.c that needs to get deleted.

	extern u16 cpu_online_count;

Ooh, actually, an even better idea.  Make cpu_online_count a proper atomic_t,
same as active_cpus.  Then the volatile goes away.  Ugh, but atomic_t uses a
volatile.  *sigh*  At least that's contained in one spot that we can fix all at
once if someone gets motivated in the future.

And then rather than leave the

	lock incw cpu_online_count

in asm code, move that to C code to, e.g. ap_call_in() or something (there's gotta
be a standard-ish name for this).  The shortlog can be something like "Move AP
wakeup and rendezvous to smp.c.

And as a bonus, add a printf() in the C helper (assuming that doesn't cause
explosions) to state which AP came online.  That would be super helpful for debug
when someone breaks SMP boot.

>  static __attribute__((used)) void ipi(void)
>  {
> @@ -114,8 +117,6 @@ void smp_init(void)
>  	int i;
>  	void ipi_entry(void);
>  
> -	_cpu_count = fwcfg_get_nb_cpus();
> -
>  	setup_idt();
>  	init_apic_map();
>  	set_idt_entry(IPI_VECTOR, ipi_entry, 0);
> @@ -142,3 +143,26 @@ void smp_reset_apic(void)
>  
>  	atomic_inc(&active_cpus);
>  }
> +
> +void ap_init(void)
> +{
> +	u8 *dst_addr = 0;
> +	size_t sipi_sz = (&sipi_end - &sipi_entry) + 1;
> +
> +	asm volatile("cld");
> +
> +	/* Relocate SIPI vector to dst_addr so it can run in 16-bit mode. */
> +	memcpy(dst_addr, &sipi_entry, sipi_sz);
> +
> +	/* INIT */
> +	apic_icr_write(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_INIT | APIC_INT_ASSERT, 0);
> +
> +	/* SIPI */
> +	apic_icr_write(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_STARTUP, 0);
> +
> +	_cpu_count = fwcfg_get_nb_cpus();
> +

Similar to above, I would be in favor of opportunitically adding a printf to state
that the BSP is about to wait for N number of APs to come online, 
.
> +	while (_cpu_count != cpu_online_count) {
> +		;
> +	}

Curly braces technically aren't needed.

> +}
> diff --git a/lib/x86/smp.h b/lib/x86/smp.h
> index bd303c2..9c92853 100644

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 02/10] x86: Move load_idt() to desc.c
  2022-04-12 17:33 ` [kvm-unit-tests PATCH v2 02/10] x86: Move load_idt() to desc.c Varad Gautam
@ 2022-04-13 15:19   ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 15:19 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Tue, Apr 12, 2022, Varad Gautam wrote:
> This allows sharing IDT setup code between EFI (-fPIC) and
> non-EFI builds.
> 
> Signed-off-by: Varad Gautam <varad.gautam@suse.com>
> ---
>  lib/x86/desc.c       | 5 +++++
>  lib/x86/desc.h       | 1 +
>  lib/x86/setup.c      | 1 -
>  x86/cstart64.S       | 3 ++-
>  x86/efi/efistart64.S | 5 -----

I belive setup_tr_and_percpu in x86/cstart.S can also use the common load_idt.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 04/10] x86: Move load_gdt_tss() to desc.c
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 04/10] x86: Move load_gdt_tss() to desc.c Varad Gautam
@ 2022-04-13 15:58   ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 15:58 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Tue, Apr 12, 2022, Varad Gautam wrote:
> diff --git a/lib/x86/setup.c b/lib/x86/setup.c
> index 94e9f86..7dd6677 100644
> --- a/lib/x86/setup.c
> +++ b/lib/x86/setup.c
> @@ -170,7 +170,7 @@ void setup_multiboot(struct mbi_bootinfo *bi)
>  #ifdef CONFIG_EFI
>  
>  /* From x86/efi/efistart64.S */
> -extern void load_gdt_tss(size_t tss_offset);
> +extern void setup_segments64(void);
>  
>  static efi_status_t setup_memory_allocator(efi_bootinfo_t *efi_bootinfo)
>  {
> @@ -275,6 +275,8 @@ static void setup_gdt_tss(void)
>  	/* 64-bit setup_tss does not use the stacktop argument.  */
>  	tss_offset = setup_tss(NULL);
>  	load_gdt_tss(tss_offset);
> +
> +	setup_segments64();

Rather than call back into asm, how about doing this in inline asm?  It's a bit
gross no matter what, but this has the advantage of getting to use KERNEL_CS/DS
without extra magic.

And then the future patch that manipulates MSR_GS_BASE can use the wrmsr() helper.

	asm volatile("mov %0, %%ds\n\t"
		     "mov %0, %%es\n\t"
		     "mov %0, %%fs\n\t"
		     "mov %0, %%gs\n\t"
		     "mov %0, %%ss\n\t"
		     "pushq %1\n\t"
		     "lea 1f(%%rip), %0\n\t"
		     "pushq %0\n\t"
		     "lretq\n\t"
		     "1:"
		     :: "r" ((u64)KERNEL_DS), "i" (KERNEL_CS)
	);


>  efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo)
> diff --git a/x86/efi/efistart64.S b/x86/efi/efistart64.S
> index ea3d1c0..8eadca5 100644
> --- a/x86/efi/efistart64.S
> +++ b/x86/efi/efistart64.S
> @@ -26,15 +26,8 @@ ptl4:
>  .code64
>  .text
>  
> -.globl load_gdt_tss
> -load_gdt_tss:
> -	/* Load GDT */
> -	lgdt gdt_descr(%rip)
> -
> -	/* Load TSS */
> -	mov %rdi, %rax
> -	ltr %ax
> -
> +.globl setup_segments64
> +setup_segments64:
>  	/* Update data segments */
>  	mov $0x10, %ax /* 3rd entry in gdt64: 32/64-bit data segment */
>  	mov %ax, %ds
> -- 
> 2.32.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 05/10] x86: efi: Stop using UEFI-provided stack
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 05/10] x86: efi: Stop using UEFI-provided stack Varad Gautam
@ 2022-04-13 16:20   ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 16:20 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Tue, Apr 12, 2022, Varad Gautam wrote:
> UEFI test builds currently use the stack pointer configured by the
> UEFI implementation.
> 
> Reserve stack space in .data for EFI testcases and switch %rsp to
> use this memory on early boot. This provides one 4K page per CPU
> to store its stack / percpu data.
> 
> Signed-off-by: Varad Gautam <varad.gautam@suse.com>
> ---
>  x86/efi/crt0-efi-x86_64.S | 3 +++
>  x86/efi/efistart64.S      | 8 ++++++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/x86/efi/crt0-efi-x86_64.S b/x86/efi/crt0-efi-x86_64.S
> index eaf1656..1708ed5 100644
> --- a/x86/efi/crt0-efi-x86_64.S
> +++ b/x86/efi/crt0-efi-x86_64.S
> @@ -58,6 +58,9 @@ _start:
>  	popq %rdi
>  	popq %rsi
>  
> +	/* Switch away from EFI stack. */
> +	lea stacktop(%rip), %rsp
> +
>  	call efi_main
>  	addq $8, %rsp
>  
> diff --git a/x86/efi/efistart64.S b/x86/efi/efistart64.S
> index 8eadca5..cb08230 100644
> --- a/x86/efi/efistart64.S
> +++ b/x86/efi/efistart64.S
> @@ -6,6 +6,14 @@
>  
>  .data
>  
> +max_cpus = MAX_TEST_CPUS

Why declare max_cpus?  MAX_TEST_CPUS can be used directly.

> +

Rather than align the stack top, wouldn't it be easier and more obvious to the
reader to pre-align to?

.align PAGE_SIZE

> +/* Reserve stack in .data */
> +	. = . + 4096 * max_cpus

Use PAGE_SIZE instead of open coding 4096.

E.g. this works for me with my MSR_GS_BASE suggestion in the next patch (spoiler
alert).

/* Reserve stack in .data */
.data
.align PAGE_SIZE
	. = . + PAGE_SIZE * MAX_TEST_CPUS
.globl stacktop
stacktop:


> +	.align 16
> +.globl stacktop
> +stacktop:
> +
>  .align PAGE_SIZE
>  .globl ptl2
>  ptl2:
> -- 
> 2.32.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 06/10] x86: efi: Stop using UEFI-provided %gs for percpu storage
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 06/10] x86: efi: Stop using UEFI-provided %gs for percpu storage Varad Gautam
@ 2022-04-13 16:23   ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 16:23 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Tue, Apr 12, 2022, Varad Gautam wrote:
> UEFI tests do not update MSR_GS_BASE during bringup, and continue
> using the GS_BASE set up by the UEFI implementation for percpu
> storage.
> 
> Update this MSR during setup_segments64() to allow storing percpu
> data at a sane location reserved by the testcase, and ensure that
> this happens before any operation that ends up storing to the percpu
> space

It's worth noting in the changelog that reset_apic() needs to be moved below
setup_gdt_tss() as it depends on per-cpu setup.  That definitely won't be obvious
to most people.

> Signed-off-by: Varad Gautam <varad.gautam@suse.com>
> ---
>  lib/x86/setup.c      | 9 ++++++---
>  x86/efi/efistart64.S | 7 +++++++
>  2 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/x86/setup.c b/lib/x86/setup.c
> index 7dd6677..5d32d3f 100644
> --- a/lib/x86/setup.c
> +++ b/lib/x86/setup.c
> @@ -170,7 +170,8 @@ void setup_multiboot(struct mbi_bootinfo *bi)
>  #ifdef CONFIG_EFI
>  
>  /* From x86/efi/efistart64.S */
> -extern void setup_segments64(void);
> +extern void setup_segments64(u64 gs_base);
> +extern u8 stacktop;
>  
>  static efi_status_t setup_memory_allocator(efi_bootinfo_t *efi_bootinfo)
>  {
> @@ -271,12 +272,14 @@ static void setup_page_table(void)
>  static void setup_gdt_tss(void)
>  {
>  	size_t tss_offset;
> +	u64 gs_base;
>  
>  	/* 64-bit setup_tss does not use the stacktop argument.  */
>  	tss_offset = setup_tss(NULL);
>  	load_gdt_tss(tss_offset);
>  
> -	setup_segments64();
> +	gs_base = (u64)(&stacktop) - (PAGE_SIZE * (pre_boot_apic_id() + 1));

Rather than follow the (IMO awful) non-EFI behavior of hijacking a chunk of the
stack, which is a symptom of doing everything in asm, since this is now C we
can declare a proper percpu array and index that.  Disclaimer, this has only been
tested with smp=1 at this point, haven't reached the end of the series :-)

diff --git a/lib/x86/setup.c b/lib/x86/setup.c
index 6131ea2..91a06f7 100644
--- a/lib/x86/setup.c
+++ b/lib/x86/setup.c
@@ -169,6 +169,8 @@ void setup_multiboot(struct mbi_bootinfo *bi)

 #ifdef CONFIG_EFI

+static struct percpu_data __percpu_data[MAX_TEST_CPUS];
+
 static efi_status_t setup_memory_allocator(efi_bootinfo_t *efi_bootinfo)
 {
        int i;
@@ -285,6 +287,8 @@ static void setup_gdt_tss(void)
                     "1:"
                     :: "r" ((u64)KERNEL_DS), "i" (KERNEL_CS)
        );
+
+       wrmsr(MSR_GS_BASE, (u64)&__percpu_data[pre_boot_apic_id()]);
 }

 efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo)
@@ -326,8 +330,8 @@ efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo)
                return status;
        }

-       reset_apic();
        setup_gdt_tss();
+       reset_apic();
        setup_idt();
        load_idt();
        mask_pic_interrupts();

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 03/10] x86: desc: Split IDT entry setup into a generic helper
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 03/10] x86: desc: Split IDT entry setup into a generic helper Varad Gautam
@ 2022-04-13 16:35   ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 16:35 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Tue, Apr 12, 2022, Varad Gautam wrote:
> EFI bootstrapping code configures a call gate in a later commit to jump
> from 16-bit to 32-bit code.
> 
> Introduce a set_idt_entry_t() routine which can be used to fill both
> an interrupt descriptor and a call gate descriptor on x86.
> 
> Signed-off-by: Varad Gautam <varad.gautam@suse.com>
> ---
>  lib/x86/desc.c | 28 ++++++++++++++++++++++------
>  lib/x86/desc.h |  1 +
>  2 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/x86/desc.c b/lib/x86/desc.c
> index 087e85c..049adeb 100644
> --- a/lib/x86/desc.c
> +++ b/lib/x86/desc.c
> @@ -57,22 +57,38 @@ __attribute__((regparm(1)))
>  #endif
>  void do_handle_exception(struct ex_regs *regs);
>  
> -void set_idt_entry(int vec, void *addr, int dpl)
> +/*
> + * Fill an idt_entry_t, clearing e_sz bytes first.
> + *
> + * This can also be used to set up x86 call gates, since the gate
> + * descriptor layout is identical to idt_entry_t, except for the
> + * absence of .offset2 and .reserved fields. To do so, pass in e_sz
> + * according to the gate descriptor size.
> + */
> +void set_idt_entry_t(idt_entry_t *e, size_t e_sz, void *addr,
> +		u16 sel, u16 type, u16 dpl)

The usage in patch 7, "Transition APs from 16-bit to 32-bit mode" is really confusing
because it's calling an IDT helper to setup the GDT.  Also, the "_t" postfix usually
indicates a typedef, not a function

Rather than set_idt_entry_t, maybe set_oversized_desc_entry()?  That's not very good
either, but it's at least not outright confusing.  Definitely open to other suggestions...

>  {
> -	idt_entry_t *e = &boot_idt[vec];
> -	memset(e, 0, sizeof *e);
> +	memset(e, 0, e_sz);
>  	e->offset0 = (unsigned long)addr;
> -	e->selector = read_cs();
> +	e->selector = sel;
>  	e->ist = 0;
> -	e->type = 14;
> +	e->type = type;
>  	e->dpl = dpl;
>  	e->p = 1;
>  	e->offset1 = (unsigned long)addr >> 16;
>  #ifdef __x86_64__
> -	e->offset2 = (unsigned long)addr >> 32;
> +	if (e_sz == sizeof(*e)) {
> +		e->offset2 = (unsigned long)addr >> 32;
> +	}
>  #endif
>  }
>  
> +void set_idt_entry(int vec, void *addr, int dpl)
> +{
> +	idt_entry_t *e = &boot_idt[vec];
> +	set_idt_entry_t(e, sizeof *e, addr, read_cs(), 14, dpl);
> +}
> +
>  void set_idt_dpl(int vec, u16 dpl)
>  {
>  	idt_entry_t *e = &boot_idt[vec];
> diff --git a/lib/x86/desc.h b/lib/x86/desc.h
> index 3044409..ae0928f 100644
> --- a/lib/x86/desc.h
> +++ b/lib/x86/desc.h
> @@ -217,6 +217,7 @@ unsigned exception_vector(void);
>  int write_cr4_checking(unsigned long val);
>  unsigned exception_error_code(void);
>  bool exception_rflags_rf(void);
> +void set_idt_entry_t(idt_entry_t *e, size_t e_sz, void *addr, u16 sel, u16 type, u16 dpl);
>  void set_idt_entry(int vec, void *addr, int dpl);
>  void set_idt_sel(int vec, u16 sel);
>  void set_idt_dpl(int vec, u16 dpl);
> -- 
> 2.32.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 01/10] x86: Move ap_init() to smp.c
  2022-04-12 17:33 ` [kvm-unit-tests PATCH v2 01/10] x86: Move ap_init() to smp.c Varad Gautam
  2022-04-13 15:16   ` Sean Christopherson
@ 2022-04-13 18:44   ` Sean Christopherson
  2022-04-13 18:55     ` Sean Christopherson
  1 sibling, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 18:44 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Tue, Apr 12, 2022, Varad Gautam wrote:
> @@ -142,3 +143,26 @@ void smp_reset_apic(void)
>  
>  	atomic_inc(&active_cpus);
>  }
> +
> +void ap_init(void)
> +{
> +	u8 *dst_addr = 0;

Oof, this is subtle.  I didn't realize until patch 7 that this is actually using
va=pa=0 for the trampoline.

Does anything actually prevent KUT from allocating pa=0?  Ah, looks like __setup_vm()
excludes the lower 1mb.

'0' should be a #define somewhere, e.g. EFI_RM_TRAMPOLINE_PHYS_ADDR, probably in
lib/alloc_page.h next to AREA_ANY_NUMBER with a comment tying the two together.
And then patch 7 can (hopefully without too much pain) use the define instead of
open coding the reference to PA=0, which is really confusing and unnecessarily
fragile.

E.g. instead of

	/* Retrieve relocated gdt32_descr address at (PAGE_SIZE - 2). */
	mov (PAGE_SIZE - 2), %ebx

hopefully it can be

	mov (EFI_RM_TRAMPOLINE_PHYS_ADDR + PAGE_SIZE - 2), %ebx

> +	size_t sipi_sz = (&sipi_end - &sipi_entry) + 1;

Nit, maybe sipi_trampoline_size?

> +	asm volatile("cld");
> +
> +	/* Relocate SIPI vector to dst_addr so it can run in 16-bit mode. */
> +	memcpy(dst_addr, &sipi_entry, sipi_sz);

A more descriptive name than dst_addr would help, and I'm pretty sure it can be
a void *.  Maybe?

	void *rm_trampoline = EFI_RM_TRAMPOLINE_PHYS_ADDR?

And rather than add the assert+memset in patch 7, do that here.  Oh, and fill
the page with 0xcc, i.e. int3, instead of 0, so that if an AP wanders into the
weeds, it gets a fault and shutdown.  All zeros decodes to ADD [rax], rax (or maybe
the reverse?), i.e. isn't guaranteed to fail right away.

	assert(sipi_trampoline_size < PAGE_SIZE);

	/* Comment goes here about filling with 0xcc. */
	memset(rm_trampoline, 0xcc, PAGE_SIZE);
	memcpy(rm_trampoline, &sipi_entry, sipi_trampoline_size);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 01/10] x86: Move ap_init() to smp.c
  2022-04-13 18:44   ` Sean Christopherson
@ 2022-04-13 18:55     ` Sean Christopherson
  2022-04-13 18:59       ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 18:55 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Wed, Apr 13, 2022, Sean Christopherson wrote:
> On Tue, Apr 12, 2022, Varad Gautam wrote:
> > @@ -142,3 +143,26 @@ void smp_reset_apic(void)
> >  
> >  	atomic_inc(&active_cpus);
> >  }
> > +
> > +void ap_init(void)
> > +{
> > +	u8 *dst_addr = 0;
> 
> Oof, this is subtle.  I didn't realize until patch 7 that this is actually using
> va=pa=0 for the trampoline.
> 
> Does anything actually prevent KUT from allocating pa=0?  Ah, looks like __setup_vm()
> excludes the lower 1mb.
> 
> '0' should be a #define somewhere, e.g. EFI_RM_TRAMPOLINE_PHYS_ADDR, probably in
> lib/alloc_page.h next to AREA_ANY_NUMBER with a comment tying the two together.
> And then patch 7 can (hopefully without too much pain) use the define instead of
> open coding the reference to PA=0, which is really confusing and unnecessarily
> fragile.
> 
> E.g. instead of
> 
> 	/* Retrieve relocated gdt32_descr address at (PAGE_SIZE - 2). */
> 	mov (PAGE_SIZE - 2), %ebx
> 
> hopefully it can be
> 
> 	mov (EFI_RM_TRAMPOLINE_PHYS_ADDR + PAGE_SIZE - 2), %ebx
> 
> > +	size_t sipi_sz = (&sipi_end - &sipi_entry) + 1;
> 
> Nit, maybe sipi_trampoline_size?

Ooooh, and I finally realized in patch 7 that the "sipi" area encompasses the GDT
as well.  I couldn't figure out how the GDT was getting relocated :-)

Definitely needs a different name.  How about efi_rm_trampoline_start,
efi_rm_trampoline_end, and efi_rm_trampoline_size?  The "real mode" part is
kinda misleading, but on the other hand it's also the main reason why this needs
to be relocated to super low memory.  And add a comment 

That will help make it a little more obvious that there's more than just the SIPI
code that is getting manually relocated.

It's probably still worth having an explicit sipi_entry label, with a comment to
document that it absolutely needs to be at the start of the trampoline so that
the SIPI vector sends APs to the right location.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 01/10] x86: Move ap_init() to smp.c
  2022-04-13 18:55     ` Sean Christopherson
@ 2022-04-13 18:59       ` Sean Christopherson
  2022-04-13 19:01         ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 18:59 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Wed, Apr 13, 2022, Sean Christopherson wrote:
> On Wed, Apr 13, 2022, Sean Christopherson wrote:
> > On Tue, Apr 12, 2022, Varad Gautam wrote:
> > > @@ -142,3 +143,26 @@ void smp_reset_apic(void)
> > >  
> > >  	atomic_inc(&active_cpus);
> > >  }
> > > +
> > > +void ap_init(void)

Sorry for chaining these, I keep understanding more things as I read through the
end of the series.  Hopefully this is the last one.

Can this be named setup_efi_rm_trampoline()?  Or whatever best matches the name
we decide on.  I keep thinking APs bounce through this to do their initialization,
but it's the BSP doing setup to prep waking the APs.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 01/10] x86: Move ap_init() to smp.c
  2022-04-13 18:59       ` Sean Christopherson
@ 2022-04-13 19:01         ` Sean Christopherson
  2022-04-13 19:04           ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 19:01 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Wed, Apr 13, 2022, Sean Christopherson wrote:
> On Wed, Apr 13, 2022, Sean Christopherson wrote:
> > On Wed, Apr 13, 2022, Sean Christopherson wrote:
> > > On Tue, Apr 12, 2022, Varad Gautam wrote:
> > > > @@ -142,3 +143,26 @@ void smp_reset_apic(void)
> > > >  
> > > >  	atomic_inc(&active_cpus);
> > > >  }
> > > > +
> > > > +void ap_init(void)
> 
> Sorry for chaining these, I keep understanding more things as I read through the
> end of the series.  Hopefully this is the last one.
> 
> Can this be named setup_efi_rm_trampoline()?  Or whatever best matches the name
> we decide on.  I keep thinking APs bounce through this to do their initialization,
> but it's the BSP doing setup to prep waking the APs.

Well that didn't take long.  Just realized this isn't unique to EFI, and it also
does the waking.  Maybe wake_aps()?  That makes smp_init() even more confusing
(I keep thinking that it wakes APs...), but IMO smp_init() is the one that needs
a new name.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 01/10] x86: Move ap_init() to smp.c
  2022-04-13 19:01         ` Sean Christopherson
@ 2022-04-13 19:04           ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 19:04 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Wed, Apr 13, 2022, Sean Christopherson wrote:
> On Wed, Apr 13, 2022, Sean Christopherson wrote:
> > On Wed, Apr 13, 2022, Sean Christopherson wrote:
> > > On Wed, Apr 13, 2022, Sean Christopherson wrote:
> > > > On Tue, Apr 12, 2022, Varad Gautam wrote:
> > > > > @@ -142,3 +143,26 @@ void smp_reset_apic(void)
> > > > >  
> > > > >  	atomic_inc(&active_cpus);
> > > > >  }
> > > > > +
> > > > > +void ap_init(void)
> > 
> > Sorry for chaining these, I keep understanding more things as I read through the
> > end of the series.  Hopefully this is the last one.
> > 
> > Can this be named setup_efi_rm_trampoline()?  Or whatever best matches the name
> > we decide on.  I keep thinking APs bounce through this to do their initialization,
> > but it's the BSP doing setup to prep waking the APs.
> 
> Well that didn't take long.  Just realized this isn't unique to EFI, and it also
> does the waking.  Maybe wake_aps()?  That makes smp_init() even more confusing
> (I keep thinking that it wakes APs...), but IMO smp_init() is the one that needs
> a new name.

*sigh*  Ignore the last complaint about smp_init(), it does do SMP initialization
by sending IPIs.  We should really change that, but that's a future problem.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 07/10] x86: efi, smp: Transition APs from 16-bit to 32-bit mode
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 07/10] x86: efi, smp: Transition APs from 16-bit to 32-bit mode Varad Gautam
@ 2022-04-13 19:17   ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 19:17 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Tue, Apr 12, 2022, Varad Gautam wrote:
> Sending INIT/SIPI to APs from ap_init() resets them into 16-bit mode
> to loop into sipi_entry().
> 
> To drive the APs into 32-bit mode, the SIPI vector needs:
> 1. A GDT descriptor reachable from 16-bit code (gdt32_descr).
> 2. A 32-bit entrypoint reachable from 16-bit code (ap_start32).
> 3. The locations of GDT and the 32-bit entrypoint.
> 
> Setting these up at compile time (like on non-EFI builds) is not
> possible since EFI builds with -shared -fPIC and efistart64.S cannot
> reference any absolute addresses.
> 
> Relative addressing is unavailable on 16-bit mode.
> 
> Moreover, EFI may not load the 32-bit entrypoint to be reachable from
> 16-bit mode.
> 
> To overcome these problems,
> 1. Fill the GDT descriptor at runtime after relocating
>    [sipi_entry-sipi_end] to lowmem. Since sipi_entry does not know the
>    address of this descriptor, use the last two bytes of SIPI page to
>    communicate it.
> 2. Place a call gate in the GDT to point to ap_start32.
> 3. Popluate sipi_entry() to lcall to ap_start32.
> 
> With this, the APs can transition to 32-bit mode and loop at a known
> location.
> 
> Signed-off-by: Varad Gautam <varad.gautam@suse.com>
> ---
>  lib/x86/smp.c        | 56 ++++++++++++++++++++++++++++++++++++++++++++
>  x86/efi/efistart64.S | 29 ++++++++++++++++++++++-
>  2 files changed, 84 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/x86/smp.c b/lib/x86/smp.c
> index d7f5aba..5cc1648 100644
> --- a/lib/x86/smp.c
> +++ b/lib/x86/smp.c
> @@ -6,6 +6,7 @@
>  #include "apic.h"
>  #include "fwcfg.h"
>  #include "desc.h"
> +#include "asm/page.h"
>  
>  #define IPI_VECTOR 0x20
>  
> @@ -144,16 +145,71 @@ void smp_reset_apic(void)
>  	atomic_inc(&active_cpus);
>  }
>  
> +#ifdef CONFIG_EFI
> +extern u8 gdt32_descr, gdt32, gdt32_end;
> +extern u8 ap_start32;
> +#endif
> +
>  void ap_init(void)
>  {
>  	u8 *dst_addr = 0;
>  	size_t sipi_sz = (&sipi_end - &sipi_entry) + 1;
>  
> +	assert(sipi_sz < PAGE_SIZE);
> +
>  	asm volatile("cld");
>  
>  	/* Relocate SIPI vector to dst_addr so it can run in 16-bit mode. */
> +	memset(dst_addr, 0, PAGE_SIZE);
>  	memcpy(dst_addr, &sipi_entry, sipi_sz);
>  
> +#ifdef CONFIG_EFI
> +	volatile struct descriptor_table_ptr *gdt32_descr_rel;
> +	idt_entry_t *gate_descr;
> +	u16 *gdt32_descr_reladdr = (u16 *) (PAGE_SIZE - sizeof(u16));

Ah, the gdt32 name confused me for a bit.  Can we use something more unique?
Maybe efi_rm_trampoline_gdt?  Or just efi_trampoline_gdt since the "rm" part is
technically wrong (though consistent).

Oooh, and the "PAGE_SIZE - sizeof(u16)" is another instance of open coding
"PAGE_SIZE - 2".  Definitely need a #define for that.

> +
> +	/*
> +	 * gdt32_descr for CONFIG_EFI needs to be filled here dynamically
> +	 * since compile time calculation of offsets is not allowed when
> +	 * building with -shared, and rip-relative addressing is not supported
> +	 * in 16-bit mode.
> +	 *
> +	 * Use the last two bytes of SIPI page to store relocated gdt32_descr
> +	 * addr.

Ooh, I see, it's a double indirection.  Load the address to the descriptor, then
load the descriptor which holds the address to the GDT.  I kept thinking that 
2 bytes were the limit chunk of the descriptor.

Maybe instead of "relocated gdt32_descr addr", "a pointer to the relocated
descriptor used by the trampoline to load the GDT"?

> +	 */
> +	*gdt32_descr_reladdr = (&gdt32_descr - &sipi_entry);
> +
> +	gdt32_descr_rel = (struct descriptor_table_ptr *) ((u64) *gdt32_descr_reladdr);
> +	gdt32_descr_rel->limit = (u16) (&gdt32_end - &gdt32 - 1);
> +	gdt32_descr_rel->base = (ulong) ((u32) (&gdt32 - &sipi_entry));
> +
> +	/*
> +	 * EFI may not load the 32-bit AP entrypoint (ap_start32) low enough
> +	 * to be reachable from the SIPI vector. Since we build with -shared, this

Nit, please avoid "we" and other pronous, e.g. "Since KVM-unit-tests is built with..."
avoids any ambiguity.

> +	 * location needs to be fetched at runtime, and rip-relative addressing is
> +	 * not supported in 16-bit mode.
> +	 * To perform 16-bit -> 32-bit far jump, our options are:

"our options" can be tweaked to something like "Alternatives to a CALL GATE are".

> +	 * - ljmpl $cs, $label : unusable since $label is not known at build time.
> +	 * - push $cs; push $label; lret : requires an intermediate trampoline since
> +	 *	 $label must still be within 0 - 0xFFFF for 16-bit far return to work.
> +	 * - lcall into a call-gate : best suited.

I very much appreciate the comment, but it's backwards in the sense that I had no
idea what I was reading until the very end.  I.e. lead with the below "Set up a
call gate ..." blurb, then dive into the alternatives.  First and forement, help
the reader understand what is actually implemented, then call out the alternatives
and why they're inferior.

> +	 *
> +	 * Set up call gate to ap_start32 within GDT.
> +	 *
> +	 * gdt32 layout:
> +	 *
> +	 * Entry | Segment
> +	 * 0	 | NULL descr
> +	 * 1	 | Code segment descr
> +	 * 2	 | Data segment descr
> +	 * 3	 | Call gate descr
> +	 */
> +	gate_descr = (idt_entry_t *) ((u8 *)(&gdt32 - &sipi_entry)
> +		+ 3 * sizeof(gdt_entry_t));

Ah, it's just a coincidence that this GDT has the same layout as the "real" GDT,
e.g. has the same KERNEL_DS and KERNEL_CS.  It took me a while to suss out that
the APs do indeed reload a different GDT during ap_start64().

In the comment above, can you add a blurb to call out that it's purely coincidental
and not strictly required that this matches the initial part of the final GDT, and
that APs will load the final GDT in ap_start64()?  I kept trying to figure out how
this didn't break tests that use other selectors :-)

> +	set_idt_entry_t(gate_descr, sizeof(gdt_entry_t), (void *) &ap_start32,
> +		0x8 /* sel */, 0xc /* type */, 0 /* dpl */);
> +#endif
> +
>  	/* INIT */
>  	apic_icr_write(APIC_DEST_ALLBUT | APIC_DEST_PHYSICAL | APIC_DM_INIT | APIC_INT_ASSERT, 0);
>  

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 08/10] x86: Move 32-bit bringup routines to start32.S
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 08/10] x86: Move 32-bit bringup routines to start32.S Varad Gautam
@ 2022-04-13 19:33   ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 19:33 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Tue, Apr 12, 2022, Varad Gautam wrote:
> These can be shared across EFI and non-EFI builds.
> 
> Signed-off-by: Varad Gautam <varad.gautam@suse.com>
> ---
> diff --git a/x86/start32.S b/x86/start32.S
> new file mode 100644
> index 0000000..9e00474
> --- /dev/null
> +++ b/x86/start32.S
> @@ -0,0 +1,62 @@
> +/* Common 32-bit code between EFI and non-EFI bootstrapping. */
> +
> +.code32
> +
> +MSR_GS_BASE = 0xc0000101
> +
> +.macro setup_percpu_area
> +	lea -4096(%esp), %eax
> +	mov $0, %edx
> +	mov $MSR_GS_BASE, %ecx
> +	wrmsr
> +.endm
> +
> +.macro setup_segments
> +	mov $MSR_GS_BASE, %ecx
> +	rdmsr
> +
> +	mov $0x10, %bx
> +	mov %bx, %ds
> +	mov %bx, %es
> +	mov %bx, %fs
> +	mov %bx, %gs
> +	mov %bx, %ss
> +
> +	/* restore MSR_GS_BASE */
> +	wrmsr
> +.endm
> +
> +prepare_64:
> +	lgdt gdt_descr
> +	setup_segments
> +
> +	xor %eax, %eax
> +	mov %eax, %cr4
> +
> +enter_long_mode:
> +	mov %cr4, %eax
> +	bts $5, %eax  // pae
> +	mov %eax, %cr4
> +
> +	mov pt_root, %eax
> +	mov %eax, %cr3
> +
> +efer = 0xc0000080
> +	mov $efer, %ecx
> +	rdmsr
> +	bts $8, %eax
> +	wrmsr
> +
> +	mov %cr0, %eax
> +	bts $0, %eax
> +	bts $31, %eax
> +	mov %eax, %cr0
> +	ret
> +
> +ap_start32:
> +	setup_segments
> +	mov $-4096, %esp
> +	lock xaddl %esp, smp_stacktop
> +	setup_percpu_area
> +	call prepare_64

I suspect this will conflict with my idea of using a dedicated percpu area.  But
can't that be remedied by adding a prep patch to drop setup_percpu_area and add a
C helper to the setup (using the dedicated area), called from ap_start64()?  I don't
see any instances of gs: being used before reset_apic().

Then the funky save/restore of MSR_GS_BASE also disappears.

> +	ljmpl $8, $ap_start64
> -- 
> 2.32.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 10/10] x86: Provide a common 64-bit AP entrypoint for EFI and non-EFI
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 10/10] x86: Provide a common 64-bit AP entrypoint for EFI and non-EFI Varad Gautam
@ 2022-04-13 19:55   ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 19:55 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Tue, Apr 12, 2022, Varad Gautam wrote:
> ap_start64() currently serves as the 64-bit entrypoint for non-EFI
> tests.
> 
> Having ap_start64() and save_id() written in asm prevents sharing these
> routines between EFI and non-EFI tests.
> 
> Rewrite them in C and use ap_start64 as the 64-bit entrypoint in the EFI
> boot flow.
> 
> With this, EFI tests support -smp > 1. smptest.efi now passes.
> 
> Signed-off-by: Varad Gautam <varad.gautam@suse.com>
> ---
>  lib/x86/asm/setup.h  |  3 +++
>  lib/x86/setup.c      | 54 +++++++++++++++++++++++++++++++++-----------
>  lib/x86/smp.c        |  1 +
>  x86/cstart64.S       | 24 --------------------
>  x86/efi/efistart64.S |  5 ----
>  5 files changed, 45 insertions(+), 42 deletions(-)
> 
> diff --git a/lib/x86/asm/setup.h b/lib/x86/asm/setup.h
> index 24d4fa9..8502e7d 100644
> --- a/lib/x86/asm/setup.h
> +++ b/lib/x86/asm/setup.h
> @@ -16,4 +16,7 @@ efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo);
>  void setup_5level_page_table(void);
>  #endif /* CONFIG_EFI */
>  
> +void save_id(void);
> +void ap_start64(void);
> +
>  #endif /* _X86_ASM_SETUP_H_ */
> diff --git a/lib/x86/setup.c b/lib/x86/setup.c
> index e2f7967..a0e0b0c 100644
> --- a/lib/x86/setup.c
> +++ b/lib/x86/setup.c
> @@ -14,8 +14,12 @@
>  #include "apic.h"
>  #include "apic-defs.h"
>  #include "asm/setup.h"
> +#include "processor.h"
> +#include "atomic.h"
>  
>  extern char edata;
> +extern unsigned char online_cpus[(MAX_TEST_CPUS + 7) / 8];

This is also in lib/x86/apic.c, I think it makes sense to move the declaration
to smp.h.  And opportunistically tweak the open coded size to:

  extern unsigned char online_cpus[DIV_ROUND_UP(MAX_TEST_CPUS), BITS_PER_BYTE)];

> +extern unsigned cpu_online_count;

This should probably go into smp.h too, e.g. svm_tests.c also declares it.

> @@ -328,6 +334,7 @@ efi_status_t setup_efi(efi_bootinfo_t *efi_bootinfo)
>  	mask_pic_interrupts();
>  	setup_page_table();
>  	enable_apic();
> +	save_id();
>  	ap_init();
>  	enable_x2apic();
>  	smp_init();
> @@ -350,3 +357,24 @@ void setup_libcflat(void)
>  			add_setup_arg("bootloader");
>  	}
>  }
> +
> +void save_id(void)

save_id() is a very odd name for what this is doing.  Maybe mark_cpu_online()?
Or am I overlooking something?

> +{
> +	u32 id = apic_id();
> +
> +	/* atomic_fetch_or() emits `lock or %dl, (%eax)` */
> +	atomic_fetch_or(&online_cpus[id / 8], (1 << (id % 8)));

Heh, this makes my brain go "what!?!".  I strongly prefer we add^Wcopy the kernel's
arch_set_bit() into lib/x86/atomic.h as atomic_set_bit(), then this becomes

	atomic_set_bit(&online_cpus, apic_id());

which is waaay easier to understand.

> +}
> +
> +void ap_start64(void)
> +{
> +	setup_gdt_tss();
> +	reset_apic();
> +	load_idt();
> +	save_id();
> +	enable_apic();
> +	enable_x2apic();
> +	sti();

Hmm, so ap_start64 has a nop after the sti, presumably to consume the STI blocking
shadow.  _Why_ it needed to that, I have no idea, any pending IRQs should be
serviced prior to actually executing HLT.  Purely to be stupidly cautious, can
you drop the "nop" from ap_start64 in a prep patch?  Just so that if there's some
magic we're missing, it shows up in a more obvious bisect.

> +	atomic_fetch_inc(&cpu_online_count);
> +	asm volatile("1: hlt; jmp 1b");

Unless I'm missing something, this should work and makes it more obvious that
the vCPU is being put into a loop:

	for (;;)
		asm volatile("hlt");

or while(1) if that's your preference.

> +}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests
  2022-04-12 17:33 [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Varad Gautam
                   ` (9 preceding siblings ...)
  2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 10/10] x86: Provide a common 64-bit AP entrypoint for EFI and non-EFI Varad Gautam
@ 2022-04-13 19:57 ` Sean Christopherson
  2022-04-26 11:51   ` Varad Gautam
  10 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2022-04-13 19:57 UTC (permalink / raw)
  To: Varad Gautam
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On Tue, Apr 12, 2022, Varad Gautam wrote:
> This series brings multi-vcpu support to UEFI tests on x86.
> 
> Most of the necessary AP bringup code already exists within kvm-unit-tests'
> cstart64.S, and has now been either rewritten in C or moved to a common location
> to be shared between EFI and non-EFI test builds.
> 
> A call gate is used to transition from 16-bit to 32-bit mode, since EFI may
> not load the 32-bit entrypoint low enough to be reachable from the SIPI vector.
> 
> Changes in v2:
> - rebase onto kvm-unit-tests@1a4529ce83 + seanjc's percpu apic_ops series [1].

Thanks for taking on the rebase pain, I appreciate it!

Lots of comments, but mostly minor things to (hopefully) improve readability.  I
belive the mixup with 32-bit targets is the only thing that might get painful.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests
  2022-04-13 19:57 ` [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Sean Christopherson
@ 2022-04-26 11:51   ` Varad Gautam
  0 siblings, 0 replies; 27+ messages in thread
From: Varad Gautam @ 2022-04-26 11:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, pbonzini, drjones, marcorr, zxwang42, erdemaktas, rientjes,
	brijesh.singh, Thomas.Lendacky, jroedel, bp

On 4/13/22 9:57 PM, Sean Christopherson wrote:
> On Tue, Apr 12, 2022, Varad Gautam wrote:
>> This series brings multi-vcpu support to UEFI tests on x86.
>>
>> Most of the necessary AP bringup code already exists within kvm-unit-tests'
>> cstart64.S, and has now been either rewritten in C or moved to a common location
>> to be shared between EFI and non-EFI test builds.
>>
>> A call gate is used to transition from 16-bit to 32-bit mode, since EFI may
>> not load the 32-bit entrypoint low enough to be reachable from the SIPI vector.
>>
>> Changes in v2:
>> - rebase onto kvm-unit-tests@1a4529ce83 + seanjc's percpu apic_ops series [1].
> 
> Thanks for taking on the rebase pain, I appreciate it!
> 
> Lots of comments, but mostly minor things to (hopefully) improve readability.  I
> belive the mixup with 32-bit targets is the only thing that might get painful.
> 

I've sent out a v3 at [1] taking in most of your comments. I've only left out the
changes to non-EFI 32-bit asm bringup code (x86/start32.S) and some renames which
I think would better go into a different series to keep this one easier to follow.

[1] https://lore.kernel.org/kvm/20220426114352.1262-1-varad.gautam@suse.com/

Thanks,
Varad


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2022-04-26 11:51 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-12 17:33 [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Varad Gautam
2022-04-12 17:33 ` [kvm-unit-tests PATCH v2 01/10] x86: Move ap_init() to smp.c Varad Gautam
2022-04-13 15:16   ` Sean Christopherson
2022-04-13 18:44   ` Sean Christopherson
2022-04-13 18:55     ` Sean Christopherson
2022-04-13 18:59       ` Sean Christopherson
2022-04-13 19:01         ` Sean Christopherson
2022-04-13 19:04           ` Sean Christopherson
2022-04-12 17:33 ` [kvm-unit-tests PATCH v2 02/10] x86: Move load_idt() to desc.c Varad Gautam
2022-04-13 15:19   ` Sean Christopherson
2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 03/10] x86: desc: Split IDT entry setup into a generic helper Varad Gautam
2022-04-13 16:35   ` Sean Christopherson
2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 04/10] x86: Move load_gdt_tss() to desc.c Varad Gautam
2022-04-13 15:58   ` Sean Christopherson
2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 05/10] x86: efi: Stop using UEFI-provided stack Varad Gautam
2022-04-13 16:20   ` Sean Christopherson
2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 06/10] x86: efi: Stop using UEFI-provided %gs for percpu storage Varad Gautam
2022-04-13 16:23   ` Sean Christopherson
2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 07/10] x86: efi, smp: Transition APs from 16-bit to 32-bit mode Varad Gautam
2022-04-13 19:17   ` Sean Christopherson
2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 08/10] x86: Move 32-bit bringup routines to start32.S Varad Gautam
2022-04-13 19:33   ` Sean Christopherson
2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 09/10] x86: efi, smp: Transition APs from 32-bit to 64-bit mode Varad Gautam
2022-04-12 17:34 ` [kvm-unit-tests PATCH v2 10/10] x86: Provide a common 64-bit AP entrypoint for EFI and non-EFI Varad Gautam
2022-04-13 19:55   ` Sean Christopherson
2022-04-13 19:57 ` [kvm-unit-tests PATCH v2 00/10] SMP Support for x86 UEFI Tests Sean Christopherson
2022-04-26 11:51   ` Varad Gautam

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.