All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Fixes for s3 with parallel bootup
@ 2023-10-26 17:03 Mario Limonciello
  2023-10-26 17:03 ` [PATCH v2 1/2] x86: Enable x2apic during resume from suspend if used previously Mario Limonciello
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Mario Limonciello @ 2023-10-26 17:03 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, Dave Hansen,
	H . Peter Anvin, Rafael J . Wysocki, Len Brown, Pavel Machek,
	David Woodhouse, Sandipan Das,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, open list:SUSPEND TO RAM,
	open list:ACPI, Mario Limonciello

Parallel bootup on systems that use x2apic broke suspend to ram.
This series ensures x2apic is re-enabled at startup and fixes an exposed
pre-emption issue.

Mario Limonciello (2):
  x86: Enable x2apic during resume from suspend if used previously
  perf/x86/amd: Stop calling amd_pmu_cpu_reset() from amd_pmu_cpu_dead()

 arch/x86/events/amd/core.c   |  7 ++++++-
 arch/x86/include/asm/smp.h   |  1 +
 arch/x86/kernel/acpi/sleep.c | 13 +++++++++----
 arch/x86/kernel/head_64.S    | 15 +++++++++++++++
 4 files changed, 31 insertions(+), 5 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/2] x86: Enable x2apic during resume from suspend if used previously
  2023-10-26 17:03 [PATCH v2 0/2] Fixes for s3 with parallel bootup Mario Limonciello
@ 2023-10-26 17:03 ` Mario Limonciello
  2023-10-26 17:15   ` Rafael J. Wysocki
  2023-10-27 21:31   ` Thomas Gleixner
  2023-10-26 17:03 ` [PATCH v2 2/2] perf/x86/amd: Stop calling amd_pmu_cpu_reset() from amd_pmu_cpu_dead() Mario Limonciello
  2023-10-27 19:24 ` [PATCH v2 0/2] Fixes for s3 with parallel bootup Thomas Gleixner
  2 siblings, 2 replies; 11+ messages in thread
From: Mario Limonciello @ 2023-10-26 17:03 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, Dave Hansen,
	H . Peter Anvin, Rafael J . Wysocki, Len Brown, Pavel Machek,
	David Woodhouse, Sandipan Das,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, open list:SUSPEND TO RAM,
	open list:ACPI, Mario Limonciello

If x2apic was enabled during boot with parallel startup
it will be needed during resume from suspend to ram as well.

Store whether to enable into the smpboot_control global variable
and during startup re-enable it if necessary.

This fixes resume from suspend on workstation CPUs with x2apic
enabled.

It will also work on systems with one maxcpus=1 but still using
x2apic since x2apic is also re-enabled in lapic_resume().

Cc: stable@vger.kernel.org # 6.5
Fixes: 0c7ffa32dbd6 ("x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
v1->v2:
 * Clarify it's in workstations in commit message
 * Fix style issues in comment and curly braces
---
 arch/x86/include/asm/smp.h   |  1 +
 arch/x86/kernel/acpi/sleep.c | 13 +++++++++----
 arch/x86/kernel/head_64.S    | 15 +++++++++++++++
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index c31c633419fe..86584ffaebc3 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -190,6 +190,7 @@ extern unsigned long apic_mmio_base;
 #endif /* !__ASSEMBLY__ */
 
 /* Control bits for startup_64 */
+#define STARTUP_ENABLE_X2APIC	0x40000000
 #define STARTUP_READ_APICID	0x80000000
 
 /* Top 8 bits are reserved for control */
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 6dfecb27b846..10d8921b4bb8 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -11,6 +11,7 @@
 #include <linux/dmi.h>
 #include <linux/cpumask.h>
 #include <linux/pgtable.h>
+#include <asm/apic.h>
 #include <asm/segment.h>
 #include <asm/desc.h>
 #include <asm/cacheflush.h>
@@ -129,12 +130,16 @@ int x86_acpi_suspend_lowlevel(void)
 	 */
 	current->thread.sp = (unsigned long)temp_stack + sizeof(temp_stack);
 	/*
-	 * Ensure the CPU knows which one it is when it comes back, if
-	 * it isn't in parallel mode and expected to work that out for
-	 * itself.
+	 * Ensure x2apic is re-enabled if necessary and the CPU knows which
+	 * one it is when it comes back, if it isn't in parallel mode and
+	 * expected to work that out for itself.
 	 */
-	if (!(smpboot_control & STARTUP_PARALLEL_MASK))
+	if (smpboot_control & STARTUP_PARALLEL_MASK) {
+		if (x2apic_enabled())
+			smpboot_control |= STARTUP_ENABLE_X2APIC;
+	} else {
 		smpboot_control = smp_processor_id();
+	}
 #endif
 	initial_code = (unsigned long)wakeup_long64;
 	saved_magic = 0x123456789abcdef0L;
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index ea6995920b7a..300901af9fa3 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -237,9 +237,14 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	 * CPU number is encoded in smpboot_control.
 	 *
 	 * Bit 31	STARTUP_READ_APICID (Read APICID from APIC)
+	 * Bit 30	STARTUP_ENABLE_X2APIC (Enable X2APIC mode)
 	 * Bit 0-23	CPU# if STARTUP_xx flags are not set
 	 */
 	movl	smpboot_control(%rip), %ecx
+
+	testl	$STARTUP_ENABLE_X2APIC, %ecx
+	jnz	.Lenable_x2apic
+
 	testl	$STARTUP_READ_APICID, %ecx
 	jnz	.Lread_apicid
 	/*
@@ -249,6 +254,16 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	andl	$(~STARTUP_PARALLEL_MASK), %ecx
 	jmp	.Lsetup_cpu
 
+.Lenable_x2apic:
+	/* Enable X2APIC if disabled */
+	mov	$MSR_IA32_APICBASE, %ecx
+	rdmsr
+	testl	$X2APIC_ENABLE, %eax
+	jnz	.Lread_apicid_msr
+	orl	$X2APIC_ENABLE, %eax
+	wrmsr
+	jmp	.Lread_apicid_msr
+
 .Lread_apicid:
 	/* Check whether X2APIC mode is already enabled */
 	mov	$MSR_IA32_APICBASE, %ecx
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/2] perf/x86/amd: Stop calling amd_pmu_cpu_reset() from amd_pmu_cpu_dead()
  2023-10-26 17:03 [PATCH v2 0/2] Fixes for s3 with parallel bootup Mario Limonciello
  2023-10-26 17:03 ` [PATCH v2 1/2] x86: Enable x2apic during resume from suspend if used previously Mario Limonciello
@ 2023-10-26 17:03 ` Mario Limonciello
  2023-10-27 21:47   ` Thomas Gleixner
  2023-10-27 19:24 ` [PATCH v2 0/2] Fixes for s3 with parallel bootup Thomas Gleixner
  2 siblings, 1 reply; 11+ messages in thread
From: Mario Limonciello @ 2023-10-26 17:03 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, Dave Hansen,
	H . Peter Anvin, Rafael J . Wysocki, Len Brown, Pavel Machek,
	David Woodhouse, Sandipan Das,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, open list:SUSPEND TO RAM,
	open list:ACPI, Mario Limonciello

During suspend testing on a workstation CPU a preemption BUG was reported.

```
BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2960
caller is amd_pmu_lbr_reset+0x19/0xc0
CPU: 104 PID: 2960 Comm: rtcwake Not tainted 6.6.0-rc6-00002-g3e2c7f3ac51f
Call Trace:
 <TASK>
 dump_stack_lvl+0x44/0x60
 check_preemption_disabled+0xce/0xf0
 ? __pfx_x86_pmu_dead_cpu+0x10/0x10
 amd_pmu_lbr_reset+0x19/0xc0
 ? __pfx_x86_pmu_dead_cpu+0x10/0x10
 amd_pmu_cpu_reset.constprop.0+0x51/0x60
 amd_pmu_cpu_dead+0x3e/0x90
 x86_pmu_dead_cpu+0x13/0x20
 cpuhp_invoke_callback+0x169/0x4b0
 ? __pfx_virtnet_cpu_dead+0x10/0x10
 __cpuhp_invoke_callback_range+0x76/0xe0
 _cpu_down+0x112/0x270
 freeze_secondary_cpus+0x8e/0x280
 suspend_devices_and_enter+0x342/0x900
 pm_suspend+0x2fd/0x690
 state_store+0x71/0xd0
 kernfs_fop_write_iter+0x128/0x1c0
 vfs_write+0x2db/0x400
 ksys_write+0x5f/0xe0
 do_syscall_64+0x59/0x90
 ? srso_alias_return_thunk+0x5/0x7f
 ? count_memcg_events.constprop.0+0x1a/0x30
 ? srso_alias_return_thunk+0x5/0x7f
 ? handle_mm_fault+0x1e9/0x340
 ? srso_alias_return_thunk+0x5/0x7f
 ? preempt_count_add+0x4d/0xa0
 ? srso_alias_return_thunk+0x5/0x7f
 ? up_read+0x38/0x70
 ? srso_alias_return_thunk+0x5/0x7f
 ? do_user_addr_fault+0x343/0x6b0
 ? srso_alias_return_thunk+0x5/0x7f
 ? exc_page_fault+0x74/0x170
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7f32f8d14a77
Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa
64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff
77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffdc648de18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f32f8d14a77
RDX: 0000000000000004 RSI: 000055b2fc2a5670 RDI: 0000000000000004
RBP: 000055b2fc2a5670 R08: 0000000000000000 R09: 000055b2fc2a5670
R10: 00007f32f8e1a2f0 R11: 0000000000000246 R12: 0000000000000004
R13: 000055b2fc2a2480 R14: 00007f32f8e16600 R15: 00007f32f8e15a00
 </TASK>
```

This bug shows that there is a mistake with the flow used for offlining
a CPU.  Calling amd_pmu_cpu_reset() from the dead callback is problematic
because this doesn't run on the actual CPU being offlined.  The intent of
the function is to reset MSRs local to that CPU.

Move the call into the dying callback which is actually run on the local
CPU.

Cc: stable@vger.kernel.org # 6.1+
Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support")
Suggested-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
v1->v2:
 * Add more of trace
 * Explain root cause better
 * Adjust solution to fix real root cause
---
 arch/x86/events/amd/core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index e24976593a29..4ec6d3ece07d 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -598,13 +598,17 @@ static void amd_pmu_cpu_starting(int cpu)
 	cpuc->amd_nb->refcnt++;
 }
 
+static void amd_pmu_cpu_dying(int cpu)
+{
+	amd_pmu_cpu_reset(cpu);
+}
+
 static void amd_pmu_cpu_dead(int cpu)
 {
 	struct cpu_hw_events *cpuhw = &per_cpu(cpu_hw_events, cpu);
 
 	kfree(cpuhw->lbr_sel);
 	cpuhw->lbr_sel = NULL;
-	amd_pmu_cpu_reset(cpu);
 
 	if (!x86_pmu.amd_nb_constraints)
 		return;
@@ -1270,6 +1274,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 
 	.cpu_prepare		= amd_pmu_cpu_prepare,
 	.cpu_starting		= amd_pmu_cpu_starting,
+	.cpu_dying		= amd_pmu_cpu_dying,
 	.cpu_dead		= amd_pmu_cpu_dead,
 
 	.amd_nb_constraints	= 1,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] x86: Enable x2apic during resume from suspend if used previously
  2023-10-26 17:03 ` [PATCH v2 1/2] x86: Enable x2apic during resume from suspend if used previously Mario Limonciello
@ 2023-10-26 17:15   ` Rafael J. Wysocki
  2023-10-27 21:31   ` Thomas Gleixner
  1 sibling, 0 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2023-10-26 17:15 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, Dave Hansen,
	H . Peter Anvin, Rafael J . Wysocki, Len Brown, Pavel Machek,
	David Woodhouse, Sandipan Das,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, open list:SUSPEND TO RAM,
	open list:ACPI

On Thu, Oct 26, 2023 at 7:03 PM Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
> If x2apic was enabled during boot with parallel startup
> it will be needed during resume from suspend to ram as well.
>
> Store whether to enable into the smpboot_control global variable
> and during startup re-enable it if necessary.
>
> This fixes resume from suspend on workstation CPUs with x2apic
> enabled.
>
> It will also work on systems with one maxcpus=1 but still using
> x2apic since x2apic is also re-enabled in lapic_resume().
>
> Cc: stable@vger.kernel.org # 6.5
> Fixes: 0c7ffa32dbd6 ("x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it")
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>

Acked-by: Rafael J. Wysocki <rafael@kernel.org>

> ---
> v1->v2:
>  * Clarify it's in workstations in commit message
>  * Fix style issues in comment and curly braces
> ---
>  arch/x86/include/asm/smp.h   |  1 +
>  arch/x86/kernel/acpi/sleep.c | 13 +++++++++----
>  arch/x86/kernel/head_64.S    | 15 +++++++++++++++
>  3 files changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> index c31c633419fe..86584ffaebc3 100644
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -190,6 +190,7 @@ extern unsigned long apic_mmio_base;
>  #endif /* !__ASSEMBLY__ */
>
>  /* Control bits for startup_64 */
> +#define STARTUP_ENABLE_X2APIC  0x40000000
>  #define STARTUP_READ_APICID    0x80000000
>
>  /* Top 8 bits are reserved for control */
> diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
> index 6dfecb27b846..10d8921b4bb8 100644
> --- a/arch/x86/kernel/acpi/sleep.c
> +++ b/arch/x86/kernel/acpi/sleep.c
> @@ -11,6 +11,7 @@
>  #include <linux/dmi.h>
>  #include <linux/cpumask.h>
>  #include <linux/pgtable.h>
> +#include <asm/apic.h>
>  #include <asm/segment.h>
>  #include <asm/desc.h>
>  #include <asm/cacheflush.h>
> @@ -129,12 +130,16 @@ int x86_acpi_suspend_lowlevel(void)
>          */
>         current->thread.sp = (unsigned long)temp_stack + sizeof(temp_stack);
>         /*
> -        * Ensure the CPU knows which one it is when it comes back, if
> -        * it isn't in parallel mode and expected to work that out for
> -        * itself.
> +        * Ensure x2apic is re-enabled if necessary and the CPU knows which
> +        * one it is when it comes back, if it isn't in parallel mode and
> +        * expected to work that out for itself.
>          */
> -       if (!(smpboot_control & STARTUP_PARALLEL_MASK))
> +       if (smpboot_control & STARTUP_PARALLEL_MASK) {
> +               if (x2apic_enabled())
> +                       smpboot_control |= STARTUP_ENABLE_X2APIC;
> +       } else {
>                 smpboot_control = smp_processor_id();
> +       }
>  #endif
>         initial_code = (unsigned long)wakeup_long64;
>         saved_magic = 0x123456789abcdef0L;
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index ea6995920b7a..300901af9fa3 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -237,9 +237,14 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>          * CPU number is encoded in smpboot_control.
>          *
>          * Bit 31       STARTUP_READ_APICID (Read APICID from APIC)
> +        * Bit 30       STARTUP_ENABLE_X2APIC (Enable X2APIC mode)
>          * Bit 0-23     CPU# if STARTUP_xx flags are not set
>          */
>         movl    smpboot_control(%rip), %ecx
> +
> +       testl   $STARTUP_ENABLE_X2APIC, %ecx
> +       jnz     .Lenable_x2apic
> +
>         testl   $STARTUP_READ_APICID, %ecx
>         jnz     .Lread_apicid
>         /*
> @@ -249,6 +254,16 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>         andl    $(~STARTUP_PARALLEL_MASK), %ecx
>         jmp     .Lsetup_cpu
>
> +.Lenable_x2apic:
> +       /* Enable X2APIC if disabled */
> +       mov     $MSR_IA32_APICBASE, %ecx
> +       rdmsr
> +       testl   $X2APIC_ENABLE, %eax
> +       jnz     .Lread_apicid_msr
> +       orl     $X2APIC_ENABLE, %eax
> +       wrmsr
> +       jmp     .Lread_apicid_msr
> +
>  .Lread_apicid:
>         /* Check whether X2APIC mode is already enabled */
>         mov     $MSR_IA32_APICBASE, %ecx
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/2] Fixes for s3 with parallel bootup
  2023-10-26 17:03 [PATCH v2 0/2] Fixes for s3 with parallel bootup Mario Limonciello
  2023-10-26 17:03 ` [PATCH v2 1/2] x86: Enable x2apic during resume from suspend if used previously Mario Limonciello
  2023-10-26 17:03 ` [PATCH v2 2/2] perf/x86/amd: Stop calling amd_pmu_cpu_reset() from amd_pmu_cpu_dead() Mario Limonciello
@ 2023-10-27 19:24 ` Thomas Gleixner
  2023-10-27 19:29   ` Mario Limonciello
  2 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2023-10-27 19:24 UTC (permalink / raw)
  To: Mario Limonciello, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, Dave Hansen,
	H . Peter Anvin, Rafael J . Wysocki, Len Brown, Pavel Machek,
	David Woodhouse, Sandipan Das,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, open list:SUSPEND TO RAM,
	open list:ACPI, Mario Limonciello

On Thu, Oct 26 2023 at 12:03, Mario Limonciello wrote:
> Parallel bootup on systems that use x2apic broke suspend to ram.
> This series ensures x2apic is re-enabled at startup and fixes an exposed
> pre-emption issue.

The PMU issue has absolutely nothing to do with parallel bootup.

Can you please describe stuff coherently?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/2] Fixes for s3 with parallel bootup
  2023-10-27 19:24 ` [PATCH v2 0/2] Fixes for s3 with parallel bootup Thomas Gleixner
@ 2023-10-27 19:29   ` Mario Limonciello
  2023-10-27 21:50     ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Mario Limonciello @ 2023-10-27 19:29 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, Dave Hansen,
	H . Peter Anvin, Rafael J . Wysocki, Len Brown, Pavel Machek,
	David Woodhouse, Sandipan Das,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, open list:SUSPEND TO RAM,
	open list:ACPI

On 10/27/2023 14:24, Thomas Gleixner wrote:
> On Thu, Oct 26 2023 at 12:03, Mario Limonciello wrote:
>> Parallel bootup on systems that use x2apic broke suspend to ram.
>> This series ensures x2apic is re-enabled at startup and fixes an exposed
>> pre-emption issue.
> 
> The PMU issue has absolutely nothing to do with parallel bootup.
> 
> Can you please describe stuff coherently?

They are both issues found with S3 testing.
The PMU issue wasn't being observed with cpuhp.parallel=0.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] x86: Enable x2apic during resume from suspend if used previously
  2023-10-26 17:03 ` [PATCH v2 1/2] x86: Enable x2apic during resume from suspend if used previously Mario Limonciello
  2023-10-26 17:15   ` Rafael J. Wysocki
@ 2023-10-27 21:31   ` Thomas Gleixner
  2023-10-31 18:53     ` Mario Limonciello
  1 sibling, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2023-10-27 21:31 UTC (permalink / raw)
  To: Mario Limonciello, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, Dave Hansen,
	H . Peter Anvin, Rafael J . Wysocki, Len Brown, Pavel Machek,
	David Woodhouse, Sandipan Das,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, open list:SUSPEND TO RAM,
	open list:ACPI, Mario Limonciello

Mario!

On Thu, Oct 26 2023 at 12:03, Mario Limonciello wrote:

> If x2apic was enabled during boot with parallel startup
> it will be needed during resume from suspend to ram as well.

Lacks an explanation why it is needed.

> Store whether to enable into the smpboot_control global variable
> and during startup re-enable it if necessary.
>
> This fixes resume from suspend on workstation CPUs with x2apic
> enabled.

You completely fail to describe the failure mode.

> It will also work on systems with one maxcpus=1 but still using
> x2apic since x2apic is also re-enabled in lapic_resume().

This sentence makes no sense. What's so special about maxcpus=1?

Absolutely nothing.

lapic_resume() is a syscore op and is always invoked on the CPU which
handles suspend/resume. The point is that __x2apic_enable() which is
invoked from there becomes a NOOP because X2APIC is already enabled.

So what are you trying to tell me?

I really appreciate your enthusiasm of chasing and fixing bugs, but your
change logs and explanations are really making it hard to keep that
appreciation up.

>  	/*
> -	 * Ensure the CPU knows which one it is when it comes back, if
> -	 * it isn't in parallel mode and expected to work that out for
> -	 * itself.
> +	 * Ensure x2apic is re-enabled if necessary and the CPU knows which
> +	 * one it is when it comes back, if it isn't in parallel mode and
> +	 * expected to work that out for itself.

The x2apic comment is misplaced. It should be above the x2apic
conditional as it has nothing to do with the initial condition because
even if X2APIC is enabled then the parallel startup might be disabled.

Go and read this comment 3 month from now and try to make sense of it.

>  	 */
> -	if (!(smpboot_control & STARTUP_PARALLEL_MASK))
> +	if (smpboot_control & STARTUP_PARALLEL_MASK) {
> +		if (x2apic_enabled())
> +			smpboot_control |= STARTUP_ENABLE_X2APIC;

This bit is sticky after resume, so any subsequent CPU hotplug operation
will see it as well.

This lacks an explanation why this is correct and why this flag is not
set early during boot before the APs are brought up.

> +	} else {
>  		smpboot_control = smp_processor_id();
> +	}
>  #endif
>  	initial_code = (unsigned long)wakeup_long64;
>  	saved_magic = 0x123456789abcdef0L;
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index ea6995920b7a..300901af9fa3 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -237,9 +237,14 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>  	 * CPU number is encoded in smpboot_control.
>  	 *
>  	 * Bit 31	STARTUP_READ_APICID (Read APICID from APIC)
> +	 * Bit 30	STARTUP_ENABLE_X2APIC (Enable X2APIC mode)
>  	 * Bit 0-23	CPU# if STARTUP_xx flags are not set
>  	 */
>  	movl	smpboot_control(%rip), %ecx
> +
> +	testl	$STARTUP_ENABLE_X2APIC, %ecx
> +	jnz	.Lenable_x2apic

Why is this tested here? The test clearly belongs into .Lread_apicid

> +
>  	testl	$STARTUP_READ_APICID, %ecx
>  	jnz	.Lread_apicid
>  	/*
> @@ -249,6 +254,16 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>  	andl	$(~STARTUP_PARALLEL_MASK), %ecx
>  	jmp	.Lsetup_cpu
>  
> +.Lenable_x2apic:
> +	/* Enable X2APIC if disabled */
> +	mov	$MSR_IA32_APICBASE, %ecx
> +	rdmsr
> +	testl	$X2APIC_ENABLE, %eax
> +	jnz	.Lread_apicid_msr
> +	orl	$X2APIC_ENABLE, %eax
> +	wrmsr
> +	jmp	.Lread_apicid_msr

And this part just moves in front of .Lread_apicid_msr and spares
the jump at the end.

>  .Lread_apicid:
>  	/* Check whether X2APIC mode is already enabled */
>  	mov	$MSR_IA32_APICBASE, %ecx

That aside, I'm still failing to see the actual failure scenario due to
the utter void in the change log.

The kernel has two mechanisms which end up with X2APIC enabled:

    1) BIOS has it enabled already, which is required for any machine
       which has more than 255 CPUs, but BIOS can decide to enable
       X2APIC even with less than 256 CPUs.

    2) BIOS has not enabled it, but the kernel enables it because the
       CPU supports it.

The major difference is:

    #1 prevents the MMIO fixmap for the APIC to be installed

    #2 installs the fixmap but does not use it. It's never torn down.

Let's look at these two cases in the light of resume:

    #1 If the BIOS enabled X2APIC mode then the only way how this can
       fail on resume is that the BIOS did not enable X2APIC mode in the
       resume path before going back into the kernel and due to the
       non-existent MMIO mapping there is no way to read the APIC ID.

    #2 It does not matter whether the BIOS enabled X2APIC mode during
       resume because the MMIO mapping exists and the APIC ID read via
       MMIO should be identical to the APIC ID read via the X2APIC MSR.

       If not, then there is something fundamentally wrong.

How is this working during the initial bringup of the APs?

    #1 If the APs do not have X2APIC enabled by the BIOS then they will
       crash and burn during bringup due to non-existent MMIO mapping.

    #2 The APs can read their APIC ID just fine via MMIO and it
       obviously is the same as the APIC ID after the bringup enabled
       X2APIC mode. Otherwise the kernel would not work at all.

So the only reasonable explanation for the failure is that the BIOS
fails to reenable X2APIC mode on resume either on the CPU which handles
suspend/resume or on the subsequent AP bringups, which is not clear at
all due to the bit being sticky and the changelog being full of void in
that aspect.

That said the sticky bit is wrong for the following case with older CPUs
where X2APIC requires up to date microcode loaded to work correctly:

    boot maxcpus=4
      load microcode    // Same sequence applies for AP (CPU1-3)
      enable x2apic
    suspend
       set X2APIC enable bit in smpboot_control
    resume
    bringup CPU4
       enable X2APIC early --> fail due to lack of microcode fix

Whether this affects the APIC ID readout or not, which we don't know and
even if we consider this case to be academic, it's still fundamentally
wrong from a correctness point of view.

So without proper information about the failure scenario this "fix" is
simply going nowhere.

Please provide the following information:

  - dmesg of the initial boot up to 'smp: Bringing up secondary CPUs ...'

  - A proper description of the failure case:

    - Is the CPU which handles suspend/resume failing?

    - Is a subsequent AP bringup failing?

    - Is the failure due to the lack of MMIO mapping ?

    - Is the failure due to a bogus APIC ID retrieved via MMIO ?

Thanks for making me do your homework (again),

        tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] perf/x86/amd: Stop calling amd_pmu_cpu_reset() from amd_pmu_cpu_dead()
  2023-10-26 17:03 ` [PATCH v2 2/2] perf/x86/amd: Stop calling amd_pmu_cpu_reset() from amd_pmu_cpu_dead() Mario Limonciello
@ 2023-10-27 21:47   ` Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2023-10-27 21:47 UTC (permalink / raw)
  To: Mario Limonciello, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, Dave Hansen,
	H . Peter Anvin, Rafael J . Wysocki, Len Brown, Pavel Machek,
	David Woodhouse, Sandipan Das,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, open list:SUSPEND TO RAM,
	open list:ACPI, Mario Limonciello

On Thu, Oct 26 2023 at 12:03, Mario Limonciello wrote:

> During suspend testing on a workstation CPU a preemption BUG was
> reported.

How is this related to a workstation CPU? Laptop CPUs and server CPUs
are magically not affected, right?

Also how is this related to suspend?

This clearly affects any CPU down operation whether in the context of
suspend or initiated via sysfs, no?

Just because you observed it during suspend testing does not magically
make it a suspend related problem....

> BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2960
> caller is amd_pmu_lbr_reset+0x19/0xc0
> CPU: 104 PID: 2960 Comm: rtcwake Not tainted 6.6.0-rc6-00002-g3e2c7f3ac51f
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x44/0x60
>  check_preemption_disabled+0xce/0xf0
>  ? __pfx_x86_pmu_dead_cpu+0x10/0x10
>  amd_pmu_lbr_reset+0x19/0xc0
>  ? __pfx_x86_pmu_dead_cpu+0x10/0x10
>  amd_pmu_cpu_reset.constprop.0+0x51/0x60
>  amd_pmu_cpu_dead+0x3e/0x90
>  x86_pmu_dead_cpu+0x13/0x20
>  cpuhp_invoke_callback+0x169/0x4b0
>  ? __pfx_virtnet_cpu_dead+0x10/0x10
>  __cpuhp_invoke_callback_range+0x76/0xe0
>  _cpu_down+0x112/0x270
>  freeze_secondary_cpus+0x8e/0x280
>  suspend_devices_and_enter+0x342/0x900
>  pm_suspend+0x2fd/0x690
>  state_store+0x71/0xd0
>  kernfs_fop_write_iter+0x128/0x1c0
>  vfs_write+0x2db/0x400
>  ksys_write+0x5f/0xe0
>  do_syscall_64+0x59/0x90
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? count_memcg_events.constprop.0+0x1a/0x30
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? handle_mm_fault+0x1e9/0x340
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? preempt_count_add+0x4d/0xa0
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? up_read+0x38/0x70
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? do_user_addr_fault+0x343/0x6b0
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? exc_page_fault+0x74/0x170
>  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> RIP: 0033:0x7f32f8d14a77
> Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa
> 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff
> 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> RSP: 002b:00007ffdc648de18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f32f8d14a77
> RDX: 0000000000000004 RSI: 000055b2fc2a5670 RDI: 0000000000000004
> RBP: 000055b2fc2a5670 R08: 0000000000000000 R09: 000055b2fc2a5670
> R10: 00007f32f8e1a2f0 R11: 0000000000000246 R12: 0000000000000004
> R13: 000055b2fc2a2480 R14: 00007f32f8e16600 R15: 00007f32f8e15a00
>  </TASK>

How much of that backtrace is actually substantial information?

At max 5 lines out of ~50. See:

  https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces

> This bug shows that there is a mistake with the flow used for offlining

This bug shows nothing than a calltrace. Please explain the context and
the fail in coherent sentences. The bug backtrace is just for
illustration.

> a CPU.  Calling amd_pmu_cpu_reset() from the dead callback is
> problematic

It's not problematic. It's simply wrong.

> because this doesn't run on the actual CPU being offlined.  The intent of
> the function is to reset MSRs local to that CPU.
>
> Move the call into the dying callback which is actually run on the local
> CPU.

...

> +static void amd_pmu_cpu_dying(int cpu)
> +{
> +	amd_pmu_cpu_reset(cpu);
> +}

You clearly can spare that wrapper which wraps a function with the signature

     void fn(int)

into a function with the signature

     void fn(int)

by just assigning amd_pmu_cpu_reset() to the cpu_dying callback, no?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/2] Fixes for s3 with parallel bootup
  2023-10-27 19:29   ` Mario Limonciello
@ 2023-10-27 21:50     ` Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2023-10-27 21:50 UTC (permalink / raw)
  To: Mario Limonciello, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, Dave Hansen,
	H . Peter Anvin, Rafael J . Wysocki, Len Brown, Pavel Machek,
	David Woodhouse, Sandipan Das,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, open list:SUSPEND TO RAM,
	open list:ACPI

On Fri, Oct 27 2023 at 14:29, Mario Limonciello wrote:
> On 10/27/2023 14:24, Thomas Gleixner wrote:
>> On Thu, Oct 26 2023 at 12:03, Mario Limonciello wrote:
>>> Parallel bootup on systems that use x2apic broke suspend to ram.
>>> This series ensures x2apic is re-enabled at startup and fixes an exposed
>>> pre-emption issue.
>> 
>> The PMU issue has absolutely nothing to do with parallel bootup.
>> 
>> Can you please describe stuff coherently?
>
> They are both issues found with S3 testing.
> The PMU issue wasn't being observed with cpuhp.parallel=0.

It does not matter whether you cannot observe it under a certain
conditions. What matters is the proper analysis of the root cause and
that is clearly neither related to suspend nor cpuhp.parallel=0.

Stop this 'fix the symptom' approach before it becomes a habit.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] x86: Enable x2apic during resume from suspend if used previously
  2023-10-27 21:31   ` Thomas Gleixner
@ 2023-10-31 18:53     ` Mario Limonciello
  2023-10-31 23:20       ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Mario Limonciello @ 2023-10-31 18:53 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Linux kernel regressions list
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, Dave Hansen,
	H . Peter Anvin, Rafael J . Wysocki, Len Brown, Pavel Machek,
	David Woodhouse, Sandipan Das,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, open list:SUSPEND TO RAM,
	open list:ACPI

On 10/27/2023 16:31, Thomas Gleixner wrote:
> Mario!
> 
> On Thu, Oct 26 2023 at 12:03, Mario Limonciello wrote:
> 
>> If x2apic was enabled during boot with parallel startup
>> it will be needed during resume from suspend to ram as well.
> 
> Lacks an explanation why it is needed.
> 
>> Store whether to enable into the smpboot_control global variable
>> and during startup re-enable it if necessary.
>>
>> This fixes resume from suspend on workstation CPUs with x2apic
>> enabled.
> 
> You completely fail to describe the failure mode.
> 
>> It will also work on systems with one maxcpus=1 but still using
>> x2apic since x2apic is also re-enabled in lapic_resume().
> 
> This sentence makes no sense. What's so special about maxcpus=1?
> 
> Absolutely nothing.
> 
> lapic_resume() is a syscore op and is always invoked on the CPU which
> handles suspend/resume. The point is that __x2apic_enable() which is
> invoked from there becomes a NOOP because X2APIC is already enabled.
> 
> So what are you trying to tell me?
> 
> I really appreciate your enthusiasm of chasing and fixing bugs, but your
> change logs and explanations are really making it hard to keep that
> appreciation up.
> 
>>   	/*
>> -	 * Ensure the CPU knows which one it is when it comes back, if
>> -	 * it isn't in parallel mode and expected to work that out for
>> -	 * itself.
>> +	 * Ensure x2apic is re-enabled if necessary and the CPU knows which
>> +	 * one it is when it comes back, if it isn't in parallel mode and
>> +	 * expected to work that out for itself.
> 
> The x2apic comment is misplaced. It should be above the x2apic
> conditional as it has nothing to do with the initial condition because
> even if X2APIC is enabled then the parallel startup might be disabled.
> 
> Go and read this comment 3 month from now and try to make sense of it.
> 
>>   	 */
>> -	if (!(smpboot_control & STARTUP_PARALLEL_MASK))
>> +	if (smpboot_control & STARTUP_PARALLEL_MASK) {
>> +		if (x2apic_enabled())
>> +			smpboot_control |= STARTUP_ENABLE_X2APIC;
> 
> This bit is sticky after resume, so any subsequent CPU hotplug operation
> will see it as well.
> 
> This lacks an explanation why this is correct and why this flag is not
> set early during boot before the APs are brought up.
> 
>> +	} else {
>>   		smpboot_control = smp_processor_id();
>> +	}
>>   #endif
>>   	initial_code = (unsigned long)wakeup_long64;
>>   	saved_magic = 0x123456789abcdef0L;
>> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
>> index ea6995920b7a..300901af9fa3 100644
>> --- a/arch/x86/kernel/head_64.S
>> +++ b/arch/x86/kernel/head_64.S
>> @@ -237,9 +237,14 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>>   	 * CPU number is encoded in smpboot_control.
>>   	 *
>>   	 * Bit 31	STARTUP_READ_APICID (Read APICID from APIC)
>> +	 * Bit 30	STARTUP_ENABLE_X2APIC (Enable X2APIC mode)
>>   	 * Bit 0-23	CPU# if STARTUP_xx flags are not set
>>   	 */
>>   	movl	smpboot_control(%rip), %ecx
>> +
>> +	testl	$STARTUP_ENABLE_X2APIC, %ecx
>> +	jnz	.Lenable_x2apic
> 
> Why is this tested here? The test clearly belongs into .Lread_apicid
> 
>> +
>>   	testl	$STARTUP_READ_APICID, %ecx
>>   	jnz	.Lread_apicid
>>   	/*
>> @@ -249,6 +254,16 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>>   	andl	$(~STARTUP_PARALLEL_MASK), %ecx
>>   	jmp	.Lsetup_cpu
>>   
>> +.Lenable_x2apic:
>> +	/* Enable X2APIC if disabled */
>> +	mov	$MSR_IA32_APICBASE, %ecx
>> +	rdmsr
>> +	testl	$X2APIC_ENABLE, %eax
>> +	jnz	.Lread_apicid_msr
>> +	orl	$X2APIC_ENABLE, %eax
>> +	wrmsr
>> +	jmp	.Lread_apicid_msr
> 
> And this part just moves in front of .Lread_apicid_msr and spares
> the jump at the end.
> 
>>   .Lread_apicid:
>>   	/* Check whether X2APIC mode is already enabled */
>>   	mov	$MSR_IA32_APICBASE, %ecx
> 
> That aside, I'm still failing to see the actual failure scenario due to
> the utter void in the change log.
> 
> The kernel has two mechanisms which end up with X2APIC enabled:
> 
>      1) BIOS has it enabled already, which is required for any machine
>         which has more than 255 CPUs, but BIOS can decide to enable
>         X2APIC even with less than 256 CPUs.
> 
>      2) BIOS has not enabled it, but the kernel enables it because the
>         CPU supports it.
> 
> The major difference is:
> 
>      #1 prevents the MMIO fixmap for the APIC to be installed
> 
>      #2 installs the fixmap but does not use it. It's never torn down.
> 
> Let's look at these two cases in the light of resume:
> 
>      #1 If the BIOS enabled X2APIC mode then the only way how this can
>         fail on resume is that the BIOS did not enable X2APIC mode in the
>         resume path before going back into the kernel and due to the
>         non-existent MMIO mapping there is no way to read the APIC ID.
> 
>      #2 It does not matter whether the BIOS enabled X2APIC mode during
>         resume because the MMIO mapping exists and the APIC ID read via
>         MMIO should be identical to the APIC ID read via the X2APIC MSR.
> 
>         If not, then there is something fundamentally wrong.
> 
> How is this working during the initial bringup of the APs?
> 
>      #1 If the APs do not have X2APIC enabled by the BIOS then they will
>         crash and burn during bringup due to non-existent MMIO mapping.
> 
>      #2 The APs can read their APIC ID just fine via MMIO and it
>         obviously is the same as the APIC ID after the bringup enabled
>         X2APIC mode. Otherwise the kernel would not work at all.
> 
> So the only reasonable explanation for the failure is that the BIOS
> fails to reenable X2APIC mode on resume either on the CPU which handles
> suspend/resume or on the subsequent AP bringups, which is not clear at
> all due to the bit being sticky and the changelog being full of void in
> that aspect.
> 
> That said the sticky bit is wrong for the following case with older CPUs
> where X2APIC requires up to date microcode loaded to work correctly:
> 
>      boot maxcpus=4
>        load microcode    // Same sequence applies for AP (CPU1-3)
>        enable x2apic
>      suspend
>         set X2APIC enable bit in smpboot_control
>      resume
>      bringup CPU4
>         enable X2APIC early --> fail due to lack of microcode fix
> 
> Whether this affects the APIC ID readout or not, which we don't know and
> even if we consider this case to be academic, it's still fundamentally
> wrong from a correctness point of view.
> 
> So without proper information about the failure scenario this "fix" is
> simply going nowhere.
> 
> Please provide the following information:
> 
>    - dmesg of the initial boot up to 'smp: Bringing up secondary CPUs ...'
> 
>    - A proper description of the failure case:
> 
>      - Is the CPU which handles suspend/resume failing?
> 
>      - Is a subsequent AP bringup failing?
> 
>      - Is the failure due to the lack of MMIO mapping ?
> 
>      - Is the failure due to a bogus APIC ID retrieved via MMIO ?
> 
> Thanks for making me do your homework (again),
> 
>          tglx

Hi Thomas,

Thank you for looking this over.  I've reviewed it with the internal 
team and confirmed there was a BIOS bug where the MSR wasn't restored 
after the S3 cycle completed.  The BIOS team has fixed it.

Thanks,

#regzbot invalid: BIOS bug

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] x86: Enable x2apic during resume from suspend if used previously
  2023-10-31 18:53     ` Mario Limonciello
@ 2023-10-31 23:20       ` Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2023-10-31 23:20 UTC (permalink / raw)
  To: Mario Limonciello, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Linux kernel regressions list
  Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter, Dave Hansen,
	H . Peter Anvin, Rafael J . Wysocki, Len Brown, Pavel Machek,
	David Woodhouse, Sandipan Das,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM, open list:SUSPEND TO RAM,
	open list:ACPI

On Tue, Oct 31 2023 at 13:53, Mario Limonciello wrote:
> Thank you for looking this over.  I've reviewed it with the internal 
> team and confirmed there was a BIOS bug where the MSR wasn't restored 
> after the S3 cycle completed.  The BIOS team has fixed it.

Why am I not surprised?

Thanks for the heads up!

       tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-10-31 23:20 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-26 17:03 [PATCH v2 0/2] Fixes for s3 with parallel bootup Mario Limonciello
2023-10-26 17:03 ` [PATCH v2 1/2] x86: Enable x2apic during resume from suspend if used previously Mario Limonciello
2023-10-26 17:15   ` Rafael J. Wysocki
2023-10-27 21:31   ` Thomas Gleixner
2023-10-31 18:53     ` Mario Limonciello
2023-10-31 23:20       ` Thomas Gleixner
2023-10-26 17:03 ` [PATCH v2 2/2] perf/x86/amd: Stop calling amd_pmu_cpu_reset() from amd_pmu_cpu_dead() Mario Limonciello
2023-10-27 21:47   ` Thomas Gleixner
2023-10-27 19:24 ` [PATCH v2 0/2] Fixes for s3 with parallel bootup Thomas Gleixner
2023-10-27 19:29   ` Mario Limonciello
2023-10-27 21:50     ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.