linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86/smp: Validate APIC ID before parking CPU in INIT
@ 2023-07-19  5:13 Vasant Hegde
  2023-08-03 16:28 ` Vasant Hegde
  2023-08-09 18:42 ` Thomas Gleixner
  0 siblings, 2 replies; 7+ messages in thread
From: Vasant Hegde @ 2023-07-19  5:13 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: x86, dave.hansen, bp, mingo, Vasant Hegde, Dheeraj Kumar Srivastava

The commit below causes kexec to hang in certain scenarios on systems
with more than 255 CPUs.

Steps to reproduce:
  - Use a 2-socket system with 384 CPUs.
  - Boot the first kernel with intremap=off on the kernel command line.
    This disables x2APIC, so the kernel boots in xAPIC mode.
  - During kexec, the kernel tries to send INIT to all CPUs except the
    boot CPU. If a CPU has APIC ID 0x100 (as in our case), the INIT is
    delivered to CPU0 instead and the system hangs (in xAPIC mode the
    DEST field is only 8 bits wide).

Fix this by adding an apic->apic_id_valid() check before sending the
INIT sequence.

Fixes: 45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible")
Reported-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
---
 arch/x86/kernel/smpboot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index e1aa2cd7734b..e5ca0689c4dd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1360,7 +1360,7 @@ bool smp_park_other_cpus_in_init(void)
 		if (cpu == this_cpu)
 			continue;
 		apicid = apic->cpu_present_to_apicid(cpu);
-		if (apicid == BAD_APICID)
+		if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
 			continue;
 		send_init_sequence(apicid);
 	}
-- 
2.31.1
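
For illustration only, the truncation described in the changelog can be
modelled in userspace C. This is a minimal sketch, assuming an 8-bit
destination field as stated above; XAPIC_DEST_BITS, xapic_effective_dest()
and xapic_dest_fits() are hypothetical names, not kernel symbols, and the
code does not reflect how apic->apic_id_valid() is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the xAPIC destination field is 8 bits wide (per the changelog). */
#define XAPIC_DEST_BITS 8u

/* Hypothetical helper: the APIC ID that an 8-bit DEST field would address. */
static uint32_t xapic_effective_dest(uint32_t apicid)
{
	return apicid & ((1u << XAPIC_DEST_BITS) - 1u);
}

/* Hypothetical helper: non-zero if the ID survives truncation unchanged. */
static int xapic_dest_fits(uint32_t apicid)
{
	return xapic_effective_dest(apicid) == apicid;
}
```

With this model, APIC ID 0x100 maps to destination 0x0, which is why the
INIT meant for a high-numbered CPU would land on the boot CPU.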



* Re: [PATCH] x86/smp: Validate APIC ID before parking CPU in INIT
  2023-07-19  5:13 [PATCH] x86/smp: Validate APIC ID before parking CPU in INIT Vasant Hegde
@ 2023-08-03 16:28 ` Vasant Hegde
  2023-08-09 18:42 ` Thomas Gleixner
  1 sibling, 0 replies; 7+ messages in thread
From: Vasant Hegde @ 2023-08-03 16:28 UTC (permalink / raw)
  To: tglx, linux-kernel; +Cc: x86, dave.hansen, bp, mingo, Dheeraj Kumar Srivastava

Hi

Did you get a chance to look into this patch?


-Vasant

On 7/19/2023 10:43 AM, Vasant Hegde wrote:
> Below commit is causing kexec to hang in certain scenarios with >255 CPUs.
> 
> Reproduce steps:
>   - We are using 2 socket system with 384 CPUs
>   - Booting first kernel with kernel command line intremap=off
>     This disabled x2apic in kernel and booted with apic mode
>   - During kexec it tries to send INIT to all CPUs except boot CPU
>     If APIC ID is 0x100 (like in our case) then it will send CPU0
>     to INIT mode and system hangs (in APIC mode DEST field is 8bit)
> 
> Fix this issue by adding apic->apic_id_valid() check before sending
> INIT sequence.
> 
> Fixes: 45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible")
> Reported-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
>  arch/x86/kernel/smpboot.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index e1aa2cd7734b..e5ca0689c4dd 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1360,7 +1360,7 @@ bool smp_park_other_cpus_in_init(void)
>  		if (cpu == this_cpu)
>  			continue;
>  		apicid = apic->cpu_present_to_apicid(cpu);
> -		if (apicid == BAD_APICID)
> +		if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
>  			continue;
>  		send_init_sequence(apicid);
>  	}


* Re: [PATCH] x86/smp: Validate APIC ID before parking CPU in INIT
  2023-07-19  5:13 [PATCH] x86/smp: Validate APIC ID before parking CPU in INIT Vasant Hegde
  2023-08-03 16:28 ` Vasant Hegde
@ 2023-08-09 18:42 ` Thomas Gleixner
  2023-08-09 18:52   ` Thomas Gleixner
  1 sibling, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2023-08-09 18:42 UTC (permalink / raw)
  To: Vasant Hegde, linux-kernel
  Cc: x86, dave.hansen, bp, mingo, Vasant Hegde, Dheeraj Kumar Srivastava

On Wed, Jul 19 2023 at 05:13, Vasant Hegde wrote:
> Below commit is causing kexec to hang in certain scenarios with >255 CPUs.
>
> Reproduce steps:
>   - We are using 2 socket system with 384 CPUs
>   - Booting first kernel with kernel command line intremap=off
>     This disabled x2apic in kernel and booted with apic mode
>   - During kexec it tries to send INIT to all CPUs except boot CPU
>     If APIC ID is 0x100 (like in our case) then it will send CPU0
>     to INIT mode and system hangs (in APIC mode DEST field is 8bit)

It took me a while to decode the above.

> Fix this issue by adding apic->apic_id_valid() check before sending
> INIT sequence.

Sigh, yes.

> Fixes: 45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible")
> Reported-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
>  arch/x86/kernel/smpboot.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index e1aa2cd7734b..e5ca0689c4dd 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1360,7 +1360,7 @@ bool smp_park_other_cpus_in_init(void)
>  		if (cpu == this_cpu)
>  			continue;
>  		apicid = apic->cpu_present_to_apicid(cpu);
> -		if (apicid == BAD_APICID)
> +		if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
>  			continue;
>  		send_init_sequence(apicid);
>  	}


* Re: [PATCH] x86/smp: Validate APIC ID before parking CPU in INIT
  2023-08-09 18:42 ` Thomas Gleixner
@ 2023-08-09 18:52   ` Thomas Gleixner
  2023-08-10 11:26     ` Vasant Hegde
  2023-09-04 13:48     ` [tip: x86/urgent] x86/smp: Don't send INIT to non-present and non-booted CPUs tip-bot2 for Thomas Gleixner
  0 siblings, 2 replies; 7+ messages in thread
From: Thomas Gleixner @ 2023-08-09 18:52 UTC (permalink / raw)
  To: Vasant Hegde, linux-kernel
  Cc: x86, dave.hansen, bp, mingo, Vasant Hegde, Dheeraj Kumar Srivastava

On Wed, Aug 09 2023 at 20:42, Thomas Gleixner wrote:
> On Wed, Jul 19 2023 at 05:13, Vasant Hegde wrote:
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index e1aa2cd7734b..e5ca0689c4dd 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -1360,7 +1360,7 @@ bool smp_park_other_cpus_in_init(void)
>>  		if (cpu == this_cpu)
>>  			continue;
>>  		apicid = apic->cpu_present_to_apicid(cpu);
>> -		if (apicid == BAD_APICID)
>> +		if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
>>  			continue;
>>  		send_init_sequence(apicid);
>>  	}

I think this papers over the underlying problem that this sends INIT to
an APIC which was never booted. The below is curing the root cause.

Thanks,

        tglx
---
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1356,7 +1356,7 @@ bool smp_park_other_cpus_in_init(void)
 	if (this_cpu)
 		return false;
 
-	for_each_present_cpu(cpu) {
+	for_each_cpu_and(cpu, &cpus_booted_once_mask, cpu_present_mask) {
 		if (cpu == this_cpu)
 			continue;
 		apicid = apic->cpu_present_to_apicid(cpu);
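
For illustration only, the effect of iterating over the intersection of
two masks can be modelled in userspace C. This is a rough sketch, assuming
toy 64-bit masks in place of the kernel's cpumask type; toy_cpumask_t and
toy_count_init_targets() are hypothetical names and are not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a toy 64-CPU bitmask standing in for a kernel cpumask. */
typedef uint64_t toy_cpumask_t;

/*
 * Hypothetical helper: count the CPUs that would receive INIT, i.e. those
 * that are both present and booted at least once, excluding the caller.
 */
static unsigned int toy_count_init_targets(toy_cpumask_t present,
					   toy_cpumask_t booted_once,
					   unsigned int this_cpu)
{
	toy_cpumask_t targets = present & booted_once;
	unsigned int cpu, n = 0;

	for (cpu = 0; cpu < 64; cpu++) {
		if (!(targets & (UINT64_C(1) << cpu)))
			continue;	/* not present, or never booted */
		if (cpu == this_cpu)
			continue;	/* never park the calling CPU */
		n++;
	}
	return n;
}
```

In this model a CPU that is present but was never booted is simply not
visited, so no INIT would be sent to it regardless of its APIC ID.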


* Re: [PATCH] x86/smp: Validate APIC ID before parking CPU in INIT
  2023-08-09 18:52   ` Thomas Gleixner
@ 2023-08-10 11:26     ` Vasant Hegde
  2023-09-04  8:27       ` Vasant Hegde
  2023-09-04 13:48     ` [tip: x86/urgent] x86/smp: Don't send INIT to non-present and non-booted CPUs tip-bot2 for Thomas Gleixner
  1 sibling, 1 reply; 7+ messages in thread
From: Vasant Hegde @ 2023-08-10 11:26 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel
  Cc: x86, dave.hansen, bp, mingo, Dheeraj Kumar Srivastava

Hi,


On 8/10/2023 12:22 AM, Thomas Gleixner wrote:
> On Wed, Aug 09 2023 at 20:42, Thomas Gleixner wrote:
>> On Wed, Jul 19 2023 at 05:13, Vasant Hegde wrote:
>>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>>> index e1aa2cd7734b..e5ca0689c4dd 100644
>>> --- a/arch/x86/kernel/smpboot.c
>>> +++ b/arch/x86/kernel/smpboot.c
>>> @@ -1360,7 +1360,7 @@ bool smp_park_other_cpus_in_init(void)
>>>  		if (cpu == this_cpu)
>>>  			continue;
>>>  		apicid = apic->cpu_present_to_apicid(cpu);
>>> -		if (apicid == BAD_APICID)
>>> +		if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
>>>  			continue;
>>>  		send_init_sequence(apicid);
>>>  	}
> 
> I think this papers over the underlying problem that this sends INIT to
> an APIC which was never booted. The below is curing the root cause.

I have tested the patch below and it fixes the issue. Thanks.

Tested-by: Vasant Hegde <vasant.hegde@amd.com>

-Vasant

> 
> Thanks,
> 
>         tglx
> ---
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1356,7 +1356,7 @@ bool smp_park_other_cpus_in_init(void)
>  	if (this_cpu)
>  		return false;
>  
> -	for_each_present_cpu(cpu) {
> +	for_each_cpu_and(cpu, &cpus_booted_once_mask, cpu_present_mask) {
>  		if (cpu == this_cpu)
>  			continue;
>  		apicid = apic->cpu_present_to_apicid(cpu);


* Re: [PATCH] x86/smp: Validate APIC ID before parking CPU in INIT
  2023-08-10 11:26     ` Vasant Hegde
@ 2023-09-04  8:27       ` Vasant Hegde
  0 siblings, 0 replies; 7+ messages in thread
From: Vasant Hegde @ 2023-09-04  8:27 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel
  Cc: x86, dave.hansen, bp, mingo, Dheeraj Kumar Srivastava

Hi Thomas,


On 8/10/2023 4:56 PM, Vasant Hegde wrote:
> Hi,
> 
> 
> On 8/10/2023 12:22 AM, Thomas Gleixner wrote:
>> On Wed, Aug 09 2023 at 20:42, Thomas Gleixner wrote:
>>> On Wed, Jul 19 2023 at 05:13, Vasant Hegde wrote:
>>>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>>>> index e1aa2cd7734b..e5ca0689c4dd 100644
>>>> --- a/arch/x86/kernel/smpboot.c
>>>> +++ b/arch/x86/kernel/smpboot.c
>>>> @@ -1360,7 +1360,7 @@ bool smp_park_other_cpus_in_init(void)
>>>>  		if (cpu == this_cpu)
>>>>  			continue;
>>>>  		apicid = apic->cpu_present_to_apicid(cpu);
>>>> -		if (apicid == BAD_APICID)
>>>> +		if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
>>>>  			continue;
>>>>  		send_init_sequence(apicid);
>>>>  	}
>>
>> I think this papers over the underlying problem that this sends INIT to
>> an APIC which was never booted. The below is curing the root cause.
> 
> I have tested below patch and it fixes the issue. Thanks
> 
> Tested-by: Vasant Hegde <vasant.hegde@amd.com>

Ping.

Will you be posting/picking up the patch below, or do you want me to resend it?

-Vasant

> 
> -Vasant
> 
>>
>> Thanks,
>>
>>         tglx
>> ---
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -1356,7 +1356,7 @@ bool smp_park_other_cpus_in_init(void)
>>  	if (this_cpu)
>>  		return false;
>>  
>> -	for_each_present_cpu(cpu) {
>> +	for_each_cpu_and(cpu, &cpus_booted_once_mask, cpu_present_mask) {
>>  		if (cpu == this_cpu)
>>  			continue;
>>  		apicid = apic->cpu_present_to_apicid(cpu);


* [tip: x86/urgent] x86/smp: Don't send INIT to non-present and non-booted CPUs
  2023-08-09 18:52   ` Thomas Gleixner
  2023-08-10 11:26     ` Vasant Hegde
@ 2023-09-04 13:48     ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 7+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2023-09-04 13:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Dheeraj Kumar Srivastava, Thomas Gleixner, Vasant Hegde, x86,
	linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     3f874c9b2aae8e30463efc1872bea4baa9ed25dc
Gitweb:        https://git.kernel.org/tip/3f874c9b2aae8e30463efc1872bea4baa9ed25dc
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Wed, 09 Aug 2023 20:52:20 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 04 Sep 2023 15:41:42 +02:00

x86/smp: Don't send INIT to non-present and non-booted CPUs

Vasant reported that kexec() can hang or reset the machine when it tries to
park CPUs via INIT. This happens when the kernel is using extended APIC,
but the present mask has APIC IDs >= 0x100 enumerated.

As extended APIC can only handle 8 bits of APIC ID, sending INIT to APIC ID
0x100 sends INIT to APIC ID 0x0. That's the boot CPU, which is special on
x86, and INIT causes the system to hang or reset the machine.

Prevent this by sending INIT only to those CPUs which have been booted
once.

Fixes: 45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible")
Reported-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/87cyzwjbff.ffs@tglx
---
 arch/x86/kernel/smpboot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d7667a2..4e45ff4 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1250,7 +1250,7 @@ bool smp_park_other_cpus_in_init(void)
 	if (this_cpu)
 		return false;
 
-	for_each_present_cpu(cpu) {
+	for_each_cpu_and(cpu, &cpus_booted_once_mask, cpu_present_mask) {
 		if (cpu == this_cpu)
 			continue;
 		apicid = apic->cpu_present_to_apicid(cpu);

