All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] kernel/cpu: restart cpu_up when hotplug is disabled
@ 2022-04-18 19:54 Joel Savitz
  2022-04-19 12:34 ` David Hildenbrand
  0 siblings, 1 reply; 5+ messages in thread
From: Joel Savitz @ 2022-04-18 19:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Savitz, Thomas Gleixner, Valentin Schneider, Peter Zijlstra,
	Frederic Weisbecker, Mark Rutland, Yuan ZhaoXiong, Baokun Li,
	Jason A. Donenfeld, YueHaibing, Randy Dunlap, David Hildenbrand

The cpu hotplug path may be utilized while hotplug is disabled for a
brief moment leading to failures. As an example, attempts to perform
cpu hotplug by userspace soon after boot may race with pci_device_probe
leading to inconsistent results.

Proposed idea:
Call restart_syscall instead of returning -EBUSY since
cpu_hotplug_disabled seems to only have a positive value
for short, temporary amounts of time.

Does anyone see any serious problems with this?

Signed-off-by: Joel Savitz <jsavitz@redhat.com>
---
 kernel/cpu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 5797c2a7a93f..2992c7d1d24e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -35,6 +35,7 @@
 #include <linux/percpu-rwsem.h>
 #include <linux/cpuset.h>
 #include <linux/random.h>
+#include <linux/delay.h>
 
 #include <trace/events/power.h>
 #define CREATE_TRACE_POINTS
@@ -1401,7 +1402,9 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
 	cpu_maps_update_begin();
 
 	if (cpu_hotplug_disabled) {
-		err = -EBUSY;
+		/* avoid busy looping (5ms of sleep should be enough) */
+		msleep(5);
+		err = restart_syscall();
 		goto out;
 	}
 	if (!cpu_smt_allowed(cpu)) {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] kernel/cpu: restart cpu_up when hotplug is disabled
  2022-04-18 19:54 [RFC PATCH] kernel/cpu: restart cpu_up when hotplug is disabled Joel Savitz
@ 2022-04-19 12:34 ` David Hildenbrand
  2022-04-21 14:23   ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2022-04-19 12:34 UTC (permalink / raw)
  To: Joel Savitz, linux-kernel
  Cc: Thomas Gleixner, Valentin Schneider, Peter Zijlstra,
	Frederic Weisbecker, Mark Rutland, Yuan ZhaoXiong, Baokun Li,
	Jason A. Donenfeld, YueHaibing, Randy Dunlap, David Hildenbrand

On 18.04.22 21:54, Joel Savitz wrote:
> The cpu hotplug path may be utilized while hotplug is disabled for a
> brief moment leading to failures. As an example, attempts to perform
> cpu hotplug by userspace soon after boot may race with pci_device_probe
> leading to inconsistent results.

You might want to extend a bit in which situation we observed that issue
fairly reliably.

When restricting the number of boot cpus on the kernel cmdline, e.g.,
via "maxcpus=2", udev will find the offline cpus when enumerating all
cpus and try onlining them. Due to the race, onlining of some cpus fails
e.g., when racing with pci_device_probe().

While teaching udev to not online coldplugged CPUs when "maxcpus" was
specified ("policy"), it revealed the underlying issue that onlining a
CPU can fail with -EBUSY in corner cases when cpu hotplug is temporarily
disabled.

> 
> Proposed idea:
> Call restart_syscall instead of returning -EBUSY since
> cpu_hotplug_disabled seems to only have a positive value
> for short, temporary amounts of time.
> 
> Does anyone see any serious problems with this?
> 
> Signed-off-by: Joel Savitz <jsavitz@redhat.com>
> ---
>  kernel/cpu.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 5797c2a7a93f..2992c7d1d24e 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -35,6 +35,7 @@
>  #include <linux/percpu-rwsem.h>
>  #include <linux/cpuset.h>
>  #include <linux/random.h>
> +#include <linux/delay.h>
>  
>  #include <trace/events/power.h>
>  #define CREATE_TRACE_POINTS
> @@ -1401,7 +1402,9 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
>  	cpu_maps_update_begin();
>  
>  	if (cpu_hotplug_disabled) {
> -		err = -EBUSY;
> +		/* avoid busy looping (5ms of sleep should be enough) */
> +		msleep(5);
> +		err = restart_syscall();

It's worth noting that we use the same approach in
lock_device_hotplug_sysfs(). It's far from perfect I would say, but we
really wanted to avoid letting user space having to deal with retry logic.


For example, while memory onlining can fail with -EBUSY, it's not
expected to fail during memory onlining (we only fail in very rare
cases, when a memory notifier fails -- for example when kasan fails to
allocate memory).

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] kernel/cpu: restart cpu_up when hotplug is disabled
  2022-04-19 12:34 ` David Hildenbrand
@ 2022-04-21 14:23   ` Thomas Gleixner
  2022-04-21 14:31     ` David Hildenbrand
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2022-04-21 14:23 UTC (permalink / raw)
  To: David Hildenbrand, Joel Savitz, linux-kernel
  Cc: Valentin Schneider, Peter Zijlstra, Frederic Weisbecker,
	Mark Rutland, Yuan ZhaoXiong, Baokun Li, Jason A. Donenfeld,
	YueHaibing, Randy Dunlap, David Hildenbrand, Greg Kroah-Hartman,
	Rafael J. Wysocki

On Tue, Apr 19 2022 at 14:34, David Hildenbrand wrote:
> On 18.04.22 21:54, Joel Savitz wrote:
>> The cpu hotplug path may be utilized while hotplug is disabled for a
>> brief moment leading to failures. As an example, attempts to perform
>> cpu hotplug by userspace soon after boot may race with pci_device_probe
>> leading to inconsistent results.
>
> You might want to extend a bit in which situation we observed that issue
> fairly reliably.
>
> When restricting the number of boot cpus on the kernel cmdline, e.g.,
> via "maxcpus=2", udev will find the offline cpus when enumerating all
> cpus and try onlining them. Due to the race, onlining of some cpus fails
> e.g., when racing with pci_device_probe().

maxcpus is a horrible hack and broken vs. MCE broadcasting on x86.

> While teaching udev to not online coldplugged CPUs when "maxcpus" was
> specified ("policy"), it revealed the underlying issue that onlining a
> CPU can fail with -EBUSY in corner cases when cpu hotplug is temporarily
> disabled.

Right. It can fail with -EBUSY and because userspace fails to handle it
gracefully we need to hack around it?

>> Proposed idea:
>> Call restart_syscall instead of returning -EBUSY since
>> cpu_hotplug_disabled seems to only have a positive value
>> for short, temporary amounts of time.
>> 
>> Does anyone see any serious problems with this?

Yes. It's a horrible hack and wrong...
 
>>  	if (cpu_hotplug_disabled) {
>> -		err = -EBUSY;
>> +		/* avoid busy looping (5ms of sleep should be enough) */
>> +		msleep(5);
>> +		err = restart_syscall();

... as it sleeps with cpu_add_remove_lock held, which protects
cpu_hotplug_disabled. IOW, cpu_hotplug_enable() is blocked until
msleep() returns.

> It's worth noting that we use the same approach in
> lock_device_hotplug_sysfs().

That does not make it any better, really. 

> It's far from perfect I would say, but we really wanted to avoid
> letting user space having to deal with retry logic.

What's so hard with retry logic in user space? 

If you can come up with a reasonable argument why user space cannot be
fixed, then there is certainly a better solution than slapping a
msleep(5) at some random place into the code.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] kernel/cpu: restart cpu_up when hotplug is disabled
  2022-04-21 14:23   ` Thomas Gleixner
@ 2022-04-21 14:31     ` David Hildenbrand
  2022-04-21 14:34       ` David Hildenbrand
  0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2022-04-21 14:31 UTC (permalink / raw)
  To: Thomas Gleixner, Joel Savitz, linux-kernel
  Cc: Valentin Schneider, Peter Zijlstra, Frederic Weisbecker,
	Mark Rutland, Yuan ZhaoXiong, Baokun Li, Jason A. Donenfeld,
	YueHaibing, Randy Dunlap, David Hildenbrand, Greg Kroah-Hartman,
	Rafael J. Wysocki

>> It's far from perfect I would say, but we really wanted to avoid
>> letting user space having to deal with retry logic.
> 
> What's so hard with retry logic in user space? 
> 
> If you can come up with a reasonable argument why user space cannot be
> fixed, then there is certainly a better solution than slapping a
> msleep(5) at some random place into the code.
Most probably you're right and we should just retry in udev. Staring at
the history, it looks like the -EBUSY might have been returned forever,
so user space just never really triggered it on actual CPU hotplug
because it doesn't usually happen that cpu hotplug is disabled.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] kernel/cpu: restart cpu_up when hotplug is disabled
  2022-04-21 14:31     ` David Hildenbrand
@ 2022-04-21 14:34       ` David Hildenbrand
  0 siblings, 0 replies; 5+ messages in thread
From: David Hildenbrand @ 2022-04-21 14:34 UTC (permalink / raw)
  To: Thomas Gleixner, Joel Savitz, linux-kernel
  Cc: Valentin Schneider, Peter Zijlstra, Frederic Weisbecker,
	Mark Rutland, Yuan ZhaoXiong, Baokun Li, Jason A. Donenfeld,
	YueHaibing, Randy Dunlap, David Hildenbrand, Greg Kroah-Hartman,
	Rafael J. Wysocki

On 21.04.22 16:31, David Hildenbrand wrote:
>>> It's far from perfect I would say, but we really wanted to avoid
>>> letting user space having to deal with retry logic.
>>
>> What's so hard with retry logic in user space? 
>>
>> If you can come up with a reasonable argument why user space cannot be
>> fixed, then there is certainly a better solution than slapping a
>> msleep(5) at some random place into the code.
> Most probably you're right and we should just retry in udev. Staring at
> the history, it looks like the -EBUSY might have been returned forever,
> so user space just never really triggered it on actual CPU hotplug
> because it doesn't usually happen that cpu hotplug is disabled.

The last part was confusing: "on actual CPU hotplug because it doesn't
usually happen that cpu hotplug is disabled when onlining a CPU that was
just hotplugged."

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-04-21 14:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-18 19:54 [RFC PATCH] kernel/cpu: restart cpu_up when hotplug is disabled Joel Savitz
2022-04-19 12:34 ` David Hildenbrand
2022-04-21 14:23   ` Thomas Gleixner
2022-04-21 14:31     ` David Hildenbrand
2022-04-21 14:34       ` David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.