linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] cpuhp: fix some st->target issues
@ 2022-05-26 16:06 Phil Auld
  2022-05-26 16:06 ` [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state Phil Auld
  2022-05-26 16:06 ` [PATCH v2 2/2] cpuhp: Set cpuhp target for boot cpu Phil Auld
  0 siblings, 2 replies; 9+ messages in thread
From: Phil Auld @ 2022-05-26 16:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner, Peter Zijlstra, Valentin Schneider

Several small fixes that clean up some cpuhp inconsistencies.
The first prevents target_store() from calling cpu_down() when
target == state which prevents the cpu being incorrectly marked
as dying.  The second just makes the boot cpu have a valid cpuhp 
target rather than 0 (CPU_OFFLINE) while being in state 
CPU_ONLINE.

A further issue which these two patches don't address is that
the cpuX/online file looks at the device->offline state and can
thus get out of sync with the actual cpuhp state if the cpuhp
target is used to change state.


Cheers,
Phil


Phil Auld (2):
  cpuhp: make target_store() a nop when target == state
  cpuhp: Set cpuhp target for boot cpu

 kernel/cpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.18.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state
  2022-05-26 16:06 [PATCH v2 0/2] cpuhp: fix some st->target issues Phil Auld
@ 2022-05-26 16:06 ` Phil Auld
  2022-05-27  9:38   ` Valentin Schneider
  2022-05-26 16:06 ` [PATCH v2 2/2] cpuhp: Set cpuhp target for boot cpu Phil Auld
  1 sibling, 1 reply; 9+ messages in thread
From: Phil Auld @ 2022-05-26 16:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner, Peter Zijlstra, Valentin Schneider

writing the current state back in hotplug/target calls cpu_down()
which will set cpu dying even when it isn't and then nothing will
ever clear it. A stress test that reads values and writes them back
for all cpu device files in sysfs will trigger the BUG() in
select_fallback_rq once all cpus are marked as dying.

kernel/cpu.c::target_store()
	...
        if (st->state < target)
                ret = cpu_up(dev->id, target);
        else
                ret = cpu_down(dev->id, target);

cpu_down() -> cpu_set_state()
	 bool bringup = st->state < target;
	 ...
	 if (cpu_dying(cpu) != !bringup)
		set_cpu_dying(cpu, !bringup);

Fix this by letting state==target fall through in the target_store()
conditional.

Signed-off-by: Phil Auld <pauld@redhat.com>
---
 kernel/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index d0a9aa0b42e8..cdb6ac10ad94 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2315,7 +2315,7 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr,
 
 	if (st->state < target)
 		ret = cpu_up(dev->id, target);
-	else
+	else if (st->state > target)
 		ret = cpu_down(dev->id, target);
 out:
 	unlock_device_hotplug();
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/2] cpuhp: Set cpuhp target for boot cpu
  2022-05-26 16:06 [PATCH v2 0/2] cpuhp: fix some st->target issues Phil Auld
  2022-05-26 16:06 ` [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state Phil Auld
@ 2022-05-26 16:06 ` Phil Auld
  2022-05-27  9:38   ` Valentin Schneider
  1 sibling, 1 reply; 9+ messages in thread
From: Phil Auld @ 2022-05-26 16:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner, Peter Zijlstra, Valentin Schneider

Since the boot cpu does not go through the hotplug process it ends
up with state == CPUHP_ONLINE but target == CPUHP_OFFLINE.
Set the target to match in boot_cpu_hotplug_init().

Signed-off-by: Phil Auld <pauld@redhat.com>
---
 kernel/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index cdb6ac10ad94..0bb8ad0fa2d9 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2677,6 +2677,7 @@ void __init boot_cpu_hotplug_init(void)
 	cpumask_set_cpu(smp_processor_id(), &cpus_booted_once_mask);
 #endif
 	this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
+	this_cpu_write(cpuhp_state.target, CPUHP_ONLINE);
 }
 
 /*
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state
  2022-05-26 16:06 ` [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state Phil Auld
@ 2022-05-27  9:38   ` Valentin Schneider
  2022-05-27 13:22     ` Phil Auld
  0 siblings, 1 reply; 9+ messages in thread
From: Valentin Schneider @ 2022-05-27  9:38 UTC (permalink / raw)
  To: Phil Auld, linux-kernel; +Cc: Thomas Gleixner, Peter Zijlstra

On 26/05/22 12:06, Phil Auld wrote:
> writing the current state back in hotplug/target calls cpu_down()
> which will set cpu dying even when it isn't and then nothing will
> ever clear it. A stress test that reads values and writes them back
> for all cpu device files in sysfs will trigger the BUG() in
> select_fallback_rq once all cpus are marked as dying.
>
> kernel/cpu.c::target_store()
> 	...
>         if (st->state < target)
>                 ret = cpu_up(dev->id, target);
>         else
>                 ret = cpu_down(dev->id, target);
>
> cpu_down() -> cpu_set_state()
> 	 bool bringup = st->state < target;
> 	 ...
> 	 if (cpu_dying(cpu) != !bringup)
> 		set_cpu_dying(cpu, !bringup);
>
> Fix this by letting state==target fall through in the target_store()
> conditional.
>

To go back on my data race paranoia: writes to both cpu$x/online and
cpu$x/hotplug/target are serialized by device_hotplug_lock, and so are the
exported kernel hotplug functions ({add, remove}_cpu()).

That's not cpu_add_remove_lock as I was looking for, but that's still all
under one lock, so I think we're good. Sorry for that!

> Signed-off-by: Phil Auld <pauld@redhat.com>

Reviewed-by: Valentin Schneider <vschneid@redhat.com>

> ---
>  kernel/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index d0a9aa0b42e8..cdb6ac10ad94 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2315,7 +2315,7 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr,
>  
>  	if (st->state < target)
>  		ret = cpu_up(dev->id, target);
> -	else
> +	else if (st->state > target)
>  		ret = cpu_down(dev->id, target);
>  out:
>  	unlock_device_hotplug();
> -- 
> 2.18.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] cpuhp: Set cpuhp target for boot cpu
  2022-05-26 16:06 ` [PATCH v2 2/2] cpuhp: Set cpuhp target for boot cpu Phil Auld
@ 2022-05-27  9:38   ` Valentin Schneider
  0 siblings, 0 replies; 9+ messages in thread
From: Valentin Schneider @ 2022-05-27  9:38 UTC (permalink / raw)
  To: Phil Auld, linux-kernel; +Cc: Thomas Gleixner, Peter Zijlstra

On 26/05/22 12:06, Phil Auld wrote:
> Since the boot cpu does not go through the hotplug process it ends
> up with state == CPUHP_ONLINE but target == CPUHP_OFFLINE.
> Set the target to match in boot_cpu_hotplug_init().
>
> Signed-off-by: Phil Auld <pauld@redhat.com>

Reviewed-by: Valentin Schneider <vschneid@redhat.com>

> ---
>  kernel/cpu.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index cdb6ac10ad94..0bb8ad0fa2d9 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2677,6 +2677,7 @@ void __init boot_cpu_hotplug_init(void)
>  	cpumask_set_cpu(smp_processor_id(), &cpus_booted_once_mask);
>  #endif
>  	this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
> +	this_cpu_write(cpuhp_state.target, CPUHP_ONLINE);
>  }
>  
>  /*
> -- 
> 2.18.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state
  2022-05-27  9:38   ` Valentin Schneider
@ 2022-05-27 13:22     ` Phil Auld
  2022-05-30 12:27       ` Valentin Schneider
  0 siblings, 1 reply; 9+ messages in thread
From: Phil Auld @ 2022-05-27 13:22 UTC (permalink / raw)
  To: Valentin Schneider; +Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra

On Fri, May 27, 2022 at 10:38:24AM +0100 Valentin Schneider wrote:
> On 26/05/22 12:06, Phil Auld wrote:
> > writing the current state back in hotplug/target calls cpu_down()
> > which will set cpu dying even when it isn't and then nothing will
> > ever clear it. A stress test that reads values and writes them back
> > for all cpu device files in sysfs will trigger the BUG() in
> > select_fallback_rq once all cpus are marked as dying.
> >
> > kernel/cpu.c::target_store()
> > 	...
> >         if (st->state < target)
> >                 ret = cpu_up(dev->id, target);
> >         else
> >                 ret = cpu_down(dev->id, target);
> >
> > cpu_down() -> cpu_set_state()
> > 	 bool bringup = st->state < target;
> > 	 ...
> > 	 if (cpu_dying(cpu) != !bringup)
> > 		set_cpu_dying(cpu, !bringup);
> >
> > Fix this by letting state==target fall through in the target_store()
> > conditional.
> >
> 
> To go back on my data race paranoia: writes to both cpu$x/online and
> cpu$x/hotplug/target are serialized by device_hotplug_lock, and so are the
> exported kernel hotplug functions ({add, remove}_cpu()).
> 
> That's not cpu_add_remove_lock as I was looking for, but that's still all
> under one lock, so I think we're good. Sorry for that!
> 

Right. This catches it up higher so that we don't get into the code that
starts actually changing things.  I wonder now in the state == target case
if we should make sure st->target == target.  With the second patch it's 
less likely to be needed. Thoughts?

Maybe I'll include that if/when I have code to keep cpux/online in sync
with st->state and cpu_online_mask.

> > Signed-off-by: Phil Auld <pauld@redhat.com>
> 
> Reviewed-by: Valentin Schneider <vschneid@redhat.com>

Thanks!


Cheers,
Phil


> 
> > ---
> >  kernel/cpu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index d0a9aa0b42e8..cdb6ac10ad94 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -2315,7 +2315,7 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr,
> >  
> >  	if (st->state < target)
> >  		ret = cpu_up(dev->id, target);
> > -	else
> > +	else if (st->state > target)
> >  		ret = cpu_down(dev->id, target);
> >  out:
> >  	unlock_device_hotplug();
> > -- 
> > 2.18.0
> 

-- 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state
  2022-05-27 13:22     ` Phil Auld
@ 2022-05-30 12:27       ` Valentin Schneider
  2022-06-01 15:49         ` Phil Auld
  0 siblings, 1 reply; 9+ messages in thread
From: Valentin Schneider @ 2022-05-30 12:27 UTC (permalink / raw)
  To: Phil Auld; +Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra

On 27/05/22 09:22, Phil Auld wrote:
> On Fri, May 27, 2022 at 10:38:24AM +0100 Valentin Schneider wrote:
>> On 26/05/22 12:06, Phil Auld wrote:
>> > writing the current state back in hotplug/target calls cpu_down()
>> > which will set cpu dying even when it isn't and then nothing will
>> > ever clear it. A stress test that reads values and writes them back
>> > for all cpu device files in sysfs will trigger the BUG() in
>> > select_fallback_rq once all cpus are marked as dying.
>> >
>> > kernel/cpu.c::target_store()
>> >    ...
>> >         if (st->state < target)
>> >                 ret = cpu_up(dev->id, target);
>> >         else
>> >                 ret = cpu_down(dev->id, target);
>> >
>> > cpu_down() -> cpu_set_state()
>> >     bool bringup = st->state < target;
>> >     ...
>> >     if (cpu_dying(cpu) != !bringup)
>> >            set_cpu_dying(cpu, !bringup);
>> >
>> > Fix this by letting state==target fall through in the target_store()
>> > conditional.
>> >
>>
>> To go back on my data race paranoia: writes to both cpu$x/online and
>> cpu$x/hotplug/target are serialized by device_hotplug_lock, and so are the
>> exported kernel hotplug functions ({add, remove}_cpu()).
>>
>> That's not cpu_add_remove_lock as I was looking for, but that's still all
>> under one lock, so I think we're good. Sorry for that!
>>
>
> Right. This catches it up higher so that we don't get into the code that
> starts actually changing things.  I wonder now in the state == target case
> if we should make sure st->target == target.  With the second patch it's
> less likely to be needed. Thoughts?
>

Yeah, you could append a simple:

        else
                WARN_ON(st->state != target);


> Maybe I'll include that if/when I have code to keep cpux/online in sync
> with st->state and cpu_online_mask.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state
  2022-05-30 12:27       ` Valentin Schneider
@ 2022-06-01 15:49         ` Phil Auld
  2022-06-01 16:39           ` Valentin Schneider
  0 siblings, 1 reply; 9+ messages in thread
From: Phil Auld @ 2022-06-01 15:49 UTC (permalink / raw)
  To: Valentin Schneider; +Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra

On Mon, May 30, 2022 at 01:27:00PM +0100 Valentin Schneider wrote:
> On 27/05/22 09:22, Phil Auld wrote:
> > On Fri, May 27, 2022 at 10:38:24AM +0100 Valentin Schneider wrote:
> >> On 26/05/22 12:06, Phil Auld wrote:
> >> > writing the current state back in hotplug/target calls cpu_down()
> >> > which will set cpu dying even when it isn't and then nothing will
> >> > ever clear it. A stress test that reads values and writes them back
> >> > for all cpu device files in sysfs will trigger the BUG() in
> >> > select_fallback_rq once all cpus are marked as dying.
> >> >
> >> > kernel/cpu.c::target_store()
> >> >    ...
> >> >         if (st->state < target)
> >> >                 ret = cpu_up(dev->id, target);
> >> >         else
> >> >                 ret = cpu_down(dev->id, target);
> >> >
> >> > cpu_down() -> cpu_set_state()
> >> >     bool bringup = st->state < target;
> >> >     ...
> >> >     if (cpu_dying(cpu) != !bringup)
> >> >            set_cpu_dying(cpu, !bringup);
> >> >
> >> > Fix this by letting state==target fall through in the target_store()
> >> > conditional.
> >> >
> >>
> >> To go back on my data race paranoia: writes to both cpu$x/online and
> >> cpu$x/hotplug/target are serialized by device_hotplug_lock, and so are the
> >> exported kernel hotplug functions ({add, remove}_cpu()).
> >>
> >> That's not cpu_add_remove_lock as I was looking for, but that's still all
> >> under one lock, so I think we're good. Sorry for that!
> >>
> >
> > Right. This catches it up higher so that we don't get into the code that
> > starts actually changing things.  I wonder now in the state == target case
> > if we should make sure st->target == target.  With the second patch it's
> > less likely to be needed. Thoughts?
> >
> 
> Yeah, you could append a simple:
> 
>         else
>                 WARN_ON(st->state != target);

I was thinking more like:

      	 else 
                   if (st->target != target) st->target = target;

Since this is a write to the target field and we are not
doing one of the operations that will set target because 
state == target we should make sure target == target. Although
that could have its own issues, I suppose. But as I said
fixing the boot cpu should make it much less likely that
st->target != st->state once we have the hotplug lock. 
 
I don't see how that WARN would ever fire. We're under the lock
and nothing is re-reading the value of st->state anyway. Looks more
like a compiler sanity check :)


Cheers,
Phil


> > Maybe I'll include that if/when I have code to keep cpux/online in sync
> > with st->state and cpu_online_mask.
> 

-- 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state
  2022-06-01 15:49         ` Phil Auld
@ 2022-06-01 16:39           ` Valentin Schneider
  0 siblings, 0 replies; 9+ messages in thread
From: Valentin Schneider @ 2022-06-01 16:39 UTC (permalink / raw)
  To: Phil Auld; +Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra

On 01/06/22 11:49, Phil Auld wrote:
> On Mon, May 30, 2022 at 01:27:00PM +0100 Valentin Schneider wrote:
>> 
>> Yeah, you could append a simple:
>> 
>>         else
>>                 WARN_ON(st->state != target);
>
> I was thinking more like:
>
>       	 else 
>                    if (st->target != target) st->target = target;
>
> Since this is a write to the target field and we are not
> doing one of the operations that will set target because 
> state == target we should make sure target == target. Although
> that could have its own issues, I suppose. But as I said
> fixing the boot cpu should make it much less likely that
> st->target != st->state once we have the hotplug lock. 
>  
> I don't see how that WARN would ever fire. We're under the lock
> and nothing is re-reading the value of st->state anyway. Looks more
> like a compiler sanity check :)
>

Right, having a warning in there would mostly be to catch
unexpected/unintended scenarios like the one you've found for the boot
CPU. I'd vote for

WARN("Huh, didn't expect that")
+
fix st->target

>
> Cheers,
> Phil
>
>
>> > Maybe I'll include that if/when I have code to keep cpux/online in sync
>> > with st->state and cpu_online_mask.
>> 
>
> -- 


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-06-01 16:39 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-26 16:06 [PATCH v2 0/2] cpuhp: fix some st->target issues Phil Auld
2022-05-26 16:06 ` [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state Phil Auld
2022-05-27  9:38   ` Valentin Schneider
2022-05-27 13:22     ` Phil Auld
2022-05-30 12:27       ` Valentin Schneider
2022-06-01 15:49         ` Phil Auld
2022-06-01 16:39           ` Valentin Schneider
2022-05-26 16:06 ` [PATCH v2 2/2] cpuhp: Set cpuhp target for boot cpu Phil Auld
2022-05-27  9:38   ` Valentin Schneider

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).