* [PATCH v2 0/2] cpuhp: fix some st->target issues @ 2022-05-26 16:06 Phil Auld 2022-05-26 16:06 ` [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state Phil Auld 2022-05-26 16:06 ` [PATCH v2 2/2] cpuhp: Set cpuhp target for boot cpu Phil Auld 0 siblings, 2 replies; 9+ messages in thread From: Phil Auld @ 2022-05-26 16:06 UTC (permalink / raw) To: linux-kernel; +Cc: Thomas Gleixner, Peter Zijlstra, Valentin Schneider Several small fixes that clean up some cpuhp inconsistencies. The first prevents target_store() from calling cpu_down() when target == state which prevents the cpu being incorrectly marked as dying. The second just makes the boot cpu have a valid cpuhp target rather than 0 (CPU_OFFLINE) while being in state CPU_ONLINE. A further issue which these two patches don't address is that the cpuX/online file looks at the device->offline state and can thus get out of sync with the actual cpuhp state if the cpuhp target is used to change state. Cheers, Phil Phil Auld (2): cpuhp: make target_store() a nop when target == state cpuhp: Set cpuhp target for boot cpu kernel/cpu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- 2.18.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state 2022-05-26 16:06 [PATCH v2 0/2] cpuhp: fix some st->target issues Phil Auld @ 2022-05-26 16:06 ` Phil Auld 2022-05-27 9:38 ` Valentin Schneider 2022-05-26 16:06 ` [PATCH v2 2/2] cpuhp: Set cpuhp target for boot cpu Phil Auld 1 sibling, 1 reply; 9+ messages in thread From: Phil Auld @ 2022-05-26 16:06 UTC (permalink / raw) To: linux-kernel; +Cc: Thomas Gleixner, Peter Zijlstra, Valentin Schneider writing the current state back in hotplug/target calls cpu_down() which will set cpu dying even when it isn't and then nothing will ever clear it. A stress test that reads values and writes them back for all cpu device files in sysfs will trigger the BUG() in select_fallback_rq once all cpus are marked as dying. kernel/cpu.c::target_store() ... if (st->state < target) ret = cpu_up(dev->id, target); else ret = cpu_down(dev->id, target); cpu_down() -> cpu_set_state() bool bringup = st->state < target; ... if (cpu_dying(cpu) != !bringup) set_cpu_dying(cpu, !bringup); Fix this by letting state==target fall through in the target_store() conditional. Signed-off-by: Phil Auld <pauld@redhat.com> --- kernel/cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index d0a9aa0b42e8..cdb6ac10ad94 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2315,7 +2315,7 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr, if (st->state < target) ret = cpu_up(dev->id, target); - else + else if (st->state > target) ret = cpu_down(dev->id, target); out: unlock_device_hotplug(); -- 2.18.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state 2022-05-26 16:06 ` [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state Phil Auld @ 2022-05-27 9:38 ` Valentin Schneider 2022-05-27 13:22 ` Phil Auld 0 siblings, 1 reply; 9+ messages in thread From: Valentin Schneider @ 2022-05-27 9:38 UTC (permalink / raw) To: Phil Auld, linux-kernel; +Cc: Thomas Gleixner, Peter Zijlstra On 26/05/22 12:06, Phil Auld wrote: > writing the current state back in hotplug/target calls cpu_down() > which will set cpu dying even when it isn't and then nothing will > ever clear it. A stress test that reads values and writes them back > for all cpu device files in sysfs will trigger the BUG() in > select_fallback_rq once all cpus are marked as dying. > > kernel/cpu.c::target_store() > ... > if (st->state < target) > ret = cpu_up(dev->id, target); > else > ret = cpu_down(dev->id, target); > > cpu_down() -> cpu_set_state() > bool bringup = st->state < target; > ... > if (cpu_dying(cpu) != !bringup) > set_cpu_dying(cpu, !bringup); > > Fix this by letting state==target fall through in the target_store() > conditional. > To go back on my data race paranoia: writes to both cpu$x/online and cpu$x/hotplug/target are serialized by device_hotplug_lock, and so are the exported kernel hotplug functions ({add, remove}_cpu()). That's not cpu_add_remove_lock as I was looking for, but that's still all under one lock, so I think we're good. Sorry for that! > Signed-off-by: Phil Auld <pauld@redhat.com> Reviewed-by: Valentin Schneider <vschneid@redhat.com> > --- > kernel/cpu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/cpu.c b/kernel/cpu.c > index d0a9aa0b42e8..cdb6ac10ad94 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -2315,7 +2315,7 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr, > > if (st->state < target) > ret = cpu_up(dev->id, target); > - else > + else if (st->state > target) > ret = cpu_down(dev->id, target); > out: > unlock_device_hotplug(); > -- > 2.18.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state 2022-05-27 9:38 ` Valentin Schneider @ 2022-05-27 13:22 ` Phil Auld 2022-05-30 12:27 ` Valentin Schneider 0 siblings, 1 reply; 9+ messages in thread From: Phil Auld @ 2022-05-27 13:22 UTC (permalink / raw) To: Valentin Schneider; +Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra On Fri, May 27, 2022 at 10:38:24AM +0100 Valentin Schneider wrote: > On 26/05/22 12:06, Phil Auld wrote: > > writing the current state back in hotplug/target calls cpu_down() > > which will set cpu dying even when it isn't and then nothing will > > ever clear it. A stress test that reads values and writes them back > > for all cpu device files in sysfs will trigger the BUG() in > > select_fallback_rq once all cpus are marked as dying. > > > > kernel/cpu.c::target_store() > > ... > > if (st->state < target) > > ret = cpu_up(dev->id, target); > > else > > ret = cpu_down(dev->id, target); > > > > cpu_down() -> cpu_set_state() > > bool bringup = st->state < target; > > ... > > if (cpu_dying(cpu) != !bringup) > > set_cpu_dying(cpu, !bringup); > > > > Fix this by letting state==target fall through in the target_store() > > conditional. > > > > To go back on my data race paranoia: writes to both cpu$x/online and > cpu$x/hotplug/target are serialized by device_hotplug_lock, and so are the > exported kernel hotplug functions ({add, remove}_cpu()). > > That's not cpu_add_remove_lock as I was looking for, but that's still all > under one lock, so I think we're good. Sorry for that! > Right. This catches it up higher so that we don't get into the code that starts actually changing things. I wonder now in the state == target case if we should make sure st->target == target. With the second patch it's less likely to be needed. Thoughts? Maybe I'll include that if/when I have code to keep cpux/online in sync with st->state and cpu_online_mask. > > Signed-off-by: Phil Auld <pauld@redhat.com> > > Reviewed-by: Valentin Schneider <vschneid@redhat.com> Thanks! Cheers, Phil > > > --- > > kernel/cpu.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/kernel/cpu.c b/kernel/cpu.c > > index d0a9aa0b42e8..cdb6ac10ad94 100644 > > --- a/kernel/cpu.c > > +++ b/kernel/cpu.c > > @@ -2315,7 +2315,7 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr, > > > > if (st->state < target) > > ret = cpu_up(dev->id, target); > > - else > > + else if (st->state > target) > > ret = cpu_down(dev->id, target); > > out: > > unlock_device_hotplug(); > > -- > > 2.18.0 > -- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state 2022-05-27 13:22 ` Phil Auld @ 2022-05-30 12:27 ` Valentin Schneider 2022-06-01 15:49 ` Phil Auld 0 siblings, 1 reply; 9+ messages in thread From: Valentin Schneider @ 2022-05-30 12:27 UTC (permalink / raw) To: Phil Auld; +Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra On 27/05/22 09:22, Phil Auld wrote: > On Fri, May 27, 2022 at 10:38:24AM +0100 Valentin Schneider wrote: >> On 26/05/22 12:06, Phil Auld wrote: >> > writing the current state back in hotplug/target calls cpu_down() >> > which will set cpu dying even when it isn't and then nothing will >> > ever clear it. A stress test that reads values and writes them back >> > for all cpu device files in sysfs will trigger the BUG() in >> > select_fallback_rq once all cpus are marked as dying. >> > >> > kernel/cpu.c::target_store() >> > ... >> > if (st->state < target) >> > ret = cpu_up(dev->id, target); >> > else >> > ret = cpu_down(dev->id, target); >> > >> > cpu_down() -> cpu_set_state() >> > bool bringup = st->state < target; >> > ... >> > if (cpu_dying(cpu) != !bringup) >> > set_cpu_dying(cpu, !bringup); >> > >> > Fix this by letting state==target fall through in the target_store() >> > conditional. >> > >> >> To go back on my data race paranoia: writes to both cpu$x/online and >> cpu$x/hotplug/target are serialized by device_hotplug_lock, and so are the >> exported kernel hotplug functions ({add, remove}_cpu()). >> >> That's not cpu_add_remove_lock as I was looking for, but that's still all >> under one lock, so I think we're good. Sorry for that! >> > > Right. This catches it up higher so that we don't get into the code that > starts actually changing things. I wonder now in the state == target case > if we should make sure st->target == target. With the second patch it's > less likely to be needed. Thoughts? > Yeah, you could append a simple: else WARN_ON(st->state != target); > Maybe I'll include that if/when I have code to keep cpux/online in sync > with st->state and cpu_online_mask. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state 2022-05-30 12:27 ` Valentin Schneider @ 2022-06-01 15:49 ` Phil Auld 2022-06-01 16:39 ` Valentin Schneider 0 siblings, 1 reply; 9+ messages in thread From: Phil Auld @ 2022-06-01 15:49 UTC (permalink / raw) To: Valentin Schneider; +Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra On Mon, May 30, 2022 at 01:27:00PM +0100 Valentin Schneider wrote: > On 27/05/22 09:22, Phil Auld wrote: > > On Fri, May 27, 2022 at 10:38:24AM +0100 Valentin Schneider wrote: > >> On 26/05/22 12:06, Phil Auld wrote: > >> > writing the current state back in hotplug/target calls cpu_down() > >> > which will set cpu dying even when it isn't and then nothing will > >> > ever clear it. A stress test that reads values and writes them back > >> > for all cpu device files in sysfs will trigger the BUG() in > >> > select_fallback_rq once all cpus are marked as dying. > >> > > >> > kernel/cpu.c::target_store() > >> > ... > >> > if (st->state < target) > >> > ret = cpu_up(dev->id, target); > >> > else > >> > ret = cpu_down(dev->id, target); > >> > > >> > cpu_down() -> cpu_set_state() > >> > bool bringup = st->state < target; > >> > ... > >> > if (cpu_dying(cpu) != !bringup) > >> > set_cpu_dying(cpu, !bringup); > >> > > >> > Fix this by letting state==target fall through in the target_store() > >> > conditional. > >> > > >> > >> To go back on my data race paranoia: writes to both cpu$x/online and > >> cpu$x/hotplug/target are serialized by device_hotplug_lock, and so are the > >> exported kernel hotplug functions ({add, remove}_cpu()). > >> > >> That's not cpu_add_remove_lock as I was looking for, but that's still all > >> under one lock, so I think we're good. Sorry for that! > >> > > > > Right. This catches it up higher so that we don't get into the code that > > starts actually changing things. I wonder now in the state == target case > > if we should make sure st->target == target. With the second patch it's > > less likely to be needed. Thoughts? > > > > Yeah, you could append a simple: > > else > WARN_ON(st->state != target); I was thinking more like: else if (st->target != target) st->target = target; Since this is a write to the target field and we are not doing one of the operations that will set target because state == target we should make sure target == target. Although that could have its own issues, I suppose. But as I said fixing the boot cpu should make it much less likely that st->target != st->state once we have the hotplug lock. I don't see how that WARN would ever fire. We're under the lock and nothing is re-reading the value of st->state anyway. Looks more like a compiler sanity check :) Cheers, Phil > > Maybe I'll include that if/when I have code to keep cpux/online in sync > > with st->state and cpu_online_mask. > -- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state 2022-06-01 15:49 ` Phil Auld @ 2022-06-01 16:39 ` Valentin Schneider 0 siblings, 0 replies; 9+ messages in thread From: Valentin Schneider @ 2022-06-01 16:39 UTC (permalink / raw) To: Phil Auld; +Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra On 01/06/22 11:49, Phil Auld wrote: > On Mon, May 30, 2022 at 01:27:00PM +0100 Valentin Schneider wrote: >> >> Yeah, you could append a simple: >> >> else >> WARN_ON(st->state != target); > > I was thinking more like: > > else > if (st->target != target) st->target = target; > > Since this is a write to the target field and we are not > doing one of the operations that will set target because > state == target we should make sure target == target. Although > that could have its own issues, I suppose. But as I said > fixing the boot cpu should make it much less likely that > st->target != st->state once we have the hotplug lock. > > I don't see how that WARN would ever fire. We're under the lock > and nothing is re-reading the value of st->state anyway. Looks more > like a compiler sanity check :) > Right, having a warning in there would mostly be to catch unexpected/unintended scenarios like the one you've found for the boot CPU. I'd vote for WARN("Huh, didn't expect that") + fix st->target > > Cheers, > Phil > > >> > Maybe I'll include that if/when I have code to keep cpux/online in sync >> > with st->state and cpu_online_mask. >> > > -- ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 2/2] cpuhp: Set cpuhp target for boot cpu 2022-05-26 16:06 [PATCH v2 0/2] cpuhp: fix some st->target issues Phil Auld 2022-05-26 16:06 ` [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state Phil Auld @ 2022-05-26 16:06 ` Phil Auld 2022-05-27 9:38 ` Valentin Schneider 1 sibling, 1 reply; 9+ messages in thread From: Phil Auld @ 2022-05-26 16:06 UTC (permalink / raw) To: linux-kernel; +Cc: Thomas Gleixner, Peter Zijlstra, Valentin Schneider Since the boot cpu does not go through the hotplug process it ends up with state == CPUHP_ONLINE but target == CPUHP_OFFLINE. Set the target to match in boot_cpu_hotplug_init(). Signed-off-by: Phil Auld <pauld@redhat.com> --- kernel/cpu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/cpu.c b/kernel/cpu.c index cdb6ac10ad94..0bb8ad0fa2d9 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2677,6 +2677,7 @@ void __init boot_cpu_hotplug_init(void) cpumask_set_cpu(smp_processor_id(), &cpus_booted_once_mask); #endif this_cpu_write(cpuhp_state.state, CPUHP_ONLINE); + this_cpu_write(cpuhp_state.target, CPUHP_ONLINE); } /* -- 2.18.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/2] cpuhp: Set cpuhp target for boot cpu 2022-05-26 16:06 ` [PATCH v2 2/2] cpuhp: Set cpuhp target for boot cpu Phil Auld @ 2022-05-27 9:38 ` Valentin Schneider 0 siblings, 0 replies; 9+ messages in thread From: Valentin Schneider @ 2022-05-27 9:38 UTC (permalink / raw) To: Phil Auld, linux-kernel; +Cc: Thomas Gleixner, Peter Zijlstra On 26/05/22 12:06, Phil Auld wrote: > Since the boot cpu does not go through the hotplug process it ends > up with state == CPUHP_ONLINE but target == CPUHP_OFFLINE. > Set the target to match in boot_cpu_hotplug_init(). > > Signed-off-by: Phil Auld <pauld@redhat.com> Reviewed-by: Valentin Schneider <vschneid@redhat.com> > --- > kernel/cpu.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/kernel/cpu.c b/kernel/cpu.c > index cdb6ac10ad94..0bb8ad0fa2d9 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -2677,6 +2677,7 @@ void __init boot_cpu_hotplug_init(void) > cpumask_set_cpu(smp_processor_id(), &cpus_booted_once_mask); > #endif > this_cpu_write(cpuhp_state.state, CPUHP_ONLINE); > + this_cpu_write(cpuhp_state.target, CPUHP_ONLINE); > } > > /* > -- > 2.18.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-06-01 16:39 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-05-26 16:06 [PATCH v2 0/2] cpuhp: fix some st->target issues Phil Auld 2022-05-26 16:06 ` [PATCH v2 1/2] cpuhp: make target_store() a nop when target == state Phil Auld 2022-05-27 9:38 ` Valentin Schneider 2022-05-27 13:22 ` Phil Auld 2022-05-30 12:27 ` Valentin Schneider 2022-06-01 15:49 ` Phil Auld 2022-06-01 16:39 ` Valentin Schneider 2022-05-26 16:06 ` [PATCH v2 2/2] cpuhp: Set cpuhp target for boot cpu Phil Auld 2022-05-27 9:38 ` Valentin Schneider
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).