* [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()
@ 2018-09-04 6:33 Neeraj Upadhyay
2018-09-04 12:12 ` Mukesh Ojha
2018-09-05 11:33 ` Thomas Gleixner
0 siblings, 2 replies; 13+ messages in thread
From: Neeraj Upadhyay @ 2018-09-04 6:33 UTC (permalink / raw)
To: tglx, josh, peterz, mingo, jiangshanlai, dzickus, brendan.jackman, malat
Cc: linux-kernel, sramana, linux-arm-msm, Neeraj Upadhyay
If takedown_cpu() fails during _cpu_down(), st->state is reset,
by calling cpuhp_reset_state(). This results in an additional
increment of st->state, which results in CPUHP_AP_SMPBOOT_THREADS
state being skipped during rollback. Fix this by not calling
cpuhp_reset_state() and doing the state reset directly in
_cpu_down().
Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
---
kernel/cpu.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index aa7fe85..9f49edb 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -970,7 +970,14 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
*/
ret = cpuhp_down_callbacks(cpu, st, target);
if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
- cpuhp_reset_state(st, prev_state);
+ /*
+ * As st->last is not set, cpuhp_reset_state() increments
+ * st->state, which results in CPUHP_AP_SMPBOOT_THREADS being
+ * skipped during rollback. So, don't use it here.
+ */
+ st->rollback = true;
+ st->target = prev_state;
+ st->bringup = !st->bringup;
__cpuhp_kick_ap(st);
}
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()
2018-09-04 6:33 [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu() Neeraj Upadhyay
@ 2018-09-04 12:12 ` Mukesh Ojha
2018-09-05 11:33 ` Thomas Gleixner
1 sibling, 0 replies; 13+ messages in thread
From: Mukesh Ojha @ 2018-09-04 12:12 UTC (permalink / raw)
To: Neeraj Upadhyay, tglx, josh, peterz, mingo, jiangshanlai,
dzickus, brendan.jackman, malat
Cc: linux-kernel, sramana, linux-arm-msm
On 9/4/2018 12:03 PM, Neeraj Upadhyay wrote:
> If takedown_cpu() fails during _cpu_down(), st->state is reset,
> by calling cpuhp_reset_state(). This results in an additional
> increment of st->state, which results in CPUHP_AP_SMPBOOT_THREADS
> state being skipped during rollback. Fix this by not calling
> cpuhp_reset_state() and doing the state reset directly in
> _cpu_down().
>
> Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
> Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
> ---
> kernel/cpu.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index aa7fe85..9f49edb 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -970,7 +970,14 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
> */
> ret = cpuhp_down_callbacks(cpu, st, target);
> if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
> - cpuhp_reset_state(st, prev_state);
> + /*
> + * As st->last is not set, cpuhp_reset_state() increments
> + * st->state, which results in CPUHP_AP_SMPBOOT_THREADS being
> + * skipped during rollback. So, don't use it here.
> + */
> + st->rollback = true;
> + st->target = prev_state;
> + st->bringup = !st->bringup;
Acked-by: Mukesh Ojha <mojha@codeaurora.org>
looks valid for this case.
> __cpuhp_kick_ap(st);
> }
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()
2018-09-04 6:33 [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu() Neeraj Upadhyay
2018-09-04 12:12 ` Mukesh Ojha
@ 2018-09-05 11:33 ` Thomas Gleixner
2018-09-05 12:09 ` Mukesh Ojha
2018-09-05 12:23 ` Thomas Gleixner
1 sibling, 2 replies; 13+ messages in thread
From: Thomas Gleixner @ 2018-09-05 11:33 UTC (permalink / raw)
To: Neeraj Upadhyay
Cc: josh, peterz, mingo, jiangshanlai, dzickus, brendan.jackman,
malat, linux-kernel, sramana, linux-arm-msm
On Tue, 4 Sep 2018, Neeraj Upadhyay wrote:
> If takedown_cpu() fails during _cpu_down(), st->state is reset,
> by calling cpuhp_reset_state(). This results in an additional
> increment of st->state, which results in CPUHP_AP_SMPBOOT_THREADS
> state being skipped during rollback. Fix this by not calling
> cpuhp_reset_state() and doing the state reset directly in
> _cpu_down().
>
> Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
> Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
> ---
> kernel/cpu.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index aa7fe85..9f49edb 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -970,7 +970,14 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
> */
> ret = cpuhp_down_callbacks(cpu, st, target);
> if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
> - cpuhp_reset_state(st, prev_state);
> + /*
> + * As st->last is not set, cpuhp_reset_state() increments
> + * st->state, which results in CPUHP_AP_SMPBOOT_THREADS being
> + * skipped during rollback. So, don't use it here.
> + */
> + st->rollback = true;
> + st->target = prev_state;
> + st->bringup = !st->bringup;
No, this is just papering over the actual problem.
The state inconsistency happens in take_cpu_down() when it returns with a
failure from __cpu_disable() because that returns with state = TEARDOWN_CPU
and st->state is then incremented in undo_cpu_down().
That's the real issue and we need to analyze the whole cpu_down rollback
logic first.
Thanks,
tglx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()
2018-09-05 11:33 ` Thomas Gleixner
@ 2018-09-05 12:09 ` Mukesh Ojha
2018-09-05 13:14 ` Thomas Gleixner
2018-09-05 12:23 ` Thomas Gleixner
1 sibling, 1 reply; 13+ messages in thread
From: Mukesh Ojha @ 2018-09-05 12:09 UTC (permalink / raw)
To: Thomas Gleixner, Neeraj Upadhyay
Cc: josh, peterz, mingo, jiangshanlai, dzickus, brendan.jackman,
malat, linux-kernel, sramana, linux-arm-msm
On 9/5/2018 5:03 PM, Thomas Gleixner wrote:
> On Tue, 4 Sep 2018, Neeraj Upadhyay wrote:
>> If takedown_cpu() fails during _cpu_down(), st->state is reset,
>> by calling cpuhp_reset_state(). This results in an additional
>> increment of st->state, which results in CPUHP_AP_SMPBOOT_THREADS
>> state being skipped during rollback. Fix this by not calling
>> cpuhp_reset_state() and doing the state reset directly in
>> _cpu_down().
>>
>> Fixes: 4dddfb5faa61 ("smp/hotplug: Rewrite AP state machine core")
>> Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
>> ---
>> kernel/cpu.c | 9 ++++++++-
>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index aa7fe85..9f49edb 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -970,7 +970,14 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
>> */
>> ret = cpuhp_down_callbacks(cpu, st, target);
>> if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
>> - cpuhp_reset_state(st, prev_state);
>> + /*
>> + * As st->last is not set, cpuhp_reset_state() increments
>> + * st->state, which results in CPUHP_AP_SMPBOOT_THREADS being
>> + * skipped during rollback. So, don't use it here.
>> + */
>> + st->rollback = true;
>> + st->target = prev_state;
>> + st->bringup = !st->bringup;
> No, this is just papering over the actual problem.
>
> The state inconsistency happens in take_cpu_down() when it returns with a
> failure from __cpu_disable() because that returns with state = TEARDOWN_CPU
> and st->state is then incremented in undo_cpu_down().
>
> That's the real issue and we need to analyze the whole cpu_down rollback
> logic first.
Could this be done like below ?
diff --git a/kernel/cpu.c b/kernel/cpu.c
index aa7fe85..47bce90 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -802,17 +802,18 @@ static int take_cpu_down(void *_param)
int err, cpu = smp_processor_id();
int ret;
- /* Ensure this CPU doesn't handle any more interrupts. */
- err = __cpu_disable();
- if (err < 0)
- return err;
-
/*
* We get here while we are in CPUHP_TEARDOWN_CPU state and we
must not
* do this step again.
*/
WARN_ON(st->state != CPUHP_TEARDOWN_CPU);
st->state--;
+
+ /* Ensure this CPU doesn't handle any more interrupts. */
+ err = __cpu_disable();
+ if (err < 0)
+ return err;
+
/* Invoke the former CPU_DYING callbacks */
Thanks,
Mukesh
>
>
>
>
>
>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()
2018-09-05 11:33 ` Thomas Gleixner
2018-09-05 12:09 ` Mukesh Ojha
@ 2018-09-05 12:23 ` Thomas Gleixner
2018-09-05 12:51 ` Neeraj Upadhyay
2018-09-05 14:53 ` Sudeep Holla
1 sibling, 2 replies; 13+ messages in thread
From: Thomas Gleixner @ 2018-09-05 12:23 UTC (permalink / raw)
To: Neeraj Upadhyay
Cc: josh, peterz, mingo, jiangshanlai, dzickus, brendan.jackman,
malat, linux-kernel, sramana, linux-arm-msm
On Wed, 5 Sep 2018, Thomas Gleixner wrote:
> On Tue, 4 Sep 2018, Neeraj Upadhyay wrote:
> > ret = cpuhp_down_callbacks(cpu, st, target);
> > if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
> > - cpuhp_reset_state(st, prev_state);
> > + /*
> > + * As st->last is not set, cpuhp_reset_state() increments
> > + * st->state, which results in CPUHP_AP_SMPBOOT_THREADS being
> > + * skipped during rollback. So, don't use it here.
> > + */
> > + st->rollback = true;
> > + st->target = prev_state;
> > + st->bringup = !st->bringup;
>
> No, this is just papering over the actual problem.
>
> The state inconsistency happens in take_cpu_down() when it returns with a
> failure from __cpu_disable() because that returns with state = TEARDOWN_CPU
> and st->state is then incremented in undo_cpu_down().
>
> That's the real issue and we need to analyze the whole cpu_down rollback
> logic first.
And looking closer this is a general issue. Just that the TEARDOWN state
makes it simple to observe. It's universaly broken, when the first teardown
callback fails because, st->state is only decremented _AFTER_ the callback
returns success, but undo_cpu_down() increments unconditionally.
Patch below.
Thanks,
tglx
----
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -916,7 +916,8 @@ static int cpuhp_down_callbacks(unsigned
ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
if (ret) {
st->target = prev_state;
- undo_cpu_down(cpu, st);
+ if (st->state < prev_state)
+ undo_cpu_down(cpu, st);
break;
}
}
@@ -969,7 +970,7 @@ static int __ref _cpu_down(unsigned int
* to do the further cleanups.
*/
ret = cpuhp_down_callbacks(cpu, st, target);
- if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
+ if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) {
cpuhp_reset_state(st, prev_state);
__cpuhp_kick_ap(st);
}
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()
2018-09-05 12:23 ` Thomas Gleixner
@ 2018-09-05 12:51 ` Neeraj Upadhyay
2018-09-05 13:17 ` Thomas Gleixner
2018-09-05 14:53 ` Sudeep Holla
1 sibling, 1 reply; 13+ messages in thread
From: Neeraj Upadhyay @ 2018-09-05 12:51 UTC (permalink / raw)
To: Thomas Gleixner
Cc: josh, peterz, mingo, jiangshanlai, dzickus, brendan.jackman,
malat, linux-kernel, sramana, linux-arm-msm
On 09/05/2018 05:53 PM, Thomas Gleixner wrote:
> On Wed, 5 Sep 2018, Thomas Gleixner wrote:
>> On Tue, 4 Sep 2018, Neeraj Upadhyay wrote:
>>> ret = cpuhp_down_callbacks(cpu, st, target);
>>> if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
>>> - cpuhp_reset_state(st, prev_state);
>>> + /*
>>> + * As st->last is not set, cpuhp_reset_state() increments
>>> + * st->state, which results in CPUHP_AP_SMPBOOT_THREADS being
>>> + * skipped during rollback. So, don't use it here.
>>> + */
>>> + st->rollback = true;
>>> + st->target = prev_state;
>>> + st->bringup = !st->bringup;
>> No, this is just papering over the actual problem.
>>
>> The state inconsistency happens in take_cpu_down() when it returns with a
>> failure from __cpu_disable() because that returns with state = TEARDOWN_CPU
>> and st->state is then incremented in undo_cpu_down().
>>
>> That's the real issue and we need to analyze the whole cpu_down rollback
>> logic first.
> And looking closer this is a general issue. Just that the TEARDOWN state
> makes it simple to observe. It's universaly broken, when the first teardown
> callback fails because, st->state is only decremented _AFTER_ the callback
> returns success, but undo_cpu_down() increments unconditionally.
>
> Patch below.
>
> Thanks,
>
> tglx
As per my understanding, there are 2 problems here; one is fixed with
your patch, and other is cpuhp_reset_state() is used during rollback
from non-AP to AP state, which seem to result in 2 increments of
st->state (one increment done by cpuhp_reset_state() and another by
cpu_thread_fun()) .
> ----
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -916,7 +916,8 @@ static int cpuhp_down_callbacks(unsigned
> ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
> if (ret) {
> st->target = prev_state;
> - undo_cpu_down(cpu, st);
> + if (st->state < prev_state)
> + undo_cpu_down(cpu, st);
> break;
> }
> }
> @@ -969,7 +970,7 @@ static int __ref _cpu_down(unsigned int
> * to do the further cleanups.
> */
> ret = cpuhp_down_callbacks(cpu, st, target);
> - if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
> + if (ret && st->state == CPUHP_TEARDOWN_CPU && st->state < prev_state) {
> cpuhp_reset_state(st, prev_state);
> __cpuhp_kick_ap(st);
> }
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()
2018-09-05 12:09 ` Mukesh Ojha
@ 2018-09-05 13:14 ` Thomas Gleixner
0 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2018-09-05 13:14 UTC (permalink / raw)
To: Mukesh Ojha
Cc: Neeraj Upadhyay, josh, peterz, mingo, jiangshanlai, dzickus,
brendan.jackman, malat, linux-kernel, sramana, linux-arm-msm
[-- Attachment #1: Type: text/plain, Size: 1003 bytes --]
On Wed, 5 Sep 2018, Mukesh Ojha wrote:
> On 9/5/2018 5:03 PM, Thomas Gleixner wrote:
> > > + st->rollback = true;
> > > + st->target = prev_state;
> > > + st->bringup = !st->bringup;
> > No, this is just papering over the actual problem.
> >
> > The state inconsistency happens in take_cpu_down() when it returns with a
> > failure from __cpu_disable() because that returns with state = TEARDOWN_CPU
> > and st->state is then incremented in undo_cpu_down().
> >
> > That's the real issue and we need to analyze the whole cpu_down rollback
> > logic first.
>
> Could this be done like below ?
IOW, more papering over the real root cause.
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index aa7fe85..47bce90 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -802,17 +802,18 @@ static int take_cpu_down(void *_param)
> int err, cpu = smp_processor_id();
> int ret;
^^^^^^^
Please fix your mailer or your editor. That patch is whitespace damaged.
Thanks,
tglx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()
2018-09-05 12:51 ` Neeraj Upadhyay
@ 2018-09-05 13:17 ` Thomas Gleixner
2018-09-06 2:56 ` Neeraj Upadhyay
0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2018-09-05 13:17 UTC (permalink / raw)
To: Neeraj Upadhyay
Cc: josh, peterz, mingo, jiangshanlai, dzickus, brendan.jackman,
malat, linux-kernel, sramana, linux-arm-msm
On Wed, 5 Sep 2018, Neeraj Upadhyay wrote:
> On 09/05/2018 05:53 PM, Thomas Gleixner wrote:
> >
> > And looking closer this is a general issue. Just that the TEARDOWN state
> > makes it simple to observe. It's universaly broken, when the first teardown
> > callback fails because, st->state is only decremented _AFTER_ the callback
> > returns success, but undo_cpu_down() increments unconditionally.
> >
>
> As per my understanding, there are 2 problems here; one is fixed with your
> patch, and other is cpuhp_reset_state() is used during rollback from non-AP to
> AP state, which seem to result in 2 increments of st->state (one increment
> done by cpuhp_reset_state() and another by cpu_thread_fun()) .
And how did your hack fix that up magically? I'll have a look later today.
Thanks,
tglx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()
2018-09-05 12:23 ` Thomas Gleixner
2018-09-05 12:51 ` Neeraj Upadhyay
@ 2018-09-05 14:53 ` Sudeep Holla
2018-09-05 15:19 ` Geert Uytterhoeven
1 sibling, 1 reply; 13+ messages in thread
From: Sudeep Holla @ 2018-09-05 14:53 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Neeraj Upadhyay, josh, peterz, mingo, jiangshanlai, dzickus,
brendan.jackman, malat, linux-kernel, sramana, linux-arm-msm,
Geert Uytterhoeven, Lorenzo Pieralisi, Sudeep Holla
On Wed, Sep 05, 2018 at 02:23:46PM +0200, Thomas Gleixner wrote:
> On Wed, 5 Sep 2018, Thomas Gleixner wrote:
> > On Tue, 4 Sep 2018, Neeraj Upadhyay wrote:
> > > ret = cpuhp_down_callbacks(cpu, st, target);
> > > if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
> > > - cpuhp_reset_state(st, prev_state);
> > > + /*
> > > + * As st->last is not set, cpuhp_reset_state() increments
> > > + * st->state, which results in CPUHP_AP_SMPBOOT_THREADS being
> > > + * skipped during rollback. So, don't use it here.
> > > + */
> > > + st->rollback = true;
> > > + st->target = prev_state;
> > > + st->bringup = !st->bringup;
> >
> > No, this is just papering over the actual problem.
> >
> > The state inconsistency happens in take_cpu_down() when it returns with a
> > failure from __cpu_disable() because that returns with state = TEARDOWN_CPU
> > and st->state is then incremented in undo_cpu_down().
> >
> > That's the real issue and we need to analyze the whole cpu_down rollback
> > logic first.
>
> And looking closer this is a general issue. Just that the TEARDOWN state
> makes it simple to observe. It's universaly broken, when the first teardown
> callback fails because, st->state is only decremented _AFTER_ the callback
> returns success, but undo_cpu_down() increments unconditionally.
>
> Patch below.
This patch fixes the issue reported @[1]. Lorenzo did some debugging and
I wanted to have a look at it at some point but this discussion drew my
attention and sounded very similar[2]. So I did a quick test with this
patch and it fixes the issue.
--
Regards,
Sudeep
[1] https://lore.kernel.org/lkml/CAMuHMdVg868LgL5xTg5Dp5rReKxoo+8fRy+ETJiMxGWZCp+hWw@mail.gmail.com/
[2] https://lore.kernel.org/lkml/20180823131505.GA31558@red-moon/
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()
2018-09-05 14:53 ` Sudeep Holla
@ 2018-09-05 15:19 ` Geert Uytterhoeven
0 siblings, 0 replies; 13+ messages in thread
From: Geert Uytterhoeven @ 2018-09-05 15:19 UTC (permalink / raw)
To: Sudeep Holla
Cc: Thomas Gleixner, Neeraj Upadhyay, Josh Triplett, Peter Zijlstra,
Ingo Molnar, Lai Jiangshan, dzickus, Brendan Jackman,
Mathieu Malaterre, Linux Kernel Mailing List, sramana,
linux-arm-msm, Lorenzo Pieralisi, Linux-Renesas
Hi Sudeep,
Thanks for the CC!
On Wed, Sep 5, 2018 at 4:54 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> On Wed, Sep 05, 2018 at 02:23:46PM +0200, Thomas Gleixner wrote:
> > On Wed, 5 Sep 2018, Thomas Gleixner wrote:
> > > On Tue, 4 Sep 2018, Neeraj Upadhyay wrote:
> > > > ret = cpuhp_down_callbacks(cpu, st, target);
> > > > if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
> > > > - cpuhp_reset_state(st, prev_state);
> > > > + /*
> > > > + * As st->last is not set, cpuhp_reset_state() increments
> > > > + * st->state, which results in CPUHP_AP_SMPBOOT_THREADS being
> > > > + * skipped during rollback. So, don't use it here.
> > > > + */
> > > > + st->rollback = true;
> > > > + st->target = prev_state;
> > > > + st->bringup = !st->bringup;
> > >
> > > No, this is just papering over the actual problem.
> > >
> > > The state inconsistency happens in take_cpu_down() when it returns with a
> > > failure from __cpu_disable() because that returns with state = TEARDOWN_CPU
> > > and st->state is then incremented in undo_cpu_down().
> > >
> > > That's the real issue and we need to analyze the whole cpu_down rollback
> > > logic first.
> >
> > And looking closer this is a general issue. Just that the TEARDOWN state
> > makes it simple to observe. It's universaly broken, when the first teardown
> > callback fails because, st->state is only decremented _AFTER_ the callback
> > returns success, but undo_cpu_down() increments unconditionally.
> >
> > Patch below.
>
> This patch fixes the issue reported @[1]. Lorenzo did some debugging and
> I wanted to have a look at it at some point but this discussion drew my
> attention and sounded very similar[2]. So I did a quick test with this
> patch and it fixes the issue.
> [1] https://lore.kernel.org/lkml/CAMuHMdVg868LgL5xTg5Dp5rReKxoo+8fRy+ETJiMxGWZCp+hWw@mail.gmail.com/
> [2] https://lore.kernel.org/lkml/20180823131505.GA31558@red-moon/
Thomas' patch fixes the issue for me:
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()
2018-09-05 13:17 ` Thomas Gleixner
@ 2018-09-06 2:56 ` Neeraj Upadhyay
2018-09-06 8:18 ` Thomas Gleixner
0 siblings, 1 reply; 13+ messages in thread
From: Neeraj Upadhyay @ 2018-09-06 2:56 UTC (permalink / raw)
To: Thomas Gleixner
Cc: josh, peterz, mingo, jiangshanlai, dzickus, brendan.jackman,
malat, linux-kernel, sramana, linux-arm-msm
On 09/05/2018 06:47 PM, Thomas Gleixner wrote:
> On Wed, 5 Sep 2018, Neeraj Upadhyay wrote:
>> On 09/05/2018 05:53 PM, Thomas Gleixner wrote:
>>> And looking closer this is a general issue. Just that the TEARDOWN state
>>> makes it simple to observe. It's universaly broken, when the first teardown
>>> callback fails because, st->state is only decremented _AFTER_ the callback
>>> returns success, but undo_cpu_down() increments unconditionally.
>>>
>> As per my understanding, there are 2 problems here; one is fixed with your
>> patch, and other is cpuhp_reset_state() is used during rollback from non-AP to
>> AP state, which seem to result in 2 increments of st->state (one increment
>> done by cpuhp_reset_state() and another by cpu_thread_fun()) .
> And how did your hack fix that up magically? I'll have a look later today.
>
> Thanks,
>
> tglx
The hack fixes it by not calling cpuhp_reset_state() and doing rollback
state reset inline in _cpu_down().
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()
2018-09-06 2:56 ` Neeraj Upadhyay
@ 2018-09-06 8:18 ` Thomas Gleixner
2018-09-06 9:00 ` Neeraj Upadhyay
0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2018-09-06 8:18 UTC (permalink / raw)
To: Neeraj Upadhyay
Cc: josh, peterz, mingo, jiangshanlai, dzickus, brendan.jackman,
malat, linux-kernel, sramana, linux-arm-msm
[-- Attachment #1: Type: text/plain, Size: 1311 bytes --]
On Thu, 6 Sep 2018, Neeraj Upadhyay wrote:
> On 09/05/2018 06:47 PM, Thomas Gleixner wrote:
> > On Wed, 5 Sep 2018, Neeraj Upadhyay wrote:
> > > On 09/05/2018 05:53 PM, Thomas Gleixner wrote:
> > > > And looking closer this is a general issue. Just that the TEARDOWN state
> > > > makes it simple to observe. It's universaly broken, when the first
> > > > teardown
> > > > callback fails because, st->state is only decremented _AFTER_ the
> > > > callback
> > > > returns success, but undo_cpu_down() increments unconditionally.
> > > >
> > > As per my understanding, there are 2 problems here; one is fixed with your
> > > patch, and other is cpuhp_reset_state() is used during rollback from
> > > non-AP to
> > > AP state, which seem to result in 2 increments of st->state (one increment
> > > done by cpuhp_reset_state() and another by cpu_thread_fun()) .
> > And how did your hack fix that up magically? I'll have a look later today.
> >
> > Thanks,
> >
> > tglx
>
> The hack fixes it by not calling cpuhp_reset_state() and doing rollback state
> reset inline in _cpu_down().
And how is that any different from the proper patch I sent? It ends up in
the same state. So I have a hard time to understand your blurb above where
you claim that my patch just solves one of two problems?
Thanks,
tglx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu()
2018-09-06 8:18 ` Thomas Gleixner
@ 2018-09-06 9:00 ` Neeraj Upadhyay
0 siblings, 0 replies; 13+ messages in thread
From: Neeraj Upadhyay @ 2018-09-06 9:00 UTC (permalink / raw)
To: Thomas Gleixner
Cc: josh, peterz, mingo, jiangshanlai, dzickus, brendan.jackman,
malat, linux-kernel, sramana, linux-arm-msm
On 09/06/2018 01:48 PM, Thomas Gleixner wrote:
> On Thu, 6 Sep 2018, Neeraj Upadhyay wrote:
>> On 09/05/2018 06:47 PM, Thomas Gleixner wrote:
>>> On Wed, 5 Sep 2018, Neeraj Upadhyay wrote:
>>>> On 09/05/2018 05:53 PM, Thomas Gleixner wrote:
>>>>> And looking closer this is a general issue. Just that the TEARDOWN state
>>>>> makes it simple to observe. It's universaly broken, when the first
>>>>> teardown
>>>>> callback fails because, st->state is only decremented _AFTER_ the
>>>>> callback
>>>>> returns success, but undo_cpu_down() increments unconditionally.
>>>>>
>>>> As per my understanding, there are 2 problems here; one is fixed with your
>>>> patch, and other is cpuhp_reset_state() is used during rollback from
>>>> non-AP to
>>>> AP state, which seem to result in 2 increments of st->state (one increment
>>>> done by cpuhp_reset_state() and another by cpu_thread_fun()) .
>>> And how did your hack fix that up magically? I'll have a look later today.
>>>
>>> Thanks,
>>>
>>> tglx
>>
>> The hack fixes it by not calling cpuhp_reset_state() and doing rollback state
>> reset inline in _cpu_down().
>
> And how is that any different from the proper patch I sent? It ends up in
> the same state. So I have a hard time to understand your blurb above where
> you claim that my patch just solves one of two problems?
>
> Thanks,
>
> tglx
>
Yes, your patch solves the problem related to smpboot unparking being
skipped during rollback and with the hack we end up in same state. The
second thing, which I am referring to, is that there is one additional
state increment. I missed the part that, it could be required, so that
we reach CPUHP_AP_ONLINE_IDLE before calling __cpuhp_kick_ap(). So, it's
not a problem.
* cpuhp_reset_state() does one state increment and we reach
CPUHP_AP_ONLINE_IDLE.
if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
cpuhp_reset_state(st, prev_state);
__cpuhp_kick_ap(st);
}
CPUHP_TEARDOWN_CPU,
CPUHP_AP_ONLINE_IDLE,
CPUHP_AP_SMPBOOT_THREADS,
* cpuhp_thread_fun() does one more increment before invoking state
callback (so, we skip CPUHP_AP_ONLINE_IDLE) and we reach
CPUHP_AP_SMPBOOT_THREADS:
static void cpuhp_thread_fun(unsigned int cpu)
<snip>
if (bringup) {
st->state++;
state = st->state;
<snip>
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2018-09-06 9:00 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-04 6:33 [PATCH] cpu/hotplug: Fix rollback during error-out in takedown_cpu() Neeraj Upadhyay
2018-09-04 12:12 ` Mukesh Ojha
2018-09-05 11:33 ` Thomas Gleixner
2018-09-05 12:09 ` Mukesh Ojha
2018-09-05 13:14 ` Thomas Gleixner
2018-09-05 12:23 ` Thomas Gleixner
2018-09-05 12:51 ` Neeraj Upadhyay
2018-09-05 13:17 ` Thomas Gleixner
2018-09-06 2:56 ` Neeraj Upadhyay
2018-09-06 8:18 ` Thomas Gleixner
2018-09-06 9:00 ` Neeraj Upadhyay
2018-09-05 14:53 ` Sudeep Holla
2018-09-05 15:19 ` Geert Uytterhoeven
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).