linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self()
@ 2019-02-22 22:57 Thiago Jung Bauermann
  2019-02-28  9:47 ` Gautham R Shenoy
  0 siblings, 1 reply; 3+ messages in thread
From: Thiago Jung Bauermann @ 2019-02-22 22:57 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-kernel, Gautham R Shenoy, Michael Bringmann,
	Tyrel Datwyler, Vaidyanathan Srinivasan, Thiago Jung Bauermann

When testing DLPAR CPU add/remove on a system under stress,
pseries_cpu_die() doesn't wait long enough for a CPU to die:

[  446.983944] cpu 148 (hwid 148) Ready to die...
[  446.984062] cpu 149 (hwid 149) Ready to die...
[  446.993518] cpu 150 (hwid 150) Ready to die...
[  446.993543] Querying DEAD? cpu 150 (150) shows 2
[  446.994098] cpu 151 (hwid 151) Ready to die...
[  447.133726] cpu 136 (hwid 136) Ready to die...
[  447.403532] cpu 137 (hwid 137) Ready to die...
[  447.403772] cpu 138 (hwid 138) Ready to die...
[  447.403839] cpu 139 (hwid 139) Ready to die...
[  447.403887] cpu 140 (hwid 140) Ready to die...
[  447.403937] cpu 141 (hwid 141) Ready to die...
[  447.403979] cpu 142 (hwid 142) Ready to die...
[  447.404038] cpu 143 (hwid 143) Ready to die...
[  447.513546] cpu 128 (hwid 128) Ready to die...
[  447.693533] cpu 129 (hwid 129) Ready to die...
[  447.693999] cpu 130 (hwid 130) Ready to die...
[  447.703530] cpu 131 (hwid 131) Ready to die...
[  447.704087] Querying DEAD? cpu 132 (132) shows 2
[  447.704102] cpu 132 (hwid 132) Ready to die...
[  447.713534] cpu 133 (hwid 133) Ready to die...
[  447.714064] Querying DEAD? cpu 134 (134) shows 2

This is a race between one CPU stopping and another one calling
pseries_cpu_die() to wait for it to stop. That function does a short busy
loop calling RTAS query-cpu-stopped-state on the stopping CPU to verify
that it is stopped, but I think there's a lot for the stopping CPU to do
which may take longer than this loop allows.

As can be seen in the dmesg right before or after the "Querying DEAD?"
messages, if pseries_cpu_die() waited a little longer it would have seen
the CPU in the stopped state.

I see two cases that can be causing this race:

1. It's possible that CPU 134 was inactive at the time it was unplugged. In
   that case, dlpar_offline_cpu() calls H_PROD on that CPU and immediately
   calls pseries_cpu_die(). Meanwhile, the prodded CPU activates and start
   the process of stopping itself. It's possible that the busy loop is not
   long enough to allow for the CPU to wake up and complete the stopping
   process.

2. If CPU 134 was online at the time it was unplugged, it would have gone
   through the new CPU hotplug state machine in kernel/cpu.c that was
   introduced in v4.6 to get itself stopped. It's possible that the busy
   loop in pseries_cpu_die() was long enough for the older hotplug code but
   not for the new hotplug state machine.

I don't know if this race condition has any ill effects, but we can make
the race a lot more even if we only start querying if the CPU is stopped
when the stopping CPU is close to call rtas_stop_self().

Since pseries_mach_cpu_die() sets the CPU current state to offline almost
immediately before calling rtas_stop_self(), we use that as a signal that
it is either already stopped or very close to that point, and we can start
the busy loop.

As suggested by Michael Ellerman, this patch also changes the busy loop to
wait for a fixed amount of wall time.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

I tried to estimate good amounts for the timeout and loop delays, but
I'm not sure how reasonable my numbers are. The busy loops will wait for
100 µs between each try, and spin_event_timeout() will timeout after
100 ms. I'll be happy to change these values if you have better
suggestions.

Gautham was able to test this patch and it solved the race condition.

v1 was a cruder patch which just increased the number of loops:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-February/153734.html

v1 also mentioned a kernel crash but Gautham narrowed it down to a bug
in RTAS, which is in the process of being fixed.

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 97feb6e79f1a..424146cc752e 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -214,13 +214,21 @@ static void pseries_cpu_die(unsigned int cpu)
 			msleep(1);
 		}
 	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
+		/*
+		 * If the current state is not offline yet, it means that the
+		 * dying CPU (which is in pseries_mach_cpu_die) didn't have a
+		 * chance to call rtas_stop_self yet and therefore it's too
+		 * early to query if the CPU is stopped.
+		 */
+		spin_event_timeout(get_cpu_current_state(cpu) == CPU_STATE_OFFLINE,
+				   100000, 100);
 
 		for (tries = 0; tries < 25; tries++) {
 			cpu_status = smp_query_cpu_stopped(pcpu);
 			if (cpu_status == QCSS_STOPPED ||
 			    cpu_status == QCSS_HARDWARE_ERROR)
 				break;
-			cpu_relax();
+			udelay(100);
 		}
 	}
 


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self()
  2019-02-22 22:57 [PATCH v2] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self() Thiago Jung Bauermann
@ 2019-02-28  9:47 ` Gautham R Shenoy
  2019-03-11 19:29   ` Thiago Jung Bauermann
  0 siblings, 1 reply; 3+ messages in thread
From: Gautham R Shenoy @ 2019-02-28  9:47 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: linuxppc-dev, linux-kernel, Gautham R Shenoy, Michael Bringmann,
	Tyrel Datwyler, Vaidyanathan Srinivasan

Hello Thiago,

On Fri, Feb 22, 2019 at 07:57:52PM -0300, Thiago Jung Bauermann wrote:
> When testing DLPAR CPU add/remove on a system under stress,
> pseries_cpu_die() doesn't wait long enough for a CPU to die:
> 
> [  446.983944] cpu 148 (hwid 148) Ready to die...
> [  446.984062] cpu 149 (hwid 149) Ready to die...
> [  446.993518] cpu 150 (hwid 150) Ready to die...
> [  446.993543] Querying DEAD? cpu 150 (150) shows 2
> [  446.994098] cpu 151 (hwid 151) Ready to die...
> [  447.133726] cpu 136 (hwid 136) Ready to die...
> [  447.403532] cpu 137 (hwid 137) Ready to die...
> [  447.403772] cpu 138 (hwid 138) Ready to die...
> [  447.403839] cpu 139 (hwid 139) Ready to die...
> [  447.403887] cpu 140 (hwid 140) Ready to die...
> [  447.403937] cpu 141 (hwid 141) Ready to die...
> [  447.403979] cpu 142 (hwid 142) Ready to die...
> [  447.404038] cpu 143 (hwid 143) Ready to die...
> [  447.513546] cpu 128 (hwid 128) Ready to die...
> [  447.693533] cpu 129 (hwid 129) Ready to die...
> [  447.693999] cpu 130 (hwid 130) Ready to die...
> [  447.703530] cpu 131 (hwid 131) Ready to die...
> [  447.704087] Querying DEAD? cpu 132 (132) shows 2
> [  447.704102] cpu 132 (hwid 132) Ready to die...
> [  447.713534] cpu 133 (hwid 133) Ready to die...
> [  447.714064] Querying DEAD? cpu 134 (134) shows 2
> 
> This is a race between one CPU stopping and another one calling
> pseries_cpu_die() to wait for it to stop. That function does a short busy
> loop calling RTAS query-cpu-stopped-state on the stopping CPU to verify
> that it is stopped, but I think there's a lot for the stopping CPU to do
> which may take longer than this loop allows.
> 
> As can be seen in the dmesg right before or after the "Querying DEAD?"
> messages, if pseries_cpu_die() waited a little longer it would have seen
> the CPU in the stopped state.

> 
> I see two cases that can be causing this race:
> 
> 1. It's possible that CPU 134 was inactive at the time it was unplugged. In
>    that case, dlpar_offline_cpu() calls H_PROD on that CPU and immediately
>    calls pseries_cpu_die(). Meanwhile, the prodded CPU activates and start
>    the process of stopping itself. It's possible that the busy loop is not
>    long enough to allow for the CPU to wake up and complete the stopping
>    process.

The problem is a bit more severe since, after printing "Querying
DEAD?" for CPU X, this CPU can prod another offline CPU Y on the same
core which, on waking up, will call rtas_stop_self. Thus we can have two
concurrent calls to rtas-stop-self, which is prohibited by the PAPR.


> 
> 2. If CPU 134 was online at the time it was unplugged, it would have gone
>    through the new CPU hotplug state machine in kernel/cpu.c that was
>    introduced in v4.6 to get itself stopped. It's possible that the busy
>    loop in pseries_cpu_die() was long enough for the older hotplug code but
>    not for the new hotplug state machine.

I haven't been able to observe the "Querying DEAD?" messages for the
online CPU which was being offlined and dlpar'ed out.

> 
> I don't know if this race condition has any ill effects, but we can make
> the race a lot more even if we only start querying if the CPU is stopped
> when the stopping CPU is close to call rtas_stop_self().
> 
> Since pseries_mach_cpu_die() sets the CPU current state to offline almost
> immediately before calling rtas_stop_self(), we use that as a signal that
> it is either already stopped or very close to that point, and we can start
> the busy loop.
> 
> As suggested by Michael Ellerman, this patch also changes the busy loop to
> wait for a fixed amount of wall time.
> 
> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> I tried to estimate good amounts for the timeout and loop delays, but
> I'm not sure how reasonable my numbers are. The busy loops will wait for
> 100 µs between each try, and spin_event_timeout() will timeout after
> 100 ms. I'll be happy to change these values if you have better
> suggestions.

Based on the measurements that I did on a POWER9 system, in successful
cases of smp_query_cpu_stopped(cpu) returning affirmative, the maximum
time spent inside the loop was was 10ms.


> Gautham was able to test this patch and it solved the race condition.
> 
> v1 was a cruder patch which just increased the number of loops:
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-February/153734.html
> 
> v1 also mentioned a kernel crash but Gautham narrowed it down to a bug
> in RTAS, which is in the process of being fixed.
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 97feb6e79f1a..424146cc752e 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -214,13 +214,21 @@ static void pseries_cpu_die(unsigned int cpu)
>  			msleep(1);
>  		}
>  	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
> +		/*
> +		 * If the current state is not offline yet, it means that the
> +		 * dying CPU (which is in pseries_mach_cpu_die) didn't have a
> +		 * chance to call rtas_stop_self yet and therefore it's too
> +		 * early to query if the CPU is stopped.
> +		 */
> +		spin_event_timeout(get_cpu_current_state(cpu) == CPU_STATE_OFFLINE,
> +				   100000, 100);
> 
>  		for (tries = 0; tries < 25; tries++) {

Can we bumped up the tries to 100, so that we wait for 10ms before
printing the warning message ?

>  			cpu_status = smp_query_cpu_stopped(pcpu);
>  			if (cpu_status == QCSS_STOPPED ||
>  			    cpu_status == QCSS_HARDWARE_ERROR)
>  				break;
> -			cpu_relax();
> +			udelay(100);
>  		}
>  	}
> 


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self()
  2019-02-28  9:47 ` Gautham R Shenoy
@ 2019-03-11 19:29   ` Thiago Jung Bauermann
  0 siblings, 0 replies; 3+ messages in thread
From: Thiago Jung Bauermann @ 2019-03-11 19:29 UTC (permalink / raw)
  To: ego
  Cc: linuxppc-dev, linux-kernel, Michael Bringmann, Tyrel Datwyler,
	Vaidyanathan Srinivasan


Hello Gautham,

Thanks for your review.

Gautham R Shenoy <ego@linux.vnet.ibm.com> writes:

> Hello Thiago,
>
> On Fri, Feb 22, 2019 at 07:57:52PM -0300, Thiago Jung Bauermann wrote:
>> I see two cases that can be causing this race:
>>
>> 1. It's possible that CPU 134 was inactive at the time it was unplugged. In
>>    that case, dlpar_offline_cpu() calls H_PROD on that CPU and immediately
>>    calls pseries_cpu_die(). Meanwhile, the prodded CPU activates and start
>>    the process of stopping itself. It's possible that the busy loop is not
>>    long enough to allow for the CPU to wake up and complete the stopping
>>    process.
>
> The problem is a bit more severe since, after printing "Querying
> DEAD?" for CPU X, this CPU can prod another offline CPU Y on the same
> core which, on waking up, will call rtas_stop_self. Thus we can have two
> concurrent calls to rtas-stop-self, which is prohibited by the PAPR.

Inded, very good point. I added this information to the patch
description.

>> 2. If CPU 134 was online at the time it was unplugged, it would have gone
>>    through the new CPU hotplug state machine in kernel/cpu.c that was
>>    introduced in v4.6 to get itself stopped. It's possible that the busy
>>    loop in pseries_cpu_die() was long enough for the older hotplug code but
>>    not for the new hotplug state machine.
>
> I haven't been able to observe the "Querying DEAD?" messages for the
> online CPU which was being offlined and dlpar'ed out.

Ah, thanks for pointing this out. That was a scenario I thought could
happen when I was investigating this issue but I never confirmed whether
it could really happen. I removed it from the patch description.

>> I don't know if this race condition has any ill effects, but we can make
>> the race a lot more even if we only start querying if the CPU is stopped
>> when the stopping CPU is close to call rtas_stop_self().
>>
>> Since pseries_mach_cpu_die() sets the CPU current state to offline almost
>> immediately before calling rtas_stop_self(), we use that as a signal that
>> it is either already stopped or very close to that point, and we can start
>> the busy loop.
>>
>> As suggested by Michael Ellerman, this patch also changes the busy loop to
>> wait for a fixed amount of wall time.
>>
>> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> I tried to estimate good amounts for the timeout and loop delays, but
>> I'm not sure how reasonable my numbers are. The busy loops will wait for
>> 100 µs between each try, and spin_event_timeout() will timeout after
>> 100 ms. I'll be happy to change these values if you have better
>> suggestions.
>
> Based on the measurements that I did on a POWER9 system, in successful
> cases of smp_query_cpu_stopped(cpu) returning affirmative, the maximum
> time spent inside the loop was was 10ms.

That's very good to know. I added this information to the patch
description.

I also added you in an Analyzed-by tag, I hope it's fine with you.

>> Gautham was able to test this patch and it solved the race condition.
>>
>> v1 was a cruder patch which just increased the number of loops:
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-February/153734.html
>>
>> v1 also mentioned a kernel crash but Gautham narrowed it down to a bug
>> in RTAS, which is in the process of being fixed.
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index 97feb6e79f1a..424146cc752e 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -214,13 +214,21 @@ static void pseries_cpu_die(unsigned int cpu)
>>  			msleep(1);
>>  		}
>>  	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>> +		/*
>> +		 * If the current state is not offline yet, it means that the
>> +		 * dying CPU (which is in pseries_mach_cpu_die) didn't have a
>> +		 * chance to call rtas_stop_self yet and therefore it's too
>> +		 * early to query if the CPU is stopped.
>> +		 */
>> +		spin_event_timeout(get_cpu_current_state(cpu) == CPU_STATE_OFFLINE,
>> +				   100000, 100);
>>
>>  		for (tries = 0; tries < 25; tries++) {
>
> Can we bumped up the tries to 100, so that we wait for 10ms before
> printing the warning message ?

Good idea. I increased the loop to 200 iterations so that it can take up
to 20 ms, just to be sure.

>>  			cpu_status = smp_query_cpu_stopped(pcpu);
>>  			if (cpu_status == QCSS_STOPPED ||
>>  			    cpu_status == QCSS_HARDWARE_ERROR)
>>  				break;
>> -			cpu_relax();
>> +			udelay(100);
>>  		}
>>  	}
>>


--
Thiago Jung Bauermann
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-03-11 19:30 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-22 22:57 [PATCH v2] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self() Thiago Jung Bauermann
2019-02-28  9:47 ` Gautham R Shenoy
2019-03-11 19:29   ` Thiago Jung Bauermann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).