* [PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU
@ 2019-04-23 22:39 ` Thiago Jung Bauermann
  0 siblings, 0 replies; 9+ messages in thread
From: Thiago Jung Bauermann @ 2019-04-23 22:39 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-kernel, Gautham R Shenoy, Michael Bringmann,
	Tyrel Datwyler, Vaidyanathan Srinivasan, Nicholas Piggin,
	Thiago Jung Bauermann

When testing DLPAR CPU add/remove on a system under stress,
pseries_cpu_die() doesn't wait long enough for a CPU to die:

[  446.983944] cpu 148 (hwid 148) Ready to die...
[  446.984062] cpu 149 (hwid 149) Ready to die...
[  446.993518] cpu 150 (hwid 150) Ready to die...
[  446.993543] Querying DEAD? cpu 150 (150) shows 2
[  446.994098] cpu 151 (hwid 151) Ready to die...
[  447.133726] cpu 136 (hwid 136) Ready to die...
[  447.403532] cpu 137 (hwid 137) Ready to die...
[  447.403772] cpu 138 (hwid 138) Ready to die...
[  447.403839] cpu 139 (hwid 139) Ready to die...
[  447.403887] cpu 140 (hwid 140) Ready to die...
[  447.403937] cpu 141 (hwid 141) Ready to die...
[  447.403979] cpu 142 (hwid 142) Ready to die...
[  447.404038] cpu 143 (hwid 143) Ready to die...
[  447.513546] cpu 128 (hwid 128) Ready to die...
[  447.693533] cpu 129 (hwid 129) Ready to die...
[  447.693999] cpu 130 (hwid 130) Ready to die...
[  447.703530] cpu 131 (hwid 131) Ready to die...
[  447.704087] Querying DEAD? cpu 132 (132) shows 2
[  447.704102] cpu 132 (hwid 132) Ready to die...
[  447.713534] cpu 133 (hwid 133) Ready to die...
[  447.714064] Querying DEAD? cpu 134 (134) shows 2

This is a race between one CPU stopping and another one calling
pseries_cpu_die() to wait for it to stop. That function does a short busy
loop calling RTAS query-cpu-stopped-state on the stopping CPU to verify
that it is stopped, but I suspect the stopping CPU has enough work left to
do that it can take longer than this loop allows.

As can be seen in the dmesg lines right before and after each "Querying
DEAD?" message, if pseries_cpu_die() had waited a little longer it would
have seen the CPU in the stopped state.

What I think is going on is that CPU 134 was inactive at the time it was
unplugged. In that case, dlpar_offline_cpu() calls H_PROD on that CPU and
immediately calls pseries_cpu_die(). Meanwhile, the prodded CPU wakes up
and starts the process of stopping itself. The busy loop is not long enough
to allow the CPU to wake up and complete the stopping process.

This can be a problem because if the busy loop finishes too early, the
kernel may offline another CPU before the previous one has finished dying,
leading to two concurrent calls to rtas-stop-self, which is prohibited
by PAPR.

Since the hotplug machinery already assumes that cpu_die() is going to
work, we can simply loop until the CPU stops.

Also change the loop to wait 100 µs between each call to
smp_query_cpu_stopped() to avoid querying RTAS too often.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Analyzed-by: Gautham R Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

I have seen this problem since v4.8. Should this patch go to stable as
well?

Changes since v3:
- Changed to loop until the CPU stops rather than for a fixed amount
  of time.

Changes since v2:
- Increased busy loop to 200 iterations so that it can last up to 20 ms
  (suggested by Gautham).
- Changed commit message to include Gautham's remarks.

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 97feb6e79f1a..d75cee60644c 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -214,13 +214,17 @@ static void pseries_cpu_die(unsigned int cpu)
 			msleep(1);
 		}
 	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
-
-		for (tries = 0; tries < 25; tries++) {
+		/*
+		 * rtas_stop_self() panics if the CPU fails to stop and our
+		 * callers already assume that we are going to succeed, so we
+		 * can just loop until the CPU stops.
+		 */
+		while (true) {
 			cpu_status = smp_query_cpu_stopped(pcpu);
 			if (cpu_status == QCSS_STOPPED ||
 			    cpu_status == QCSS_HARDWARE_ERROR)
 				break;
-			cpu_relax();
+			udelay(100);
 		}
 	}
 


* Re: [PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU
  2019-04-23 22:39 ` Thiago Jung Bauermann
@ 2019-04-30 16:34 ` Nathan Lynch
  2019-04-30 19:59     ` Thiago Jung Bauermann
  0 siblings, 1 reply; 9+ messages in thread
From: Nathan Lynch @ 2019-04-30 16:34 UTC (permalink / raw)
  To: Thiago Jung Bauermann, linuxppc-dev
  Cc: Gautham R Shenoy, linux-kernel, Nicholas Piggin,
	Michael Bringmann, Tyrel Datwyler, Vaidyanathan Srinivasan,
	Thiago Jung Bauermann

Thiago Jung Bauermann <bauerman@linux.ibm.com> writes:
> This can be a problem because if the busy loop finishes too early, the
> kernel may offline another CPU before the previous one has finished dying,
> leading to two concurrent calls to rtas-stop-self, which is prohibited
> by PAPR.
>
> Since the hotplug machinery already assumes that cpu_die() is going to
> work, we can simply loop until the CPU stops.
>
> Also change the loop to wait 100 µs between each call to
> smp_query_cpu_stopped() to avoid querying RTAS too often.

[...]

> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 97feb6e79f1a..d75cee60644c 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -214,13 +214,17 @@ static void pseries_cpu_die(unsigned int cpu)
>  			msleep(1);
>  		}
>  	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
> -
> -		for (tries = 0; tries < 25; tries++) {
> +		/*
> +		 * rtas_stop_self() panics if the CPU fails to stop and our
> +		 * callers already assume that we are going to succeed, so we
> +		 * can just loop until the CPU stops.
> +		 */
> +		while (true) {
>  			cpu_status = smp_query_cpu_stopped(pcpu);
>  			if (cpu_status == QCSS_STOPPED ||
>  			    cpu_status == QCSS_HARDWARE_ERROR)
>  				break;
> -			cpu_relax();
> +			udelay(100);
>  		}
>  	}

I agree with looping indefinitely but doesn't it need a cond_resched()
or similar check?



* Re: [PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU
  2019-04-30 16:34 ` Nathan Lynch
@ 2019-04-30 19:59     ` Thiago Jung Bauermann
  0 siblings, 0 replies; 9+ messages in thread
From: Thiago Jung Bauermann @ 2019-04-30 19:59 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: linuxppc-dev, Gautham R Shenoy, linux-kernel, Nicholas Piggin,
	Michael Bringmann, Tyrel Datwyler, Vaidyanathan Srinivasan


Hello Nathan,

Thanks for reviewing the patch!

Nathan Lynch <nathanl@linux.ibm.com> writes:

> Thiago Jung Bauermann <bauerman@linux.ibm.com> writes:
>> This can be a problem because if the busy loop finishes too early, the
>> kernel may offline another CPU before the previous one has finished dying,
>> leading to two concurrent calls to rtas-stop-self, which is prohibited
>> by PAPR.
>>
>> Since the hotplug machinery already assumes that cpu_die() is going to
>> work, we can simply loop until the CPU stops.
>>
>> Also change the loop to wait 100 µs between each call to
>> smp_query_cpu_stopped() to avoid querying RTAS too often.
>
> [...]
>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index 97feb6e79f1a..d75cee60644c 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -214,13 +214,17 @@ static void pseries_cpu_die(unsigned int cpu)
>>  			msleep(1);
>>  		}
>>  	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>> -
>> -		for (tries = 0; tries < 25; tries++) {
>> +		/*
>> +		 * rtas_stop_self() panics if the CPU fails to stop and our
>> +		 * callers already assume that we are going to succeed, so we
>> +		 * can just loop until the CPU stops.
>> +		 */
>> +		while (true) {
>>  			cpu_status = smp_query_cpu_stopped(pcpu);
>>  			if (cpu_status == QCSS_STOPPED ||
>>  			    cpu_status == QCSS_HARDWARE_ERROR)
>>  				break;
>> -			cpu_relax();
>> +			udelay(100);
>>  		}
>>  	}
>
> I agree with looping indefinitely but doesn't it need a cond_resched()
> or similar check?

If there's no kernel or hypervisor bug, it shouldn't take more than a
few tens of ms for this loop to complete (Gautham measured a maximum of
10 ms on a POWER9 with an earlier version of this patch).

In case of bugs related to CPU hotplug (either in the kernel or the
hypervisor), I was hoping that the resulting lockup warnings would be a
good indicator that something is wrong. :-)

Though perhaps adding a cond_resched() every 10 ms or so, with a
WARN_ON() if it loops for more than 50 ms would be better.
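
Something along these lines, perhaps (a rough sketch only; the iteration
counts assume the 100 µs delay per query):

	int i = 0;

	while (true) {
		cpu_status = smp_query_cpu_stopped(pcpu);
		if (cpu_status == QCSS_STOPPED ||
		    cpu_status == QCSS_HARDWARE_ERROR)
			break;

		udelay(100);
		if (++i % 100 == 0)	/* every ~10 ms */
			cond_resched();
	}
	WARN_ON(i > 500);	/* looped for more than ~50 ms */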

I'll send an alternative patch.

--
Thiago Jung Bauermann
IBM Linux Technology Center

* Re: [PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU
  2019-04-30 19:59     ` Thiago Jung Bauermann
@ 2019-05-01 14:57       ` Nathan Lynch
  0 siblings, 0 replies; 9+ messages in thread
From: Nathan Lynch @ 2019-05-01 14:57 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: linuxppc-dev, Gautham R Shenoy, linux-kernel, Nicholas Piggin,
	Michael Bringmann, Tyrel Datwyler, Vaidyanathan Srinivasan

Hi Thiago,

Thiago Jung Bauermann <bauerman@linux.ibm.com> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> Thiago Jung Bauermann <bauerman@linux.ibm.com> writes:
>>> +		while (true) {
>>>  			cpu_status = smp_query_cpu_stopped(pcpu);
>>>  			if (cpu_status == QCSS_STOPPED ||
>>>  			    cpu_status == QCSS_HARDWARE_ERROR)
>>>  				break;
>>> -			cpu_relax();
>>> +			udelay(100);
>>>  		}
>>>  	}
>>
>> I agree with looping indefinitely but doesn't it need a cond_resched()
>> or similar check?
>
> If there's no kernel or hypervisor bug, it shouldn't take more than a
> few tens of ms for this loop to complete (Gautham measured a maximum of
> 10 ms on a POWER9 with an earlier version of this patch).

10ms is twice the default scheduler quantum...


> In case of bugs related to CPU hotplug (either in the kernel or the
> hypervisor), I was hoping that the resulting lockup warnings would be a
> good indicator that something is wrong. :-)

Not convinced we should assume something is wrong if it takes a few
dozen ms to complete the operation. AFAIK we don't have any guarantees
about the maximum latency of stop-self, and it can be affected by other
activity in the system, whether we're in shared processor mode, etc. Not
to mention smp_query_cpu_stopped has to acquire the global RTAS lock and
be serialized with other tasks calling into RTAS. So I am concerned
about generating spurious warnings here.

If for whatever reason the operation is taking too long, drmgr or
whichever application is initiating the change will appear to stop
making progress. It's not too hard to find out what's going on with
facilities like perf or /proc/pid/stack.
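
For example, assuming drmgr is the process driving the removal:

	# where is it blocked in the kernel?
	cat /proc/$(pidof drmgr)/stack

	# or sample it for a few seconds
	perf record -g -p $(pidof drmgr) -- sleep 5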


> Though perhaps adding a cond_resched() every 10 ms or so, with a
> WARN_ON() if it loops for more than 50 ms would be better.

A warning doesn't seem appropriate to me, and cond_resched should be
invoked in each iteration. Or just msleep(1) in each iteration would be
fine, I think.
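
i.e. something like this (untested):

	while (true) {
		cpu_status = smp_query_cpu_stopped(pcpu);
		if (cpu_status == QCSS_STOPPED ||
		    cpu_status == QCSS_HARDWARE_ERROR)
			break;
		msleep(1);
	}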

But I'd like to bring in some more context -- here is the body of
pseries_cpu_die:

static void pseries_cpu_die(unsigned int cpu)
{
	int tries;
	int cpu_status = 1;
	unsigned int pcpu = get_hard_smp_processor_id(cpu);

	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
		cpu_status = 1;
		for (tries = 0; tries < 5000; tries++) {
			if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
				cpu_status = 0;
				break;
			}
			msleep(1);
		}
	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {

		for (tries = 0; tries < 25; tries++) {
			cpu_status = smp_query_cpu_stopped(pcpu);
			if (cpu_status == QCSS_STOPPED ||
			    cpu_status == QCSS_HARDWARE_ERROR)
				break;
			cpu_relax();
		}
	}

This patch alters the behavior of the second loop (the CPU_STATE_OFFLINE
branch). The CPU_STATE_INACTIVE branch is used when the offline behavior
is to use H_CEDE instead of stop-self, correct?

And isn't entering H_CEDE expected to be quite a bit faster than
stop-self? If so, why does that path get five whole seconds[*] while
we're bikeshedding about tens of milliseconds for stop-self? :-)

[*] And should it be made to retry indefinitely as well?


* Re: [PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU
  2019-05-01 14:57       ` Nathan Lynch
@ 2019-05-01 23:12         ` Nicholas Piggin
  0 siblings, 0 replies; 9+ messages in thread
From: Nicholas Piggin @ 2019-05-01 23:12 UTC (permalink / raw)
  To: Thiago Jung Bauermann, Nathan Lynch
  Cc: Gautham R Shenoy, linux-kernel, linuxppc-dev, Michael Bringmann,
	Vaidyanathan Srinivasan, Tyrel Datwyler

Nathan Lynch's message of May 2, 2019 12:57 am:
> Hi Thiago,
> 
> Thiago Jung Bauermann <bauerman@linux.ibm.com> writes:
>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>> Thiago Jung Bauermann <bauerman@linux.ibm.com> writes:
>>>> +		while (true) {
>>>>  			cpu_status = smp_query_cpu_stopped(pcpu);
>>>>  			if (cpu_status == QCSS_STOPPED ||
>>>>  			    cpu_status == QCSS_HARDWARE_ERROR)
>>>>  				break;
>>>> -			cpu_relax();
>>>> +			udelay(100);
>>>>  		}
>>>>  	}
>>>
>>> I agree with looping indefinitely but doesn't it need a cond_resched()
>>> or similar check?
>>
>> If there's no kernel or hypervisor bug, it shouldn't take more than a
>> few tens of ms for this loop to complete (Gautham measured a maximum of
>> 10 ms on a POWER9 with an earlier version of this patch).
> 
> 10ms is twice the default scheduler quantum...
> 
> 
>> In case of bugs related to CPU hotplug (either in the kernel or the
>> hypervisor), I was hoping that the resulting lockup warnings would be a
>> good indicator that something is wrong. :-)
> 
> Not convinced we should assume something is wrong if it takes a few
> dozen ms to complete the operation.

Right, and if there is no kernel or hypervisor bug then it will stop
eventually :)

> AFAIK we don't have any guarantees
> about the maximum latency of stop-self, and it can be affected by other
> activity in the system, whether we're in shared processor mode, etc. Not
> to mention smp_query_cpu_stopped has to acquire the global RTAS lock and
> be serialized with other tasks calling into RTAS. So I am concerned
> about generating spurious warnings here.

Agreed.

> 
> If for whatever reason the operation is taking too long, drmgr or
> whichever application is initiating the change will appear to stop
> making progress. It's not too hard to find out what's going on with
> facilities like perf or /proc/pid/stack.
> 
> 
>> Though perhaps adding a cond_resched() every 10 ms or so, with a
>> WARN_ON() if it loops for more than 50 ms would be better.
> 
> A warning doesn't seem appropriate to me, and cond_resched should be
> invoked in each iteration. Or just msleep(1) in each iteration would be
> fine, I think.
> 
> But I'd like to bring in some more context -- here is the body of
> pseries_cpu_die:
> 
> static void pseries_cpu_die(unsigned int cpu)
> {
> 	int tries;
> 	int cpu_status = 1;
> 	unsigned int pcpu = get_hard_smp_processor_id(cpu);
> 
> 	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
> 		cpu_status = 1;
> 		for (tries = 0; tries < 5000; tries++) {
> 			if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
> 				cpu_status = 0;
> 				break;
> 			}
> 			msleep(1);
> 		}
> 	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
> 
> 		for (tries = 0; tries < 25; tries++) {
> 			cpu_status = smp_query_cpu_stopped(pcpu);
> 			if (cpu_status == QCSS_STOPPED ||
> 			    cpu_status == QCSS_HARDWARE_ERROR)
> 				break;
> 			cpu_relax();
> 		}
> }
> 
> This patch alters the behavior of the second loop (the CPU_STATE_OFFLINE
> branch). The CPU_STATE_INACTIVE branch is used when the offline behavior
> is to use H_CEDE instead of stop-self, correct?
> 
> And isn't entering H_CEDE expected to be quite a bit faster than
> stop-self? If so, why does that path get five whole seconds[*] while
> we're bikeshedding about tens of milliseconds for stop-self? :-)
> 
> [*] And should it be made to retry indefinitely as well?

I think so.
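
E.g. the CPU_STATE_INACTIVE branch could become something like (sketch
only):

	while (get_cpu_current_state(cpu) != CPU_STATE_INACTIVE)
		msleep(1);
	cpu_status = 0;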

Thanks,
Nick
