linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration
@ 2019-06-21  6:05 Nathan Lynch
  2019-06-24 16:52 ` mmc
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Nathan Lynch @ 2019-06-21  6:05 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: ego, mmc, julietk

The protocol for suspending or migrating an LPAR requires all present
processor threads to enter H_JOIN. So if we have threads offline, we
have to temporarily bring them up. This can race with administrator
actions such as SMT state changes. As of dfd718a2ed1f ("powerpc/rtas:
Fix a potential race between CPU-Offline & Migration"),
rtas_ibm_suspend_me() accounts for this, but errors out with -EBUSY
for what almost certainly is a transient condition in any reasonable
scenario.

Callers of rtas_ibm_suspend_me() already retry when -EAGAIN is
returned, and it is typical during a migration for that to happen
repeatedly for several minutes polling the H_VASI_STATE hcall result
before proceeding to the next stage.

So return -EAGAIN instead of -EBUSY when this race is
encountered. Additionally: logging this event is still appropriate but
use pr_info instead of pr_err; and remove use of unlikely() while here
as this is not a hot path at all.

Fixes: dfd718a2ed1f ("powerpc/rtas: Fix a potential race between CPU-Offline & Migration")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index fbc676160adf..9b4d2a2ffb4f 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -984,10 +984,9 @@ int rtas_ibm_suspend_me(u64 handle)
 	cpu_hotplug_disable();
 
 	/* Check if we raced with a CPU-Offline Operation */
-	if (unlikely(!cpumask_equal(cpu_present_mask, cpu_online_mask))) {
-		pr_err("%s: Raced against a concurrent CPU-Offline\n",
-		       __func__);
-		atomic_set(&data.error, -EBUSY);
+	if (!cpumask_equal(cpu_present_mask, cpu_online_mask)) {
+		pr_info("%s: Raced against a concurrent CPU-Offline\n", __func__);
+		atomic_set(&data.error, -EAGAIN);
 		goto out_hotplug_enable;
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration
  2019-06-21  6:05 [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration Nathan Lynch
@ 2019-06-24 16:52 ` mmc
  2019-06-24 17:23   ` Nathan Lynch
  2019-07-01 22:12 ` Nathan Lynch
  2019-07-03 14:27 ` Michael Ellerman
  2 siblings, 1 reply; 11+ messages in thread
From: mmc @ 2019-06-24 16:52 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: ego, linuxppc-dev, julietk

On 2019-06-21 00:05, Nathan Lynch wrote:
> The protocol for suspending or migrating an LPAR requires all present
> processor threads to enter H_JOIN. So if we have threads offline, we
> have to temporarily bring them up. This can race with administrator
> actions such as SMT state changes. As of dfd718a2ed1f ("powerpc/rtas:
> Fix a potential race between CPU-Offline & Migration"),
> rtas_ibm_suspend_me() accounts for this, but errors out with -EBUSY
> for what almost certainly is a transient condition in any reasonable
> scenario.
> 
> Callers of rtas_ibm_suspend_me() already retry when -EAGAIN is
> returned, and it is typical during a migration for that to happen
> repeatedly for several minutes polling the H_VASI_STATE hcall result
> before proceeding to the next stage.
> 
> So return -EAGAIN instead of -EBUSY when this race is
> encountered. Additionally: logging this event is still appropriate but
> use pr_info instead of pr_err; and remove use of unlikely() while here
> as this is not a hot path at all.
> 

Looks good, since it's not a hot path anyway, so unlikely() should 
benefit from optimize compiler path, and should stay. No?

Mingming
> Fixes: dfd718a2ed1f ("powerpc/rtas: Fix a potential race between
> CPU-Offline & Migration")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> ---
>  arch/powerpc/kernel/rtas.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index fbc676160adf..9b4d2a2ffb4f 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -984,10 +984,9 @@ int rtas_ibm_suspend_me(u64 handle)
>  	cpu_hotplug_disable();
> 
>  	/* Check if we raced with a CPU-Offline Operation */
> -	if (unlikely(!cpumask_equal(cpu_present_mask, cpu_online_mask))) {
> -		pr_err("%s: Raced against a concurrent CPU-Offline\n",
> -		       __func__);
> -		atomic_set(&data.error, -EBUSY);
> +	if (!cpumask_equal(cpu_present_mask, cpu_online_mask)) {
> +		pr_info("%s: Raced against a concurrent CPU-Offline\n", __func__);
> +		atomic_set(&data.error, -EAGAIN);
>  		goto out_hotplug_enable;
>  	}


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration
  2019-06-24 16:52 ` mmc
@ 2019-06-24 17:23   ` Nathan Lynch
  2019-06-25  1:02     ` Juliet Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Nathan Lynch @ 2019-06-24 17:23 UTC (permalink / raw)
  To: mmc; +Cc: ego, linuxppc-dev, julietk

Hi Mingming,

mmc <mmc@linux.vnet.ibm.com> writes:
> On 2019-06-21 00:05, Nathan Lynch wrote:
>> So return -EAGAIN instead of -EBUSY when this race is
>> encountered. Additionally: logging this event is still appropriate but
>> use pr_info instead of pr_err; and remove use of unlikely() while here
>> as this is not a hot path at all.
>
> Looks good, since it's not a hot path anyway, so unlikely() should 
> benefit from optimize compiler path, and should stay. No?

The latency of this path in rtas_ibm_suspend_me() in the best case is
around 2-3 seconds.

So I think not -- this is such a heavyweight and relatively
seldom-executed path that the unlikely() cannot yield any discernible
performance benefit, and its presence imposes a readability cost.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration
  2019-06-24 17:23   ` Nathan Lynch
@ 2019-06-25  1:02     ` Juliet Kim
  2019-06-25 18:51       ` Nathan Lynch
  0 siblings, 1 reply; 11+ messages in thread
From: Juliet Kim @ 2019-06-25  1:02 UTC (permalink / raw)
  To: Nathan Lynch, mmc; +Cc: ego, linuxppc-dev, julietk

There's some concern this could retry forever, resulting in live lock.  
Another approach would be to abandon the attempt and fail the LPM 
request. 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-June/192531.html

On 6/24/19 12:23 PM, Nathan Lynch wrote:
> Hi Mingming,
>
> mmc <mmc@linux.vnet.ibm.com> writes:
>> On 2019-06-21 00:05, Nathan Lynch wrote:
>>> So return -EAGAIN instead of -EBUSY when this race is
>>> encountered. Additionally: logging this event is still appropriate but
>>> use pr_info instead of pr_err; and remove use of unlikely() while here
>>> as this is not a hot path at all.
>> Looks good, since it's not a hot path anyway, so unlikely() should
>> benefit from optimize compiler path, and should stay. No?
> The latency of this path in rtas_ibm_suspend_me() in the best case is
> around 2-3 seconds.
>
> So I think not -- this is such a heavyweight and relatively
> seldom-executed path that the unlikely() cannot yield any discernible
> performance benefit, and its presence imposes a readability cost.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration
  2019-06-25  1:02     ` Juliet Kim
@ 2019-06-25 18:51       ` Nathan Lynch
  2019-06-26 21:40         ` Juliet Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Nathan Lynch @ 2019-06-25 18:51 UTC (permalink / raw)
  To: Juliet Kim; +Cc: ego, mmc, linuxppc-dev

Juliet Kim <julietk@linux.vnet.ibm.com> writes:

> There's some concern this could retry forever, resulting in live lock.

First of all the system will make progress in other areas even if there
are repeated retries; we're not indefinitely holding locks or anything
like that.

Second, Linux checks the H_VASI_STATE result on every retry. If the
platform wants to terminate the migration (say, if it imposes a
timeout), Linux will abandon it when H_VASI_STATE fails to return
H_VASI_SUSPENDING. And it seems incorrect to bail out before that
happens, absent hard errors on the Linux side such as allocation
failures.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration
  2019-06-25 18:51       ` Nathan Lynch
@ 2019-06-26 21:40         ` Juliet Kim
  2019-06-27  5:01           ` Michael Ellerman
  0 siblings, 1 reply; 11+ messages in thread
From: Juliet Kim @ 2019-06-26 21:40 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: ego, mmc, linuxppc-dev


On 6/25/19 1:51 PM, Nathan Lynch wrote:
> Juliet Kim <julietk@linux.vnet.ibm.com> writes:
>
>> There's some concern this could retry forever, resulting in live lock.
> First of all the system will make progress in other areas even if there
> are repeated retries; we're not indefinitely holding locks or anything
> like that.

For instance, system admin runs a script that picks and offlines CPUs in a
loop to keep a certain rate of onlined CPUs for energy saving. If LPM keeps
putting CPUs back online, that would never finish, and would keepgenerating
new offline requests

> Second, Linux checks the H_VASI_STATE result on every retry. If the
> platform wants to terminate the migration (say, if it imposes a
> timeout), Linux will abandon it when H_VASI_STATE fails to return
> H_VASI_SUSPENDING. And it seems incorrect to bail out before that
> happens, absent hard errors on the Linux side such as allocation
> failures.
I confirmed with the PHYP and HMC folks that they wouldn't time out the LPM
request including H_VASI_STATE, so if the LPM retries were unlucky enough to
encounter repeated CPU offline attempts (maybe some customer code retrying
that), then the retries could continue indefinitely, or until some manual
intervention.  And in the mean time, the LPM delay here would cause PHYP to
block other operations.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration
  2019-06-26 21:40         ` Juliet Kim
@ 2019-06-27  5:01           ` Michael Ellerman
  2019-06-27 21:59             ` Juliet Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Ellerman @ 2019-06-27  5:01 UTC (permalink / raw)
  To: Juliet Kim, Nathan Lynch; +Cc: ego, mmc, linuxppc-dev

Juliet Kim <julietk@linux.vnet.ibm.com> writes:
> On 6/25/19 1:51 PM, Nathan Lynch wrote:
>> Juliet Kim <julietk@linux.vnet.ibm.com> writes:
>>
>>> There's some concern this could retry forever, resulting in live lock.
>> First of all the system will make progress in other areas even if there
>> are repeated retries; we're not indefinitely holding locks or anything
>> like that.
>
> For instance, system admin runs a script that picks and offlines CPUs in a
> loop to keep a certain rate of onlined CPUs for energy saving. If LPM keeps
> putting CPUs back online, that would never finish, and would keepgenerating
> new offline requests
>
>> Second, Linux checks the H_VASI_STATE result on every retry. If the
>> platform wants to terminate the migration (say, if it imposes a
>> timeout), Linux will abandon it when H_VASI_STATE fails to return
>> H_VASI_SUSPENDING. And it seems incorrect to bail out before that
>> happens, absent hard errors on the Linux side such as allocation
>> failures.
> I confirmed with the PHYP and HMC folks that they wouldn't time out the LPM
> request including H_VASI_STATE, so if the LPM retries were unlucky enough to
> encounter repeated CPU offline attempts (maybe some customer code retrying
> that), then the retries could continue indefinitely, or until some manual
> intervention.  And in the mean time, the LPM delay here would cause PHYP to
> block other operations.

That sounds like a PHYP bug to me.

cheers

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration
  2019-06-27  5:01           ` Michael Ellerman
@ 2019-06-27 21:59             ` Juliet Kim
  0 siblings, 0 replies; 11+ messages in thread
From: Juliet Kim @ 2019-06-27 21:59 UTC (permalink / raw)
  To: Michael Ellerman, Nathan Lynch; +Cc: ego, mmc, linuxppc-dev


On 6/27/19 12:01 AM, Michael Ellerman wrote:
> Juliet Kim <julietk@linux.vnet.ibm.com> writes:
>> On 6/25/19 1:51 PM, Nathan Lynch wrote:
>>> Juliet Kim <julietk@linux.vnet.ibm.com> writes:
>>>
>>>> There's some concern this could retry forever, resulting in live lock.
>>> First of all the system will make progress in other areas even if there
>>> are repeated retries; we're not indefinitely holding locks or anything
>>> like that.
>> For instance, system admin runs a script that picks and offlines CPUs in a
>> loop to keep a certain rate of onlined CPUs for energy saving. If LPM keeps
>> putting CPUs back online, that would never finish, and would keepgenerating
>> new offline requests
>>
>>> Second, Linux checks the H_VASI_STATE result on every retry. If the
>>> platform wants to terminate the migration (say, if it imposes a
>>> timeout), Linux will abandon it when H_VASI_STATE fails to return
>>> H_VASI_SUSPENDING. And it seems incorrect to bail out before that
>>> happens, absent hard errors on the Linux side such as allocation
>>> failures.
>> I confirmed with the PHYP and HMC folks that they wouldn't time out the LPM
>> request including H_VASI_STATE, so if the LPM retries were unlucky enough to
>> encounter repeated CPU offline attempts (maybe some customer code retrying
>> that), then the retries could continue indefinitely, or until some manual
>> intervention.  And in the mean time, the LPM delay here would cause PHYP to
>> block other operations.
> That sounds like a PHYP bug to me.
>
> cheers


PHYP doesn’t time out because they have no idea how long it will take for OS to
get to the point that it suspends. Other OS allows application to prepare for LPM.
They cannot predict the length of time that is appropriate in all cases and in any
case, it’s unlikely they’d make a change to that would come in time to help with
the current issue.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration
  2019-06-21  6:05 [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration Nathan Lynch
  2019-06-24 16:52 ` mmc
@ 2019-07-01 22:12 ` Nathan Lynch
  2019-07-02  2:16   ` Michael Ellerman
  2019-07-03 14:27 ` Michael Ellerman
  2 siblings, 1 reply; 11+ messages in thread
From: Nathan Lynch @ 2019-07-01 22:12 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: ego, mmc, linuxppc-dev, julietk

Hi Michael,

Nathan Lynch <nathanl@linux.ibm.com> writes:
> The protocol for suspending or migrating an LPAR requires all present
> processor threads to enter H_JOIN. So if we have threads offline, we
> have to temporarily bring them up. This can race with administrator
> actions such as SMT state changes. As of dfd718a2ed1f ("powerpc/rtas:
> Fix a potential race between CPU-Offline & Migration"),

snowpatch/checkpatch flagged an error in my commit message here:

  ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit
  <12+ chars of sha1> ("<title line>")' - ie: 'commit dfd718a2ed1f
  ("powerpc/rtas: Fix a potential race between CPU-Offline &
  Migration")'

I see this is in your next-test branch though. Should I fix the commit
message and resend?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration
  2019-07-01 22:12 ` Nathan Lynch
@ 2019-07-02  2:16   ` Michael Ellerman
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Ellerman @ 2019-07-02  2:16 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: ego, mmc, linuxppc-dev, julietk

Nathan Lynch <nathanl@linux.ibm.com> writes:
> Hi Michael,
>
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> The protocol for suspending or migrating an LPAR requires all present
>> processor threads to enter H_JOIN. So if we have threads offline, we
>> have to temporarily bring them up. This can race with administrator
>> actions such as SMT state changes. As of dfd718a2ed1f ("powerpc/rtas:
>> Fix a potential race between CPU-Offline & Migration"),
>
> snowpatch/checkpatch flagged an error in my commit message here:
>
>   ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit
>   <12+ chars of sha1> ("<title line>")' - ie: 'commit dfd718a2ed1f
>   ("powerpc/rtas: Fix a potential race between CPU-Offline &
>   Migration")'
>
> I see this is in your next-test branch though. Should I fix the commit
> message and resend?

No that's fine.

You have a Fixes tag which is formatted correctly and that's what
matters IMO.

Ideally we could control that check to not complain about it being split
across lines when there's a fixes tag as well.

cheers

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration
  2019-06-21  6:05 [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration Nathan Lynch
  2019-06-24 16:52 ` mmc
  2019-07-01 22:12 ` Nathan Lynch
@ 2019-07-03 14:27 ` Michael Ellerman
  2 siblings, 0 replies; 11+ messages in thread
From: Michael Ellerman @ 2019-07-03 14:27 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: ego, mmc, julietk

On Fri, 2019-06-21 at 06:05:18 UTC, Nathan Lynch wrote:
> The protocol for suspending or migrating an LPAR requires all present
> processor threads to enter H_JOIN. So if we have threads offline, we
> have to temporarily bring them up. This can race with administrator
> actions such as SMT state changes. As of dfd718a2ed1f ("powerpc/rtas:
> Fix a potential race between CPU-Offline & Migration"),
> rtas_ibm_suspend_me() accounts for this, but errors out with -EBUSY
> for what almost certainly is a transient condition in any reasonable
> scenario.
> 
> Callers of rtas_ibm_suspend_me() already retry when -EAGAIN is
> returned, and it is typical during a migration for that to happen
> repeatedly for several minutes polling the H_VASI_STATE hcall result
> before proceeding to the next stage.
> 
> So return -EAGAIN instead of -EBUSY when this race is
> encountered. Additionally: logging this event is still appropriate but
> use pr_info instead of pr_err; and remove use of unlikely() while here
> as this is not a hot path at all.
> 
> Fixes: dfd718a2ed1f ("powerpc/rtas: Fix a potential race between CPU-Offline & Migration")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/9fb603050ffd94f8127df99c699cca2f575eb6a0

cheers

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-07-03 14:44 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-21  6:05 [PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration Nathan Lynch
2019-06-24 16:52 ` mmc
2019-06-24 17:23   ` Nathan Lynch
2019-06-25  1:02     ` Juliet Kim
2019-06-25 18:51       ` Nathan Lynch
2019-06-26 21:40         ` Juliet Kim
2019-06-27  5:01           ` Michael Ellerman
2019-06-27 21:59             ` Juliet Kim
2019-07-01 22:12 ` Nathan Lynch
2019-07-02  2:16   ` Michael Ellerman
2019-07-03 14:27 ` Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).