linux-kernel.vger.kernel.org archive mirror
* [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup again
From: Longpeng(Mike) @ 2021-09-01  5:11 UTC
  To: peterz, valentin.schneider, mingo, tglx, bigeasy
  Cc: linux-kernel, arei.gonglei, Longpeng(Mike)

The cpu's cpu_hotplug_state is set to CPU_UP_PREPARE before the cpu
is woken up, but it is not reset when the bringup fails. The user then
cannot bring the cpu online anymore, because the stale CPU_UP_PREPARE
state makes cpu_check_up_prepare() unhappy.

We should allow the user to try again in this case.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
---
 kernel/smpboot.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index f6bc0bc..d18f8ff 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
 		 */
 		return -EAGAIN;
 
+	case CPU_UP_PREPARE:
+		/*
+		 * The CPU failed to come up last time, so allow the user
+		 * to try starting it up again.
+		 */
+		return 0;
+
 	default:
 
 		/* Should not happen.  Famous last words. */
-- 
1.8.3.1
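
For readers who want to see the effect of the new case outside the kernel,
here is a minimal userspace sketch of the check. It is a model only, not the
code in kernel/smpboot.c: the model_* names and the allow_retry flag are made
up for illustration, the enum values are arbitrary, and the -EIO result for
the default arm is an assumption about the unpatched function.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative subset of the hotplug states; values are arbitrary here. */
enum model_cpu_state {
	MODEL_CPU_POST_DEAD,
	MODEL_CPU_UP_PREPARE,
	MODEL_CPU_BROKEN,
	MODEL_CPU_ONLINE,
};

/*
 * Userspace model of the check. 'allow_retry' stands in for the proposed
 * "case CPU_UP_PREPARE" arm; with it cleared, a stale UP_PREPARE state
 * falls through to the error path.
 */
static int model_check_up_prepare(enum model_cpu_state *state, bool allow_retry)
{
	switch (*state) {
	case MODEL_CPU_POST_DEAD:
		/* Clean offline: arm the state for the next bringup. */
		*state = MODEL_CPU_UP_PREPARE;
		return 0;
	case MODEL_CPU_BROKEN:
		return -EAGAIN;
	case MODEL_CPU_UP_PREPARE:
		if (allow_retry)
			return 0;	/* proposed behavior: let the user retry */
		return -EIO;		/* assumed result of the default arm */
	default:
		return -EIO;
	}
}
```

In this model the first call on a MODEL_CPU_POST_DEAD cpu succeeds and leaves
the state at MODEL_CPU_UP_PREPARE. A second call then fails unless allow_retry
is set, which is the situation the patch is meant to address.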



* Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup again
From: Sebastian Andrzej Siewior @ 2021-09-30 14:01 UTC
  To: Longpeng(Mike)
  Cc: peterz, valentin.schneider, mingo, tglx, linux-kernel, arei.gonglei

On 2021-09-01 13:11:43 [+0800], Longpeng(Mike) wrote:
> The cpu's cpu_hotplug_state is set to CPU_UP_PREPARE before the cpu
> is woken up, but it is not reset when the bringup fails. The user then
> cannot bring the cpu online anymore, because the stale CPU_UP_PREPARE
> state makes cpu_check_up_prepare() unhappy.
> 
> We should allow the user to try again in this case.

Can you please describe where it failed / how you reached that state?

Sebastian


* RE: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup again
From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) @ 2021-10-08  3:10 UTC
  To: Sebastian Andrzej Siewior
  Cc: peterz, valentin.schneider, mingo, tglx, linux-kernel, Gonglei (Arei)



> -----Original Message-----
> From: Sebastian Andrzej Siewior [mailto:bigeasy@linutronix.de]
> Sent: Thursday, September 30, 2021 10:01 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@huawei.com>
> Cc: peterz@infradead.org; valentin.schneider@arm.com; mingo@kernel.org;
> tglx@linutronix.de; linux-kernel@vger.kernel.org; Gonglei (Arei)
> <arei.gonglei@huawei.com>
> Subject: Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup
> again
> 
> On 2021-09-01 13:11:43 [+0800], Longpeng(Mike) wrote:
> > The cpu's cpu_hotplug_state is set to CPU_UP_PREPARE before the cpu
> > is woken up, but it is not reset when the bringup fails. The user then
> > cannot bring the cpu online anymore, because the stale CPU_UP_PREPARE
> > state makes cpu_check_up_prepare() unhappy.
> >
> > We should allow the user to try again in this case.
> 
> Can you please describe where it failed / how you reached that state?
> 

native_cpu_up
  cpu_check_up_prepare
  do_boot_cpu
    /* Wait 10s total for first sign of life from AP */

It will fail if the AP doesn't respond within 10s, and cpu_hotplug_state
then stays in the CPU_UP_PREPARE state.

This could happen on a virtualized system, especially in some special use
cases, e.g. Software Enclaves [1][2]

[1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
[2] https://www.alibabacloud.com/help/doc-detail/203433.htm?spm=a3c0i.23986742.6981761520.1.7e30715eZCRXmk
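
To make the failure path above concrete, here is a small userspace sketch of
the sequence (check, then wait for the AP, then time out). It is a model under
stated assumptions rather than the real x86 code: every sketch_* name, the
ap_answers flag, the poll count and the -ETIMEDOUT return value are
hypothetical and only mirror the shape of the call chain.

```c
#include <errno.h>
#include <stdbool.h>

enum sketch_state { SKETCH_POST_DEAD, SKETCH_UP_PREPARE, SKETCH_ONLINE };

struct sketch_cpu {
	enum sketch_state state;
	bool ap_answers;	/* hypothetical: does the AP ever respond? */
};

/* Stand-in for the unpatched check: only a cleanly dead cpu may start. */
static int sketch_check(struct sketch_cpu *c)
{
	if (c->state == SKETCH_POST_DEAD) {
		c->state = SKETCH_UP_PREPARE;
		return 0;
	}
	return -EIO;	/* assumed error for any unexpected state */
}

/* Stand-in for the "wait for first sign of life" loop. */
static int sketch_boot(struct sketch_cpu *c, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (c->ap_answers) {
			c->state = SKETCH_ONLINE;
			return 0;
		}
	}
	/* Timed out: note that the state is left at SKETCH_UP_PREPARE. */
	return -ETIMEDOUT;
}

/* Stand-in for the top-level bringup call. */
static int sketch_cpu_up(struct sketch_cpu *c)
{
	int err = sketch_check(c);

	if (err)
		return err;
	return sketch_boot(c, 10);
}
```

With ap_answers cleared, the first sketch_cpu_up() call times out, and every
later call is rejected by sketch_check() even if the AP would respond by then.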




* Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup again
From: Sebastian Andrzej Siewior @ 2021-11-19 17:36 UTC
  To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
  Cc: peterz, valentin.schneider, mingo, tglx, linux-kernel, Gonglei (Arei)

Sorry for forgetting…

On 2021-10-08 03:10:34 [+0000], Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> > -----Original Message-----
> > From: Sebastian Andrzej Siewior [mailto:bigeasy@linutronix.de]
> > Sent: Thursday, September 30, 2021 10:01 PM
> > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > <longpeng2@huawei.com>
> > Cc: peterz@infradead.org; valentin.schneider@arm.com; mingo@kernel.org;
> > tglx@linutronix.de; linux-kernel@vger.kernel.org; Gonglei (Arei)
> > <arei.gonglei@huawei.com>
> > Subject: Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup
> > again
> > 
> > On 2021-09-01 13:11:43 [+0800], Longpeng(Mike) wrote:
> > > The cpu's cpu_hotplug_state is set to CPU_UP_PREPARE before the cpu
> > > is woken up, but it is not reset when the bringup fails. The user then
> > > cannot bring the cpu online anymore, because the stale CPU_UP_PREPARE
> > > state makes cpu_check_up_prepare() unhappy.
> > >
> > > We should allow the user to try again in this case.
> > 
> > Can you please describe where it failed / how you reached that state?
> > 
> 
> native_cpu_up
>   cpu_check_up_prepare
>   do_boot_cpu
>     /* Wait 10s total for first sign of life from AP */
> 
> It will fail if the AP doesn't respond within 10s, and cpu_hotplug_state
> then stays in the CPU_UP_PREPARE state.
> 
> This could happen on a virtualized system, especially in some special use
> cases, e.g. Software Enclaves [1][2]

So wakeup_cpu_via_init_nmi() / wakeup_secondary_cpu() succeeds but the
CPU does not show up within 10 seconds.
Does the CPU come in later and spin in wait_for_master_cpu() or is the
CPU completely missing?

> [1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
> [2] https://www.alibabacloud.com/help/doc-detail/203433.htm?spm=a3c0i.23986742.6981761520.1.7e30715eZCRXmk

Sebastian


* RE: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup again
From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) @ 2021-11-22  0:26 UTC
  To: Sebastian Andrzej Siewior
  Cc: peterz, valentin.schneider, mingo, tglx, linux-kernel, Gonglei (Arei)



> -----Original Message-----
> From: Sebastian Andrzej Siewior [mailto:bigeasy@linutronix.de]
> Sent: Saturday, November 20, 2021 1:37 AM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@huawei.com>
> Cc: peterz@infradead.org; valentin.schneider@arm.com; mingo@kernel.org;
> tglx@linutronix.de; linux-kernel@vger.kernel.org; Gonglei (Arei)
> <arei.gonglei@huawei.com>
> Subject: Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup
> again
> 
> Sorry for forgetting…
> 
> On 2021-10-08 03:10:34 [+0000], Longpeng (Mike, Cloud Infrastructure Service
> Product Dept.) wrote:
> > > -----Original Message-----
> > > From: Sebastian Andrzej Siewior [mailto:bigeasy@linutronix.de]
> > > Sent: Thursday, September 30, 2021 10:01 PM
> > > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > <longpeng2@huawei.com>
> > > Cc: peterz@infradead.org; valentin.schneider@arm.com; mingo@kernel.org;
> > > tglx@linutronix.de; linux-kernel@vger.kernel.org; Gonglei (Arei)
> > > <arei.gonglei@huawei.com>
> > > Subject: Re: [RFC] cpu/hotplug: allow the cpu in UP_PREPARE state to bringup
> > > again
> > >
> > > On 2021-09-01 13:11:43 [+0800], Longpeng(Mike) wrote:
> > > > The cpu's cpu_hotplug_state is set to CPU_UP_PREPARE before the cpu
> > > > is woken up, but it is not reset when the bringup fails. The user then
> > > > cannot bring the cpu online anymore, because the stale CPU_UP_PREPARE
> > > > state makes cpu_check_up_prepare() unhappy.
> > > >
> > > > We should allow the user to try again in this case.
> > >
> > > Can you please describe where it failed / how you reached that state?
> > >
> >
> > native_cpu_up
> >   cpu_check_up_prepare
> >   do_boot_cpu
> >     /* Wait 10s total for first sign of life from AP */
> >
> > It will fail if the AP doesn't respond within 10s, and cpu_hotplug_state
> > then stays in the CPU_UP_PREPARE state.
> >
> > This could happen on a virtualized system, especially in some special use
> > cases, e.g. Software Enclaves [1][2]
> 
> So wakeup_cpu_via_init_nmi() / wakeup_secondary_cpu() succeeds but the
> CPU does not show up within 10 seconds.
> Does the CPU come in later and spin in wait_for_master_cpu() or is the
> CPU completely missing?
> 

The cpu is completely missing at that point, since the hypervisor can reject
all events sent to this cpu while the enclave vm is running.

But the cpu can receive events and be brought up again once the enclave vm
is terminated.
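
A rough userspace sketch of that scenario with the proposed check in place
follows. It is only a model: the demo_* names and the enclave_running flag are
hypothetical, and the hypervisor rejecting wakeup events is reduced to a
single boolean test that returns an assumed -ETIMEDOUT.

```c
#include <errno.h>
#include <stdbool.h>

enum demo_state { DEMO_POST_DEAD, DEMO_UP_PREPARE, DEMO_ONLINE };

struct demo_vcpu {
	enum demo_state state;
	bool enclave_running;	/* hypothetical: hypervisor drops wakeup events */
};

/* Patched check: a stale DEMO_UP_PREPARE is treated as "try again". */
static int demo_check_patched(struct demo_vcpu *v)
{
	switch (v->state) {
	case DEMO_POST_DEAD:
		v->state = DEMO_UP_PREPARE;
		return 0;
	case DEMO_UP_PREPARE:
		return 0;
	default:
		return -EIO;
	}
}

/* One online attempt, as the user would trigger it. */
static int demo_online(struct demo_vcpu *v)
{
	int err = demo_check_patched(v);

	if (err)
		return err;
	if (v->enclave_running)
		return -ETIMEDOUT;	/* wakeup never reaches the vcpu */
	v->state = DEMO_ONLINE;
	return 0;
}
```

In this model, attempts made while enclave_running is set keep timing out but
do not wedge the state, so the first attempt after the flag is cleared brings
the vcpu to DEMO_ONLINE.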


> > [1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
> > [2] https://www.alibabacloud.com/help/doc-detail/203433.htm?spm=a3c0i.23986742.6981761520.1.7e30715eZCRXmk
> >
> >
> > > > Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
> > > > ---
> > > >  kernel/smpboot.c | 7 +++++++
> > > >  1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> > > > index f6bc0bc..d18f8ff 100644
> > > > --- a/kernel/smpboot.c
> > > > +++ b/kernel/smpboot.c
> > > > @@ -392,6 +392,13 @@ int cpu_check_up_prepare(int cpu)
> > > >  		 */
> > > >  		return -EAGAIN;
> > > >
> > > > +	case CPU_UP_PREPARE:
> > > > +		/*
> > > > +		 * The CPU failed to bringup last time, allow the user
> > > > +		 * continue to try to start it up.
> > > > +		 */
> > > > +		return 0;
> > > > +
> > > >  	default:
> > > >
> > > >  		/* Should not happen.  Famous last words. */
> 
> Sebastian
