linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early
@ 2020-03-02  2:03 Gavin Shan
  2020-03-02 12:21 ` Mark Rutland
  2020-03-17 18:32 ` Catalin Marinas
  0 siblings, 2 replies; 8+ messages in thread
From: Gavin Shan @ 2020-03-02  2:03 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: catalin.marinas, will, maz, mark.rutland, shan.gavin

The function __cpu_up() is invoked to bring up the target CPU through
the backend, PSCI for example. The nested if statements won't be needed
if we bail out early on the following two conditions where the status
won't be checked. The code looks simplified in that case.

   * Error returned from the backend (e.g. PSCI)
   * The target CPU has been marked as onlined

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/kernel/smp.c | 79 +++++++++++++++++++----------------------
 1 file changed, 37 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d4ed9a19d8fe..2a9d8f39dc58 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -115,60 +115,55 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 	update_cpu_boot_status(CPU_MMU_OFF);
 	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
 
-	/*
-	 * Now bring the CPU into our world.
-	 */
+	/* Now bring the CPU into our world */
 	ret = boot_secondary(cpu, idle);
-	if (ret == 0) {
-		/*
-		 * CPU was successfully started, wait for it to come online or
-		 * time out.
-		 */
-		wait_for_completion_timeout(&cpu_running,
-					    msecs_to_jiffies(5000));
-
-		if (!cpu_online(cpu)) {
-			pr_crit("CPU%u: failed to come online\n", cpu);
-			ret = -EIO;
-		}
-	} else {
+	if (ret) {
 		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
 		return ret;
 	}
 
+	/*
+	 * CPU was successfully started, wait for it to come online or
+	 * time out.
+	 */
+	wait_for_completion_timeout(&cpu_running,
+				    msecs_to_jiffies(5000));
+	if (cpu_online(cpu))
+		return 0;
+
+	pr_crit("CPU%u: failed to come online\n", cpu);
 	secondary_data.task = NULL;
 	secondary_data.stack = NULL;
 	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
 	status = READ_ONCE(secondary_data.status);
-	if (ret && status) {
-
-		if (status == CPU_MMU_OFF)
-			status = READ_ONCE(__early_cpu_boot_status);
+	if (status == CPU_MMU_OFF)
+		status = READ_ONCE(__early_cpu_boot_status);
 
-		switch (status & CPU_BOOT_STATUS_MASK) {
-		default:
-			pr_err("CPU%u: failed in unknown state : 0x%lx\n",
-					cpu, status);
-			cpus_stuck_in_kernel++;
-			break;
-		case CPU_KILL_ME:
-			if (!op_cpu_kill(cpu)) {
-				pr_crit("CPU%u: died during early boot\n", cpu);
-				break;
-			}
-			pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
-			/* Fall through */
-		case CPU_STUCK_IN_KERNEL:
-			pr_crit("CPU%u: is stuck in kernel\n", cpu);
-			if (status & CPU_STUCK_REASON_52_BIT_VA)
-				pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
-			if (status & CPU_STUCK_REASON_NO_GRAN)
-				pr_crit("CPU%u: does not support %luK granule \n", cpu, PAGE_SIZE / SZ_1K);
-			cpus_stuck_in_kernel++;
+	switch (status & CPU_BOOT_STATUS_MASK) {
+	default:
+		pr_err("CPU%u: failed in unknown state : 0x%lx\n",
+		       cpu, status);
+		cpus_stuck_in_kernel++;
+		break;
+	case CPU_KILL_ME:
+		if (!op_cpu_kill(cpu)) {
+			pr_crit("CPU%u: died during early boot\n", cpu);
 			break;
-		case CPU_PANIC_KERNEL:
-			panic("CPU%u detected unsupported configuration\n", cpu);
 		}
+		pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
+		/* Fall through */
+	case CPU_STUCK_IN_KERNEL:
+		pr_crit("CPU%u: is stuck in kernel\n", cpu);
+		if (status & CPU_STUCK_REASON_52_BIT_VA)
+			pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
+		if (status & CPU_STUCK_REASON_NO_GRAN) {
+			pr_crit("CPU%u: does not support %luK granule\n",
+				cpu, PAGE_SIZE / SZ_1K);
+		}
+		cpus_stuck_in_kernel++;
+		break;
+	case CPU_PANIC_KERNEL:
+		panic("CPU%u detected unsupported configuration\n", cpu);
 	}
 
 	return ret;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early
  2020-03-02  2:03 [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early Gavin Shan
@ 2020-03-02 12:21 ` Mark Rutland
  2020-03-02 13:38   ` Gavin Shan
  2020-03-17 10:06   ` Mark Rutland
  2020-03-17 18:32 ` Catalin Marinas
  1 sibling, 2 replies; 8+ messages in thread
From: Mark Rutland @ 2020-03-02 12:21 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, will, maz, shan.gavin

On Mon, Mar 02, 2020 at 01:03:40PM +1100, Gavin Shan wrote:
> The function __cpu_up() is invoked to bring up the target CPU through
> the backend, PSCI for example. The nested if statements won't be needed
> if we bail out early on the following two conditions where the status
> won't be checked. The code looks simplified in that case.
> 
>    * Error returned from the backend (e.g. PSCI)
>    * The target CPU has been marked as onlined
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>

FWIW, this looks like a nice cleanup to me:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

While this patch leaves secondary_data.{task,stack} stale on a
successful onlining, that was already the case for a timeout, and should
be fine (since the next attempt at onlining will configure those before
poking the CPU).

Thanks,
Mark.

> ---
>  arch/arm64/kernel/smp.c | 79 +++++++++++++++++++----------------------
>  1 file changed, 37 insertions(+), 42 deletions(-)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index d4ed9a19d8fe..2a9d8f39dc58 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -115,60 +115,55 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  	update_cpu_boot_status(CPU_MMU_OFF);
>  	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
>  
> -	/*
> -	 * Now bring the CPU into our world.
> -	 */
> +	/* Now bring the CPU into our world */
>  	ret = boot_secondary(cpu, idle);
> -	if (ret == 0) {
> -		/*
> -		 * CPU was successfully started, wait for it to come online or
> -		 * time out.
> -		 */
> -		wait_for_completion_timeout(&cpu_running,
> -					    msecs_to_jiffies(5000));
> -
> -		if (!cpu_online(cpu)) {
> -			pr_crit("CPU%u: failed to come online\n", cpu);
> -			ret = -EIO;
> -		}
> -	} else {
> +	if (ret) {
>  		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
>  		return ret;
>  	}
>  
> +	/*
> +	 * CPU was successfully started, wait for it to come online or
> +	 * time out.
> +	 */
> +	wait_for_completion_timeout(&cpu_running,
> +				    msecs_to_jiffies(5000));
> +	if (cpu_online(cpu))
> +		return 0;
> +
> +	pr_crit("CPU%u: failed to come online\n", cpu);
>  	secondary_data.task = NULL;
>  	secondary_data.stack = NULL;
>  	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
>  	status = READ_ONCE(secondary_data.status);
> -	if (ret && status) {
> -
> -		if (status == CPU_MMU_OFF)
> -			status = READ_ONCE(__early_cpu_boot_status);
> +	if (status == CPU_MMU_OFF)
> +		status = READ_ONCE(__early_cpu_boot_status);
>  
> -		switch (status & CPU_BOOT_STATUS_MASK) {
> -		default:
> -			pr_err("CPU%u: failed in unknown state : 0x%lx\n",
> -					cpu, status);
> -			cpus_stuck_in_kernel++;
> -			break;
> -		case CPU_KILL_ME:
> -			if (!op_cpu_kill(cpu)) {
> -				pr_crit("CPU%u: died during early boot\n", cpu);
> -				break;
> -			}
> -			pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
> -			/* Fall through */
> -		case CPU_STUCK_IN_KERNEL:
> -			pr_crit("CPU%u: is stuck in kernel\n", cpu);
> -			if (status & CPU_STUCK_REASON_52_BIT_VA)
> -				pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
> -			if (status & CPU_STUCK_REASON_NO_GRAN)
> -				pr_crit("CPU%u: does not support %luK granule \n", cpu, PAGE_SIZE / SZ_1K);
> -			cpus_stuck_in_kernel++;
> +	switch (status & CPU_BOOT_STATUS_MASK) {
> +	default:
> +		pr_err("CPU%u: failed in unknown state : 0x%lx\n",
> +		       cpu, status);
> +		cpus_stuck_in_kernel++;
> +		break;
> +	case CPU_KILL_ME:
> +		if (!op_cpu_kill(cpu)) {
> +			pr_crit("CPU%u: died during early boot\n", cpu);
>  			break;
> -		case CPU_PANIC_KERNEL:
> -			panic("CPU%u detected unsupported configuration\n", cpu);
>  		}
> +		pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
> +		/* Fall through */
> +	case CPU_STUCK_IN_KERNEL:
> +		pr_crit("CPU%u: is stuck in kernel\n", cpu);
> +		if (status & CPU_STUCK_REASON_52_BIT_VA)
> +			pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
> +		if (status & CPU_STUCK_REASON_NO_GRAN) {
> +			pr_crit("CPU%u: does not support %luK granule\n",
> +				cpu, PAGE_SIZE / SZ_1K);
> +		}
> +		cpus_stuck_in_kernel++;
> +		break;
> +	case CPU_PANIC_KERNEL:
> +		panic("CPU%u detected unsupported configuration\n", cpu);
>  	}
>  
>  	return ret;
> -- 
> 2.23.0
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early
  2020-03-02 12:21 ` Mark Rutland
@ 2020-03-02 13:38   ` Gavin Shan
  2020-03-02 14:06     ` Mark Rutland
  2020-03-17 10:06   ` Mark Rutland
  1 sibling, 1 reply; 8+ messages in thread
From: Gavin Shan @ 2020-03-02 13:38 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, will, maz, shan.gavin

On 3/2/20 11:21 PM, Mark Rutland wrote:
> On Mon, Mar 02, 2020 at 01:03:40PM +1100, Gavin Shan wrote:
>> The function __cpu_up() is invoked to bring up the target CPU through
>> the backend, PSCI for example. The nested if statements won't be needed
>> if we bail out early on the following two conditions where the status
>> won't be checked. The code looks simplified in that case.
>>
>>     * Error returned from the backend (e.g. PSCI)
>>     * The target CPU has been marked as onlined
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
> 
> FWIW, this looks like a nice cleanup to me:
> 
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> 
> While this patch leaves secondary_data.{task,stack} stale on a
> successful onlining, that was already the case for a timeout, and should
> be fine (since the next attempt at onlining will configure those before
> poking the CPU).
> 
> Thanks,
> Mark.
> 

Thanks, Mark. Yeah, it should be fine as you said. There are something else,
which might be not relevant. @secondary_data could be accessed by multiple CPUs
in parallel. For example, the master CPU boots CPU#1 and timeouts to wait it
to be online in 5 seconds. CPU#1 isn't necessarily stuck in somewhere. After
that, CPU#2 is brought up and might be accessing @secondary_data. At this point,
CPU#1 can come back to access it either. However, @secondary_data isn't valid
for CPU#1 anymore.

I was thinking of something to improve the situation, but not sure if it makes
any sense to do so. There are several options: (1) Make @secondary_data per-cpu
variable, which looks a nature way to go. (2) To shutdown the CPU on timeout.
The shutdown request can be failed to be served in theory, but it seems still
an improvement.

Thanks,
Gavin

>> ---
>>   arch/arm64/kernel/smp.c | 79 +++++++++++++++++++----------------------
>>   1 file changed, 37 insertions(+), 42 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index d4ed9a19d8fe..2a9d8f39dc58 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -115,60 +115,55 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>>   	update_cpu_boot_status(CPU_MMU_OFF);
>>   	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
>>   
>> -	/*
>> -	 * Now bring the CPU into our world.
>> -	 */
>> +	/* Now bring the CPU into our world */
>>   	ret = boot_secondary(cpu, idle);
>> -	if (ret == 0) {
>> -		/*
>> -		 * CPU was successfully started, wait for it to come online or
>> -		 * time out.
>> -		 */
>> -		wait_for_completion_timeout(&cpu_running,
>> -					    msecs_to_jiffies(5000));
>> -
>> -		if (!cpu_online(cpu)) {
>> -			pr_crit("CPU%u: failed to come online\n", cpu);
>> -			ret = -EIO;
>> -		}
>> -	} else {
>> +	if (ret) {
>>   		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
>>   		return ret;
>>   	}
>>   
>> +	/*
>> +	 * CPU was successfully started, wait for it to come online or
>> +	 * time out.
>> +	 */
>> +	wait_for_completion_timeout(&cpu_running,
>> +				    msecs_to_jiffies(5000));
>> +	if (cpu_online(cpu))
>> +		return 0;
>> +
>> +	pr_crit("CPU%u: failed to come online\n", cpu);
>>   	secondary_data.task = NULL;
>>   	secondary_data.stack = NULL;
>>   	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
>>   	status = READ_ONCE(secondary_data.status);
>> -	if (ret && status) {
>> -
>> -		if (status == CPU_MMU_OFF)
>> -			status = READ_ONCE(__early_cpu_boot_status);
>> +	if (status == CPU_MMU_OFF)
>> +		status = READ_ONCE(__early_cpu_boot_status);
>>   
>> -		switch (status & CPU_BOOT_STATUS_MASK) {
>> -		default:
>> -			pr_err("CPU%u: failed in unknown state : 0x%lx\n",
>> -					cpu, status);
>> -			cpus_stuck_in_kernel++;
>> -			break;
>> -		case CPU_KILL_ME:
>> -			if (!op_cpu_kill(cpu)) {
>> -				pr_crit("CPU%u: died during early boot\n", cpu);
>> -				break;
>> -			}
>> -			pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
>> -			/* Fall through */
>> -		case CPU_STUCK_IN_KERNEL:
>> -			pr_crit("CPU%u: is stuck in kernel\n", cpu);
>> -			if (status & CPU_STUCK_REASON_52_BIT_VA)
>> -				pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
>> -			if (status & CPU_STUCK_REASON_NO_GRAN)
>> -				pr_crit("CPU%u: does not support %luK granule \n", cpu, PAGE_SIZE / SZ_1K);
>> -			cpus_stuck_in_kernel++;
>> +	switch (status & CPU_BOOT_STATUS_MASK) {
>> +	default:
>> +		pr_err("CPU%u: failed in unknown state : 0x%lx\n",
>> +		       cpu, status);
>> +		cpus_stuck_in_kernel++;
>> +		break;
>> +	case CPU_KILL_ME:
>> +		if (!op_cpu_kill(cpu)) {
>> +			pr_crit("CPU%u: died during early boot\n", cpu);
>>   			break;
>> -		case CPU_PANIC_KERNEL:
>> -			panic("CPU%u detected unsupported configuration\n", cpu);
>>   		}
>> +		pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
>> +		/* Fall through */
>> +	case CPU_STUCK_IN_KERNEL:
>> +		pr_crit("CPU%u: is stuck in kernel\n", cpu);
>> +		if (status & CPU_STUCK_REASON_52_BIT_VA)
>> +			pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
>> +		if (status & CPU_STUCK_REASON_NO_GRAN) {
>> +			pr_crit("CPU%u: does not support %luK granule\n",
>> +				cpu, PAGE_SIZE / SZ_1K);
>> +		}
>> +		cpus_stuck_in_kernel++;
>> +		break;
>> +	case CPU_PANIC_KERNEL:
>> +		panic("CPU%u detected unsupported configuration\n", cpu);
>>   	}
>>   
>>   	return ret;
>> -- 
>> 2.23.0
>>
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early
  2020-03-02 13:38   ` Gavin Shan
@ 2020-03-02 14:06     ` Mark Rutland
  2020-03-02 14:35       ` Gavin Shan
  0 siblings, 1 reply; 8+ messages in thread
From: Mark Rutland @ 2020-03-02 14:06 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, will, maz, shan.gavin

On Tue, Mar 03, 2020 at 12:38:48AM +1100, Gavin Shan wrote:
> On 3/2/20 11:21 PM, Mark Rutland wrote:
> > On Mon, Mar 02, 2020 at 01:03:40PM +1100, Gavin Shan wrote:
> > > The function __cpu_up() is invoked to bring up the target CPU through
> > > the backend, PSCI for example. The nested if statements won't be needed
> > > if we bail out early on the following two conditions where the status
> > > won't be checked. The code looks simplified in that case.
> > > 
> > >     * Error returned from the backend (e.g. PSCI)
> > >     * The target CPU has been marked as onlined
> > > 
> > > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > 
> > FWIW, this looks like a nice cleanup to me:
> > 
> > Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> > 
> > While this patch leaves secondary_data.{task,stack} stale on a
> > successful onlining, that was already the case for a timeout, and should
> > be fine (since the next attempt at onlining will configure those before
> > poking the CPU).
> > 
> > Thanks,
> > Mark.
> > 
> 
> Thanks, Mark. Yeah, it should be fine as you said. There are something else,
> which might be not relevant. @secondary_data could be accessed by multiple CPUs
> in parallel. For example, the master CPU boots CPU#1 and timeouts to wait it
> to be online in 5 seconds. CPU#1 isn't necessarily stuck in somewhere. After
> that, CPU#2 is brought up and might be accessing @secondary_data. At this point,
> CPU#1 can come back to access it either. However, @secondary_data isn't valid
> for CPU#1 anymore.

Sure; I'm aware of improvements that could be made here, but I don't
think they need to block this patch.

> I was thinking of something to improve the situation, but not sure if it makes
> any sense to do so. There are several options: (1) Make @secondary_data per-cpu
> variable, which looks a nature way to go. (2) To shutdown the CPU on timeout.
> The shutdown request can be failed to be served in theory, but it seems still
> an improvement.

I think #2 is a bad idea, since if the CPU gets into the kernel at all,
it may have done stuff (e.g. acquiring locks), and ripping it out is
liable to cause more problems.

I think doing #1 might be nice, but some caveats apply.

I'd like to clean up the secondary stack/task hand-over to use an atomic
cmpxchg pair, so that we can detect when the secondary has possibly
tried to use the stack/task. That requires splitting that from the
MMU-off bits from the MMU-on bits, and I'm not sure how well that
interacts with #1. It might mean that the per-cpu part isn't that
worthwhile.

Thanks,
Mark.

> 
> Thanks,
> Gavin
> 
> > > ---
> > >   arch/arm64/kernel/smp.c | 79 +++++++++++++++++++----------------------
> > >   1 file changed, 37 insertions(+), 42 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > index d4ed9a19d8fe..2a9d8f39dc58 100644
> > > --- a/arch/arm64/kernel/smp.c
> > > +++ b/arch/arm64/kernel/smp.c
> > > @@ -115,60 +115,55 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> > >   	update_cpu_boot_status(CPU_MMU_OFF);
> > >   	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
> > > -	/*
> > > -	 * Now bring the CPU into our world.
> > > -	 */
> > > +	/* Now bring the CPU into our world */
> > >   	ret = boot_secondary(cpu, idle);
> > > -	if (ret == 0) {
> > > -		/*
> > > -		 * CPU was successfully started, wait for it to come online or
> > > -		 * time out.
> > > -		 */
> > > -		wait_for_completion_timeout(&cpu_running,
> > > -					    msecs_to_jiffies(5000));
> > > -
> > > -		if (!cpu_online(cpu)) {
> > > -			pr_crit("CPU%u: failed to come online\n", cpu);
> > > -			ret = -EIO;
> > > -		}
> > > -	} else {
> > > +	if (ret) {
> > >   		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> > >   		return ret;
> > >   	}
> > > +	/*
> > > +	 * CPU was successfully started, wait for it to come online or
> > > +	 * time out.
> > > +	 */
> > > +	wait_for_completion_timeout(&cpu_running,
> > > +				    msecs_to_jiffies(5000));
> > > +	if (cpu_online(cpu))
> > > +		return 0;
> > > +
> > > +	pr_crit("CPU%u: failed to come online\n", cpu);
> > >   	secondary_data.task = NULL;
> > >   	secondary_data.stack = NULL;
> > >   	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
> > >   	status = READ_ONCE(secondary_data.status);
> > > -	if (ret && status) {
> > > -
> > > -		if (status == CPU_MMU_OFF)
> > > -			status = READ_ONCE(__early_cpu_boot_status);
> > > +	if (status == CPU_MMU_OFF)
> > > +		status = READ_ONCE(__early_cpu_boot_status);
> > > -		switch (status & CPU_BOOT_STATUS_MASK) {
> > > -		default:
> > > -			pr_err("CPU%u: failed in unknown state : 0x%lx\n",
> > > -					cpu, status);
> > > -			cpus_stuck_in_kernel++;
> > > -			break;
> > > -		case CPU_KILL_ME:
> > > -			if (!op_cpu_kill(cpu)) {
> > > -				pr_crit("CPU%u: died during early boot\n", cpu);
> > > -				break;
> > > -			}
> > > -			pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
> > > -			/* Fall through */
> > > -		case CPU_STUCK_IN_KERNEL:
> > > -			pr_crit("CPU%u: is stuck in kernel\n", cpu);
> > > -			if (status & CPU_STUCK_REASON_52_BIT_VA)
> > > -				pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
> > > -			if (status & CPU_STUCK_REASON_NO_GRAN)
> > > -				pr_crit("CPU%u: does not support %luK granule \n", cpu, PAGE_SIZE / SZ_1K);
> > > -			cpus_stuck_in_kernel++;
> > > +	switch (status & CPU_BOOT_STATUS_MASK) {
> > > +	default:
> > > +		pr_err("CPU%u: failed in unknown state : 0x%lx\n",
> > > +		       cpu, status);
> > > +		cpus_stuck_in_kernel++;
> > > +		break;
> > > +	case CPU_KILL_ME:
> > > +		if (!op_cpu_kill(cpu)) {
> > > +			pr_crit("CPU%u: died during early boot\n", cpu);
> > >   			break;
> > > -		case CPU_PANIC_KERNEL:
> > > -			panic("CPU%u detected unsupported configuration\n", cpu);
> > >   		}
> > > +		pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
> > > +		/* Fall through */
> > > +	case CPU_STUCK_IN_KERNEL:
> > > +		pr_crit("CPU%u: is stuck in kernel\n", cpu);
> > > +		if (status & CPU_STUCK_REASON_52_BIT_VA)
> > > +			pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
> > > +		if (status & CPU_STUCK_REASON_NO_GRAN) {
> > > +			pr_crit("CPU%u: does not support %luK granule\n",
> > > +				cpu, PAGE_SIZE / SZ_1K);
> > > +		}
> > > +		cpus_stuck_in_kernel++;
> > > +		break;
> > > +	case CPU_PANIC_KERNEL:
> > > +		panic("CPU%u detected unsupported configuration\n", cpu);
> > >   	}
> > >   	return ret;
> > > -- 
> > > 2.23.0
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early
  2020-03-02 14:06     ` Mark Rutland
@ 2020-03-02 14:35       ` Gavin Shan
  0 siblings, 0 replies; 8+ messages in thread
From: Gavin Shan @ 2020-03-02 14:35 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, will, maz, shan.gavin

On 3/3/20 1:06 AM, Mark Rutland wrote:
> On Tue, Mar 03, 2020 at 12:38:48AM +1100, Gavin Shan wrote:
>> On 3/2/20 11:21 PM, Mark Rutland wrote:
>>> On Mon, Mar 02, 2020 at 01:03:40PM +1100, Gavin Shan wrote:
>>>> The function __cpu_up() is invoked to bring up the target CPU through
>>>> the backend, PSCI for example. The nested if statements won't be needed
>>>> if we bail out early on the following two conditions where the status
>>>> won't be checked. The code looks simplified in that case.
>>>>
>>>>      * Error returned from the backend (e.g. PSCI)
>>>>      * The target CPU has been marked as onlined
>>>>
>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>
>>> FWIW, this looks like a nice cleanup to me:
>>>
>>> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
>>>
>>> While this patch leaves secondary_data.{task,stack} stale on a
>>> successful onlining, that was already the case for a timeout, and should
>>> be fine (since the next attempt at onlining will configure those before
>>> poking the CPU).
>>>
>>> Thanks,
>>> Mark.
>>>
>>
>> Thanks, Mark. Yeah, it should be fine as you said. There are something else,
>> which might be not relevant. @secondary_data could be accessed by multiple CPUs
>> in parallel. For example, the master CPU boots CPU#1 and timeouts to wait it
>> to be online in 5 seconds. CPU#1 isn't necessarily stuck in somewhere. After
>> that, CPU#2 is brought up and might be accessing @secondary_data. At this point,
>> CPU#1 can come back to access it either. However, @secondary_data isn't valid
>> for CPU#1 anymore.
> 
> Sure; I'm aware of improvements that could be made here, but I don't
> think they need to block this patch.
> 

Yep, I think this patch is ready to go in if nobody else objects.

>> I was thinking of something to improve the situation, but not sure if it makes
>> any sense to do so. There are several options: (1) Make @secondary_data per-cpu
>> variable, which looks a nature way to go. (2) To shutdown the CPU on timeout.
>> The shutdown request can be failed to be served in theory, but it seems still
>> an improvement.
> 
> I think #2 is a bad idea, since if the CPU gets into the kernel at all,
> it may have done stuff (e.g. acquiring locks), and ripping it out is
> liable to cause more problems.
> 
> I think doing #1 might be nice, but some caveats apply.
> 
> I'd like to clean up the secondary stack/task hand-over to use an atomic
> cmpxchg pair, so that we can detect when the secondary has possibly
> tried to use the stack/task. That requires splitting that from the
> MMU-off bits from the MMU-on bits, and I'm not sure how well that
> interacts with #1. It might mean that the per-cpu part isn't that
> worthwhile.
> 

Right, #2 isn't good if the acquired resource (e.g. lock) can't be released.

It's something like to introduce a lock to the shared @secondary_data with
atomic cmpxchg pair. With that, I don't think per-cpu part is needed, not
so useful at least.

Thanks,
Gavin

>>
>>>> ---
>>>>    arch/arm64/kernel/smp.c | 79 +++++++++++++++++++----------------------
>>>>    1 file changed, 37 insertions(+), 42 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>>>> index d4ed9a19d8fe..2a9d8f39dc58 100644
>>>> --- a/arch/arm64/kernel/smp.c
>>>> +++ b/arch/arm64/kernel/smp.c
>>>> @@ -115,60 +115,55 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>>>>    	update_cpu_boot_status(CPU_MMU_OFF);
>>>>    	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
>>>> -	/*
>>>> -	 * Now bring the CPU into our world.
>>>> -	 */
>>>> +	/* Now bring the CPU into our world */
>>>>    	ret = boot_secondary(cpu, idle);
>>>> -	if (ret == 0) {
>>>> -		/*
>>>> -		 * CPU was successfully started, wait for it to come online or
>>>> -		 * time out.
>>>> -		 */
>>>> -		wait_for_completion_timeout(&cpu_running,
>>>> -					    msecs_to_jiffies(5000));
>>>> -
>>>> -		if (!cpu_online(cpu)) {
>>>> -			pr_crit("CPU%u: failed to come online\n", cpu);
>>>> -			ret = -EIO;
>>>> -		}
>>>> -	} else {
>>>> +	if (ret) {
>>>>    		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
>>>>    		return ret;
>>>>    	}
>>>> +	/*
>>>> +	 * CPU was successfully started, wait for it to come online or
>>>> +	 * time out.
>>>> +	 */
>>>> +	wait_for_completion_timeout(&cpu_running,
>>>> +				    msecs_to_jiffies(5000));
>>>> +	if (cpu_online(cpu))
>>>> +		return 0;
>>>> +
>>>> +	pr_crit("CPU%u: failed to come online\n", cpu);
>>>>    	secondary_data.task = NULL;
>>>>    	secondary_data.stack = NULL;
>>>>    	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
>>>>    	status = READ_ONCE(secondary_data.status);
>>>> -	if (ret && status) {
>>>> -
>>>> -		if (status == CPU_MMU_OFF)
>>>> -			status = READ_ONCE(__early_cpu_boot_status);
>>>> +	if (status == CPU_MMU_OFF)
>>>> +		status = READ_ONCE(__early_cpu_boot_status);
>>>> -		switch (status & CPU_BOOT_STATUS_MASK) {
>>>> -		default:
>>>> -			pr_err("CPU%u: failed in unknown state : 0x%lx\n",
>>>> -					cpu, status);
>>>> -			cpus_stuck_in_kernel++;
>>>> -			break;
>>>> -		case CPU_KILL_ME:
>>>> -			if (!op_cpu_kill(cpu)) {
>>>> -				pr_crit("CPU%u: died during early boot\n", cpu);
>>>> -				break;
>>>> -			}
>>>> -			pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
>>>> -			/* Fall through */
>>>> -		case CPU_STUCK_IN_KERNEL:
>>>> -			pr_crit("CPU%u: is stuck in kernel\n", cpu);
>>>> -			if (status & CPU_STUCK_REASON_52_BIT_VA)
>>>> -				pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
>>>> -			if (status & CPU_STUCK_REASON_NO_GRAN)
>>>> -				pr_crit("CPU%u: does not support %luK granule \n", cpu, PAGE_SIZE / SZ_1K);
>>>> -			cpus_stuck_in_kernel++;
>>>> +	switch (status & CPU_BOOT_STATUS_MASK) {
>>>> +	default:
>>>> +		pr_err("CPU%u: failed in unknown state : 0x%lx\n",
>>>> +		       cpu, status);
>>>> +		cpus_stuck_in_kernel++;
>>>> +		break;
>>>> +	case CPU_KILL_ME:
>>>> +		if (!op_cpu_kill(cpu)) {
>>>> +			pr_crit("CPU%u: died during early boot\n", cpu);
>>>>    			break;
>>>> -		case CPU_PANIC_KERNEL:
>>>> -			panic("CPU%u detected unsupported configuration\n", cpu);
>>>>    		}
>>>> +		pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
>>>> +		/* Fall through */
>>>> +	case CPU_STUCK_IN_KERNEL:
>>>> +		pr_crit("CPU%u: is stuck in kernel\n", cpu);
>>>> +		if (status & CPU_STUCK_REASON_52_BIT_VA)
>>>> +			pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
>>>> +		if (status & CPU_STUCK_REASON_NO_GRAN) {
>>>> +			pr_crit("CPU%u: does not support %luK granule\n",
>>>> +				cpu, PAGE_SIZE / SZ_1K);
>>>> +		}
>>>> +		cpus_stuck_in_kernel++;
>>>> +		break;
>>>> +	case CPU_PANIC_KERNEL:
>>>> +		panic("CPU%u detected unsupported configuration\n", cpu);
>>>>    	}
>>>>    	return ret;
>>>> -- 
>>>> 2.23.0
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early
  2020-03-02 12:21 ` Mark Rutland
  2020-03-02 13:38   ` Gavin Shan
@ 2020-03-17 10:06   ` Mark Rutland
  2020-03-17 10:08     ` Catalin Marinas
  1 sibling, 1 reply; 8+ messages in thread
From: Mark Rutland @ 2020-03-17 10:06 UTC (permalink / raw)
  To: Gavin Shan, catalin.marinas
  Cc: linux-arm-kernel, linux-kernel, will, maz, shan.gavin

On Mon, Mar 02, 2020 at 12:21:35PM +0000, Mark Rutland wrote:
> On Mon, Mar 02, 2020 at 01:03:40PM +1100, Gavin Shan wrote:
> > The function __cpu_up() is invoked to bring up the target CPU through
> > the backend, PSCI for example. The nested if statements won't be needed
> > if we bail out early on the following two conditions where the status
> > won't be checked. The code looks simplified in that case.
> > 
> >    * Error returned from the backend (e.g. PSCI)
> >    * The target CPU has been marked as onlined
> > 
> > Signed-off-by: Gavin Shan <gshan@redhat.com>
> 
> FWIW, this looks like a nice cleanup to me:
> 
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Catalin, are you happy to pick this up?

Thanks,
Mark.

> While this patch leaves secondary_data.{task,stack} stale on a
> successful onlining, that was already the case for a timeout, and should
> be fine (since the next attempt at onlining will configure those before
> poking the CPU).
> 
> Thanks,
> Mark.
> 
> > ---
> >  arch/arm64/kernel/smp.c | 79 +++++++++++++++++++----------------------
> >  1 file changed, 37 insertions(+), 42 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index d4ed9a19d8fe..2a9d8f39dc58 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -115,60 +115,55 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> >  	update_cpu_boot_status(CPU_MMU_OFF);
> >  	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
> >  
> > -	/*
> > -	 * Now bring the CPU into our world.
> > -	 */
> > +	/* Now bring the CPU into our world */
> >  	ret = boot_secondary(cpu, idle);
> > -	if (ret == 0) {
> > -		/*
> > -		 * CPU was successfully started, wait for it to come online or
> > -		 * time out.
> > -		 */
> > -		wait_for_completion_timeout(&cpu_running,
> > -					    msecs_to_jiffies(5000));
> > -
> > -		if (!cpu_online(cpu)) {
> > -			pr_crit("CPU%u: failed to come online\n", cpu);
> > -			ret = -EIO;
> > -		}
> > -	} else {
> > +	if (ret) {
> >  		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> >  		return ret;
> >  	}
> >  
> > +	/*
> > +	 * CPU was successfully started, wait for it to come online or
> > +	 * time out.
> > +	 */
> > +	wait_for_completion_timeout(&cpu_running,
> > +				    msecs_to_jiffies(5000));
> > +	if (cpu_online(cpu))
> > +		return 0;
> > +
> > +	pr_crit("CPU%u: failed to come online\n", cpu);
> >  	secondary_data.task = NULL;
> >  	secondary_data.stack = NULL;
> >  	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
> >  	status = READ_ONCE(secondary_data.status);
> > -	if (ret && status) {
> > -
> > -		if (status == CPU_MMU_OFF)
> > -			status = READ_ONCE(__early_cpu_boot_status);
> > +	if (status == CPU_MMU_OFF)
> > +		status = READ_ONCE(__early_cpu_boot_status);
> >  
> > -		switch (status & CPU_BOOT_STATUS_MASK) {
> > -		default:
> > -			pr_err("CPU%u: failed in unknown state : 0x%lx\n",
> > -					cpu, status);
> > -			cpus_stuck_in_kernel++;
> > -			break;
> > -		case CPU_KILL_ME:
> > -			if (!op_cpu_kill(cpu)) {
> > -				pr_crit("CPU%u: died during early boot\n", cpu);
> > -				break;
> > -			}
> > -			pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
> > -			/* Fall through */
> > -		case CPU_STUCK_IN_KERNEL:
> > -			pr_crit("CPU%u: is stuck in kernel\n", cpu);
> > -			if (status & CPU_STUCK_REASON_52_BIT_VA)
> > -				pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
> > -			if (status & CPU_STUCK_REASON_NO_GRAN)
> > -				pr_crit("CPU%u: does not support %luK granule \n", cpu, PAGE_SIZE / SZ_1K);
> > -			cpus_stuck_in_kernel++;
> > +	switch (status & CPU_BOOT_STATUS_MASK) {
> > +	default:
> > +		pr_err("CPU%u: failed in unknown state : 0x%lx\n",
> > +		       cpu, status);
> > +		cpus_stuck_in_kernel++;
> > +		break;
> > +	case CPU_KILL_ME:
> > +		if (!op_cpu_kill(cpu)) {
> > +			pr_crit("CPU%u: died during early boot\n", cpu);
> >  			break;
> > -		case CPU_PANIC_KERNEL:
> > -			panic("CPU%u detected unsupported configuration\n", cpu);
> >  		}
> > +		pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
> > +		/* Fall through */
> > +	case CPU_STUCK_IN_KERNEL:
> > +		pr_crit("CPU%u: is stuck in kernel\n", cpu);
> > +		if (status & CPU_STUCK_REASON_52_BIT_VA)
> > +			pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
> > +		if (status & CPU_STUCK_REASON_NO_GRAN) {
> > +			pr_crit("CPU%u: does not support %luK granule\n",
> > +				cpu, PAGE_SIZE / SZ_1K);
> > +		}
> > +		cpus_stuck_in_kernel++;
> > +		break;
> > +	case CPU_PANIC_KERNEL:
> > +		panic("CPU%u detected unsupported configuration\n", cpu);
> >  	}
> >  
> >  	return ret;
> > -- 
> > 2.23.0
> > 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early
  2020-03-17 10:06   ` Mark Rutland
@ 2020-03-17 10:08     ` Catalin Marinas
  0 siblings, 0 replies; 8+ messages in thread
From: Catalin Marinas @ 2020-03-17 10:08 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Gavin Shan, linux-arm-kernel, linux-kernel, will, maz, shan.gavin

On Tue, Mar 17, 2020 at 10:06:09AM +0000, Mark Rutland wrote:
> On Mon, Mar 02, 2020 at 12:21:35PM +0000, Mark Rutland wrote:
> > On Mon, Mar 02, 2020 at 01:03:40PM +1100, Gavin Shan wrote:
> > > The function __cpu_up() is invoked to bring up the target CPU through
> > > the backend, PSCI for example. The nested if statements won't be needed
> > > if we bail out early on the following two conditions where the status
> > > won't be checked. The code looks simplified in that case.
> > > 
> > >    * Error returned from the backend (e.g. PSCI)
> > >    * The target CPU has been marked as onlined
> > > 
> > > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > 
> > FWIW, this looks like a nice cleanup to me:
> > 
> > Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> 
> Catalin, are you happy to pick this up?

Yes, it was on my list to pick up already, just haven't got around to it
yet.

-- 
Catalin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early
  2020-03-02  2:03 [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early Gavin Shan
  2020-03-02 12:21 ` Mark Rutland
@ 2020-03-17 18:32 ` Catalin Marinas
  1 sibling, 0 replies; 8+ messages in thread
From: Catalin Marinas @ 2020-03-17 18:32 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linux-arm-kernel, linux-kernel, will, maz, mark.rutland, shan.gavin

On Mon, Mar 02, 2020 at 01:03:40PM +1100, Gavin Shan wrote:
> The function __cpu_up() is invoked to bring up the target CPU through
> the backend, PSCI for example. The nested if statements won't be needed
> if we bail out early on the following two conditions where the status
> won't be checked. The code looks simplified in that case.
> 
>    * Error returned from the backend (e.g. PSCI)
>    * The target CPU has been marked as onlined
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>

Queued for 5.7. Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-03-17 18:32 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-02  2:03 [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early Gavin Shan
2020-03-02 12:21 ` Mark Rutland
2020-03-02 13:38   ` Gavin Shan
2020-03-02 14:06     ` Mark Rutland
2020-03-02 14:35       ` Gavin Shan
2020-03-17 10:06   ` Mark Rutland
2020-03-17 10:08     ` Catalin Marinas
2020-03-17 18:32 ` Catalin Marinas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).