linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Gavin Shan <gshan@redhat.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, catalin.marinas@arm.com,
	will@kernel.org, maz@kernel.org, shan.gavin@gmail.com
Subject: Re: [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early
Date: Tue, 3 Mar 2020 01:35:54 +1100	[thread overview]
Message-ID: <803b09ec-233d-8994-3dad-5a4a4ad85412@redhat.com> (raw)
In-Reply-To: <20200302140640.GC56497@lakrids.cambridge.arm.com>

On 3/3/20 1:06 AM, Mark Rutland wrote:
> On Tue, Mar 03, 2020 at 12:38:48AM +1100, Gavin Shan wrote:
>> On 3/2/20 11:21 PM, Mark Rutland wrote:
>>> On Mon, Mar 02, 2020 at 01:03:40PM +1100, Gavin Shan wrote:
>>>> The function __cpu_up() is invoked to bring up the target CPU through
>>>> the backend, PSCI for example. The nested if statements won't be needed
>>>> if we bail out early on the following two conditions where the status
>>>> won't be checked. The code looks simplified in that case.
>>>>
>>>>      * Error returned from the backend (e.g. PSCI)
>>>>      * The target CPU has been marked as onlined
>>>>
>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>
>>> FWIW, this looks like a nice cleanup to me:
>>>
>>> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
>>>
>>> While this patch leaves secondary_data.{task,stack} stale on a
>>> successful onlining, that was already the case for a timeout, and should
>>> be fine (since the next attempt at onlining will configure those before
>>> poking the CPU).
>>>
>>> Thanks,
>>> Mark.
>>>
>>
>> Thanks, Mark. Yeah, it should be fine as you said. There are something else,
>> which might be not relevant. @secondary_data could be accessed by multiple CPUs
>> in parallel. For example, the master CPU boots CPU#1 and timeouts to wait it
>> to be online in 5 seconds. CPU#1 isn't necessarily stuck in somewhere. After
>> that, CPU#2 is brought up and might be accessing @secondary_data. At this point,
>> CPU#1 can come back to access it either. However, @secondary_data isn't valid
>> for CPU#1 anymore.
> 
> Sure; I'm aware of improvements that could be made here, but I don't
> think they need to block this patch.
> 

Yep, I think this patch is ready to go in if nobody else objects.

>> I was thinking of something to improve the situation, but not sure if it makes
>> any sense to do so. There are several options: (1) Make @secondary_data per-cpu
>> variable, which looks a nature way to go. (2) To shutdown the CPU on timeout.
>> The shutdown request can be failed to be served in theory, but it seems still
>> an improvement.
> 
> I think #2 is a bad idea, since if the CPU gets into the kernel at all,
> it may have done stuff (e.g. acquiring locks), and ripping it out is
> liable to cause more problems.
> 
> I think doing #1 might be nice, but some caveats apply.
> 
> I'd like to clean up the secondary stack/task hand-over to use an atomic
> cmpxchg pair, so that we can detect when the secondary has possibly
> tried to use the stack/task. That requires splitting that from the
> MMU-off bits from the MMU-on bits, and I'm not sure how well that
> interacts with #1. It might mean that the per-cpu part isn't that
> worthwhile.
> 

Right, #2 isn't good if the acquired resource (e.g. lock) can't be released.

It's something like to introduce a lock to the shared @secondary_data with
atomic cmpxchg pair. With that, I don't think per-cpu part is needed, not
so useful at least.

Thanks,
Gavin

>>
>>>> ---
>>>>    arch/arm64/kernel/smp.c | 79 +++++++++++++++++++----------------------
>>>>    1 file changed, 37 insertions(+), 42 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>>>> index d4ed9a19d8fe..2a9d8f39dc58 100644
>>>> --- a/arch/arm64/kernel/smp.c
>>>> +++ b/arch/arm64/kernel/smp.c
>>>> @@ -115,60 +115,55 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>>>>    	update_cpu_boot_status(CPU_MMU_OFF);
>>>>    	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
>>>> -	/*
>>>> -	 * Now bring the CPU into our world.
>>>> -	 */
>>>> +	/* Now bring the CPU into our world */
>>>>    	ret = boot_secondary(cpu, idle);
>>>> -	if (ret == 0) {
>>>> -		/*
>>>> -		 * CPU was successfully started, wait for it to come online or
>>>> -		 * time out.
>>>> -		 */
>>>> -		wait_for_completion_timeout(&cpu_running,
>>>> -					    msecs_to_jiffies(5000));
>>>> -
>>>> -		if (!cpu_online(cpu)) {
>>>> -			pr_crit("CPU%u: failed to come online\n", cpu);
>>>> -			ret = -EIO;
>>>> -		}
>>>> -	} else {
>>>> +	if (ret) {
>>>>    		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
>>>>    		return ret;
>>>>    	}
>>>> +	/*
>>>> +	 * CPU was successfully started, wait for it to come online or
>>>> +	 * time out.
>>>> +	 */
>>>> +	wait_for_completion_timeout(&cpu_running,
>>>> +				    msecs_to_jiffies(5000));
>>>> +	if (cpu_online(cpu))
>>>> +		return 0;
>>>> +
>>>> +	pr_crit("CPU%u: failed to come online\n", cpu);
>>>>    	secondary_data.task = NULL;
>>>>    	secondary_data.stack = NULL;
>>>>    	__flush_dcache_area(&secondary_data, sizeof(secondary_data));
>>>>    	status = READ_ONCE(secondary_data.status);
>>>> -	if (ret && status) {
>>>> -
>>>> -		if (status == CPU_MMU_OFF)
>>>> -			status = READ_ONCE(__early_cpu_boot_status);
>>>> +	if (status == CPU_MMU_OFF)
>>>> +		status = READ_ONCE(__early_cpu_boot_status);
>>>> -		switch (status & CPU_BOOT_STATUS_MASK) {
>>>> -		default:
>>>> -			pr_err("CPU%u: failed in unknown state : 0x%lx\n",
>>>> -					cpu, status);
>>>> -			cpus_stuck_in_kernel++;
>>>> -			break;
>>>> -		case CPU_KILL_ME:
>>>> -			if (!op_cpu_kill(cpu)) {
>>>> -				pr_crit("CPU%u: died during early boot\n", cpu);
>>>> -				break;
>>>> -			}
>>>> -			pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
>>>> -			/* Fall through */
>>>> -		case CPU_STUCK_IN_KERNEL:
>>>> -			pr_crit("CPU%u: is stuck in kernel\n", cpu);
>>>> -			if (status & CPU_STUCK_REASON_52_BIT_VA)
>>>> -				pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
>>>> -			if (status & CPU_STUCK_REASON_NO_GRAN)
>>>> -				pr_crit("CPU%u: does not support %luK granule \n", cpu, PAGE_SIZE / SZ_1K);
>>>> -			cpus_stuck_in_kernel++;
>>>> +	switch (status & CPU_BOOT_STATUS_MASK) {
>>>> +	default:
>>>> +		pr_err("CPU%u: failed in unknown state : 0x%lx\n",
>>>> +		       cpu, status);
>>>> +		cpus_stuck_in_kernel++;
>>>> +		break;
>>>> +	case CPU_KILL_ME:
>>>> +		if (!op_cpu_kill(cpu)) {
>>>> +			pr_crit("CPU%u: died during early boot\n", cpu);
>>>>    			break;
>>>> -		case CPU_PANIC_KERNEL:
>>>> -			panic("CPU%u detected unsupported configuration\n", cpu);
>>>>    		}
>>>> +		pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
>>>> +		/* Fall through */
>>>> +	case CPU_STUCK_IN_KERNEL:
>>>> +		pr_crit("CPU%u: is stuck in kernel\n", cpu);
>>>> +		if (status & CPU_STUCK_REASON_52_BIT_VA)
>>>> +			pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
>>>> +		if (status & CPU_STUCK_REASON_NO_GRAN) {
>>>> +			pr_crit("CPU%u: does not support %luK granule\n",
>>>> +				cpu, PAGE_SIZE / SZ_1K);
>>>> +		}
>>>> +		cpus_stuck_in_kernel++;
>>>> +		break;
>>>> +	case CPU_PANIC_KERNEL:
>>>> +		panic("CPU%u detected unsupported configuration\n", cpu);
>>>>    	}
>>>>    	return ret;
>>>> -- 
>>>> 2.23.0
>>>>
>>>
>>
> 


  reply	other threads:[~2020-03-02 14:36 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-02  2:03 [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early Gavin Shan
2020-03-02 12:21 ` Mark Rutland
2020-03-02 13:38   ` Gavin Shan
2020-03-02 14:06     ` Mark Rutland
2020-03-02 14:35       ` Gavin Shan [this message]
2020-03-17 10:06   ` Mark Rutland
2020-03-17 10:08     ` Catalin Marinas
2020-03-17 18:32 ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=803b09ec-233d-8994-3dad-5a4a4ad85412@redhat.com \
    --to=gshan@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=shan.gavin@gmail.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).