From: Mark Rutland <mark.rutland@arm.com>
To: Gavin Shan <gshan@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, catalin.marinas@arm.com,
will@kernel.org, maz@kernel.org, shan.gavin@gmail.com
Subject: Re: [PATCH] arm64/kernel: Simplify __cpu_up() by bailing out early
Date: Mon, 2 Mar 2020 14:06:40 +0000
Message-ID: <20200302140640.GC56497@lakrids.cambridge.arm.com>
In-Reply-To: <ddbb5cb2-e8b6-ab1c-d283-fb0f402d2a4f@redhat.com>
On Tue, Mar 03, 2020 at 12:38:48AM +1100, Gavin Shan wrote:
> On 3/2/20 11:21 PM, Mark Rutland wrote:
> > On Mon, Mar 02, 2020 at 01:03:40PM +1100, Gavin Shan wrote:
> > > The function __cpu_up() is invoked to bring up the target CPU through
> > > the backend, PSCI for example. The nested if statements aren't needed
> > > if we bail out early on the following two conditions, in which the boot
> > > status doesn't need to be checked. This simplifies the code.
> > >
> > > * Error returned from the backend (e.g. PSCI)
> > > * The target CPU has been marked as online
> > >
> > > Signed-off-by: Gavin Shan <gshan@redhat.com>
> >
> > FWIW, this looks like a nice cleanup to me:
> >
> > Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> >
> > While this patch leaves secondary_data.{task,stack} stale on a
> > successful onlining, that was already the case for a timeout, and should
> > be fine (since the next attempt at onlining will configure those before
> > poking the CPU).
> >
> > Thanks,
> > Mark.
> >
>
> Thanks, Mark. Yeah, it should be fine as you said. There is something else,
> which might not be relevant. @secondary_data could be accessed by multiple CPUs
> in parallel. For example, the master CPU boots CPU#1 and times out after waiting
> 5 seconds for it to come online. CPU#1 isn't necessarily stuck somewhere. After
> that, CPU#2 is brought up and might be accessing @secondary_data. At this point,
> CPU#1 can come back and access it too. However, @secondary_data isn't valid
> for CPU#1 anymore.
Sure; I'm aware of improvements that could be made here, but I don't
think they need to block this patch.
> I was thinking of something to improve the situation, but I'm not sure if it makes
> any sense to do so. There are two options: (1) Make @secondary_data a per-cpu
> variable, which looks like a natural way to go. (2) Shut down the CPU on timeout.
> In theory the shutdown request might fail to be served, but it still seems
> like an improvement.
I think #2 is a bad idea, since if the CPU gets into the kernel at all,
it may have done stuff (e.g. acquiring locks), and ripping it out is
liable to cause more problems.
I think doing #1 might be nice, but some caveats apply.
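A minimal sketch of what #1 buys, in userspace C with hypothetical names (struct boot_data, NR_CPUS_SKETCH and the prepare_*/read_* helpers are illustrations, not anything in arch/arm64): with a single shared slot, a CPU that turns up late after a timeout reads whatever the most recent bring-up wrote, whereas with one slot per CPU it can only ever see its own data. It does not tell the boot CPU whether the late CPU actually used that data. A real per-cpu variable would be declared differently (e.g. with DEFINE_PER_CPU()) rather than as a plain array.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for the sketch */

/* Hypothetical stand-in for secondary_data's task/stack fields. */
struct boot_data {
	void *task;
	void *stack;
};

/* Shared slot: every bring-up overwrites the same object. */
static struct boot_data shared_slot;

/* Per-CPU slots: each CPU only ever reads its own entry. */
static struct boot_data percpu_slot[NR_CPUS_SKETCH];

static void prepare_shared(void *task, void *stack)
{
	shared_slot.task = task;
	shared_slot.stack = stack;
}

static void prepare_percpu(unsigned int cpu, void *task, void *stack)
{
	assert(cpu < NR_CPUS_SKETCH);
	percpu_slot[cpu].task = task;
	percpu_slot[cpu].stack = stack;
}

/* What a (possibly late) secondary would pick up from the shared slot. */
static struct boot_data read_shared(unsigned int cpu)
{
	(void)cpu;	/* the shared slot cannot tell CPUs apart */
	return shared_slot;
}

/* What a (possibly late) secondary would pick up from its own slot. */
static struct boot_data read_percpu(unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	return percpu_slot[cpu];
}
```

In the scenario above, CPU#1 times out, CPU#2 is prepared, and CPU#1 then arrives: read_shared(1) returns CPU#2's task, while read_percpu(1) still returns CPU#1's.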
I'd like to clean up the secondary stack/task hand-over to use an atomic
cmpxchg pair, so that we can detect when the secondary has possibly
tried to use the stack/task. That requires splitting the MMU-off bits
from the MMU-on bits, and I'm not sure how well that
interacts with #1. It might mean that the per-cpu part isn't that
worthwhile.
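One way such a cmpxchg pair could look, as a hedged sketch using C11 atomics rather than kernel primitives, with hypothetical names throughout (struct handover, the HANDOVER_* states and handover_publish/claim/revoke are illustrations only): the boot CPU publishes task/stack and moves the state to READY; the secondary claims it with cmpxchg(READY -> CLAIMED), and the boot CPU, on timeout, tries cmpxchg(READY -> REVOKED). Exactly one of the two can succeed, so a failed revoke tells the boot CPU the secondary may already be running on that stack/task, and a failed claim tells a late secondary not to touch it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hand-over states; not the kernel's CPU_* boot status values. */
enum handover_state {
	HANDOVER_IDLE,		/* nothing published yet */
	HANDOVER_READY,		/* boot CPU has published task/stack */
	HANDOVER_CLAIMED,	/* secondary has taken task/stack */
	HANDOVER_REVOKED,	/* boot CPU gave up before the secondary claimed */
};

struct handover {
	_Atomic int state;
	void *task;	/* stand-ins for secondary_data.{task,stack} */
	void *stack;
};

/* Boot CPU: fill in the data, then publish it with release ordering. */
static void handover_publish(struct handover *h, void *task, void *stack)
{
	/* Never republish over a hand-over that is still pending. */
	assert(atomic_load_explicit(&h->state, memory_order_relaxed) != HANDOVER_READY);
	h->task = task;
	h->stack = stack;
	atomic_store_explicit(&h->state, HANDOVER_READY, memory_order_release);
}

/* Secondary: returns false if nothing is pending (never published or revoked). */
static bool handover_claim(struct handover *h, void **task, void **stack)
{
	int expected = HANDOVER_READY;

	if (!atomic_compare_exchange_strong_explicit(&h->state, &expected,
						     HANDOVER_CLAIMED,
						     memory_order_acquire,
						     memory_order_relaxed))
		return false;
	*task = h->task;
	*stack = h->stack;
	return true;
}

/*
 * Boot CPU, on timeout: returns true if the secondary never claimed the
 * data (safe to reuse), false if the secondary may be using it.
 */
static bool handover_revoke(struct handover *h)
{
	int expected = HANDOVER_READY;

	return atomic_compare_exchange_strong_explicit(&h->state, &expected,
						       HANDOVER_REVOKED,
						       memory_order_acq_rel,
						       memory_order_relaxed);
}
```

The acquire on a successful claim pairs with the release in handover_publish(), so a secondary that wins the claim sees the task/stack values written before the state change.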
Thanks,
Mark.
>
> Thanks,
> Gavin
>
> > > ---
> > > arch/arm64/kernel/smp.c | 79 +++++++++++++++++++----------------------
> > > 1 file changed, 37 insertions(+), 42 deletions(-)
> > >
> > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > index d4ed9a19d8fe..2a9d8f39dc58 100644
> > > --- a/arch/arm64/kernel/smp.c
> > > +++ b/arch/arm64/kernel/smp.c
> > > @@ -115,60 +115,55 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> > > update_cpu_boot_status(CPU_MMU_OFF);
> > > __flush_dcache_area(&secondary_data, sizeof(secondary_data));
> > > - /*
> > > - * Now bring the CPU into our world.
> > > - */
> > > + /* Now bring the CPU into our world */
> > > ret = boot_secondary(cpu, idle);
> > > - if (ret == 0) {
> > > - /*
> > > - * CPU was successfully started, wait for it to come online or
> > > - * time out.
> > > - */
> > > - wait_for_completion_timeout(&cpu_running,
> > > - msecs_to_jiffies(5000));
> > > -
> > > - if (!cpu_online(cpu)) {
> > > - pr_crit("CPU%u: failed to come online\n", cpu);
> > > - ret = -EIO;
> > > - }
> > > - } else {
> > > + if (ret) {
> > > pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> > > return ret;
> > > }
> > > + /*
> > > + * CPU was successfully started, wait for it to come online or
> > > + * time out.
> > > + */
> > > + wait_for_completion_timeout(&cpu_running,
> > > + msecs_to_jiffies(5000));
> > > + if (cpu_online(cpu))
> > > + return 0;
> > > +
> > > + pr_crit("CPU%u: failed to come online\n", cpu);
> > > secondary_data.task = NULL;
> > > secondary_data.stack = NULL;
> > > __flush_dcache_area(&secondary_data, sizeof(secondary_data));
> > > status = READ_ONCE(secondary_data.status);
> > > - if (ret && status) {
> > > -
> > > - if (status == CPU_MMU_OFF)
> > > - status = READ_ONCE(__early_cpu_boot_status);
> > > + if (status == CPU_MMU_OFF)
> > > + status = READ_ONCE(__early_cpu_boot_status);
> > > - switch (status & CPU_BOOT_STATUS_MASK) {
> > > - default:
> > > - pr_err("CPU%u: failed in unknown state : 0x%lx\n",
> > > - cpu, status);
> > > - cpus_stuck_in_kernel++;
> > > - break;
> > > - case CPU_KILL_ME:
> > > - if (!op_cpu_kill(cpu)) {
> > > - pr_crit("CPU%u: died during early boot\n", cpu);
> > > - break;
> > > - }
> > > - pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
> > > - /* Fall through */
> > > - case CPU_STUCK_IN_KERNEL:
> > > - pr_crit("CPU%u: is stuck in kernel\n", cpu);
> > > - if (status & CPU_STUCK_REASON_52_BIT_VA)
> > > - pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
> > > - if (status & CPU_STUCK_REASON_NO_GRAN)
> > > - pr_crit("CPU%u: does not support %luK granule \n", cpu, PAGE_SIZE / SZ_1K);
> > > - cpus_stuck_in_kernel++;
> > > + switch (status & CPU_BOOT_STATUS_MASK) {
> > > + default:
> > > + pr_err("CPU%u: failed in unknown state : 0x%lx\n",
> > > + cpu, status);
> > > + cpus_stuck_in_kernel++;
> > > + break;
> > > + case CPU_KILL_ME:
> > > + if (!op_cpu_kill(cpu)) {
> > > + pr_crit("CPU%u: died during early boot\n", cpu);
> > > break;
> > > - case CPU_PANIC_KERNEL:
> > > - panic("CPU%u detected unsupported configuration\n", cpu);
> > > }
> > > + pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
> > > + /* Fall through */
> > > + case CPU_STUCK_IN_KERNEL:
> > > + pr_crit("CPU%u: is stuck in kernel\n", cpu);
> > > + if (status & CPU_STUCK_REASON_52_BIT_VA)
> > > + pr_crit("CPU%u: does not support 52-bit VAs\n", cpu);
> > > + if (status & CPU_STUCK_REASON_NO_GRAN) {
> > > + pr_crit("CPU%u: does not support %luK granule\n",
> > > + cpu, PAGE_SIZE / SZ_1K);
> > > + }
> > > + cpus_stuck_in_kernel++;
> > > + break;
> > > + case CPU_PANIC_KERNEL:
> > > + panic("CPU%u detected unsupported configuration\n", cpu);
> > > }
> > > return ret;
> > > --
> > > 2.23.0
> > >
> >
>
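As quoted, once boot_secondary() has returned 0 nothing else assigns ret, so the timeout path falls through to the final "return ret;" and returns 0, whereas the old code set ret = -EIO before decoding the boot status. A minimal sketch of the same bail-out-early shape that returns the error explicitly, using hypothetical stand-ins (struct fake_cpu, fake_boot_secondary() and fake_wait_online() replace the backend call and the completion wait):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a CPU and its backend behaviour. */
struct fake_cpu {
	int boot_ret;		/* what the backend (e.g. PSCI) returns */
	bool comes_online;	/* whether it comes online before the timeout */
};

static int fake_boot_secondary(const struct fake_cpu *c)
{
	return c->boot_ret;
}

static bool fake_wait_online(const struct fake_cpu *c)
{
	return c->comes_online;
}

static int cpu_up_sketch(unsigned int cpu, const struct fake_cpu *c)
{
	int ret = fake_boot_secondary(c);

	/* Bail out early on a backend error. */
	if (ret) {
		fprintf(stderr, "CPU%u: failed to boot: %d\n", cpu, ret);
		return ret;
	}

	/* Bail out early on success. */
	if (fake_wait_online(c))
		return 0;

	/* ret is still 0 here, so the failure must be reported explicitly. */
	fprintf(stderr, "CPU%u: failed to come online\n", cpu);
	/* ... boot status decoding as in the patch would go here ... */
	return -EIO;
}
```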