From: Xiaoming Ni <nixiaoming@huawei.com>
To: Martin Kennedy <hurricos@gmail.com>
Cc: <Yuantian.Tang@feescale.com>, <benh@kernel.crashing.org>,
	<chenhui.zhao@freescale.com>, <chenjianguo3@huawei.com>,
	<gregkh@linuxfoundation.org>, <linux-kernel@vger.kernel.org>,
	<linuxppc-dev@lists.ozlabs.org>, <liuwenliang@huawei.com>,
	<mpe@ellerman.id.au>, <oss@buserror.net>,
	<paul.gortmaker@windriver.com>, <paulus@samba.org>,
	<stable@vger.kernel.org>, <wangle6@huawei.com>,
	"Christian Lamparter" <chunkeey@gmail.com>
Subject: Re: [PATCH v2 2/2] powerpc:85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=n
Date: Thu, 25 Nov 2021 15:23:30 +0800
Message-ID: <3c7523a3-2de2-3a76-2f46-9e4cf38f40b6@huawei.com>
In-Reply-To: <CANA18Uxu5dUYOkDmXpYtLc8iQuAYMv1UujkmEo1bkhm3CqxMAA@mail.gmail.com>

On 2021/11/25 12:20, Martin Kennedy wrote:
> Hi there,
> 
> I have bisected OpenWrt master, and then the Linux kernel down to this
> change, to confirm that this change causes a kernel panic on my
> P1020RDB-based, dual-core Aerohive HiveAP 370, at initialization of
> the second CPU:
> 
> :
> [    0.000000] Linux version 5.10.80 (labby@lobon) (powerpc-openwrt-linux-musl-gcc (OpenWrt GCC 11.2.0 r18111+1-ebb6f9287e) 11.2.0, GNU ld (GNU Binutils) 2.37) #0 SMP Thu Nov 25 02:49:35 2021
> [    0.000000] Using P1020 RDB machine description
> :
> [    0.627233] smp: Bringing up secondary CPUs ...
> [    0.681659] kernel tried to execute user page (0) - exploit attempt? (uid: 0)
> [    0.766618] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
> [    0.848899] Faulting instruction address: 0x00000000
> [    0.908273] Oops: Kernel access of bad area, sig: 11 [#1]
> [    0.972851] BE PAGE_SIZE=4K SMP NR_CPUS=2 P1020 RDB
> [    1.031179] Modules linked in:
> [    1.067640] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.80 #0
> [    1.139507] NIP:  00000000 LR: c0021d2c CTR: 00000000
> [    1.199921] REGS: c1051cf0 TRAP: 0400   Not tainted  (5.10.80)
> [    1.269705] MSR:  00021000 <CE,ME>  CR: 84020202  XER: 00000000
> [    1.340534]
> [    1.340534] GPR00: c0021cb8 c1051da8 c1048000 00000001 00029000 00000000 00000001 00000000
> [    1.340534] GPR08: 00000001 00000000 c08b0000 00000040 22000208 00000000 c00032c4 00000000
> [    1.340534] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00029000 00000001
> [    1.340534] GPR24: 1ffff240 20000000 dffff240 c080a1f4 00000001 c08ae0a8 00000001 dffff240
> [    1.758220] NIP [00000000] 0x0
> [    1.794688] LR [c0021d2c] smp_85xx_kick_cpu+0xe8/0x568
> [    1.856126] Call Trace:
> [    1.885295] [c1051da8] [c0021cb8] smp_85xx_kick_cpu+0x74/0x568 (unreliable)
> [    1.968633] [c1051de8] [c0011460] __cpu_up+0xc0/0x228
> [    2.029038] [c1051e18] [c0031bbc] bringup_cpu+0x30/0x224
> [    2.092572] [c1051e48] [c0031f3c] cpu_up.constprop.0+0x180/0x33c
> [    2.164443] [c1051e88] [c00322e8] bringup_nonboot_cpus+0x88/0xc8
> [    2.236326] [c1051eb8] [c07e67bc] smp_init+0x30/0x78
> [    2.295698] [c1051ed8] [c07d9e28] kernel_init_freeable+0x118/0x2a8
> [    2.369641] [c1051f18] [c00032d8] kernel_init+0x14/0x124
> [    2.433176] [c1051f38] [c0010278] ret_from_kernel_thread+0x14/0x1c
> [    2.507125] Instruction dump:
> [    2.542541] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> [    2.635242] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> [    2.727952] ---[ end trace 9b796a4bafb6bc14 ]---
> [    2.783149]
> [    3.800879] Kernel panic - not syncing: Fatal exception
> [    3.862353] Rebooting in 1 seconds..
> [    5.905097] System Halted, OK to turn off power
> 
> Without this patch, the kernel no longer panics:
> 
> [    0.627232] smp: Bringing up secondary CPUs ...
> [    0.681857] smp: Brought up 1 node, 2 CPUs
> 
> Here is the kernel configuration for this built kernel:
> https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob_plain;f=target/linux/mpc85xx/config-5.10;hb=HEAD
> 
> In case a force-push is needed for the source repository
> (https://github.com/Hurricos/openwrt/commit/ad19bdfc77d60ee1c52b41bb4345fdd02284c4cf),
> here is the device tree for this board:
> https://paste.c-net.org/TrousersSliced
> 
> Martin
> .
> 
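When CONFIG_FSL_PMC is set to n, no cpu_up_prepare callback is assigned
in mpc85xx_pm_ops, so that pointer stays NULL. I suspect this is the
cause of the NULL pointer access above: calling through it in
smp_85xx_kick_cpu would jump to address 0, which matches the faulting
instruction address in your log.
I do not have a test environment for this board. Could you help test
whether the following patch solves the problem?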

diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index 83f4a6389a28..d7081e9af65c 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -220,7 +220,7 @@ static int smp_85xx_start_cpu(int cpu)
         local_irq_save(flags);
         hard_irq_disable();

-        if (qoriq_pm_ops)
+        if (qoriq_pm_ops && qoriq_pm_ops->cpu_up_prepare)
                 qoriq_pm_ops->cpu_up_prepare(cpu);

         /* if cpu is not spinning, reset it */
@@ -292,7 +292,7 @@ static int smp_85xx_kick_cpu(int nr)
                 booting_thread_hwid = cpu_thread_in_core(nr);
                 primary = cpu_first_thread_sibling(nr);

-                if (qoriq_pm_ops)
+                if (qoriq_pm_ops && qoriq_pm_ops->cpu_up_prepare)
                         qoriq_pm_ops->cpu_up_prepare(nr);

                 /*
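
For illustration only, here is a minimal userspace sketch of the guard
the patch adds. It is not kernel code: struct pm_ops_model,
fake_cpu_up_prepare and kick_cpu_model are hypothetical names that only
model an ops table whose cpu_up_prepare hook may be left NULL, as I
assume happens when CONFIG_FSL_PMC=n.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the PM ops table; only one hook is modeled. */
struct pm_ops_model {
	void (*cpu_up_prepare)(int cpu);
};

/* Counts how often the hook ran, so callers can observe the behavior. */
static int prepare_calls;

/* Hypothetical hook, standing in for what a PMC-enabled build would assign. */
static void fake_cpu_up_prepare(int cpu)
{
	(void)cpu;
	prepare_calls++;
}

/*
 * Models the patched check: the hook is called only when both the ops
 * table and the hook pointer are non-NULL. Returns 1 if the hook was
 * called and 0 if it was skipped.
 */
static int kick_cpu_model(const struct pm_ops_model *ops, int cpu)
{
	if (ops && ops->cpu_up_prepare) {
		ops->cpu_up_prepare(cpu);
		return 1;
	}
	return 0;
}
```

With only the old "if (ops)" check, a table that exists but has no hook
assigned would lead to a call through a NULL function pointer. In this
model the call is skipped instead.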


