* Armada XP (MV78460): BUG in netdevice.h with maxcpus=2
@ 2016-01-08  7:50 Stefan Roese
  2016-01-08 10:25 ` Jisheng Zhang
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Roese @ 2016-01-08  7:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

I'm currently trying to boot the latest Linux on the DB-MV784MP-GP
Marvell board. This works just fine for the normal
configuration. But when I try to boot with only 2 CPUs enabled
(maxcpus=2), mounting the rootfs via NFS crashes with this
log:

[   16.408285] ------------[ cut here ]------------
[   16.412915] kernel BUG at include/linux/netdevice.h:507!
[   16.418239] Internal error: Oops - BUG: 0 [#1] SMP ARM
[   16.423390] Modules linked in:
[   16.426464] CPU: 0 PID: 1424 Comm: systemd-udevd Not tainted 4.4.0-rc8-00045-gcbf389a #37
[   16.434660] Hardware name: Marvell Armada 370/XP (Device Tree)
[   16.440507] task: ee9d3100 ti: ee164000 task.ti: ee164000
[   16.445927] PC is at mvneta_percpu_notifier+0x358/0x35c
[   16.451167] LR is at mvneta_percpu_notifier+0x27c/0x35c
[   16.456405] pc : [<c0310d60>]    lr : [<c0310c84>]    psr: 60010013
[   16.456405] sp : ee165e18  ip : 00000003  fp : c077c494
[   16.467914] r10: c077cd58  r9 : 00000002  r8 : c077ce08
[   16.473154] r7 : eea8a520  r6 : ff7ea400  r5 : 00000004  r4 : 00000000
[   16.479697] r3 : 00000000  r2 : 00000008  r1 : 00000004  r0 : 00000004
[   16.486243] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   16.493402] Control: 10c5387d  Table: 2e16806a  DAC: 00000051
[   16.499168] Process systemd-udevd (pid: 1424, stack limit = 0xee164220)
[   16.505801] Stack: (0xee165e18 to 0xee166000)
[   16.510170] 5e00:                                                       c07b89c0 eea8a500
[   16.518373] 5e20: 00000002 ffffffdb 00000002 00000002 00000000 c0782aa0 00000000 ee82a5c0
[   16.526575] 5e40: 00000000 c003fb44 00000000 00000000 c077c494 00000000 00000001 c0024f68
[   16.534774] 5e60: 00000000 00000000 00000002 c0025574 ee165e7c 00000022 00000000 00000028
[   16.542977] 5e80: 00000002 eefca010 eefca044 ee165f88 edc61240 b6f07000 00000000 c00255fc
[   16.551178] 5ea0: c02907fc c028c520 eefca010 00000001 00000001 c028c5ac 00000000 01eada00
[   16.559381] 5ec0: 00000001 edeada00 edc6124c c01324f0 00000000 00000000 00000800 ee9c6f40
[   16.567582] 5ee0: c0132438 b6f07000 ee165f88 c000f7c4 ee164000 00000000 00039388 c00d6bd0
[   16.575782] 5f00: 00000001 00000000 0002b2cc c00092f4 00001000 00001000 b6f07000 00000003
[   16.583981] 5f20: 00000000 edc22580 00000001 c077ce2c 00000073 c00c3994 000b6f07 00000000
[   16.592180] 5f40: beef1270 ffffffff 48400000 ee9c6f40 00000001 b6f07000 ee165f88 c000f7c4
[   16.600385] 5f60: ee164000 c00d7414 0002b2cc c00b3a88 ee9c6f40 ee9c6f40 b6f07000 00000001
[   16.608588] 5f80: c000f7c4 c00d7c38 00000000 00000000 48400000 00000001 b6f07000 00046c70
[   16.616797] 5fa0: 00000004 c000f600 00000001 b6f07000 00000005 b6f07000 00000001 00000000
[   16.625003] 5fc0: 00000001 b6f07000 00046c70 00000004 00000001 00000000 00039860 00039388
[   16.633208] 5fe0: 00000000 beef181c b6d6bf81 b6da5066 400d0030 00000005 00180011 000022be
[   16.641431] [<c0310d60>] (mvneta_percpu_notifier) from [<c003fb44>] (notifier_call_chain+0x44/0x84)
[   16.650513] [<c003fb44>] (notifier_call_chain) from [<c0024f68>] (cpu_notify+0x24/0x40)
[   16.658544] [<c0024f68>] (cpu_notify) from [<c0025574>] (_cpu_up+0x14c/0x164)
[   16.665700] [<c0025574>] (_cpu_up) from [<c00255fc>] (cpu_up+0x70/0x94)
[   16.672333] [<c00255fc>] (cpu_up) from [<c028c520>] (device_online+0x64/0x88)
[   16.679490] [<c028c520>] (device_online) from [<c028c5ac>] (online_store+0x68/0x74)
[   16.687174] [<c028c5ac>] (online_store) from [<c01324f0>] (kernfs_fop_write+0xb8/0x1b4)
[   16.695206] [<c01324f0>] (kernfs_fop_write) from [<c00d6bd0>] (__vfs_write+0x1c/0xd8)
[   16.703061] [<c00d6bd0>] (__vfs_write) from [<c00d7414>] (vfs_write+0x90/0x16c)
[   16.710391] [<c00d7414>] (vfs_write) from [<c00d7c38>] (SyS_write+0x44/0x9c)
[   16.717464] [<c00d7c38>] (SyS_write) from [<c000f600>] (ret_fast_syscall+0x0/0x3c)
[   16.725057] Code: 3afffff5 e3a00001 e28dd00c e8bd8ff0 (e7f001f2) 
[   16.731167] ---[ end trace 1451bd9f24838b0f ]---


Again, this only happens if I boot the system with 2 CPUs online.
Without this kernel command-line argument (and with all 4 CPUs
online), mounting the NFS rootfs works just fine.

Is this a known problem? Do you have any quick hints how to fix
this issue?

Thanks,
Stefan


* Armada XP (MV78460): BUG in netdevice.h with maxcpus=2
  2016-01-08  7:50 Armada XP (MV78460): BUG in netdevice.h with maxcpus=2 Stefan Roese
@ 2016-01-08 10:25 ` Jisheng Zhang
  2016-01-08 10:51   ` Gregory CLEMENT
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jisheng Zhang @ 2016-01-08 10:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Stefan,

On Fri, 8 Jan 2016 08:50:31 +0100 Stefan Roese wrote:

> Hi Thomas,
> 
> I'm currently trying to boot the latest Linux on the DB-MV784MP-GP
> Marvell board. This works just fine for the normal
> configuration. But when I try to boot with only 2 CPUs enabled
> (maxcpus=2), mounting the rootfs via NFS crashes with this
> log:
> 
> [   16.408285] ------------[ cut here ]------------
> [   16.412915] kernel BUG at include/linux/netdevice.h:507!
> [   16.418239] Internal error: Oops - BUG: 0 [#1] SMP ARM
> [   16.423390] Modules linked in:
> [   16.426464] CPU: 0 PID: 1424 Comm: systemd-udevd Not tainted 4.4.0-rc8-00045-gcbf389a #37
> [   16.434660] Hardware name: Marvell Armada 370/XP (Device Tree)
> [   16.440507] task: ee9d3100 ti: ee164000 task.ti: ee164000
> [   16.445927] PC is at mvneta_percpu_notifier+0x358/0x35c
> [   16.451167] LR is at mvneta_percpu_notifier+0x27c/0x35c
> [   16.456405] pc : [<c0310d60>]    lr : [<c0310c84>]    psr: 60010013
> [   16.456405] sp : ee165e18  ip : 00000003  fp : c077c494
> [   16.467914] r10: c077cd58  r9 : 00000002  r8 : c077ce08
> [   16.473154] r7 : eea8a520  r6 : ff7ea400  r5 : 00000004  r4 : 00000000
> [   16.479697] r3 : 00000000  r2 : 00000008  r1 : 00000004  r0 : 00000004
> [   16.486243] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> [   16.493402] Control: 10c5387d  Table: 2e16806a  DAC: 00000051
> [   16.499168] Process systemd-udevd (pid: 1424, stack limit = 0xee164220)
> [   16.505801] Stack: (0xee165e18 to 0xee166000)
> [   16.510170] 5e00:                                                       c07b89c0 eea8a500
> [   16.518373] 5e20: 00000002 ffffffdb 00000002 00000002 00000000 c0782aa0 00000000 ee82a5c0
> [   16.526575] 5e40: 00000000 c003fb44 00000000 00000000 c077c494 00000000 00000001 c0024f68
> [   16.534774] 5e60: 00000000 00000000 00000002 c0025574 ee165e7c 00000022 00000000 00000028
> [   16.542977] 5e80: 00000002 eefca010 eefca044 ee165f88 edc61240 b6f07000 00000000 c00255fc
> [   16.551178] 5ea0: c02907fc c028c520 eefca010 00000001 00000001 c028c5ac 00000000 01eada00
> [   16.559381] 5ec0: 00000001 edeada00 edc6124c c01324f0 00000000 00000000 00000800 ee9c6f40
> [   16.567582] 5ee0: c0132438 b6f07000 ee165f88 c000f7c4 ee164000 00000000 00039388 c00d6bd0
> [   16.575782] 5f00: 00000001 00000000 0002b2cc c00092f4 00001000 00001000 b6f07000 00000003
> [   16.583981] 5f20: 00000000 edc22580 00000001 c077ce2c 00000073 c00c3994 000b6f07 00000000
> [   16.592180] 5f40: beef1270 ffffffff 48400000 ee9c6f40 00000001 b6f07000 ee165f88 c000f7c4
> [   16.600385] 5f60: ee164000 c00d7414 0002b2cc c00b3a88 ee9c6f40 ee9c6f40 b6f07000 00000001
> [   16.608588] 5f80: c000f7c4 c00d7c38 00000000 00000000 48400000 00000001 b6f07000 00046c70
> [   16.616797] 5fa0: 00000004 c000f600 00000001 b6f07000 00000005 b6f07000 00000001 00000000
> [   16.625003] 5fc0: 00000001 b6f07000 00046c70 00000004 00000001 00000000 00039860 00039388
> [   16.633208] 5fe0: 00000000 beef181c b6d6bf81 b6da5066 400d0030 00000005 00180011 000022be
> [   16.641431] [<c0310d60>] (mvneta_percpu_notifier) from [<c003fb44>] (notifier_call_chain+0x44/0x84)
> [   16.650513] [<c003fb44>] (notifier_call_chain) from [<c0024f68>] (cpu_notify+0x24/0x40)
> [   16.658544] [<c0024f68>] (cpu_notify) from [<c0025574>] (_cpu_up+0x14c/0x164)
> [   16.665700] [<c0025574>] (_cpu_up) from [<c00255fc>] (cpu_up+0x70/0x94)
> [   16.672333] [<c00255fc>] (cpu_up) from [<c028c520>] (device_online+0x64/0x88)
> [   16.679490] [<c028c520>] (device_online) from [<c028c5ac>] (online_store+0x68/0x74)
> [   16.687174] [<c028c5ac>] (online_store) from [<c01324f0>] (kernfs_fop_write+0xb8/0x1b4)
> [   16.695206] [<c01324f0>] (kernfs_fop_write) from [<c00d6bd0>] (__vfs_write+0x1c/0xd8)
> [   16.703061] [<c00d6bd0>] (__vfs_write) from [<c00d7414>] (vfs_write+0x90/0x16c)
> [   16.710391] [<c00d7414>] (vfs_write) from [<c00d7c38>] (SyS_write+0x44/0x9c)
> [   16.717464] [<c00d7c38>] (SyS_write) from [<c000f600>] (ret_fast_syscall+0x0/0x3c)
> [   16.725057] Code: 3afffff5 e3a00001 e28dd00c e8bd8ff0 (e7f001f2) 
> [   16.731167] ---[ end trace 1451bd9f24838b0f ]---
> 
> 
> Again, this only happens if I boot the system with 2 CPUs online.
> Without this kernel commandline arg (and all 4 CPUs online),
> mounting the NFS rootfs works just fine.
> 
> Is this a known problem? Do you have any quick hints how to fix
> this issue?


Can you please try the following patch?

Thanks

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index ed622fa..e1242f4 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2446,7 +2446,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
 	mvneta_port_enable(pp);
 
 	/* Enable polling on the port */
-	for_each_present_cpu(cpu) {
+	for_each_online_cpu(cpu) {
 		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
 
 		napi_enable(&port->napi);
@@ -2472,7 +2472,7 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
 
 	phy_stop(pp->phy_dev);
 
-	for_each_present_cpu(cpu) {
+	for_each_online_cpu(cpu) {
 		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
 
 		napi_disable(&port->napi);
@@ -2907,7 +2907,7 @@ static int mvneta_stop(struct net_device *dev)
 	mvneta_stop_dev(pp);
 	mvneta_mdio_remove(pp);
 	unregister_cpu_notifier(&pp->cpu_notifier);
-	for_each_present_cpu(cpu)
+	for_each_online_cpu(cpu)
 		smp_call_function_single(cpu, mvneta_percpu_disable, pp, true);
 	free_percpu_irq(dev->irq, pp->ports);
 	mvneta_cleanup_rxqs(pp);


* Armada XP (MV78460): BUG in netdevice.h with maxcpus=2
  2016-01-08 10:25 ` Jisheng Zhang
@ 2016-01-08 10:51   ` Gregory CLEMENT
  2016-01-08 10:53   ` Stefan Roese
  2016-01-08 10:57   ` Russell King - ARM Linux
  2 siblings, 0 replies; 10+ messages in thread
From: Gregory CLEMENT @ 2016-01-08 10:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jisheng,
 
On Fri, Jan 08 2016, Jisheng Zhang <jszhang@marvell.com> wrote:

> Hi Stefan,
>
> On Fri, 8 Jan 2016 08:50:31 +0100 Stefan Roese wrote:
>
>> Hi Thomas,
>> 
>> I'm currently trying to boot the latest Linux on the DB-MV784MP-GP
>> Marvell board. This works just fine for the normal
>> configuration. But when I try to boot with only 2 CPUs enabled
>> (maxcpus=2), mounting the rootfs via NFS crashes with this
>> log:
>> 
>> [   16.408285] ------------[ cut here ]------------
>> [   16.412915] kernel BUG at include/linux/netdevice.h:507!
>> [   16.418239] Internal error: Oops - BUG: 0 [#1] SMP ARM
>> [   16.423390] Modules linked in:
>> [   16.426464] CPU: 0 PID: 1424 Comm: systemd-udevd Not tainted 4.4.0-rc8-00045-gcbf389a #37
>> [   16.434660] Hardware name: Marvell Armada 370/XP (Device Tree)
>> [   16.440507] task: ee9d3100 ti: ee164000 task.ti: ee164000
>> [   16.445927] PC is at mvneta_percpu_notifier+0x358/0x35c
>> [   16.451167] LR is at mvneta_percpu_notifier+0x27c/0x35c
>> [   16.456405] pc : [<c0310d60>]    lr : [<c0310c84>]    psr: 60010013
>> [   16.456405] sp : ee165e18  ip : 00000003  fp : c077c494
>> [   16.467914] r10: c077cd58  r9 : 00000002  r8 : c077ce08
>> [   16.473154] r7 : eea8a520  r6 : ff7ea400  r5 : 00000004  r4 : 00000000
>> [   16.479697] r3 : 00000000  r2 : 00000008  r1 : 00000004  r0 : 00000004
>> [   16.486243] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
>> [   16.493402] Control: 10c5387d  Table: 2e16806a  DAC: 00000051
>> [   16.499168] Process systemd-udevd (pid: 1424, stack limit = 0xee164220)
>> [   16.505801] Stack: (0xee165e18 to 0xee166000)
>> [   16.510170] 5e00:                                                       c07b89c0 eea8a500
>> [   16.518373] 5e20: 00000002 ffffffdb 00000002 00000002 00000000 c0782aa0 00000000 ee82a5c0
>> [   16.526575] 5e40: 00000000 c003fb44 00000000 00000000 c077c494 00000000 00000001 c0024f68
>> [   16.534774] 5e60: 00000000 00000000 00000002 c0025574 ee165e7c 00000022 00000000 00000028
>> [   16.542977] 5e80: 00000002 eefca010 eefca044 ee165f88 edc61240 b6f07000 00000000 c00255fc
>> [   16.551178] 5ea0: c02907fc c028c520 eefca010 00000001 00000001 c028c5ac 00000000 01eada00
>> [   16.559381] 5ec0: 00000001 edeada00 edc6124c c01324f0 00000000 00000000 00000800 ee9c6f40
>> [   16.567582] 5ee0: c0132438 b6f07000 ee165f88 c000f7c4 ee164000 00000000 00039388 c00d6bd0
>> [   16.575782] 5f00: 00000001 00000000 0002b2cc c00092f4 00001000 00001000 b6f07000 00000003
>> [   16.583981] 5f20: 00000000 edc22580 00000001 c077ce2c 00000073 c00c3994 000b6f07 00000000
>> [   16.592180] 5f40: beef1270 ffffffff 48400000 ee9c6f40 00000001 b6f07000 ee165f88 c000f7c4
>> [   16.600385] 5f60: ee164000 c00d7414 0002b2cc c00b3a88 ee9c6f40 ee9c6f40 b6f07000 00000001
>> [   16.608588] 5f80: c000f7c4 c00d7c38 00000000 00000000 48400000 00000001 b6f07000 00046c70
>> [   16.616797] 5fa0: 00000004 c000f600 00000001 b6f07000 00000005 b6f07000 00000001 00000000
>> [   16.625003] 5fc0: 00000001 b6f07000 00046c70 00000004 00000001 00000000 00039860 00039388
>> [   16.633208] 5fe0: 00000000 beef181c b6d6bf81 b6da5066 400d0030 00000005 00180011 000022be
>> [   16.641431] [<c0310d60>] (mvneta_percpu_notifier) from [<c003fb44>] (notifier_call_chain+0x44/0x84)
>> [   16.650513] [<c003fb44>] (notifier_call_chain) from [<c0024f68>] (cpu_notify+0x24/0x40)
>> [   16.658544] [<c0024f68>] (cpu_notify) from [<c0025574>] (_cpu_up+0x14c/0x164)
>> [   16.665700] [<c0025574>] (_cpu_up) from [<c00255fc>] (cpu_up+0x70/0x94)
>> [   16.672333] [<c00255fc>] (cpu_up) from [<c028c520>] (device_online+0x64/0x88)
>> [   16.679490] [<c028c520>] (device_online) from [<c028c5ac>] (online_store+0x68/0x74)
>> [   16.687174] [<c028c5ac>] (online_store) from [<c01324f0>] (kernfs_fop_write+0xb8/0x1b4)
>> [   16.695206] [<c01324f0>] (kernfs_fop_write) from [<c00d6bd0>] (__vfs_write+0x1c/0xd8)
>> [   16.703061] [<c00d6bd0>] (__vfs_write) from [<c00d7414>] (vfs_write+0x90/0x16c)
>> [   16.710391] [<c00d7414>] (vfs_write) from [<c00d7c38>] (SyS_write+0x44/0x9c)
>> [   16.717464] [<c00d7c38>] (SyS_write) from [<c000f600>] (ret_fast_syscall+0x0/0x3c)
>> [   16.725057] Code: 3afffff5 e3a00001 e28dd00c e8bd8ff0 (e7f001f2) 
>> [   16.731167] ---[ end trace 1451bd9f24838b0f ]---
>> 
>> 
>> Again, this only happens if I boot the system with 2 CPUs online.
>> Without this kernel commandline arg (and all 4 CPUs online),
>> mounting the NFS rootfs works just fine.
>> 
>> Is this a known problem? Do you have any quick hints how to fix
>> this issue?
>
>
> Can you please try the following patch?

This patch looks good to me. The reason for using the
for_each_present_cpu macro was to be ready for when the CPUs come
online, but that case is already handled by the CPU notifiers.
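As a toy model (an illustrative sketch only: plain C arrays and invented demo_* names stand in for the kernel's real cpumasks and iterators), the difference between the two loops on a quad-core board booted with maxcpus=2 looks like this:

```c
#include <assert.h>
#include <stdbool.h>

/* Quad-core system booted with maxcpus=2: all four cores are
 * physically present, but only two were brought online at boot. */
enum { NR_CPUS_DEMO = 4 };

static const bool demo_present[NR_CPUS_DEMO] = { true, true, true, true };
static const bool demo_online[NR_CPUS_DEMO]  = { true, true, false, false };

/* Count per-CPU ports a for_each_present_cpu()-style walk would touch. */
static int touched_by_present_loop(void)
{
    int cpu, n = 0;
    for (cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
        if (demo_present[cpu])
            n++; /* includes the still-offline cpu2 and cpu3 */
    return n;
}

/* Count per-CPU ports a for_each_online_cpu()-style walk would touch. */
static int touched_by_online_loop(void)
{
    int cpu, n = 0;
    for (cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
        if (demo_online[cpu])
            n++; /* only CPUs that are actually running */
    return n;
}
```

The present loop touches per-CPU state for cpu2/cpu3 that the hotplug notifier will initialise again when those CPUs are brought online later.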

Once Stefan has confirmed that it fixes his issue, would you mind
sending a proper patch to the netdev mailing list?

Thanks,

Gregory


>
> Thanks
>
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index ed622fa..e1242f4 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -2446,7 +2446,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
>  	mvneta_port_enable(pp);
>  
>  	/* Enable polling on the port */
> -	for_each_present_cpu(cpu) {
> +	for_each_online_cpu(cpu) {
>  		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
>  
>  		napi_enable(&port->napi);
> @@ -2472,7 +2472,7 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
>  
>  	phy_stop(pp->phy_dev);
>  
> -	for_each_present_cpu(cpu) {
> +	for_each_online_cpu(cpu) {
>  		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
>  
>  		napi_disable(&port->napi);
> @@ -2907,7 +2907,7 @@ static int mvneta_stop(struct net_device *dev)
>  	mvneta_stop_dev(pp);
>  	mvneta_mdio_remove(pp);
>  	unregister_cpu_notifier(&pp->cpu_notifier);
> -	for_each_present_cpu(cpu)
> +	for_each_online_cpu(cpu)
>  		smp_call_function_single(cpu, mvneta_percpu_disable, pp, true);
>  	free_percpu_irq(dev->irq, pp->ports);
>  	mvneta_cleanup_rxqs(pp);
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com


* Armada XP (MV78460): BUG in netdevice.h with maxcpus=2
  2016-01-08 10:25 ` Jisheng Zhang
  2016-01-08 10:51   ` Gregory CLEMENT
@ 2016-01-08 10:53   ` Stefan Roese
  2016-01-08 10:57   ` Russell King - ARM Linux
  2 siblings, 0 replies; 10+ messages in thread
From: Stefan Roese @ 2016-01-08 10:53 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jisheng,

On 08.01.2016 11:25, Jisheng Zhang wrote:
> On Fri, 8 Jan 2016 08:50:31 +0100 Stefan Roese wrote:
>
>> Hi Thomas,
>>
>> I'm currently trying to boot the latest Linux on the DB-MV784MP-GP
>> Marvell board. This works just fine for the normal
>> configuration. But when I try to boot with only 2 CPUs enabled
>> (maxcpus=2), mounting the rootfs via NFS crashes with this
>> log:
>>
>> [   16.408285] ------------[ cut here ]------------
>> [   16.412915] kernel BUG at include/linux/netdevice.h:507!
>> [   16.418239] Internal error: Oops - BUG: 0 [#1] SMP ARM
>> [   16.423390] Modules linked in:
>> [   16.426464] CPU: 0 PID: 1424 Comm: systemd-udevd Not tainted 4.4.0-rc8-00045-gcbf389a #37
>> [   16.434660] Hardware name: Marvell Armada 370/XP (Device Tree)
>> [   16.440507] task: ee9d3100 ti: ee164000 task.ti: ee164000
>> [   16.445927] PC is at mvneta_percpu_notifier+0x358/0x35c
>> [   16.451167] LR is at mvneta_percpu_notifier+0x27c/0x35c
>> [   16.456405] pc : [<c0310d60>]    lr : [<c0310c84>]    psr: 60010013
>> [   16.456405] sp : ee165e18  ip : 00000003  fp : c077c494
>> [   16.467914] r10: c077cd58  r9 : 00000002  r8 : c077ce08
>> [   16.473154] r7 : eea8a520  r6 : ff7ea400  r5 : 00000004  r4 : 00000000
>> [   16.479697] r3 : 00000000  r2 : 00000008  r1 : 00000004  r0 : 00000004
>> [   16.486243] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
>> [   16.493402] Control: 10c5387d  Table: 2e16806a  DAC: 00000051
>> [   16.499168] Process systemd-udevd (pid: 1424, stack limit = 0xee164220)
>> [   16.505801] Stack: (0xee165e18 to 0xee166000)
>> [   16.510170] 5e00:                                                       c07b89c0 eea8a500
>> [   16.518373] 5e20: 00000002 ffffffdb 00000002 00000002 00000000 c0782aa0 00000000 ee82a5c0
>> [   16.526575] 5e40: 00000000 c003fb44 00000000 00000000 c077c494 00000000 00000001 c0024f68
>> [   16.534774] 5e60: 00000000 00000000 00000002 c0025574 ee165e7c 00000022 00000000 00000028
>> [   16.542977] 5e80: 00000002 eefca010 eefca044 ee165f88 edc61240 b6f07000 00000000 c00255fc
>> [   16.551178] 5ea0: c02907fc c028c520 eefca010 00000001 00000001 c028c5ac 00000000 01eada00
>> [   16.559381] 5ec0: 00000001 edeada00 edc6124c c01324f0 00000000 00000000 00000800 ee9c6f40
>> [   16.567582] 5ee0: c0132438 b6f07000 ee165f88 c000f7c4 ee164000 00000000 00039388 c00d6bd0
>> [   16.575782] 5f00: 00000001 00000000 0002b2cc c00092f4 00001000 00001000 b6f07000 00000003
>> [   16.583981] 5f20: 00000000 edc22580 00000001 c077ce2c 00000073 c00c3994 000b6f07 00000000
>> [   16.592180] 5f40: beef1270 ffffffff 48400000 ee9c6f40 00000001 b6f07000 ee165f88 c000f7c4
>> [   16.600385] 5f60: ee164000 c00d7414 0002b2cc c00b3a88 ee9c6f40 ee9c6f40 b6f07000 00000001
>> [   16.608588] 5f80: c000f7c4 c00d7c38 00000000 00000000 48400000 00000001 b6f07000 00046c70
>> [   16.616797] 5fa0: 00000004 c000f600 00000001 b6f07000 00000005 b6f07000 00000001 00000000
>> [   16.625003] 5fc0: 00000001 b6f07000 00046c70 00000004 00000001 00000000 00039860 00039388
>> [   16.633208] 5fe0: 00000000 beef181c b6d6bf81 b6da5066 400d0030 00000005 00180011 000022be
>> [   16.641431] [<c0310d60>] (mvneta_percpu_notifier) from [<c003fb44>] (notifier_call_chain+0x44/0x84)
>> [   16.650513] [<c003fb44>] (notifier_call_chain) from [<c0024f68>] (cpu_notify+0x24/0x40)
>> [   16.658544] [<c0024f68>] (cpu_notify) from [<c0025574>] (_cpu_up+0x14c/0x164)
>> [   16.665700] [<c0025574>] (_cpu_up) from [<c00255fc>] (cpu_up+0x70/0x94)
>> [   16.672333] [<c00255fc>] (cpu_up) from [<c028c520>] (device_online+0x64/0x88)
>> [   16.679490] [<c028c520>] (device_online) from [<c028c5ac>] (online_store+0x68/0x74)
>> [   16.687174] [<c028c5ac>] (online_store) from [<c01324f0>] (kernfs_fop_write+0xb8/0x1b4)
>> [   16.695206] [<c01324f0>] (kernfs_fop_write) from [<c00d6bd0>] (__vfs_write+0x1c/0xd8)
>> [   16.703061] [<c00d6bd0>] (__vfs_write) from [<c00d7414>] (vfs_write+0x90/0x16c)
>> [   16.710391] [<c00d7414>] (vfs_write) from [<c00d7c38>] (SyS_write+0x44/0x9c)
>> [   16.717464] [<c00d7c38>] (SyS_write) from [<c000f600>] (ret_fast_syscall+0x0/0x3c)
>> [   16.725057] Code: 3afffff5 e3a00001 e28dd00c e8bd8ff0 (e7f001f2)
>> [   16.731167] ---[ end trace 1451bd9f24838b0f ]---
>>
>>
>> Again, this only happens if I boot the system with 2 CPUs online.
>> Without this kernel commandline arg (and all 4 CPUs online),
>> mounting the NFS rootfs works just fine.
>>
>> Is this a known problem? Do you have any quick hints how to fix
>> this issue?
>
>
> Can you please try the following patch?
>
> Thanks
>
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index ed622fa..e1242f4 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -2446,7 +2446,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
>   	mvneta_port_enable(pp);
>
>   	/* Enable polling on the port */
> -	for_each_present_cpu(cpu) {
> +	for_each_online_cpu(cpu) {
>   		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
>
>   		napi_enable(&port->napi);
> @@ -2472,7 +2472,7 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
>
>   	phy_stop(pp->phy_dev);
>
> -	for_each_present_cpu(cpu) {
> +	for_each_online_cpu(cpu) {
>   		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
>
>   		napi_disable(&port->napi);
> @@ -2907,7 +2907,7 @@ static int mvneta_stop(struct net_device *dev)
>   	mvneta_stop_dev(pp);
>   	mvneta_mdio_remove(pp);
>   	unregister_cpu_notifier(&pp->cpu_notifier);
> -	for_each_present_cpu(cpu)
> +	for_each_online_cpu(cpu)
>   		smp_call_function_single(cpu, mvneta_percpu_disable, pp, true);
>   	free_percpu_irq(dev->irq, pp->ports);
>   	mvneta_cleanup_rxqs(pp);

This patch fixes this issue. Thanks for the quick response.

Thanks,
Stefan


* Armada XP (MV78460): BUG in netdevice.h with maxcpus=2
  2016-01-08 10:25 ` Jisheng Zhang
  2016-01-08 10:51   ` Gregory CLEMENT
  2016-01-08 10:53   ` Stefan Roese
@ 2016-01-08 10:57   ` Russell King - ARM Linux
  2016-01-08 12:45     ` Jisheng Zhang
  2 siblings, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux @ 2016-01-08 10:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 08, 2016 at 06:25:37PM +0800, Jisheng Zhang wrote:
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index ed622fa..e1242f4 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -2446,7 +2446,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
>  	mvneta_port_enable(pp);
>  
>  	/* Enable polling on the port */
> -	for_each_present_cpu(cpu) {
> +	for_each_online_cpu(cpu) {
>  		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
>  
>  		napi_enable(&port->napi);
> @@ -2472,7 +2472,7 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
>  
>  	phy_stop(pp->phy_dev);
>  
> -	for_each_present_cpu(cpu) {
> +	for_each_online_cpu(cpu) {
>  		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
>  
>  		napi_disable(&port->napi);
> @@ -2907,7 +2907,7 @@ static int mvneta_stop(struct net_device *dev)
>  	mvneta_stop_dev(pp);
>  	mvneta_mdio_remove(pp);
>  	unregister_cpu_notifier(&pp->cpu_notifier);
> -	for_each_present_cpu(cpu)
> +	for_each_online_cpu(cpu)
>  		smp_call_function_single(cpu, mvneta_percpu_disable, pp, true);
>  	free_percpu_irq(dev->irq, pp->ports);
>  	mvneta_cleanup_rxqs(pp);

I'm not convinced that this isn't racy - what happens if a CPU is
brought online between unregister_cpu_notifier(&pp->cpu_notifier)
and the following loop?  We'll end up calling mvneta_percpu_disable()
for the new CPU - is that harmless?

Similarly, what if the online CPUs change between mvneta_stop_dev()
and mvneta_stop(), and also what about hotplug CPU changes during
the startup path?

Secondly, is there a reason for:

	for_each_online_cpu(cpu)
		smp_call_function_single(cpu, ...)

as opposed to:

	smp_call_function(mvneta_percpu_disable, pp, true);

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* Armada XP (MV78460): BUG in netdevice.h with maxcpus=2
  2016-01-08 10:57   ` Russell King - ARM Linux
@ 2016-01-08 12:45     ` Jisheng Zhang
  2016-01-08 13:03       ` Jisheng Zhang
  2016-01-08 13:31       ` Russell King - ARM Linux
  0 siblings, 2 replies; 10+ messages in thread
From: Jisheng Zhang @ 2016-01-08 12:45 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Russell,

On Fri, 8 Jan 2016 10:57:21 +0000 Russell King - ARM Linux wrote:

> On Fri, Jan 08, 2016 at 06:25:37PM +0800, Jisheng Zhang wrote:
> > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > index ed622fa..e1242f4 100644
> > --- a/drivers/net/ethernet/marvell/mvneta.c
> > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > @@ -2446,7 +2446,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
> >  	mvneta_port_enable(pp);
> >  
> >  	/* Enable polling on the port */
> > -	for_each_present_cpu(cpu) {
> > +	for_each_online_cpu(cpu) {
> >  		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
> >  
> >  		napi_enable(&port->napi);
> > @@ -2472,7 +2472,7 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
> >  
> >  	phy_stop(pp->phy_dev);
> >  
> > -	for_each_present_cpu(cpu) {
> > +	for_each_online_cpu(cpu) {
> >  		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
> >  
> >  		napi_disable(&port->napi);
> > @@ -2907,7 +2907,7 @@ static int mvneta_stop(struct net_device *dev)
> >  	mvneta_stop_dev(pp);
> >  	mvneta_mdio_remove(pp);
> >  	unregister_cpu_notifier(&pp->cpu_notifier);
> > -	for_each_present_cpu(cpu)
> > +	for_each_online_cpu(cpu)
> >  		smp_call_function_single(cpu, mvneta_percpu_disable, pp, true);
> >  	free_percpu_irq(dev->irq, pp->ports);
> >  	mvneta_cleanup_rxqs(pp);  
> 
> I'm not convinced that this isn't racy - what happens if a CPU is
> brought online between unregister_cpu_notifier(&pp->cpu_notifier)
> and the following loop?  We'll end up calling mvneta_percpu_disable()
> for the new CPU - is that harmless?
> 
> Similarly, what if the online CPUs change between mvneta_stop_dev()
> and mvneta_stop(), and also what about hotplug CPU changes during
> the startup path?
> 
> Secondly, is there a reason for:
> 
> 	for_each_online_cpu(cpu)
> 		smp_call_function_single(cpu, ...)
> 
> as opposed to:
> 
> 	smp_call_function(mvneta_percpu_disable, pp, true);
> 

Yep, thanks for pointing out the race. Before sending out this patch, I had
prepared another (IMHO *wrong*) patch that modifies smp_prepare_cpus() in
arch/arm/kernel/smp.c.

Here is the background:
I can reproduce this issue on arm but failed to reproduce it on arm64. The
key is which CPUs are marked as present.

Let's assume a quad-core system booted with maxcpus=2. After booting:

on arm64, the present cpus are cpu0 and cpu1;

on arm, the present cpus are cpu0, cpu1, cpu2 and cpu3.

The arm core code expects every platform to update the present map in its
smp_prepare_cpus(), but only two or three platforms actually do so.

Getting back to the mvneta issue: during startup, mvneta_start_dev() calls
napi_enable() for each present CPU's port. If userspace then asks to online
cpu2, mvneta_percpu_notifier() calls napi_enable() for cpu2 again, so the
BUG_ON() is triggered.
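For illustration, the double-enable can be modelled in plain C. The demo_* names are invented stand-ins; the real check is the BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state)) in napi_enable(), which is include/linux/netdevice.h:507 in this kernel, and is modelled here as a false return instead of an abort:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one per-CPU NAPI instance. The sched_bit models
 * NAPI_STATE_SCHED: it is set while the instance is disabled, and
 * napi_enable() clears it. Enabling an already-enabled instance
 * (bit already clear) is the condition the kernel BUGs on. */
struct demo_napi {
    bool sched_bit;
};

static bool demo_napi_enable(struct demo_napi *n)
{
    if (!n->sched_bit)
        return false;   /* the real napi_enable() would hit BUG_ON here */
    n->sched_bit = false;
    return true;
}

/* cpu2's port: enabled once by the for_each_present_cpu() loop in
 * mvneta_start_dev(), then a second time by the hotplug notifier. */
static struct demo_napi cpu2_port = { .sched_bit = true };
```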

I see two possible solutions:

1. Do what the above patch does, and additionally prevent the race you
pointed out, e.g. with get_online_cpus()/put_online_cpus().

2. Make the arm platforms' smp_prepare_cpus() update the present map, or
even patch the arm core smp_prepare_cpus().

Which is the better solution? Could you please advise?
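Solution 1 with the race closed could be modelled roughly like this (a sketch only: a pthread mutex stands in for the CPU-hotplug read lock taken by get_online_cpus()/put_online_cpus(), and all demo_* names are invented, not the mvneta driver's real code):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the hotplug lock: while held for reading, no CPU can
 * come online or go offline, so the online mask is stable. */
static pthread_mutex_t demo_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Quad-core board booted with maxcpus=2. */
static bool demo_online[4] = { true, true, false, false };

/* Model of the teardown loop in mvneta_stop(): walk the online CPUs
 * with the hotplug lock held so the set cannot change mid-walk. */
static int demo_stop_all_ports(void)
{
    int cpu, stopped = 0;

    pthread_mutex_lock(&demo_hotplug_lock);   /* get_online_cpus() */
    for (cpu = 0; cpu < 4; cpu++)
        if (demo_online[cpu])
            stopped++;  /* smp_call_function_single(cpu, disable, ...) */
    pthread_mutex_unlock(&demo_hotplug_lock); /* put_online_cpus() */

    return stopped;
}
```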

Thanks in advance,
Jisheng


* Armada XP (MV78460): BUG in netdevice.h with maxcpus=2
  2016-01-08 12:45     ` Jisheng Zhang
@ 2016-01-08 13:03       ` Jisheng Zhang
  2016-01-08 13:21         ` Jisheng Zhang
  2016-01-08 13:31       ` Russell King - ARM Linux
  1 sibling, 1 reply; 10+ messages in thread
From: Jisheng Zhang @ 2016-01-08 13:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 8 Jan 2016 20:45:23 +0800
Jisheng Zhang <jszhang@marvell.com> wrote:

> Dear Russell,
> 
> On Fri, 8 Jan 2016 10:57:21 +0000 Russell King - ARM Linux wrote:
> 
> > On Fri, Jan 08, 2016 at 06:25:37PM +0800, Jisheng Zhang wrote:  
> > > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > > index ed622fa..e1242f4 100644
> > > --- a/drivers/net/ethernet/marvell/mvneta.c
> > > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > > @@ -2446,7 +2446,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
> > >  	mvneta_port_enable(pp);
> > >  
> > >  	/* Enable polling on the port */
> > > -	for_each_present_cpu(cpu) {
> > > +	for_each_online_cpu(cpu) {
> > >  		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
> > >  
> > >  		napi_enable(&port->napi);
> > > @@ -2472,7 +2472,7 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
> > >  
> > >  	phy_stop(pp->phy_dev);
> > >  
> > > -	for_each_present_cpu(cpu) {
> > > +	for_each_online_cpu(cpu) {
> > >  		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
> > >  
> > >  		napi_disable(&port->napi);
> > > @@ -2907,7 +2907,7 @@ static int mvneta_stop(struct net_device *dev)
> > >  	mvneta_stop_dev(pp);
> > >  	mvneta_mdio_remove(pp);
> > >  	unregister_cpu_notifier(&pp->cpu_notifier);
> > > -	for_each_present_cpu(cpu)
> > > +	for_each_online_cpu(cpu)
> > >  		smp_call_function_single(cpu, mvneta_percpu_disable, pp, true);
> > >  	free_percpu_irq(dev->irq, pp->ports);
> > >  	mvneta_cleanup_rxqs(pp);    
> > 
> > I'm not convinced that this isn't racy - what happens if a CPU is
> > brought online between unregister_cpu_notifier(&pp->cpu_notifier)
> > and the following loop?  We'll end up calling mvneta_percpu_disable()
> > for the new CPU - is that harmless?
> > 
> > Similarly, what if the online CPUs change between mvneta_stop_dev()
> > and mvneta_stop(), and also what about hotplug CPU changes during
> > the startup path?
> > 
> > Secondly, is there a reason for:
> > 
> > 	for_each_online_cpu(cpu)
> > 		smp_call_function_single(cpu, ...)
> > 
> > as opposed to:
> > 
> > 	smp_call_function(mvneta_percpu_disable, pp, true);
> >   
> 
> Yep, thanks for pointing out the race. Before sending out the patch, I prepared
> another patch - *wrong*, IMHO - modifying smp_prepare_cpus() in arch/arm/kernel/smp.c.

Here is the patch. I think it could also fix the issue.

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index b263613..f94c755 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -443,15 +443,31 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 	 */
 	if (max_cpus > ncores)
 		max_cpus = ncores;
-	if (ncores > 1 && max_cpus) {
-		/*
-		 * Initialise the present map, which describes the set of CPUs
-		 * actually populated at the present time. A platform should
-		 * re-initialize the map in the platforms smp_prepare_cpus()
-		 * if present != possible (e.g. physical hotplug).
-		 */
-		init_cpu_present(cpu_possible_mask);
 
+	/* Don't bother if we're effectively UP */
+	if (max_cpus <= 1)
+		return;
+
+	/*
+	 * Initialise the present map (which describes the set of CPUs
+	 * actually populated at the present time) and release the
+	 * secondaries from the bootloader.
+	 *
+	 * Make sure we online at most (max_cpus - 1) additional CPUs.
+	 */
+	max_cpus--;
+	for_each_possible_cpu(cpu) {
+		if (max_cpus == 0)
+			break;
+
+		if (cpu == smp_processor_id())
+			continue;
+
+		set_cpu_present(cpu, true);
+		max_cpus--;
+	}
+
+	if (ncores > 1 && max_cpus) {
 		/*
 		 * Initialise the SCU if there are more than one CPU
 		 * and let them know where to start.


> 
> Here is the background:
> I can reproduce this issue on arm but failed to reproduce it on arm64. The key
> is which CPUs are marked present.
> 
> let's assume a quad core system booted with maxcpus=2. After booting:
> 
> on arm64, the present cpus are cpu0 and cpu1;
> 
> on arm, the present cpus are cpu0, cpu1, cpu2 and cpu3.
> 
> the arm core code expects every platform to update the present map in its
> smp_prepare_cpus(), but only two or three platforms actually do so.
> 
> Then, back to the mvneta issue: during startup, mvneta_start_dev() calls
> napi_enable() for each present cpu's port. If userspace then asks to online
> cpu2, mvneta_percpu_notifier() calls napi_enable() for cpu2 again,
> and the BUG_ON() is triggered.
> 
> I have two solutions:
> 
> 1. as the above patch did, and prevent the race you pointed out by using
> get_online_cpus().
> 
> 2. make the arm platforms' smp_prepare_cpus() update the present map, or even
> patch the arm core smp_prepare_cpus().
> 
> Which is the better solution? Could you please guide me?
> 
> Thanks in advance,
> Jisheng
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Armada XP (MV78460): BUG in netdevice.h with maxcpus=2
  2016-01-08 13:03       ` Jisheng Zhang
@ 2016-01-08 13:21         ` Jisheng Zhang
  0 siblings, 0 replies; 10+ messages in thread
From: Jisheng Zhang @ 2016-01-08 13:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 8 Jan 2016 21:03:09 +0800
Jisheng Zhang <jszhang@marvell.com> wrote:

> On Fri, 8 Jan 2016 20:45:23 +0800
> Jisheng Zhang <jszhang@marvell.com> wrote:
> 
> > Dear Russell,
> > 
> > On Fri, 8 Jan 2016 10:57:21 +0000 Russell King - ARM Linux wrote:
> >   
> > > On Fri, Jan 08, 2016 at 06:25:37PM +0800, Jisheng Zhang wrote:    
> > > > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > > > index ed622fa..e1242f4 100644
> > > > --- a/drivers/net/ethernet/marvell/mvneta.c
> > > > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > > > @@ -2446,7 +2446,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
> > > >  	mvneta_port_enable(pp);
> > > >  
> > > >  	/* Enable polling on the port */
> > > > -	for_each_present_cpu(cpu) {
> > > > +	for_each_online_cpu(cpu) {
> > > >  		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
> > > >  
> > > >  		napi_enable(&port->napi);
> > > > @@ -2472,7 +2472,7 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
> > > >  
> > > >  	phy_stop(pp->phy_dev);
> > > >  
> > > > -	for_each_present_cpu(cpu) {
> > > > +	for_each_online_cpu(cpu) {
> > > >  		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
> > > >  
> > > >  		napi_disable(&port->napi);
> > > > @@ -2907,7 +2907,7 @@ static int mvneta_stop(struct net_device *dev)
> > > >  	mvneta_stop_dev(pp);
> > > >  	mvneta_mdio_remove(pp);
> > > >  	unregister_cpu_notifier(&pp->cpu_notifier);
> > > > -	for_each_present_cpu(cpu)
> > > > +	for_each_online_cpu(cpu)
> > > >  		smp_call_function_single(cpu, mvneta_percpu_disable, pp, true);
> > > >  	free_percpu_irq(dev->irq, pp->ports);
> > > >  	mvneta_cleanup_rxqs(pp);      
> > > 
> > > I'm not convinced that this isn't racy - what happens if a CPU is
> > > brought online between unregister_cpu_notifier(&pp->cpu_notifier)
> > > and the following loop?  We'll end up calling mvneta_percpu_disable()
> > > for the new CPU - is that harmless?
> > > 
> > > Similarly, what if the online CPUs change between mvneta_stop_dev()
> > > and mvneta_stop(), and also what about hotplug CPU changes during
> > > the startup path?
> > > 
> > > Secondly, is there a reason for:
> > > 
> > > 	for_each_online_cpu(cpu)
> > > 		smp_call_function_single(cpu, ...)
> > > 
> > > as opposed to:
> > > 
> > > 	smp_call_function(mvneta_percpu_disable, pp, true);
> > >     
> > 
> > Yep, thanks for pointing out the race. Before sending out the patch, I prepared
> > another patch - *wrong*, IMHO - modifying smp_prepare_cpus() in arch/arm/kernel/smp.c.
> 
> Here is the patch. I think it could also fix the issue.

oops, that version doesn't compile, sorry for the noise. Here is the updated one:

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index b263613..26c164b 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -432,6 +432,7 @@ void __init smp_prepare_boot_cpu(void)
 
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
+	unsigned int cpu, max;
 	unsigned int ncores = num_possible_cpus();
 
 	init_cpu_topology();
@@ -443,15 +444,29 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 	 */
 	if (max_cpus > ncores)
 		max_cpus = ncores;
-	if (ncores > 1 && max_cpus) {
-		/*
-		 * Initialise the present map, which describes the set of CPUs
-		 * actually populated at the present time. A platform should
-		 * re-initialize the map in the platforms smp_prepare_cpus()
-		 * if present != possible (e.g. physical hotplug).
-		 */
-		init_cpu_present(cpu_possible_mask);
 
+	/* Don't bother if we're effectively UP */
+	if (max_cpus <= 1)
+		return;
+
+	/*
+	 * Initialise the present map (which describes the set of CPUs
+	 * actually populated at the present time).
+	 */
+	max = max_cpus;
+	max--;
+	for_each_possible_cpu(cpu) {
+		if (max == 0)
+			break;
+
+		if (cpu == smp_processor_id())
+			continue;
+
+		set_cpu_present(cpu, true);
+		max--;
+	}
+
+	if (ncores > 1 && max_cpus) {
 		/*
 		 * Initialise the SCU if there are more than one CPU
 		 * and let them know where to start.


> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index b263613..f94c755 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -443,15 +443,31 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  	 */
>  	if (max_cpus > ncores)
>  		max_cpus = ncores;
> -	if (ncores > 1 && max_cpus) {
> -		/*
> -		 * Initialise the present map, which describes the set of CPUs
> -		 * actually populated at the present time. A platform should
> -		 * re-initialize the map in the platforms smp_prepare_cpus()
> -		 * if present != possible (e.g. physical hotplug).
> -		 */
> -		init_cpu_present(cpu_possible_mask);
>  
> +	/* Don't bother if we're effectively UP */
> +	if (max_cpus <= 1)
> +		return;
> +
> +	/*
> +	 * Initialise the present map (which describes the set of CPUs
> +	 * actually populated at the present time) and release the
> +	 * secondaries from the bootloader.
> +	 *
> +	 * Make sure we online at most (max_cpus - 1) additional CPUs.
> +	 */
> +	max_cpus--;
> +	for_each_possible_cpu(cpu) {
> +		if (max_cpus == 0)
> +			break;
> +
> +		if (cpu == smp_processor_id())
> +			continue;
> +
> +		set_cpu_present(cpu, true);
> +		max_cpus--;
> +	}
> +
> +	if (ncores > 1 && max_cpus) {
>  		/*
>  		 * Initialise the SCU if there are more than one CPU
>  		 * and let them know where to start.
> 
> 
> > 
> > Here is the background:
> > I can reproduce this issue on arm but failed to reproduce it on arm64. The key
> > is which CPUs are marked present.
> > 
> > let's assume a quad core system booted with maxcpus=2. After booting:
> > 
> > on arm64, the present cpus are cpu0 and cpu1;
> > 
> > on arm, the present cpus are cpu0, cpu1, cpu2 and cpu3.
> > 
> > the arm core code expects every platform to update the present map in its
> > smp_prepare_cpus(), but only two or three platforms actually do so.
> > 
> > Then, back to the mvneta issue: during startup, mvneta_start_dev() calls
> > napi_enable() for each present cpu's port. If userspace then asks to online
> > cpu2, mvneta_percpu_notifier() calls napi_enable() for cpu2 again,
> > and the BUG_ON() is triggered.
> > 
> > I have two solutions:
> > 
> > 1. as the above patch did, and prevent the race you pointed out by using
> > get_online_cpus().
> > 
> > 2. make the arm platforms' smp_prepare_cpus() update the present map, or even
> > patch the arm core smp_prepare_cpus().
> > 
> > Which is the better solution? Could you please guide me?
> > 
> > Thanks in advance,
> > Jisheng
> > 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Armada XP (MV78460): BUG in netdevice.h with maxcpus=2
  2016-01-08 12:45     ` Jisheng Zhang
  2016-01-08 13:03       ` Jisheng Zhang
@ 2016-01-08 13:31       ` Russell King - ARM Linux
  2016-01-08 13:48         ` Jisheng Zhang
  1 sibling, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux @ 2016-01-08 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 08, 2016 at 08:45:23PM +0800, Jisheng Zhang wrote:
> let's assume a quad core system booted with maxcpus=2. After booting:
> 
> on arm64, the present cpus are cpu0 and cpu1;
> 
> on arm, the present cpus are cpu0, cpu1, cpu2 and cpu3.
> 
> the arm core code expects every platform to update the present map in its
> smp_prepare_cpus(), but only two or three platforms actually do so.

The behaviour of maxcpus= is architecture specific.  Some architectures
take notice of this to limit the number of present and possible CPUs,
others ignore it.  In any case, it limits the number of CPUs that come
online at boot.

However, that is a complete red herring, because what we're talking
about is bringing up the network interface at some point later, when
CPU hotplug may have changed the online state already: the kernel may
have booted with CPU0-3 online, but CPUs 2 and 3 may have been taken
offline before the network interface is brought up.

> Which is the better solution? Could you please guide me?

I would suggest that you need to track the state of each CPU within
your percpu specific code with appropriate locks to prevent concurrency,
and ensure that you register the CPU hotplug notifier just before
walking the currently on-line CPUs.

Also, walk the list of currently on-line CPUs (using, eg,
smp_call_function) just before unregistering the hotplug notifier.

Remember that your per-CPU bringup/takedown may be executed twice
for the same CPU - once via the hotplug notifier and once via
smp_call_function() - hence why you need locking and state tracking.

That fixes two of the sites.  For the other site, I wonder whether
it's possible to restructure the code so there's no need to have
three sites, since it's fairly obvious when thinking about the
CPU hotplug case, you only have notification of the CPU coming
online and the CPU going away.  It's been a while since I looked
at the mvneta code to really comment on that though.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Armada XP (MV78460): BUG in netdevice.h with maxcpus=2
  2016-01-08 13:31       ` Russell King - ARM Linux
@ 2016-01-08 13:48         ` Jisheng Zhang
  0 siblings, 0 replies; 10+ messages in thread
From: Jisheng Zhang @ 2016-01-08 13:48 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Russell,

On Fri, 8 Jan 2016 13:31:04 +0000 Russell King - ARM Linux wrote:

> On Fri, Jan 08, 2016 at 08:45:23PM +0800, Jisheng Zhang wrote:
> > let's assume a quad core system booted with maxcpus=2. After booting:
> > 
> > on arm64, the present cpus are cpu0 and cpu1;
> > 
> > on arm, the present cpus are cpu0, cpu1, cpu2 and cpu3.
> > 
> > the arm core code expects every platform to update the present map in its
> > smp_prepare_cpus(), but only two or three platforms actually do so.
> 
> The behaviour of maxcpus= is architecture specific.  Some architectures
> take notice of this to limit the number of present and possible CPUs,
> others ignore it.  In any case, it limits the number of CPUs that come
> online at boot.
> 
> However, that is a complete red herring, because what we're talking
> about is bringing up the network interface at some point later, when
> CPU hotplug may have changed the online state already: the kernel may
> have booted with CPU0-3 online, but CPUs 2 and 3 may have been taken
> offline before the network interface is brought up.

Oh yeah! On arm64, the BUG_ON can be reproduced in this case. So the driver
needs a fix. Thanks a lot.

> 
> > Which is the better solution? Could you please guide me?
> 
> I would suggest that you need to track the state of each CPU within
> your percpu specific code with appropriate locks to prevent concurrency,
> and ensure that you register the CPU hotplug notifier just before
> walking the currently on-line CPUs.
> 
> Also, walk the list of currently on-line CPUs (using, eg,
> smp_call_function) just before unregistering the hotplug notifier.
> 
> Remember that your per-CPU bringup/takedown may be executed twice
> for the same CPU - once via the hotplug notifier and once via
> smp_call_function() - hence why you need locking and state tracking.

Got it. Thanks for the guidance, I'll try.

> 
> That fixes two of the sites.  For the other site, I wonder whether
> it's possible to restructure the code so there's no need to have
> three sites, since it's fairly obvious when thinking about the
> CPU hotplug case, you only have notification of the CPU coming
> online and the CPU going away.  It's been a while since I looked
> at the mvneta code to really comment on that though.
> 

Thank you very much,
Jisheng

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-01-08 13:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-08  7:50 Armada XP (MV78460): BUG in netdevice.h with maxcpus=2 Stefan Roese
2016-01-08 10:25 ` Jisheng Zhang
2016-01-08 10:51   ` Gregory CLEMENT
2016-01-08 10:53   ` Stefan Roese
2016-01-08 10:57   ` Russell King - ARM Linux
2016-01-08 12:45     ` Jisheng Zhang
2016-01-08 13:03       ` Jisheng Zhang
2016-01-08 13:21         ` Jisheng Zhang
2016-01-08 13:31       ` Russell King - ARM Linux
2016-01-08 13:48         ` Jisheng Zhang
