rcu.vger.kernel.org archive mirror
* BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
@ 2022-01-25 19:13 Paul Menzel
  2022-01-26  9:47 ` Zhouyi Zhou
  2022-01-29  2:23 ` Zhouyi Zhou
  0 siblings, 2 replies; 17+ messages in thread
From: Paul Menzel @ 2022-01-25 19:13 UTC
  To: Paul E. McKenney, Josh Triplett
  Cc: rcu, LKML, David S. Miller, Jakub Kicinski, netdev

Dear Linux folks,


I do not know whether this is an rcutorture issue, or whether rcutorture found a
bug in `rtmsg_ifinfo_build_skb()`.


Building Linux 5.17-rc1+ (dd81e1c7d5fb) under Ubuntu 21.04 with

     CONFIG_TORTURE_TEST=y
     CONFIG_RCU_TORTURE_TEST=y

and

     $ clang --version
     Ubuntu clang version 12.0.0-3ubuntu1~21.04.2
     Target: powerpc64le-unknown-linux-gnu
     Thread model: posix
     InstalledDir: /usr/bin
     $ make -j100 LLVM=1 LLVM_IAS=0 bindeb-pkg

and booting it on an IBM S822LC, Linux panicked with a NULL pointer
dereference, and the watchdog rebooted the machine. I found the message below in
`/sys/fs/pstore/dmesg-nvram-2.enc.z`.

```
[    T1] Key type id_legacy registered
[    T1] SGI XFS with ACLs, security attributes, no debug enabled
[    T1] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248)
[    T1] io scheduler mq-deadline registered
[    T1] io scheduler kyber registered
[  T198] cryptomgr_test (198) used greatest stack depth: 13536 bytes left
[    T1] pci 0021:10:00.0: enabling device (0141 -> 0143)
[    T1] Using unsupported 1024x768 (null) at 3fe882010000, depth=32, pitch=4096
[    T1] Console: switching to colour frame buffer device 128x48
[    T1] fb0: Open Firmware frame buffer device on /pciex@3fffe41100000/pci@0/pci@0/pci@b/pci@0/vga@0
[    T1] hvc0: raw protocol on /ibm,opal/consoles/serial@0 (boot console)
[    T1] hvc0: No interrupts property, using OPAL event
[    T1] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    T1] Non-volatile memory driver v1.3
[    T1] brd: module loaded
[    T1] loop: module loaded
[    T1] ipr: IBM Power RAID SCSI Device Driver version: 2.6.4 (March 14, 2017)
[    T1] ahci 0021:0e:00.0: version 3.0
[    T1] ahci 0021:0e:00.0: enabling device (0141 -> 0143)
[    T1] ahci 0021:0e:00.0: AHCI 0001.0000 32 slots 4 ports 6 Gbps 0xf impl SATA mode
[    T1] ahci 0021:0e:00.0: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs
[    T1] scsi host0: ahci
[    T1] scsi host1: ahci
[    T1] scsi host2: ahci
[    T1] scsi host3: ahci
[    T1] ata1: SATA max UDMA/133 abar m2048@0x3fe881000000 port 0x3fe881000100 irq 39
[    T1] ata2: SATA max UDMA/133 abar m2048@0x3fe881000000 port 0x3fe881000180 irq 39
[    T1] ata3: SATA max UDMA/133 abar m2048@0x3fe881000000 port 0x3fe881000200 irq 39
[    T1] ata4: SATA max UDMA/133 abar m2048@0x3fe881000000 port 0x3fe881000280 irq 39
[    T1] e100: Intel(R) PRO/100 Network Driver
[    T1] e100: Copyright(c) 1999-2006 Intel Corporation
[    T1] e1000: Intel(R) PRO/1000 Network Driver
[    T1] e1000: Copyright (c) 1999-2006 Intel Corporation.
[    T1] e1000e: Intel(R) PRO/1000 Network Driver
[    T1] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    T1] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    T1] ehci-pci: EHCI PCI platform driver
[    T1] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    T1] ohci-pci: OHCI PCI platform driver
[    T1] rtc-opal opal-rtc: registered as rtc0
[    T1] rtc-opal opal-rtc: setting system clock to 2022-01-24T18:21:45 UTC (1643048505)
[    T1] i2c_dev: i2c /dev entries driver
[    T1] device-mapper: uevent: version 1.0.3
[    T1] device-mapper: ioctl: 4.45.0-ioctl (2021-03-22) initialised: dm-devel@redhat.com
[    T1] powernv-cpufreq: cpufreq pstate min 0xffffffd5 nominal 0xffffffef max 0x0
[    T1] powernv-cpufreq: Workload Optimized Frequency is disabled in the platform
[    T1] powernv_idle_driver registered
[    T1] nx_compress_powernv: coprocessor found on chip 0, CT 3 CI 1
[    T1] nx_compress_powernv: coprocessor found on chip 8, CT 3 CI 9
[    T1] usbcore: registered new interface driver usbhid
[    T1] usbhid: USB HID core driver
[    T1] ipip: IPv4 and MPLS over IPv4 tunneling driver
[    T1] NET: Registered PF_INET6 protocol family
[    T1] Segment Routing with IPv6
[    T1] In-situ OAM (IOAM) with IPv6
[    T1] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    T1] BUG: Kernel NULL pointer dereference on write at 0x00000000
[    T1] Faulting instruction address: 0xc0000000008e2400
[    T1] Oops: Kernel access of bad area, sig: 11 [#1]
[    T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
[    T1] Modules linked in:
[    T1] CPU: 11 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc1-00032-gdd81e1c7d5fb #29
[    T1] NIP:  c0000000008e2400 LR: c000000000d65db0 CTR: c000000000f0bb60
[    T1] REGS: c0000000125033e0 TRAP: 0380   Not tainted (5.17.0-rc1-00032-gdd81e1c7d5fb)
[    T1] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 42800c40 XER: 00000000
[    T1] CFAR: c000000000d65dac IRQMASK: 0
[    T1] GPR00: c000000000d65b40 c000000012503680 c00000000290c600 0000000000000000
[    T1] GPR04: ffffffffffffffff 00000000ffffffff 0000000000000000 0000000000000cc0
[    T1] GPR08: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000001
[    T1] GPR12: 0000000000000000 c000007fffff6c00 c000000000012478 0000000000000000
[    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    T1] GPR20: 0000000000000000 c000000002810100 0000000000000cc0 0000000000000000
[    T1] GPR24: 0000000000000010 c00000000294cf50 0000000000000000 0000000000000000
[    T1] GPR28: 0000000000000000 c00000001ec61000 0000000000000000 c000000012503680
[    T1] NIP [c0000000008e2400] strlen+0x10/0x30
[    T1] LR [c000000000d65db0] if_nlmsg_size+0x150/0x360
[    T1] Call Trace:
[    T1] [c000000012503680] [c0000000125036c0] 0xc0000000125036c0 (unreliable)
[    T1] [c0000000125036f0] [c000000000d65b40] rtmsg_ifinfo_build_skb+0x80/0x1a0
[    T1] [c0000000125037b0] [c000000000d66be0] rtmsg_ifinfo+0x70/0xd0
[    T1] [c000000012503800] [c000000000d4de50] register_netdevice+0x690/0x770
[    T1] [c000000012503890] [c000000000d4e2bc] register_netdev+0x4c/0x80
[    T1] [c0000000125038c0] [c000000000f4784c] sit_init_net+0x10c/0x1d0
[    T1] [c000000012503910] [c000000000d33c0c] ops_init+0x13c/0x1b0
[    T1] [c000000012503970] [c000000000d331bc] register_pernet_operations+0xec/0x1e0
[    T1] [c0000000125039d0] [c000000000d33440] register_pernet_device+0x60/0xd0
[    T1] [c000000012503a20] [c000000002085478] sit_init+0x54/0x160
[    T1] [c000000012503ab0] [c000000000011ba8] do_one_initcall+0xd8/0x3b0
[    T1] [c000000012503c70] [c000000002006064] do_initcall_level+0xe4/0x1c4
[    T1] [c000000012503cc0] [c000000002005f20] do_initcalls+0x84/0xe4
[    T1] [c000000012503d40] [c000000002005c7c] kernel_init_freeable+0x160/0x1ec
[    T1] [c000000012503da0] [c0000000000124ac] kernel_init+0x3c/0x270
[    T1] [c000000012503e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
[    T1] Instruction dump:
[    T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
[    T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
[    T1] ---[ end trace 0000000000000000 ]---
[    T1]
[  T206] ata4: SATA link down (SStatus 0 SControl 300)
[  T204] ata3: SATA link down (SStatus 0 SControl 300)
[  T200] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  T200] ata1.00: ATA-10: ST1000NX0313         00LY266 00LY265IBM, BE33, max UDMA/133
[  T200] ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 32), AA
[  T200] ata1.00: configured for UDMA/133
[    T7] scsi 0:0:0:0: Direct-Access     ATA      ST1000NX0313     BE33 PQ: 0 ANSI: 5
[    T7] sd 0:0:0:0: Attached scsi generic sg0 type 0
[  T209] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[  T209] sd 0:0:0:0: [sda] 4096-byte physical blocks
[  T209] sd 0:0:0:0: [sda] Write Protect is off
[  T209] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[  T209] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  T209]  sda: sda1 sda2
[  T209] sd 0:0:0:0: [sda] Attached SCSI removable disk
[    T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
```


Kind regards,

Paul


* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-01-25 19:13 BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) Paul Menzel
@ 2022-01-26  9:47 ` Zhouyi Zhou
  2022-01-29  2:23 ` Zhouyi Zhou
  1 sibling, 0 replies; 17+ messages in thread
From: Zhouyi Zhou @ 2022-01-26  9:47 UTC
  To: Paul Menzel
  Cc: Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller,
	Jakub Kicinski, netdev

Dear Menzel

I am also very interested in RCU tests ;-)
First of all, thank you for your email, which taught me how to build a
kernel deb package using clang ;-)
I built and tested linux-next on x86_64, but the kernel does not
panic. I guess our kernel configurations may be different. These are
my steps:

1. git clone https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/next/linux-next.git
2. git describe: next-20220125
3. make menuconfig CC=clang-12 (CONFIG_TORTURE_TEST=y
CONFIG_RCU_TORTURE_TEST=y)
My configuration file is uploaded to my VPS cloud server:
http://154.223.142.244/config-5.17.0-rc1-next-20220125+
4. make CC=clang-12 -j 16 bindeb-pkg
5. install the kernel, reboot
6. the kernel does not panic (has been running for 30 minutes by now)

I hope I can be of more help ;-)

Thanks
Sincerely
Zhouyi




* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-01-25 19:13 BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) Paul Menzel
  2022-01-26  9:47 ` Zhouyi Zhou
@ 2022-01-29  2:23 ` Zhouyi Zhou
  2022-01-29 16:52   ` Paul Menzel
  1 sibling, 1 reply; 17+ messages in thread
From: Zhouyi Zhou @ 2022-01-29  2:23 UTC
  To: Paul Menzel
  Cc: Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller,
	Jakub Kicinski, netdev

Dear Paul

I don't have an IBM machine, so I tried to analyze the problem using
my x86_64 KVM virtual machine, but I can't reproduce the bug there.

I see that the panic is triggered by the registration of the sit device
(a sit device is a type of virtual network device that takes IPv6 traffic,
encapsulates/decapsulates it in IPv4 packets, and sends/receives it
over the IPv4 Internet to another host).

The sit device is registered in function sit_init_net:
1895    static int __net_init sit_init_net(struct net *net)
1896    {
1897        struct sit_net *sitn = net_generic(net, sit_net_id);
1898        struct ip_tunnel *t;
1899        int err;
1900
1901        sitn->tunnels[0] = sitn->tunnels_wc;
1902        sitn->tunnels[1] = sitn->tunnels_l;
1903        sitn->tunnels[2] = sitn->tunnels_r;
1904        sitn->tunnels[3] = sitn->tunnels_r_l;
1905
1906        if (!net_has_fallback_tunnels(net))
1907            return 0;
1908
1909        sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
1910                           NET_NAME_UNKNOWN,
1911                           ipip6_tunnel_setup);
1912        if (!sitn->fb_tunnel_dev) {
1913            err = -ENOMEM;
1914            goto err_alloc_dev;
1915        }
1916        dev_net_set(sitn->fb_tunnel_dev, net);
1917        sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
1918        /* FB netdevice is special: we have one, and only one per netns.
1919         * Allowing to move it to another netns is clearly unsafe.
1920         */
1921        sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
1922
1923        err = register_netdev(sitn->fb_tunnel_dev);
register_netdev on line 1923 will call if_nlmsg_size indirectly.

The function that calls the faulting strlen is if_nlmsg_size:
(gdb) disassemble if_nlmsg_size
Dump of assembler code for function if_nlmsg_size:
   0xffffffff81a0dc20 <+0>:    nopl   0x0(%rax,%rax,1)
   0xffffffff81a0dc25 <+5>:    push   %rbp
   0xffffffff81a0dc26 <+6>:    push   %r15
   0xffffffff81a0dd04 <+228>:    je     0xffffffff81a0de20 <if_nlmsg_size+512>
   0xffffffff81a0dd0a <+234>:    mov    0x10(%rbp),%rdi
   ...
 => 0xffffffff81a0dd0e <+238>:    callq  0xffffffff817532d0 <strlen>
   0xffffffff81a0dd13 <+243>:    add    $0x10,%eax
   0xffffffff81a0dd16 <+246>:    movslq %eax,%r12
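
The x86_64 listing above cannot be matched instruction by instruction against the ppc64le oops, but the "Instruction dump" in the original report can be decoded by hand. The following is a minimal sketch, assuming the two words of interest are D-form instructions (primary opcode 14 is addi, 35 is lbzu). The struct and function names are made up for illustration and are not kernel code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of a PowerPC D-form instruction such as addi or lbzu. */
struct dform {
    unsigned opcode; /* primary opcode, top 6 bits */
    unsigned rt;     /* target register */
    unsigned ra;     /* base register */
    int16_t d;       /* signed 16-bit immediate / displacement */
};

/* Split a 32-bit instruction word into its D-form fields. */
static struct dform decode_dform(uint32_t insn)
{
    struct dform f;

    f.opcode = insn >> 26;
    f.rt = (insn >> 21) & 0x1f;
    f.ra = (insn >> 16) & 0x1f;
    f.d = (int16_t)(insn & 0xffff);
    return f;
}

/* Effective address of a D-form load: base register value plus displacement. */
static uint64_t dform_ea(uint64_t ra_value, struct dform f)
{
    return ra_value + (uint64_t)(int64_t)f.d;
}

/* Decode the two words from the report and print what they access. */
static void print_faulting_access(void)
{
    struct dform setup = decode_dform(0x3883ffffu); /* first word of the last dump line */
    struct dform fault = decode_dform(0x8ca40001u); /* word marked <...> in the dump */

    assert(setup.opcode == 14); /* addi */
    assert(fault.opcode == 35); /* lbzu */

    printf("addi r%u,r%u,%d\n", setup.rt, setup.ra, setup.d);
    printf("lbzu r%u,%d(r%u)\n", fault.rt, fault.d, fault.ra);

    /* GPR04 is taken from the register dump in the report. */
    printf("effective address: 0x%llx\n",
           (unsigned long long)dform_ea(0xffffffffffffffffULL, fault));
}
```

Calling print_faulting_access() decodes the words to addi r4,r3,-1 followed by lbzu r5,1(r4). With GPR04 = ffffffffffffffff in the register dump, the load targets address 0, which is consistent with GPR03 = 0, so strlen was apparently entered with a NULL argument.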

The C code for 0xffffffff81a0dd0e is the following (line 524):
515    static size_t rtnl_link_get_size(const struct net_device *dev)
516    {
517        const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
518        size_t size;
519
520        if (!ops)
521            return 0;
522
523        size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
524               nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */

But ops is assigned the address of sit_link_ops in function sit_init_net
(line 1917), so I guess something must have happened between the calls.
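
The difference between ops and ops->kind matters here, so here is a small userspace sketch (not kernel code, and not a proposed fix). The *_model structs, the enum and both helper functions are hypothetical stand-ins that keep only the fields used on lines 517-524. The point is that a NULL ops is caught by the check on line 520, while a non-NULL ops whose kind reads as NULL would go straight into strlen:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct link_ops_model {
    const char *kind;
};

struct net_device_model {
    const struct link_ops_model *rtnl_link_ops;
};

enum kind_state { KIND_OK, OPS_NULL, KIND_NULL };

/* Classify what line 524 would see, without touching a bad pointer. */
static enum kind_state classify_kind(const struct net_device_model *dev)
{
    const struct link_ops_model *ops;

    assert(dev != NULL);
    ops = dev->rtnl_link_ops;
    if (!ops)
        return OPS_NULL;  /* already handled by the check on line 520 */
    if (!ops->kind)
        return KIND_NULL; /* passes line 520, then strlen(NULL) faults */
    return KIND_OK;
}

/* The strlen(ops->kind) + 1 term from line 524, or 0 if strlen could not run safely. */
static size_t kind_attr_len(const struct net_device_model *dev)
{
    if (classify_kind(dev) != KIND_OK)
        return 0;
    return strlen(dev->rtnl_link_ops->kind) + 1;
}
```

With kind set to a string such as "sit", kind_attr_len() returns the string length plus one. With a zeroed ops structure, classify_kind() reports KIND_NULL, which is the case the oops appears to show.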

Do we have KASAN on the IBM machine? Would KASAN help us find out what
happened in between?

I hope I can be of more help.

Thanks
Sincerely
Zhouyi

On Wed, Jan 26, 2022 at 3:24 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Linux folks,
>
>
> I do not know, if this is an rcutorture issue, or if rcutorture found a
> bug with `rtmsg_ifinfo_build_skb()`.
>
>
> Building Linux 5.17-rc1+ (dd81e1c7d5fb) under Ubuntu 21.04 with
>
>      CONFIG_TORTURE_TEST=y
>      CONFIG_RCU_TORTURE_TEST=y
>
> and
>
>      $ clang --version
>      Ubuntu clang version 12.0.0-3ubuntu1~21.04.2
>      Target: powerpc64le-unknown-linux-gnu
>      Thread model: posix
>      InstalledDir: /usr/bin
>      $ make -j100 LLVM=1 LLVM_IAS=0 bindeb-pkg
I also built the kernel with LLVM/Clang.

> [    T1] NET: Registered PF_INET6 protocol family
> [    T1] Segment Routing with IPv6
> [    T1] In-situ OAM (IOAM) with IPv6
> [    T1] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
> [    T1] BUG: Kernel NULL pointer dereference on write at 0x00000000
> [    T1] Faulting instruction address: 0xc0000000008e2400
> [    T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [    T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
> [    T1] Modules linked in:
> [    T1] CPU: 11 PID: 1 Comm: swapper/0 Not tainted
> 5.17.0-rc1-00032-gdd81e1c7d5fb #29
> [    T1] NIP:  c0000000008e2400 LR: c000000000d65db0 CTR: c000000000f0bb60
> [    T1] REGS: c0000000125033e0 TRAP: 0380   Not tainted
> (5.17.0-rc1-00032-gdd81e1c7d5fb)
> [    T1] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 42800c40
> XER: 00000000
> [    T1] CFAR: c000000000d65dac IRQMASK: 0
> [    T1] GPR00: c000000000d65b40 c000000012503680 c00000000290c600
> 0000000000000000
> [    T1] GPR04: ffffffffffffffff 00000000ffffffff 0000000000000000
> 0000000000000cc0
> [    T1] GPR08: 0000000000000000 0000000000000000 ffffffffffffffff
> 0000000000000001
> [    T1] GPR12: 0000000000000000 c000007fffff6c00 c000000000012478
> 0000000000000000
> [    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [    T1] GPR20: 0000000000000000 c000000002810100 0000000000000cc0
> 0000000000000000
> [    T1] GPR24: 0000000000000010 c00000000294cf50 0000000000000000
> 0000000000000000
> [    T1] GPR28: 0000000000000000 c00000001ec61000 0000000000000000
> c000000012503680
> [    T1] NIP [c0000000008e2400] strlen+0x10/0x30
> [    T1] LR [c000000000d65db0] if_nlmsg_size+0x150/0x360
> [    T1] Call Trace:
> [    T1] [c000000012503680] [c0000000125036c0] 0xc0000000125036c0
> (unreliable)
> [    T1] [c0000000125036f0] [c000000000d65b40]
> rtmsg_ifinfo_build_skb+0x80/0x1a0
> [    T1] [c0000000125037b0] [c000000000d66be0] rtmsg_ifinfo+0x70/0xd0
> [    T1] [c000000012503800] [c000000000d4de50]
> register_netdevice+0x690/0x770
> [    T1] [c000000012503890] [c000000000d4e2bc] register_netdev+0x4c/0x80
> [    T1] [c0000000125038c0] [c000000000f4784c] sit_init_net+0x10c/0x1d0
> [    T1] [c000000012503910] [c000000000d33c0c] ops_init+0x13c/0x1b0
> [    T1] [c000000012503970] [c000000000d331bc]
> register_pernet_operations+0xec/0x1e0
> [    T1] [c0000000125039d0] [c000000000d33440]
> register_pernet_device+0x60/0xd0
> [    T1] [c000000012503a20] [c000000002085478] sit_init+0x54/0x160
> [    T1] [c000000012503ab0] [c000000000011ba8] do_one_initcall+0xd8/0x3b0
> [    T1] [c000000012503c70] [c000000002006064] do_initcall_level+0xe4/0x1c4
> [    T1] [c000000012503cc0] [c000000002005f20] do_initcalls+0x84/0xe4
> [    T1] [c000000012503d40] [c000000002005c7c]
> kernel_init_freeable+0x160/0x1ec
> [    T1] [c000000012503da0] [c0000000000124ac] kernel_init+0x3c/0x270
> [    T1] [c000000012503e10] [c00000000000cd64]
> ret_from_kernel_thread+0x5c/0x64
> [    T1] Instruction dump:
> [    T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000
> 60000000
> [    T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000
> 4082fff8 7c632050
> [    T1] ---[ end trace 0000000000000000 ]---
> [    T1]
> [  T206] ata4: SATA link down (SStatus 0 SControl 300)
> [  T204] ata3: SATA link down (SStatus 0 SControl 300)
> [  T200] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [  T200] ata1.00: ATA-10: ST1000NX0313         00LY266 00LY265IBM, BE33,
> max UDMA/133
> [  T200] ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 32), AA
> [  T200] ata1.00: configured for UDMA/133
> [    T7] scsi 0:0:0:0: Direct-Access     ATA      ST1000NX0313     BE33
> PQ: 0 ANSI: 5
> [    T7] sd 0:0:0:0: Attached scsi generic sg0 type 0
> [  T209] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00
> TB/932 GiB)
> [  T209] sd 0:0:0:0: [sda] 4096-byte physical blocks
> [  T209] sd 0:0:0:0: [sda] Write Protect is off
> [  T209] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [  T209] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled,
> doesn't support DPO or FUA
> [  T209]  sda: sda1 sda2
> [  T209] sd 0:0:0:0: [sda] Attached SCSI removable disk
> [    T1] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b
> ```
>
>
> Kind regards,
>
> Paul
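
A note on reading the register dump above, as a minimal sketch and not a definitive analysis: the bracketed word in the instruction dump, <8ca40001>, is the faulting instruction. Assuming the standard PowerPC D-form layout (6-bit primary opcode, two 5-bit register fields, 16-bit signed displacement) and that primary opcode 35 is lbzu, it decodes to `lbzu r5,1(r4)`. With GPR04 = ffffffffffffffff the effective address is r4 + 1 = 0, which matches the reported fault address, and GPR03 = 0 suggests that strlen() was called with a NULL pointer. The helper names below (struct dform, decode_dform, effective_address, print_fault) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of a PowerPC D-form instruction (hypothetical helper type). */
struct dform {
	unsigned int opcode; /* primary opcode, top 6 bits */
	unsigned int rt;     /* target register */
	unsigned int ra;     /* base register */
	int16_t d;           /* signed 16-bit displacement */
};

static struct dform decode_dform(uint32_t insn)
{
	struct dform f;

	f.opcode = insn >> 26;
	f.rt = (insn >> 21) & 0x1f;
	f.ra = (insn >> 16) & 0x1f;
	f.d = (int16_t)(insn & 0xffff);
	return f;
}

/* Effective address of a D-form load: base register value plus displacement. */
static uint64_t effective_address(uint64_t ra_value, int16_t d)
{
	return ra_value + (uint64_t)(int64_t)d;
}

/* Print the decoded instruction and the address it would access. */
static void print_fault(uint32_t insn, uint64_t ra_value)
{
	struct dform f = decode_dform(insn);

	assert(f.opcode == 35); /* assumption: primary opcode 35 is lbzu */
	printf("lbzu r%u,%d(r%u) -> EA 0x%llx\n", f.rt, f.d, f.ra,
	       (unsigned long long)effective_address(ra_value, f.d));
}
```

Calling print_fault(0x8ca40001, 0xffffffffffffffffULL) with the values from the oops should report an effective address of 0.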

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-01-29  2:23 ` Zhouyi Zhou
@ 2022-01-29 16:52   ` Paul Menzel
  2022-01-30  0:21     ` Zhouyi Zhou
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Menzel @ 2022-01-29 16:52 UTC (permalink / raw)
  To: Zhouyi Zhou
  Cc: Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller,
	Jakub Kicinski, netdev

Dear Zhouyi,


Thank you for taking the time.


Am 29.01.22 um 03:23 schrieb Zhouyi Zhou:

> I don't have an IBM machine, but I tried to analyze the problem using
> my x86_64 kvm virtual machine, I can't reproduce the bug using my
> x86_64 kvm virtual machine.

No idea whether it’s architecture-specific.

> I saw the panic is caused by registration of sit device (A sit device
> is a type of virtual network device that takes our IPv6 traffic,
> encapsulates/decapsulates it in IPv4 packets, and sends/receives it
> over the IPv4 Internet to another host)
> 
> sit device is registered in function sit_init_net:
> 1895    static int __net_init sit_init_net(struct net *net)
> 1896    {
> 1897        struct sit_net *sitn = net_generic(net, sit_net_id);
> 1898        struct ip_tunnel *t;
> 1899        int err;
> 1900
> 1901        sitn->tunnels[0] = sitn->tunnels_wc;
> 1902        sitn->tunnels[1] = sitn->tunnels_l;
> 1903        sitn->tunnels[2] = sitn->tunnels_r;
> 1904        sitn->tunnels[3] = sitn->tunnels_r_l;
> 1905
> 1906        if (!net_has_fallback_tunnels(net))
> 1907            return 0;
> 1908
> 1909        sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
> 1910                           NET_NAME_UNKNOWN,
> 1911                           ipip6_tunnel_setup);
> 1912        if (!sitn->fb_tunnel_dev) {
> 1913            err = -ENOMEM;
> 1914            goto err_alloc_dev;
> 1915        }
> 1916        dev_net_set(sitn->fb_tunnel_dev, net);
> 1917        sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
> 1918        /* FB netdevice is special: we have one, and only one per netns.
> 1919         * Allowing to move it to another netns is clearly unsafe.
> 1920         */
> 1921        sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> 1922
> 1923        err = register_netdev(sitn->fb_tunnel_dev);
> register_netdev on line 1923 will call if_nlmsg_size indirectly.
> 
> On the other hand, the function that calls the panicked strlen is if_nlmsg_size:
> (gdb) disassemble if_nlmsg_size
> Dump of assembler code for function if_nlmsg_size:
>     0xffffffff81a0dc20 <+0>:    nopl   0x0(%rax,%rax,1)
>     0xffffffff81a0dc25 <+5>:    push   %rbp
>     0xffffffff81a0dc26 <+6>:    push   %r15
>     0xffffffff81a0dd04 <+228>:    je     0xffffffff81a0de20 <if_nlmsg_size+512>
>     0xffffffff81a0dd0a <+234>:    mov    0x10(%rbp),%rdi
>     ...
>   => 0xffffffff81a0dd0e <+238>:    callq  0xffffffff817532d0 <strlen>
>     0xffffffff81a0dd13 <+243>:    add    $0x10,%eax
>     0xffffffff81a0dd16 <+246>:    movslq %eax,%r12

Excuse my ignorance, would that look the same for ppc64le? 
Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a 
current build (without rcutorture) I have the line below, where strlen 
shows up.

     (gdb) disassemble if_nlmsg_size
     […]
     0xc000000000f7f82c <+332>:	bl      0xc000000000a10e30 <strlen>
     […]

> and the C code for 0xffffffff81a0dd0e is following (line 524):
> 515    static size_t rtnl_link_get_size(const struct net_device *dev)
> 516    {
> 517        const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> 518        size_t size;
> 519
> 520        if (!ops)
> 521            return 0;
> 522
> 523        size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> 524               nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */

How do I connect the disassembly output with the corresponding line?

> But ops is assigned the value of sit_link_ops in function sit_init_net
> line 1917, so I guess something must happened between the calls.
> 
> Do we have KASAN in IBM machine? would KASAN help us find out what
> happened in between?

Unfortunately, KASAN is not supported on the Power machine I have, as far
as I can see. From `arch/powerpc/Kconfig`:

         select HAVE_ARCH_KASAN                  if PPC32 && PPC_PAGE_SHIFT <= 14
         select HAVE_ARCH_KASAN_VMALLOC          if PPC32 && PPC_PAGE_SHIFT <= 14

> Hope I can be of more helpful.

Some distributions support multi-arch, so they make it easy to
cross-compile for different architectures.


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-01-29 16:52   ` Paul Menzel
@ 2022-01-30  0:21     ` Zhouyi Zhou
  2022-01-30  8:19       ` Paul Menzel
  0 siblings, 1 reply; 17+ messages in thread
From: Zhouyi Zhou @ 2022-01-30  0:21 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller,
	Jakub Kicinski, netdev

Dear Paul,

Thank you for your instructions; I learned a lot from this process.

On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Zhouyi,
>
>
> Thank you for taking the time.
>
>
> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou:
>
> > I don't have an IBM machine, but I tried to analyze the problem using
> > my x86_64 kvm virtual machine, I can't reproduce the bug using my
> > x86_64 kvm virtual machine.
>
> No idea, if it’s architecture specific.
>
> > I saw the panic is caused by registration of sit device (A sit device
> > is a type of virtual network device that takes our IPv6 traffic,
> > encapsulates/decapsulates it in IPv4 packets, and sends/receives it
> > over the IPv4 Internet to another host)
> >
> > sit device is registered in function sit_init_net:
> > 1895    static int __net_init sit_init_net(struct net *net)
> > 1896    {
> > 1897        struct sit_net *sitn = net_generic(net, sit_net_id);
> > 1898        struct ip_tunnel *t;
> > 1899        int err;
> > 1900
> > 1901        sitn->tunnels[0] = sitn->tunnels_wc;
> > 1902        sitn->tunnels[1] = sitn->tunnels_l;
> > 1903        sitn->tunnels[2] = sitn->tunnels_r;
> > 1904        sitn->tunnels[3] = sitn->tunnels_r_l;
> > 1905
> > 1906        if (!net_has_fallback_tunnels(net))
> > 1907            return 0;
> > 1908
> > 1909        sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
> > 1910                           NET_NAME_UNKNOWN,
> > 1911                           ipip6_tunnel_setup);
> > 1912        if (!sitn->fb_tunnel_dev) {
> > 1913            err = -ENOMEM;
> > 1914            goto err_alloc_dev;
> > 1915        }
> > 1916        dev_net_set(sitn->fb_tunnel_dev, net);
> > 1917        sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
> > 1918        /* FB netdevice is special: we have one, and only one per netns.
> > 1919         * Allowing to move it to another netns is clearly unsafe.
> > 1920         */
> > 1921        sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > 1922
> > 1923        err = register_netdev(sitn->fb_tunnel_dev);
> > register_netdev on line 1923 will call if_nlmsg_size indirectly.
> >
> > On the other hand, the function that calls the panicked strlen is if_nlmsg_size:
> > (gdb) disassemble if_nlmsg_size
> > Dump of assembler code for function if_nlmsg_size:
> >     0xffffffff81a0dc20 <+0>:    nopl   0x0(%rax,%rax,1)
> >     0xffffffff81a0dc25 <+5>:    push   %rbp
> >     0xffffffff81a0dc26 <+6>:    push   %r15
> >     0xffffffff81a0dd04 <+228>:    je     0xffffffff81a0de20 <if_nlmsg_size+512>
> >     0xffffffff81a0dd0a <+234>:    mov    0x10(%rbp),%rdi
> >     ...
> >   => 0xffffffff81a0dd0e <+238>:    callq  0xffffffff817532d0 <strlen>
> >     0xffffffff81a0dd13 <+243>:    add    $0x10,%eax
> >     0xffffffff81a0dd16 <+246>:    movslq %eax,%r12
>
> Excuse my ignorance, would that look the same for ppc64le?
> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
> current build (without rcutorture) I have the line below, where strlen
> shows up.
>
>      (gdb) disassemble if_nlmsg_size
>      […]
>      0xc000000000f7f82c <+332>: bl      0xc000000000a10e30 <strlen>
>      […]
>
> > and the C code for 0xffffffff81a0dd0e is following (line 524):
> > 515    static size_t rtnl_link_get_size(const struct net_device *dev)
> > 516    {
> > 517        const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > 518        size_t size;
> > 519
> > 520        if (!ops)
> > 521            return 0;
> > 522
> > 523        size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > 524               nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
>
> How do I connect the disassembly output with the corresponding line?
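I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross-compile the kernel
for powerpc64le on my Ubuntu 20.04 x86_64 machine. Then I ask gdb which
source line an instruction address belongs to by setting a breakpoint on it: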

gdb-multiarch ./vmlinux
(gdb) disassemble if_nlmsg_size
[...]
0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
[...]
(gdb) break *0xc00000000191bf40
Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.

But in include/net/netlink.h:1112, I can't find the call to strlen:
1110 static inline int nla_total_size(int payload)
1111 {
1112         return NLA_ALIGN(nla_attr_size(payload));
1113 }
I guess this may be because the compiler encoded the debug information incorrectly.
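
Another possible reading, offered as a sketch and not as a confirmed diagnosis: nla_total_size() is a static inline, so its body is compiled into the caller, and the argument strlen(ops->kind) + 1 has to be evaluated by the caller first. The debugger may then attribute the bl to strlen to netlink.h:1112 even though the call is written in rtnl_link_get_size(). The user-space model below mirrors the two size helpers. The model_ names are hypothetical, the 4-byte alignment and 4-byte attribute header are assumptions taken from the netlink uapi definitions, and "sit" is assumed to be the kind string of sit_link_ops.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NLA_ALIGNTO 4u /* assumed value of NLA_ALIGNTO */
#define MODEL_NLA_HDRLEN  4u /* assumed aligned sizeof(struct nlattr) */

/* Round len up to the next multiple of the attribute alignment. */
static inline size_t model_nla_align(size_t len)
{
	return (len + MODEL_NLA_ALIGNTO - 1) & ~(size_t)(MODEL_NLA_ALIGNTO - 1);
}

/* Attribute header plus payload, before padding. */
static inline size_t model_nla_attr_size(size_t payload)
{
	return MODEL_NLA_HDRLEN + payload;
}

/* Models nla_total_size(): padded size of one attribute. */
static inline size_t model_nla_total_size(size_t payload)
{
	return model_nla_align(model_nla_attr_size(payload));
}

/*
 * Models lines 523-524 of rtnl_link_get_size(). strlen() runs in the
 * caller before the inlined helper body, so a NULL kind faults here.
 */
static size_t model_link_info_size(const char *kind)
{
	assert(kind != NULL); /* the condition the oops suggests was violated */
	return model_nla_total_size(MODEL_NLA_HDRLEN) +  /* IFLA_LINKINFO */
	       model_nla_total_size(strlen(kind) + 1);   /* IFLA_INFO_KIND */
}
```

Under these assumptions, model_link_info_size("sit") comes to 16 bytes: 8 for the nested IFLA_LINKINFO header and 8 for the four-byte "sit\0" payload plus its header.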

>
> > But ops is assigned the value of sit_link_ops in function sit_init_net
> > line 1917, so I guess something must happened between the calls.
> >
> > Do we have KASAN in IBM machine? would KASAN help us find out what
> > happened in between?
>
> Unfortunately, KASAN is not supported on the Power machine I have, as far
> as I can see. From `arch/powerpc/Kconfig`:
>
>          select HAVE_ARCH_KASAN                  if PPC32 &&
> PPC_PAGE_SHIFT <= 14
>          select HAVE_ARCH_KASAN_VMALLOC          if PPC32 &&
> PPC_PAGE_SHIFT <= 14
>
Yes, agreed. When I invoke "make menuconfig ARCH=powerpc
CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
16", I can't find KASAN under Memory Debugging. I guess we should find
the bug by bisecting instead.

> > Hope I can be of more helpful.
>
> Some distributions support multi-arch, so they easily allow
> crosscompiling for different architectures.
I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross-compile the kernel
for powerpc64le on my Ubuntu 20.04 x86_64 machine. But I can't boot the
compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will keep
investigating.

Kind regards
Zhouyi

>
>
> Kind regards,
>
> Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-01-30  0:21     ` Zhouyi Zhou
@ 2022-01-30  8:19       ` Paul Menzel
  2022-01-30 13:24         ` Zhouyi Zhou
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Menzel @ 2022-01-30  8:19 UTC (permalink / raw)
  To: Zhouyi Zhou
  Cc: Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller,
	Jakub Kicinski, netdev

Dear Zhouyi,


Am 30.01.22 um 01:21 schrieb Zhouyi Zhou:

> Thank you for your instructions, I learned a lot from this process.

Same on my end.

> On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:

>> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou:
>>
>>> I don't have an IBM machine, but I tried to analyze the problem using
>>> my x86_64 kvm virtual machine, I can't reproduce the bug using my
>>> x86_64 kvm virtual machine.
>>
>> No idea, if it’s architecture specific.
>>
>>> I saw the panic is caused by registration of sit device (A sit device
>>> is a type of virtual network device that takes our IPv6 traffic,
>>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it
>>> over the IPv4 Internet to another host)
>>>
>>> sit device is registered in function sit_init_net:
>>> 1895    static int __net_init sit_init_net(struct net *net)
>>> 1896    {
>>> 1897        struct sit_net *sitn = net_generic(net, sit_net_id);
>>> 1898        struct ip_tunnel *t;
>>> 1899        int err;
>>> 1900
>>> 1901        sitn->tunnels[0] = sitn->tunnels_wc;
>>> 1902        sitn->tunnels[1] = sitn->tunnels_l;
>>> 1903        sitn->tunnels[2] = sitn->tunnels_r;
>>> 1904        sitn->tunnels[3] = sitn->tunnels_r_l;
>>> 1905
>>> 1906        if (!net_has_fallback_tunnels(net))
>>> 1907            return 0;
>>> 1908
>>> 1909        sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
>>> 1910                           NET_NAME_UNKNOWN,
>>> 1911                           ipip6_tunnel_setup);
>>> 1912        if (!sitn->fb_tunnel_dev) {
>>> 1913            err = -ENOMEM;
>>> 1914            goto err_alloc_dev;
>>> 1915        }
>>> 1916        dev_net_set(sitn->fb_tunnel_dev, net);
>>> 1917        sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
>>> 1918        /* FB netdevice is special: we have one, and only one per netns.
>>> 1919         * Allowing to move it to another netns is clearly unsafe.
>>> 1920         */
>>> 1921        sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
>>> 1922
>>> 1923        err = register_netdev(sitn->fb_tunnel_dev);
>>> register_netdev on line 1923 will call if_nlmsg_size indirectly.
>>>
>>> On the other hand, the function that calls the panicked strlen is if_nlmsg_size:
>>> (gdb) disassemble if_nlmsg_size
>>> Dump of assembler code for function if_nlmsg_size:
>>>      0xffffffff81a0dc20 <+0>:    nopl   0x0(%rax,%rax,1)
>>>      0xffffffff81a0dc25 <+5>:    push   %rbp
>>>      0xffffffff81a0dc26 <+6>:    push   %r15
>>>      0xffffffff81a0dd04 <+228>:    je     0xffffffff81a0de20 <if_nlmsg_size+512>
>>>      0xffffffff81a0dd0a <+234>:    mov    0x10(%rbp),%rdi
>>>      ...
>>>    => 0xffffffff81a0dd0e <+238>:    callq  0xffffffff817532d0 <strlen>
>>>      0xffffffff81a0dd13 <+243>:    add    $0x10,%eax
>>>      0xffffffff81a0dd16 <+246>:    movslq %eax,%r12
>>
>> Excuse my ignorance, would that look the same for ppc64le?
>> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
>> current build (without rcutorture) I have the line below, where strlen
>> shows up.
>>
>>       (gdb) disassemble if_nlmsg_size
>>       […]
>>       0xc000000000f7f82c <+332>: bl      0xc000000000a10e30 <strlen>
>>       […]
>>
>>> and the C code for 0xffffffff81a0dd0e is following (line 524):
>>> 515    static size_t rtnl_link_get_size(const struct net_device *dev)
>>> 516    {
>>> 517        const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
>>> 518        size_t size;
>>> 519
>>> 520        if (!ops)
>>> 521            return 0;
>>> 522
>>> 523        size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
>>> 524               nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
>>
>> How do I connect the disassembly output with the corresponding line?
> I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> for powerpc64le in my Ubuntu 20.04 x86_64.
> 
> gdb-multiarch ./vmlinux
> (gdb)disassemble if_nlmsg_size
> [...]
> 0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
> [...]
> (gdb) break *0xc00000000191bf40
> Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> 
> But in include/net/netlink.h:1112, I can't find the call to strlen
> 1110static inline int nla_total_size(int payload)
> 1111{
> 1112        return NLA_ALIGN(nla_attr_size(payload));
> 1113}
> This may be due to the compiler wrongly encode the debug information, I guess.

`rtnl_link_get_size()` contains:

             size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
                    nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */

Is that inlined, and is it the code at fault?

>>> But ops is assigned the value of sit_link_ops in function sit_init_net
>>> line 1917, so I guess something must happened between the calls.
>>>
>>> Do we have KASAN in IBM machine? would KASAN help us find out what
>>> happened in between?
>>
>> Unfortunately, KASAN is not supported on the Power machine I have, as far
>> as I can see. From `arch/powerpc/Kconfig`:
>>
>>           select HAVE_ARCH_KASAN                  if PPC32 && PPC_PAGE_SHIFT <= 14
>>           select HAVE_ARCH_KASAN_VMALLOC          if PPC32 && PPC_PAGE_SHIFT <= 14
>>
> en, agree, I invoke "make  menuconfig  ARCH=powerpc
> CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> 16", I can't find KASAN under Memory Debugging, I guess we should find
> the bug by bisecting instead.

I do not know whether it is a regression, as it was the first time I tried
to run a Linux kernel built with rcutorture on real hardware.

>>> Hope I can be of more helpful.
>>
>> Some distributions support multi-arch, so they easily allow
>> crosscompiling for different architectures.
> I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> to explore it.

Oh, that does not sound good. But I have not tried that in a long time 
either. It’s a separate issue, but maybe some of the PPC 
maintainers/folks could help.


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-01-30  8:19       ` Paul Menzel
@ 2022-01-30 13:24         ` Zhouyi Zhou
  2022-01-30 17:44           ` Paul E. McKenney
       [not found]           ` <f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de>
  0 siblings, 2 replies; 17+ messages in thread
From: Zhouyi Zhou @ 2022-01-30 13:24 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller,
	Jakub Kicinski, netdev

Dear Paul,

On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Zhouyi,
>
>
> Am 30.01.22 um 01:21 schrieb Zhouyi Zhou:
>
> > Thank you for your instructions, I learned a lot from this process.
>
> Same on my end.
>
> > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou:
> >>
> >>> I don't have an IBM machine, but I tried to analyze the problem using
> >>> my x86_64 kvm virtual machine, I can't reproduce the bug using my
> >>> x86_64 kvm virtual machine.
> >>
> >> No idea, if it’s architecture specific.
> >>
> >>> I saw the panic is caused by registration of sit device (A sit device
> >>> is a type of virtual network device that takes our IPv6 traffic,
> >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it
> >>> over the IPv4 Internet to another host)
> >>>
> >>> sit device is registered in function sit_init_net:
> >>> 1895    static int __net_init sit_init_net(struct net *net)
> >>> 1896    {
> >>> 1897        struct sit_net *sitn = net_generic(net, sit_net_id);
> >>> 1898        struct ip_tunnel *t;
> >>> 1899        int err;
> >>> 1900
> >>> 1901        sitn->tunnels[0] = sitn->tunnels_wc;
> >>> 1902        sitn->tunnels[1] = sitn->tunnels_l;
> >>> 1903        sitn->tunnels[2] = sitn->tunnels_r;
> >>> 1904        sitn->tunnels[3] = sitn->tunnels_r_l;
> >>> 1905
> >>> 1906        if (!net_has_fallback_tunnels(net))
> >>> 1907            return 0;
> >>> 1908
> >>> 1909        sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
> >>> 1910                           NET_NAME_UNKNOWN,
> >>> 1911                           ipip6_tunnel_setup);
> >>> 1912        if (!sitn->fb_tunnel_dev) {
> >>> 1913            err = -ENOMEM;
> >>> 1914            goto err_alloc_dev;
> >>> 1915        }
> >>> 1916        dev_net_set(sitn->fb_tunnel_dev, net);
> >>> 1917        sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
> >>> 1918        /* FB netdevice is special: we have one, and only one per netns.
> >>> 1919         * Allowing to move it to another netns is clearly unsafe.
> >>> 1920         */
> >>> 1921        sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> >>> 1922
> >>> 1923        err = register_netdev(sitn->fb_tunnel_dev);
> >>> register_netdev on line 1923 will call if_nlmsg_size indirectly.
> >>>
> >>> On the other hand, the function that calls the panicked strlen is if_nlmsg_size:
> >>> (gdb) disassemble if_nlmsg_size
> >>> Dump of assembler code for function if_nlmsg_size:
> >>>      0xffffffff81a0dc20 <+0>:    nopl   0x0(%rax,%rax,1)
> >>>      0xffffffff81a0dc25 <+5>:    push   %rbp
> >>>      0xffffffff81a0dc26 <+6>:    push   %r15
> >>>      0xffffffff81a0dd04 <+228>:    je     0xffffffff81a0de20 <if_nlmsg_size+512>
> >>>      0xffffffff81a0dd0a <+234>:    mov    0x10(%rbp),%rdi
> >>>      ...
> >>>    => 0xffffffff81a0dd0e <+238>:    callq  0xffffffff817532d0 <strlen>
> >>>      0xffffffff81a0dd13 <+243>:    add    $0x10,%eax
> >>>      0xffffffff81a0dd16 <+246>:    movslq %eax,%r12
> >>
> >> Excuse my ignorance, would that look the same for ppc64le?
> >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
> >> current build (without rcutorture) I have the line below, where strlen
> >> shows up.
> >>
> >>       (gdb) disassemble if_nlmsg_size
> >>       […]
> >>       0xc000000000f7f82c <+332>: bl      0xc000000000a10e30 <strlen>
> >>       […]
> >>
> >>> and the C code for 0xffffffff81a0dd0e is following (line 524):
> >>> 515    static size_t rtnl_link_get_size(const struct net_device *dev)
> >>> 516    {
> >>> 517        const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> >>> 518        size_t size;
> >>> 519
> >>> 520        if (!ops)
> >>> 521            return 0;
> >>> 522
> >>> 523        size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> >>> 524               nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> >>
> >> How do I connect the disassembly output with the corresponding line?
> > I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > for powerpc64le in my Ubuntu 20.04 x86_64.
> >
> > gdb-multiarch ./vmlinux
> > (gdb)disassemble if_nlmsg_size
> > [...]
> > 0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
> > [...]
> > (gdb) break *0xc00000000191bf40
> > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> >
> > But in include/net/netlink.h:1112, I can't find the call to strlen
> > 1110static inline int nla_total_size(int payload)
> > 1111{
> > 1112        return NLA_ALIGN(nla_attr_size(payload));
> > 1113}
> > This may be due to the compiler wrongly encode the debug information, I guess.
>
> `rtnl_link_get_size()` contains:
>
>              size = nla_total_size(sizeof(struct nlattr)) + /*
> IFLA_LINKINFO */
>                     nla_total_size(strlen(ops->kind) + 1);  /*
> IFLA_INFO_KIND */
>
> Is that inlined(?) and the code at fault?
Yes, that is inlined! The instructions inside if_nlmsg_size map back to
rtnl_link_get_size() (net/core/rtnetlink.c, line 520):
(gdb) disassemble if_nlmsg_size
Dump of assembler code for function if_nlmsg_size:
[...]
0xc00000000191bf38 <+104>:    beq     0xc00000000191c1f0 <if_nlmsg_size+800>
0xc00000000191bf3c <+108>:    ld      r3,16(r31)
0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
[...]
(gdb)
(gdb) break *0xc00000000191bf40
Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
(gdb) break *0xc00000000191bf38
Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.
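
As a cross-check of the `ld r3,16(r31)` above, here is a small user-space sketch. It assumes that struct rtnl_link_ops starts with a struct list_head followed by the kind pointer, which is believed to match include/net/rtnetlink.h but has not been verified against this exact tree; the model_ types and functions are hypothetical stand-ins. On a 64-bit build the kind member then sits at offset 16, so r3 would be loaded with ops->kind right before the call to strlen.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct list_head: two pointers. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

/* Stand-in for the first two members of struct rtnl_link_ops (assumed layout). */
struct model_rtnl_link_ops {
	struct model_list_head list;
	const char *kind;
};

/* Byte offset of kind inside the model structure. */
static size_t model_kind_offset(void)
{
	return offsetof(struct model_rtnl_link_ops, kind);
}

/* Read kind the way the disassembly does: load a pointer from ops + offset. */
static const char *model_load_kind(const struct model_rtnl_link_ops *ops)
{
	assert(ops != NULL);
	return *(const char *const *)((const char *)ops + model_kind_offset());
}
```

If this layout is right, a non-NULL ops with a NULL (or not yet initialised) kind field would pass the `if (!ops)` check on line 520 and still hand strlen a NULL pointer.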

>
> >>> But ops is assigned the value of sit_link_ops in function sit_init_net
> >>> line 1917, so I guess something must happened between the calls.
> >>>
> >>> Do we have KASAN in IBM machine? would KASAN help us find out what
> >>> happened in between?
> >>
> >> Unfortunately, KASAN is not supported on the Power machine I have, as far
> >> as I can see. From `arch/powerpc/Kconfig`:
> >>
> >>           select HAVE_ARCH_KASAN                  if PPC32 && PPC_PAGE_SHIFT <= 14
> >>           select HAVE_ARCH_KASAN_VMALLOC          if PPC32 && PPC_PAGE_SHIFT <= 14
> >>
> > en, agree, I invoke "make  menuconfig  ARCH=powerpc
> > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > 16", I can't find KASAN under Memory Debugging, I guess we should find
> > the bug by bisecting instead.
>
> I do not know, if it is a regression, as it was the first time I tried
> to run a Linux kernel built with rcutorture on real hardware.
I added some debug statements to the kernel to locate the bug more
accurately. You can try the patch below when you're not busy, or just
ignore it if it does not look very effective ;-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 1baab07820f6..969ac7c540cc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
      *    Prevent userspace races by waiting until the network
      *    device is fully setup before sending notifications.
      */
+    if (dev->rtnl_link_ops)
+        printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+               dev->rtnl_link_ops->kind, __FUNCTION__);
     if (!dev->rtnl_link_ops ||
         dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
         rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
@@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)

     if (rtnl_lock_killable())
         return -EINTR;
+    if (dev->rtnl_link_ops)
+        printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+               dev->rtnl_link_ops->kind, __func__);
     err = register_netdevice(dev);
     rtnl_unlock();
     return err;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e476403231f0..e08986ae6238 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
     if (!ops)
         return 0;

+    printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", ops,
+           ops->kind, __func__);
     size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
            nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */

@@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct net_device *dev)
 static noinline size_t if_nlmsg_size(const struct net_device *dev,
                      u32 ext_filter_mask)
 {
+    if (dev->rtnl_link_ops)
+        printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+               dev->rtnl_link_ops->kind, __func__);
     return NLMSG_ALIGN(sizeof(struct ifinfomsg))
            + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
            + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
@@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
     struct net *net = dev_net(dev);
     struct sk_buff *skb;
     int err = -ENOBUFS;
-
+    if (dev->rtnl_link_ops)
+        printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+               dev->rtnl_link_ops->kind, __func__);
     skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
     if (skb == NULL)
         goto errout;
@@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct net_device *dev,

     if (dev->reg_state != NETREG_REGISTERED)
         return;
-
+    if (dev->rtnl_link_ops)
+        printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+               dev->rtnl_link_ops->kind, __func__);
     skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
                      new_ifindex);
     if (skb)
@@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct net_device *dev,
 void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
           gfp_t flags)
 {
+    if (dev->rtnl_link_ops)
+        printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+               dev->rtnl_link_ops->kind, __func__);
     rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
                NULL, 0);
 }
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index c0b138c20992..fa5b2725811c 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
      * Allowing to move it to another netns is clearly unsafe.
      */
     sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
-
+    printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n",
+           sitn->fb_tunnel_dev->rtnl_link_ops,
+           sitn->fb_tunnel_dev->rtnl_link_ops->kind, __func__);
     err = register_netdev(sitn->fb_tunnel_dev);
     if (err)
         goto err_reg_dev;
>
> >>> Hope I can be of more helpful.
> >>
> >> Some distributions support multi-arch, so they easily allow
> >> crosscompiling for different architectures.
> > I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > to explore it.
>
> Oh, that does not sound good. But I have not tried that in a long time
> either. It’s a separate issue, but maybe some of the PPC
> maintainers/folks could help.
I will do further research on this later.

Thanks for your time
Kind regards
Zhouyi
>
>
> Kind regards,
>
> Paul

* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-01-30 13:24         ` Zhouyi Zhou
@ 2022-01-30 17:44           ` Paul E. McKenney
  2022-01-31  1:08             ` Zhouyi Zhou
       [not found]           ` <f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de>
  1 sibling, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2022-01-30 17:44 UTC
  To: Zhouyi Zhou
  Cc: Paul Menzel, Josh Triplett, rcu, LKML, David S. Miller,
	Jakub Kicinski, netdev

On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote:
> Dear Paul
> 
> On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> >
> > Dear Zhouyi,
> >
> >
> > Am 30.01.22 um 01:21 schrieb Zhouyi Zhou:
> >
> > > Thank you for your instructions, I learned a lot from this process.
> >
> > Same on my end.
> >
> > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> >
> > >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou:
> > >>
> > >>> I don't have an IBM machine, but I tried to analyze the problem using
> > >>> my x86_64 kvm virtual machine, I can't reproduce the bug using my
> > >>> x86_64 kvm virtual machine.
> > >>
> > >> No idea, if it’s architecture specific.
> > >>
> > >>> I saw the panic is caused by registration of sit device (A sit device
> > >>> is a type of virtual network device that takes our IPv6 traffic,
> > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it
> > >>> over the IPv4 Internet to another host)
> > >>>
> > >>> sit device is registered in function sit_init_net:
> > >>> 1895    static int __net_init sit_init_net(struct net *net)
> > >>> 1896    {
> > >>> 1897        struct sit_net *sitn = net_generic(net, sit_net_id);
> > >>> 1898        struct ip_tunnel *t;
> > >>> 1899        int err;
> > >>> 1900
> > >>> 1901        sitn->tunnels[0] = sitn->tunnels_wc;
> > >>> 1902        sitn->tunnels[1] = sitn->tunnels_l;
> > >>> 1903        sitn->tunnels[2] = sitn->tunnels_r;
> > >>> 1904        sitn->tunnels[3] = sitn->tunnels_r_l;
> > >>> 1905
> > >>> 1906        if (!net_has_fallback_tunnels(net))
> > >>> 1907            return 0;
> > >>> 1908
> > >>> 1909        sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
> > >>> 1910                           NET_NAME_UNKNOWN,
> > >>> 1911                           ipip6_tunnel_setup);
> > >>> 1912        if (!sitn->fb_tunnel_dev) {
> > >>> 1913            err = -ENOMEM;
> > >>> 1914            goto err_alloc_dev;
> > >>> 1915        }
> > >>> 1916        dev_net_set(sitn->fb_tunnel_dev, net);
> > >>> 1917        sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
> > >>> 1918        /* FB netdevice is special: we have one, and only one per netns.
> > >>> 1919         * Allowing to move it to another netns is clearly unsafe.
> > >>> 1920         */
> > >>> 1921        sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > >>> 1922
> > >>> 1923        err = register_netdev(sitn->fb_tunnel_dev);
> > >>> register_netdev on line 1923 will call if_nlmsg_size indirectly.
> > >>>
> > >>> On the other hand, the function that calls the paniced strlen is if_nlmsg_size:
> > >>> (gdb) disassemble if_nlmsg_size
> > >>> Dump of assembler code for function if_nlmsg_size:
> > >>>      0xffffffff81a0dc20 <+0>:    nopl   0x0(%rax,%rax,1)
> > >>>      0xffffffff81a0dc25 <+5>:    push   %rbp
> > >>>      0xffffffff81a0dc26 <+6>:    push   %r15
> > >>>      0xffffffff81a0dd04 <+228>:    je     0xffffffff81a0de20 <if_nlmsg_size+512>
> > >>>      0xffffffff81a0dd0a <+234>:    mov    0x10(%rbp),%rdi
> > >>>      ...
> > >>>    => 0xffffffff81a0dd0e <+238>:    callq  0xffffffff817532d0 <strlen>
> > >>>      0xffffffff81a0dd13 <+243>:    add    $0x10,%eax
> > >>>      0xffffffff81a0dd16 <+246>:    movslq %eax,%r12
> > >>
> > >> Excuse my ignorance, would that look the same for ppc64le?
> > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
> > >> current build (without rcutorture) I have the line below, where strlen
> > >> shows up.
> > >>
> > >>       (gdb) disassemble if_nlmsg_size
> > >>       […]
> > >>       0xc000000000f7f82c <+332>: bl      0xc000000000a10e30 <strlen>
> > >>       […]
> > >>
> > >>> and the C code for 0xffffffff81a0dd0e is following (line 524):
> > >>> 515    static size_t rtnl_link_get_size(const struct net_device *dev)
> > >>> 516    {
> > >>> 517        const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > >>> 518        size_t size;
> > >>> 519
> > >>> 520        if (!ops)
> > >>> 521            return 0;
> > >>> 522
> > >>> 523        size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > >>> 524               nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > >>
> > >> How do I connect the disassemby output with the corresponding line?
> > > I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > for powerpc64le in my Ubuntu 20.04 x86_64.
> > >
> > > gdb-multiarch ./vmlinux
> > > (gdb)disassemble if_nlmsg_size
> > > [...]
> > > 0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
> > > [...]
> > > (gdb) break *0xc00000000191bf40
> > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > >
> > > But in include/net/netlink.h:1112, I can't find the call to strlen
> > > 1110static inline int nla_total_size(int payload)
> > > 1111{
> > > 1112        return NLA_ALIGN(nla_attr_size(payload));
> > > 1113}
> > > This may be due to the compiler wrongly encode the debug information, I guess.
> >
> > `rtnl_link_get_size()` contains:
> >
> >              size = nla_total_size(sizeof(struct nlattr)) + /*
> > IFLA_LINKINFO */
> >                     nla_total_size(strlen(ops->kind) + 1);  /*
> > IFLA_INFO_KIND */
> >
> > Is that inlined(?) and the code at fault?
> Yes, that is inlined! because
> (gdb) disassemble if_nlmsg_size
> Dump of assembler code for function if_nlmsg_size:
> [...]
> 0xc00000000191bf38 <+104>:    beq     0xc00000000191c1f0 <if_nlmsg_size+800>
> 0xc00000000191bf3c <+108>:    ld      r3,16(r31)
> 0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
> [...]
> (gdb)
> (gdb) break *0xc00000000191bf40
> Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> (gdb) break *0xc00000000191bf38
> Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.

I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not
already doing so.  That gives gdb a lot more information about things
like inlining.

							Thanx, Paul

> > >>> But ops is assigned the value of sit_link_ops in function sit_init_net
> > >>> line 1917, so I guess something must happened between the calls.
> > >>>
> > >>> Do we have KASAN in IBM machine? would KASAN help us find out what
> > >>> happened in between?
> > >>
> > >> Unfortunately, KASAN is not support on Power, I have, as far as I can
> > >> see. From `arch/powerpc/Kconfig`:
> > >>
> > >>           select HAVE_ARCH_KASAN                  if PPC32 && PPC_PAGE_SHIFT <= 14
> > >>           select HAVE_ARCH_KASAN_VMALLOC          if PPC32 && PPC_PAGE_SHIFT <= 14
> > >>
> > > en, agree, I invoke "make  menuconfig  ARCH=powerpc
> > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > > 16", I can't find KASAN under Memory Debugging, I guess we should find
> > > the bug by bisecting instead.
> >
> > I do not know, if it is a regression, as it was the first time I tried
> > to run a Linux kernel built with rcutorture on real hardware.
> I tried to add some debug statements to the kernel to locate the bug
> more accurately,  you can try it when you're not busy in the future,
> or just ignore it if the following patch looks not very effective ;-)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1baab07820f6..969ac7c540cc 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
>       *    Prevent userspace races by waiting until the network
>       *    device is fully setup before sending notifications.
>       */
> +    if (dev->rtnl_link_ops)
> +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> +               dev->rtnl_link_ops->kind, __FUNCTION__);
>      if (!dev->rtnl_link_ops ||
>          dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
>          rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
> 
>      if (rtnl_lock_killable())
>          return -EINTR;
> +    if (dev->rtnl_link_ops)
> +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> +               dev->rtnl_link_ops->kind, __FUNCTION__);
>      err = register_netdevice(dev);
>      rtnl_unlock();
>      return err;
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index e476403231f0..e08986ae6238 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct
> net_device *dev)
>      if (!ops)
>          return 0;
> 
> +    printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops,
> +           ops->kind, __FUNCTION__);
>      size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
>             nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> 
> @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct
> net_device *dev)
>  static noinline size_t if_nlmsg_size(const struct net_device *dev,
>                       u32 ext_filter_mask)
>  {
> +    if (dev->rtnl_link_ops)
> +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> +               dev->rtnl_link_ops->kind, __FUNCTION__);
>      return NLMSG_ALIGN(sizeof(struct ifinfomsg))
>             + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
>             + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
> @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type,
> struct net_device *dev,
>      struct net *net = dev_net(dev);
>      struct sk_buff *skb;
>      int err = -ENOBUFS;
> -
> +    if (dev->rtnl_link_ops)
> +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> +               dev->rtnl_link_ops->kind, __FUNCTION__);
>      skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
>      if (skb == NULL)
>          goto errout;
> @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct
> net_device *dev,
> 
>      if (dev->reg_state != NETREG_REGISTERED)
>          return;
> -
> +    if (dev->rtnl_link_ops)
> +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> +               dev->rtnl_link_ops->kind, __FUNCTION__);
>      skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
>                       new_ifindex);
>      if (skb)
> @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct
> net_device *dev,
>  void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
>            gfp_t flags)
>  {
> +    if (dev->rtnl_link_ops)
> +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> +               dev->rtnl_link_ops->kind, __FUNCTION__);
>      rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
>                 NULL, 0);
>  }
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index c0b138c20992..fa5b2725811c 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
>       * Allowing to move it to another netns is clearly unsafe.
>       */
>      sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> -
> +    printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n",
> +           sitn->fb_tunnel_dev->rtnl_link_ops,
> +           sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__);
>      err = register_netdev(sitn->fb_tunnel_dev);
>      if (err)
>          goto err_reg_dev;
> >
> > >>> Hope I can be of more helpful.
> > >>
> > >> Some distributions support multi-arch, so they easily allow
> > >> crosscompiling for different architectures.
> > > I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > > to explore it.
> >
> > Oh, that does not sound good. But I have not tried that in a long time
> > either. It’s a separate issue, but maybe some of the PPC
> > maintainers/folks could help.
> I will do further research on this later.
> 
> Thanks for your time
> Kind regards
> Zhouyi
> >
> >
> > Kind regards,
> >
> > Paul

* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-01-30 17:44           ` Paul E. McKenney
@ 2022-01-31  1:08             ` Zhouyi Zhou
  2022-02-01 17:50               ` Paul E. McKenney
  0 siblings, 1 reply; 17+ messages in thread
From: Zhouyi Zhou @ 2022-01-31  1:08 UTC
  To: Paul E. McKenney
  Cc: Paul Menzel, Josh Triplett, rcu, LKML, David S. Miller,
	Jakub Kicinski, netdev

Thanks, Paul, for joining us!

On Mon, Jan 31, 2022 at 1:44 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote:
> > Dear Paul
> >
> > On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > >
> > > Dear Zhouyi,
> > >
> > >
> > > Am 30.01.22 um 01:21 schrieb Zhouyi Zhou:
> > >
> > > > Thank you for your instructions, I learned a lot from this process.
> > >
> > > Same on my end.
> > >
> > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > >
> > > >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou:
> > > >>
> > > >>> I don't have an IBM machine, but I tried to analyze the problem using
> > > >>> my x86_64 kvm virtual machine, I can't reproduce the bug using my
> > > >>> x86_64 kvm virtual machine.
> > > >>
> > > >> No idea, if it’s architecture specific.
> > > >>
> > > >>> I saw the panic is caused by registration of sit device (A sit device
> > > >>> is a type of virtual network device that takes our IPv6 traffic,
> > > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it
> > > >>> over the IPv4 Internet to another host)
> > > >>>
> > > >>> sit device is registered in function sit_init_net:
> > > >>> 1895    static int __net_init sit_init_net(struct net *net)
> > > >>> 1896    {
> > > >>> 1897        struct sit_net *sitn = net_generic(net, sit_net_id);
> > > >>> 1898        struct ip_tunnel *t;
> > > >>> 1899        int err;
> > > >>> 1900
> > > >>> 1901        sitn->tunnels[0] = sitn->tunnels_wc;
> > > >>> 1902        sitn->tunnels[1] = sitn->tunnels_l;
> > > >>> 1903        sitn->tunnels[2] = sitn->tunnels_r;
> > > >>> 1904        sitn->tunnels[3] = sitn->tunnels_r_l;
> > > >>> 1905
> > > >>> 1906        if (!net_has_fallback_tunnels(net))
> > > >>> 1907            return 0;
> > > >>> 1908
> > > >>> 1909        sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
> > > >>> 1910                           NET_NAME_UNKNOWN,
> > > >>> 1911                           ipip6_tunnel_setup);
> > > >>> 1912        if (!sitn->fb_tunnel_dev) {
> > > >>> 1913            err = -ENOMEM;
> > > >>> 1914            goto err_alloc_dev;
> > > >>> 1915        }
> > > >>> 1916        dev_net_set(sitn->fb_tunnel_dev, net);
> > > >>> 1917        sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
> > > >>> 1918        /* FB netdevice is special: we have one, and only one per netns.
> > > >>> 1919         * Allowing to move it to another netns is clearly unsafe.
> > > >>> 1920         */
> > > >>> 1921        sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > >>> 1922
> > > >>> 1923        err = register_netdev(sitn->fb_tunnel_dev);
> > > >>> register_netdev on line 1923 will call if_nlmsg_size indirectly.
> > > >>>
> > > >>> On the other hand, the function that calls the paniced strlen is if_nlmsg_size:
> > > >>> (gdb) disassemble if_nlmsg_size
> > > >>> Dump of assembler code for function if_nlmsg_size:
> > > >>>      0xffffffff81a0dc20 <+0>:    nopl   0x0(%rax,%rax,1)
> > > >>>      0xffffffff81a0dc25 <+5>:    push   %rbp
> > > >>>      0xffffffff81a0dc26 <+6>:    push   %r15
> > > >>>      0xffffffff81a0dd04 <+228>:    je     0xffffffff81a0de20 <if_nlmsg_size+512>
> > > >>>      0xffffffff81a0dd0a <+234>:    mov    0x10(%rbp),%rdi
> > > >>>      ...
> > > >>>    => 0xffffffff81a0dd0e <+238>:    callq  0xffffffff817532d0 <strlen>
> > > >>>      0xffffffff81a0dd13 <+243>:    add    $0x10,%eax
> > > >>>      0xffffffff81a0dd16 <+246>:    movslq %eax,%r12
> > > >>
> > > >> Excuse my ignorance, would that look the same for ppc64le?
> > > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
> > > >> current build (without rcutorture) I have the line below, where strlen
> > > >> shows up.
> > > >>
> > > >>       (gdb) disassemble if_nlmsg_size
> > > >>       […]
> > > >>       0xc000000000f7f82c <+332>: bl      0xc000000000a10e30 <strlen>
> > > >>       […]
> > > >>
> > > >>> and the C code for 0xffffffff81a0dd0e is following (line 524):
> > > >>> 515    static size_t rtnl_link_get_size(const struct net_device *dev)
> > > >>> 516    {
> > > >>> 517        const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > > >>> 518        size_t size;
> > > >>> 519
> > > >>> 520        if (!ops)
> > > >>> 521            return 0;
> > > >>> 522
> > > >>> 523        size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > >>> 524               nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > > >>
> > > >> How do I connect the disassemby output with the corresponding line?
> > > > I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > for powerpc64le in my Ubuntu 20.04 x86_64.
> > > >
> > > > gdb-multiarch ./vmlinux
> > > > (gdb)disassemble if_nlmsg_size
> > > > [...]
> > > > 0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
> > > > [...]
> > > > (gdb) break *0xc00000000191bf40
> > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > >
> > > > But in include/net/netlink.h:1112, I can't find the call to strlen
> > > > 1110static inline int nla_total_size(int payload)
> > > > 1111{
> > > > 1112        return NLA_ALIGN(nla_attr_size(payload));
> > > > 1113}
> > > > This may be due to the compiler wrongly encode the debug information, I guess.
> > >
> > > `rtnl_link_get_size()` contains:
> > >
> > >              size = nla_total_size(sizeof(struct nlattr)) + /*
> > > IFLA_LINKINFO */
> > >                     nla_total_size(strlen(ops->kind) + 1);  /*
> > > IFLA_INFO_KIND */
> > >
> > > Is that inlined(?) and the code at fault?
> > Yes, that is inlined! because
> > (gdb) disassemble if_nlmsg_size
> > Dump of assembler code for function if_nlmsg_size:
> > [...]
> > 0xc00000000191bf38 <+104>:    beq     0xc00000000191c1f0 <if_nlmsg_size+800>
> > 0xc00000000191bf3c <+108>:    ld      r3,16(r31)
> > 0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
> > [...]
> > (gdb)
> > (gdb) break *0xc00000000191bf40
> > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > (gdb) break *0xc00000000191bf38
> > Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.
>
> I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not
> already doing so.  That gives gdb a lot more information about things
> like inlining.
I checked my .config file, and CONFIG_DEBUG_INFO=y is already set:
linux-next$ grep CONFIG_DEBUG_INFO .config
CONFIG_DEBUG_INFO=y
I then ran "make clean" and rebuilt the kernel, but the behavior of gdb
and vmlinux remains unchanged, sorry about that.

I will try to reproduce the bug on my bare-metal x86_64 machines in
the coming days, and will also try to work with Mr. Menzel once he is
back in the office.

Thanks
Zhouyi
>
>                                                         Thanx, Paul
>
> > > >>> But ops is assigned the value of sit_link_ops in function sit_init_net
> > > >>> line 1917, so I guess something must happened between the calls.
> > > >>>
> > > >>> Do we have KASAN in IBM machine? would KASAN help us find out what
> > > >>> happened in between?
> > > >>
> > > >> Unfortunately, KASAN is not support on Power, I have, as far as I can
> > > >> see. From `arch/powerpc/Kconfig`:
> > > >>
> > > >>           select HAVE_ARCH_KASAN                  if PPC32 && PPC_PAGE_SHIFT <= 14
> > > >>           select HAVE_ARCH_KASAN_VMALLOC          if PPC32 && PPC_PAGE_SHIFT <= 14
> > > >>
> > > > en, agree, I invoke "make  menuconfig  ARCH=powerpc
> > > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > > > 16", I can't find KASAN under Memory Debugging, I guess we should find
> > > > the bug by bisecting instead.
> > >
> > > I do not know, if it is a regression, as it was the first time I tried
> > > to run a Linux kernel built with rcutorture on real hardware.
> > I tried to add some debug statements to the kernel to locate the bug
> > more accurately,  you can try it when you're not busy in the future,
> > or just ignore it if the following patch looks not very effective ;-)
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 1baab07820f6..969ac7c540cc 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
> >       *    Prevent userspace races by waiting until the network
> >       *    device is fully setup before sending notifications.
> >       */
> > +    if (dev->rtnl_link_ops)
> > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> >      if (!dev->rtnl_link_ops ||
> >          dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> >          rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> > @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
> >
> >      if (rtnl_lock_killable())
> >          return -EINTR;
> > +    if (dev->rtnl_link_ops)
> > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> >      err = register_netdevice(dev);
> >      rtnl_unlock();
> >      return err;
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index e476403231f0..e08986ae6238 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct
> > net_device *dev)
> >      if (!ops)
> >          return 0;
> >
> > +    printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops,
> > +           ops->kind, __FUNCTION__);
> >      size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> >             nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> >
> > @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct
> > net_device *dev)
> >  static noinline size_t if_nlmsg_size(const struct net_device *dev,
> >                       u32 ext_filter_mask)
> >  {
> > +    if (dev->rtnl_link_ops)
> > +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> >      return NLMSG_ALIGN(sizeof(struct ifinfomsg))
> >             + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
> >             + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
> > @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type,
> > struct net_device *dev,
> >      struct net *net = dev_net(dev);
> >      struct sk_buff *skb;
> >      int err = -ENOBUFS;
> > -
> > +    if (dev->rtnl_link_ops)
> > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> >      skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
> >      if (skb == NULL)
> >          goto errout;
> > @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > net_device *dev,
> >
> >      if (dev->reg_state != NETREG_REGISTERED)
> >          return;
> > -
> > +    if (dev->rtnl_link_ops)
> > +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> >      skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
> >                       new_ifindex);
> >      if (skb)
> > @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > net_device *dev,
> >  void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> >            gfp_t flags)
> >  {
> > +    if (dev->rtnl_link_ops)
> > +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> >      rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
> >                 NULL, 0);
> >  }
> > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > index c0b138c20992..fa5b2725811c 100644
> > --- a/net/ipv6/sit.c
> > +++ b/net/ipv6/sit.c
> > @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
> >       * Allowing to move it to another netns is clearly unsafe.
> >       */
> >      sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > -
> > +    printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n",
> > +           sitn->fb_tunnel_dev->rtnl_link_ops,
> > +           sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__);
> >      err = register_netdev(sitn->fb_tunnel_dev);
> >      if (err)
> >          goto err_reg_dev;
> > >
> > > >>> Hope I can be of more helpful.
> > > >>
> > > >> Some distributions support multi-arch, so they easily allow
> > > >> crosscompiling for different architectures.
> > > > I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> > > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > > > to explore it.
> > >
> > > Oh, that does not sound good. But I have not tried that in a long time
> > > either. It’s a separate issue, but maybe some of the PPC
> > > maintainers/folks could help.
> > I will do further research on this later.
> >
> > Thanks for your time
> > Kind regards
> > Zhouyi
> > >
> > >
> > > Kind regards,
> > >
> > > Paul

* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-01-31  1:08             ` Zhouyi Zhou
@ 2022-02-01 17:50               ` Paul E. McKenney
  2022-02-02  2:39                 ` Zhouyi Zhou
  0 siblings, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2022-02-01 17:50 UTC
  To: Zhouyi Zhou
  Cc: Paul Menzel, Josh Triplett, rcu, LKML, David S. Miller,
	Jakub Kicinski, netdev

On Mon, Jan 31, 2022 at 09:08:40AM +0800, Zhouyi Zhou wrote:
> Thank Paul for joining us!
> 
> On Mon, Jan 31, 2022 at 1:44 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote:
> > > Dear Paul
> > >
> > > On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > > >
> > > > Dear Zhouyi,
> > > >
> > > >
> > > > Am 30.01.22 um 01:21 schrieb Zhouyi Zhou:
> > > >
> > > > > Thank you for your instructions, I learned a lot from this process.
> > > >
> > > > Same on my end.
> > > >
> > > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > > >
> > > > >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou:
> > > > >>
> > > > > >>> I don't have an IBM machine, so I tried to analyze the problem using
> > > > > >>> my x86_64 KVM virtual machine, but I can't reproduce the bug there.
> > > > > >>
> > > > > >> No idea if it’s architecture-specific.
> > > > >>
> > > > > >>> I saw that the panic is caused by the registration of the sit device (a
> > > > > >>> sit device is a type of virtual network device that takes IPv6 traffic,
> > > > > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it
> > > > > >>> over the IPv4 Internet to another host).
> > > > > >>>
> > > > > >>> The sit device is registered in the function sit_init_net:
> > > > >>> 1895    static int __net_init sit_init_net(struct net *net)
> > > > >>> 1896    {
> > > > >>> 1897        struct sit_net *sitn = net_generic(net, sit_net_id);
> > > > >>> 1898        struct ip_tunnel *t;
> > > > >>> 1899        int err;
> > > > >>> 1900
> > > > >>> 1901        sitn->tunnels[0] = sitn->tunnels_wc;
> > > > >>> 1902        sitn->tunnels[1] = sitn->tunnels_l;
> > > > >>> 1903        sitn->tunnels[2] = sitn->tunnels_r;
> > > > >>> 1904        sitn->tunnels[3] = sitn->tunnels_r_l;
> > > > >>> 1905
> > > > >>> 1906        if (!net_has_fallback_tunnels(net))
> > > > >>> 1907            return 0;
> > > > >>> 1908
> > > > >>> 1909        sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
> > > > >>> 1910                           NET_NAME_UNKNOWN,
> > > > >>> 1911                           ipip6_tunnel_setup);
> > > > >>> 1912        if (!sitn->fb_tunnel_dev) {
> > > > >>> 1913            err = -ENOMEM;
> > > > >>> 1914            goto err_alloc_dev;
> > > > >>> 1915        }
> > > > >>> 1916        dev_net_set(sitn->fb_tunnel_dev, net);
> > > > >>> 1917        sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
> > > > >>> 1918        /* FB netdevice is special: we have one, and only one per netns.
> > > > >>> 1919         * Allowing to move it to another netns is clearly unsafe.
> > > > >>> 1920         */
> > > > >>> 1921        sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > > >>> 1922
> > > > >>> 1923        err = register_netdev(sitn->fb_tunnel_dev);
> > > > > >>> register_netdev on line 1923 calls if_nlmsg_size indirectly.
> > > > > >>>
> > > > > >>> On the other hand, the function that calls the strlen that panicked is if_nlmsg_size:
> > > > >>> (gdb) disassemble if_nlmsg_size
> > > > >>> Dump of assembler code for function if_nlmsg_size:
> > > > >>>      0xffffffff81a0dc20 <+0>:    nopl   0x0(%rax,%rax,1)
> > > > >>>      0xffffffff81a0dc25 <+5>:    push   %rbp
> > > > >>>      0xffffffff81a0dc26 <+6>:    push   %r15
> > > > >>>      0xffffffff81a0dd04 <+228>:    je     0xffffffff81a0de20 <if_nlmsg_size+512>
> > > > >>>      0xffffffff81a0dd0a <+234>:    mov    0x10(%rbp),%rdi
> > > > >>>      ...
> > > > >>>    => 0xffffffff81a0dd0e <+238>:    callq  0xffffffff817532d0 <strlen>
> > > > >>>      0xffffffff81a0dd13 <+243>:    add    $0x10,%eax
> > > > >>>      0xffffffff81a0dd16 <+246>:    movslq %eax,%r12
> > > > >>
> > > > >> Excuse my ignorance, would that look the same for ppc64le?
> > > > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
> > > > >> current build (without rcutorture) I have the line below, where strlen
> > > > >> shows up.
> > > > >>
> > > > >>       (gdb) disassemble if_nlmsg_size
> > > > >>       […]
> > > > >>       0xc000000000f7f82c <+332>: bl      0xc000000000a10e30 <strlen>
> > > > >>       […]
> > > > >>
> > > > > >>> and the C code for 0xffffffff81a0dd0e is the following (line 524):
> > > > >>> 515    static size_t rtnl_link_get_size(const struct net_device *dev)
> > > > >>> 516    {
> > > > >>> 517        const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > > > >>> 518        size_t size;
> > > > >>> 519
> > > > >>> 520        if (!ops)
> > > > >>> 521            return 0;
> > > > >>> 522
> > > > >>> 523        size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > > >>> 524               nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > > > >>
> > > > > >> How do I connect the disassembly output with the corresponding line?
> > > > > > I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross-compile the kernel
> > > > > > for powerpc64le on my Ubuntu 20.04 x86_64 system.
> > > > > >
> > > > > > gdb-multiarch ./vmlinux
> > > > > > (gdb) disassemble if_nlmsg_size
> > > > > [...]
> > > > > 0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
> > > > > [...]
> > > > > (gdb) break *0xc00000000191bf40
> > > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > > >
> > > > > > But at include/net/netlink.h:1112, I can't find the call to strlen:
> > > > > > 1110    static inline int nla_total_size(int payload)
> > > > > > 1111    {
> > > > > > 1112        return NLA_ALIGN(nla_attr_size(payload));
> > > > > > 1113    }
> > > > > > I guess this may be due to the compiler encoding the debug information incorrectly.
> > > >
> > > > `rtnl_link_get_size()` contains:
> > > >
> > > > >              size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > > >                     nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > > >
> > > > Is that inlined(?) and the code at fault?
> > > > Yes, that is inlined, as the following shows:
> > > (gdb) disassemble if_nlmsg_size
> > > Dump of assembler code for function if_nlmsg_size:
> > > [...]
> > > 0xc00000000191bf38 <+104>:    beq     0xc00000000191c1f0 <if_nlmsg_size+800>
> > > 0xc00000000191bf3c <+108>:    ld      r3,16(r31)
> > > 0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
> > > [...]
> > > (gdb)
> > > (gdb) break *0xc00000000191bf40
> > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > (gdb) break *0xc00000000191bf38
> > > Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.
> >
> > I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not
> > already doing so.  That gives gdb a lot more information about things
> > like inlining.
> I checked my .config file, and CONFIG_DEBUG_INFO=y is already there:
> linux-next$ grep CONFIG_DEBUG_INFO .config
> CONFIG_DEBUG_INFO=y
> I then ran "make clean" and rebuilt the kernel, but the behavior of gdb
> and vmlinux remains unchanged, sorry about that.

Glad you were already on top of this one!
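
For anyone following along, the expression under discussion can be
modeled in userspace. This is only a minimal sketch: the alignment
arithmetic follows the nla_attr_size()/nla_total_size() helpers quoted
above, while struct model_nlattr, struct model_link_ops and
model_link_info_size() are hypothetical stand-ins invented for
illustration, not kernel code. It shows that strlen() runs on ops->kind
before the inlined nla_total_size() rounds the result, so a NULL kind
would fault inside strlen() even when gdb attributes the call site to
netlink.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins; every model_/MODEL_ name here is hypothetical. */
struct model_nlattr {
	unsigned short nla_len;
	unsigned short nla_type;
};

struct model_link_ops {
	const char *kind;	/* for example "sit" */
};

static_assert(sizeof(struct model_nlattr) == 4, "attribute header is 4 bytes");

#define MODEL_NLA_ALIGNTO	4
#define MODEL_NLA_ALIGN(len) \
	(((len) + MODEL_NLA_ALIGNTO - 1) & ~(MODEL_NLA_ALIGNTO - 1))
#define MODEL_NLA_HDRLEN \
	((int)MODEL_NLA_ALIGN(sizeof(struct model_nlattr)))

/* Mirrors nla_attr_size(): attribute header plus payload. */
static inline int model_nla_attr_size(int payload)
{
	return MODEL_NLA_HDRLEN + payload;
}

/* Mirrors nla_total_size(): attribute size rounded up to 4 bytes. */
static inline int model_nla_total_size(int payload)
{
	return MODEL_NLA_ALIGN(model_nla_attr_size(payload));
}

/*
 * Models the first two terms of rtnl_link_get_size(). Unlike the kernel
 * code quoted above, this sketch returns 0 when kind is NULL instead of
 * handing the NULL pointer to strlen().
 */
static size_t model_link_info_size(const struct model_link_ops *ops)
{
	if (!ops || !ops->kind)
		return 0;

	return model_nla_total_size(sizeof(struct model_nlattr)) +
	       model_nla_total_size(strlen(ops->kind) + 1);
}
```

With kind set to "sit", the model returns 16 bytes: 8 for the
IFLA_LINKINFO term and 8 for the IFLA_INFO_KIND string. The NULL check
on kind is an addition made for this sketch only.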

> I am trying to reproduce the bug on my bare metal x86_64 machines in
> the coming days, and am also trying to work with Mr Menzel after he
> comes back to the office.

This URL used to allow community members such as yourself to request
access to Power systems: https://osuosl.org/services/powerdev/

In case that helps.

							Thanx, Paul

> Thanks
> Zhouyi
> >
> >                                                         Thanx, Paul
> >
> > > > > >>> But ops is assigned the address of sit_link_ops in function sit_init_net
> > > > > >>> line 1917, so I guess something must have happened between the calls.
> > > > > >>>
> > > > > >>> Do we have KASAN on the IBM machine? Would KASAN help us find out what
> > > > > >>> happened in between?
> > > > > >>
> > > > > >> Unfortunately, KASAN is not supported on the Power system I have, as far
> > > > > >> as I can see. From `arch/powerpc/Kconfig`:
> > > > >>
> > > > >>           select HAVE_ARCH_KASAN                  if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > >>           select HAVE_ARCH_KASAN_VMALLOC          if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > >>
> > > > > > Yes, agreed. I ran "make  menuconfig  ARCH=powerpc
> > > > > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > > > > > 16" and can't find KASAN under Memory Debugging, so I guess we should find
> > > > > > the bug by bisecting instead.
> > > > >
> > > > > I do not know if it is a regression, as it was the first time I tried
> > > > to run a Linux kernel built with rcutorture on real hardware.
> > > > I added some debug statements to the kernel to locate the bug
> > > > more accurately. You can try the patch below when you're not busy,
> > > > or just ignore it if it does not look very effective ;-)
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 1baab07820f6..969ac7c540cc 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
> > >       *    Prevent userspace races by waiting until the network
> > >       *    device is fully setup before sending notifications.
> > >       */
> > > +    if (dev->rtnl_link_ops)
> > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > >      if (!dev->rtnl_link_ops ||
> > >          dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> > >          rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> > > @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
> > >
> > >      if (rtnl_lock_killable())
> > >          return -EINTR;
> > > +    if (dev->rtnl_link_ops)
> > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > >      err = register_netdevice(dev);
> > >      rtnl_unlock();
> > >      return err;
> > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > > index e476403231f0..e08986ae6238 100644
> > > --- a/net/core/rtnetlink.c
> > > +++ b/net/core/rtnetlink.c
> > > @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct
> > > net_device *dev)
> > >      if (!ops)
> > >          return 0;
> > >
> > > +    printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops,
> > > +           ops->kind, __FUNCTION__);
> > >      size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > >             nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > >
> > > @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct
> > > net_device *dev)
> > >  static noinline size_t if_nlmsg_size(const struct net_device *dev,
> > >                       u32 ext_filter_mask)
> > >  {
> > > +    if (dev->rtnl_link_ops)
> > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > >      return NLMSG_ALIGN(sizeof(struct ifinfomsg))
> > >             + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
> > >             + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
> > > @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type,
> > > struct net_device *dev,
> > >      struct net *net = dev_net(dev);
> > >      struct sk_buff *skb;
> > >      int err = -ENOBUFS;
> > > -
> > > +    if (dev->rtnl_link_ops)
> > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > >      skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
> > >      if (skb == NULL)
> > >          goto errout;
> > > @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > net_device *dev,
> > >
> > >      if (dev->reg_state != NETREG_REGISTERED)
> > >          return;
> > > -
> > > +    if (dev->rtnl_link_ops)
> > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > >      skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
> > >                       new_ifindex);
> > >      if (skb)
> > > @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > net_device *dev,
> > >  void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> > >            gfp_t flags)
> > >  {
> > > +    if (dev->rtnl_link_ops)
> > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > >      rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
> > >                 NULL, 0);
> > >  }
> > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > > index c0b138c20992..fa5b2725811c 100644
> > > --- a/net/ipv6/sit.c
> > > +++ b/net/ipv6/sit.c
> > > @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
> > >       * Allowing to move it to another netns is clearly unsafe.
> > >       */
> > >      sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > -
> > > +    printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n",
> > > +           sitn->fb_tunnel_dev->rtnl_link_ops,
> > > +           sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__);
> > >      err = register_netdev(sitn->fb_tunnel_dev);
> > >      if (err)
> > >          goto err_reg_dev;
> > > >
> > > > > >>> Hope I can be of more help.
> > > > >>
> > > > >> Some distributions support multi-arch, so they easily allow
> > > > >> crosscompiling for different architectures.
> > > > > I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> > > > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > > > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > > > > to explore it.
> > > >
> > > > Oh, that does not sound good. But I have not tried that in a long time
> > > > either. It’s a separate issue, but maybe some of the PPC
> > > > maintainers/folks could help.
> > > I will do further research on this later.
> > >
> > > Thanks for your time
> > > Kind regards
> > > Zhouyi
> > > >
> > > >
> > > > Kind regards,
> > > >
> > > > Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-02-01 17:50               ` Paul E. McKenney
@ 2022-02-02  2:39                 ` Zhouyi Zhou
  2022-02-08 20:10                   ` Zhouyi Zhou
  0 siblings, 1 reply; 17+ messages in thread
From: Zhouyi Zhou @ 2022-02-02  2:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Paul Menzel, Josh Triplett, rcu, LKML, David S. Miller,
	Jakub Kicinski, netdev

Thank you, Paul, for your encouragement!

On Wed, Feb 2, 2022 at 1:50 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> Glad you were already on top of this one!
I am very pleased to contribute my tiny effort to the process of
making Linux better ;-)
>
> > I am trying to reproduce the bug on my bare metal x86_64 machines in
> > the coming days, and am also trying to work with Mr Menzel after he
> > comes back to the office.
>
> This URL used to allow community members such as yourself to request
> access to Power systems: https://osuosl.org/services/powerdev/
I have filled in the request form at
https://osuosl.org/services/powerdev/ and am now waiting for them to
deploy the environment for me.

Thanks again
Zhouyi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-02-02  2:39                 ` Zhouyi Zhou
@ 2022-02-08 20:10                   ` Zhouyi Zhou
  0 siblings, 0 replies; 17+ messages in thread
From: Zhouyi Zhou @ 2022-02-08 20:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Paul Menzel, Josh Triplett, rcu, LKML, David S. Miller,
	Jakub Kicinski, netdev, linuxppc-dev

Hi Paul,

Below are my preliminary test results, obtained on a PPC VM supplied by
the Open Source Lab of Oregon State University. Thank you for your
support!

[Preliminary test results on ppc64le virtual guest]

1. Conclusion
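A kernel configuration option other than the RCU torture test may lead to
"BUG: Kernel NULL pointer dereference" at boot. In the tests below, the
failure appears with CONFIG_DRM_BOCHS=m (sections 3.2 and 3.3) and does
not appear without it (sections 3.4 and 3.5).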


2. Test Environment
2.1 host hardware
8-core ppc64le virtual guest with 16G of RAM and a 160G disk
cpu        : POWER9 (architected), altivec supported
clock        : 2200.000000MHz
revision    : 2.2 (pvr 004e 1202)

2.2 host software
Operating System: Ubuntu 20.04.3 LTS, Compiler: gcc version 9.3.0


3. Test Procedure
3.1 kernel source
next-20220203

3.2 build and boot the kernel with CONFIG_DRM_BOCHS=m and
CONFIG_RCU_TORTURE_TEST=y
test result: "BUG: Kernel NULL pointer dereference" at boot
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next.bochs.torture
boot msg: http://154.223.142.244/Feb2022/dmesg.torture.bochs

3.3 build and boot the kernel with CONFIG_DRM_BOCHS=m
test result: "BUG: Kernel NULL pointer dereference" at boot
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next.bochs
boot msg: http://154.223.142.244/Feb2022/dmesg.bochs

3.4 build and boot the kernel with CONFIG_RCU_TORTURE_TEST=y (without
CONFIG_DRM_BOCHS)
test result: boot without error
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next.torture
boot msg: http://154.223.142.244/Feb2022/dmesg.torture

3.5 build and boot the kernel with CONFIG_RCU_TORTURE_TEST=m (without
CONFIG_DRM_BOCHS)
test result: boot without error
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next
boot msg: http://154.223.142.244/Feb2022/dmesg
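
The four results above can be cross-checked mechanically. The following
is only a sketch: the table restates sections 3.2 to 3.5, and the struct,
enum and function names are made up for illustration. One assumption to
flag: section 3.3 does not state the CONFIG_RCU_TORTURE_TEST setting, so
the table treats it as not built in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One row per boot test from section 3 of this report. */
struct boot_test {
	const char *section;	/* section number in this report */
	bool drm_bochs;		/* CONFIG_DRM_BOCHS=m was set */
	bool torture_builtin;	/* CONFIG_RCU_TORTURE_TEST=y was set */
	bool oops;		/* NULL pointer dereference seen at boot */
};

enum option { OPT_DRM_BOCHS, OPT_RCU_TORTURE_BUILTIN };

static const struct boot_test results[] = {
	{ "3.2", true,  true,  true  },
	{ "3.3", true,  false, true  },	/* torture setting assumed, not stated */
	{ "3.4", false, true,  false },
	{ "3.5", false, false, false },
};

#define N_RESULTS (sizeof(results) / sizeof(results[0]))

/* Returns true if the oops outcome equals the option value in every row. */
static bool option_tracks_oops(const struct boot_test *rows, size_t n,
			       enum option opt)
{
	assert(rows != NULL);

	for (size_t i = 0; i < n; i++) {
		bool set = (opt == OPT_DRM_BOCHS) ? rows[i].drm_bochs
						  : rows[i].torture_builtin;
		if (set != rows[i].oops)
			return false;
	}
	return true;
}
```

With this table, option_tracks_oops() returns true for OPT_DRM_BOCHS and
false for OPT_RCU_TORTURE_BUILTIN, which matches the conclusion in
section 1. Four boots are a small sample, so this shows a correlation
only, not a root cause.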

4. Acknowledgement
Thanks to the Open Source Lab of Oregon State University, Paul Menzel, and
all the other community members who supported my small investigation.

Thanks
Zhouyi

On Wed, Feb 2, 2022 at 10:39 AM Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
>
> Thank Paul for your encouragement!
>
> On Wed, Feb 2, 2022 at 1:50 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Mon, Jan 31, 2022 at 09:08:40AM +0800, Zhouyi Zhou wrote:
> > > Thank Paul for joining us!
> > >
> > > On Mon, Jan 31, 2022 at 1:44 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote:
> > > > > Dear Paul
> > > > >
> > > > > On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > > > > >
> > > > > > Dear Zhouyi,
> > > > > >
> > > > > >
> > > > > > Am 30.01.22 um 01:21 schrieb Zhouyi Zhou:
> > > > > >
> > > > > > > Thank you for your instructions, I learned a lot from this process.
> > > > > >
> > > > > > Same on my end.
> > > > > >
> > > > > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > > > > >
> > > > > > >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou:
> > > > > > >>
> > > > > > >>> I don't have an IBM machine, but I tried to analyze the problem using
> > > > > > >>> my x86_64 kvm virtual machine, I can't reproduce the bug using my
> > > > > > >>> x86_64 kvm virtual machine.
> > > > > > >>
> > > > > > >> No idea, if it’s architecture specific.
> > > > > > >>
> > > > > > >>> I saw the panic is caused by registration of sit device (A sit device
> > > > > > >>> is a type of virtual network device that takes our IPv6 traffic,
> > > > > > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it
> > > > > > >>> over the IPv4 Internet to another host)
> > > > > > >>>
> > > > > > >>> sit device is registered in function sit_init_net:
> > > > > > >>> 1895    static int __net_init sit_init_net(struct net *net)
> > > > > > >>> 1896    {
> > > > > > >>> 1897        struct sit_net *sitn = net_generic(net, sit_net_id);
> > > > > > >>> 1898        struct ip_tunnel *t;
> > > > > > >>> 1899        int err;
> > > > > > >>> 1900
> > > > > > >>> 1901        sitn->tunnels[0] = sitn->tunnels_wc;
> > > > > > >>> 1902        sitn->tunnels[1] = sitn->tunnels_l;
> > > > > > >>> 1903        sitn->tunnels[2] = sitn->tunnels_r;
> > > > > > >>> 1904        sitn->tunnels[3] = sitn->tunnels_r_l;
> > > > > > >>> 1905
> > > > > > >>> 1906        if (!net_has_fallback_tunnels(net))
> > > > > > >>> 1907            return 0;
> > > > > > >>> 1908
> > > > > > >>> 1909        sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
> > > > > > >>> 1910                           NET_NAME_UNKNOWN,
> > > > > > >>> 1911                           ipip6_tunnel_setup);
> > > > > > >>> 1912        if (!sitn->fb_tunnel_dev) {
> > > > > > >>> 1913            err = -ENOMEM;
> > > > > > >>> 1914            goto err_alloc_dev;
> > > > > > >>> 1915        }
> > > > > > >>> 1916        dev_net_set(sitn->fb_tunnel_dev, net);
> > > > > > >>> 1917        sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
> > > > > > >>> 1918        /* FB netdevice is special: we have one, and only one per netns.
> > > > > > >>> 1919         * Allowing to move it to another netns is clearly unsafe.
> > > > > > >>> 1920         */
> > > > > > >>> 1921        sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > > > > >>> 1922
> > > > > > >>> 1923        err = register_netdev(sitn->fb_tunnel_dev);
> > > > > > >>> register_netdev on line 1923 will call if_nlmsg_size indirectly.
> > > > > > >>>
> > > > > > >>> On the other hand, the function that calls the paniced strlen is if_nlmsg_size:
> > > > > > >>> (gdb) disassemble if_nlmsg_size
> > > > > > >>> Dump of assembler code for function if_nlmsg_size:
> > > > > > >>>      0xffffffff81a0dc20 <+0>:    nopl   0x0(%rax,%rax,1)
> > > > > > >>>      0xffffffff81a0dc25 <+5>:    push   %rbp
> > > > > > >>>      0xffffffff81a0dc26 <+6>:    push   %r15
> > > > > > >>>      0xffffffff81a0dd04 <+228>:    je     0xffffffff81a0de20 <if_nlmsg_size+512>
> > > > > > >>>      0xffffffff81a0dd0a <+234>:    mov    0x10(%rbp),%rdi
> > > > > > >>>      ...
> > > > > > >>>    => 0xffffffff81a0dd0e <+238>:    callq  0xffffffff817532d0 <strlen>
> > > > > > >>>      0xffffffff81a0dd13 <+243>:    add    $0x10,%eax
> > > > > > >>>      0xffffffff81a0dd16 <+246>:    movslq %eax,%r12
> > > > > > >>
> > > > > > >> Excuse my ignorance, would that look the same for ppc64le?
> > > > > > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
> > > > > > >> current build (without rcutorture) I have the line below, where strlen
> > > > > > >> shows up.
> > > > > > >>
> > > > > > >>       (gdb) disassemble if_nlmsg_size
> > > > > > >>       […]
> > > > > > >>       0xc000000000f7f82c <+332>: bl      0xc000000000a10e30 <strlen>
> > > > > > >>       […]
> > > > > > >>
> > > > > > >>> and the C code for 0xffffffff81a0dd0e is following (line 524):
> > > > > > >>> 515    static size_t rtnl_link_get_size(const struct net_device *dev)
> > > > > > >>> 516    {
> > > > > > >>> 517        const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > > > > > >>> 518        size_t size;
> > > > > > >>> 519
> > > > > > >>> 520        if (!ops)
> > > > > > >>> 521            return 0;
> > > > > > >>> 522
> > > > > > >>> 523        size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > > > > >>> 524               nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > > > > > >>
> > > > > > >> How do I connect the disassemby output with the corresponding line?
> > > > > > > I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > > > > for powerpc64le in my Ubuntu 20.04 x86_64.
> > > > > > >
> > > > > > > gdb-multiarch ./vmlinux
> > > > > > > (gdb)disassemble if_nlmsg_size
> > > > > > > [...]
> > > > > > > 0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
> > > > > > > [...]
> > > > > > > (gdb) break *0xc00000000191bf40
> > > > > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > > > > >
> > > > > > > But in include/net/netlink.h:1112, I can't find the call to strlen
> > > > > > > 1110static inline int nla_total_size(int payload)
> > > > > > > 1111{
> > > > > > > 1112        return NLA_ALIGN(nla_attr_size(payload));
> > > > > > > 1113}
> > > > > > > This may be due to the compiler wrongly encode the debug information, I guess.
> > > > > >
> > > > > > `rtnl_link_get_size()` contains:
> > > > > >
> > > > > >              size = nla_total_size(sizeof(struct nlattr)) + /*
> > > > > > IFLA_LINKINFO */
> > > > > >                     nla_total_size(strlen(ops->kind) + 1);  /*
> > > > > > IFLA_INFO_KIND */
> > > > > >
> > > > > > Is that inlined(?) and the code at fault?
> > > > > Yes, that is inlined! because
> > > > > (gdb) disassemble if_nlmsg_size
> > > > > Dump of assembler code for function if_nlmsg_size:
> > > > > [...]
> > > > > 0xc00000000191bf38 <+104>:    beq     0xc00000000191c1f0 <if_nlmsg_size+800>
> > > > > 0xc00000000191bf3c <+108>:    ld      r3,16(r31)
> > > > > 0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
> > > > > [...]
> > > > > (gdb)
> > > > > (gdb) break *0xc00000000191bf40
> > > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > > > (gdb) break *0xc00000000191bf38
> > > > > Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.
> > > >
> > > > I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not
> > > > already doing so.  That gives gdb a lot more information about things
> > > > like inlining.
> > > I check my .config file, CONFIG_DEBUG_INFO=y is here:
> > > linux-next$ grep CONFIG_DEBUG_INFO .config
> > > CONFIG_DEBUG_INFO=y
> > > Then I invoke "make clean" and rebuild the kernel, the behavior of gdb
> > > and vmlinux remain unchanged, sorry for that
> >
> > Glad you were already on top of this one!
> I am very pleased to contribute my tiny effort to the process of
> making Linux better ;-)
> >
> > > I am trying to reproduce the bug on my bare metal x86_64 machines in
> > > the coming days, and am also trying to work with Mr Menzel after he
> > > comes back to the office.
> >
> > This URL used to allow community members such as yourself to request
> > access to Power systems: https://osuosl.org/services/powerdev/
> I have filled the request form on
> https://osuosl.org/services/powerdev/ and now wait for them to deploy
> the environment for me.
>
> Thanks again
> Zhouyi
> >
> > In case that helps.
> >
> >                                                         Thanx, Paul
> >
> > > Thanks
> > > Zhouyi
> > > >
> > > >                                                         Thanx, Paul
> > > >
> > > > > > >>> But ops is assigned the value of sit_link_ops in function sit_init_net
> > > > > > >>> line 1917, so I guess something must happened between the calls.
> > > > > > >>>
> > > > > > >>> Do we have KASAN in IBM machine? would KASAN help us find out what
> > > > > > >>> happened in between?
> > > > > > >>
> > > > > > >> Unfortunately, KASAN is not support on Power, I have, as far as I can
> > > > > > >> see. From `arch/powerpc/Kconfig`:
> > > > > > >>
> > > > > > >>           select HAVE_ARCH_KASAN                  if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > > > >>           select HAVE_ARCH_KASAN_VMALLOC          if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > > > >>
> > > > > > > en, agree, I invoke "make  menuconfig  ARCH=powerpc
> > > > > > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > > > > > > 16", I can't find KASAN under Memory Debugging, I guess we should find
> > > > > > > the bug by bisecting instead.
> > > > > >
> > > > > > I do not know, if it is a regression, as it was the first time I tried
> > > > > > to run a Linux kernel built with rcutorture on real hardware.
> > > > > I tried to add some debug statements to the kernel to locate the bug
> > > > > more accurately,  you can try it when you're not busy in the future,
> > > > > or just ignore it if the following patch looks not very effective ;-)
> > > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > > index 1baab07820f6..969ac7c540cc 100644
> > > > > --- a/net/core/dev.c
> > > > > +++ b/net/core/dev.c
> > > > > @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
> > > > >       *    Prevent userspace races by waiting until the network
> > > > >       *    device is fully setup before sending notifications.
> > > > >       */
> > > > > +    if (dev->rtnl_link_ops)
> > > > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      if (!dev->rtnl_link_ops ||
> > > > >          dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> > > > >          rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> > > > > @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
> > > > >
> > > > >      if (rtnl_lock_killable())
> > > > >          return -EINTR;
> > > > > +    if (dev->rtnl_link_ops)
> > > > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      err = register_netdevice(dev);
> > > > >      rtnl_unlock();
> > > > >      return err;
> > > > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > > > > index e476403231f0..e08986ae6238 100644
> > > > > --- a/net/core/rtnetlink.c
> > > > > +++ b/net/core/rtnetlink.c
> > > > > @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct
> > > > > net_device *dev)
> > > > >      if (!ops)
> > > > >          return 0;
> > > > >
> > > > > +    printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops,
> > > > > +           ops->kind, __FUNCTION__);
> > > > >      size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > > >             nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > > > >
> > > > > @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct
> > > > > net_device *dev)
> > > > >  static noinline size_t if_nlmsg_size(const struct net_device *dev,
> > > > >                       u32 ext_filter_mask)
> > > > >  {
> > > > > +    if (dev->rtnl_link_ops)
> > > > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> > > > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      return NLMSG_ALIGN(sizeof(struct ifinfomsg))
> > > > >             + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
> > > > >             + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
> > > > > @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type,
> > > > > struct net_device *dev,
> > > > >      struct net *net = dev_net(dev);
> > > > >      struct sk_buff *skb;
> > > > >      int err = -ENOBUFS;
> > > > > -
> > > > > +    if (dev->rtnl_link_ops)
> > > > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
> > > > >      if (skb == NULL)
> > > > >          goto errout;
> > > > > @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > > > net_device *dev,
> > > > >
> > > > >      if (dev->reg_state != NETREG_REGISTERED)
> > > > >          return;
> > > > > -
> > > > > +    if (dev->rtnl_link_ops)
> > > > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> > > > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
> > > > >                       new_ifindex);
> > > > >      if (skb)
> > > > > @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > > > net_device *dev,
> > > > >  void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> > > > >            gfp_t flags)
> > > > >  {
> > > > > +    if (dev->rtnl_link_ops)
> > > > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> > > > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
> > > > >                 NULL, 0);
> > > > >  }
> > > > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > > > > index c0b138c20992..fa5b2725811c 100644
> > > > > --- a/net/ipv6/sit.c
> > > > > +++ b/net/ipv6/sit.c
> > > > > @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
> > > > >       * Allowing to move it to another netns is clearly unsafe.
> > > > >       */
> > > > >      sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > > > -
> > > > > +    printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n",
> > > > > +           sitn->fb_tunnel_dev->rtnl_link_ops,
> > > > > +           sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      err = register_netdev(sitn->fb_tunnel_dev);
> > > > >      if (err)
> > > > >          goto err_reg_dev;
> > > > > >
> > > > > > >>> Hope I can be of more helpful.
> > > > > > >>
> > > > > > >> Some distributions support multi-arch, so they easily allow
> > > > > > >> crosscompiling for different architectures.
> > > > > > > I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > > > > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> > > > > > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > > > > > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > > > > > > to explore it.
> > > > > >
> > > > > > Oh, that does not sound good. But I have not tried that in a long time
> > > > > > either. It’s a separate issue, but maybe some of the PPC
> > > > > > maintainers/folks could help.
> > > > > I will do further research on this later.
> > > > >
> > > > > Thanks for your time
> > > > > Kind regards
> > > > > Zhouyi
> > > > > >
> > > > > >
> > > > > > Kind regards,
> > > > > >
> > > > > > Paul

* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
       [not found]           ` <f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de>
@ 2022-02-17  1:16             ` Nathan Chancellor
  2022-02-21 11:17               ` Paul Menzel
  0 siblings, 1 reply; 17+ messages in thread
From: Nathan Chancellor @ 2022-02-17  1:16 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Zhouyi Zhou, Paul E. McKenney, Josh Triplett, rcu, LKML,
	David S. Miller, Jakub Kicinski, netdev, Nick Desaulniers, llvm

Hi Paul,

On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote:
> [Cc: +LLVM/clang build support folks]
> 
> 
> Dear Zhouyi, dear Nathan, dear Nick,
> 
> 
> To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm*
> and *clang* 1:13.0-53~exp1
> 
>     $ clang --version
>     Ubuntu clang version 13.0.0-2
>     Target: powerpc64le-unknown-linux-gnu
>     Thread model: posix
>     InstalledDir: /usr/bin
> 
> results in a segmentation fault, while it works when building with GCC.
> 
>     $ gcc --version
>     gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0

Thank you for keying us in. I am going to have a bit of a brain dump
here based on the information I have uncovered after a couple of hours
of debugging.

TL;DR: It seems like something is broken with __read_mostly + ld.lld
before 14.0.0.

My initial reproduction steps (boot-qemu.sh comes from
https://github.com/ClangBuiltLinux/boot-utils):

$ clang --version
clang version 13.0.1 (Fedora 13.0.1-1.fc37)
Target: x86_64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ powerpc64le-linux-gnu-as --version
GNU assembler version 2.37-2.fc36
Copyright (C) 2021 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `powerpc64le-linux-gnu'.

$ curl -LSso .config https://lore.kernel.org/all/f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de/3-linux-5.17-rc4-rcu-dev-config.txt

$ scripts/config --set-val INITRAMFS_SOURCE '""'

$ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 all

$ boot-qemu.sh -a ppc64le -k . -t 45s
QEMU location: /usr/bin

QEMU version: QEMU emulator version 6.2.0 (qemu-6.2.0-5.fc37)

+ timeout --foreground 45s stdbuf -oL -eL qemu-system-ppc64 -initrd \
/home/nathan/cbl/github/boot-utils-ro/images/ppc64le/rootfs.cpio -device \
ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 -L \
/home/nathan/cbl/github/boot-utils-ro/images/ppc64le/ -bios skiboot.lid \
-machine powernv8 -display none -kernel \
/home/nathan/cbl/src/linux/arch/powerpc/boot/zImage.epapr -m 2G \
-nodefaults -serial mon:stdio
...
[    1.478028][    T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
[    1.478630][    T1] Faulting instruction address: 0xc00000000090bee0
[    1.479521][    T1] Oops: Kernel access of bad area, sig: 11 [#1]
[    1.480036][    T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
[    1.480853][    T1] Modules linked in:
[    1.481265][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00001-gfa15c7cb550f #1
[    1.481967][    T1] NIP:  c00000000090bee0 LR: c000000000d96b60 CTR: c0000000000d5b4c
[    1.482596][    T1] REGS: c000000007443330 TRAP: 0380   Not tainted  (5.17.0-rc4-00001-gfa15c7cb550f)
[    1.483305][    T1] MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 22800a87  XER: 00000000
[    1.484277][    T1] CFAR: c000000000d96b5c IRQMASK: 0
[    1.484277][    T1] GPR00: c000000000d96b54 c0000000074435d0 c0000000028bc600 0000000000000000
[    1.484277][    T1] GPR04: ffffffffffffffff ffffffffff1ea558 ffffffffff1ebfe4 c00000000261ae88
[    1.484277][    T1] GPR08: 0000000000000003 0000000000000004 c00000000261ae88 0000000000000000
[    1.484277][    T1] GPR12: 0000000000800000 c000000002a60000 c000000000012518 0000000000000000
[    1.484277][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    1.484277][    T1] GPR20: 0000000000000000 c0000000027bff80 0000000000000cc0 0000000000000000
[    1.484277][    T1] GPR24: 0000000000000010 0000000000000000 0000000000000000 0000000000000000
[    1.484277][    T1] GPR28: c0000000028fcfd8 c000000007b83000 0000000000000000 c0000000074435d0
[    1.490325][    T1] NIP [c00000000090bee0] strlen+0x10/0x30
[    1.490788][    T1] LR [c000000000d96b60] if_nlmsg_size+0x2b0/0x390
[    1.491319][    T1] Call Trace:
[    1.491573][    T1] [c0000000074435d0] [c000000000d96b54] if_nlmsg_size+0x2a4/0x390 (unreliable)
[    1.492291][    T1] [c000000007443680] [c000000000d96790] rtmsg_ifinfo_build_skb+0x80/0x1a0
[    1.492958][    T1] [c000000007443740] [c000000000d97590] rtmsg_ifinfo+0x70/0xd0
[    1.493559][    T1] [c000000007443790] [c000000000d7d528] register_netdevice+0x5d8/0x670
[    1.494205][    T1] [c000000007443820] [c000000000d7d94c] register_netdev+0x4c/0x80
[    1.494823][    T1] [c000000007443850] [c000000000f826d8] sit_init_net+0x1b8/0x200
[    1.495426][    T1] [c0000000074438d0] [c000000000d63b5c] ops_init+0x14c/0x1c0
[    1.496014][    T1] [c000000007443930] [c000000000d6314c] register_pernet_operations+0xec/0x1e0
[    1.496716][    T1] [c000000007443990] [c000000000d633d0] register_pernet_device+0x60/0xd0
[    1.497372][    T1] [c0000000074439e0] [c000000002085194] sit_init+0x54/0x160
[    1.497950][    T1] [c000000007443a70] [c000000000011c58] do_one_initcall+0x108/0x3e0
[    1.498573][    T1] [c000000007443c70] [c000000002006190] do_initcall_level+0xe4/0x1c4
[    1.499219][    T1] [c000000007443cc0] [c00000000200604c] do_initcalls+0x84/0xe4
[    1.499799][    T1] [c000000007443d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec
[    1.500444][    T1] [c000000007443da0] [c00000000001254c] kernel_init+0x3c/0x270
[    1.501042][    T1] [c000000007443e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
[    1.501721][    T1] Instruction dump:
[    1.502202][    T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
[    1.502934][    T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
[    1.504028][    T1] ---[ end trace 0000000000000000 ]---
...
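
The faulting instruction is the word in angle brackets in the
instruction dump, 8ca40001. As a worked example, the sketch below
splits a Power ISA D-form word into its fields; the struct and helper
names are hypothetical. For 8ca40001 the fields are primary opcode 35
(lbzu), RT=5, RA=4, D=1, i.e. "lbzu r5,1(r4)". The word three slots
earlier, 3883ffff, decodes to opcode 14 (addi), RT=4, RA=3, SI=-1, i.e.
"addi r4,r3,-1". With GPR03 = 0 in the register dump, r4 becomes
ffffffffffffffff (as shown for GPR04) and the load address wraps to 0,
which matches "read at 0x00000000" and strlen() being handed NULL.

```c
#include <stdint.h>

/* Fields of a Power ISA D-form instruction word (hypothetical helper type). */
struct dform {
	unsigned int opcode; /* bits 0-5: primary opcode */
	unsigned int rt;     /* bits 6-10: target register */
	unsigned int ra;     /* bits 11-15: base register */
	int16_t d;           /* bits 16-31: signed displacement */
};

/* Split a 32-bit instruction word into its D-form fields. */
static struct dform decode_dform(uint32_t insn)
{
	struct dform f;

	f.opcode = insn >> 26;
	f.rt = (insn >> 21) & 0x1f;
	f.ra = (insn >> 16) & 0x1f;
	f.d = (int16_t)(insn & 0xffff);
	return f;
}

/* Effective address of a D-form load: (RA) + D, wrapping modulo 2^64. */
static uint64_t dform_ea(uint64_t ra_value, int16_t d)
{
	return ra_value + (uint64_t)(int64_t)d;
}
```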

First thing was figuring out where the NULL pointer dereference happens,
which appears to be the "strlen(ops->kind)" in rtnl_link_get_size():

static size_t rtnl_link_get_size(const struct net_device *dev)
{
	const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
	size_t size;

	if (!ops)
		return 0;

	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */

which I confirmed with some really rudimentary printk debugging:

[    1.476862][    T1] nathan: rtnl_link_get_size(): name: sit0, ops: c0000000028fcfd8, ops->kind: (null)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 710da8a36729..c8d928e83aec 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -520,6 +520,9 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
 	if (!ops)
 		return 0;
 
+	pr_err("nathan: %s(): name: %s, ops: %px, ops->kind: %s\n", __func__,
+	       dev->name, ops, ops->kind);
+
 	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
 	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
 

Okay... how did sit0 end up with a NULL kind...? It is very clearly
defined as "sit":

static struct rtnl_link_ops sit_link_ops __read_mostly = {
	.kind		= "sit",
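
The printk output above shows a non-NULL ops whose kind is NULL, which
is exactly the case the "if (!ops)" check in rtnl_link_get_size() does
not cover. A minimal userspace sketch of that gap follows. The struct
is a hypothetical stand-in for the single member of struct
rtnl_link_ops that matters here, and the guarded helper is only an
illustration, not a proposed kernel change: the real question is why
the initializer was lost.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the one rtnl_link_ops member used here. */
struct demo_link_ops {
	const char *kind;
};

/* Mirrors the existing check: only a NULL ops pointer is rejected. */
static int ops_check_passes(const struct demo_link_ops *ops)
{
	return ops != NULL;
}

/* Guarded length: returns 0 instead of calling strlen() on NULL. */
static size_t kind_len_guarded(const struct demo_link_ops *ops)
{
	if (!ops || !ops->kind)
		return 0;
	return strlen(ops->kind);
}
```

An ops object with kind == NULL passes ops_check_passes(), so an
unguarded strlen(ops->kind) would then read from address 0.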

Adding some more debug prints to net/ipv6/sit.c:

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index c0b138c20992..7b9edbed2fcd 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1920,6 +1920,12 @@ static int __net_init sit_init_net(struct net *net)
 	 */
 	sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
 
+	pr_err("nathan: %s(): &sit_link_ops: %px\n", __func__, &sit_link_ops);
+	pr_err("nathan: %s(): sit_link_ops.kind: %s\n", __func__, sit_link_ops.kind);
+	pr_err("nathan: %s(): sit_link_ops.maxtype: %u\n", __func__, sit_link_ops.maxtype);
+	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops: %px\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops);
+	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: %s\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops->kind);
+
 	err = register_netdev(sitn->fb_tunnel_dev);
 	if (err)
 		goto err_reg_dev;

reveals:

[    1.471920][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028fcfd8
[    1.472534][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: (null)
[    1.473088][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
[    1.473639][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028fcfd8
[    1.474370][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: (null)

This is super bizarre, as the maxtype member appears to have the correct
value, but how is kind's initial value getting dropped on the floor?
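
One difference between the two members, which may or may not turn out
to be relevant: maxtype is a plain integer stored directly in the
object, while kind is a pointer whose value is the address of a string
literal and therefore has to be filled in by the linker (or at load
time for a relocatable image). The userspace sketch below places a
small struct in a section named like the one the kernel's
__read_mostly uses and checks both members. The type and macro names
are hypothetical, and the value 20 is taken from the log above.

```c
#include <stddef.h>

#if defined(__ELF__)
/* Same section name as the kernel's __read_mostly annotation. */
#define demo_read_mostly __attribute__((section(".data..read_mostly")))
#else
#define demo_read_mostly
#endif

/* Hypothetical two-member stand-in for struct rtnl_link_ops. */
struct rm_link_ops {
	const char *kind;     /* pointer: an address resolved at link/load time */
	unsigned int maxtype; /* integer: value stored directly in the object */
};

static struct rm_link_ops rm_demo_ops demo_read_mostly = {
	.kind = "sit",
	.maxtype = 20,
};

/* Nonzero if both initializers survived compilation and linking. */
static int rm_demo_ops_intact(void)
{
	return rm_demo_ops.kind != NULL &&
	       rm_demo_ops.kind[0] == 's' &&
	       rm_demo_ops.maxtype == 20;
}
```

A correct toolchain returns nonzero here; a pointer member that reads
back as NULL while the integer is intact would be the same symptom as
in the log.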

Removing the __read_mostly annotation "fixes" it:

[    1.481708][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000027d3f60
[    1.482319][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
[    1.482878][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
[    1.483429][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000027d3f60
[    1.484174][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
...
Linux version 5.17.0-rc4-00001-g956f02ad5c31-dirty (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), LLD 13.0.1) #2 SMP PREEMPT Wed Feb 16 13:29:49 MST 2022
...

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 7b9edbed2fcd..f109c7a0233b 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -70,7 +70,7 @@ static void ipip6_tunnel_setup(struct net_device *dev);
 static void ipip6_dev_free(struct net_device *dev);
 static bool check_6rd(struct ip_tunnel *tunnel, const struct in6_addr *v6dst,
 		      __be32 *v4dst);
-static struct rtnl_link_ops sit_link_ops __read_mostly;
+static struct rtnl_link_ops sit_link_ops;
 
 static unsigned int sit_net_id __read_mostly;
 struct sit_net {
@@ -1827,7 +1827,7 @@ static void ipip6_dellink(struct net_device *dev, struct list_head *head)
 		unregister_netdevice_queue(dev, head);
 }
 
-static struct rtnl_link_ops sit_link_ops __read_mostly = {
+static struct rtnl_link_ops sit_link_ops = {
 	.kind		= "sit",
 	.maxtype	= IFLA_IPTUN_MAX,
 	.policy		= ipip6_policy,

Switching to ld.bfd also resolves it:

[    1.470405][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028acfd8
[    1.471016][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
[    1.471534][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
[    1.472062][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028acfd8
[    1.472790][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
...
Linux version 5.17.0-rc4-00001-g956f02ad5c31 (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), GNU ld version 2.37-2.fc36) #3 SMP PREEMPT Wed Feb 16 13:33:42 MST 2022
...

I tested with ToT LLVM (or at least, close to it, since there is an
unrelated ld.lld regression there) and I could not reproduce it there,
so I did a reverse bisect to see what commit fixes this issue in LLVM 14
and I landed on:

commit 55c14d6dbfd8e7b86c15d2613fea3490078e2ae4
Author: Fangrui Song <i@maskray.me>
Date:   Thu Nov 25 14:12:34 2021 -0800

    [ELF] Simplify DynamicSection content computation. NFC

    The new code computes the content twice, but avoides the tricky
    std::function<uint64_t()>. Removed 13KiB code in a Release build.

 lld/ELF/SyntheticSections.cpp | 117 ++++++++++++++++--------------------------
 lld/ELF/SyntheticSections.h   |  12 +----
 2 files changed, 44 insertions(+), 85 deletions(-)

That's... interesting, given that the commit title says No Functional
Change, even though there clearly is one. That commit has a couple of
mentions of PowerPC synthetic sections, so it is possible that the
new content calculation lines up with ld.bfd?
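
For readers unfamiliar with the term, a reverse bisect is an ordinary
binary search with the roles swapped: the oldest commit is known to be
broken, the newest is known to be fixed, and the search looks for the
first fixed one. git bisect supports this through its --term-old and
--term-new options. The sketch below shows only the search logic; the
callback type and the toy predicate are hypothetical, and each call
stands for a full build-and-boot test.

```c
#include <stddef.h>

/* Hypothetical test callback: nonzero if the build at index i boots. */
typedef int (*boots_fn)(size_t i);

/*
 * Commits are indexed oldest (0) to newest (n - 1), with n >= 2.
 * Precondition: index 0 is broken and index n - 1 is fixed.
 * Returns the index of the first fixed commit.
 */
static size_t first_fixed(size_t n, boots_fn boots)
{
	size_t broken = 0;    /* newest commit known to be broken */
	size_t fixed = n - 1; /* oldest commit known to be fixed */

	while (fixed - broken > 1) {
		size_t mid = broken + (fixed - broken) / 2;

		if (boots(mid))
			fixed = mid;
		else
			broken = mid;
	}
	return fixed;
}

/* Toy predicate for illustration: everything from index 5 on boots. */
static int demo_boots(size_t i)
{
	return i >= 5;
}
```

For a toy history of eight commits, first_fixed(8, demo_boots) probes
indices 3, 5 and 4 before settling on index 5.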

I am not really sure where to go from here, as I don't fully understand
what the problem was before that LLD change. I'll see if I can do some
more investigation tomorrow (unless someone wants to beat me to it ;)

Cheers,
Nathan

* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-02-17  1:16             ` Nathan Chancellor
@ 2022-02-21 11:17               ` Paul Menzel
  2022-02-21 15:29                 ` Nathan Chancellor
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Menzel @ 2022-02-21 11:17 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Zhouyi Zhou, Paul E. McKenney, Josh Triplett, rcu, LKML,
	David S. Miller, Jakub Kicinski, netdev, Nick Desaulniers, llvm,
	Fangrui Song

[Cc: +Fangrui]

Dear Nathan,


Am 17.02.22 um 02:16 schrieb Nathan Chancellor:

> On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote:
>> [Cc: +LLVM/clang build support folks]

[…]

>> To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm*
>> and *clang* 1:13.0-53~exp1
>>
>>      $ clang --version
>>      Ubuntu clang version 13.0.0-2
>>      Target: powerpc64le-unknown-linux-gnu
>>      Thread model: posix
>>      InstalledDir: /usr/bin
>>
>> results in a segmentation fault, while it works when building with GCC.
>>
>>      $ gcc --version
>>      gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0
> 
> Thank you for keying us in. I am going to have a bit of a brain dump
> here based on the information I have uncovered after a couple of hours
> of debugging.
> 
> TL;DR: It seems like something is broken with __read_mostly + ld.lld
> before 14.0.0.
> 
> My initial reproduction steps (boot-qemu.sh comes from
> https://github.com/ClangBuiltLinux/boot-utils):
> 
> $ clang --version
> clang version 13.0.1 (Fedora 13.0.1-1.fc37)
> Target: x86_64-redhat-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
> 
> $ powerpc64le-linux-gnu-as --version
> GNU assembler version 2.37-2.fc36
> Copyright (C) 2021 Free Software Foundation, Inc.
> This program is free software; you may redistribute it under the terms of
> the GNU General Public License version 3 or later.
> This program has absolutely no warranty.
> This assembler was configured for a target of `powerpc64le-linux-gnu'.
> 
> $ curl -LSso .config https://lore.kernel.org/all/f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de/3-linux-5.17-rc4-rcu-dev-config.txt
> 
> $ scripts/config --set-val INITRAMFS_SOURCE '""'
> 
> $ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 all
> 
> $ boot-qemu.sh -a ppc64le -k . -t 45s
> QEMU location: /usr/bin
> 
> QEMU version: QEMU emulator version 6.2.0 (qemu-6.2.0-5.fc37)
> 
> + timeout --foreground 45s stdbuf -oL -eL qemu-system-ppc64 -initrd \
> /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/rootfs.cpio -device \
> ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 -L \
> /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/ -bios skiboot.lid \
> -machine powernv8 -display none -kernel \
> /home/nathan/cbl/src/linux/arch/powerpc/boot/zImage.epapr -m 2G \
> -nodefaults -serial mon:stdio
> ...
> [    1.478028][    T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
> [    1.478630][    T1] Faulting instruction address: 0xc00000000090bee0
> [    1.479521][    T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [    1.480036][    T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
> [    1.480853][    T1] Modules linked in:
> [    1.481265][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00001-gfa15c7cb550f #1
> [    1.481967][    T1] NIP:  c00000000090bee0 LR: c000000000d96b60 CTR: c0000000000d5b4c
> [    1.482596][    T1] REGS: c000000007443330 TRAP: 0380   Not tainted  (5.17.0-rc4-00001-gfa15c7cb550f)
> [    1.483305][    T1] MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 22800a87  XER: 00000000
> [    1.484277][    T1] CFAR: c000000000d96b5c IRQMASK: 0
> [    1.484277][    T1] GPR00: c000000000d96b54 c0000000074435d0 c0000000028bc600 0000000000000000
> [    1.484277][    T1] GPR04: ffffffffffffffff ffffffffff1ea558 ffffffffff1ebfe4 c00000000261ae88
> [    1.484277][    T1] GPR08: 0000000000000003 0000000000000004 c00000000261ae88 0000000000000000
> [    1.484277][    T1] GPR12: 0000000000800000 c000000002a60000 c000000000012518 0000000000000000
> [    1.484277][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    1.484277][    T1] GPR20: 0000000000000000 c0000000027bff80 0000000000000cc0 0000000000000000
> [    1.484277][    T1] GPR24: 0000000000000010 0000000000000000 0000000000000000 0000000000000000
> [    1.484277][    T1] GPR28: c0000000028fcfd8 c000000007b83000 0000000000000000 c0000000074435d0
> [    1.490325][    T1] NIP [c00000000090bee0] strlen+0x10/0x30
> [    1.490788][    T1] LR [c000000000d96b60] if_nlmsg_size+0x2b0/0x390
> [    1.491319][    T1] Call Trace:
> [    1.491573][    T1] [c0000000074435d0] [c000000000d96b54] if_nlmsg_size+0x2a4/0x390 (unreliable)
> [    1.492291][    T1] [c000000007443680] [c000000000d96790] rtmsg_ifinfo_build_skb+0x80/0x1a0
> [    1.492958][    T1] [c000000007443740] [c000000000d97590] rtmsg_ifinfo+0x70/0xd0
> [    1.493559][    T1] [c000000007443790] [c000000000d7d528] register_netdevice+0x5d8/0x670
> [    1.494205][    T1] [c000000007443820] [c000000000d7d94c] register_netdev+0x4c/0x80
> [    1.494823][    T1] [c000000007443850] [c000000000f826d8] sit_init_net+0x1b8/0x200
> [    1.495426][    T1] [c0000000074438d0] [c000000000d63b5c] ops_init+0x14c/0x1c0
> [    1.496014][    T1] [c000000007443930] [c000000000d6314c] register_pernet_operations+0xec/0x1e0
> [    1.496716][    T1] [c000000007443990] [c000000000d633d0] register_pernet_device+0x60/0xd0
> [    1.497372][    T1] [c0000000074439e0] [c000000002085194] sit_init+0x54/0x160
> [    1.497950][    T1] [c000000007443a70] [c000000000011c58] do_one_initcall+0x108/0x3e0
> [    1.498573][    T1] [c000000007443c70] [c000000002006190] do_initcall_level+0xe4/0x1c4
> [    1.499219][    T1] [c000000007443cc0] [c00000000200604c] do_initcalls+0x84/0xe4
> [    1.499799][    T1] [c000000007443d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec
> [    1.500444][    T1] [c000000007443da0] [c00000000001254c] kernel_init+0x3c/0x270
> [    1.501042][    T1] [c000000007443e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
> [    1.501721][    T1] Instruction dump:
> [    1.502202][    T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
> [    1.502934][    T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
> [    1.504028][    T1] ---[ end trace 0000000000000000 ]---
> ...
> 
> First thing was figuring out where the NULL pointer dereference happens,
> which appears to be the "strlen(ops->kind)" in rtnl_link_get_size():
> 
> static size_t rtnl_link_get_size(const struct net_device *dev)
> {
> 	const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> 	size_t size;
> 
> 	if (!ops)
> 		return 0;
> 
> 	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> 	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> 
> which I confirmed with some really rudimentary printk debugging:
> 
> [    1.476862][    T1] nathan: rtnl_link_get_size(): name: sit0, ops: c0000000028fcfd8, ops->kind: (null)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 710da8a36729..c8d928e83aec 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -520,6 +520,9 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
>   	if (!ops)
>   		return 0;
>   
> +	pr_err("nathan: %s(): name: %s, ops: %px, ops->kind: %s\n", __func__,
> +	       dev->name, ops, ops->kind);
> +
>   	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
>   	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
>   
> 
> Okay... how did sit0 end up with a NULL kind...? It is very clearly
> defined as "sit":
> 
> static struct rtnl_link_ops sit_link_ops __read_mostly = {
> 	.kind		= "sit",
> 
> Adding some more debug prints to net/ipv6/sit.c:
> 
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index c0b138c20992..7b9edbed2fcd 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -1920,6 +1920,12 @@ static int __net_init sit_init_net(struct net *net)
>   	 */
>   	sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
>   
> +	pr_err("nathan: %s(): &sit_link_ops: %px\n", __func__, &sit_link_ops);
> +	pr_err("nathan: %s(): sit_link_ops.kind: %s\n", __func__, sit_link_ops.kind);
> +	pr_err("nathan: %s(): sit_link_ops.maxtype: %u\n", __func__, sit_link_ops.maxtype);
> +	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops: %px\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops);
> +	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: %s\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops->kind);
> +
>   	err = register_netdev(sitn->fb_tunnel_dev);
>   	if (err)
>   		goto err_reg_dev;
> 
> reveals:
> 
> [    1.471920][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028fcfd8
> [    1.472534][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: (null)
> [    1.473088][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> [    1.473639][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028fcfd8
> [    1.474370][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: (null)
> 
> This is super bizarre, as the maxtype member appears to have the correct
> value, but how is kind's initial value getting dropped on the floor?
> 
> Removing the __read_mostly annotation "fixes" it:
> 
> [    1.481708][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000027d3f60
> [    1.482319][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
> [    1.482878][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> [    1.483429][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000027d3f60
> [    1.484174][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
> ...
> Linux version 5.17.0-rc4-00001-g956f02ad5c31-dirty (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), LLD 13.0.1) #2 SMP PREEMPT Wed Feb 16 13:29:49 MST 2022
> ...
> 
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index 7b9edbed2fcd..f109c7a0233b 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -70,7 +70,7 @@ static void ipip6_tunnel_setup(struct net_device *dev);
>   static void ipip6_dev_free(struct net_device *dev);
>   static bool check_6rd(struct ip_tunnel *tunnel, const struct in6_addr *v6dst,
>   		      __be32 *v4dst);
> -static struct rtnl_link_ops sit_link_ops __read_mostly;
> +static struct rtnl_link_ops sit_link_ops;
>   
>   static unsigned int sit_net_id __read_mostly;
>   struct sit_net {
> @@ -1827,7 +1827,7 @@ static void ipip6_dellink(struct net_device *dev, struct list_head *head)
>   		unregister_netdevice_queue(dev, head);
>   }
>   
> -static struct rtnl_link_ops sit_link_ops __read_mostly = {
> +static struct rtnl_link_ops sit_link_ops = {
>   	.kind		= "sit",
>   	.maxtype	= IFLA_IPTUN_MAX,
>   	.policy		= ipip6_policy,
> 
> Switching to ld.bfd also resolves it:
> 
> [    1.470405][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028acfd8
> [    1.471016][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
> [    1.471534][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> [    1.472062][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028acfd8
> [    1.472790][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
> ...
> Linux version 5.17.0-rc4-00001-g956f02ad5c31 (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), GNU ld version 2.37-2.fc36) #3 SMP PREEMPT Wed Feb 16 13:33:42 MST 2022
> ...
> 
> I tested with ToT LLVM (or at least, close to it, since there is an
> unrelated ld.lld regression there) and I could not reproduce it there,
> so I did a reverse bisect to see what commit fixes this issue in LLVM 14
> and I landed on:
> 
> commit 55c14d6dbfd8e7b86c15d2613fea3490078e2ae4
> Author: Fangrui Song <i@maskray.me>
> Date:   Thu Nov 25 14:12:34 2021 -0800
> 
>      [ELF] Simplify DynamicSection content computation. NFC
> 
>      The new code computes the content twice, but avoides the tricky
>      std::function<uint64_t()>. Removed 13KiB code in a Release build.
> 
>   lld/ELF/SyntheticSections.cpp | 117 ++++++++++++++++--------------------------
>   lld/ELF/SyntheticSections.h   |  12 +----
>   2 files changed, 44 insertions(+), 85 deletions(-)
> 
> That's... interesting, given that the commit title says No Functional
> Change, even though there clearly is one. That commit has a couple
> mentions of PowerPC synthetic sections, so it is possible that the
> new content calculation lines up with ld.bfd?
> 
> I am not really sure where to go from here, as I don't fully understand
> what the problem was before that LLD change. I'll see if I can do some
> more investigation tomorrow (unless someone wants to beat me to it ;)

Thank you for looking into this, and sharing your analysis.

I built LLVM/clang from the master branch and rebuilt Linux, but I can
still reproduce this.

     $ git clone --depth=1 https://github.com/llvm/llvm-project.git
     $ cd llvm-project/
     $ git log --oneline
     41cb504b7 [mlir][linalg][bufferize][NFC] Move interface impl to Linalg Transforms
     $ mkdir build
     $ cd build
     $ cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles" \
         -DCMAKE_BUILD_TYPE=Release -DLLVM_INSTALL_UTILS=ON \
         -DCMAKE_INSTALL_PREFIX=/scratch/local2/llvm ../llvm
     $ make -j20
     $ make -j20 clang-check
     $ make install
     $ /scratch/local2/llvm/bin/clang --version
     clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2)
     Target: powerpc64le-unknown-linux-gnu
     Thread model: posix
     InstalledDir: /scratch/local2/llvm/bin

Then build Linux after `make clean` with `/scratch/local2/llvm/bin` in 
the path.

     $ LLVM=1 LLVM_IAS=0 eatmydata make -j20

     $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=1 \
         -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi \
         -serial stdio -m 512 -kernel /dev/shm/linux/vmlinux \
         -append "debug_boot_weak_hash panic=-1 console=ttyS0 rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_self_test=1 rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"
     […]
     Preparing to boot Linux version 5.17.0-rc5-00178-ga4b9a8fb20e7 (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2), LLD 13.0.0) #29 SMP PREEMPT Mon Feb 21 10:58:54 CET 2022
     […]
     [    0.465889][    T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
     [    0.466749][    T1] Faulting instruction address: 0xc0000000008fc300
     [    0.467507][    T1] Oops: Kernel access of bad area, sig: 11 [#1]
     […]


Kind regards,

Paul


* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-02-21 11:17               ` Paul Menzel
@ 2022-02-21 15:29                 ` Nathan Chancellor
  2022-02-21 17:33                   ` Paul Menzel
  2022-04-19 21:34                   ` Nathan Chancellor
  0 siblings, 2 replies; 17+ messages in thread
From: Nathan Chancellor @ 2022-02-21 15:29 UTC
  To: Paul Menzel
  Cc: Zhouyi Zhou, Paul E. McKenney, Josh Triplett, rcu, LKML,
	David S. Miller, Jakub Kicinski, netdev, Nick Desaulniers, llvm,
	Fangrui Song

Hi Paul,

On Mon, Feb 21, 2022 at 12:17:40PM +0100, Paul Menzel wrote:
> Am 17.02.22 um 02:16 schrieb Nathan Chancellor:
> 
> > On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote:
> > > [Cc: +LLVM/clang build support folks]
> 
> […]
> 
> > > To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm*
> > > and *clang* 1:13.0-53~exp1
> > > 
> > >      $ clang --version
> > >      Ubuntu clang version 13.0.0-2
> > >      Target: powerpc64le-unknown-linux-gnu
> > >      Thread model: posix
> > >      InstalledDir: /usr/bin
> > > 
> > > results in a segmentation fault, while it works when building with GCC.
> > > 
> > >      $ gcc --version
> > >      gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0
> > 
> > Thank you for keying us in. I am going to have a bit of a brain dump
> > here based on the information I have uncovered after a couple of hours
> > of debugging.
> > 
> > TL;DR: It seems like something is broken with __read_mostly + ld.lld
> > before 14.0.0.
> > 
> > My initial reproduction steps (boot-qemu.sh comes from
> > https://github.com/ClangBuiltLinux/boot-utils):
> > 
> > $ clang --version
> > clang version 13.0.1 (Fedora 13.0.1-1.fc37)
> > Target: x86_64-redhat-linux-gnu
> > Thread model: posix
> > InstalledDir: /usr/bin
> > 
> > $ powerpc64le-linux-gnu-as --version
> > GNU assembler version 2.37-2.fc36
> > Copyright (C) 2021 Free Software Foundation, Inc.
> > This program is free software; you may redistribute it under the terms of
> > the GNU General Public License version 3 or later.
> > This program has absolutely no warranty.
> > This assembler was configured for a target of `powerpc64le-linux-gnu'.
> > 
> > $ curl -LSso .config https://lore.kernel.org/all/f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de/3-linux-5.17-rc4-rcu-dev-config.txt
> > 
> > $ scripts/config --set-val INITRAMFS_SOURCE '""'
> > 
> > $ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 all
> > 
> > $ boot-qemu.sh -a ppc64le -k . -t 45s
> > QEMU location: /usr/bin
> > 
> > QEMU version: QEMU emulator version 6.2.0 (qemu-6.2.0-5.fc37)
> > 
> > + timeout --foreground 45s stdbuf -oL -eL qemu-system-ppc64 -initrd \
> > /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/rootfs.cpio -device \
> > ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 -L \
> > /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/ -bios skiboot.lid \
> > -machine powernv8 -display none -kernel \
> > /home/nathan/cbl/src/linux/arch/powerpc/boot/zImage.epapr -m 2G \
> > -nodefaults -serial mon:stdio
> > ...
> > [    1.478028][    T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
> > [    1.478630][    T1] Faulting instruction address: 0xc00000000090bee0
> > [    1.479521][    T1] Oops: Kernel access of bad area, sig: 11 [#1]
> > [    1.480036][    T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
> > [    1.480853][    T1] Modules linked in:
> > [    1.481265][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00001-gfa15c7cb550f #1
> > [    1.481967][    T1] NIP:  c00000000090bee0 LR: c000000000d96b60 CTR: c0000000000d5b4c
> > [    1.482596][    T1] REGS: c000000007443330 TRAP: 0380   Not tainted  (5.17.0-rc4-00001-gfa15c7cb550f)
> > [    1.483305][    T1] MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 22800a87  XER: 00000000
> > [    1.484277][    T1] CFAR: c000000000d96b5c IRQMASK: 0
> > [    1.484277][    T1] GPR00: c000000000d96b54 c0000000074435d0 c0000000028bc600 0000000000000000
> > [    1.484277][    T1] GPR04: ffffffffffffffff ffffffffff1ea558 ffffffffff1ebfe4 c00000000261ae88
> > [    1.484277][    T1] GPR08: 0000000000000003 0000000000000004 c00000000261ae88 0000000000000000
> > [    1.484277][    T1] GPR12: 0000000000800000 c000000002a60000 c000000000012518 0000000000000000
> > [    1.484277][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [    1.484277][    T1] GPR20: 0000000000000000 c0000000027bff80 0000000000000cc0 0000000000000000
> > [    1.484277][    T1] GPR24: 0000000000000010 0000000000000000 0000000000000000 0000000000000000
> > [    1.484277][    T1] GPR28: c0000000028fcfd8 c000000007b83000 0000000000000000 c0000000074435d0
> > [    1.490325][    T1] NIP [c00000000090bee0] strlen+0x10/0x30
> > [    1.490788][    T1] LR [c000000000d96b60] if_nlmsg_size+0x2b0/0x390
> > [    1.491319][    T1] Call Trace:
> > [    1.491573][    T1] [c0000000074435d0] [c000000000d96b54] if_nlmsg_size+0x2a4/0x390 (unreliable)
> > [    1.492291][    T1] [c000000007443680] [c000000000d96790] rtmsg_ifinfo_build_skb+0x80/0x1a0
> > [    1.492958][    T1] [c000000007443740] [c000000000d97590] rtmsg_ifinfo+0x70/0xd0
> > [    1.493559][    T1] [c000000007443790] [c000000000d7d528] register_netdevice+0x5d8/0x670
> > [    1.494205][    T1] [c000000007443820] [c000000000d7d94c] register_netdev+0x4c/0x80
> > [    1.494823][    T1] [c000000007443850] [c000000000f826d8] sit_init_net+0x1b8/0x200
> > [    1.495426][    T1] [c0000000074438d0] [c000000000d63b5c] ops_init+0x14c/0x1c0
> > [    1.496014][    T1] [c000000007443930] [c000000000d6314c] register_pernet_operations+0xec/0x1e0
> > [    1.496716][    T1] [c000000007443990] [c000000000d633d0] register_pernet_device+0x60/0xd0
> > [    1.497372][    T1] [c0000000074439e0] [c000000002085194] sit_init+0x54/0x160
> > [    1.497950][    T1] [c000000007443a70] [c000000000011c58] do_one_initcall+0x108/0x3e0
> > [    1.498573][    T1] [c000000007443c70] [c000000002006190] do_initcall_level+0xe4/0x1c4
> > [    1.499219][    T1] [c000000007443cc0] [c00000000200604c] do_initcalls+0x84/0xe4
> > [    1.499799][    T1] [c000000007443d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec
> > [    1.500444][    T1] [c000000007443da0] [c00000000001254c] kernel_init+0x3c/0x270
> > [    1.501042][    T1] [c000000007443e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
> > [    1.501721][    T1] Instruction dump:
> > [    1.502202][    T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
> > [    1.502934][    T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
> > [    1.504028][    T1] ---[ end trace 0000000000000000 ]---
> > ...
> > 
> > First thing was figuring out where the NULL pointer dereference happens,
> > which appears to be the "strlen(ops->kind)" in rtnl_link_get_size():
> > 
> > static size_t rtnl_link_get_size(const struct net_device *dev)
> > {
> > 	const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > 	size_t size;
> > 
> > 	if (!ops)
> > 		return 0;
> > 
> > 	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > 	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > 
> > which I confirmed with some really rudimentary printk debugging:
> > 
> > [    1.476862][    T1] nathan: rtnl_link_get_size(): name: sit0, ops: c0000000028fcfd8, ops->kind: (null)
> > 
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index 710da8a36729..c8d928e83aec 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -520,6 +520,9 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
> >   	if (!ops)
> >   		return 0;
> > +	pr_err("nathan: %s(): name: %s, ops: %px, ops->kind: %s\n", __func__,
> > +	       dev->name, ops, ops->kind);
> > +
> >   	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> >   	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > 
> > Okay... how did sit0 end up with a NULL kind...? It is very clearly
> > defined as "sit":
> > 
> > static struct rtnl_link_ops sit_link_ops __read_mostly = {
> > 	.kind		= "sit",
> > 
> > Adding some more debug prints to net/ipv6/sit.c:
> > 
> > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > index c0b138c20992..7b9edbed2fcd 100644
> > --- a/net/ipv6/sit.c
> > +++ b/net/ipv6/sit.c
> > @@ -1920,6 +1920,12 @@ static int __net_init sit_init_net(struct net *net)
> >   	 */
> >   	sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > +	pr_err("nathan: %s(): &sit_link_ops: %px\n", __func__, &sit_link_ops);
> > +	pr_err("nathan: %s(): sit_link_ops.kind: %s\n", __func__, sit_link_ops.kind);
> > +	pr_err("nathan: %s(): sit_link_ops.maxtype: %u\n", __func__, sit_link_ops.maxtype);
> > +	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops: %px\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops);
> > +	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: %s\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops->kind);
> > +
> >   	err = register_netdev(sitn->fb_tunnel_dev);
> >   	if (err)
> >   		goto err_reg_dev;
> > 
> > reveals:
> > 
> > [    1.471920][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028fcfd8
> > [    1.472534][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: (null)
> > [    1.473088][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> > [    1.473639][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028fcfd8
> > [    1.474370][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: (null)
> > 
> > This is super bizarre, as the maxtype member appears to have the correct
> > value, but how is kind's initial value getting dropped on the floor?
> > 
> > Removing the __read_mostly annotation "fixes" it:
> > 
> > [    1.481708][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000027d3f60
> > [    1.482319][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
> > [    1.482878][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> > [    1.483429][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000027d3f60
> > [    1.484174][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
> > ...
> > Linux version 5.17.0-rc4-00001-g956f02ad5c31-dirty (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), LLD 13.0.1) #2 SMP PREEMPT Wed Feb 16 13:29:49 MST 2022
> > ...
> > 
> > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > index 7b9edbed2fcd..f109c7a0233b 100644
> > --- a/net/ipv6/sit.c
> > +++ b/net/ipv6/sit.c
> > @@ -70,7 +70,7 @@ static void ipip6_tunnel_setup(struct net_device *dev);
> >   static void ipip6_dev_free(struct net_device *dev);
> >   static bool check_6rd(struct ip_tunnel *tunnel, const struct in6_addr *v6dst,
> >   		      __be32 *v4dst);
> > -static struct rtnl_link_ops sit_link_ops __read_mostly;
> > +static struct rtnl_link_ops sit_link_ops;
> >   static unsigned int sit_net_id __read_mostly;
> >   struct sit_net {
> > @@ -1827,7 +1827,7 @@ static void ipip6_dellink(struct net_device *dev, struct list_head *head)
> >   		unregister_netdevice_queue(dev, head);
> >   }
> > -static struct rtnl_link_ops sit_link_ops __read_mostly = {
> > +static struct rtnl_link_ops sit_link_ops = {
> >   	.kind		= "sit",
> >   	.maxtype	= IFLA_IPTUN_MAX,
> >   	.policy		= ipip6_policy,
> > 
> > Switching to ld.bfd also resolves it:
> > 
> > [    1.470405][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028acfd8
> > [    1.471016][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
> > [    1.471534][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> > [    1.472062][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028acfd8
> > [    1.472790][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
> > ...
> > Linux version 5.17.0-rc4-00001-g956f02ad5c31 (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), GNU ld version 2.37-2.fc36) #3 SMP PREEMPT Wed Feb 16 13:33:42 MST 2022
> > ...
> > 
> > I tested with ToT LLVM (or at least, close to it, since there is an
> > unrelated ld.lld regression there) and I could not reproduce it there,
> > so I did a reverse bisect to see what commit fixes this issue in LLVM 14
> > and I landed on:
> > 
> > commit 55c14d6dbfd8e7b86c15d2613fea3490078e2ae4
> > Author: Fangrui Song <i@maskray.me>
> > Date:   Thu Nov 25 14:12:34 2021 -0800
> > 
> >      [ELF] Simplify DynamicSection content computation. NFC
> > 
> >      The new code computes the content twice, but avoides the tricky
> >      std::function<uint64_t()>. Removed 13KiB code in a Release build.
> > 
> >   lld/ELF/SyntheticSections.cpp | 117 ++++++++++++++++--------------------------
> >   lld/ELF/SyntheticSections.h   |  12 +----
> >   2 files changed, 44 insertions(+), 85 deletions(-)
> > 
> > That's... interesting, given that the commit title says No Functional
> > Change, even though there clearly is one. That commit has a couple
> > mentions of PowerPC synthetic sections, so it is possible that the
> > new content calculation lines up with ld.bfd?
> > 
> > I am not really sure where to go from here, as I don't fully understand
> > what the problem was before that LLD change. I'll see if I can do some
> > more investigation tomorrow (unless someone wants to beat me to it ;)
> 
> Thank you for looking into this, and sharing your analysis.
> 
> I built LLVM/clang from the master branch, rebuilt, but can still reproduce
> this.
> 
>     $ git clone --depth=1 https://github.com/llvm/llvm-project.git
>     $ cd llvm-project/
>     $ git log --oneline
>     41cb504b7 [mlir][linalg][bufferize][NFC] Move interface impl to Linalg
> Transforms
>     $ mkdir build
>     $ cd build
>     $ cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles"

Since this is something related to ld.lld, not clang, this should be:

... -DLLVM_ENABLE_PROJECTS="clang;lld" ...

> -DCMAKE_BUILD_TYPE=Release -DLLVM_INSTALL_UTILS=ON
> -DCMAKE_INSTALL_PREFIX=/scratch/local2/llvm ../llvm
>     $ make -j20
>     $ make -j20 clang-check

You can also do 'check-lld' if you want.

>     $ make install
>     $ /scratch/local2/llvm/bin/clang --version
>     clang version 15.0.0 (https://github.com/llvm/llvm-project.git
> 41cb504b7c4b18ac15830107431a0c1eec73a6b2)
>     Target: powerpc64le-unknown-linux-gnu
>     Thread model: posix
>     InstalledDir: /scratch/local2/llvm/bin
> 
> Then build Linux after `make clean` with `/scratch/local2/llvm/bin` in the
> path.
> 
>     $ LLVM=1 LLVM_IAS=0 eatmydata make -j20
> 
>     $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=1 -net
> none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m
> 512 -kernel /dev/shm/linux/vmlinux -append "debug_boot_weak_hash panic=-1
> console=ttyS0 rcupdate.rcu_cpu_stall_suppress_at_boot=1
> torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000
> rcupdate.rcu_self_test=1 rcutorture.onoff_interval=1000
> rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4
> rcutorture.stat_interval=15 rcutorture.shutdown_secs=420
> rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"
>     […]
>     Preparing to boot Linux version 5.17.0-rc5-00178-ga4b9a8fb20e7
> (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (clang version
> 15.0.0 (https://github.com/llvm/llvm-project.git
> 41cb504b7c4b18ac15830107431a0c1eec73a6b2), LLD 13.0.0) #29 SMP PREEMPT Mon

                                             ^ still using ld.lld 13.0.0.

If you want to test the master branch, I would check out LLVM at
460830a9c664e8cce959c660648faa7747ad8bdc, as the next commit introduces
a boot regression unrelated to this issue:

https://github.com/ClangBuiltLinux/linux/issues/1581

That should at least confirm this is resolved in a newer release.

> Feb 21 10:58:54 CET 2022
>     […]
>     [    0.465889][    T1] BUG: Kernel NULL pointer dereference on read at
> 0x00000000
>     [    0.466749][    T1] Faulting instruction address: 0xc0000000008fc300
>     [    0.467507][    T1] Oops: Kernel access of bad area, sig: 11 [#1]
>     […]

I do intend to do further analysis at some point over the next few days
to see if I can figure out exactly why the commit that I mentioned
above fixes the issue; then we can look into what we should do about it
in the kernel sources.

Cheers,
Nathan


* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-02-21 15:29                 ` Nathan Chancellor
@ 2022-02-21 17:33                   ` Paul Menzel
  2022-04-19 21:34                   ` Nathan Chancellor
  1 sibling, 0 replies; 17+ messages in thread
From: Paul Menzel @ 2022-02-21 17:33 UTC
  To: Nathan Chancellor
  Cc: Zhouyi Zhou, Paul E. McKenney, Josh Triplett, rcu, LKML,
	David S. Miller, Jakub Kicinski, netdev, Nick Desaulniers, llvm,
	Fangrui Song

Dear Nathan,


Am 21.02.22 um 16:29 schrieb Nathan Chancellor:

> On Mon, Feb 21, 2022 at 12:17:40PM +0100, Paul Menzel wrote:
>> Am 17.02.22 um 02:16 schrieb Nathan Chancellor:
>>
>>> On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote:
>>>> [Cc: +LLVM/clang build support folks]
>>
>> […]
>>
>>>> To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm*
>>>> and *clang* 1:13.0-53~exp1
>>>>
>>>>       $ clang --version
>>>>       Ubuntu clang version 13.0.0-2
>>>>       Target: powerpc64le-unknown-linux-gnu
>>>>       Thread model: posix
>>>>       InstalledDir: /usr/bin
>>>>
>>>> results in a segmentation fault, while it works when building with GCC.
>>>>
>>>>       $ gcc --version
>>>>       gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0
>>>
>>> Thank you for keying us in. I am going to have a bit of a brain dump
>>> here based on the information I have uncovered after a couple of hours
>>> of debugging.
>>>
>>> TL;DR: It seems like something is broken with __read_mostly + ld.lld
>>> before 14.0.0.
>>>
>>> My initial reproduction steps (boot-qemu.sh comes from
>>> https://github.com/ClangBuiltLinux/boot-utils):
>>>
>>> $ clang --version
>>> clang version 13.0.1 (Fedora 13.0.1-1.fc37)
>>> Target: x86_64-redhat-linux-gnu
>>> Thread model: posix
>>> InstalledDir: /usr/bin
>>>
>>> $ powerpc64le-linux-gnu-as --version
>>> GNU assembler version 2.37-2.fc36
>>> Copyright (C) 2021 Free Software Foundation, Inc.
>>> This program is free software; you may redistribute it under the terms of
>>> the GNU General Public License version 3 or later.
>>> This program has absolutely no warranty.
>>> This assembler was configured for a target of `powerpc64le-linux-gnu'.
>>>
>>> $ curl -LSso .config https://lore.kernel.org/all/f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de/3-linux-5.17-rc4-rcu-dev-config.txt
>>>
>>> $ scripts/config --set-val INITRAMFS_SOURCE '""'
>>>
>>> $ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 all
>>>
>>> $ boot-qemu.sh -a ppc64le -k . -t 45s
>>> QEMU location: /usr/bin
>>>
>>> QEMU version: QEMU emulator version 6.2.0 (qemu-6.2.0-5.fc37)
>>>
>>> + timeout --foreground 45s stdbuf -oL -eL qemu-system-ppc64 -initrd \
>>> /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/rootfs.cpio -device \
>>> ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 -L \
>>> /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/ -bios skiboot.lid \
>>> -machine powernv8 -display none -kernel \
>>> /home/nathan/cbl/src/linux/arch/powerpc/boot/zImage.epapr -m 2G \
>>> -nodefaults -serial mon:stdio
>>> ...
>>> [    1.478028][    T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
>>> [    1.478630][    T1] Faulting instruction address: 0xc00000000090bee0
>>> [    1.479521][    T1] Oops: Kernel access of bad area, sig: 11 [#1]
>>> [    1.480036][    T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
>>> [    1.480853][    T1] Modules linked in:
>>> [    1.481265][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00001-gfa15c7cb550f #1
>>> [    1.481967][    T1] NIP:  c00000000090bee0 LR: c000000000d96b60 CTR: c0000000000d5b4c
>>> [    1.482596][    T1] REGS: c000000007443330 TRAP: 0380   Not tainted  (5.17.0-rc4-00001-gfa15c7cb550f)
>>> [    1.483305][    T1] MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 22800a87  XER: 00000000
>>> [    1.484277][    T1] CFAR: c000000000d96b5c IRQMASK: 0
>>> [    1.484277][    T1] GPR00: c000000000d96b54 c0000000074435d0 c0000000028bc600 0000000000000000
>>> [    1.484277][    T1] GPR04: ffffffffffffffff ffffffffff1ea558 ffffffffff1ebfe4 c00000000261ae88
>>> [    1.484277][    T1] GPR08: 0000000000000003 0000000000000004 c00000000261ae88 0000000000000000
>>> [    1.484277][    T1] GPR12: 0000000000800000 c000000002a60000 c000000000012518 0000000000000000
>>> [    1.484277][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> [    1.484277][    T1] GPR20: 0000000000000000 c0000000027bff80 0000000000000cc0 0000000000000000
>>> [    1.484277][    T1] GPR24: 0000000000000010 0000000000000000 0000000000000000 0000000000000000
>>> [    1.484277][    T1] GPR28: c0000000028fcfd8 c000000007b83000 0000000000000000 c0000000074435d0
>>> [    1.490325][    T1] NIP [c00000000090bee0] strlen+0x10/0x30
>>> [    1.490788][    T1] LR [c000000000d96b60] if_nlmsg_size+0x2b0/0x390
>>> [    1.491319][    T1] Call Trace:
>>> [    1.491573][    T1] [c0000000074435d0] [c000000000d96b54] if_nlmsg_size+0x2a4/0x390 (unreliable)
>>> [    1.492291][    T1] [c000000007443680] [c000000000d96790] rtmsg_ifinfo_build_skb+0x80/0x1a0
>>> [    1.492958][    T1] [c000000007443740] [c000000000d97590] rtmsg_ifinfo+0x70/0xd0
>>> [    1.493559][    T1] [c000000007443790] [c000000000d7d528] register_netdevice+0x5d8/0x670
>>> [    1.494205][    T1] [c000000007443820] [c000000000d7d94c] register_netdev+0x4c/0x80
>>> [    1.494823][    T1] [c000000007443850] [c000000000f826d8] sit_init_net+0x1b8/0x200
>>> [    1.495426][    T1] [c0000000074438d0] [c000000000d63b5c] ops_init+0x14c/0x1c0
>>> [    1.496014][    T1] [c000000007443930] [c000000000d6314c] register_pernet_operations+0xec/0x1e0
>>> [    1.496716][    T1] [c000000007443990] [c000000000d633d0] register_pernet_device+0x60/0xd0
>>> [    1.497372][    T1] [c0000000074439e0] [c000000002085194] sit_init+0x54/0x160
>>> [    1.497950][    T1] [c000000007443a70] [c000000000011c58] do_one_initcall+0x108/0x3e0
>>> [    1.498573][    T1] [c000000007443c70] [c000000002006190] do_initcall_level+0xe4/0x1c4
>>> [    1.499219][    T1] [c000000007443cc0] [c00000000200604c] do_initcalls+0x84/0xe4
>>> [    1.499799][    T1] [c000000007443d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec
>>> [    1.500444][    T1] [c000000007443da0] [c00000000001254c] kernel_init+0x3c/0x270
>>> [    1.501042][    T1] [c000000007443e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
>>> [    1.501721][    T1] Instruction dump:
>>> [    1.502202][    T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
>>> [    1.502934][    T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
>>> [    1.504028][    T1] ---[ end trace 0000000000000000 ]---
>>> ...
>>>
>>> First thing was figuring out where the NULL pointer dereference happens,
>>> which appears to be the "strlen(ops->kind)" in rtnl_link_get_size():
>>>
>>> 515 static size_t rtnl_link_get_size(const struct net_device *dev)
>>> 516 {
>>> 517 	const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
>>> 518 	size_t size;
>>> 519
>>> 520 	if (!ops)
>>> 521 		return 0;
>>> 522
>>> 523 	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
>>> 524 	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
>>>
>>> which I confirmed with some really rudimentary printk debugging:
>>>
>>> [    1.476862][    T1] nathan: rtnl_link_get_size(): name: sit0, ops: c0000000028fcfd8, ops->kind: (null)
>>>
>>> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>>> index 710da8a36729..c8d928e83aec 100644
>>> --- a/net/core/rtnetlink.c
>>> +++ b/net/core/rtnetlink.c
>>> @@ -520,6 +520,9 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
>>>    	if (!ops)
>>>    		return 0;
>>> +	pr_err("nathan: %s(): name: %s, ops: %px, ops->kind: %s\n", __func__,
>>> +	       dev->name, ops, ops->kind);
>>> +
>>>    	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
>>>    	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
>>>
>>> Okay... how did sit0 end up with a NULL kind...? It is very clearly
>>> defined as "sit":
>>>
>>> 1830 static struct rtnl_link_ops sit_link_ops __read_mostly = {
>>> 1831 	.kind		= "sit",
>>>
>>> Adding some more debug prints to net/ipv6/sit.c:
>>>
>>> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
>>> index c0b138c20992..7b9edbed2fcd 100644
>>> --- a/net/ipv6/sit.c
>>> +++ b/net/ipv6/sit.c
>>> @@ -1920,6 +1920,12 @@ static int __net_init sit_init_net(struct net *net)
>>>    	 */
>>>    	sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
>>> +	pr_err("nathan: %s(): &sit_link_ops: %px\n", __func__, &sit_link_ops);
>>> +	pr_err("nathan: %s(): sit_link_ops.kind: %s\n", __func__, sit_link_ops.kind);
>>> +	pr_err("nathan: %s(): sit_link_ops.maxtype: %u\n", __func__, sit_link_ops.maxtype);
>>> +	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops: %px\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops);
>>> +	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: %s\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops->kind);
>>> +
>>>    	err = register_netdev(sitn->fb_tunnel_dev);
>>>    	if (err)
>>>    		goto err_reg_dev;
>>>
>>> reveals:
>>>
>>> [    1.471920][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028fcfd8
>>> [    1.472534][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: (null)
>>> [    1.473088][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
>>> [    1.473639][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028fcfd8
>>> [    1.474370][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: (null)
>>>
>>> This is super bizarre, as the maxtype member appears to have the correct
>>> value, but how is kind's initial value getting dropped on the floor?
>>>
>>> Removing the __read_mostly annotation "fixes" it:
>>>
>>> [    1.481708][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000027d3f60
>>> [    1.482319][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
>>> [    1.482878][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
>>> [    1.483429][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000027d3f60
>>> [    1.484174][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
>>> ...
>>> Linux version 5.17.0-rc4-00001-g956f02ad5c31-dirty (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), LLD 13.0.1) #2 SMP PREEMPT Wed Feb 16 13:29:49 MST 2022
>>> ...
>>>
>>> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
>>> index 7b9edbed2fcd..f109c7a0233b 100644
>>> --- a/net/ipv6/sit.c
>>> +++ b/net/ipv6/sit.c
>>> @@ -70,7 +70,7 @@ static void ipip6_tunnel_setup(struct net_device *dev);
>>>    static void ipip6_dev_free(struct net_device *dev);
>>>    static bool check_6rd(struct ip_tunnel *tunnel, const struct in6_addr *v6dst,
>>>    		      __be32 *v4dst);
>>> -static struct rtnl_link_ops sit_link_ops __read_mostly;
>>> +static struct rtnl_link_ops sit_link_ops;
>>>    static unsigned int sit_net_id __read_mostly;
>>>    struct sit_net {
>>> @@ -1827,7 +1827,7 @@ static void ipip6_dellink(struct net_device *dev, struct list_head *head)
>>>    		unregister_netdevice_queue(dev, head);
>>>    }
>>> -static struct rtnl_link_ops sit_link_ops __read_mostly = {
>>> +static struct rtnl_link_ops sit_link_ops = {
>>>    	.kind		= "sit",
>>>    	.maxtype	= IFLA_IPTUN_MAX,
>>>    	.policy		= ipip6_policy,
>>>
>>> Switching to ld.bfd also resolves it:
>>>
>>> [    1.470405][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028acfd8
>>> [    1.471016][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
>>> [    1.471534][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
>>> [    1.472062][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028acfd8
>>> [    1.472790][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
>>> ...
>>> Linux version 5.17.0-rc4-00001-g956f02ad5c31 (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), GNU ld version 2.37-2.fc36) #3 SMP PREEMPT Wed Feb 16 13:33:42 MST 2022
>>> ...
>>>
>>> I tested with ToT LLVM (or at least, close to it, since there is an
>>> unrelated ld.lld regression there) and I could not reproduce it there,
>>> so I did a reverse bisect to see what commit fixes this issue in LLVM 14
>>> and I landed on:
>>>
>>> commit 55c14d6dbfd8e7b86c15d2613fea3490078e2ae4
>>> Author: Fangrui Song <i@maskray.me>
>>> Date:   Thu Nov 25 14:12:34 2021 -0800
>>>
>>>       [ELF] Simplify DynamicSection content computation. NFC
>>>
>>>       The new code computes the content twice, but avoides the tricky
>>>       std::function<uint64_t()>. Removed 13KiB code in a Release build.
>>>
>>>    lld/ELF/SyntheticSections.cpp | 117 ++++++++++++++++--------------------------
>>>    lld/ELF/SyntheticSections.h   |  12 +----
>>>    2 files changed, 44 insertions(+), 85 deletions(-)
>>>
>>> That's... interesting, given that the commit title says No Functional
>>> Change, even though there clearly is one. That commit has a couple
>>> mentions of PowerPC synthetic sections, so it is possible that the
>>> new content calculation lines up with ld.bfd?
>>>
>>> I am not really sure where to go from here, as I don't fully understand
>>> what the problem was before that LLD change. I'll see if I can do some
>>> more investigation tomorrow (unless someone wants to beat me to it ;)
>>
>> Thank you for looking into this, and sharing your analysis.
>>
>> I built LLVM/clang from the master branch, rebuilt, but can still reproduce
>> this.
>>
>>      $ git clone --depth=1 https://github.com/llvm/llvm-project.git
>>      $ cd llvm-project/
>>      $ git log --oneline
>>      41cb504b7 [mlir][linalg][bufferize][NFC] Move interface impl to Linalg Transforms
>>      $ mkdir build
>>      $ cd build
>>      $ cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles"
> 
> Since this is something related to ld.lld, not clang, this should be:
> 
> ... -DLLVM_ENABLE_PROJECTS="clang;lld" ...
> 
>> -DCMAKE_BUILD_TYPE=Release -DLLVM_INSTALL_UTILS=ON
>> -DCMAKE_INSTALL_PREFIX=/scratch/local2/llvm ../llvm
>>      $ make -j20
>>      $ make -j20 clang-check
> 
> You can also do 'check-lld' if you want.
> 
>>      $ make install
>>      $ /scratch/local2/llvm/bin/clang --version
>>      clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2)
>>      Target: powerpc64le-unknown-linux-gnu
>>      Thread model: posix
>>      InstalledDir: /scratch/local2/llvm/bin
>>
>> Then build Linux after `make clean` with `/scratch/local2/llvm/bin` in the
>> path.
>>
>>      $ LLVM=1 LLVM_IAS=0 eatmydata make -j20
>>
>>      $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=1 -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m 512 -kernel /dev/shm/linux/vmlinux -append "debug_boot_weak_hash panic=-1 console=ttyS0 rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_self_test=1 rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"
>>      […]
>>      Preparing to boot Linux version 5.17.0-rc5-00178-ga4b9a8fb20e7 (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2), LLD 13.0.0) #29 SMP PREEMPT Mon
> 
>                                               ^ still using ld.lld 13.0.0.
> 
> If you want to test the master branch, I would checkout LLVM at
> 460830a9c664e8cce959c660648faa7747ad8bdc, as the next commit introduces
> a boot regression unrelated to this issue:
> 
> https://github.com/ClangBuiltLinux/linux/issues/1581
> 
> That should at least confirm this is resolved in a newer release.

Sorry for forgetting to update ld.lld. Indeed, with ld.lld built from the
commit you mentioned, the NULL pointer dereference is gone.

     $ /scratch/local2/llvm/bin/ld.lld --version
     LLD 14.0.0 (compatible with GNU linkers)
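
For anyone else who trips over this: the linker that was really used can be read from the version banner, as in the `LLD 13.0.0` above. A minimal sketch of such a check, assuming the banner has the `LLD <major>.<minor>.<patch>` form shown in this thread; `lld_major` is a helper name made up for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the LLD major version found in a kernel version banner,
 * or -1 if the banner does not mention LLD (e.g. a GNU ld build).
 * Hypothetical helper, not part of any kernel or LLVM tooling.
 */
static int lld_major(const char *banner)
{
	const char *p;
	int major;

	assert(banner != NULL);

	/* Find the first "LLD " marker and parse the number after it. */
	p = strstr(banner, "LLD ");
	if (!p || sscanf(p, "LLD %d.", &major) != 1)
		return -1;
	return major;
}
```

Feeding it the contents of `/proc/version` (or the "Preparing to boot" line) and comparing the result against 14 would have caught my mistake before booting.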

>> Feb 21 10:58:54 CET 2022
>>      […]
>>      [    0.465889][    T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
>>      [    0.466749][    T1] Faulting instruction address: 0xc0000000008fc300
>>      [    0.467507][    T1] Oops: Kernel access of bad area, sig: 11 [#1]
>>      […]
> 
> I do intend to do further analysis at some point over the next few days
> to see if I can figure out exactly why the commit that I mentioned
> above fixes the issue; then we can look into what we should do about it
> in the kernel sources.

Awesome. Thank you for working on that.


Kind regards,

Paul


* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-02-21 15:29                 ` Nathan Chancellor
  2022-02-21 17:33                   ` Paul Menzel
@ 2022-04-19 21:34                   ` Nathan Chancellor
  1 sibling, 0 replies; 17+ messages in thread
From: Nathan Chancellor @ 2022-04-19 21:34 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Zhouyi Zhou, Paul E. McKenney, Josh Triplett, rcu, LKML,
	David S. Miller, Jakub Kicinski, netdev, Nick Desaulniers, llvm,
	Fangrui Song

On Mon, Feb 21, 2022 at 08:29:46AM -0700, Nathan Chancellor wrote:
> Hi Paul,
> 
> On Mon, Feb 21, 2022 at 12:17:40PM +0100, Paul Menzel wrote:
> > Am 17.02.22 um 02:16 schrieb Nathan Chancellor:
> > 
> > > On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote:
> > > > [Cc: +LLVM/clang build support folks]
> > 
> > […]
> > 
> > > > To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm*
> > > > and *clang* 1:13.0-53~exp1
> > > > 
> > > >      $ clang --version
> > > >      Ubuntu clang version 13.0.0-2
> > > >      Target: powerpc64le-unknown-linux-gnu
> > > >      Thread model: posix
> > > >      InstalledDir: /usr/bin
> > > > 
> > > > results in a segmentation fault, while it works when building with GCC.
> > > > 
> > > >      $ gcc --version
> > > >      gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0
> > > 
> > > Thank you for keying us in. I am going to have a bit of a brain dump
> > > here based on the information I have uncovered after a couple of hours
> > > of debugging.
> > > 
> > > TL;DR: It seems like something is broken with __read_mostly + ld.lld
> > > before 14.0.0.
> > > 
> > > My initial reproduction steps (boot-qemu.sh comes from
> > > https://github.com/ClangBuiltLinux/boot-utils):
> > > 
> > > $ clang --version
> > > clang version 13.0.1 (Fedora 13.0.1-1.fc37)
> > > Target: x86_64-redhat-linux-gnu
> > > Thread model: posix
> > > InstalledDir: /usr/bin
> > > 
> > > $ powerpc64le-linux-gnu-as --version
> > > GNU assembler version 2.37-2.fc36
> > > Copyright (C) 2021 Free Software Foundation, Inc.
> > > This program is free software; you may redistribute it under the terms of
> > > the GNU General Public License version 3 or later.
> > > This program has absolutely no warranty.
> > > This assembler was configured for a target of `powerpc64le-linux-gnu'.
> > > 
> > > $ curl -LSso .config https://lore.kernel.org/all/f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de/3-linux-5.17-rc4-rcu-dev-config.txt
> > > 
> > > $ scripts/config --set-val INITRAMFS_SOURCE '""'
> > > 
> > > $ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 all
> > > 
> > > $ boot-qemu.sh -a ppc64le -k . -t 45s
> > > QEMU location: /usr/bin
> > > 
> > > QEMU version: QEMU emulator version 6.2.0 (qemu-6.2.0-5.fc37)
> > > 
> > > + timeout --foreground 45s stdbuf -oL -eL qemu-system-ppc64 -initrd \
> > > /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/rootfs.cpio -device \
> > > ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 -L \
> > > /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/ -bios skiboot.lid \
> > > -machine powernv8 -display none -kernel \
> > > /home/nathan/cbl/src/linux/arch/powerpc/boot/zImage.epapr -m 2G \
> > > -nodefaults -serial mon:stdio
> > > ...
> > > [    1.478028][    T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
> > > [    1.478630][    T1] Faulting instruction address: 0xc00000000090bee0
> > > [    1.479521][    T1] Oops: Kernel access of bad area, sig: 11 [#1]
> > > [    1.480036][    T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
> > > [    1.480853][    T1] Modules linked in:
> > > [    1.481265][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00001-gfa15c7cb550f #1
> > > [    1.481967][    T1] NIP:  c00000000090bee0 LR: c000000000d96b60 CTR: c0000000000d5b4c
> > > [    1.482596][    T1] REGS: c000000007443330 TRAP: 0380   Not tainted  (5.17.0-rc4-00001-gfa15c7cb550f)
> > > [    1.483305][    T1] MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 22800a87  XER: 00000000
> > > [    1.484277][    T1] CFAR: c000000000d96b5c IRQMASK: 0
> > > [    1.484277][    T1] GPR00: c000000000d96b54 c0000000074435d0 c0000000028bc600 0000000000000000
> > > [    1.484277][    T1] GPR04: ffffffffffffffff ffffffffff1ea558 ffffffffff1ebfe4 c00000000261ae88
> > > [    1.484277][    T1] GPR08: 0000000000000003 0000000000000004 c00000000261ae88 0000000000000000
> > > [    1.484277][    T1] GPR12: 0000000000800000 c000000002a60000 c000000000012518 0000000000000000
> > > [    1.484277][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > [    1.484277][    T1] GPR20: 0000000000000000 c0000000027bff80 0000000000000cc0 0000000000000000
> > > [    1.484277][    T1] GPR24: 0000000000000010 0000000000000000 0000000000000000 0000000000000000
> > > [    1.484277][    T1] GPR28: c0000000028fcfd8 c000000007b83000 0000000000000000 c0000000074435d0
> > > [    1.490325][    T1] NIP [c00000000090bee0] strlen+0x10/0x30
> > > [    1.490788][    T1] LR [c000000000d96b60] if_nlmsg_size+0x2b0/0x390
> > > [    1.491319][    T1] Call Trace:
> > > [    1.491573][    T1] [c0000000074435d0] [c000000000d96b54] if_nlmsg_size+0x2a4/0x390 (unreliable)
> > > [    1.492291][    T1] [c000000007443680] [c000000000d96790] rtmsg_ifinfo_build_skb+0x80/0x1a0
> > > [    1.492958][    T1] [c000000007443740] [c000000000d97590] rtmsg_ifinfo+0x70/0xd0
> > > [    1.493559][    T1] [c000000007443790] [c000000000d7d528] register_netdevice+0x5d8/0x670
> > > [    1.494205][    T1] [c000000007443820] [c000000000d7d94c] register_netdev+0x4c/0x80
> > > [    1.494823][    T1] [c000000007443850] [c000000000f826d8] sit_init_net+0x1b8/0x200
> > > [    1.495426][    T1] [c0000000074438d0] [c000000000d63b5c] ops_init+0x14c/0x1c0
> > > [    1.496014][    T1] [c000000007443930] [c000000000d6314c] register_pernet_operations+0xec/0x1e0
> > > [    1.496716][    T1] [c000000007443990] [c000000000d633d0] register_pernet_device+0x60/0xd0
> > > [    1.497372][    T1] [c0000000074439e0] [c000000002085194] sit_init+0x54/0x160
> > > [    1.497950][    T1] [c000000007443a70] [c000000000011c58] do_one_initcall+0x108/0x3e0
> > > [    1.498573][    T1] [c000000007443c70] [c000000002006190] do_initcall_level+0xe4/0x1c4
> > > [    1.499219][    T1] [c000000007443cc0] [c00000000200604c] do_initcalls+0x84/0xe4
> > > [    1.499799][    T1] [c000000007443d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec
> > > [    1.500444][    T1] [c000000007443da0] [c00000000001254c] kernel_init+0x3c/0x270
> > > [    1.501042][    T1] [c000000007443e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
> > > [    1.501721][    T1] Instruction dump:
> > > [    1.502202][    T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
> > > [    1.502934][    T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
> > > [    1.504028][    T1] ---[ end trace 0000000000000000 ]---
> > > ...
> > > 
> > > First thing was figuring out where the NULL pointer dereference happens,
> > > which appears to be the "strlen(ops->kind)" in rtnl_link_get_size():
> > > 
> > > 515 static size_t rtnl_link_get_size(const struct net_device *dev)
> > > 516 {
> > > 517 	const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > > 518 	size_t size;
> > > 519
> > > 520 	if (!ops)
> > > 521 		return 0;
> > > 522
> > > 523 	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > 524 	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > > 
> > > which I confirmed with some really rudimentary printk debugging:
> > > 
> > > [    1.476862][    T1] nathan: rtnl_link_get_size(): name: sit0, ops: c0000000028fcfd8, ops->kind: (null)
> > > 
> > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > > index 710da8a36729..c8d928e83aec 100644
> > > --- a/net/core/rtnetlink.c
> > > +++ b/net/core/rtnetlink.c
> > > @@ -520,6 +520,9 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
> > >   	if (!ops)
> > >   		return 0;
> > > +	pr_err("nathan: %s(): name: %s, ops: %px, ops->kind: %s\n", __func__,
> > > +	       dev->name, ops, ops->kind);
> > > +
> > >   	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > >   	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > > 
> > > Okay... how did sit0 end up with a NULL kind...? It is very clearly
> > > defined as "sit":
> > > 
> > > 1830 static struct rtnl_link_ops sit_link_ops __read_mostly = {
> > > 1831 	.kind		= "sit",
> > > 
> > > Adding some more debug prints to net/ipv6/sit.c:
> > > 
> > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > > index c0b138c20992..7b9edbed2fcd 100644
> > > --- a/net/ipv6/sit.c
> > > +++ b/net/ipv6/sit.c
> > > @@ -1920,6 +1920,12 @@ static int __net_init sit_init_net(struct net *net)
> > >   	 */
> > >   	sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > +	pr_err("nathan: %s(): &sit_link_ops: %px\n", __func__, &sit_link_ops);
> > > +	pr_err("nathan: %s(): sit_link_ops.kind: %s\n", __func__, sit_link_ops.kind);
> > > +	pr_err("nathan: %s(): sit_link_ops.maxtype: %u\n", __func__, sit_link_ops.maxtype);
> > > +	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops: %px\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops);
> > > +	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: %s\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops->kind);
> > > +
> > >   	err = register_netdev(sitn->fb_tunnel_dev);
> > >   	if (err)
> > >   		goto err_reg_dev;
> > > 
> > > reveals:
> > > 
> > > [    1.471920][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028fcfd8
> > > [    1.472534][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: (null)
> > > [    1.473088][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> > > [    1.473639][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028fcfd8
> > > [    1.474370][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: (null)
> > > 
> > > This is super bizarre, as the maxtype member appears to have the correct
> > > value, but how is kind's initial value getting dropped on the floor?
> > > 
> > > Removing the __read_mostly annotation "fixes" it:
> > > 
> > > [    1.481708][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000027d3f60
> > > [    1.482319][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
> > > [    1.482878][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> > > [    1.483429][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000027d3f60
> > > [    1.484174][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
> > > ...
> > > Linux version 5.17.0-rc4-00001-g956f02ad5c31-dirty (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), LLD 13.0.1) #2 SMP PREEMPT Wed Feb 16 13:29:49 MST 2022
> > > ...
> > > 
> > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > > index 7b9edbed2fcd..f109c7a0233b 100644
> > > --- a/net/ipv6/sit.c
> > > +++ b/net/ipv6/sit.c
> > > @@ -70,7 +70,7 @@ static void ipip6_tunnel_setup(struct net_device *dev);
> > >   static void ipip6_dev_free(struct net_device *dev);
> > >   static bool check_6rd(struct ip_tunnel *tunnel, const struct in6_addr *v6dst,
> > >   		      __be32 *v4dst);
> > > -static struct rtnl_link_ops sit_link_ops __read_mostly;
> > > +static struct rtnl_link_ops sit_link_ops;
> > >   static unsigned int sit_net_id __read_mostly;
> > >   struct sit_net {
> > > @@ -1827,7 +1827,7 @@ static void ipip6_dellink(struct net_device *dev, struct list_head *head)
> > >   		unregister_netdevice_queue(dev, head);
> > >   }
> > > -static struct rtnl_link_ops sit_link_ops __read_mostly = {
> > > +static struct rtnl_link_ops sit_link_ops = {
> > >   	.kind		= "sit",
> > >   	.maxtype	= IFLA_IPTUN_MAX,
> > >   	.policy		= ipip6_policy,
> > > 
> > > Switching to ld.bfd also resolves it:
> > > 
> > > [    1.470405][    T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028acfd8
> > > [    1.471016][    T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
> > > [    1.471534][    T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> > > [    1.472062][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028acfd8
> > > [    1.472790][    T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
> > > ...
> > > Linux version 5.17.0-rc4-00001-g956f02ad5c31 (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), GNU ld version 2.37-2.fc36) #3 SMP PREEMPT Wed Feb 16 13:33:42 MST 2022
> > > ...
> > > 
> > > I tested with ToT LLVM (or at least, close to it, since there is an
> > > unrelated ld.lld regression there) and I could not reproduce it there,
> > > so I did a reverse bisect to see what commit fixes this issue in LLVM 14
> > > and I landed on:
> > > 
> > > commit 55c14d6dbfd8e7b86c15d2613fea3490078e2ae4
> > > Author: Fangrui Song <i@maskray.me>
> > > Date:   Thu Nov 25 14:12:34 2021 -0800
> > > 
> > >      [ELF] Simplify DynamicSection content computation. NFC
> > > 
> > >      The new code computes the content twice, but avoides the tricky
> > >      std::function<uint64_t()>. Removed 13KiB code in a Release build.
> > > 
> > >   lld/ELF/SyntheticSections.cpp | 117 ++++++++++++++++--------------------------
> > >   lld/ELF/SyntheticSections.h   |  12 +----
> > >   2 files changed, 44 insertions(+), 85 deletions(-)
> > > 
> > > That's... interesting, given that the commit title says No Functional
> > > Change, even though there clearly is one. That commit has a couple
> > > mentions of PowerPC synthetic sections, so it is possible that the
> > > new content calculation lines up with ld.bfd?
> > > 
> > > I am not really sure where to go from here, as I don't fully understand
> > > what the problem was before that LLD change. I'll see if I can do some
> > > more investigation tomorrow (unless someone wants to beat me to it ;)
> > 
> > Thank you for looking into this, and sharing your analysis.
> > 
> > I built LLVM/clang from the master branch, rebuilt, but can still reproduce
> > this.
> > 
> >     $ git clone --depth=1 https://github.com/llvm/llvm-project.git
> >     $ cd llvm-project/
> >     $ git log --oneline
> >     41cb504b7 [mlir][linalg][bufferize][NFC] Move interface impl to Linalg Transforms
> >     $ mkdir build
> >     $ cd build
> >     $ cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles"
> 
> Since this is something related to ld.lld, not clang, this should be:
> 
> ... -DLLVM_ENABLE_PROJECTS="clang;lld" ...
> 
> > -DCMAKE_BUILD_TYPE=Release -DLLVM_INSTALL_UTILS=ON
> > -DCMAKE_INSTALL_PREFIX=/scratch/local2/llvm ../llvm
> >     $ make -j20
> >     $ make -j20 clang-check
> 
> You can also do 'check-lld' if you want.
> 
> >     $ make install
> >     $ /scratch/local2/llvm/bin/clang --version
> >     clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2)
> >     Target: powerpc64le-unknown-linux-gnu
> >     Thread model: posix
> >     InstalledDir: /scratch/local2/llvm/bin
> > 
> > Then build Linux after `make clean` with `/scratch/local2/llvm/bin` in the
> > path.
> > 
> >     $ LLVM=1 LLVM_IAS=0 eatmydata make -j20
> > 
> >     $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=1 -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m 512 -kernel /dev/shm/linux/vmlinux -append "debug_boot_weak_hash panic=-1 console=ttyS0 rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_self_test=1 rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"
> >     […]
> >     Preparing to boot Linux version 5.17.0-rc5-00178-ga4b9a8fb20e7 (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2), LLD 13.0.0) #29 SMP PREEMPT Mon
> 
>                                              ^ still using ld.lld 13.0.0.
> 
> If you want to test the master branch, I would checkout LLVM at
> 460830a9c664e8cce959c660648faa7747ad8bdc, as the next commit introduces
> a boot regression unrelated to this issue:
> 
> https://github.com/ClangBuiltLinux/linux/issues/1581
> 
> That should at least confirm this is resolved in a newer release.
> 
> > Feb 21 10:58:54 CET 2022
> >     […]
> >     [    0.465889][    T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
> >     [    0.466749][    T1] Faulting instruction address: 0xc0000000008fc300
> >     [    0.467507][    T1] Oops: Kernel access of bad area, sig: 11 [#1]
> >     […]
> 
> I do intend to do further analysis at some point over the next few days
> to see if I can figure out exactly why the commit that I mentioned
> above fixes the issue; then we can look into what we should do about it
> in the kernel sources.

Sorry for taking so long to get back to this. For me, commit
d79976918852 ("powerpc/64: Add UADDR64 relocation support") resolves
this for ld.lld 13.x. I have started a separate thread about whether or
not this commit is suitable for stable, specifically 5.17 and 5.15:

https://lore.kernel.org/Yl8pNxSGUgeHZ1FT@dev-arch.thelio-3990X/
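
As far as I understand it (this is my reading of the commit title, not something verified in this thread), a pointer member like .kind needs a runtime relocation in a relocatable kernel, while an integer member like .maxtype is stored in the image as a plain constant. A relocation type that the boot-time relocator does not process would leave the pointer slot unpatched, which would match maxtype being 20 while kind was NULL. A toy userspace sketch of that idea; every name in it is hypothetical and it is not the kernel's relocation code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation kinds, named after the ELF types for readability. */
enum reloc_type { RELOC_RELATIVE, RELOC_UADDR64 };

struct reloc {
	size_t slot;		/* index of the 64-bit slot to patch */
	enum reloc_type type;	/* which kind of fixup the linker emitted */
	uint64_t addend;	/* link-time offset of the target */
};

/*
 * Patch every slot whose relocation type is handled and return how many
 * entries were skipped. With handle_uaddr64 == 0 this models a relocator
 * that only knows about RELATIVE entries.
 */
static size_t apply_relocs(uint64_t *image, const struct reloc *r, size_t n,
			   uint64_t base, int handle_uaddr64)
{
	size_t skipped = 0;

	assert(image != NULL && r != NULL);

	for (size_t i = 0; i < n; i++) {
		if (r[i].type == RELOC_RELATIVE ||
		    (r[i].type == RELOC_UADDR64 && handle_uaddr64))
			image[r[i].slot] = base + r[i].addend;
		else
			skipped++;	/* slot keeps its unrelocated value */
	}
	return skipped;
}
```

With an image of { 0, 20 } (slot 0 standing in for .kind, slot 1 for .maxtype) and a single RELOC_UADDR64 entry for slot 0, calling apply_relocs() with handle_uaddr64 set to 0 leaves slot 0 at zero and returns 1; with it set to 1, slot 0 becomes base + addend and nothing is skipped.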

Cheers,
Nathan


end of thread, other threads:[~2022-04-19 21:34 UTC | newest]

Thread overview: 17+ messages
2022-01-25 19:13 BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) Paul Menzel
2022-01-26  9:47 ` Zhouyi Zhou
2022-01-29  2:23 ` Zhouyi Zhou
2022-01-29 16:52   ` Paul Menzel
2022-01-30  0:21     ` Zhouyi Zhou
2022-01-30  8:19       ` Paul Menzel
2022-01-30 13:24         ` Zhouyi Zhou
2022-01-30 17:44           ` Paul E. McKenney
2022-01-31  1:08             ` Zhouyi Zhou
2022-02-01 17:50               ` Paul E. McKenney
2022-02-02  2:39                 ` Zhouyi Zhou
2022-02-08 20:10                   ` Zhouyi Zhou
     [not found]           ` <f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de>
2022-02-17  1:16             ` Nathan Chancellor
2022-02-21 11:17               ` Paul Menzel
2022-02-21 15:29                 ` Nathan Chancellor
2022-02-21 17:33                   ` Paul Menzel
2022-04-19 21:34                   ` Nathan Chancellor
