* BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
From: Paul Menzel @ 2022-01-25 19:13 UTC
To: Paul E. McKenney, Josh Triplett
Cc: rcu, LKML, David S. Miller, Jakub Kicinski, netdev

Dear Linux folks,


I do not know if this is an rcutorture issue, or if rcutorture found a
bug with `rtmsg_ifinfo_build_skb()`.


Building Linux 5.17-rc1+ (dd81e1c7d5fb) under Ubuntu 21.04 with

    CONFIG_TORTURE_TEST=y
    CONFIG_RCU_TORTURE_TEST=y

and

    $ clang --version
    Ubuntu clang version 12.0.0-3ubuntu1~21.04.2
    Target: powerpc64le-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin
    $ make -j100 LLVM=1 LLVM_IAS=0 bindeb-pkg

and booting it on an IBM S822LC, Linux panicked with a NULL pointer
dereference, the watchdog rebooted the machine, and I found the messages
below in `/sys/fs/pstore/dmesg-nvram-2.enc.z`.

```
[ T1] Key type id_legacy registered
[ T1] SGI XFS with ACLs, security attributes, no debug enabled
[ T1] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248)
[ T1] io scheduler mq-deadline registered
[ T1] io scheduler kyber registered
[ T198] cryptomgr_test (198) used greatest stack depth: 13536 bytes left
[ T1] pci 0021:10:00.0: enabling device (0141 -> 0143)
[ T1] Using unsupported 1024x768 (null) at 3fe882010000, depth=32, pitch=4096
[ T1] Console: switching to colour frame buffer device 128x48
[ T1] fb0: Open Firmware frame buffer device on /pciex@3fffe41100000/pci@0/pci@0/pci@b/pci@0/vga@0
[ T1] hvc0: raw protocol on /ibm,opal/consoles/serial@0 (boot console)
[ T1] hvc0: No interrupts property, using OPAL event
[ T1] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ T1] Non-volatile memory driver v1.3
[ T1] brd: module loaded
[ T1] loop: module loaded
[ T1] ipr: IBM Power RAID SCSI Device Driver version: 2.6.4 (March 14, 2017)
[ T1] ahci 0021:0e:00.0: version 3.0
[ T1] ahci 0021:0e:00.0: enabling device (0141 -> 0143)
[ T1] ahci 0021:0e:00.0: AHCI 0001.0000 32 slots 4 ports 6 Gbps 0xf impl SATA mode
[ T1] ahci 0021:0e:00.0: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs
[ T1] scsi host0: ahci
[ T1] scsi host1: ahci
[ T1] scsi host2: ahci
[ T1] scsi host3: ahci
[ T1] ata1: SATA max UDMA/133 abar m2048@0x3fe881000000 port 0x3fe881000100 irq 39
[ T1] ata2: SATA max UDMA/133 abar m2048@0x3fe881000000 port 0x3fe881000180 irq 39
[ T1] ata3: SATA max UDMA/133 abar m2048@0x3fe881000000 port 0x3fe881000200 irq 39
[ T1] ata4: SATA max UDMA/133 abar m2048@0x3fe881000000 port 0x3fe881000280 irq 39
[ T1] e100: Intel(R) PRO/100 Network Driver
[ T1] e100: Copyright(c) 1999-2006 Intel Corporation
[ T1] e1000: Intel(R) PRO/1000 Network Driver
[ T1] e1000: Copyright (c) 1999-2006 Intel Corporation.
[ T1] e1000e: Intel(R) PRO/1000 Network Driver
[ T1] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ T1] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ T1] ehci-pci: EHCI PCI platform driver
[ T1] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ T1] ohci-pci: OHCI PCI platform driver
[ T1] rtc-opal opal-rtc: registered as rtc0
[ T1] rtc-opal opal-rtc: setting system clock to 2022-01-24T18:21:45 UTC (1643048505)
[ T1] i2c_dev: i2c /dev entries driver
[ T1] device-mapper: uevent: version 1.0.3
[ T1] device-mapper: ioctl: 4.45.0-ioctl (2021-03-22) initialised: dm-devel@redhat.com
[ T1] powernv-cpufreq: cpufreq pstate min 0xffffffd5 nominal 0xffffffef max 0x0
[ T1] powernv-cpufreq: Workload Optimized Frequency is disabled in the platform
[ T1] powernv_idle_driver registered
[ T1] nx_compress_powernv: coprocessor found on chip 0, CT 3 CI 1
[ T1] nx_compress_powernv: coprocessor found on chip 8, CT 3 CI 9
[ T1] usbcore: registered new interface driver usbhid
[ T1] usbhid: USB HID core driver
[ T1] ipip: IPv4 and MPLS over IPv4 tunneling driver
[ T1] NET: Registered PF_INET6 protocol family
[ T1] Segment Routing with IPv6
[ T1] In-situ OAM (IOAM) with IPv6
[ T1] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[ T1] BUG: Kernel NULL pointer dereference on write at 0x00000000
[ T1] Faulting instruction address: 0xc0000000008e2400
[ T1] Oops: Kernel access of bad area, sig: 11 [#1]
[ T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
[ T1] Modules linked in:
[ T1] CPU: 11 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc1-00032-gdd81e1c7d5fb #29
[ T1] NIP: c0000000008e2400 LR: c000000000d65db0 CTR: c000000000f0bb60
[ T1] REGS: c0000000125033e0 TRAP: 0380 Not tainted (5.17.0-rc1-00032-gdd81e1c7d5fb)
[ T1] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42800c40 XER: 00000000
[ T1] CFAR: c000000000d65dac IRQMASK: 0
[ T1] GPR00: c000000000d65b40 c000000012503680 c00000000290c600 0000000000000000
[ T1] GPR04: ffffffffffffffff 00000000ffffffff 0000000000000000 0000000000000cc0
[ T1] GPR08: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000001
[ T1] GPR12: 0000000000000000 c000007fffff6c00 c000000000012478 0000000000000000
[ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ T1] GPR20: 0000000000000000 c000000002810100 0000000000000cc0 0000000000000000
[ T1] GPR24: 0000000000000010 c00000000294cf50 0000000000000000 0000000000000000
[ T1] GPR28: 0000000000000000 c00000001ec61000 0000000000000000 c000000012503680
[ T1] NIP [c0000000008e2400] strlen+0x10/0x30
[ T1] LR [c000000000d65db0] if_nlmsg_size+0x150/0x360
[ T1] Call Trace:
[ T1] [c000000012503680] [c0000000125036c0] 0xc0000000125036c0 (unreliable)
[ T1] [c0000000125036f0] [c000000000d65b40] rtmsg_ifinfo_build_skb+0x80/0x1a0
[ T1] [c0000000125037b0] [c000000000d66be0] rtmsg_ifinfo+0x70/0xd0
[ T1] [c000000012503800] [c000000000d4de50] register_netdevice+0x690/0x770
[ T1] [c000000012503890] [c000000000d4e2bc] register_netdev+0x4c/0x80
[ T1] [c0000000125038c0] [c000000000f4784c] sit_init_net+0x10c/0x1d0
[ T1] [c000000012503910] [c000000000d33c0c] ops_init+0x13c/0x1b0
[ T1] [c000000012503970] [c000000000d331bc] register_pernet_operations+0xec/0x1e0
[ T1] [c0000000125039d0] [c000000000d33440] register_pernet_device+0x60/0xd0
[ T1] [c000000012503a20] [c000000002085478] sit_init+0x54/0x160
[ T1] [c000000012503ab0] [c000000000011ba8] do_one_initcall+0xd8/0x3b0
[ T1] [c000000012503c70] [c000000002006064] do_initcall_level+0xe4/0x1c4
[ T1] [c000000012503cc0] [c000000002005f20] do_initcalls+0x84/0xe4
[ T1] [c000000012503d40] [c000000002005c7c] kernel_init_freeable+0x160/0x1ec
[ T1] [c000000012503da0] [c0000000000124ac] kernel_init+0x3c/0x270
[ T1] [c000000012503e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
[ T1] Instruction dump:
[ T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
[ T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
[ T1] ---[ end trace 0000000000000000 ]---
[ T1]
[ T206] ata4: SATA link down (SStatus 0 SControl 300)
[ T204] ata3: SATA link down (SStatus 0 SControl 300)
[ T200] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ T200] ata1.00: ATA-10: ST1000NX0313 00LY266 00LY265IBM, BE33, max UDMA/133
[ T200] ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 32), AA
[ T200] ata1.00: configured for UDMA/133
[ T7] scsi 0:0:0:0: Direct-Access ATA ST1000NX0313 BE33 PQ: 0 ANSI: 5
[ T7] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ T209] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[ T209] sd 0:0:0:0: [sda] 4096-byte physical blocks
[ T209] sd 0:0:0:0: [sda] Write Protect is off
[ T209] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ T209] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ T209] sda: sda1 sda2
[ T209] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
```


Kind regards,

Paul
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
From: Zhouyi Zhou @ 2022-01-26 9:47 UTC
To: Paul Menzel
Cc: Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev

Dear Paul,

I am also very interested in RCU tests ;-)

First of all, thank you for your email, which taught me how to build a
kernel deb package using clang ;-)

I built and tested linux-next on x86_64, but the kernel does not panic.
I guess our kernel configurations may be different. These are my steps:

1. git clone https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/next/linux-next.git
2. git describe: next-20220125
3. make menuconfig CC=clang-12 (CONFIG_TORTURE_TEST=y CONFIG_RCU_TORTURE_TEST=y)
   My configuration file is uploaded to my VPS cloud server:
   http://154.223.142.244/config-5.17.0-rc1-next-20220125+
4. make CC=clang-12 -j 16 bindeb-pkg
5. install the kernel, reboot
6. the kernel does not panic (it has been running for 30 minutes by now)

I hope I can be more helpful ;-)

Thanks
Sincerely
Zhouyi
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
From: Zhouyi Zhou @ 2022-01-29 2:23 UTC
To: Paul Menzel
Cc: Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev

Dear Paul,

I don't have an IBM machine, so I tried to analyze the problem using my
x86_64 KVM virtual machine, but I can't reproduce the bug there.

I see that the panic is caused by the registration of the sit device. (A sit
device is a type of virtual network device that takes IPv6 traffic,
encapsulates/decapsulates it in IPv4 packets, and sends/receives it over
the IPv4 Internet to another host.)

The sit device is registered in the function sit_init_net:

1895 static int __net_init sit_init_net(struct net *net)
1896 {
1897     struct sit_net *sitn = net_generic(net, sit_net_id);
1898     struct ip_tunnel *t;
1899     int err;
1900
1901     sitn->tunnels[0] = sitn->tunnels_wc;
1902     sitn->tunnels[1] = sitn->tunnels_l;
1903     sitn->tunnels[2] = sitn->tunnels_r;
1904     sitn->tunnels[3] = sitn->tunnels_r_l;
1905
1906     if (!net_has_fallback_tunnels(net))
1907         return 0;
1908
1909     sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
1910                                        NET_NAME_UNKNOWN,
1911                                        ipip6_tunnel_setup);
1912     if (!sitn->fb_tunnel_dev) {
1913         err = -ENOMEM;
1914         goto err_alloc_dev;
1915     }
1916     dev_net_set(sitn->fb_tunnel_dev, net);
1917     sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
1918     /* FB netdevice is special: we have one, and only one per netns.
1919      * Allowing to move it to another netns is clearly unsafe.
1920      */
1921     sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
1922
1923     err = register_netdev(sitn->fb_tunnel_dev);

register_netdev on line 1923 calls if_nlmsg_size indirectly.

On the other hand, the function that calls the faulting strlen is if_nlmsg_size:

(gdb) disassemble if_nlmsg_size
Dump of assembler code for function if_nlmsg_size:
   0xffffffff81a0dc20 <+0>:     nopl   0x0(%rax,%rax,1)
   0xffffffff81a0dc25 <+5>:     push   %rbp
   0xffffffff81a0dc26 <+6>:     push   %r15
   0xffffffff81a0dd04 <+228>:   je     0xffffffff81a0de20 <if_nlmsg_size+512>
   0xffffffff81a0dd0a <+234>:   mov    0x10(%rbp),%rdi
   ...
=> 0xffffffff81a0dd0e <+238>:   callq  0xffffffff817532d0 <strlen>
   0xffffffff81a0dd13 <+243>:   add    $0x10,%eax
   0xffffffff81a0dd16 <+246>:   movslq %eax,%r12

and the C code for 0xffffffff81a0dd0e is the following (line 524):

515 static size_t rtnl_link_get_size(const struct net_device *dev)
516 {
517     const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
518     size_t size;
519
520     if (!ops)
521         return 0;
522
523     size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
524            nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */

But ops is assigned the address of sit_link_ops in sit_init_net on line 1917,
so I guess something must have happened between the two calls.

Do we have KASAN on the IBM machine? Would KASAN help us find out what
happened in between?

I hope I can be of more help.

Thanks
Sincerely
Zhouyi

On Wed, Jan 26, 2022 at 3:24 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> Building Linux 5.17-rc1+ (dd81e1c7d5fb) under Ubuntu 21.04 with
>
>      CONFIG_TORTURE_TEST=y
>      CONFIG_RCU_TORTURE_TEST=y
>
> and
>
>      $ clang --version
>      Ubuntu clang version 12.0.0-3ubuntu1~21.04.2
>      Target: powerpc64le-unknown-linux-gnu
>      Thread model: posix
>      InstalledDir: /usr/bin
>      $ make -j100 LLVM=1 LLVM_IAS=0 bindeb-pkg

I built the kernel with LLVM/Clang as well.
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) 2022-01-29 2:23 ` Zhouyi Zhou @ 2022-01-29 16:52 ` Paul Menzel 2022-01-30 0:21 ` Zhouyi Zhou 0 siblings, 1 reply; 17+ messages in thread From: Paul Menzel @ 2022-01-29 16:52 UTC (permalink / raw) To: Zhouyi Zhou Cc: Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev Dear Zhouyi, Thank you for taking the time. Am 29.01.22 um 03:23 schrieb Zhouyi Zhou: > I don't have an IBM machine, but I tried to analyze the problem using > my x86_64 kvm virtual machine, I can't reproduce the bug using my > x86_64 kvm virtual machine. No idea, if it’s architecture specific. > I saw the panic is caused by registration of sit device (A sit device > is a type of virtual network device that takes our IPv6 traffic, > encapsulates/decapsulates it in IPv4 packets, and sends/receives it > over the IPv4 Internet to another host) > > sit device is registered in function sit_init_net: > 1895 static int __net_init sit_init_net(struct net *net) > 1896 { > 1897 struct sit_net *sitn = net_generic(net, sit_net_id); > 1898 struct ip_tunnel *t; > 1899 int err; > 1900 > 1901 sitn->tunnels[0] = sitn->tunnels_wc; > 1902 sitn->tunnels[1] = sitn->tunnels_l; > 1903 sitn->tunnels[2] = sitn->tunnels_r; > 1904 sitn->tunnels[3] = sitn->tunnels_r_l; > 1905 > 1906 if (!net_has_fallback_tunnels(net)) > 1907 return 0; > 1908 > 1909 sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0", > 1910 NET_NAME_UNKNOWN, > 1911 ipip6_tunnel_setup); > 1912 if (!sitn->fb_tunnel_dev) { > 1913 err = -ENOMEM; > 1914 goto err_alloc_dev; > 1915 } > 1916 dev_net_set(sitn->fb_tunnel_dev, net); > 1917 sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops; > 1918 /* FB netdevice is special: we have one, and only one per netns. > 1919 * Allowing to move it to another netns is clearly unsafe. 
> 1920          */
> 1921         sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> 1922
> 1923         err = register_netdev(sitn->fb_tunnel_dev);
> register_netdev on line 1923 will call if_nlmsg_size indirectly.
>
> On the other hand, the function that calls the paniced strlen is if_nlmsg_size:
> (gdb) disassemble if_nlmsg_size
> Dump of assembler code for function if_nlmsg_size:
>     0xffffffff81a0dc20 <+0>:    nopl   0x0(%rax,%rax,1)
>     0xffffffff81a0dc25 <+5>:    push   %rbp
>     0xffffffff81a0dc26 <+6>:    push   %r15
>     0xffffffff81a0dd04 <+228>:  je     0xffffffff81a0de20 <if_nlmsg_size+512>
>     0xffffffff81a0dd0a <+234>:  mov    0x10(%rbp),%rdi
>     ...
>  => 0xffffffff81a0dd0e <+238>:  callq  0xffffffff817532d0 <strlen>
>     0xffffffff81a0dd13 <+243>:  add    $0x10,%eax
>     0xffffffff81a0dd16 <+246>:  movslq %eax,%r12

Excuse my ignorance, would that look the same for ppc64le?
Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
current build (without rcutorture) I have the line below, where strlen
shows up.

     (gdb) disassemble if_nlmsg_size
     […]
     0xc000000000f7f82c <+332>:   bl      0xc000000000a10e30 <strlen>
     […]

> and the C code for 0xffffffff81a0dd0e is following (line 524):
> 515 static size_t rtnl_link_get_size(const struct net_device *dev)
> 516 {
> 517         const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> 518         size_t size;
> 519
> 520         if (!ops)
> 521                 return 0;
> 522
> 523         size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> 524                nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */

How do I connect the disassembly output with the corresponding source line?

> But ops is assigned the value of sit_link_ops in function sit_init_net
> line 1917, so I guess something must happened between the calls.
>
> Do we have KASAN in IBM machine? would KASAN help us find out what
> happened in between?

Unfortunately, KASAN is not supported on the Power machine I have, as
far as I can see.
From `arch/powerpc/Kconfig`:

         select HAVE_ARCH_KASAN                  if PPC32 && PPC_PAGE_SHIFT <= 14
         select HAVE_ARCH_KASAN_VMALLOC          if PPC32 && PPC_PAGE_SHIFT <= 14

> Hope I can be of more helpful.

Some distributions support multi-arch, so they easily allow
cross-compiling for different architectures.


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) 2022-01-29 16:52 ` Paul Menzel @ 2022-01-30 0:21 ` Zhouyi Zhou 2022-01-30 8:19 ` Paul Menzel 0 siblings, 1 reply; 17+ messages in thread From: Zhouyi Zhou @ 2022-01-30 0:21 UTC (permalink / raw) To: Paul Menzel Cc: Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev Dear Paul, Thank you for your instructions, I learned a lot from this process. On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote: > > Dear Zhouyi, > > > Thank you for taking the time. > > > Am 29.01.22 um 03:23 schrieb Zhouyi Zhou: > > > I don't have an IBM machine, but I tried to analyze the problem using > > my x86_64 kvm virtual machine, I can't reproduce the bug using my > > x86_64 kvm virtual machine. > > No idea, if it’s architecture specific. > > > I saw the panic is caused by registration of sit device (A sit device > > is a type of virtual network device that takes our IPv6 traffic, > > encapsulates/decapsulates it in IPv4 packets, and sends/receives it > > over the IPv4 Internet to another host) > > > > sit device is registered in function sit_init_net: > > 1895 static int __net_init sit_init_net(struct net *net) > > 1896 { > > 1897 struct sit_net *sitn = net_generic(net, sit_net_id); > > 1898 struct ip_tunnel *t; > > 1899 int err; > > 1900 > > 1901 sitn->tunnels[0] = sitn->tunnels_wc; > > 1902 sitn->tunnels[1] = sitn->tunnels_l; > > 1903 sitn->tunnels[2] = sitn->tunnels_r; > > 1904 sitn->tunnels[3] = sitn->tunnels_r_l; > > 1905 > > 1906 if (!net_has_fallback_tunnels(net)) > > 1907 return 0; > > 1908 > > 1909 sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0", > > 1910 NET_NAME_UNKNOWN, > > 1911 ipip6_tunnel_setup); > > 1912 if (!sitn->fb_tunnel_dev) { > > 1913 err = -ENOMEM; > > 1914 goto err_alloc_dev; > > 1915 } > > 1916 dev_net_set(sitn->fb_tunnel_dev, net); > > 1917 sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops; > > 1918 
/* FB netdevice is special: we have one, and only one per netns. > > 1919 * Allowing to move it to another netns is clearly unsafe. > > 1920 */ > > 1921 sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; > > 1922 > > 1923 err = register_netdev(sitn->fb_tunnel_dev); > > register_netdev on line 1923 will call if_nlmsg_size indirectly. > > > > On the other hand, the function that calls the paniced strlen is if_nlmsg_size: > > (gdb) disassemble if_nlmsg_size > > Dump of assembler code for function if_nlmsg_size: > > 0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1) > > 0xffffffff81a0dc25 <+5>: push %rbp > > 0xffffffff81a0dc26 <+6>: push %r15 > > 0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512> > > 0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi > > ... > > => 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen> > > 0xffffffff81a0dd13 <+243>: add $0x10,%eax > > 0xffffffff81a0dd16 <+246>: movslq %eax,%r12 > > Excuse my ignorance, would that look the same for ppc64le? > Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a > current build (without rcutorture) I have the line below, where strlen > shows up. > > (gdb) disassemble if_nlmsg_size > […] > 0xc000000000f7f82c <+332>: bl 0xc000000000a10e30 <strlen> > […] > > > and the C code for 0xffffffff81a0dd0e is following (line 524): > > 515 static size_t rtnl_link_get_size(const struct net_device *dev) > > 516 { > > 517 const struct rtnl_link_ops *ops = dev->rtnl_link_ops; > > 518 size_t size; > > 519 > > 520 if (!ops) > > 521 return 0; > > 522 > > 523 size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */ > > 524 nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */ > > How do I connect the disassemby output with the corresponding line? I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel for powerpc64le in my Ubuntu 20.04 x86_64. gdb-multiarch ./vmlinux (gdb)disassemble if_nlmsg_size [...] 
0xc00000000191bf40 <+112>:   bl      0xc000000001c28ad0 <strlen>
[...]
(gdb) break *0xc00000000191bf40
Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.

But at include/net/netlink.h:1112, I can't find the call to strlen:
1110 static inline int nla_total_size(int payload)
1111 {
1112         return NLA_ALIGN(nla_attr_size(payload));
1113 }
I guess this may be because the compiler encoded the debug information
incorrectly.
>
> > But ops is assigned the value of sit_link_ops in function sit_init_net
> > line 1917, so I guess something must happened between the calls.
> >
> > Do we have KASAN in IBM machine? would KASAN help us find out what
> > happened in between?
>
> Unfortunately, KASAN is not support on Power, I have, as far as I can
> see. From `arch/powerpc/Kconfig`:
>
>          select HAVE_ARCH_KASAN                  if PPC32 && PPC_PAGE_SHIFT <= 14
>          select HAVE_ARCH_KASAN_VMALLOC          if PPC32 && PPC_PAGE_SHIFT <= 14
>
Yes, agreed. I invoked "make menuconfig ARCH=powerpc
CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
16" and can't find KASAN under Memory Debugging. I guess we should find
the bug by bisecting instead.

> > Hope I can be of more helpful.
>
> Some distributions support multi-arch, so they easily allow
> crosscompiling for different architectures.
I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross-compile the kernel
for powerpc64le on my Ubuntu 20.04 x86_64 machine. But I can't boot the
compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
to explore it.

Kind regards
Zhouyi
>
>
> Kind regards,
>
> Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) 2022-01-30 0:21 ` Zhouyi Zhou @ 2022-01-30 8:19 ` Paul Menzel 2022-01-30 13:24 ` Zhouyi Zhou 0 siblings, 1 reply; 17+ messages in thread From: Paul Menzel @ 2022-01-30 8:19 UTC (permalink / raw) To: Zhouyi Zhou Cc: Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev Dear Zhouyi, Am 30.01.22 um 01:21 schrieb Zhouyi Zhou: > Thank you for your instructions, I learned a lot from this process. Same on my end. > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote: >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou: >> >>> I don't have an IBM machine, but I tried to analyze the problem using >>> my x86_64 kvm virtual machine, I can't reproduce the bug using my >>> x86_64 kvm virtual machine. >> >> No idea, if it’s architecture specific. >> >>> I saw the panic is caused by registration of sit device (A sit device >>> is a type of virtual network device that takes our IPv6 traffic, >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it >>> over the IPv4 Internet to another host) >>> >>> sit device is registered in function sit_init_net: >>> 1895 static int __net_init sit_init_net(struct net *net) >>> 1896 { >>> 1897 struct sit_net *sitn = net_generic(net, sit_net_id); >>> 1898 struct ip_tunnel *t; >>> 1899 int err; >>> 1900 >>> 1901 sitn->tunnels[0] = sitn->tunnels_wc; >>> 1902 sitn->tunnels[1] = sitn->tunnels_l; >>> 1903 sitn->tunnels[2] = sitn->tunnels_r; >>> 1904 sitn->tunnels[3] = sitn->tunnels_r_l; >>> 1905 >>> 1906 if (!net_has_fallback_tunnels(net)) >>> 1907 return 0; >>> 1908 >>> 1909 sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0", >>> 1910 NET_NAME_UNKNOWN, >>> 1911 ipip6_tunnel_setup); >>> 1912 if (!sitn->fb_tunnel_dev) { >>> 1913 err = -ENOMEM; >>> 1914 goto err_alloc_dev; >>> 1915 } >>> 1916 dev_net_set(sitn->fb_tunnel_dev, net); >>> 1917 sitn->fb_tunnel_dev->rtnl_link_ops = 
&sit_link_ops; >>> 1918 /* FB netdevice is special: we have one, and only one per netns. >>> 1919 * Allowing to move it to another netns is clearly unsafe. >>> 1920 */ >>> 1921 sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; >>> 1922 >>> 1923 err = register_netdev(sitn->fb_tunnel_dev); >>> register_netdev on line 1923 will call if_nlmsg_size indirectly. >>> >>> On the other hand, the function that calls the paniced strlen is if_nlmsg_size: >>> (gdb) disassemble if_nlmsg_size >>> Dump of assembler code for function if_nlmsg_size: >>> 0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1) >>> 0xffffffff81a0dc25 <+5>: push %rbp >>> 0xffffffff81a0dc26 <+6>: push %r15 >>> 0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512> >>> 0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi >>> ... >>> => 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen> >>> 0xffffffff81a0dd13 <+243>: add $0x10,%eax >>> 0xffffffff81a0dd16 <+246>: movslq %eax,%r12 >> >> Excuse my ignorance, would that look the same for ppc64le? >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a >> current build (without rcutorture) I have the line below, where strlen >> shows up. >> >> (gdb) disassemble if_nlmsg_size >> […] >> 0xc000000000f7f82c <+332>: bl 0xc000000000a10e30 <strlen> >> […] >> >>> and the C code for 0xffffffff81a0dd0e is following (line 524): >>> 515 static size_t rtnl_link_get_size(const struct net_device *dev) >>> 516 { >>> 517 const struct rtnl_link_ops *ops = dev->rtnl_link_ops; >>> 518 size_t size; >>> 519 >>> 520 if (!ops) >>> 521 return 0; >>> 522 >>> 523 size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */ >>> 524 nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */ >> >> How do I connect the disassemby output with the corresponding line? > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9 > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel > for powerpc64le in my Ubuntu 20.04 x86_64. 
>
> gdb-multiarch ./vmlinux
> (gdb)disassemble if_nlmsg_size
> [...]
> 0xc00000000191bf40 <+112>:   bl      0xc000000001c28ad0 <strlen>
> [...]
> (gdb) break *0xc00000000191bf40
> Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
>
> But in include/net/netlink.h:1112, I can't find the call to strlen
> 1110static inline int nla_total_size(int payload)
> 1111{
> 1112         return NLA_ALIGN(nla_attr_size(payload));
> 1113}
> This may be due to the compiler wrongly encode the debug information, I guess.

`rtnl_link_get_size()` contains:

             size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
                    nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */

Is that inlined(?) and the code at fault?

>>> But ops is assigned the value of sit_link_ops in function sit_init_net
>>> line 1917, so I guess something must happened between the calls.
>>>
>>> Do we have KASAN in IBM machine? would KASAN help us find out what
>>> happened in between?
>>
>> Unfortunately, KASAN is not support on Power, I have, as far as I can
>> see. From `arch/powerpc/Kconfig`:
>>
>>           select HAVE_ARCH_KASAN                  if PPC32 && PPC_PAGE_SHIFT <= 14
>>           select HAVE_ARCH_KASAN_VMALLOC          if PPC32 && PPC_PAGE_SHIFT <= 14
>>
> en, agree, I invoke "make menuconfig ARCH=powerpc
> CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> 16", I can't find KASAN under Memory Debugging, I guess we should find
> the bug by bisecting instead.

I do not know whether it is a regression, as this was the first time I
tried to run a Linux kernel built with rcutorture on real hardware.

>>> Hope I can be of more helpful.
>>
>> Some distributions support multi-arch, so they easily allow
>> crosscompiling for different architectures.
> I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> to explore it.

Oh, that does not sound good. But I have not tried that in a long time
either. It’s a separate issue, but maybe some of the PPC
maintainers/folks could help.


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) 2022-01-30 8:19 ` Paul Menzel @ 2022-01-30 13:24 ` Zhouyi Zhou 2022-01-30 17:44 ` Paul E. McKenney [not found] ` <f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de> 0 siblings, 2 replies; 17+ messages in thread From: Zhouyi Zhou @ 2022-01-30 13:24 UTC (permalink / raw) To: Paul Menzel Cc: Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev Dear Paul On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote: > > Dear Zhouyi, > > > Am 30.01.22 um 01:21 schrieb Zhouyi Zhou: > > > Thank you for your instructions, I learned a lot from this process. > > Same on my end. > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote: > > >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou: > >> > >>> I don't have an IBM machine, but I tried to analyze the problem using > >>> my x86_64 kvm virtual machine, I can't reproduce the bug using my > >>> x86_64 kvm virtual machine. > >> > >> No idea, if it’s architecture specific. 
> >> > >>> I saw the panic is caused by registration of sit device (A sit device > >>> is a type of virtual network device that takes our IPv6 traffic, > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it > >>> over the IPv4 Internet to another host) > >>> > >>> sit device is registered in function sit_init_net: > >>> 1895 static int __net_init sit_init_net(struct net *net) > >>> 1896 { > >>> 1897 struct sit_net *sitn = net_generic(net, sit_net_id); > >>> 1898 struct ip_tunnel *t; > >>> 1899 int err; > >>> 1900 > >>> 1901 sitn->tunnels[0] = sitn->tunnels_wc; > >>> 1902 sitn->tunnels[1] = sitn->tunnels_l; > >>> 1903 sitn->tunnels[2] = sitn->tunnels_r; > >>> 1904 sitn->tunnels[3] = sitn->tunnels_r_l; > >>> 1905 > >>> 1906 if (!net_has_fallback_tunnels(net)) > >>> 1907 return 0; > >>> 1908 > >>> 1909 sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0", > >>> 1910 NET_NAME_UNKNOWN, > >>> 1911 ipip6_tunnel_setup); > >>> 1912 if (!sitn->fb_tunnel_dev) { > >>> 1913 err = -ENOMEM; > >>> 1914 goto err_alloc_dev; > >>> 1915 } > >>> 1916 dev_net_set(sitn->fb_tunnel_dev, net); > >>> 1917 sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops; > >>> 1918 /* FB netdevice is special: we have one, and only one per netns. > >>> 1919 * Allowing to move it to another netns is clearly unsafe. > >>> 1920 */ > >>> 1921 sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; > >>> 1922 > >>> 1923 err = register_netdev(sitn->fb_tunnel_dev); > >>> register_netdev on line 1923 will call if_nlmsg_size indirectly. 
> >>> > >>> On the other hand, the function that calls the paniced strlen is if_nlmsg_size: > >>> (gdb) disassemble if_nlmsg_size > >>> Dump of assembler code for function if_nlmsg_size: > >>> 0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1) > >>> 0xffffffff81a0dc25 <+5>: push %rbp > >>> 0xffffffff81a0dc26 <+6>: push %r15 > >>> 0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512> > >>> 0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi > >>> ... > >>> => 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen> > >>> 0xffffffff81a0dd13 <+243>: add $0x10,%eax > >>> 0xffffffff81a0dd16 <+246>: movslq %eax,%r12 > >> > >> Excuse my ignorance, would that look the same for ppc64le? > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a > >> current build (without rcutorture) I have the line below, where strlen > >> shows up. > >> > >> (gdb) disassemble if_nlmsg_size > >> […] > >> 0xc000000000f7f82c <+332>: bl 0xc000000000a10e30 <strlen> > >> […] > >> > >>> and the C code for 0xffffffff81a0dd0e is following (line 524): > >>> 515 static size_t rtnl_link_get_size(const struct net_device *dev) > >>> 516 { > >>> 517 const struct rtnl_link_ops *ops = dev->rtnl_link_ops; > >>> 518 size_t size; > >>> 519 > >>> 520 if (!ops) > >>> 521 return 0; > >>> 522 > >>> 523 size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */ > >>> 524 nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */ > >> > >> How do I connect the disassemby output with the corresponding line? > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9 > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel > > for powerpc64le in my Ubuntu 20.04 x86_64. > > > > gdb-multiarch ./vmlinux > > (gdb)disassemble if_nlmsg_size > > [...] > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen> > > [...] > > (gdb) break *0xc00000000191bf40 > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112. 
> >
> > But in include/net/netlink.h:1112, I can't find the call to strlen
> > 1110static inline int nla_total_size(int payload)
> > 1111{
> > 1112         return NLA_ALIGN(nla_attr_size(payload));
> > 1113}
> > This may be due to the compiler wrongly encode the debug information, I guess.
>
> `rtnl_link_get_size()` contains:
>
> >             size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> >                    nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
>
> Is that inlined(?) and the code at fault?
Yes, it is inlined, as the following shows:
(gdb) disassemble if_nlmsg_size
Dump of assembler code for function if_nlmsg_size:
[...]
0xc00000000191bf38 <+104>:   beq     0xc00000000191c1f0 <if_nlmsg_size+800>
0xc00000000191bf3c <+108>:   ld      r3,16(r31)
0xc00000000191bf40 <+112>:   bl      0xc000000001c28ad0 <strlen>
[...]
(gdb)
(gdb) break *0xc00000000191bf40
Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
(gdb) break *0xc00000000191bf38
Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.
The beq at +104 maps to net/core/rtnetlink.c line 520, which is the
`if (!ops)` check in rtnl_link_get_size(), so that function has been
inlined into if_nlmsg_size().
>
> >>> But ops is assigned the value of sit_link_ops in function sit_init_net
> >>> line 1917, so I guess something must happened between the calls.
> >>>
> >>> Do we have KASAN in IBM machine? would KASAN help us find out what
> >>> happened in between?
> >>
> >> Unfortunately, KASAN is not support on Power, I have, as far as I can
> >> see. From `arch/powerpc/Kconfig`:
> >>
> >>           select HAVE_ARCH_KASAN                  if PPC32 && PPC_PAGE_SHIFT <= 14
> >>           select HAVE_ARCH_KASAN_VMALLOC          if PPC32 && PPC_PAGE_SHIFT <= 14
> >>
> > en, agree, I invoke "make menuconfig ARCH=powerpc
> > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > 16", I can't find KASAN under Memory Debugging, I guess we should find
> > the bug by bisecting instead.
>
> I do not know, if it is a regression, as it was the first time I tried
> to run a Linux kernel built with rcutorture on real hardware.
I tried to add some debug statements to the kernel to locate the bug
more accurately. You can try it when you're not busy in the future, or
just ignore it if the following patch does not look very effective ;-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1baab07820f6..969ac7c540cc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
 	 * Prevent userspace races by waiting until the network
 	 * device is fully setup before sending notifications.
 	 */
+	if (dev->rtnl_link_ops)
+		printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+		       dev->rtnl_link_ops->kind, __FUNCTION__);
 	if (!dev->rtnl_link_ops ||
 	    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
 		rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
@@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
 	if (rtnl_lock_killable())
 		return -EINTR;
+	if (dev->rtnl_link_ops)
+		printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+		       dev->rtnl_link_ops->kind, __FUNCTION__);
 	err = register_netdevice(dev);
 	rtnl_unlock();
 	return err;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e476403231f0..e08986ae6238 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
 	if (!ops)
 		return 0;
+	printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", ops,
+	       ops->kind, __FUNCTION__);
 	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
 	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
@@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct net_device *dev)
 static noinline size_t if_nlmsg_size(const struct net_device *dev,
 				     u32 ext_filter_mask)
 {
+	if (dev->rtnl_link_ops)
+		printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+		       dev->rtnl_link_ops->kind, __FUNCTION__);
 	return NLMSG_ALIGN(sizeof(struct ifinfomsg))
 	       + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
 	       + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
@@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
 	struct net *net = dev_net(dev);
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
-
+	if (dev->rtnl_link_ops)
+		printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+		       dev->rtnl_link_ops->kind, __FUNCTION__);
 	skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
 	if (skb == NULL)
 		goto errout;
@@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct net_device *dev,
 	if (dev->reg_state != NETREG_REGISTERED)
 		return;
-
+	if (dev->rtnl_link_ops)
+		printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+		       dev->rtnl_link_ops->kind, __FUNCTION__);
 	skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
 				     new_ifindex);
 	if (skb)
@@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct net_device *dev,
 void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
 		  gfp_t flags)
 {
+	if (dev->rtnl_link_ops)
+		printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
+		       dev->rtnl_link_ops->kind, __FUNCTION__);
 	rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
 			   NULL, 0);
 }
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index c0b138c20992..fa5b2725811c 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
 	 * Allowing to move it to another netns is clearly unsafe.
 	 */
 	sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
-
+	printk(KERN_INFO "%px IFLA_INFO_KIND %s %s\n",
+	       sitn->fb_tunnel_dev->rtnl_link_ops,
+	       sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__);
 	err = register_netdev(sitn->fb_tunnel_dev);
 	if (err)
 		goto err_reg_dev;
> >
> >>> Hope I can be of more helpful.
> >>
> >> Some distributions support multi-arch, so they easily allow
> >> crosscompiling for different architectures.
> > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > to explore it.
>
> Oh, that does not sound good. But I have not tried that in a long time
> either. It’s a separate issue, but maybe some of the PPC
> maintainers/folks could help.
I will do further research on this later.

Thanks for your time
Kind regards
Zhouyi
>
>
> Kind regards,
>
> Paul

^ permalink raw reply related	[flat|nested] 17+ messages in thread
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) 2022-01-30 13:24 ` Zhouyi Zhou @ 2022-01-30 17:44 ` Paul E. McKenney 2022-01-31 1:08 ` Zhouyi Zhou [not found] ` <f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de> 1 sibling, 1 reply; 17+ messages in thread From: Paul E. McKenney @ 2022-01-30 17:44 UTC (permalink / raw) To: Zhouyi Zhou Cc: Paul Menzel, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote: > Dear Paul > > On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote: > > > > Dear Zhouyi, > > > > > > Am 30.01.22 um 01:21 schrieb Zhouyi Zhou: > > > > > Thank you for your instructions, I learned a lot from this process. > > > > Same on my end. > > > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote: > > > > >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou: > > >> > > >>> I don't have an IBM machine, but I tried to analyze the problem using > > >>> my x86_64 kvm virtual machine, I can't reproduce the bug using my > > >>> x86_64 kvm virtual machine. > > >> > > >> No idea, if it’s architecture specific. 
> > >> > > >>> I saw the panic is caused by registration of sit device (A sit device > > >>> is a type of virtual network device that takes our IPv6 traffic, > > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it > > >>> over the IPv4 Internet to another host) > > >>> > > >>> sit device is registered in function sit_init_net: > > >>> 1895 static int __net_init sit_init_net(struct net *net) > > >>> 1896 { > > >>> 1897 struct sit_net *sitn = net_generic(net, sit_net_id); > > >>> 1898 struct ip_tunnel *t; > > >>> 1899 int err; > > >>> 1900 > > >>> 1901 sitn->tunnels[0] = sitn->tunnels_wc; > > >>> 1902 sitn->tunnels[1] = sitn->tunnels_l; > > >>> 1903 sitn->tunnels[2] = sitn->tunnels_r; > > >>> 1904 sitn->tunnels[3] = sitn->tunnels_r_l; > > >>> 1905 > > >>> 1906 if (!net_has_fallback_tunnels(net)) > > >>> 1907 return 0; > > >>> 1908 > > >>> 1909 sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0", > > >>> 1910 NET_NAME_UNKNOWN, > > >>> 1911 ipip6_tunnel_setup); > > >>> 1912 if (!sitn->fb_tunnel_dev) { > > >>> 1913 err = -ENOMEM; > > >>> 1914 goto err_alloc_dev; > > >>> 1915 } > > >>> 1916 dev_net_set(sitn->fb_tunnel_dev, net); > > >>> 1917 sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops; > > >>> 1918 /* FB netdevice is special: we have one, and only one per netns. > > >>> 1919 * Allowing to move it to another netns is clearly unsafe. > > >>> 1920 */ > > >>> 1921 sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; > > >>> 1922 > > >>> 1923 err = register_netdev(sitn->fb_tunnel_dev); > > >>> register_netdev on line 1923 will call if_nlmsg_size indirectly. 
> > >>> > > >>> On the other hand, the function that calls the paniced strlen is if_nlmsg_size: > > >>> (gdb) disassemble if_nlmsg_size > > >>> Dump of assembler code for function if_nlmsg_size: > > >>> 0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1) > > >>> 0xffffffff81a0dc25 <+5>: push %rbp > > >>> 0xffffffff81a0dc26 <+6>: push %r15 > > >>> 0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512> > > >>> 0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi > > >>> ... > > >>> => 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen> > > >>> 0xffffffff81a0dd13 <+243>: add $0x10,%eax > > >>> 0xffffffff81a0dd16 <+246>: movslq %eax,%r12 > > >> > > >> Excuse my ignorance, would that look the same for ppc64le? > > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a > > >> current build (without rcutorture) I have the line below, where strlen > > >> shows up. > > >> > > >> (gdb) disassemble if_nlmsg_size > > >> […] > > >> 0xc000000000f7f82c <+332>: bl 0xc000000000a10e30 <strlen> > > >> […] > > >> > > >>> and the C code for 0xffffffff81a0dd0e is following (line 524): > > >>> 515 static size_t rtnl_link_get_size(const struct net_device *dev) > > >>> 516 { > > >>> 517 const struct rtnl_link_ops *ops = dev->rtnl_link_ops; > > >>> 518 size_t size; > > >>> 519 > > >>> 520 if (!ops) > > >>> 521 return 0; > > >>> 522 > > >>> 523 size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */ > > >>> 524 nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */ > > >> > > >> How do I connect the disassemby output with the corresponding line? > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9 > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel > > > for powerpc64le in my Ubuntu 20.04 x86_64. > > > > > > gdb-multiarch ./vmlinux > > > (gdb)disassemble if_nlmsg_size > > > [...] > > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen> > > > [...] 
> > > (gdb) break *0xc00000000191bf40 > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112. > > > > > > But in include/net/netlink.h:1112, I can't find the call to strlen > > > 1110static inline int nla_total_size(int payload) > > > 1111{ > > > 1112 return NLA_ALIGN(nla_attr_size(payload)); > > > 1113} > > > This may be due to the compiler wrongly encode the debug information, I guess. > > > > `rtnl_link_get_size()` contains: > > > > size = nla_total_size(sizeof(struct nlattr)) + /* > > IFLA_LINKINFO */ > > nla_total_size(strlen(ops->kind) + 1); /* > > IFLA_INFO_KIND */ > > > > Is that inlined(?) and the code at fault? > Yes, that is inlined! because > (gdb) disassemble if_nlmsg_size > Dump of assembler code for function if_nlmsg_size: > [...] > 0xc00000000191bf38 <+104>: beq 0xc00000000191c1f0 <if_nlmsg_size+800> > 0xc00000000191bf3c <+108>: ld r3,16(r31) > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen> > [...] > (gdb) > (gdb) break *0xc00000000191bf40 > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112. > (gdb) break *0xc00000000191bf38 > Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520. I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not already doing so. That gives gdb a lot more information about things like inlining. Thanx, Paul > > >>> But ops is assigned the value of sit_link_ops in function sit_init_net > > >>> line 1917, so I guess something must happened between the calls. > > >>> > > >>> Do we have KASAN in IBM machine? would KASAN help us find out what > > >>> happened in between? > > >> > > >> Unfortunately, KASAN is not support on Power, I have, as far as I can > > >> see. 
From `arch/powerpc/Kconfig`: > > >> > > >> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14 > > >> select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14 > > >> > > > en, agree, I invoke "make menuconfig ARCH=powerpc > > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j > > > 16", I can't find KASAN under Memory Debugging, I guess we should find > > > the bug by bisecting instead. > > > > I do not know, if it is a regression, as it was the first time I tried > > to run a Linux kernel built with rcutorture on real hardware. > I tried to add some debug statements to the kernel to locate the bug > more accurately, you can try it when you're not busy in the future, > or just ignore it if the following patch looks not very effective ;-) > diff --git a/net/core/dev.c b/net/core/dev.c > index 1baab07820f6..969ac7c540cc 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev) > * Prevent userspace races by waiting until the network > * device is fully setup before sending notifications. 
> */ > + if (dev->rtnl_link_ops) > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > + dev->rtnl_link_ops->kind, __FUNCTION__); > if (!dev->rtnl_link_ops || > dev->rtnl_link_state == RTNL_LINK_INITIALIZED) > rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL); > @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev) > > if (rtnl_lock_killable()) > return -EINTR; > + if (dev->rtnl_link_ops) > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > + dev->rtnl_link_ops->kind, __FUNCTION__); > err = register_netdevice(dev); > rtnl_unlock(); > return err; > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > index e476403231f0..e08986ae6238 100644 > --- a/net/core/rtnetlink.c > +++ b/net/core/rtnetlink.c > @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct > net_device *dev) > if (!ops) > return 0; > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops, > + ops->kind, __FUNCTION__); > size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */ > nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */ > > @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct > net_device *dev) > static noinline size_t if_nlmsg_size(const struct net_device *dev, > u32 ext_filter_mask) > { > + if (dev->rtnl_link_ops) > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > + dev->rtnl_link_ops->kind, __FUNCTION__); > return NLMSG_ALIGN(sizeof(struct ifinfomsg)) > + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */ > + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */ > @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type, > struct net_device *dev, > struct net *net = dev_net(dev); > struct sk_buff *skb; > int err = -ENOBUFS; > - > + if (dev->rtnl_link_ops) > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > + dev->rtnl_link_ops->kind, __FUNCTION__); > skb = nlmsg_new(if_nlmsg_size(dev, 0), flags); > if (skb == NULL) > goto errout; > @@ -3861,7 +3868,9 @@ 
static void rtmsg_ifinfo_event(int type, struct > net_device *dev, > > if (dev->reg_state != NETREG_REGISTERED) > return; > - > + if (dev->rtnl_link_ops) > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > + dev->rtnl_link_ops->kind, __FUNCTION__); > skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid, > new_ifindex); > if (skb) > @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct > net_device *dev, > void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change, > gfp_t flags) > { > + if (dev->rtnl_link_ops) > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > + dev->rtnl_link_ops->kind, __FUNCTION__); > rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags, > NULL, 0); > } > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c > index c0b138c20992..fa5b2725811c 100644 > --- a/net/ipv6/sit.c > +++ b/net/ipv6/sit.c > @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net) > * Allowing to move it to another netns is clearly unsafe. > */ > sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; > - > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", > + sitn->fb_tunnel_dev->rtnl_link_ops, > + sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__); > err = register_netdev(sitn->fb_tunnel_dev); > if (err) > goto err_reg_dev; > > > > >>> Hope I can be of more helpful. > > >> > > >> Some distributions support multi-arch, so they easily allow > > >> crosscompiling for different architectures. > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9 > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel > > > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the > > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp > > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue > > > to explore it. > > > > Oh, that does not sound good. But I have not tried that in a long time > > either. 
It’s a separate issue, but maybe some of the PPC > > maintainers/folks could help. > I will do further research on this later. > > Thanks for your time > Kind regards > Zhouyi > > > > > > Kind regards, > > > > Paul
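For readers following the question of whether the `strlen()` call belongs to an inlined `rtnl_link_get_size()`: the arithmetic in the two quoted lines of that function can be modelled in userspace. The sketch below assumes simplified stand-ins for `NLA_ALIGN()`, `nla_total_size()` and the two structures (every `model_*` name is hypothetical), and covers only the IFLA_LINKINFO and IFLA_INFO_KIND terms. It also adds a NULL check on `kind` that the quoted kernel code does not have. Which pointer was actually bad on the Power machine is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct nlattr: two 16-bit fields, 4 bytes in total. */
struct model_nlattr {
	unsigned short nla_len;
	unsigned short nla_type;
};

/* Netlink attributes are padded to 4-byte boundaries. */
#define MODEL_NLA_ALIGNTO	((size_t)4)
#define MODEL_NLA_ALIGN(len) \
	(((len) + MODEL_NLA_ALIGNTO - 1) & ~(MODEL_NLA_ALIGNTO - 1))
#define MODEL_NLA_HDRLEN	MODEL_NLA_ALIGN(sizeof(struct model_nlattr))

/* Attribute header plus payload, rounded up to the alignment. */
static size_t model_nla_total_size(size_t payload)
{
	return MODEL_NLA_ALIGN(MODEL_NLA_HDRLEN + payload);
}

/* Minimal stand-ins for rtnl_link_ops and net_device. */
struct model_link_ops {
	const char *kind;
};

struct model_net_device {
	const struct model_link_ops *rtnl_link_ops;
};

/*
 * Models the first two terms of rtnl_link_get_size(). Unlike the quoted
 * kernel code, it returns 0 instead of calling strlen() on a NULL kind.
 */
static size_t model_link_info_size(const struct model_net_device *dev)
{
	const struct model_link_ops *ops = dev->rtnl_link_ops;

	if (!ops || !ops->kind)
		return 0;

	return model_nla_total_size(sizeof(struct model_nlattr)) + /* IFLA_LINKINFO */
	       model_nla_total_size(strlen(ops->kind) + 1);        /* IFLA_INFO_KIND */
}
```

With `kind` set to "sit", the model yields 8 + 8 = 16 bytes: a 4-byte header plus a 4-byte payload for each attribute. The `strlen()` call is the only place where `kind` is read, which is consistent with the disassembly loading a field of `ops` into the first argument register immediately before the call.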
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) 2022-01-30 17:44 ` Paul E. McKenney @ 2022-01-31 1:08 ` Zhouyi Zhou 2022-02-01 17:50 ` Paul E. McKenney 0 siblings, 1 reply; 17+ messages in thread From: Zhouyi Zhou @ 2022-01-31 1:08 UTC (permalink / raw) To: Paul E. McKenney Cc: Paul Menzel, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev Thank Paul for joining us! On Mon, Jan 31, 2022 at 1:44 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote: > > Dear Paul > > > > On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote: > > > > > > Dear Zhouyi, > > > > > > > > > Am 30.01.22 um 01:21 schrieb Zhouyi Zhou: > > > > > > > Thank you for your instructions, I learned a lot from this process. > > > > > > Same on my end. > > > > > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote: > > > > > > >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou: > > > >> > > > >>> I don't have an IBM machine, but I tried to analyze the problem using > > > >>> my x86_64 kvm virtual machine, I can't reproduce the bug using my > > > >>> x86_64 kvm virtual machine. > > > >> > > > >> No idea, if it’s architecture specific. 
> > > >> > > > >>> I saw the panic is caused by registration of sit device (A sit device > > > >>> is a type of virtual network device that takes our IPv6 traffic, > > > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it > > > >>> over the IPv4 Internet to another host) > > > >>> > > > >>> sit device is registered in function sit_init_net: > > > >>> 1895 static int __net_init sit_init_net(struct net *net) > > > >>> 1896 { > > > >>> 1897 struct sit_net *sitn = net_generic(net, sit_net_id); > > > >>> 1898 struct ip_tunnel *t; > > > >>> 1899 int err; > > > >>> 1900 > > > >>> 1901 sitn->tunnels[0] = sitn->tunnels_wc; > > > >>> 1902 sitn->tunnels[1] = sitn->tunnels_l; > > > >>> 1903 sitn->tunnels[2] = sitn->tunnels_r; > > > >>> 1904 sitn->tunnels[3] = sitn->tunnels_r_l; > > > >>> 1905 > > > >>> 1906 if (!net_has_fallback_tunnels(net)) > > > >>> 1907 return 0; > > > >>> 1908 > > > >>> 1909 sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0", > > > >>> 1910 NET_NAME_UNKNOWN, > > > >>> 1911 ipip6_tunnel_setup); > > > >>> 1912 if (!sitn->fb_tunnel_dev) { > > > >>> 1913 err = -ENOMEM; > > > >>> 1914 goto err_alloc_dev; > > > >>> 1915 } > > > >>> 1916 dev_net_set(sitn->fb_tunnel_dev, net); > > > >>> 1917 sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops; > > > >>> 1918 /* FB netdevice is special: we have one, and only one per netns. > > > >>> 1919 * Allowing to move it to another netns is clearly unsafe. > > > >>> 1920 */ > > > >>> 1921 sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; > > > >>> 1922 > > > >>> 1923 err = register_netdev(sitn->fb_tunnel_dev); > > > >>> register_netdev on line 1923 will call if_nlmsg_size indirectly. 
> > > >>> > > > >>> On the other hand, the function that calls the paniced strlen is if_nlmsg_size: > > > >>> (gdb) disassemble if_nlmsg_size > > > >>> Dump of assembler code for function if_nlmsg_size: > > > >>> 0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1) > > > >>> 0xffffffff81a0dc25 <+5>: push %rbp > > > >>> 0xffffffff81a0dc26 <+6>: push %r15 > > > >>> 0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512> > > > >>> 0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi > > > >>> ... > > > >>> => 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen> > > > >>> 0xffffffff81a0dd13 <+243>: add $0x10,%eax > > > >>> 0xffffffff81a0dd16 <+246>: movslq %eax,%r12 > > > >> > > > >> Excuse my ignorance, would that look the same for ppc64le? > > > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a > > > >> current build (without rcutorture) I have the line below, where strlen > > > >> shows up. > > > >> > > > >> (gdb) disassemble if_nlmsg_size > > > >> […] > > > >> 0xc000000000f7f82c <+332>: bl 0xc000000000a10e30 <strlen> > > > >> […] > > > >> > > > >>> and the C code for 0xffffffff81a0dd0e is following (line 524): > > > >>> 515 static size_t rtnl_link_get_size(const struct net_device *dev) > > > >>> 516 { > > > >>> 517 const struct rtnl_link_ops *ops = dev->rtnl_link_ops; > > > >>> 518 size_t size; > > > >>> 519 > > > >>> 520 if (!ops) > > > >>> 521 return 0; > > > >>> 522 > > > >>> 523 size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */ > > > >>> 524 nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */ > > > >> > > > >> How do I connect the disassemby output with the corresponding line? > > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9 > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel > > > > for powerpc64le in my Ubuntu 20.04 x86_64. > > > > > > > > gdb-multiarch ./vmlinux > > > > (gdb)disassemble if_nlmsg_size > > > > [...] 
> > > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen> > > > > [...] > > > > (gdb) break *0xc00000000191bf40 > > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112. > > > > > > > > But in include/net/netlink.h:1112, I can't find the call to strlen > > > > 1110static inline int nla_total_size(int payload) > > > > 1111{ > > > > 1112 return NLA_ALIGN(nla_attr_size(payload)); > > > > 1113} > > > > This may be due to the compiler wrongly encode the debug information, I guess. > > > > > > `rtnl_link_get_size()` contains: > > > > > > size = nla_total_size(sizeof(struct nlattr)) + /* > > > IFLA_LINKINFO */ > > > nla_total_size(strlen(ops->kind) + 1); /* > > > IFLA_INFO_KIND */ > > > > > > Is that inlined(?) and the code at fault? > > Yes, that is inlined! because > > (gdb) disassemble if_nlmsg_size > > Dump of assembler code for function if_nlmsg_size: > > [...] > > 0xc00000000191bf38 <+104>: beq 0xc00000000191c1f0 <if_nlmsg_size+800> > > 0xc00000000191bf3c <+108>: ld r3,16(r31) > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen> > > [...] > > (gdb) > > (gdb) break *0xc00000000191bf40 > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112. > > (gdb) break *0xc00000000191bf38 > > Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520. > > I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not > already doing so. That gives gdb a lot more information about things > like inlining.
I checked my .config file, and CONFIG_DEBUG_INFO=y is already set:
linux-next$ grep CONFIG_DEBUG_INFO .config
CONFIG_DEBUG_INFO=y
I then ran "make clean" and rebuilt the kernel, but the behavior of gdb with the rebuilt vmlinux remains unchanged; sorry about that.

I am trying to reproduce the bug on my bare metal x86_64 machines in the coming days, and I also hope to work with Mr. Menzel once he is back in the office.
Thanks Zhouyi > > Thanx, Paul > > > > >>> But ops is assigned the value of sit_link_ops in function sit_init_net > > > >>> line 1917, so I guess something must happened between the calls. > > > >>> > > > >>> Do we have KASAN in IBM machine? would KASAN help us find out what > > > >>> happened in between? > > > >> > > > >> Unfortunately, KASAN is not support on Power, I have, as far as I can > > > >> see. From `arch/powerpc/Kconfig`: > > > >> > > > >> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14 > > > >> select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14 > > > >> > > > > en, agree, I invoke "make menuconfig ARCH=powerpc > > > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j > > > > 16", I can't find KASAN under Memory Debugging, I guess we should find > > > > the bug by bisecting instead. > > > > > > I do not know, if it is a regression, as it was the first time I tried > > > to run a Linux kernel built with rcutorture on real hardware. > > I tried to add some debug statements to the kernel to locate the bug > > more accurately, you can try it when you're not busy in the future, > > or just ignore it if the following patch looks not very effective ;-) > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 1baab07820f6..969ac7c540cc 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev) > > * Prevent userspace races by waiting until the network > > * device is fully setup before sending notifications. 
> > */ > > + if (dev->rtnl_link_ops) > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > > + dev->rtnl_link_ops->kind, __FUNCTION__); > > if (!dev->rtnl_link_ops || > > dev->rtnl_link_state == RTNL_LINK_INITIALIZED) > > rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL); > > @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev) > > > > if (rtnl_lock_killable()) > > return -EINTR; > > + if (dev->rtnl_link_ops) > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > > + dev->rtnl_link_ops->kind, __FUNCTION__); > > err = register_netdevice(dev); > > rtnl_unlock(); > > return err; > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > > index e476403231f0..e08986ae6238 100644 > > --- a/net/core/rtnetlink.c > > +++ b/net/core/rtnetlink.c > > @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct > > net_device *dev) > > if (!ops) > > return 0; > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops, > > + ops->kind, __FUNCTION__); > > size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */ > > nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */ > > > > @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct > > net_device *dev) > > static noinline size_t if_nlmsg_size(const struct net_device *dev, > > u32 ext_filter_mask) > > { > > + if (dev->rtnl_link_ops) > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > > + dev->rtnl_link_ops->kind, __FUNCTION__); > > return NLMSG_ALIGN(sizeof(struct ifinfomsg)) > > + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */ > > + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */ > > @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type, > > struct net_device *dev, > > struct net *net = dev_net(dev); > > struct sk_buff *skb; > > int err = -ENOBUFS; > > - > > + if (dev->rtnl_link_ops) > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > > + dev->rtnl_link_ops->kind, __FUNCTION__); > > 
skb = nlmsg_new(if_nlmsg_size(dev, 0), flags); > > if (skb == NULL) > > goto errout; > > @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct > > net_device *dev, > > > > if (dev->reg_state != NETREG_REGISTERED) > > return; > > - > > + if (dev->rtnl_link_ops) > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > > + dev->rtnl_link_ops->kind, __FUNCTION__); > > skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid, > > new_ifindex); > > if (skb) > > @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct > > net_device *dev, > > void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change, > > gfp_t flags) > > { > > + if (dev->rtnl_link_ops) > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > > + dev->rtnl_link_ops->kind, __FUNCTION__); > > rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags, > > NULL, 0); > > } > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c > > index c0b138c20992..fa5b2725811c 100644 > > --- a/net/ipv6/sit.c > > +++ b/net/ipv6/sit.c > > @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net) > > * Allowing to move it to another netns is clearly unsafe. > > */ > > sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; > > - > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", > > + sitn->fb_tunnel_dev->rtnl_link_ops, > > + sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__); > > err = register_netdev(sitn->fb_tunnel_dev); > > if (err) > > goto err_reg_dev; > > > > > > >>> Hope I can be of more helpful. > > > >> > > > >> Some distributions support multi-arch, so they easily allow > > > >> crosscompiling for different architectures. > > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9 > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel > > > > for powerpc64le in my Ubuntu 20.04 x86_64. 
But I can't boot the > > > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp > > > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue > > > > to explore it. > > > > > > Oh, that does not sound good. But I have not tried that in a long time > > > either. It’s a separate issue, but maybe some of the PPC > > > maintainers/folks could help. > > I will do further research on this later. > > > > Thanks for your time > > Kind regards > > Zhouyi > > > > > > > > > Kind regards, > > > > > > Paul
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) 2022-01-31 1:08 ` Zhouyi Zhou @ 2022-02-01 17:50 ` Paul E. McKenney 2022-02-02 2:39 ` Zhouyi Zhou 0 siblings, 1 reply; 17+ messages in thread From: Paul E. McKenney @ 2022-02-01 17:50 UTC (permalink / raw) To: Zhouyi Zhou Cc: Paul Menzel, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev On Mon, Jan 31, 2022 at 09:08:40AM +0800, Zhouyi Zhou wrote: > Thank Paul for joining us! > > On Mon, Jan 31, 2022 at 1:44 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote: > > > Dear Paul > > > > > > On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote: > > > > > > > > Dear Zhouyi, > > > > > > > > > > > > Am 30.01.22 um 01:21 schrieb Zhouyi Zhou: > > > > > > > > > Thank you for your instructions, I learned a lot from this process. > > > > > > > > Same on my end. > > > > > > > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote: > > > > > > > > >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou: > > > > >> > > > > >>> I don't have an IBM machine, but I tried to analyze the problem using > > > > >>> my x86_64 kvm virtual machine, I can't reproduce the bug using my > > > > >>> x86_64 kvm virtual machine. > > > > >> > > > > >> No idea, if it’s architecture specific. 
> > > > >> > > > > >>> I saw the panic is caused by registration of sit device (A sit device > > > > >>> is a type of virtual network device that takes our IPv6 traffic, > > > > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it > > > > >>> over the IPv4 Internet to another host) > > > > >>> > > > > >>> sit device is registered in function sit_init_net: > > > > >>> 1895 static int __net_init sit_init_net(struct net *net) > > > > >>> 1896 { > > > > >>> 1897 struct sit_net *sitn = net_generic(net, sit_net_id); > > > > >>> 1898 struct ip_tunnel *t; > > > > >>> 1899 int err; > > > > >>> 1900 > > > > >>> 1901 sitn->tunnels[0] = sitn->tunnels_wc; > > > > >>> 1902 sitn->tunnels[1] = sitn->tunnels_l; > > > > >>> 1903 sitn->tunnels[2] = sitn->tunnels_r; > > > > >>> 1904 sitn->tunnels[3] = sitn->tunnels_r_l; > > > > >>> 1905 > > > > >>> 1906 if (!net_has_fallback_tunnels(net)) > > > > >>> 1907 return 0; > > > > >>> 1908 > > > > >>> 1909 sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0", > > > > >>> 1910 NET_NAME_UNKNOWN, > > > > >>> 1911 ipip6_tunnel_setup); > > > > >>> 1912 if (!sitn->fb_tunnel_dev) { > > > > >>> 1913 err = -ENOMEM; > > > > >>> 1914 goto err_alloc_dev; > > > > >>> 1915 } > > > > >>> 1916 dev_net_set(sitn->fb_tunnel_dev, net); > > > > >>> 1917 sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops; > > > > >>> 1918 /* FB netdevice is special: we have one, and only one per netns. > > > > >>> 1919 * Allowing to move it to another netns is clearly unsafe. > > > > >>> 1920 */ > > > > >>> 1921 sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; > > > > >>> 1922 > > > > >>> 1923 err = register_netdev(sitn->fb_tunnel_dev); > > > > >>> register_netdev on line 1923 will call if_nlmsg_size indirectly. 
> > > > >>> > > > > >>> On the other hand, the function that calls the paniced strlen is if_nlmsg_size: > > > > >>> (gdb) disassemble if_nlmsg_size > > > > >>> Dump of assembler code for function if_nlmsg_size: > > > > >>> 0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1) > > > > >>> 0xffffffff81a0dc25 <+5>: push %rbp > > > > >>> 0xffffffff81a0dc26 <+6>: push %r15 > > > > >>> 0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512> > > > > >>> 0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi > > > > >>> ... > > > > >>> => 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen> > > > > >>> 0xffffffff81a0dd13 <+243>: add $0x10,%eax > > > > >>> 0xffffffff81a0dd16 <+246>: movslq %eax,%r12 > > > > >> > > > > >> Excuse my ignorance, would that look the same for ppc64le? > > > > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a > > > > >> current build (without rcutorture) I have the line below, where strlen > > > > >> shows up. > > > > >> > > > > >> (gdb) disassemble if_nlmsg_size > > > > >> […] > > > > >> 0xc000000000f7f82c <+332>: bl 0xc000000000a10e30 <strlen> > > > > >> […] > > > > >> > > > > >>> and the C code for 0xffffffff81a0dd0e is following (line 524): > > > > >>> 515 static size_t rtnl_link_get_size(const struct net_device *dev) > > > > >>> 516 { > > > > >>> 517 const struct rtnl_link_ops *ops = dev->rtnl_link_ops; > > > > >>> 518 size_t size; > > > > >>> 519 > > > > >>> 520 if (!ops) > > > > >>> 521 return 0; > > > > >>> 522 > > > > >>> 523 size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */ > > > > >>> 524 nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */ > > > > >> > > > > >> How do I connect the disassemby output with the corresponding line? > > > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9 > > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel > > > > > for powerpc64le in my Ubuntu 20.04 x86_64. 
> > > > > > > > > > gdb-multiarch ./vmlinux > > > > > (gdb)disassemble if_nlmsg_size > > > > > [...] > > > > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen> > > > > > [...] > > > > > (gdb) break *0xc00000000191bf40 > > > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112. > > > > > > > > > > But in include/net/netlink.h:1112, I can't find the call to strlen > > > > > 1110static inline int nla_total_size(int payload) > > > > > 1111{ > > > > > 1112 return NLA_ALIGN(nla_attr_size(payload)); > > > > > 1113} > > > > > This may be due to the compiler wrongly encode the debug information, I guess. > > > > > > > > `rtnl_link_get_size()` contains: > > > > > > > > size = nla_total_size(sizeof(struct nlattr)) + /* > > > > IFLA_LINKINFO */ > > > > nla_total_size(strlen(ops->kind) + 1); /* > > > > IFLA_INFO_KIND */ > > > > > > > > Is that inlined(?) and the code at fault? > > > Yes, that is inlined! because > > > (gdb) disassemble if_nlmsg_size > > > Dump of assembler code for function if_nlmsg_size: > > > [...] > > > 0xc00000000191bf38 <+104>: beq 0xc00000000191c1f0 <if_nlmsg_size+800> > > > 0xc00000000191bf3c <+108>: ld r3,16(r31) > > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen> > > > [...] > > > (gdb) > > > (gdb) break *0xc00000000191bf40 > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112. > > > (gdb) break *0xc00000000191bf38 > > > Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520. > > > > I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not > > already doing so. That gives gdb a lot more information about things > > like inlining. > I check my .config file, CONFIG_DEBUG_INFO=y is here: > linux-next$ grep CONFIG_DEBUG_INFO .config > CONFIG_DEBUG_INFO=y > Then I invoke "make clean" and rebuild the kernel, the behavior of gdb > and vmlinux remain unchanged, sorry for that Glad you were already on top of this one! 
> I am trying to reproduce the bug on my bare metal x86_64 machines in > the coming days, and am also trying to work with Mr Menzel after he > comes back to the office. This URL used to allow community members such as yourself to request access to Power systems: https://osuosl.org/services/powerdev/ In case that helps. Thanx, Paul > Thanks > Zhouyi > > > > Thanx, Paul > > > > > > >>> But ops is assigned the value of sit_link_ops in function sit_init_net > > > > >>> line 1917, so I guess something must happened between the calls. > > > > >>> > > > > >>> Do we have KASAN in IBM machine? would KASAN help us find out what > > > > >>> happened in between? > > > > >> > > > > >> Unfortunately, KASAN is not support on Power, I have, as far as I can > > > > >> see. From `arch/powerpc/Kconfig`: > > > > >> > > > > >> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14 > > > > >> select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14 > > > > >> > > > > > en, agree, I invoke "make menuconfig ARCH=powerpc > > > > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j > > > > > 16", I can't find KASAN under Memory Debugging, I guess we should find > > > > > the bug by bisecting instead. > > > > > > > > I do not know, if it is a regression, as it was the first time I tried > > > > to run a Linux kernel built with rcutorture on real hardware. > > > I tried to add some debug statements to the kernel to locate the bug > > > more accurately, you can try it when you're not busy in the future, > > > or just ignore it if the following patch looks not very effective ;-) > > > diff --git a/net/core/dev.c b/net/core/dev.c > > > index 1baab07820f6..969ac7c540cc 100644 > > > --- a/net/core/dev.c > > > +++ b/net/core/dev.c > > > @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev) > > > * Prevent userspace races by waiting until the network > > > * device is fully setup before sending notifications. 
> > >           */
> > > +        if (dev->rtnl_link_ops)
> > > +                printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > +                       dev->rtnl_link_ops->kind, __FUNCTION__);
> > >          if (!dev->rtnl_link_ops ||
> > >              dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> > >                  rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> > > @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
> > >
> > >          if (rtnl_lock_killable())
> > >                  return -EINTR;
> > > +        if (dev->rtnl_link_ops)
> > > +                printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > +                       dev->rtnl_link_ops->kind, __FUNCTION__);
> > >          err = register_netdevice(dev);
> > >          rtnl_unlock();
> > >          return err;
> > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > > index e476403231f0..e08986ae6238 100644
> > > --- a/net/core/rtnetlink.c
> > > +++ b/net/core/rtnetlink.c
> > > @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct
> > > net_device *dev)
> > >          if (!ops)
> > >                  return 0;
> > >
> > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops,
> > > +               ops->kind, __FUNCTION__);
> > >          size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > >                 nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > >
> > > @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct
> > > net_device *dev)
> > >  static noinline size_t if_nlmsg_size(const struct net_device *dev,
> > >                                       u32 ext_filter_mask)
> > >  {
> > > +        if (dev->rtnl_link_ops)
> > > +                printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > +                       dev->rtnl_link_ops->kind, __FUNCTION__);
> > >          return NLMSG_ALIGN(sizeof(struct ifinfomsg))
> > >                 + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
> > >                 + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
> > > @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type,
> > > struct net_device *dev,
> > >          struct net *net = dev_net(dev);
> > >          struct sk_buff *skb;
> > >          int err = -ENOBUFS;
> > > -
> > > +        if (dev->rtnl_link_ops)
> > > +                printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > +                       dev->rtnl_link_ops->kind, __FUNCTION__);
> > >          skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
> > >          if (skb == NULL)
> > >                  goto errout;
> > > @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > net_device *dev,
> > >
> > >          if (dev->reg_state != NETREG_REGISTERED)
> > >                  return;
> > > -
> > > +        if (dev->rtnl_link_ops)
> > > +                printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > +                       dev->rtnl_link_ops->kind, __FUNCTION__);
> > >          skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
> > >                                       new_ifindex);
> > >          if (skb)
> > > @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > net_device *dev,
> > >  void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> > >                    gfp_t flags)
> > >  {
> > > +        if (dev->rtnl_link_ops)
> > > +                printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > +                       dev->rtnl_link_ops->kind, __FUNCTION__);
> > >          rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
> > >                             NULL, 0);
> > >  }
> > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > > index c0b138c20992..fa5b2725811c 100644
> > > --- a/net/ipv6/sit.c
> > > +++ b/net/ipv6/sit.c
> > > @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
> > >           * Allowing to move it to another netns is clearly unsafe.
> > >           */
> > >          sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > -
> > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n",
> > > +               sitn->fb_tunnel_dev->rtnl_link_ops,
> > > +               sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__);
> > >          err = register_netdev(sitn->fb_tunnel_dev);
> > >          if (err)
> > >                  goto err_reg_dev;
> > > >
> > > > >>> Hope I can be of more helpful.
> > > > >>
> > > > >> Some distributions support multi-arch, so they easily allow
> > > > >> crosscompiling for different architectures.
> > > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> > > > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > > > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > > > > to explore it.
> > > >
> > > > Oh, that does not sound good. But I have not tried that in a long time
> > > > either. It’s a separate issue, but maybe some of the PPC
> > > > maintainers/folks could help.
> > > I will do further research on this later.
> > >
> > > Thanks for your time
> > > Kind regards
> > > Zhouyi
> > > >
> > > >
> > > > Kind regards,
> > > >
> > > > Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread
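Editorial note: the size computation quoted from `rtnl_link_get_size()` can be tried out in userspace. The sketch below is only an illustration, assuming the usual netlink constants (a 4-byte attribute header and 4-byte alignment); `link_kind_size()` is a hypothetical helper name and not a kernel function. Like the quoted kernel code, it calls `strlen()` on `kind` without checking it, which is why a NULL or stale `kind` pointer would fault inside `strlen()`.

```c
#include <assert.h>   /* for the checks in the usage note */
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct nlattr (two 16-bit fields). */
struct nlattr {
    uint16_t nla_len;
    uint16_t nla_type;
};

_Static_assert(sizeof(struct nlattr) == 4, "attribute header is 4 bytes");

/* Assumed to match the uapi netlink definitions. */
#define NLA_ALIGNTO     4
#define NLA_ALIGN(len)  (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN      ((int)NLA_ALIGN(sizeof(struct nlattr)))

/* Header plus payload, without padding. */
static inline int nla_attr_size(int payload)
{
    return NLA_HDRLEN + payload;
}

/* Header plus payload, rounded up to the attribute alignment. */
static inline int nla_total_size(int payload)
{
    return NLA_ALIGN(nla_attr_size(payload));
}

/* Hypothetical helper: the two terms quoted in the thread.
 * kind must be a valid NUL-terminated string; there is no NULL check,
 * just as in the quoted kernel code. */
int link_kind_size(const char *kind)
{
    return nla_total_size((int)sizeof(struct nlattr))  /* IFLA_LINKINFO */
         + nla_total_size((int)strlen(kind) + 1);      /* IFLA_INFO_KIND */
}
```

Under these assumptions, `link_kind_size("sit")` is 16: 8 bytes for the nested IFLA_LINKINFO attribute and 8 bytes for the header plus the 4-byte string "sit" including its terminator.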
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-02-01 17:50 ` Paul E. McKenney
@ 2022-02-02  2:39 ` Zhouyi Zhou
  2022-02-08 20:10 ` Zhouyi Zhou
  0 siblings, 1 reply; 17+ messages in thread
From: Zhouyi Zhou @ 2022-02-02 2:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Paul Menzel, Josh Triplett, rcu, LKML, David S. Miller,
      Jakub Kicinski, netdev

Thank you, Paul, for your encouragement!

On Wed, Feb 2, 2022 at 1:50 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Mon, Jan 31, 2022 at 09:08:40AM +0800, Zhouyi Zhou wrote:
> > Thank Paul for joining us!
> >
> > On Mon, Jan 31, 2022 at 1:44 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote:
> > > > Dear Paul
> > > >
> > > > On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > > > >
> > > > > Dear Zhouyi,
> > > > >
> > > > >
> > > > > On 30.01.22 at 01:21, Zhouyi Zhou wrote:
> > > > >
> > > > > > Thank you for your instructions, I learned a lot from this process.
> > > > >
> > > > > Same on my end.
> > > > >
> > > > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > > > >
> > > > > >> On 29.01.22 at 03:23, Zhouyi Zhou wrote:
> > > > > >>
> > > > > >>> I don't have an IBM machine, but I tried to analyze the problem using
> > > > > >>> my x86_64 kvm virtual machine, I can't reproduce the bug using my
> > > > > >>> x86_64 kvm virtual machine.
> > > > > >>
> > > > > >> No idea, if it’s architecture specific.
> > > > > >> > > > > > >>> I saw the panic is caused by registration of sit device (A sit device > > > > > >>> is a type of virtual network device that takes our IPv6 traffic, > > > > > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it > > > > > >>> over the IPv4 Internet to another host) > > > > > >>> > > > > > >>> sit device is registered in function sit_init_net: > > > > > >>> 1895 static int __net_init sit_init_net(struct net *net) > > > > > >>> 1896 { > > > > > >>> 1897 struct sit_net *sitn = net_generic(net, sit_net_id); > > > > > >>> 1898 struct ip_tunnel *t; > > > > > >>> 1899 int err; > > > > > >>> 1900 > > > > > >>> 1901 sitn->tunnels[0] = sitn->tunnels_wc; > > > > > >>> 1902 sitn->tunnels[1] = sitn->tunnels_l; > > > > > >>> 1903 sitn->tunnels[2] = sitn->tunnels_r; > > > > > >>> 1904 sitn->tunnels[3] = sitn->tunnels_r_l; > > > > > >>> 1905 > > > > > >>> 1906 if (!net_has_fallback_tunnels(net)) > > > > > >>> 1907 return 0; > > > > > >>> 1908 > > > > > >>> 1909 sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0", > > > > > >>> 1910 NET_NAME_UNKNOWN, > > > > > >>> 1911 ipip6_tunnel_setup); > > > > > >>> 1912 if (!sitn->fb_tunnel_dev) { > > > > > >>> 1913 err = -ENOMEM; > > > > > >>> 1914 goto err_alloc_dev; > > > > > >>> 1915 } > > > > > >>> 1916 dev_net_set(sitn->fb_tunnel_dev, net); > > > > > >>> 1917 sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops; > > > > > >>> 1918 /* FB netdevice is special: we have one, and only one per netns. > > > > > >>> 1919 * Allowing to move it to another netns is clearly unsafe. > > > > > >>> 1920 */ > > > > > >>> 1921 sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; > > > > > >>> 1922 > > > > > >>> 1923 err = register_netdev(sitn->fb_tunnel_dev); > > > > > >>> register_netdev on line 1923 will call if_nlmsg_size indirectly. 
> > > > > >>> > > > > > >>> On the other hand, the function that calls the paniced strlen is if_nlmsg_size: > > > > > >>> (gdb) disassemble if_nlmsg_size > > > > > >>> Dump of assembler code for function if_nlmsg_size: > > > > > >>> 0xffffffff81a0dc20 <+0>: nopl 0x0(%rax,%rax,1) > > > > > >>> 0xffffffff81a0dc25 <+5>: push %rbp > > > > > >>> 0xffffffff81a0dc26 <+6>: push %r15 > > > > > >>> 0xffffffff81a0dd04 <+228>: je 0xffffffff81a0de20 <if_nlmsg_size+512> > > > > > >>> 0xffffffff81a0dd0a <+234>: mov 0x10(%rbp),%rdi > > > > > >>> ... > > > > > >>> => 0xffffffff81a0dd0e <+238>: callq 0xffffffff817532d0 <strlen> > > > > > >>> 0xffffffff81a0dd13 <+243>: add $0x10,%eax > > > > > >>> 0xffffffff81a0dd16 <+246>: movslq %eax,%r12 > > > > > >> > > > > > >> Excuse my ignorance, would that look the same for ppc64le? > > > > > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a > > > > > >> current build (without rcutorture) I have the line below, where strlen > > > > > >> shows up. > > > > > >> > > > > > >> (gdb) disassemble if_nlmsg_size > > > > > >> […] > > > > > >> 0xc000000000f7f82c <+332>: bl 0xc000000000a10e30 <strlen> > > > > > >> […] > > > > > >> > > > > > >>> and the C code for 0xffffffff81a0dd0e is following (line 524): > > > > > >>> 515 static size_t rtnl_link_get_size(const struct net_device *dev) > > > > > >>> 516 { > > > > > >>> 517 const struct rtnl_link_ops *ops = dev->rtnl_link_ops; > > > > > >>> 518 size_t size; > > > > > >>> 519 > > > > > >>> 520 if (!ops) > > > > > >>> 521 return 0; > > > > > >>> 522 > > > > > >>> 523 size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */ > > > > > >>> 524 nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */ > > > > > >> > > > > > >> How do I connect the disassemby output with the corresponding line? 
> > > > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9 > > > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel > > > > > > for powerpc64le in my Ubuntu 20.04 x86_64. > > > > > > > > > > > > gdb-multiarch ./vmlinux > > > > > > (gdb)disassemble if_nlmsg_size > > > > > > [...] > > > > > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen> > > > > > > [...] > > > > > > (gdb) break *0xc00000000191bf40 > > > > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112. > > > > > > > > > > > > But in include/net/netlink.h:1112, I can't find the call to strlen > > > > > > 1110static inline int nla_total_size(int payload) > > > > > > 1111{ > > > > > > 1112 return NLA_ALIGN(nla_attr_size(payload)); > > > > > > 1113} > > > > > > This may be due to the compiler wrongly encode the debug information, I guess. > > > > > > > > > > `rtnl_link_get_size()` contains: > > > > > > > > > > size = nla_total_size(sizeof(struct nlattr)) + /* > > > > > IFLA_LINKINFO */ > > > > > nla_total_size(strlen(ops->kind) + 1); /* > > > > > IFLA_INFO_KIND */ > > > > > > > > > > Is that inlined(?) and the code at fault? > > > > Yes, that is inlined! because > > > > (gdb) disassemble if_nlmsg_size > > > > Dump of assembler code for function if_nlmsg_size: > > > > [...] > > > > 0xc00000000191bf38 <+104>: beq 0xc00000000191c1f0 <if_nlmsg_size+800> > > > > 0xc00000000191bf3c <+108>: ld r3,16(r31) > > > > 0xc00000000191bf40 <+112>: bl 0xc000000001c28ad0 <strlen> > > > > [...] > > > > (gdb) > > > > (gdb) break *0xc00000000191bf40 > > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112. > > > > (gdb) break *0xc00000000191bf38 > > > > Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520. > > > > > > I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not > > > already doing so. That gives gdb a lot more information about things > > > like inlining. 
> > I check my .config file, CONFIG_DEBUG_INFO=y is here:
> > linux-next$ grep CONFIG_DEBUG_INFO .config
> > CONFIG_DEBUG_INFO=y
> > Then I invoke "make clean" and rebuild the kernel, the behavior of gdb
> > and vmlinux remain unchanged, sorry for that
>
> Glad you were already on top of this one!

I am very pleased to contribute my small effort to the process of
making Linux better ;-)

>
> > I am trying to reproduce the bug on my bare metal x86_64 machines in
> > the coming days, and am also trying to work with Mr Menzel after he
> > comes back to the office.
>
> This URL used to allow community members such as yourself to request
> access to Power systems: https://osuosl.org/services/powerdev/

I have filled in the request form at
https://osuosl.org/services/powerdev/ and am now waiting for them to
set up the environment for me.

Thanks again
Zhouyi

>
> In case that helps.
>
>                                                         Thanx, Paul
>
> > Thanks
> > Zhouyi
> > >
> > >                                                         Thanx, Paul
> > >
> > > > > >>> But ops is assigned the value of sit_link_ops in function sit_init_net
> > > > > >>> line 1917, so I guess something must happened between the calls.
> > > > > >>>
> > > > > >>> Do we have KASAN in IBM machine? would KASAN help us find out what
> > > > > >>> happened in between?
> > > > > >>
> > > > > >> Unfortunately, KASAN is not support on Power, I have, as far as I can
> > > > > >> see. From `arch/powerpc/Kconfig`:
> > > > > >>
> > > > > >>         select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > > >>         select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > > >>
> > > > > > en, agree, I invoke "make menuconfig ARCH=powerpc
> > > > > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > > > > > 16", I can't find KASAN under Memory Debugging, I guess we should find
> > > > > > the bug by bisecting instead.
> > > > >
> > > > > I do not know, if it is a regression, as it was the first time I tried
> > > > > to run a Linux kernel built with rcutorture on real hardware.
> > > > I tried to add some debug statements to the kernel to locate the bug > > > > more accurately, you can try it when you're not busy in the future, > > > > or just ignore it if the following patch looks not very effective ;-) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > > > index 1baab07820f6..969ac7c540cc 100644 > > > > --- a/net/core/dev.c > > > > +++ b/net/core/dev.c > > > > @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev) > > > > * Prevent userspace races by waiting until the network > > > > * device is fully setup before sending notifications. > > > > */ > > > > + if (dev->rtnl_link_ops) > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > > > > + dev->rtnl_link_ops->kind, __FUNCTION__); > > > > if (!dev->rtnl_link_ops || > > > > dev->rtnl_link_state == RTNL_LINK_INITIALIZED) > > > > rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL); > > > > @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev) > > > > > > > > if (rtnl_lock_killable()) > > > > return -EINTR; > > > > + if (dev->rtnl_link_ops) > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > > > > + dev->rtnl_link_ops->kind, __FUNCTION__); > > > > err = register_netdevice(dev); > > > > rtnl_unlock(); > > > > return err; > > > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > > > > index e476403231f0..e08986ae6238 100644 > > > > --- a/net/core/rtnetlink.c > > > > +++ b/net/core/rtnetlink.c > > > > @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct > > > > net_device *dev) > > > > if (!ops) > > > > return 0; > > > > > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops, > > > > + ops->kind, __FUNCTION__); > > > > size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */ > > > > nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */ > > > > > > > > @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct > > > > net_device *dev) > > > > static noinline 
size_t if_nlmsg_size(const struct net_device *dev, > > > > u32 ext_filter_mask) > > > > { > > > > + if (dev->rtnl_link_ops) > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > > > > + dev->rtnl_link_ops->kind, __FUNCTION__); > > > > return NLMSG_ALIGN(sizeof(struct ifinfomsg)) > > > > + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */ > > > > + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */ > > > > @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type, > > > > struct net_device *dev, > > > > struct net *net = dev_net(dev); > > > > struct sk_buff *skb; > > > > int err = -ENOBUFS; > > > > - > > > > + if (dev->rtnl_link_ops) > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > > > > + dev->rtnl_link_ops->kind, __FUNCTION__); > > > > skb = nlmsg_new(if_nlmsg_size(dev, 0), flags); > > > > if (skb == NULL) > > > > goto errout; > > > > @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct > > > > net_device *dev, > > > > > > > > if (dev->reg_state != NETREG_REGISTERED) > > > > return; > > > > - > > > > + if (dev->rtnl_link_ops) > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > > > > + dev->rtnl_link_ops->kind, __FUNCTION__); > > > > skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid, > > > > new_ifindex); > > > > if (skb) > > > > @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct > > > > net_device *dev, > > > > void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change, > > > > gfp_t flags) > > > > { > > > > + if (dev->rtnl_link_ops) > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops, > > > > + dev->rtnl_link_ops->kind, __FUNCTION__); > > > > rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags, > > > > NULL, 0); > > > > } > > > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c > > > > index c0b138c20992..fa5b2725811c 100644 > > > > --- a/net/ipv6/sit.c > > > > +++ b/net/ipv6/sit.c > > 
> > @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net) > > > > * Allowing to move it to another netns is clearly unsafe. > > > > */ > > > > sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; > > > > - > > > > + printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", > > > > + sitn->fb_tunnel_dev->rtnl_link_ops, > > > > + sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__); > > > > err = register_netdev(sitn->fb_tunnel_dev); > > > > if (err) > > > > goto err_reg_dev; > > > > > > > > > > >>> Hope I can be of more helpful. > > > > > >> > > > > > >> Some distributions support multi-arch, so they easily allow > > > > > >> crosscompiling for different architectures. > > > > > > I use "make ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9 > > > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel > > > > > > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the > > > > > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp > > > > > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue > > > > > > to explore it. > > > > > > > > > > Oh, that does not sound good. But I have not tried that in a long time > > > > > either. It’s a separate issue, but maybe some of the PPC > > > > > maintainers/folks could help. > > > > I will do further research on this later. > > > > > > > > Thanks for your time > > > > Kind regards > > > > Zhouyi > > > > > > > > > > > > > > > Kind regards, > > > > > > > > > > Paul ^ permalink raw reply [flat|nested] 17+ messages in thread
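Editorial note: the instructions just before the `strlen` call in the quoted disassembly (`ld r3,16(r31)` on ppc64le, `mov 0x10(%rbp),%rdi` on x86_64) both load from offset 16 of a pointer. That is consistent with loading `ops->kind`, assuming `struct rtnl_link_ops` begins with a `struct list_head` followed by `const char *kind`. The sketch below checks that layout in userspace with stand-in types; `struct rtnl_link_ops_prefix`, `kind_offset()` and `load_kind()` are hypothetical names used only for this illustration.

```c
#include <assert.h>   /* for the checks in the usage note */
#include <stddef.h>

/* Userspace stand-in for the kernel's doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical prefix of struct rtnl_link_ops: only the leading
 * members matter for the offset of kind (assumed layout). */
struct rtnl_link_ops_prefix {
    struct list_head list;
    const char *kind;
};

/* Byte offset the compiler has to use when it loads ops->kind. */
size_t kind_offset(void)
{
    return offsetof(struct rtnl_link_ops_prefix, kind);
}

/* Mirrors the guarded load in rtnl_link_get_size(): the NULL check on
 * ops corresponds to the branch before the load, and the member access
 * corresponds to the load from offset 16. Note that kind itself is not
 * checked before it is handed to strlen() in the quoted code. */
const char *load_kind(const struct rtnl_link_ops_prefix *ops)
{
    return ops ? ops->kind : NULL;
}
```

On a 64-bit build with 8-byte pointers, `kind_offset()` is 16, matching the displacement in both disassemblies. If the fault really is in that `strlen` call, the check on `ops` passed and the value loaded from `ops->kind` was the bad pointer; the thread does not establish how it became bad.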
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)
  2022-02-02  2:39 ` Zhouyi Zhou
@ 2022-02-08 20:10 ` Zhouyi Zhou
  0 siblings, 0 replies; 17+ messages in thread
From: Zhouyi Zhou @ 2022-02-08 20:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Paul Menzel, Josh Triplett, rcu, LKML, David S. Miller,
      Jakub Kicinski, netdev, linuxppc-dev

Hi Paul

Below are my preliminary test results, obtained on a PPC VM provided by
the Open Source Lab of Oregon State University. Thank you for your support!

[Preliminary test results on ppc64le virtual guest]

1. Conclusion
A kernel configuration option other than the RCU ones may lead to
"BUG: Kernel NULL pointer dereference" at boot. In the tests below, the
BUG appears in both builds with CONFIG_DRM_BOCHS=m (3.2, 3.3) and in
neither build without it (3.4, 3.5).

2. Test Environment
2.1 host hardware
8 core ppc64le virtual guest with 16G ram and 160G disk
cpu      : POWER9 (architected), altivec supported
clock    : 2200.000000MHz
revision : 2.2 (pvr 004e 1202)

2.2 host software
Operating System: Ubuntu 20.04.3 LTS, Compiler: gcc version 9.3.0

3. Test Procedure
3.1 kernel source
next-20220203

3.2 build and boot the kernel with CONFIG_DRM_BOCHS=m and
CONFIG_RCU_TORTURE_TEST=y
test result: "BUG: Kernel NULL pointer dereference" at boot
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next.bochs.torture
boot msg: http://154.223.142.244/Feb2022/dmesg.torture.bochs

3.3 build and boot the kernel with CONFIG_DRM_BOCHS=m
test result: "BUG: Kernel NULL pointer dereference" at boot
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next.bochs
boot msg: http://154.223.142.244/Feb2022/dmesg.bochs

3.4 build and boot the kernel with CONFIG_RCU_TORTURE_TEST=y (without
CONFIG_DRM_BOCHS)
test result: boot without error
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next.torture
boot msg: http://154.223.142.244/Feb2022/dmesg.torture

3.5 build and boot the kernel with CONFIG_RCU_TORTURE_TEST=m (without
CONFIG_DRM_BOCHS)
test result: boot without error
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next
boot msg: http://154.223.142.244/Feb2022/dmesg

4. Acknowledgement
Thanks to the Open Source Lab of Oregon State University, Paul Menzel,
and all other community members who supported my small investigation.

Thanks
Zhouyi

On Wed, Feb 2, 2022 at 10:39 AM Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
>
> Thank Paul for your encouragement!
>
> On Wed, Feb 2, 2022 at 1:50 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Mon, Jan 31, 2022 at 09:08:40AM +0800, Zhouyi Zhou wrote:
> > > Thank Paul for joining us!
> > >
> > > On Mon, Jan 31, 2022 at 1:44 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote:
> > > > > Dear Paul
> > > > >
> > > > > On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > > > > >
> > > > > > Dear Zhouyi,
> > > > > >
> > > > > >
> > > > > > On 30.01.22 at 01:21, Zhouyi Zhou wrote:
> > > > > >
> > > > > > > Thank you for your instructions, I learned a lot from this process.
> > > > > >
> > > > > > Same on my end.
> > > > > >
> > > > > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > > > > >
> > > > > > >> On 29.01.22 at 03:23, Zhouyi Zhou wrote:
> > > > > > >>
> > > > > > >>> I don't have an IBM machine, but I tried to analyze the problem using
> > > > > > >>> my x86_64 kvm virtual machine, I can't reproduce the bug using my
> > > > > > >>> x86_64 kvm virtual machine.
> > > > > > >>
> > > > > > >> No idea, if it’s architecture specific.
[parent not found: <f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de>]
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) [not found] ` <f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de> @ 2022-02-17 1:16 ` Nathan Chancellor 2022-02-21 11:17 ` Paul Menzel 0 siblings, 1 reply; 17+ messages in thread From: Nathan Chancellor @ 2022-02-17 1:16 UTC (permalink / raw) To: Paul Menzel Cc: Zhouyi Zhou, Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev, Nick Desaulniers, llvm Hi Paul, On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote: > [Cc: +LLVM/clang build support folks] > > > Dear Zhouyi, dear Nathan, dear Nick, > > > To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm* > and *clang* 1:13.0-53~exp1 > > $ clang --version > Ubuntu clang version 13.0.0-2 > Target: powerpc64le-unknown-linux-gnu > Thread model: posix > InstalledDir: /usr/bin > > results in a segmentation fault, while it works when building with GCC. > > $ gcc --version > gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0 Thank you for keying us in. I am going to have a bit of a brain dump here based on the information I have uncovered after a couple of hours of debugging. TL;DR: It seems like something is broken with __read_mostly + ld.lld before 14.0.0. My initial reproduction steps (boot-qemu.sh comes from https://github.com/ClangBuiltLinux/boot-utils): $ clang --version clang version 13.0.1 (Fedora 13.0.1-1.fc37) Target: x86_64-redhat-linux-gnu Thread model: posix InstalledDir: /usr/bin $ powerpc64le-linux-gnu-as --version GNU assembler version 2.37-2.fc36 Copyright (C) 2021 Free Software Foundation, Inc. This program is free software; you may redistribute it under the terms of the GNU General Public License version 3 or later. This program has absolutely no warranty. This assembler was configured for a target of `powerpc64le-linux-gnu'. 
$ curl -LSso .config https://lore.kernel.org/all/f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de/3-linux-5.17-rc4-rcu-dev-config.txt $ scripts/config --set-val INITRAMFS_SOURCE '""' $ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 all $ boot-qemu.sh -a ppc64le -k . -t 45s QEMU location: /usr/bin QEMU version: QEMU emulator version 6.2.0 (qemu-6.2.0-5.fc37) + timeout --foreground 45s stdbuf -oL -eL qemu-system-ppc64 -initrd \ /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/rootfs.cpio -device \ ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 -L \ /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/ -bios skiboot.lid \ -machine powernv8 -display none -kernel \ /home/nathan/cbl/src/linux/arch/powerpc/boot/zImage.epapr -m 2G \ -nodefaults -serial mon:stdio ... [ 1.478028][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000 [ 1.478630][ T1] Faulting instruction address: 0xc00000000090bee0 [ 1.479521][ T1] Oops: Kernel access of bad area, sig: 11 [#1] [ 1.480036][ T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV [ 1.480853][ T1] Modules linked in: [ 1.481265][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00001-gfa15c7cb550f #1 [ 1.481967][ T1] NIP: c00000000090bee0 LR: c000000000d96b60 CTR: c0000000000d5b4c [ 1.482596][ T1] REGS: c000000007443330 TRAP: 0380 Not tainted (5.17.0-rc4-00001-gfa15c7cb550f) [ 1.483305][ T1] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 22800a87 XER: 00000000 [ 1.484277][ T1] CFAR: c000000000d96b5c IRQMASK: 0 [ 1.484277][ T1] GPR00: c000000000d96b54 c0000000074435d0 c0000000028bc600 0000000000000000 [ 1.484277][ T1] GPR04: ffffffffffffffff ffffffffff1ea558 ffffffffff1ebfe4 c00000000261ae88 [ 1.484277][ T1] GPR08: 0000000000000003 0000000000000004 c00000000261ae88 0000000000000000 [ 1.484277][ T1] GPR12: 0000000000800000 c000000002a60000 c000000000012518 0000000000000000 [ 1.484277][ T1] GPR16: 0000000000000000 0000000000000000 
0000000000000000 0000000000000000 [ 1.484277][ T1] GPR20: 0000000000000000 c0000000027bff80 0000000000000cc0 0000000000000000 [ 1.484277][ T1] GPR24: 0000000000000010 0000000000000000 0000000000000000 0000000000000000 [ 1.484277][ T1] GPR28: c0000000028fcfd8 c000000007b83000 0000000000000000 c0000000074435d0 [ 1.490325][ T1] NIP [c00000000090bee0] strlen+0x10/0x30 [ 1.490788][ T1] LR [c000000000d96b60] if_nlmsg_size+0x2b0/0x390 [ 1.491319][ T1] Call Trace: [ 1.491573][ T1] [c0000000074435d0] [c000000000d96b54] if_nlmsg_size+0x2a4/0x390 (unreliable) [ 1.492291][ T1] [c000000007443680] [c000000000d96790] rtmsg_ifinfo_build_skb+0x80/0x1a0 [ 1.492958][ T1] [c000000007443740] [c000000000d97590] rtmsg_ifinfo+0x70/0xd0 [ 1.493559][ T1] [c000000007443790] [c000000000d7d528] register_netdevice+0x5d8/0x670 [ 1.494205][ T1] [c000000007443820] [c000000000d7d94c] register_netdev+0x4c/0x80 [ 1.494823][ T1] [c000000007443850] [c000000000f826d8] sit_init_net+0x1b8/0x200 [ 1.495426][ T1] [c0000000074438d0] [c000000000d63b5c] ops_init+0x14c/0x1c0 [ 1.496014][ T1] [c000000007443930] [c000000000d6314c] register_pernet_operations+0xec/0x1e0 [ 1.496716][ T1] [c000000007443990] [c000000000d633d0] register_pernet_device+0x60/0xd0 [ 1.497372][ T1] [c0000000074439e0] [c000000002085194] sit_init+0x54/0x160 [ 1.497950][ T1] [c000000007443a70] [c000000000011c58] do_one_initcall+0x108/0x3e0 [ 1.498573][ T1] [c000000007443c70] [c000000002006190] do_initcall_level+0xe4/0x1c4 [ 1.499219][ T1] [c000000007443cc0] [c00000000200604c] do_initcalls+0x84/0xe4 [ 1.499799][ T1] [c000000007443d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec [ 1.500444][ T1] [c000000007443da0] [c00000000001254c] kernel_init+0x3c/0x270 [ 1.501042][ T1] [c000000007443e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64 [ 1.501721][ T1] Instruction dump: [ 1.502202][ T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000 [ 1.502934][ T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 
4082fff8 7c632050
[ 1.504028][ T1] ---[ end trace 0000000000000000 ]---
...

The first thing was figuring out where the NULL pointer dereference happens,
which appears to be the "strlen(ops->kind)" in rtnl_link_get_size():

515 static size_t rtnl_link_get_size(const struct net_device *dev)
516 {
517 	const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
518 	size_t size;
519
520 	if (!ops)
521 		return 0;
522
523 	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
524 	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */

which I confirmed with some really rudimentary printk debugging:

[ 1.476862][ T1] nathan: rtnl_link_get_size(): name: sit0, ops: c0000000028fcfd8, ops->kind: (null)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 710da8a36729..c8d928e83aec 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -520,6 +520,9 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
 	if (!ops)
 		return 0;
 
+	pr_err("nathan: %s(): name: %s, ops: %px, ops->kind: %s\n", __func__,
+	       dev->name, ops, ops->kind);
+
 	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
 	       nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
 

Okay... how did sit0 end up with a NULL kind...? It is very clearly
defined as "sit":

1830 static struct rtnl_link_ops sit_link_ops __read_mostly = {
1831 	.kind		= "sit",

Adding some more debug prints to net/ipv6/sit.c:

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index c0b138c20992..7b9edbed2fcd 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1920,6 +1920,12 @@ static int __net_init sit_init_net(struct net *net)
 	 */
 	sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
 
+	pr_err("nathan: %s(): &sit_link_ops: %px\n", __func__, &sit_link_ops);
+	pr_err("nathan: %s(): sit_link_ops.kind: %s\n", __func__, sit_link_ops.kind);
+	pr_err("nathan: %s(): sit_link_ops.maxtype: %u\n", __func__, sit_link_ops.maxtype);
+	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops: %px\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops);
+	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: %s\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops->kind);
+
 	err = register_netdev(sitn->fb_tunnel_dev);
 	if (err)
 		goto err_reg_dev;

reveals:

[ 1.471920][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028fcfd8
[ 1.472534][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: (null)
[ 1.473088][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
[ 1.473639][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028fcfd8
[ 1.474370][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: (null)

This is super bizarre, as the maxtype member appears to have the correct
value, but how is kind's initializer getting dropped on the floor?

Removing the __read_mostly annotation "fixes" it:

[ 1.481708][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000027d3f60
[ 1.482319][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
[ 1.482878][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
[ 1.483429][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000027d3f60
[ 1.484174][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
...
Linux version 5.17.0-rc4-00001-g956f02ad5c31-dirty (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), LLD 13.0.1) #2 SMP PREEMPT Wed Feb 16 13:29:49 MST 2022
...

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 7b9edbed2fcd..f109c7a0233b 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -70,7 +70,7 @@ static void ipip6_tunnel_setup(struct net_device *dev);
 static void ipip6_dev_free(struct net_device *dev);
 static bool check_6rd(struct ip_tunnel *tunnel, const struct in6_addr *v6dst,
 		      __be32 *v4dst);
-static struct rtnl_link_ops sit_link_ops __read_mostly;
+static struct rtnl_link_ops sit_link_ops;
 
 static unsigned int sit_net_id __read_mostly;
 struct sit_net {
@@ -1827,7 +1827,7 @@ static void ipip6_dellink(struct net_device *dev, struct list_head *head)
 	unregister_netdevice_queue(dev, head);
 }
 
-static struct rtnl_link_ops sit_link_ops __read_mostly = {
+static struct rtnl_link_ops sit_link_ops = {
 	.kind		= "sit",
 	.maxtype	= IFLA_IPTUN_MAX,
 	.policy		= ipip6_policy,

Switching to ld.bfd also resolves it:

[ 1.470405][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028acfd8
[ 1.471016][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
[ 1.471534][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
[ 1.472062][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028acfd8
[ 1.472790][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
...
Linux version 5.17.0-rc4-00001-g956f02ad5c31 (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), GNU ld version 2.37-2.fc36) #3 SMP PREEMPT Wed Feb 16 13:33:42 MST 2022
...

I tested with ToT LLVM (or at least, close to it, since there is an
unrelated ld.lld regression there) and I could not reproduce it there,
so I did a reverse bisect to see what commit fixes this issue in LLVM 14
and I landed on:

commit 55c14d6dbfd8e7b86c15d2613fea3490078e2ae4
Author: Fangrui Song <i@maskray.me>
Date:   Thu Nov 25 14:12:34 2021 -0800

    [ELF] Simplify DynamicSection content computation. NFC

    The new code computes the content twice, but avoides the tricky
    std::function<uint64_t()>. Removed 13KiB code in a Release build.

 lld/ELF/SyntheticSections.cpp | 117 ++++++++++++++++--------------------------
 lld/ELF/SyntheticSections.h   |  12 +----
 2 files changed, 44 insertions(+), 85 deletions(-)

That's... interesting, given that the commit title says No Functional
Change, even though there clearly is one. That commit has a couple of
mentions of PowerPC synthetic sections, so it is possible that the
new content calculation lines up with ld.bfd?

I am not really sure where to go from here, as I don't fully understand
what the problem was before that LLD change. I'll see if I can do some
more investigation tomorrow (unless someone wants to beat me to it ;)

Cheers,
Nathan

^ permalink raw reply related [flat|nested] 17+ messages in thread
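The symptom described in the message above can be sketched in user space.
This is only a minimal illustration under stated assumptions, not kernel
code: struct link_ops, demo_ops, link_info_size(), demo_self_check() and the
section name ".data.read_mostly_demo" are hypothetical names made up for the
example, and the placement attribute merely imitates the kernel's
__read_mostly annotation (which puts the object into ".data..read_mostly").
It assumes an ELF target and a GCC-compatible compiler. With a correctly
working linker, the initializer is expected to survive the move into the
custom section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's __read_mostly annotation (ELF only). */
#define demo_read_mostly __attribute__((__section__(".data.read_mostly_demo")))

/* Hypothetical, heavily reduced stand-in for struct rtnl_link_ops. */
struct link_ops {
	const char *kind;
	unsigned int maxtype;
};

/* Statically initialized object placed in a custom data section. */
static struct link_ops demo_ops demo_read_mostly = {
	.kind = "sit",
	.maxtype = 20,
};

/*
 * Mirrors the "strlen(ops->kind) + 1" term from the listing above, but
 * returns 0 for a NULL kind instead of dereferencing it.
 */
size_t link_info_size(const struct link_ops *ops)
{
	if (!ops || !ops->kind)
		return 0;
	return strlen(ops->kind) + 1;
}

/* Checks that both initializers survived the section placement. */
void demo_self_check(void)
{
	assert(demo_ops.kind != NULL);
	assert(strcmp(demo_ops.kind, "sit") == 0);
	assert(demo_ops.maxtype == 20);
	assert(link_info_size(&demo_ops) == 4);
}
```

Calling demo_self_check() from a small main() is enough to try this with
different linkers. The NULL check in link_info_size() would only hide the
symptom seen in the oops; it says nothing about why the initializer went
missing in the ld.lld 13 build.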
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) 2022-02-17 1:16 ` Nathan Chancellor @ 2022-02-21 11:17 ` Paul Menzel 2022-02-21 15:29 ` Nathan Chancellor 0 siblings, 1 reply; 17+ messages in thread From: Paul Menzel @ 2022-02-21 11:17 UTC (permalink / raw) To: Nathan Chancellor Cc: Zhouyi Zhou, Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev, Nick Desaulniers, llvm, Fangrui Song [Cc: +Fangrui] Dear Nathan, Am 17.02.22 um 02:16 schrieb Nathan Chancellor: > On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote: >> [Cc: +LLVM/clang build support folks] […] >> To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm* >> and *clang* 1:13.0-53~exp1 >> >> $ clang --version >> Ubuntu clang version 13.0.0-2 >> Target: powerpc64le-unknown-linux-gnu >> Thread model: posix >> InstalledDir: /usr/bin >> >> results in a segmentation fault, while it works when building with GCC. >> >> $ gcc --version >> gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0 > > Thank you for keying us in. I am going to have a bit of a brain dump > here based on the information I have uncovered after a couple of hours > of debugging. > > TL;DR: It seems like something is broken with __read_mostly + ld.lld > before 14.0.0. > > My initial reproduction steps (boot-qemu.sh comes from > https://github.com/ClangBuiltLinux/boot-utils): > > $ clang --version > clang version 13.0.1 (Fedora 13.0.1-1.fc37) > Target: x86_64-redhat-linux-gnu > Thread model: posix > InstalledDir: /usr/bin > > $ powerpc64le-linux-gnu-as --version > GNU assembler version 2.37-2.fc36 > Copyright (C) 2021 Free Software Foundation, Inc. > This program is free software; you may redistribute it under the terms of > the GNU General Public License version 3 or later. > This program has absolutely no warranty. > This assembler was configured for a target of `powerpc64le-linux-gnu'. 
> > $ curl -LSso .config https://lore.kernel.org/all/f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de/3-linux-5.17-rc4-rcu-dev-config.txt > > $ scripts/config --set-val INITRAMFS_SOURCE '""' > > $ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 all > > $ boot-qemu.sh -a ppc64le -k . -t 45s > QEMU location: /usr/bin > > QEMU version: QEMU emulator version 6.2.0 (qemu-6.2.0-5.fc37) > > + timeout --foreground 45s stdbuf -oL -eL qemu-system-ppc64 -initrd \ > /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/rootfs.cpio -device \ > ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 -L \ > /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/ -bios skiboot.lid \ > -machine powernv8 -display none -kernel \ > /home/nathan/cbl/src/linux/arch/powerpc/boot/zImage.epapr -m 2G \ > -nodefaults -serial mon:stdio > ... > [ 1.478028][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000 > [ 1.478630][ T1] Faulting instruction address: 0xc00000000090bee0 > [ 1.479521][ T1] Oops: Kernel access of bad area, sig: 11 [#1] > [ 1.480036][ T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV > [ 1.480853][ T1] Modules linked in: > [ 1.481265][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00001-gfa15c7cb550f #1 > [ 1.481967][ T1] NIP: c00000000090bee0 LR: c000000000d96b60 CTR: c0000000000d5b4c > [ 1.482596][ T1] REGS: c000000007443330 TRAP: 0380 Not tainted (5.17.0-rc4-00001-gfa15c7cb550f) > [ 1.483305][ T1] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 22800a87 XER: 00000000 > [ 1.484277][ T1] CFAR: c000000000d96b5c IRQMASK: 0 > [ 1.484277][ T1] GPR00: c000000000d96b54 c0000000074435d0 c0000000028bc600 0000000000000000 > [ 1.484277][ T1] GPR04: ffffffffffffffff ffffffffff1ea558 ffffffffff1ebfe4 c00000000261ae88 > [ 1.484277][ T1] GPR08: 0000000000000003 0000000000000004 c00000000261ae88 0000000000000000 > [ 1.484277][ T1] GPR12: 0000000000800000 c000000002a60000 c000000000012518 0000000000000000 
> [ 1.484277][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > [ 1.484277][ T1] GPR20: 0000000000000000 c0000000027bff80 0000000000000cc0 0000000000000000 > [ 1.484277][ T1] GPR24: 0000000000000010 0000000000000000 0000000000000000 0000000000000000 > [ 1.484277][ T1] GPR28: c0000000028fcfd8 c000000007b83000 0000000000000000 c0000000074435d0 > [ 1.490325][ T1] NIP [c00000000090bee0] strlen+0x10/0x30 > [ 1.490788][ T1] LR [c000000000d96b60] if_nlmsg_size+0x2b0/0x390 > [ 1.491319][ T1] Call Trace: > [ 1.491573][ T1] [c0000000074435d0] [c000000000d96b54] if_nlmsg_size+0x2a4/0x390 (unreliable) > [ 1.492291][ T1] [c000000007443680] [c000000000d96790] rtmsg_ifinfo_build_skb+0x80/0x1a0 > [ 1.492958][ T1] [c000000007443740] [c000000000d97590] rtmsg_ifinfo+0x70/0xd0 > [ 1.493559][ T1] [c000000007443790] [c000000000d7d528] register_netdevice+0x5d8/0x670 > [ 1.494205][ T1] [c000000007443820] [c000000000d7d94c] register_netdev+0x4c/0x80 > [ 1.494823][ T1] [c000000007443850] [c000000000f826d8] sit_init_net+0x1b8/0x200 > [ 1.495426][ T1] [c0000000074438d0] [c000000000d63b5c] ops_init+0x14c/0x1c0 > [ 1.496014][ T1] [c000000007443930] [c000000000d6314c] register_pernet_operations+0xec/0x1e0 > [ 1.496716][ T1] [c000000007443990] [c000000000d633d0] register_pernet_device+0x60/0xd0 > [ 1.497372][ T1] [c0000000074439e0] [c000000002085194] sit_init+0x54/0x160 > [ 1.497950][ T1] [c000000007443a70] [c000000000011c58] do_one_initcall+0x108/0x3e0 > [ 1.498573][ T1] [c000000007443c70] [c000000002006190] do_initcall_level+0xe4/0x1c4 > [ 1.499219][ T1] [c000000007443cc0] [c00000000200604c] do_initcalls+0x84/0xe4 > [ 1.499799][ T1] [c000000007443d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec > [ 1.500444][ T1] [c000000007443da0] [c00000000001254c] kernel_init+0x3c/0x270 > [ 1.501042][ T1] [c000000007443e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64 > [ 1.501721][ T1] Instruction dump: > [ 1.502202][ T1] eb81ffe0 7c0803a6 4e800020 00000000 
00000000 00000000 60000000 60000000 > [ 1.502934][ T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050 > [ 1.504028][ T1] ---[ end trace 0000000000000000 ]--- > ... > > First thing was figuring out where the NULL pointer dereference happens, > which appears to the "strlen(ops->kind)" in rtnl_link_get_size(): > > 515 static size_t rtnl_link_get_size(const struct net_device *dev) > 516 { > 517 const struct rtnl_link_ops *ops = dev->rtnl_link_ops; > 518 size_t size; > 519 > 520 if (!ops) > 521 return 0; > 522 > 523 size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */ > 524 nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */ > > which I confirmed some really rudimentary printk debugging: > > [ 1.476862][ T1] nathan: rtnl_link_get_size(): name: sit0, ops: c0000000028fcfd8, ops->kind: (null) > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > index 710da8a36729..c8d928e83aec 100644 > --- a/net/core/rtnetlink.c > +++ b/net/core/rtnetlink.c > @@ -520,6 +520,9 @@ static size_t rtnl_link_get_size(const struct net_device *dev) > if (!ops) > return 0; > > + pr_err("nathan: %s(): name: %s, ops: %px, ops->kind: %s\n", __func__, > + dev->name, ops, ops->kind); > + > size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */ > nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */ > > > Okay... how did sit0 end up with a NULL kind...? 
It is very clearly > defined as "sit": > > 1830 static struct rtnl_link_ops sit_link_ops __read_mostly = { > 1831 .kind = "sit", > > Adding some more debug prints to net/ipv6/sit.c: > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c > index c0b138c20992..7b9edbed2fcd 100644 > --- a/net/ipv6/sit.c > +++ b/net/ipv6/sit.c > @@ -1920,6 +1920,12 @@ static int __net_init sit_init_net(struct net *net) > */ > sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL; > > + pr_err("nathan: %s(): &sit_link_ops: %px\n", __func__, &sit_link_ops); > + pr_err("nathan: %s(): sit_link_ops.kind: %s\n", __func__, sit_link_ops.kind); > + pr_err("nathan: %s(): sit_link_ops.maxtype: %u\n", __func__, sit_link_ops.maxtype); > + pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops: %px\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops); > + pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: %s\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops->kind); > + > err = register_netdev(sitn->fb_tunnel_dev); > if (err) > goto err_reg_dev; > > reveals: > > [ 1.471920][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028fcfd8 > [ 1.472534][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: (null) > [ 1.473088][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20 > [ 1.473639][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028fcfd8 > [ 1.474370][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: (null) > > This is super bizarre, as the maxtype member appears to have the correct > value, but how is kind's initial getting dropped on the floor? 
> > Removing the __read_mostly annotation "fixes" it: > > [ 1.481708][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000027d3f60 > [ 1.482319][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit > [ 1.482878][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20 > [ 1.483429][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000027d3f60 > [ 1.484174][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit > ... > Linux version 5.17.0-rc4-00001-g956f02ad5c31-dirty (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), LLD 13.0.1) #2 SMP PREEMPT Wed Feb 16 13:29:49 MST 2022 > ... > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c > index 7b9edbed2fcd..f109c7a0233b 100644 > --- a/net/ipv6/sit.c > +++ b/net/ipv6/sit.c > @@ -70,7 +70,7 @@ static void ipip6_tunnel_setup(struct net_device *dev); > static void ipip6_dev_free(struct net_device *dev); > static bool check_6rd(struct ip_tunnel *tunnel, const struct in6_addr *v6dst, > __be32 *v4dst); > -static struct rtnl_link_ops sit_link_ops __read_mostly; > +static struct rtnl_link_ops sit_link_ops; > > static unsigned int sit_net_id __read_mostly; > struct sit_net { > @@ -1827,7 +1827,7 @@ static void ipip6_dellink(struct net_device *dev, struct list_head *head) > unregister_netdevice_queue(dev, head); > } > > -static struct rtnl_link_ops sit_link_ops __read_mostly = { > +static struct rtnl_link_ops sit_link_ops = { > .kind = "sit", > .maxtype = IFLA_IPTUN_MAX, > .policy = ipip6_policy, > > Switching to ld.bfd also resolves it: > > [ 1.470405][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028acfd8 > [ 1.471016][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit > [ 1.471534][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20 > [ 1.472062][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028acfd8 > [ 1.472790][ T1] sit: nathan: sit_init_net(): 
sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit > ... > Linux version 5.17.0-rc4-00001-g956f02ad5c31 (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), GNU ld version 2.37-2.fc36) #3 SMP PREEMPT Wed Feb 16 13:33:42 MST 2022 > ... > > I tested with ToT LLVM (or at least, close to it, since there is an > unrelated ld.lld regression there) and I could not reproduce it there, > so I did a reverse bisect to see what commit fixes this issue in LLVM 14 > and I landed on: > > commit 55c14d6dbfd8e7b86c15d2613fea3490078e2ae4 > Author: Fangrui Song <i@maskray.me> > Date: Thu Nov 25 14:12:34 2021 -0800 > > [ELF] Simplify DynamicSection content computation. NFC > > The new code computes the content twice, but avoides the tricky > std::function<uint64_t()>. Removed 13KiB code in a Release build. > > lld/ELF/SyntheticSections.cpp | 117 ++++++++++++++++-------------------------- > lld/ELF/SyntheticSections.h | 12 +---- > 2 files changed, 44 insertions(+), 85 deletions(-) > > That's... interesting, given that commit title says No Functional > Change, even though there clearly is one. That commit has a couple > mentions of PowerPC synthetic sections, so it is possible that the > new content calculation lines up with ld.bfd? > > I am not really sure where to go from here, as I don't fully understand > what the problem was before that LLD change. I'll see if I can do some > more investigation tomorrow (unless someone wants to beat me to it ;) Thank you for looking into this, and sharing your analysis. I built LLVM/clang from the master branch, rebuilt, but can still reproduce this. 
    $ git clone --depth=1 https://github.com/llvm/llvm-project.git
    $ cd llvm-project/
    $ git log --oneline
    41cb504b7 [mlir][linalg][bufferize][NFC] Move interface impl to Linalg Transforms
    $ mkdir build
    $ cd build
    $ cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DLLVM_INSTALL_UTILS=ON -DCMAKE_INSTALL_PREFIX=/scratch/local2/llvm ../llvm
    $ make -j20
    $ make -j20 clang-check
    $ make install
    $ /scratch/local2/llvm/bin/clang --version
    clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2)
    Target: powerpc64le-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /scratch/local2/llvm/bin

Then build Linux after `make clean` with `/scratch/local2/llvm/bin` in
the path.

    $ LLVM=1 LLVM_IAS=0 eatmydata make -j20
    $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=1 -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m 512 -kernel /dev/shm/linux/vmlinux -append "debug_boot_weak_hash panic=-1 console=ttyS0 rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_self_test=1 rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"
    […]
    Preparing to boot Linux version 5.17.0-rc5-00178-ga4b9a8fb20e7 (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2), LLD 13.0.0) #29 SMP PREEMPT Mon Feb 21 10:58:54 CET 2022
    […]
    [ 0.465889][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
    [ 0.466749][ T1] Faulting instruction address: 0xc0000000008fc300
    [ 0.467507][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
    […]

Kind regards,

Paul

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) 2022-02-21 11:17 ` Paul Menzel @ 2022-02-21 15:29 ` Nathan Chancellor 2022-02-21 17:33 ` Paul Menzel 2022-04-19 21:34 ` Nathan Chancellor 0 siblings, 2 replies; 17+ messages in thread From: Nathan Chancellor @ 2022-02-21 15:29 UTC (permalink / raw) To: Paul Menzel Cc: Zhouyi Zhou, Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev, Nick Desaulniers, llvm, Fangrui Song Hi Paul, On Mon, Feb 21, 2022 at 12:17:40PM +0100, Paul Menzel wrote: > Am 17.02.22 um 02:16 schrieb Nathan Chancellor: > > > On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote: > > > [Cc: +LLVM/clang build support folks] > > […] > > > > To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm* > > > and *clang* 1:13.0-53~exp1 > > > > > > $ clang --version > > > Ubuntu clang version 13.0.0-2 > > > Target: powerpc64le-unknown-linux-gnu > > > Thread model: posix > > > InstalledDir: /usr/bin > > > > > > results in a segmentation fault, while it works when building with GCC. > > > > > > $ gcc --version > > > gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0 > > > > Thank you for keying us in. I am going to have a bit of a brain dump > > here based on the information I have uncovered after a couple of hours > > of debugging. > > > > TL;DR: It seems like something is broken with __read_mostly + ld.lld > > before 14.0.0. > > > > My initial reproduction steps (boot-qemu.sh comes from > > https://github.com/ClangBuiltLinux/boot-utils): > > > > $ clang --version > > clang version 13.0.1 (Fedora 13.0.1-1.fc37) > > Target: x86_64-redhat-linux-gnu > > Thread model: posix > > InstalledDir: /usr/bin > > > > $ powerpc64le-linux-gnu-as --version > > GNU assembler version 2.37-2.fc36 > > Copyright (C) 2021 Free Software Foundation, Inc. > > This program is free software; you may redistribute it under the terms of > > the GNU General Public License version 3 or later. 
> > This program has absolutely no warranty. > > This assembler was configured for a target of `powerpc64le-linux-gnu'. > > > > $ curl -LSso .config https://lore.kernel.org/all/f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de/3-linux-5.17-rc4-rcu-dev-config.txt > > > > $ scripts/config --set-val INITRAMFS_SOURCE '""' > > > > $ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 all > > > > $ boot-qemu.sh -a ppc64le -k . -t 45s > > QEMU location: /usr/bin > > > > QEMU version: QEMU emulator version 6.2.0 (qemu-6.2.0-5.fc37) > > > > + timeout --foreground 45s stdbuf -oL -eL qemu-system-ppc64 -initrd \ > > /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/rootfs.cpio -device \ > > ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 -L \ > > /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/ -bios skiboot.lid \ > > -machine powernv8 -display none -kernel \ > > /home/nathan/cbl/src/linux/arch/powerpc/boot/zImage.epapr -m 2G \ > > -nodefaults -serial mon:stdio > > ... 
> > [ 1.478028][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000 > > [ 1.478630][ T1] Faulting instruction address: 0xc00000000090bee0 > > [ 1.479521][ T1] Oops: Kernel access of bad area, sig: 11 [#1] > > [ 1.480036][ T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV > > [ 1.480853][ T1] Modules linked in: > > [ 1.481265][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00001-gfa15c7cb550f #1 > > [ 1.481967][ T1] NIP: c00000000090bee0 LR: c000000000d96b60 CTR: c0000000000d5b4c > > [ 1.482596][ T1] REGS: c000000007443330 TRAP: 0380 Not tainted (5.17.0-rc4-00001-gfa15c7cb550f) > > [ 1.483305][ T1] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 22800a87 XER: 00000000 > > [ 1.484277][ T1] CFAR: c000000000d96b5c IRQMASK: 0 > > [ 1.484277][ T1] GPR00: c000000000d96b54 c0000000074435d0 c0000000028bc600 0000000000000000 > > [ 1.484277][ T1] GPR04: ffffffffffffffff ffffffffff1ea558 ffffffffff1ebfe4 c00000000261ae88 > > [ 1.484277][ T1] GPR08: 0000000000000003 0000000000000004 c00000000261ae88 0000000000000000 > > [ 1.484277][ T1] GPR12: 0000000000800000 c000000002a60000 c000000000012518 0000000000000000 > > [ 1.484277][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > [ 1.484277][ T1] GPR20: 0000000000000000 c0000000027bff80 0000000000000cc0 0000000000000000 > > [ 1.484277][ T1] GPR24: 0000000000000010 0000000000000000 0000000000000000 0000000000000000 > > [ 1.484277][ T1] GPR28: c0000000028fcfd8 c000000007b83000 0000000000000000 c0000000074435d0 > > [ 1.490325][ T1] NIP [c00000000090bee0] strlen+0x10/0x30 > > [ 1.490788][ T1] LR [c000000000d96b60] if_nlmsg_size+0x2b0/0x390 > > [ 1.491319][ T1] Call Trace: > > [ 1.491573][ T1] [c0000000074435d0] [c000000000d96b54] if_nlmsg_size+0x2a4/0x390 (unreliable) > > [ 1.492291][ T1] [c000000007443680] [c000000000d96790] rtmsg_ifinfo_build_skb+0x80/0x1a0 > > [ 1.492958][ T1] [c000000007443740] [c000000000d97590] rtmsg_ifinfo+0x70/0xd0 > > [ 1.493559][ 
T1] [c000000007443790] [c000000000d7d528] register_netdevice+0x5d8/0x670 > > [ 1.494205][ T1] [c000000007443820] [c000000000d7d94c] register_netdev+0x4c/0x80 > > [ 1.494823][ T1] [c000000007443850] [c000000000f826d8] sit_init_net+0x1b8/0x200 > > [ 1.495426][ T1] [c0000000074438d0] [c000000000d63b5c] ops_init+0x14c/0x1c0 > > [ 1.496014][ T1] [c000000007443930] [c000000000d6314c] register_pernet_operations+0xec/0x1e0 > > [ 1.496716][ T1] [c000000007443990] [c000000000d633d0] register_pernet_device+0x60/0xd0 > > [ 1.497372][ T1] [c0000000074439e0] [c000000002085194] sit_init+0x54/0x160 > > [ 1.497950][ T1] [c000000007443a70] [c000000000011c58] do_one_initcall+0x108/0x3e0 > > [ 1.498573][ T1] [c000000007443c70] [c000000002006190] do_initcall_level+0xe4/0x1c4 > > [ 1.499219][ T1] [c000000007443cc0] [c00000000200604c] do_initcalls+0x84/0xe4 > > [ 1.499799][ T1] [c000000007443d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec > > [ 1.500444][ T1] [c000000007443da0] [c00000000001254c] kernel_init+0x3c/0x270 > > [ 1.501042][ T1] [c000000007443e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64 > > [ 1.501721][ T1] Instruction dump: > > [ 1.502202][ T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000 > > [ 1.502934][ T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050 > > [ 1.504028][ T1] ---[ end trace 0000000000000000 ]--- > > ... 
> >
> > First thing was figuring out where the NULL pointer dereference happens,
> > which appears to be the "strlen(ops->kind)" in rtnl_link_get_size():
> >
> >   515 static size_t rtnl_link_get_size(const struct net_device *dev)
> >   516 {
> >   517         const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> >   518         size_t size;
> >   519
> >   520         if (!ops)
> >   521                 return 0;
> >   522
> >   523         size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> >   524                nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> >
> > which I confirmed with some really rudimentary printk debugging:
> >
> > [ 1.476862][ T1] nathan: rtnl_link_get_size(): name: sit0, ops: c0000000028fcfd8, ops->kind: (null)
> >
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index 710da8a36729..c8d928e83aec 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -520,6 +520,9 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
> >  	if (!ops)
> >  		return 0;
> > +	pr_err("nathan: %s(): name: %s, ops: %px, ops->kind: %s\n", __func__,
> > +	       dev->name, ops, ops->kind);
> > +
> >  	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> >  	       nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> >
> > Okay... how did sit0 end up with a NULL kind...? It is very clearly
> > defined as "sit":
> >
> >   1830 static struct rtnl_link_ops sit_link_ops __read_mostly = {
> >   1831         .kind = "sit",
> >
> > Adding some more debug prints to net/ipv6/sit.c:
> >
> > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > index c0b138c20992..7b9edbed2fcd 100644
> > --- a/net/ipv6/sit.c
> > +++ b/net/ipv6/sit.c
> > @@ -1920,6 +1920,12 @@ static int __net_init sit_init_net(struct net *net)
> >  	 */
> >  	sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > +	pr_err("nathan: %s(): &sit_link_ops: %px\n", __func__, &sit_link_ops);
> > +	pr_err("nathan: %s(): sit_link_ops.kind: %s\n", __func__, sit_link_ops.kind);
> > +	pr_err("nathan: %s(): sit_link_ops.maxtype: %u\n", __func__, sit_link_ops.maxtype);
> > +	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops: %px\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops);
> > +	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: %s\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops->kind);
> > +
> >  	err = register_netdev(sitn->fb_tunnel_dev);
> >  	if (err)
> >  		goto err_reg_dev;
> >
> > reveals:
> >
> > [ 1.471920][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028fcfd8
> > [ 1.472534][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: (null)
> > [ 1.473088][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> > [ 1.473639][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028fcfd8
> > [ 1.474370][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: (null)
> >
> > This is super bizarre, as the maxtype member appears to have the correct
> > value, but how is kind's initial value getting dropped on the floor?
> > > > Removing the __read_mostly annotation "fixes" it: > > > > [ 1.481708][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000027d3f60 > > [ 1.482319][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit > > [ 1.482878][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20 > > [ 1.483429][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000027d3f60 > > [ 1.484174][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit > > ... > > Linux version 5.17.0-rc4-00001-g956f02ad5c31-dirty (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), LLD 13.0.1) #2 SMP PREEMPT Wed Feb 16 13:29:49 MST 2022 > > ... > > > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c > > index 7b9edbed2fcd..f109c7a0233b 100644 > > --- a/net/ipv6/sit.c > > +++ b/net/ipv6/sit.c > > @@ -70,7 +70,7 @@ static void ipip6_tunnel_setup(struct net_device *dev); > > static void ipip6_dev_free(struct net_device *dev); > > static bool check_6rd(struct ip_tunnel *tunnel, const struct in6_addr *v6dst, > > __be32 *v4dst); > > -static struct rtnl_link_ops sit_link_ops __read_mostly; > > +static struct rtnl_link_ops sit_link_ops; > > static unsigned int sit_net_id __read_mostly; > > struct sit_net { > > @@ -1827,7 +1827,7 @@ static void ipip6_dellink(struct net_device *dev, struct list_head *head) > > unregister_netdevice_queue(dev, head); > > } > > -static struct rtnl_link_ops sit_link_ops __read_mostly = { > > +static struct rtnl_link_ops sit_link_ops = { > > .kind = "sit", > > .maxtype = IFLA_IPTUN_MAX, > > .policy = ipip6_policy, > > > > Switching to ld.bfd also resolves it: > > > > [ 1.470405][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028acfd8 > > [ 1.471016][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit > > [ 1.471534][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20 > > [ 1.472062][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028acfd8 > > [ 
1.472790][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit > > ... > > Linux version 5.17.0-rc4-00001-g956f02ad5c31 (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), GNU ld version 2.37-2.fc36) #3 SMP PREEMPT Wed Feb 16 13:33:42 MST 2022 > > ... > > > > I tested with ToT LLVM (or at least, close to it, since there is an > > unrelated ld.lld regression there) and I could not reproduce it there, > > so I did a reverse bisect to see what commit fixes this issue in LLVM 14 > > and I landed on: > > > > commit 55c14d6dbfd8e7b86c15d2613fea3490078e2ae4 > > Author: Fangrui Song <i@maskray.me> > > Date: Thu Nov 25 14:12:34 2021 -0800 > > > > [ELF] Simplify DynamicSection content computation. NFC > > > > The new code computes the content twice, but avoides the tricky > > std::function<uint64_t()>. Removed 13KiB code in a Release build. > > > > lld/ELF/SyntheticSections.cpp | 117 ++++++++++++++++-------------------------- > > lld/ELF/SyntheticSections.h | 12 +---- > > 2 files changed, 44 insertions(+), 85 deletions(-) > > > > That's... interesting, given that commit title says No Functional > > Change, even though there clearly is one. That commit has a couple > > mentions of PowerPC synthetic sections, so it is possible that the > > new content calculation lines up with ld.bfd? > > > > I am not really sure where to go from here, as I don't fully understand > > what the problem was before that LLD change. I'll see if I can do some > > more investigation tomorrow (unless someone wants to beat me to it ;) > > Thank you for looking into this, and sharing your analysis. > > I built LLVM/clang from the master branch, rebuilt, but can still reproduce > this. 
>
>     $ git clone --depth=1 https://github.com/llvm/llvm-project.git
>     $ cd llvm-project/
>     $ git log --oneline
>     41cb504b7 [mlir][linalg][bufferize][NFC] Move interface impl to Linalg
> Transforms
>     $ mkdir build
>     $ cd build
>     $ cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles"

Since this is something related to ld.lld, not clang, this should be:

... -DLLVM_ENABLE_PROJECTS="clang;lld" ...

> -DCMAKE_BUILD_TYPE=Release -DLLVM_INSTALL_UTILS=ON
> -DCMAKE_INSTALL_PREFIX=/scratch/local2/llvm ../llvm
>     $ make -j20
>     $ make -j20 clang-check

You can also do 'check-lld' if you want.

>     $ make install
>     $ /scratch/local2/llvm/bin/clang --version
>     clang version 15.0.0 (https://github.com/llvm/llvm-project.git
> 41cb504b7c4b18ac15830107431a0c1eec73a6b2)
>     Target: powerpc64le-unknown-linux-gnu
>     Thread model: posix
>     InstalledDir: /scratch/local2/llvm/bin
>
> Then build Linux after `make clean` with `/scratch/local2/llvm/bin` in the
> path.
>
>     $ LLVM=1 LLVM_IAS=0 eatmydata make -j20
>
>     $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=1 -net
> none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m
> 512 -kernel /dev/shm/linux/vmlinux -append "debug_boot_weak_hash panic=-1
> console=ttyS0 rcupdate.rcu_cpu_stall_suppress_at_boot=1
> torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000
> rcupdate.rcu_self_test=1 rcutorture.onoff_interval=1000
> rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4
> rcutorture.stat_interval=15 rcutorture.shutdown_secs=420
> rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"
>     […]
>     Preparing to boot Linux version 5.17.0-rc5-00178-ga4b9a8fb20e7
> (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (clang version
> 15.0.0 (https://github.com/llvm/llvm-project.git
> 41cb504b7c4b18ac15830107431a0c1eec73a6b2), LLD 13.0.0) #29 SMP PREEMPT Mon
                                             ^ still using ld.lld 13.0.0.
If you want to test the master branch, I would check out LLVM at
460830a9c664e8cce959c660648faa7747ad8bdc, as the next commit introduces
a boot regression unrelated to this issue:

https://github.com/ClangBuiltLinux/linux/issues/1581

That should at least confirm this is resolved in a newer release.

> Feb 21 10:58:54 CET 2022
>     […]
>     [ 0.465889][ T1] BUG: Kernel NULL pointer dereference on read at
> 0x00000000
>     [ 0.466749][ T1] Faulting instruction address: 0xc0000000008fc300
>     [ 0.467507][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
>     […]

I do intend to do further analysis at some point over the next few days
to see if I can figure out exactly why the commit that I mentioned
above fixes the issue; then we can look into what we should do about it
in the kernel sources.

Cheers,
Nathan
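Aside on the banner check in the message above: the linker that actually produced a kernel is recorded in its "Linux version" banner, which is how the stale LLD 13.0.0 was spotted. A minimal sketch of pulling that field out of a banner string follows. `banner_linker` is a hypothetical helper name, not part of any existing tool, and the patterns only cover the two banner shapes quoted in this thread (LLD and GNU ld).

```shell
# Hypothetical helper (not an existing tool): print the linker named in a
# "Linux version ..." banner. Only the two shapes seen in this thread are
# handled: ", LLD x.y.z)" and ", GNU ld version ...)".
banner_linker() {
    printf '%s\n' "$1" | sed -n \
        -e 's/.*, \(LLD [0-9][0-9.]*\)).*/\1/p' \
        -e 's/.*, \(GNU ld version [^)]*\)).*/\1/p'
}

# Banner copied from the boot log quoted in this thread.
banner='Linux version 5.17.0-rc5-00178-ga4b9a8fb20e7 (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2), LLD 13.0.0) #29 SMP PREEMPT Mon Feb 21 10:58:54 CET 2022'

banner_linker "$banner"   # prints: LLD 13.0.0
```

On a running system the same helper could probably be fed the contents of /proc/version, which should carry the same banner; that is an assumption about the kernel in use, not something tested here.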
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) 2022-02-21 15:29 ` Nathan Chancellor @ 2022-02-21 17:33 ` Paul Menzel 2022-04-19 21:34 ` Nathan Chancellor 1 sibling, 0 replies; 17+ messages in thread From: Paul Menzel @ 2022-02-21 17:33 UTC (permalink / raw) To: Nathan Chancellor Cc: Zhouyi Zhou, Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev, Nick Desaulniers, llvm, Fangrui Song Dear Nathan, Am 21.02.22 um 16:29 schrieb Nathan Chancellor: > On Mon, Feb 21, 2022 at 12:17:40PM +0100, Paul Menzel wrote: >> Am 17.02.22 um 02:16 schrieb Nathan Chancellor: >> >>> On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote: >>>> [Cc: +LLVM/clang build support folks] >> >> […] >> >>>> To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm* >>>> and *clang* 1:13.0-53~exp1 >>>> >>>> $ clang --version >>>> Ubuntu clang version 13.0.0-2 >>>> Target: powerpc64le-unknown-linux-gnu >>>> Thread model: posix >>>> InstalledDir: /usr/bin >>>> >>>> results in a segmentation fault, while it works when building with GCC. >>>> >>>> $ gcc --version >>>> gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0 >>> >>> Thank you for keying us in. I am going to have a bit of a brain dump >>> here based on the information I have uncovered after a couple of hours >>> of debugging. >>> >>> TL;DR: It seems like something is broken with __read_mostly + ld.lld >>> before 14.0.0. >>> >>> My initial reproduction steps (boot-qemu.sh comes from >>> https://github.com/ClangBuiltLinux/boot-utils): >>> >>> $ clang --version >>> clang version 13.0.1 (Fedora 13.0.1-1.fc37) >>> Target: x86_64-redhat-linux-gnu >>> Thread model: posix >>> InstalledDir: /usr/bin >>> >>> $ powerpc64le-linux-gnu-as --version >>> GNU assembler version 2.37-2.fc36 >>> Copyright (C) 2021 Free Software Foundation, Inc. >>> This program is free software; you may redistribute it under the terms of >>> the GNU General Public License version 3 or later. 
>>> […]
>> >> $ git clone --depth=1 https://github.com/llvm/llvm-project.git >> $ cd llvm-project/ >> $ git log --oneline >> 41cb504b7 [mlir][linalg][bufferize][NFC] Move interface impl to Linalg Transforms >> $ mkdir build >> $ cd build >> $ cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles" > > Since this is something related to ld.lld, not clang, this should be: > > ... -DLLVM_ENABLE_PROJECTS="clang;lld" ... > >> -DCMAKE_BUILD_TYPE=Release -DLLVM_INSTALL_UTILS=ON >> -DCMAKE_INSTALL_PREFIX=/scratch/local2/llvm ../llvm >> $ make -j20 >> $ make -j20 clang-check > > You can also do 'check-lld' if you want. > >> $ make install >> $ /scratch/local2/llvm/bin/clang --version >> clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2) >> Target: powerpc64le-unknown-linux-gnu >> Thread model: posix >> InstalledDir: /scratch/local2/llvm/bin >> >> Then build Linux after `make clean` with `/scratch/local2/llvm/bin` in the >> path. >> >> $ LLVM=1 LLVM_IAS=0 eatmydata make -j20 >> >> $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=1 -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m 512 -kernel /dev/shm/linux/vmlinux -append "debug_boot_weak_hash panic=-1 console=ttyS0 rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_self_test=1 rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1" >> […] >> Preparing to boot Linux version 5.17.0-rc5-00178-ga4b9a8fb20e7 (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (clang version 15.0.0 (https://github.com/llvm/llvm-project.git 41cb504b7c4b18ac15830107431a0c1eec73a6b2), LLD 13.0.0) #29 SMP PREEMPT Mon > > ^ still using ld.lld 13.0.0. 
>
> If you want to test the master branch, I would check out LLVM at
> 460830a9c664e8cce959c660648faa7747ad8bdc, as the next commit introduces
> a boot regression unrelated to this issue:
>
> https://github.com/ClangBuiltLinux/linux/issues/1581
>
> That should at least confirm this is resolved in a newer release.

Sorry for failing to update ld.lld. Indeed, with the commit you mentioned,
the segmentation fault is gone.

    $ /scratch/local2/llvm/bin/ld.lld --version
    LLD 14.0.0 (compatible with GNU linkers)

>> Feb 21 10:58:54 CET 2022
>>     […]
>>     [ 0.465889][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
>>     [ 0.466749][ T1] Faulting instruction address: 0xc0000000008fc300
>>     [ 0.467507][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
>>     […]
>
> I do intend to do further analysis at some point over the next few days
> to see if I can figure out exactly why the commit that I mentioned
> above fixes the issue; then we can look into what we should do about it
> in the kernel sources.

Awesome. Thank you for working on that.


Kind regards,

Paul
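Aside: the version check that settled this can be scripted. The sketch below assumes, as the TL;DR earlier in the thread suggests but does not prove, that ld.lld releases before 14 are the affected ones. `lld_before_14` is a hypothetical helper name, and it only parses the `LLD <major>.` shape shown in the `ld.lld --version` output above.

```shell
# Hypothetical helper: succeed (exit 0) when a `ld.lld --version` line reports
# a major version below 14. The cutoff is the thread's working assumption,
# not an official LLVM statement.
lld_before_14() {
    major=$(printf '%s\n' "$1" | sed -n 's/.*LLD \([0-9][0-9]*\)\..*/\1/p')
    [ -n "$major" ] && [ "$major" -lt 14 ]
}

# Version line copied from the message above.
if lld_before_14 "LLD 14.0.0 (compatible with GNU linkers)"; then
    echo "linker predates 14: may be affected"
else
    echo "linker is 14 or newer, or not LLD"
fi
```

In practice the argument would come from the linker itself, for example `lld_before_14 "$(ld.lld --version)"`, run with the same PATH that the kernel build uses.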
* Re: BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) 2022-02-21 15:29 ` Nathan Chancellor 2022-02-21 17:33 ` Paul Menzel @ 2022-04-19 21:34 ` Nathan Chancellor 1 sibling, 0 replies; 17+ messages in thread From: Nathan Chancellor @ 2022-04-19 21:34 UTC (permalink / raw) To: Paul Menzel Cc: Zhouyi Zhou, Paul E. McKenney, Josh Triplett, rcu, LKML, David S. Miller, Jakub Kicinski, netdev, Nick Desaulniers, llvm, Fangrui Song On Mon, Feb 21, 2022 at 08:29:46AM -0700, Nathan Chancellor wrote: > Hi Paul, > > On Mon, Feb 21, 2022 at 12:17:40PM +0100, Paul Menzel wrote: > > Am 17.02.22 um 02:16 schrieb Nathan Chancellor: > > > > > On Wed, Feb 16, 2022 at 02:19:51PM +0100, Paul Menzel wrote: > > > > [Cc: +LLVM/clang build support folks] > > > > […] > > > > > > To recap: On a ppc64le machine, building Linux in Ubuntu 21.10 with *llvm* > > > > and *clang* 1:13.0-53~exp1 > > > > > > > > $ clang --version > > > > Ubuntu clang version 13.0.0-2 > > > > Target: powerpc64le-unknown-linux-gnu > > > > Thread model: posix > > > > InstalledDir: /usr/bin > > > > > > > > results in a segmentation fault, while it works when building with GCC. > > > > > > > > $ gcc --version > > > > gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0 > > > > > > Thank you for keying us in. I am going to have a bit of a brain dump > > > here based on the information I have uncovered after a couple of hours > > > of debugging. > > > > > > TL;DR: It seems like something is broken with __read_mostly + ld.lld > > > before 14.0.0. > > > > > > My initial reproduction steps (boot-qemu.sh comes from > > > https://github.com/ClangBuiltLinux/boot-utils): > > > > > > $ clang --version > > > clang version 13.0.1 (Fedora 13.0.1-1.fc37) > > > Target: x86_64-redhat-linux-gnu > > > Thread model: posix > > > InstalledDir: /usr/bin > > > > > > $ powerpc64le-linux-gnu-as --version > > > GNU assembler version 2.37-2.fc36 > > > Copyright (C) 2021 Free Software Foundation, Inc. 
> > > This program is free software; you may redistribute it under the terms of > > > the GNU General Public License version 3 or later. > > > This program has absolutely no warranty. > > > This assembler was configured for a target of `powerpc64le-linux-gnu'. > > > > > > $ curl -LSso .config https://lore.kernel.org/all/f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de/3-linux-5.17-rc4-rcu-dev-config.txt > > > > > > $ scripts/config --set-val INITRAMFS_SOURCE '""' > > > > > > $ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 LLVM_IAS=0 all > > > > > > $ boot-qemu.sh -a ppc64le -k . -t 45s > > > QEMU location: /usr/bin > > > > > > QEMU version: QEMU emulator version 6.2.0 (qemu-6.2.0-5.fc37) > > > > > > + timeout --foreground 45s stdbuf -oL -eL qemu-system-ppc64 -initrd \ > > > /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/rootfs.cpio -device \ > > > ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 -L \ > > > /home/nathan/cbl/github/boot-utils-ro/images/ppc64le/ -bios skiboot.lid \ > > > -machine powernv8 -display none -kernel \ > > > /home/nathan/cbl/src/linux/arch/powerpc/boot/zImage.epapr -m 2G \ > > > -nodefaults -serial mon:stdio > > > ... 
> > > [ 1.478028][ T1] BUG: Kernel NULL pointer dereference on read at 0x00000000
> > > [ 1.478630][ T1] Faulting instruction address: 0xc00000000090bee0
> > > [ 1.479521][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
> > > [ 1.480036][ T1] LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=16 NUMA PowerNV
> > > [ 1.480853][ T1] Modules linked in:
> > > [ 1.481265][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-00001-gfa15c7cb550f #1
> > > [ 1.481967][ T1] NIP: c00000000090bee0 LR: c000000000d96b60 CTR: c0000000000d5b4c
> > > [ 1.482596][ T1] REGS: c000000007443330 TRAP: 0380 Not tainted (5.17.0-rc4-00001-gfa15c7cb550f)
> > > [ 1.483305][ T1] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 22800a87 XER: 00000000
> > > [ 1.484277][ T1] CFAR: c000000000d96b5c IRQMASK: 0
> > > [ 1.484277][ T1] GPR00: c000000000d96b54 c0000000074435d0 c0000000028bc600 0000000000000000
> > > [ 1.484277][ T1] GPR04: ffffffffffffffff ffffffffff1ea558 ffffffffff1ebfe4 c00000000261ae88
> > > [ 1.484277][ T1] GPR08: 0000000000000003 0000000000000004 c00000000261ae88 0000000000000000
> > > [ 1.484277][ T1] GPR12: 0000000000800000 c000000002a60000 c000000000012518 0000000000000000
> > > [ 1.484277][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > [ 1.484277][ T1] GPR20: 0000000000000000 c0000000027bff80 0000000000000cc0 0000000000000000
> > > [ 1.484277][ T1] GPR24: 0000000000000010 0000000000000000 0000000000000000 0000000000000000
> > > [ 1.484277][ T1] GPR28: c0000000028fcfd8 c000000007b83000 0000000000000000 c0000000074435d0
> > > [ 1.490325][ T1] NIP [c00000000090bee0] strlen+0x10/0x30
> > > [ 1.490788][ T1] LR [c000000000d96b60] if_nlmsg_size+0x2b0/0x390
> > > [ 1.491319][ T1] Call Trace:
> > > [ 1.491573][ T1] [c0000000074435d0] [c000000000d96b54] if_nlmsg_size+0x2a4/0x390 (unreliable)
> > > [ 1.492291][ T1] [c000000007443680] [c000000000d96790] rtmsg_ifinfo_build_skb+0x80/0x1a0
> > > [ 1.492958][ T1] [c000000007443740] [c000000000d97590] rtmsg_ifinfo+0x70/0xd0
> > > [ 1.493559][ T1] [c000000007443790] [c000000000d7d528] register_netdevice+0x5d8/0x670
> > > [ 1.494205][ T1] [c000000007443820] [c000000000d7d94c] register_netdev+0x4c/0x80
> > > [ 1.494823][ T1] [c000000007443850] [c000000000f826d8] sit_init_net+0x1b8/0x200
> > > [ 1.495426][ T1] [c0000000074438d0] [c000000000d63b5c] ops_init+0x14c/0x1c0
> > > [ 1.496014][ T1] [c000000007443930] [c000000000d6314c] register_pernet_operations+0xec/0x1e0
> > > [ 1.496716][ T1] [c000000007443990] [c000000000d633d0] register_pernet_device+0x60/0xd0
> > > [ 1.497372][ T1] [c0000000074439e0] [c000000002085194] sit_init+0x54/0x160
> > > [ 1.497950][ T1] [c000000007443a70] [c000000000011c58] do_one_initcall+0x108/0x3e0
> > > [ 1.498573][ T1] [c000000007443c70] [c000000002006190] do_initcall_level+0xe4/0x1c4
> > > [ 1.499219][ T1] [c000000007443cc0] [c00000000200604c] do_initcalls+0x84/0xe4
> > > [ 1.499799][ T1] [c000000007443d40] [c000000002005da8] kernel_init_freeable+0x160/0x1ec
> > > [ 1.500444][ T1] [c000000007443da0] [c00000000001254c] kernel_init+0x3c/0x270
> > > [ 1.501042][ T1] [c000000007443e10] [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64
> > > [ 1.501721][ T1] Instruction dump:
> > > [ 1.502202][ T1] eb81ffe0 7c0803a6 4e800020 00000000 00000000 00000000 60000000 60000000
> > > [ 1.502934][ T1] 3883ffff 60000000 60000000 60000000 <8ca40001> 28050000 4082fff8 7c632050
> > > [ 1.504028][ T1] ---[ end trace 0000000000000000 ]---
> > > ...
> > >
> > > First thing was figuring out where the NULL pointer dereference happens,
> > > which appears to be the "strlen(ops->kind)" in rtnl_link_get_size():
> > >
> > > static size_t rtnl_link_get_size(const struct net_device *dev)
> > > {
> > > 	const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > > 	size_t size;
> > >
> > > 	if (!ops)
> > > 		return 0;
> > >
> > > 	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > 	       nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> > >
> > > which I confirmed with some really rudimentary printk debugging:
> > >
> > > [ 1.476862][ T1] nathan: rtnl_link_get_size(): name: sit0, ops: c0000000028fcfd8, ops->kind: (null)
> > >
> > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > > index 710da8a36729..c8d928e83aec 100644
> > > --- a/net/core/rtnetlink.c
> > > +++ b/net/core/rtnetlink.c
> > > @@ -520,6 +520,9 @@ static size_t rtnl_link_get_size(const struct net_device *dev)
> > >  	if (!ops)
> > >  		return 0;
> > > +	pr_err("nathan: %s(): name: %s, ops: %px, ops->kind: %s\n", __func__,
> > > +	       dev->name, ops, ops->kind);
> > > +
> > >  	size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > >  	       nla_total_size(strlen(ops->kind) + 1); /* IFLA_INFO_KIND */
> > >
> > > Okay... how did sit0 end up with a NULL kind...? It is very clearly
> > > defined as "sit":
> > >
> > > static struct rtnl_link_ops sit_link_ops __read_mostly = {
> > > 	.kind = "sit",
> > >
> > > Adding some more debug prints to net/ipv6/sit.c:
> > >
> > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > > index c0b138c20992..7b9edbed2fcd 100644
> > > --- a/net/ipv6/sit.c
> > > +++ b/net/ipv6/sit.c
> > > @@ -1920,6 +1920,12 @@ static int __net_init sit_init_net(struct net *net)
> > >  	 */
> > >  	sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > +	pr_err("nathan: %s(): &sit_link_ops: %px\n", __func__, &sit_link_ops);
> > > +	pr_err("nathan: %s(): sit_link_ops.kind: %s\n", __func__, sit_link_ops.kind);
> > > +	pr_err("nathan: %s(): sit_link_ops.maxtype: %u\n", __func__, sit_link_ops.maxtype);
> > > +	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops: %px\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops);
> > > +	pr_err("nathan: %s(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: %s\n", __func__, sitn->fb_tunnel_dev->rtnl_link_ops->kind);
> > > +
> > >  	err = register_netdev(sitn->fb_tunnel_dev);
> > >  	if (err)
> > >  		goto err_reg_dev;
> > >
> > > reveals:
> > >
> > > [ 1.471920][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028fcfd8
> > > [ 1.472534][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: (null)
> > > [ 1.473088][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> > > [ 1.473639][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028fcfd8
> > > [ 1.474370][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: (null)
> > >
> > > This is super bizarre, as the maxtype member appears to have the correct
> > > value, but how is kind's initial value getting dropped on the floor?
> > >
> > > Removing the __read_mostly annotation "fixes" it:
> > >
> > > [ 1.481708][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000027d3f60
> > > [ 1.482319][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
> > > [ 1.482878][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> > > [ 1.483429][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000027d3f60
> > > [ 1.484174][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
> > > ...
> > > Linux version 5.17.0-rc4-00001-g956f02ad5c31-dirty (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), LLD 13.0.1) #2 SMP PREEMPT Wed Feb 16 13:29:49 MST 2022
> > > ...
> > >
> > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > > index 7b9edbed2fcd..f109c7a0233b 100644
> > > --- a/net/ipv6/sit.c
> > > +++ b/net/ipv6/sit.c
> > > @@ -70,7 +70,7 @@ static void ipip6_tunnel_setup(struct net_device *dev);
> > >  static void ipip6_dev_free(struct net_device *dev);
> > >  static bool check_6rd(struct ip_tunnel *tunnel, const struct in6_addr *v6dst,
> > >  		      __be32 *v4dst);
> > > -static struct rtnl_link_ops sit_link_ops __read_mostly;
> > > +static struct rtnl_link_ops sit_link_ops;
> > >  static unsigned int sit_net_id __read_mostly;
> > >  struct sit_net {
> > > @@ -1827,7 +1827,7 @@ static void ipip6_dellink(struct net_device *dev, struct list_head *head)
> > >  		unregister_netdevice_queue(dev, head);
> > >  }
> > > -static struct rtnl_link_ops sit_link_ops __read_mostly = {
> > > +static struct rtnl_link_ops sit_link_ops = {
> > >  	.kind = "sit",
> > >  	.maxtype = IFLA_IPTUN_MAX,
> > >  	.policy = ipip6_policy,
> > >
> > > Switching to ld.bfd also resolves it:
> > >
> > > [ 1.470405][ T1] sit: nathan: sit_init_net(): &sit_link_ops: c0000000028acfd8
> > > [ 1.471016][ T1] sit: nathan: sit_init_net(): sit_link_ops.kind: sit
> > > [ 1.471534][ T1] sit: nathan: sit_init_net(): sit_link_ops.maxtype: 20
> > > [ 1.472062][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops: c0000000028acfd8
> > > [ 1.472790][ T1] sit: nathan: sit_init_net(): sitn->fb_tunnel_dev->rtnl_link_ops->kind: sit
> > > ...
> > > Linux version 5.17.0-rc4-00001-g956f02ad5c31 (nathan@dev-fedora.archlinux-ax161) (clang version 13.0.1 (Fedora 13.0.1-1.fc37), GNU ld version 2.37-2.fc36) #3 SMP PREEMPT Wed Feb 16 13:33:42 MST 2022
> > > ...
> > >
> > > I tested with ToT LLVM (or at least, close to it, since there is an
> > > unrelated ld.lld regression there) and I could not reproduce it there,
> > > so I did a reverse bisect to see what commit fixes this issue in LLVM 14
> > > and I landed on:
> > >
> > > commit 55c14d6dbfd8e7b86c15d2613fea3490078e2ae4
> > > Author: Fangrui Song <i@maskray.me>
> > > Date: Thu Nov 25 14:12:34 2021 -0800
> > >
> > > [ELF] Simplify DynamicSection content computation. NFC
> > >
> > > The new code computes the content twice, but avoides the tricky
> > > std::function<uint64_t()>. Removed 13KiB code in a Release build.
> > >
> > > lld/ELF/SyntheticSections.cpp | 117 ++++++++++++++++--------------------------
> > > lld/ELF/SyntheticSections.h | 12 +----
> > > 2 files changed, 44 insertions(+), 85 deletions(-)
> > >
> > > That's... interesting, given that commit title says No Functional
> > > Change, even though there clearly is one. That commit has a couple
> > > mentions of PowerPC synthetic sections, so it is possible that the
> > > new content calculation lines up with ld.bfd?
> > >
> > > I am not really sure where to go from here, as I don't fully understand
> > > what the problem was before that LLD change. I'll see if I can do some
> > > more investigation tomorrow (unless someone wants to beat me to it ;)
> >
> > Thank you for looking into this, and sharing your analysis.
> >
> > I built LLVM/clang from the master branch, rebuilt, but can still reproduce
> > this.
> >
> > $ git clone --depth=1 https://github.com/llvm/llvm-project.git
> > $ cd llvm-project/
> > $ git log --oneline
> > 41cb504b7 [mlir][linalg][bufferize][NFC] Move interface impl to Linalg
> > Transforms
> > $ mkdir build
> > $ cd build
> > $ cmake -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles"
>
> Since this is something related to ld.lld, not clang, this should be:
>
> ... -DLLVM_ENABLE_PROJECTS="clang;lld" ...
>
> > -DCMAKE_BUILD_TYPE=Release -DLLVM_INSTALL_UTILS=ON
> > -DCMAKE_INSTALL_PREFIX=/scratch/local2/llvm ../llvm
> > $ make -j20
> > $ make -j20 clang-check
>
> You can also do 'check-lld' if you want.
>
> > $ make install
> > $ /scratch/local2/llvm/bin/clang --version
> > clang version 15.0.0 (https://github.com/llvm/llvm-project.git
> > 41cb504b7c4b18ac15830107431a0c1eec73a6b2)
> > Target: powerpc64le-unknown-linux-gnu
> > Thread model: posix
> > InstalledDir: /scratch/local2/llvm/bin
> >
> > Then build Linux after `make clean` with `/scratch/local2/llvm/bin` in the
> > path.
> >
> > $ LLVM=1 LLVM_IAS=0 eatmydata make -j20
> >
> > $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=1 -net
> > none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m
> > 512 -kernel /dev/shm/linux/vmlinux -append "debug_boot_weak_hash panic=-1
> > console=ttyS0 rcupdate.rcu_cpu_stall_suppress_at_boot=1
> > torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000
> > rcupdate.rcu_self_test=1 rcutorture.onoff_interval=1000
> > rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4
> > rcutorture.stat_interval=15 rcutorture.shutdown_secs=420
> > rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"
> > […]
> > Preparing to boot Linux version 5.17.0-rc5-00178-ga4b9a8fb20e7
> > (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (clang version
> > 15.0.0 (https://github.com/llvm/llvm-project.git
> > 41cb504b7c4b18ac15830107431a0c1eec73a6b2), LLD 13.0.0) #29 SMP PREEMPT Mon
>
> ^ still using ld.lld 13.0.0.
>
> If you want to test the master branch, I would checkout LLVM at
> 460830a9c664e8cce959c660648faa7747ad8bdc, as the next commit introduces
> a boot regression unrelated to this issue:
>
> https://github.com/ClangBuiltLinux/linux/issues/1581
>
> That should at least confirm this is resolved in a newer release.
>
> > Feb 21 10:58:54 CET 2022
> > […]
> > [ 0.465889][ T1] BUG: Kernel NULL pointer dereference on read at
> > 0x00000000
> > [ 0.466749][ T1] Faulting instruction address: 0xc0000000008fc300
> > [ 0.467507][ T1] Oops: Kernel access of bad area, sig: 11 [#1]
> > […]
>
> I do intend to do further analysis at some point over the next few days
> to see if I can figure out exactly why that commit that I mentioned
> above fixes the issue then we can look into what we should do about it
> in the kernel sources.

Sorry for taking so long to get back to this.

For me, commit d79976918852 ("powerpc/64: Add UADDR64 relocation
support") resolves this for ld.lld 13.x. I have started a separate
thread about whether or not this commit is suitable for stable,
specifically 5.17 and 5.15:

https://lore.kernel.org/Yl8pNxSGUgeHZ1FT@dev-arch.thelio-3990X/

Cheers,
Nathan
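
One way to picture why a relocation fix could account for the earlier
observation (maxtype correct, kind NULL) is the userspace sketch below.
It is only an illustration under assumptions the thread does not
confirm: that the .kind pointer is filled in at boot by a dynamic
relocation, and that the boot-time relocator skipped the relocation type
the linker chose for that slot. All names (mini_link_ops, reloc,
apply_relocs, REL_RELATIVE, REL_UADDR64) are hypothetical stand-ins, not
kernel or ELF definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for two dynamic relocation kinds (not real ELF values). */
enum reloc_type { REL_RELATIVE, REL_UADDR64 };

struct reloc {
	size_t offset;        /* byte offset of the 8-byte slot inside the image */
	enum reloc_type type;
	uint64_t addend;      /* link-time address the slot should point at */
};

/* Hypothetical miniature of struct rtnl_link_ops: one pointer slot, one scalar. */
struct mini_link_ops {
	uint64_t kind;        /* pointer slot, only valid after relocation */
	uint32_t maxtype;     /* plain constant, stored directly in the image */
};

/*
 * Apply relocations roughly the way an early-boot relocator might.
 * When handle_uaddr64 is zero, REL_UADDR64 entries are skipped and
 * their slots keep whatever the image already held (zero here).
 */
static void apply_relocs(unsigned char *image, const struct reloc *relocs,
			 size_t count, uint64_t load_bias, int handle_uaddr64)
{
	assert(image != NULL);

	for (size_t i = 0; i < count; i++) {
		if (relocs[i].type == REL_UADDR64 && !handle_uaddr64)
			continue;

		uint64_t value = load_bias + relocs[i].addend;

		/* memcpy copes with slots that are not 8-byte aligned. */
		memcpy(image + relocs[i].offset, &value, sizeof(value));
	}
}
```

With handle_uaddr64 set to 0 the pointer slot stays zero while maxtype
keeps its value, which has the same shape as the debug output quoted
above; with it set to 1 the slot receives load_bias + addend.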
end of thread, other threads:[~2022-04-19 21:34 UTC | newest]

Thread overview: 17+ messages
2022-01-25 19:13 BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb) Paul Menzel
2022-01-26 9:47 ` Zhouyi Zhou
2022-01-29 2:23 ` Zhouyi Zhou
2022-01-29 16:52 ` Paul Menzel
2022-01-30 0:21 ` Zhouyi Zhou
2022-01-30 8:19 ` Paul Menzel
2022-01-30 13:24 ` Zhouyi Zhou
2022-01-30 17:44 ` Paul E. McKenney
2022-01-31 1:08 ` Zhouyi Zhou
2022-02-01 17:50 ` Paul E. McKenney
2022-02-02 2:39 ` Zhouyi Zhou
2022-02-08 20:10 ` Zhouyi Zhou
[not found] ` <f41550c7-26c0-cf81-7de9-aa924434a565@molgen.mpg.de>
2022-02-17 1:16 ` Nathan Chancellor
2022-02-21 11:17 ` Paul Menzel
2022-02-21 15:29 ` Nathan Chancellor
2022-02-21 17:33 ` Paul Menzel
2022-04-19 21:34 ` Nathan Chancellor