* [OOPS] In __netif_receive_skb_core @ 2016-01-07 17:54 Ivaylo Dimitrov 2016-01-10 17:48 ` Ivaylo Dimitrov 0 siblings, 1 reply; 17+ messages in thread From: Ivaylo Dimitrov @ 2016-01-07 17:54 UTC (permalink / raw) To: noureddine, davem Cc: Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap Hi, Trying to bring up the modem interface on Nokia n900 device with recent linux, leads to the following kernel oops (read from the mtdoops device): <6>[ 144.009765] bq27xxx-battery 2-0055: battery is not calibrated! ignoring capacity values <5>[ 145.\x1188964] ssi-protocol ssi-protocol: WAKELINES TEST OK <6>[ 145.194793] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes ready <1>[ 145.358154] Unable to handle kernel NULL pointer dereference at virtual address 0000005c <1>[ 145.366821] pgd = ce530000 <1>[ 145.369689] [0000005c] *pgd=8e44c831, *pte=00000000, *ppte=00000000 <0>[ 145.376373] Internal error: Oops: 17 [#1] PREEMPT ARM <4>[ 145.381683] Modules linked in: cmt_speech nokia_modem ssi_protocol sha256_generic hmac drbg ansi_cprng ctr ccm rfcomm sd_mod scsi_mod bnep"bluetooth omaplfb pvrsrvkm ipv6 bq2415x_charger uinput hsi_char radio_platform_si4713 joydev omap_ssi_port video_bus_switch arc4 wl1251_spi gpio_keys isp1704_charger wl1251 mac80211 smc91x mii cfg80211 omap_wdt omap_sham crc7 tsc2005 tsc200x_core si4713 bq27xxx_battery leds_lp5523 leds_lp55xx_common adp1653 tsl2563 rtc_twl twl4030_wdt et8ek8 ad5820 v4l2_common smiaregs videodev twl4030_vibra ff_memless media lis3lv02d_i2c lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal thermal_sys hwmon rx51_battery <4>[ 145.441802] CPU: 0 PID: 1040 Comm: csd Not tainted 4.4.0-rc7+ #2 <4>[ 145.448120] Hardware name: Nokia RX-51 board <4>[ 145.452636] task: ce517700 ti: cef50000 task.ti: cef50000 <4>[ 145.458343] PC is at __netif_receive_skb_core+0x7c0/0x92c <4>[ 145.464050] LR is at sock_queue_rcv_skb+0x208/0x214 <4>[ 145.469207] pc : [<c0393ebc>] lr : [<c03852f4>] psr: 00 00113 <4>[ 145.469207] sp : cef51e98 ip : 15800000 fp : c3a4a780 <4>[ 145.481292] r10: c3a2005c r9 : 00000000 r8 : c3987834 <4>[ 145.486816] r7 : 0000f500 r6 : c3a20000 r5 : c3a20048 r4 : c3987780 <4>[ 145.493682] r3 : 00000000 r2 : 00000002 r1 : 00000000 r0 : 00000000 <4>[ 145.500579] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none <4>[ 145.508087] Control: 10c5387d Table: 8e530019 DAC: 00000051 <0>[ 145.514160] Process csd (pid: 1040, stack limit = 0xcef50210) <0>[ 145.520202] Stack: (0xcef51e98 to 0xcef52000) <0>[ 145.524810] 1e80: cf334000 cf3ccd90 <0>[ 145.533447] 1ea0: 00000002 c3987780 c0651400 c3a5507c c3912000 c0068534 c3a5507c 40000113 <0>[ 145.542083] 1ec0: ffffffff bf3b5fac bf3b5d88 ffffffff 00000000 c0675f80 c3987780 00000000 <0>[ 145.550720] 1ee0: c0675f74 c0675f48 cef51f18 00000040 cef51f20 c03965f4 c0396580 c0675f80 <0>[ 145.559356] 1f00: 00000001 0000012c 00000040 c0676217 ffffc399 c0396d48 cef51f18 cef51f18 <0>[ 145.567993] 1f20: cef51f20 cef51f20 cf334000 00000008 c0677240 00000008 c0677200 c067724c <0>[ 145.576629] 1f40: 00000100 00400100 c0440be4 c0030dec cf807f00 fa2000cc 00000003 00000003 <0>[ 145.585266] 1f60: ffffc398 00000004 10c53c7d 00000000 00000000 cf802000 00000001 10c53c7d <0>[ 145.593902] 1f80: b6ea4f44 00001684 bee8e7dc c003119c 00000000 c005b44c cef51fb0 b6e1b024 <0>[ 145.602539] 1fa0: a0000010 ffffffff 10c5387d c043c610 b6ea71bc 00000000 00000000 00002f48 <0>[ 145.611175] 1fc0: 00000000 b6ea71bc 00002f48 b6ebb684 b6eba000 b6ea4f44 00001684 bee8e7dc <0>[ 145.619812] 1fe0: 00019817 bee8e788 b6e19d0c b6e1b024 a0000010 ffffffff 00000000 00000000 <4>[ 145.628648] [<c0393ebc>] (__netif_receive_skb_core) from [<c03965f4>] (process_backlog+0x74/0xf4) <4>[ 145.637817] [<c03965f4>] (process_backlog) from [<c0396d48>] (net_rx_action+0xd0/0x284) <4>[ 145.646301] [<c0396d48>] (net_rx_action) from [<c0030dec>] (__do_softirq+0xb0/0x208) <4>[ 145.654479] [<c0030dec>] (__do_softirq) from [<c003119c>] (irq_exit+0x80/0xe4) <4>[ 145.662109] [<c003119c>] (irq_exit) from [<c005b44c>] (__handle_domain_irq+0x88/0xa8) <4>[ 145.670379] [<c005b44c>] (__handle_domain_irq) from [<c043c610>] (__irq_usr+0x50/0x80) <0>[ 145.678741] Code: e59d400c e5943014 e1530006 0a000025 (e593505c) <4>[ 145.685241] ---[ end trace 17f822c9893a7c21 ]--- <0>[ 145.691864] Kernel pan)c - not syncing: Fatal exception in interrupt I tracked the problem down to the commit <7866a621043fbaca3d7389e9b9f69dd1a2e5a855> ("dev: add per net_device packet type chains"). After reverting that commit, the oops no longer appear. Userspace on Nokia N900 talks to the modem via phonet interface. Please advice on how to proceed to fix the problem and if there is anything else I can provide. Regards, Ivo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [OOPS] In __netif_receive_skb_core 2016-01-07 17:54 [OOPS] In __netif_receive_skb_core Ivaylo Dimitrov @ 2016-01-10 17:48 ` Ivaylo Dimitrov 2016-01-10 20:26 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Ivaylo Dimitrov @ 2016-01-10 17:48 UTC (permalink / raw) To: noureddine, davem Cc: Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap Anyone? On 7.01.2016 19:54, Ivaylo Dimitrov wrote: > Hi, > > Trying to bring up the modem interface on Nokia n900 device with recent > linux, leads to the following kernel oops (read from the mtdoops device): > > <6>[ 144.009765] bq27xxx-battery 2-0055: battery is not calibrated! > ignoring capacity values > <5>[ 145.\x1188964] ssi-protocol ssi-protocol: WAKELINES TEST OK > <6>[ 145.194793] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes > ready > <1>[ 145.358154] Unable to handle kernel NULL pointer dereference at > virtual address 0000005c > <1>[ 145.366821] pgd = ce530000 > <1>[ 145.369689] [0000005c] *pgd=8e44c831, *pte=00000000, *ppte=00000000 > <0>[ 145.376373] Internal error: Oops: 17 [#1] PREEMPT ARM > <4>[ 145.381683] Modules linked in: cmt_speech nokia_modem ssi_protocol > sha256_generic hmac drbg ansi_cprng ctr ccm rfcomm sd_mod scsi_mod > bnep"bluetooth omaplfb pvrsrvkm ipv6 bq2415x_charger uinput hsi_char > radio_platform_si4713 joydev omap_ssi_port video_bus_switch arc4 > wl1251_spi gpio_keys isp1704_charger wl1251 mac80211 smc91x mii cfg80211 > omap_wdt omap_sham crc7 tsc2005 tsc200x_core si4713 bq27xxx_battery > leds_lp5523 leds_lp55xx_common adp1653 tsl2563 rtc_twl twl4030_wdt > et8ek8 ad5820 v4l2_common smiaregs videodev twl4030_vibra ff_memless > media lis3lv02d_i2c lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal > thermal_sys hwmon rx51_battery > <4>[ 145.441802] CPU: 0 PID: 1040 Comm: csd Not tainted 4.4.0-rc7+ #2 > <4>[ 145.448120] Hardware name: Nokia RX-51 board > <4>[ 145.452636] task: ce517700 ti: cef50000 task.ti: cef50000 > <4>[ 145.458343] PC is at __netif_receive_skb_core+0x7c0/0x92c > <4>[ 145.464050] LR is at sock_queue_rcv_skb+0x208/0x214 > <4>[ 145.469207] pc : [<c0393ebc>] lr : [<c03852f4>] psr: 00 00113 > <4>[ 145.469207] sp : cef51e98 ip : 15800000 fp : c3a4a780 > <4>[ 145.481292] r10: c3a2005c r9 : 00000000 r8 : c3987834 > <4>[ 145.486816] r7 : 0000f500 r6 : c3a20000 r5 : c3a20048 r4 : > c3987780 > <4>[ 145.493682] r3 : 00000000 r2 : 00000002 r1 : 00000000 r0 : > 00000000 > <4>[ 145.500579] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM > Segment none > <4>[ 145.508087] Control: 10c5387d Table: 8e530019 DAC: 00000051 > <0>[ 145.514160] Process csd (pid: 1040, stack limit = 0xcef50210) > <0>[ 145.520202] Stack: (0xcef51e98 to 0xcef52000) > <0>[ 145.524810] 1e80: cf334000 cf3ccd90 > <0>[ 145.533447] 1ea0: 00000002 c3987780 c0651400 c3a5507c c3912000 > c0068534 c3a5507c 40000113 > <0>[ 145.542083] 1ec0: ffffffff bf3b5fac bf3b5d88 ffffffff 00000000 > c0675f80 c3987780 00000000 > <0>[ 145.550720] 1ee0: c0675f74 c0675f48 cef51f18 00000040 cef51f20 > c03965f4 c0396580 c0675f80 > <0>[ 145.559356] 1f00: 00000001 0000012c 00000040 c0676217 ffffc399 > c0396d48 cef51f18 cef51f18 > <0>[ 145.567993] 1f20: cef51f20 cef51f20 cf334000 00000008 c0677240 > 00000008 c0677200 c067724c > <0>[ 145.576629] 1f40: 00000100 00400100 c0440be4 c0030dec cf807f00 > fa2000cc 00000003 00000003 > <0>[ 145.585266] 1f60: ffffc398 00000004 10c53c7d 00000000 00000000 > cf802000 00000001 10c53c7d > <0>[ 145.593902] 1f80: b6ea4f44 00001684 bee8e7dc c003119c 00000000 > c005b44c cef51fb0 b6e1b024 > <0>[ 145.602539] 1fa0: a0000010 ffffffff 10c5387d c043c610 b6ea71bc > 00000000 00000000 00002f48 > <0>[ 145.611175] 1fc0: 00000000 b6ea71bc 00002f48 b6ebb684 b6eba000 > b6ea4f44 00001684 bee8e7dc > <0>[ 145.619812] 1fe0: 00019817 bee8e788 b6e19d0c b6e1b024 a0000010 > ffffffff 00000000 00000000 > <4>[ 145.628648] [<c0393ebc>] (__netif_receive_skb_core) from > [<c03965f4>] (process_backlog+0x74/0xf4) > <4>[ 145.637817] [<c03965f4>] (process_backlog) from [<c0396d48>] > (net_rx_action+0xd0/0x284) > <4>[ 145.646301] [<c0396d48>] (net_rx_action) from [<c0030dec>] > (__do_softirq+0xb0/0x208) > <4>[ 145.654479] [<c0030dec>] (__do_softirq) from [<c003119c>] > (irq_exit+0x80/0xe4) > <4>[ 145.662109] [<c003119c>] (irq_exit) from [<c005b44c>] > (__handle_domain_irq+0x88/0xa8) > <4>[ 145.670379] [<c005b44c>] (__handle_domain_irq) from [<c043c610>] > (__irq_usr+0x50/0x80) > <0>[ 145.678741] Code: e59d400c e5943014 e1530006 0a000025 (e593505c) > <4>[ 145.685241] ---[ end trace 17f822c9893a7c21 ]--- > <0>[ 145.691864] Kernel pan)c - not syncing: Fatal exception in interrupt > > I tracked the problem down to the commit > <7866a621043fbaca3d7389e9b9f69dd1a2e5a855> ("dev: add per net_device > packet type chains"). After reverting that commit, the oops no longer > appear. > > Userspace on Nokia N900 talks to the modem via phonet interface. > > Please advice on how to proceed to fix the problem and if there is > anything else I can provide. > > Regards, > Ivo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [OOPS] In __netif_receive_skb_core 2016-01-10 17:48 ` Ivaylo Dimitrov @ 2016-01-10 20:26 ` Eric Dumazet 2016-01-11 21:03 ` Ivaylo Dimitrov 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2016-01-10 20:26 UTC (permalink / raw) To: Ivaylo Dimitrov Cc: noureddine, davem, Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap On Sun, 2016-01-10 at 19:48 +0200, Ivaylo Dimitrov wrote: > Anyone? > > On 7.01.2016 19:54, Ivaylo Dimitrov wrote: > > Hi, > > > > Trying to bring up the modem interface on Nokia n900 device with recent > > linux, leads to the following kernel oops (read from the mtdoops device): > > > > <6>[ 144.009765] bq27xxx-battery 2-0055: battery is not calibrated! > > ignoring capacity values > > <5>[ 145.\x1188964] ssi-protocol ssi-protocol: WAKELINES TEST OK > > <6>[ 145.194793] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes > > ready > > <1>[ 145.358154] Unable to handle kernel NULL pointer dereference at > > virtual address 0000005c > > <1>[ 145.366821] pgd = ce530000 > > <1>[ 145.369689] [0000005c] *pgd=8e44c831, *pte=00000000, *ppte=00000000 > > <0>[ 145.376373] Internal error: Oops: 17 [#1] PREEMPT ARM > > <4>[ 145.381683] Modules linked in: cmt_speech nokia_modem ssi_protocol > > sha256_generic hmac drbg ansi_cprng ctr ccm rfcomm sd_mod scsi_mod > > bnep"bluetooth omaplfb pvrsrvkm ipv6 bq2415x_charger uinput hsi_char > > radio_platform_si4713 joydev omap_ssi_port video_bus_switch arc4 > > wl1251_spi gpio_keys isp1704_charger wl1251 mac80211 smc91x mii cfg80211 > > omap_wdt omap_sham crc7 tsc2005 tsc200x_core si4713 bq27xxx_battery > > leds_lp5523 leds_lp55xx_common adp1653 tsl2563 rtc_twl twl4030_wdt > > et8ek8 ad5820 v4l2_common smiaregs videodev twl4030_vibra ff_memless > > media lis3lv02d_i2c lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal > > thermal_sys hwmon rx51_battery > > <4>[ 145.441802] CPU: 0 PID: 1040 Comm: csd Not tainted 4.4.0-rc7+ #2 > > <4>[ 145.448120] Hardware name: Nokia RX-51 board > > <4>[ 145.452636] task: ce517700 ti: cef50000 task.ti: cef50000 > > <4>[ 145.458343] PC is at __netif_receive_skb_core+0x7c0/0x92c > > <4>[ 145.464050] LR is at sock_queue_rcv_skb+0x208/0x214 > > <4>[ 145.469207] pc : [<c0393ebc>] lr : [<c03852f4>] psr: 00 00113 > > <4>[ 145.469207] sp : cef51e98 ip : 15800000 fp : c3a4a780 > > <4>[ 145.481292] r10: c3a2005c r9 : 00000000 r8 : c3987834 > > <4>[ 145.486816] r7 : 0000f500 r6 : c3a20000 r5 : c3a20048 r4 : > > c3987780 > > <4>[ 145.493682] r3 : 00000000 r2 : 00000002 r1 : 00000000 r0 : > > 00000000 > > <4>[ 145.500579] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM > > Segment none > > <4>[ 145.508087] Control: 10c5387d Table: 8e530019 DAC: 00000051 > > <0>[ 145.514160] Process csd (pid: 1040, stack limit = 0xcef50210) > > <0>[ 145.520202] Stack: (0xcef51e98 to 0xcef52000) > > <0>[ 145.524810] 1e80: cf334000 cf3ccd90 > > <0>[ 145.533447] 1ea0: 00000002 c3987780 c0651400 c3a5507c c3912000 > > c0068534 c3a5507c 40000113 > > <0>[ 145.542083] 1ec0: ffffffff bf3b5fac bf3b5d88 ffffffff 00000000 > > c0675f80 c3987780 00000000 > > <0>[ 145.550720] 1ee0: c0675f74 c0675f48 cef51f18 00000040 cef51f20 > > c03965f4 c0396580 c0675f80 > > <0>[ 145.559356] 1f00: 00000001 0000012c 00000040 c0676217 ffffc399 > > c0396d48 cef51f18 cef51f18 > > <0>[ 145.567993] 1f20: cef51f20 cef51f20 cf334000 00000008 c0677240 > > 00000008 c0677200 c067724c > > <0>[ 145.576629] 1f40: 00000100 00400100 c0440be4 c0030dec cf807f00 > > fa2000cc 00000003 00000003 > > <0>[ 145.585266] 1f60: ffffc398 00000004 10c53c7d 00000000 00000000 > > cf802000 00000001 10c53c7d > > <0>[ 145.593902] 1f80: b6ea4f44 00001684 bee8e7dc c003119c 00000000 > > c005b44c cef51fb0 b6e1b024 > > <0>[ 145.602539] 1fa0: a0000010 ffffffff 10c5387d c043c610 b6ea71bc > > 00000000 00000000 00002f48 > > <0>[ 145.611175] 1fc0: 00000000 b6ea71bc 00002f48 b6ebb684 b6eba000 > > b6ea4f44 00001684 bee8e7dc > > <0>[ 145.619812] 1fe0: 00019817 bee8e788 b6e19d0c b6e1b024 a0000010 > > ffffffff 00000000 00000000 > > <4>[ 145.628648] [<c0393ebc>] (__netif_receive_skb_core) from > > [<c03965f4>] (process_backlog+0x74/0xf4) > > <4>[ 145.637817] [<c03965f4>] (process_backlog) from [<c0396d48>] > > (net_rx_action+0xd0/0x284) > > <4>[ 145.646301] [<c0396d48>] (net_rx_action) from [<c0030dec>] > > (__do_softirq+0xb0/0x208) > > <4>[ 145.654479] [<c0030dec>] (__do_softirq) from [<c003119c>] > > (irq_exit+0x80/0xe4) > > <4>[ 145.662109] [<c003119c>] (irq_exit) from [<c005b44c>] > > (__handle_domain_irq+0x88/0xa8) > > <4>[ 145.670379] [<c005b44c>] (__handle_domain_irq) from [<c043c610>] > > (__irq_usr+0x50/0x80) > > <0>[ 145.678741] Code: e59d400c e5943014 e1530006 0a000025 (e593505c) > > <4>[ 145.685241] ---[ end trace 17f822c9893a7c21 ]--- > > <0>[ 145.691864] Kernel pan)c - not syncing: Fatal exception in interrupt > > > > I tracked the problem down to the commit > > <7866a621043fbaca3d7389e9b9f69dd1a2e5a855> ("dev: add per net_device > > packet type chains"). After reverting that commit, the oops no longer > > appear. > > > > Userspace on Nokia N900 talks to the modem via phonet interface. > > > > Please advice on how to proceed to fix the problem and if there is > > anything else I can provide. > > I do not see anything wrong with this commit. It must uncover a prior bug. Seems to be a phonet bug not reacting to NETDEV_DOWN maybe ? diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c index a58680016472..fd2f44940bd7 100644 --- a/net/phonet/pn_dev.c +++ b/net/phonet/pn_dev.c @@ -63,11 +63,11 @@ struct phonet_device_list *phonet_device_list(struct net *net) static struct phonet_device *__phonet_device_alloc(struct net_device *dev) { struct phonet_device_list *pndevs = phonet_device_list(dev_net(dev)); - struct phonet_device *pnd = kmalloc(sizeof(*pnd), GFP_ATOMIC); - if (pnd == NULL) + struct phonet_device *pnd = kzalloc(sizeof(*pnd), GFP_KERNEL); + + if (!pnd) return NULL; pnd->netdev = dev; - bitmap_zero(pnd->addrs, 64); BUG_ON(!mutex_is_locked(&pndevs->lock)); list_add_rcu(&pnd->list, &pndevs->list); @@ -117,7 +117,7 @@ static void phonet_device_destroy(struct net_device *dev) for_each_set_bit(addr, pnd->addrs, 64) phonet_address_notify(RTM_DELADDR, dev, addr); - kfree(pnd); + kfree_rcu(pnd, rcu); } } @@ -301,6 +301,7 @@ static int phonet_device_notify(struct notifier_block *me, unsigned long what, if (dev->type == ARPHRD_PHONET) phonet_device_autoconf(dev); break; + case NETDEV_DOWN: case NETDEV_UNREGISTER: phonet_device_destroy(dev); phonet_route_autodel(dev); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [OOPS] In __netif_receive_skb_core 2016-01-10 20:26 ` Eric Dumazet @ 2016-01-11 21:03 ` Ivaylo Dimitrov 2016-01-11 22:11 ` Salam Noureddine 0 siblings, 1 reply; 17+ messages in thread From: Ivaylo Dimitrov @ 2016-01-11 21:03 UTC (permalink / raw) To: Eric Dumazet Cc: noureddine, davem, Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap On 10.01.2016 22:26, Eric Dumazet wrote: > > I do not see anything wrong with this commit. > > It must uncover a prior bug. > Could be, but it was working fine in 3.19. > Seems to be a phonet bug not reacting to NETDEV_DOWN maybe ? > > > diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c > index a58680016472..fd2f44940bd7 100644 > --- a/net/phonet/pn_dev.c > +++ b/net/phonet/pn_dev.c The patch makes difference: <3>[ 118.943542] et8ek8 3-003e: could not get clock <6>[ 119.028045] IPv6: ADDRCONF(NETDEV_UP): phonet0: link is not ready <4>[ 120.945922] sched: RT throttling activated <5>[ 158.556091] ssi-protocol ssi-protocol: WAKELINES TEST OK <6>[ 158.561828] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes ready <1>[ 158.708831] Unable to handle kernel NULL pointer dereference at virtual address 0000005c <1>[ 158.717498] pgd = ce548000 <1>[ 158.720367] [0000005c] *pgd=8e53a831, *pte=00000000, *ppte=00000000 <0>[ 158.727020] Internal error: Oops: 17 [#1] PREEMPT ARM <4>[ 158.732330] Modules linked in: cmt_speech nokia_modem ssi_protocol vfat fat rfcomm sd_mod scsi_mod bnep bluetooth omaplfb pvrsrvkm ipv6 bq2415x_charger uinput hsi_char radio_platform_si4713 joydev omap_ssi_port video_bus_switch arc4 wl1251_spi wl1251 isp1704_charger gpio_keys mac80211 smc91x mii cfg80211 omap_wdt omap_sham crc7 tsc2005 tsc200x_core si4713 leds_lp5523 leds_lp55xx_common adp1653 bq27xxx_battery tsl2563 twl4030_wdt rtc_twl et8ek8 smiaregs ad5820 v4l2_common twl4030_vibra ff_memless videodev lis3lv02d_i2c media lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal thermal_sys hwmon rx51_battery <4>[ 158.789154] CPU: 0 PID: 1038 Comm: csd Not tainted 4.4.0-rc7+ #14 <4>[ 158.795562] Hardware name: Nokia RX-51 board <4>[ 158.800048] task: ce4d5900 ti: ce4e4000 task.ti: ce4e4000 <4>[ 158.805725] PC is at __netif_receive_skb_core+0x7c0/0x92c <4>[ 158.811401] LR is at sock_queue_rcv_skb+0x208/0x214 <4>[ 158.816528] pc : [<c0394218>] lr : [<c0385648>] psr: 00000113 <4>[ 158.816528] sp : ce4e5e98 ip : 14400000 fp : cd4ab380 <4>[ 158.828582] r10: c249405c r9 : 00000000 r8 : c24b2d74 <4>[ 158.834045] r7 : 0000f500 r6 : c2494000 r5 : c2494048 r4 : c24b2cc0 <4>[ 158.840911] r3 : 00000000 r2 : 00000002 r1 : 00000000 r0 : 00000000 <4>[ 158.847778] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none <4>[ 158.855285] Control: 10c5387d Table: 8e548019 DAC: 00000051 <0>[ 158.861297] Process csd (pid: 1038, stack limit = 0xce4e4210) <0>[ 158.867340] Stack: (0xce4e5e98 to 0xce4e6000) <0>[ 158.871917] 5e80: cf3ab000 cf2e4d90 <0>[ 158.880554] 5ea0: 00000002 c24b2cc0 c06534dc c243b67c c5dcae00 c00685ac c243b67c 40000113 <0>[ 158.889129] 5ec0: ffffffff bf3b0fac bf3b0d88 ffffffff 00000000 c06780c0 c24b2cc0 00000000 <0>[ 158.897735] 5ee0: c06780b4 c0678088 ce4e5f18 00000040 ce4e5f20 c0396950 c03968dc c06780c0 <0>[ 158.906341] 5f00: 00000001 0000012c 00000040 c0678357 ffffc8d0 c03970a4 ce4e5f18 ce4e5f18 <0>[ 158.914947] 5f20: ce4e5f20 ce4e5f20 cf3ab000 00000008 c0679380 00000008 c0679340 c067938c <0>[ 158.923522] 5f40: 00000100 00400100 c0441be8 c0030e64 cf807f00 fa2000cc 00000003 00000003 <0>[ 158.932128] 5f60: ffffc8cf 00000004 10c53c7d 00000000 00000000 cf802000 00000001 10c53c7d <0>[ 158.940734] 5f80: 00001a30 0000026c beba476c c0031214 00000000 c005b4c4 ce4e5fb0 b6e3f82c <0>[ 158.949310] 5fa0: 40000010 ffffffff 10c5387d c043c990 00000014 00000000 0001ef78 00000014 <0>[ 158.957946] 5fc0: 0001edf8 00000001 00000000 00000000 00000000 00001a30 0000026c beba476c <0>[ 158.966522] 5fe0: b6e4d82c beba4748 b6e3f82c b6e3f82c 40000010 ffffffff ddee3fd6 e7f5d5dd <4>[ 158.975158] [<c0394218>] (__netif_receive_skb_core) from [<c0396950>] (process_backlog+0x74/0xf4) <4>[ 158.984497] [<c0396950>] (process_backlog) from [<c03970a4>] (net_rx_action+0xd0/0x284) <4>[ 158.992919] [<c03970a4>] (net_rx_action) from [<c0030e64>] (__do_softirq+0xb0/0x208) <4>[ 159.001037] [<c0030e64>] (__do_softirq) from [<c0031214>] (irq_exit+0x80/0xe4) <4>[ 159.008636] [<c0031214>] (irq_exit) from [<c005b4c4>] (__handle_domain_irq+0x88/0xa8) <4>[ 159.016906] [<c005b4c4>] (__handle_domain_irq) from [<c043c990>] (__irq_usr+0x50/0x80) <0>[ 159.025207] Code: e59d400c e5943014 e1530006 0a000025 (e593505c) <4>[ 159.031677] ---[ end trace c52d1e36f2d07d67 ]--- <0>[ 159.038146] Kernel panic - not syncing: Fatal exception in interrupt --[ end trace c52d1e36f2d07d67 ]--- Could you provide some hints on where to put some printks or dump some data to get a clearer picture on what happens? Thanks, Ivo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [OOPS] In __netif_receive_skb_core 2016-01-11 21:03 ` Ivaylo Dimitrov @ 2016-01-11 22:11 ` Salam Noureddine 2016-01-12 0:51 ` Ivaylo Dimitrov 0 siblings, 1 reply; 17+ messages in thread From: Salam Noureddine @ 2016-01-11 22:11 UTC (permalink / raw) To: Ivaylo Dimitrov Cc: Eric Dumazet, David S. Miller, Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap Would you be able to disassemble your kernel so we could tell where the null pointer dereference happens? Thanks, Salam On Mon, Jan 11, 2016 at 1:03 PM, Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> wrote: > > > On 10.01.2016 22:26, Eric Dumazet wrote: >> >> >> I do not see anything wrong with this commit. >> >> It must uncover a prior bug. >> > > Could be, but it was working fine in 3.19. > >> Seems to be a phonet bug not reacting to NETDEV_DOWN maybe ? >> >> >> diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c >> index a58680016472..fd2f44940bd7 100644 >> --- a/net/phonet/pn_dev.c >> +++ b/net/phonet/pn_dev.c > > > The patch makes difference: > > <3>[ 118.943542] et8ek8 3-003e: could not get clock > <6>[ 119.028045] IPv6: ADDRCONF(NETDEV_UP): phonet0: link is not ready > <4>[ 120.945922] sched: RT throttling activated > <5>[ 158.556091] ssi-protocol ssi-protocol: WAKELINES TEST OK > <6>[ 158.561828] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes ready > <1>[ 158.708831] Unable to handle kernel NULL pointer dereference at > virtual address 0000005c > <1>[ 158.717498] pgd = ce548000 > <1>[ 158.720367] [0000005c] *pgd=8e53a831, *pte=00000000, *ppte=00000000 > <0>[ 158.727020] Internal error: Oops: 17 [#1] PREEMPT ARM > <4>[ 158.732330] Modules linked in: cmt_speech nokia_modem ssi_protocol > vfat fat rfcomm sd_mod scsi_mod bnep bluetooth omaplfb pvrsrvkm ipv6 > bq2415x_charger uinput hsi_char radio_platform_si4713 joydev omap_ssi_port > video_bus_switch arc4 wl1251_spi wl1251 isp1704_charger gpio_keys mac80211 > smc91x mii cfg80211 omap_wdt omap_sham crc7 tsc2005 tsc200x_core si4713 > leds_lp5523 leds_lp55xx_common adp1653 bq27xxx_battery tsl2563 twl4030_wdt > rtc_twl et8ek8 smiaregs ad5820 v4l2_common twl4030_vibra ff_memless videodev > lis3lv02d_i2c media lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal > thermal_sys hwmon rx51_battery > <4>[ 158.789154] CPU: 0 PID: 1038 Comm: csd Not tainted 4.4.0-rc7+ #14 > <4>[ 158.795562] Hardware name: Nokia RX-51 board > <4>[ 158.800048] task: ce4d5900 ti: ce4e4000 task.ti: ce4e4000 > <4>[ 158.805725] PC is at __netif_receive_skb_core+0x7c0/0x92c > <4>[ 158.811401] LR is at sock_queue_rcv_skb+0x208/0x214 > <4>[ 158.816528] pc : [<c0394218>] lr : [<c0385648>] psr: 00000113 > <4>[ 158.816528] sp : ce4e5e98 ip : 14400000 fp : cd4ab380 > <4>[ 158.828582] r10: c249405c r9 : 00000000 r8 : c24b2d74 > <4>[ 158.834045] r7 : 0000f500 r6 : c2494000 r5 : c2494048 r4 : c24b2cc0 > <4>[ 158.840911] r3 : 00000000 r2 : 00000002 r1 : 00000000 r0 : 00000000 > <4>[ 158.847778] Flags: nzcv IRQs on FIQs on Mode SVC_32 ISA ARM > Segment none > <4>[ 158.855285] Control: 10c5387d Table: 8e548019 DAC: 00000051 > <0>[ 158.861297] Process csd (pid: 1038, stack limit = 0xce4e4210) > <0>[ 158.867340] Stack: (0xce4e5e98 to 0xce4e6000) > <0>[ 158.871917] 5e80: cf3ab000 cf2e4d90 > <0>[ 158.880554] 5ea0: 00000002 c24b2cc0 c06534dc c243b67c c5dcae00 > c00685ac c243b67c 40000113 > <0>[ 158.889129] 5ec0: ffffffff bf3b0fac bf3b0d88 ffffffff 00000000 > c06780c0 c24b2cc0 00000000 > <0>[ 158.897735] 5ee0: c06780b4 c0678088 ce4e5f18 00000040 ce4e5f20 > c0396950 c03968dc c06780c0 > <0>[ 158.906341] 5f00: 00000001 0000012c 00000040 c0678357 ffffc8d0 > c03970a4 ce4e5f18 ce4e5f18 > <0>[ 158.914947] 5f20: ce4e5f20 ce4e5f20 cf3ab000 00000008 c0679380 > 00000008 c0679340 c067938c > <0>[ 158.923522] 5f40: 00000100 00400100 c0441be8 c0030e64 cf807f00 > fa2000cc 00000003 00000003 > <0>[ 158.932128] 5f60: ffffc8cf 00000004 10c53c7d 00000000 00000000 > cf802000 00000001 10c53c7d > <0>[ 158.940734] 5f80: 00001a30 0000026c beba476c c0031214 00000000 > c005b4c4 ce4e5fb0 b6e3f82c > <0>[ 158.949310] 5fa0: 40000010 ffffffff 10c5387d c043c990 00000014 > 00000000 0001ef78 00000014 > <0>[ 158.957946] 5fc0: 0001edf8 00000001 00000000 00000000 00000000 > 00001a30 0000026c beba476c > <0>[ 158.966522] 5fe0: b6e4d82c beba4748 b6e3f82c b6e3f82c 40000010 > ffffffff ddee3fd6 e7f5d5dd > <4>[ 158.975158] [<c0394218>] (__netif_receive_skb_core) from [<c0396950>] > (process_backlog+0x74/0xf4) > <4>[ 158.984497] [<c0396950>] (process_backlog) from [<c03970a4>] > (net_rx_action+0xd0/0x284) > <4>[ 158.992919] [<c03970a4>] (net_rx_action) from [<c0030e64>] > (__do_softirq+0xb0/0x208) > <4>[ 159.001037] [<c0030e64>] (__do_softirq) from [<c0031214>] > (irq_exit+0x80/0xe4) > <4>[ 159.008636] [<c0031214>] (irq_exit) from [<c005b4c4>] > (__handle_domain_irq+0x88/0xa8) > <4>[ 159.016906] [<c005b4c4>] (__handle_domain_irq) from [<c043c990>] > (__irq_usr+0x50/0x80) > <0>[ 159.025207] Code: e59d400c e5943014 e1530006 0a000025 (e593505c) > <4>[ 159.031677] ---[ end trace c52d1e36f2d07d67 ]--- > <0>[ 159.038146] Kernel panic - not syncing: Fatal exception in interrupt > --[ end trace c52d1e36f2d07d67 ]--- > > Could you provide some hints on where to put some printks or dump some data > to get a clearer picture on what happens? > > Thanks, > Ivo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [OOPS] In __netif_receive_skb_core 2016-01-11 22:11 ` Salam Noureddine @ 2016-01-12 0:51 ` Ivaylo Dimitrov 2016-01-12 1:06 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Ivaylo Dimitrov @ 2016-01-12 0:51 UTC (permalink / raw) To: Salam Noureddine Cc: Eric Dumazet, David S. Miller, Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap On 12.01.2016 00:11, Salam Noureddine wrote: > Would you be able to disassemble your kernel so we could tell where > the null pointer dereference happens? > Sure, but wouldn't it be better to provide the object file containing the debug symbols as well? Otherwise, the null pointer dereference happens somewhere in: (gdb) l *__netif_receive_skb_core+0x7c0 0x1318 is in __netif_receive_skb_core (include/linux/compiler.h:218). 213 }) 214 215 static __always_inline 216 void __read_once_size(const volatile void *p, void *res, int size) 217 { 218 __READ_ONCE_SIZE; 219 } 220 221 #ifdef CONFIG_KASAN 222 /* (gdb) l *__netif_receive_skb_core+0x7bc 0x1314 is in __netif_receive_skb_core (net/core/dev.c:3934). 3929 } 3930 3931 deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type, 3932 &orig_dev->ptype_specific); 3933 3934 if (unlikely(skb->dev != orig_dev)) { 3935 deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type, 3936 &skb->dev->ptype_specific); 3937 } 3938 (gdb) l *__netif_receive_skb_core+0x7c4 0x131c is in __netif_receive_skb_core (net/core/dev.c:3935). 3930 3931 deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type, 3932 &orig_dev->ptype_specific); 3933 3934 if (unlikely(skb->dev != orig_dev)) { 3935 deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type, 3936 &skb->dev->ptype_specific); 3937 } 3938 3939 if (pt_prev) { 0x00001300 <+1960>: cmp r10, r3 0x00001304 <+1964>: bne 0x1284 <__netif_receive_skb_core+1836> 0x00001308 <+1968>: ldr r4, [sp, #12] 0x0000130c <+1972>: ldr r3, [r4, #20] 0x00001310 <+1976>: cmp r3, r6 0x00001314 <+1980>: beq 0x13b0 <__netif_receive_skb_core+2136> 0x00001318 <+1984>: ldr r5, [r3, #92] ; 0x5c <-FAULT r3 seems to be skb->dev 0x0000131c <+1988>: add r10, r3, #92 ; 0x5c 0x00001320 <+1992>: add r8, r4, #180 ; 0xb4 0x00001324 <+1996>: sub r5, r5, #20 0x00001328 <+2000>: b 0x13a4 <__netif_receive_skb_core+2124> 0x0000132c <+2004>: ldrh r3, [r5] 0x00001330 <+2008>: cmp r3, r7 I put some additional printks around that code, and it turned out that skb->dev is null, so "if (unlikely(skb->dev != orig_dev))" succeeds, but "&skb->dev->ptype_specific" oopses. Thanks, Ivo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [OOPS] In __netif_receive_skb_core 2016-01-12 0:51 ` Ivaylo Dimitrov @ 2016-01-12 1:06 ` Eric Dumazet 2016-01-12 1:19 ` Salam Noureddine 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2016-01-12 1:06 UTC (permalink / raw) To: Ivaylo Dimitrov Cc: Salam Noureddine, David S. Miller, Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap On Tue, 2016-01-12 at 02:51 +0200, Ivaylo Dimitrov wrote: > > I put some additional printks around that code, and it turned out that > skb->dev is null, so "if (unlikely(skb->dev != orig_dev))" succeeds, but > "&skb->dev->ptype_specific" oopses. Nice find ! Now lets find what possibly called netif_rx() with skb->dev == NULL (This is illegal) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [OOPS] In __netif_receive_skb_core 2016-01-12 1:06 ` Eric Dumazet @ 2016-01-12 1:19 ` Salam Noureddine 2016-01-12 2:21 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Salam Noureddine @ 2016-01-12 1:19 UTC (permalink / raw) To: Eric Dumazet Cc: Ivaylo Dimitrov, David S. Miller, Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap It must be that skb->dev was changed to NULL inside of __netif_receive_skb_core, otherwise we would have crashed much earlier. Also, orig_dev is saved at the beginning. Possibly a device is layered on top of the original device. On Mon, Jan 11, 2016 at 5:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2016-01-12 at 02:51 +0200, Ivaylo Dimitrov wrote: > >> >> I put some additional printks around that code, and it turned out that >> skb->dev is null, so "if (unlikely(skb->dev != orig_dev))" succeeds, but >> "&skb->dev->ptype_specific" oopses. > > Nice find ! > > Now lets find what possibly called netif_rx() with skb->dev == NULL > > (This is illegal) > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [OOPS] In __netif_receive_skb_core 2016-01-12 1:19 ` Salam Noureddine @ 2016-01-12 2:21 ` Eric Dumazet 2016-01-12 2:25 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2016-01-12 2:21 UTC (permalink / raw) To: Salam Noureddine Cc: Ivaylo Dimitrov, David S. Miller, Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap On Mon, 2016-01-11 at 17:19 -0800, Salam Noureddine wrote: > It must be that skb->dev was changed to NULL inside of > __netif_receive_skb_core, otherwise we would have crashed much > earlier. Also, orig_dev is saved at the beginning. Possibly a device > is layered on top of the original device. Please do not top post on netdev / lkml mailing lists. My guess is a protocol handler queued the skb without calling skb_share_check() ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [OOPS] In __netif_receive_skb_core 2016-01-12 2:21 ` Eric Dumazet @ 2016-01-12 2:25 ` Eric Dumazet 2016-01-12 7:16 ` Ivaylo Dimitrov 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2016-01-12 2:25 UTC (permalink / raw) To: Salam Noureddine Cc: Ivaylo Dimitrov, David S. Miller, Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap On Mon, 2016-01-11 at 18:21 -0800, Eric Dumazet wrote: > On Mon, 2016-01-11 at 17:19 -0800, Salam Noureddine wrote: > > It must be that skb->dev was changed to NULL inside of > > __netif_receive_skb_core, otherwise we would have crashed much > > earlier. Also, orig_dev is saved at the beginning. Possibly a device > > is layered on top of the original device. > > Please do not top post on netdev / lkml mailing lists. > > My guess is a protocol handler queued the skb without calling > skb_share_check() > OK please try this fix : diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c index 10d42f3220ab..f925753668a7 100644 --- a/net/phonet/af_phonet.c +++ b/net/phonet/af_phonet.c @@ -377,6 +377,10 @@ static int phonet_rcv(struct sk_buff *skb, struct net_device *dev, struct sockaddr_pn sa; u16 len; + skb = skb_share_check(skb, GFP_ATOMIC); + if (!skb) + return NET_RX_DROP; + /* check we have at least a full Phonet header */ if (!pskb_pull(skb, sizeof(struct phonethdr))) goto out; ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [OOPS] In __netif_receive_skb_core 2016-01-12 2:25 ` Eric Dumazet @ 2016-01-12 7:16 ` Ivaylo Dimitrov 2016-01-12 14:19 ` Eric Dumazet 2016-01-12 18:15 ` [OOPS] In __netif_receive_skb_core Salam Noureddine 0 siblings, 2 replies; 17+ messages in thread From: Ivaylo Dimitrov @ 2016-01-12 7:16 UTC (permalink / raw) To: Eric Dumazet Cc: Salam Noureddine, David S. Miller, Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap On 12.01.2016 04:25, Eric Dumazet wrote: > On Mon, 2016-01-11 at 18:21 -0800, Eric Dumazet wrote: >> On Mon, 2016-01-11 at 17:19 -0800, Salam Noureddine wrote: >>> It must be that skb->dev was changed to NULL inside of >>> __netif_receive_skb_core, otherwise we would have crashed much >>> earlier. Also, orig_dev is saved at the beginning. Possibly a device >>> is layered on top of the original device. Exactly (skb->dev was changed to NULL ...). Do you think it makes sense to put printks in various places in __netif_receive_skb_core to see after which function call skb->dev turns into NULL? > > OK please try this fix : > > diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c > index 10d42f3220ab..f925753668a7 100644 > --- a/net/phonet/af_phonet.c > +++ b/net/phonet/af_phonet.c > @@ -377,6 +377,10 @@ static int phonet_rcv(struct sk_buff *skb, struct net_device *dev, > struct sockaddr_pn sa; > u16 len; > > + skb = skb_share_check(skb, GFP_ATOMIC); > + if (!skb) > + return NET_RX_DROP; > + > /* check we have at least a full Phonet header */ > if (!pskb_pull(skb, sizeof(struct phonethdr))) > goto out; > > That one fixes the oops, though I wonder if your previous patch is needed (I reverted it before testing the current). Unfortunately I don't have SIM card around to test GPRS connection with, will do it as soon as I find one and will report. Thanks, Ivo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [OOPS] In __netif_receive_skb_core 2016-01-12 7:16 ` Ivaylo Dimitrov @ 2016-01-12 14:19 ` Eric Dumazet 2016-01-12 16:58 ` [PATCH net] phonet: properly unshare skbs in phonet_rcv() Eric Dumazet 2016-01-12 18:15 ` [OOPS] In __netif_receive_skb_core Salam Noureddine 1 sibling, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2016-01-12 14:19 UTC (permalink / raw) To: Ivaylo Dimitrov Cc: Salam Noureddine, David S. Miller, Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap On Tue, 2016-01-12 at 09:16 +0200, Ivaylo Dimitrov wrote: > > On 12.01.2016 04:25, Eric Dumazet wrote: > > > > OK please try this fix : > > > > diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c > > index 10d42f3220ab..f925753668a7 100644 > > --- a/net/phonet/af_phonet.c > > +++ b/net/phonet/af_phonet.c > > @@ -377,6 +377,10 @@ static int phonet_rcv(struct sk_buff *skb, struct net_device *dev, > > struct sockaddr_pn sa; > > u16 len; > > > > + skb = skb_share_check(skb, GFP_ATOMIC); > > + if (!skb) > > + return NET_RX_DROP; > > + > > /* check we have at least a full Phonet header */ > > if (!pskb_pull(skb, sizeof(struct phonethdr))) > > goto out; > > > > > > That one fixes the oops, though I wonder if your previous patch is > needed (I reverted it before testing the current). Unfortunately I don't > have SIM card around to test GPRS connection with, will do it as soon as > I find one and will report. Well, this bug in phonet_rcv() is rather obvious, I have no idea why nobody got crashes or corruptions before today. I'll send a formal patch. Thanks for your help ! ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH net] phonet: properly unshare skbs in phonet_rcv() 2016-01-12 14:19 ` Eric Dumazet @ 2016-01-12 16:58 ` Eric Dumazet 2016-01-12 20:47 ` David Miller 2016-01-13 12:26 ` Rémi Denis-Courmont 0 siblings, 2 replies; 17+ messages in thread From: Eric Dumazet @ 2016-01-12 16:58 UTC (permalink / raw) To: Ivaylo Dimitrov, Remi Denis-Courmont Cc: Salam Noureddine, David S. Miller, Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap From: Eric Dumazet <edumazet@google.com> Ivaylo Dimitrov reported a regression caused by commit 7866a621043f ("dev: add per net_device packet type chains"). skb->dev becomes NULL and we crash in __netif_receive_skb_core(). Before above commit, different kind of bugs or corruptions could happen without major crash. But the root cause is that phonet_rcv() can queue skb without checking if skb is shared or not. Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests. Reported-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Tested-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Remi Denis-Courmont <courmisch@gmail.com> --- net/phonet/af_phonet.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c index 10d42f3220ab..f925753668a7 100644 --- a/net/phonet/af_phonet.c +++ b/net/phonet/af_phonet.c @@ -377,6 +377,10 @@ static int phonet_rcv(struct sk_buff *skb, struct net_device *dev, struct sockaddr_pn sa; u16 len; + skb = skb_share_check(skb, GFP_ATOMIC); + if (!skb) + return NET_RX_DROP; + /* check we have at least a full Phonet header */ if (!pskb_pull(skb, sizeof(struct phonethdr))) goto out; ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH net] phonet: properly unshare skbs in phonet_rcv() 2016-01-12 16:58 ` [PATCH net] phonet: properly unshare skbs in phonet_rcv() Eric Dumazet @ 2016-01-12 20:47 ` David Miller 2016-01-13 12:26 ` Rémi Denis-Courmont 1 sibling, 0 replies; 17+ messages in thread From: David Miller @ 2016-01-12 20:47 UTC (permalink / raw) To: eric.dumazet Cc: ivo.g.dimitrov.75, courmisch, noureddine, pali.rohar, netdev, linux-kernel, sre, linux-omap From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 12 Jan 2016 08:58:00 -0800 > From: Eric Dumazet <edumazet@google.com> > > Ivaylo Dimitrov reported a regression caused by commit 7866a621043f > ("dev: add per net_device packet type chains"). > > skb->dev becomes NULL and we crash in __netif_receive_skb_core(). > > Before above commit, different kind of bugs or corruptions could happen > without major crash. > > But the root cause is that phonet_rcv() can queue skb without checking > if skb is shared or not. > > Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests. > > Reported-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> > Tested-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Remi Denis-Courmont <courmisch@gmail.com> Applied and queued up for -stable, th anks Eric. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net] phonet: properly unshare skbs in phonet_rcv() 2016-01-12 16:58 ` [PATCH net] phonet: properly unshare skbs in phonet_rcv() Eric Dumazet 2016-01-12 20:47 ` David Miller @ 2016-01-13 12:26 ` Rémi Denis-Courmont 2016-01-13 15:07 ` Eric Dumazet 1 sibling, 1 reply; 17+ messages in thread From: Rémi Denis-Courmont @ 2016-01-13 12:26 UTC (permalink / raw) To: Eric Dumazet; +Cc: Network Development Le 2016-01-12 18:58, Eric Dumazet a écrit : > From: Eric Dumazet <edumazet@google.com> > > Ivaylo Dimitrov reported a regression caused by commit 7866a621043f > ("dev: add per net_device packet type chains"). > > skb->dev becomes NULL and we crash in __netif_receive_skb_core(). > > Before above commit, different kind of bugs or corruptions could > happen > without major crash. Hmm... was that always a problem then, or did it get introduced earlier? I thought it was impossible for skb to be shared on PF-recv callback way back. > But the root cause is that phonet_rcv() can queue skb without > checking > if skb is shared or not. > > Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests. > > Reported-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> > Tested-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Remi Denis-Courmont <courmisch@gmail.com> > --- > net/phonet/af_phonet.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c > index 10d42f3220ab..f925753668a7 100644 > --- a/net/phonet/af_phonet.c > +++ b/net/phonet/af_phonet.c > @@ -377,6 +377,10 @@ static int phonet_rcv(struct sk_buff *skb, > struct net_device *dev, > struct sockaddr_pn sa; > u16 len; > > + skb = skb_share_check(skb, GFP_ATOMIC); > + if (!skb) > + return NET_RX_DROP; > + > /* check we have at least a full Phonet header */ > if (!pskb_pull(skb, sizeof(struct phonethdr))) > goto out; Ack, thanks. -- Rémi Denis-Courmont http://www.remlab.net/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net] phonet: properly unshare skbs in phonet_rcv() 2016-01-13 12:26 ` Rémi Denis-Courmont @ 2016-01-13 15:07 ` Eric Dumazet 0 siblings, 0 replies; 17+ messages in thread From: Eric Dumazet @ 2016-01-13 15:07 UTC (permalink / raw) To: Rémi Denis-Courmont; +Cc: Network Development On Wed, 2016-01-13 at 14:26 +0200, Rémi Denis-Courmont wrote: > Le 2016-01-12 18:58, Eric Dumazet a écrit : > > From: Eric Dumazet <edumazet@google.com> > > > > Ivaylo Dimitrov reported a regression caused by commit 7866a621043f > > ("dev: add per net_device packet type chains"). > > > > skb->dev becomes NULL and we crash in __netif_receive_skb_core(). > > > > Before above commit, different kind of bugs or corruptions could > > happen > > without major crash. > > Hmm... was that always a problem then, or did it get introduced > earlier? I thought it was impossible for skb to be shared on PF-recv > callback way back. Always been a problem. Use tcpdump and you risk use after free. Or simply panics if skb->head needs to be expanded. Other relevant commits to explain this : 044453b3efdc90bdd5feffe74b99d95dec70ac43 arp: fix possible crash in arp_rcv() b30532515f0a62bfe17207ab00883dd262497006 bonding: Ensure that we unshare skbs prior to calling pskb_may_pull ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [OOPS] In __netif_receive_skb_core 2016-01-12 7:16 ` Ivaylo Dimitrov 2016-01-12 14:19 ` Eric Dumazet @ 2016-01-12 18:15 ` Salam Noureddine 1 sibling, 0 replies; 17+ messages in thread From: Salam Noureddine @ 2016-01-12 18:15 UTC (permalink / raw) To: Ivaylo Dimitrov Cc: Eric Dumazet, David S. Miller, Pali Rohár, Network Development, LKML, Sebastian Reichel, linux-omap On Mon, Jan 11, 2016 at 11:16 PM, Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> wrote: >>> On Mon, 2016-01-11 at 17:19 -0800, Salam Noureddine wrote: >>>> >>>> It must be that skb->dev was changed to NULL inside of >>>> __netif_receive_skb_core, otherwise we would have crashed much >>>> earlier. Also, orig_dev is saved at the beginning. Possibly a device >>>> is layered on top of the original device. > > > Exactly (skb->dev was changed to NULL ...). Do you think it makes sense to > put printks in various places in __netif_receive_skb_core to see after which > function call skb->dev turns into NULL? > No need anymore since Eric found the culprit in phonet_rcv. Thanks for your help debugging this! Salam ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2016-01-13 15:07 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-01-07 17:54 [OOPS] In __netif_receive_skb_core Ivaylo Dimitrov 2016-01-10 17:48 ` Ivaylo Dimitrov 2016-01-10 20:26 ` Eric Dumazet 2016-01-11 21:03 ` Ivaylo Dimitrov 2016-01-11 22:11 ` Salam Noureddine 2016-01-12 0:51 ` Ivaylo Dimitrov 2016-01-12 1:06 ` Eric Dumazet 2016-01-12 1:19 ` Salam Noureddine 2016-01-12 2:21 ` Eric Dumazet 2016-01-12 2:25 ` Eric Dumazet 2016-01-12 7:16 ` Ivaylo Dimitrov 2016-01-12 14:19 ` Eric Dumazet 2016-01-12 16:58 ` [PATCH net] phonet: properly unshare skbs in phonet_rcv() Eric Dumazet 2016-01-12 20:47 ` David Miller 2016-01-13 12:26 ` Rémi Denis-Courmont 2016-01-13 15:07 ` Eric Dumazet 2016-01-12 18:15 ` [OOPS] In __netif_receive_skb_core Salam Noureddine
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.