* [OOPS] In __netif_receive_skb_core
@ 2016-01-07 17:54 Ivaylo Dimitrov
  2016-01-10 17:48 ` Ivaylo Dimitrov
  0 siblings, 1 reply; 17+ messages in thread
From: Ivaylo Dimitrov @ 2016-01-07 17:54 UTC (permalink / raw)
  To: noureddine, davem
  Cc: Pali Rohár, Network Development, LKML, Sebastian Reichel,
	linux-omap

Hi,

Trying to bring up the modem interface on a Nokia N900 with a recent
Linux kernel leads to the following oops (read from the mtdoops device):

<6>[  144.009765] bq27xxx-battery 2-0055: battery is not calibrated! 
ignoring capacity values
<5>[  145.188964] ssi-protocol ssi-protocol: WAKELINES TEST OK
<6>[  145.194793] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes ready
<1>[  145.358154] Unable to handle kernel NULL pointer dereference at 
virtual address 0000005c
<1>[  145.366821] pgd = ce530000
<1>[  145.369689] [0000005c] *pgd=8e44c831, *pte=00000000, *ppte=00000000
<0>[  145.376373] Internal error: Oops: 17 [#1] PREEMPT ARM
<4>[  145.381683] Modules linked in: cmt_speech nokia_modem ssi_protocol 
sha256_generic hmac drbg ansi_cprng ctr ccm rfcomm sd_mod scsi_mod 
bnep bluetooth omaplfb pvrsrvkm ipv6 bq2415x_charger uinput hsi_char
radio_platform_si4713 joydev omap_ssi_port video_bus_switch arc4 
wl1251_spi gpio_keys isp1704_charger wl1251 mac80211 smc91x mii cfg80211 
omap_wdt omap_sham crc7 tsc2005 tsc200x_core si4713 bq27xxx_battery 
leds_lp5523 leds_lp55xx_common adp1653 tsl2563 rtc_twl twl4030_wdt 
et8ek8 ad5820 v4l2_common smiaregs videodev twl4030_vibra ff_memless 
media lis3lv02d_i2c lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal 
thermal_sys hwmon rx51_battery
<4>[  145.441802] CPU: 0 PID: 1040 Comm: csd Not tainted 4.4.0-rc7+ #2
<4>[  145.448120] Hardware name: Nokia RX-51 board
<4>[  145.452636] task: ce517700 ti: cef50000 task.ti: cef50000
<4>[  145.458343] PC is at __netif_receive_skb_core+0x7c0/0x92c
<4>[  145.464050] LR is at sock_queue_rcv_skb+0x208/0x214
<4>[  145.469207] pc : [<c0393ebc>]    lr : [<c03852f4>]    psr: 00000113
<4>[  145.469207] sp : cef51e98  ip : 15800000  fp : c3a4a780
<4>[  145.481292] r10: c3a2005c  r9 : 00000000  r8 : c3987834
<4>[  145.486816] r7 : 0000f500  r6 : c3a20000  r5 : c3a20048  r4 : c3987780
<4>[  145.493682] r3 : 00000000  r2 : 00000002  r1 : 00000000  r0 : 00000000
<4>[  145.500579] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM 
Segment none
<4>[  145.508087] Control: 10c5387d  Table: 8e530019  DAC: 00000051
<0>[  145.514160] Process csd (pid: 1040, stack limit = 0xcef50210)
<0>[  145.520202] Stack: (0xcef51e98 to 0xcef52000)
<0>[  145.524810] 1e80: 
       cf334000 cf3ccd90
<0>[  145.533447] 1ea0: 00000002 c3987780 c0651400 c3a5507c c3912000 
c0068534 c3a5507c 40000113
<0>[  145.542083] 1ec0: ffffffff bf3b5fac bf3b5d88 ffffffff 00000000 
c0675f80 c3987780 00000000
<0>[  145.550720] 1ee0: c0675f74 c0675f48 cef51f18 00000040 cef51f20 
c03965f4 c0396580 c0675f80
<0>[  145.559356] 1f00: 00000001 0000012c 00000040 c0676217 ffffc399 
c0396d48 cef51f18 cef51f18
<0>[  145.567993] 1f20: cef51f20 cef51f20 cf334000 00000008 c0677240 
00000008 c0677200 c067724c
<0>[  145.576629] 1f40: 00000100 00400100 c0440be4 c0030dec cf807f00 
fa2000cc 00000003 00000003
<0>[  145.585266] 1f60: ffffc398 00000004 10c53c7d 00000000 00000000 
cf802000 00000001 10c53c7d
<0>[  145.593902] 1f80: b6ea4f44 00001684 bee8e7dc c003119c 00000000 
c005b44c cef51fb0 b6e1b024
<0>[  145.602539] 1fa0: a0000010 ffffffff 10c5387d c043c610 b6ea71bc 
00000000 00000000 00002f48
<0>[  145.611175] 1fc0: 00000000 b6ea71bc 00002f48 b6ebb684 b6eba000 
b6ea4f44 00001684 bee8e7dc
<0>[  145.619812] 1fe0: 00019817 bee8e788 b6e19d0c b6e1b024 a0000010 
ffffffff 00000000 00000000
<4>[  145.628648] [<c0393ebc>] (__netif_receive_skb_core) from 
[<c03965f4>] (process_backlog+0x74/0xf4)
<4>[  145.637817] [<c03965f4>] (process_backlog) from [<c0396d48>] 
(net_rx_action+0xd0/0x284)
<4>[  145.646301] [<c0396d48>] (net_rx_action) from [<c0030dec>] 
(__do_softirq+0xb0/0x208)
<4>[  145.654479] [<c0030dec>] (__do_softirq) from [<c003119c>] 
(irq_exit+0x80/0xe4)
<4>[  145.662109] [<c003119c>] (irq_exit) from [<c005b44c>] 
(__handle_domain_irq+0x88/0xa8)
<4>[  145.670379] [<c005b44c>] (__handle_domain_irq) from [<c043c610>] 
(__irq_usr+0x50/0x80)
<0>[  145.678741] Code: e59d400c e5943014 e1530006 0a000025 (e593505c)
<4>[  145.685241] ---[ end trace 17f822c9893a7c21 ]---
<0>[  145.691864] Kernel panic - not syncing: Fatal exception in interrupt

I tracked the problem down to the commit 
<7866a621043fbaca3d7389e9b9f69dd1a2e5a855> ("dev: add per net_device 
packet type chains"). After reverting that commit, the oops no longer
appears.

Userspace on Nokia N900 talks to the modem via phonet interface.

Please advise on how to proceed with fixing the problem and whether
there is anything else I can provide.

Regards,
Ivo


* Re: [OOPS] In __netif_receive_skb_core
  2016-01-07 17:54 [OOPS] In __netif_receive_skb_core Ivaylo Dimitrov
@ 2016-01-10 17:48 ` Ivaylo Dimitrov
  2016-01-10 20:26   ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Ivaylo Dimitrov @ 2016-01-10 17:48 UTC (permalink / raw)
  To: noureddine, davem
  Cc: Pali Rohár, Network Development, LKML, Sebastian Reichel,
	linux-omap

Anyone?

On  7.01.2016 19:54, Ivaylo Dimitrov wrote:
> Hi,
>
> Trying to bring up the modem interface on Nokia n900 device with recent
> linux, leads to the following kernel oops (read from the mtdoops device):
>
> <6>[  144.009765] bq27xxx-battery 2-0055: battery is not calibrated!
> ignoring capacity values
> <5>[  145.188964] ssi-protocol ssi-protocol: WAKELINES TEST OK
> <6>[  145.194793] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes
> ready
> <1>[  145.358154] Unable to handle kernel NULL pointer dereference at
> virtual address 0000005c
> <1>[  145.366821] pgd = ce530000
> <1>[  145.369689] [0000005c] *pgd=8e44c831, *pte=00000000, *ppte=00000000
> <0>[  145.376373] Internal error: Oops: 17 [#1] PREEMPT ARM
> <4>[  145.381683] Modules linked in: cmt_speech nokia_modem ssi_protocol
> sha256_generic hmac drbg ansi_cprng ctr ccm rfcomm sd_mod scsi_mod
> bnep bluetooth omaplfb pvrsrvkm ipv6 bq2415x_charger uinput hsi_char
> radio_platform_si4713 joydev omap_ssi_port video_bus_switch arc4
> wl1251_spi gpio_keys isp1704_charger wl1251 mac80211 smc91x mii cfg80211
> omap_wdt omap_sham crc7 tsc2005 tsc200x_core si4713 bq27xxx_battery
> leds_lp5523 leds_lp55xx_common adp1653 tsl2563 rtc_twl twl4030_wdt
> et8ek8 ad5820 v4l2_common smiaregs videodev twl4030_vibra ff_memless
> media lis3lv02d_i2c lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal
> thermal_sys hwmon rx51_battery
> <4>[  145.441802] CPU: 0 PID: 1040 Comm: csd Not tainted 4.4.0-rc7+ #2
> <4>[  145.448120] Hardware name: Nokia RX-51 board
> <4>[  145.452636] task: ce517700 ti: cef50000 task.ti: cef50000
> <4>[  145.458343] PC is at __netif_receive_skb_core+0x7c0/0x92c
> <4>[  145.464050] LR is at sock_queue_rcv_skb+0x208/0x214
> <4>[  145.469207] pc : [<c0393ebc>]    lr : [<c03852f4>]    psr: 00000113
> <4>[  145.469207] sp : cef51e98  ip : 15800000  fp : c3a4a780
> <4>[  145.481292] r10: c3a2005c  r9 : 00000000  r8 : c3987834
> <4>[  145.486816] r7 : 0000f500  r6 : c3a20000  r5 : c3a20048  r4 :
> c3987780
> <4>[  145.493682] r3 : 00000000  r2 : 00000002  r1 : 00000000  r0 :
> 00000000
> <4>[  145.500579] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
> Segment none
> <4>[  145.508087] Control: 10c5387d  Table: 8e530019  DAC: 00000051
> <0>[  145.514160] Process csd (pid: 1040, stack limit = 0xcef50210)
> <0>[  145.520202] Stack: (0xcef51e98 to 0xcef52000)
> <0>[  145.524810] 1e80:       cf334000 cf3ccd90
> <0>[  145.533447] 1ea0: 00000002 c3987780 c0651400 c3a5507c c3912000
> c0068534 c3a5507c 40000113
> <0>[  145.542083] 1ec0: ffffffff bf3b5fac bf3b5d88 ffffffff 00000000
> c0675f80 c3987780 00000000
> <0>[  145.550720] 1ee0: c0675f74 c0675f48 cef51f18 00000040 cef51f20
> c03965f4 c0396580 c0675f80
> <0>[  145.559356] 1f00: 00000001 0000012c 00000040 c0676217 ffffc399
> c0396d48 cef51f18 cef51f18
> <0>[  145.567993] 1f20: cef51f20 cef51f20 cf334000 00000008 c0677240
> 00000008 c0677200 c067724c
> <0>[  145.576629] 1f40: 00000100 00400100 c0440be4 c0030dec cf807f00
> fa2000cc 00000003 00000003
> <0>[  145.585266] 1f60: ffffc398 00000004 10c53c7d 00000000 00000000
> cf802000 00000001 10c53c7d
> <0>[  145.593902] 1f80: b6ea4f44 00001684 bee8e7dc c003119c 00000000
> c005b44c cef51fb0 b6e1b024
> <0>[  145.602539] 1fa0: a0000010 ffffffff 10c5387d c043c610 b6ea71bc
> 00000000 00000000 00002f48
> <0>[  145.611175] 1fc0: 00000000 b6ea71bc 00002f48 b6ebb684 b6eba000
> b6ea4f44 00001684 bee8e7dc
> <0>[  145.619812] 1fe0: 00019817 bee8e788 b6e19d0c b6e1b024 a0000010
> ffffffff 00000000 00000000
> <4>[  145.628648] [<c0393ebc>] (__netif_receive_skb_core) from
> [<c03965f4>] (process_backlog+0x74/0xf4)
> <4>[  145.637817] [<c03965f4>] (process_backlog) from [<c0396d48>]
> (net_rx_action+0xd0/0x284)
> <4>[  145.646301] [<c0396d48>] (net_rx_action) from [<c0030dec>]
> (__do_softirq+0xb0/0x208)
> <4>[  145.654479] [<c0030dec>] (__do_softirq) from [<c003119c>]
> (irq_exit+0x80/0xe4)
> <4>[  145.662109] [<c003119c>] (irq_exit) from [<c005b44c>]
> (__handle_domain_irq+0x88/0xa8)
> <4>[  145.670379] [<c005b44c>] (__handle_domain_irq) from [<c043c610>]
> (__irq_usr+0x50/0x80)
> <0>[  145.678741] Code: e59d400c e5943014 e1530006 0a000025 (e593505c)
> <4>[  145.685241] ---[ end trace 17f822c9893a7c21 ]---
> <0>[  145.691864] Kernel panic - not syncing: Fatal exception in interrupt
>
> I tracked the problem down to the commit
> <7866a621043fbaca3d7389e9b9f69dd1a2e5a855> ("dev: add per net_device
> packet type chains"). After reverting that commit, the oops no longer
> appear.
>
> Userspace on Nokia N900 talks to the modem via phonet interface.
>
> Please advice on how to proceed to fix the problem and if there is
> anything else I can provide.
>
> Regards,
> Ivo


* Re: [OOPS] In __netif_receive_skb_core
  2016-01-10 17:48 ` Ivaylo Dimitrov
@ 2016-01-10 20:26   ` Eric Dumazet
  2016-01-11 21:03     ` Ivaylo Dimitrov
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2016-01-10 20:26 UTC (permalink / raw)
  To: Ivaylo Dimitrov
  Cc: noureddine, davem, Pali Rohár, Network Development, LKML,
	Sebastian Reichel, linux-omap

On Sun, 2016-01-10 at 19:48 +0200, Ivaylo Dimitrov wrote:
> Anyone?
> 
> On  7.01.2016 19:54, Ivaylo Dimitrov wrote:
> > Hi,
> >
> > Trying to bring up the modem interface on Nokia n900 device with recent
> > linux, leads to the following kernel oops (read from the mtdoops device):
> >
> > <6>[  144.009765] bq27xxx-battery 2-0055: battery is not calibrated!
> > ignoring capacity values
> > <5>[  145.188964] ssi-protocol ssi-protocol: WAKELINES TEST OK
> > <6>[  145.194793] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes
> > ready
> > <1>[  145.358154] Unable to handle kernel NULL pointer dereference at
> > virtual address 0000005c
> > <1>[  145.366821] pgd = ce530000
> > <1>[  145.369689] [0000005c] *pgd=8e44c831, *pte=00000000, *ppte=00000000
> > <0>[  145.376373] Internal error: Oops: 17 [#1] PREEMPT ARM
> > <4>[  145.381683] Modules linked in: cmt_speech nokia_modem ssi_protocol
> > sha256_generic hmac drbg ansi_cprng ctr ccm rfcomm sd_mod scsi_mod
> > bnep bluetooth omaplfb pvrsrvkm ipv6 bq2415x_charger uinput hsi_char
> > radio_platform_si4713 joydev omap_ssi_port video_bus_switch arc4
> > wl1251_spi gpio_keys isp1704_charger wl1251 mac80211 smc91x mii cfg80211
> > omap_wdt omap_sham crc7 tsc2005 tsc200x_core si4713 bq27xxx_battery
> > leds_lp5523 leds_lp55xx_common adp1653 tsl2563 rtc_twl twl4030_wdt
> > et8ek8 ad5820 v4l2_common smiaregs videodev twl4030_vibra ff_memless
> > media lis3lv02d_i2c lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal
> > thermal_sys hwmon rx51_battery
> > <4>[  145.441802] CPU: 0 PID: 1040 Comm: csd Not tainted 4.4.0-rc7+ #2
> > <4>[  145.448120] Hardware name: Nokia RX-51 board
> > <4>[  145.452636] task: ce517700 ti: cef50000 task.ti: cef50000
> > <4>[  145.458343] PC is at __netif_receive_skb_core+0x7c0/0x92c
> > <4>[  145.464050] LR is at sock_queue_rcv_skb+0x208/0x214
> > <4>[  145.469207] pc : [<c0393ebc>]    lr : [<c03852f4>]    psr: 00000113
> > <4>[  145.469207] sp : cef51e98  ip : 15800000  fp : c3a4a780
> > <4>[  145.481292] r10: c3a2005c  r9 : 00000000  r8 : c3987834
> > <4>[  145.486816] r7 : 0000f500  r6 : c3a20000  r5 : c3a20048  r4 :
> > c3987780
> > <4>[  145.493682] r3 : 00000000  r2 : 00000002  r1 : 00000000  r0 :
> > 00000000
> > <4>[  145.500579] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
> > Segment none
> > <4>[  145.508087] Control: 10c5387d  Table: 8e530019  DAC: 00000051
> > <0>[  145.514160] Process csd (pid: 1040, stack limit = 0xcef50210)
> > <0>[  145.520202] Stack: (0xcef51e98 to 0xcef52000)
> > <0>[  145.524810] 1e80:       cf334000 cf3ccd90
> > <0>[  145.533447] 1ea0: 00000002 c3987780 c0651400 c3a5507c c3912000
> > c0068534 c3a5507c 40000113
> > <0>[  145.542083] 1ec0: ffffffff bf3b5fac bf3b5d88 ffffffff 00000000
> > c0675f80 c3987780 00000000
> > <0>[  145.550720] 1ee0: c0675f74 c0675f48 cef51f18 00000040 cef51f20
> > c03965f4 c0396580 c0675f80
> > <0>[  145.559356] 1f00: 00000001 0000012c 00000040 c0676217 ffffc399
> > c0396d48 cef51f18 cef51f18
> > <0>[  145.567993] 1f20: cef51f20 cef51f20 cf334000 00000008 c0677240
> > 00000008 c0677200 c067724c
> > <0>[  145.576629] 1f40: 00000100 00400100 c0440be4 c0030dec cf807f00
> > fa2000cc 00000003 00000003
> > <0>[  145.585266] 1f60: ffffc398 00000004 10c53c7d 00000000 00000000
> > cf802000 00000001 10c53c7d
> > <0>[  145.593902] 1f80: b6ea4f44 00001684 bee8e7dc c003119c 00000000
> > c005b44c cef51fb0 b6e1b024
> > <0>[  145.602539] 1fa0: a0000010 ffffffff 10c5387d c043c610 b6ea71bc
> > 00000000 00000000 00002f48
> > <0>[  145.611175] 1fc0: 00000000 b6ea71bc 00002f48 b6ebb684 b6eba000
> > b6ea4f44 00001684 bee8e7dc
> > <0>[  145.619812] 1fe0: 00019817 bee8e788 b6e19d0c b6e1b024 a0000010
> > ffffffff 00000000 00000000
> > <4>[  145.628648] [<c0393ebc>] (__netif_receive_skb_core) from
> > [<c03965f4>] (process_backlog+0x74/0xf4)
> > <4>[  145.637817] [<c03965f4>] (process_backlog) from [<c0396d48>]
> > (net_rx_action+0xd0/0x284)
> > <4>[  145.646301] [<c0396d48>] (net_rx_action) from [<c0030dec>]
> > (__do_softirq+0xb0/0x208)
> > <4>[  145.654479] [<c0030dec>] (__do_softirq) from [<c003119c>]
> > (irq_exit+0x80/0xe4)
> > <4>[  145.662109] [<c003119c>] (irq_exit) from [<c005b44c>]
> > (__handle_domain_irq+0x88/0xa8)
> > <4>[  145.670379] [<c005b44c>] (__handle_domain_irq) from [<c043c610>]
> > (__irq_usr+0x50/0x80)
> > <0>[  145.678741] Code: e59d400c e5943014 e1530006 0a000025 (e593505c)
> > <4>[  145.685241] ---[ end trace 17f822c9893a7c21 ]---
> > <0>[  145.691864] Kernel panic - not syncing: Fatal exception in interrupt
> >
> > I tracked the problem down to the commit
> > <7866a621043fbaca3d7389e9b9f69dd1a2e5a855> ("dev: add per net_device
> > packet type chains"). After reverting that commit, the oops no longer
> > appear.
> >
> > Userspace on Nokia N900 talks to the modem via phonet interface.
> >
> > Please advice on how to proceed to fix the problem and if there is
> > anything else I can provide.
> >

I do not see anything wrong with this commit.

It must uncover a prior bug.

Seems to be a phonet bug not reacting to NETDEV_DOWN maybe ?


diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c
index a58680016472..fd2f44940bd7 100644
--- a/net/phonet/pn_dev.c
+++ b/net/phonet/pn_dev.c
@@ -63,11 +63,11 @@ struct phonet_device_list *phonet_device_list(struct net *net)
 static struct phonet_device *__phonet_device_alloc(struct net_device *dev)
 {
 	struct phonet_device_list *pndevs = phonet_device_list(dev_net(dev));
-	struct phonet_device *pnd = kmalloc(sizeof(*pnd), GFP_ATOMIC);
-	if (pnd == NULL)
+	struct phonet_device *pnd = kzalloc(sizeof(*pnd), GFP_KERNEL);
+
+	if (!pnd)
 		return NULL;
 	pnd->netdev = dev;
-	bitmap_zero(pnd->addrs, 64);
 
 	BUG_ON(!mutex_is_locked(&pndevs->lock));
 	list_add_rcu(&pnd->list, &pndevs->list);
@@ -117,7 +117,7 @@ static void phonet_device_destroy(struct net_device *dev)
 
 		for_each_set_bit(addr, pnd->addrs, 64)
 			phonet_address_notify(RTM_DELADDR, dev, addr);
-		kfree(pnd);
+		kfree_rcu(pnd, rcu);
 	}
 }
 
@@ -301,6 +301,7 @@ static int phonet_device_notify(struct notifier_block *me, unsigned long what,
 		if (dev->type == ARPHRD_PHONET)
 			phonet_device_autoconf(dev);
 		break;
+	case NETDEV_DOWN:
 	case NETDEV_UNREGISTER:
 		phonet_device_destroy(dev);
 		phonet_route_autodel(dev);


* Re: [OOPS] In __netif_receive_skb_core
  2016-01-10 20:26   ` Eric Dumazet
@ 2016-01-11 21:03     ` Ivaylo Dimitrov
  2016-01-11 22:11       ` Salam Noureddine
  0 siblings, 1 reply; 17+ messages in thread
From: Ivaylo Dimitrov @ 2016-01-11 21:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: noureddine, davem, Pali Rohár, Network Development, LKML,
	Sebastian Reichel, linux-omap



On 10.01.2016 22:26, Eric Dumazet wrote:
>
> I do not see anything wrong with this commit.
>
> It must uncover a prior bug.
>

Could be, but it was working fine in 3.19.

> Seems to be a phonet bug not reacting to NETDEV_DOWN maybe ?
>
>
> diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c
> index a58680016472..fd2f44940bd7 100644
> --- a/net/phonet/pn_dev.c
> +++ b/net/phonet/pn_dev.c

The patch makes difference:

<3>[  118.943542] et8ek8 3-003e: could not get clock
<6>[  119.028045] IPv6: ADDRCONF(NETDEV_UP): phonet0: link is not ready
<4>[  120.945922] sched: RT throttling activated
<5>[  158.556091] ssi-protocol ssi-protocol: WAKELINES TEST OK
<6>[  158.561828] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes ready
<1>[  158.708831] Unable to handle kernel NULL pointer dereference at 
virtual address 0000005c
<1>[  158.717498] pgd = ce548000
<1>[  158.720367] [0000005c] *pgd=8e53a831, *pte=00000000, *ppte=00000000
<0>[  158.727020] Internal error: Oops: 17 [#1] PREEMPT ARM
<4>[  158.732330] Modules linked in: cmt_speech nokia_modem ssi_protocol 
vfat fat rfcomm sd_mod scsi_mod bnep bluetooth omaplfb pvrsrvkm ipv6 
bq2415x_charger uinput hsi_char radio_platform_si4713 joydev 
omap_ssi_port video_bus_switch arc4 wl1251_spi wl1251 isp1704_charger 
gpio_keys mac80211 smc91x mii cfg80211 omap_wdt omap_sham crc7 tsc2005 
tsc200x_core si4713 leds_lp5523 leds_lp55xx_common adp1653 
bq27xxx_battery tsl2563 twl4030_wdt rtc_twl et8ek8 smiaregs ad5820 
v4l2_common twl4030_vibra ff_memless videodev lis3lv02d_i2c media 
lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal thermal_sys hwmon 
rx51_battery
<4>[  158.789154] CPU: 0 PID: 1038 Comm: csd Not tainted 4.4.0-rc7+ #14
<4>[  158.795562] Hardware name: Nokia RX-51 board
<4>[  158.800048] task: ce4d5900 ti: ce4e4000 task.ti: ce4e4000
<4>[  158.805725] PC is at __netif_receive_skb_core+0x7c0/0x92c
<4>[  158.811401] LR is at sock_queue_rcv_skb+0x208/0x214
<4>[  158.816528] pc : [<c0394218>]    lr : [<c0385648>]    psr: 00000113
<4>[  158.816528] sp : ce4e5e98  ip : 14400000  fp : cd4ab380
<4>[  158.828582] r10: c249405c  r9 : 00000000  r8 : c24b2d74
<4>[  158.834045] r7 : 0000f500  r6 : c2494000  r5 : c2494048  r4 : c24b2cc0
<4>[  158.840911] r3 : 00000000  r2 : 00000002  r1 : 00000000  r0 : 00000000
<4>[  158.847778] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM 
Segment none
<4>[  158.855285] Control: 10c5387d  Table: 8e548019  DAC: 00000051
<0>[  158.861297] Process csd (pid: 1038, stack limit = 0xce4e4210)
<0>[  158.867340] Stack: (0xce4e5e98 to 0xce4e6000)
<0>[  158.871917] 5e80: 
       cf3ab000 cf2e4d90
<0>[  158.880554] 5ea0: 00000002 c24b2cc0 c06534dc c243b67c c5dcae00 
c00685ac c243b67c 40000113
<0>[  158.889129] 5ec0: ffffffff bf3b0fac bf3b0d88 ffffffff 00000000 
c06780c0 c24b2cc0 00000000
<0>[  158.897735] 5ee0: c06780b4 c0678088 ce4e5f18 00000040 ce4e5f20 
c0396950 c03968dc c06780c0
<0>[  158.906341] 5f00: 00000001 0000012c 00000040 c0678357 ffffc8d0 
c03970a4 ce4e5f18 ce4e5f18
<0>[  158.914947] 5f20: ce4e5f20 ce4e5f20 cf3ab000 00000008 c0679380 
00000008 c0679340 c067938c
<0>[  158.923522] 5f40: 00000100 00400100 c0441be8 c0030e64 cf807f00 
fa2000cc 00000003 00000003
<0>[  158.932128] 5f60: ffffc8cf 00000004 10c53c7d 00000000 00000000 
cf802000 00000001 10c53c7d
<0>[  158.940734] 5f80: 00001a30 0000026c beba476c c0031214 00000000 
c005b4c4 ce4e5fb0 b6e3f82c
<0>[  158.949310] 5fa0: 40000010 ffffffff 10c5387d c043c990 00000014 
00000000 0001ef78 00000014
<0>[  158.957946] 5fc0: 0001edf8 00000001 00000000 00000000 00000000 
00001a30 0000026c beba476c
<0>[  158.966522] 5fe0: b6e4d82c beba4748 b6e3f82c b6e3f82c 40000010 
ffffffff ddee3fd6 e7f5d5dd
<4>[  158.975158] [<c0394218>] (__netif_receive_skb_core) from 
[<c0396950>] (process_backlog+0x74/0xf4)
<4>[  158.984497] [<c0396950>] (process_backlog) from [<c03970a4>] 
(net_rx_action+0xd0/0x284)
<4>[  158.992919] [<c03970a4>] (net_rx_action) from [<c0030e64>] 
(__do_softirq+0xb0/0x208)
<4>[  159.001037] [<c0030e64>] (__do_softirq) from [<c0031214>] 
(irq_exit+0x80/0xe4)
<4>[  159.008636] [<c0031214>] (irq_exit) from [<c005b4c4>] 
(__handle_domain_irq+0x88/0xa8)
<4>[  159.016906] [<c005b4c4>] (__handle_domain_irq) from [<c043c990>] 
(__irq_usr+0x50/0x80)
<0>[  159.025207] Code: e59d400c e5943014 e1530006 0a000025 (e593505c)
<4>[  159.031677] ---[ end trace c52d1e36f2d07d67 ]---
<0>[  159.038146] Kernel panic - not syncing: Fatal exception in interrupt
--[ end trace c52d1e36f2d07d67 ]---

Could you provide some hints on where to put some printks or dump some 
data to get a clearer picture on what happens?

Thanks,
Ivo


* Re: [OOPS] In __netif_receive_skb_core
  2016-01-11 21:03     ` Ivaylo Dimitrov
@ 2016-01-11 22:11       ` Salam Noureddine
  2016-01-12  0:51         ` Ivaylo Dimitrov
  0 siblings, 1 reply; 17+ messages in thread
From: Salam Noureddine @ 2016-01-11 22:11 UTC (permalink / raw)
  To: Ivaylo Dimitrov
  Cc: Eric Dumazet, David S. Miller, Pali Rohár,
	Network Development, LKML, Sebastian Reichel, linux-omap

Would you be able to disassemble your kernel so we could tell where
the null pointer dereference happens?

Thanks,

Salam

On Mon, Jan 11, 2016 at 1:03 PM, Ivaylo Dimitrov
<ivo.g.dimitrov.75@gmail.com> wrote:
>
>
> On 10.01.2016 22:26, Eric Dumazet wrote:
>>
>>
>> I do not see anything wrong with this commit.
>>
>> It must uncover a prior bug.
>>
>
> Could be, but it was working fine in 3.19.
>
>> Seems to be a phonet bug not reacting to NETDEV_DOWN maybe ?
>>
>>
>> diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c
>> index a58680016472..fd2f44940bd7 100644
>> --- a/net/phonet/pn_dev.c
>> +++ b/net/phonet/pn_dev.c
>
>
> The patch makes difference:
>
> <3>[  118.943542] et8ek8 3-003e: could not get clock
> <6>[  119.028045] IPv6: ADDRCONF(NETDEV_UP): phonet0: link is not ready
> <4>[  120.945922] sched: RT throttling activated
> <5>[  158.556091] ssi-protocol ssi-protocol: WAKELINES TEST OK
> <6>[  158.561828] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes ready
> <1>[  158.708831] Unable to handle kernel NULL pointer dereference at
> virtual address 0000005c
> <1>[  158.717498] pgd = ce548000
> <1>[  158.720367] [0000005c] *pgd=8e53a831, *pte=00000000, *ppte=00000000
> <0>[  158.727020] Internal error: Oops: 17 [#1] PREEMPT ARM
> <4>[  158.732330] Modules linked in: cmt_speech nokia_modem ssi_protocol
> vfat fat rfcomm sd_mod scsi_mod bnep bluetooth omaplfb pvrsrvkm ipv6
> bq2415x_charger uinput hsi_char radio_platform_si4713 joydev omap_ssi_port
> video_bus_switch arc4 wl1251_spi wl1251 isp1704_charger gpio_keys mac80211
> smc91x mii cfg80211 omap_wdt omap_sham crc7 tsc2005 tsc200x_core si4713
> leds_lp5523 leds_lp55xx_common adp1653 bq27xxx_battery tsl2563 twl4030_wdt
> rtc_twl et8ek8 smiaregs ad5820 v4l2_common twl4030_vibra ff_memless videodev
> lis3lv02d_i2c media lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal
> thermal_sys hwmon rx51_battery
> <4>[  158.789154] CPU: 0 PID: 1038 Comm: csd Not tainted 4.4.0-rc7+ #14
> <4>[  158.795562] Hardware name: Nokia RX-51 board
> <4>[  158.800048] task: ce4d5900 ti: ce4e4000 task.ti: ce4e4000
> <4>[  158.805725] PC is at __netif_receive_skb_core+0x7c0/0x92c
> <4>[  158.811401] LR is at sock_queue_rcv_skb+0x208/0x214
> <4>[  158.816528] pc : [<c0394218>]    lr : [<c0385648>]    psr: 00000113
> <4>[  158.816528] sp : ce4e5e98  ip : 14400000  fp : cd4ab380
> <4>[  158.828582] r10: c249405c  r9 : 00000000  r8 : c24b2d74
> <4>[  158.834045] r7 : 0000f500  r6 : c2494000  r5 : c2494048  r4 : c24b2cc0
> <4>[  158.840911] r3 : 00000000  r2 : 00000002  r1 : 00000000  r0 : 00000000
> <4>[  158.847778] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
> Segment none
> <4>[  158.855285] Control: 10c5387d  Table: 8e548019  DAC: 00000051
> <0>[  158.861297] Process csd (pid: 1038, stack limit = 0xce4e4210)
> <0>[  158.867340] Stack: (0xce4e5e98 to 0xce4e6000)
> <0>[  158.871917] 5e80:       cf3ab000 cf2e4d90
> <0>[  158.880554] 5ea0: 00000002 c24b2cc0 c06534dc c243b67c c5dcae00
> c00685ac c243b67c 40000113
> <0>[  158.889129] 5ec0: ffffffff bf3b0fac bf3b0d88 ffffffff 00000000
> c06780c0 c24b2cc0 00000000
> <0>[  158.897735] 5ee0: c06780b4 c0678088 ce4e5f18 00000040 ce4e5f20
> c0396950 c03968dc c06780c0
> <0>[  158.906341] 5f00: 00000001 0000012c 00000040 c0678357 ffffc8d0
> c03970a4 ce4e5f18 ce4e5f18
> <0>[  158.914947] 5f20: ce4e5f20 ce4e5f20 cf3ab000 00000008 c0679380
> 00000008 c0679340 c067938c
> <0>[  158.923522] 5f40: 00000100 00400100 c0441be8 c0030e64 cf807f00
> fa2000cc 00000003 00000003
> <0>[  158.932128] 5f60: ffffc8cf 00000004 10c53c7d 00000000 00000000
> cf802000 00000001 10c53c7d
> <0>[  158.940734] 5f80: 00001a30 0000026c beba476c c0031214 00000000
> c005b4c4 ce4e5fb0 b6e3f82c
> <0>[  158.949310] 5fa0: 40000010 ffffffff 10c5387d c043c990 00000014
> 00000000 0001ef78 00000014
> <0>[  158.957946] 5fc0: 0001edf8 00000001 00000000 00000000 00000000
> 00001a30 0000026c beba476c
> <0>[  158.966522] 5fe0: b6e4d82c beba4748 b6e3f82c b6e3f82c 40000010
> ffffffff ddee3fd6 e7f5d5dd
> <4>[  158.975158] [<c0394218>] (__netif_receive_skb_core) from [<c0396950>]
> (process_backlog+0x74/0xf4)
> <4>[  158.984497] [<c0396950>] (process_backlog) from [<c03970a4>]
> (net_rx_action+0xd0/0x284)
> <4>[  158.992919] [<c03970a4>] (net_rx_action) from [<c0030e64>]
> (__do_softirq+0xb0/0x208)
> <4>[  159.001037] [<c0030e64>] (__do_softirq) from [<c0031214>]
> (irq_exit+0x80/0xe4)
> <4>[  159.008636] [<c0031214>] (irq_exit) from [<c005b4c4>]
> (__handle_domain_irq+0x88/0xa8)
> <4>[  159.016906] [<c005b4c4>] (__handle_domain_irq) from [<c043c990>]
> (__irq_usr+0x50/0x80)
> <0>[  159.025207] Code: e59d400c e5943014 e1530006 0a000025 (e593505c)
> <4>[  159.031677] ---[ end trace c52d1e36f2d07d67 ]---
> <0>[  159.038146] Kernel panic - not syncing: Fatal exception in interrupt
> --[ end trace c52d1e36f2d07d67 ]---
>
> Could you provide some hints on where to put some printks or dump some data
> to get a clearer picture on what happens?
>
> Thanks,
> Ivo


* Re: [OOPS] In __netif_receive_skb_core
  2016-01-11 22:11       ` Salam Noureddine
@ 2016-01-12  0:51         ` Ivaylo Dimitrov
  2016-01-12  1:06           ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Ivaylo Dimitrov @ 2016-01-12  0:51 UTC (permalink / raw)
  To: Salam Noureddine
  Cc: Eric Dumazet, David S. Miller, Pali Rohár,
	Network Development, LKML, Sebastian Reichel, linux-omap



On 12.01.2016 00:11, Salam Noureddine wrote:
> Would you be able to disassemble your kernel so we could tell where
> the null pointer dereference happens?
>

Sure, but wouldn't it be better to provide the object file containing 
the debug symbols as well?

Otherwise, the null pointer dereference happens somewhere in:

(gdb) l *__netif_receive_skb_core+0x7c0
0x1318 is in __netif_receive_skb_core (include/linux/compiler.h:218).
213	})
214	
215	static __always_inline
216	void __read_once_size(const volatile void *p, void *res, int size)
217	{
218		__READ_ONCE_SIZE;
219	}
220	
221	#ifdef CONFIG_KASAN
222	/*

(gdb) l *__netif_receive_skb_core+0x7bc
0x1314 is in __netif_receive_skb_core (net/core/dev.c:3934).
3929		}
3930	
3931		deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
3932				       &orig_dev->ptype_specific);
3933	
3934		if (unlikely(skb->dev != orig_dev)) {
3935			deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
3936					       &skb->dev->ptype_specific);
3937		}
3938	

(gdb) l *__netif_receive_skb_core+0x7c4
0x131c is in __netif_receive_skb_core (net/core/dev.c:3935).
3930	
3931		deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
3932				       &orig_dev->ptype_specific);
3933	
3934		if (unlikely(skb->dev != orig_dev)) {
3935			deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
3936					       &skb->dev->ptype_specific);
3937		}
3938	
3939		if (pt_prev) {


0x00001300 <+1960>:	cmp	r10, r3
0x00001304 <+1964>:	bne	0x1284 <__netif_receive_skb_core+1836>
0x00001308 <+1968>:	ldr	r4, [sp, #12]
0x0000130c <+1972>:	ldr	r3, [r4, #20]
0x00001310 <+1976>:	cmp	r3, r6
0x00001314 <+1980>:	beq	0x13b0 <__netif_receive_skb_core+2136>

0x00001318 <+1984>:	ldr	r5, [r3, #92]	; 0x5c   <-FAULT

r3 seems to be skb->dev

0x0000131c <+1988>:	add	r10, r3, #92	; 0x5c
0x00001320 <+1992>:	add	r8, r4, #180	; 0xb4
0x00001324 <+1996>:	sub	r5, r5, #20
0x00001328 <+2000>:	b	0x13a4 <__netif_receive_skb_core+2124>
0x0000132c <+2004>:	ldrh	r3, [r5]
0x00001330 <+2008>:	cmp	r3, r7


I put some additional printks around that code, and it turned out that 
skb->dev is null, so "if (unlikely(skb->dev != orig_dev))" succeeds, but 
"&skb->dev->ptype_specific" oopses.
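
For illustration only, here is a minimal userspace sketch of that check. It
is not kernel code: the mock_ types and classify_second_pass() are
hypothetical stand-ins, and only the shape of the comparison follows
net/core/dev.c:3934. With r3 = 0, the faulting "ldr r5, [r3, #92]" reads
from address 0x5c, which matches the reported fault address 0000005c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; layouts are NOT the real ones. */
struct mock_list_head {
	struct mock_list_head *next, *prev;
};

struct mock_net_device {
	struct mock_list_head ptype_specific;
};

struct mock_sk_buff {
	struct mock_net_device *dev;
};

enum second_pass {
	SECOND_PASS_SKIPPED,    /* skb->dev == orig_dev */
	SECOND_PASS_TAKEN,      /* skb->dev is some other valid device */
	SECOND_PASS_WOULD_OOPS  /* skb->dev is NULL, yet the branch is entered */
};

/*
 * Models the decision around the "skb->dev != orig_dev" test. The kernel
 * code has no NULL test at this point; the extra branch here exists only
 * to make the crashing case visible.
 */
enum second_pass classify_second_pass(const struct mock_sk_buff *skb,
				      const struct mock_net_device *orig_dev)
{
	if (skb->dev == orig_dev)
		return SECOND_PASS_SKIPPED;
	if (skb->dev == NULL)
		return SECOND_PASS_WOULD_OOPS; /* NULL != orig_dev is true */
	return SECOND_PASS_TAKEN;
}
```

The point of the sketch is that a NULL skb->dev is indistinguishable from
"the skb moved to another device" as far as the comparison is concerned, so
the second delivery pass is entered and the list head is read through NULL.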

Thanks,
Ivo


* Re: [OOPS] In __netif_receive_skb_core
  2016-01-12  0:51         ` Ivaylo Dimitrov
@ 2016-01-12  1:06           ` Eric Dumazet
  2016-01-12  1:19             ` Salam Noureddine
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2016-01-12  1:06 UTC (permalink / raw)
  To: Ivaylo Dimitrov
  Cc: Salam Noureddine, David S. Miller, Pali Rohár,
	Network Development, LKML, Sebastian Reichel, linux-omap

On Tue, 2016-01-12 at 02:51 +0200, Ivaylo Dimitrov wrote:

> 
> I put some additional printks around that code, and it turned out that 
> skb->dev is null, so "if (unlikely(skb->dev != orig_dev))" succeeds, but 
> "&skb->dev->ptype_specific" oopses.

Nice find !

Now let's find what could have called netif_rx() with skb->dev == NULL

(This is illegal)


* Re: [OOPS] In __netif_receive_skb_core
  2016-01-12  1:06           ` Eric Dumazet
@ 2016-01-12  1:19             ` Salam Noureddine
  2016-01-12  2:21               ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Salam Noureddine @ 2016-01-12  1:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ivaylo Dimitrov, David S. Miller, Pali Rohár,
	Network Development, LKML, Sebastian Reichel, linux-omap

It must be that skb->dev was changed to NULL inside of
__netif_receive_skb_core, otherwise we would have crashed much
earlier. Also, orig_dev is saved at the beginning. Possibly a device
is layered on top of the original device.

On Mon, Jan 11, 2016 at 5:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2016-01-12 at 02:51 +0200, Ivaylo Dimitrov wrote:
>
>>
>> I put some additional printks around that code, and it turned out that
>> skb->dev is null, so "if (unlikely(skb->dev != orig_dev))" succeeds, but
>> "&skb->dev->ptype_specific" oopses.
>
> Nice find !
>
> Now lets find what possibly called netif_rx() with skb->dev == NULL
>
> (This is illegal)
>
>
>


* Re: [OOPS] In __netif_receive_skb_core
  2016-01-12  1:19             ` Salam Noureddine
@ 2016-01-12  2:21               ` Eric Dumazet
  2016-01-12  2:25                 ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2016-01-12  2:21 UTC (permalink / raw)
  To: Salam Noureddine
  Cc: Ivaylo Dimitrov, David S. Miller, Pali Rohár,
	Network Development, LKML, Sebastian Reichel, linux-omap

On Mon, 2016-01-11 at 17:19 -0800, Salam Noureddine wrote:
> It must be that skb->dev was changed to NULL inside of
> __netif_receive_skb_core, otherwise we would have crashed much
> earlier. Also, orig_dev is saved at the beginning. Possibly a device
> is layered on top of the original device.

Please do not top post on netdev / lkml mailing lists.

My guess is a protocol handler queued the skb without calling
skb_share_check()
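
A rough userspace model of the guess above, assuming the socket queueing path clears skb->dev (consistent with the LR in the oops pointing into sock_queue_rcv_skb). The struct and helper names are hypothetical; share_check_model() only imitates the documented behaviour of skb_share_check(): return the buffer unchanged when there is a single user, otherwise return a private copy and drop the caller's reference on the original.

```c
#include <assert.h>
#include <stdlib.h>

struct dev_model { int id; };

/* Hypothetical stand-in for struct sk_buff. */
struct skb_model {
	int users;              /* reference count */
	struct dev_model *dev;
};

static int skb_model_shared(const struct skb_model *skb)
{
	return skb->users != 1;
}

/* Models a handler that takes the skb and queues it to a socket;
 * assumption: queueing clears the dev pointer. */
static void queue_to_socket_model(struct skb_model *skb)
{
	skb->dev = NULL;
}

/* Imitation of skb_share_check(): private copy if shared, NULL on
 * allocation failure; the original loses one reference either way. */
static struct skb_model *share_check_model(struct skb_model *skb)
{
	struct skb_model *copy;

	assert(skb != NULL);
	if (!skb_model_shared(skb))
		return skb;
	copy = malloc(sizeof(*copy));
	if (copy) {
		*copy = *skb;
		copy->users = 1;
	}
	skb->users--;
	return copy;
}

/* Returns 1 if the receive core still sees a valid dev after the
 * protocol handler ran, 0 if the handler clobbered it. */
static int core_dev_survives(int use_share_check)
{
	struct dev_model dev = { 1 };
	struct skb_model skb = { 2, &dev };  /* core and handler both hold it */
	struct skb_model *mine = &skb;

	if (use_share_check) {
		mine = share_check_model(&skb);
		if (!mine)
			return skb.dev != NULL;  /* handler drops the packet */
	}
	queue_to_socket_model(mine);
	if (mine != &skb)
		free(mine);
	return skb.dev != NULL;
}
```

In this model the handler that skips the share check modifies the same object the core keeps using, which is how the core can later find skb->dev == NULL.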

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [OOPS] In __netif_receive_skb_core
  2016-01-12  2:21               ` Eric Dumazet
@ 2016-01-12  2:25                 ` Eric Dumazet
  2016-01-12  7:16                   ` Ivaylo Dimitrov
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2016-01-12  2:25 UTC (permalink / raw)
  To: Salam Noureddine
  Cc: Ivaylo Dimitrov, David S. Miller, Pali Rohár,
	Network Development, LKML, Sebastian Reichel, linux-omap

On Mon, 2016-01-11 at 18:21 -0800, Eric Dumazet wrote:
> On Mon, 2016-01-11 at 17:19 -0800, Salam Noureddine wrote:
> > It must be that skb->dev was changed to NULL inside of
> > __netif_receive_skb_core, otherwise we would have crashed much
> > earlier. Also, orig_dev is saved at the beginning. Possibly a device
> > is layered on top of the original device.
> 
> Please do not top post on netdev / lkml mailing lists.
> 
> My guess is a protocol handler queued the skb without calling
> skb_share_check()
> 

OK please try this fix :

diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c
index 10d42f3220ab..f925753668a7 100644
--- a/net/phonet/af_phonet.c
+++ b/net/phonet/af_phonet.c
@@ -377,6 +377,10 @@ static int phonet_rcv(struct sk_buff *skb, struct net_device *dev,
 	struct sockaddr_pn sa;
 	u16 len;
 
+	skb = skb_share_check(skb, GFP_ATOMIC);
+	if (!skb)
+		return NET_RX_DROP;
+
 	/* check we have at least a full Phonet header */
 	if (!pskb_pull(skb, sizeof(struct phonethdr)))
 		goto out;

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [OOPS] In __netif_receive_skb_core
  2016-01-12  2:25                 ` Eric Dumazet
@ 2016-01-12  7:16                   ` Ivaylo Dimitrov
  2016-01-12 14:19                     ` Eric Dumazet
  2016-01-12 18:15                     ` [OOPS] In __netif_receive_skb_core Salam Noureddine
  0 siblings, 2 replies; 17+ messages in thread
From: Ivaylo Dimitrov @ 2016-01-12  7:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Salam Noureddine, David S. Miller, Pali Rohár,
	Network Development, LKML, Sebastian Reichel, linux-omap



On 12.01.2016 04:25, Eric Dumazet wrote:
> On Mon, 2016-01-11 at 18:21 -0800, Eric Dumazet wrote:
>> On Mon, 2016-01-11 at 17:19 -0800, Salam Noureddine wrote:
>>> It must be that skb->dev was changed to NULL inside of
>>> __netif_receive_skb_core, otherwise we would have crashed much
>>> earlier. Also, orig_dev is saved at the beginning. Possibly a device
>>> is layered on top of the original device.

Exactly (skb->dev was changed to NULL ...). Do you think it makes sense 
to put printks in various places in __netif_receive_skb_core to see 
after which function call skb->dev turns into NULL?

>
> OK please try this fix :
>
> diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c
> index 10d42f3220ab..f925753668a7 100644
> --- a/net/phonet/af_phonet.c
> +++ b/net/phonet/af_phonet.c
> @@ -377,6 +377,10 @@ static int phonet_rcv(struct sk_buff *skb, struct net_device *dev,
>   	struct sockaddr_pn sa;
>   	u16 len;
>
> +	skb = skb_share_check(skb, GFP_ATOMIC);
> +	if (!skb)
> +		return NET_RX_DROP;
> +
>   	/* check we have at least a full Phonet header */
>   	if (!pskb_pull(skb, sizeof(struct phonethdr)))
>   		goto out;
>
>

That one fixes the oops, though I wonder if your previous patch is 
needed (I reverted it before testing the current). Unfortunately I don't 
have SIM card around to test GPRS connection with, will do it as soon as 
I find one and will report.

Thanks,
Ivo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [OOPS] In __netif_receive_skb_core
  2016-01-12  7:16                   ` Ivaylo Dimitrov
@ 2016-01-12 14:19                     ` Eric Dumazet
  2016-01-12 16:58                       ` [PATCH net] phonet: properly unshare skbs in phonet_rcv() Eric Dumazet
  2016-01-12 18:15                     ` [OOPS] In __netif_receive_skb_core Salam Noureddine
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2016-01-12 14:19 UTC (permalink / raw)
  To: Ivaylo Dimitrov
  Cc: Salam Noureddine, David S. Miller, Pali Rohár,
	Network Development, LKML, Sebastian Reichel, linux-omap

On Tue, 2016-01-12 at 09:16 +0200, Ivaylo Dimitrov wrote:
> 
> On 12.01.2016 04:25, Eric Dumazet wrote:

> >
> > OK please try this fix :
> >
> > diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c
> > index 10d42f3220ab..f925753668a7 100644
> > --- a/net/phonet/af_phonet.c
> > +++ b/net/phonet/af_phonet.c
> > @@ -377,6 +377,10 @@ static int phonet_rcv(struct sk_buff *skb, struct net_device *dev,
> >   	struct sockaddr_pn sa;
> >   	u16 len;
> >
> > +	skb = skb_share_check(skb, GFP_ATOMIC);
> > +	if (!skb)
> > +		return NET_RX_DROP;
> > +
> >   	/* check we have at least a full Phonet header */
> >   	if (!pskb_pull(skb, sizeof(struct phonethdr)))
> >   		goto out;
> >
> >
> 
> That one fixes the oops, though I wonder if your previous patch is 
> needed (I reverted it before testing the current). Unfortunately I don't 
> have SIM card around to test GPRS connection with, will do it as soon as 
> I find one and will report.

Well, this bug in phonet_rcv() is rather obvious, I have no idea why
nobody got crashes or corruptions before today.

I'll send a formal patch.

Thanks for your help !

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH net] phonet: properly unshare skbs in phonet_rcv()
  2016-01-12 14:19                     ` Eric Dumazet
@ 2016-01-12 16:58                       ` Eric Dumazet
  2016-01-12 20:47                         ` David Miller
  2016-01-13 12:26                         ` Rémi Denis-Courmont
  0 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2016-01-12 16:58 UTC (permalink / raw)
  To: Ivaylo Dimitrov, Remi Denis-Courmont
  Cc: Salam Noureddine, David S. Miller, Pali Rohár,
	Network Development, LKML, Sebastian Reichel, linux-omap

From: Eric Dumazet <edumazet@google.com>

Ivaylo Dimitrov reported a regression caused by commit 7866a621043f
("dev: add per net_device packet type chains").

skb->dev becomes NULL and we crash in __netif_receive_skb_core().

Before the above commit, different kinds of bugs or corruptions could
happen without a major crash.

But the root cause is that phonet_rcv() can queue skb without checking
if skb is shared or not.

Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests.

Reported-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Tested-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Remi Denis-Courmont <courmisch@gmail.com>
---
 net/phonet/af_phonet.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c
index 10d42f3220ab..f925753668a7 100644
--- a/net/phonet/af_phonet.c
+++ b/net/phonet/af_phonet.c
@@ -377,6 +377,10 @@ static int phonet_rcv(struct sk_buff *skb, struct net_device *dev,
 	struct sockaddr_pn sa;
 	u16 len;
 
+	skb = skb_share_check(skb, GFP_ATOMIC);
+	if (!skb)
+		return NET_RX_DROP;
+
 	/* check we have at least a full Phonet header */
 	if (!pskb_pull(skb, sizeof(struct phonethdr)))
 		goto out;

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [OOPS] In __netif_receive_skb_core
  2016-01-12  7:16                   ` Ivaylo Dimitrov
  2016-01-12 14:19                     ` Eric Dumazet
@ 2016-01-12 18:15                     ` Salam Noureddine
  1 sibling, 0 replies; 17+ messages in thread
From: Salam Noureddine @ 2016-01-12 18:15 UTC (permalink / raw)
  To: Ivaylo Dimitrov
  Cc: Eric Dumazet, David S. Miller, Pali Rohár,
	Network Development, LKML, Sebastian Reichel, linux-omap

On Mon, Jan 11, 2016 at 11:16 PM, Ivaylo Dimitrov
<ivo.g.dimitrov.75@gmail.com> wrote:

>>> On Mon, 2016-01-11 at 17:19 -0800, Salam Noureddine wrote:
>>>>
>>>> It must be that skb->dev was changed to NULL inside of
>>>> __netif_receive_skb_core, otherwise we would have crashed much
>>>> earlier. Also, orig_dev is saved at the beginning. Possibly a device
>>>> is layered on top of the original device.
>
>
> Exactly (skb->dev was changed to NULL ...). Do you think it makes sense to
> put printks in various places in __netif_receive_skb_core to see after which
> function call skb->dev turns into NULL?
>

No need anymore since Eric found the culprit in phonet_rcv.
Thanks for your help debugging this!

Salam

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] phonet: properly unshare skbs in phonet_rcv()
  2016-01-12 16:58                       ` [PATCH net] phonet: properly unshare skbs in phonet_rcv() Eric Dumazet
@ 2016-01-12 20:47                         ` David Miller
  2016-01-13 12:26                         ` Rémi Denis-Courmont
  1 sibling, 0 replies; 17+ messages in thread
From: David Miller @ 2016-01-12 20:47 UTC (permalink / raw)
  To: eric.dumazet
  Cc: ivo.g.dimitrov.75, courmisch, noureddine, pali.rohar, netdev,
	linux-kernel, sre, linux-omap

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 12 Jan 2016 08:58:00 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Ivaylo Dimitrov reported a regression caused by commit 7866a621043f
> ("dev: add per net_device packet type chains").
> 
> skb->dev becomes NULL and we crash in __netif_receive_skb_core().
> 
> Before above commit, different kind of bugs or corruptions could happen
> without major crash.
> 
> But the root cause is that phonet_rcv() can queue skb without checking
> if skb is shared or not.
> 
> Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests.
> 
> Reported-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
> Tested-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Remi Denis-Courmont <courmisch@gmail.com>

Applied and queued up for -stable, thanks Eric.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] phonet: properly unshare skbs in phonet_rcv()
  2016-01-12 16:58                       ` [PATCH net] phonet: properly unshare skbs in phonet_rcv() Eric Dumazet
  2016-01-12 20:47                         ` David Miller
@ 2016-01-13 12:26                         ` Rémi Denis-Courmont
  2016-01-13 15:07                           ` Eric Dumazet
  1 sibling, 1 reply; 17+ messages in thread
From: Rémi Denis-Courmont @ 2016-01-13 12:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development

Le 2016-01-12 18:58, Eric Dumazet a écrit :
> From: Eric Dumazet <edumazet@google.com>
>
> Ivaylo Dimitrov reported a regression caused by commit 7866a621043f
> ("dev: add per net_device packet type chains").
>
> skb->dev becomes NULL and we crash in __netif_receive_skb_core().
>
> Before above commit, different kind of bugs or corruptions could
> happen without major crash.

Hmm... was that always a problem then, or did it get introduced 
earlier? I thought it was impossible for skb to be shared on PF-recv 
callback way back.

> But the root cause is that phonet_rcv() can queue skb without
> checking if skb is shared or not.
>
> Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests.
>
> Reported-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
> Tested-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Remi Denis-Courmont <courmisch@gmail.com>
> ---
>  net/phonet/af_phonet.c |    4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/phonet/af_phonet.c b/net/phonet/af_phonet.c
> index 10d42f3220ab..f925753668a7 100644
> --- a/net/phonet/af_phonet.c
> +++ b/net/phonet/af_phonet.c
> @@ -377,6 +377,10 @@ static int phonet_rcv(struct sk_buff *skb, struct net_device *dev,
>  	struct sockaddr_pn sa;
>  	u16 len;
>
> +	skb = skb_share_check(skb, GFP_ATOMIC);
> +	if (!skb)
> +		return NET_RX_DROP;
> +
>  	/* check we have at least a full Phonet header */
>  	if (!pskb_pull(skb, sizeof(struct phonethdr)))
>  		goto out;

Ack, thanks.

-- 
Rémi Denis-Courmont
http://www.remlab.net/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net] phonet: properly unshare skbs in phonet_rcv()
  2016-01-13 12:26                         ` Rémi Denis-Courmont
@ 2016-01-13 15:07                           ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2016-01-13 15:07 UTC (permalink / raw)
  To: Rémi Denis-Courmont; +Cc: Network Development

On Wed, 2016-01-13 at 14:26 +0200, Rémi Denis-Courmont wrote:
> Le 2016-01-12 18:58, Eric Dumazet a écrit :
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Ivaylo Dimitrov reported a regression caused by commit 7866a621043f
> > ("dev: add per net_device packet type chains").
> >
> > skb->dev becomes NULL and we crash in __netif_receive_skb_core().
> >
> > Before above commit, different kind of bugs or corruptions could
> > happen without major crash.
> 
> Hmm... was that always a problem then, or did it get introduced 
> earlier? I thought it was impossible for skb to be shared on PF-recv 
> callback way back.

Always been a problem.

Use tcpdump and you risk use after free.

Or simply panics if skb->head needs to be expanded.

Other relevant commits to explain this :

044453b3efdc90bdd5feffe74b99d95dec70ac43 arp: fix possible crash in
arp_rcv()

b30532515f0a62bfe17207ab00883dd262497006 bonding: Ensure that we unshare
skbs prior to calling pskb_may_pull
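
As an illustration of the "panics if skb->head needs to be expanded" case mentioned above, here is a simplified userspace model. It assumes that growing the linear area is refused for a shared buffer (in the kernel this is a BUG_ON on a shared skb in the head-expansion path). The struct, enum and function names are hypothetical, and the tailroom logic is a deliberate simplification of the real pull path.

```c
#include <assert.h>
#include <stddef.h>

enum pull_result { PULL_OK, PULL_FAIL, PULL_WOULD_BUG };

/* Hypothetical, simplified stand-in for struct sk_buff. */
struct pull_skb_model {
	int users;              /* reference count */
	unsigned int len;       /* total bytes, linear + paged */
	unsigned int headlen;   /* bytes already in the linear area */
	unsigned int tailroom;  /* free bytes after the linear data */
};

/* Models a "make sure `want` bytes are linear" request. */
static enum pull_result may_pull_model(const struct pull_skb_model *skb,
				       unsigned int want)
{
	assert(skb != NULL);
	if (want <= skb->headlen)
		return PULL_OK;         /* already linear, nothing to do */
	if (want > skb->len)
		return PULL_FAIL;       /* packet is too short */
	if (want - skb->headlen <= skb->tailroom)
		return PULL_OK;         /* fits in the existing tailroom */
	/* The head must be reallocated: only legal for a sole owner. */
	return skb->users != 1 ? PULL_WOULD_BUG : PULL_OK;
}
```

Unsharing first, as the patch does, means the handler always reaches this point as the sole owner of its buffer.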

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-01-13 15:07 UTC | newest]

Thread overview: 17+ messages
2016-01-07 17:54 [OOPS] In __netif_receive_skb_core Ivaylo Dimitrov
2016-01-10 17:48 ` Ivaylo Dimitrov
2016-01-10 20:26   ` Eric Dumazet
2016-01-11 21:03     ` Ivaylo Dimitrov
2016-01-11 22:11       ` Salam Noureddine
2016-01-12  0:51         ` Ivaylo Dimitrov
2016-01-12  1:06           ` Eric Dumazet
2016-01-12  1:19             ` Salam Noureddine
2016-01-12  2:21               ` Eric Dumazet
2016-01-12  2:25                 ` Eric Dumazet
2016-01-12  7:16                   ` Ivaylo Dimitrov
2016-01-12 14:19                     ` Eric Dumazet
2016-01-12 16:58                       ` [PATCH net] phonet: properly unshare skbs in phonet_rcv() Eric Dumazet
2016-01-12 20:47                         ` David Miller
2016-01-13 12:26                         ` Rémi Denis-Courmont
2016-01-13 15:07                           ` Eric Dumazet
2016-01-12 18:15                     ` [OOPS] In __netif_receive_skb_core Salam Noureddine
