All of lore.kernel.org
 help / color / mirror / Atom feed
* 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-21 21:07 ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-21 21:07 UTC (permalink / raw)
  To: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

I posted about this a few days ago, but this time the patches applied
are minimal and there are no out-of-tree kernel modules loaded.

I have to have the NFS patches (see below) for this test case to run
at all.

Test case is 2500 macvlans, each with a file-IO process running
on it doing writing on it's own mountpoint.  This is using NFSv4,
O_DIRECT, and 10k write() calls (and each file is 10k long, so
open/close file operations as well.  Smaller files and write sizes
showed the problem as well.

The bug is most easily hit during my application's restart, where interfaces
are being (re)configured and the file-io processes are being
started.  This includes changes to: IP addrs, routes, ip routing rules,
mount/unmount operations, etc.

I suspect it may be some sort of race between tearing down the IP protocol
and having traffic currently being received on that
interface.  I have not been able to hit this using HTTP traffic,
but perhaps that is just part of the race and not a particular
problem with NFS.

The patches applied on top of 3.7.3 are some NFS crash fixes Trond posted
last Friday and my own patches I just posted to linux-nfs:

http://www.spinics.net/lists/linux-nfs/msg34811.html

If anyone has any suggestions for how to further debug this,
please let me know.

All of the crashes are in the same place, but the value of
the bad address changes...sometimes appears as if NULL were
dereferenced...

<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/greearb/kernel/2.6/linux-3.7.x64/vmlinux...done.
(gdb) l *(ip_rcv_finish+0x2b7)
0xffffffff8149c783 is in ip_rcv_finish (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:373).
368					skb->len);
369		} else if (rt->rt_type == RTN_BROADCAST)
370			IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INBCAST,
371					skb->len);
372	
373		return dst_input(skb);
374	
375	drop:
376		kfree_skb(skb);
377		return NET_RX_DROP;
(gdb) quit



IPv6: ADDRCONF(NETDEV_CHANGE): eth2#903: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): eth2#923: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): eth2#943: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): eth2#963: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): eth2#983: link becomes ready
kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle kernel paging request at ffff88040d7d8000
IP: [<ffff88040d7d8000>] 0xffff88040d7d7fff
PGD 1a0c063 PUD df78e067 PMD 800000040d6001e3
Oops: 0011 [#1] PREEMPT SMP
Modules linked in: macvlan pktgen lockd sunrpc uinput coretemp hwmon kvm_intel kvm gpio_ich iTCO_wdt iTCO_vendor_support microcode pcspkr lpc_ich i2c_i801 
e1000e ioatdma igb i7core_edac edac_core ptp pps_core dca ipv6 mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
CPU 6
Pid: 47, comm: ksoftirqd/6 Tainted: G         C   3.7.3+ #37 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<ffff88040d7d8000>]  [<ffff88040d7d8000>] 0xffff88040d7d7fff
RSP: 0018:ffff88040d75bc00  EFLAGS: 00010282
RAX: ffff88040435cd80 RBX: ffff8803dd83e300 RCX: ffff8803dd83e300
RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff8803dd83e300
RBP: ffff88040d75bc28 R08: ffffffff8149c4cc R09: ffff88040d75bbf0
R10: dead000000200200 R11: dead000000100100 R12: ffff8803e10c88fc
R13: ffff8803dd83e300 R14: ffff88040d38e000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff88040d7d8000 CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksoftirqd/6 (pid: 47, threadinfo ffff88040d75a000, task ffff88040d751730)
Stack:
  ffffffff8149c783 ffff8803dd83e300 ffffffff8149c4cc ffff8803dd83e300
  ffff88040d38e000 ffff88040d75bc58 ffffffff8149cae8 0000000080000000
  0000000000000001 ffff8803dd83e300 ffff88040d38e000 ffff88040d75bc88
Call Trace:
  [<ffffffff8149c783>] ? ip_rcv_finish+0x2b7/0x2cf
  [<ffffffff8149c4cc>] ? inet_del_protocol+0x37/0x37
  [<ffffffff8149cae8>] NF_HOOK.clone.1+0x4c/0x53
  [<ffffffff8149cd73>] ip_rcv+0x237/0x268
  [<ffffffff81468c46>] __netif_receive_skb+0x487/0x530
  [<ffffffff81468de5>] process_backlog+0xf6/0x1d7
  [<ffffffff8146b19b>] net_rx_action+0xad/0x20c
  [<ffffffff8108d292>] __do_softirq+0x9c/0x161
  [<ffffffff8108d37a>] run_ksoftirqd+0x23/0x42
  [<ffffffff810a6f3b>] smpboot_thread_fn+0x253/0x258
  [<ffffffff810a6ce8>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff8109ff60>] kthread+0xbf/0xc7
  [<ffffffff8109fea1>] ? __init_kthread_worker+0x37/0x37
  [<ffffffff815292bc>] ret_from_fork+0x7c/0xb0
  [<ffffffff8109fea1>] ? __init_kthread_worker+0x37/0x37
Code: 36 a4 36 6a ea 6a ea 6a ea 6a ea e9 81 e9 81 e9 81 e9 81 71 48 71 48 71 48 71 48 ea 50 ea 50 ea 50 ea 50 f7 e4 f7 e4 f7 e4 f7 e4 <22> 22 00 00 ad 4e ad de 
ff ff ff ff 00 00 00 00 ff ff ff ff ff
RIP  [<ffff88040d7d8000>] 0xffff88040d7d7fff
  RSP <ffff88040d75bc00>
CR2: ffff88040d7d8000
---[ end trace 1c131533c1de5e8b ]---

Thanks,
Ben

-- 
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-21 21:07 ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-21 21:07 UTC (permalink / raw)
  To: netdev, linux-nfs

I posted about this a few days ago, but this time the patches applied
are minimal and there are no out-of-tree kernel modules loaded.

I have to have the NFS patches (see below) for this test case to run
at all.

Test case is 2500 macvlans, each with a file-IO process running
on it doing writing on it's own mountpoint.  This is using NFSv4,
O_DIRECT, and 10k write() calls (and each file is 10k long, so
open/close file operations as well.  Smaller files and write sizes
showed the problem as well.

The bug is most easily hit during my application's restart, where interfaces
are being (re)configured and the file-io processes are being
started.  This includes changes to: IP addrs, routes, ip routing rules,
mount/unmount operations, etc.

I suspect it may be some sort of race between tearing down the IP protocol
and having traffic currently being received on that
interface.  I have not been able to hit this using HTTP traffic,
but perhaps that is just part of the race and not a particular
problem with NFS.

The patches applied on top of 3.7.3 are some NFS crash fixes Trond posted
last Friday and my own patches I just posted to linux-nfs:

http://www.spinics.net/lists/linux-nfs/msg34811.html

If anyone has any suggestions for how to further debug this,
please let me know.

All of the crashes are in the same place, but the value of
the bad address changes...sometimes appears as if NULL were
dereferenced...

<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/greearb/kernel/2.6/linux-3.7.x64/vmlinux...done.
(gdb) l *(ip_rcv_finish+0x2b7)
0xffffffff8149c783 is in ip_rcv_finish (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:373).
368					skb->len);
369		} else if (rt->rt_type == RTN_BROADCAST)
370			IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INBCAST,
371					skb->len);
372	
373		return dst_input(skb);
374	
375	drop:
376		kfree_skb(skb);
377		return NET_RX_DROP;
(gdb) quit



IPv6: ADDRCONF(NETDEV_CHANGE): eth2#903: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): eth2#923: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): eth2#943: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): eth2#963: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): eth2#983: link becomes ready
kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle kernel paging request at ffff88040d7d8000
IP: [<ffff88040d7d8000>] 0xffff88040d7d7fff
PGD 1a0c063 PUD df78e067 PMD 800000040d6001e3
Oops: 0011 [#1] PREEMPT SMP
Modules linked in: macvlan pktgen lockd sunrpc uinput coretemp hwmon kvm_intel kvm gpio_ich iTCO_wdt iTCO_vendor_support microcode pcspkr lpc_ich i2c_i801 
e1000e ioatdma igb i7core_edac edac_core ptp pps_core dca ipv6 mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
CPU 6
Pid: 47, comm: ksoftirqd/6 Tainted: G         C   3.7.3+ #37 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<ffff88040d7d8000>]  [<ffff88040d7d8000>] 0xffff88040d7d7fff
RSP: 0018:ffff88040d75bc00  EFLAGS: 00010282
RAX: ffff88040435cd80 RBX: ffff8803dd83e300 RCX: ffff8803dd83e300
RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff8803dd83e300
RBP: ffff88040d75bc28 R08: ffffffff8149c4cc R09: ffff88040d75bbf0
R10: dead000000200200 R11: dead000000100100 R12: ffff8803e10c88fc
R13: ffff8803dd83e300 R14: ffff88040d38e000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff88040d7d8000 CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksoftirqd/6 (pid: 47, threadinfo ffff88040d75a000, task ffff88040d751730)
Stack:
  ffffffff8149c783 ffff8803dd83e300 ffffffff8149c4cc ffff8803dd83e300
  ffff88040d38e000 ffff88040d75bc58 ffffffff8149cae8 0000000080000000
  0000000000000001 ffff8803dd83e300 ffff88040d38e000 ffff88040d75bc88
Call Trace:
  [<ffffffff8149c783>] ? ip_rcv_finish+0x2b7/0x2cf
  [<ffffffff8149c4cc>] ? inet_del_protocol+0x37/0x37
  [<ffffffff8149cae8>] NF_HOOK.clone.1+0x4c/0x53
  [<ffffffff8149cd73>] ip_rcv+0x237/0x268
  [<ffffffff81468c46>] __netif_receive_skb+0x487/0x530
  [<ffffffff81468de5>] process_backlog+0xf6/0x1d7
  [<ffffffff8146b19b>] net_rx_action+0xad/0x20c
  [<ffffffff8108d292>] __do_softirq+0x9c/0x161
  [<ffffffff8108d37a>] run_ksoftirqd+0x23/0x42
  [<ffffffff810a6f3b>] smpboot_thread_fn+0x253/0x258
  [<ffffffff810a6ce8>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff8109ff60>] kthread+0xbf/0xc7
  [<ffffffff8109fea1>] ? __init_kthread_worker+0x37/0x37
  [<ffffffff815292bc>] ret_from_fork+0x7c/0xb0
  [<ffffffff8109fea1>] ? __init_kthread_worker+0x37/0x37
Code: 36 a4 36 6a ea 6a ea 6a ea 6a ea e9 81 e9 81 e9 81 e9 81 71 48 71 48 71 48 71 48 ea 50 ea 50 ea 50 ea 50 f7 e4 f7 e4 f7 e4 f7 e4 <22> 22 00 00 ad 4e ad de 
ff ff ff ff 00 00 00 00 ff ff ff ff ff
RIP  [<ffff88040d7d8000>] 0xffff88040d7d7fff
  RSP <ffff88040d75bc00>
CR2: ffff88040d7d8000
---[ end trace 1c131533c1de5e8b ]---

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-21 21:07 ` Ben Greear
  (?)
@ 2013-01-22  0:32 ` Ben Greear
  2013-01-22  4:40   ` Eric Dumazet
  -1 siblings, 1 reply; 53+ messages in thread
From: Ben Greear @ 2013-01-22  0:32 UTC (permalink / raw)
  To: netdev, linux-nfs

On 01/21/2013 01:07 PM, Ben Greear wrote:
> I posted about this a few days ago, but this time the patches applied
> are minimal and there are no out-of-tree kernel modules loaded.

Here's another crash, this time with SLUB memory debugging turned on.

Seems much harder to hit this way...I've only managed this one.  I
believe the RCX register might be interesting...that 6b is probably
freed memory poisioning.  Maybe skb or skb_dest() is already freed?

I have added a 'verify_mem_not_deleted(skb)' before the
dst_input line in ip_rcv_finish..but so far, it hasn't
hit the problem...


general protection fault: 0000 [#1] PREEMPT SMP
Modules linked in: 8021q garp stp llc nfsv4 auth_rpcgss nfs fscache macvlan pktgen lockd sunrpc iTCO_wdt iTCO_vendor_support gpio_ich coretemp hwmon kvm_intel 
kvm microcode pcspkr i2c_i801 lpc_ich igb e1000e ioatdma ptp i7core_edac pps_core edac_core dca uinput ipv6 mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core 
[last unloaded: macvlan]
CPU 2
Pid: 22, comm: rcuc/2 Tainted: G         C O 3.7.3+ #38 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<ffffffff814c196d>]  [<ffffffff814c196d>] ip_rcv_finish+0x334/0x34f
RSP: 0018:ffff88041fc43d98  EFLAGS: 00010282
RAX: ffff880369f67180 RBX: ffff8803e4144fc0 RCX: 000000006b6b6b6b
RDX: ffff8803dd632f64 RSI: ffffffff81a2a580 RDI: ffff8803e4144fc0
RBP: ffff88041fc43db8 R08: ffffffff814c1639 R09: ffffffff814c1639
R10: ffff8803dd632f64 R11: ffff8803dd632f64 R12: ffff8803dd632f64
R13: ffff88041fc58420 R14: ffff88040cfa8000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fc40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000079e008 CR3: 00000003d5759000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rcuc/2 (pid: 22, threadinfo ffff88040d7d4000, task ffff88040d7d8000)
Stack:
  ffff8803e4144fc0 ffffffff814c1639 ffff8803e4144fc0 ffff88040cfa8000
  ffff88041fc43de8 ffffffff814c1e61 ffff880480000000 0000000000000246
  ffff8803e4144fc0 ffff88040cfa8000 ffff88041fc43e18 ffffffff814c20ec
Call Trace:
  <IRQ>
  [<ffffffff814c1639>] ? skb_dst+0x41/0x41
  [<ffffffff814c1e61>] NF_HOOK.clone.1+0x4c/0x53
  [<ffffffff814c20ec>] ip_rcv+0x237/0x267
  [<ffffffff814887f0>] __netif_receive_skb+0x537/0x5e0
  [<ffffffff8148895a>] process_backlog+0xc1/0x1b7
  [<ffffffff8148b082>] ? net_rx_action+0x1c4/0x1e2
  [<ffffffff8148af70>] net_rx_action+0xb2/0x1e2
  [<ffffffff8108eeb8>] __do_softirq+0xab/0x17b
  [<ffffffff81557cfc>] call_softirq+0x1c/0x30
  <EOI>
  [<ffffffff8100bd46>] do_softirq+0x46/0x9e
  [<ffffffff810fb709>] ? rcu_cpu_kthread+0xf6/0x12f
  [<ffffffff8108f08d>] _local_bh_enable_ip+0xb3/0xe7
  [<ffffffff8108f0d9>] local_bh_enable+0xd/0x10
  [<ffffffff810fb709>] rcu_cpu_kthread+0xf6/0x12f
  [<ffffffff810aacde>] smpboot_thread_fn+0x253/0x259
  [<ffffffff810aaa8b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810a3439>] kthread+0xc2/0xca
  [<ffffffff810a3377>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff815569bc>] ret_from_fork+0x7c/0xb0
  [<ffffffff810a3377>] ? __init_kthread_worker+0x56/0x56
Code: 8b 80 58 04 00 00 48 8b 80 48 01 00 00 65 48 ff 80 c8 00 00 00 8b 53 68 65 48 01 90 e8 00 00 00 48 89 df e8
8e fc ff ff 48 89 df <ff> 50 50 eb 0d 48 89 df e8 2c eb fb ff b8 01 00 00 00 5b 41 5c
RIP  [<ffffffff814c196d>] ip_rcv_finish+0x334/0x34f
  RSP <ffff88041fc43d98>
[drm:mgag200_bo_reserve] *ERROR* reserve failed ffff8803feba9fb0


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22  0:32 ` Ben Greear
@ 2013-01-22  4:40   ` Eric Dumazet
  2013-01-22  5:57       ` Ben Greear
  0 siblings, 1 reply; 53+ messages in thread
From: Eric Dumazet @ 2013-01-22  4:40 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs

On Mon, 2013-01-21 at 16:32 -0800, Ben Greear wrote:
> On 01/21/2013 01:07 PM, Ben Greear wrote:
> > I posted about this a few days ago, but this time the patches applied
> > are minimal and there are no out-of-tree kernel modules loaded.
> 
> Here's another crash, this time with SLUB memory debugging turned on.
> 
> Seems much harder to hit this way...I've only managed this one.  I
> believe the RCX register might be interesting...that 6b is probably
> freed memory poisioning.  Maybe skb or skb_dest() is already freed?
> 
> I have added a 'verify_mem_not_deleted(skb)' before the
> dst_input line in ip_rcv_finish..but so far, it hasn't
> hit the problem...
> 

There is no way skb is freed here.

I would say macvlan is at fault here.

It probably lacks a proper (dev->flags & IFF_UP) test.

Please try :

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index d3fb97d..c07bdef 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -111,9 +111,15 @@ static int macvlan_broadcast_one(struct sk_buff *skb,
 				 const struct ethhdr *eth, bool local)
 {
 	struct net_device *dev = vlan->dev;
+
 	if (!skb)
 		return NET_RX_DROP;
 
+	if (!(dev->flags & IFF_UP)) {
+		kfree_skb(skb);
+		return NET_RX_DROP;
+	}
+
 	if (local)
 		return vlan->forward(dev, skb);
 

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22  4:40   ` Eric Dumazet
@ 2013-01-22  5:57       ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-22  5:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On 01/21/2013 08:40 PM, Eric Dumazet wrote:
> On Mon, 2013-01-21 at 16:32 -0800, Ben Greear wrote:
>> On 01/21/2013 01:07 PM, Ben Greear wrote:
>>> I posted about this a few days ago, but this time the patches applied
>>> are minimal and there are no out-of-tree kernel modules loaded.
>>
>> Here's another crash, this time with SLUB memory debugging turned on.
>>
>> Seems much harder to hit this way...I've only managed this one.  I
>> believe the RCX register might be interesting...that 6b is probably
>> freed memory poisioning.  Maybe skb or skb_dest() is already freed?
>>
>> I have added a 'verify_mem_not_deleted(skb)' before the
>> dst_input line in ip_rcv_finish..but so far, it hasn't
>> hit the problem...
>>
>
> There is no way skb is freed here.
>
> I would say macvlan is at fault here.
>
> It probably lacks a proper (dev->flags & IFF_UP) test.

Thanks for the patch.  It's running OK so far, but I'll need to
do a bunch more testing tomorrow to make sure I'm not just getting
lucky!

I'll let you know how it goes tomorrow.

Thanks,
Ben



-- 
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-22  5:57       ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-22  5:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs

On 01/21/2013 08:40 PM, Eric Dumazet wrote:
> On Mon, 2013-01-21 at 16:32 -0800, Ben Greear wrote:
>> On 01/21/2013 01:07 PM, Ben Greear wrote:
>>> I posted about this a few days ago, but this time the patches applied
>>> are minimal and there are no out-of-tree kernel modules loaded.
>>
>> Here's another crash, this time with SLUB memory debugging turned on.
>>
>> Seems much harder to hit this way...I've only managed this one.  I
>> believe the RCX register might be interesting...that 6b is probably
>> freed memory poisioning.  Maybe skb or skb_dest() is already freed?
>>
>> I have added a 'verify_mem_not_deleted(skb)' before the
>> dst_input line in ip_rcv_finish..but so far, it hasn't
>> hit the problem...
>>
>
> There is no way skb is freed here.
>
> I would say macvlan is at fault here.
>
> It probably lacks a proper (dev->flags & IFF_UP) test.

Thanks for the patch.  It's running OK so far, but I'll need to
do a bunch more testing tomorrow to make sure I'm not just getting
lucky!

I'll let you know how it goes tomorrow.

Thanks,
Ben



-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22  5:57       ` Ben Greear
@ 2013-01-22 17:08           ` Ben Greear
  -1 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-22 17:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On 01/21/2013 09:57 PM, Ben Greear wrote:
> On 01/21/2013 08:40 PM, Eric Dumazet wrote:
>> On Mon, 2013-01-21 at 16:32 -0800, Ben Greear wrote:
>>> On 01/21/2013 01:07 PM, Ben Greear wrote:
>>>> I posted about this a few days ago, but this time the patches applied
>>>> are minimal and there are no out-of-tree kernel modules loaded.
>>>
>>> Here's another crash, this time with SLUB memory debugging turned on.
>>>
>>> Seems much harder to hit this way...I've only managed this one.  I
>>> believe the RCX register might be interesting...that 6b is probably
>>> freed memory poisioning.  Maybe skb or skb_dest() is already freed?
>>>
>>> I have added a 'verify_mem_not_deleted(skb)' before the
>>> dst_input line in ip_rcv_finish..but so far, it hasn't
>>> hit the problem...
>>>
>>
>> There is no way skb is freed here.
>>
>> I would say macvlan is at fault here.
>>
>> It probably lacks a proper (dev->flags & IFF_UP) test.
>
> Thanks for the patch.  It's running OK so far, but I'll need to
> do a bunch more testing tomorrow to make sure I'm not just getting
> lucky!
>
> I'll let you know how it goes tomorrow.

Unfortunately, I hit it again this morning after the first restart of
my application (which bounces all 3000 interfaces).  Memory poisoning
was disabled.

(gdb) l *(ip_rcv_finish+0x2b9)
0xffffffff814a8ab3 is in ip_rcv_finish (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:373).
368					skb->len);
369		} else if (rt->rt_type == RTN_BROADCAST)
370			IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INBCAST,
371					skb->len);
372	
373		return dst_input(skb);
374	
375	drop:
376		kfree_skb(skb);
377		return NET_RX_DROP;
(gdb)

(gdb) l *(skb_dst+0x5a)
0xffffffff814a87fa is in ip_rcv_finish (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:320).
315	
316	int sysctl_ip_early_demux __read_mostly = 1;
317	EXPORT_SYMBOL(sysctl_ip_early_demux);
318	
319	static int ip_rcv_finish(struct sk_buff *skb)
320	{
321		const struct iphdr *iph = ip_hdr(skb);
322		struct rtable *rt;
323	
324		if (sysctl_ip_early_demux && !skb_dst(skb)) {

nfs: server 10.1.0.1 not responding, timed out
nfs: server 10.1.0.1 not responding, timed out
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 0
Oops: 0010 [#1] PREEMPT SMP
Modules linked in: nf_nat_ipv4 nf_nat nfsv4 auth_rpcgss nfs fscache 8021q garp stp llc lockd sunrpc macvlan pktgen uinput coretemp hwmon kvm_intel kvm iTCO_wdt 
iTCO_vendor_support gpio_ich microcode pcspkr i2c_i801 lpc_ich e1000e i7core_edac ioatdma edac_core igb ptp pps_core dca ipv6 mgag200 i2c_algo_bit 
drm_kms_helper ttm drm i2c_core [last unloaded: iptable_nat]
CPU 0
Pid: 9, comm: rcuc/0 Tainted: G         C O 3.7.3+ #43 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff88041fc03da0  EFLAGS: 00010286
RAX: ffff88023045f5c0 RBX: ffff88034491ab00 RCX: 00000000ffff8800
RDX: ffff88025920d4fc RSI: ffffffff81a2a500 RDI: ffff88034491ab00
RBP: ffff88041fc03dc8 R08: ffffffff814a87fa R09: ffff88041fc03d90
R10: ffff88025920d4fc R11: ffff88041fc03e28 R12: ffff88025920d4fc
R13: ffff88041fc18b60 R14: ffff88040d3f8000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a0b000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rcuc/0 (pid: 9, threadinfo ffff88040d4e2000, task ffff88040d4e8000)
Stack:
  ffffffff814a8ab3 ffff88034491ab00 ffffffff814a87fa ffff88034491ab00
  ffff88040d3f8000 ffff88041fc03df8 ffffffff814a8e66 0000000080000000
  ffffffff81472e61 ffff88034491ab00 ffff88040d3f8000 ffff88041fc03e28
Call Trace:
  <IRQ>
  [<ffffffff814a8ab3>] ? ip_rcv_finish+0x2b9/0x2d1
  [<ffffffff814a87fa>] ? skb_dst+0x5a/0x5a
  [<ffffffff814a8e66>] NF_HOOK.clone.1+0x4c/0x54
  [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
  [<ffffffff814a90f3>] ip_rcv+0x237/0x268
  [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
  [<ffffffff81473f91>] process_backlog+0xf9/0x1da
nfs: server 10.1.8.1 not responding, timed out
  [<ffffffff8147639a>] net_rx_action+0xad/0x218
  [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
  [<ffffffff81538ddc>] call_softirq+0x1c/0x30
  <EOI>
  [<ffffffff8100bd21>] do_softirq+0x41/0x7e
  [<ffffffff8108d68b>] _local_bh_enable_ip+0x7a/0x9f
  [<ffffffff8108d6c8>] local_bh_enable+0xd/0x11
  [<ffffffff810f3661>] rcu_cpu_kthread+0xe6/0x11f
  [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
  [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810a0a6d>] kthread+0xc2/0xca
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff81537afc>] ret_from_fork+0x7c/0xb0
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
Code:  Bad RIP value.nfs: server 10.1.0.1 not responding, timed out
nfs: server 10.1.0.1 not responding, timed out

>
> Thanks,
> Ben
>
>
>


-- 
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-22 17:08           ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-22 17:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs

On 01/21/2013 09:57 PM, Ben Greear wrote:
> On 01/21/2013 08:40 PM, Eric Dumazet wrote:
>> On Mon, 2013-01-21 at 16:32 -0800, Ben Greear wrote:
>>> On 01/21/2013 01:07 PM, Ben Greear wrote:
>>>> I posted about this a few days ago, but this time the patches applied
>>>> are minimal and there are no out-of-tree kernel modules loaded.
>>>
>>> Here's another crash, this time with SLUB memory debugging turned on.
>>>
>>> Seems much harder to hit this way...I've only managed this one.  I
>>> believe the RCX register might be interesting...that 6b is probably
>>> freed memory poisioning.  Maybe skb or skb_dest() is already freed?
>>>
>>> I have added a 'verify_mem_not_deleted(skb)' before the
>>> dst_input line in ip_rcv_finish..but so far, it hasn't
>>> hit the problem...
>>>
>>
>> There is no way skb is freed here.
>>
>> I would say macvlan is at fault here.
>>
>> It probably lacks a proper (dev->flags & IFF_UP) test.
>
> Thanks for the patch.  It's running OK so far, but I'll need to
> do a bunch more testing tomorrow to make sure I'm not just getting
> lucky!
>
> I'll let you know how it goes tomorrow.

Unfortunately, I hit it again this morning after the first restart of
my application (which bounces all 3000 interfaces).  Memory poisoning
was disabled.

(gdb) l *(ip_rcv_finish+0x2b9)
0xffffffff814a8ab3 is in ip_rcv_finish (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:373).
368					skb->len);
369		} else if (rt->rt_type == RTN_BROADCAST)
370			IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INBCAST,
371					skb->len);
372	
373		return dst_input(skb);
374	
375	drop:
376		kfree_skb(skb);
377		return NET_RX_DROP;
(gdb)

(gdb) l *(skb_dst+0x5a)
0xffffffff814a87fa is in ip_rcv_finish (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:320).
315	
316	int sysctl_ip_early_demux __read_mostly = 1;
317	EXPORT_SYMBOL(sysctl_ip_early_demux);
318	
319	static int ip_rcv_finish(struct sk_buff *skb)
320	{
321		const struct iphdr *iph = ip_hdr(skb);
322		struct rtable *rt;
323	
324		if (sysctl_ip_early_demux && !skb_dst(skb)) {

nfs: server 10.1.0.1 not responding, timed out
nfs: server 10.1.0.1 not responding, timed out
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 0
Oops: 0010 [#1] PREEMPT SMP
Modules linked in: nf_nat_ipv4 nf_nat nfsv4 auth_rpcgss nfs fscache 8021q garp stp llc lockd sunrpc macvlan pktgen uinput coretemp hwmon kvm_intel kvm iTCO_wdt 
iTCO_vendor_support gpio_ich microcode pcspkr i2c_i801 lpc_ich e1000e i7core_edac ioatdma edac_core igb ptp pps_core dca ipv6 mgag200 i2c_algo_bit 
drm_kms_helper ttm drm i2c_core [last unloaded: iptable_nat]
CPU 0
Pid: 9, comm: rcuc/0 Tainted: G         C O 3.7.3+ #43 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff88041fc03da0  EFLAGS: 00010286
RAX: ffff88023045f5c0 RBX: ffff88034491ab00 RCX: 00000000ffff8800
RDX: ffff88025920d4fc RSI: ffffffff81a2a500 RDI: ffff88034491ab00
RBP: ffff88041fc03dc8 R08: ffffffff814a87fa R09: ffff88041fc03d90
R10: ffff88025920d4fc R11: ffff88041fc03e28 R12: ffff88025920d4fc
R13: ffff88041fc18b60 R14: ffff88040d3f8000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a0b000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rcuc/0 (pid: 9, threadinfo ffff88040d4e2000, task ffff88040d4e8000)
Stack:
  ffffffff814a8ab3 ffff88034491ab00 ffffffff814a87fa ffff88034491ab00
  ffff88040d3f8000 ffff88041fc03df8 ffffffff814a8e66 0000000080000000
  ffffffff81472e61 ffff88034491ab00 ffff88040d3f8000 ffff88041fc03e28
Call Trace:
  <IRQ>
  [<ffffffff814a8ab3>] ? ip_rcv_finish+0x2b9/0x2d1
  [<ffffffff814a87fa>] ? skb_dst+0x5a/0x5a
  [<ffffffff814a8e66>] NF_HOOK.clone.1+0x4c/0x54
  [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
  [<ffffffff814a90f3>] ip_rcv+0x237/0x268
  [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
  [<ffffffff81473f91>] process_backlog+0xf9/0x1da
nfs: server 10.1.8.1 not responding, timed out
  [<ffffffff8147639a>] net_rx_action+0xad/0x218
  [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
  [<ffffffff81538ddc>] call_softirq+0x1c/0x30
  <EOI>
  [<ffffffff8100bd21>] do_softirq+0x41/0x7e
  [<ffffffff8108d68b>] _local_bh_enable_ip+0x7a/0x9f
  [<ffffffff8108d6c8>] local_bh_enable+0xd/0x11
  [<ffffffff810f3661>] rcu_cpu_kthread+0xe6/0x11f
  [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
  [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810a0a6d>] kthread+0xc2/0xca
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff81537afc>] ret_from_fork+0x7c/0xb0
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
Code:  Bad RIP value.nfs: server 10.1.0.1 not responding, timed out
nfs: server 10.1.0.1 not responding, timed out

>
> Thanks,
> Ben
>
>
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22 17:08           ` Ben Greear
@ 2013-01-22 17:17               ` Eric Dumazet
  -1 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2013-01-22 17:17 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:

> Unfortunately, I hit it again this morning after the first restart of
> my application (which bounces all 3000 interfaces).  Memory poisoning
> was disabled.

Is your NFS traffic using TCP or UDP ?


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-22 17:17               ` Eric Dumazet
  0 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2013-01-22 17:17 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs

On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:

> Unfortunately, I hit it again this morning after the first restart of
> my application (which bounces all 3000 interfaces).  Memory poisoning
> was disabled.

Is your NFS traffic using TCP or UDP ?



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22 17:17               ` Eric Dumazet
@ 2013-01-22 17:26                 ` Ben Greear
  -1 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-22 17:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On 01/22/2013 09:17 AM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:
>
>> Unfortunately, I hit it again this morning after the first restart of
>> my application (which bounces all 3000 interfaces).  Memory poisoning
>> was disabled.
>
> Is your NFS traffic using TCP or UDP ?

It's using TCP.

Thanks,
Ben

>


-- 
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-22 17:26                 ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-22 17:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs

On 01/22/2013 09:17 AM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:
>
>> Unfortunately, I hit it again this morning after the first restart of
>> my application (which bounces all 3000 interfaces).  Memory poisoning
>> was disabled.
>
> Is your NFS traffic using TCP or UDP ?

It's using TCP.

Thanks,
Ben

>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22 17:17               ` Eric Dumazet
  (?)
  (?)
@ 2013-01-22 17:26               ` Eric Dumazet
  2013-01-22 22:18                   ` Ben Greear
  -1 siblings, 1 reply; 53+ messages in thread
From: Eric Dumazet @ 2013-01-22 17:26 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs

On Tue, 2013-01-22 at 09:17 -0800, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:
> 
> > Unfortunately, I hit it again this morning after the first restart of
> > my application (which bounces all 3000 interfaces).  Memory poisoning
> > was disabled.
> 
> Is your NFS traffic using TCP or UDP ?
> 

Oh well, it seems macvlan.c has to skb_drop_dst(skb) before giving skb
to netif_rx()

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 68a43fe..4f049ee 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -114,6 +114,7 @@ static int macvlan_broadcast_one(struct sk_buff *skb,
 	if (!skb)
 		return NET_RX_DROP;
 
+	skb_dst_drop(skb);
 	if (local)
 		return vlan->forward(dev, skb);
 
@@ -220,6 +221,7 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
 	if (!skb)
 		goto out;
 
+	skb_dst_drop(skb);
 	skb->dev = dev;
 	skb->pkt_type = PACKET_HOST;
 

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22 17:26               ` Eric Dumazet
@ 2013-01-22 22:18                   ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-22 22:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On 01/22/2013 09:26 AM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 09:17 -0800, Eric Dumazet wrote:
>> On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:
>>
>>> Unfortunately, I hit it again this morning after the first restart of
>>> my application (which bounces all 3000 interfaces).  Memory poisoning
>>> was disabled.
>>
>> Is your NFS traffic using TCP or UDP ?
>>
>
> Oh well, it seems macvlan.c has to skb_drop_dst(skb) before giving skb
> to netif_rx()

I just saw another crash.  It had run 2 user-space restarts and
2 reboots, but on the third reboot, it crashed coming up.  It seemed
to last longer this time, but that could just be luck as it's never
been super easy to reproduce this quickly.

For completeness, here is the diff I was using:

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 68a43fe..eb55c88 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -111,9 +111,16 @@ static int macvlan_broadcast_one(struct sk_buff *skb,
                                  const struct ethhdr *eth, bool local)
  {
         struct net_device *dev = vlan->dev;
+
         if (!skb)
                 return NET_RX_DROP;

+       if (!(dev->flags & IFF_UP)) {
+               kfree_skb(skb);
+               return NET_RX_DROP;
+       }
+
+       skb_dst_drop(skb);
         if (local)
                 return vlan->forward(dev, skb);

@@ -220,6 +227,7 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
         if (!skb)
                 goto out;

+       skb_dst_drop(skb);
         skb->dev = dev;
         skb->pkt_type = PACKET_HOST;


BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 0
Oops: 0010 [#1] PREEMPT SMP
Modules linked in: nf_nat_ipv4 nf_nat nfsv4 auth_rpcgss nfs fscache 8021q garp stp llc macvlan pktgen lockd sunrpc uinput iTCO_wdt iTCO_vendor_support gpio_ich 
coretemp hwmon kvm_intel kvm microcode pcspkr i2c_i801 lpc_ich e1000e igb ptp ioatdma i7core_edac pps_core dca edac_core ipv6 mgag200 i2c_algo_bit 
drm_kms_helper ttm drm i2c_core [last unloaded: iptable_nat]
CPU 5
Pid: 40, comm: rcuc/5 Tainted: G         C   3.7.3+ #43 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff88041fca3da0  EFLAGS: 00010282
RAX: ffff88030ae8bc80 RBX: ffff880198694500 RCX: 0000000000000028
RDX: ffffffff81aafcb0 RSI: ffffffff81a2a500 RDI: ffff880198694500
RBP: ffff88041fca3dc8 R08: ffffffff814a87fa R09: ffff88041fca3d90
R10: ffff8803dc45b8fc R11: ffff88041fca3e28 R12: ffff8803dc45b8fc
R13: ffff880198694500 R14: ffff88040d3f8000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fca0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rcuc/5 (pid: 40, threadinfo ffff88040d73c000, task ffff88040d723ea0)
Stack:
  ffffffff814a8ab3 ffff880198694500 ffffffff814a87fa ffff880198694500
  ffff88040d3f8000 ffff88041fca3df8 ffffffff814a8e66 0000000080000000
  ffffffff81472e61 ffff880198694500 ffff88040d3f8000 ffff88041fca3e28
Call Trace:
  <IRQ>
  [<ffffffff814a8ab3>] ? ip_rcv_finish+0x2b9/0x2d1
  [<ffffffff814a87fa>] ? skb_dst+0x5a/0x5a
  [<ffffffff814a8e66>] NF_HOOK.clone.1+0x4c/0x54
  [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
  [<ffffffff814a90f3>] ip_rcv+0x237/0x268
  [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
  [<ffffffff81473f91>] process_backlog+0xf9/0x1da
  [<ffffffff8147639a>] net_rx_action+0xad/0x218
  [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
  [<ffffffff81538ddc>] call_softirq+0x1c/0x30
  <EOI>
  [<ffffffff8100bd21>] do_softirq+0x41/0x7e
  [<ffffffff8108d68b>] _local_bh_enable_ip+0x7a/0x9f
  [<ffffffff8108d6c8>] local_bh_enable+0xd/0x11
  [<ffffffff810f3661>] rcu_cpu_kthread+0xe6/0x11f
  [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
  [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810a0a6d>] kthread+0xc2/0xca
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff81537afc>] ret_from_fork+0x7c/0xb0
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
Code:  Bad RIP value.
RIP  [<          (null)>]           (null)
  RSP <ffff88041fca3da0>
CR2: 0000000000000000


-- 
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-22 22:18                   ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-22 22:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs

On 01/22/2013 09:26 AM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 09:17 -0800, Eric Dumazet wrote:
>> On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:
>>
>>> Unfortunately, I hit it again this morning after the first restart of
>>> my application (which bounces all 3000 interfaces).  Memory poisoning
>>> was disabled.
>>
>> Is your NFS traffic using TCP or UDP ?
>>
>
> Oh well, it seems macvlan.c has to skb_drop_dst(skb) before giving skb
> to netif_rx()

I just saw another crash.  It had run 2 user-space restarts and
2 reboots, but on the third reboot, it crashed coming up.  It seemed
to last longer this time, but that could just be luck as it's never
been super easy to reproduce this quickly.

For completeness, here is the diff I was using:

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 68a43fe..eb55c88 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -111,9 +111,16 @@ static int macvlan_broadcast_one(struct sk_buff *skb,
                                  const struct ethhdr *eth, bool local)
  {
         struct net_device *dev = vlan->dev;
+
         if (!skb)
                 return NET_RX_DROP;

+       if (!(dev->flags & IFF_UP)) {
+               kfree_skb(skb);
+               return NET_RX_DROP;
+       }
+
+       skb_dst_drop(skb);
         if (local)
                 return vlan->forward(dev, skb);

@@ -220,6 +227,7 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
         if (!skb)
                 goto out;

+       skb_dst_drop(skb);
         skb->dev = dev;
         skb->pkt_type = PACKET_HOST;


BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 0
Oops: 0010 [#1] PREEMPT SMP
Modules linked in: nf_nat_ipv4 nf_nat nfsv4 auth_rpcgss nfs fscache 8021q garp stp llc macvlan pktgen lockd sunrpc uinput iTCO_wdt iTCO_vendor_support gpio_ich 
coretemp hwmon kvm_intel kvm microcode pcspkr i2c_i801 lpc_ich e1000e igb ptp ioatdma i7core_edac pps_core dca edac_core ipv6 mgag200 i2c_algo_bit 
drm_kms_helper ttm drm i2c_core [last unloaded: iptable_nat]
CPU 5
Pid: 40, comm: rcuc/5 Tainted: G         C   3.7.3+ #43 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff88041fca3da0  EFLAGS: 00010282
RAX: ffff88030ae8bc80 RBX: ffff880198694500 RCX: 0000000000000028
RDX: ffffffff81aafcb0 RSI: ffffffff81a2a500 RDI: ffff880198694500
RBP: ffff88041fca3dc8 R08: ffffffff814a87fa R09: ffff88041fca3d90
R10: ffff8803dc45b8fc R11: ffff88041fca3e28 R12: ffff8803dc45b8fc
R13: ffff880198694500 R14: ffff88040d3f8000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fca0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rcuc/5 (pid: 40, threadinfo ffff88040d73c000, task ffff88040d723ea0)
Stack:
  ffffffff814a8ab3 ffff880198694500 ffffffff814a87fa ffff880198694500
  ffff88040d3f8000 ffff88041fca3df8 ffffffff814a8e66 0000000080000000
  ffffffff81472e61 ffff880198694500 ffff88040d3f8000 ffff88041fca3e28
Call Trace:
  <IRQ>
  [<ffffffff814a8ab3>] ? ip_rcv_finish+0x2b9/0x2d1
  [<ffffffff814a87fa>] ? skb_dst+0x5a/0x5a
  [<ffffffff814a8e66>] NF_HOOK.clone.1+0x4c/0x54
  [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
  [<ffffffff814a90f3>] ip_rcv+0x237/0x268
  [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
  [<ffffffff81473f91>] process_backlog+0xf9/0x1da
  [<ffffffff8147639a>] net_rx_action+0xad/0x218
  [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
  [<ffffffff81538ddc>] call_softirq+0x1c/0x30
  <EOI>
  [<ffffffff8100bd21>] do_softirq+0x41/0x7e
  [<ffffffff8108d68b>] _local_bh_enable_ip+0x7a/0x9f
  [<ffffffff8108d6c8>] local_bh_enable+0xd/0x11
  [<ffffffff810f3661>] rcu_cpu_kthread+0xe6/0x11f
  [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
  [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810a0a6d>] kthread+0xc2/0xca
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff81537afc>] ret_from_fork+0x7c/0xb0
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
Code:  Bad RIP value.
RIP  [<          (null)>]           (null)
  RSP <ffff88041fca3da0>
CR2: 0000000000000000


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22 22:18                   ` Ben Greear
@ 2013-01-23  2:32                       ` Ben Greear
  -1 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-23  2:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On 01/22/2013 02:18 PM, Ben Greear wrote:
> On 01/22/2013 09:26 AM, Eric Dumazet wrote:
>> On Tue, 2013-01-22 at 09:17 -0800, Eric Dumazet wrote:
>>> On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:
>>>
>>>> Unfortunately, I hit it again this morning after the first restart of
>>>> my application (which bounces all 3000 interfaces).  Memory poisoning
>>>> was disabled.
>>>
>>> Is your NFS traffic using TCP or UDP ?
>>>
>>
>> Oh well, it seems macvlan.c has to skb_drop_dst(skb) before giving skb
>> to netif_rx()
>
> I just saw another crash.  It had run 2 user-space restarts and
> 2 reboots, but on the third reboot, it crashed coming up.  It seemed
> to last longer this time, but that could just be luck as it's never
> been super easy to reproduce this quickly.

I added a patch to set dst->input and dst->output to 0xdeadbeef before
freeing the memory.  (The warn-on below did NOT hit)

@@ -452,6 +452,9 @@ static inline int dst_output(struct sk_buff *skb)
  /* Input packet from network to transport.  */
  static inline int dst_input(struct sk_buff *skb)
  {
+       if (WARN_ON(((unsigned long)(skb_dst(skb))) < 4000)) {
+               printk("Bad skb_dst: %lu\n", skb->_skb_refdst);
+       }
         return skb_dst(skb)->input(skb);
  }

diff --git a/net/core/dst.c b/net/core/dst.c
index ee6153e..234b168 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -245,6 +245,7 @@ again:
                 dst->ops->destroy(dst);
         if (dst->dev)
                 dev_put(dst->dev);
+       dst->input = dst->output = 0xdeadbeef;
         kmem_cache_free(dst->ops->kmem_cachep, dst);

         dst = child;

Looks like we do indeed access freed memory, based on this crash I saw on
the next reboot:

[root@lf1011-12060006 ~]# BUG: unable to handle kernel paging request at 00000000deadbeef
IP: [<00000000deadbeef>] 0xdeadbeee
PGD 0
Oops: 0010 [#1] PREEMPT SMP
Modules linked in: macvlan pktgen lockd sunrpc uinput iTCO_wdt iTCO_vendor_support gpio_ich coretemp hwmon kvm_intel kvm microcode pcspkr i2c_i801 lpc_ich 
e1000e i7core_edac ioatdma edac_core igb ptp pps_core dca ipv6 mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
CPU 8
Pid: 59, comm: ksoftirqd/8 Tainted: G         C O 3.7.3+ #46 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<00000000deadbeef>]  [<00000000deadbeef>] 0xdeadbeee
RSP: 0018:ffff88040d7d7bc0  EFLAGS: 00010286
RAX: ffff8803d97fc900 RBX: ffff8803d4d30d00 RCX: 0000000000000028
RDX: ffffffff81aafcb0 RSI: ffffffff81a2a500 RDI: ffff8803d4d30d00
RBP: ffff88040d7d7be8 R08: ffffffff814a8812 R09: ffff88040d7d7bb0
R10: ffff8803c9dfd8fc R11: ffff88040d7d7c48 R12: ffff8803c9dfd8fc
R13: ffff8803d4d30d00 R14: ffff88040d3f8000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000deadbeef CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksoftirqd/8 (pid: 59, threadinfo ffff88040d7d6000, task ffff88040d7e1f50)
Stack:
  ffffffff814a8b02 ffff8803d4d30d00 ffffffff814a8812 ffff8803d4d30d00
  ffff88040d3f8000 ffff88040d7d7c18 ffffffff814a8eb5 0000000080000000
  ffffffff81472e61 ffff8803d4d30d00 ffff88040d3f8000 ffff88040d7d7c48
Call Trace:
  [<ffffffff814a8b02>] ? ip_rcv_finish+0x2f0/0x308
  [<ffffffff814a8812>] ? skb_dst+0x5a/0x5a
  [<ffffffff814a8eb5>] NF_HOOK.clone.1+0x4c/0x54
  [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
  [<ffffffff814a9142>] ip_rcv+0x237/0x269
  [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
  [<ffffffff81473f91>] process_backlog+0xf9/0x1da
  [<ffffffff8147639a>] net_rx_action+0xad/0x218
  [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
  [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42
  [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
  [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810a0a6d>] kthread+0xc2/0xca
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff81537b7c>] ret_from_fork+0x7c/0xb0
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
Code:  Bad RIP value.
RIP  [<00000000deadbeef>] 0xdeadbeee
  RSP <ffff88040d7d7bc0>
CR2: 00000000deadbeef
---[ end trace eed854e70ff0a575 ]---
Kernel panic - not syncing: Fatal excepti

Thanks,
Ben


-- 
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-23  2:32                       ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-23  2:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs

On 01/22/2013 02:18 PM, Ben Greear wrote:
> On 01/22/2013 09:26 AM, Eric Dumazet wrote:
>> On Tue, 2013-01-22 at 09:17 -0800, Eric Dumazet wrote:
>>> On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:
>>>
>>>> Unfortunately, I hit it again this morning after the first restart of
>>>> my application (which bounces all 3000 interfaces).  Memory poisoning
>>>> was disabled.
>>>
>>> Is your NFS traffic using TCP or UDP ?
>>>
>>
>> Oh well, it seems macvlan.c has to skb_drop_dst(skb) before giving skb
>> to netif_rx()
>
> I just saw another crash.  It had run 2 user-space restarts and
> 2 reboots, but on the third reboot, it crashed coming up.  It seemed
> to last longer this time, but that could just be luck as it's never
> been super easy to reproduce this quickly.

I added a patch to set dst->input and dst->output to 0xdeadbeef before
freeing the memory.  (The warn-on below did NOT hit)

@@ -452,6 +452,9 @@ static inline int dst_output(struct sk_buff *skb)
  /* Input packet from network to transport.  */
  static inline int dst_input(struct sk_buff *skb)
  {
+       if (WARN_ON(((unsigned long)(skb_dst(skb))) < 4000)) {
+               printk("Bad skb_dst: %lu\n", skb->_skb_refdst);
+       }
         return skb_dst(skb)->input(skb);
  }

diff --git a/net/core/dst.c b/net/core/dst.c
index ee6153e..234b168 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -245,6 +245,7 @@ again:
                 dst->ops->destroy(dst);
         if (dst->dev)
                 dev_put(dst->dev);
+       dst->input = dst->output = 0xdeadbeef;
         kmem_cache_free(dst->ops->kmem_cachep, dst);

         dst = child;

Looks like we do indeed access freed memory, based on this crash I saw on
the next reboot:

[root@lf1011-12060006 ~]# BUG: unable to handle kernel paging request at 00000000deadbeef
IP: [<00000000deadbeef>] 0xdeadbeee
PGD 0
Oops: 0010 [#1] PREEMPT SMP
Modules linked in: macvlan pktgen lockd sunrpc uinput iTCO_wdt iTCO_vendor_support gpio_ich coretemp hwmon kvm_intel kvm microcode pcspkr i2c_i801 lpc_ich 
e1000e i7core_edac ioatdma edac_core igb ptp pps_core dca ipv6 mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
CPU 8
Pid: 59, comm: ksoftirqd/8 Tainted: G         C O 3.7.3+ #46 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<00000000deadbeef>]  [<00000000deadbeef>] 0xdeadbeee
RSP: 0018:ffff88040d7d7bc0  EFLAGS: 00010286
RAX: ffff8803d97fc900 RBX: ffff8803d4d30d00 RCX: 0000000000000028
RDX: ffffffff81aafcb0 RSI: ffffffff81a2a500 RDI: ffff8803d4d30d00
RBP: ffff88040d7d7be8 R08: ffffffff814a8812 R09: ffff88040d7d7bb0
R10: ffff8803c9dfd8fc R11: ffff88040d7d7c48 R12: ffff8803c9dfd8fc
R13: ffff8803d4d30d00 R14: ffff88040d3f8000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000deadbeef CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksoftirqd/8 (pid: 59, threadinfo ffff88040d7d6000, task ffff88040d7e1f50)
Stack:
  ffffffff814a8b02 ffff8803d4d30d00 ffffffff814a8812 ffff8803d4d30d00
  ffff88040d3f8000 ffff88040d7d7c18 ffffffff814a8eb5 0000000080000000
  ffffffff81472e61 ffff8803d4d30d00 ffff88040d3f8000 ffff88040d7d7c48
Call Trace:
  [<ffffffff814a8b02>] ? ip_rcv_finish+0x2f0/0x308
  [<ffffffff814a8812>] ? skb_dst+0x5a/0x5a
  [<ffffffff814a8eb5>] NF_HOOK.clone.1+0x4c/0x54
  [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
  [<ffffffff814a9142>] ip_rcv+0x237/0x269
  [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
  [<ffffffff81473f91>] process_backlog+0xf9/0x1da
  [<ffffffff8147639a>] net_rx_action+0xad/0x218
  [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
  [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42
  [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
  [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810a0a6d>] kthread+0xc2/0xca
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff81537b7c>] ret_from_fork+0x7c/0xb0
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
Code:  Bad RIP value.
RIP  [<00000000deadbeef>] 0xdeadbeee
  RSP <ffff88040d7d7bc0>
CR2: 00000000deadbeef
---[ end trace eed854e70ff0a575 ]---
Kernel panic - not syncing: Fatal excepti

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23  2:32                       ` Ben Greear
  (?)
@ 2013-01-23  6:11                       ` Eric Dumazet
  2013-01-23  7:14                           ` Ben Greear
  -1 siblings, 1 reply; 53+ messages in thread
From: Eric Dumazet @ 2013-01-23  6:11 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs

On Tue, 2013-01-22 at 18:32 -0800, Ben Greear wrote:

> diff --git a/net/core/dst.c b/net/core/dst.c
> index ee6153e..234b168 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -245,6 +245,7 @@ again:
>                  dst->ops->destroy(dst);
>          if (dst->dev)
>                  dev_put(dst->dev);
> +       dst->input = dst->output = 0xdeadbeef;
>          kmem_cache_free(dst->ops->kmem_cachep, dst);

Great !

You could comment the kmem_cache_free() as well to get better chances to
hit the bug, and maybe start a bisection to find the faulty commit ?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23  6:11                       ` Eric Dumazet
@ 2013-01-23  7:14                           ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-23  7:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On 01/22/2013 10:11 PM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 18:32 -0800, Ben Greear wrote:
>
>> diff --git a/net/core/dst.c b/net/core/dst.c
>> index ee6153e..234b168 100644
>> --- a/net/core/dst.c
>> +++ b/net/core/dst.c
>> @@ -245,6 +245,7 @@ again:
>>                   dst->ops->destroy(dst);
>>           if (dst->dev)
>>                   dev_put(dst->dev);
>> +       dst->input = dst->output = 0xdeadbeef;
>>           kmem_cache_free(dst->ops->kmem_cachep, dst);
>
> Great !
>
> You could comment the kmem_cache_free() as well to get better chances to
> hit the bug, and maybe start a bisection to find the faulty commit ?

I suspect the bug goes back at least as far as 3.3.  And since
I need the NFS patches for this test case, bisecting will be pure hell.

I'll work on some more code instrumentation tomorrow.

One thing that came to mind while I was looking at the code today:

How are the non-ref-counted dst objects used safely?  Any chance
that tearing down the IP protocol on a device (or deleting a device)
could delete a dst that is referenced by an skb (and thus crashes as
I see)?

Thanks,
Ben

-- 
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-23  7:14                           ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-23  7:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs

On 01/22/2013 10:11 PM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 18:32 -0800, Ben Greear wrote:
>
>> diff --git a/net/core/dst.c b/net/core/dst.c
>> index ee6153e..234b168 100644
>> --- a/net/core/dst.c
>> +++ b/net/core/dst.c
>> @@ -245,6 +245,7 @@ again:
>>                   dst->ops->destroy(dst);
>>           if (dst->dev)
>>                   dev_put(dst->dev);
>> +       dst->input = dst->output = 0xdeadbeef;
>>           kmem_cache_free(dst->ops->kmem_cachep, dst);
>
> Great !
>
> You could comment the kmem_cache_free() as well to get better chances to
> hit the bug, and maybe start a bisection to find the faulty commit ?

I suspect the bug goes back at least as far as 3.3.  And since
I need the NFS patches for this test case, bisecting will be pure hell.

I'll work on some more code instrumentation tomorrow.

One thing that came to mind while I was looking at the code today:

How are the non-ref-counted dst objects used safely?  Any chance
that tearing down the IP protocol on a device (or deleting a device)
could delete a dst that is referenced by an skb (and thus crashes as
I see)?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23  7:14                           ` Ben Greear
@ 2013-01-23 13:35                               ` Eric Dumazet
  -1 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2013-01-23 13:35 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:

> How are the non-ref-counted dst objects used safely?  Any chance
> that tearing down the IP protocol on a device (or deleting a device)
> could delete a dst that is referenced by an skb (and thus crashes as
> I see)?

There is probably a bug. Normally it should be RCU protected.



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-23 13:35                               ` Eric Dumazet
  0 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2013-01-23 13:35 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs

On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:

> How are the non-ref-counted dst objects used safely?  Any chance
> that tearing down the IP protocol on a device (or deleting a device)
> could delete a dst that is referenced by an skb (and thus crashes as
> I see)?

There is probably a bug. Normally it should be RCU protected.




^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23  7:14                           ` Ben Greear
@ 2013-01-23 14:42                               ` Eric Dumazet
  -1 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2013-01-23 14:42 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:

> I suspect the bug goes back at least as far as 3.3.  And since
> I need the NFS patches for this test case, bisecting will be pure hell.

Make sure the bug is still present in net-next.


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-23 14:42                               ` Eric Dumazet
  0 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2013-01-23 14:42 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs

On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:

> I suspect the bug goes back at least as far as 3.3.  And since
> I need the NFS patches for this test case, bisecting will be pure hell.

Make sure the bug is still present in net-next.



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23 13:35                               ` Eric Dumazet
@ 2013-01-23 18:15                                 ` Ben Greear
  -1 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-23 18:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On 01/23/2013 05:35 AM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:
>
>> How are the non-ref-counted dst objects used safely?  Any chance
>> that tearing down the IP protocol on a device (or deleting a device)
>> could delete a dst that is referenced by an skb (and thus crashes as
>> I see)?
>
> There is probably a bug. Normally it should be RCU protected.

I'm building net-next, and will do some testing on it later today
if all goes well.

In the meantime, a few questions about RCU.

If the non-ref-counted dst is protected by RCU, does that mean
that from the time we acquire the pointer (ie, set it in the skb),
to the time we clear the pointer from the skb, we must be under
RCU read lock?

If so, that means that ip_rcv_finish must be called under RCU
read lock, etc?

Btw, this seems non related to NFS, so I'm dropping them
from CC after this email unless I hear a request otherwise.

Thanks,
Ben


-- 
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-23 18:15                                 ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-23 18:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs

On 01/23/2013 05:35 AM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:
>
>> How are the non-ref-counted dst objects used safely?  Any chance
>> that tearing down the IP protocol on a device (or deleting a device)
>> could delete a dst that is referenced by an skb (and thus crashes as
>> I see)?
>
> There is probably a bug. Normally it should be RCU protected.

I'm building net-next, and will do some testing on it later today
if all goes well.

In the meantime, a few questions about RCU.

If the non-ref-counted dst is protected by RCU, does that mean
that from the time we acquire the pointer (ie, set it in the skb),
to the time we clear the pointer from the skb, we must be under
RCU read lock?

If so, that means that ip_rcv_finish must be called under RCU
read lock, etc?

Btw, this seems non related to NFS, so I'm dropping them
from CC after this email unless I hear a request otherwise.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23 18:15                                 ` Ben Greear
  (?)
@ 2013-01-23 21:43                                 ` Eric Dumazet
  -1 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2013-01-23 21:43 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs

On Wed, 2013-01-23 at 10:15 -0800, Ben Greear wrote:
> On 01/23/2013 05:35 AM, Eric Dumazet wrote:
> > On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:
> >
> >> How are the non-ref-counted dst objects used safely?  Any chance
> >> that tearing down the IP protocol on a device (or deleting a device)
> >> could delete a dst that is referenced by an skb (and thus crashes as
> >> I see)?
> >
> > There is probably a bug. Normally it should be RCU protected.
> 
> I'm building net-next, and will do some testing on it later today
> if all goes well.
> 
> In the meantime, a few questions about RCU.
> 
> If the non-ref-counted dst is protected by RCU, does that mean
> that from the time we acquire the pointer (ie, set it in the skb),
> to the time we clear the pointer from the skb, we must be under
> RCU read lock?
> 
> If so, that means that ip_rcv_finish must be called under RCU
> read lock, etc?

Obviously yes.  Check skb_dst_force() for the rare cases we want to
escape the RCU section.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23 14:42                               ` Eric Dumazet
@ 2013-01-23 21:53                                 ` Ben Greear
  -1 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-23 21:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On 01/23/2013 06:42 AM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:
>
>> I suspect the bug goes back at least as far as 3.3.  And since
>> I need the NFS patches for this test case, bisecting will be pure hell.
>
> Make sure the bug is still present in net-next.
>

My test case isn't running well on net-next.  Seems most
of the file-io processes are hung trying to open a file:

[root@lf1011-12060006 lanforge]# cat /proc/10612/stack
[<ffffffff8115fadb>] do_last+0x1db/0xa4e
[<ffffffff8116090b>] path_openat+0xcb/0x363
[<ffffffff81160ca8>] do_filp_open+0x38/0x84
[<ffffffff81153bfc>] do_sys_open+0x6d/0xff
[<ffffffff81153cbb>] sys_open+0x1c/0x1e
[<ffffffff8154f629>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff


I'm going back to 3.7.3 and will try to learn some more
about what it's doing there.

Thanks,
Ben


-- 
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-23 21:53                                 ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-23 21:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs

On 01/23/2013 06:42 AM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:
>
>> I suspect the bug goes back at least as far as 3.3.  And since
>> I need the NFS patches for this test case, bisecting will be pure hell.
>
> Make sure the bug is still present in net-next.
>

My test case isn't running well on net-next.  Seems most
of the file-io processes are hung trying to open a file:

[root@lf1011-12060006 lanforge]# cat /proc/10612/stack
[<ffffffff8115fadb>] do_last+0x1db/0xa4e
[<ffffffff8116090b>] path_openat+0xcb/0x363
[<ffffffff81160ca8>] do_filp_open+0x38/0x84
[<ffffffff81153bfc>] do_sys_open+0x6d/0xff
[<ffffffff81153cbb>] sys_open+0x1c/0x1e
[<ffffffff8154f629>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff


I'm going back to 3.7.3 and will try to learn some more
about what it's doing there.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23  2:32                       ` Ben Greear
@ 2013-01-23 23:55                           ` Ben Greear
  -1 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-23 23:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On 01/22/2013 06:32 PM, Ben Greear wrote:

So, I'm slowly making some progress.  I've verified that the skb
has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
method.  I'm trying to track it backwards and figure out which
device it belongs to, etc....takes a while to reproduce though.

One thing about this stack trace below...the dev_seq_stop() does
a rcu read-unlock.  Now, I can't figure out exactly how ip_rcv()
can cause dev_seq_stop() to run, but if this stack is legit,
then maybe by the time we enter the ip_rcv_finish() code we are
running without rcu_readlock() held?

If so, that would probably explain the bug.

> Call Trace:
>   [<ffffffff814a8b02>] ? ip_rcv_finish+0x2f0/0x308
>   [<ffffffff814a8812>] ? skb_dst+0x5a/0x5a
>   [<ffffffff814a8eb5>] NF_HOOK.clone.1+0x4c/0x54
>   [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
>   [<ffffffff814a9142>] ip_rcv+0x237/0x269
>   [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
>   [<ffffffff81473f91>] process_backlog+0xf9/0x1da
>   [<ffffffff8147639a>] net_rx_action+0xad/0x218
>   [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
>   [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42
>   [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
>   [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
>   [<ffffffff810a0a6d>] kthread+0xc2/0xca
>   [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
>   [<ffffffff81537b7c>] ret_from_fork+0x7c/0xb0
>   [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56


## This is from a slightly different kernel image...but probably this part is legit.

0xffffffff814a92b3 is in ip_rcv (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:466).
461		/* Our transport medium may have padded the buffer out. Now we know it
462		 * is IP we can trim to the true length of the frame.
463		 * Note this now means skb->len holds ntohs(iph->tot_len).
464		 */
465		if (pskb_trim_rcsum(skb, len)) {
466			IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
467			goto drop;
468		}
469	
470		/* Remove any debris in the socket control block */


-- 
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-23 23:55                           ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-23 23:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs

On 01/22/2013 06:32 PM, Ben Greear wrote:

So, I'm slowly making some progress.  I've verified that the skb
has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
method.  I'm trying to track it backwards and figure out which
device it belongs to, etc....takes a while to reproduce though.

One thing about this stack trace below...the dev_seq_stop() does
a rcu read-unlock.  Now, I can't figure out exactly how ip_rcv()
can cause dev_seq_stop() to run, but if this stack is legit,
then maybe by the time we enter the ip_rcv_finish() code we are
running without rcu_readlock() held?

If so, that would probably explain the bug.

> Call Trace:
>   [<ffffffff814a8b02>] ? ip_rcv_finish+0x2f0/0x308
>   [<ffffffff814a8812>] ? skb_dst+0x5a/0x5a
>   [<ffffffff814a8eb5>] NF_HOOK.clone.1+0x4c/0x54
>   [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
>   [<ffffffff814a9142>] ip_rcv+0x237/0x269
>   [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
>   [<ffffffff81473f91>] process_backlog+0xf9/0x1da
>   [<ffffffff8147639a>] net_rx_action+0xad/0x218
>   [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
>   [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42
>   [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
>   [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
>   [<ffffffff810a0a6d>] kthread+0xc2/0xca
>   [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
>   [<ffffffff81537b7c>] ret_from_fork+0x7c/0xb0
>   [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56


## This is from a slightly different kernel image...but probably this part is legit.

0xffffffff814a92b3 is in ip_rcv (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:466).
461		/* Our transport medium may have padded the buffer out. Now we know it
462		 * is IP we can trim to the true length of the frame.
463		 * Note this now means skb->len holds ntohs(iph->tot_len).
464		 */
465		if (pskb_trim_rcsum(skb, len)) {
466			IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
467			goto drop;
468		}
469	
470		/* Remove any debris in the socket control block */


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23 23:55                           ` Ben Greear
@ 2013-01-24  0:01                               ` Eric Dumazet
  -1 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2013-01-24  0:01 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote:
> On 01/22/2013 06:32 PM, Ben Greear wrote:
> 
> So, I'm slowly making some progress.  I've verified that the skb
> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
> method.  I'm trying to track it backwards and figure out which
> device it belongs to, etc....takes a while to reproduce though.
> 
> One thing about this stack trace below...the dev_seq_stop() does
> a rcu read-unlock.  Now, I can't figure out exactly how ip_rcv()
> can cause dev_seq_stop() to run, but if this stack is legit,
> then maybe by the time we enter the ip_rcv_finish() code we are
> running without rcu_readlock() held?
> 
> If so, that would probably explain the bug.
> 

The whole thing is run under rcu_read_lock() done in
__netif_receive_skb()

My suspicion was that we called netif_rx() from macvlan leaving a
not refcounted skb dst.

But the patch I sent to you didnt solve the bug, so its something else.

You could trace at which point the dst was released. (where you set
dst->input/output to deadbeef)

> > Call Trace:
> >   [<ffffffff814a8b02>] ? ip_rcv_finish+0x2f0/0x308
> >   [<ffffffff814a8812>] ? skb_dst+0x5a/0x5a
> >   [<ffffffff814a8eb5>] NF_HOOK.clone.1+0x4c/0x54
> >   [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
> >   [<ffffffff814a9142>] ip_rcv+0x237/0x269
> >   [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
> >   [<ffffffff81473f91>] process_backlog+0xf9/0x1da
> >   [<ffffffff8147639a>] net_rx_action+0xad/0x218
> >   [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
> >   [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42
> >   [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
> >   [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
> >   [<ffffffff810a0a6d>] kthread+0xc2/0xca
> >   [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
> >   [<ffffffff81537b7c>] ret_from_fork+0x7c/0xb0
> >   [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
> 
> 
> ## This is from a slightly different kernel image...but probably this part is legit.
> 
> 0xffffffff814a92b3 is in ip_rcv (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:466).
> 461		/* Our transport medium may have padded the buffer out. Now we know it
> 462		 * is IP we can trim to the true length of the frame.
> 463		 * Note this now means skb->len holds ntohs(iph->tot_len).
> 464		 */
> 465		if (pskb_trim_rcsum(skb, len)) {
> 466			IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
> 467			goto drop;
> 468		}
> 469	
> 470		/* Remove any debris in the socket control block */
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-24  0:01                               ` Eric Dumazet
  0 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2013-01-24  0:01 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs

On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote:
> On 01/22/2013 06:32 PM, Ben Greear wrote:
> 
> So, I'm slowly making some progress.  I've verified that the skb
> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
> method.  I'm trying to track it backwards and figure out which
> device it belongs to, etc....takes a while to reproduce though.
> 
> One thing about this stack trace below...the dev_seq_stop() does
> a rcu read-unlock.  Now, I can't figure out exactly how ip_rcv()
> can cause dev_seq_stop() to run, but if this stack is legit,
> then maybe by the time we enter the ip_rcv_finish() code we are
> running without rcu_readlock() held?
> 
> If so, that would probably explain the bug.
> 

The whole thing is run under rcu_read_lock() done in
__netif_receive_skb()

My suspicion was that we called netif_rx() from macvlan leaving a
not refcounted skb dst.

But the patch I sent to you didnt solve the bug, so its something else.

You could trace at which point the dst was released. (where you set
dst->input/output to deadbeef)

> > Call Trace:
> >   [<ffffffff814a8b02>] ? ip_rcv_finish+0x2f0/0x308
> >   [<ffffffff814a8812>] ? skb_dst+0x5a/0x5a
> >   [<ffffffff814a8eb5>] NF_HOOK.clone.1+0x4c/0x54
> >   [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
> >   [<ffffffff814a9142>] ip_rcv+0x237/0x269
> >   [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
> >   [<ffffffff81473f91>] process_backlog+0xf9/0x1da
> >   [<ffffffff8147639a>] net_rx_action+0xad/0x218
> >   [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
> >   [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42
> >   [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
> >   [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
> >   [<ffffffff810a0a6d>] kthread+0xc2/0xca
> >   [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
> >   [<ffffffff81537b7c>] ret_from_fork+0x7c/0xb0
> >   [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
> 
> 
> ## This is from a slightly different kernel image...but probably this part is legit.
> 
> 0xffffffff814a92b3 is in ip_rcv (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:466).
> 461		/* Our transport medium may have padded the buffer out. Now we know it
> 462		 * is IP we can trim to the true length of the frame.
> 463		 * Note this now means skb->len holds ntohs(iph->tot_len).
> 464		 */
> 465		if (pskb_trim_rcsum(skb, len)) {
> 466			IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
> 467			goto drop;
> 468		}
> 469	
> 470		/* Remove any debris in the socket control block */
> 
> 



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  0:01                               ` Eric Dumazet
@ 2013-01-24  0:13                                 ` Ben Greear
  -1 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-24  0:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On 01/23/2013 04:01 PM, Eric Dumazet wrote:
> On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote:
>> On 01/22/2013 06:32 PM, Ben Greear wrote:
>>
>> So, I'm slowly making some progress.  I've verified that the skb
>> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
>> method.  I'm trying to track it backwards and figure out which
>> device it belongs to, etc....takes a while to reproduce though.
>>
>> One thing about this stack trace below...the dev_seq_stop() does
>> a rcu read-unlock.  Now, I can't figure out exactly how ip_rcv()
>> can cause dev_seq_stop() to run, but if this stack is legit,
>> then maybe by the time we enter the ip_rcv_finish() code we are
>> running without rcu_readlock() held?
>>
>> If so, that would probably explain the bug.
>>
>
> The whole thing is run under rcu_read_lock() done in
> __netif_receive_skb()

I was worried that the dev_seq_stop might be called
incorrectly causing an asymetric unlock.  I have no
idea how that might happened, but several crashes
have that dev_seq_stop method listed, so it got me suspicious.

>
> My suspicion was that we called netif_rx() from macvlan leaving a
> not refcounted skb dst.
>
> But the patch I sent to you didnt solve the bug, so its something else.
>
> You could trace at which point the dst was released. (where you set
> dst->input/output to deadbeef)

My current code is in some garbage collector timer code, but I can
work on saving the call-site that first pokes the dst into the
garbage collection list...

Thanks,
Ben

-- 
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-24  0:13                                 ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-24  0:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs

On 01/23/2013 04:01 PM, Eric Dumazet wrote:
> On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote:
>> On 01/22/2013 06:32 PM, Ben Greear wrote:
>>
>> So, I'm slowly making some progress.  I've verified that the skb
>> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
>> method.  I'm trying to track it backwards and figure out which
>> device it belongs to, etc....takes a while to reproduce though.
>>
>> One thing about this stack trace below...the dev_seq_stop() does
>> a rcu read-unlock.  Now, I can't figure out exactly how ip_rcv()
>> can cause dev_seq_stop() to run, but if this stack is legit,
>> then maybe by the time we enter the ip_rcv_finish() code we are
>> running without rcu_readlock() held?
>>
>> If so, that would probably explain the bug.
>>
>
> The whole thing is run under rcu_read_lock() done in
> __netif_receive_skb()

I was worried that the dev_seq_stop might be called
incorrectly causing an asymetric unlock.  I have no
idea how that might happened, but several crashes
have that dev_seq_stop method listed, so it got me suspicious.

>
> My suspicion was that we called netif_rx() from macvlan leaving a
> not refcounted skb dst.
>
> But the patch I sent to you didnt solve the bug, so its something else.
>
> You could trace at which point the dst was released. (where you set
> dst->input/output to deadbeef)

My current code is in some garbage collector timer code, but I can
work on saving the call-site that first pokes the dst into the
garbage collection list...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  0:13                                 ` Ben Greear
@ 2013-01-24  0:23                                     ` Eric Dumazet
  -1 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2013-01-24  0:23 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 2013-01-23 at 16:13 -0800, Ben Greear wrote:
> On 01/23/2013 04:01 PM, Eric Dumazet wrote:
> > On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote:
> >> On 01/22/2013 06:32 PM, Ben Greear wrote:
> >>
> >> So, I'm slowly making some progress.  I've verified that the skb
> >> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
> >> method.  I'm trying to track it backwards and figure out which
> >> device it belongs to, etc....takes a while to reproduce though.
> >>
> >> One thing about this stack trace below...the dev_seq_stop() does
> >> a rcu read-unlock.  Now, I can't figure out exactly how ip_rcv()
> >> can cause dev_seq_stop() to run, but if this stack is legit,
> >> then maybe by the time we enter the ip_rcv_finish() code we are
> >> running without rcu_readlock() held?
> >>
> >> If so, that would probably explain the bug.
> >>
> >
> > The whole thing is run under rcu_read_lock() done in
> > __netif_receive_skb()
> 
> I was worried that the dev_seq_stop might be called
> incorrectly causing an asymetric unlock.  I have no
> idea how that might happened, but several crashes
> have that dev_seq_stop method listed, so it got me suspicious.

dev_seq_stop() is some word in the kernel stack, result of a prior
system call. Stack is not cleanup.

Each function reserves an amount of stack but not always write on all
reserved space (some automatic variables might be not set)

Note the "? " before the name : linux printed the symbol but this was
not a call site for this particular call graph. Its only an extra
indication, that can be useful sometimes.



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-24  0:23                                     ` Eric Dumazet
  0 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2013-01-24  0:23 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs

On Wed, 2013-01-23 at 16:13 -0800, Ben Greear wrote:
> On 01/23/2013 04:01 PM, Eric Dumazet wrote:
> > On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote:
> >> On 01/22/2013 06:32 PM, Ben Greear wrote:
> >>
> >> So, I'm slowly making some progress.  I've verified that the skb
> >> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
> >> method.  I'm trying to track it backwards and figure out which
> >> device it belongs to, etc....takes a while to reproduce though.
> >>
> >> One thing about this stack trace below...the dev_seq_stop() does
> >> a rcu read-unlock.  Now, I can't figure out exactly how ip_rcv()
> >> can cause dev_seq_stop() to run, but if this stack is legit,
> >> then maybe by the time we enter the ip_rcv_finish() code we are
> >> running without rcu_readlock() held?
> >>
> >> If so, that would probably explain the bug.
> >>
> >
> > The whole thing is run under rcu_read_lock() done in
> > __netif_receive_skb()
> 
> I was worried that the dev_seq_stop might be called
> incorrectly causing an asymetric unlock.  I have no
> idea how that might happened, but several crashes
> have that dev_seq_stop method listed, so it got me suspicious.

dev_seq_stop() is some word in the kernel stack, result of a prior
system call. Stack is not cleanup.

Each function reserves an amount of stack but not always write on all
reserved space (some automatic variables might be not set)

Note the "? " before the name : linux printed the symbol but this was
not a call site for this particular call graph. Its only an extra
indication, that can be useful sometimes.




^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  0:23                                     ` Eric Dumazet
  (?)
@ 2013-01-24  0:38                                     ` Ben Greear
  2013-01-24  0:45                                       ` Eric Dumazet
  -1 siblings, 1 reply; 53+ messages in thread
From: Ben Greear @ 2013-01-24  0:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 01/23/2013 04:23 PM, Eric Dumazet wrote:
> On Wed, 2013-01-23 at 16:13 -0800, Ben Greear wrote:
>> On 01/23/2013 04:01 PM, Eric Dumazet wrote:

>> I was worried that the dev_seq_stop might be called
>> incorrectly causing an asymetric unlock.  I have no
>> idea how that might happened, but several crashes
>> have that dev_seq_stop method listed, so it got me suspicious.
>
> dev_seq_stop() is some word in the kernel stack, result of a prior
> system call. Stack is not cleanup.
>
> Each function reserves an amount of stack but not always write on all
> reserved space (some automatic variables might be not set)
>
> Note the "? " before the name : linux printed the symbol but this was
> not a call site for this particular call graph. Its only an extra
> indication, that can be useful sometimes.

Ahh, thanks for that info...I'd never quite pieced that together
before.

Here's another crash.  Interestingly, the dst is bad before the rcu-read-lock()
(the bug is from the first of the 'deadbeef' debugging code below)

Perhaps other useful info:  The skb->dev claims to be 'lo'.  The dst 'pointer'
in the skb has 0x1 set, so it is the 'noref' variant.


static int __netif_receive_skb(struct sk_buff *skb)
{
	struct packet_type *ptype, *pt_prev;
	rx_handler_func_t *rx_handler;
	struct net_device *orig_dev;
	struct net_device *null_or_dev;
	bool deliver_exact = false;
	int ret = NET_RX_DROP;
	__be16 type;
	unsigned long pflags = current->flags;

	net_timestamp_check(!netdev_tstamp_prequeue, skb);

	trace_netif_receive_skb(skb);

	/*
	 * PFMEMALLOC skbs are special, they should
	 * - be delivered to SOCK_MEMALLOC sockets only
	 * - stay away from userspace
	 * - have bounded memory usage
	 *
	 * Use PF_MEMALLOC as this saves us from propagating the allocation
	 * context down to all allocation sites.
	 */
	if (sk_memalloc_socks() && skb_pfmemalloc(skb))
		current->flags |= PF_MEMALLOC;

	/* if we've gotten here through NAPI, check netpoll */
	if (netpoll_receive_skb(skb))
		goto out;

	orig_dev = skb->dev;

	skb_reset_network_header(skb);
	skb_reset_transport_header(skb);
	skb_reset_mac_len(skb);

	pt_prev = NULL;

	if (skb_dst(skb)) {
		if (skb_dst(skb)->input == 0xdeadbeef) {
			printk("bad dst: %lu, skb->dev: %s  len: %i\n",
			       skb->_skb_refdst, skb->dev->name, skb->len);
			BUG_ON(1);
		}
	}
	
	rcu_read_lock();

	if (skb_dst(skb)) {
		if (skb_dst(skb)->input == 0xdeadbeef) {
			printk("bad dst: %lu, skb->dev: %s  len: %i\n",
			       skb->_skb_refdst, skb->dev->name, skb->len);
			BUG_ON(1);
		}
	}
	
	
another_round:
	skb->skb_iif = skb->dev->ifindex;

	__this_cpu_inc(softnet_data.processed);
...



[root@lf1011-12060006 ~]# bad dst: 18446612148864241601, skb->dev: lo  len: 3232
------------[ cut here ]------------
kernel BUG at /home/greearb/git/linux-3.7.dev.y/net/core/dev.c:3266!
invalid opcode: 0000 [#1] PREEMPT SMP
Modules linked in: macvlan pktgen lockd sunrpc uinput coretemp hwmon kvm_intel kvm microcode iTCO_wdt iTe
CPU 4
Pid: 35, comm: ksoftirqd/4 Tainted: G         C O 3.7.3+ #50 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<ffffffff81473d22>]  [<ffffffff81473d22>] __netif_receive_skb+0x101/0x5b8
RSP: 0018:ffff88040d711c58  EFLAGS: 00010296
RAX: 0000000000000036 RBX: ffff88041fc93e80 RCX: 000000000000a6a5
RDX: ffffffff810883a6 RSI: 00000000000005fc RDI: 0000000000000246
RBP: ffff88040d711cb8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000004 R11: 0000000000000000 R12: ffff88041fc93fd0
R13: 0000000000000040 R14: ffff88040d3f8000 R15: ffff88041fc93f80
FS:  0000000000000000(0000) GS:ffff88041fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000262c118 CR3: 00000003da651000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksoftirqd/4 (pid: 35, threadinfo ffff88040d710000, task ffff88040d701f50)
Stack:
  0000000000000046 0420804000000100 ffffffff81aaf0a0 ffff8803da901200
  000000000d711cb8 ffffffff81aaf0a0 ffff8803ffa90428 ffff88041fc93e80
  ffff88041fc93fd0 0000000000000040 0000000000000024 ffff88041fc93f80
Call Trace:
  [<ffffffff814742d2>] process_backlog+0xf9/0x1da
  [<ffffffff814766db>] net_rx_action+0xad/0x218
  [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
  [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42
  [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
  [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810a0a6d>] kthread+0xc2/0xca
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff81537dbc>] ret_from_fork+0x7c/0xb0
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
Code: fc ff ff ba ef be ad de 48 39 50 50 75 21 48 8b 45 b8 48 c7 c7 50 ea 82 81 8b 48 68 48 8b 50 20 48
RIP  [<ffffffff81473d22>] __netif_receive_skb+0x101/0x5b8
  RSP <ffff88040d711c58>
---[ end trace e5f94dc78f5e5277 ]---
Kernel panic - not syncing: Fatal exception in interrupt
>
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  0:38                                     ` Ben Greear
@ 2013-01-24  0:45                                       ` Eric Dumazet
  2013-01-24  0:51                                         ` Ben Greear
  0 siblings, 1 reply; 53+ messages in thread
From: Eric Dumazet @ 2013-01-24  0:45 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

On Wed, 2013-01-23 at 16:38 -0800, Ben Greear wrote:
> On 01/23/2013 04:23 PM, Eric Dumazet wrote:
> > On Wed, 2013-01-23 at 16:13 -0800, Ben Greear wrote:
> >> On 01/23/2013 04:01 PM, Eric Dumazet wrote:
> 
> >> I was worried that the dev_seq_stop might be called
> >> incorrectly causing an asymetric unlock.  I have no
> >> idea how that might happened, but several crashes
> >> have that dev_seq_stop method listed, so it got me suspicious.
> >
> > dev_seq_stop() is some word in the kernel stack, result of a prior
> > system call. Stack is not cleanup.
> >
> > Each function reserves an amount of stack but not always write on all
> > reserved space (some automatic variables might be not set)
> >
> > Note the "? " before the name : linux printed the symbol but this was
> > not a call site for this particular call graph. Its only an extra
> > indication, that can be useful sometimes.
> 
> Ahh, thanks for that info...I'd never quite pieced that together
> before.
> 
> Here's another crash.  Interestingly, the dst is bad before the rcu-read-lock()
> (the bug is from the first of the 'deadbeef' debugging code below)
> 
> Perhaps other useful info:  The skb->dev claims to be 'lo'.  The dst 'pointer'
> in the skb has 0x1 set, so it is the 'noref' variant.
> 
> 
> static int __netif_receive_skb(struct sk_buff *skb)
> {
> 	struct packet_type *ptype, *pt_prev;
> 	rx_handler_func_t *rx_handler;
> 	struct net_device *orig_dev;
> 	struct net_device *null_or_dev;
> 	bool deliver_exact = false;
> 	int ret = NET_RX_DROP;
> 	__be16 type;
> 	unsigned long pflags = current->flags;
> 
> 	net_timestamp_check(!netdev_tstamp_prequeue, skb);
> 
> 	trace_netif_receive_skb(skb);
> 
> 	/*
> 	 * PFMEMALLOC skbs are special, they should
> 	 * - be delivered to SOCK_MEMALLOC sockets only
> 	 * - stay away from userspace
> 	 * - have bounded memory usage
> 	 *
> 	 * Use PF_MEMALLOC as this saves us from propagating the allocation
> 	 * context down to all allocation sites.
> 	 */
> 	if (sk_memalloc_socks() && skb_pfmemalloc(skb))
> 		current->flags |= PF_MEMALLOC;
> 
> 	/* if we've gotten here through NAPI, check netpoll */
> 	if (netpoll_receive_skb(skb))
> 		goto out;
> 
> 	orig_dev = skb->dev;
> 
> 	skb_reset_network_header(skb);
> 	skb_reset_transport_header(skb);
> 	skb_reset_mac_len(skb);
> 
> 	pt_prev = NULL;
> 
> 	if (skb_dst(skb)) {
> 		if (skb_dst(skb)->input == 0xdeadbeef) {
> 			printk("bad dst: %lu, skb->dev: %s  len: %i\n",
> 			       skb->_skb_refdst, skb->dev->name, skb->len);
> 			BUG_ON(1);
> 		}
> 	}
> 	

You should add your debuging code in netif_rx() so that we know the
caller

by the way you could only add

BUG_ON(skb->_skb_refdst & SKB_DST_NOREF)

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  0:45                                       ` Eric Dumazet
@ 2013-01-24  0:51                                         ` Ben Greear
  2013-01-24  1:00                                           ` Eric Dumazet
  0 siblings, 1 reply; 53+ messages in thread
From: Ben Greear @ 2013-01-24  0:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 01/23/2013 04:45 PM, Eric Dumazet wrote:
> On Wed, 2013-01-23 at 16:38 -0800, Ben Greear wrote:
>> On 01/23/2013 04:23 PM, Eric Dumazet wrote:
>>> On Wed, 2013-01-23 at 16:13 -0800, Ben Greear wrote:
>>>> On 01/23/2013 04:01 PM, Eric Dumazet wrote:
>>
>>>> I was worried that the dev_seq_stop might be called
>>>> incorrectly causing an asymetric unlock.  I have no
>>>> idea how that might happened, but several crashes
>>>> have that dev_seq_stop method listed, so it got me suspicious.
>>>
>>> dev_seq_stop() is some word in the kernel stack, result of a prior
>>> system call. Stack is not cleanup.
>>>
>>> Each function reserves an amount of stack but not always write on all
>>> reserved space (some automatic variables might be not set)
>>>
>>> Note the "? " before the name : linux printed the symbol but this was
>>> not a call site for this particular call graph. Its only an extra
>>> indication, that can be useful sometimes.
>>
>> Ahh, thanks for that info...I'd never quite pieced that together
>> before.
>>
>> Here's another crash.  Interestingly, the dst is bad before the rcu-read-lock()
>> (the bug is from the first of the 'deadbeef' debugging code below)
>>
>> Perhaps other useful info:  The skb->dev claims to be 'lo'.  The dst 'pointer'
>> in the skb has 0x1 set, so it is the 'noref' variant.
>>
>>
>> static int __netif_receive_skb(struct sk_buff *skb)
>> {
>> 	struct packet_type *ptype, *pt_prev;
>> 	rx_handler_func_t *rx_handler;
>> 	struct net_device *orig_dev;
>> 	struct net_device *null_or_dev;
>> 	bool deliver_exact = false;
>> 	int ret = NET_RX_DROP;
>> 	__be16 type;
>> 	unsigned long pflags = current->flags;
>>
>> 	net_timestamp_check(!netdev_tstamp_prequeue, skb);
>>
>> 	trace_netif_receive_skb(skb);
>>
>> 	/*
>> 	 * PFMEMALLOC skbs are special, they should
>> 	 * - be delivered to SOCK_MEMALLOC sockets only
>> 	 * - stay away from userspace
>> 	 * - have bounded memory usage
>> 	 *
>> 	 * Use PF_MEMALLOC as this saves us from propagating the allocation
>> 	 * context down to all allocation sites.
>> 	 */
>> 	if (sk_memalloc_socks() && skb_pfmemalloc(skb))
>> 		current->flags |= PF_MEMALLOC;
>>
>> 	/* if we've gotten here through NAPI, check netpoll */
>> 	if (netpoll_receive_skb(skb))
>> 		goto out;
>>
>> 	orig_dev = skb->dev;
>>
>> 	skb_reset_network_header(skb);
>> 	skb_reset_transport_header(skb);
>> 	skb_reset_mac_len(skb);
>>
>> 	pt_prev = NULL;
>>
>> 	if (skb_dst(skb)) {
>> 		if (skb_dst(skb)->input == 0xdeadbeef) {
>> 			printk("bad dst: %lu, skb->dev: %s  len: %i\n",
>> 			       skb->_skb_refdst, skb->dev->name, skb->len);
>> 			BUG_ON(1);
>> 		}
>> 	}
>> 	
>
> You should add your debuging code in netif_rx() so that we know the
> caller
>
> by the way you could only add
>
> BUG_ON(skb->_skb_refdst & SKB_DST_NOREF)

Ok, will add that.

I was poking around in drivers/net/loopback.c.  Maybe it needs
to clean up the skb_dst() before calling the rx logic in the
loopback_xmit method?


Thanks,
ben

>
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  0:51                                         ` Ben Greear
@ 2013-01-24  1:00                                           ` Eric Dumazet
  2013-01-24  1:06                                             ` Ben Greear
  0 siblings, 1 reply; 53+ messages in thread
From: Eric Dumazet @ 2013-01-24  1:00 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

On Wed, 2013-01-23 at 16:51 -0800, Ben Greear wrote:

> 
> I was poking around in drivers/net/loopback.c.  Maybe it needs
> to clean up the skb_dst() before calling the rx logic in the
> loopback_xmit method?

Nope. Its ok there. We need a dst for loopback

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  1:00                                           ` Eric Dumazet
@ 2013-01-24  1:06                                             ` Ben Greear
  2013-01-24  1:10                                               ` Eric Dumazet
  0 siblings, 1 reply; 53+ messages in thread
From: Ben Greear @ 2013-01-24  1:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 01/23/2013 05:00 PM, Eric Dumazet wrote:
> On Wed, 2013-01-23 at 16:51 -0800, Ben Greear wrote:
>
>>
>> I was poking around in drivers/net/loopback.c.  Maybe it needs
>> to clean up the skb_dst() before calling the rx logic in the
>> loopback_xmit method?
>
> Nope. Its ok there. We need a dst for loopback

Here's the crash with the BUG_ON().  Did I understand your
suggestion properly?  This hit immediately when booting.


int netif_rx(struct sk_buff *skb)
{
	int ret;

	BUG_ON(skb->_skb_refdst & SKB_DST_NOREF);
	
	/* if netpoll wants it, pretend we never saw it */
	if (netpoll_rx(skb))
		return NET_RX_DROP;


kernel BUG at /home/greearb/git/linux-3.7.dev.y/net/core/dev.c:2982!
invalid opcode: 0000 [#1] PREEMPT SMP
Modules linked in: lockd sunrpc macvlan pktgen uinput coretemp hwmon kvm_intel kvm iTCO_wdt iTCO_vendor_e
CPU 0
Pid: 1554, comm: btserver Tainted: G         C O 3.7.3+ #51 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<ffffffff8147685a>]  [<ffffffff8147685a>] netif_rx+0x14/0x109
RSP: 0018:ffff8804030359a8  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8803fff3a2f0 RCX: ffff88040d3b4490
RDX: ffff880403035a08 RSI: ffff88040d3f8000 RDI: ffff8803fff3a2f0
RBP: ffff8804030359d8 R08: 0000000000000001 R09: 0000000000000000
R10: ffffffff81472e61 R11: ffff8803fff62cc0 R12: 0000000000016ff0
R13: 000000000000003c R14: ffff88041fc00000 R15: ffffffff81670560
FS:  00007fcbd5c01740(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000439004 CR3: 00000003ff68d000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btserver (pid: 1554, threadinfo ffff880403034000, task ffff880407e09f50)
Stack:
  ffff8804030359d8 ffffffff814c0290 ffff880403035a08 ffff8803fff3a2f0
  ffff8803fff3a2f0 0000000000016ff0 ffff880403035a08 ffffffff813a3c4d
  ffff8803fff3a2f0 ffff88040d3f8000 00000000a01b7c08 ffff88040d0b7400
Call Trace:
  [<ffffffff814c0290>] ? tcp_wfree+0xc0/0xc8
  [<ffffffff813a3c4d>] loopback_xmit+0x64/0x83
  [<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e
  [<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c
  [<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e
  [<ffffffff8148cfa6>] ? eth_header+0x28/0xb6
  [<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7
  [<ffffffff814ad835>] ip_finish_output2+0x297/0x30d
  [<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d
  [<ffffffff814ad90e>] ip_finish_output+0x63/0x68
  [<ffffffff814ae412>] ip_output+0x61/0x67
  [<ffffffff814ab904>] dst_output+0x17/0x1b
  [<ffffffff814adb6d>] ip_local_out+0x1e/0x23
  [<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353
  [<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc
  [<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b
  [<ffffffff814c3571>] tcp_connect+0x53c/0x587
  [<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d
  [<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e
  [<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431
  [<ffffffff814d6913>] __inet_stream_connect+0x84/0x287
  [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
  [<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f
  [<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11
  [<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79
  [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
  [<ffffffff814d6b49>] inet_stream_connect+0x33/0x49
  [<ffffffff814632c6>] sys_connect+0x75/0x98
  [<ffffffff811551fd>] ? path_put+0x1d/0x21
  [<ffffffff810e8d06>] ? __audit_syscall_entry+0x11c/0x148
  [<ffffffff8128a509>] ? lockdep_sys_exit_thunk+0x35/0x67
  [<ffffffff81162f8b>] ? __fd_install+0x26/0x52
  [<ffffffff81537e69>] system_call_fastpath+0x16/0x1b
Code: 49 8b 5c 24 10 48 8b 43 68 48 85 c0 0f 85 5e fe ff ff e9 79 fe ff ff 55 48 89 e5 41 54 53 48 89 fb
RIP  [<ffffffff8147685a>] netif_rx+0x14/0x109
  RSP <ffff8804030359a8>
... DONE

>
>
>
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  1:06                                             ` Ben Greear
@ 2013-01-24  1:10                                               ` Eric Dumazet
  2013-01-24  1:45                                                 ` Eric Dumazet
  0 siblings, 1 reply; 53+ messages in thread
From: Eric Dumazet @ 2013-01-24  1:10 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

On Wed, 2013-01-23 at 17:06 -0800, Ben Greear wrote:
> On 01/23/2013 05:00 PM, Eric Dumazet wrote:
> > On Wed, 2013-01-23 at 16:51 -0800, Ben Greear wrote:
> >
> >>
> >> I was poking around in drivers/net/loopback.c.  Maybe it needs
> >> to clean up the skb_dst() before calling the rx logic in the
> >> loopback_xmit method?
> >
> > Nope. Its ok there. We need a dst for loopback
> 
> Here's the crash with the BUG_ON().  Did I understand your
> suggestion properly?  This hit immediately when booting.
> 
> 
> int netif_rx(struct sk_buff *skb)
> {
> 	int ret;
> 
> 	BUG_ON(skb->_skb_refdst & SKB_DST_NOREF);
> 	
> 	/* if netpoll wants it, pretend we never saw it */
> 	if (netpoll_rx(skb))
> 		return NET_RX_DROP;
> 
> 
> kernel BUG at /home/greearb/git/linux-3.7.dev.y/net/core/dev.c:2982!
> invalid opcode: 0000 [#1] PREEMPT SMP
> Modules linked in: lockd sunrpc macvlan pktgen uinput coretemp hwmon kvm_intel kvm iTCO_wdt iTCO_vendor_e
> CPU 0
> Pid: 1554, comm: btserver Tainted: G         C O 3.7.3+ #51 Iron Systems Inc. EE2610R/X8ST3
> RIP: 0010:[<ffffffff8147685a>]  [<ffffffff8147685a>] netif_rx+0x14/0x109
> RSP: 0018:ffff8804030359a8  EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffff8803fff3a2f0 RCX: ffff88040d3b4490
> RDX: ffff880403035a08 RSI: ffff88040d3f8000 RDI: ffff8803fff3a2f0
> RBP: ffff8804030359d8 R08: 0000000000000001 R09: 0000000000000000
> R10: ffffffff81472e61 R11: ffff8803fff62cc0 R12: 0000000000016ff0
> R13: 000000000000003c R14: ffff88041fc00000 R15: ffffffff81670560
> FS:  00007fcbd5c01740(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000439004 CR3: 00000003ff68d000 CR4: 00000000000007f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process btserver (pid: 1554, threadinfo ffff880403034000, task ffff880407e09f50)
> Stack:
>   ffff8804030359d8 ffffffff814c0290 ffff880403035a08 ffff8803fff3a2f0
>   ffff8803fff3a2f0 0000000000016ff0 ffff880403035a08 ffffffff813a3c4d
>   ffff8803fff3a2f0 ffff88040d3f8000 00000000a01b7c08 ffff88040d0b7400
> Call Trace:
>   [<ffffffff814c0290>] ? tcp_wfree+0xc0/0xc8
>   [<ffffffff813a3c4d>] loopback_xmit+0x64/0x83
>   [<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e
>   [<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c
>   [<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e
>   [<ffffffff8148cfa6>] ? eth_header+0x28/0xb6
>   [<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7
>   [<ffffffff814ad835>] ip_finish_output2+0x297/0x30d
>   [<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d
>   [<ffffffff814ad90e>] ip_finish_output+0x63/0x68
>   [<ffffffff814ae412>] ip_output+0x61/0x67
>   [<ffffffff814ab904>] dst_output+0x17/0x1b
>   [<ffffffff814adb6d>] ip_local_out+0x1e/0x23
>   [<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353
>   [<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc
>   [<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b
>   [<ffffffff814c3571>] tcp_connect+0x53c/0x587
>   [<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d
>   [<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e
>   [<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431
>   [<ffffffff814d6913>] __inet_stream_connect+0x84/0x287
>   [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
>   [<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f
>   [<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11
>   [<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79
>   [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
>   [<ffffffff814d6b49>] inet_stream_connect+0x33/0x49
>   [<ffffffff814632c6>] sys_connect+0x75/0x98
>   [<ffffffff811551fd>] ? path_put+0x1d/0x21
>   [<ffffffff810e8d06>] ? __audit_syscall_entry+0x11c/0x148
>   [<ffffffff8128a509>] ? lockdep_sys_exit_thunk+0x35/0x67
>   [<ffffffff81162f8b>] ? __fd_install+0x26/0x52
>   [<ffffffff81537e69>] system_call_fastpath+0x16/0x1b
> Code: 49 8b 5c 24 10 48 8b 43 68 48 85 c0 0f 85 5e fe ff ff e9 79 fe ff ff 55 48 89 e5 41 54 53 48 89 fb
> RIP  [<ffffffff8147685a>] netif_rx+0x14/0x109
>   RSP <ffff8804030359a8>
> ... DONE

Excellent, thats the bug.

I'll send a fix asap, thanks !

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  1:10                                               ` Eric Dumazet
@ 2013-01-24  1:45                                                 ` Eric Dumazet
  2013-01-24  4:26                                                   ` Ben Greear
                                                                     ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Eric Dumazet @ 2013-01-24  1:45 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

On Wed, 2013-01-23 at 17:10 -0800, Eric Dumazet wrote:

> Excellent, thats the bug.
> 
> I'll send a fix asap, thanks !

loopback device doesnt have a qdisc, and unsets IFF_XMIT_DST_RELEASE

Its hard to believe such an old bug never hit us in the past.

Probably because most of the time, the packet given to netif_rx() is
immediately processed (and loopback device is hard wired ?)


diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 81f8f9e..fcbf680 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -77,6 +77,11 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 
 	skb_orphan(skb);
 
+	/* Before queueing this packet to netif_rx(),
+	 * make sure dst is refcounted.
+	 */
+	skb_dst_force(skb);
+
 	skb->protocol = eth_type_trans(skb, dev);
 
 	/* it's OK to use per_cpu_ptr() because BHs are off */

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  1:45                                                 ` Eric Dumazet
@ 2013-01-24  4:26                                                   ` Ben Greear
  2013-01-24  5:39                                                     ` Eric Dumazet
  2013-01-24 20:03                                                   ` Ben Greear
  2013-01-25 17:44                                                   ` [PATCH] net: loopback: fix a dst refcounting issue Eric Dumazet
  2 siblings, 1 reply; 53+ messages in thread
From: Ben Greear @ 2013-01-24  4:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 01/23/2013 05:45 PM, Eric Dumazet wrote:
> On Wed, 2013-01-23 at 17:10 -0800, Eric Dumazet wrote:
>
>> Excellent, thats the bug.
>>
>> I'll send a fix asap, thanks !
>
> loopback device doesnt have a qdisc, and unsets IFF_XMIT_DST_RELEASE
>
> Its hard to believe such an old bug never hit us in the past.
>
> Probably because most of the time, the packet given to netif_rx() is
> immediately processed (and loopback device is hard wired ?)

I'll test this in the morning...one thing, maybe it's worth adding
a WARN_ON in netif_rx() in case there are other bugs of
this nature?

The dst objects are also garbage collected, so they have a fairly
long valid-ish lifetime after they are logically freed.  That might
be part of why it was so hard to reproduce the bug...

Thanks again for all the help on this bug!  Hopefully it will now
be well and truly squashed.

Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  4:26                                                   ` Ben Greear
@ 2013-01-24  5:39                                                     ` Eric Dumazet
  0 siblings, 0 replies; 53+ messages in thread
From: Eric Dumazet @ 2013-01-24  5:39 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

On Wed, 2013-01-23 at 20:26 -0800, Ben Greear wrote:

> I'll test this in the morning...one thing, maybe it's worth adding
> a WARN_ON in netif_rx() in case there are other bugs of
> this nature?

Quite frankly we should do the skb_dst_force() in netif_rx(), and remove
it from callers. This will be a net-next patch.

Thanks !

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  1:45                                                 ` Eric Dumazet
  2013-01-24  4:26                                                   ` Ben Greear
@ 2013-01-24 20:03                                                   ` Ben Greear
  2013-01-24 20:59                                                     ` Eric Dumazet
  2013-01-25 17:44                                                   ` [PATCH] net: loopback: fix a dst refcounting issue Eric Dumazet
  2 siblings, 1 reply; 53+ messages in thread
From: Ben Greear @ 2013-01-24 20:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 01/23/2013 05:45 PM, Eric Dumazet wrote:
> On Wed, 2013-01-23 at 17:10 -0800, Eric Dumazet wrote:
>
>> Excellent, thats the bug.
>>
>> I'll send a fix asap, thanks !
>
> loopback device doesnt have a qdisc, and unsets IFF_XMIT_DST_RELEASE
>
> Its hard to believe such an old bug never hit us in the past.
>
> Probably because most of the time, the packet given to netif_rx() is
> immediately processed (and loopback device is hard wired ?)

I have not seen any crashes since this change went into my
test bed, so I believe this one is finally fixed!  I'm running
with your macvlan fixes as well..and even if they did not
fix the problem I saw, they at least appear to cause no harm.

If you wish, feel free to add:

Tested-By:  Ben Greear <greearb@candelatech.com>

Thanks,
Ben

>
>
> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> index 81f8f9e..fcbf680 100644
> --- a/drivers/net/loopback.c
> +++ b/drivers/net/loopback.c
> @@ -77,6 +77,11 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
>
>   	skb_orphan(skb);
>
> +	/* Before queueing this packet to netif_rx(),
> +	 * make sure dst is refcounted.
> +	 */
> +	skb_dst_force(skb);
> +
>   	skb->protocol = eth_type_trans(skb, dev);
>
>   	/* it's OK to use per_cpu_ptr() because BHs are off */
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24 20:03                                                   ` Ben Greear
@ 2013-01-24 20:59                                                     ` Eric Dumazet
  2013-01-24 21:01                                                       ` Ben Greear
  0 siblings, 1 reply; 53+ messages in thread
From: Eric Dumazet @ 2013-01-24 20:59 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

On Thu, 2013-01-24 at 12:03 -0800, Ben Greear wrote:
> iately processed (and loopback device is hard wired ?)
> 
> I have not seen any crashes since this change went into my
> test bed, so I believe this one is finally fixed!  I'm running
> with your macvlan fixes as well..and even if they did not
> fix the problem I saw, they at least appear to cause no harm.
> 
> If you wish, feel free to add:
> 
> Tested-By:  Ben Greear <greearb@candelatech.com>
> 
> Thanks,
> Ben

Just to make sure, you still have the

WARN_ON_ONCE(skb_dst_is_noref(skb))

in netif_rx() ?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24 20:59                                                     ` Eric Dumazet
@ 2013-01-24 21:01                                                       ` Ben Greear
  0 siblings, 0 replies; 53+ messages in thread
From: Ben Greear @ 2013-01-24 21:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 01/24/2013 12:59 PM, Eric Dumazet wrote:
> On Thu, 2013-01-24 at 12:03 -0800, Ben Greear wrote:
>> iately processed (and loopback device is hard wired ?)
>>
>> I have not seen any crashes since this change went into my
>> test bed, so I believe this one is finally fixed!  I'm running
>> with your macvlan fixes as well..and even if they did not
>> fix the problem I saw, they at least appear to cause no harm.
>>
>> If you wish, feel free to add:
>>
>> Tested-By:  Ben Greear <greearb@candelatech.com>
>>
>> Thanks,
>> Ben
>
> Just to make sure, you still have the
>
> WARN_ON_ONCE(skb_dst_is_noref(skb))
>
> in netif_rx() ?
>

I have something similar, and I have not seen any splats:

int netif_rx(struct sk_buff *skb)
{
	int ret;

	WARN_ON(skb->_skb_refdst & SKB_DST_NOREF);


I'll keep this in our kernels so when we do .1q vlans and other
sorts of testing we'll get some more coverage.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH] net: loopback: fix a dst refcounting issue
  2013-01-24  1:45                                                 ` Eric Dumazet
  2013-01-24  4:26                                                   ` Ben Greear
  2013-01-24 20:03                                                   ` Ben Greear
@ 2013-01-25 17:44                                                   ` Eric Dumazet
  2013-01-27  6:32                                                     ` David Miller
  2 siblings, 1 reply; 53+ messages in thread
From: Eric Dumazet @ 2013-01-25 17:44 UTC (permalink / raw)
  To: Ben Greear, David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

Ben Greear reported crashes in ip_rcv_finish() on a stress
test involving many macvlans.

We tracked the bug to a dst use after free. ip_rcv_finish()
was calling dst->input() and got garbage for dst->input value.

It appears the bug is in loopback driver, lacking
a skb_dst_force() before calling netif_rx().

As a result, a non refcounted dst, normally protected by a
RCU read_lock section, was escaping this section and could
be freed before the packet being processed.

  [<ffffffff813a3c4d>] loopback_xmit+0x64/0x83
  [<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e
  [<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c
  [<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e
  [<ffffffff8148cfa6>] ? eth_header+0x28/0xb6
  [<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7
  [<ffffffff814ad835>] ip_finish_output2+0x297/0x30d
  [<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d
  [<ffffffff814ad90e>] ip_finish_output+0x63/0x68
  [<ffffffff814ae412>] ip_output+0x61/0x67
  [<ffffffff814ab904>] dst_output+0x17/0x1b
  [<ffffffff814adb6d>] ip_local_out+0x1e/0x23
  [<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353
  [<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc
  [<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b
  [<ffffffff814c3571>] tcp_connect+0x53c/0x587
  [<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d
  [<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e
  [<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431
  [<ffffffff814d6913>] __inet_stream_connect+0x84/0x287
  [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
  [<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f
  [<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11
  [<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79
  [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
  [<ffffffff814d6b49>] inet_stream_connect+0x33/0x49
  [<ffffffff814632c6>] sys_connect+0x75/0x98

This bug was introduced in linux-2.6.35, in commit
7fee226ad2397b (net: add a noref bit on skb dst)

skb_dst_force() is enforced in dev_queue_xmit() for devices having a
qdisc. 

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Ben Greear <greearb@candelatech.com>
---
 drivers/net/loopback.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 81f8f9e..fcbf680 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -77,6 +77,11 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 
 	skb_orphan(skb);
 
+	/* Before queueing this packet to netif_rx(),
+	 * make sure dst is refcounted.
+	 */
+	skb_dst_force(skb);
+
 	skb->protocol = eth_type_trans(skb, dev);
 
 	/* it's OK to use per_cpu_ptr() because BHs are off */

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH] net: loopback: fix a dst refcounting issue
  2013-01-25 17:44                                                   ` [PATCH] net: loopback: fix a dst refcounting issue Eric Dumazet
@ 2013-01-27  6:32                                                     ` David Miller
  2013-01-27 17:25                                                       ` Eric Dumazet
  0 siblings, 1 reply; 53+ messages in thread
From: David Miller @ 2013-01-27  6:32 UTC (permalink / raw)
  To: eric.dumazet; +Cc: greearb, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 25 Jan 2013 09:44:41 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Ben Greear reported crashes in ip_rcv_finish() on a stress
> test involving many macvlans.
> 
> We tracked the bug to a dst use after free. ip_rcv_finish()
> was calling dst->input() and got garbage for dst->input value.
> 
> It appears the bug is in loopback driver, lacking
> a skb_dst_force() before calling netif_rx().
> 
> As a result, a non refcounted dst, normally protected by a
> RCU read_lock section, was escaping this section and could
> be freed before the packet being processed.
 ...
> This bug was introduced in linux-2.6.35, in commit
> 7fee226ad2397b (net: add a noref bit on skb dst)
> 
> skb_dst_force() is enforced in dev_queue_xmit() for devices having a
> qdisc. 
> 
> Reported-by: Ben Greear <greearb@candelatech.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Tested-by: Ben Greear <greearb@candelatech.com>

Applied, I suspect we need to tree wide audit for other instances of
this situation.  Probably greps for skb_orphan() would be a good
start.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] net: loopback: fix a dst refcounting issue
  2013-01-27  6:32                                                     ` David Miller
@ 2013-01-27 17:25                                                       ` Eric Dumazet
  2013-01-28  0:26                                                         ` David Miller
  0 siblings, 1 reply; 53+ messages in thread
From: Eric Dumazet @ 2013-01-27 17:25 UTC (permalink / raw)
  To: David Miller; +Cc: greearb, netdev

On Sun, 2013-01-27 at 01:32 -0500, David Miller wrote:

> Applied, I suspect we need to tree wide audit for other instances of
> this situation.  Probably greps for skb_orphan() would be a good
> start.

I highly suspect macvlan needs a skb_dst_drop(), but couldnt trigger
a case to trigger the fault.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] net: loopback: fix a dst refcounting issue
  2013-01-27 17:25                                                       ` Eric Dumazet
@ 2013-01-28  0:26                                                         ` David Miller
  0 siblings, 0 replies; 53+ messages in thread
From: David Miller @ 2013-01-28  0:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: greearb, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 27 Jan 2013 09:25:08 -0800

> On Sun, 2013-01-27 at 01:32 -0500, David Miller wrote:
> 
>> Applied, I suspect we need to tree wide audit for other instances of
>> this situation.  Probably greps for skb_orphan() would be a good
>> start.
> 
> I highly suspect macvlan needs a skb_dst_drop(), but couldnt trigger
> a case to trigger the fault.

I completely agree, when we hit the ->forward() or ->receive() method
we are leaving the RCU safety zone for the cached dst.

Even if you can't trigger it, this seems clear enough to me that we
should put those skb_dst_drop() calls in, so please formally submit
that patch you posted for Ben as an RFC last week.

Thanks.

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2013-01-28  0:26 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-21 21:07 3.7.3+: Bad paging request in ip_rcv_finish while running NFS traffic Ben Greear
2013-01-21 21:07 ` Ben Greear
2013-01-22  0:32 ` Ben Greear
2013-01-22  4:40   ` Eric Dumazet
2013-01-22  5:57     ` Ben Greear
2013-01-22  5:57       ` Ben Greear
     [not found]       ` <50FE2A57.3040804-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2013-01-22 17:08         ` Ben Greear
2013-01-22 17:08           ` Ben Greear
     [not found]           ` <50FEC796.5090404-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2013-01-22 17:17             ` Eric Dumazet
2013-01-22 17:17               ` Eric Dumazet
2013-01-22 17:26               ` Ben Greear
2013-01-22 17:26                 ` Ben Greear
2013-01-22 17:26               ` Eric Dumazet
2013-01-22 22:18                 ` Ben Greear
2013-01-22 22:18                   ` Ben Greear
     [not found]                   ` <50FF102F.2050008-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2013-01-23  2:32                     ` Ben Greear
2013-01-23  2:32                       ` Ben Greear
2013-01-23  6:11                       ` Eric Dumazet
2013-01-23  7:14                         ` Ben Greear
2013-01-23  7:14                           ` Ben Greear
     [not found]                           ` <50FF8DEC.9070901-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2013-01-23 13:35                             ` Eric Dumazet
2013-01-23 13:35                               ` Eric Dumazet
2013-01-23 18:15                               ` Ben Greear
2013-01-23 18:15                                 ` Ben Greear
2013-01-23 21:43                                 ` Eric Dumazet
2013-01-23 14:42                             ` Eric Dumazet
2013-01-23 14:42                               ` Eric Dumazet
2013-01-23 21:53                               ` Ben Greear
2013-01-23 21:53                                 ` Ben Greear
     [not found]                       ` <50FF4BC9.1060206-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2013-01-23 23:55                         ` Ben Greear
2013-01-23 23:55                           ` Ben Greear
     [not found]                           ` <5100785D.8040101-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2013-01-24  0:01                             ` Eric Dumazet
2013-01-24  0:01                               ` Eric Dumazet
2013-01-24  0:13                               ` Ben Greear
2013-01-24  0:13                                 ` Ben Greear
     [not found]                                 ` <51007CA8.2050105-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2013-01-24  0:23                                   ` Eric Dumazet
2013-01-24  0:23                                     ` Eric Dumazet
2013-01-24  0:38                                     ` Ben Greear
2013-01-24  0:45                                       ` Eric Dumazet
2013-01-24  0:51                                         ` Ben Greear
2013-01-24  1:00                                           ` Eric Dumazet
2013-01-24  1:06                                             ` Ben Greear
2013-01-24  1:10                                               ` Eric Dumazet
2013-01-24  1:45                                                 ` Eric Dumazet
2013-01-24  4:26                                                   ` Ben Greear
2013-01-24  5:39                                                     ` Eric Dumazet
2013-01-24 20:03                                                   ` Ben Greear
2013-01-24 20:59                                                     ` Eric Dumazet
2013-01-24 21:01                                                       ` Ben Greear
2013-01-25 17:44                                                   ` [PATCH] net: loopback: fix a dst refcounting issue Eric Dumazet
2013-01-27  6:32                                                     ` David Miller
2013-01-27 17:25                                                       ` Eric Dumazet
2013-01-28  0:26                                                         ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.