* 3.3.0, 3.4-rc1 reproducible tun Oops
@ 2012-04-04 22:05 Simon Kirby
  2012-04-05 2:41 ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Simon Kirby @ 2012-04-04 22:05 UTC
  To: netdev

I use an SSH VPN occasionally from home, and since upgrading the remote
kernel to 3.3.0, it now seems to Oops when I ^C the tunnel with
sockets still active. If I start the tunnel, log in to a box through it
and run "vmstat 1", ^C the tunnel SSH process, and start it up again, I
get an Oops like this:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000ff
IP: [<ffffffff810ed5fa>] __kmalloc_track_caller+0xaa/0x1b0
PGD 12d2bc067 PUD 0
Oops: 0000 [#1] SMP
CPU 1
Modules linked in: nf_conntrack_netlink nfnetlink iptable_mangle ipt_MASQUERADE xt_state xt_conntrack iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack hwmon_vid ppp_async ppp_generic slhc crc_ccitt tun nvidia(PO) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core e100

Pid: 16156, comm: sshd Tainted: P O 3.3.0 #32 System manufacturer System Product Name/A8N-VM CSM
RIP: 0010:[<ffffffff810ed5fa>] [<ffffffff810ed5fa>] __kmalloc_track_caller+0xaa/0x1b0
RSP: 0000:ffff88012d0b3b58 EFLAGS: 00210206
RAX: 0000000000000000 RBX: ffff8801783f8e00 RCX: 000000000002c11f
RDX: 000000000002c11e RSI: 00000000000000d0 RDI: 0000000000014ac0
RBP: ffff88012d0b3ba8 R08: ffffffff81693c81 R09: ffff88007f546f30
R10: 00000000f80057e0 R11: 0000000000000000 R12: 00000000000000ff
R13: ffff88017b002900 R14: 0000000000000800 R15: 0000000000000800
FS: 0000000000000000(0000) GS:ffff88017fd00000(0063) knlGS:00000000f71ea740
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000000000ff CR3: 000000011906a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sshd (pid: 16156, threadinfo ffff88012d0b2000, task ffff880100a43a00)
Stack:
 dead000000200200 ffff88007fabc0c0 ffffffff816d692c 000000d0000000db
 ffff880100000000 ffff8801783f8e00 0000000000000001 00000000000000d0
 ffff88017b002780 0000000000000800 ffff88012d0b3be8 ffffffff81693cae
Call Trace:
 [<ffffffff816d692c>] ? sk_stream_alloc_skb+0x3c/0x110
 [<ffffffff81693cae>] __alloc_skb+0x6e/0x220
 [<ffffffff816d692c>] sk_stream_alloc_skb+0x3c/0x110
 [<ffffffff816d6c90>] tcp_sendmsg+0x290/0xd90
 [<ffffffff81694537>] ? skb_release_data+0xe7/0xf0
 [<ffffffffa0032e3a>] ? tun_do_read.isra.24+0x29a/0x420 [tun]
 [<ffffffff816f8703>] inet_sendmsg+0x43/0xb0
 [<ffffffff8168b78e>] sock_aio_write+0x10e/0x130
 [<ffffffff810f04fa>] do_sync_write+0xca/0x110
 [<ffffffff8104676a>] ? set_current_blocked+0x3a/0x60
 [<ffffffff810467d5>] ? sigprocmask+0x45/0x80
 [<ffffffff810f0e15>] vfs_write+0x165/0x180
 [<ffffffff810f1085>] sys_write+0x45/0x90
 [<ffffffff818098f9>] ia32_do_call+0x13/0x13
Code: 76 bf 49 8b 4d 00 65 48 03 0c 25 b8 cb 00 00 48 8b 51 08 4c 8b 21 4d 85 e4 0f 84 eb 00 00 00 49 63 45 20 49 8b 7d 00 48 8d 4a 01 <49> 8b 1c 04 4c 89 e0 48 8d 37 e8 37 41 28 00 84 c0 74 c4 4d 85
RIP [<ffffffff810ed5fa>] __kmalloc_track_caller+0xaa/0x1b0
 RSP <ffff88012d0b3b58>
CR2: 00000000000000ff
---[ end trace 4a40da26b9b3bff5 ]---

Looks like it might need some poisoning there. Sometimes the Oops stops
before it is fully emitted over the serial port.

I have verified that this happens on v3.3 and current Linus head
(3.4-rc1+) and not on v3.2. When I get some more time, I will try to
track it down a bit further.

ssh -w any <vpn box> 'ifconfig tun0 x pointopoint y; echo "ifconfig tun0 y pointopoint x && ip route add 10.0.0.0/8 via x"; sleep 1d' | sh -v

Simon-

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
From: Eric Dumazet @ 2012-04-05 2:41 UTC
  To: Simon Kirby; +Cc: netdev

On Wed, 2012-04-04 at 15:05 -0700, Simon Kirby wrote:
> I use an SSH VPN occasionally from home, and since upgrading the remote
> kernel to 3.3.0, it now seems to Oops when I ^C the tunnel with
> sockets still active. If I start the tunnel, log in to a box through it
> and run "vmstat 1", ^C the tunnel SSH process, and start it up again, I
> get an Oops like this:
>
> BUG: unable to handle kernel NULL pointer dereference at 00000000000000ff
> IP: [<ffffffff810ed5fa>] __kmalloc_track_caller+0xaa/0x1b0
> PGD 12d2bc067 PUD 0
> Oops: 0000 [#1] SMP
> CPU 1
> Modules linked in: nf_conntrack_netlink nfnetlink iptable_mangle ipt_MASQUERADE xt_state xt_conntrack iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack hwmon_vid ppp_async ppp_generic slhc crc_ccitt tun nvidia(PO) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core e100
>
> Pid: 16156, comm: sshd Tainted: P O 3.3.0 #32 System manufacturer System Product Name/A8N-VM CSM

Hmm, is it happening if you remove the nvidia module ?

If yes, please try to add slub_debug=FZPU

CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
# CONFIG_SLUB_DEBUG_ON is not set

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
From: Simon Kirby @ 2012-04-05 5:58 UTC
  To: Eric Dumazet; +Cc: netdev

On Thu, Apr 05, 2012 at 04:41:04AM +0200, Eric Dumazet wrote:

> On Wed, 2012-04-04 at 15:05 -0700, Simon Kirby wrote:
> > I use an SSH VPN occasionally from home, and since upgrading the remote
> > kernel to 3.3.0, it now seems to Oops when I ^C the tunnel with
> > sockets still active. If I start the tunnel, log in to a box through it
> > and run "vmstat 1", ^C the tunnel SSH process, and start it up again, I
> > get an Oops like this:
> >
> > BUG: unable to handle kernel NULL pointer dereference at 00000000000000ff
> > IP: [<ffffffff810ed5fa>] __kmalloc_track_caller+0xaa/0x1b0
> > PGD 12d2bc067 PUD 0
> > Oops: 0000 [#1] SMP
> > CPU 1
> > Modules linked in: nf_conntrack_netlink nfnetlink iptable_mangle ipt_MASQUERADE xt_state xt_conntrack iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack hwmon_vid ppp_async ppp_generic slhc crc_ccitt tun nvidia(PO) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core e100
> >
> > Pid: 16156, comm: sshd Tainted: P O 3.3.0 #32 System manufacturer System Product Name/A8N-VM CSM
>
> Hmm, is it happening if you remove the nvidia module ?
>
> If yes, please try to add slub_debug=FZPU
>
> CONFIG_SLUB_DEBUG=y
> CONFIG_SLUB=y
> # CONFIG_SLUB_DEBUG_ON is not set

Yes it is, and here we go:

[ 3223.596062] =============================================================================
[ 3223.603618] BUG kmalloc-2048 (Not tainted): Redzone overwritten
[ 3223.603618] -----------------------------------------------------------------------------
[ 3223.603618]
[ 3223.603618] INFO: 0xffff88017499d240-0xffff88017499d240.
First byte 0xcb instead of 0xcc [ 3223.603618] INFO: Allocated in alloc_netdev_mqs+0x62/0x300 age=2919 cpu=1 pid=3929 [ 3223.603618] __slab_alloc.constprop.65+0x21b/0x25f [ 3223.603618] __kmalloc+0x1a7/0x1c0 [ 3223.603618] alloc_netdev_mqs+0x62/0x300 [ 3223.603618] __tun_chr_ioctl+0x903/0xc50 [tun] [ 3223.603618] tun_chr_compat_ioctl+0x10/0x20 [tun] [ 3223.603618] compat_sys_ioctl+0x8f/0x10d0 [ 3223.603618] ia32_sysret+0x0/0x5 [ 3223.603618] INFO: Freed in skb_release_data+0xe7/0xf0 age=3004 cpu=1 pid=0 [ 3223.603618] __slab_free+0x2d/0x28a [ 3223.603618] kfree+0x114/0x140 [ 3223.603618] skb_release_data+0xe7/0xf0 [ 3223.603618] __kfree_skb+0x19/0xa0 [ 3223.603618] kfree_skb+0x44/0xb0 [ 3223.603618] ip_rcv_finish+0xd8/0x2c0 [ 3223.603618] ip_rcv+0x1f5/0x2b0 [ 3223.603618] __netif_receive_skb+0x32b/0x410 [ 3223.603618] netif_receive_skb+0x28/0x80 [ 3223.603618] napi_skb_finish+0x48/0x60 [ 3223.603618] napi_gro_receive+0xed/0x130 [ 3223.603618] nv_rx_process_optimized+0x135/0x270 [ 3223.603618] nv_napi_poll+0x84/0x5f0 [ 3223.603618] net_rx_action+0x111/0x200 [ 3223.603618] __do_softirq+0x99/0x210 [ 3223.603618] call_softirq+0x1c/0x30 [ 3223.603618] INFO: Slab 0xffffea0005d26600 objects=13 used=9 fp=0xffff880174999290 flags=0x8000000000004081 [ 3223.603618] INFO: Object 0xffff88017499ca40 @offset=19008 fp=0xffff880174999290 [ 3223.603618] [ 3223.603618] Bytes b4 ffff88017499ca30: 5f 25 0b 00 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a _%......ZZZZZZZZ [ 3223.603618] Object ffff88017499ca40: 74 75 6e 30 00 00 00 00 00 00 00 00 00 00 00 00 tun0............ [ 3223.603618] Object ffff88017499ca50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499ca60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499ca70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499ca80: 00 00 00 00 00 00 00 00 00 02 20 00 00 00 ad de .......... ..... 
[ 3223.603618] Object ffff88017499ca90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499caa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cab0: 00 00 00 00 00 00 00 00 06 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cac0: 78 36 d8 81 ff ff ff ff 00 02 20 00 00 00 ad de x6........ ..... [ 3223.603618] Object ffff88017499cad0: d0 ca 99 74 01 88 ff ff d0 ca 99 74 01 88 ff ff ...t.......t.... [ 3223.603618] Object ffff88017499cae0: e0 ca 99 74 01 88 ff ff e0 ca 99 74 01 88 ff ff ...t.......t.... [ 3223.603618] Object ffff88017499caf0: 40 40 00 40 00 00 00 00 49 48 1b 40 00 00 00 00 @@.@....IH.@.... [ 3223.603618] Object ffff88017499cb00: 49 48 1b 40 00 00 00 00 20 00 00 00 00 00 00 00 IH.@.... ....... [ 3223.603618] Object ffff88017499cb10: 07 00 00 00 07 00 00 00 0d 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cb20: 0e 00 00 00 00 00 00 00 a4 02 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cb30: 82 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cb40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cb50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cb60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cb70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cb90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cba0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cbb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
[ 3223.603618] Object ffff88017499cbc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cbd0: 00 00 00 00 00 00 00 00 40 89 03 a0 ff ff ff ff ........@....... [ 3223.603618] Object ffff88017499cbe0: e0 87 03 a0 ff ff ff ff 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cbf0: 90 10 00 00 00 04 00 00 00 00 00 00 02 00 00 00 ................ [ 3223.603618] Object ffff88017499cc00: dc 05 00 00 fe ff 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cc10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cc20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cc30: 00 00 06 06 00 00 00 00 38 cc 99 74 01 88 ff ff ........8..t.... [ 3223.603618] Object ffff88017499cc40: 38 cc 99 74 01 88 ff ff 00 00 00 00 00 00 00 00 8..t............ [ 3223.603618] Object ffff88017499cc50: 50 cc 99 74 01 88 ff ff 50 cc 99 74 01 88 ff ff P..t....P..t.... [ 3223.603618] Object ffff88017499cc60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cc70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cc90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499ccb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499ccc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499ccd0: d0 cc 99 74 01 88 ff ff d0 cc 99 74 01 88 ff ff ...t.......t.... [ 3223.603618] Object ffff88017499cce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
[ 3223.603618] Object ffff88017499ccf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cd00: 00 00 00 00 00 00 00 00 90 30 4d 76 01 88 ff ff .........0Mv.... [ 3223.603618] Object ffff88017499cd10: 20 47 2a 79 01 88 ff ff 01 00 00 00 01 00 00 00 G*y............ [ 3223.603618] Object ffff88017499cd20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cd30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cd40: 38 f1 c1 6e 01 88 ff ff 01 00 00 00 01 00 00 00 8..n............ [ 3223.603618] Object ffff88017499cd50: 80 1a c3 81 ff ff ff ff f4 01 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cd60: 03 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cd70: 0b 1b 0b 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cd90: 00 00 00 00 00 00 00 00 00 40 15 7b 01 88 ff ff .........@.{.... [ 3223.603618] Object ffff88017499cda0: a0 49 6d 81 ff ff ff ff 40 ca 99 74 01 88 ff ff .Im.....@..t.... [ 3223.603618] Object ffff88017499cdb0: ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cdc0: 00 01 10 00 00 00 ad de 00 02 20 00 00 00 ad de .......... ..... [ 3223.603618] Object ffff88017499cdd0: 00 00 00 00 00 00 00 00 00 02 20 00 00 00 ad de .......... ..... [ 3223.603618] Object ffff88017499cde0: e0 cd 99 74 01 88 ff ff e0 cd 99 74 01 88 ff ff ...t.......t.... [ 3223.603618] Object ffff88017499cdf0: 04 01 00 00 00 00 00 00 f0 6b 03 a0 ff ff ff ff .........k...... [ 3223.603618] Object ffff88017499ce00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499ce10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
[ 3223.603618] Object ffff88017499ce20: 40 d0 c9 6e 01 88 ff ff 80 da a6 73 01 88 ff ff @..n.......s.... [ 3223.603618] Object ffff88017499ce30: 30 ce 99 74 01 88 ff ff 30 ce 99 74 01 88 ff ff 0..t....0..t.... [ 3223.603618] Object ffff88017499ce40: 00 00 00 00 00 00 00 00 a8 81 06 7b 01 88 ff ff ...........{.... [ 3223.603618] Object ffff88017499ce50: 20 09 c0 81 ff ff ff ff 00 00 00 00 00 00 00 00 ............... [ 3223.603618] Object ffff88017499ce60: 00 00 00 00 0d 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499ce70: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499ce80: 80 ce 99 74 01 88 ff ff 80 ce 99 74 01 88 ff ff ...t.......t.... [ 3223.603618] Object ffff88017499ce90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cea0: 78 ce 99 74 01 88 ff ff 00 00 00 00 00 00 00 00 x..t............ [ 3223.603618] Object ffff88017499ceb0: 00 00 00 00 00 00 00 00 40 ca 99 74 01 88 ff ff ........@..t.... [ 3223.603618] Object ffff88017499cec0: ff ff ff ff 00 00 00 00 c8 ce 99 74 01 88 ff ff ...........t.... [ 3223.603618] Object ffff88017499ced0: c8 ce 99 74 01 88 ff ff fe ff ff ff 00 00 00 00 ...t............ [ 3223.603618] Object ffff88017499cee0: 02 02 00 00 00 00 00 00 e8 ce 99 74 01 88 ff ff ...........t.... [ 3223.603618] Object ffff88017499cef0: e8 ce 99 74 01 88 ff ff 00 00 00 00 00 00 00 00 ...t............ [ 3223.603618] Object ffff88017499cf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cf10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cf20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cf30: 00 00 00 00 00 00 00 00 38 cf 99 74 01 88 ff ff ........8..t.... [ 3223.603618] Object ffff88017499cf40: 38 cf 99 74 01 88 ff ff 00 00 00 00 00 00 00 00 8..t............ 
[ 3223.603618] Object ffff88017499cf50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cf60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cf70: 01 01 00 00 00 00 00 00 78 cf 99 74 01 88 ff ff ........x..t.... [ 3223.603618] Object ffff88017499cf80: 78 cf 99 74 01 88 ff ff 00 00 00 00 00 00 00 00 x..t............ [ 3223.603618] Object ffff88017499cf90: 00 01 10 00 00 00 ad de 00 02 20 00 00 00 ad de .......... ..... [ 3223.603618] Object ffff88017499cfa0: 00 00 00 00 00 00 00 00 a0 02 c3 81 ff ff ff ff ................ [ 3223.603618] Object ffff88017499cfb0: c0 cf 99 74 01 88 ff ff 00 00 00 00 00 00 00 00 ...t............ [ 3223.603618] Object ffff88017499cfc0: 20 03 c3 81 ff ff ff ff 00 00 00 00 00 00 00 00 ............... [ 3223.603618] Object ffff88017499cfd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cfe0: 00 8f 03 a0 ff ff ff ff 00 00 01 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499cff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
[ 3223.603618] Object ffff88017499d080: 00 00 00 00 00 00 00 00 41 00 00 00 ff ff ff ff ........A....... [ 3223.603618] Object ffff88017499d090: ff ff ff ff 00 00 00 00 40 ca 99 74 01 88 ff ff ........@..t.... [ 3223.603618] Object ffff88017499d0a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d0b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d0c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d0d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d0e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d0f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d100: 40 d1 99 74 01 88 ff ff 00 00 00 00 00 00 00 00 @..t............ [ 3223.603618] Object ffff88017499d110: d8 cf b5 73 01 88 ff ff 00 00 00 00 00 00 00 00 ...s............ [ 3223.603618] Object ffff88017499d120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d140: 64 64 00 00 00 00 00 00 48 d1 99 74 01 88 ff ff dd......H..t.... [ 3223.603618] Object ffff88017499d150: 48 d1 99 74 01 88 ff ff 00 00 00 00 00 00 00 00 H..t............ [ 3223.603618] Object ffff88017499d160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d180: 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d1a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
[ 3223.603618] Object ffff88017499d1b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Object ffff88017499d230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 3223.603618] Redzone ffff88017499d240: cb cc cc cc cc cc cc cc ........ [ 3223.603618] Padding ffff88017499d380: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ [ 3223.603618] Pid: 3929, comm: sshd Not tainted 3.4.0-rc1+ #35 [ 3223.603618] Call Trace: [ 3223.603618] [<ffffffff810ea1c8>] ? print_section+0x38/0x40 [ 3223.603618] [<ffffffff810ea913>] print_trailer+0xe3/0x160 [ 3223.603618] [<ffffffff810eade7>] check_bytes_and_report+0xd7/0x110 [ 3223.603618] [<ffffffff810eafae>] check_object+0x18e/0x270 [ 3223.603618] [<ffffffff816cd534>] ? netdev_release+0x34/0x40 [ 3223.603618] [<ffffffff81819a78>] free_debug_processing+0xf1/0x1e7 [ 3223.603618] [<ffffffff81819b9b>] __slab_free+0x2d/0x28a [ 3223.603618] [<ffffffff810ea48a>] ? set_track+0x5a/0x180 [ 3223.603618] [<ffffffff816cd534>] ? netdev_release+0x34/0x40 [ 3223.603618] [<ffffffff810ebd05>] ? init_object+0x45/0x80 [ 3223.603618] [<ffffffff816cd534>] ? 
netdev_release+0x34/0x40
[ 3223.603618] [<ffffffff810ebfa4>] kfree+0x114/0x140
[ 3223.603618] [<ffffffff816cd534>] netdev_release+0x34/0x40
[ 3223.603618] [<ffffffff814ded72>] device_release+0x22/0x90
[ 3223.603618] [<ffffffff8137689c>] kobject_release+0x4c/0xa0
[ 3223.603618] [<ffffffff8137675c>] kobject_put+0x2c/0x60
[ 3223.603618] [<ffffffff814debe2>] put_device+0x12/0x20
[ 3223.603618] [<ffffffff816b6e57>] free_netdev+0xb7/0xf0
[ 3223.603618] [<ffffffffa0036434>] tun_sock_destruct+0x14/0x20 [tun]
[ 3223.603618] [<ffffffff816a9af8>] __sk_free+0x18/0x140
[ 3223.603618] [<ffffffff816a9c9d>] sk_free+0x1d/0x30
[ 3223.603618] [<ffffffffa003635e>] tun_chr_close+0x5e/0xa0 [tun]
[ 3223.603618] [<ffffffff810f3242>] fput+0xd2/0x240
[ 3223.603618] [<ffffffff810efb71>] filp_close+0x61/0x90
[ 3223.603618] [<ffffffff810efc1b>] sys_close+0x7b/0xd0
[ 3223.603618] [<ffffffff81825179>] ia32_do_call+0x13/0x13
[ 3223.603618] FIX kmalloc-2048: Restoring 0xffff88017499d240-0xffff88017499d240=0xcc
[ 3223.603618]

Simon-
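
For readers less familiar with the "Redzone overwritten" report above, the check it describes can be sketched in user-space C. This is only a toy model under stated assumptions: the names toy_slot, toy_slot_init and toy_check_redzone are hypothetical and are not kernel APIs, and the sizes are illustrative. The only values taken from the report are the expected guard byte 0xcc and the observed byte 0xcb.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy constants; only 0xcc is taken from the report above. */
#define TOY_RED_ACTIVE 0xcc  /* guard byte the report expects to find */
#define TOY_OBJ_SIZE   16    /* illustrative; the real cache is kmalloc-2048 */
#define TOY_RED_SIZE   8     /* illustrative guard length */

/* Hypothetical layout: the payload is followed directly by guard bytes. */
struct toy_slot {
	unsigned char object[TOY_OBJ_SIZE];
	unsigned char redzone[TOY_RED_SIZE];
};

/* Fill the guard area, as a red-zoning allocator might do on allocation. */
static void toy_slot_init(struct toy_slot *s)
{
	assert(s != NULL);
	memset(s->object, 0, sizeof(s->object));
	memset(s->redzone, TOY_RED_ACTIVE, sizeof(s->redzone));
}

/* Return the offset of the first damaged guard byte, or -1 if intact. */
static int toy_check_redzone(const struct toy_slot *s)
{
	assert(s != NULL);
	for (size_t i = 0; i < sizeof(s->redzone); i++) {
		if (s->redzone[i] != TOY_RED_ACTIVE)
			return (int)i;
	}
	return -1;
}
```

Calling toy_slot_init() and then decrementing redzone[0] once leaves 0xcb there, and toy_check_redzone() then reports offset 0, which matches the shape of "First byte 0xcb instead of 0xcc". Whether the real write was a decrement is not established anywhere in this thread; the sketch only shows how a one-byte change in the guard area is detected at free time.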

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
From: Simon Kirby @ 2012-04-17 2:08 UTC
  To: Eric Dumazet, Stanislav Kinsbursky; +Cc: netdev

On Thu, Apr 05, 2012 at 04:41:04AM +0200, Eric Dumazet wrote:

> Hmm, is it happening if you remove the nvidia module ?
>
> If yes, please try to add slub_debug=FZPU

Finally got annoyed enough at this to bisect it. It doesn't happen every
time and I got a bit confused, but I finally tracked it down to:

1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d is the first bad commit
commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d
Author: Stanislav Kinsbursky <skinsbursky@parallels.com>
Date:   Mon Mar 12 02:59:41 2012 +0000

    tun: don't hold network namespace by tun sockets

    v3: added previously removed sock_put() to the tun_release() callback, because
    sk_release_kernel() doesn't drop the socket reference.

    v2: sk_release_kernel() used for socket release. Dummy tun_release() is
    required for sk_release_kernel() ---> sock_release() ---> sock->ops->release()
    call.

    TUN was designed to destroy it's socket on network namesapce shutdown. But this
    will never happen for persistent device, because it's socket holds network
    namespace.
    This patch removes of holding network namespace by TUN socket and replaces it
    by creating socket in init_net and then changing it's net it to desired one. On
    shutdown socket is moved back to init_net prior to final put.

    Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

...With this reverted on top of 3.4-rc3, I no longer see crashes when I
keep making and breaking the SSH tunnel while running "vmstat 1" in an
SSH session over a socket that is running through that tunnel.

Simon-

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
From: Stanislav Kinsbursky @ 2012-04-17 12:18 UTC
  To: Simon Kirby; +Cc: Eric Dumazet, netdev

17.04.2012 06:08, Simon Kirby wrote:
> On Thu, Apr 05, 2012 at 04:41:04AM +0200, Eric Dumazet wrote:
>
>> Hmm, is it happening if you remove the nvidia module ?
>>
>> If yes, please try to add slub_debug=FZPU
>
> Finally got annoyed enough at this to bisect it. It doesn't happen every
> time and I got a bit confused, but I finally tracked it down to:
>
> 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d is the first bad commit
> commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d
> Author: Stanislav Kinsbursky <skinsbursky@parallels.com>
> Date:   Mon Mar 12 02:59:41 2012 +0000
>
>     tun: don't hold network namespace by tun sockets
>
>     v3: added previously removed sock_put() to the tun_release() callback, because
>     sk_release_kernel() doesn't drop the socket reference.
>
>     v2: sk_release_kernel() used for socket release. Dummy tun_release() is
>     required for sk_release_kernel() ---> sock_release() ---> sock->ops->release()
>     call.
>
>     TUN was designed to destroy it's socket on network namesapce shutdown. But this
>     will never happen for persistent device, because it's socket holds network
>     namespace.
>     This patch removes of holding network namespace by TUN socket and replaces it
>     by creating socket in init_net and then changing it's net it to desired one. On
>     shutdown socket is moved back to init_net prior to final put.
>
>     Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> ...With this reverted on top of 3.4-rc3, I no longer see crashes when I
> keep making and breaking the SSH tunnel while running "vmstat 1" in an
> SSH session over a socket that is running through that tunnel.
>
> Simon-

Hi, Simon.
Could you please try to apply the patch below on top of your tree (with
1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d applied) and check whether it
fixes the problem:

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bb8c72c..1fc4622 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1540,13 +1540,10 @@ static int tun_chr_close(struct inode *inode, struct file *file)
 			if (dev->reg_state == NETREG_REGISTERED)
 				unregister_netdevice(dev);
 			rtnl_unlock();
-		}
+		} else
+			sock_put(tun->socket.sk);
 	}

-	tun = tfile->tun;
-	if (tun)
-		sock_put(tun->socket.sk);
-
 	put_net(tfile->net);
 	kfree(tfile);
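
The diff above moves the sock_put() call out of the unconditional tail of tun_chr_close() and into the else branch, so that it runs only when the device is not being unregistered. One way to read that change can be sketched as a toy reference-count model in user-space C. This rests on an assumption that is not stated in the thread: that unregistering a non-persistent device already drops the socket reference on its own path. All names here (toy_sock, toy_sock_put, toy_unregister, toy_close_old, toy_close_new) are hypothetical and are not the kernel's functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted socket. */
struct toy_sock {
	int refcnt;      /* outstanding references */
	int extra_puts;  /* puts that arrived after refcnt reached zero */
};

/* Drop one reference; count puts that would touch an already-freed object. */
static void toy_sock_put(struct toy_sock *sk)
{
	assert(sk != NULL);
	if (sk->refcnt == 0) {
		sk->extra_puts++;  /* in a kernel this would be a use after free */
		return;
	}
	sk->refcnt--;
}

/* ASSUMPTION: the unregister path drops the socket reference itself. */
static void toy_unregister(struct toy_sock *sk)
{
	toy_sock_put(sk);
}

/* Shape of the close path before the diff: the put is unconditional. */
static void toy_close_old(struct toy_sock *sk, bool persist)
{
	if (!persist)
		toy_unregister(sk);
	toy_sock_put(sk);
}

/* Shape of the close path after the diff: the put is in the else branch. */
static void toy_close_new(struct toy_sock *sk, bool persist)
{
	if (!persist)
		toy_unregister(sk);
	else
		toy_sock_put(sk);
}
```

Starting from a single reference, toy_close_old() on a non-persistent device records one extra put, while toy_close_new() records none in either case. That is only a model of the control-flow change visible in the diff; the thread itself does not spell out which path held the final reference.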

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
From: Simon Kirby @ 2012-04-17 18:35 UTC
  To: Stanislav Kinsbursky; +Cc: Eric Dumazet, netdev

On Tue, Apr 17, 2012 at 04:18:53PM +0400, Stanislav Kinsbursky wrote:

> 17.04.2012 06:08, Simon Kirby wrote:
> >On Thu, Apr 05, 2012 at 04:41:04AM +0200, Eric Dumazet wrote:
> >
> >>Hmm, is it happening if you remove the nvidia module ?
> >>
> >>If yes, please try to add slub_debug=FZPU
> >
> >Finally got annoyed enough at this to bisect it. It doesn't happen every
> >time and I got a bit confused, but I finally tracked it down to:
> >
> >1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d is the first bad commit
> >commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d
> >Author: Stanislav Kinsbursky <skinsbursky@parallels.com>
> >Date:   Mon Mar 12 02:59:41 2012 +0000
> >
> >    tun: don't hold network namespace by tun sockets
> >
> >    v3: added previously removed sock_put() to the tun_release() callback, because
> >    sk_release_kernel() doesn't drop the socket reference.
> >
> >    v2: sk_release_kernel() used for socket release. Dummy tun_release() is
> >    required for sk_release_kernel() ---> sock_release() ---> sock->ops->release()
> >    call.
> >
> >    TUN was designed to destroy it's socket on network namesapce shutdown. But this
> >    will never happen for persistent device, because it's socket holds network
> >    namespace.
> >    This patch removes of holding network namespace by TUN socket and replaces it
> >    by creating socket in init_net and then changing it's net it to desired one. On
> >    shutdown socket is moved back to init_net prior to final put.
> >
> >    Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
> >    Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> >...With this reverted on top of 3.4-rc3, I no longer see crashes when I
> >keep making and breaking the SSH tunnel while running "vmstat 1" in an
> >SSH session over a socket that is running through that tunnel.
> >
> >Simon-
>
> Hi, Simon.
> Could you please try to apply the patch below on top of your tree (with
> 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d applied) and check whether it
> fixes the problem:
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index bb8c72c..1fc4622 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1540,13 +1540,10 @@ static int tun_chr_close(struct inode *inode, struct file *file)
>  			if (dev->reg_state == NETREG_REGISTERED)
>  				unregister_netdevice(dev);
>  			rtnl_unlock();
> -		}
> +		} else
> +			sock_put(tun->socket.sk);
>  	}
>
> -	tun = tfile->tun;
> -	if (tun)
> -		sock_put(tun->socket.sk);
> -
>  	put_net(tfile->net);
>  	kfree(tfile);

(Whitespace-damaged patch, applied manually)

Yes, I no longer see crashes with this applied. I haven't tried with
kmemleak or similar, but it seems to work.

Thanks,

Simon-

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
From: Stanislav Kinsbursky @ 2012-04-17 18:49 UTC
  To: Simon Kirby; +Cc: Eric Dumazet, netdev

17.04.2012 22:35, Simon Kirby wrote:
> On Tue, Apr 17, 2012 at 04:18:53PM +0400, Stanislav Kinsbursky wrote:
>
>> 17.04.2012 06:08, Simon Kirby wrote:
>>> On Thu, Apr 05, 2012 at 04:41:04AM +0200, Eric Dumazet wrote:
>>>
>>>> Hmm, is it happening if you remove the nvidia module ?
>>>>
>>>> If yes, please try to add slub_debug=FZPU
>>> Finally got annoyed enough at this to bisect it. It doesn't happen every
>>> time and I got a bit confused, but I finally tracked it down to:
>>>
>>> 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d is the first bad commit
>>> commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d
>>> Author: Stanislav Kinsbursky <skinsbursky@parallels.com>
>>> Date:   Mon Mar 12 02:59:41 2012 +0000
>>>
>>>     tun: don't hold network namespace by tun sockets
>>>
>>>     v3: added previously removed sock_put() to the tun_release() callback, because
>>>     sk_release_kernel() doesn't drop the socket reference.
>>>
>>>     v2: sk_release_kernel() used for socket release. Dummy tun_release() is
>>>     required for sk_release_kernel() ---> sock_release() ---> sock->ops->release()
>>>     call.
>>>
>>>     TUN was designed to destroy it's socket on network namesapce shutdown. But this
>>>     will never happen for persistent device, because it's socket holds network
>>>     namespace.
>>>     This patch removes of holding network namespace by TUN socket and replaces it
>>>     by creating socket in init_net and then changing it's net it to desired one. On
>>>     shutdown socket is moved back to init_net prior to final put.
>>>
>>>     Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>
>>> ...With this reverted on top of 3.4-rc3, I no longer see crashes when I
>>> keep making and breaking the SSH tunnel while running "vmstat 1" in an
>>> SSH session over a socket that is running through that tunnel.
>>>
>>> Simon-
>> Hi, Simon.
>> Could you please try to apply the patch below on top of your tree (with
>> 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d applied) and check whether it
>> fixes the problem:
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index bb8c72c..1fc4622 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1540,13 +1540,10 @@ static int tun_chr_close(struct inode *inode, struct file *file)
>>  			if (dev->reg_state == NETREG_REGISTERED)
>>  				unregister_netdevice(dev);
>>  			rtnl_unlock();
>> -		}
>> +		} else
>> +			sock_put(tun->socket.sk);
>>  	}
>>
>> -	tun = tfile->tun;
>> -	if (tun)
>> -		sock_put(tun->socket.sk);
>> -
>>  	put_net(tfile->net);
>>  	kfree(tfile);
> (Whitespace-damaged patch, applied manually)
>
> Yes, I no longer see crashes with this applied. I haven't tried with
> kmemleak or similar, but it seems to work.

Sorry for whitespaces.
And thanks, Simon.
* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
From: David Miller @ 2012-04-18 2:38 UTC
To: skinsbursky; +Cc: sim, eric.dumazet, netdev

From: Stanislav Kinsbursky <skinsbursky@parallels.com>
Date: Tue, 17 Apr 2012 22:49:06 +0400

> Sorry for whitespaces.
> And thanks, Simon.

Please submit this fix formally, with Simon's Tested-by:

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
From: Stanislav Kinsbursky @ 2012-04-18 11:32 UTC
To: Simon Kirby; +Cc: Eric Dumazet, netdev

17.04.2012 22:35, Simon Kirby wrote:
> On Tue, Apr 17, 2012 at 04:18:53PM +0400, Stanislav Kinsbursky wrote:
>
>> 17.04.2012 06:08, Simon Kirby wrote:
>>> On Thu, Apr 05, 2012 at 04:41:04AM +0200, Eric Dumazet wrote:
>>>
>>>> Hmm, is it happening if you remove the nvidia module ?
>>>>
>>>> If yes, please try to add slub_debug=FZPU
>>>
>>> Finally got annoyed enough at this to bisect it. It doesn't happen every
>>> time and I got a bit confused, but I finally tracked it down to:
>>>
>>> 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d is the first bad commit
>>> commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d
>>> Author: Stanislav Kinsbursky <skinsbursky@parallels.com>
>>> Date: Mon Mar 12 02:59:41 2012 +0000
>>>
>>> tun: don't hold network namespace by tun sockets
>>>
>>> v3: added previously removed sock_put() to the tun_release() callback, because
>>> sk_release_kernel() doesn't drop the socket reference.
>>>
>>> v2: sk_release_kernel() used for socket release. Dummy tun_release() is
>>> required for sk_release_kernel() ---> sock_release() ---> sock->ops->release()
>>> call.
>>>
>>> TUN was designed to destroy it's socket on network namesapce shutdown. But this
>>> will never happen for persistent device, because it's socket holds network
>>> namespace.
>>> This patch removes of holding network namespace by TUN socket and replaces it
>>> by creating socket in init_net and then changing it's net it to desired one. On
>>> shutdown socket is moved back to init_net prior to final put.
>>>
>>> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
>>> Signed-off-by: David S. Miller <davem@davemloft.net>
>>>
>>> ...With this reverted on top of 3.4-rc3, I no longer see crashes when I
>>> keep making and breaking the SSH tunnel while running "vmstat 1" in an
>>> SSH session over a socket that is running through that tunnel.
>>>
>>> Simon-
>>
>> Hi, Simon.
>> Could you please try to apply the patch below on top of your tree
>> (with 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d applied) and check
>> whether it fixes the problem:
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index bb8c72c..1fc4622 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1540,13 +1540,10 @@ static int tun_chr_close(struct inode
>> *inode, struct file *file)
>>              if (dev->reg_state == NETREG_REGISTERED)
>>                  unregister_netdevice(dev);
>>              rtnl_unlock();
>> -        }
>> +        } else
>> +            sock_put(tun->socket.sk);
>>      }
>>
>> -    tun = tfile->tun;
>> -    if (tun)
>> -        sock_put(tun->socket.sk);
>> -
>>      put_net(tfile->net);
>>      kfree(tfile);
>
> (Whitespace-damaged patch, applied manually)
>
> Yes, I no longer see crashes with this applied. I haven't tried with
> kmemleak or similar, but it seems to work.
>
> Thanks,
>

This bug looks like a double free, but I can't understand how it can happen...
Simon, it would be really great if you could describe in detail some
simple way to reproduce the bug.

--
Best regards,
Stanislav Kinsbursky

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
From: Simon Kirby @ 2012-05-19 1:07 UTC
To: Stanislav Kinsbursky; +Cc: Eric Dumazet, netdev

On Wed, Apr 18, 2012 at 03:32:27PM +0400, Stanislav Kinsbursky wrote:

> 17.04.2012 22:35, Simon Kirby wrote:
> >On Tue, Apr 17, 2012 at 04:18:53PM +0400, Stanislav Kinsbursky wrote:
> >>
> >>Hi, Simon.
> >>Could you please try to apply the patch below on top of your tree
> >>(with 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d applied) and check
> >>whether it fixes the problem:
> >>
> >>diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >>index bb8c72c..1fc4622 100644
> >>--- a/drivers/net/tun.c
> >>+++ b/drivers/net/tun.c
> >>@@ -1540,13 +1540,10 @@ static int tun_chr_close(struct inode
> >>*inode, struct file *file)
> >>              if (dev->reg_state == NETREG_REGISTERED)
> >>                  unregister_netdevice(dev);
> >>              rtnl_unlock();
> >>-        }
> >>+        } else
> >>+            sock_put(tun->socket.sk);
> >>      }
> >>
> >>-    tun = tfile->tun;
> >>-    if (tun)
> >>-        sock_put(tun->socket.sk);
> >>-
> >>      put_net(tfile->net);
> >>      kfree(tfile);
> >
> >(Whitespace-damaged patch, applied manually)
> >
> >Yes, I no longer see crashes with this applied. I haven't tried with
> >kmemleak or similar, but it seems to work.
> >
> >Thanks,
> >
>
> This bug looks like a double free, but I can't understand how it can happen...
> Simon, it would be really great if you could describe in detail some
> simple way to reproduce the bug.

Oh, sorry, I did not see this until now. I just noticed it was still
floating in my tree with no upstream changes yet, then found your email.
I still have not seen any issues since applying your patch.

I was definitely seeing the issue on 3.4-rc3. I can try and see if it
still occurs with your patch removed, if that would help.

Do you have a box on which you can set up an SSH tunnel? In my case, I
can reproduce it easily with three boxes. From home, I run ssh to my work
box to establish the layer 3 tunnel. This goes through a ProxyCommand to
jump through an entry box, but I don't think that should matter. I use a
cheap tunnel start script similar to this:

work_net=10.0.0.0/8
work_tun_ip=10.x.x.x
home_tun_ip=10.x.x.x
echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp
ssh -w any:any <work box> "ifconfig tun0 $work_tun_ip pointopoint $home_tun_ip; echo 'ifconfig tun0 $home_tun_ip pointopoint $work_tun_ip && ip route add $work_net via $work_tun_ip'; sleep 1d" | sh -v

...there's probably a better way, but it works. To reproduce, I log in
to a third box over this tunnel, and start a "vmstat 1", so that packets
keep coming back to the tunnel host. ^C on the SSH session will then
produce an Oops within a second.

With CONFIG_SLUB_DEBUG=y and booting with slub_debug=FZPU, I got the
Redzone overwritten notice. Without it, the box usually Oopses and
hangs immediately. Sometimes, I might have to reconnect the tunnel and
^C it once more. If I don't have that vmstat session open, it usually
doesn't crash.

Does this work for you?

Simon-

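The tunnel start script in the message above can be written as a dry run that only prints the command it would execute. This is a minimal sketch under stated assumptions, not the script actually used in the thread: the addresses and the host name are placeholders standing in for the elided 10.x.x.x values and <work box>, and the function names are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch of the tunnel start script from the message above.
# ASSUMPTIONS: the addresses and host name below are placeholders chosen
# for illustration; the thread elides the real values.

work_net="10.0.0.0/8"
work_tun_ip="10.0.0.1"          # placeholder
home_tun_ip="10.0.0.2"          # placeholder
work_box="work.example.org"     # hypothetical host name

# Command executed on the work box: configure its end of tun0.
remote_cmd() {
    printf 'ifconfig tun0 %s pointopoint %s' "$work_tun_ip" "$home_tun_ip"
}

# Command the work box echoes back; the local "sh -v" reads and runs it
# to configure the home end of tun0 and add the route to the work network.
local_cmd() {
    printf 'ifconfig tun0 %s pointopoint %s && ip route add %s via %s' \
        "$home_tun_ip" "$work_tun_ip" "$work_net" "$work_tun_ip"
}

# Print the complete pipeline instead of running it.
tunnel_cmd() {
    printf "ssh -w any:any %s \"%s; echo '%s'; sleep 1d\" | sh -v\n" \
        "$work_box" "$(remote_cmd)" "$(local_cmd)"
}

tunnel_cmd
```

Printing the pipeline first makes it easier to compare two test setups line by line before anything touches a tun device. Running the printed command for real would need root on both ends and tunnel forwarding enabled on the server.
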
* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
From: Stanislav Kinsbursky @ 2012-05-21 14:51 UTC
To: Simon Kirby; +Cc: Eric Dumazet, netdev

On 19.05.2012 05:07, Simon Kirby wrote:
> On Wed, Apr 18, 2012 at 03:32:27PM +0400, Stanislav Kinsbursky wrote:
>
>> 17.04.2012 22:35, Simon Kirby wrote:
>>> On Tue, Apr 17, 2012 at 04:18:53PM +0400, Stanislav Kinsbursky wrote:
>>>>
>>>> Hi, Simon.
>>>> Could you please try to apply the patch below on top of your tree
>>>> (with 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d applied) and check
>>>> whether it fixes the problem:
>>>>
>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>> index bb8c72c..1fc4622 100644
>>>> --- a/drivers/net/tun.c
>>>> +++ b/drivers/net/tun.c
>>>> @@ -1540,13 +1540,10 @@ static int tun_chr_close(struct inode
>>>> *inode, struct file *file)
>>>>              if (dev->reg_state == NETREG_REGISTERED)
>>>>                  unregister_netdevice(dev);
>>>>              rtnl_unlock();
>>>> -        }
>>>> +        } else
>>>> +            sock_put(tun->socket.sk);
>>>>      }
>>>>
>>>> -    tun = tfile->tun;
>>>> -    if (tun)
>>>> -        sock_put(tun->socket.sk);
>>>> -
>>>>      put_net(tfile->net);
>>>>      kfree(tfile);
>>>
>>> (Whitespace-damaged patch, applied manually)
>>>
>>> Yes, I no longer see crashes with this applied. I haven't tried with
>>> kmemleak or similar, but it seems to work.
>>>
>>> Thanks,
>>>
>>
>> This bug looks like a double free, but I can't understand how it can happen...
>> Simon, it would be really great if you could describe in detail some
>> simple way to reproduce the bug.
>
> Oh, sorry, I did not see this until now. I just noticed it was still
> floating in my tree with no upstream changes yet, then found your email.
> I still have not seen any issues since applying your patch.
>
> I was definitely seeing the issue on 3.4-rc3. I can try and see if it
> still occurs with your patch removed, if that would help.
>
> Do you have a box on which you can set up an SSH tunnel? In my case, I
> can reproduce it easily with three boxes. From home, I run ssh to my work
> box to establish the layer 3 tunnel. This goes through a ProxyCommand to
> jump through an entry box, but I don't think that should matter. I use a
> cheap tunnel start script similar to this:
>
> work_net=10.0.0.0/8
> work_tun_ip=10.x.x.x
> home_tun_ip=10.x.x.x
> echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp
> ssh -w any:any <work box> "ifconfig tun0 $work_tun_ip pointopoint
> $home_tun_ip; echo 'ifconfig tun0 $home_tun_ip pointopoint $work_tun_ip
> && ip route add $work_net via $work_tun_ip'; sleep 1d" | sh -v
>
> ...there's probably a better way, but it works. To reproduce, I log in
> to a third box over this tunnel, and start a "vmstat 1", so that packets
> keep coming back to the tunnel host. ^C on the SSH session will then
> produce an Oops within a second.
>
> With CONFIG_SLUB_DEBUG=y and booting with slub_debug=FZPU, I got the
> Redzone overwritten notice. Without it, the box usually Oopses and
> hangs immediately. Sometimes, I might have to reconnect the tunnel and
> ^C it once more. If I don't have that vmstat session open, it usually
> doesn't crash.
>
> Does this work for you?
>

Hello, Simon.
Thanks for the details. I still can't reproduce the issue.
Here is my configuration:
1) Three nodes: A, B and C.
2) A and B are connected with a tunnel (your script, slightly modified).
3) Packets from A to C are routed through the tunnel.
4) Node B has a 3.4.0-rc2 based kernel. A and C run a rhel6 kernel.

So, I log in to C from A by ssh, run "vmstat 1" and then cut off (^C) the
tunnel between A and B. The connection hung. No panic or oops occurred.
Is this the same thing you did when the panic occurred? Or am I doing
something wrong?

> Simon-

--
Best regards,
Stanislav Kinsbursky

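When two people compare reproduction setups like the ones above, it may help to confirm that the node under test actually booted with the slub_debug=FZPU option mentioned in the thread. The following is a small sketch, not part of any message: the helper names are hypothetical, it handles only the slub_debug=<flags> form, and it describes only the four flag letters used here (F, Z, P, U).

```shell
#!/bin/sh
# Sketch: report whether a kernel command line enables SLUB debugging.
# ASSUMPTION: the helper names are hypothetical; only the slub_debug=
# boot parameter and the FZPU flags come from the thread.

# Print the value of slub_debug= found in the command line string $1.
# Returns 1 if the parameter is absent.
slub_debug_value() {
    for word in $1; do
        case "$word" in
            slub_debug=*)
                printf '%s\n' "${word#slub_debug=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Describe the single-letter options used in the thread.
describe_flag() {
    case "$1" in
        F) echo "sanity checks" ;;
        Z) echo "red zoning" ;;
        P) echo "poisoning" ;;
        U) echo "user tracking" ;;
        *) echo "unknown" ;;
    esac
}

# Report on the running kernel when /proc/cmdline is available.
if [ -r /proc/cmdline ]; then
    if value=$(slub_debug_value "$(cat /proc/cmdline)"); then
        echo "slub_debug is set to: $value"
    else
        echo "slub_debug is not set on this boot"
    fi
fi
```

The Z flag (red zoning) is the one that would produce a "Redzone overwritten" report like the one described above, so a node booted without it would not be expected to show that notice.
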
* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
From: Stanislav Kinsbursky @ 2012-04-18 6:51 UTC
To: David Miller; +Cc: sim, eric.dumazet@gmail.com, netdev
Sure, David. This is not a fix yet, since I don't completely understand what's happening. It's just a proof of concept.
David Miller <davem@davemloft.net> wrote:
>From: Stanislav Kinsbursky <skinsbursky@parallels.com>
>Date: Tue, 17 Apr 2012 22:49:06 +0400
>
>> Sorry for whitespaces.
>> And thanks, Simon.
>
>Please submit this fix formally, with Simon's Tested-by:
end of thread, newest message 2012-05-21 14:51 UTC

Thread overview: 12+ messages
2012-04-04 22:05 3.3.0, 3.4-rc1 reproducible tun Oops Simon Kirby
2012-04-05  2:41 ` Eric Dumazet
2012-04-05  5:58 ` Simon Kirby
2012-04-17  2:08 ` Simon Kirby
2012-04-17 12:18 ` Stanislav Kinsbursky
2012-04-17 18:35 ` Simon Kirby
2012-04-17 18:49 ` Stanislav Kinsbursky
2012-04-18  2:38 ` David Miller
2012-04-18 11:32 ` Stanislav Kinsbursky
2012-05-19  1:07 ` Simon Kirby
2012-05-21 14:51 ` Stanislav Kinsbursky
2012-04-18  6:51 Stanislav Kinsbursky