netdev.vger.kernel.org archive mirror
* 3.3.0, 3.4-rc1 reproducible tun Oops
@ 2012-04-04 22:05 Simon Kirby
  2012-04-05  2:41 ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Simon Kirby @ 2012-04-04 22:05 UTC (permalink / raw)
  To: netdev

I use an SSH VPN occasionally from home, and since upgrading the remote
kernel to 3.3.0, it now seems to Oops when I ^C the tunnel with
sockets still active. If I start the tunnel, log in to a box through it
and run "vmstat 1", ^C the tunnel SSH process, and start it up again, I
get an Oops like this:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000ff
IP: [<ffffffff810ed5fa>] __kmalloc_track_caller+0xaa/0x1b0
PGD 12d2bc067 PUD 0
Oops: 0000 [#1] SMP
CPU 1
Modules linked in: nf_conntrack_netlink nfnetlink iptable_mangle ipt_MASQUERADE xt_state xt_conntrack iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack hwmon_vid ppp_async ppp_generic slhc crc_ccitt tun nvidia(PO) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core e100

Pid: 16156, comm: sshd Tainted: P           O 3.3.0 #32 System manufacturer System Product Name/A8N-VM CSM
RIP: 0010:[<ffffffff810ed5fa>]  [<ffffffff810ed5fa>] __kmalloc_track_caller+0xaa/0x1b0
RSP: 0000:ffff88012d0b3b58  EFLAGS: 00210206
RAX: 0000000000000000 RBX: ffff8801783f8e00 RCX: 000000000002c11f
RDX: 000000000002c11e RSI: 00000000000000d0 RDI: 0000000000014ac0
RBP: ffff88012d0b3ba8 R08: ffffffff81693c81 R09: ffff88007f546f30
R10: 00000000f80057e0 R11: 0000000000000000 R12: 00000000000000ff
R13: ffff88017b002900 R14: 0000000000000800 R15: 0000000000000800
FS:  0000000000000000(0000) GS:ffff88017fd00000(0063) knlGS:00000000f71ea740
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000000000ff CR3: 000000011906a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sshd (pid: 16156, threadinfo ffff88012d0b2000, task ffff880100a43a00)
Stack:
 dead000000200200 ffff88007fabc0c0 ffffffff816d692c 000000d0000000db
 ffff880100000000 ffff8801783f8e00 0000000000000001 00000000000000d0
 ffff88017b002780 0000000000000800 ffff88012d0b3be8 ffffffff81693cae
Call Trace:
 [<ffffffff816d692c>] ? sk_stream_alloc_skb+0x3c/0x110
 [<ffffffff81693cae>] __alloc_skb+0x6e/0x220
 [<ffffffff816d692c>] sk_stream_alloc_skb+0x3c/0x110
 [<ffffffff816d6c90>] tcp_sendmsg+0x290/0xd90
 [<ffffffff81694537>] ? skb_release_data+0xe7/0xf0
 [<ffffffffa0032e3a>] ? tun_do_read.isra.24+0x29a/0x420 [tun]
 [<ffffffff816f8703>] inet_sendmsg+0x43/0xb0
 [<ffffffff8168b78e>] sock_aio_write+0x10e/0x130
 [<ffffffff810f04fa>] do_sync_write+0xca/0x110
 [<ffffffff8104676a>] ? set_current_blocked+0x3a/0x60
 [<ffffffff810467d5>] ? sigprocmask+0x45/0x80
 [<ffffffff810f0e15>] vfs_write+0x165/0x180
 [<ffffffff810f1085>] sys_write+0x45/0x90
 [<ffffffff818098f9>] ia32_do_call+0x13/0x13
Code: 76 bf 49 8b 4d 00 65 48 03 0c 25 b8 cb 00 00 48 8b 51 08 4c 8b 21 4d 85 e4 0f 84 eb 00 00 00 49 63 45 20 49 8b 7d 00 48 8d 4a 01 <49> 8b 1c 04 4c 89 e0 48 8d 37 e8 37 41 28 00 84 c0 74 c4 4d 85
RIP  [<ffffffff810ed5fa>] __kmalloc_track_caller+0xaa/0x1b0
 RSP <ffff88012d0b3b58>
CR2: 00000000000000ff
---[ end trace 4a40da26b9b3bff5 ]---

Looks like it might need some poisoning there. Sometimes the Oops stops
before it is fully emitted over the serial port. I have verified that
this happens on v3.3 and current Linus head (3.4-rc1+) and not on v3.2.

When I get some more time, I will try to track it down a bit further.

ssh -w any <vpn box> 'ifconfig tun0 x pointopoint y; echo "ifconfig tun0 y pointopoint x && ip route add 10.0.0.0/8 via x"; sleep 1d' | sh -v

Simon-
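
The first stack word in the trace above, dead000000200200, matches the value the kernel writes into a list pointer after the entry has been deleted. A minimal sketch for recognising such words, assuming the x86-64 values from include/linux/poison.h of that era (the enum and helper names are hypothetical, chosen for illustration):

```c
#include <stdint.h>

/* x86-64 list poison values: a 0xdead000000000000 base plus a small offset.
 * Assumed from include/linux/poison.h in kernels of this vintage. */
#define LIST_POISON1 UINT64_C(0xdead000000100100)
#define LIST_POISON2 UINT64_C(0xdead000000200200)

/* Hypothetical classification used only by this sketch. */
enum poison_kind {
	NOT_POISON = 0,
	POISON_NEXT,   /* LIST_POISON1: written to ->next by list_del() */
	POISON_PREV    /* LIST_POISON2: written to ->prev by list_del() */
};

/* Classify a 64-bit word copied from an Oops stack or register dump. */
static enum poison_kind classify_list_poison(uint64_t word)
{
	if (word == LIST_POISON1)
		return POISON_NEXT;
	if (word == LIST_POISON2)
		return POISON_PREV;
	return NOT_POISON;
}
```

Calling classify_list_poison(0xdead000000200200) returns POISON_PREV, which would be consistent with a stale list entry sitting on the stack; it does not by itself say which object was reused.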


* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
  2012-04-04 22:05 3.3.0, 3.4-rc1 reproducible tun Oops Simon Kirby
@ 2012-04-05  2:41 ` Eric Dumazet
  2012-04-05  5:58   ` Simon Kirby
  2012-04-17  2:08   ` Simon Kirby
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2012-04-05  2:41 UTC (permalink / raw)
  To: Simon Kirby; +Cc: netdev

On Wed, 2012-04-04 at 15:05 -0700, Simon Kirby wrote:
> I use an SSH VPN occasionally from home, and since upgrading the remote
> kernel to 3.3.0, it now seems to Oops when I ^C the tunnel with
> sockets still active. If I start the tunnel, log in to a box through it
> and run "vmstat 1", ^C the tunnel SSH process, and start it up again, I
> get an Oops like this:
> 
> BUG: unable to handle kernel NULL pointer dereference at 00000000000000ff
> IP: [<ffffffff810ed5fa>] __kmalloc_track_caller+0xaa/0x1b0
> PGD 12d2bc067 PUD 0
> Oops: 0000 [#1] SMP
> CPU 1
> Modules linked in: nf_conntrack_netlink nfnetlink iptable_mangle ipt_MASQUERADE xt_state xt_conntrack iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack hwmon_vid ppp_async ppp_generic slhc crc_ccitt tun nvidia(PO) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core e100
> 
> Pid: 16156, comm: sshd Tainted: P           O 3.3.0 #32 System manufacturer System Product Name/A8N-VM CSM

Hmm, is it happening if you remove the nvidia module ?

If yes, please try to add slub_debug=FZPU

CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
# CONFIG_SLUB_DEBUG_ON is not set
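
For readers unfamiliar with the option string above: each letter in slub_debug=FZPU turns on one SLUB debugging feature. The sketch below decodes such a string into a bitmask. The letter meanings follow the kernel's SLUB documentation; the enum and function names are hypothetical, and only the four uppercase letters used in this thread are handled.

```c
/* Hypothetical flag bits for this sketch; the kernel uses its own SLAB_* flags. */
enum slub_dbg {
	DBG_SANITY  = 1 << 0,  /* F: sanity checks on alloc and free */
	DBG_REDZONE = 1 << 1,  /* Z: red zoning, guard bytes after each object */
	DBG_POISON  = 1 << 2,  /* P: poisoning of free and uninitialised objects */
	DBG_USER    = 1 << 3   /* U: user tracking, records last alloc/free caller */
};

/* Decode an option string such as "FZPU".
 * Returns the combined mask, or -1 for a letter this sketch does not know. */
static int parse_slub_debug(const char *opts)
{
	int mask = 0;

	for (; *opts != '\0'; opts++) {
		switch (*opts) {
		case 'F': mask |= DBG_SANITY;  break;
		case 'Z': mask |= DBG_REDZONE; break;
		case 'P': mask |= DBG_POISON;  break;
		case 'U': mask |= DBG_USER;    break;
		default:  return -1;
		}
	}
	return mask;
}
```

With "FZPU" all four bits are set, which is why a report taken with these options can show both a red-zone check and the allocation and free call chains.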


* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
  2012-04-05  2:41 ` Eric Dumazet
@ 2012-04-05  5:58   ` Simon Kirby
  2012-04-17  2:08   ` Simon Kirby
  1 sibling, 0 replies; 12+ messages in thread
From: Simon Kirby @ 2012-04-05  5:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Thu, Apr 05, 2012 at 04:41:04AM +0200, Eric Dumazet wrote:

> On Wed, 2012-04-04 at 15:05 -0700, Simon Kirby wrote:
> > I use an SSH VPN occasionally from home, and since upgrading the remote
> > kernel to 3.3.0, it now seems to Oops when I ^C the tunnel with
> > sockets still active. If I start the tunnel, log in to a box through it
> > and run "vmstat 1", ^C the tunnel SSH process, and start it up again, I
> > get an Oops like this:
> > 
> > BUG: unable to handle kernel NULL pointer dereference at 00000000000000ff
> > IP: [<ffffffff810ed5fa>] __kmalloc_track_caller+0xaa/0x1b0
> > PGD 12d2bc067 PUD 0
> > Oops: 0000 [#1] SMP
> > CPU 1
> > Modules linked in: nf_conntrack_netlink nfnetlink iptable_mangle ipt_MASQUERADE xt_state xt_conntrack iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack hwmon_vid ppp_async ppp_generic slhc crc_ccitt tun nvidia(PO) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core e100
> > 
> > Pid: 16156, comm: sshd Tainted: P           O 3.3.0 #32 System manufacturer System Product Name/A8N-VM CSM
> 
> Hmm, is it happening if you remove the nvidia module ?
> 
> If yes, please try to add slub_debug=FZPU
> 
> CONFIG_SLUB_DEBUG=y
> CONFIG_SLUB=y
> # CONFIG_SLUB_DEBUG_ON is not set

Yes it is, and here we go:

[ 3223.596062] =============================================================================
[ 3223.603618] BUG kmalloc-2048 (Not tainted): Redzone overwritten
[ 3223.603618] -----------------------------------------------------------------------------
[ 3223.603618] 
[ 3223.603618] INFO: 0xffff88017499d240-0xffff88017499d240. First byte 0xcb instead of 0xcc
[ 3223.603618] INFO: Allocated in alloc_netdev_mqs+0x62/0x300 age=2919 cpu=1 pid=3929
[ 3223.603618]  __slab_alloc.constprop.65+0x21b/0x25f
[ 3223.603618]  __kmalloc+0x1a7/0x1c0
[ 3223.603618]  alloc_netdev_mqs+0x62/0x300
[ 3223.603618]  __tun_chr_ioctl+0x903/0xc50 [tun]
[ 3223.603618]  tun_chr_compat_ioctl+0x10/0x20 [tun]
[ 3223.603618]  compat_sys_ioctl+0x8f/0x10d0
[ 3223.603618]  ia32_sysret+0x0/0x5
[ 3223.603618] INFO: Freed in skb_release_data+0xe7/0xf0 age=3004 cpu=1 pid=0
[ 3223.603618]  __slab_free+0x2d/0x28a
[ 3223.603618]  kfree+0x114/0x140
[ 3223.603618]  skb_release_data+0xe7/0xf0
[ 3223.603618]  __kfree_skb+0x19/0xa0
[ 3223.603618]  kfree_skb+0x44/0xb0
[ 3223.603618]  ip_rcv_finish+0xd8/0x2c0
[ 3223.603618]  ip_rcv+0x1f5/0x2b0
[ 3223.603618]  __netif_receive_skb+0x32b/0x410
[ 3223.603618]  netif_receive_skb+0x28/0x80
[ 3223.603618]  napi_skb_finish+0x48/0x60
[ 3223.603618]  napi_gro_receive+0xed/0x130
[ 3223.603618]  nv_rx_process_optimized+0x135/0x270
[ 3223.603618]  nv_napi_poll+0x84/0x5f0
[ 3223.603618]  net_rx_action+0x111/0x200
[ 3223.603618]  __do_softirq+0x99/0x210
[ 3223.603618]  call_softirq+0x1c/0x30
[ 3223.603618] INFO: Slab 0xffffea0005d26600 objects=13 used=9 fp=0xffff880174999290 flags=0x8000000000004081
[ 3223.603618] INFO: Object 0xffff88017499ca40 @offset=19008 fp=0xffff880174999290
[ 3223.603618] 
[ 3223.603618] Bytes b4 ffff88017499ca30: 5f 25 0b 00 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  _%......ZZZZZZZZ
[ 3223.603618] Object ffff88017499ca40: 74 75 6e 30 00 00 00 00 00 00 00 00 00 00 00 00  tun0............
[ 3223.603618] Object ffff88017499ca50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499ca60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499ca70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499ca80: 00 00 00 00 00 00 00 00 00 02 20 00 00 00 ad de  .......... .....
[ 3223.603618] Object ffff88017499ca90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499caa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cab0: 00 00 00 00 00 00 00 00 06 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cac0: 78 36 d8 81 ff ff ff ff 00 02 20 00 00 00 ad de  x6........ .....
[ 3223.603618] Object ffff88017499cad0: d0 ca 99 74 01 88 ff ff d0 ca 99 74 01 88 ff ff  ...t.......t....
[ 3223.603618] Object ffff88017499cae0: e0 ca 99 74 01 88 ff ff e0 ca 99 74 01 88 ff ff  ...t.......t....
[ 3223.603618] Object ffff88017499caf0: 40 40 00 40 00 00 00 00 49 48 1b 40 00 00 00 00  @@.@....IH.@....
[ 3223.603618] Object ffff88017499cb00: 49 48 1b 40 00 00 00 00 20 00 00 00 00 00 00 00  IH.@.... .......
[ 3223.603618] Object ffff88017499cb10: 07 00 00 00 07 00 00 00 0d 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cb20: 0e 00 00 00 00 00 00 00 a4 02 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cb30: 82 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cb40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cb50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cb60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cb70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cb90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cba0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cbb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cbc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cbd0: 00 00 00 00 00 00 00 00 40 89 03 a0 ff ff ff ff  ........@.......
[ 3223.603618] Object ffff88017499cbe0: e0 87 03 a0 ff ff ff ff 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cbf0: 90 10 00 00 00 04 00 00 00 00 00 00 02 00 00 00  ................
[ 3223.603618] Object ffff88017499cc00: dc 05 00 00 fe ff 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cc10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cc20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cc30: 00 00 06 06 00 00 00 00 38 cc 99 74 01 88 ff ff  ........8..t....
[ 3223.603618] Object ffff88017499cc40: 38 cc 99 74 01 88 ff ff 00 00 00 00 00 00 00 00  8..t............
[ 3223.603618] Object ffff88017499cc50: 50 cc 99 74 01 88 ff ff 50 cc 99 74 01 88 ff ff  P..t....P..t....
[ 3223.603618] Object ffff88017499cc60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cc70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cc90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499ccb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499ccc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499ccd0: d0 cc 99 74 01 88 ff ff d0 cc 99 74 01 88 ff ff  ...t.......t....
[ 3223.603618] Object ffff88017499cce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499ccf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cd00: 00 00 00 00 00 00 00 00 90 30 4d 76 01 88 ff ff  .........0Mv....
[ 3223.603618] Object ffff88017499cd10: 20 47 2a 79 01 88 ff ff 01 00 00 00 01 00 00 00   G*y............
[ 3223.603618] Object ffff88017499cd20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cd30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cd40: 38 f1 c1 6e 01 88 ff ff 01 00 00 00 01 00 00 00  8..n............
[ 3223.603618] Object ffff88017499cd50: 80 1a c3 81 ff ff ff ff f4 01 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cd60: 03 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cd70: 0b 1b 0b 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cd90: 00 00 00 00 00 00 00 00 00 40 15 7b 01 88 ff ff  .........@.{....
[ 3223.603618] Object ffff88017499cda0: a0 49 6d 81 ff ff ff ff 40 ca 99 74 01 88 ff ff  .Im.....@..t....
[ 3223.603618] Object ffff88017499cdb0: ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cdc0: 00 01 10 00 00 00 ad de 00 02 20 00 00 00 ad de  .......... .....
[ 3223.603618] Object ffff88017499cdd0: 00 00 00 00 00 00 00 00 00 02 20 00 00 00 ad de  .......... .....
[ 3223.603618] Object ffff88017499cde0: e0 cd 99 74 01 88 ff ff e0 cd 99 74 01 88 ff ff  ...t.......t....
[ 3223.603618] Object ffff88017499cdf0: 04 01 00 00 00 00 00 00 f0 6b 03 a0 ff ff ff ff  .........k......
[ 3223.603618] Object ffff88017499ce00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499ce10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499ce20: 40 d0 c9 6e 01 88 ff ff 80 da a6 73 01 88 ff ff  @..n.......s....
[ 3223.603618] Object ffff88017499ce30: 30 ce 99 74 01 88 ff ff 30 ce 99 74 01 88 ff ff  0..t....0..t....
[ 3223.603618] Object ffff88017499ce40: 00 00 00 00 00 00 00 00 a8 81 06 7b 01 88 ff ff  ...........{....
[ 3223.603618] Object ffff88017499ce50: 20 09 c0 81 ff ff ff ff 00 00 00 00 00 00 00 00   ...............
[ 3223.603618] Object ffff88017499ce60: 00 00 00 00 0d 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499ce70: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499ce80: 80 ce 99 74 01 88 ff ff 80 ce 99 74 01 88 ff ff  ...t.......t....
[ 3223.603618] Object ffff88017499ce90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cea0: 78 ce 99 74 01 88 ff ff 00 00 00 00 00 00 00 00  x..t............
[ 3223.603618] Object ffff88017499ceb0: 00 00 00 00 00 00 00 00 40 ca 99 74 01 88 ff ff  ........@..t....
[ 3223.603618] Object ffff88017499cec0: ff ff ff ff 00 00 00 00 c8 ce 99 74 01 88 ff ff  ...........t....
[ 3223.603618] Object ffff88017499ced0: c8 ce 99 74 01 88 ff ff fe ff ff ff 00 00 00 00  ...t............
[ 3223.603618] Object ffff88017499cee0: 02 02 00 00 00 00 00 00 e8 ce 99 74 01 88 ff ff  ...........t....
[ 3223.603618] Object ffff88017499cef0: e8 ce 99 74 01 88 ff ff 00 00 00 00 00 00 00 00  ...t............
[ 3223.603618] Object ffff88017499cf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cf10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cf20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cf30: 00 00 00 00 00 00 00 00 38 cf 99 74 01 88 ff ff  ........8..t....
[ 3223.603618] Object ffff88017499cf40: 38 cf 99 74 01 88 ff ff 00 00 00 00 00 00 00 00  8..t............
[ 3223.603618] Object ffff88017499cf50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cf60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cf70: 01 01 00 00 00 00 00 00 78 cf 99 74 01 88 ff ff  ........x..t....
[ 3223.603618] Object ffff88017499cf80: 78 cf 99 74 01 88 ff ff 00 00 00 00 00 00 00 00  x..t............
[ 3223.603618] Object ffff88017499cf90: 00 01 10 00 00 00 ad de 00 02 20 00 00 00 ad de  .......... .....
[ 3223.603618] Object ffff88017499cfa0: 00 00 00 00 00 00 00 00 a0 02 c3 81 ff ff ff ff  ................
[ 3223.603618] Object ffff88017499cfb0: c0 cf 99 74 01 88 ff ff 00 00 00 00 00 00 00 00  ...t............
[ 3223.603618] Object ffff88017499cfc0: 20 03 c3 81 ff ff ff ff 00 00 00 00 00 00 00 00   ...............
[ 3223.603618] Object ffff88017499cfd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cfe0: 00 8f 03 a0 ff ff ff ff 00 00 01 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499cff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d080: 00 00 00 00 00 00 00 00 41 00 00 00 ff ff ff ff  ........A.......
[ 3223.603618] Object ffff88017499d090: ff ff ff ff 00 00 00 00 40 ca 99 74 01 88 ff ff  ........@..t....
[ 3223.603618] Object ffff88017499d0a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d0b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d0c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d0d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d0e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d0f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d100: 40 d1 99 74 01 88 ff ff 00 00 00 00 00 00 00 00  @..t............
[ 3223.603618] Object ffff88017499d110: d8 cf b5 73 01 88 ff ff 00 00 00 00 00 00 00 00  ...s............
[ 3223.603618] Object ffff88017499d120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d140: 64 64 00 00 00 00 00 00 48 d1 99 74 01 88 ff ff  dd......H..t....
[ 3223.603618] Object ffff88017499d150: 48 d1 99 74 01 88 ff ff 00 00 00 00 00 00 00 00  H..t............
[ 3223.603618] Object ffff88017499d160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d180: 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d1a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d1b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Object ffff88017499d230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3223.603618] Redzone ffff88017499d240: cb cc cc cc cc cc cc cc                          ........
[ 3223.603618] Padding ffff88017499d380: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
[ 3223.603618] Pid: 3929, comm: sshd Not tainted 3.4.0-rc1+ #35
[ 3223.603618] Call Trace:
[ 3223.603618]  [<ffffffff810ea1c8>] ? print_section+0x38/0x40
[ 3223.603618]  [<ffffffff810ea913>] print_trailer+0xe3/0x160
[ 3223.603618]  [<ffffffff810eade7>] check_bytes_and_report+0xd7/0x110
[ 3223.603618]  [<ffffffff810eafae>] check_object+0x18e/0x270
[ 3223.603618]  [<ffffffff816cd534>] ? netdev_release+0x34/0x40
[ 3223.603618]  [<ffffffff81819a78>] free_debug_processing+0xf1/0x1e7
[ 3223.603618]  [<ffffffff81819b9b>] __slab_free+0x2d/0x28a
[ 3223.603618]  [<ffffffff810ea48a>] ? set_track+0x5a/0x180
[ 3223.603618]  [<ffffffff816cd534>] ? netdev_release+0x34/0x40
[ 3223.603618]  [<ffffffff810ebd05>] ? init_object+0x45/0x80
[ 3223.603618]  [<ffffffff816cd534>] ? netdev_release+0x34/0x40
[ 3223.603618]  [<ffffffff810ebfa4>] kfree+0x114/0x140
[ 3223.603618]  [<ffffffff816cd534>] netdev_release+0x34/0x40
[ 3223.603618]  [<ffffffff814ded72>] device_release+0x22/0x90
[ 3223.603618]  [<ffffffff8137689c>] kobject_release+0x4c/0xa0
[ 3223.603618]  [<ffffffff8137675c>] kobject_put+0x2c/0x60
[ 3223.603618]  [<ffffffff814debe2>] put_device+0x12/0x20
[ 3223.603618]  [<ffffffff816b6e57>] free_netdev+0xb7/0xf0
[ 3223.603618]  [<ffffffffa0036434>] tun_sock_destruct+0x14/0x20 [tun]
[ 3223.603618]  [<ffffffff816a9af8>] __sk_free+0x18/0x140
[ 3223.603618]  [<ffffffff816a9c9d>] sk_free+0x1d/0x30
[ 3223.603618]  [<ffffffffa003635e>] tun_chr_close+0x5e/0xa0 [tun]
[ 3223.603618]  [<ffffffff810f3242>] fput+0xd2/0x240
[ 3223.603618]  [<ffffffff810efb71>] filp_close+0x61/0x90
[ 3223.603618]  [<ffffffff810efc1b>] sys_close+0x7b/0xd0
[ 3223.603618]  [<ffffffff81825179>] ia32_do_call+0x13/0x13
[ 3223.603618] FIX kmalloc-2048: Restoring 0xffff88017499d240-0xffff88017499d240=0xcc
[ 3223.603618] 

Simon-
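
The report above says the first red-zone byte after the object read 0xcb where 0xcc was expected. The idea behind that check can be sketched in user space as follows. This is an illustration only, not the kernel's check_object(); the macro and function names are hypothetical, and the eight-byte length simply mirrors the Redzone line in the dump.

```c
#include <string.h>

#define RED_ACTIVE  0xcc   /* guard value the report expected to find */
#define REDZONE_LEN 8      /* number of guard bytes shown in the dump */

/* Fill the guard bytes that follow an allocated object. */
static void redzone_fill(unsigned char zone[REDZONE_LEN])
{
	memset(zone, RED_ACTIVE, REDZONE_LEN);
}

/* Return the index of the first damaged guard byte, or -1 if intact. */
static int redzone_first_bad(const unsigned char zone[REDZONE_LEN])
{
	for (int i = 0; i < REDZONE_LEN; i++) {
		if (zone[i] != RED_ACTIVE)
			return i;
	}
	return -1;
}
```

After redzone_fill(), decrementing zone[0] once leaves 0xcb there and redzone_first_bad() returns 0, the same shape as the report. A single decrement just past the end of the object is one way to arrive at that byte, but that reading is an assumption; the dump alone does not identify the writer.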


* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
  2012-04-05  2:41 ` Eric Dumazet
  2012-04-05  5:58   ` Simon Kirby
@ 2012-04-17  2:08   ` Simon Kirby
  2012-04-17 12:18     ` Stanislav Kinsbursky
  1 sibling, 1 reply; 12+ messages in thread
From: Simon Kirby @ 2012-04-17  2:08 UTC (permalink / raw)
  To: Eric Dumazet, Stanislav Kinsbursky; +Cc: netdev

On Thu, Apr 05, 2012 at 04:41:04AM +0200, Eric Dumazet wrote:

> Hmm, is it happening if you remove the nvidia module ?
> 
> If yes, please try to add slub_debug=FZPU

Finally got annoyed enough at this to bisect it. It doesn't happen every
time and I got a bit confused, but I finally tracked it down to:

1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d is the first bad commit
commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d
Author: Stanislav Kinsbursky <skinsbursky@parallels.com>
Date:   Mon Mar 12 02:59:41 2012 +0000

    tun: don't hold network namespace by tun sockets
    
    v3: added previously removed sock_put() to the tun_release() callback, because
    sk_release_kernel() doesn't drop the socket reference.
    
    v2: sk_release_kernel() used for socket release. Dummy tun_release() is
    required for sk_release_kernel() ---> sock_release() ---> sock->ops->release()
    call.
    
    TUN was designed to destroy it's socket on network namesapce shutdown. But this
    will never happen for persistent device, because it's socket holds network
    namespace.
    This patch removes of holding network namespace by TUN socket and replaces it
    by creating socket in init_net and then changing it's net it to desired one. On
    shutdown socket is moved back to init_net prior to final put.
    
    Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

...With this reverted on top of 3.4-rc3, I no longer see crashes when I
keep making and breaking the SSH tunnel while running "vmstat 1" in an
SSH session over a socket that is running through that tunnel.

Simon-


* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
  2012-04-17  2:08   ` Simon Kirby
@ 2012-04-17 12:18     ` Stanislav Kinsbursky
  2012-04-17 18:35       ` Simon Kirby
  0 siblings, 1 reply; 12+ messages in thread
From: Stanislav Kinsbursky @ 2012-04-17 12:18 UTC (permalink / raw)
  To: Simon Kirby; +Cc: Eric Dumazet, netdev

17.04.2012 06:08, Simon Kirby wrote:
> On Thu, Apr 05, 2012 at 04:41:04AM +0200, Eric Dumazet wrote:
>
>> Hmm, is it happening if you remove the nvidia module ?
>>
>> If yes, please try to add slub_debug=FZPU
>
> Finally got annoyed enough at this to bisect it. It doesn't happen every
> time and I got a bit confused, but I finally tracked it down to:
>
> 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d is the first bad commit
> commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d
> Author: Stanislav Kinsbursky<skinsbursky@parallels.com>
> Date:   Mon Mar 12 02:59:41 2012 +0000
>
>      tun: don't hold network namespace by tun sockets
>
>      v3: added previously removed sock_put() to the tun_release() callback, because
>      sk_release_kernel() doesn't drop the socket reference.
>
>      v2: sk_release_kernel() used for socket release. Dummy tun_release() is
>      required for sk_release_kernel() --->  sock_release() --->  sock->ops->release()
>      call.
>
>      TUN was designed to destroy it's socket on network namesapce shutdown. But this
>      will never happen for persistent device, because it's socket holds network
>      namespace.
>      This patch removes of holding network namespace by TUN socket and replaces it
>      by creating socket in init_net and then changing it's net it to desired one. On
>      shutdown socket is moved back to init_net prior to final put.
>
>      Signed-off-by: Stanislav Kinsbursky<skinsbursky@parallels.com>
>      Signed-off-by: David S. Miller<davem@davemloft.net>
>
> ...With this reverted on top of 3.4-rc3, I no longer see crashes when I
> keep making and breaking the SSH tunnel while running "vmstat 1" in an
> SSH session over a socket that is running through that tunnel.
>
> Simon-

Hi, Simon.
Could you please try applying the patch below on top of your tree (with
1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d applied) and check whether it fixes the problem:

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bb8c72c..1fc4622 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1540,13 +1540,10 @@ static int tun_chr_close(struct inode *inode, struct file *file)
  			if (dev->reg_state == NETREG_REGISTERED)
  				unregister_netdevice(dev);
  			rtnl_unlock();
-		}
+		} else
+			sock_put(tun->socket.sk);
  	}

-	tun = tfile->tun;
-	if (tun)
-		sock_put(tun->socket.sk);
-
  	put_net(tfile->net);
  	kfree(tfile);
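
The effect of moving the sock_put() into the else branch can be pictured with a toy reference-count model. This is a user-space sketch under stated assumptions, not the driver's real accounting: it assumes the socket starts with two references (one for the device, one for the open file) and that tearing down a non-persistent device already drops both, the file's one through the release callback that the bisected commit message mentions. All names here are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted socket; not struct sock. */
struct model_sock {
	int refcnt;
	bool freed;
	int late_puts;   /* puts that arrive after the object is gone */
};

static void model_sock_put(struct model_sock *sk)
{
	if (sk->freed) {
		sk->late_puts++;   /* in a real kernel: a use-after-free */
		return;
	}
	if (--sk->refcnt == 0)
		sk->freed = true;
}

/* Assumed teardown of a non-persistent device: drops both references. */
static void model_unregister(struct model_sock *sk)
{
	model_sock_put(sk);   /* reference dropped by the release callback */
	model_sock_put(sk);   /* the device's own reference */
}

/* Replay the close path; returns how many puts hit a freed socket. */
static int model_close(bool persist, bool patched)
{
	struct model_sock sk = { .refcnt = 2, .freed = false, .late_puts = 0 };

	if (!persist)
		model_unregister(&sk);
	else if (patched)
		model_sock_put(&sk);   /* the "else" branch added by the patch */

	if (!patched)
		model_sock_put(&sk);   /* the unconditional put the patch removes */

	return sk.late_puts;
}
```

In this model only model_close(false, false), the unpatched non-persistent case, returns 1; the other three combinations return 0. That matches the reported behaviour in shape only, and whether the real reference counts line up this way is exactly what the test request is meant to establish.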


* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
  2012-04-17 12:18     ` Stanislav Kinsbursky
@ 2012-04-17 18:35       ` Simon Kirby
  2012-04-17 18:49         ` Stanislav Kinsbursky
  2012-04-18 11:32         ` Stanislav Kinsbursky
  0 siblings, 2 replies; 12+ messages in thread
From: Simon Kirby @ 2012-04-17 18:35 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: Eric Dumazet, netdev

On Tue, Apr 17, 2012 at 04:18:53PM +0400, Stanislav Kinsbursky wrote:

> 17.04.2012 06:08, Simon Kirby wrote:
> >On Thu, Apr 05, 2012 at 04:41:04AM +0200, Eric Dumazet wrote:
> >
> >>Hmm, is it happening if you remove the nvidia module ?
> >>
> >>If yes, please try to add slub_debug=FZPU
> >
> >Finally got annoyed enough at this to bisect it. It doesn't happen every
> >time and I got a bit confused, but I finally tracked it down to:
> >
> >1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d is the first bad commit
> >commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d
> >Author: Stanislav Kinsbursky<skinsbursky@parallels.com>
> >Date:   Mon Mar 12 02:59:41 2012 +0000
> >
> >     tun: don't hold network namespace by tun sockets
> >
> >     v3: added previously removed sock_put() to the tun_release() callback, because
> >     sk_release_kernel() doesn't drop the socket reference.
> >
> >     v2: sk_release_kernel() used for socket release. Dummy tun_release() is
> >     required for sk_release_kernel() --->  sock_release() --->  sock->ops->release()
> >     call.
> >
> >     TUN was designed to destroy it's socket on network namesapce shutdown. But this
> >     will never happen for persistent device, because it's socket holds network
> >     namespace.
> >     This patch removes of holding network namespace by TUN socket and replaces it
> >     by creating socket in init_net and then changing it's net it to desired one. On
> >     shutdown socket is moved back to init_net prior to final put.
> >
> >     Signed-off-by: Stanislav Kinsbursky<skinsbursky@parallels.com>
> >     Signed-off-by: David S. Miller<davem@davemloft.net>
> >
> >...With this reverted on top of 3.4-rc3, I no longer see crashes when I
> >keep making and breaking the SSH tunnel while running "vmstat 1" in an
> >SSH session over a socket that is running through that tunnel.
> >
> >Simon-
> 
> Hi, Simon.
> Could you please try applying the patch below on top of your
> tree (with 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d applied) and
> check whether it fixes the problem:
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index bb8c72c..1fc4622 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1540,13 +1540,10 @@ static int tun_chr_close(struct inode
> *inode, struct file *file)
>  			if (dev->reg_state == NETREG_REGISTERED)
>  				unregister_netdevice(dev);
>  			rtnl_unlock();
> -		}
> +		} else
> +			sock_put(tun->socket.sk);
>  	}
> 
> -	tun = tfile->tun;
> -	if (tun)
> -		sock_put(tun->socket.sk);
> -
>  	put_net(tfile->net);
>  	kfree(tfile);

(Whitespace-damaged patch, applied manually)

Yes, I no longer see crashes with this applied. I haven't tried with
kmemleak or similar, but it seems to work.

Thanks,

Simon-

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
  2012-04-17 18:35       ` Simon Kirby
@ 2012-04-17 18:49         ` Stanislav Kinsbursky
  2012-04-18  2:38           ` David Miller
  2012-04-18 11:32         ` Stanislav Kinsbursky
  1 sibling, 1 reply; 12+ messages in thread
From: Stanislav Kinsbursky @ 2012-04-17 18:49 UTC (permalink / raw)
  To: Simon Kirby; +Cc: Eric Dumazet, netdev

17.04.2012 22:35, Simon Kirby wrote:
> On Tue, Apr 17, 2012 at 04:18:53PM +0400, Stanislav Kinsbursky wrote:
>
>> 17.04.2012 06:08, Simon Kirby wrote:
>>> On Thu, Apr 05, 2012 at 04:41:04AM +0200, Eric Dumazet wrote:
>>>
>>>> Hmm, is it happening if you remove the nvidia module ?
>>>>
>>>> If yes, please try to add slub_debug=FZPU
>>> Finally got annoyed enough at this to bisect it. It doesn't happen every
>>> time and I got a bit confused, but I finally tracked it down to:
>>>
>>> 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d is the first bad commit
>>> commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d
>>> Author: Stanislav Kinsbursky<skinsbursky@parallels.com>
>>> Date:   Mon Mar 12 02:59:41 2012 +0000
>>>
>>>      tun: don't hold network namespace by tun sockets
>>>
>>>      v3: added previously removed sock_put() to the tun_release() callback, because
>>>      sk_release_kernel() doesn't drop the socket reference.
>>>
>>>      v2: sk_release_kernel() used for socket release. Dummy tun_release() is
>>>      required for sk_release_kernel() --->   sock_release() --->   sock->ops->release()
>>>      call.
>>>
>>>      TUN was designed to destroy it's socket on network namesapce shutdown. But this
>>>      will never happen for persistent device, because it's socket holds network
>>>      namespace.
>>>      This patch removes of holding network namespace by TUN socket and replaces it
>>>      by creating socket in init_net and then changing it's net it to desired one. On
>>>      shutdown socket is moved back to init_net prior to final put.
>>>
>>>      Signed-off-by: Stanislav Kinsbursky<skinsbursky@parallels.com>
>>>      Signed-off-by: David S. Miller<davem@davemloft.net>
>>>
>>> ...With this reverted on top of 3.4-rc3, I no longer see crashes when I
>>> keep making and breaking the SSH tunnel while running "vmstat 1" in an
>>> SSH session over a socket that is running through that tunnel.
>>>
>>> Simon-
>> Hi, Simon.
>> Could you please try to apply the patch below on top of your the
>> tree (with 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d applied) and
>> check does it fix the problem:
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index bb8c72c..1fc4622 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1540,13 +1540,10 @@ static int tun_chr_close(struct inode
>> *inode, struct file *file)
>>   			if (dev->reg_state == NETREG_REGISTERED)
>>   				unregister_netdevice(dev);
>>   			rtnl_unlock();
>> -		}
>> +		} else
>> +			sock_put(tun->socket.sk);
>>   	}
>>
>> -	tun = tfile->tun;
>> -	if (tun)
>> -		sock_put(tun->socket.sk);
>> -
>>   	put_net(tfile->net);
>>   	kfree(tfile);
> (Whitespace-damaged patch, applied manually)
>
> Yes, I no longer see crashes with this applied. I haven't tried with
> kmemleak or similar, but it seems to work.

Sorry for whitespaces.
And thanks, Simon.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
  2012-04-17 18:49         ` Stanislav Kinsbursky
@ 2012-04-18  2:38           ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2012-04-18  2:38 UTC (permalink / raw)
  To: skinsbursky; +Cc: sim, eric.dumazet, netdev

From: Stanislav Kinsbursky <skinsbursky@parallels.com>
Date: Tue, 17 Apr 2012 22:49:06 +0400

> Sorry for whitespaces.
> And thanks, Simon.

Please submit this fix formally, with Simon's Tested-by:

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
  2012-04-17 18:35       ` Simon Kirby
  2012-04-17 18:49         ` Stanislav Kinsbursky
@ 2012-04-18 11:32         ` Stanislav Kinsbursky
  2012-05-19  1:07           ` Simon Kirby
  1 sibling, 1 reply; 12+ messages in thread
From: Stanislav Kinsbursky @ 2012-04-18 11:32 UTC (permalink / raw)
  To: Simon Kirby; +Cc: Eric Dumazet, netdev

17.04.2012 22:35, Simon Kirby wrote:
> On Tue, Apr 17, 2012 at 04:18:53PM +0400, Stanislav Kinsbursky wrote:
>
>> 17.04.2012 06:08, Simon Kirby wrote:
>>> On Thu, Apr 05, 2012 at 04:41:04AM +0200, Eric Dumazet wrote:
>>>
>>>> Hmm, is it happening if you remove the nvidia module ?
>>>>
>>>> If yes, please try to add slub_debug=FZPU
>>>
>>> Finally got annoyed enough at this to bisect it. It doesn't happen every
>>> time and I got a bit confused, but I finally tracked it down to:
>>>
>>> 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d is the first bad commit
>>> commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d
>>> Author: Stanislav Kinsbursky<skinsbursky@parallels.com>
>>> Date:   Mon Mar 12 02:59:41 2012 +0000
>>>
>>>      tun: don't hold network namespace by tun sockets
>>>
>>>      v3: added previously removed sock_put() to the tun_release() callback, because
>>>      sk_release_kernel() doesn't drop the socket reference.
>>>
>>>      v2: sk_release_kernel() used for socket release. Dummy tun_release() is
>>>      required for sk_release_kernel() --->   sock_release() --->   sock->ops->release()
>>>      call.
>>>
>>>      TUN was designed to destroy it's socket on network namesapce shutdown. But this
>>>      will never happen for persistent device, because it's socket holds network
>>>      namespace.
>>>      This patch removes of holding network namespace by TUN socket and replaces it
>>>      by creating socket in init_net and then changing it's net it to desired one. On
>>>      shutdown socket is moved back to init_net prior to final put.
>>>
>>>      Signed-off-by: Stanislav Kinsbursky<skinsbursky@parallels.com>
>>>      Signed-off-by: David S. Miller<davem@davemloft.net>
>>>
>>> ...With this reverted on top of 3.4-rc3, I no longer see crashes when I
>>> keep making and breaking the SSH tunnel while running "vmstat 1" in an
>>> SSH session over a socket that is running through that tunnel.
>>>
>>> Simon-
>>
>> Hi, Simon.
>> Could you please try to apply the patch below on top of your the
>> tree (with 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d applied) and
>> check does it fix the problem:
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index bb8c72c..1fc4622 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1540,13 +1540,10 @@ static int tun_chr_close(struct inode
>> *inode, struct file *file)
>>   			if (dev->reg_state == NETREG_REGISTERED)
>>   				unregister_netdevice(dev);
>>   			rtnl_unlock();
>> -		}
>> +		} else
>> +			sock_put(tun->socket.sk);
>>   	}
>>
>> -	tun = tfile->tun;
>> -	if (tun)
>> -		sock_put(tun->socket.sk);
>> -
>>   	put_net(tfile->net);
>>   	kfree(tfile);
>
> (Whitespace-damaged patch, applied manually)
>
> Yes, I no longer see crashes with this applied. I haven't tried with
> kmemleak or similar, but it seems to work.
>
> Thanks,
>

This bug looks like a double free, but I can't understand how it can happen...
Simon, it would be really great if you could describe in detail some simple way
to reproduce the bug.
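
To make the double-free suspicion concrete, here is a toy userspace model of
the two close paths in the patch above. It is only a sketch under a loud
assumption: that tearing down a non-persistent device already drops the
socket reference held for the open file. The thread does not confirm this.
All names here (toy_sock, toy_put, toy_unregister, close_before_patch,
close_after_patch) are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a refcounted socket; NOT the kernel's struct sock. */
struct toy_sock {
	int refcnt;     /* references held on behalf of the open file */
	int underflows; /* puts attempted after the count already hit zero */
};

/* Drop one reference, recording any put on an already-released object. */
static void toy_put(struct toy_sock *sk)
{
	if (sk->refcnt == 0) {
		sk->underflows++;
		return;
	}
	sk->refcnt--;
}

/*
 * Stand-in for unregistering a non-persistent device.
 * ASSUMPTION: teardown drops the socket reference itself.
 */
static void toy_unregister(struct toy_sock *sk)
{
	toy_put(sk);
}

/* Shape of the close path before the patch: the put is unconditional. */
static int close_before_patch(bool persist)
{
	struct toy_sock sk = { .refcnt = 1, .underflows = 0 };

	if (!persist)
		toy_unregister(&sk);
	toy_put(&sk); /* the trailing sock_put() that the patch removes */
	return sk.underflows;
}

/* Shape of the close path after the patch: put only if persistent. */
static int close_after_patch(bool persist)
{
	struct toy_sock sk = { .refcnt = 1, .underflows = 0 };

	if (!persist)
		toy_unregister(&sk);
	else
		toy_put(&sk);
	return sk.underflows;
}
```

Under that assumption only the non-persistent case before the patch ends up
with an extra put, which would match a use-after-free or double free showing
up later in an unrelated allocation.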

-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
  2012-04-18 11:32         ` Stanislav Kinsbursky
@ 2012-05-19  1:07           ` Simon Kirby
  2012-05-21 14:51             ` Stanislav Kinsbursky
  0 siblings, 1 reply; 12+ messages in thread
From: Simon Kirby @ 2012-05-19  1:07 UTC (permalink / raw)
  To: Stanislav Kinsbursky; +Cc: Eric Dumazet, netdev

On Wed, Apr 18, 2012 at 03:32:27PM +0400, Stanislav Kinsbursky wrote:

> 17.04.2012 22:35, Simon Kirby wrote:
> >On Tue, Apr 17, 2012 at 04:18:53PM +0400, Stanislav Kinsbursky wrote:
> >>
> >>Hi, Simon.
> >>Could you please try to apply the patch below on top of your the
> >>tree (with 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d applied) and
> >>check does it fix the problem:
> >>
> >>diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >>index bb8c72c..1fc4622 100644
> >>--- a/drivers/net/tun.c
> >>+++ b/drivers/net/tun.c
> >>@@ -1540,13 +1540,10 @@ static int tun_chr_close(struct inode
> >>*inode, struct file *file)
> >>  			if (dev->reg_state == NETREG_REGISTERED)
> >>  				unregister_netdevice(dev);
> >>  			rtnl_unlock();
> >>-		}
> >>+		} else
> >>+			sock_put(tun->socket.sk);
> >>  	}
> >>
> >>-	tun = tfile->tun;
> >>-	if (tun)
> >>-		sock_put(tun->socket.sk);
> >>-
> >>  	put_net(tfile->net);
> >>  	kfree(tfile);
> >
> >(Whitespace-damaged patch, applied manually)
> >
> >Yes, I no longer see crashes with this applied. I haven't tried with
> >kmemleak or similar, but it seems to work.
> >
> >Thanks,
> >
> 
> This bug looks like double free, but I can't understand how does this can happen...
> Simon, would be really great, if you'll describe in details some
> simple way, how to reproduce the bug.

Oh, sorry, I did not see this until now. I just noticed it was still
floating in my tree with no upstream changes yet, then found your email.
I still have not seen any issues since applying your patch.

I was definitely seeing the issue on 3.4-rc3. I can try and see if it
still occurs with your patch removed, if that would help.

Do you have a box on which you can set up an SSH tunnel? In my case, I
can reproduce it easily with three boxes. From home, I run ssh to my work
box to establish the layer 2 tunnel. This goes through a ProxyCommand to
jump through an entry box, but I don't think that should matter. I use a
cheap tunnel start script similar to this:

work_net=10.0.0.0/8
work_tun_ip=10.x.x.x
home_tun_ip=10.x.x.x
echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp
ssh -w any:any <work box> "ifconfig tun0 $work_tun_ip pointopoint \
$home_tun_ip; echo 'ifconfig tun0 $home_tun_ip pointopoint $work_tun_ip \
&& ip route add $work_net via $work_tun_ip'; sleep 1d" | sh -v

...there's probably a better way, but it works. To reproduce, I log in
to a third box over this tunnel, and start a "vmstat 1", so that packets
keep coming back to the tunnel host. ^C on the SSH session will then
produce an Oops within a second.
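
For anyone trying to exercise the same close path without SSH, a minimal
sketch of the attach/detach cycle follows. It assumes the usual Linux
TUN/TAP interface (/dev/net/tun plus the TUNSETIFF ioctl), which as far as I
know is what "ssh -w" uses underneath. The helper names tun_attach and
tun_cycle_once are made up for illustration. On its own this does not
reproduce the Oops; going by the report above, traffic still has to be
flowing through the device when the descriptor is closed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/*
 * Attach to a new non-persistent TUN device.
 * Returns the open fd, or -1 with errno set (for example when
 * /dev/net/tun is missing or the caller lacks CAP_NET_ADMIN).
 * If name_out is non-NULL it receives the kernel-chosen interface name.
 */
static int tun_attach(char *name_out, size_t len)
{
	struct ifreq ifr;
	int fd = open("/dev/net/tun", O_RDWR);

	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));
	/* Layer 3 device, no packet-info header; empty name means "any". */
	ifr.ifr_flags = IFF_TUN | IFF_NO_PI;

	if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}

	if (name_out && len > 0) {
		strncpy(name_out, ifr.ifr_name, len - 1);
		name_out[len - 1] = '\0';
	}
	return fd;
}

/*
 * One attach/detach cycle. Closing the fd of a non-persistent device
 * is the userspace event that leads into the driver's close handler.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int tun_cycle_once(void)
{
	char name[IFNAMSIZ];
	int fd = tun_attach(name, sizeof(name));

	if (fd < 0)
		return -1;
	return close(fd);
}
```

Calling tun_cycle_once() in a loop as root, with an address configured and
packets routed at the device between attach and close, would be the closest
equivalent to making and breaking the tunnel by hand.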

With CONFIG_SLUB_DEBUG=y and booting with slub_debug=FZPU, I got the
Redzone overwritten notice. Without it, the box usually Oopses and
hangs immediately. Sometimes, I might have to reconnect the tunnel and
^C it once more. If I don't have that vmstat session open, it usually
doesn't crash.

Does this work for you?

Simon-

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
  2012-05-19  1:07           ` Simon Kirby
@ 2012-05-21 14:51             ` Stanislav Kinsbursky
  0 siblings, 0 replies; 12+ messages in thread
From: Stanislav Kinsbursky @ 2012-05-21 14:51 UTC (permalink / raw)
  To: Simon Kirby; +Cc: Eric Dumazet, netdev

On 19.05.2012 05:07, Simon Kirby wrote:
> On Wed, Apr 18, 2012 at 03:32:27PM +0400, Stanislav Kinsbursky wrote:
>
>> 17.04.2012 22:35, Simon Kirby wrote:
>>> On Tue, Apr 17, 2012 at 04:18:53PM +0400, Stanislav Kinsbursky wrote:
>>>>
>>>> Hi, Simon.
>>>> Could you please try to apply the patch below on top of your the
>>>> tree (with 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d applied) and
>>>> check does it fix the problem:
>>>>
>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>> index bb8c72c..1fc4622 100644
>>>> --- a/drivers/net/tun.c
>>>> +++ b/drivers/net/tun.c
>>>> @@ -1540,13 +1540,10 @@ static int tun_chr_close(struct inode
>>>> *inode, struct file *file)
>>>>   			if (dev->reg_state == NETREG_REGISTERED)
>>>>   				unregister_netdevice(dev);
>>>>   			rtnl_unlock();
>>>> -		}
>>>> +		} else
>>>> +			sock_put(tun->socket.sk);
>>>>   	}
>>>>
>>>> -	tun = tfile->tun;
>>>> -	if (tun)
>>>> -		sock_put(tun->socket.sk);
>>>> -
>>>>   	put_net(tfile->net);
>>>>   	kfree(tfile);
>>>
>>> (Whitespace-damaged patch, applied manually)
>>>
>>> Yes, I no longer see crashes with this applied. I haven't tried with
>>> kmemleak or similar, but it seems to work.
>>>
>>> Thanks,
>>>
>>
>> This bug looks like double free, but I can't understand how does this can happen...
>> Simon, would be really great, if you'll describe in details some
>> simple way, how to reproduce the bug.
>
> Oh, sorry, I did not see this until now. I just noticed it was still
> floating in my tree with no upstream changes yet, then found your email.
> I still have not seen any issues since applying your patch.
>
> I was definitely seeing the issue on 3.4-rc3. I can try and see if it
> still occurs with your patch removed, if that would help.
>
> Do you have a box on which you can set up an SSH tunnel? In my case, I
> can reproduce it easily with three boxes. From home, I run ssh to my work
> box to establish the layer 2 tunnel. This goes through a ProxyCommand to
> jump through an entry box, but I don't think that should matter. I use a
> cheap tunnel start script similar to this:
>
> work_net=10.0.0.0/8
> work_tun_ip=10.x.x.x
> home_tun_ip=10.x.x.x
> echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp
> ssh -w any:any <work box> "ifconfig tun0 $work_tun_ip pointopoint
> $home_tun_ip; echo 'ifconfig tun0 $home_tun_ip pointopoint $work_tun_ip
> && ip route add $work_net via $work_tun_ip'; sleep 1d" | sh -v
>
> ...there's probably a better way, but it works. To reproduce, I log in
> to a third box over this tunnel, and start a "vmstat 1", so that packets
> keep coming back to the tunnel host. ^C on the SSH session will then
> produce an Oops within a second.
>
> With CONFIG_SLUB_DEBUG=y and booting with slub_debug=FZPU, I got the
> Redzone overwritten notice. Without it, the box usually Oopses and
> hangs immediately. Sometimes, I might have to reconnect the tunnel and
> ^C it once more. If I don't have that vmstat session open, it usually
> doesn't crash.
>
> Does this work for you?
>

Hello, Simon.
Thanks for details.
I still can't reproduce the issue.
Here is my configuration:
1) three nodes: A, B and C.
2) A and B connected with a tunnel (your script - slightly modified).
3) Packets to C from A are routed through the tunnel.
4) Node B has 3.4.0-rc2 based kernel. A and C - rhel6 kernel.

So, I log in to C from A by ssh, run "vmstat 1" and then cut off (^C) the tunnel
between A and B. The connection hung. No panic or oops occurred.

Is this the same as what you did when the panic occurred?
Or am I doing something wrong?

> Simon-


-- 
Best regards,
Stanislav Kinsbursky

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.3.0, 3.4-rc1 reproducible tun Oops
@ 2012-04-18  6:51 Stanislav Kinsbursky
  0 siblings, 0 replies; 12+ messages in thread
From: Stanislav Kinsbursky @ 2012-04-18  6:51 UTC (permalink / raw)
  To: David Miller; +Cc: sim, eric.dumazet@gmail.com , netdev

Sure, David. This is not a fix yet, since I don't completely understand what's happening. Just a proof of concept.

David Miller <davem@davemloft.net> wrote:

>From: Stanislav Kinsbursky <skinsbursky@parallels.com>
>Date: Tue, 17 Apr 2012 22:49:06 +0400
>
>> Sorry for whitespaces.
>> And thanks, Simon.
>
>Please submit this fix formally, with Simon's Tested-by:

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-05-21 14:51 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-04 22:05 3.3.0, 3.4-rc1 reproducible tun Oops Simon Kirby
2012-04-05  2:41 ` Eric Dumazet
2012-04-05  5:58   ` Simon Kirby
2012-04-17  2:08   ` Simon Kirby
2012-04-17 12:18     ` Stanislav Kinsbursky
2012-04-17 18:35       ` Simon Kirby
2012-04-17 18:49         ` Stanislav Kinsbursky
2012-04-18  2:38           ` David Miller
2012-04-18 11:32         ` Stanislav Kinsbursky
2012-05-19  1:07           ` Simon Kirby
2012-05-21 14:51             ` Stanislav Kinsbursky
2012-04-18  6:51 Stanislav Kinsbursky
