All of lore.kernel.org
 help / color / mirror / Atom feed
* tun netns BUG()
@ 2009-06-04 19:05 David Woodhouse
  2009-06-05  1:19 ` Eric W. Biederman
  0 siblings, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2009-06-04 19:05 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, johannes

Johannes and I have been playing with using network namespaces for VPNs:
	http://david.woodhou.se/vpnc-netns.sh

The idea is that you set up a separate netns to be 'afflicted' by the
VPN, while your normal programs have a normal view of the network.

One option was to run a SOCKS server in the VPN's netns, so that normal
programs could get to the VPN through that. That's not what I'm doing
here though -- this one is using NAT-PT so a range of IPv6 space is
mapped to the Legacy IP range on the VPN. Any connections to
fec0:0:0:ffff:0:0:xxyy:zzww get mapped to xx.yy.zz.ww on the VPN.

This is done by passing the VPN tundev (from vpnc or openconnect) into
the new netns, and running ptrtd inside the netns. Then passing the
tundev from ptrtd back _out_ from the netns to the normal namespace.

First I noticed that when vpnc/openconnect closes, the tundev doesn't
disappear from inside the netns -- that was unexpected. So I made the
script in there exit some other way, and then it oopses when
vpnc/openconnect closes its tun fd.

Looks like you can reproduce it by passing a tundev to a different
netns, then closing that netns before you close tun fd which is attached
to it.

(This was actually tun.c from commit 1bded710a5 on 2.6.29.4, but I see
no reason to believe that 2.6.30-rc is different).

------------[ cut here ]------------
kernel BUG at net/core/dev.c:4188!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT0/voltage_now
CPU 0 
Modules linked in: tun veth vmnet parport_pc vmblock vmci vmmon nls_utf8 hfsplus fuse hidp rfcomm bridge stp llc bnep sco l2cap ppdev parport sunrpc ipv6 cpufreq_ondemand acpi_cpufreq freq_table vfat fat dm_multipath kvm uinput snd_hda_codec_realtek snd_hda_intel arc4 snd_hda_codec snd_hwdep ecb btusb snd_pcm snd_timer bluetooth iwlagn iwlcore lib80211 joydev snd applesmc hwmon firewire_ohci input_polldev firewire_core crc_itu_t uvcvideo mac80211 usb_storage bcm5974 soundcore i2c_i801 iTCO_wdt iTCO_vendor_support mbp_nvidia_bl sky2 videodev v4l1_compat v4l2_compat_ioctl32 cfg80211 snd_page_alloc pcspkr video output ata_generic pata_acpi nouveau drm i2c_algo_bit i2c_core [last unloaded: tun]
Pid: 22838, comm: openconnect Not tainted 2.6.29.4-167.fc11.x86_64 #1 MacBookPro4,1
RIP: 0010:[<ffffffff813117b8>]  [<ffffffff813117b8>] rollback_registered+0x7e/0x10a
RSP: 0018:ffff88000721fd68  EFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff880100196000 RCX: ffff88010019672c
RDX: ffffffff815ce568 RSI: 0000000000000292 RDI: ffff880100196000
RBP: ffff88000721fd78 R08: ffff88000721e000 R09: ffff880100196700
R10: ffff88000721fd78 R11: ffff88000721fd78 R12: ffff880100196700
R13: ffff88013d0e8800 R14: 0000000000000008 R15: ffff88001050b880
FS:  00007f5d456987b0(0000) GS:ffffffff817b7000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000116f6e0 CR3: 000000013c114000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process openconnect (pid: 22838, threadinfo ffff88000721e000, task ffff88011a401700)
Stack:
 ffff880100196700 ffff880100196000 ffff88000721fd98 ffffffff81311883
 ffff88011a582700 ffff88011a582700 ffff88000721fdb8 ffffffffa02ea4a6
 ffff880118d6f600 ffff88001050b880 ffff88000721fe08 ffffffff810d61c4
Call Trace:
 [<ffffffff81311883>] unregister_netdevice+0x3f/0x56
 [<ffffffffa02ea4a6>] tun_chr_close+0x42/0x69 [tun]
 [<ffffffff810d61c4>] __fput+0xf9/0x1a0
 [<ffffffff810d6285>] fput+0x1a/0x1c
 [<ffffffff810d35c5>] filp_close+0x68/0x72
 [<ffffffff8104a5f9>] put_files_struct+0x6c/0xc3
 [<ffffffff8104a692>] exit_files+0x42/0x47
 [<ffffffff8104c050>] do_exit+0x210/0x834
 [<ffffffff813abd37>] ? _spin_lock_irq+0x27/0x2a
 [<ffffffff813ab88d>] ? trace_hardirqs_off_thunk+0x3a/0x6c
 [<ffffffff8104c702>] do_group_exit+0x8e/0xbe
 [<ffffffff8104c749>] sys_exit_group+0x17/0x1b
 [<ffffffff8101133a>] system_call_fastpath+0x16/0x1b
Code: de 48 c7 c7 29 5c 52 81 e8 24 7e 09 00 31 d2 be 58 10 00 00 48 c7 c7 54 59 52 81 31 c0 e8 54 71 d3 ff e9 8c 00 00 00 ff c8 74 04 <0f> 0b eb fe 48 89 df e8 d2 fe ff ff 48 89 df e8 8f cd ff ff c7 
RIP  [<ffffffff813117b8>] rollback_registered+0x7e/0x10a
 RSP <ffff88000721fd68>
---[ end trace 96330edd51618473 ]---

-- 
dwmw2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tun netns BUG()
  2009-06-04 19:05 tun netns BUG() David Woodhouse
@ 2009-06-05  1:19 ` Eric W. Biederman
  2009-06-05  8:42   ` David Woodhouse
  0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2009-06-05  1:19 UTC (permalink / raw)
  To: David Woodhouse; +Cc: netdev, johannes

David Woodhouse <dwmw2@infradead.org> writes:

> Johannes and I have been playing with using network namespaces for VPNs:
> 	http://david.woodhou.se/vpnc-netns.sh
>
> The idea is that you set up a separate netns to be 'afflicted' by the
> VPN, while your normal programs have a normal view of the network.
>
> One option was to run a SOCKS server in the VPN's netns, so that normal
> programs could get to the VPN through that. That's not what I'm doing
> here though -- this one is using NAT-PT so a range of IPv6 space is
> mapped to the Legacy IP range on the VPN. Any connections to
> fec0:0:0:ffff:0:0:xxyy:zzww get mapped to xx.yy.zz.ww on the VPN.
>
> This is done by passing the VPN tundev (from vpnc or openconnect) into
> the new netns, and running ptrtd inside the netns. Then passing the
> tundev from ptrtd back _out_ from the netns to the normal namespace.
>
> First I noticed that when vpnc/openconnect closes, the tundev doesn't
> disappear from inside the netns -- that was unexpected.

Agreed.   If it doesn't ask for that handling it should not happen.

> So I made the
> script in there exit some other way, and then it oopses when
> vpnc/openconnect closes its tun fd.
>
> Looks like you can reproduce it by passing a tundev to a different
> netns, then closing that netns before you close tun fd which is attached
> to it.
>
> (This was actually tun.c from commit 1bded710a5 on 2.6.29.4, but I see
> no reason to believe that 2.6.30-rc is different).

This?

commit 1bded710a574f20d41bc9e7fb531301db282d623
Author: Michael Tokarev <mjt@tls.msk.ru>
Date:   Mon Feb 2 23:34:56 2009 -0800

    tun: Check supplemental groups in TUN/TAP driver.
    
    Michael Tokarev wrote:
    []
    > 2, and this is the main one: How about supplementary groups?
    >
    > Here I have a valid usage case: a group of testers running various
    > versions of windows using KVM (kernel virtual machine), 1 at a time,
    > to test some software.  kvm is set up to use bridge with a tap device
    > (there should be a way to connect to the machine).  Anyone on that group
    > has to be able to start/stop the virtual machines.
    >
    > My first attempt - pretty obvious when I saw -g option of tunctl - is
    > to add group ownership for the tun device and add a supplementary group
    > to each user (their primary group should be different).  But that fails,
    > since kernel only checks for egid, not any other group ids.
    >
    > What's the reasoning to not allow supplementary groups and to only check
    > for egid?
    
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    Signed-off-by: David S. Miller <davem@davemloft.net>


There has been a lot of work in this area, and definitely reason to believe
that 2.6.30-rc is different.

As I read it the driver in 2.6.29 has NETIF_F_NETNS_LOCAL set because
it doesn't even try to handle this case.

I had it very close to 100% in netdev and then I looked away and
Herber Xu merged a conflicting patch into it.  Most of the damage
has been fixed now and I expect it to be much more likely to work in
2.6.30-rc.

I haven't had a chance to go back and check it thoroughly.  The code
in 2.6.30-rc at least tries to do the right thing.

I will be happy to look into any problems with the driver on
2.6.30-rc.  The driver in 2.6.29 is enough different it isn't really
worth trying to debug.

Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tun netns BUG()
  2009-06-05  1:19 ` Eric W. Biederman
@ 2009-06-05  8:42   ` David Woodhouse
  2009-06-05  9:17     ` David Woodhouse
  2009-06-05 23:20     ` Eric W. Biederman
  0 siblings, 2 replies; 13+ messages in thread
From: David Woodhouse @ 2009-06-05  8:42 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, johannes

On Thu, 2009-06-04 at 18:19 -0700, Eric W. Biederman wrote:
> David Woodhouse <dwmw2@infradead.org> writes:
> 
> > Johannes and I have been playing with using network namespaces for VPNs:
> > 	http://david.woodhou.se/vpnc-netns.sh
> >
> > The idea is that you set up a separate netns to be 'afflicted' by the
> > VPN, while your normal programs have a normal view of the network.
> >
> > One option was to run a SOCKS server in the VPN's netns, so that normal
> > programs could get to the VPN through that. That's not what I'm doing
> > here though -- this one is using NAT-PT so a range of IPv6 space is
> > mapped to the Legacy IP range on the VPN. Any connections to
> > fec0:0:0:ffff:0:0:xxyy:zzww get mapped to xx.yy.zz.ww on the VPN.
> >
> > This is done by passing the VPN tundev (from vpnc or openconnect) into
> > the new netns, and running ptrtd inside the netns. Then passing the
> > tundev from ptrtd back _out_ from the netns to the normal namespace.
> >
> > First I noticed that when vpnc/openconnect closes, the tundev doesn't
> > disappear from inside the netns -- that was unexpected.
> 
> Agreed.   If it doesn't ask for that handling it should not happen.
> 
> > So I made the
> > script in there exit some other way, and then it oopses when
> > vpnc/openconnect closes its tun fd.
> >
> > Looks like you can reproduce it by passing a tundev to a different
> > netns, then closing that netns before you close tun fd which is attached
> > to it.
> >
> > (This was actually tun.c from commit 1bded710a5 on 2.6.29.4, but I see
> > no reason to believe that 2.6.30-rc is different).
> 
> This?
> 
> commit 1bded710a574f20d41bc9e7fb531301db282d623
> 
>     tun: Check supplemental groups in TUN/TAP driver.

Yes. That's the latest version of tun.c from 2.6.30-rc which would build
against my distro's 2.6.29 kernel without me having to think. It
includes all your patches -- as you point out, the driver in 2.6.29
still has NETIF_F_NETNS_LOCAL.

There have been 6 patches to tun.c in 2.6.30-rc since that version -- 5
of which are from Herbert. Are you suggesting that one of those patches
fixes a bug which you introduced? At first glance, I didn't think that
looked likely.

> I haven't had a chance to go back and check it thoroughly.  The code
> in 2.6.30-rc at least tries to do the right thing.

This _is_ the tun.c code from 2.6.30-rc, basically.

> I will be happy to look into any problems with the driver on
> 2.6.30-rc.  The driver in 2.6.29 is enough different it isn't really
> worth trying to debug.

OK, here's one from 2.6.30-rc8. This one looks like it's the other way
round -- my VPN script passes one tun device _into_ the netns, and one
tun device _out_ of it. This time, it's oopsed on the one that was
passed _out_.

I've been experimenting with a standalone test case but haven't yet
managed to reproduce it -- I still had to use openconnect with the
vpnc-script I gave before (with the 'sleep 1 # to avoid BUG()' commented
out of course). Using that script with vpnc or anything similar should
presumably have the same effect.

[ 2103.148009] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[ 2103.155850] IP: [<ffffffffa0024371>] tun_chr_poll+0x23/0xb7 [tun]
[ 2103.161976] PGD 158024067 PUD 14e9a0067 PMD 0 
[ 2103.166433] Oops: 0000 [#1] SMP 
[ 2103.169691] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
[ 2103.177516] CPU 5 
[ 2103.179528] Modules linked in: tun firewire_ohci firewire_core crc_itu_t [last unloaded: scsi_wait_scan]
[ 2103.189093] Pid: 10336, comm: ptrtd Not tainted 2.6.30-rc8 #71         
[ 2103.195696] RIP: 0010:[<ffffffffa0024371>]  [<ffffffffa0024371>] tun_chr_poll+0x23/0xb7 [tun]
[ 2103.204221] RSP: 0018:ffff8801425439f8  EFLAGS: 00010246
[ 2103.209525] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff880142543a18
[ 2103.216650] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88015c103ca0
[ 2103.223775] RBP: ffff880142543a18 R08: ffff880159899958 R09: ffff880158479cc0
[ 2103.230905] R10: ffff88015fb11020 R11: 0000000000000001 R12: ffff8801584d2900
[ 2103.238032] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000005
[ 2103.245158] FS:  00007f82389836f0(0000) GS:ffff88002d07d000(0000) knlGS:0000000000000000
[ 2103.253244] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2103.258982] CR2: 0000000000000080 CR3: 000000015cc5a000 CR4: 00000000000006a0
[ 2103.266108] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2103.273238] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2103.280367] Process ptrtd (pid: 10336, threadinfo ffff880142542000, task ffff88015994ebd0)
[ 2103.288634] Stack:
[ 2103.290640]  0000000000000020 0000000000000000 0000000000000000 0000000000000000
[ 2103.297873]  ffff880142543d78 ffffffff810b9c00 ffff88015d8e7000 ffff8801584d2900
[ 2103.305345]  ffff880142543f38 ffff880142543ea8 ffff880142543dc0 ffff880142543dc8
[ 2103.313042] Call Trace:
[ 2103.315481]  [<ffffffff810b9c00>] do_select+0x303/0x4eb
[ 2103.320709]  [<ffffffff810b9de8>] ? __pollwait+0x0/0xbf
[ 2103.325934]  [<ffffffff810b9ea7>] ? pollwake+0x0/0x42
[ 2103.330984]  [<ffffffff810b9ea7>] ? pollwake+0x0/0x42
[ 2103.336034]  [<ffffffff810b9ea7>] ? pollwake+0x0/0x42
[ 2103.341115]  [<ffffffff810b9ea7>] ? pollwake+0x0/0x42
[ 2103.346166]  [<ffffffff813a82e5>] ? sock_alloc_send_pskb+0xa6/0x2ae
[ 2103.352437]  [<ffffffff810a8ea8>] ? __kmalloc_track_caller+0x110/0x122
[ 2103.358962]  [<ffffffff813a82e5>] ? sock_alloc_send_pskb+0xa6/0x2ae
[ 2103.365230]  [<ffffffff813ac41f>] ? __alloc_skb+0x69/0x13d
[ 2103.370713]  [<ffffffff8104ef02>] ? clocksource_read+0xa/0xc
[ 2103.376372]  [<ffffffff8104f158>] ? getnstimeofday+0x56/0xaa
[ 2103.382036]  [<ffffffff8104b75f>] ? ktime_get_real+0x11/0x3e
[ 2103.387715]  [<ffffffff810ba0b4>] core_sys_select+0x16f/0x205
[ 2103.393459]  [<ffffffff8104865d>] ? autoremove_wake_function+0x0/0x34
[ 2103.399904]  [<ffffffff8104ef02>] ? clocksource_read+0xa/0xc
[ 2103.405570]  [<ffffffff8104f158>] ? getnstimeofday+0x56/0xaa
[ 2103.411233]  [<ffffffff8104b565>] ? ktime_get_ts+0x49/0x4e
[ 2103.416719]  [<ffffffff810ba387>] sys_select+0x94/0xbc
[ 2103.421858]  [<ffffffff8100b9eb>] system_call_fastpath+0x16/0x1b
[ 2103.427868] Code: 41 5c 41 5d 41 5e c9 c3 55 48 89 e5 41 56 41 55 41 54 49 89 fc 53 48 8b bf a0 00 00 00 48 89 f3 e8 f9 fd ff ff 48 85 db 49 89 c5 <4c> 8b b0 80 00 00 00 74 0f 48 8d b0 a0 00 00 00 48 89 da 4c 89 
[ 2103.447437] RIP  [<ffffffffa0024371>] tun_chr_poll+0x23/0xb7 [tun]
[ 2103.453620]  RSP <ffff8801425439f8>
[ 2103.457100] CR2: 0000000000000080
[ 2103.460806] ---[ end trace 0a4ee7821f3fe65e ]---


-- 
dwmw2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tun netns BUG()
  2009-06-05  8:42   ` David Woodhouse
@ 2009-06-05  9:17     ` David Woodhouse
  2009-06-05 23:24       ` Eric W. Biederman
  2009-06-05 23:20     ` Eric W. Biederman
  1 sibling, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2009-06-05  9:17 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, johannes

[-- Attachment #1: Type: text/plain, Size: 3378 bytes --]

On Fri, 2009-06-05 at 09:42 +0100, David Woodhouse wrote:
> I've been experimenting with a standalone test case but haven't yet
> managed to reproduce it.

This works, although it's probably still more complex than it needs to
be...

# ./tuncrash 
/sbin/ip link set tun0 netns 3495
Open tun1 in child
/sbin/ip link set tun1 netns 3492
del tun1
child bye

[  748.322433] ------------[ cut here ]------------
[  748.327046] kernel BUG at net/core/dev.c:4241!
[  748.331519] invalid opcode: 0000 [#1] SMP 
[  748.335767] last sysfs file: /sys/devices/virtual/net/tun1/flags
[  748.341804] CPU 4 
[  748.343900] Modules linked in: tun firewire_ohci firewire_core crc_itu_t [last unloaded: scsi_wait_scan]
[  748.353654] Pid: 3495, comm: tuncrash Not tainted 2.6.30-rc8 #71         
[  748.360466] RIP: 0010:[<ffffffff813b485c>]  [<ffffffff813b485c>] rollback_registered+0x75/0xfe
[  748.369149] RSP: 0018:ffff8801510f7d88  EFLAGS: 00010202
[  748.374486] RAX: 0000000000000002 RBX: ffff88015cea3000 RCX: ffff88015f195602
[  748.381644] RDX: 0000000000000000 RSI: 0000000000000292 RDI: ffff88015cea3000
[  748.388808] RBP: ffff8801510f7d98 R08: ffff8801510f6000 R09: 0000000000de76a8
[  748.395978] R10: ffff88015cea3600 R11: 0000000005f5e100 R12: ffff88015cea3600
[  748.403128] R13: ffff88015c80ce00 R14: 0000000000000008 R15: ffff88015d0d06f0
[  748.410296] FS:  00007f85d1d036f0(0000) GS:ffff88002d064000(0000) knlGS:0000000000000000
[  748.418411] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  748.424186] CR2: 0000000000401cc8 CR3: 0000000156c9d000 CR4: 00000000000006a0
[  748.431345] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  748.438508] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  748.445672] Process tuncrash (pid: 3495, threadinfo ffff8801510f6000, task ffff880155b347e0)
[  748.454133] Stack:
[  748.456177]  ffff88015cea3600 ffff88015cea3000 ffff8801510f7db8 ffffffff813b491f
[  748.463566]  ffff8801510f7db8 ffff8801520d38a0 ffff8801510f7dd8 ffffffffa002446a
[  748.471221]  ffff8801550bb440 ffff88015d0d06f0 ffff8801510f7e28 ffffffff810addf7
[  748.479107] Call Trace:
[  748.481587]  [<ffffffff813b491f>] unregister_netdevice+0x3a/0x69
[  748.487640]  [<ffffffffa002446a>] tun_chr_close+0x3d/0x79 [tun]
[  748.493670]  [<ffffffff810addf7>] __fput+0xe3/0x18a
[  748.498644]  [<ffffffff810adeb3>] fput+0x15/0x17
[  748.503377]  [<ffffffff810ab257>] filp_close+0x63/0x6d
[  748.508585]  [<ffffffff810372e1>] put_files_struct+0x67/0xbe
[  748.514286]  [<ffffffff81037373>] exit_files+0x3b/0x40
[  748.519462]  [<ffffffff81038c1c>] do_exit+0x1e6/0x715
[  748.524590]  [<ffffffff8104b7ab>] ? up_read+0x9/0xb
[  748.529507]  [<ffffffff814bbd81>] ? do_page_fault+0x1d4/0x1fa
[  748.535330]  [<ffffffff810391bb>] do_group_exit+0x70/0x9d
[  748.540797]  [<ffffffff810391fa>] sys_exit_group+0x12/0x16
[  748.546361]  [<ffffffff8100b9eb>] system_call_fastpath+0x16/0x1b
[  748.552441] Code: 89 de 48 89 da 48 c7 c7 88 33 6d 81 e8 46 32 10 00 be 8d 10 00 00 48 c7 c7 13 2e 6d 81 e8 5b 11 c8 ff e9 88 00 00 00 ff c8 74 04 <0f> 0b eb fe 48 89 df e8 f6 fe ff ff 48 89 df e8 ee cf ff ff c7 
[  748.574374] RIP  [<ffffffff813b485c>] rollback_registered+0x75/0xfe
[  748.580722]  RSP <ffff8801510f7d88>
[  748.584638] ---[ end trace 4f761942078445d9 ]---
[  748.589250] Fixing recursive fault but reboot is needed!


-- 
dwmw2

[-- Attachment #2: tuncrash.c --]
[-- Type: text/x-csrc, Size: 1809 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <linux/if_tun.h>
#include <linux/sched.h>

int main(int argc, char **argv)
{
	struct ifreq ifr;
	pid_t pid, ppid = getpid();
	int fd;

	fd = open("/dev/net/tun", O_RDWR);
	if (fd < 0) {
		perror("open tun");
		exit(1);
	}

	memset(&ifr, 0, sizeof(ifr));

	ifr.ifr_flags = (IFF_TUN | IFF_NO_PI);
	if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
		perror("TUNSETIFF");
		exit(1);
	}

	pid = fork();
	if (pid < 0) {
		perror("fork");
		exit(1);
	}
	if (!pid) {
		char *cmd;
		close(fd);
		if (unshare(CLONE_NEWNET) < 0) {
			perror("unshare");
			exit(1);
		}
		/* Wait for the tun device to be given to us... */
		sleep(2);
		
		fd = open("/dev/net/tun", O_RDWR);
		if (fd < 0) {
			perror("open tun");
			exit(1);
		}
		memset(&ifr, 0, sizeof(ifr));

		ifr.ifr_flags = (IFF_TUN | IFF_NO_PI);
		if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
			perror("TUNSETIFF");
			exit(1);
		}
		printf("Open %s in child\n", ifr.ifr_name);

		if (asprintf(&cmd, "/sbin/ip link set %s netns %d",
			     ifr.ifr_name, ppid) < 0) {
			perror("asprintf");
			exit(1);
		}
		printf("%s\n", cmd);
		if (system(cmd)) {
			perror("give tun to parent");
			exit(1);
		}
		sleep(1);
		printf("child bye\n");
		exit(0);
	} else {
		/* Wait for the unshare() */
		char *cmd;

		if (asprintf(&cmd, "/sbin/ip link set %s netns %d",
			     ifr.ifr_name, pid) < 0) {
			perror("asprintf");
			exit(1);
		}
		sleep(1);
		printf("%s\n", cmd);
		if (system(cmd)) {
			perror("give tun to child");
			exit(1);
		}
		free(cmd);
		sleep(1);

		/* Yeah, too lazy to pass the real name back */
		printf("del tun1\n");
		sleep(1);
		system("/sbin/ip link del tun1");
		close(fd);
	}
	return 0;
}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tun netns BUG()
  2009-06-05  8:42   ` David Woodhouse
  2009-06-05  9:17     ` David Woodhouse
@ 2009-06-05 23:20     ` Eric W. Biederman
  1 sibling, 0 replies; 13+ messages in thread
From: Eric W. Biederman @ 2009-06-05 23:20 UTC (permalink / raw)
  To: David Woodhouse; +Cc: netdev, johannes

David Woodhouse <dwmw2@infradead.org> writes:

> On Thu, 2009-06-04 at 18:19 -0700, Eric W. Biederman wrote:
>> David Woodhouse <dwmw2@infradead.org> writes:
>> 
>> > Johannes and I have been playing with using network namespaces for VPNs:
>> > 	http://david.woodhou.se/vpnc-netns.sh
>> >
>> > The idea is that you set up a separate netns to be 'afflicted' by the
>> > VPN, while your normal programs have a normal view of the network.
>> >
>> > One option was to run a SOCKS server in the VPN's netns, so that normal
>> > programs could get to the VPN through that. That's not what I'm doing
>> > here though -- this one is using NAT-PT so a range of IPv6 space is
>> > mapped to the Legacy IP range on the VPN. Any connections to
>> > fec0:0:0:ffff:0:0:xxyy:zzww get mapped to xx.yy.zz.ww on the VPN.
>> >
>> > This is done by passing the VPN tundev (from vpnc or openconnect) into
>> > the new netns, and running ptrtd inside the netns. Then passing the
>> > tundev from ptrtd back _out_ from the netns to the normal namespace.
>> >
>> > First I noticed that when vpnc/openconnect closes, the tundev doesn't
>> > disappear from inside the netns -- that was unexpected.
>> 
>> Agreed.   If it doesn't ask for that handling it should not happen.
>> 
>> > So I made the
>> > script in there exit some other way, and then it oopses when
>> > vpnc/openconnect closes its tun fd.
>> >
>> > Looks like you can reproduce it by passing a tundev to a different
>> > netns, then closing that netns before you close tun fd which is attached
>> > to it.
>> >
>> > (This was actually tun.c from commit 1bded710a5 on 2.6.29.4, but I see
>> > no reason to believe that 2.6.30-rc is different).
>> 
>> This?
>> 
>> commit 1bded710a574f20d41bc9e7fb531301db282d623
>> 
>>     tun: Check supplemental groups in TUN/TAP driver.
>
> Yes. That's the latest version of tun.c from 2.6.30-rc which would build
> against my distro's 2.6.29 kernel without me having to think. It
> includes all your patches -- as you point out, the driver in 2.6.29
> still has NETIF_F_NETNS_LOCAL.

Hmm.  I think you have the wrong commit id.  But now I understand what
code you are running.

> There have been 6 patches to tun.c in 2.6.30-rc since that version -- 5
> of which are from Herbert. Are you suggesting that one of those patches
> fixes a bug which you introduced? At first glance, I didn't think that
> looked likely.

I think the pot was stirred very well with Herbert's patches, and I
think it just about anything is possible.  The reference counting
completely changed.  So I expect at this point stabilizing the code
without Herbert's patches is wasted work.

>> I haven't had a chance to go back and check it thoroughly.  The code
>> in 2.6.30-rc at least tries to do the right thing.
>
> This _is_ the tun.c code from 2.6.30-rc, basically.

>> I will be happy to look into any problems with the driver on
>> 2.6.30-rc.  The driver in 2.6.29 is enough different it isn't really
>> worth trying to debug.
>
> OK, here's one from 2.6.30-rc8. This one looks like it's the other way
> round -- my VPN script passes one tun device _into_ the netns, and one
> tun device _out_ of it. This time, it's oopsed on the one that was
> passed _out_.

Thanks.  Ugh I was afraid of that.

> I've been experimenting with a standalone test case but haven't yet
> managed to reproduce it -- I still had to use openconnect with the
> vpnc-script I gave before (with the 'sleep 1 # to avoid BUG()' commented
> out of course). Using that script with vpnc or anything similar should
> presumably have the same effect.

Ah the infamous poll path.  I will take a look. 

Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tun netns BUG()
  2009-06-05  9:17     ` David Woodhouse
@ 2009-06-05 23:24       ` Eric W. Biederman
  2009-06-06  7:06         ` David Woodhouse
  2009-07-03  8:35         ` Herbert Xu
  0 siblings, 2 replies; 13+ messages in thread
From: Eric W. Biederman @ 2009-06-05 23:24 UTC (permalink / raw)
  To: David Woodhouse; +Cc: netdev, johannes


The patch below fixes the one bug I know of in with just my changes
applied.  I don't think it fixes everything with Herberts changes.

I expect some of this will need to be moved to release to fix the
select problem on 2.6.30-rc8.

Eric


From: Eric W. Biederman <ebiederm@aristanetworks.com>
Subject: [PATCH] tun: Fix unregister race

It is possible for tun_chr_close to race with dellink on the
a tun device.  In which case if __tun_get runs before dellink
but dellink runs before tun_chr_close calls unregister_netdevice
we will attempt to unregister the netdevice after it is already
gone.  

The two cases are already serialized on the rtnl_lock, so I have
gone for the cheap simple fix of moving rtnl_lock to cover __tun_get
in tun_chr_close.  Eliminating the possibility of the tun device
being unregistered between __tun_get and unregister_netdevice in
tun_chr_close.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---

--- linux-2.6.28.x86_64-old/drivers/net/tun.c	2009-05-06 15:01:56.000000000 -0700
+++ linux-2.6.28.x86_64/drivers/net/tun.c	2009-05-06 15:10:09.000000000 -0700
@@ -1194,21 +1194,22 @@
 static int tun_chr_close(struct inode *inode, struct file *file)
 {
 	struct tun_file *tfile = file->private_data;
-	struct tun_struct *tun = __tun_get(tfile);
+	struct tun_struct *tun;
 
 
+	rtnl_lock();
+	tun = __tun_get(tfile);
 	if (tun) {
 		DBG(KERN_INFO "%s: tun_chr_close\n", tun->dev->name);
 
-		rtnl_lock();
 		__tun_detach(tun);
 
 		/* If desireable, unregister the netdevice. */
 		if (!(tun->flags & TUN_PERSIST))
 			unregister_netdevice(tun->dev);
 
-		rtnl_unlock();
 	}
+	rtnl_unlock();
 
 	put_net(tfile->net);
 	kfree(tfile);


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tun netns BUG()
  2009-06-05 23:24       ` Eric W. Biederman
@ 2009-06-06  7:06         ` David Woodhouse
  2009-06-08  7:44           ` David Miller
  2009-07-03  8:35         ` Herbert Xu
  1 sibling, 1 reply; 13+ messages in thread
From: David Woodhouse @ 2009-06-06  7:06 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, johannes

On Fri, 2009-06-05 at 16:24 -0700, Eric W. Biederman wrote:
> The patch below fixes the one bug I know of in with just my changes
> applied.  I don't think it fixes everything with Herberts changes.
> 
> I expect some of this will need to be moved to release to fix the
> select problem on 2.6.30-rc8.

It certainly seems to fix the BUG in unregister_netdevice() which was
the failure mode I was seeing most often in 2.6.30-rc8. Thanks.

Tested-by: David Woodhouse <David.Woodhouse@intel.com>

-- 
dwmw2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tun netns BUG()
  2009-06-06  7:06         ` David Woodhouse
@ 2009-06-08  7:44           ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2009-06-08  7:44 UTC (permalink / raw)
  To: dwmw2; +Cc: ebiederm, netdev, johannes

From: David Woodhouse <dwmw2@infradead.org>
Date: Sat, 06 Jun 2009 08:06:11 +0100

> On Fri, 2009-06-05 at 16:24 -0700, Eric W. Biederman wrote:
>> The patch below fixes the one bug I know of in with just my changes
>> applied.  I don't think it fixes everything with Herberts changes.
>> 
>> I expect some of this will need to be moved to release to fix the
>> select problem on 2.6.30-rc8.
> 
> It certainly seems to fix the BUG in unregister_netdevice() which was
> the failure mode I was seeing most often in 2.6.30-rc8. Thanks.
> 
> Tested-by: David Woodhouse <David.Woodhouse@intel.com>

Applied, thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tun netns BUG()
  2009-06-05 23:24       ` Eric W. Biederman
  2009-06-06  7:06         ` David Woodhouse
@ 2009-07-03  8:35         ` Herbert Xu
  2009-07-03  9:03           ` Herbert Xu
  1 sibling, 1 reply; 13+ messages in thread
From: Herbert Xu @ 2009-07-03  8:35 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: dwmw2, netdev, johannes

Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> --- linux-2.6.28.x86_64-old/drivers/net/tun.c   2009-05-06 15:01:56.000000000 -0700
> +++ linux-2.6.28.x86_64/drivers/net/tun.c       2009-05-06 15:10:09.000000000 -0700
> @@ -1194,21 +1194,22 @@
> static int tun_chr_close(struct inode *inode, struct file *file)
> {
>        struct tun_file *tfile = file->private_data;
> -       struct tun_struct *tun = __tun_get(tfile);
> +       struct tun_struct *tun;
> 
> 
> +       rtnl_lock();
> +       tun = __tun_get(tfile);
>        if (tun) {
>                DBG(KERN_INFO "%s: tun_chr_close\n", tun->dev->name);
> 
> -               rtnl_lock();
>                __tun_detach(tun);
> 
>                /* If desireable, unregister the netdevice. */
>                if (!(tun->flags & TUN_PERSIST))
>                        unregister_netdevice(tun->dev);
> 
> -               rtnl_unlock();
>        }
> +       rtnl_unlock();

Sorry, catching up with emails (hmm, maybe I should stop doing
that and read some new emails :)

This just turns a wide-open race into a less likely one (that's
why it appears to fix the problem).  The crux of the issue is that 
__tun_get(tfile) != NULL has nothing to do with whether the device
has been unregistered.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tun netns BUG()
  2009-07-03  8:35         ` Herbert Xu
@ 2009-07-03  9:03           ` Herbert Xu
  2009-07-03 14:52             ` Eric W. Biederman
  0 siblings, 1 reply; 13+ messages in thread
From: Herbert Xu @ 2009-07-03  9:03 UTC (permalink / raw)
  To: Eric W. Biederman, David S. Miller; +Cc: dwmw2, netdev, johannes

On Fri, Jul 03, 2009 at 04:35:01PM +0800, Herbert Xu wrote:
> 
> This just turns a wide-open race into a less likely one (that's
> why it appears to fix the problem).  The crux of the issue is that 
> __tun_get(tfile) != NULL has nothing to do with whether the device
> has been unregistered.

tun: Fix device unregister race

It is currently possible for an asynchronous device unregister
to cause the same tun device to be unregistered twice.  This
is because the unregister in tun_chr_close only checks whether
__tun_get(tfile) != NULL.  This however has nothing to do with
whether the device has already been unregistered.  All it tells
you is whether __tun_detach has been called.

This patch fixes this by using the most obvious thing to test
whether the device has been unregistered.

It also moves __tun_detach outside of rtnl_unlock since nothing
that it does requires that lock.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 11a0ba4..b393536 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1324,20 +1324,22 @@ static int tun_chr_close(struct inode *inode, struct file *file)
 	struct tun_file *tfile = file->private_data;
 	struct tun_struct *tun;
 
-
-	rtnl_lock();
 	tun = __tun_get(tfile);
 	if (tun) {
-		DBG(KERN_INFO "%s: tun_chr_close\n", tun->dev->name);
+		struct net_device *dev = tun->dev;
+
+		DBG(KERN_INFO "%s: tun_chr_close\n", dev->name);
 
 		__tun_detach(tun);
 
 		/* If desireable, unregister the netdevice. */
-		if (!(tun->flags & TUN_PERSIST))
-			unregister_netdevice(tun->dev);
-
+		if (!(tun->flags & TUN_PERSIST)) {
+			rtnl_lock();
+			if (dev->reg_state == NETREG_REGISTERED)
+				unregister_netdevice(dev);
+			rtnl_unlock();
+		}
 	}
-	rtnl_unlock();
 
 	tun = tfile->tun;
 	if (tun)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: tun netns BUG()
  2009-07-03  9:03           ` Herbert Xu
@ 2009-07-03 14:52             ` Eric W. Biederman
  2009-07-03 15:25               ` Herbert Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2009-07-03 14:52 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, dwmw2, netdev, johannes

Herbert Xu <herbert@gondor.apana.org.au> writes:

> On Fri, Jul 03, 2009 at 04:35:01PM +0800, Herbert Xu wrote:
>> 
>> This just turns a wide-open race into a less likely one (that's
>> why it appears to fix the problem).  The crux of the issue is that 
>> __tun_get(tfile) != NULL has nothing to do with whether the device
>> has been unregistered.

Not so.

unregister_netdevice
  rollback_registered
    tun_net_unit
      __tun_detach.


Further we need rtnl_lock around __tun_detach.

Eric



> tun: Fix device unregister race
>
> It is currently possible for an asynchronous device unregister
> to cause the same tun device to be unregistered twice.  This
> is because the unregister in tun_chr_close only checks whether
> __tun_get(tfile) != NULL.  This however has nothing to do with
> whether the device has already been unregistered.  All it tells
> you is whether __tun_detach has been called.
>
> This patch fixes this by using the most obvious thing to test
> whether the device has been unregistered.
>
> It also moves __tun_detach outside of rtnl_unlock since nothing
> that it does requires that lock.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 11a0ba4..b393536 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1324,20 +1324,22 @@ static int tun_chr_close(struct inode *inode, struct file *file)
>  	struct tun_file *tfile = file->private_data;
>  	struct tun_struct *tun;
>  
> -
> -	rtnl_lock();
>  	tun = __tun_get(tfile);
>  	if (tun) {
> -		DBG(KERN_INFO "%s: tun_chr_close\n", tun->dev->name);
> +		struct net_device *dev = tun->dev;
> +
> +		DBG(KERN_INFO "%s: tun_chr_close\n", dev->name);
>  
>  		__tun_detach(tun);
>  
>  		/* If desireable, unregister the netdevice. */
> -		if (!(tun->flags & TUN_PERSIST))
> -			unregister_netdevice(tun->dev);
> -
> +		if (!(tun->flags & TUN_PERSIST)) {
> +			rtnl_lock();
> +			if (dev->reg_state == NETREG_REGISTERED)
> +				unregister_netdevice(dev);
> +			rtnl_unlock();
> +		}
>  	}
> -	rtnl_unlock();
>  
>  	tun = tfile->tun;
>  	if (tun)
>
> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tun netns BUG()
  2009-07-03 14:52             ` Eric W. Biederman
@ 2009-07-03 15:25               ` Herbert Xu
  2009-07-06  2:01                 ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Herbert Xu @ 2009-07-03 15:25 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, dwmw2, netdev, johannes

Eric W. Biederman <ebiederm@xmission.com> wrote:
> 
> Not so.
> 
> unregister_netdevice
>  rollback_registered
>    tun_net_unit
>      __tun_detach.
> 
> 
> Further we need rtnl_lock around __tun_detach.

No we don't, tfile->count prevents this from occuring.  The async
path will only __tun_detach if the count hits zero, in which case
the first if clause in tun_chr_close will fail.  Conversely, if
we're in the first if clause in tun_chr_close, then the async
path either didn't execute at all or did not call __tun_detach.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: tun netns BUG()
  2009-07-03 15:25               ` Herbert Xu
@ 2009-07-06  2:01                 ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2009-07-06  2:01 UTC (permalink / raw)
  To: herbert; +Cc: ebiederm, dwmw2, netdev, johannes

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 3 Jul 2009 23:25:47 +0800

> Eric W. Biederman <ebiederm@xmission.com> wrote:
>> 
>> Not so.
>> 
>> unregister_netdevice
>>  rollback_registered
>>    tun_net_unit
>>      __tun_detach.
>> 
>> 
>> Further we need rtnl_lock around __tun_detach.
> 
> No we don't, tfile->count prevents this from occuring.  The async
> path will only __tun_detach if the count hits zero, in which case
> the first if clause in tun_chr_close will fail.  Conversely, if
> we're in the first if clause in tun_chr_close, then the async
> path either didn't execute at all or did not call __tun_detach.

I've applied Herbert's patch, thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2009-07-06  2:01 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-04 19:05 tun netns BUG() David Woodhouse
2009-06-05  1:19 ` Eric W. Biederman
2009-06-05  8:42   ` David Woodhouse
2009-06-05  9:17     ` David Woodhouse
2009-06-05 23:24       ` Eric W. Biederman
2009-06-06  7:06         ` David Woodhouse
2009-06-08  7:44           ` David Miller
2009-07-03  8:35         ` Herbert Xu
2009-07-03  9:03           ` Herbert Xu
2009-07-03 14:52             ` Eric W. Biederman
2009-07-03 15:25               ` Herbert Xu
2009-07-06  2:01                 ` David Miller
2009-06-05 23:20     ` Eric W. Biederman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.