* [PATCH] Fix repeatable Oops on container destroy with conntrack
@ 2011-09-10 18:48 ` Alex Bligh
From: Alex Bligh @ 2011-09-10 18:48 UTC (permalink / raw)
  To: netfilter-devel, netfilter, coreteam, linux-kernel, containers,
	Linux Containers, Alexey Dobriyan
  Cc: Alex Bligh

Problem:

A repeatable Oops can be caused if a container with networking
unshared is destroyed when it has nf_conntrack entries yet to expire.

A copy of the oops follows below. A perl program generating the oops
repeatably is attached inline below.

Analysis:

The oops is triggered from cleanup_net when the namespace is
destroyed. conntrack iterates through the outstanding entries and calls
death_by_timeout on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nfnetlink_has_listeners, which
oopses because net->nfnl is NULL.

The perl program generates the container through fork() then
unshare(CLONE_NEWNET). It does not explicitly set up netlink, but I
presume it was set up, else net->nfnl would have been NULL earlier
(i.e. when an earlier connection timed out). This would thus suggest
that net->nfnl is made NULL during the destruction of the container,
which I think is done by nfnetlink_net_exit_batch.

I can see that the various subsystems are deinitialised in the opposite
order to which the relevant register_pernet_subsys calls are called,
and both nf_conntrack and nfnetlink_net_ops register their relevant
subsystems. If nfnetlink_net_ops were registered later than nf_conntrack,
then its exit routine would be called first, which would cause
the oops described. I am not sure there is anything to prevent this
happening in a container environment.
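
The ordering argument above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: every name in it
(model_net, model_pernet_ops, model_register, model_run) is hypothetical
and is not kernel API. It assumes, as described above, that exit routines
run in the reverse of registration order, and it counts the places where
the unguarded code would dereference a NULL net->nfnl instead of actually
crashing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: none of these types are the kernel's. */
struct model_net {
	void *nfnl;           /* stands in for net->nfnl */
	int pending_entries;  /* conntrack entries still waiting to expire */
	int events_dropped;   /* events skipped by the NULL check */
	int null_derefs;      /* points where unguarded code would oops */
};

struct model_pernet_ops {
	const char *name;
	void (*init)(struct model_net *net);
	void (*exit)(struct model_net *net);
};

#define MAX_OPS 8
static const struct model_pernet_ops *ops_list[MAX_OPS];
static size_t ops_count;
static int guard_enabled;  /* 1 = model the proposed NULL check */
static int dummy_sock;     /* placeholder object for a live netlink socket */

/* Registration appends to the list, so order of calls is order of init. */
static void model_register(const struct model_pernet_ops *ops)
{
	assert(ops_count < MAX_OPS);
	ops_list[ops_count++] = ops;
}

static void nfnl_init(struct model_net *net) { net->nfnl = &dummy_sock; }
static void nfnl_exit(struct model_net *net) { net->nfnl = NULL; }

/* Models the event callback: optionally bail out early if nfnl is gone. */
static void ct_event(struct model_net *net)
{
	if (guard_enabled && !net->nfnl) {
		net->events_dropped++;
		return;
	}
	if (!net->nfnl)
		net->null_derefs++;  /* the real kernel would oops here */
}

static void ct_init(struct model_net *net) { net->pending_entries = 3; }

/* Models namespace teardown: every remaining entry fires a destroy event. */
static void ct_exit(struct model_net *net)
{
	while (net->pending_entries > 0) {
		net->pending_entries--;
		ct_event(net);
	}
}

static const struct model_pernet_ops nfnl_ops = { "nfnetlink", nfnl_init, nfnl_exit };
static const struct model_pernet_ops ct_ops = { "nf_conntrack", ct_init, ct_exit };

/* Set up one namespace, then tear it down in reverse registration order. */
struct model_net model_run(int nfnl_registered_last, int guard)
{
	struct model_net net;
	size_t i;

	memset(&net, 0, sizeof(net));
	ops_count = 0;
	guard_enabled = guard;

	if (nfnl_registered_last) {
		model_register(&ct_ops);
		model_register(&nfnl_ops);
	} else {
		model_register(&nfnl_ops);
		model_register(&ct_ops);
	}

	for (i = 0; i < ops_count; i++)
		ops_list[i]->init(&net);
	for (i = ops_count; i-- > 0; )
		ops_list[i]->exit(&net);

	return net;
}
```

Calling model_run(1, 0) models nfnetlink registering after conntrack with
no check, and every pending entry lands in null_derefs. model_run(1, 1)
adds the check and the same entries land in events_dropped instead. With
the opposite registration order, model_run(0, 0), neither counter moves.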

Whilst there's perhaps a more complex problem revolving around ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check for net->nfnl
being non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure any lock is held at this point or
how synchronisation for subsystem deinitialization works).

Patch:

The patch attached should apply on everything from 2.6.26 (if not before)
onwards; it appears to be a problem on all kernels. This was taken against
Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
with the perl script below for 15 minutes or so; the perl script hung the
machine within 20 seconds without this patch.

Applicability:

If this is the right solution, it should be applied to all stable kernels
as well as head. Apart from the minor overhead of checking one variable
against NULL, it can never 'do the wrong thing', because without the
check a NULL net->nfnl will inevitably result in an oops. Therefore,
checking is a reasonable thing to do unless it can be proven that
net->nfnl will never be NULL.

-- 
Alex Bligh

Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on container destroy

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 net/netfilter/nf_conntrack_netlink.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 482e90c..0790d0a 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
 		return 0;

 	net = nf_ct_net(ct);
+
+	/* container deinit, netlink may have died before death_by_timeout */
+	if (!net->nfnl)
+		return 0;
+
 	if (!item->report && !nfnetlink_has_listeners(net, group))
 		return 0;

-- 
1.7.5.4


Perl script to replicate the bug (and demonstrate the fix)

#!/usr/bin/perl

# Prerequisites:
# Install Linux::Unshare from CPAN
# Ensure conntrack is installed

use strict;
use warnings;

use POSIX "setsid";
use Linux::Unshare qw(unshare :clone); # get this from CPAN

# Parent returns PID, child returns 0
sub daemonize {
    chdir("/")                      || die "can't chdir to /: $!";
    open(STDIN,  "< /dev/null")     || die "can't read /dev/null: $!";
    open(STDOUT, "> /dev/null")     || die "can't write to /dev/null: $!";
    defined(my $pid = fork())       || die "can't fork: $!";
    return $pid if $pid;               # non-zero now means I am the parent
    (setsid() != -1)                || die "Can't start a new session: $!";
    open(STDERR, ">&STDOUT")        || die "can't dup stdout: $!";
    return 0;
}

sub docontainer
{
    print STDERR "Child: container starting\n";
    sleep (5);
    print STDERR "Child: setting ip addresses\n";
    system("ip link set vethr up");
    system("ip link show");
    system("ip addr add 10.99.99.2/24 dev vethr");
    system("ip addr add 127.0.0.1/8 dev lo");
    system("ip link set lo up");
    system("echo 1 > /proc/sys/net/ipv4/ip_forward");
    system("echo 1 > /proc/sys/net/ipv4/conf/all/forwarding");
    system("echo 1 > /proc/sys/net/ipv6/conf/all/forwarding");

    system("echo 2 > /proc/sys/net/ipv4/conf/all/rp_filter");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/accept_source_route");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/accept_redirects");
    system("echo 1 > /proc/sys/net/ipv4/conf/all/proxy_arp");
    system("iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT");
    system("iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT");
    print STDERR "Child: pinging parent and running conntrack\n";
    while (1)
    {
	system("conntrack -L");
	system("ping -n -c1 10.99.99.2");
	sleep(1);
    }
    exit(0);
}

sub startcontainer
{
    system("ip link add vethl type veth peer name vethr");
    system("ip addr add 10.99.99.1/24 dev vethl");
    system("ip link set vethl up");

    print "Parent: Start container\n";

    defined(my $cpid = fork())       || die "can't fork: $!";
	
    if (!$cpid)
    {
	print STDERR "Child: starting\n";
	unshare(CLONE_NEWNET);
	docontainer();
	exit 0;
    }
	
    print STDERR "Parent: Container started pid $cpid\n";
    system("ip addr show | fgrep veth");
    sleep(1);
    print STDERR "Parent: running ip link set vethr netns $cpid\n";
    (system("ip link set vethr netns $cpid") ==0) || print STDERR "WARNING!: ip link netns failed\n";
    sleep(1);
    system("ip addr show | fgrep veth");
    print STDERR "Parent: Moved vethr, parent pinging child\n";
    system("ping -n -c5 10.99.99.2");
    print STDERR "Parent: Moved vethr, ping done\n";
    sleep(2);
    system("kill -KILL $cpid");
}

while (1)
{
    startcontainer;
}


The oops:

root@node-10-157-128-100:~# uname -a
Linux node-10-157-128-100 3.0.0-10-server #16-Ubuntu SMP Fri Sep 2 18:51:05 UTC 2011 x86_64 GNU/Linux


Sep  9 14:30:56 node-10-157-128-100 kernel: [79418.880263] IN=evrr-000000 OUT= MAC=33:33:00:00:00:01:00:15:17:a6:cd:49:86:dd SRC=fe80:0000:0000:0000:0215:17ff:fea6:cd49 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:30:56 node-10-157-128-100 kernel: [79418.880332] IN=evrr-000008 OUT= MAC=33:33:00:00:00:01:0e:f1:9d:53:b5:1e:86:dd SRC=fe80:0000:0000:0000:58e8:01ff:fe0b:0db2 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:33:01 node-10-157-128-100 kernel: [79544.160335] IN=evrr-000000 OUT= MAC=33:33:00:00:00:01:00:15:17:a6:cd:49:86:dd SRC=fe80:0000:0000:0000:0215:17ff:fea6:cd49 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:33:01 node-10-157-128-100 kernel: [79544.160417] IN=evrr-000008 OUT= MAC=33:33:00:00:00:01:0e:f1:9d:53:b5:1e:86:dd SRC=fe80:0000:0000:0000:58e8:01ff:fe0b:0db2 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780106] BUG: unable to handle kernel NULL pointer dereference at 0000000000000274
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780274] IP: [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780398] PGD 0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780441] Oops: 0000 [#1] SMP
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780515] CPU 7
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780547] Modules linked in: ipt_REJECT nf_conntrack_netlink nfnetlink dm_round_robin dm_multipath veth ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_limit xt_state xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables ebt_ip ebtable_filter ebtables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding usb_storage uas joydev usbhid hid radeon ttm ses enclosure drm_kms_helper drm kvm_amd psmouse kvm amd64_edac_mod dcdbas serio_raw e1000e megaraid_sas edac_core i2c_algo_bit k8temp edac_mce_amd pata_serverworks shpchp i2c_piix4 bridge 8021q garp stp ixgbe dca mdio [last unloaded: multipath]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.786143]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Pid: 25711, comm: kworker/u:2 Not tainted 3.0.0-10-server #16-Ubuntu Dell Inc. PowerEdge 6950/0GK775
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RIP: 0010:[<ffffffff81511959>]  [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RSP: 0018:ffff880801109c00  EFLAGS: 00010246
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RAX: 0000000000000004 RBX: ffff880c0f796af8 RCX: 000000000000ffff
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RDX: ffff8803f1700000 RSI: 0000000000000003 RDI: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RBP: ffff880801109c00 R08: ffff880801108000 R09: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] R10: 0000000000000001 R11: dead000000200200 R12: ffff880801109cb0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] R13: ffff880c0f796af8 R14: ffff880c0f796af8 R15: 0000000000000004
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] FS: 00007fe2d2d57710(0000) GS:ffff881027d00000(0000) knlGS:0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] CR2: 0000000000000274 CR3: 0000000001c03000 CR4: 00000000000006e0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Process kworker/u:2 (pid: 25711, threadinfo ffff880801108000, task ffff88080f5f9720)
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880801109c10 ffffffffa048f145 ffff880801109c90 ffffffffa049943b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880801109c70 7fffffffffffffff ffff8803f1700000 0000000300000004
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880800000000 ffffffff00000002 ffff880c0fc80000 ffff880c0f796b98
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Call Trace:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa048f145>] nfnetlink_has_listeners+0x15/0x20 [nfnetlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa049943b>] ctnetlink_conntrack_event+0x5cb/0x890 [nf_conntrack_netlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e34d0>] ? net_drop_ns+0x50/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa04062d8>] death_by_timeout+0xc8/0x1c0 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0405270>] ? nf_conntrack_attach+0x50/0x50 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0406448>] nf_ct_iterate_cleanup+0x78/0x90 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0406491>] nf_conntrack_cleanup_net+0x31/0x100 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0407f97>] nf_conntrack_cleanup+0x27/0x60 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa04081f0>] nf_conntrack_net_exit+0x60/0x80 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e2d28>] ops_exit_list.isra.1+0x38/0x60
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e35e2>] cleanup_net+0x112/0x1b0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107bb0a>] process_one_work+0x11a/0x480
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107c7d5>] worker_thread+0x165/0x370
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107c670>] ? manage_workers.isra.30+0x130/0x130
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff81080c1c>] kthread+0x8c/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81607c24>] kernel_thread_helper+0x4/0x10
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81080b90>] ? flush_kthread_worker+0xa0/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81607c20>] ? gs_change+0x13/0x13
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] Code: 00 00 48 85 f6 74 0c 48 83 ee 01 48 89 df e8 cf f6 ff ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66 90 <f6> 87 74 02 00 00 01 74 30 0f b6 87 21 01 00 00 4c 8b 05 38 8d
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] RIP [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699]  RSP <ffff880801109c00>
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] CR2: 0000000000000274
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954892] ---[ end trace 73540474560834fd ]---
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954941] BUG: unable to handle kernel paging request at fffffffffffffff8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954945] IP: [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954950] PGD 1c05067 PUD 1c06067 PMD 0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954954] Oops: 0000 [#2] SMP
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954957] CPU 7
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954959] Modules linked in: ipt_REJECT nf_conntrack_netlink nfnetlink dm_round_robin dm_multipath veth ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_limit xt_state xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables ebt_ip ebtable_filter ebtables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding usb_storage uas joydev usbhid hid radeon ttm ses enclosure drm_kms_helper drm kvm_amd psmouse kvm amd64_edac_mod dcdbas serio_raw e1000e megaraid_sas edac_core i2c_algo_bit k8temp edac_mce_amd pata_serverworks shpchp i2c_piix4 bridge 8021q garp stp ixgbe dca mdio [last unloaded: multipath]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955032]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955034] Pid: 25711, comm: kworker/u:2 Tainted: G      D     3.0.0-10-server #16-Ubuntu Dell Inc. PowerEdge 6950/0GK775
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955040] RIP: 0010:[<ffffffff810810b1>]  [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955046] RSP: 0018:ffff880801109870  EFLAGS: 00010096
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955048] RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000007
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955051] RDX: 0000000000000007 RSI: 0000000000000007 RDI: ffff88080f5f9720
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955054] RBP: ffff880801109888 R08: 0000000000989680 R09: 0000000000000001
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955057] R10: 0000000000000400 R11: ffff88080f65c118 R12: 0000000000000007
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955060] R13: ffff88080f5f9ae8 R14: 0000000000000000 R15: 0000000000000246
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955064] FS: 00007fe2d2d57710(0000) GS:ffff881027d00000(0000) knlGS:0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955067] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955070] CR2: fffffffffffffff8 CR3: 0000000001c03000 CR4: 00000000000006e0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955073] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955076] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955079] Process kworker/u:2 (pid: 25711, threadinfo ffff880801108000, task ffff88080f5f9720)
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955082] Stack:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955083] ffffffff8107cd25 ffff880801109888 ffff881027d12a40 ffff880801109908
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955089] ffffffff815fc737 ffff88080f5f9720 0000000000000000 ffff880801109fd8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955094] ffff880801109fd8 ffff880801109fd8 0000000000012a40 0000000000000011
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955100] Call Trace:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955103] [<ffffffff8107cd25>] ? wq_worker_sleeping+0x15/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955108] [<ffffffff815fc737>] schedule+0x637/0x770
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955114] [<ffffffff81063053>] do_exit+0x273/0x440
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955119] [<ffffffff815ffbd0>] oops_end+0xb0/0xf0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955124] [<ffffffff815e7104>] no_context+0x145/0x152
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955128] [<ffffffff815e729f>] __bad_area_nosemaphore+0x18e/0x1b1
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955133] [<ffffffff815e72d5>] bad_area_nosemaphore+0x13/0x15
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955138] [<ffffffff816024fd>] do_page_fault+0x43d/0x530
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955144] [<ffffffff8100969a>] ? __switch_to+0xca/0x310
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955148] [<ffffffff815fe73e>] ? _raw_spin_lock+0xe/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955154] [<ffffffff8104e749>] ? finish_task_switch+0x49/0xf0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955158] [<ffffffff815fef15>] page_fault+0x25/0x30
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955162] [<ffffffff81511959>] ? netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955167] [<ffffffffa048f145>] nfnetlink_has_listeners+0x15/0x20 [nfnetlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955172] [<ffffffffa049943b>] ctnetlink_conntrack_event+0x5cb/0x890 [nf_conntrack_netlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955177] [<ffffffff814e34d0>] ? net_drop_ns+0x50/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955183] [<ffffffffa04062d8>] death_by_timeout+0xc8/0x1c0 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955189] [<ffffffffa0405270>] ? nf_conntrack_attach+0x50/0x50 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955195] [<ffffffffa0406448>] nf_ct_iterate_cleanup+0x78/0x90 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955202] [<ffffffffa0406491>] nf_conntrack_cleanup_net+0x31/0x100 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955209] [<ffffffffa0407f97>] nf_conntrack_cleanup+0x27/0x60 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955215] [<ffffffffa04081f0>] nf_conntrack_net_exit+0x60/0x80 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955220] [<ffffffff814e2d28>] ops_exit_list.isra.1+0x38/0x60
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955224] [<ffffffff814e35e2>] cleanup_net+0x112/0x1b0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955229] [<ffffffff8107bb0a>] process_one_work+0x11a/0x480
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955233] [<ffffffff8107c7d5>] worker_thread+0x165/0x370
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955237] [<ffffffff8107c670>] ? manage_workers.isra.30+0x130/0x130
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955241] [<ffffffff81080c1c>] kthread+0x8c/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955245] [<ffffffff81607c24>] kernel_thread_helper+0x4/0x10
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955250] [<ffffffff81080b90>] ? flush_kthread_worker+0xa0/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955254] [<ffffffff81607c20>] ? gs_change+0x13/0x13
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955256] Code: 41 5f 5d c3 be 3e 01 00 00 48 c7 c7 88 90 9e 81 e8 b5 d7 fd ff e9 74 fe ff ff 55 48 89 e5 66 66 66 66 90 48 8b 87 70 03 00 00 5d
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955311] RIP [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955315]  RSP <ffff880801109870>
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955317] CR2: fffffffffffffff8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955319] ---[ end trace 73540474560834fe ]---
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955322] Fixing recursive fault but reboot is needed!
Sep  9 14:35:00 node-10-157-128-100 kernel: [79663.120046] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=6002 jiffies)
Sep  9 14:35:05 node-10-157-128-100 kernel: [79667.970047] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=6002 jiffies)
Sep  9 14:38:00 node-10-157-128-100 kernel: [79843.440048] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=24034 jiffies)
Sep  9 14:38:05 node-10-157-128-100 kernel: [79848.290046] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=24034 jiffies)
Sep  9 14:41:00 node-10-157-128-100 kernel: [80023.760051] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=42066 jiffies)
Sep  9 14:41:05 node-10-157-128-100 kernel: [80028.610044] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=42066 jiffies)
Sep  9 14:44:01 node-10-157-128-100 kernel: [80204.080047] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=60098 jiffies)
Sep  9 14:44:06 node-10-157-128-100 kernel: [80208.930045] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=60098 jiffies)
Sep  9 14:44:44 node-10-157-128-100 kernel: Kernel logging (proc) stopped.
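
As a cross-check on the first oops above: the "Code:" line marks the
faulting instruction bytes as f6 87 74 02 00 00 01, the register dump
shows RDI = 0, and CR2 is 0x274. Those bytes decode as
"testb $0x1,0x274(%rdi)", so the fault address is simply RDI plus the
displacement, which is consistent with netlink_has_listeners being handed
a NULL pointer in its first argument. The sketch below redoes that
arithmetic. The helper names are hypothetical and it handles only this
one instruction form (opcode f6 with ModRM byte 87); it is not a general
disassembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode "testb $imm8, disp32(%rdi)": opcode 0xf6, ModRM 0x87
 * (mod=10, reg=000, rm=111), a little-endian 32-bit displacement,
 * then an 8-bit immediate. Returns 0 on success, -1 otherwise.
 */
int decode_testb_rdi_disp32(const uint8_t *insn, size_t len,
			    uint32_t *disp, uint8_t *imm)
{
	assert(insn && disp && imm);
	if (len < 7 || insn[0] != 0xf6 || insn[1] != 0x87)
		return -1;
	*disp = (uint32_t)insn[2] |
		((uint32_t)insn[3] << 8) |
		((uint32_t)insn[4] << 16) |
		((uint32_t)insn[5] << 24);
	*imm = insn[6];
	return 0;
}

/* Effective address: base register plus sign-extended displacement. */
uint64_t fault_address(uint64_t rdi, uint32_t disp)
{
	return rdi + (uint64_t)(int64_t)(int32_t)disp;
}
```

Feeding it the seven bytes from the "Code:" line and RDI = 0 from the
register dump gives a displacement of 0x274 and a fault address of 0x274,
matching CR2 in the log.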




^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH] Fix repeatable Oops on container destroy with conntrack
@ 2011-09-10 18:48 ` Alex Bligh
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Bligh @ 2011-09-10 18:48 UTC (permalink / raw)
  To: netfilter-devel, netfilter, coreteam, linux-kernel, containers,
	Linux Containers
  Cc: Alex Bligh

Problem:

A repeatable Oops can be caused if a container with networking
unshared is destroyed when it has nf_conntrack entries yet to expire.

A copy of the oops follows below. A perl program generating the oops
repeatably is attached inline below.

Analysis:

The oops is called from cleanup_net when the namespace is
destroyed. conntrack iterates through outstanding events and calls
death_by_timeout on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nf_netlink_has_listeners, which
oopses because net->nfnl is NULL.

The perl program generates the container through fork() then
clone(NS_NEWNET). I does not explicitly	set up netlink
explicitly set up netlink, but I presume it was set up else net->nfnl
would have been NULL earlier (i.e. when an earlier connection
timed out). This would thus suggest that net->nfnl is made NULL
during the destruction of the container, which I think is done by
nfnetlink_net_exit_batch.

I can see that the various subsystems are deinitialised in the opposite
order to which the relevant register_pernet_subsys calls are called,
and both nf_conntrack and nfnetlink_net_ops register their relevant
subsystems. If nfnetlink_net_ops registered later than nfconntrack,
then its exit routine would have been called first, which would cause
the oops described. I am not sure there is anything to prevent this
happening in a container environment.

Whilst there's perhaps a more complex problem revolving around ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check for net->nfnl
being non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure any lock is held at this point or
how synchronisation for subsystem deinitialization works).

Patch:

The patch attached should apply on everything from 2.6.26 (if not before)
onwards; it appears to be a problem on all kernels. This was taken against
Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
with the above perl script for 15 minutes or so; the perl script hung the
machine within 20 seconds without this patch.

Applicability:

If this is the right solution, it should be applied to all stable kernels
as well as head. Apart from the minor overhead of checking one variable
against NULL, it can never 'do the wrong thing', because if net->nfnl
is NULL, an oops will inevitably result. Therefore, checking is a reasonable
thing to do unless it can be proven than net->nfnl will never be NULL.

-- 
Alex Bligh

Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on container destroy

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 net/netfilter/nf_conntrack_netlink.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 482e90c..0790d0a 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
 		return 0;

 	net = nf_ct_net(ct);
+
+	/* container deinit, netlink may have died before death_by_timeout */
+	if (!net->nfnl)
+		return 0;
+
 	if (!item->report && !nfnetlink_has_listeners(net, group))
 		return 0;

-- 
1.7.5.4


Perl script to replicate bug (and demonstrate fixed)

#!/usr/bin/perl

# Reprequisites:
# Install Linux::Unshare from CPAN
# Ensure conntrack is installed

use strict;
use warnings;

use POSIX "setsid";
use Linux::Unshare qw(unshare :clone); # get this from CPAN

# Parent returns PID, child returns 0
sub daemonize {
    chdir("/")                      || die "can't chdir to /: $!";
    open(STDIN,  "< /dev/null")     || die "can't read /dev/null: $!";
    open(STDOUT, "> /dev/null")     || die "can't write to /dev/null: $!";
    defined(my $pid = fork())       || die "can't fork: $!";
    return $pid if $pid;               # non-zero now means I am the parent
    (setsid() != -1)                || die "Can't start a new session: $!";
    open(STDERR, ">&STDOUT")        || die "can't dup stdout: $!";
    return 0;
}

sub docontainer
{
    print STDERR "Child: container starting\n";
    sleep (5);
    print STDERR "Child: setting ip addresses\n";
    system("ip link set vethr up");
    system("ip link show");
    system("ip addr add 10.99.99.2/24 dev vethr");
    system("ip addr add 127.0.0.1/8 dev lo");
    system("ip link set lo up");
    system("echo 1 > /proc/sys/net/ipv4/ip_forward");
    system("echo 1 > /proc/sys/net/ipv4/conf/all/forwarding");
    system("echo 1 > /proc/sys/net/ipv6/conf/all/forwarding");

    system("echo 2 > /proc/sys/net/ipv4/conf/all/rp_filter");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/accept_source_route");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/accept_redirects");
    system("echo 1 > /proc/sys/net/ipv4/conf/all/proxy_arp");
    system("iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT");
    system("iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT");
    print STDERR "Child: pinging parent and running conntrack\n";
    while (1)
    {
	system("conntrack -L");
	system("ping -n -c1 10.99.99.2");
	sleep(1);
    }
    exit(0);
}

sub startcontainer
{
    system("ip link add vethl type veth peer name vethr");
    system("ip addr add 10.99.99.1/24 dev vethl");
    system("ip link set vethl up");

    print "Parent: Start container\n";

    defined(my $cpid = fork())       || die "can't fork: $!";
	
    if (!$cpid)
    {
	print STDERR "Child: starting\n";
	unshare(CLONE_NEWNET);
	docontainer();
	exit 0;
    }
	
    print STDERR "Parent: Container started pid $cpid\n";
    system("ip addr show | fgrep veth");
    sleep(1);
    print STDERR "Parent: running ip link set vethr netns $cpid\n";
    (system("ip link set vethr netns $cpid") ==0) || print STDERR "WARNING!: ip link netns failed\n";
    sleep(1);
    system("ip addr show | fgrep veth");
    print STDERR "Parent: Moved vethr, parent pinging child\n";
    system("ping -n -c5 10.99.99.2");
    print STDERR "Parent: Moved vethr, ping done\n";
    sleep(2);
    system("kill -KILL $cpid");
}

while (1)
{
    startcontainer;
}


The oops:

root@node-10-157-128-100:~# uname -a
Linux node-10-157-128-100 3.0.0-10-server #16-Ubuntu SMP Fri Sep 2 18:51:05 UTC 2011 x86_64 GNU/Linux


Sep  9 14:30:56 node-10-157-128-100 kernel: [79418.880263] IN=evrr-000000 OUT= MAC=33:33:00:00:00:01:00:15:17:a6:cd:49:86:dd SRC=fe80:0000:0000:0000:0215:17ff:fea6:cd
49 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:30:56 node-10-157-128-100 kernel: [79418.880332] IN=evrr-000008 OUT= MAC=33:33:00:00:00:01:0e:f1:9d:53:b5:1e:86:dd SRC=fe80:0000:0000:0000:58e8:01ff:fe0b:0d
b2 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:33:01 node-10-157-128-100 kernel: [79544.160335] IN=evrr-000000 OUT= MAC=33:33:00:00:00:01:00:15:17:a6:cd:49:86:dd SRC=fe80:0000:0000:0000:0215:17ff:fea6:cd
49 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:33:01 node-10-157-128-100 kernel: [79544.160417] IN=evrr-000008 OUT= MAC=33:33:00:00:00:01:0e:f1:9d:53:b5:1e:86:dd SRC=fe80:0000:0000:0000:58e8:01ff:fe0b:0d
b2 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780106] BUG: unable to handle kernel NULL pointer dereference at 0000000000000274
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780274] IP: [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780398] PGD 0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780441] Oops: 0000 [#1] SMP
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780515] CPU 7
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780547] Modules linked in: ipt_REJECT nf_conntrack_netlink nfnetlink dm_round_robin dm_multipath veth ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_limit xt_state xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables ebt_ip ebtable_filter ebtables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding usb_storage uas joydev usbhid hid radeon ttm ses enclosure drm_kms_helper drm kvm_amd psmouse kvm amd64_edac_mod dcdbas serio_raw e1000e megaraid_sas edac_core i2c_algo_bit k8temp edac_mce_amd pata_serverworks shpchp i2c_piix4 bridge 8021q garp stp ixgbe dca mdio [last unloaded: multipath]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.786143]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Pid: 25711, comm: kworker/u:2 Not tainted 3.0.0-10-server #16-Ubuntu Dell Inc. PowerEdge 6950/0GK775
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RIP: 0010:[<ffffffff81511959>]  [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RSP: 0018:ffff880801109c00  EFLAGS: 00010246
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RAX: 0000000000000004 RBX: ffff880c0f796af8 RCX: 000000000000ffff
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RDX: ffff8803f1700000 RSI: 0000000000000003 RDI: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RBP: ffff880801109c00 R08: ffff880801108000 R09: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] R10: 0000000000000001 R11: dead000000200200 R12: ffff880801109cb0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] R13: ffff880c0f796af8 R14: ffff880c0f796af8 R15: 0000000000000004
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] FS: 00007fe2d2d57710(0000) GS:ffff881027d00000(0000) knlGS:0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] CR2: 0000000000000274 CR3: 0000000001c03000 CR4: 00000000000006e0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Process kworker/u:2 (pid: 25711, threadinfo ffff880801108000, task ffff88080f5f9720)
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880801109c10 ffffffffa048f145 ffff880801109c90 ffffffffa049943b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880801109c70 7fffffffffffffff ffff8803f1700000 0000000300000004
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880800000000 ffffffff00000002 ffff880c0fc80000 ffff880c0f796b98
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Call Trace:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa048f145>] nfnetlink_has_listeners+0x15/0x20 [nfnetlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa049943b>] ctnetlink_conntrack_event+0x5cb/0x890 [nf_conntrack_netlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e34d0>] ? net_drop_ns+0x50/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa04062d8>] death_by_timeout+0xc8/0x1c0 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0405270>] ? nf_conntrack_attach+0x50/0x50 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0406448>] nf_ct_iterate_cleanup+0x78/0x90 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0406491>] nf_conntrack_cleanup_net+0x31/0x100 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0407f97>] nf_conntrack_cleanup+0x27/0x60 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa04081f0>] nf_conntrack_net_exit+0x60/0x80 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e2d28>] ops_exit_list.isra.1+0x38/0x60
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e35e2>] cleanup_net+0x112/0x1b0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107bb0a>] process_one_work+0x11a/0x480
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107c7d5>] worker_thread+0x165/0x370
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107c670>] ? manage_workers.isra.30+0x130/0x130
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff81080c1c>] kthread+0x8c/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81607c24>] kernel_thread_helper+0x4/0x10
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81080b90>] ? flush_kthread_worker+0xa0/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81607c20>] ? gs_change+0x13/0x13
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] Code: 00 00 48 85 f6 74 0c 48 83 ee 01 48 89 df e8 cf f6 ff ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66 90 <f6> 87 74 02 00 00 01 74 30 0f b6 87 21 01 00 00 4c 8b 05 38 8d
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] RIP [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699]  RSP <ffff880801109c00>
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] CR2: 0000000000000274
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954892] ---[ end trace 73540474560834fd ]---
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954941] BUG: unable to handle kernel paging request at fffffffffffffff8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954945] IP: [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954950] PGD 1c05067 PUD 1c06067 PMD 0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954954] Oops: 0000 [#2] SMP
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954957] CPU 7
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954959] Modules linked in: ipt_REJECT nf_conntrack_netlink nfnetlink dm_round_robin dm_multipath veth ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_limit xt_state xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables ebt_ip ebtable_filter ebtables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding usb_storage uas joydev usbhid hid radeon ttm ses enclosure drm_kms_helper drm kvm_amd psmouse kvm amd64_edac_mod dcdbas serio_raw e1000e megaraid_sas edac_core i2c_algo_bit k8temp edac_mce_amd pata_serverworks shpchp i2c_piix4 bridge 8021q garp stp ixgbe dca mdio [last unloaded: multipath]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955032]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955034] Pid: 25711, comm: kworker/u:2 Tainted: G      D     3.0.0-10-server #16-Ubuntu Dell Inc. PowerEdge 6950/0GK775
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955040] RIP: 0010:[<ffffffff810810b1>]  [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955046] RSP: 0018:ffff880801109870  EFLAGS: 00010096
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955048] RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000007
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955051] RDX: 0000000000000007 RSI: 0000000000000007 RDI: ffff88080f5f9720
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955054] RBP: ffff880801109888 R08: 0000000000989680 R09: 0000000000000001
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955057] R10: 0000000000000400 R11: ffff88080f65c118 R12: 0000000000000007
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955060] R13: ffff88080f5f9ae8 R14: 0000000000000000 R15: 0000000000000246
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955064] FS: 00007fe2d2d57710(0000) GS:ffff881027d00000(0000) knlGS:0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955067] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955070] CR2: fffffffffffffff8 CR3: 0000000001c03000 CR4: 00000000000006e0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955073] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955076] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955079] Process kworker/u:2 (pid: 25711, threadinfo ffff880801108000, task ffff88080f5f9720)
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955082] Stack:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955083] ffffffff8107cd25 ffff880801109888 ffff881027d12a40 ffff880801109908
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955089] ffffffff815fc737 ffff88080f5f9720 0000000000000000 ffff880801109fd8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955094] ffff880801109fd8 ffff880801109fd8 0000000000012a40 0000000000000011
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955100] Call Trace:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955103] [<ffffffff8107cd25>] ? wq_worker_sleeping+0x15/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955108] [<ffffffff815fc737>] schedule+0x637/0x770
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955114] [<ffffffff81063053>] do_exit+0x273/0x440
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955119] [<ffffffff815ffbd0>] oops_end+0xb0/0xf0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955124] [<ffffffff815e7104>] no_context+0x145/0x152
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955128] [<ffffffff815e729f>] __bad_area_nosemaphore+0x18e/0x1b1
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955133] [<ffffffff815e72d5>] bad_area_nosemaphore+0x13/0x15
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955138] [<ffffffff816024fd>] do_page_fault+0x43d/0x530
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955144] [<ffffffff8100969a>] ? __switch_to+0xca/0x310
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955148] [<ffffffff815fe73e>] ? _raw_spin_lock+0xe/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955154] [<ffffffff8104e749>] ? finish_task_switch+0x49/0xf0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955158] [<ffffffff815fef15>] page_fault+0x25/0x30
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955162] [<ffffffff81511959>] ? netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955167] [<ffffffffa048f145>] nfnetlink_has_listeners+0x15/0x20 [nfnetlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955172] [<ffffffffa049943b>] ctnetlink_conntrack_event+0x5cb/0x890 [nf_conntrack_netlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955177] [<ffffffff814e34d0>] ? net_drop_ns+0x50/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955183] [<ffffffffa04062d8>] death_by_timeout+0xc8/0x1c0 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955189] [<ffffffffa0405270>] ? nf_conntrack_attach+0x50/0x50 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955195] [<ffffffffa0406448>] nf_ct_iterate_cleanup+0x78/0x90 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955202] [<ffffffffa0406491>] nf_conntrack_cleanup_net+0x31/0x100 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955209] [<ffffffffa0407f97>] nf_conntrack_cleanup+0x27/0x60 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955215] [<ffffffffa04081f0>] nf_conntrack_net_exit+0x60/0x80 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955220] [<ffffffff814e2d28>] ops_exit_list.isra.1+0x38/0x60
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955224] [<ffffffff814e35e2>] cleanup_net+0x112/0x1b0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955229] [<ffffffff8107bb0a>] process_one_work+0x11a/0x480
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955233] [<ffffffff8107c7d5>] worker_thread+0x165/0x370
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955237] [<ffffffff8107c670>] ? manage_workers.isra.30+0x130/0x130
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955241] [<ffffffff81080c1c>] kthread+0x8c/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955245] [<ffffffff81607c24>] kernel_thread_helper+0x4/0x10
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955250] [<ffffffff81080b90>] ? flush_kthread_worker+0xa0/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955254] [<ffffffff81607c20>] ? gs_change+0x13/0x13
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955256] Code: 41 5f 5d c3 be 3e 01 00 00 48 c7 c7 88 90 9e 81 e8 b5 d7 fd ff e9 74 fe ff ff 55 48 89 e5 66 66 66 66 90 48 8b 87 70 03 00 00 5d
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955311] RIP [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955315]  RSP <ffff880801109870>
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955317] CR2: fffffffffffffff8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955319] ---[ end trace 73540474560834fe ]---
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955322] Fixing recursive fault but reboot is needed!
Sep  9 14:35:00 node-10-157-128-100 kernel: [79663.120046] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=6002 jiffies)
Sep  9 14:35:05 node-10-157-128-100 kernel: [79667.970047] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=6002 jiffies)
Sep  9 14:38:00 node-10-157-128-100 kernel: [79843.440048] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=24034 jiffies)
Sep  9 14:38:05 node-10-157-128-100 kernel: [79848.290046] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=24034 jiffies)
Sep  9 14:41:00 node-10-157-128-100 kernel: [80023.760051] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=42066 jiffies)
Sep  9 14:41:05 node-10-157-128-100 kernel: [80028.610044] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=42066 jiffies)
Sep  9 14:44:01 node-10-157-128-100 kernel: [80204.080047] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=60098 jiffies)
Sep  9 14:44:06 node-10-157-128-100 kernel: [80208.930045] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=60098 jiffies)
Sep  9 14:44:44 node-10-157-128-100 kernel: Kernel logging (proc) stopped.





* [PATCH] Fix repeatable Oops on container destroy with conntrack
@ 2011-09-10 18:48 ` Alex Bligh
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Bligh @ 2011-09-10 18:48 UTC (permalink / raw)
  To: netfilter-devel, netfilter, coreteam, linux-kernel, containers,
	Linux Containers
  Cc: Alex Bligh

Problem:

A repeatable Oops can be caused if a container with networking
unshared is destroyed when it has nf_conntrack entries yet to expire.

A copy of the oops follows below. A perl program generating the oops
repeatably is attached inline below.

Analysis:

The oops is triggered from cleanup_net when the namespace is
destroyed. conntrack iterates through the outstanding entries and calls
death_by_timeout on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nfnetlink_has_listeners, which
passes net->nfnl to netlink_has_listeners; that oopses because
net->nfnl is NULL.

The perl program generates the container through fork() then
unshare(CLONE_NEWNET). It does not explicitly set up netlink, but I
presume it was set up, else net->nfnl would have been NULL earlier
(i.e. when an earlier connection timed out). This would thus suggest
that net->nfnl is made NULL during the destruction of the container,
which I think is done by nfnetlink_net_exit_batch.

I can see that the various subsystems are deinitialised in the reverse
of the order in which the relevant register_pernet_subsys calls were made,
and both nf_conntrack and nfnetlink_net_ops register their relevant
subsystems. If nfnetlink_net_ops registered later than nf_conntrack,
then its exit routine would have been called first, which would cause
the oops described. I am not sure there is anything to prevent this
happening in a container environment.

Whilst there's perhaps a more complex problem revolving around ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check for net->nfnl
being non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure any lock is held at this point or
how synchronisation for subsystem deinitialization works).

Patch:

The patch attached should apply on everything from 2.6.26 (if not before)
onwards; it appears to be a problem on all kernels. This was taken against
Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
with the perl script below for 15 minutes or so; that script hung the
machine within 20 seconds without this patch.

Applicability:

If this is the right solution, it should be applied to all stable kernels
as well as head. Apart from the minor overhead of checking one variable
against NULL, it can never 'do the wrong thing', because if net->nfnl
is NULL, an oops will inevitably result without the check. Therefore,
checking is a reasonable thing to do unless it can be proven that
net->nfnl will never be NULL.

-- 
Alex Bligh

Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on container destroy

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 net/netfilter/nf_conntrack_netlink.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 482e90c..0790d0a 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
 		return 0;

 	net = nf_ct_net(ct);
+
+	/* container deinit, netlink may have died before death_by_timeout */
+	if (!net->nfnl)
+		return 0;
+
 	if (!item->report && !nfnetlink_has_listeners(net, group))
 		return 0;

-- 
1.7.5.4


Perl script to replicate the bug (and demonstrate the fix):

#!/usr/bin/perl

# Prerequisites:
# Install Linux::Unshare from CPAN
# Ensure conntrack is installed

use strict;
use warnings;

use POSIX "setsid";
use Linux::Unshare qw(unshare :clone); # get this from CPAN

# Parent returns PID, child returns 0
sub daemonize {
    chdir("/")                      || die "can't chdir to /: $!";
    open(STDIN,  "< /dev/null")     || die "can't read /dev/null: $!";
    open(STDOUT, "> /dev/null")     || die "can't write to /dev/null: $!";
    defined(my $pid = fork())       || die "can't fork: $!";
    return $pid if $pid;               # non-zero now means I am the parent
    (setsid() != -1)                || die "Can't start a new session: $!";
    open(STDERR, ">&STDOUT")        || die "can't dup stdout: $!";
    return 0;
}

sub docontainer
{
    print STDERR "Child: container starting\n";
    sleep (5);
    print STDERR "Child: setting ip addresses\n";
    system("ip link set vethr up");
    system("ip link show");
    system("ip addr add 10.99.99.2/24 dev vethr");
    system("ip addr add 127.0.0.1/8 dev lo");
    system("ip link set lo up");
    system("echo 1 > /proc/sys/net/ipv4/ip_forward");
    system("echo 1 > /proc/sys/net/ipv4/conf/all/forwarding");
    system("echo 1 > /proc/sys/net/ipv6/conf/all/forwarding");

    system("echo 2 > /proc/sys/net/ipv4/conf/all/rp_filter");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/accept_source_route");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/accept_redirects");
    system("echo 1 > /proc/sys/net/ipv4/conf/all/proxy_arp");
    system("iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT");
    system("iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT");
    print STDERR "Child: pinging parent and running conntrack\n";
    while (1)
    {
	system("conntrack -L");
	system("ping -n -c1 10.99.99.2");
	sleep(1);
    }
    exit(0);
}

sub startcontainer
{
    system("ip link add vethl type veth peer name vethr");
    system("ip addr add 10.99.99.1/24 dev vethl");
    system("ip link set vethl up");

    print "Parent: Start container\n";

    defined(my $cpid = fork())       || die "can't fork: $!";
	
    if (!$cpid)
    {
	print STDERR "Child: starting\n";
	unshare(CLONE_NEWNET);
	docontainer();
	exit 0;
    }
	
    print STDERR "Parent: Container started pid $cpid\n";
    system("ip addr show | fgrep veth");
    sleep(1);
    print STDERR "Parent: running ip link set vethr netns $cpid\n";
    (system("ip link set vethr netns $cpid") ==0) || print STDERR "WARNING!: ip link netns failed\n";
    sleep(1);
    system("ip addr show | fgrep veth");
    print STDERR "Parent: Moved vethr, parent pinging child\n";
    system("ping -n -c5 10.99.99.2");
    print STDERR "Parent: Moved vethr, ping done\n";
    sleep(2);
    system("kill -KILL $cpid");
}

while (1)
{
    startcontainer;
}


The oops:

root@node-10-157-128-100:~# uname -a
Linux node-10-157-128-100 3.0.0-10-server #16-Ubuntu SMP Fri Sep 2 18:51:05 UTC 2011 x86_64 GNU/Linux


Sep  9 14:30:56 node-10-157-128-100 kernel: [79418.880263] IN=evrr-000000 OUT= MAC=33:33:00:00:00:01:00:15:17:a6:cd:49:86:dd SRC=fe80:0000:0000:0000:0215:17ff:fea6:cd49 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:30:56 node-10-157-128-100 kernel: [79418.880332] IN=evrr-000008 OUT= MAC=33:33:00:00:00:01:0e:f1:9d:53:b5:1e:86:dd SRC=fe80:0000:0000:0000:58e8:01ff:fe0b:0db2 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:33:01 node-10-157-128-100 kernel: [79544.160335] IN=evrr-000000 OUT= MAC=33:33:00:00:00:01:00:15:17:a6:cd:49:86:dd SRC=fe80:0000:0000:0000:0215:17ff:fea6:cd49 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:33:01 node-10-157-128-100 kernel: [79544.160417] IN=evrr-000008 OUT= MAC=33:33:00:00:00:01:0e:f1:9d:53:b5:1e:86:dd SRC=fe80:0000:0000:0000:58e8:01ff:fe0b:0db2 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780106] BUG: unable to handle kernel NULL pointer dereference at 0000000000000274
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780274] IP: [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780398] PGD 0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780441] Oops: 0000 [#1] SMP
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780515] CPU 7
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780547] Modules linked in: ipt_REJECT nf_conntrack_netlink nfnetlink dm_round_robin dm_multipath veth ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_limit xt_state xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables ebt_ip ebtable_filter ebtables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding usb_storage uas joydev usbhid hid radeon ttm ses enclosure drm_kms_helper drm kvm_amd psmouse kvm amd64_edac_mod dcdbas serio_raw e1000e megaraid_sas edac_core i2c_algo_bit k8temp edac_mce_amd pata_serverworks shpchp i2c_piix4 bridge 8021q garp stp ixgbe dca mdio [last unloaded: multipath]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.786143]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Pid: 25711, comm: kworker/u:2 Not tainted 3.0.0-10-server #16-Ubuntu Dell Inc. PowerEdge 6950/0GK775
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RIP: 0010:[<ffffffff81511959>]  [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RSP: 0018:ffff880801109c00  EFLAGS: 00010246
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RAX: 0000000000000004 RBX: ffff880c0f796af8 RCX: 000000000000ffff
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RDX: ffff8803f1700000 RSI: 0000000000000003 RDI: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RBP: ffff880801109c00 R08: ffff880801108000 R09: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] R10: 0000000000000001 R11: dead000000200200 R12: ffff880801109cb0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] R13: ffff880c0f796af8 R14: ffff880c0f796af8 R15: 0000000000000004
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] FS: 00007fe2d2d57710(0000) GS:ffff881027d00000(0000) knlGS:0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] CR2: 0000000000000274 CR3: 0000000001c03000 CR4: 00000000000006e0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Process kworker/u:2 (pid: 25711, threadinfo ffff880801108000, task ffff88080f5f9720)
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880801109c10 ffffffffa048f145 ffff880801109c90 ffffffffa049943b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880801109c70 7fffffffffffffff ffff8803f1700000 0000000300000004
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880800000000 ffffffff00000002 ffff880c0fc80000 ffff880c0f796b98
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Call Trace:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa048f145>] nfnetlink_has_listeners+0x15/0x20 [nfnetlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa049943b>] ctnetlink_conntrack_event+0x5cb/0x890 [nf_conntrack_netlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e34d0>] ? net_drop_ns+0x50/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa04062d8>] death_by_timeout+0xc8/0x1c0 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0405270>] ? nf_conntrack_attach+0x50/0x50 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0406448>] nf_ct_iterate_cleanup+0x78/0x90 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0406491>] nf_conntrack_cleanup_net+0x31/0x100 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0407f97>] nf_conntrack_cleanup+0x27/0x60 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa04081f0>] nf_conntrack_net_exit+0x60/0x80 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e2d28>] ops_exit_list.isra.1+0x38/0x60
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e35e2>] cleanup_net+0x112/0x1b0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107bb0a>] process_one_work+0x11a/0x480
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107c7d5>] worker_thread+0x165/0x370
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107c670>] ? manage_workers.isra.30+0x130/0x130
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff81080c1c>] kthread+0x8c/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81607c24>] kernel_thread_helper+0x4/0x10
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81080b90>] ? flush_kthread_worker+0xa0/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81607c20>] ? gs_change+0x13/0x13
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] Code: 00 00 48 85 f6 74 0c 48 83 ee 01 48 89 df e8 cf f6 ff ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66 90 <f6> 87 74 02 00 00 01 74 30 0f b6 87 21 01 00 00 4c 8b 05 38 8d
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] RIP [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699]  RSP <ffff880801109c00>
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] CR2: 0000000000000274
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954892] ---[ end trace 73540474560834fd ]---
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954941] BUG: unable to handle kernel paging request at fffffffffffffff8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954945] IP: [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954950] PGD 1c05067 PUD 1c06067 PMD 0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954954] Oops: 0000 [#2] SMP
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954957] CPU 7
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954959] Modules linked in: ipt_REJECT nf_conntrack_netlink nfnetlink dm_round_robin dm_multipath veth ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_limit xt_state xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables ebt_ip ebtable_filter ebtables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding usb_storage uas joydev usbhid hid radeon ttm ses enclosure drm_kms_helper drm kvm_amd psmouse kvm amd64_edac_mod dcdbas serio_raw e1000e megaraid_sas edac_core i2c_algo_bit k8temp edac_mce_amd pata_serverworks shpchp i2c_piix4 bridge 8021q garp stp ixgbe dca mdio [last unloaded: multipath]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955032]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955034] Pid: 25711, comm: kworker/u:2 Tainted: G      D     3.0.0-10-server #16-Ubuntu Dell Inc. PowerEdge 6950/0GK775
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955040] RIP: 0010:[<ffffffff810810b1>]  [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955046] RSP: 0018:ffff880801109870  EFLAGS: 00010096
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955048] RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000007
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955051] RDX: 0000000000000007 RSI: 0000000000000007 RDI: ffff88080f5f9720
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955054] RBP: ffff880801109888 R08: 0000000000989680 R09: 0000000000000001
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955057] R10: 0000000000000400 R11: ffff88080f65c118 R12: 0000000000000007
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955060] R13: ffff88080f5f9ae8 R14: 0000000000000000 R15: 0000000000000246
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955064] FS: 00007fe2d2d57710(0000) GS:ffff881027d00000(0000) knlGS:0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955067] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955070] CR2: fffffffffffffff8 CR3: 0000000001c03000 CR4: 00000000000006e0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955073] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955076] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955079] Process kworker/u:2 (pid: 25711, threadinfo ffff880801108000, task ffff88080f5f9720)
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955082] Stack:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955083] ffffffff8107cd25 ffff880801109888 ffff881027d12a40 ffff880801109908
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955089] ffffffff815fc737 ffff88080f5f9720 0000000000000000 ffff880801109fd8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955094] ffff880801109fd8 ffff880801109fd8 0000000000012a40 0000000000000011
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955100] Call Trace:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955103] [<ffffffff8107cd25>] ? wq_worker_sleeping+0x15/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955108] [<ffffffff815fc737>] schedule+0x637/0x770
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955114] [<ffffffff81063053>] do_exit+0x273/0x440
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955119] [<ffffffff815ffbd0>] oops_end+0xb0/0xf0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955124] [<ffffffff815e7104>] no_context+0x145/0x152
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955128] [<ffffffff815e729f>] __bad_area_nosemaphore+0x18e/0x1b1
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955133] [<ffffffff815e72d5>] bad_area_nosemaphore+0x13/0x15
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955138] [<ffffffff816024fd>] do_page_fault+0x43d/0x530
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955144] [<ffffffff8100969a>] ? __switch_to+0xca/0x310
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955148] [<ffffffff815fe73e>] ? _raw_spin_lock+0xe/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955154] [<ffffffff8104e749>] ? finish_task_switch+0x49/0xf0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955158] [<ffffffff815fef15>] page_fault+0x25/0x30
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955162] [<ffffffff81511959>] ? netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955167] [<ffffffffa048f145>] nfnetlink_has_listeners+0x15/0x20 [nfnetlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955172] [<ffffffffa049943b>] ctnetlink_conntrack_event+0x5cb/0x890 [nf_conntrack_netlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955177] [<ffffffff814e34d0>] ? net_drop_ns+0x50/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955183] [<ffffffffa04062d8>] death_by_timeout+0xc8/0x1c0 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955189] [<ffffffffa0405270>] ? nf_conntrack_attach+0x50/0x50 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955195] [<ffffffffa0406448>] nf_ct_iterate_cleanup+0x78/0x90 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955202] [<ffffffffa0406491>] nf_conntrack_cleanup_net+0x31/0x100 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955209] [<ffffffffa0407f97>] nf_conntrack_cleanup+0x27/0x60 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955215] [<ffffffffa04081f0>] nf_conntrack_net_exit+0x60/0x80 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955220] [<ffffffff814e2d28>] ops_exit_list.isra.1+0x38/0x60
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955224] [<ffffffff814e35e2>] cleanup_net+0x112/0x1b0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955229] [<ffffffff8107bb0a>] process_one_work+0x11a/0x480
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955233] [<ffffffff8107c7d5>] worker_thread+0x165/0x370
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955237] [<ffffffff8107c670>] ? manage_workers.isra.30+0x130/0x130
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955241] [<ffffffff81080c1c>] kthread+0x8c/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955245] [<ffffffff81607c24>] kernel_thread_helper+0x4/0x10
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955250] [<ffffffff81080b90>] ? flush_kthread_worker+0xa0/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955254] [<ffffffff81607c20>] ? gs_change+0x13/0x13
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955256] Code: 41 5f 5d c3 be 3e 01 00 00 48 c7 c7 88 90 9e 81 e8 b5 d7 fd ff e9 74 fe ff ff 55 48 89 e5 66 66 66 66 90 48 8b 87 70 03 00 00 5d
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955311] RIP [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955315]  RSP <ffff880801109870>
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955317] CR2: fffffffffffffff8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955319] ---[ end trace 73540474560834fe ]---
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955322] Fixing recursive fault but reboot is needed!
Sep  9 14:35:00 node-10-157-128-100 kernel: [79663.120046] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=6002 jiffies)
Sep  9 14:35:05 node-10-157-128-100 kernel: [79667.970047] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=6002 jiffies)
Sep  9 14:38:00 node-10-157-128-100 kernel: [79843.440048] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=24034 jiffies)
Sep  9 14:38:05 node-10-157-128-100 kernel: [79848.290046] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=24034 jiffies)
Sep  9 14:41:00 node-10-157-128-100 kernel: [80023.760051] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=42066 jiffies)
Sep  9 14:41:05 node-10-157-128-100 kernel: [80028.610044] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=42066 jiffies)
Sep  9 14:44:01 node-10-157-128-100 kernel: [80204.080047] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=60098 jiffies)
Sep  9 14:44:06 node-10-157-128-100 kernel: [80208.930045] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=60098 jiffies)
Sep  9 14:44:44 node-10-157-128-100 kernel: Kernel logging (proc) stopped.




^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix repeatable Oops on container destroy with conntrack
  2011-09-10 18:48 ` Alex Bligh
  (?)
  (?)
@ 2011-09-12  7:25 ` Alexey Dobriyan
  2011-09-12  9:37   ` Pablo Neira Ayuso
  -1 siblings, 1 reply; 16+ messages in thread
From: Alexey Dobriyan @ 2011-09-12  7:25 UTC (permalink / raw)
  To: Alex Bligh
  Cc: netfilter-devel, netfilter, coreteam, linux-kernel, containers,
	Linux Containers

On Sat, Sep 10, 2011 at 07:48:43PM +0100, Alex Bligh wrote:
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
>  		return 0;
> 
>  	net = nf_ct_net(ct);
> +
> +	/* container deinit, netlink may have died before death_by_timeout */
> +	if (!net->nfnl)
> +		return 0;
> +
>  	if (!item->report && !nfnetlink_has_listeners(net, group))
>  		return 0;

If this is the correct fix, ->nfnl check should be folded into
nfnetlink_has_listeners(), otherwise expectations aren't covered.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix repeatable Oops on container destroy with conntrack
  2011-09-12  7:25 ` Alexey Dobriyan
@ 2011-09-12  9:37   ` Pablo Neira Ayuso
  2011-09-12 10:32     ` Alex Bligh
  0 siblings, 1 reply; 16+ messages in thread
From: Pablo Neira Ayuso @ 2011-09-12  9:37 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Alex Bligh, netfilter-devel, netfilter, coreteam, linux-kernel,
	containers, Linux Containers

On Mon, Sep 12, 2011 at 10:25:24AM +0300, Alexey Dobriyan wrote:
> On Sat, Sep 10, 2011 at 07:48:43PM +0100, Alex Bligh wrote:
> > --- a/net/netfilter/nf_conntrack_netlink.c
> > +++ b/net/netfilter/nf_conntrack_netlink.c
> > @@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
> >  		return 0;
> > 
> >  	net = nf_ct_net(ct);
> > +
> > +	/* container deinit, netlink may have died before death_by_timeout */
> > +	if (!net->nfnl)
> > +		return 0;
> > +
> >  	if (!item->report && !nfnetlink_has_listeners(net, group))
> >  		return 0;
> 
> If this is the correct fix, ->nfnl check should be folded into
> nfnetlink_has_listeners(), otherwise expectations aren't covered.

Agreed.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix repeatable Oops on container destroy with conntrack
  2011-09-12  9:37   ` Pablo Neira Ayuso
@ 2011-09-12 10:32     ` Alex Bligh
  2011-09-12 18:33       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Bligh @ 2011-09-12 10:32 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Alexey Dobriyan
  Cc: netfilter-devel, netfilter, coreteam, linux-kernel, containers,
	Linux Containers, Alex Bligh

Alexey / Pablo,

--On 12 September 2011 11:37:49 +0200 Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> On Mon, Sep 12, 2011 at 10:25:24AM +0300, Alexey Dobriyan wrote:
>> On Sat, Sep 10, 2011 at 07:48:43PM +0100, Alex Bligh wrote:
>> > --- a/net/netfilter/nf_conntrack_netlink.c
>> > +++ b/net/netfilter/nf_conntrack_netlink.c
>> > @@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
>> >  		return 0;
>> >
>> >  	net = nf_ct_net(ct);
>> > +
>> > +	/* container deinit, netlink may have died before death_by_timeout */
>> > +	if (!net->nfnl)
>> > +		return 0;
>> > +
>> >  	if (!item->report && !nfnetlink_has_listeners(net, group))
>> >  		return 0;
>>
>> If this is the correct fix, ->nfnl check should be folded into
>> nfnetlink_has_listeners(), otherwise expectations aren't covered.
>
> Agreed.

I /think/ it is the correct fix, in that it certainly fixes the oops,
and it's relatively low overhead. I ran the torture test for 24 hours
without a problem.

My only concern is that eventually my torture test died as the
machine (512MB VM) had run out of memory - this was after about 30
hours. Save for having no free memory, the box is happy.
It looks like there is something (possibly something
entirely different) leaking memory. It does not appear to be
conntrack. Whatever, a slow memory leak causing death on a tiny
VM over 5,000 iterations is better than an oops after 5. Memory
stats below. I will leave the vm up in case anyone wants other
stats.

On the suggestion to move the check for ->nfnl into
nfnetlink_has_listeners(), the problem with that is that
if item->report is non-NULL, nfnetlink_has_listeners()
will not be called, and the early return will not be made.
This will merely delay the oops until elsewhere (nfnetlink_send
for example). The check is currently as follows:

        if (!item->report && !nfnetlink_has_listeners(net, group))
                return 0;

I am a very long way from being a netlink expert, but I am not
entirely sure what the point of progressing further is if there
are no listeners when item->report is non-null. Certainly there is
no point in progressing if net->nfnl is NULL (as this will oops
before item->report is meaningfully used - it's just passed
as a parameter to nfnetlink_send which will crash). It's
almost as if that test should be || not &&.

Perhaps we should check net->nfnl in both places.

I think there might be similar issues with ctnetlink_expect_event.
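
The short-circuit described above can be modelled in userspace. This is
a minimal sketch, not kernel code: struct net_model,
has_listeners_folded(), send_model() and event_model() are hypothetical
stand-ins for struct net, nfnetlink_has_listeners(), nfnetlink_send()
and ctnetlink_conntrack_event(), and it assumes the ->nfnl check has
been folded into the listener test only.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for struct net; only nfnl matters here. */
struct net_model {
	void *nfnl;		/* NULL once the netlink side has been torn down */
};

static int has_listeners_calls;	/* how often the listener test ran */
static int null_derefs;		/* how often a NULL nfnl reached the send path */

/* Listener test with the NULL check folded in, as suggested in the thread. */
static int has_listeners_folded(const struct net_model *net)
{
	has_listeners_calls++;
	if (!net->nfnl)
		return 0;
	return 1;		/* pretend somebody is listening */
}

/* Stand-in for the send path: records where the real code would oops. */
static int send_model(const struct net_model *net)
{
	if (!net->nfnl) {
		null_derefs++;
		return -1;
	}
	return 0;
}

/* Same shape as the test quoted above: report != 0 skips the listener test. */
static int event_model(const struct net_model *net, int report)
{
	if (!report && !has_listeners_folded(net))
		return 0;
	return send_model(net);
}
```

With report set, event_model() never calls has_listeners_folded(), so
the NULL pointer still reaches the send path. That is the case an early
check in the caller covers.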

-- 
Alex Bligh

root@azed:/home/amb# cat /proc/meminfo
MemTotal:         438432 kB
MemFree:           10648 kB
Buffers:           88944 kB
Cached:           219532 kB
SwapCached:         3500 kB
Active:           142540 kB
Inactive:         182796 kB
Active(anon):       7092 kB
Inactive(anon):     9804 kB
Active(file):     135448 kB
Inactive(file):   172992 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        520188 kB
SwapFree:         485356 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         15956 kB
Mapped:             5644 kB
Shmem:                36 kB
Slab:              87296 kB
SReclaimable:      65384 kB
SUnreclaim:        21912 kB
KernelStack:        1080 kB
PageTables:         3208 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      739404 kB
Committed_AS:     570652 kB
VmallocTotal:   34359738367 kB
VmallocUsed:        3600 kB
VmallocChunk:   34359732156 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       36844 kB
DirectMap2M:      487424 kB

root@azed:/home/amb# cat /proc/slabinfo
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> 
<pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata 
<active_slabs> <num_slabs> <sharedavail>
nf_conntrack_expect      0      0    240   17    1 : tunables    0    0 
0 : slabdata      0      0      0
nf_conntrack_ffffffff81f09100     28     39    312   13    1 : tunables 
0    0    0 : slabdata      3      3      0
UDPLITEv6              0      0   1024   16    4 : tunables    0    0    0 
: slabdata      0      0      0
UDPv6                 32     32   1024   16    4 : tunables    0    0    0 
: slabdata      2      2      0
tw_sock_TCPv6         12     12    320   12    1 : tunables    0    0    0 
: slabdata      1      1      0
TCPv6                 34     34   1920   17    8 : tunables    0    0    0 
: slabdata      2      2      0
kcopyd_job             0      0   3384    9    8 : tunables    0    0    0 
: slabdata      0      0      0
dm_uevent              0      0   2608   12    8 : tunables    0    0    0 
: slabdata      0      0      0
dm_rq_target_io        0      0    400   20    2 : tunables    0    0    0 
: slabdata      0      0      0
cfq_queue              0      0    232   17    1 : tunables    0    0    0 
: slabdata      0      0      0
bsg_cmd                0      0    312   13    1 : tunables    0    0    0 
: slabdata      0      0      0
mqueue_inode_cache     18     18    896   18    4 : tunables    0    0    0 
: slabdata      1      1      0
fuse_request           0      0    608   13    2 : tunables    0    0    0 
: slabdata      0      0      0
fuse_inode             0      0    768   21    4 : tunables    0    0    0 
: slabdata      0      0      0
ecryptfs_key_record_cache      0      0    576   14    2 : tunables    0 
0    0 : slabdata      0      0      0
ecryptfs_inode_cache      0      0   1024   16    4 : tunables    0    0 
0 : slabdata      0      0      0
hugetlbfs_inode_cache     13     13    616   13    2 : tunables    0    0 
0 : slabdata      1      1      0
journal_handle       340    340     24  170    1 : tunables    0    0    0 
: slabdata      2      2      0
journal_head          72     72    112   36    1 : tunables    0    0    0 
: slabdata      2      2      0
revoke_record        256    256     32  128    1 : tunables    0    0    0 
: slabdata      2      2      0
ext4_inode_cache   27639  27727    920   17    4 : tunables    0    0    0 
: slabdata   1631   1631      0
ext4_free_data       146    146     56   73    1 : tunables    0    0    0 
: slabdata      2      2      0
ext4_allocation_context    210    210    136   30    1 : tunables    0    0 
0 : slabdata      7      7      0
ext4_io_end           28     28   1128   14    4 : tunables    0    0    0 
: slabdata      2      2      0
ext4_io_page         514    768     16  256    1 : tunables    0    0    0 
: slabdata      3      3      0
ext2_inode_cache      40     40    792   20    4 : tunables    0    0    0 
: slabdata      2      2      0
ext3_inode_cache       0      0    816   20    4 : tunables    0    0    0 
: slabdata      0      0      0
ext3_xattr             0      0     88   46    1 : tunables    0    0    0 
: slabdata      0      0      0
dquot                  0      0    256   16    1 : tunables    0    0    0 
: slabdata      0      0      0
dnotify_mark          63     90    136   30    1 : tunables    0    0    0 
: slabdata      3      3      0
pid_namespace          0      0   2112   15    8 : tunables    0    0    0 
: slabdata      0      0      0
user_namespace         0      0   1072   15    4 : tunables    0    0    0 
: slabdata      0      0      0
UDP-Lite               0      0    832   19    4 : tunables    0    0    0 
: slabdata      0      0      0
xfrm_dst_cache         0      0    448   18    2 : tunables    0    0    0 
: slabdata      0      0      0
ip_fib_trie          146    146     56   73    1 : tunables    0    0    0 
: slabdata      2      2      0
arp_cache             24     24    320   12    1 : tunables    0    0    0 
: slabdata      2      2      0
RAW                   62     76    832   19    4 : tunables    0    0    0 
: slabdata      4      4      0
UDP                   38     38    832   19    4 : tunables    0    0    0 
: slabdata      2      2      0
tw_sock_TCP           16     16    256   16    1 : tunables    0    0    0 
: slabdata      1      1      0
TCP                   36     36   1728   18    8 : tunables    0    0    0 
: slabdata      2      2      0
blkdev_queue          54     54   1728   18    8 : tunables    0    0    0 
: slabdata      3      3      0
blkdev_requests       79     88    360   22    2 : tunables    0    0    0 
: slabdata      4      4      0
fsnotify_event        68     68    120   34    1 : tunables    0    0    0 
: slabdata      2      2      0
bip-256                7      7   4224    7    8 : tunables    0    0    0 
: slabdata      1      1      0
bip-128                0      0   2176   15    8 : tunables    0    0    0 
: slabdata      0      0      0
bip-64                 0      0   1152   14    4 : tunables    0    0    0 
: slabdata      0      0      0
bip-16                49     63    384   21    2 : tunables    0    0    0 
: slabdata      3      3      0
sock_inode_cache      96    161    704   23    4 : tunables    0    0    0 
: slabdata      7      7      0
file_lock_cache       44     44    184   22    1 : tunables    0    0    0 
: slabdata      2      2      0
net_namespace         24     24   2624   12    8 : tunables    0    0    0 
: slabdata      2      2      0
shmem_inode_cache   4000   4009    824   19    4 : tunables    0    0    0 
: slabdata    211    211      0
Acpi-ParseExt       1085   1176     72   56    1 : tunables    0    0    0 
: slabdata     21     21      0
Acpi-Namespace       981   1122     40  102    1 : tunables    0    0    0 
: slabdata     11     11      0
task_delay_info      206    540    112   36    1 : tunables    0    0    0 
: slabdata     15     15      0
taskstats             24     24    328   12    1 : tunables    0    0    0 
: slabdata      2      2      0
proc_inode_cache   37093  37512    664   12    2 : tunables    0    0    0 
: slabdata   3126   3126      0
sigqueue              50     50    160   25    1 : tunables    0    0    0 
: slabdata      2      2      0
bdev_cache            38     38    832   19    4 : tunables    0    0    0 
: slabdata      2      2      0
sysfs_dir_cache    13096  13209     80   51    1 : tunables    0    0    0 
: slabdata    259    259      0
inode_cache         4199   4329    600   13    2 : tunables    0    0    0 
: slabdata    333    333      0
dentry             66510  72786    192   21    1 : tunables    0    0    0 
: slabdata   3466   3466      0
buffer_head        42233  43368    104   39    1 : tunables    0    0    0 
: slabdata   1112   1112      0
vm_area_struct      2685   2875    176   23    1 : tunables    0    0    0 
: slabdata    125    125      0
mm_struct             67    108    896   18    4 : tunables    0    0    0 
: slabdata      6      6      0
files_cache           74    115    704   23    4 : tunables    0    0    0 
: slabdata      5      5      0
signal_cache         104    285   1088   15    4 : tunables    0    0    0 
: slabdata     19     19      0
sighand_cache        104    270   2112   15    8 : tunables    0    0    0 
: slabdata     18     18      0
task_struct          142    220   5920    5    8 : tunables    0    0    0 
: slabdata     44     44      0
anon_vma            1549   1736     72   56    1 : tunables    0    0    0 
: slabdata     31     31      0
shared_policy_node   2813   5015     48   85    1 : tunables    0    0    0 
: slabdata     59     59      0
numa_policy          852   1020     24  170    1 : tunables    0    0    0 
: slabdata      6      6      0
radix_tree_node     2234   2282    568   14    2 : tunables    0    0    0 
: slabdata    163    163      0
idr_layer_cache      269    300    544   15    2 : tunables    0    0    0 
: slabdata     20     20      0
dma-kmalloc-8192       0      0   8192    4    8 : tunables    0    0    0 
: slabdata      0      0      0
dma-kmalloc-4096       0      0   4096    8    8 : tunables    0    0    0 
: slabdata      0      0      0
dma-kmalloc-2048       0      0   2048   16    8 : tunables    0    0    0 
: slabdata      0      0      0
dma-kmalloc-1024       0      0   1024   16    4 : tunables    0    0    0 
: slabdata      0      0      0
dma-kmalloc-512       16     16    512   16    2 : tunables    0    0    0 
: slabdata      1      1      0
dma-kmalloc-256        0      0    256   16    1 : tunables    0    0    0 
: slabdata      0      0      0
dma-kmalloc-128        0      0    128   32    1 : tunables    0    0    0 
: slabdata      0      0      0
dma-kmalloc-64         0      0     64   64    1 : tunables    0    0    0 
: slabdata      0      0      0
dma-kmalloc-32         0      0     32  128    1 : tunables    0    0    0 
: slabdata      0      0      0
dma-kmalloc-16         0      0     16  256    1 : tunables    0    0    0 
: slabdata      0      0      0
dma-kmalloc-8          0      0      8  512    1 : tunables    0    0    0 
: slabdata      0      0      0
dma-kmalloc-192        0      0    192   21    1 : tunables    0    0    0 
: slabdata      0      0      0
dma-kmalloc-96         0      0     96   42    1 : tunables    0    0    0 
: slabdata      0      0      0
kmalloc-8192          36     36   8192    4    8 : tunables    0    0    0 
: slabdata      9      9      0
kmalloc-4096         128    128   4096    8    8 : tunables    0    0    0 
: slabdata     16     16      0
kmalloc-2048         177    192   2048   16    8 : tunables    0    0    0 
: slabdata     12     12      0
kmalloc-1024        4340   4800   1024   16    4 : tunables    0    0    0 
: slabdata    300    300      0
kmalloc-512         2503   6240    512   16    2 : tunables    0    0    0 
: slabdata    390    390      0
kmalloc-256          461    464    256   16    1 : tunables    0    0    0 
: slabdata     29     29      0
kmalloc-128         6270  14144    128   32    1 : tunables    0    0    0 
: slabdata    442    442      0
kmalloc-64          3123   4288     64   64    1 : tunables    0    0    0 
: slabdata     67     67      0
kmalloc-32           978   2048     32  128    1 : tunables    0    0    0 
: slabdata     16     16      0
kmalloc-16          2560   2560     16  256    1 : tunables    0    0    0 
: slabdata     10     10      0
kmalloc-8           6079   6144      8  512    1 : tunables    0    0    0 
: slabdata     12     12      0
kmalloc-192         1823   3234    192   21    1 : tunables    0    0    0 
: slabdata    154    154      0
kmalloc-96           516    630     96   42    1 : tunables    0    0    0 
: slabdata     15     15      0
kmem_cache            32     32    256   16    1 : tunables    0    0    0 
: slabdata      2      2      0
kmem_cache_node      191    192     64   64    1 : tunables    0    0    0 
: slabdata      3      3      0
root@azed:/home/amb#
root@azed:/home/amb# ps auxwwg
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.4  24044  1768 ?        Ss   Sep10   0:19 /sbin/init
root         2  0.0  0.0      0     0 ?        S    Sep10   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Sep10   0:01 
[ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S    Sep10   0:00 
[kworker/u:0]
root         6  0.0  0.0      0     0 ?        S    Sep10   0:00 
[migration/0]
root         7  0.0  0.0      0     0 ?        S    Sep10   0:00 
[migration/1]
root         9  0.0  0.0      0     0 ?        S    Sep10   0:01 
[ksoftirqd/1]
root        11  0.0  0.0      0     0 ?        S<   Sep10   0:00 [cpuset]
root        12  0.0  0.0      0     0 ?        S<   Sep10   0:00 [khelper]
root        13  0.0  0.0      0     0 ?        S<   Sep10   0:00 [netns]
root        14  0.0  0.0      0     0 ?        S    Sep10   0:00 
[sync_supers]
root        15  0.0  0.0      0     0 ?        S    Sep10   0:08 
[kworker/u:1]
root        16  0.0  0.0      0     0 ?        S    Sep10   0:00 
[bdi-default]
root        17  0.0  0.0      0     0 ?        S<   Sep10   0:00 
[kintegrityd]
root        18  0.0  0.0      0     0 ?        S<   Sep10   0:00 [kblockd]
root        19  0.0  0.0      0     0 ?        S<   Sep10   0:00 [ata_sff]
root        20  0.0  0.0      0     0 ?        S    Sep10   0:00 [khubd]
root        21  0.0  0.0      0     0 ?        S<   Sep10   0:00 [md]
root        23  0.0  0.0      0     0 ?        S    Sep10   0:00 
[khungtaskd]
root        24  0.0  0.0      0     0 ?        S    Sep10   0:05 [kswapd0]
root        25  0.0  0.0      0     0 ?        SN   Sep10   0:00 [ksmd]
root        26  0.0  0.0      0     0 ?        S    Sep10   0:00 
[fsnotify_mark]
root        27  0.0  0.0      0     0 ?        S    Sep10   0:00 
[ecryptfs-kthrea]
root        28  0.0  0.0      0     0 ?        S<   Sep10   0:00 [crypto]
root        36  0.0  0.0      0     0 ?        S<   Sep10   0:00 [kthrotld]
root        38  0.0  0.0      0     0 ?        S    Sep10   0:00 [scsi_eh_0]
root        39  0.0  0.0      0     0 ?        S    Sep10   0:00 [scsi_eh_1]
root       208  0.0  0.0      0     0 ?        S<   Sep10   0:00 [kdmflush]
root       220  0.0  0.0      0     0 ?        S<   Sep10   0:00 [kdmflush]
root       229  0.0  0.0      0     0 ?        S    Sep10   0:01 
[jbd2/dm-0-8]
root       230  0.0  0.0      0     0 ?        S<   Sep10   0:00 
[ext4-dio-unwrit]
root       292  0.0  0.1  17096   476 ?        S    Sep10   0:08 
upstart-udev-bridge --daemon
root       295  0.0  0.1  21360   796 ?        Ss   Sep10   0:11 udevd 
--daemon
root       373  0.0  0.0      0     0 ?        S    Sep10   0:00 [vballoon]
105        405  0.0  0.1  24152   568 ?        Ss   Sep10   0:03 
dbus-daemon --system --fork --activation=upstart
syslog     421  0.0  0.1  52732   820 ?        Sl   Sep10   0:08 rsyslogd 
-c5
root       428  0.0  0.0      0     0 ?        S<   Sep10   0:00 [kpsmoused]
root       522  0.0  0.0  15048   352 ?        S    Sep10   0:01 
upstart-socket-bridge --daemon
root       563  0.0  0.3  49684  1564 ?        Ss   Sep10   0:00 
/usr/sbin/sshd -D
root       678  0.0  0.1   4180   500 tty4     Ss+  Sep10   0:00 
/sbin/getty -8 38400 tty4
root       684  0.0  0.1   4180   500 tty5     Ss+  Sep10   0:00 
/sbin/getty -8 38400 tty5
root       696  0.0  0.1   4180   500 tty2     Ss+  Sep10   0:00 
/sbin/getty -8 38400 tty2
root       697  0.0  0.1   4180   500 tty3     Ss+  Sep10   0:00 
/sbin/getty -8 38400 tty3
root       699  0.0  0.1   4180   500 tty6     Ss+  Sep10   0:00 
/sbin/getty -8 38400 tty6
root       702  0.0  0.1   4196   520 ?        Ss   Sep10   0:00 acpid -c 
/etc/acpi/events -s /var/run/acpid.socket
root       703  0.0  0.1  18976   704 ?        Ss   Sep10   0:00 cron
daemon     704  0.0  0.0  16776   196 ?        Ss   Sep10   0:00 atd
root       705  0.0  0.1  15848   488 ?        Ss   Sep10   0:10 
/usr/sbin/irqbalance
bind       764  0.0  0.4 125828  2068 ?        Ssl  Sep10   0:00 
/usr/sbin/named -u bind
root       840  0.0  0.1   4180   500 tty1     Ss+  Sep10   0:00 
/sbin/getty -8 38400 tty1
root       844  0.0  0.3  73084  1608 ?        Ss   Sep10   0:00 sshd: amb 
[priv]
amb        871  0.0  0.1  73084   684 ?        S    Sep10   0:21 sshd: 
amb@pts/0
amb        872  0.0  0.1  28104   856 pts/0    Ss   Sep10   0:00 -bash
root       974  0.0  0.2  35548   956 pts/0    S    Sep10   0:00 sudo su
root       975  0.0  0.2  39320   884 pts/0    S    Sep10   0:00 su
root       976  0.0  0.3  21752  1692 pts/0    S    Sep10   0:00 bash
root      1328  0.0  0.3  73084  1608 ?        Ss   Sep10   0:00 sshd: amb 
[priv]
amb       1371  0.0  0.1  73084   632 ?        S    Sep10   0:01 sshd: 
amb@pts/1
amb       1372  0.0  0.1  28172   852 pts/1    Ss+  Sep10   0:00 -bash
root      3919  0.0  0.0      0     0 ?        S    Sep11   0:00 
[kworker/0:2]
root      6185  0.0  0.0    188    12 ?        Ss   Sep10   0:01 runsvdir 
-P /etc/service log: 
...........................................................................................................................................................................................................................................................................................................................................................................................................
root      6350  0.0  0.0    164     0 ?        Ss   Sep10   0:00 runsv 
git-daemon
gitlog    6351  0.0  0.0    184     0 ?        S    Sep10   0:00 svlogd -tt 
/var/log/git-daemon
107       6352  0.0  0.1   9108   552 ?        S    Sep10   0:00 
/usr/lib/git-core/git-daemon --verbose --reuseaddr --base-path=/var/cache 
/var/cache/git
root     10047  0.0  2.8  57560 12404 ?        S    Sep11   0:04 python 
/usr/sbin/denyhosts --daemon --purge --config=/etc/denyhosts.conf
root     10561  0.0  0.1  21356   464 ?        S    Sep11   0:00 udevd 
--daemon
root     10639  0.0  0.0  21356   416 ?        S    Sep11   0:00 udevd 
--daemon
root     13015  0.0  0.0      0     0 ?        S    Sep11   0:00 
[kworker/1:2]
root     20473  0.0  0.0      0     0 ?        S    Sep11   0:00 
[kworker/0:0]
root     20831  0.0  0.0      0     0 ?        S    11:05   0:00 
[flush-252:0]
root     20914  0.0  0.2  16680  1188 pts/0    R+   11:15   0:00 ps auxwwg
root     22549  0.0  0.0      0     0 ?        S    Sep11   0:02 
[kworker/1:1]
root@azed:/home/amb#

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix repeatable Oops on container destroy with conntrack
  2011-09-12 10:32     ` Alex Bligh
@ 2011-09-12 18:33       ` Pablo Neira Ayuso
  2011-09-12 19:06         ` Alex Bligh
  0 siblings, 1 reply; 16+ messages in thread
From: Pablo Neira Ayuso @ 2011-09-12 18:33 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Alexey Dobriyan, netfilter-devel, netfilter, coreteam,
	linux-kernel, containers, Linux Containers

Hi Alex,

On Mon, Sep 12, 2011 at 11:32:18AM +0100, Alex Bligh wrote:
> I /think/ it is the correct fix, in that it certainly fixes the oops,
> and it's relatively low overhead. I ran the torture test for 24 hours
> without a problem.
> 
> My only concern is that eventually my torture test died as the
> machine (512MB VM) had run out of memory - this was after about 30
> hours. Save for having no free memory, the box is happy.
> It looks like there is something (possibly something
> entirely different) leaking memory. It does not appear to be
> conntrack. Whatever, a slow memory leak causing death on a tiny
> VM over 5,000 iterations is better than an oops after 5. Memory
> stats below. I will leave the vm up in case anyone wants other
> stats.

Seems like a different issue.

> On the suggestion to move the check for ->nfnl into
> nfnetlink_has_listeners(), the problem with that is that
> if item->report is non-NULL, nfnetlink_has_listeners()
> will not be called, and the early return will not be made.
> This will merely delay the oops until elsewhere (nfnetlink_send
> for example). The check is currently as follows:
> 
>        if (!item->report && !nfnetlink_has_listeners(net, group))
>                return 0;
> 
> I am a very long way from being a netlink expert, but I am not
> entirely sure what the point of progressing further is if there
> are no listeners when item->report is non-null. Certainly there is
> no point in progressing if net->nfnl is NULL (as this will oops
> before item->report is meaningfully used - it's just passed
> as a parameter to nfnetlink_send which will crash). It's
> almost as if that test should be || not &&.
> 
> Perhaps we should check net->nfnl in both places.
> 
> I think there might be similar issues with ctnetlink_expect_event.

Yes, this is what Alexey was pointing out in the previous email and
why he suggested to move it to nfnetlink_has_listeners (to cover the
expectation case).

But you're right, we cannot move it to nfnetlink_has_listeners because
of the item->report case. Please, include the expectation part and
resend the patch.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix repeatable Oops on container destroy with conntrack
  2011-09-12 18:33       ` Pablo Neira Ayuso
@ 2011-09-12 19:06         ` Alex Bligh
  2011-09-13 20:44             ` Alex Bligh
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Bligh @ 2011-09-12 19:06 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Alexey Dobriyan, netfilter-devel, netfilter, coreteam,
	linux-kernel, containers, Linux Containers, Alex Bligh

Pablo,

--On 12 September 2011 20:33:57 +0200 Pablo Neira Ayuso <pablo@netfilter.org> wrote:

> Yes, this is what Alexey was pointing out in the previous email and
> why he suggested to move it to nfnetlink_has_listeners (to cover the
> expectation case).
>
> But you're right, we cannot move it to nfnetlink_has_listeners because
> of the item->report case. Please, include the expectation part and
> resend the patch.

Thanks - see below

-- 
Alex Bligh

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 net/netfilter/nf_conntrack_netlink.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 482e90c..f44d571 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
                return 0;

        net = nf_ct_net(ct);
+
+       /* container deinit, netlink may have died before death_by_timeout */
+       if (!net->nfnl)
+               return 0;
+
        if (!item->report && !nfnetlink_has_listeners(net, group))
                return 0;

@@ -1723,6 +1728,10 @@ ctnetlink_expect_event(unsigned int events, struct nf_exp_event *item)
        } else
                return 0;

+       /* container deinit, netlink may have died before death_by_timeout */
+       if (!net->nfnl)
+               return 0;
+
        if (!item->report && !nfnetlink_has_listeners(net, group))
                return 0;

-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix repeatable Oops on container destroy with conntrack
  2011-09-12 19:06         ` Alex Bligh
@ 2011-09-13 20:44             ` Alex Bligh
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Bligh @ 2011-09-13 20:44 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Alexey Dobriyan, netfilter-devel, netfilter, coreteam,
	linux-kernel, containers, Linux Containers, Alex Bligh

Alexey / Pablo,

--On 12 September 2011 20:06:25 +0100 Alex Bligh <alex@alex.org.uk> wrote:

> Pablo,
>
> --On 12 September 2011 20:33:57 +0200 Pablo Neira Ayuso
> <pablo@netfilter.org> wrote:
>
>> Yes, this is what Alexey was pointing out in the previous email and
>> why he suggested to move it to nfnetlink_has_listeners (to cover the
>> expectation case).
>>
>> But you're right, we cannot move it to nfnetlink_has_listeners because
>> of the item->report case. Please, include the expectation part and
>> resend the patch.
>
> Thanks - see below

Is this new version OK? I am happy to adjust if not.

I think we ought to get /something/ in, because without anything it's
very simple to cause an oops and a resultant machine hang.

-- 
Alex Bligh

> Signed-off-by: Alex Bligh <alex@alex.org.uk>
> ---
>  net/netfilter/nf_conntrack_netlink.c |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_netlink.c
> b/net/netfilter/nf_conntrack_netlink.c
> index 482e90c..f44d571 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int events,
> struct nf_ct_event *item)
>                 return 0;
>
>         net = nf_ct_net(ct);
> +
> +       /* container deinit, netlink may have died before
> death_by_timeout */
> +       if (!net->nfnl)
> +               return 0;
> +
>         if (!item->report && !nfnetlink_has_listeners(net, group))
>                 return 0;
>
> @@ -1723,6 +1728,10 @@ ctnetlink_expect_event(unsigned int events, struct
> nf_exp_event *item)
>         } else
>                 return 0;
>
> +       /* container deinit, netlink may have died before
> death_by_timeout */
> +       if (!net->nfnl)
> +               return 0;
> +
>         if (!item->report && !nfnetlink_has_listeners(net, group))
>                 return 0;
>
> --
> 1.7.5.4
>
>



-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix repeatable Oops on container destroy with conntrack
  2011-09-13 20:44             ` Alex Bligh
  (?)
@ 2011-09-14  1:35             ` Pablo Neira Ayuso
  2011-09-14  8:01               ` Alex Bligh
  -1 siblings, 1 reply; 16+ messages in thread
From: Pablo Neira Ayuso @ 2011-09-14  1:35 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Alexey Dobriyan, netfilter-devel, netfilter, coreteam,
	linux-kernel, containers, Linux Containers

On Tue, Sep 13, 2011 at 09:44:38PM +0100, Alex Bligh wrote:
> Alexey / Pablo,
> 
> --On 12 September 2011 20:06:25 +0100 Alex Bligh <alex@alex.org.uk> wrote:
> 
> >Pablo,
> >
> >--On 12 September 2011 20:33:57 +0200 Pablo Neira Ayuso
> ><pablo@netfilter.org> wrote:
> >
> >>Yes, this is what Alexey was pointing out in the previous email and
> >>why he suggested to move it to nfnetlink_has_listeners (to cover the
> >>expectation case).
> >>
> >>But you're right, we cannot move it to nfnetlink_has_listeners because
> >>of the item->report case. Please, include the expectation part and
> >>resend the patch.
> >
> >Thanks - see below
> 
> Is this new version OK? I am happy to adjust if not.

Hm, I still think that this is a workaround.

The nice fix should move nf_conntrack_event_cb in
nf_conntrack_ecache.c to the net container structure.

Alexey?

> I think we ought to get /something/ in, because without anything it's
> very simple to cause an oops and a resultant machine hang.

Sure, I'm all for fixing it :-).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix repeatable Oops on container destroy with conntrack
  2011-09-14  1:35             ` Pablo Neira Ayuso
@ 2011-09-14  8:01               ` Alex Bligh
  2011-09-28 21:08                 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Bligh @ 2011-09-14  8:01 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Alexey Dobriyan, netfilter-devel, netfilter, coreteam,
	linux-kernel, containers, Linux Containers, Alex Bligh



--On 14 September 2011 03:35:00 +0200 Pablo Neira Ayuso 
<pablo@netfilter.org> wrote:

>> Is this new version OK? I am happy to adjust if not.
>
> Hm, I still think that this is a workaround.

It is a bit of a workaround, that is true. But it is a workaround
that will fix the bug in every kernel since 2.6.32 (and perhaps
before - I haven't looked). It's thus reasonably easily applicable
to stable kernel series.

I'm not clued-up enough on Netfilter to know what the right fix is,
but is applying the workaround in a commit which could be easily
backported, then applying the 'right fix' (assuming that is different)
a reasonable strategy?

As you can probably tell, my interest here is to get something that
doesn't oops into stable kernels.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix repeatable Oops on container destroy with conntrack
  2011-09-14  8:01               ` Alex Bligh
@ 2011-09-28 21:08                 ` Pablo Neira Ayuso
  2011-09-30 15:54                   ` Alex Bligh
  0 siblings, 1 reply; 16+ messages in thread
From: Pablo Neira Ayuso @ 2011-09-28 21:08 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Alexey Dobriyan, netfilter-devel, linux-kernel, containers,
	Linux Containers, netdev

On Wed, Sep 14, 2011 at 09:01:34AM +0100, Alex Bligh wrote:
> --On 14 September 2011 03:35:00 +0200 Pablo Neira Ayuso
> <pablo@netfilter.org> wrote:
> 
> >>Is this new version OK? I am happy to adjust if not.
> >
> >Hm, I still think that this is a workaround.
> 
> It is a bit of a workaround, that is true. But it is a workaround
> that will fix the bug in every kernel since 2.6.32 (and perhaps
> before - I haven't looked). It's thus reasonably easily applicable
> to stable kernel series.

The container support for netfilter seems to be in an intermediate state;
we need several patches to finish it, covering:

* subsys_table definition in nfnetlink.c.
* ctnl_notifier and ctnl_notifier_exp definitions in
  nf_conntrack_netlink.c
* similar things for nfnetlink_queue and nfnetlink_log.

If nobody is going to fix all these, I'll find some spare time to do
it myself, but I don't think we'll have a proper fix that we can pass
to -stable. This will have to go to net-next, given the amount of
patches that we'll need to appropriately fix this.

> I'm not clued-up enough on Netfilter to know what the right fix is,
> but is applying the workaround in a commit which could be easily
> backported, then applying the 'right fix' (assuming that is different)
> a reasonable strategy?
> 
> As you can probably tell, my interest here is to get something that
> doesn't oops into stable kernels.

As said, I'm not sure that this can happen, given the number of
patches that we need to fix it properly, sorry.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Fix repeatable Oops on container destroy with conntrack
  2011-09-28 21:08                 ` Pablo Neira Ayuso
@ 2011-09-30 15:54                   ` Alex Bligh
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Bligh @ 2011-09-30 15:54 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Alexey Dobriyan, netfilter-devel, linux-kernel, containers,
	Linux Containers, netdev, Alex Bligh



--On 28 September 2011 23:08:51 +0200 Pablo Neira Ayuso 
<pablo@netfilter.org> wrote:

>> As you can probably tell, my interest here is to get something that
>> doesn't oops into stable kernels.
>
> As said, I'm not sure that this can happen, given the number of
> patches that we need to fix it properly, sorry.

This is why I was suggesting the 2 line "don't oops" patch in the
interim.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] Fix repeatable Oops on container destroy with conntrack
@ 2011-09-10 18:48 Alex Bligh
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Bligh @ 2011-09-10 18:48 UTC (permalink / raw)
  To: netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
	netfilter-u79uwXL29TY76Z2rM5mHXA,
	coreteam-Cap9r6Oaw4JrovVCs/uTlw,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Linux

Problem:

A repeatable Oops can be caused if a container with networking
unshared is destroyed when it has nf_conntrack entries yet to expire.

A copy of the oops follows below. A perl program generating the oops
repeatably is attached inline below.

Analysis:

The oops is triggered from cleanup_net when the namespace is
destroyed. conntrack iterates through outstanding events and calls
death_by_timeout on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nfnetlink_has_listeners, which
oopses because net->nfnl is NULL.

The perl program generates the container through fork() then
unshare(CLONE_NEWNET). It does not explicitly set up netlink,
but I presume it was set up else net->nfnl
would have been NULL earlier (i.e. when an earlier connection
timed out). This would thus suggest that net->nfnl is made NULL
during the destruction of the container, which I think is done by
nfnetlink_net_exit_batch.

I can see that the various subsystems are deinitialised in the reverse
of the order in which the relevant register_pernet_subsys calls are made,
and both nf_conntrack and nfnetlink_net_ops register their relevant
subsystems. If nfnetlink_net_ops registered later than nf_conntrack,
then its exit routine would have been called first, which would cause
the oops described. I am not sure there is anything to prevent this
happening in a container environment.

Whilst there's perhaps a more complex problem revolving around ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check for net->nfnl
being non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure any lock is held at this point or
how synchronisation for subsystem deinitialization works).

Patch:

The patch attached should apply on everything from 2.6.26 (if not before)
onwards; it appears to be a problem on all kernels. This was taken against
Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
with the above perl script for 15 minutes or so; the perl script hung the
machine within 20 seconds without this patch.

Applicability:

If this is the right solution, it should be applied to all stable kernels
as well as head. Apart from the minor overhead of checking one variable
against NULL, it can never 'do the wrong thing', because if net->nfnl
is NULL, an oops will inevitably result. Therefore, checking is a reasonable
thing to do unless it can be proven that net->nfnl will never be NULL.

-- 
Alex Bligh

Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on container destroy

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 net/netfilter/nf_conntrack_netlink.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 482e90c..0790d0a 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
 		return 0;

 	net = nf_ct_net(ct);
+
+	/* container deinit, netlink may have died before death_by_timeout */
+	if (!net->nfnl)
+		return 0;
+
 	if (!item->report && !nfnetlink_has_listeners(net, group))
 		return 0;

-- 
1.7.5.4


Perl script to replicate the bug (and demonstrate the fix)

#!/usr/bin/perl

# Prerequisites:
# Install Linux::Unshare from CPAN
# Ensure conntrack is installed

use strict;
use warnings;

use POSIX "setsid";
use Linux::Unshare qw(unshare :clone); # get this from CPAN

# Parent returns PID, child returns 0
sub daemonize {
    chdir("/")                      || die "can't chdir to /: $!";
    open(STDIN,  "< /dev/null")     || die "can't read /dev/null: $!";
    open(STDOUT, "> /dev/null")     || die "can't write to /dev/null: $!";
    defined(my $pid = fork())       || die "can't fork: $!";
    return $pid if $pid;               # non-zero now means I am the parent
    (setsid() != -1)                || die "Can't start a new session: $!";
    open(STDERR, ">&STDOUT")        || die "can't dup stdout: $!";
    return 0;
}

sub docontainer
{
    print STDERR "Child: container starting\n";
    sleep (5);
    print STDERR "Child: setting ip addresses\n";
    system("ip link set vethr up");
    system("ip link show");
    system("ip addr add 10.99.99.2/24 dev vethr");
    system("ip addr add 127.0.0.1/8 dev lo");
    system("ip link set lo up");
    system("echo 1 > /proc/sys/net/ipv4/ip_forward");
    system("echo 1 > /proc/sys/net/ipv4/conf/all/forwarding");
    system("echo 1 > /proc/sys/net/ipv6/conf/all/forwarding");

    system("echo 2 > /proc/sys/net/ipv4/conf/all/rp_filter");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/accept_source_route");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/accept_redirects");
    system("echo 1 > /proc/sys/net/ipv4/conf/all/proxy_arp");
    system("iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT");
    system("iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT");
    print STDERR "Child: pinging parent and running conntrack\n";
    while (1)
    {
	system("conntrack -L");
	system("ping -n -c1 10.99.99.2");
	sleep(1);
    }
    exit(0);
}

sub startcontainer
{
    system("ip link add vethl type veth peer name vethr");
    system("ip addr add 10.99.99.1/24 dev vethl");
    system("ip link set vethl up");

    print "Parent: Start container\n";

    defined(my $cpid = fork())       || die "can't fork: $!";
	
    if (!$cpid)
    {
	print STDERR "Child: starting\n";
	unshare(CLONE_NEWNET);
	docontainer();
	exit 0;
    }
	
    print STDERR "Parent: Container started pid $cpid\n";
    system("ip addr show | fgrep veth");
    sleep(1);
    print STDERR "Parent: running ip link set vethr netns $cpid\n";
    (system("ip link set vethr netns $cpid") ==0) || print STDERR "WARNING!: ip link netns failed\n";
    sleep(1);
    system("ip addr show | fgrep veth");
    print STDERR "Parent: Moved vethr, parent pinging child\n";
    system("ping -n -c5 10.99.99.2");
    print STDERR "Parent: Moved vethr, ping done\n";
    sleep(2);
    system("kill -KILL $cpid");
}

while (1)
{
    startcontainer;
}


The oops:

root@node-10-157-128-100:~# uname -a
Linux node-10-157-128-100 3.0.0-10-server #16-Ubuntu SMP Fri Sep 2 18:51:05 UTC 2011 x86_64 GNU/Linux


Sep  9 14:30:56 node-10-157-128-100 kernel: [79418.880263] IN=evrr-000000 OUT= MAC=33:33:00:00:00:01:00:15:17:a6:cd:49:86:dd SRC=fe80:0000:0000:0000:0215:17ff:fea6:cd49 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:30:56 node-10-157-128-100 kernel: [79418.880332] IN=evrr-000008 OUT= MAC=33:33:00:00:00:01:0e:f1:9d:53:b5:1e:86:dd SRC=fe80:0000:0000:0000:58e8:01ff:fe0b:0db2 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:33:01 node-10-157-128-100 kernel: [79544.160335] IN=evrr-000000 OUT= MAC=33:33:00:00:00:01:00:15:17:a6:cd:49:86:dd SRC=fe80:0000:0000:0000:0215:17ff:fea6:cd49 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:33:01 node-10-157-128-100 kernel: [79544.160417] IN=evrr-000008 OUT= MAC=33:33:00:00:00:01:0e:f1:9d:53:b5:1e:86:dd SRC=fe80:0000:0000:0000:58e8:01ff:fe0b:0db2 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780106] BUG: unable to handle kernel NULL pointer dereference at 0000000000000274
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780274] IP: [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780398] PGD 0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780441] Oops: 0000 [#1] SMP
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780515] CPU 7
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780547] Modules linked in: ipt_REJECT nf_conntrack_netlink nfnetlink dm_round_robin dm_multipath veth ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_limit xt_state xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables ebt_ip ebtable_filter ebtables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding usb_storage uas joydev usbhid hid radeon ttm ses enclosure drm_kms_helper drm kvm_amd psmouse kvm amd64_edac_mod dcdbas serio_raw e1000e megaraid_sas edac_core i2c_algo_bit k8temp edac_mce_amd pata_serverworks shpchp i2c_piix4 bridge 8021q garp stp ixgbe dca mdio [last unloaded: multipath]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.786143]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Pid: 25711, comm: kworker/u:2 Not tainted 3.0.0-10-server #16-Ubuntu Dell Inc. PowerEdge 6950/0GK775
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RIP: 0010:[<ffffffff81511959>]  [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RSP: 0018:ffff880801109c00  EFLAGS: 00010246
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RAX: 0000000000000004 RBX: ffff880c0f796af8 RCX: 000000000000ffff
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RDX: ffff8803f1700000 RSI: 0000000000000003 RDI: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RBP: ffff880801109c00 R08: ffff880801108000 R09: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] R10: 0000000000000001 R11: dead000000200200 R12: ffff880801109cb0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] R13: ffff880c0f796af8 R14: ffff880c0f796af8 R15: 0000000000000004
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] FS: 00007fe2d2d57710(0000) GS:ffff881027d00000(0000) knlGS:0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] CR2: 0000000000000274 CR3: 0000000001c03000 CR4: 00000000000006e0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Process kworker/u:2 (pid: 25711, threadinfo ffff880801108000, task ffff88080f5f9720)
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880801109c10 ffffffffa048f145 ffff880801109c90 ffffffffa049943b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880801109c70 7fffffffffffffff ffff8803f1700000 0000000300000004
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880800000000 ffffffff00000002 ffff880c0fc80000 ffff880c0f796b98
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Call Trace:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa048f145>] nfnetlink_has_listeners+0x15/0x20 [nfnetlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa049943b>] ctnetlink_conntrack_event+0x5cb/0x890 [nf_conntrack_netlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e34d0>] ? net_drop_ns+0x50/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa04062d8>] death_by_timeout+0xc8/0x1c0 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0405270>] ? nf_conntrack_attach+0x50/0x50 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0406448>] nf_ct_iterate_cleanup+0x78/0x90 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0406491>] nf_conntrack_cleanup_net+0x31/0x100 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0407f97>] nf_conntrack_cleanup+0x27/0x60 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa04081f0>] nf_conntrack_net_exit+0x60/0x80 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e2d28>] ops_exit_list.isra.1+0x38/0x60
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e35e2>] cleanup_net+0x112/0x1b0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107bb0a>] process_one_work+0x11a/0x480
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107c7d5>] worker_thread+0x165/0x370
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107c670>] ? manage_workers.isra.30+0x130/0x130
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff81080c1c>] kthread+0x8c/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81607c24>] kernel_thread_helper+0x4/0x10
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81080b90>] ? flush_kthread_worker+0xa0/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81607c20>] ? gs_change+0x13/0x13
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] Code: 00 00 48 85 f6 74 0c 48 83 ee 01 48 89 df e8 cf f6 ff ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66 90 <f6> 87 74 02 00 00 01 74 30 0f b6 87 21 01 00 00 4c 8b 05 38 8d
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] RIP [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699]  RSP <ffff880801109c00>
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] CR2: 0000000000000274
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954892] ---[ end trace 73540474560834fd ]---
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954941] BUG: unable to handle kernel paging request at fffffffffffffff8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954945] IP: [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954950] PGD 1c05067 PUD 1c06067 PMD 0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954954] Oops: 0000 [#2] SMP
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954957] CPU 7
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954959] Modules linked in: ipt_REJECT nf_conntrack_netlink nfnetlink dm_round_robin dm_multipath veth ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_limit xt_state xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables ebt_ip ebtable_filter ebtables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding usb_storage uas joydev usbhid hid radeon ttm ses enclosure drm_kms_helper drm kvm_amd psmouse kvm amd64_edac_mod dcdbas serio_raw e1000e megaraid_sas edac_core i2c_algo_bit k8temp edac_mce_amd pata_serverworks shpchp i2c_piix4 bridge 8021q garp stp ixgbe dca mdio [last unloaded: multipath]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955032]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955034] Pid: 25711, comm: kworker/u:2 Tainted: G      D     3.0.0-10-server #16-Ubuntu Dell Inc. PowerEdge 6950/0GK775
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955040] RIP: 0010:[<ffffffff810810b1>]  [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955046] RSP: 0018:ffff880801109870  EFLAGS: 00010096
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955048] RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000007
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955051] RDX: 0000000000000007 RSI: 0000000000000007 RDI: ffff88080f5f9720
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955054] RBP: ffff880801109888 R08: 0000000000989680 R09: 0000000000000001
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955057] R10: 0000000000000400 R11: ffff88080f65c118 R12: 0000000000000007
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955060] R13: ffff88080f5f9ae8 R14: 0000000000000000 R15: 0000000000000246
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955064] FS: 00007fe2d2d57710(0000) GS:ffff881027d00000(0000) knlGS:0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955067] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955070] CR2: fffffffffffffff8 CR3: 0000000001c03000 CR4: 00000000000006e0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955073] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955076] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955079] Process kworker/u:2 (pid: 25711, threadinfo ffff880801108000, task ffff88080f5f9720)
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955082] Stack:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955083] ffffffff8107cd25 ffff880801109888 ffff881027d12a40 ffff880801109908
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955089] ffffffff815fc737 ffff88080f5f9720 0000000000000000 ffff880801109fd8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955094] ffff880801109fd8 ffff880801109fd8 0000000000012a40 0000000000000011
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955100] Call Trace:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955103] [<ffffffff8107cd25>] ? wq_worker_sleeping+0x15/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955108] [<ffffffff815fc737>] schedule+0x637/0x770
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955114] [<ffffffff81063053>] do_exit+0x273/0x440
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955119] [<ffffffff815ffbd0>] oops_end+0xb0/0xf0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955124] [<ffffffff815e7104>] no_context+0x145/0x152
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955128] [<ffffffff815e729f>] __bad_area_nosemaphore+0x18e/0x1b1
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955133] [<ffffffff815e72d5>] bad_area_nosemaphore+0x13/0x15
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955138] [<ffffffff816024fd>] do_page_fault+0x43d/0x530
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955144] [<ffffffff8100969a>] ? __switch_to+0xca/0x310
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955148] [<ffffffff815fe73e>] ? _raw_spin_lock+0xe/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955154] [<ffffffff8104e749>] ? finish_task_switch+0x49/0xf0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955158] [<ffffffff815fef15>] page_fault+0x25/0x30
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955162] [<ffffffff81511959>] ? netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955167] [<ffffffffa048f145>] nfnetlink_has_listeners+0x15/0x20 [nfnetlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955172] [<ffffffffa049943b>] ctnetlink_conntrack_event+0x5cb/0x890 [nf_conntrack_netlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955177] [<ffffffff814e34d0>] ? net_drop_ns+0x50/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955183] [<ffffffffa04062d8>] death_by_timeout+0xc8/0x1c0 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955189] [<ffffffffa0405270>] ? nf_conntrack_attach+0x50/0x50 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955195] [<ffffffffa0406448>] nf_ct_iterate_cleanup+0x78/0x90 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955202] [<ffffffffa0406491>] nf_conntrack_cleanup_net+0x31/0x100 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955209] [<ffffffffa0407f97>] nf_conntrack_cleanup+0x27/0x60 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955215] [<ffffffffa04081f0>] nf_conntrack_net_exit+0x60/0x80 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955220] [<ffffffff814e2d28>] ops_exit_list.isra.1+0x38/0x60
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955224] [<ffffffff814e35e2>] cleanup_net+0x112/0x1b0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955229] [<ffffffff8107bb0a>] process_one_work+0x11a/0x480
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955233] [<ffffffff8107c7d5>] worker_thread+0x165/0x370
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955237] [<ffffffff8107c670>] ? manage_workers.isra.30+0x130/0x130
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955241] [<ffffffff81080c1c>] kthread+0x8c/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955245] [<ffffffff81607c24>] kernel_thread_helper+0x4/0x10
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955250] [<ffffffff81080b90>] ? flush_kthread_worker+0xa0/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955254] [<ffffffff81607c20>] ? gs_change+0x13/0x13
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955256] Code: 41 5f 5d c3 be 3e 01 00 00 48 c7 c7 88 90 9e 81 e8 b5 d7 fd ff e9 74 fe ff ff 55 48 89 e5 66 66 66 66 90 48 8b 87 70 03 00 00 5d
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955311] RIP [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955315]  RSP <ffff880801109870>
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955317] CR2: fffffffffffffff8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955319] ---[ end trace 73540474560834fe ]---
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955322] Fixing recursive fault but reboot is needed!
Sep  9 14:35:00 node-10-157-128-100 kernel: [79663.120046] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=6002 jiffies)
Sep  9 14:35:05 node-10-157-128-100 kernel: [79667.970047] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=6002 jiffies)
Sep  9 14:38:00 node-10-157-128-100 kernel: [79843.440048] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=24034 jiffies)
Sep  9 14:38:05 node-10-157-128-100 kernel: [79848.290046] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=24034 jiffies)
Sep  9 14:41:00 node-10-157-128-100 kernel: [80023.760051] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=42066 jiffies)
Sep  9 14:41:05 node-10-157-128-100 kernel: [80028.610044] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=42066 jiffies)
Sep  9 14:44:01 node-10-157-128-100 kernel: [80204.080047] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=60098 jiffies)
Sep  9 14:44:06 node-10-157-128-100 kernel: [80208.930045] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=60098 jiffies)
Sep  9 14:44:44 node-10-157-128-100 kernel: Kernel logging (proc) stopped.


* [PATCH] Fix repeatable Oops on container destroy with conntrack
@ 2011-09-10 18:48 Alex Bligh
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Bligh @ 2011-09-10 18:48 UTC (permalink / raw)
  To: netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
	netfilter-u79uwXL29TY76Z2rM5mHXA,
	coreteam-Cap9r6Oaw4JrovVCs/uTlw,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Linux

Problem:

A repeatable Oops can be caused if a container with networking
unshared is destroyed when it has nf_conntrack entries yet to expire.

A copy of the oops follows below. A perl program generating the oops
repeatably is attached inline below.

Analysis:

The oops is triggered from cleanup_net when the namespace is
destroyed. conntrack iterates through outstanding entries and calls
death_by_timeout on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nfnetlink_has_listeners, which
oopses because net->nfnl is NULL.
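
For readers decoding the trace: the fault address 0x274 (CR2 in the first
oops) is consistent with reading a field at that offset through a NULL base
pointer. RDI is 0 in the register dump, and the faulting bytes in the Code:
line, f6 87 74 02 00 00 01, decode as testb $0x1,0x274(%rdi). The C sketch
below only illustrates the arithmetic. struct toy_sock and its padding are
hypothetical, chosen so that one byte sits at offset 0x274; they are not the
real struct sock layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the kernel's struct sock. The padding exists
 * only so that the flags byte lands at offset 0x274. */
struct toy_sock {
    unsigned char pad[0x274];
    unsigned char flags;
};

/* Address the CPU would touch when reading sk->flags: base + field offset. */
uintptr_t flags_address(uintptr_t base)
{
    return base + offsetof(struct toy_sock, flags);
}

/* With a NULL base, the access lands at 0x274, the value reported in CR2. */
void check_null_base(void)
{
    assert(flags_address(0) == 0x274);
}
```

A small nonzero fault address of this kind usually points at a NULL
structure pointer rather than a NULL pointer being read directly.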

The perl program creates the container through fork() followed by
unshare(CLONE_NEWNET). It does not explicitly set up netlink, but I
presume it was set up, else net->nfnl
would have been NULL earlier (i.e. when an earlier connection
timed out). This would thus suggest that net->nfnl is made NULL
during the destruction of the container, which I think is done by
nfnetlink_net_exit_batch.

I can see that the various subsystems are deinitialised in the reverse
of the order in which the relevant register_pernet_subsys calls are made,
and both nf_conntrack and nfnetlink_net_ops register their relevant
subsystems. If nfnetlink_net_ops registered later than nf_conntrack,
then its exit routine would have been called first, which would cause
the oops described. I am not sure there is anything to prevent this
happening in a container environment.
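
The ordering hazard described above can be modelled in user space. This is
a minimal sketch, not kernel code: toy_net, toy_ops, toy_register and
toy_cleanup_net are hypothetical stand-ins for struct net, the per-namespace
operations, register_pernet_subsys and cleanup_net, and the "oops" is
recorded in a flag instead of actually dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_net { void *nfnl; int oops; };
struct toy_ops { void (*exit)(struct toy_net *); };

#define MAX_OPS 8
static struct toy_ops *registered[MAX_OPS];
static size_t n_registered;

void toy_register(struct toy_ops *ops)
{
    assert(n_registered < MAX_OPS);
    registered[n_registered++] = ops;
}

/* Run exit hooks newest-first, i.e. in reverse registration order. */
void toy_cleanup_net(struct toy_net *net)
{
    for (size_t i = n_registered; i-- > 0; )
        registered[i]->exit(net);
}

/* nfnetlink stand-in: tears down the per-namespace socket pointer. */
void toy_nfnetlink_exit(struct toy_net *net)
{
    net->nfnl = NULL;
}

/* conntrack stand-in: would look at net->nfnl while flushing entries. */
void toy_conntrack_exit(struct toy_net *net)
{
    if (net->nfnl == NULL)
        net->oops = 1;   /* the real code would fault here */
}

/* Returns 1 if the conntrack exit hook saw a NULL nfnl, else 0. */
int toy_run(int conntrack_first)
{
    static int sock;
    static struct toy_ops ct = { toy_conntrack_exit };
    static struct toy_ops nl = { toy_nfnetlink_exit };
    struct toy_net net = { &sock, 0 };

    n_registered = 0;
    if (conntrack_first) {
        toy_register(&ct);
        toy_register(&nl);
    } else {
        toy_register(&nl);
        toy_register(&ct);
    }
    toy_cleanup_net(&net);
    return net.oops;
}
```

toy_run(1) registers the conntrack stand-in first and the nfnetlink
stand-in second, so the nfnetlink exit hook runs first and the conntrack
hook then sees a NULL nfnl. toy_run(0) is the order in which the pointer is
still valid when conntrack cleans up.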

Whilst there's perhaps a more complex problem revolving around ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check for net->nfnl
being non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure any lock is held at this point or
how synchronisation for subsystem deinitialization works).
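
In the same toy style, the effect of an early NULL check can be sketched as
follows. The names are hypothetical and the model is single-threaded, so it
says nothing about the possible race mentioned above; it only shows the
event being dropped instead of a NULL net->nfnl being dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; not the kernel's struct sock or struct net. */
struct toy_sock { int listeners; };
struct toy_net { struct toy_sock *nfnl; };

/* Dereferences sk unconditionally, like the listener check in the trace. */
int toy_has_listeners(const struct toy_sock *sk)
{
    return sk->listeners != 0;
}

/* Returns 1 if an event would be sent, 0 if it is skipped. */
int toy_conntrack_event(const struct toy_net *net)
{
    const struct toy_sock *sk = net->nfnl;

    /* Namespace teardown: the socket may already be gone, so drop the event. */
    if (sk == NULL)
        return 0;

    if (!toy_has_listeners(sk))
        return 0;

    return 1;
}
```

Dropping the event is the cheap outcome here: with nfnl already NULL there
is no socket left to deliver it to.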

Patch:

The patch attached should apply on everything from 2.6.26 (if not before)
onwards; it appears to be a problem on all kernels. This was taken against
Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
with the perl script below for 15 minutes or so; the perl script hung the
machine within 20 seconds without this patch.

Applicability:

If this is the right solution, it should be applied to all stable kernels
as well as head. Apart from the minor overhead of checking one variable
against NULL, it can never 'do the wrong thing', because if net->nfnl
is NULL, an oops will inevitably result. Therefore, checking is a reasonable
thing to do unless it can be proven that net->nfnl will never be NULL.

-- 
Alex Bligh

Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on container destroy

Signed-off-by: Alex Bligh <alex-rWA27mgs/Jz10XsdtD+oqA@public.gmane.org>
---
 net/netfilter/nf_conntrack_netlink.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 482e90c..0790d0a 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
 		return 0;

 	net = nf_ct_net(ct);
+
+	/* container deinit, netlink may have died before death_by_timeout */
+	if (!net->nfnl)
+		return 0;
+
 	if (!item->report && !nfnetlink_has_listeners(net, group))
 		return 0;

-- 
1.7.5.4


Perl script to replicate the bug (and demonstrate the fix)

#!/usr/bin/perl

# Prerequisites:
# Install Linux::Unshare from CPAN
# Ensure conntrack is installed

use strict;
use warnings;

use POSIX "setsid";
use Linux::Unshare qw(unshare :clone); # get this from CPAN

# Parent returns PID, child returns 0
sub daemonize {
    chdir("/")                      || die "can't chdir to /: $!";
    open(STDIN,  "< /dev/null")     || die "can't read /dev/null: $!";
    open(STDOUT, "> /dev/null")     || die "can't write to /dev/null: $!";
    defined(my $pid = fork())       || die "can't fork: $!";
    return $pid if $pid;               # non-zero now means I am the parent
    (setsid() != -1)                || die "Can't start a new session: $!";
    open(STDERR, ">&STDOUT")        || die "can't dup stdout: $!";
    return 0;
}

sub docontainer
{
    print STDERR "Child: container starting\n";
    sleep (5);
    print STDERR "Child: setting ip addresses\n";
    system("ip link set vethr up");
    system("ip link show");
    system("ip addr add 10.99.99.2/24 dev vethr");
    system("ip addr add 127.0.0.1/8 dev lo");
    system("ip link set lo up");
    system("echo 1 > /proc/sys/net/ipv4/ip_forward");
    system("echo 1 > /proc/sys/net/ipv4/conf/all/forwarding");
    system("echo 1 > /proc/sys/net/ipv6/conf/all/forwarding");

    system("echo 2 > /proc/sys/net/ipv4/conf/all/rp_filter");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/accept_source_route");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects");
    system("echo 0 > /proc/sys/net/ipv4/conf/all/accept_redirects");
    system("echo 1 > /proc/sys/net/ipv4/conf/all/proxy_arp");
    system("iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT");
    system("iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT");
    print STDERR "Child: pinging parent and running conntrack\n";
    while (1)
    {
        system("conntrack -L");
        system("ping -n -c1 10.99.99.2");
        sleep(1);
    }
    exit(0);
}

sub startcontainer
{
    system("ip link add vethl type veth peer name vethr");
    system("ip addr add 10.99.99.1/24 dev vethl");
    system("ip link set vethl up");

    print "Parent: Start container\n";

    defined(my $cpid = fork())       || die "can't fork: $!";
	
    if (!$cpid)
    {
        print STDERR "Child: starting\n";
        unshare(CLONE_NEWNET);
        docontainer();
        exit 0;
    }
	
    print STDERR "Parent: Container started pid $cpid\n";
    system("ip addr show | fgrep veth");
    sleep(1);
    print STDERR "Parent: running ip link set vethr netns $cpid\n";
    (system("ip link set vethr netns $cpid") ==0) || print STDERR "WARNING!: ip link netns failed\n";
    sleep(1);
    system("ip addr show | fgrep veth");
    print STDERR "Parent: Moved vethr, parent pinging child\n";
    system("ping -n -c5 10.99.99.2");
    print STDERR "Parent: Moved vethr, ping done\n";
    sleep(2);
    system("kill -KILL $cpid");
}

while (1)
{
    startcontainer;
}


The oops:

root@node-10-157-128-100:~# uname -a
Linux node-10-157-128-100 3.0.0-10-server #16-Ubuntu SMP Fri Sep 2 18:51:05 UTC 2011 x86_64 GNU/Linux


Sep  9 14:30:56 node-10-157-128-100 kernel: [79418.880263] IN=evrr-000000 OUT= MAC=33:33:00:00:00:01:00:15:17:a6:cd:49:86:dd SRC=fe80:0000:0000:0000:0215:17ff:fea6:cd49 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:30:56 node-10-157-128-100 kernel: [79418.880332] IN=evrr-000008 OUT= MAC=33:33:00:00:00:01:0e:f1:9d:53:b5:1e:86:dd SRC=fe80:0000:0000:0000:58e8:01ff:fe0b:0db2 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:33:01 node-10-157-128-100 kernel: [79544.160335] IN=evrr-000000 OUT= MAC=33:33:00:00:00:01:00:15:17:a6:cd:49:86:dd SRC=fe80:0000:0000:0000:0215:17ff:fea6:cd49 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:33:01 node-10-157-128-100 kernel: [79544.160417] IN=evrr-000008 OUT= MAC=33:33:00:00:00:01:0e:f1:9d:53:b5:1e:86:dd SRC=fe80:0000:0000:0000:58e8:01ff:fe0b:0db2 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780106] BUG: unable to handle kernel NULL pointer dereference at 0000000000000274
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780274] IP: [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780398] PGD 0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780441] Oops: 0000 [#1] SMP
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780515] CPU 7
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.780547] Modules linked in: ipt_REJECT nf_conntrack_netlink nfnetlink dm_round_robin dm_multipath veth ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_limit xt_state xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables ebt_ip ebtable_filter ebtables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding usb_storage uas joydev usbhid hid radeon ttm ses enclosure drm_kms_helper drm kvm_amd psmouse kvm amd64_edac_mod dcdbas serio_raw e1000e megaraid_sas edac_core i2c_algo_bit k8temp edac_mce_amd pata_serverworks shpchp i2c_piix4 bridge 8021q garp stp ixgbe dca mdio [last unloaded: multipath]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.786143]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Pid: 25711, comm: kworker/u:2 Not tainted 3.0.0-10-server #16-Ubuntu Dell Inc. PowerEdge 6950/0GK775
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RIP: 0010:[<ffffffff81511959>]  [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RSP: 0018:ffff880801109c00  EFLAGS: 00010246
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RAX: 0000000000000004 RBX: ffff880c0f796af8 RCX: 000000000000ffff
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RDX: ffff8803f1700000 RSI: 0000000000000003 RDI: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] RBP: ffff880801109c00 R08: ffff880801108000 R09: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] R10: 0000000000000001 R11: dead000000200200 R12: ffff880801109cb0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] R13: ffff880c0f796af8 R14: ffff880c0f796af8 R15: 0000000000000004
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] FS: 00007fe2d2d57710(0000) GS:ffff881027d00000(0000) knlGS:0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] CR2: 0000000000000274 CR3: 0000000001c03000 CR4: 00000000000006e0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Process kworker/u:2 (pid: 25711, threadinfo ffff880801108000, task ffff88080f5f9720)
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880801109c10 ffffffffa048f145 ffff880801109c90 ffffffffa049943b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880801109c70 7fffffffffffffff ffff8803f1700000 0000000300000004
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] ffff880800000000 ffffffff00000002 ffff880c0fc80000 ffff880c0f796b98
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] Call Trace:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa048f145>] nfnetlink_has_listeners+0x15/0x20 [nfnetlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa049943b>] ctnetlink_conntrack_event+0x5cb/0x890 [nf_conntrack_netlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e34d0>] ? net_drop_ns+0x50/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa04062d8>] death_by_timeout+0xc8/0x1c0 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0405270>] ? nf_conntrack_attach+0x50/0x50 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0406448>] nf_ct_iterate_cleanup+0x78/0x90 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0406491>] nf_conntrack_cleanup_net+0x31/0x100 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa0407f97>] nf_conntrack_cleanup+0x27/0x60 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffffa04081f0>] nf_conntrack_net_exit+0x60/0x80 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e2d28>] ops_exit_list.isra.1+0x38/0x60
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff814e35e2>] cleanup_net+0x112/0x1b0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107bb0a>] process_one_work+0x11a/0x480
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107c7d5>] worker_thread+0x165/0x370
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff8107c670>] ? manage_workers.isra.30+0x130/0x130
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.790033] [<ffffffff81080c1c>] kthread+0x8c/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81607c24>] kernel_thread_helper+0x4/0x10
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81080b90>] ? flush_kthread_worker+0xa0/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] [<ffffffff81607c20>] ? gs_change+0x13/0x13
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] Code: 00 00 48 85 f6 74 0c 48 83 ee 01 48 89 df e8 cf f6 ff ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66 90 <f6> 87 74 02 00 00 01 74 30 0f b6 87 21 01 00 00 4c 8b 05 38 8d
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] RIP [<ffffffff81511959>] netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699]  RSP <ffff880801109c00>
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.950699] CR2: 0000000000000274
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954892] ---[ end trace 73540474560834fd ]---
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954941] BUG: unable to handle kernel paging request at fffffffffffffff8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954945] IP: [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954950] PGD 1c05067 PUD 1c06067 PMD 0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954954] Oops: 0000 [#2] SMP
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954957] CPU 7
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.954959] Modules linked in: ipt_REJECT nf_conntrack_netlink nfnetlink dm_round_robin dm_multipath veth ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_limit xt_state xt_tcpudp iptable_filter ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables ebt_ip ebtable_filter ebtables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding usb_storage uas joydev usbhid hid radeon ttm ses enclosure drm_kms_helper drm kvm_amd psmouse kvm amd64_edac_mod dcdbas serio_raw e1000e megaraid_sas edac_core i2c_algo_bit k8temp edac_mce_amd pata_serverworks shpchp i2c_piix4 bridge 8021q garp stp ixgbe dca mdio [last unloaded: multipath]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955032]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955034] Pid: 25711, comm: kworker/u:2 Tainted: G      D     3.0.0-10-server #16-Ubuntu Dell Inc. PowerEdge 6950/0GK775
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955040] RIP: 0010:[<ffffffff810810b1>]  [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955046] RSP: 0018:ffff880801109870  EFLAGS: 00010096
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955048] RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000007
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955051] RDX: 0000000000000007 RSI: 0000000000000007 RDI: ffff88080f5f9720
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955054] RBP: ffff880801109888 R08: 0000000000989680 R09: 0000000000000001
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955057] R10: 0000000000000400 R11: ffff88080f65c118 R12: 0000000000000007
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955060] R13: ffff88080f5f9ae8 R14: 0000000000000000 R15: 0000000000000246
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955064] FS: 00007fe2d2d57710(0000) GS:ffff881027d00000(0000) knlGS:0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955067] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955070] CR2: fffffffffffffff8 CR3: 0000000001c03000 CR4: 00000000000006e0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955073] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955076] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955079] Process kworker/u:2 (pid: 25711, threadinfo ffff880801108000, task ffff88080f5f9720)
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955082] Stack:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955083] ffffffff8107cd25 ffff880801109888 ffff881027d12a40 ffff880801109908
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955089] ffffffff815fc737 ffff88080f5f9720 0000000000000000 ffff880801109fd8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955094] ffff880801109fd8 ffff880801109fd8 0000000000012a40 0000000000000011
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955100] Call Trace:
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955103] [<ffffffff8107cd25>] ? wq_worker_sleeping+0x15/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955108] [<ffffffff815fc737>] schedule+0x637/0x770
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955114] [<ffffffff81063053>] do_exit+0x273/0x440
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955119] [<ffffffff815ffbd0>] oops_end+0xb0/0xf0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955124] [<ffffffff815e7104>] no_context+0x145/0x152
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955128] [<ffffffff815e729f>] __bad_area_nosemaphore+0x18e/0x1b1
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955133] [<ffffffff815e72d5>] bad_area_nosemaphore+0x13/0x15
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955138] [<ffffffff816024fd>] do_page_fault+0x43d/0x530
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955144] [<ffffffff8100969a>] ? __switch_to+0xca/0x310
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955148] [<ffffffff815fe73e>] ? _raw_spin_lock+0xe/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955154] [<ffffffff8104e749>] ? finish_task_switch+0x49/0xf0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955158] [<ffffffff815fef15>] page_fault+0x25/0x30
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955162] [<ffffffff81511959>] ? netlink_has_listeners+0x9/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955167] [<ffffffffa048f145>] nfnetlink_has_listeners+0x15/0x20 [nfnetlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955172] [<ffffffffa049943b>] ctnetlink_conntrack_event+0x5cb/0x890 [nf_conntrack_netlink]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955177] [<ffffffff814e34d0>] ? net_drop_ns+0x50/0x50
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955183] [<ffffffffa04062d8>] death_by_timeout+0xc8/0x1c0 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955189] [<ffffffffa0405270>] ? nf_conntrack_attach+0x50/0x50 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955195] [<ffffffffa0406448>] nf_ct_iterate_cleanup+0x78/0x90 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955202] [<ffffffffa0406491>] nf_conntrack_cleanup_net+0x31/0x100 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955209] [<ffffffffa0407f97>] nf_conntrack_cleanup+0x27/0x60 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955215] [<ffffffffa04081f0>] nf_conntrack_net_exit+0x60/0x80 [nf_conntrack]
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955220] [<ffffffff814e2d28>] ops_exit_list.isra.1+0x38/0x60
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955224] [<ffffffff814e35e2>] cleanup_net+0x112/0x1b0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955229] [<ffffffff8107bb0a>] process_one_work+0x11a/0x480
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955233] [<ffffffff8107c7d5>] worker_thread+0x165/0x370
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955237] [<ffffffff8107c670>] ? manage_workers.isra.30+0x130/0x130
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955241] [<ffffffff81080c1c>] kthread+0x8c/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955245] [<ffffffff81607c24>] kernel_thread_helper+0x4/0x10
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955250] [<ffffffff81080b90>] ? flush_kthread_worker+0xa0/0xa0
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955254] [<ffffffff81607c20>] ? gs_change+0x13/0x13
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955256] Code: 41 5f 5d c3 be 3e 01 00 00 48 c7 c7 88 90 9e 81 e8 b5 d7 fd ff e9 74 fe ff ff 55 48 89 e5 66 66 66 66 90 48 8b 87 70 03 00 00 5d
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955311] RIP [<ffffffff810810b1>] kthread_data+0x11/0x20
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955315]  RSP <ffff880801109870>
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955317] CR2: fffffffffffffff8
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955319] ---[ end trace 73540474560834fe ]---
Sep  9 14:34:00 node-10-157-128-100 kernel: [79602.955322] Fixing recursive fault but reboot is needed!
Sep  9 14:35:00 node-10-157-128-100 kernel: [79663.120046] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=6002 jiffies)
Sep  9 14:35:05 node-10-157-128-100 kernel: [79667.970047] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=6002 jiffies)
Sep  9 14:38:00 node-10-157-128-100 kernel: [79843.440048] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=24034 jiffies)
Sep  9 14:38:05 node-10-157-128-100 kernel: [79848.290046] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=24034 jiffies)
Sep  9 14:41:00 node-10-157-128-100 kernel: [80023.760051] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=42066 jiffies)
Sep  9 14:41:05 node-10-157-128-100 kernel: [80028.610044] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=42066 jiffies)
Sep  9 14:44:01 node-10-157-128-100 kernel: [80204.080047] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=60098 jiffies)
Sep  9 14:44:06 node-10-157-128-100 kernel: [80208.930045] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 1 7} (detected by 6, t=60098 jiffies)
Sep  9 14:44:44 node-10-157-128-100 kernel: Kernel logging (proc) stopped.


Thread overview: 16+ messages
2011-09-10 18:48 [PATCH] Fix repeatable Oops on container destroy with conntrack Alex Bligh
2011-09-10 18:48 ` Alex Bligh
2011-09-10 18:48 ` Alex Bligh
2011-09-12  7:25 ` Alexey Dobriyan
2011-09-12  9:37   ` Pablo Neira Ayuso
2011-09-12 10:32     ` Alex Bligh
2011-09-12 18:33       ` Pablo Neira Ayuso
2011-09-12 19:06         ` Alex Bligh
2011-09-13 20:44           ` Alex Bligh
2011-09-13 20:44             ` Alex Bligh
2011-09-14  1:35             ` Pablo Neira Ayuso
2011-09-14  8:01               ` Alex Bligh
2011-09-28 21:08                 ` Pablo Neira Ayuso
2011-09-30 15:54                   ` Alex Bligh
