From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Eric Dumazet <eric.dumazet@gmail.com>,
	Florian Westphal <fw@strlen.de>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Patrick McHardy <kaber@trash.net>,
	Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>,
	"David S. Miller" <davem@davemloft.net>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Andrey Vagin <avagin@openvz.org>,
	Eric Dumazet <edumazet@google.com>
Subject: [PATCH 3.10 03/80] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
Date: Tue,  1 Mar 2016 15:44:57 -0800
Message-ID: <20160301234349.777272833@linuxfoundation.org>
In-Reply-To: <20160301234349.667990420@linuxfoundation.org>

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Andrey Vagin <avagin@openvz.org>

commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

Let's look at destroy_conntrack:

hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
...
nf_conntrack_free(ct)
	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);

net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.

The hash is protected by RCU, so readers look up conntracks without
locks.
A conntrack is removed from the hash, but at that moment a few readers
may still be using it. Then the conntrack is released, and another
thread creates a conntrack at the same address with an equal tuple.
After this, a reader starts to validate the conntrack:
* It's not dying, because a new conntrack was created.
* nf_ct_tuple_equal() returns true.

But this conntrack is not fully initialized yet, so it must not be used
by two threads concurrently. In this case a BUG_ON may be triggered in
nf_nat_setup_info().

Florian Westphal suggested checking the confirmed bit too. I think
that's right.
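
The lookup pattern this check belongs to (take a reference unless the
count is zero, then re-validate the key, now including the confirmed
bit) can be modelled in userspace roughly as below. This is only a
sketch with C11 atomics: struct model_obj, get_unless_zero(),
model_put() and find_get() are hypothetical names, not kernel symbols,
and where this model returns NULL on a key mismatch the kernel code
jumps back to "begin" and rescans the bucket.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct nf_conn. */
struct model_obj {
	atomic_int use;   /* reference count, 0 means "being freed" */
	int key;          /* stands in for tuple + zone */
	bool confirmed;   /* stands in for the confirmed status bit */
};

/* Userspace analogue of atomic_inc_not_zero(). */
static bool get_unless_zero(struct model_obj *o)
{
	int old = atomic_load(&o->use);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->use, &old, old + 1))
			return true;
	}
	return false;
}

static void model_put(struct model_obj *o)
{
	atomic_fetch_sub(&o->use, 1);
}

/*
 * Validate a candidate found by a lockless walk: pin it first, then
 * recheck the key, because the memory may have been recycled between
 * the walk and the moment the reference was taken.
 */
static struct model_obj *find_get(struct model_obj *candidate, int key)
{
	if (!get_unless_zero(candidate))
		return NULL;

	if (candidate->key != key || !candidate->confirmed) {
		model_put(candidate);   /* drop the reference just taken */
		return NULL;            /* the kernel would rescan here */
	}
	return candidate;
}
```

In this model an object with an equal key but without the confirmed
flag is rejected and its reference count is left unchanged.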

task 1			task 2			task 3
			nf_conntrack_find_get
			 ____nf_conntrack_find
destroy_conntrack
 hlist_nulls_del_rcu
 nf_conntrack_free
 kmem_cache_free
						__nf_conntrack_alloc
						 kmem_cache_alloc
						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
			 if (nf_ct_is_dying(ct))
			 if (!nf_ct_tuple_equal()
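
The interleaving above can be replayed deterministically in a
single-threaded userspace model. This is only an illustration, not
kernel code: struct model_ct, the one-slot "cache", and the helper
names are all hypothetical, and the one-slot cache simply assumes the
freed object is reused at the same address.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct nf_conn. */
struct model_ct {
	int tuple;       /* stands in for the conntrack tuple */
	bool dying;      /* stands in for the dying status bit */
	bool confirmed;  /* stands in for the confirmed status bit */
};

/* One-slot "slab cache": a freed object is reused at the same address. */
static struct model_ct slot;

static struct model_ct *model_alloc(int tuple)
{
	memset(&slot, 0, sizeof(slot));  /* fresh object, not yet confirmed */
	slot.tuple = tuple;
	return &slot;
}

/* Reader-side validation, with or without the confirmed check. */
static bool reader_accepts(const struct model_ct *ct, int tuple,
			   bool check_confirmed)
{
	if (ct->dying)
		return false;
	if (ct->tuple != tuple)
		return false;
	if (check_confirmed && !ct->confirmed)
		return false;
	return true;
}

/* Returns true if the reader accepts the recycled, unconfirmed object. */
static bool replay_race(bool check_confirmed)
{
	struct model_ct *ct = model_alloc(42);
	struct model_ct *stale;

	ct->confirmed = true;    /* object is in the hash, fully set up */
	stale = ct;              /* task 2: found it during the lookup */
	ct->dying = true;        /* task 1: destroys and frees it */
	(void)model_alloc(42);   /* task 3: same address, equal tuple */

	/* task 2 resumes and validates its now-stale pointer */
	return reader_accepts(stale, 42, check_confirmed);
}
```

Here replay_race(false) models the old validation and accepts the
recycled object, while replay_race(true) models the added confirmed
check and rejects it.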

I'm not sure that I have ever seen this race condition in real life.
We are currently investigating a bug that reproduces on a few nodes.
In our case one conntrack is initialized by several tasks concurrently,
and we don't have any other explanation for this.

<2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
...
<4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>]  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat]
...
<4>[46267.085549] Call Trace:
<4>[46267.085622]  [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat]
<4>[46267.085697]  [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
<4>[46267.085770]  [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat]
<4>[46267.085843]  [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat]
<4>[46267.085919]  [<ffffffff814841b9>] nf_iterate+0x69/0xb0
<4>[46267.085991]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
<4>[46267.086063]  [<ffffffff81484374>] nf_hook_slow+0x74/0x110
<4>[46267.086133]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
<4>[46267.086207]  [<ffffffff814b5890>] ? dst_output+0x0/0x20
<4>[46267.086277]  [<ffffffff81495204>] ip_output+0xa4/0xc0
<4>[46267.086346]  [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910
<4>[46267.086419]  [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0
<4>[46267.086491]  [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50
<4>[46267.086562]  [<ffffffff81444d67>] sock_sendmsg+0x117/0x140
<4>[46267.086638]  [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20
<4>[46267.086712]  [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40
<4>[46267.086785]  [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80
<4>[46267.086858]  [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20
<4>[46267.086936]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
<4>[46267.087006]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
<4>[46267.087081]  [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0
<4>[46267.087151]  [<ffffffff81445599>] sys_sendto+0x139/0x190
<4>[46267.087229]  [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0
<4>[46267.087303]  [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200
<4>[46267.087378]  [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290
<4>[46267.087454]  [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210
<4>[46267.087531]  [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210
<4>[46267.087607]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
<4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
<1>[46267.088023] RIP  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/netfilter/nf_conntrack_core.c |   21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -311,6 +311,21 @@ static void death_by_timeout(unsigned lo
 	nf_ct_put(ct);
 }
 
+static inline bool
+nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
+			const struct nf_conntrack_tuple *tuple,
+			u16 zone)
+{
+	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
+
+	/* A conntrack can be recreated with the equal tuple,
+	 * so we need to check that the conntrack is confirmed
+	 */
+	return nf_ct_tuple_equal(tuple, &h->tuple) &&
+		nf_ct_zone(ct) == zone &&
+		nf_ct_is_confirmed(ct);
+}
+
 /*
  * Warning :
  * - Caller must take a reference on returned object
@@ -332,8 +347,7 @@ ____nf_conntrack_find(struct net *net, u
 	local_bh_disable();
 begin:
 	hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[bucket], hnnode) {
-		if (nf_ct_tuple_equal(tuple, &h->tuple) &&
-		    nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)) == zone) {
+		if (nf_ct_key_equal(h, tuple, zone)) {
 			NF_CT_STAT_INC(net, found);
 			local_bh_enable();
 			return h;
@@ -380,8 +394,7 @@ begin:
 			     !atomic_inc_not_zero(&ct->ct_general.use)))
 			h = NULL;
 		else {
-			if (unlikely(!nf_ct_tuple_equal(tuple, &h->tuple) ||
-				     nf_ct_zone(ct) != zone)) {
+			if (unlikely(!nf_ct_key_equal(h, tuple, zone))) {
 				nf_ct_put(ct);
 				goto begin;
 			}
