From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.0 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D9EB4C4360F for ; Thu, 4 Apr 2019 09:24:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A0D7F20449 for ; Thu, 4 Apr 2019 09:24:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1554369869; bh=Sge524wtbEGLP1Fgee0ladk6Co39L53MMFu5z/9qCfM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=DQgGB38yigWkGWdFPN0kQoCJw5khjgIpcikgXj9B4gnYpiXx+s3NImkYuhLbedxZU 3uTlRSFkoibKoKCq2RJhj5kdPhiAWVuU5v8S8WhN5r9O0xHyb1dWR96IP9T/IVt9AU hSTHJRcoUFJyuQqWVlh4v02f4ki0DfihRMgTN0Bo= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387520AbfDDJO3 (ORCPT ); Thu, 4 Apr 2019 05:14:29 -0400 Received: from mail.kernel.org ([198.145.29.99]:54940 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387698AbfDDJOT (ORCPT ); Thu, 4 Apr 2019 05:14:19 -0400 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id B0A2D21734; Thu, 4 Apr 2019 09:14:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1554369259; bh=Sge524wtbEGLP1Fgee0ladk6Co39L53MMFu5z/9qCfM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NYez6fO0Z3eLefKfb7wkn7DZ4rLIOzqodAAgzArbgTJ0gOR7sdc9cerq7/J7z1mJX SdC2DE2yfJ/DD1NOYSTxkPlS+7aVMvMaPsB/+3RByjE9QlR6E8OTLhXzgzkysrDF0e ru/g4T6Vn3JBHUa5ouJnY6ZWUseEghReVFg/JZOM= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Chieh-Min Wang , Pablo Neira Ayuso , Sasha Levin Subject: [PATCH 5.0 143/246] netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm Date: Thu, 4 Apr 2019 10:47:23 +0200 Message-Id: <20190404084624.174384112@linuxfoundation.org> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190404084619.236418459@linuxfoundation.org> References: <20190404084619.236418459@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review X-Patchwork-Hint: ignore MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 5.0-stable review patch. If anyone has any objections, please let me know. ------------------ [ Upstream commit 13f5251fd17088170c18844534682d9cab5ff5aa ] For bridge(br_flood) or broadcast/multicast packets, they could clone skb with unconfirmed conntrack which break the rule that unconfirmed skb->_nfct is never shared. With nfqueue running on my system, the race can be easily reproduced with following warning calltrace: [13257.707525] CPU: 0 PID: 12132 Comm: main Tainted: P W 4.4.60 #7744 [13257.707568] Hardware name: Qualcomm (Flattened Device Tree) [13257.714700] [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [13257.720253] [] (show_stack) from [] (dump_stack+0x94/0xa8) [13257.728240] [] (dump_stack) from [] (warn_slowpath_common+0x94/0xb0) [13257.735268] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x1c/0x24) [13257.743519] [] (warn_slowpath_null) from [] (__nf_conntrack_confirm+0xa8/0x618) [13257.752284] [] (__nf_conntrack_confirm) from [] (ipv4_confirm+0xb8/0xfc) [13257.761049] [] (ipv4_confirm) from [] (nf_iterate+0x48/0xa8) [13257.769725] [] (nf_iterate) from [] (nf_hook_slow+0x30/0xb0) [13257.777108] [] (nf_hook_slow) from [] (br_nf_post_routing+0x274/0x31c) [13257.784486] [] (br_nf_post_routing) from [] (nf_iterate+0x48/0xa8) [13257.792556] [] (nf_iterate) from [] (nf_hook_slow+0x30/0xb0) [13257.800458] [] (nf_hook_slow) from [] (br_forward_finish+0x94/0xa4) [13257.808010] [] (br_forward_finish) from [] (br_nf_forward_finish+0x150/0x1ac) [13257.815736] [] (br_nf_forward_finish) from [] (nf_reinject+0x108/0x170) [13257.824762] [] (nf_reinject) from [] (nfqnl_recv_verdict+0x3d8/0x420) [13257.832924] [] (nfqnl_recv_verdict) from [] (nfnetlink_rcv_msg+0x158/0x248) [13257.841256] [] (nfnetlink_rcv_msg) from [] (netlink_rcv_skb+0x54/0xb0) [13257.849762] [] (netlink_rcv_skb) from [] (netlink_unicast+0x148/0x23c) [13257.858093] [] (netlink_unicast) from [] (netlink_sendmsg+0x2ec/0x368) [13257.866348] [] (netlink_sendmsg) from [] (sock_sendmsg+0x34/0x44) [13257.874590] [] (sock_sendmsg) from [] (___sys_sendmsg+0x1ec/0x200) [13257.882489] [] (___sys_sendmsg) from [] (__sys_sendmsg+0x3c/0x64) [13257.890300] [] (__sys_sendmsg) from [] (ret_fast_syscall+0x0/0x34) The original code just triggered the warning but do nothing. It will caused the shared conntrack moves to the dying list and the packet be droppped (nf_ct_resolve_clash returns NF_DROP for dying conntrack). - Reproduce steps: +----------------------------+ | br0(bridge) | | | +-+---------+---------+------+ | eth0| | eth1| | eth2| | | | | | | +--+--+ +--+--+ +---+-+ | | | | | | +--+-+ +-+--+ +--+-+ | PC1| | PC2| | PC3| +----+ +----+ +----+ iptables -A FORWARD -m mark --mark 0x1000000/0x1000000 -j NFQUEUE --queue-num 100 --queue-bypass ps: Our nfq userspace program will set mark on packets whose connection has already been processed. PC1 sends broadcast packets simulated by hping3: hping3 --rand-source --udp 192.168.1.255 -i u100 - Broadcast racing flow chart is as follow: br_handle_frame BR_HOOK(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, br_handle_frame_finish) // skb->_nfct (unconfirmed conntrack) is constructed at PRE_ROUTING stage br_handle_frame_finish // check if this packet is broadcast br_flood_forward br_flood list_for_each_entry_rcu(p, &br->port_list, list) // iterate through each port maybe_deliver deliver_clone skb = skb_clone(skb) __br_forward BR_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD,...) // queue in our nfq and received by our userspace program // goto __nf_conntrack_confirm with process context on CPU 1 br_pass_frame_up BR_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,...) // goto __nf_conntrack_confirm with softirq context on CPU 0 Because conntrack confirm can happen at both INPUT and POSTROUTING stage. So with NFQUEUE running, skb->_nfct with the same unconfirmed conntrack could race on different core. This patch fixes a repeating kernel splat, now it is only displayed once. Signed-off-by: Chieh-Min Wang Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- net/netfilter/nf_conntrack_core.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index db4d46332e86..9dd4c2048a2b 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -901,10 +901,18 @@ __nf_conntrack_confirm(struct sk_buff *skb) * REJECT will give spurious warnings here. */ - /* No external references means no one else could have - * confirmed us. + /* Another skb with the same unconfirmed conntrack may + * win the race. This may happen for bridge(br_flood) + * or broadcast/multicast packets do skb_clone with + * unconfirmed conntrack. */ - WARN_ON(nf_ct_is_confirmed(ct)); + if (unlikely(nf_ct_is_confirmed(ct))) { + WARN_ON_ONCE(1); + nf_conntrack_double_unlock(hash, reply_hash); + local_bh_enable(); + return NF_DROP; + } + pr_debug("Confirming conntrack %p\n", ct); /* We have to check the DYING flag after unlink to prevent * a race against nf_ct_get_next_corpse() possibly called from -- 2.19.1