All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH nf 0/3] netfilter: nat: fix ancient dnat+edemux bug
@ 2021-02-24 16:23 Florian Westphal
  2021-02-24 16:23 ` [PATCH nf 1/3] netfilter: nf_nat: undo erroneous tcp edemux lookup Florian Westphal
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Florian Westphal @ 2021-02-24 16:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Netfilter NAT collision handling + TCP edemux can cause packets to end
up with the wrong socket.
This happens since TCP early demux was added more than 8 years ago, so
this needs very rare and specific conditions to trigger.

Patch 1 fixes the bug.
Patch 2 rewords a debug message that imlies packets are treated
as invalid while they are not.
Patch 3 adds a test case for this.  On unpatched kernel this script
should error out with:
(UNKNOWN) [10.96.0.1] 443 (https) : Connection timed out
FAIL: nc cannot connect via NAT'd address

Florian Westphal (3):
  netfilter: nf_nat: undo erroneous tcp edemux lookup
  netfilter: conntrack: avoid misleading 'invalid' in log message
  selftests: netfilter: test nat port clash resolution interaction with
    tcp early demux

 net/netfilter/nf_conntrack_proto_tcp.c        |  6 +-
 net/netfilter/nf_nat_proto.c                  | 25 ++++-
 tools/testing/selftests/netfilter/Makefile    |  2 +-
 .../selftests/netfilter/nf_nat_edemux.sh      | 99 +++++++++++++++++++
 4 files changed, 125 insertions(+), 7 deletions(-)
 create mode 100755 tools/testing/selftests/netfilter/nf_nat_edemux.sh

-- 
2.26.2


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH nf 1/3] netfilter: nf_nat: undo erroneous tcp edemux lookup
  2021-02-24 16:23 [PATCH nf 0/3] netfilter: nat: fix ancient dnat+edemux bug Florian Westphal
@ 2021-02-24 16:23 ` Florian Westphal
  2021-02-24 16:23 ` [PATCH nf 2/3] netfilter: conntrack: avoid misleading 'invalid' in log message Florian Westphal
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2021-02-24 16:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Under extremely rare conditions TCP early demux will retrieve the wrong
socket.

1. local machine establishes a connection to a remote server, S, on port
   p.

   This gives:
   laddr:lport -> S:p
   ... both in tcp and conntrack.

2. local machine establishes a connection to host H, on port p2.
   2a. TCP stack choses same laddr:lport, so we have
   laddr:lport -> H:p2 from TCP point of view.
   2b). There is a destination NAT rewrite in place, translating
        H:p2 to S:p.  This results in following conntrack entries:

   I)  laddr:lport -> S:p  (origin)  S:p -> laddr:lport (reply)
   II) laddr:lport -> H:p2 (origin)  S:p -> laddr:lport2 (reply)

   NAT engine has rewritten laddr:lport to laddr:lport2 to map
   the reply packet to the correct origin.

   When server sends SYN/ACK to laddr:lport2, the PREROUTING hook
   will undo-the SNAT transformation, rewriting IP header to
   S:p -> laddr:lport

   This causes TCP early demux to associate the skb with the TCP socket
   of the first connection.

   The INPUT hook will then reverse the DNAT transformation, rewriting
   the IP header to H:p2 -> laddr:lport.

Because packet ends up with the wrong socket, the new connection
never completes: originator stays in SYN_SENT and conntrack entry
remains in SYN_RECV until timeout, and responder retransmits SYN/ACK
until it gives up.

To resolve this, orphan the skb after the input rewrite:
Because the source IP address changed, the socket must be incorrect.
We can't move the DNAT undo to prerouting due to backwards
compatibility, doing so will make iptables/nftables rules to no longer
match the way they did.

After orphan, the packet will be handed to the next protocol layer
(tcp, udp, ...) and that will repeat the socket lookup just like as if
early demux was disabled.

Fixes: 41063e9dd1195 ("ipv4: Early TCP socket demux.")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_nat_proto.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_nat_proto.c b/net/netfilter/nf_nat_proto.c
index e87b6bd6b3cd..4731d21fc3ad 100644
--- a/net/netfilter/nf_nat_proto.c
+++ b/net/netfilter/nf_nat_proto.c
@@ -646,8 +646,8 @@ nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
 }
 
 static unsigned int
-nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
-	       const struct nf_hook_state *state)
+nf_nat_ipv4_pre_routing(void *priv, struct sk_buff *skb,
+			const struct nf_hook_state *state)
 {
 	unsigned int ret;
 	__be32 daddr = ip_hdr(skb)->daddr;
@@ -659,6 +659,23 @@ nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
 	return ret;
 }
 
+static unsigned int
+nf_nat_ipv4_local_in(void *priv, struct sk_buff *skb,
+		     const struct nf_hook_state *state)
+{
+	__be32 saddr = ip_hdr(skb)->saddr;
+	struct sock *sk = skb->sk;
+	unsigned int ret;
+
+	ret = nf_nat_ipv4_fn(priv, skb, state);
+
+	if (ret == NF_ACCEPT && sk && saddr != ip_hdr(skb)->saddr &&
+	    !inet_sk_transparent(sk))
+		skb_orphan(skb); /* TCP edemux obtained wrong socket */
+
+	return ret;
+}
+
 static unsigned int
 nf_nat_ipv4_out(void *priv, struct sk_buff *skb,
 		const struct nf_hook_state *state)
@@ -736,7 +753,7 @@ nf_nat_ipv4_local_fn(void *priv, struct sk_buff *skb,
 static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
 	/* Before packet filtering, change destination */
 	{
-		.hook		= nf_nat_ipv4_in,
+		.hook		= nf_nat_ipv4_pre_routing,
 		.pf		= NFPROTO_IPV4,
 		.hooknum	= NF_INET_PRE_ROUTING,
 		.priority	= NF_IP_PRI_NAT_DST,
@@ -757,7 +774,7 @@ static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
 	},
 	/* After packet filtering, change source */
 	{
-		.hook		= nf_nat_ipv4_fn,
+		.hook		= nf_nat_ipv4_local_in,
 		.pf		= NFPROTO_IPV4,
 		.hooknum	= NF_INET_LOCAL_IN,
 		.priority	= NF_IP_PRI_NAT_SRC,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH nf 2/3] netfilter: conntrack: avoid misleading 'invalid' in log message
  2021-02-24 16:23 [PATCH nf 0/3] netfilter: nat: fix ancient dnat+edemux bug Florian Westphal
  2021-02-24 16:23 ` [PATCH nf 1/3] netfilter: nf_nat: undo erroneous tcp edemux lookup Florian Westphal
@ 2021-02-24 16:23 ` Florian Westphal
  2021-02-24 16:23 ` [PATCH nf 3/3] selftests: netfilter: test nat port clash resolution interaction with tcp early demux Florian Westphal
  2021-02-27 21:33 ` [PATCH nf 0/3] netfilter: nat: fix ancient dnat+edemux bug Pablo Neira Ayuso
  3 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2021-02-24 16:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

The packet is not flagged as invalid: conntrack will accept it and
its associated with the conntrack entry.

This happens e.g. when receiving a retransmitted SYN in SYN_RECV state.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_proto_tcp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 1d7e1c595546..ec23330687a5 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -982,8 +982,10 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 					IP_CT_EXP_CHALLENGE_ACK;
 		}
 		spin_unlock_bh(&ct->lock);
-		nf_ct_l4proto_log_invalid(skb, ct, "invalid packet ignored in "
-					  "state %s ", tcp_conntrack_names[old_state]);
+		nf_ct_l4proto_log_invalid(skb, ct,
+					  "packet (index %d) in dir %d ignored, state %s",
+					  index, dir,
+					  tcp_conntrack_names[old_state]);
 		return NF_ACCEPT;
 	case TCP_CONNTRACK_MAX:
 		/* Special case for SYN proxy: when the SYN to the server or
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH nf 3/3] selftests: netfilter: test nat port clash resolution interaction with tcp early demux
  2021-02-24 16:23 [PATCH nf 0/3] netfilter: nat: fix ancient dnat+edemux bug Florian Westphal
  2021-02-24 16:23 ` [PATCH nf 1/3] netfilter: nf_nat: undo erroneous tcp edemux lookup Florian Westphal
  2021-02-24 16:23 ` [PATCH nf 2/3] netfilter: conntrack: avoid misleading 'invalid' in log message Florian Westphal
@ 2021-02-24 16:23 ` Florian Westphal
  2021-02-27 21:33 ` [PATCH nf 0/3] netfilter: nat: fix ancient dnat+edemux bug Pablo Neira Ayuso
  3 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2021-02-24 16:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Convert Antonio Ojeas bug reproducer to a kselftest.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 tools/testing/selftests/netfilter/Makefile    |  2 +-
 .../selftests/netfilter/nf_nat_edemux.sh      | 99 +++++++++++++++++++
 2 files changed, 100 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/netfilter/nf_nat_edemux.sh

diff --git a/tools/testing/selftests/netfilter/Makefile b/tools/testing/selftests/netfilter/Makefile
index 3006a8e5b41a..3171069a6b46 100644
--- a/tools/testing/selftests/netfilter/Makefile
+++ b/tools/testing/selftests/netfilter/Makefile
@@ -4,7 +4,7 @@
 TEST_PROGS := nft_trans_stress.sh nft_nat.sh bridge_brouter.sh \
 	conntrack_icmp_related.sh nft_flowtable.sh ipvs.sh \
 	nft_concat_range.sh nft_conntrack_helper.sh \
-	nft_queue.sh nft_meta.sh \
+	nft_queue.sh nft_meta.sh nf_nat_edemux.sh \
 	ipip-conntrack-mtu.sh
 
 LDLIBS = -lmnl
diff --git a/tools/testing/selftests/netfilter/nf_nat_edemux.sh b/tools/testing/selftests/netfilter/nf_nat_edemux.sh
new file mode 100755
index 000000000000..cfee3b65be0f
--- /dev/null
+++ b/tools/testing/selftests/netfilter/nf_nat_edemux.sh
@@ -0,0 +1,99 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test NAT source port clash resolution
+#
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+ret=0
+
+sfx=$(mktemp -u "XXXXXXXX")
+ns1="ns1-$sfx"
+ns2="ns2-$sfx"
+
+cleanup()
+{
+	ip netns del $ns1
+	ip netns del $ns2
+}
+
+iperf3 -v > /dev/null 2>&1
+if [ $? -ne 0 ];then
+	echo "SKIP: Could not run test without iperf3"
+	exit $ksft_skip
+fi
+
+iptables --version > /dev/null 2>&1
+if [ $? -ne 0 ];then
+	echo "SKIP: Could not run test without iptables"
+	exit $ksft_skip
+fi
+
+ip -Version > /dev/null 2>&1
+if [ $? -ne 0 ];then
+	echo "SKIP: Could not run test without ip tool"
+	exit $ksft_skip
+fi
+
+ip netns add "$ns1"
+if [ $? -ne 0 ];then
+	echo "SKIP: Could not create net namespace $ns1"
+	exit $ksft_skip
+fi
+
+trap cleanup EXIT
+
+ip netns add $ns2
+
+# Connect the namespaces using a veth pair
+ip link add name veth2 type veth peer name veth1
+ip link set netns $ns1 dev veth1
+ip link set netns $ns2 dev veth2
+
+ip netns exec $ns1 ip link set up dev lo
+ip netns exec $ns1 ip link set up dev veth1
+ip netns exec $ns1 ip addr add 192.168.1.1/24 dev veth1
+
+ip netns exec $ns2 ip link set up dev lo
+ip netns exec $ns2 ip link set up dev veth2
+ip netns exec $ns2 ip addr add 192.168.1.2/24 dev veth2
+
+# Create a server in one namespace
+ip netns exec $ns1 iperf3 -s > /dev/null 2>&1 &
+iperfs=$!
+
+# Restrict source port to just one so we don't have to exhaust
+# all others.
+ip netns exec $ns2 sysctl -q net.ipv4.ip_local_port_range="10000 10000"
+
+# add a virtual IP using DNAT
+ip netns exec $ns2 iptables -t nat -A OUTPUT -d 10.96.0.1/32 -p tcp --dport 443 -j DNAT --to-destination 192.168.1.1:5201
+
+# ... and route it to the other namespace
+ip netns exec $ns2 ip route add 10.96.0.1 via 192.168.1.1
+
+sleep 1
+
+# add a persistent connection from the other namespace
+ip netns exec $ns2 nc -q 10 -w 10 192.168.1.1 5201 > /dev/null &
+
+sleep 1
+
+# ip daddr:dport will be rewritten to 192.168.1.1 5201
+# NAT must reallocate source port 10000 because
+# 192.168.1.2:10000 -> 192.168.1.1:5201 is already in use
+echo test | ip netns exec $ns2 nc -w 3 -q 3 10.96.0.1 443 >/dev/null
+ret=$?
+
+kill $iperfs
+
+# Check nc can connect to 10.96.0.1:443 (aka 192.168.1.1:5201).
+if [ $ret -eq 0 ]; then
+	echo "PASS: nc can connect via NAT'd address"
+else
+	echo "FAIL: nc cannot connect via NAT'd address"
+	exit 1
+fi
+
+exit 0
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH nf 0/3] netfilter: nat: fix ancient dnat+edemux bug
  2021-02-24 16:23 [PATCH nf 0/3] netfilter: nat: fix ancient dnat+edemux bug Florian Westphal
                   ` (2 preceding siblings ...)
  2021-02-24 16:23 ` [PATCH nf 3/3] selftests: netfilter: test nat port clash resolution interaction with tcp early demux Florian Westphal
@ 2021-02-27 21:33 ` Pablo Neira Ayuso
  3 siblings, 0 replies; 5+ messages in thread
From: Pablo Neira Ayuso @ 2021-02-27 21:33 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Wed, Feb 24, 2021 at 05:23:18PM +0100, Florian Westphal wrote:
> Netfilter NAT collision handling + TCP edemux can cause packets to end
> up with the wrong socket.
> This happens since TCP early demux was added more than 8 years ago, so
> this needs very rare and specific conditions to trigger.
> 
> Patch 1 fixes the bug.
> Patch 2 rewords a debug message that imlies packets are treated
> as invalid while they are not.
> Patch 3 adds a test case for this.  On unpatched kernel this script
> should error out with:
> (UNKNOWN) [10.96.0.1] 443 (https) : Connection timed out
> FAIL: nc cannot connect via NAT'd address

Applied, thanks Florian.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-02-27 21:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-24 16:23 [PATCH nf 0/3] netfilter: nat: fix ancient dnat+edemux bug Florian Westphal
2021-02-24 16:23 ` [PATCH nf 1/3] netfilter: nf_nat: undo erroneous tcp edemux lookup Florian Westphal
2021-02-24 16:23 ` [PATCH nf 2/3] netfilter: conntrack: avoid misleading 'invalid' in log message Florian Westphal
2021-02-24 16:23 ` [PATCH nf 3/3] selftests: netfilter: test nat port clash resolution interaction with tcp early demux Florian Westphal
2021-02-27 21:33 ` [PATCH nf 0/3] netfilter: nat: fix ancient dnat+edemux bug Pablo Neira Ayuso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.