All of lore.kernel.org
 help / color / mirror / Atom feed
* TCP connection fails in a asymmetric routing situation
@ 2022-02-25 12:30 Florian Westphal
  2022-03-02 10:59 ` Florian Westphal
  2022-03-02 11:32 ` Pablo Neira Ayuso
  0 siblings, 2 replies; 5+ messages in thread
From: Florian Westphal @ 2022-02-25 12:30 UTC (permalink / raw)
  To: netfilter-devel; +Cc: pablo, kadlec, hmmsjan

https://bugzilla.redhat.com/show_bug.cgi?id=2051413

Gist is:
as of 878aed8db324bec64f3c3f956e64d5ae7375a5de
(" netfilter: nat: force port remap to prevent shadowing well-known
 port"), tcp connections won't get established with asymmetric routing
setups.

Workaround: Block conntrack for  LAN-LAN2 traffic by
iptables  -t raw -A PREROUTING -j CT --notrack
Or: echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

I'd guess that is because conntrack picks up the flow on syn-ack rather
than syn, snat check then thinks that source port is < 16384 and dest
port is large, so we do port rewrite but we do it on syn-ack and
connection cannot complete because client and server have different
views of the source ports involved.

Question is on how this can be prevented. I see a few solutions:

1. Change ct->local_origin to "ct->no_srcremap" (or a new status bit)
that indicates that this should not have src remap done, just like we
do for locally generated connections.

2. Add a new "mid-stream" status bit, then bypass the entire -t nat
logic if its set. nf_nat_core would create a null binding for the
flow, this also bypasses the "src remap" code.

3. Simpler version: from tcp conntrack, set the nat-done status bits
if its a mid-stream pickup.

Downside: nat engine (as-is) won't create a null binding, so connection
will not be known to nat engine for masquerade source port clash
detection.

I would go for 2) unless you have a better suggestion/idea.

Thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: TCP connection fails in a asymmetric routing situation
  2022-02-25 12:30 TCP connection fails in a asymmetric routing situation Florian Westphal
@ 2022-03-02 10:59 ` Florian Westphal
  2022-03-08 10:22   ` H.Janssen
  2022-03-02 11:32 ` Pablo Neira Ayuso
  1 sibling, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2022-03-02 10:59 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, pablo, kadlec, hmmsjan

Florian Westphal <fw@strlen.de> wrote:
> 1. Change ct->local_origin to "ct->no_srcremap" (or a new status bit)
> that indicates that this should not have src remap done, just like we
> do for locally generated connections.
> 
> 2. Add a new "mid-stream" status bit, then bypass the entire -t nat
> logic if its set. nf_nat_core would create a null binding for the
> flow, this also bypasses the "src remap" code.
> 
> 3. Simpler version: from tcp conntrack, set the nat-done status bits
> if its a mid-stream pickup.
> 
> Downside: nat engine (as-is) won't create a null binding, so connection
> will not be known to nat engine for masquerade source port clash
> detection.
> 
> I would go for 2) unless you have a better suggestion/idea.

Something like this:

diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h
--- a/include/uapi/linux/netfilter/nf_conntrack_common.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_common.h
@@ -118,15 +118,19 @@ enum ip_conntrack_status {
 	IPS_HW_OFFLOAD_BIT = 15,
 	IPS_HW_OFFLOAD = (1 << IPS_HW_OFFLOAD_BIT),
 
+	/* Connection started mid-transfer, origin/reply might be inversed */
+	IPS_MID_STREAM_BIT = 16,
+	IPS_MID_STREAM = (1 << IPS_MID_STREAM_BIT),
+
 	/* Be careful here, modifying these bits can make things messy,
 	 * so don't let users modify them directly.
 	 */
 	IPS_UNCHANGEABLE_MASK = (IPS_NAT_DONE_MASK | IPS_NAT_MASK |
 				 IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING |
 				 IPS_SEQ_ADJUST | IPS_TEMPLATE | IPS_UNTRACKED |
-				 IPS_OFFLOAD | IPS_HW_OFFLOAD),
+				 IPS_OFFLOAD | IPS_HW_OFFLOAD | IPS_MID_STREAM_BIT),
 
-	__IPS_MAX_BIT = 16,
+	__IPS_MAX_BIT = 17,
 };
 
 /* Connection tracking event types */
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -537,6 +537,8 @@ static bool tcp_in_window(struct nf_conn *ct,
 			 * its history is lost for us.
 			 * Let's try to use the data from the packet.
 			 */
+			set_bit(IPS_MID_STREAM_BIT, &ct->status);
+
 			sender->td_end = end;
 			swin = win << sender->td_scale;
 			sender->td_maxwin = (swin == 0 ? 1 : swin);
@@ -816,6 +818,8 @@ static noinline bool tcp_new(struct nf_conn *ct, const struct sk_buff *skb,
 		 * its history is lost for us.
 		 * Let's try to use the data from the packet.
 		 */
+		set_bit(IPS_MID_STREAM_BIT, &ct->status);
+
 		ct->proto.tcp.seen[0].td_end =
 			segment_seq_plus_len(ntohl(th->seq), skb->len,
 					     dataoff, th);
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -545,6 +545,12 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple,
 
 	zone = nf_ct_zone(ct);
 
+	if (unlikely(test_bit(IPS_MID_STREAM_BIT, &ct->status))) {
+		/* Can't NAT if connection was picked up mid-stream (e.g. tcp) */
+		*tuple = *orig_tuple;
+		return;
+	}
+
 	if (maniptype == NF_NAT_MANIP_SRC &&
 	    !random_port &&
 	    !ct->local_origin)
@@ -781,7 +787,7 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
 			unsigned int ret;
 			int i;
 
-			if (!e)
+			if (!e || unlikely(test_bit(IPS_MID_STREAM_BIT, &ct->status)))
 				goto null_bind;
 
 			for (i = 0; i < e->num_hook_entries; i++) {

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: TCP connection fails in a asymmetric routing situation
  2022-02-25 12:30 TCP connection fails in a asymmetric routing situation Florian Westphal
  2022-03-02 10:59 ` Florian Westphal
@ 2022-03-02 11:32 ` Pablo Neira Ayuso
  2022-03-02 13:30   ` Florian Westphal
  1 sibling, 1 reply; 5+ messages in thread
From: Pablo Neira Ayuso @ 2022-03-02 11:32 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, kadlec, hmmsjan

Hi Florian,

On Fri, Feb 25, 2022 at 01:30:30PM +0100, Florian Westphal wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=2051413
> 
> Gist is:
> as of 878aed8db324bec64f3c3f956e64d5ae7375a5de
> (" netfilter: nat: force port remap to prevent shadowing well-known
>  port"), tcp connections won't get established with asymmetric routing
> setups.
> 
> Workaround: Block conntrack for  LAN-LAN2 traffic by
> iptables  -t raw -A PREROUTING -j CT --notrack
> Or: echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
> 
> I'd guess that is because conntrack picks up the flow on syn-ack rather
> than syn, snat check then thinks that source port is < 16384 and dest
> port is large, so we do port rewrite but we do it on syn-ack and
> connection cannot complete because client and server have different
> views of the source ports involved.
> 
> Question is on how this can be prevented. I see a few solutions:
> 
> 1. Change ct->local_origin to "ct->no_srcremap" (or a new status bit)
> that indicates that this should not have src remap done, just like we
> do for locally generated connections.
> 
> 2. Add a new "mid-stream" status bit, then bypass the entire -t nat
> logic if its set. nf_nat_core would create a null binding for the
> flow, this also bypasses the "src remap" code.
> 
> 3. Simpler version: from tcp conntrack, set the nat-done status bits
> if its a mid-stream pickup.
> 
> Downside: nat engine (as-is) won't create a null binding, so connection
> will not be known to nat engine for masquerade source port clash
> detection.
> 
> I would go for 2) unless you have a better suggestion/idea.

Conntrack needs to see traffic in both directions, otherwise it is
pickup the state from the middle from time to time (part of the
history is lost for us).

What am I missing here?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: TCP connection fails in a asymmetric routing situation
  2022-03-02 11:32 ` Pablo Neira Ayuso
@ 2022-03-02 13:30   ` Florian Westphal
  0 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2022-03-02 13:30 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, kadlec, hmmsjan

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Conntrack needs to see traffic in both directions, otherwise it is
> pickup the state from the middle from time to time (part of the
> history is lost for us).
> 
> What am I missing here?

Connectivity breaks.

First packet that conntrack picks up is the syn-ack, this results in
following sequence:

1. SYN    x:12345 -> y -> 443 // sent by initiator, receiverd by responder
2. SYNACK y:443 -> x:12345 // First packet seen by conntrack, as sent by responder
3. tuple_force_port_remap() gets called, sees:
  'tcp from 443 to port 12345 NAT' -> pick a new source port, inititor receives
4. SYNACK y:$RANDOM -> x:12345   // connection is never established

This needs:
1. conntrack + nat enabled
2. connection is forwarded between two different machines
3. SYN packet is sent by a different route so conntrack only
sees return traffic.

This broke before as well if you would e.g. add 'masquerade random'
rule, but now the nat core does that automatically due to the 'port
shadow avoidance' change.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: TCP connection fails in a asymmetric routing situation
  2022-03-02 10:59 ` Florian Westphal
@ 2022-03-08 10:22   ` H.Janssen
  0 siblings, 0 replies; 5+ messages in thread
From: H.Janssen @ 2022-03-08 10:22 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, pablo, kadlec

Dear Mr. Westphal,

The problem seems to be solved with the patch below in kernel 
5.16.12-200.fc35.x86_64. Few lines offset during patch, but it patches 
and compiles.

No idea about possible side effects, but ports are no longer manipulated..

contrack -E --dst 192.168.10.12

     [NEW] tcp      6 300 ESTABLISHED src=192.168.1.47 dst=192.168.10.12 
sport=80 dport=52044 [UNREPLIED] src=192.168.10.12 dst=192.168.1.47 
sport=52044 dport=80
  [UPDATE] tcp      6 120 FIN_WAIT src=192.168.1.47 dst=192.168.10.12 
sport=80 dport=52044 [UNREPLIED] src=192.168.10.12 dst=192.168.1.47 
sport=52044 dport=80
[DESTROY] tcp      6 FIN_WAIT src=192.168.1.47 dst=192.168.10.12 
sport=80 dport=52040 [UNREPLIED] src=192.168.10.12 dst=192.168.1.47 
sport=52040 dport=80
[DESTROY] tcp      6 FIN_WAIT src=192.168.1.47 dst=192.168.10.12 
sport=80 dport=52042 [UNREPLIED] src=192.168.10.12 dst=192.168.1.47 
sport=52042 dport=80

Thanks and kind regards





On 3/2/22 11:59, Florian Westphal wrote:
> Florian Westphal <fw@strlen.de> wrote:
>> 1. Change ct->local_origin to "ct->no_srcremap" (or a new status bit)
>> that indicates that this should not have src remap done, just like we
>> do for locally generated connections.
>>
>> 2. Add a new "mid-stream" status bit, then bypass the entire -t nat
>> logic if its set. nf_nat_core would create a null binding for the
>> flow, this also bypasses the "src remap" code.
>>
>> 3. Simpler version: from tcp conntrack, set the nat-done status bits
>> if its a mid-stream pickup.
>>
>> Downside: nat engine (as-is) won't create a null binding, so connection
>> will not be known to nat engine for masquerade source port clash
>> detection.
>>
>> I would go for 2) unless you have a better suggestion/idea.
> Something like this:
>
> diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h
> --- a/include/uapi/linux/netfilter/nf_conntrack_common.h
> +++ b/include/uapi/linux/netfilter/nf_conntrack_common.h
> @@ -118,15 +118,19 @@ enum ip_conntrack_status {
>   	IPS_HW_OFFLOAD_BIT = 15,
>   	IPS_HW_OFFLOAD = (1 << IPS_HW_OFFLOAD_BIT),
>   
> +	/* Connection started mid-transfer, origin/reply might be inversed */
> +	IPS_MID_STREAM_BIT = 16,
> +	IPS_MID_STREAM = (1 << IPS_MID_STREAM_BIT),
> +
>   	/* Be careful here, modifying these bits can make things messy,
>   	 * so don't let users modify them directly.
>   	 */
>   	IPS_UNCHANGEABLE_MASK = (IPS_NAT_DONE_MASK | IPS_NAT_MASK |
>   				 IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING |
>   				 IPS_SEQ_ADJUST | IPS_TEMPLATE | IPS_UNTRACKED |
> -				 IPS_OFFLOAD | IPS_HW_OFFLOAD),
> +				 IPS_OFFLOAD | IPS_HW_OFFLOAD | IPS_MID_STREAM_BIT),
>   
> -	__IPS_MAX_BIT = 16,
> +	__IPS_MAX_BIT = 17,
>   };
>   
>   /* Connection tracking event types */
> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> --- a/net/netfilter/nf_conntrack_proto_tcp.c
> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> @@ -537,6 +537,8 @@ static bool tcp_in_window(struct nf_conn *ct,
>   			 * its history is lost for us.
>   			 * Let's try to use the data from the packet.
>   			 */
> +			set_bit(IPS_MID_STREAM_BIT, &ct->status);
> +
>   			sender->td_end = end;
>   			swin = win << sender->td_scale;
>   			sender->td_maxwin = (swin == 0 ? 1 : swin);
> @@ -816,6 +818,8 @@ static noinline bool tcp_new(struct nf_conn *ct, const struct sk_buff *skb,
>   		 * its history is lost for us.
>   		 * Let's try to use the data from the packet.
>   		 */
> +		set_bit(IPS_MID_STREAM_BIT, &ct->status);
> +
>   		ct->proto.tcp.seen[0].td_end =
>   			segment_seq_plus_len(ntohl(th->seq), skb->len,
>   					     dataoff, th);
> diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> --- a/net/netfilter/nf_nat_core.c
> +++ b/net/netfilter/nf_nat_core.c
> @@ -545,6 +545,12 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple,
>   
>   	zone = nf_ct_zone(ct);
>   
> +	if (unlikely(test_bit(IPS_MID_STREAM_BIT, &ct->status))) {
> +		/* Can't NAT if connection was picked up mid-stream (e.g. tcp) */
> +		*tuple = *orig_tuple;
> +		return;
> +	}
> +
>   	if (maniptype == NF_NAT_MANIP_SRC &&
>   	    !random_port &&
>   	    !ct->local_origin)
> @@ -781,7 +787,7 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
>   			unsigned int ret;
>   			int i;
>   
> -			if (!e)
> +			if (!e || unlikely(test_bit(IPS_MID_STREAM_BIT, &ct->status)))
>   				goto null_bind;
>   
>   			for (i = 0; i < e->num_hook_entries; i++) {

-- 
H.Janssen
Lekerwaard 36
1824HC Alkmaar
06-58971047


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-03-08 10:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-25 12:30 TCP connection fails in a asymmetric routing situation Florian Westphal
2022-03-02 10:59 ` Florian Westphal
2022-03-08 10:22   ` H.Janssen
2022-03-02 11:32 ` Pablo Neira Ayuso
2022-03-02 13:30   ` Florian Westphal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.