[PATCH nf v2] netfilter: conntrack: connection timeout after re-register

netfilter-devel.vger.kernel.org archive mirror

* [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
@ 2020-10-07 19:32 Francesco Ruggeri
  2020-10-08 23:41 ` Francesco Ruggeri
  2020-10-20 15:21 ` Pablo Neira Ayuso
  0 siblings, 2 replies; 15+ messages in thread
From: Francesco Ruggeri @ 2020-10-07 19:32 UTC (permalink / raw)
  To: linux-kernel, netdev, coreteam, netfilter-devel, kuba, davem, fw,
	kadlec, pablo, fruggeri

If the first packet conntrack sees after a re-register is an outgoing
keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to
SND.NXT-1.
When the peer correctly acknowledges SND.NXT, tcp_in_window fails
check III (Upper bound for valid (s)ack: sack <= receiver.td_end) and
returns false. As a result nf_conntrack_in sets skb->_nfct = 0, and
later conntrack iptables rules do not match.
If iptables drops packets that do not match conntrack rules, idle TCP
connections can then time out.

v2: adjust td_end when getting the reply rather than when sending out
    the keepalive packet.

Fixes: f94e63801ab2 ("netfilter: conntrack: reset tcp maxwin on re-register")
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
---
 net/netfilter/nf_conntrack_proto_tcp.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index e8c86ee4c1c4..c8fb2187ad4b 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -541,13 +541,20 @@ static bool tcp_in_window(const struct nf_conn *ct,
 			swin = win << sender->td_scale;
 			sender->td_maxwin = (swin == 0 ? 1 : swin);
 			sender->td_maxend = end + sender->td_maxwin;
-			/*
-			 * We haven't seen traffic in the other direction yet
-			 * but we have to tweak window tracking to pass III
-			 * and IV until that happens.
-			 */
-			if (receiver->td_maxwin == 0)
+			if (receiver->td_maxwin == 0) {
+				/* We haven't seen traffic in the other
+				 * direction yet but we have to tweak window
+				 * tracking to pass III and IV until that
+				 * happens.
+				 */
 				receiver->td_end = receiver->td_maxend = sack;
+			} else if (sack == receiver->td_end + 1) {
+				/* Likely a reply to a keepalive.
+				 * Needed for III.
+				 */
+				receiver->td_end++;
+			}
+
 		}
 	} else if (((state->state == TCP_CONNTRACK_SYN_SENT
 		     && dir == IP_CT_DIR_ORIGINAL)
-- 
2.28.0
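
The failure and the fix described in the commit message can be sketched
with a toy model of check III. This is a minimal Python sketch, not the
kernel code: the Direction class, the helper names, and the sequence
numbers are made up for illustration, sequence-number wraparound is
ignored, and td_maxend is left out.

```python
from dataclasses import dataclass


@dataclass
class Direction:
    """Hypothetical stand-in for the per-direction window state."""
    td_end: int = 0
    td_maxwin: int = 0


def passes_check_iii(sack: int, receiver: Direction) -> bool:
    # Check III: upper bound for a valid (s)ack.
    return sack <= receiver.td_end


def handle_first_reply(sack: int, receiver: Direction, patched: bool) -> None:
    # Simplified form of the branch touched by the patch.
    if receiver.td_maxwin == 0:
        # No traffic seen in the other direction yet.
        receiver.td_end = sack
    elif patched and sack == receiver.td_end + 1:
        # Likely a reply to a keepalive.
        receiver.td_end += 1


def keepalive_reply_accepted(patched: bool) -> bool:
    snd_nxt = 1000  # arbitrary example value
    # The keepalive carried SEG.SEQ = SND.NXT-1 and no data, so conntrack
    # recorded td_end = SND.NXT-1 for the keepalive sender.
    keepalive_sender = Direction(td_end=snd_nxt - 1, td_maxwin=512)
    sack = snd_nxt  # the peer correctly acknowledges SND.NXT
    handle_first_reply(sack, keepalive_sender, patched)
    return passes_check_iii(sack, keepalive_sender)


print(keepalive_reply_accepted(patched=False))  # False
print(keepalive_reply_accepted(patched=True))   # True
```

Without the adjustment the acknowledgement sits one above the recorded
td_end and is rejected; with it, td_end is advanced by one and the same
acknowledgement passes.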



* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-07 19:32 [PATCH nf v2] netfilter: conntrack: connection timeout after re-register Francesco Ruggeri
@ 2020-10-08 23:41 ` Francesco Ruggeri
  2020-10-09  6:52   ` Jozsef Kadlecsik
  2020-10-20 15:21 ` Pablo Neira Ayuso
  1 sibling, 1 reply; 15+ messages in thread
From: Francesco Ruggeri @ 2020-10-08 23:41 UTC (permalink / raw)
  To: open list, netdev, coreteam, netfilter-devel, Jakub Kicinski,
	David Miller, fw, kadlec, Pablo Neira Ayuso, Francesco Ruggeri

On Wed, Oct 7, 2020 at 12:32 PM Francesco Ruggeri <fruggeri@arista.com> wrote:
>
> If the first packet conntrack sees after a re-register is an outgoing
> keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to
> SND.NXT-1.
> When the peer correctly acknowledges SND.NXT, tcp_in_window fails
> check III (Upper bound for valid (s)ack: sack <= receiver.td_end) and
> returns false, which cascades into nf_conntrack_in setting
> skb->_nfct = 0 and in later conntrack iptables rules not matching.
> In cases where iptables are dropping packets that do not match
> conntrack rules this can result in idle tcp connections to time out.
>
> v2: adjust td_end when getting the reply rather than when sending out
>     the keepalive packet.
>

Any comments?
Here is a simple reproducer.
The idea is to show that keepalive packets in an idle tcp
connection will be dropped (and the connection will time out)
if conntrack hooks are de-registered and then re-registered.
The reproducer has two files.
client_server.py creates both ends of a tcp connection, bounces
a few packets back and forth, and then blocks on a recv on the
client side. The client's keepalive is configured to time out in
20 seconds. This connection should not time out.
test is a bash script that creates a net namespace where it sets
iptables rules for the connection, starts client_server.py, and
then clears and restores the iptables rules (which causes
conntrack hooks to be de-registered and re-registered).
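
The "20 seconds" figure can be checked against the socket options set
in client_server.py. A rough sketch, assuming the usual Linux behaviour
that the first probe goes out after TCP_KEEPIDLE seconds of idleness and
the connection is dropped once TCP_KEEPCNT probes, sent TCP_KEEPINTVL
seconds apart, go unanswered; the function name is made up for
illustration:

```python
def keepalive_timeout(keepidle: int, keepintvl: int, keepcnt: int) -> int:
    """Approximate idle seconds before an unresponsive peer is given up on."""
    return keepidle + keepintvl * keepcnt


# Values used by the reproducer: TCP_KEEPIDLE=2, TCP_KEEPINTVL=2, TCP_KEEPCNT=10
print(keepalive_timeout(keepidle=2, keepintvl=2, keepcnt=10))  # 22
```

Under that assumption the client gives up after about 22 seconds if every
keepalive reply is dropped, in line with the roughly 20 seconds stated
above.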

================ file client_server.py
#!/usr/bin/python

import socket

PORT=4446

# create server socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('localhost', PORT))
sock.listen(1)

# create client socket
cl_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cl_sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
cl_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 2)
cl_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 2)
cl_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 10)
cl_sock.connect(('localhost', PORT))

srv_sock, _ = sock.accept()

# Bounce a packet back and forth a few times
buf = 'aaaaaaaaaaaa'
for i in range(5):
   cl_sock.send(buf)
   buf = srv_sock.recv(100)
   srv_sock.send(buf)
   buf = cl_sock.recv(100)
   print buf

# idle the connection
try:
   buf = cl_sock.recv(100)
except socket.error, e:
   print "Error: %s" % e

sock.close()
cl_sock.close()
srv_sock.close()

============== file test
#!/bin/bash

ip netns add dummy
ip netns exec dummy ip link set lo up
echo "Created namespace"

ip netns exec dummy iptables-restore <<END
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 4446 -j ACCEPT
COMMIT
END
echo "Installed iptables rules"

ip netns exec dummy ./client_server.py &
echo "Created tcp connection"
sleep 2

ip netns exec dummy iptables-restore << END
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
COMMIT
END
echo "Cleared iptables rules"
sleep 4

ip netns exec dummy iptables-restore << END
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 4446 -j ACCEPT
COMMIT
END
echo "Restored original iptables rules"

wait
ip netns del dummy
exit 0


* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-08 23:41 ` Francesco Ruggeri
@ 2020-10-09  6:52   ` Jozsef Kadlecsik
  2020-10-09 11:03     ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Jozsef Kadlecsik @ 2020-10-09  6:52 UTC (permalink / raw)
  To: Francesco Ruggeri
  Cc: open list, netdev, coreteam, netfilter-devel, Jakub Kicinski,
	David Miller, fw, Pablo Neira Ayuso

Hi Francesco,

On Thu, 8 Oct 2020, Francesco Ruggeri wrote:

> On Wed, Oct 7, 2020 at 12:32 PM Francesco Ruggeri <fruggeri@arista.com> wrote:
> >
> > If the first packet conntrack sees after a re-register is an outgoing 
> > keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to 
> > SND.NXT-1. When the peer correctly acknowledges SND.NXT, tcp_in_window 
> > fails check III (Upper bound for valid (s)ack: sack <= 
> > receiver.td_end) and returns false, which cascades into 
> > nf_conntrack_in setting skb->_nfct = 0 and in later conntrack iptables 
> > rules not matching. In cases where iptables are dropping packets that 
> > do not match conntrack rules this can result in idle tcp connections 
> > to time out.
> >
> > v2: adjust td_end when getting the reply rather than when sending out
> >     the keepalive packet.
> >
> 
> Any comments?
> Here is a simple reproducer. The idea is to show that keepalive packets 
> in an idle tcp connection will be dropped (and the connection will time 
> out) if conntrack hooks are de-registered and then re-registered. The 
> reproducer has two files. client_server.py creates both ends of a tcp 
> connection, bounces a few packets back and forth, and then blocks on a 
> recv on the client side. The client's keepalive is configured to time 
> out in 20 seconds. This connection should not time out. test is a bash 
> script that creates a net namespace where it sets iptables rules for the 
> connection, starts client_server.py, and then clears and restores the 
> iptables rules (which causes conntrack hooks to be de-registered and 
> re-registered).

In my opinion an iptables restore should not cause conntrack hooks to be 
de-registered and re-registered, because important TCP initialization 
parameters cannot be "restored" later from the packets. Therefore the 
proper fix would be to prevent that from happening. Otherwise your patch 
looks OK to handle the case when conntrack is intentionally restarted.

Best regards,
Jozsef
 

-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-09  6:52   ` Jozsef Kadlecsik
@ 2020-10-09 11:03     ` Florian Westphal
  2020-10-09 18:48       ` Jozsef Kadlecsik
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2020-10-09 11:03 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Francesco Ruggeri, open list, netdev, coreteam, netfilter-devel,
	Jakub Kicinski, David Miller, fw, Pablo Neira Ayuso

Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > Any comments?
> > Here is a simple reproducer. The idea is to show that keepalive packets 
> > in an idle tcp connection will be dropped (and the connection will time 
> > out) if conntrack hooks are de-registered and then re-registered. The 
> > reproducer has two files. client_server.py creates both ends of a tcp 
> > connection, bounces a few packets back and forth, and then blocks on a 
> > recv on the client side. The client's keepalive is configured to time 
> > out in 20 seconds. This connection should not time out. test is a bash 
> > script that creates a net namespace where it sets iptables rules for the 
> > connection, starts client_server.py, and then clears and restores the 
> > iptables rules (which causes conntrack hooks to be de-registered and 
> > re-registered).
> 
> In my opinion an iptables restore should not cause conntrack hooks to be 
> de-registered and re-registered, because important TCP initialization 
> parameters cannot be "restored" later from the packets. Therefore the 
> proper fix would be to prevent it to happen. Otherwise your patch looks OK 
> to handle the case when conntrack is intentionally restarted.

The repro clears all rules, waits 4 seconds, then restores the ruleset.
Using iptables-restore < FOO; sleep 4; iptables-restore < FOO will
not result in any unregister ops.

We could make the kernel defer the unregister via some work queue, but I
don't see what this would accomplish (and it's questionable how long it
should wait).

We could disallow unregister, but that seems silly (forces reboot...).

I think the patch is fine.


* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-09 11:03     ` Florian Westphal
@ 2020-10-09 18:48       ` Jozsef Kadlecsik
  2020-10-09 18:55         ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Jozsef Kadlecsik @ 2020-10-09 18:48 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Francesco Ruggeri, open list, netdev, coreteam, netfilter-devel,
	Jakub Kicinski, David Miller, fw, Pablo Neira Ayuso

Hi Florian,

On Fri, 9 Oct 2020, Florian Westphal wrote:

> Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > > The reproducer has two files. client_server.py creates both ends of 
> > > a tcp connection, bounces a few packets back and forth, and then 
> > > blocks on a recv on the client side. The client's keepalive is 
> > > configured to time out in 20 seconds. This connection should not 
> > > time out. test is a bash script that creates a net namespace where 
> > > it sets iptables rules for the connection, starts client_server.py, 
> > > and then clears and restores the iptables rules (which causes 
> > > conntrack hooks to be de-registered and re-registered).
> > 
> > In my opinion an iptables restore should not cause conntrack hooks to be 
> > de-registered and re-registered, because important TCP initialization 
> > parameters cannot be "restored" later from the packets. Therefore the 
> > proper fix would be to prevent it to happen. Otherwise your patch looks OK 
> > to handle the case when conntrack is intentionally restarted.
> 
> The repro clears all rules, waits 4 seconds, then restores the ruleset. 
> using iptables-restore < FOO; sleep 4; iptables-restore < FOO will not 
> result in any unregister ops.
>
> We could make kernel defer unregister via some work queue but i don't
> see what this would help/accomplish (and its questionable of how long it
> should wait).

Sorry, I can't put together the two paragraphs above: in the first you 
wrote that no (hook) unregister-register happens and in the second one 
that those could be deferred.

> We could disallow unregister, but that seems silly (forces reboot...).
> 
> I think the patch is fine.

The patch is fine, but why are the packets handled by conntrack (after the 
first restore and during the 4s sleep? And then again after the second 
restore?) as if all conntrack entries were removed?
 
Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-09 18:48       ` Jozsef Kadlecsik
@ 2020-10-09 18:55         ` Florian Westphal
  2020-10-09 19:49           ` Jozsef Kadlecsik
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2020-10-09 18:55 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Florian Westphal, Francesco Ruggeri, open list, netdev, coreteam,
	netfilter-devel, Jakub Kicinski, David Miller, fw,
	Pablo Neira Ayuso

Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > The repro clears all rules, waits 4 seconds, then restores the ruleset. 
> > using iptables-restore < FOO; sleep 4; iptables-restore < FOO will not 
> > result in any unregister ops.
> >
> > We could make kernel defer unregister via some work queue but i don't
> > see what this would help/accomplish (and its questionable of how long it
> > should wait).
> 
> Sorry, I can't put together the two paragraphs above: in the first you 
> wrote that no (hook) unregister-register happens and in the second one 
> that those could be derefed.

Sorry, my reply is confusing indeed.

Matches/targets that need conntrack increment a refcount.
So, when all rules are flushed, the refcount goes down to 0 and conntrack
is disabled because the hooks get removed.

Just doing iptables-restore doesn't unregister as long as both the old
and new rulesets need conntrack.

The "delay unregister" remark was wrt. the "all rules were deleted"
case, i.e. add a "grace period" rather than acting right away when
conntrack use count did hit 0.

> > We could disallow unregister, but that seems silly (forces reboot...).
> > 
> > I think the patch is fine.
> 
> The patch is fine, but why the packets are handled by conntrack (after the 
> first restore and during the 4s sleep? And then again after the second 
> restore?) as if all conntrack entries were removed?

Conntrack entries are not removed, only the base hooks get unregistered.
This is a problem for tcp window tracking.

When re-register occurs, kernel is supposed to switch the existing
entries to "loose" mode so window tracking won't flag packets as
invalid, but apparently this isn't enough to handle keepalive case.


* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-09 18:55         ` Florian Westphal
@ 2020-10-09 19:49           ` Jozsef Kadlecsik
  2020-10-09 20:00             ` Francesco Ruggeri
  2020-10-09 20:05             ` Florian Westphal
  0 siblings, 2 replies; 15+ messages in thread
From: Jozsef Kadlecsik @ 2020-10-09 19:49 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Francesco Ruggeri, open list, netdev, coreteam, netfilter-devel,
	Jakub Kicinski, David Miller, Pablo Neira Ayuso

On Fri, 9 Oct 2020, Florian Westphal wrote:

> Matches/targets that need conntrack increment a refcount. So, when all 
> rules are flushed, refcount goes down to 0 and conntrack is disabled 
> because the hooks get removed..
> 
> Just doing iptables-restore doesn't unregister as long as both the old
> and new rulesets need conntrack.
> 
> The "delay unregister" remark was wrt. the "all rules were deleted"
> case, i.e. add a "grace period" rather than acting right away when
> conntrack use count did hit 0.

Now I understand it, thanks really. The hooks are removed, so conntrack 
cannot "see" the packets and the entries become stale. 

What is the rationale behind "remove the conntrack hooks when there are no 
rules left referring to conntrack"? Performance optimization? But then the 
content of the whole conntrack table could be deleted too... ;-)
 
> Conntrack entries are not removed, only the base hooks get unregistered. 
> This is a problem for tcp window tracking.
> 
> When re-register occurs, kernel is supposed to switch the existing 
> entries to "loose" mode so window tracking won't flag packets as 
> invalid, but apparently this isn't enough to handle keepalive case.

"loose" (nf_ct_tcp_loose) mode doesn't disable window tracking, it 
enables/disables picking up already established connections. 

nf_ct_tcp_be_liberal would disable TCP window checking (but not tracking) 
for non RST packets.

But both seem to be modified only via the proc entries.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-09 19:49           ` Jozsef Kadlecsik
@ 2020-10-09 20:00             ` Francesco Ruggeri
  2020-10-09 20:05             ` Florian Westphal
  1 sibling, 0 replies; 15+ messages in thread
From: Francesco Ruggeri @ 2020-10-09 20:00 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Florian Westphal, open list, netdev, coreteam, netfilter-devel,
	Jakub Kicinski, David Miller, Pablo Neira Ayuso

On Fri, Oct 9, 2020 at 12:49 PM Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> What is the rationale behind "remove the conntrack hooks when there are no
> rule left referring to conntrack"? Performance optimization?

That seems to be the case. See commit 4d3a57f23dec ("netfilter: conntrack:
do not enable connection tracking unless needed").

Francesco


* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-09 19:49           ` Jozsef Kadlecsik
  2020-10-09 20:00             ` Francesco Ruggeri
@ 2020-10-09 20:05             ` Florian Westphal
  2020-10-14  0:06               ` Pablo Neira Ayuso
  1 sibling, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2020-10-09 20:05 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Florian Westphal, Francesco Ruggeri, open list, netdev, coreteam,
	netfilter-devel, Jakub Kicinski, David Miller, Pablo Neira Ayuso

Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > The "delay unregister" remark was wrt. the "all rules were deleted"
> > case, i.e. add a "grace period" rather than acting right away when
> > conntrack use count did hit 0.
> 
> Now I understand it, thanks really. The hooks are removed, so conntrack 
> cannot "see" the packets and the entries become stale. 

Yes.

> What is the rationale behind "remove the conntrack hooks when there are no 
> rule left referring to conntrack"? Performance optimization? But then the 
> content of the whole conntrack table could be deleted too... ;-)

Yes, this isn't the case at the moment -- only hooks are removed,
entries will eventually time out.

> > Conntrack entries are not removed, only the base hooks get unregistered. 
> > This is a problem for tcp window tracking.
> > 
> > When re-register occurs, kernel is supposed to switch the existing 
> > entries to "loose" mode so window tracking won't flag packets as 
> > invalid, but apparently this isn't enough to handle keepalive case.
> 
> "loose" (nf_ct_tcp_loose) mode doesn't disable window tracking, it 
> enables/disables picking up already established connections. 
> 
> nf_ct_tcp_be_liberal would disable TCP window checking (but not tracking) 
> for non RST packets.

You are right, mixup on my part.

> But both seems to be modified only via the proc entries.

Yes, we iterate table on re-register and modify the existing entries.


* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-09 20:05             ` Florian Westphal
@ 2020-10-14  0:06               ` Pablo Neira Ayuso
  2020-10-14  8:11                 ` Pablo Neira Ayuso
  2020-10-14  8:23                 ` Florian Westphal
  0 siblings, 2 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2020-10-14  0:06 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Jozsef Kadlecsik, Francesco Ruggeri, open list, netdev, coreteam,
	netfilter-devel, Jakub Kicinski, David Miller

On Fri, Oct 09, 2020 at 10:05:48PM +0200, Florian Westphal wrote:
> Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > > The "delay unregister" remark was wrt. the "all rules were deleted"
> > > case, i.e. add a "grace period" rather than acting right away when
> > > conntrack use count did hit 0.
> > 
> > Now I understand it, thanks really. The hooks are removed, so conntrack 
> > cannot "see" the packets and the entries become stale. 
> 
> Yes.
> 
> > What is the rationale behind "remove the conntrack hooks when there are no 
> > rule left referring to conntrack"? Performance optimization? But then the 
> > content of the whole conntrack table could be deleted too... ;-)
> 
> Yes, this isn't the case at the moment -- only hooks are removed,
> entries will eventually time out.
> 
> > > Conntrack entries are not removed, only the base hooks get unregistered. 
> > > This is a problem for tcp window tracking.
> > > 
> > > When re-register occurs, kernel is supposed to switch the existing 
> > > entries to "loose" mode so window tracking won't flag packets as 
> > > invalid, but apparently this isn't enough to handle keepalive case.
> > 
> > "loose" (nf_ct_tcp_loose) mode doesn't disable window tracking, it 
> > enables/disables picking up already established connections. 
> > 
> > nf_ct_tcp_be_liberal would disable TCP window checking (but not tracking) 
> > for non RST packets.
> 
> You are right, mixup on my part.
> 
> > But both seems to be modified only via the proc entries.
> 
> Yes, we iterate table on re-register and modify the existing entries.

For iptables-nft, it might be possible to avoid this deregister +
register of ct hooks in the same transaction: Maybe add something like
nf_ct_netns_get_all() to bump refcounters by one _iff_ they are > 0
before starting the transaction processing, then call
nf_ct_netns_put_all() which decrements refcounters and unregisters
hooks if they reach 0.

The only problem with this approach is that it pulls in the
conntrack module. To solve that, struct nf_ct_hook in
net/netfilter/core.c could be used to store the reference to
->netns_get_all and ->net_put_all.

Legacy would still be flawed though.


* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-14  0:06               ` Pablo Neira Ayuso
@ 2020-10-14  8:11                 ` Pablo Neira Ayuso
  2020-10-14  8:23                 ` Florian Westphal
  1 sibling, 0 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2020-10-14  8:11 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Jozsef Kadlecsik, Francesco Ruggeri, open list, netdev, coreteam,
	netfilter-devel, Jakub Kicinski, David Miller

On Wed, Oct 14, 2020 at 02:06:28AM +0200, Pablo Neira Ayuso wrote:
> On Fri, Oct 09, 2020 at 10:05:48PM +0200, Florian Westphal wrote:
> > Jozsef Kadlecsik <kadlec@netfilter.org> wrote:
> > > > The "delay unregister" remark was wrt. the "all rules were deleted"
> > > > case, i.e. add a "grace period" rather than acting right away when
> > > > conntrack use count did hit 0.
> > > 
> > > Now I understand it, thanks really. The hooks are removed, so conntrack 
> > > cannot "see" the packets and the entries become stale. 
> > 
> > Yes.
> > 
> > > What is the rationale behind "remove the conntrack hooks when there are no 
> > > rule left referring to conntrack"? Performance optimization? But then the 
> > > content of the whole conntrack table could be deleted too... ;-)
> > 
> > Yes, this isn't the case at the moment -- only hooks are removed,
> > entries will eventually time out.
> > 
> > > > Conntrack entries are not removed, only the base hooks get unregistered. 
> > > > This is a problem for tcp window tracking.
> > > > 
> > > > When re-register occurs, kernel is supposed to switch the existing 
> > > > entries to "loose" mode so window tracking won't flag packets as 
> > > > invalid, but apparently this isn't enough to handle keepalive case.
> > > 
> > > "loose" (nf_ct_tcp_loose) mode doesn't disable window tracking, it 
> > > enables/disables picking up already established connections. 
> > > 
> > > nf_ct_tcp_be_liberal would disable TCP window checking (but not tracking) 
> > > for non RST packets.
> > 
> > You are right, mixup on my part.
> > 
> > > But both seems to be modified only via the proc entries.
> > 
> > Yes, we iterate table on re-register and modify the existing entries.
> 
> For iptables-nft, it might be possible to avoid this deregister +
> register ct hooks in the same transaction: Maybe add something like
> nf_ct_netns_get_all() to bump refcounters by one _iff_ they are > 0
> before starting the transaction processing, then call
> nf_ct_netns_put_all() which decrements refcounters and unregister
> hooks if they reach 0.

Hm, scratch that, put_all() would create an imbalance with this
conditional increment.
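
One way to read the imbalance is with a toy counter. This is only a
sketch: the conditional bump and the unconditional drop are simplified
stand-ins for the proposed nf_ct_netns_get_all()/nf_ct_netns_put_all(),
and the scenario (a transaction that adds the first conntrack user) is
an assumption, not something spelled out above.

```python
def users_after_transaction(users_before: int, rules_added: int) -> int:
    """Toy refcount walk-through of one ruleset transaction."""
    users = users_before
    # Conditional "get_all": only bump the counter if it is already > 0.
    if users > 0:
        users += 1
    # Each rule added by the transaction takes its own reference.
    users += rules_added
    # Unconditional "put_all": always drops one reference.
    users -= 1
    return users


print(users_after_transaction(1, 1))  # 2: one old user plus one new rule
print(users_after_transaction(0, 1))  # 0: yet one rule still needs conntrack
```

In the second case the counter reaches 0 while a rule still holds a
reference, so the hooks would be unregistered too early.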

> The only problem with this approach is that this pulls in the
> conntrack module, to solve that, struct nf_ct_hook in
> net/netfilter/core.c could be used to store the reference to
> ->netns_get_all and ->net_put_all.
> 
> Legacy would still be flawed though.


* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-14  0:06               ` Pablo Neira Ayuso
  2020-10-14  8:11                 ` Pablo Neira Ayuso
@ 2020-10-14  8:23                 ` Florian Westphal
  2020-10-14 18:42                   ` Francesco Ruggeri
  1 sibling, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2020-10-14  8:23 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, Jozsef Kadlecsik, Francesco Ruggeri, open list,
	netdev, coreteam, netfilter-devel, Jakub Kicinski, David Miller

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Yes, we iterate table on re-register and modify the existing entries.
> 
> For iptables-nft, it might be possible to avoid this deregister +
> register ct hooks in the same transaction: Maybe add something like
> nf_ct_netns_get_all() to bump refcounters by one _iff_ they are > 0
> before starting the transaction processing, then call
> nf_ct_netns_put_all() which decrements refcounters and unregister
> hooks if they reach 0.

No need, it's already fine.  The decrement happens from the destroy path,
so the new rules are already in place.

> The only problem with this approach is that this pulls in the
> conntrack module, to solve that, struct nf_ct_hook in
> net/netfilter/core.c could be used to store the reference to
> ->netns_get_all and ->net_put_all.
> 
> Legacy would still be flawed though.

It's fine too: the new rule blob gets handled (and match/target checkentry
called) before the old one is dismantled.

We only have a 0 refcount + hook unregister when rules get
flushed/removed explicitly.
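
The difference between replacing a ruleset and flushing it can be
sketched with a small refcount model. This is a toy illustration in
Python, not kernel code: the ConntrackHooks class and its get()/put()
methods are made-up names, and each ruleset is assumed to hold exactly
one conntrack reference.

```python
class ConntrackHooks:
    """Toy model: hooks are registered while at least one user exists."""

    def __init__(self):
        self.users = 0
        self.events = []

    def get(self):
        self.users += 1
        if self.users == 1:
            self.events.append("register")

    def put(self):
        self.users -= 1
        if self.users == 0:
            self.events.append("unregister")


def replace_ruleset():
    hooks = ConntrackHooks()
    hooks.get()  # old ruleset uses conntrack
    hooks.get()  # new ruleset is handled first and takes its reference
    hooks.put()  # only then is the old ruleset dismantled
    return hooks.events


def flush_then_restore():
    hooks = ConntrackHooks()
    hooks.get()  # original ruleset uses conntrack
    hooks.put()  # rules flushed explicitly: count hits 0
    hooks.get()  # ruleset restored some time later
    return hooks.events


print(replace_ruleset())     # ['register']
print(flush_then_restore())  # ['register', 'unregister', 'register']
```

Only the flush-then-restore sequence, which is what the reproducer does,
drops the count to 0 and so goes through an unregister and re-register.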


* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-14  8:23                 ` Florian Westphal
@ 2020-10-14 18:42                   ` Francesco Ruggeri
  2020-10-14 19:35                     ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Francesco Ruggeri @ 2020-10-14 18:42 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, open list, netdev, coreteam,
	netfilter-devel, Jakub Kicinski, David Miller

On Wed, Oct 14, 2020 at 1:23 AM Florian Westphal <fw@strlen.de> wrote:
>
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Legacy would still be flawed though.
>
> Its fine too, new rule blob gets handled (and match/target checkentry
> called) before old one is dismantled.
>
> We only have a 0 refcount + hook unregister when rules get
> flushed/removed explicitly.

Should the patch be used in the meantime while this gets
worked out?

Francesco


* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-14 18:42                   ` Francesco Ruggeri
@ 2020-10-14 19:35                     ` Florian Westphal
  0 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2020-10-14 19:35 UTC (permalink / raw)
  To: Francesco Ruggeri
  Cc: Florian Westphal, Pablo Neira Ayuso, Jozsef Kadlecsik, open list,
	netdev, coreteam, netfilter-devel, Jakub Kicinski, David Miller

Francesco Ruggeri <fruggeri@arista.com> wrote:
> On Wed, Oct 14, 2020 at 1:23 AM Florian Westphal <fw@strlen.de> wrote:
> >
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > Legacy would still be flawed though.
> >
> > Its fine too, new rule blob gets handled (and match/target checkentry
> > called) before old one is dismantled.
> >
> > We only have a 0 refcount + hook unregister when rules get
> > flushed/removed explicitly.
> 
> Should the patch be used in the meantime while this gets
> worked out?

I think the patch is correct, and I do NOT see a better solution.


* Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register
  2020-10-07 19:32 [PATCH nf v2] netfilter: conntrack: connection timeout after re-register Francesco Ruggeri
  2020-10-08 23:41 ` Francesco Ruggeri
@ 2020-10-20 15:21 ` Pablo Neira Ayuso
  1 sibling, 0 replies; 15+ messages in thread
From: Pablo Neira Ayuso @ 2020-10-20 15:21 UTC (permalink / raw)
  To: Francesco Ruggeri
  Cc: linux-kernel, netdev, coreteam, netfilter-devel, kuba, davem, fw, kadlec

On Wed, Oct 07, 2020 at 12:32:52PM -0700, Francesco Ruggeri wrote:
> If the first packet conntrack sees after a re-register is an outgoing
> keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to
> SND.NXT-1.
> When the peer correctly acknowledges SND.NXT, tcp_in_window fails
> check III (Upper bound for valid (s)ack: sack <= receiver.td_end) and
> returns false, which cascades into nf_conntrack_in setting
> skb->_nfct = 0 and in later conntrack iptables rules not matching.
> In cases where iptables are dropping packets that do not match
> conntrack rules this can result in idle tcp connections to time out.

Applied, thanks.
