* [PATCH-2.4-POM] sessions stuck in CLOSE_WAIT state
@ 2003-10-07 22:18 Willy Tarreau
  2003-10-08  9:44 ` Harald Welte
  0 siblings, 1 reply; 14+ messages in thread
From: Willy Tarreau @ 2003-10-07 22:18 UTC (permalink / raw)
  To: kadlec; +Cc: netfilter, laforge

Hello !

I installed two new Netfilter-based firewalls at a customer's site a few weeks
ago. They run mainline 2.4.22 + the tcp_window_tracking patch from
pom-20030912. They are load-balanced by external equipment (Alteon) and
used only for incoming HTTP connections.

Since they went into production, I have noticed some quirks that didn't
show up on the homologation platform, nor on the development platform, despite
a ten-fold load on the latter :
  - under a load of about 500 hits/s/firewall, almost every second I got one
    of those common messages "out of window data", "invalid scale", ...
  - I noticed a *LOT* of drops (about 2.5% of all logs ; only accepted SYNs
    and drops are logged).
  - there are a *LOT* of CLOSE_WAIT entries in /proc/net/ip_conntrack (about
    40000 per firewall with the default 3 days timeout).

I had some time this evening to dig through the logs, but only noticed that
some packets were dropped for unknown reasons, such as FIN/ACKs or RSTs,
most of which were coming from our servers.

So I set up a test platform between two PCs to try to reproduce. The client
injected at about 400 hits/s on the server (thttpd), which BTW also hosted
4 rules which logged the session state (NEW, ESTABLISHED, RELATED, INVALID).
No problem showed up without the tcp_window_tracking patch. Interestingly,
with it, as soon as the client ports were reused, I began to get INVALID
entries in my logs. And long before that, I already had many CLOSE_WAIT
sessions while neither the client nor the server had any of them left.

Fortunately, tcpdump showed me that all stuck sessions had one thing in
common : the server aborted them with an RST after the first FIN+ACK, i.e.
in CLOSE_WAIT. Others which were aborted with an RST just before or just after
were destroyed without any problem.

Then I enabled full debugging in ip_conntrack_proto_tcp.c and could capture
this session :

=== begin cut here ===

Oct  7 20:22:24 wta-tig tcp_in_window: START
Oct  7 20:22:24 wta-tig tcp_in_window: src=10.0.0.1:53541 dst=10.0.0.2:80 seq=379441844 ack=0 sack=0 win=5840 end=379441845
Oct  7 20:22:24 wta-tig tcp_in_window: sender end=379441845 maxend=379441845 maxwin=5840 scale=0 receiver end=0 maxend=0 maxwin=1 scale=0
Oct  7 20:22:24 wta-tig tcp_in_window: src=10.0.0.1:53541 dst=10.0.0.2:80 seq=379441844 ack=0 sack=0 win=5840 end=379441845
Oct  7 20:22:24 wta-tig tcp_in_window: sender end=379441845 maxend=379441845 maxwin=5840 scale=0 receiver end=0 maxend=0 maxwin=1 scale=0
Oct  7 20:22:24 wta-tig tcp_in_window: I=1 II=1 III=1 IV=1
Oct  7 20:22:24 wta-tig tcp_in_window: res=1 sender end=379441845 maxend=379441845 maxwin=5840 receiver end=0 maxend=5840 maxwin=1
Oct  7 20:22:24 wta-tig tcp_conntracks: src=10.0.0.1:53541 dst=10.0.0.2:80 syn=1 ack=0 fin=0 rst=0 old=1 new=1

Oct  7 20:22:24 wta-tig NEW IN=eth0 OUT= MAC=00:04:76:11:b6:86:08:00:46:6f:30:d9:08:00 SRC=10.0.0.1 DST=10.0.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=43524 DF PROTO=TCP SPT=53541 DPT=80 WINDOW=5840 RES=0x00 SYN URGP=0 

Oct  7 20:22:24 wta-tig tcp_in_window: START
Oct  7 20:22:24 wta-tig tcp_in_window: src=10.0.0.2:80 dst=10.0.0.1:53541 seq=4059370607 ack=379441845 sack=379441845 win=5792 end=4059370608
Oct  7 20:22:24 wta-tig tcp_in_window: sender end=0 maxend=5840 maxwin=1 scale=0 receiver end=379441845 maxend=379441845 maxwin=5840 scale=0
Oct  7 20:22:24 wta-tig tcp_in_window: src=10.0.0.2:80 dst=10.0.0.1:53541 seq=4059370607 ack=379441845 sack=379441845 win=5792 end=4059370608
Oct  7 20:22:24 wta-tig tcp_in_window: sender end=4059370608 maxend=4059370608 maxwin=5792 scale=0 receiver end=379441845 maxend=379441845 maxwin=5840 scale=0
Oct  7 20:22:24 wta-tig tcp_in_window: I=1 II=1 III=1 IV=1
Oct  7 20:22:24 wta-tig tcp_in_window: res=1 sender end=4059370608 maxend=4059370608 maxwin=5792 receiver end=379441845 maxend=379447637 maxwin=5840
Oct  7 20:22:24 wta-tig tcp_conntracks: src=10.0.0.2:80 dst=10.0.0.1:53541 syn=1 ack=1 fin=0 rst=0 old=1 new=2
Oct  7 20:22:24 wta-tig tcp_new: sender end=389142051 maxend=389142051 maxwin=5840 scale=0 receiver end=0 maxend=0 maxwin=1 scale=0

Oct  7 20:22:24 wta-tig tcp_in_window: START
Oct  7 20:22:24 wta-tig tcp_in_window: src=10.0.0.1:53541 dst=10.0.0.2:80 seq=379441845 ack=4059370608 sack=4059370608 win=5840 end=379441845
Oct  7 20:22:24 wta-tig tcp_in_window: sender end=379441845 maxend=379447637 maxwin=5840 scale=0 receiver end=4059370608 maxend=4059370608 maxwin=5792 scale=0
Oct  7 20:22:24 wta-tig tcp_in_window: src=10.0.0.1:53541 dst=10.0.0.2:80 seq=379441845 ack=4059370608 sack=4059370608 win=5840 end=379441845
Oct  7 20:22:24 wta-tig tcp_in_window: sender end=379441845 maxend=379447637 maxwin=5840 scale=0 receiver end=4059370608 maxend=4059370608 maxwin=5792 scale=0
Oct  7 20:22:24 wta-tig tcp_in_window: I=1 II=1 III=1 IV=1
Oct  7 20:22:24 wta-tig tcp_in_window: res=1 sender end=379441845 maxend=379447637 maxwin=5840 receiver end=4059370608 maxend=4059376448 maxwin=5792
Oct  7 20:22:24 wta-tig tcp_conntracks: src=10.0.0.1:53541 dst=10.0.0.2:80 syn=0 ack=1 fin=0 rst=0 old=2 new=3

Oct  7 20:22:24 wta-tig tcp_in_window: START
Oct  7 20:22:24 wta-tig tcp_in_window: src=10.0.0.2:80 dst=10.0.0.1:53541 seq=4059370608 ack=379441845 sack=379441845 win=5792 end=4059370609
Oct  7 20:22:24 wta-tig tcp_in_window: sender end=4059370608 maxend=4059376448 maxwin=5792 scale=0 receiver end=379441845 maxend=379447637 maxwin=5840 scale=0
Oct  7 20:22:24 wta-tig tcp_in_window: src=10.0.0.2:80 dst=10.0.0.1:53541 seq=4059370608 ack=379441845 sack=379441845 win=5792 end=4059370609
Oct  7 20:22:24 wta-tig tcp_in_window: sender end=4059370608 maxend=4059376448 maxwin=5792 scale=0 receiver end=379441845 maxend=379447637 maxwin=5840 scale=0
Oct  7 20:22:24 wta-tig tcp_in_window: I=1 II=1 III=1 IV=1
Oct  7 20:22:24 wta-tig tcp_in_window: res=1 sender end=4059370609 maxend=4059376448 maxwin=5792 receiver end=379441845 maxend=379447637 maxwin=5840
Oct  7 20:22:24 wta-tig tcp_conntracks: src=10.0.0.2:80 dst=10.0.0.1:53541 syn=0 ack=1 fin=1 rst=0 old=3 new=4

Oct  7 20:22:24 wta-tig tcp_in_window: START
Oct  7 20:22:24 wta-tig tcp_in_window: src=10.0.0.1:53541 dst=10.0.0.2:80 seq=379441845 ack=4059370609 sack=4059370609 win=5840 end=379441845
Oct  7 20:22:24 wta-tig tcp_in_window: sender end=379441845 maxend=379447637 maxwin=5840 scale=0 receiver end=4059370609 maxend=4059376448 maxwin=5792 scale=0
Oct  7 20:22:24 wta-tig tcp_in_window: src=10.0.0.1:53541 dst=10.0.0.2:80 seq=379441845 ack=4059370609 sack=4059370609 win=5840 end=379441845
Oct  7 20:22:24 wta-tig tcp_in_window: sender end=379441845 maxend=379447637 maxwin=5840 scale=0 receiver end=4059370609 maxend=4059376448 maxwin=5792 scale=0
Oct  7 20:22:24 wta-tig tcp_in_window: I=1 II=1 III=1 IV=1
Oct  7 20:22:24 wta-tig tcp_in_window: res=1 sender end=379441845 maxend=379447637 maxwin=5840 receiver end=4059370609 maxend=4059376449 maxwin=5792
Oct  7 20:22:24 wta-tig tcp_conntracks: src=10.0.0.1:53541 dst=10.0.0.2:80 syn=0 ack=1 fin=0 rst=0 old=4 new=5

Oct  7 20:22:24 wta-tig tcp_in_window: START
Oct  7 20:22:24 wta-tig tcp_in_window: src=10.0.0.1:53541 dst=10.0.0.2:80 seq=379441845 ack=4059370609 sack=4059370609 win=5840 end=379441959
Oct  7 20:22:24 wta-tig tcp_in_window: sender end=379441845 maxend=379447637 maxwin=5840 scale=0 receiver end=4059370609 maxend=4059376449 maxwin=5792 scale=0
Oct  7 20:22:24 wta-tig tcp_in_window: src=10.0.0.1:53541 dst=10.0.0.2:80 seq=379441845 ack=4059370609 sack=4059370609 win=5840 end=379441959
Oct  7 20:22:24 wta-tig tcp_in_window: sender end=379441845 maxend=379447637 maxwin=5840 scale=0 receiver end=4059370609 maxend=4059376449 maxwin=5792 scale=0
Oct  7 20:22:24 wta-tig tcp_in_window: I=1 II=1 III=1 IV=1
Oct  7 20:22:24 wta-tig tcp_in_window: res=1 sender end=379441959 maxend=379447637 maxwin=5840 receiver end=4059370609 maxend=4059376449 maxwin=5792
Oct  7 20:22:24 wta-tig tcp_conntracks: src=10.0.0.1:53541 dst=10.0.0.2:80 syn=0 ack=1 fin=0 rst=0 old=5 new=5

Oct  7 20:22:24 wta-tig tcp_conntracks: src=10.0.0.2:80 dst=10.0.0.1:53541 syn=0 ack=0 fin=0 rst=1 old=5 new=10
Oct  7 20:22:24 wta-tig ip_conntrack_tcp: Invalid dir=1 index=4 conntrack=5

=== end cut here ===

(Value 5 is CLOSE_WAIT, and 10 is INVALID or TCP_CONNTRACK_MAX.) You can see
on the last two lines that the RST was handled very early in the code path.
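
For readers following the trace: the "end" field appears to be the sequence
number just past the segment, i.e. seq plus the payload length, plus one each
for SYN and FIN. A minimal sketch under that assumption (segment_end() is a
made-up helper for illustration, not a function from
ip_conntrack_proto_tcp.c) :

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sequence number just past a segment.
 * SYN and FIN each consume one sequence number; the arithmetic
 * wraps modulo 2^32 like TCP sequence numbers do. */
uint32_t segment_end(uint32_t seq, uint32_t payload_len, int syn, int fin)
{
	return seq + payload_len + (syn ? 1u : 0u) + (fin ? 1u : 0u);
}
```

With the values logged above, the initial SYN (seq=379441844) gives
end=379441845, and the server's FIN (seq=4059370608) gives end=4059370609.
The client's data segment (seq=379441845, end=379441959) would then carry
114 bytes; that length is inferred from the log, not printed in it.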

In fact, the thttpd server was rejecting some connections, sending an RST and
closing its local connection. The client terminated its connection normally,
and only the firewall kept it in CLOSE_WAIT. Then, when the client wanted to
reuse the port later (since, from its side, the session no longer existed), the
firewall noticed that a session still existed for this port and that the
sequence numbers did not match, which led to some more logs, and to the client
not being able to connect.

The code without tcp_window_tracking correctly sets the session to the CLOSE
state when receiving an RST in CLOSE_WAIT. This is why I could never make it
complain :

/*     REPLY */
/*       sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI      */
/*syn*/ {sSR, sES, sSR, sSR, sSR, sSR, sSR, sSR, sSR, sSR },
/*fin*/ {sCL, sCW, sSS, sTW, sTW, sTW, sCL, sCW, sLA, sLI },
/*ack*/ {sCL, sES, sSS, sSR, sFW, sTW, sCL, sCW, sCL, sLI },
/*rst*/ {sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sLA, sLI },
/*none*/{sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV }

But the tcp_window_tracking patch changes this behaviour, I think because of
a typo, since I see no other reason not to terminate a session upon RST :

/*          sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sLI   */
/*rst*/   { sIV, sCL, sCL, sCL, sCL, sIV, sCL, sCL, sCL, sIV },
/*none*/  { sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV }
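
To make the effect of that single entry concrete, here is a small standalone
sketch of the rst row as shipped and with the CLOSE_WAIT column changed to
sCL. The enum and function names are illustrative stand-ins, not the kernel's
definitions; the enumerator order simply follows the column comment above, so
the values line up with the trace (5 for CLOSE_WAIT, 10 for INVALID) :

```c
#include <assert.h>

/* Illustrative state names in the column order of the
 * tcp_window_tracking table; sIV marks an invalid transition. */
enum ct_state { sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sLI, sIV };

/* rst row as found in pom-20030912: the CLOSE_WAIT column is sIV. */
const enum ct_state rst_row_before[10] =
	{ sIV, sCL, sCL, sCL, sCL, sIV, sCL, sCL, sCL, sIV };

/* rst row with the one-entry change: CLOSE_WAIT now goes to sCL. */
const enum ct_state rst_row_after[10] =
	{ sIV, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sIV };

/* Hypothetical lookup: next state when an RST arrives in state `old`. */
enum ct_state next_state_on_rst(const enum ct_state *row, enum ct_state old)
{
	return old < sIV ? row[old] : sIV;
}
```

Looking up sCW in the first row yields sIV, which matches the "old=5 new=10"
line in the trace; in the second row the same lookup yields sCL.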

BTW, is it normal that an RST received in TIME_WAIT state terminates the
session ? I would have thought that we should keep it in TIME_WAIT anyway.

So I simply applied this trivial patch, and could never trigger the problem
anymore, even at 2500 hits/s : all sessions properly terminate, and I don't
even get any "Out of window" message on my test platform anymore. I will
soon patch the homologation then the production machines with it.

Here it is :

--- linux-2.4.23-pre6-pom-20030912/net/ipv4/netfilter/ip_conntrack_proto_tcp.c.orig	Tue Oct  7 23:06:49 2003
+++ linux-2.4.23-pre6-pom-20030912/net/ipv4/netfilter/ip_conntrack_proto_tcp.c	Tue Oct  7 23:07:12 2003
@@ -271,7 +271,7 @@
  *	sCL -> sCL
  */
 /* 	     sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sLI	*/
-/*rst*/    { sIV, sCL, sCL, sCL, sCL, sIV, sCL, sCL, sCL, sIV },
+/*rst*/    { sIV, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sIV },
 /*none*/   { sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV }
 	}
 };

Could you please review and/or apply it to recent patch-o-matic ?

BTW, I noticed that the tcp_conntracks[] array takes some memory because every
entry takes one full int. This is 480 bytes on x86. Wouldn't it be more cache-
friendly to use compact structures with bit fields instead ?

For example, either something like this (byte-aligned) :
  u8 tcp_conntracks[6][10] = { (REPLY_STATE)<<4 | (ORIGINAL_STATE), ... }
   -> read it from (tcp_conntracks[flag][old_state] >> (dir ? 4 : 0)) & 0xF

or (u32 aligned) 80 bytes :
  int tcp_conntracks[2][10] = { /* original */
                                { /* old sNO */
                                  sSS << SYN | sTW << FIN | sES << ACK ...
  -> read it from (tcp_conntracks[dir][old_state] >> (flag * 4)) & 0xF.
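
A rough sketch of the u32-aligned layout, assuming 4 bits per next-state and
one 32-bit word per (direction, old state). All names below (CT_PACK,
ct_unpack, the F_* flag indices) are invented for illustration; the flag
order is only a guess consistent with "index=4" for the RST in the trace.
Only the reply-direction rst and none entries for CLOSE_WAIT are filled in,
so this is not the real table :

```c
#include <assert.h>
#include <stdint.h>

enum ct_state { sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sLI, sIV };

/* Assumed flag indices (six rows per direction). */
enum ct_flag { F_SYN, F_SYNACK, F_FIN, F_ACK, F_RST, F_NONE };

#define CT_BITS 4u
#define CT_MASK 0xFu
/* Put next-state `st` into the nibble that belongs to flag `fl`. */
#define CT_PACK(fl, st) ((uint32_t)(st) << ((fl) * CT_BITS))

/* Extract the next state for flag `fl` from a packed word. */
enum ct_state ct_unpack(uint32_t word, enum ct_flag fl)
{
	return (enum ct_state)((word >> (fl * CT_BITS)) & CT_MASK);
}

/* Packed reply-direction word for old state sCW. Only two nibbles are
 * set (rst -> sCL, none -> sIV); the others are left at 0 and do NOT
 * reflect the real table contents. */
uint32_t reply_scw_word(void)
{
	return CT_PACK(F_RST, sCL) | CT_PACK(F_NONE, sIV);
}
```

Six flags at 4 bits each need 24 bits, so one 32-bit word per entry is enough,
and 2 x 10 words give the 80 bytes mentioned above. Note that in the
byte-packed variant the shift amount has to be parenthesised,
">> (dir ? 4 : 0)", because ">>" binds tighter than "?:" in C.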

Same could be done on unclean(). If someone's interested, I can try to look
at what seems possible, and propose something.

Regards,
Willy



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH-2.4-POM] sessions stuck in CLOSE_WAIT state
  2003-10-07 22:18 [PATCH-2.4-POM] sessions stuck in CLOSE_WAIT state Willy Tarreau
@ 2003-10-08  9:44 ` Harald Welte
  2003-10-09  7:01     ` Jozsef Kadlecsik
  0 siblings, 1 reply; 14+ messages in thread
From: Harald Welte @ 2003-10-08  9:44 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: kadlec, netfilter, Netfilter Development Mailinglist

On Wed, Oct 08, 2003 at 12:18:17AM +0200, Willy Tarreau wrote:
> Fortunately, tcpdump showed to me that all stuck sessions had in common the
> fact that the server aborted them with an RST after the first FIN+ACK, so
> in CLOSE_WAIT. Others which were aborted with an RST just before or just after
> were destroyed without any problem.

Ok, this sounds like a pretty clear case then.  This bug has not been
found because replying with RST after FIN+ACK seems to be quite an unusual
pattern.
 
> /*rst*/ {sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sLA, sLI },
> 
> But the tcp_window_tracking patch changes this behaviour, I think because of
> a typo, since I see no other reason not to terminate a session upon RST :
> 
> /*          sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sLI   */
> /*rst*/   { sIV, sCL, sCL, sCL, sCL, sIV, sCL, sCL, sCL, sIV },

yes, I totally agree with you.  I'll apply your proposed patch to
patch-o-matic.  If Jozsef disagrees, he can revert that change.

> BTW, is it normal that an RST received in TIME_WAIT state terminates the
> session ? I would have thought that we should keep it in TIME_WAIT anyway.

again agreed.  Jozsef, what do you think?

> BTW, I noticed that the tcp_conntracks[] array takes some memory because every
> entry takes one full int. This is 480 bytes on x86. Wouldn't it be more cache-
> friendly to use compact structures with bit fields instead ?

yes, it would be more cache-line friendly.  But then, wouldn't we get lots of
unaligned accesses?  Oh, you made a u32 aligned version...

> For example, either something like this (byte-aligned) :
>   u8 tcp_conntracks[6][10] = { (REPLY_STATE)<<4 | (ORIGINAL_STATE), ... }
>    -> read it from (tcp_conntrack[flag][old_state] >> dir?4:0) & 0xF
> 
> or (u32 aligned) 80 bytes :
>   int tcp_conntracks[2][10] = { /* original */
>                                 { /* old sNO */
>                                   sSS << SYN | sTW << FIN | sES << ACK ...
>   -> read id from (tcp_conntrack[dir][old_state] >> flag*4) & 0xF.

this makes the code really hard to read.  If you can come up with some
easy-to-read macros, I'd be willing to give it a try.

> Same could be done on unclean(). If someone's interested, I can try to look
> at what seems possible, and propose something.

yes, I'm interested - providing the code is not crippled by bit-shifts
and multiplications everywhere.

> Regards,
> Willy

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH-2.4-POM] sessions stuck in CLOSE_WAIT state
  2003-10-08  9:44 ` Harald Welte
@ 2003-10-09  7:01     ` Jozsef Kadlecsik
  0 siblings, 0 replies; 14+ messages in thread
From: Jozsef Kadlecsik @ 2003-10-09  7:01 UTC (permalink / raw)
  To: Harald Welte; +Cc: Willy Tarreau, netfilter, Netfilter Development Mailinglist

On Wed, 8 Oct 2003, Harald Welte wrote:

> On Wed, Oct 08, 2003 at 12:18:17AM +0200, Willy Tarreau wrote:
> > Fortunately, tcpdump showed to me that all stuck sessions had in common the
> > fact that the server aborted them with an RST after the first FIN+ACK, so
> > in CLOSE_WAIT. Others which were aborted with an RST just before or just after
> > were destroyed without any problem.
>
> Ok, this sounds like a pretty clear case than.  This bug has not been
> found because replying with RST afer FIN+ACK seems a quite unusual
> pattern.
>
> > /*rst*/ {sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sLA, sLI },
> >
> > But the tcp_window_tracking patch changes this behaviour, I think because of
> > a typo, since I see no other reason not to terminate a session upon RST :
> >
> > /*          sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sLI   */
> > /*rst*/   { sIV, sCL, sCL, sCL, sCL, sIV, sCL, sCL, sCL, sIV },
>
> yes, I totally agree with you.  I'll apply your proposed patch to
> patch-o-matic.  If Jozsef disagrees, he can revert that change.

I agree completely.

> > BTW, is it normal that an RST received in TIME_WAIT state terminates the
> > session ? I would have thought that we should keep it in TIME_WAIT anyway.
>
> again agreed.  Jozsef, what do you think?

It's normal, but highly unusual I think. (Was it an "optimization" of TCP?)

Thank you very much for the nice analysis and report.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH-2.4-POM] NF_REPEAT was ignored !
  2003-10-09  7:01     ` Jozsef Kadlecsik
  (?)
@ 2003-10-22 10:25     ` Willy Tarreau
  2003-10-23 16:03         ` Willy Tarreau
  2003-10-23 22:02       ` [PATCH-2.4-POM] " Jozsef Kadlecsik
  -1 siblings, 2 replies; 14+ messages in thread
From: Willy Tarreau @ 2003-10-22 10:25 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Harald Welte, netfilter, Netfilter Development Mailinglist

Hi,

after updating the production firewalls to handle the CW->CL state, I saw the
rate of drops decrease, but not as much as I would have expected it to.

I captured lots of data (/proc/net/ip_conntrack, logs, tcpdump) and discovered
another problem with tcp_window_tracking that I could easily reproduce in
a lab : if a client reused a port too early, then the SYN/ACK from the
server was dropped, and the client could only connect after the next SYN
retransmit. I simply checked it with nc -p 1234 server 80. The first one
succeeds immediately, the second one needs 3 seconds to establish. There
is a logical explanation for this :

The client completes a first connection to server:80 with spt=1234. A few
seconds later, it reuses the same port to initiate a new connection to the
server. The firewall still sees the connection in TIME_WAIT state, so its
state matrix switches it to SYN_SENT (orig:sTW--(SY)-->sSS).

In ip_conntrack_proto_tcp.c:tcp_packet(), there is a test for this case. The
existing session is deleted and NF_REPEAT is returned so that the caller tries
again (here, ip_conntrack_core.c:ip_conntrack_in()). That function simply passes
the same NF_REPEAT code up to its own caller, nf_iterate(), which calls it again.

The problem is that once ip_conntrack_in() is called again with the same pskb,
the skb already has its ->nfct filled, so ip_conntrack_in() immediately returns
NF_ACCEPT without doing any lookup. The result is that the SYN is passed to
the server, and the deleted session is not recreated. When the server replies
with a SYN/ACK, it has no matching session, so it is blocked by the firewall
rules. Then, 3 seconds later, the client retransmits its SYN, which reaches
the firewall without any matching session, and correctly initiates a new one.

The solution is to correctly clear the ->nfct field in ip_conntrack_in() if
we return NF_REPEAT. This is what the following patch does. It's to be applied
to 2.4+POM-20030912, but I'm confident it may be easily applied and/or ported
to later versions.
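
The sequence described above can be reproduced with a toy model. This is only
a sketch under simplifying assumptions (a single tracked tuple, a flag instead
of a real ->nfct pointer); every toy_* name and V_* value is made up for the
illustration and none of it is kernel code. The clear_on_repeat switch stands
for "forget the stale reference before returning NF_REPEAT" :

```c
#include <assert.h>

enum verdict { V_DROP, V_ACCEPT, V_REPEAT };	/* toy stand-ins for NF_* */

struct toy_table { int has_entry; int time_wait; };	/* one tuple only */
struct toy_pkt   { int nfct_set; };			/* models skb->nfct */

enum verdict toy_conntrack_in(struct toy_table *t, struct toy_pkt *p,
			      int clear_on_repeat)
{
	if (p->nfct_set)		/* already tracked: skip the lookup */
		return V_ACCEPT;

	if (!t->has_entry) {		/* no match: create a fresh session */
		t->has_entry = 1;
		t->time_wait = 0;
	}
	p->nfct_set = 1;

	if (t->time_wait) {		/* SYN hits a TIME_WAIT session */
		t->has_entry = 0;	/* delete the stale session ... */
		if (clear_on_repeat)
			p->nfct_set = 0;	/* ... and drop the reference */
		return V_REPEAT;	/* ask the caller to run us again */
	}
	return V_ACCEPT;
}

/* Mimics a caller that re-invokes the hook while it returns V_REPEAT. */
enum verdict toy_iterate(struct toy_table *t, struct toy_pkt *p,
			 int clear_on_repeat)
{
	enum verdict v;

	do {
		v = toy_conntrack_in(t, p, clear_on_repeat);
	} while (v == V_REPEAT);
	return v;
}

/* Returns 1 if a session exists after a SYN reuses a TIME_WAIT tuple. */
int session_exists_after_reuse(int clear_on_repeat)
{
	struct toy_table t = { 1, 1 };	/* one session, in TIME_WAIT */
	struct toy_pkt syn = { 0 };

	toy_iterate(&t, &syn, clear_on_repeat);
	return t.has_entry;
}
```

Without clearing, the second pass returns early and the SYN is accepted with
no session left in the table; with clearing, the second pass recreates the
session before accepting.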

I've not checked yet if the mainline conntrack code is also affected, but this
could be possible.

Regards,
Willy


--- ./net/ipv4/netfilter/ip_conntrack_core.c.orig	Tue Oct 21 14:21:08 2003
+++ ./net/ipv4/netfilter/ip_conntrack_core.c	Tue Oct 21 16:14:53 2003
@@ -856,6 +861,14 @@
 	IP_NF_ASSERT((*pskb)->nfct);
 
 	ret = proto->packet(ct, (*pskb)->nh.iph, (*pskb)->len, ctinfo);
+
+	if (ret == NF_REPEAT) {
+		/* we must loop here again */
+		nf_conntrack_put((*pskb)->nfct);
+		(*pskb)->nfct = NULL;
+		return ret;
+	}
+
 	if (ret == -1) {
 		/* Invalid */
 		nf_conntrack_put((*pskb)->nfct);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH-2.4] NF_REPEAT was ignored !
  2003-10-22 10:25     ` [PATCH-2.4-POM] NF_REPEAT was ignored ! Willy Tarreau
@ 2003-10-23 16:03         ` Willy Tarreau
  2003-10-23 22:02       ` [PATCH-2.4-POM] " Jozsef Kadlecsik
  1 sibling, 0 replies; 14+ messages in thread
From: Willy Tarreau @ 2003-10-23 16:03 UTC (permalink / raw)
  To: laforge; +Cc: Netfilter Development Mailinglist, linux-kernel

Hi Harald !

Just replying to myself to state that vanilla 2.4.23-pre8 has the same problem
(linux-kernel cc'd for this matter), and the patch applies to it too. There is
a difference, though, because I found no user of NF_REPEAT in 2.4.23-pre8, so
as of today, no mainline code seems affected, but the bug is waiting for
someone to bite :-)

Please review, comment and/or apply.

Regards,
Willy

==== original mail below ====

On Wed, Oct 22, 2003 at 12:25:56PM +0200, Willy Tarreau wrote:
Hi,

after updating the production firewalls to handle the CW->CL state, I saw the
rate of drops decrease, but not as much as I would have expected it to.

I captured lots of data (/p/n/ip_conntrack, logs, tcpdump) and discovered
another problem with tcp_window_tracking that I could easily reproduce on
a lab : if a client reused a port too early, then the SYN/ACK from the
server was dropped, and the client could only connect after the next SYN
retransmit. I simply checked it with nc -p 1234 server 80. The first one
succeeds immediately, the second one needs 3 seconds to establish. There
is a logical explication to this :

The client completes a first connection to server:80 with spt=1234. A few
seconds later, he reuses the same port to initiate a new connection to the
server. The firewall still sees the connection in TIME_WAIT state, so its
state matrix switches it to SYN_SENT  (orig:sTW--(SY)-->sSS).

In ip_conntrack_proto_tcp.c:tcp_packet(), there is a test for this case. The
existing session is deleted and NF_REPEAT is returned so that the caller tries
again (here, ip_conntrack_core.c:ip_conntrack_in()). This one simply returns
the same code NF_REPEAT to its caller which will call it again (nf_iterate()).

The problem is that once ip_conntrack_in() is called again with the same pskb,
it already has its ->nfct filled, so ip_conntrack_in() immediately returns
NF_ACCEPT without doing any lookup. The result is that the SYN is passed to
the server, and the deleted session is not recreated. When the server replies
with a SYN/ACK, this one has no matching session it is blocked by the firewall
rules. Then, 3 seconds later, the client retransmits its SYN, which reaches
the firewall without any matching session, and correctly initiates a new one.

The solution is to correctly clear the ->nfct field in ip_conntrack_in() if
we return NF_REPEAT. This is what the following patch does. It's to be applied
to 2.4+POM-20030912, but I'm confident it may be easily applied and/or ported
to later versions.

I've not checked yet if the mainline conntrack code is also affected, but this
could be possible.

Regards,
Willy


--- ./net/ipv4/netfilter/ip_conntrack_core.c.orig	Tue Oct 21 14:21:08 2003
+++ ./net/ipv4/netfilter/ip_conntrack_core.c	Tue Oct 21 16:14:53 2003
@@ -856,6 +861,14 @@
 	IP_NF_ASSERT((*pskb)->nfct);
 
 	ret = proto->packet(ct, (*pskb)->nh.iph, (*pskb)->len, ctinfo);
+
+	if (ret == NF_REPEAT) {
+		/* we must loop here again */
+		nf_conntrack_put((*pskb)->nfct);
+		(*pskb)->nfct = NULL;
+		return ret;
+	}
+
 	if (ret == -1) {
 		/* Invalid */
 		nf_conntrack_put((*pskb)->nfct);



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH-2.4-POM] NF_REPEAT was ignored !
  2003-10-22 10:25     ` [PATCH-2.4-POM] NF_REPEAT was ignored ! Willy Tarreau
  2003-10-23 16:03         ` Willy Tarreau
@ 2003-10-23 22:02       ` Jozsef Kadlecsik
  2003-10-23 22:08         ` Willy Tarreau
  2003-10-25 12:35         ` [RFC][PATCH-2.4] fix NF_REPEAT (take 2) Willy TARREAU
  1 sibling, 2 replies; 14+ messages in thread
From: Jozsef Kadlecsik @ 2003-10-23 22:02 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Harald Welte, Netfilter Development Mailinglist

Hi Willy,

On Wed, 22 Oct 2003, Willy Tarreau wrote:

> I captured lots of data (/p/n/ip_conntrack, logs, tcpdump) and discovered
> another problem with tcp_window_tracking that I could easily reproduce on
> a lab : if a client reused a port too early, then the SYN/ACK from the
> server was dropped, and the client could only connect after the next SYN
> retransmit. I simply checked it with nc -p 1234 server 80. The first one
> succeeds immediately, the second one needs 3 seconds to establish. There
> is a logical explication to this :

[...]

> The problem is that once ip_conntrack_in() is called again with the same pskb,
> it already has its ->nfct filled, so ip_conntrack_in() immediately returns
> NF_ACCEPT without doing any lookup. The result is that the SYN is passed to
> the server, and the deleted session is not recreated. When the server replies
> with a SYN/ACK, this one has no matching session it is blocked by the firewall
> rules. Then, 3 seconds later, the client retransmits its SYN, which reaches
> the firewall without any matching session, and correctly initiates a new one.
>
> The solution is to correctly clear the ->nfct field in ip_conntrack_in() if
> we return NF_REPEAT. This is what the following patch does. It's to be applied
> to 2.4+POM-20030912, but I'm confident it may be easily applied and/or ported
> to later versions.

Well spotted, and the fix is correct. But I'd rather apply
a slightly modified version which does not add one more condition to the
conntrack core:

--- linux/net/ipv4/netfilter/ip_conntrack_core.c.orig	2003-10-23 23:43:20.000000000  +0200
+++ linux/net/ipv4/netfilter/ip_conntrack_core.c	2003-10-23 23:46:03.000000000  +0200
@@ -838,11 +838,12 @@
 	IP_NF_ASSERT((*pskb)->nfct);

 	ret = proto->packet(ct, *pskb, ctinfo);
-	if (ret == -1) {
-		/* Invalid */
+	if (ret < 0) {
+		/* Invalid: the negated return code tells
+		   the netfilter core what to do. */
 		nf_conntrack_put((*pskb)->nfct);
 		(*pskb)->nfct = NULL;
-		return NF_ACCEPT;
+		return -ret;
 	}

 	if (ret != NF_DROP && ct->helper) {

together with a slight modification of the TCP window tracking code so that
it returns -NF_REPEAT instead of NF_REPEAT.
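
The convention described above can be sketched in plain user-space C. This is only an illustrative model, not the kernel code: `struct pkt`, `conntrack_verdict()` and `check_convention()` are hypothetical names, and the `has_nfct` flag merely stands in for the `nf_conntrack_put()` / `->nfct = NULL` pair. The NF_* values are the ones from `<linux/netfilter.h>`.

```c
#include <assert.h>

/* Verdict values as defined in <linux/netfilter.h>. */
enum { NF_DROP = 0, NF_ACCEPT = 1, NF_STOLEN = 2, NF_QUEUE = 3, NF_REPEAT = 4 };

/* Hypothetical stand-in for struct sk_buff: it only records whether a
 * conntrack entry is still attached to the packet. */
struct pkt {
    int has_nfct;
};

/* Hypothetical model of the patched check in ip_conntrack_in():
 * proto_ret is the value returned by proto->packet(). */
int conntrack_verdict(struct pkt *p, int proto_ret)
{
    if (proto_ret < 0) {
        /* Models nf_conntrack_put() followed by ->nfct = NULL. */
        p->has_nfct = 0;
        /* The negated return code is the verdict for the netfilter core. */
        return -proto_ret;
    }
    return proto_ret;
}

/* Small self-check of the three interesting cases. */
void check_convention(void)
{
    struct pkt p = { 1 };

    /* The old "invalid" code -1 still maps to NF_ACCEPT. */
    assert(conntrack_verdict(&p, -1) == NF_ACCEPT && p.has_nfct == 0);

    /* -NF_REPEAT detaches the entry and asks the core to repeat the hook. */
    p.has_nfct = 1;
    assert(conntrack_verdict(&p, -NF_REPEAT) == NF_REPEAT && p.has_nfct == 0);

    /* Non-negative codes pass through and keep the entry attached. */
    p.has_nfct = 1;
    assert(conntrack_verdict(&p, NF_ACCEPT) == NF_ACCEPT && p.has_nfct == 1);
}
```

Under this model the existing `-1` case keeps its meaning because `-1 == -NF_ACCEPT`, which is why no extra condition is needed in the core.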

> I've not checked yet if the mainline conntrack code is also affected, but this
> could be possible.

No, that does not handle reopening at all.

Thank you for your valuable reports!

Best regards,
Jozsef

-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: [PATCH-2.4-POM] NF_REPEAT was ignored !
  2003-10-23 22:02       ` [PATCH-2.4-POM] " Jozsef Kadlecsik
@ 2003-10-23 22:08         ` Willy Tarreau
  2003-10-25 12:35         ` [RFC][PATCH-2.4] fix NF_REPEAT (take 2) Willy TARREAU
  1 sibling, 0 replies; 14+ messages in thread
From: Willy Tarreau @ 2003-10-23 22:08 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Harald Welte, Netfilter Development Mailinglist

On Fri, Oct 24, 2003 at 12:02:03AM +0200, Jozsef Kadlecsik wrote:
 
> Well spotted, and the fix is correct. But I'd rather apply
> a slightly modified version which does not add one more condition to the
> conntrack core:
> 
> --- linux/net/ipv4/netfilter/ip_conntrack_core.c.orig	2003-10-23 23:43:20.000000000  +0200
> +++ linux/net/ipv4/netfilter/ip_conntrack_core.c	2003-10-23 23:46:03.000000000  +0200
> @@ -838,11 +838,12 @@
>  	IP_NF_ASSERT((*pskb)->nfct);
> 
>  	ret = proto->packet(ct, *pskb, ctinfo);
> -	if (ret == -1) {
> -		/* Invalid */
> +	if (ret < 0 ) {
> +		/* Invalid: inverse of the return code tells
> +		   to netfilter core what to do. */
> 		nf_conntrack_put((*pskb)->nfct);
>  		(*pskb)->nfct = NULL;
> -		return NF_ACCEPT;
> +		return -ret;
>  	}
> 
>  	if (ret != NF_DROP && ct->helper) {
> 
> with the slight modification of the TCP window tracking code by returning
> -NF_REPEAT instead of NF_REPEAT.

That's a very clever way of doing it! But it may be a bit risky if other
->packet() functions one day rely on NF_REPEAT, so the documentation should
at least be extended to state clearly that a negative return value causes
the ->nfct to be dropped and the negated value to be returned.

> > I've not checked yet if the mainline conntrack code is also affected, but this
> > could be possible.
> 
> No, that does not handle reopening at all.

In fact yes, the conntrack code can do it but nothing uses it. It should be
safer to apply the same patch to keep the code in sync.

Cheers,
Willy


* [RFC][PATCH-2.4] fix NF_REPEAT (take 2)
  2003-10-23 22:02       ` [PATCH-2.4-POM] " Jozsef Kadlecsik
  2003-10-23 22:08         ` Willy Tarreau
@ 2003-10-25 12:35         ` Willy TARREAU
  2003-10-27  9:20           ` Jozsef Kadlecsik
  1 sibling, 1 reply; 14+ messages in thread
From: Willy TARREAU @ 2003-10-25 12:35 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Harald Welte, Netfilter Development Mailinglist, marcelo.tosatti

Hi Jozsef,

here are two patches to fix the NF_REPEAT bug. The first one is based on
your suggestion (your patch was for 2.6), but against ip_conntrack_core.c
in 2.4, which is the same in the standard kernel and in tcp_window_tracking.
The second one is only targeted at a window_tracking enabled kernel.

I have verified that they fix the reported problem. Could you please check
them so that Marcelo can merge them before 2.4.23-final?

Regards,
Willy

====== first one, for Marcelo

diff -urN linux-2.4.23-pre8/net/ipv4/netfilter/ip_conntrack_core.c linux-2.4.23-pre8-nf_repeat/net/ipv4/netfilter/ip_conntrack_core.c
--- linux-2.4.23-pre8/net/ipv4/netfilter/ip_conntrack_core.c	Fri Oct 10 08:47:16 2003
+++ linux-2.4.23-pre8-nf_repeat/net/ipv4/netfilter/ip_conntrack_core.c	Fri Oct 24 09:09:22 2003
@@ -859,11 +859,13 @@
 	IP_NF_ASSERT((*pskb)->nfct);
 
 	ret = proto->packet(ct, (*pskb)->nh.iph, (*pskb)->len, ctinfo);
-	if (ret == -1) {
-		/* Invalid */
+	if (ret < 0) {
+		/* Invalid: inverse of the return code tells
+		 * to netfilter core what to do :
+		 * -1 == -NF_ACCEPT,  -NF_REPEAT, ... */
 		nf_conntrack_put((*pskb)->nfct);
 		(*pskb)->nfct = NULL;
-		return NF_ACCEPT;
+		return -ret;
 	}
 
 	if (ret != NF_DROP && ct->helper) {



====== second one, for window tracking use only

diff -urN linux-2.4.23-pre8-wintrack-20031023/net/ipv4/netfilter/ip_conntrack_proto_tcp.c linux-2.4.23-pre8-wintrack-nf_repeat/net/ipv4/netfilter/ip_conntrack_proto_tcp.c
--- linux-2.4.23-pre8-wintrack-20031023/net/ipv4/netfilter/ip_conntrack_proto_tcp.c	Fri Oct 24 09:13:33 2003
+++ linux-2.4.23-pre8-wintrack-nf_repeat/net/ipv4/netfilter/ip_conntrack_proto_tcp.c	Fri Oct 24 09:15:30 2003
@@ -736,7 +736,7 @@
 	    	WRITE_UNLOCK(&tcp_lock);
 	    	if (del_timer(&conntrack->timeout))
 	    		conntrack->timeout.function((unsigned long)conntrack);
-	    	return NF_REPEAT;
+	    	return -NF_REPEAT;
 	} else if (!(new_state == TCP_CONNTRACK_MAX
 		   || tcp_in_window(&conntrack->proto.tcp,
 		   		    dir, iph, len, tcph)))


* Re: [RFC][PATCH-2.4] fix NF_REPEAT (take 2)
  2003-10-25 12:35         ` [RFC][PATCH-2.4] fix NF_REPEAT (take 2) Willy TARREAU
@ 2003-10-27  9:20           ` Jozsef Kadlecsik
  0 siblings, 0 replies; 14+ messages in thread
From: Jozsef Kadlecsik @ 2003-10-27  9:20 UTC (permalink / raw)
  To: Willy TARREAU
  Cc: Harald Welte, Netfilter Development Mailinglist, marcelo.tosatti

Hi Willy,

On Sat, 25 Oct 2003, Willy TARREAU wrote:

> here are two patches to fix the NF_REPEAT bug. The first one is based on
> your suggestion (your patch was for 2.6), but against ip_conntrack_core.c
> in 2.4, which is the same in the standard kernel and in tcp_window_tracking.
> The second one is only targeted at a window_tracking enabled kernel.
>
> I have verified that they fix the reported problem. Could you please check
> them so that Marcelo can merge them before 2.4.23-final?

The patches are all right. I'll create p-o-m patches from them and then
Harald can submit the first part for kernel inclusion. (I'd like a TCP
window tracking patch completed with the other pending fixes issued at the
same time.)

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* RE: [PATCH-2.4-POM] NF_REPEAT was ignored !
@ 2003-10-22 12:00 Gaby Schilders
  0 siblings, 0 replies; 14+ messages in thread
From: Gaby Schilders @ 2003-10-22 12:00 UTC (permalink / raw)
  To: netfilter list

What I meant to say is that tcp-window-tracking was patched two weeks ago,
which is more recent than the p-o-m 2003-09-24 that your patch is against.

I didn't look at the addressing of your post, my bad.

Gaby

-----Original Message-----
From: Willy Tarreau [mailto:willy@w.ods.org]
Sent: woensdag 22 oktober 2003 13:55
To: Gaby Schilders
Cc: netfilter list
Subject: Re: [PATCH-2.4-POM] NF_REPEAT was ignored !


> There are quite recent changes to the tcp-window-tracking patches.

At least not since the last one I sent, it seems :-)

> Also, I was wondering if this might not be better suited for the developer
> list?

indeed, I replied to a previous mail, hence this cross-posting. The nfdev
list was also cc'd.

Cheers,
Willy




* Re: [PATCH-2.4-POM] NF_REPEAT was ignored !
  2003-10-22 10:42 [PATCH-2.4-POM] NF_REPEAT was ignored ! Gaby Schilders
@ 2003-10-22 11:54 ` Willy Tarreau
  0 siblings, 0 replies; 14+ messages in thread
From: Willy Tarreau @ 2003-10-22 11:54 UTC (permalink / raw)
  To: Gaby Schilders; +Cc: netfilter list

> There are quite recent changes to the tcp-window-tracking patches.

At least not since the last one I sent, it seems :-)

> Also, I was wondering if this might not be better suited for the developer
> list?

indeed, I replied to a previous mail, hence this cross-posting. The nfdev
list was also cc'd.

Cheers,
Willy




* RE: [PATCH-2.4-POM] NF_REPEAT was ignored !
@ 2003-10-22 10:42 Gaby Schilders
  2003-10-22 11:54 ` Willy Tarreau
  0 siblings, 1 reply; 14+ messages in thread
From: Gaby Schilders @ 2003-10-22 10:42 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netfilter list

Hi,

There are quite recent changes to the tcp-window-tracking patches. You can
look them up on the CVS interface to p-o-m at the netfilter site:
http://cvs.netfilter.org/netfilter/patch-o-matic/extra/tcp-window-tracking.patch
I haven't checked if they include a solution to your problem, but maybe you
can. (My knowledge of C is... rather limited ;)

Also, I was wondering if this might not be better suited for the developer
list?

Regards,

Gaby Schilders
IBFD network admin

-----Original Message-----
From: Willy Tarreau [mailto:willy@w.ods.org]
Sent: woensdag 22 oktober 2003 12:26
To: Jozsef Kadlecsik
Cc: Harald Welte; netfilter@lists.netfilter.org; Netfilter Development
Mailinglist
Subject: [PATCH-2.4-POM] NF_REPEAT was ignored !


Hi,

after updating the production firewalls to handle the CW->CL state, I saw the
rate of drops decrease, but not as much as I would have expected it to.

I captured lots of data (/p/n/ip_conntrack, logs, tcpdump) and discovered
another problem with tcp_window_tracking that I could easily reproduce in
a lab: if a client reused a port too early, then the SYN/ACK from the
server was dropped, and the client could only connect after the next SYN
retransmit. I simply checked it with nc -p 1234 server 80. The first one
succeeds immediately, the second one needs 3 seconds to establish. There
is a logical explanation for this:

The client completes a first connection to server:80 with spt=1234. A few
seconds later, he reuses the same port to initiate a new connection to the
server. The firewall still sees the connection in TIME_WAIT state, so its
state matrix switches it to SYN_SENT  (orig:sTW--(SY)-->sSS).

In ip_conntrack_proto_tcp.c:tcp_packet(), there is a test for this case. The
existing session is deleted and NF_REPEAT is returned so that the caller tries
again (here, ip_conntrack_core.c:ip_conntrack_in()). This one simply returns
the same code NF_REPEAT to its caller which will call it again (nf_iterate()).

The problem is that once ip_conntrack_in() is called again with the same pskb,
it already has its ->nfct filled, so ip_conntrack_in() immediately returns
NF_ACCEPT without doing any lookup. The result is that the SYN is passed to
the server, and the deleted session is not recreated. When the server replies
with a SYN/ACK, that packet has no matching session, so it is blocked by the firewall
rules. Then, 3 seconds later, the client retransmits its SYN, which reaches
the firewall without any matching session, and correctly initiates a new one.

The solution is to correctly clear the ->nfct field in ip_conntrack_in() if
we return NF_REPEAT. This is what the following patch does. It's to be applied
to 2.4+POM-20030912, but I'm confident it may be easily applied and/or ported
to later versions.
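
The sequence described above can be reproduced with a small user-space model. This is a rough sketch under simplifying assumptions, not the kernel code: the conntrack table holds a single session, and `struct session`, `struct pkt`, `conntrack_in()` and `syn_creates_session()` are hypothetical names. The `clear_on_repeat` parameter switches between the behaviour before the fix (0) and after it (1).

```c
#include <assert.h>
#include <stddef.h>

/* Verdict values as defined in <linux/netfilter.h>. */
enum { NF_ACCEPT = 1, NF_REPEAT = 4 };

/* Hypothetical one-slot "conntrack table". */
struct session {
    int time_wait;              /* 1 if this is the old TIME_WAIT session */
};

/* Hypothetical stand-in for struct sk_buff. */
struct pkt {
    struct session *nfct;       /* attached session, NULL if none */
};

static struct session slot;
static struct session *table;   /* NULL when no session exists */

/* Simplified model of ip_conntrack_in() handling a SYN. */
int conntrack_in(struct pkt *p, int clear_on_repeat)
{
    if (p->nfct)
        return NF_ACCEPT;       /* already has ->nfct: no lookup is done */

    if (!table) {               /* no matching session: create a new one */
        slot.time_wait = 0;
        table = &slot;
        p->nfct = table;
        return NF_ACCEPT;
    }

    p->nfct = table;
    if (table->time_wait) {     /* SYN on a TIME_WAIT session */
        table = NULL;           /* the old session is deleted */
        if (clear_on_repeat)
            p->nfct = NULL;     /* the fix: detach it from the packet too */
        return NF_REPEAT;
    }
    return NF_ACCEPT;
}

/* Returns 1 if a session exists once the SYN has passed the hook. */
int syn_creates_session(int clear_on_repeat)
{
    struct pkt syn = { NULL };

    slot.time_wait = 1;         /* leftover session from the first connection */
    table = &slot;

    /* Models nf_iterate() calling the hook again on NF_REPEAT. */
    while (conntrack_in(&syn, clear_on_repeat) == NF_REPEAT)
        ;
    return table != NULL;
}
```

In this model `syn_creates_session(0)` leaves the table empty, which corresponds to the SYN/ACK finding no matching session, while `syn_creates_session(1)` recreates the session on the repeated call.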

I've not checked yet if the mainline conntrack code is also affected, but this
could be possible.

Regards,
Willy


--- ./net/ipv4/netfilter/ip_conntrack_core.c.orig	Tue Oct 21 14:21:08 2003
+++ ./net/ipv4/netfilter/ip_conntrack_core.c	Tue Oct 21 16:14:53 2003
@@ -856,6 +861,14 @@
 	IP_NF_ASSERT((*pskb)->nfct);
 
 	ret = proto->packet(ct, (*pskb)->nh.iph, (*pskb)->len, ctinfo);
+
+	if (ret == NF_REPEAT) {
+		/* we must loop here again */
+		nf_conntrack_put((*pskb)->nfct);
+		(*pskb)->nfct = NULL;
+		return ret;
+	}
+
 	if (ret == -1) {
 		/* Invalid */
 		nf_conntrack_put((*pskb)->nfct);






Thread overview: 14+ messages
2003-10-07 22:18 [PATCH-2.4-POM] sessions stuck in CLOSE_WAIT state Willy Tarreau
2003-10-08  9:44 ` Harald Welte
2003-10-09  7:01   ` Jozsef Kadlecsik
2003-10-09  7:01     ` Jozsef Kadlecsik
2003-10-22 10:25     ` [PATCH-2.4-POM] NF_REPEAT was ignored ! Willy Tarreau
2003-10-23 16:03       ` [PATCH-2.4] " Willy Tarreau
2003-10-23 16:03         ` Willy Tarreau
2003-10-23 22:02       ` [PATCH-2.4-POM] " Jozsef Kadlecsik
2003-10-23 22:08         ` Willy Tarreau
2003-10-25 12:35         ` [RFC][PATCH-2.4] fix NF_REPEAT (take 2) Willy TARREAU
2003-10-27  9:20           ` Jozsef Kadlecsik
2003-10-22 10:42 [PATCH-2.4-POM] NF_REPEAT was ignored ! Gaby Schilders
2003-10-22 11:54 ` Willy Tarreau
2003-10-22 12:00 Gaby Schilders
