* PREROUTING DNAT *inconsistent* behavior
@ 2010-12-15  4:42 Alec Matusis
  2010-12-17 22:20 ` Pascal Hambourg
  0 siblings, 1 reply; 6+ messages in thread
From: Alec Matusis @ 2010-12-15  4:42 UTC
  To: netfilter

We are operating large TCP chat servers: 8 servers per machine, about 70,000
outbound pps per machine. On each machine, all servers are listening on port
5228, and each server is listening on its own IP address. All IP addresses
are assigned to the same physical WAN interface, with virtual interfaces
eth0:*. 
The clients connect to an IP address of the server on port 443, and we have
the following port-forwarding rule in the NAT table:
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -p tcp --dport 443 -j REDIRECT --to-port 5228
 

In the FILTER table, we have:
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]

-A INPUT -d x.x.x.x/22 -p tcp -m multiport --dports 443,5228 -j ACCEPT

When we look at tcpdump, we mostly see traffic between port 443 on the
servers and the clients' various IP addresses, as expected.
The problem is that tcpdump also shows some very odd, *rare* packets between
port 5228 on the server and the clients. This is NOT expected, since port 443
is forwarded to 5228 and replies should leave with source port 443. The rate
of this unexpected traffic is about 2 pps, or about 0.003% of the total number
of packets. Most of these packets (about 95% of them) go from the server to
the client, with NOTHING from the client to the server.
#tcpdump  -n -ieth0 'port 5228'
20:22:34.657672 IP server.ip.5228 > client1.ip.49892: P 3242847898:3242847907(9) ack 3767768131 win 5840
20:22:36.308379 IP server.ip.5228 > client2.ip.57065: P 3305194993:3305195001(8) ack 579435130 win 46 <nop,nop,timestamp 2680337205 794384040>
20:22:37.237683 IP server.ip.5228 > client3.34992: F 2841447925:2841447925(0) ack 691623366 win 5840
20:22:37.794555 IP server.ip.87.5228 > client5.52491: F 3958524831:3958524831(0) ack 1914557806 win 46
These look like some martian packets, as if the firewall port-forwarding
rule has been ignored for them.
 
Typically, when we take a client IP that is a target of these martian
packets (e.g. client1.ip), and do
#tcpdump  -n -ieth0 'host client1.ip'
We discover that this client also participates in the normal connection to
the server port 443:
20:28:25.622835 IP client1.ip.2646 > server.ip.443: . ack 2789704759 win 64664
20:28:25.622853 IP server.ip.443 > client1.ip.2646: P 1:116(115) ack 0 win 5840
20:28:26.414852 IP client1.ip.2646 > server.ip.443: . ack 116 win 64549
20:28:26.414868 IP server.ip.443 > client1.ip.2646: P 116:124(8) ack 0 win 5840
20:28:27.142808 IP client1.ip.2646 > server.ip.443: . ack 124 win 64541

The ephemeral port on the client for the normal connection is always
different from the ephemeral port that receives those martian packets.
We cannot reproduce this on staging or development machines, since these
odd packets appear only above a certain high overall packet rate.
Does this look like some kind of a race condition in netfilter, so that for
some outbound packets, the port-forwarding rules are ignored? 


This behavior appears on several different kernels between 2.6.18 and 2.6.32
and iptables between v1.3.6 and v1.4.4.



 



* Re: PREROUTING DNAT *inconsistent* behavior
  2010-12-15  4:42 PREROUTING DNAT *inconsistent* behavior Alec Matusis
@ 2010-12-17 22:20 ` Pascal Hambourg
  2010-12-18  0:01   ` Alec Matusis
  0 siblings, 1 reply; 6+ messages in thread
From: Pascal Hambourg @ 2010-12-17 22:20 UTC
  To: Alec Matusis; +Cc: netfilter

Hello,

Alec Matusis wrote:
> We are operating large TCP chat servers: 8 servers per machine, about 70,000
> outbound pps per machine. On each machine, all servers are listening on port
> 5228, and each server is listening on its own IP address. All IP addresses
> are assigned to the same physical WAN interface, with virtual interfaces
> eth0:*. 

Note: eth0:* are not virtual interfaces; they are just IP aliases.

> The clients connect to an IP address of the server on port 443, and we have
> the following port-forwarding rule in the NAT table:
> *nat
> :PREROUTING ACCEPT [0:0]
> :POSTROUTING ACCEPT [0:0]
> :OUTPUT ACCEPT [0:0]
> -A PREROUTING -p tcp --dport 443 -j REDIRECT --to-port 5228

Hmm, I wouldn't have used REDIRECT if you want to preserve the
destination address. man iptables states: "It redirects the packet to
the machine itself by changing the destination IP to the primary address
of the incoming interface." I would have used DNAT instead to make sure
the destination address is not changed.

> The problem is that tcpdump also shows some very odd, *rare* packets between
> port 5228 on the server and the clients. This is NOT expected, since port 443
> is forwarded to 5228 and replies should leave with source port 443. The rate
> of this unexpected traffic is about 2 pps, or about 0.003% of the total number
> of packets. Most of these packets (about 95% of them) go from the server to
> the client, with NOTHING from the client to the server.

What are the other 5% then?

> #tcpdump  -n -ieth0 'port 5228'
> 20:22:34.657672 IP server.ip.5228 > client1.ip.49892: P
> 3242847898:3242847907(9) ack 3767768131 win 5840
> 20:22:36.308379 IP server.ip.5228 > client2.ip.57065: P
> 3305194993:3305195001(8) ack 579435130 win 46 <nop,nop,timestamp 2680337205
> 794384040>
> 20:22:37.237683 IP server.ip.5228 > client3.34992: F
> 2841447925:2841447925(0) ack 691623366 win 5840
> 20:22:37.794555 IP server.ip.87.5228 > client5.52491: F
> 3958524831:3958524831(0) ack 1914557806 win 46
> These look like some martian packets, as if the firewall port-forwarding
> rule has been ignored for them.

They are probably packets classified in the INVALID state by the
connection tracking, which are ignored by the nat table. In a NAT setup,
INVALID packets should be dropped because of this. Now the real question
is: why are they classified in the INVALID state?


* RE: PREROUTING DNAT *inconsistent* behavior
  2010-12-17 22:20 ` Pascal Hambourg
@ 2010-12-18  0:01   ` Alec Matusis
  2010-12-18  0:15     ` Pascal Hambourg
  0 siblings, 1 reply; 6+ messages in thread
From: Alec Matusis @ 2010-12-18  0:01 UTC
  To: 'Pascal Hambourg'; +Cc: netfilter

>I would have used DNAT instead to make sure
> the destination address is not changed.

Instead of REDIRECT, we used:
-A PREROUTING -d server.ip -p tcp --dport 443 -j DNAT --to-destination server.ip:5228
The result is exactly the same.

> What are the other 5% then ?

They are mostly RST packets from various clients:

root@serv6:~# tcpdump -nn -s2048 -A -ieth0 'port 5228 and dst host server.ip'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 2048 bytes
15:27:50.368876 IP client.ip1.10548 > server.ip.5228: R 3729545609:3729545609(0) win 0
E..(A......HM.~.H..V)4.l.LY.....P...<.........
15:28:54.077335 IP client.ip2.10566 > server.ip.5228: R 1354979512:1354979512(0) win 0
E..(A......=M.~.H..V)F.lP.X.....P.............
15:29:42.109229 IP client.ip3.12477 > server.ip.5228: R 1932917654:1932917654(0) win 0
E..(.|@.0....kT.H..V0..ls5......P....h........
15:32:04.729505 IP client.ip4.52554 > server.ip.5228: R 563692688:563692688(0) win 0
E..(..@.,..._...H..V.J.l!.D.....P.............
15:35:30.660292 IP client.ip5.3702 > server.ip.5228: R 3115365672:3115365672(0) win 0
E..(~_@....K.K(.H..V.v.l...(....P.............
15:39:34.739543 IP client.ip6.50657 > server.ip.5228: R 3859022157:3859022157(0) win 0
E..(Iu.....a[h%.H..V...l...M????P.............
15:41:33.761420 IP client.ip7.35088 > server.ip.5228: R 3766839986:3766839986(0) win 0
E..(J.@./.....vBH..V...l..j.....P...V.........


> They are probably packets classified in the INVALID state by the
> connection tracking, which are ignored by the nat table. In a NAT setup,
> INVALID packets should be dropped because of this. Now the real question
> is: why are they classified in the INVALID state?

How can I verify that these packets have been classified as being in the
INVALID state? That may be the key to this problem.






* Re: PREROUTING DNAT *inconsistent* behavior
  2010-12-18  0:01   ` Alec Matusis
@ 2010-12-18  0:15     ` Pascal Hambourg
  2010-12-18  1:55       ` Alec Matusis
  0 siblings, 1 reply; 6+ messages in thread
From: Pascal Hambourg @ 2010-12-18  0:15 UTC
  To: Alec Matusis; +Cc: netfilter

Alec Matusis wrote:
>> I would have used DNAT instead to make sure
>> the destination address is not changed.
> 
> Instead of REDIRECT, we used: 
> -A PREROUTING -d server.ip -p tcp --dport 443 -j DNAT --to-destination server.ip:5228
> The result is exactly the same.

Do you mean that REDIRECT did not alter the destination address when it
was different from the primary address on eth0?

>> What are the other 5% then ?
> 
> They are mostly RST packets from various clients:

Sure, RSTs are sent in reply to the bogus packets from the servers.

>> They are probably packets classified in the INVALID state by the
>> connection tracking, which are ignored by the nat table. In a NAT setup,
>> INVALID packets should be dropped because of this. Now the real question
>> is: why are they classified in the INVALID state?
> 
> How can I verify that  these packets have been classified as in the INVALID
> state? That may be the key to this problem.

As I suggested, DROP packets in the INVALID state. If you don't see them
any more, you'll know.
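
A minimal sketch of that test, written in the same rule syntax used elsewhere
in this thread. The chain positions and the match on source port 5228 are
assumptions for illustration, not rules that anyone posted here:

```
# Assumption: discard locally generated TCP segments from the backend port
# that conntrack cannot associate with a known connection, so they never
# leave the machine with the untranslated source port.
-I OUTPUT 1 -p tcp --sport 5228 -m state --state INVALID -j DROP
# Assumption: discard inbound TCP segments in the INVALID state as well,
# ahead of the existing ACCEPT rule for ports 443 and 5228.
-I INPUT 1 -p tcp -m state --state INVALID -j DROP
```

If the packets from port 5228 disappear from tcpdump once such rules are
loaded, that would point to the INVALID classification as the cause. Swapping
DROP for LOG gives the same information without discarding anything.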


* RE: PREROUTING DNAT *inconsistent* behavior
  2010-12-18  0:15     ` Pascal Hambourg
@ 2010-12-18  1:55       ` Alec Matusis
  2010-12-20 21:05         ` Pascal Hambourg
  0 siblings, 1 reply; 6+ messages in thread
From: Alec Matusis @ 2010-12-18  1:55 UTC
  To: 'Pascal Hambourg'; +Cc: netfilter

> Do you mean that REDIRECT did not alter the destination address when it
> was different from the primary address on eth0 ?

I cannot confirm or deny this, since currently all our production servers
run with:
-A PREROUTING -d server.ip -p tcp --dport 443 -j DNAT --to-destination server.ip:5228
The REDIRECT rule is something we tried in the past, to see if these strange
packets from port 5228 would go away.

> As I suggested, DROP packets in the INVALID state. If you don't see
> them
> any more, you'll know.

I added the following logging rules:
-I OUTPUT 1 -p tcp --sport 5228 -m state --state INVALID -j LOG
and
-I INPUT 1 -p tcp -m state --state INVALID -j LOG

It turns out that every strange packet that we see in tcpdump going out from
port 5228, e.g.
17:34:05.147063 IP server.ip.5228 > client.ip.35263: F 65950323:65950323(0) ack 4249584466 win 5840
is in the INVALID state as you suggested, since that client IP is found in
the INVALID state output log with the same timestamp:
#grep client.ip /var/log/messages
Dec 17 17:32:22 serv6 kernel: [9021890.300104] IN= OUT=eth0 SRC=server.ip DST=client.ip LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=13916 DF PROTO=TCP SPT=5228 DPT=35263 WINDOW=5840 RES=0x00 ACK FIN URGP=0
Dec 17 17:32:30 serv6 kernel: [9021898.213417] IN= OUT=eth0 SRC=server.ip DST=client.ip LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=45133 DF PROTO=TCP SPT=5228 DPT=35312 WINDOW=5840 RES=0x00 ACK FIN URGP=0
Dec 17 17:33:41 serv6 kernel: [9021968.570562] IN= OUT=eth0 SRC=server.ip DST=client.ip LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=45134 DF PROTO=TCP SPT=5228 DPT=35312 WINDOW=5840 RES=0x00 ACK FIN URGP=0
Dec 17 17:34:05 serv6 kernel: [9021992.637769] IN= OUT=eth0 SRC=server.ip DST=client.ip LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=13917 DF PROTO=TCP SPT=5228 DPT=35263 WINDOW=5840 RES=0x00 ACK FIN URGP=0

What is strange, however, is that even though I am also logging all incoming
packets in the INVALID state, there are no such packets with this client.ip.
Does this suggest that the server responds to a *normal* packet from this
client.ip with a packet in the INVALID state? Is there any way to track down
the reason why these INVALID state packets are generated in the server?





* Re: PREROUTING DNAT *inconsistent* behavior
  2010-12-18  1:55       ` Alec Matusis
@ 2010-12-20 21:05         ` Pascal Hambourg
  0 siblings, 0 replies; 6+ messages in thread
From: Pascal Hambourg @ 2010-12-20 21:05 UTC
  To: Alec Matusis; +Cc: netfilter

Alec Matusis wrote:
>> Do you mean that REDIRECT did not alter the destination address when it
>> was different from the primary address on eth0 ?
> 
> I cannot confirm or deny this, since currently all our production servers
> run with:
> -A PREROUTING -d server.ip -p tcp --dport 443 -j DNAT --to-destination server.ip:5228
> The REDIRECT rule is something we tried in the past, to see if these strange
> packets from port 5228 would go away.

Ok. Note that you can skip the server address and use a single rule for
all the server addresses:

-A PREROUTING -p tcp --dport 443 -j DNAT --to-destination :5228

> It turns out, that every strange packet that we see in tcpdump, that goes
> out from port 5228, e.g.
> 17:34:05.147063 IP server.ip.5228 > client.ip.35263: F 65950323:65950323(0)
> ack 4249584466 win 5840
> is in the INVALID state as you suggested, since that client IP is found in
> the INVALID state output log, and has the same timestamp
[...]
> What is strange however, is that even though I am also logging all incoming
> packets in the INVALID state, there are no such packets with this client.ip.
> This suggests that the server responds to a *normal* packet from this
> client.ip with a packet in the INVALID state?

Maybe these packets belong to closed or lost TCP connections. You can
see that most of them have the FIN flag set. So the reason could be that
conntrack has forgotten about these connections.
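
One way to check that hypothesis might be to label the logged packets by TCP
flag, so that FIN segments can be counted separately from everything else.
This is only a sketch: the log prefixes and the rule positions are made up
for illustration, and it assumes the LOG rules already added to OUTPUT are
replaced by these two:

```
# Assumption: outbound INVALID segments from the backend port that carry FIN,
# i.e. teardown traffic for connections conntrack may no longer know about.
-I OUTPUT 1 -p tcp --sport 5228 --tcp-flags FIN FIN -m state --state INVALID -j LOG --log-prefix "INVALID-FIN: "
# Assumption: outbound INVALID segments with neither FIN nor RST set,
# e.g. data or plain ACK segments.
-I OUTPUT 2 -p tcp --sport 5228 --tcp-flags FIN,RST NONE -m state --state INVALID -j LOG --log-prefix "INVALID-OTHER: "
```

If nearly all log entries carry the first prefix, that would be consistent
with connections that conntrack has already expired or closed while the
server's TCP stack is still finishing them.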


