netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* TCP syn flood handling regression
@ 2012-03-10 12:27 Simon Kirby
  2012-03-10 17:44 ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Simon Kirby @ 2012-03-10 12:27 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

Hello!

A typical port 80 SYN flood started up to one of our clusters, but this
time, it didn't work so well. Legitimate connections and trying to fetch
server-status via localhost would hang for ~30 seconds before responding,
even though though the box had plenty of spare cycles. An strace of all
Apache processes showed quite a bit of sleeping in accept4().

This was with 3.2.9, so I went back in kernel builds and found that 3.1
and 3.0 were also broken, while 2.6.39 works as I remember -- when syn
cookies are enabled, everything just works and is fast. The DoS kept up,
so I was able to feed a bit to a node to do some bisection.

Of course, the DoS stopped literally seconds before the last bisection
test, but I got it down to:

# good: [0e734419923bd8e599858f8fc196c7804bb85564] ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.
# bad: [ea4fc0d6193ff56fcef39b0d2210d402a7acb5f0] ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit().

...leaving ea4fc0d6193ff56fcef39b0d2210d402a7acb5f0 and
d9d8da805dcb503ef8ee49918a94d49085060f23 as culprits.

I've stared at them but can't see what could be doing this.

Simon-

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: TCP syn flood handling regression
  2012-03-10 12:27 TCP syn flood handling regression Simon Kirby
@ 2012-03-10 17:44 ` Eric Dumazet
  2012-03-10 17:56   ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2012-03-10 17:44 UTC (permalink / raw)
  To: Simon Kirby; +Cc: David S. Miller, netdev

Le samedi 10 mars 2012 à 04:27 -0800, Simon Kirby a écrit :
> Hello!
> 
> A typical port 80 SYN flood started up to one of our clusters, but this
> time, it didn't work so well. Legitimate connections and trying to fetch
> server-status via localhost would hang for ~30 seconds before responding,
> even though though the box had plenty of spare cycles. An strace of all
> Apache processes showed quite a bit of sleeping in accept4().
> 
> This was with 3.2.9, so I went back in kernel builds and found that 3.1
> and 3.0 were also broken, while 2.6.39 works as I remember -- when syn
> cookies are enabled, everything just works and is fast. The DoS kept up,
> so I was able to feed a bit to a node to do some bisection.
> 
> Of course, the DoS stopped literally seconds before the last bisection
> test, but I got it down to:
> 
> # good: [0e734419923bd8e599858f8fc196c7804bb85564] ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.
> # bad: [ea4fc0d6193ff56fcef39b0d2210d402a7acb5f0] ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit().
> 
> ...leaving ea4fc0d6193ff56fcef39b0d2210d402a7acb5f0 and
> d9d8da805dcb503ef8ee49918a94d49085060f23 as culprits.
> 
> I've stared at them but can't see what could be doing this.
> 

Hi Simon, thanks a lot for this report.

This sounds as a SYNCOOKIE side effect

I could reproduce this with a typical DOS on ssh port.

Yes, it seems first messages are not sent at all, only after several tcp
timeouts :

Here my sshd write() its first frame (23 bytes) right after the 3WHS,
but we can see first frame leaves the machine 17 seconds later :

17:42:49.958432 IP 10.170.237.2.37550 > 192.168.20.110.22: S 3066771530:3066771530(0) win 14600 <mss 1460,sackOK,timestamp 19981015 0,nop,wscale 6>
17:42:49.958438 IP 192.168.20.110.22 > 10.170.237.2.37550: S 881027529:881027529(0) ack 3066771531 win 14480 <mss 1460,sackOK,timestamp 1550353622 19981015,nop,wscale 4>
17:42:50.137163 IP 10.170.237.2.37550 > 192.168.20.110.22: . ack 1 win 229 <nop,nop,timestamp 19981060 1550353622>

17:43:07.557096 IP 192.168.20.110.22 > 10.170.237.2.37550: P 1:24(23) ack 1 win 913 <nop,nop,timestamp 1550371264 19981060>
17:43:07.757096 IP 10.170.237.2.37550 > 192.168.20.110.22: . ack 24 win 229 <nop,nop,timestamp 19985459 1550371264>
17:43:07.936756 IP 10.170.237.2.37550 > 192.168.20.110.22: P 1:40(39) ack 24 win 229 <nop,nop,timestamp 19985459 1550371264>
17:43:07.936769 IP 192.168.20.110.22 > 10.170.237.2.37550: . ack 40 win 913 <nop,nop,timestamp 1550371643 19985459>
17:43:07.937384 IP 192.168.20.110.22 > 10.170.237.2.37550: P 24:664(640) ack 40 win 913 <nop,nop,timestamp 1550371644 19985459>
17:43:08.156252 IP 10.170.237.2.37550 > 192.168.20.110.22: P 40:1184(1144) ack 24 win 229 <nop,nop,timestamp 19985554 1550371643>

To check that its indeed the tcp retransmits that trigger and eventually send the frame :

$ ss -emoi dst 10.170.237.2
State       Recv-Q Send-Q                            Local Address:Port                                Peer Address:Port   
ESTAB       0      23                        ::ffff:192.168.20.110:ssh                          ::ffff:10.170.237.2:37604    timer:(on,700ms,0) ino:11067765 sk:f3564780
	 mem:(r0,w2240,f1856,t0) ts sack bic wscale:6,4 rto:1209 rtt:403/201.5 cwnd:10 ssthresh:5 send 287.4Kbps rcv_space:14600
$ ss -emoi dst 10.170.237.2
State       Recv-Q Send-Q                            Local Address:Port                                Peer Address:Port   
ESTAB       0      23                        ::ffff:192.168.20.110:ssh                          ::ffff:10.170.237.2:37604    timer:(on,1.529ms,1) ino:11067765 sk:f3564780
	 mem:(r0,w2240,f1856,t0) ts sack bic wscale:6,4 rto:2418 rtt:403/201.5 ssthresh:5 send 57.5Kbps rcv_space:14600
$ ss -emoi dst 10.170.237.2
State       Recv-Q Send-Q                            Local Address:Port                                Peer Address:Port   
ESTAB       0      23                        ::ffff:192.168.20.110:ssh                          ::ffff:10.170.237.2:37604    timer:(on,4.005ms,2) ino:11067765 sk:f3564780
	 mem:(r0,w2240,f1856,t0) ts sack bic wscale:6,4 rto:4836 rtt:403/201.5 ssthresh:5 send 57.5Kbps rcv_space:14600
$ ss -emoi dst 10.170.237.2
State       Recv-Q Send-Q                            Local Address:Port                                Peer Address:Port   
ESTAB       0      23                        ::ffff:192.168.20.110:ssh                          ::ffff:10.170.237.2:37604    timer:(on,1.773ms,2) ino:11067765 sk:f3564780
	 mem:(r0,w2240,f1856,t0) ts sack bic wscale:6,4 rto:4836 rtt:403/201.5 ssthresh:5 send 57.5Kbps rcv_space:14600
$ ss -emoi dst 10.170.237.2
State       Recv-Q Send-Q                            Local Address:Port                                Peer Address:Port   
ESTAB       0      23                        ::ffff:192.168.20.110:ssh                          ::ffff:10.170.237.2:37604    timer:(on,8.574ms,3) ino:11067765 sk:f3564780
	 mem:(r0,w2240,f1856,t0) ts sack bic wscale:6,4 rto:9672 rtt:403/201.5 ssthresh:5 send 57.5Kbps rcv_space:14600
$ ss -emoi dst 10.170.237.2
State       Recv-Q Send-Q                            Local Address:Port                                Peer Address:Port   
ESTAB       0      0                         ::ffff:192.168.20.110:ssh                          ::ffff:10.170.237.2:37604    timer:(keepalive,48min,0) ino:11067765 sk:f3564780
	 mem:(r0,w0,f0,t0) ts sack bic wscale:6,4 rto:806 rtt:263.875/105.5 ato:40 cwnd:6 ssthresh:5 send 263.4Kbps rcv_rtt:181 rcv_space:14600

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: TCP syn flood handling regression
  2012-03-10 17:44 ` Eric Dumazet
@ 2012-03-10 17:56   ` Eric Dumazet
  2012-03-10 19:20     ` [PATCH] tcp: fix syncookie regression Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2012-03-10 17:56 UTC (permalink / raw)
  To: Simon Kirby; +Cc: David S. Miller, netdev

Le samedi 10 mars 2012 à 09:44 -0800, Eric Dumazet a écrit :
> Le samedi 10 mars 2012 à 04:27 -0800, Simon Kirby a écrit :
> > Hello!
> > 
> > A typical port 80 SYN flood started up to one of our clusters, but this
> > time, it didn't work so well. Legitimate connections and trying to fetch
> > server-status via localhost would hang for ~30 seconds before responding,
> > even though though the box had plenty of spare cycles. An strace of all
> > Apache processes showed quite a bit of sleeping in accept4().
> > 
> > This was with 3.2.9, so I went back in kernel builds and found that 3.1
> > and 3.0 were also broken, while 2.6.39 works as I remember -- when syn
> > cookies are enabled, everything just works and is fast. The DoS kept up,
> > so I was able to feed a bit to a node to do some bisection.
> > 
> > Of course, the DoS stopped literally seconds before the last bisection
> > test, but I got it down to:
> > 
> > # good: [0e734419923bd8e599858f8fc196c7804bb85564] ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.
> > # bad: [ea4fc0d6193ff56fcef39b0d2210d402a7acb5f0] ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit().
> > 
> > ...leaving ea4fc0d6193ff56fcef39b0d2210d402a7acb5f0 and
> > d9d8da805dcb503ef8ee49918a94d49085060f23 as culprits.
> > 
> > I've stared at them but can't see what could be doing this.
> > 
> 
> Hi Simon, thanks a lot for this report.
> 
> This sounds as a SYNCOOKIE side effect
> 

It seems we miss a rebuild_header() indeed in the case of a syncookie
initiated socket.

So inet->cork.fl.u.ip4 is not correctly setup and
ea4fc0d6193ff56fcef39b0d2210d402a7acb5f relied on this.

I'll send a patch once tested.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] tcp: fix syncookie regression
  2012-03-10 17:56   ` Eric Dumazet
@ 2012-03-10 19:20     ` Eric Dumazet
  2012-03-11 22:54       ` David Miller
                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Eric Dumazet @ 2012-03-10 19:20 UTC (permalink / raw)
  To: Simon Kirby; +Cc: David S. Miller, netdev

commit ea4fc0d619 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit())
added a serious regression on synflood handling.

Simon Kirby discovered a successful connection was delayed by 20 seconds
before being responsive.

In my tests, I discovered that xmit frames were lost, and needed ~4
retransmits and a socket dst rebuild before being really sent.

In case of syncookie initiated connection, we use a different path to
initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared.

As ip_queue_xmit() now depends on inet flow being setup, fix this by
copying the temp flowi4 we use in cookie_v4_check().

Reported-by: Simon Kirby <sim@netnation.com>
Bisected-by: Simon Kirby <sim@netnation.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/syncookies.c |   30 ++++++++++++++++--------------
 net/ipv4/tcp_ipv4.c   |   10 +++++++---
 2 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 51fdbb4..eab2a7f 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -278,6 +278,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 	struct rtable *rt;
 	__u8 rcv_wscale;
 	bool ecn_ok = false;
+	struct flowi4 fl4;
 
 	if (!sysctl_tcp_syncookies || !th->ack || th->rst)
 		goto out;
@@ -346,20 +347,16 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 	 * hasn't changed since we received the original syn, but I see
 	 * no easy way to do this.
 	 */
-	{
-		struct flowi4 fl4;
-
-		flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
-				   RT_SCOPE_UNIVERSE, IPPROTO_TCP,
-				   inet_sk_flowi_flags(sk),
-				   (opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
-				   ireq->loc_addr, th->source, th->dest);
-		security_req_classify_flow(req, flowi4_to_flowi(&fl4));
-		rt = ip_route_output_key(sock_net(sk), &fl4);
-		if (IS_ERR(rt)) {
-			reqsk_free(req);
-			goto out;
-		}
+	flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
+			   RT_SCOPE_UNIVERSE, IPPROTO_TCP,
+			   inet_sk_flowi_flags(sk),
+			   (opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
+			   ireq->loc_addr, th->source, th->dest);
+	security_req_classify_flow(req, flowi4_to_flowi(&fl4));
+	rt = ip_route_output_key(sock_net(sk), &fl4);
+	if (IS_ERR(rt)) {
+		reqsk_free(req);
+		goto out;
 	}
 
 	/* Try to redo what tcp_v4_send_synack did. */
@@ -373,5 +370,10 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 	ireq->rcv_wscale  = rcv_wscale;
 
 	ret = get_cookie_sock(sk, skb, req, &rt->dst);
+	/* ip_queue_xmit() depends on our flow being setup
+	 * Normal sockets get it right from inet_csk_route_child_sock()
+	 */
+	if (ret)
+		inet_sk(ret)->cork.fl.u.ip4 = fl4;
 out:	return ret;
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 94d683a..fd54c5f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1466,9 +1466,13 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 		inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
 	newinet->inet_id = newtp->write_seq ^ jiffies;
 
-	if (!dst && (dst = inet_csk_route_child_sock(sk, newsk, req)) == NULL)
-		goto put_and_exit;
-
+	if (!dst) {
+		dst = inet_csk_route_child_sock(sk, newsk, req);
+		if (!dst)
+			goto put_and_exit;
+	} else {
+		/* syncookie case : see end of cookie_v4_check() */
+	}
 	sk_setup_caps(newsk, dst);
 
 	tcp_mtup_init(newsk);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] tcp: fix syncookie regression
  2012-03-10 19:20     ` [PATCH] tcp: fix syncookie regression Eric Dumazet
@ 2012-03-11 22:54       ` David Miller
  2012-03-12 20:30       ` Simon Kirby
  2012-03-13  7:26       ` Simon Kirby
  2 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2012-03-11 22:54 UTC (permalink / raw)
  To: eric.dumazet; +Cc: sim, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 10 Mar 2012 11:20:21 -0800

> commit ea4fc0d619 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit())
> added a serious regression on synflood handling.
> 
> Simon Kirby discovered a successful connection was delayed by 20 seconds
> before being responsive.
> 
> In my tests, I discovered that xmit frames were lost, and needed ~4
> retransmits and a socket dst rebuild before being really sent.
> 
> In case of syncookie initiated connection, we use a different path to
> initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared.
> 
> As ip_queue_xmit() now depends on inet flow being setup, fix this by
> copying the temp flowi4 we use in cookie_v4_check().
> 
> Reported-by: Simon Kirby <sim@netnation.com>
> Bisected-by: Simon Kirby <sim@netnation.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Tested-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tcp: fix syncookie regression
  2012-03-10 19:20     ` [PATCH] tcp: fix syncookie regression Eric Dumazet
  2012-03-11 22:54       ` David Miller
@ 2012-03-12 20:30       ` Simon Kirby
  2012-03-13  7:26       ` Simon Kirby
  2 siblings, 0 replies; 14+ messages in thread
From: Simon Kirby @ 2012-03-12 20:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev

On Sat, Mar 10, 2012 at 11:20:21AM -0800, Eric Dumazet wrote:

> commit ea4fc0d619 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit())
> added a serious regression on synflood handling.
> 
> Simon Kirby discovered a successful connection was delayed by 20 seconds
> before being responsive.
> 
> In my tests, I discovered that xmit frames were lost, and needed ~4
> retransmits and a socket dst rebuild before being really sent.
> 
> In case of syncookie initiated connection, we use a different path to
> initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared.
> 
> As ip_queue_xmit() now depends on inet flow being setup, fix this by
> copying the temp flowi4 we use in cookie_v4_check().
> 
> Reported-by: Simon Kirby <sim@netnation.com>
> Bisected-by: Simon Kirby <sim@netnation.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/ipv4/syncookies.c |   30 ++++++++++++++++--------------
>  net/ipv4/tcp_ipv4.c   |   10 +++++++---
>  2 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index 51fdbb4..eab2a7f 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -278,6 +278,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
>  	struct rtable *rt;
>  	__u8 rcv_wscale;
>  	bool ecn_ok = false;
> +	struct flowi4 fl4;
>  
>  	if (!sysctl_tcp_syncookies || !th->ack || th->rst)
>  		goto out;
> @@ -346,20 +347,16 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
>  	 * hasn't changed since we received the original syn, but I see
>  	 * no easy way to do this.
>  	 */
> -	{
> -		struct flowi4 fl4;
> -
> -		flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
> -				   RT_SCOPE_UNIVERSE, IPPROTO_TCP,
> -				   inet_sk_flowi_flags(sk),
> -				   (opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
> -				   ireq->loc_addr, th->source, th->dest);
> -		security_req_classify_flow(req, flowi4_to_flowi(&fl4));
> -		rt = ip_route_output_key(sock_net(sk), &fl4);
> -		if (IS_ERR(rt)) {
> -			reqsk_free(req);
> -			goto out;
> -		}
> +	flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
> +			   RT_SCOPE_UNIVERSE, IPPROTO_TCP,
> +			   inet_sk_flowi_flags(sk),
> +			   (opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
> +			   ireq->loc_addr, th->source, th->dest);
> +	security_req_classify_flow(req, flowi4_to_flowi(&fl4));
> +	rt = ip_route_output_key(sock_net(sk), &fl4);
> +	if (IS_ERR(rt)) {
> +		reqsk_free(req);
> +		goto out;
>  	}
>  
>  	/* Try to redo what tcp_v4_send_synack did. */
> @@ -373,5 +370,10 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
>  	ireq->rcv_wscale  = rcv_wscale;
>  
>  	ret = get_cookie_sock(sk, skb, req, &rt->dst);
> +	/* ip_queue_xmit() depends on our flow being setup
> +	 * Normal sockets get it right from inet_csk_route_child_sock()
> +	 */
> +	if (ret)
> +		inet_sk(ret)->cork.fl.u.ip4 = fl4;
>  out:	return ret;
>  }
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 94d683a..fd54c5f 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1466,9 +1466,13 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
>  		inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
>  	newinet->inet_id = newtp->write_seq ^ jiffies;
>  
> -	if (!dst && (dst = inet_csk_route_child_sock(sk, newsk, req)) == NULL)
> -		goto put_and_exit;
> -
> +	if (!dst) {
> +		dst = inet_csk_route_child_sock(sk, newsk, req);
> +		if (!dst)
> +			goto put_and_exit;
> +	} else {
> +		/* syncookie case : see end of cookie_v4_check() */
> +	}
>  	sk_setup_caps(newsk, dst);
>  
>  	tcp_mtup_init(newsk);

Tested under real SYN flood on 3.3-rc7 and 3.2.9 -- works! Thanks!

Simon-

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tcp: fix syncookie regression
  2012-03-10 19:20     ` [PATCH] tcp: fix syncookie regression Eric Dumazet
  2012-03-11 22:54       ` David Miller
  2012-03-12 20:30       ` Simon Kirby
@ 2012-03-13  7:26       ` Simon Kirby
  2012-03-13 14:07         ` Dave Jones
  2012-03-13 16:44         ` Neal Cardwell
  2 siblings, 2 replies; 14+ messages in thread
From: Simon Kirby @ 2012-03-13  7:26 UTC (permalink / raw)
  To: Eric Dumazet, Neal Cardwell; +Cc: netdev

On Sat, Mar 10, 2012 at 11:20:21AM -0800, Eric Dumazet wrote:

> Reported-by: Simon Kirby <sim@netnation.com>
> Bisected-by: Simon Kirby <sim@netnation.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/ipv4/syncookies.c |   30 ++++++++++++++++--------------
>  net/ipv4/tcp_ipv4.c   |   10 +++++++---
>  2 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index 51fdbb4..eab2a7f 100644

While deploying this on top of 3.2.9, we hit what seems to be the bug
fixed by 4648dc97af9d496218a05353b0e442b3dfa6aaab in 3.3. I see 3.2.9 has
daef52bab1fd26e24e8e9578f8fb33ba1d0cb412, so maybe this is exposed in 3.2
now?

[ 1196.683749] WARNING: at net/ipv4/tcp_input.c:3439 tcp_ack+0x25a6/0x25f0()
[ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3058 tcp_ack+0x209b/0x25f0()
[ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3439 tcp_ack+0x25a6/0x25f0()
[ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3058 tcp_ack+0x209b/0x25f0()
[ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3439 tcp_ack+0x25a6/0x25f0()
[ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3058 tcp_ack+0x209b/0x25f0()
[ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3439 tcp_ack+0x25a6/0x25f0()
[ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3058 tcp_ack+0x209b/0x25f0()
[ 1196.716010] WARNING: at net/ipv4/tcp_input.c:3439 tcp_ack+0x25a6/0x25f0()
[ 1196.722262] WARNING: at net/ipv4/tcp_input.c:3058 tcp_ack+0x209b/0x25f0()
[ 1196.726111] WARNING: at net/ipv4/tcp_input.c:3439 tcp_ack+0x25a6/0x25f0()

net/ipv4/tcp_input.c:
     3438 #if FASTRETRANS_DEBUG > 0
---> 3439         WARN_ON((int)tp->sacked_out < 0);
     3440         WARN_ON((int)tp->lost_out < 0);
     3441         WARN_ON((int)tp->retrans_out < 0);
     3442         if (!tp->packets_out && tcp_is_sack(tp)) {
...
     3057         /* D. Check consistency of the current state. */
---> 3058         tcp_verify_left_out(tp);

Oops two seconds later, scrolled off console, didn't get written to disk
or remote syslog server, all we have is syslog-broadcasted Oops and Code
lines due to broken printk priorities:

Message from syslogd at Tue Mar 13 00:25:13 2012 ...
kernel: [ 1198.637303] Oops: 0000 [#1] SMP
...
kernel: [ 1198.640041] Code: f8 01 00 00 48 39 f9 49 89 ca 0f 84 cd 03 00 00 44 3b 67 40 79 28 e9 f7 04 00 00 66 90 4c 39 d1 0f 1f 44 00 00 0f 84 b2 03 00 00 <45> 3b 62 40 66 0f 1f 44 00 00 0f 88 a2 03 00 00 4c 89 d7 8b 87
kernel: [ 1198.640041] CR2: 0000000000000040
kernel: [ 1198.646208] Kernel panic - not syncing: Fatal exception in interrupt

(gdb) find /b 0, +1000000000, 0x45,0x3b,0x62,0x40,0x66,0x0f,0x1f,0x44,0x00,0x00
0x66594e <tcp_sacktag_write_queue+958>
(gdb) disass /m tcp_sacktag_write_queue
...
1709                    if (after(TCP_SKB_CB(skb)->end_seq, skip_to_seq))
   0x0000000000665933 <+931>:   cmp    0x40(%rdi),%r12d
   0x0000000000665937 <+935>:   jns    0x665961 <tcp_sacktag_write_queue+977>
   0x0000000000665939 <+937>:   jmpq   0x665e35 <tcp_sacktag_write_queue+2213>
   0x000000000066593e <+942>:   xchg   %ax,%ax
-> 0x000000000066594e <+958>:   cmp    0x40(%r10),%r12d
   0x0000000000665952 <+962>:   nopw   0x0(%rax,%rax,1)
   0x0000000000665958 <+968>:   js     0x665d00 <tcp_sacktag_write_queue+1904>
   0x000000000066595e <+974>:   mov    %r10,%rdi

This only happened on one of the 30 or so servers I had just deployed it to.

Simon-

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tcp: fix syncookie regression
  2012-03-13  7:26       ` Simon Kirby
@ 2012-03-13 14:07         ` Dave Jones
  2012-03-13 16:52           ` Neal Cardwell
  2012-03-13 16:44         ` Neal Cardwell
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Jones @ 2012-03-13 14:07 UTC (permalink / raw)
  To: Simon Kirby; +Cc: Eric Dumazet, Neal Cardwell, netdev

On Tue, Mar 13, 2012 at 12:26:44AM -0700, Simon Kirby wrote:

 > Oops two seconds later, scrolled off console, didn't get written to disk
 > or remote syslog server, all we have is syslog-broadcasted Oops and Code
 > lines due to broken printk priorities:

I posted two reports we've had against Fedora yesterday which look like this bug.

https://bugzilla.redhat.com/show_bug.cgi?id=801599
https://bugzilla.redhat.com/show_bug.cgi?id=802660

	Dave

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tcp: fix syncookie regression
  2012-03-13  7:26       ` Simon Kirby
  2012-03-13 14:07         ` Dave Jones
@ 2012-03-13 16:44         ` Neal Cardwell
  2012-03-13 17:37           ` Eric Dumazet
  1 sibling, 1 reply; 14+ messages in thread
From: Neal Cardwell @ 2012-03-13 16:44 UTC (permalink / raw)
  To: Simon Kirby; +Cc: Eric Dumazet, netdev

On Tue, Mar 13, 2012 at 3:26 AM, Simon Kirby <sim@netnation.com> wrote:
>
> While deploying this on top of 3.2.9, we hit what seems to be the bug
> fixed by 4648dc97af9d496218a05353b0e442b3dfa6aaab in 3.3. I see 3.2.9 has
> daef52bab1fd26e24e8e9578f8fb33ba1d0cb412, so maybe this is exposed in 3.2
> now?
...
> net/ipv4/tcp_input.c:
>     3438 #if FASTRETRANS_DEBUG > 0
> ---> 3439         WARN_ON((int)tp->sacked_out < 0);
>     3440         WARN_ON((int)tp->lost_out < 0);
>     3441         WARN_ON((int)tp->retrans_out < 0);
>     3442         if (!tp->packets_out && tcp_is_sack(tp)) {
> ...
>     3057         /* D. Check consistency of the current state. */
> ---> 3058         tcp_verify_left_out(tp);

Yes, exactly. 3.2.9 and 3.2.10 have
daef52bab1fd26e24e8e9578f8fb33ba1d0cb412 but not the
4648dc97af9d496218a05353b0e442b3dfa6aaab that fixes the resulting
issue with sacked_out going negative, and these lines you've flagged
are exactly the symptoms we'd expect because of that. The fix is
queued up for the -stable series already, so it should be in 3.2.11, I
presume. The fix is already in 3.3-rc7.

> Oops two seconds later, scrolled off console, didn't get written to disk
> or remote syslog server, all we have is syslog-broadcasted Oops and Code
> lines due to broken printk priorities:

As you can imagine, the oops is probably connected to sacked_out going
negative; others have reported an oops along with sacked_out going
negative. The fix should take care of that.

neal

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tcp: fix syncookie regression
  2012-03-13 14:07         ` Dave Jones
@ 2012-03-13 16:52           ` Neal Cardwell
  0 siblings, 0 replies; 14+ messages in thread
From: Neal Cardwell @ 2012-03-13 16:52 UTC (permalink / raw)
  To: Dave Jones; +Cc: Simon Kirby, Eric Dumazet, netdev

On Tue, Mar 13, 2012 at 10:07 AM, Dave Jones <davej@redhat.com> wrote:
> On Tue, Mar 13, 2012 at 12:26:44AM -0700, Simon Kirby wrote:
>
>  > Oops two seconds later, scrolled off console, didn't get written to disk
>  > or remote syslog server, all we have is syslog-broadcasted Oops and Code
>  > lines due to broken printk priorities:
>
> I posted two reports we've had against Fedora yesterday which look like this bug.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=801599
> https://bugzilla.redhat.com/show_bug.cgi?id=802660

Yes, these look like the same issue.

neal

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tcp: fix syncookie regression
  2012-03-13 16:44         ` Neal Cardwell
@ 2012-03-13 17:37           ` Eric Dumazet
  2012-03-13 22:59             ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2012-03-13 17:37 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: Simon Kirby, netdev

On Tue, 2012-03-13 at 12:44 -0400, Neal Cardwell wrote:

> Yes, exactly. 3.2.9 and 3.2.10 have
> daef52bab1fd26e24e8e9578f8fb33ba1d0cb412 but not the
> 4648dc97af9d496218a05353b0e442b3dfa6aaab that fixes the resulting
> issue with sacked_out going negative, and these lines you've flagged
> are exactly the symptoms we'd expect because of that. The fix is
> queued up for the -stable series already, so it should be in 3.2.11, I
> presume. The fix is already in 3.3-rc7.

3.2.12 more probably, as 3.2.11 is a single build fix

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tcp: fix syncookie regression
  2012-03-13 17:37           ` Eric Dumazet
@ 2012-03-13 22:59             ` David Miller
  2012-03-13 23:30               ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2012-03-13 22:59 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ncardwell, sim, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 13 Mar 2012 10:37:31 -0700

> On Tue, 2012-03-13 at 12:44 -0400, Neal Cardwell wrote:
> 
>> Yes, exactly. 3.2.9 and 3.2.10 have
>> daef52bab1fd26e24e8e9578f8fb33ba1d0cb412 but not the
>> 4648dc97af9d496218a05353b0e442b3dfa6aaab that fixes the resulting
>> issue with sacked_out going negative, and these lines you've flagged
>> are exactly the symptoms we'd expect because of that. The fix is
>> queued up for the -stable series already, so it should be in 3.2.11, I
>> presume. The fix is already in 3.3-rc7.
> 
> 3.2.12 more probably, as 3.2.11 is a single build fix

The syncookie regression fix isn't even pushed to Linus's tree yet,
and once it's there it still needs to cook for a few more days before
I'm willing to sent it to -stable.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tcp: fix syncookie regression
  2012-03-13 22:59             ` David Miller
@ 2012-03-13 23:30               ` Eric Dumazet
  2012-03-14  0:36                 ` Simon Kirby
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2012-03-13 23:30 UTC (permalink / raw)
  To: David Miller; +Cc: ncardwell, sim, netdev

On Tue, 2012-03-13 at 15:59 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 13 Mar 2012 10:37:31 -0700
> 
> > On Tue, 2012-03-13 at 12:44 -0400, Neal Cardwell wrote:
> > 
> >> Yes, exactly. 3.2.9 and 3.2.10 have
> >> daef52bab1fd26e24e8e9578f8fb33ba1d0cb412 but not the
> >> 4648dc97af9d496218a05353b0e442b3dfa6aaab that fixes the resulting
> >> issue with sacked_out going negative, and these lines you've flagged
> >> are exactly the symptoms we'd expect because of that. The fix is
> >> queued up for the -stable series already, so it should be in 3.2.11, I
> >> presume. The fix is already in 3.3-rc7.
> > 
> > 3.2.12 more probably, as 3.2.11 is a single build fix
> 
> The syncookie regression fix isn't even pushed to Linus's tree yet,
> and once it's there it still needs to cook for a few more days before
> I'm willing to sent it to -stable.

Yeah, discussion diverged from tcp syncookie to other stuff (tcp at
least ;) )

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tcp: fix syncookie regression
  2012-03-13 23:30               ` Eric Dumazet
@ 2012-03-14  0:36                 ` Simon Kirby
  0 siblings, 0 replies; 14+ messages in thread
From: Simon Kirby @ 2012-03-14  0:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, ncardwell, netdev

On Tue, Mar 13, 2012 at 04:30:57PM -0700, Eric Dumazet wrote:

> On Tue, 2012-03-13 at 15:59 -0700, David Miller wrote:
> 
> > The syncookie regression fix isn't even pushed to Linus's tree yet,
> > and once it's there it still needs to cook for a few more days before
> > I'm willing to sent it to -stable.
> 
> Yeah, discussion diverged from tcp syncookie to other stuff (tcp at
> least ;) )

Yes, sorry, at first I thought it might be related. Anyway, the syncookie
fix is cooking in production(tm) for us, and working well. Thanks,

Simon-

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-03-14  0:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-10 12:27 TCP syn flood handling regression Simon Kirby
2012-03-10 17:44 ` Eric Dumazet
2012-03-10 17:56   ` Eric Dumazet
2012-03-10 19:20     ` [PATCH] tcp: fix syncookie regression Eric Dumazet
2012-03-11 22:54       ` David Miller
2012-03-12 20:30       ` Simon Kirby
2012-03-13  7:26       ` Simon Kirby
2012-03-13 14:07         ` Dave Jones
2012-03-13 16:52           ` Neal Cardwell
2012-03-13 16:44         ` Neal Cardwell
2012-03-13 17:37           ` Eric Dumazet
2012-03-13 22:59             ` David Miller
2012-03-13 23:30               ` Eric Dumazet
2012-03-14  0:36                 ` Simon Kirby

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).