All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps
@ 2010-11-14  7:35 Zhang Le
  2010-11-14  8:52 ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Zhang Le @ 2010-11-14  7:35 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Zhang Le, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy

Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
req->ts_recent. In this case, theoretically the req should not be ignored.

But in fact, it could be ignored, if peer->tcp_ts is so small that the
difference between this two number is larger than 2 to the power of 31.

I understand that under this situation, timestamp does not make sense any more,
because it actually comes from difference machines. However, if anyone
ever need to do the same investigation which I have done, this will
save some time for him.

Signed-off-by: Zhang Le <r0bertz@gentoo.org>
---
 net/ipv4/tcp_ipv4.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 8f8527d..1eb4974 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 		    peer->v4daddr == saddr) {
 			inet_peer_refcheck(peer);
 			if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
-			    (s32)(peer->tcp_ts - req->ts_recent) >
-							TCP_PAWS_WINDOW) {
+			    ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
+			     peer->tcp_ts > req->ts_recent)) {
 				NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
 				goto drop_and_release;
 			}
-- 
1.7.3.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps
  2010-11-14  7:35 [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps Zhang Le
@ 2010-11-14  8:52 ` Eric Dumazet
  2010-11-14 15:00   ` Zhang Le
  2010-11-14 19:55   ` David Miller
  0 siblings, 2 replies; 5+ messages in thread
From: Eric Dumazet @ 2010-11-14  8:52 UTC (permalink / raw)
  To: Zhang Le
  Cc: netdev, linux-kernel, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy

Le dimanche 14 novembre 2010 à 15:35 +0800, Zhang Le a écrit :
> Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
> req->ts_recent. In this case, theoretically the req should not be ignored.
> 
> But in fact, it could be ignored, if peer->tcp_ts is so small that the
> difference between this two number is larger than 2 to the power of 31.
> 
> I understand that under this situation, timestamp does not make sense any more,
> because it actually comes from difference machines. However, if anyone
> ever need to do the same investigation which I have done, this will
> save some time for him.
> 
> Signed-off-by: Zhang Le <r0bertz@gentoo.org>
> ---
>  net/ipv4/tcp_ipv4.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 8f8527d..1eb4974 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>  		    peer->v4daddr == saddr) {
>  			inet_peer_refcheck(peer);
>  			if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
> -			    (s32)(peer->tcp_ts - req->ts_recent) >
> -							TCP_PAWS_WINDOW) {
> +			    ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
> +			     peer->tcp_ts > req->ts_recent)) {
>  				NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
>  				goto drop_and_release;
>  			}

This seems very wrong to me.

Adding a : if (peer->tcp_ts > req->ts_recent) condition is _not_ going
to help. And it might break some working setups, because of wrap around.

Really, if you have multiple clients behind a common NAT, you cannot use
this code at all, since NAT doesnt usually change TCP timestamps.

What about following patch instead ?

[PATCH] doc: extend tcp_tw_recycle documentation

tcp_tw_recycle should not be used on a server if there is a chance
clients are behind a same NAT. Document this fact before too many users
discover this too late.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 Documentation/networking/ip-sysctl.txt |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index c7165f4..406f0d5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER
 tcp_tw_recycle - BOOLEAN
 	Enable fast recycling TIME-WAIT sockets. Default value is 0.
 	It should not be changed without advice/request of technical
-	experts.
+	experts. If you set it to 1, make sure you dont miss connections
+	attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter).
+	In particular, this might break if several clients are behind
+	a common NAT device, since their TCP timestamp wont be changed
+	by the NAT. tcp_tw_recycle should be used with care, most
+	probably in private networks.
 
 tcp_tw_reuse - BOOLEAN
 	Allow to reuse TIME-WAIT sockets for new connections when it is



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps
  2010-11-14  8:52 ` Eric Dumazet
@ 2010-11-14 15:00   ` Zhang Le
  2010-11-14 19:55   ` David Miller
  1 sibling, 0 replies; 5+ messages in thread
From: Zhang Le @ 2010-11-14 15:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, linux-kernel, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy

[-- Attachment #1: Type: text/plain, Size: 3712 bytes --]

On 09:52 Sun 14 Nov     , Eric Dumazet wrote:
> Le dimanche 14 novembre 2010 à 15:35 +0800, Zhang Le a écrit :
> > Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
> > req->ts_recent. In this case, theoretically the req should not be ignored.
> > 
> > But in fact, it could be ignored, if peer->tcp_ts is so small that the
> > difference between this two number is larger than 2 to the power of 31.
> > 
> > I understand that under this situation, timestamp does not make sense any more,
> > because it actually comes from difference machines. However, if anyone
> > ever need to do the same investigation which I have done, this will
> > save some time for him.
> > 
> > Signed-off-by: Zhang Le <r0bertz@gentoo.org>
> > ---
> >  net/ipv4/tcp_ipv4.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index 8f8527d..1eb4974 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> >  		    peer->v4daddr == saddr) {
> >  			inet_peer_refcheck(peer);
> >  			if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
> > -			    (s32)(peer->tcp_ts - req->ts_recent) >
> > -							TCP_PAWS_WINDOW) {
> > +			    ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
> > +			     peer->tcp_ts > req->ts_recent)) {
> >  				NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
> >  				goto drop_and_release;
> >  			}
> 
> This seems very wrong to me.
> 
> Adding a : if (peer->tcp_ts > req->ts_recent) condition is _not_ going
> to help. And it might break some working setups, because of wrap around.

Yeah, you are right. And sorry for overlooking this.

I should have reviewed time_{before,after}'s implementation before posting this.

So it seems we can't do anything to improve this except to add some warning in
documentation. Maybe some comments in the code too.

> 
> Really, if you have multiple clients behind a common NAT, you cannot use
> this code at all, since NAT doesnt usually change TCP timestamps.
> 
> What about following patch instead ?
> 
> [PATCH] doc: extend tcp_tw_recycle documentation
> 
> tcp_tw_recycle should not be used on a server if there is a chance
> clients are behind a same NAT. Document this fact before too many users
> discover this too late.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  Documentation/networking/ip-sysctl.txt |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index c7165f4..406f0d5 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER
>  tcp_tw_recycle - BOOLEAN
>  	Enable fast recycling TIME-WAIT sockets. Default value is 0.
>  	It should not be changed without advice/request of technical
> -	experts.
> +	experts. If you set it to 1, make sure you dont miss connections
> +	attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter).
> +	In particular, this might break if several clients are behind
> +	a common NAT device, since their TCP timestamp wont be changed
> +	by the NAT. tcp_tw_recycle should be used with care, most
> +	probably in private networks.
>  
>  tcp_tw_reuse - BOOLEAN
>  	Allow to reuse TIME-WAIT sockets for new connections when it is
> 
> 

-- 
Zhang, Le
Gentoo/Loongson Developer
http://zhangle.is-a-geek.org
0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps
  2010-11-14  8:52 ` Eric Dumazet
  2010-11-14 15:00   ` Zhang Le
@ 2010-11-14 19:55   ` David Miller
  2010-11-25 16:55     ` ZHANG, Le
  1 sibling, 1 reply; 5+ messages in thread
From: David Miller @ 2010-11-14 19:55 UTC (permalink / raw)
  To: eric.dumazet
  Cc: r0bertz, netdev, linux-kernel, kuznet, pekkas, jmorris, yoshfuji, kaber

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 14 Nov 2010 09:52:25 +0100

> Really, if you have multiple clients behind a common NAT, you cannot use
> this code at all, since NAT doesnt usually change TCP timestamps.

NAT is %100 incompatible with TW recycling, full stop.

There is no maybe, or maybe not.

If you are behind NAT you must not turn this feature on, ever.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps
  2010-11-14 19:55   ` David Miller
@ 2010-11-25 16:55     ` ZHANG, Le
  0 siblings, 0 replies; 5+ messages in thread
From: ZHANG, Le @ 2010-11-25 16:55 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, netdev, linux-kernel, kuznet, pekkas, jmorris,
	yoshfuji, kaber

[-- Attachment #1: Type: text/plain, Size: 953 bytes --]

On 11:55 Sun 14 Nov     , David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 14 Nov 2010 09:52:25 +0100
> 
> > Really, if you have multiple clients behind a common NAT, you cannot use
> > this code at all, since NAT doesnt usually change TCP timestamps.
> 
> NAT is %100 incompatible with TW recycling, full stop.
> 
> There is no maybe, or maybe not.
> 
> If you are behind NAT you must not turn this feature on, ever.

Sorry, this question may be OT on this list, but I am just curious:

Is there any other OS has implemented this feature like Linux?

To be very specific, by this feature, I mean rejecting old duplicates based
on per-host cache of last timestamp received from any connections.
As suggested in RFC1323 Appendix B.2 (b).

Does anyone, by any chance, know the answer? Thanks in advance!

-- 
ZHANG, Le
http://zhangle.is-a-geek.org
0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-11-25 16:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-14  7:35 [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps Zhang Le
2010-11-14  8:52 ` Eric Dumazet
2010-11-14 15:00   ` Zhang Le
2010-11-14 19:55   ` David Miller
2010-11-25 16:55     ` ZHANG, Le

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.