From: John Heffner <jheffner@psc.edu>
To: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Ion Badulescu <lists@limebrokerage.com>,
	"David S. Miller" <davem@davemloft.net>,
	linux-kernel@vger.kernel.org, linux-net@vger.kernel.org,
	netdev@vger.kernel.org, gautran@mrv.com
Subject: Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels
Date: Thu, 29 Sep 2005 12:04:28 -0400
Message-ID: <200509291204.29393.jheffner@psc.edu>
In-Reply-To: <20050929151729.GA2158@ms2.inr.ac.ru>

[-- Attachment #1: Type: text/plain, Size: 2317 bytes --]

On Thursday 29 September 2005 11:17 am, Alexey Kuznetsov wrote:
> Hello!
>
> > >Anyway, ignoring this puzzle, the following patch for 2.4 should help.
> > >
> > >
> > >--- net/ipv4/tcp_input.c.orig	2003-02-20 20:38:39.000000000 +0300
> > >+++ net/ipv4/tcp_input.c	2005-09-02 22:28:00.845952888 +0400
> > >@@ -343,8 +343,6 @@
> > >			app_win -= tp->ack.rcv_mss;
> > >		app_win = max(app_win, 2U*tp->advmss);
> > >
> > >-		if (!ofo_win)
> > >-			tp->window_clamp = min(tp->window_clamp, app_win);
> > >		tp->rcv_ssthresh = min(tp->window_clamp, 2U*tp->advmss);
> > >	}
> > >}
> >
> > I'm very happy to report that the above patch, applied to 2.6.12.6, seems
> > to have cured the TCP window problem we were experiencing.
>
> Good. I think the patch is to be applied to all mainstream kernels.

Has anyone looked at the patch I sent out on Sept 9?  It goes a few steps 
further, addressing some additional problems.  Original message below.

Thanks,
  -John

-----

This is a patch, posted for discussion, addressing some receive buffer growth
issues.  It is partly related to last week's thread "Possible BUG in IPv4 TCP
window handling...".

Specifically, it addresses a problematic interaction between rcvbuf
moderation (receiver autotuning) and rcv_ssthresh.  The problem occurs when
small packets are sent to a receiver with a larger MTU.  (A very common case I
see is a host with a 1500-byte MTU sending to a host with a 9k MTU; each small
segment typically still occupies a receive buffer sized for the large MTU, so
its truesize is much larger than its length.)  In such a case, the
rcv_ssthresh code targets a window size corresponding to filling up the
current rcvbuf, without taking into account that the new rcvbuf moderation may
later increase the rcvbuf size.

The first hunk makes __tcp_grow_window() use tcp_rmem[2], rather than the
current rcvbuf, as the size target for rcv_ssthresh.  The second changes
tcp_clamp_window() so that, when the socket overflows its memory bounds with
in-order data, it tries to grow rcvbuf (as it already did for out-of-order
data) instead of shrinking window_clamp.

These changes should help my mixed-MTU problem, and I think they should also
help the case from last week's thread.  (In both cases, though, you still need
tcp_rmem[2] to be set much larger than the TCP window.)  One open question is
whether this is too aggressive in trying to increase rcvbuf when the system is
under memory stress.

  -John


Signed-off-by: John Heffner <jheffner@psc.edu>

[-- Attachment #2: rcv_ssthresh.diff --]
[-- Type: text/x-diff, Size: 2005 bytes --]

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -233,7 +233,7 @@ static int __tcp_grow_window(const struc
 {
 	/* Optimize this! */
 	int truesize = tcp_win_from_space(skb->truesize)/2;
-	int window = tcp_full_space(sk)/2;
+	int window = tcp_win_from_space(sysctl_tcp_rmem[2])/2;
 
 	while (tp->rcv_ssthresh <= window) {
 		if (truesize <= skb->len)
@@ -326,39 +326,18 @@ static void tcp_init_buffer_space(struct
 static void tcp_clamp_window(struct sock *sk, struct tcp_sock *tp)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
-	struct sk_buff *skb;
-	unsigned int app_win = tp->rcv_nxt - tp->copied_seq;
-	int ofo_win = 0;
 
 	icsk->icsk_ack.quick = 0;
 
-	skb_queue_walk(&tp->out_of_order_queue, skb) {
-		ofo_win += skb->len;
+	if (sk->sk_rcvbuf < sysctl_tcp_rmem[2] &&
+	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
+	    !tcp_memory_pressure &&
+	    atomic_read(&tcp_memory_allocated) < sysctl_tcp_mem[0]) {
+		sk->sk_rcvbuf = min(atomic_read(&sk->sk_rmem_alloc),
+				    sysctl_tcp_rmem[2]);
 	}
-
-	/* If overcommit is due to out of order segments,
-	 * do not clamp window. Try to expand rcvbuf instead.
-	 */
-	if (ofo_win) {
-		if (sk->sk_rcvbuf < sysctl_tcp_rmem[2] &&
-		    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
-		    !tcp_memory_pressure &&
-		    atomic_read(&tcp_memory_allocated) < sysctl_tcp_mem[0])
-			sk->sk_rcvbuf = min(atomic_read(&sk->sk_rmem_alloc),
-					    sysctl_tcp_rmem[2]);
-	}
-	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) {
-		app_win += ofo_win;
-		if (atomic_read(&sk->sk_rmem_alloc) >= 2 * sk->sk_rcvbuf)
-			app_win >>= 1;
-		if (app_win > icsk->icsk_ack.rcv_mss)
-			app_win -= icsk->icsk_ack.rcv_mss;
-		app_win = max(app_win, 2U*tp->advmss);
-
-		if (!ofo_win)
-			tp->window_clamp = min(tp->window_clamp, app_win);
+	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
 		tp->rcv_ssthresh = min(tp->window_clamp, 2U*tp->advmss);
-	}
 }
 
 /* Receiver "autotuning" code.

Thread overview: 35+ messages
2005-09-01 22:30 Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels Ion Badulescu
2005-09-01 22:43 ` David S. Miller
2005-09-01 22:49   ` Jesper Juhl
2005-09-01 22:53     ` David S. Miller
2005-09-01 22:53   ` Ion Badulescu
2005-09-01 23:37     ` Jesper Juhl
2005-09-02  2:51     ` John Heffner
2005-09-02  6:28       ` David S. Miller
2005-09-02 14:05         ` lists
2005-09-02 14:10           ` John Heffner
2005-09-02 14:33             ` lists
2005-09-02 14:48               ` John Heffner
2005-09-02 15:43                 ` Ion Badulescu
2005-09-02 13:02       ` Guillaume Autran
2005-09-02 13:48         ` Ion Badulescu
2005-09-02 13:52         ` Alexey Kuznetsov
2005-09-02 14:11           ` John Heffner
2005-09-02 13:48       ` Alexey Kuznetsov
2005-09-02 14:16         ` John Heffner
2005-09-02 15:11           ` Alexey Kuznetsov
2005-09-02 18:36     ` Alexey Kuznetsov
2005-09-02 20:57       ` Ion Badulescu
2005-09-02 21:18         ` Alexey Kuznetsov
2005-09-02 23:09           ` Ion Badulescu
2005-09-28 16:31       ` Ion Badulescu
2005-09-29 15:17         ` Alexey Kuznetsov
2005-09-29 15:34           ` Guillaume Autran
2005-09-29 16:04           ` John Heffner [this message]
2005-09-29 18:16             ` David S. Miller
2005-09-30  0:29           ` David S. Miller
2005-09-02  4:51 ` Noritoshi Demizu
2005-09-02  5:20   ` Stephen Hemminger
2005-09-02  5:45     ` Noritoshi Demizu
2005-09-02  6:11       ` Noritoshi Demizu
2005-09-02 12:11         ` Ion Badulescu
