* [PATCH next] tcp: use zero-window when free_space is low
@ 2013-12-16 11:15 Florian Westphal
From: Florian Westphal @ 2013-12-16 11:15 UTC
  To: netdev; +Cc: Florian Westphal

Currently the kernel tries to announce a zero window when free_space
is below the current receiver mss estimate.

When a sender is transmitting small packets, the receiver might be
unable to shrink the receive window, because
a) we cannot withdraw an already-committed receive window, and
b) we have to round the current rwin up to a multiple of the wscale factor,
   else we would shrink the current window.

This causes the receive buffer to fill up until the rmem limit is hit.
When this happens, we start dropping packets.
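The effect of a) and b) can be illustrated with a small userspace model. This is a sketch, not kernel code: it assumes that tcp_select_window() applies the "never shrink" rule by advertising ALIGN(cur_win, 1 << rcv_wscale) whenever the newly computed window is smaller than the current one, and that the newly computed window is always 0 because nobody reads from the socket. The helper names (align_up, simulate_small_segments) are hypothetical.

```python
def align_up(value: int, wscale: int) -> int:
    """Round value up to a multiple of 1 << wscale (like the kernel's ALIGN())."""
    unit = 1 << wscale
    return (value + unit - 1) // unit * unit


def simulate_small_segments(seg_len: int, count: int, wscale: int,
                            start_win: int) -> tuple:
    """Return (offered window, bytes accepted) after up to `count` segments.

    Model assumptions (hypothetical, not kernel API): the receiver would like
    to announce a zero window (new_win == 0 on every ACK) but must not shrink
    the window it already offered, and the sender never exceeds that window.
    """
    offered = start_win
    accepted = 0
    for _ in range(count):
        if offered < seg_len:
            break  # sender has to wait for the window to open
        # The segment consumes part of the already-committed window.
        cur_win = offered - seg_len
        accepted += seg_len
        new_win = 0  # no free space: we would like to close the window
        if new_win < cur_win:
            # "Never shrink the offered window": round the rest back up.
            new_win = align_up(cur_win, wscale)
        offered = new_win
    return offered, accepted


print(simulate_small_segments(23, 1000, 7, 1024))   # (1024, 23000)
print(simulate_small_segments(1024, 5, 7, 1024))    # (0, 1024)
```

In this model, 23-byte segments with wscale 7 are all accepted and the offered window is rounded back up to 1024 bytes after each one, so the right edge keeps moving forward and the data piles up in the receive buffer. A single segment that consumes the whole window lets it drop to zero.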

As we cannot avoid the "current window is rounded up to a multiple of
the wscale factor" issue (we would violate a) above), at least try to keep
the receive buffer from growing towards the tcp_rmem[2] limit by moving
to a zero-window announcement as soon as free_space drops below 1/16 of
the currently allowed receive buffer maximum.  If tcp_rmem[2] is large,
this increases our chances of getting a zero-window announcement out in
time.
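The decision the patch implements can be modelled in a few lines. In this sketch allowed_space stands for tcp_full_space(sk), mss for icsk->icsk_ack.rcv_mss and wscale for tp->rx_opt.rcv_wscale; the function names and the 6 MiB buffer size are illustrative assumptions, not kernel API or measured values. Only the final test is modelled, not the enclosing free_space < (full_space >> 1) branch.

```python
def round_down(value: int, wscale: int) -> int:
    """Clear the low wscale bits, like round_down(value, 1 << wscale)."""
    return value & ~((1 << wscale) - 1)


def wants_zero_window(free_space: int, mss: int, allowed_space: int,
                      wscale: int, patched: bool = True) -> bool:
    """Hypothetical model of the zero-window test in __tcp_select_window()."""
    if not patched:
        # Old rule: close the window only below one receiver MSS.
        return free_space < mss
    # New rule: drop the part that wscale cannot express, then also close
    # the window once less than 1/16 of allowed_space is free.
    free_space = round_down(free_space, wscale)
    return free_space < mss or free_space < (allowed_space >> 4)


ALLOWED = 6 * 1024 * 1024  # illustrative tcp_full_space() value (6 MiB)
print(wants_zero_window(100000, 1460, ALLOWED, 7, patched=False))  # False
print(wants_zero_window(100000, 1460, ALLOWED, 7))                 # True
```

With about 100 KB free out of a 6 MiB buffer, the old test keeps the window open because 100000 >= 1460, while the new test closes it because 99968 (after rounding) is below 393216, i.e. 1/16 of the buffer.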

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Unfortunately I couldn't come up with something that avoids a
 magic ('allowed >> 4') value.  I chose >> 4 because it didn't limit
 throughput in my netcat tests with full-mss-sized segments in steady state.
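 To put the '>> 4' value into numbers: the new 1/16 test only becomes stricter than the existing mss test once allowed_space >> 4 exceeds mss. The sketch below computes that crossover and the resulting threshold for a few buffer sizes; the mss of 1460 and the buffer sizes are illustrative assumptions, and the function name is hypothetical.

```python
def crossover_allowed_space(mss: int) -> int:
    """Smallest allowed_space for which (allowed_space >> 4) > mss.

    (a >> 4) > mss  <=>  a // 16 >= mss + 1  <=>  a >= 16 * (mss + 1)
    """
    return 16 * (mss + 1)


print(crossover_allowed_space(1460))  # 23376

# Free space below which the 1/16 rule announces a zero window,
# for a few illustrative buffer sizes.
for allowed in (64 * 1024, 1024 * 1024, 6 * 1024 * 1024):
    print(allowed, allowed >> 4)
```

 Below roughly 23 KB of allowed space the mss test still decides on its own; above it, the 1/16 rule takes over.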

 net/ipv4/tcp_output.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2a69f42..fd8d821 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2145,7 +2145,8 @@ u32 __tcp_select_window(struct sock *sk)
 	 */
 	int mss = icsk->icsk_ack.rcv_mss;
 	int free_space = tcp_space(sk);
-	int full_space = min_t(int, tp->window_clamp, tcp_full_space(sk));
+	int allowed_space = tcp_full_space(sk);
+	int full_space = min_t(int, tp->window_clamp, allowed_space);
 	int window;
 
 	if (mss > full_space)
@@ -2158,7 +2159,19 @@ u32 __tcp_select_window(struct sock *sk)
 			tp->rcv_ssthresh = min(tp->rcv_ssthresh,
 					       4U * tp->advmss);
 
-		if (free_space < mss)
+		/* free_space might become our new window, make sure we don't
+		 * increase it due to wscale.
+		 */
+		free_space = round_down(free_space, 1 << tp->rx_opt.rcv_wscale);
+
+		/* if free space is less than mss estimate, or is below 1/16th
+		 * of the maximum allowed, try to move to zero-window, else
+		 * tcp_clamp_window() will grow rcv buf up to tcp_rmem[2], and
+		 * new incoming data is dropped due to memory limits.
+		 * With large window, mss test triggers way too late in order
+		 * to announce zero window in time before rmem limit kicks in.
+		 */
+		if (free_space < mss || free_space < (allowed_space >> 4))
 			return 0;
 	}
 
-- 
1.8.1.5
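The added round_down() matters because of what happens to free_space later in __tcp_select_window(). Assuming, as in the code around this hunk in kernels of this era, that the window-scaling branch rounds the window up to the next multiple of 1 << rcv_wscale, a minimal sketch of the before/after behaviour follows (helper names are hypothetical).

```python
def scale_up(window: int, wscale: int) -> int:
    """Round window up to the next multiple of 1 << wscale (assumed to match
    the window-scaling branch of __tcp_select_window())."""
    unit = 1 << wscale
    if window % unit:
        window = (window // unit + 1) * unit
    return window


def window_before_and_after(free_space: int, wscale: int) -> tuple:
    """Return (window without the patch, window with the patch's round_down)."""
    old = scale_up(free_space, wscale)
    # Patched: free_space is rounded down first, so rounding up is a no-op.
    new = scale_up(free_space & ~((1 << wscale) - 1), wscale)
    return old, new


print(window_before_and_after(1000, 7))  # (1024, 896)
```

Without the patch, 1000 bytes of free space with wscale 7 would be advertised as 1024 bytes, more than is actually free; with the patch the advertised window is 896 bytes.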


* Re: [PATCH next] tcp: use zero-window when free_space is low
From: Eric Dumazet @ 2013-12-16 13:41 UTC
  To: Florian Westphal; +Cc: netdev

On Mon, 2013-12-16 at 12:15 +0100, Florian Westphal wrote:
> Currently the kernel tries to announce a zero window when free_space
> is below the current receiver mss estimate.
> 
> When a sender is transmitting small packets, the receiver might be
> unable to shrink the receive window, because
> a) we cannot withdraw already-committed receive window, and,
> b) we have to round the current rwin up to a multiple of the wscale factor,
>    else we would shrink the current window.
> 
> This causes the receive buffer to fill up until the rmem limit is hit.
> When this happens, we start dropping packets.

I do not really understand the issue.
Do you have a packetdrill test to demonstrate it?

Thanks !


* Re: [PATCH next] tcp: use zero-window when free_space is low
From: Florian Westphal @ 2013-12-16 15:51 UTC
  To: Eric Dumazet; +Cc: Florian Westphal, netdev

Eric Dumazet <eric.dumazet@gmail.com> wrote:

Hi Eric,

> On Mon, 2013-12-16 at 12:15 +0100, Florian Westphal wrote:
> > Currently the kernel tries to announce a zero window when free_space
> > is below the current receiver mss estimate.
> > 
> > When a sender is transmitting small packets, the receiver might be
> > unable to shrink the receive window, because
> > a) we cannot withdraw already-committed receive window, and,
> > b) we have to round the current rwin up to a multiple of the wscale factor,
> >    else we would shrink the current window.
> > 
> > This causes the receive buffer to fill up until the rmem limit is hit.
> > When this happens, we start dropping packets.
> 
> I do not really understand the issue.
> Do you have a packetdrill test to demonstrate it?

I am a moron and forgot to stress one crucial bit of information:
the receiver is a _slow_ reader (or a reader that doesn't read from the
socket at all!).

I am not very familiar with packetdrill, but a test would look something
like this:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.100...0.200 connect(3, ..., ...) = 0

0.100 > S 0:0(0) <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 7>
0.200 < S. 0:0(0) ack 1 win 32792 <mss 1460,sackOK,TS val 100 ecr 100,nop,wscale 7>
0.200 > . 1:1(0) ack 1 <nop,nop,TS val 100 ecr 100>

0.300 write(3, ..., 23) = 23
0.310 write(3, ..., 23) = 23
0.320 write(3, ..., 23) = 23
0.330 write(3, ..., 23) = 23
0.340 write(3, ..., 23) = 23
0.350 write(3, ..., 23) = 23
.. repeat indefinitely ..

Reproducer (non-packetdrill):

On server:
$ nc -l -p 12345
<suspend it: CTRL-Z>

Client:
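#!/usr/bin/env python

import socket
import time

sock = socket.socket()
# Disable Nagle so every send goes out as its own small segment.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
sock.connect(("192.168.4.1", 12345))
while True:
    # 23-byte writes every 5 ms; the suspended server never reads them.
    sock.sendall(b'A' * 23)
    time.sleep(0.005)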

The socket buffer on the server side will grow until tcp_rmem[2] is hit,
at which point the client retransmits data until it gets -ETIMEDOUT.

Code flow on server side is:
tcp_data_queue -> tcp_try_rmem_schedule -> \
	tcp_prune_queue -> tcp_clamp_window()

tcp_clamp_window() will then grow sk->sk_rcvbuf until it eventually
hits tcp_rmem[2].

Many thanks for looking into this Eric!
