* 2.4 delayed acks don't work, fixed
@ 2003-03-17 8:25 Andrea Arcangeli
2003-03-18 18:34 ` kuznet
0 siblings, 1 reply; 14+ messages in thread
From: Andrea Arcangeli @ 2003-03-17 8:25 UTC (permalink / raw)
To: linux-kernel; +Cc: David S. Miller, kuznet, Andi Kleen
Last week I installed ADSL, and over the weekend I was playing with a
streaming service. While watching tcpdump I noticed some huge breakage
in the delayed ack algorithm that generates an excessive number of acks
(around double what is really necessary). I suspect this
problem has never seen the light of day because it only silently generates
a significant global waste of internet bandwidth. This is a 2.4 tcp stack
getting data from a streamer with bulk transfers on a normal connection
with the stock 2.4 tcp stack:
01:34:24.061554 streamer.8300 > linux.53972: . 131401:132861(1460) ack 0 win 58400 (DF)
01:34:24.139741 linux.53972 > streamer.8300: . ack 132861 win 1460 (DF)
01:34:24.140554 streamer.8300 > linux.53972: . 132861:134321(1460) ack 0 win 58400 (DF)
01:34:24.143710 linux.53972 > streamer.8300: . ack 134321 win 1460 (DF)
01:34:24.223566 linux.53972 > streamer.8300: . ack 134321 win 2920 (DF)
01:34:24.241532 streamer.8300 > linux.53972: . 134321:135781(1460) ack 0 win 58400 (DF)
01:34:24.319737 linux.53972 > streamer.8300: . ack 135781 win 1460 (DF)
01:34:24.321529 streamer.8300 > linux.53972: . 135781:137241(1460) ack 0 win 58400 (DF)
01:34:24.323634 linux.53972 > streamer.8300: . ack 137241 win 1460 (DF)
01:34:24.421492 streamer.8300 > linux.53972: . 137241:138701(1460) ack 0 win 58400 (DF)
01:34:24.423541 linux.53972 > streamer.8300: . ack 138701 win 1460 (DF)
01:34:24.503581 linux.53972 > streamer.8300: . ack 138701 win 2920 (DF)
01:34:24.521555 streamer.8300 > linux.53972: . 138701:140161(1460) ack 0 win 58400 (DF)
01:34:24.599739 linux.53972 > streamer.8300: . ack 140161 win 1460 (DF)
01:34:24.601480 streamer.8300 > linux.53972: . 140161:141621(1460) ack 0 win 58400 (DF)
01:34:24.603663 linux.53972 > streamer.8300: . ack 141621 win 1460 (DF)
linux is the receiver of course (tcpdump is running on the linux box),
the sender is a streamer over the internet but it doesn't really matter,
it happens with any kind of transfer: after the first delack timer
triggers it keeps going like the above for the whole remaining part of
large downloads (i.e. for days until I reset the computer). This streamer
makes it more obvious because it waits some time before sending the next
packet (my bandwidth is now much higher than the one needed by the
player). These occasional waits trigger the delack timer and after that
the delack feature is completely disabled; it restarts for very few
packets once in a while when ack.quick is set to 0 but those rare
delayed acks are completely hidden by the above quickacks. Even worse,
linux keeps sending more than 1 ack every few received packets for
spurious, too-small window updates, so it's doing the exact opposite of
the delack feature. It looks very broken to me.
rfc1122 says (quote):
A TCP SHOULD implement a delayed ACK, but an ACK should not be
excessively delayed; in particular, the delay MUST be less than 0.5
seconds,
Apparently linux only waits 0.2 at max, this appears wrong too (but .2
would be more than enough for my testcase, when it's longer than .2 it's
because the streamer is intentionally delaying, so triggering the delack
is fine in such case).
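For reference, the tick-based constants touched by the diff further down translate to wall-clock time as sketched here. This is a user-space illustration only, assuming HZ = 100 (an assumption; the real value depends on architecture and configuration). `ticks_to_ms` and the `_STOCK`/`_PATCHED` suffixes are hypothetical names introduced for the example.

```c
#include <stdio.h>

/* Assumption: 100 timer ticks per second. */
#define HZ 100

/* Values as they appear in include/net/tcp.h in the diff. */
#define TCP_DELACK_MAX_STOCK   ((unsigned)(HZ/5))  /* before the patch */
#define TCP_DELACK_MAX_PATCHED ((unsigned)(HZ/2))  /* after the patch  */
#define TCP_DELACK_MIN         ((unsigned)(HZ/25))

/* Hypothetical helper: convert timer ticks to milliseconds. */
unsigned ticks_to_ms(unsigned ticks)
{
	return ticks * 1000u / HZ;
}

/* Print the bounds so they can be compared with the 0.5 s limit of RFC 1122. */
void print_delack_bounds(void)
{
	printf("TCP_DELACK_MIN           = %u ms\n", ticks_to_ms(TCP_DELACK_MIN));
	printf("TCP_DELACK_MAX (stock)   = %u ms\n", ticks_to_ms(TCP_DELACK_MAX_STOCK));
	printf("TCP_DELACK_MAX (patched) = %u ms\n", ticks_to_ms(TCP_DELACK_MAX_PATCHED));
}
```

Under that assumption, the stock and patched maxima sit at 200 ms and 500 ms, the latter right at the RFC 1122 bound quoted above.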
I had a look and I found various explanations for the bad behaviour:
1) the delayed ack timer destroys the ato value, resetting it to the min
value (40msec), and quickack mode is activated (pingpong = 0)
2) pingpong is never re-activated, so it takes the whole receive
window before the pingpong isn't significant anymore, then after the
first delack timer it will take another receive window before I
can see a new delayed ack
3) the ato averaging logic during packet reception will not inflate
the ato if "m > ato", which is obviously the case after a delack timer
has triggered and in turn after the ato has been deflated to its min value
4) the logic that bounds the delayed ack to srtt >> 3 also looks
risky; using the rto looks much safer to me there, to be sure
those delacks aren't going to trigger too early
5) I suspect the current delack algorithm can wait more than 2 packets;
the && must be a || after the (tp->rcv_nxt - tp->rcv_wup) >
tp->ack.rcv_mss check. Just try a netcat xxx chargen >/dev/null on a
100mbit link and see how many packets you need to receive before you
see an ack; this doesn't seem to happen with these
modifications applied
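Point 3 can be illustrated with a small user-space model of the two estimator branches from the tcp_input.c hunk of the diff. This is only a sketch: times are in milliseconds rather than jiffies, the 40 ms and 500 ms values assume HZ = 100, the function names are hypothetical, and the quickack side effects (tcp_incr_quickack, tcp_exit_quickack_mode, tcp_mem_reclaim) are not modelled.

```c
/* Assumed millisecond equivalents of TCP_ATO_MIN and the patched
 * TCP_DELACK_MAX (HZ/25 and HZ/2 with HZ = 100). */
#define ATO_MIN_MS     40
#define DELACK_MAX_MS 500

/* Stock 2.4 logic: m is the gap since the previous segment. */
unsigned ato_update_stock(unsigned ato, unsigned m, unsigned rto)
{
	if (m <= ATO_MIN_MS / 2) {
		ato = (ato >> 1) + ATO_MIN_MS / 2;
	} else if (m < ato) {
		ato = (ato >> 1) + m;
		if (ato > rto)
			ato = rto;
	}
	/* m > rto only touches the quickack counter in the kernel;
	 * ato <= m <= rto leaves ato unchanged. */
	return ato;
}

/* Patched logic: every sample up to the smaller of DELACK_MAX and rto
 * is averaged in. */
unsigned ato_update_patched(unsigned ato, unsigned m, unsigned rto)
{
	if (m <= ATO_MIN_MS / 2) {
		ato = (ato >> 1) + ATO_MIN_MS / 2;
	} else if (m > DELACK_MAX_MS || m > rto) {
		/* The kernel bumps the quickack counter here; ato is unchanged. */
	} else {
		ato = (ato >> 1) + m;
		if (ato > rto)
			ato = rto;
	}
	return ato;
}
```

With ato at its 40 ms minimum, a 100 ms gap and a 300 ms rto, the stock branch leaves ato at 40 while the patched one raises it to 120.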
Besides the above, there's also quite some ack overhead due to the window
updates triggered by userspace, so I made the check a little more
aggressive by sending an ack in recvmsg only if the potential rcv window has
increased to _more_ than 2 times the current outstanding rcv window
(not equal); this way the spurious updates rarely happen. I also
avoid updates if there's a delack timer pending and not blocked (this
last one looks like quite a natural idea; it may actually hurt but I doubt it,
and I would certainly be ok to drop that goto out in cleanup_rbuf if you
think it's going to be wrong on very high speed networks).
This is the (IMHO) much nicer behaviour for the same workload
as above with the modifications applied:
08:57:27.718987 streamer.8300 > linux.32792: . 26281:27741(1460) ack 0 win 58400 (DF)
08:57:27.747964 streamer.8300 > linux.32792: . 27741:29201(1460) ack 0 win 58400 (DF)
08:57:27.748017 linux.32792 > streamer.8300: . ack 29201 win 2920 (DF)
08:57:27.768949 streamer.8300 > linux.32792: . 29201:30661(1460) ack 0 win 58400 (DF)
08:57:27.848937 streamer.8300 > linux.32792: . 30661:32121(1460) ack 0 win 58400 (DF)
08:57:27.848986 linux.32792 > streamer.8300: . ack 32121 win 1460 (DF)
08:57:27.934286 linux.32792 > streamer.8300: . ack 32121 win 4380 (DF)
08:57:27.948918 streamer.8300 > linux.32792: . 32121:33581(1460) ack 0 win 58400 (DF)
08:57:28.038882 streamer.8300 > linux.32792: . 33581:35041(1460) ack 0 win 58400 (DF)
08:57:28.038931 linux.32792 > streamer.8300: . ack 35041 win 2920 (DF)
08:57:28.058882 streamer.8300 > linux.32792: . 35041:36501(1460) ack 0 win 58400 (DF)
08:57:28.138866 streamer.8300 > linux.32792: . 36501:37961(1460) ack 0 win 58400 (DF)
08:57:28.138919 linux.32792 > streamer.8300: . ack 37961 win 1460 (DF)
08:57:28.238912 streamer.8300 > linux.32792: . 37961:39421(1460) ack 0 win 58400 (DF)
08:57:28.394274 linux.32792 > streamer.8300: . ack 39421 win 4380 (DF)
08:57:28.488823 streamer.8300 > linux.32792: . 39421:40881(1460) ack 0 win 58400 (DF)
08:57:28.508800 streamer.8300 > linux.32792: . 40881:42341(1460) ack 0 win 58400 (DF)
08:57:28.508841 linux.32792 > streamer.8300: . ack 42341 win 2920 (DF)
08:57:28.538803 streamer.8300 > linux.32792: . 42341:43801(1460) ack 0 win 58400 (DF)
08:57:28.608829 streamer.8300 > linux.32792: . 43801:45261(1460) ack 0 win 58400 (DF)
08:57:28.608877 linux.32792 > streamer.8300: . ack 45261 win 1460 (DF)
08:57:28.708788 streamer.8300 > linux.32792: . 45261:46721(1460) ack 0 win 58400 (DF)
08:57:28.864277 linux.32792 > streamer.8300: . ack 46721 win 4380 (DF)
08:57:28.958765 streamer.8300 > linux.32792: . 46721:48181(1460) ack 0 win 58400 (DF)
08:57:28.988704 streamer.8300 > linux.32792: . 48181:49641(1460) ack 0 win 58400 (DF)
08:57:28.988759 linux.32792 > streamer.8300: . ack 49641 win 2920 (DF)
08:57:29.018705 streamer.8300 > linux.32792: . 49641:51101(1460) ack 0 win 58400 (DF)
08:57:29.098699 streamer.8300 > linux.32792: . 51101:52561(1460) ack 0 win 58400 (DF)
08:57:29.098749 linux.32792 > streamer.8300: . ack 52561 win 1460 (DF)
08:57:29.208694 streamer.8300 > linux.32792: . 52561:54021(1460) ack 0 win 58400 (DF)
08:57:29.380937 linux.32792 > streamer.8300: . ack 54021 win 4380 (DF)
08:57:29.478646 streamer.8300 > linux.32792: . 54021:55481(1460) ack 0 win 58400 (DF)
08:57:29.498614 streamer.8300 > linux.32792: . 55481:56941(1460) ack 0 win 58400 (DF)
08:57:29.498648 linux.32792 > streamer.8300: . ack 56941 win 4380 (DF)
08:57:29.518615 streamer.8300 > linux.32792: . 56941:58401(1460) ack 0 win 58400 (DF)
08:57:29.598632 streamer.8300 > linux.32792: . 58401:59861(1460) ack 0 win 58400 (DF)
08:57:29.598677 linux.32792 > streamer.8300: . ack 59861 win 2920 (DF)
08:57:29.618619 streamer.8300 > linux.32792: . 59861:61321(1460) ack 0 win 58400 (DF)
08:57:29.698591 streamer.8300 > linux.32792: . 61321:62781(1460) ack 0 win 58400 (DF)
08:57:29.698637 linux.32792 > streamer.8300: . ack 62781 win 1460 (DF)
Now my streaming services are generating 1/4 of the number of packets over
the internet compared to what the buggy logic in mainline does, obviously
w/o any possible change in performance, so I'm going to use it. It may
not be RFC compliant, but I doubt the current mainline code is RFC
compliant in the first place.
Here's the diff, comments welcome.
diff -urNp xx/include/net/tcp.h xxx/include/net/tcp.h
--- xx/include/net/tcp.h 2003-03-17 09:01:13.000000000 +0100
+++ xxx/include/net/tcp.h 2003-03-17 08:45:28.000000000 +0100
@@ -323,7 +323,7 @@ static __inline__ int tcp_sk_listen_hash
* TIME-WAIT timer.
*/
-#define TCP_DELACK_MAX ((unsigned)(HZ/5)) /* maximal time to delay before sending an ACK */
+#define TCP_DELACK_MAX ((unsigned)(HZ/2)) /* maximal time to delay before sending an ACK */
#if HZ >= 100
#define TCP_DELACK_MIN ((unsigned)(HZ/25)) /* minimal time to delay before sending an ACK */
#define TCP_ATO_MIN ((unsigned)(HZ/25))
diff -urNp xx/net/ipv4/tcp.c xxx/net/ipv4/tcp.c
--- xx/net/ipv4/tcp.c 2003-03-17 09:01:13.000000000 +0100
+++ xxx/net/ipv4/tcp.c 2003-03-17 08:10:23.000000000 +0100
@@ -1290,22 +1290,10 @@ void cleanup_rbuf(struct sock *sk, int c
#endif
if (tcp_ack_scheduled(tp)) {
- /* Delayed ACKs frequently hit locked sockets during bulk receive. */
- if (tp->ack.blocked
- /* Once-per-two-segments ACK was not sent by tcp_input.c */
- || tp->rcv_nxt - tp->rcv_wup > tp->ack.rcv_mss
- /*
- * If this read emptied read buffer, we send ACK, if
- * connection is not bidirectional, user drained
- * receive buffer and there was a small segment
- * in queue.
- */
- || (copied > 0 &&
- (tp->ack.pending&TCP_ACK_PUSHED) &&
- !tp->ack.pingpong &&
- atomic_read(&sk->rmem_alloc) == 0)) {
+ if (tp->ack.blocked)
+ /* Delayed ACKs frequently hit locked sockets during bulk receive. */
time_to_ack = 1;
- }
+ goto out;
}
/* We send an ACK if we can now advertise a non-zero window
@@ -1318,7 +1306,7 @@ void cleanup_rbuf(struct sock *sk, int c
__u32 rcv_window_now = tcp_receive_window(tp);
/* Optimize, __tcp_select_window() is not cheap. */
- if (2*rcv_window_now <= tp->window_clamp) {
+ if (2*rcv_window_now < tp->window_clamp) {
__u32 new_window = __tcp_select_window(sk);
/* Send ACK now, if this read freed lots of space
@@ -1326,10 +1314,11 @@ void cleanup_rbuf(struct sock *sk, int c
* We can advertise it now, if it is not less than current one.
* "Lots" means "at least twice" here.
*/
- if(new_window && new_window >= 2*rcv_window_now)
+ if(new_window && new_window > 2*rcv_window_now)
time_to_ack = 1;
}
}
+ out:
if (time_to_ack)
tcp_send_ack(sk);
}
diff -urNp xx/net/ipv4/tcp_input.c xxx/net/ipv4/tcp_input.c
--- xx/net/ipv4/tcp_input.c 2003-03-17 09:01:03.000000000 +0100
+++ xxx/net/ipv4/tcp_input.c 2003-03-17 08:36:15.000000000 +0100
@@ -173,6 +173,11 @@ void tcp_enter_quickack_mode(struct tcp_
tp->ack.ato = TCP_ATO_MIN;
}
+static inline void tcp_exit_quickack_mode(struct tcp_opt *tp)
+{
+ tp->ack.pingpong = 1;
+}
+
/* Send ACKs quickly, if "quick" count is not exhausted
* and the session is not interactive.
*/
@@ -381,16 +386,21 @@ static void tcp_event_data_recv(struct s
if (m <= TCP_ATO_MIN/2) {
/* The fastest case is the first. */
tp->ack.ato = (tp->ack.ato>>1) + TCP_ATO_MIN/2;
- } else if (m < tp->ack.ato) {
- tp->ack.ato = (tp->ack.ato>>1) + m;
- if (tp->ack.ato > tp->rto)
- tp->ack.ato = tp->rto;
- } else if (m > tp->rto) {
+ tcp_exit_quickack_mode(tp);
+ } else if (unlikely(m > TCP_DELACK_MAX)) {
+ /* Delayed acks are worthless on a very slow link. */
+ tcp_incr_quickack(tp);
+ } else if (unlikely(m > tp->rto)) {
/* Too long gap. Apparently sender falled to
* restart window, so that we send ACKs quickly.
*/
tcp_incr_quickack(tp);
tcp_mem_reclaim(sk);
+ } else {
+ tp->ack.ato = (tp->ack.ato>>1) + m;
+ if (tp->ack.ato > tp->rto)
+ tp->ack.ato = tp->rto;
+ tcp_exit_quickack_mode(tp);
}
}
tp->ack.lrcvtime = now;
@@ -3131,11 +3141,7 @@ static __inline__ void __tcp_ack_snd_che
struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp);
/* More than one full frame received... */
- if (((tp->rcv_nxt - tp->rcv_wup) > tp->ack.rcv_mss
- /* ... and right edge of window advances far enough.
- * (tcp_recvmsg() will send ACK otherwise). Or...
- */
- && __tcp_select_window(sk) >= tp->rcv_wnd) ||
+ if ((tp->rcv_nxt - tp->rcv_wup) > tp->ack.rcv_mss ||
/* We ACK each frame or... */
tcp_in_quickack_mode(tp) ||
/* We have out of order data. */
diff -urNp xx/net/ipv4/tcp_output.c xxx/net/ipv4/tcp_output.c
--- xx/net/ipv4/tcp_output.c 2003-03-15 01:25:19.000000000 +0100
+++ xxx/net/ipv4/tcp_output.c 2003-03-17 08:37:07.000000000 +0100
@@ -1257,19 +1257,13 @@ void tcp_send_delayed_ack(struct sock *s
unsigned long timeout;
if (ato > TCP_DELACK_MIN) {
- int max_ato = HZ/2;
-
- if (tp->ack.pingpong || (tp->ack.pending&TCP_ACK_PUSHED))
- max_ato = TCP_DELACK_MAX;
+ int max_ato = TCP_DELACK_MAX;
/* Slow path, intersegment interval is "high". */
- /* If some rtt estimate is known, use it to bound delayed ack.
- * Do not use tp->rto here, use results of rtt measurements
- * directly.
- */
- if (tp->srtt) {
- int rtt = max(tp->srtt>>3, TCP_DELACK_MIN);
+ /* If some rtt estimate is known, use it to bound delayed ack. */
+ if (tp->rto) {
+ int rtt = max(tp->rto, TCP_DELACK_MIN);
if (rtt < max_ato)
max_ato = rtt;
diff -urNp xx/net/ipv4/tcp_timer.c xxx/net/ipv4/tcp_timer.c
--- xx/net/ipv4/tcp_timer.c 2003-03-15 01:25:19.000000000 +0100
+++ xxx/net/ipv4/tcp_timer.c 2003-03-17 08:37:07.000000000 +0100
@@ -250,11 +250,10 @@ static void tcp_delack_timer(unsigned lo
/* Delayed ACK missed: inflate ATO. */
tp->ack.ato = min(tp->ack.ato << 1, tp->rto);
} else {
- /* Delayed ACK missed: leave pingpong mode and
- * deflate ATO.
+ /* Delayed ACK missed: leave pingpong mode
+ * but be ready to reenable delay acks fast.
*/
tp->ack.pingpong = 0;
- tp->ack.ato = TCP_ATO_MIN;
}
tcp_send_ack(sk);
NET_INC_STATS_BH(DelayedACKs);
Andrea
* Re: 2.4 delayed acks don't work, fixed
2003-03-17 8:25 2.4 delayed acks don't work, fixed Andrea Arcangeli
@ 2003-03-18 18:34 ` kuznet
2003-03-18 19:34 ` Andrea Arcangeli
0 siblings, 1 reply; 14+ messages in thread
From: kuznet @ 2003-03-18 18:34 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel, davem, ak
Hello!
> Apparently linux only waits 0.2 at max,
This is not true, the maximum is 0.5 in your case.
> 1) the delayed ack timer destroys the ato value, resetting it to the min
> value (40msec), and quickack mode is activated (pingpong = 0)
This is not true, the delack timer inflates ato. pingpong=0 is not quickack
mode, it means that the session is a unidirectional stream, which
is correct in your case.
> 2) pingpong is never re-activated,
It MUST NOT be. It is activated on transactional sessions only.
> 3) the ato averaging logic during packet reception will not inflate
> the ato if "m > ato", which is obviously the case after a delack timer
> has triggered and in turn after the ato has been deflated to its min value
When m > ato, the sample is invalid; apparently it is triggered by
a random delay at the sender. When the real ato increases, the increase
is made in the delack timer, not through the estimator.
> 4) the logic that bounds the delayed ack to srtt >> 3 also looks
> risky; using the rto looks much safer to me there, to be sure
> those delacks aren't going to trigger too early
It is necessary to provide more or less sane behaviour on an interactive
session when ato > 100msec. Clamping by rto just does not make any sense.
> 5) I suspect the current delack algorithm can wait more than 2 packets,
Yes, when the window is not opening, it is not required. A delack is sent
when the window is advanced.
In short, I still do not understand what kind of pathology happens in your
case (in particular, the difference in advertised window before and after
applying your patch is confusing _a_ _lot_; I really would like
to look at a larger tcpdump, covering the beginning of the session),
but all 5 items are surely wrong.
The unnumbered 6th one may be right; the heuristic with expansion by a factor
of two has no explanation, I think it can be relaxed even more.
Alexey
* Re: 2.4 delayed acks don't work, fixed
2003-03-18 18:34 ` kuznet
@ 2003-03-18 19:34 ` Andrea Arcangeli
2003-03-18 20:13 ` kuznet
0 siblings, 1 reply; 14+ messages in thread
From: Andrea Arcangeli @ 2003-03-18 19:34 UTC (permalink / raw)
To: kuznet; +Cc: linux-kernel, davem, ak
On Tue, Mar 18, 2003 at 09:34:50PM +0300, kuznet@ms2.inr.ac.ru wrote:
> Hello!
>
> > Apparently linux only waits 0.2 at max,
>
> This is not true, the maximum is 0.5 in your case.
what is the point of this:
#define TCP_DELACK_MAX ((unsigned)(HZ/5)) /* maximal time to delay before sending an ACK */
if the maximal time to delay before sending an ack is
0.5 and not 0.2?
> > 1) the delayed ack timer destroys the ato value, resetting it to the min
> > value (40msec), and quickack mode is activated (pingpong = 0)
>
> This is not true, the delack timer inflates ato. pingpong=0 is not quickack
> mode, it means that the session is a unidirectional stream, which
> is correct in your case.
pingpong is set to 0 only by "tcp_enter_quickack_mode" and later by:
/* RFC2581. 4.2. SHOULD send immediate ACK, when
^^^^^^^^^^^^^^^^^^^^^^^^^^
* gap in queue is filled.
*/
if (skb_queue_len(&tp->out_of_order_queue) == 0)
tp->ack.pingpong = 0;
and finally by the delack timer (if it was set to 1):
if (tcp_ack_scheduled(tp)) {
if (!tp->ack.pingpong) {
/* Delayed ACK missed: inflate ATO. */
tp->ack.ato = min(tp->ack.ato << 1, tp->rto);
} else {
/* Delayed ACK missed: leave pingpong mode and
* deflate ATO.
*/
tp->ack.pingpong = 0;
^^^^^^^^^^^^^^^^^^^^
tp->ack.ato = TCP_ATO_MIN;
}
tcp_enter_quickack_mode is called every time we have to disable delayed
acks like when we send duplicate acks or when there's packet reordering
or whatever similar error.
how can 'pingpong' relate to the direction of the stream? I see no
relation at all. That means quickack mode according to the code.
'pingpong' has no clue of what I'm sending, it only knows if I'm
receiving fine and in turn if I can delay the ack.
since it's never re-activated, the tcp stack assumes I'm not receiving
fine, right after the first error, and forever until I close the socket.
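The two timer branches quoted above can be modelled in user space as follows. This is a sketch under assumptions: times are in milliseconds, 40 ms stands in for TCP_ATO_MIN (HZ/25 with HZ = 100), and `struct ack_state` plus both function names are hypothetical. The "patched" variant reflects the tcp_timer.c hunk of the diff, which keeps ato when leaving pingpong mode.

```c
#define ATO_MIN_MS 40   /* assumed value of TCP_ATO_MIN in milliseconds */

/* Hypothetical stand-in for the few tcp_opt fields the timer touches. */
struct ack_state {
	unsigned ato;   /* current ack timeout estimate */
	unsigned rto;   /* retransmission timeout, upper bound for ato */
	int pingpong;   /* 1 while the session is treated as interactive */
};

/* Stock behaviour when the delack timer fires with an ACK scheduled. */
void delack_timer_stock(struct ack_state *s)
{
	if (!s->pingpong) {
		/* Delayed ACK missed: inflate ATO, bounded by rto. */
		unsigned doubled = s->ato << 1;
		s->ato = doubled < s->rto ? doubled : s->rto;
	} else {
		/* Delayed ACK missed: leave pingpong mode and deflate ATO. */
		s->pingpong = 0;
		s->ato = ATO_MIN_MS;
	}
}

/* Patched behaviour: leave pingpong mode but keep the ATO estimate. */
void delack_timer_patched(struct ack_state *s)
{
	if (!s->pingpong) {
		unsigned doubled = s->ato << 1;
		s->ato = doubled < s->rto ? doubled : s->rto;
	} else {
		s->pingpong = 0;
	}
}
```

Which branch runs depends only on pingpong at the time the timer fires: the first timeout on a pingpong session deflates ato in the stock code, and later timeouts inflate it.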
> > 2) the pingpong is never re-activated,
>
> It MUST NOT. It is activated on transactional sessions only.
And according to tcp_in_quickack_mode this means you'll have a chance to
send a delayed ack only after "tp->ack.quick" acks have been sent out, after
an error of some sort (because any error triggers a
tcp_incr_quickack; in fact tcp_enter_quickack_mode calls
tcp_incr_quickack).
> > 3) the ato averaging logic during packet reception will not inflate
> > the ato if "m > ato", which is obviously the case after a delack timer
> > has triggered and in turn after the ato has been deflated to its min value
>
> When m > ato, the sample is invalid; apparently it is triggered by
> a random delay at the sender. When the real ato increases, the increase
> is made in the delack timer, not through the estimator.
this is only true if pingpong was already 0. but if pingpong is 0 it won't
send delayed acks in the first place because quick will very rarely get
down to 0. The streamer delays the next packet by something longer than
ato once in a while, but for all other packets delayed acks could work
perfectly. Only once in a while will the delack timer trigger.
> > 4) the logic that bounds the delayed ack to srtt >> 3 also looks
> > risky; using the rto looks much safer to me there, to be sure
> > those delacks aren't going to trigger too early
>
> It is necessary to provide more or less sane behaviour on an interactive
> session when ato > 100msec. Clamping by rto just does not make any sense.
ok.
> > 5) I suspect the current delack algorithm can wait more than 2 packets,
>
> Yes, when the window is not opening, it is not required. A delack is sent
> when the window is advanced.
RFC 1122 says:
[..] in a stream of full-sized
segments there SHOULD be an ACK for at least every second
segment.
unless you can point me to a more recent RFC where the above text has
been obsoleted, you're wrong. After the second packet the delack must
_always_ be sent according to rfc 1122 (at least if they're full-sized like
in my workload), no matter whether this generates a window advance or not.
2.2 did it right. Maybe I'm reading obsolete specifications? Could you
point me to the RFC that speaks about the pingpong directional stream as
well?
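The disagreement over point 5 comes down to one condition in __tcp_ack_snd_check. A minimal user-space sketch of the stock and patched forms, based on the tcp_input.c hunk of the diff: `struct rx_state` and its fields `new_wnd`, `quickack` and `ofo_pending` are hypothetical stand-ins for __tcp_select_window(sk), tcp_in_quickack_mode(tp) and the out-of-order check, whose exact kernel form is not shown in the diff.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the receiver state consulted by the check. */
struct rx_state {
	unsigned rcv_nxt;   /* next sequence number expected */
	unsigned rcv_wup;   /* rcv_nxt when the last window update was sent */
	unsigned rcv_mss;   /* estimated size of the sender's segments */
	unsigned rcv_wnd;   /* currently advertised window */
	unsigned new_wnd;   /* stand-in for __tcp_select_window(sk) */
	bool quickack;      /* stand-in for tcp_in_quickack_mode(tp) */
	bool ofo_pending;   /* out-of-order data is queued */
};

/* Stock: more than one full frame AND the window edge can advance. */
bool ack_now_stock(const struct rx_state *s)
{
	return ((s->rcv_nxt - s->rcv_wup) > s->rcv_mss &&
		s->new_wnd >= s->rcv_wnd) ||
	       s->quickack || s->ofo_pending;
}

/* Patched: more than one full frame is sufficient on its own. */
bool ack_now_patched(const struct rx_state *s)
{
	return (s->rcv_nxt - s->rcv_wup) > s->rcv_mss ||
	       s->quickack || s->ofo_pending;
}
```

With two full-sized segments unacknowledged and a window that cannot yet grow back, the stock form defers the ack and the patched form sends it immediately.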
I also wonder if this is correct:
if (eaten) {
if (tcp_in_quickack_mode(tp)) {
tcp_send_ack(sk);
} else {
tcp_send_delayed_ack(sk);
}
it's not checking if more than one segment arrived.
> In short, I still do not understand what kind of pathology happens in your
> case (in particular, the difference in advertised window before and after
> applying your patch is confusing _a_ _lot_; I really would like
ok, the window updates don't matter much, they're a separate issue, and
what I did, delaying the window update if a delack is pending, is probably
not RFC compliant either even if I liked it. I should have sent two separate
patches for clarity.
> to look at a larger tcpdump, covering the beginning of the session),
> but all 5 items are surely wrong.
ok, I need to recompile the kernel and reboot my desktop to generate
it. I will try to send a full trace shortly, but it will be exactly like
the one that I posted, replicated forever. The start of the trace
will be different: at the start of the trace the delayed acks will
work correctly; after they get disabled (or as you say the stream gets
detected as ""unidirectional"" [i.e. quickack forever IMHO]) the delayed
acks stop working completely, forever, and my machine sends double
the number of packets that would really be necessary if delacks worked as
I expect from reading the RFC, like at the start of the trace.
I'm not aware of a more up-to-date draft defining new behaviour for
delayed acks different from rfc 1122; let me know if that's the case of
course.
> The unnumbered 6th one may be right; the heuristic with expansion by a factor
> of two has no explanation, I think it can be relaxed even more.
Ok fine. I guess it matters more in my case because the window was
small, so a change from 1460 to 2920 was triggering a window
update for just 1 packet read in userspace, while excluding the == case,
the window update is sent only when at least two packets are read in
userspace, which better fits the delayed ack behavior and removes lots of
spurious window updates.
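The effect of excluding the == case can be checked with a small model of the comparison in cleanup_rbuf. This is a sketch only: the function names are hypothetical, `new_window` stands in for the result of __tcp_select_window(sk), and the surrounding conditions in cleanup_rbuf that are not part of the diff are left out.

```c
#include <stdbool.h>

/* Stock comparison: "at least twice" the current window triggers an update. */
bool window_update_stock(unsigned rcv_window_now, unsigned new_window,
			 unsigned window_clamp)
{
	if (2 * rcv_window_now <= window_clamp)
		return new_window && new_window >= 2 * rcv_window_now;
	return false;
}

/* Patched comparison: only "more than twice" triggers an update. */
bool window_update_patched(unsigned rcv_window_now, unsigned new_window,
			   unsigned window_clamp)
{
	if (2 * rcv_window_now < window_clamp)
		return new_window && new_window > 2 * rcv_window_now;
	return false;
}
```

For example, with a hypothetical window_clamp of 5840, going from 1460 to 2920 triggers an update in the stock form but not in the patched one, which needs 4380 or more.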
Andrea
* Re: 2.4 delayed acks don't work, fixed
2003-03-18 19:34 ` Andrea Arcangeli
@ 2003-03-18 20:13 ` kuznet
2003-03-18 22:19 ` Andrea Arcangeli
0 siblings, 1 reply; 14+ messages in thread
From: kuznet @ 2003-03-18 20:13 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel, davem, ak
Hello!
> what is the point of this:
>
> #define TCP_DELACK_MAX ((unsigned)(HZ/5)) /* maximal time to delay before sending an ACK */
It is the maximal delack for generic (transactional) traffic. It is not used
in stream mode. The big clamp of 500msec is hardwired into tcp_send_delayed_ack;
I simply was not able to invent a name for it.
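A rough user-space model of that clamp selection, following the lines the diff removes from tcp_send_delayed_ack. It is a sketch under assumptions: HZ = 100, the function name `delack_max_ato` is hypothetical, `pushed` stands in for `tp->ack.pending & TCP_ACK_PUSHED`, and only the choice of max_ato is modelled, since the rest of the function is not shown in the diff. In the kernel this path is taken only when ato > TCP_DELACK_MIN.

```c
/* Assumption: 100 timer ticks per second. */
#define HZ 100
#define TCP_DELACK_MAX ((unsigned)(HZ/5))   /* stock value: 200 ms */
#define TCP_DELACK_MIN ((unsigned)(HZ/25))  /* 40 ms */

/* Returns the upper bound, in ticks, applied to the delayed ack timeout.
 * srtt is passed as the kernel stores it; the code shifts it right by 3. */
unsigned delack_max_ato(int pingpong, int pushed, unsigned srtt)
{
	unsigned max_ato = HZ / 2;          /* the unnamed 500 ms clamp */

	if (pingpong || pushed)
		max_ato = TCP_DELACK_MAX;   /* transactional traffic */

	if (srtt) {
		/* If an rtt estimate is known, use it to bound the delay. */
		unsigned rtt = srtt >> 3;
		if (rtt < TCP_DELACK_MIN)
			rtt = TCP_DELACK_MIN;
		if (rtt < max_ato)
			max_ato = rtt;
	}
	return max_ato;
}
```

Without an rtt estimate, a stream session (pingpong clear, no pushed data) gets the 50-tick bound and a transactional one the 20-tick bound.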
> and finally by the delack timer (if it was set to 1):
That is the place. The session stops being transactional when we
experience the first delack timeout.
> tcp_enter_quickack_mode is called every time we have to disable delayed
> acks like when we send duplicate acks or when there's packet reordering
> or whatever similar error.
Also correct. Delacks are disabled during recovery periods.
> how can 'pingpong' relate to the direction of the stream? I see no
> relation at all.
It is set when we see traffic in both directions. It is cleared
when we see the first delack timeout. Logically, it should be cleared
when we do not see data flowing in the opposite direction for some time,
but as long as we do not see delack timeouts, it does not matter.
> since it's never re-activated,
If you do not see any delack timeouts, clearing pingpong does not make
a difference.
> this is only true if pingpong was already 0. but if pingpong is 0 it won't
> send delayed acks in the first place because quick will very rarely get
> down to 0.
Stop here. quick must quickly become zero. In your case, when the window
is one packet, it happens exactly after the first packet.
I am confused. Please, check.
> segments there SHOULD be an ACK for at least every second
> segment.
SHOULD, not MUST. :-)
Jokes apart, it is simply a wrong statement. The right one reads: "when the right
edge of the window has advanced by at least two segments". It is supposed to provide
the ACK clock, but when the window is stalled, such acks are pure abuse; they are
simply ignored by the clocking mechanism.
> if (eaten) {
> if (tcp_in_quickack_mode(tp)) {
> tcp_send_ack(sk);
> } else {
> tcp_send_delayed_ack(sk);
> }
>
> it's not checking if more than one segment arrived.
"eaten" is special path, it happens when this function is subroutine
of tcp_recvmsg(), where the same code is executed upon return
from the function.
Alexey
* Re: 2.4 delayed acks don't work, fixed
2003-03-18 20:13 ` kuznet
@ 2003-03-18 22:19 ` Andrea Arcangeli
2003-03-18 22:35 ` kuznet
0 siblings, 1 reply; 14+ messages in thread
From: Andrea Arcangeli @ 2003-03-18 22:19 UTC (permalink / raw)
To: kuznet; +Cc: linux-kernel, davem, ak
On Tue, Mar 18, 2003 at 11:13:42PM +0300, kuznet@ms2.inr.ac.ru wrote:
> Hello!
>
> > what is the point of this:
> >
> > #define TCP_DELACK_MAX ((unsigned)(HZ/5)) /* maximal time to delay before sending an ACK */
>
> It is the maximal delack for generic (transactional) traffic. It is not used
> in stream mode. The big clamp of 500msec is hardwired into tcp_send_delayed_ack;
> I simply was not able to invent a name for it.
>
> > and finally by the delack timer (if it was set to 1):
>
> That is the place. The session stops being transactional when we
> experience the first delack timeout.
In a normal internet connection you will always get packet loss or
timeouts in the middle of any big transfer.
however as long as the delacks can be reactivated w/o waiting dozens of
packets it's ok.
> > tcp_enter_quickack_mode is called every time we have to disable delayed
> > acks like when we send duplicate acks or when there's packet reordering
> > or whatever similar error.
>
> Also correct. Delacks are disabled during recovery periods.
sure, delacks must be disabled until the ofo queue is empty again.
> > how can 'pingpong' relate to the direction of the stream? I see no
> > relation at all.
>
> It is set when we see traffic in both directions. It is cleared
> when we see the first delack timeout. Logically, it should be cleared
> when we do not see data flowing in the opposite direction for some time,
> but as long as we do not see delack timeouts, it does not matter.
>
>
> > since it's never re-activated,
>
> If you do not see any delack timeouts, clearing pingpong does not make
> a difference.
I see occasional delack timeouts during streaming because the streamer simply
waits; the bandwidth of the link is higher than the streamer's.
> > this is only true if pingpong was already 0. but if pingpong is 0 it won't
> > send delayed acks in the first place because quick will very rarely get
> > down to 0.
>
> Stop here. quick must quickly become zero. In your case, when the window
> is one packet, it happens exactly after the first packet.
there must be something that forbids it because I get immediate acks
instead.
>
> I am confused. Please, check.
>
>
> > segments there SHOULD be an ACK for at least every second
> > segment.
>
> SHOULD, not MUST. :-)
>
> Jokes apart, it is simply a wrong statement. The right one reads: "when the right
> edge of the window has advanced by at least two segments". It is supposed to provide
> the ACK clock, but when the window is stalled, such acks are pure abuse; they are
> simply ignored by the clocking mechanism.
>
>
> > if (eaten) {
> > if (tcp_in_quickack_mode(tp)) {
> > tcp_send_ack(sk);
> > } else {
> > tcp_send_delayed_ack(sk);
> > }
> >
> > it's not checking if more than one segment arrived.
>
> "eaten" is special path, it happens when this function is subroutine
> of tcp_recvmsg(), where the same code is executed upon return
> from the function.
so is the ack sent elsewhere if this was the third packet and there's a
window advance?
Andrea
* Re: 2.4 delayed acks don't work, fixed
2003-03-18 22:19 ` Andrea Arcangeli
@ 2003-03-18 22:35 ` kuznet
[not found] ` <20030319002409.GI30541@dualathlon.random>
0 siblings, 1 reply; 14+ messages in thread
From: kuznet @ 2003-03-18 22:35 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel, davem, ak
Hello!
> sure, delacks must be disabled until the ofo queue is empty again.
We make a best effort to disable them until the sender completes slow start,
i.e. half of the advertised window.
Again, this does not matter in your case: you have a window of one (rarely, two)
packets throughout the whole tcpdump. And this bothers me a lot, much more than
the ack frequency.
> I see occasional delack timeouts during streaming because the streamer simply
> waits; the bandwidth of the link is higher than the streamer's.
I do not understand this, to be honest. What clocks this sender?
Some internal clock of the sender?
If it is not clocked by acks, and its clock is slower than the ack clock,
then I daresay we do absolutely the right thing (modulo the funny window).
> there must be something that forbids it because I get immediate acks
> instead.
Adding to:
> > I am confused. Please, check.
... it is the thing to understand. I still do not.
> so is the ack sent elsewhere if this was the third packet and there's a
> window advance?
It is sent when cleanup_rbuf() is called; this happens when the function
returns and no more skbs are _already_ queued in the backlog. Until that time
it does not make sense to send ACKs. In tcpdump you would see it as a burst
of ACKs, spaced by microsecond intervals.
Alexey