linux-kernel.vger.kernel.org archive mirror
* 2.4 delayed acks don't work, fixed
@ 2003-03-17  8:25 Andrea Arcangeli
From: Andrea Arcangeli @ 2003-03-17  8:25 UTC
  To: linux-kernel; +Cc: David S. Miller, kuznet, Andi Kleen

Last week I installed adsl, and over the weekend I was playing with a
streaming service. While watching tcpdump I noticed some huge breakage
in the delayed acks algorithm that generated an overkill number of acks
(around twice as many as were really necessary). I suspect this problem
has never come to light because it only silently generates a relevant
global waste of internet bandwidth. This is a 2.4 tcp stack receiving
data from a streamer with bulk transfers on a normal connection, with
the stock 2.4 tcp stack:

01:34:24.061554 streamer.8300 > linux.53972: . 131401:132861(1460) ack 0 win 58400 (DF)
01:34:24.139741 linux.53972 > streamer.8300: . ack 132861 win 1460 (DF)
01:34:24.140554 streamer.8300 > linux.53972: . 132861:134321(1460) ack 0 win 58400 (DF)
01:34:24.143710 linux.53972 > streamer.8300: . ack 134321 win 1460 (DF)
01:34:24.223566 linux.53972 > streamer.8300: . ack 134321 win 2920 (DF)
01:34:24.241532 streamer.8300 > linux.53972: . 134321:135781(1460) ack 0 win 58400 (DF)
01:34:24.319737 linux.53972 > streamer.8300: . ack 135781 win 1460 (DF)
01:34:24.321529 streamer.8300 > linux.53972: . 135781:137241(1460) ack 0 win 58400 (DF)
01:34:24.323634 linux.53972 > streamer.8300: . ack 137241 win 1460 (DF)
01:34:24.421492 streamer.8300 > linux.53972: . 137241:138701(1460) ack 0 win 58400 (DF)
01:34:24.423541 linux.53972 > streamer.8300: . ack 138701 win 1460 (DF)
01:34:24.503581 linux.53972 > streamer.8300: . ack 138701 win 2920 (DF)
01:34:24.521555 streamer.8300 > linux.53972: . 138701:140161(1460) ack 0 win 58400 (DF)
01:34:24.599739 linux.53972 > streamer.8300: . ack 140161 win 1460 (DF)
01:34:24.601480 streamer.8300 > linux.53972: . 140161:141621(1460) ack 0 win 58400 (DF)
01:34:24.603663 linux.53972 > streamer.8300: . ack 141621 win 1460 (DF)

linux is the receiver of course (tcpdump is running on the linux box);
the sender is a streamer over the internet, but that doesn't really
matter, it happens with any kind of transfer: after the first delack
timer triggers, it keeps going like the above for the whole remaining
part of large downloads (i.e. for days, until I reset the computer).
This streamer makes it more obvious because it waits some time before
sending the next packet (my bandwidth is now much higher than the one
needed by the player). These occasional waits trigger the delack timer,
and after that the delack feature is completely disabled: it restarts
for very few packets once in a while, when ack.quick drops to 0, but
those rare delayed acks are completely hidden by the quickacks above.
Even worse, linux keeps sending more than 1 ack every few received
packets for spurious, too-small window updates, so it's doing the exact
opposite of the delack feature. It looks very broken to me.

rfc1122 says (quote):

	A TCP SHOULD implement a delayed ACK, but an ACK should not be
	excessively delayed; in particular, the delay MUST be less than 0.5
	seconds,

Apparently linux only waits 0.2 at max, and this appears wrong too
(but 0.2 seconds would be more than enough for my testcase: when the gap
is longer than 0.2 seconds it's because the streamer is intentionally
delaying, so triggering the delack is fine in that case).

I had a look and I found various explanations for the bad behaviour:

1) the delayed ack timer destroys the ato value, resetting it to the min
   value (40msec), and the quickack mode is activated (pingpong = 0)
2) the pingpong is never re-activated, so it takes the whole receive
   window before the pingpong isn't significant anymore, then after the
   first delack timer it will take another receive window before I
   can see a new delayed ack
3) the ato averaging logic during the packet reception will not inflate
   the ato if "m > ato", which is obviously the case after a delack timer
   triggered and in turn after the ato has been deflated to its min value
4) the logic that bounds the delayed ack to the srtt >> 3 also looks
   risky, using the rto looks much safer to me there, to be sure
   those delacks aren't going to trigger too early
5) I suspect the current delack algorithm can wait more than 2 packets,
   because the && after the (tp->rcv_nxt - tp->rcv_wup) > tp->ack.rcv_mss
   check must be a ||. Just try a netcat xxx chargen >/dev/null on a
   100mbit link and see how many packets you sometimes need to receive
   before you see an ack; this doesn't seem to happen with these
   modifications applied

Besides the above, there's also quite a lot of ack overhead due to the
window updates triggered by userspace, so I made the suppression a
little more aggressive: recvmsg now sends an ack only if the potential
rcv window has grown to _more_ than 2 times the current outstanding rcv
window (not equal), so spurious updates rarely happen. I also avoid
updates if there's a delack timer pending and not blocked (this last
one looks like quite a natural idea; it may actually hurt, but I doubt
it. I'd certainly be ok with dropping that goto out in cleanup_rbuf if
you think it's going to be wrong on very high speed networks).
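The `>=` versus `>` change described above (the `cleanup_rbuf()` hunk of the patch) can be sketched as two predicates. The function names are hypothetical; `cur_window` stands for `tcp_receive_window(tp)`, `new_window` for the value `__tcp_select_window(sk)` would return, and `window_clamp` for `tp->window_clamp`.

```c
#include <assert.h>

/* Unpatched rule: advertise when the window at least doubles. */
static int window_update_old(unsigned window_clamp, unsigned cur_window,
			     unsigned new_window)
{
	assert(window_clamp > 0);
	if (2 * cur_window > window_clamp)
		return 0;   /* cheap pre-check; real code skips __tcp_select_window() */
	return new_window != 0 && new_window >= 2 * cur_window;
}

/* Patched rule: advertise only when the window more than doubles. */
static int window_update_new(unsigned window_clamp, unsigned cur_window,
			     unsigned new_window)
{
	assert(window_clamp > 0);
	if (2 * cur_window >= window_clamp)
		return 0;
	return new_window != 0 && new_window > 2 * cur_window;
}
```

With a 1460-byte window and a large clamp, freeing one more segment (new window 2920) triggers an update under the old rule but not the new one; the new rule waits until the window can grow to 4380 bytes.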

this is the (IMHO) much nicer behaviour for the same workload as
above, with the modifications applied:

08:57:27.718987 streamer.8300 > linux.32792: . 26281:27741(1460) ack 0 win 58400 (DF)
08:57:27.747964 streamer.8300 > linux.32792: . 27741:29201(1460) ack 0 win 58400 (DF)
08:57:27.748017 linux.32792 > streamer.8300: . ack 29201 win 2920 (DF)
08:57:27.768949 streamer.8300 > linux.32792: . 29201:30661(1460) ack 0 win 58400 (DF)
08:57:27.848937 streamer.8300 > linux.32792: . 30661:32121(1460) ack 0 win 58400 (DF)
08:57:27.848986 linux.32792 > streamer.8300: . ack 32121 win 1460 (DF)
08:57:27.934286 linux.32792 > streamer.8300: . ack 32121 win 4380 (DF)
08:57:27.948918 streamer.8300 > linux.32792: . 32121:33581(1460) ack 0 win 58400 (DF)
08:57:28.038882 streamer.8300 > linux.32792: . 33581:35041(1460) ack 0 win 58400 (DF)
08:57:28.038931 linux.32792 > streamer.8300: . ack 35041 win 2920 (DF)
08:57:28.058882 streamer.8300 > linux.32792: . 35041:36501(1460) ack 0 win 58400 (DF)
08:57:28.138866 streamer.8300 > linux.32792: . 36501:37961(1460) ack 0 win 58400 (DF)
08:57:28.138919 linux.32792 > streamer.8300: . ack 37961 win 1460 (DF)
08:57:28.238912 streamer.8300 > linux.32792: . 37961:39421(1460) ack 0 win 58400 (DF)
08:57:28.394274 linux.32792 > streamer.8300: . ack 39421 win 4380 (DF)
08:57:28.488823 streamer.8300 > linux.32792: . 39421:40881(1460) ack 0 win 58400 (DF)
08:57:28.508800 streamer.8300 > linux.32792: . 40881:42341(1460) ack 0 win 58400 (DF)
08:57:28.508841 linux.32792 > streamer.8300: . ack 42341 win 2920 (DF)
08:57:28.538803 streamer.8300 > linux.32792: . 42341:43801(1460) ack 0 win 58400 (DF)
08:57:28.608829 streamer.8300 > linux.32792: . 43801:45261(1460) ack 0 win 58400 (DF)
08:57:28.608877 linux.32792 > streamer.8300: . ack 45261 win 1460 (DF)
08:57:28.708788 streamer.8300 > linux.32792: . 45261:46721(1460) ack 0 win 58400 (DF)
08:57:28.864277 linux.32792 > streamer.8300: . ack 46721 win 4380 (DF)
08:57:28.958765 streamer.8300 > linux.32792: . 46721:48181(1460) ack 0 win 58400 (DF)
08:57:28.988704 streamer.8300 > linux.32792: . 48181:49641(1460) ack 0 win 58400 (DF)
08:57:28.988759 linux.32792 > streamer.8300: . ack 49641 win 2920 (DF)
08:57:29.018705 streamer.8300 > linux.32792: . 49641:51101(1460) ack 0 win 58400 (DF)
08:57:29.098699 streamer.8300 > linux.32792: . 51101:52561(1460) ack 0 win 58400 (DF)
08:57:29.098749 linux.32792 > streamer.8300: . ack 52561 win 1460 (DF)
08:57:29.208694 streamer.8300 > linux.32792: . 52561:54021(1460) ack 0 win 58400 (DF)
08:57:29.380937 linux.32792 > streamer.8300: . ack 54021 win 4380 (DF)
08:57:29.478646 streamer.8300 > linux.32792: . 54021:55481(1460) ack 0 win 58400 (DF)
08:57:29.498614 streamer.8300 > linux.32792: . 55481:56941(1460) ack 0 win 58400 (DF)
08:57:29.498648 linux.32792 > streamer.8300: . ack 56941 win 4380 (DF)
08:57:29.518615 streamer.8300 > linux.32792: . 56941:58401(1460) ack 0 win 58400 (DF)
08:57:29.598632 streamer.8300 > linux.32792: . 58401:59861(1460) ack 0 win 58400 (DF)
08:57:29.598677 linux.32792 > streamer.8300: . ack 59861 win 2920 (DF)
08:57:29.618619 streamer.8300 > linux.32792: . 59861:61321(1460) ack 0 win 58400 (DF)
08:57:29.698591 streamer.8300 > linux.32792: . 61321:62781(1460) ack 0 win 58400 (DF)
08:57:29.698637 linux.32792 > streamer.8300: . ack 62781 win 1460 (DF)

now my streaming services generate 1/4 of the number of packets over
the internet compared to the buggy logic in mainline, obviously
without any possible change in performance, so I'm going to use it. It
may not be RFC compliant, but I doubt the current mainline code is RFC
compliant in the first place.

here's the diff, comments welcome.

diff -urNp xx/include/net/tcp.h xxx/include/net/tcp.h
--- xx/include/net/tcp.h	2003-03-17 09:01:13.000000000 +0100
+++ xxx/include/net/tcp.h	2003-03-17 08:45:28.000000000 +0100
@@ -323,7 +323,7 @@ static __inline__ int tcp_sk_listen_hash
 				  * TIME-WAIT timer.
 				  */
 
-#define TCP_DELACK_MAX	((unsigned)(HZ/5))	/* maximal time to delay before sending an ACK */
+#define TCP_DELACK_MAX	((unsigned)(HZ/2))	/* maximal time to delay before sending an ACK */
 #if HZ >= 100
 #define TCP_DELACK_MIN	((unsigned)(HZ/25))	/* minimal time to delay before sending an ACK */
 #define TCP_ATO_MIN	((unsigned)(HZ/25))
diff -urNp xx/net/ipv4/tcp.c xxx/net/ipv4/tcp.c
--- xx/net/ipv4/tcp.c	2003-03-17 09:01:13.000000000 +0100
+++ xxx/net/ipv4/tcp.c	2003-03-17 08:10:23.000000000 +0100
@@ -1290,22 +1290,10 @@ void cleanup_rbuf(struct sock *sk, int c
 #endif
 
 	if (tcp_ack_scheduled(tp)) {
-		   /* Delayed ACKs frequently hit locked sockets during bulk receive. */
-		if (tp->ack.blocked
-		    /* Once-per-two-segments ACK was not sent by tcp_input.c */
-		    || tp->rcv_nxt - tp->rcv_wup > tp->ack.rcv_mss
-		    /*
-		     * If this read emptied read buffer, we send ACK, if
-		     * connection is not bidirectional, user drained
-		     * receive buffer and there was a small segment
-		     * in queue.
-		     */
-		    || (copied > 0 &&
-			(tp->ack.pending&TCP_ACK_PUSHED) &&
-			!tp->ack.pingpong &&
-			atomic_read(&sk->rmem_alloc) == 0)) {
+		if (tp->ack.blocked)
+			/* Delayed ACKs frequently hit locked sockets during bulk receive. */
 			time_to_ack = 1;
-		}
+		goto out;
 	}
 
   	/* We send an ACK if we can now advertise a non-zero window
@@ -1318,7 +1306,7 @@ void cleanup_rbuf(struct sock *sk, int c
 		__u32 rcv_window_now = tcp_receive_window(tp);
 
 		/* Optimize, __tcp_select_window() is not cheap. */
-		if (2*rcv_window_now <= tp->window_clamp) {
+		if (2*rcv_window_now < tp->window_clamp) {
 			__u32 new_window = __tcp_select_window(sk);
 
 			/* Send ACK now, if this read freed lots of space
@@ -1326,10 +1314,11 @@ void cleanup_rbuf(struct sock *sk, int c
 			 * We can advertise it now, if it is not less than current one.
 			 * "Lots" means "at least twice" here.
 			 */
-			if(new_window && new_window >= 2*rcv_window_now)
+			if(new_window && new_window > 2*rcv_window_now)
 				time_to_ack = 1;
 		}
 	}
+ out:
 	if (time_to_ack)
 		tcp_send_ack(sk);
 }
diff -urNp xx/net/ipv4/tcp_input.c xxx/net/ipv4/tcp_input.c
--- xx/net/ipv4/tcp_input.c	2003-03-17 09:01:03.000000000 +0100
+++ xxx/net/ipv4/tcp_input.c	2003-03-17 08:36:15.000000000 +0100
@@ -173,6 +173,11 @@ void tcp_enter_quickack_mode(struct tcp_
 	tp->ack.ato = TCP_ATO_MIN;
 }
 
+static inline void tcp_exit_quickack_mode(struct tcp_opt *tp)
+{
+	tp->ack.pingpong = 1;
+}
+
 /* Send ACKs quickly, if "quick" count is not exhausted
  * and the session is not interactive.
  */
@@ -381,16 +386,21 @@ static void tcp_event_data_recv(struct s
 		if (m <= TCP_ATO_MIN/2) {
 			/* The fastest case is the first. */
 			tp->ack.ato = (tp->ack.ato>>1) + TCP_ATO_MIN/2;
-		} else if (m < tp->ack.ato) {
-			tp->ack.ato = (tp->ack.ato>>1) + m;
-			if (tp->ack.ato > tp->rto)
-				tp->ack.ato = tp->rto;
-		} else if (m > tp->rto) {
+			tcp_exit_quickack_mode(tp);
+		} else if (unlikely(m > TCP_DELACK_MAX)) {
+			/* Delayed acks are worthless on a very slow link. */
+			tcp_incr_quickack(tp);
+		} else if (unlikely(m > tp->rto)) {
 			/* Too long gap. Apparently sender falled to
 			 * restart window, so that we send ACKs quickly.
 			 */
 			tcp_incr_quickack(tp);
 			tcp_mem_reclaim(sk);
+		} else { 
+			tp->ack.ato = (tp->ack.ato>>1) + m;
+			if (tp->ack.ato > tp->rto)
+				tp->ack.ato = tp->rto;
+			tcp_exit_quickack_mode(tp);
 		}
 	}
 	tp->ack.lrcvtime = now;
@@ -3131,11 +3141,7 @@ static __inline__ void __tcp_ack_snd_che
 	struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp);
 
 	    /* More than one full frame received... */
-	if (((tp->rcv_nxt - tp->rcv_wup) > tp->ack.rcv_mss
-	     /* ... and right edge of window advances far enough.
-	      * (tcp_recvmsg() will send ACK otherwise). Or...
-	      */
-	     && __tcp_select_window(sk) >= tp->rcv_wnd) ||
+	if ((tp->rcv_nxt - tp->rcv_wup) > tp->ack.rcv_mss ||
 	    /* We ACK each frame or... */
 	    tcp_in_quickack_mode(tp) ||
 	    /* We have out of order data. */
diff -urNp xx/net/ipv4/tcp_output.c xxx/net/ipv4/tcp_output.c
--- xx/net/ipv4/tcp_output.c	2003-03-15 01:25:19.000000000 +0100
+++ xxx/net/ipv4/tcp_output.c	2003-03-17 08:37:07.000000000 +0100
@@ -1257,19 +1257,13 @@ void tcp_send_delayed_ack(struct sock *s
 	unsigned long timeout;
 
 	if (ato > TCP_DELACK_MIN) {
-		int max_ato = HZ/2;
-
-		if (tp->ack.pingpong || (tp->ack.pending&TCP_ACK_PUSHED))
-			max_ato = TCP_DELACK_MAX;
+		int max_ato = TCP_DELACK_MAX;
 
 		/* Slow path, intersegment interval is "high". */
 
-		/* If some rtt estimate is known, use it to bound delayed ack.
-		 * Do not use tp->rto here, use results of rtt measurements
-		 * directly.
-		 */
-		if (tp->srtt) {
-			int rtt = max(tp->srtt>>3, TCP_DELACK_MIN);
+		/* If some rtt estimate is known, use it to bound delayed ack. */
+		if (tp->rto) {
+			int rtt = max(tp->rto, TCP_DELACK_MIN);
 
 			if (rtt < max_ato)
 				max_ato = rtt;
diff -urNp xx/net/ipv4/tcp_timer.c xxx/net/ipv4/tcp_timer.c
--- xx/net/ipv4/tcp_timer.c	2003-03-15 01:25:19.000000000 +0100
+++ xxx/net/ipv4/tcp_timer.c	2003-03-17 08:37:07.000000000 +0100
@@ -250,11 +250,10 @@ static void tcp_delack_timer(unsigned lo
 			/* Delayed ACK missed: inflate ATO. */
 			tp->ack.ato = min(tp->ack.ato << 1, tp->rto);
 		} else {
-			/* Delayed ACK missed: leave pingpong mode and
-			 * deflate ATO.
+			/* Delayed ACK missed: leave pingpong mode
+			 * but be ready to reenable delay acks fast.
 			 */
 			tp->ack.pingpong = 0;
-			tp->ack.ato = TCP_ATO_MIN;
 		}
 		tcp_send_ack(sk);
 		NET_INC_STATS_BH(DelayedACKs);

Andrea

* Re: 2.4 delayed acks don't work, fixed
@ 2003-03-18 18:34 ` kuznet
From: kuznet @ 2003-03-18 18:34 UTC
  To: Andrea Arcangeli; +Cc: linux-kernel, davem, ak

Hello!

> Apparently linux only waits 0.2 at max,

This is not true, the maximum is 0.5 in your case.


> 1) the delayed ack timer destroys the ato value, resetting it to the min
>    value (40msec), and the quickack mode is activated (pingpong = 0)

This is not true, the delack timer inflates ato. pingpong=0 is not quickack
mode, it means that the session is a unidirectional stream, which
is correct in your case.
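The two readings of the delack timer can be compared with a sketch of the unpatched branch of `tcp_delack_timer()` (the `-` lines of the `tcp_timer.c` hunk in the patch above). `struct delack_model` and `delack_timeout_old()` are hypothetical names; values are in jiffies, assuming HZ = 100.

```c
#include <assert.h>

/* Hypothetical stand-in for tp->ack.pingpong, tp->ack.ato and tp->rto. */
struct delack_model {
	int pingpong;     /* 1: session treated as interactive/transactional */
	unsigned ato;     /* ack timeout, jiffies */
	unsigned rto;     /* retransmission timeout, jiffies */
};

/* Unpatched reaction to a delayed-ACK timer expiry. */
static void delack_timeout_old(struct delack_model *s, unsigned ato_min)
{
	assert(ato_min > 0 && ato_min <= s->rto);
	if (!s->pingpong) {
		/* Delayed ACK missed: inflate ATO, bounded by RTO. */
		unsigned doubled = s->ato << 1;

		s->ato = doubled < s->rto ? doubled : s->rto;
	} else {
		/* Delayed ACK missed: leave pingpong mode and deflate ATO. */
		s->pingpong = 0;
		s->ato = ato_min;
	}
}
```

Starting from an interactive session (`pingpong = 1`, `ato = 10`, `rto = 30`, `ato_min = 4`), the first expiry clears `pingpong` and drops `ato` to 4, which is Andrea's item 1; every later expiry doubles it (8, 16, then 30 after clamping), which is the inflation Alexey refers to.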


> 2) the pingpong is never re-activated,

It MUST NOT. It is activated on transactional sessions only.


> 3) the ato averaging logic during the packet reception will not inflate
>    the ato if "m > ato", which is obviously the case after a delack timer
>    triggered and in turn after the ato has been deflated to its min value

When m > ato, the sample is invalid; apparently it is triggered by
a random delay at the sender. When the real ato increases, the increase
is made in the delack timer, not through the estimator.



> 4) the logic that bounds the delayed ack to the srtt >> 3 also looks
>    risky, using the rto looks much safer to me there, to be sure
>    those delacks aren't going to trigger too early

It is necessary to provide more or less sane behaviour on interactive
session when ato > 100msec. Clamping by rto just does not make any sense.


> 5) I suspect the current delack algorithm can wait more than 2 packets,

Yes, when the window is not opening, it is not required. A delack is sent
when the window is advanced.
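The disagreement in item 5 is about the first clause of `__tcp_ack_snd_check()`. A sketch of that clause before and after the patch, written as hypothetical standalone functions (the quickack and out-of-order triggers that follow it in the kernel are omitted):

```c
#include <assert.h>

/* Unpatched: more than one full-sized segment is unacknowledged AND the
 * right edge of the window would advance (selected >= current rcv_wnd).
 * Unsigned subtraction keeps the sequence arithmetic wrap-safe. */
static int ack_now_old(unsigned rcv_nxt, unsigned rcv_wup, unsigned rcv_mss,
		       unsigned selected_window, unsigned rcv_wnd)
{
	assert(rcv_mss > 0);
	return rcv_nxt - rcv_wup > rcv_mss && selected_window >= rcv_wnd;
}

/* Patched: more than one full-sized segment unacknowledged is enough. */
static int ack_now_new(unsigned rcv_nxt, unsigned rcv_wup, unsigned rcv_mss)
{
	assert(rcv_mss > 0);
	return rcv_nxt - rcv_wup > rcv_mss;
}
```

With two 1460-byte segments unacknowledged and a stalled window (`selected_window` 1460 below `rcv_wnd` 2920), the old clause waits and the new one acks immediately; once the window can advance, both ack. The old behaviour is what Alexey defends (acks on a stalled window do not clock the sender); the new one follows the RFC 1122 "every second segment" wording Andrea cites.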


In short, I still do not understand what kind of pathology happens in your
case (in particular, the difference in advertised window before and after
applying your patch is confusing _a_ _lot_; I really would like
to look at a larger tcpdump, covering the beginning of the session),
but all the 5 items are surely wrong.

Unnumbered 6th one may be right, the heuristic with expansion twice
has no explanation; I think it can be relaxed even more.

Alexey

* Re: 2.4 delayed acks don't work, fixed
@ 2003-03-18 19:34 ` Andrea Arcangeli
From: Andrea Arcangeli @ 2003-03-18 19:34 UTC
  To: kuznet; +Cc: linux-kernel, davem, ak

On Tue, Mar 18, 2003 at 09:34:50PM +0300, kuznet@ms2.inr.ac.ru wrote:
> Hello!
> 
> > Apparently linux only waits 0.2 at max,
> 
> This is not true, the maximum is 0.5 in your case.

what is the point of this:

#define TCP_DELACK_MAX	((unsigned)(HZ/5))	/* maximal time to delay before sending an ACK */

if the above means the maximal time to delay before sending an ack is
0.5 and not 0.2?

> > 1) the delayed ack timer destroys the ato value, resetting it to the min
> >    value (40msec), and the quickack mode is activated (pingpong = 0)
> 
> This is not true, the delack timer inflates ato. pingpong=0 is not quickack
> mode, it means that the session is a unidirectional stream, which
> is correct in your case.

pingpong is set to 0 only by "tcp_enter_quickack_mode" and later by:

			/* RFC2581. 4.2. SHOULD send immediate ACK, when
					 ^^^^^^^^^^^^^^^^^^^^^^^^^^
			 * gap in queue is filled.
			 */
			if (skb_queue_len(&tp->out_of_order_queue) == 0)
				tp->ack.pingpong = 0;

and finally by the delack timer (if it was set to 1):

	if (tcp_ack_scheduled(tp)) {
		if (!tp->ack.pingpong) {
			/* Delayed ACK missed: inflate ATO. */
			tp->ack.ato = min(tp->ack.ato << 1, tp->rto);
		} else {
			/* Delayed ACK missed: leave pingpong mode and
			 * deflate ATO.
			 */
			tp->ack.pingpong = 0;
			^^^^^^^^^^^^^^^^^^^^
			tp->ack.ato = TCP_ATO_MIN;
		}

tcp_enter_quickack_mode is called every time we have to disable delayed
acks like when we send duplicate acks or when there's packet reordering
or whatever similar error.

how can 'pingpong' relate to the direction of the stream? I see no
relation at all. According to the code, pingpong = 0 means quickack mode.
'pingpong' has no clue of what I'm sending, it only knows if I'm
receiving fine and in turn if I can delay the ack.

since it's never re-activated, the tcp stack assumes I'm not receiving
fine, right after the first error, and forever until I close the socket.

> > 2) the pingpong is never re-activated,
> 
> It MUST NOT. It is activated on transactional sessions only.

And according to tcp_in_quickack_mode, this means you'll have a chance to
send a delayed ack only after "tp->ack.quick" acks have been sent out
following an error of some sort (because every such error ends up in
tcp_incr_quickack: tcp_enter_quickack_mode in fact calls
tcp_incr_quickack).
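The interaction between `ack.quick` and `ack.pingpong` that both sides are reasoning about can be modelled as below. This is a sketch under stated assumptions: the predicate assumes `tcp_in_quickack_mode()` is true while `quick` is non-zero and `pingpong` is clear, each immediate ACK is assumed to consume one unit of `quick`, and the starting budget is a free parameter. All names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for tp->ack.quick and tp->ack.pingpong. */
struct quickack_model {
	unsigned quick;   /* remaining budget of immediate ACKs */
	int pingpong;     /* 1: interactive session, delayed ACKs allowed */
};

/* Assumed shape of tcp_in_quickack_mode(). */
static int in_quickack_mode(const struct quickack_model *s)
{
	return s->quick != 0 && !s->pingpong;
}

/* Count how many of n in-order segments get an immediate ACK, assuming
 * each immediate ACK consumes one unit of the quick budget. The struct
 * is passed by value, so the caller's copy is not modified. */
static unsigned immediate_acks(struct quickack_model s, unsigned n)
{
	unsigned count = 0;

	for (unsigned i = 0; i < n; i++) {
		if (in_quickack_mode(&s)) {
			s.quick--;
			count++;
		}
	}
	assert(count <= n);
	return count;
}
```

Under these assumptions the number of immediate ACKs is bounded by the budget, which matches Alexey's point that `quick` should run out quickly; the model alone therefore does not explain the endless immediate ACKs in Andrea's trace, which is the open question in this exchange.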

> > 3) the ato averaging logic during the packet reception will not inflate
> >    the ato if "m > ato", which is obviously the case after a delack timer
> >    triggered and in turn after the ato has been deflated to its min value
> 
> When m > ato, the sample is invalid; apparently it is triggered by
> a random delay at the sender. When the real ato increases, the increase
> is made in the delack timer, not through the estimator.

this is only true if pingpong was just 0. but if pingpong is 0 it won't
send delayed acks in the first place because quick will very rarely get
down to 0. The streamer delays the next packet by something longer than
ato once in a while, but for all other packets delayed acks could work
perfectly. The delack timer will trigger only once in a while.

> > 4) the logic that bounds the delayed ack to the srtt >> 3 also looks
> >    risky, using the rto looks much safer to me there, to be sure
> >    those delacks aren't going to trigger too early
> 
> It is necessary to provide more or less sane behaviour on interactive
> session when ato > 100msec. Clamping by rto just does not make any sense.

ok.

> > 5) I suspect the current delack algorithm can wait more than 2 packets,
> 
> Yes, when the window is not opening, it is not required. A delack is sent
> when the window is advanced.

RFC 1122 says:


	    [..] in a stream of full-sized
            segments there SHOULD be an ACK for at least every second
            segment.

unless you can point me to a more recent RFC where the above text has
been obsoleted, you're wrong. After the second packet the delack must be
sent _always_ according to rfc 1122 (at least if the segments are
full-sized, like in my workload), no matter whether this generates a
window advance or not. 2.2 did it right. Maybe I'm reading obsolete
specifications? Could you also point me to the RFC that speaks about
the pingpong directional stream?

I also wonder if this is correct:

			if (eaten) {
				if (tcp_in_quickack_mode(tp)) {
					tcp_send_ack(sk);
				} else {
					tcp_send_delayed_ack(sk);
				}

it's not checking if more than one segment arrived.

> In short, I still do not understand what kind of pathology happens in your
> case (in particular, the difference in advertised window before and after
> applying your patch is confusing _a_ _lot_; I really would like

ok, the window updates don't matter much, they're a separate issue, and
what I did (delaying the window update if a delack is pending) is
probably not RFC compliant either, even if I liked it; I should have
sent two separate patches for clarity.

> to look at a larger tcpdump, covering the beginning of the session),
> but all the 5 items are surely wrong.

ok, I need to recompile the kernel and reboot my desktop to generate
it. I will try to send a full trace shortly, but it will be exactly like
the one that I posted, just replicated forever. Only the start of the
trace will be different: there the delayed acks work correctly; after
they get disabled (or, as you say, the stream gets detected as
""unidirectional"" [i.e. quickack forever IMHO]) the delayed acks stop
working completely, forever, and my machine sends twice as many packets
as would really be necessary if delacks worked as I expect from reading
the RFC, like in the start of the trace.

I'm not aware of a more up-to-date draft defining a delayed-ack
behaviour different from rfc 1122; let me know if that's the case, of
course.

> Unnumbered 6th one may be right, the heuristic with expansion twice
> has no explanation; I think it can be relaxed even more.

Ok fine. I guess it matters more in my case because the window was
small, so changing from 1460 to 2920 was triggering a window update for
just 1 packet read in userspace, while by excluding the == case the
window update is sent only when at least two packets are read in
userspace, which better fits the delayed ack behavior and removes lots
of spurious window updates.

Andrea

* Re: 2.4 delayed acks don't work, fixed
@ 2003-03-18 20:13 ` kuznet
From: kuznet @ 2003-03-18 20:13 UTC
  To: Andrea Arcangeli; +Cc: linux-kernel, davem, ak

Hello!

> what is the point of this:
> 
> #define TCP_DELACK_MAX	((unsigned)(HZ/5))	/* maximal time to delay before sending an ACK */

It is the maximal delack for generic (transactional) traffic. It is not used
in stream mode. The big clamp of 500msec is hardwired into tcp_send_delayed_ack;
I simply was not able to invent a name for it.
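A sketch of the bound Alexey describes, transcribed from the `-` lines of the `tcp_send_delayed_ack()` hunk in the original patch; it applies only on the slow path (`ato > TCP_DELACK_MIN`). HZ = 100 is assumed, and `max_ato_old()` and the `MODEL_*` macros are hypothetical names. `srtt` is passed in the kernel's scaled form (8 times the smoothed RTT), hence the `>> 3`.

```c
#include <assert.h>

#define MODEL_HZ          100              /* assumption: HZ = 100 */
#define MODEL_DELACK_MIN  (MODEL_HZ / 25)  /* TCP_DELACK_MIN: 4 jiffies = 40 msec */
#define MODEL_DELACK_MAX  (MODEL_HZ / 5)   /* TCP_DELACK_MAX: 20 jiffies = 200 msec */

/* Upper bound on the delayed-ACK timeout in the unpatched slow path.
 * srtt is in the kernel's scaled form (8 * smoothed RTT, in jiffies). */
static unsigned max_ato_old(int pingpong, int pushed, unsigned srtt)
{
	unsigned max_ato = MODEL_HZ / 2;          /* hardwired 500 msec stream clamp */

	if (pingpong || pushed)
		max_ato = MODEL_DELACK_MAX;       /* generic (transactional) clamp */

	if (srtt) {
		/* Bound by the measured RTT, but never below TCP_DELACK_MIN. */
		unsigned rtt = srtt >> 3;

		if (rtt < MODEL_DELACK_MIN)
			rtt = MODEL_DELACK_MIN;
		if (rtt < max_ato)
			max_ato = rtt;
	}
	assert(max_ato >= MODEL_DELACK_MIN);
	return max_ato;
}
```

With HZ = 100, a unidirectional stream with no RTT estimate may delay up to 50 jiffies (500 msec), a pingpong or pushed session only 20 (200 msec, `TCP_DELACK_MAX`), and a known RTT of 300 msec lowers the stream bound to 30 jiffies.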

> and finally by the delack timer (if it was set to 1):

That is the place. A session stops being a transaction when we
experience the first delack timeout.


> tcp_enter_quickack_mode is called every time we have to disable delayed
> acks like when we send duplicate acks or when there's packet reordering
> or whatever similar error.

Also correct. Delacks are disabled during recovery periods.

> how can 'pingpong' relate to the direction of the stream? I see no
> relation at all.

It is set when we see traffic in both directions. It is cleared
when we see the first delack timeout. Logically, it should be cleared
when we do not see data flowing in the opposite direction for some time,
but as long as we do not see delack timeouts, it does not matter.


> since it's never re-activated,

If you do not see any delack timeouts, clearing pingpong does not make
a difference.


> this is only true if pingpong was just 0. but if pingpong is 0 it won't
> send delayed acks in the first place because quick will very rarely get
> down to 0.

Stop here. quick must quickly become zero. In your case, when the window
is one packet, it happens exactly after the first packet.

I am confused. Please, check.


>             segments there SHOULD be an ACK for at least every second
>             segment.

SHOULD, not MUST. :-)

Jokes apart, it is simply a wrong statement. The right one reads: "when the right
edge of the window has advanced by at least two segments". It is supposed to provide
an ACK clock, but when the window is stalled, such acks are pure abuse; they are simply
ignored by the clocking mechanism.


> 			if (eaten) {
> 				if (tcp_in_quickack_mode(tp)) {
> 					tcp_send_ack(sk);
> 				} else {
> 					tcp_send_delayed_ack(sk);
> 				}
> 
> it's not checking if more than one segment arrived.

"eaten" is a special path; it happens when this function is a subroutine
of tcp_recvmsg(), where the same code is executed upon return
from the function.

Alexey

* Re: 2.4 delayed acks don't work, fixed
@ 2003-03-18 22:19 ` Andrea Arcangeli
From: Andrea Arcangeli @ 2003-03-18 22:19 UTC
  To: kuznet; +Cc: linux-kernel, davem, ak

On Tue, Mar 18, 2003 at 11:13:42PM +0300, kuznet@ms2.inr.ac.ru wrote:
> Hello!
> 
> > what is the point of this:
> > 
> > #define TCP_DELACK_MAX	((unsigned)(HZ/5))	/* maximal time to delay before sending an ACK */
> 
> It is the maximal delack for generic (transactional) traffic. It is not used
> in stream mode. The big clamp of 500msec is hardwired into tcp_send_delayed_ack;
> I simply was not able to invent a name for it.
> 
> > and finally by the delack timer (if it was set to 1):
> 
> That is the place. A session stops being a transaction when we
> experience the first delack timeout.

In a normal internet connection you will always get packet loss or
timeouts in the middle of any big transfer.

however, as long as the delacks can be reactivated without waiting for
dozens of packets, it's ok.

> > tcp_enter_quickack_mode is called every time we have to disable delayed
> > acks like when we send duplicate acks or when there's packet reordering
> > or whatever similar error.
> 
> Also correct. Delacks are disabled during recovery periods.

sure, delacks must be disabled until the ofo queue is empty again.

> > how can 'pingpong' relate to the direction of the stream? I see no
> > relation at all.
> 
> It is set when we see traffic in both directions. It is cleared
> when we see the first delack timeout. Logically, it should be cleared
> when we do not see data flowing in the opposite direction for some time,
> but as long as we do not see delack timeouts, it does not matter.
> 
> 
> > since it's never re-activated,
> 
> If you do not see any delack timeouts, clearing pingpong does not make
> a difference.

I see occasional delack timeouts during streaming because the streamer simply
waits; the bandwidth of the link is higher than the streamer's.

> > this is only true if pingpong was just 0. but if pingpong is 0 it won't
> > send delayed acks in the first place because quick will very rarely get
> > down to 0.
> 
> Stop here. quick must quickly become zero. In your case, when the window
> is one packet, it happens exactly after the first packet.

there must be something that forbids it because I get immediate acks
instead.

> 
> I am confused. Please, check.
> 
> 
> >             segments there SHOULD be an ACK for at least every second
> >             segment.
> 
> SHOULD, not MUST. :-)
> 
> Jokes apart, it is simply a wrong statement. The right one reads: "when the right
> edge of the window has advanced by at least two segments". It is supposed to provide
> an ACK clock, but when the window is stalled, such acks are pure abuse; they are simply
> ignored by the clocking mechanism.
> 
> 
> > 			if (eaten) {
> > 				if (tcp_in_quickack_mode(tp)) {
> > 					tcp_send_ack(sk);
> > 				} else {
> > 					tcp_send_delayed_ack(sk);
> > 				}
> > 
> > it's not checking if more than one segment arrived.
> 
> "eaten" is a special path; it happens when this function is a subroutine
> of tcp_recvmsg(), where the same code is executed upon return
> from the function.

so is the ack sent elsewhere if this was the third packet and there's a
window advance?

Andrea

* Re: 2.4 delayed acks don't work, fixed
@ 2003-03-18 22:35 ` kuznet
From: kuznet @ 2003-03-18 22:35 UTC
  To: Andrea Arcangeli; +Cc: linux-kernel, davem, ak

Hello!

> sure, delacks must be disabled until the ofo queue is empty again.

We make our best efforts to disable them until the sender completes slow start,
i.e. half of the advertised window.

Again, this does not matter in your case: you have a window of one (rarely, two)
packets throughout the whole tcpdump. And this bothers me a lot, much more than
ack frequency.


> I see occasional delack timeouts during streaming because the streamer simply
> waits; the bandwidth of the link is higher than the streamer's.

I do not understand this, to be honest. What does clock this sender?
Some internal clock of sender?

If it is clocked not by acks, and its clock is slower than the ack clock,
then I daresay we do absolutely the right thing (modulo the funny window).



> there must be something that forbids it because I get immediate acks
> instead.

Adding to:

> > I am confused. Please, check.

... it is the thing to understand. I still do not.


> so is the ack sent elsewhere if this was the third packet and there's a
> window advance?

It is sent when cleanup_rbuf() is called; this happens when the function
returns and no more skbs are _already_ queued in the backlog. Until that time
it does not make sense to send ACKs. On tcpdump you would see it as a burst
of ACKs, spaced by microsecond intervals.
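For reference, a sketch of the unpatched check in `cleanup_rbuf()` that decides whether the deferred ACK goes out, taken from the `-` lines of the `tcp.c` hunk in the original patch; it is evaluated only when an ACK is already scheduled. `struct rbuf_model` and `time_to_ack_old()` are hypothetical, with fields standing in for `tp->ack.blocked`, `tp->rcv_nxt`, `tp->rcv_wup`, `tp->ack.rcv_mss`, `copied`, `TCP_ACK_PUSHED`, `tp->ack.pingpong` and `sk->rmem_alloc`.

```c
#include <assert.h>

/* Hypothetical stand-in for the socket state cleanup_rbuf() inspects. */
struct rbuf_model {
	int blocked;         /* tp->ack.blocked: delayed ACK hit a locked socket */
	unsigned rcv_nxt;    /* next sequence number expected */
	unsigned rcv_wup;    /* rcv_nxt at the time of the last window update */
	unsigned rcv_mss;    /* tp->ack.rcv_mss */
	int copied;          /* bytes this read copied to userspace */
	int pushed;          /* TCP_ACK_PUSHED set in tp->ack.pending */
	int pingpong;        /* tp->ack.pingpong */
	int rmem_alloc;      /* bytes still held in the receive queue */
};

/* Unpatched test, evaluated only when an ACK is already scheduled. */
static int time_to_ack_old(const struct rbuf_model *s)
{
	assert(s->rcv_mss > 0);
	return s->blocked
	    /* once-per-two-segments ACK was not sent by tcp_input.c */
	    || s->rcv_nxt - s->rcv_wup > s->rcv_mss
	    /* read emptied the buffer on a non-interactive session
	     * after a small pushed segment */
	    || (s->copied > 0 && s->pushed && !s->pingpong
		&& s->rmem_alloc == 0);
}
```

Two unacknowledged full-sized segments, a blocked socket, or a drained buffer after a pushed segment on a non-pingpong session each make the pending ACK go out at this point.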

Alexey

* Re: 2.4 delayed acks don't work, fixed
@ 2003-03-19  0:37 ` David S. Miller
From: David S. Miller @ 2003-03-19  0:37 UTC
  To: andrea; +Cc: kuznet, linux-kernel, ak

   From: Andrea Arcangeli <andrea@suse.de>
   Date: Wed, 19 Mar 2003 01:24:09 +0100

   On Wed, Mar 19, 2003 at 01:35:23AM +0300, kuznet@ms2.inr.ac.ru wrote:
   > I do not understand this, to be honest. What does clock this sender?
   > Some internal clock of sender?
   
   I don't know the details of the userspace, but the data is generated in
   real time, it's like if you cat /dev/dsp | netcat -l on the server, and
   the receiver does netcat streamer xx >/dev/dsp
   
This streamer application should buffer at the sending side, in order
to keep the window full.  Introducing artificial delays on the sending
side of a unidirectional TCP transfer is really bad for performance
and I can assure you that more than just "weird delayed ACK" behavior
will result.

In fact, it is the most suboptimal way to send data over a TCP socket.
If you can't keep the window full, you do not end up using the
bandwidth available on the path.
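The arithmetic behind this point is that a window-limited TCP connection delivers at most one window per round trip, so throughput ≤ window / RTT. Here is a minimal C sketch of that bound. The RTT values used with it are hypothetical, because the thread never states the path RTT.

```c
#include <assert.h>

/* Upper bound on throughput (bytes per second) for a connection that can
 * keep at most `window_bytes` of unacknowledged data in flight. */
double max_throughput(double window_bytes, double rtt_seconds)
{
    assert(rtt_seconds > 0.0);
    return window_bytes / rtt_seconds;
}
```

For example, with the 5840-byte windows seen in the traces and an assumed 80 ms RTT, the bound is 73000 bytes/s. A sender that keeps less than a full window in flight achieves less than that.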

I would not be surprised if the news pulling case you mentioned does
something similar.


* Re: 2.4 delayed acks don't work, fixed
  2003-03-19  0:37             ` David S. Miller
@ 2003-03-19  0:58               ` Andrea Arcangeli
  2003-03-19  1:33                 ` Help with patch for vesafbd support again? Kendall Bennett
  2003-03-19  1:55               ` 2.4 delayed acks don't work, fixed Andi Kleen
  1 sibling, 1 reply; 14+ messages in thread
From: Andrea Arcangeli @ 2003-03-19  0:58 UTC (permalink / raw)
  To: David S. Miller; +Cc: kuznet, linux-kernel, ak

On Tue, Mar 18, 2003 at 04:37:01PM -0800, David S. Miller wrote:
>    From: Andrea Arcangeli <andrea@suse.de>
>    Date: Wed, 19 Mar 2003 01:24:09 +0100
> 
>    On Wed, Mar 19, 2003 at 01:35:23AM +0300, kuznet@ms2.inr.ac.ru wrote:
>    > I do not understand this, to be honest. What does clock this sender?
>    > Some internal clock of sender?
>    
>    I don't know the details of the userspace, but the data is generated in
>    real time, it's like if you cat /dev/dsp | netcat -l on the server, and
>    the receiver does netcat streamer xx >/dev/dsp
>    
> This streamer application should buffer at the sending side, in order
> to keep the window full.  Introducing artificial delays on the sending

The /dev/dsp netcat was a dumb example; the app does proper buffering, of
course.

> side of a unidirectional TCP transfer is really bad for performance
> and I can assure you that more than just "weird delayed ACK" behavior
> will result.
> 
> In fact, it is the most suboptimal way to send data over a TCP socket.
> If you can't keep the window full, you do not end up using the
> bandwidth available on the path.
> 
> I would not be surprised if the news pulling case you mentioned does
> something similar.

The receive window looks full too, doesn't it? The sender pumps the data and
keeps pumping until the window is full. There is an initial buffering at
the start of playback that should take care of this.

But the data is also generated in real time, so although the window is full,
I guess new data to send is not necessarily available immediately as soon
as the receive window grows.

I don't see obvious inefficiencies in the protocol.

For example, this is a tcpdump taken after 23 minutes (with my patch applied,
of course, or you would see the excessive number of ACKs):

01:48:44.030323 linux.43258 > streamer.8300: . ack 4163660229 win 7300 (DF)
01:48:44.171554 streamer.8300 > linux.43258: . 1:1461(1460) ack 0 win 58400 (DF)
01:48:44.201501 streamer.8300 > linux.43258: . 1461:2921(1460) ack 0 win 58400 (DF)
01:48:44.201557 linux.43258 > streamer.8300: . ack 2921 win 5840 (DF)
01:48:44.371461 streamer.8300 > linux.43258: . 2921:4381(1460) ack 0 win 58400 (DF)
01:48:44.411450 streamer.8300 > linux.43258: . 4381:5841(1460) ack 0 win 58400 (DF)
01:48:44.411500 linux.43258 > streamer.8300: . ack 5841 win 7300 (DF)
01:48:44.461430 streamer.8300 > linux.43258: . 5841:7301(1460) ack 0 win 58400 (DF)
01:48:44.501410 streamer.8300 > linux.43258: . 7301:8761(1460) ack 0 win 58400 (DF)
01:48:44.501461 linux.43258 > streamer.8300: . ack 8761 win 5840 (DF)
01:48:44.771374 streamer.8300 > linux.43258: . 8761:10221(1460) ack 0 win 58400 (DF)
01:48:44.811364 streamer.8300 > linux.43258: . 10221:11681(1460) ack 0 win 58400 (DF)
01:48:44.811413 linux.43258 > streamer.8300: . ack 11681 win 7300 (DF)
01:48:44.851352 streamer.8300 > linux.43258: . 11681:13141(1460) ack 0 win 58400 (DF)
01:48:44.881358 streamer.8300 > linux.43258: . 13141:14601(1460) ack 0 win 58400 (DF)
01:48:44.881405 linux.43258 > streamer.8300: . ack 14601 win 5840 (DF)
01:48:44.951348 streamer.8300 > linux.43258: . 14601:16061(1460) ack 0 win 58400 (DF)
01:48:44.981320 streamer.8300 > linux.43258: . 16061:17521(1460) ack 0 win 58400 (DF)
01:48:44.981381 linux.43258 > streamer.8300: . ack 17521 win 2920 (DF)
01:48:45.021321 streamer.8300 > linux.43258: . 17521:18981(1460) ack 0 win 58400 (DF)
01:48:45.051308 streamer.8300 > linux.43258: . 18981:20441(1460) ack 0 win 58400 (DF)
01:48:45.051355 linux.43258 > streamer.8300: . ack 20441 win 2920 (DF)
01:48:45.294200 linux.43258 > streamer.8300: . ack 20441 win 7300 (DF)
01:48:45.311272 streamer.8300 > linux.43258: . 20441:21901(1460) ack 0 win 58400 (DF)
01:48:45.361297 streamer.8300 > linux.43258: . 21901:23361(1460) ack 0 win 58400 (DF)
01:48:45.361352 linux.43258 > streamer.8300: . ack 23361 win 4380 (DF)
01:48:45.691242 streamer.8300 > linux.43258: . 23361:24821(1460) ack 0 win 58400 (DF)
01:48:45.721194 streamer.8300 > linux.43258: . 24821:26281(1460) ack 0 win 58400 (DF)
01:48:45.721245 linux.43258 > streamer.8300: . ack 26281 win 7300 (DF)
01:48:45.771199 streamer.8300 > linux.43258: . 26281:27741(1460) ack 0 win 58400 (DF)
01:48:45.891161 streamer.8300 > linux.43258: . 27741:29201(1460) ack 0 win 58400 (DF)
01:48:45.891207 linux.43258 > streamer.8300: . ack 29201 win 7300 (DF)
01:48:45.921147 streamer.8300 > linux.43258: . 29201:30661(1460) ack 0 win 58400 (DF)
01:48:45.941150 streamer.8300 > linux.43258: . 30661:32121(1460) ack 0 win 58400 (DF)
01:48:45.941182 linux.43258 > streamer.8300: . ack 32121 win 5840 (DF)
01:48:45.981153 streamer.8300 > linux.43258: . 32121:33581(1460) ack 0 win 58400 (DF)
01:48:46.021133 streamer.8300 > linux.43258: . 33581:35041(1460) ack 0 win 58400 (DF)
01:48:46.021189 linux.43258 > streamer.8300: . ack 35041 win 2920 (DF)
01:48:46.051129 streamer.8300 > linux.43258: . 35041:36501(1460) ack 0 win 58400 (DF)
01:48:46.091112 streamer.8300 > linux.43258: . 36501:37961(1460) ack 0 win 58400 (DF)
01:48:46.091162 linux.43258 > streamer.8300: . ack 37961 win 1460 (DF)
01:48:46.201167 streamer.8300 > linux.43258: . 37961:39421(1460) ack 0 win 58400 (DF)
01:48:46.340311 linux.43258 > streamer.8300: . ack 39421 win 4380 (DF)
01:48:46.461074 streamer.8300 > linux.43258: . 39421:40881(1460) ack 0 win 58400 (DF)
01:48:46.491046 streamer.8300 > linux.43258: . 40881:42341(1460) ack 0 win 58400 (DF)
01:48:46.491100 linux.43258 > streamer.8300: . ack 42341 win 2920 (DF)
01:48:46.531039 streamer.8300 > linux.43258: . 42341:43801(1460) ack 0 win 58400 (DF)
01:48:46.653668 linux.43258 > streamer.8300: . ack 43801 win 4380 (DF)
01:48:46.711062 streamer.8300 > linux.43258: . 43801:45261(1460) ack 0 win 58400 (DF)
01:48:46.771007 streamer.8300 > linux.43258: . 45261:46721(1460) ack 0 win 58400 (DF)
01:48:46.771062 linux.43258 > streamer.8300: . ack 46721 win 2920 (DF)
01:48:46.810011 streamer.8300 > linux.43258: . 46721:48181(1460) ack 0 win 58400 (DF)
01:48:46.880996 streamer.8300 > linux.43258: . 48181:49641(1460) ack 0 win 58400 (DF)
01:48:46.881049 linux.43258 > streamer.8300: . ack 49641 win 2920 (DF)
01:48:46.990978 streamer.8300 > linux.43258: . 49641:51101(1460) ack 0 win 58400 (DF)
01:48:47.030972 streamer.8300 > linux.43258: . 51101:52561(1460) ack 0 win 58400 (DF)
01:48:47.031017 linux.43258 > streamer.8300: . ack 52561 win 1460 (DF)
01:48:47.149944 streamer.8300 > linux.43258: . 52561:54021(1460) ack 0 win 58400 (DF)
01:48:47.330327 linux.43258 > streamer.8300: . ack 54021 win 4380 (DF)
01:48:47.440894 streamer.8300 > linux.43258: . 54021:55481(1460) ack 0 win 58400 (DF)
01:48:47.480858 streamer.8300 > linux.43258: . 55481:56941(1460) ack 0 win 58400 (DF)
01:48:47.480912 linux.43258 > streamer.8300: . ack 56941 win 4380 (DF)
01:48:47.520857 streamer.8300 > linux.43258: . 56941:58401(1460) ack 0 win 58400 (DF)
01:48:47.670328 linux.43258 > streamer.8300: . ack 58401 win 5840 (DF)
01:48:47.760838 streamer.8300 > linux.43258: . 58401:59861(1460) ack 0 win 58400 (DF)
01:48:47.800808 streamer.8300 > linux.43258: . 59861:61321(1460) ack 0 win 58400 (DF)
01:48:47.800862 linux.43258 > streamer.8300: . ack 61321 win 5840 (DF)
01:48:47.890787 streamer.8300 > linux.43258: . 61321:62781(1460) ack 0 win 58400 (DF)
01:48:47.930791 streamer.8300 > linux.43258: . 62781:64241(1460) ack 0 win 58400 (DF)
01:48:47.930857 linux.43258 > streamer.8300: . ack 64241 win 4380 (DF)
01:48:47.960770 streamer.8300 > linux.43258: . 64241:65701(1460) ack 0 win 58400 (DF)
01:48:47.999824 streamer.8300 > linux.43258: . 65701:67161(1460) ack 0 win 58400 (DF)
01:48:47.999853 linux.43258 > streamer.8300: . ack 67161 win 2920 (DF)
01:48:48.050760 streamer.8300 > linux.43258: . 67161:68621(1460) ack 0 win 58400 (DF)
01:48:48.133671 linux.43258 > streamer.8300: . ack 68621 win 2920 (DF)
01:48:48.150747 streamer.8300 > linux.43258: . 68621:70081(1460) ack 0 win 58400 (DF)
01:48:48.290329 linux.43258 > streamer.8300: . ack 70081 win 4380 (DF)
01:48:48.350694 streamer.8300 > linux.43258: . 70081:71541(1460) ack 0 win 58400 (DF)
01:48:48.410696 streamer.8300 > linux.43258: . 71541:73001(1460) ack 0 win 58400 (DF)
01:48:48.410748 linux.43258 > streamer.8300: . ack 73001 win 4380 (DF)
01:48:48.460677 streamer.8300 > linux.43258: . 73001:74461(1460) ack 0 win 58400 (DF)
01:48:48.600333 linux.43258 > streamer.8300: . ack 74461 win 5840 (DF)
01:48:48.790637 streamer.8300 > linux.43258: . 74461:75921(1460) ack 0 win 58400 (DF)
01:48:48.830615 streamer.8300 > linux.43258: . 75921:77381(1460) ack 0 win 58400 (DF)
01:48:48.830669 linux.43258 > streamer.8300: . ack 77381 win 5840 (DF)
01:48:49.050551 streamer.8300 > linux.43258: . 77381:78841(1460) ack 0 win 58400 (DF)
01:48:49.090572 streamer.8300 > linux.43258: . 78841:80301(1460) ack 0 win 58400 (DF)
01:48:49.090639 linux.43258 > streamer.8300: . ack 80301 win 7300 (DF)
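To check the ACK ratio in a trace like the one above, a small C sketch can classify each tcpdump line by direction. The endpoint names `streamer.8300` and `linux.43258` come from the trace. Matching on the "src > dst" text is an assumption about this particular output format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct trace_counts {
    size_t data;  /* segments sent by the streamer */
    size_t acks;  /* ACKs sent back by the Linux receiver */
};

/* Count data segments and ACKs in an array of tcpdump output lines. */
struct trace_counts count_trace(const char *const lines[], size_t n)
{
    struct trace_counts c = {0, 0};

    for (size_t i = 0; i < n; i++) {
        assert(lines[i] != NULL);
        if (strstr(lines[i], "streamer.8300 > linux.43258"))
            c.data++;
        else if (strstr(lines[i], "linux.43258 > streamer.8300"))
            c.acks++;
    }
    return c;
}
```

You can feed it the whole excerpt to compare ACKs against data segments. Delayed ACKs aim for roughly one ACK per two full-sized segments.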

The receive window is almost full: netstat constantly shows 65853/69628 in
the receive queue, ready to be copied into userspace at the next recvmsg.

Andrea


* Help with patch for vesafbd support again?
  2003-03-19  0:58               ` Andrea Arcangeli
@ 2003-03-19  1:33                 ` Kendall Bennett
  2003-03-19  3:00                   ` Randy.Dunlap
  0 siblings, 1 reply; 14+ messages in thread
From: Kendall Bennett @ 2003-03-19  1:33 UTC (permalink / raw)
  To: linux-kernel

Hi Guys,

Well, I finally managed to dig up the old code that Aki Laukkanen developed
sometime in early 2000. Unfortunately Aki died sometime in January 2001,
so his work on the vesafbd daemon and patches to the vesafb device driver
were lost - until now.

I would like to revive this project, and the code I received from Matan
Ziv-Av still configures and compiles correctly on Red Hat 8.0. I need to
patch the latest kernel vesafb driver, but I think his patch is very old
(probably from around the 2.2.14 timeframe). I am grabbing the 2.2.14 code to see
if the patch will apply to that code, and then try to port the patch to
the latest kernel release. Which brings up the first question: what
kernel version should I patch against, 2.4.x or 2.5.x?

However, since I am not that familiar with the patching mechanism for the
Linux kernel, would someone more familiar with it be willing to help
out? I would like to modify the vesafb module in the kernel to optionally
support the vesafbd daemon if it is present on the system; if not, it will
function as it does today. If vesafbd is present, it will be used to
provide extended features to the default VESA framebuffer console driver.

I would also like to generalise the daemon module a bit so that it does
not need to be a VESA-specific daemon, but could in fact contain its own
hardware interfacing module. For instance, the daemon could use XFree86
loadable driver modules to implement the functions rather than the VESA
interface code, which would also open up the option of doing accelerated
screen blits using the existing XFree86 driver modules. Hence I was
thinking that the name 'vesafbd' for the daemon is a misnomer and should
probably be changed to something else, like 'fbcond'. Any
suggestions? Or should we just leave it as 'vesafbd' even though it could
be updated to support more than just the VESA BIOS interface?

Also, the code I have right now for the daemon relies on the /dev/vesafb
special file having been created; it is used as the communication
mechanism between the modified vesafb kernel driver and the daemon code.
The daemon simply reads continuously from /dev/vesafb for command packets
to process and writes the results to /dev/vesafb. Some people suggested
in the past that a better approach might be either to use extended
ioctl()s on the existing /dev/fb special file, and have the kernel
module sleep until it needs to do something, or to use other polling methods
(with which I am not familiar). I would like some guidance here as to the
best way to implement this daemon if people think it should be changed.
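For readers unfamiliar with this design, here is a minimal C sketch of a read/process/write loop over a file descriptor. For the daemon described above, both descriptors would be the same open /dev/vesafb. The following names are all hypothetical:

- the packet layouts `struct fbd_cmd` and `struct fbd_result`;
- the handler type `fbd_handler`;
- the trivial `fbd_ack_all()` handler.

The real vesafbd protocol is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical command packet sent by the kernel driver. */
struct fbd_cmd {
    uint32_t op;    /* e.g. set mode, set palette, pan display */
    uint32_t arg;
};

/* Hypothetical result packet written back by the daemon. */
struct fbd_result {
    uint32_t op;
    int32_t  status;
};

typedef int32_t (*fbd_handler)(const struct fbd_cmd *cmd);

/* Example handler that reports success for every command. */
int32_t fbd_ack_all(const struct fbd_cmd *cmd)
{
    (void)cmd;
    return 0;
}

/* Read fixed-size command packets from in_fd until EOF, pass each one to
 * the handler and write a result packet to out_fd.
 * Returns the number of commands processed, or -1 on error. */
long fbd_serve(int in_fd, int out_fd, fbd_handler handle_cmd)
{
    struct fbd_cmd cmd;
    long handled = 0;

    assert(handle_cmd != NULL);
    for (;;) {
        ssize_t n = read(in_fd, &cmd, sizeof cmd);
        if (n == 0)
            return handled;              /* EOF: driver side closed */
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted: retry the read */
            return -1;
        }
        if ((size_t)n != sizeof cmd)
            return -1;                   /* short packet: protocol error */

        struct fbd_result res = { cmd.op, handle_cmd(&cmd) };
        if (write(out_fd, &res, sizeof res) != (ssize_t)sizeof res)
            return -1;
        handled++;
    }
}
```

An ioctl()-based design would replace the read()/write() pair with a blocking ioctl() that returns the next command and accepts the previous result. That is one call per round trip instead of two.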

Finally, before I embark on this project, will this patch be
accepted into the kernel source tree? I would hate to spend my time
on it only to find out that the kernel developers don't like it and won't
accept it.

Regards,

---
Kendall Bennett
Chief Executive Officer
SciTech Software, Inc.
Phone: (530) 894 8400
http://www.scitechsoft.com

~ SciTech SNAP - The future of device driver technology! ~



* Re: 2.4 delayed acks don't work, fixed
  2003-03-19  0:37             ` David S. Miller
  2003-03-19  0:58               ` Andrea Arcangeli
@ 2003-03-19  1:55               ` Andi Kleen
  2003-03-19  2:02                 ` David S. Miller
  1 sibling, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2003-03-19  1:55 UTC (permalink / raw)
  To: David S. Miller; +Cc: andrea, kuznet, linux-kernel, ak

> This streamer application should buffer at the sending side, in order
> to keep the window full.  Introducing artificial delays on the sending
> side of a unidirectional TCP transfer is really bad for performance
> and I can assure you that more than just "weird delayed ACK" behavior
> will result.

The broken tail append patch I did some time ago was supposed to address 
that (better merging of writes on the sender side even for non SG
NICs). Perhaps it should be rechecked.

It may fix this.
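As a rough model of what merging writes on the sender side buys, here is a C sketch that appends each application write to the tail of the last segment that is not yet full, and only then opens a new one. It illustrates the general idea only. It is not Andi's patch, and the name `coalesce_writes()` and the fixed MSS are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Split a sequence of application writes into segment sizes, filling the
 * tail segment up to `mss` before opening a new one. Stores the segment
 * lengths in seg[] (capacity max_segs) and returns how many there are. */
size_t coalesce_writes(const size_t writes[], size_t nwrites, size_t mss,
                       size_t seg[], size_t max_segs)
{
    size_t nsegs = 0;

    assert(mss > 0);
    for (size_t i = 0; i < nwrites; i++) {
        size_t left = writes[i];

        while (left > 0) {
            /* Open a new segment if there is none or the tail is full. */
            if (nsegs == 0 || seg[nsegs - 1] == mss) {
                assert(nsegs < max_segs);
                seg[nsegs++] = 0;
            }
            size_t room = mss - seg[nsegs - 1];
            size_t chunk = left < room ? left : room;
            seg[nsegs - 1] += chunk;     /* append to the tail segment */
            left -= chunk;
        }
    }
    return nsegs;
}
```

Take six 500-byte writes with a 1460-byte MSS. Coalesced, they become three segments of 1460, 1460 and 80 bytes. Without merging they could, in the worst case, go out as six small segments.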

-Andi


* Re: 2.4 delayed acks don't work, fixed
  2003-03-19  1:55               ` 2.4 delayed acks don't work, fixed Andi Kleen
@ 2003-03-19  2:02                 ` David S. Miller
  2003-03-19 19:48                   ` Andrea Arcangeli
  0 siblings, 1 reply; 14+ messages in thread
From: David S. Miller @ 2003-03-19  2:02 UTC (permalink / raw)
  To: ak; +Cc: andrea, kuznet, linux-kernel

   From: Andi Kleen <ak@suse.de>
   Date: Wed, 19 Mar 2003 02:55:17 +0100

   > This streamer application should buffer at the sending side, in order
   > to keep the window full.  Introducing artificial delays on the sending
   > side of a unidirectional TCP transfer is really bad for performance
   > and I can assure you that more than just "weird delayed ACK" behavior
   > will result.
   
   The broken tail append patch I did some time ago was supposed to address 
   that (better merging of writes on the sender side even for non SG
   NICs). Perhaps it should be rechecked.
   
   It may fix this.

I think we're talking about independent problems.

This streamer application buffers, but once the buffer is fully pushed
to the other end and the "receiver catches up", we get periodic sends
created at the rate of device data creation.  This cannot fill the
pipe between sender and receiver, thus TCP behaves suboptimally.

TCP needs at least a full window of data on the send side to clock
things properly.  This streamer application doesn't give TCP that
after its initial send buffering has shrunk.
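Here is a back-of-the-envelope way to see this point. Suppose the sender is limited by the rate R at which the application produces data rather than by the window. Then the data in flight is roughly R × RTT, capped by the window, and TCP never gets a full window to clock with. The C sketch below encodes that reasoning. The rates and RTTs used with it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Bytes the sender can keep in flight when the application produces data
 * at `app_rate` bytes/s: the smaller of what the application supplies per
 * round trip and what the window allows. */
double bytes_in_flight(double app_rate, double rtt, double window)
{
    assert(app_rate >= 0.0 && rtt > 0.0 && window >= 0.0);
    double supplied = app_rate * rtt;
    return supplied < window ? supplied : window;
}

/* True when the application, not the window, is the bottleneck. */
bool application_limited(double app_rate, double rtt, double window)
{
    return bytes_in_flight(app_rate, rtt, window) < window;
}
```

Take an assumed 16000 bytes/s stream over a 100 ms RTT path. It keeps about 1600 bytes in flight, well below a 5840-byte window.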


* Re: Help with patch for vesafbd support again?
  2003-03-19  1:33                 ` Help with patch for vesafbd support again? Kendall Bennett
@ 2003-03-19  3:00                   ` Randy.Dunlap
  2003-03-19 19:25                     ` Kendall Bennett
  0 siblings, 1 reply; 14+ messages in thread
From: Randy.Dunlap @ 2003-03-19  3:00 UTC (permalink / raw)
  To: KendallB; +Cc: linux-kernel

> Hi Guys,
>
> Well I managed to finally dig up the old code that Aki Laukkanen deveoped
> sometime in early 2000. Unfortunately Aki died sometime in January 2001, so
> his work on the vesafbd daemon and patches to the vesafb device driver were
> lost - until now.
>
> I would like to revive this project, and the code I received from Matan
> Ziv-Av still configures and compiles correctly on Red Hat 8.0. I need to
> patch the latest kernel vesafb driver, but I think his patch is very old
> (probably around 2.2.14 timeframe). I am grabbing the 2.2.14 code to see if
> the patch will apply to that code, and then try to port the patch to the
> latest kernel release. Which brings up the first question. What kernel
> version should I patch against? 2.4.x or 2.5.x?
>
> However since I am not that familiar with the patching mechanism for the
> Linux kernel, would someone more familiar with this be willing to help out?
> I would like to modify the vesafb module in the kernel to optionally
> support the vesafbd daemon if it is present on the system, if not it will
> function as it does today. If vesafbd is present, it will be used to
> provide extended features to the default VESA framebuffer console driver.
>
> I would also like to generalise the daemon module a bit such that it does
> not need to be a VESA specific daemon, but could in fact contain it's own
> hardware interfacing module. For instance the daemon could use XFree86
> loadable driver modules to implement the functions rather than the VESA
> interface code, which would also open up the option of doing accelerated
> screen blits using the existing XFree86 driver modules. Hence I was
> thinking that the name 'vesafbd' for the daemon is a misnomer and should
> probably be changed to something else like 'fbcond' or something. Any
> suggestions? Or should we just leave it as 'vesafbd' even though it could
> be updated to support more than just the VESA BIOS interface?
>
> Also the code I have right now for the daemon relies on the /dev/vesafb
> special file to have been created, which is used as the communication
> mechanism between the modified vesafb kernel driver and the daemon code.
> The daemon simply constantly reads from /dev/vesafb for command packets to
> process and writes the results to /dev/vesafb. Some people suggested in the
> past that a better approach might be to either use extended ioctl()'s to
> the existing /dev/fb special file, and have the kernel module sleep until
> it needs to do something, or use other polling methods (of which I am not
> familiar). I would like some guidance here as to the best way to implement
> this daemon if people think it should be changed.
>
> Finally, before I embark on this project, will this patch will be accepted
> into the kernel source code tree? I would hate to spend my time on it only
> to find out that the kernel developers don't like it and won't accept it.

Hi,

Can (will) you say *why* you want this?  I can't find that info here.

and can you post the patch file (source code) that you have somewhere,
like a web page (not email if it's large)?

Thanks,
~Randy





* Re: Help with patch for vesafbd support again?
  2003-03-19  3:00                   ` Randy.Dunlap
@ 2003-03-19 19:25                     ` Kendall Bennett
  0 siblings, 0 replies; 14+ messages in thread
From: Kendall Bennett @ 2003-03-19 19:25 UTC (permalink / raw)
  To: linux-kernel

"Randy.Dunlap" <rddunlap@osdl.org> wrote:

> > Finally, before I embark on this project, will this patch will be accepted
> > into the kernel source code tree? I would hate to spend my time on it only
> > to find out that the kernel developers don't like it and won't accept it.
> 
> Can (will) you say *why* you want this?  I can't find that info here.

Why? I thought that would be clearly obvious. Right now with the VESA 
framebuffer device driver you cannot change the mode on the graphics card 
once the system has started. You are also restricted to only working with 
VESA 2.0 cards that have functional 32-bit protected mode functions if 
you wish to support panning and proper color map progamming, which is not 
always possible (believe me, I know of many that will not work unless you 
play magic with the selectors passed to the functions; something Linux 
cannot do).

With the vesafbd driver it is possible to use 'fbset' to change the 
active console display mode at any time after the system has booted, as 
well as use the fallback BIOS functions to program the palette and pan 
the display. Not as fast as the VBE 2.0 functions, but if the VBE 2.0 
functions are broken this is a good compromise.

On top of that I already mentioned the fact that it would allow 
framebuffer console drivers to be developed inside the daemon that could 
be implemented using XFree86 4.0 modules if desired (i.e. sharing source
code with XFree86 rather than having completely separate framebuffer 
console modules developed for all the same graphics cards).

> and can you post the patch file (source code) that you have
> somewhere, like a web page (not email if it's large)? 

The patch is not very large. However, I have put the original release
files up on my private web page for people to download and examine:

http://www.scitechsoft.com/~kendallb/vesafbd/vesafb-20000122.patch

http://www.scitechsoft.com/~kendallb/vesafbd/vesafbd-0.1.tar.gz

Regards,

---
Kendall Bennett
Chief Executive Officer
SciTech Software, Inc.
Phone: (530) 894 8400
http://www.scitechsoft.com

~ SciTech SNAP - The future of device driver technology! ~



* Re: 2.4 delayed acks don't work, fixed
  2003-03-19  2:02                 ` David S. Miller
@ 2003-03-19 19:48                   ` Andrea Arcangeli
  0 siblings, 0 replies; 14+ messages in thread
From: Andrea Arcangeli @ 2003-03-19 19:48 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, kuznet, linux-kernel

On Tue, Mar 18, 2003 at 06:02:19PM -0800, David S. Miller wrote:
> TCP needs at least a full window of data on the send side to clock
> things properly.  This streamer application doesn't give TCP that

Inflating the ato by doubling it is too slow after you have destroyed the ato
information by setting it to 4; this is why it takes so long for 2.4 to clock
things properly. At the very least you should inflate it using the average.
The <= in the window-raise check in recvmsg also generates window
updates too early; I find it better to wait for two packets to be read before
sending the window update. Now that I understand better how the logic in
mainline works, I'll try to make a better patch (not very soon, though,
since I'm busy with other stuff and next week I'll be offline most of
the time).
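Andrea's objection concerns how quickly the delayed-ACK timeout (ato) recovers after being reset to a small minimum. The sketch assumes the "4" is in jiffies at HZ=100, i.e. 40 ms. It contrasts two update rules:

- doubling ato on each delayed-ACK timer expiry;
- blending the measured gap between arrivals into ato.

The constants and function names are assumptions loosely modeled on the 2.4 delayed-ACK logic, not a copy of it.

```c
#include <assert.h>

/* Hypothetical minimum ato in jiffies (assuming HZ=100, i.e. 40 ms). */
#define ATO_MIN 4u

/* Rule 1: on each delayed-ACK timer expiry, double ato, capped at rto. */
unsigned int ato_inflate(unsigned int ato, unsigned int rto)
{
    unsigned int doubled = ato << 1;
    return doubled < rto ? doubled : rto;
}

/* Rule 2: on each data arrival, blend in the gap m since the previous
 * arrival: ato = ato/2 + m, capped at rto. */
unsigned int ato_average(unsigned int ato, unsigned int m, unsigned int rto)
{
    unsigned int blended = (ato >> 1) + m;
    return blended < rto ? blended : rto;
}

/* Timer expiries that doubling needs to climb from ATO_MIN to at least
 * `target` jiffies (requires target <= rto). */
unsigned int expiries_to_reach(unsigned int target, unsigned int rto)
{
    unsigned int ato = ATO_MIN, n = 0;

    assert(target <= rto && rto >= ATO_MIN);
    while (ato < target) {
        ato = ato_inflate(ato, rto);
        n++;
    }
    return n;
}
```

Under these assumptions, doubling needs four timer expiries (4, 8, 16, 32, 64) to get past a 40-jiffy gap. A single blended update with a 20-jiffy gap moves ato from 4 to 22 in one step.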

Andrea


end of thread, other threads:[~2003-03-19 19:38 UTC | newest]

Thread overview: 14+ messages
2003-03-17  8:25 2.4 delayed acks don't work, fixed Andrea Arcangeli
2003-03-18 18:34 ` kuznet
2003-03-18 19:34   ` Andrea Arcangeli
2003-03-18 20:13     ` kuznet
2003-03-18 22:19       ` Andrea Arcangeli
2003-03-18 22:35         ` kuznet
     [not found]           ` <20030319002409.GI30541@dualathlon.random>
2003-03-19  0:37             ` David S. Miller
2003-03-19  0:58               ` Andrea Arcangeli
2003-03-19  1:33                 ` Help with patch for vesafbd support again? Kendall Bennett
2003-03-19  3:00                   ` Randy.Dunlap
2003-03-19 19:25                     ` Kendall Bennett
2003-03-19  1:55               ` 2.4 delayed acks don't work, fixed Andi Kleen
2003-03-19  2:02                 ` David S. Miller
2003-03-19 19:48                   ` Andrea Arcangeli
