linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: kuznet@ms2.inr.ac.ru
Cc: linux-kernel@vger.kernel.org, davem@redhat.com, ak@suse.de
Subject: Re: 2.4 delayed acks don't work, fixed
Date: Tue, 18 Mar 2003 20:34:58 +0100	[thread overview]
Message-ID: <20030318193458.GY30541@dualathlon.random> (raw)
In-Reply-To: <200303181834.VAA04953@sex.inr.ac.ru>

On Tue, Mar 18, 2003 at 09:34:50PM +0300, kuznet@ms2.inr.ac.ru wrote:
> Hello!
> 
> > Apparently linux only waits 0.2 at max,
> 
> This is not true, the maximum is 0.5 in your case.

what is the point of this:

#define TCP_DELACK_MAX	((unsigned)(HZ/5))	/* maximal time to delay before sending an ACK */

if the above means the maximal time to delay before sending an ack in
0.5 and not 0.2?

> > 1) the delayed ack timer destroy the ato value resetting it to the min
> >    value (40msec) and the quickack mode is activated (pingpong = 0)
> 
> This is not true, delack timer inflates ato. pingpong=0 is not quickack
> mode, it means that the session is unidirectional stream, which
> is correct in your case.

pingpong is set to 0 only by "tcp_enter_quickack_mode" and later by:

			/* RFC2581. 4.2. SHOULD send immediate ACK, when
					 ^^^^^^^^^^^^^^^^^^^^^^^^^^
			 * gap in queue is filled.
			 */
			if (skb_queue_len(&tp->out_of_order_queue) == 0)
				tp->ack.pingpong = 0;

and finally by the delack timer (if it was set to 1):

	if (tcp_ack_scheduled(tp)) {
		if (!tp->ack.pingpong) {
			/* Delayed ACK missed: inflate ATO. */
			tp->ack.ato = min(tp->ack.ato << 1, tp->rto);
		} else {
			/* Delayed ACK missed: leave pingpong mode and
			 * deflate ATO.
			 */
			tp->ack.pingpong = 0;
			^^^^^^^^^^^^^^^^^^^^
			tp->ack.ato = TCP_ATO_MIN;
		}

tcp_enter_quickack_mode is called every time we have to disable delayed
acks like when we send duplicate acks or when there's packet reordering
or whatever similar error.

how can 'pingpong' relate to the direction of the stream? I see no
relation at all. That means quickack mode according to the code.
'pingpong' has no clue of what I'm sending, it only knows if I'm
receiving fine and in turn if I can delay the ack.

since it's never re-activated, the tcp stack assumes I'm not receiving
fine, right after the first error, and forever until I close the socket.

> > 2) the pingpong is never re-activated,
> 
> It MUST NOT. It is activated on transactional sessions only.

And according to tcp_in_quickack_mode this means you'll have a chance to
send a delayed ack only after "tp->ack.quick" have been sent out, after
an error of some sort (because whatever error triggers a
tcp_incr_quickack, tcp_enter_quickack_mode calls tcp_incr_quickack
infact).

> > 3) the ato averaging logic during the packet reception will not inflate
> >    the ato if "m > ato" which is obviously the case after a delack timer
> >    triggered and in turn after the ato is been deflated to its min value
> 
> When m > ato, the sample is invalid, apparently it is triggered by
> a random delay at sender. When real ato increases, increase
> is made in delack timer, not through estimator.

this is only true if pingpong was just 0. but if pingpong is 0 it won't
send delayed acks in the first place because quick will very rarely get
down to 0. The streamer delays the next packet of something longer than
ato once in a while, but for all other packets delayed acks could work
perfectly. Only once in a while the delack timer will trigger.

> > 4) the logic that bounds the delayed ack to the srtt >> 3 looks also
> >    risky, using the rto looks much safer to me there, to be sure
> >    those delacks aren't going to trigger too early
> 
> It is necessary to provide more or less sane behaviour on interactive
> session when ato > 100msec. Clamping by rto just does not make any sense.

ok.

> > 5) I suspect the current delack algorithm can wait more than 2 packets,
> 
> Yes, when window is not opening, it is not required. Delack is send
> when window is advanced.

The RFC 1122 says:


	    [..] in a stream of full-sized
            segments there SHOULD be an ACK for at least every second
            segment.

unless you can point me to a more recent RFC where the above text is
been obsoleted, you're wrong. After the second packet the delack must be
sent _always_ according to rfc 1122 (at least if they're full-sized like
in my workload). No matter if this generated a window advance or not.
2.2 did it right. Maybe I'm reading obsolete specifications? Could you
point me to the RFC that speaks about the pingpong directional stream as
well?

I also wonder if this is correct:

			if (eaten) {
				if (tcp_in_quickack_mode(tp)) {
					tcp_send_ack(sk);
				} else {
					tcp_send_delayed_ack(sk);
				}

it's not checking if more than one segment arrived.

> Shortly, I still do not understand what kind of pathalogy happens in your
> case (particularly, difference in adevrtised window before and after
> applying your patch is confusing _a_ _lot_, I really would like

ok, the window updates doesn't matter much, they're a separate issue and
what I did of delaying the window update if a delack is probably not RFC
compliant either even if I liked it, I should have sent two separate
patches for clarity.

> to look at larger tcpdump, covering beggining of the sssion),
> but all the 5 items are surely wrong.

ok, I need to recompile the kernel and reboot my desktop for generating
it, I will try to send a full trace shortly, but it will be exactly like
the one that I posted but replicated forever. the start of the trace
will be different, and in the start of the trace the delayed acks will
work correctly, after they get disabled (or as you say the stream gets
detected as ""unidirectional"" [i.e. quickack forever IMHO]) the delayed
acks stops working completely forever and my machine sends the double
number of packets than it was really necessary if delacks would work as
I expect by reading the RFC like in the start of the trace.

I'm not aware if there's a more uptodate draft defining new behaviour of
delayed acks different from rfc 1122, let me know if that's the case of
course.

> Unnumbered 6th one may be right, the heuristic with expansion twice
> have no explanation, I think it can be relaxed even more.

Ok fine. I guess it matters more in my case because the window was
small, so when changing from 1460 to 2920 was just triggering a windows
update for just 1 packet read in userspace, while excluding the == case,
the window update is sent only when at least two packets are read in
userspace that better fits the delayed ack behavior and removes lots of
spurious window updates.

Andrea

  reply	other threads:[~2003-03-18 19:24 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-03-17  8:25 2.4 delayed acks don't work, fixed Andrea Arcangeli
2003-03-18 18:34 ` kuznet
2003-03-18 19:34   ` Andrea Arcangeli [this message]
2003-03-18 20:13     ` kuznet
2003-03-18 22:19       ` Andrea Arcangeli
2003-03-18 22:35         ` kuznet
     [not found]           ` <20030319002409.GI30541@dualathlon.random>
2003-03-19  0:37             ` David S. Miller
2003-03-19  0:58               ` Andrea Arcangeli
2003-03-19  1:33                 ` Help with patch for vesafbd support again? Kendall Bennett
2003-03-19  3:00                   ` Randy.Dunlap
2003-03-19 19:25                     ` Kendall Bennett
2003-03-19  1:55               ` 2.4 delayed acks don't work, fixed Andi Kleen
2003-03-19  2:02                 ` David S. Miller
2003-03-19 19:48                   ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030318193458.GY30541@dualathlon.random \
    --to=andrea@suse.de \
    --cc=ak@suse.de \
    --cc=davem@redhat.com \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).