From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexey Kuznetsov
Subject: Re: TCP packet size and delivery packet decisions
Date: Wed, 8 Sep 2010 16:14:50 +0400
Message-ID: <20100908121450.GA11412@ms2.inr.ac.ru>
References: <20100906.223010.173858342.davem@davemloft.net>
 <1283859552.2338.402.camel@edumazet-laptop>
 <20100907.201843.179933180.davem@davemloft.net>
In-Reply-To: <20100907.201843.179933180.davem@davemloft.net>
To: David Miller
Cc: ilpo.jarvinen@helsinki.fi, eric.dumazet@gmail.com,
 leandroal@gmail.com, netdev@vger.kernel.org

Hello!

> [ Alexey, problem is that when receiver's maximum window is miniscule
>   (f.e. equal to MSS :-), we never send full MSS sized frames due to
>   our sender size SWS implementation. ]

I see.

The problem was that we do early packetization. If we chop frames to
the real mss (> max_window/2), we cannot do #3
(min(D,U) >= Fs * Max(SND.WND)) without subsequent fragmentation.

The solution to chop to max_window/2 was mine, and it was intended to
solve the problem for devices with _large_ mtu, where max_window is
comparable with mtu not because the window is small, but because the
mtu is large. This case was important from a performance viewpoint:
fragmentation would destroy our smart early packetization technique.

As for the case of sane mtu and small max_window, it was not simply
ignored as "not-so-important"; it also had a rational explanation,
see below.

If the issue must be resolved, I would suggest:

1. Complicate mss = min(mss, max_window/2). Probably, do something like:

	if (max_window >= 65536 /* just a guess */)
		mss = min(mss, max_window/2);
	else
		mss = min(mss, max_window);

2. Add SWS avoidance checks in tcp_write_xmit(). It should detect the
   condition end_seq > tcp_wnd_end(tp) (as it does now), but proceed
   with fragmentation when

	tp->snd_nxt == tp->snd_una && tp->snd_wnd >= tp->max_window/2

   Luckily, all this logic is already there due to TSO; only the
   conditions for when to fragment have to be adjusted a little.

Frankly, I am not so sure that the issue should be resolved.

There is one more aspect, not related to SWS. When mss > max_window/2
we can have only one segment in the pipe, which is not good. When
mss == max_window and we never see a full sized frame sent, this looks
strange, but I bet it is still better for performance under almost any
circumstances.

I do not know the actual context, of course. I can guess the situation
is something like this: "I set window=mss exactly to see only one
packet in flight all the times. Why the hell linux tries to send two
mss/2 sized?" :-)

> In ancient times we used to do this straight in sendmsg(), which had
> the comment:
>
> 	/* We also need to worry about the window. If
> 	 * window < 1/2 the maximum window we've seen
> 	 * from this host, don't use it. This is
> 	 * sender side silly window prevention, as
> 	 * specified in RFC1122. (Note that this is
> 	 * different than earlier versions of SWS
> 	 * prevention, e.g. RFC813.). What we
> 	 * actually do is use the whole MSS. Since
> 	 * the results in the right edge of the packet
> 	 * being outside the window, it will be queued
> 	 * for later rather than sent.
> 	 */

BTW the comment was good, but this logic was not actually implemented.
The code below this comment was incorrect: it chopped segments at the
tail of write_queue (with seq > snd_nxt) based on a window calculation
made at the head of the queue, so it did not work. Actually, this
check cannot be done earlier than in tcp_write_xmit().

Alexey
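
For reference, the two suggestions in the mail can be restated as a
small standalone C sketch. This is only an illustration and not kernel
code: struct toy_tcp, clamp_mss(), toy_wnd_end() and
may_fragment_to_window() are hypothetical names standing in for the
real struct tcp_sock fields and tcp_wnd_end(), and the 65536 cutoff is
the value the mail itself labels "just a guess".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cutoff; the mail calls 65536 "just a guess". */
#define LARGE_WINDOW_GUESS 65536u

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Suggestion 1: clamp the mss to max_window/2 only when the window is
 * large (the large-mtu case); otherwise clamp to the whole max_window
 * so that a full sized frame can still be built.
 */
static uint32_t clamp_mss(uint32_t mss, uint32_t max_window)
{
	if (max_window >= LARGE_WINDOW_GUESS)
		return min_u32(mss, max_window / 2);
	return min_u32(mss, max_window);
}

/* Toy stand-in for the few struct tcp_sock fields used in the mail. */
struct toy_tcp {
	uint32_t snd_una;	/* first unacknowledged sequence number */
	uint32_t snd_nxt;	/* next sequence number to send */
	uint32_t snd_wnd;	/* window currently offered by the peer */
	uint32_t max_window;	/* largest window ever seen from the peer */
};

/* Right edge of the send window (hypothetical analogue of tcp_wnd_end()). */
static uint32_t toy_wnd_end(const struct toy_tcp *tp)
{
	return tp->snd_una + tp->snd_wnd;
}

/*
 * Suggestion 2: when a segment ends beyond the window, fragment it to
 * fit only if nothing is in flight and the offered window is at least
 * half of the maximum window seen so far.
 */
static bool may_fragment_to_window(const struct toy_tcp *tp, uint32_t end_seq)
{
	assert(tp);

	/* Wraparound-safe form of "end_seq > wnd_end". */
	if ((int32_t)(end_seq - toy_wnd_end(tp)) <= 0)
		return false;	/* segment fits, nothing to fragment */

	return tp->snd_nxt == tp->snd_una &&
	       tp->snd_wnd >= tp->max_window / 2;
}
```

With these definitions, clamp_mss(1460, 1460) leaves the mss at 1460
instead of halving it, while clamp_mss(65000, 65536) still clamps to
max_window/2, which is the large-mtu case the original clamp was meant
for.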