From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: TCPBacklogDrops during aggressive bursts of traffic
Date: Tue, 22 May 2012 18:12:50 +0200
Message-ID: <1337703170.3361.217.camel@edumazet-glaptop>
References: <1337092718.1689.45.camel@kjm-desktop.uk.level5networks.com>
 <1337093776.8512.1089.camel@edumazet-glaptop>
 <1337099368.1689.47.camel@kjm-desktop.uk.level5networks.com>
 <1337099641.8512.1102.camel@edumazet-glaptop>
 <1337100454.2544.25.camel@bwh-desktop.uk.solarflarecom.com>
 <1337101280.8512.1108.camel@edumazet-glaptop>
 <1337272292.1681.16.camel@kjm-desktop.uk.level5networks.com>
 <1337272654.3403.20.camel@edumazet-glaptop>
 <1337674831.1698.7.camel@kjm-desktop.uk.level5networks.com>
 <1337678759.3361.147.camel@edumazet-glaptop>
 <1337679045.3361.154.camel@edumazet-glaptop>
 <1337699379.1698.30.camel@kjm-desktop.uk.level5networks.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Ben Hutchings , netdev@vger.kernel.org
To: Kieran Mansley
Return-path:
Received: from mail-ee0-f46.google.com ([74.125.83.46]:58554 "EHLO
 mail-ee0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1754575Ab2EVQMz (ORCPT ); Tue, 22 May 2012 12:12:55 -0400
Received: by eeit10 with SMTP id t10so1773669eei.19 for ;
 Tue, 22 May 2012 09:12:54 -0700 (PDT)
In-Reply-To: <1337699379.1698.30.camel@kjm-desktop.uk.level5networks.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Tue, 2012-05-22 at 16:09 +0100, Kieran Mansley wrote:
> On Tue, 2012-05-22 at 11:30 +0200, Eric Dumazet wrote:
> > Also can you post a pcap capture of problematic flow ?
>
> I'll email this to you directly. The capture is generated with netserver
> on the system under test, and NetPerf sending from a similar server.
> I've only included the first 1000 frames to keep the capture size down.
> There are 7 retransmissions in that capture, and the TCPBacklogDrops
> counter incremented by 7 during the test, so I'm happy to say they are
> the cause of the drops.
>
> The system under test was running net-next.
>
> I've not tried with another NIC (e.g. tg3) but will see if I can find
> one to test.

Or you could change sfc to allow its frames to be coalesced.

>
> I've got a feeling that the drops might be easier to reproduce if I
> taskset the netserver process to a different package than the one that
> is handling the network interrupt for that NIC. This fits with my
> earlier theory in that it is likely to increase the overhead of waking
> the user-level process to satisfy the read and so increase the time
> during which received packets could overflow the backlog. Having a
> relatively aggressive sending TCP also helps, e.g. one that is
> configured to open its congestion window quickly, as this will produce
> more intensive bursts.

__tcp_select_window() (more precisely tcp_space()) takes into account
the memory used in the receive/ofo queues, but not the frames sitting
in the backlog queue.

So if you send bursts, that might explain why the TCP stack continues
to advertise too big a window instead of anticipating the problem.

Please try the following patch:

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e79aa48..82382cb 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1042,8 +1042,9 @@ static inline int tcp_win_from_space(int space)
 /* Note: caller must be prepared to deal with negative returns */
 static inline int tcp_space(const struct sock *sk)
 {
-	return tcp_win_from_space(sk->sk_rcvbuf -
-				  atomic_read(&sk->sk_rmem_alloc));
+	int used = atomic_read(&sk->sk_rmem_alloc) + sk->sk_backlog.len;
+
+	return tcp_win_from_space(sk->sk_rcvbuf - used);
 }
 
 static inline int tcp_full_space(const struct sock *sk)
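
To make the effect of the change easier to see, here is a rough user-space
sketch of the arithmetic before and after the patch. It is not kernel code:
struct sock_model, space_ignoring_backlog() and space_counting_backlog() are
hypothetical names invented for this illustration, the atomic read is
replaced by a plain int, and adv_win_scale is an assumed stand-in for the
tcp_adv_win_scale sysctl (the value 2 is only an example).

```c
/* Hypothetical, simplified stand-in for the few struct sock fields
 * that tcp_space() looks at. Not the kernel structure. */
struct sock_model {
	int sk_rcvbuf;       /* receive buffer limit, in bytes */
	int sk_rmem_alloc;   /* bytes held in the receive/ofo queues */
	int sk_backlog_len;  /* bytes queued in the socket backlog */
};

/* Assumed example value standing in for the tcp_adv_win_scale sysctl. */
static int adv_win_scale = 2;

/* Models tcp_win_from_space(): reserve part of the free space for
 * buffer overhead before it is advertised as window. */
static int win_from_space(int space)
{
	return adv_win_scale <= 0 ?
		(space >> (-adv_win_scale)) :
		space - (space >> adv_win_scale);
}

/* Before the patch: backlog bytes are invisible to the calculation. */
static int space_ignoring_backlog(const struct sock_model *sk)
{
	return win_from_space(sk->sk_rcvbuf - sk->sk_rmem_alloc);
}

/* After the patch: backlog bytes count as used memory.
 * As in the kernel comment, the result may be negative. */
static int space_counting_backlog(const struct sock_model *sk)
{
	int used = sk->sk_rmem_alloc + sk->sk_backlog_len;

	return win_from_space(sk->sk_rcvbuf - used);
}
```

With made-up numbers sk_rcvbuf = 87380, sk_rmem_alloc = 40000 and
sk_backlog_len = 30000, the first function yields 35535 bytes of space and
the second 13035, so a burst that is still parked in the backlog shrinks
the computed space instead of going unnoticed.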