From: Eric Dumazet <eric.dumazet@gmail.com>
To: Willy Tarreau <w@1wt.eu>
Cc: Arnaud Ebalard <arno@natisbad.org>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	edumazet@google.com, linux-arm-kernel@lists.infradead.org,
	netdev@vger.kernel.org,
	Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Subject: Re: [BUG,REGRESSION?] 3.11.6+,3.12: GbE iface rate drops to few KB/s
Date: Sun, 17 Nov 2013 09:41:38 -0800
Message-ID: <1384710098.8604.58.camel@edumazet-glaptop2.roam.corp.google.com>
In-Reply-To: <20131117141940.GA18569@1wt.eu>

On Sun, 2013-11-17 at 15:19 +0100, Willy Tarreau wrote:

>
> So it is fairly possible that in your case you can't fill the link if you
> consume too many descriptors. For example, if your server uses TCP_NODELAY
> and sends incomplete segments (which is quite common), it's very easy to
> run out of descriptors before the link is full.

BTW I have a very simple patch for the TCP stack that could help in this
exact situation...

The idea is to use TCP Small Queues so that we don't fill the qdisc/TX ring
with very small frames, and give tcp_sendmsg() more chances to build
complete packets.

Again, for this to work well, the NIC needs to perform TX completion in a
reasonable amount of time...

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3dc0c6c..10456cf 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -624,13 +624,19 @@ static inline void tcp_push(struct sock *sk, int flags, int mss_now,
 {
 	if (tcp_send_head(sk)) {
 		struct tcp_sock *tp = tcp_sk(sk);
+		struct sk_buff *skb = tcp_write_queue_tail(sk);
 
 		if (!(flags & MSG_MORE) || forced_push(tp))
-			tcp_mark_push(tp, tcp_write_queue_tail(sk));
+			tcp_mark_push(tp, skb);
 
 		tcp_mark_urg(tp, flags);
-		__tcp_push_pending_frames(sk, mss_now,
-					  (flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
+		if (flags & MSG_MORE)
+			nonagle = TCP_NAGLE_CORK;
+		if (atomic_read(&sk->sk_wmem_alloc) > 2048) {
+			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
+			nonagle = TCP_NAGLE_CORK;
+		}
+		__tcp_push_pending_frames(sk, mss_now, nonagle);
 	}
 }
 
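The decision the patch adds to tcp_push() can be modelled in user space. The following is only a sketch under stated assumptions, not kernel code: `struct sketch_sock`, `sketch_push_mode()` and the `SKETCH_*` constants are hypothetical names invented for illustration, their numeric values are not the kernel's, and `wmem_alloc` merely stands in for `sk->sk_wmem_alloc`. Only the 2048 threshold and the order of the two checks are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
#define SKETCH_MSG_MORE    0x1
#define SKETCH_NAGLE_CORK  0x2
#define SKETCH_TSQ_LIMIT   2048	/* threshold used in the patch */

struct sketch_sock {
	size_t wmem_alloc;	/* models sk->sk_wmem_alloc */
	bool tsq_throttled;	/* models the TSQ_THROTTLED bit */
};

/*
 * Return the Nagle mode that would be handed to the push routine.
 * The socket is corked either because the caller asked for it
 * (MSG_MORE) or because more than SKETCH_TSQ_LIMIT bytes are still
 * accounted to the socket's transmit path; in the second case the
 * throttled flag is also set.
 */
static int sketch_push_mode(struct sketch_sock *sk, int flags, int nonagle)
{
	assert(sk != NULL);

	if (flags & SKETCH_MSG_MORE)
		nonagle = SKETCH_NAGLE_CORK;

	if (sk->wmem_alloc > SKETCH_TSQ_LIMIT) {
		sk->tsq_throttled = true;
		nonagle = SKETCH_NAGLE_CORK;
	}

	return nonagle;
}
```

With little data outstanding the caller's Nagle mode passes through unchanged, so small writes still go out immediately. Once the outstanding amount exceeds the limit the socket behaves as if corked, which, per the message above, only works well if TX completions arrive in a reasonable amount of time.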