Netdev Archive on lore.kernel.org
From: Michal Kubecek <mkubecek@suse.cz>
To: netdev@vger.kernel.org
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Christoph Paasch <christoph.paasch@gmail.com>,
	"Prout, Andrew - LLSC - MITLL" <aprout@ll.mit.edu>,
	David Miller <davem@davemloft.net>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jonathan Looney <jtl@netflix.com>,
	Neal Cardwell <ncardwell@google.com>,
	Tyler Hicks <tyhicks@canonical.com>,
	Yuchung Cheng <ycheng@google.com>,
	Bruce Curtis <brucec@netflix.com>,
	Jonathan Lemon <jonathan.lemon@gmail.com>,
	Dustin Marquess <dmarquess@apple.com>
Subject: Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits
Date: Thu, 11 Jul 2019 20:26:54 +0200
Message-ID: <20190711182654.GG5700@unicorn.suse.cz>
In-Reply-To: <eb6121ea-b02d-672e-25c9-2ad054d49fc7@gmail.com>

On Thu, Jul 11, 2019 at 11:19:45AM +0200, Eric Dumazet wrote:
> 
> 
> On 7/11/19 9:28 AM, Christoph Paasch wrote:
> > 
> > 
> >> On Jul 10, 2019, at 9:26 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>
> >>
> >>
> >> On 7/10/19 8:53 PM, Prout, Andrew - LLSC - MITLL wrote:
> >>>
> >>> Our initial rollout was v4.14.130, but I reproduced it with v4.14.132 as well: reliably with the samba test, and once (not reliably) with a synthetic test I was trying. A patched v4.14.132 with this patch partially reverted (just the four lines deleted from tcp_fragment) passed the samba test.
> >>>
> >>> The synthetic test was a pair of simple send/recv test programs under the following conditions:
> >>> -The send socket was non-blocking
> >>> -SO_SNDBUF set to 128KiB
> >>> -The receiver NIC was being flooded with traffic from multiple hosts (to induce packet loss/retransmits)
> >>> -Load was on both systems: a while(1) program spinning on each CPU core
> >>> -The receiver was on an older unaffected kernel
> >>>
> >>
> >> Setting SO_SNDBUF to 128KB does not permit recovery from heavy losses,
> >> since skbs need to be allocated for retransmits.
> > 
> > Would it make sense to always allow the alloc in tcp_fragment when coming from __tcp_retransmit_skb() through the retransmit-timer ?
> 
> 4.15+ kernels have :
> 
> if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf &&
>     tcp_queue != TCP_FRAG_IN_WRITE_QUEUE)) {
> 
> 
> Meaning that things like TLP will succeed.

I get

          <idle>-0     [010] ..s. 301696.143296: p_tcp_fragment_0: (tcp_fragment+0x0/0x310) sndbuf=30000 wmemq=65600
          <idle>-0     [010] d.s. 301696.143301: r_tcp_fragment_0: (tcp_send_loss_probe+0x13d/0x1f0 <- tcp_fragment) ret=-12
          <idle>-0     [010] ..s. 301696.267644: p_tcp_fragment_0: (tcp_fragment+0x0/0x310) sndbuf=30000 wmemq=65600
          <idle>-0     [010] d.s. 301696.267650: r_tcp_fragment_0: (__tcp_retransmit_skb+0xf9/0x800 <- tcp_fragment) ret=-12
          <idle>-0     [010] ..s. 301696.875289: p_tcp_fragment_0: (tcp_fragment+0x0/0x310) sndbuf=30000 wmemq=65600
          <idle>-0     [010] d.s. 301696.875293: r_tcp_fragment_0: (__tcp_retransmit_skb+0xf9/0x800 <- tcp_fragment) ret=-12
          <idle>-0     [010] ..s. 301698.059267: p_tcp_fragment_0: (tcp_fragment+0x0/0x310) sndbuf=30000 wmemq=65600
          <idle>-0     [010] d.s. 301698.059271: r_tcp_fragment_0: (__tcp_retransmit_skb+0xf9/0x800 <- tcp_fragment) ret=-12
          <idle>-0     [010] ..s. 301700.427225: p_tcp_fragment_0: (tcp_fragment+0x0/0x310) sndbuf=30000 wmemq=65600
          <idle>-0     [010] d.s. 301700.427230: r_tcp_fragment_0: (__tcp_retransmit_skb+0xf9/0x800 <- tcp_fragment) ret=-12
          <idle>-0     [010] ..s. 301705.291144: p_tcp_fragment_0: (tcp_fragment+0x0/0x310) sndbuf=30000 wmemq=65600
          <idle>-0     [010] d.s. 301705.291151: r_tcp_fragment_0: (__tcp_retransmit_skb+0xf9/0x800 <- tcp_fragment) ret=-12
          <idle>-0     [010] ..s. 301714.762961: p_tcp_fragment_0: (tcp_fragment+0x0/0x310) sndbuf=30000 wmemq=65600
          <idle>-0     [010] d.s. 301714.762966: r_tcp_fragment_0: (__tcp_retransmit_skb+0xf9/0x800 <- tcp_fragment) ret=-12

on a 5.2 kernel with this packetdrill script:

------------------------------------------------------------------------
--tolerance_usecs=10000

// flush cached TCP metrics
0.000  `ip tcp_metrics flush all`

// establish a connection
+0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.000 setsockopt(3, SOL_SOCKET, SO_SNDBUF, [15000], 4) = 0
+0.000 bind(3, ..., ...) = 0
+0.000 listen(3, 1) = 0

+0.100 < S 0:0(0) win 60000 <mss 1000,nop,nop,sackOK,nop,wscale 7>
+0.000 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
+0.100 < . 1:1(0) ack 1 win 2000
+0.000 accept(3, ..., ...) = 4
+0.100 write(4, ..., 30000) = 30000

+0.000 > . 1:2001(2000) ack 1
+0.000 > . 2001:4001(2000) ack 1
+0.000 > . 4001:6001(2000) ack 1
+0.000 > . 6001:8001(2000) ack 1
+0.000 > . 8001:10001(2000) ack 1
+0.010 < . 1:1(0) ack 10001 win 2000
+0.000 > . 10001:12001(2000) ack 1
+0.000 > . 12001:14001(2000) ack 1
+0.000 > . 14001:16001(2000) ack 1
+0.000 > . 16001:18001(2000) ack 1
+0.000 > . 18001:20001(2000) ack 1
+0.000 > . 20001:22001(2000) ack 1
+0.000 > . 22001:24001(2000) ack 1
+0.000 > . 24001:26001(2000) ack 1
+0.000 > . 26001:28001(2000) ack 1
+0.000 > P. 28001:30001(2000) ack 1
+0.010 < . 1:1(0) ack 30001 win 2000
+0.000 write(4, ..., 40000) = 40000
+0.000 > . 30001:32001(2000) ack 1
+0.000 > . 32001:34001(2000) ack 1
+0.000 > . 34001:36001(2000) ack 1
+0.000 > . 36001:38001(2000) ack 1
+0.000 > . 38001:40001(2000) ack 1
+0.000 > . 40001:42001(2000) ack 1
+0.000 > . 42001:44001(2000) ack 1
+0.000 > . 44001:46001(2000) ack 1
+0.000 > . 46001:48001(2000) ack 1
+0.000 > . 48001:50001(2000) ack 1
+0.000 > . 50001:52001(2000) ack 1
+0.000 > . 52001:54001(2000) ack 1
+0.000 > . 54001:56001(2000) ack 1
+0.000 > . 56001:58001(2000) ack 1
+0.000 > . 58001:60001(2000) ack 1
+0.000 > . 60001:62001(2000) ack 1
+0.000 > . 62001:64001(2000) ack 1
+0.000 > . 64001:66001(2000) ack 1
+0.000 > . 66001:68001(2000) ack 1
+0.000 > P. 68001:70001(2000) ack 1

+0.000 `ss -nteim state established sport == :8080`

+0.120~+0.200 > P. 69001:70001(1000) ack 1
------------------------------------------------------------------------

I'm aware it's not a realistic test. It was written as a quick and simple
check of the pre-4.19 patch, but it shows that even TLP may not get
through.
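
To spell out the arithmetic behind the ret=-12 lines in the trace, here is
a minimal userspace sketch of the quoted 4.15+ condition. It is not kernel
code: frag_limit_check() and its in_write_queue flag are hypothetical
stand-ins for the tcp_fragment() check and for
tcp_queue == TCP_FRAG_IN_WRITE_QUEUE. It assumes that sndbuf=30000 is the
15000 requested in the script after the kernel doubles the SO_SNDBUF
value, and that the TLP and retransmit callers seen in the trace do not
pass TCP_FRAG_IN_WRITE_QUEUE.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the limit check quoted above:
 *   (sk->sk_wmem_queued >> 1) > sk->sk_sndbuf &&
 *   tcp_queue != TCP_FRAG_IN_WRITE_QUEUE
 * Returns -ENOMEM when the fragment would be refused, 0 otherwise.
 */
static int frag_limit_check(int wmem_queued, int sndbuf, bool in_write_queue)
{
	if ((wmem_queued >> 1) > sndbuf && !in_write_queue)
		return -ENOMEM;
	return 0;
}

/* Values taken from the trace: sndbuf=30000 wmemq=65600. */
static void check_trace_values(void)
{
	/* 65600 >> 1 = 32800, which exceeds 30000, so the split is refused. */
	assert(frag_limit_check(65600, 30000, false) == -ENOMEM);
	/* The same numbers pass if the skb is in the write queue. */
	assert(frag_limit_check(65600, 30000, true) == 0);
	/* At wmemq = 60000 the halved value equals sndbuf, so it passes. */
	assert(frag_limit_check(60000, 30000, false) == 0);
}
```

So with these numbers the check only passes once wmemq drops to 60000 or
below, or when the skb being split is still in the write queue.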

Michal
