From: Daniel Dao <dqminh@cloudflare.com>
To: Eric Dumazet <edumazet@google.com>
Cc: netdev <netdev@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Marek Majkowski <marek@cloudflare.com>
Subject: Re: Expensive tcp_collapse with high tcp_rmem limit
Date: Thu, 20 Jan 2022 17:29:52 +0000
Message-ID: <CA+wXwBSGsBjovTqvoPQEe012yEF2eYbnC5_0W==EAvWH1zbOAg@mail.gmail.com>
In-Reply-To: <CANn89iKBqPRHFy5U+SMxT5RUPkioDFrZ5rN5WKNwfzA-TkMhwA@mail.gmail.com>

On Thu, Jan 6, 2022 at 6:55 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Jan 6, 2022 at 10:52 AM Eric Dumazet <edumazet@google.com> wrote:
>
> > I think that you should first look if you are under some kind of attack [1]
> >
> > Eventually you would still have to make room, involving expensive copies.
> >
> > 12% of 16MB is still a lot of memory to copy.
> >
> > [1] Detecting an attack signature could allow you to zap the socket
> > and save ~16MB of memory per flow.

Sorry for the late reply; we spent the past few weeks gathering more data.

>   tid 0: rmem_alloc=16780416 sk_rcvbuf=16777216 rcv_ssthresh=2920
>   tid 0: advmss=1460 wclamp=4194304 rcv_wnd=450560
>   tid 0: len=3316 truesize=15808
>   tid 0: len=4106 truesize=16640
>   tid 0: len=3967 truesize=16512
>   tid 0: len=2988 truesize=15488
> > I think that you should first look if you are under some kind of attack [1]

This, and indeed the majority of similar occurrences, comes from a
websocket origin that can emit a large flow of tiny packets. Since the
tcp_collapse hiccups occur on a proxy node, we think a combination of
slow / unresponsive clients and the websocket traffic can trigger this:
tiny payloads pile up in the receive queue faster than a slow client
lets the proxy drain them.
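
(For scale: at the ~4.8x truesize/len ratio of the first sample above
(15808 / 3316), a 16MB receive budget charged by truesize fills while
holding only roughly 16MB / 4.8 ~= 3.5MB of actual payload.)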

As a workaround, we clamped the websocket sockets' rcvbuf to a smaller
value, which reduces the peak latency of tcp_collapse since we no
longer need to collapse up to 16MB in one go.
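
For reference, the clamp boils down to something like the following
minimal userspace sketch (illustrative only: clamp_rcvbuf is a made-up
helper, not our actual proxy code, and the value we deploy is chosen
per service):

  #include <stdio.h>
  #include <sys/socket.h>

  /* Cap the receive buffer of one socket. SO_RCVBUF overrides the
   * tcp_rmem[2] limit for this socket; the kernel doubles the
   * requested value to leave room for bookkeeping overhead
   * (truesize vs len). */
  static int clamp_rcvbuf(int fd, int bytes)
  {
          if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                         &bytes, sizeof(bytes)) < 0) {
                  perror("setsockopt(SO_RCVBUF)");
                  return -1;
          }
          return 0;
  }

One caveat: setting SO_RCVBUF locks the buffer size and disables
receive-buffer autotuning for that socket, so the clamp trades peak
throughput on long fat paths for bounded collapse latency.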

> What kind of NIC driver is used on your host ?

We are running the mlx5 driver.

> Except that you would still have to parse the linear list.

Most of the time when we see a large tcp_collapse latency, the bloated
skb is at the head of the receive queue. I guess the client is already
unresponsive, so the flow is full of bloated skbs. We would rather not
spend so much time collapsing these skbs.
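
To make "bloated" concrete, here is a rough illustration in C
(skb_sample is a stand-in struct for the two sk_buff fields in the
dump above, and the 2x threshold is our own heuristic for
illustration, not the kernel's exact collapse test):

  /* Stand-in for the two sk_buff fields we dumped above. */
  struct skb_sample {
          unsigned int len;      /* payload bytes */
          unsigned int truesize; /* bytes charged against sk_rcvbuf */
  };

  /* An skb is "bloated" when its memory charge far exceeds its
   * payload; every sample above is well past this 2x threshold
   * (e.g. 15808 vs 3316, about 4.8x). */
  static int skb_looks_bloated(const struct skb_sample *s)
  {
          return s->truesize > 2 * s->len;
  }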

