netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: John Ousterhout <ouster@cs.stanford.edu>
Cc: netdev@vger.kernel.org
Subject: Re: BUG: sk_backlog.len can overestimate
Date: Tue, 1 Oct 2019 11:34:16 -0700	[thread overview]
Message-ID: <47fef079-635d-483e-b530-943b2a55fc22@gmail.com> (raw)
In-Reply-To: <CAGXJAmzHvKzKb1wzxtZK_KCu-pEQghznM4qmfzYmWeWR1CaJ7Q@mail.gmail.com>



On 10/1/19 10:25 AM, John Ousterhout wrote:
> On Tue, Oct 1, 2019 at 9:19 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> ...
>> Sorry, I have no idea what is the problem you see.
> 
> OK, let me try again from the start. Consider two values:
> * sk->sk_backlog.len
> * The actual number of bytes in buffers in the current backlog list
> 
> Now consider a series of propositions:
> 
> 1. These two are not always the same. As packets get processed by
> calling sk_backlog_rcv, they are removed from the backlog list, so the
> actual amount of memory consumed by the backlog list drops. However,
> sk->sk_backlog.len doesn't change until the entire backlog is cleared,
> at which point it is reset to zero. So, there can be periods of time
> where sk->sk_backlog.len overstates the actual memory consumption of
> the backlog.

Yes, this is done on purpose (and documented in __release_sock()

Otherwise you could have a livelock situation, with user thread being
trapped forever in system, and never return to user land.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871


> 
> 2. The gap between sk->sk_backlog.len and actual backlog size can grow
> quite large. This happens if new packets arrive while sk_backlog_rcv
> is working. The socket is locked, so these new packets will be added
> to the backlog, which will increase sk->sk_backlog_len. Under high
> load, this could continue indefinitely: packets keep arriving, so the
> backlog never empties, so sk->sk_backlog_len never gets reset.
> However, packets are actually being processed from the backlog, so
> it's possible that the actual size of the backlog isn't changing, yet
> sk->sk_backlog.len continues to grow.
> 
> 3. Eventually, the growth in sk->sk_backlog.len will be limited by the
> "limit" argument to sk_add_backlog. When this happens, packets will be
> dropped.

_Exactly_ WAI

> 
> 4. Now suppose I pass a value of 1000000 as the limit to
> sk_add_backlog. It's possible that sk_add_backlog will reject my
> request even though the backlog only contains a total of 10000 bytes.
> The other 990000 bytes were present on the backlog at one time (though
> not necessarily all at the same time), but they have been processed
> and removed; __release_sock hasn't gotten around to updating
> sk->sk_backlog.len, because it hasn't been able to completely clear
> the backlog.

WAI

> 
> 5. Bottom line: under high load, a socket can be forced to drop
> packets even though it never actually exceeded its memory budget. This
> isn't a case of a sender trying to fool us; we fooled ourselves,
> because of the delay in resetting sk->sk_backlog.len.
> 
> Does this make sense?

Yes, just increase your socket limits. setsockopt(...  SO_RCVBUF ...),
and risk user threads having bigger socket syscall latencies, obviously.

> 
> By the way, I have actually observed this phenomenon in an
> implementation of the Homa transport protocol.
> 

Maybe this transport protocol should size correctly its sockets limits :)

  reply	other threads:[~2019-10-01 18:34 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAGXJAmwQw1ohc48NfAvMyNDpDgHGkdVO89Jo8B0j0TuMr7wLpA@mail.gmail.com>
2019-09-30 23:58 ` BUG: sk_backlog.len can overestimate John Ousterhout
2019-10-01  0:14   ` Eric Dumazet
2019-10-01 15:44     ` John Ousterhout
     [not found]     ` <CAGXJAmxmJ-Vm379N4nbjXeQCAgY9ur53wmr0HZy23dQ_t++r-Q@mail.gmail.com>
     [not found]       ` <f4520c32-3133-fb3b-034e-d492d40eb066@gmail.com>
2019-10-01 15:46         ` Fwd: " John Ousterhout
2019-10-01 15:48         ` John Ousterhout
2019-10-01 16:19           ` Eric Dumazet
2019-10-01 17:25             ` John Ousterhout
2019-10-01 18:34               ` Eric Dumazet [this message]
2019-10-01 20:45                 ` John Ousterhout
2019-10-01 20:53                   ` Eric Dumazet
2019-10-01 23:01                     ` John Ousterhout

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47fef079-635d-483e-b530-943b2a55fc22@gmail.com \
    --to=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=ouster@cs.stanford.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).