From: Tom Herbert <tom@herbertland.com>
To: Yann Ylavic <ylavic.dev@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Willy Tarreau <w@1wt.eu>, Tolga Ceylan <tolga.ceylan@gmail.com>,
	Craig Gallek <cgallek@google.com>, Josh Snyder <josh@code406.com>,
	Aaron Conole <aconole@bytheb.org>,
	"David S. Miller" <davem@davemloft.net>,
	Daniel Borkmann <daniel@iogearbox.net>
Subject: Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as drain mode
Date: Thu, 24 Mar 2016 16:54:03 -0700
Message-ID: <CALx6S37uZah89sNgH9wuD1J+_WEhd34Z5zmrnG8Qp-AQ7Ew=Jg@mail.gmail.com>
In-Reply-To: <CAKQ1sVPJoaaf1gRfgYPRygfMFVqk08Q0K74wn6kLHV+vy=8w1w@mail.gmail.com>

On Thu, Mar 24, 2016 at 4:40 PM, Yann Ylavic <ylavic.dev@gmail.com> wrote:
> On Thu, Mar 24, 2016 at 11:49 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Thu, 2016-03-24 at 23:40 +0100, Yann Ylavic wrote:
>>
>>> FWIW, I find:
>>>
>>>     const struct bpf_insn prog[] = {
>>>         /* BPF_MOV64_REG(BPF_REG_6, BPF_REG_1) */
>>>         { BPF_ALU64 | BPF_MOV | BPF_X, BPF_REG_6, BPF_REG_1, 0, 0 },
>>>         /* BPF_LD_ABS(BPF_W, 0) R0 = (uint32_t)skb[0] */
>>>         { BPF_LD | BPF_ABS | BPF_W, 0, 0, 0, 0 },
>>>         /* BPF_ALU64_IMM(BPF_MOD, BPF_REG_0, mod) */
>>>         { BPF_ALU64 | BPF_MOD | BPF_K, BPF_REG_0, 0, 0, mod },
>>>         /* BPF_EXIT_INSN() */
>>>         { BPF_JMP | BPF_EXIT, 0, 0, 0, 0 }
>>>     };
>>> (and all the way to make it run)
>>>
>>> something quite unintuitive from a web server developer's perspective,
>>> simply to make SO_REUSEPORT work with forked TCP listeners (probably
>>> as it should out of the box)...
>>
>>
>> That is why EBPF has LLVM backend.
>>
>> Basically you can write your "BPF" program in C, and let llvm convert it
>> into EBPF.
>
> I'll learn how to do this to get the best performance from the
> server, but having to do so to work around what looks like a defect
> (for simple/default SMP configurations at least, no NUMA or clever
> CPU-affinity or queuing policy involved) seems odd in the first place.
>
I disagree with your assessment that there is a defect. SO_REUSEPORT
is designed to spread packets amongst _equivalent_ sockets. In the
server draining case the sockets are no longer equivalent, but that is
a special case.

> From this POV, draining the (ending) listeners is already non-obvious
> but might be reasonable, (e)BPF sounds really overkill.
>
Just the opposite, it's a simplification. With BPF we no longer need to
add interfaces for all these special cases. This is an important point,
because the question is going to be raised for any proposed interface
change that could be accomplished with BPF (i.e. adding new interfaces
in the kernel becomes the overkill).

Please try to work with it. As I mentioned, what we may be missing is
some real-world programs that we can direct people to use, but aside
from that I don't think we've seen any arguments that BPF is overkill
or too hard to use for stuff like this.
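
As a rough illustration of the kind of program meant here, the following
is a minimal sketch, not a tested recipe. It builds a three-instruction
classic BPF filter that returns the current CPU number modulo the number
of listeners; the kernel uses that return value as an index into the
SO_REUSEPORT group. The helper names build_cpu_mod_filter() and
attach_reuseport_cpu_filter() are hypothetical. The sketch assumes the
socket already has SO_REUSEPORT set and that the kernel is Linux 4.5 or
later, where SO_ATTACH_REUSEPORT_CBPF is available. It does nothing
about draining.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/filter.h>

/* Assumption: fallback for older libc headers. 51 is the asm-generic
 * value; a few architectures use a different number. */
#ifndef SO_ATTACH_REUSEPORT_CBPF
#define SO_ATTACH_REUSEPORT_CBPF 51
#endif

/* Hypothetical helper: fill code[] with "return cpu % nsockets". */
void build_cpu_mod_filter(struct sock_filter code[3], unsigned int nsockets)
{
    /* A = CPU handling this packet (ancillary load) */
    code[0] = (struct sock_filter){ BPF_LD | BPF_W | BPF_ABS, 0, 0,
                                    (__u32)(SKF_AD_OFF + SKF_AD_CPU) };
    /* A = A % nsockets */
    code[1] = (struct sock_filter){ BPF_ALU | BPF_MOD | BPF_K, 0, 0,
                                    nsockets };
    /* return A, used as the index of the socket in the group */
    code[2] = (struct sock_filter){ BPF_RET | BPF_A, 0, 0, 0 };
}

/* Hypothetical helper: attach the filter to one socket of the group.
 * Returns 0 on success, -1 with errno set on failure. */
int attach_reuseport_cpu_filter(int fd, unsigned int nsockets)
{
    struct sock_filter code[3];
    struct sock_fprog prog = { .len = 3, .filter = code };

    if (nsockets == 0) {        /* a modulus of zero is never valid */
        errno = EINVAL;
        return -1;
    }
    build_cpu_mod_filter(code, nsockets);
    return setsockopt(fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF,
                      &prog, sizeof(prog));
}
```

A server would call attach_reuseport_cpu_filter(listen_fd, nr_listeners)
on one of its listening sockets. The attached program applies to the
whole reuseport group, so one call should be enough.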

Tom

> But there are surely plenty of good reasons for it, and I won't be
> able to dispute your technical arguments in any case ;)
>
>>
>> Sure, you still can write BPF manually, as you could write HTTPS server
>> in assembly.
>
> OK, I'll take your previous proposal :)
