netdev.vger.kernel.org archive mirror
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Craig Gallek <kraigatgoog@gmail.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>
Subject: Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as drain mode
Date: Fri, 25 Mar 2016 14:31:23 -0400
Message-ID: <CAF=yD-+QgnbM5_LR73NFnzE2AJWwPxkoii30O8bbEwXR3XhCvA@mail.gmail.com>
In-Reply-To: <1458925242.6473.41.camel@edumazet-glaptop3.roam.corp.google.com>

On Fri, Mar 25, 2016 at 1:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2016-03-25 at 12:31 -0400, Craig Gallek wrote:
>
>> I believe the issue here is that closing the listen sockets will drop
>> any connections that are in the listen queue but have not been
>> accepted yet.  In the case of reuseport, you could in theory drain
>> those queues into the non-closed sockets, but that probably has some
>> interesting consequences...
>
> It is more complicated than this.
>
> Ideally, no TCP connection should be dropped during a server change.
>
> The idea is to let the old program keep running as long as:
> 1) It has established TCP sessions
> 2) Some SYN_RECV pseudo requests are still around
>
> Once 3WHS completes for these SYN_RECV, children are queued into
> listener accept queues.
>
> But the idea is to direct all new SYN packets to the 'new' process and
> its listeners. (New SYN_RECV should be created on behalf of the new
> listeners only.)
>
>
> In some environments, the listeners are simply transferred via FD
> passing from the 'old process' to the new one.
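For reference, the FD-passing handoff Eric describes is usually done with an SCM_RIGHTS control message over an AF_UNIX socket. A minimal sketch, assuming the old and new processes already share a connected Unix socket; the helper names send_listener_fd() and recv_listener_fd() are hypothetical:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send one descriptor (e.g. a listening socket)
 * over a connected AF_UNIX socket in an SCM_RIGHTS control message.
 * Returns 0 on success, -1 on failure. */
int send_listener_fd(int unix_sock, int listen_fd)
{
    char byte = 'L';                 /* at least one data byte is sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                          /* properly aligned control buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &listen_fd, sizeof(int));

    return sendmsg(unix_sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_listener_fd().
 * Returns a new fd referring to the same listening socket, or -1. */
int recv_listener_fd(int unix_sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(unix_sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor refers to the same listening socket, so its accept queue and pending SYN_RECV requests carry over; the old process can then close its own copy.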

Right. Comparatively, one of the nice features of the BPF variant is
that the sockets in the old process can passively enter listen_off
state solely through changes initiated by the new process (replacing
the BPF filter for the group).
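A rough sketch of the BPF variant from the new process's side, assuming Linux >= 4.6 (SO_ATTACH_REUSEPORT_CBPF with TCP support), and assuming the old process's listeners occupy group slots 0..base-1 while the new process's occupy base..base+count-1. The classic BPF program's return value is used as an index into the group's socket array. The helper names and the CPU-based spreading are illustrative choices, not part of the patch under discussion:

```c
#include <errno.h>
#include <linux/filter.h>
#include <sys/socket.h>

#ifndef SO_ATTACH_REUSEPORT_CBPF
#define SO_ATTACH_REUSEPORT_CBPF 51  /* asm-generic value; some arches differ */
#endif

/* Hypothetical helper: build a 4-instruction classic BPF program that
 * returns base + (cpu % count), i.e. always an index in the (assumed)
 * slot range of the new process's listeners. Returns the length, or -1. */
int build_steer_filter(struct sock_filter out[4], unsigned int base,
                       unsigned int count)
{
    if (count == 0) {
        errno = EINVAL;               /* BPF_MOD by zero is rejected */
        return -1;
    }
    struct sock_filter prog[4] = {
        /* A = id of the CPU processing the packet (ancillary load) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_CPU),
        /* A = A % count */
        BPF_STMT(BPF_ALU | BPF_MOD | BPF_K, count),
        /* A = A + base */
        BPF_STMT(BPF_ALU | BPF_ADD | BPF_K, base),
        /* return A: the index into the reuseport socket array */
        BPF_STMT(BPF_RET | BPF_A, 0),
    };
    for (int i = 0; i < 4; i++)
        out[i] = prog[i];
    return 4;
}

/* Hypothetical helper: replace the group's program through any member
 * socket. listen() must already have been called, so the socket is in
 * the group. Returns 0 on success, -1 with errno set. */
int attach_steer_filter(int member_fd, unsigned int base, unsigned int count)
{
    struct sock_filter code[4];
    int n = build_steer_filter(code, base, count);
    if (n < 0)
        return -1;
    struct sock_fprog fprog = { .len = (unsigned short)n, .filter = code };
    return setsockopt(member_fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF,
                      &fprog, sizeof(fprog));
}
```

Slot order follows join order, and a socket leaving the group can have another socket moved into its slot, so the base/count assumption only holds while the group is stable. If I read reuseport_select_sock() correctly, an out-of-range index falls back to the default hash-based selection.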

By the way, if I read correctly, the listen_off feature was already
possible without kernel changes prior to fast reuseport, by setting
SO_BINDTODEVICE on the old process's sockets to effectively segment
them into a separate reuseport group. With fast reuseport,
sk_bound_dev_if equivalence is checked only when a socket joins a
group; the socket is not removed from the group's array when
SO_BINDTODEVICE is set later, so this no longer works.
