netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Willy Tarreau <w@1wt.eu>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <tom@herbertland.com>,
	Tolga Ceylan <tolga.ceylan@gmail.com>,
	Aaron Conole <aconole@bytheb.org>,
	"David S. Miller" <davem@davemloft.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as drain mode
Date: Wed, 16 Dec 2015 17:15:14 +0100	[thread overview]
Message-ID: <20151216161514.GA3476@1wt.eu> (raw)
In-Reply-To: <20151216073814.GA3373@1wt.eu>

[-- Attachment #1: Type: text/plain, Size: 2023 bytes --]

Hi Eric,

On Wed, Dec 16, 2015 at 08:38:14AM +0100, Willy Tarreau wrote:
> On Tue, Dec 15, 2015 at 01:21:15PM -0800, Eric Dumazet wrote:
> > On Tue, 2015-12-15 at 20:44 +0100, Willy Tarreau wrote:
> > 
> > > Thus do you think it's worth adding a new option as Tolga proposed ?
> > 
> > 
> > I thought we tried hard to avoid adding the option but determined
> > we could not avoid it ;)
> 
> Not yet, your other proposal of disabling SO_REUSEPORT makes sense if
> we combine it with the proposal to change the score in my patch. If
> we say that a socket which has SO_REUSEPORT scores higher, then the
> connections which don't want to accept new connections anymore will
> simply have to drop it an not be elected. I find this even cleaner
> since the sole purpose of the loop is to find the best socket in case
> of SO_REUSEPORT.

So I tried this and am pretty satisfied with the results, as I couldn't
see any single reset on 4.4-rc5 with it. On 4.1 I got a few very rare
resets at the exact moment the new process binds to the socket, because
I suspect some ACKs end up in the wrong queue exactly there. But
apparently the changes you did in 4.4 totally got rid of this, which is
great!

I suspected that I could enter a situation where a new process could
fail to bind if generations n-1 and n-2 were still present, because
n-2 would be running without SO_REUSEPORT and that should make this
test fail in inet_csk_bind_conflict(), but it never failed for me :

                        if ((!reuse || !sk2->sk_reuse ||
                            sk2->sk_state == TCP_LISTEN) &&
                            (!reuseport || !sk2->sk_reuseport ||
                            (sk2->sk_state != TCP_TIME_WAIT &&
                             !uid_eq(uid, sock_i_uid(sk2))))) {
...

So I'm clearly missing something and can't spot what. I mean, I'd
prefer to see my patch occasionally fail than not understanding why
it always works! If anyone has an suggestion I'm interested.

Here's the updated patch.

Best regards,
Willy


[-- Attachment #2: 0001-net-make-lingering-sockets-score-less-in-compute_sco.patch --]
[-- Type: text/plain, Size: 2647 bytes --]

>From c060a5db92274402a0178d7c777a1e37c15eadb9 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w@1wt.eu>
Date: Tue, 15 Dec 2015 16:40:00 +0100
Subject: net: make lingering sockets score less in compute_score()

When multiple processes use SO_REUSEPORT for a seamless restart
operation, there's a tiny window during which both the old and the new
process are bound to the same port, and there's no way for the old
process to gracefully stop receiving connections without dropping
the few that are left in the queue between the last poll() and the
shutdown() or close() operation.

Incoming connections are distributed between multiple listening sockets
in inet_lookup_listener() according to multiple criteria. The first
criterion is a score based on a number of attributes for each socket,
then a hash computation ensures that the connections are evenly
distributed between sockets of equal weight.

This patch provides a simple approach by which the old process can
simply decrease its score by disabling SO_REUSEPORT on its listening
sockets. Thus, the sockets from the new process always score higher
and are always preferred.

The old process can then safely drain incoming connections and stop
after meeting the -1 EAGAIN condition, as shown in the example below :

         process A (old one)          |  process B (new one)
                                      |
          setsockopt(SO_REUSEPORT, 1) |
          listen() >= 0               |
          ...                         |
          accept()                    |
          ...                         |  setsockopt(SO_REUSEPORT, 1)
          ...                         |  listen()

       From now on, both processes receive incoming connections

          ...                         |  kill(process A, go_away)
          setsockopt(SO_REUSEPORT, 0) |  accept() >= 0

       Here process A stops receiving new connections

          accept() >= 0               |  accept() >= 0
          ...                         |
          accept() = -1 EAGAIN        |  accept() >= 0
          close()                     |
          exit()                      |

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/inet_hashtables.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ccc5980..1c950ba 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -189,6 +189,8 @@ static inline int compute_score(struct sock *sk, struct net *net,
 				return -1;
 			score += 4;
 		}
+		if (sk->sk_reuseport)
+			score++;
 		if (sk->sk_incoming_cpu == raw_smp_processor_id())
 			score++;
 	}
-- 
1.7.12.1


  reply	other threads:[~2015-12-16 16:15 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-27  0:30 [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as drain mode Tolga Ceylan
2015-09-27  1:04 ` Eric Dumazet
2015-09-27  1:37   ` Tolga Ceylan
2015-09-27  1:44 ` Aaron Conole
2015-09-27  2:02   ` Tolga Ceylan
2015-09-27  2:24     ` Eric Dumazet
2015-11-11  5:41       ` Tom Herbert
2015-11-11  6:19         ` Eric Dumazet
2015-11-11 17:05           ` Tom Herbert
2015-11-11 17:23             ` Eric Dumazet
2015-11-11 18:23               ` Tom Herbert
2015-11-11 18:43                 ` Eric Dumazet
2015-11-12  1:09                   ` Eric Dumazet
2015-12-15 16:14                     ` Willy Tarreau
2015-12-15 17:10                       ` Eric Dumazet
2015-12-15 17:43                         ` Willy Tarreau
2015-12-15 18:21                           ` Eric Dumazet
2015-12-15 19:44                             ` Willy Tarreau
2015-12-15 21:21                               ` Eric Dumazet
2015-12-16  7:38                                 ` Willy Tarreau
2015-12-16 16:15                                   ` Willy Tarreau [this message]
2015-12-18 16:33                                     ` Josh Snyder
2015-12-18 18:58                                       ` Willy Tarreau
2015-12-19  2:38                                         ` Eric Dumazet
2015-12-19  7:00                                           ` Willy Tarreau
2015-12-21 20:38                                             ` Tom Herbert
2015-12-21 20:41                                               ` Willy Tarreau
2016-03-24  5:10                                                 ` Tolga Ceylan
2016-03-24  6:12                                                   ` Willy Tarreau
2016-03-24 14:13                                                     ` Eric Dumazet
2016-03-24 14:22                                                       ` Willy Tarreau
2016-03-24 14:45                                                         ` Eric Dumazet
2016-03-24 15:30                                                           ` Willy Tarreau
2016-03-24 16:33                                                             ` Eric Dumazet
2016-03-24 16:50                                                               ` Willy Tarreau
2016-03-24 17:01                                                                 ` Eric Dumazet
2016-03-24 17:26                                                                   ` Tom Herbert
2016-03-24 17:55                                                                     ` Daniel Borkmann
2016-03-24 18:20                                                                       ` Tolga Ceylan
2016-03-24 18:24                                                                         ` Willy Tarreau
2016-03-24 18:37                                                                         ` Eric Dumazet
2016-03-24 22:40                                                                       ` Yann Ylavic
2016-03-24 22:49                                                                         ` Eric Dumazet
2016-03-24 23:40                                                                           ` Yann Ylavic
2016-03-24 23:54                                                                             ` Tom Herbert
2016-03-25  0:01                                                                               ` Yann Ylavic
2016-03-25  5:28                                                                               ` Willy Tarreau
2016-03-25  6:49                                                                                 ` Eric Dumazet
2016-03-25  8:53                                                                                   ` Willy Tarreau
2016-03-25 11:21                                                                                     ` Yann Ylavic
2016-03-25 13:17                                                                                       ` Eric Dumazet
2016-03-25  0:25                                                                           ` David Miller
2016-03-25  0:24                                                                         ` David Miller
2016-03-24 18:00                                                                   ` Willy Tarreau
2016-03-24 18:21                                                                     ` Willy Tarreau
2016-03-24 18:32                                                                     ` Eric Dumazet
2016-03-25 15:29 Craig Gallek
2016-03-25 16:21 ` Alexei Starovoitov
2016-03-25 16:31   ` Craig Gallek
2016-03-25 17:00     ` Eric Dumazet
2016-03-25 18:31       ` Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151216161514.GA3476@1wt.eu \
    --to=w@1wt.eu \
    --cc=aconole@bytheb.org \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=tolga.ceylan@gmail.com \
    --cc=tom@herbertland.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).