linux-kernel.vger.kernel.org archive mirror
From: Jason Baron <jbaron@akamai.com>
To: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Cc: andrii@kernel.org, ast@kernel.org, benh@amazon.com,
	bpf@vger.kernel.org, daniel@iogearbox.net, davem@davemloft.net,
	edumazet@google.com, kafai@fb.com, kuba@kernel.org,
	kuni1840@gmail.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
Subject: Re: [PATCH v4 bpf-next 00/11] Socket migration for SO_REUSEPORT.
Date: Wed, 28 Apr 2021 10:44:12 -0400
Message-ID: <fabd0598-c62e-ea88-f340-050136bb8266@akamai.com>
In-Reply-To: <20210428081342.1944-1-kuniyu@amazon.co.jp>



On 4/28/21 4:13 AM, Kuniyuki Iwashima wrote:
> From:   Jason Baron <jbaron@akamai.com>
> Date:   Tue, 27 Apr 2021 12:38:58 -0400
>> On 4/26/21 11:46 PM, Kuniyuki Iwashima wrote:
>>> The SO_REUSEPORT option allows sockets to listen on the same port and to
>>> accept connections evenly. However, there is a defect in the current
>>> implementation [1]. When a SYN packet is received, the connection is tied
>>> to a listening socket. Accordingly, when the listener is closed, in-flight
>>> requests during the three-way handshake and child sockets in the accept
>>> queue are dropped even if other listeners on the same port could accept
>>> such connections.
>>>
>>> This situation can happen when various server management tools restart
>>> server processes (such as nginx). For instance, when we change nginx
>>> configurations and restart it, it spins up new workers that respect the new
>>> configuration and closes all listeners on the old workers, so the in-flight
>>> ACK of the 3WHS is answered with an RST.
>>
>> Hi Kuniyuki,
>>
>> I had implemented a different approach to this that I wanted to get your
>> thoughts about. The idea is to use unix sockets and SCM_RIGHTS to pass the
>> listen fd (or any other fd) around. Currently, if you have an 'old' webserver
>> that you want to replace with a 'new' webserver, you would need a separate
>> process to receive the listen fd and then have that process send the fd to
>> the new webserver, if they are not running concurrently. So instead what
>> I'm proposing is a 'delayed close' for a unix socket. That is, one could do:
>>
>> 1) bind unix socket with path '/sockets'
>> 2) sendmsg() the listen fd via the unix socket
>> 3) setsockopt() some 'timeout' on the unix socket (maybe 10 seconds or so)
>> 4) exit/close the old webserver and the listen socket
>> 5) start the new webserver
>> 6) create new unix socket and bind to '/sockets' (if it has MAY_WRITE file permissions)
>> 7) recvmsg() the listen fd
>>
>> So the idea is that we set a timeout on the unix socket. If the new process
>> does not start and bind to the unix socket, it simply closes, thus releasing
>> the listen socket. However, if it does bind it can now call recvmsg() and
>> use the listen fd as normal. It can then simply continue to use the old listen
>> fds and/or create new ones and drain the old ones.
>>
>> Thus, the old and new webservers do not have to run concurrently. This doesn't
>> involve any changes to the tcp layer and can be used to pass any type of fd.
>> Not sure if it's actually useful for anything else though.
>>
>> I'm not sure if this solves your use-case or not but I thought I'd share it.
>> One can also inherit the fds like in systemd's socket activation model, but
>> that again requires another process to hold open the listen fd.
> 
> Thank you for sharing the code.
> 
> It seems a bit more crash-tolerant than normal fd passing, but it can still
> suffer if the process dies before passing fds. With this patch set, we can
> migrate child sockets even if the process dies.
> 

I don't think crashing should be much of an issue. The old server can set up the
unix socket path '/sockets' when it starts up and queue the listen sockets
there from the start. When it dies it will close all its fds, and the new
server can pick up any fds that are in the '/sockets' queue.
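
As a rough illustration of the queueing described above (not the code
discussed in this thread), the part that already works today can be sketched
in user space with SCM_RIGHTS. Assumptions: Python is used only for brevity, a
socketpair stands in for the unix socket bound at '/sockets', both "servers"
run in one process, and send_fd(), recv_fd() and handoff_demo() are made-up
names. The proposed 'delayed close' timeout is not modelled, since no such
socket option exists.

```python
import array
import socket


def send_fd(chan: socket.socket, fd: int) -> None:
    """Queue one file descriptor on a unix socket using SCM_RIGHTS."""
    fds = array.array("i", [fd])
    # Send one byte of ordinary data together with the ancillary message.
    chan.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(chan: socket.socket) -> int:
    """Receive one file descriptor that was queued with send_fd()."""
    fds = array.array("i")
    _, ancdata, _, _ = chan.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing partial int in the control data.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no fd found in ancillary data")


def handoff_demo() -> bytes:
    # Stand-in for the unix socket at '/sockets' (assumption, see above).
    old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # "Old server": create the listener and queue its fd from the start.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    port = listener.getsockname()[1]
    send_fd(old_end, listener.fileno())

    # A client connects; the connection waits in the accept queue.
    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(b"hello")

    # The old server drops its own fd. The reference queued on the unix
    # socket keeps the listening socket, and its accept queue, alive.
    listener.close()

    # "New server": pick up the queued fd and accept the waiting connection.
    inherited = socket.socket(fileno=recv_fd(new_end))
    conn, _ = inherited.accept()
    data = conn.recv(5, socket.MSG_WAITALL)

    for s in (conn, client, inherited, old_end, new_end):
        s.close()
    return data
```

In this sketch the connection made before the old fd was closed is still
accepted by the "new server". What it cannot show is the case the proposal
targets, where the receiving end does not exist yet when the old server exits.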


> Also, as Martin said, fd passing tends to make applications complicated.
> 

It may be, but perhaps it's more flexible? It gives the new server the
chance to re-use the existing listen fds, close, drain and/or start new
ones. It also addresses the non-REUSEPORT case where you can't bind right
away.
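
For reference, the non-REUSEPORT case can be reproduced with a small
user-space sketch. Assumptions: Python on Linux, loopback only, and
second_bind_fails() is a made-up helper name. It shows that a second
socket cannot bind to a port while another socket is still listening on it,
which is why a new server without an inherited fd has to wait.

```python
import errno
import socket


def second_bind_fails() -> bool:
    """Return True if binding a second socket to a busy port gives EADDRINUSE."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind(("127.0.0.1", 0))  # let the kernel pick a free port
    first.listen(1)
    port = first.getsockname()[1]

    # Neither socket sets SO_REUSEPORT, so the port is exclusively owned.
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(("127.0.0.1", port))
        return False
    except OSError as exc:
        return exc.errno == errno.EADDRINUSE
    finally:
        second.close()
        first.close()
```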

Thanks,

-Jason

> If we do not mind these points, your approach could be an option.
> 
> 

