From: Martin KaFai Lau <kafai@fb.com>
To: Kuniyuki Iwashima <kuniyu@amazon.co.jp>, <edumazet@google.com>,
	<jbaron@akamai.com>
Cc: <andrii@kernel.org>, <ast@kernel.org>, <benh@amazon.com>,
	<bpf@vger.kernel.org>, <daniel@iogearbox.net>,
	<davem@davemloft.net>, <kuba@kernel.org>, <kuni1840@gmail.com>,
	<linux-kernel@vger.kernel.org>, <netdev@vger.kernel.org>
Subject: Re: [PATCH v4 bpf-next 00/11] Socket migration for SO_REUSEPORT.
Date: Tue, 4 May 2021 23:54:18 -0700
Message-ID: <20210505065418.uqfmyy5es3y5zw2d@kafai-mbp.dhcp.thefacebook.com>
In-Reply-To: <20210429031609.1398-1-kuniyu@amazon.co.jp>

On Thu, Apr 29, 2021 at 12:16:09PM +0900, Kuniyuki Iwashima wrote:
[ ... ]

> > > > It may be but perhaps its more flexible? It gives the new server the
> > > > chance to re-use the existing listen fds, close, drain and/or start new
> > > > ones. It also addresses the non-REUSEPORT case where you can't bind right
> > > > away.

> > > If the flexibility is really worth the complexity, we do not care about it.
> > > But, SO_REUSEPORT can give enough flexibility we want.
> > >
> > > With socket migration, there is no need to reuse listener (fd passing),
> > > drain children (incoming connections are automatically migrated if there is
> > > already another listener bind()ed), and of course another listener can
> > > close itself and migrated children.
> > >
> > > If two different approaches resolves the same issue and one does not need
> > > complexity in userspace, we select the simpler one.

> > 
> > Kernel bloat and complexity is _not_ the simplest choice.
> > 
> > Touching a complex part of TCP stack is quite risky.

> 
> Yes, we understand that is not a simple decision and your concern. So many
> reviews are needed to see if our approach is really risky or not.

If fd passing is sufficient for a set of use cases, it is great.

However, it does not work well for everyone.  We are not saying that
SO_REUSEPORT (+ optional bpf) is better in all cases either.

After SO_REUSEPORT was added, some people moved from fd passing
to SO_REUSEPORT and now have one bpf policy that selects the sk for
both TCP and UDP.

Since SO_REUSEPORT was first added, there have been multiple contributions
from different people and companies.  For example, first adding bpf
support to UDP, then to TCP, then a much more flexible way to select sk
from reuseport_array, and then sock_map/sock_hash support.  That is another
perspective showing that people find it useful.  Each of the contributions
also changed the kernel code for practical use cases.

This set is an extension/improvement to address a gap in SO_REUSEPORT
when some of the sk are closed.  Patches 2 to 4 are the prep work
in sock_reuseport.c and they have the most changes in this set.
Patches 5 to 7 are the changes in tcp.  The code has been structured
to be as isolated as possible.  It would be most useful to at least
get review and feedback on this part.  The rest is bpf related.
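
As a rough illustration of the setup being discussed (a minimal sketch,
not code from the patch set): two listeners are bound to the same port
with SO_REUSEPORT, and then one of them is closed.  The helper name
make_reuseport_listener and the loopback address are assumptions made
for the example.  It needs a Linux host where Python exposes
socket.SO_REUSEPORT.

```python
import socket


def make_reuseport_listener(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Hypothetical helper: create a TCP listener that joins a reuseport group."""
    sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEPORT must be set before bind() on every socket sharing the port.
    sk.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sk.bind((host, port))
    sk.listen(16)
    return sk


if __name__ == "__main__":
    # Port 0 lets the kernel pick a free port for the first listener.
    first = make_reuseport_listener(0)
    port = first.getsockname()[1]

    # A second listener can bind to the same port because both set SO_REUSEPORT.
    second = make_reuseport_listener(port)
    print("both listeners share port", port)

    # Closing one listener while the other keeps running.
    first.close()
    second.close()
```

Closing `first` while `second` is still listening is roughly the
"some of the sk are closed" case that this set is about.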

Thread overview: 29+ messages
2021-04-27  3:46 [PATCH v4 bpf-next 00/11] Socket migration for SO_REUSEPORT Kuniyuki Iwashima
2021-04-27  3:46 ` [PATCH v4 bpf-next 01/11] net: Introduce net.ipv4.tcp_migrate_req Kuniyuki Iwashima
2021-04-27  3:46 ` [PATCH v4 bpf-next 02/11] tcp: Add num_closed_socks to struct sock_reuseport Kuniyuki Iwashima
2021-04-27  3:46 ` [PATCH v4 bpf-next 03/11] tcp: Keep TCP_CLOSE sockets in the reuseport group Kuniyuki Iwashima
2021-04-27  3:46 ` [PATCH v4 bpf-next 04/11] tcp: Add reuseport_migrate_sock() to select a new listener Kuniyuki Iwashima
2021-04-27  3:46 ` [PATCH v4 bpf-next 05/11] tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues Kuniyuki Iwashima
2021-04-27  3:46 ` [PATCH v4 bpf-next 06/11] tcp: Migrate TCP_NEW_SYN_RECV requests at retransmitting SYN+ACKs Kuniyuki Iwashima
2021-05-05  4:56   ` Martin KaFai Lau
2021-05-05 23:16     ` Kuniyuki Iwashima
2021-04-27  3:46 ` [PATCH v4 bpf-next 07/11] tcp: Migrate TCP_NEW_SYN_RECV requests at receiving the final ACK Kuniyuki Iwashima
2021-04-27  3:46 ` [PATCH v4 bpf-next 08/11] bpf: Support BPF_FUNC_get_socket_cookie() for BPF_PROG_TYPE_SK_REUSEPORT Kuniyuki Iwashima
2021-04-27  3:46 ` [PATCH v4 bpf-next 09/11] bpf: Support socket migration by eBPF Kuniyuki Iwashima
2021-04-27  3:46 ` [PATCH v4 bpf-next 10/11] libbpf: Set expected_attach_type for BPF_PROG_TYPE_SK_REUSEPORT Kuniyuki Iwashima
2021-04-27  3:46 ` [PATCH v4 bpf-next 11/11] bpf: Test BPF_SK_REUSEPORT_SELECT_OR_MIGRATE Kuniyuki Iwashima
2021-05-05  5:14   ` Martin KaFai Lau
2021-05-05 23:19     ` Kuniyuki Iwashima
2021-04-27 16:38 ` [PATCH v4 bpf-next 00/11] Socket migration for SO_REUSEPORT Jason Baron
2021-04-28  1:27   ` Martin KaFai Lau
2021-04-28 14:18     ` Eric Dumazet
2021-04-28 15:49       ` Kuniyuki Iwashima
2021-04-28  8:13   ` Kuniyuki Iwashima
2021-04-28 14:44     ` Jason Baron
2021-04-28 15:52       ` Kuniyuki Iwashima
2021-04-28 16:33         ` Eric Dumazet
2021-04-29  3:16           ` Kuniyuki Iwashima
2021-05-05  6:54             ` Martin KaFai Lau [this message]
2021-04-27 21:55 ` Maciej Żenczykowski
2021-04-27 22:00   ` Maciej Żenczykowski
2021-04-28  8:18     ` Kuniyuki Iwashima
