netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Martin KaFai Lau <kafai@fb.com>
To: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Cc: "David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Benjamin Herrenschmidt <benh@amazon.com>,
	Kuniyuki Iwashima <kuni1840@gmail.com>, <bpf@vger.kernel.org>,
	<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH bpf-next 3/8] tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.
Date: Wed, 18 Nov 2020 15:50:17 -0800	[thread overview]
Message-ID: <20201118235017.xrudgf6bfwgkaukh@kafai-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <20201117094023.3685-4-kuniyu@amazon.co.jp>

On Tue, Nov 17, 2020 at 06:40:18PM +0900, Kuniyuki Iwashima wrote:
> This patch lets reuseport_detach_sock() return a pointer of struct sock,
> which is used only by inet_unhash(). If it is not NULL,
> inet_csk_reqsk_queue_migrate() migrates TCP_ESTABLISHED/TCP_SYN_RECV
> sockets from the closing listener to the selected one.
> 
> Listening sockets hold incoming connections as a linked list of struct
> request_sock in the accept queue, and each request has reference to a full
> socket and its listener. In inet_csk_reqsk_queue_migrate(), we unlink the
> requests from the closing listener's queue and relink them to the head of
> the new listener's queue. We do not process each request, so the migration
> completes in O(1) time complexity. However, in the case of TCP_SYN_RECV
> sockets, we will take special care in the next commit.
> 
> By default, we select the last element of socks[] as the new listener.
> This behaviour is based on how the kernel moves sockets in socks[].
> 
> For example, we call listen() for four sockets (A, B, C, D), and close the
> first two by turns. The sockets move in socks[] like below. (See also [1])
> 
>   socks[0] : A <-.      socks[0] : D          socks[0] : D
>   socks[1] : B   |  =>  socks[1] : B <-.  =>  socks[1] : C
>   socks[2] : C   |      socks[2] : C --'
>   socks[3] : D --'
> 
> Then, if C and D have newer settings than A and B, and each socket has a
> request (a, b, c, d) in their accept queue, we can redistribute old
> requests evenly to new listeners.
I don't think it should emphasize/claim there is a specific way that
the kernel-pick here can redistribute the requests evenly.  It depends on
how the application close/listen.  The userspace can not expect the
ordering of socks[] will behave in a certain way.

The primary redistribution policy has to depend on BPF which is the
policy defined by the user based on its application logic (e.g. how
its binary restart work).  The application (and bpf) knows which one
is a dying process and can avoid distributing to it.

The kernel-pick could be an optional fallback but not a must.  If the bpf
prog is attached, I would even go further to call bpf to redistribute
regardless of the sysctl, so I think the sysctl is not necessary.

> 
>   socks[0] : A (a) <-.      socks[0] : D (a + d)      socks[0] : D (a + d)
>   socks[1] : B (b)   |  =>  socks[1] : B (b) <-.  =>  socks[1] : C (b + c)
>   socks[2] : C (c)   |      socks[2] : C (c) --'
>   socks[3] : D (d) --'
> 

  reply	other threads:[~2020-11-18 23:51 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-17  9:40 [RFC PATCH bpf-next 0/8] Socket migration for SO_REUSEPORT Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 1/8] net: Introduce net.ipv4.tcp_migrate_req Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 2/8] tcp: Keep TCP_CLOSE sockets in the reuseport group Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 3/8] tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues Kuniyuki Iwashima
2020-11-18 23:50   ` Martin KaFai Lau [this message]
2020-11-19 22:09     ` Kuniyuki Iwashima
2020-11-20  1:53       ` Martin KaFai Lau
2020-11-21 10:13         ` Kuniyuki Iwashima
2020-11-23  0:40           ` Martin KaFai Lau
2020-11-24  9:24             ` Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 4/8] tcp: Migrate TFO requests causing RST during TCP_SYN_RECV Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 5/8] tcp: Migrate TCP_NEW_SYN_RECV requests Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 6/8] bpf: Add cookie in sk_reuseport_md Kuniyuki Iwashima
2020-11-19  0:11   ` Martin KaFai Lau
2020-11-19 22:10     ` Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 7/8] bpf: Call bpf_run_sk_reuseport() for socket migration Kuniyuki Iwashima
2020-11-19  1:00   ` Martin KaFai Lau
2020-11-19 22:13     ` Kuniyuki Iwashima
2020-11-17  9:40 ` [RFC PATCH bpf-next 8/8] bpf: Test BPF_PROG_TYPE_SK_REUSEPORT " Kuniyuki Iwashima
2020-11-18  9:18 ` [RFC PATCH bpf-next 0/8] Socket migration for SO_REUSEPORT David Laight
2020-11-19 22:01   ` Kuniyuki Iwashima
2020-11-18 16:25 ` Eric Dumazet
2020-11-19 22:05   ` Kuniyuki Iwashima
2020-11-19  1:49 ` Martin KaFai Lau
2020-11-19 22:17   ` Kuniyuki Iwashima
2020-11-20  2:31     ` Martin KaFai Lau
2020-11-21 10:16       ` Kuniyuki Iwashima

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201118235017.xrudgf6bfwgkaukh@kafai-mbp.dhcp.thefacebook.com \
    --to=kafai@fb.com \
    --cc=ast@kernel.org \
    --cc=benh@amazon.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=kuni1840@gmail.com \
    --cc=kuniyu@amazon.co.jp \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).