From: Eric Dumazet <eric.dumazet@gmail.com>
To: Kuniyuki Iwashima <kuniyu@amazon.co.jp>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>
Cc: Benjamin Herrenschmidt <benh@amazon.com>,
	Kuniyuki Iwashima <kuni1840@gmail.com>,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v7 bpf-next 07/11] tcp: Migrate TCP_NEW_SYN_RECV requests at receiving the final ACK.
Date: Thu, 10 Jun 2021 22:36:27 +0200
Message-ID: <89c4ce38-fe2c-1d80-f814-c4b3a5e4781d@gmail.com>
In-Reply-To: <20210521182104.18273-8-kuniyu@amazon.co.jp>



On 5/21/21 8:21 PM, Kuniyuki Iwashima wrote:
> This patch also changes the code to call reuseport_migrate_sock() and
> inet_reqsk_clone(), but unlike the other cases, we do not call
> inet_reqsk_clone() right after reuseport_migrate_sock().
> 
> Currently, in the receive path for TCP_NEW_SYN_RECV sockets, the listener
> has three kinds of refcnt:
> 
>   (A) for the listener itself
>   (B) carried by request_sock
>   (C) sock_hold() in tcp_v[46]_rcv()
> 
> While processing the req, (A) may be dropped by close(listener). Also, (B)
> can be dropped by accept(listener) once we put the req into the accept
> queue. So, we have to hold another refcnt (C) for the listener to prevent
> use-after-free.
> 
> For socket migration, we call reuseport_migrate_sock() to select a listener
> with (A) and to increment the new listener's refcnt in tcp_v[46]_rcv().
> This refcnt corresponds to (C) and is cleaned up later in tcp_v[46]_rcv().
> Thus we have to take another refcnt (B) for the newly cloned request_sock.
> 
> In inet_csk_complete_hashdance(), we hold the count (B), clone the req, and
> try to put the new req into the accept queue. By migrating the req only
> after winning the "own_req" race, we can avoid a worst-case scenario like this:
> 
>   CPU 1 looks up req1
>   CPU 2 looks up req1, unhashes it, then CPU 1 loses the race
>   CPU 3 looks up req2, unhashes it, then CPU 2 loses the race
>   ...
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> ---
>  net/ipv4/inet_connection_sock.c | 34 ++++++++++++++++++++++++++++++---
>  net/ipv4/tcp_ipv4.c             | 20 +++++++++++++------
>  net/ipv4/tcp_minisocks.c        |  4 ++--
>  net/ipv6/tcp_ipv6.c             | 14 +++++++++++---
>  4 files changed, 58 insertions(+), 14 deletions(-)
> 
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index c1f068464363..b795198f919a 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1113,12 +1113,40 @@ struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
>  					 struct request_sock *req, bool own_req)
>  {
>  	if (own_req) {
> -		inet_csk_reqsk_queue_drop(sk, req);
> -		reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
> -		if (inet_csk_reqsk_queue_add(sk, req, child))
> +		inet_csk_reqsk_queue_drop(req->rsk_listener, req);
> +		reqsk_queue_removed(&inet_csk(req->rsk_listener)->icsk_accept_queue, req);
> +
> +		if (sk != req->rsk_listener) {
> +			/* another listening sk has been selected,
> +			 * migrate the req to it.
> +			 */
> +			struct request_sock *nreq;
> +
> +			/* hold a refcnt for the nreq->rsk_listener
> +			 * which is assigned in inet_reqsk_clone()
> +			 */
> +			sock_hold(sk);
> +			nreq = inet_reqsk_clone(req, sk);
> +			if (!nreq) {
> +				inet_child_forget(sk, req, child);

Don't you need a sock_put(sk) here?

> +				goto child_put;
> +			}
> +
> +			refcount_set(&nreq->rsk_refcnt, 1);
> +			if (inet_csk_reqsk_queue_add(sk, nreq, child)) {
> +				reqsk_migrate_reset(req);
> +				reqsk_put(req);
> +				return child;
> +			}
> +
> +			reqsk_migrate_reset(nreq);
> +			__reqsk_free(nreq);
> +		} else if (inet_csk_reqsk_queue_add(sk, req, child)) {
>  			return child;
> +		}
> 
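
To make the refcount question above concrete, here is a minimal user-space
sketch of the pattern under discussion: a reference is taken on the new
listener before the clone, so if the clone fails the reference has to be
dropped again or it leaks. Every name prefixed with model_ is a hypothetical
stand-in invented for this illustration, not a kernel definition, and the
sketch is not the fix that was eventually applied to the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins, NOT the kernel structures. */
struct model_sock {
	int refcnt;
};

struct model_req {
	struct model_sock *listener;
};

static void model_sock_hold(struct model_sock *sk)
{
	sk->refcnt++;
}

static void model_sock_put(struct model_sock *sk)
{
	assert(sk->refcnt > 0);
	sk->refcnt--;
}

/* Models a clone that can fail; 'fail' forces the failure path. */
static struct model_req *model_reqsk_clone(struct model_sock *sk, bool fail)
{
	struct model_req *nreq;

	if (fail)
		return NULL;

	nreq = malloc(sizeof(*nreq));
	if (nreq)
		nreq->listener = sk;
	return nreq;
}

/* Take the reference that the clone will own, then clone. */
static struct model_req *model_migrate(struct model_sock *sk, bool fail)
{
	struct model_req *nreq;

	model_sock_hold(sk);
	nreq = model_reqsk_clone(sk, fail);
	if (!nreq) {
		/* No clone exists to own the reference, so drop it here. */
		model_sock_put(sk);
		return NULL;
	}
	return nreq;
}
```

With a listener whose count starts at 1, a failed model_migrate() leaves the
count at 1, while a successful one leaves it at 2, with the extra reference
owned by the returned clone. Without the model_sock_put() on the failure path
the count would stay at 2 with nothing left to release it.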
