All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
To: "David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <kafai@fb.com>
Cc: Benjamin Herrenschmidt <benh@amazon.com>,
	Kuniyuki Iwashima <kuniyu@amazon.co.jp>,
	Kuniyuki Iwashima <kuni1840@gmail.com>,
	<osa-contribution-log@amazon.com>, <bpf@vger.kernel.org>,
	<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: [PATCH v1 bpf-next 04/11] tcp: Migrate TFO requests causing RST during TCP_SYN_RECV.
Date: Tue, 1 Dec 2020 23:44:11 +0900	[thread overview]
Message-ID: <20201201144418.35045-5-kuniyu@amazon.co.jp> (raw)
In-Reply-To: <20201201144418.35045-1-kuniyu@amazon.co.jp>

A TFO request socket is only freed after BOTH 3WHS has completed (or
aborted) and the child socket has been accepted (or its listener has been
closed). Hence, depending on the order, there can be two kinds of request
sockets in the accept queue.

  3WHS -> accept : TCP_ESTABLISHED
  accept -> 3WHS : TCP_SYN_RECV

Unlike TCP_ESTABLISHED socket, accept() does not free the request socket
for TCP_SYN_RECV socket. It is freed later at reqsk_fastopen_remove().
Also, it accesses request_sock.rsk_listener. So, in order to complete TFO
socket migration, we have to set the current listener to it at accept()
before reqsk_fastopen_remove().

Moreover, if TFO request caused RST before 3WHS has completed, it is held
in the listener's TFO queue to prevent DDoS attack. Thus, we also have to
migrate the requests in TFO queue.

Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
 net/ipv4/inet_connection_sock.c | 35 ++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index b27241ea96bd..361efe55b1ad 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -500,6 +500,16 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
 	    tcp_rsk(req)->tfo_listener) {
 		spin_lock_bh(&queue->fastopenq.lock);
 		if (tcp_rsk(req)->tfo_listener) {
+			if (req->rsk_listener != sk) {
+				/* TFO request was migrated to another listener so
+				 * the new listener must be used in reqsk_fastopen_remove()
+				 * to hold requests which cause RST.
+				 */
+				sock_put(req->rsk_listener);
+				sock_hold(sk);
+				req->rsk_listener = sk;
+			}
+
 			/* We are still waiting for the final ACK from 3WHS
 			 * so can't free req now. Instead, we set req->sk to
 			 * NULL to signify that the child socket is taken
@@ -954,7 +964,6 @@ static void inet_child_forget(struct sock *sk, struct request_sock *req,
 
 	if (sk->sk_protocol == IPPROTO_TCP && tcp_rsk(req)->tfo_listener) {
 		BUG_ON(rcu_access_pointer(tcp_sk(child)->fastopen_rsk) != req);
-		BUG_ON(sk != req->rsk_listener);
 
 		/* Paranoid, to prevent race condition if
 		 * an inbound pkt destined for child is
@@ -995,6 +1004,7 @@ EXPORT_SYMBOL(inet_csk_reqsk_queue_add);
 void inet_csk_reqsk_queue_migrate(struct sock *sk, struct sock *nsk)
 {
 	struct request_sock_queue *old_accept_queue, *new_accept_queue;
+	struct fastopen_queue *old_fastopenq, *new_fastopenq;
 
 	old_accept_queue = &inet_csk(sk)->icsk_accept_queue;
 	new_accept_queue = &inet_csk(nsk)->icsk_accept_queue;
@@ -1019,6 +1029,29 @@ void inet_csk_reqsk_queue_migrate(struct sock *sk, struct sock *nsk)
 
 	spin_unlock(&new_accept_queue->rskq_lock);
 	spin_unlock(&old_accept_queue->rskq_lock);
+
+	old_fastopenq = &old_accept_queue->fastopenq;
+	new_fastopenq = &new_accept_queue->fastopenq;
+
+	spin_lock_bh(&old_fastopenq->lock);
+	spin_lock_bh(&new_fastopenq->lock);
+
+	new_fastopenq->qlen += old_fastopenq->qlen;
+	old_fastopenq->qlen = 0;
+
+	if (old_fastopenq->rskq_rst_head) {
+		if (new_fastopenq->rskq_rst_head)
+			old_fastopenq->rskq_rst_tail->dl_next = new_fastopenq->rskq_rst_head;
+		else
+			old_fastopenq->rskq_rst_tail = new_fastopenq->rskq_rst_tail;
+
+		new_fastopenq->rskq_rst_head = old_fastopenq->rskq_rst_head;
+		old_fastopenq->rskq_rst_head = NULL;
+		old_fastopenq->rskq_rst_tail = NULL;
+	}
+
+	spin_unlock_bh(&new_fastopenq->lock);
+	spin_unlock_bh(&old_fastopenq->lock);
 }
 EXPORT_SYMBOL(inet_csk_reqsk_queue_migrate);
 
-- 
2.17.2 (Apple Git-113)


  parent reply	other threads:[~2020-12-01 14:46 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-01 14:44 [PATCH v1 bpf-next 00/11] Socket migration for SO_REUSEPORT Kuniyuki Iwashima
2020-12-01 14:44 ` [PATCH v1 bpf-next 01/11] tcp: Keep TCP_CLOSE sockets in the reuseport group Kuniyuki Iwashima
2020-12-05  1:31   ` Martin KaFai Lau
2020-12-06  4:38     ` Kuniyuki Iwashima
2020-12-01 14:44 ` [PATCH v1 bpf-next 02/11] bpf: Define migration types for SO_REUSEPORT Kuniyuki Iwashima
2020-12-01 14:44 ` [PATCH v1 bpf-next 03/11] tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues Kuniyuki Iwashima
2020-12-01 15:25   ` Eric Dumazet
2020-12-03 14:14     ` Kuniyuki Iwashima
2020-12-03 14:31       ` Eric Dumazet
2020-12-03 15:41         ` Kuniyuki Iwashima
2020-12-07 20:33       ` Martin KaFai Lau
2020-12-08  6:31         ` Kuniyuki Iwashima
2020-12-08  7:34           ` Martin KaFai Lau
2020-12-08  8:17             ` Kuniyuki Iwashima
2020-12-09  3:09               ` Martin KaFai Lau
2020-12-09  8:05                 ` Kuniyuki Iwashima
2020-12-09 16:57                   ` Kuniyuki Iwashima
2020-12-10  1:53                     ` Martin KaFai Lau
2020-12-10  5:58                       ` Kuniyuki Iwashima
2020-12-10 19:33                         ` Martin KaFai Lau
2020-12-14 17:16                           ` Kuniyuki Iwashima
2020-12-05  1:42   ` Martin KaFai Lau
2020-12-06  4:41     ` Kuniyuki Iwashima
     [not found]     ` <20201205160307.91179-1-kuniyu@amazon.co.jp>
2020-12-07 20:14       ` Martin KaFai Lau
2020-12-08  6:27         ` Kuniyuki Iwashima
2020-12-08  8:13           ` Martin KaFai Lau
2020-12-08  9:02             ` Kuniyuki Iwashima
2020-12-08  6:54   ` Martin KaFai Lau
2020-12-08  7:42     ` Kuniyuki Iwashima
2020-12-01 14:44 ` Kuniyuki Iwashima [this message]
2020-12-01 15:30   ` [PATCH v1 bpf-next 04/11] tcp: Migrate TFO requests causing RST during TCP_SYN_RECV Eric Dumazet
2020-12-01 14:44 ` [PATCH v1 bpf-next 05/11] tcp: Migrate TCP_NEW_SYN_RECV requests Kuniyuki Iwashima
2020-12-01 15:13   ` Eric Dumazet
2020-12-03 14:12     ` Kuniyuki Iwashima
2020-12-01 17:37   ` kernel test robot
2020-12-01 17:37     ` kernel test robot
2020-12-01 17:42   ` kernel test robot
2020-12-01 17:42     ` kernel test robot
2020-12-10  0:07   ` Martin KaFai Lau
2020-12-10  5:15     ` Kuniyuki Iwashima
2020-12-10 18:49       ` Martin KaFai Lau
2020-12-14 17:03         ` Kuniyuki Iwashima
2020-12-15  2:58           ` Martin KaFai Lau
2020-12-16 16:41             ` Kuniyuki Iwashima
2020-12-16 22:24               ` Martin KaFai Lau
2020-12-01 14:44 ` [PATCH v1 bpf-next 06/11] bpf: Introduce two attach types for BPF_PROG_TYPE_SK_REUSEPORT Kuniyuki Iwashima
2020-12-02  2:04   ` Andrii Nakryiko
2020-12-02 19:19     ` Martin KaFai Lau
2020-12-03  4:24       ` Martin KaFai Lau
2020-12-03 14:16         ` Kuniyuki Iwashima
2020-12-04  5:56           ` Martin KaFai Lau
2020-12-06  4:32             ` Kuniyuki Iwashima
2020-12-01 14:44 ` [PATCH v1 bpf-next 07/11] libbpf: Set expected_attach_type " Kuniyuki Iwashima
2020-12-01 14:44 ` [PATCH v1 bpf-next 08/11] bpf: Add migration to sk_reuseport_(kern|md) Kuniyuki Iwashima
2020-12-01 14:44 ` [PATCH v1 bpf-next 09/11] bpf: Support bpf_get_socket_cookie_sock() for BPF_PROG_TYPE_SK_REUSEPORT Kuniyuki Iwashima
2020-12-04 19:58   ` Martin KaFai Lau
2020-12-06  4:36     ` Kuniyuki Iwashima
2020-12-01 14:44 ` [PATCH v1 bpf-next 10/11] bpf: Call bpf_run_sk_reuseport() for socket migration Kuniyuki Iwashima
2020-12-01 14:44 ` [PATCH v1 bpf-next 11/11] bpf: Test BPF_SK_REUSEPORT_SELECT_OR_MIGRATE Kuniyuki Iwashima
2020-12-05  1:50   ` Martin KaFai Lau
2020-12-06  4:43     ` Kuniyuki Iwashima

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201201144418.35045-5-kuniyu@amazon.co.jp \
    --to=kuniyu@amazon.co.jp \
    --cc=ast@kernel.org \
    --cc=benh@amazon.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kafai@fb.com \
    --cc=kuba@kernel.org \
    --cc=kuni1840@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=osa-contribution-log@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.