linux-kselftest.vger.kernel.org archive mirror
* [PATCH 0/3] Fix reconnection latency caused by FIN/ACK handling race
@ 2020-01-31 12:24 sjpark
  2020-01-31 12:24 ` [PATCH 1/3] net/ipv4/inet_timewait_sock: Fix inconsistent comments sjpark
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: sjpark @ 2020-01-31 12:24 UTC (permalink / raw)
  To: edumazet, davem
  Cc: shuah, netdev, linux-kselftest, linux-kernel, sj38.park, aams,
	SeongJae Park

From: SeongJae Park <sjpark@amazon.de>

When closing a connection, the two ACKs that are required to change the
closing socket's state to FIN_WAIT_2 and then TIME_WAIT could be
processed in reverse order.  This is possible in RSS-disabled
environments such as a connection inside a host.

For example, the expected state transitions and required packets for
the disconnection are similar to the flow below.

	 00 (Process A)				(Process B)
	 01 ESTABLISHED				ESTABLISHED
	 02 close()
	 03 FIN_WAIT_1
	 04 		---FIN-->
	 05 					CLOSE_WAIT
	 06 		<--ACK---
	 07 FIN_WAIT_2
	 08 		<--FIN/ACK---
	 09 TIME_WAIT
	 10 		---ACK-->
	 11 					LAST_ACK
	 12 CLOSED				CLOSED

The ACKs in question are those in lines 6 and 8.  If the line 8 packet
is processed before the line 6 packet, it is simply ignored because it
is not an expected packet.  The later processing of the line 6 packet
changes the state of Process A to FIN_WAIT_2, but because the line 8
packet has already been handled, Process A does not go to TIME_WAIT and
thus does not send the line 10 packet to Process B.  Thus, Process B is
left in CLOSE_WAIT state, as below.

	 00 (Process A)				(Process B)
	 01 ESTABLISHED				ESTABLISHED
	 02 close()
	 03 FIN_WAIT_1
	 04 		---FIN-->
	 05 					CLOSE_WAIT
	 06 				(<--ACK---)
	 07	  			(<--FIN/ACK---)
	 08 				(fired in right order)
	 09 		<--FIN/ACK---
	 10 		<--ACK---
	 11 		(processed in reverse order)
	 12 FIN_WAIT_2

Later, if Process B sends a SYN to Process A for reconnection using the
same port, Process A responds with an ACK for the previous flow, which
does not carry an increased sequence number.  Thus, Process B sends a
RST, waits for TCP_TIMEOUT_INIT (one second by default), and then
retries the connection.  If reconnections are frequent, the one-second
latency spikes can be a big problem.  Below is tcpdump output showing
the problem:

    14.436259 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
    14.436266 IP 127.0.0.1.4242 > 127.0.0.1.45150: Flags [.], ack 5, win 512
    14.436271 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [R], seq 2541101298
    /* ONE SECOND DELAY */
    15.464613 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
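
For reference, the SYN_SENT check that rejects such an ACK can be
sketched in user space as below.  This is only a minimal sketch and not
the kernel code: seq_after() and ack_is_unacceptable() are hypothetical
names that mirror the wrap-around comparison and the condition touched
by patch 2, and any sequence numbers fed to it are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-around safe "seq1 is after seq2" on 32-bit sequence numbers. */
bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) < 0;
}

/*
 * In SYN_SENT, an ACK is acceptable only if snd_una < ack_seq <= snd_nxt.
 * Anything else (for example, an ACK left over from an old flow) is
 * answered with a RST.
 */
bool ack_is_unacceptable(uint32_t ack_seq, uint32_t snd_una,
			 uint32_t snd_nxt)
{
	return !seq_after(ack_seq, snd_una) || seq_after(ack_seq, snd_nxt);
}
```

With a SYN just sent, snd_nxt is one past snd_una, so only an ACK equal
to snd_nxt passes.  An ACK belonging to the previous connection falls
outside that window, which is why the SYN sender answers with a RST.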

Patchset Organization
---------------------

The first patch fixes a trivial nit.  The second one fixes the problem
by adjusting the resend delay of the SYN in this case.  Finally, the
third patch adds a user space test to reproduce this problem.

The patches are based on v5.5.  You can also clone the complete git
tree:

    $ git clone git://github.com/sjp38/linux -b patches/finack_lat/v1

A web view is also available:
https://github.com/sjp38/linux/tree/patches/finack_lat/v1

SeongJae Park (3):
  net/ipv4/inet_timewait_sock: Fix inconsistent comments
  tcp: Reduce SYN resend delay if a suspicous ACK is received
  selftests: net: Add FIN_ACK processing order related latency spike
    test

 net/ipv4/inet_timewait_sock.c                 |  1 +
 net/ipv4/tcp_input.c                          |  6 +-
 tools/testing/selftests/net/.gitignore        |  2 +
 tools/testing/selftests/net/Makefile          |  2 +
 tools/testing/selftests/net/fin_ack_lat.sh    | 42 ++++++++++
 .../selftests/net/fin_ack_lat_accept.c        | 49 +++++++++++
 .../selftests/net/fin_ack_lat_connect.c       | 81 +++++++++++++++++++
 7 files changed, 182 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/net/fin_ack_lat.sh
 create mode 100644 tools/testing/selftests/net/fin_ack_lat_accept.c
 create mode 100644 tools/testing/selftests/net/fin_ack_lat_connect.c

-- 
2.17.1



* [PATCH 1/3] net/ipv4/inet_timewait_sock: Fix inconsistent comments
  2020-01-31 12:24 [PATCH 0/3] Fix reconnection latency caused by FIN/ACK handling race sjpark
@ 2020-01-31 12:24 ` sjpark
  2020-01-31 14:54   ` Eric Dumazet
  2020-01-31 12:24 ` [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received sjpark
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: sjpark @ 2020-01-31 12:24 UTC (permalink / raw)
  To: edumazet, davem
  Cc: shuah, netdev, linux-kselftest, linux-kernel, sj38.park, aams,
	SeongJae Park

From: SeongJae Park <sjpark@amazon.de>

Commit ec94c2696f0b ("tcp/dccp: avoid one atomic operation for timewait
hashdance") mistakenly erased a comment for the second step of
`inet_twsk_hashdance()`.  This commit restores it for better
readability.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 net/ipv4/inet_timewait_sock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index c411c87ae865..fbfcd63cc170 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -120,6 +120,7 @@ void inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
 
 	spin_lock(lock);
 
+	/* Step 2: Hash TW into tcp ehash chain. */
 	inet_twsk_add_node_rcu(tw, &ehead->chain);
 
 	/* Step 3: Remove SK from hash chain */
-- 
2.17.1



* [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-01-31 12:24 [PATCH 0/3] Fix reconnection latency caused by FIN/ACK handling race sjpark
  2020-01-31 12:24 ` [PATCH 1/3] net/ipv4/inet_timewait_sock: Fix inconsistent comments sjpark
@ 2020-01-31 12:24 ` sjpark
  2020-01-31 15:01   ` Eric Dumazet
  2020-01-31 15:10   ` Neal Cardwell
  2020-01-31 12:24 ` [PATCH 3/3] selftests: net: Add FIN_ACK processing order related latency spike test sjpark
  2020-01-31 14:00 ` [PATCH 0/3] Fix reconnection latency caused by FIN/ACK handling race David Laight
  3 siblings, 2 replies; 25+ messages in thread
From: sjpark @ 2020-01-31 12:24 UTC (permalink / raw)
  To: edumazet, davem
  Cc: shuah, netdev, linux-kselftest, linux-kernel, sj38.park, aams,
	SeongJae Park

From: SeongJae Park <sjpark@amazon.de>

When closing a connection, the two ACKs that are required to change the
closing socket's state to FIN_WAIT_2 and then TIME_WAIT could be
processed in reverse order.  This is possible in RSS-disabled
environments such as a connection inside a host.

For example, the expected state transitions and required packets for
the disconnection are similar to the flow below.

	 00 (Process A)				(Process B)
	 01 ESTABLISHED				ESTABLISHED
	 02 close()
	 03 FIN_WAIT_1
	 04 		---FIN-->
	 05 					CLOSE_WAIT
	 06 		<--ACK---
	 07 FIN_WAIT_2
	 08 		<--FIN/ACK---
	 09 TIME_WAIT
	 10 		---ACK-->
	 11 					LAST_ACK
	 12 CLOSED				CLOSED

The ACKs in question are those in lines 6 and 8.  If the line 8 packet
is processed before the line 6 packet, it is simply ignored because it
is not an expected packet.  The later processing of the line 6 packet
changes the state of Process A to FIN_WAIT_2, but because the line 8
packet has already been handled, Process A does not go to TIME_WAIT and
thus does not send the line 10 packet to Process B.  Thus, Process B is
left in CLOSE_WAIT state, as below.

	 00 (Process A)				(Process B)
	 01 ESTABLISHED				ESTABLISHED
	 02 close()
	 03 FIN_WAIT_1
	 04 		---FIN-->
	 05 					CLOSE_WAIT
	 06 				(<--ACK---)
	 07	  			(<--FIN/ACK---)
	 08 				(fired in right order)
	 09 		<--FIN/ACK---
	 10 		<--ACK---
	 11 		(processed in reverse order)
	 12 FIN_WAIT_2

Later, if Process B sends a SYN to Process A for reconnection using the
same port, Process A responds with an ACK for the previous flow, which
does not carry an increased sequence number.  Thus, Process B sends a
RST, waits for TCP_TIMEOUT_INIT (one second by default), and then
retries the connection.  If reconnections are frequent, the one-second
latency spikes can be a big problem.  Below is tcpdump output showing
the problem:

    14.436259 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644
    14.436266 IP 127.0.0.1.4242 > 127.0.0.1.45150: Flags [.], ack 5, win 512
    14.436271 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [R], seq 2541101298
    /* ONE SECOND DELAY */
    15.464613 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644

This commit mitigates the problem by reducing the delay for the next SYN
if a suspicious ACK is received while in SYN_SENT state.

The following commit adds a selftest, which can also be helpful for
understanding this issue.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 net/ipv4/tcp_input.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2a976f57f7e7..b168e29e1ad1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5893,8 +5893,12 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 		 *        the segment and return)"
 		 */
 		if (!after(TCP_SKB_CB(skb)->ack_seq, tp->snd_una) ||
-		    after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt))
+		    after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
+			/* Previous FIN/ACK or RST/ACK might be ignored. */
+			inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
+						  TCP_ATO_MIN, TCP_RTO_MAX);
 			goto reset_and_undo;
+		}
 
 		if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
 		    !between(tp->rx_opt.rcv_tsecr, tp->retrans_stamp,
-- 
2.17.1



* [PATCH 3/3] selftests: net: Add FIN_ACK processing order related latency spike test
  2020-01-31 12:24 [PATCH 0/3] Fix reconnection latency caused by FIN/ACK handling race sjpark
  2020-01-31 12:24 ` [PATCH 1/3] net/ipv4/inet_timewait_sock: Fix inconsistent comments sjpark
  2020-01-31 12:24 ` [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received sjpark
@ 2020-01-31 12:24 ` sjpark
  2020-01-31 14:56   ` Eric Dumazet
  2020-01-31 14:00 ` [PATCH 0/3] Fix reconnection latency caused by FIN/ACK handling race David Laight
  3 siblings, 1 reply; 25+ messages in thread
From: sjpark @ 2020-01-31 12:24 UTC (permalink / raw)
  To: edumazet, davem
  Cc: shuah, netdev, linux-kselftest, linux-kernel, sj38.park, aams,
	SeongJae Park

From: SeongJae Park <sjpark@amazon.de>

This commit adds a test for the reconnection latency spikes caused by
the FIN/ACK processing race.  The issue is described and addressed by
the previous commit ("tcp: Reduce SYN resend delay if a suspicous ACK is
received").

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 tools/testing/selftests/net/.gitignore        |  2 +
 tools/testing/selftests/net/Makefile          |  2 +
 tools/testing/selftests/net/fin_ack_lat.sh    | 42 ++++++++++
 .../selftests/net/fin_ack_lat_accept.c        | 49 +++++++++++
 .../selftests/net/fin_ack_lat_connect.c       | 81 +++++++++++++++++++
 5 files changed, 176 insertions(+)
 create mode 100755 tools/testing/selftests/net/fin_ack_lat.sh
 create mode 100644 tools/testing/selftests/net/fin_ack_lat_accept.c
 create mode 100644 tools/testing/selftests/net/fin_ack_lat_connect.c

diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 8aefd81fbc86..1bcf7b5498dd 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -22,3 +22,5 @@ ipv6_flowlabel_mgr
 so_txtime
 tcp_fastopen_backup_key
 nettest
+fin_ack_lat_accept
+fin_ack_lat_connect
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index a8e04d665b69..e4938c26ce3f 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -11,6 +11,7 @@ TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
 TEST_PROGS += udpgro_bench.sh udpgro.sh test_vxlan_under_vrf.sh reuseport_addr_any.sh
 TEST_PROGS += test_vxlan_fdb_changelink.sh so_txtime.sh ipv6_flowlabel.sh
 TEST_PROGS += tcp_fastopen_backup_key.sh fcnal-test.sh l2tp.sh traceroute.sh
+TEST_PROGS += fin_ack_lat.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket nettest
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
@@ -18,6 +19,7 @@ TEST_GEN_FILES += tcp_mmap tcp_inq psock_snd txring_overwrite
 TEST_GEN_FILES += udpgso udpgso_bench_tx udpgso_bench_rx ip_defrag
 TEST_GEN_FILES += so_txtime ipv6_flowlabel ipv6_flowlabel_mgr
 TEST_GEN_FILES += tcp_fastopen_backup_key
+TEST_GEN_FILES += fin_ack_lat_accept fin_ack_lat_connect
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
 TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls
 
diff --git a/tools/testing/selftests/net/fin_ack_lat.sh b/tools/testing/selftests/net/fin_ack_lat.sh
new file mode 100755
index 000000000000..0a398c837b7a
--- /dev/null
+++ b/tools/testing/selftests/net/fin_ack_lat.sh
@@ -0,0 +1,42 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test latency spikes caused by FIN/ACK handling race.
+
+set +x
+set -e
+
+tmpfile=$(mktemp /tmp/fin_ack_latency.XXXX.log)
+
+kill_accept() {
+	kill $ACCEPT_PID
+}
+
+cleanup() {
+	kill_accept
+	rm -f $tmpfile
+}
+
+trap cleanup EXIT
+
+do_test() {
+	RUNTIME=$1
+
+	./fin_ack_lat_accept &
+	ACCEPT_PID=$!
+	sleep 1
+
+	./fin_ack_lat_connect | tee $tmpfile &
+	sleep $RUNTIME
+	NR_SPIKES=$(wc -l $tmpfile | awk '{print $1}')
+	rm $tmpfile
+	if [ $NR_SPIKES -gt 0 ]
+	then
+		echo "FAIL: $NR_SPIKES spikes detected"
+		return 1
+	fi
+	return 0
+}
+
+do_test "30"
+echo "test done"
diff --git a/tools/testing/selftests/net/fin_ack_lat_accept.c b/tools/testing/selftests/net/fin_ack_lat_accept.c
new file mode 100644
index 000000000000..a0f0210f12b4
--- /dev/null
+++ b/tools/testing/selftests/net/fin_ack_lat_accept.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <error.h>
+#include <netinet/in.h>
+#include <stdio.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+int main(int argc, char const *argv[])
+{
+	int sock, new_sock;
+	int opt = 1;
+	struct sockaddr_in address;
+	int addrlen = sizeof(address);
+	int buffer;
+	int rc;
+
+	sock = socket(AF_INET, SOCK_STREAM, 0);
+	if (sock < 0)
+		error(-1, -1, "socket");
+
+	rc = setsockopt(sock, SOL_SOCKET, SO_REUSEADDR | SO_REUSEPORT,
+			&opt, sizeof(opt));
+	if (rc == -1)
+		error(-1, -1, "setsockopt");
+
+	address.sin_family = AF_INET;
+	address.sin_addr.s_addr = INADDR_ANY;
+	address.sin_port = htons(4242);
+
+	rc = bind(sock, (struct sockaddr *)&address, sizeof(address));
+	if (rc < 0)
+		error(-1, -1, "bind");
+
+	rc = listen(sock, 3);
+	if (rc < 0)
+		error(-1, -1, "listen");
+
+	while (1) {
+		new_sock = accept(sock, (struct sockaddr *)&address,
+				(socklen_t *)&addrlen);
+		if (new_sock < 0)
+			error(-1, -1, "accept");
+
+		rc = read(new_sock, &buffer, sizeof(buffer));
+		close(new_sock);
+	}
+	return 0;
+}
diff --git a/tools/testing/selftests/net/fin_ack_lat_connect.c b/tools/testing/selftests/net/fin_ack_lat_connect.c
new file mode 100644
index 000000000000..abfdd79f2e17
--- /dev/null
+++ b/tools/testing/selftests/net/fin_ack_lat_connect.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <arpa/inet.h>
+#include <error.h>
+#include <netinet/tcp.h>
+#include <stdio.h>
+#include <sys/socket.h>
+#include <sys/time.h>
+#include <unistd.h>
+
+static unsigned long timediff(struct timeval s, struct timeval e)
+{
+	if (s.tv_sec > e.tv_sec)
+		return 0;
+	return (e.tv_sec - s.tv_sec) * 1000000 + e.tv_usec - s.tv_usec;
+}
+
+int main(int argc, char const *argv[])
+{
+	int sock = 0;
+	struct sockaddr_in addr, laddr;
+	socklen_t len = sizeof(laddr);
+	struct linger sl;
+	int flag = 1;
+	int buffer;
+	int rc;
+	struct timeval start, end;
+	unsigned long lat, sum_lat = 0, nr_lat = 0;
+
+	while (1) {
+		gettimeofday(&start, NULL);
+
+		sock = socket(AF_INET, SOCK_STREAM, 0);
+		if (sock < 0)
+			error(-1, -1, "socket creation");
+
+		sl.l_onoff = 1;
+		sl.l_linger = 0;
+		if (setsockopt(sock, SOL_SOCKET, SO_LINGER, &sl, sizeof(sl)))
+			error(-1, -1, "setsockopt(linger)");
+
+		if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
+					&flag, sizeof(flag)))
+			error(-1, -1, "setsockopt(nodelay)");
+
+		addr.sin_family = AF_INET;
+		addr.sin_port = htons(4242);
+
+		rc = inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
+		if (rc <= 0)
+			error(-1, -1, "inet_pton");
+
+		rc = connect(sock, (struct sockaddr *)&addr, sizeof(addr));
+		if (rc < 0)
+			error(-1, -1, "connect");
+
+		send(sock, &buffer, sizeof(buffer), 0);
+
+		rc = read(sock, &buffer, sizeof(buffer));
+
+		gettimeofday(&end, NULL);
+		lat = timediff(start, end);
+		sum_lat += lat;
+		nr_lat++;
+		if (lat > 100000) {
+			rc = getsockname(sock, (struct sockaddr *)&laddr, &len);
+			if (rc == -1)
+				error(-1, -1, "getsockname");
+			printf("port: %d, lat: %lu, avg: %lu, nr: %lu\n",
+					ntohs(laddr.sin_port), lat,
+					sum_lat / nr_lat, nr_lat);
+		}
+
+		if (nr_lat % 1000 == 0)
+			fflush(stdout);
+
+
+		close(sock);
+	}
+	return 0;
+}
-- 
2.17.1



* RE: [PATCH 0/3] Fix reconnection latency caused by FIN/ACK handling race
  2020-01-31 12:24 [PATCH 0/3] Fix reconnection latency caused by FIN/ACK handling race sjpark
                   ` (2 preceding siblings ...)
  2020-01-31 12:24 ` [PATCH 3/3] selftests: net: Add FIN_ACK processing order related latency spike test sjpark
@ 2020-01-31 14:00 ` David Laight
  2020-01-31 15:05   ` sjpark
  3 siblings, 1 reply; 25+ messages in thread
From: David Laight @ 2020-01-31 14:00 UTC (permalink / raw)
  To: 'sjpark@amazon.com', edumazet, davem
  Cc: shuah, netdev, linux-kselftest, linux-kernel, sj38.park, aams,
	SeongJae Park

From: sjpark@amazon.com
> Sent: 31 January 2020 12:24
...
> The acks in lines 6 and 8 are the acks.  If the line 8 packet is
> processed before the line 6 packet, it will be just ignored as it is not
> a expected packet, and the later process of the line 6 packet will
> change the status of Process A to FIN_WAIT_2, but as it has already
> handled line 8 packet, it will not go to TIME_WAIT and thus will not
> send the line 10 packet to Process B.  Thus, Process B will left in
> CLOSE_WAIT status, as below.
> 
> 	 00 (Process A)				(Process B)
> 	 01 ESTABLISHED				ESTABLISHED
> 	 02 close()
> 	 03 FIN_WAIT_1
> 	 04 		---FIN-->
> 	 05 					CLOSE_WAIT
> 	 06 				(<--ACK---)
> 	 07	  			(<--FIN/ACK---)
> 	 08 				(fired in right order)
> 	 09 		<--FIN/ACK---
> 	 10 		<--ACK---
> 	 11 		(processed in reverse order)
> 	 12 FIN_WAIT_2

Why doesn't A treat the FIN/ACK (09) as valid (as if
the ACK had got lost) and then ignore the ACK (10) because
it refers to a closed socket?

I presume that B sends two ACKs (06 and 07) because it can
sit in an intermediate state and the first ACK stops the FIN
being resent?

I've implemented lots of protocols in my time, but not TCP.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: [PATCH 1/3] net/ipv4/inet_timewait_sock: Fix inconsistent comments
  2020-01-31 12:24 ` [PATCH 1/3] net/ipv4/inet_timewait_sock: Fix inconsistent comments sjpark
@ 2020-01-31 14:54   ` Eric Dumazet
  2020-01-31 15:09     ` sjpark
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2020-01-31 14:54 UTC (permalink / raw)
  To: sjpark
  Cc: David Miller, Shuah Khan, netdev,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, sj38.park, aams,
	SeongJae Park

On Fri, Jan 31, 2020 at 4:24 AM <sjpark@amazon.com> wrote:
>
> From: SeongJae Park <sjpark@amazon.de>
>
> Commit ec94c2696f0b ("tcp/dccp: avoid one atomic operation for timewait
> hashdance") mistakenly erased a comment for the second step of
> `inet_twsk_hashdance()`.  This commit restores it for better
> readability.
>
> Signed-off-by: SeongJae Park <sjpark@amazon.de>
> ---
>  net/ipv4/inet_timewait_sock.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> index c411c87ae865..fbfcd63cc170 100644
> --- a/net/ipv4/inet_timewait_sock.c
> +++ b/net/ipv4/inet_timewait_sock.c
> @@ -120,6 +120,7 @@ void inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
>
>         spin_lock(lock);
>
> +       /* Step 2: Hash TW into tcp ehash chain. */

This comment adds no value, please do not bring it back.

net-next is closed, now is not the time for cosmetic changes.

Also take a look at Documentation/networking/netdev-FAQ.rst


* Re: [PATCH 3/3] selftests: net: Add FIN_ACK processing order related latency spike test
  2020-01-31 12:24 ` [PATCH 3/3] selftests: net: Add FIN_ACK processing order related latency spike test sjpark
@ 2020-01-31 14:56   ` Eric Dumazet
  2020-01-31 15:13     ` sjpark
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2020-01-31 14:56 UTC (permalink / raw)
  To: sjpark
  Cc: David Miller, Shuah Khan, netdev,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, sj38.park, aams,
	SeongJae Park

On Fri, Jan 31, 2020 at 4:25 AM <sjpark@amazon.com> wrote:
>
> From: SeongJae Park <sjpark@amazon.de>
>
> This commit adds a test for FIN_ACK process races related reconnection
> latency spike issues.  The issue has described and solved by the
> previous commit ("tcp: Reduce SYN resend delay if a suspicous ACK is
> received").
>

I do not know about other tests, but using a hard-coded port (4242) is
going to be flaky, since the port might already be in use.

Please make sure to run tests in a separate namespace.
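
As a complementary sketch (a different technique from the separate
namespace requested above, and not a replacement for it), the listener
could ask the kernel for a free port instead of hard-coding one.  The
helper name listen_on_free_port() is hypothetical; the socket calls are
the standard POSIX ones.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a listening TCP socket on a kernel-chosen port.
 * Returns the socket fd and stores the port in *port, or -1 on error.
 */
int listen_on_free_port(uint16_t *port)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int sock;

	sock = socket(AF_INET, SOCK_STREAM, 0);
	if (sock < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(0);	/* 0 asks the kernel to pick a port */

	if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(sock, 3) < 0 ||
	    getsockname(sock, (struct sockaddr *)&addr, &len) < 0) {
		close(sock);
		return -1;
	}

	*port = ntohs(addr.sin_port);	/* report the port actually bound */
	return sock;
}
```

The test script would then have to pass the reported port to the
connecting program, for example on its command line.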


* Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-01-31 12:24 ` [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received sjpark
@ 2020-01-31 15:01   ` Eric Dumazet
  2020-01-31 16:12     ` sjpark
  2020-01-31 15:10   ` Neal Cardwell
  1 sibling, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2020-01-31 15:01 UTC (permalink / raw)
  To: sjpark
  Cc: David Miller, Shuah Khan, netdev,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, sj38.park, aams,
	SeongJae Park

On Fri, Jan 31, 2020 at 4:25 AM <sjpark@amazon.com> wrote:

> Signed-off-by: SeongJae Park <sjpark@amazon.de>
> ---
>  net/ipv4/tcp_input.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 2a976f57f7e7..b168e29e1ad1 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5893,8 +5893,12 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
>                  *        the segment and return)"
>                  */
>                 if (!after(TCP_SKB_CB(skb)->ack_seq, tp->snd_una) ||
> -                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt))
> +                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
> +                       /* Previous FIN/ACK or RST/ACK might be ignore. */
> +                       inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
> +                                                 TCP_ATO_MIN, TCP_RTO_MAX);

This is not what I suggested.

I suggested implementing a strategy where only the _first_ retransmit
would be done earlier.

So you need to look at the current counter of retransmit attempts, then
reset the timer if this SYN_SENT socket never resent a SYN.

We do not want to trigger packet storms, if for some reason the remote
peer constantly sends us the same packet.
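
The strategy described above could be sketched roughly as follows.
This is a user-space toy under stated assumptions, not kernel code:
next_syn_delay_ms() and both constants are hypothetical, and the
regular retransmit timer is simplified to one fixed value.

```c
/* Illustrative values in milliseconds; not the kernel's constants. */
#define EARLY_RETRY_MS		40
#define DEFAULT_RETRY_MS	1000

/*
 * Pick the delay before the next SYN after an unacceptable ACK was seen.
 * Only a socket that has never retransmitted its SYN gets the short
 * delay, so a peer that keeps replaying the same packet cannot make us
 * resend at the short interval over and over.
 */
unsigned int next_syn_delay_ms(unsigned int syn_retransmits)
{
	if (syn_retransmits == 0)
		return EARLY_RETRY_MS;
	return DEFAULT_RETRY_MS;
}
```

Keying the decision on the retransmit counter bounds the early retry to
once per connection attempt.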

Thanks.

>                         goto reset_and_undo;
> +               }
>
>                 if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
>                     !between(tp->rx_opt.rcv_tsecr, tp->retrans_stamp,
> --
> 2.17.1
>


* Re: RE: [PATCH 0/3] Fix reconnection latency caused by FIN/ACK handling race
  2020-01-31 14:00 ` [PATCH 0/3] Fix reconnection latency caused by FIN/ACK handling race David Laight
@ 2020-01-31 15:05   ` sjpark
  0 siblings, 0 replies; 25+ messages in thread
From: sjpark @ 2020-01-31 15:05 UTC (permalink / raw)
  To: David Laight
  Cc: 'sjpark@amazon.com',
	edumazet, davem, shuah, netdev, linux-kselftest, linux-kernel,
	sj38.park, aams, SeongJae Park

On Fri, 31 Jan 2020 14:00:27 +0000 David Laight <David.Laight@ACULAB.COM> wrote:

> From: sjpark@amazon.com
> > Sent: 31 January 2020 12:24
> ...
> > The acks in lines 6 and 8 are the acks.  If the line 8 packet is
> > processed before the line 6 packet, it will be just ignored as it is not
> > a expected packet, and the later process of the line 6 packet will
> > change the status of Process A to FIN_WAIT_2, but as it has already
> > handled line 8 packet, it will not go to TIME_WAIT and thus will not
> > send the line 10 packet to Process B.  Thus, Process B will left in
> > CLOSE_WAIT status, as below.
> > 
> > 	 00 (Process A)				(Process B)
> > 	 01 ESTABLISHED				ESTABLISHED
> > 	 02 close()
> > 	 03 FIN_WAIT_1
> > 	 04 		---FIN-->
> > 	 05 					CLOSE_WAIT
> > 	 06 				(<--ACK---)
> > 	 07	  			(<--FIN/ACK---)
> > 	 08 				(fired in right order)
> > 	 09 		<--FIN/ACK---
> > 	 10 		<--ACK---
> > 	 11 		(processed in reverse order)
> > 	 12 FIN_WAIT_2
> 
> Why doesn't A treat the FIN/ACK (09) as valid (as if
> the ACK had got lost) and then ignore the ACK (10) because
> it refers to a closed socket?

Because the TCP protocol (RFC 793) does not define such speculative
handling.  TCP is a stateful protocol.  Thus, packets that arrive in an
unexpected state are not required to be respected, AFAIU.

> 
> I presume that B sends two ACKs (06 and 07) because it can
> sit in an intermediate state and the first ACK stops the FIN
> being resent?

I think there is no such provision in the protocol, either.

> 
> I've implemented lots of protocols in my time, but not TCP.

If you find anything I'm misunderstanding, please don't hesitate to yell at me.
I hope the previous discussion[1] regarding this issue is helpful.


Thanks,
SeongJae Park

[1] https://lore.kernel.org/bpf/20200129171403.3926-1-sjpark@amazon.com/

> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)


* Re: Re: [PATCH 1/3] net/ipv4/inet_timewait_sock: Fix inconsistent comments
  2020-01-31 14:54   ` Eric Dumazet
@ 2020-01-31 15:09     ` sjpark
  0 siblings, 0 replies; 25+ messages in thread
From: sjpark @ 2020-01-31 15:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: sjpark, David Miller, Shuah Khan, netdev,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, sj38.park, aams,
	SeongJae Park

On Fri, 31 Jan 2020 06:54:53 -0800 Eric Dumazet <edumazet@google.com> wrote:

> On Fri, Jan 31, 2020 at 4:24 AM <sjpark@amazon.com> wrote:
> >
> > From: SeongJae Park <sjpark@amazon.de>
> >
> > Commit ec94c2696f0b ("tcp/dccp: avoid one atomic operation for timewait
> > hashdance") mistakenly erased a comment for the second step of
> > `inet_twsk_hashdance()`.  This commit restores it for better
> > readability.
> >
> > Signed-off-by: SeongJae Park <sjpark@amazon.de>
> > ---
> >  net/ipv4/inet_timewait_sock.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> > index c411c87ae865..fbfcd63cc170 100644
> > --- a/net/ipv4/inet_timewait_sock.c
> > +++ b/net/ipv4/inet_timewait_sock.c
> > @@ -120,6 +120,7 @@ void inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
> >
> >         spin_lock(lock);
> >
> > +       /* Step 2: Hash TW into tcp ehash chain. */
> 
> This comment adds no value, please do not bring it back.
> 
> net-next is closed, now is not the time for cosmetic changes.
> 
> Also take a look at Documentation/networking/netdev-FAQ.rst

Thank you for the kind reference.  I will drop this in the next spin.


Thanks,
SeongJae Park

> 


* Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-01-31 12:24 ` [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received sjpark
  2020-01-31 15:01   ` Eric Dumazet
@ 2020-01-31 15:10   ` Neal Cardwell
  2020-01-31 18:12     ` Eric Dumazet
  1 sibling, 1 reply; 25+ messages in thread
From: Neal Cardwell @ 2020-01-31 15:10 UTC (permalink / raw)
  To: sjpark
  Cc: Eric Dumazet, David Miller, shuah, Netdev, linux-kselftest, LKML,
	sj38.park, aams, SeongJae Park, Yuchung Cheng

On Fri, Jan 31, 2020 at 7:25 AM <sjpark@amazon.com> wrote:
>
> From: SeongJae Park <sjpark@amazon.de>
>
> When closing a connection, the two acks that required to change closing
> socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
> reverse order.  This is possible in RSS disabled environments such as a
> connection inside a host.
>
> For example, expected state transitions and required packets for the
> disconnection will be similar to below flow.
>
>          00 (Process A)                         (Process B)
>          01 ESTABLISHED                         ESTABLISHED
>          02 close()
>          03 FIN_WAIT_1
>          04             ---FIN-->
>          05                                     CLOSE_WAIT
>          06             <--ACK---
>          07 FIN_WAIT_2
>          08             <--FIN/ACK---
>          09 TIME_WAIT
>          10             ---ACK-->
>          11                                     LAST_ACK
>          12 CLOSED                              CLOSED

AFAICT this sequence is not quite what would happen: it would differ
starting at line 8, and would unfold as follows:

          08                                     close()
          09                                     LAST_ACK
          10             <--FIN/ACK---
          11 TIME_WAIT
          12             ---ACK-->
          13 CLOSED                              CLOSED


> The acks in lines 6 and 8 are the acks.  If the line 8 packet is
> processed before the line 6 packet, it will be just ignored as it is not
> a expected packet,

AFAICT that is where the bug starts.

AFAICT, from first principles, when process A receives the FIN/ACK it
should move to TIME_WAIT even if it has not received a preceding ACK.
That's because ACKs are cumulative. So receiving a later cumulative
ACK conveys all the information in the previous ACKs.

Also, consider the de facto standard state transition diagram from
"TCP/IP Illustrated, Volume 2: The Implementation", by Wright and
Stevens, e.g.:

  https://courses.cs.washington.edu/courses/cse461/19sp/lectures/TCPIP_State_Transition_Diagram.pdf

This first-principles analysis agrees with the Wright/Stevens diagram,
which says that a connection in FIN_WAIT_1 that receives a FIN/ACK
should move to TIME_WAIT.
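
A toy model of the closing side's transitions makes the point concrete.
It is only a sketch of the state diagram discussed above, not the
kernel's implementation; the enum and on_segment() are hypothetical
names introduced for illustration.

```c
enum close_state {
	ST_FIN_WAIT_1,
	ST_FIN_WAIT_2,
	ST_CLOSING,
	ST_TIME_WAIT,
};

/*
 * Next state of the side that closed first, given one incoming segment.
 * has_fin:      the segment carries FIN
 * acks_our_fin: the segment's cumulative ACK covers our FIN
 */
enum close_state on_segment(enum close_state state, int has_fin,
			    int acks_our_fin)
{
	switch (state) {
	case ST_FIN_WAIT_1:
		if (has_fin && acks_our_fin)
			return ST_TIME_WAIT;	/* FIN/ACK seen first */
		if (has_fin)
			return ST_CLOSING;	/* simultaneous close */
		if (acks_our_fin)
			return ST_FIN_WAIT_2;	/* plain ACK of our FIN */
		return ST_FIN_WAIT_1;
	case ST_FIN_WAIT_2:
		return has_fin ? ST_TIME_WAIT : ST_FIN_WAIT_2;
	case ST_CLOSING:
		return acks_our_fin ? ST_TIME_WAIT : ST_CLOSING;
	default:
		return state;		/* TIME_WAIT: late ACKs change nothing */
	}
}
```

In this model, feeding the plain ACK and then the FIN/ACK, or the
FIN/ACK and then the plain ACK, ends in the same state, because the
cumulative ACK on the FIN/ACK already carries what the plain ACK says.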

This seems like a faster and more robust solution than installing
special timers.

Thoughts?

neal


* Re: Re: [PATCH 3/3] selftests: net: Add FIN_ACK processing order related latency spike test
  2020-01-31 14:56   ` Eric Dumazet
@ 2020-01-31 15:13     ` sjpark
  0 siblings, 0 replies; 25+ messages in thread
From: sjpark @ 2020-01-31 15:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: sjpark, David Miller, Shuah Khan, netdev,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, sj38.park, aams,
	SeongJae Park

On Fri, 31 Jan 2020 06:56:13 -0800 Eric Dumazet <edumazet@google.com> wrote:

> On Fri, Jan 31, 2020 at 4:25 AM <sjpark@amazon.com> wrote:
> >
> > From: SeongJae Park <sjpark@amazon.de>
> >
> > This commit adds a test for FIN_ACK process races related reconnection
> > latency spike issues.  The issue has described and solved by the
> > previous commit ("tcp: Reduce SYN resend delay if a suspicous ACK is
> > received").
> >
> 
> I do not know about other tests, but using a hard-coded port (4242) is
> going to be flaky, since the port might already be in use.
> 
> Please make sure to run tests on a separate namespace.

Agreed, will do so in next spin.


Thanks,
SeongJae Park

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-01-31 15:01   ` Eric Dumazet
@ 2020-01-31 16:12     ` sjpark
  2020-01-31 16:55       ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: sjpark @ 2020-01-31 16:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: sjpark, David Miller, Shuah Khan, netdev,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, sj38.park, aams,
	SeongJae Park

On Fri, 31 Jan 2020 07:01:21 -0800 Eric Dumazet <edumazet@google.com> wrote:

> On Fri, Jan 31, 2020 at 4:25 AM <sjpark@amazon.com> wrote:
> 
> > Signed-off-by: SeongJae Park <sjpark@amazon.de>
> > ---
> >  net/ipv4/tcp_input.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 2a976f57f7e7..b168e29e1ad1 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -5893,8 +5893,12 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
> >                  *        the segment and return)"
> >                  */
> >                 if (!after(TCP_SKB_CB(skb)->ack_seq, tp->snd_una) ||
> > -                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt))
> > +                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
> > +                       /* Previous FIN/ACK or RST/ACK might be ignore. */
> > +                       inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
> > +                                                 TCP_ATO_MIN, TCP_RTO_MAX);
> 
> This is not what I suggested.
> 
> I suggested implementing a strategy where only the _first_ retransmit
> would be done earlier.
> 
> So you need to look at the current counter of retransmit attempts,
> then reset the timer if this SYN_SENT
> socket never resent a SYN.
> 
> We do not want to trigger packet storms, if for some reason the remote
> peer constantly sends
> us the same packet.

You're right, I missed the important point; thank you for pointing it out.  Among
the retransmission-related fields of 'tcp_sock', I think '->total_retrans' would
fit this check.  How about the change below?

```
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2a976f57f7e7..29fc0e4da931 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5893,8 +5893,14 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
                 *        the segment and return)"
                 */
                if (!after(TCP_SKB_CB(skb)->ack_seq, tp->snd_una) ||
-                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt))
+                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
+                       /* Previous FIN/ACK or RST/ACK might be ignored. */
+                       if (tp->total_retrans == 0)
+                               inet_csk_reset_xmit_timer(sk,
+                                               ICSK_TIME_RETRANS, TCP_ATO_MIN,
+                                               TCP_RTO_MAX);
                        goto reset_and_undo;
+               }

                if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
                    !between(tp->rx_opt.rcv_tsecr, tp->retrans_stamp,
```

Thanks,
SeongJae Park

> 
> Thanks.
> 
> >                         goto reset_and_undo;
> > +               }
> >
> >                 if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
> >                     !between(tp->rx_opt.rcv_tsecr, tp->retrans_stamp,
> > --
> > 2.17.1
> >
> 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-01-31 16:12     ` sjpark
@ 2020-01-31 16:55       ` Eric Dumazet
  2020-01-31 17:05         ` sjpark
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2020-01-31 16:55 UTC (permalink / raw)
  To: sjpark
  Cc: David Miller, Shuah Khan, netdev,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, sj38.park, aams,
	SeongJae Park

On Fri, Jan 31, 2020 at 8:12 AM <sjpark@amazon.com> wrote:
>
> On Fri, 31 Jan 2020 07:01:21 -0800 Eric Dumazet <edumazet@google.com> wrote:
>
> > On Fri, Jan 31, 2020 at 4:25 AM <sjpark@amazon.com> wrote:
> >
> > > Signed-off-by: SeongJae Park <sjpark@amazon.de>
> > > ---
> > >  net/ipv4/tcp_input.c | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index 2a976f57f7e7..b168e29e1ad1 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -5893,8 +5893,12 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
> > >                  *        the segment and return)"
> > >                  */
> > >                 if (!after(TCP_SKB_CB(skb)->ack_seq, tp->snd_una) ||
> > > -                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt))
> > > +                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
> > > +                       /* Previous FIN/ACK or RST/ACK might be ignore. */
> > > +                       inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
> > > +                                                 TCP_ATO_MIN, TCP_RTO_MAX);
> >
> > This is not what I suggested.
> >
> > I suggested implementing a strategy where only the _first_ retransmit
> > would be done earlier.
> >
> > So you need to look at the current counter of retransmit attempts,
> > then reset the timer if this SYN_SENT
> > socket never resent a SYN.
> >
> > We do not want to trigger packet storms, if for some reason the remote
> > peer constantly sends
> > us the same packet.
>
> You're right, I missed the important point, thank you for pointing it.  Among
> retransmission related fields of 'tcp_sock', I think '->total_retrans' would
> fit for this check.  How about below change?
>
> ```
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 2a976f57f7e7..29fc0e4da931 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5893,8 +5893,14 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
>                  *        the segment and return)"
>                  */
>                 if (!after(TCP_SKB_CB(skb)->ack_seq, tp->snd_una) ||
> -                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt))
> +                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
> +                       /* Previous FIN/ACK or RST/ACK might be ignored. */
> +                       if (tp->total_retrans == 0)

canonical field would be icsk->icsk_retransmits (look in net/ipv4/tcp_timer.c )

AFAIK, it seems we forget to clear tp->total_retrans in tcp_disconnect()
I will send a patch for this tp->total_retrans thing.

> +                               inet_csk_reset_xmit_timer(sk,
> +                                               ICSK_TIME_RETRANS, TCP_ATO_MIN,
> +                                               TCP_RTO_MAX);
>                         goto reset_and_undo;
> +               }
>
>                 if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
>                     !between(tp->rx_opt.rcv_tsecr, tp->retrans_stamp,
> ```
>
> Thanks,
> SeongJae Park
>
> >
> > Thanks.
> >
> > >                         goto reset_and_undo;
> > > +               }
> > >
> > >                 if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
> > >                     !between(tp->rx_opt.rcv_tsecr, tp->retrans_stamp,
> > > --
> > > 2.17.1
> > >
> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Re: Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-01-31 16:55       ` Eric Dumazet
@ 2020-01-31 17:05         ` sjpark
  2020-01-31 17:08           ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: sjpark @ 2020-01-31 17:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: sjpark, David Miller, Shuah Khan, netdev,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, sj38.park, aams,
	SeongJae Park

On Fri, 31 Jan 2020 08:55:08 -0800 Eric Dumazet <edumazet@google.com> wrote:

> On Fri, Jan 31, 2020 at 8:12 AM <sjpark@amazon.com> wrote:
> >
> > On Fri, 31 Jan 2020 07:01:21 -0800 Eric Dumazet <edumazet@google.com> wrote:
> >
> > > On Fri, Jan 31, 2020 at 4:25 AM <sjpark@amazon.com> wrote:
> > >
> > > > Signed-off-by: SeongJae Park <sjpark@amazon.de>
> > > > ---
> > > >  net/ipv4/tcp_input.c | 6 +++++-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > index 2a976f57f7e7..b168e29e1ad1 100644
> > > > --- a/net/ipv4/tcp_input.c
> > > > +++ b/net/ipv4/tcp_input.c
> > > > @@ -5893,8 +5893,12 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
> > > >                  *        the segment and return)"
> > > >                  */
> > > >                 if (!after(TCP_SKB_CB(skb)->ack_seq, tp->snd_una) ||
> > > > -                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt))
> > > > +                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
> > > > +                       /* Previous FIN/ACK or RST/ACK might be ignore. */
> > > > +                       inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
> > > > +                                                 TCP_ATO_MIN, TCP_RTO_MAX);
> > >
> > > This is not what I suggested.
> > >
> > > I suggested implementing a strategy where only the _first_ retransmit
> > > would be done earlier.
> > >
> > > So you need to look at the current counter of retransmit attempts,
> > > then reset the timer if this SYN_SENT
> > > socket never resent a SYN.
> > >
> > > We do not want to trigger packet storms, if for some reason the remote
> > > peer constantly sends
> > > us the same packet.
> >
> > You're right, I missed the important point, thank you for pointing it.  Among
> > retransmission related fields of 'tcp_sock', I think '->total_retrans' would
> > fit for this check.  How about below change?
> >
> > ```
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 2a976f57f7e7..29fc0e4da931 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -5893,8 +5893,14 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
> >                  *        the segment and return)"
> >                  */
> >                 if (!after(TCP_SKB_CB(skb)->ack_seq, tp->snd_una) ||
> > -                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt))
> > +                   after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
> > +                       /* Previous FIN/ACK or RST/ACK might be ignored. */
> > +                       if (tp->total_retrans == 0)
> 
> canonical field would be icsk->icsk_retransmits (look in net/ipv4/tcp_timer.c )
> 
> AFAIK, it seems we forget to clear tp->total_retrans in tcp_disconnect()
> I will send a patch for this tp->total_retrans thing.

Oh, then I will use 'icsk->icsk_retransmits' instead of 'tp->total_retrans' in the
next spin.  May I also ask you to Cc me on your 'tp->total_retrans' fix patch?
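
A minimal userspace sketch of the guard Eric describes: shorten the SYN
retransmission timer on an unacceptable ACK only while no SYN has been
retransmitted yet.  This is a toy model, not the patch.  struct synsent_model,
its fields and both functions are hypothetical names; MODEL_HZ is an assumed
tick rate; the 40 ms and 1 s values mirror the kernel's TCP_ATO_MIN (HZ/25)
and TCP_TIMEOUT_INIT (1*HZ); the doubling backoff is simplified.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_HZ		1000			/* assumed ticks per second */
#define MODEL_ATO_MIN		(MODEL_HZ / 25)		/* 40 ms */
#define MODEL_TIMEOUT_INIT	(1 * MODEL_HZ)		/* 1 s */

struct synsent_model {
	unsigned int retransmits;	/* stands in for icsk->icsk_retransmits */
	unsigned int timer_expires;	/* ticks until the SYN timer fires */
};

/*
 * A SYN_SENT socket received an ACK with an unacceptable ack_seq.
 * Returns true if the retransmission timer was shortened.
 */
bool model_on_unacceptable_ack(struct synsent_model *s)
{
	assert(s);
	if (s->retransmits != 0)
		return false;	/* a SYN was already resent: keep the backoff */
	s->timer_expires = MODEL_ATO_MIN;
	return true;
}

/* The timer fired: the SYN is resent and the next timeout is doubled. */
void model_on_retransmit_timer(struct synsent_model *s)
{
	assert(s);
	s->retransmits++;
	s->timer_expires = MODEL_TIMEOUT_INIT << s->retransmits;
}
```

In this model a peer that keeps sending the same stale ACK triggers at most
one early SYN; every later ACK leaves the backed-off timer alone.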


Thanks,
SeongJae Park

> 
> > +                               inet_csk_reset_xmit_timer(sk,
> > +                                               ICSK_TIME_RETRANS, TCP_ATO_MIN,
> > +                                               TCP_RTO_MAX);
> >                         goto reset_and_undo;
> > +               }
> >
> >                 if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
> >                     !between(tp->rx_opt.rcv_tsecr, tp->retrans_stamp,
> > ```
> >
> > Thanks,
> > SeongJae Park
> >
> > >
> > > Thanks.
> > >
> > > >                         goto reset_and_undo;
> > > > +               }
> > > >
> > > >                 if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
> > > >                     !between(tp->rx_opt.rcv_tsecr, tp->retrans_stamp,
> > > > --
> > > > 2.17.1
> > > >
> > >
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Re: Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-01-31 17:05         ` sjpark
@ 2020-01-31 17:08           ` Eric Dumazet
  0 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2020-01-31 17:08 UTC (permalink / raw)
  To: sjpark
  Cc: David Miller, Shuah Khan, netdev,
	open list:KERNEL SELFTEST FRAMEWORK, LKML, sj38.park, aams,
	SeongJae Park

On Fri, Jan 31, 2020 at 9:05 AM <sjpark@amazon.com> wrote:
> Oh, then I will use 'icsk->icsk_retransmits' instead of 'tp->total_retrans' in the
> next spin.  May I also ask you to Cc me on your 'tp->total_retrans' fix patch?
>

Sure, but I usually send my patches to netdev@

Please subscribe to the list if you want to get a copy of all TCP
patches in the future.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-01-31 15:10   ` Neal Cardwell
@ 2020-01-31 18:12     ` Eric Dumazet
  2020-01-31 22:11       ` Neal Cardwell
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2020-01-31 18:12 UTC (permalink / raw)
  To: Neal Cardwell, sjpark
  Cc: Eric Dumazet, David Miller, shuah, Netdev, linux-kselftest, LKML,
	sj38.park, aams, SeongJae Park, Yuchung Cheng



On 1/31/20 7:10 AM, Neal Cardwell wrote:
> On Fri, Jan 31, 2020 at 7:25 AM <sjpark@amazon.com> wrote:
>>
>> From: SeongJae Park <sjpark@amazon.de>
>>
>> When closing a connection, the two acks that required to change closing
>> socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
>> reverse order.  This is possible in RSS disabled environments such as a
>> connection inside a host.
>>
>> For example, expected state transitions and required packets for the
>> disconnection will be similar to below flow.
>>
>>          00 (Process A)                         (Process B)
>>          01 ESTABLISHED                         ESTABLISHED
>>          02 close()
>>          03 FIN_WAIT_1
>>          04             ---FIN-->
>>          05                                     CLOSE_WAIT
>>          06             <--ACK---
>>          07 FIN_WAIT_2
>>          08             <--FIN/ACK---
>>          09 TIME_WAIT
>>          10             ---ACK-->
>>          11                                     LAST_ACK
>>          12 CLOSED                              CLOSED
> 
> AFAICT this sequence is not quite what would happen, and that it would
> be different starting in line 8, and would unfold as follows:
> 
>           08                                     close()
>           09                                     LAST_ACK
>           10             <--FIN/ACK---
>           11 TIME_WAIT
>           12             ---ACK-->
>           13 CLOSED                              CLOSED
> 
> 
>> The acks in lines 6 and 8 are the acks.  If the line 8 packet is
>> processed before the line 6 packet, it will be just ignored as it is not
>> a expected packet,
> 
> AFAICT that is where the bug starts.
> 
> AFAICT, from first principles, when process A receives the FIN/ACK it
> should move to TIME_WAIT even if it has not received a preceding ACK.
> That's because ACKs are cumulative. So receiving a later cumulative
> ACK conveys all the information in the previous ACKs.
> 
> Also, consider the de facto standard state transition diagram from
> "TCP/IP Illustrated, Volume 2: The Implementation", by Wright and
> Stevens, e.g.:
> 
>   https://courses.cs.washington.edu/courses/cse461/19sp/lectures/TCPIP_State_Transition_Diagram.pdf
> 
> This first-principles analysis agrees with the Wright/Stevens diagram,
> which says that a connection in FIN_WAIT_1 that receives a FIN/ACK
> should move to TIME_WAIT.
> 
> This seems like a faster and more robust solution than installing
> special timers.
> 
> Thoughts?


This is orthogonal I think.

No matter how hard we fix the other side, we should improve the active side.

Since we send a RST, sending the SYN a few ms after the RST seems way better
than waiting 1 second as if we received no packet at all.

Receiving this ACK tells us something about networking health, no need
to be very cautious about the next attempt.

Of course, if you have a fix for the passive side, that would be nice to review!




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-01-31 18:12     ` Eric Dumazet
@ 2020-01-31 22:11       ` Neal Cardwell
  2020-01-31 22:17         ` SeongJae Park
  2020-01-31 22:53         ` Eric Dumazet
  0 siblings, 2 replies; 25+ messages in thread
From: Neal Cardwell @ 2020-01-31 22:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: sjpark, Eric Dumazet, David Miller, shuah, Netdev,
	linux-kselftest, LKML, sj38.park, aams, SeongJae Park,
	Yuchung Cheng

On Fri, Jan 31, 2020 at 1:12 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 1/31/20 7:10 AM, Neal Cardwell wrote:
> > On Fri, Jan 31, 2020 at 7:25 AM <sjpark@amazon.com> wrote:
> >>
> >> From: SeongJae Park <sjpark@amazon.de>
> >>
> >> When closing a connection, the two acks that required to change closing
> >> socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
> >> reverse order.  This is possible in RSS disabled environments such as a
> >> connection inside a host.
> >>
> >> For example, expected state transitions and required packets for the
> >> disconnection will be similar to below flow.
> >>
> >>          00 (Process A)                         (Process B)
> >>          01 ESTABLISHED                         ESTABLISHED
> >>          02 close()
> >>          03 FIN_WAIT_1
> >>          04             ---FIN-->
> >>          05                                     CLOSE_WAIT
> >>          06             <--ACK---
> >>          07 FIN_WAIT_2
> >>          08             <--FIN/ACK---
> >>          09 TIME_WAIT
> >>          10             ---ACK-->
> >>          11                                     LAST_ACK
> >>          12 CLOSED                              CLOSED
> >
> > AFAICT this sequence is not quite what would happen, and that it would
> > be different starting in line 8, and would unfold as follows:
> >
> >           08                                     close()
> >           09                                     LAST_ACK
> >           10             <--FIN/ACK---
> >           11 TIME_WAIT
> >           12             ---ACK-->
> >           13 CLOSED                              CLOSED
> >
> >
> >> The acks in lines 6 and 8 are the acks.  If the line 8 packet is
> >> processed before the line 6 packet, it will be just ignored as it is not
> >> a expected packet,
> >
> > AFAICT that is where the bug starts.
> >
> > AFAICT, from first principles, when process A receives the FIN/ACK it
> > should move to TIME_WAIT even if it has not received a preceding ACK.
> > That's because ACKs are cumulative. So receiving a later cumulative
> > ACK conveys all the information in the previous ACKs.
> >
> > Also, consider the de facto standard state transition diagram from
> > "TCP/IP Illustrated, Volume 2: The Implementation", by Wright and
> > Stevens, e.g.:
> >
> >   https://courses.cs.washington.edu/courses/cse461/19sp/lectures/TCPIP_State_Transition_Diagram.pdf
> >
> > This first-principles analysis agrees with the Wright/Stevens diagram,
> > which says that a connection in FIN_WAIT_1 that receives a FIN/ACK
> > should move to TIME_WAIT.
> >
> > This seems like a faster and more robust solution than installing
> > special timers.
> >
> > Thoughts?
>
>
> This is orthogonal I think.
>
> No matter how hard we fix the other side, we should improve the active side.
>
> Since we send a RST, sending the SYN a few ms after the RST seems way better
> than waiting 1 second as if we received no packet at all.
>
> Receiving this ACK tells us something about networking health, no need
> to be very cautious about the next attempt.

Yes, all good points. Thanks!

> Of course, if you have a fix for the passive side, that would be nice to review !

I looked into fixing this, but my quick reading of the Linux
tcp_rcv_state_process() code is that it should behave correctly and
that a connection in FIN_WAIT_1 that receives a FIN/ACK should move to
TIME_WAIT.

SeongJae, do you happen to have a tcpdump trace of the problematic
sequence where the "process A" ends up in FIN_WAIT_2 when it should be
in TIME_WAIT?

If I have time I will try to construct a packetdrill case to verify
the behavior in this case.

thanks,
neal

>
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-01-31 22:11       ` Neal Cardwell
@ 2020-01-31 22:17         ` SeongJae Park
  2020-02-01  3:55           ` Neal Cardwell
  2020-01-31 22:53         ` Eric Dumazet
  1 sibling, 1 reply; 25+ messages in thread
From: SeongJae Park @ 2020-01-31 22:17 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Eric Dumazet, sjpark, Eric Dumazet, David Miller, shuah, Netdev,
	linux-kselftest, LKML, sj38.park, aams, SeongJae Park,
	Yuchung Cheng

On Fri, 31 Jan 2020 17:11:35 -0500 Neal Cardwell <ncardwell@google.com> wrote:

> On Fri, Jan 31, 2020 at 1:12 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> >
> >
> > On 1/31/20 7:10 AM, Neal Cardwell wrote:
> > > On Fri, Jan 31, 2020 at 7:25 AM <sjpark@amazon.com> wrote:
> > >>
> > >> From: SeongJae Park <sjpark@amazon.de>
> > >>
> > >> When closing a connection, the two acks that required to change closing
> > >> socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
> > >> reverse order.  This is possible in RSS disabled environments such as a
> > >> connection inside a host.
[...]
> 
> I looked into fixing this, but my quick reading of the Linux
> tcp_rcv_state_process() code is that it should behave correctly and
> that a connection in FIN_WAIT_1 that receives a FIN/ACK should move to
> TIME_WAIT.
> 
> SeongJae, do you happen to have a tcpdump trace of the problematic
> sequence where the "process A" ends up in FIN_WAIT_2 when it should be
> in TIME_WAIT?

Hi Neal,


Yes, I have.  You can get it from the previous discussion for this patchset
(https://lore.kernel.org/bpf/20200129171403.3926-1-sjpark@amazon.com/).  As it
also has a reproducer program and how I got the tcpdump trace, I believe you
could get your own trace, too.  If you have any question or need help, feel
free to let me know. :)


Thanks,
SeongJae Park

> 
> If I have time I will try to construct a packetdrill case to verify
> the behavior in this case.
> 
> thanks,
> neal
> 
> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-01-31 22:11       ` Neal Cardwell
  2020-01-31 22:17         ` SeongJae Park
@ 2020-01-31 22:53         ` Eric Dumazet
  2020-02-03 15:40           ` David Laight
  1 sibling, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2020-01-31 22:53 UTC (permalink / raw)
  To: Neal Cardwell, Eric Dumazet
  Cc: sjpark, Eric Dumazet, David Miller, shuah, Netdev,
	linux-kselftest, LKML, sj38.park, aams, SeongJae Park,
	Yuchung Cheng



On 1/31/20 2:11 PM, Neal Cardwell wrote:

> I looked into fixing this, but my quick reading of the Linux
> tcp_rcv_state_process() code is that it should behave correctly and
> that a connection in FIN_WAIT_1 that receives a FIN/ACK should move to
> TIME_WAIT.
> 
> SeongJae, do you happen to have a tcpdump trace of the problematic
> sequence where the "process A" ends up in FIN_WAIT_2 when it should be
> in TIME_WAIT?
> 
> If I have time I will try to construct a packetdrill case to verify
> the behavior in this case.

Unfortunately, you won't be able to reproduce the issue with packetdrill,
since it involves packets being processed at the same time (a race window).


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-01-31 22:17         ` SeongJae Park
@ 2020-02-01  3:55           ` Neal Cardwell
  2020-02-01  6:08             ` SeongJae Park
  0 siblings, 1 reply; 25+ messages in thread
From: Neal Cardwell @ 2020-02-01  3:55 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Eric Dumazet, sjpark, Eric Dumazet, David Miller, shuah, Netdev,
	linux-kselftest, LKML, aams, SeongJae Park, Yuchung Cheng

On Fri, Jan 31, 2020 at 5:18 PM SeongJae Park <sj38.park@gmail.com> wrote:
>
> On Fri, 31 Jan 2020 17:11:35 -0500 Neal Cardwell <ncardwell@google.com> wrote:
>
> > On Fri, Jan 31, 2020 at 1:12 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >
> > >
> > >
> > > On 1/31/20 7:10 AM, Neal Cardwell wrote:
> > > > On Fri, Jan 31, 2020 at 7:25 AM <sjpark@amazon.com> wrote:
> > > >>
> > > >> From: SeongJae Park <sjpark@amazon.de>
> > > >>
> > > >> When closing a connection, the two acks that required to change closing
> > > >> socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
> > > >> reverse order.  This is possible in RSS disabled environments such as a
> > > >> connection inside a host.
> [...]
> >
> > I looked into fixing this, but my quick reading of the Linux
> > tcp_rcv_state_process() code is that it should behave correctly and
> > that a connection in FIN_WAIT_1 that receives a FIN/ACK should move to
> > TIME_WAIT.
> >
> > SeongJae, do you happen to have a tcpdump trace of the problematic
> > sequence where the "process A" ends up in FIN_WAIT_2 when it should be
> > in TIME_WAIT?
>
> Hi Neal,
>
>
> Yes, I have.  You can get it from the previous discussion for this patchset
> (https://lore.kernel.org/bpf/20200129171403.3926-1-sjpark@amazon.com/).  As it
> also has a reproducer program and how I got the tcpdump trace, I believe you
> could get your own trace, too.  If you have any question or need help, feel
> free to let me know. :)

Great. Thank you for the pointer.

I had one quick question: in the message:
  https://lore.kernel.org/bpf/20200129171403.3926-1-sjpark@amazon.com/
... it showed a trace with the client sending a RST/ACK, but this
email thread shows a FIN/ACK. I am curious about the motivation for
the difference?

Anyway, thanks for the report, and thanks to Eric for further clarifying!

neal

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Re: Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-02-01  3:55           ` Neal Cardwell
@ 2020-02-01  6:08             ` SeongJae Park
  2020-02-01 13:30               ` Neal Cardwell
  0 siblings, 1 reply; 25+ messages in thread
From: SeongJae Park @ 2020-02-01  6:08 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: SeongJae Park, Eric Dumazet, sjpark, Eric Dumazet, David Miller,
	shuah, Netdev, linux-kselftest, LKML, aams, SeongJae Park,
	Yuchung Cheng

On Fri, 31 Jan 2020 22:55:34 -0500 Neal Cardwell <ncardwell@google.com> wrote:

> On Fri, Jan 31, 2020 at 5:18 PM SeongJae Park <sj38.park@gmail.com> wrote:
> >
> > On Fri, 31 Jan 2020 17:11:35 -0500 Neal Cardwell <ncardwell@google.com> wrote:
> >
> > > On Fri, Jan 31, 2020 at 1:12 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > >
> > > >
> > > >
> > > > On 1/31/20 7:10 AM, Neal Cardwell wrote:
> > > > > On Fri, Jan 31, 2020 at 7:25 AM <sjpark@amazon.com> wrote:
> > > > >>
> > > > >> From: SeongJae Park <sjpark@amazon.de>
> > > > >>
> > > > >> When closing a connection, the two acks that required to change closing
> > > > >> socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
> > > > >> reverse order.  This is possible in RSS disabled environments such as a
> > > > >> connection inside a host.
> > [...]
> > >
> > > I looked into fixing this, but my quick reading of the Linux
> > > tcp_rcv_state_process() code is that it should behave correctly and
> > > that a connection in FIN_WAIT_1 that receives a FIN/ACK should move to
> > > TIME_WAIT.
> > >
> > > SeongJae, do you happen to have a tcpdump trace of the problematic
> > > sequence where the "process A" ends up in FIN_WAIT_2 when it should be
> > > in TIME_WAIT?
> >
> > Hi Neal,
> >
> >
> > Yes, I have.  You can get it from the previous discussion for this patchset
> > (https://lore.kernel.org/bpf/20200129171403.3926-1-sjpark@amazon.com/).  As it
> > also has a reproducer program and how I got the tcpdump trace, I believe you
> > could get your own trace, too.  If you have any question or need help, feel
> > free to let me know. :)
> 
> Great. Thank you for the pointer.
> 
> I had one quick question: in the message:
>   https://lore.kernel.org/bpf/20200129171403.3926-1-sjpark@amazon.com/
> ... it showed a trace with the client sending a RST/ACK, but this
> email thread shows a FIN/ACK. I am curious about the motivation for
> the difference?

An RST/ACK is traced if the LINGER socket option is applied in the reproducer
program, and a FIN/ACK is traced if it is not.  The LINGER version shows the
spikes more frequently, but the underlying problem is the same.  I confirmed
this by testing both versions.

In the previous discussion, I showed the trace with LINGER applied.  However,
as many other documents use FIN/ACK, I changed the trace to the FIN/ACK version
in this patchset to make it easier to follow.  I will note in the next spin
that it doesn't matter whether it is FIN/ACK or RST/ACK.
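
For reference, a minimal sketch of how a program might apply the LINGER option
mentioned above so that close() produces an RST rather than a FIN.  The helper
name set_abortive_close() is hypothetical, and the zero linger timeout is an
assumption: this thread does not say which values the reproducer uses.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: request an abortive close on fd.
 * With l_onoff = 1 and l_linger = 0 (assumed values), close() on a
 * connected TCP socket sends RST instead of going through FIN.
 * Returns 0 on success, -1 on error (errno is set by setsockopt()).
 */
int set_abortive_close(int fd)
{
	struct linger lg = { .l_onoff = 1, .l_linger = 0 };

	assert(fd >= 0);
	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

Calling the helper right after socket() or accept() is enough; the option
takes effect when the socket is closed.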


Thanks,
SeongJae Park

> 
> Anyway, thanks for the report, and thanks to Eric for further clarifying!
> 
> neal
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Re: Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-02-01  6:08             ` SeongJae Park
@ 2020-02-01 13:30               ` Neal Cardwell
  0 siblings, 0 replies; 25+ messages in thread
From: Neal Cardwell @ 2020-02-01 13:30 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Eric Dumazet, sjpark, Eric Dumazet, David Miller, shuah, Netdev,
	linux-kselftest, LKML, aams, SeongJae Park, Yuchung Cheng

On Sat, Feb 1, 2020 at 1:08 AM SeongJae Park <sj38.park@gmail.com> wrote:
> RST/ACK is traced if LINGER socket option is applied in the reproduce program,
> and FIN/ACK is traced if it is not applied.  LINGER applied version shows the
> spikes more frequently, but the main problem logic has no difference.  I
> confirmed this by testing both of the two versions.
>
> In the previous discussion, I showed the LINGER applied trace.  However, as
> many other documents are using FIN/ACK, I changed the trace to FIN/ACK version
> in this patchset for better understanding.  I will comment that it doesn't
> matter whether it is FIN/ACK or RST/ACK in the next spin.

Great. Thanks for the details!

neal

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-01-31 22:53         ` Eric Dumazet
@ 2020-02-03 15:40           ` David Laight
  2020-02-03 15:54             ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: David Laight @ 2020-02-03 15:40 UTC (permalink / raw)
  To: 'Eric Dumazet', Neal Cardwell
  Cc: sjpark, Eric Dumazet, David Miller, shuah, Netdev,
	linux-kselftest, LKML, sj38.park, aams, SeongJae Park,
	Yuchung Cheng

From: Eric Dumazet
> Sent: 31 January 2020 22:54
> On 1/31/20 2:11 PM, Neal Cardwell wrote:
> 
> > I looked into fixing this, but my quick reading of the Linux
> > tcp_rcv_state_process() code is that it should behave correctly and
> > that a connection in FIN_WAIT_1 that receives a FIN/ACK should move to
> > TIME_WAIT.
> >
> > SeongJae, do you happen to have a tcpdump trace of the problematic
> > sequence where the "process A" ends up in FIN_WAIT_2 when it should be
> > in TIME_WAIT?
> >
> > If I have time I will try to construct a packetdrill case to verify
> > the behavior in this case.
> 
> Unfortunately you won't be able to reproduce the issue with packetdrill,
> since it involves packets being processed at the same time (race window)

You might be able to force the timing race by adding a sleep
in one of the code paths.

No good for a regression test, but ok for code testing.

	David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received
  2020-02-03 15:40           ` David Laight
@ 2020-02-03 15:54             ` Eric Dumazet
  0 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2020-02-03 15:54 UTC (permalink / raw)
  To: David Laight
  Cc: Eric Dumazet, Neal Cardwell, sjpark, David Miller, shuah, Netdev,
	linux-kselftest, LKML, sj38.park, aams, SeongJae Park,
	Yuchung Cheng

On Mon, Feb 3, 2020 at 7:40 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Eric Dumazet
> > Sent: 31 January 2020 22:54
> > On 1/31/20 2:11 PM, Neal Cardwell wrote:
> >
> > > I looked into fixing this, but my quick reading of the Linux
> > > tcp_rcv_state_process() code is that it should behave correctly and
> > > that a connection in FIN_WAIT_1 that receives a FIN/ACK should move to
> > > TIME_WAIT.
> > >
> > > SeongJae, do you happen to have a tcpdump trace of the problematic
> > > sequence where the "process A" ends up in FIN_WAIT_2 when it should be
> > > in TIME_WAIT?
> > >
> > > If I have time I will try to construct a packetdrill case to verify
> > > the behavior in this case.
> >
> > Unfortunately you won't be able to reproduce the issue with packetdrill,
> > since it involves packets being processed at the same time (race window)
>
> You might be able to force the timing race by adding a sleep
> in one of the code paths.
>
> No good for a regression test, but ok for code testing.

Please take a look at packetdrill, there is no possibility for it to
send more than one packet at a time.

Even if we modify packetdrill adding the possibility of feeding
packets to its tun device from multiple threads,
the race is tiny and you would have to run the packetdrill thousands
of times to eventually trigger the race once.

In contrast, the test SeongJae provided uses two threads and the regular
TCP stack over the loopback interface, and it triggers the race more
reliably.
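
For illustration only, the shape of such a two-thread loopback test might look
roughly like the sketch below.  This is not SeongJae's selftest: the function
names (serve, measure_reconnect_latency) and the round count are hypothetical,
and it is written in Python rather than C.  It only times repeated
connect/exchange/close cycles over 127.0.0.1; whether a latency spike actually
shows up depends on the kernel and on timing.

```python
import socket
import threading
import time


def serve(listener: socket.socket, rounds: int) -> None:
    """Accept `rounds` connections, echo one byte on each, then close."""
    for _ in range(rounds):
        conn, _addr = listener.accept()
        with conn:
            data = conn.recv(1)
            conn.sendall(data)


def measure_reconnect_latency(rounds: int = 200) -> list[float]:
    """Return the wall-clock time of each connect/exchange/close cycle."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    # The server runs in its own thread so both ends use the regular TCP stack.
    server = threading.Thread(target=serve, args=(listener, rounds))
    server.start()

    latencies = []
    for _ in range(rounds):
        start = time.monotonic()
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(b"x")
            reply = client.recv(1)
            assert reply == b"x"
        # Leaving the `with` block closes the client side of the connection.
        latencies.append(time.monotonic() - start)

    server.join()
    listener.close()
    return latencies


if __name__ == "__main__":
    samples = measure_reconnect_latency()
    print(f"max {max(samples) * 1e3:.3f} ms, "
          f"mean {sum(samples) / len(samples) * 1e3:.3f} ms")
```

A run with a much larger maximum than mean would hint at an occasional slow
reconnection, but this sketch alone does not show that the FIN/ACK ordering
race is the cause.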

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-02-03 15:54 UTC | newest]

Thread overview: 25+ messages
2020-01-31 12:24 [PATCH 0/3] Fix reconnection latency caused by FIN/ACK handling race sjpark
2020-01-31 12:24 ` [PATCH 1/3] net/ipv4/inet_timewait_sock: Fix inconsistent comments sjpark
2020-01-31 14:54   ` Eric Dumazet
2020-01-31 15:09     ` sjpark
2020-01-31 12:24 ` [PATCH 2/3] tcp: Reduce SYN resend delay if a suspicous ACK is received sjpark
2020-01-31 15:01   ` Eric Dumazet
2020-01-31 16:12     ` sjpark
2020-01-31 16:55       ` Eric Dumazet
2020-01-31 17:05         ` sjpark
2020-01-31 17:08           ` Eric Dumazet
2020-01-31 15:10   ` Neal Cardwell
2020-01-31 18:12     ` Eric Dumazet
2020-01-31 22:11       ` Neal Cardwell
2020-01-31 22:17         ` SeongJae Park
2020-02-01  3:55           ` Neal Cardwell
2020-02-01  6:08             ` SeongJae Park
2020-02-01 13:30               ` Neal Cardwell
2020-01-31 22:53         ` Eric Dumazet
2020-02-03 15:40           ` David Laight
2020-02-03 15:54             ` Eric Dumazet
2020-01-31 12:24 ` [PATCH 3/3] selftests: net: Add FIN_ACK processing order related latency spike test sjpark
2020-01-31 14:56   ` Eric Dumazet
2020-01-31 15:13     ` sjpark
2020-01-31 14:00 ` [PATCH 0/3] Fix reconnection latency caused by FIN/ACK handling race David Laight
2020-01-31 15:05   ` sjpark
