netdev.vger.kernel.org archive mirror
* [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes
@ 2020-11-19 19:45 Mat Martineau
  2020-11-19 19:45 ` [PATCH net-next 01/10] mptcp: drop WORKER_RUNNING status bit Mat Martineau
                   ` (10 more replies)
  0 siblings, 11 replies; 14+ messages in thread
From: Mat Martineau @ 2020-11-19 19:45 UTC (permalink / raw)
  To: netdev; +Cc: Mat Martineau, kuba, mptcp

Here's another batch of fixup and enhancement patches that we have
collected in the MPTCP tree.

Patch 1 removes an unnecessary flag and related code.

Patch 2 fixes a bug encountered when closing fallback sockets.

Patches 3 and 4 choose a better transmit subflow, with a selftest.

Patch 5 adjusts tracking of unaccepted subflows.

Patches 6-8 improve handling of long ADD_ADDR options, with a test.

Patch 9 more reliably tracks the MPTCP-level window shared with peers.

Patch 10 sends MPTCP-level acknowledgements more aggressively, so the
peer can send more data without extra delay.


Florian Westphal (3):
  mptcp: skip to next candidate if subflow has unacked data
  selftests: mptcp: add link failure test case
  mptcp: track window announced to peer

Geliang Tang (3):
  mptcp: change add_addr_signal type
  mptcp: send out dedicated ADD_ADDR packet
  selftests: mptcp: add ADD_ADDR IPv6 test cases

Paolo Abeni (4):
  mptcp: drop WORKER_RUNNING status bit
  mptcp: fix state tracking for fallback socket
  mptcp: keep unaccepted MPC subflow into join list
  mptcp: refine MPTCP-level ack scheduling

 include/net/mptcp.h                           |   3 +-
 net/ipv4/tcp_output.c                         |  11 +-
 net/mptcp/options.c                           |  48 ++++-
 net/mptcp/pm.c                                |  31 ++-
 net/mptcp/pm_netlink.c                        |  29 +++
 net/mptcp/protocol.c                          | 178 +++++++++---------
 net/mptcp/protocol.h                          |  44 ++++-
 net/mptcp/subflow.c                           |  14 +-
 .../testing/selftests/net/mptcp/mptcp_join.sh | 174 ++++++++++++++---
 9 files changed, 391 insertions(+), 141 deletions(-)


base-commit: 657bc1d10bfc23ac06d5d687ce45826c760744f9
-- 
2.29.2



* [PATCH net-next 01/10] mptcp: drop WORKER_RUNNING status bit
  2020-11-19 19:45 [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Mat Martineau
@ 2020-11-19 19:45 ` Mat Martineau
  2020-11-19 19:45 ` [PATCH net-next 02/10] mptcp: fix state tracking for fallback socket Mat Martineau
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-11-19 19:45 UTC (permalink / raw)
  To: netdev; +Cc: Paolo Abeni, kuba, mptcp, Matthieu Baerts, Mat Martineau

From: Paolo Abeni <pabeni@redhat.com>

Only mptcp_close() can actually cancel the work, so there is
no need to add and use this flag.

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---
 net/mptcp/protocol.c | 8 +-------
 net/mptcp/protocol.h | 1 -
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 8df013daea88..749c00fffff5 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1861,7 +1861,6 @@ static void mptcp_worker(struct work_struct *work)
 	int state, ret;
 
 	lock_sock(sk);
-	set_bit(MPTCP_WORKER_RUNNING, &msk->flags);
 	state = sk->sk_state;
 	if (unlikely(state == TCP_CLOSE))
 		goto unlock;
@@ -1939,7 +1938,6 @@ static void mptcp_worker(struct work_struct *work)
 		mptcp_reset_timer(sk);
 
 unlock:
-	clear_bit(MPTCP_WORKER_RUNNING, &msk->flags);
 	release_sock(sk);
 	sock_put(sk);
 }
@@ -2010,11 +2008,7 @@ static void mptcp_cancel_work(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
-	/* if called by the work itself, do not try to cancel the work, or
-	 * we will hang.
-	 */
-	if (!test_bit(MPTCP_WORKER_RUNNING, &msk->flags) &&
-	    cancel_work_sync(&msk->work))
+	if (cancel_work_sync(&msk->work))
 		__sock_put(sk);
 }
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index b4c8dbe9236b..10fffc5de9e4 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -91,7 +91,6 @@
 #define MPTCP_WORK_EOF		3
 #define MPTCP_FALLBACK_DONE	4
 #define MPTCP_WORK_CLOSE_SUBFLOW 5
-#define MPTCP_WORKER_RUNNING	6
 
 static inline bool before64(__u64 seq1, __u64 seq2)
 {
-- 
2.29.2



* [PATCH net-next 02/10] mptcp: fix state tracking for fallback socket
  2020-11-19 19:45 [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Mat Martineau
  2020-11-19 19:45 ` [PATCH net-next 01/10] mptcp: drop WORKER_RUNNING status bit Mat Martineau
@ 2020-11-19 19:45 ` Mat Martineau
  2020-11-19 19:45 ` [PATCH net-next 03/10] mptcp: skip to next candidate if subflow has unacked data Mat Martineau
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-11-19 19:45 UTC (permalink / raw)
  To: netdev; +Cc: Paolo Abeni, kuba, mptcp, Mat Martineau

From: Paolo Abeni <pabeni@redhat.com>

We need to cope with a few more state transitions for
fallback sockets, or we could still end up moving to TCP_CLOSE
too early and skip spooling some pending data.
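
For reference, here is a minimal userspace model of the transitions this
patch implements; the state names mirror the kernel's TCP_* values, but
this is only an illustrative sketch, not kernel code:

#include <stdio.h>

enum st { ESTABLISHED, FIN_WAIT1, FIN_WAIT2, CLOSING, LAST_ACK, CLOSE_WAIT, CLOSE };

/* peer EOF seen on a fallback socket (mptcp_check_for_eof() path) */
static enum st on_peer_eof(enum st s)
{
	switch (s) {
	case ESTABLISHED: return CLOSE_WAIT;
	case FIN_WAIT1:   return CLOSING;   /* was: jump straight to CLOSE */
	case FIN_WAIT2:   return CLOSE;     /* case added by this patch */
	default:          return s;
	}
}

/* our DATA_FIN fully sent on a fallback socket
 * (__mptcp_check_send_data_fin() path)
 */
static enum st on_data_fin_sent(enum st s)
{
	switch (s) {
	case CLOSING:
	case LAST_ACK:  return CLOSE;
	case FIN_WAIT1: return FIN_WAIT2;   /* previously missing step */
	default:        return s;
	}
}

int main(void)
{
	/* simultaneous close: FIN_WAIT1 -> CLOSING -> CLOSE */
	enum st s = on_peer_eof(FIN_WAIT1);

	s = on_data_fin_sent(s);
	printf("reached CLOSE: %d\n", s == CLOSE);
	return 0;
}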

Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---
 net/mptcp/protocol.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 749c00fffff5..a46a542b1766 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -777,7 +777,9 @@ static void mptcp_check_for_eof(struct mptcp_sock *msk)
 		inet_sk_state_store(sk, TCP_CLOSE_WAIT);
 		break;
 	case TCP_FIN_WAIT1:
-		/* fallback sockets skip TCP_CLOSING - TCP will take care */
+		inet_sk_state_store(sk, TCP_CLOSING);
+		break;
+	case TCP_FIN_WAIT2:
 		inet_sk_state_store(sk, TCP_CLOSE);
 		break;
 	default:
@@ -2085,10 +2087,16 @@ static void __mptcp_check_send_data_fin(struct sock *sk)
 
 	WRITE_ONCE(msk->snd_nxt, msk->write_seq);
 
-	/* fallback socket will not get data_fin/ack, can move to close now */
-	if (__mptcp_check_fallback(msk) && sk->sk_state == TCP_LAST_ACK) {
-		inet_sk_state_store(sk, TCP_CLOSE);
-		mptcp_close_wake_up(sk);
+	/* fallback socket will not get data_fin/ack, can move to the next
+	 * state now
+	 */
+	if (__mptcp_check_fallback(msk)) {
+		if ((1 << sk->sk_state) & (TCPF_CLOSING | TCPF_LAST_ACK)) {
+			inet_sk_state_store(sk, TCP_CLOSE);
+			mptcp_close_wake_up(sk);
+		} else if (sk->sk_state == TCP_FIN_WAIT1) {
+			inet_sk_state_store(sk, TCP_FIN_WAIT2);
+		}
 	}
 
 	__mptcp_flush_join_list(msk);
-- 
2.29.2



* [PATCH net-next 03/10] mptcp: skip to next candidate if subflow has unacked data
  2020-11-19 19:45 [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Mat Martineau
  2020-11-19 19:45 ` [PATCH net-next 01/10] mptcp: drop WORKER_RUNNING status bit Mat Martineau
  2020-11-19 19:45 ` [PATCH net-next 02/10] mptcp: fix state tracking for fallback socket Mat Martineau
@ 2020-11-19 19:45 ` Mat Martineau
  2020-11-19 19:45 ` [PATCH net-next 04/10] selftests: mptcp: add link failure test case Mat Martineau
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-11-19 19:45 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal, kuba, mptcp, Mat Martineau

From: Florian Westphal <fw@strlen.de>

If a subflow path is blocked, MPTCP-level retransmission may not take
place anymore because that subflow is likely to have unacked data left
in its write queue.

Ignore subflows that have experienced loss and test the next candidate.
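
As an illustration, a small standalone sketch of the new candidate
selection; the struct below is a stand-in for the kernel's subflow
context and congestion state, not a real kernel definition:

#include <stddef.h>
#include <stdio.h>
#include <stdbool.h>

struct candidate {
	bool has_unacked_data;  /* models !tcp_write_queue_empty(ssk) */
	bool in_loss_state;     /* models icsk_ca_state >= TCP_CA_Loss */
};

static struct candidate *pick_retrans(struct candidate *c, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (c[i].has_unacked_data) {
			if (c[i].in_loss_state)
				continue;  /* new: try the next subflow */
			return NULL;       /* TCP will retransmit, give up */
		}
		return &c[i];
	}
	return NULL;
}

int main(void)
{
	struct candidate subflows[] = {
		{ .has_unacked_data = true,  .in_loss_state = true  }, /* blocked path */
		{ .has_unacked_data = false, .in_loss_state = false }, /* healthy path */
	};

	printf("picked the healthy subflow: %d\n",
	       pick_retrans(subflows, 2) == &subflows[1]);
	return 0;
}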

Fixes: 3b1d6210a95773691 ("mptcp: implement and use MPTCP-level retransmission")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---
 net/mptcp/protocol.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index a46a542b1766..806c0658e42f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1725,8 +1725,11 @@ static struct sock *mptcp_subflow_get_retrans(const struct mptcp_sock *msk)
 			continue;
 
 		/* still data outstanding at TCP level?  Don't retransmit. */
-		if (!tcp_write_queue_empty(ssk))
+		if (!tcp_write_queue_empty(ssk)) {
+			if (inet_csk(ssk)->icsk_ca_state >= TCP_CA_Loss)
+				continue;
 			return NULL;
+		}
 
 		if (subflow->backup) {
 			if (!backup)
-- 
2.29.2



* [PATCH net-next 04/10] selftests: mptcp: add link failure test case
  2020-11-19 19:45 [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Mat Martineau
                   ` (2 preceding siblings ...)
  2020-11-19 19:45 ` [PATCH net-next 03/10] mptcp: skip to next candidate if subflow has unacked data Mat Martineau
@ 2020-11-19 19:45 ` Mat Martineau
  2020-11-19 19:45 ` [PATCH net-next 05/10] mptcp: keep unaccepted MPC subflow into join list Mat Martineau
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-11-19 19:45 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal, kuba, mptcp, Geliang Tang, Mat Martineau

From: Florian Westphal <fw@strlen.de>

Add a test case where a link fails while multiple subflows are in use.
The expectation is that MPTCP will transmit, on another subflow, any
data that could not be delivered via the failed link.

Co-developed-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---
 .../testing/selftests/net/mptcp/mptcp_join.sh | 104 ++++++++++++++----
 1 file changed, 82 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 0d93b243695f..f841ed8186c1 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -5,6 +5,7 @@ ret=0
 sin=""
 sout=""
 cin=""
+cinsent=""
 cout=""
 ksft_skip=4
 timeout=30
@@ -81,7 +82,7 @@ cleanup_partial()
 cleanup()
 {
 	rm -f "$cin" "$cout"
-	rm -f "$sin" "$sout"
+	rm -f "$sin" "$sout" "$cinsent"
 	cleanup_partial
 }
 
@@ -144,6 +145,13 @@ if [ $? -ne 0 ];then
 	exit $ksft_skip
 fi
 
+print_file_err()
+{
+	ls -l "$1" 1>&2
+	echo "Trailing bytes are: "
+	tail -c 27 "$1"
+}
+
 check_transfer()
 {
 	in=$1
@@ -155,6 +163,7 @@ check_transfer()
 		echo "[ FAIL ] $what does not match (in, out):"
 		print_file_err "$in"
 		print_file_err "$out"
+		ret=1
 
 		return 1
 	fi
@@ -175,6 +184,17 @@ do_ping()
 	fi
 }
 
+link_failure()
+{
+	ns="$1"
+
+	l=$((RANDOM%4))
+	l=$((l+1))
+
+	veth="ns1eth$l"
+	ip -net "$ns" link set "$veth" down
+}
+
 do_transfer()
 {
 	listener_ns="$1"
@@ -182,9 +202,10 @@ do_transfer()
 	cl_proto="$3"
 	srv_proto="$4"
 	connect_addr="$5"
-	rm_nr_ns1="$6"
-	rm_nr_ns2="$7"
-	speed="$8"
+	test_link_fail="$6"
+	rm_nr_ns1="$7"
+	rm_nr_ns2="$8"
+	speed="$9"
 
 	port=$((10000+$TEST_COUNT))
 	TEST_COUNT=$((TEST_COUNT+1))
@@ -220,7 +241,12 @@ do_transfer()
 
 	sleep 1
 
-	ip netns exec ${connector_ns} $mptcp_connect -t $timeout -p $port -s ${cl_proto} $connect_addr < "$cin" > "$cout" &
+	if [ "$test_link_fail" -eq 0 ];then
+		ip netns exec ${connector_ns} $mptcp_connect -t $timeout -p $port -s ${cl_proto} $connect_addr < "$cin" > "$cout" &
+	else
+		( cat "$cin" ; sleep 2; link_failure $listener_ns ; cat "$cin" ) | tee "$cinsent" | \
+		ip netns exec ${connector_ns} $mptcp_connect -t $timeout -p $port -s ${cl_proto} $connect_addr > "$cout" &
+	fi
 	cpid=$!
 
 	if [ $rm_nr_ns1 -gt 0 ]; then
@@ -265,12 +291,17 @@ do_transfer()
 		ip netns exec ${connector_ns} ss -nita 1>&2 -o "dport = :$port"
 
 		cat "$capout"
+		ret=1
 		return 1
 	fi
 
 	check_transfer $sin $cout "file received by client"
 	retc=$?
-	check_transfer $cin $sout "file received by server"
+	if [ "$test_link_fail" -eq 0 ];then
+		check_transfer $cin $sout "file received by server"
+	else
+		check_transfer $cinsent $sout "file received by server"
+	fi
 	rets=$?
 
 	if [ $retc -eq 0 ] && [ $rets -eq 0 ];then
@@ -286,13 +317,12 @@ make_file()
 {
 	name=$1
 	who=$2
+	size=$3
 
-	SIZE=1
-
-	dd if=/dev/urandom of="$name" bs=1024 count=$SIZE 2> /dev/null
+	dd if=/dev/urandom of="$name" bs=1024 count=$size 2> /dev/null
 	echo -e "\nMPTCP_TEST_FILE_END_MARKER" >> "$name"
 
-	echo "Created $name (size $SIZE KB) containing data sent by $who"
+	echo "Created $name (size $size KB) containing data sent by $who"
 }
 
 run_tests()
@@ -300,14 +330,32 @@ run_tests()
 	listener_ns="$1"
 	connector_ns="$2"
 	connect_addr="$3"
-	rm_nr_ns1="${4:-0}"
-	rm_nr_ns2="${5:-0}"
-	speed="${6:-fast}"
+	test_linkfail="${4:-0}"
+	rm_nr_ns1="${5:-0}"
+	rm_nr_ns2="${6:-0}"
+	speed="${7:-fast}"
 	lret=0
+	oldin=""
+
+	if [ "$test_linkfail" -eq 1 ];then
+		size=$((RANDOM%1024))
+		size=$((size+1))
+		size=$((size*128))
+
+		oldin=$(mktemp)
+		cp "$cin" "$oldin"
+		make_file "$cin" "client" $size
+	fi
 
 	do_transfer ${listener_ns} ${connector_ns} MPTCP MPTCP ${connect_addr} \
-		${rm_nr_ns1} ${rm_nr_ns2} ${speed}
+		${test_linkfail} ${rm_nr_ns1} ${rm_nr_ns2} ${speed}
 	lret=$?
+
+	if [ "$test_linkfail" -eq 1 ];then
+		cp "$oldin" "$cin"
+		rm -f "$oldin"
+	fi
+
 	if [ $lret -ne 0 ]; then
 		ret=$lret
 		return
@@ -440,10 +488,11 @@ chk_rm_nr()
 sin=$(mktemp)
 sout=$(mktemp)
 cin=$(mktemp)
+cinsent=$(mktemp)
 cout=$(mktemp)
 init
-make_file "$cin" "client"
-make_file "$sin" "server"
+make_file "$cin" "client" 1
+make_file "$sin" "server" 1
 trap cleanup EXIT
 
 run_tests $ns1 $ns2 10.0.1.1
@@ -528,12 +577,23 @@ run_tests $ns1 $ns2 10.0.1.1
 chk_join_nr "multiple subflows and signal" 3 3 3
 chk_add_nr 1 1
 
+# accept and use add_addr with additional subflows and link loss
+reset
+ip netns exec $ns1 ./pm_nl_ctl limits 0 3
+ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 flags signal
+ip netns exec $ns2 ./pm_nl_ctl limits 1 3
+ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
+ip netns exec $ns2 ./pm_nl_ctl add 10.0.4.2 flags subflow
+run_tests $ns1 $ns2 10.0.1.1 1
+chk_join_nr "multiple flows, signal, link failure" 3 3 3
+chk_add_nr 1 1
+
 # add_addr timeout
 reset_with_add_addr_timeout
 ip netns exec $ns1 ./pm_nl_ctl limits 0 1
 ip netns exec $ns2 ./pm_nl_ctl limits 1 1
 ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 flags signal
-run_tests $ns1 $ns2 10.0.1.1 0 0 slow
+run_tests $ns1 $ns2 10.0.1.1 0 0 0 slow
 chk_join_nr "signal address, ADD_ADDR timeout" 1 1 1
 chk_add_nr 4 0
 
@@ -542,7 +602,7 @@ reset
 ip netns exec $ns1 ./pm_nl_ctl limits 0 1
 ip netns exec $ns2 ./pm_nl_ctl limits 0 1
 ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
-run_tests $ns1 $ns2 10.0.1.1 0 1 slow
+run_tests $ns1 $ns2 10.0.1.1 0 0 1 slow
 chk_join_nr "remove single subflow" 1 1 1
 chk_rm_nr 1 1
 
@@ -552,7 +612,7 @@ ip netns exec $ns1 ./pm_nl_ctl limits 0 2
 ip netns exec $ns2 ./pm_nl_ctl limits 0 2
 ip netns exec $ns2 ./pm_nl_ctl add 10.0.2.2 flags subflow
 ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
-run_tests $ns1 $ns2 10.0.1.1 0 2 slow
+run_tests $ns1 $ns2 10.0.1.1 0 0 2 slow
 chk_join_nr "remove multiple subflows" 2 2 2
 chk_rm_nr 2 2
 
@@ -561,7 +621,7 @@ reset
 ip netns exec $ns1 ./pm_nl_ctl limits 0 1
 ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 flags signal
 ip netns exec $ns2 ./pm_nl_ctl limits 1 1
-run_tests $ns1 $ns2 10.0.1.1 1 0 slow
+run_tests $ns1 $ns2 10.0.1.1 0 1 0 slow
 chk_join_nr "remove single address" 1 1 1
 chk_add_nr 1 1
 chk_rm_nr 0 0
@@ -572,7 +632,7 @@ ip netns exec $ns1 ./pm_nl_ctl limits 0 2
 ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 flags signal
 ip netns exec $ns2 ./pm_nl_ctl limits 1 2
 ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
-run_tests $ns1 $ns2 10.0.1.1 1 1 slow
+run_tests $ns1 $ns2 10.0.1.1 0 1 1 slow
 chk_join_nr "remove subflow and signal" 2 2 2
 chk_add_nr 1 1
 chk_rm_nr 1 1
@@ -584,7 +644,7 @@ ip netns exec $ns1 ./pm_nl_ctl add 10.0.2.1 flags signal
 ip netns exec $ns2 ./pm_nl_ctl limits 1 3
 ip netns exec $ns2 ./pm_nl_ctl add 10.0.3.2 flags subflow
 ip netns exec $ns2 ./pm_nl_ctl add 10.0.4.2 flags subflow
-run_tests $ns1 $ns2 10.0.1.1 1 2 slow
+run_tests $ns1 $ns2 10.0.1.1 0 1 2 slow
 chk_join_nr "remove subflows and signal" 3 3 3
 chk_add_nr 1 1
 chk_rm_nr 2 2
-- 
2.29.2



* [PATCH net-next 05/10] mptcp: keep unaccepted MPC subflow into join list
  2020-11-19 19:45 [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Mat Martineau
                   ` (3 preceding siblings ...)
  2020-11-19 19:45 ` [PATCH net-next 04/10] selftests: mptcp: add link failure test case Mat Martineau
@ 2020-11-19 19:45 ` Mat Martineau
  2020-11-19 19:45 ` [PATCH net-next 06/10] mptcp: change add_addr_signal type Mat Martineau
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-11-19 19:45 UTC (permalink / raw)
  To: netdev; +Cc: Paolo Abeni, kuba, mptcp, Geliang Tang, Mat Martineau

From: Paolo Abeni <pabeni@redhat.com>

This will simplify all operations dealing with subflows
before accept time (e.g. data fin processing, add_addr).

The join list is already flushed by mptcp_stream_accept()
before returning the newly created msk to user space.

This also fixes a potential bug present in the old code:
conn_list was manipulated without holding the msk lock
in mptcp_stream_accept().
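
For illustration, a standalone sketch of the pattern this patch moves
to: new subflows are parked on a lock-protected "join" list and flushed
into the main list at accept time. The pthread mutex stands in for the
kernel spinlock and the names are illustrative only, not the kernel API:

#include <pthread.h>
#include <stdio.h>

#define MAX_SUBFLOWS 8

struct msk_model {
	pthread_mutex_t join_list_lock;
	int join_list[MAX_SUBFLOWS];
	int join_cnt;
	int conn_list[MAX_SUBFLOWS];
	int conn_cnt;
};

/* models mptcp_add_pending_subflow(): always add under the join list lock */
static void add_pending_subflow(struct msk_model *msk, int subflow_id)
{
	pthread_mutex_lock(&msk->join_list_lock);
	msk->join_list[msk->join_cnt++] = subflow_id;
	pthread_mutex_unlock(&msk->join_list_lock);
}

/* models __mptcp_flush_join_list(): move pending subflows to conn_list */
static void flush_join_list(struct msk_model *msk)
{
	pthread_mutex_lock(&msk->join_list_lock);
	for (int i = 0; i < msk->join_cnt; i++)
		msk->conn_list[msk->conn_cnt++] = msk->join_list[i];
	msk->join_cnt = 0;
	pthread_mutex_unlock(&msk->join_list_lock);
}

int main(void)
{
	struct msk_model msk = { .join_cnt = 0, .conn_cnt = 0 };

	pthread_mutex_init(&msk.join_list_lock, NULL);
	add_pending_subflow(&msk, 1);	/* MPC subflow, before accept() */
	flush_join_list(&msk);		/* done by accept() before returning */
	printf("subflows on conn_list: %d\n", msk.conn_cnt);
	return 0;
}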

Tested-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---
 net/mptcp/protocol.c | 24 ++++++++----------------
 net/mptcp/protocol.h |  9 +++++++++
 net/mptcp/subflow.c  | 10 +++++-----
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 806c0658e42f..0e83887efbc8 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2342,7 +2342,6 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
 	if (sk_is_mptcp(newsk)) {
 		struct mptcp_subflow_context *subflow;
 		struct sock *new_mptcp_sock;
-		struct sock *ssk = newsk;
 
 		subflow = mptcp_subflow_ctx(newsk);
 		new_mptcp_sock = subflow->conn;
@@ -2357,22 +2356,8 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
 
 		/* acquire the 2nd reference for the owning socket */
 		sock_hold(new_mptcp_sock);
-
-		local_bh_disable();
-		bh_lock_sock(new_mptcp_sock);
-		msk = mptcp_sk(new_mptcp_sock);
-		msk->first = newsk;
-
 		newsk = new_mptcp_sock;
-		mptcp_copy_inaddrs(newsk, ssk);
-		list_add(&subflow->node, &msk->conn_list);
-		sock_hold(ssk);
-
-		mptcp_rcv_space_init(msk, ssk);
-		bh_unlock_sock(new_mptcp_sock);
-
-		__MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_MPCAPABLEPASSIVEACK);
-		local_bh_enable();
+		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_MPCAPABLEPASSIVEACK);
 	} else {
 		MPTCP_INC_STATS(sock_net(sk),
 				MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
@@ -2823,6 +2808,12 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 	if (err == 0 && !mptcp_is_tcpsk(newsock->sk)) {
 		struct mptcp_sock *msk = mptcp_sk(newsock->sk);
 		struct mptcp_subflow_context *subflow;
+		struct sock *newsk = newsock->sk;
+		bool slowpath;
+
+		slowpath = lock_sock_fast(newsk);
+		mptcp_copy_inaddrs(newsk, msk->first);
+		mptcp_rcv_space_init(msk, msk->first);
 
 		/* set ssk->sk_socket of accept()ed flows to mptcp socket.
 		 * This is needed so NOSPACE flag can be set from tcp stack.
@@ -2834,6 +2825,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 			if (!ssk->sk_socket)
 				mptcp_sock_graft(ssk, newsock);
 		}
+		unlock_sock_fast(newsk, slowpath);
 	}
 
 	if (inet_csk_listen_poll(ssock->sk))
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 10fffc5de9e4..7affaf0b1941 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -403,6 +403,15 @@ mptcp_subflow_get_mapped_dsn(const struct mptcp_subflow_context *subflow)
 	return subflow->map_seq + mptcp_subflow_get_map_offset(subflow);
 }
 
+static inline void mptcp_add_pending_subflow(struct mptcp_sock *msk,
+					     struct mptcp_subflow_context *subflow)
+{
+	sock_hold(mptcp_subflow_tcp_sock(subflow));
+	spin_lock_bh(&msk->join_list_lock);
+	list_add_tail(&subflow->node, &msk->join_list);
+	spin_unlock_bh(&msk->join_list_lock);
+}
+
 int mptcp_is_enabled(struct net *net);
 unsigned int mptcp_get_add_addr_timeout(struct net *net);
 void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 794259789194..d3c6b3a5ad55 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -578,6 +578,10 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
 			 */
 			inet_sk_state_store((void *)new_msk, TCP_ESTABLISHED);
 
+			/* link the newly created socket to the msk */
+			mptcp_add_pending_subflow(mptcp_sk(new_msk), ctx);
+			WRITE_ONCE(mptcp_sk(new_msk)->first, child);
+
 			/* new mpc subflow takes ownership of the newly
 			 * created mptcp socket
 			 */
@@ -1124,11 +1128,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 	if (err && err != -EINPROGRESS)
 		goto failed;
 
-	sock_hold(ssk);
-	spin_lock_bh(&msk->join_list_lock);
-	list_add_tail(&subflow->node, &msk->join_list);
-	spin_unlock_bh(&msk->join_list_lock);
-
+	mptcp_add_pending_subflow(msk, subflow);
 	return err;
 
 failed:
-- 
2.29.2



* [PATCH net-next 06/10] mptcp: change add_addr_signal type
  2020-11-19 19:45 [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Mat Martineau
                   ` (4 preceding siblings ...)
  2020-11-19 19:45 ` [PATCH net-next 05/10] mptcp: keep unaccepted MPC subflow into join list Mat Martineau
@ 2020-11-19 19:45 ` Mat Martineau
  2020-11-19 19:46 ` [PATCH net-next 07/10] mptcp: send out dedicated ADD_ADDR packet Mat Martineau
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-11-19 19:45 UTC (permalink / raw)
  To: netdev; +Cc: Geliang Tang, kuba, mptcp, Paolo Abeni, Mat Martineau

From: Geliang Tang <geliangtang@gmail.com>

This patch changes the 'add_addr_signal' type from bool to u8, so that
the addr type can be encoded there.
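
For illustration, a tiny standalone sketch of the new encoding; the
enum mirrors the mptcp_add_addr_status values introduced below, and
BIT() is redefined locally, so this is only a model:

#include <stdio.h>

enum { ADD_ADDR_SIGNAL, ADD_ADDR_ECHO };
#define BIT(n) (1U << (n))

int main(void)
{
	unsigned char add_addr_signal = 0;

	/* announce an address, echoing a received ADD_ADDR */
	add_addr_signal |= BIT(ADD_ADDR_SIGNAL);
	add_addr_signal |= BIT(ADD_ADDR_ECHO);

	printf("signal pending: %d\n", !!(add_addr_signal & BIT(ADD_ADDR_SIGNAL)));
	printf("is echo:        %d\n", !!(add_addr_signal & BIT(ADD_ADDR_ECHO)));

	/* option written out: clear everything, as the signalling path does */
	add_addr_signal = 0;
	return 0;
}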

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---
 net/mptcp/pm.c       | 15 +++++++++------
 net/mptcp/protocol.h | 15 ++++++++++++---
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index f9c88e2abb8e..c2c12f02a263 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -16,11 +16,15 @@ int mptcp_pm_announce_addr(struct mptcp_sock *msk,
 			   const struct mptcp_addr_info *addr,
 			   bool echo)
 {
+	u8 add_addr = READ_ONCE(msk->pm.add_addr_signal);
+
 	pr_debug("msk=%p, local_id=%d", msk, addr->id);
 
 	msk->pm.local = *addr;
-	WRITE_ONCE(msk->pm.add_addr_echo, echo);
-	WRITE_ONCE(msk->pm.add_addr_signal, true);
+	add_addr |= BIT(MPTCP_ADD_ADDR_SIGNAL);
+	if (echo)
+		add_addr |= BIT(MPTCP_ADD_ADDR_ECHO);
+	WRITE_ONCE(msk->pm.add_addr_signal, add_addr);
 	return 0;
 }
 
@@ -182,13 +186,13 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, unsigned int remaining,
 	if (!mptcp_pm_should_add_signal(msk))
 		goto out_unlock;
 
-	*echo = READ_ONCE(msk->pm.add_addr_echo);
+	*echo = mptcp_pm_should_add_signal_echo(msk);
 
 	if (remaining < mptcp_add_addr_len(msk->pm.local.family, *echo))
 		goto out_unlock;
 
 	*saddr = msk->pm.local;
-	WRITE_ONCE(msk->pm.add_addr_signal, false);
+	WRITE_ONCE(msk->pm.add_addr_signal, 0);
 	ret = true;
 
 out_unlock:
@@ -232,11 +236,10 @@ void mptcp_pm_data_init(struct mptcp_sock *msk)
 	msk->pm.subflows = 0;
 	msk->pm.rm_id = 0;
 	WRITE_ONCE(msk->pm.work_pending, false);
-	WRITE_ONCE(msk->pm.add_addr_signal, false);
+	WRITE_ONCE(msk->pm.add_addr_signal, 0);
 	WRITE_ONCE(msk->pm.rm_addr_signal, false);
 	WRITE_ONCE(msk->pm.accept_addr, false);
 	WRITE_ONCE(msk->pm.accept_subflow, false);
-	WRITE_ONCE(msk->pm.add_addr_echo, false);
 	msk->pm.status = 0;
 
 	spin_lock_init(&msk->pm.lock);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 7affaf0b1941..a62ffae621c2 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -165,6 +165,11 @@ enum mptcp_pm_status {
 	MPTCP_PM_SUBFLOW_ESTABLISHED,
 };
 
+enum mptcp_add_addr_status {
+	MPTCP_ADD_ADDR_SIGNAL,
+	MPTCP_ADD_ADDR_ECHO,
+};
+
 struct mptcp_pm_data {
 	struct mptcp_addr_info local;
 	struct mptcp_addr_info remote;
@@ -172,13 +177,12 @@ struct mptcp_pm_data {
 
 	spinlock_t	lock;		/*protects the whole PM data */
 
-	bool		add_addr_signal;
+	u8		add_addr_signal;
 	bool		rm_addr_signal;
 	bool		server_side;
 	bool		work_pending;
 	bool		accept_addr;
 	bool		accept_subflow;
-	bool		add_addr_echo;
 	u8		add_addr_signaled;
 	u8		add_addr_accepted;
 	u8		local_addr_used;
@@ -516,7 +520,12 @@ int mptcp_pm_remove_subflow(struct mptcp_sock *msk, u8 local_id);
 
 static inline bool mptcp_pm_should_add_signal(struct mptcp_sock *msk)
 {
-	return READ_ONCE(msk->pm.add_addr_signal);
+	return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_SIGNAL);
+}
+
+static inline bool mptcp_pm_should_add_signal_echo(struct mptcp_sock *msk)
+{
+	return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_ECHO);
 }
 
 static inline bool mptcp_pm_should_rm_signal(struct mptcp_sock *msk)
-- 
2.29.2



* [PATCH net-next 07/10] mptcp: send out dedicated ADD_ADDR packet
  2020-11-19 19:45 [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Mat Martineau
                   ` (5 preceding siblings ...)
  2020-11-19 19:45 ` [PATCH net-next 06/10] mptcp: change add_addr_signal type Mat Martineau
@ 2020-11-19 19:46 ` Mat Martineau
  2020-11-19 19:46 ` [PATCH net-next 08/10] selftests: mptcp: add ADD_ADDR IPv6 test cases Mat Martineau
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-11-19 19:46 UTC (permalink / raw)
  To: netdev; +Cc: Geliang Tang, kuba, mptcp, Paolo Abeni, Mat Martineau

From: Geliang Tang <geliangtang@gmail.com>

When the ADD_ADDR suboption includes an IPv6 address, its size is 28
octets. It will not fit when other MPTCP suboptions, e.g. DSS, are
included in the same packet. So here we send out a dedicated ADD_ADDR
packet carrying only the ADD_ADDR suboption and no other MPTCP
suboptions.

In mptcp_pm_announce_addr, we check whether this is an IPv6 ADD_ADDR.
If it is, we set the MPTCP_ADD_ADDR_IPV6 flag. Then we call
mptcp_pm_nl_add_addr_send_ack to send out a new pure ACK packet.

In mptcp_established_options_add_addr, we check whether this is a pure
ACK packet for ADD_ADDR. If it is, we drop all other MPTCP suboptions
in this packet and put only the ADD_ADDR suboption in it.
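
To illustrate the size constraint, a small standalone sketch; the
40-byte option limit and the 12-byte aligned timestamp option are
standard TCP values, and the ADD_ADDR6 length is the 28 octets
mentioned above:

#include <stdio.h>

#define MAX_TCP_OPTION_SPACE   40  /* 60-byte max TCP header - 20-byte base header */
#define TCPOLEN_TSTAMP_ALIGNED 12  /* timestamps, padded to 4-byte alignment */
#define ADD_ADDR6_LEN          28  /* ADD_ADDR carrying an IPv6 address */

int main(void)
{
	int remaining = MAX_TCP_OPTION_SPACE - TCPOLEN_TSTAMP_ALIGNED;

	printf("space left for MPTCP options: %d\n", remaining);      /* 28 */
	/* ADD_ADDR6 alone fits exactly, so adding any other MPTCP
	 * suboption (e.g. DSS) on the same packet exceeds the limit,
	 * hence the dedicated pure-ACK packet carrying only ADD_ADDR.
	 */
	printf("ADD_ADDR6 fits alone: %d\n", ADD_ADDR6_LEN <= remaining);
	return 0;
}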

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---
 net/mptcp/options.c    | 25 ++++++++++++++++++++++---
 net/mptcp/pm.c         | 16 ++++++++++++++--
 net/mptcp/pm_netlink.c | 29 +++++++++++++++++++++++++++++
 net/mptcp/protocol.c   |  6 +++++-
 net/mptcp/protocol.h   | 10 ++++++++++
 5 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index f2d1e27a2bc1..d7afffcc87c1 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -242,7 +242,9 @@ static void mptcp_parse_option(const struct sk_buff *skb,
 
 		mp_opt->add_addr = 1;
 		mp_opt->addr_id = *ptr++;
-		pr_debug("ADD_ADDR: id=%d, echo=%d", mp_opt->addr_id, mp_opt->echo);
+		pr_debug("ADD_ADDR%s: id=%d, echo=%d",
+			 (mp_opt->family == MPTCP_ADDR_IPVERSION_6) ? "6" : "",
+			 mp_opt->addr_id, mp_opt->echo);
 		if (mp_opt->family == MPTCP_ADDR_IPVERSION_4) {
 			memcpy((u8 *)&mp_opt->addr.s_addr, (u8 *)ptr, 4);
 			ptr += 4;
@@ -573,17 +575,27 @@ static u64 add_addr6_generate_hmac(u64 key1, u64 key2, u8 addr_id,
 }
 #endif
 
-static bool mptcp_established_options_add_addr(struct sock *sk,
+static bool mptcp_established_options_add_addr(struct sock *sk, struct sk_buff *skb,
 					       unsigned int *size,
 					       unsigned int remaining,
 					       struct mptcp_out_options *opts)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
 	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
+	bool drop_other_suboptions = false;
+	unsigned int opt_size = *size;
 	struct mptcp_addr_info saddr;
 	bool echo;
 	int len;
 
+	if (mptcp_pm_should_add_signal_ipv6(msk) &&
+	    skb && skb_is_tcp_pure_ack(skb)) {
+		pr_debug("drop other suboptions");
+		opts->suboptions = 0;
+		remaining += opt_size;
+		drop_other_suboptions = true;
+	}
+
 	if (!mptcp_pm_should_add_signal(msk) ||
 	    !(mptcp_pm_add_addr_signal(msk, remaining, &saddr, &echo)))
 		return false;
@@ -593,6 +605,8 @@ static bool mptcp_established_options_add_addr(struct sock *sk,
 		return false;
 
 	*size = len;
+	if (drop_other_suboptions)
+		*size -= opt_size;
 	opts->addr_id = saddr.id;
 	if (saddr.family == AF_INET) {
 		opts->suboptions |= OPTION_MPTCP_ADD_ADDR;
@@ -678,7 +692,7 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
 
 	*size += opt_size;
 	remaining -= opt_size;
-	if (mptcp_established_options_add_addr(sk, &opt_size, remaining, opts)) {
+	if (mptcp_established_options_add_addr(sk, skb, &opt_size, remaining, opts)) {
 		*size += opt_size;
 		remaining -= opt_size;
 		ret = true;
@@ -759,6 +773,11 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
 		goto fully_established;
 	}
 
+	if (mp_opt->add_addr) {
+		WRITE_ONCE(msk->fully_established, true);
+		return true;
+	}
+
 	/* If the first established packet does not contain MP_CAPABLE + data
 	 * then fallback to TCP. Fallback scenarios requires a reset for
 	 * MP_JOIN subflows.
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index c2c12f02a263..75c5040e8d5d 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -24,6 +24,8 @@ int mptcp_pm_announce_addr(struct mptcp_sock *msk,
 	add_addr |= BIT(MPTCP_ADD_ADDR_SIGNAL);
 	if (echo)
 		add_addr |= BIT(MPTCP_ADD_ADDR_ECHO);
+	if (addr->family == AF_INET6)
+		add_addr |= BIT(MPTCP_ADD_ADDR_IPV6);
 	WRITE_ONCE(msk->pm.add_addr_signal, add_addr);
 	return 0;
 }
@@ -153,14 +155,24 @@ void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
 
 	spin_lock_bh(&pm->lock);
 
-	if (!READ_ONCE(pm->accept_addr))
+	if (!READ_ONCE(pm->accept_addr)) {
 		mptcp_pm_announce_addr(msk, addr, true);
-	else if (mptcp_pm_schedule_work(msk, MPTCP_PM_ADD_ADDR_RECEIVED))
+		mptcp_pm_add_addr_send_ack(msk);
+	} else if (mptcp_pm_schedule_work(msk, MPTCP_PM_ADD_ADDR_RECEIVED)) {
 		pm->remote = *addr;
+	}
 
 	spin_unlock_bh(&pm->lock);
 }
 
+void mptcp_pm_add_addr_send_ack(struct mptcp_sock *msk)
+{
+	if (!mptcp_pm_should_add_signal_ipv6(msk))
+		return;
+
+	mptcp_pm_schedule_work(msk, MPTCP_PM_ADD_ADDR_SEND_ACK);
+}
+
 void mptcp_pm_rm_addr_received(struct mptcp_sock *msk, u8 rm_id)
 {
 	struct mptcp_pm_data *pm = &msk->pm;
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index f8a9d82a0ea8..03f2c28f11f5 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -228,6 +228,7 @@ static void mptcp_pm_add_timer(struct timer_list *timer)
 	if (!mptcp_pm_should_add_signal(msk)) {
 		pr_debug("retransmit ADD_ADDR id=%d", entry->addr.id);
 		mptcp_pm_announce_addr(msk, &entry->addr, false);
+		mptcp_pm_add_addr_send_ack(msk);
 		entry->retrans_times++;
 	}
 
@@ -328,6 +329,7 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 			if (mptcp_pm_alloc_anno_list(msk, local)) {
 				msk->pm.add_addr_signaled++;
 				mptcp_pm_announce_addr(msk, &local->addr, false);
+				mptcp_pm_nl_add_addr_send_ack(msk);
 			}
 		} else {
 			/* pick failed, avoid fourther attempts later */
@@ -398,6 +400,33 @@ void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk)
 	spin_lock_bh(&msk->pm.lock);
 
 	mptcp_pm_announce_addr(msk, &remote, true);
+	mptcp_pm_nl_add_addr_send_ack(msk);
+}
+
+void mptcp_pm_nl_add_addr_send_ack(struct mptcp_sock *msk)
+{
+	struct mptcp_subflow_context *subflow;
+
+	if (!mptcp_pm_should_add_signal_ipv6(msk))
+		return;
+
+	__mptcp_flush_join_list(msk);
+	subflow = list_first_entry_or_null(&msk->conn_list, typeof(*subflow), node);
+	if (subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+		u8 add_addr;
+
+		spin_unlock_bh(&msk->pm.lock);
+		pr_debug("send ack for add_addr6");
+		lock_sock(ssk);
+		tcp_send_ack(ssk);
+		release_sock(ssk);
+		spin_lock_bh(&msk->pm.lock);
+
+		add_addr = READ_ONCE(msk->pm.add_addr_signal);
+		add_addr &= ~BIT(MPTCP_ADD_ADDR_IPV6);
+		WRITE_ONCE(msk->pm.add_addr_signal, add_addr);
+	}
 }
 
 void mptcp_pm_nl_rm_addr_received(struct mptcp_sock *msk)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 0e83887efbc8..c2629190b1ce 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -690,7 +690,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 		sk->sk_data_ready(sk);
 }
 
-static void __mptcp_flush_join_list(struct mptcp_sock *msk)
+void __mptcp_flush_join_list(struct mptcp_sock *msk)
 {
 	if (likely(list_empty(&msk->join_list)))
 		return;
@@ -1807,6 +1807,10 @@ static void pm_work(struct mptcp_sock *msk)
 		pm->status &= ~BIT(MPTCP_PM_ADD_ADDR_RECEIVED);
 		mptcp_pm_nl_add_addr_received(msk);
 	}
+	if (pm->status & BIT(MPTCP_PM_ADD_ADDR_SEND_ACK)) {
+		pm->status &= ~BIT(MPTCP_PM_ADD_ADDR_SEND_ACK);
+		mptcp_pm_nl_add_addr_send_ack(msk);
+	}
 	if (pm->status & BIT(MPTCP_PM_RM_ADDR_RECEIVED)) {
 		pm->status &= ~BIT(MPTCP_PM_RM_ADDR_RECEIVED);
 		mptcp_pm_nl_rm_addr_received(msk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a62ffae621c2..4d6dd4adaaaf 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -160,6 +160,7 @@ struct mptcp_addr_info {
 
 enum mptcp_pm_status {
 	MPTCP_PM_ADD_ADDR_RECEIVED,
+	MPTCP_PM_ADD_ADDR_SEND_ACK,
 	MPTCP_PM_RM_ADDR_RECEIVED,
 	MPTCP_PM_ESTABLISHED,
 	MPTCP_PM_SUBFLOW_ESTABLISHED,
@@ -168,6 +169,7 @@ enum mptcp_pm_status {
 enum mptcp_add_addr_status {
 	MPTCP_ADD_ADDR_SIGNAL,
 	MPTCP_ADD_ADDR_ECHO,
+	MPTCP_ADD_ADDR_IPV6,
 };
 
 struct mptcp_pm_data {
@@ -466,6 +468,7 @@ bool mptcp_schedule_work(struct sock *sk);
 void mptcp_data_acked(struct sock *sk);
 void mptcp_subflow_eof(struct sock *sk);
 bool mptcp_update_rcv_data_fin(struct mptcp_sock *msk, u64 data_fin_seq, bool use_64bit);
+void __mptcp_flush_join_list(struct mptcp_sock *msk);
 static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
 {
 	return READ_ONCE(msk->snd_data_fin_enable) &&
@@ -506,6 +509,7 @@ void mptcp_pm_subflow_established(struct mptcp_sock *msk,
 void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id);
 void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
 				const struct mptcp_addr_info *addr);
+void mptcp_pm_add_addr_send_ack(struct mptcp_sock *msk);
 void mptcp_pm_rm_addr_received(struct mptcp_sock *msk, u8 rm_id);
 void mptcp_pm_free_anno_list(struct mptcp_sock *msk);
 struct mptcp_pm_add_entry *
@@ -528,6 +532,11 @@ static inline bool mptcp_pm_should_add_signal_echo(struct mptcp_sock *msk)
 	return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_ECHO);
 }
 
+static inline bool mptcp_pm_should_add_signal_ipv6(struct mptcp_sock *msk)
+{
+	return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_IPV6);
+}
+
 static inline bool mptcp_pm_should_rm_signal(struct mptcp_sock *msk)
 {
 	return READ_ONCE(msk->pm.rm_addr_signal);
@@ -552,6 +561,7 @@ void mptcp_pm_nl_data_init(struct mptcp_sock *msk);
 void mptcp_pm_nl_fully_established(struct mptcp_sock *msk);
 void mptcp_pm_nl_subflow_established(struct mptcp_sock *msk);
 void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk);
+void mptcp_pm_nl_add_addr_send_ack(struct mptcp_sock *msk);
 void mptcp_pm_nl_rm_addr_received(struct mptcp_sock *msk);
 void mptcp_pm_nl_rm_subflow_received(struct mptcp_sock *msk, u8 rm_id);
 int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc);
-- 
2.29.2



* [PATCH net-next 08/10] selftests: mptcp: add ADD_ADDR IPv6 test cases
  2020-11-19 19:45 [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Mat Martineau
                   ` (6 preceding siblings ...)
  2020-11-19 19:46 ` [PATCH net-next 07/10] mptcp: send out dedicated ADD_ADDR packet Mat Martineau
@ 2020-11-19 19:46 ` Mat Martineau
  2020-11-19 19:46 ` [PATCH net-next 09/10] mptcp: track window announced to peer Mat Martineau
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-11-19 19:46 UTC (permalink / raw)
  To: netdev; +Cc: Geliang Tang, kuba, mptcp, Paolo Abeni, Mat Martineau

From: Geliang Tang <geliangtang@gmail.com>

This patch adds IPv6 support to do_transfer, along with test cases for
ADD_ADDR over IPv6.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---
 .../testing/selftests/net/mptcp/mptcp_join.sh | 70 ++++++++++++++++++-
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index f841ed8186c1..0eae628d1ffd 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -195,6 +195,12 @@ link_failure()
 	ip -net "$ns" link set "$veth" down
 }
 
+# $1: IP address
+is_v6()
+{
+	[ -z "${1##*:*}" ]
+}
+
 do_transfer()
 {
 	listener_ns="$1"
@@ -236,7 +242,15 @@ do_transfer()
 		mptcp_connect="./mptcp_connect -r"
 	fi
 
-	ip netns exec ${listener_ns} $mptcp_connect -t $timeout -l -p $port -s ${srv_proto} 0.0.0.0 < "$sin" > "$sout" &
+	local local_addr
+	if is_v6 "${connect_addr}"; then
+		local_addr="::"
+	else
+		local_addr="0.0.0.0"
+	fi
+
+	ip netns exec ${listener_ns} $mptcp_connect -t $timeout -l -p $port \
+		-s ${srv_proto} ${local_addr} < "$sin" > "$sout" &
 	spid=$!
 
 	sleep 1
@@ -649,6 +663,60 @@ chk_join_nr "remove subflows and signal" 3 3 3
 chk_add_nr 1 1
 chk_rm_nr 2 2
 
+# subflow IPv6
+reset
+ip netns exec $ns1 ./pm_nl_ctl limits 0 1
+ip netns exec $ns2 ./pm_nl_ctl limits 0 1
+ip netns exec $ns2 ./pm_nl_ctl add dead:beef:3::2 flags subflow
+run_tests $ns1 $ns2 dead:beef:1::1 0 0 0 slow
+chk_join_nr "single subflow IPv6" 1 1 1
+
+# add_address, unused IPv6
+reset
+ip netns exec $ns1 ./pm_nl_ctl add dead:beef:2::1 flags signal
+run_tests $ns1 $ns2 dead:beef:1::1 0 0 0 slow
+chk_join_nr "unused signal address IPv6" 0 0 0
+chk_add_nr 1 1
+
+# signal address IPv6
+reset
+ip netns exec $ns1 ./pm_nl_ctl limits 0 1
+ip netns exec $ns1 ./pm_nl_ctl add dead:beef:2::1 flags signal
+ip netns exec $ns2 ./pm_nl_ctl limits 1 1
+run_tests $ns1 $ns2 dead:beef:1::1 0 0 0 slow
+chk_join_nr "single address IPv6" 1 1 1
+chk_add_nr 1 1
+
+# add_addr timeout IPv6
+reset_with_add_addr_timeout 6
+ip netns exec $ns1 ./pm_nl_ctl limits 0 1
+ip netns exec $ns2 ./pm_nl_ctl limits 1 1
+ip netns exec $ns1 ./pm_nl_ctl add dead:beef:2::1 flags signal
+run_tests $ns1 $ns2 dead:beef:1::1 0 0 0 slow
+chk_join_nr "signal address, ADD_ADDR6 timeout" 1 1 1
+chk_add_nr 4 0
+
+# single address IPv6, remove
+reset
+ip netns exec $ns1 ./pm_nl_ctl limits 0 1
+ip netns exec $ns1 ./pm_nl_ctl add dead:beef:2::1 flags signal
+ip netns exec $ns2 ./pm_nl_ctl limits 1 1
+run_tests $ns1 $ns2 dead:beef:1::1 0 1 0 slow
+chk_join_nr "remove single address IPv6" 1 1 1
+chk_add_nr 1 1
+chk_rm_nr 0 0
+
+# subflow and signal IPv6, remove
+reset
+ip netns exec $ns1 ./pm_nl_ctl limits 0 2
+ip netns exec $ns1 ./pm_nl_ctl add dead:beef:2::1 flags signal
+ip netns exec $ns2 ./pm_nl_ctl limits 1 2
+ip netns exec $ns2 ./pm_nl_ctl add dead:beef:3::2 flags subflow
+run_tests $ns1 $ns2 dead:beef:1::1 0 1 1 slow
+chk_join_nr "remove subflow and signal IPv6" 2 2 2
+chk_add_nr 1 1
+chk_rm_nr 1 1
+
 # single subflow, syncookies
 reset_with_cookies
 ip netns exec $ns1 ./pm_nl_ctl limits 0 1
-- 
2.29.2



* [PATCH net-next 09/10] mptcp: track window announced to peer
  2020-11-19 19:45 [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Mat Martineau
                   ` (7 preceding siblings ...)
  2020-11-19 19:46 ` [PATCH net-next 08/10] selftests: mptcp: add ADD_ADDR IPv6 test cases Mat Martineau
@ 2020-11-19 19:46 ` Mat Martineau
  2020-11-19 19:46 ` [PATCH net-next 10/10] mptcp: refine MPTCP-level ack scheduling Mat Martineau
  2020-11-20 23:35 ` [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Jakub Kicinski
  10 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2020-11-19 19:46 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal, kuba, mptcp, Paolo Abeni, Mat Martineau

From: Florian Westphal <fw@strlen.de>

OoO handling attempts to detect when a packet is out-of-window by
testing the current ack sequence and remaining space vs. the packet's
sequence number.

This doesn't work reliably. Store the highest allowed sequence number
that we've announced and use it to detect out-of-window packets.

Do this when the MPTCP options get written to the packet (wire format).
For this to work we need to move the write_options call until after the
stack has selected a new TCP window.
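
As an illustration, a small standalone model of the new bookkeeping;
after64() and the field names mirror the kernel code, but this is not
the kernel implementation:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

static bool after64(uint64_t seq1, uint64_t seq2)
{
	return (int64_t)(seq1 - seq2) > 0;
}

struct msk {
	uint64_t ack_seq;       /* next in-order MPTCP sequence expected */
	uint64_t rcv_wnd_sent;  /* highest sequence announced to the peer so far */
};

/* called when MPTCP options are written, after the stack picked rcv_wnd */
static void set_rwin(struct msk *msk, uint32_t rcv_wnd)
{
	uint64_t limit = msk->ack_seq + rcv_wnd;

	if (after64(limit, msk->rcv_wnd_sent))
		msk->rcv_wnd_sent = limit;  /* never let the limit move backwards */
}

/* out-of-order queue insertion: drop anything beyond what we announced */
static bool out_of_window(const struct msk *msk, uint64_t end_seq)
{
	return after64(end_seq, msk->rcv_wnd_sent);
}

int main(void)
{
	struct msk msk = { .ack_seq = 1000, .rcv_wnd_sent = 1000 };

	set_rwin(&msk, 4096);                       /* announce [1000, 5096) */
	printf("%d\n", out_of_window(&msk, 5000));  /* 0: inside the window */
	printf("%d\n", out_of_window(&msk, 6000));  /* 1: beyond what we told the peer */
	return 0;
}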

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---
 include/net/mptcp.h   |  3 ++-
 net/ipv4/tcp_output.c | 11 +++++++----
 net/mptcp/options.c   | 22 +++++++++++++++++++++-
 net/mptcp/protocol.c  | 12 +++++++-----
 net/mptcp/protocol.h  |  1 +
 5 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 6e706d838e4e..b6cf07143a8a 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -88,7 +88,8 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
 			       struct mptcp_out_options *opts);
 void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
 
-void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts);
+void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
+			 struct mptcp_out_options *opts);
 
 /* move the skb extension owership, with the assumption that 'to' is
  * newly allocated
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 99905bc01d40..41880d3521ed 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -445,11 +445,12 @@ struct tcp_out_options {
 	struct mptcp_out_options mptcp;
 };
 
-static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
+static void mptcp_options_write(__be32 *ptr, const struct tcp_sock *tp,
+				struct tcp_out_options *opts)
 {
 #if IS_ENABLED(CONFIG_MPTCP)
 	if (unlikely(OPTION_MPTCP & opts->options))
-		mptcp_write_options(ptr, &opts->mptcp);
+		mptcp_write_options(ptr, tp, &opts->mptcp);
 #endif
 }
 
@@ -701,7 +702,7 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
 
 	smc_options_write(ptr, &options);
 
-	mptcp_options_write(ptr, opts);
+	mptcp_options_write(ptr, tp, opts);
 }
 
 static void smc_set_option(const struct tcp_sock *tp,
@@ -1346,7 +1347,6 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 		}
 	}
 
-	tcp_options_write((__be32 *)(th + 1), tp, &opts);
 	skb_shinfo(skb)->gso_type = sk->sk_gso_type;
 	if (likely(!(tcb->tcp_flags & TCPHDR_SYN))) {
 		th->window      = htons(tcp_select_window(sk));
@@ -1357,6 +1357,9 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 		 */
 		th->window	= htons(min(tp->rcv_wnd, 65535U));
 	}
+
+	tcp_options_write((__be32 *)(th + 1), tp, &opts);
+
 #ifdef CONFIG_TCP_MD5SIG
 	/* Calculate the MD5 hash, as we have all we need now */
 	if (md5) {
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index d7afffcc87c1..248e3930c0cb 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1010,7 +1010,24 @@ void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 	}
 }
 
-void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts)
+static void mptcp_set_rwin(const struct tcp_sock *tp)
+{
+	const struct sock *ssk = (const struct sock *)tp;
+	const struct mptcp_subflow_context *subflow;
+	struct mptcp_sock *msk;
+	u64 ack_seq;
+
+	subflow = mptcp_subflow_ctx(ssk);
+	msk = mptcp_sk(subflow->conn);
+
+	ack_seq = READ_ONCE(msk->ack_seq) + tp->rcv_wnd;
+
+	if (after64(ack_seq, READ_ONCE(msk->rcv_wnd_sent)))
+		WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
+}
+
+void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
+			 struct mptcp_out_options *opts)
 {
 	if ((OPTION_MPTCP_MPC_SYN | OPTION_MPTCP_MPC_SYNACK |
 	     OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
@@ -1167,4 +1184,7 @@ void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts)
 					   TCPOPT_NOP << 8 | TCPOPT_NOP, ptr);
 		}
 	}
+
+	if (tp)
+		mptcp_set_rwin(tp);
 }
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index c2629190b1ce..4ae2c4a30e44 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -168,19 +168,19 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 	struct rb_node **p, *parent;
 	u64 seq, end_seq, max_seq;
 	struct sk_buff *skb1;
-	int space;
 
 	seq = MPTCP_SKB_CB(skb)->map_seq;
 	end_seq = MPTCP_SKB_CB(skb)->end_seq;
-	space = tcp_space(sk);
-	max_seq = space > 0 ? space + msk->ack_seq : msk->ack_seq;
+	max_seq = READ_ONCE(msk->rcv_wnd_sent);
 
 	pr_debug("msk=%p seq=%llx limit=%llx empty=%d", msk, seq, max_seq,
 		 RB_EMPTY_ROOT(&msk->out_of_order_queue));
-	if (after64(seq, max_seq)) {
+	if (after64(end_seq, max_seq)) {
 		/* out of window */
 		mptcp_drop(sk, skb);
-		pr_debug("oow by %ld", (unsigned long)seq - (unsigned long)max_seq);
+		pr_debug("oow by %lld, rcv_wnd_sent %llu\n",
+			 (unsigned long long)end_seq - (unsigned long)max_seq,
+			 (unsigned long long)msk->rcv_wnd_sent);
 		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_NODSSWINDOW);
 		return;
 	}
@@ -2294,6 +2294,7 @@ struct sock *mptcp_sk_clone(const struct sock *sk,
 		mptcp_crypto_key_sha(msk->remote_key, NULL, &ack_seq);
 		ack_seq++;
 		WRITE_ONCE(msk->ack_seq, ack_seq);
+		WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
 	}
 
 	sock_reset_flag(nsk, SOCK_RCU_FREE);
@@ -2586,6 +2587,7 @@ void mptcp_finish_connect(struct sock *ssk)
 	WRITE_ONCE(msk->write_seq, subflow->idsn + 1);
 	WRITE_ONCE(msk->snd_nxt, msk->write_seq);
 	WRITE_ONCE(msk->ack_seq, ack_seq);
+	WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
 	WRITE_ONCE(msk->can_ack, 1);
 	atomic64_set(&msk->snd_una, msk->write_seq);
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 4d6dd4adaaaf..67f86818203f 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -216,6 +216,7 @@ struct mptcp_sock {
 	u64		write_seq;
 	u64		snd_nxt;
 	u64		ack_seq;
+	u64		rcv_wnd_sent;
 	u64		rcv_data_fin_seq;
 	struct sock	*last_snd;
 	int		snd_burst;
-- 
2.29.2



* [PATCH net-next 10/10] mptcp: refine MPTCP-level ack scheduling
  2020-11-19 19:45 [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Mat Martineau
                   ` (8 preceding siblings ...)
  2020-11-19 19:46 ` [PATCH net-next 09/10] mptcp: track window announced to peer Mat Martineau
@ 2020-11-19 19:46 ` Mat Martineau
  2020-11-23 11:57   ` Eric Dumazet
  2020-11-20 23:35 ` [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Jakub Kicinski
  10 siblings, 1 reply; 14+ messages in thread
From: Mat Martineau @ 2020-11-19 19:46 UTC (permalink / raw)
  To: netdev; +Cc: Paolo Abeni, kuba, mptcp, Mat Martineau

From: Paolo Abeni <pabeni@redhat.com>

Sending a timely MPTCP-level ack is somewhat difficult when
the insertion into the msk receive queue is performed
by the worker.

It needs a TCP-level dup-ack to notify the peer of the MPTCP-level
ack_seq increase, as both the TCP-level ack seq and the
rcv window are unchanged.

We can actually avoid processing incoming data with the
worker, and let the subflow or recvmsg() send acks as needed.

When recvmsg() moves the skbs into the msk receive queue,
the msk space is still unchanged, so tcp_cleanup_rbuf() could
end up skipping TCP-level ack generation. However, when
__mptcp_move_skbs() is invoked, a known amount of bytes is
going to be consumed soon: we update the rcv wnd computation
taking it into account.

Additionally, we need to explicitly trigger tcp_cleanup_rbuf()
when recvmsg() consumes a significant amount of the receive buffer.
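
As an illustration of the recvmsg()-side heuristic, a small standalone
sketch; old_wspace and the doubling test mirror the code below, but
this is only a model, not kernel code:

#include <stdio.h>
#include <stdbool.h>

struct msk_model {
	int old_wspace;  /* free receive space when the last DSS ack was built */
};

/* decide whether recvmsg() should nudge a subflow (tcp_cleanup_rbuf())
 * to get a window update on the wire
 */
static bool should_send_ack(const struct msk_model *msk, int current_space)
{
	return (current_space - msk->old_wspace) >= msk->old_wspace;
}

int main(void)
{
	struct msk_model msk = { .old_wspace = 16384 };

	printf("%d\n", should_send_ack(&msk, 24576)); /* 0: grew, but not doubled */
	printf("%d\n", should_send_ack(&msk, 32768)); /* 1: free space doubled */
	return 0;
}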

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---
 net/mptcp/options.c  |   1 +
 net/mptcp/protocol.c | 105 +++++++++++++++++++++----------------------
 net/mptcp/protocol.h |   8 ++++
 net/mptcp/subflow.c  |   4 +-
 4 files changed, 61 insertions(+), 57 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 248e3930c0cb..8a59b3e44599 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -530,6 +530,7 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
 		opts->ext_copy.ack64 = 0;
 	}
 	opts->ext_copy.use_ack = 1;
+	WRITE_ONCE(msk->old_wspace, __mptcp_space((struct sock *)msk));
 
 	/* Add kind/length/subtype/flag overhead if mapping is not populated */
 	if (dss_size == 0)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4ae2c4a30e44..748343f1a968 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -407,16 +407,42 @@ static void mptcp_set_timeout(const struct sock *sk, const struct sock *ssk)
 	mptcp_sk(sk)->timer_ival = tout > 0 ? tout : TCP_RTO_MIN;
 }
 
-static void mptcp_send_ack(struct mptcp_sock *msk)
+static bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
+{
+	struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+	/* can't send if JOIN hasn't completed yet (i.e. is usable for mptcp) */
+	if (subflow->request_join && !subflow->fully_established)
+		return false;
+
+	/* only send if our side has not closed yet */
+	return ((1 << ssk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT));
+}
+
+static void mptcp_send_ack(struct mptcp_sock *msk, bool force)
 {
 	struct mptcp_subflow_context *subflow;
+	struct sock *pick = NULL;
 
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 
-		lock_sock(ssk);
-		tcp_send_ack(ssk);
-		release_sock(ssk);
+		if (force) {
+			lock_sock(ssk);
+			tcp_send_ack(ssk);
+			release_sock(ssk);
+			continue;
+		}
+
+		/* if the hintes ssk is still active, use it */
+		pick = ssk;
+		if (ssk == msk->ack_hint)
+			break;
+	}
+	if (!force && pick) {
+		lock_sock(pick);
+		tcp_cleanup_rbuf(pick, 1);
+		release_sock(pick);
 	}
 }
 
@@ -468,7 +494,7 @@ static bool mptcp_check_data_fin(struct sock *sk)
 
 		ret = true;
 		mptcp_set_timeout(sk, NULL);
-		mptcp_send_ack(msk);
+		mptcp_send_ack(msk, true);
 		mptcp_close_wake_up(sk);
 	}
 	return ret;
@@ -483,7 +509,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 	unsigned int moved = 0;
 	bool more_data_avail;
 	struct tcp_sock *tp;
-	u32 old_copied_seq;
 	bool done = false;
 	int sk_rbuf;
 
@@ -500,7 +525,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 
 	pr_debug("msk=%p ssk=%p", msk, ssk);
 	tp = tcp_sk(ssk);
-	old_copied_seq = tp->copied_seq;
 	do {
 		u32 map_remaining, offset;
 		u32 seq = tp->copied_seq;
@@ -564,11 +588,9 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 			break;
 		}
 	} while (more_data_avail);
+	msk->ack_hint = ssk;
 
 	*bytes += moved;
-	if (tp->copied_seq != old_copied_seq)
-		tcp_cleanup_rbuf(ssk, 1);
-
 	return done;
 }
 
@@ -672,19 +694,8 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 	if (atomic_read(&sk->sk_rmem_alloc) > sk_rbuf)
 		goto wake;
 
-	if (move_skbs_to_msk(msk, ssk))
-		goto wake;
+	move_skbs_to_msk(msk, ssk);
 
-	/* mptcp socket is owned, release_cb should retry */
-	if (!test_and_set_bit(TCP_DELACK_TIMER_DEFERRED,
-			      &sk->sk_tsq_flags)) {
-		sock_hold(sk);
-
-		/* need to try again, its possible release_cb() has already
-		 * been called after the test_and_set_bit() above.
-		 */
-		move_skbs_to_msk(msk, ssk);
-	}
 wake:
 	if (wake)
 		sk->sk_data_ready(sk);
@@ -1095,18 +1106,6 @@ static void mptcp_nospace(struct mptcp_sock *msk)
 	mptcp_clean_una((struct sock *)msk);
 }
 
-static bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
-{
-	struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-
-	/* can't send if JOIN hasn't completed yet (i.e. is usable for mptcp) */
-	if (subflow->request_join && !subflow->fully_established)
-		return false;
-
-	/* only send if our side has not closed yet */
-	return ((1 << ssk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT));
-}
-
 #define MPTCP_SEND_BURST_SIZE		((1 << 16) - \
 					 sizeof(struct tcphdr) - \
 					 MAX_TCP_OPTION_SPACE - \
@@ -1533,7 +1532,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	msk->rcvq_space.time = mstamp;
 }
 
-static bool __mptcp_move_skbs(struct mptcp_sock *msk)
+static bool __mptcp_move_skbs(struct mptcp_sock *msk, unsigned int rcv)
 {
 	unsigned int moved = 0;
 	bool done;
@@ -1552,12 +1551,16 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk)
 
 		slowpath = lock_sock_fast(ssk);
 		done = __mptcp_move_skbs_from_subflow(msk, ssk, &moved);
+		if (moved && rcv) {
+			WRITE_ONCE(msk->rmem_pending, min(rcv, moved));
+			tcp_cleanup_rbuf(ssk, 1);
+			WRITE_ONCE(msk->rmem_pending, 0);
+		}
 		unlock_sock_fast(ssk, slowpath);
 	} while (!done);
 
 	if (mptcp_ofo_queue(msk) || moved > 0) {
-		if (!mptcp_check_data_fin((struct sock *)msk))
-			mptcp_send_ack(msk);
+		mptcp_check_data_fin((struct sock *)msk);
 		return true;
 	}
 	return false;
@@ -1581,8 +1584,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
 	__mptcp_flush_join_list(msk);
 
-	while (len > (size_t)copied) {
-		int bytes_read;
+	for (;;) {
+		int bytes_read, old_space;
 
 		bytes_read = __mptcp_recvmsg_mskq(msk, msg, len - copied);
 		if (unlikely(bytes_read < 0)) {
@@ -1594,9 +1597,14 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		copied += bytes_read;
 
 		if (skb_queue_empty(&sk->sk_receive_queue) &&
-		    __mptcp_move_skbs(msk))
+		    __mptcp_move_skbs(msk, len - copied))
 			continue;
 
+		/* be sure to advertise window change */
+		old_space = READ_ONCE(msk->old_wspace);
+		if ((tcp_space(sk) - old_space) >= old_space)
+			mptcp_send_ack(msk, false);
+
 		/* only the master socket status is relevant here. The exit
 		 * conditions mirror closely tcp_recvmsg()
 		 */
@@ -1649,7 +1657,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		/* .. race-breaker: ssk might have gotten new data
 		 * after last __mptcp_move_skbs() returned false.
 		 */
-		if (unlikely(__mptcp_move_skbs(msk)))
+		if (unlikely(__mptcp_move_skbs(msk, 0)))
 			set_bit(MPTCP_DATA_READY, &msk->flags);
 	} else if (unlikely(!test_bit(MPTCP_DATA_READY, &msk->flags))) {
 		/* data to read but mptcp_wait_data() cleared DATA_READY */
@@ -1880,7 +1888,6 @@ static void mptcp_worker(struct work_struct *work)
 	if (test_and_clear_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags))
 		__mptcp_close_subflow(msk);
 
-	__mptcp_move_skbs(msk);
 	if (mptcp_send_head(sk))
 		mptcp_push_pending(sk, 0);
 
@@ -1964,6 +1971,7 @@ static int __mptcp_init_sock(struct sock *sk)
 	msk->out_of_order_queue = RB_ROOT;
 	msk->first_pending = NULL;
 
+	msk->ack_hint = NULL;
 	msk->first = NULL;
 	inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
 
@@ -2499,8 +2507,7 @@ static int mptcp_getsockopt(struct sock *sk, int level, int optname,
 	return -EOPNOTSUPP;
 }
 
-#define MPTCP_DEFERRED_ALL (TCPF_DELACK_TIMER_DEFERRED | \
-			    TCPF_WRITE_TIMER_DEFERRED)
+#define MPTCP_DEFERRED_ALL (TCPF_WRITE_TIMER_DEFERRED)
 
 /* this is very alike tcp_release_cb() but we must handle differently a
  * different set of events
@@ -2518,16 +2525,6 @@ static void mptcp_release_cb(struct sock *sk)
 
 	sock_release_ownership(sk);
 
-	if (flags & TCPF_DELACK_TIMER_DEFERRED) {
-		struct mptcp_sock *msk = mptcp_sk(sk);
-		struct sock *ssk;
-
-		ssk = mptcp_subflow_recv_lookup(msk);
-		if (!ssk || sk->sk_state == TCP_CLOSE ||
-		    !schedule_work(&msk->work))
-			__sock_put(sk);
-	}
-
 	if (flags & TCPF_WRITE_TIMER_DEFERRED) {
 		mptcp_retransmit_handler(sk);
 		__sock_put(sk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 67f86818203f..82d5626323b1 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -220,10 +220,12 @@ struct mptcp_sock {
 	u64		rcv_data_fin_seq;
 	struct sock	*last_snd;
 	int		snd_burst;
+	int		old_wspace;
 	atomic64_t	snd_una;
 	atomic64_t	wnd_end;
 	unsigned long	timer_ival;
 	u32		token;
+	int		rmem_pending;
 	unsigned long	flags;
 	bool		can_ack;
 	bool		fully_established;
@@ -231,6 +233,7 @@ struct mptcp_sock {
 	bool		snd_data_fin_enable;
 	bool		use_64bit_ack; /* Set when we received a 64-bit DSN */
 	spinlock_t	join_list_lock;
+	struct sock	*ack_hint;
 	struct work_struct work;
 	struct sk_buff  *ooo_last_skb;
 	struct rb_root  out_of_order_queue;
@@ -258,6 +261,11 @@ static inline struct mptcp_sock *mptcp_sk(const struct sock *sk)
 	return (struct mptcp_sock *)sk;
 }
 
+static inline int __mptcp_space(const struct sock *sk)
+{
+	return tcp_space(sk) + READ_ONCE(mptcp_sk(sk)->rmem_pending);
+}
+
 static inline struct mptcp_data_frag *mptcp_send_head(const struct sock *sk)
 {
 	const struct mptcp_sock *msk = mptcp_sk(sk);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index d3c6b3a5ad55..4d8abff1be18 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -850,8 +850,6 @@ static void mptcp_subflow_discard_data(struct sock *ssk, struct sk_buff *skb,
 		sk_eat_skb(ssk, skb);
 	if (mptcp_subflow_get_map_offset(subflow) >= subflow->map_data_len)
 		subflow->map_valid = 0;
-	if (incr)
-		tcp_cleanup_rbuf(ssk, incr);
 }
 
 static bool subflow_check_data_avail(struct sock *ssk)
@@ -973,7 +971,7 @@ void mptcp_space(const struct sock *ssk, int *space, int *full_space)
 	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	const struct sock *sk = subflow->conn;
 
-	*space = tcp_space(sk);
+	*space = __mptcp_space(sk);
 	*full_space = tcp_full_space(sk);
 }
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes
  2020-11-19 19:45 [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Mat Martineau
                   ` (9 preceding siblings ...)
  2020-11-19 19:46 ` [PATCH net-next 10/10] mptcp: refine MPTCP-level ack scheduling Mat Martineau
@ 2020-11-20 23:35 ` Jakub Kicinski
  10 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2020-11-20 23:35 UTC (permalink / raw)
  To: Mat Martineau; +Cc: netdev, mptcp

On Thu, 19 Nov 2020 11:45:53 -0800 Mat Martineau wrote:
> Here's another batch of fixup and enhancement patches that we have
> collected in the MPTCP tree.
> 
> Patch 1 removes an unnecessary flag and related code.
> 
> Patch 2 fixes a bug encountered when closing fallback sockets.
> 
> Patches 3 and 4 choose a better transmit subflow, with a self test.
> 
> Patch 5 adjusts tracking of unaccepted subflows
> 
> Patches 6-8 improve handling of long ADD_ADDR options, with a test.
> 
> Patch 9 more reliably tracks the MPTCP-level window shared with peers.
> 
> Patch 10 sends MPTCP-level acknowledgements more aggressively, so the
> peer can send more data without extra delay.

Applied, thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 10/10] mptcp: refine MPTCP-level ack scheduling
  2020-11-19 19:46 ` [PATCH net-next 10/10] mptcp: refine MPTCP-level ack scheduling Mat Martineau
@ 2020-11-23 11:57   ` Eric Dumazet
  2020-11-23 14:21     ` Paolo Abeni
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2020-11-23 11:57 UTC (permalink / raw)
  To: Mat Martineau, netdev; +Cc: Paolo Abeni, kuba, mptcp



On 11/19/20 8:46 PM, Mat Martineau wrote:
> From: Paolo Abeni <pabeni@redhat.com>
> 
> Sending a timely MPTCP-level ack is somewhat difficult when
> the insertion into the msk receive queue is performed
> by the worker.
> 
> It needs a TCP-level dup-ack to notify the peer of the MPTCP-level
> ack_seq increase, as both the TCP-level ack seq and the
> rcv window are unchanged.
> 
> We can actually avoid processing incoming data with the
> worker, and let the subflow or recvmsg() send acks as needed.
> 
> When recvmsg() moves the skbs inside the msk receive queue,
> the msk space is still unchanged, so tcp_cleanup_rbuf() could
> end up skipping TCP-level ack generation. To address this, when
> __mptcp_move_skbs() is invoked a known number of bytes is
> going to be consumed soon: we update the rcv wnd computation to
> take them into account.
> 
> Additionally we need to explicitly trigger tcp_cleanup_rbuf()
> when recvmsg() consumes a significant amount of the receive buffer.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
> ---
>  net/mptcp/options.c  |   1 +
>  net/mptcp/protocol.c | 105 +++++++++++++++++++++----------------------
>  net/mptcp/protocol.h |   8 ++++
>  net/mptcp/subflow.c  |   4 +-
>  4 files changed, 61 insertions(+), 57 deletions(-)
> 
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index 248e3930c0cb..8a59b3e44599 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -530,6 +530,7 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
>  		opts->ext_copy.ack64 = 0;
>  	}
>  	opts->ext_copy.use_ack = 1;
> +	WRITE_ONCE(msk->old_wspace, __mptcp_space((struct sock *)msk));
>  
>  	/* Add kind/length/subtype/flag overhead if mapping is not populated */
>  	if (dss_size == 0)
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 4ae2c4a30e44..748343f1a968 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -407,16 +407,42 @@ static void mptcp_set_timeout(const struct sock *sk, const struct sock *ssk)
>  	mptcp_sk(sk)->timer_ival = tout > 0 ? tout : TCP_RTO_MIN;
>  }
>  
> -static void mptcp_send_ack(struct mptcp_sock *msk)
> +static bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
> +{
> +	struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> +
> +	/* can't send if JOIN hasn't completed yet (i.e. is usable for mptcp) */
> +	if (subflow->request_join && !subflow->fully_established)
> +		return false;
> +
> +	/* only send if our side has not closed yet */
> +	return ((1 << ssk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT));
> +}
> +
> +static void mptcp_send_ack(struct mptcp_sock *msk, bool force)
>  {
>  	struct mptcp_subflow_context *subflow;
> +	struct sock *pick = NULL;
>  
>  	mptcp_for_each_subflow(msk, subflow) {
>  		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
>  
> -		lock_sock(ssk);
> -		tcp_send_ack(ssk);
> -		release_sock(ssk);
> +		if (force) {
> +			lock_sock(ssk);
> +			tcp_send_ack(ssk);
> +			release_sock(ssk);
> +			continue;
> +		}
> +
> +		/* if the hinted ssk is still active, use it */
> +		pick = ssk;
> +		if (ssk == msk->ack_hint)
> +			break;
> +	}
> +	if (!force && pick) {
> +		lock_sock(pick);
> +		tcp_cleanup_rbuf(pick, 1);

Calling tcp_cleanup_rbuf() on a socket that was never established is going to fail
with a divide by 0 (mss being 0)

AFAIK, mptcp_recvmsg() can be called right after a socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP)
call.

Probably, after a lock_sock(), you should double check socket state (same above before calling tcp_send_ack())
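
For illustration only, that check could look roughly like this for the
tcp_cleanup_rbuf() path (a sketch reusing the sk_state mask from
mptcp_subflow_active() in this patch; hypothetical, not a tested fix),
with the same test before the tcp_send_ack() call above:

	if (!force && pick) {
		lock_sock(pick);
		/* recheck under the socket lock: a subflow whose TCP
		 * connection was never established still has mss == 0,
		 * and tcp_cleanup_rbuf() on it can divide by zero
		 */
		if ((1 << pick->sk_state) &
		    (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT))
			tcp_cleanup_rbuf(pick, 1);
		release_sock(pick);
	}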



> +		release_sock(pick);
>  	}
>  }
>  


....

>  
> +		/* be sure to advertise window change */
> +		old_space = READ_ONCE(msk->old_wspace);
> +		if ((tcp_space(sk) - old_space) >= old_space)
> +			mptcp_send_ack(msk, false);
> +

Yes, if we call recvmsg() right after socket(), we will end up calling tcp_cleanup_rbuf(),
while no byte was ever copied/drained.
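
For reference, the scenario described here boils down to something like
the following (hypothetical reproducer sketch, assuming a kernel built
with CONFIG_MPTCP; not taken from this thread):

	#include <stdio.h>
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <unistd.h>

	#ifndef IPPROTO_MPTCP
	#define IPPROTO_MPTCP 262	/* may be missing from older libc headers */
	#endif

	int main(void)
	{
		char buf[128];
		int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

		if (fd < 0) {
			perror("socket");
			return 1;
		}

		/* no connect()/accept(): mptcp_recvmsg() runs while no
		 * subflow was ever established, i.e. with mss still 0
		 */
		recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

		close(fd);
		return 0;
	}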


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next 10/10] mptcp: refine MPTCP-level ack scheduling
  2020-11-23 11:57   ` Eric Dumazet
@ 2020-11-23 14:21     ` Paolo Abeni
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2020-11-23 14:21 UTC (permalink / raw)
  To: Eric Dumazet, Mat Martineau, netdev; +Cc: kuba, mptcp

Hi,

On Mon, 2020-11-23 at 12:57 +0100, Eric Dumazet wrote:
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 4ae2c4a30e44..748343f1a968 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -407,16 +407,42 @@ static void mptcp_set_timeout(const struct sock *sk, const struct sock *ssk)
> >  	mptcp_sk(sk)->timer_ival = tout > 0 ? tout : TCP_RTO_MIN;
> >  }
> >  
> > -static void mptcp_send_ack(struct mptcp_sock *msk)
> > +static bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
> > +{
> > +	struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> > +
> > +	/* can't send if JOIN hasn't completed yet (i.e. is usable for mptcp) */
> > +	if (subflow->request_join && !subflow->fully_established)
> > +		return false;
> > +
> > +	/* only send if our side has not closed yet */
> > +	return ((1 << ssk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT));
> > +}
> > +
> > +static void mptcp_send_ack(struct mptcp_sock *msk, bool force)
> >  {
> >  	struct mptcp_subflow_context *subflow;
> > +	struct sock *pick = NULL;
> >  
> >  	mptcp_for_each_subflow(msk, subflow) {
> >  		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> >  
> > -		lock_sock(ssk);
> > -		tcp_send_ack(ssk);
> > -		release_sock(ssk);
> > +		if (force) {
> > +			lock_sock(ssk);
> > +			tcp_send_ack(ssk);
> > +			release_sock(ssk);
> > +			continue;
> > +		}
> > +
> > +		/* if the hinted ssk is still active, use it */
> > +		pick = ssk;
> > +		if (ssk == msk->ack_hint)
> > +			break;
> > +	}
> > +	if (!force && pick) {
> > +		lock_sock(pick);
> > +		tcp_cleanup_rbuf(pick, 1);
> 
> Calling tcp_cleanup_rbuf() on a socket that was never established is going to fail
> with a divide by 0 (mss being 0)
> 
> AFAIK, mptcp_recvmsg() can be called right after a socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP)
> call.
> 
> Probably, after a lock_sock(), you should double check socket state (same above before calling tcp_send_ack())

Thank you for looking into this.

Indeed you are right! I'll try to cook a fix.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-11-23 14:22 UTC | newest]

Thread overview: 14+ messages
2020-11-19 19:45 [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Mat Martineau
2020-11-19 19:45 ` [PATCH net-next 01/10] mptcp: drop WORKER_RUNNING status bit Mat Martineau
2020-11-19 19:45 ` [PATCH net-next 02/10] mptcp: fix state tracking for fallback socket Mat Martineau
2020-11-19 19:45 ` [PATCH net-next 03/10] mptcp: skip to next candidate if subflow has unacked data Mat Martineau
2020-11-19 19:45 ` [PATCH net-next 04/10] selftests: mptcp: add link failure test case Mat Martineau
2020-11-19 19:45 ` [PATCH net-next 05/10] mptcp: keep unaccepted MPC subflow into join list Mat Martineau
2020-11-19 19:45 ` [PATCH net-next 06/10] mptcp: change add_addr_signal type Mat Martineau
2020-11-19 19:46 ` [PATCH net-next 07/10] mptcp: send out dedicated ADD_ADDR packet Mat Martineau
2020-11-19 19:46 ` [PATCH net-next 08/10] selftests: mptcp: add ADD_ADDR IPv6 test cases Mat Martineau
2020-11-19 19:46 ` [PATCH net-next 09/10] mptcp: track window announced to peer Mat Martineau
2020-11-19 19:46 ` [PATCH net-next 10/10] mptcp: refine MPTCP-level ack scheduling Mat Martineau
2020-11-23 11:57   ` Eric Dumazet
2020-11-23 14:21     ` Paolo Abeni
2020-11-20 23:35 ` [PATCH net-next 00/10] mptcp: More miscellaneous MPTCP fixes Jakub Kicinski
