* [PATCH v2 mptcp-next 0/4] mptcp: just another receive path refactor
@ 2022-07-29 18:14 Paolo Abeni
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 1/4] mptcp: move RCVPRUNE event later Paolo Abeni
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Paolo Abeni @ 2022-07-29 18:14 UTC (permalink / raw)
  To: mptcp

This is a refresh and rebase of an already shared work:

https://lore.kernel.org/mptcp/cover.1621963632.git.pabeni@redhat.com/
[1]

The motivation for refreshing this is:

https://lore.kernel.org/mptcp/YtVhyGSsv1CWvPz4@xsang-OptiPlex-9020/

specifically, it looks like properly attaching a mem_cg to the
msk socket would be (much?) easier if the RX path and the fwd memory
allocation were under msk socket lock protection.

The first 2 patches are probably worthwhile even without the refactor;
the 2nd one in particular is required to get a good mptcp-level
acking behavior when we move the rx path under the socket lock.

The 3rd patch does the real work, and the 4th is a follow-up cleanup.

Back at [1], I measured significant performance regressions in some
specific cases. I've run the same tests here and now see only
noise-level changes. I guess that is mostly due to the better acking
strategy already introduced with commit 949dfdcf343c ("Merge branch
'mptcp-improve-mptcp-level-window-tracking'") and refined here.

v1 -> v2:
 - fix build issue in patch 3/4 due to PEBKAC
 - add missing commit messages (!!!) in patches 3/4 & 4/4

Paolo Abeni (4):
  mptcp: move RCVPRUNE event later
  mptcp: more accurate receive buffer updates
  mptcp: move msk input path under full msk socket lock
  mptcp: use common helper for rmem memory accounting

 include/net/mptcp.h  |   2 +
 net/ipv4/tcp.c       |   3 +
 net/mptcp/options.c  |   1 -
 net/mptcp/protocol.c | 219 +++++++++++--------------------------------
 net/mptcp/protocol.h |  12 ++-
 5 files changed, 68 insertions(+), 169 deletions(-)

-- 
2.35.3


* [PATCH v2 mptcp-next 1/4] mptcp: move RCVPRUNE event later
  2022-07-29 18:14 [PATCH v2 mptcp-next 0/4] mptcp: just another receive path refactor Paolo Abeni
@ 2022-07-29 18:14 ` Paolo Abeni
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 2/4] mptcp: more accurate receive buffer updates Paolo Abeni
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2022-07-29 18:14 UTC (permalink / raw)
  To: mptcp

This cleans up the code a bit and avoids skipping the msk receive
buffer update in some weird corner cases.
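
The check now runs where the msk buffer state actually matters, i.e.
right before appending skbs to the msk. A minimal sketch of the
resulting logic (sk_rbuf is already computed earlier in
__mptcp_move_skbs_from_subflow() as the larger of the msk and subflow
receive buffer sizes):

	/* over limit? can't append more skbs to msk */
	if (__mptcp_rmem(sk) > sk_rbuf)
		return true;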

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 970da88cd04f..5af3d591a20b 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -642,6 +642,10 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 		}
 	}
 
+	/* over limit? can't append more skbs to msk */
+	if (__mptcp_rmem(sk) > sk_rbuf)
+		return true;
+
 	pr_debug("msk=%p ssk=%p", msk, ssk);
 	tp = tcp_sk(ssk);
 	do {
@@ -786,8 +790,6 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
-	struct mptcp_sock *msk = mptcp_sk(sk);
-	int sk_rbuf, ssk_rbuf;
 
 	/* The peer can send data while we are shutting down this
 	 * subflow at msk destruction time, but we must avoid enqueuing
@@ -796,20 +798,9 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 	if (unlikely(subflow->disposable))
 		return;
 
-	ssk_rbuf = READ_ONCE(ssk->sk_rcvbuf);
-	sk_rbuf = READ_ONCE(sk->sk_rcvbuf);
-	if (unlikely(ssk_rbuf > sk_rbuf))
-		sk_rbuf = ssk_rbuf;
-
-	/* over limit? can't append more skbs to msk, Also, no need to wake-up*/
-	if (__mptcp_rmem(sk) > sk_rbuf) {
-		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
-		return;
-	}
-
 	/* Wake-up the reader only for in-sequence data */
 	mptcp_data_lock(sk);
-	if (move_skbs_to_msk(msk, ssk))
+	if (move_skbs_to_msk(mptcp_sk(sk), ssk))
 		sk->sk_data_ready(sk);
 
 	mptcp_data_unlock(sk);
-- 
2.35.3


* [PATCH v2 mptcp-next 2/4] mptcp: more accurate receive buffer updates
  2022-07-29 18:14 [PATCH v2 mptcp-next 0/4] mptcp: just another receive path refactor Paolo Abeni
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 1/4] mptcp: move RCVPRUNE event later Paolo Abeni
@ 2022-07-29 18:14 ` Paolo Abeni
  2022-07-29 23:01   ` Mat Martineau
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 3/4] mptcp: move msk input path under full msk socket lock Paolo Abeni
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 4/4] mptcp: use common helper for rmem memory accounting Paolo Abeni
  3 siblings, 1 reply; 14+ messages in thread
From: Paolo Abeni @ 2022-07-29 18:14 UTC (permalink / raw)
  To: mptcp

Currently mptcp_cleanup_rbuf() makes a significant effort to avoid
acquiring the subflow socket lock, estimating whether the tcp-level
cleanup could actually send an ack.

Such an estimate is a bit rough when accounting for receive window
changes, as it considers the msk available buffer space instead
of the announced mptcp-level window.

Let's consider the announced window instead, closely mirroring
the plain TCP implementation. This requires lockless access to a
bunch of additional TCP fields.
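
The resulting check mirrors the tcp_cleanup_rbuf() heuristic: only
bother acquiring the subflow lock when the freshly computed window
would at least double the one currently announced. A minimal sketch,
using the helpers touched below:

	int cur_window = __mptcp_receive_window(msk);	/* rcv_wnd_sent - ack_seq */
	int new_window = __mptcp_space(sk);		/* current buffer space */
	bool cleanup = new_window > 0 && new_window >= (cur_window << 1);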

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/mptcp.h  |  2 ++
 net/ipv4/tcp.c       |  3 +++
 net/mptcp/options.c  |  1 -
 net/mptcp/protocol.c | 14 +++++++++++---
 net/mptcp/protocol.h |  6 +++++-
 5 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 7af7fd48acc7..4b6c66b73bf4 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -136,6 +136,7 @@ static inline bool rsk_drop_req(const struct request_sock *req)
 	return tcp_rsk(req)->is_mptcp && tcp_rsk(req)->drop_req;
 }
 
+void mptcp_receive_window(const struct sock *ssk, int *rwnd);
 void mptcp_space(const struct sock *ssk, int *space, int *full_space);
 bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
 		       unsigned int *size, struct mptcp_out_options *opts);
@@ -284,6 +285,7 @@ static inline bool mptcp_skb_can_collapse(const struct sk_buff *to,
 	return true;
 }
 
+static inline void mptcp_receive_window(const struct sock *ssk, int *rwnd) { }
 static inline void mptcp_space(const struct sock *ssk, int *s, int *fs) { }
 static inline void mptcp_seq_show(struct seq_file *seq) { }
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 970e9a2cca4a..d0c0100f74f3 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1606,6 +1606,9 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied)
 	if (copied > 0 && !time_to_ack && !(sk->sk_shutdown & RCV_SHUTDOWN)) {
 		__u32 rcv_window_now = tcp_receive_window(tp);
 
+		if (sk_is_mptcp(sk))
+			mptcp_receive_window(sk, &rcv_window_now);
+
 		/* Optimize, __tcp_select_window() is not cheap. */
 		if (2*rcv_window_now <= tp->window_clamp) {
 			__u32 new_window = __tcp_select_window(sk);
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 30d289044e71..563ef8fe5a85 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -604,7 +604,6 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
 	}
 	opts->ext_copy.use_ack = 1;
 	opts->suboptions = OPTION_MPTCP_DSS;
-	WRITE_ONCE(msk->old_wspace, __mptcp_space((struct sock *)msk));
 
 	/* Add kind/length/subtype/flag overhead if mapping is not populated */
 	if (dss_size == 0)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 5af3d591a20b..17e2dbe43639 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -535,6 +535,14 @@ static void mptcp_subflow_cleanup_rbuf(struct sock *ssk)
 	unlock_sock_fast(ssk, slow);
 }
 
+void mptcp_receive_window(const struct sock *ssk, int *rwnd)
+{
+	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	const struct sock *sk = subflow->conn;
+
+	*rwnd = __mptcp_receive_window(mptcp_sk(sk));
+}
+
 static bool mptcp_subflow_could_cleanup(const struct sock *ssk, bool rx_empty)
 {
 	const struct inet_connection_sock *icsk = inet_csk(ssk);
@@ -550,13 +558,13 @@ static bool mptcp_subflow_could_cleanup(const struct sock *ssk, bool rx_empty)
 
 static void mptcp_cleanup_rbuf(struct mptcp_sock *msk)
 {
-	int old_space = READ_ONCE(msk->old_wspace);
+	int cur_window = __mptcp_receive_window(msk);
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
-	int space =  __mptcp_space(sk);
+	int new_window =  __mptcp_space(sk);
 	bool cleanup, rx_empty;
 
-	cleanup = (space > 0) && (space >= (old_space << 1));
+	cleanup = new_window > 0 && new_window >= (cur_window << 1);
 	rx_empty = !__mptcp_rmem(sk);
 
 	mptcp_for_each_subflow(msk, subflow) {
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 1bc9f6e77ddd..1e603f28f1db 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -260,7 +260,6 @@ struct mptcp_sock {
 	int		rmem_fwd_alloc;
 	struct sock	*last_snd;
 	int		snd_burst;
-	int		old_wspace;
 	u64		recovery_snd_nxt;	/* in recovery mode accept up to this seq;
 						 * recovery related fields are under data_lock
 						 * protection
@@ -336,6 +335,11 @@ static inline int __mptcp_rmem(const struct sock *sk)
 	return atomic_read(&sk->sk_rmem_alloc) - READ_ONCE(mptcp_sk(sk)->rmem_released);
 }
 
+static inline int __mptcp_receive_window(const struct mptcp_sock *msk)
+{
+	return atomic64_read(&msk->rcv_wnd_sent) - READ_ONCE(msk->ack_seq);
+}
+
 static inline int __mptcp_space(const struct sock *sk)
 {
 	return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) - __mptcp_rmem(sk));
-- 
2.35.3


* [PATCH v2 mptcp-next 3/4] mptcp: move msk input path under full msk socket lock
  2022-07-29 18:14 [PATCH v2 mptcp-next 0/4] mptcp: just another receive path refactor Paolo Abeni
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 1/4] mptcp: move RCVPRUNE event later Paolo Abeni
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 2/4] mptcp: more accurate receive buffer updates Paolo Abeni
@ 2022-07-29 18:14 ` Paolo Abeni
  2022-07-29 23:02   ` Mat Martineau
                     ` (2 more replies)
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 4/4] mptcp: use common helper for rmem memory accounting Paolo Abeni
  3 siblings, 3 replies; 14+ messages in thread
From: Paolo Abeni @ 2022-07-29 18:14 UTC (permalink / raw)
  To: mptcp

After commit c2e6048fa1cf ("mptcp: fix race in release_cb"), it's
pretty straightforward to move the whole MPTCP rx path under the
socket lock, leveraging the release_cb.

We can drop a bunch of spin_lock/unlock pairs in the receive functions,
use a single receive queue, and invoke __mptcp_move_skbs() only when
the subflows ask for it.

This will allow more cleanup in the next patch.
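
The deferral uses the usual release_cb scheme; a minimal sketch of the
two halves, as wired up below:

	/* subflow data-ready: process now, or defer to release_sock() */
	mptcp_data_lock(sk);
	if (!sock_owned_by_user(sk))
		__mptcp_data_ready(sk, ssk);
	else
		__set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags);
	mptcp_data_unlock(sk);

	/* mptcp_release_cb(): run the deferred dequeue */
	if ((flags & BIT(MPTCP_DEQUEUE)) && __mptcp_move_skbs(sk))
		sk->sk_data_ready(sk);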

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 78 +++++++++++++++++++-------------------------
 net/mptcp/protocol.h |  2 +-
 2 files changed, 35 insertions(+), 45 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 17e2dbe43639..b7982b578c86 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -795,7 +795,7 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 	return moved > 0;
 }
 
-void mptcp_data_ready(struct sock *sk, struct sock *ssk)
+void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 
@@ -807,10 +807,17 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 		return;
 
 	/* Wake-up the reader only for in-sequence data */
-	mptcp_data_lock(sk);
 	if (move_skbs_to_msk(mptcp_sk(sk), ssk))
 		sk->sk_data_ready(sk);
+}
 
+void mptcp_data_ready(struct sock *sk, struct sock *ssk)
+{
+	mptcp_data_lock(sk);
+	if (!sock_owned_by_user(sk))
+		__mptcp_data_ready(sk, ssk);
+	else
+		__set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags);
 	mptcp_data_unlock(sk);
 }
 
@@ -1768,16 +1775,22 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return copied ? : ret;
 }
 
-static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
+static bool __mptcp_move_skbs(struct sock *sk);
+
+static int __mptcp_recvmsg_mskq(struct sock *sk,
 				struct msghdr *msg,
 				size_t len, int flags,
 				struct scm_timestamping_internal *tss,
 				int *cmsg_flags)
 {
+	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct sk_buff *skb, *tmp;
 	int copied = 0;
 
-	skb_queue_walk_safe(&msk->receive_queue, skb, tmp) {
+	if (skb_queue_empty(&sk->sk_receive_queue) && !__mptcp_move_skbs(sk))
+		return 0;
+
+	skb_queue_walk_safe(&sk->sk_receive_queue, skb, tmp) {
 		u32 offset = MPTCP_SKB_CB(skb)->offset;
 		u32 data_len = skb->len - offset;
 		u32 count = min_t(size_t, len - copied, data_len);
@@ -1811,7 +1824,7 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
 			/* we will bulk release the skb memory later */
 			skb->destructor = NULL;
 			WRITE_ONCE(msk->rmem_released, msk->rmem_released + skb->truesize);
-			__skb_unlink(skb, &msk->receive_queue);
+			__skb_unlink(skb, &sk->sk_receive_queue);
 			__kfree_skb(skb);
 		}
 
@@ -1932,16 +1945,9 @@ static void __mptcp_update_rmem(struct sock *sk)
 	WRITE_ONCE(msk->rmem_released, 0);
 }
 
-static void __mptcp_splice_receive_queue(struct sock *sk)
+static bool __mptcp_move_skbs(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
-
-	skb_queue_splice_tail_init(&sk->sk_receive_queue, &msk->receive_queue);
-}
-
-static bool __mptcp_move_skbs(struct mptcp_sock *msk)
-{
-	struct sock *sk = (struct sock *)msk;
 	unsigned int moved = 0;
 	bool ret, done;
 
@@ -1949,37 +1955,29 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk)
 		struct sock *ssk = mptcp_subflow_recv_lookup(msk);
 		bool slowpath;
 
-		/* we can have data pending in the subflows only if the msk
-		 * receive buffer was full at subflow_data_ready() time,
-		 * that is an unlikely slow path.
-		 */
-		if (likely(!ssk))
+		if (unlikely(!ssk))
 			break;
 
 		slowpath = lock_sock_fast(ssk);
-		mptcp_data_lock(sk);
 		__mptcp_update_rmem(sk);
 		done = __mptcp_move_skbs_from_subflow(msk, ssk, &moved);
-		mptcp_data_unlock(sk);
 
 		if (unlikely(ssk->sk_err))
 			__mptcp_error_report(sk);
 		unlock_sock_fast(ssk, slowpath);
 	} while (!done);
 
-	/* acquire the data lock only if some input data is pending */
 	ret = moved > 0;
 	if (!RB_EMPTY_ROOT(&msk->out_of_order_queue) ||
-	    !skb_queue_empty_lockless(&sk->sk_receive_queue)) {
-		mptcp_data_lock(sk);
+	    !skb_queue_empty(&sk->sk_receive_queue)) {
 		__mptcp_update_rmem(sk);
 		ret |= __mptcp_ofo_queue(msk);
-		__mptcp_splice_receive_queue(sk);
-		mptcp_data_unlock(sk);
 	}
-	if (ret)
+	if (ret) {
+		mptcp_cleanup_rbuf(msk);
 		mptcp_check_data_fin((struct sock *)msk);
-	return !skb_queue_empty(&msk->receive_queue);
+	}
+	return ret;
 }
 
 static unsigned int mptcp_inq_hint(const struct sock *sk)
@@ -1987,7 +1985,7 @@ static unsigned int mptcp_inq_hint(const struct sock *sk)
 	const struct mptcp_sock *msk = mptcp_sk(sk);
 	const struct sk_buff *skb;
 
-	skb = skb_peek(&msk->receive_queue);
+	skb = skb_peek(&sk->sk_receive_queue);
 	if (skb) {
 		u64 hint_val = msk->ack_seq - MPTCP_SKB_CB(skb)->map_seq;
 
@@ -2033,7 +2031,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	while (copied < len) {
 		int bytes_read;
 
-		bytes_read = __mptcp_recvmsg_mskq(msk, msg, len - copied, flags, &tss, &cmsg_flags);
+		bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags, &tss, &cmsg_flags);
 		if (unlikely(bytes_read < 0)) {
 			if (!copied)
 				copied = bytes_read;
@@ -2045,9 +2043,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		/* be sure to advertise window change */
 		mptcp_cleanup_rbuf(msk);
 
-		if (skb_queue_empty(&msk->receive_queue) && __mptcp_move_skbs(msk))
-			continue;
-
 		/* only the master socket status is relevant here. The exit
 		 * conditions mirror closely tcp_recvmsg()
 		 */
@@ -2074,7 +2069,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 				/* race breaker: the shutdown could be after the
 				 * previous receive queue check
 				 */
-				if (__mptcp_move_skbs(msk))
+				if (__mptcp_move_skbs(sk))
 					continue;
 				break;
 			}
@@ -2111,9 +2106,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		}
 	}
 
-	pr_debug("msk=%p rx queue empty=%d:%d copied=%d",
-		 msk, skb_queue_empty_lockless(&sk->sk_receive_queue),
-		 skb_queue_empty(&msk->receive_queue), copied);
+	pr_debug("msk=%p rx queue empty=%d copied=%d",
+		 msk, skb_queue_empty(&sk->sk_receive_queue), copied);
 	if (!(flags & MSG_PEEK))
 		mptcp_rcv_space_adjust(msk, copied);
 
@@ -2566,7 +2560,6 @@ static int __mptcp_init_sock(struct sock *sk)
 	INIT_LIST_HEAD(&msk->join_list);
 	INIT_LIST_HEAD(&msk->rtx_queue);
 	INIT_WORK(&msk->work, mptcp_worker);
-	__skb_queue_head_init(&msk->receive_queue);
 	msk->out_of_order_queue = RB_ROOT;
 	msk->first_pending = NULL;
 	msk->rmem_fwd_alloc = 0;
@@ -3048,12 +3041,8 @@ void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
 	mptcp_for_each_subflow_safe(msk, subflow, tmp)
 		__mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow), subflow, flags);
 
-	/* move to sk_receive_queue, sk_stream_kill_queues will purge it */
-	mptcp_data_lock(sk);
-	skb_queue_splice_tail_init(&msk->receive_queue, &sk->sk_receive_queue);
 	__skb_queue_purge(&sk->sk_receive_queue);
 	skb_rbtree_purge(&msk->out_of_order_queue);
-	mptcp_data_unlock(sk);
 
 	/* move all the rx fwd alloc into the sk_mem_reclaim_final in
 	 * inet_sock_destruct() will dispose it
@@ -3135,6 +3124,8 @@ static void mptcp_release_cb(struct sock *sk)
 			__mptcp_flush_join_list(sk);
 		if (flags & BIT(MPTCP_PUSH_PENDING))
 			__mptcp_push_pending(sk, 0);
+		if ((flags & BIT(MPTCP_DEQUEUE)) && __mptcp_move_skbs(sk))
+			sk->sk_data_ready(sk);
 		if (flags & BIT(MPTCP_RETRANSMIT))
 			__mptcp_retrans(sk);
 
@@ -3383,7 +3374,7 @@ static int mptcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 			return -EINVAL;
 
 		lock_sock(sk);
-		__mptcp_move_skbs(msk);
+		__mptcp_move_skbs(sk);
 		answ = mptcp_inq_hint(sk);
 		release_sock(sk);
 		break;
@@ -3619,8 +3610,7 @@ static __poll_t mptcp_check_readable(struct mptcp_sock *msk)
 	/* Concurrent splices from sk_receive_queue into receive_queue will
 	 * always show at least one non-empty queue when checked in this order.
 	 */
-	if (skb_queue_empty_lockless(&((struct sock *)msk)->sk_receive_queue) &&
-	    skb_queue_empty_lockless(&msk->receive_queue))
+	if (skb_queue_empty_lockless(&((struct sock *)msk)->sk_receive_queue))
 		return 0;
 
 	return EPOLLIN | EPOLLRDNORM;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 1e603f28f1db..f12a6e80171d 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -125,6 +125,7 @@
 #define MPTCP_FLUSH_JOIN_LIST	5
 #define MPTCP_CONNECTED		6
 #define MPTCP_RESET_SCHEDULER	7
+#define MPTCP_DEQUEUE		8
 
 static inline bool before64(__u64 seq1, __u64 seq2)
 {
@@ -288,7 +289,6 @@ struct mptcp_sock {
 	struct work_struct work;
 	struct sk_buff  *ooo_last_skb;
 	struct rb_root  out_of_order_queue;
-	struct sk_buff_head receive_queue;
 	struct list_head conn_list;
 	struct list_head rtx_queue;
 	struct mptcp_data_frag *first_pending;
-- 
2.35.3


* [PATCH v2 mptcp-next 4/4] mptcp: use common helper for rmem memory accounting
  2022-07-29 18:14 [PATCH v2 mptcp-next 0/4] mptcp: just another receive path refactor Paolo Abeni
                   ` (2 preceding siblings ...)
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 3/4] mptcp: move msk input path under full msk socket lock Paolo Abeni
@ 2022-07-29 18:14 ` Paolo Abeni
  2022-07-29 18:32   ` mptcp: use common helper for rmem memory accounting: Build Failure MPTCP CI
                     ` (2 more replies)
  3 siblings, 3 replies; 14+ messages in thread
From: Paolo Abeni @ 2022-07-29 18:14 UTC (permalink / raw)
  To: mptcp

After the previous patch, updating sk_forward_alloc is cheap and
we can drop a lot of complexity from the MPTCP memory accounting,
using the common helper for that.
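
With the msk-specific bookkeeping gone, the rx accounting reduces to
the core sock helpers; roughly:

	/* enqueue: reserve forward memory, charge sk_rmem_alloc */
	if (!sk_rmem_schedule(sk, skb, skb->truesize))
		goto drop;
	skb_set_owner_r(skb, sk);

	/* dequeue: uncharge directly, no bulk/deferred release needed */
	atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
	sk_mem_uncharge(sk, skb->truesize);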

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 112 ++++---------------------------------------
 net/mptcp/protocol.h |   4 +-
 2 files changed, 9 insertions(+), 107 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index b7982b578c86..f6ff130f39c1 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -131,11 +131,6 @@ static void mptcp_drop(struct sock *sk, struct sk_buff *skb)
 	__kfree_skb(skb);
 }
 
-static void mptcp_rmem_charge(struct sock *sk, int size)
-{
-	mptcp_sk(sk)->rmem_fwd_alloc -= size;
-}
-
 static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
 			       struct sk_buff *from)
 {
@@ -152,7 +147,7 @@ static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
 	MPTCP_SKB_CB(to)->end_seq = MPTCP_SKB_CB(from)->end_seq;
 	kfree_skb_partial(from, fragstolen);
 	atomic_add(delta, &sk->sk_rmem_alloc);
-	mptcp_rmem_charge(sk, delta);
+	sk->sk_forward_alloc -= delta;
 	return true;
 }
 
@@ -165,44 +160,6 @@ static bool mptcp_ooo_try_coalesce(struct mptcp_sock *msk, struct sk_buff *to,
 	return mptcp_try_coalesce((struct sock *)msk, to, from);
 }
 
-static void __mptcp_rmem_reclaim(struct sock *sk, int amount)
-{
-	amount >>= PAGE_SHIFT;
-	mptcp_sk(sk)->rmem_fwd_alloc -= amount << PAGE_SHIFT;
-	__sk_mem_reduce_allocated(sk, amount);
-}
-
-static void mptcp_rmem_uncharge(struct sock *sk, int size)
-{
-	struct mptcp_sock *msk = mptcp_sk(sk);
-	int reclaimable;
-
-	msk->rmem_fwd_alloc += size;
-	reclaimable = msk->rmem_fwd_alloc - sk_unused_reserved_mem(sk);
-
-	/* see sk_mem_uncharge() for the rationale behind the following schema */
-	if (unlikely(reclaimable >= PAGE_SIZE))
-		__mptcp_rmem_reclaim(sk, reclaimable);
-}
-
-static void mptcp_rfree(struct sk_buff *skb)
-{
-	unsigned int len = skb->truesize;
-	struct sock *sk = skb->sk;
-
-	atomic_sub(len, &sk->sk_rmem_alloc);
-	mptcp_rmem_uncharge(sk, len);
-}
-
-static void mptcp_set_owner_r(struct sk_buff *skb, struct sock *sk)
-{
-	skb_orphan(skb);
-	skb->sk = sk;
-	skb->destructor = mptcp_rfree;
-	atomic_add(skb->truesize, &sk->sk_rmem_alloc);
-	mptcp_rmem_charge(sk, skb->truesize);
-}
-
 /* "inspired" by tcp_data_queue_ofo(), main differences:
  * - use mptcp seqs
  * - don't cope with sacks
@@ -315,25 +272,7 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 
 end:
 	skb_condense(skb);
-	mptcp_set_owner_r(skb, sk);
-}
-
-static bool mptcp_rmem_schedule(struct sock *sk, struct sock *ssk, int size)
-{
-	struct mptcp_sock *msk = mptcp_sk(sk);
-	int amt, amount;
-
-	if (size <= msk->rmem_fwd_alloc)
-		return true;
-
-	size -= msk->rmem_fwd_alloc;
-	amt = sk_mem_pages(size);
-	amount = amt << PAGE_SHIFT;
-	if (!__sk_mem_raise_allocated(sk, size, amt, SK_MEM_RECV))
-		return false;
-
-	msk->rmem_fwd_alloc += amount;
-	return true;
+	skb_set_owner_r(skb, sk);
 }
 
 static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
@@ -350,8 +289,7 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
 	skb_ext_reset(skb);
 	skb_orphan(skb);
 
-	/* try to fetch required memory from subflow */
-	if (!mptcp_rmem_schedule(sk, ssk, skb->truesize))
+	if (!sk_rmem_schedule(sk, skb, skb->truesize))
 		goto drop;
 
 	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
@@ -372,7 +310,7 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
 		if (tail && mptcp_try_coalesce(sk, tail, skb))
 			return true;
 
-		mptcp_set_owner_r(skb, sk);
+		skb_set_owner_r(skb, sk);
 		__skb_queue_tail(&sk->sk_receive_queue, skb);
 		return true;
 	} else if (after64(MPTCP_SKB_CB(skb)->map_seq, msk->ack_seq)) {
@@ -1783,7 +1721,6 @@ static int __mptcp_recvmsg_mskq(struct sock *sk,
 				struct scm_timestamping_internal *tss,
 				int *cmsg_flags)
 {
-	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct sk_buff *skb, *tmp;
 	int copied = 0;
 
@@ -1821,9 +1758,10 @@ static int __mptcp_recvmsg_mskq(struct sock *sk,
 		}
 
 		if (!(flags & MSG_PEEK)) {
-			/* we will bulk release the skb memory later */
+			/* avoid the indirect call, we know the destructor is sock_wfree */
 			skb->destructor = NULL;
-			WRITE_ONCE(msk->rmem_released, msk->rmem_released + skb->truesize);
+			atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
+			sk_mem_uncharge(sk, skb->truesize);
 			__skb_unlink(skb, &sk->sk_receive_queue);
 			__kfree_skb(skb);
 		}
@@ -1933,18 +1871,6 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	msk->rcvq_space.time = mstamp;
 }
 
-static void __mptcp_update_rmem(struct sock *sk)
-{
-	struct mptcp_sock *msk = mptcp_sk(sk);
-
-	if (!msk->rmem_released)
-		return;
-
-	atomic_sub(msk->rmem_released, &sk->sk_rmem_alloc);
-	mptcp_rmem_uncharge(sk, msk->rmem_released);
-	WRITE_ONCE(msk->rmem_released, 0);
-}
-
 static bool __mptcp_move_skbs(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -1959,7 +1885,6 @@ static bool __mptcp_move_skbs(struct sock *sk)
 			break;
 
 		slowpath = lock_sock_fast(ssk);
-		__mptcp_update_rmem(sk);
 		done = __mptcp_move_skbs_from_subflow(msk, ssk, &moved);
 
 		if (unlikely(ssk->sk_err))
@@ -1968,11 +1893,7 @@ static bool __mptcp_move_skbs(struct sock *sk)
 	} while (!done);
 
 	ret = moved > 0;
-	if (!RB_EMPTY_ROOT(&msk->out_of_order_queue) ||
-	    !skb_queue_empty(&sk->sk_receive_queue)) {
-		__mptcp_update_rmem(sk);
-		ret |= __mptcp_ofo_queue(msk);
-	}
+	ret |= __mptcp_ofo_queue(msk);
 	if (ret) {
 		mptcp_cleanup_rbuf(msk);
 		mptcp_check_data_fin((struct sock *)msk);
@@ -2562,8 +2483,6 @@ static int __mptcp_init_sock(struct sock *sk)
 	INIT_WORK(&msk->work, mptcp_worker);
 	msk->out_of_order_queue = RB_ROOT;
 	msk->first_pending = NULL;
-	msk->rmem_fwd_alloc = 0;
-	WRITE_ONCE(msk->rmem_released, 0);
 	msk->timer_ival = TCP_RTO_MIN;
 
 	msk->first = NULL;
@@ -2775,8 +2694,6 @@ static void __mptcp_destroy_sock(struct sock *sk)
 
 	sk->sk_prot->destroy(sk);
 
-	WARN_ON_ONCE(msk->rmem_fwd_alloc);
-	WARN_ON_ONCE(msk->rmem_released);
 	sk_stream_kill_queues(sk);
 	xfrm_sk_free_policy(sk);
 
@@ -3044,11 +2961,6 @@ void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
 	__skb_queue_purge(&sk->sk_receive_queue);
 	skb_rbtree_purge(&msk->out_of_order_queue);
 
-	/* move all the rx fwd alloc into the sk_mem_reclaim_final in
-	 * inet_sock_destruct() will dispose it
-	 */
-	sk->sk_forward_alloc += msk->rmem_fwd_alloc;
-	msk->rmem_fwd_alloc = 0;
 	mptcp_token_destroy(msk);
 	mptcp_pm_free_anno_list(msk);
 	mptcp_free_local_addr_list(msk);
@@ -3146,8 +3058,6 @@ static void mptcp_release_cb(struct sock *sk)
 		if (__test_and_clear_bit(MPTCP_RESET_SCHEDULER, &msk->cb_flags))
 			msk->last_snd = NULL;
 	}
-
-	__mptcp_update_rmem(sk);
 }
 
 /* MP_JOIN client subflow must wait for 4th ack before sending any data:
@@ -3328,11 +3238,6 @@ static void mptcp_shutdown(struct sock *sk, int how)
 		__mptcp_wr_shutdown(sk);
 }
 
-static int mptcp_forward_alloc_get(const struct sock *sk)
-{
-	return sk->sk_forward_alloc + mptcp_sk(sk)->rmem_fwd_alloc;
-}
-
 static int mptcp_ioctl_outq(const struct mptcp_sock *msk, u64 v)
 {
 	const struct sock *sk = (void *)msk;
@@ -3413,7 +3318,6 @@ static struct proto mptcp_prot = {
 	.hash		= mptcp_hash,
 	.unhash		= mptcp_unhash,
 	.get_port	= mptcp_get_port,
-	.forward_alloc_get	= mptcp_forward_alloc_get,
 	.sockets_allocated	= &mptcp_sockets_allocated,
 
 	.memory_allocated	= &tcp_memory_allocated,
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index f12a6e80171d..d4a267cb7663 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -258,7 +258,6 @@ struct mptcp_sock {
 	u64		ack_seq;
 	atomic64_t	rcv_wnd_sent;
 	u64		rcv_data_fin_seq;
-	int		rmem_fwd_alloc;
 	struct sock	*last_snd;
 	int		snd_burst;
 	u64		recovery_snd_nxt;	/* in recovery mode accept up to this seq;
@@ -269,7 +268,6 @@ struct mptcp_sock {
 	u64		wnd_end;
 	unsigned long	timer_ival;
 	u32		token;
-	int		rmem_released;
 	unsigned long	flags;
 	unsigned long	cb_flags;
 	unsigned long	push_pending;
@@ -332,7 +330,7 @@ static inline struct mptcp_sock *mptcp_sk(const struct sock *sk)
  */
 static inline int __mptcp_rmem(const struct sock *sk)
 {
-	return atomic_read(&sk->sk_rmem_alloc) - READ_ONCE(mptcp_sk(sk)->rmem_released);
+	return atomic_read(&sk->sk_rmem_alloc);
 }
 
 static inline int __mptcp_receive_window(const struct mptcp_sock *msk)
-- 
2.35.3


* Re: mptcp: use common helper for rmem memory accounting: Build Failure
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 4/4] mptcp: use common helper for rmem memory accounting Paolo Abeni
@ 2022-07-29 18:32   ` MPTCP CI
  2022-07-29 21:20   ` mptcp: use common helper for rmem memory accounting: Tests Results MPTCP CI
  2022-07-29 23:16   ` [PATCH v2 mptcp-next 4/4] mptcp: use common helper for rmem memory accounting Mat Martineau
  2 siblings, 0 replies; 14+ messages in thread
From: MPTCP CI @ 2022-07-29 18:32 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

But sadly, our CI spotted some issues with it when trying to build it.

You can find more details there:

  https://patchwork.kernel.org/project/mptcp/patch/a5a451ed4421993380b46e8b546b228a66001212.1659117128.git.pabeni@redhat.com/
  https://github.com/multipath-tcp/mptcp_net-next/actions/runs/2762483896

Status: failure
Initiator: MPTCPimporter
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/2febaa3d4a51

Feel free to reply to this email if you cannot access logs, if you need
some support to fix the error, if this doesn't seem to be caused by your
modifications or if the error is a false positive one.

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)

* Re: mptcp: use common helper for rmem memory accounting: Tests Results
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 4/4] mptcp: use common helper for rmem memory accounting Paolo Abeni
  2022-07-29 18:32   ` mptcp: use common helper for rmem memory accounting: Build Failure MPTCP CI
@ 2022-07-29 21:20   ` MPTCP CI
  2022-07-29 23:16   ` [PATCH v2 mptcp-next 4/4] mptcp: use common helper for rmem memory accounting Mat Martineau
  2 siblings, 0 replies; 14+ messages in thread
From: MPTCP CI @ 2022-07-29 21:20 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal:
  - Unstable: 1 failed test(s): selftest_mptcp_join 🔴:
  - Task: https://cirrus-ci.com/task/6475967970410496
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6475967970410496/summary/summary.txt

- KVM Validation: debug:
  - Unstable: 1 failed test(s): selftest_mptcp_join 🔴:
  - Task: https://cirrus-ci.com/task/6082667547459584
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6082667547459584/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/2febaa3d4a51


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)

* Re: [PATCH v2 mptcp-next 2/4] mptcp: more accurate receive buffer updates
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 2/4] mptcp: more accurate receive buffer updates Paolo Abeni
@ 2022-07-29 23:01   ` Mat Martineau
  2022-07-30  6:32     ` Paolo Abeni
  0 siblings, 1 reply; 14+ messages in thread
From: Mat Martineau @ 2022-07-29 23:01 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Fri, 29 Jul 2022, Paolo Abeni wrote:

> Currently mptcp_cleanup_rbuf() makes a significant effort to avoid
> acquiring the subflow socket lock, estimating whether the tcp-level
> cleanup could actually send an ack.
>
> Such an estimate is a bit rough when accounting for receive window
> changes, as it considers the msk available buffer space instead
> of the announced mptcp-level window.
>
> Let's consider the announced window instead, closely mirroring
> the plain TCP implementation. This requires lockless access to a
> bunch of additional TCP fields.

Hi Paolo -

The code changes look good. For this last sentence above ("...lockless
access to a bunch of ... TCP fields"), I only see access to msk->rcv_wnd_sent
and msk->ack_seq, and nothing obvious in the later commits. Could you
clarify?


- Mat



>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> include/net/mptcp.h  |  2 ++
> net/ipv4/tcp.c       |  3 +++
> net/mptcp/options.c  |  1 -
> net/mptcp/protocol.c | 14 +++++++++++---
> net/mptcp/protocol.h |  6 +++++-
> 5 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> index 7af7fd48acc7..4b6c66b73bf4 100644
> --- a/include/net/mptcp.h
> +++ b/include/net/mptcp.h
> @@ -136,6 +136,7 @@ static inline bool rsk_drop_req(const struct request_sock *req)
> 	return tcp_rsk(req)->is_mptcp && tcp_rsk(req)->drop_req;
> }
>
> +void mptcp_receive_window(const struct sock *ssk, int *rwnd);
> void mptcp_space(const struct sock *ssk, int *space, int *full_space);
> bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
> 		       unsigned int *size, struct mptcp_out_options *opts);
> @@ -284,6 +285,7 @@ static inline bool mptcp_skb_can_collapse(const struct sk_buff *to,
> 	return true;
> }
>
> +static inline void mptcp_receive_window(const struct sock *ssk, int *rwnd) { }
> static inline void mptcp_space(const struct sock *ssk, int *s, int *fs) { }
> static inline void mptcp_seq_show(struct seq_file *seq) { }
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 970e9a2cca4a..d0c0100f74f3 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1606,6 +1606,9 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied)
> 	if (copied > 0 && !time_to_ack && !(sk->sk_shutdown & RCV_SHUTDOWN)) {
> 		__u32 rcv_window_now = tcp_receive_window(tp);
>
> +		if (sk_is_mptcp(sk))
> +			mptcp_receive_window(sk, &rcv_window_now);
> +
> 		/* Optimize, __tcp_select_window() is not cheap. */
> 		if (2*rcv_window_now <= tp->window_clamp) {
> 			__u32 new_window = __tcp_select_window(sk);
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index 30d289044e71..563ef8fe5a85 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -604,7 +604,6 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
> 	}
> 	opts->ext_copy.use_ack = 1;
> 	opts->suboptions = OPTION_MPTCP_DSS;
> -	WRITE_ONCE(msk->old_wspace, __mptcp_space((struct sock *)msk));
>
> 	/* Add kind/length/subtype/flag overhead if mapping is not populated */
> 	if (dss_size == 0)
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 5af3d591a20b..17e2dbe43639 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -535,6 +535,14 @@ static void mptcp_subflow_cleanup_rbuf(struct sock *ssk)
> 	unlock_sock_fast(ssk, slow);
> }
>
> +void mptcp_receive_window(const struct sock *ssk, int *rwnd)
> +{
> +	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> +	const struct sock *sk = subflow->conn;
> +
> +	*rwnd = __mptcp_receive_window(mptcp_sk(sk));
> +}
> +
> static bool mptcp_subflow_could_cleanup(const struct sock *ssk, bool rx_empty)
> {
> 	const struct inet_connection_sock *icsk = inet_csk(ssk);
> @@ -550,13 +558,13 @@ static bool mptcp_subflow_could_cleanup(const struct sock *ssk, bool rx_empty)
>
> static void mptcp_cleanup_rbuf(struct mptcp_sock *msk)
> {
> -	int old_space = READ_ONCE(msk->old_wspace);
> +	int cur_window = __mptcp_receive_window(msk);
> 	struct mptcp_subflow_context *subflow;
> 	struct sock *sk = (struct sock *)msk;
> -	int space =  __mptcp_space(sk);
> +	int new_window =  __mptcp_space(sk);
> 	bool cleanup, rx_empty;
>
> -	cleanup = (space > 0) && (space >= (old_space << 1));
> +	cleanup = new_window > 0 && new_window >= (cur_window << 1);
> 	rx_empty = !__mptcp_rmem(sk);
>
> 	mptcp_for_each_subflow(msk, subflow) {
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 1bc9f6e77ddd..1e603f28f1db 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -260,7 +260,6 @@ struct mptcp_sock {
> 	int		rmem_fwd_alloc;
> 	struct sock	*last_snd;
> 	int		snd_burst;
> -	int		old_wspace;
> 	u64		recovery_snd_nxt;	/* in recovery mode accept up to this seq;
> 						 * recovery related fields are under data_lock
> 						 * protection
> @@ -336,6 +335,11 @@ static inline int __mptcp_rmem(const struct sock *sk)
> 	return atomic_read(&sk->sk_rmem_alloc) - READ_ONCE(mptcp_sk(sk)->rmem_released);
> }
>
> +static inline int __mptcp_receive_window(const struct mptcp_sock *msk)
> +{
> +	return atomic64_read(&msk->rcv_wnd_sent) - READ_ONCE(msk->ack_seq);
> +}
> +
> static inline int __mptcp_space(const struct sock *sk)
> {
> 	return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) - __mptcp_rmem(sk));
> -- 
> 2.35.3
>
>
>

--
Mat Martineau
Intel

* Re: [PATCH v2 mptcp-next 3/4] mptcp: move msk input path under full msk socket lock
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 3/4] mptcp: move msk input path under full msk socket lock Paolo Abeni
@ 2022-07-29 23:02   ` Mat Martineau
  2022-07-30  6:32     ` Paolo Abeni
  2022-07-30  5:36   ` kernel test robot
  2022-07-30  6:43   ` kernel test robot
  2 siblings, 1 reply; 14+ messages in thread
From: Mat Martineau @ 2022-07-29 23:02 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Fri, 29 Jul 2022, Paolo Abeni wrote:

> After commit c2e6048fa1cf ("mptcp: fix race in release_cb"), it's
> pretty straightforward to move the whole MPTCP rx path under the
> socket lock, leveraging the release_cb.
>
> We can drop a bunch of spin_lock/unlock pairs in the receive functions,
> use a single receive queue, and invoke __mptcp_move_skbs() only when
> the subflows ask for it.
>
> This will allow more cleanup in the next patch.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/protocol.c | 78 +++++++++++++++++++-------------------------
> net/mptcp/protocol.h |  2 +-
> 2 files changed, 35 insertions(+), 45 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 17e2dbe43639..b7982b578c86 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -795,7 +795,7 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
> 	return moved > 0;
> }
>
> -void mptcp_data_ready(struct sock *sk, struct sock *ssk)
> +void __mptcp_data_ready(struct sock *sk, struct sock *ssk)

While a normal build works fine, the CI is complaining about this line in 
the "make W=1 C=1" build. The function is only used in this file, so 
adding 'static' would make 'sparse' happy.
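
i.e., keeping the definition local to protocol.c:

	static void __mptcp_data_ready(struct sock *sk, struct sock *ssk)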

- Mat

> {
> 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
>
> @@ -807,10 +807,17 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
> 		return;
>
> 	/* Wake-up the reader only for in-sequence data */
> -	mptcp_data_lock(sk);
> 	if (move_skbs_to_msk(mptcp_sk(sk), ssk))
> 		sk->sk_data_ready(sk);
> +}
>
> +void mptcp_data_ready(struct sock *sk, struct sock *ssk)
> +{
> +	mptcp_data_lock(sk);
> +	if (!sock_owned_by_user(sk))
> +		__mptcp_data_ready(sk, ssk);
> +	else
> +		__set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags);
> 	mptcp_data_unlock(sk);
> }
>
> @@ -1768,16 +1775,22 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> 	return copied ? : ret;
> }
>
> -static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
> +static bool __mptcp_move_skbs(struct sock *sk);
> +
> +static int __mptcp_recvmsg_mskq(struct sock *sk,
> 				struct msghdr *msg,
> 				size_t len, int flags,
> 				struct scm_timestamping_internal *tss,
> 				int *cmsg_flags)
> {
> +	struct mptcp_sock *msk = mptcp_sk(sk);
> 	struct sk_buff *skb, *tmp;
> 	int copied = 0;
>
> -	skb_queue_walk_safe(&msk->receive_queue, skb, tmp) {
> +	if (skb_queue_empty(&sk->sk_receive_queue) && !__mptcp_move_skbs(sk))
> +		return 0;
> +
> +	skb_queue_walk_safe(&sk->sk_receive_queue, skb, tmp) {
> 		u32 offset = MPTCP_SKB_CB(skb)->offset;
> 		u32 data_len = skb->len - offset;
> 		u32 count = min_t(size_t, len - copied, data_len);
> @@ -1811,7 +1824,7 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
> 			/* we will bulk release the skb memory later */
> 			skb->destructor = NULL;
> 			WRITE_ONCE(msk->rmem_released, msk->rmem_released + skb->truesize);
> -			__skb_unlink(skb, &msk->receive_queue);
> +			__skb_unlink(skb, &sk->sk_receive_queue);
> 			__kfree_skb(skb);
> 		}
>
> @@ -1932,16 +1945,9 @@ static void __mptcp_update_rmem(struct sock *sk)
> 	WRITE_ONCE(msk->rmem_released, 0);
> }
>
> -static void __mptcp_splice_receive_queue(struct sock *sk)
> +static bool __mptcp_move_skbs(struct sock *sk)
> {
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> -
> -	skb_queue_splice_tail_init(&sk->sk_receive_queue, &msk->receive_queue);
> -}
> -
> -static bool __mptcp_move_skbs(struct mptcp_sock *msk)
> -{
> -	struct sock *sk = (struct sock *)msk;
> 	unsigned int moved = 0;
> 	bool ret, done;
>
> @@ -1949,37 +1955,29 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk)
> 		struct sock *ssk = mptcp_subflow_recv_lookup(msk);
> 		bool slowpath;
>
> -		/* we can have data pending in the subflows only if the msk
> -		 * receive buffer was full at subflow_data_ready() time,
> -		 * that is an unlikely slow path.
> -		 */
> -		if (likely(!ssk))
> +		if (unlikely(!ssk))
> 			break;
>
> 		slowpath = lock_sock_fast(ssk);
> -		mptcp_data_lock(sk);
> 		__mptcp_update_rmem(sk);
> 		done = __mptcp_move_skbs_from_subflow(msk, ssk, &moved);
> -		mptcp_data_unlock(sk);
>
> 		if (unlikely(ssk->sk_err))
> 			__mptcp_error_report(sk);
> 		unlock_sock_fast(ssk, slowpath);
> 	} while (!done);
>
> -	/* acquire the data lock only if some input data is pending */
> 	ret = moved > 0;
> 	if (!RB_EMPTY_ROOT(&msk->out_of_order_queue) ||
> -	    !skb_queue_empty_lockless(&sk->sk_receive_queue)) {
> -		mptcp_data_lock(sk);
> +	    !skb_queue_empty(&sk->sk_receive_queue)) {
> 		__mptcp_update_rmem(sk);
> 		ret |= __mptcp_ofo_queue(msk);
> -		__mptcp_splice_receive_queue(sk);
> -		mptcp_data_unlock(sk);
> 	}
> -	if (ret)
> +	if (ret) {
> +		mptcp_cleanup_rbuf(msk);
> 		mptcp_check_data_fin((struct sock *)msk);
> -	return !skb_queue_empty(&msk->receive_queue);
> +	}
> +	return ret;
> }
>
> static unsigned int mptcp_inq_hint(const struct sock *sk)
> @@ -1987,7 +1985,7 @@ static unsigned int mptcp_inq_hint(const struct sock *sk)
> 	const struct mptcp_sock *msk = mptcp_sk(sk);
> 	const struct sk_buff *skb;
>
> -	skb = skb_peek(&msk->receive_queue);
> +	skb = skb_peek(&sk->sk_receive_queue);
> 	if (skb) {
> 		u64 hint_val = msk->ack_seq - MPTCP_SKB_CB(skb)->map_seq;
>
> @@ -2033,7 +2031,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> 	while (copied < len) {
> 		int bytes_read;
>
> -		bytes_read = __mptcp_recvmsg_mskq(msk, msg, len - copied, flags, &tss, &cmsg_flags);
> +		bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags, &tss, &cmsg_flags);
> 		if (unlikely(bytes_read < 0)) {
> 			if (!copied)
> 				copied = bytes_read;
> @@ -2045,9 +2043,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> 		/* be sure to advertise window change */
> 		mptcp_cleanup_rbuf(msk);
>
> -		if (skb_queue_empty(&msk->receive_queue) && __mptcp_move_skbs(msk))
> -			continue;
> -
> 		/* only the master socket status is relevant here. The exit
> 		 * conditions mirror closely tcp_recvmsg()
> 		 */
> @@ -2074,7 +2069,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> 				/* race breaker: the shutdown could be after the
> 				 * previous receive queue check
> 				 */
> -				if (__mptcp_move_skbs(msk))
> +				if (__mptcp_move_skbs(sk))
> 					continue;
> 				break;
> 			}
> @@ -2111,9 +2106,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> 		}
> 	}
>
> -	pr_debug("msk=%p rx queue empty=%d:%d copied=%d",
> -		 msk, skb_queue_empty_lockless(&sk->sk_receive_queue),
> -		 skb_queue_empty(&msk->receive_queue), copied);
> +	pr_debug("msk=%p rx queue empty=%d copied=%d",
> +		 msk, skb_queue_empty(&sk->sk_receive_queue), copied);
> 	if (!(flags & MSG_PEEK))
> 		mptcp_rcv_space_adjust(msk, copied);
>
> @@ -2566,7 +2560,6 @@ static int __mptcp_init_sock(struct sock *sk)
> 	INIT_LIST_HEAD(&msk->join_list);
> 	INIT_LIST_HEAD(&msk->rtx_queue);
> 	INIT_WORK(&msk->work, mptcp_worker);
> -	__skb_queue_head_init(&msk->receive_queue);
> 	msk->out_of_order_queue = RB_ROOT;
> 	msk->first_pending = NULL;
> 	msk->rmem_fwd_alloc = 0;
> @@ -3048,12 +3041,8 @@ void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
> 	mptcp_for_each_subflow_safe(msk, subflow, tmp)
> 		__mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow), subflow, flags);
>
> -	/* move to sk_receive_queue, sk_stream_kill_queues will purge it */
> -	mptcp_data_lock(sk);
> -	skb_queue_splice_tail_init(&msk->receive_queue, &sk->sk_receive_queue);
> 	__skb_queue_purge(&sk->sk_receive_queue);
> 	skb_rbtree_purge(&msk->out_of_order_queue);
> -	mptcp_data_unlock(sk);
>
> 	/* move all the rx fwd alloc into the sk_mem_reclaim_final in
> 	 * inet_sock_destruct() will dispose it
> @@ -3135,6 +3124,8 @@ static void mptcp_release_cb(struct sock *sk)
> 			__mptcp_flush_join_list(sk);
> 		if (flags & BIT(MPTCP_PUSH_PENDING))
> 			__mptcp_push_pending(sk, 0);
> +		if ((flags & BIT(MPTCP_DEQUEUE)) && __mptcp_move_skbs(sk))
> +			sk->sk_data_ready(sk);
> 		if (flags & BIT(MPTCP_RETRANSMIT))
> 			__mptcp_retrans(sk);
>
> @@ -3383,7 +3374,7 @@ static int mptcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
> 			return -EINVAL;
>
> 		lock_sock(sk);
> -		__mptcp_move_skbs(msk);
> +		__mptcp_move_skbs(sk);
> 		answ = mptcp_inq_hint(sk);
> 		release_sock(sk);
> 		break;
> @@ -3619,8 +3610,7 @@ static __poll_t mptcp_check_readable(struct mptcp_sock *msk)
> 	/* Concurrent splices from sk_receive_queue into receive_queue will
> 	 * always show at least one non-empty queue when checked in this order.
> 	 */
> -	if (skb_queue_empty_lockless(&((struct sock *)msk)->sk_receive_queue) &&
> -	    skb_queue_empty_lockless(&msk->receive_queue))
> +	if (skb_queue_empty_lockless(&((struct sock *)msk)->sk_receive_queue))
> 		return 0;
>
> 	return EPOLLIN | EPOLLRDNORM;
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 1e603f28f1db..f12a6e80171d 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -125,6 +125,7 @@
> #define MPTCP_FLUSH_JOIN_LIST	5
> #define MPTCP_CONNECTED		6
> #define MPTCP_RESET_SCHEDULER	7
> +#define MPTCP_DEQUEUE		8
>
> static inline bool before64(__u64 seq1, __u64 seq2)
> {
> @@ -288,7 +289,6 @@ struct mptcp_sock {
> 	struct work_struct work;
> 	struct sk_buff  *ooo_last_skb;
> 	struct rb_root  out_of_order_queue;
> -	struct sk_buff_head receive_queue;
> 	struct list_head conn_list;
> 	struct list_head rtx_queue;
> 	struct mptcp_data_frag *first_pending;
> -- 
> 2.35.3
>
>
>

--
Mat Martineau
Intel

* Re: [PATCH v2 mptcp-next 4/4] mptcp: use common helper for rmem memory accounting
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 4/4] mptcp: use common helper for rmem memory accounting Paolo Abeni
  2022-07-29 18:32   ` mptcp: use common helper for rmem memory accounting: Build Failure MPTCP CI
  2022-07-29 21:20   ` mptcp: use common helper for rmem memory accounting: Tests Results MPTCP CI
@ 2022-07-29 23:16   ` Mat Martineau
  2 siblings, 0 replies; 14+ messages in thread
From: Mat Martineau @ 2022-07-29 23:16 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp

On Fri, 29 Jul 2022, Paolo Abeni wrote:

> After the previous patch, updating sk_forward_alloc is cheap and
> we can drop a lot of complexity from the MPTCP memory accounting,
> using the common helper for that.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/protocol.c | 112 ++++---------------------------------------
> net/mptcp/protocol.h |   4 +-
> 2 files changed, 9 insertions(+), 107 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index b7982b578c86..f6ff130f39c1 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -131,11 +131,6 @@ static void mptcp_drop(struct sock *sk, struct sk_buff *skb)
> 	__kfree_skb(skb);
> }
>
> -static void mptcp_rmem_charge(struct sock *sk, int size)
> -{
> -	mptcp_sk(sk)->rmem_fwd_alloc -= size;
> -}
> -
> static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
> 			       struct sk_buff *from)
> {
> @@ -152,7 +147,7 @@ static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
> 	MPTCP_SKB_CB(to)->end_seq = MPTCP_SKB_CB(from)->end_seq;
> 	kfree_skb_partial(from, fragstolen);
> 	atomic_add(delta, &sk->sk_rmem_alloc);
> -	mptcp_rmem_charge(sk, delta);
> +	sk->sk_forward_alloc -= delta;
> 	return true;
> }
>
> @@ -165,44 +160,6 @@ static bool mptcp_ooo_try_coalesce(struct mptcp_sock *msk, struct sk_buff *to,
> 	return mptcp_try_coalesce((struct sock *)msk, to, from);
> }
>
> -static void __mptcp_rmem_reclaim(struct sock *sk, int amount)
> -{
> -	amount >>= PAGE_SHIFT;
> -	mptcp_sk(sk)->rmem_fwd_alloc -= amount << PAGE_SHIFT;
> -	__sk_mem_reduce_allocated(sk, amount);
> -}
> -
> -static void mptcp_rmem_uncharge(struct sock *sk, int size)
> -{
> -	struct mptcp_sock *msk = mptcp_sk(sk);
> -	int reclaimable;
> -
> -	msk->rmem_fwd_alloc += size;
> -	reclaimable = msk->rmem_fwd_alloc - sk_unused_reserved_mem(sk);
> -
> -	/* see sk_mem_uncharge() for the rationale behind the following schema */
> -	if (unlikely(reclaimable >= PAGE_SIZE))
> -		__mptcp_rmem_reclaim(sk, reclaimable);
> -}
> -
> -static void mptcp_rfree(struct sk_buff *skb)
> -{
> -	unsigned int len = skb->truesize;
> -	struct sock *sk = skb->sk;
> -
> -	atomic_sub(len, &sk->sk_rmem_alloc);
> -	mptcp_rmem_uncharge(sk, len);
> -}
> -
> -static void mptcp_set_owner_r(struct sk_buff *skb, struct sock *sk)
> -{
> -	skb_orphan(skb);
> -	skb->sk = sk;
> -	skb->destructor = mptcp_rfree;
> -	atomic_add(skb->truesize, &sk->sk_rmem_alloc);
> -	mptcp_rmem_charge(sk, skb->truesize);
> -}
> -
> /* "inspired" by tcp_data_queue_ofo(), main differences:
>  * - use mptcp seqs
>  * - don't cope with sacks
> @@ -315,25 +272,7 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
>
> end:
> 	skb_condense(skb);
> -	mptcp_set_owner_r(skb, sk);
> -}
> -
> -static bool mptcp_rmem_schedule(struct sock *sk, struct sock *ssk, int size)
> -{
> -	struct mptcp_sock *msk = mptcp_sk(sk);
> -	int amt, amount;
> -
> -	if (size <= msk->rmem_fwd_alloc)
> -		return true;
> -
> -	size -= msk->rmem_fwd_alloc;
> -	amt = sk_mem_pages(size);
> -	amount = amt << PAGE_SHIFT;
> -	if (!__sk_mem_raise_allocated(sk, size, amt, SK_MEM_RECV))
> -		return false;
> -
> -	msk->rmem_fwd_alloc += amount;
> -	return true;
> +	skb_set_owner_r(skb, sk);
> }
>
> static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
> @@ -350,8 +289,7 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
> 	skb_ext_reset(skb);
> 	skb_orphan(skb);
>
> -	/* try to fetch required memory from subflow */
> -	if (!mptcp_rmem_schedule(sk, ssk, skb->truesize))
> +	if (!sk_rmem_schedule(sk, skb, skb->truesize))
> 		goto drop;
>
> 	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
> @@ -372,7 +310,7 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
> 		if (tail && mptcp_try_coalesce(sk, tail, skb))
> 			return true;
>
> -		mptcp_set_owner_r(skb, sk);
> +		skb_set_owner_r(skb, sk);
> 		__skb_queue_tail(&sk->sk_receive_queue, skb);
> 		return true;
> 	} else if (after64(MPTCP_SKB_CB(skb)->map_seq, msk->ack_seq)) {
> @@ -1783,7 +1721,6 @@ static int __mptcp_recvmsg_mskq(struct sock *sk,
> 				struct scm_timestamping_internal *tss,
> 				int *cmsg_flags)
> {
> -	struct mptcp_sock *msk = mptcp_sk(sk);
> 	struct sk_buff *skb, *tmp;
> 	int copied = 0;
>
> @@ -1821,9 +1758,10 @@ static int __mptcp_recvmsg_mskq(struct sock *sk,
> 		}
>
> 		if (!(flags & MSG_PEEK)) {
> -			/* we will bulk release the skb memory later */
> +			/* avoid the indirect call, we know the destructor is sock_wfree */
> 			skb->destructor = NULL;
> -			WRITE_ONCE(msk->rmem_released, msk->rmem_released + skb->truesize);
> +			atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
> +			sk_mem_uncharge(sk, skb->truesize);
> 			__skb_unlink(skb, &sk->sk_receive_queue);
> 			__kfree_skb(skb);
> 		}
> @@ -1933,18 +1871,6 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
> 	msk->rcvq_space.time = mstamp;
> }
>
> -static void __mptcp_update_rmem(struct sock *sk)
> -{
> -	struct mptcp_sock *msk = mptcp_sk(sk);
> -
> -	if (!msk->rmem_released)
> -		return;
> -
> -	atomic_sub(msk->rmem_released, &sk->sk_rmem_alloc);
> -	mptcp_rmem_uncharge(sk, msk->rmem_released);
> -	WRITE_ONCE(msk->rmem_released, 0);
> -}
> -
> static bool __mptcp_move_skbs(struct sock *sk)
> {
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> @@ -1959,7 +1885,6 @@ static bool __mptcp_move_skbs(struct sock *sk)
> 			break;
>
> 		slowpath = lock_sock_fast(ssk);
> -		__mptcp_update_rmem(sk);
> 		done = __mptcp_move_skbs_from_subflow(msk, ssk, &moved);
>
> 		if (unlikely(ssk->sk_err))
> @@ -1968,11 +1893,7 @@ static bool __mptcp_move_skbs(struct sock *sk)
> 	} while (!done);
>
> 	ret = moved > 0;
> -	if (!RB_EMPTY_ROOT(&msk->out_of_order_queue) ||
> -	    !skb_queue_empty(&sk->sk_receive_queue)) {
> -		__mptcp_update_rmem(sk);
> -		ret |= __mptcp_ofo_queue(msk);
> -	}
> +	ret |= __mptcp_ofo_queue(msk);
> 	if (ret) {
> 		mptcp_cleanup_rbuf(msk);
> 		mptcp_check_data_fin((struct sock *)msk);
> @@ -2562,8 +2483,6 @@ static int __mptcp_init_sock(struct sock *sk)
> 	INIT_WORK(&msk->work, mptcp_worker);
> 	msk->out_of_order_queue = RB_ROOT;
> 	msk->first_pending = NULL;
> -	msk->rmem_fwd_alloc = 0;
> -	WRITE_ONCE(msk->rmem_released, 0);
> 	msk->timer_ival = TCP_RTO_MIN;
>
> 	msk->first = NULL;
> @@ -2775,8 +2694,6 @@ static void __mptcp_destroy_sock(struct sock *sk)
>
> 	sk->sk_prot->destroy(sk);
>
> -	WARN_ON_ONCE(msk->rmem_fwd_alloc);
> -	WARN_ON_ONCE(msk->rmem_released);
> 	sk_stream_kill_queues(sk);
> 	xfrm_sk_free_policy(sk);
>
> @@ -3044,11 +2961,6 @@ void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
> 	__skb_queue_purge(&sk->sk_receive_queue);
> 	skb_rbtree_purge(&msk->out_of_order_queue);
>
> -	/* move all the rx fwd alloc into the sk_mem_reclaim_final in
> -	 * inet_sock_destruct() will dispose it
> -	 */
> -	sk->sk_forward_alloc += msk->rmem_fwd_alloc;
> -	msk->rmem_fwd_alloc = 0;
> 	mptcp_token_destroy(msk);
> 	mptcp_pm_free_anno_list(msk);
> 	mptcp_free_local_addr_list(msk);
> @@ -3146,8 +3058,6 @@ static void mptcp_release_cb(struct sock *sk)
> 		if (__test_and_clear_bit(MPTCP_RESET_SCHEDULER, &msk->cb_flags))
> 			msk->last_snd = NULL;
> 	}
> -
> -	__mptcp_update_rmem(sk);
> }
>
> /* MP_JOIN client subflow must wait for 4th ack before sending any data:
> @@ -3328,11 +3238,6 @@ static void mptcp_shutdown(struct sock *sk, int how)
> 		__mptcp_wr_shutdown(sk);
> }
>
> -static int mptcp_forward_alloc_get(const struct sock *sk)
> -{
> -	return sk->sk_forward_alloc + mptcp_sk(sk)->rmem_fwd_alloc;
> -}
> -
> static int mptcp_ioctl_outq(const struct mptcp_sock *msk, u64 v)
> {
> 	const struct sock *sk = (void *)msk;
> @@ -3413,7 +3318,6 @@ static struct proto mptcp_prot = {
> 	.hash		= mptcp_hash,
> 	.unhash		= mptcp_unhash,
> 	.get_port	= mptcp_get_port,
> -	.forward_alloc_get	= mptcp_forward_alloc_get,

Since MPTCP is (or "was") the only user of forward_alloc_get... seems like 
another patch should be added to rip out all the forward_alloc_get stuff?

This is mostly a matter of reverting

292e6077b040 ("net: introduce sk_forward_alloc_get()")
and
6c302e799a0d ("net: forward_alloc_get depends on CONFIG_MPTCP")

(although the 2nd has a minor conflict)
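After that, sk_forward_alloc_get() would have no per-protocol override left,
so the helper in include/net/sock.h should collapse to a plain field read,
roughly (untested sketch; the call sites could also just go back to
open-coding sk->sk_forward_alloc, as before 292e6077b040):

static inline int sk_forward_alloc_get(const struct sock *sk)
{
	return sk->sk_forward_alloc;
}
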


Overall, the series tests fine and removes a lot of complexity, just 
these minor comments to address. Thanks!

- Mat

> 	.sockets_allocated	= &mptcp_sockets_allocated,
>
> 	.memory_allocated	= &tcp_memory_allocated,
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index f12a6e80171d..d4a267cb7663 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -258,7 +258,6 @@ struct mptcp_sock {
> 	u64		ack_seq;
> 	atomic64_t	rcv_wnd_sent;
> 	u64		rcv_data_fin_seq;
> -	int		rmem_fwd_alloc;
> 	struct sock	*last_snd;
> 	int		snd_burst;
> 	u64		recovery_snd_nxt;	/* in recovery mode accept up to this seq;
> @@ -269,7 +268,6 @@ struct mptcp_sock {
> 	u64		wnd_end;
> 	unsigned long	timer_ival;
> 	u32		token;
> -	int		rmem_released;
> 	unsigned long	flags;
> 	unsigned long	cb_flags;
> 	unsigned long	push_pending;
> @@ -332,7 +330,7 @@ static inline struct mptcp_sock *mptcp_sk(const struct sock *sk)
>  */
> static inline int __mptcp_rmem(const struct sock *sk)
> {
> -	return atomic_read(&sk->sk_rmem_alloc) - READ_ONCE(mptcp_sk(sk)->rmem_released);
> +	return atomic_read(&sk->sk_rmem_alloc);
> }
>
> static inline int __mptcp_receive_window(const struct mptcp_sock *msk)
> -- 
> 2.35.3
>
>
>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 mptcp-next 3/4] mptcp: move msk input path under full msk socket lock
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 3/4] mptcp: move msk input path under full msk socket lock Paolo Abeni
  2022-07-29 23:02   ` Mat Martineau
@ 2022-07-30  5:36   ` kernel test robot
  2022-07-30  6:43   ` kernel test robot
  2 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2022-07-30  5:36 UTC (permalink / raw)
  To: Paolo Abeni, mptcp; +Cc: kbuild-all

Hi Paolo,

I love your patch! Perhaps something to improve:

[auto build test WARNING on mptcp/export]
[cannot apply to linus/master v5.19-rc8 next-20220728]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/mptcp-just-another-receive-path-refactor/20220730-021638
base:   https://github.com/multipath-tcp/mptcp_net-next.git export
config: x86_64-randconfig-a013 (https://download.01.org/0day-ci/archive/20220730/202207301339.GBivsDUw-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/f2cae8b9b78304e87a6c4ee4dbaa50cc02b078d5
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Paolo-Abeni/mptcp-just-another-receive-path-refactor/20220730-021638
        git checkout f2cae8b9b78304e87a6c4ee4dbaa50cc02b078d5
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash net/mptcp/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> net/mptcp/protocol.c:798:6: warning: no previous prototype for '__mptcp_data_ready' [-Wmissing-prototypes]
     798 | void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
         |      ^~~~~~~~~~~~~~~~~~


vim +/__mptcp_data_ready +798 net/mptcp/protocol.c

   797	
 > 798	void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
   799	{
   800		struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
   801	
   802		/* The peer can send data while we are shutting down this
   803		 * subflow at msk destruction time, but we must avoid enqueuing
   804		 * more data to the msk receive queue
   805		 */
   806		if (unlikely(subflow->disposable))
   807			return;
   808	
   809		/* Wake-up the reader only for in-sequence data */
   810		if (move_skbs_to_msk(mptcp_sk(sk), ssk))
   811			sk->sk_data_ready(sk);
   812	}
   813	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 mptcp-next 2/4] mptcp: more accurate receive buffer updates
  2022-07-29 23:01   ` Mat Martineau
@ 2022-07-30  6:32     ` Paolo Abeni
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2022-07-30  6:32 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Fri, 2022-07-29 at 16:01 -0700, Mat Martineau wrote:
> On Fri, 29 Jul 2022, Paolo Abeni wrote:
> 
> > Currently mptcp_cleanup_rbuf() makes a significant effort to avoid
> > acquiring the subflow socket lock, estimating if the tcp level
> > cleanup could actually send an ack.
> > 
> > Such an estimate is a bit rough when accounting for receive window
> > changes, as it considers the msk available buffer space instead
> > of the announced mptcp-level window.
> > 
> > Let's consider the announced window instead, closely mirroring
> > the plain TCP implementation. We need lockless access to a bunch
> > of additional TCP fields.
> 
> Hi Paolo -
> 
> The code changes look good. For this last sentence above ("...lockless 
> access to a bunch of ... TCP fields"), I only see access to msk->rcv_wnd_sent 
> and msk->ack_seq, and nothing obvious in the later commits. Could you 
> clarify?

You are right, that comment is inaccurate. It's left over from a previous
internal version. I'll drop it in the next iteration.
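
FWIW, the accesses in question boil down to those two annotated reads,
something like (just a sketch of the pattern, not the exact hunk):

	u64 ack_seq = READ_ONCE(msk->ack_seq);
	u64 rcv_wnd_sent = atomic64_read(&msk->rcv_wnd_sent);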

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 mptcp-next 3/4] mptcp: move msk input path under full msk socket lock
  2022-07-29 23:02   ` Mat Martineau
@ 2022-07-30  6:32     ` Paolo Abeni
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2022-07-30  6:32 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp

On Fri, 2022-07-29 at 16:02 -0700, Mat Martineau wrote:
> On Fri, 29 Jul 2022, Paolo Abeni wrote:
> 
> > After commit c2e6048fa1cf ("mptcp: fix race in release_cb") it's
> > pretty straightforward to move the whole MPTCP rx path under the
> > socket lock, leveraging the release_cb.
> > 
> > We can drop a bunch of spin_lock pairs in the receive functions, use
> > a single receive queue and invoke __mptcp_move_skbs only when subflows
> > ask for it.
> > 
> > This will allow more cleanup in the next patch.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > net/mptcp/protocol.c | 78 +++++++++++++++++++-------------------------
> > net/mptcp/protocol.h |  2 +-
> > 2 files changed, 35 insertions(+), 45 deletions(-)
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 17e2dbe43639..b7982b578c86 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -795,7 +795,7 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
> > 	return moved > 0;
> > }
> > 
> > -void mptcp_data_ready(struct sock *sk, struct sock *ssk)
> > +void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
> 
> While a normal build works fine, the CI is complaining about this line in 
> the "make W=1 C=1" build. The function is only used in this file, so 
> adding 'static' would make 'sparse' happy.

Thanks, I'll fix it in the next iteration.
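
That is, just the one-liner sparse is asking for:

-void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
+static void __mptcp_data_ready(struct sock *sk, struct sock *ssk)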

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 mptcp-next 3/4] mptcp: move msk input path under full msk socket lock
  2022-07-29 18:14 ` [PATCH v2 mptcp-next 3/4] mptcp: move msk input path under full msk socket lock Paolo Abeni
  2022-07-29 23:02   ` Mat Martineau
  2022-07-30  5:36   ` kernel test robot
@ 2022-07-30  6:43   ` kernel test robot
  2 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2022-07-30  6:43 UTC (permalink / raw)
  To: Paolo Abeni, mptcp; +Cc: llvm, kbuild-all

Hi Paolo,

I love your patch! Perhaps something to improve:

[auto build test WARNING on mptcp/export]
[cannot apply to linus/master v5.19-rc8 next-20220728]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/mptcp-just-another-receive-path-refactor/20220730-021638
base:   https://github.com/multipath-tcp/mptcp_net-next.git export
config: x86_64-randconfig-a001 (https://download.01.org/0day-ci/archive/20220730/202207301455.Biyfbfoh-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 52cd00cabf479aa7eb6dbb063b7ba41ea57bce9e)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/f2cae8b9b78304e87a6c4ee4dbaa50cc02b078d5
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Paolo-Abeni/mptcp-just-another-receive-path-refactor/20220730-021638
        git checkout f2cae8b9b78304e87a6c4ee4dbaa50cc02b078d5
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash net/mptcp/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> net/mptcp/protocol.c:798:6: warning: no previous prototype for function '__mptcp_data_ready' [-Wmissing-prototypes]
   void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
        ^
   net/mptcp/protocol.c:798:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
   ^
   static 
   1 warning generated.


vim +/__mptcp_data_ready +798 net/mptcp/protocol.c

   797	
 > 798	void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
   799	{
   800		struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
   801	
   802		/* The peer can send data while we are shutting down this
   803		 * subflow at msk destruction time, but we must avoid enqueuing
   804		 * more data to the msk receive queue
   805		 */
   806		if (unlikely(subflow->disposable))
   807			return;
   808	
   809		/* Wake-up the reader only for in-sequence data */
   810		if (move_skbs_to_msk(mptcp_sk(sk), ssk))
   811			sk->sk_data_ready(sk);
   812	}
   813	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-07-30  6:43 UTC | newest]

Thread overview: 14+ messages
2022-07-29 18:14 [PATCH v2 mptcp-next 0/4] mptcp: just another receive path refactor Paolo Abeni
2022-07-29 18:14 ` [PATCH v2 mptcp-next 1/4] mptcp: move RCVPRUNE event later Paolo Abeni
2022-07-29 18:14 ` [PATCH v2 mptcp-next 2/4] mptcp: more accurate receive buffer updates Paolo Abeni
2022-07-29 23:01   ` Mat Martineau
2022-07-30  6:32     ` Paolo Abeni
2022-07-29 18:14 ` [PATCH v2 mptcp-next 3/4] mptcp: move msk input path under full msk socket lock Paolo Abeni
2022-07-29 23:02   ` Mat Martineau
2022-07-30  6:32     ` Paolo Abeni
2022-07-30  5:36   ` kernel test robot
2022-07-30  6:43   ` kernel test robot
2022-07-29 18:14 ` [PATCH v2 mptcp-next 4/4] mptcp: use common helper for rmem memory accounting Paolo Abeni
2022-07-29 18:32   ` mptcp: use common helper for rmem memory accounting: Build Failure MPTCP CI
2022-07-29 21:20   ` mptcp: use common helper for rmem memory accounting: Tests Results MPTCP CI
2022-07-29 23:16   ` [PATCH v2 mptcp-next 4/4] mptcp: use common helper for rmem memory accounting Mat Martineau
