netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/7] mptcp: update mptcp ack sequence outside of recv path
@ 2020-02-26  9:14 Florian Westphal
  2020-02-26  9:14 ` [PATCH net-next 1/7] mptcp: add and use mptcp_data_ready helper Florian Westphal
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Florian Westphal @ 2020-02-26  9:14 UTC (permalink / raw)
  To: netdev

This series moves the mptcp-level ack sequence update out of the recvmsg path.
The current approach has two problems:

1. There is a delay between the arrival of new data and the time we can ack
   this data.
2. If userspace doesn't call recv for some time, the mptcp ack_seq is not
   updated at all, even though the data is queued in the subflow socket
   receive queue.

Instead, move skbs from the subflow socket receive queue to the mptcp-level
receive queue, updating the mptcp-level ack sequence as part of the move, and
have recv take skbs from the mptcp-level receive queue.

The first place where we attempt to update the mptcp-level ack sequence
is the subflows' data_ready callback, even before we make userspace
aware of the new data.

Because of a possible deadlock (we would need to take the mptcp socket lock
while already holding the subflow socket lock), we may still need to
defer the mptcp-level ack update.  In that case, the work is done either
from the work queue or from the recv path, whichever runs first.

In order to avoid pointless scheduling of the work queue, the work is
queued from the mptcp socket's lock release callback.  This allows us to
detect whether the socket owner has already drained the subflow socket
receive queue.

Please see individual patches for more information.
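
As a reading aid only (this is not part of any patch), the subflow-to-msk
wakeup path has roughly the following shape once the whole series is applied,
combining the helpers introduced in patches 1, 4, 6 and 7:

void mptcp_data_ready(struct sock *sk, struct sock *ssk)
{
	struct mptcp_sock *msk = mptcp_sk(sk);

	set_bit(MPTCP_DATA_READY, &msk->flags);

	/* fast path: move skbs from the subflow receive queue right away,
	 * updating msk->ack_seq, if the msk is below its receive buffer
	 * limit and is not currently owned (spinlock trylock succeeds)
	 */
	if (atomic_read(&sk->sk_rmem_alloc) < READ_ONCE(sk->sk_rcvbuf) &&
	    move_skbs_to_msk(msk, ssk))
		goto wake;

	/* don't schedule deferred work if the msk is (still) over limit */
	if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf))
		goto wake;

	/* msk is owned by a user task: flag the deferred work, the lock
	 * release callback will schedule the work queue unless recvmsg()
	 * drained the subflow receive queue in the meantime
	 */
	if (!test_and_set_bit(TCP_DELACK_TIMER_DEFERRED, &sk->sk_tsq_flags)) {
		sock_hold(sk);

		/* re-check: release_cb() may already have run after the
		 * test_and_set_bit() above
		 */
		move_skbs_to_msk(msk, ssk);
	}
wake:
	sk->sk_data_ready(sk);
}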

Florian Westphal (5):
      mptcp: add and use mptcp_data_ready helper
      mptcp: update mptcp ack sequence from work queue
      mptcp: add rmem queue accounting
      mptcp: remove mptcp_read_actor
      mptcp: avoid work queue scheduling if possible

Paolo Abeni (2):
      mptcp: add work queue skeleton
      mptcp: defer work schedule until mptcp lock is released

 net/mptcp/protocol.c | 364 ++++++++++++++++++++++++++++++++++++++-------------
 net/mptcp/protocol.h |   9 +-
 net/mptcp/subflow.c  |  32 ++---
 3 files changed, 289 insertions(+), 116 deletions(-)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH net-next 1/7] mptcp: add and use mptcp_data_ready helper
  2020-02-26  9:14 [PATCH net-next 0/7] mptcp: update mptcp ack sequence outside of recv path Florian Westphal
@ 2020-02-26  9:14 ` Florian Westphal
  2020-02-27  0:09   ` Mat Martineau
  2020-02-26  9:14 ` [PATCH net-next 2/7] mptcp: add work queue skeleton Florian Westphal
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2020-02-26  9:14 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

allows us to schedule the work queue to drain the ssk receive queue in
a followup patch.

This is needed to avoid sending all-too-pessimistic mptcp-level
acknowledgements.  At this time, the ack_seq is what was last read by
userspace instead of the highest in-sequence number queued for reading.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c |  8 ++++++++
 net/mptcp/protocol.h |  1 +
 net/mptcp/subflow.c  | 14 ++++----------
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e9aa6807b5be..1d55563e9aca 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -111,6 +111,14 @@ static struct sock *mptcp_subflow_get(const struct mptcp_sock *msk)
 	return NULL;
 }
 
+void mptcp_data_ready(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+
+	set_bit(MPTCP_DATA_READY, &msk->flags);
+	sk->sk_data_ready(sk);
+}
+
 static bool mptcp_ext_cache_refill(struct mptcp_sock *msk)
 {
 	if (!msk->cached_ext)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 9f8663b30456..67895a7c1e5b 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -201,6 +201,7 @@ void mptcp_get_options(const struct sk_buff *skb,
 		       struct tcp_options_received *opt_rx);
 
 void mptcp_finish_connect(struct sock *sk);
+void mptcp_data_ready(struct sock *sk);
 
 int mptcp_token_new_request(struct request_sock *req);
 void mptcp_token_destroy_request(u32 token);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 65122edf60aa..3dad662840aa 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -554,11 +554,8 @@ static void subflow_data_ready(struct sock *sk)
 		return;
 	}
 
-	if (mptcp_subflow_data_available(sk)) {
-		set_bit(MPTCP_DATA_READY, &mptcp_sk(parent)->flags);
-
-		parent->sk_data_ready(parent);
-	}
+	if (mptcp_subflow_data_available(sk))
+		mptcp_data_ready(parent);
 }
 
 static void subflow_write_space(struct sock *sk)
@@ -690,11 +687,8 @@ static void subflow_state_change(struct sock *sk)
 	 * a fin packet carrying a DSS can be unnoticed if we don't trigger
 	 * the data available machinery here.
 	 */
-	if (parent && subflow->mp_capable && mptcp_subflow_data_available(sk)) {
-		set_bit(MPTCP_DATA_READY, &mptcp_sk(parent)->flags);
-
-		parent->sk_data_ready(parent);
-	}
+	if (parent && subflow->mp_capable && mptcp_subflow_data_available(sk))
+		mptcp_data_ready(parent);
 
 	if (parent && !(parent->sk_shutdown & RCV_SHUTDOWN) &&
 	    !subflow->rx_eof && subflow_is_done(sk)) {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 2/7] mptcp: add work queue skeleton
  2020-02-26  9:14 [PATCH net-next 0/7] mptcp: update mptcp ack sequence outside of recv path Florian Westphal
  2020-02-26  9:14 ` [PATCH net-next 1/7] mptcp: add and use mptcp_data_ready helper Florian Westphal
@ 2020-02-26  9:14 ` Florian Westphal
  2020-02-27  0:10   ` Mat Martineau
  2020-02-26  9:14 ` [PATCH net-next 3/7] mptcp: update mptcp ack sequence from work queue Florian Westphal
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2020-02-26  9:14 UTC (permalink / raw)
  To: netdev; +Cc: Paolo Abeni, Florian Westphal

From: Paolo Abeni <pabeni@redhat.com>

This will be extended with functionality in followup patches.
The initial user is moving skbs from the subflows' receive queue to
the mptcp-level receive queue.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 22 ++++++++++++++++++++++
 net/mptcp/protocol.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1d55563e9aca..cbf184a71ed7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -551,12 +551,24 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 	}
 }
 
+static void mptcp_worker(struct work_struct *work)
+{
+	struct mptcp_sock *msk = container_of(work, struct mptcp_sock, work);
+	struct sock *sk = &msk->sk.icsk_inet.sk;
+
+	lock_sock(sk);
+
+	release_sock(sk);
+	sock_put(sk);
+}
+
 static int __mptcp_init_sock(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
 	INIT_LIST_HEAD(&msk->conn_list);
 	__set_bit(MPTCP_SEND_SPACE, &msk->flags);
+	INIT_WORK(&msk->work, mptcp_worker);
 
 	msk->first = NULL;
 
@@ -571,6 +583,14 @@ static int mptcp_init_sock(struct sock *sk)
 	return __mptcp_init_sock(sk);
 }
 
+static void mptcp_cancel_work(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+
+	if (cancel_work_sync(&msk->work))
+		sock_put(sk);
+}
+
 static void mptcp_subflow_shutdown(struct sock *ssk, int how)
 {
 	lock_sock(ssk);
@@ -616,6 +636,8 @@ static void mptcp_close(struct sock *sk, long timeout)
 		__mptcp_close_ssk(sk, ssk, subflow, timeout);
 	}
 
+	mptcp_cancel_work(sk);
+
 	sk_common_release(sk);
 }
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 67895a7c1e5b..6e6e162d25f1 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -70,6 +70,7 @@ struct mptcp_sock {
 	u32		token;
 	unsigned long	flags;
 	bool		can_ack;
+	struct work_struct work;
 	struct list_head conn_list;
 	struct skb_ext	*cached_ext;	/* for the next sendmsg */
 	struct socket	*subflow; /* outgoing connect/listener/!mp_capable */
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 3/7] mptcp: update mptcp ack sequence from work queue
  2020-02-26  9:14 [PATCH net-next 0/7] mptcp: update mptcp ack sequence outside of recv path Florian Westphal
  2020-02-26  9:14 ` [PATCH net-next 1/7] mptcp: add and use mptcp_data_ready helper Florian Westphal
  2020-02-26  9:14 ` [PATCH net-next 2/7] mptcp: add work queue skeleton Florian Westphal
@ 2020-02-26  9:14 ` Florian Westphal
  2020-02-27  0:10   ` Mat Martineau
  2020-02-26  9:14 ` [PATCH net-next 4/7] mptcp: add rmem queue accounting Florian Westphal
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2020-02-26  9:14 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

If userspace is not reading data, all the mptcp-level acks contain the
ack_seq from the last time userspace read data rather than the most
recent in-sequence value.

This causes pointless retransmissions for data that is already queued.

The reason for this is that all the mptcp protocol level processing
happens at mptcp_recv time.

This adds a work queue to move skbs from the subflow sockets' receive queue
onto the mptcp socket receive queue (which was not used so far).

This allows us to announce the correct mptcp ack sequence in a timely
fashion, even when the application does not call recv() on the mptcp socket
for some time.

We still wake userspace tasks waiting for POLLIN immediately:
If the mptcp level receive queue is empty (because the work queue is
still pending) it can be filled from in-sequence subflow sockets at
recv time without a need to wait for the worker.

The skb_orphan() when moving skbs from the subflow to the mptcp level is
needed because the destructor (sock_rfree) relies on the skb->sk (ssk!)
lock being held.

A followup patch will add needed rmem accounting for the moved skbs.

Another problem: in case the application behaves as expected and calls
recv() as soon as the mptcp socket becomes readable, the work queue will
only waste cpu cycles.  This will also be addressed in followup patches.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 234 ++++++++++++++++++++++++++++++-------------
 1 file changed, 165 insertions(+), 69 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index cbf184a71ed7..b4a8517d8eac 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -31,6 +31,12 @@ struct mptcp6_sock {
 };
 #endif
 
+struct mptcp_skb_cb {
+	u32 offset;
+};
+
+#define MPTCP_SKB_CB(__skb)	((struct mptcp_skb_cb *)&((__skb)->cb[0]))
+
 /* If msk has an initial subflow socket, and the MP_CAPABLE handshake has not
  * completed yet or has failed, return the subflow socket.
  * Otherwise return NULL.
@@ -111,11 +117,88 @@ static struct sock *mptcp_subflow_get(const struct mptcp_sock *msk)
 	return NULL;
 }
 
+static void __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
+			     struct sk_buff *skb,
+			     unsigned int offset, size_t copy_len)
+{
+	struct sock *sk = (struct sock *)msk;
+
+	__skb_unlink(skb, &ssk->sk_receive_queue);
+	skb_orphan(skb);
+	__skb_queue_tail(&sk->sk_receive_queue, skb);
+
+	msk->ack_seq += copy_len;
+	MPTCP_SKB_CB(skb)->offset = offset;
+}
+
+static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
+					   struct sock *ssk,
+					   unsigned int *bytes)
+{
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	unsigned int moved = 0;
+	bool more_data_avail;
+	struct tcp_sock *tp;
+	bool done = false;
+
+	tp = tcp_sk(ssk);
+	do {
+		u32 map_remaining, offset;
+		u32 seq = tp->copied_seq;
+		struct sk_buff *skb;
+		bool fin;
+
+		/* try to move as much data as available */
+		map_remaining = subflow->map_data_len -
+				mptcp_subflow_get_map_offset(subflow);
+
+		skb = skb_peek(&ssk->sk_receive_queue);
+		if (!skb)
+			break;
+
+		offset = seq - TCP_SKB_CB(skb)->seq;
+		fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN;
+		if (fin) {
+			done = true;
+			seq++;
+		}
+
+		if (offset < skb->len) {
+			size_t len = skb->len - offset;
+
+			if (tp->urg_data)
+				done = true;
+
+			__mptcp_move_skb(msk, ssk, skb, offset, len);
+			seq += len;
+			moved += len;
+
+			if (WARN_ON_ONCE(map_remaining < len))
+				break;
+		} else {
+			WARN_ON_ONCE(!fin);
+			sk_eat_skb(ssk, skb);
+			done = true;
+		}
+
+		WRITE_ONCE(tp->copied_seq, seq);
+		more_data_avail = mptcp_subflow_data_available(ssk);
+	} while (more_data_avail);
+
+	*bytes = moved;
+
+	return done;
+}
+
 void mptcp_data_ready(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
 	set_bit(MPTCP_DATA_READY, &msk->flags);
+
+	if (schedule_work(&msk->work))
+		sock_hold((struct sock *)msk);
+
 	sk->sk_data_ready(sk);
 }
 
@@ -373,19 +456,68 @@ static void mptcp_wait_data(struct sock *sk, long *timeo)
 	remove_wait_queue(sk_sleep(sk), &wait);
 }
 
+static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
+				struct msghdr *msg,
+				size_t len)
+{
+	struct sock *sk = (struct sock *)msk;
+	struct sk_buff *skb;
+	int copied = 0;
+
+	while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
+		u32 offset = MPTCP_SKB_CB(skb)->offset;
+		u32 data_len = skb->len - offset;
+		u32 count = min_t(size_t, len - copied, data_len);
+		int err;
+
+		err = skb_copy_datagram_msg(skb, offset, msg, count);
+		if (unlikely(err < 0)) {
+			if (!copied)
+				return err;
+			break;
+		}
+
+		copied += count;
+
+		if (count < data_len) {
+			MPTCP_SKB_CB(skb)->offset += count;
+			break;
+		}
+
+		__skb_unlink(skb, &sk->sk_receive_queue);
+		__kfree_skb(skb);
+
+		if (copied >= len)
+			break;
+	}
+
+	return copied;
+}
+
+static bool __mptcp_move_skbs(struct mptcp_sock *msk)
+{
+	unsigned int moved = 0;
+	bool done;
+
+	do {
+		struct sock *ssk = mptcp_subflow_recv_lookup(msk);
+
+		if (!ssk)
+			break;
+
+		lock_sock(ssk);
+		done = __mptcp_move_skbs_from_subflow(msk, ssk, &moved);
+		release_sock(ssk);
+	} while (!done);
+
+	return moved > 0;
+}
+
 static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 			 int nonblock, int flags, int *addr_len)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	struct mptcp_subflow_context *subflow;
-	bool more_data_avail = false;
-	struct mptcp_read_arg arg;
-	read_descriptor_t desc;
-	bool wait_data = false;
 	struct socket *ssock;
-	struct tcp_sock *tp;
-	bool done = false;
-	struct sock *ssk;
 	int copied = 0;
 	int target;
 	long timeo;
@@ -403,65 +535,26 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		return copied;
 	}
 
-	arg.msg = msg;
-	desc.arg.data = &arg;
-	desc.error = 0;
-
 	timeo = sock_rcvtimeo(sk, nonblock);
 
 	len = min_t(size_t, len, INT_MAX);
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
 
-	while (!done) {
-		u32 map_remaining;
+	while (len > (size_t)copied) {
 		int bytes_read;
 
-		ssk = mptcp_subflow_recv_lookup(msk);
-		pr_debug("msk=%p ssk=%p", msk, ssk);
-		if (!ssk)
-			goto wait_for_data;
+		bytes_read = __mptcp_recvmsg_mskq(msk, msg, len - copied);
+		if (unlikely(bytes_read < 0)) {
+			if (!copied)
+				copied = bytes_read;
+			goto out_err;
+		}
 
-		subflow = mptcp_subflow_ctx(ssk);
-		tp = tcp_sk(ssk);
+		copied += bytes_read;
 
-		lock_sock(ssk);
-		do {
-			/* try to read as much data as available */
-			map_remaining = subflow->map_data_len -
-					mptcp_subflow_get_map_offset(subflow);
-			desc.count = min_t(size_t, len - copied, map_remaining);
-			pr_debug("reading %zu bytes, copied %d", desc.count,
-				 copied);
-			bytes_read = tcp_read_sock(ssk, &desc,
-						   mptcp_read_actor);
-			if (bytes_read < 0) {
-				if (!copied)
-					copied = bytes_read;
-				done = true;
-				goto next;
-			}
-
-			pr_debug("msk ack_seq=%llx -> %llx", msk->ack_seq,
-				 msk->ack_seq + bytes_read);
-			msk->ack_seq += bytes_read;
-			copied += bytes_read;
-			if (copied >= len) {
-				done = true;
-				goto next;
-			}
-			if (tp->urg_data && tp->urg_seq == tp->copied_seq) {
-				pr_err("Urgent data present, cannot proceed");
-				done = true;
-				goto next;
-			}
-next:
-			more_data_avail = mptcp_subflow_data_available(ssk);
-		} while (more_data_avail && !done);
-		release_sock(ssk);
-		continue;
-
-wait_for_data:
-		more_data_avail = false;
+		if (skb_queue_empty(&sk->sk_receive_queue) &&
+		    __mptcp_move_skbs(msk))
+			continue;
 
 		/* only the master socket status is relevant here. The exit
 		 * conditions mirror closely tcp_recvmsg()
@@ -502,26 +595,25 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		}
 
 		pr_debug("block timeout %ld", timeo);
-		wait_data = true;
 		mptcp_wait_data(sk, &timeo);
 		if (unlikely(__mptcp_tcp_fallback(msk)))
 			goto fallback;
 	}
 
-	if (more_data_avail) {
-		if (!test_bit(MPTCP_DATA_READY, &msk->flags))
-			set_bit(MPTCP_DATA_READY, &msk->flags);
-	} else if (!wait_data) {
+	if (skb_queue_empty(&sk->sk_receive_queue)) {
+		/* entire backlog drained, clear DATA_READY. */
 		clear_bit(MPTCP_DATA_READY, &msk->flags);
 
-		/* .. race-breaker: ssk might get new data after last
-		 * data_available() returns false.
+		/* .. race-breaker: ssk might have gotten new data
+		 * after last __mptcp_move_skbs() returned false.
 		 */
-		ssk = mptcp_subflow_recv_lookup(msk);
-		if (unlikely(ssk))
+		if (unlikely(__mptcp_move_skbs(msk)))
 			set_bit(MPTCP_DATA_READY, &msk->flags);
+	} else if (unlikely(!test_bit(MPTCP_DATA_READY, &msk->flags))) {
+		/* data to read but mptcp_wait_data() cleared DATA_READY */
+		set_bit(MPTCP_DATA_READY, &msk->flags);
 	}
-
+out_err:
 	release_sock(sk);
 	return copied;
 }
@@ -557,7 +649,7 @@ static void mptcp_worker(struct work_struct *work)
 	struct sock *sk = &msk->sk.icsk_inet.sk;
 
 	lock_sock(sk);
-
+	__mptcp_move_skbs(msk);
 	release_sock(sk);
 	sock_put(sk);
 }
@@ -638,6 +730,8 @@ static void mptcp_close(struct sock *sk, long timeout)
 
 	mptcp_cancel_work(sk);
 
+	__skb_queue_purge(&sk->sk_receive_queue);
+
 	sk_common_release(sk);
 }
 
@@ -1204,6 +1298,8 @@ void mptcp_proto_init(void)
 		panic("Failed to register MPTCP proto.\n");
 
 	inet_register_protosw(&mptcp_protosw);
+
+	BUILD_BUG_ON(sizeof(struct mptcp_skb_cb) > sizeof_field(struct sk_buff, cb));
 }
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 4/7] mptcp: add rmem queue accounting
  2020-02-26  9:14 [PATCH net-next 0/7] mptcp: update mptcp ack sequence outside of recv path Florian Westphal
                   ` (2 preceding siblings ...)
  2020-02-26  9:14 ` [PATCH net-next 3/7] mptcp: update mptcp ack sequence from work queue Florian Westphal
@ 2020-02-26  9:14 ` Florian Westphal
  2020-02-27  0:11   ` Mat Martineau
  2020-02-26  9:14 ` [PATCH net-next 5/7] mptcp: remove mptcp_read_actor Florian Westphal
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2020-02-26  9:14 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

If userspace never drains the receive buffers we must stop draining
the subflow socket(s) at some point.

This adds the needed rmem accounting for this.
If the threshold is reached, we stop draining the subflows.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index b4a8517d8eac..381d5647a95b 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -124,7 +124,7 @@ static void __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
 	struct sock *sk = (struct sock *)msk;
 
 	__skb_unlink(skb, &ssk->sk_receive_queue);
-	skb_orphan(skb);
+	skb_set_owner_r(skb, sk);
 	__skb_queue_tail(&sk->sk_receive_queue, skb);
 
 	msk->ack_seq += copy_len;
@@ -136,10 +136,16 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 					   unsigned int *bytes)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	struct sock *sk = (struct sock *)msk;
 	unsigned int moved = 0;
 	bool more_data_avail;
 	struct tcp_sock *tp;
 	bool done = false;
+	int rcvbuf;
+
+	rcvbuf = max(ssk->sk_rcvbuf, sk->sk_rcvbuf);
+	if (rcvbuf > sk->sk_rcvbuf)
+		sk->sk_rcvbuf = rcvbuf;
 
 	tp = tcp_sk(ssk);
 	do {
@@ -183,6 +189,11 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 
 		WRITE_ONCE(tp->copied_seq, seq);
 		more_data_avail = mptcp_subflow_data_available(ssk);
+
+		if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf)) {
+			done = true;
+			break;
+		}
 	} while (more_data_avail);
 
 	*bytes = moved;
@@ -196,9 +207,14 @@ void mptcp_data_ready(struct sock *sk)
 
 	set_bit(MPTCP_DATA_READY, &msk->flags);
 
+	/* don't schedule if mptcp sk is (still) over limit */
+	if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf))
+		goto wake;
+
 	if (schedule_work(&msk->work))
 		sock_hold((struct sock *)msk);
 
+wake:
 	sk->sk_data_ready(sk);
 }
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 5/7] mptcp: remove mptcp_read_actor
  2020-02-26  9:14 [PATCH net-next 0/7] mptcp: update mptcp ack sequence outside of recv path Florian Westphal
                   ` (3 preceding siblings ...)
  2020-02-26  9:14 ` [PATCH net-next 4/7] mptcp: add rmem queue accounting Florian Westphal
@ 2020-02-26  9:14 ` Florian Westphal
  2020-02-27  0:11   ` Mat Martineau
  2020-02-26  9:14 ` [PATCH net-next 6/7] mptcp: avoid work queue scheduling if possible Florian Westphal
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2020-02-26  9:14 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

It is only used to discard stale data from the subflow, so move
it to where it is needed.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 27 ---------------------------
 net/mptcp/protocol.h |  7 -------
 net/mptcp/subflow.c  | 18 +++++++++++++-----
 3 files changed, 13 insertions(+), 39 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 381d5647a95b..b781498e69b4 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -430,33 +430,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return ret;
 }
 
-int mptcp_read_actor(read_descriptor_t *desc, struct sk_buff *skb,
-		     unsigned int offset, size_t len)
-{
-	struct mptcp_read_arg *arg = desc->arg.data;
-	size_t copy_len;
-
-	copy_len = min(desc->count, len);
-
-	if (likely(arg->msg)) {
-		int err;
-
-		err = skb_copy_datagram_msg(skb, offset, arg->msg, copy_len);
-		if (err) {
-			pr_debug("error path");
-			desc->error = err;
-			return err;
-		}
-	} else {
-		pr_debug("Flushing skb payload");
-	}
-
-	desc->count -= copy_len;
-
-	pr_debug("consumed %zu bytes, %zu left", copy_len, desc->count);
-	return copy_len;
-}
-
 static void mptcp_wait_data(struct sock *sk, long *timeo)
 {
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 6e6e162d25f1..d06170c5f191 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -191,13 +191,6 @@ void mptcp_proto_init(void);
 int mptcp_proto_v6_init(void);
 #endif
 
-struct mptcp_read_arg {
-	struct msghdr *msg;
-};
-
-int mptcp_read_actor(read_descriptor_t *desc, struct sk_buff *skb,
-		     unsigned int offset, size_t len);
-
 void mptcp_get_options(const struct sk_buff *skb,
 		       struct tcp_options_received *opt_rx);
 
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 3dad662840aa..37a4767db441 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -408,6 +408,18 @@ static enum mapping_status get_mapping_status(struct sock *ssk)
 	return MAPPING_OK;
 }
 
+static int subflow_read_actor(read_descriptor_t *desc,
+			      struct sk_buff *skb,
+			      unsigned int offset, size_t len)
+{
+	size_t copy_len = min(desc->count, len);
+
+	desc->count -= copy_len;
+
+	pr_debug("flushed %zu bytes, %zu left", copy_len, desc->count);
+	return copy_len;
+}
+
 static bool subflow_check_data_avail(struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
@@ -482,16 +494,12 @@ static bool subflow_check_data_avail(struct sock *ssk)
 		pr_debug("discarding %zu bytes, current map len=%d", delta,
 			 map_remaining);
 		if (delta) {
-			struct mptcp_read_arg arg = {
-				.msg = NULL,
-			};
 			read_descriptor_t desc = {
 				.count = delta,
-				.arg.data = &arg,
 			};
 			int ret;
 
-			ret = tcp_read_sock(ssk, &desc, mptcp_read_actor);
+			ret = tcp_read_sock(ssk, &desc, subflow_read_actor);
 			if (ret < 0) {
 				ssk->sk_err = -ret;
 				goto fatal;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 6/7] mptcp: avoid work queue scheduling if possible
  2020-02-26  9:14 [PATCH net-next 0/7] mptcp: update mptcp ack sequence outside of recv path Florian Westphal
                   ` (4 preceding siblings ...)
  2020-02-26  9:14 ` [PATCH net-next 5/7] mptcp: remove mptcp_read_actor Florian Westphal
@ 2020-02-26  9:14 ` Florian Westphal
  2020-02-27  0:11   ` Mat Martineau
  2020-02-26  9:14 ` [PATCH net-next 7/7] mptcp: defer work schedule until mptcp lock is released Florian Westphal
  2020-02-27  4:47 ` [PATCH net-next 0/7] mptcp: update mptcp ack sequence outside of recv path David Miller
  7 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2020-02-26  9:14 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

We can't lock_sock() the mptcp socket from the subflow data_ready callback,
as it would result in an ABBA deadlock with the subflow socket lock.

We can however grab the spinlock: if that succeeds and the mptcp socket
is not owned at the moment, we can process the new skbs right away
without deferring this to the work queue.

This avoids the schedule_work() call and hence the small delay until the
work item is processed.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 29 ++++++++++++++++++++++++++++-
 net/mptcp/protocol.h |  2 +-
 net/mptcp/subflow.c  |  4 ++--
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index b781498e69b4..70f20c8eddbd 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -201,12 +201,39 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 	return done;
 }
 
-void mptcp_data_ready(struct sock *sk)
+/* In most cases we will be able to lock the mptcp socket.  If it's already
+ * owned, we need to defer to the work queue to avoid ABBA deadlock.
+ */
+static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
+{
+	struct sock *sk = (struct sock *)msk;
+	unsigned int moved = 0;
+
+	if (READ_ONCE(sk->sk_lock.owned))
+		return false;
+
+	if (unlikely(!spin_trylock_bh(&sk->sk_lock.slock)))
+		return false;
+
+	/* must re-check after taking the lock */
+	if (!READ_ONCE(sk->sk_lock.owned))
+		__mptcp_move_skbs_from_subflow(msk, ssk, &moved);
+
+	spin_unlock_bh(&sk->sk_lock.slock);
+
+	return moved > 0;
+}
+
+void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
 	set_bit(MPTCP_DATA_READY, &msk->flags);
 
+	if (atomic_read(&sk->sk_rmem_alloc) < READ_ONCE(sk->sk_rcvbuf) &&
+	    move_skbs_to_msk(msk, ssk))
+		goto wake;
+
 	/* don't schedule if mptcp sk is (still) over limit */
 	if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf))
 		goto wake;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index d06170c5f191..6c0b2c8ab674 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -195,7 +195,7 @@ void mptcp_get_options(const struct sk_buff *skb,
 		       struct tcp_options_received *opt_rx);
 
 void mptcp_finish_connect(struct sock *sk);
-void mptcp_data_ready(struct sock *sk);
+void mptcp_data_ready(struct sock *sk, struct sock *ssk);
 
 int mptcp_token_new_request(struct request_sock *req);
 void mptcp_token_destroy_request(u32 token);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 37a4767db441..0de2a44bdaa0 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -563,7 +563,7 @@ static void subflow_data_ready(struct sock *sk)
 	}
 
 	if (mptcp_subflow_data_available(sk))
-		mptcp_data_ready(parent);
+		mptcp_data_ready(parent, sk);
 }
 
 static void subflow_write_space(struct sock *sk)
@@ -696,7 +696,7 @@ static void subflow_state_change(struct sock *sk)
 	 * the data available machinery here.
 	 */
 	if (parent && subflow->mp_capable && mptcp_subflow_data_available(sk))
-		mptcp_data_ready(parent);
+		mptcp_data_ready(parent, sk);
 
 	if (parent && !(parent->sk_shutdown & RCV_SHUTDOWN) &&
 	    !subflow->rx_eof && subflow_is_done(sk)) {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 7/7] mptcp: defer work schedule until mptcp lock is released
  2020-02-26  9:14 [PATCH net-next 0/7] mptcp: update mptcp ack sequence outside of recv path Florian Westphal
                   ` (5 preceding siblings ...)
  2020-02-26  9:14 ` [PATCH net-next 6/7] mptcp: avoid work queue scheduling if possible Florian Westphal
@ 2020-02-26  9:14 ` Florian Westphal
  2020-02-27  0:12   ` Mat Martineau
  2020-02-27  4:47 ` [PATCH net-next 0/7] mptcp: update mptcp ack sequence outside of recv path David Miller
  7 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2020-02-26  9:14 UTC (permalink / raw)
  To: netdev; +Cc: Paolo Abeni, Florian Westphal

From: Paolo Abeni <pabeni@redhat.com>

Don't schedule the work queue right away, instead defer this
to the lock release callback.

This has the advantage that it gives the recv path a chance to
complete -- this might have moved all pending packets from the
subflow to the mptcp receive queue, which allows us to avoid the
schedule_work().

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 70f20c8eddbd..044295707bbf 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -238,9 +238,16 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 	if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf))
 		goto wake;
 
-	if (schedule_work(&msk->work))
-		sock_hold((struct sock *)msk);
+	/* mptcp socket is owned, release_cb should retry */
+	if (!test_and_set_bit(TCP_DELACK_TIMER_DEFERRED,
+			      &sk->sk_tsq_flags)) {
+		sock_hold(sk);
 
+	/* need to try again, it's possible release_cb() has already
+		 * been called after the test_and_set_bit() above.
+		 */
+		move_skbs_to_msk(msk, ssk);
+	}
 wake:
 	sk->sk_data_ready(sk);
 }
@@ -941,6 +948,32 @@ static int mptcp_getsockopt(struct sock *sk, int level, int optname,
 	return -EOPNOTSUPP;
 }
 
+#define MPTCP_DEFERRED_ALL TCPF_DELACK_TIMER_DEFERRED
+
+/* this is very alike tcp_release_cb() but we must handle differently a
+ * different set of events
+ */
+static void mptcp_release_cb(struct sock *sk)
+{
+	unsigned long flags, nflags;
+
+	do {
+		flags = sk->sk_tsq_flags;
+		if (!(flags & MPTCP_DEFERRED_ALL))
+			return;
+		nflags = flags & ~MPTCP_DEFERRED_ALL;
+	} while (cmpxchg(&sk->sk_tsq_flags, flags, nflags) != flags);
+
+	if (flags & TCPF_DELACK_TIMER_DEFERRED) {
+		struct mptcp_sock *msk = mptcp_sk(sk);
+		struct sock *ssk;
+
+		ssk = mptcp_subflow_recv_lookup(msk);
+		if (!ssk || !schedule_work(&msk->work))
+			__sock_put(sk);
+	}
+}
+
 static int mptcp_get_port(struct sock *sk, unsigned short snum)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -1016,6 +1049,7 @@ static struct proto mptcp_prot = {
 	.destroy	= mptcp_destroy,
 	.sendmsg	= mptcp_sendmsg,
 	.recvmsg	= mptcp_recvmsg,
+	.release_cb	= mptcp_release_cb,
 	.hash		= inet_hash,
 	.unhash		= inet_unhash,
 	.get_port	= mptcp_get_port,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 1/7] mptcp: add and use mptcp_data_ready helper
  2020-02-26  9:14 ` [PATCH net-next 1/7] mptcp: add and use mptcp_data_ready helper Florian Westphal
@ 2020-02-27  0:09   ` Mat Martineau
  0 siblings, 0 replies; 16+ messages in thread
From: Mat Martineau @ 2020-02-27  0:09 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev


On Wed, 26 Feb 2020, Florian Westphal wrote:

> allows us to schedule the work queue to drain the ssk receive queue in
> a followup patch.
>
> This is needed to avoid sending all-too-pessimistic mptcp-level
> acknowledgements.  At this time, the ack_seq is what was last read by
> userspace instead of the highest in-sequence number queued for reading.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 2/7] mptcp: add work queue skeleton
  2020-02-26  9:14 ` [PATCH net-next 2/7] mptcp: add work queue skeleton Florian Westphal
@ 2020-02-27  0:10   ` Mat Martineau
  0 siblings, 0 replies; 16+ messages in thread
From: Mat Martineau @ 2020-02-27  0:10 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, Paolo Abeni

On Wed, 26 Feb 2020, Florian Westphal wrote:

> From: Paolo Abeni <pabeni@redhat.com>
>
> Will be extended with functionality in followup patches.
> Initial user is moving skbs from subflows receive queue to
> the mptcp-level receive queue.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 3/7] mptcp: update mptcp ack sequence from work queue
  2020-02-26  9:14 ` [PATCH net-next 3/7] mptcp: update mptcp ack sequence from work queue Florian Westphal
@ 2020-02-27  0:10   ` Mat Martineau
  0 siblings, 0 replies; 16+ messages in thread
From: Mat Martineau @ 2020-02-27  0:10 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev


On Wed, 26 Feb 2020, Florian Westphal wrote:

> If userspace is not reading data, all the mptcp-level acks contain the
> ack_seq from the last time userspace read data rather than the most
> recent in-sequence value.
>
> This causes pointless retransmissions for data that is already queued.
>
> The reason for this is that all the mptcp protocol level processing
> happens at mptcp_recv time.
>
> This adds work queue to move skbs from the subflow sockets receive
> queue on the mptcp socket receive queue (which was not used so far).
>
> This allows us to announce the correct mptcp ack sequence in a timely
> fashion, even when the application does not call recv() on the mptcp socket
> for some time.
>
> We still wake userspace tasks waiting for POLLIN immediately:
> If the mptcp level receive queue is empty (because the work queue is
> still pending) it can be filled from in-sequence subflow sockets at
> recv time without a need to wait for the worker.
>
> The skb_orphan when moving skbs from subflow to mptcp level is needed,
> because the destructor (sock_rfree) relies on skb->sk (ssk!) lock
> being taken.
>
> A followup patch will add needed rmem accounting for the moved skbs.
>
> Other problem: In case application behaves as expected, and calls
> recv() as soon as mptcp socket becomes readable, the work queue will
> only waste cpu cycles.  This will also be addressed in followup patches.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 4/7] mptcp: add rmem queue accounting
  2020-02-26  9:14 ` [PATCH net-next 4/7] mptcp: add rmem queue accounting Florian Westphal
@ 2020-02-27  0:11   ` Mat Martineau
  0 siblings, 0 replies; 16+ messages in thread
From: Mat Martineau @ 2020-02-27  0:11 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev

On Wed, 26 Feb 2020, Florian Westphal wrote:

> If userspace never drains the receive buffers we must stop draining
> the subflow socket(s) at some point.
>
> This adds the needed rmem accounting for this.
> If the threshold is reached, we stop draining the subflows.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 5/7] mptcp: remove mptcp_read_actor
  2020-02-26  9:14 ` [PATCH net-next 5/7] mptcp: remove mptcp_read_actor Florian Westphal
@ 2020-02-27  0:11   ` Mat Martineau
  0 siblings, 0 replies; 16+ messages in thread
From: Mat Martineau @ 2020-02-27  0:11 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev

On Wed, 26 Feb 2020, Florian Westphal wrote:

> Only used to discard stale data from the subflow, so move
> it where needed.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 6/7] mptcp: avoid work queue scheduling if possible
  2020-02-26  9:14 ` [PATCH net-next 6/7] mptcp: avoid work queue scheduling if possible Florian Westphal
@ 2020-02-27  0:11   ` Mat Martineau
  0 siblings, 0 replies; 16+ messages in thread
From: Mat Martineau @ 2020-02-27  0:11 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev

On Wed, 26 Feb 2020, Florian Westphal wrote:

> We can't lock_sock() the mptcp socket from the subflow data_ready callback,
> it would result in ABBA deadlock with the subflow socket lock.
>
> We can however grab the spinlock: if that succeeds and the mptcp socket
> is not owned at the moment, we can process the new skbs right away
> without deferring this to the work queue.
>
> This avoids the schedule_work and hence the small delay until the
> work item is processed.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 7/7] mptcp: defer work schedule until mptcp lock is released
  2020-02-26  9:14 ` [PATCH net-next 7/7] mptcp: defer work schedule until mptcp lock is released Florian Westphal
@ 2020-02-27  0:12   ` Mat Martineau
  0 siblings, 0 replies; 16+ messages in thread
From: Mat Martineau @ 2020-02-27  0:12 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, Paolo Abeni

On Wed, 26 Feb 2020, Florian Westphal wrote:

> From: Paolo Abeni <pabeni@redhat.com>
>
> Don't schedule the work queue right away, instead defer this
> to the lock release callback.
>
> This has the advantage that it will give recv path a chance to
> complete -- this might have moved all pending packets from the
> subflow to the mptcp receive queue, which allows us to avoid the
> schedule_work().
>
> Co-developed-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 0/7] mptcp: update mptcp ack sequence outside of recv path
  2020-02-26  9:14 [PATCH net-next 0/7] mptcp: update mptcp ack sequence outside of recv path Florian Westphal
                   ` (6 preceding siblings ...)
  2020-02-26  9:14 ` [PATCH net-next 7/7] mptcp: defer work schedule until mptcp lock is released Florian Westphal
@ 2020-02-27  4:47 ` David Miller
  7 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2020-02-27  4:47 UTC (permalink / raw)
  To: fw; +Cc: netdev

From: Florian Westphal <fw@strlen.de>
Date: Wed, 26 Feb 2020 10:14:45 +0100

> This series moves mptcp-level ack sequence update outside of the recvmsg path.
> Current approach has two problems:
> 
> 1. There is delay between arrival of new data and the time we can ack
>    this data.
> 2. If userspace doesn't call recv for some time, mptcp ack_seq is not
>    updated at all, even if this data is queued in the subflow socket
>    receive queue.
> 
> Move skbs from the subflow socket receive queue to the mptcp-level
> receive queue, updating the mptcp-level ack sequence and have recv
> take skbs from the mptcp-level receive queue.
> 
> The first place where we will attempt to update the mptcp level acks
> is from the subflows' data_ready callback, even before we make userspace
> aware of new data.
> 
> Because of possible deadlock (we need to take the mptcp socket lock
> while already holding the subflow sockets lock), we may still need to
> defer the mptcp-level ack update.  In such case, this work will be either
> done from work queue or recv path, depending on which runs sooner.
> 
> In order to avoid pointless scheduling of the work queue, work
> will be queued from the mptcp sockets lock release callback.
> This allows us to detect when the socket owner did drain the subflow
> socket receive queue.
> 
> Please see individual patches for more information.

Series applied, thanks Florian.

^ permalink raw reply	[flat|nested] 16+ messages in thread
