* [PATCH net-next 0/7] mptcp: do not block on subflow socket
@ 2020-05-16  8:46 Florian Westphal
  2020-05-16  8:46 ` [PATCH net-next 1/7] mptcp: move common nospace-pattern to a helper Florian Westphal
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Florian Westphal @ 2020-05-16  8:46 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, mathew.j.martineau, matthieu.baerts

This series reworks mptcp_sendmsg logic to avoid blocking on the subflow
socket.

It does so by removing the wait loop from mptcp_sendmsg_frag helper.

In order to do that, it moves prerequisites that are currently
handled in mptcp_sendmsg_frag (and cause it to wait until they are
met, e.g. frag cache refill) into the callers.

The worker can just reschedule in case no subflow socket is ready,
since it can't wait -- doing so would block other work items and
doesn't make sense anyway because we should not (re)send data
in case resources are already low.

The sendmsg path can use the existing wait logic until memory
becomes available.

Because large send requests can result in multiple mptcp_sendmsg_frag
calls from mptcp_sendmsg, we may need to restart the socket lookup in
case the subflow can't accept more data or memory is low.

Doing so blocks on the mptcp socket, and existing wait handling
releases the msk lock while blocking.

Lastly, there is no need to use GFP_ATOMIC for extension allocation:
extend __skb_ext_alloc with a gfp_t argument instead of the hard-coded
GFP_ATOMIC, then relax the allocation constraints for the mptcp case:
those requests occur in process context.
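
The worker rule described above (reschedule instead of sleeping) can be
sketched as a tiny userspace model; all names here are invented for
illustration and are not the kernel helpers:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative-only sketch: a work item must never block waiting for
 * a subflow socket (that would stall other work items), so when no
 * subflow is ready it simply asks to be rescheduled. */
enum worker_verdict { WORK_DONE, WORK_RESCHEDULE };

static enum worker_verdict mptcp_worker_step(bool subflow_ready)
{
    if (!subflow_ready)
        return WORK_RESCHEDULE;  /* try again later, don't wait */
    /* ... (re)transmit pending data here ... */
    return WORK_DONE;
}
```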

Florian Westphal (7):
      mptcp: move common nospace-pattern to a helper
      mptcp: break and restart in case mptcp sndbuf is full
      mptcp: avoid blocking in tcp_sendpages
      mptcp: fill skb extension cache outside of mptcp_sendmsg_frag
      mptcp: fill skb page frag cache outside of mptcp_sendmsg_frag
      mptcp: remove inner wait loop from mptcp_sendmsg_frag
      net: allow __skb_ext_alloc to sleep

 include/linux/skbuff.h |   2 +-
 net/core/skbuff.c      |   8 +--
 net/mptcp/protocol.c   | 139 ++++++++++++++++++++++++++++++++++++-------------
 3 files changed, 109 insertions(+), 40 deletions(-)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net-next 1/7] mptcp: move common nospace-pattern to a helper
  2020-05-16  8:46 [PATCH net-next 0/7] mptcp: do not block on subflow socket Florian Westphal
@ 2020-05-16  8:46 ` Florian Westphal
  2020-05-16  8:46 ` [PATCH net-next 2/7] mptcp: break and restart in case mptcp sndbuf is full Florian Westphal
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2020-05-16  8:46 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, mathew.j.martineau, matthieu.baerts, Florian Westphal

Paolo noticed that ssk_check_wmem() has the same pattern, so add a
common helper and use it in both places.
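
The factored-out pattern (clear the send-space bit, full barrier, then
set the no-space bit that re-enables write_space callbacks) can be
modeled in userspace with C11 atomics; bit values and names below are
invented for the sketch, not the kernel definitions:

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative-only model of the mptcp_nospace() helper: the barrier
 * orders clearing MPTCP_SEND_SPACE before setting SOCK_NOSPACE, as
 * the write_space callback reads msk->flags. */
#define MPTCP_SEND_SPACE (1u << 0)
#define SOCK_NOSPACE     (1u << 1)

static void nospace(atomic_uint *msk_flags, atomic_uint *sock_flags)
{
    atomic_fetch_and(msk_flags, ~MPTCP_SEND_SPACE);
    atomic_thread_fence(memory_order_seq_cst); /* like smp_mb__after_atomic() */
    atomic_fetch_or(sock_flags, SOCK_NOSPACE); /* enables write_space() */
}
```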

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1f52a0fa31ed..0413454fcdaf 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -653,6 +653,15 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 	return ret;
 }
 
+static void mptcp_nospace(struct mptcp_sock *msk, struct socket *sock)
+{
+	clear_bit(MPTCP_SEND_SPACE, &msk->flags);
+	smp_mb__after_atomic(); /* msk->flags is changed by write_space cb */
+
+	/* enables sk->write_space() callbacks */
+	set_bit(SOCK_NOSPACE, &sock->flags);
+}
+
 static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 {
 	struct mptcp_subflow_context *subflow;
@@ -666,13 +675,8 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 		if (!sk_stream_memory_free(ssk)) {
 			struct socket *sock = ssk->sk_socket;
 
-			if (sock) {
-				clear_bit(MPTCP_SEND_SPACE, &msk->flags);
-				smp_mb__after_atomic();
-
-				/* enables sk->write_space() callbacks */
-				set_bit(SOCK_NOSPACE, &sock->flags);
-			}
+			if (sock)
+				mptcp_nospace(msk, sock);
 
 			return NULL;
 		}
@@ -698,13 +702,8 @@ static void ssk_check_wmem(struct mptcp_sock *msk, struct sock *ssk)
 		return;
 
 	sock = READ_ONCE(ssk->sk_socket);
-
-	if (sock) {
-		clear_bit(MPTCP_SEND_SPACE, &msk->flags);
-		smp_mb__after_atomic();
-		/* set NOSPACE only after clearing SEND_SPACE flag */
-		set_bit(SOCK_NOSPACE, &sock->flags);
-	}
+	if (sock)
+		mptcp_nospace(msk, sock);
 }
 
 static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next 2/7] mptcp: break and restart in case mptcp sndbuf is full
  2020-05-16  8:46 [PATCH net-next 0/7] mptcp: do not block on subflow socket Florian Westphal
  2020-05-16  8:46 ` [PATCH net-next 1/7] mptcp: move common nospace-pattern to a helper Florian Westphal
@ 2020-05-16  8:46 ` Florian Westphal
  2020-05-16  8:46 ` [PATCH net-next 3/7] mptcp: avoid blocking in tcp_sendpages Florian Westphal
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2020-05-16  8:46 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, mathew.j.martineau, matthieu.baerts, Florian Westphal

It's not enough to check for available tcp send space.

We also hold on to transmitted data for mptcp-level retransmits.
Right now we will send more and more data if the peer can ack data
at the tcp level fast enough, since that frees up tcp send buffer space.

But we also need to check that data was acked and reclaimed at the mptcp
level.

Therefore, add the needed check in mptcp_sendmsg: flush tcp data and
wait until more mptcp snd space becomes available if we are over the
limit.  Before we wait for more data, also make sure we start the
retransmit timer if we ran out of sndbuf space.

Otherwise there is a very small chance that we wait forever:

 * receiver is waiting for data
 * sender is blocked because mptcp socket buffer is full
 * at tcp level, all data was acked
 * mptcp-level snd_una was not updated, because last ack
   that acknowledged the last data packet carried an older
   MPTCP-ack.

Restarting the retransmit timer avoids this problem: if the TCP
subflow is idle, data is retransmitted from the RTX queue.

New data will make the peer send a new, updated MPTCP-Ack.
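
The invariant this patch enforces before sleeping can be sketched as a
minimal userspace model; the struct and function names are invented
stand-ins, not the real mptcp helpers:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative-only model: never block waiting for mptcp-level
 * snd_una to advance unless the retransmit timer is pending, because
 * only a (re)transmission is guaranteed to provoke a fresh,
 * up-to-date MPTCP-level ack from the peer. */
struct fake_msk {
    bool timer_pending;
};

static void before_wait_for_sndbuf(struct fake_msk *msk)
{
    /* mptcp_set_timeout()/mptcp_reset_timer() in the real code */
    if (!msk->timer_pending)
        msk->timer_pending = true;
    /* ...only now is it safe to sk_stream_wait_memory()... */
}
```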

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 0413454fcdaf..75be8d662ac5 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -739,9 +739,23 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	mptcp_clean_una(sk);
 
+wait_for_sndbuf:
 	__mptcp_flush_join_list(msk);
 	ssk = mptcp_subflow_get_send(msk);
 	while (!sk_stream_memory_free(sk) || !ssk) {
+		if (ssk) {
+			/* make sure retransmit timer is
+			 * running before we wait for memory.
+			 *
+			 * The retransmit timer might be needed
+			 * to make the peer send an up-to-date
+			 * MPTCP Ack.
+			 */
+			mptcp_set_timeout(sk, ssk);
+			if (!mptcp_timer_pending(sk))
+				mptcp_reset_timer(sk);
+		}
+
 		ret = sk_stream_wait_memory(sk, &timeo);
 		if (ret)
 			goto out;
@@ -776,6 +790,28 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		}
 
 		copied += ret;
+
+		/* memory is charged to mptcp level socket as well, i.e.
+		 * if msg is very large, mptcp socket may run out of buffer
+		 * space.  mptcp_clean_una() will release data that has
+		 * been acked at mptcp level in the mean time, so there is
+		 * a good chance we can continue sending data right away.
+		 */
+		if (unlikely(!sk_stream_memory_free(sk))) {
+			tcp_push(ssk, msg->msg_flags, mss_now,
+				 tcp_sk(ssk)->nonagle, size_goal);
+			mptcp_clean_una(sk);
+			if (!sk_stream_memory_free(sk)) {
+				/* can't send more for now, need to wait for
+				 * MPTCP-level ACKs from peer.
+				 *
+				 * Wakeup will happen via mptcp_clean_una().
+				 */
+				mptcp_set_timeout(sk, ssk);
+				release_sock(ssk);
+				goto wait_for_sndbuf;
+			}
+		}
 	}
 
 	mptcp_set_timeout(sk, ssk);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next 3/7] mptcp: avoid blocking in tcp_sendpages
  2020-05-16  8:46 [PATCH net-next 0/7] mptcp: do not block on subflow socket Florian Westphal
  2020-05-16  8:46 ` [PATCH net-next 1/7] mptcp: move common nospace-pattern to a helper Florian Westphal
  2020-05-16  8:46 ` [PATCH net-next 2/7] mptcp: break and restart in case mptcp sndbuf is full Florian Westphal
@ 2020-05-16  8:46 ` Florian Westphal
  2020-05-16  8:46 ` [PATCH net-next 4/7] mptcp: fill skb extension cache outside of mptcp_sendmsg_frag Florian Westphal
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2020-05-16  8:46 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, mathew.j.martineau, matthieu.baerts, Florian Westphal

The transmit loop continues to xmit new data until an error is returned
or all data has been transmitted.

For the blocking i/o case, this means that tcp_sendpages() may block on
the subflow until more space becomes available, i.e. we end up sleeping
with the mptcp socket lock held.

Instead we should check if a different subflow is ready to be used.

This restarts the subflow sk lookup when the tx operation succeeded
and the tcp subflow can't accept more data, or when tcp_sendpages
indicates -EAGAIN on a blocking mptcp socket.

In that case we also need to set the NOSPACE bit to make sure we get
notified once memory becomes available.

In case all subflows are busy, the existing logic will wait until a
subflow is ready, releasing the mptcp socket lock while doing so.

The mptcp worker already sets DONTWAIT, so no need to make changes there.
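
The send-with-DONTWAIT-then-restart-the-lookup flow can be sketched as
a self-contained userspace model; send_frag(), pick_subflow() and the
per-subflow budgets are invented for the sketch:

```c
#include <assert.h>
#include <errno.h>

/* Illustrative-only model of the patch-3 transmit loop: fragments are
 * sent nonblocking; on EAGAIN we go back and redo the subflow lookup
 * instead of sleeping on the current subflow. */
static int budget[2] = { 2, 3 };     /* frags each fake subflow accepts */

static int pick_subflow(void)
{
    for (int i = 0; i < 2; i++)
        if (budget[i] > 0)
            return i;
    return -1;                       /* all busy: caller would wait */
}

static int send_frag(int ssk)
{
    if (budget[ssk] == 0)
        return -EAGAIN;              /* send would have blocked */
    budget[ssk]--;
    return 1;                        /* one fragment accepted */
}

static int transmit(int frags)
{
    int copied = 0;
    int ssk;

restart:
    ssk = pick_subflow();
    if (ssk < 0)
        return copied;

    while (copied < frags) {
        int ret = send_frag(ssk);
        if (ret == -EAGAIN)
            goto restart;            /* redo the subflow lookup */
        copied += ret;
    }
    return copied;
}
```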

v2:
 * set NOSPACE bit
 * add a comment to clarify that mptcp-sk sndbuf limits need to
   be checked as well.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 75be8d662ac5..e97357066b21 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -590,7 +590,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 	 * access the skb after the sendpages call
 	 */
 	ret = do_tcp_sendpages(ssk, page, offset, psize,
-			       msg->msg_flags | MSG_SENDPAGE_NOTLAST);
+			       msg->msg_flags | MSG_SENDPAGE_NOTLAST | MSG_DONTWAIT);
 	if (ret <= 0)
 		return ret;
 
@@ -713,6 +713,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	struct socket *ssock;
 	size_t copied = 0;
 	struct sock *ssk;
+	bool tx_ok;
 	long timeo;
 
 	if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL))
@@ -737,6 +738,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		return ret >= 0 ? ret + copied : (copied ? copied : ret);
 	}
 
+restart:
 	mptcp_clean_una(sk);
 
 wait_for_sndbuf:
@@ -772,11 +774,18 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	pr_debug("conn_list->subflow=%p", ssk);
 
 	lock_sock(ssk);
-	while (msg_data_left(msg)) {
+	tx_ok = msg_data_left(msg);
+	while (tx_ok) {
 		ret = mptcp_sendmsg_frag(sk, ssk, msg, NULL, &timeo, &mss_now,
 					 &size_goal);
-		if (ret < 0)
+		if (ret < 0) {
+			if (ret == -EAGAIN && timeo > 0) {
+				mptcp_set_timeout(sk, ssk);
+				release_sock(ssk);
+				goto restart;
+			}
 			break;
+		}
 		if (ret == 0 && unlikely(__mptcp_needs_tcp_fallback(msk))) {
 			/* Can happen for passive sockets:
 			 * 3WHS negotiated MPTCP, but first packet after is
@@ -791,11 +800,31 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 		copied += ret;
 
+		tx_ok = msg_data_left(msg);
+		if (!tx_ok)
+			break;
+
+		if (!sk_stream_memory_free(ssk)) {
+			set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
+			tcp_push(ssk, msg->msg_flags, mss_now,
+				 tcp_sk(ssk)->nonagle, size_goal);
+			mptcp_set_timeout(sk, ssk);
+			release_sock(ssk);
+			goto restart;
+		}
+
 		/* memory is charged to mptcp level socket as well, i.e.
 		 * if msg is very large, mptcp socket may run out of buffer
 		 * space.  mptcp_clean_una() will release data that has
 		 * been acked at mptcp level in the mean time, so there is
 		 * a good chance we can continue sending data right away.
+		 *
+		 * Normally, when the tcp subflow can accept more data, then
+		 * so can the MPTCP socket.  However, we need to cope with
+		 * peers that might lag behind in their MPTCP-level
+		 * acknowledgements, i.e.  data might have been acked at
+		 * tcp level only.  So, we must also check the MPTCP socket
+		 * limits before we send more data.
 		 */
 		if (unlikely(!sk_stream_memory_free(sk))) {
 			tcp_push(ssk, msg->msg_flags, mss_now,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next 4/7] mptcp: fill skb extension cache outside of mptcp_sendmsg_frag
  2020-05-16  8:46 [PATCH net-next 0/7] mptcp: do not block on subflow socket Florian Westphal
                   ` (2 preceding siblings ...)
  2020-05-16  8:46 ` [PATCH net-next 3/7] mptcp: avoid blocking in tcp_sendpages Florian Westphal
@ 2020-05-16  8:46 ` Florian Westphal
  2020-05-16  8:46 ` [PATCH net-next 5/7] mptcp: fill skb page frag " Florian Westphal
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2020-05-16  8:46 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, mathew.j.martineau, matthieu.baerts, Florian Westphal

The mptcp_sendmsg_frag helper contains a loop that will wait on the
subflow sk.

It seems preferable to only wait in mptcp_sendmsg() when blocking io is
requested.  mptcp_sendmsg already has such a wait loop that is used when
no subflow socket is available for transmission.

This is a preparation patch that makes sure we call
mptcp_sendmsg_frag only if an skb extension has been allocated.

Moreover, such allocation currently uses GFP_ATOMIC while it
could use sleeping allocation instead.

Followup patches will remove the wait loop from mptcp_sendmsg_frag()
and will allow a sleeping allocation to be used for the extension.
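
The "refill in the caller, consume in the sender" contract that
patches 4 and 5 establish can be sketched in userspace; cache_refill()
and send_frag_nowait() are invented analogues, not kernel functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative-only model: the caller keeps one spare object cached
 * (mirroring the mptcp_ext_cache_refill() contract) and only calls
 * the fragment sender once resources are guaranteed, so the sender
 * itself never has to wait or allocate. */
struct cache { void *spare; };

static bool cache_refill(struct cache *c)
{
    if (!c->spare)
        c->spare = malloc(32);
    return c->spare != NULL;
}

static int send_frag_nowait(struct cache *c)
{
    assert(c->spare);   /* caller must have refilled beforehand */
    free(c->spare);     /* consume the cached object */
    c->spare = NULL;
    return 1;
}
```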

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e97357066b21..1bdfbca1c23a 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -669,6 +669,9 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 
 	sock_owned_by_me((const struct sock *)msk);
 
+	if (!mptcp_ext_cache_refill(msk))
+		return NULL;
+
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 
@@ -804,7 +807,8 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		if (!tx_ok)
 			break;
 
-		if (!sk_stream_memory_free(ssk)) {
+		if (!sk_stream_memory_free(ssk) ||
+		    !mptcp_ext_cache_refill(msk)) {
 			set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 			tcp_push(ssk, msg->msg_flags, mss_now,
 				 tcp_sk(ssk)->nonagle, size_goal);
@@ -1158,7 +1162,7 @@ static void mptcp_worker(struct work_struct *work)
 {
 	struct mptcp_sock *msk = container_of(work, struct mptcp_sock, work);
 	struct sock *ssk, *sk = &msk->sk.icsk_inet.sk;
-	int orig_len, orig_offset, ret, mss_now = 0, size_goal = 0;
+	int orig_len, orig_offset, mss_now = 0, size_goal = 0;
 	struct mptcp_data_frag *dfrag;
 	u64 orig_write_seq;
 	size_t copied = 0;
@@ -1180,6 +1184,9 @@ static void mptcp_worker(struct work_struct *work)
 	if (!dfrag)
 		goto unlock;
 
+	if (!mptcp_ext_cache_refill(msk))
+		goto reset_unlock;
+
 	ssk = mptcp_subflow_get_retrans(msk);
 	if (!ssk)
 		goto reset_unlock;
@@ -1191,8 +1198,8 @@ static void mptcp_worker(struct work_struct *work)
 	orig_offset = dfrag->offset;
 	orig_write_seq = dfrag->data_seq;
 	while (dfrag->data_len > 0) {
-		ret = mptcp_sendmsg_frag(sk, ssk, &msg, dfrag, &timeo, &mss_now,
-					 &size_goal);
+		int ret = mptcp_sendmsg_frag(sk, ssk, &msg, dfrag, &timeo,
+					     &mss_now, &size_goal);
 		if (ret < 0)
 			break;
 
@@ -1200,6 +1207,9 @@ static void mptcp_worker(struct work_struct *work)
 		copied += ret;
 		dfrag->data_len -= ret;
 		dfrag->offset += ret;
+
+		if (!mptcp_ext_cache_refill(msk))
+			break;
 	}
 	if (copied)
 		tcp_push(ssk, msg.msg_flags, mss_now, tcp_sk(ssk)->nonagle,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next 5/7] mptcp: fill skb page frag cache outside of mptcp_sendmsg_frag
  2020-05-16  8:46 [PATCH net-next 0/7] mptcp: do not block on subflow socket Florian Westphal
                   ` (3 preceding siblings ...)
  2020-05-16  8:46 ` [PATCH net-next 4/7] mptcp: fill skb extension cache outside of mptcp_sendmsg_frag Florian Westphal
@ 2020-05-16  8:46 ` Florian Westphal
  2020-05-16  8:46 ` [PATCH net-next 6/7] mptcp: remove inner wait loop from mptcp_sendmsg_frag Florian Westphal
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2020-05-16  8:46 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, mathew.j.martineau, matthieu.baerts, Florian Westphal

The mptcp_sendmsg_frag helper contains a loop that will wait on the
subflow sk.

It seems preferable to only wait in mptcp_sendmsg() when blocking io is
requested.  mptcp_sendmsg already has such a wait loop that is used when
no subflow socket is available for transmission.

This is another preparation patch that makes sure we call
mptcp_sendmsg_frag only if the page frag cache has been refilled.

A followup patch will remove the wait loop from mptcp_sendmsg_frag().

The retransmit worker doesn't need to do this refill as it won't
transmit new mptcp-level data.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1bdfbca1c23a..a11e51222e59 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -713,6 +713,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 {
 	int mss_now = 0, size_goal = 0, ret = 0;
 	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct page_frag *pfrag;
 	struct socket *ssock;
 	size_t copied = 0;
 	struct sock *ssk;
@@ -741,13 +742,16 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		return ret >= 0 ? ret + copied : (copied ? copied : ret);
 	}
 
+	pfrag = sk_page_frag(sk);
 restart:
 	mptcp_clean_una(sk);
 
 wait_for_sndbuf:
 	__mptcp_flush_join_list(msk);
 	ssk = mptcp_subflow_get_send(msk);
-	while (!sk_stream_memory_free(sk) || !ssk) {
+	while (!sk_stream_memory_free(sk) ||
+	       !ssk ||
+	       !mptcp_page_frag_refill(ssk, pfrag)) {
 		if (ssk) {
 			/* make sure retransmit timer is
 			 * running before we wait for memory.
@@ -808,6 +812,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 			break;
 
 		if (!sk_stream_memory_free(ssk) ||
+		    !mptcp_page_frag_refill(ssk, pfrag) ||
 		    !mptcp_ext_cache_refill(msk)) {
 			set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 			tcp_push(ssk, msg->msg_flags, mss_now,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next 6/7] mptcp: remove inner wait loop from mptcp_sendmsg_frag
  2020-05-16  8:46 [PATCH net-next 0/7] mptcp: do not block on subflow socket Florian Westphal
                   ` (4 preceding siblings ...)
  2020-05-16  8:46 ` [PATCH net-next 5/7] mptcp: fill skb page frag " Florian Westphal
@ 2020-05-16  8:46 ` Florian Westphal
  2020-05-16  8:46 ` [PATCH net-next 7/7] net: allow __skb_ext_alloc to sleep Florian Westphal
  2020-05-17 19:36 ` [PATCH net-next 0/7] mptcp: do not block on subflow socket David Miller
  7 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2020-05-16  8:46 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, mathew.j.martineau, matthieu.baerts, Florian Westphal

Previous patches made sure we only call into this function
when these prerequisites are met, so there is no need to wait on the
subflow socket anymore.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/7
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/mptcp/protocol.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index a11e51222e59..bc950cf818f7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -510,20 +510,6 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 	 * fooled into a warning if we don't init here
 	 */
 	pfrag = sk_page_frag(sk);
-	while ((!retransmission && !mptcp_page_frag_refill(ssk, pfrag)) ||
-	       !mptcp_ext_cache_refill(msk)) {
-		ret = sk_stream_wait_memory(ssk, timeo);
-		if (ret)
-			return ret;
-
-		/* if sk_stream_wait_memory() sleeps snd_una can change
-		 * significantly, refresh the rtx queue
-		 */
-		mptcp_clean_una(sk);
-
-		if (unlikely(__mptcp_needs_tcp_fallback(msk)))
-			return 0;
-	}
 	if (!retransmission) {
 		write_seq = &msk->write_seq;
 		page = pfrag->page;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next 7/7] net: allow __skb_ext_alloc to sleep
  2020-05-16  8:46 [PATCH net-next 0/7] mptcp: do not block on subflow socket Florian Westphal
                   ` (5 preceding siblings ...)
  2020-05-16  8:46 ` [PATCH net-next 6/7] mptcp: remove inner wait loop from mptcp_sendmsg_frag Florian Westphal
@ 2020-05-16  8:46 ` Florian Westphal
  2020-05-17 19:36 ` [PATCH net-next 0/7] mptcp: do not block on subflow socket David Miller
  7 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2020-05-16  8:46 UTC (permalink / raw)
  To: netdev; +Cc: pabeni, mathew.j.martineau, matthieu.baerts, Florian Westphal

mptcp calls this from the transmit side, from process context.
Allow a sleeping allocation instead of unconditional GFP_ATOMIC.
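
The signature change can be modeled in userspace: the allocation
constraint becomes a parameter instead of being hard-coded, so
process-context callers can pass a sleeping flag while interrupt-safe
paths keep the atomic one.  The flag values and ext_alloc() below are
invented for the sketch, not the kernel GFP API:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative-only analogue of the __skb_ext_alloc(gfp_t) change:
 * the caller states whether the allocation may sleep; a real slab
 * cache would honor the flag, calloc() here just stands in. */
typedef unsigned int gfp_flags_t;
#define FAKE_GFP_ATOMIC  0x1u   /* may not sleep (skb_ext_add fast path) */
#define FAKE_GFP_KERNEL  0x2u   /* may sleep (mptcp process context) */

static void *ext_alloc(gfp_flags_t flags)
{
    (void)flags;
    return calloc(1, 64);
}
```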

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/skbuff.h | 2 +-
 net/core/skbuff.c      | 8 +++++---
 net/mptcp/protocol.c   | 4 +++-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 3000c526f552..531843952809 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4165,7 +4165,7 @@ struct skb_ext {
 	char data[] __aligned(8);
 };
 
-struct skb_ext *__skb_ext_alloc(void);
+struct skb_ext *__skb_ext_alloc(gfp_t flags);
 void *__skb_ext_set(struct sk_buff *skb, enum skb_ext_id id,
 		    struct skb_ext *ext);
 void *skb_ext_add(struct sk_buff *skb, enum skb_ext_id id);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1bf0c3d278e7..35a133c6d13b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6087,13 +6087,15 @@ static void *skb_ext_get_ptr(struct skb_ext *ext, enum skb_ext_id id)
 /**
  * __skb_ext_alloc - allocate a new skb extensions storage
  *
+ * @flags: See kmalloc().
+ *
  * Returns the newly allocated pointer. The pointer can later attached to a
  * skb via __skb_ext_set().
  * Note: caller must handle the skb_ext as an opaque data.
  */
-struct skb_ext *__skb_ext_alloc(void)
+struct skb_ext *__skb_ext_alloc(gfp_t flags)
 {
-	struct skb_ext *new = kmem_cache_alloc(skbuff_ext_cache, GFP_ATOMIC);
+	struct skb_ext *new = kmem_cache_alloc(skbuff_ext_cache, flags);
 
 	if (new) {
 		memset(new->offset, 0, sizeof(new->offset));
@@ -6188,7 +6190,7 @@ void *skb_ext_add(struct sk_buff *skb, enum skb_ext_id id)
 	} else {
 		newoff = SKB_EXT_CHUNKSIZEOF(*new);
 
-		new = __skb_ext_alloc();
+		new = __skb_ext_alloc(GFP_ATOMIC);
 		if (!new)
 			return NULL;
 	}
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index bc950cf818f7..e3a628bea2b8 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -367,8 +367,10 @@ static void mptcp_stop_timer(struct sock *sk)
 
 static bool mptcp_ext_cache_refill(struct mptcp_sock *msk)
 {
+	const struct sock *sk = (const struct sock *)msk;
+
 	if (!msk->cached_ext)
-		msk->cached_ext = __skb_ext_alloc();
+		msk->cached_ext = __skb_ext_alloc(sk->sk_allocation);
 
 	return !!msk->cached_ext;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 0/7] mptcp: do not block on subflow socket
  2020-05-16  8:46 [PATCH net-next 0/7] mptcp: do not block on subflow socket Florian Westphal
                   ` (6 preceding siblings ...)
  2020-05-16  8:46 ` [PATCH net-next 7/7] net: allow __skb_ext_alloc to sleep Florian Westphal
@ 2020-05-17 19:36 ` David Miller
  7 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2020-05-17 19:36 UTC (permalink / raw)
  To: fw; +Cc: netdev, pabeni, mathew.j.martineau, matthieu.baerts

From: Florian Westphal <fw@strlen.de>
Date: Sat, 16 May 2020 10:46:16 +0200

> This series reworks mptcp_sendmsg logic to avoid blocking on the subflow
> socket.
> 
> It does so by removing the wait loop from mptcp_sendmsg_frag helper.
> 
> In order to do that, it moves prerequisites that are currently
> handled in mptcp_sendmsg_frag (and cause it to wait until they are
> met, e.g. frag cache refill) into the callers.
> 
> The worker can just reschedule in case no subflow socket is ready,
> since it can't wait -- doing so would block other work items and
> doesn't make sense anyway because we should not (re)send data
> in case resources are already low.
> 
> The sendmsg path can use the existing wait logic until memory
> becomes available.
> 
> Because large send requests can result in multiple mptcp_sendmsg_frag
> calls from mptcp_sendmsg, we may need to restart the socket lookup in
> case subflow can't accept more data or memory is low.
> 
> Doing so blocks on the mptcp socket, and existing wait handling
> releases the msk lock while blocking.
> 
> Lastly, no need to use GFP_ATOMIC for extension allocation:
> extend __skb_ext_alloc with gfp_t arg instead of hard-coded ATOMIC and
> then relax the allocation constraints for mptcp case: those requests
> occur in process context.

Series applied, thanks Florian.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-05-17 19:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-16  8:46 [PATCH net-next 0/7] mptcp: do not block on subflow socket Florian Westphal
2020-05-16  8:46 ` [PATCH net-next 1/7] mptcp: move common nospace-pattern to a helper Florian Westphal
2020-05-16  8:46 ` [PATCH net-next 2/7] mptcp: break and restart in case mptcp sndbuf is full Florian Westphal
2020-05-16  8:46 ` [PATCH net-next 3/7] mptcp: avoid blocking in tcp_sendpages Florian Westphal
2020-05-16  8:46 ` [PATCH net-next 4/7] mptcp: fill skb extension cache outside of mptcp_sendmsg_frag Florian Westphal
2020-05-16  8:46 ` [PATCH net-next 5/7] mptcp: fill skb page frag " Florian Westphal
2020-05-16  8:46 ` [PATCH net-next 6/7] mptcp: remove inner wait loop from mptcp_sendmsg_frag Florian Westphal
2020-05-16  8:46 ` [PATCH net-next 7/7] net: allow __skb_ext_alloc to sleep Florian Westphal
2020-05-17 19:36 ` [PATCH net-next 0/7] mptcp: do not block on subflow socket David Miller
