* Re: [MPTCP] [RFC PATCH 4/4] mptcp: allow collapsing consecutive sendpages on the same substream
@ 2019-04-10  8:05 Paolo Abeni
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2019-04-10  8:05 UTC (permalink / raw)
  To: mptcp


On Fri, 2019-04-05 at 17:34 -0700, Mat Martineau wrote:
> On Fri, 5 Apr 2019, Paolo Abeni wrote:
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 2fbfe3e30630..5df2283de488 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -47,11 +47,19 @@ static struct sock *mptcp_subflow_get_ref(const struct mptcp_sock *msk)
> > 	return NULL;
> > }
> > 
> > +static inline bool mptcp_skb_can_collapse_to(const struct mptcp_sock *msk,
> > +					     const struct sock *ssk,
> > +					     const struct sk_buff *skb)
> > +{
> > +	return tcp_skb_can_collapse_to(skb) && (msk->last == ssk);
> 
> Even if the last send was on a particular subflow, that doesn't guarantee 
> the next send can be collapsed on that subflow. There are other cases like 
> MPTCP-level resends or sending duplicate data on different subflows.

uhmm... I thought a bit more about the above and the code in the patch
still LGTM: msk->last references the last subflow we enqueued new data
to, that is, the subflow with the higher/newer MPTCP write_seq...

> I think it would have to make sure the MPTCP-level sequence numbers are 
> contiguous before collapsing.

... so this condition always holds when (msk->last == ssk).

Am I missing something?

Thanks,

Paolo



* Re: [MPTCP] [RFC PATCH 4/4] mptcp: allow collapsing consecutive sendpages on the same substream
@ 2019-04-10 20:59 Paolo Abeni
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2019-04-10 20:59 UTC (permalink / raw)
  To: mptcp


On Wed, 2019-04-10 at 10:16 -0700, Mat Martineau wrote:
> On Wed, 10 Apr 2019, Paolo Abeni wrote:
> > uhmm... I thought a bit more about the above and the code in the patch
> > still LGTM: msk->last references the last subflow we enqueued new data
> > to, that is, the subflow with the higher/newer MPTCP write_seq...
> > 
> > > I think it would have to make sure the MPTCP-level sequence numbers are
> > > contiguous before collapsing.
> > 
> > ... so this condition always holds when (msk->last == ssk).
> > 
> > Am I missing something?
> > 
> 
> Resends will be a problem with the "last subflow written to" assumption.
> 
> In a case where you have multiple subflows S1 and S2, you might have a 
> scenario like this (sequence numbers are in the 64-bit MPTCP sequence 
> space):
> 
> 1. Send bytes 0-99 on S1
> 
> 2. Send bytes 100-199 on S2
> 
> 3. <time passes: S2 connection has a problem and data does not get 
> through>
> 
> 4. Send bytes 200-299 on S1
> 
> 5. MPTCP-level resend 100-199 on S1
> 
> 6. Send bytes 300-399 on S1
> 
> 
> When you get to step 6, the "last" send was on S1, but you need a new 
> MPTCP mapping header for bytes 300-399 because the previous byte in the S1 
> stream is 199. It's important to know where the earlier mappings end so 
> you can create a new one without leaving a gap in the MPTCP sequence 
> space.

Ah, now I see! Thank you for the detailed example!

I'll take care of this in the next iteration, but I doubt I'll be able
to send a new version before the next meeting, because I'm travelling
with very limited time available.

Thank you,

Paolo



* Re: [MPTCP] [RFC PATCH 4/4] mptcp: allow collapsing consecutive sendpages on the same substream
@ 2019-04-10 17:16 Mat Martineau
  0 siblings, 0 replies; 5+ messages in thread
From: Mat Martineau @ 2019-04-10 17:16 UTC (permalink / raw)
  To: mptcp



On Wed, 10 Apr 2019, Paolo Abeni wrote:

> On Fri, 2019-04-05 at 17:34 -0700, Mat Martineau wrote:
>> On Fri, 5 Apr 2019, Paolo Abeni wrote:
>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>> index 2fbfe3e30630..5df2283de488 100644
>>> --- a/net/mptcp/protocol.c
>>> +++ b/net/mptcp/protocol.c
>>> @@ -47,11 +47,19 @@ static struct sock *mptcp_subflow_get_ref(const struct mptcp_sock *msk)
>>> 	return NULL;
>>> }
>>>
>>> +static inline bool mptcp_skb_can_collapse_to(const struct mptcp_sock *msk,
>>> +					     const struct sock *ssk,
>>> +					     const struct sk_buff *skb)
>>> +{
>>> +	return tcp_skb_can_collapse_to(skb) && (msk->last == ssk);
>>
>> Even if the last send was on a particular subflow, that doesn't guarantee
>> the next send can be collapsed on that subflow. There are other cases like
>> MPTCP-level resends or sending duplicate data on different subflows.
>
> uhmm... I thought a bit more about the above and the code in the patch
> still LGTM: msk->last references the last subflow we enqueued new data
> to, that is, the subflow with the higher/newer MPTCP write_seq...
>
>> I think it would have to make sure the MPTCP-level sequence numbers are
>> contiguous before collapsing.
>
> ... so this condition always holds when (msk->last == ssk).
>
> Am I missing something?
>

Resends will be a problem with the "last subflow written to" assumption.

In a case where you have multiple subflows S1 and S2, you might have a 
scenario like this (sequence numbers are in the 64-bit MPTCP sequence 
space):

1. Send bytes 0-99 on S1

2. Send bytes 100-199 on S2

3. <time passes: S2 connection has a problem and data does not get 
through>

4. Send bytes 200-299 on S1

5. MPTCP-level resend 100-199 on S1

6. Send bytes 300-399 on S1


When you get to step 6, the "last" send was on S1, but you need a new 
MPTCP mapping header for bytes 300-399 because the previous byte in the S1 
stream is 199. It's important to know where the earlier mappings end so 
you can create a new one without leaving a gap in the MPTCP sequence 
space.
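The contiguity requirement described here can be sketched as a small
standalone model (names like `can_collapse` and `struct map` are
illustrative, not kernel code; a real fix would presumably extend
`mptcp_skb_can_collapse_to()` with a check along these lines):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model: a tail skb's MPTCP mapping covers
 * [data_seq, data_seq + data_len). Collapsing new data into it is only
 * safe when the next MPTCP-level byte to send starts exactly where the
 * mapping ends; otherwise a new mapping (and a new skb) is needed.
 */
struct map {
	uint64_t data_seq;	/* MPTCP-level start of the mapping */
	uint32_t data_len;	/* bytes covered by the mapping */
};

static bool can_collapse(uint64_t next_write_seq, const struct map *tail)
{
	return next_write_seq == tail->data_seq + tail->data_len;
}
```

In the scenario above, after step 5 the S1 tail mapping covers bytes
100-199, so the check passes only for a send starting at byte 200; the
send in step 6 starts at byte 300 and must get a fresh mapping.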



--
Mat Martineau
Intel


* Re: [MPTCP] [RFC PATCH 4/4] mptcp: allow collapsing consecutive sendpages on the same substream
@ 2019-04-06  0:34 Mat Martineau
  0 siblings, 0 replies; 5+ messages in thread
From: Mat Martineau @ 2019-04-06  0:34 UTC (permalink / raw)
  To: mptcp



Hi Paolo,

On Fri, 5 Apr 2019, Paolo Abeni wrote:

> If the current sendmsg() lands on the same subflow we used last, we
> can try to collapse the data.
>
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> ---
> include/net/mptcp.h  |  1 +
> net/mptcp/protocol.c | 57 +++++++++++++++++++++++++++++++-------------
> 2 files changed, 41 insertions(+), 17 deletions(-)
>
> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> index 0c0839d9ab38..788d4d5a084c 100644
> --- a/include/net/mptcp.h
> +++ b/include/net/mptcp.h
> @@ -50,6 +50,7 @@ struct mptcp_sock {
> 	spinlock_t	conn_list_lock;
> 	struct hlist_head conn_list;
> 	struct socket	*subflow; /* outgoing connect/listener/!mp_capable */
> +	struct sock	*last;
> };
>
> #define mptcp_for_each_subflow(__msk, __subflow)			\
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 2fbfe3e30630..5df2283de488 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -47,11 +47,19 @@ static struct sock *mptcp_subflow_get_ref(const struct mptcp_sock *msk)
> 	return NULL;
> }
>
> +static inline bool mptcp_skb_can_collapse_to(const struct mptcp_sock *msk,
> +					     const struct sock *ssk,
> +					     const struct sk_buff *skb)
> +{
> +	return tcp_skb_can_collapse_to(skb) && (msk->last == ssk);

Even if the last send was on a particular subflow, that doesn't guarantee 
the next send can be collapsed on that subflow. There are other cases like 
MPTCP-level resends or sending duplicate data on different subflows.

I think it would have to make sure the MPTCP-level sequence numbers are 
contiguous before collapsing.

> +}
> +
> static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
> -			      struct msghdr *msg)
> +			      struct msghdr *msg, int *pmss_now, int *ps_goal)
> {
> +	int mss_now, avail_size, size_goal, poffset, ret;
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> -	int mss_now, size_goal, poffset, ret;
> +	bool collapsed, can_collapse = false;
> 	struct mptcp_ext *mpext = NULL;
> 	struct page_frag *pfrag;
> 	struct sk_buff *skb;
> @@ -78,25 +86,35 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
> 	if (!psize)
> 		return -EINVAL;
>
> -	/* Mark the end of the previous write so the beginning of the
> -	 * next write (with its own mptcp skb extension data) is not
> -	 * collapsed.
> -	 */
> +	mss_now = tcp_send_mss(ssk, &size_goal, msg->msg_flags);
> +	*pmss_now = mss_now;
> +	*ps_goal = size_goal;
> +	avail_size = size_goal;
> 	skb = tcp_write_queue_tail(ssk);
> -	if (skb)
> -		TCP_SKB_CB(skb)->eor = 1;
> +	if (skb) {
> +		can_collapse = mptcp_skb_can_collapse_to(msk, ssk, skb);
> +		if (!can_collapse)
> +			TCP_SKB_CB(skb)->eor = 1;
> +		else if (size_goal - skb->len > 0)
> +			avail_size = size_goal + size_goal - skb->len;

Can you comment more on the calculation for avail_size here - is this how 
you "allocate at most a new skb" as you mention below?
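For what it's worth, here is one reading of that expression (an
assumption about intent, not confirmed by the patch): when the tail skb
can be collapsed into and is not yet full, the write may use the
leftover room in the tail plus one full-size skb, so at most one new
skb is ever allocated:

```c
/* Illustrative reading of the avail_size computation in the hunk above:
 * leftover room in the collapsible tail skb (size_goal - tail_len),
 * plus one full-size skb (size_goal). Otherwise, one skb's worth.
 * (size_goal - tail_len) + size_goal matches the patch's
 * size_goal + size_goal - skb->len, just regrouped.
 */
static int avail_size(int size_goal, int tail_len, int can_collapse)
{
	if (can_collapse && size_goal - tail_len > 0)
		return (size_goal - tail_len) + size_goal;
	return size_goal;
}
```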

> +	}
>
> -	mss_now = tcp_send_mss(ssk, &size_goal, msg->msg_flags);
> -	psize = min_t(int, size_goal, psize);
> +	/* be sure that we are going to allocate at most a new skb, and
> +	 * tell the tcp stack to delay the push so that we can safely
> +	 * access the skb after the sendpages call
> +	 */
> +	psize = min_t(int, avail_size, psize);
> 	ret = do_tcp_sendpages(ssk, pfrag->page, poffset, psize,
> 			       msg->msg_flags | MSG_SENDPAGE_NOTLAST);
> 	if (ret <= 0)
> 		return ret;
>
> -	if (skb == tcp_write_queue_tail(ssk))
> -		pr_err("no new skb %p/%p", sk, ssk);
> +	collapsed = skb == tcp_write_queue_tail(ssk);
> +	BUG_ON(collapsed && !can_collapse);
>
> 	skb = tcp_write_queue_tail(ssk);
> +	if (collapsed)
> +		goto out;

Without the previous commit (moving mpext->data_len assignment), could you 
instead update the existing mpext in the "collapsed == true" case?
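A minimal sketch of that alternative, purely illustrative (it assumes
`data_len` is still written at send time, as it was before the previous
commit; `struct mptcp_ext_model` stands in for the real extension):

```c
#include <stdint.h>

/* Toy model: on a collapsed send, grow the tail skb's existing mapping
 * by the number of bytes just appended instead of attaching a fresh
 * extension. Field names mirror the diff; the rest is illustrative.
 */
struct mptcp_ext_model {
	uint64_t data_seq;
	uint32_t data_len;
};

static void extend_mapping(struct mptcp_ext_model *mpext, int ret)
{
	mpext->data_len += ret;	/* mapping now also covers the new bytes */
}
```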


Mat


>
> 	mpext = skb_ext_add(skb, SKB_EXT_MPTCP);
>
> @@ -109,24 +127,25 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
> 		mpext->dsn64 = 1;
>
> 		pr_debug("data_seq=%llu subflow_seq=%u data_len=%u checksum=%u, dsn64=%d",
> -			 mpext->data_seq, mpext->subflow_seq, mpext->data_len,
> +			 mpext->data_seq, mpext->subflow_seq, skb->len,
> 			 mpext->checksum, mpext->dsn64);
> 	} /* TODO: else fallback */
>
> +out:
> 	pfrag->offset += ret;
> +	msk->last = ssk;
> 	msk->write_seq += ret;
> 	subflow_ctx(ssk)->rel_write_seq += ret;
>
> -	tcp_push(ssk, msg->msg_flags, mss_now, tcp_sk(ssk)->nonagle, size_goal);
> 	return ret;
> }
>
> static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> {
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> +	int mss_now = 0, size_goal = 0, ret = 0;
> 	size_t copied = 0;
> 	struct sock *ssk;
> -	int ret = 0;
>
> 	pr_debug("msk=%p", msk);
> 	if (msk->subflow) {
> @@ -153,15 +172,19 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>
> 	lock_sock(sk);
> 	lock_sock(ssk);
> +
> 	while (msg_data_left(msg)) {
> -		ret = mptcp_sendmsg_frag(sk, ssk, msg);
> +		ret = mptcp_sendmsg_frag(sk, ssk, msg, &mss_now, &size_goal);
> 		if (ret < 0)
> 			break;
>
> 		copied += ret;
> 	}
> -	if (copied > 0)
> +	if (copied) {
> 		ret = copied;
> +		tcp_push(ssk, msg->msg_flags, mss_now, tcp_sk(ssk)->nonagle,
> +			 size_goal);
> +	}
>
> 	release_sock(ssk);
> 	release_sock(sk);
> -- 
> 2.20.1
>
> _______________________________________________
> mptcp mailing list
> mptcp(a)lists.01.org
> https://lists.01.org/mailman/listinfo/mptcp
>

--
Mat Martineau
Intel


* [MPTCP] [RFC PATCH 4/4] mptcp: allow collapsing consecutive sendpages on the same substream
@ 2019-04-05 14:46 Paolo Abeni
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2019-04-05 14:46 UTC (permalink / raw)
  To: mptcp


If the current sendmsg() lands on the same subflow we used last, we
can try to collapse the data.

Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
 include/net/mptcp.h  |  1 +
 net/mptcp/protocol.c | 57 +++++++++++++++++++++++++++++++-------------
 2 files changed, 41 insertions(+), 17 deletions(-)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 0c0839d9ab38..788d4d5a084c 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -50,6 +50,7 @@ struct mptcp_sock {
 	spinlock_t	conn_list_lock;
 	struct hlist_head conn_list;
 	struct socket	*subflow; /* outgoing connect/listener/!mp_capable */
+	struct sock	*last;
 };
 
 #define mptcp_for_each_subflow(__msk, __subflow)			\
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2fbfe3e30630..5df2283de488 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -47,11 +47,19 @@ static struct sock *mptcp_subflow_get_ref(const struct mptcp_sock *msk)
 	return NULL;
 }
 
+static inline bool mptcp_skb_can_collapse_to(const struct mptcp_sock *msk,
+					     const struct sock *ssk,
+					     const struct sk_buff *skb)
+{
+	return tcp_skb_can_collapse_to(skb) && (msk->last == ssk);
+}
+
 static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
-			      struct msghdr *msg)
+			      struct msghdr *msg, int *pmss_now, int *ps_goal)
 {
+	int mss_now, avail_size, size_goal, poffset, ret;
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	int mss_now, size_goal, poffset, ret;
+	bool collapsed, can_collapse = false;
 	struct mptcp_ext *mpext = NULL;
 	struct page_frag *pfrag;
 	struct sk_buff *skb;
@@ -78,25 +86,35 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 	if (!psize)
 		return -EINVAL;
 
-	/* Mark the end of the previous write so the beginning of the
-	 * next write (with its own mptcp skb extension data) is not
-	 * collapsed.
-	 */
+	mss_now = tcp_send_mss(ssk, &size_goal, msg->msg_flags);
+	*pmss_now = mss_now;
+	*ps_goal = size_goal;
+	avail_size = size_goal;
 	skb = tcp_write_queue_tail(ssk);
-	if (skb)
-		TCP_SKB_CB(skb)->eor = 1;
+	if (skb) {
+		can_collapse = mptcp_skb_can_collapse_to(msk, ssk, skb);
+		if (!can_collapse)
+			TCP_SKB_CB(skb)->eor = 1;
+		else if (size_goal - skb->len > 0)
+			avail_size = size_goal + size_goal - skb->len;
+	}
 
-	mss_now = tcp_send_mss(ssk, &size_goal, msg->msg_flags);
-	psize = min_t(int, size_goal, psize);
+	/* be sure that we are going to allocate at most a new skb, and
+	 * tell the tcp stack to delay the push so that we can safely
+	 * access the skb after the sendpages call
+	 */
+	psize = min_t(int, avail_size, psize);
 	ret = do_tcp_sendpages(ssk, pfrag->page, poffset, psize,
 			       msg->msg_flags | MSG_SENDPAGE_NOTLAST);
 	if (ret <= 0)
 		return ret;
 
-	if (skb == tcp_write_queue_tail(ssk))
-		pr_err("no new skb %p/%p", sk, ssk);
+	collapsed = skb == tcp_write_queue_tail(ssk);
+	BUG_ON(collapsed && !can_collapse);
 
 	skb = tcp_write_queue_tail(ssk);
+	if (collapsed)
+		goto out;
 
 	mpext = skb_ext_add(skb, SKB_EXT_MPTCP);
 
@@ -109,24 +127,25 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 		mpext->dsn64 = 1;
 
 		pr_debug("data_seq=%llu subflow_seq=%u data_len=%u checksum=%u, dsn64=%d",
-			 mpext->data_seq, mpext->subflow_seq, mpext->data_len,
+			 mpext->data_seq, mpext->subflow_seq, skb->len,
 			 mpext->checksum, mpext->dsn64);
 	} /* TODO: else fallback */
 
+out:
 	pfrag->offset += ret;
+	msk->last = ssk;
 	msk->write_seq += ret;
 	subflow_ctx(ssk)->rel_write_seq += ret;
 
-	tcp_push(ssk, msg->msg_flags, mss_now, tcp_sk(ssk)->nonagle, size_goal);
 	return ret;
 }
 
 static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
+	int mss_now = 0, size_goal = 0, ret = 0;
 	size_t copied = 0;
 	struct sock *ssk;
-	int ret = 0;
 
 	pr_debug("msk=%p", msk);
 	if (msk->subflow) {
@@ -153,15 +172,19 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	lock_sock(sk);
 	lock_sock(ssk);
+
 	while (msg_data_left(msg)) {
-		ret = mptcp_sendmsg_frag(sk, ssk, msg);
+		ret = mptcp_sendmsg_frag(sk, ssk, msg, &mss_now, &size_goal);
 		if (ret < 0)
 			break;
 
 		copied += ret;
 	}
-	if (copied > 0)
+	if (copied) {
 		ret = copied;
+		tcp_push(ssk, msg->msg_flags, mss_now, tcp_sk(ssk)->nonagle,
+			 size_goal);
+	}
 
 	release_sock(ssk);
 	release_sock(sk);
-- 
2.20.1


