All of lore.kernel.org
 help / color / mirror / Atom feed
* [MPTCP] Re: [PATCH net-next 1/2] Squash-to: "mptcp: allow picking different xmit subflows"
@ 2020-09-10  0:40 Mat Martineau
  0 siblings, 0 replies; 5+ messages in thread
From: Mat Martineau @ 2020-09-10  0:40 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 7305 bytes --]


On Wed, 9 Sep 2020, Paolo Abeni wrote:

> This improves the current packet scheduler for scenarios
> where the msk sockets is using different links with different
> bandwith limits.
>
> The commit comment should be changed into:
>
> """
> Update the scheduler to less trivial heuristic: cache
> the last used subflow, and try to send on it a reasonably
> long burst of data.
>
> When the burst or the subflow send space is exhausted, pick
> the subflow with the lower ratio between write space and
> send buffer - that is, the subflow with the greater relative
> amount of free space.
> """
>
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> ---
> v1 -> v2:
> - remove some unneeded magic number
> - fix potential division by 0 - Florian
>
> Note:
> - subflows are now selected when sk_stream_memory_free(ssk),
>  while before we used sk_stream_is_writeable(ssk). The current
>  choice is closer to plain TCP and IMHO more correct, but
>  even the old one should not be buggy.
>  Anyhow I saw reproducible write error (EAGAIN) when using
>  sk_stream_is_writeable() with this strategy, and I was unable
>  to track them down, so possibly there is some other bug there
>  I could not find.
> - simuilt xmit self-tests are still failing, even if we are now
>  closer to success. The failures are now caused by the high delay
>  configured on the virtual link.
>  This problem would be addressed by a larger initial send buffer,
>  but the current xmit path is unable to cope with that correctly:
>  with a larger send buffer it will send pkts outside the MPTCP
>  window.
>  To properly address the above we additionally need something alike
>  "mptcp: keep track of receivers advertised window" from Florian,
>  plus the ability to enqueue data at the MPTCP level and send it
>  when the MPTCP level window is open.

Is this what the tcp_cleanup_rbuf() patch helps with?

I do like this approach better for the default scheduler, but hopefully 
some of our community members who wrote graduate theses on this kind of 
thing can give a more informed opinion at the meeting tomorrow :)


Mat


> ---
> net/mptcp/protocol.c | 107 ++++++++++++++++++++-----------------------
> 1 file changed, 49 insertions(+), 58 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index eb833ecbba9e..7469e89b3d7a 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1063,26 +1063,25 @@ static bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
> 	return ((1 << ssk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT));
> }
>
> -#define MPTCP_SEND_BURST_SIZE		(1 << 15)
> +#define MPTCP_SEND_BURST_SIZE		((1 << 16) - \
> +					 sizeof(struct tcphdr) - \
> +					 MAX_TCP_OPTION_SPACE - \
> +					 sizeof(struct ipv6hdr) - \
> +					 sizeof(struct frag_hdr))
>
> -static struct list_head *mptcp_next_subflow(struct mptcp_sock *msk,
> -					    struct list_head *pos,
> -					    bool wrap_around)
> -{
> -	if (list_is_last(pos, &msk->conn_list) && wrap_around)
> -		return msk->conn_list.next;
> -	return pos->next;
> -}
> +struct subflow_send_info {
> +	struct sock *ssk;
> +	uint64_t ratio;
> +};
>
> static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk,
> 					   u32 *sndbuf)
> {
> +	struct subflow_send_info send_info[2];
> 	struct mptcp_subflow_context *subflow;
> -	struct sock *next_backup = NULL;
> -	struct list_head *pos, *start;
> -	struct sock *next_ssk = NULL;
> -	bool wrap_around;
> -	bool free;
> +	int i, nr_active = 0;
> +	int64_t ratio, pace;
> +	struct sock *ssk;
>
> 	sock_owned_by_me((struct sock *)msk);
>
> @@ -1095,65 +1094,57 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk,
> 		return msk->first;
> 	}
>
> -	/* lookup the first writeable subflow and first writable back subflow
> -	 * starting from last used, with wrap-around
> -	 */
> -	if (msk->last_snd) {
> -		start = &mptcp_subflow_ctx(msk->last_snd)->node;
> -		wrap_around = true;
> -	} else {
> -		start = &msk->conn_list;
> -		wrap_around = false;
> -	}
> -	pr_debug("msk=%p start=%p pos=%p wrap_around=%d last=%p", msk, start,
> -		 mptcp_next_subflow(msk, start, wrap_around), wrap_around,
> -		 msk->last_snd);
> -	for (pos = mptcp_next_subflow(msk, start, wrap_around); pos != start;
> -	     pos = mptcp_next_subflow(msk, pos, wrap_around)) {
> -		struct sock *ssk;
> +	/* re-use last subflow, if the burst allow that */
> +	if (msk->last_snd && msk->snd_burst > 0 &&
> +	    sk_stream_memory_free(msk->last_snd) &&
> +	    mptcp_subflow_active(mptcp_subflow_ctx(msk->last_snd))) {
> +		mptcp_for_each_subflow(msk, subflow) {
> +			ssk =  mptcp_subflow_tcp_sock(subflow);
> +			*sndbuf = max(tcp_sk(ssk)->snd_wnd, *sndbuf);
> +		}
> +		return msk->last_snd;
> +	}
>
> -		subflow = list_entry(pos, struct mptcp_subflow_context, node);
> +	/* pick the subflow with the lower wmem/wspace ratio */
> +	for (i = 0; i < 2; ++i) {
> +		send_info[i].ssk = NULL;
> +		send_info[i].ratio = -1;
> +	}
> +	mptcp_for_each_subflow(msk, subflow) {
> 		ssk =  mptcp_subflow_tcp_sock(subflow);
> 		if (!mptcp_subflow_active(subflow))
> 			continue;
>
> +		nr_active += !subflow->backup;
> 		*sndbuf = max(tcp_sk(ssk)->snd_wnd, *sndbuf);
> -		if (next_backup || next_ssk)
> +		if (!sk_stream_memory_free(subflow->tcp_sock))
> 			continue;
>
> -		free = sk_stream_is_writeable(subflow->tcp_sock);
> -		if (!free)
> +		pace = READ_ONCE(ssk->sk_pacing_rate);
> +		if (!pace)
> 			continue;
>
> -		if (!subflow->backup && !next_ssk)
> -			next_ssk = ssk;
> -
> -		if (subflow->backup && !next_backup)
> -			next_backup = ssk;
> +		ratio = (int64_t)READ_ONCE(ssk->sk_wmem_queued) << 32 / pace;
> +		if (ratio < send_info[subflow->backup].ratio) {
> +			send_info[subflow->backup].ssk = ssk;
> +			send_info[subflow->backup].ratio = ratio;
> +		}
> 	}
> -	if (!next_ssk)
> -		next_ssk = next_backup;
>
> -	if (msk->last_snd) {
> -		*sndbuf = max(tcp_sk(msk->last_snd)->snd_wnd, *sndbuf);
> -		free = sk_stream_memory_free(msk->last_snd);
> -	} else {
> -		free = false;
> -	}
> -	pr_debug("msk=%p ssk=%p:%d last=%p last free=%d burst=%d", msk,
> -		 next_ssk, next_ssk ? sk_stream_wspace(next_ssk) : 0,
> -		 msk->last_snd, free, msk->snd_burst);
> +	/* pick the best backup if no other subflow is active */
> +	if (!nr_active)
> +		send_info[0].ssk = send_info[1].ssk;
>
> -	/* use the looked-up subflow if the previusly one has exauted the burst
> -	 * or is not writable
> -	 */
> -	if (next_ssk && (!free || msk->snd_burst <= 0)) {
> -		msk->last_snd = next_ssk;
> +	pr_debug("msk=%p nr_active=%d ssk=%p:%lld backup=%p:%lld",
> +		 msk, nr_active, send_info[0].ssk, send_info[0].ratio,
> +		 send_info[1].ssk, send_info[1].ratio);
> +	if (send_info[0].ssk) {
> +		msk->last_snd = send_info[0].ssk;
> 		msk->snd_burst = min_t(int, MPTCP_SEND_BURST_SIZE,
> -				       sk_stream_wspace(next_ssk));
> -		free = true;
> +				       sk_stream_wspace(msk->last_snd));
> +		return msk->last_snd;
> 	}
> -	return free ? msk->last_snd : NULL;
> +	return NULL;
> }
>
> static void ssk_check_wmem(struct mptcp_sock *msk)
> -- 
> 2.26.2

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [MPTCP] Re: [PATCH net-next 1/2] Squash-to: "mptcp: allow picking different xmit subflows"
@ 2020-09-11 10:57 Matthieu Baerts
  0 siblings, 0 replies; 5+ messages in thread
From: Matthieu Baerts @ 2020-09-11 10:57 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 1917 bytes --]

Hi Paolo,

On 11/09/2020 12:36, Matthieu Baerts wrote:
> Hi Paolo,
> 
> Thank you for the patch!
> 
> On 09/09/2020 15:52, Paolo Abeni wrote:
>> This improves the current packet scheduler for scenarios
>> where the msk sockets is using different links with different
>> bandwith limits.
>>
>> The commit comment should be changed into:
>>
>> """
>> Update the scheduler to less trivial heuristic: cache
>> the last used subflow, and try to send on it a reasonably
>> long burst of data.
>>
>> When the burst or the subflow send space is exhausted, pick
>> the subflow with the lower ratio between write space and
>> send buffer - that is, the subflow with the greater relative
>> amount of free space.
>> """
>>
>> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> 
> (...)
> 
>> +    /* pick the best backup if no other subflow is active */
>> +    if (!nr_active)
>> +        send_info[0].ssk = send_info[1].ssk;
>> -    /* use the looked-up subflow if the previusly one has exauted the 
>> burst
>> -     * or is not writable
>> -     */
>> -    if (next_ssk && (!free || msk->snd_burst <= 0)) {
>> -        msk->last_snd = next_ssk;
>> +    pr_debug("msk=%p nr_active=%d ssk=%p:%lld backup=%p:%lld",
>> +         msk, nr_active, send_info[0].ssk, send_info[0].ratio,
>> +         send_info[1].ssk, send_info[1].ratio);
> 
> We could maybe have this pr_debug before "if (!nr_active)" just above to 
> avoid confusions but that's just a debug print. For me we can apply it 
> like that. I can also move this pr_debug when applying if you want to.

I did this small notification there:

- 88d1a56e2c6f: mptcp: move pr_debug before changes
- 2e7e0cdd7acd..2998848b5594: result

Feel free to revert if you prefer the other way!

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [MPTCP] Re: [PATCH net-next 1/2] Squash-to: "mptcp: allow picking different xmit subflows"
@ 2020-09-11 10:48 Matthieu Baerts
  0 siblings, 0 replies; 5+ messages in thread
From: Matthieu Baerts @ 2020-09-11 10:48 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 1082 bytes --]

Hi Paolo, Mat, Florian,

On 09/09/2020 15:52, Paolo Abeni wrote:
> This improves the current packet scheduler for scenarios
> where the msk sockets is using different links with different
> bandwith limits.
> 
> The commit comment should be changed into:
> 
> """
> Update the scheduler to less trivial heuristic: cache
> the last used subflow, and try to send on it a reasonably
> long burst of data.
> 
> When the burst or the subflow send space is exhausted, pick
> the subflow with the lower ratio between write space and
> send buffer - that is, the subflow with the greater relative
> amount of free space.
> """
> 
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>

Thank you for the patch and the reviews!

- 145c1405c183: "squashed" patch 1/2 in "mptcp: allow picking different 
xmit subflows"
- a2332e5650da: tg:msg: explain new heuristic
- b5e4b1109338..4822f58422e9: result

Tests + export will be started soon (after having applied your other 
patches)

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [MPTCP] Re: [PATCH net-next 1/2] Squash-to: "mptcp: allow picking different xmit subflows"
@ 2020-09-11 10:36 Matthieu Baerts
  0 siblings, 0 replies; 5+ messages in thread
From: Matthieu Baerts @ 2020-09-11 10:36 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 1973 bytes --]

Hi Paolo,

Thank you for the patch!

On 09/09/2020 15:52, Paolo Abeni wrote:
> This improves the current packet scheduler for scenarios
> where the msk sockets is using different links with different
> bandwith limits.
> 
> The commit comment should be changed into:
> 
> """
> Update the scheduler to less trivial heuristic: cache
> the last used subflow, and try to send on it a reasonably
> long burst of data.
> 
> When the burst or the subflow send space is exhausted, pick
> the subflow with the lower ratio between write space and
> send buffer - that is, the subflow with the greater relative
> amount of free space.
> """
> 
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>

(...)

> +	/* pick the best backup if no other subflow is active */
> +	if (!nr_active)
> +		send_info[0].ssk = send_info[1].ssk;
>   
> -	/* use the looked-up subflow if the previusly one has exauted the burst
> -	 * or is not writable
> -	 */
> -	if (next_ssk && (!free || msk->snd_burst <= 0)) {
> -		msk->last_snd = next_ssk;
> +	pr_debug("msk=%p nr_active=%d ssk=%p:%lld backup=%p:%lld",
> +		 msk, nr_active, send_info[0].ssk, send_info[0].ratio,
> +		 send_info[1].ssk, send_info[1].ratio);

We could maybe have this pr_debug before "if (!nr_active)" just above to 
avoid confusions but that's just a debug print. For me we can apply it 
like that. I can also move this pr_debug when applying if you want to.

The rest is (very) good to me!

Cheers,
Matt

> +	if (send_info[0].ssk) {
> +		msk->last_snd = send_info[0].ssk;
>   		msk->snd_burst = min_t(int, MPTCP_SEND_BURST_SIZE,
> -				       sk_stream_wspace(next_ssk));
> -		free = true;
> +				       sk_stream_wspace(msk->last_snd));
> +		return msk->last_snd;
>   	}
> -	return free ? msk->last_snd : NULL;
> +	return NULL;
>   }
>   
>   static void ssk_check_wmem(struct mptcp_sock *msk)
> 

-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [MPTCP] Re: [PATCH net-next 1/2] Squash-to: "mptcp: allow picking different xmit subflows"
@ 2020-09-10  8:38 Paolo Abeni
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2020-09-10  8:38 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 2506 bytes --]

On Wed, 2020-09-09 at 17:40 -0700, Mat Martineau wrote:
> On Wed, 9 Sep 2020, Paolo Abeni wrote:
> 
> > This improves the current packet scheduler for scenarios
> > where the msk sockets is using different links with different
> > bandwith limits.
> > 
> > The commit comment should be changed into:
> > 
> > """
> > Update the scheduler to less trivial heuristic: cache
> > the last used subflow, and try to send on it a reasonably
> > long burst of data.
> > 
> > When the burst or the subflow send space is exhausted, pick
> > the subflow with the lower ratio between write space and
> > send buffer - that is, the subflow with the greater relative
> > amount of free space.
> > """
> > 
> > Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> > ---
> > v1 -> v2:
> > - remove some unneeded magic number
> > - fix potential division by 0 - Florian
> > 
> > Note:
> > - subflows are now selected when sk_stream_memory_free(ssk),
> >  while before we used sk_stream_is_writeable(ssk). The current
> >  choice is closer to plain TCP and IMHO more correct, but
> >  even the old one should not be buggy.
> >  Anyhow I saw reproducible write error (EAGAIN) when using
> >  sk_stream_is_writeable() with this strategy, and I was unable
> >  to track them down, so possibly there is some other bug there
> >  I could not find.
> > - simuilt xmit self-tests are still failing, even if we are now
> >  closer to success. The failures are now caused by the high delay
> >  configured on the virtual link.
> >  This problem would be addressed by a larger initial send buffer,
> >  but the current xmit path is unable to cope with that correctly:
> >  with a larger send buffer it will send pkts outside the MPTCP
> >  window.
> >  To properly address the above we additionally need something alike
> >  "mptcp: keep track of receivers advertised window" from Florian,
> >  plus the ability to enqueue data at the MPTCP level and send it
> >  when the MPTCP level window is open.
> 
> Is this what the tcp_cleanup_rbuf() patch helps with?

There are 2 issues:

1) very long delay introduced by missing acks, addressed by
the tcp_cleanup_rbuf() patch

2) inability of current MPTCP xmit path code to handle with large
sndbuf - specifically larger then the send window. This latter needs
quite a bit of rework - which is in progress - comprising Florian's
snd_wnd patches and implementing the mptcp equivalent of
tcp_push_pending_frames()

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-09-11 10:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-10  0:40 [MPTCP] Re: [PATCH net-next 1/2] Squash-to: "mptcp: allow picking different xmit subflows" Mat Martineau
2020-09-10  8:38 Paolo Abeni
2020-09-11 10:36 Matthieu Baerts
2020-09-11 10:48 Matthieu Baerts
2020-09-11 10:57 Matthieu Baerts

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.