* [MPTCP] Re: [PATCH net-next 1/2] Squash-to: "mptcp: allow picking different xmit subflows"
@ 2020-09-10 0:40 Mat Martineau
0 siblings, 0 replies; 5+ messages in thread
From: Mat Martineau @ 2020-09-10 0:40 UTC (permalink / raw)
To: mptcp
On Wed, 9 Sep 2020, Paolo Abeni wrote:
> This improves the current packet scheduler for scenarios
> where the msk socket is using different links with different
> bandwidth limits.
>
> The commit comment should be changed into:
>
> """
> Update the scheduler to a less trivial heuristic: cache
> the last used subflow, and try to send a reasonably
> long burst of data on it.
>
> When the burst or the subflow send space is exhausted, pick
> the subflow with the lower ratio between write space and
> send buffer - that is, the subflow with the greater relative
> amount of free space.
> """
>
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> ---
> v1 -> v2:
> - remove some unneeded magic number
> - fix potential division by 0 - Florian
>
> Note:
> - subflows are now selected when sk_stream_memory_free(ssk),
> while before we used sk_stream_is_writeable(ssk). The current
> choice is closer to plain TCP and IMHO more correct, but
> even the old one should not be buggy.
> Anyhow I saw reproducible write errors (EAGAIN) when using
> sk_stream_is_writeable() with this strategy, and I was unable
> to track them down, so possibly there is some other bug there
> I could not find.
> - the simult xmit self-tests are still failing, even if we are now
> closer to success. The failures are now caused by the high delay
> configured on the virtual link.
> This problem would be addressed by a larger initial send buffer,
> but the current xmit path is unable to cope with that correctly:
> with a larger send buffer it will send pkts outside the MPTCP
> window.
> To properly address the above we additionally need something like
> "mptcp: keep track of receivers advertised window" from Florian,
> plus the ability to enqueue data at the MPTCP level and send it
> when the MPTCP level window is open.
Is this what the tcp_cleanup_rbuf() patch helps with?
I do like this approach better for the default scheduler, but hopefully
some of our community members who wrote graduate theses on this kind of
thing can give a more informed opinion at the meeting tomorrow :)
Mat
> ---
> net/mptcp/protocol.c | 107 ++++++++++++++++++++-----------------------
> 1 file changed, 49 insertions(+), 58 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index eb833ecbba9e..7469e89b3d7a 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1063,26 +1063,25 @@ static bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
> return ((1 << ssk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT));
> }
>
> -#define MPTCP_SEND_BURST_SIZE (1 << 15)
> +#define MPTCP_SEND_BURST_SIZE ((1 << 16) - \
> + sizeof(struct tcphdr) - \
> + MAX_TCP_OPTION_SPACE - \
> + sizeof(struct ipv6hdr) - \
> + sizeof(struct frag_hdr))
>
> -static struct list_head *mptcp_next_subflow(struct mptcp_sock *msk,
> - struct list_head *pos,
> - bool wrap_around)
> -{
> - if (list_is_last(pos, &msk->conn_list) && wrap_around)
> - return msk->conn_list.next;
> - return pos->next;
> -}
> +struct subflow_send_info {
> + struct sock *ssk;
> + uint64_t ratio;
> +};
>
> static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk,
> u32 *sndbuf)
> {
> + struct subflow_send_info send_info[2];
> struct mptcp_subflow_context *subflow;
> - struct sock *next_backup = NULL;
> - struct list_head *pos, *start;
> - struct sock *next_ssk = NULL;
> - bool wrap_around;
> - bool free;
> + int i, nr_active = 0;
> + int64_t ratio, pace;
> + struct sock *ssk;
>
> sock_owned_by_me((struct sock *)msk);
>
> @@ -1095,65 +1094,57 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk,
> return msk->first;
> }
>
> - /* lookup the first writeable subflow and first writable back subflow
> - * starting from last used, with wrap-around
> - */
> - if (msk->last_snd) {
> - start = &mptcp_subflow_ctx(msk->last_snd)->node;
> - wrap_around = true;
> - } else {
> - start = &msk->conn_list;
> - wrap_around = false;
> - }
> - pr_debug("msk=%p start=%p pos=%p wrap_around=%d last=%p", msk, start,
> - mptcp_next_subflow(msk, start, wrap_around), wrap_around,
> - msk->last_snd);
> - for (pos = mptcp_next_subflow(msk, start, wrap_around); pos != start;
> - pos = mptcp_next_subflow(msk, pos, wrap_around)) {
> - struct sock *ssk;
> + /* re-use last subflow, if the burst allows that */
> + if (msk->last_snd && msk->snd_burst > 0 &&
> + sk_stream_memory_free(msk->last_snd) &&
> + mptcp_subflow_active(mptcp_subflow_ctx(msk->last_snd))) {
> + mptcp_for_each_subflow(msk, subflow) {
> + ssk = mptcp_subflow_tcp_sock(subflow);
> + *sndbuf = max(tcp_sk(ssk)->snd_wnd, *sndbuf);
> + }
> + return msk->last_snd;
> + }
>
> - subflow = list_entry(pos, struct mptcp_subflow_context, node);
> + /* pick the subflow with the lower wmem/wspace ratio */
> + for (i = 0; i < 2; ++i) {
> + send_info[i].ssk = NULL;
> + send_info[i].ratio = -1;
> + }
> + mptcp_for_each_subflow(msk, subflow) {
> ssk = mptcp_subflow_tcp_sock(subflow);
> if (!mptcp_subflow_active(subflow))
> continue;
>
> + nr_active += !subflow->backup;
> *sndbuf = max(tcp_sk(ssk)->snd_wnd, *sndbuf);
> - if (next_backup || next_ssk)
> + if (!sk_stream_memory_free(subflow->tcp_sock))
> continue;
>
> - free = sk_stream_is_writeable(subflow->tcp_sock);
> - if (!free)
> + pace = READ_ONCE(ssk->sk_pacing_rate);
> + if (!pace)
> continue;
>
> - if (!subflow->backup && !next_ssk)
> - next_ssk = ssk;
> -
> - if (subflow->backup && !next_backup)
> - next_backup = ssk;
> + ratio = div_u64((u64)READ_ONCE(ssk->sk_wmem_queued) << 32, pace);
> + if (ratio < send_info[subflow->backup].ratio) {
> + send_info[subflow->backup].ssk = ssk;
> + send_info[subflow->backup].ratio = ratio;
> + }
> }
> - if (!next_ssk)
> - next_ssk = next_backup;
>
> - if (msk->last_snd) {
> - *sndbuf = max(tcp_sk(msk->last_snd)->snd_wnd, *sndbuf);
> - free = sk_stream_memory_free(msk->last_snd);
> - } else {
> - free = false;
> - }
> - pr_debug("msk=%p ssk=%p:%d last=%p last free=%d burst=%d", msk,
> - next_ssk, next_ssk ? sk_stream_wspace(next_ssk) : 0,
> - msk->last_snd, free, msk->snd_burst);
> + /* pick the best backup if no other subflow is active */
> + if (!nr_active)
> + send_info[0].ssk = send_info[1].ssk;
>
> - /* use the looked-up subflow if the previusly one has exauted the burst
> - * or is not writable
> - */
> - if (next_ssk && (!free || msk->snd_burst <= 0)) {
> - msk->last_snd = next_ssk;
> + pr_debug("msk=%p nr_active=%d ssk=%p:%lld backup=%p:%lld",
> + msk, nr_active, send_info[0].ssk, send_info[0].ratio,
> + send_info[1].ssk, send_info[1].ratio);
> + if (send_info[0].ssk) {
> + msk->last_snd = send_info[0].ssk;
> msk->snd_burst = min_t(int, MPTCP_SEND_BURST_SIZE,
> - sk_stream_wspace(next_ssk));
> - free = true;
> + sk_stream_wspace(msk->last_snd));
> + return msk->last_snd;
> }
> - return free ? msk->last_snd : NULL;
> + return NULL;
> }
>
> static void ssk_check_wmem(struct mptcp_sock *msk)
> --
> 2.26.2
--
Mat Martineau
Intel
* [MPTCP] Re: [PATCH net-next 1/2] Squash-to: "mptcp: allow picking different xmit subflows"
@ 2020-09-11 10:57 Matthieu Baerts
From: Matthieu Baerts @ 2020-09-11 10:57 UTC (permalink / raw)
To: mptcp
Hi Paolo,
On 11/09/2020 12:36, Matthieu Baerts wrote:
> Hi Paolo,
>
> Thank you for the patch!
>
> On 09/09/2020 15:52, Paolo Abeni wrote:
>> This improves the current packet scheduler for scenarios
>> where the msk socket is using different links with different
>> bandwidth limits.
>>
>> The commit comment should be changed into:
>>
>> """
>> Update the scheduler to a less trivial heuristic: cache
>> the last used subflow, and try to send a reasonably
>> long burst of data on it.
>>
>> When the burst or the subflow send space is exhausted, pick
>> the subflow with the lower ratio between write space and
>> send buffer - that is, the subflow with the greater relative
>> amount of free space.
>> """
>>
>> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
>
> (...)
>
>> + /* pick the best backup if no other subflow is active */
>> + if (!nr_active)
>> + send_info[0].ssk = send_info[1].ssk;
>> - /* use the looked-up subflow if the previusly one has exauted the
>> burst
>> - * or is not writable
>> - */
>> - if (next_ssk && (!free || msk->snd_burst <= 0)) {
>> - msk->last_snd = next_ssk;
>> + pr_debug("msk=%p nr_active=%d ssk=%p:%lld backup=%p:%lld",
>> + msk, nr_active, send_info[0].ssk, send_info[0].ratio,
>> + send_info[1].ssk, send_info[1].ratio);
>
> We could maybe move this pr_debug() before the "if (!nr_active)" just
> above to avoid confusion, but that's just a debug print. For me we can
> apply it as is. I can also move this pr_debug() when applying if you want.
I did this small modification there:
- 88d1a56e2c6f: mptcp: move pr_debug before changes
- 2e7e0cdd7acd..2998848b5594: result
Feel free to revert if you prefer the other way!
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
* [MPTCP] Re: [PATCH net-next 1/2] Squash-to: "mptcp: allow picking different xmit subflows"
@ 2020-09-11 10:48 Matthieu Baerts
From: Matthieu Baerts @ 2020-09-11 10:48 UTC (permalink / raw)
To: mptcp
Hi Paolo, Mat, Florian,
On 09/09/2020 15:52, Paolo Abeni wrote:
> This improves the current packet scheduler for scenarios
> where the msk socket is using different links with different
> bandwidth limits.
>
> The commit comment should be changed into:
>
> """
> Update the scheduler to a less trivial heuristic: cache
> the last used subflow, and try to send a reasonably
> long burst of data on it.
>
> When the burst or the subflow send space is exhausted, pick
> the subflow with the lower ratio between write space and
> send buffer - that is, the subflow with the greater relative
> amount of free space.
> """
>
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Thank you for the patch and the reviews!
- 145c1405c183: "squashed" patch 1/2 in "mptcp: allow picking different
xmit subflows"
- a2332e5650da: tg:msg: explain new heuristic
- b5e4b1109338..4822f58422e9: result
Tests + export will be started soon (after having applied your other
patches)
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
* [MPTCP] Re: [PATCH net-next 1/2] Squash-to: "mptcp: allow picking different xmit subflows"
@ 2020-09-11 10:36 Matthieu Baerts
From: Matthieu Baerts @ 2020-09-11 10:36 UTC (permalink / raw)
To: mptcp
Hi Paolo,
Thank you for the patch!
On 09/09/2020 15:52, Paolo Abeni wrote:
> This improves the current packet scheduler for scenarios
> where the msk socket is using different links with different
> bandwidth limits.
>
> The commit comment should be changed into:
>
> """
> Update the scheduler to a less trivial heuristic: cache
> the last used subflow, and try to send a reasonably
> long burst of data on it.
>
> When the burst or the subflow send space is exhausted, pick
> the subflow with the lower ratio between write space and
> send buffer - that is, the subflow with the greater relative
> amount of free space.
> """
>
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
(...)
> + /* pick the best backup if no other subflow is active */
> + if (!nr_active)
> + send_info[0].ssk = send_info[1].ssk;
>
> - /* use the looked-up subflow if the previusly one has exauted the burst
> - * or is not writable
> - */
> - if (next_ssk && (!free || msk->snd_burst <= 0)) {
> - msk->last_snd = next_ssk;
> + pr_debug("msk=%p nr_active=%d ssk=%p:%lld backup=%p:%lld",
> + msk, nr_active, send_info[0].ssk, send_info[0].ratio,
> + send_info[1].ssk, send_info[1].ratio);
We could maybe move this pr_debug() before the "if (!nr_active)" just
above to avoid confusion, but that's just a debug print. For me we can
apply it as is. I can also move this pr_debug() when applying if you want.
The rest is (very) good to me!
Cheers,
Matt
> + if (send_info[0].ssk) {
> + msk->last_snd = send_info[0].ssk;
> msk->snd_burst = min_t(int, MPTCP_SEND_BURST_SIZE,
> - sk_stream_wspace(next_ssk));
> - free = true;
> + sk_stream_wspace(msk->last_snd));
> + return msk->last_snd;
> }
> - return free ? msk->last_snd : NULL;
> + return NULL;
> }
>
> static void ssk_check_wmem(struct mptcp_sock *msk)
>
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
* [MPTCP] Re: [PATCH net-next 1/2] Squash-to: "mptcp: allow picking different xmit subflows"
@ 2020-09-10 8:38 Paolo Abeni
From: Paolo Abeni @ 2020-09-10 8:38 UTC (permalink / raw)
To: mptcp
On Wed, 2020-09-09 at 17:40 -0700, Mat Martineau wrote:
> On Wed, 9 Sep 2020, Paolo Abeni wrote:
>
> > This improves the current packet scheduler for scenarios
> > where the msk socket is using different links with different
> > bandwidth limits.
> >
> > The commit comment should be changed into:
> >
> > """
> > Update the scheduler to a less trivial heuristic: cache
> > the last used subflow, and try to send a reasonably
> > long burst of data on it.
> >
> > When the burst or the subflow send space is exhausted, pick
> > the subflow with the lower ratio between write space and
> > send buffer - that is, the subflow with the greater relative
> > amount of free space.
> > """
> >
> > Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> > ---
> > v1 -> v2:
> > - remove some unneeded magic number
> > - fix potential division by 0 - Florian
> >
> > Note:
> > - subflows are now selected when sk_stream_memory_free(ssk),
> > while before we used sk_stream_is_writeable(ssk). The current
> > choice is closer to plain TCP and IMHO more correct, but
> > even the old one should not be buggy.
> > Anyhow I saw reproducible write errors (EAGAIN) when using
> > sk_stream_is_writeable() with this strategy, and I was unable
> > to track them down, so possibly there is some other bug there
> > I could not find.
> > - the simult xmit self-tests are still failing, even if we are now
> > closer to success. The failures are now caused by the high delay
> > configured on the virtual link.
> > This problem would be addressed by a larger initial send buffer,
> > but the current xmit path is unable to cope with that correctly:
> > with a larger send buffer it will send pkts outside the MPTCP
> > window.
> > To properly address the above we additionally need something like
> > "mptcp: keep track of receivers advertised window" from Florian,
> > plus the ability to enqueue data at the MPTCP level and send it
> > when the MPTCP level window is open.
>
> Is this what the tcp_cleanup_rbuf() patch helps with?
There are 2 issues:
1) a very long delay introduced by missing acks, addressed by
the tcp_cleanup_rbuf() patch
2) the inability of the current MPTCP xmit path to handle a large
sndbuf - specifically, one larger than the send window. The latter
needs quite a bit of rework - which is in progress - comprising
Florian's snd_wnd patches and implementing the MPTCP equivalent of
tcp_push_pending_frames()
Cheers,
Paolo
Thread overview: 5+ messages
2020-09-10 0:40 [MPTCP] Re: [PATCH net-next 1/2] Squash-to: "mptcp: allow picking different xmit subflows" Mat Martineau
2020-09-10 8:38 Paolo Abeni
2020-09-11 10:36 Matthieu Baerts
2020-09-11 10:48 Matthieu Baerts
2020-09-11 10:57 Matthieu Baerts