* [PATCH mptcp-net 0/2] mptcp: fix MPJ 3rd ack retransmissions
@ 2021-10-28 13:08 Paolo Abeni
2021-10-28 13:08 ` [PATCH mptcp-net 1/2] mptcp: fix delack timer Paolo Abeni
2021-10-28 13:08 ` [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans Paolo Abeni
0 siblings, 2 replies; 5+ messages in thread
From: Paolo Abeni @ 2021-10-28 13:08 UTC (permalink / raw)
To: mptcp
The first patch is from Eric, thus is correct ;)
The 2nd one is much more doubtful, starting from the author ;)
The individual commit messages should be descriptive enough. With these
2 patches the related, pending pktdrill test:
https://github.com/multipath-tcp/packetdrill/pull/70
now complete successfully
Eric Dumazet (1):
mptcp: fix delack timer
Paolo Abeni (1):
mptcp: use delegate action to schedule 3rd ack retrans
net/mptcp/options.c | 32 ++++++++--------------------
net/mptcp/protocol.c | 50 ++++++++++++++++++++++++++++++++++++--------
net/mptcp/protocol.h | 17 ++++++++-------
3 files changed, 59 insertions(+), 40 deletions(-)
--
2.26.3
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH mptcp-net 1/2] mptcp: fix delack timer
2021-10-28 13:08 [PATCH mptcp-net 0/2] mptcp: fix MPJ 3rd ack retransmissions Paolo Abeni
@ 2021-10-28 13:08 ` Paolo Abeni
2021-10-28 13:08 ` [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans Paolo Abeni
1 sibling, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2021-10-28 13:08 UTC (permalink / raw)
To: mptcp
From: Eric Dumazet <edumazet@google.com>
To compute the rtx timeout schedule_3rdack_retransmission() does multiple
things in the wrong way: srtt_us is measures in usec/8 and the timeout
itself is an absolute vale.
Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
commit message typos added by me
---
net/mptcp/options.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 422f4acfb3e6..465d4143d555 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -434,9 +434,10 @@ static void schedule_3rdack_retransmission(struct sock *sk)
/* reschedule with a timeout above RTT, as we must look only for drop */
if (tp->srtt_us)
- timeout = tp->srtt_us << 1;
+ timeout = usecs_to_jiffies(tp->srtt_us >> (3-1));
else
timeout = TCP_TIMEOUT_INIT;
+ timeout += jiffies;
WARN_ON_ONCE(icsk->icsk_ack.pending & ICSK_ACK_TIMER);
icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER;
--
2.26.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans
2021-10-28 13:08 [PATCH mptcp-net 0/2] mptcp: fix MPJ 3rd ack retransmissions Paolo Abeni
2021-10-28 13:08 ` [PATCH mptcp-net 1/2] mptcp: fix delack timer Paolo Abeni
@ 2021-10-28 13:08 ` Paolo Abeni
2021-10-29 23:36 ` Mat Martineau
1 sibling, 1 reply; 5+ messages in thread
From: Paolo Abeni @ 2021-10-28 13:08 UTC (permalink / raw)
To: mptcp
Scheduling a delack in mptcp_established_options_mp() is
not a good idea: such function is called by tcp_send_ack() and
the pending delayed ack will be cleared shortly after by the
tcp_event_ack_sent() call in __tcp_transmit_skb().
Instead use the mptcp delegated action infrastructure to
schedule the delayed ack after the current bh processing completes.
Additionally moves the schedule_3rdack_retransmission() helper
into protocol.c to avoid making it visible in a different compilation
unit.
Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
side note:
Why we already see 3rd ack retransimssion in pcap traces even if the delack timer
was broken sice the beginning?!? I guess that is due to pure ack generated
by mptcp_recvmsg() and pushed on all the subflow. mptcp_send_ack() only
checks for the TCP status and not the MPTCP one. When landing on MPJ
subflows not already fully established, such data-ack will be converted in MPJ
3rd ack by the MPTCP output path/out options building, as the MPJ option
has higher priority than DSS
---
net/mptcp/options.c | 33 ++++++++---------------------
net/mptcp/protocol.c | 50 ++++++++++++++++++++++++++++++++++++--------
net/mptcp/protocol.h | 17 ++++++++-------
3 files changed, 59 insertions(+), 41 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 465d4143d555..12ffee025373 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -422,29 +422,6 @@ bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
return false;
}
-/* MP_JOIN client subflow must wait for 4th ack before sending any data:
- * TCP can't schedule delack timer before the subflow is fully established.
- * MPTCP uses the delack timer to do 3rd ack retransmissions
- */
-static void schedule_3rdack_retransmission(struct sock *sk)
-{
- struct inet_connection_sock *icsk = inet_csk(sk);
- struct tcp_sock *tp = tcp_sk(sk);
- unsigned long timeout;
-
- /* reschedule with a timeout above RTT, as we must look only for drop */
- if (tp->srtt_us)
- timeout = usecs_to_jiffies(tp->srtt_us >> (3-1));
- else
- timeout = TCP_TIMEOUT_INIT;
- timeout += jiffies;
-
- WARN_ON_ONCE(icsk->icsk_ack.pending & ICSK_ACK_TIMER);
- icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER;
- icsk->icsk_ack.timeout = timeout;
- sk_reset_timer(sk, &icsk->icsk_delack_timer, timeout);
-}
-
static void clear_3rdack_retransmission(struct sock *sk)
{
struct inet_connection_sock *icsk = inet_csk(sk);
@@ -527,7 +504,15 @@ static bool mptcp_established_options_mp(struct sock *sk, struct sk_buff *skb,
*size = TCPOLEN_MPTCP_MPJ_ACK;
pr_debug("subflow=%p", subflow);
- schedule_3rdack_retransmission(sk);
+ /* we can use the full delegate action helper only from BH context
+ * If we are in process context - sk is flushing the backlog at
+ * socket lock release time - just set the appropriate flag, will
+ * be handled by the release callback
+ */
+ if (sock_owned_by_user(sk))
+ set_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status);
+ else
+ mptcp_subflow_delegate(subflow, MPTCP_DELEGATE_ACK);
return true;
}
return false;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 582f5f88d6ef..cd00f8998375 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1619,7 +1619,7 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk)
if (!xmit_ssk)
goto out;
if (xmit_ssk != ssk) {
- mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk));
+ mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk), MPTCP_DELEGATE_SEND);
goto out;
}
@@ -2972,7 +2972,7 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk)
if (xmit_ssk == ssk)
__mptcp_subflow_push_pending(sk, ssk);
else if (xmit_ssk)
- mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk));
+ mptcp_subflow_delegate(mptcp_subflow_ctx(xmit_ssk), MPTCP_DELEGATE_SEND);
} else {
set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
}
@@ -3022,18 +3022,50 @@ static void mptcp_release_cb(struct sock *sk)
__mptcp_update_rmem(sk);
}
+/* MP_JOIN client subflow must wait for 4th ack before sending any data:
+ * TCP can't schedule delack timer before the subflow is fully established.
+ * MPTCP uses the delack timer to do 3rd ack retransmissions
+ */
+static void schedule_3rdack_retransmission(struct sock *ssk)
+{
+ struct inet_connection_sock *icsk = inet_csk(ssk);
+ struct tcp_sock *tp = tcp_sk(ssk);
+ unsigned long timeout;
+
+ if (mptcp_subflow_ctx(ssk)->fully_established)
+ return;
+
+ /* reschedule with a timeout above RTT, as we must look only for drop */
+ if (tp->srtt_us)
+ timeout = usecs_to_jiffies(tp->srtt_us >> (3-1));
+ else
+ timeout = TCP_TIMEOUT_INIT;
+ timeout += jiffies;
+
+ WARN_ON_ONCE(icsk->icsk_ack.pending & ICSK_ACK_TIMER);
+ icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER;
+ icsk->icsk_ack.timeout = timeout;
+ sk_reset_timer(ssk, &icsk->icsk_delack_timer, timeout);
+}
+
void mptcp_subflow_process_delegated(struct sock *ssk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
struct sock *sk = subflow->conn;
- mptcp_data_lock(sk);
- if (!sock_owned_by_user(sk))
- __mptcp_subflow_push_pending(sk, ssk);
- else
- set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
- mptcp_data_unlock(sk);
- mptcp_subflow_delegated_done(subflow);
+ if (test_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status)) {
+ mptcp_data_lock(sk);
+ if (!sock_owned_by_user(sk))
+ __mptcp_subflow_push_pending(sk, ssk);
+ else
+ set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
+ mptcp_data_unlock(sk);
+ mptcp_subflow_delegated_done(subflow, MPTCP_DELEGATE_SEND);
+ }
+ if (test_bit(MPTCP_DELEGATE_ACK, &subflow->delegated_status)) {
+ schedule_3rdack_retransmission(ssk);
+ mptcp_subflow_delegated_done(subflow, MPTCP_DELEGATE_ACK);
+ }
}
static int mptcp_hash(struct sock *sk)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 46691acdea24..f6f4054e19ee 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -387,6 +387,7 @@ struct mptcp_delegated_action {
DECLARE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions);
#define MPTCP_DELEGATE_SEND 0
+#define MPTCP_DELEGATE_ACK 1
/* MPTCP subflow context */
struct mptcp_subflow_context {
@@ -493,23 +494,23 @@ static inline void mptcp_add_pending_subflow(struct mptcp_sock *msk,
void mptcp_subflow_process_delegated(struct sock *ssk);
-static inline void mptcp_subflow_delegate(struct mptcp_subflow_context *subflow)
+static inline void mptcp_subflow_delegate(struct mptcp_subflow_context *subflow, int action)
{
struct mptcp_delegated_action *delegated;
bool schedule;
+ /* the caller held the subflow bh socket lock */
+ lockdep_assert_in_softirq();
+
/* The implied barrier pairs with mptcp_subflow_delegated_done(), and
* ensures the below list check sees list updates done prior to status
* bit changes
*/
- if (!test_and_set_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status)) {
+ if (!test_and_set_bit(action, &subflow->delegated_status)) {
/* still on delegated list from previous scheduling */
if (!list_empty(&subflow->delegated_node))
return;
- /* the caller held the subflow bh socket lock */
- lockdep_assert_in_softirq();
-
delegated = this_cpu_ptr(&mptcp_delegated_actions);
schedule = list_empty(&delegated->head);
list_add_tail(&subflow->delegated_node, &delegated->head);
@@ -534,16 +535,16 @@ mptcp_subflow_delegated_next(struct mptcp_delegated_action *delegated)
static inline bool mptcp_subflow_has_delegated_action(const struct mptcp_subflow_context *subflow)
{
- return test_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status);
+ return !!READ_ONCE(subflow->delegated_status);
}
-static inline void mptcp_subflow_delegated_done(struct mptcp_subflow_context *subflow)
+static inline void mptcp_subflow_delegated_done(struct mptcp_subflow_context *subflow, int action)
{
/* pairs with mptcp_subflow_delegate, ensures delegate_node is updated before
* touching the status bit
*/
smp_wmb();
- clear_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status);
+ clear_bit(action, &subflow->delegated_status);
}
int mptcp_is_enabled(const struct net *net);
--
2.26.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans
2021-10-28 13:08 ` [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans Paolo Abeni
@ 2021-10-29 23:36 ` Mat Martineau
2021-11-02 17:56 ` Paolo Abeni
0 siblings, 1 reply; 5+ messages in thread
From: Mat Martineau @ 2021-10-29 23:36 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
On Thu, 28 Oct 2021, Paolo Abeni wrote:
> Scheduling a delack in mptcp_established_options_mp() is
> not a good idea: such function is called by tcp_send_ack() and
> the pending delayed ack will be cleared shortly after by the
> tcp_event_ack_sent() call in __tcp_transmit_skb().
>
> Instead use the mptcp delegated action infrastructure to
> schedule the delayed ack after the current bh processing completes.
>
> Additionally moves the schedule_3rdack_retransmission() helper
> into protocol.c to avoid making it visible in a different compilation
> unit.
>
> Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> side note:
> Why we already see 3rd ack retransimssion in pcap traces even if the delack timer
> was broken sice the beginning?!? I guess that is due to pure ack generated
> by mptcp_recvmsg() and pushed on all the subflow. mptcp_send_ack() only
> checks for the TCP status and not the MPTCP one. When landing on MPJ
> subflows not already fully established, such data-ack will be converted in MPJ
> 3rd ack by the MPTCP output path/out options building, as the MPJ option
> has higher priority than DSS
> ---
> net/mptcp/options.c | 33 ++++++++---------------------
> net/mptcp/protocol.c | 50 ++++++++++++++++++++++++++++++++++++--------
> net/mptcp/protocol.h | 17 ++++++++-------
> 3 files changed, 59 insertions(+), 41 deletions(-)
>
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index 465d4143d555..12ffee025373 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -422,29 +422,6 @@ bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
> return false;
> }
>
> -/* MP_JOIN client subflow must wait for 4th ack before sending any data:
> - * TCP can't schedule delack timer before the subflow is fully established.
> - * MPTCP uses the delack timer to do 3rd ack retransmissions
> - */
> -static void schedule_3rdack_retransmission(struct sock *sk)
> -{
> - struct inet_connection_sock *icsk = inet_csk(sk);
> - struct tcp_sock *tp = tcp_sk(sk);
> - unsigned long timeout;
> -
> - /* reschedule with a timeout above RTT, as we must look only for drop */
> - if (tp->srtt_us)
> - timeout = usecs_to_jiffies(tp->srtt_us >> (3-1));
> - else
> - timeout = TCP_TIMEOUT_INIT;
> - timeout += jiffies;
> -
> - WARN_ON_ONCE(icsk->icsk_ack.pending & ICSK_ACK_TIMER);
> - icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER;
> - icsk->icsk_ack.timeout = timeout;
> - sk_reset_timer(sk, &icsk->icsk_delack_timer, timeout);
> -}
> -
> static void clear_3rdack_retransmission(struct sock *sk)
> {
> struct inet_connection_sock *icsk = inet_csk(sk);
> @@ -527,7 +504,15 @@ static bool mptcp_established_options_mp(struct sock *sk, struct sk_buff *skb,
> *size = TCPOLEN_MPTCP_MPJ_ACK;
> pr_debug("subflow=%p", subflow);
>
> - schedule_3rdack_retransmission(sk);
> + /* we can use the full delegate action helper only from BH context
> + * If we are in process context - sk is flushing the backlog at
> + * socket lock release time - just set the appropriate flag, will
> + * be handled by the release callback
> + */
> + if (sock_owned_by_user(sk))
> + set_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status);
I believe you intended MPTCP_DELEGATE_ACK here
> + else
> + mptcp_subflow_delegate(subflow, MPTCP_DELEGATE_ACK);
> return true;
> }
> return false;
The rest looks good.
--
Mat Martineau
Intel
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans
2021-10-29 23:36 ` Mat Martineau
@ 2021-11-02 17:56 ` Paolo Abeni
0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2021-11-02 17:56 UTC (permalink / raw)
To: Mat Martineau; +Cc: mptcp
On Fri, 2021-10-29 at 16:36 -0700, Mat Martineau wrote:
> On Thu, 28 Oct 2021, Paolo Abeni wrote:
>
> > Scheduling a delack in mptcp_established_options_mp() is
> > not a good idea: such function is called by tcp_send_ack() and
> > the pending delayed ack will be cleared shortly after by the
> > tcp_event_ack_sent() call in __tcp_transmit_skb().
> >
> > Instead use the mptcp delegated action infrastructure to
> > schedule the delayed ack after the current bh processing completes.
> >
> > Additionally moves the schedule_3rdack_retransmission() helper
> > into protocol.c to avoid making it visible in a different compilation
> > unit.
> >
> > Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > side note:
> > Why we already see 3rd ack retransimssion in pcap traces even if the delack timer
> > was broken sice the beginning?!? I guess that is due to pure ack generated
> > by mptcp_recvmsg() and pushed on all the subflow. mptcp_send_ack() only
> > checks for the TCP status and not the MPTCP one. When landing on MPJ
> > subflows not already fully established, such data-ack will be converted in MPJ
> > 3rd ack by the MPTCP output path/out options building, as the MPJ option
> > has higher priority than DSS
> > ---
> > net/mptcp/options.c | 33 ++++++++---------------------
> > net/mptcp/protocol.c | 50 ++++++++++++++++++++++++++++++++++++--------
> > net/mptcp/protocol.h | 17 ++++++++-------
> > 3 files changed, 59 insertions(+), 41 deletions(-)
> >
> > diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> > index 465d4143d555..12ffee025373 100644
> > --- a/net/mptcp/options.c
> > +++ b/net/mptcp/options.c
> > @@ -422,29 +422,6 @@ bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
> > return false;
> > }
> >
> > -/* MP_JOIN client subflow must wait for 4th ack before sending any data:
> > - * TCP can't schedule delack timer before the subflow is fully established.
> > - * MPTCP uses the delack timer to do 3rd ack retransmissions
> > - */
> > -static void schedule_3rdack_retransmission(struct sock *sk)
> > -{
> > - struct inet_connection_sock *icsk = inet_csk(sk);
> > - struct tcp_sock *tp = tcp_sk(sk);
> > - unsigned long timeout;
> > -
> > - /* reschedule with a timeout above RTT, as we must look only for drop */
> > - if (tp->srtt_us)
> > - timeout = usecs_to_jiffies(tp->srtt_us >> (3-1));
> > - else
> > - timeout = TCP_TIMEOUT_INIT;
> > - timeout += jiffies;
> > -
> > - WARN_ON_ONCE(icsk->icsk_ack.pending & ICSK_ACK_TIMER);
> > - icsk->icsk_ack.pending |= ICSK_ACK_SCHED | ICSK_ACK_TIMER;
> > - icsk->icsk_ack.timeout = timeout;
> > - sk_reset_timer(sk, &icsk->icsk_delack_timer, timeout);
> > -}
> > -
> > static void clear_3rdack_retransmission(struct sock *sk)
> > {
> > struct inet_connection_sock *icsk = inet_csk(sk);
> > @@ -527,7 +504,15 @@ static bool mptcp_established_options_mp(struct sock *sk, struct sk_buff *skb,
> > *size = TCPOLEN_MPTCP_MPJ_ACK;
> > pr_debug("subflow=%p", subflow);
> >
> > - schedule_3rdack_retransmission(sk);
> > + /* we can use the full delegate action helper only from BH context
> > + * If we are in process context - sk is flushing the backlog at
> > + * socket lock release time - just set the appropriate flag, will
> > + * be handled by the release callback
> > + */
> > + if (sock_owned_by_user(sk))
> > + set_bit(MPTCP_DELEGATE_SEND, &subflow->delegated_status);
>
> I believe you intended MPTCP_DELEGATE_ACK here
Indeed! self-test did not catch the above because the mpj ssh is almost
never owned by user before being fully-acked (and generally speaking
every subflow has a very small probability of being owned by user)
I'll send a new version, thanks!
Paolo
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-11-02 17:56 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-28 13:08 [PATCH mptcp-net 0/2] mptcp: fix MPJ 3rd ack retransmissions Paolo Abeni
2021-10-28 13:08 ` [PATCH mptcp-net 1/2] mptcp: fix delack timer Paolo Abeni
2021-10-28 13:08 ` [PATCH mptcp-net 2/2] mptcp: use delegate action to schedule 3rd ack retrans Paolo Abeni
2021-10-29 23:36 ` Mat Martineau
2021-11-02 17:56 ` Paolo Abeni
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.