On Tue, 13 Jul 2021, Geliang Tang wrote:

> Hi Mat,
>
> Sorry for late response on this.
>
> Mat Martineau <mathew.j.martineau@linux.intel.com> 于2021年7月8日周四 上午7:20写道：
>>
>> On Mon, 28 Jun 2021, Geliang Tang wrote:
>>
>>> When a bad checksum is detected, send out the MP_FAIL option.
>>>
>>> When multiple subflows are in use, close the affected subflow with a RST
>>> that includes an MP_FAIL option.
>>>
>>> When a single subfow is in use, send back an MP_FAIL option on the
>>> subflow-level ACK. And the receiver of this MP_FAIL respond with an
>>> MP_FAIL in the reverse direction.
>>>
>>> Signed-off-by: Geliang Tang <geliangtang@gmail.com>
>>> ---
>>> net/mptcp/pm.c       | 10 ++++++++++
>>> net/mptcp/protocol.h | 14 ++++++++++++++
>>> net/mptcp/subflow.c  | 18 ++++++++++++++++++
>>> 3 files changed, 42 insertions(+)
>>>
>>> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
>>> index d4c19420194a..c34c9c0b2fa5 100644
>>> --- a/net/mptcp/pm.c
>>> +++ b/net/mptcp/pm.c
>>> @@ -250,8 +250,18 @@ void mptcp_pm_mp_prio_received(struct sock *sk, u8 bkup)
>>> void mptcp_pm_mp_fail_received(struct sock *sk, u64 fail_seq)
>>> {
>>>       struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
>>> +     struct mptcp_sock *msk = mptcp_sk(subflow->conn);
>>>
>>>       pr_debug("map_seq=%llu fail_seq=%llu", subflow->map_seq, fail_seq);
>>> +
>>> +     if (!msk->pm.subflows) {
>>
>> The pm.lock isn't held so it's not safe to access pm.subflows
>>
>> I don't think it's sufficient to read pm.subflows with the lock or add
>> READ_ONCE/WRITE_ONCE, since that would still allow race conditions with
>> the msk. To handle fallback when receiving MP_FAIL I think the
>> sock_owned_by_user() checks and delegated callback (similar to
>> mptcp_subflow_process_delegated()) may be needed.
>>
>
> I think it's better to use the below mptcp_has_another_subflow() function
> here instead of using msk->pm.subflows to check the single subflow case.
>

Yes, that sounds good.

>>> +             if (!subflow->mp_fail_need_echo) {
>>> +                     subflow->send_mp_fail = 1;
>>> +                     subflow->fail_seq = fail_seq;
>>
>> Echoing the fail_seq back doesn't seem correct, from the RFC it seems like
>> this side should send a sequence number for the opposite data direction?
>> Do you agree?
>
> So we should use 'subflow->fail_seq = subflow->map_seq;' here?

I think so - but I'm concerned about the possibility of out-of-order data 
and checking for single- or multiple-subflow cases. I need to look at the 
RFC some more to tell for sure.

>
>>
>>> +             } else {
>>> +                     subflow->mp_fail_need_echo = 0;
>>> +             }
>>> +     }
>>> }
>>>
>>> /* path manager helpers */
>>> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
>>> index 8e050575a2d9..7a49454c77a6 100644
>>> --- a/net/mptcp/protocol.h
>>> +++ b/net/mptcp/protocol.h
>>> @@ -422,6 +422,7 @@ struct mptcp_subflow_context {
>>>               backup : 1,
>>>               send_mp_prio : 1,
>>>               send_mp_fail : 1,
>>> +             mp_fail_need_echo : 1,
>>
>> I think mp_fail_expect_echo would be a more accurate name.
>
> Updated in v4.
>

Ok.

>>
>>>               rx_eof : 1,
>>>               can_ack : 1,        /* only after processing the remote a key */
>>>               disposable : 1;     /* ctx can be free at ulp release time */
>>> @@ -594,6 +595,19 @@ static inline void mptcp_subflow_tcp_fallback(struct sock *sk,
>>>       inet_csk(sk)->icsk_af_ops = ctx->icsk_af_ops;
>>> }
>>>
>>> +static inline bool mptcp_has_another_subflow_established(struct sock *ssk)
>>> +{
>>> +     struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk), *tmp;
>>> +     struct mptcp_sock *msk = mptcp_sk(subflow->conn);
>>> +
>>> +     mptcp_for_each_subflow(msk, tmp) {
>>> +             if (tmp->fully_established && tmp != subflow)
>>
>> Why check tmp->fully_established here?
>
> I'll drop tmp->fully_established, and rename this function
> mptcp_has_another_subflow to check whether there's a single subflow.

Sounds good.

>
>
> In my test, this MP_FAIL patchset only works in the multiple subflows case.
> The receiver detected the bad checksum, sent RST + MP_FAIL to close this
> subflow and discard the data with the bad checksum. Then the sender will
> retransmit a new data with good checksum from other subflow. And the
> receiver can resemble the whole data successfully.
>
> But the single subflow case dosen't work. I don't know how the new data
> can be retransmitted from the sender side when it fallback to regular TCP.
> So I'm not sure how to implement the single subflow case.
>
> Could you please give me some suggestions?

Yes, the multiple subflow case is the simpler one this time. We can throw 
away the affected subflow and recover using the others.

Since the non-contiguous, single subflow case has to close the subflow, 
that's more like the "easier" case.

So, it's the infinite mapping case that's tricky. Have you looked at the 
multipath-tcp.org code? The important part seems to be using the 
information in the infinite mapping to determine what data to keep or 
throw away before beginning "fallback" behavior. I can look at the 
infinite mapping information in the RFC and see how that fits with our 
implementation tomorrow.

- Mat


>>
>>> +                     return true;
>>> +     }
>>> +
>>> +     return false;
>>> +}
>>> +
>>> void __init mptcp_proto_init(void);
>>> #if IS_ENABLED(CONFIG_MPTCP_IPV6)
>>> int __init mptcp_proto_v6_init(void);
>>> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
>>> index 0b5d4a3eadcd..46302208c474 100644
>>> --- a/net/mptcp/subflow.c
>>> +++ b/net/mptcp/subflow.c
>>> @@ -913,6 +913,8 @@ static enum mapping_status validate_data_csum(struct sock *ssk, struct sk_buff *
>>>       csum = csum_partial(&header, sizeof(header), subflow->map_data_csum);
>>>       if (unlikely(csum_fold(csum))) {
>>>               MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_DATACSUMERR);
>>> +             subflow->send_mp_fail = 1;
>>> +             subflow->fail_seq = subflow->map_seq;
>>>               return subflow->mp_join ? MAPPING_INVALID : MAPPING_DUMMY;
>>>       }
>>>
>>> @@ -1160,6 +1162,22 @@ static bool subflow_check_data_avail(struct sock *ssk)
>>>
>>> fallback:
>>>       /* RFC 8684 section 3.7. */
>>> +     if (subflow->send_mp_fail) {
>>> +             if (mptcp_has_another_subflow_established(ssk)) {
>>> +                     mptcp_subflow_reset(ssk);
>>> +                     while ((skb = skb_peek(&ssk->sk_receive_queue)))
>>> +                             sk_eat_skb(ssk, skb);
>>> +                     WRITE_ONCE(subflow->data_avail, 0);
>>> +                     return true;
>>> +             }
>>> +
>>> +             if (!msk->pm.subflows) {
>>> +                     subflow->mp_fail_need_echo = 1;
>>> +                     WRITE_ONCE(subflow->data_avail, 0);
>>> +                     return true;
>>> +             }
>>> +     }
>>> +
>>>       if (subflow->mp_join || subflow->fully_established) {
>>>               /* fatal protocol error, close the socket.
>>>                * subflow_error_report() will introduce the appropriate barriers
>>> --
>>> 2.31.1

--
Mat Martineau
Intel