On Wed, 14 Jul 2021, Geliang Tang wrote:

> Hi Mat,
>
> Mat Martineau wrote on Wed, 14 Jul 2021 at 4:35 AM:
>>
>> On Tue, 13 Jul 2021, Geliang Tang wrote:
>>
>>> Hi Mat,
>>>
>>> Sorry for the late response on this.
>>>
>>> Mat Martineau wrote on Thu, 8 Jul 2021 at 7:20 AM:
>>>>
>>>> On Mon, 28 Jun 2021, Geliang Tang wrote:
>>>>
>>>>> When a bad checksum is detected, send out the MP_FAIL option.
>>>>>
>>>>> When multiple subflows are in use, close the affected subflow with a RST
>>>>> that includes an MP_FAIL option.
>>>>>
>>>>> When a single subflow is in use, send back an MP_FAIL option on the
>>>>> subflow-level ACK, and the receiver of this MP_FAIL responds with an
>>>>> MP_FAIL in the reverse direction.
>>>>>
>>>>> Signed-off-by: Geliang Tang
>>>>> ---
>>>>>  net/mptcp/pm.c       | 10 ++++++++++
>>>>>  net/mptcp/protocol.h | 14 ++++++++++++++
>>>>>  net/mptcp/subflow.c  | 18 ++++++++++++++++++
>>>>>  3 files changed, 42 insertions(+)
>>>>>
>>>>> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
>>>>> index d4c19420194a..c34c9c0b2fa5 100644
>>>>> --- a/net/mptcp/pm.c
>>>>> +++ b/net/mptcp/pm.c
>>>>> @@ -250,8 +250,18 @@ void mptcp_pm_mp_prio_received(struct sock *sk, u8 bkup)
>>>>>  void mptcp_pm_mp_fail_received(struct sock *sk, u64 fail_seq)
>>>>>  {
>>>>>  	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
>>>>> +	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
>>>>>
>>>>>  	pr_debug("map_seq=%llu fail_seq=%llu", subflow->map_seq, fail_seq);
>>>>> +
>>>>> +	if (!msk->pm.subflows) {
>>>>
>>>> The pm.lock isn't held, so it's not safe to access pm.subflows.
>>>>
>>>> I don't think it's sufficient to read pm.subflows with the lock or add
>>>> READ_ONCE/WRITE_ONCE, since that would still allow race conditions with
>>>> the msk. To handle fallback when receiving MP_FAIL I think the
>>>> sock_owned_by_user() checks and delegated callback (similar to
>>>> mptcp_subflow_process_delegated()) may be needed.
>>>>
>>>
>>> I think it's better to use the below mptcp_has_another_subflow() function
>>> here instead of using msk->pm.subflows to check the single subflow case.
>>>
>>
>> Yes, that sounds good.
>>
>>>>> +		if (!subflow->mp_fail_need_echo) {
>>>>> +			subflow->send_mp_fail = 1;
>>>>> +			subflow->fail_seq = fail_seq;
>>>>
>>>> Echoing the fail_seq back doesn't seem correct, from the RFC it seems like
>>>> this side should send a sequence number for the opposite data direction?
>>>> Do you agree?
>>>
>>> So we should use 'subflow->fail_seq = subflow->map_seq;' here?
>>
>> I think so - but I'm concerned about the possibility of out-of-order data
>> and checking for single- or multiple-subflow cases. I need to look at the
>> RFC some more to tell for sure.
>>
>>>
>>>>
>>>>> +		} else {
>>>>> +			subflow->mp_fail_need_echo = 0;
>>>>> +		}
>>>>> +	}
>>>>>  }
>>>>>
>>>>>  /* path manager helpers */
>>>>> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
>>>>> index 8e050575a2d9..7a49454c77a6 100644
>>>>> --- a/net/mptcp/protocol.h
>>>>> +++ b/net/mptcp/protocol.h
>>>>> @@ -422,6 +422,7 @@ struct mptcp_subflow_context {
>>>>>  		backup : 1,
>>>>>  		send_mp_prio : 1,
>>>>>  		send_mp_fail : 1,
>>>>> +		mp_fail_need_echo : 1,
>>>>
>>>> I think mp_fail_expect_echo would be a more accurate name.
>>>
>>> Updated in v4.
>>>
>>
>> Ok.
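For illustration, the receive-side handling being discussed (with the
mp_fail_expect_echo rename and the map_seq fix applied) can be modeled
as a small userspace sketch. The struct and function names below are
mocks invented for this note, not the kernel implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mock of the subflow fields touched by the patch
 * (hypothetical names mirroring the discussion, not net/mptcp). */
struct mock_subflow {
	uint64_t map_seq;
	uint64_t fail_seq;
	unsigned int send_mp_fail : 1;
	unsigned int mp_fail_expect_echo : 1;
};

/* Sketch of the single-subflow receive path: if we did not send an
 * MP_FAIL ourselves, answer the peer's MP_FAIL with our own, using our
 * map_seq (per the review, not the peer's fail_seq); if we were
 * waiting for an echo, this incoming MP_FAIL is that echo. */
static void mock_mp_fail_received(struct mock_subflow *sf,
				  uint64_t peer_fail_seq)
{
	(void)peer_fail_seq;	/* only logged via pr_debug() in the patch */

	if (!sf->mp_fail_expect_echo) {
		sf->send_mp_fail = 1;
		sf->fail_seq = sf->map_seq;
	} else {
		sf->mp_fail_expect_echo = 0;
	}
}
```

This only models the flag transitions; the locking and delegated-action
concerns raised above are out of scope for the sketch.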
>>
>>>>
>>>>>  		rx_eof : 1,
>>>>>  		can_ack : 1,        /* only after processing the remote a key */
>>>>>  		disposable : 1;     /* ctx can be free at ulp release time */
>>>>> @@ -594,6 +595,19 @@ static inline void mptcp_subflow_tcp_fallback(struct sock *sk,
>>>>>  	inet_csk(sk)->icsk_af_ops = ctx->icsk_af_ops;
>>>>>  }
>>>>>
>>>>> +static inline bool mptcp_has_another_subflow_established(struct sock *ssk)
>>>>> +{
>>>>> +	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk), *tmp;
>>>>> +	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
>>>>> +
>>>>> +	mptcp_for_each_subflow(msk, tmp) {
>>>>> +		if (tmp->fully_established && tmp != subflow)
>>>>
>>>> Why check tmp->fully_established here?
>>>
>>> I'll drop tmp->fully_established, and rename this function
>>> mptcp_has_another_subflow to check whether there's a single subflow.
>>
>> Sounds good.
>>
>>>
>>>
>>> In my test, this MP_FAIL patchset only works in the multiple subflows case:
>>> the receiver detects the bad checksum, sends RST + MP_FAIL to close this
>>> subflow, and discards the data with the bad checksum. Then the sender
>>> retransmits the data with a good checksum from another subflow, and the
>>> receiver can reassemble the whole data successfully.
>>>
>>> But the single subflow case doesn't work. I don't know how the new data
>>> can be retransmitted from the sender side when it falls back to regular
>>> TCP, so I'm not sure how to implement the single subflow case.
>>>
>>> Could you please give me some suggestions?
>>
>> Yes, the multiple subflow case is the simpler one this time. We can throw
>> away the affected subflow and recover using the others.
>>
>> Since the non-contiguous, single subflow case has to close the subflow,
>> that's more like the "easier" case.
>>
>> So, it's the infinite mapping case that's tricky. Have you looked at the
>> multipath-tcp.org code?
>> The important part seems to be using the
>> information in the infinite mapping to determine what data to keep or
>> throw away before beginning "fallback" behavior. I can look at the
>> infinite mapping information in the RFC and see how that fits with our
>> implementation tomorrow.
>>
>
> Thanks for your help.
>
> I want to split this patchset into two parts: "MP_FAIL support" and "the
> infinite mapping support".
>
> The MP_FAIL part just deals with the multiple subflows case, the easier
> case, and includes the MP_FAIL sending and receiving code. This part has
> been implemented and tested, and I'll send a v4 for it.
>
> The single subflow case goes into the new "infinite mapping" part. I'll
> add a new ticket on GitHub to track it, and send an RFC version for it
> later. It seems that there's still a lot of work to do in this part.
>
> Do you agree?

Yes, I think it would help to separate the single-subflow case like that.

Thanks!

Mat

>>>>
>>>>> +			return true;
>>>>> +	}
>>>>> +
>>>>> +	return false;
>>>>> +}
>>>>> +
>>>>>  void __init mptcp_proto_init(void);
>>>>>  #if IS_ENABLED(CONFIG_MPTCP_IPV6)
>>>>>  int __init mptcp_proto_v6_init(void);
>>>>> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
>>>>> index 0b5d4a3eadcd..46302208c474 100644
>>>>> --- a/net/mptcp/subflow.c
>>>>> +++ b/net/mptcp/subflow.c
>>>>> @@ -913,6 +913,8 @@ static enum mapping_status validate_data_csum(struct sock *ssk, struct sk_buff *
>>>>>  	csum = csum_partial(&header, sizeof(header), subflow->map_data_csum);
>>>>>  	if (unlikely(csum_fold(csum))) {
>>>>>  		MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_DATACSUMERR);
>>>>> +		subflow->send_mp_fail = 1;
>>>>> +		subflow->fail_seq = subflow->map_seq;
>>>>>  		return subflow->mp_join ? MAPPING_INVALID : MAPPING_DUMMY;
>>>>>  	}
>>>>>
>>>>> @@ -1160,6 +1162,22 @@ static bool subflow_check_data_avail(struct sock *ssk)
>>>>>
>>>>>  fallback:
>>>>>  	/* RFC 8684 section 3.7. */
>>>>> +	if (subflow->send_mp_fail) {
>>>>> +		if (mptcp_has_another_subflow_established(ssk)) {
>>>>> +			mptcp_subflow_reset(ssk);
>>>>> +			while ((skb = skb_peek(&ssk->sk_receive_queue)))
>>>>> +				sk_eat_skb(ssk, skb);
>>>>> +			WRITE_ONCE(subflow->data_avail, 0);
>>>>> +			return true;
>>>>> +		}
>>>>> +
>>>>> +		if (!msk->pm.subflows) {
>>>>> +			subflow->mp_fail_need_echo = 1;
>>>>> +			WRITE_ONCE(subflow->data_avail, 0);
>>>>> +			return true;
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>>  	if (subflow->mp_join || subflow->fully_established) {
>>>>>  		/* fatal protocol error, close the socket.
>>>>>  		 * subflow_error_report() will introduce the appropriate barriers
>>>>> --
>>>>> 2.31.1

--
Mat Martineau
Intel
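As background on the trigger condition in validate_data_csum(): the DSS
checksum is a 16-bit one's-complement sum, and a non-zero csum_fold()
result marks a corrupt mapping. A naive userspace model (a stand-in for
the kernel's optimized csum_partial()/csum_fold(), written for this
note and assuming an even buffer length) behaves like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulate 16-bit big-endian words into a 32-bit sum, a simplified
 * stand-in for csum_partial(). Odd trailing-byte handling is omitted
 * for brevity: len must be even. */
static uint32_t mock_csum_partial(const uint8_t *buf, size_t len,
				  uint32_t sum)
{
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	return sum;
}

/* Fold the 32-bit accumulator down to 16 bits and complement, like
 * csum_fold(). A buffer whose checksum field is valid folds to 0. */
static uint16_t mock_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A buffer that includes the complement of its own word sum folds to
zero; flipping any bit makes the fold non-zero, which is the condition
that now sets send_mp_fail in the patch.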