On Wed, 14 Jul 2021, Geliang Tang wrote:

> Hi Mat,
>
> Mat Martineau wrote on Wed, 14 Jul 2021 at 4:35 AM:
>>
>> On Tue, 13 Jul 2021, Geliang Tang wrote:
>>
>>> Hi Mat,
>>>
>>> Sorry for the late response on this.
>>>
>>> Mat Martineau wrote on Thu, 8 Jul 2021 at 7:20 AM:
>>>>
>>>> On Mon, 28 Jun 2021, Geliang Tang wrote:
>>>>
>>>>> When a bad checksum is detected, send out the MP_FAIL option.
>>>>>
>>>>> When multiple subflows are in use, close the affected subflow with a RST
>>>>> that includes an MP_FAIL option.
>>>>>
>>>>> When a single subflow is in use, send back an MP_FAIL option on the
>>>>> subflow-level ACK, and the receiver of this MP_FAIL responds with an
>>>>> MP_FAIL in the reverse direction.
>>>>>
>>>>> Signed-off-by: Geliang Tang
>>>>> ---
>>>>>  net/mptcp/pm.c       | 10 ++++++++++
>>>>>  net/mptcp/protocol.h | 14 ++++++++++++++
>>>>>  net/mptcp/subflow.c  | 18 ++++++++++++++++++
>>>>>  3 files changed, 42 insertions(+)
>>>>>
>>>>> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
>>>>> index d4c19420194a..c34c9c0b2fa5 100644
>>>>> --- a/net/mptcp/pm.c
>>>>> +++ b/net/mptcp/pm.c
>>>>> @@ -250,8 +250,18 @@ void mptcp_pm_mp_prio_received(struct sock *sk, u8 bkup)
>>>>>  void mptcp_pm_mp_fail_received(struct sock *sk, u64 fail_seq)
>>>>>  {
>>>>>  	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
>>>>> +	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
>>>>>
>>>>>  	pr_debug("map_seq=%llu fail_seq=%llu", subflow->map_seq, fail_seq);
>>>>> +
>>>>> +	if (!msk->pm.subflows) {
>>>>
>>>> The pm.lock isn't held, so it's not safe to access pm.subflows.
>>>>
>>>> I don't think it's sufficient to read pm.subflows with the lock or add
>>>> READ_ONCE/WRITE_ONCE, since that would still allow race conditions with
>>>> the msk. To handle fallback when receiving MP_FAIL I think the
>>>> sock_owned_by_user() checks and delegated callback (similar to
>>>> mptcp_subflow_process_delegated()) may be needed.
>>>>
>>>
>>> I think it's better to use the below mptcp_has_another_subflow() function
>>> here instead of using msk->pm.subflows to check the single subflow case.
>>>
>>
>> Yes, that sounds good.
>>
>>>>> +		if (!subflow->mp_fail_need_echo) {
>>>>> +			subflow->send_mp_fail = 1;
>>>>> +			subflow->fail_seq = fail_seq;
>>>>
>>>> Echoing the fail_seq back doesn't seem correct, from the RFC it seems like
>>>> this side should send a sequence number for the opposite data direction?
>>>> Do you agree?
>>>
>>> So we should use 'subflow->fail_seq = subflow->map_seq;' here?
>>
>> I think so - but I'm concerned about the possibility of out-of-order data
>> and checking for single- or multiple-subflow cases. I need to look at the
>> RFC some more to tell for sure.
>>
>>>
>>>>
>>>>> +		} else {
>>>>> +			subflow->mp_fail_need_echo = 0;
>>>>> +		}
>>>>> +	}
>>>>>  }
>>>>>
>>>>>  /* path manager helpers */
>>>>> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
>>>>> index 8e050575a2d9..7a49454c77a6 100644
>>>>> --- a/net/mptcp/protocol.h
>>>>> +++ b/net/mptcp/protocol.h
>>>>> @@ -422,6 +422,7 @@ struct mptcp_subflow_context {
>>>>>  		backup : 1,
>>>>>  		send_mp_prio : 1,
>>>>>  		send_mp_fail : 1,
>>>>> +		mp_fail_need_echo : 1,
>>>>
>>>> I think mp_fail_expect_echo would be a more accurate name.
>>>
>>> Updated in v4.
>>>
>>
>> Ok.
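For illustration, the receive-side handling being discussed (with the
mp_fail_expect_echo rename and the map_seq fix applied) can be modeled
as a small userspace sketch. The struct and function names below are
mocks invented for this note, not the kernel implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mock of the subflow fields touched by the patch
 * (hypothetical names mirroring the discussion, not net/mptcp). */
struct mock_subflow {
	uint64_t map_seq;
	uint64_t fail_seq;
	unsigned int send_mp_fail : 1;
	unsigned int mp_fail_expect_echo : 1;
};

/* Sketch of the single-subflow receive path: if we did not send an
 * MP_FAIL ourselves, answer the peer's MP_FAIL with our own, using our
 * map_seq (per the review, not the peer's fail_seq); if we were
 * waiting for an echo, this incoming MP_FAIL is that echo. */
static void mock_mp_fail_received(struct mock_subflow *sf,
				  uint64_t peer_fail_seq)
{
	(void)peer_fail_seq;	/* only logged via pr_debug() in the patch */

	if (!sf->mp_fail_expect_echo) {
		sf->send_mp_fail = 1;
		sf->fail_seq = sf->map_seq;
	} else {
		sf->mp_fail_expect_echo = 0;
	}
}
```

This only models the flag transitions; the locking and delegated-action
concerns raised above are out of scope for the sketch.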
>>
>>>>
>>>>>  		rx_eof : 1,
>>>>>  		can_ack : 1,        /* only after processing the remote a key */
>>>>>  		disposable : 1;     /* ctx can be free at ulp release time */
>>>>> @@ -594,6 +595,19 @@ static inline void mptcp_subflow_tcp_fallback(struct sock *sk,
>>>>>  	inet_csk(sk)->icsk_af_ops = ctx->icsk_af_ops;
>>>>>  }
>>>>>
>>>>> +static inline bool mptcp_has_another_subflow_established(struct sock *ssk)
>>>>> +{
>>>>> +	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk), *tmp;
>>>>> +	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
>>>>> +
>>>>> +	mptcp_for_each_subflow(msk, tmp) {
>>>>> +		if (tmp->fully_established && tmp != subflow)
>>>>
>>>> Why check tmp->fully_established here?
>>>
>>> I'll drop tmp->fully_established, and rename this function
>>> mptcp_has_another_subflow to check whether there's a single subflow.
>>
>> Sounds good.
>>
>>>
>>>
>>> In my test, this MP_FAIL patchset only works in the multiple subflows case:
>>> the receiver detects the bad checksum, sends RST + MP_FAIL to close this
>>> subflow, and discards the data with the bad checksum. Then the sender
>>> retransmits the data with a good checksum from another subflow, and the
>>> receiver can reassemble the whole data successfully.
>>>
>>> But the single subflow case doesn't work. I don't know how the new data
>>> can be retransmitted from the sender side when it falls back to regular
>>> TCP, so I'm not sure how to implement the single subflow case.
>>>
>>> Could you please give me some suggestions?
>>
>> Yes, the multiple subflow case is the simpler one this time. We can throw
>> away the affected subflow and recover using the others.
>>
>> Since the non-contiguous, single subflow case has to close the subflow,
>> that's more like the "easier" case.
>>
>> So, it's the infinite mapping case that's tricky. Have you looked at the
>> multipath-tcp.org code?
>> The important part seems to be using the
>> information in the infinite mapping to determine what data to keep or
>> throw away before beginning "fallback" behavior. I can look at the
>> infinite mapping information in the RFC and see how that fits with our
>> implementation tomorrow.
>>
>
> Thanks for your help.
>
> I want to split this patchset into two parts: "MP_FAIL support" and "the
> infinite mapping support".
>
> The MP_FAIL part just deals with the multiple subflows case, the easier
> case, and includes the MP_FAIL sending and receiving code. This part has
> been implemented and tested, and I'll send a v4 for it.
>
> The single subflow case goes into the new "infinite mapping" part. I'll
> add a new ticket on GitHub to track it, and send an RFC version for it
> later. It seems that there's still a lot of work to do in this part.
>
> Do you agree?

Yes, I think it would help to separate the single-subflow case like that.

Thanks!

Mat

>>>>
>>>>> +			return true;
>>>>> +	}
>>>>> +
>>>>> +	return false;
>>>>> +}
>>>>> +
>>>>>  void __init mptcp_proto_init(void);
>>>>>  #if IS_ENABLED(CONFIG_MPTCP_IPV6)
>>>>>  int __init mptcp_proto_v6_init(void);
>>>>> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
>>>>> index 0b5d4a3eadcd..46302208c474 100644
>>>>> --- a/net/mptcp/subflow.c
>>>>> +++ b/net/mptcp/subflow.c
>>>>> @@ -913,6 +913,8 @@ static enum mapping_status validate_data_csum(struct sock *ssk, struct sk_buff *
>>>>>  	csum = csum_partial(&header, sizeof(header), subflow->map_data_csum);
>>>>>  	if (unlikely(csum_fold(csum))) {
>>>>>  		MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_DATACSUMERR);
>>>>> +		subflow->send_mp_fail = 1;
>>>>> +		subflow->fail_seq = subflow->map_seq;
>>>>>  		return subflow->mp_join ? MAPPING_INVALID : MAPPING_DUMMY;
>>>>>  	}
>>>>>
>>>>> @@ -1160,6 +1162,22 @@ static bool subflow_check_data_avail(struct sock *ssk)
>>>>>
>>>>>  fallback:
>>>>>  	/* RFC 8684 section 3.7. */
>>>>> +	if (subflow->send_mp_fail) {
>>>>> +		if (mptcp_has_another_subflow_established(ssk)) {
>>>>> +			mptcp_subflow_reset(ssk);
>>>>> +			while ((skb = skb_peek(&ssk->sk_receive_queue)))
>>>>> +				sk_eat_skb(ssk, skb);
>>>>> +			WRITE_ONCE(subflow->data_avail, 0);
>>>>> +			return true;
>>>>> +		}
>>>>> +
>>>>> +		if (!msk->pm.subflows) {
>>>>> +			subflow->mp_fail_need_echo = 1;
>>>>> +			WRITE_ONCE(subflow->data_avail, 0);
>>>>> +			return true;
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>>  	if (subflow->mp_join || subflow->fully_established) {
>>>>>  		/* fatal protocol error, close the socket.
>>>>>  		 * subflow_error_report() will introduce the appropriate barriers
>>>>> --
>>>>> 2.31.1

--
Mat Martineau
Intel
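As background on the trigger condition in validate_data_csum(): the DSS
checksum is a 16-bit one's-complement sum, and a non-zero csum_fold()
result marks a corrupt mapping. A naive userspace model (a stand-in for
the kernel's optimized csum_partial()/csum_fold(), written for this
note and assuming an even buffer length) behaves like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulate 16-bit big-endian words into a 32-bit sum, a simplified
 * stand-in for csum_partial(). Odd trailing-byte handling is omitted
 * for brevity: len must be even. */
static uint32_t mock_csum_partial(const uint8_t *buf, size_t len,
				  uint32_t sum)
{
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	return sum;
}

/* Fold the 32-bit accumulator down to 16 bits and complement, like
 * csum_fold(). A buffer whose checksum field is valid folds to 0. */
static uint16_t mock_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A buffer that includes the complement of its own word sum folds to
zero; flipping any bit makes the fold non-zero, which is the condition
that now sets send_mp_fail in the patch.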