* [PATCH v3 mptcp-next 1/3] mptcp: propagate fastclose error. @ 2022-09-01 17:31 Paolo Abeni 2022-09-01 17:31 ` [PATCH v3 mptcp-next 2/3] mptcp: use fastclose on more edge scenarios Paolo Abeni ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Paolo Abeni @ 2022-09-01 17:31 UTC (permalink / raw) To: mptcp When an mptcp socket is closed due to an incoming FASTCLOSE option, no specific sk_err is set and later syscalls will usually fail with EPIPE. Align the current fastclose error handling with TCP reset, properly setting the socket error according to the current msk state and propagating such an error. Additionally, sendmsg() currently does not handle sk_err properly, always returning EPIPE. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- net/mptcp/protocol.c | 45 ++++++++++++++++++++++++++++++++++---------- 1 file changed, 35 insertions(+), 10 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index f4891db86217..b04f184695e4 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1685,9 +1685,13 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) { ret = sk_stream_wait_connect(sk, &timeo); if (ret) - goto out; + goto do_error; } + ret = -EPIPE; + if (unlikely(sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))) + goto do_error; + pfrag = sk_page_frag(sk); while (msg_data_left(msg)) { @@ -1696,11 +1700,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) bool dfrag_collapsed; size_t psize, offset; - if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)) { - ret = -EPIPE; - goto out; - } - /* reuse tail pfrag, if possible, or carve a new one from the * page allocator */ @@ -1732,7 +1731,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) if (copy_page_from_iter(dfrag->page, offset, psize, &msg->msg_iter) != psize) { ret = -EFAULT; - goto out; + goto do_error; } /* data 
successfully copied into the write queue */ @@ -1764,7 +1763,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) __mptcp_push_pending(sk, msg->msg_flags); ret = sk_stream_wait_memory(sk, &timeo); if (ret) - goto out; + goto do_error; } if (copied) @@ -1772,7 +1771,14 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) out: release_sock(sk); - return copied ? : ret; + return copied; + +do_error: + if (copied) + goto out; + + copied = sk_stream_error(sk, msg->msg_flags, ret); + goto out; } static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk, @@ -2404,12 +2410,31 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk) unlock_sock_fast(tcp_sk, slow); } + /* Mirror the tcp_reset() error propagation */ + switch (sk->sk_state) { + case TCP_SYN_SENT: + sk->sk_err = ECONNREFUSED; + break; + case TCP_CLOSE_WAIT: + sk->sk_err = EPIPE; + break; + case TCP_CLOSE: + return; + default: + sk->sk_err = ECONNRESET; + } + inet_sk_state_store(sk, TCP_CLOSE); sk->sk_shutdown = SHUTDOWN_MASK; smp_mb__before_atomic(); /* SHUTDOWN must be visible first */ set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags); - mptcp_close_wake_up(sk); + /* the calling mptcp_worker will properly destroy the socket */ + if (sock_flag(sk, SOCK_DEAD)) + return; + + sk->sk_state_change(sk); + sk_error_report(sk); } static void __mptcp_retrans(struct sock *sk) -- 2.37.2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
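As a side note for reviewers, the state-to-error mapping introduced by the patch can be restated as a tiny table in userspace terms. This is only an illustrative sketch of the `switch` in mptcp_check_fastclose() above, not kernel code; the enum and helper name are made up for the example:

```c
#include <assert.h>
#include <errno.h>

/* Illustrative-only restatement of the error propagation added by this
 * patch: on an incoming MP_FASTCLOSE, the error reported to the
 * application depends on the msk state, mirroring tcp_reset(). The
 * enum and helper name below are invented for the sketch.
 */
enum example_state { EX_SYN_SENT, EX_CLOSE_WAIT, EX_CLOSE, EX_ESTABLISHED };

static int fastclose_errno(enum example_state st)
{
	switch (st) {
	case EX_SYN_SENT:
		return ECONNREFUSED;	/* connection never completed */
	case EX_CLOSE_WAIT:
		return EPIPE;		/* peer had already shut down */
	case EX_CLOSE:
		return 0;		/* already closed: nothing to report */
	default:
		return ECONNRESET;	/* live connection was reset */
	}
}
```

The `default` arm covers TCP_ESTABLISHED and every other state, matching the kernel switch where ECONNRESET is the fallback.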
* [PATCH v3 mptcp-next 2/3] mptcp: use fastclose on more edge scenarios. 2022-09-01 17:31 [PATCH v3 mptcp-next 1/3] mptcp: propagate fastclose error Paolo Abeni @ 2022-09-01 17:31 ` Paolo Abeni 2022-09-01 17:31 ` [PATCH v3 mptcp-next 3/3] selftests: mptcp: update and extend fastclose test-cases Paolo Abeni 2022-09-07 17:22 ` [PATCH v3 mptcp-next 1/3] mptcp: propagate fastclose error Matthieu Baerts 2 siblings, 0 replies; 10+ messages in thread From: Paolo Abeni @ 2022-09-01 17:31 UTC (permalink / raw) To: mptcp Daire reported a user-space application hang when the peer is forcibly closed before the data transfer completes. The relevant application expects the peer to either do an application-level clean shutdown or a transport-level connection reset. We can accommodate such a user by extending the fastclose usage: at fd close time, if the msk socket has some unread data, and at FIN_WAIT timeout. Note that at MPTCP close time we must ensure that the TCP subflows will reset: set the linger socket option to a suitable value. 
Signed-off-by: Paolo Abeni <pabeni@redhat.com> -- v2 -> v3: - move self-tests update to a different patch v1 -> v2: - ensure mptcp_do_fastclose will reset all the subflows, - update fastclose self-tests --- net/mptcp/protocol.c | 50 +++++++++++++++++++++++++++++++------------- 1 file changed, 36 insertions(+), 14 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index b04f184695e4..fec542fea02a 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2276,8 +2276,14 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk, lock_sock_nested(ssk, SINGLE_DEPTH_NESTING); - if (flags & MPTCP_CF_FASTCLOSE) + if (flags & MPTCP_CF_FASTCLOSE) { + /* be sure to force the tcp_disconnect() path, + * to generate the egress reset + */ + ssk->sk_lingertime = 0; + sock_set_flag(ssk, SOCK_LINGER); subflow->send_fastclose = 1; + } need_push = (flags & MPTCP_CF_PUSH) && __mptcp_retransmit_pending_data(sk); if (!dispose_it) { @@ -2540,6 +2546,16 @@ static void mptcp_mp_fail_no_response(struct mptcp_sock *msk) mptcp_reset_timeout(msk, 0); } +static void mptcp_do_fastclose(struct sock *sk) +{ + struct mptcp_subflow_context *subflow, *tmp; + struct mptcp_sock *msk = mptcp_sk(sk); + + mptcp_for_each_subflow_safe(msk, subflow, tmp) + __mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow), + subflow, MPTCP_CF_FASTCLOSE); +} + static void mptcp_worker(struct work_struct *work) { struct mptcp_sock *msk = container_of(work, struct mptcp_sock, work); @@ -2571,6 +2587,7 @@ static void mptcp_worker(struct work_struct *work) if (sock_flag(sk, SOCK_DEAD) && (mptcp_check_close_timeout(sk) || sk->sk_state == TCP_CLOSE)) { inet_sk_state_store(sk, TCP_CLOSE); + mptcp_do_fastclose(sk); __mptcp_destroy_sock(sk); goto unlock; } @@ -2823,6 +2840,18 @@ static void __mptcp_destroy_sock(struct sock *sk) sock_put(sk); } +static __poll_t mptcp_check_readable(struct mptcp_sock *msk) +{ + /* Concurrent splices from sk_receive_queue into receive_queue will + * always show at 
least one non-empty queue when checked in this order. + */ + if (skb_queue_empty_lockless(&((struct sock *)msk)->sk_receive_queue) && + skb_queue_empty_lockless(&msk->receive_queue)) + return 0; + + return EPOLLIN | EPOLLRDNORM; +} + static void mptcp_close(struct sock *sk, long timeout) { struct mptcp_subflow_context *subflow; @@ -2837,8 +2866,13 @@ static void mptcp_close(struct sock *sk, long timeout) goto cleanup; } - if (mptcp_close_state(sk)) + if (mptcp_check_readable(msk)) { + /* the msk has read data, do the MPTCP equivalent of TCP reset */ + inet_sk_state_store(sk, TCP_CLOSE); + mptcp_do_fastclose(sk); + } else if (mptcp_close_state(sk)) { __mptcp_wr_shutdown(sk); + } sk_stream_wait_close(sk, timeout); @@ -3646,18 +3680,6 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock, return err; } -static __poll_t mptcp_check_readable(struct mptcp_sock *msk) -{ - /* Concurrent splices from sk_receive_queue into receive_queue will - * always show at least one non-empty queue when checked in this order. - */ - if (skb_queue_empty_lockless(&((struct sock *)msk)->sk_receive_queue) && - skb_queue_empty_lockless(&msk->receive_queue)) - return 0; - - return EPOLLIN | EPOLLRDNORM; -} - static __poll_t mptcp_check_writeable(struct mptcp_sock *msk) { struct sock *sk = (struct sock *)msk; -- 2.37.2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
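As background for the `sk_lingertime`/`SOCK_LINGER` trick in `__mptcp_close_ssk()` above: it is the in-kernel counterpart of the classic userspace pattern where `SO_LINGER` with a zero timeout turns `close()` into an abortive close that emits a RST instead of a FIN. A minimal sketch on a plain TCP socket, for illustration only (the helper name is made up):

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Make close() on a TCP socket send a RST rather than performing the
 * normal FIN shutdown -- the same effect the patch forces on each
 * subflow by clearing sk_lingertime and setting SOCK_LINGER before
 * issuing the fastclose.
 */
static int set_abortive_close(int fd)
{
	struct linger lg;

	memset(&lg, 0, sizeof(lg));
	lg.l_onoff = 1;		/* linger enabled... */
	lg.l_linger = 0;	/* ...with a zero timeout: abortive close */
	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

Any subsequently queued but unsent data is discarded on close, which is exactly why the patch relies on it to guarantee an egress reset through the tcp_disconnect() path.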
* [PATCH v3 mptcp-next 3/3] selftests: mptcp: update and extend fastclose test-cases 2022-09-01 17:31 [PATCH v3 mptcp-next 1/3] mptcp: propagate fastclose error Paolo Abeni 2022-09-01 17:31 ` [PATCH v3 mptcp-next 2/3] mptcp: use fastclose on more edge scenarios Paolo Abeni @ 2022-09-01 17:31 ` Paolo Abeni 2022-09-07 17:22 ` Matthieu Baerts 2022-09-07 17:22 ` [PATCH v3 mptcp-next 1/3] mptcp: propagate fastclose error Matthieu Baerts 2 siblings, 1 reply; 10+ messages in thread From: Paolo Abeni @ 2022-09-01 17:31 UTC (permalink / raw) To: mptcp After the previous patches, the MPTCP protocol can generate fast-closes on both ends of the connection. Rework the relevant test-case to carefully trigger the fast-close code-path on a single end at a time, while ensuring that a predictable amount of data is spooled on both ends. Additionally add another test-case for the passive socket fast-close. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- v2 -> v3: - split out of 2/2 - fixed a bunch of checkpatch issues --- .../selftests/net/mptcp/mptcp_connect.c | 58 ++++++++++-- .../testing/selftests/net/mptcp/mptcp_join.sh | 88 +++++++++++++++---- 2 files changed, 120 insertions(+), 26 deletions(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c index 24d4e9cb617e..ee6df83c55f3 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.c +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c @@ -72,6 +72,7 @@ static int cfg_wait; static uint32_t cfg_mark; static char *cfg_input; static int cfg_repeat = 1; +static int cfg_truncate; struct cfg_cmsg_types { unsigned int cmsg_enabled:1; @@ -381,8 +382,6 @@ static size_t do_rnd_write(const int fd, char *buf, const size_t len) do_w = cfg_do_w; bw = write(fd, buf, do_w); - if (bw < 0) - perror("write"); /* let the join handshake complete, before going on */ if (cfg_join && first) { @@ -571,7 +570,7 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd, 
bool *in_closed_after .fd = peerfd, .events = POLLIN | POLLOUT, }; - unsigned int woff = 0, wlen = 0; + unsigned int woff = 0, wlen = 0, total_wlen = 0, total_rlen = 0; char wbuf[8192]; set_nonblock(peerfd, true); @@ -597,7 +596,16 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after } if (fds.revents & POLLIN) { - len = do_rnd_read(peerfd, rbuf, sizeof(rbuf)); + ssize_t rb = sizeof(rbuf); + + /* limit the total amount of read data to the trunc value*/ + if (cfg_truncate > 0) { + if (rb + total_rlen > cfg_truncate) + rb = cfg_truncate - total_rlen; + len = read(peerfd, rbuf, rb); + } else { + len = do_rnd_read(peerfd, rbuf, sizeof(rbuf)); + } if (len == 0) { /* no more data to receive: * peer has closed its write side @@ -612,10 +620,14 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after /* Else, still have data to transmit */ } else if (len < 0) { + /* ignore errors on I/O operation when the peer is fastclosing */ + if (cfg_truncate < 0) + return 0; perror("read"); return 3; } + total_rlen += len; do_write(outfd, rbuf, len); } @@ -628,12 +640,21 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after if (wlen > 0) { ssize_t bw; + /* limit the total amount of written data to the trunc value */ + if (cfg_truncate > 0 && wlen + total_wlen > cfg_truncate) + wlen = cfg_truncate - total_wlen; + bw = do_rnd_write(peerfd, wbuf + woff, wlen); - if (bw < 0) + if (bw < 0) { + if (cfg_truncate < 0) + return 0; + perror("write"); return 111; + } woff += bw; wlen -= bw; + total_wlen += bw; } else if (wlen == 0) { /* We have no more data to send. 
*/ fds.events &= ~POLLOUT; @@ -652,10 +673,19 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after } if (fds.revents & (POLLERR | POLLNVAL)) { + /* when the peer is truncating the file and fastclosing, ignore + * HUP/connection reset errors + */ + if (cfg_truncate < 0) + return 0; fprintf(stderr, "Unexpected revents: " "POLLERR/POLLNVAL(%x)\n", fds.revents); return 5; } + + if (cfg_truncate > 0 && total_wlen >= cfg_truncate && + total_rlen >= cfg_truncate) + break; } /* leave some time for late join/announce */ @@ -1160,11 +1190,13 @@ int main_loop(void) } /* close the client socket open only if we are not going to reconnect */ - ret = copyfd_io(fd_in, fd, 1, cfg_repeat == 1); + ret = copyfd_io(fd_in, fd, 1, 0); if (ret) return ret; - if (--cfg_repeat > 0) { + if (cfg_truncate > 0) { + xdisconnect(fd, peer->ai_addrlen); + } else if (--cfg_repeat > 0) { xdisconnect(fd, peer->ai_addrlen); /* the socket could be unblocking at this point, we need the @@ -1176,7 +1208,8 @@ int main_loop(void) if (cfg_input) close(fd_in); goto again; - } + } else + close(fd); return 0; } @@ -1262,8 +1295,15 @@ static void parse_opts(int argc, char **argv) { int c; - while ((c = getopt(argc, argv, "6c:hi:I:jlm:M:o:p:P:r:R:s:S:t:T:w:")) != -1) { + while ((c = getopt(argc, argv, "6c:f:hi:I:jlm:M:o:p:P:r:R:s:S:t:T:w:")) != -1) { switch (c) { + case 'f': + cfg_truncate = atoi(optarg); + + /* when receiving a fastclose, ignore PIPE signals */ + if (cfg_truncate < 0) + signal(SIGPIPE, handle_signal); + break; case 'j': cfg_join = true; cfg_mode = CFG_MODE_POLL; diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index 2957fe414639..0f4c9c4bcb5b 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -346,10 +346,21 @@ check_transfer() local in=$1 local out=$2 local what=$3 + local bytes=$4 local i a b local line - cmp -l "$in" "$out" | while 
read -r i a b; do + if [ -n "$bytes" ]; then + # when truncating we must check the size explicitly + local out_size=$(wc -c $out | awk '{print $1}') + if [ $out_size -ne $bytes ]; then + echo "[ FAIL ] $what output file has wrong size ($out_size, $bytes)" + fail_test + return 1 + fi + bytes="--bytes=${bytes}" + fi + cmp -l "$in" "$out" ${bytes} | while read -r i a b; do local sum=$((0${a} + 0${b})) if [ $check_invert -eq 0 ] || [ $sum -ne $((0xff)) ]; then echo "[ FAIL ] $what does not match (in, out):" @@ -707,9 +718,29 @@ do_transfer() fi local flags="subflow" + local extra_cl_args="" + local extra_srv_args="" + local trunc_size="" if [[ "${addr_nr_ns2}" = "fastclose_"* ]]; then + if [ ${test_link_fail} -le 1 ]; then + echo "fastclose tests need test_link_fail argument" + return 0 + fi + # disconnect - extra_args="$extra_args -I ${addr_nr_ns2:10}" + trunc_size=${test_link_fail} + local side=${addr_nr_ns2:10} + + if [ ${side} = "client" ]; then + extra_cl_args="-f ${test_link_fail}" + extra_srv_args="-f -1" + elif [ ${side} = "server" ]; then + extra_srv_args="-f ${test_link_fail}" + extra_cl_args="-f -1" + else + echo "wrong/unknown fastclose spec ${side}" + return 0 + fi addr_nr_ns2=0 elif [[ "${addr_nr_ns2}" = "userspace_"* ]]; then userspace_pm=1 @@ -737,39 +768,41 @@ do_transfer() local_addr="0.0.0.0" fi + extra_srv_args="$extra_args $extra_srv_args" if [ "$test_link_fail" -gt 1 ];then timeout ${timeout_test} \ ip netns exec ${listener_ns} \ ./mptcp_connect -t ${timeout_poll} -l -p $port -s ${srv_proto} \ - $extra_args ${local_addr} < "$sinfail" > "$sout" & + $extra_srv_args ${local_addr} < "$sinfail" > "$sout" & else timeout ${timeout_test} \ ip netns exec ${listener_ns} \ ./mptcp_connect -t ${timeout_poll} -l -p $port -s ${srv_proto} \ - $extra_args ${local_addr} < "$sin" > "$sout" & + $extra_srv_args ${local_addr} < "$sin" > "$sout" & fi local spid=$! 
wait_local_port_listen "${listener_ns}" "${port}" + extra_cl_args="$extra_args $extra_cl_args" if [ "$test_link_fail" -eq 0 ];then timeout ${timeout_test} \ ip netns exec ${connector_ns} \ ./mptcp_connect -t ${timeout_poll} -p $port -s ${cl_proto} \ - $extra_args $connect_addr < "$cin" > "$cout" & + $extra_cl_args $connect_addr < "$cin" > "$cout" & elif [ "$test_link_fail" -eq 1 ] || [ "$test_link_fail" -eq 2 ];then ( cat "$cinfail" ; sleep 2; link_failure $listener_ns ; cat "$cinfail" ) | \ tee "$cinsent" | \ timeout ${timeout_test} \ ip netns exec ${connector_ns} \ ./mptcp_connect -t ${timeout_poll} -p $port -s ${cl_proto} \ - $extra_args $connect_addr > "$cout" & + $extra_cl_args $connect_addr > "$cout" & else tee "$cinsent" < "$cinfail" | \ timeout ${timeout_test} \ ip netns exec ${connector_ns} \ ./mptcp_connect -t ${timeout_poll} -p $port -s ${cl_proto} \ - $extra_args $connect_addr > "$cout" & + $extra_cl_args $connect_addr > "$cout" & fi local cpid=$! @@ -971,15 +1004,15 @@ do_transfer() fi if [ "$test_link_fail" -gt 1 ];then - check_transfer $sinfail $cout "file received by client" + check_transfer $sinfail $cout "file received by client" $trunc_size else - check_transfer $sin $cout "file received by client" + check_transfer $sin $cout "file received by client" $trunc_size fi retc=$? if [ "$test_link_fail" -eq 0 ];then - check_transfer $cin $sout "file received by server" + check_transfer $cin $sout "file received by server" $trunc_size else - check_transfer $cinsent $sout "file received by server" + check_transfer $cinsent $sout "file received by server" $trunc_size fi rets=$? 
@@ -1188,12 +1221,23 @@ chk_fclose_nr() { local fclose_tx=$1 local fclose_rx=$2 + local ns_invert=${3:-""} local count local dump_stats + local ns_tx=$ns2 + local ns_rx=$ns1 + local extra_msg="" + + if [[ $ns_invert = "invert" ]]; then + ns_tx=$ns1 + ns_rx=$ns2 + extra_msg=" invert" + fi printf "%-${nr_blank}s %s" " " "ctx" - count=$(ip netns exec $ns2 nstat -as | grep MPTcpExtMPFastcloseTx | awk '{print $2}') + count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPFastcloseTx | awk '{print $2}') [ -z "$count" ] && count=0 + [ "$count" != "$fclose_tx" ] && extra_msg="$extra_msg,tx=$count" if [ "$count" != "$fclose_tx" ]; then echo "[fail] got $count MP_FASTCLOSE[s] TX expected $fclose_tx" fail_test @@ -1203,17 +1247,20 @@ chk_fclose_nr() fi echo -n " - fclzrx" - count=$(ip netns exec $ns1 nstat -as | grep MPTcpExtMPFastcloseRx | awk '{print $2}') + count=$(ip netns exec $ns_rx nstat -as | grep MPTcpExtMPFastcloseRx | awk '{print $2}') [ -z "$count" ] && count=0 + [ "$count" != "$fclose_rx" ] && extra_msg="$extra_msg,rx=$count" if [ "$count" != "$fclose_rx" ]; then echo "[fail] got $count MP_FASTCLOSE[s] RX expected $fclose_rx" fail_test dump_stats=1 else - echo "[ ok ]" + echo -n "[ ok ]" fi [ "${dump_stats}" = 1 ] && dump_stats + + echo "$extra_msg" } chk_rst_nr() @@ -1236,7 +1283,7 @@ chk_rst_nr() printf "%-${nr_blank}s %s" " " "rtx" count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPRstTx | awk '{print $2}') [ -z "$count" ] && count=0 - if [ "$count" != "$rst_tx" ]; then + if [ $count -lt $rst_tx ]; then echo "[fail] got $count MP_RST[s] TX expected $rst_tx" fail_test dump_stats=1 @@ -1247,7 +1294,7 @@ chk_rst_nr() echo -n " - rstrx " count=$(ip netns exec $ns_rx nstat -as | grep MPTcpExtMPRstRx | awk '{print $2}') [ -z "$count" ] && count=0 - if [ "$count" != "$rst_rx" ]; then + if [ "$count" -lt "$rst_rx" ]; then echo "[fail] got $count MP_RST[s] RX expected $rst_rx" fail_test dump_stats=1 @@ -2801,11 +2848,18 @@ fullmesh_tests() fastclose_tests() { if 
reset "fastclose test"; then - run_tests $ns1 $ns2 10.0.1.1 1024 0 fastclose_2 + run_tests $ns1 $ns2 10.0.1.1 1024 0 fastclose_client chk_join_nr 0 0 0 chk_fclose_nr 1 1 chk_rst_nr 1 1 invert fi + + if reset "fastclose server test"; then + run_tests $ns1 $ns2 10.0.1.1 1024 0 fastclose_server + chk_join_nr 0 0 0 + chk_fclose_nr 1 1 invert + chk_rst_nr 1 1 + fi } pedit_action_pkts() -- 2.37.2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 mptcp-next 3/3] selftests: mptcp: update and extend fastclose test-cases 2022-09-01 17:31 ` [PATCH v3 mptcp-next 3/3] selftests: mptcp: update and extend fastclose test-cases Paolo Abeni @ 2022-09-07 17:22 ` Matthieu Baerts 2022-09-07 18:58 ` Matthieu Baerts 2022-09-09 8:09 ` Paolo Abeni 0 siblings, 2 replies; 10+ messages in thread From: Matthieu Baerts @ 2022-09-07 17:22 UTC (permalink / raw) To: Paolo Abeni, mptcp Hi Paolo, On 01/09/2022 19:31, Paolo Abeni wrote: > After the previous patches, the MPTCP protocol can generate > fast-closes on both ends of the connection. Rework the relevant > test-case to carefully trigger the fast-close code-path on a > single end at the time, while ensuring than a predictable amount > of data is spooled on both ends. > > Additionally add another test-cases for the passive socket > fast-close. Thank you for the v3! (...) > diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c > index 24d4e9cb617e..ee6df83c55f3 100644 > --- a/tools/testing/selftests/net/mptcp/mptcp_connect.c > +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c (...) > @@ -381,8 +382,6 @@ static size_t do_rnd_write(const int fd, char *buf, const size_t len) > do_w = cfg_do_w; > > bw = write(fd, buf, do_w); > - if (bw < 0) > - perror("write"); Should we skip the sleep done here just below and return bw directly in case of error? Or is it more like a small optimization that is not really needed? 
> > /* let the join handshake complete, before going on */ > if (cfg_join && first) { > @@ -571,7 +570,7 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after > .fd = peerfd, > .events = POLLIN | POLLOUT, > }; > - unsigned int woff = 0, wlen = 0; > + unsigned int woff = 0, wlen = 0, total_wlen = 0, total_rlen = 0; > char wbuf[8192]; > > set_nonblock(peerfd, true); > @@ -597,7 +596,16 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after > } > > if (fds.revents & POLLIN) { > - len = do_rnd_read(peerfd, rbuf, sizeof(rbuf)); > + ssize_t rb = sizeof(rbuf); > + > + /* limit the total amount of read data to the trunc value*/ > + if (cfg_truncate > 0) { > + if (rb + total_rlen > cfg_truncate) > + rb = cfg_truncate - total_rlen; > + len = read(peerfd, rbuf, rb); > + } else { > + len = do_rnd_read(peerfd, rbuf, sizeof(rbuf)); > + } > if (len == 0) { > /* no more data to receive: > * peer has closed its write side > @@ -612,10 +620,14 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after > > /* Else, still have data to transmit */ > } else if (len < 0) { > + /* ignore errors on I/O operation when the peer is fastclosing */ > + if (cfg_truncate < 0) (it is confusing to me to use negative values for a special case (requiring an additional comment each time) instead of using a new dedicated option with a clear name but probably just me :) ) > + return 0; > perror("read"); > return 3; > } > > + total_rlen += len; > do_write(outfd, rbuf, len); > } > (...) 
> + cfg_truncate = atoi(optarg); > + > + /* when receiving a fastclose, ignore PIPE signals */ > + if (cfg_truncate < 0) > + signal(SIGPIPE, handle_signal); > + break; > case 'j': > cfg_join = true; > cfg_mode = CFG_MODE_POLL; > diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh > index 2957fe414639..0f4c9c4bcb5b 100755 > --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh > +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh (...) > @@ -707,9 +718,29 @@ do_transfer() > fi > > local flags="subflow" > + local extra_cl_args="" > + local extra_srv_args="" > + local trunc_size="" > if [[ "${addr_nr_ns2}" = "fastclose_"* ]]; then > + if [ ${test_link_fail} -le 1 ]; then > + echo "fastclose tests need test_link_fail argument" > + return 0 Probably clearer to exit directly with an error, to be sure the CI catches this, no? Or do you think it is not needed? > + fi > + > # disconnect > - extra_args="$extra_args -I ${addr_nr_ns2:10}" > + trunc_size=${test_link_fail} > + local side=${addr_nr_ns2:10} > + > + if [ ${side} = "client" ]; then > + extra_cl_args="-f ${test_link_fail}" > + extra_srv_args="-f -1" > + elif [ ${side} = "server" ]; then > + extra_srv_args="-f ${test_link_fail}" > + extra_cl_args="-f -1" > + else > + echo "wrong/unknown fastclose spec ${side}" > + return 0 Same here. > + fi > addr_nr_ns2=0 > elif [[ "${addr_nr_ns2}" = "userspace_"* ]]; then > userspace_pm=1 (...) 
> @@ -1188,12 +1221,23 @@ chk_fclose_nr() > { > local fclose_tx=$1 > local fclose_rx=$2 > + local ns_invert=${3:-""} (I'm not sure why you need to default to an empty string but that's a detail :) ) > local count > local dump_stats > + local ns_tx=$ns2 > + local ns_rx=$ns1 > + local extra_msg="" > + > + if [[ $ns_invert = "invert" ]]; then > + ns_tx=$ns1 > + ns_rx=$ns2 > + extra_msg=" invert" > + fi > > printf "%-${nr_blank}s %s" " " "ctx" > - count=$(ip netns exec $ns2 nstat -as | grep MPTcpExtMPFastcloseTx | awk '{print $2}') > + count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPFastcloseTx | awk '{print $2}') > [ -z "$count" ] && count=0 > + [ "$count" != "$fclose_tx" ] && extra_msg="$extra_msg,tx=$count" detail: if we are not in the "invert" mode, extra_msg will be empty: no need to add the 3 spaces? or init extra_msg with these 3 spaces? > if [ "$count" != "$fclose_tx" ]; then > echo "[fail] got $count MP_FASTCLOSE[s] TX expected $fclose_tx" > fail_test (...) > @@ -1236,7 +1283,7 @@ chk_rst_nr() > printf "%-${nr_blank}s %s" " " "rtx" > count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPRstTx | awk '{print $2}') > [ -z "$count" ] && count=0 > - if [ "$count" != "$rst_tx" ]; then > + if [ $count -lt $rst_tx ]; then Why is it fine if we send (and below receive) more than expected? 
> echo "[fail] got $count MP_RST[s] TX expected $rst_tx" > fail_test > dump_stats=1 > @@ -1247,7 +1294,7 @@ chk_rst_nr() > echo -n " - rstrx " > count=$(ip netns exec $ns_rx nstat -as | grep MPTcpExtMPRstRx | awk '{print $2}') > [ -z "$count" ] && count=0 > - if [ "$count" != "$rst_rx" ]; then > + if [ "$count" -lt "$rst_rx" ]; then > echo "[fail] got $count MP_RST[s] RX expected $rst_rx" > fail_test > dump_stats=1 > @@ -2801,11 +2848,18 @@ fullmesh_tests() > fastclose_tests() > { > if reset "fastclose test"; then > - run_tests $ns1 $ns2 10.0.1.1 1024 0 fastclose_2 > + run_tests $ns1 $ns2 10.0.1.1 1024 0 fastclose_client clearer :) > chk_join_nr 0 0 0 > chk_fclose_nr 1 1 > chk_rst_nr 1 1 invert > fi > + > + if reset "fastclose server test"; then > + run_tests $ns1 $ns2 10.0.1.1 1024 0 fastclose_server > + chk_join_nr 0 0 0 > + chk_fclose_nr 1 1 invert > + chk_rst_nr 1 1 > + fi > } > > pedit_action_pkts() Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 mptcp-next 3/3] selftests: mptcp: update and extend fastclose test-cases 2022-09-07 17:22 ` Matthieu Baerts @ 2022-09-07 18:58 ` Matthieu Baerts 2022-09-09 8:09 ` Paolo Abeni 1 sibling, 0 replies; 10+ messages in thread From: Matthieu Baerts @ 2022-09-07 18:58 UTC (permalink / raw) To: Paolo Abeni, mptcp On 07/09/2022 19:22, Matthieu Baerts wrote: > Hi Paolo, > > On 01/09/2022 19:31, Paolo Abeni wrote: >> After the previous patches, the MPTCP protocol can generate >> fast-closes on both ends of the connection. Rework the relevant >> test-case to carefully trigger the fast-close code-path on a >> single end at the time, while ensuring than a predictable amount >> of data is spooled on both ends. >> >> Additionally add another test-cases for the passive socket >> fast-close. > > Thank you for the v3! One last thing: it looks like some packetdrill tests -- dss and syscalls -- will need to be adapted as well because in some situations, a RST is now going to be sent instead of a FIN. Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 mptcp-next 3/3] selftests: mptcp: update and extend fastclose test-cases 2022-09-07 17:22 ` Matthieu Baerts 2022-09-07 18:58 ` Matthieu Baerts @ 2022-09-09 8:09 ` Paolo Abeni 2022-09-09 8:52 ` Matthieu Baerts 1 sibling, 1 reply; 10+ messages in thread From: Paolo Abeni @ 2022-09-09 8:09 UTC (permalink / raw) To: Matthieu Baerts, mptcp On Wed, 2022-09-07 at 19:22 +0200, Matthieu Baerts wrote: > Hi Paolo, > > On 01/09/2022 19:31, Paolo Abeni wrote: > > After the previous patches, the MPTCP protocol can generate > > fast-closes on both ends of the connection. Rework the relevant > > test-case to carefully trigger the fast-close code-path on a > > single end at the time, while ensuring than a predictable amount > > of data is spooled on both ends. > > > > Additionally add another test-cases for the passive socket > > fast-close. > > Thank you for the v3! > > (...) > > > diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c > > index 24d4e9cb617e..ee6df83c55f3 100644 > > --- a/tools/testing/selftests/net/mptcp/mptcp_connect.c > > +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c > > (...) > > > @@ -381,8 +382,6 @@ static size_t do_rnd_write(const int fd, char *buf, const size_t len) > > do_w = cfg_do_w; > > > > bw = write(fd, buf, do_w); > > - if (bw < 0) > > - perror("write"); > > Should we skip the sleep done here just below and return bw directly in > case of error? > Or is it more like an small opti not really needed? 
> > > > > /* let the join handshake complete, before going on */ > > if (cfg_join && first) { > > @@ -571,7 +570,7 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after > > .fd = peerfd, > > .events = POLLIN | POLLOUT, > > }; > > - unsigned int woff = 0, wlen = 0; > > + unsigned int woff = 0, wlen = 0, total_wlen = 0, total_rlen = 0; > > char wbuf[8192]; > > > > set_nonblock(peerfd, true); > > @@ -597,7 +596,16 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after > > } > > > > if (fds.revents & POLLIN) { > > - len = do_rnd_read(peerfd, rbuf, sizeof(rbuf)); > > + ssize_t rb = sizeof(rbuf); > > + > > + /* limit the total amount of read data to the trunc value*/ > > + if (cfg_truncate > 0) { > > + if (rb + total_rlen > cfg_truncate) > > + rb = cfg_truncate - total_rlen; > > + len = read(peerfd, rbuf, rb); > > + } else { > > + len = do_rnd_read(peerfd, rbuf, sizeof(rbuf)); > > + } > > if (len == 0) { > > /* no more data to receive: > > * peer has closed its write side > > @@ -612,10 +620,14 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after > > > > /* Else, still have data to transmit */ > > } else if (len < 0) { > > + /* ignore errors on I/O operation when the peer is fastclosing */ > > + if (cfg_truncate < 0) > > (it is confusing to me to use negative values for a special case > (requiring an addional comment each time) instead of using a new > dedicated option with a clear name but probably just me :) ) > > > + return 0; > > perror("read"); > > return 3; > > } > > > > + total_rlen += len; > > do_write(outfd, rbuf, len); > > } > > > > (...) 
> > > @@ -1262,8 +1295,15 @@ static void parse_opts(int argc, char **argv) > > { > > int c; > > > > - while ((c = getopt(argc, argv, "6c:hi:I:jlm:M:o:p:P:r:R:s:S:t:T:w:")) != -1) { > > + while ((c = getopt(argc, argv, "6c:f:hi:I:jlm:M:o:p:P:r:R:s:S:t:T:w:")) != -1) { > > switch (c) { > > + case 'f': > > Maybe better to update the die_usage() function if you don't mind. > > > + cfg_truncate = atoi(optarg); > > + > > + /* when receiving a fastclose, ignore PIPE signals */ > > + if (cfg_truncate < 0) > > + signal(SIGPIPE, handle_signal); > > + break; > > case 'j': > > cfg_join = true; > > cfg_mode = CFG_MODE_POLL; > > diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh > > index 2957fe414639..0f4c9c4bcb5b 100755 > > --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh > > +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh > > (...) > > > @@ -707,9 +718,29 @@ do_transfer() > > fi > > > > local flags="subflow" > > + local extra_cl_args="" > > + local extra_srv_args="" > > + local trunc_size="" > > if [[ "${addr_nr_ns2}" = "fastclose_"* ]]; then > > + if [ ${test_link_fail} -le 1 ]; then > > + echo "fastclose tests need test_link_fail argument" > > + return 0 > > Probably clearer to exit directly with an error, to be sure the CI > catches this, no? > Or do you think it is not needed? > > > + fi > > + > > # disconnect > > - extra_args="$extra_args -I ${addr_nr_ns2:10}" > > + trunc_size=${test_link_fail} > > + local side=${addr_nr_ns2:10} > > + > > + if [ ${side} = "client" ]; then > > + extra_cl_args="-f ${test_link_fail}" > > + extra_srv_args="-f -1" > > + elif [ ${side} = "server" ]; then > > + extra_srv_args="-f ${test_link_fail}" > > + extra_cl_args="-f -1" > > + else > > + echo "wrong/unknown fastclose spec ${side}" > > + return 0 > > Same here. > > > + fi > > addr_nr_ns2=0 > > elif [[ "${addr_nr_ns2}" = "userspace_"* ]]; then > > userspace_pm=1 > > (...) 
> > > @@ -1188,12 +1221,23 @@ chk_fclose_nr() > > { > > local fclose_tx=$1 > > local fclose_rx=$2 > > + local ns_invert=${3:-""} > > (I'm not sure why you need to default to an empty string but that's a > detail :) ) > > > local count > > local dump_stats > > + local ns_tx=$ns2 > > + local ns_rx=$ns1 > > + local extra_msg="" > > + > > + if [[ $ns_invert = "invert" ]]; then > > + ns_tx=$ns1 > > + ns_rx=$ns2 > > + extra_msg=" invert" > > + fi > > > > printf "%-${nr_blank}s %s" " " "ctx" > > - count=$(ip netns exec $ns2 nstat -as | grep MPTcpExtMPFastcloseTx | awk '{print $2}') > > + count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPFastcloseTx | awk '{print $2}') > > [ -z "$count" ] && count=0 > > + [ "$count" != "$fclose_tx" ] && extra_msg="$extra_msg,tx=$count" > > detail: if we are not in the "invert" mode, extra_msg will be empty: no > need to add the 3 spaces? or init extra_msg with these 3 spaces? > > > if [ "$count" != "$fclose_tx" ]; then > > echo "[fail] got $count MP_FASTCLOSE[s] TX expected $fclose_tx" > > fail_test > > (...) > > > @@ -1236,7 +1283,7 @@ chk_rst_nr() > > printf "%-${nr_blank}s %s" " " "rtx" > > count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPRstTx | awk '{print $2}') > > [ -z "$count" ] && count=0 > > - if [ "$count" != "$rst_tx" ]; then > > + if [ $count -lt $rst_tx ]; then > > Why is it fine if we send (and below receive) more than expected? [answering only here because this will be the only suggestion leading to no changes in the next revision] Because the rst can be retransmitted. I'm not sure/can't recall how the self-test previously was able to enforce a single reset, but that does not look possible now. Thanks, Paolo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 mptcp-next 3/3] selftests: mptcp: update and extend fastclose test-cases 2022-09-09 8:09 ` Paolo Abeni @ 2022-09-09 8:52 ` Matthieu Baerts 0 siblings, 0 replies; 10+ messages in thread From: Matthieu Baerts @ 2022-09-09 8:52 UTC (permalink / raw) To: Paolo Abeni, mptcp Hi Paolo, On 09/09/2022 10:09, Paolo Abeni wrote: > On Wed, 2022-09-07 at 19:22 +0200, Matthieu Baerts wrote: >> On 01/09/2022 19:31, Paolo Abeni wrote: >>> After the previous patches, the MPTCP protocol can generate >>> fast-closes on both ends of the connection. Rework the relevant >>> test-case to carefully trigger the fast-close code-path on a >>> single end at a time, while ensuring that a predictable amount >>> of data is spooled on both ends. >>> >>> Additionally add another test-case for the passive socket >>> fast-close. >> >> Thank you for the v3! >> >> (...) >> >>> diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c >>> index 24d4e9cb617e..ee6df83c55f3 100644 >>> --- a/tools/testing/selftests/net/mptcp/mptcp_connect.c >>> +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c (...) >>> @@ -1236,7 +1283,7 @@ chk_rst_nr() >>> printf "%-${nr_blank}s %s" " " "rtx" >>> count=$(ip netns exec $ns_tx nstat -as | grep MPTcpExtMPRstTx | awk '{print $2}') >>> [ -z "$count" ] && count=0 >>> - if [ "$count" != "$rst_tx" ]; then >>> + if [ $count -lt $rst_tx ]; then >> >> Why is it fine if we send (and below receive) more than expected? > > [answering only here because this will be the only suggestion leading > to no changes in the next revision] > > Because the rst can be retransmitted. I'm not sure/can't recall how the > self-test previously was able to enforce a single reset, but that does > not look possible now. Indeed, good idea to do that! I guess we can control the number of retransmissions in general but not specifically for the reset: we don't want to reject retransmissions of the previous data packets.
(I'm sorry to repeat the same kind of thing again but should we extract this part and have a dedicated commit for -net? :) ) Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 mptcp-next 1/3] mptcp: propagate fastclose error. 2022-09-01 17:31 [PATCH v3 mptcp-next 1/3] mptcp: propagate fastclose error Paolo Abeni 2022-09-01 17:31 ` [PATCH v3 mptcp-next 2/3] mptcp: use fastclose on more edge scenarios Paolo Abeni 2022-09-01 17:31 ` [PATCH v3 mptcp-next 3/3] selftests: mptcp: update and extend fastclose test-cases Paolo Abeni @ 2022-09-07 17:22 ` Matthieu Baerts 2022-09-08 11:48 ` Paolo Abeni 2 siblings, 1 reply; 10+ messages in thread From: Matthieu Baerts @ 2022-09-07 17:22 UTC (permalink / raw) To: Paolo Abeni, mptcp Hi Paolo, On 01/09/2022 19:31, Paolo Abeni wrote: > When an mptcp socket is closed due to an incoming FASTCLOSE > option, no specific sk_err is set and later syscalls will > usually fail with EPIPE. > > Align the current fastclose error handling with TCP reset, > properly setting the socket error according to the current > msk state and propagating such error. Thank you for the patch and sorry for the delay! I have some questions here below: > Additionally sendmsg() is currently not properly handling > the sk_err, always returning EPIPE. Could it be seen as a bug-fix and deserving a dedicated commit? > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > net/mptcp/protocol.c | 45 ++++++++++++++++++++++++++++++++++---------- > 1 file changed, 35 insertions(+), 10 deletions(-) > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c > index f4891db86217..b04f184695e4 100644 > --- a/net/mptcp/protocol.c > +++ b/net/mptcp/protocol.c > @@ -1685,9 +1685,13 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) > if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) { > ret = sk_stream_wait_connect(sk, &timeo); > if (ret) > - goto out; > + goto do_error; > } > > + ret = -EPIPE; Regarding this line above...
(see below) > + if (unlikely(sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))) > + goto do_error; > + > pfrag = sk_page_frag(sk); > > while (msg_data_left(msg)) { ...Could we eventually have no data left and not go through the while loop? In this case "copied" will be 0 and we will return the new -EPIPE, maybe not what we want, no? Or should we change 'ret' value just before calling 'goto do_error' to avoid that? (...) > @@ -2404,12 +2410,31 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk) > unlock_sock_fast(tcp_sk, slow); > } > > + /* Mirror the tcp_reset() error propagation */ > + switch (sk->sk_state) { > + case TCP_SYN_SENT: > + sk->sk_err = ECONNREFUSED; > + break; > + case TCP_CLOSE_WAIT: > + sk->sk_err = EPIPE; > + break; > + case TCP_CLOSE: > + return; > + default: > + sk->sk_err = ECONNRESET; > + } Should we eventually move this out of tcp_reset(), export it (inline function?) and re-use it here to avoid the duplication (and eventually being out-of-sync in case of changes on the TCP side? even if I guess this would not change, but well) > + > inet_sk_state_store(sk, TCP_CLOSE); > sk->sk_shutdown = SHUTDOWN_MASK; > smp_mb__before_atomic(); /* SHUTDOWN must be visible first */ > set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags); > > - mptcp_close_wake_up(sk); > + /* the calling mptcp_worker will properly destroy the socket */ > + if (sock_flag(sk, SOCK_DEAD)) > + return; > + > + sk->sk_state_change(sk); > + sk_error_report(sk); > } > > static void __mptcp_retrans(struct sock *sk) Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net ^ permalink raw reply [flat|nested] 10+ messages in thread
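The switch quoted in the review above mirrors how tcp_reset() maps the socket state to an error code. As a rough userspace model of just that mapping (the enum and errno constants below are local stand-ins, not the kernel definitions, and the TCP_CLOSE early return is modeled as returning 0):

```c
/* Userspace model of the tcp_reset()-style error propagation quoted
 * above; all names and values are illustrative stand-ins. */
enum mdl_state { MDL_SYN_SENT, MDL_CLOSE_WAIT, MDL_CLOSE, MDL_ESTABLISHED };

#define MDL_ECONNREFUSED 111
#define MDL_EPIPE         32
#define MDL_ECONNRESET   104

/* returns the sk_err to set, or 0 when the socket is already closed */
static int fastclose_sk_err(enum mdl_state state)
{
        switch (state) {
        case MDL_SYN_SENT:
                return MDL_ECONNREFUSED; /* connect() never completed */
        case MDL_CLOSE_WAIT:
                return MDL_EPIPE;        /* peer already sent its FIN */
        case MDL_CLOSE:
                return 0;                /* fully closed: nothing to report */
        default:
                return MDL_ECONNRESET;   /* established and everything else */
        }
}
```

The default arm covers TCP_ESTABLISHED and the remaining states, which is why an abrupt fastclose on a live connection surfaces as ECONNRESET.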
* Re: [PATCH v3 mptcp-next 1/3] mptcp: propagate fastclose error. 2022-09-07 17:22 ` [PATCH v3 mptcp-next 1/3] mptcp: propagate fastclose error Matthieu Baerts @ 2022-09-08 11:48 ` Paolo Abeni 2022-09-08 12:13 ` Matthieu Baerts 0 siblings, 1 reply; 10+ messages in thread From: Paolo Abeni @ 2022-09-08 11:48 UTC (permalink / raw) To: Matthieu Baerts, mptcp On Wed, 2022-09-07 at 19:22 +0200, Matthieu Baerts wrote: > Hi Paolo, > > On 01/09/2022 19:31, Paolo Abeni wrote: > > When an mptcp socket is closed due to an incoming FASTCLOSE > > option, no specific sk_err is set and later syscalls will > > usually fail with EPIPE. > > > > Align the current fastclose error handling with TCP reset, > > properly setting the socket error according to the current > > msk state and propagating such error. > > Thank you for the patch and sorry for the delay! Thank you for the review! > > I have some questions here below: > > > Additionally sendmsg() is currently not properly handling > > the sk_err, always returning EPIPE. > Could it be seen as a bug-fix and deserving a dedicated commit? I personally see this as an improvement, as AFAIK there is no return error code mandated by the RFC for MPTCP. > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > net/mptcp/protocol.c | 45 ++++++++++++++++++++++++++++++++++---------- > > 1 file changed, 35 insertions(+), 10 deletions(-) > > > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c > > index f4891db86217..b04f184695e4 100644 > > --- a/net/mptcp/protocol.c > > +++ b/net/mptcp/protocol.c > > @@ -1685,9 +1685,13 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) > > if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) { > > ret = sk_stream_wait_connect(sk, &timeo); > > if (ret) > > - goto out; > > + goto do_error; > > } > > > > + ret = -EPIPE; > > Regarding this line above...
(see below) > > > + if (unlikely(sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))) > > + goto do_error; > > + > > pfrag = sk_page_frag(sk); > > > > while (msg_data_left(msg)) { > > ...Could we eventually have no data left and not go through the while > loop? yes... > In this case "copied" will be 0 and we will return the new -EPIPE, maybe > not what we want, no? no;) we now always return 'copied'. We set 'copied' to the correct error code only in the error path, saving a conditional in the fast path. All gracefully stolen from plain TCP ;) > Or should we change 'ret' value just before calling 'goto do_error' to > avoid that? > > (...) > > > @@ -2404,12 +2410,31 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk) > > unlock_sock_fast(tcp_sk, slow); > > } > > > > + /* Mirror the tcp_reset() error propagation */ > > + switch (sk->sk_state) { > > + case TCP_SYN_SENT: > > + sk->sk_err = ECONNREFUSED; > > + break; > > + case TCP_CLOSE_WAIT: > > + sk->sk_err = EPIPE; > > + break; > > + case TCP_CLOSE: > > + return; > > + default: > > + sk->sk_err = ECONNRESET; > > + } > > Should we eventually move this out of tcp_reset(), export it (inline > function?) and re-use it here to avoid the duplication (and eventually > being out-of-sync in case of changes on the TCP side? even if I guess this > would not change, but well) That could be a possible follow-up I guess, but I would refrain from touching TCP to address issues/295. Thanks! Paolo ^ permalink raw reply [flat|nested] 10+ messages in thread
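The "always return copied" flow Paolo describes (and credits to plain TCP) can be modeled in a few lines. This is only a control-flow sketch with an invented failure hook, not the kernel code: the single exit returns `copied`, and the error path overwrites it with the negative error only when nothing at all was queued.

```c
/* Control-flow model of the reworked mptcp_sendmsg() error path: one
 * exit point returning `copied`, with the error path rewriting it only
 * when no data was queued. */
#define MDL_EPIPE 32

/* queue up to `len` units; fail once `fail_at` units were queued
 * (fail_at < 0 means the transfer never fails) */
static int model_sendmsg(int len, int fail_at)
{
        int copied = 0, ret;

        while (copied < len) {
                if (copied == fail_at) {
                        ret = -MDL_EPIPE;
                        goto do_error;
                }
                copied++; /* pretend one unit reached the write queue */
        }
out:
        return copied;

do_error:
        if (copied)
                goto out;  /* partial success wins over the error code */
        copied = ret;      /* nothing queued: report the error itself */
        goto out;
}
```

In this model, a failure after 5 of 10 units still reports the 5 queued units, while an immediate failure reports -MDL_EPIPE: the distinction Matthieu was asking about, resolved without any conditional on the fast path.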
* Re: [PATCH v3 mptcp-next 1/3] mptcp: propagate fastclose error. 2022-09-08 11:48 ` Paolo Abeni @ 2022-09-08 12:13 ` Matthieu Baerts 0 siblings, 0 replies; 10+ messages in thread From: Matthieu Baerts @ 2022-09-08 12:13 UTC (permalink / raw) To: Paolo Abeni; +Cc: mptcp Hi Paolo, Thank you for your replies! On 08/09/2022 13:48, Paolo Abeni wrote: > On Wed, 2022-09-07 at 19:22 +0200, Matthieu Baerts wrote: >> On 01/09/2022 19:31, Paolo Abeni wrote: (...) >>> Additionally sendmsg() is currently not properly handling >>> the sk_err, always returning EPIPE. >> Could it be seen as a bug-fix and deserving a dedicated commit? > > I personally see this as an improvement, as AFAIK there is no return > error code mandated by the RFC for MPTCP. Mmh yes I see, we can say that. >> >>> Signed-off-by: Paolo Abeni <pabeni@redhat.com> >>> --- >>> net/mptcp/protocol.c | 45 ++++++++++++++++++++++++++++++++++---------- >>> 1 file changed, 35 insertions(+), 10 deletions(-) >>> >>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c >>> index f4891db86217..b04f184695e4 100644 >>> --- a/net/mptcp/protocol.c >>> +++ b/net/mptcp/protocol.c >>> @@ -1685,9 +1685,13 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) >>> if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) { >>> ret = sk_stream_wait_connect(sk, &timeo); >>> if (ret) >>> - goto out; >>> + goto do_error; >>> } >>> >>> + ret = -EPIPE; >> >> Regarding this line above...
My bad, for the end of the function, I was looking at code prior to this patch... Indeed, all good with the rest of the modifications :) > All gracefully stolen from plain TCP ;) :) > >> Or should we change 'ret' value just before calling 'goto do_error' to >> avoid that? >> >> (...) >> >>> @@ -2404,12 +2410,31 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk) >>> unlock_sock_fast(tcp_sk, slow); >>> } >>> >>> + /* Mirror the tcp_reset() error propagation */ >>> + switch (sk->sk_state) { >>> + case TCP_SYN_SENT: >>> + sk->sk_err = ECONNREFUSED; >>> + break; >>> + case TCP_CLOSE_WAIT: >>> + sk->sk_err = EPIPE; >>> + break; >>> + case TCP_CLOSE: >>> + return; >>> + default: >>> + sk->sk_err = ECONNRESET; >>> + } >> >> Should we eventually move this out of tcp_reset(), export it (inline >> function?) and re-use it here to avoid the duplication (and eventually >> being out-of-sync in case of changes on the TCP side? even if I guess this >> would not change, but well) > > That could be a possible follow-up I guess, but I would refrain from > touching TCP to address issues/295. Sure, we can do that later as a clean-up. Patch 1/3 and 2/3 from this v3 are then good to me: Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> (I'm not going to apply them now as we need patch 3/3 and I had some questions there + packetdrill adaptations) Cheers, Matt -- Tessares | Belgium | Hybrid Access Solutions www.tessares.net ^ permalink raw reply [flat|nested] 10+ messages in thread