[MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close

* [MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close
@ 2020-09-25 17:02 Paolo Abeni
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2020-09-25 17:02 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 3929 bytes --]

On Thu, 2020-09-24 at 15:59 +0200, Paolo Abeni wrote:
> We must not close the subflows before all the MPTCP level
> data, comprising the DATA_FIN has been acked at the MPTCP
> level, otherwise we could be unable to retransmit as needed.
> 
> __mptcp_wr_shutdown() shutdown is responsible to check for the
> correct status and close all subflows. Is called by the output
> path after spooling any data and at shutdown/close time.
> 
> In a similar way, __mptcp_destroy_sock() is responsible to clean-up
> the MPTCP level status, and is called when the msk transition
> to TCP_CLOSE.
> 
> The protocol level close() does not force anymore the TCP_CLOSE
> status, but orphan the msk socket and all the subflows.
> Orphaned msk sockets are forciby closed after a timeout or
> when all MPTCP-level data is acked.
> 
> There is a caveat about keeping the orphaned subflows around:
> the TCP stack can asynchronusly call tcp_cleanup_ulp() on them via
> tcp_close(). To prevent accessing freed memory on later MPTCP
> level operations, the msk acquires a reference to each subflow
> socket and prevent subflow_ulp_release() from releasing the
> subflow context before __mptcp_destroy_sock().
> 
> The additional subflow references are released by __mptcp_done()
> and the async ULP release is detected checking ULP ops. If such
> field has been already cleared by the ULP release path, the
> dangling context is freed directly by __mptcp_done().
> 
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> ---
> Note:
> The convoluted ctx deallocation schema fixes the UaF described in:
> 
> https://lists.01.org/hyperkitty/list/mptcp(a)lists.01.org/message/JC7GTZHEKI6A3WHQKAWF3SYUSBEU5DYM/
> 
> for me.
> ---
>  net/mptcp/protocol.c | 264 ++++++++++++++++++++++++++++++-----------
> --
>  net/mptcp/protocol.h |   4 +-
>  net/mptcp/subflow.c  |  21 +++-
>  3 files changed, 203 insertions(+), 86 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 2b6c704d4505..dec703103398 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -21,6 +21,7 @@
>  #include <net/transp_v6.h>
>  #endif
>  #include <net/mptcp.h>
> +#include <net/xfrm.h>
>  #include "protocol.h"
>  #include "mib.h"
>  
> @@ -41,6 +42,8 @@ struct mptcp_skb_cb {
>  
>  static struct percpu_counter mptcp_sockets_allocated;
>  
> +static void __mptcp_destroy_sock(struct sock *sk, int timeout);
> +
>  /* If msk has an initial subflow socket, and the MP_CAPABLE
> handshake has not
>   * completed yet or has failed, return the subflow socket.
>   * Otherwise return NULL.
> @@ -102,6 +105,7 @@ static int __mptcp_socket_create(struct
> mptcp_sock *msk)
>  	msk->subflow = ssock;
>  	subflow = mptcp_subflow_ctx(ssock->sk);
>  	list_add(&subflow->node, &msk->conn_list);
> +	sock_hold(ssock->sk);
>  	subflow->request_mptcp = 1;
>  
>  	/* accept() will wait on first subflow sk_wq, and we always
> wakes up
> @@ -313,6 +317,13 @@ static void mptcp_stop_timer(struct sock *sk)
>  	mptcp_sk(sk)->timer_ival = 0;
>  }
>  
> +static bool mptcp_wr_eof(const struct sock *sk)
> +{
> +	return sk->sk_state == TCP_FIN_WAIT2 || sk->sk_state ==
> TCP_TIME_WAIT ||
> +	       sk->sk_state == TCP_CLOSE;
> +}
> +
> +/* return true when transitioning to CLOSE state */
>  static void mptcp_check_data_fin_ack(struct sock *sk)
>  {
>  	struct mptcp_sock *msk = mptcp_sk(sk);
> @@ -331,15 +342,22 @@ static void mptcp_check_data_fin_ack(struct
> sock *sk)
>  		switch (sk->sk_state) {
>  		case TCP_FIN_WAIT1:
>  			inet_sk_state_store(sk, TCP_FIN_WAIT2);
> -			sk->sk_state_change(sk);
>  			break;
>  		case TCP_CLOSING:
> +			inet_sk_state_store(sk, TCP_TIME_WAIT);
> +			break;

I can't use TIME_WAIT here nor later, otherwise sk_fullsock() will be
fooled. Not sure why I did not notice before. I'll use directly
TCP_CLOSE.

Paolo

^ permalink raw reply	[flat|nested] 5+ messages in thread