All of lore.kernel.org
 help / color / mirror / Atom feed
* [MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close
@ 2020-09-25 17:02 Paolo Abeni
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2020-09-25 17:02 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 3929 bytes --]

On Thu, 2020-09-24 at 15:59 +0200, Paolo Abeni wrote:
> We must not close the subflows before all the MPTCP level
> data, comprising the DATA_FIN has been acked at the MPTCP
> level, otherwise we could be unable to retransmit as needed.
> 
> __mptcp_wr_shutdown() shutdown is responsible to check for the
> correct status and close all subflows. Is called by the output
> path after spooling any data and at shutdown/close time.
> 
> In a similar way, __mptcp_destroy_sock() is responsible to clean-up
> the MPTCP level status, and is called when the msk transition
> to TCP_CLOSE.
> 
> The protocol level close() does not force anymore the TCP_CLOSE
> status, but orphan the msk socket and all the subflows.
> Orphaned msk sockets are forciby closed after a timeout or
> when all MPTCP-level data is acked.
> 
> There is a caveat about keeping the orphaned subflows around:
> the TCP stack can asynchronusly call tcp_cleanup_ulp() on them via
> tcp_close(). To prevent accessing freed memory on later MPTCP
> level operations, the msk acquires a reference to each subflow
> socket and prevent subflow_ulp_release() from releasing the
> subflow context before __mptcp_destroy_sock().
> 
> The additional subflow references are released by __mptcp_done()
> and the async ULP release is detected checking ULP ops. If such
> field has been already cleared by the ULP release path, the
> dangling context is freed directly by __mptcp_done().
> 
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> ---
> Note:
> The convoluted ctx deallocation schema fixes the UaF described in:
> 
> https://lists.01.org/hyperkitty/list/mptcp(a)lists.01.org/message/JC7GTZHEKI6A3WHQKAWF3SYUSBEU5DYM/
> 
> for me.
> ---
>  net/mptcp/protocol.c | 264 ++++++++++++++++++++++++++++++-----------
> --
>  net/mptcp/protocol.h |   4 +-
>  net/mptcp/subflow.c  |  21 +++-
>  3 files changed, 203 insertions(+), 86 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 2b6c704d4505..dec703103398 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -21,6 +21,7 @@
>  #include <net/transp_v6.h>
>  #endif
>  #include <net/mptcp.h>
> +#include <net/xfrm.h>
>  #include "protocol.h"
>  #include "mib.h"
>  
> @@ -41,6 +42,8 @@ struct mptcp_skb_cb {
>  
>  static struct percpu_counter mptcp_sockets_allocated;
>  
> +static void __mptcp_destroy_sock(struct sock *sk, int timeout);
> +
>  /* If msk has an initial subflow socket, and the MP_CAPABLE
> handshake has not
>   * completed yet or has failed, return the subflow socket.
>   * Otherwise return NULL.
> @@ -102,6 +105,7 @@ static int __mptcp_socket_create(struct
> mptcp_sock *msk)
>  	msk->subflow = ssock;
>  	subflow = mptcp_subflow_ctx(ssock->sk);
>  	list_add(&subflow->node, &msk->conn_list);
> +	sock_hold(ssock->sk);
>  	subflow->request_mptcp = 1;
>  
>  	/* accept() will wait on first subflow sk_wq, and we always
> wakes up
> @@ -313,6 +317,13 @@ static void mptcp_stop_timer(struct sock *sk)
>  	mptcp_sk(sk)->timer_ival = 0;
>  }
>  
> +static bool mptcp_wr_eof(const struct sock *sk)
> +{
> +	return sk->sk_state == TCP_FIN_WAIT2 || sk->sk_state ==
> TCP_TIME_WAIT ||
> +	       sk->sk_state == TCP_CLOSE;
> +}
> +
> +/* return true when transitioning to CLOSE state */
>  static void mptcp_check_data_fin_ack(struct sock *sk)
>  {
>  	struct mptcp_sock *msk = mptcp_sk(sk);
> @@ -331,15 +342,22 @@ static void mptcp_check_data_fin_ack(struct
> sock *sk)
>  		switch (sk->sk_state) {
>  		case TCP_FIN_WAIT1:
>  			inet_sk_state_store(sk, TCP_FIN_WAIT2);
> -			sk->sk_state_change(sk);
>  			break;
>  		case TCP_CLOSING:
> +			inet_sk_state_store(sk, TCP_TIME_WAIT);
> +			break;

I can't use TIME_WAIT here nor later, otherwise sk_fullsock() will be
fooled. Not sure why I did not notice before. I'll use directly
TCP_CLOSE.

Paolo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close
@ 2020-09-28 14:47 Paolo Abeni
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2020-09-28 14:47 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 614 bytes --]

On Thu, 2020-09-24 at 15:59 +0200, Paolo Abeni wrote:
> @@ -417,9 +435,8 @@ static void mptcp_check_data_fin(struct sock *sk)
>  			inet_sk_state_store(sk, TCP_CLOSING);
>  			break;
>  		case TCP_FIN_WAIT2:
> -			inet_sk_state_store(sk, TCP_CLOSE);
> -			// @@ Close subflows now?
> -			break;
> +			inet_sk_state_store(sk, TCP_TIME_WAIT);
> +			return;

the 'return;' statement slipped-in unintentionally here: this does not
send data_fin ack, if the fin in received in fin_wait2 state.

I have quite a few pending fixes, so I'll send a v1 soon, even if there
are likely more bugs pending.

/P

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close
@ 2020-09-28  9:39 Paolo Abeni
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2020-09-28  9:39 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 4706 bytes --]

On Fri, 2020-09-25 at 14:34 -0700, Mat Martineau wrote:
> On Fri, 25 Sep 2020, Paolo Abeni wrote:
> 
> > On Thu, 2020-09-24 at 15:59 +0200, Paolo Abeni wrote:
> > > We must not close the subflows before all the MPTCP level
> > > data, comprising the DATA_FIN has been acked at the MPTCP
> > > level, otherwise we could be unable to retransmit as needed.
> > > 
> > > __mptcp_wr_shutdown() shutdown is responsible to check for the
> > > correct status and close all subflows. Is called by the output
> > > path after spooling any data and at shutdown/close time.
> > > 
> > > In a similar way, __mptcp_destroy_sock() is responsible to clean-
> > > up
> > > the MPTCP level status, and is called when the msk transition
> > > to TCP_CLOSE.
> > > 
> > > The protocol level close() does not force anymore the TCP_CLOSE
> > > status, but orphan the msk socket and all the subflows.
> > > Orphaned msk sockets are forciby closed after a timeout or
> > > when all MPTCP-level data is acked.
> > > 
> > > There is a caveat about keeping the orphaned subflows around:
> > > the TCP stack can asynchronusly call tcp_cleanup_ulp() on them
> > > via
> > > tcp_close(). To prevent accessing freed memory on later MPTCP
> > > level operations, the msk acquires a reference to each subflow
> > > socket and prevent subflow_ulp_release() from releasing the
> > > subflow context before __mptcp_destroy_sock().
> > > 
> > > The additional subflow references are released by __mptcp_done()
> > > and the async ULP release is detected checking ULP ops. If such
> > > field has been already cleared by the ULP release path, the
> > > dangling context is freed directly by __mptcp_done().
> > > 
> > > Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> > > ---
> > > Note:
> > > The convoluted ctx deallocation schema fixes the UaF described
> > > in:
> > > 
> > > https://lists.01.org/hyperkitty/list/mptcp(a)lists.01.org/message/JC7GTZHEKI6A3WHQKAWF3SYUSBEU5DYM/
> > > 
> > > for me.
> > > ---
> > >  net/mptcp/protocol.c | 264 ++++++++++++++++++++++++++++++-------
> > > ----
> > > --
> > >  net/mptcp/protocol.h |   4 +-
> > >  net/mptcp/subflow.c  |  21 +++-
> > >  3 files changed, 203 insertions(+), 86 deletions(-)
> > > 
> > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > > index 2b6c704d4505..dec703103398 100644
> > > --- a/net/mptcp/protocol.c
> > > +++ b/net/mptcp/protocol.c
> > > @@ -21,6 +21,7 @@
> > >  #include <net/transp_v6.h>
> > >  #endif
> > >  #include <net/mptcp.h>
> > > +#include <net/xfrm.h>
> > >  #include "protocol.h"
> > >  #include "mib.h"
> > > 
> > > @@ -41,6 +42,8 @@ struct mptcp_skb_cb {
> > > 
> > >  static struct percpu_counter mptcp_sockets_allocated;
> > > 
> > > +static void __mptcp_destroy_sock(struct sock *sk, int timeout);
> > > +
> > >  /* If msk has an initial subflow socket, and the MP_CAPABLE
> > > handshake has not
> > >   * completed yet or has failed, return the subflow socket.
> > >   * Otherwise return NULL.
> > > @@ -102,6 +105,7 @@ static int __mptcp_socket_create(struct
> > > mptcp_sock *msk)
> > >  	msk->subflow = ssock;
> > >  	subflow = mptcp_subflow_ctx(ssock->sk);
> > >  	list_add(&subflow->node, &msk->conn_list);
> > > +	sock_hold(ssock->sk);
> > >  	subflow->request_mptcp = 1;
> > > 
> > >  	/* accept() will wait on first subflow sk_wq, and we always
> > > wakes up
> > > @@ -313,6 +317,13 @@ static void mptcp_stop_timer(struct sock
> > > *sk)
> > >  	mptcp_sk(sk)->timer_ival = 0;
> > >  }
> > > 
> > > +static bool mptcp_wr_eof(const struct sock *sk)
> > > +{
> > > +	return sk->sk_state == TCP_FIN_WAIT2 || sk->sk_state ==
> > > TCP_TIME_WAIT ||
> > > +	       sk->sk_state == TCP_CLOSE;
> 
> Need to remove TCP_TIME_WAIT here too, the state isn't used (yet?).

Indeed, TIME_WAIT should be dropped from here. It's use is introduced
by this version of this patch, but I removed in in later version
 (shared only on IRC so far ;)

> I looked at how this interoperated with the mptcp_trunk kernel (with 
> MPTCPv1). With TCP_FIN_WAIT2 included in the mptcp_wr_eof() check, 
> __mptcp_destroy_sock() gets called before we can receive and
> acknowledge 
> the peer's DATA_FIN.
> I ran an experiment removing TCP_FIN_WAIT2 and TCP_TIME_WAIT from
> the 
> above, and it did keep the socket alive to receive DATA_FIN... 

Uhm... thinking again about it, I agree mptcp_wr_eof() is too
aggressive: we still need to have the subflows around to cope with the
scenario you describe.

> but then 
> exposed a bug matching an incoming DATA_FIN with 32-bit sequence
> number.

Lucky me, that seams an unrelated issue ;)

/P

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close
@ 2020-09-25 21:34 Mat Martineau
  0 siblings, 0 replies; 5+ messages in thread
From: Mat Martineau @ 2020-09-25 21:34 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 4614 bytes --]

On Fri, 25 Sep 2020, Paolo Abeni wrote:

> On Thu, 2020-09-24 at 15:59 +0200, Paolo Abeni wrote:
>> We must not close the subflows before all the MPTCP level
>> data, comprising the DATA_FIN has been acked at the MPTCP
>> level, otherwise we could be unable to retransmit as needed.
>>
>> __mptcp_wr_shutdown() shutdown is responsible to check for the
>> correct status and close all subflows. Is called by the output
>> path after spooling any data and at shutdown/close time.
>>
>> In a similar way, __mptcp_destroy_sock() is responsible to clean-up
>> the MPTCP level status, and is called when the msk transition
>> to TCP_CLOSE.
>>
>> The protocol level close() does not force anymore the TCP_CLOSE
>> status, but orphan the msk socket and all the subflows.
>> Orphaned msk sockets are forciby closed after a timeout or
>> when all MPTCP-level data is acked.
>>
>> There is a caveat about keeping the orphaned subflows around:
>> the TCP stack can asynchronusly call tcp_cleanup_ulp() on them via
>> tcp_close(). To prevent accessing freed memory on later MPTCP
>> level operations, the msk acquires a reference to each subflow
>> socket and prevent subflow_ulp_release() from releasing the
>> subflow context before __mptcp_destroy_sock().
>>
>> The additional subflow references are released by __mptcp_done()
>> and the async ULP release is detected checking ULP ops. If such
>> field has been already cleared by the ULP release path, the
>> dangling context is freed directly by __mptcp_done().
>>
>> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
>> ---
>> Note:
>> The convoluted ctx deallocation schema fixes the UaF described in:
>>
>> https://lists.01.org/hyperkitty/list/mptcp(a)lists.01.org/message/JC7GTZHEKI6A3WHQKAWF3SYUSBEU5DYM/
>>
>> for me.
>> ---
>>  net/mptcp/protocol.c | 264 ++++++++++++++++++++++++++++++-----------
>> --
>>  net/mptcp/protocol.h |   4 +-
>>  net/mptcp/subflow.c  |  21 +++-
>>  3 files changed, 203 insertions(+), 86 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index 2b6c704d4505..dec703103398 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -21,6 +21,7 @@
>>  #include <net/transp_v6.h>
>>  #endif
>>  #include <net/mptcp.h>
>> +#include <net/xfrm.h>
>>  #include "protocol.h"
>>  #include "mib.h"
>>
>> @@ -41,6 +42,8 @@ struct mptcp_skb_cb {
>>
>>  static struct percpu_counter mptcp_sockets_allocated;
>>
>> +static void __mptcp_destroy_sock(struct sock *sk, int timeout);
>> +
>>  /* If msk has an initial subflow socket, and the MP_CAPABLE
>> handshake has not
>>   * completed yet or has failed, return the subflow socket.
>>   * Otherwise return NULL.
>> @@ -102,6 +105,7 @@ static int __mptcp_socket_create(struct
>> mptcp_sock *msk)
>>  	msk->subflow = ssock;
>>  	subflow = mptcp_subflow_ctx(ssock->sk);
>>  	list_add(&subflow->node, &msk->conn_list);
>> +	sock_hold(ssock->sk);
>>  	subflow->request_mptcp = 1;
>>
>>  	/* accept() will wait on first subflow sk_wq, and we always
>> wakes up
>> @@ -313,6 +317,13 @@ static void mptcp_stop_timer(struct sock *sk)
>>  	mptcp_sk(sk)->timer_ival = 0;
>>  }
>>
>> +static bool mptcp_wr_eof(const struct sock *sk)
>> +{
>> +	return sk->sk_state == TCP_FIN_WAIT2 || sk->sk_state ==
>> TCP_TIME_WAIT ||
>> +	       sk->sk_state == TCP_CLOSE;

Need to remove TCP_TIME_WAIT here too, the state isn't used (yet?).

I looked at how this interoperated with the mptcp_trunk kernel (with 
MPTCPv1). With TCP_FIN_WAIT2 included in the mptcp_wr_eof() check, 
__mptcp_destroy_sock() gets called before we can receive and acknowledge 
the peer's DATA_FIN.

I ran an experiment removing TCP_FIN_WAIT2 and TCP_TIME_WAIT from the 
above, and it did keep the socket alive to receive DATA_FIN... but then 
exposed a bug matching an incoming DATA_FIN with 32-bit sequence number.

>> +}
>> +
>> +/* return true when transitioning to CLOSE state */
>>  static void mptcp_check_data_fin_ack(struct sock *sk)
>>  {
>>  	struct mptcp_sock *msk = mptcp_sk(sk);
>> @@ -331,15 +342,22 @@ static void mptcp_check_data_fin_ack(struct
>> sock *sk)
>>  		switch (sk->sk_state) {
>>  		case TCP_FIN_WAIT1:
>>  			inet_sk_state_store(sk, TCP_FIN_WAIT2);
>> -			sk->sk_state_change(sk);
>>  			break;
>>  		case TCP_CLOSING:
>> +			inet_sk_state_store(sk, TCP_TIME_WAIT);
>> +			break;
>
> I can't use TIME_WAIT here nor later, otherwise sk_fullsock() will be
> fooled. Not sure why I did not notice before. I'll use directly
> TCP_CLOSE.

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close
@ 2020-09-25 17:07 Paolo Abeni
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2020-09-25 17:07 UTC (permalink / raw)
  To: mptcp

[-- Attachment #1: Type: text/plain, Size: 1814 bytes --]

On Thu, 2020-09-24 at 15:59 +0200, Paolo Abeni wrote:
> @@ -1946,42 +2024,43 @@ static int mptcp_close_state(struct sock *sk)
>  	return next & TCP_ACTION_FIN;
>  }
>  
> -static void mptcp_close(struct sock *sk, long timeout)
> +static void __mptcp_wr_shutdown(struct sock *sk)
>  {
> -	struct mptcp_subflow_context *subflow, *tmp;
> +	struct mptcp_subflow_context *subflow;
>  	struct mptcp_sock *msk = mptcp_sk(sk);
> -	LIST_HEAD(conn_list);
>  
> -	lock_sock(sk);
> -	sk->sk_shutdown = SHUTDOWN_MASK;
> +	pr_debug("msk=%p snd_data_fin_enable=%d shutdown=%x state=%d pending=%d",
> +		 msk, msk->snd_data_fin_enable, sk->sk_shutdown, sk->sk_state,
> +		 !!mptcp_send_head(sk));
>  
> -	if (sk->sk_state == TCP_LISTEN) {
> -		inet_sk_state_store(sk, TCP_CLOSE);
> -		goto cleanup;
> -	} else if (sk->sk_state == TCP_CLOSE) {
> -		goto cleanup;
> -	}
> +	/* If we've already sent a FIN, we still need to enqueue data to
> +	 * subflows or not really shutting down, skip this
> +	 */
> +	if (msk->snd_data_fin_enable ||
> +	    !(sk->sk_shutdown & SEND_SHUTDOWN) || mptcp_send_head(sk) ||
> +	    (sk->sk_state != TCP_FIN_WAIT1 && sk->sk_state != TCP_LAST_ACK))
> +		return;
>  
> -	if (__mptcp_check_fallback(msk)) {
> -		goto update_state;
> -	} else if (mptcp_close_state(sk)) {
> -		pr_debug("Sending DATA_FIN sk=%p", sk);
> -		WRITE_ONCE(msk->write_seq, msk->write_seq + 1);
> -		WRITE_ONCE(msk->snd_data_fin_enable, 1);
> +	/* will be ignored by fallback sockets */
> +	WRITE_ONCE(msk->write_seq, msk->write_seq + 1);
> +	WRITE_ONCE(msk->snd_data_fin_enable, 1);

Here I need to increment also snd_nxt, or we will drop DATA FIN ack
in ack_update_msk(). Technically this last change could land in a
previous patch, but will be irrelevant till the next one.

/P

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-09-28 14:47 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-25 17:02 [MPTCP] Re: [RFC PATCH 6/8] mptcp: refactor shutdown and close Paolo Abeni
2020-09-25 17:07 Paolo Abeni
2020-09-25 21:34 Mat Martineau
2020-09-28  9:39 Paolo Abeni
2020-09-28 14:47 Paolo Abeni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.