RE: [PATCH] tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function

From: "liujian (CE)" <liujian56@huawei.com>
To: John Fastabend <john.fastabend@gmail.com>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"jakub@cloudflare.com" <jakub@cloudflare.com>,
	"lmb@cloudflare.com" <lmb@cloudflare.com>,
	"edumazet@google.com" <edumazet@google.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"yoshfuji@linux-ipv6.org" <yoshfuji@linux-ipv6.org>,
	"dsahern@kernel.org" <dsahern@kernel.org>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"ast@kernel.org" <ast@kernel.org>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"kafai@fb.com" <kafai@fb.com>,
	"songliubraving@fb.com" <songliubraving@fb.com>,
	"yhs@fb.com" <yhs@fb.com>,
	"kpsingh@kernel.org" <kpsingh@kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: RE: [PATCH] tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function
Date: Mon, 4 Oct 2021 04:21:30 +0000	[thread overview]
Message-ID: <dfed692354c94efab9c83dee2cd5cf16@huawei.com> (raw)
In-Reply-To: <61563953e731f_6c4e420814@john-XPS-13-9370.notmuch>


> -----Original Message-----
> From: John Fastabend [mailto:john.fastabend@gmail.com]
> Sent: Friday, October 1, 2021 6:25 AM
> To: liujian (CE) <liujian56@huawei.com>; john.fastabend@gmail.com;
> daniel@iogearbox.net; jakub@cloudflare.com; lmb@cloudflare.com;
> edumazet@google.com; davem@davemloft.net; yoshfuji@linux-ipv6.org;
> dsahern@kernel.org; kuba@kernel.org; ast@kernel.org; andrii@kernel.org;
> kafai@fb.com; songliubraving@fb.com; yhs@fb.com; kpsingh@kernel.org;
> netdev@vger.kernel.org; bpf@vger.kernel.org
> Cc: liujian (CE) <liujian56@huawei.com>
> Subject: RE: [PATCH] tcp_bpf: Fix one concurrency problem in the
> tcp_bpf_send_verdict function
> 
> Liu Jian wrote:
> > In the following cases:
> > We need to redirect the first msg to sock1 and the second msg to sock2.
> > The sock lock needs to be released at __SK_REDIRECT and to get another
> > sock lock, this will cause the probability that psock->eval is not set
> > to __SK_NONE when the second msg comes.
> >
> > If psock does not set apple bytes, fix this by do the cleanup before
> > releasing the sock lock. And keep the original logic in other cases.
> 
> It took me sometime to figure out the above description. Please include a bit
> more details here this needs to be backported so we want to be very clear
> what the error  is and how to trigger it.
> 
> In this case we should list the flow to show how the interleaving of msgs
> breaks.
> 
> "
> With two Msgs, msgA and msgB and a user doing nonblocking sendmsg calls
> (or multiple cores) on a single socket 'sk' we could get the following flow.
> 
>  msgA, sk                               msgB, sk
>  -----------                            ---------------
>  tcp_bpf_sendmsg()
>  lock(sk)
>  psock = sk->psock
>                                         tcp_bpf_sendmsg()
>                                         lock(sk) ... blocking  tcp_bpf_send_verdict  if (psock-
> >eval == NONE)
>    psock->eval = sk_psock_msg_verdict
>  ..
>  < handle SK_REDIRECT case >
>    release_sock(sk)                     < lock dropped so grab here >
>    ret = tcp_bpf_sendmsg_redir
>                                         psock = sk->psock
>                                         tcp_bpf_send_verdict
>  lock_sock(sk) ... blocking on B
>                                         if (psock->eval == NONE) <- boom.
>                                          psock->eval will have msgA state
> 
> The problem here is we dropped the lock on msgA and grabbed it with msgB.
> Now we have old state in psock and importantly psock->eval has not been
> cleared. So msgB will run whatever action was done on A and the verdict
> program may never see it.
> "
> 
> Showing the flow makes it painfully obvious why dropping that lock with old
> state is broken.
> 
Thanks a lot for such a detailed example.
> 
> >
> > Signed-off-by: Liu Jian <liujian56@huawei.com>
> 
> We need a fixes tag as well so we can backport this.
 I will add it.
> 
> > ---
> >  net/ipv4/tcp_bpf.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index
> > d3e9386b493e..02442e43ac4d 100644
> > --- a/net/ipv4/tcp_bpf.c
> > +++ b/net/ipv4/tcp_bpf.c
> > @@ -232,6 +232,7 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
> >  	bool cork = false, enospc = sk_msg_full(msg);
> >  	struct sock *sk_redir;
> >  	u32 tosend, delta = 0;
> > +	u32 eval = __SK_NONE;
> >  	int ret;
> >
> >  more_data:
> > @@ -274,6 +275,12 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
> >  		break;
> >  	case __SK_REDIRECT:
> >  		sk_redir = psock->sk_redir;
> > +		if (!psock->apply_bytes) {
> > +			/* Clean up before releasing the sock lock. */
> > +			eval = psock->eval;
> > +			psock->eval = __SK_NONE;
> > +			psock->sk_redir = NULL;
> > +		}
> 
> We need to move above chunk below sk_msg_apply_bytes() so we account
> for the bytes and if we zero apply bytes with this send we clear the psock
> state. Otherwise we could have the same issue with stale state on the
> boundary where apply bytes is met.
> 
> >  		sk_msg_apply_bytes(psock, tosend);
> 
> <-- put above chunk here.
yes, here looks better. 
> 
> >  		if (psock->cork) {
> >  			cork = true;
> 
> Interestingly, I caught the race with cork state, but missed it with the eval
> case. Likely because our program redirected to a single sk.
> 
Yes. 
> > @@ -281,7 +288,12 @@ static int tcp_bpf_send_verdict(struct sock *sk,
> struct sk_psock *psock,
> >  		}
> >  		sk_msg_return(sk, msg, tosend);
> >  		release_sock(sk);
> > +
> >  		ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags);
> > +
> > +		if (eval == __SK_REDIRECT)
> 
> Is the 'if' needed? we are in this case because eval is SK_REDIRECT.
> 
Need it, because If the "apply bytes" is not zero, i did not change the logic.
> > +			sock_put(sk_redir);
> > +
> >  		lock_sock(sk);
> >  		if (unlikely(ret < 0)) {
> >  			int free = sk_msg_free_nocharge(sk, msg);
> > --
> > 2.17.1
> >