From: "liujian (CE)" <liujian56@huawei.com>
To: John Fastabend <john.fastabend@gmail.com>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"jakub@cloudflare.com" <jakub@cloudflare.com>,
"lmb@cloudflare.com" <lmb@cloudflare.com>,
"edumazet@google.com" <edumazet@google.com>,
"davem@davemloft.net" <davem@davemloft.net>,
"yoshfuji@linux-ipv6.org" <yoshfuji@linux-ipv6.org>,
"dsahern@kernel.org" <dsahern@kernel.org>,
"kuba@kernel.org" <kuba@kernel.org>,
"ast@kernel.org" <ast@kernel.org>,
"andrii@kernel.org" <andrii@kernel.org>,
"kafai@fb.com" <kafai@fb.com>,
"songliubraving@fb.com" <songliubraving@fb.com>,
"yhs@fb.com" <yhs@fb.com>,
"kpsingh@kernel.org" <kpsingh@kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: RE: [PATCH] tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function
Date: Mon, 4 Oct 2021 04:21:30 +0000
Message-ID: <dfed692354c94efab9c83dee2cd5cf16@huawei.com>
In-Reply-To: <61563953e731f_6c4e420814@john-XPS-13-9370.notmuch>
> -----Original Message-----
> From: John Fastabend [mailto:john.fastabend@gmail.com]
> Sent: Friday, October 1, 2021 6:25 AM
> To: liujian (CE) <liujian56@huawei.com>; john.fastabend@gmail.com;
> daniel@iogearbox.net; jakub@cloudflare.com; lmb@cloudflare.com;
> edumazet@google.com; davem@davemloft.net; yoshfuji@linux-ipv6.org;
> dsahern@kernel.org; kuba@kernel.org; ast@kernel.org; andrii@kernel.org;
> kafai@fb.com; songliubraving@fb.com; yhs@fb.com; kpsingh@kernel.org;
> netdev@vger.kernel.org; bpf@vger.kernel.org
> Cc: liujian (CE) <liujian56@huawei.com>
> Subject: RE: [PATCH] tcp_bpf: Fix one concurrency problem in the
> tcp_bpf_send_verdict function
>
> Liu Jian wrote:
> > In the following cases:
> > We need to redirect the first msg to sock1 and the second msg to sock2.
> > The sock lock needs to be released at __SK_REDIRECT and to get another
> > sock lock, this will cause the probability that psock->eval is not set
> > to __SK_NONE when the second msg comes.
> >
> > If psock does not set apply bytes, fix this by doing the cleanup before
> > releasing the sock lock. And keep the original logic in other cases.
>
> It took me some time to figure out the above description. Please include a
> bit more detail here. This needs to be backported, so we want to be very
> clear about what the error is and how to trigger it.
>
> In this case we should list the flow to show how the interleaving of msgs
> breaks.
>
> "
> With two msgs, msgA and msgB, and a user doing nonblocking sendmsg calls
> (or multiple cores) on a single socket 'sk', we could get the following flow.
>
>  msgA, sk                               msgB, sk
>  -----------                            ---------------
>  tcp_bpf_sendmsg()
>  lock(sk)
>  psock = sk->psock
>                                         tcp_bpf_sendmsg()
>                                         lock(sk) ... blocking
>  tcp_bpf_send_verdict
>  if (psock->eval == NONE)
>    psock->eval = sk_psock_msg_verdict
>  ..
>  < handle SK_REDIRECT case >
>  release_sock(sk)                       < lock dropped so grab here >
>  ret = tcp_bpf_sendmsg_redir
>                                         psock = sk->psock
>                                         tcp_bpf_send_verdict
>  lock_sock(sk) ... blocking on B
>                                         if (psock->eval == NONE) <- boom.
>                                          psock->eval will have msgA state
>
> The problem here is we dropped the lock on msgA and grabbed it with msgB.
> Now we have old state in psock and importantly psock->eval has not been
> cleared. So msgB will run whatever action was done on A and the verdict
> program may never see it.
> "
>
> Showing the flow makes it painfully obvious why dropping that lock with old
> state is broken.
>
Thanks a lot for such a detailed example.
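
To make the interleaving above concrete, here is a minimal single-threaded user-space model of it. This is only a sketch, not kernel code: struct model_psock, run_verdict(), send_verdict_begin(), send_verdict_end() and replay() are hypothetical names invented for illustration, and the "verdict program" is assumed to redirect msg N to sock N.

```c
#include <assert.h>

/* Hypothetical stand-ins for __SK_NONE / __SK_REDIRECT. */
enum model_verdict { MODEL_NONE = 0, MODEL_REDIRECT = 1 };

struct model_psock {
	enum model_verdict eval;   /* cached verdict, like psock->eval */
	int sk_redir;              /* cached redirect target id, 0 = none */
	unsigned int verdict_runs; /* how often the "program" was run */
};

/* Assumed verdict program: msg N is redirected to sock N. */
static enum model_verdict run_verdict(struct model_psock *p, int msg_id)
{
	p->verdict_runs++;
	p->sk_redir = msg_id;
	return MODEL_REDIRECT;
}

/*
 * Models send_verdict up to the point where the sock lock is dropped.
 * Returns the redirect target this msg will be sent to.
 */
static int send_verdict_begin(struct model_psock *p, int msg_id,
			      int clear_before_unlock)
{
	int target;

	if (p->eval == MODEL_NONE)
		p->eval = run_verdict(p, msg_id);
	target = p->sk_redir;

	if (clear_before_unlock) {
		/* Proposed behaviour: reset cached state under the lock. */
		p->eval = MODEL_NONE;
		p->sk_redir = 0;
	}
	return target;
}

/* Models the cleanup done after the lock has been re-taken. */
static void send_verdict_end(struct model_psock *p)
{
	p->eval = MODEL_NONE;
	p->sk_redir = 0;
}

/*
 * Replays the flow: msgA drops the lock, msgB runs, msgA resumes.
 * Returns the target chosen for msgB and reports the verdict run count.
 */
static int replay(int clear_before_unlock, unsigned int *runs)
{
	struct model_psock p = { MODEL_NONE, 0, 0 };
	int target_b;

	assert(runs != 0);
	(void)send_verdict_begin(&p, 1, clear_before_unlock); /* msgA */
	target_b = send_verdict_begin(&p, 2, clear_before_unlock); /* msgB */
	send_verdict_end(&p); /* msgB finishes */
	send_verdict_end(&p); /* msgA re-takes the lock and finishes */
	*runs = p.verdict_runs;
	return target_b;
}
```

In this model, replay(0, &runs) sends msgB to msgA's target with the verdict program run only once, while replay(1, &runs) runs the program for each msg and msgB gets its own target.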
>
> >
> > Signed-off-by: Liu Jian <liujian56@huawei.com>
>
> We need a fixes tag as well so we can backport this.
I will add it.
>
> > ---
> > net/ipv4/tcp_bpf.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> > index d3e9386b493e..02442e43ac4d 100644
> > --- a/net/ipv4/tcp_bpf.c
> > +++ b/net/ipv4/tcp_bpf.c
> > @@ -232,6 +232,7 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
> > bool cork = false, enospc = sk_msg_full(msg);
> > struct sock *sk_redir;
> > u32 tosend, delta = 0;
> > + u32 eval = __SK_NONE;
> > int ret;
> >
> > more_data:
> > @@ -274,6 +275,12 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
> > break;
> > case __SK_REDIRECT:
> > sk_redir = psock->sk_redir;
> > + if (!psock->apply_bytes) {
> > + /* Clean up before releasing the sock lock. */
> > + eval = psock->eval;
> > + psock->eval = __SK_NONE;
> > + psock->sk_redir = NULL;
> > + }
>
> We need to move above chunk below sk_msg_apply_bytes() so we account
> for the bytes and if we zero apply bytes with this send we clear the psock
> state. Otherwise we could have the same issue with stale state on the
> boundary where apply bytes is met.
>
> > sk_msg_apply_bytes(psock, tosend);
>
> <-- put above chunk here.
yes, here looks better.
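
The boundary case described above can be sketched with a small user-space model. This is an illustration under assumptions, not the kernel code: struct ab_psock and the ab_* helpers are hypothetical, and ab_apply_bytes() only mimics how sk_msg_apply_bytes() is expected to consume the apply-bytes window.

```c
#include <assert.h>

#define MODEL_SK_NONE     0
#define MODEL_SK_REDIRECT 2

/* Hypothetical reduced psock: only the two fields that matter here. */
struct ab_psock {
	unsigned int apply_bytes;
	int eval;
};

/* Assumed accounting: consume up to 'bytes' of the apply-bytes window. */
static void ab_apply_bytes(struct ab_psock *p, unsigned int bytes)
{
	if (p->apply_bytes) {
		if (p->apply_bytes < bytes)
			p->apply_bytes = 0;
		else
			p->apply_bytes -= bytes;
	}
}

/* The cleanup chunk: clear cached state once no apply bytes remain. */
static void ab_cleanup_if_done(struct ab_psock *p)
{
	if (!p->apply_bytes)
		p->eval = MODEL_SK_NONE;
}

/*
 * Returns 1 if, at the point where the sock lock would be dropped, the
 * apply-bytes window is exhausted but eval still holds the old verdict.
 */
static int ab_stale_at_unlock(unsigned int apply_bytes, unsigned int tosend,
			      int cleanup_after_accounting)
{
	struct ab_psock p = { apply_bytes, MODEL_SK_REDIRECT };

	if (!cleanup_after_accounting)
		ab_cleanup_if_done(&p); /* placement in the posted patch */
	ab_apply_bytes(&p, tosend);
	if (cleanup_after_accounting)
		ab_cleanup_if_done(&p); /* placement suggested in review */

	return p.apply_bytes == 0 && p.eval != MODEL_SK_NONE;
}
```

With apply_bytes equal to tosend, ab_stale_at_unlock(100, 100, 0) reports stale state, while ab_stale_at_unlock(100, 100, 1) does not, which is the reason for doing the cleanup after the accounting.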
>
> > if (psock->cork) {
> > cork = true;
>
> Interestingly, I caught the race with cork state, but missed it with the eval
> case. Likely because our program redirected to a single sk.
>
Yes.
> > @@ -281,7 +288,12 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
> > }
> > sk_msg_return(sk, msg, tosend);
> > release_sock(sk);
> > +
> > ret = tcp_bpf_sendmsg_redir(sk_redir, msg, tosend, flags);
> > +
> > + if (eval == __SK_REDIRECT)
>
> Is the 'if' needed? we are in this case because eval is SK_REDIRECT.
>
It is needed: when "apply bytes" is not zero I did not change the logic, so the local eval stays __SK_NONE and sock_put() must not be called here.
> > + sock_put(sk_redir);
> > +
> > lock_sock(sk);
> > if (unlikely(ret < 0)) {
> > int free = sk_msg_free_nocharge(sk, msg);
> > --
> > 2.17.1
> >
Thread overview: 3+ messages
2021-09-29 8:45 [PATCH] tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function Liu Jian
2021-09-30 22:25 ` John Fastabend
2021-10-04 4:21 ` liujian (CE) [this message]