linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] bpf: Fix sk_psock refcnt leak when receiving message
@ 2020-04-25 12:50 Xiyu Yang
  2020-04-25 14:24 ` Jakub Sitnicki
  0 siblings, 1 reply; 3+ messages in thread
From: Xiyu Yang @ 2020-04-25 12:50 UTC (permalink / raw)
  To: John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer,
	Eric Dumazet, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Jakub Kicinski, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	KP Singh, netdev, bpf, linux-kernel
  Cc: yuanxzhang, kjlu, Xiyu Yang, Xin Tan

tcp_bpf_recvmsg() invokes sk_psock_get(), which returns a reference of
the specified sk_psock object to "psock" with increased refcnt.

When tcp_bpf_recvmsg() returns, local variable "psock" becomes invalid,
so the refcount should be decreased to keep refcount balanced.

The reference counting issue happens in several exception handling paths
of tcp_bpf_recvmsg(). When those error scenarios occur such as "flags"
includes MSG_ERRQUEUE, the function forgets to decrease the refcnt
increased by sk_psock_get(), causing a refcnt leak.

Fix this issue by calling sk_psock_put() when those error scenarios
occur.

Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
---
 net/ipv4/tcp_bpf.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 5a05327f97c1..feb6b90672c1 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -265,11 +265,15 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	psock = sk_psock_get(sk);
 	if (unlikely(!psock))
 		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
-	if (unlikely(flags & MSG_ERRQUEUE))
+	if (unlikely(flags & MSG_ERRQUEUE)) {
+		sk_psock_put(sk, psock);
 		return inet_recv_error(sk, msg, len, addr_len);
+	}
 	if (!skb_queue_empty(&sk->sk_receive_queue) &&
-	    sk_psock_queue_empty(psock))
+	    sk_psock_queue_empty(psock)) {
+		sk_psock_put(sk, psock);
 		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
+	}
 	lock_sock(sk);
 msg_bytes_ready:
 	copied = __tcp_bpf_recvmsg(sk, psock, msg, len, flags);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] bpf: Fix sk_psock refcnt leak when receiving message
  2020-04-25 12:50 [PATCH] bpf: Fix sk_psock refcnt leak when receiving message Xiyu Yang
@ 2020-04-25 14:24 ` Jakub Sitnicki
  2020-04-25 21:17   ` John Fastabend
  0 siblings, 1 reply; 3+ messages in thread
From: Jakub Sitnicki @ 2020-04-25 14:24 UTC (permalink / raw)
  To: Xiyu Yang
  Cc: John Fastabend, Daniel Borkmann, Lorenz Bauer, Eric Dumazet,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, KP Singh, netdev, bpf,
	linux-kernel, yuanxzhang, kjlu, Xin Tan

On Sat, Apr 25, 2020 at 02:50 PM CEST, Xiyu Yang wrote:
> tcp_bpf_recvmsg() invokes sk_psock_get(), which returns a reference of
> the specified sk_psock object to "psock" with increased refcnt.
>
> When tcp_bpf_recvmsg() returns, local variable "psock" becomes invalid,
> so the refcount should be decreased to keep refcount balanced.
>
> The reference counting issue happens in several exception handling paths
> of tcp_bpf_recvmsg(). When those error scenarios occur such as "flags"
> includes MSG_ERRQUEUE, the function forgets to decrease the refcnt
> increased by sk_psock_get(), causing a refcnt leak.
>
> Fix this issue by calling sk_psock_put() when those error scenarios
> occur.
>
> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
> ---
>  net/ipv4/tcp_bpf.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 5a05327f97c1..feb6b90672c1 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -265,11 +265,15 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>  	psock = sk_psock_get(sk);
>  	if (unlikely(!psock))
>  		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
> -	if (unlikely(flags & MSG_ERRQUEUE))
> +	if (unlikely(flags & MSG_ERRQUEUE)) {
> +		sk_psock_put(sk, psock);
>  		return inet_recv_error(sk, msg, len, addr_len);
> +	}
>  	if (!skb_queue_empty(&sk->sk_receive_queue) &&
> -	    sk_psock_queue_empty(psock))
> +	    sk_psock_queue_empty(psock)) {
> +		sk_psock_put(sk, psock);
>  		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
> +	}
>  	lock_sock(sk);
>  msg_bytes_ready:
>  	copied = __tcp_bpf_recvmsg(sk, psock, msg, len, flags);

Thanks for the fix.

We can pull up the error queue read handling, that is the `flags &
MSG_ERRQUEUE` branch, so that it happens before we grab a psock ref.

The effect is the same because now, if we hit the !psock branch,
tcp_recvmsg will first check if user wants to read the error queue
anyway.

That would translate to something like below, in addition to your
changes.

WDYT?

---8<---

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 5a05327f97c1..99aa57bd1901 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -262,14 +262,17 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	struct sk_psock *psock;
 	int copied, ret;

+	if (unlikely(flags & MSG_ERRQUEUE))
+		return inet_recv_error(sk, msg, len, addr_len);
+
 	psock = sk_psock_get(sk);
 	if (unlikely(!psock))
 		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
-	if (unlikely(flags & MSG_ERRQUEUE))
-		return inet_recv_error(sk, msg, len, addr_len);
 	if (!skb_queue_empty(&sk->sk_receive_queue) &&
 	    sk_psock_queue_empty(psock))
		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] bpf: Fix sk_psock refcnt leak when receiving message
  2020-04-25 14:24 ` Jakub Sitnicki
@ 2020-04-25 21:17   ` John Fastabend
  0 siblings, 0 replies; 3+ messages in thread
From: John Fastabend @ 2020-04-25 21:17 UTC (permalink / raw)
  To: Jakub Sitnicki, Xiyu Yang
  Cc: John Fastabend, Daniel Borkmann, Lorenz Bauer, Eric Dumazet,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Jakub Kicinski, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, KP Singh, netdev, bpf,
	linux-kernel, yuanxzhang, kjlu, Xin Tan

Jakub Sitnicki wrote:
> On Sat, Apr 25, 2020 at 02:50 PM CEST, Xiyu Yang wrote:
> > tcp_bpf_recvmsg() invokes sk_psock_get(), which returns a reference of
> > the specified sk_psock object to "psock" with increased refcnt.
> >
> > When tcp_bpf_recvmsg() returns, local variable "psock" becomes invalid,
> > so the refcount should be decreased to keep refcount balanced.
> >
> > The reference counting issue happens in several exception handling paths
> > of tcp_bpf_recvmsg(). When those error scenarios occur such as "flags"
> > includes MSG_ERRQUEUE, the function forgets to decrease the refcnt
> > increased by sk_psock_get(), causing a refcnt leak.
> >
> > Fix this issue by calling sk_psock_put() when those error scenarios
> > occur.
> >
> > Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
> > Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
> > ---

Thanks Xiyu and Xin. Please add a Fixes tag,

Fixes: e7a5f1f1cd000 ("bpf/sockmap: Read psock ingress_msg before sk_receive_queue")

> >  net/ipv4/tcp_bpf.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> > index 5a05327f97c1..feb6b90672c1 100644
> > --- a/net/ipv4/tcp_bpf.c
> > +++ b/net/ipv4/tcp_bpf.c
> > @@ -265,11 +265,15 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
> >  	psock = sk_psock_get(sk);
> >  	if (unlikely(!psock))
> >  		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
> > -	if (unlikely(flags & MSG_ERRQUEUE))
> > +	if (unlikely(flags & MSG_ERRQUEUE)) {
> > +		sk_psock_put(sk, psock);
> >  		return inet_recv_error(sk, msg, len, addr_len);
> > +	}
> >  	if (!skb_queue_empty(&sk->sk_receive_queue) &&
> > -	    sk_psock_queue_empty(psock))
> > +	    sk_psock_queue_empty(psock)) {
> > +		sk_psock_put(sk, psock);
> >  		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
> > +	}
> >  	lock_sock(sk);
> >  msg_bytes_ready:
> >  	copied = __tcp_bpf_recvmsg(sk, psock, msg, len, flags);
> 
> Thanks for the fix.
> 
> We can pull up the error queue read handling, that is the `flags &
> MSG_ERRQUEUE` branch, so that it happens before we grab a psock ref.
> 
> The effect is the same because now, if we hit the !psock branch,
> tcp_recvmsg will first check if user wants to read the error queue
> anyway.
> 
> That would translate to something like below, in addition to your
> changes.
> 
> WDYT?

Sure that seems slightly nicer to me. Xiyu do you mind sending a
v2 with that change.

Thanks again,
John

> 
> ---8<---
> 
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 5a05327f97c1..99aa57bd1901 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -262,14 +262,17 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>  	struct sk_psock *psock;
>  	int copied, ret;
> 
> +	if (unlikely(flags & MSG_ERRQUEUE))
> +		return inet_recv_error(sk, msg, len, addr_len);
> +
>  	psock = sk_psock_get(sk);
>  	if (unlikely(!psock))
>  		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
> -	if (unlikely(flags & MSG_ERRQUEUE))
> -		return inet_recv_error(sk, msg, len, addr_len);
>  	if (!skb_queue_empty(&sk->sk_receive_queue) &&
>  	    sk_psock_queue_empty(psock))
> 		return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-04-25 21:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-25 12:50 [PATCH] bpf: Fix sk_psock refcnt leak when receiving message Xiyu Yang
2020-04-25 14:24 ` Jakub Sitnicki
2020-04-25 21:17   ` John Fastabend

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).