Re: [PATCH 4/5] bpf: sockmap, sockhash: return file descriptors from privileged lookup

From: Jakub Sitnicki <jakub@cloudflare.com>
To: Lorenz Bauer <lmb@cloudflare.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	kernel-team@cloudflare.com, netdev@vger.kernel.org,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/5] bpf: sockmap, sockhash: return file descriptors from privileged lookup
Date: Tue, 17 Mar 2020 16:18:22 +0100	[thread overview]
Message-ID: <87imj3xb5t.fsf@cloudflare.com> (raw)
In-Reply-To: <20200310174711.7490-5-lmb@cloudflare.com>

On Tue, Mar 10, 2020 at 06:47 PM CET, Lorenz Bauer wrote:
> Allow callers with CAP_NET_ADMIN to retrieve file descriptors from a
> sockmap and sockhash. O_CLOEXEC is enforced on all fds.
>
> Without this, it's difficult to resize or otherwise rebuild existing
> sockmap or sockhashes.
>
> Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> ---
>  net/core/sock_map.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 03e04426cd21..3228936aa31e 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -347,12 +347,31 @@ static void *sock_map_lookup(struct bpf_map *map, void *key)
>  static int __sock_map_copy_value(struct bpf_map *map, struct sock *sk,
>  				 void *value)
>  {
> +	struct file *file;
> +	int fd;
> +
>  	switch (map->value_size) {
>  	case sizeof(u64):
>  		sock_gen_cookie(sk);
>  		*(u64 *)value = atomic64_read(&sk->sk_cookie);
>  		return 0;
>
> +	case sizeof(u32):
> +		if (!capable(CAP_NET_ADMIN))
> +			return -EPERM;
> +
> +		fd = get_unused_fd_flags(O_CLOEXEC);
> +		if (unlikely(fd < 0))
> +			return fd;
> +
> +		read_lock_bh(&sk->sk_callback_lock);
> +		file = get_file(sk->sk_socket->file);

I think this deserves a second look.

We don't lock the sock, so what if tcp_close orphans it before we enter
this critical section? Looks like sk->sk_socket might be NULL.

I'd find a test that tries to trigger the race helpful, like:

  thread A: loop in lookup FD from map
  thread B: loop in insert FD into map, close FD

> +		read_unlock_bh(&sk->sk_callback_lock);
> +
> +		fd_install(fd, file);
> +		*(u32 *)value = fd;
> +		return 0;
> +
>  	default:
>  		return -ENOSPC;
>  	}