bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt()
@ 2022-08-24 22:26 Martin KaFai Lau
  2022-08-24 22:26 ` [PATCH bpf-next 01/17] net: Change sock_getsockopt() to take the sk ptr instead of the sock ptr Martin KaFai Lau
                   ` (16 more replies)
  0 siblings, 17 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:26 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

The earlier commits [0] removed duplicated code from bpf_setsockopt().
This series is to remove duplicated code from bpf_getsockopt().

Unlike the setsockopt() which had already changed to take
the sockptr_t argument, the same has not been done to
getsockopt().  This is the extra step being done in this
series.

[0]: https://lore.kernel.org/all/20220817061704.4174272-1-kafai@fb.com/

Martin KaFai Lau (17):
  net: Change sock_getsockopt() to take the sk ptr instead of the sock
    ptr
  bpf: net: Change sk_getsockopt() to take the sockptr_t argument
  bpf: net: Avoid sk_getsockopt() taking sk lock when called from bpf
  bpf: net: Change do_tcp_getsockopt() to take the sockptr_t argument
  bpf: net: Avoid do_tcp_getsockopt() taking sk lock when called from
    bpf
  bpf: net: Change do_ip_getsockopt() to take the sockptr_t argument
  bpf: net: Avoid do_ip_getsockopt() taking sk lock when called from bpf
  net: Remove unused flags argument from do_ipv6_getsockopt
  net: Add a len argument to compat_ipv6_get_msfilter()
  bpf: net: Change do_ipv6_getsockopt() to take the sockptr_t argument
  bpf: net: Avoid do_ipv6_getsockopt() taking sk lock when called from
    bpf
  bpf: Embed kernel CONFIG check into the if statement in bpf_getsockopt
  bpf: Change bpf_getsockopt(SOL_SOCKET) to reuse sk_getsockopt()
  bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt()
  bpf: Change bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt()
  bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt()
  selftest/bpf: Add test for bpf_getsockopt()

 include/linux/filter.h                        |   3 +-
 include/linux/igmp.h                          |   4 +-
 include/linux/mroute.h                        |   6 +-
 include/linux/mroute6.h                       |   4 +-
 include/linux/sockptr.h                       |   5 +
 include/net/ip.h                              |   2 +
 include/net/ipv6.h                            |   4 +-
 include/net/ipv6_stubs.h                      |   2 +
 include/net/sock.h                            |   2 +
 include/net/tcp.h                             |   2 +
 net/core/filter.c                             | 216 ++++++++----------
 net/core/sock.c                               |  51 +++--
 net/ipv4/igmp.c                               |  22 +-
 net/ipv4/ip_sockglue.c                        |  98 ++++----
 net/ipv4/ipmr.c                               |   9 +-
 net/ipv4/tcp.c                                |  92 ++++----
 net/ipv6/af_inet6.c                           |   1 +
 net/ipv6/ip6mr.c                              |  10 +-
 net/ipv6/ipv6_sockglue.c                      |  95 ++++----
 net/ipv6/mcast.c                              |   8 +-
 .../selftests/bpf/progs/bpf_tracing_net.h     |   1 +
 .../selftests/bpf/progs/setget_sockopt.c      | 148 ++++--------
 22 files changed, 375 insertions(+), 410 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 01/17] net: Change sock_getsockopt() to take the sk ptr instead of the sock ptr
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
@ 2022-08-24 22:26 ` Martin KaFai Lau
  2022-08-24 22:26 ` [PATCH bpf-next 02/17] bpf: net: Change sk_getsockopt() to take the sockptr_t argument Martin KaFai Lau
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:26 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

A latter patch refactors bpf_getsockopt(SOL_SOCKET) with the
sock_getsockopt() to avoid code duplication and code
drift between the two duplicates.

The current sock_getsockopt() takes sock ptr as the argument.
The very first thing of this function is to get back the sk ptr
by 'sk = sock->sk'.

bpf_getsockopt() could be called when the sk does not have
the sock ptr created.  Meaning sk->sk_socket is NULL.  For example,
when a passive tcp connection has just been established but has yet
been accept()-ed.  Thus, it cannot use the sock_getsockopt(sk->sk_socket)
or else it will pass a NULL ptr.

This patch moves all sock_getsockopt implementation to the newly
added sk_getsockopt().  The new sk_getsockopt() takes a sk ptr
and immediately gets the sock ptr by 'sock = sk->sk_socket'

The existing sock_getsockopt(sock) is changed to call
sk_getsockopt(sock->sk).  All existing callers have both sock->sk
and sk->sk_socket pointer.

The latter patch will make bpf_getsockopt(SOL_SOCKET) call
sk_getsockopt(sk) directly.  The bpf_getsockopt(SOL_SOCKET) does
not use the optnames that require sk->sk_socket, so it will
be safe.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/core/sock.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 2a6f84702eb9..21bc4bf6b485 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1580,10 +1580,10 @@ static int groups_to_user(gid_t __user *dst, const struct group_info *src)
 	return 0;
 }
 
-int sock_getsockopt(struct socket *sock, int level, int optname,
-		    char __user *optval, int __user *optlen)
+static int sk_getsockopt(struct sock *sk, int level, int optname,
+			 char __user *optval, int __user *optlen)
 {
-	struct sock *sk = sock->sk;
+	struct socket *sock = sk->sk_socket;
 
 	union {
 		int val;
@@ -1947,6 +1947,12 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 	return 0;
 }
 
+int sock_getsockopt(struct socket *sock, int level, int optname,
+		    char __user *optval, int __user *optlen)
+{
+	return sk_getsockopt(sock->sk, level, optname, optval, optlen);
+}
+
 /*
  * Initialize an sk_lock.
  *
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 02/17] bpf: net: Change sk_getsockopt() to take the sockptr_t argument
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
  2022-08-24 22:26 ` [PATCH bpf-next 01/17] net: Change sock_getsockopt() to take the sk ptr instead of the sock ptr Martin KaFai Lau
@ 2022-08-24 22:26 ` Martin KaFai Lau
  2022-08-25 18:07   ` Stanislav Fomichev
  2022-08-24 22:26 ` [PATCH bpf-next 03/17] bpf: net: Avoid sk_getsockopt() taking sk lock when called from bpf Martin KaFai Lau
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:26 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

This patch changes sk_getsockopt() to take the sockptr_t argument
such that it can be used by bpf_getsockopt(SOL_SOCKET) in a
latter patch.

security_socket_getpeersec_stream() is not changed.  It stays
with the __user ptr (optval.user and optlen.user) to avoid changes
to other security hooks.  bpf_getsockopt(SOL_SOCKET) also does not
support SO_PEERSEC.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/linux/filter.h  |  3 +--
 include/linux/sockptr.h |  5 +++++
 net/core/filter.c       |  5 ++---
 net/core/sock.c         | 43 +++++++++++++++++++++++------------------
 4 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index a5f21dc3c432..527ae1d64e27 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -900,8 +900,7 @@ int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk);
 int sk_reuseport_attach_bpf(u32 ufd, struct sock *sk);
 void sk_reuseport_prog_free(struct bpf_prog *prog);
 int sk_detach_filter(struct sock *sk);
-int sk_get_filter(struct sock *sk, struct sock_filter __user *filter,
-		  unsigned int len);
+int sk_get_filter(struct sock *sk, sockptr_t optval, unsigned int len);
 
 bool sk_filter_charge(struct sock *sk, struct sk_filter *fp);
 void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp);
diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h
index d45902fb4cad..bae5e2369b4f 100644
--- a/include/linux/sockptr.h
+++ b/include/linux/sockptr.h
@@ -64,6 +64,11 @@ static inline int copy_to_sockptr_offset(sockptr_t dst, size_t offset,
 	return 0;
 }
 
+static inline int copy_to_sockptr(sockptr_t dst, const void *src, size_t size)
+{
+	return copy_to_sockptr_offset(dst, 0, src, size);
+}
+
 static inline void *memdup_sockptr(sockptr_t src, size_t len)
 {
 	void *p = kmalloc_track_caller(len, GFP_USER | __GFP_NOWARN);
diff --git a/net/core/filter.c b/net/core/filter.c
index 63e25d8ce501..0f6f86b9e487 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10712,8 +10712,7 @@ int sk_detach_filter(struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(sk_detach_filter);
 
-int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
-		  unsigned int len)
+int sk_get_filter(struct sock *sk, sockptr_t optval, unsigned int len)
 {
 	struct sock_fprog_kern *fprog;
 	struct sk_filter *filter;
@@ -10744,7 +10743,7 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
 		goto out;
 
 	ret = -EFAULT;
-	if (copy_to_user(ubuf, fprog->filter, bpf_classic_proglen(fprog)))
+	if (copy_to_sockptr(optval, fprog->filter, bpf_classic_proglen(fprog)))
 		goto out;
 
 	/* Instead of bytes, the API requests to return the number
diff --git a/net/core/sock.c b/net/core/sock.c
index 21bc4bf6b485..7fa30fd4b37f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -712,8 +712,8 @@ static int sock_setbindtodevice(struct sock *sk, sockptr_t optval, int optlen)
 	return ret;
 }
 
-static int sock_getbindtodevice(struct sock *sk, char __user *optval,
-				int __user *optlen, int len)
+static int sock_getbindtodevice(struct sock *sk, sockptr_t optval,
+				sockptr_t optlen, int len)
 {
 	int ret = -ENOPROTOOPT;
 #ifdef CONFIG_NETDEVICES
@@ -737,12 +737,12 @@ static int sock_getbindtodevice(struct sock *sk, char __user *optval,
 	len = strlen(devname) + 1;
 
 	ret = -EFAULT;
-	if (copy_to_user(optval, devname, len))
+	if (copy_to_sockptr(optval, devname, len))
 		goto out;
 
 zero:
 	ret = -EFAULT;
-	if (put_user(len, optlen))
+	if (copy_to_sockptr(optlen, &len, sizeof(int)))
 		goto out;
 
 	ret = 0;
@@ -1568,20 +1568,23 @@ static void cred_to_ucred(struct pid *pid, const struct cred *cred,
 	}
 }
 
-static int groups_to_user(gid_t __user *dst, const struct group_info *src)
+static int groups_to_user(sockptr_t dst, const struct group_info *src)
 {
 	struct user_namespace *user_ns = current_user_ns();
 	int i;
 
-	for (i = 0; i < src->ngroups; i++)
-		if (put_user(from_kgid_munged(user_ns, src->gid[i]), dst + i))
+	for (i = 0; i < src->ngroups; i++) {
+		gid_t gid = from_kgid_munged(user_ns, src->gid[i]);
+
+		if (copy_to_sockptr_offset(dst, i * sizeof(gid), &gid, sizeof(gid)))
 			return -EFAULT;
+	}
 
 	return 0;
 }
 
 static int sk_getsockopt(struct sock *sk, int level, int optname,
-			 char __user *optval, int __user *optlen)
+			 sockptr_t optval, sockptr_t optlen)
 {
 	struct socket *sock = sk->sk_socket;
 
@@ -1600,7 +1603,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
 	int lv = sizeof(int);
 	int len;
 
-	if (get_user(len, optlen))
+	if (copy_from_sockptr(&len, optlen, sizeof(int)))
 		return -EFAULT;
 	if (len < 0)
 		return -EINVAL;
@@ -1735,7 +1738,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
 		cred_to_ucred(sk->sk_peer_pid, sk->sk_peer_cred, &peercred);
 		spin_unlock(&sk->sk_peer_lock);
 
-		if (copy_to_user(optval, &peercred, len))
+		if (copy_to_sockptr(optval, &peercred, len))
 			return -EFAULT;
 		goto lenout;
 	}
@@ -1753,11 +1756,11 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
 		if (len < n * sizeof(gid_t)) {
 			len = n * sizeof(gid_t);
 			put_cred(cred);
-			return put_user(len, optlen) ? -EFAULT : -ERANGE;
+			return copy_to_sockptr(optlen, &len, sizeof(int)) ? -EFAULT : -ERANGE;
 		}
 		len = n * sizeof(gid_t);
 
-		ret = groups_to_user((gid_t __user *)optval, cred->group_info);
+		ret = groups_to_user(optval, cred->group_info);
 		put_cred(cred);
 		if (ret)
 			return ret;
@@ -1773,7 +1776,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
 			return -ENOTCONN;
 		if (lv < len)
 			return -EINVAL;
-		if (copy_to_user(optval, address, len))
+		if (copy_to_sockptr(optval, address, len))
 			return -EFAULT;
 		goto lenout;
 	}
@@ -1790,7 +1793,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case SO_PEERSEC:
-		return security_socket_getpeersec_stream(sock, optval, optlen, len);
+		return security_socket_getpeersec_stream(sock, optval.user, optlen.user, len);
 
 	case SO_MARK:
 		v.val = sk->sk_mark;
@@ -1822,7 +1825,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
 		return sock_getbindtodevice(sk, optval, optlen, len);
 
 	case SO_GET_FILTER:
-		len = sk_get_filter(sk, (struct sock_filter __user *)optval, len);
+		len = sk_get_filter(sk, optval, len);
 		if (len < 0)
 			return len;
 
@@ -1870,7 +1873,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
 		sk_get_meminfo(sk, meminfo);
 
 		len = min_t(unsigned int, len, sizeof(meminfo));
-		if (copy_to_user(optval, &meminfo, len))
+		if (copy_to_sockptr(optval, &meminfo, len))
 			return -EFAULT;
 
 		goto lenout;
@@ -1939,10 +1942,10 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
 
 	if (len > lv)
 		len = lv;
-	if (copy_to_user(optval, &v, len))
+	if (copy_to_sockptr(optval, &v, len))
 		return -EFAULT;
 lenout:
-	if (put_user(len, optlen))
+	if (copy_to_sockptr(optlen, &len, sizeof(int)))
 		return -EFAULT;
 	return 0;
 }
@@ -1950,7 +1953,9 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
 int sock_getsockopt(struct socket *sock, int level, int optname,
 		    char __user *optval, int __user *optlen)
 {
-	return sk_getsockopt(sock->sk, level, optname, optval, optlen);
+	return sk_getsockopt(sock->sk, level, optname,
+			     USER_SOCKPTR(optval),
+			     USER_SOCKPTR(optlen));
 }
 
 /*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 03/17] bpf: net: Avoid sk_getsockopt() taking sk lock when called from bpf
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
  2022-08-24 22:26 ` [PATCH bpf-next 01/17] net: Change sock_getsockopt() to take the sk ptr instead of the sock ptr Martin KaFai Lau
  2022-08-24 22:26 ` [PATCH bpf-next 02/17] bpf: net: Change sk_getsockopt() to take the sockptr_t argument Martin KaFai Lau
@ 2022-08-24 22:26 ` Martin KaFai Lau
  2022-08-24 22:26 ` [PATCH bpf-next 04/17] bpf: net: Change do_tcp_getsockopt() to take the sockptr_t argument Martin KaFai Lau
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:26 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

Similar to the earlier commit that changed sk_setsockopt() to
use sockopt_{lock,release}_sock() such that it can avoid taking
lock when called from bpf.  This patch also changes sk_getsockopt()
to use sockopt_{lock,release}_sock() such that a latter patch can
make bpf_getsockopt(SOL_SOCKET) to reuse sk_getsockopt().

Only sk_get_filter() requires this change and it is used by
the optname SO_GET_FILTER.

The '.getname' implementations in sock->ops->getname() is not
changed also since bpf does not always have the sk->sk_socket
pointer and cannot support SO_PEERNAME.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/core/filter.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 0f6f86b9e487..8e7fc71160cd 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10718,7 +10718,7 @@ int sk_get_filter(struct sock *sk, sockptr_t optval, unsigned int len)
 	struct sk_filter *filter;
 	int ret = 0;
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	filter = rcu_dereference_protected(sk->sk_filter,
 					   lockdep_sock_is_held(sk));
 	if (!filter)
@@ -10751,7 +10751,7 @@ int sk_get_filter(struct sock *sk, sockptr_t optval, unsigned int len)
 	 */
 	ret = fprog->len;
 out:
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	return ret;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 04/17] bpf: net: Change do_tcp_getsockopt() to take the sockptr_t argument
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (2 preceding siblings ...)
  2022-08-24 22:26 ` [PATCH bpf-next 03/17] bpf: net: Avoid sk_getsockopt() taking sk lock when called from bpf Martin KaFai Lau
@ 2022-08-24 22:26 ` Martin KaFai Lau
  2022-08-24 22:26 ` [PATCH bpf-next 05/17] bpf: net: Avoid do_tcp_getsockopt() taking sk lock when called from bpf Martin KaFai Lau
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:26 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

Similar to the earlier patch that changes sk_getsockopt() to
take the sockptr_t argument .  This patch also changes
do_tcp_getsockopt() to take the sockptr_t argument such that
a latter patch can make bpf_getsockopt(SOL_TCP) to reuse
do_tcp_getsockopt().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/ipv4/tcp.c | 72 ++++++++++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 35 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a6986f201f92..7cd04f357873 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4044,14 +4044,14 @@ struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk,
 }
 
 static int do_tcp_getsockopt(struct sock *sk, int level,
-		int optname, char __user *optval, int __user *optlen)
+			     int optname, sockptr_t optval, sockptr_t optlen)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct net *net = sock_net(sk);
 	int val, len;
 
-	if (get_user(len, optlen))
+	if (copy_from_sockptr(&len, optlen, sizeof(int)))
 		return -EFAULT;
 
 	len = min_t(unsigned int, len, sizeof(int));
@@ -4101,15 +4101,15 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 	case TCP_INFO: {
 		struct tcp_info info;
 
-		if (get_user(len, optlen))
+		if (copy_from_sockptr(&len, optlen, sizeof(int)))
 			return -EFAULT;
 
 		tcp_get_info(sk, &info);
 
 		len = min_t(unsigned int, len, sizeof(info));
-		if (put_user(len, optlen))
+		if (copy_to_sockptr(optlen, &len, sizeof(int)))
 			return -EFAULT;
-		if (copy_to_user(optval, &info, len))
+		if (copy_to_sockptr(optval, &info, len))
 			return -EFAULT;
 		return 0;
 	}
@@ -4119,7 +4119,7 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		size_t sz = 0;
 		int attr;
 
-		if (get_user(len, optlen))
+		if (copy_from_sockptr(&len, optlen, sizeof(int)))
 			return -EFAULT;
 
 		ca_ops = icsk->icsk_ca_ops;
@@ -4127,9 +4127,9 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 			sz = ca_ops->get_info(sk, ~0U, &attr, &info);
 
 		len = min_t(unsigned int, len, sz);
-		if (put_user(len, optlen))
+		if (copy_to_sockptr(optlen, &len, sizeof(int)))
 			return -EFAULT;
-		if (copy_to_user(optval, &info, len))
+		if (copy_to_sockptr(optval, &info, len))
 			return -EFAULT;
 		return 0;
 	}
@@ -4138,27 +4138,28 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		break;
 
 	case TCP_CONGESTION:
-		if (get_user(len, optlen))
+		if (copy_from_sockptr(&len, optlen, sizeof(int)))
 			return -EFAULT;
 		len = min_t(unsigned int, len, TCP_CA_NAME_MAX);
-		if (put_user(len, optlen))
+		if (copy_to_sockptr(optlen, &len, sizeof(int)))
 			return -EFAULT;
-		if (copy_to_user(optval, icsk->icsk_ca_ops->name, len))
+		if (copy_to_sockptr(optval, icsk->icsk_ca_ops->name, len))
 			return -EFAULT;
 		return 0;
 
 	case TCP_ULP:
-		if (get_user(len, optlen))
+		if (copy_from_sockptr(&len, optlen, sizeof(int)))
 			return -EFAULT;
 		len = min_t(unsigned int, len, TCP_ULP_NAME_MAX);
 		if (!icsk->icsk_ulp_ops) {
-			if (put_user(0, optlen))
+			len = 0;
+			if (copy_to_sockptr(optlen, &len, sizeof(int)))
 				return -EFAULT;
 			return 0;
 		}
-		if (put_user(len, optlen))
+		if (copy_to_sockptr(optlen, &len, sizeof(int)))
 			return -EFAULT;
-		if (copy_to_user(optval, icsk->icsk_ulp_ops->name, len))
+		if (copy_to_sockptr(optval, icsk->icsk_ulp_ops->name, len))
 			return -EFAULT;
 		return 0;
 
@@ -4166,15 +4167,15 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		u64 key[TCP_FASTOPEN_KEY_BUF_LENGTH / sizeof(u64)];
 		unsigned int key_len;
 
-		if (get_user(len, optlen))
+		if (copy_from_sockptr(&len, optlen, sizeof(int)))
 			return -EFAULT;
 
 		key_len = tcp_fastopen_get_cipher(net, icsk, key) *
 				TCP_FASTOPEN_KEY_LENGTH;
 		len = min_t(unsigned int, len, key_len);
-		if (put_user(len, optlen))
+		if (copy_to_sockptr(optlen, &len, sizeof(int)))
 			return -EFAULT;
-		if (copy_to_user(optval, key, len))
+		if (copy_to_sockptr(optval, key, len))
 			return -EFAULT;
 		return 0;
 	}
@@ -4200,7 +4201,7 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 	case TCP_REPAIR_WINDOW: {
 		struct tcp_repair_window opt;
 
-		if (get_user(len, optlen))
+		if (copy_from_sockptr(&len, optlen, sizeof(int)))
 			return -EFAULT;
 
 		if (len != sizeof(opt))
@@ -4215,7 +4216,7 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		opt.rcv_wnd	= tp->rcv_wnd;
 		opt.rcv_wup	= tp->rcv_wup;
 
-		if (copy_to_user(optval, &opt, len))
+		if (copy_to_sockptr(optval, &opt, len))
 			return -EFAULT;
 		return 0;
 	}
@@ -4261,14 +4262,14 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		val = tp->save_syn;
 		break;
 	case TCP_SAVED_SYN: {
-		if (get_user(len, optlen))
+		if (copy_from_sockptr(&len, optlen, sizeof(int)))
 			return -EFAULT;
 
 		lock_sock(sk);
 		if (tp->saved_syn) {
 			if (len < tcp_saved_syn_len(tp->saved_syn)) {
-				if (put_user(tcp_saved_syn_len(tp->saved_syn),
-					     optlen)) {
+				len = tcp_saved_syn_len(tp->saved_syn);
+				if (copy_to_sockptr(optlen, &len, sizeof(int))) {
 					release_sock(sk);
 					return -EFAULT;
 				}
@@ -4276,11 +4277,11 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 				return -EINVAL;
 			}
 			len = tcp_saved_syn_len(tp->saved_syn);
-			if (put_user(len, optlen)) {
+			if (copy_to_sockptr(optlen, &len, sizeof(int))) {
 				release_sock(sk);
 				return -EFAULT;
 			}
-			if (copy_to_user(optval, tp->saved_syn->data, len)) {
+			if (copy_to_sockptr(optval, tp->saved_syn->data, len)) {
 				release_sock(sk);
 				return -EFAULT;
 			}
@@ -4289,7 +4290,7 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		} else {
 			release_sock(sk);
 			len = 0;
-			if (put_user(len, optlen))
+			if (copy_to_sockptr(optlen, &len, sizeof(int)))
 				return -EFAULT;
 		}
 		return 0;
@@ -4300,21 +4301,21 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		struct tcp_zerocopy_receive zc = {};
 		int err;
 
-		if (get_user(len, optlen))
+		if (copy_from_sockptr(&len, optlen, sizeof(int)))
 			return -EFAULT;
 		if (len < 0 ||
 		    len < offsetofend(struct tcp_zerocopy_receive, length))
 			return -EINVAL;
 		if (unlikely(len > sizeof(zc))) {
-			err = check_zeroed_user(optval + sizeof(zc),
-						len - sizeof(zc));
+			err = check_zeroed_sockptr(optval, sizeof(zc),
+						   len - sizeof(zc));
 			if (err < 1)
 				return err == 0 ? -EINVAL : err;
 			len = sizeof(zc);
-			if (put_user(len, optlen))
+			if (copy_to_sockptr(optlen, &len, sizeof(int)))
 				return -EFAULT;
 		}
-		if (copy_from_user(&zc, optval, len))
+		if (copy_from_sockptr(&zc, optval, len))
 			return -EFAULT;
 		if (zc.reserved)
 			return -EINVAL;
@@ -4354,7 +4355,7 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 zerocopy_rcv_inq:
 		zc.inq = tcp_inq_hint(sk);
 zerocopy_rcv_out:
-		if (!err && copy_to_user(optval, &zc, len))
+		if (!err && copy_to_sockptr(optval, &zc, len))
 			err = -EFAULT;
 		return err;
 	}
@@ -4363,9 +4364,9 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		return -ENOPROTOOPT;
 	}
 
-	if (put_user(len, optlen))
+	if (copy_to_sockptr(optlen, &len, sizeof(int)))
 		return -EFAULT;
-	if (copy_to_user(optval, &val, len))
+	if (copy_to_sockptr(optval, &val, len))
 		return -EFAULT;
 	return 0;
 }
@@ -4390,7 +4391,8 @@ int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval,
 	if (level != SOL_TCP)
 		return icsk->icsk_af_ops->getsockopt(sk, level, optname,
 						     optval, optlen);
-	return do_tcp_getsockopt(sk, level, optname, optval, optlen);
+	return do_tcp_getsockopt(sk, level, optname, USER_SOCKPTR(optval),
+				 USER_SOCKPTR(optlen));
 }
 EXPORT_SYMBOL(tcp_getsockopt);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 05/17] bpf: net: Avoid do_tcp_getsockopt() taking sk lock when called from bpf
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (3 preceding siblings ...)
  2022-08-24 22:26 ` [PATCH bpf-next 04/17] bpf: net: Change do_tcp_getsockopt() to take the sockptr_t argument Martin KaFai Lau
@ 2022-08-24 22:26 ` Martin KaFai Lau
  2022-08-24 22:26 ` [PATCH bpf-next 06/17] bpf: net: Change do_ip_getsockopt() to take the sockptr_t argument Martin KaFai Lau
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:26 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

Similar to the earlier commit that changed sk_setsockopt() to
use sockopt_{lock,release}_sock() such that it can avoid taking
lock when called from bpf.  This patch also changes do_tcp_getsockopt()
to use sockopt_{lock,release}_sock() such that a latter patch can
make bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/ipv4/tcp.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7cd04f357873..ab8118225797 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4265,30 +4265,30 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		if (copy_from_sockptr(&len, optlen, sizeof(int)))
 			return -EFAULT;
 
-		lock_sock(sk);
+		sockopt_lock_sock(sk);
 		if (tp->saved_syn) {
 			if (len < tcp_saved_syn_len(tp->saved_syn)) {
 				len = tcp_saved_syn_len(tp->saved_syn);
 				if (copy_to_sockptr(optlen, &len, sizeof(int))) {
-					release_sock(sk);
+					sockopt_release_sock(sk);
 					return -EFAULT;
 				}
-				release_sock(sk);
+				sockopt_release_sock(sk);
 				return -EINVAL;
 			}
 			len = tcp_saved_syn_len(tp->saved_syn);
 			if (copy_to_sockptr(optlen, &len, sizeof(int))) {
-				release_sock(sk);
+				sockopt_release_sock(sk);
 				return -EFAULT;
 			}
 			if (copy_to_sockptr(optval, tp->saved_syn->data, len)) {
-				release_sock(sk);
+				sockopt_release_sock(sk);
 				return -EFAULT;
 			}
 			tcp_saved_syn_free(tp);
-			release_sock(sk);
+			sockopt_release_sock(sk);
 		} else {
-			release_sock(sk);
+			sockopt_release_sock(sk);
 			len = 0;
 			if (copy_to_sockptr(optlen, &len, sizeof(int)))
 				return -EFAULT;
@@ -4321,11 +4321,11 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 			return -EINVAL;
 		if (zc.msg_flags &  ~(TCP_VALID_ZC_MSG_FLAGS))
 			return -EINVAL;
-		lock_sock(sk);
+		sockopt_lock_sock(sk);
 		err = tcp_zerocopy_receive(sk, &zc, &tss);
 		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sk, level, optname,
 							  &zc, &len, err);
-		release_sock(sk);
+		sockopt_release_sock(sk);
 		if (len >= offsetofend(struct tcp_zerocopy_receive, msg_flags))
 			goto zerocopy_rcv_cmsg;
 		switch (len) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 06/17] bpf: net: Change do_ip_getsockopt() to take the sockptr_t argument
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (4 preceding siblings ...)
  2022-08-24 22:26 ` [PATCH bpf-next 05/17] bpf: net: Avoid do_tcp_getsockopt() taking sk lock when called from bpf Martin KaFai Lau
@ 2022-08-24 22:26 ` Martin KaFai Lau
  2022-08-24 22:26 ` [PATCH bpf-next 07/17] bpf: net: Avoid do_ip_getsockopt() taking sk lock when called from bpf Martin KaFai Lau
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:26 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

Similar to the earlier patch that changes sk_getsockopt() to
take the sockptr_t argument.  This patch also changes
do_ip_getsockopt() to take the sockptr_t argument such that
a latter patch can make bpf_getsockopt(SOL_IP) to reuse
do_ip_getsockopt().

Note on the change in ip_mc_gsfget().  This function is to
return an array of sockaddr_storage in optval.  This function
is shared between ip_get_mcast_msfilter() and
compat_ip_get_mcast_msfilter().  However, the sockaddr_storage
is stored at different offset of the optval because of
the difference between group_filter and compat_group_filter.
Thus, a new 'ss_offset' argument is added to ip_mc_gsfget().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/linux/igmp.h   |  4 +--
 include/linux/mroute.h |  6 ++--
 net/ipv4/igmp.c        | 22 +++++++-----
 net/ipv4/ip_sockglue.c | 80 ++++++++++++++++++++++++------------------
 net/ipv4/ipmr.c        |  9 ++---
 5 files changed, 68 insertions(+), 53 deletions(-)

diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index 93c262ecbdc9..78890143f079 100644
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -118,9 +118,9 @@ extern int ip_mc_source(int add, int omode, struct sock *sk,
 		struct ip_mreq_source *mreqs, int ifindex);
 extern int ip_mc_msfilter(struct sock *sk, struct ip_msfilter *msf,int ifindex);
 extern int ip_mc_msfget(struct sock *sk, struct ip_msfilter *msf,
-		struct ip_msfilter __user *optval, int __user *optlen);
+			sockptr_t optval, sockptr_t optlen);
 extern int ip_mc_gsfget(struct sock *sk, struct group_filter *gsf,
-			struct sockaddr_storage __user *p);
+			sockptr_t optval, size_t offset);
 extern int ip_mc_sf_allow(struct sock *sk, __be32 local, __be32 rmt,
 			  int dif, int sdif);
 extern void ip_mc_init_dev(struct in_device *);
diff --git a/include/linux/mroute.h b/include/linux/mroute.h
index 6cbbfe94348c..80b8400ab8b2 100644
--- a/include/linux/mroute.h
+++ b/include/linux/mroute.h
@@ -17,7 +17,7 @@ static inline int ip_mroute_opt(int opt)
 }
 
 int ip_mroute_setsockopt(struct sock *, int, sockptr_t, unsigned int);
-int ip_mroute_getsockopt(struct sock *, int, char __user *, int __user *);
+int ip_mroute_getsockopt(struct sock *, int, sockptr_t, sockptr_t);
 int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg);
 int ipmr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg);
 int ip_mr_init(void);
@@ -29,8 +29,8 @@ static inline int ip_mroute_setsockopt(struct sock *sock, int optname,
 	return -ENOPROTOOPT;
 }
 
-static inline int ip_mroute_getsockopt(struct sock *sock, int optname,
-				       char __user *optval, int __user *optlen)
+static inline int ip_mroute_getsockopt(struct sock *sk, int optname,
+				       sockptr_t optval, sockptr_t optlen)
 {
 	return -ENOPROTOOPT;
 }
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index e3ab0cb61624..df0660d818ac 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -2529,11 +2529,10 @@ int ip_mc_msfilter(struct sock *sk, struct ip_msfilter *msf, int ifindex)
 		err = ip_mc_leave_group(sk, &imr);
 	return err;
 }
-
 int ip_mc_msfget(struct sock *sk, struct ip_msfilter *msf,
-	struct ip_msfilter __user *optval, int __user *optlen)
+		 sockptr_t optval, sockptr_t optlen)
 {
-	int err, len, count, copycount;
+	int err, len, count, copycount, msf_size;
 	struct ip_mreqn	imr;
 	__be32 addr = msf->imsf_multiaddr;
 	struct ip_mc_socklist *pmc;
@@ -2575,12 +2574,15 @@ int ip_mc_msfget(struct sock *sk, struct ip_msfilter *msf,
 	copycount = count < msf->imsf_numsrc ? count : msf->imsf_numsrc;
 	len = flex_array_size(psl, sl_addr, copycount);
 	msf->imsf_numsrc = count;
-	if (put_user(IP_MSFILTER_SIZE(copycount), optlen) ||
-	    copy_to_user(optval, msf, IP_MSFILTER_SIZE(0))) {
+	msf_size = IP_MSFILTER_SIZE(copycount);
+	if (copy_to_sockptr(optlen, &msf_size, sizeof(int)) ||
+	    copy_to_sockptr(optval, msf, IP_MSFILTER_SIZE(0))) {
 		return -EFAULT;
 	}
 	if (len &&
-	    copy_to_user(&optval->imsf_slist_flex[0], psl->sl_addr, len))
+	    copy_to_sockptr_offset(optval,
+				   offsetof(struct ip_msfilter, imsf_slist_flex),
+				   psl->sl_addr, len))
 		return -EFAULT;
 	return 0;
 done:
@@ -2588,7 +2590,7 @@ int ip_mc_msfget(struct sock *sk, struct ip_msfilter *msf,
 }
 
 int ip_mc_gsfget(struct sock *sk, struct group_filter *gsf,
-	struct sockaddr_storage __user *p)
+		 sockptr_t optval, size_t ss_offset)
 {
 	int i, count, copycount;
 	struct sockaddr_in *psin;
@@ -2618,15 +2620,17 @@ int ip_mc_gsfget(struct sock *sk, struct group_filter *gsf,
 	count = psl ? psl->sl_count : 0;
 	copycount = count < gsf->gf_numsrc ? count : gsf->gf_numsrc;
 	gsf->gf_numsrc = count;
-	for (i = 0; i < copycount; i++, p++) {
+	for (i = 0; i < copycount; i++) {
 		struct sockaddr_storage ss;
 
 		psin = (struct sockaddr_in *)&ss;
 		memset(&ss, 0, sizeof(ss));
 		psin->sin_family = AF_INET;
 		psin->sin_addr.s_addr = psl->sl_addr[i];
-		if (copy_to_user(p, &ss, sizeof(ss)))
+		if (copy_to_sockptr_offset(optval, ss_offset,
+					   &ss, sizeof(ss)))
 			return -EFAULT;
+		ss_offset += sizeof(ss);
 	}
 	return 0;
 }
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 751fa69cb557..5310def20e0c 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1462,37 +1462,37 @@ static bool getsockopt_needs_rtnl(int optname)
 	return false;
 }
 
-static int ip_get_mcast_msfilter(struct sock *sk, void __user *optval,
-		int __user *optlen, int len)
+static int ip_get_mcast_msfilter(struct sock *sk, sockptr_t optval,
+				 sockptr_t optlen, int len)
 {
 	const int size0 = offsetof(struct group_filter, gf_slist_flex);
-	struct group_filter __user *p = optval;
 	struct group_filter gsf;
-	int num;
+	int num, gsf_size;
 	int err;
 
 	if (len < size0)
 		return -EINVAL;
-	if (copy_from_user(&gsf, p, size0))
+	if (copy_from_sockptr(&gsf, optval, size0))
 		return -EFAULT;
 
 	num = gsf.gf_numsrc;
-	err = ip_mc_gsfget(sk, &gsf, p->gf_slist_flex);
+	err = ip_mc_gsfget(sk, &gsf, optval,
+			   offsetof(struct group_filter, gf_slist_flex));
 	if (err)
 		return err;
 	if (gsf.gf_numsrc < num)
 		num = gsf.gf_numsrc;
-	if (put_user(GROUP_FILTER_SIZE(num), optlen) ||
-	    copy_to_user(p, &gsf, size0))
+	gsf_size = GROUP_FILTER_SIZE(num);
+	if (copy_to_sockptr(optlen, &gsf_size, sizeof(int)) ||
+	    copy_to_sockptr(optval, &gsf, size0))
 		return -EFAULT;
 	return 0;
 }
 
-static int compat_ip_get_mcast_msfilter(struct sock *sk, void __user *optval,
-		int __user *optlen, int len)
+static int compat_ip_get_mcast_msfilter(struct sock *sk, sockptr_t optval,
+					sockptr_t optlen, int len)
 {
 	const int size0 = offsetof(struct compat_group_filter, gf_slist_flex);
-	struct compat_group_filter __user *p = optval;
 	struct compat_group_filter gf32;
 	struct group_filter gf;
 	int num;
@@ -1500,7 +1500,7 @@ static int compat_ip_get_mcast_msfilter(struct sock *sk, void __user *optval,
 
 	if (len < size0)
 		return -EINVAL;
-	if (copy_from_user(&gf32, p, size0))
+	if (copy_from_sockptr(&gf32, optval, size0))
 		return -EFAULT;
 
 	gf.gf_interface = gf32.gf_interface;
@@ -1508,21 +1508,24 @@ static int compat_ip_get_mcast_msfilter(struct sock *sk, void __user *optval,
 	num = gf.gf_numsrc = gf32.gf_numsrc;
 	gf.gf_group = gf32.gf_group;
 
-	err = ip_mc_gsfget(sk, &gf, p->gf_slist_flex);
+	err = ip_mc_gsfget(sk, &gf, optval,
+			   offsetof(struct compat_group_filter, gf_slist_flex));
 	if (err)
 		return err;
 	if (gf.gf_numsrc < num)
 		num = gf.gf_numsrc;
 	len = GROUP_FILTER_SIZE(num) - (sizeof(gf) - sizeof(gf32));
-	if (put_user(len, optlen) ||
-	    put_user(gf.gf_fmode, &p->gf_fmode) ||
-	    put_user(gf.gf_numsrc, &p->gf_numsrc))
+	if (copy_to_sockptr(optlen, &len, sizeof(int)) ||
+	    copy_to_sockptr_offset(optval, offsetof(struct compat_group_filter, gf_fmode),
+				   &gf.gf_fmode, sizeof(gf.gf_fmode)) ||
+	    copy_to_sockptr_offset(optval, offsetof(struct compat_group_filter, gf_numsrc),
+				   &gf.gf_numsrc, sizeof(gf.gf_numsrc)))
 		return -EFAULT;
 	return 0;
 }
 
 static int do_ip_getsockopt(struct sock *sk, int level, int optname,
-			    char __user *optval, int __user *optlen)
+			    sockptr_t optval, sockptr_t optlen)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	bool needs_rtnl = getsockopt_needs_rtnl(optname);
@@ -1535,7 +1538,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 	if (ip_mroute_opt(optname))
 		return ip_mroute_getsockopt(sk, optname, optval, optlen);
 
-	if (get_user(len, optlen))
+	if (copy_from_sockptr(&len, optlen, sizeof(int)))
 		return -EFAULT;
 	if (len < 0)
 		return -EINVAL;
@@ -1560,15 +1563,17 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 			       inet_opt->opt.optlen);
 		release_sock(sk);
 
-		if (opt->optlen == 0)
-			return put_user(0, optlen);
+		if (opt->optlen == 0) {
+			len = 0;
+			return copy_to_sockptr(optlen, &len, sizeof(int));
+		}
 
 		ip_options_undo(opt);
 
 		len = min_t(unsigned int, len, opt->optlen);
-		if (put_user(len, optlen))
+		if (copy_to_sockptr(optlen, &len, sizeof(int)))
 			return -EFAULT;
-		if (copy_to_user(optval, opt->__data, len))
+		if (copy_to_sockptr(optval, opt->__data, len))
 			return -EFAULT;
 		return 0;
 	}
@@ -1659,9 +1664,9 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 		addr.s_addr = inet->mc_addr;
 		release_sock(sk);
 
-		if (put_user(len, optlen))
+		if (copy_to_sockptr(optlen, &len, sizeof(int)))
 			return -EFAULT;
-		if (copy_to_user(optval, &addr, len))
+		if (copy_to_sockptr(optval, &addr, len))
 			return -EFAULT;
 		return 0;
 	}
@@ -1673,12 +1678,11 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 			err = -EINVAL;
 			goto out;
 		}
-		if (copy_from_user(&msf, optval, IP_MSFILTER_SIZE(0))) {
+		if (copy_from_sockptr(&msf, optval, IP_MSFILTER_SIZE(0))) {
 			err = -EFAULT;
 			goto out;
 		}
-		err = ip_mc_msfget(sk, &msf,
-				   (struct ip_msfilter __user *)optval, optlen);
+		err = ip_mc_msfget(sk, &msf, optval, optlen);
 		goto out;
 	}
 	case MCAST_MSFILTER:
@@ -1700,8 +1704,13 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 		if (sk->sk_type != SOCK_STREAM)
 			return -ENOPROTOOPT;
 
-		msg.msg_control_is_user = true;
-		msg.msg_control_user = optval;
+		if (optval.is_kernel) {
+			msg.msg_control_is_user = false;
+			msg.msg_control = optval.kernel;
+		} else {
+			msg.msg_control_is_user = true;
+			msg.msg_control_user = optval.user;
+		}
 		msg.msg_controllen = len;
 		msg.msg_flags = in_compat_syscall() ? MSG_CMSG_COMPAT : 0;
 
@@ -1722,7 +1731,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 			put_cmsg(&msg, SOL_IP, IP_TOS, sizeof(tos), &tos);
 		}
 		len -= msg.msg_controllen;
-		return put_user(len, optlen);
+		return copy_to_sockptr(optlen, &len, sizeof(int));
 	}
 	case IP_FREEBIND:
 		val = inet->freebind;
@@ -1742,15 +1751,15 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 	if (len < sizeof(int) && len > 0 && val >= 0 && val <= 255) {
 		unsigned char ucval = (unsigned char)val;
 		len = 1;
-		if (put_user(len, optlen))
+		if (copy_to_sockptr(optlen, &len, sizeof(int)))
 			return -EFAULT;
-		if (copy_to_user(optval, &ucval, 1))
+		if (copy_to_sockptr(optval, &ucval, 1))
 			return -EFAULT;
 	} else {
 		len = min_t(unsigned int, sizeof(int), len);
-		if (put_user(len, optlen))
+		if (copy_to_sockptr(optlen, &len, sizeof(int)))
 			return -EFAULT;
-		if (copy_to_user(optval, &val, len))
+		if (copy_to_sockptr(optval, &val, len))
 			return -EFAULT;
 	}
 	return 0;
@@ -1767,7 +1776,8 @@ int ip_getsockopt(struct sock *sk, int level,
 {
 	int err;
 
-	err = do_ip_getsockopt(sk, level, optname, optval, optlen);
+	err = do_ip_getsockopt(sk, level, optname,
+			       USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
 
 #if IS_ENABLED(CONFIG_BPFILTER_UMH)
 	if (optname >= BPFILTER_IPT_SO_GET_INFO &&
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 73651d17e51f..95eefbe2e142 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1546,7 +1546,8 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 }
 
 /* Getsock opt support for the multicast routing system. */
-int ip_mroute_getsockopt(struct sock *sk, int optname, char __user *optval, int __user *optlen)
+int ip_mroute_getsockopt(struct sock *sk, int optname, sockptr_t optval,
+			 sockptr_t optlen)
 {
 	int olr;
 	int val;
@@ -1577,14 +1578,14 @@ int ip_mroute_getsockopt(struct sock *sk, int optname, char __user *optval, int
 		return -ENOPROTOOPT;
 	}
 
-	if (get_user(olr, optlen))
+	if (copy_from_sockptr(&olr, optlen, sizeof(int)))
 		return -EFAULT;
 	olr = min_t(unsigned int, olr, sizeof(int));
 	if (olr < 0)
 		return -EINVAL;
-	if (put_user(olr, optlen))
+	if (copy_to_sockptr(optlen, &olr, sizeof(int)))
 		return -EFAULT;
-	if (copy_to_user(optval, &val, olr))
+	if (copy_to_sockptr(optval, &val, olr))
 		return -EFAULT;
 	return 0;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 07/17] bpf: net: Avoid do_ip_getsockopt() taking sk lock when called from bpf
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (5 preceding siblings ...)
  2022-08-24 22:26 ` [PATCH bpf-next 06/17] bpf: net: Change do_ip_getsockopt() to take the sockptr_t argument Martin KaFai Lau
@ 2022-08-24 22:26 ` Martin KaFai Lau
  2022-08-24 22:26 ` [PATCH bpf-next 08/17] net: Remove unused flags argument from do_ipv6_getsockopt Martin KaFai Lau
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:26 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

Similar to the earlier commit that changed sk_setsockopt() to
use sockopt_{lock,release}_sock() such that it can avoid taking
lock when called from bpf.  This patch also changes do_ip_getsockopt()
to use sockopt_{lock,release}_sock() such that a latter patch can
make bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/ipv4/ip_sockglue.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 5310def20e0c..5d134a75cad0 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1545,7 +1545,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 
 	if (needs_rtnl)
 		rtnl_lock();
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 
 	switch (optname) {
 	case IP_OPTIONS:
@@ -1561,7 +1561,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 			memcpy(optbuf, &inet_opt->opt,
 			       sizeof(struct ip_options) +
 			       inet_opt->opt.optlen);
-		release_sock(sk);
+		sockopt_release_sock(sk);
 
 		if (opt->optlen == 0) {
 			len = 0;
@@ -1637,7 +1637,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 			dst_release(dst);
 		}
 		if (!val) {
-			release_sock(sk);
+			sockopt_release_sock(sk);
 			return -ENOTCONN;
 		}
 		break;
@@ -1662,7 +1662,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 		struct in_addr addr;
 		len = min_t(unsigned int, len, sizeof(struct in_addr));
 		addr.s_addr = inet->mc_addr;
-		release_sock(sk);
+		sockopt_release_sock(sk);
 
 		if (copy_to_sockptr(optlen, &len, sizeof(int)))
 			return -EFAULT;
@@ -1699,7 +1699,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 	{
 		struct msghdr msg;
 
-		release_sock(sk);
+		sockopt_release_sock(sk);
 
 		if (sk->sk_type != SOCK_STREAM)
 			return -ENOPROTOOPT;
@@ -1743,10 +1743,10 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 		val = inet->min_ttl;
 		break;
 	default:
-		release_sock(sk);
+		sockopt_release_sock(sk);
 		return -ENOPROTOOPT;
 	}
-	release_sock(sk);
+	sockopt_release_sock(sk);
 
 	if (len < sizeof(int) && len > 0 && val >= 0 && val <= 255) {
 		unsigned char ucval = (unsigned char)val;
@@ -1765,7 +1765,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 	return 0;
 
 out:
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	if (needs_rtnl)
 		rtnl_unlock();
 	return err;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 08/17] net: Remove unused flags argument from do_ipv6_getsockopt
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (6 preceding siblings ...)
  2022-08-24 22:26 ` [PATCH bpf-next 07/17] bpf: net: Avoid do_ip_getsockopt() taking sk lock when called from bpf Martin KaFai Lau
@ 2022-08-24 22:26 ` Martin KaFai Lau
  2022-08-24 22:26 ` [PATCH bpf-next 09/17] net: Add a len argument to compat_ipv6_get_msfilter() Martin KaFai Lau
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:26 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

The 'unsigned int flags' argument is always 0, so it can be removed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/ipv6/ipv6_sockglue.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index a4535bdbd310..2894271c2c08 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -1133,7 +1133,7 @@ static int compat_ipv6_get_msfilter(struct sock *sk, void __user *optval,
 }
 
 static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
-		    char __user *optval, int __user *optlen, unsigned int flags)
+		    char __user *optval, int __user *optlen)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	int len;
@@ -1168,7 +1168,7 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 
 		msg.msg_control_user = optval;
 		msg.msg_controllen = len;
-		msg.msg_flags = flags;
+		msg.msg_flags = 0;
 		msg.msg_control_is_user = true;
 
 		lock_sock(sk);
@@ -1492,7 +1492,7 @@ int ipv6_getsockopt(struct sock *sk, int level, int optname,
 	if (level != SOL_IPV6)
 		return -ENOPROTOOPT;
 
-	err = do_ipv6_getsockopt(sk, level, optname, optval, optlen, 0);
+	err = do_ipv6_getsockopt(sk, level, optname, optval, optlen);
 #ifdef CONFIG_NETFILTER
 	/* we need to exclude all possible ENOPROTOOPTs except default case */
 	if (err == -ENOPROTOOPT && optname != IPV6_2292PKTOPTIONS) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 09/17] net: Add a len argument to compat_ipv6_get_msfilter()
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (7 preceding siblings ...)
  2022-08-24 22:26 ` [PATCH bpf-next 08/17] net: Remove unused flags argument from do_ipv6_getsockopt Martin KaFai Lau
@ 2022-08-24 22:26 ` Martin KaFai Lau
  2022-08-24 22:27 ` [PATCH bpf-next 10/17] bpf: net: Change do_ipv6_getsockopt() to take the sockptr_t argument Martin KaFai Lau
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:26 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

Pass the len to the compat_ipv6_get_msfilter() instead of
compat_ipv6_get_msfilter() getting it again from optlen.
Its counter part ipv6_get_msfilter() is also taking the
len from do_ipv6_getsockopt().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/ipv6/ipv6_sockglue.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 2894271c2c08..4ab284a4adf8 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -1093,17 +1093,15 @@ static int ipv6_get_msfilter(struct sock *sk, void __user *optval,
 }
 
 static int compat_ipv6_get_msfilter(struct sock *sk, void __user *optval,
-		int __user *optlen)
+		int __user *optlen, int len)
 {
 	const int size0 = offsetof(struct compat_group_filter, gf_slist_flex);
 	struct compat_group_filter __user *p = optval;
 	struct compat_group_filter gf32;
 	struct group_filter gf;
-	int len, err;
+	int err;
 	int num;
 
-	if (get_user(len, optlen))
-		return -EFAULT;
 	if (len < size0)
 		return -EINVAL;
 
@@ -1156,7 +1154,7 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		break;
 	case MCAST_MSFILTER:
 		if (in_compat_syscall())
-			return compat_ipv6_get_msfilter(sk, optval, optlen);
+			return compat_ipv6_get_msfilter(sk, optval, optlen, len);
 		return ipv6_get_msfilter(sk, optval, optlen, len);
 	case IPV6_2292PKTOPTIONS:
 	{
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 10/17] bpf: net: Change do_ipv6_getsockopt() to take the sockptr_t argument
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (8 preceding siblings ...)
  2022-08-24 22:26 ` [PATCH bpf-next 09/17] net: Add a len argument to compat_ipv6_get_msfilter() Martin KaFai Lau
@ 2022-08-24 22:27 ` Martin KaFai Lau
  2022-08-24 22:27 ` [PATCH bpf-next 11/17] bpf: net: Avoid do_ipv6_getsockopt() taking sk lock when called from bpf Martin KaFai Lau
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:27 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

Similar to the earlier patch that changes sk_getsockopt() to
take the sockptr_t argument .  This patch also changes
do_ipv6_getsockopt() to take the sockptr_t argument such that
a latter patch can make bpf_getsockopt(SOL_IPV6) to reuse
do_ipv6_getsockopt().

Note on the change in ip6_mc_msfget().  This function is to
return an array of sockaddr_storage in optval.  This function
is shared between ipv6_get_msfilter() and compat_ipv6_get_msfilter().
However, the sockaddr_storage is stored at different offset of the
optval because of the difference between group_filter and
compat_group_filter.  Thus, a new 'ss_offset' argument is
added to ip6_mc_msfget().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/linux/mroute6.h  |  4 +--
 include/net/ipv6.h       |  2 +-
 net/ipv6/ip6mr.c         | 10 +++---
 net/ipv6/ipv6_sockglue.c | 69 ++++++++++++++++++++++------------------
 net/ipv6/mcast.c         |  8 ++---
 5 files changed, 50 insertions(+), 43 deletions(-)

diff --git a/include/linux/mroute6.h b/include/linux/mroute6.h
index bc351a85ce9b..8f2b307fb124 100644
--- a/include/linux/mroute6.h
+++ b/include/linux/mroute6.h
@@ -27,7 +27,7 @@ struct sock;
 
 #ifdef CONFIG_IPV6_MROUTE
 extern int ip6_mroute_setsockopt(struct sock *, int, sockptr_t, unsigned int);
-extern int ip6_mroute_getsockopt(struct sock *, int, char __user *, int __user *);
+extern int ip6_mroute_getsockopt(struct sock *, int, sockptr_t, sockptr_t);
 extern int ip6_mr_input(struct sk_buff *skb);
 extern int ip6mr_ioctl(struct sock *sk, int cmd, void __user *arg);
 extern int ip6mr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg);
@@ -42,7 +42,7 @@ static inline int ip6_mroute_setsockopt(struct sock *sock, int optname,
 
 static inline
 int ip6_mroute_getsockopt(struct sock *sock,
-			  int optname, char __user *optval, int __user *optlen)
+			  int optname, sockptr_t optval, sockptr_t optlen)
 {
 	return -ENOPROTOOPT;
 }
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index c110d9032083..a4f24573ed7a 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1209,7 +1209,7 @@ int ip6_mc_source(int add, int omode, struct sock *sk,
 int ip6_mc_msfilter(struct sock *sk, struct group_filter *gsf,
 		  struct sockaddr_storage *list);
 int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf,
-		  struct sockaddr_storage __user *p);
+		  sockptr_t optval, size_t ss_offset);
 
 #ifdef CONFIG_PROC_FS
 int ac6_proc_init(struct net *net);
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index a9ba41648e36..516e83b52f26 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1827,8 +1827,8 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, sockptr_t optval,
  *	Getsock opt support for the multicast routing system.
  */
 
-int ip6_mroute_getsockopt(struct sock *sk, int optname, char __user *optval,
-			  int __user *optlen)
+int ip6_mroute_getsockopt(struct sock *sk, int optname, sockptr_t optval,
+			  sockptr_t optlen)
 {
 	int olr;
 	int val;
@@ -1859,16 +1859,16 @@ int ip6_mroute_getsockopt(struct sock *sk, int optname, char __user *optval,
 		return -ENOPROTOOPT;
 	}
 
-	if (get_user(olr, optlen))
+	if (copy_from_sockptr(&olr, optlen, sizeof(int)))
 		return -EFAULT;
 
 	olr = min_t(int, olr, sizeof(int));
 	if (olr < 0)
 		return -EINVAL;
 
-	if (put_user(olr, optlen))
+	if (copy_to_sockptr(optlen, &olr, sizeof(int)))
 		return -EFAULT;
-	if (copy_to_user(optval, &val, olr))
+	if (copy_to_sockptr(optval, &val, olr))
 		return -EFAULT;
 	return 0;
 }
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 4ab284a4adf8..4d9fadef2d3e 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -1030,7 +1030,7 @@ int ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
 EXPORT_SYMBOL(ipv6_setsockopt);
 
 static int ipv6_getsockopt_sticky(struct sock *sk, struct ipv6_txoptions *opt,
-				  int optname, char __user *optval, int len)
+				  int optname, sockptr_t optval, int len)
 {
 	struct ipv6_opt_hdr *hdr;
 
@@ -1058,45 +1058,44 @@ static int ipv6_getsockopt_sticky(struct sock *sk, struct ipv6_txoptions *opt,
 		return 0;
 
 	len = min_t(unsigned int, len, ipv6_optlen(hdr));
-	if (copy_to_user(optval, hdr, len))
+	if (copy_to_sockptr(optval, hdr, len))
 		return -EFAULT;
 	return len;
 }
 
-static int ipv6_get_msfilter(struct sock *sk, void __user *optval,
-		int __user *optlen, int len)
+static int ipv6_get_msfilter(struct sock *sk, sockptr_t optval,
+			     sockptr_t optlen, int len)
 {
 	const int size0 = offsetof(struct group_filter, gf_slist_flex);
-	struct group_filter __user *p = optval;
 	struct group_filter gsf;
 	int num;
 	int err;
 
 	if (len < size0)
 		return -EINVAL;
-	if (copy_from_user(&gsf, p, size0))
+	if (copy_from_sockptr(&gsf, optval, size0))
 		return -EFAULT;
 	if (gsf.gf_group.ss_family != AF_INET6)
 		return -EADDRNOTAVAIL;
 	num = gsf.gf_numsrc;
 	lock_sock(sk);
-	err = ip6_mc_msfget(sk, &gsf, p->gf_slist_flex);
+	err = ip6_mc_msfget(sk, &gsf, optval, size0);
 	if (!err) {
 		if (num > gsf.gf_numsrc)
 			num = gsf.gf_numsrc;
-		if (put_user(GROUP_FILTER_SIZE(num), optlen) ||
-		    copy_to_user(p, &gsf, size0))
+		len = GROUP_FILTER_SIZE(num);
+		if (copy_to_sockptr(optlen, &len, sizeof(int)) ||
+		    copy_to_sockptr(optval, &gsf, size0))
 			err = -EFAULT;
 	}
 	release_sock(sk);
 	return err;
 }
 
-static int compat_ipv6_get_msfilter(struct sock *sk, void __user *optval,
-		int __user *optlen, int len)
+static int compat_ipv6_get_msfilter(struct sock *sk, sockptr_t optval,
+				    sockptr_t optlen, int len)
 {
 	const int size0 = offsetof(struct compat_group_filter, gf_slist_flex);
-	struct compat_group_filter __user *p = optval;
 	struct compat_group_filter gf32;
 	struct group_filter gf;
 	int err;
@@ -1105,7 +1104,7 @@ static int compat_ipv6_get_msfilter(struct sock *sk, void __user *optval,
 	if (len < size0)
 		return -EINVAL;
 
-	if (copy_from_user(&gf32, p, size0))
+	if (copy_from_sockptr(&gf32, optval, size0))
 		return -EFAULT;
 	gf.gf_interface = gf32.gf_interface;
 	gf.gf_fmode = gf32.gf_fmode;
@@ -1116,22 +1115,24 @@ static int compat_ipv6_get_msfilter(struct sock *sk, void __user *optval,
 		return -EADDRNOTAVAIL;
 
 	lock_sock(sk);
-	err = ip6_mc_msfget(sk, &gf, p->gf_slist_flex);
+	err = ip6_mc_msfget(sk, &gf, optval, size0);
 	release_sock(sk);
 	if (err)
 		return err;
 	if (num > gf.gf_numsrc)
 		num = gf.gf_numsrc;
 	len = GROUP_FILTER_SIZE(num) - (sizeof(gf)-sizeof(gf32));
-	if (put_user(len, optlen) ||
-	    put_user(gf.gf_fmode, &p->gf_fmode) ||
-	    put_user(gf.gf_numsrc, &p->gf_numsrc))
+	if (copy_to_sockptr(optlen, &len, sizeof(int)) ||
+	    copy_to_sockptr_offset(optval, offsetof(struct compat_group_filter, gf_fmode),
+				   &gf.gf_fmode, sizeof(gf32.gf_fmode)) ||
+	    copy_to_sockptr_offset(optval, offsetof(struct compat_group_filter, gf_numsrc),
+				   &gf.gf_numsrc, sizeof(gf32.gf_numsrc)))
 		return -EFAULT;
 	return 0;
 }
 
 static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
-		    char __user *optval, int __user *optlen)
+			      sockptr_t optval, sockptr_t optlen)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	int len;
@@ -1140,7 +1141,7 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 	if (ip6_mroute_opt(optname))
 		return ip6_mroute_getsockopt(sk, optname, optval, optlen);
 
-	if (get_user(len, optlen))
+	if (copy_from_sockptr(&len, optlen, sizeof(int)))
 		return -EFAULT;
 	switch (optname) {
 	case IPV6_ADDRFORM:
@@ -1164,10 +1165,15 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		if (sk->sk_type != SOCK_STREAM)
 			return -ENOPROTOOPT;
 
-		msg.msg_control_user = optval;
+		if (optval.is_kernel) {
+			msg.msg_control_is_user = false;
+			msg.msg_control = optval.kernel;
+		} else {
+			msg.msg_control_is_user = true;
+			msg.msg_control_user = optval.user;
+		}
 		msg.msg_controllen = len;
 		msg.msg_flags = 0;
-		msg.msg_control_is_user = true;
 
 		lock_sock(sk);
 		skb = np->pktoptions;
@@ -1210,7 +1216,7 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 			}
 		}
 		len -= msg.msg_controllen;
-		return put_user(len, optlen);
+		return copy_to_sockptr(optlen, &len, sizeof(int));
 	}
 	case IPV6_MTU:
 	{
@@ -1270,7 +1276,7 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		/* check if ipv6_getsockopt_sticky() returns err code */
 		if (len < 0)
 			return len;
-		return put_user(len, optlen);
+		return copy_to_sockptr(optlen, &len, sizeof(int));
 	}
 
 	case IPV6_RECVHOPOPTS:
@@ -1324,9 +1330,9 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		if (!mtuinfo.ip6m_mtu)
 			return -ENOTCONN;
 
-		if (put_user(len, optlen))
+		if (copy_to_sockptr(optlen, &len, sizeof(int)))
 			return -EFAULT;
-		if (copy_to_user(optval, &mtuinfo, len))
+		if (copy_to_sockptr(optval, &mtuinfo, len))
 			return -EFAULT;
 
 		return 0;
@@ -1403,7 +1409,7 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		if (len < sizeof(freq))
 			return -EINVAL;
 
-		if (copy_from_user(&freq, optval, sizeof(freq)))
+		if (copy_from_sockptr(&freq, optval, sizeof(freq)))
 			return -EFAULT;
 
 		if (freq.flr_action != IPV6_FL_A_GET)
@@ -1418,9 +1424,9 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		if (val < 0)
 			return val;
 
-		if (put_user(len, optlen))
+		if (copy_to_sockptr(optlen, &len, sizeof(int)))
 			return -EFAULT;
-		if (copy_to_user(optval, &freq, len))
+		if (copy_to_sockptr(optval, &freq, len))
 			return -EFAULT;
 
 		return 0;
@@ -1472,9 +1478,9 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		return -ENOPROTOOPT;
 	}
 	len = min_t(unsigned int, sizeof(int), len);
-	if (put_user(len, optlen))
+	if (copy_to_sockptr(optlen, &len, sizeof(int)))
 		return -EFAULT;
-	if (copy_to_user(optval, &val, len))
+	if (copy_to_sockptr(optval, &val, len))
 		return -EFAULT;
 	return 0;
 }
@@ -1490,7 +1496,8 @@ int ipv6_getsockopt(struct sock *sk, int level, int optname,
 	if (level != SOL_IPV6)
 		return -ENOPROTOOPT;
 
-	err = do_ipv6_getsockopt(sk, level, optname, optval, optlen);
+	err = do_ipv6_getsockopt(sk, level, optname,
+				 USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
 #ifdef CONFIG_NETFILTER
 	/* we need to exclude all possible ENOPROTOOPTs except default case */
 	if (err == -ENOPROTOOPT && optname != IPV6_2292PKTOPTIONS) {
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 87c699d57b36..0566ab03ddbe 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -580,7 +580,7 @@ int ip6_mc_msfilter(struct sock *sk, struct group_filter *gsf,
 }
 
 int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf,
-		  struct sockaddr_storage __user *p)
+		  sockptr_t optval, size_t ss_offset)
 {
 	struct ipv6_pinfo *inet6 = inet6_sk(sk);
 	const struct in6_addr *group;
@@ -612,8 +612,7 @@ int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf,
 
 	copycount = count < gsf->gf_numsrc ? count : gsf->gf_numsrc;
 	gsf->gf_numsrc = count;
-
-	for (i = 0; i < copycount; i++, p++) {
+	for (i = 0; i < copycount; i++) {
 		struct sockaddr_in6 *psin6;
 		struct sockaddr_storage ss;
 
@@ -621,8 +620,9 @@ int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf,
 		memset(&ss, 0, sizeof(ss));
 		psin6->sin6_family = AF_INET6;
 		psin6->sin6_addr = psl->sl_addr[i];
-		if (copy_to_user(p, &ss, sizeof(ss)))
+		if (copy_to_sockptr_offset(optval, ss_offset, &ss, sizeof(ss)))
 			return -EFAULT;
+		ss_offset += sizeof(ss);
 	}
 	return 0;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 11/17] bpf: net: Avoid do_ipv6_getsockopt() taking sk lock when called from bpf
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (9 preceding siblings ...)
  2022-08-24 22:27 ` [PATCH bpf-next 10/17] bpf: net: Change do_ipv6_getsockopt() to take the sockptr_t argument Martin KaFai Lau
@ 2022-08-24 22:27 ` Martin KaFai Lau
  2022-08-24 22:27 ` [PATCH bpf-next 12/17] bpf: Embed kernel CONFIG check into the if statement in bpf_getsockopt Martin KaFai Lau
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:27 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

Similar to the earlier patch that changes sk_getsockopt() to
use sockopt_{lock,release}_sock() such that it can avoid taking the
lock when called from bpf.  This patch also changes do_ipv6_getsockopt()
to use sockopt_{lock,release}_sock() such that bpf_getsockopt(SOL_IPV6)
can reuse do_ipv6_getsockopt().

Although bpf_getsockopt(SOL_IPV6) currently does not support optname
that requires lock_sock(), using sockopt_{lock,release}_sock()
consistently across *_getsockopt() will make future optname addition
harder to miss the sockopt_{lock,release}_sock() usage. eg. when
adding new optname that requires a lock and the new optname is
needed in bpf_getsockopt() also.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/ipv6/ipv6_sockglue.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 4d9fadef2d3e..d9887e3a6077 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -1078,7 +1078,7 @@ static int ipv6_get_msfilter(struct sock *sk, sockptr_t optval,
 	if (gsf.gf_group.ss_family != AF_INET6)
 		return -EADDRNOTAVAIL;
 	num = gsf.gf_numsrc;
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	err = ip6_mc_msfget(sk, &gsf, optval, size0);
 	if (!err) {
 		if (num > gsf.gf_numsrc)
@@ -1088,7 +1088,7 @@ static int ipv6_get_msfilter(struct sock *sk, sockptr_t optval,
 		    copy_to_sockptr(optval, &gsf, size0))
 			err = -EFAULT;
 	}
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	return err;
 }
 
@@ -1114,9 +1114,9 @@ static int compat_ipv6_get_msfilter(struct sock *sk, sockptr_t optval,
 	if (gf.gf_group.ss_family != AF_INET6)
 		return -EADDRNOTAVAIL;
 
-	lock_sock(sk);
+	sockopt_lock_sock(sk);
 	err = ip6_mc_msfget(sk, &gf, optval, size0);
-	release_sock(sk);
+	sockopt_release_sock(sk);
 	if (err)
 		return err;
 	if (num > gf.gf_numsrc)
@@ -1175,11 +1175,11 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		msg.msg_controllen = len;
 		msg.msg_flags = 0;
 
-		lock_sock(sk);
+		sockopt_lock_sock(sk);
 		skb = np->pktoptions;
 		if (skb)
 			ip6_datagram_recv_ctl(sk, &msg, skb);
-		release_sock(sk);
+		sockopt_release_sock(sk);
 		if (!skb) {
 			if (np->rxopt.bits.rxinfo) {
 				struct in6_pktinfo src_info;
@@ -1268,11 +1268,11 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 	{
 		struct ipv6_txoptions *opt;
 
-		lock_sock(sk);
+		sockopt_lock_sock(sk);
 		opt = rcu_dereference_protected(np->opt,
 						lockdep_sock_is_held(sk));
 		len = ipv6_getsockopt_sticky(sk, opt, optname, optval, len);
-		release_sock(sk);
+		sockopt_release_sock(sk);
 		/* check if ipv6_getsockopt_sticky() returns err code */
 		if (len < 0)
 			return len;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 12/17] bpf: Embed kernel CONFIG check into the if statement in bpf_getsockopt
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (10 preceding siblings ...)
  2022-08-24 22:27 ` [PATCH bpf-next 11/17] bpf: net: Avoid do_ipv6_getsockopt() taking sk lock when called from bpf Martin KaFai Lau
@ 2022-08-24 22:27 ` Martin KaFai Lau
  2022-08-24 22:27 ` [PATCH bpf-next 13/17] bpf: Change bpf_getsockopt(SOL_SOCKET) to reuse sk_getsockopt() Martin KaFai Lau
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:27 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

This patch moves the "#ifdef CONFIG_XXX" check into the "if/else"
statement itself.  The change is done for the bpf_getsockopt()
function only.  It will make the latter patches easier to follow
without the surrounding ifdef macro.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 net/core/filter.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 8e7fc71160cd..347c38da1e0c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5218,8 +5218,8 @@ static int __bpf_getsockopt(struct sock *sk, int level, int optname,
 		default:
 			goto err_clear;
 		}
-#ifdef CONFIG_INET
-	} else if (level == SOL_TCP && sk->sk_prot->getsockopt == tcp_getsockopt) {
+	} else if (IS_ENABLED(CONFIG_INET) &&
+		   level == SOL_TCP && sk->sk_prot->getsockopt == tcp_getsockopt) {
 		struct inet_connection_sock *icsk;
 		struct tcp_sock *tp;
 
@@ -5243,7 +5243,7 @@ static int __bpf_getsockopt(struct sock *sk, int level, int optname,
 		default:
 			goto err_clear;
 		}
-	} else if (level == SOL_IP) {
+	} else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) {
 		struct inet_sock *inet = inet_sk(sk);
 
 		if (optlen != sizeof(int) || sk->sk_family != AF_INET)
@@ -5257,8 +5257,7 @@ static int __bpf_getsockopt(struct sock *sk, int level, int optname,
 		default:
 			goto err_clear;
 		}
-#if IS_ENABLED(CONFIG_IPV6)
-	} else if (level == SOL_IPV6) {
+	} else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) {
 		struct ipv6_pinfo *np = inet6_sk(sk);
 
 		if (optlen != sizeof(int) || sk->sk_family != AF_INET6)
@@ -5272,8 +5271,6 @@ static int __bpf_getsockopt(struct sock *sk, int level, int optname,
 		default:
 			goto err_clear;
 		}
-#endif
-#endif
 	} else {
 		goto err_clear;
 	}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 13/17] bpf: Change bpf_getsockopt(SOL_SOCKET) to reuse sk_getsockopt()
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (11 preceding siblings ...)
  2022-08-24 22:27 ` [PATCH bpf-next 12/17] bpf: Embed kernel CONFIG check into the if statement in bpf_getsockopt Martin KaFai Lau
@ 2022-08-24 22:27 ` Martin KaFai Lau
  2022-08-24 22:27 ` [PATCH bpf-next 14/17] bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt() Martin KaFai Lau
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:27 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

This patch changes bpf_getsockopt(SOL_SOCKET) to reuse
sk_getsockopt().  It removes all duplicated code from
bpf_getsockopt(SOL_SOCKET).

Before this patch, there were some optnames available to
bpf_setsockopt(SOL_SOCKET) but missing in bpf_getsockopt(SOL_SOCKET).
It surprises users from time to time.  For example, SO_REUSEADDR,
SO_KEEPALIVE, SO_RCVLOWAT, and SO_MAX_PACING_RATE.  This patch
automatically closes this gap without duplicating more code.
The only exception is SO_BINDTODEVICE because it needs to acquire a
blocking lock.  Thus, SO_BINDTODEVICE is not supported.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/net/sock.h |  2 ++
 net/core/filter.c  | 57 +++++++++++++++++++---------------------------
 net/core/sock.c    |  4 ++--
 3 files changed, 27 insertions(+), 36 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index ee44b424d952..ea7965524133 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1833,6 +1833,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
 int sock_setsockopt(struct socket *sock, int level, int op,
 		    sockptr_t optval, unsigned int optlen);
 
+int sk_getsockopt(struct sock *sk, int level, int optname,
+		  sockptr_t optval, sockptr_t optlen);
 int sock_getsockopt(struct socket *sock, int level, int op,
 		    char __user *optval, int __user *optlen);
 int sock_gettstamp(struct socket *sock, void __user *userstamp,
diff --git a/net/core/filter.c b/net/core/filter.c
index 347c38da1e0c..68b52243b306 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5013,8 +5013,9 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
 	.arg1_type      = ARG_PTR_TO_CTX,
 };
 
-static int sol_socket_setsockopt(struct sock *sk, int optname,
-				 char *optval, int optlen)
+static int sol_socket_sockopt(struct sock *sk, int optname,
+			      char *optval, int *optlen,
+			      bool getopt)
 {
 	switch (optname) {
 	case SO_REUSEADDR:
@@ -5028,7 +5029,7 @@ static int sol_socket_setsockopt(struct sock *sk, int optname,
 	case SO_MAX_PACING_RATE:
 	case SO_BINDTOIFINDEX:
 	case SO_TXREHASH:
-		if (optlen != sizeof(int))
+		if (*optlen != sizeof(int))
 			return -EINVAL;
 		break;
 	case SO_BINDTODEVICE:
@@ -5037,8 +5038,16 @@ static int sol_socket_setsockopt(struct sock *sk, int optname,
 		return -EINVAL;
 	}
 
+	if (getopt) {
+		if (optname == SO_BINDTODEVICE)
+			return -EINVAL;
+		return sk_getsockopt(sk, SOL_SOCKET, optname,
+				     KERNEL_SOCKPTR(optval),
+				     KERNEL_SOCKPTR(optlen));
+	}
+
 	return sk_setsockopt(sk, SOL_SOCKET, optname,
-			     KERNEL_SOCKPTR(optval), optlen);
+			     KERNEL_SOCKPTR(optval), *optlen);
 }
 
 static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
@@ -5164,7 +5173,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname,
 		return -EINVAL;
 
 	if (level == SOL_SOCKET)
-		return sol_socket_setsockopt(sk, optname, optval, optlen);
+		return sol_socket_sockopt(sk, optname, optval, &optlen, false);
 	else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP)
 		return sol_ip_setsockopt(sk, optname, optval, optlen);
 	else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6)
@@ -5186,38 +5195,13 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
 static int __bpf_getsockopt(struct sock *sk, int level, int optname,
 			    char *optval, int optlen)
 {
+	int err = 0, saved_optlen = optlen;
+
 	if (!sk_fullsock(sk))
 		goto err_clear;
 
 	if (level == SOL_SOCKET) {
-		if (optlen != sizeof(int))
-			goto err_clear;
-
-		switch (optname) {
-		case SO_RCVBUF:
-			*((int *)optval) = sk->sk_rcvbuf;
-			break;
-		case SO_SNDBUF:
-			*((int *)optval) = sk->sk_sndbuf;
-			break;
-		case SO_MARK:
-			*((int *)optval) = sk->sk_mark;
-			break;
-		case SO_PRIORITY:
-			*((int *)optval) = sk->sk_priority;
-			break;
-		case SO_BINDTOIFINDEX:
-			*((int *)optval) = sk->sk_bound_dev_if;
-			break;
-		case SO_REUSEPORT:
-			*((int *)optval) = sk->sk_reuseport;
-			break;
-		case SO_TXREHASH:
-			*((int *)optval) = sk->sk_txrehash;
-			break;
-		default:
-			goto err_clear;
-		}
+		err = sol_socket_sockopt(sk, optname, optval, &optlen, true);
 	} else if (IS_ENABLED(CONFIG_INET) &&
 		   level == SOL_TCP && sk->sk_prot->getsockopt == tcp_getsockopt) {
 		struct inet_connection_sock *icsk;
@@ -5274,7 +5258,12 @@ static int __bpf_getsockopt(struct sock *sk, int level, int optname,
 	} else {
 		goto err_clear;
 	}
-	return 0;
+
+	if (err)
+		optlen = 0;
+	if (optlen < saved_optlen)
+		memset(optval + optlen, 0, saved_optlen - optlen);
+	return err;
 err_clear:
 	memset(optval, 0, optlen);
 	return -EINVAL;
diff --git a/net/core/sock.c b/net/core/sock.c
index 7fa30fd4b37f..68e4662eb2eb 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1583,8 +1583,8 @@ static int groups_to_user(sockptr_t dst, const struct group_info *src)
 	return 0;
 }
 
-static int sk_getsockopt(struct sock *sk, int level, int optname,
-			 sockptr_t optval, sockptr_t optlen)
+int sk_getsockopt(struct sock *sk, int level, int optname,
+		  sockptr_t optval, sockptr_t optlen)
 {
 	struct socket *sock = sk->sk_socket;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 14/17] bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt()
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (12 preceding siblings ...)
  2022-08-24 22:27 ` [PATCH bpf-next 13/17] bpf: Change bpf_getsockopt(SOL_SOCKET) to reuse sk_getsockopt() Martin KaFai Lau
@ 2022-08-24 22:27 ` Martin KaFai Lau
  2022-08-25 18:36   ` sdf
  2022-08-24 22:27 ` [PATCH bpf-next 15/17] bpf: Change bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt() Martin KaFai Lau
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:27 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

This patch changes bpf_getsockopt(SOL_TCP) to reuse
do_tcp_getsockopt().  It removes the duplicated code from
bpf_getsockopt(SOL_TCP).

Before this patch, there were some optnames available to
bpf_setsockopt(SOL_TCP) but missing in bpf_getsockopt(SOL_TCP).
For example, TCP_NODELAY, TCP_MAXSEG, TCP_KEEPIDLE, TCP_KEEPINTVL,
and a few more.  It surprises users from time to time.  This patch
automatically closes this gap without duplicating more code.

bpf_getsockopt(TCP_SAVED_SYN) does not free the saved_syn,
so it stays in sol_tcp_sockopt().

For string name value like TCP_CONGESTION, bpf expects it
is always null terminated, so sol_tcp_sockopt() decrements
optlen by one before calling do_tcp_getsockopt() and
the 'if (optlen < saved_optlen) memset(..,0,..);'
in __bpf_getsockopt() will always do a null termination.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/net/tcp.h |  2 ++
 net/core/filter.c | 70 ++++++++++++++++++++++++++---------------------
 net/ipv4/tcp.c    |  4 +--
 3 files changed, 43 insertions(+), 33 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index c03a50c72f40..735e957f7f4b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -402,6 +402,8 @@ void tcp_init_sock(struct sock *sk);
 void tcp_init_transfer(struct sock *sk, int bpf_op, struct sk_buff *skb);
 __poll_t tcp_poll(struct file *file, struct socket *sock,
 		      struct poll_table_struct *wait);
+int do_tcp_getsockopt(struct sock *sk, int level,
+		      int optname, sockptr_t optval, sockptr_t optlen);
 int tcp_getsockopt(struct sock *sk, int level, int optname,
 		   char __user *optval, int __user *optlen);
 bool tcp_bpf_bypass_getsockopt(int level, int optname);
diff --git a/net/core/filter.c b/net/core/filter.c
index 68b52243b306..cdbbcec46e8b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5096,8 +5096,9 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
 	return 0;
 }
 
-static int sol_tcp_setsockopt(struct sock *sk, int optname,
-			      char *optval, int optlen)
+static int sol_tcp_sockopt(struct sock *sk, int optname,
+			   char *optval, int *optlen,
+			   bool getopt)
 {
 	if (sk->sk_prot->setsockopt != tcp_setsockopt)
 		return -EINVAL;
@@ -5114,17 +5115,47 @@ static int sol_tcp_setsockopt(struct sock *sk, int optname,
 	case TCP_USER_TIMEOUT:
 	case TCP_NOTSENT_LOWAT:
 	case TCP_SAVE_SYN:
-		if (optlen != sizeof(int))
+		if (*optlen != sizeof(int))
 			return -EINVAL;
 		break;
 	case TCP_CONGESTION:
+		if (*optlen < 2)
+			return -EINVAL;
+		break;
+	case TCP_SAVED_SYN:
+		if (*optlen < 1)
+			return -EINVAL;
 		break;
 	default:
-		return bpf_sol_tcp_setsockopt(sk, optname, optval, optlen);
+		if (getopt)
+			return -EINVAL;
+		return bpf_sol_tcp_setsockopt(sk, optname, optval, *optlen);
+	}
+
+	if (getopt) {
+		if (optname == TCP_SAVED_SYN) {
+			struct tcp_sock *tp = tcp_sk(sk);
+
+			if (!tp->saved_syn ||
+			    *optlen > tcp_saved_syn_len(tp->saved_syn))
+				return -EINVAL;
+			memcpy(optval, tp->saved_syn->data, *optlen);
+			return 0;
+		}
+
+		if (optname == TCP_CONGESTION) {
+			if (!inet_csk(sk)->icsk_ca_ops)
+				return -EINVAL;
+			(*optlen)--;
+		}
+
+		return do_tcp_getsockopt(sk, SOL_TCP, optname,
+					 KERNEL_SOCKPTR(optval),
+					 KERNEL_SOCKPTR(optlen));
 	}
 
 	return do_tcp_setsockopt(sk, SOL_TCP, optname,
-				 KERNEL_SOCKPTR(optval), optlen);
+				 KERNEL_SOCKPTR(optval), *optlen);
 }
 
 static int sol_ip_setsockopt(struct sock *sk, int optname,
@@ -5179,7 +5210,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname,
 	else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6)
 		return sol_ipv6_setsockopt(sk, optname, optval, optlen);
 	else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP)
-		return sol_tcp_setsockopt(sk, optname, optval, optlen);
+		return sol_tcp_sockopt(sk, optname, optval, &optlen, false);
 
 	return -EINVAL;
 }
@@ -5202,31 +5233,8 @@ static int __bpf_getsockopt(struct sock *sk, int level, int optname,
 
 	if (level == SOL_SOCKET) {
 		err = sol_socket_sockopt(sk, optname, optval, &optlen, true);
-	} else if (IS_ENABLED(CONFIG_INET) &&
-		   level == SOL_TCP && sk->sk_prot->getsockopt == tcp_getsockopt) {
-		struct inet_connection_sock *icsk;
-		struct tcp_sock *tp;
-
-		switch (optname) {
-		case TCP_CONGESTION:
-			icsk = inet_csk(sk);
-
-			if (!icsk->icsk_ca_ops || optlen <= 1)
-				goto err_clear;
-			strncpy(optval, icsk->icsk_ca_ops->name, optlen);
-			optval[optlen - 1] = 0;
-			break;
-		case TCP_SAVED_SYN:
-			tp = tcp_sk(sk);
-
-			if (optlen <= 0 || !tp->saved_syn ||
-			    optlen > tcp_saved_syn_len(tp->saved_syn))
-				goto err_clear;
-			memcpy(optval, tp->saved_syn->data, optlen);
-			break;
-		default:
-			goto err_clear;
-		}
+	} else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP) {
+		err = sol_tcp_sockopt(sk, optname, optval, &optlen, true);
 	} else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) {
 		struct inet_sock *inet = inet_sk(sk);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ab8118225797..a47cb5662be6 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4043,8 +4043,8 @@ struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk,
 	return stats;
 }
 
-static int do_tcp_getsockopt(struct sock *sk, int level,
-			     int optname, sockptr_t optval, sockptr_t optlen)
+int do_tcp_getsockopt(struct sock *sk, int level,
+		      int optname, sockptr_t optval, sockptr_t optlen)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 15/17] bpf: Change bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt()
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (13 preceding siblings ...)
  2022-08-24 22:27 ` [PATCH bpf-next 14/17] bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt() Martin KaFai Lau
@ 2022-08-24 22:27 ` Martin KaFai Lau
  2022-08-24 22:27 ` [PATCH bpf-next 16/17] bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt() Martin KaFai Lau
  2022-08-24 22:27 ` [PATCH bpf-next 17/17] selftest/bpf: Add test for bpf_getsockopt() Martin KaFai Lau
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:27 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

This patch changes bpf_getsockopt(SOL_IP) to reuse
do_ip_getsockopt() and remove the duplicated code.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/net/ip.h       |  2 ++
 net/core/filter.c      | 30 ++++++++++++------------------
 net/ipv4/ip_sockglue.c |  4 ++--
 3 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 34fa5b0f0a0e..038097c2a152 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -747,6 +747,8 @@ int do_ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
 		     unsigned int optlen);
 int ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
 		  unsigned int optlen);
+int do_ip_getsockopt(struct sock *sk, int level, int optname,
+		     sockptr_t optval, sockptr_t optlen);
 int ip_getsockopt(struct sock *sk, int level, int optname, char __user *optval,
 		  int __user *optlen);
 int ip_ra_control(struct sock *sk, unsigned char on,
diff --git a/net/core/filter.c b/net/core/filter.c
index cdbbcec46e8b..266948960b49 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5158,23 +5158,29 @@ static int sol_tcp_sockopt(struct sock *sk, int optname,
 				 KERNEL_SOCKPTR(optval), *optlen);
 }
 
-static int sol_ip_setsockopt(struct sock *sk, int optname,
-			     char *optval, int optlen)
+static int sol_ip_sockopt(struct sock *sk, int optname,
+			  char *optval, int *optlen,
+			  bool getopt)
 {
 	if (sk->sk_family != AF_INET)
 		return -EINVAL;
 
 	switch (optname) {
 	case IP_TOS:
-		if (optlen != sizeof(int))
+		if (*optlen != sizeof(int))
 			return -EINVAL;
 		break;
 	default:
 		return -EINVAL;
 	}
 
+	if (getopt)
+		return do_ip_getsockopt(sk, SOL_IP, optname,
+					KERNEL_SOCKPTR(optval),
+					KERNEL_SOCKPTR(optlen));
+
 	return do_ip_setsockopt(sk, SOL_IP, optname,
-				KERNEL_SOCKPTR(optval), optlen);
+				KERNEL_SOCKPTR(optval), *optlen);
 }
 
 static int sol_ipv6_setsockopt(struct sock *sk, int optname,
@@ -5206,7 +5212,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname,
 	if (level == SOL_SOCKET)
 		return sol_socket_sockopt(sk, optname, optval, &optlen, false);
 	else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP)
-		return sol_ip_setsockopt(sk, optname, optval, optlen);
+		return sol_ip_sockopt(sk, optname, optval, &optlen, false);
 	else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6)
 		return sol_ipv6_setsockopt(sk, optname, optval, optlen);
 	else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP)
@@ -5236,19 +5242,7 @@ static int __bpf_getsockopt(struct sock *sk, int level, int optname,
 	} else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP) {
 		err = sol_tcp_sockopt(sk, optname, optval, &optlen, true);
 	} else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) {
-		struct inet_sock *inet = inet_sk(sk);
-
-		if (optlen != sizeof(int) || sk->sk_family != AF_INET)
-			goto err_clear;
-
-		/* Only some options are supported */
-		switch (optname) {
-		case IP_TOS:
-			*((int *)optval) = (int)inet->tos;
-			break;
-		default:
-			goto err_clear;
-		}
+		err = sol_ip_sockopt(sk, optname, optval, &optlen, true);
 	} else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) {
 		struct ipv6_pinfo *np = inet6_sk(sk);
 
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 5d134a75cad0..47830f3fea1b 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1524,8 +1524,8 @@ static int compat_ip_get_mcast_msfilter(struct sock *sk, sockptr_t optval,
 	return 0;
 }
 
-static int do_ip_getsockopt(struct sock *sk, int level, int optname,
-			    sockptr_t optval, sockptr_t optlen)
+int do_ip_getsockopt(struct sock *sk, int level, int optname,
+		     sockptr_t optval, sockptr_t optlen)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	bool needs_rtnl = getsockopt_needs_rtnl(optname);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 16/17] bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt()
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (14 preceding siblings ...)
  2022-08-24 22:27 ` [PATCH bpf-next 15/17] bpf: Change bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt() Martin KaFai Lau
@ 2022-08-24 22:27 ` Martin KaFai Lau
  2022-08-24 22:27 ` [PATCH bpf-next 17/17] selftest/bpf: Add test for bpf_getsockopt() Martin KaFai Lau
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:27 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

This patch changes bpf_getsockopt(SOL_IPV6) to reuse
do_ipv6_getsockopt().  It removes the duplicated code from
bpf_getsockopt(SOL_IPV6).

This also makes bpf_getsockopt(SOL_IPV6) supporting the same
set of optnames as in bpf_setsockopt(SOL_IPV6).  In particular,
this adds IPV6_AUTOFLOWLABEL support to bpf_getsockopt(SOL_IPV6).

ipv6 could be compiled as a module.  Like how other code solved it
with stubs in ipv6_stubs.h, this patch adds the do_ipv6_getsockopt
to the ipv6_bpf_stub.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/net/ipv6.h       |  2 ++
 include/net/ipv6_stubs.h |  2 ++
 net/core/filter.c        | 55 ++++++++++++++++++----------------------
 net/ipv6/af_inet6.c      |  1 +
 net/ipv6/ipv6_sockglue.c |  4 +--
 5 files changed, 31 insertions(+), 33 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index a4f24573ed7a..d664ba5812d8 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1160,6 +1160,8 @@ int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval
 		       unsigned int optlen);
 int ipv6_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
 		    unsigned int optlen);
+int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
+		       sockptr_t optval, sockptr_t optlen);
 int ipv6_getsockopt(struct sock *sk, int level, int optname,
 		    char __user *optval, int __user *optlen);
 
diff --git a/include/net/ipv6_stubs.h b/include/net/ipv6_stubs.h
index 8692698b01cf..c48186bf4737 100644
--- a/include/net/ipv6_stubs.h
+++ b/include/net/ipv6_stubs.h
@@ -83,6 +83,8 @@ struct ipv6_bpf_stub {
 				     struct sk_buff *skb);
 	int (*ipv6_setsockopt)(struct sock *sk, int level, int optname,
 			       sockptr_t optval, unsigned int optlen);
+	int (*ipv6_getsockopt)(struct sock *sk, int level, int optname,
+			       sockptr_t optval, sockptr_t optlen);
 };
 extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 266948960b49..2a4b241387b7 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5183,8 +5183,9 @@ static int sol_ip_sockopt(struct sock *sk, int optname,
 				KERNEL_SOCKPTR(optval), *optlen);
 }
 
-static int sol_ipv6_setsockopt(struct sock *sk, int optname,
-			       char *optval, int optlen)
+static int sol_ipv6_sockopt(struct sock *sk, int optname,
+			    char *optval, int *optlen,
+			    bool getopt)
 {
 	if (sk->sk_family != AF_INET6)
 		return -EINVAL;
@@ -5192,15 +5193,20 @@ static int sol_ipv6_setsockopt(struct sock *sk, int optname,
 	switch (optname) {
 	case IPV6_TCLASS:
 	case IPV6_AUTOFLOWLABEL:
-		if (optlen != sizeof(int))
+		if (*optlen != sizeof(int))
 			return -EINVAL;
 		break;
 	default:
 		return -EINVAL;
 	}
 
+	if (getopt)
+		return ipv6_bpf_stub->ipv6_getsockopt(sk, SOL_IPV6, optname,
+						      KERNEL_SOCKPTR(optval),
+						      KERNEL_SOCKPTR(optlen));
+
 	return ipv6_bpf_stub->ipv6_setsockopt(sk, SOL_IPV6, optname,
-					      KERNEL_SOCKPTR(optval), optlen);
+					      KERNEL_SOCKPTR(optval), *optlen);
 }
 
 static int __bpf_setsockopt(struct sock *sk, int level, int optname,
@@ -5214,7 +5220,7 @@ static int __bpf_setsockopt(struct sock *sk, int level, int optname,
 	else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP)
 		return sol_ip_sockopt(sk, optname, optval, &optlen, false);
 	else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6)
-		return sol_ipv6_setsockopt(sk, optname, optval, optlen);
+		return sol_ipv6_sockopt(sk, optname, optval, &optlen, false);
 	else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP)
 		return sol_tcp_sockopt(sk, optname, optval, &optlen, false);
 
@@ -5232,43 +5238,30 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
 static int __bpf_getsockopt(struct sock *sk, int level, int optname,
 			    char *optval, int optlen)
 {
-	int err = 0, saved_optlen = optlen;
+	int err, saved_optlen = optlen;
 
-	if (!sk_fullsock(sk))
-		goto err_clear;
+	if (!sk_fullsock(sk)) {
+		err = -EINVAL;
+		goto done;
+	}
 
-	if (level == SOL_SOCKET) {
+	if (level == SOL_SOCKET)
 		err = sol_socket_sockopt(sk, optname, optval, &optlen, true);
-	} else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP) {
+	else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP)
 		err = sol_tcp_sockopt(sk, optname, optval, &optlen, true);
-	} else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) {
+	else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP)
 		err = sol_ip_sockopt(sk, optname, optval, &optlen, true);
-	} else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6) {
-		struct ipv6_pinfo *np = inet6_sk(sk);
-
-		if (optlen != sizeof(int) || sk->sk_family != AF_INET6)
-			goto err_clear;
-
-		/* Only some options are supported */
-		switch (optname) {
-		case IPV6_TCLASS:
-			*((int *)optval) = (int)np->tclass;
-			break;
-		default:
-			goto err_clear;
-		}
-	} else {
-		goto err_clear;
-	}
+	else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6)
+		err = sol_ipv6_sockopt(sk, optname, optval, &optlen, true);
+	else
+		err = -EINVAL;
 
+done:
 	if (err)
 		optlen = 0;
 	if (optlen < saved_optlen)
 		memset(optval + optlen, 0, saved_optlen - optlen);
 	return err;
-err_clear:
-	memset(optval, 0, optlen);
-	return -EINVAL;
 }
 
 static int _bpf_getsockopt(struct sock *sk, int level, int optname,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index cadc97852787..19732b5dce23 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -1058,6 +1058,7 @@ static const struct ipv6_bpf_stub ipv6_bpf_stub_impl = {
 	.inet6_bind = __inet6_bind,
 	.udp6_lib_lookup = __udp6_lib_lookup,
 	.ipv6_setsockopt = do_ipv6_setsockopt,
+	.ipv6_getsockopt = do_ipv6_getsockopt,
 };
 
 static int __init inet6_init(void)
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index d9887e3a6077..1193f5a5247d 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -1131,8 +1131,8 @@ static int compat_ipv6_get_msfilter(struct sock *sk, sockptr_t optval,
 	return 0;
 }
 
-static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
-			      sockptr_t optval, sockptr_t optlen)
+int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
+		       sockptr_t optval, sockptr_t optlen)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	int len;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 17/17] selftest/bpf: Add test for bpf_getsockopt()
  2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
                   ` (15 preceding siblings ...)
  2022-08-24 22:27 ` [PATCH bpf-next 16/17] bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt() Martin KaFai Lau
@ 2022-08-24 22:27 ` Martin KaFai Lau
  16 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-24 22:27 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
	Paolo Abeni, Stanislav Fomichev

This patch removes the __bpf_getsockopt() which directly
reads the sk by using PTR_TO_BTF_ID.  Instead, the test now directly
uses the kernel bpf helper bpf_getsockopt() which supports all
the required optname now.

TCP_SAVE[D]_SYN and TCP_MAXSEG are not tested in a loop for all
the hooks and sock_ops's cb.  TCP_SAVE[D]_SYN only works
in passive connection.  TCP_MAXSEG only works when
it is setsockopt before the connection is established and
the getsockopt return value can only be tested after
the connection is established.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 .../selftests/bpf/progs/bpf_tracing_net.h     |   1 +
 .../selftests/bpf/progs/setget_sockopt.c      | 148 +++++-------------
 2 files changed, 43 insertions(+), 106 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
index 5ebc6dabef84..adb087aecc9e 100644
--- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
+++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
@@ -38,6 +38,7 @@
 #define TCP_USER_TIMEOUT	18
 #define TCP_NOTSENT_LOWAT	25
 #define TCP_SAVE_SYN		27
+#define TCP_SAVED_SYN		28
 #define TCP_CA_NAME_MAX		16
 #define TCP_NAGLE_OFF		1
 
diff --git a/tools/testing/selftests/bpf/progs/setget_sockopt.c b/tools/testing/selftests/bpf/progs/setget_sockopt.c
index 40606ef47a38..094b4b8b8f93 100644
--- a/tools/testing/selftests/bpf/progs/setget_sockopt.c
+++ b/tools/testing/selftests/bpf/progs/setget_sockopt.c
@@ -51,7 +51,6 @@ static const struct sockopt_test sol_socket_tests[] = {
 
 static const struct sockopt_test sol_tcp_tests[] = {
 	{ .opt = TCP_NODELAY, .flip = 1, },
-	{ .opt = TCP_MAXSEG, .new = 1314, .expected = 1314, },
 	{ .opt = TCP_KEEPIDLE, .new = 123, .expected = 123, .restore = 321, },
 	{ .opt = TCP_KEEPINTVL, .new = 123, .expected = 123, .restore = 321, },
 	{ .opt = TCP_KEEPCNT, .new = 123, .expected = 123, .restore = 124, },
@@ -61,7 +60,6 @@ static const struct sockopt_test sol_tcp_tests[] = {
 	{ .opt = TCP_THIN_LINEAR_TIMEOUTS, .flip = 1, },
 	{ .opt = TCP_USER_TIMEOUT, .new = 123400, .expected = 123400, },
 	{ .opt = TCP_NOTSENT_LOWAT, .new = 1314, .expected = 1314, },
-	{ .opt = TCP_SAVE_SYN, .new = 1, .expected = 1, },
 	{ .opt = 0, },
 };
 
@@ -81,102 +79,6 @@ struct loop_ctx {
 	struct sock *sk;
 };
 
-static int __bpf_getsockopt(void *ctx, struct sock *sk,
-			    int level, int opt, int *optval,
-			    int optlen)
-{
-	if (level == SOL_SOCKET) {
-		switch (opt) {
-		case SO_REUSEADDR:
-			*optval = !!BPF_CORE_READ_BITFIELD(sk, sk_reuse);
-			break;
-		case SO_KEEPALIVE:
-			*optval = !!(sk->sk_flags & (1UL << 3));
-			break;
-		case SO_RCVLOWAT:
-			*optval = sk->sk_rcvlowat;
-			break;
-		case SO_MAX_PACING_RATE:
-			*optval = sk->sk_max_pacing_rate;
-			break;
-		default:
-			return bpf_getsockopt(ctx, level, opt, optval, optlen);
-		}
-		return 0;
-	}
-
-	if (level == IPPROTO_TCP) {
-		struct tcp_sock *tp = bpf_skc_to_tcp_sock(sk);
-
-		if (!tp)
-			return -1;
-
-		switch (opt) {
-		case TCP_NODELAY:
-			*optval = !!(BPF_CORE_READ_BITFIELD(tp, nonagle) & TCP_NAGLE_OFF);
-			break;
-		case TCP_MAXSEG:
-			*optval = tp->rx_opt.user_mss;
-			break;
-		case TCP_KEEPIDLE:
-			*optval = tp->keepalive_time / CONFIG_HZ;
-			break;
-		case TCP_SYNCNT:
-			*optval = tp->inet_conn.icsk_syn_retries;
-			break;
-		case TCP_KEEPINTVL:
-			*optval = tp->keepalive_intvl / CONFIG_HZ;
-			break;
-		case TCP_KEEPCNT:
-			*optval = tp->keepalive_probes;
-			break;
-		case TCP_WINDOW_CLAMP:
-			*optval = tp->window_clamp;
-			break;
-		case TCP_THIN_LINEAR_TIMEOUTS:
-			*optval = !!BPF_CORE_READ_BITFIELD(tp, thin_lto);
-			break;
-		case TCP_USER_TIMEOUT:
-			*optval = tp->inet_conn.icsk_user_timeout;
-			break;
-		case TCP_NOTSENT_LOWAT:
-			*optval = tp->notsent_lowat;
-			break;
-		case TCP_SAVE_SYN:
-			*optval = BPF_CORE_READ_BITFIELD(tp, save_syn);
-			break;
-		default:
-			return bpf_getsockopt(ctx, level, opt, optval, optlen);
-		}
-		return 0;
-	}
-
-	if (level == IPPROTO_IPV6) {
-		switch (opt) {
-		case IPV6_AUTOFLOWLABEL: {
-			__u16 proto = sk->sk_protocol;
-			struct inet_sock *inet_sk;
-
-			if (proto == IPPROTO_TCP)
-				inet_sk = (struct inet_sock *)bpf_skc_to_tcp_sock(sk);
-			else
-				inet_sk = (struct inet_sock *)bpf_skc_to_udp6_sock(sk);
-
-			if (!inet_sk)
-				return -1;
-
-			*optval = !!inet_sk->pinet6->autoflowlabel;
-			break;
-		}
-		default:
-			return bpf_getsockopt(ctx, level, opt, optval, optlen);
-		}
-		return 0;
-	}
-
-	return bpf_getsockopt(ctx, level, opt, optval, optlen);
-}
-
 static int bpf_test_sockopt_flip(void *ctx, struct sock *sk,
 				 const struct sockopt_test *t,
 				 int level)
@@ -185,7 +87,7 @@ static int bpf_test_sockopt_flip(void *ctx, struct sock *sk,
 
 	opt = t->opt;
 
-	if (__bpf_getsockopt(ctx, sk, level, opt, &old, sizeof(old)))
+	if (bpf_getsockopt(ctx, level, opt, &old, sizeof(old)))
 		return 1;
 	/* kernel initialized txrehash to 255 */
 	if (level == SOL_SOCKET && opt == SO_TXREHASH && old != 0 && old != 1)
@@ -194,7 +96,7 @@ static int bpf_test_sockopt_flip(void *ctx, struct sock *sk,
 	new = !old;
 	if (bpf_setsockopt(ctx, level, opt, &new, sizeof(new)))
 		return 1;
-	if (__bpf_getsockopt(ctx, sk, level, opt, &tmp, sizeof(tmp)) ||
+	if (bpf_getsockopt(ctx, level, opt, &tmp, sizeof(tmp)) ||
 	    tmp != new)
 		return 1;
 
@@ -217,13 +119,13 @@ static int bpf_test_sockopt_int(void *ctx, struct sock *sk,
 	else
 		expected = t->expected;
 
-	if (__bpf_getsockopt(ctx, sk, level, opt, &old, sizeof(old)) ||
+	if (bpf_getsockopt(ctx, level, opt, &old, sizeof(old)) ||
 	    old == new)
 		return 1;
 
 	if (bpf_setsockopt(ctx, level, opt, &new, sizeof(new)))
 		return 1;
-	if (__bpf_getsockopt(ctx, sk, level, opt, &tmp, sizeof(tmp)) ||
+	if (bpf_getsockopt(ctx, level, opt, &tmp, sizeof(tmp)) ||
 	    tmp != expected)
 		return 1;
 
@@ -406,6 +308,34 @@ static int binddev_test(void *ctx)
 	return 0;
 }
 
+static int test_tcp_maxseg(void *ctx, struct sock *sk)
+{
+	int val = 1314, tmp;
+
+	if (sk->sk_state != TCP_ESTABLISHED)
+		return bpf_setsockopt(ctx, IPPROTO_TCP, TCP_MAXSEG,
+				      &val, sizeof(val));
+
+	if (bpf_getsockopt(ctx, IPPROTO_TCP, TCP_MAXSEG, &tmp, sizeof(tmp)) ||
+	    tmp > val)
+		return -1;
+
+	return 0;
+}
+
+static int test_tcp_saved_syn(void *ctx, struct sock *sk)
+{
+	__u8 saved_syn[20];
+	int one = 1;
+
+	if (sk->sk_state == TCP_LISTEN)
+		return bpf_setsockopt(ctx, IPPROTO_TCP, TCP_SAVE_SYN,
+				      &one, sizeof(one));
+
+	return bpf_getsockopt(ctx, IPPROTO_TCP, TCP_SAVED_SYN,
+			      saved_syn, sizeof(saved_syn));
+}
+
 SEC("lsm_cgroup/socket_post_create")
 int BPF_PROG(socket_post_create, struct socket *sock, int family,
 	     int type, int protocol, int kern)
@@ -436,16 +366,22 @@ int skops_sockopt(struct bpf_sock_ops *skops)
 
 	switch (skops->op) {
 	case BPF_SOCK_OPS_TCP_LISTEN_CB:
-		nr_listen += !bpf_test_sockopt(skops, sk);
+		nr_listen += !(bpf_test_sockopt(skops, sk) ||
+			       test_tcp_maxseg(skops, sk) ||
+			       test_tcp_saved_syn(skops, sk));
 		break;
 	case BPF_SOCK_OPS_TCP_CONNECT_CB:
-		nr_connect += !bpf_test_sockopt(skops, sk);
+		nr_connect += !(bpf_test_sockopt(skops, sk) ||
+				test_tcp_maxseg(skops, sk));
 		break;
 	case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
-		nr_active += !bpf_test_sockopt(skops, sk);
+		nr_active += !(bpf_test_sockopt(skops, sk) ||
+			       test_tcp_maxseg(skops, sk));
 		break;
 	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
-		nr_passive += !bpf_test_sockopt(skops, sk);
+		nr_passive += !(bpf_test_sockopt(skops, sk) ||
+				test_tcp_maxseg(skops, sk) ||
+				test_tcp_saved_syn(skops, sk));
 		break;
 	}
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next 02/17] bpf: net: Change sk_getsockopt() to take the sockptr_t argument
  2022-08-24 22:26 ` [PATCH bpf-next 02/17] bpf: net: Change sk_getsockopt() to take the sockptr_t argument Martin KaFai Lau
@ 2022-08-25 18:07   ` Stanislav Fomichev
  2022-08-26 19:15     ` Martin KaFai Lau
  0 siblings, 1 reply; 23+ messages in thread
From: Stanislav Fomichev @ 2022-08-25 18:07 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, netdev, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, David Miller, Eric Dumazet, Jakub Kicinski,
	kernel-team, Paolo Abeni

On Wed, Aug 24, 2022 at 3:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> This patch changes sk_getsockopt() to take the sockptr_t argument
> such that it can be used by bpf_getsockopt(SOL_SOCKET) in a
> latter patch.
>
> security_socket_getpeersec_stream() is not changed.  It stays
> with the __user ptr (optval.user and optlen.user) to avoid changes
> to other security hooks.  bpf_getsockopt(SOL_SOCKET) also does not
> support SO_PEERSEC.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---
>  include/linux/filter.h  |  3 +--
>  include/linux/sockptr.h |  5 +++++
>  net/core/filter.c       |  5 ++---
>  net/core/sock.c         | 43 +++++++++++++++++++++++------------------
>  4 files changed, 32 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index a5f21dc3c432..527ae1d64e27 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -900,8 +900,7 @@ int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk);
>  int sk_reuseport_attach_bpf(u32 ufd, struct sock *sk);
>  void sk_reuseport_prog_free(struct bpf_prog *prog);
>  int sk_detach_filter(struct sock *sk);
> -int sk_get_filter(struct sock *sk, struct sock_filter __user *filter,
> -                 unsigned int len);
> +int sk_get_filter(struct sock *sk, sockptr_t optval, unsigned int len);
>
>  bool sk_filter_charge(struct sock *sk, struct sk_filter *fp);
>  void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp);
> diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h
> index d45902fb4cad..bae5e2369b4f 100644
> --- a/include/linux/sockptr.h
> +++ b/include/linux/sockptr.h
> @@ -64,6 +64,11 @@ static inline int copy_to_sockptr_offset(sockptr_t dst, size_t offset,
>         return 0;
>  }
>
> +static inline int copy_to_sockptr(sockptr_t dst, const void *src, size_t size)
> +{
> +       return copy_to_sockptr_offset(dst, 0, src, size);
> +}
> +
>  static inline void *memdup_sockptr(sockptr_t src, size_t len)
>  {
>         void *p = kmalloc_track_caller(len, GFP_USER | __GFP_NOWARN);
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 63e25d8ce501..0f6f86b9e487 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -10712,8 +10712,7 @@ int sk_detach_filter(struct sock *sk)
>  }
>  EXPORT_SYMBOL_GPL(sk_detach_filter);
>
> -int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
> -                 unsigned int len)
> +int sk_get_filter(struct sock *sk, sockptr_t optval, unsigned int len)
>  {
>         struct sock_fprog_kern *fprog;
>         struct sk_filter *filter;
> @@ -10744,7 +10743,7 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
>                 goto out;
>
>         ret = -EFAULT;
> -       if (copy_to_user(ubuf, fprog->filter, bpf_classic_proglen(fprog)))
> +       if (copy_to_sockptr(optval, fprog->filter, bpf_classic_proglen(fprog)))
>                 goto out;
>
>         /* Instead of bytes, the API requests to return the number
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 21bc4bf6b485..7fa30fd4b37f 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -712,8 +712,8 @@ static int sock_setbindtodevice(struct sock *sk, sockptr_t optval, int optlen)
>         return ret;
>  }
>
> -static int sock_getbindtodevice(struct sock *sk, char __user *optval,
> -                               int __user *optlen, int len)
> +static int sock_getbindtodevice(struct sock *sk, sockptr_t optval,
> +                               sockptr_t optlen, int len)
>  {
>         int ret = -ENOPROTOOPT;
>  #ifdef CONFIG_NETDEVICES
> @@ -737,12 +737,12 @@ static int sock_getbindtodevice(struct sock *sk, char __user *optval,
>         len = strlen(devname) + 1;
>
>         ret = -EFAULT;
> -       if (copy_to_user(optval, devname, len))
> +       if (copy_to_sockptr(optval, devname, len))
>                 goto out;
>
>  zero:
>         ret = -EFAULT;
> -       if (put_user(len, optlen))
> +       if (copy_to_sockptr(optlen, &len, sizeof(int)))
>                 goto out;
>
>         ret = 0;
> @@ -1568,20 +1568,23 @@ static void cred_to_ucred(struct pid *pid, const struct cred *cred,
>         }
>  }
>
> -static int groups_to_user(gid_t __user *dst, const struct group_info *src)
> +static int groups_to_user(sockptr_t dst, const struct group_info *src)
>  {
>         struct user_namespace *user_ns = current_user_ns();
>         int i;
>
> -       for (i = 0; i < src->ngroups; i++)
> -               if (put_user(from_kgid_munged(user_ns, src->gid[i]), dst + i))
> +       for (i = 0; i < src->ngroups; i++) {
> +               gid_t gid = from_kgid_munged(user_ns, src->gid[i]);
> +
> +               if (copy_to_sockptr_offset(dst, i * sizeof(gid), &gid, sizeof(gid)))
>                         return -EFAULT;
> +       }
>
>         return 0;
>  }
>
>  static int sk_getsockopt(struct sock *sk, int level, int optname,
> -                        char __user *optval, int __user *optlen)
> +                        sockptr_t optval, sockptr_t optlen)
>  {
>         struct socket *sock = sk->sk_socket;
>
> @@ -1600,7 +1603,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
>         int lv = sizeof(int);
>         int len;
>
> -       if (get_user(len, optlen))
> +       if (copy_from_sockptr(&len, optlen, sizeof(int)))

Do we want to be consistent wrt to sizeof?

copy_from_sockptr(&len, optlen, sizeof(int))
vs
copy_from_sockptr(&len, optlen, sizeof(optlen))

Alternatively, should we have put_sockptr/get_sockopt with a semantics
similar to put_user/get_user to remove all this ambiguity?

>                 return -EFAULT;
>         if (len < 0)
>                 return -EINVAL;
> @@ -1735,7 +1738,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
>                 cred_to_ucred(sk->sk_peer_pid, sk->sk_peer_cred, &peercred);
>                 spin_unlock(&sk->sk_peer_lock);
>
> -               if (copy_to_user(optval, &peercred, len))
> +               if (copy_to_sockptr(optval, &peercred, len))
>                         return -EFAULT;
>                 goto lenout;
>         }
> @@ -1753,11 +1756,11 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
>                 if (len < n * sizeof(gid_t)) {
>                         len = n * sizeof(gid_t);
>                         put_cred(cred);
> -                       return put_user(len, optlen) ? -EFAULT : -ERANGE;
> +                       return copy_to_sockptr(optlen, &len, sizeof(int)) ? -EFAULT : -ERANGE;
>                 }
>                 len = n * sizeof(gid_t);
>
> -               ret = groups_to_user((gid_t __user *)optval, cred->group_info);
> +               ret = groups_to_user(optval, cred->group_info);
>                 put_cred(cred);
>                 if (ret)
>                         return ret;
> @@ -1773,7 +1776,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
>                         return -ENOTCONN;
>                 if (lv < len)
>                         return -EINVAL;
> -               if (copy_to_user(optval, address, len))
> +               if (copy_to_sockptr(optval, address, len))
>                         return -EFAULT;
>                 goto lenout;
>         }
> @@ -1790,7 +1793,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
>                 break;
>
>         case SO_PEERSEC:
> -               return security_socket_getpeersec_stream(sock, optval, optlen, len);
> +               return security_socket_getpeersec_stream(sock, optval.user, optlen.user, len);

I'm assuming there should be something to prevent this being called
from BPF? (haven't read all the patches yet)
Do we want to be a bit more defensive with 'if (!optval.user) return
-EFAULT' or something similar?


>         case SO_MARK:
>                 v.val = sk->sk_mark;
> @@ -1822,7 +1825,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
>                 return sock_getbindtodevice(sk, optval, optlen, len);
>
>         case SO_GET_FILTER:
> -               len = sk_get_filter(sk, (struct sock_filter __user *)optval, len);
> +               len = sk_get_filter(sk, optval, len);
>                 if (len < 0)
>                         return len;
>
> @@ -1870,7 +1873,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
>                 sk_get_meminfo(sk, meminfo);
>
>                 len = min_t(unsigned int, len, sizeof(meminfo));
> -               if (copy_to_user(optval, &meminfo, len))
> +               if (copy_to_sockptr(optval, &meminfo, len))
>                         return -EFAULT;
>
>                 goto lenout;
> @@ -1939,10 +1942,10 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
>
>         if (len > lv)
>                 len = lv;
> -       if (copy_to_user(optval, &v, len))
> +       if (copy_to_sockptr(optval, &v, len))
>                 return -EFAULT;
>  lenout:
> -       if (put_user(len, optlen))
> +       if (copy_to_sockptr(optlen, &len, sizeof(int)))
>                 return -EFAULT;
>         return 0;
>  }
> @@ -1950,7 +1953,9 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
>  int sock_getsockopt(struct socket *sock, int level, int optname,
>                     char __user *optval, int __user *optlen)
>  {
> -       return sk_getsockopt(sock->sk, level, optname, optval, optlen);
> +       return sk_getsockopt(sock->sk, level, optname,
> +                            USER_SOCKPTR(optval),
> +                            USER_SOCKPTR(optlen));
>  }
>
>  /*
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next 14/17] bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt()
  2022-08-24 22:27 ` [PATCH bpf-next 14/17] bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt() Martin KaFai Lau
@ 2022-08-25 18:36   ` sdf
  2022-08-26 19:24     ` Martin KaFai Lau
  0 siblings, 1 reply; 23+ messages in thread
From: sdf @ 2022-08-25 18:36 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, netdev, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, David Miller, Eric Dumazet, Jakub Kicinski,
	kernel-team, Paolo Abeni

On 08/24, Martin KaFai Lau wrote:
> This patch changes bpf_getsockopt(SOL_TCP) to reuse
> do_tcp_getsockopt().  It removes the duplicated code from
> bpf_getsockopt(SOL_TCP).

> Before this patch, there were some optnames available to
> bpf_setsockopt(SOL_TCP) but missing in bpf_getsockopt(SOL_TCP).
> For example, TCP_NODELAY, TCP_MAXSEG, TCP_KEEPIDLE, TCP_KEEPINTVL,
> and a few more.  It surprises users from time to time.  This patch
> automatically closes this gap without duplicating more code.

> bpf_getsockopt(TCP_SAVED_SYN) does not free the saved_syn,
> so it stays in sol_tcp_sockopt().

> For string name value like TCP_CONGESTION, bpf expects it
> is always null terminated, so sol_tcp_sockopt() decrements
> optlen by one before calling do_tcp_getsockopt() and
> the 'if (optlen < saved_optlen) memset(..,0,..);'
> in __bpf_getsockopt() will always do a null termination.

> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---
>   include/net/tcp.h |  2 ++
>   net/core/filter.c | 70 ++++++++++++++++++++++++++---------------------
>   net/ipv4/tcp.c    |  4 +--
>   3 files changed, 43 insertions(+), 33 deletions(-)

> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index c03a50c72f40..735e957f7f4b 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -402,6 +402,8 @@ void tcp_init_sock(struct sock *sk);
>   void tcp_init_transfer(struct sock *sk, int bpf_op, struct sk_buff *skb);
>   __poll_t tcp_poll(struct file *file, struct socket *sock,
>   		      struct poll_table_struct *wait);
> +int do_tcp_getsockopt(struct sock *sk, int level,
> +		      int optname, sockptr_t optval, sockptr_t optlen);
>   int tcp_getsockopt(struct sock *sk, int level, int optname,
>   		   char __user *optval, int __user *optlen);
>   bool tcp_bpf_bypass_getsockopt(int level, int optname);
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 68b52243b306..cdbbcec46e8b 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5096,8 +5096,9 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk,  
> int optname,
>   	return 0;
>   }

> -static int sol_tcp_setsockopt(struct sock *sk, int optname,
> -			      char *optval, int optlen)
> +static int sol_tcp_sockopt(struct sock *sk, int optname,
> +			   char *optval, int *optlen,
> +			   bool getopt)
>   {
>   	if (sk->sk_prot->setsockopt != tcp_setsockopt)
>   		return -EINVAL;
> @@ -5114,17 +5115,47 @@ static int sol_tcp_setsockopt(struct sock *sk,  
> int optname,
>   	case TCP_USER_TIMEOUT:
>   	case TCP_NOTSENT_LOWAT:
>   	case TCP_SAVE_SYN:
> -		if (optlen != sizeof(int))
> +		if (*optlen != sizeof(int))
>   			return -EINVAL;
>   		break;

[..]

>   	case TCP_CONGESTION:
> +		if (*optlen < 2)
> +			return -EINVAL;
> +		break;
> +	case TCP_SAVED_SYN:
> +		if (*optlen < 1)
> +			return -EINVAL;
>   		break;

This looks a bit inconsistent vs '*optlen != sizeof(int)' above. Maybe

if (*optlen < sizeof(u16))
if (*optlen < sizeof(u8))

?

>   	default:
> -		return bpf_sol_tcp_setsockopt(sk, optname, optval, optlen);
> +		if (getopt)
> +			return -EINVAL;
> +		return bpf_sol_tcp_setsockopt(sk, optname, optval, *optlen);
> +	}
> +
> +	if (getopt) {
> +		if (optname == TCP_SAVED_SYN) {
> +			struct tcp_sock *tp = tcp_sk(sk);
> +
> +			if (!tp->saved_syn ||
> +			    *optlen > tcp_saved_syn_len(tp->saved_syn))
> +				return -EINVAL;

You mention in the description that bpf doesn't doesn't free saved_syn,
maybe worth putting a comment with the rationale here as well?
I'm assuming we don't free from bpf because we want userspace to
have an opportunity to read it as well?

> +			memcpy(optval, tp->saved_syn->data, *optlen);
> +			return 0;
> +		}
> +
> +		if (optname == TCP_CONGESTION) {
> +			if (!inet_csk(sk)->icsk_ca_ops)
> +				return -EINVAL;

Is it worth it doing null termination more explicitly here?
For readability sake:
			/* BPF always expects NULL-terminated strings. */
			optval[*optlen-1] = '\0';
> +			(*optlen)--;
> +		}
> +
> +		return do_tcp_getsockopt(sk, SOL_TCP, optname,
> +					 KERNEL_SOCKPTR(optval),
> +					 KERNEL_SOCKPTR(optlen));
>   	}

>   	return do_tcp_setsockopt(sk, SOL_TCP, optname,
> -				 KERNEL_SOCKPTR(optval), optlen);
> +				 KERNEL_SOCKPTR(optval), *optlen);
>   }

>   static int sol_ip_setsockopt(struct sock *sk, int optname,
> @@ -5179,7 +5210,7 @@ static int __bpf_setsockopt(struct sock *sk, int  
> level, int optname,
>   	else if (IS_ENABLED(CONFIG_IPV6) && level == SOL_IPV6)
>   		return sol_ipv6_setsockopt(sk, optname, optval, optlen);
>   	else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP)
> -		return sol_tcp_setsockopt(sk, optname, optval, optlen);
> +		return sol_tcp_sockopt(sk, optname, optval, &optlen, false);

>   	return -EINVAL;
>   }
> @@ -5202,31 +5233,8 @@ static int __bpf_getsockopt(struct sock *sk, int  
> level, int optname,

>   	if (level == SOL_SOCKET) {
>   		err = sol_socket_sockopt(sk, optname, optval, &optlen, true);
> -	} else if (IS_ENABLED(CONFIG_INET) &&
> -		   level == SOL_TCP && sk->sk_prot->getsockopt == tcp_getsockopt) {
> -		struct inet_connection_sock *icsk;
> -		struct tcp_sock *tp;
> -
> -		switch (optname) {
> -		case TCP_CONGESTION:
> -			icsk = inet_csk(sk);
> -
> -			if (!icsk->icsk_ca_ops || optlen <= 1)
> -				goto err_clear;
> -			strncpy(optval, icsk->icsk_ca_ops->name, optlen);
> -			optval[optlen - 1] = 0;
> -			break;
> -		case TCP_SAVED_SYN:
> -			tp = tcp_sk(sk);
> -
> -			if (optlen <= 0 || !tp->saved_syn ||
> -			    optlen > tcp_saved_syn_len(tp->saved_syn))
> -				goto err_clear;
> -			memcpy(optval, tp->saved_syn->data, optlen);
> -			break;
> -		default:
> -			goto err_clear;
> -		}
> +	} else if (IS_ENABLED(CONFIG_INET) && level == SOL_TCP) {
> +		err = sol_tcp_sockopt(sk, optname, optval, &optlen, true);
>   	} else if (IS_ENABLED(CONFIG_INET) && level == SOL_IP) {
>   		struct inet_sock *inet = inet_sk(sk);

> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index ab8118225797..a47cb5662be6 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4043,8 +4043,8 @@ struct sk_buff  
> *tcp_get_timestamping_opt_stats(const struct sock *sk,
>   	return stats;
>   }

> -static int do_tcp_getsockopt(struct sock *sk, int level,
> -			     int optname, sockptr_t optval, sockptr_t optlen)
> +int do_tcp_getsockopt(struct sock *sk, int level,
> +		      int optname, sockptr_t optval, sockptr_t optlen)
>   {
>   	struct inet_connection_sock *icsk = inet_csk(sk);
>   	struct tcp_sock *tp = tcp_sk(sk);
> --
> 2.30.2


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next 02/17] bpf: net: Change sk_getsockopt() to take the sockptr_t argument
  2022-08-25 18:07   ` Stanislav Fomichev
@ 2022-08-26 19:15     ` Martin KaFai Lau
  2022-08-26 20:40       ` Stanislav Fomichev
  0 siblings, 1 reply; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-26 19:15 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, netdev, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, David Miller, Eric Dumazet, Jakub Kicinski,
	kernel-team, Paolo Abeni

On Thu, Aug 25, 2022 at 11:07:36AM -0700, Stanislav Fomichev wrote:
> On Wed, Aug 24, 2022 at 3:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > This patch changes sk_getsockopt() to take the sockptr_t argument
> > such that it can be used by bpf_getsockopt(SOL_SOCKET) in a
> > latter patch.
> >
> > security_socket_getpeersec_stream() is not changed.  It stays
> > with the __user ptr (optval.user and optlen.user) to avoid changes
> > to other security hooks.  bpf_getsockopt(SOL_SOCKET) also does not
> > support SO_PEERSEC.
> >
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > ---
> >  include/linux/filter.h  |  3 +--
> >  include/linux/sockptr.h |  5 +++++
> >  net/core/filter.c       |  5 ++---
> >  net/core/sock.c         | 43 +++++++++++++++++++++++------------------
> >  4 files changed, 32 insertions(+), 24 deletions(-)
> >
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index a5f21dc3c432..527ae1d64e27 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -900,8 +900,7 @@ int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk);
> >  int sk_reuseport_attach_bpf(u32 ufd, struct sock *sk);
> >  void sk_reuseport_prog_free(struct bpf_prog *prog);
> >  int sk_detach_filter(struct sock *sk);
> > -int sk_get_filter(struct sock *sk, struct sock_filter __user *filter,
> > -                 unsigned int len);
> > +int sk_get_filter(struct sock *sk, sockptr_t optval, unsigned int len);
> >
> >  bool sk_filter_charge(struct sock *sk, struct sk_filter *fp);
> >  void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp);
> > diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h
> > index d45902fb4cad..bae5e2369b4f 100644
> > --- a/include/linux/sockptr.h
> > +++ b/include/linux/sockptr.h
> > @@ -64,6 +64,11 @@ static inline int copy_to_sockptr_offset(sockptr_t dst, size_t offset,
> >         return 0;
> >  }
> >
> > +static inline int copy_to_sockptr(sockptr_t dst, const void *src, size_t size)
> > +{
> > +       return copy_to_sockptr_offset(dst, 0, src, size);
> > +}
> > +
> >  static inline void *memdup_sockptr(sockptr_t src, size_t len)
> >  {
> >         void *p = kmalloc_track_caller(len, GFP_USER | __GFP_NOWARN);
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 63e25d8ce501..0f6f86b9e487 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -10712,8 +10712,7 @@ int sk_detach_filter(struct sock *sk)
> >  }
> >  EXPORT_SYMBOL_GPL(sk_detach_filter);
> >
> > -int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
> > -                 unsigned int len)
> > +int sk_get_filter(struct sock *sk, sockptr_t optval, unsigned int len)
> >  {
> >         struct sock_fprog_kern *fprog;
> >         struct sk_filter *filter;
> > @@ -10744,7 +10743,7 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
> >                 goto out;
> >
> >         ret = -EFAULT;
> > -       if (copy_to_user(ubuf, fprog->filter, bpf_classic_proglen(fprog)))
> > +       if (copy_to_sockptr(optval, fprog->filter, bpf_classic_proglen(fprog)))
> >                 goto out;
> >
> >         /* Instead of bytes, the API requests to return the number
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 21bc4bf6b485..7fa30fd4b37f 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -712,8 +712,8 @@ static int sock_setbindtodevice(struct sock *sk, sockptr_t optval, int optlen)
> >         return ret;
> >  }
> >
> > -static int sock_getbindtodevice(struct sock *sk, char __user *optval,
> > -                               int __user *optlen, int len)
> > +static int sock_getbindtodevice(struct sock *sk, sockptr_t optval,
> > +                               sockptr_t optlen, int len)
> >  {
> >         int ret = -ENOPROTOOPT;
> >  #ifdef CONFIG_NETDEVICES
> > @@ -737,12 +737,12 @@ static int sock_getbindtodevice(struct sock *sk, char __user *optval,
> >         len = strlen(devname) + 1;
> >
> >         ret = -EFAULT;
> > -       if (copy_to_user(optval, devname, len))
> > +       if (copy_to_sockptr(optval, devname, len))
> >                 goto out;
> >
> >  zero:
> >         ret = -EFAULT;
> > -       if (put_user(len, optlen))
> > +       if (copy_to_sockptr(optlen, &len, sizeof(int)))
> >                 goto out;
> >
> >         ret = 0;
> > @@ -1568,20 +1568,23 @@ static void cred_to_ucred(struct pid *pid, const struct cred *cred,
> >         }
> >  }
> >
> > -static int groups_to_user(gid_t __user *dst, const struct group_info *src)
> > +static int groups_to_user(sockptr_t dst, const struct group_info *src)
> >  {
> >         struct user_namespace *user_ns = current_user_ns();
> >         int i;
> >
> > -       for (i = 0; i < src->ngroups; i++)
> > -               if (put_user(from_kgid_munged(user_ns, src->gid[i]), dst + i))
> > +       for (i = 0; i < src->ngroups; i++) {
> > +               gid_t gid = from_kgid_munged(user_ns, src->gid[i]);
> > +
> > +               if (copy_to_sockptr_offset(dst, i * sizeof(gid), &gid, sizeof(gid)))
> >                         return -EFAULT;
> > +       }
> >
> >         return 0;
> >  }
> >
> >  static int sk_getsockopt(struct sock *sk, int level, int optname,
> > -                        char __user *optval, int __user *optlen)
> > +                        sockptr_t optval, sockptr_t optlen)
> >  {
> >         struct socket *sock = sk->sk_socket;
> >
> > @@ -1600,7 +1603,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> >         int lv = sizeof(int);
> >         int len;
> >
> > -       if (get_user(len, optlen))
> > +       if (copy_from_sockptr(&len, optlen, sizeof(int)))
> 
> Do we want to be consistent wrt to sizeof?
> 
> copy_from_sockptr(&len, optlen, sizeof(int))
> vs
> copy_from_sockptr(&len, optlen, sizeof(optlen))
optlen type is sockptr_t now. sizeof(optlen) won't work.

so either

copy_from_sockptr(&len, optlen, sizeof(len))
or
copy_from_sockptr(&len, optlen, sizeof(int))

I went with the latter 'sizeof(int)' for consistency because the
name is not always 'len' but optlen is always in 'int'.

> 
> Alternatively, should we have put_sockptr/get_sockopt with a semantics
> similar to put_user/get_user to remove all this ambiguity?
The type is lost in sockptr.{kernel,user} which is 'void *'.  {get,put}_user()
depends on it.  The very early sockptr_t introduction also changes
get_user() to copy_from_sockptr() for integer value.

One option could be to make {get,put}_sockopt(x, sockptr) to use x to decide the
type.  Not sure how that may look like.  I can give it a try.

> 
> >                 return -EFAULT;
> >         if (len < 0)
> >                 return -EINVAL;
> > @@ -1735,7 +1738,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> >                 cred_to_ucred(sk->sk_peer_pid, sk->sk_peer_cred, &peercred);
> >                 spin_unlock(&sk->sk_peer_lock);
> >
> > -               if (copy_to_user(optval, &peercred, len))
> > +               if (copy_to_sockptr(optval, &peercred, len))
> >                         return -EFAULT;
> >                 goto lenout;
> >         }
> > @@ -1753,11 +1756,11 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> >                 if (len < n * sizeof(gid_t)) {
> >                         len = n * sizeof(gid_t);
> >                         put_cred(cred);
> > -                       return put_user(len, optlen) ? -EFAULT : -ERANGE;
> > +                       return copy_to_sockptr(optlen, &len, sizeof(int)) ? -EFAULT : -ERANGE;
> >                 }
> >                 len = n * sizeof(gid_t);
> >
> > -               ret = groups_to_user((gid_t __user *)optval, cred->group_info);
> > +               ret = groups_to_user(optval, cred->group_info);
> >                 put_cred(cred);
> >                 if (ret)
> >                         return ret;
> > @@ -1773,7 +1776,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> >                         return -ENOTCONN;
> >                 if (lv < len)
> >                         return -EINVAL;
> > -               if (copy_to_user(optval, address, len))
> > +               if (copy_to_sockptr(optval, address, len))
> >                         return -EFAULT;
> >                 goto lenout;
> >         }
> > @@ -1790,7 +1793,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> >                 break;
> >
> >         case SO_PEERSEC:
> > -               return security_socket_getpeersec_stream(sock, optval, optlen, len);
> > +               return security_socket_getpeersec_stream(sock, optval.user, optlen.user, len);
> 
> I'm assuming there should be something to prevent this being called
> from BPF? (haven't read all the patches yet)
Not sure if any of the hooks may block.

> Do we want to be a bit more defensive with 'if (!optval.user) return
> -EFAULT' or something similar?
Checking 'optval.is_kernel || optlen.is_kernel'?
Yep.  Make sense.  It may be better to do the check inside
security_socket_getpeersec_stream(sock, optval, optlen, ...).

> 
> 
> >         case SO_MARK:
> >                 v.val = sk->sk_mark;
> > @@ -1822,7 +1825,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> >                 return sock_getbindtodevice(sk, optval, optlen, len);
> >
> >         case SO_GET_FILTER:
> > -               len = sk_get_filter(sk, (struct sock_filter __user *)optval, len);
> > +               len = sk_get_filter(sk, optval, len);
> >                 if (len < 0)
> >                         return len;
> >
> > @@ -1870,7 +1873,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> >                 sk_get_meminfo(sk, meminfo);
> >
> >                 len = min_t(unsigned int, len, sizeof(meminfo));
> > -               if (copy_to_user(optval, &meminfo, len))
> > +               if (copy_to_sockptr(optval, &meminfo, len))
> >                         return -EFAULT;
> >
> >                 goto lenout;
> > @@ -1939,10 +1942,10 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> >
> >         if (len > lv)
> >                 len = lv;
> > -       if (copy_to_user(optval, &v, len))
> > +       if (copy_to_sockptr(optval, &v, len))
> >                 return -EFAULT;
> >  lenout:
> > -       if (put_user(len, optlen))
> > +       if (copy_to_sockptr(optlen, &len, sizeof(int)))
> >                 return -EFAULT;
> >         return 0;
> >  }
> > @@ -1950,7 +1953,9 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> >  int sock_getsockopt(struct socket *sock, int level, int optname,
> >                     char __user *optval, int __user *optlen)
> >  {
> > -       return sk_getsockopt(sock->sk, level, optname, optval, optlen);
> > +       return sk_getsockopt(sock->sk, level, optname,
> > +                            USER_SOCKPTR(optval),
> > +                            USER_SOCKPTR(optlen));
> >  }
> >
> >  /*
> > --
> > 2.30.2
> >

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next 14/17] bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt()
  2022-08-25 18:36   ` sdf
@ 2022-08-26 19:24     ` Martin KaFai Lau
  0 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2022-08-26 19:24 UTC (permalink / raw)
  To: sdf
  Cc: bpf, netdev, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, David Miller, Eric Dumazet, Jakub Kicinski,
	kernel-team, Paolo Abeni

On Thu, Aug 25, 2022 at 11:36:36AM -0700, sdf@google.com wrote:
> On 08/24, Martin KaFai Lau wrote:
> > This patch changes bpf_getsockopt(SOL_TCP) to reuse
> > do_tcp_getsockopt().  It removes the duplicated code from
> > bpf_getsockopt(SOL_TCP).
> 
> > Before this patch, there were some optnames available to
> > bpf_setsockopt(SOL_TCP) but missing in bpf_getsockopt(SOL_TCP).
> > For example, TCP_NODELAY, TCP_MAXSEG, TCP_KEEPIDLE, TCP_KEEPINTVL,
> > and a few more.  It surprises users from time to time.  This patch
> > automatically closes this gap without duplicating more code.
> 
> > bpf_getsockopt(TCP_SAVED_SYN) does not free the saved_syn,
> > so it stays in sol_tcp_sockopt().
> 
> > For string name value like TCP_CONGESTION, bpf expects it
> > is always null terminated, so sol_tcp_sockopt() decrements
> > optlen by one before calling do_tcp_getsockopt() and
> > the 'if (optlen < saved_optlen) memset(..,0,..);'
> > in __bpf_getsockopt() will always do a null termination.
> 
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > ---
> >   include/net/tcp.h |  2 ++
> >   net/core/filter.c | 70 ++++++++++++++++++++++++++---------------------
> >   net/ipv4/tcp.c    |  4 +--
> >   3 files changed, 43 insertions(+), 33 deletions(-)
> 
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index c03a50c72f40..735e957f7f4b 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -402,6 +402,8 @@ void tcp_init_sock(struct sock *sk);
> >   void tcp_init_transfer(struct sock *sk, int bpf_op, struct sk_buff *skb);
> >   __poll_t tcp_poll(struct file *file, struct socket *sock,
> >   		      struct poll_table_struct *wait);
> > +int do_tcp_getsockopt(struct sock *sk, int level,
> > +		      int optname, sockptr_t optval, sockptr_t optlen);
> >   int tcp_getsockopt(struct sock *sk, int level, int optname,
> >   		   char __user *optval, int __user *optlen);
> >   bool tcp_bpf_bypass_getsockopt(int level, int optname);
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 68b52243b306..cdbbcec46e8b 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5096,8 +5096,9 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk,
> > int optname,
> >   	return 0;
> >   }
> 
> > -static int sol_tcp_setsockopt(struct sock *sk, int optname,
> > -			      char *optval, int optlen)
> > +static int sol_tcp_sockopt(struct sock *sk, int optname,
> > +			   char *optval, int *optlen,
> > +			   bool getopt)
> >   {
> >   	if (sk->sk_prot->setsockopt != tcp_setsockopt)
> >   		return -EINVAL;
> > @@ -5114,17 +5115,47 @@ static int sol_tcp_setsockopt(struct sock *sk,
> > int optname,
> >   	case TCP_USER_TIMEOUT:
> >   	case TCP_NOTSENT_LOWAT:
> >   	case TCP_SAVE_SYN:
> > -		if (optlen != sizeof(int))
> > +		if (*optlen != sizeof(int))
> >   			return -EINVAL;
> >   		break;
> 
> [..]
> 
> >   	case TCP_CONGESTION:
> > +		if (*optlen < 2)
> > +			return -EINVAL;
> > +		break;
> > +	case TCP_SAVED_SYN:
> > +		if (*optlen < 1)
> > +			return -EINVAL;
> >   		break;
> 
> This looks a bit inconsistent vs '*optlen != sizeof(int)' above. Maybe
> 
> if (*optlen < sizeof(u16))
> if (*optlen < sizeof(u8))
TCP_CONGESTION (name string) and TCP_SAVED_SYN (raw binary)
are not expecting integer optval, so I think it is
better to stay away from using integer u16 or u8.

> 
> ?
> 
> >   	default:
> > -		return bpf_sol_tcp_setsockopt(sk, optname, optval, optlen);
> > +		if (getopt)
> > +			return -EINVAL;
> > +		return bpf_sol_tcp_setsockopt(sk, optname, optval, *optlen);
> > +	}
> > +
> > +	if (getopt) {
> > +		if (optname == TCP_SAVED_SYN) {
> > +			struct tcp_sock *tp = tcp_sk(sk);
> > +
> > +			if (!tp->saved_syn ||
> > +			    *optlen > tcp_saved_syn_len(tp->saved_syn))
> > +				return -EINVAL;
> 
> You mention in the description that bpf doesn't doesn't free saved_syn,
> maybe worth putting a comment with the rationale here as well?
> I'm assuming we don't free from bpf because we want userspace to
> have an opportunity to read it as well?
Yes, it is the reason.  I will add a comment.

> 
> > +			memcpy(optval, tp->saved_syn->data, *optlen);
> > +			return 0;
> > +		}
> > +
> > +		if (optname == TCP_CONGESTION) {
> > +			if (!inet_csk(sk)->icsk_ca_ops)
> > +				return -EINVAL;
> 
> Is it worth it doing null termination more explicitly here?
> For readability sake:
> 			/* BPF always expects NULL-terminated strings. */
> 			optval[*optlen-1] = '\0';
Yep.  will do in v2.

> > +			(*optlen)--;
> > +		}
> > +
> > +		return do_tcp_getsockopt(sk, SOL_TCP, optname,
> > +					 KERNEL_SOCKPTR(optval),
> > +					 KERNEL_SOCKPTR(optlen));
> >   	}
> 
> >   	return do_tcp_setsockopt(sk, SOL_TCP, optname,
> > -				 KERNEL_SOCKPTR(optval), optlen);
> > +				 KERNEL_SOCKPTR(optval), *optlen);
> >   }

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next 02/17] bpf: net: Change sk_getsockopt() to take the sockptr_t argument
  2022-08-26 19:15     ` Martin KaFai Lau
@ 2022-08-26 20:40       ` Stanislav Fomichev
  0 siblings, 0 replies; 23+ messages in thread
From: Stanislav Fomichev @ 2022-08-26 20:40 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, netdev, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, David Miller, Eric Dumazet, Jakub Kicinski,
	kernel-team, Paolo Abeni

On Fri, Aug 26, 2022 at 12:15 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Thu, Aug 25, 2022 at 11:07:36AM -0700, Stanislav Fomichev wrote:
> > On Wed, Aug 24, 2022 at 3:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > This patch changes sk_getsockopt() to take the sockptr_t argument
> > > such that it can be used by bpf_getsockopt(SOL_SOCKET) in a
> > > latter patch.
> > >
> > > security_socket_getpeersec_stream() is not changed.  It stays
> > > with the __user ptr (optval.user and optlen.user) to avoid changes
> > > to other security hooks.  bpf_getsockopt(SOL_SOCKET) also does not
> > > support SO_PEERSEC.
> > >
> > > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > > ---
> > >  include/linux/filter.h  |  3 +--
> > >  include/linux/sockptr.h |  5 +++++
> > >  net/core/filter.c       |  5 ++---
> > >  net/core/sock.c         | 43 +++++++++++++++++++++++------------------
> > >  4 files changed, 32 insertions(+), 24 deletions(-)
> > >
> > > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > > index a5f21dc3c432..527ae1d64e27 100644
> > > --- a/include/linux/filter.h
> > > +++ b/include/linux/filter.h
> > > @@ -900,8 +900,7 @@ int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk);
> > >  int sk_reuseport_attach_bpf(u32 ufd, struct sock *sk);
> > >  void sk_reuseport_prog_free(struct bpf_prog *prog);
> > >  int sk_detach_filter(struct sock *sk);
> > > -int sk_get_filter(struct sock *sk, struct sock_filter __user *filter,
> > > -                 unsigned int len);
> > > +int sk_get_filter(struct sock *sk, sockptr_t optval, unsigned int len);
> > >
> > >  bool sk_filter_charge(struct sock *sk, struct sk_filter *fp);
> > >  void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp);
> > > diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h
> > > index d45902fb4cad..bae5e2369b4f 100644
> > > --- a/include/linux/sockptr.h
> > > +++ b/include/linux/sockptr.h
> > > @@ -64,6 +64,11 @@ static inline int copy_to_sockptr_offset(sockptr_t dst, size_t offset,
> > >         return 0;
> > >  }
> > >
> > > +static inline int copy_to_sockptr(sockptr_t dst, const void *src, size_t size)
> > > +{
> > > +       return copy_to_sockptr_offset(dst, 0, src, size);
> > > +}
> > > +
> > >  static inline void *memdup_sockptr(sockptr_t src, size_t len)
> > >  {
> > >         void *p = kmalloc_track_caller(len, GFP_USER | __GFP_NOWARN);
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 63e25d8ce501..0f6f86b9e487 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -10712,8 +10712,7 @@ int sk_detach_filter(struct sock *sk)
> > >  }
> > >  EXPORT_SYMBOL_GPL(sk_detach_filter);
> > >
> > > -int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
> > > -                 unsigned int len)
> > > +int sk_get_filter(struct sock *sk, sockptr_t optval, unsigned int len)
> > >  {
> > >         struct sock_fprog_kern *fprog;
> > >         struct sk_filter *filter;
> > > @@ -10744,7 +10743,7 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
> > >                 goto out;
> > >
> > >         ret = -EFAULT;
> > > -       if (copy_to_user(ubuf, fprog->filter, bpf_classic_proglen(fprog)))
> > > +       if (copy_to_sockptr(optval, fprog->filter, bpf_classic_proglen(fprog)))
> > >                 goto out;
> > >
> > >         /* Instead of bytes, the API requests to return the number
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index 21bc4bf6b485..7fa30fd4b37f 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -712,8 +712,8 @@ static int sock_setbindtodevice(struct sock *sk, sockptr_t optval, int optlen)
> > >         return ret;
> > >  }
> > >
> > > -static int sock_getbindtodevice(struct sock *sk, char __user *optval,
> > > -                               int __user *optlen, int len)
> > > +static int sock_getbindtodevice(struct sock *sk, sockptr_t optval,
> > > +                               sockptr_t optlen, int len)
> > >  {
> > >         int ret = -ENOPROTOOPT;
> > >  #ifdef CONFIG_NETDEVICES
> > > @@ -737,12 +737,12 @@ static int sock_getbindtodevice(struct sock *sk, char __user *optval,
> > >         len = strlen(devname) + 1;
> > >
> > >         ret = -EFAULT;
> > > -       if (copy_to_user(optval, devname, len))
> > > +       if (copy_to_sockptr(optval, devname, len))
> > >                 goto out;
> > >
> > >  zero:
> > >         ret = -EFAULT;
> > > -       if (put_user(len, optlen))
> > > +       if (copy_to_sockptr(optlen, &len, sizeof(int)))
> > >                 goto out;
> > >
> > >         ret = 0;
> > > @@ -1568,20 +1568,23 @@ static void cred_to_ucred(struct pid *pid, const struct cred *cred,
> > >         }
> > >  }
> > >
> > > -static int groups_to_user(gid_t __user *dst, const struct group_info *src)
> > > +static int groups_to_user(sockptr_t dst, const struct group_info *src)
> > >  {
> > >         struct user_namespace *user_ns = current_user_ns();
> > >         int i;
> > >
> > > -       for (i = 0; i < src->ngroups; i++)
> > > -               if (put_user(from_kgid_munged(user_ns, src->gid[i]), dst + i))
> > > +       for (i = 0; i < src->ngroups; i++) {
> > > +               gid_t gid = from_kgid_munged(user_ns, src->gid[i]);
> > > +
> > > +               if (copy_to_sockptr_offset(dst, i * sizeof(gid), &gid, sizeof(gid)))
> > >                         return -EFAULT;
> > > +       }
> > >
> > >         return 0;
> > >  }
> > >
> > >  static int sk_getsockopt(struct sock *sk, int level, int optname,
> > > -                        char __user *optval, int __user *optlen)
> > > +                        sockptr_t optval, sockptr_t optlen)
> > >  {
> > >         struct socket *sock = sk->sk_socket;
> > >
> > > @@ -1600,7 +1603,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> > >         int lv = sizeof(int);
> > >         int len;
> > >
> > > -       if (get_user(len, optlen))
> > > +       if (copy_from_sockptr(&len, optlen, sizeof(int)))
> >
> > Do we want to be consistent wrt to sizeof?
> >
> > copy_from_sockptr(&len, optlen, sizeof(int))
> > vs
> > copy_from_sockptr(&len, optlen, sizeof(optlen))
> optlen type is sockptr_t now. sizeof(optlen) won't work.
>
> so either
>
> copy_from_sockptr(&len, optlen, sizeof(len))
> or
> copy_from_sockptr(&len, optlen, sizeof(int))
>
> I went with the latter 'sizeof(int)' for consistency because the
> name is not always 'len' but optlen is always in 'int'.
>
> >
> > Alternatively, should we have put_sockptr/get_sockopt with a semantics
> > similar to put_user/get_user to remove all this ambiguity?
> The type is lost in sockptr.{kernel,user} which is 'void *'.  {get,put}_user()
> depends on it.  The very early sockptr_t introduction also changes
> get_user() to copy_from_sockptr() for integer value.
>
> One option could be to make {get,put}_sockopt(x, sockptr) to use x to decide the
> type.  Not sure how that may look like.  I can give it a try.

I was thinking:
#define get_sockopt(x, ptr) copy_from_sockptr(&x, ptr, sizeof(x))

I was also slightly preferring the following:
copy_from_sockptr(&len, optlen, sizeof(len))

But now I'm not sure. I guess with that call, we really want
sizeof(len) == sizeof(actual pointer of optlen data).
So maybe the way you do it, with explicit sizeof(int), communicates it better.

Let's keep your version for now, because I'm not sure whatever I'm
suggesting actually makes it better..

> >
> > >                 return -EFAULT;
> > >         if (len < 0)
> > >                 return -EINVAL;
> > > @@ -1735,7 +1738,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> > >                 cred_to_ucred(sk->sk_peer_pid, sk->sk_peer_cred, &peercred);
> > >                 spin_unlock(&sk->sk_peer_lock);
> > >
> > > -               if (copy_to_user(optval, &peercred, len))
> > > +               if (copy_to_sockptr(optval, &peercred, len))
> > >                         return -EFAULT;
> > >                 goto lenout;
> > >         }
> > > @@ -1753,11 +1756,11 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> > >                 if (len < n * sizeof(gid_t)) {
> > >                         len = n * sizeof(gid_t);
> > >                         put_cred(cred);
> > > -                       return put_user(len, optlen) ? -EFAULT : -ERANGE;
> > > +                       return copy_to_sockptr(optlen, &len, sizeof(int)) ? -EFAULT : -ERANGE;
> > >                 }
> > >                 len = n * sizeof(gid_t);
> > >
> > > -               ret = groups_to_user((gid_t __user *)optval, cred->group_info);
> > > +               ret = groups_to_user(optval, cred->group_info);
> > >                 put_cred(cred);
> > >                 if (ret)
> > >                         return ret;
> > > @@ -1773,7 +1776,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> > >                         return -ENOTCONN;
> > >                 if (lv < len)
> > >                         return -EINVAL;
> > > -               if (copy_to_user(optval, address, len))
> > > +               if (copy_to_sockptr(optval, address, len))
> > >                         return -EFAULT;
> > >                 goto lenout;
> > >         }
> > > @@ -1790,7 +1793,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> > >                 break;
> > >
> > >         case SO_PEERSEC:
> > > -               return security_socket_getpeersec_stream(sock, optval, optlen, len);
> > > +               return security_socket_getpeersec_stream(sock, optval.user, optlen.user, len);
> >
> > I'm assuming there should be something to prevent this being called
> > from BPF? (haven't read all the patches yet)
> Not sure if any of the hooks may block.

I wasn't clear with my question. I found the answer later on, need to
develop a habit of not pressing 'send' before I read the whole series
:-)
You do the checks in sol_socket_sockopt to ignore that SO_PEERSEC from bpf.

> > Do we want to be a bit more defensive with 'if (!optval.user) return
> > -EFAULT' or something similar?
> Checking 'optval.is_kernel || optlen.is_kernel'?
> Yep.  Make sense.  It may be better to do the check inside
> security_socket_getpeersec_stream(sock, optval, optlen, ...).
>
> >
> >
> > >         case SO_MARK:
> > >                 v.val = sk->sk_mark;
> > > @@ -1822,7 +1825,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> > >                 return sock_getbindtodevice(sk, optval, optlen, len);
> > >
> > >         case SO_GET_FILTER:
> > > -               len = sk_get_filter(sk, (struct sock_filter __user *)optval, len);
> > > +               len = sk_get_filter(sk, optval, len);
> > >                 if (len < 0)
> > >                         return len;
> > >
> > > @@ -1870,7 +1873,7 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> > >                 sk_get_meminfo(sk, meminfo);
> > >
> > >                 len = min_t(unsigned int, len, sizeof(meminfo));
> > > -               if (copy_to_user(optval, &meminfo, len))
> > > +               if (copy_to_sockptr(optval, &meminfo, len))
> > >                         return -EFAULT;
> > >
> > >                 goto lenout;
> > > @@ -1939,10 +1942,10 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> > >
> > >         if (len > lv)
> > >                 len = lv;
> > > -       if (copy_to_user(optval, &v, len))
> > > +       if (copy_to_sockptr(optval, &v, len))
> > >                 return -EFAULT;
> > >  lenout:
> > > -       if (put_user(len, optlen))
> > > +       if (copy_to_sockptr(optlen, &len, sizeof(int)))
> > >                 return -EFAULT;
> > >         return 0;
> > >  }
> > > @@ -1950,7 +1953,9 @@ static int sk_getsockopt(struct sock *sk, int level, int optname,
> > >  int sock_getsockopt(struct socket *sock, int level, int optname,
> > >                     char __user *optval, int __user *optlen)
> > >  {
> > > -       return sk_getsockopt(sock->sk, level, optname, optval, optlen);
> > > +       return sk_getsockopt(sock->sk, level, optname,
> > > +                            USER_SOCKPTR(optval),
> > > +                            USER_SOCKPTR(optlen));
> > >  }
> > >
> > >  /*
> > > --
> > > 2.30.2
> > >

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-08-26 20:40 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-24 22:26 [PATCH bpf-next 00/17] bpf: net: Remove duplicated code from bpf_getsockopt() Martin KaFai Lau
2022-08-24 22:26 ` [PATCH bpf-next 01/17] net: Change sock_getsockopt() to take the sk ptr instead of the sock ptr Martin KaFai Lau
2022-08-24 22:26 ` [PATCH bpf-next 02/17] bpf: net: Change sk_getsockopt() to take the sockptr_t argument Martin KaFai Lau
2022-08-25 18:07   ` Stanislav Fomichev
2022-08-26 19:15     ` Martin KaFai Lau
2022-08-26 20:40       ` Stanislav Fomichev
2022-08-24 22:26 ` [PATCH bpf-next 03/17] bpf: net: Avoid sk_getsockopt() taking sk lock when called from bpf Martin KaFai Lau
2022-08-24 22:26 ` [PATCH bpf-next 04/17] bpf: net: Change do_tcp_getsockopt() to take the sockptr_t argument Martin KaFai Lau
2022-08-24 22:26 ` [PATCH bpf-next 05/17] bpf: net: Avoid do_tcp_getsockopt() taking sk lock when called from bpf Martin KaFai Lau
2022-08-24 22:26 ` [PATCH bpf-next 06/17] bpf: net: Change do_ip_getsockopt() to take the sockptr_t argument Martin KaFai Lau
2022-08-24 22:26 ` [PATCH bpf-next 07/17] bpf: net: Avoid do_ip_getsockopt() taking sk lock when called from bpf Martin KaFai Lau
2022-08-24 22:26 ` [PATCH bpf-next 08/17] net: Remove unused flags argument from do_ipv6_getsockopt Martin KaFai Lau
2022-08-24 22:26 ` [PATCH bpf-next 09/17] net: Add a len argument to compat_ipv6_get_msfilter() Martin KaFai Lau
2022-08-24 22:27 ` [PATCH bpf-next 10/17] bpf: net: Change do_ipv6_getsockopt() to take the sockptr_t argument Martin KaFai Lau
2022-08-24 22:27 ` [PATCH bpf-next 11/17] bpf: net: Avoid do_ipv6_getsockopt() taking sk lock when called from bpf Martin KaFai Lau
2022-08-24 22:27 ` [PATCH bpf-next 12/17] bpf: Embed kernel CONFIG check into the if statement in bpf_getsockopt Martin KaFai Lau
2022-08-24 22:27 ` [PATCH bpf-next 13/17] bpf: Change bpf_getsockopt(SOL_SOCKET) to reuse sk_getsockopt() Martin KaFai Lau
2022-08-24 22:27 ` [PATCH bpf-next 14/17] bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt() Martin KaFai Lau
2022-08-25 18:36   ` sdf
2022-08-26 19:24     ` Martin KaFai Lau
2022-08-24 22:27 ` [PATCH bpf-next 15/17] bpf: Change bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt() Martin KaFai Lau
2022-08-24 22:27 ` [PATCH bpf-next 16/17] bpf: Change bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt() Martin KaFai Lau
2022-08-24 22:27 ` [PATCH bpf-next 17/17] selftest/bpf: Add test for bpf_getsockopt() Martin KaFai Lau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).