All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/9] Add cgroup sockaddr hooks for unix sockets
@ 2022-12-10 19:35 Daan De Meyer
  2022-12-10 19:35 ` [PATCH bpf-next v2 1/9] selftests/bpf: Add missing section name tests for getpeername/getsockname Daan De Meyer
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Daan De Meyer @ 2022-12-10 19:35 UTC (permalink / raw)
  To: bpf; +Cc: Daan De Meyer, martin.lau, kernel-team

Changes since v1:

* Split into multiple patches instead of one single patch
* Added unix support for all socket address hooks instead of only connect()
* Switched approach to expose the socket address length to the bpf hook
instead of recalculating the socket address length in kernelspace to
properly support abstract unix socket addresses
* Modified socket address hook tests to calculate the socket address length
once and pass it around everywhere instead of recalculating the actual unix
socket address length on demand.
* Added some missing section name tests for getpeername()/getsockname()

This patch series extends the cgroup sockaddr hooks to include support for
unix sockets. To add support for unix sockets, struct bpf_sock_addr is
extended to expose the unix socket path (sun_path) and the socket address
length to the bpf program. For unix sockets, the address length is writable,
for the other socket address hook types, the address length is only readable.

I intend to use these new hooks in systemd to reimplement the LogNamespace=
feature, which allows running multiple instances of systemd-journald to
process the logs of different services. systemd-journald also processes
syslog messages, so currently, using log namespaces means all services running
in the same log namespace have to live in the same private mount namespace
so that systemd can mount the journal namespace's associated syslog socket
over /dev/log to properly direct syslog messages from all services running
in that log namespace to the correct systemd-journald instance. We want to
relax this requirement so that processes running in disjoint mount namespaces
can still run in the same log namespace. To achieve this, we can use these
new hooks to rewrite the socket address of any connect(), sendto(), ...
syscalls to /dev/log to the socket address of the journal namespace's syslog
socket instead, which will transparently do the redirection without requiring
use of a mount namespace and mounting over /dev/log.

Aside from the above usecase, these hooks can more generally be used to
transparently redirect unix sockets to different addresses as required by
services.

Daan De Meyer (9):
  selftests/bpf: Add missing section name tests for
    getpeername/getsockname
  bpf: Allow read access to addr_len from cgroup sockaddr programs
  bpf: Support access to sun_path from cgroup sockaddr programs
  selftests/bpf: Track sockaddr length in sock addr tests
  bpf: Implement cgroup sockaddr hooks for unix sockets
  libbpf: Add support for cgroup unix socket address hooks
  bpftool: Add support for cgroup unix socket address hooks
  selftests/bpf: Add tests for cgroup unix socket address hooks
  documentation/bpf: Document cgroup unix socket address hooks

 Documentation/bpf/libbpf/program_types.rst    |  12 +
 include/linux/bpf-cgroup-defs.h               |   6 +
 include/linux/bpf-cgroup.h                    | 153 +++++-----
 include/linux/filter.h                        |   1 +
 include/uapi/linux/bpf.h                      |  16 +-
 kernel/bpf/cgroup.c                           |  14 +-
 kernel/bpf/syscall.c                          |  18 ++
 kernel/bpf/verifier.c                         |   7 +-
 net/core/filter.c                             | 109 +++++++-
 net/ipv4/af_inet.c                            |   9 +-
 net/ipv4/ping.c                               |   2 +-
 net/ipv4/tcp_ipv4.c                           |   2 +-
 net/ipv4/udp.c                                |  11 +-
 net/ipv6/af_inet6.c                           |   9 +-
 net/ipv6/ping.c                               |   2 +-
 net/ipv6/tcp_ipv6.c                           |   2 +-
 net/ipv6/udp.c                                |  12 +-
 net/unix/af_unix.c                            |  85 +++++-
 .../bpftool/Documentation/bpftool-cgroup.rst  |  21 +-
 tools/bpf/bpftool/cgroup.c                    |  17 +-
 tools/bpf/bpftool/common.c                    |   6 +
 tools/include/uapi/linux/bpf.h                |  16 +-
 tools/lib/bpf/libbpf.c                        |  12 +
 .../selftests/bpf/prog_tests/section_names.c  |  50 ++++
 .../testing/selftests/bpf/progs/bindun_prog.c |  36 +++
 .../selftests/bpf/progs/connectun_prog.c      |  28 ++
 .../selftests/bpf/progs/recvmsgun_prog.c      |  36 +++
 .../selftests/bpf/progs/sendmsgun_prog.c      |  28 ++
 tools/testing/selftests/bpf/test_sock_addr.c  | 263 ++++++++++++++----
 29 files changed, 815 insertions(+), 168 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/bindun_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/connectun_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/recvmsgun_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/sendmsgun_prog.c

--
2.38.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 1/9] selftests/bpf: Add missing section name tests for getpeername/getsockname
  2022-12-10 19:35 [PATCH bpf-next v2 0/9] Add cgroup sockaddr hooks for unix sockets Daan De Meyer
@ 2022-12-10 19:35 ` Daan De Meyer
  2022-12-13  4:46   ` Yonghong Song
  2022-12-10 19:35 ` [PATCH bpf-next v2 2/9] bpf: Allow read access to addr_len from cgroup sockaddr programs Daan De Meyer
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Daan De Meyer @ 2022-12-10 19:35 UTC (permalink / raw)
  To: bpf; +Cc: Daan De Meyer, martin.lau, kernel-team

---
 .../selftests/bpf/prog_tests/section_names.c  | 20 +++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/section_names.c b/tools/testing/selftests/bpf/prog_tests/section_names.c
index 8b571890c57e..fc5248e94a01 100644
--- a/tools/testing/selftests/bpf/prog_tests/section_names.c
+++ b/tools/testing/selftests/bpf/prog_tests/section_names.c
@@ -158,6 +158,26 @@ static struct sec_name_test tests[] = {
 		{0, BPF_PROG_TYPE_CGROUP_SOCKOPT, BPF_CGROUP_SETSOCKOPT},
 		{0, BPF_CGROUP_SETSOCKOPT},
 	},
+	{
+		"cgroup/getpeername4",
+		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_GETPEERNAME},
+		{0, BPF_CGROUP_INET4_GETPEERNAME},
+	},
+	{
+		"cgroup/getpeername6",
+		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_GETPEERNAME},
+		{0, BPF_CGROUP_INET6_GETPEERNAME},
+	},
+	{
+		"cgroup/getsockname4",
+		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_GETSOCKNAME},
+		{0, BPF_CGROUP_INET4_GETSOCKNAME},
+	},
+	{
+		"cgroup/getsockname6",
+		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_GETSOCKNAME},
+		{0, BPF_CGROUP_INET6_GETSOCKNAME},
+	},
 };
 
 static void test_prog_type_by_name(const struct sec_name_test *test)
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 2/9] bpf: Allow read access to addr_len from cgroup sockaddr programs
  2022-12-10 19:35 [PATCH bpf-next v2 0/9] Add cgroup sockaddr hooks for unix sockets Daan De Meyer
  2022-12-10 19:35 ` [PATCH bpf-next v2 1/9] selftests/bpf: Add missing section name tests for getpeername/getsockname Daan De Meyer
@ 2022-12-10 19:35 ` Daan De Meyer
  2022-12-13  6:06   ` Yonghong Song
  2022-12-16  7:28   ` Martin KaFai Lau
  2022-12-10 19:35 ` [PATCH bpf-next v2 3/9] bpf: Support access to sun_path " Daan De Meyer
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 20+ messages in thread
From: Daan De Meyer @ 2022-12-10 19:35 UTC (permalink / raw)
  To: bpf; +Cc: Daan De Meyer, martin.lau, kernel-team

---
 include/linux/bpf-cgroup.h     | 124 +++++++++++++++++----------------
 include/linux/filter.h         |   1 +
 include/uapi/linux/bpf.h       |   1 +
 kernel/bpf/cgroup.c            |   3 +
 net/core/filter.c              |  51 ++++++++++++++
 net/ipv4/af_inet.c             |   9 +--
 net/ipv4/ping.c                |   2 +-
 net/ipv4/tcp_ipv4.c            |   2 +-
 net/ipv4/udp.c                 |  11 +--
 net/ipv6/af_inet6.c            |   9 +--
 net/ipv6/ping.c                |   2 +-
 net/ipv6/tcp_ipv6.c            |   2 +-
 net/ipv6/udp.c                 |  12 ++--
 tools/include/uapi/linux/bpf.h |   1 +
 14 files changed, 146 insertions(+), 84 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 57e9e109257e..3ab2f06ddc8a 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -120,6 +120,7 @@ int __cgroup_bpf_run_filter_sk(struct sock *sk,
 
 int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 				      struct sockaddr *uaddr,
+				      int *uaddrlen,
 				      enum cgroup_bpf_attach_type atype,
 				      void *t_ctx,
 				      u32 *flags);
@@ -230,75 +231,76 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
 #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk)				       \
 	BPF_CGROUP_RUN_SK_PROG(sk, CGROUP_INET6_POST_BIND)
 
-#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, atype)				       \
-({									       \
-	int __ret = 0;							       \
-	if (cgroup_bpf_enabled(atype))					       \
-		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype,     \
-							  NULL, NULL);	       \
-	__ret;								       \
-})
-
-#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, atype, t_ctx)		       \
-({									       \
-	int __ret = 0;							       \
-	if (cgroup_bpf_enabled(atype))	{				       \
-		lock_sock(sk);						       \
-		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype,     \
-							  t_ctx, NULL);	       \
-		release_sock(sk);					       \
-	}								       \
-	__ret;								       \
-})
+#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, atype)               \
+	({                                                               \
+		int __ret = 0;                                           \
+		if (cgroup_bpf_enabled(atype))                           \
+			__ret = __cgroup_bpf_run_filter_sock_addr(       \
+				sk, uaddr, uaddrlen, atype, NULL, NULL); \
+		__ret;                                                   \
+	})
+
+#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, atype, t_ctx)    \
+	({                                                                \
+		int __ret = 0;                                            \
+		if (cgroup_bpf_enabled(atype)) {                          \
+			lock_sock(sk);                                    \
+			__ret = __cgroup_bpf_run_filter_sock_addr(        \
+				sk, uaddr, uaddrlen, atype, t_ctx, NULL); \
+			release_sock(sk);                                 \
+		}                                                         \
+		__ret;                                                    \
+	})
 
 /* BPF_CGROUP_INET4_BIND and BPF_CGROUP_INET6_BIND can return extra flags
  * via upper bits of return code. The only flag that is supported
  * (at bit position 0) is to indicate CAP_NET_BIND_SERVICE capability check
  * should be bypassed (BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE).
  */
-#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, atype, bind_flags)	       \
-({									       \
-	u32 __flags = 0;						       \
-	int __ret = 0;							       \
-	if (cgroup_bpf_enabled(atype))	{				       \
-		lock_sock(sk);						       \
-		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype,     \
-							  NULL, &__flags);     \
-		release_sock(sk);					       \
-		if (__flags & BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE)	       \
-			*bind_flags |= BIND_NO_CAP_NET_BIND_SERVICE;	       \
-	}								       \
-	__ret;								       \
-})
+#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, uaddrlen, atype,       \
+					   bind_flags)                       \
+	({                                                                   \
+		u32 __flags = 0;                                             \
+		int __ret = 0;                                               \
+		if (cgroup_bpf_enabled(atype)) {                             \
+			lock_sock(sk);                                       \
+			__ret = __cgroup_bpf_run_filter_sock_addr(           \
+				sk, uaddr, uaddrlen, atype, NULL, &__flags); \
+			release_sock(sk);                                    \
+			if (__flags & BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE)  \
+				*bind_flags |= BIND_NO_CAP_NET_BIND_SERVICE; \
+		}                                                            \
+		__ret;                                                       \
+	})
 
 #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk)				       \
 	((cgroup_bpf_enabled(CGROUP_INET4_CONNECT) ||		       \
 	  cgroup_bpf_enabled(CGROUP_INET6_CONNECT)) &&		       \
 	 (sk)->sk_prot->pre_connect)
 
-#define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr)			       \
-	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, CGROUP_INET4_CONNECT)
+#define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen)		       \
+	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_INET4_CONNECT)
 
-#define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr)			       \
-	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, CGROUP_INET6_CONNECT)
+#define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen)		       \
+	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT)
 
-#define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr)		       \
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_INET4_CONNECT, NULL)
+#define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen)	       \
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET4_CONNECT, NULL)
 
-#define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr)		       \
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_INET6_CONNECT, NULL)
+#define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen)	       \
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT, NULL)
 
-#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, t_ctx)		       \
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP4_SENDMSG, t_ctx)
+#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)       \
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_SENDMSG, t_ctx)
 
-#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx)		       \
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP6_SENDMSG, t_ctx)
+#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)       \
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_SENDMSG, t_ctx)
 
-#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr)			\
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP4_RECVMSG, NULL)
+#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen)		\
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_RECVMSG, NULL)
 
-#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr)			\
-	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP6_RECVMSG, NULL)
+#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen)		\
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_RECVMSG, NULL)
 
 /* The SOCK_OPS"_SK" macro should be used when sock_ops->sk is not a
  * fullsock and its parent fullsock cannot be traced by
@@ -477,24 +479,24 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
 }
 
 #define cgroup_bpf_enabled(atype) (0)
-#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, atype, t_ctx) ({ 0; })
-#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, atype) ({ 0; })
+#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, atype, t_ctx) ({ 0; })
+#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, atype) ({ 0; })
 #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) (0)
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, atype, flags) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, uaddrlen, atype, flags) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr) ({ 0; })
-#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(atype, major, minor, access) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,pos) ({ 0; })
diff --git a/include/linux/filter.h b/include/linux/filter.h
index bf701976056e..510cf4042f8b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1294,6 +1294,7 @@ struct bpf_sock_addr_kern {
 	 */
 	u64 tmp_reg;
 	void *t_ctx;	/* Attach type specific context. */
+	int *uaddrlen;
 };
 
 struct bpf_sock_ops_kern {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f89de51a45db..7cafcfdbb9b2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6366,6 +6366,7 @@ struct bpf_sock_addr {
 				 * Stored in network byte order.
 				 */
 	__bpf_md_ptr(struct bpf_sock *, sk);
+	__u32 user_addrlen;	/* Allows 4 byte read and write. */
 };
 
 /* User bpf_sock_ops struct to access socket values and specify request ops
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index bf2fdb33fb31..f97afed8a115 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1449,6 +1449,7 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
  *                                       provided by user sockaddr
  * @sk: sock struct that will use sockaddr
  * @uaddr: sockaddr struct provided by user
+ * @uaddrlen: Pointer to length of sockaddr struct provided by user
  * @type: The type of program to be executed
  * @t_ctx: Pointer to attach type specific context
  * @flags: Pointer to u32 which contains higher bits of BPF program
@@ -1461,6 +1462,7 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
  */
 int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 				      struct sockaddr *uaddr,
+				      int *uaddrlen,
 				      enum cgroup_bpf_attach_type atype,
 				      void *t_ctx,
 				      u32 *flags)
@@ -1468,6 +1470,7 @@ int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 	struct bpf_sock_addr_kern ctx = {
 		.sk = sk,
 		.uaddr = uaddr,
+		.uaddrlen = uaddrlen,
 		.t_ctx = t_ctx,
 	};
 	struct sockaddr_storage unspec;
diff --git a/net/core/filter.c b/net/core/filter.c
index 8607136b6e2c..d0620927dbca 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8876,6 +8876,13 @@ static bool sock_addr_is_valid_access(int off, int size,
 			return false;
 		info->reg_type = PTR_TO_SOCKET;
 		break;
+	case bpf_ctx_range(struct bpf_sock_addr, user_addrlen):
+		if (type != BPF_READ)
+			return false;
+
+		if (size != sizeof(__u32))
+			return false;
+		break;
 	default:
 		if (type == BPF_READ) {
 			if (size != size_default)
@@ -9909,6 +9916,7 @@ static u32 sock_addr_convert_ctx_access(enum bpf_access_type type,
 {
 	int off, port_size = sizeof_field(struct sockaddr_in6, sin6_port);
 	struct bpf_insn *insn = insn_buf;
+	u32 read_size;
 
 	switch (si->off) {
 	case offsetof(struct bpf_sock_addr, user_family):
@@ -9986,6 +9994,49 @@ static u32 sock_addr_convert_ctx_access(enum bpf_access_type type,
 				      si->dst_reg, si->src_reg,
 				      offsetof(struct bpf_sock_addr_kern, sk));
 		break;
+
+	case offsetof(struct bpf_sock_addr, user_addrlen):
+		/* uaddrlen is a pointer so it should be accessed via indirect
+		 * loads and stores. Also for stores additional temporary
+		 * register is used since neither src_reg nor dst_reg can be
+		 * overridden.
+		 */
+		if (type == BPF_WRITE) {
+			int treg = BPF_REG_9;
+
+			if (si->src_reg == treg || si->dst_reg == treg)
+				--treg;
+			if (si->src_reg == treg || si->dst_reg == treg)
+				--treg;
+			*insn++ = BPF_STX_MEM(
+				BPF_DW, si->dst_reg, treg,
+				offsetof(struct bpf_sock_addr_kern, tmp_reg));
+			*insn++ = BPF_LDX_MEM(
+				BPF_FIELD_SIZEOF(struct bpf_sock_addr_kern,
+						 uaddrlen),
+				treg, si->dst_reg,
+				offsetof(struct bpf_sock_addr_kern, uaddrlen));
+			*insn++ = BPF_STX_MEM(
+				BPF_SIZEOF(u32), treg, si->src_reg,
+				bpf_ctx_narrow_access_offset(0, sizeof(u32),
+							     sizeof(int)));
+			*insn++ = BPF_LDX_MEM(
+				BPF_DW, treg, si->dst_reg,
+				offsetof(struct bpf_sock_addr_kern, tmp_reg));
+		} else {
+			*insn++ = BPF_LDX_MEM(
+				BPF_FIELD_SIZEOF(struct bpf_sock_addr_kern,
+						 uaddrlen),
+				si->dst_reg, si->src_reg,
+				offsetof(struct bpf_sock_addr_kern, uaddrlen));
+			read_size = bpf_size_to_bytes(BPF_SIZE(si->code));
+			*insn++ = BPF_LDX_MEM(
+				BPF_SIZE(si->code), si->dst_reg, si->dst_reg,
+				bpf_ctx_narrow_access_offset(0, read_size,
+							     sizeof(int)));
+		}
+		*target_size = sizeof(u32);
+		break;
 	}
 
 	return insn - insn_buf;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index ab4a06be489b..3024837b57e7 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -448,7 +448,7 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	/* BPF prog is run before any checks are done so that if the prog
 	 * changes context in a wrong way it will be caught.
 	 */
-	err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr,
+	err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, &addr_len,
 						 CGROUP_INET4_BIND, &flags);
 	if (err)
 		return err;
@@ -775,6 +775,7 @@ int inet_getname(struct socket *sock, struct sockaddr *uaddr,
 	struct sock *sk		= sock->sk;
 	struct inet_sock *inet	= inet_sk(sk);
 	DECLARE_SOCKADDR(struct sockaddr_in *, sin, uaddr);
+	int addrlen = sizeof(*sin);
 
 	sin->sin_family = AF_INET;
 	lock_sock(sk);
@@ -787,7 +788,7 @@ int inet_getname(struct socket *sock, struct sockaddr *uaddr,
 		}
 		sin->sin_port = inet->inet_dport;
 		sin->sin_addr.s_addr = inet->inet_daddr;
-		BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin,
+		BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, &addrlen,
 				       CGROUP_INET4_GETPEERNAME);
 	} else {
 		__be32 addr = inet->inet_rcv_saddr;
@@ -795,12 +796,12 @@ int inet_getname(struct socket *sock, struct sockaddr *uaddr,
 			addr = inet->inet_saddr;
 		sin->sin_port = inet->inet_sport;
 		sin->sin_addr.s_addr = addr;
-		BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin,
+		BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, &addrlen,
 				       CGROUP_INET4_GETSOCKNAME);
 	}
 	release_sock(sk);
 	memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
-	return sizeof(*sin);
+	return addrlen;
 }
 EXPORT_SYMBOL(inet_getname);
 
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index bb9854c2b7a1..9b82798e5542 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -306,7 +306,7 @@ static int ping_pre_connect(struct sock *sk, struct sockaddr *uaddr,
 	if (addr_len < sizeof(struct sockaddr_in))
 		return -EINVAL;
 
-	return BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr);
+	return BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, &addr_len);
 }
 
 /* Checks the bind address and possibly modifies sk->sk_bound_dev_if. */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 1215fa4c1b9f..e4d1903239b9 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -193,7 +193,7 @@ static int tcp_v4_pre_connect(struct sock *sk, struct sockaddr *uaddr,
 
 	sock_owned_by_me(sk);
 
-	return BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr);
+	return BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, &addr_len);
 }
 
 /* This will initiate an outgoing connection. */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9592fe3e444a..ff4b4513d0fb 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1156,8 +1156,9 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	}
 
 	if (cgroup_bpf_enabled(CGROUP_UDP4_SENDMSG) && !connected) {
-		err = BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk,
-					    (struct sockaddr *)usin, &ipc.addr);
+		err = BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(
+			sk, (struct sockaddr *)usin, &msg->msg_namelen,
+			&ipc.addr);
 		if (err)
 			goto out_free;
 		if (usin) {
@@ -1921,8 +1922,8 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
 		memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
 		*addr_len = sizeof(*sin);
 
-		BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk,
-						      (struct sockaddr *)sin);
+		BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(
+			sk, (struct sockaddr *)sin, addr_len);
 	}
 
 	if (udp_sk(sk)->gro_enabled)
@@ -1961,7 +1962,7 @@ int udp_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	if (addr_len < sizeof(struct sockaddr_in))
 		return -EINVAL;
 
-	return BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr);
+	return BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, &addr_len);
 }
 EXPORT_SYMBOL(udp_pre_connect);
 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index fee9163382c2..ac3671e48710 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -464,7 +464,7 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	/* BPF prog is run before any checks are done so that if the prog
 	 * changes context in a wrong way it will be caught.
 	 */
-	err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr,
+	err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, &addr_len,
 						 CGROUP_INET6_BIND, &flags);
 	if (err)
 		return err;
@@ -527,6 +527,7 @@ int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
 	struct sock *sk = sock->sk;
 	struct inet_sock *inet = inet_sk(sk);
 	struct ipv6_pinfo *np = inet6_sk(sk);
+	int addrlen = sizeof(*sin);
 
 	sin->sin6_family = AF_INET6;
 	sin->sin6_flowinfo = 0;
@@ -543,7 +544,7 @@ int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
 		sin->sin6_addr = sk->sk_v6_daddr;
 		if (np->sndflow)
 			sin->sin6_flowinfo = np->flow_label;
-		BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin,
+		BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, &addrlen,
 				       CGROUP_INET6_GETPEERNAME);
 	} else {
 		if (ipv6_addr_any(&sk->sk_v6_rcv_saddr))
@@ -551,13 +552,13 @@ int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
 		else
 			sin->sin6_addr = sk->sk_v6_rcv_saddr;
 		sin->sin6_port = inet->inet_sport;
-		BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin,
+		BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, &addrlen,
 				       CGROUP_INET6_GETSOCKNAME);
 	}
 	sin->sin6_scope_id = ipv6_iface_scope_id(&sin->sin6_addr,
 						 sk->sk_bound_dev_if);
 	release_sock(sk);
-	return sizeof(*sin);
+	return addrlen;
 }
 EXPORT_SYMBOL(inet6_getname);
 
diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
index 808983bc2ec9..a46c1eec72e6 100644
--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -56,7 +56,7 @@ static int ping_v6_pre_connect(struct sock *sk, struct sockaddr *uaddr,
 	if (addr_len < SIN6_LEN_RFC2133)
 		return -EINVAL;
 
-	return BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr);
+	return BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, &addr_len);
 }
 
 static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index f52b6f271a24..22213641bb3c 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -139,7 +139,7 @@ static int tcp_v6_pre_connect(struct sock *sk, struct sockaddr *uaddr,
 
 	sock_owned_by_me(sk);
 
-	return BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr);
+	return BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, &addr_len);
 }
 
 static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 9fb2f33ee3a7..2ac73446dcdc 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -428,8 +428,8 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		}
 		*addr_len = sizeof(*sin6);
 
-		BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk,
-						      (struct sockaddr *)sin6);
+		BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(
+			sk, (struct sockaddr *)sin6, addr_len);
 	}
 
 	if (udp_sk(sk)->gro_enabled)
@@ -1167,7 +1167,7 @@ static int udpv6_pre_connect(struct sock *sk, struct sockaddr *uaddr,
 	if (addr_len < SIN6_LEN_RFC2133)
 		return -EINVAL;
 
-	return BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr);
+	return BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, &addr_len);
 }
 
 /**
@@ -1516,9 +1516,9 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	fl6->fl6_sport = inet->inet_sport;
 
 	if (cgroup_bpf_enabled(CGROUP_UDP6_SENDMSG) && !connected) {
-		err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk,
-					   (struct sockaddr *)sin6,
-					   &fl6->saddr);
+		err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(
+			sk, (struct sockaddr *)sin6, &msg->msg_namelen,
+			&fl6->saddr);
 		if (err)
 			goto out_no_dst;
 		if (sin6) {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f89de51a45db..7cafcfdbb9b2 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6366,6 +6366,7 @@ struct bpf_sock_addr {
 				 * Stored in network byte order.
 				 */
 	__bpf_md_ptr(struct bpf_sock *, sk);
+	__u32 user_addrlen;	/* Allows 4 byte read and write. */
 };
 
 /* User bpf_sock_ops struct to access socket values and specify request ops
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 3/9] bpf: Support access to sun_path from cgroup sockaddr programs
  2022-12-10 19:35 [PATCH bpf-next v2 0/9] Add cgroup sockaddr hooks for unix sockets Daan De Meyer
  2022-12-10 19:35 ` [PATCH bpf-next v2 1/9] selftests/bpf: Add missing section name tests for getpeername/getsockname Daan De Meyer
  2022-12-10 19:35 ` [PATCH bpf-next v2 2/9] bpf: Allow read access to addr_len from cgroup sockaddr programs Daan De Meyer
@ 2022-12-10 19:35 ` Daan De Meyer
  2022-12-13  6:15   ` Yonghong Song
  2022-12-10 19:35 ` [PATCH bpf-next v2 4/9] selftests/bpf: Track sockaddr length in sock addr tests Daan De Meyer
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Daan De Meyer @ 2022-12-10 19:35 UTC (permalink / raw)
  To: bpf; +Cc: Daan De Meyer, martin.lau, kernel-team

Preparation for adding unix support to cgroup sockaddr bpf programs.
In this commit, no programs are allowed to access user_path. We'll
open this up to the new unix program types in a later commit.
---
 include/uapi/linux/bpf.h       |  1 +
 net/core/filter.c              | 19 +++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  1 +
 3 files changed, 21 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7cafcfdbb9b2..9e3c33f83bba 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6366,6 +6366,7 @@ struct bpf_sock_addr {
 				 * Stored in network byte order.
 				 */
 	__bpf_md_ptr(struct bpf_sock *, sk);
+	char user_path[108];    /* Allows 1 byte read and write. */
 	__u32 user_addrlen;	/* Allows 4 byte read and write. */
 };
 
diff --git a/net/core/filter.c b/net/core/filter.c
index d0620927dbca..cc86b38fc764 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -26,6 +26,7 @@
 #include <linux/socket.h>
 #include <linux/sock_diag.h>
 #include <linux/in.h>
+#include <linux/un.h>
 #include <linux/inet.h>
 #include <linux/netdevice.h>
 #include <linux/if_packet.h>
@@ -8830,6 +8831,8 @@ static bool sock_addr_is_valid_access(int off, int size,
 			return false;
 		}
 		break;
+	case bpf_ctx_range_till(struct bpf_sock_addr, user_path[0], user_path[107]):
+		return false;
 	}
 
 	switch (off) {
@@ -8876,6 +8879,10 @@ static bool sock_addr_is_valid_access(int off, int size,
 			return false;
 		info->reg_type = PTR_TO_SOCKET;
 		break;
+	case bpf_ctx_range_till(struct bpf_sock_addr, user_path[0], user_path[107]):
+		if (size != sizeof(char))
+			return false;
+		break;
 	case bpf_ctx_range(struct bpf_sock_addr, user_addrlen):
 		if (type != BPF_READ)
 			return false;
@@ -9995,6 +10002,18 @@ static u32 sock_addr_convert_ctx_access(enum bpf_access_type type,
 				      offsetof(struct bpf_sock_addr_kern, sk));
 		break;
 
+	case bpf_ctx_range_till(struct bpf_sock_addr, user_path[0], user_path[107]):
+		/* In kernelspace, addresses are always stored in
+		 * sockaddr_storage so any access in the full range of
+		 * sockaddr_un.sun_path is safe.
+		 */
+		off = si->off;
+		off -= offsetof(struct bpf_sock_addr, user_path[0]);
+		SOCK_ADDR_LOAD_OR_STORE_NESTED_FIELD_SIZE_OFF(
+			struct bpf_sock_addr_kern, struct sockaddr_un, uaddr,
+			sun_path, BPF_SIZE(si->code), off, tmp_reg);
+		break;
+
 	case offsetof(struct bpf_sock_addr, user_addrlen):
 		/* uaddrlen is a pointer so it should be accessed via indirect
 		 * loads and stores. Also for stores additional temporary
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7cafcfdbb9b2..9e3c33f83bba 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6366,6 +6366,7 @@ struct bpf_sock_addr {
 				 * Stored in network byte order.
 				 */
 	__bpf_md_ptr(struct bpf_sock *, sk);
+	char user_path[108];    /* Allows 1 byte read and write. */
 	__u32 user_addrlen;	/* Allows 4 byte read and write. */
 };
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 4/9] selftests/bpf: Track sockaddr length in sock addr tests
  2022-12-10 19:35 [PATCH bpf-next v2 0/9] Add cgroup sockaddr hooks for unix sockets Daan De Meyer
                   ` (2 preceding siblings ...)
  2022-12-10 19:35 ` [PATCH bpf-next v2 3/9] bpf: Support access to sun_path " Daan De Meyer
@ 2022-12-10 19:35 ` Daan De Meyer
  2022-12-10 19:35 ` [PATCH bpf-next v2 5/9] bpf: Implement cgroup sockaddr hooks for unix sockets Daan De Meyer
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Daan De Meyer @ 2022-12-10 19:35 UTC (permalink / raw)
  To: bpf; +Cc: Daan De Meyer, martin.lau, kernel-team

In preparation for adding unix socket support to the bpf cgroup
socket address hooks, start tracking the sockaddr length in the
sockaddr tests which will be required when adding tests for unix
sockets.
---
 tools/testing/selftests/bpf/test_sock_addr.c | 130 ++++++++++++-------
 1 file changed, 85 insertions(+), 45 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sock_addr.c b/tools/testing/selftests/bpf/test_sock_addr.c
index 2c89674fc62c..6a618c8f477c 100644
--- a/tools/testing/selftests/bpf/test_sock_addr.c
+++ b/tools/testing/selftests/bpf/test_sock_addr.c
@@ -604,7 +604,7 @@ static struct sock_addr_test tests[] = {
 };
 
 static int mk_sockaddr(int domain, const char *ip, unsigned short port,
-		       struct sockaddr *addr, socklen_t addr_len)
+		       struct sockaddr *addr, socklen_t *addr_len)
 {
 	struct sockaddr_in6 *addr6;
 	struct sockaddr_in *addr4;
@@ -614,10 +614,10 @@ static int mk_sockaddr(int domain, const char *ip, unsigned short port,
 		return -1;
 	}
 
-	memset(addr, 0, addr_len);
+	memset(addr, 0, *addr_len);
 
 	if (domain == AF_INET) {
-		if (addr_len < sizeof(struct sockaddr_in))
+		if (*addr_len < sizeof(struct sockaddr_in))
 			return -1;
 		addr4 = (struct sockaddr_in *)addr;
 		addr4->sin_family = domain;
@@ -626,8 +626,9 @@ static int mk_sockaddr(int domain, const char *ip, unsigned short port,
 			log_err("Invalid IPv4: %s", ip);
 			return -1;
 		}
+		*addr_len = sizeof(struct sockaddr_in);
 	} else if (domain == AF_INET6) {
-		if (addr_len < sizeof(struct sockaddr_in6))
+		if (*addr_len < sizeof(struct sockaddr_in6))
 			return -1;
 		addr6 = (struct sockaddr_in6 *)addr;
 		addr6->sin6_family = domain;
@@ -636,6 +637,7 @@ static int mk_sockaddr(int domain, const char *ip, unsigned short port,
 			log_err("Invalid IPv6: %s", ip);
 			return -1;
 		}
+		*addr_len = sizeof(struct sockaddr_in6);
 	}
 
 	return 0;
@@ -749,6 +751,7 @@ static int sendmsg4_rw_asm_prog_load(const struct sock_addr_test *test)
 {
 	struct sockaddr_in dst4_rw_addr;
 	struct in_addr src4_rw_ip;
+	socklen_t dst4_rw_addr_len = sizeof(dst4_rw_addr);
 
 	if (inet_pton(AF_INET, SRC4_REWRITE_IP, (void *)&src4_rw_ip) != 1) {
 		log_err("Invalid IPv4: %s", SRC4_REWRITE_IP);
@@ -757,7 +760,7 @@ static int sendmsg4_rw_asm_prog_load(const struct sock_addr_test *test)
 
 	if (mk_sockaddr(AF_INET, SERV4_REWRITE_IP, SERV4_REWRITE_PORT,
 			(struct sockaddr *)&dst4_rw_addr,
-			sizeof(dst4_rw_addr)) == -1)
+			&dst4_rw_addr_len) == -1)
 		return -1;
 
 	struct bpf_insn insns[] = {
@@ -812,6 +815,7 @@ static int sendmsg6_rw_dst_asm_prog_load(const struct sock_addr_test *test,
 {
 	struct sockaddr_in6 dst6_rw_addr;
 	struct in6_addr src6_rw_ip;
+	socklen_t dst6_rw_addr_len = sizeof(dst6_rw_addr);
 
 	if (inet_pton(AF_INET6, SRC6_REWRITE_IP, (void *)&src6_rw_ip) != 1) {
 		log_err("Invalid IPv6: %s", SRC6_REWRITE_IP);
@@ -820,7 +824,7 @@ static int sendmsg6_rw_dst_asm_prog_load(const struct sock_addr_test *test,
 
 	if (mk_sockaddr(AF_INET6, rw_dst_ip, SERV6_REWRITE_PORT,
 			(struct sockaddr *)&dst6_rw_addr,
-			sizeof(dst6_rw_addr)) == -1)
+			&dst6_rw_addr_len) == -1)
 		return -1;
 
 	struct bpf_insn insns[] = {
@@ -885,8 +889,9 @@ static int sendmsg6_rw_c_prog_load(const struct sock_addr_test *test)
 	return load_path(test, SENDMSG6_PROG_PATH);
 }
 
-static int cmp_addr(const struct sockaddr_storage *addr1,
-		    const struct sockaddr_storage *addr2, int cmp_port)
+static int cmp_addr(const struct sockaddr_storage *addr1, socklen_t addr1_len,
+		    const struct sockaddr_storage *addr2, socklen_t addr2_len,
+		    int cmp_port)
 {
 	const struct sockaddr_in *four1, *four2;
 	const struct sockaddr_in6 *six1, *six2;
@@ -894,6 +899,9 @@ static int cmp_addr(const struct sockaddr_storage *addr1,
 	if (addr1->ss_family != addr2->ss_family)
 		return -1;
 
+	if (addr1_len != addr2_len)
+		return -1;
+
 	if (addr1->ss_family == AF_INET) {
 		four1 = (const struct sockaddr_in *)addr1;
 		four2 = (const struct sockaddr_in *)addr2;
@@ -911,7 +919,8 @@ static int cmp_addr(const struct sockaddr_storage *addr1,
 }
 
 static int cmp_sock_addr(info_fn fn, int sock1,
-			 const struct sockaddr_storage *addr2, int cmp_port)
+			 const struct sockaddr_storage *addr2,
+			 socklen_t addr2_len, int cmp_port)
 {
 	struct sockaddr_storage addr1;
 	socklen_t len1 = sizeof(addr1);
@@ -920,22 +929,28 @@ static int cmp_sock_addr(info_fn fn, int sock1,
 	if (fn(sock1, (struct sockaddr *)&addr1, (socklen_t *)&len1) != 0)
 		return -1;
 
-	return cmp_addr(&addr1, addr2, cmp_port);
+	return cmp_addr(&addr1, len1, addr2, addr2_len, cmp_port);
 }
 
-static int cmp_local_ip(int sock1, const struct sockaddr_storage *addr2)
+static int cmp_local_ip(int sock1, const struct sockaddr_storage *addr2,
+			socklen_t addr2_len)
 {
-	return cmp_sock_addr(getsockname, sock1, addr2, /*cmp_port*/ 0);
+	return cmp_sock_addr(getsockname, sock1, addr2, addr2_len,
+			     /*cmp_port*/ 0);
 }
 
-static int cmp_local_addr(int sock1, const struct sockaddr_storage *addr2)
+static int cmp_local_addr(int sock1, const struct sockaddr_storage *addr2,
+			  socklen_t addr2_len)
 {
-	return cmp_sock_addr(getsockname, sock1, addr2, /*cmp_port*/ 1);
+	return cmp_sock_addr(getsockname, sock1, addr2, addr2_len,
+			     /*cmp_port*/ 1);
 }
 
-static int cmp_peer_addr(int sock1, const struct sockaddr_storage *addr2)
+static int cmp_peer_addr(int sock1, const struct sockaddr_storage *addr2,
+			 socklen_t addr2_len)
 {
-	return cmp_sock_addr(getpeername, sock1, addr2, /*cmp_port*/ 1);
+	return cmp_sock_addr(getpeername, sock1, addr2, addr2_len,
+			     /*cmp_port*/ 1);
 }
 
 static int start_server(int type, const struct sockaddr_storage *addr,
@@ -1109,7 +1124,8 @@ static int fastconnect_to_server(const struct sockaddr_storage *addr,
 				 MSG_FASTOPEN, &sendmsg_err);
 }
 
-static int recvmsg_from_client(int sockfd, struct sockaddr_storage *src_addr)
+static int recvmsg_from_client(int sockfd, struct sockaddr_storage *src_addr,
+			       socklen_t *src_addr_len)
 {
 	struct timeval tv;
 	struct msghdr hdr;
@@ -1133,31 +1149,39 @@ static int recvmsg_from_client(int sockfd, struct sockaddr_storage *src_addr)
 
 	memset(&hdr, 0, sizeof(hdr));
 	hdr.msg_name = src_addr;
-	hdr.msg_namelen = sizeof(struct sockaddr_storage);
+	hdr.msg_namelen = *src_addr_len;
 	hdr.msg_iov = &iov;
 	hdr.msg_iovlen = 1;
 
-	return recvmsg(sockfd, &hdr, 0);
+	if (recvmsg(sockfd, &hdr, 0) < 0)
+		return -1;
+
+	*src_addr_len = hdr.msg_namelen;
+	return 0;
 }
 
 static int init_addrs(const struct sock_addr_test *test,
 		      struct sockaddr_storage *requested_addr,
+		      socklen_t *requested_addr_len,
 		      struct sockaddr_storage *expected_addr,
-		      struct sockaddr_storage *expected_src_addr)
+		      socklen_t *expected_addr_len,
+		      struct sockaddr_storage *expected_src_addr,
+		      socklen_t *expected_src_addr_len)
 {
-	socklen_t addr_len = sizeof(struct sockaddr_storage);
-
 	if (mk_sockaddr(test->domain, test->expected_ip, test->expected_port,
-			(struct sockaddr *)expected_addr, addr_len) == -1)
+			(struct sockaddr *)expected_addr,
+			expected_addr_len) == -1)
 		goto err;
 
 	if (mk_sockaddr(test->domain, test->requested_ip, test->requested_port,
-			(struct sockaddr *)requested_addr, addr_len) == -1)
+			(struct sockaddr *)requested_addr,
+			requested_addr_len) == -1)
 		goto err;
 
 	if (test->expected_src_ip &&
 	    mk_sockaddr(test->domain, test->expected_src_ip, 0,
-			(struct sockaddr *)expected_src_addr, addr_len) == -1)
+			(struct sockaddr *)expected_src_addr,
+			expected_src_addr_len) == -1)
 		goto err;
 
 	return 0;
@@ -1167,25 +1191,28 @@ static int init_addrs(const struct sock_addr_test *test,
 
 static int run_bind_test_case(const struct sock_addr_test *test)
 {
-	socklen_t addr_len = sizeof(struct sockaddr_storage);
 	struct sockaddr_storage requested_addr;
 	struct sockaddr_storage expected_addr;
+	socklen_t requested_addr_len = sizeof(struct sockaddr_storage);
+	socklen_t expected_addr_len = sizeof(struct sockaddr_storage);
 	int clientfd = -1;
 	int servfd = -1;
 	int err = 0;
 
-	if (init_addrs(test, &requested_addr, &expected_addr, NULL))
+	if (init_addrs(test, &requested_addr, &requested_addr_len,
+		       &expected_addr, &expected_addr_len, NULL, NULL))
 		goto err;
 
-	servfd = start_server(test->type, &requested_addr, addr_len);
+	servfd = start_server(test->type, &requested_addr, requested_addr_len);
 	if (servfd == -1)
 		goto err;
 
-	if (cmp_local_addr(servfd, &expected_addr))
+	if (cmp_local_addr(servfd, &expected_addr, expected_addr_len))
 		goto err;
 
 	/* Try to connect to server just in case */
-	clientfd = connect_to_server(test->type, &expected_addr, addr_len);
+	clientfd = connect_to_server(test->type, &expected_addr,
+				     expected_addr_len);
 	if (clientfd == -1)
 		goto err;
 
@@ -1204,28 +1231,33 @@ static int run_connect_test_case(const struct sock_addr_test *test)
 	struct sockaddr_storage expected_src_addr;
 	struct sockaddr_storage requested_addr;
 	struct sockaddr_storage expected_addr;
+	socklen_t expected_src_addr_len = sizeof(struct sockaddr_storage);
+	socklen_t requested_addr_len = sizeof(struct sockaddr_storage);
+	socklen_t expected_addr_len = sizeof(struct sockaddr_storage);
 	int clientfd = -1;
 	int servfd = -1;
 	int err = 0;
 
-	if (init_addrs(test, &requested_addr, &expected_addr,
-		       &expected_src_addr))
+	if (init_addrs(test, &requested_addr, &requested_addr_len,
+		       &expected_addr, &expected_addr_len, &expected_src_addr,
+		       &expected_src_addr_len))
 		goto err;
 
 	/* Prepare server to connect to */
-	servfd = start_server(test->type, &expected_addr, addr_len);
+	servfd = start_server(test->type, &expected_addr, expected_addr_len);
 	if (servfd == -1)
 		goto err;
 
-	clientfd = connect_to_server(test->type, &requested_addr, addr_len);
+	clientfd = connect_to_server(test->type, &requested_addr,
+				     requested_addr_len);
 	if (clientfd == -1)
 		goto err;
 
 	/* Make sure src and dst addrs were overridden properly */
-	if (cmp_peer_addr(clientfd, &expected_addr))
+	if (cmp_peer_addr(clientfd, &expected_addr, expected_addr_len))
 		goto err;
 
-	if (cmp_local_ip(clientfd, &expected_src_addr))
+	if (cmp_local_ip(clientfd, &expected_src_addr, expected_src_addr_len))
 		goto err;
 
 	if (test->type == SOCK_STREAM) {
@@ -1235,10 +1267,11 @@ static int run_connect_test_case(const struct sock_addr_test *test)
 			goto err;
 
 		/* Make sure src and dst addrs were overridden properly */
-		if (cmp_peer_addr(clientfd, &expected_addr))
+		if (cmp_peer_addr(clientfd, &expected_addr, expected_addr_len))
 			goto err;
 
-		if (cmp_local_ip(clientfd, &expected_src_addr))
+		if (cmp_local_ip(clientfd, &expected_src_addr,
+				 expected_src_addr_len))
 			goto err;
 	}
 
@@ -1253,11 +1286,14 @@ static int run_connect_test_case(const struct sock_addr_test *test)
 
 static int run_xmsg_test_case(const struct sock_addr_test *test, int max_cmsg)
 {
-	socklen_t addr_len = sizeof(struct sockaddr_storage);
 	struct sockaddr_storage expected_addr;
 	struct sockaddr_storage server_addr;
 	struct sockaddr_storage sendmsg_addr;
 	struct sockaddr_storage recvmsg_addr;
+	socklen_t expected_addr_len = sizeof(struct sockaddr_storage);
+	socklen_t server_addr_len = sizeof(struct sockaddr_storage);
+	socklen_t sendmsg_addr_len = sizeof(struct sockaddr_storage);
+	socklen_t recvmsg_addr_len = sizeof(struct sockaddr_storage);
 	int clientfd = -1;
 	int servfd = -1;
 	int set_cmsg;
@@ -1266,11 +1302,12 @@ static int run_xmsg_test_case(const struct sock_addr_test *test, int max_cmsg)
 	if (test->type != SOCK_DGRAM)
 		goto err;
 
-	if (init_addrs(test, &sendmsg_addr, &server_addr, &expected_addr))
+	if (init_addrs(test, &sendmsg_addr, &sendmsg_addr_len, &server_addr,
+		       &server_addr_len, &expected_addr, &expected_addr_len))
 		goto err;
 
 	/* Prepare server to sendmsg to */
-	servfd = start_server(test->type, &server_addr, addr_len);
+	servfd = start_server(test->type, &server_addr, server_addr_len);
 	if (servfd == -1)
 		goto err;
 
@@ -1279,8 +1316,8 @@ static int run_xmsg_test_case(const struct sock_addr_test *test, int max_cmsg)
 			close(clientfd);
 
 		clientfd = sendmsg_to_server(test->type, &sendmsg_addr,
-					     addr_len, set_cmsg, /*flags*/0,
-					     &err);
+					     sendmsg_addr_len, set_cmsg,
+					     /*flags*/ 0, &err);
 		if (err)
 			goto out;
 		else if (clientfd == -1)
@@ -1298,10 +1335,13 @@ static int run_xmsg_test_case(const struct sock_addr_test *test, int max_cmsg)
 		 * specific packet may differ from the one used by default and
 		 * returned by getsockname(2).
 		 */
-		if (recvmsg_from_client(servfd, &recvmsg_addr) == -1)
+		if (recvmsg_from_client(servfd, &recvmsg_addr,
+					&recvmsg_addr_len) == -1)
 			goto err;
 
-		if (cmp_addr(&recvmsg_addr, &expected_addr, /*cmp_port*/0))
+		if (cmp_addr(&recvmsg_addr, recvmsg_addr_len, &expected_addr,
+			     expected_addr_len,
+			     /*cmp_port*/ 0))
 			goto err;
 	}
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 5/9] bpf: Implement cgroup sockaddr hooks for unix sockets
  2022-12-10 19:35 [PATCH bpf-next v2 0/9] Add cgroup sockaddr hooks for unix sockets Daan De Meyer
                   ` (3 preceding siblings ...)
  2022-12-10 19:35 ` [PATCH bpf-next v2 4/9] selftests/bpf: Track sockaddr length in sock addr tests Daan De Meyer
@ 2022-12-10 19:35 ` Daan De Meyer
  2022-12-13  6:20   ` Yonghong Song
  2022-12-10 19:35 ` [PATCH bpf-next v2 6/9] libbpf: Add support for cgroup unix socket address hooks Daan De Meyer
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Daan De Meyer @ 2022-12-10 19:35 UTC (permalink / raw)
  To: bpf; +Cc: Daan De Meyer, martin.lau, kernel-team

These hooks allows intercepting bind(), connect(), getsockname(),
getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
socket hooks get write access to the address length because the
address length is not fixed when dealing with unix sockets and
needs to be modified when a unix socket address is modified by
the hook. Because abstract socket unix addresses start with a
NUL byte, we cannot recalculate the socket address in kernelspace
after running the hook by calculating the length of the unix socket
path using strlen().

This hook can be used when users want to multiplex syscall to a
single unix socket to multiple different processes behind the scenes
by redirecting the connect() and other syscalls to process specific
sockets.
---
 include/linux/bpf-cgroup-defs.h |  6 +++
 include/linux/bpf-cgroup.h      | 29 ++++++++++-
 include/uapi/linux/bpf.h        | 14 ++++--
 kernel/bpf/cgroup.c             | 11 ++++-
 kernel/bpf/syscall.c            | 18 +++++++
 kernel/bpf/verifier.c           |  7 ++-
 net/core/filter.c               | 45 +++++++++++++++--
 net/unix/af_unix.c              | 85 +++++++++++++++++++++++++++++----
 tools/include/uapi/linux/bpf.h  | 14 ++++--
 9 files changed, 204 insertions(+), 25 deletions(-)

diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
index 7b121bd780eb..8196ccb81915 100644
--- a/include/linux/bpf-cgroup-defs.h
+++ b/include/linux/bpf-cgroup-defs.h
@@ -26,21 +26,27 @@ enum cgroup_bpf_attach_type {
 	CGROUP_DEVICE,
 	CGROUP_INET4_BIND,
 	CGROUP_INET6_BIND,
+	CGROUP_UNIX_BIND,
 	CGROUP_INET4_CONNECT,
 	CGROUP_INET6_CONNECT,
+	CGROUP_UNIX_CONNECT,
 	CGROUP_INET4_POST_BIND,
 	CGROUP_INET6_POST_BIND,
 	CGROUP_UDP4_SENDMSG,
 	CGROUP_UDP6_SENDMSG,
+	CGROUP_UNIX_SENDMSG,
 	CGROUP_SYSCTL,
 	CGROUP_UDP4_RECVMSG,
 	CGROUP_UDP6_RECVMSG,
+	CGROUP_UNIX_RECVMSG,
 	CGROUP_GETSOCKOPT,
 	CGROUP_SETSOCKOPT,
 	CGROUP_INET4_GETPEERNAME,
 	CGROUP_INET6_GETPEERNAME,
+	CGROUP_UNIX_GETPEERNAME,
 	CGROUP_INET4_GETSOCKNAME,
 	CGROUP_INET6_GETSOCKNAME,
+	CGROUP_UNIX_GETSOCKNAME,
 	CGROUP_INET_SOCK_RELEASE,
 	CGROUP_LSM_START,
 	CGROUP_LSM_END = CGROUP_LSM_START + CGROUP_LSM_NUM - 1,
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 3ab2f06ddc8a..4de3016f01e4 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -46,21 +46,27 @@ to_cgroup_bpf_attach_type(enum bpf_attach_type attach_type)
 	CGROUP_ATYPE(CGROUP_DEVICE);
 	CGROUP_ATYPE(CGROUP_INET4_BIND);
 	CGROUP_ATYPE(CGROUP_INET6_BIND);
+	CGROUP_ATYPE(CGROUP_UNIX_BIND);
 	CGROUP_ATYPE(CGROUP_INET4_CONNECT);
 	CGROUP_ATYPE(CGROUP_INET6_CONNECT);
+	CGROUP_ATYPE(CGROUP_UNIX_CONNECT);
 	CGROUP_ATYPE(CGROUP_INET4_POST_BIND);
 	CGROUP_ATYPE(CGROUP_INET6_POST_BIND);
 	CGROUP_ATYPE(CGROUP_UDP4_SENDMSG);
 	CGROUP_ATYPE(CGROUP_UDP6_SENDMSG);
+	CGROUP_ATYPE(CGROUP_UNIX_SENDMSG);
 	CGROUP_ATYPE(CGROUP_SYSCTL);
 	CGROUP_ATYPE(CGROUP_UDP4_RECVMSG);
 	CGROUP_ATYPE(CGROUP_UDP6_RECVMSG);
+	CGROUP_ATYPE(CGROUP_UNIX_RECVMSG);
 	CGROUP_ATYPE(CGROUP_GETSOCKOPT);
 	CGROUP_ATYPE(CGROUP_SETSOCKOPT);
 	CGROUP_ATYPE(CGROUP_INET4_GETPEERNAME);
 	CGROUP_ATYPE(CGROUP_INET6_GETPEERNAME);
+	CGROUP_ATYPE(CGROUP_UNIX_GETPEERNAME);
 	CGROUP_ATYPE(CGROUP_INET4_GETSOCKNAME);
 	CGROUP_ATYPE(CGROUP_INET6_GETSOCKNAME);
+	CGROUP_ATYPE(CGROUP_UNIX_GETSOCKNAME);
 	CGROUP_ATYPE(CGROUP_INET_SOCK_RELEASE);
 	default:
 		return CGROUP_BPF_ATTACH_TYPE_INVALID;
@@ -273,9 +279,13 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
 		__ret;                                                       \
 	})
 
+#define BPF_CGROUP_RUN_PROG_UNIX_BIND_LOCK(sk, uaddr, uaddrlen)			\
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_BIND, NULL)
+
 #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk)				       \
 	((cgroup_bpf_enabled(CGROUP_INET4_CONNECT) ||		       \
-	  cgroup_bpf_enabled(CGROUP_INET6_CONNECT)) &&		       \
+	  cgroup_bpf_enabled(CGROUP_INET6_CONNECT) ||		       \
+	  cgroup_bpf_enabled(CGROUP_UNIX_CONNECT)) &&		       \
 	 (sk)->sk_prot->pre_connect)
 
 #define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen)		       \
@@ -284,24 +294,36 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
 #define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen)		       \
 	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT)
 
+#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT(sk, uaddr, uaddrlen)	               \
+	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_UNIX_CONNECT)
+
 #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen)	       \
 	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET4_CONNECT, NULL)
 
 #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen)	       \
 	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT, NULL)
 
+#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT_LOCK(sk, uaddr, uaddrlen)	       \
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_CONNECT, NULL)
+
 #define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)       \
 	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_SENDMSG, t_ctx)
 
 #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)       \
 	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_SENDMSG, t_ctx)
 
+#define BPF_CGROUP_RUN_PROG_UNIX_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)	\
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_SENDMSG, t_ctx)
+
 #define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen)		\
 	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_RECVMSG, NULL)
 
 #define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen)		\
 	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_RECVMSG, NULL)
 
+#define BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(sk, uaddr, uaddrlen)		\
+	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_RECVMSG, NULL)
+
 /* The SOCK_OPS"_SK" macro should be used when sock_ops->sk is not a
  * fullsock and its parent fullsock cannot be traced by
  * sk_to_full_sk().
@@ -487,16 +509,21 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
 #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, uaddrlen, atype, flags) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UNIX_BIND_LOCK(sk, uaddr, uaddrlen) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UNIX_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(atype, major, minor, access) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,pos) ({ 0; })
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9e3c33f83bba..b73e4da458fd 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -999,17 +999,21 @@ enum bpf_attach_type {
 	BPF_SK_MSG_VERDICT,
 	BPF_CGROUP_INET4_BIND,
 	BPF_CGROUP_INET6_BIND,
+	BPF_CGROUP_UNIX_BIND,
 	BPF_CGROUP_INET4_CONNECT,
 	BPF_CGROUP_INET6_CONNECT,
+	BPF_CGROUP_UNIX_CONNECT,
 	BPF_CGROUP_INET4_POST_BIND,
 	BPF_CGROUP_INET6_POST_BIND,
 	BPF_CGROUP_UDP4_SENDMSG,
 	BPF_CGROUP_UDP6_SENDMSG,
+	BPF_CGROUP_UNIX_SENDMSG,
 	BPF_LIRC_MODE2,
 	BPF_FLOW_DISSECTOR,
 	BPF_CGROUP_SYSCTL,
 	BPF_CGROUP_UDP4_RECVMSG,
 	BPF_CGROUP_UDP6_RECVMSG,
+	BPF_CGROUP_UNIX_RECVMSG,
 	BPF_CGROUP_GETSOCKOPT,
 	BPF_CGROUP_SETSOCKOPT,
 	BPF_TRACE_RAW_TP,
@@ -1020,8 +1024,10 @@ enum bpf_attach_type {
 	BPF_TRACE_ITER,
 	BPF_CGROUP_INET4_GETPEERNAME,
 	BPF_CGROUP_INET6_GETPEERNAME,
+	BPF_CGROUP_UNIX_GETPEERNAME,
 	BPF_CGROUP_INET4_GETSOCKNAME,
 	BPF_CGROUP_INET6_GETSOCKNAME,
+	BPF_CGROUP_UNIX_GETSOCKNAME,
 	BPF_XDP_DEVMAP,
 	BPF_CGROUP_INET_SOCK_RELEASE,
 	BPF_XDP_CPUMAP,
@@ -2575,8 +2581,8 @@ union bpf_attr {
  * 		*bpf_socket* should be one of the following:
  *
  * 		* **struct bpf_sock_ops** for **BPF_PROG_TYPE_SOCK_OPS**.
- * 		* **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**
- * 		  and **BPF_CGROUP_INET6_CONNECT**.
+ * 		* **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**,
+ * 		  **BPF_CGROUP_INET6_CONNECT** and **BPF_CGROUP_UNIX_CONNECT**.
  *
  * 		This helper actually implements a subset of **setsockopt()**.
  * 		It supports the following *level*\ s:
@@ -2809,8 +2815,8 @@ union bpf_attr {
  * 		*bpf_socket* should be one of the following:
  *
  * 		* **struct bpf_sock_ops** for **BPF_PROG_TYPE_SOCK_OPS**.
- * 		* **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**
- * 		  and **BPF_CGROUP_INET6_CONNECT**.
+ * 		* **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**,
+ * 		  **BPF_CGROUP_INET6_CONNECT** and **BPF_CGROUP_UNIX_CONNECT**.
  *
  * 		This helper actually implements a subset of **getsockopt()**.
  * 		It supports the same set of *optname*\ s that is supported by
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index f97afed8a115..eeb349cef624 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1455,7 +1455,7 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
  * @flags: Pointer to u32 which contains higher bits of BPF program
  *         return value (OR'ed together).
  *
- * socket is expected to be of type INET or INET6.
+ * socket is expected to be of type INET, INET6 or UNIX.
  *
  * This function will return %-EPERM if an attached program is found and
  * returned value != 1 during execution. In all other cases, 0 is returned.
@@ -1479,7 +1479,8 @@ int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
 	/* Check socket family since not all sockets represent network
 	 * endpoint (e.g. AF_UNIX).
 	 */
-	if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6)
+	if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6 &&
+		sk->sk_family != AF_UNIX)
 		return 0;
 
 	if (!ctx.uaddr) {
@@ -2493,10 +2494,13 @@ cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		case BPF_CGROUP_SOCK_OPS:
 		case BPF_CGROUP_UDP4_RECVMSG:
 		case BPF_CGROUP_UDP6_RECVMSG:
+		case BPF_CGROUP_UNIX_RECVMSG:
 		case BPF_CGROUP_INET4_GETPEERNAME:
 		case BPF_CGROUP_INET6_GETPEERNAME:
+		case BPF_CGROUP_UNIX_GETPEERNAME:
 		case BPF_CGROUP_INET4_GETSOCKNAME:
 		case BPF_CGROUP_INET6_GETSOCKNAME:
+		case BPF_CGROUP_UNIX_GETSOCKNAME:
 			return NULL;
 		default:
 			return &bpf_get_retval_proto;
@@ -2508,10 +2512,13 @@ cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		case BPF_CGROUP_SOCK_OPS:
 		case BPF_CGROUP_UDP4_RECVMSG:
 		case BPF_CGROUP_UDP6_RECVMSG:
+		case BPF_CGROUP_UNIX_RECVMSG:
 		case BPF_CGROUP_INET4_GETPEERNAME:
 		case BPF_CGROUP_INET6_GETPEERNAME:
+		case BPF_CGROUP_UNIX_GETPEERNAME:
 		case BPF_CGROUP_INET4_GETSOCKNAME:
 		case BPF_CGROUP_INET6_GETSOCKNAME:
+		case BPF_CGROUP_UNIX_GETSOCKNAME:
 			return NULL;
 		default:
 			return &bpf_set_retval_proto;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 35972afb6850..142b5ece735f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2370,16 +2370,22 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
 		switch (expected_attach_type) {
 		case BPF_CGROUP_INET4_BIND:
 		case BPF_CGROUP_INET6_BIND:
+		case BPF_CGROUP_UNIX_BIND:
 		case BPF_CGROUP_INET4_CONNECT:
 		case BPF_CGROUP_INET6_CONNECT:
+		case BPF_CGROUP_UNIX_CONNECT:
 		case BPF_CGROUP_INET4_GETPEERNAME:
 		case BPF_CGROUP_INET6_GETPEERNAME:
+		case BPF_CGROUP_UNIX_GETPEERNAME:
 		case BPF_CGROUP_INET4_GETSOCKNAME:
 		case BPF_CGROUP_INET6_GETSOCKNAME:
+		case BPF_CGROUP_UNIX_GETSOCKNAME:
 		case BPF_CGROUP_UDP4_SENDMSG:
 		case BPF_CGROUP_UDP6_SENDMSG:
+		case BPF_CGROUP_UNIX_SENDMSG:
 		case BPF_CGROUP_UDP4_RECVMSG:
 		case BPF_CGROUP_UDP6_RECVMSG:
+		case BPF_CGROUP_UNIX_RECVMSG:
 			return 0;
 		default:
 			return -EINVAL;
@@ -3418,16 +3424,22 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 		return BPF_PROG_TYPE_CGROUP_SOCK;
 	case BPF_CGROUP_INET4_BIND:
 	case BPF_CGROUP_INET6_BIND:
+	case BPF_CGROUP_UNIX_BIND:
 	case BPF_CGROUP_INET4_CONNECT:
 	case BPF_CGROUP_INET6_CONNECT:
+	case BPF_CGROUP_UNIX_CONNECT:
 	case BPF_CGROUP_INET4_GETPEERNAME:
 	case BPF_CGROUP_INET6_GETPEERNAME:
+	case BPF_CGROUP_UNIX_GETPEERNAME:
 	case BPF_CGROUP_INET4_GETSOCKNAME:
 	case BPF_CGROUP_INET6_GETSOCKNAME:
+	case BPF_CGROUP_UNIX_GETSOCKNAME:
 	case BPF_CGROUP_UDP4_SENDMSG:
 	case BPF_CGROUP_UDP6_SENDMSG:
+	case BPF_CGROUP_UNIX_SENDMSG:
 	case BPF_CGROUP_UDP4_RECVMSG:
 	case BPF_CGROUP_UDP6_RECVMSG:
+	case BPF_CGROUP_UNIX_RECVMSG:
 		return BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
 	case BPF_CGROUP_SOCK_OPS:
 		return BPF_PROG_TYPE_SOCK_OPS;
@@ -3583,18 +3595,24 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_CGROUP_INET_SOCK_RELEASE:
 	case BPF_CGROUP_INET4_BIND:
 	case BPF_CGROUP_INET6_BIND:
+	case BPF_CGROUP_UNIX_BIND:
 	case BPF_CGROUP_INET4_POST_BIND:
 	case BPF_CGROUP_INET6_POST_BIND:
 	case BPF_CGROUP_INET4_CONNECT:
 	case BPF_CGROUP_INET6_CONNECT:
+	case BPF_CGROUP_UNIX_CONNECT:
 	case BPF_CGROUP_INET4_GETPEERNAME:
 	case BPF_CGROUP_INET6_GETPEERNAME:
+	case BPF_CGROUP_UNIX_GETPEERNAME:
 	case BPF_CGROUP_INET4_GETSOCKNAME:
 	case BPF_CGROUP_INET6_GETSOCKNAME:
+	case BPF_CGROUP_UNIX_GETSOCKNAME:
 	case BPF_CGROUP_UDP4_SENDMSG:
 	case BPF_CGROUP_UDP6_SENDMSG:
+	case BPF_CGROUP_UNIX_SENDMSG:
 	case BPF_CGROUP_UDP4_RECVMSG:
 	case BPF_CGROUP_UDP6_RECVMSG:
+	case BPF_CGROUP_UNIX_RECVMSG:
 	case BPF_CGROUP_SOCK_OPS:
 	case BPF_CGROUP_DEVICE:
 	case BPF_CGROUP_SYSCTL:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1d51bd9596da..c06a6e43676c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11991,14 +11991,19 @@ static int check_return_code(struct bpf_verifier_env *env)
 	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
 		if (env->prog->expected_attach_type == BPF_CGROUP_UDP4_RECVMSG ||
 		    env->prog->expected_attach_type == BPF_CGROUP_UDP6_RECVMSG ||
+		    env->prog->expected_attach_type == BPF_CGROUP_UNIX_RECVMSG ||
 		    env->prog->expected_attach_type == BPF_CGROUP_INET4_GETPEERNAME ||
 		    env->prog->expected_attach_type == BPF_CGROUP_INET6_GETPEERNAME ||
+		    env->prog->expected_attach_type == BPF_CGROUP_UNIX_GETPEERNAME ||
 		    env->prog->expected_attach_type == BPF_CGROUP_INET4_GETSOCKNAME ||
-		    env->prog->expected_attach_type == BPF_CGROUP_INET6_GETSOCKNAME)
+		    env->prog->expected_attach_type == BPF_CGROUP_INET6_GETSOCKNAME ||
+		    env->prog->expected_attach_type == BPF_CGROUP_UNIX_GETSOCKNAME)
 			range = tnum_range(1, 1);
 		if (env->prog->expected_attach_type == BPF_CGROUP_INET4_BIND ||
 		    env->prog->expected_attach_type == BPF_CGROUP_INET6_BIND)
 			range = tnum_range(0, 3);
+		if (env->prog->expected_attach_type == BPF_CGROUP_UNIX_BIND)
+			range = tnum_range(0, 1);
 		break;
 	case BPF_PROG_TYPE_CGROUP_SKB:
 		if (env->prog->expected_attach_type == BPF_CGROUP_INET_EGRESS) {
diff --git a/net/core/filter.c b/net/core/filter.c
index cc86b38fc764..0c8427305009 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7666,6 +7666,7 @@ sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		switch (prog->expected_attach_type) {
 		case BPF_CGROUP_INET4_CONNECT:
 		case BPF_CGROUP_INET6_CONNECT:
+		case BPF_CGROUP_UNIX_CONNECT:
 			return &bpf_bind_proto;
 		default:
 			return NULL;
@@ -7694,16 +7695,22 @@ sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		switch (prog->expected_attach_type) {
 		case BPF_CGROUP_INET4_BIND:
 		case BPF_CGROUP_INET6_BIND:
+		case BPF_CGROUP_UNIX_BIND:
 		case BPF_CGROUP_INET4_CONNECT:
 		case BPF_CGROUP_INET6_CONNECT:
+		case BPF_CGROUP_UNIX_CONNECT:
 		case BPF_CGROUP_UDP4_RECVMSG:
 		case BPF_CGROUP_UDP6_RECVMSG:
+		case BPF_CGROUP_UNIX_RECVMSG:
 		case BPF_CGROUP_UDP4_SENDMSG:
 		case BPF_CGROUP_UDP6_SENDMSG:
+		case BPF_CGROUP_UNIX_SENDMSG:
 		case BPF_CGROUP_INET4_GETPEERNAME:
 		case BPF_CGROUP_INET6_GETPEERNAME:
+		case BPF_CGROUP_UNIX_GETPEERNAME:
 		case BPF_CGROUP_INET4_GETSOCKNAME:
 		case BPF_CGROUP_INET6_GETSOCKNAME:
+		case BPF_CGROUP_UNIX_GETSOCKNAME:
 			return &bpf_sock_addr_setsockopt_proto;
 		default:
 			return NULL;
@@ -7712,16 +7719,22 @@ sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		switch (prog->expected_attach_type) {
 		case BPF_CGROUP_INET4_BIND:
 		case BPF_CGROUP_INET6_BIND:
+		case BPF_CGROUP_UNIX_BIND:
 		case BPF_CGROUP_INET4_CONNECT:
 		case BPF_CGROUP_INET6_CONNECT:
+		case BPF_CGROUP_UNIX_CONNECT:
 		case BPF_CGROUP_UDP4_RECVMSG:
 		case BPF_CGROUP_UDP6_RECVMSG:
+		case BPF_CGROUP_UNIX_RECVMSG:
 		case BPF_CGROUP_UDP4_SENDMSG:
 		case BPF_CGROUP_UDP6_SENDMSG:
+		case BPF_CGROUP_UNIX_SENDMSG:
 		case BPF_CGROUP_INET4_GETPEERNAME:
 		case BPF_CGROUP_INET6_GETPEERNAME:
+		case BPF_CGROUP_UNIX_GETPEERNAME:
 		case BPF_CGROUP_INET4_GETSOCKNAME:
 		case BPF_CGROUP_INET6_GETSOCKNAME:
+		case BPF_CGROUP_UNIX_GETSOCKNAME:
 			return &bpf_sock_addr_getsockopt_proto;
 		default:
 			return NULL;
@@ -8784,8 +8797,8 @@ static bool sock_addr_is_valid_access(int off, int size,
 	if (off % size != 0)
 		return false;
 
-	/* Disallow access to IPv6 fields from IPv4 contex and vise
-	 * versa.
+	/* Disallow access to fields not belonging to the attach type's address
+	 * family.
 	 */
 	switch (off) {
 	case bpf_ctx_range(struct bpf_sock_addr, user_ip4):
@@ -8832,7 +8845,18 @@ static bool sock_addr_is_valid_access(int off, int size,
 		}
 		break;
 	case bpf_ctx_range_till(struct bpf_sock_addr, user_path[0], user_path[107]):
-		return false;
+		switch (prog->expected_attach_type) {
+		case BPF_CGROUP_UNIX_BIND:
+		case BPF_CGROUP_UNIX_CONNECT:
+		case BPF_CGROUP_UNIX_SENDMSG:
+		case BPF_CGROUP_UNIX_RECVMSG:
+		case BPF_CGROUP_UNIX_GETPEERNAME:
+		case BPF_CGROUP_UNIX_GETSOCKNAME:
+			break;
+		default:
+			return false;
+		}
+		break;
 	}
 
 	switch (off) {
@@ -8884,8 +8908,19 @@ static bool sock_addr_is_valid_access(int off, int size,
 			return false;
 		break;
 	case bpf_ctx_range(struct bpf_sock_addr, user_addrlen):
-		if (type != BPF_READ)
-			return false;
+		if (type != BPF_READ) {
+			switch (prog->expected_attach_type) {
+			case BPF_CGROUP_UNIX_BIND:
+			case BPF_CGROUP_UNIX_CONNECT:
+			case BPF_CGROUP_UNIX_SENDMSG:
+			case BPF_CGROUP_UNIX_RECVMSG:
+			case BPF_CGROUP_UNIX_GETPEERNAME:
+			case BPF_CGROUP_UNIX_GETSOCKNAME:
+				break;
+			default:
+				return false;
+			}
+		}
 
 		if (size != sizeof(__u32))
 			return false;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index b3545fc68097..8d250cb75636 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -115,6 +115,7 @@
 #include <linux/freezer.h>
 #include <linux/file.h>
 #include <linux/btf_ids.h>
+#include <linux/bpf-cgroup.h>
 
 #include "scm.h"
 
@@ -1302,6 +1303,12 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	struct sock *sk = sock->sk;
 	int err;
 
+	if (cgroup_bpf_enabled(CGROUP_UNIX_BIND)) {
+		err = BPF_CGROUP_RUN_PROG_UNIX_BIND_LOCK(sk, uaddr, &addr_len);
+		if (err)
+			return err;
+	}
+
 	if (addr_len == offsetof(struct sockaddr_un, sun_path) &&
 	    sunaddr->sun_family == AF_UNIX)
 		return unix_autobind(sk);
@@ -1356,6 +1363,13 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
 		goto out;
 
 	if (addr->sa_family != AF_UNSPEC) {
+		if (cgroup_bpf_enabled(CGROUP_UNIX_CONNECT)) {
+			err = BPF_CGROUP_RUN_PROG_UNIX_CONNECT_LOCK(sk, addr,
+								    &alen);
+			if (err)
+				goto out;
+		}
+
 		err = unix_validate_addr(sunaddr, alen);
 		if (err)
 			goto out;
@@ -1464,6 +1478,13 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	int err;
 	int st;
 
+	if (cgroup_bpf_enabled(CGROUP_UNIX_CONNECT)) {
+		err = BPF_CGROUP_RUN_PROG_UNIX_CONNECT_LOCK(sk, uaddr,
+							    &addr_len);
+		if (err)
+			goto out;
+	}
+
 	err = unix_validate_addr(sunaddr, addr_len);
 	if (err)
 		goto out;
@@ -1724,7 +1745,7 @@ static int unix_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
 	struct sock *sk = sock->sk;
 	struct unix_address *addr;
 	DECLARE_SOCKADDR(struct sockaddr_un *, sunaddr, uaddr);
-	int err = 0;
+	int addr_len = 0, err = 0;
 
 	if (peer) {
 		sk = unix_peer_get(sk);
@@ -1741,14 +1762,35 @@ static int unix_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
 	if (!addr) {
 		sunaddr->sun_family = AF_UNIX;
 		sunaddr->sun_path[0] = 0;
-		err = offsetof(struct sockaddr_un, sun_path);
+		addr_len = offsetof(struct sockaddr_un, sun_path);
 	} else {
-		err = addr->len;
+		addr_len = addr->len;
 		memcpy(sunaddr, addr->name, addr->len);
 	}
+
+	if (peer && cgroup_bpf_enabled(CGROUP_UNIX_GETPEERNAME)) {
+		err = BPF_CGROUP_RUN_SA_PROG(sk, uaddr, &addr_len,
+					     CGROUP_UNIX_GETPEERNAME);
+		if (err)
+			goto out;
+
+		err = unix_validate_addr(sunaddr, addr_len);
+		if (err)
+			goto out;
+	} else if (cgroup_bpf_enabled(CGROUP_UNIX_GETSOCKNAME)) {
+		err = BPF_CGROUP_RUN_SA_PROG(sk, uaddr, &addr_len,
+					     CGROUP_UNIX_GETSOCKNAME);
+		if (err)
+			goto out;
+
+		err = unix_validate_addr(sunaddr, addr_len);
+		if (err)
+			goto out;
+	}
+
 	sock_put(sk);
 out:
-	return err;
+	return err ?: addr_len;
 }
 
 static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
@@ -1910,6 +1952,13 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 		goto out;
 
 	if (msg->msg_namelen) {
+		if (cgroup_bpf_enabled(CGROUP_UNIX_SENDMSG)) {
+			err = BPF_CGROUP_RUN_PROG_UNIX_SENDMSG_LOCK(
+				sk, msg->msg_name, &msg->msg_namelen, NULL);
+			if (err)
+				goto out;
+		}
+
 		err = unix_validate_addr(sunaddr, msg->msg_namelen);
 		if (err)
 			goto out;
@@ -2404,14 +2453,29 @@ static int unix_seqpacket_recvmsg(struct socket *sock, struct msghdr *msg,
 	return unix_dgram_recvmsg(sock, msg, size, flags);
 }
 
-static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
+static int unix_recvmsg_copy_addr(struct msghdr *msg, struct sock *sk)
 {
 	struct unix_address *addr = smp_load_acquire(&unix_sk(sk)->addr);
+	int err;
 
 	if (addr) {
 		msg->msg_namelen = addr->len;
 		memcpy(msg->msg_name, addr->name, addr->len);
+
+		if (cgroup_bpf_enabled(CGROUP_UNIX_RECVMSG)) {
+			err = BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(
+				sk, msg->msg_name, &msg->msg_namelen);
+			if (err)
+				return err;
+
+			err = unix_validate_addr(msg->msg_name,
+						 msg->msg_namelen);
+			if (err)
+				return err;
+		}
 	}
+
+	return 0;
 }
 
 int __unix_dgram_recvmsg(struct sock *sk, struct msghdr *msg, size_t size,
@@ -2466,8 +2530,11 @@ int __unix_dgram_recvmsg(struct sock *sk, struct msghdr *msg, size_t size,
 						EPOLLOUT | EPOLLWRNORM |
 						EPOLLWRBAND);
 
-	if (msg->msg_name)
-		unix_copy_addr(msg, skb->sk);
+	if (msg->msg_name) {
+		err = unix_recvmsg_copy_addr(msg, skb->sk);
+		if (err)
+			goto out_free;
+	}
 
 	if (size > skb->len - skip)
 		size = skb->len - skip;
@@ -2821,7 +2888,9 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
 		if (state->msg && state->msg->msg_name) {
 			DECLARE_SOCKADDR(struct sockaddr_un *, sunaddr,
 					 state->msg->msg_name);
-			unix_copy_addr(state->msg, skb->sk);
+			err = unix_recvmsg_copy_addr(state->msg, skb->sk);
+			if (err)
+				break;
 			sunaddr = NULL;
 		}
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9e3c33f83bba..b73e4da458fd 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -999,17 +999,21 @@ enum bpf_attach_type {
 	BPF_SK_MSG_VERDICT,
 	BPF_CGROUP_INET4_BIND,
 	BPF_CGROUP_INET6_BIND,
+	BPF_CGROUP_UNIX_BIND,
 	BPF_CGROUP_INET4_CONNECT,
 	BPF_CGROUP_INET6_CONNECT,
+	BPF_CGROUP_UNIX_CONNECT,
 	BPF_CGROUP_INET4_POST_BIND,
 	BPF_CGROUP_INET6_POST_BIND,
 	BPF_CGROUP_UDP4_SENDMSG,
 	BPF_CGROUP_UDP6_SENDMSG,
+	BPF_CGROUP_UNIX_SENDMSG,
 	BPF_LIRC_MODE2,
 	BPF_FLOW_DISSECTOR,
 	BPF_CGROUP_SYSCTL,
 	BPF_CGROUP_UDP4_RECVMSG,
 	BPF_CGROUP_UDP6_RECVMSG,
+	BPF_CGROUP_UNIX_RECVMSG,
 	BPF_CGROUP_GETSOCKOPT,
 	BPF_CGROUP_SETSOCKOPT,
 	BPF_TRACE_RAW_TP,
@@ -1020,8 +1024,10 @@ enum bpf_attach_type {
 	BPF_TRACE_ITER,
 	BPF_CGROUP_INET4_GETPEERNAME,
 	BPF_CGROUP_INET6_GETPEERNAME,
+	BPF_CGROUP_UNIX_GETPEERNAME,
 	BPF_CGROUP_INET4_GETSOCKNAME,
 	BPF_CGROUP_INET6_GETSOCKNAME,
+	BPF_CGROUP_UNIX_GETSOCKNAME,
 	BPF_XDP_DEVMAP,
 	BPF_CGROUP_INET_SOCK_RELEASE,
 	BPF_XDP_CPUMAP,
@@ -2575,8 +2581,8 @@ union bpf_attr {
  * 		*bpf_socket* should be one of the following:
  *
  * 		* **struct bpf_sock_ops** for **BPF_PROG_TYPE_SOCK_OPS**.
- * 		* **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**
- * 		  and **BPF_CGROUP_INET6_CONNECT**.
+ * 		* **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**,
+ * 		  **BPF_CGROUP_INET6_CONNECT** and **BPF_CGROUP_UNIX_CONNECT**.
  *
  * 		This helper actually implements a subset of **setsockopt()**.
  * 		It supports the following *level*\ s:
@@ -2809,8 +2815,8 @@ union bpf_attr {
  * 		*bpf_socket* should be one of the following:
  *
  * 		* **struct bpf_sock_ops** for **BPF_PROG_TYPE_SOCK_OPS**.
- * 		* **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**
- * 		  and **BPF_CGROUP_INET6_CONNECT**.
+ * 		* **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**,
+ * 		  **BPF_CGROUP_INET6_CONNECT** and **BPF_CGROUP_UNIX_CONNECT**.
  *
  * 		This helper actually implements a subset of **getsockopt()**.
  * 		It supports the same set of *optname*\ s that is supported by
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 6/9] libbpf: Add support for cgroup unix socket address hooks
  2022-12-10 19:35 [PATCH bpf-next v2 0/9] Add cgroup sockaddr hooks for unix sockets Daan De Meyer
                   ` (4 preceding siblings ...)
  2022-12-10 19:35 ` [PATCH bpf-next v2 5/9] bpf: Implement cgroup sockaddr hooks for unix sockets Daan De Meyer
@ 2022-12-10 19:35 ` Daan De Meyer
  2022-12-10 19:35 ` [PATCH bpf-next v2 7/9] bpftool: " Daan De Meyer
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Daan De Meyer @ 2022-12-10 19:35 UTC (permalink / raw)
  To: bpf; +Cc: Daan De Meyer, martin.lau, kernel-team

---
 tools/lib/bpf/libbpf.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2a82f49ce16f..8924404b9185 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -80,19 +80,25 @@ static const char * const attach_type_name[] = {
 	[BPF_CGROUP_DEVICE]		= "cgroup_device",
 	[BPF_CGROUP_INET4_BIND]		= "cgroup_inet4_bind",
 	[BPF_CGROUP_INET6_BIND]		= "cgroup_inet6_bind",
+	[BPF_CGROUP_UNIX_BIND]		= "cgroup_unix_bind",
 	[BPF_CGROUP_INET4_CONNECT]	= "cgroup_inet4_connect",
 	[BPF_CGROUP_INET6_CONNECT]	= "cgroup_inet6_connect",
+	[BPF_CGROUP_UNIX_CONNECT]       = "cgroup_unix_connect",
 	[BPF_CGROUP_INET4_POST_BIND]	= "cgroup_inet4_post_bind",
 	[BPF_CGROUP_INET6_POST_BIND]	= "cgroup_inet6_post_bind",
 	[BPF_CGROUP_INET4_GETPEERNAME]	= "cgroup_inet4_getpeername",
 	[BPF_CGROUP_INET6_GETPEERNAME]	= "cgroup_inet6_getpeername",
+	[BPF_CGROUP_UNIX_GETPEERNAME]	= "cgroup_unix_getpeername",
 	[BPF_CGROUP_INET4_GETSOCKNAME]	= "cgroup_inet4_getsockname",
 	[BPF_CGROUP_INET6_GETSOCKNAME]	= "cgroup_inet6_getsockname",
+	[BPF_CGROUP_UNIX_GETSOCKNAME]	= "cgroup_unix_getsockname",
 	[BPF_CGROUP_UDP4_SENDMSG]	= "cgroup_udp4_sendmsg",
 	[BPF_CGROUP_UDP6_SENDMSG]	= "cgroup_udp6_sendmsg",
+	[BPF_CGROUP_UNIX_SENDMSG]	= "cgroup_unix_sendmsg",
 	[BPF_CGROUP_SYSCTL]		= "cgroup_sysctl",
 	[BPF_CGROUP_UDP4_RECVMSG]	= "cgroup_udp4_recvmsg",
 	[BPF_CGROUP_UDP6_RECVMSG]	= "cgroup_udp6_recvmsg",
+	[BPF_CGROUP_UNIX_RECVMSG]	= "cgroup_unix_recvmsg",
 	[BPF_CGROUP_GETSOCKOPT]		= "cgroup_getsockopt",
 	[BPF_CGROUP_SETSOCKOPT]		= "cgroup_setsockopt",
 	[BPF_SK_SKB_STREAM_PARSER]	= "sk_skb_stream_parser",
@@ -8590,16 +8596,22 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("cgroup/post_bind6",	CGROUP_SOCK, BPF_CGROUP_INET6_POST_BIND, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/bind4",		CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_BIND, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/bind6",		CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_BIND, SEC_ATTACHABLE),
+	SEC_DEF("cgroup/bindun",	CGROUP_SOCK_ADDR, BPF_CGROUP_UNIX_BIND, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/connect4",	CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_CONNECT, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/connect6",	CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_CONNECT, SEC_ATTACHABLE),
+	SEC_DEF("cgroup/connectun",	CGROUP_SOCK_ADDR, BPF_CGROUP_UNIX_CONNECT, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/sendmsg4",	CGROUP_SOCK_ADDR, BPF_CGROUP_UDP4_SENDMSG, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/sendmsg6",	CGROUP_SOCK_ADDR, BPF_CGROUP_UDP6_SENDMSG, SEC_ATTACHABLE),
+	SEC_DEF("cgroup/sendmsgun",	CGROUP_SOCK_ADDR, BPF_CGROUP_UNIX_SENDMSG, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/recvmsg4",	CGROUP_SOCK_ADDR, BPF_CGROUP_UDP4_RECVMSG, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/recvmsg6",	CGROUP_SOCK_ADDR, BPF_CGROUP_UDP6_RECVMSG, SEC_ATTACHABLE),
+	SEC_DEF("cgroup/recvmsgun",	CGROUP_SOCK_ADDR, BPF_CGROUP_UNIX_RECVMSG, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/getpeername4",	CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_GETPEERNAME, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/getpeername6",	CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_GETPEERNAME, SEC_ATTACHABLE),
+	SEC_DEF("cgroup/getpeernameun", CGROUP_SOCK_ADDR, BPF_CGROUP_UNIX_GETPEERNAME, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/getsockname4",	CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_GETSOCKNAME, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/getsockname6",	CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_GETSOCKNAME, SEC_ATTACHABLE),
+	SEC_DEF("cgroup/getsocknameun", CGROUP_SOCK_ADDR, BPF_CGROUP_UNIX_GETSOCKNAME, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/sysctl",	CGROUP_SYSCTL, BPF_CGROUP_SYSCTL, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/getsockopt",	CGROUP_SOCKOPT, BPF_CGROUP_GETSOCKOPT, SEC_ATTACHABLE),
 	SEC_DEF("cgroup/setsockopt",	CGROUP_SOCKOPT, BPF_CGROUP_SETSOCKOPT, SEC_ATTACHABLE),
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 7/9] bpftool: Add support for cgroup unix socket address hooks
  2022-12-10 19:35 [PATCH bpf-next v2 0/9] Add cgroup sockaddr hooks for unix sockets Daan De Meyer
                   ` (5 preceding siblings ...)
  2022-12-10 19:35 ` [PATCH bpf-next v2 6/9] libbpf: Add support for cgroup unix socket address hooks Daan De Meyer
@ 2022-12-10 19:35 ` Daan De Meyer
  2022-12-10 19:35 ` [PATCH bpf-next v2 8/9] selftests/bpf: Add tests " Daan De Meyer
  2022-12-10 19:35 ` [PATCH bpf-next v2 9/9] documentation/bpf: Document " Daan De Meyer
  8 siblings, 0 replies; 20+ messages in thread
From: Daan De Meyer @ 2022-12-10 19:35 UTC (permalink / raw)
  To: bpf; +Cc: Daan De Meyer, martin.lau, kernel-team

---
 .../bpftool/Documentation/bpftool-cgroup.rst  | 21 ++++++++++++++-----
 tools/bpf/bpftool/cgroup.c                    | 17 ++++++++-------
 tools/bpf/bpftool/common.c                    |  6 ++++++
 3 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
index bd015ec9847b..a2d990fa623b 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
@@ -34,13 +34,16 @@ CGROUP COMMANDS
 |	*ATTACH_TYPE* := { **cgroup_inet_ingress** | **cgroup_inet_egress** |
 |		**cgroup_inet_sock_create** | **cgroup_sock_ops** |
 |		**cgroup_device** | **cgroup_inet4_bind** | **cgroup_inet6_bind** |
-|		**cgroup_inet4_post_bind** | **cgroup_inet6_post_bind** |
-|		**cgroup_inet4_connect** | **cgroup_inet6_connect** |
+|		**cgroup_unix_bind** | **cgroup_inet4_post_bind** |
+|		**cgroup_inet6_post_bind** | **cgroup_inet4_connect** |
+|		**cgroup_inet6_connect** | **cgroup_unix_connect** |
 |		**cgroup_inet4_getpeername** | **cgroup_inet6_getpeername** |
-|		**cgroup_inet4_getsockname** | **cgroup_inet6_getsockname** |
-|		**cgroup_udp4_sendmsg** | **cgroup_udp6_sendmsg** |
+|		**cgroup_unix_getpeername** | **cgroup_inet4_getsockname** |
+|		**cgroup_inet6_getsockname** | **cgroup_udp4_sendmsg** |
+|		**cgroup_udp6_sendmsg** | **cgroup_unix_sendmsg** |
 |		**cgroup_udp4_recvmsg** | **cgroup_udp6_recvmsg** |
-|		**cgroup_sysctl** | **cgroup_getsockopt** | **cgroup_setsockopt** |
+|		**cgroup_unix_recvmsg** | **cgroup_sysctl** |
+|		**cgroup_getsockopt** | **cgroup_setsockopt** |
 |		**cgroup_inet_sock_release** }
 |	*ATTACH_FLAGS* := { **multi** | **override** }
 
@@ -98,25 +101,33 @@ DESCRIPTION
 		  **device** device access (since 4.15);
 		  **bind4** call to bind(2) for an inet4 socket (since 4.17);
 		  **bind6** call to bind(2) for an inet6 socket (since 4.17);
+		  **bindun** call to bind(2) for a unix socket (since 6.3);
 		  **post_bind4** return from bind(2) for an inet4 socket (since 4.17);
 		  **post_bind6** return from bind(2) for an inet6 socket (since 4.17);
 		  **connect4** call to connect(2) for an inet4 socket (since 4.17);
 		  **connect6** call to connect(2) for an inet6 socket (since 4.17);
+		  **connectun** call to connect(2) for a unix socket (since 6.3);
 		  **sendmsg4** call to sendto(2), sendmsg(2), sendmmsg(2) for an
 		  unconnected udp4 socket (since 4.18);
 		  **sendmsg6** call to sendto(2), sendmsg(2), sendmmsg(2) for an
 		  unconnected udp6 socket (since 4.18);
+		  **sendmsgun** call to sendto(2), sendmsg(2), sendmmsg(2) for
+		  an unconnected unix socket (since 6.3);
 		  **recvmsg4** call to recvfrom(2), recvmsg(2), recvmmsg(2) for
 		  an unconnected udp4 socket (since 5.2);
 		  **recvmsg6** call to recvfrom(2), recvmsg(2), recvmmsg(2) for
 		  an unconnected udp6 socket (since 5.2);
+		  **recvmsgun** call to recvfrom(2), recvmsg(2), recvmmsg(2) for
+		  an unconnected unix socket (since 6.3);
 		  **sysctl** sysctl access (since 5.2);
 		  **getsockopt** call to getsockopt (since 5.3);
 		  **setsockopt** call to setsockopt (since 5.3);
 		  **getpeername4** call to getpeername(2) for an inet4 socket (since 5.8);
 		  **getpeername6** call to getpeername(2) for an inet6 socket (since 5.8);
+		  **getpeernameun** call to getpeername(2) for a unix socket (since 6.3);
 		  **getsockname4** call to getsockname(2) for an inet4 socket (since 5.8);
 		  **getsockname6** call to getsockname(2) for an inet6 socket (since 5.8).
+		  **getsocknameun** call to getsockname(2) for a unix socket (since 6.3);
 		  **sock_release** closing an userspace inet socket (since 5.9).
 
 	**bpftool cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
diff --git a/tools/bpf/bpftool/cgroup.c b/tools/bpf/bpftool/cgroup.c
index b46a998d8f8d..3a57ca208a1c 100644
--- a/tools/bpf/bpftool/cgroup.c
+++ b/tools/bpf/bpftool/cgroup.c
@@ -26,13 +26,16 @@
 	"       ATTACH_TYPE := { cgroup_inet_ingress | cgroup_inet_egress |\n" \
 	"                        cgroup_inet_sock_create | cgroup_sock_ops |\n" \
 	"                        cgroup_device | cgroup_inet4_bind |\n" \
-	"                        cgroup_inet6_bind | cgroup_inet4_post_bind |\n" \
-	"                        cgroup_inet6_post_bind | cgroup_inet4_connect |\n" \
-	"                        cgroup_inet6_connect | cgroup_inet4_getpeername |\n" \
-	"                        cgroup_inet6_getpeername | cgroup_inet4_getsockname |\n" \
-	"                        cgroup_inet6_getsockname | cgroup_udp4_sendmsg |\n" \
-	"                        cgroup_udp6_sendmsg | cgroup_udp4_recvmsg |\n" \
-	"                        cgroup_udp6_recvmsg | cgroup_sysctl |\n" \
+	"                        cgroup_inet6_bind | cgroup_unix_bind |\n" \
+	"                        cgroup_inet4_post_bind | cgroup_inet6_post_bind |\n" \
+	"                        cgroup_inet4_connect | cgroup_inet6_connect |\n" \
+	"                        cgroup_unix_connect | cgroup_inet4_getpeername |\n" \
+	"                        cgroup_inet6_getpeername | cgroup_unix_getpeername |\n" \
+	"                        cgroup_inet4_getsockname | cgroup_inet6_getsockname |\n" \
+	"                        cgroup_unix_getsockname | cgroup_udp4_sendmsg |\n" \
+	"                        cgroup_udp6_sendmsg | cgroup_unix_sendmsg |\n" \
+	"                        cgroup_udp4_recvmsg | cgroup_udp6_recvmsg |\n" \
+	"                        cgroup_unix_recvmsg | cgroup_sysctl |\n" \
 	"                        cgroup_getsockopt | cgroup_setsockopt |\n" \
 	"                        cgroup_inet_sock_release }"
 
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index c90b756945e3..94f48740fd2a 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -1065,19 +1065,25 @@ const char *bpf_attach_type_input_str(enum bpf_attach_type t)
 	case BPF_CGROUP_DEVICE:			return "device";
 	case BPF_CGROUP_INET4_BIND:		return "bind4";
 	case BPF_CGROUP_INET6_BIND:		return "bind6";
+	case BPF_CGROUP_UNIX_BIND:		return "bindun";
 	case BPF_CGROUP_INET4_CONNECT:		return "connect4";
 	case BPF_CGROUP_INET6_CONNECT:		return "connect6";
+	case BPF_CGROUP_UNIX_CONNECT:		return "connectun";
 	case BPF_CGROUP_INET4_POST_BIND:	return "post_bind4";
 	case BPF_CGROUP_INET6_POST_BIND:	return "post_bind6";
 	case BPF_CGROUP_INET4_GETPEERNAME:	return "getpeername4";
 	case BPF_CGROUP_INET6_GETPEERNAME:	return "getpeername6";
+	case BPF_CGROUP_UNIX_GETPEERNAME:	return "getpeernameun";
 	case BPF_CGROUP_INET4_GETSOCKNAME:	return "getsockname4";
 	case BPF_CGROUP_INET6_GETSOCKNAME:	return "getsockname6";
+	case BPF_CGROUP_UNIX_GETSOCKNAME:	return "getsocknameun";
 	case BPF_CGROUP_UDP4_SENDMSG:		return "sendmsg4";
 	case BPF_CGROUP_UDP6_SENDMSG:		return "sendmsg6";
+	case BPF_CGROUP_UNIX_SENDMSG:		return "sendmsgun";
 	case BPF_CGROUP_SYSCTL:			return "sysctl";
 	case BPF_CGROUP_UDP4_RECVMSG:		return "recvmsg4";
 	case BPF_CGROUP_UDP6_RECVMSG:		return "recvmsg6";
+	case BPF_CGROUP_UNIX_RECVMSG:		return "recvmsgun";
 	case BPF_CGROUP_GETSOCKOPT:		return "getsockopt";
 	case BPF_CGROUP_SETSOCKOPT:		return "setsockopt";
 	case BPF_TRACE_RAW_TP:			return "raw_tp";
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 8/9] selftests/bpf: Add tests for cgroup unix socket address hooks
  2022-12-10 19:35 [PATCH bpf-next v2 0/9] Add cgroup sockaddr hooks for unix sockets Daan De Meyer
                   ` (6 preceding siblings ...)
  2022-12-10 19:35 ` [PATCH bpf-next v2 7/9] bpftool: " Daan De Meyer
@ 2022-12-10 19:35 ` Daan De Meyer
  2022-12-10 19:35 ` [PATCH bpf-next v2 9/9] documentation/bpf: Document " Daan De Meyer
  8 siblings, 0 replies; 20+ messages in thread
From: Daan De Meyer @ 2022-12-10 19:35 UTC (permalink / raw)
  To: bpf; +Cc: Daan De Meyer, martin.lau, kernel-team

The unix socket address hooks do not support modifying the source
address so we skip source address checks when we're running a unix
socket address hook test.
---
 .../selftests/bpf/prog_tests/section_names.c  |  30 ++++
 .../testing/selftests/bpf/progs/bindun_prog.c |  36 +++++
 .../selftests/bpf/progs/connectun_prog.c      |  28 ++++
 .../selftests/bpf/progs/recvmsgun_prog.c      |  36 +++++
 .../selftests/bpf/progs/sendmsgun_prog.c      |  28 ++++
 tools/testing/selftests/bpf/test_sock_addr.c  | 137 +++++++++++++++++-
 6 files changed, 288 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/bindun_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/connectun_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/recvmsgun_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/sendmsgun_prog.c

diff --git a/tools/testing/selftests/bpf/prog_tests/section_names.c b/tools/testing/selftests/bpf/prog_tests/section_names.c
index fc5248e94a01..51ebc8e6065d 100644
--- a/tools/testing/selftests/bpf/prog_tests/section_names.c
+++ b/tools/testing/selftests/bpf/prog_tests/section_names.c
@@ -113,6 +113,11 @@ static struct sec_name_test tests[] = {
 		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_BIND},
 		{0, BPF_CGROUP_INET6_BIND},
 	},
+	{
+		"cgroup/bindun",
+		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UNIX_BIND},
+		{0, BPF_CGROUP_UNIX_BIND},
+	},
 	{
 		"cgroup/connect4",
 		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_CONNECT},
@@ -123,6 +128,11 @@ static struct sec_name_test tests[] = {
 		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_CONNECT},
 		{0, BPF_CGROUP_INET6_CONNECT},
 	},
+	{
+		"cgroup/connectun",
+		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UNIX_CONNECT},
+		{0, BPF_CGROUP_UNIX_CONNECT},
+	},
 	{
 		"cgroup/sendmsg4",
 		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UDP4_SENDMSG},
@@ -133,6 +143,11 @@ static struct sec_name_test tests[] = {
 		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UDP6_SENDMSG},
 		{0, BPF_CGROUP_UDP6_SENDMSG},
 	},
+	{
+		"cgroup/sendmsgun",
+		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UNIX_SENDMSG},
+		{0, BPF_CGROUP_UNIX_SENDMSG},
+	},
 	{
 		"cgroup/recvmsg4",
 		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UDP4_RECVMSG},
@@ -143,6 +158,11 @@ static struct sec_name_test tests[] = {
 		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UDP6_RECVMSG},
 		{0, BPF_CGROUP_UDP6_RECVMSG},
 	},
+	{
+		"cgroup/recvmsgun",
+		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UNIX_RECVMSG},
+		{0, BPF_CGROUP_UNIX_RECVMSG},
+	},
 	{
 		"cgroup/sysctl",
 		{0, BPF_PROG_TYPE_CGROUP_SYSCTL, BPF_CGROUP_SYSCTL},
@@ -168,6 +188,11 @@ static struct sec_name_test tests[] = {
 		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_GETPEERNAME},
 		{0, BPF_CGROUP_INET6_GETPEERNAME},
 	},
+	{
+		"cgroup/getpeernameun",
+		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UNIX_GETPEERNAME},
+		{0, BPF_CGROUP_UNIX_GETPEERNAME},
+	},
 	{
 		"cgroup/getsockname4",
 		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_GETSOCKNAME},
@@ -178,6 +203,11 @@ static struct sec_name_test tests[] = {
 		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_GETSOCKNAME},
 		{0, BPF_CGROUP_INET6_GETSOCKNAME},
 	},
+	{
+		"cgroup/getsocknameun",
+		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_UNIX_GETSOCKNAME},
+		{0, BPF_CGROUP_UNIX_GETSOCKNAME},
+	},
 };
 
 static void test_prog_type_by_name(const struct sec_name_test *test)
diff --git a/tools/testing/selftests/bpf/progs/bindun_prog.c b/tools/testing/selftests/bpf/progs/bindun_prog.c
new file mode 100644
index 000000000000..1183063ef745
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bindun_prog.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <string.h>
+
+#include <linux/bpf.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+
+#include <bpf/bpf_helpers.h>
+
+#define DST_REWRITE_PATH	"\0bpf_cgroup_unix_test_rewrite"
+
+SEC("cgroup/bindun")
+int bind_un_prog(struct bpf_sock_addr *ctx)
+{
+	struct bpf_sock *sk;
+
+	sk = ctx->sk;
+	if (!sk)
+		return 0;
+
+	if (sk->family != AF_UNIX)
+		return 0;
+
+	if (ctx->type != SOCK_STREAM && ctx->type != SOCK_DGRAM)
+		return 0;
+
+	memcpy(ctx->user_path, DST_REWRITE_PATH, sizeof(DST_REWRITE_PATH));
+	ctx->user_addrlen = offsetof(struct sockaddr_un, sun_path) +
+			    sizeof(DST_REWRITE_PATH) - 1;
+
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/connectun_prog.c b/tools/testing/selftests/bpf/progs/connectun_prog.c
new file mode 100644
index 000000000000..f443fdb862fe
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/connectun_prog.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <string.h>
+
+#include <linux/bpf.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+
+#include <bpf/bpf_helpers.h>
+
+#define DST_REWRITE_PATH	"\0bpf_cgroup_unix_test_rewrite"
+
+SEC("cgroup/connectun")
+int connect_un_prog(struct bpf_sock_addr *ctx)
+{
+	if (ctx->type != SOCK_STREAM && ctx->type != SOCK_DGRAM)
+		return 0;
+
+	/* Rewrite destination. */
+	memcpy(ctx->user_path, DST_REWRITE_PATH, sizeof(DST_REWRITE_PATH));
+	ctx->user_addrlen = offsetof(struct sockaddr_un, sun_path) +
+			    sizeof(DST_REWRITE_PATH) - 1;
+
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/recvmsgun_prog.c b/tools/testing/selftests/bpf/progs/recvmsgun_prog.c
new file mode 100644
index 000000000000..0334b9420adc
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/recvmsgun_prog.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <string.h>
+
+#include <linux/bpf.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+
+#include <bpf/bpf_helpers.h>
+
+#define SERVUN_PATH		"\0bpf_cgroup_unix_test"
+
+SEC("cgroup/recvmsgun")
+int recvmsgun_prog(struct bpf_sock_addr *ctx)
+{
+	struct bpf_sock *sk;
+
+	sk = ctx->sk;
+	if (!sk)
+		return 1;
+
+	if (sk->family != AF_UNIX)
+		return 1;
+
+	if (ctx->type != SOCK_STREAM && ctx->type != SOCK_DGRAM)
+		return 1;
+
+	memcpy(ctx->user_path, SERVUN_PATH, sizeof(SERVUN_PATH));
+	ctx->user_addrlen = offsetof(struct sockaddr_un, sun_path) +
+			    sizeof(SERVUN_PATH) - 1;
+
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/sendmsgun_prog.c b/tools/testing/selftests/bpf/progs/sendmsgun_prog.c
new file mode 100644
index 000000000000..0f6020e5d463
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/sendmsgun_prog.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <string.h>
+
+#include <linux/bpf.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+
+#include <bpf/bpf_helpers.h>
+
+#define DST_REWRITE_PATH	"\0bpf_cgroup_unix_test_rewrite"
+
+SEC("cgroup/sendmsgun")
+int sendmsg_un_prog(struct bpf_sock_addr *ctx)
+{
+	if (ctx->type != SOCK_DGRAM)
+		return 0;
+
+	/* Rewrite destination. */
+	memcpy(ctx->user_path, DST_REWRITE_PATH, sizeof(DST_REWRITE_PATH));
+	ctx->user_addrlen = offsetof(struct sockaddr_un, sun_path) +
+			    sizeof(DST_REWRITE_PATH) - 1;
+
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_sock_addr.c b/tools/testing/selftests/bpf/test_sock_addr.c
index 6a618c8f477c..c96322bcc6c8 100644
--- a/tools/testing/selftests/bpf/test_sock_addr.c
+++ b/tools/testing/selftests/bpf/test_sock_addr.c
@@ -12,6 +12,7 @@
 #include <sys/types.h>
 #include <sys/select.h>
 #include <sys/socket.h>
+#include <sys/un.h>
 
 #include <linux/filter.h>
 
@@ -28,12 +29,16 @@
 #define CG_PATH	"/foo"
 #define CONNECT4_PROG_PATH	"./connect4_prog.bpf.o"
 #define CONNECT6_PROG_PATH	"./connect6_prog.bpf.o"
+#define CONNECTUN_PROG_PATH	"./connectun_prog.bpf.o"
 #define SENDMSG4_PROG_PATH	"./sendmsg4_prog.bpf.o"
 #define SENDMSG6_PROG_PATH	"./sendmsg6_prog.bpf.o"
+#define SENDMSGUN_PROG_PATH	"./sendmsgun_prog.bpf.o"
 #define RECVMSG4_PROG_PATH	"./recvmsg4_prog.bpf.o"
 #define RECVMSG6_PROG_PATH	"./recvmsg6_prog.bpf.o"
+#define RECVMSGUN_PROG_PATH	"./recvmsgun_prog.bpf.o"
 #define BIND4_PROG_PATH		"./bind4_prog.bpf.o"
 #define BIND6_PROG_PATH		"./bind6_prog.bpf.o"
+#define BINDUN_PROG_PATH	"./bindun_prog.bpf.o"
 
 #define SERV4_IP		"192.168.1.254"
 #define SERV4_REWRITE_IP	"127.0.0.1"
@@ -51,6 +56,9 @@
 #define SERV6_PORT		6060
 #define SERV6_REWRITE_PORT	6666
 
+#define SERVUN_ADDRESS		"bpf_cgroup_unix_test"
+#define SERVUN_REWRITE_ADDRESS	"bpf_cgroup_unix_test_rewrite"
+
 #define INET_NTOP_BUF	40
 
 struct sock_addr_test;
@@ -88,8 +96,10 @@ struct sock_addr_test {
 
 static int bind4_prog_load(const struct sock_addr_test *test);
 static int bind6_prog_load(const struct sock_addr_test *test);
+static int bindun_prog_load(const struct sock_addr_test *test);
 static int connect4_prog_load(const struct sock_addr_test *test);
 static int connect6_prog_load(const struct sock_addr_test *test);
+static int connectun_prog_load(const struct sock_addr_test *test);
 static int sendmsg_allow_prog_load(const struct sock_addr_test *test);
 static int sendmsg_deny_prog_load(const struct sock_addr_test *test);
 static int recvmsg_allow_prog_load(const struct sock_addr_test *test);
@@ -102,6 +112,8 @@ static int recvmsg6_rw_c_prog_load(const struct sock_addr_test *test);
 static int sendmsg6_rw_c_prog_load(const struct sock_addr_test *test);
 static int sendmsg6_rw_v4mapped_prog_load(const struct sock_addr_test *test);
 static int sendmsg6_rw_wildcard_prog_load(const struct sock_addr_test *test);
+static int sendmsgun_prog_load(const struct sock_addr_test *test);
+static int recvmsgun_prog_load(const struct sock_addr_test *test);
 
 static struct sock_addr_test tests[] = {
 	/* bind */
@@ -217,6 +229,20 @@ static struct sock_addr_test tests[] = {
 		NULL,
 		SUCCESS,
 	},
+	{
+		"bindun: rewrite path",
+		bindun_prog_load,
+		BPF_CGROUP_UNIX_BIND,
+		BPF_CGROUP_UNIX_BIND,
+		AF_UNIX,
+		SOCK_STREAM,
+		SERVUN_ADDRESS,
+		0,
+		SERVUN_REWRITE_ADDRESS,
+		0,
+		NULL,
+		SUCCESS,
+	},
 
 	/* connect */
 	{
@@ -331,6 +357,34 @@ static struct sock_addr_test tests[] = {
 		SRC6_REWRITE_IP,
 		SUCCESS,
 	},
+	{
+		"connectun: rewrite SOCK_STREAM path",
+		connectun_prog_load,
+		BPF_CGROUP_UNIX_CONNECT,
+		BPF_CGROUP_UNIX_CONNECT,
+		AF_UNIX,
+		SOCK_STREAM,
+		SERVUN_ADDRESS,
+		0,
+		SERVUN_REWRITE_ADDRESS,
+		0,
+		NULL,
+		SUCCESS,
+	},
+	{
+		"connectun: rewrite SOCK_DGRAM path",
+		connectun_prog_load,
+		BPF_CGROUP_UNIX_CONNECT,
+		BPF_CGROUP_UNIX_CONNECT,
+		AF_UNIX,
+		SOCK_DGRAM,
+		SERVUN_ADDRESS,
+		0,
+		SERVUN_REWRITE_ADDRESS,
+		0,
+		NULL,
+		SUCCESS,
+	},
 
 	/* sendmsg */
 	{
@@ -515,6 +569,20 @@ static struct sock_addr_test tests[] = {
 		SRC6_REWRITE_IP,
 		SYSCALL_EPERM,
 	},
+	{
+		"sendmsgun: rewrite SOCK_DGRAM path",
+		sendmsgun_prog_load,
+		BPF_CGROUP_UNIX_SENDMSG,
+		BPF_CGROUP_UNIX_SENDMSG,
+		AF_UNIX,
+		SOCK_DGRAM,
+		SERVUN_ADDRESS,
+		0,
+		SERVUN_REWRITE_ADDRESS,
+		0,
+		NULL,
+		SUCCESS,
+	},
 
 	/* recvmsg */
 	{
@@ -601,6 +669,20 @@ static struct sock_addr_test tests[] = {
 		SERV6_IP,
 		SUCCESS,
 	},
+	{
+		"recvmsgun: rewrite SOCK_DGRAM path",
+		recvmsgun_prog_load,
+		BPF_CGROUP_UNIX_RECVMSG,
+		BPF_CGROUP_UNIX_RECVMSG,
+		AF_UNIX,
+		SOCK_DGRAM,
+		SERVUN_REWRITE_ADDRESS,
+		0,
+		SERVUN_REWRITE_ADDRESS,
+		0,
+		NULL,
+		SUCCESS,
+	},
 };
 
 static int mk_sockaddr(int domain, const char *ip, unsigned short port,
@@ -608,8 +690,9 @@ static int mk_sockaddr(int domain, const char *ip, unsigned short port,
 {
 	struct sockaddr_in6 *addr6;
 	struct sockaddr_in *addr4;
+	struct sockaddr_un *addrun;
 
-	if (domain != AF_INET && domain != AF_INET6) {
+	if (domain != AF_INET && domain != AF_INET6 && domain != AF_UNIX) {
 		log_err("Unsupported address family");
 		return -1;
 	}
@@ -638,6 +721,15 @@ static int mk_sockaddr(int domain, const char *ip, unsigned short port,
 			return -1;
 		}
 		*addr_len = sizeof(struct sockaddr_in6);
+	} else if (domain == AF_UNIX) {
+		if (*addr_len < sizeof(struct sockaddr_un))
+			return -1;
+		addrun = (struct sockaddr_un *)addr;
+		addrun->sun_family = domain;
+		addrun->sun_path[0] = 0;
+		strcpy(addrun->sun_path + 1, ip);
+		*addr_len = offsetof(struct sockaddr_un, sun_path) + 1 +
+			    strlen(ip);
 	}
 
 	return 0;
@@ -706,6 +798,11 @@ static int bind6_prog_load(const struct sock_addr_test *test)
 	return load_path(test, BIND6_PROG_PATH);
 }
 
+static int bindun_prog_load(const struct sock_addr_test *test)
+{
+	return load_path(test, BINDUN_PROG_PATH);
+}
+
 static int connect4_prog_load(const struct sock_addr_test *test)
 {
 	return load_path(test, CONNECT4_PROG_PATH);
@@ -716,6 +813,11 @@ static int connect6_prog_load(const struct sock_addr_test *test)
 	return load_path(test, CONNECT6_PROG_PATH);
 }
 
+static int connectun_prog_load(const struct sock_addr_test *test)
+{
+	return load_path(test, CONNECTUN_PROG_PATH);
+}
+
 static int xmsg_ret_only_prog_load(const struct sock_addr_test *test,
 				   int32_t rc)
 {
@@ -889,12 +991,23 @@ static int sendmsg6_rw_c_prog_load(const struct sock_addr_test *test)
 	return load_path(test, SENDMSG6_PROG_PATH);
 }
 
+static int sendmsgun_prog_load(const struct sock_addr_test *test)
+{
+	return load_path(test, SENDMSGUN_PROG_PATH);
+}
+
+static int recvmsgun_prog_load(const struct sock_addr_test *test)
+{
+	return load_path(test, RECVMSGUN_PROG_PATH);
+}
+
 static int cmp_addr(const struct sockaddr_storage *addr1, socklen_t addr1_len,
 		    const struct sockaddr_storage *addr2, socklen_t addr2_len,
 		    int cmp_port)
 {
 	const struct sockaddr_in *four1, *four2;
 	const struct sockaddr_in6 *six1, *six2;
+	const struct sockaddr_un *un1, *un2;
 
 	if (addr1->ss_family != addr2->ss_family)
 		return -1;
@@ -913,6 +1026,10 @@ static int cmp_addr(const struct sockaddr_storage *addr1, socklen_t addr1_len,
 		return !((six1->sin6_port == six2->sin6_port || !cmp_port) &&
 			 !memcmp(&six1->sin6_addr, &six2->sin6_addr,
 				 sizeof(struct in6_addr)));
+	} else if (addr1->ss_family == AF_UNIX) {
+		un1 = (const struct sockaddr_un *)addr1;
+		un2 = (const struct sockaddr_un *)addr2;
+		return memcmp(un1, un2, addr1_len);
 	}
 
 	return -1;
@@ -992,7 +1109,7 @@ static int connect_to_server(int type, const struct sockaddr_storage *addr,
 
 	domain = addr->ss_family;
 
-	if (domain != AF_INET && domain != AF_INET6) {
+	if (domain != AF_INET && domain != AF_INET6 && domain != AF_UNIX) {
 		log_err("Unsupported address family");
 		goto err;
 	}
@@ -1066,7 +1183,7 @@ static int sendmsg_to_server(int type, const struct sockaddr_storage *addr,
 
 	domain = addr->ss_family;
 
-	if (domain != AF_INET && domain != AF_INET6) {
+	if (domain != AF_INET && domain != AF_INET6 && domain != AF_UNIX) {
 		log_err("Unsupported address family");
 		goto err;
 	}
@@ -1095,7 +1212,7 @@ static int sendmsg_to_server(int type, const struct sockaddr_storage *addr,
 			hdr.msg_control = &control6;
 			hdr.msg_controllen = sizeof(control6.buf);
 		}
-		if (init_pktinfo(domain, CMSG_FIRSTHDR(&hdr))) {
+		if (domain != AF_UNIX && init_pktinfo(domain, CMSG_FIRSTHDR(&hdr))) {
 			log_err("Fail to init pktinfo");
 			goto err;
 		}
@@ -1257,10 +1374,11 @@ static int run_connect_test_case(const struct sock_addr_test *test)
 	if (cmp_peer_addr(clientfd, &expected_addr, expected_addr_len))
 		goto err;
 
-	if (cmp_local_ip(clientfd, &expected_src_addr, expected_src_addr_len))
+	if (test->domain != AF_UNIX &&
+	    cmp_local_ip(clientfd, &expected_src_addr, expected_src_addr_len))
 		goto err;
 
-	if (test->type == SOCK_STREAM) {
+	if (test->domain != AF_UNIX && test->type == SOCK_STREAM) {
 		/* Test TCP Fast Open scenario */
 		clientfd = fastconnect_to_server(&requested_addr, addr_len);
 		if (clientfd == -1)
@@ -1339,7 +1457,8 @@ static int run_xmsg_test_case(const struct sock_addr_test *test, int max_cmsg)
 					&recvmsg_addr_len) == -1)
 			goto err;
 
-		if (cmp_addr(&recvmsg_addr, recvmsg_addr_len, &expected_addr,
+		if (test->domain != AF_UNIX &&
+		    cmp_addr(&recvmsg_addr, recvmsg_addr_len, &expected_addr,
 			     expected_addr_len,
 			     /*cmp_port*/ 0))
 			goto err;
@@ -1382,18 +1501,22 @@ static int run_test_case(int cgfd, const struct sock_addr_test *test)
 	switch (test->attach_type) {
 	case BPF_CGROUP_INET4_BIND:
 	case BPF_CGROUP_INET6_BIND:
+	case BPF_CGROUP_UNIX_BIND:
 		err = run_bind_test_case(test);
 		break;
 	case BPF_CGROUP_INET4_CONNECT:
 	case BPF_CGROUP_INET6_CONNECT:
+	case BPF_CGROUP_UNIX_CONNECT:
 		err = run_connect_test_case(test);
 		break;
 	case BPF_CGROUP_UDP4_SENDMSG:
 	case BPF_CGROUP_UDP6_SENDMSG:
+	case BPF_CGROUP_UNIX_SENDMSG:
 		err = run_xmsg_test_case(test, 1);
 		break;
 	case BPF_CGROUP_UDP4_RECVMSG:
 	case BPF_CGROUP_UDP6_RECVMSG:
+	case BPF_CGROUP_UNIX_RECVMSG:
 		err = run_xmsg_test_case(test, 0);
 		break;
 	default:
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 9/9] documentation/bpf: Document cgroup unix socket address hooks
  2022-12-10 19:35 [PATCH bpf-next v2 0/9] Add cgroup sockaddr hooks for unix sockets Daan De Meyer
                   ` (7 preceding siblings ...)
  2022-12-10 19:35 ` [PATCH bpf-next v2 8/9] selftests/bpf: Add tests " Daan De Meyer
@ 2022-12-10 19:35 ` Daan De Meyer
  8 siblings, 0 replies; 20+ messages in thread
From: Daan De Meyer @ 2022-12-10 19:35 UTC (permalink / raw)
  To: bpf; +Cc: Daan De Meyer, martin.lau, kernel-team

---
 Documentation/bpf/libbpf/program_types.rst | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/Documentation/bpf/libbpf/program_types.rst b/Documentation/bpf/libbpf/program_types.rst
index ad4d4d5eecb0..06168ad73d5e 100644
--- a/Documentation/bpf/libbpf/program_types.rst
+++ b/Documentation/bpf/libbpf/program_types.rst
@@ -56,6 +56,18 @@ described in more detail in the footnotes.
 |                                           | ``BPF_CGROUP_UDP6_RECVMSG``            | ``cgroup/recvmsg6``              |           |
 +                                           +----------------------------------------+----------------------------------+-----------+
 |                                           | ``BPF_CGROUP_UDP6_SENDMSG``            | ``cgroup/sendmsg6``              |           |
+|                                           +----------------------------------------+----------------------------------+-----------+
+|                                           | ``BPF_CGROUP_UNIX_BIND``               | ``cgroup/bindun``                |           |
+|                                           +----------------------------------------+----------------------------------+-----------+
+|                                           | ``BPF_CGROUP_UNIX_CONNECT``            | ``cgroup/connectun``             |           |
+|                                           +----------------------------------------+----------------------------------+-----------+
+|                                           | ``BPF_CGROUP_UNIX_SENDMSG``            | ``cgroup/sendmsgun``             |           |
+|                                           +----------------------------------------+----------------------------------+-----------+
+|                                           | ``BPF_CGROUP_UNIX_RECVMSG``            | ``cgroup/recvmsgun``             |           |
+|                                           +----------------------------------------+----------------------------------+-----------+
+|                                           | ``BPF_CGROUP_UNIX_GETPEERNAME``        | ``cgroup/getpeernameun``         |           |
+|                                           +----------------------------------------+----------------------------------+-----------+
+|                                           | ``BPF_CGROUP_UNIX_GETSOCKNAME``        | ``cgroup/getsocknameun``         |           |
 +-------------------------------------------+----------------------------------------+----------------------------------+-----------+
 | ``BPF_PROG_TYPE_CGROUP_SOCK``             | ``BPF_CGROUP_INET4_POST_BIND``         | ``cgroup/post_bind4``            |           |
 +                                           +----------------------------------------+----------------------------------+-----------+
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/9] selftests/bpf: Add missing section name tests for getpeername/getsockname
  2022-12-10 19:35 ` [PATCH bpf-next v2 1/9] selftests/bpf: Add missing section name tests for getpeername/getsockname Daan De Meyer
@ 2022-12-13  4:46   ` Yonghong Song
  0 siblings, 0 replies; 20+ messages in thread
From: Yonghong Song @ 2022-12-13  4:46 UTC (permalink / raw)
  To: Daan De Meyer, bpf; +Cc: martin.lau, kernel-team



On 12/10/22 11:35 AM, Daan De Meyer wrote:
> ---
>   .../selftests/bpf/prog_tests/section_names.c  | 20 +++++++++++++++++++
>   1 file changed, 20 insertions(+)

Please add some commit message, even if it is short and mimics
the subject line.

> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/section_names.c b/tools/testing/selftests/bpf/prog_tests/section_names.c
> index 8b571890c57e..fc5248e94a01 100644
> --- a/tools/testing/selftests/bpf/prog_tests/section_names.c
> +++ b/tools/testing/selftests/bpf/prog_tests/section_names.c
> @@ -158,6 +158,26 @@ static struct sec_name_test tests[] = {
>   		{0, BPF_PROG_TYPE_CGROUP_SOCKOPT, BPF_CGROUP_SETSOCKOPT},
>   		{0, BPF_CGROUP_SETSOCKOPT},
>   	},
> +	{
> +		"cgroup/getpeername4",
> +		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_GETPEERNAME},
> +		{0, BPF_CGROUP_INET4_GETPEERNAME},
> +	},
> +	{
> +		"cgroup/getpeername6",
> +		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_GETPEERNAME},
> +		{0, BPF_CGROUP_INET6_GETPEERNAME},
> +	},
> +	{
> +		"cgroup/getsockname4",
> +		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_GETSOCKNAME},
> +		{0, BPF_CGROUP_INET4_GETSOCKNAME},
> +	},
> +	{
> +		"cgroup/getsockname6",
> +		{0, BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET6_GETSOCKNAME},
> +		{0, BPF_CGROUP_INET6_GETSOCKNAME},
> +	},
>   };
>   
>   static void test_prog_type_by_name(const struct sec_name_test *test)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 2/9] bpf: Allow read access to addr_len from cgroup sockaddr programs
  2022-12-10 19:35 ` [PATCH bpf-next v2 2/9] bpf: Allow read access to addr_len from cgroup sockaddr programs Daan De Meyer
@ 2022-12-13  6:06   ` Yonghong Song
  2022-12-16  7:28   ` Martin KaFai Lau
  1 sibling, 0 replies; 20+ messages in thread
From: Yonghong Song @ 2022-12-13  6:06 UTC (permalink / raw)
  To: Daan De Meyer, bpf; +Cc: martin.lau, kernel-team



On 12/10/22 11:35 AM, Daan De Meyer wrote:
> ---
>   include/linux/bpf-cgroup.h     | 124 +++++++++++++++++----------------
>   include/linux/filter.h         |   1 +
>   include/uapi/linux/bpf.h       |   1 +
>   kernel/bpf/cgroup.c            |   3 +
>   net/core/filter.c              |  51 ++++++++++++++
>   net/ipv4/af_inet.c             |   9 +--
>   net/ipv4/ping.c                |   2 +-
>   net/ipv4/tcp_ipv4.c            |   2 +-
>   net/ipv4/udp.c                 |  11 +--
>   net/ipv6/af_inet6.c            |   9 +--
>   net/ipv6/ping.c                |   2 +-
>   net/ipv6/tcp_ipv6.c            |   2 +-
>   net/ipv6/udp.c                 |  12 ++--
>   tools/include/uapi/linux/bpf.h |   1 +
>   14 files changed, 146 insertions(+), 84 deletions(-)

Again, please add some commit message for this patch. The same for
a few other following patches.

> 
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index 57e9e109257e..3ab2f06ddc8a 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -120,6 +120,7 @@ int __cgroup_bpf_run_filter_sk(struct sock *sk,
>   
>   int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
>   				      struct sockaddr *uaddr,
> +				      int *uaddrlen,
>   				      enum cgroup_bpf_attach_type atype,
>   				      void *t_ctx,
>   				      u32 *flags);
> @@ -230,75 +231,76 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
>   #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk)				       \
>   	BPF_CGROUP_RUN_SK_PROG(sk, CGROUP_INET6_POST_BIND)
>   
> -#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, atype)				       \
> -({									       \
> -	int __ret = 0;							       \
> -	if (cgroup_bpf_enabled(atype))					       \
> -		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype,     \
> -							  NULL, NULL);	       \
> -	__ret;								       \
> -})
> -
> -#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, atype, t_ctx)		       \
> -({									       \
> -	int __ret = 0;							       \
> -	if (cgroup_bpf_enabled(atype))	{				       \
> -		lock_sock(sk);						       \
> -		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype,     \
> -							  t_ctx, NULL);	       \
> -		release_sock(sk);					       \
> -	}								       \
> -	__ret;								       \
> -})
> +#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, atype)               \
> +	({                                                               \
> +		int __ret = 0;                                           \
> +		if (cgroup_bpf_enabled(atype))                           \
> +			__ret = __cgroup_bpf_run_filter_sock_addr(       \
> +				sk, uaddr, uaddrlen, atype, NULL, NULL); \
> +		__ret;                                                   \
> +	})
> +
> +#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, atype, t_ctx)    \
> +	({                                                                \
> +		int __ret = 0;                                            \
> +		if (cgroup_bpf_enabled(atype)) {                          \
> +			lock_sock(sk);                                    \
> +			__ret = __cgroup_bpf_run_filter_sock_addr(        \
> +				sk, uaddr, uaddrlen, atype, t_ctx, NULL); \
> +			release_sock(sk);                                 \
> +		}                                                         \
> +		__ret;                                                    \
> +	})
>   
[...]
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 8607136b6e2c..d0620927dbca 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -8876,6 +8876,13 @@ static bool sock_addr_is_valid_access(int off, int size,
>   			return false;
>   		info->reg_type = PTR_TO_SOCKET;
>   		break;
> +	case bpf_ctx_range(struct bpf_sock_addr, user_addrlen):
> +		if (type != BPF_READ)
> +			return false;
> +
> +		if (size != sizeof(__u32))
> +			return false;
> +		break;
>   	default:
>   		if (type == BPF_READ) {
>   			if (size != size_default)
> @@ -9909,6 +9916,7 @@ static u32 sock_addr_convert_ctx_access(enum bpf_access_type type,
>   {
>   	int off, port_size = sizeof_field(struct sockaddr_in6, sin6_port);
>   	struct bpf_insn *insn = insn_buf;
> +	u32 read_size;
>   
>   	switch (si->off) {
>   	case offsetof(struct bpf_sock_addr, user_family):
> @@ -9986,6 +9994,49 @@ static u32 sock_addr_convert_ctx_access(enum bpf_access_type type,
>   				      si->dst_reg, si->src_reg,
>   				      offsetof(struct bpf_sock_addr_kern, sk));
>   		break;
> +
> +	case offsetof(struct bpf_sock_addr, user_addrlen):
> +		/* uaddrlen is a pointer so it should be accessed via indirect
> +		 * loads and stores. Also for stores additional temporary
> +		 * register is used since neither src_reg nor dst_reg can be
> +		 * overridden.
> +		 */
> +		if (type == BPF_WRITE) {

In the above, we have
 > +	case bpf_ctx_range(struct bpf_sock_addr, user_addrlen):
 > +		if (type != BPF_READ)
 > +			return false;
 > +
 > +		if (size != sizeof(__u32))
 > +			return false;
 > +		break;

So let us delay BPF_WRITE later once the write is enabled.


> +			int treg = BPF_REG_9;
> +
> +			if (si->src_reg == treg || si->dst_reg == treg)
> +				--treg;
> +			if (si->src_reg == treg || si->dst_reg == treg)
> +				--treg;
> +			*insn++ = BPF_STX_MEM(
> +				BPF_DW, si->dst_reg, treg,
> +				offsetof(struct bpf_sock_addr_kern, tmp_reg));
> +			*insn++ = BPF_LDX_MEM(
> +				BPF_FIELD_SIZEOF(struct bpf_sock_addr_kern,
> +						 uaddrlen),
> +				treg, si->dst_reg,
> +				offsetof(struct bpf_sock_addr_kern, uaddrlen));
> +			*insn++ = BPF_STX_MEM(
> +				BPF_SIZEOF(u32), treg, si->src_reg,
> +				bpf_ctx_narrow_access_offset(0, sizeof(u32),
> +							     sizeof(int)));
> +			*insn++ = BPF_LDX_MEM(
> +				BPF_DW, treg, si->dst_reg,
> +				offsetof(struct bpf_sock_addr_kern, tmp_reg));
> +		} else {
> +			*insn++ = BPF_LDX_MEM(
> +				BPF_FIELD_SIZEOF(struct bpf_sock_addr_kern,
> +						 uaddrlen),
> +				si->dst_reg, si->src_reg,
> +				offsetof(struct bpf_sock_addr_kern, uaddrlen));
> +			read_size = bpf_size_to_bytes(BPF_SIZE(si->code));
> +			*insn++ = BPF_LDX_MEM(
> +				BPF_SIZE(si->code), si->dst_reg, si->dst_reg,
> +				bpf_ctx_narrow_access_offset(0, read_size,
> +							     sizeof(int)));
> +		}
> +		*target_size = sizeof(u32);
> +		break;
>   	}
>   
>   	return insn - insn_buf;
[...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 3/9] bpf: Support access to sun_path from cgroup sockaddr programs
  2022-12-10 19:35 ` [PATCH bpf-next v2 3/9] bpf: Support access to sun_path " Daan De Meyer
@ 2022-12-13  6:15   ` Yonghong Song
  0 siblings, 0 replies; 20+ messages in thread
From: Yonghong Song @ 2022-12-13  6:15 UTC (permalink / raw)
  To: Daan De Meyer, bpf; +Cc: martin.lau, kernel-team



On 12/10/22 11:35 AM, Daan De Meyer wrote:
> Preparation for adding unix support to cgroup sockaddr bpf programs.
> In this commit, no programs are allowed to access user_path. We'll
> open this up to the new unix program types in a later commit.
> ---
>   include/uapi/linux/bpf.h       |  1 +
>   net/core/filter.c              | 19 +++++++++++++++++++
>   tools/include/uapi/linux/bpf.h |  1 +
>   3 files changed, 21 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 7cafcfdbb9b2..9e3c33f83bba 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -6366,6 +6366,7 @@ struct bpf_sock_addr {
>   				 * Stored in network byte order.
>   				 */
>   	__bpf_md_ptr(struct bpf_sock *, sk);
> +	char user_path[108];    /* Allows 1 byte read and write. */
>   	__u32 user_addrlen;	/* Allows 4 byte read and write. */
>   };

Ideally, for bisecting reason, it would be great to add user_path
first and then user_addrlen second. Otherwise, some tests utilizing
user_addrlen might not run correctly with Patch 2/9.

>   
> diff --git a/net/core/filter.c b/net/core/filter.c
> index d0620927dbca..cc86b38fc764 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -26,6 +26,7 @@
>   #include <linux/socket.h>
>   #include <linux/sock_diag.h>
>   #include <linux/in.h>
> +#include <linux/un.h>
>   #include <linux/inet.h>
>   #include <linux/netdevice.h>
>   #include <linux/if_packet.h>
[...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 5/9] bpf: Implement cgroup sockaddr hooks for unix sockets
  2022-12-10 19:35 ` [PATCH bpf-next v2 5/9] bpf: Implement cgroup sockaddr hooks for unix sockets Daan De Meyer
@ 2022-12-13  6:20   ` Yonghong Song
  2022-12-13 11:36     ` Daan De Meyer
  0 siblings, 1 reply; 20+ messages in thread
From: Yonghong Song @ 2022-12-13  6:20 UTC (permalink / raw)
  To: Daan De Meyer, bpf; +Cc: martin.lau, kernel-team



On 12/10/22 11:35 AM, Daan De Meyer wrote:
> These hooks allows intercepting bind(), connect(), getsockname(),
> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
> socket hooks get write access to the address length because the
> address length is not fixed when dealing with unix sockets and
> needs to be modified when a unix socket address is modified by
> the hook. Because abstract socket unix addresses start with a
> NUL byte, we cannot recalculate the socket address in kernelspace
> after running the hook by calculating the length of the unix socket
> path using strlen().

Yes, although we cannot calculate the socket path length with
strlen(). But we still have a method to find the path. In
unix_seq_show(), the unix socket path is calculated as below,

                 if (u->addr) {  // under a hash table lock here
                         int i, len;
                         seq_putc(seq, ' ');

                         i = 0;
                         len = u->addr->len -
                                 offsetof(struct sockaddr_un, sun_path);
                         if (u->addr->name->sun_path[0]) {
                                 len--;
                         } else {
                                 seq_putc(seq, '@');
                                 i++;
                         }
                         for ( ; i < len; i++)
                                 seq_putc(seq, u->addr->name->sun_path[i] ?:
                                          '@');
                 }

Is it possible that we can use the above method to find the
address length so we won't need to pass uaddr_len to bpf program?

Since all other hooks do not need to uaddr_len, you could add some
new hooks for unix socket which can specially calculate uaddr_len
after the bpf program run.

> 
> This hook can be used when users want to multiplex syscall to a
> single unix socket to multiple different processes behind the scenes
> by redirecting the connect() and other syscalls to process specific
> sockets.
> ---
>   include/linux/bpf-cgroup-defs.h |  6 +++
>   include/linux/bpf-cgroup.h      | 29 ++++++++++-
>   include/uapi/linux/bpf.h        | 14 ++++--
>   kernel/bpf/cgroup.c             | 11 ++++-
>   kernel/bpf/syscall.c            | 18 +++++++
>   kernel/bpf/verifier.c           |  7 ++-
>   net/core/filter.c               | 45 +++++++++++++++--
>   net/unix/af_unix.c              | 85 +++++++++++++++++++++++++++++----
>   tools/include/uapi/linux/bpf.h  | 14 ++++--
>   9 files changed, 204 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> index 7b121bd780eb..8196ccb81915 100644
> --- a/include/linux/bpf-cgroup-defs.h
> +++ b/include/linux/bpf-cgroup-defs.h
> @@ -26,21 +26,27 @@ enum cgroup_bpf_attach_type {
>   	CGROUP_DEVICE,
>   	CGROUP_INET4_BIND,
>   	CGROUP_INET6_BIND,
> +	CGROUP_UNIX_BIND,
>   	CGROUP_INET4_CONNECT,
>   	CGROUP_INET6_CONNECT,
> +	CGROUP_UNIX_CONNECT,
>   	CGROUP_INET4_POST_BIND,
>   	CGROUP_INET6_POST_BIND,
>   	CGROUP_UDP4_SENDMSG,
>   	CGROUP_UDP6_SENDMSG,
> +	CGROUP_UNIX_SENDMSG,
>   	CGROUP_SYSCTL,
>   	CGROUP_UDP4_RECVMSG,
>   	CGROUP_UDP6_RECVMSG,
> +	CGROUP_UNIX_RECVMSG,
>   	CGROUP_GETSOCKOPT,
>   	CGROUP_SETSOCKOPT,
>   	CGROUP_INET4_GETPEERNAME,
>   	CGROUP_INET6_GETPEERNAME,
> +	CGROUP_UNIX_GETPEERNAME,
>   	CGROUP_INET4_GETSOCKNAME,
>   	CGROUP_INET6_GETSOCKNAME,
> +	CGROUP_UNIX_GETSOCKNAME,
>   	CGROUP_INET_SOCK_RELEASE,
>   	CGROUP_LSM_START,
>   	CGROUP_LSM_END = CGROUP_LSM_START + CGROUP_LSM_NUM - 1,
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index 3ab2f06ddc8a..4de3016f01e4 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -46,21 +46,27 @@ to_cgroup_bpf_attach_type(enum bpf_attach_type attach_type)
>   	CGROUP_ATYPE(CGROUP_DEVICE);
>   	CGROUP_ATYPE(CGROUP_INET4_BIND);
>   	CGROUP_ATYPE(CGROUP_INET6_BIND);
> +	CGROUP_ATYPE(CGROUP_UNIX_BIND);
>   	CGROUP_ATYPE(CGROUP_INET4_CONNECT);
>   	CGROUP_ATYPE(CGROUP_INET6_CONNECT);
> +	CGROUP_ATYPE(CGROUP_UNIX_CONNECT);
>   	CGROUP_ATYPE(CGROUP_INET4_POST_BIND);
>   	CGROUP_ATYPE(CGROUP_INET6_POST_BIND);
>   	CGROUP_ATYPE(CGROUP_UDP4_SENDMSG);
>   	CGROUP_ATYPE(CGROUP_UDP6_SENDMSG);
> +	CGROUP_ATYPE(CGROUP_UNIX_SENDMSG);
>   	CGROUP_ATYPE(CGROUP_SYSCTL);
>   	CGROUP_ATYPE(CGROUP_UDP4_RECVMSG);
>   	CGROUP_ATYPE(CGROUP_UDP6_RECVMSG);
> +	CGROUP_ATYPE(CGROUP_UNIX_RECVMSG);
>   	CGROUP_ATYPE(CGROUP_GETSOCKOPT);
>   	CGROUP_ATYPE(CGROUP_SETSOCKOPT);
>   	CGROUP_ATYPE(CGROUP_INET4_GETPEERNAME);
>   	CGROUP_ATYPE(CGROUP_INET6_GETPEERNAME);
> +	CGROUP_ATYPE(CGROUP_UNIX_GETPEERNAME);
>   	CGROUP_ATYPE(CGROUP_INET4_GETSOCKNAME);
>   	CGROUP_ATYPE(CGROUP_INET6_GETSOCKNAME);
> +	CGROUP_ATYPE(CGROUP_UNIX_GETSOCKNAME);
>   	CGROUP_ATYPE(CGROUP_INET_SOCK_RELEASE);
>   	default:
>   		return CGROUP_BPF_ATTACH_TYPE_INVALID;
> @@ -273,9 +279,13 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
>   		__ret;                                                       \
>   	})
>   
> +#define BPF_CGROUP_RUN_PROG_UNIX_BIND_LOCK(sk, uaddr, uaddrlen)			\
> +	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_BIND, NULL)
> +
>   #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk)				       \
>   	((cgroup_bpf_enabled(CGROUP_INET4_CONNECT) ||		       \
> -	  cgroup_bpf_enabled(CGROUP_INET6_CONNECT)) &&		       \
> +	  cgroup_bpf_enabled(CGROUP_INET6_CONNECT) ||		       \
> +	  cgroup_bpf_enabled(CGROUP_UNIX_CONNECT)) &&		       \
>   	 (sk)->sk_prot->pre_connect)
>   
>   #define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen)		       \
> @@ -284,24 +294,36 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
>   #define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen)		       \
>   	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT)
>   
> +#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT(sk, uaddr, uaddrlen)	               \
> +	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_UNIX_CONNECT)
> +
>   #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen)	       \
>   	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET4_CONNECT, NULL)
>   
>   #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen)	       \
>   	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT, NULL)
>   
> +#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT_LOCK(sk, uaddr, uaddrlen)	       \
> +	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_CONNECT, NULL)
> +
>   #define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)       \
>   	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_SENDMSG, t_ctx)
>   
>   #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)       \
>   	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_SENDMSG, t_ctx)
>   
> +#define BPF_CGROUP_RUN_PROG_UNIX_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)	\
> +	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_SENDMSG, t_ctx)
> +
>   #define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen)		\
>   	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_RECVMSG, NULL)
>   
>   #define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen)		\
>   	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_RECVMSG, NULL)
>   
> +#define BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(sk, uaddr, uaddrlen)		\
> +	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_RECVMSG, NULL)
> +
>   /* The SOCK_OPS"_SK" macro should be used when sock_ops->sk is not a
>    * fullsock and its parent fullsock cannot be traced by
>    * sk_to_full_sk().
> @@ -487,16 +509,21 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
>   #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, uaddrlen, atype, flags) ({ 0; })
> +#define BPF_CGROUP_RUN_PROG_UNIX_BIND_LOCK(sk, uaddr, uaddrlen) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> +#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
> +#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
> +#define BPF_CGROUP_RUN_PROG_UNIX_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> +#define BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(atype, major, minor, access) ({ 0; })
>   #define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,pos) ({ 0; })
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 9e3c33f83bba..b73e4da458fd 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -999,17 +999,21 @@ enum bpf_attach_type {
>   	BPF_SK_MSG_VERDICT,
>   	BPF_CGROUP_INET4_BIND,
>   	BPF_CGROUP_INET6_BIND,
> +	BPF_CGROUP_UNIX_BIND,
>   	BPF_CGROUP_INET4_CONNECT,
>   	BPF_CGROUP_INET6_CONNECT,
> +	BPF_CGROUP_UNIX_CONNECT,
>   	BPF_CGROUP_INET4_POST_BIND,
>   	BPF_CGROUP_INET6_POST_BIND,
>   	BPF_CGROUP_UDP4_SENDMSG,
>   	BPF_CGROUP_UDP6_SENDMSG,
> +	BPF_CGROUP_UNIX_SENDMSG,
>   	BPF_LIRC_MODE2,
>   	BPF_FLOW_DISSECTOR,
>   	BPF_CGROUP_SYSCTL,
>   	BPF_CGROUP_UDP4_RECVMSG,
>   	BPF_CGROUP_UDP6_RECVMSG,
> +	BPF_CGROUP_UNIX_RECVMSG,
>   	BPF_CGROUP_GETSOCKOPT,
>   	BPF_CGROUP_SETSOCKOPT,
>   	BPF_TRACE_RAW_TP,
> @@ -1020,8 +1024,10 @@ enum bpf_attach_type {
>   	BPF_TRACE_ITER,
>   	BPF_CGROUP_INET4_GETPEERNAME,
>   	BPF_CGROUP_INET6_GETPEERNAME,
> +	BPF_CGROUP_UNIX_GETPEERNAME,
>   	BPF_CGROUP_INET4_GETSOCKNAME,
>   	BPF_CGROUP_INET6_GETSOCKNAME,
> +	BPF_CGROUP_UNIX_GETSOCKNAME,
>   	BPF_XDP_DEVMAP,
>   	BPF_CGROUP_INET_SOCK_RELEASE,
>   	BPF_XDP_CPUMAP,

This is uapi. Please add new attach type to the end of enum type.

> @@ -2575,8 +2581,8 @@ union bpf_attr {
>    * 		*bpf_socket* should be one of the following:
>    *
>    * 		* **struct bpf_sock_ops** for **BPF_PROG_TYPE_SOCK_OPS**.
> - * 		* **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**
> - * 		  and **BPF_CGROUP_INET6_CONNECT**.
> + * 		* **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**,
> + * 		  **BPF_CGROUP_INET6_CONNECT** and **BPF_CGROUP_UNIX_CONNECT**.
>    *
>    * 		This helper actually implements a subset of **setsockopt()**.
>    * 		It supports the following *level*\ s:
> @@ -2809,8 +2815,8 @@ union bpf_attr {
>    * 		*bpf_socket* should be one of the following:
>    *
>    * 		* **struct bpf_sock_ops** for **BPF_PROG_TYPE_SOCK_OPS**.
> - * 		* **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**
> - * 		  and **BPF_CGROUP_INET6_CONNECT**.
> + * 		* **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**,
> + * 		  **BPF_CGROUP_INET6_CONNECT** and **BPF_CGROUP_UNIX_CONNECT**.
>    *
>    * 		This helper actually implements a subset of **getsockopt()**.
>    * 		It supports the same set of *optname*\ s that is supported by
[...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 5/9] bpf: Implement cgroup sockaddr hooks for unix sockets
  2022-12-13  6:20   ` Yonghong Song
@ 2022-12-13 11:36     ` Daan De Meyer
  2022-12-13 21:54       ` Yonghong Song
  0 siblings, 1 reply; 20+ messages in thread
From: Daan De Meyer @ 2022-12-13 11:36 UTC (permalink / raw)
  To: Yonghong Song; +Cc: bpf, martin.lau, kernel-team

> On 12/10/22 11:35 AM, Daan De Meyer wrote:
> > These hooks allows intercepting bind(), connect(), getsockname(),
> > getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
> > socket hooks get write access to the address length because the
> > address length is not fixed when dealing with unix sockets and
> > needs to be modified when a unix socket address is modified by
> > the hook. Because abstract socket unix addresses start with a
> > NUL byte, we cannot recalculate the socket address in kernelspace
> > after running the hook by calculating the length of the unix socket
> > path using strlen().
>
> Yes, although we cannot calculate the socket path length with
> strlen(). But we still have a method to find the path. In
> unix_seq_show(), the unix socket path is calculated as below,
>
>                  if (u->addr) {  // under a hash table lock here
>                          int i, len;
>                          seq_putc(seq, ' ');
>
>                          i = 0;
>                          len = u->addr->len -
>                                  offsetof(struct sockaddr_un, sun_path);
>                          if (u->addr->name->sun_path[0]) {
>                                  len--;
>                          } else {
>                                  seq_putc(seq, '@');
>                                  i++;
>                          }
>                          for ( ; i < len; i++)
>                                  seq_putc(seq, u->addr->name->sun_path[i] ?:
>                                           '@');
>                  }
>
> Is it possible that we can use the above method to find the
> address length so we won't need to pass uaddr_len to bpf program?
>
> Since all other hooks do not need to uaddr_len, you could add some
> new hooks for unix socket which can specially calculate uaddr_len
> after the bpf program run.

I don't think we can. If we look at the definition of abstract unix
socket in the official man page:

> abstract: an abstract socket address is distinguished (from a pathname socket) by the fact that sun_path[0] is a null byte ('\0').  The socket's address in this namespace is given by the additional bytes in sun_path that are covered by the specified length of the address structure.  (Null bytes in
> the  name  have  no  special  significance.)   The name has no connection with filesystem pathnames.  When the address of an abstract socket is returned, the returned addrlen is greater than sizeof(sa_family_t) (i.e., greater than 2), and the name of the socket is contained in the first (addrlen -
> sizeof(sa_family_t)) bytes of sun_path.

This specifically says that the address in the abstract namespace is
given by the additional bytes in sun_path that are covered by the
length of the address structure. If I understand correctly, that means
there's no way to derive the length from just the contents of the
sockaddr structure. We need
the actual length as specified by the caller to know which bytes
belong to the address. Note that it's valid for the abstract name to
contain Null bytes, so we cannot use those in any way or form to
detect whether further bytes belong to the address or not. It seems
valid to have an abstract name
consisting of 107 Null bytes in sun_path.


On Tue, 13 Dec 2022 at 06:20, Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 12/10/22 11:35 AM, Daan De Meyer wrote:
> > These hooks allows intercepting bind(), connect(), getsockname(),
> > getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
> > socket hooks get write access to the address length because the
> > address length is not fixed when dealing with unix sockets and
> > needs to be modified when a unix socket address is modified by
> > the hook. Because abstract socket unix addresses start with a
> > NUL byte, we cannot recalculate the socket address in kernelspace
> > after running the hook by calculating the length of the unix socket
> > path using strlen().
>
> Yes, although we cannot calculate the socket path length with
> strlen(). But we still have a method to find the path. In
> unix_seq_show(), the unix socket path is calculated as below,
>
>                  if (u->addr) {  // under a hash table lock here
>                          int i, len;
>                          seq_putc(seq, ' ');
>
>                          i = 0;
>                          len = u->addr->len -
>                                  offsetof(struct sockaddr_un, sun_path);
>                          if (u->addr->name->sun_path[0]) {
>                                  len--;
>                          } else {
>                                  seq_putc(seq, '@');
>                                  i++;
>                          }
>                          for ( ; i < len; i++)
>                                  seq_putc(seq, u->addr->name->sun_path[i] ?:
>                                           '@');
>                  }
>
> Is it possible that we can use the above method to find the
> address length so we won't need to pass uaddr_len to bpf program?
>
> Since all other hooks do not need to uaddr_len, you could add some
> new hooks for unix socket which can specially calculate uaddr_len
> after the bpf program run.
>
> >
> > This hook can be used when users want to multiplex syscall to a
> > single unix socket to multiple different processes behind the scenes
> > by redirecting the connect() and other syscalls to process specific
> > sockets.
> > ---
> >   include/linux/bpf-cgroup-defs.h |  6 +++
> >   include/linux/bpf-cgroup.h      | 29 ++++++++++-
> >   include/uapi/linux/bpf.h        | 14 ++++--
> >   kernel/bpf/cgroup.c             | 11 ++++-
> >   kernel/bpf/syscall.c            | 18 +++++++
> >   kernel/bpf/verifier.c           |  7 ++-
> >   net/core/filter.c               | 45 +++++++++++++++--
> >   net/unix/af_unix.c              | 85 +++++++++++++++++++++++++++++----
> >   tools/include/uapi/linux/bpf.h  | 14 ++++--
> >   9 files changed, 204 insertions(+), 25 deletions(-)
> >
> > diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h
> > index 7b121bd780eb..8196ccb81915 100644
> > --- a/include/linux/bpf-cgroup-defs.h
> > +++ b/include/linux/bpf-cgroup-defs.h
> > @@ -26,21 +26,27 @@ enum cgroup_bpf_attach_type {
> >       CGROUP_DEVICE,
> >       CGROUP_INET4_BIND,
> >       CGROUP_INET6_BIND,
> > +     CGROUP_UNIX_BIND,
> >       CGROUP_INET4_CONNECT,
> >       CGROUP_INET6_CONNECT,
> > +     CGROUP_UNIX_CONNECT,
> >       CGROUP_INET4_POST_BIND,
> >       CGROUP_INET6_POST_BIND,
> >       CGROUP_UDP4_SENDMSG,
> >       CGROUP_UDP6_SENDMSG,
> > +     CGROUP_UNIX_SENDMSG,
> >       CGROUP_SYSCTL,
> >       CGROUP_UDP4_RECVMSG,
> >       CGROUP_UDP6_RECVMSG,
> > +     CGROUP_UNIX_RECVMSG,
> >       CGROUP_GETSOCKOPT,
> >       CGROUP_SETSOCKOPT,
> >       CGROUP_INET4_GETPEERNAME,
> >       CGROUP_INET6_GETPEERNAME,
> > +     CGROUP_UNIX_GETPEERNAME,
> >       CGROUP_INET4_GETSOCKNAME,
> >       CGROUP_INET6_GETSOCKNAME,
> > +     CGROUP_UNIX_GETSOCKNAME,
> >       CGROUP_INET_SOCK_RELEASE,
> >       CGROUP_LSM_START,
> >       CGROUP_LSM_END = CGROUP_LSM_START + CGROUP_LSM_NUM - 1,
> > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> > index 3ab2f06ddc8a..4de3016f01e4 100644
> > --- a/include/linux/bpf-cgroup.h
> > +++ b/include/linux/bpf-cgroup.h
> > @@ -46,21 +46,27 @@ to_cgroup_bpf_attach_type(enum bpf_attach_type attach_type)
> >       CGROUP_ATYPE(CGROUP_DEVICE);
> >       CGROUP_ATYPE(CGROUP_INET4_BIND);
> >       CGROUP_ATYPE(CGROUP_INET6_BIND);
> > +     CGROUP_ATYPE(CGROUP_UNIX_BIND);
> >       CGROUP_ATYPE(CGROUP_INET4_CONNECT);
> >       CGROUP_ATYPE(CGROUP_INET6_CONNECT);
> > +     CGROUP_ATYPE(CGROUP_UNIX_CONNECT);
> >       CGROUP_ATYPE(CGROUP_INET4_POST_BIND);
> >       CGROUP_ATYPE(CGROUP_INET6_POST_BIND);
> >       CGROUP_ATYPE(CGROUP_UDP4_SENDMSG);
> >       CGROUP_ATYPE(CGROUP_UDP6_SENDMSG);
> > +     CGROUP_ATYPE(CGROUP_UNIX_SENDMSG);
> >       CGROUP_ATYPE(CGROUP_SYSCTL);
> >       CGROUP_ATYPE(CGROUP_UDP4_RECVMSG);
> >       CGROUP_ATYPE(CGROUP_UDP6_RECVMSG);
> > +     CGROUP_ATYPE(CGROUP_UNIX_RECVMSG);
> >       CGROUP_ATYPE(CGROUP_GETSOCKOPT);
> >       CGROUP_ATYPE(CGROUP_SETSOCKOPT);
> >       CGROUP_ATYPE(CGROUP_INET4_GETPEERNAME);
> >       CGROUP_ATYPE(CGROUP_INET6_GETPEERNAME);
> > +     CGROUP_ATYPE(CGROUP_UNIX_GETPEERNAME);
> >       CGROUP_ATYPE(CGROUP_INET4_GETSOCKNAME);
> >       CGROUP_ATYPE(CGROUP_INET6_GETSOCKNAME);
> > +     CGROUP_ATYPE(CGROUP_UNIX_GETSOCKNAME);
> >       CGROUP_ATYPE(CGROUP_INET_SOCK_RELEASE);
> >       default:
> >               return CGROUP_BPF_ATTACH_TYPE_INVALID;
> > @@ -273,9 +279,13 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
> >               __ret;                                                       \
> >       })
> >
> > +#define BPF_CGROUP_RUN_PROG_UNIX_BIND_LOCK(sk, uaddr, uaddrlen)                      \
> > +     BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_BIND, NULL)
> > +
> >   #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk)                                 \
> >       ((cgroup_bpf_enabled(CGROUP_INET4_CONNECT) ||                  \
> > -       cgroup_bpf_enabled(CGROUP_INET6_CONNECT)) &&                 \
> > +       cgroup_bpf_enabled(CGROUP_INET6_CONNECT) ||                  \
> > +       cgroup_bpf_enabled(CGROUP_UNIX_CONNECT)) &&                  \
> >        (sk)->sk_prot->pre_connect)
> >
> >   #define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen)                     \
> > @@ -284,24 +294,36 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
> >   #define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen)                     \
> >       BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT)
> >
> > +#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT(sk, uaddr, uaddrlen)                       \
> > +     BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_UNIX_CONNECT)
> > +
> >   #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen)        \
> >       BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET4_CONNECT, NULL)
> >
> >   #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen)        \
> >       BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT, NULL)
> >
> > +#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT_LOCK(sk, uaddr, uaddrlen)          \
> > +     BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_CONNECT, NULL)
> > +
> >   #define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)       \
> >       BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_SENDMSG, t_ctx)
> >
> >   #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)       \
> >       BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_SENDMSG, t_ctx)
> >
> > +#define BPF_CGROUP_RUN_PROG_UNIX_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)    \
> > +     BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_SENDMSG, t_ctx)
> > +
> >   #define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen)          \
> >       BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_RECVMSG, NULL)
> >
> >   #define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen)          \
> >       BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_RECVMSG, NULL)
> >
> > +#define BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(sk, uaddr, uaddrlen)           \
> > +     BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UNIX_RECVMSG, NULL)
> > +
> >   /* The SOCK_OPS"_SK" macro should be used when sock_ops->sk is not a
> >    * fullsock and its parent fullsock cannot be traced by
> >    * sk_to_full_sk().
> > @@ -487,16 +509,21 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map,
> >   #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, uaddrlen, atype, flags) ({ 0; })
> > +#define BPF_CGROUP_RUN_PROG_UNIX_BIND_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> > +#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT(sk, uaddr, uaddrlen) ({ 0; })
> > +#define BPF_CGROUP_RUN_PROG_UNIX_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
> > +#define BPF_CGROUP_RUN_PROG_UNIX_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> > +#define BPF_CGROUP_RUN_PROG_UNIX_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(atype, major, minor, access) ({ 0; })
> >   #define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,pos) ({ 0; })
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 9e3c33f83bba..b73e4da458fd 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -999,17 +999,21 @@ enum bpf_attach_type {
> >       BPF_SK_MSG_VERDICT,
> >       BPF_CGROUP_INET4_BIND,
> >       BPF_CGROUP_INET6_BIND,
> > +     BPF_CGROUP_UNIX_BIND,
> >       BPF_CGROUP_INET4_CONNECT,
> >       BPF_CGROUP_INET6_CONNECT,
> > +     BPF_CGROUP_UNIX_CONNECT,
> >       BPF_CGROUP_INET4_POST_BIND,
> >       BPF_CGROUP_INET6_POST_BIND,
> >       BPF_CGROUP_UDP4_SENDMSG,
> >       BPF_CGROUP_UDP6_SENDMSG,
> > +     BPF_CGROUP_UNIX_SENDMSG,
> >       BPF_LIRC_MODE2,
> >       BPF_FLOW_DISSECTOR,
> >       BPF_CGROUP_SYSCTL,
> >       BPF_CGROUP_UDP4_RECVMSG,
> >       BPF_CGROUP_UDP6_RECVMSG,
> > +     BPF_CGROUP_UNIX_RECVMSG,
> >       BPF_CGROUP_GETSOCKOPT,
> >       BPF_CGROUP_SETSOCKOPT,
> >       BPF_TRACE_RAW_TP,
> > @@ -1020,8 +1024,10 @@ enum bpf_attach_type {
> >       BPF_TRACE_ITER,
> >       BPF_CGROUP_INET4_GETPEERNAME,
> >       BPF_CGROUP_INET6_GETPEERNAME,
> > +     BPF_CGROUP_UNIX_GETPEERNAME,
> >       BPF_CGROUP_INET4_GETSOCKNAME,
> >       BPF_CGROUP_INET6_GETSOCKNAME,
> > +     BPF_CGROUP_UNIX_GETSOCKNAME,
> >       BPF_XDP_DEVMAP,
> >       BPF_CGROUP_INET_SOCK_RELEASE,
> >       BPF_XDP_CPUMAP,
>
> This is uapi. Please add new attach type to the end of enum type.
>
> > @@ -2575,8 +2581,8 @@ union bpf_attr {
> >    *          *bpf_socket* should be one of the following:
> >    *
> >    *          * **struct bpf_sock_ops** for **BPF_PROG_TYPE_SOCK_OPS**.
> > - *           * **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**
> > - *             and **BPF_CGROUP_INET6_CONNECT**.
> > + *           * **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**,
> > + *             **BPF_CGROUP_INET6_CONNECT** and **BPF_CGROUP_UNIX_CONNECT**.
> >    *
> >    *          This helper actually implements a subset of **setsockopt()**.
> >    *          It supports the following *level*\ s:
> > @@ -2809,8 +2815,8 @@ union bpf_attr {
> >    *          *bpf_socket* should be one of the following:
> >    *
> >    *          * **struct bpf_sock_ops** for **BPF_PROG_TYPE_SOCK_OPS**.
> > - *           * **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**
> > - *             and **BPF_CGROUP_INET6_CONNECT**.
> > + *           * **struct bpf_sock_addr** for **BPF_CGROUP_INET4_CONNECT**,
> > + *             **BPF_CGROUP_INET6_CONNECT** and **BPF_CGROUP_UNIX_CONNECT**.
> >    *
> >    *          This helper actually implements a subset of **getsockopt()**.
> >    *          It supports the same set of *optname*\ s that is supported by
> [...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 5/9] bpf: Implement cgroup sockaddr hooks for unix sockets
  2022-12-13 11:36     ` Daan De Meyer
@ 2022-12-13 21:54       ` Yonghong Song
  2022-12-15 14:34         ` Daan De Meyer
  0 siblings, 1 reply; 20+ messages in thread
From: Yonghong Song @ 2022-12-13 21:54 UTC (permalink / raw)
  To: Daan De Meyer; +Cc: bpf, martin.lau, kernel-team



On 12/13/22 3:36 AM, Daan De Meyer wrote:
>> On 12/10/22 11:35 AM, Daan De Meyer wrote:
>>> These hooks allows intercepting bind(), connect(), getsockname(),
>>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
>>> socket hooks get write access to the address length because the
>>> address length is not fixed when dealing with unix sockets and
>>> needs to be modified when a unix socket address is modified by
>>> the hook. Because abstract socket unix addresses start with a
>>> NUL byte, we cannot recalculate the socket address in kernelspace
>>> after running the hook by calculating the length of the unix socket
>>> path using strlen().
>>
>> Yes, although we cannot calculate the socket path length with
>> strlen(). But we still have a method to find the path. In
>> unix_seq_show(), the unix socket path is calculated as below,
>>
>>                   if (u->addr) {  // under a hash table lock here
>>                           int i, len;
>>                           seq_putc(seq, ' ');
>>
>>                           i = 0;
>>                           len = u->addr->len -
>>                                   offsetof(struct sockaddr_un, sun_path);
>>                           if (u->addr->name->sun_path[0]) {
>>                                   len--;
>>                           } else {
>>                                   seq_putc(seq, '@');
>>                                   i++;
>>                           }
>>                           for ( ; i < len; i++)
>>                                   seq_putc(seq, u->addr->name->sun_path[i] ?:
>>                                            '@');
>>                   }
>>
>> Is it possible that we can use the above method to find the
>> address length so we won't need to pass uaddr_len to bpf program?
>>
>> Since all other hooks do not need to uaddr_len, you could add some
>> new hooks for unix socket which can specially calculate uaddr_len
>> after the bpf program run.
> 
> I don't think we can. If we look at the definition of abstract unix
> socket in the official man page:
> 
>> abstract: an abstract socket address is distinguished (from a pathname socket) by the fact that sun_path[0] is a null byte ('\0').  The socket's address in this namespace is given by the additional bytes in sun_path that are covered by the specified length of the address structure.  (Null bytes in
>> the  name  have  no  special  significance.)   The name has no connection with filesystem pathnames.  When the address of an abstract socket is returned, the returned addrlen is greater than sizeof(sa_family_t) (i.e., greater than 2), and the name of the socket is contained in the first (addrlen -
>> sizeof(sa_family_t)) bytes of sun_path.
> 
> This specifically says that the address in the abstract namespace is
> given by the additional bytes in sun_path that are covered by the
> length of the address structure. If I understand correctly, that means
> there's no way to derive the length from just the contents of the
> sockaddr structure. We need
> the actual length as specified by the caller to know which bytes
> belong to the address. Note that it's valid for the abstract name to
> contain Null bytes, so we cannot use those in any way or form to
> detect whether further bytes belong to the address or not. It seems
> valid to have an abstract name
> consisting of 107 Null bytes in sun_path.

Okay, it looks like bpf program is able to set abstract name as well.
It would be good we have an example for this in selftest.

With abstract address setable by bpf program, I guess you are right,
we have to let user to explicitly tell us the address length.

I assume it is possible for user to write an address like below:
"a\0b\0"
addr_len = offsetof(struct sockaddr_un, sun_path) + 4
but actually it is illegal, right? We have to validate the
legality of sun_path/addr_len beyond unix_validate_addr(), right?

> 
> 
> On Tue, 13 Dec 2022 at 06:20, Yonghong Song <yhs@meta.com> wrote:
>>
>>
>>
>> On 12/10/22 11:35 AM, Daan De Meyer wrote:
>>> These hooks allows intercepting bind(), connect(), getsockname(),
>>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
>>> socket hooks get write access to the address length because the
>>> address length is not fixed when dealing with unix sockets and
>>> needs to be modified when a unix socket address is modified by
>>> the hook. Because abstract socket unix addresses start with a
>>> NUL byte, we cannot recalculate the socket address in kernelspace
>>> after running the hook by calculating the length of the unix socket
>>> path using strlen().
>>
>> Yes, although we cannot calculate the socket path length with
>> strlen(). But we still have a method to find the path. In
>> unix_seq_show(), the unix socket path is calculated as below,
>>
>>                   if (u->addr) {  // under a hash table lock here
>>                           int i, len;
>>                           seq_putc(seq, ' ');
>>
>>                           i = 0;
>>                           len = u->addr->len -
>>                                   offsetof(struct sockaddr_un, sun_path);
>>                           if (u->addr->name->sun_path[0]) {
>>                                   len--;
>>                           } else {
>>                                   seq_putc(seq, '@');
>>                                   i++;
>>                           }
>>                           for ( ; i < len; i++)
>>                                   seq_putc(seq, u->addr->name->sun_path[i] ?:
>>                                            '@');
>>                   }
>>
>> Is it possible that we can use the above method to find the
>> address length so we won't need to pass uaddr_len to bpf program?
>>
>> Since all other hooks do not need to uaddr_len, you could add some
>> new hooks for unix socket which can specially calculate uaddr_len
>> after the bpf program run.
>>
>>>
>>> This hook can be used when users want to multiplex syscall to a
>>> single unix socket to multiple different processes behind the scenes
>>> by redirecting the connect() and other syscalls to process specific
>>> sockets.
>>> ---
>>>    include/linux/bpf-cgroup-defs.h |  6 +++
>>>    include/linux/bpf-cgroup.h      | 29 ++++++++++-
>>>    include/uapi/linux/bpf.h        | 14 ++++--
>>>    kernel/bpf/cgroup.c             | 11 ++++-
>>>    kernel/bpf/syscall.c            | 18 +++++++
>>>    kernel/bpf/verifier.c           |  7 ++-
>>>    net/core/filter.c               | 45 +++++++++++++++--
>>>    net/unix/af_unix.c              | 85 +++++++++++++++++++++++++++++----
>>>    tools/include/uapi/linux/bpf.h  | 14 ++++--
>>>    9 files changed, 204 insertions(+), 25 deletions(-)
>>>
[...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 5/9] bpf: Implement cgroup sockaddr hooks for unix sockets
  2022-12-13 21:54       ` Yonghong Song
@ 2022-12-15 14:34         ` Daan De Meyer
  2022-12-15 18:32           ` Yonghong Song
  0 siblings, 1 reply; 20+ messages in thread
From: Daan De Meyer @ 2022-12-15 14:34 UTC (permalink / raw)
  To: Yonghong Song; +Cc: bpf, martin.lau, kernel-team

> >> On 12/10/22 11:35 AM, Daan De Meyer wrote:
> >>> These hooks allows intercepting bind(), connect(), getsockname(),
> >>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
> >>> socket hooks get write access to the address length because the
> >>> address length is not fixed when dealing with unix sockets and
> >>> needs to be modified when a unix socket address is modified by
> >>> the hook. Because abstract socket unix addresses start with a
> >>> NUL byte, we cannot recalculate the socket address in kernelspace
> >>> after running the hook by calculating the length of the unix socket
> >>> path using strlen().
> >>
> >> Yes, although we cannot calculate the socket path length with
> >> strlen(). But we still have a method to find the path. In
> >> unix_seq_show(), the unix socket path is calculated as below,
> >>
> >>                   if (u->addr) {  // under a hash table lock here
> >>                           int i, len;
> >>                           seq_putc(seq, ' ');
> >>
> >>                           i = 0;
> >>                           len = u->addr->len -
> >>                                   offsetof(struct sockaddr_un, sun_path);
> >>                           if (u->addr->name->sun_path[0]) {
> >>                                   len--;
> >>                           } else {
> >>                                   seq_putc(seq, '@');
> >>                                   i++;
> >>                           }
> >>                           for ( ; i < len; i++)
> >>                                   seq_putc(seq, u->addr->name->sun_path[i] ?:
> >>                                            '@');
> >>                   }
> >>
> >> Is it possible that we can use the above method to find the
> >> address length so we won't need to pass uaddr_len to bpf program?
> >>
> >> Since all other hooks do not need to uaddr_len, you could add some
> >> new hooks for unix socket which can specially calculate uaddr_len
> >> after the bpf program run.
> >
> > I don't think we can. If we look at the definition of abstract unix
> > socket in the official man page:
> >
> >> abstract: an abstract socket address is distinguished (from a pathname socket) by the fact that sun_path[0] is a null byte ('\0').  The socket's address in this namespace is given by the additional bytes in sun_path that are covered by the specified length of the address structure.  (Null bytes in
> >> the  name  have  no  special  significance.)   The name has no connection with filesystem pathnames.  When the address of an abstract socket is returned, the returned addrlen is greater than sizeof(sa_family_t) (i.e., greater than 2), and the name of the socket is contained in the first (addrlen -
> >> sizeof(sa_family_t)) bytes of sun_path.
> >
> > This specifically says that the address in the abstract namespace is
> > given by the additional bytes in sun_path that are covered by the
> > length of the address structure. If I understand correctly, that means
> > there's no way to derive the length from just the contents of the
> > sockaddr structure. We need
> > the actual length as specified by the caller to know which bytes
> > belong to the address. Note that it's valid for the abstract name to
> > contain Null bytes, so we cannot use those in any way or form to
> > detect whether further bytes belong to the address or not. It seems
> > valid to have an abstract name
> > consisting of 107 Null bytes in sun_path.
>
> Okay, it looks like bpf program is able to set abstract name as well.
> It would be good we have an example for this in selftest.
>
> With abstract address setable by bpf program, I guess you are right,
> we have to let user to explicitly tell us the address length.
>
> I assume it is possible for user to write an address like below:
> "a\0b\0"
> addr_len = offsetof(struct sockaddr_un, sun_path) + 4
> but actually it is illegal, right? We have to validate the
> legality of sun_path/addr_len beyond unix_validate_addr(), right?

This is not actually illegal according to the man page I think, let's
look at the following quote from the man page:

>  Pathname sockets
>      When binding a socket to a pathname, a few rules should be observed for maximum portability and ease of coding:
>
>      *  The pathname in sun_path should be null-terminated.
>
>      *  The length of the pathname, including the terminating null byte, should not exceed the size of sun_path.
>
>      *  The addrlen argument that describes the enclosing sockaddr_un structure should have a value of at least:
>
>             offsetof(struct sockaddr_un, sun_path)+strlen(addr.sun_path)+1
>
>         or, more simply, addrlen can be specified as sizeof(struct sockaddr_un).

So when doing a pathname based path, the address length is allowed to
be bigger than the actual path. So I don't think
we need to do any more validation than what is done by
unix_validate_addr(). The selftests are already using abstract
unix sockets because they don't need any cleanup.


On Tue, 13 Dec 2022 at 21:54, Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 12/13/22 3:36 AM, Daan De Meyer wrote:
> >> On 12/10/22 11:35 AM, Daan De Meyer wrote:
> >>> These hooks allows intercepting bind(), connect(), getsockname(),
> >>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
> >>> socket hooks get write access to the address length because the
> >>> address length is not fixed when dealing with unix sockets and
> >>> needs to be modified when a unix socket address is modified by
> >>> the hook. Because abstract socket unix addresses start with a
> >>> NUL byte, we cannot recalculate the socket address in kernelspace
> >>> after running the hook by calculating the length of the unix socket
> >>> path using strlen().
> >>
> >> Yes, although we cannot calculate the socket path length with
> >> strlen(). But we still have a method to find the path. In
> >> unix_seq_show(), the unix socket path is calculated as below,
> >>
> >>                   if (u->addr) {  // under a hash table lock here
> >>                           int i, len;
> >>                           seq_putc(seq, ' ');
> >>
> >>                           i = 0;
> >>                           len = u->addr->len -
> >>                                   offsetof(struct sockaddr_un, sun_path);
> >>                           if (u->addr->name->sun_path[0]) {
> >>                                   len--;
> >>                           } else {
> >>                                   seq_putc(seq, '@');
> >>                                   i++;
> >>                           }
> >>                           for ( ; i < len; i++)
> >>                                   seq_putc(seq, u->addr->name->sun_path[i] ?:
> >>                                            '@');
> >>                   }
> >>
> >> Is it possible that we can use the above method to find the
> >> address length so we won't need to pass uaddr_len to bpf program?
> >>
> >> Since all other hooks do not need to uaddr_len, you could add some
> >> new hooks for unix socket which can specially calculate uaddr_len
> >> after the bpf program run.
> >
> > I don't think we can. If we look at the definition of abstract unix
> > socket in the official man page:
> >
> >> abstract: an abstract socket address is distinguished (from a pathname socket) by the fact that sun_path[0] is a null byte ('\0').  The socket's address in this namespace is given by the additional bytes in sun_path that are covered by the specified length of the address structure.  (Null bytes in
> >> the  name  have  no  special  significance.)   The name has no connection with filesystem pathnames.  When the address of an abstract socket is returned, the returned addrlen is greater than sizeof(sa_family_t) (i.e., greater than 2), and the name of the socket is contained in the first (addrlen -
> >> sizeof(sa_family_t)) bytes of sun_path.
> >
> > This specifically says that the address in the abstract namespace is
> > given by the additional bytes in sun_path that are covered by the
> > length of the address structure. If I understand correctly, that means
> > there's no way to derive the length from just the contents of the
> > sockaddr structure. We need
> > the actual length as specified by the caller to know which bytes
> > belong to the address. Note that it's valid for the abstract name to
> > contain Null bytes, so we cannot use those in any way or form to
> > detect whether further bytes belong to the address or not. It seems
> > valid to have an abstract name
> > consisting of 107 Null bytes in sun_path.
>
> Okay, it looks like bpf program is able to set abstract name as well.
> It would be good we have an example for this in selftest.
>
> With abstract address setable by bpf program, I guess you are right,
> we have to let user to explicitly tell us the address length.
>
> I assume it is possible for user to write an address like below:
> "a\0b\0"
> addr_len = offsetof(struct sockaddr_un, sun_path) + 4
> but actually it is illegal, right? We have to validate the
> legality of sun_path/addr_len beyond unix_validate_addr(), right?
>
> >
> >
> > On Tue, 13 Dec 2022 at 06:20, Yonghong Song <yhs@meta.com> wrote:
> >>
> >>
> >>
> >> On 12/10/22 11:35 AM, Daan De Meyer wrote:
> >>> These hooks allows intercepting bind(), connect(), getsockname(),
> >>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
> >>> socket hooks get write access to the address length because the
> >>> address length is not fixed when dealing with unix sockets and
> >>> needs to be modified when a unix socket address is modified by
> >>> the hook. Because abstract socket unix addresses start with a
> >>> NUL byte, we cannot recalculate the socket address in kernelspace
> >>> after running the hook by calculating the length of the unix socket
> >>> path using strlen().
> >>
> >> Yes, although we cannot calculate the socket path length with
> >> strlen(). But we still have a method to find the path. In
> >> unix_seq_show(), the unix socket path is calculated as below,
> >>
> >>                   if (u->addr) {  // under a hash table lock here
> >>                           int i, len;
> >>                           seq_putc(seq, ' ');
> >>
> >>                           i = 0;
> >>                           len = u->addr->len -
> >>                                   offsetof(struct sockaddr_un, sun_path);
> >>                           if (u->addr->name->sun_path[0]) {
> >>                                   len--;
> >>                           } else {
> >>                                   seq_putc(seq, '@');
> >>                                   i++;
> >>                           }
> >>                           for ( ; i < len; i++)
> >>                                   seq_putc(seq, u->addr->name->sun_path[i] ?:
> >>                                            '@');
> >>                   }
> >>
> >> Is it possible that we can use the above method to find the
> >> address length so we won't need to pass uaddr_len to bpf program?
> >>
> >> Since all other hooks do not need to uaddr_len, you could add some
> >> new hooks for unix socket which can specially calculate uaddr_len
> >> after the bpf program run.
> >>
> >>>
> >>> This hook can be used when users want to multiplex syscall to a
> >>> single unix socket to multiple different processes behind the scenes
> >>> by redirecting the connect() and other syscalls to process specific
> >>> sockets.
> >>> ---
> >>>    include/linux/bpf-cgroup-defs.h |  6 +++
> >>>    include/linux/bpf-cgroup.h      | 29 ++++++++++-
> >>>    include/uapi/linux/bpf.h        | 14 ++++--
> >>>    kernel/bpf/cgroup.c             | 11 ++++-
> >>>    kernel/bpf/syscall.c            | 18 +++++++
> >>>    kernel/bpf/verifier.c           |  7 ++-
> >>>    net/core/filter.c               | 45 +++++++++++++++--
> >>>    net/unix/af_unix.c              | 85 +++++++++++++++++++++++++++++----
> >>>    tools/include/uapi/linux/bpf.h  | 14 ++++--
> >>>    9 files changed, 204 insertions(+), 25 deletions(-)
> >>>
> [...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 5/9] bpf: Implement cgroup sockaddr hooks for unix sockets
  2022-12-15 14:34         ` Daan De Meyer
@ 2022-12-15 18:32           ` Yonghong Song
  0 siblings, 0 replies; 20+ messages in thread
From: Yonghong Song @ 2022-12-15 18:32 UTC (permalink / raw)
  To: Daan De Meyer; +Cc: bpf, martin.lau, kernel-team



On 12/15/22 6:34 AM, Daan De Meyer wrote:
>>>> On 12/10/22 11:35 AM, Daan De Meyer wrote:
>>>>> These hooks allows intercepting bind(), connect(), getsockname(),
>>>>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
>>>>> socket hooks get write access to the address length because the
>>>>> address length is not fixed when dealing with unix sockets and
>>>>> needs to be modified when a unix socket address is modified by
>>>>> the hook. Because abstract socket unix addresses start with a
>>>>> NUL byte, we cannot recalculate the socket address in kernelspace
>>>>> after running the hook by calculating the length of the unix socket
>>>>> path using strlen().
>>>>
>>>> Yes, although we cannot calculate the socket path length with
>>>> strlen(). But we still have a method to find the path. In
>>>> unix_seq_show(), the unix socket path is calculated as below,
>>>>
>>>>                    if (u->addr) {  // under a hash table lock here
>>>>                            int i, len;
>>>>                            seq_putc(seq, ' ');
>>>>
>>>>                            i = 0;
>>>>                            len = u->addr->len -
>>>>                                    offsetof(struct sockaddr_un, sun_path);
>>>>                            if (u->addr->name->sun_path[0]) {
>>>>                                    len--;
>>>>                            } else {
>>>>                                    seq_putc(seq, '@');
>>>>                                    i++;
>>>>                            }
>>>>                            for ( ; i < len; i++)
>>>>                                    seq_putc(seq, u->addr->name->sun_path[i] ?:
>>>>                                             '@');
>>>>                    }
>>>>
>>>> Is it possible that we can use the above method to find the
>>>> address length so we won't need to pass uaddr_len to bpf program?
>>>>
>>>> Since all other hooks do not need to uaddr_len, you could add some
>>>> new hooks for unix socket which can specially calculate uaddr_len
>>>> after the bpf program run.
>>>
>>> I don't think we can. If we look at the definition of abstract unix
>>> socket in the official man page:
>>>
>>>> abstract: an abstract socket address is distinguished (from a pathname socket) by the fact that sun_path[0] is a null byte ('\0').  The socket's address in this namespace is given by the additional bytes in sun_path that are covered by the specified length of the address structure.  (Null bytes in
>>>> the  name  have  no  special  significance.)   The name has no connection with filesystem pathnames.  When the address of an abstract socket is returned, the returned addrlen is greater than sizeof(sa_family_t) (i.e., greater than 2), and the name of the socket is contained in the first (addrlen -
>>>> sizeof(sa_family_t)) bytes of sun_path.
>>>
>>> This specifically says that the address in the abstract namespace is
>>> given by the additional bytes in sun_path that are covered by the
>>> length of the address structure. If I understand correctly, that means
>>> there's no way to derive the length from just the contents of the
>>> sockaddr structure. We need
>>> the actual length as specified by the caller to know which bytes
>>> belong to the address. Note that it's valid for the abstract name to
>>> contain Null bytes, so we cannot use those in any way or form to
>>> detect whether further bytes belong to the address or not. It seems
>>> valid to have an abstract name
>>> consisting of 107 Null bytes in sun_path.
>>
>> Okay, it looks like bpf program is able to set abstract name as well.
>> It would be good we have an example for this in selftest.
>>
>> With abstract address setable by bpf program, I guess you are right,
>> we have to let user to explicitly tell us the address length.
>>
>> I assume it is possible for user to write an address like below:
>> "a\0b\0"
>> addr_len = offsetof(struct sockaddr_un, sun_path) + 4
>> but actually it is illegal, right? We have to validate the
>> legality of sun_path/addr_len beyond unix_validate_addr(), right?
> 
> This is not actually illegal according to the man page I think, let's
> look at the following quote from the man page:
> 
>>   Pathname sockets
>>       When binding a socket to a pathname, a few rules should be observed for maximum portability and ease of coding:
>>
>>       *  The pathname in sun_path should be null-terminated.
>>
>>       *  The length of the pathname, including the terminating null byte, should not exceed the size of sun_path.
>>
>>       *  The addrlen argument that describes the enclosing sockaddr_un structure should have a value of at least:
>>
>>              offsetof(struct sockaddr_un, sun_path)+strlen(addr.sun_path)+1
>>
>>          or, more simply, addrlen can be specified as sizeof(struct sockaddr_un).
> 
> So when doing a pathname based path, the address length is allowed to
> be bigger than the actual path. So I don't think
> we need to do any more validation than what is done by
> unix_validate_addr(). The selftests are already using abstract
> unix sockets because they don't need any cleanup.

What about smaller, address "abc", and the length is
   offsetof(struct sockaddr_un) + 2

> 
> 
> On Tue, 13 Dec 2022 at 21:54, Yonghong Song <yhs@meta.com> wrote:
>>
>>
>>
>> On 12/13/22 3:36 AM, Daan De Meyer wrote:
>>>> On 12/10/22 11:35 AM, Daan De Meyer wrote:
>>>>> These hooks allows intercepting bind(), connect(), getsockname(),
>>>>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
>>>>> socket hooks get write access to the address length because the
>>>>> address length is not fixed when dealing with unix sockets and
>>>>> needs to be modified when a unix socket address is modified by
>>>>> the hook. Because abstract socket unix addresses start with a
>>>>> NUL byte, we cannot recalculate the socket address in kernelspace
>>>>> after running the hook by calculating the length of the unix socket
>>>>> path using strlen().
>>>>
>>>> Yes, although we cannot calculate the socket path length with
>>>> strlen(). But we still have a method to find the path. In
>>>> unix_seq_show(), the unix socket path is calculated as below,
>>>>
>>>>                    if (u->addr) {  // under a hash table lock here
>>>>                            int i, len;
>>>>                            seq_putc(seq, ' ');
>>>>
>>>>                            i = 0;
>>>>                            len = u->addr->len -
>>>>                                    offsetof(struct sockaddr_un, sun_path);
>>>>                            if (u->addr->name->sun_path[0]) {
>>>>                                    len--;
>>>>                            } else {
>>>>                                    seq_putc(seq, '@');
>>>>                                    i++;
>>>>                            }
>>>>                            for ( ; i < len; i++)
>>>>                                    seq_putc(seq, u->addr->name->sun_path[i] ?:
>>>>                                             '@');
>>>>                    }
>>>>
>>>> Is it possible that we can use the above method to find the
>>>> address length so we won't need to pass uaddr_len to bpf program?
>>>>
>>>> Since all other hooks do not need to uaddr_len, you could add some
>>>> new hooks for unix socket which can specially calculate uaddr_len
>>>> after the bpf program run.
>>>
>>> I don't think we can. If we look at the definition of abstract unix
>>> socket in the official man page:
>>>
>>>> abstract: an abstract socket address is distinguished (from a pathname socket) by the fact that sun_path[0] is a null byte ('\0').  The socket's address in this namespace is given by the additional bytes in sun_path that are covered by the specified length of the address structure.  (Null bytes in
>>>> the  name  have  no  special  significance.)   The name has no connection with filesystem pathnames.  When the address of an abstract socket is returned, the returned addrlen is greater than sizeof(sa_family_t) (i.e., greater than 2), and the name of the socket is contained in the first (addrlen -
>>>> sizeof(sa_family_t)) bytes of sun_path.
>>>
>>> This specifically says that the address in the abstract namespace is
>>> given by the additional bytes in sun_path that are covered by the
>>> length of the address structure. If I understand correctly, that means
>>> there's no way to derive the length from just the contents of the
>>> sockaddr structure. We need
>>> the actual length as specified by the caller to know which bytes
>>> belong to the address. Note that it's valid for the abstract name to
>>> contain Null bytes, so we cannot use those in any way or form to
>>> detect whether further bytes belong to the address or not. It seems
>>> valid to have an abstract name
>>> consisting of 107 Null bytes in sun_path.
>>
>> Okay, it looks like bpf program is able to set abstract name as well.
>> It would be good we have an example for this in selftest.
>>
>> With abstract address setable by bpf program, I guess you are right,
>> we have to let user to explicitly tell us the address length.
>>
>> I assume it is possible for user to write an address like below:
>> "a\0b\0"
>> addr_len = offsetof(struct sockaddr_un, sun_path) + 4
>> but actually it is illegal, right? We have to validate the
>> legality of sun_path/addr_len beyond unix_validate_addr(), right?
>>
>>>
>>>
>>> On Tue, 13 Dec 2022 at 06:20, Yonghong Song <yhs@meta.com> wrote:
>>>>
>>>>
>>>>
>>>> On 12/10/22 11:35 AM, Daan De Meyer wrote:
>>>>> These hooks allows intercepting bind(), connect(), getsockname(),
>>>>> getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
>>>>> socket hooks get write access to the address length because the
>>>>> address length is not fixed when dealing with unix sockets and
>>>>> needs to be modified when a unix socket address is modified by
>>>>> the hook. Because abstract socket unix addresses start with a
>>>>> NUL byte, we cannot recalculate the socket address in kernelspace
>>>>> after running the hook by calculating the length of the unix socket
>>>>> path using strlen().
>>>>
>>>> Yes, although we cannot calculate the socket path length with
>>>> strlen(). But we still have a method to find the path. In
>>>> unix_seq_show(), the unix socket path is calculated as below,
>>>>
>>>>                    if (u->addr) {  // under a hash table lock here
>>>>                            int i, len;
>>>>                            seq_putc(seq, ' ');
>>>>
>>>>                            i = 0;
>>>>                            len = u->addr->len -
>>>>                                    offsetof(struct sockaddr_un, sun_path);
>>>>                            if (u->addr->name->sun_path[0]) {
>>>>                                    len--;
>>>>                            } else {
>>>>                                    seq_putc(seq, '@');
>>>>                                    i++;
>>>>                            }
>>>>                            for ( ; i < len; i++)
>>>>                                    seq_putc(seq, u->addr->name->sun_path[i] ?:
>>>>                                             '@');
>>>>                    }
>>>>
>>>> Is it possible that we can use the above method to find the
>>>> address length so we won't need to pass uaddr_len to bpf program?
>>>>
>>>> Since all other hooks do not need to uaddr_len, you could add some
>>>> new hooks for unix socket which can specially calculate uaddr_len
>>>> after the bpf program run.
>>>>
>>>>>
>>>>> This hook can be used when users want to multiplex syscall to a
>>>>> single unix socket to multiple different processes behind the scenes
>>>>> by redirecting the connect() and other syscalls to process specific
>>>>> sockets.
>>>>> ---
>>>>>     include/linux/bpf-cgroup-defs.h |  6 +++
>>>>>     include/linux/bpf-cgroup.h      | 29 ++++++++++-
>>>>>     include/uapi/linux/bpf.h        | 14 ++++--
>>>>>     kernel/bpf/cgroup.c             | 11 ++++-
>>>>>     kernel/bpf/syscall.c            | 18 +++++++
>>>>>     kernel/bpf/verifier.c           |  7 ++-
>>>>>     net/core/filter.c               | 45 +++++++++++++++--
>>>>>     net/unix/af_unix.c              | 85 +++++++++++++++++++++++++++++----
>>>>>     tools/include/uapi/linux/bpf.h  | 14 ++++--
>>>>>     9 files changed, 204 insertions(+), 25 deletions(-)
>>>>>
>> [...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 2/9] bpf: Allow read access to addr_len from cgroup sockaddr programs
  2022-12-10 19:35 ` [PATCH bpf-next v2 2/9] bpf: Allow read access to addr_len from cgroup sockaddr programs Daan De Meyer
  2022-12-13  6:06   ` Yonghong Song
@ 2022-12-16  7:28   ` Martin KaFai Lau
  2022-12-16 17:40     ` Martin KaFai Lau
  1 sibling, 1 reply; 20+ messages in thread
From: Martin KaFai Lau @ 2022-12-16  7:28 UTC (permalink / raw)
  To: Daan De Meyer; +Cc: kernel-team, bpf

On 12/10/22 11:35 AM, Daan De Meyer wrote:
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index 57e9e109257e..3ab2f06ddc8a 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -120,6 +120,7 @@ int __cgroup_bpf_run_filter_sk(struct sock *sk,
>   
>   int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
>   				      struct sockaddr *uaddr,
> +				      int *uaddrlen,
>   				      enum cgroup_bpf_attach_type atype,
>   				      void *t_ctx,
>   				      u32 *flags);
> @@ -230,75 +231,76 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
>   #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk)				       \
>   	BPF_CGROUP_RUN_SK_PROG(sk, CGROUP_INET6_POST_BIND)
>   
> -#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, atype)				       \
> -({									       \
> -	int __ret = 0;							       \
> -	if (cgroup_bpf_enabled(atype))					       \
> -		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype,     \
> -							  NULL, NULL);	       \
> -	__ret;								       \
> -})
> -
> -#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, atype, t_ctx)		       \
> -({									       \
> -	int __ret = 0;							       \
> -	if (cgroup_bpf_enabled(atype))	{				       \
> -		lock_sock(sk);						       \
> -		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype,     \
> -							  t_ctx, NULL);	       \
> -		release_sock(sk);					       \
> -	}								       \
> -	__ret;								       \
> -})
> +#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, atype)               \
> +	({                                                               \
> +		int __ret = 0;                                           \
> +		if (cgroup_bpf_enabled(atype))                           \
> +			__ret = __cgroup_bpf_run_filter_sock_addr(       \
> +				sk, uaddr, uaddrlen, atype, NULL, NULL); \
> +		__ret;                                                   \
> +	})
> +
> +#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, atype, t_ctx)    \
> +	({                                                                \
> +		int __ret = 0;                                            \
> +		if (cgroup_bpf_enabled(atype)) {                          \
> +			lock_sock(sk);                                    \
> +			__ret = __cgroup_bpf_run_filter_sock_addr(        \
> +				sk, uaddr, uaddrlen, atype, t_ctx, NULL); \
> +			release_sock(sk);                                 \
> +		}                                                         \
> +		__ret;                                                    \
> +	} >
>   /* BPF_CGROUP_INET4_BIND and BPF_CGROUP_INET6_BIND can return extra flags
>    * via upper bits of return code. The only flag that is supported
>    * (at bit position 0) is to indicate CAP_NET_BIND_SERVICE capability check
>    * should be bypassed (BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE).
>    */
> -#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, atype, bind_flags)	       \
> -({									       \
> -	u32 __flags = 0;						       \
> -	int __ret = 0;							       \
> -	if (cgroup_bpf_enabled(atype))	{				       \
> -		lock_sock(sk);						       \
> -		__ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype,     \
> -							  NULL, &__flags);     \
> -		release_sock(sk);					       \
> -		if (__flags & BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE)	       \
> -			*bind_flags |= BIND_NO_CAP_NET_BIND_SERVICE;	       \
> -	}								       \
> -	__ret;								       \
> -})
> +#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, uaddrlen, atype,       \
> +					   bind_flags)                       \
> +	({                                                                   \
> +		u32 __flags = 0;                                             \
> +		int __ret = 0;                                               \
> +		if (cgroup_bpf_enabled(atype)) {                             \
> +			lock_sock(sk);                                       \
> +			__ret = __cgroup_bpf_run_filter_sock_addr(           \
> +				sk, uaddr, uaddrlen, atype, NULL, &__flags); \
> +			release_sock(sk);                                    \
> +			if (__flags & BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE)  \
> +				*bind_flags |= BIND_NO_CAP_NET_BIND_SERVICE; \
> +		}                                                            \
> +		__ret;                                                       \
> +	})


Some comments on review logistics:

1. Other than empty commit message, please add 'Sign-off-by:'.
It is likely one of the red ERROR that the ./script/checkpatch.pl script will 
complain.  This patch set was quickly put into 'Changes Requested' status:
https://patchwork.kernel.org/project/netdevbpf/patch/20221210193559.371515-2-daan.j.demeyer@gmail.com/

Documentation/process/submitting-patches.rst
and Documentation/bpf/bpf_devel_QA.rst have useful details.


2. Please avoid unnecessary indentation changes like the above 
BPF_CGROUP_RUN_XXX macros.  It makes the review much harder, eg. which line has 
the real change?

>   
>   #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk)				       \
>   	((cgroup_bpf_enabled(CGROUP_INET4_CONNECT) ||		       \
>   	  cgroup_bpf_enabled(CGROUP_INET6_CONNECT)) &&		       \
>   	 (sk)->sk_prot->pre_connect)
>   
> -#define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr)			       \
> -	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, CGROUP_INET4_CONNECT)
> +#define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen)		       \
> +	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_INET4_CONNECT)
>   
> -#define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr)			       \
> -	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, CGROUP_INET6_CONNECT)
> +#define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen)		       \
> +	BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT)
>   
> -#define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr)		       \
> -	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_INET4_CONNECT, NULL)
> +#define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen)	       \
> +	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET4_CONNECT, NULL)
>   
> -#define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr)		       \
> -	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_INET6_CONNECT, NULL)
> +#define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen)	       \
> +	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT, NULL)
>   
> -#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, t_ctx)		       \
> -	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP4_SENDMSG, t_ctx)
> +#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)       \
> +	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_SENDMSG, t_ctx)
>   
> -#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx)		       \
> -	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP6_SENDMSG, t_ctx)
> +#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx)       \
> +	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_SENDMSG, t_ctx)
>   
> -#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr)			\
> -	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP4_RECVMSG, NULL)
> +#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen)		\
> +	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_RECVMSG, NULL)
>   
> -#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr)			\
> -	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP6_RECVMSG, NULL)
> +#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen)		\
> +	BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_RECVMSG, NULL)

Can the above changes to the INET[4|6] macro be avoided?  If I read the patch 
set correctly, the uaddrlen is not useful for INET.

[ ... ]

> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index bf701976056e..510cf4042f8b 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -1294,6 +1294,7 @@ struct bpf_sock_addr_kern {
>   	 */
>   	u64 tmp_reg;
>   	void *t_ctx;	/* Attach type specific context. */
> +	int *uaddrlen;

If I read this patch 2, 3, and 5 right, the newly added "int *uaddrlen" allows 
the bpf prog to specify the length of the kernel address "struct sockaddr 
*uaddr" in bpf_sock_addr_kern.  This feels a bit unease for potential memory 
related issue.  I saw patch 5 added some new unix_validate_addr(sunaddr, 
addr_len) in a few places after the prog is run.  How about the existing INET 
cases?  It doesn't make sense to allow the prog changing the INET[4|6] addrlen. 
Ignoring the change for INET in the kernel also feels wrong.  Checking in the 
kernel after the bpf prog run also seems too late and there are many grounds to 
audit for the INET[4|6] alone.  I think all of these seems crying for a new 
kfunc to set the uaddr and uaddrlen together.  The kfunc can check for incorrect 
addrlen and directly return error to the bpf prog.  Something like this:

int bpf_sock_addr_set(struct bpf_sock_addr_kern *sa_kern,
		const struct sockaddr *addr, u32 addrlen__sz);

This kfunc should work for INET also.  Documentation/bpf/kfuncs.rst has some 
details and net/netfilter/nf_conntrack_bpf.c has some kfunc examples that use a 
similar "__sz" arg.

Also, there are some recent advancements in bpf. Instead of adding a "int *" 
pointer, I would suggest to directly add the value "u32 uaddrlen" to the struct 
bpf_sock_addr"_kern" instead.  Then

SEC("cgroup/bindun")
int bind_un_prog(struct bpf_sock_addr *ctx)
{
	struct bpf_sock_addr_kern *sa_kern;
	struct sockaddr_un *unaddr;
	u32 unaddrlen;

	sa_kern = bpf_cast_to_kern_ctx(ctx);
	unaddrlen = sa_kern->uaddrlen;
	unaddr = bpf_rdonly_cast(sa_kern->uaddr,
				bpf_core_type_id_kernel(struct sockaddr_un));

	/* Read unaddr->sun_path here */
}


In above, sa_kern and unaddr are read only. Let the CO-RE do the job instead 
and no need to do the conversion in convert_ctx_access().  Together with the 
bpf_sock_addr_set() kfunc which takes care of the WRITE, the changes in 
convert_ctx_access() and is_valid_access() should not be needed.   There is also 
no need to add new field "user_path[108]" and "user_addrlen" to the uapi's 
"struct bpf_sock_addr".




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 2/9] bpf: Allow read access to addr_len from cgroup sockaddr programs
  2022-12-16  7:28   ` Martin KaFai Lau
@ 2022-12-16 17:40     ` Martin KaFai Lau
  0 siblings, 0 replies; 20+ messages in thread
From: Martin KaFai Lau @ 2022-12-16 17:40 UTC (permalink / raw)
  To: Daan De Meyer; +Cc: kernel-team, bpf

On 12/15/22 11:28 PM, Martin KaFai Lau wrote:
> On 12/10/22 11:35 AM, Daan De Meyer wrote:
>> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
>> index 57e9e109257e..3ab2f06ddc8a 100644
>> --- a/include/linux/bpf-cgroup.h
>> +++ b/include/linux/bpf-cgroup.h
>> @@ -120,6 +120,7 @@ int __cgroup_bpf_run_filter_sk(struct sock *sk,
>>   int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
>>                         struct sockaddr *uaddr,
>> +                      int *uaddrlen,
>>                         enum cgroup_bpf_attach_type atype,
>>                         void *t_ctx,
>>                         u32 *flags);
>> @@ -230,75 +231,76 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
>>   #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk)                       \
>>       BPF_CGROUP_RUN_SK_PROG(sk, CGROUP_INET6_POST_BIND)
>> -#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, atype)                       \
>> -({                                           \
>> -    int __ret = 0;                                   \
>> -    if (cgroup_bpf_enabled(atype))                           \
>> -        __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype,     \
>> -                              NULL, NULL);           \
>> -    __ret;                                       \
>> -})
>> -
>> -#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, atype, t_ctx)               \
>> -({                                           \
>> -    int __ret = 0;                                   \
>> -    if (cgroup_bpf_enabled(atype))    {                       \
>> -        lock_sock(sk);                               \
>> -        __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype,     \
>> -                              t_ctx, NULL);           \
>> -        release_sock(sk);                           \
>> -    }                                       \
>> -    __ret;                                       \
>> -})
>> +#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, atype)               \
>> +    ({                                                               \
>> +        int __ret = 0;                                           \
>> +        if (cgroup_bpf_enabled(atype))                           \
>> +            __ret = __cgroup_bpf_run_filter_sock_addr(       \
>> +                sk, uaddr, uaddrlen, atype, NULL, NULL); \
>> +        __ret;                                                   \
>> +    })
>> +
>> +#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, atype, t_ctx)    \
>> +    ({                                                                \
>> +        int __ret = 0;                                            \
>> +        if (cgroup_bpf_enabled(atype)) {                          \
>> +            lock_sock(sk);                                    \
>> +            __ret = __cgroup_bpf_run_filter_sock_addr(        \
>> +                sk, uaddr, uaddrlen, atype, t_ctx, NULL); \
>> +            release_sock(sk);                                 \
>> +        }                                                         \
>> +        __ret;                                                    \
>> +    } >
>>   /* BPF_CGROUP_INET4_BIND and BPF_CGROUP_INET6_BIND can return extra flags
>>    * via upper bits of return code. The only flag that is supported
>>    * (at bit position 0) is to indicate CAP_NET_BIND_SERVICE capability check
>>    * should be bypassed (BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE).
>>    */
>> -#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, atype, 
>> bind_flags)           \
>> -({                                           \
>> -    u32 __flags = 0;                               \
>> -    int __ret = 0;                                   \
>> -    if (cgroup_bpf_enabled(atype))    {                       \
>> -        lock_sock(sk);                               \
>> -        __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype,     \
>> -                              NULL, &__flags);     \
>> -        release_sock(sk);                           \
>> -        if (__flags & BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE)           \
>> -            *bind_flags |= BIND_NO_CAP_NET_BIND_SERVICE;           \
>> -    }                                       \
>> -    __ret;                                       \
>> -})
>> +#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, uaddrlen, atype,       \
>> +                       bind_flags)                       \
>> +    ({                                                                   \
>> +        u32 __flags = 0;                                             \
>> +        int __ret = 0;                                               \
>> +        if (cgroup_bpf_enabled(atype)) {                             \
>> +            lock_sock(sk);                                       \
>> +            __ret = __cgroup_bpf_run_filter_sock_addr(           \
>> +                sk, uaddr, uaddrlen, atype, NULL, &__flags); \
>> +            release_sock(sk);                                    \
>> +            if (__flags & BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE)  \
>> +                *bind_flags |= BIND_NO_CAP_NET_BIND_SERVICE; \
>> +        }                                                            \
>> +        __ret;                                                       \
>> +    })
> 
> 
> Some comments on review logistics:
> 
> 1. Other than empty commit message, please add 'Sign-off-by:'.
> It is likely one of the red ERROR that the ./script/checkpatch.pl script will 
> complain.  This patch set was quickly put into 'Changes Requested' status:
> https://patchwork.kernel.org/project/netdevbpf/patch/20221210193559.371515-2-daan.j.demeyer@gmail.com/
> 
> Documentation/process/submitting-patches.rst
> and Documentation/bpf/bpf_devel_QA.rst have useful details.
> 
> 
> 2. Please avoid unnecessary indentation changes like the above 
> BPF_CGROUP_RUN_XXX macros.  It makes the review much harder, eg. which line has 
> the real change?
> 
>>   #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk)                       \
>>       ((cgroup_bpf_enabled(CGROUP_INET4_CONNECT) ||               \
>>         cgroup_bpf_enabled(CGROUP_INET6_CONNECT)) &&               \
>>        (sk)->sk_prot->pre_connect)
>> -#define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr)                   \
>> -    BPF_CGROUP_RUN_SA_PROG(sk, uaddr, CGROUP_INET4_CONNECT)
>> +#define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen)               \
>> +    BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_INET4_CONNECT)
>> -#define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr)                   \
>> -    BPF_CGROUP_RUN_SA_PROG(sk, uaddr, CGROUP_INET6_CONNECT)
>> +#define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen)               \
>> +    BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT)
>> -#define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr)               \
>> -    BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_INET4_CONNECT, NULL)
>> +#define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen)           \
>> +    BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET4_CONNECT, NULL)
>> -#define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr)               \
>> -    BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_INET6_CONNECT, NULL)
>> +#define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen)           \
>> +    BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT, NULL)
>> -#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, t_ctx)               \
>> -    BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP4_SENDMSG, t_ctx)
>> +#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, 
>> t_ctx)       \
>> +    BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_SENDMSG, t_ctx)
>> -#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx)               \
>> -    BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP6_SENDMSG, t_ctx)
>> +#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, 
>> t_ctx)       \
>> +    BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_SENDMSG, t_ctx)
>> -#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr)            \
>> -    BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP4_RECVMSG, NULL)
>> +#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen)        \
>> +    BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_RECVMSG, NULL)
>> -#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr)            \
>> -    BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP6_RECVMSG, NULL)
>> +#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen)        \
>> +    BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_RECVMSG, NULL)
> 
> Can the above changes to the INET[4|6] macro be avoided?  If I read the patch 
> set correctly, the uaddrlen is not useful for INET.
> 
> [ ... ]
> 
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index bf701976056e..510cf4042f8b 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -1294,6 +1294,7 @@ struct bpf_sock_addr_kern {
>>        */
>>       u64 tmp_reg;
>>       void *t_ctx;    /* Attach type specific context. */
>> +    int *uaddrlen;
> 
> If I read this patch 2, 3, and 5 right, the newly added "int *uaddrlen" allows 
> the bpf prog to specify the length of the kernel address "struct sockaddr 
> *uaddr" in bpf_sock_addr_kern.  This feels a bit unease for potential memory 
> related issue.  I saw patch 5 added some new unix_validate_addr(sunaddr, 
> addr_len) in a few places after the prog is run.  How about the existing INET 
> cases?  It doesn't make sense to allow the prog changing the INET[4|6] addrlen. 
> Ignoring the change for INET in the kernel also feels wrong.  Checking in the 
> kernel after the bpf prog run also seems too late and there are many grounds to 
> audit for the INET[4|6] alone.  I think all of these seems crying for a new 
> kfunc to set the uaddr and uaddrlen together.  The kfunc can check for incorrect 
> addrlen and directly return error to the bpf prog.  Something like this:
> 
> int bpf_sock_addr_set(struct bpf_sock_addr_kern *sa_kern,
>          const struct sockaddr *addr, u32 addrlen__sz);
> 
> This kfunc should work for INET also.  Documentation/bpf/kfuncs.rst has some 
> details and net/netfilter/nf_conntrack_bpf.c has some kfunc examples that use a 
> similar "__sz" arg.
> 
> Also, there are some recent advancements in bpf. Instead of adding a "int *" 
> pointer, I would suggest to directly add the value "u32 uaddrlen" to the struct 
> bpf_sock_addr"_kern" instead.  Then
> 
> SEC("cgroup/bindun")
> int bind_un_prog(struct bpf_sock_addr *ctx)
> {
>      struct bpf_sock_addr_kern *sa_kern;
>      struct sockaddr_un *unaddr;
>      u32 unaddrlen;
> 
>      sa_kern = bpf_cast_to_kern_ctx(ctx);
>      unaddrlen = sa_kern->uaddrlen;
>      unaddr = bpf_rdonly_cast(sa_kern->uaddr,
>                  bpf_core_type_id_kernel(struct sockaddr_un));
> 
>      /* Read unaddr->sun_path here */
> }
> 
> 
> In above, sa_kern and unaddr are read only. Let the CO-RE do the job instead and 
> no need to do the conversion in convert_ctx_access().  Together with the 
> bpf_sock_addr_set() kfunc which takes care of the WRITE, the changes in 
> convert_ctx_access() and is_valid_access() should not be needed.   There is also 
> no need to add new field "user_path[108]" and "user_addrlen" to the uapi's 
> "struct bpf_sock_addr".

I just noticed patch 5 has checked on INET for writing to "user_addrlen", my bad 
for overlook.  However, the other points still stand for kfunc, 
bpf_cast_to_kern_ctx(), and bpf_rdonly_cast(). eg. a kfunc interface that works 
for INET, UNIX and potentially other AF_* in the future, check incorrect addrlen 
at the kfunc, no changes to convert_ctx_access() and the uapi.


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-12-16 17:40 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-10 19:35 [PATCH bpf-next v2 0/9] Add cgroup sockaddr hooks for unix sockets Daan De Meyer
2022-12-10 19:35 ` [PATCH bpf-next v2 1/9] selftests/bpf: Add missing section name tests for getpeername/getsockname Daan De Meyer
2022-12-13  4:46   ` Yonghong Song
2022-12-10 19:35 ` [PATCH bpf-next v2 2/9] bpf: Allow read access to addr_len from cgroup sockaddr programs Daan De Meyer
2022-12-13  6:06   ` Yonghong Song
2022-12-16  7:28   ` Martin KaFai Lau
2022-12-16 17:40     ` Martin KaFai Lau
2022-12-10 19:35 ` [PATCH bpf-next v2 3/9] bpf: Support access to sun_path " Daan De Meyer
2022-12-13  6:15   ` Yonghong Song
2022-12-10 19:35 ` [PATCH bpf-next v2 4/9] selftests/bpf: Track sockaddr length in sock addr tests Daan De Meyer
2022-12-10 19:35 ` [PATCH bpf-next v2 5/9] bpf: Implement cgroup sockaddr hooks for unix sockets Daan De Meyer
2022-12-13  6:20   ` Yonghong Song
2022-12-13 11:36     ` Daan De Meyer
2022-12-13 21:54       ` Yonghong Song
2022-12-15 14:34         ` Daan De Meyer
2022-12-15 18:32           ` Yonghong Song
2022-12-10 19:35 ` [PATCH bpf-next v2 6/9] libbpf: Add support for cgroup unix socket address hooks Daan De Meyer
2022-12-10 19:35 ` [PATCH bpf-next v2 7/9] bpftool: " Daan De Meyer
2022-12-10 19:35 ` [PATCH bpf-next v2 8/9] selftests/bpf: Add tests " Daan De Meyer
2022-12-10 19:35 ` [PATCH bpf-next v2 9/9] documentation/bpf: Document " Daan De Meyer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.