bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands
@ 2023-10-16 13:47 Breno Leitao
  2023-10-16 13:47 ` [PATCH v7 01/11] bpf: Add sockptr support for getsockopt Breno Leitao
                   ` (12 more replies)
  0 siblings, 13 replies; 30+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, krisman
  Cc: bpf, linux-kernel, netdev, io-uring

This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
SOCKET_URING_OP_SETSOCKOPT implements generic case, covering all levels
and optnames. SOCKET_URING_OP_GETSOCKOPT is limited, for now, to
SOL_SOCKET level, which seems to be the most common level parameter for
get/setsockopt(2).

In order to keep the implementation (and tests) simple, some refactors
were done prior to the changes, as follows:

Patches 1-2: Make BPF cgroup filters sockptr aware

Patches 3-4: Remove the core {s,g}etsockopt() core function from
__sys_{g,s}etsockopt, so, the code could be reused by other callers, such as
io_uring.

Patch 5: Pass compat mode to the file/socket callbacks

Patch 6-7: Move io_uring helpers from io_uring_zerocopy_tx to a generic
io_uring headers. This simplify the test case (last patch). Also copy the
io_uring UAPI to the tests directory.

Patch 8: Protect io_uring_cmd_sock() to not be called if CONFIG_NET is
disabled.

These changes were tested with a new test[1] in liburing, LTP sockopt*
tests, as also with bpf/progs/sockopt test case, which is now adapted to
run using both system calls and io_uring commands.

[1] Link: https://github.com/leitao/liburing/blob/getsock/test/socket-getsetsock-cmd.c

RFC -> V1:
	* Copy user memory at io_uring subsystem, and call proto_ops
	  callbacks using kernel memory
	* Implement all the cases for SOCKET_URING_OP_SETSOCKOPT

V1 -> V2
	* Implemented the BPF part
	* Using user pointers from optval to avoid kmalloc in io_uring part.

V2 -> V3:
	* Break down __sys_setsockopt and reuse the core code, avoiding
	  duplicated code. This removed the requirement to expose
	  sock_use_custom_sol_socket().
	* Added io_uring test to selftests/bpf/sockopt.
	* Fixed compat argument, by passing it to the issue_flags.

V3 -> V4:
	* Rebase on top of commit 1ded5e5a5931b ("net: annotate data-races around sock->ops")
	* Also broke down __sys_setsockopt() to reuse the core function
	  from io_uring.
	* Create a new patch to return -EOPNOTSUPP if CONFIG_NET is
	  disabled.
	* Added two SOL_SOCKET tests in bpf/prog_tests/sockopt.

V4 -> V5:
	* Do not use sockptr anymore, by changing the optlen getsock argument
	  to be a user pointer (instead of a kernel pointer). This change also drop
	  the limitation on getsockopt from previous versions, and now all
	  levels are supported.
	* Simplified the BPF sockopt test, since there is no more limitation on
	  the io_uring commands.
	* No more changes in the BPF subsystem.
	* Moved the optlen field in the SQE struct. It is now a pointer instead
	  of u32.

V5 -> V6:
	* Removed the need for #ifdef CONFIG_NET as suggested by Gabriel
	  Krisman.
	* Changed the variable declaration order to respect the reverse
	  xmas declaration as suggested by Paolo Abeni.

V6 -> V7:
	* Changed the optlen back to a value in the SQE instead of
	  user-pointer. This is similar to version 4.
	  [https://lore.kernel.org/all/20231009095518.288a5573@kernel.org/]
	* Imported the io_uring.h into tools/include/uapi/linux to be able to
	  run the tests in machines without liburing.
	  [https://lore.kernel.org/all/77405214-ae42-d58b-1d40-c639683a0cb1@linux.dev/]

Breno Leitao (11):
  bpf: Leverage sockptr_t in BPF getsockopt hook
  bpf: Leverage sockptr_t in BPF setsockopt hook
  net/socket: Break down __sys_setsockopt
  net/socket: Break down __sys_getsockopt
  io_uring/cmd: Pass compat mode in issue_flags
  tools headers: Grab copy of io_uring.h
  selftests/net: Extract uring helpers to be reusable
  io_uring/cmd: return -EOPNOTSUPP if net is disabled
  io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
  selftests/bpf/sockopt: Add io_uring support
Breno Leitao (11):
  bpf: Add sockptr support for getsockopt
  bpf: Add sockptr support for setsockopt
  net/socket: Break down __sys_setsockopt
  net/socket: Break down __sys_getsockopt
  io_uring/cmd: Pass compat mode in issue_flags
  tools headers: Grab copy of io_uring.h
  selftests/net: Extract uring helpers to be reusable
  io_uring/cmd: return -EOPNOTSUPP if net is disabled
  io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
  selftests/bpf/sockopt: Add io_uring support

 include/linux/bpf-cgroup.h                    |   9 +-
 include/linux/io_uring.h                      |   1 +
 include/net/sock.h                            |   6 +-
 include/uapi/linux/io_uring.h                 |   8 +
 io_uring/uring_cmd.c                          |  53 ++
 kernel/bpf/cgroup.c                           |  25 +-
 net/core/sock.c                               |   8 -
 net/socket.c                                  | 103 ++-
 tools/include/io_uring/mini_liburing.h        | 282 +++++++
 tools/include/uapi/linux/io_uring.h           | 757 ++++++++++++++++++
 .../selftests/bpf/prog_tests/sockopt.c        | 113 ++-
 tools/testing/selftests/net/Makefile          |   1 +
 .../selftests/net/io_uring_zerocopy_tx.c      | 268 +------
 13 files changed, 1300 insertions(+), 334 deletions(-)
 create mode 100644 tools/include/io_uring/mini_liburing.h
 create mode 100644 tools/include/uapi/linux/io_uring.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v7 01/11] bpf: Add sockptr support for getsockopt
  2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
  2023-10-19 19:24   ` Martin KaFai Lau
  2023-10-16 13:47 ` [PATCH v7 02/11] bpf: Add sockptr support for setsockopt Breno Leitao
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, krisman, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet
  Cc: bpf, linux-kernel, netdev, io-uring

The whole network stack uses sockptr, and while it doesn't move to
something more modern, let's use sockptr in getsockptr BPF hooks, so, it
could be used by other callers.

The main motivation for this change is to use it in the io_uring
{g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
kernel value for optlen.

Link: https://lore.kernel.org/all/ZSArfLaaGcfd8LH8@gmail.com/

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/bpf-cgroup.h |  5 +++--
 kernel/bpf/cgroup.c        | 20 +++++++++++---------
 net/socket.c               |  5 +++--
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 98b8cea904fe..7b55844f6ba7 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -145,9 +145,10 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level,
 				       int *optname, char __user *optval,
 				       int *optlen, char **kernel_optval);
+
 int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
-				       int optname, char __user *optval,
-				       int __user *optlen, int max_optlen,
+				       int optname, sockptr_t optval,
+				       sockptr_t optlen, int max_optlen,
 				       int retval);
 
 int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 74ad2215e1ba..97745f67ac15 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1890,8 +1890,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 }
 
 int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
-				       int optname, char __user *optval,
-				       int __user *optlen, int max_optlen,
+				       int optname, sockptr_t optval,
+				       sockptr_t optlen, int max_optlen,
 				       int retval)
 {
 	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
@@ -1918,8 +1918,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		 * one that kernel returned as well to let
 		 * BPF programs inspect the value.
 		 */
-
-		if (get_user(ctx.optlen, optlen)) {
+		if (copy_from_sockptr(&ctx.optlen, optlen,
+				      sizeof(ctx.optlen))) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -1930,8 +1930,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		}
 		orig_optlen = ctx.optlen;
 
-		if (copy_from_user(ctx.optval, optval,
-				   min(ctx.optlen, max_optlen)) != 0) {
+		if (copy_from_sockptr(ctx.optval, optval,
+				      min(ctx.optlen, max_optlen))) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -1945,7 +1945,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 	if (ret < 0)
 		goto out;
 
-	if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
+	if (!sockptr_is_null(optval) &&
+	    (ctx.optlen > max_optlen || ctx.optlen < 0)) {
 		if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) {
 			pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n",
 				     ctx.optlen, max_optlen);
@@ -1957,11 +1958,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 	}
 
 	if (ctx.optlen != 0) {
-		if (optval && copy_to_user(optval, ctx.optval, ctx.optlen)) {
+		if (!sockptr_is_null(optval) &&
+		    copy_to_sockptr(optval, ctx.optval, ctx.optlen)) {
 			ret = -EFAULT;
 			goto out;
 		}
-		if (put_user(ctx.optlen, optlen)) {
+		if (copy_to_sockptr(optlen, &ctx.optlen, sizeof(ctx.optlen))) {
 			ret = -EFAULT;
 			goto out;
 		}
diff --git a/net/socket.c b/net/socket.c
index 5740475e084c..6b47dd499218 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2373,8 +2373,9 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 
 	if (!in_compat_syscall())
 		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
-						     optval, optlen, max_optlen,
-						     err);
+						     USER_SOCKPTR(optval),
+						     USER_SOCKPTR(optlen),
+						     max_optlen, err);
 out_put:
 	fput_light(sock->file, fput_needed);
 	return err;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 02/11] bpf: Add sockptr support for setsockopt
  2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
  2023-10-16 13:47 ` [PATCH v7 01/11] bpf: Add sockptr support for getsockopt Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
  2023-10-19 19:32   ` Martin KaFai Lau
  2023-10-16 13:47 ` [PATCH v7 03/11] net/socket: Break down __sys_setsockopt Breno Leitao
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, krisman, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet
  Cc: bpf, linux-kernel, netdev, io-uring

The whole network stack uses sockptr, and while it doesn't move to
something more modern, let's use sockptr in setsockptr BPF hooks, so, it
could be used by other callers.

The main motivation for this change is to use it in the io_uring
{g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
kernel value for optlen.

Link: https://lore.kernel.org/all/ZSArfLaaGcfd8LH8@gmail.com/

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/bpf-cgroup.h | 2 +-
 kernel/bpf/cgroup.c        | 5 +++--
 net/socket.c               | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 7b55844f6ba7..2912dce9144e 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -143,7 +143,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 				   enum cgroup_bpf_attach_type atype);
 
 int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level,
-				       int *optname, char __user *optval,
+				       int *optname, sockptr_t optval,
 				       int *optlen, char **kernel_optval);
 
 int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 97745f67ac15..491d20038cbe 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1800,7 +1800,7 @@ static bool sockopt_buf_allocated(struct bpf_sockopt_kern *ctx,
 }
 
 int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
-				       int *optname, char __user *optval,
+				       int *optname, sockptr_t optval,
 				       int *optlen, char **kernel_optval)
 {
 	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
@@ -1823,7 +1823,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 
 	ctx.optlen = *optlen;
 
-	if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) {
+	if (copy_from_sockptr(ctx.optval, optval,
+			      min(*optlen, max_optlen))) {
 		ret = -EFAULT;
 		goto out;
 	}
diff --git a/net/socket.c b/net/socket.c
index 6b47dd499218..28d3eb339514 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2305,7 +2305,7 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
 
 	if (!in_compat_syscall())
 		err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname,
-						     user_optval, &optlen,
+						     optval, &optlen,
 						     &kernel_optval);
 	if (err < 0)
 		goto out_put;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 03/11] net/socket: Break down __sys_setsockopt
  2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
  2023-10-16 13:47 ` [PATCH v7 01/11] bpf: Add sockptr support for getsockopt Breno Leitao
  2023-10-16 13:47 ` [PATCH v7 02/11] bpf: Add sockptr support for setsockopt Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
  2023-10-19  0:20   ` Jakub Kicinski
  2023-10-19 19:38   ` Martin KaFai Lau
  2023-10-16 13:47 ` [PATCH v7 04/11] net/socket: Break down __sys_getsockopt Breno Leitao
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 30+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, krisman, David S. Miller, Eric Dumazet
  Cc: bpf, linux-kernel, netdev, io-uring, Willem de Bruijn

Split __sys_setsockopt() into two functions by removing the core
logic into a sub-function (do_sock_setsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.

do_sock_setsockopt() will be called by io_uring setsockopt() command
operation in the following patch.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
 include/net/sock.h |  2 ++
 net/socket.c       | 39 +++++++++++++++++++++++++--------------
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 242590308d64..00103e3143c4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1864,6 +1864,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
 		  sockptr_t optval, unsigned int optlen);
 int sock_setsockopt(struct socket *sock, int level, int op,
 		    sockptr_t optval, unsigned int optlen);
+int do_sock_setsockopt(struct socket *sock, bool compat, int level,
+		       int optname, sockptr_t optval, int optlen);
 
 int sk_getsockopt(struct sock *sk, int level, int optname,
 		  sockptr_t optval, sockptr_t optlen);
diff --git a/net/socket.c b/net/socket.c
index 28d3eb339514..0087f8c071e7 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2279,31 +2279,21 @@ static bool sock_use_custom_sol_socket(const struct socket *sock)
 	return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
 }
 
-/*
- *	Set a socket option. Because we don't know the option lengths we have
- *	to pass the user mode parameter for the protocols to sort out.
- */
-int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
-		int optlen)
+int do_sock_setsockopt(struct socket *sock, bool compat, int level,
+		       int optname, sockptr_t optval, int optlen)
 {
-	sockptr_t optval = USER_SOCKPTR(user_optval);
 	const struct proto_ops *ops;
 	char *kernel_optval = NULL;
-	int err, fput_needed;
-	struct socket *sock;
+	int err;
 
 	if (optlen < 0)
 		return -EINVAL;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
-
 	err = security_socket_setsockopt(sock, level, optname);
 	if (err)
 		goto out_put;
 
-	if (!in_compat_syscall())
+	if (!compat)
 		err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname,
 						     optval, &optlen,
 						     &kernel_optval);
@@ -2326,6 +2316,27 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
 					    optlen);
 	kfree(kernel_optval);
 out_put:
+	return err;
+}
+EXPORT_SYMBOL(do_sock_setsockopt);
+
+/* Set a socket option. Because we don't know the option lengths we have
+ * to pass the user mode parameter for the protocols to sort out.
+ */
+int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
+		     int optlen)
+{
+	sockptr_t optval = USER_SOCKPTR(user_optval);
+	bool compat = in_compat_syscall();
+	int err, fput_needed;
+	struct socket *sock;
+
+	sock = sockfd_lookup_light(fd, &err, &fput_needed);
+	if (!sock)
+		return err;
+
+	err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
+
 	fput_light(sock->file, fput_needed);
 	return err;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 04/11] net/socket: Break down __sys_getsockopt
  2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (2 preceding siblings ...)
  2023-10-16 13:47 ` [PATCH v7 03/11] net/socket: Break down __sys_setsockopt Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
  2023-10-19  0:20   ` Jakub Kicinski
  2023-10-19 19:12   ` Martin KaFai Lau
  2023-10-16 13:47 ` [PATCH v7 05/11] io_uring/cmd: Pass compat mode in issue_flags Breno Leitao
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 30+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, krisman, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Song Liu, Yonghong Song,
	KP Singh, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet
  Cc: bpf, linux-kernel, netdev, io-uring, Kuniyuki Iwashima,
	Alexander Mikhalitsyn, David Howells

Split __sys_getsockopt() into two functions by removing the core
logic into a sub-function (do_sock_getsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.

do_sock_getsockopt() will be called by io_uring getsockopt() command
operation in the following patch.

The same was done for the setsockopt pair.

Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/bpf-cgroup.h |  2 +-
 include/net/sock.h         |  4 +--
 net/core/sock.c            |  8 -----
 net/socket.c               | 63 ++++++++++++++++++++++++--------------
 4 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 2912dce9144e..a789266feac3 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -393,7 +393,7 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
 ({									       \
 	int __ret = 0;							       \
 	if (cgroup_bpf_enabled(CGROUP_GETSOCKOPT))			       \
-		get_user(__ret, optlen);				       \
+		copy_from_sockptr(&__ret, optlen, sizeof(int));		       \
 	__ret;								       \
 })
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 00103e3143c4..1d6931caf0c3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1866,11 +1866,11 @@ int sock_setsockopt(struct socket *sock, int level, int op,
 		    sockptr_t optval, unsigned int optlen);
 int do_sock_setsockopt(struct socket *sock, bool compat, int level,
 		       int optname, sockptr_t optval, int optlen);
+int do_sock_getsockopt(struct socket *sock, bool compat, int level,
+		       int optname, sockptr_t optval, sockptr_t optlen);
 
 int sk_getsockopt(struct sock *sk, int level, int optname,
 		  sockptr_t optval, sockptr_t optlen);
-int sock_getsockopt(struct socket *sock, int level, int op,
-		    char __user *optval, int __user *optlen);
 int sock_gettstamp(struct socket *sock, void __user *userstamp,
 		   bool timeval, bool time32);
 struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
diff --git a/net/core/sock.c b/net/core/sock.c
index 290165954379..d4cb8d6e75b7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2003,14 +2003,6 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 	return 0;
 }
 
-int sock_getsockopt(struct socket *sock, int level, int optname,
-		    char __user *optval, int __user *optlen)
-{
-	return sk_getsockopt(sock->sk, level, optname,
-			     USER_SOCKPTR(optval),
-			     USER_SOCKPTR(optlen));
-}
-
 /*
  * Initialize an sk_lock.
  *
diff --git a/net/socket.c b/net/socket.c
index 0087f8c071e7..f4c156a1987e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2350,6 +2350,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
 INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
 							 int optname));
 
+int do_sock_getsockopt(struct socket *sock, bool compat, int level,
+		       int optname, sockptr_t optval, sockptr_t optlen)
+{
+	int max_optlen __maybe_unused;
+	const struct proto_ops *ops;
+	int err;
+
+	err = security_socket_getsockopt(sock, level, optname);
+	if (err)
+		return err;
+
+	ops = READ_ONCE(sock->ops);
+	if (level == SOL_SOCKET) {
+		err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
+	} else if (unlikely(!ops->getsockopt)) {
+		err = -EOPNOTSUPP;
+	} else {
+		if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
+			      "Invalid argument type"))
+			return -EOPNOTSUPP;
+
+		err = ops->getsockopt(sock, level, optname, optval.user,
+				      optlen.user);
+	}
+
+	if (!compat) {
+		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
+						     optval, optlen, max_optlen,
+						     err);
+	}
+
+	return err;
+}
+EXPORT_SYMBOL(do_sock_getsockopt);
+
 /*
  *	Get a socket option. Because we don't know the option lengths we have
  *	to pass a user mode parameter for the protocols to sort out.
@@ -2357,37 +2393,18 @@ INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
 int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 		int __user *optlen)
 {
-	int max_optlen __maybe_unused;
-	const struct proto_ops *ops;
 	int err, fput_needed;
 	struct socket *sock;
+	bool compat;
 
 	sock = sockfd_lookup_light(fd, &err, &fput_needed);
 	if (!sock)
 		return err;
 
-	err = security_socket_getsockopt(sock, level, optname);
-	if (err)
-		goto out_put;
-
-	if (!in_compat_syscall())
-		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+	compat = in_compat_syscall();
+	err = do_sock_getsockopt(sock, compat, level, optname,
+				 USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
 
-	ops = READ_ONCE(sock->ops);
-	if (level == SOL_SOCKET)
-		err = sock_getsockopt(sock, level, optname, optval, optlen);
-	else if (unlikely(!ops->getsockopt))
-		err = -EOPNOTSUPP;
-	else
-		err = ops->getsockopt(sock, level, optname, optval,
-					    optlen);
-
-	if (!in_compat_syscall())
-		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
-						     USER_SOCKPTR(optval),
-						     USER_SOCKPTR(optlen),
-						     max_optlen, err);
-out_put:
 	fput_light(sock->file, fput_needed);
 	return err;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 05/11] io_uring/cmd: Pass compat mode in issue_flags
  2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (3 preceding siblings ...)
  2023-10-16 13:47 ` [PATCH v7 04/11] net/socket: Break down __sys_getsockopt Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
  2023-10-16 13:47 ` [PATCH v7 06/11] tools headers: Grab copy of io_uring.h Breno Leitao
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, krisman
  Cc: bpf, linux-kernel, netdev, io-uring

Create a new flag to track if the operation is running compat mode.
This basically check the context->compat and pass it to the issue_flags,
so, it could be queried later in the callbacks.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
---
 include/linux/io_uring.h | 1 +
 io_uring/uring_cmd.c     | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index b4391e0a9bc8..aefb73eeeebf 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -23,6 +23,7 @@ enum io_uring_cmd_flags {
 
 	/* set when uring wants to cancel a previously issued command */
 	IO_URING_F_CANCEL		= (1 << 11),
+	IO_URING_F_COMPAT		= (1 << 12),
 };
 
 /* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 00a5e5621a28..4bedd633c08c 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -175,6 +175,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 		issue_flags |= IO_URING_F_SQE128;
 	if (ctx->flags & IORING_SETUP_CQE32)
 		issue_flags |= IO_URING_F_CQE32;
+	if (ctx->compat)
+		issue_flags |= IO_URING_F_COMPAT;
 	if (ctx->flags & IORING_SETUP_IOPOLL) {
 		if (!file->f_op->uring_cmd_iopoll)
 			return -EOPNOTSUPP;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 06/11] tools headers: Grab copy of io_uring.h
  2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (4 preceding siblings ...)
  2023-10-16 13:47 ` [PATCH v7 05/11] io_uring/cmd: Pass compat mode in issue_flags Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
  2023-10-16 18:56   ` Gabriel Krisman Bertazi
       [not found]   ` <652d877c.250a0220.b0af2.3a66SMTPIN_ADDED_BROKEN@mx.google.com>
  2023-10-16 13:47 ` [PATCH v7 07/11] selftests/net: Extract uring helpers to be reusable Breno Leitao
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 30+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, krisman
  Cc: bpf, linux-kernel, netdev, io-uring, Peter Zijlstra (Intel),
	Stefan Metzmacher, Josh Triplett

This file will be used by mini_uring.h and allow tests to run without
the need of installing liburing to run the tests.

This is needed to run io_uring tests in BPF, such as
(tools/testing/selftests/bpf/prog_tests/sockopt.c).

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 tools/include/uapi/linux/io_uring.h | 757 ++++++++++++++++++++++++++++
 1 file changed, 757 insertions(+)
 create mode 100644 tools/include/uapi/linux/io_uring.h

diff --git a/tools/include/uapi/linux/io_uring.h b/tools/include/uapi/linux/io_uring.h
new file mode 100644
index 000000000000..f1c16f817742
--- /dev/null
+++ b/tools/include/uapi/linux/io_uring.h
@@ -0,0 +1,757 @@
+/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */
+/*
+ * Header file for the io_uring interface.
+ *
+ * Copyright (C) 2019 Jens Axboe
+ * Copyright (C) 2019 Christoph Hellwig
+ */
+#ifndef LINUX_IO_URING_H
+#define LINUX_IO_URING_H
+
+#include <linux/fs.h>
+#include <linux/types.h>
+/*
+ * this file is shared with liburing and that has to autodetect
+ * if linux/time_types.h is available or not, it can
+ * define UAPI_LINUX_IO_URING_H_SKIP_LINUX_TIME_TYPES_H
+ * if linux/time_types.h is not available
+ */
+#ifndef UAPI_LINUX_IO_URING_H_SKIP_LINUX_TIME_TYPES_H
+#include <linux/time_types.h>
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * IO submission data structure (Submission Queue Entry)
+ */
+struct io_uring_sqe {
+	__u8	opcode;		/* type of operation for this sqe */
+	__u8	flags;		/* IOSQE_ flags */
+	__u16	ioprio;		/* ioprio for the request */
+	__s32	fd;		/* file descriptor to do IO on */
+	union {
+		__u64	off;	/* offset into file */
+		__u64	addr2;
+		struct {
+			__u32	cmd_op;
+			__u32	__pad1;
+		};
+	};
+	union {
+		__u64	addr;	/* pointer to buffer or iovecs */
+		__u64	splice_off_in;
+		struct {
+			__u32	level;
+			__u32	optname;
+		};
+	};
+	__u32	len;		/* buffer size or number of iovecs */
+	union {
+		__kernel_rwf_t	rw_flags;
+		__u32		fsync_flags;
+		__u16		poll_events;	/* compatibility */
+		__u32		poll32_events;	/* word-reversed for BE */
+		__u32		sync_range_flags;
+		__u32		msg_flags;
+		__u32		timeout_flags;
+		__u32		accept_flags;
+		__u32		cancel_flags;
+		__u32		open_flags;
+		__u32		statx_flags;
+		__u32		fadvise_advice;
+		__u32		splice_flags;
+		__u32		rename_flags;
+		__u32		unlink_flags;
+		__u32		hardlink_flags;
+		__u32		xattr_flags;
+		__u32		msg_ring_flags;
+		__u32		uring_cmd_flags;
+		__u32		waitid_flags;
+		__u32		futex_flags;
+	};
+	__u64	user_data;	/* data to be passed back at completion time */
+	/* pack this to avoid bogus arm OABI complaints */
+	union {
+		/* index into fixed buffers, if used */
+		__u16	buf_index;
+		/* for grouped buffer selection */
+		__u16	buf_group;
+	} __attribute__((packed));
+	/* personality to use, if used */
+	__u16	personality;
+	union {
+		__s32	splice_fd_in;
+		__u32	file_index;
+		__u32	optlen;
+		struct {
+			__u16	addr_len;
+			__u16	__pad3[1];
+		};
+	};
+	union {
+		struct {
+			__u64	addr3;
+			__u64	__pad2[1];
+		};
+		__u64	optval;
+		/*
+		 * If the ring is initialized with IORING_SETUP_SQE128, then
+		 * this field is used for 80 bytes of arbitrary command data
+		 */
+		__u8	cmd[0];
+	};
+};
+
+/*
+ * If sqe->file_index is set to this for opcodes that instantiate a new
+ * direct descriptor (like openat/openat2/accept), then io_uring will allocate
+ * an available direct descriptor instead of having the application pass one
+ * in. The picked direct descriptor will be returned in cqe->res, or -ENFILE
+ * if the space is full.
+ */
+#define IORING_FILE_INDEX_ALLOC		(~0U)
+
+enum {
+	IOSQE_FIXED_FILE_BIT,
+	IOSQE_IO_DRAIN_BIT,
+	IOSQE_IO_LINK_BIT,
+	IOSQE_IO_HARDLINK_BIT,
+	IOSQE_ASYNC_BIT,
+	IOSQE_BUFFER_SELECT_BIT,
+	IOSQE_CQE_SKIP_SUCCESS_BIT,
+};
+
+/*
+ * sqe->flags
+ */
+/* use fixed fileset */
+#define IOSQE_FIXED_FILE	(1U << IOSQE_FIXED_FILE_BIT)
+/* issue after inflight IO */
+#define IOSQE_IO_DRAIN		(1U << IOSQE_IO_DRAIN_BIT)
+/* links next sqe */
+#define IOSQE_IO_LINK		(1U << IOSQE_IO_LINK_BIT)
+/* like LINK, but stronger */
+#define IOSQE_IO_HARDLINK	(1U << IOSQE_IO_HARDLINK_BIT)
+/* always go async */
+#define IOSQE_ASYNC		(1U << IOSQE_ASYNC_BIT)
+/* select buffer from sqe->buf_group */
+#define IOSQE_BUFFER_SELECT	(1U << IOSQE_BUFFER_SELECT_BIT)
+/* don't post CQE if request succeeded */
+#define IOSQE_CQE_SKIP_SUCCESS	(1U << IOSQE_CQE_SKIP_SUCCESS_BIT)
+
+/*
+ * io_uring_setup() flags
+ */
+#define IORING_SETUP_IOPOLL	(1U << 0)	/* io_context is polled */
+#define IORING_SETUP_SQPOLL	(1U << 1)	/* SQ poll thread */
+#define IORING_SETUP_SQ_AFF	(1U << 2)	/* sq_thread_cpu is valid */
+#define IORING_SETUP_CQSIZE	(1U << 3)	/* app defines CQ size */
+#define IORING_SETUP_CLAMP	(1U << 4)	/* clamp SQ/CQ ring sizes */
+#define IORING_SETUP_ATTACH_WQ	(1U << 5)	/* attach to existing wq */
+#define IORING_SETUP_R_DISABLED	(1U << 6)	/* start with ring disabled */
+#define IORING_SETUP_SUBMIT_ALL	(1U << 7)	/* continue submit on error */
+/*
+ * Cooperative task running. When requests complete, they often require
+ * forcing the submitter to transition to the kernel to complete. If this
+ * flag is set, work will be done when the task transitions anyway, rather
+ * than force an inter-processor interrupt reschedule. This avoids interrupting
+ * a task running in userspace, and saves an IPI.
+ */
+#define IORING_SETUP_COOP_TASKRUN	(1U << 8)
+/*
+ * If COOP_TASKRUN is set, get notified if task work is available for
+ * running and a kernel transition would be needed to run it. This sets
+ * IORING_SQ_TASKRUN in the sq ring flags. Not valid with COOP_TASKRUN.
+ */
+#define IORING_SETUP_TASKRUN_FLAG	(1U << 9)
+#define IORING_SETUP_SQE128		(1U << 10) /* SQEs are 128 byte */
+#define IORING_SETUP_CQE32		(1U << 11) /* CQEs are 32 byte */
+/*
+ * Only one task is allowed to submit requests
+ */
+#define IORING_SETUP_SINGLE_ISSUER	(1U << 12)
+
+/*
+ * Defer running task work to get events.
+ * Rather than running bits of task work whenever the task transitions
+ * try to do it just before it is needed.
+ */
+#define IORING_SETUP_DEFER_TASKRUN	(1U << 13)
+
+/*
+ * Application provides the memory for the rings
+ */
+#define IORING_SETUP_NO_MMAP		(1U << 14)
+
+/*
+ * Register the ring fd in itself for use with
+ * IORING_REGISTER_USE_REGISTERED_RING; return a registered fd index rather
+ * than an fd.
+ */
+#define IORING_SETUP_REGISTERED_FD_ONLY	(1U << 15)
+
+/*
+ * Removes indirection through the SQ index array.
+ */
+#define IORING_SETUP_NO_SQARRAY		(1U << 16)
+
+enum io_uring_op {
+	IORING_OP_NOP,
+	IORING_OP_READV,
+	IORING_OP_WRITEV,
+	IORING_OP_FSYNC,
+	IORING_OP_READ_FIXED,
+	IORING_OP_WRITE_FIXED,
+	IORING_OP_POLL_ADD,
+	IORING_OP_POLL_REMOVE,
+	IORING_OP_SYNC_FILE_RANGE,
+	IORING_OP_SENDMSG,
+	IORING_OP_RECVMSG,
+	IORING_OP_TIMEOUT,
+	IORING_OP_TIMEOUT_REMOVE,
+	IORING_OP_ACCEPT,
+	IORING_OP_ASYNC_CANCEL,
+	IORING_OP_LINK_TIMEOUT,
+	IORING_OP_CONNECT,
+	IORING_OP_FALLOCATE,
+	IORING_OP_OPENAT,
+	IORING_OP_CLOSE,
+	IORING_OP_FILES_UPDATE,
+	IORING_OP_STATX,
+	IORING_OP_READ,
+	IORING_OP_WRITE,
+	IORING_OP_FADVISE,
+	IORING_OP_MADVISE,
+	IORING_OP_SEND,
+	IORING_OP_RECV,
+	IORING_OP_OPENAT2,
+	IORING_OP_EPOLL_CTL,
+	IORING_OP_SPLICE,
+	IORING_OP_PROVIDE_BUFFERS,
+	IORING_OP_REMOVE_BUFFERS,
+	IORING_OP_TEE,
+	IORING_OP_SHUTDOWN,
+	IORING_OP_RENAMEAT,
+	IORING_OP_UNLINKAT,
+	IORING_OP_MKDIRAT,
+	IORING_OP_SYMLINKAT,
+	IORING_OP_LINKAT,
+	IORING_OP_MSG_RING,
+	IORING_OP_FSETXATTR,
+	IORING_OP_SETXATTR,
+	IORING_OP_FGETXATTR,
+	IORING_OP_GETXATTR,
+	IORING_OP_SOCKET,
+	IORING_OP_URING_CMD,
+	IORING_OP_SEND_ZC,
+	IORING_OP_SENDMSG_ZC,
+	IORING_OP_READ_MULTISHOT,
+	IORING_OP_WAITID,
+	IORING_OP_FUTEX_WAIT,
+	IORING_OP_FUTEX_WAKE,
+	IORING_OP_FUTEX_WAITV,
+
+	/* this goes last, obviously */
+	IORING_OP_LAST,
+};
+
+/*
+ * sqe->uring_cmd_flags		top 8bits aren't available for userspace
+ * IORING_URING_CMD_FIXED	use registered buffer; pass this flag
+ *				along with setting sqe->buf_index.
+ */
+#define IORING_URING_CMD_FIXED	(1U << 0)
+#define IORING_URING_CMD_MASK	IORING_URING_CMD_FIXED
+
+
+/*
+ * sqe->fsync_flags
+ */
+#define IORING_FSYNC_DATASYNC	(1U << 0)
+
+/*
+ * sqe->timeout_flags
+ */
+#define IORING_TIMEOUT_ABS		(1U << 0)
+#define IORING_TIMEOUT_UPDATE		(1U << 1)
+#define IORING_TIMEOUT_BOOTTIME		(1U << 2)
+#define IORING_TIMEOUT_REALTIME		(1U << 3)
+#define IORING_LINK_TIMEOUT_UPDATE	(1U << 4)
+#define IORING_TIMEOUT_ETIME_SUCCESS	(1U << 5)
+#define IORING_TIMEOUT_MULTISHOT	(1U << 6)
+#define IORING_TIMEOUT_CLOCK_MASK	(IORING_TIMEOUT_BOOTTIME | IORING_TIMEOUT_REALTIME)
+#define IORING_TIMEOUT_UPDATE_MASK	(IORING_TIMEOUT_UPDATE | IORING_LINK_TIMEOUT_UPDATE)
+/*
+ * sqe->splice_flags
+ * extends splice(2) flags
+ */
+#define SPLICE_F_FD_IN_FIXED	(1U << 31) /* the last bit of __u32 */
+
+/*
+ * POLL_ADD flags. Note that since sqe->poll_events is the flag space, the
+ * command flags for POLL_ADD are stored in sqe->len.
+ *
+ * IORING_POLL_ADD_MULTI	Multishot poll. Sets IORING_CQE_F_MORE if
+ *				the poll handler will continue to report
+ *				CQEs on behalf of the same SQE.
+ *
+ * IORING_POLL_UPDATE		Update existing poll request, matching
+ *				sqe->addr as the old user_data field.
+ *
+ * IORING_POLL_LEVEL		Level triggered poll.
+ */
+#define IORING_POLL_ADD_MULTI	(1U << 0)
+#define IORING_POLL_UPDATE_EVENTS	(1U << 1)
+#define IORING_POLL_UPDATE_USER_DATA	(1U << 2)
+#define IORING_POLL_ADD_LEVEL		(1U << 3)
+
+/*
+ * ASYNC_CANCEL flags.
+ *
+ * IORING_ASYNC_CANCEL_ALL	Cancel all requests that match the given key
+ * IORING_ASYNC_CANCEL_FD	Key off 'fd' for cancelation rather than the
+ *				request 'user_data'
+ * IORING_ASYNC_CANCEL_ANY	Match any request
+ * IORING_ASYNC_CANCEL_FD_FIXED	'fd' passed in is a fixed descriptor
+ * IORING_ASYNC_CANCEL_USERDATA	Match on user_data, default for no other key
+ * IORING_ASYNC_CANCEL_OP	Match request based on opcode
+ */
+#define IORING_ASYNC_CANCEL_ALL	(1U << 0)
+#define IORING_ASYNC_CANCEL_FD	(1U << 1)
+#define IORING_ASYNC_CANCEL_ANY	(1U << 2)
+#define IORING_ASYNC_CANCEL_FD_FIXED	(1U << 3)
+#define IORING_ASYNC_CANCEL_USERDATA	(1U << 4)
+#define IORING_ASYNC_CANCEL_OP	(1U << 5)
+
+/*
+ * send/sendmsg and recv/recvmsg flags (sqe->ioprio)
+ *
+ * IORING_RECVSEND_POLL_FIRST	If set, instead of first attempting to send
+ *				or receive and arm poll if that yields an
+ *				-EAGAIN result, arm poll upfront and skip
+ *				the initial transfer attempt.
+ *
+ * IORING_RECV_MULTISHOT	Multishot recv. Sets IORING_CQE_F_MORE if
+ *				the handler will continue to report
+ *				CQEs on behalf of the same SQE.
+ *
+ * IORING_RECVSEND_FIXED_BUF	Use registered buffers, the index is stored in
+ *				the buf_index field.
+ *
+ * IORING_SEND_ZC_REPORT_USAGE
+ *				If set, SEND[MSG]_ZC should report
+ *				the zerocopy usage in cqe.res
+ *				for the IORING_CQE_F_NOTIF cqe.
+ *				0 is reported if zerocopy was actually possible.
+ *				IORING_NOTIF_USAGE_ZC_COPIED if data was copied
+ *				(at least partially).
+ */
+#define IORING_RECVSEND_POLL_FIRST	(1U << 0)
+#define IORING_RECV_MULTISHOT		(1U << 1)
+#define IORING_RECVSEND_FIXED_BUF	(1U << 2)
+#define IORING_SEND_ZC_REPORT_USAGE	(1U << 3)
+
+/*
+ * cqe.res for IORING_CQE_F_NOTIF if
+ * IORING_SEND_ZC_REPORT_USAGE was requested
+ *
+ * It should be treated as a flag, all other
+ * bits of cqe.res should be treated as reserved!
+ */
+#define IORING_NOTIF_USAGE_ZC_COPIED    (1U << 31)
+
+/*
+ * accept flags stored in sqe->ioprio
+ */
+#define IORING_ACCEPT_MULTISHOT	(1U << 0)
+
+/*
+ * IORING_OP_MSG_RING command types, stored in sqe->addr
+ */
+enum {
+	IORING_MSG_DATA,	/* pass sqe->len as 'res' and off as user_data */
+	IORING_MSG_SEND_FD,	/* send a registered fd to another ring */
+};
+
+/*
+ * IORING_OP_MSG_RING flags (sqe->msg_ring_flags)
+ *
+ * IORING_MSG_RING_CQE_SKIP	Don't post a CQE to the target ring. Not
+ *				applicable for IORING_MSG_DATA, obviously.
+ */
+#define IORING_MSG_RING_CQE_SKIP	(1U << 0)
+/* Pass through the flags from sqe->file_index to cqe->flags */
+#define IORING_MSG_RING_FLAGS_PASS	(1U << 1)
+
+/*
+ * IO completion data structure (Completion Queue Entry)
+ */
+struct io_uring_cqe {
+	__u64	user_data;	/* sqe->data submission passed back */
+	__s32	res;		/* result code for this event */
+	__u32	flags;
+
+	/*
+	 * If the ring is initialized with IORING_SETUP_CQE32, then this field
+	 * contains 16-bytes of padding, doubling the size of the CQE.
+	 */
+	__u64 big_cqe[];
+};
+
+/*
+ * cqe->flags
+ *
+ * IORING_CQE_F_BUFFER	If set, the upper 16 bits are the buffer ID
+ * IORING_CQE_F_MORE	If set, parent SQE will generate more CQE entries
+ * IORING_CQE_F_SOCK_NONEMPTY	If set, more data to read after socket recv
+ * IORING_CQE_F_NOTIF	Set for notification CQEs. Can be used to distinct
+ * 			them from sends.
+ */
+#define IORING_CQE_F_BUFFER		(1U << 0)
+#define IORING_CQE_F_MORE		(1U << 1)
+#define IORING_CQE_F_SOCK_NONEMPTY	(1U << 2)
+#define IORING_CQE_F_NOTIF		(1U << 3)
+
+enum {
+	IORING_CQE_BUFFER_SHIFT		= 16,
+};
+
+/*
+ * Magic offsets for the application to mmap the data it needs
+ */
+#define IORING_OFF_SQ_RING		0ULL
+#define IORING_OFF_CQ_RING		0x8000000ULL
+#define IORING_OFF_SQES			0x10000000ULL
+#define IORING_OFF_PBUF_RING		0x80000000ULL
+#define IORING_OFF_PBUF_SHIFT		16
+#define IORING_OFF_MMAP_MASK		0xf8000000ULL
+
+/*
+ * Filled with the offset for mmap(2)
+ */
+struct io_sqring_offsets {
+	__u32 head;
+	__u32 tail;
+	__u32 ring_mask;
+	__u32 ring_entries;
+	__u32 flags;
+	__u32 dropped;
+	__u32 array;
+	__u32 resv1;
+	__u64 user_addr;
+};
+
+/*
+ * sq_ring->flags
+ */
+#define IORING_SQ_NEED_WAKEUP	(1U << 0) /* needs io_uring_enter wakeup */
+#define IORING_SQ_CQ_OVERFLOW	(1U << 1) /* CQ ring is overflown */
+#define IORING_SQ_TASKRUN	(1U << 2) /* task should enter the kernel */
+
+struct io_cqring_offsets {
+	__u32 head;
+	__u32 tail;
+	__u32 ring_mask;
+	__u32 ring_entries;
+	__u32 overflow;
+	__u32 cqes;
+	__u32 flags;
+	__u32 resv1;
+	__u64 user_addr;
+};
+
+/*
+ * cq_ring->flags
+ */
+
+/* disable eventfd notifications */
+#define IORING_CQ_EVENTFD_DISABLED	(1U << 0)
+
+/*
+ * io_uring_enter(2) flags
+ */
+#define IORING_ENTER_GETEVENTS		(1U << 0)
+#define IORING_ENTER_SQ_WAKEUP		(1U << 1)
+#define IORING_ENTER_SQ_WAIT		(1U << 2)
+#define IORING_ENTER_EXT_ARG		(1U << 3)
+#define IORING_ENTER_REGISTERED_RING	(1U << 4)
+
+/*
+ * Passed in for io_uring_setup(2). Copied back with updated info on success
+ */
+struct io_uring_params {
+	__u32 sq_entries;
+	__u32 cq_entries;
+	__u32 flags;
+	__u32 sq_thread_cpu;
+	__u32 sq_thread_idle;
+	__u32 features;
+	__u32 wq_fd;
+	__u32 resv[3];
+	struct io_sqring_offsets sq_off;
+	struct io_cqring_offsets cq_off;
+};
+
+/*
+ * io_uring_params->features flags
+ */
+#define IORING_FEAT_SINGLE_MMAP		(1U << 0)
+#define IORING_FEAT_NODROP		(1U << 1)
+#define IORING_FEAT_SUBMIT_STABLE	(1U << 2)
+#define IORING_FEAT_RW_CUR_POS		(1U << 3)
+#define IORING_FEAT_CUR_PERSONALITY	(1U << 4)
+#define IORING_FEAT_FAST_POLL		(1U << 5)
+#define IORING_FEAT_POLL_32BITS 	(1U << 6)
+#define IORING_FEAT_SQPOLL_NONFIXED	(1U << 7)
+#define IORING_FEAT_EXT_ARG		(1U << 8)
+#define IORING_FEAT_NATIVE_WORKERS	(1U << 9)
+#define IORING_FEAT_RSRC_TAGS		(1U << 10)
+#define IORING_FEAT_CQE_SKIP		(1U << 11)
+#define IORING_FEAT_LINKED_FILE		(1U << 12)
+#define IORING_FEAT_REG_REG_RING	(1U << 13)
+
+/*
+ * io_uring_register(2) opcodes and arguments
+ */
+enum {
+	IORING_REGISTER_BUFFERS			= 0,
+	IORING_UNREGISTER_BUFFERS		= 1,
+	IORING_REGISTER_FILES			= 2,
+	IORING_UNREGISTER_FILES			= 3,
+	IORING_REGISTER_EVENTFD			= 4,
+	IORING_UNREGISTER_EVENTFD		= 5,
+	IORING_REGISTER_FILES_UPDATE		= 6,
+	IORING_REGISTER_EVENTFD_ASYNC		= 7,
+	IORING_REGISTER_PROBE			= 8,
+	IORING_REGISTER_PERSONALITY		= 9,
+	IORING_UNREGISTER_PERSONALITY		= 10,
+	IORING_REGISTER_RESTRICTIONS		= 11,
+	IORING_REGISTER_ENABLE_RINGS		= 12,
+
+	/* extended with tagging */
+	IORING_REGISTER_FILES2			= 13,
+	IORING_REGISTER_FILES_UPDATE2		= 14,
+	IORING_REGISTER_BUFFERS2		= 15,
+	IORING_REGISTER_BUFFERS_UPDATE		= 16,
+
+	/* set/clear io-wq thread affinities */
+	IORING_REGISTER_IOWQ_AFF		= 17,
+	IORING_UNREGISTER_IOWQ_AFF		= 18,
+
+	/* set/get max number of io-wq workers */
+	IORING_REGISTER_IOWQ_MAX_WORKERS	= 19,
+
+	/* register/unregister io_uring fd with the ring */
+	IORING_REGISTER_RING_FDS		= 20,
+	IORING_UNREGISTER_RING_FDS		= 21,
+
+	/* register ring based provide buffer group */
+	IORING_REGISTER_PBUF_RING		= 22,
+	IORING_UNREGISTER_PBUF_RING		= 23,
+
+	/* sync cancelation API */
+	IORING_REGISTER_SYNC_CANCEL		= 24,
+
+	/* register a range of fixed file slots for automatic slot allocation */
+	IORING_REGISTER_FILE_ALLOC_RANGE	= 25,
+
+	/* this goes last */
+	IORING_REGISTER_LAST,
+
+	/* flag added to the opcode to use a registered ring fd */
+	IORING_REGISTER_USE_REGISTERED_RING	= 1U << 31
+};
+
+/* io-wq worker categories */
+enum {
+	IO_WQ_BOUND,
+	IO_WQ_UNBOUND,
+};
+
+/* deprecated, see struct io_uring_rsrc_update */
+struct io_uring_files_update {
+	__u32 offset;
+	__u32 resv;
+	__aligned_u64 /* __s32 * */ fds;
+};
+
+/*
+ * Register a fully sparse file space, rather than pass in an array of all
+ * -1 file descriptors.
+ */
+#define IORING_RSRC_REGISTER_SPARSE	(1U << 0)
+
+struct io_uring_rsrc_register {
+	__u32 nr;
+	__u32 flags;
+	__u64 resv2;
+	__aligned_u64 data;
+	__aligned_u64 tags;
+};
+
+struct io_uring_rsrc_update {
+	__u32 offset;
+	__u32 resv;
+	__aligned_u64 data;
+};
+
+struct io_uring_rsrc_update2 {
+	__u32 offset;
+	__u32 resv;
+	__aligned_u64 data;
+	__aligned_u64 tags;
+	__u32 nr;
+	__u32 resv2;
+};
+
+/* Skip updating fd indexes set to this value in the fd table */
+#define IORING_REGISTER_FILES_SKIP	(-2)
+
+#define IO_URING_OP_SUPPORTED	(1U << 0)
+
+struct io_uring_probe_op {
+	__u8 op;
+	__u8 resv;
+	__u16 flags;	/* IO_URING_OP_* flags */
+	__u32 resv2;
+};
+
+struct io_uring_probe {
+	__u8 last_op;	/* last opcode supported */
+	__u8 ops_len;	/* length of ops[] array below */
+	__u16 resv;
+	__u32 resv2[3];
+	struct io_uring_probe_op ops[];
+};
+
+struct io_uring_restriction {
+	__u16 opcode;
+	union {
+		__u8 register_op; /* IORING_RESTRICTION_REGISTER_OP */
+		__u8 sqe_op;      /* IORING_RESTRICTION_SQE_OP */
+		__u8 sqe_flags;   /* IORING_RESTRICTION_SQE_FLAGS_* */
+	};
+	__u8 resv;
+	__u32 resv2[3];
+};
+
+struct io_uring_buf {
+	__u64	addr;
+	__u32	len;
+	__u16	bid;
+	__u16	resv;
+};
+
+struct io_uring_buf_ring {
+	union {
+		/*
+		 * To avoid spilling into more pages than we need to, the
+		 * ring tail is overlaid with the io_uring_buf->resv field.
+		 */
+		struct {
+			__u64	resv1;
+			__u32	resv2;
+			__u16	resv3;
+			__u16	tail;
+		};
+		__DECLARE_FLEX_ARRAY(struct io_uring_buf, bufs);
+	};
+};
+
+/*
+ * Flags for IORING_REGISTER_PBUF_RING.
+ *
+ * IOU_PBUF_RING_MMAP:	If set, kernel will allocate the memory for the ring.
+ *			The application must not set a ring_addr in struct
+ *			io_uring_buf_reg, instead it must subsequently call
+ *			mmap(2) with the offset set as:
+ *			IORING_OFF_PBUF_RING | (bgid << IORING_OFF_PBUF_SHIFT)
+ *			to get a virtual mapping for the ring.
+ */
+enum {
+	IOU_PBUF_RING_MMAP	= 1,
+};
+
+/* argument for IORING_(UN)REGISTER_PBUF_RING */
+struct io_uring_buf_reg {
+	__u64	ring_addr;
+	__u32	ring_entries;
+	__u16	bgid;
+	__u16	flags;
+	__u64	resv[3];
+};
+
+/*
+ * io_uring_restriction->opcode values
+ */
+enum {
+	/* Allow an io_uring_register(2) opcode */
+	IORING_RESTRICTION_REGISTER_OP		= 0,
+
+	/* Allow an sqe opcode */
+	IORING_RESTRICTION_SQE_OP		= 1,
+
+	/* Allow sqe flags */
+	IORING_RESTRICTION_SQE_FLAGS_ALLOWED	= 2,
+
+	/* Require sqe flags (these flags must be set on each submission) */
+	IORING_RESTRICTION_SQE_FLAGS_REQUIRED	= 3,
+
+	IORING_RESTRICTION_LAST
+};
+
+struct io_uring_getevents_arg {
+	__u64	sigmask;
+	__u32	sigmask_sz;
+	__u32	pad;
+	__u64	ts;
+};
+
+/*
+ * Argument for IORING_REGISTER_SYNC_CANCEL
+ */
+struct io_uring_sync_cancel_reg {
+	__u64				addr;
+	__s32				fd;
+	__u32				flags;
+	struct __kernel_timespec	timeout;
+	__u8				opcode;
+	__u8				pad[7];
+	__u64				pad2[3];
+};
+
+/*
+ * Argument for IORING_REGISTER_FILE_ALLOC_RANGE
+ * The range is specified as [off, off + len)
+ */
+struct io_uring_file_index_range {
+	__u32	off;
+	__u32	len;
+	__u64	resv;
+};
+
+struct io_uring_recvmsg_out {
+	__u32 namelen;
+	__u32 controllen;
+	__u32 payloadlen;
+	__u32 flags;
+};
+
+/*
+ * Argument for IORING_OP_URING_CMD when file is a socket
+ */
+enum {
+	SOCKET_URING_OP_SIOCINQ		= 0,
+	SOCKET_URING_OP_SIOCOUTQ,
+	SOCKET_URING_OP_GETSOCKOPT,
+	SOCKET_URING_OP_SETSOCKOPT,
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 07/11] selftests/net: Extract uring helpers to be reusable
  2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (5 preceding siblings ...)
  2023-10-16 13:47 ` [PATCH v7 06/11] tools headers: Grab copy of io_uring.h Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
  2023-10-16 13:47 ` [PATCH v7 08/11] io_uring/cmd: return -EOPNOTSUPP if net is disabled Breno Leitao
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, krisman, David S. Miller, Eric Dumazet, Shuah Khan
  Cc: bpf, linux-kernel, netdev, io-uring, open list:KERNEL SELFTEST FRAMEWORK

Instead of defining basic io_uring functions in the test case, move them
to a common directory, so, other tests can use them.

This simplify the test code and reuse the common liburing
infrastructure. This is basically a copy of what we have in
io_uring_zerocopy_tx with some minor improvements to make checkpatch
happy.

A follow-up test will use the same helpers in a BPF sockopt test.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 tools/include/io_uring/mini_liburing.h        | 282 ++++++++++++++++++
 tools/testing/selftests/net/Makefile          |   1 +
 .../selftests/net/io_uring_zerocopy_tx.c      | 268 +----------------
 3 files changed, 285 insertions(+), 266 deletions(-)
 create mode 100644 tools/include/io_uring/mini_liburing.h

diff --git a/tools/include/io_uring/mini_liburing.h b/tools/include/io_uring/mini_liburing.h
new file mode 100644
index 000000000000..9ccb16074eb5
--- /dev/null
+++ b/tools/include/io_uring/mini_liburing.h
@@ -0,0 +1,282 @@
+/* SPDX-License-Identifier: MIT */
+
+#include <linux/io_uring.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+struct io_sq_ring {
+	unsigned int *head;
+	unsigned int *tail;
+	unsigned int *ring_mask;
+	unsigned int *ring_entries;
+	unsigned int *flags;
+	unsigned int *array;
+};
+
+struct io_cq_ring {
+	unsigned int *head;
+	unsigned int *tail;
+	unsigned int *ring_mask;
+	unsigned int *ring_entries;
+	struct io_uring_cqe *cqes;
+};
+
+struct io_uring_sq {
+	unsigned int *khead;
+	unsigned int *ktail;
+	unsigned int *kring_mask;
+	unsigned int *kring_entries;
+	unsigned int *kflags;
+	unsigned int *kdropped;
+	unsigned int *array;
+	struct io_uring_sqe *sqes;
+
+	unsigned int sqe_head;
+	unsigned int sqe_tail;
+
+	size_t ring_sz;
+};
+
+struct io_uring_cq {
+	unsigned int *khead;
+	unsigned int *ktail;
+	unsigned int *kring_mask;
+	unsigned int *kring_entries;
+	unsigned int *koverflow;
+	struct io_uring_cqe *cqes;
+
+	size_t ring_sz;
+};
+
+struct io_uring {
+	struct io_uring_sq sq;
+	struct io_uring_cq cq;
+	int ring_fd;
+};
+
+#if defined(__x86_64) || defined(__i386__)
+#define read_barrier()	__asm__ __volatile__("":::"memory")
+#define write_barrier()	__asm__ __volatile__("":::"memory")
+#else
+#define read_barrier()	__sync_synchronize()
+#define write_barrier()	__sync_synchronize()
+#endif
+
+static inline int io_uring_mmap(int fd, struct io_uring_params *p,
+				struct io_uring_sq *sq, struct io_uring_cq *cq)
+{
+	size_t size;
+	void *ptr;
+	int ret;
+
+	sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned int);
+	ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE,
+		   MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
+	if (ptr == MAP_FAILED)
+		return -errno;
+	sq->khead = ptr + p->sq_off.head;
+	sq->ktail = ptr + p->sq_off.tail;
+	sq->kring_mask = ptr + p->sq_off.ring_mask;
+	sq->kring_entries = ptr + p->sq_off.ring_entries;
+	sq->kflags = ptr + p->sq_off.flags;
+	sq->kdropped = ptr + p->sq_off.dropped;
+	sq->array = ptr + p->sq_off.array;
+
+	size = p->sq_entries * sizeof(struct io_uring_sqe);
+	sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE,
+			MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES);
+	if (sq->sqes == MAP_FAILED) {
+		ret = -errno;
+err:
+		munmap(sq->khead, sq->ring_sz);
+		return ret;
+	}
+
+	cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
+	ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE,
+		   MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
+	if (ptr == MAP_FAILED) {
+		ret = -errno;
+		munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe));
+		goto err;
+	}
+	cq->khead = ptr + p->cq_off.head;
+	cq->ktail = ptr + p->cq_off.tail;
+	cq->kring_mask = ptr + p->cq_off.ring_mask;
+	cq->kring_entries = ptr + p->cq_off.ring_entries;
+	cq->koverflow = ptr + p->cq_off.overflow;
+	cq->cqes = ptr + p->cq_off.cqes;
+	return 0;
+}
+
+static inline int io_uring_setup(unsigned int entries,
+				 struct io_uring_params *p)
+{
+	return syscall(__NR_io_uring_setup, entries, p);
+}
+
+static inline int io_uring_enter(int fd, unsigned int to_submit,
+				 unsigned int min_complete,
+				 unsigned int flags, sigset_t *sig)
+{
+	return syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
+		       flags, sig, _NSIG / 8);
+}
+
+static inline int io_uring_queue_init(unsigned int entries,
+				      struct io_uring *ring,
+				      unsigned int flags)
+{
+	struct io_uring_params p;
+	int fd, ret;
+
+	memset(ring, 0, sizeof(*ring));
+	memset(&p, 0, sizeof(p));
+	p.flags = flags;
+
+	fd = io_uring_setup(entries, &p);
+	if (fd < 0)
+		return fd;
+	ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq);
+	if (!ret)
+		ring->ring_fd = fd;
+	else
+		close(fd);
+	return ret;
+}
+
+/* Get a sqe */
+static inline struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+
+	if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
+		return NULL;
+	return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
+}
+
+static inline int io_uring_wait_cqe(struct io_uring *ring,
+				    struct io_uring_cqe **cqe_ptr)
+{
+	struct io_uring_cq *cq = &ring->cq;
+	const unsigned int mask = *cq->kring_mask;
+	unsigned int head = *cq->khead;
+	int ret;
+
+	*cqe_ptr = NULL;
+	do {
+		read_barrier();
+		if (head != *cq->ktail) {
+			*cqe_ptr = &cq->cqes[head & mask];
+			break;
+		}
+		ret = io_uring_enter(ring->ring_fd, 0, 1,
+				     IORING_ENTER_GETEVENTS, NULL);
+		if (ret < 0)
+			return -errno;
+	} while (1);
+
+	return 0;
+}
+
+static inline int io_uring_submit(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+	const unsigned int mask = *sq->kring_mask;
+	unsigned int ktail, submitted, to_submit;
+	int ret;
+
+	read_barrier();
+	if (*sq->khead != *sq->ktail) {
+		submitted = *sq->kring_entries;
+		goto submit;
+	}
+	if (sq->sqe_head == sq->sqe_tail)
+		return 0;
+
+	ktail = *sq->ktail;
+	to_submit = sq->sqe_tail - sq->sqe_head;
+	for (submitted = 0; submitted < to_submit; submitted++) {
+		read_barrier();
+		sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
+	}
+	if (!submitted)
+		return 0;
+
+	if (*sq->ktail != ktail) {
+		write_barrier();
+		*sq->ktail = ktail;
+		write_barrier();
+	}
+submit:
+	ret = io_uring_enter(ring->ring_fd, submitted, 0,
+			     IORING_ENTER_GETEVENTS, NULL);
+	return ret < 0 ? -errno : ret;
+}
+
+static inline void io_uring_queue_exit(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+
+	munmap(sq->sqes, *sq->kring_entries * sizeof(struct io_uring_sqe));
+	munmap(sq->khead, sq->ring_sz);
+	close(ring->ring_fd);
+}
+
+/* Prepare and send the SQE */
+static inline void io_uring_prep_cmd(struct io_uring_sqe *sqe, int op,
+				     int sockfd,
+				     int level, int optname,
+				     const void *optval,
+				     int optlen)
+{
+	memset(sqe, 0, sizeof(*sqe));
+	sqe->opcode = (__u8)IORING_OP_URING_CMD;
+	sqe->fd = sockfd;
+	sqe->cmd_op = op;
+
+	sqe->level = level;
+	sqe->optname = optname;
+	sqe->optval = (unsigned long long)optval;
+	sqe->optlen = optlen;
+}
+
+static inline int io_uring_register_buffers(struct io_uring *ring,
+					    const struct iovec *iovecs,
+					    unsigned int nr_iovecs)
+{
+	int ret;
+
+	ret = syscall(__NR_io_uring_register, ring->ring_fd,
+		      IORING_REGISTER_BUFFERS, iovecs, nr_iovecs);
+	return (ret < 0) ? -errno : ret;
+}
+
+static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
+				      const void *buf, size_t len, int flags)
+{
+	memset(sqe, 0, sizeof(*sqe));
+	sqe->opcode = (__u8)IORING_OP_SEND;
+	sqe->fd = sockfd;
+	sqe->addr = (unsigned long)buf;
+	sqe->len = len;
+	sqe->msg_flags = (__u32)flags;
+}
+
+static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
+					const void *buf, size_t len, int flags,
+					unsigned int zc_flags)
+{
+	io_uring_prep_send(sqe, sockfd, buf, len, flags);
+	sqe->opcode = (__u8)IORING_OP_SEND_ZC;
+	sqe->ioprio = zc_flags;
+}
+
+static inline void io_uring_cqe_seen(struct io_uring *ring)
+{
+	*(&ring->cq)->khead += 1;
+	write_barrier();
+}
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 61939a695f95..4adfe2186f39 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -99,6 +99,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma
 $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto
 $(OUTPUT)/tcp_inq: LDLIBS += -lpthread
 $(OUTPUT)/bind_bhash: LDLIBS += -lpthread
+$(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/
 
 # Rules to generate bpf obj nat6to4.o
 CLANG ?= clang
diff --git a/tools/testing/selftests/net/io_uring_zerocopy_tx.c b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
index 154287740172..76e604e4810e 100644
--- a/tools/testing/selftests/net/io_uring_zerocopy_tx.c
+++ b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
@@ -36,6 +36,8 @@
 #include <sys/un.h>
 #include <sys/wait.h>
 
+#include <io_uring/mini_liburing.h>
+
 #define NOTIF_TAG 0xfffffffULL
 #define NONZC_TAG 0
 #define ZC_TAG 1
@@ -60,272 +62,6 @@ static struct sockaddr_storage cfg_dst_addr;
 
 static char payload[IP_MAXPACKET] __attribute__((aligned(4096)));
 
-struct io_sq_ring {
-	unsigned *head;
-	unsigned *tail;
-	unsigned *ring_mask;
-	unsigned *ring_entries;
-	unsigned *flags;
-	unsigned *array;
-};
-
-struct io_cq_ring {
-	unsigned *head;
-	unsigned *tail;
-	unsigned *ring_mask;
-	unsigned *ring_entries;
-	struct io_uring_cqe *cqes;
-};
-
-struct io_uring_sq {
-	unsigned *khead;
-	unsigned *ktail;
-	unsigned *kring_mask;
-	unsigned *kring_entries;
-	unsigned *kflags;
-	unsigned *kdropped;
-	unsigned *array;
-	struct io_uring_sqe *sqes;
-
-	unsigned sqe_head;
-	unsigned sqe_tail;
-
-	size_t ring_sz;
-};
-
-struct io_uring_cq {
-	unsigned *khead;
-	unsigned *ktail;
-	unsigned *kring_mask;
-	unsigned *kring_entries;
-	unsigned *koverflow;
-	struct io_uring_cqe *cqes;
-
-	size_t ring_sz;
-};
-
-struct io_uring {
-	struct io_uring_sq sq;
-	struct io_uring_cq cq;
-	int ring_fd;
-};
-
-#ifdef __alpha__
-# ifndef __NR_io_uring_setup
-#  define __NR_io_uring_setup		535
-# endif
-# ifndef __NR_io_uring_enter
-#  define __NR_io_uring_enter		536
-# endif
-# ifndef __NR_io_uring_register
-#  define __NR_io_uring_register	537
-# endif
-#else /* !__alpha__ */
-# ifndef __NR_io_uring_setup
-#  define __NR_io_uring_setup		425
-# endif
-# ifndef __NR_io_uring_enter
-#  define __NR_io_uring_enter		426
-# endif
-# ifndef __NR_io_uring_register
-#  define __NR_io_uring_register	427
-# endif
-#endif
-
-#if defined(__x86_64) || defined(__i386__)
-#define read_barrier()	__asm__ __volatile__("":::"memory")
-#define write_barrier()	__asm__ __volatile__("":::"memory")
-#else
-
-#define read_barrier()	__sync_synchronize()
-#define write_barrier()	__sync_synchronize()
-#endif
-
-static int io_uring_setup(unsigned int entries, struct io_uring_params *p)
-{
-	return syscall(__NR_io_uring_setup, entries, p);
-}
-
-static int io_uring_enter(int fd, unsigned int to_submit,
-			  unsigned int min_complete,
-			  unsigned int flags, sigset_t *sig)
-{
-	return syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
-			flags, sig, _NSIG / 8);
-}
-
-static int io_uring_register_buffers(struct io_uring *ring,
-				     const struct iovec *iovecs,
-				     unsigned nr_iovecs)
-{
-	int ret;
-
-	ret = syscall(__NR_io_uring_register, ring->ring_fd,
-		      IORING_REGISTER_BUFFERS, iovecs, nr_iovecs);
-	return (ret < 0) ? -errno : ret;
-}
-
-static int io_uring_mmap(int fd, struct io_uring_params *p,
-			 struct io_uring_sq *sq, struct io_uring_cq *cq)
-{
-	size_t size;
-	void *ptr;
-	int ret;
-
-	sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned);
-	ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE,
-		   MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
-	if (ptr == MAP_FAILED)
-		return -errno;
-	sq->khead = ptr + p->sq_off.head;
-	sq->ktail = ptr + p->sq_off.tail;
-	sq->kring_mask = ptr + p->sq_off.ring_mask;
-	sq->kring_entries = ptr + p->sq_off.ring_entries;
-	sq->kflags = ptr + p->sq_off.flags;
-	sq->kdropped = ptr + p->sq_off.dropped;
-	sq->array = ptr + p->sq_off.array;
-
-	size = p->sq_entries * sizeof(struct io_uring_sqe);
-	sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE,
-			MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES);
-	if (sq->sqes == MAP_FAILED) {
-		ret = -errno;
-err:
-		munmap(sq->khead, sq->ring_sz);
-		return ret;
-	}
-
-	cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
-	ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE,
-			MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
-	if (ptr == MAP_FAILED) {
-		ret = -errno;
-		munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe));
-		goto err;
-	}
-	cq->khead = ptr + p->cq_off.head;
-	cq->ktail = ptr + p->cq_off.tail;
-	cq->kring_mask = ptr + p->cq_off.ring_mask;
-	cq->kring_entries = ptr + p->cq_off.ring_entries;
-	cq->koverflow = ptr + p->cq_off.overflow;
-	cq->cqes = ptr + p->cq_off.cqes;
-	return 0;
-}
-
-static int io_uring_queue_init(unsigned entries, struct io_uring *ring,
-			       unsigned flags)
-{
-	struct io_uring_params p;
-	int fd, ret;
-
-	memset(ring, 0, sizeof(*ring));
-	memset(&p, 0, sizeof(p));
-	p.flags = flags;
-
-	fd = io_uring_setup(entries, &p);
-	if (fd < 0)
-		return fd;
-	ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq);
-	if (!ret)
-		ring->ring_fd = fd;
-	else
-		close(fd);
-	return ret;
-}
-
-static int io_uring_submit(struct io_uring *ring)
-{
-	struct io_uring_sq *sq = &ring->sq;
-	const unsigned mask = *sq->kring_mask;
-	unsigned ktail, submitted, to_submit;
-	int ret;
-
-	read_barrier();
-	if (*sq->khead != *sq->ktail) {
-		submitted = *sq->kring_entries;
-		goto submit;
-	}
-	if (sq->sqe_head == sq->sqe_tail)
-		return 0;
-
-	ktail = *sq->ktail;
-	to_submit = sq->sqe_tail - sq->sqe_head;
-	for (submitted = 0; submitted < to_submit; submitted++) {
-		read_barrier();
-		sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
-	}
-	if (!submitted)
-		return 0;
-
-	if (*sq->ktail != ktail) {
-		write_barrier();
-		*sq->ktail = ktail;
-		write_barrier();
-	}
-submit:
-	ret = io_uring_enter(ring->ring_fd, submitted, 0,
-				IORING_ENTER_GETEVENTS, NULL);
-	return ret < 0 ? -errno : ret;
-}
-
-static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
-				      const void *buf, size_t len, int flags)
-{
-	memset(sqe, 0, sizeof(*sqe));
-	sqe->opcode = (__u8) IORING_OP_SEND;
-	sqe->fd = sockfd;
-	sqe->addr = (unsigned long) buf;
-	sqe->len = len;
-	sqe->msg_flags = (__u32) flags;
-}
-
-static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
-				        const void *buf, size_t len, int flags,
-				        unsigned zc_flags)
-{
-	io_uring_prep_send(sqe, sockfd, buf, len, flags);
-	sqe->opcode = (__u8) IORING_OP_SEND_ZC;
-	sqe->ioprio = zc_flags;
-}
-
-static struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
-{
-	struct io_uring_sq *sq = &ring->sq;
-
-	if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
-		return NULL;
-	return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
-}
-
-static int io_uring_wait_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr)
-{
-	struct io_uring_cq *cq = &ring->cq;
-	const unsigned mask = *cq->kring_mask;
-	unsigned head = *cq->khead;
-	int ret;
-
-	*cqe_ptr = NULL;
-	do {
-		read_barrier();
-		if (head != *cq->ktail) {
-			*cqe_ptr = &cq->cqes[head & mask];
-			break;
-		}
-		ret = io_uring_enter(ring->ring_fd, 0, 1,
-					IORING_ENTER_GETEVENTS, NULL);
-		if (ret < 0)
-			return -errno;
-	} while (1);
-
-	return 0;
-}
-
-static inline void io_uring_cqe_seen(struct io_uring *ring)
-{
-	*(&ring->cq)->khead += 1;
-	write_barrier();
-}
-
 static unsigned long gettimeofday_ms(void)
 {
 	struct timeval tv;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 08/11] io_uring/cmd: return -EOPNOTSUPP if net is disabled
  2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (6 preceding siblings ...)
  2023-10-16 13:47 ` [PATCH v7 07/11] selftests/net: Extract uring helpers to be reusable Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
  2023-10-16 13:47 ` [PATCH v7 09/11] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, krisman
  Cc: bpf, linux-kernel, netdev, io-uring

Protect io_uring_cmd_sock() to be called if CONFIG_NET is not set. If
network is not enabled, but io_uring is, then we want to return
-EOPNOTSUPP for any possible socket operation.

This is helpful because io_uring_cmd_sock() can now call functions that
only exits if CONFIG_NET is enabled without having #ifdef CONFIG_NET
inside the function itself.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 io_uring/uring_cmd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 4bedd633c08c..42694c07d8fd 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -214,6 +214,7 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
 
+#if defined(CONFIG_NET)
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 {
 	struct socket *sock = cmd->file->private_data;
@@ -240,3 +241,4 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 	}
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_sock);
+#endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 09/11] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (7 preceding siblings ...)
  2023-10-16 13:47 ` [PATCH v7 08/11] io_uring/cmd: return -EOPNOTSUPP if net is disabled Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
  2023-10-16 18:33   ` Gabriel Krisman Bertazi
  2023-10-16 13:47 ` [PATCH v7 10/11] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, krisman
  Cc: bpf, linux-kernel, netdev, io-uring

Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
where a sockptr_t is either userspace or kernel space, and handled as
such.

Differently from the getsockopt(2), the optlen field is not a userspace
pointers. In getsockopt(2), userspace provides optlen pointer, which is
overwritten by the kernel.  In this implementation, userspace passes a
u32, and the new value is returned in cqe->res. I.e., optlen is not a
pointer.

Important to say that userspace needs to keep the pointer alive until
the CQE is completed.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/uapi/linux/io_uring.h |  7 +++++++
 io_uring/uring_cmd.c          | 28 ++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 92be89a871fc..9628d4f5daba 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -43,6 +43,10 @@ struct io_uring_sqe {
 	union {
 		__u64	addr;	/* pointer to buffer or iovecs */
 		__u64	splice_off_in;
+		struct {
+			__u32	level;
+			__u32	optname;
+		};
 	};
 	__u32	len;		/* buffer size or number of iovecs */
 	union {
@@ -81,6 +85,7 @@ struct io_uring_sqe {
 	union {
 		__s32	splice_fd_in;
 		__u32	file_index;
+		__u32	optlen;
 		struct {
 			__u16	addr_len;
 			__u16	__pad3[1];
@@ -91,6 +96,7 @@ struct io_uring_sqe {
 			__u64	addr3;
 			__u64	__pad2[1];
 		};
+		__u64	optval;
 		/*
 		 * If the ring is initialized with IORING_SETUP_SQE128, then
 		 * this field is used for 80 bytes of arbitrary command data
@@ -740,6 +746,7 @@ struct io_uring_recvmsg_out {
 enum {
 	SOCKET_URING_OP_SIOCINQ		= 0,
 	SOCKET_URING_OP_SIOCOUTQ,
+	SOCKET_URING_OP_GETSOCKOPT,
 };
 
 #ifdef __cplusplus
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 42694c07d8fd..8b045830b0d9 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -214,6 +214,32 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
 
+static inline int io_uring_cmd_getsockopt(struct socket *sock,
+					  struct io_uring_cmd *cmd,
+					  unsigned int issue_flags)
+{
+	bool compat = !!(issue_flags & IO_URING_F_COMPAT);
+	int optlen, optname, level, err;
+	void __user *optval;
+
+	level = READ_ONCE(cmd->sqe->level);
+	if (level != SOL_SOCKET)
+		return -EOPNOTSUPP;
+
+	optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
+	optname = READ_ONCE(cmd->sqe->optname);
+	optlen = READ_ONCE(cmd->sqe->optlen);
+
+	err = do_sock_getsockopt(sock, compat, level, optname,
+				 USER_SOCKPTR(optval),
+				 KERNEL_SOCKPTR(&optlen));
+	if (err)
+		return err;
+
+	/* On success, return optlen */
+	return optlen;
+}
+
 #if defined(CONFIG_NET)
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 {
@@ -236,6 +262,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 		if (ret)
 			return ret;
 		return arg;
+	case SOCKET_URING_OP_GETSOCKOPT:
+		return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 10/11] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
  2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (8 preceding siblings ...)
  2023-10-16 13:47 ` [PATCH v7 09/11] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
  2023-10-16 18:33   ` Gabriel Krisman Bertazi
  2023-10-16 13:47 ` [PATCH v7 11/11] selftests/bpf/sockopt: Add io_uring support Breno Leitao
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, krisman
  Cc: bpf, linux-kernel, netdev, io-uring

Add initial support for SOCKET_URING_OP_SETSOCKOPT. This new command is
similar to setsockopt. This implementation leverages the function
do_sock_setsockopt(), which is shared with the setsockopt() system call
path.

Important to say that userspace needs to keep the pointer's memory alive
until the operation is completed. I.e, the memory could not be
deallocated before the CQE is returned to userspace.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/uapi/linux/io_uring.h |  1 +
 io_uring/uring_cmd.c          | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 9628d4f5daba..f1c16f817742 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -747,6 +747,7 @@ enum {
 	SOCKET_URING_OP_SIOCINQ		= 0,
 	SOCKET_URING_OP_SIOCOUTQ,
 	SOCKET_URING_OP_GETSOCKOPT,
+	SOCKET_URING_OP_SETSOCKOPT,
 };
 
 #ifdef __cplusplus
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 8b045830b0d9..acbc2924ecd2 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -240,6 +240,25 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock,
 	return optlen;
 }
 
+static inline int io_uring_cmd_setsockopt(struct socket *sock,
+					  struct io_uring_cmd *cmd,
+					  unsigned int issue_flags)
+{
+	bool compat = !!(issue_flags & IO_URING_F_COMPAT);
+	int optname, optlen, level;
+	void __user *optval;
+	sockptr_t optval_s;
+
+	optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
+	optname = READ_ONCE(cmd->sqe->optname);
+	optlen = READ_ONCE(cmd->sqe->optlen);
+	level = READ_ONCE(cmd->sqe->level);
+	optval_s = USER_SOCKPTR(optval);
+
+	return do_sock_setsockopt(sock, compat, level, optname, optval_s,
+				  optlen);
+}
+
 #if defined(CONFIG_NET)
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 {
@@ -264,6 +283,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 		return arg;
 	case SOCKET_URING_OP_GETSOCKOPT:
 		return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
+	case SOCKET_URING_OP_SETSOCKOPT:
+		return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 11/11] selftests/bpf/sockopt: Add io_uring support
  2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (9 preceding siblings ...)
  2023-10-16 13:47 ` [PATCH v7 10/11] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
  2023-10-19 19:51   ` Martin KaFai Lau
  2023-10-19 14:58 ` [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Jens Axboe
  2023-10-19 15:41 ` Jens Axboe
  12 siblings, 1 reply; 30+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, krisman, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Jiri Olsa, Mykola Lysenko, Shuah Khan
  Cc: bpf, linux-kernel, netdev, io-uring, Daniel Müller,
	open list:KERNEL SELFTEST FRAMEWORK

Expand the sockopt test to use also check for io_uring {g,s}etsockopt
commands operations.

This patch starts by marking each test if they support io_uring support
or not.

Right now, io_uring cmd getsockopt() has a limitation of only
accepting level == SOL_SOCKET, otherwise it returns -EOPNOTSUPP. Since
there aren't any test exercising getsockopt(level == SOL_SOCKET), this
patch changes two tests to use level == SOL_SOCKET, they are
"getsockopt: support smaller ctx->optlen" and "getsockopt: read
ctx->optlen".
There is no limitation for the setsockopt() part.

Later, each test runs using regular {g,s}etsockopt systemcalls, and, if
liburing is supported, execute the same test (again), but calling
liburing {g,s}setsockopt commands.

This patch also changes the level of two tests to use SOL_SOCKET for the
following two tests. This is going to help to exercise the io_uring
subsystem:
 * getsockopt: read ctx->optlen
 * getsockopt: support smaller ctx->optlen

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 .../selftests/bpf/prog_tests/sockopt.c        | 113 +++++++++++++++++-
 1 file changed, 107 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt.c b/tools/testing/selftests/bpf/prog_tests/sockopt.c
index 9e6a5e3ed4de..5a4491d4edfe 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockopt.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockopt.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
+#include <io_uring/mini_liburing.h>
 #include "cgroup_helpers.h"
 
 static char bpf_log_buf[4096];
@@ -38,6 +39,7 @@ static struct sockopt_test {
 	socklen_t			get_optlen_ret;
 
 	enum sockopt_test_error		error;
+	bool				io_uring_support;
 } tests[] = {
 
 	/* ==================== getsockopt ====================  */
@@ -251,7 +253,9 @@ static struct sockopt_test {
 		.attach_type = BPF_CGROUP_GETSOCKOPT,
 		.expected_attach_type = BPF_CGROUP_GETSOCKOPT,
 
+		.get_level = SOL_SOCKET,
 		.get_optlen = 64,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "getsockopt: deny bigger ctx->optlen",
@@ -276,6 +280,7 @@ static struct sockopt_test {
 		.get_optlen = 64,
 
 		.error = EFAULT_GETSOCKOPT,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "getsockopt: ignore >PAGE_SIZE optlen",
@@ -318,6 +323,7 @@ static struct sockopt_test {
 		.get_optval = {}, /* the changes are ignored */
 		.get_optlen = PAGE_SIZE + 1,
 		.error = EOPNOTSUPP_GETSOCKOPT,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "getsockopt: support smaller ctx->optlen",
@@ -337,8 +343,10 @@ static struct sockopt_test {
 		.attach_type = BPF_CGROUP_GETSOCKOPT,
 		.expected_attach_type = BPF_CGROUP_GETSOCKOPT,
 
+		.get_level = SOL_SOCKET,
 		.get_optlen = 64,
 		.get_optlen_ret = 32,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "getsockopt: deny writing to ctx->optval",
@@ -518,6 +526,7 @@ static struct sockopt_test {
 		.set_level = 123,
 
 		.set_optlen = 1,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "setsockopt: allow changing ctx->level",
@@ -572,6 +581,7 @@ static struct sockopt_test {
 		.set_optname = 123,
 
 		.set_optlen = 1,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "setsockopt: allow changing ctx->optname",
@@ -624,6 +634,7 @@ static struct sockopt_test {
 		.expected_attach_type = BPF_CGROUP_SETSOCKOPT,
 
 		.set_optlen = 64,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "setsockopt: ctx->optlen == -1 is ok",
@@ -640,6 +651,7 @@ static struct sockopt_test {
 		.expected_attach_type = BPF_CGROUP_SETSOCKOPT,
 
 		.set_optlen = 64,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "setsockopt: deny ctx->optlen < 0 (except -1)",
@@ -658,6 +670,7 @@ static struct sockopt_test {
 		.set_optlen = 4,
 
 		.error = EFAULT_SETSOCKOPT,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "setsockopt: deny ctx->optlen > input optlen",
@@ -675,6 +688,7 @@ static struct sockopt_test {
 		.set_optlen = 64,
 
 		.error = EFAULT_SETSOCKOPT,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "setsockopt: ignore >PAGE_SIZE optlen",
@@ -940,7 +954,89 @@ static int load_prog(const struct bpf_insn *insns,
 	return fd;
 }
 
-static int run_test(int cgroup_fd, struct sockopt_test *test)
+/* Core function that handles io_uring ring initialization,
+ * sending SQE with sockopt command and waiting for the CQE.
+ */
+static int uring_sockopt(int op, int fd, int level, int optname,
+			 const void *optval, socklen_t optlen)
+{
+	struct io_uring_cqe *cqe;
+	struct io_uring_sqe *sqe;
+	struct io_uring ring;
+	int err;
+
+	err = io_uring_queue_init(1, &ring, 0);
+	if (!ASSERT_OK(err, "io_uring initialization"))
+		return err;
+
+	sqe = io_uring_get_sqe(&ring);
+	if (!ASSERT_NEQ(sqe, NULL, "Get an SQE")) {
+		err = -1;
+		goto fail;
+	}
+
+	io_uring_prep_cmd(sqe, op, fd, level, optname, optval, optlen);
+
+	err = io_uring_submit(&ring);
+	if (!ASSERT_EQ(err, 1, "Submit SQE"))
+		goto fail;
+
+	err = io_uring_wait_cqe(&ring, &cqe);
+	if (!ASSERT_OK(err, "Wait for CQE"))
+		goto fail;
+
+	err = cqe->res;
+
+fail:
+	io_uring_queue_exit(&ring);
+
+	return err;
+}
+
+static int uring_setsockopt(int fd, int level, int optname, const void *optval,
+			    socklen_t optlen)
+{
+	return uring_sockopt(SOCKET_URING_OP_SETSOCKOPT, fd, level, optname,
+			     optval, optlen);
+}
+
+static int uring_getsockopt(int fd, int level, int optname, void *optval,
+			    socklen_t *optlen)
+{
+	int ret = uring_sockopt(SOCKET_URING_OP_GETSOCKOPT, fd, level, optname,
+				optval, *optlen);
+	if (ret < 0)
+		return ret;
+
+	/* Populate optlen back to be compatible with systemcall interface,
+	 * and simplify the test.
+	 */
+	*optlen = ret;
+
+	return 0;
+}
+
+/* Execute the setsocktopt operation */
+static int call_setsockopt(bool use_io_uring, int fd, int level, int optname,
+			   const void *optval, socklen_t optlen)
+{
+	if (use_io_uring)
+		return uring_setsockopt(fd, level, optname, optval, optlen);
+
+	return setsockopt(fd, level, optname, optval, optlen);
+}
+
+/* Execute the getsocktopt operation */
+static int call_getsockopt(bool use_io_uring, int fd, int level, int optname,
+			   void *optval, socklen_t *optlen)
+{
+	if (use_io_uring)
+		return uring_getsockopt(fd, level, optname, optval, optlen);
+
+	return getsockopt(fd, level, optname, optval, optlen);
+}
+
+static int run_test(int cgroup_fd, struct sockopt_test *test, bool use_io_uring)
 {
 	int sock_fd, err, prog_fd;
 	void *optval = NULL;
@@ -980,8 +1076,9 @@ static int run_test(int cgroup_fd, struct sockopt_test *test)
 			test->set_optlen = num_pages * sysconf(_SC_PAGESIZE) + remainder;
 		}
 
-		err = setsockopt(sock_fd, test->set_level, test->set_optname,
-				 test->set_optval, test->set_optlen);
+		err = call_setsockopt(use_io_uring, sock_fd, test->set_level,
+				      test->set_optname, test->set_optval,
+				      test->set_optlen);
 		if (err) {
 			if (errno == EPERM && test->error == EPERM_SETSOCKOPT)
 				goto close_sock_fd;
@@ -1008,8 +1105,8 @@ static int run_test(int cgroup_fd, struct sockopt_test *test)
 		socklen_t expected_get_optlen = test->get_optlen_ret ?:
 			test->get_optlen;
 
-		err = getsockopt(sock_fd, test->get_level, test->get_optname,
-				 optval, &optlen);
+		err = call_getsockopt(use_io_uring, sock_fd, test->get_level,
+				      test->get_optname, optval, &optlen);
 		if (err) {
 			if (errno == EOPNOTSUPP && test->error == EOPNOTSUPP_GETSOCKOPT)
 				goto free_optval;
@@ -1063,7 +1160,11 @@ void test_sockopt(void)
 		if (!test__start_subtest(tests[i].descr))
 			continue;
 
-		ASSERT_OK(run_test(cgroup_fd, &tests[i]), tests[i].descr);
+		ASSERT_OK(run_test(cgroup_fd, &tests[i], false),
+			  tests[i].descr);
+		if (tests[i].io_uring_support)
+			ASSERT_OK(run_test(cgroup_fd, &tests[i], true),
+				  tests[i].descr);
 	}
 
 	close(cgroup_fd);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 09/11] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-10-16 13:47 ` [PATCH v7 09/11] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
@ 2023-10-16 18:33   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 30+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-10-16 18:33 UTC (permalink / raw)
  To: Breno Leitao
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, bpf, linux-kernel, netdev, io-uring

Breno Leitao <leitao@debian.org> writes:

> Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
> level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
> where a sockptr_t is either userspace or kernel space, and handled as
> such.
>
> Differently from the getsockopt(2), the optlen field is not a userspace
> pointers. In getsockopt(2), userspace provides optlen pointer, which is
> overwritten by the kernel.  In this implementation, userspace passes a
> u32, and the new value is returned in cqe->res. I.e., optlen is not a
> pointer.
>
> Important to say that userspace needs to keep the pointer alive until
> the CQE is completed.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>

I suspect you forgot to collect my previous r-b on this :).  Either way,
still good to me:

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 10/11] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
  2023-10-16 13:47 ` [PATCH v7 10/11] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
@ 2023-10-16 18:33   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 30+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-10-16 18:33 UTC (permalink / raw)
  To: Breno Leitao
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, bpf, linux-kernel, netdev, io-uring

Breno Leitao <leitao@debian.org> writes:

> Add initial support for SOCKET_URING_OP_SETSOCKOPT. This new command is
> similar to setsockopt. This implementation leverages the function
> do_sock_setsockopt(), which is shared with the setsockopt() system call
> path.
>
> Important to say that userspace needs to keep the pointer's memory alive
> until the operation is completed. I.e, the memory could not be
> deallocated before the CQE is returned to userspace.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>

likewise..

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 06/11] tools headers: Grab copy of io_uring.h
  2023-10-16 13:47 ` [PATCH v7 06/11] tools headers: Grab copy of io_uring.h Breno Leitao
@ 2023-10-16 18:56   ` Gabriel Krisman Bertazi
       [not found]   ` <652d877c.250a0220.b0af2.3a66SMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 0 replies; 30+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-10-16 18:56 UTC (permalink / raw)
  To: Breno Leitao
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, bpf, linux-kernel, netdev, io-uring,
	Peter Zijlstra (Intel),
	Stefan Metzmacher, Josh Triplett

Breno Leitao <leitao@debian.org> writes:

> This file will be used by mini_uring.h and allow tests to run without
> the need of installing liburing to run the tests.
>
> This is needed to run io_uring tests in BPF, such as
> (tools/testing/selftests/bpf/prog_tests/sockopt.c).
>
> Signed-off-by: Breno Leitao <leitao@debian.org>

Can't mini_uring rely on the kernel header like
selftests/net/io_uring_zerocopy_tx.c does?  That test doesn't depend on
liburing either.  I ask because this will be the third copy of these
definitions that we're gonna need to keep in sync (kernel, liburing and
here). Given this is only used for selftests, we better avoid the
duplication.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 06/11] tools headers: Grab copy of io_uring.h
       [not found]   ` <652d877c.250a0220.b0af2.3a66SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2023-10-17 12:34     ` Breno Leitao
  0 siblings, 0 replies; 30+ messages in thread
From: Breno Leitao @ 2023-10-17 12:34 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi, martin.lau, sdf
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, bpf, linux-kernel, netdev, io-uring,
	Peter Zijlstra (Intel),
	Stefan Metzmacher, Josh Triplett

Hello Gabriel,

On Mon, Oct 16, 2023 at 02:56:55PM -0400, Gabriel Krisman Bertazi wrote:
> Breno Leitao <leitao@debian.org> writes:
> 
> > This file will be used by mini_uring.h and allow tests to run without
> > the need of installing liburing to run the tests.
> >
> > This is needed to run io_uring tests in BPF, such as
> > (tools/testing/selftests/bpf/prog_tests/sockopt.c).
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> 
> Can't mini_uring rely on the kernel header like
> selftests/net/io_uring_zerocopy_tx.c does?

Before this patch, io_uring_zerocopy_tx was not relying on "make
headers" headers, as far as I know. I think it was not a problem because
there was no CI running the test, and whoever was running the test was
relying on local io_uring headers.

My patch is, in fact,  adding the following flag, which relies on the
headers now on:

	+$(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/

> I ask because this will be the third copy of these
> definitions that we're gonna need to keep in sync (kernel, liburing and
> here). Given this is only used for selftests, we better avoid the
> duplication.

Right, I don't know why this was the suggested way, but, that is how
people are using it.

I can definitely get rid of the copy and do the same mechanism as
io_uring_zerocopy_tx. This is what I've tested, and it worked fine.

---

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 4225f975fce3..9f79a392acc1 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -383,6 +383,8 @@ BPF_CFLAGS = -g -Wall -Werror -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN) \
 CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
               -Wno-compare-distinct-pointer-types

+HEADER_CFLAGS = -I$(abspath $(OUTPUT)/../../../../usr/include)
+
 $(OUTPUT)/test_l4lb_noinline.o: BPF_CFLAGS += -fno-inline
 $(OUTPUT)/test_xdp_noinline.o: BPF_CFLAGS += -fno-inline

@@ -551,7 +553,7 @@ $(TRUNNER_TEST_OBJS): $(TRUNNER_OUTPUT)/%.test.o:                   \
                      $(TRUNNER_BPF_SKELS_LINKED)                       \
                      $$(BPFOBJ) | $(TRUNNER_OUTPUT)
        $$(call msg,TEST-OBJ,$(TRUNNER_BINARY),$$@)
-       $(Q)cd $$(@D) && $$(CC) -I. $$(CFLAGS) -c $(CURDIR)/$$< $$(LDLIBS) -o $$(@F)
+       $(Q)cd $$(@D) && $$(CC) -I. $$(CFLAGS) $$(HEADER_CFLAGS) -c $(CURDIR)/$$< $$(LDLIBS) -o $$(@F)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 03/11] net/socket: Break down __sys_setsockopt
  2023-10-16 13:47 ` [PATCH v7 03/11] net/socket: Break down __sys_setsockopt Breno Leitao
@ 2023-10-19  0:20   ` Jakub Kicinski
  2023-10-19 19:38   ` Martin KaFai Lau
  1 sibling, 0 replies; 30+ messages in thread
From: Jakub Kicinski @ 2023-10-19  0:20 UTC (permalink / raw)
  To: Breno Leitao
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, pabeni,
	martin.lau, krisman, David S. Miller, Eric Dumazet, bpf,
	linux-kernel, netdev, io-uring, Willem de Bruijn

On Mon, 16 Oct 2023 06:47:41 -0700 Breno Leitao wrote:
> Split __sys_setsockopt() into two functions by removing the core
> logic into a sub-function (do_sock_setsockopt()). This will avoid
> code duplication when doing the same operation in other callers, for
> instance.
> 
> do_sock_setsockopt() will be called by io_uring setsockopt() command
> operation in the following patch.

Acked-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 04/11] net/socket: Break down __sys_getsockopt
  2023-10-16 13:47 ` [PATCH v7 04/11] net/socket: Break down __sys_getsockopt Breno Leitao
@ 2023-10-19  0:20   ` Jakub Kicinski
  2023-10-19 19:12   ` Martin KaFai Lau
  1 sibling, 0 replies; 30+ messages in thread
From: Jakub Kicinski @ 2023-10-19  0:20 UTC (permalink / raw)
  To: Breno Leitao
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, pabeni,
	martin.lau, krisman, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Song Liu, Yonghong Song,
	KP Singh, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, bpf,
	linux-kernel, netdev, io-uring, Kuniyuki Iwashima,
	Alexander Mikhalitsyn, David Howells

On Mon, 16 Oct 2023 06:47:42 -0700 Breno Leitao wrote:
> Split __sys_getsockopt() into two functions by removing the core
> logic into a sub-function (do_sock_getsockopt()). This will avoid
> code duplication when doing the same operation in other callers, for
> instance.
> 
> do_sock_getsockopt() will be called by io_uring getsockopt() command
> operation in the following patch.
> 
> The same was done for the setsockopt pair.

Acked-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands
  2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (10 preceding siblings ...)
  2023-10-16 13:47 ` [PATCH v7 11/11] selftests/bpf/sockopt: Add io_uring support Breno Leitao
@ 2023-10-19 14:58 ` Jens Axboe
  2023-10-19 15:33   ` Jakub Kicinski
  2023-10-19 15:41 ` Jens Axboe
  12 siblings, 1 reply; 30+ messages in thread
From: Jens Axboe @ 2023-10-19 14:58 UTC (permalink / raw)
  To: Breno Leitao, sdf, asml.silence, willemdebruijn.kernel, kuba,
	pabeni, martin.lau, krisman
  Cc: bpf, linux-kernel, netdev, io-uring

On 10/16/23 7:47 AM, Breno Leitao wrote:
> This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
> and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
> SOCKET_URING_OP_SETSOCKOPT implements generic case, covering all levels
> and optnames. SOCKET_URING_OP_GETSOCKOPT is limited, for now, to
> SOL_SOCKET level, which seems to be the most common level parameter for
> get/setsockopt(2).
> 
> In order to keep the implementation (and tests) simple, some refactors
> were done prior to the changes, as follows:

Looks like folks are mostly happy with this now, so the next question is
how to stage it?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands
  2023-10-19 14:58 ` [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Jens Axboe
@ 2023-10-19 15:33   ` Jakub Kicinski
  2023-10-19 15:40     ` Jens Axboe
  0 siblings, 1 reply; 30+ messages in thread
From: Jakub Kicinski @ 2023-10-19 15:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Breno Leitao, sdf, asml.silence, willemdebruijn.kernel, pabeni,
	martin.lau, krisman, bpf, linux-kernel, netdev, io-uring

On Thu, 19 Oct 2023 08:58:59 -0600 Jens Axboe wrote:
> On 10/16/23 7:47 AM, Breno Leitao wrote:
> > This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
> > and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
> > SOCKET_URING_OP_SETSOCKOPT implements generic case, covering all levels
> > and optnames. SOCKET_URING_OP_GETSOCKOPT is limited, for now, to
> > SOL_SOCKET level, which seems to be the most common level parameter for
> > get/setsockopt(2).
> > 
> > In order to keep the implementation (and tests) simple, some refactors
> > were done prior to the changes, as follows:  
> 
> Looks like folks are mostly happy with this now, so the next question is
> how to stage it?

Would be good to get acks from BPF folks but AFAICT first four patches
apply cleanly for us now. If they apply cleanly for you I reckon you
can take them directly with io-uring. It's -rc7 time, with a bit of
luck we'll get to the merge window without a conflict.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands
  2023-10-19 15:33   ` Jakub Kicinski
@ 2023-10-19 15:40     ` Jens Axboe
  0 siblings, 0 replies; 30+ messages in thread
From: Jens Axboe @ 2023-10-19 15:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Breno Leitao, sdf, asml.silence, willemdebruijn.kernel, pabeni,
	martin.lau, krisman, bpf, linux-kernel, netdev, io-uring

On 10/19/23 9:33 AM, Jakub Kicinski wrote:
> On Thu, 19 Oct 2023 08:58:59 -0600 Jens Axboe wrote:
>> On 10/16/23 7:47 AM, Breno Leitao wrote:
>>> This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
>>> and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
>>> SOCKET_URING_OP_SETSOCKOPT implements generic case, covering all levels
>>> and optnames. SOCKET_URING_OP_GETSOCKOPT is limited, for now, to
>>> SOL_SOCKET level, which seems to be the most common level parameter for
>>> get/setsockopt(2).
>>>
>>> In order to keep the implementation (and tests) simple, some refactors
>>> were done prior to the changes, as follows:  
>>
>> Looks like folks are mostly happy with this now, so the next question is
>> how to stage it?
> 
> Would be good to get acks from BPF folks but AFAICT first four patches

Agree, those are still missing. BPF folks, do patches 1-2 look OK to
you?

> apply cleanly for us now. If they apply cleanly for you I reckon you
> can take them directly with io-uring. It's -rc7 time, with a bit of
> luck we'll get to the merge window without a conflict.

I'll tentatively setup a branch for this just to see if we run into
anything on the merge front. Depending on how the BPF side goes, I can
rebase/collect reviews/whatever as we go.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands
  2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (11 preceding siblings ...)
  2023-10-19 14:58 ` [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Jens Axboe
@ 2023-10-19 15:41 ` Jens Axboe
  12 siblings, 0 replies; 30+ messages in thread
From: Jens Axboe @ 2023-10-19 15:41 UTC (permalink / raw)
  To: sdf, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	martin.lau, krisman, Breno Leitao
  Cc: bpf, linux-kernel, netdev, io-uring


On Mon, 16 Oct 2023 06:47:38 -0700, Breno Leitao wrote:
> This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
> and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
> SOCKET_URING_OP_SETSOCKOPT implements generic case, covering all levels
> and optnames. SOCKET_URING_OP_GETSOCKOPT is limited, for now, to
> SOL_SOCKET level, which seems to be the most common level parameter for
> get/setsockopt(2).
> 
> [...]

Applied, thanks!

[01/11] bpf: Add sockptr support for getsockopt
        commit: 7cb15cc7e081730df3392f136a8789f3d2c3fd66
[02/11] bpf: Add sockptr support for setsockopt
        commit: c028f6e54aa180747e384796760eee3bd78e0891
[03/11] net/socket: Break down __sys_setsockopt
        commit: e70464dcdcddb5128fe7956bf809683824c64de5
[04/11] net/socket: Break down __sys_getsockopt
        commit: 25f82732c8352bd0bec33c5a9989fd46cac5789f
[05/11] io_uring/cmd: Pass compat mode in issue_flags
        commit: 66c87d5639f2f80421b3a01f12dcb7718f996093
[06/11] tools headers: Grab copy of io_uring.h
        commit: c36507ed1a2c2cb05c4a2aad9acb39ca5d7c12fe
[07/11] selftests/net: Extract uring helpers to be reusable
        commit: 11336afdd4141bbbd144b118a8a559b1993dc5d2
[08/11] io_uring/cmd: return -EOPNOTSUPP if net is disabled
        commit: d807234143872e460cdf851f1b2bbda2b427f95d
[09/11] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
        commit: c3199f61b896cdef3664dc12729a2beadf322783
[10/11] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
        commit: 43ad652250d24e9496f4cd6a0d670417807ac9a0
[11/11] selftests/bpf/sockopt: Add io_uring support
        commit: d9710f1d12a99738ff168e252ab8e9ffdeb90ed5

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 04/11] net/socket: Break down __sys_getsockopt
  2023-10-16 13:47 ` [PATCH v7 04/11] net/socket: Break down __sys_getsockopt Breno Leitao
  2023-10-19  0:20   ` Jakub Kicinski
@ 2023-10-19 19:12   ` Martin KaFai Lau
  2023-10-19 20:04     ` Jens Axboe
  1 sibling, 1 reply; 30+ messages in thread
From: Martin KaFai Lau @ 2023-10-19 19:12 UTC (permalink / raw)
  To: Breno Leitao
  Cc: bpf, linux-kernel, netdev, io-uring, Kuniyuki Iwashima,
	Alexander Mikhalitsyn, David Howells, sdf, axboe, asml.silence,
	willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
	Eric Dumazet

On 10/16/23 6:47 AM, Breno Leitao wrote:
> diff --git a/net/socket.c b/net/socket.c
> index 0087f8c071e7..f4c156a1987e 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -2350,6 +2350,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
>   INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
>   							 int optname));
>   
> +int do_sock_getsockopt(struct socket *sock, bool compat, int level,
> +		       int optname, sockptr_t optval, sockptr_t optlen)
> +{
> +	int max_optlen __maybe_unused;
> +	const struct proto_ops *ops;
> +	int err;
> +
> +	err = security_socket_getsockopt(sock, level, optname);
> +	if (err)
> +		return err;
> +
> +	ops = READ_ONCE(sock->ops);
> +	if (level == SOL_SOCKET) {
> +		err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
> +	} else if (unlikely(!ops->getsockopt)) {
> +		err = -EOPNOTSUPP;
> +	} else {
> +		if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
> +			      "Invalid argument type"))
> +			return -EOPNOTSUPP;
> +
> +		err = ops->getsockopt(sock, level, optname, optval.user,
> +				      optlen.user);
> +	}
> +
> +	if (!compat) {
> +		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);

The max_optlen was done before the above sk_getsockopt. The bpf CI cannot catch 
it because it cannot apply patch 5 cleanly. I ran the following out of the 
linux-block tree:

$> ./test_progs -t sockopt_sk
test_sockopt_sk:PASS:join_cgroup /sockopt_sk 0 nsec
run_test:PASS:skel_load 0 nsec
run_test:PASS:setsockopt_link 0 nsec
run_test:PASS:getsockopt_link 0 nsec
(/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c:111: 
errno: Operation not permitted) Failed to call getsockopt, ret=-1
run_test:FAIL:getsetsockopt unexpected error: -1 (errno 1)
#217     sockopt_sk:FAIL


> +		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
> +						     optval, optlen, max_optlen,
> +						     err);
> +	}
> +
> +	return err;
> +}
> +EXPORT_SYMBOL(do_sock_getsockopt);
> +
>   /*
>    *	Get a socket option. Because we don't know the option lengths we have
>    *	to pass a user mode parameter for the protocols to sort out.
> @@ -2357,37 +2393,18 @@ INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
>   int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
>   		int __user *optlen)
>   {
> -	int max_optlen __maybe_unused;
> -	const struct proto_ops *ops;
>   	int err, fput_needed;
>   	struct socket *sock;
> +	bool compat;
>   
>   	sock = sockfd_lookup_light(fd, &err, &fput_needed);
>   	if (!sock)
>   		return err;
>   
> -	err = security_socket_getsockopt(sock, level, optname);
> -	if (err)
> -		goto out_put;
> -
> -	if (!in_compat_syscall())
> -		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);

The old max_optlen was done here.

> +	compat = in_compat_syscall();
> +	err = do_sock_getsockopt(sock, compat, level, optname,
> +				 USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
>   
> -	ops = READ_ONCE(sock->ops);
> -	if (level == SOL_SOCKET)
> -		err = sock_getsockopt(sock, level, optname, optval, optlen);
> -	else if (unlikely(!ops->getsockopt))
> -		err = -EOPNOTSUPP;
> -	else
> -		err = ops->getsockopt(sock, level, optname, optval,
> -					    optlen);
> -
> -	if (!in_compat_syscall())
> -		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
> -						     USER_SOCKPTR(optval),
> -						     USER_SOCKPTR(optlen),
> -						     max_optlen, err);
> -out_put:
>   	fput_light(sock->file, fput_needed);
>   	return err;
>   }


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 01/11] bpf: Add sockptr support for getsockopt
  2023-10-16 13:47 ` [PATCH v7 01/11] bpf: Add sockptr support for getsockopt Breno Leitao
@ 2023-10-19 19:24   ` Martin KaFai Lau
  0 siblings, 0 replies; 30+ messages in thread
From: Martin KaFai Lau @ 2023-10-19 19:24 UTC (permalink / raw)
  To: Breno Leitao
  Cc: bpf, linux-kernel, netdev, io-uring, sdf, axboe, asml.silence,
	willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
	Eric Dumazet

On 10/16/23 6:47 AM, Breno Leitao wrote:
> The whole network stack uses sockptr, and while it doesn't move to
> something more modern, let's use sockptr in getsockptr BPF hooks, so, it
> could be used by other callers.
> 
> The main motivation for this change is to use it in the io_uring
> {g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
> kernel value for optlen.

Acked-by: Martin KaFai Lau <martin.lau@kernel.org>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 02/11] bpf: Add sockptr support for setsockopt
  2023-10-16 13:47 ` [PATCH v7 02/11] bpf: Add sockptr support for setsockopt Breno Leitao
@ 2023-10-19 19:32   ` Martin KaFai Lau
  0 siblings, 0 replies; 30+ messages in thread
From: Martin KaFai Lau @ 2023-10-19 19:32 UTC (permalink / raw)
  To: Breno Leitao
  Cc: bpf, linux-kernel, netdev, io-uring, sdf, axboe, asml.silence,
	willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
	Eric Dumazet

On 10/16/23 6:47 AM, Breno Leitao wrote:
> The whole network stack uses sockptr, and while it doesn't move to
> something more modern, let's use sockptr in setsockptr BPF hooks, so, it
> could be used by other callers.
> 
> The main motivation for this change is to use it in the io_uring
> {g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
> kernel value for optlen.

Acked-by: Martin KaFai Lau <martin.lau@kernel.org>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 03/11] net/socket: Break down __sys_setsockopt
  2023-10-16 13:47 ` [PATCH v7 03/11] net/socket: Break down __sys_setsockopt Breno Leitao
  2023-10-19  0:20   ` Jakub Kicinski
@ 2023-10-19 19:38   ` Martin KaFai Lau
  1 sibling, 0 replies; 30+ messages in thread
From: Martin KaFai Lau @ 2023-10-19 19:38 UTC (permalink / raw)
  To: Breno Leitao
  Cc: bpf, linux-kernel, netdev, io-uring, Willem de Bruijn, sdf,
	axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
	krisman, David S. Miller, Eric Dumazet

On 10/16/23 6:47 AM, Breno Leitao wrote:
> Split __sys_setsockopt() into two functions by removing the core
> logic into a sub-function (do_sock_setsockopt()). This will avoid
> code duplication when doing the same operation in other callers, for
> instance.
> 
> do_sock_setsockopt() will be called by io_uring setsockopt() command
> operation in the following patch.

Acked-by: Martin KaFai Lau <martin.lau@kernel.org>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 11/11] selftests/bpf/sockopt: Add io_uring support
  2023-10-16 13:47 ` [PATCH v7 11/11] selftests/bpf/sockopt: Add io_uring support Breno Leitao
@ 2023-10-19 19:51   ` Martin KaFai Lau
  0 siblings, 0 replies; 30+ messages in thread
From: Martin KaFai Lau @ 2023-10-19 19:51 UTC (permalink / raw)
  To: Breno Leitao
  Cc: bpf, linux-kernel, netdev, io-uring, Daniel Müller,
	open list:KERNEL SELFTEST FRAMEWORK, sdf, axboe, asml.silence,
	willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan

On 10/16/23 6:47 AM, Breno Leitao wrote:
> Expand the sockopt test to use also check for io_uring {g,s}etsockopt
> commands operations.
> 
> This patch starts by marking each test if they support io_uring support
> or not.
> 
> Right now, io_uring cmd getsockopt() has a limitation of only
> accepting level == SOL_SOCKET, otherwise it returns -EOPNOTSUPP. Since
> there aren't any test exercising getsockopt(level == SOL_SOCKET), this
> patch changes two tests to use level == SOL_SOCKET, they are
> "getsockopt: support smaller ctx->optlen" and "getsockopt: read
> ctx->optlen".
> There is no limitation for the setsockopt() part.
> 
> Later, each test runs using regular {g,s}etsockopt systemcalls, and, if
> liburing is supported, execute the same test (again), but calling
> liburing {g,s}setsockopt commands.
> 
> This patch also changes the level of two tests to use SOL_SOCKET for the
> following two tests. This is going to help to exercise the io_uring
> subsystem:
>   * getsockopt: read ctx->optlen
>   * getsockopt: support smaller ctx->optlen

Acked-by: Martin KaFai Lau <martin.lau@kernel.org>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 04/11] net/socket: Break down __sys_getsockopt
  2023-10-19 19:12   ` Martin KaFai Lau
@ 2023-10-19 20:04     ` Jens Axboe
  2023-10-19 20:37       ` Martin KaFai Lau
  0 siblings, 1 reply; 30+ messages in thread
From: Jens Axboe @ 2023-10-19 20:04 UTC (permalink / raw)
  To: Martin KaFai Lau, Breno Leitao
  Cc: bpf, linux-kernel, netdev, io-uring, Kuniyuki Iwashima,
	Alexander Mikhalitsyn, David Howells, sdf, asml.silence,
	willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
	Eric Dumazet

On 10/19/23 1:12 PM, Martin KaFai Lau wrote:
> On 10/16/23 6:47?AM, Breno Leitao wrote:
>> diff --git a/net/socket.c b/net/socket.c
>> index 0087f8c071e7..f4c156a1987e 100644
>> --- a/net/socket.c
>> +++ b/net/socket.c
>> @@ -2350,6 +2350,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
>>   INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
>>                                int optname));
>>   +int do_sock_getsockopt(struct socket *sock, bool compat, int level,
>> +               int optname, sockptr_t optval, sockptr_t optlen)
>> +{
>> +    int max_optlen __maybe_unused;
>> +    const struct proto_ops *ops;
>> +    int err;
>> +
>> +    err = security_socket_getsockopt(sock, level, optname);
>> +    if (err)
>> +        return err;
>> +
>> +    ops = READ_ONCE(sock->ops);
>> +    if (level == SOL_SOCKET) {
>> +        err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
>> +    } else if (unlikely(!ops->getsockopt)) {
>> +        err = -EOPNOTSUPP;
>> +    } else {
>> +        if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
>> +                  "Invalid argument type"))
>> +            return -EOPNOTSUPP;
>> +
>> +        err = ops->getsockopt(sock, level, optname, optval.user,
>> +                      optlen.user);
>> +    }
>> +
>> +    if (!compat) {
>> +        max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
> 
> The max_optlen was done before the above sk_getsockopt. The bpf CI cannot catch it because it cannot apply patch 5 cleanly. I ran the following out of the linux-block tree:
> 
> $> ./test_progs -t sockopt_sk
> test_sockopt_sk:PASS:join_cgroup /sockopt_sk 0 nsec
> run_test:PASS:skel_load 0 nsec
> run_test:PASS:setsockopt_link 0 nsec
> run_test:PASS:getsockopt_link 0 nsec
> (/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c:111: errno: Operation not permitted) Failed to call getsockopt, ret=-1
> run_test:FAIL:getsetsockopt unexpected error: -1 (errno 1)
> #217     sockopt_sk:FAIL

Does it work with this incremental? I can fold that in, will rebase
anyway to collect acks.


diff --git a/net/socket.c b/net/socket.c
index bccd257e13fe..eb6960958026 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2344,6 +2344,9 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
 	if (err)
 		return err;
 
+	if (!compat)
+		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+
 	ops = READ_ONCE(sock->ops);
 	if (level == SOL_SOCKET) {
 		err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
@@ -2358,12 +2361,10 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
 				      optlen.user);
 	}
 
-	if (!compat) {
-		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+	if (!compat)
 		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
 						     optval, optlen, max_optlen,
 						     err);
-	}
 
 	return err;
 }

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 04/11] net/socket: Break down __sys_getsockopt
  2023-10-19 20:04     ` Jens Axboe
@ 2023-10-19 20:37       ` Martin KaFai Lau
  2023-10-19 22:32         ` Jens Axboe
  0 siblings, 1 reply; 30+ messages in thread
From: Martin KaFai Lau @ 2023-10-19 20:37 UTC (permalink / raw)
  To: Jens Axboe, Breno Leitao
  Cc: bpf, linux-kernel, netdev, io-uring, Kuniyuki Iwashima,
	Alexander Mikhalitsyn, David Howells, sdf, asml.silence,
	willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
	Eric Dumazet

On 10/19/23 1:04 PM, Jens Axboe wrote:
> On 10/19/23 1:12 PM, Martin KaFai Lau wrote:
>> On 10/16/23 6:47?AM, Breno Leitao wrote:
>>> diff --git a/net/socket.c b/net/socket.c
>>> index 0087f8c071e7..f4c156a1987e 100644
>>> --- a/net/socket.c
>>> +++ b/net/socket.c
>>> @@ -2350,6 +2350,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
>>>    INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
>>>                                 int optname));
>>>    +int do_sock_getsockopt(struct socket *sock, bool compat, int level,
>>> +               int optname, sockptr_t optval, sockptr_t optlen)
>>> +{
>>> +    int max_optlen __maybe_unused;
>>> +    const struct proto_ops *ops;
>>> +    int err;
>>> +
>>> +    err = security_socket_getsockopt(sock, level, optname);
>>> +    if (err)
>>> +        return err;
>>> +
>>> +    ops = READ_ONCE(sock->ops);
>>> +    if (level == SOL_SOCKET) {
>>> +        err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
>>> +    } else if (unlikely(!ops->getsockopt)) {
>>> +        err = -EOPNOTSUPP;
>>> +    } else {
>>> +        if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
>>> +                  "Invalid argument type"))
>>> +            return -EOPNOTSUPP;
>>> +
>>> +        err = ops->getsockopt(sock, level, optname, optval.user,
>>> +                      optlen.user);
>>> +    }
>>> +
>>> +    if (!compat) {
>>> +        max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
>>
>> The max_optlen was done before the above sk_getsockopt. The bpf CI cannot catch it because it cannot apply patch 5 cleanly. I ran the following out of the linux-block tree:
>>
>> $> ./test_progs -t sockopt_sk
>> test_sockopt_sk:PASS:join_cgroup /sockopt_sk 0 nsec
>> run_test:PASS:skel_load 0 nsec
>> run_test:PASS:setsockopt_link 0 nsec
>> run_test:PASS:getsockopt_link 0 nsec
>> (/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c:111: errno: Operation not permitted) Failed to call getsockopt, ret=-1
>> run_test:FAIL:getsetsockopt unexpected error: -1 (errno 1)
>> #217     sockopt_sk:FAIL
> 
> Does it work with this incremental? I can fold that in, will rebase
> anyway to collect acks.

Yes, that should work.

Acked-by: Martin KaFai Lau <martin.lau@kernel.org>

> 
> 
> diff --git a/net/socket.c b/net/socket.c
> index bccd257e13fe..eb6960958026 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -2344,6 +2344,9 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
>   	if (err)
>   		return err;
>   
> +	if (!compat)
> +		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
> +
>   	ops = READ_ONCE(sock->ops);
>   	if (level == SOL_SOCKET) {
>   		err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
> @@ -2358,12 +2361,10 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
>   				      optlen.user);
>   	}
>   
> -	if (!compat) {
> -		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
> +	if (!compat)
>   		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
>   						     optval, optlen, max_optlen,
>   						     err);
> -	}
>   
>   	return err;
>   }
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 04/11] net/socket: Break down __sys_getsockopt
  2023-10-19 20:37       ` Martin KaFai Lau
@ 2023-10-19 22:32         ` Jens Axboe
  0 siblings, 0 replies; 30+ messages in thread
From: Jens Axboe @ 2023-10-19 22:32 UTC (permalink / raw)
  To: Martin KaFai Lau, Breno Leitao
  Cc: bpf, linux-kernel, netdev, io-uring, Kuniyuki Iwashima,
	Alexander Mikhalitsyn, David Howells, sdf, asml.silence,
	willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
	Eric Dumazet

On 10/19/23 2:37 PM, Martin KaFai Lau wrote:
> On 10/19/23 1:04?PM, Jens Axboe wrote:
>> On 10/19/23 1:12 PM, Martin KaFai Lau wrote:
>>> On 10/16/23 6:47?AM, Breno Leitao wrote:
>>>> diff --git a/net/socket.c b/net/socket.c
>>>> index 0087f8c071e7..f4c156a1987e 100644
>>>> --- a/net/socket.c
>>>> +++ b/net/socket.c
>>>> @@ -2350,6 +2350,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
>>>>    INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
>>>>                                 int optname));
>>>>    +int do_sock_getsockopt(struct socket *sock, bool compat, int level,
>>>> +               int optname, sockptr_t optval, sockptr_t optlen)
>>>> +{
>>>> +    int max_optlen __maybe_unused;
>>>> +    const struct proto_ops *ops;
>>>> +    int err;
>>>> +
>>>> +    err = security_socket_getsockopt(sock, level, optname);
>>>> +    if (err)
>>>> +        return err;
>>>> +
>>>> +    ops = READ_ONCE(sock->ops);
>>>> +    if (level == SOL_SOCKET) {
>>>> +        err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
>>>> +    } else if (unlikely(!ops->getsockopt)) {
>>>> +        err = -EOPNOTSUPP;
>>>> +    } else {
>>>> +        if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
>>>> +                  "Invalid argument type"))
>>>> +            return -EOPNOTSUPP;
>>>> +
>>>> +        err = ops->getsockopt(sock, level, optname, optval.user,
>>>> +                      optlen.user);
>>>> +    }
>>>> +
>>>> +    if (!compat) {
>>>> +        max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
>>>
>>> The max_optlen was done before the above sk_getsockopt. The bpf CI cannot catch it because it cannot apply patch 5 cleanly. I ran the following out of the linux-block tree:
>>>
>>> $> ./test_progs -t sockopt_sk
>>> test_sockopt_sk:PASS:join_cgroup /sockopt_sk 0 nsec
>>> run_test:PASS:skel_load 0 nsec
>>> run_test:PASS:setsockopt_link 0 nsec
>>> run_test:PASS:getsockopt_link 0 nsec
>>> (/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c:111: errno: Operation not permitted) Failed to call getsockopt, ret=-1
>>> run_test:FAIL:getsetsockopt unexpected error: -1 (errno 1)
>>> #217     sockopt_sk:FAIL
>>
>> Does it work with this incremental? I can fold that in, will rebase
>> anyway to collect acks.
> 
> Yes, that should work.
> 
> Acked-by: Martin KaFai Lau <martin.lau@kernel.org>

Thanks Martin, I'll add that too.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-10-19 22:32 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
2023-10-16 13:47 ` [PATCH v7 01/11] bpf: Add sockptr support for getsockopt Breno Leitao
2023-10-19 19:24   ` Martin KaFai Lau
2023-10-16 13:47 ` [PATCH v7 02/11] bpf: Add sockptr support for setsockopt Breno Leitao
2023-10-19 19:32   ` Martin KaFai Lau
2023-10-16 13:47 ` [PATCH v7 03/11] net/socket: Break down __sys_setsockopt Breno Leitao
2023-10-19  0:20   ` Jakub Kicinski
2023-10-19 19:38   ` Martin KaFai Lau
2023-10-16 13:47 ` [PATCH v7 04/11] net/socket: Break down __sys_getsockopt Breno Leitao
2023-10-19  0:20   ` Jakub Kicinski
2023-10-19 19:12   ` Martin KaFai Lau
2023-10-19 20:04     ` Jens Axboe
2023-10-19 20:37       ` Martin KaFai Lau
2023-10-19 22:32         ` Jens Axboe
2023-10-16 13:47 ` [PATCH v7 05/11] io_uring/cmd: Pass compat mode in issue_flags Breno Leitao
2023-10-16 13:47 ` [PATCH v7 06/11] tools headers: Grab copy of io_uring.h Breno Leitao
2023-10-16 18:56   ` Gabriel Krisman Bertazi
     [not found]   ` <652d877c.250a0220.b0af2.3a66SMTPIN_ADDED_BROKEN@mx.google.com>
2023-10-17 12:34     ` Breno Leitao
2023-10-16 13:47 ` [PATCH v7 07/11] selftests/net: Extract uring helpers to be reusable Breno Leitao
2023-10-16 13:47 ` [PATCH v7 08/11] io_uring/cmd: return -EOPNOTSUPP if net is disabled Breno Leitao
2023-10-16 13:47 ` [PATCH v7 09/11] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
2023-10-16 18:33   ` Gabriel Krisman Bertazi
2023-10-16 13:47 ` [PATCH v7 10/11] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
2023-10-16 18:33   ` Gabriel Krisman Bertazi
2023-10-16 13:47 ` [PATCH v7 11/11] selftests/bpf/sockopt: Add io_uring support Breno Leitao
2023-10-19 19:51   ` Martin KaFai Lau
2023-10-19 14:58 ` [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Jens Axboe
2023-10-19 15:33   ` Jakub Kicinski
2023-10-19 15:40     ` Jens Axboe
2023-10-19 15:41 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).