All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] io_uring: Initial support for {s,g}etsockopt commands
@ 2023-07-24 14:22 Breno Leitao
  2023-07-24 14:22 ` [PATCH 1/4] net: expose sock_use_custom_sol_socket Breno Leitao
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Breno Leitao @ 2023-07-24 14:22 UTC (permalink / raw)
  To: asml.silence, axboe
  Cc: io-uring, netdev, davem, kuba, edumazet, pabeni, linux-kernel, leit

This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
SOCKET_URING_OP_SETSOCKOPT implements generic case, covering all levels
and optnames. On the other hand, SOCKET_URING_OP_GETSOCKOPT just
implements level SOL_SOCKET case, which seems to be the most common
level parameter for get/setsockopt(2).

struct proto_ops->setsockopt() uses sockptr instead of userspace
pointers, which makes it easy to bind to io_uring. Unfortunately
proto_ops->getsockopt() callback uses userspace pointers, except for
SOL_SOCKET, which is handled by sk_getsockopt(). Thus, this patchset
leverages sk_getsockopt() to implement the SOCKET_URING_OP_GETSOCKOPT
case.

PS1: For getsockopt command, the optlen field is not a userspace
pointers, but an absolute value, so this is slightly different from
getsockopt(2) behaviour. The new optlen value is returned in cqe->res.

PS2: The userspace pointers need to be alive until the operation is
completed.

This patch was tested with a new test[1] in liburing.
This patch depends on "io_uring: Add io_uring command support for sockets"[2]

[1] Link: https://github.com/leitao/liburing/blob/getsock/test/socket-getsetsock-cmd.c
[2] Link: https://lore.kernel.org/all/20230627134424.2784797-1-leitao@debian.org/

Changes from RFC:
	* Copy user memory at io_uring subsystem, and call proto_ops
	  callbacks using kernel memory
	* Implement all the cases for SOCKET_URING_OP_SETSOCKOPT

Breno Leitao (4):
  net: expose sock_use_custom_sol_socket
  io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
  io_uring/cmd: Extend support beyond SOL_SOCKET

 include/linux/net.h           |  5 +++
 include/uapi/linux/io_uring.h |  8 ++++
 io_uring/uring_cmd.c          | 81 +++++++++++++++++++++++++++++++++++
 net/socket.c                  |  5 ---
 4 files changed, 94 insertions(+), 5 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] net: expose sock_use_custom_sol_socket
  2023-07-24 14:22 [PATCH 0/3] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
@ 2023-07-24 14:22 ` Breno Leitao
  2023-07-24 14:22 ` [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Breno Leitao @ 2023-07-24 14:22 UTC (permalink / raw)
  To: asml.silence, axboe, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: io-uring, netdev, linux-kernel, leit

Exposing function sock_use_custom_sol_socket(), so it could be used by
io_uring subsystem.

This function will be used in the function io_uring_cmd_setsockopt() in
the coming patch, so, let's move it to the socket.h header file.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/net.h | 5 +++++
 net/socket.c        | 5 -----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 41c608c1b02c..14a956e4530e 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -355,4 +355,9 @@ u32 kernel_sock_ip_overhead(struct sock *sk);
 #define MODULE_ALIAS_NET_PF_PROTO_NAME(pf, proto, name) \
 	MODULE_ALIAS("net-pf-" __stringify(pf) "-proto-" __stringify(proto) \
 		     name)
+
+static inline bool sock_use_custom_sol_socket(const struct socket *sock)
+{
+	return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
+}
 #endif	/* _LINUX_NET_H */
diff --git a/net/socket.c b/net/socket.c
index 1dc23f5298ba..8df54352af83 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2216,11 +2216,6 @@ SYSCALL_DEFINE4(recv, int, fd, void __user *, ubuf, size_t, size,
 	return __sys_recvfrom(fd, ubuf, size, flags, NULL, NULL);
 }
 
-static bool sock_use_custom_sol_socket(const struct socket *sock)
-{
-	return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
-}
-
 /*
  *	Set a socket option. Because we don't know the option lengths we have
  *	to pass the user mode parameter for the protocols to sort out.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-07-24 14:22 [PATCH 0/3] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
  2023-07-24 14:22 ` [PATCH 1/4] net: expose sock_use_custom_sol_socket Breno Leitao
@ 2023-07-24 14:22 ` Breno Leitao
  2023-07-24 17:31   ` Stanislav Fomichev
  2023-07-24 22:58   ` Willem de Bruijn
  2023-07-24 14:22 ` [PATCH 3/4] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
  2023-07-24 14:22 ` [PATCH 4/4] io_uring/cmd: Extend support beyond SOL_SOCKET Breno Leitao
  3 siblings, 2 replies; 17+ messages in thread
From: Breno Leitao @ 2023-07-24 14:22 UTC (permalink / raw)
  To: asml.silence, axboe
  Cc: io-uring, netdev, davem, kuba, edumazet, pabeni, linux-kernel, leit

Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
where a sockptr_t is either userspace or kernel space, and handled as
such.

Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().

Differently from the getsockopt(2), the optlen field is not a userspace
pointers. In getsockopt(2), userspace provides optlen pointer, which is
overwritten by the kernel.  In this implementation, userspace passes a
u32, and the new value is returned in cqe->res. I.e., optlen is not a
pointer.

Important to say that userspace needs to keep the pointer alive until
the CQE is completed.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/uapi/linux/io_uring.h |  7 ++++++
 io_uring/uring_cmd.c          | 43 +++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 9fc7195f25df..8152151080db 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -43,6 +43,10 @@ struct io_uring_sqe {
 	union {
 		__u64	addr;	/* pointer to buffer or iovecs */
 		__u64	splice_off_in;
+		struct {
+			__u32	level;
+			__u32	optname;
+		};
 	};
 	__u32	len;		/* buffer size or number of iovecs */
 	union {
@@ -79,6 +83,7 @@ struct io_uring_sqe {
 	union {
 		__s32	splice_fd_in;
 		__u32	file_index;
+		__u32	optlen;
 		struct {
 			__u16	addr_len;
 			__u16	__pad3[1];
@@ -89,6 +94,7 @@ struct io_uring_sqe {
 			__u64	addr3;
 			__u64	__pad2[1];
 		};
+		__u64	optval;
 		/*
 		 * If the ring is initialized with IORING_SETUP_SQE128, then
 		 * this field is used for 80 bytes of arbitrary command data
@@ -729,6 +735,7 @@ struct io_uring_recvmsg_out {
 enum {
 	SOCKET_URING_OP_SIOCINQ		= 0,
 	SOCKET_URING_OP_SIOCOUTQ,
+	SOCKET_URING_OP_GETSOCKOPT,
 };
 
 #ifdef __cplusplus
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 8e7a03c1b20e..16c857cbf3b0 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -166,6 +166,47 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
 
+static inline int io_uring_cmd_getsockopt(struct socket *sock,
+					  struct io_uring_cmd *cmd)
+{
+	void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
+	int optname = READ_ONCE(cmd->sqe->optname);
+	int optlen = READ_ONCE(cmd->sqe->optlen);
+	int level = READ_ONCE(cmd->sqe->level);
+	void *koptval;
+	int err;
+
+	err = security_socket_getsockopt(sock, level, optname);
+	if (err)
+		return err;
+
+	koptval = kmalloc(optlen, GFP_KERNEL);
+	if (!koptval)
+		return -ENOMEM;
+
+	err = copy_from_user(koptval, optval, optlen);
+	if (err)
+		goto fail;
+
+	err = -EOPNOTSUPP;
+	if (level == SOL_SOCKET) {
+		err = sk_getsockopt(sock->sk, level, optname,
+				    KERNEL_SOCKPTR(koptval),
+				    KERNEL_SOCKPTR(&optlen));
+		if (err)
+			goto fail;
+	}
+
+	err = copy_to_user(optval, koptval, optlen);
+
+fail:
+	kfree(koptval);
+	if (err)
+		return err;
+	else
+		return optlen;
+}
+
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 {
 	struct socket *sock = cmd->file->private_data;
@@ -187,6 +228,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 		if (ret)
 			return ret;
 		return arg;
+	case SOCKET_URING_OP_GETSOCKOPT:
+		return io_uring_cmd_getsockopt(sock, cmd);
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
  2023-07-24 14:22 [PATCH 0/3] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
  2023-07-24 14:22 ` [PATCH 1/4] net: expose sock_use_custom_sol_socket Breno Leitao
  2023-07-24 14:22 ` [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
@ 2023-07-24 14:22 ` Breno Leitao
  2023-07-24 14:22 ` [PATCH 4/4] io_uring/cmd: Extend support beyond SOL_SOCKET Breno Leitao
  3 siblings, 0 replies; 17+ messages in thread
From: Breno Leitao @ 2023-07-24 14:22 UTC (permalink / raw)
  To: asml.silence, axboe
  Cc: io-uring, netdev, davem, kuba, edumazet, pabeni, linux-kernel, leit

Add initial support for SOCKET_URING_OP_SETSOCKOPT. This new command is
similar to setsockopt. This initial implementation just cares about
SOL_SOCKET level for now. The next patch implements the generic case.

Function io_uring_cmd_setsockopt() is inspired by the function
__sys_setsockopt().

"optval" is currently copied to kernel space in io_uring_cmd_setsockopt(),
so, the setsockopt() protocol callbacks operate on kernel space memory
after io_uring handlers.

Important to say that userspace needs to keep the pointer's memory alive
until the operation is completed.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/uapi/linux/io_uring.h |  1 +
 io_uring/uring_cmd.c          | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 8152151080db..3fe82df06abf 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -736,6 +736,7 @@ enum {
 	SOCKET_URING_OP_SIOCINQ		= 0,
 	SOCKET_URING_OP_SIOCOUTQ,
 	SOCKET_URING_OP_GETSOCKOPT,
+	SOCKET_URING_OP_SETSOCKOPT,
 };
 
 #ifdef __cplusplus
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 16c857cbf3b0..d63a3b0f93a3 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -207,6 +207,39 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock,
 		return optlen;
 }
 
+static inline int io_uring_cmd_setsockopt(struct socket *sock,
+					  struct io_uring_cmd *cmd)
+{
+	void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
+	int optname = READ_ONCE(cmd->sqe->optname);
+	int optlen = READ_ONCE(cmd->sqe->optlen);
+	int level = READ_ONCE(cmd->sqe->level);
+	void *koptval;
+	int err;
+
+	err = security_socket_setsockopt(sock, level, optname);
+	if (err)
+		return err;
+
+	koptval = kmalloc(optlen, GFP_KERNEL);
+	if (!koptval)
+		return -ENOMEM;
+
+	err = copy_from_user(koptval, optval, optlen);
+	if (err)
+		goto fail;
+
+	err = -EOPNOTSUPP;
+	if (level == SOL_SOCKET && !sock_use_custom_sol_socket(sock)) {
+		err = sock_setsockopt(sock, level, optname,
+				      KERNEL_SOCKPTR(koptval), optlen);
+	}
+
+fail:
+	kfree(koptval);
+	return err;
+}
+
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 {
 	struct socket *sock = cmd->file->private_data;
@@ -230,6 +263,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 		return arg;
 	case SOCKET_URING_OP_GETSOCKOPT:
 		return io_uring_cmd_getsockopt(sock, cmd);
+	case SOCKET_URING_OP_SETSOCKOPT:
+		return io_uring_cmd_setsockopt(sock, cmd);
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] io_uring/cmd: Extend support beyond SOL_SOCKET
  2023-07-24 14:22 [PATCH 0/3] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (2 preceding siblings ...)
  2023-07-24 14:22 ` [PATCH 3/4] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
@ 2023-07-24 14:22 ` Breno Leitao
  3 siblings, 0 replies; 17+ messages in thread
From: Breno Leitao @ 2023-07-24 14:22 UTC (permalink / raw)
  To: asml.silence, axboe
  Cc: io-uring, netdev, davem, kuba, edumazet, pabeni, linux-kernel, leit

Add generic support for SOCKET_URING_OP_SETSOCKOPT, expanding the
current case, that just execute if level is SOL_SOCKET.

This implementation basically calls sock->ops->setsockopt() with a
kernel allocated optval;

Since the callback for ops->setsockopt() is already using sockptr_t,
then the callbacks are leveraged to be called directly, similarly to
__sys_setsockopt().

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 io_uring/uring_cmd.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index d63a3b0f93a3..ff438826e63f 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -229,11 +229,14 @@ static inline int io_uring_cmd_setsockopt(struct socket *sock,
 	if (err)
 		goto fail;
 
-	err = -EOPNOTSUPP;
-	if (level == SOL_SOCKET && !sock_use_custom_sol_socket(sock)) {
+	if (level == SOL_SOCKET && !sock_use_custom_sol_socket(sock))
 		err = sock_setsockopt(sock, level, optname,
 				      KERNEL_SOCKPTR(koptval), optlen);
-	}
+	else if (unlikely(!sock->ops->setsockopt))
+		err = -EOPNOTSUPP;
+	else
+		err = sock->ops->setsockopt(sock, level, optname,
+					    KERNEL_SOCKPTR(koptval), optlen);
 
 fail:
 	kfree(koptval);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-07-24 14:22 ` [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
@ 2023-07-24 17:31   ` Stanislav Fomichev
  2023-07-25  9:27     ` Breno Leitao
  2023-07-24 22:58   ` Willem de Bruijn
  1 sibling, 1 reply; 17+ messages in thread
From: Stanislav Fomichev @ 2023-07-24 17:31 UTC (permalink / raw)
  To: Breno Leitao
  Cc: asml.silence, axboe, io-uring, netdev, davem, kuba, edumazet,
	pabeni, linux-kernel, leit, bpf

On 07/24, Breno Leitao wrote:
> Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
> level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
> where a sockptr_t is either userspace or kernel space, and handled as
> such.
> 
> Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().

We probably need to also have bpf bits in the new
io_uring_cmd_getsockopt?

> Differently from the getsockopt(2), the optlen field is not a userspace
> pointers. In getsockopt(2), userspace provides optlen pointer, which is
> overwritten by the kernel.  In this implementation, userspace passes a
> u32, and the new value is returned in cqe->res. I.e., optlen is not a
> pointer.
> 
> Important to say that userspace needs to keep the pointer alive until
> the CQE is completed.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  include/uapi/linux/io_uring.h |  7 ++++++
>  io_uring/uring_cmd.c          | 43 +++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 9fc7195f25df..8152151080db 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -43,6 +43,10 @@ struct io_uring_sqe {
>  	union {
>  		__u64	addr;	/* pointer to buffer or iovecs */
>  		__u64	splice_off_in;
> +		struct {
> +			__u32	level;
> +			__u32	optname;
> +		};
>  	};
>  	__u32	len;		/* buffer size or number of iovecs */
>  	union {
> @@ -79,6 +83,7 @@ struct io_uring_sqe {
>  	union {
>  		__s32	splice_fd_in;
>  		__u32	file_index;
> +		__u32	optlen;
>  		struct {
>  			__u16	addr_len;
>  			__u16	__pad3[1];
> @@ -89,6 +94,7 @@ struct io_uring_sqe {
>  			__u64	addr3;
>  			__u64	__pad2[1];
>  		};
> +		__u64	optval;
>  		/*
>  		 * If the ring is initialized with IORING_SETUP_SQE128, then
>  		 * this field is used for 80 bytes of arbitrary command data
> @@ -729,6 +735,7 @@ struct io_uring_recvmsg_out {
>  enum {
>  	SOCKET_URING_OP_SIOCINQ		= 0,
>  	SOCKET_URING_OP_SIOCOUTQ,
> +	SOCKET_URING_OP_GETSOCKOPT,
>  };
>  
>  #ifdef __cplusplus
> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> index 8e7a03c1b20e..16c857cbf3b0 100644
> --- a/io_uring/uring_cmd.c
> +++ b/io_uring/uring_cmd.c
> @@ -166,6 +166,47 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
>  }
>  EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
>  
> +static inline int io_uring_cmd_getsockopt(struct socket *sock,
> +					  struct io_uring_cmd *cmd)
> +{
> +	void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
> +	int optname = READ_ONCE(cmd->sqe->optname);
> +	int optlen = READ_ONCE(cmd->sqe->optlen);
> +	int level = READ_ONCE(cmd->sqe->level);
> +	void *koptval;
> +	int err;
> +
> +	err = security_socket_getsockopt(sock, level, optname);
> +	if (err)
> +		return err;
> +
> +	koptval = kmalloc(optlen, GFP_KERNEL);
> +	if (!koptval)
> +		return -ENOMEM;
> +
> +	err = copy_from_user(koptval, optval, optlen);
> +	if (err)
> +		goto fail;
> +
> +	err = -EOPNOTSUPP;
> +	if (level == SOL_SOCKET) {
> +		err = sk_getsockopt(sock->sk, level, optname,
> +				    KERNEL_SOCKPTR(koptval),
> +				    KERNEL_SOCKPTR(&optlen));
> +		if (err)
> +			goto fail;
> +	}
> +
> +	err = copy_to_user(optval, koptval, optlen);
> +
> +fail:
> +	kfree(koptval);
> +	if (err)
> +		return err;
> +	else
> +		return optlen;
> +}
> +
>  int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
>  {
>  	struct socket *sock = cmd->file->private_data;
> @@ -187,6 +228,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
>  		if (ret)
>  			return ret;
>  		return arg;
> +	case SOCKET_URING_OP_GETSOCKOPT:
> +		return io_uring_cmd_getsockopt(sock, cmd);
>  	default:
>  		return -EOPNOTSUPP;
>  	}
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-07-24 14:22 ` [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
  2023-07-24 17:31   ` Stanislav Fomichev
@ 2023-07-24 22:58   ` Willem de Bruijn
  2023-07-25  9:51     ` Breno Leitao
  1 sibling, 1 reply; 17+ messages in thread
From: Willem de Bruijn @ 2023-07-24 22:58 UTC (permalink / raw)
  To: Breno Leitao, asml.silence, axboe
  Cc: io-uring, netdev, davem, kuba, edumazet, pabeni, linux-kernel, leit

Breno Leitao wrote:
> Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
> level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
> where a sockptr_t is either userspace or kernel space, and handled as
> such.
> 
> Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().
> 
> Differently from the getsockopt(2), the optlen field is not a userspace
> pointers. In getsockopt(2), userspace provides optlen pointer, which is
> overwritten by the kernel.  In this implementation, userspace passes a
> u32, and the new value is returned in cqe->res. I.e., optlen is not a
> pointer.
> 
> Important to say that userspace needs to keep the pointer alive until
> the CQE is completed.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  include/uapi/linux/io_uring.h |  7 ++++++
>  io_uring/uring_cmd.c          | 43 +++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 9fc7195f25df..8152151080db 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -43,6 +43,10 @@ struct io_uring_sqe {
>  	union {
>  		__u64	addr;	/* pointer to buffer or iovecs */
>  		__u64	splice_off_in;
> +		struct {
> +			__u32	level;
> +			__u32	optname;
> +		};
>  	};
>  	__u32	len;		/* buffer size or number of iovecs */
>  	union {
> @@ -79,6 +83,7 @@ struct io_uring_sqe {
>  	union {
>  		__s32	splice_fd_in;
>  		__u32	file_index;
> +		__u32	optlen;
>  		struct {
>  			__u16	addr_len;
>  			__u16	__pad3[1];
> @@ -89,6 +94,7 @@ struct io_uring_sqe {
>  			__u64	addr3;
>  			__u64	__pad2[1];
>  		};
> +		__u64	optval;
>  		/*
>  		 * If the ring is initialized with IORING_SETUP_SQE128, then
>  		 * this field is used for 80 bytes of arbitrary command data
> @@ -729,6 +735,7 @@ struct io_uring_recvmsg_out {
>  enum {
>  	SOCKET_URING_OP_SIOCINQ		= 0,
>  	SOCKET_URING_OP_SIOCOUTQ,
> +	SOCKET_URING_OP_GETSOCKOPT,
>  };
>  
>  #ifdef __cplusplus
> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> index 8e7a03c1b20e..16c857cbf3b0 100644
> --- a/io_uring/uring_cmd.c
> +++ b/io_uring/uring_cmd.c
> @@ -166,6 +166,47 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
>  }
>  EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
>  
> +static inline int io_uring_cmd_getsockopt(struct socket *sock,
> +					  struct io_uring_cmd *cmd)
> +{
> +	void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
> +	int optname = READ_ONCE(cmd->sqe->optname);
> +	int optlen = READ_ONCE(cmd->sqe->optlen);
> +	int level = READ_ONCE(cmd->sqe->level);
> +	void *koptval;
> +	int err;
> +
> +	err = security_socket_getsockopt(sock, level, optname);
> +	if (err)
> +		return err;
> +
> +	koptval = kmalloc(optlen, GFP_KERNEL);
> +	if (!koptval)
> +		return -ENOMEM;

This will try to kmalloc any length that userspace passes?

That is unnecessary ..

> +
> +	err = copy_from_user(koptval, optval, optlen);
> +	if (err)
> +		goto fail;
> +
> +	err = -EOPNOTSUPP;
> +	if (level == SOL_SOCKET) {
> +		err = sk_getsockopt(sock->sk, level, optname,
> +				    KERNEL_SOCKPTR(koptval),
> +				    KERNEL_SOCKPTR(&optlen));

.. sk_getsockopt defines a union of acceptable fields, which
are all fairly small.

I notice that BPF added BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN to
work around the issue of pre-allocating for the worst case.

But that also needs to deal woth other getsockopt levels.


> +		if (err)
> +			goto fail;
> +	}
> +
> +	err = copy_to_user(optval, koptval, optlen);
> +
> +fail:
> +	kfree(koptval);
> +	if (err)
> +		return err;
> +	else
> +		return optlen;
> +}
> +
>  int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
>  {
>  	struct socket *sock = cmd->file->private_data;
> @@ -187,6 +228,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
>  		if (ret)
>  			return ret;
>  		return arg;
> +	case SOCKET_URING_OP_GETSOCKOPT:
> +		return io_uring_cmd_getsockopt(sock, cmd);
>  	default:
>  		return -EOPNOTSUPP;
>  	}
> -- 
> 2.34.1
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-07-24 17:31   ` Stanislav Fomichev
@ 2023-07-25  9:27     ` Breno Leitao
  2023-07-25 17:02       ` Stanislav Fomichev
  0 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2023-07-25  9:27 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: asml.silence, axboe, io-uring, netdev, davem, kuba, edumazet,
	pabeni, linux-kernel, leit, bpf

On Mon, Jul 24, 2023 at 10:31:28AM -0700, Stanislav Fomichev wrote:
> On 07/24, Breno Leitao wrote:
> > Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
> > level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
> > where a sockptr_t is either userspace or kernel space, and handled as
> > such.
> > 
> > Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().
> 
> We probably need to also have bpf bits in the new
> io_uring_cmd_getsockopt?

It might be interesting to have the BPF hook for this function as
well, but I would like to do it in a following patch, so, I can
experiment with it better, if that is OK.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-07-24 22:58   ` Willem de Bruijn
@ 2023-07-25  9:51     ` Breno Leitao
  2023-07-25 13:56       ` Willem de Bruijn
  0 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2023-07-25  9:51 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: asml.silence, axboe, io-uring, netdev, davem, kuba, edumazet,
	pabeni, linux-kernel, leit

On Mon, Jul 24, 2023 at 06:58:04PM -0400, Willem de Bruijn wrote:
> Breno Leitao wrote:
> > Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
> > level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
> > where a sockptr_t is either userspace or kernel space, and handled as
> > such.
> > 
> > Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().
> > 
> > Differently from the getsockopt(2), the optlen field is not a userspace
> > pointers. In getsockopt(2), userspace provides optlen pointer, which is
> > overwritten by the kernel.  In this implementation, userspace passes a
> > u32, and the new value is returned in cqe->res. I.e., optlen is not a
> > pointer.
> > 
> > Important to say that userspace needs to keep the pointer alive until
> > the CQE is completed.
> > 
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> >  include/uapi/linux/io_uring.h |  7 ++++++
> >  io_uring/uring_cmd.c          | 43 +++++++++++++++++++++++++++++++++++
> >  2 files changed, 50 insertions(+)
> > 
> > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > index 9fc7195f25df..8152151080db 100644
> > --- a/include/uapi/linux/io_uring.h
> > +++ b/include/uapi/linux/io_uring.h
> > @@ -43,6 +43,10 @@ struct io_uring_sqe {
> >  	union {
> >  		__u64	addr;	/* pointer to buffer or iovecs */
> >  		__u64	splice_off_in;
> > +		struct {
> > +			__u32	level;
> > +			__u32	optname;
> > +		};
> >  	};
> >  	__u32	len;		/* buffer size or number of iovecs */
> >  	union {
> > @@ -79,6 +83,7 @@ struct io_uring_sqe {
> >  	union {
> >  		__s32	splice_fd_in;
> >  		__u32	file_index;
> > +		__u32	optlen;
> >  		struct {
> >  			__u16	addr_len;
> >  			__u16	__pad3[1];
> > @@ -89,6 +94,7 @@ struct io_uring_sqe {
> >  			__u64	addr3;
> >  			__u64	__pad2[1];
> >  		};
> > +		__u64	optval;
> >  		/*
> >  		 * If the ring is initialized with IORING_SETUP_SQE128, then
> >  		 * this field is used for 80 bytes of arbitrary command data
> > @@ -729,6 +735,7 @@ struct io_uring_recvmsg_out {
> >  enum {
> >  	SOCKET_URING_OP_SIOCINQ		= 0,
> >  	SOCKET_URING_OP_SIOCOUTQ,
> > +	SOCKET_URING_OP_GETSOCKOPT,
> >  };
> >  
> >  #ifdef __cplusplus
> > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> > index 8e7a03c1b20e..16c857cbf3b0 100644
> > --- a/io_uring/uring_cmd.c
> > +++ b/io_uring/uring_cmd.c
> > @@ -166,6 +166,47 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
> >  }
> >  EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
> >  
> > +static inline int io_uring_cmd_getsockopt(struct socket *sock,
> > +					  struct io_uring_cmd *cmd)
> > +{
> > +	void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
> > +	int optname = READ_ONCE(cmd->sqe->optname);
> > +	int optlen = READ_ONCE(cmd->sqe->optlen);
> > +	int level = READ_ONCE(cmd->sqe->level);
> > +	void *koptval;
> > +	int err;
> > +
> > +	err = security_socket_getsockopt(sock, level, optname);
> > +	if (err)
> > +		return err;
> > +
> > +	koptval = kmalloc(optlen, GFP_KERNEL);
> > +	if (!koptval)
> > +		return -ENOMEM;
> 
> This will try to kmalloc any length that userspace passes?

Yes, this value is coming directly from userspace.

> That is unnecessary ..
> > +
> > +	err = copy_from_user(koptval, optval, optlen);
> > +	if (err)
> > +		goto fail;
> > +
> > +	err = -EOPNOTSUPP;
> > +	if (level == SOL_SOCKET) {
> > +		err = sk_getsockopt(sock->sk, level, optname,
> > +				    KERNEL_SOCKPTR(koptval),
> > +				    KERNEL_SOCKPTR(&optlen));
> 
> .. sk_getsockopt defines a union of acceptable fields, which
> are all fairly small.

Right, and they are all I need for SOL_SOCKET level for now.

> I notice that BPF added BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN to
> work around the issue of pre-allocating for the worst case.

I am having a hard time how to understand how
BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN gets the MAX_OPTLEN. Reading this
function, it seems it is conditionally get_user().


	#define BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen)
	({
		int __ret = 0;
		if (cgroup_bpf_enabled(CGROUP_GETSOCKOPT))
			get_user(__ret, optlen);
		__ret;
	})

That said, how is BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN being used to
workaroundthe pre-allocating for the worst case?

> But that also needs to deal woth other getsockopt levels.

Right, and we also have a similar kmalloc() on the setsockopt path
(SOCKET_URING_OP_SETSOCKOPT).

What about if I pass the userspace sockptr (USER_SOCKPTR) to the
{g,s}etsockopt callback directly, instead of kmalloc() in io_uring(), as
I was doing int the RFC[1]? It avoids any extra kmalloc at all.

Something as:

	static inline int io_uring_cmd_getsockopt(struct socket *sock,
						  struct io_uring_cmd *cmd)
	{
		void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
		int optlen = READ_ONCE(cmd->sqe->optlen);
		int optname = READ_ONCE(cmd->sqe->optname);
		int level = READ_ONCE(cmd->sqe->level);
		int err;

		err = security_socket_getsockopt(sock, level, optname);
		if (err)
			return err;

		if (level == SOL_SOCKET) {
			err = sk_getsockopt(sock->sk, level, optname,
					    USER_SOCKPTR(optval),
					    KERNEL_SOCKPTR(&optlen));
			if (err < 0)
				return err;
			return optlen;
		}

		return -EOPNOTSUPP;

Thanks for the review!

[1] Link: https://lore.kernel.org/all/20230719102737.2513246-3-leitao@debian.org/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-07-25  9:51     ` Breno Leitao
@ 2023-07-25 13:56       ` Willem de Bruijn
  2023-07-25 15:23         ` Breno Leitao
  0 siblings, 1 reply; 17+ messages in thread
From: Willem de Bruijn @ 2023-07-25 13:56 UTC (permalink / raw)
  To: Breno Leitao, Willem de Bruijn
  Cc: asml.silence, axboe, io-uring, netdev, davem, kuba, edumazet,
	pabeni, linux-kernel, leit

Breno Leitao wrote:
> On Mon, Jul 24, 2023 at 06:58:04PM -0400, Willem de Bruijn wrote:
> > Breno Leitao wrote:
> > > Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
> > > level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
> > > where a sockptr_t is either userspace or kernel space, and handled as
> > > such.
> > > 
> > > Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().
> > > 
> > > Differently from the getsockopt(2), the optlen field is not a userspace
> > > pointers. In getsockopt(2), userspace provides optlen pointer, which is
> > > overwritten by the kernel.  In this implementation, userspace passes a
> > > u32, and the new value is returned in cqe->res. I.e., optlen is not a
> > > pointer.
> > > 
> > > Important to say that userspace needs to keep the pointer alive until
> > > the CQE is completed.
> > > 
> > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > > ---
> > >  include/uapi/linux/io_uring.h |  7 ++++++
> > >  io_uring/uring_cmd.c          | 43 +++++++++++++++++++++++++++++++++++
> > >  2 files changed, 50 insertions(+)
> > > 
> > > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > > index 9fc7195f25df..8152151080db 100644
> > > --- a/include/uapi/linux/io_uring.h
> > > +++ b/include/uapi/linux/io_uring.h
> > > @@ -43,6 +43,10 @@ struct io_uring_sqe {
> > >  	union {
> > >  		__u64	addr;	/* pointer to buffer or iovecs */
> > >  		__u64	splice_off_in;
> > > +		struct {
> > > +			__u32	level;
> > > +			__u32	optname;
> > > +		};
> > >  	};
> > >  	__u32	len;		/* buffer size or number of iovecs */
> > >  	union {
> > > @@ -79,6 +83,7 @@ struct io_uring_sqe {
> > >  	union {
> > >  		__s32	splice_fd_in;
> > >  		__u32	file_index;
> > > +		__u32	optlen;
> > >  		struct {
> > >  			__u16	addr_len;
> > >  			__u16	__pad3[1];
> > > @@ -89,6 +94,7 @@ struct io_uring_sqe {
> > >  			__u64	addr3;
> > >  			__u64	__pad2[1];
> > >  		};
> > > +		__u64	optval;
> > >  		/*
> > >  		 * If the ring is initialized with IORING_SETUP_SQE128, then
> > >  		 * this field is used for 80 bytes of arbitrary command data
> > > @@ -729,6 +735,7 @@ struct io_uring_recvmsg_out {
> > >  enum {
> > >  	SOCKET_URING_OP_SIOCINQ		= 0,
> > >  	SOCKET_URING_OP_SIOCOUTQ,
> > > +	SOCKET_URING_OP_GETSOCKOPT,
> > >  };
> > >  
> > >  #ifdef __cplusplus
> > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> > > index 8e7a03c1b20e..16c857cbf3b0 100644
> > > --- a/io_uring/uring_cmd.c
> > > +++ b/io_uring/uring_cmd.c
> > > @@ -166,6 +166,47 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
> > >  }
> > >  EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
> > >  
> > > +static inline int io_uring_cmd_getsockopt(struct socket *sock,
> > > +					  struct io_uring_cmd *cmd)
> > > +{
> > > +	void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
> > > +	int optname = READ_ONCE(cmd->sqe->optname);
> > > +	int optlen = READ_ONCE(cmd->sqe->optlen);
> > > +	int level = READ_ONCE(cmd->sqe->level);
> > > +	void *koptval;
> > > +	int err;
> > > +
> > > +	err = security_socket_getsockopt(sock, level, optname);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	koptval = kmalloc(optlen, GFP_KERNEL);
> > > +	if (!koptval)
> > > +		return -ENOMEM;
> > 
> > This will try to kmalloc any length that userspace passes?
> 
> Yes, this value is coming directly from userspace.
> 
> > That is unnecessary ..
> > > +
> > > +	err = copy_from_user(koptval, optval, optlen);
> > > +	if (err)
> > > +		goto fail;
> > > +
> > > +	err = -EOPNOTSUPP;
> > > +	if (level == SOL_SOCKET) {
> > > +		err = sk_getsockopt(sock->sk, level, optname,
> > > +				    KERNEL_SOCKPTR(koptval),
> > > +				    KERNEL_SOCKPTR(&optlen));
> > 
> > .. sk_getsockopt defines a union of acceptable fields, which
> > are all fairly small.
> 
> Right, and they are all I need for SOL_SOCKET level for now.
> 
> > I notice that BPF added BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN to
> > work around the issue of pre-allocating for the worst case.
> 
> I am having a hard time how to understand how
> BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN gets the MAX_OPTLEN. Reading this
> function, it seems it is conditionally get_user().
> 
> 
> 	#define BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen)
> 	({
> 		int __ret = 0;
> 		if (cgroup_bpf_enabled(CGROUP_GETSOCKOPT))
> 			get_user(__ret, optlen);
> 		__ret;
> 	})
> 
> That said, how is BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN being used to
> workaroundthe pre-allocating for the worst case?
> 
> > But that also needs to deal woth other getsockopt levels.
> 
> Right, and we also have a similar kmalloc() on the setsockopt path
> (SOCKET_URING_OP_SETSOCKOPT).
> 
> What about if I pass the userspace sockptr (USER_SOCKPTR) to the
> {g,s}etsockopt callback directly, instead of kmalloc() in io_uring(), as
> I was doing int the RFC[1]? It avoids any extra kmalloc at all.

That looks like a great solution to me.

Avoids the whole problem of kmalloc based on untrusted user input.

> Something as:
> 
> 	static inline int io_uring_cmd_getsockopt(struct socket *sock,
> 						  struct io_uring_cmd *cmd)
> 	{
> 		void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
> 		int optlen = READ_ONCE(cmd->sqe->optlen);
> 		int optname = READ_ONCE(cmd->sqe->optname);
> 		int level = READ_ONCE(cmd->sqe->level);
> 		int err;
> 
> 		err = security_socket_getsockopt(sock, level, optname);
> 		if (err)
> 			return err;
> 
> 		if (level == SOL_SOCKET) {
> 			err = sk_getsockopt(sock->sk, level, optname,
> 					    USER_SOCKPTR(optval),
> 					    KERNEL_SOCKPTR(&optlen));
> 			if (err < 0)
> 				return err;
> 			return optlen;
> 		}

Do you have a plan to extend this to other levels?

No need to implement immediately, but it would be good to know
whether it is feasible to extend the current solution when the
need (inevitably) shows up.

> 
> 		return -EOPNOTSUPP;
> 
> Thanks for the review!
> 
> [1] Link: https://lore.kernel.org/all/20230719102737.2513246-3-leitao@debian.org/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-07-25 13:56       ` Willem de Bruijn
@ 2023-07-25 15:23         ` Breno Leitao
  0 siblings, 0 replies; 17+ messages in thread
From: Breno Leitao @ 2023-07-25 15:23 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: asml.silence, axboe, io-uring, netdev, davem, kuba, edumazet,
	pabeni, linux-kernel, leit

On Tue, Jul 25, 2023 at 09:56:50AM -0400, Willem de Bruijn wrote:
> Breno Leitao wrote:
> > On Mon, Jul 24, 2023 at 06:58:04PM -0400, Willem de Bruijn wrote:
> > > Breno Leitao wrote:
> > > > Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
> > > > level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
> > > > where a sockptr_t is either userspace or kernel space, and handled as
> > > > such.
> > > > 
> > > > Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().
> > > > 
> > > > Differently from the getsockopt(2), the optlen field is not a userspace
> > > > pointers. In getsockopt(2), userspace provides optlen pointer, which is
> > > > overwritten by the kernel.  In this implementation, userspace passes a
> > > > u32, and the new value is returned in cqe->res. I.e., optlen is not a
> > > > pointer.
> > > > 
> > > > Important to say that userspace needs to keep the pointer alive until
> > > > the CQE is completed.
> > > > 
> > > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > > > ---
> > > >  include/uapi/linux/io_uring.h |  7 ++++++
> > > >  io_uring/uring_cmd.c          | 43 +++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 50 insertions(+)
> > > > 
> > > > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > > > index 9fc7195f25df..8152151080db 100644
> > > > --- a/include/uapi/linux/io_uring.h
> > > > +++ b/include/uapi/linux/io_uring.h
> > > > @@ -43,6 +43,10 @@ struct io_uring_sqe {
> > > >  	union {
> > > >  		__u64	addr;	/* pointer to buffer or iovecs */
> > > >  		__u64	splice_off_in;
> > > > +		struct {
> > > > +			__u32	level;
> > > > +			__u32	optname;
> > > > +		};
> > > >  	};
> > > >  	__u32	len;		/* buffer size or number of iovecs */
> > > >  	union {
> > > > @@ -79,6 +83,7 @@ struct io_uring_sqe {
> > > >  	union {
> > > >  		__s32	splice_fd_in;
> > > >  		__u32	file_index;
> > > > +		__u32	optlen;
> > > >  		struct {
> > > >  			__u16	addr_len;
> > > >  			__u16	__pad3[1];
> > > > @@ -89,6 +94,7 @@ struct io_uring_sqe {
> > > >  			__u64	addr3;
> > > >  			__u64	__pad2[1];
> > > >  		};
> > > > +		__u64	optval;
> > > >  		/*
> > > >  		 * If the ring is initialized with IORING_SETUP_SQE128, then
> > > >  		 * this field is used for 80 bytes of arbitrary command data
> > > > @@ -729,6 +735,7 @@ struct io_uring_recvmsg_out {
> > > >  enum {
> > > >  	SOCKET_URING_OP_SIOCINQ		= 0,
> > > >  	SOCKET_URING_OP_SIOCOUTQ,
> > > > +	SOCKET_URING_OP_GETSOCKOPT,
> > > >  };
> > > >  
> > > >  #ifdef __cplusplus
> > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> > > > index 8e7a03c1b20e..16c857cbf3b0 100644
> > > > --- a/io_uring/uring_cmd.c
> > > > +++ b/io_uring/uring_cmd.c
> > > > @@ -166,6 +166,47 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
> > > >  
> > > > +static inline int io_uring_cmd_getsockopt(struct socket *sock,
> > > > +					  struct io_uring_cmd *cmd)
> > > > +{
> > > > +	void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
> > > > +	int optname = READ_ONCE(cmd->sqe->optname);
> > > > +	int optlen = READ_ONCE(cmd->sqe->optlen);
> > > > +	int level = READ_ONCE(cmd->sqe->level);
> > > > +	void *koptval;
> > > > +	int err;
> > > > +
> > > > +	err = security_socket_getsockopt(sock, level, optname);
> > > > +	if (err)
> > > > +		return err;
> > > > +
> > > > +	koptval = kmalloc(optlen, GFP_KERNEL);
> > > > +	if (!koptval)
> > > > +		return -ENOMEM;
> > > 
> > > This will try to kmalloc any length that userspace passes?
> > 
> > Yes, this value is coming directly from userspace.
> > 
> > > That is unnecessary ..
> > > > +
> > > > +	err = copy_from_user(koptval, optval, optlen);
> > > > +	if (err)
> > > > +		goto fail;
> > > > +
> > > > +	err = -EOPNOTSUPP;
> > > > +	if (level == SOL_SOCKET) {
> > > > +		err = sk_getsockopt(sock->sk, level, optname,
> > > > +				    KERNEL_SOCKPTR(koptval),
> > > > +				    KERNEL_SOCKPTR(&optlen));
> > > 
> > > .. sk_getsockopt defines a union of acceptable fields, which
> > > are all fairly small.
> > 
> > Right, and they are all I need for SOL_SOCKET level for now.
> > 
> > > I notice that BPF added BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN to
> > > work around the issue of pre-allocating for the worst case.
> > 
> > I am having a hard time how to understand how
> > BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN gets the MAX_OPTLEN. Reading this
> > function, it seems it is conditionally get_user().
> > 
> > 
> > 	#define BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen)
> > 	({
> > 		int __ret = 0;
> > 		if (cgroup_bpf_enabled(CGROUP_GETSOCKOPT))
> > 			get_user(__ret, optlen);
> > 		__ret;
> > 	})
> > 
> > That said, how is BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN being used to
> > workaroundthe pre-allocating for the worst case?
> > 
> > > But that also needs to deal woth other getsockopt levels.
> > 
> > Right, and we also have a similar kmalloc() on the setsockopt path
> > (SOCKET_URING_OP_SETSOCKOPT).
> > 
> > What about if I pass the userspace sockptr (USER_SOCKPTR) to the
> > {g,s}etsockopt callback directly, instead of kmalloc() in io_uring(), as
> > I was doing int the RFC[1]? It avoids any extra kmalloc at all.
> 
> That looks like a great solution to me.
> 
> Avoids the whole problem of kmalloc based on untrusted user input.
> 
> > Something as:
> > 
> > 	static inline int io_uring_cmd_getsockopt(struct socket *sock,
> > 						  struct io_uring_cmd *cmd)
> > 	{
> > 		void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
> > 		int optlen = READ_ONCE(cmd->sqe->optlen);
> > 		int optname = READ_ONCE(cmd->sqe->optname);
> > 		int level = READ_ONCE(cmd->sqe->level);
> > 		int err;
> > 
> > 		err = security_socket_getsockopt(sock, level, optname);
> > 		if (err)
> > 			return err;
> > 
> > 		if (level == SOL_SOCKET) {
> > 			err = sk_getsockopt(sock->sk, level, optname,
> > 					    USER_SOCKPTR(optval),
> > 					    KERNEL_SOCKPTR(&optlen));
> > 			if (err < 0)
> > 				return err;
> > 			return optlen;
> > 		}
> 
> Do you have a plan to extend this to other levels?
> 
> No need to implement immediately, but it would be good to know
> whether it is feasible to extend the current solution when the
> need (inevitably) shows up.

Yes, I plan to extend getsockopt() to all levels, but it means we need
to convert proto_ops->setsockopt to uset sockptr_t instead of
userpointers. This might require some intrusive changes, but totally
doable.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-07-25  9:27     ` Breno Leitao
@ 2023-07-25 17:02       ` Stanislav Fomichev
  2023-07-25 17:56         ` Martin KaFai Lau
  2023-07-28 17:03         ` Breno Leitao
  0 siblings, 2 replies; 17+ messages in thread
From: Stanislav Fomichev @ 2023-07-25 17:02 UTC (permalink / raw)
  To: Breno Leitao
  Cc: asml.silence, axboe, io-uring, netdev, davem, kuba, edumazet,
	pabeni, linux-kernel, leit, bpf, ast, martin.lau

On 07/25, Breno Leitao wrote:
> On Mon, Jul 24, 2023 at 10:31:28AM -0700, Stanislav Fomichev wrote:
> > On 07/24, Breno Leitao wrote:
> > > Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
> > > level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
> > > where a sockptr_t is either userspace or kernel space, and handled as
> > > such.
> > > 
> > > Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().
> > 
> > We probably need to also have bpf bits in the new
> > io_uring_cmd_getsockopt?
> 
> It might be interesting to have the BPF hook for this function as
> well, but I would like to do it in a following patch, so, I can
> experiment with it better, if that is OK.

We are not using io_uring, so fine with me. However, having a way to bypass
get/setsockopt bpf might be problematic for some other heavy io_uring
users.

Lemme CC a bunch of Meta folks explicitly. I'm not sure what that state
of bpf support in io_uring.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-07-25 17:02       ` Stanislav Fomichev
@ 2023-07-25 17:56         ` Martin KaFai Lau
  2023-07-26  9:26           ` Breno Leitao
  2023-07-28 17:03         ` Breno Leitao
  1 sibling, 1 reply; 17+ messages in thread
From: Martin KaFai Lau @ 2023-07-25 17:56 UTC (permalink / raw)
  To: Stanislav Fomichev, Breno Leitao
  Cc: asml.silence, axboe, io-uring, netdev, davem, kuba, edumazet,
	pabeni, linux-kernel, leit, bpf, ast

On 7/25/23 10:02 AM, Stanislav Fomichev wrote:
> On 07/25, Breno Leitao wrote:
>> On Mon, Jul 24, 2023 at 10:31:28AM -0700, Stanislav Fomichev wrote:
>>> On 07/24, Breno Leitao wrote:
>>>> Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
>>>> level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
>>>> where a sockptr_t is either userspace or kernel space, and handled as
>>>> such.
>>>>
>>>> Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().
>>>
>>> We probably need to also have bpf bits in the new
>>> io_uring_cmd_getsockopt?

I also think this inconsistency behavior should be avoided.

>>
>> It might be interesting to have the BPF hook for this function as
>> well, but I would like to do it in a following patch, so, I can
>> experiment with it better, if that is OK.
> 
> We are not using io_uring, so fine with me. However, having a way to bypass
> get/setsockopt bpf might be problematic for some other heavy io_uring
> users.
> 
> Lemme CC a bunch of Meta folks explicitly. I'm not sure what that state
> of bpf support in io_uring.

We have use cases on the "cgroup/{g,s}etsockopt". It will be a surprise when the 
user moves from the syscall {g,s}etsockopt to SOCKET_URING_OP_*SOCKOPT and 
figured that the bpf handling is skipped.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-07-25 17:56         ` Martin KaFai Lau
@ 2023-07-26  9:26           ` Breno Leitao
  0 siblings, 0 replies; 17+ messages in thread
From: Breno Leitao @ 2023-07-26  9:26 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Stanislav Fomichev, asml.silence, axboe, io-uring, netdev, davem,
	kuba, edumazet, pabeni, linux-kernel, leit, bpf, ast

On Tue, Jul 25, 2023 at 10:56:23AM -0700, Martin KaFai Lau wrote:
> On 7/25/23 10:02 AM, Stanislav Fomichev wrote:
> > On 07/25, Breno Leitao wrote:
> > > On Mon, Jul 24, 2023 at 10:31:28AM -0700, Stanislav Fomichev wrote:
> > > > On 07/24, Breno Leitao wrote:
> > > > > Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
> > > > > level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
> > > > > where a sockptr_t is either userspace or kernel space, and handled as
> > > > > such.
> > > > > 
> > > > > Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().
> > > > 
> > > > We probably need to also have bpf bits in the new
> > > > io_uring_cmd_getsockopt?
> 
> I also think this inconsistency behavior should be avoided.
> 
> > > 
> > > It might be interesting to have the BPF hook for this function as
> > > well, but I would like to do it in a following patch, so, I can
> > > experiment with it better, if that is OK.
> > 
> > We are not using io_uring, so fine with me. However, having a way to bypass
> > get/setsockopt bpf might be problematic for some other heavy io_uring
> > users.
> > 
> > Lemme CC a bunch of Meta folks explicitly. I'm not sure what that state
> > of bpf support in io_uring.
> 
> We have use cases on the "cgroup/{g,s}etsockopt". It will be a surprise when
> the user moves from the syscall {g,s}etsockopt to SOCKET_URING_OP_*SOCKOPT
> and figured that the bpf handling is skipped.

Ok, I will add the BPF bits in the next revision then. Thanks for
clarifying it.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-07-25 17:02       ` Stanislav Fomichev
  2023-07-25 17:56         ` Martin KaFai Lau
@ 2023-07-28 17:03         ` Breno Leitao
  2023-07-28 18:07           ` Stanislav Fomichev
  1 sibling, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2023-07-28 17:03 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: asml.silence, axboe, io-uring, netdev, davem, kuba, edumazet,
	pabeni, linux-kernel, leit, bpf, ast, martin.lau

Hello Stanislav,

On Tue, Jul 25, 2023 at 10:02:40AM -0700, Stanislav Fomichev wrote:
> On 07/25, Breno Leitao wrote:
> > On Mon, Jul 24, 2023 at 10:31:28AM -0700, Stanislav Fomichev wrote:
> > > On 07/24, Breno Leitao wrote:
> > > > Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
> > > > level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
> > > > where a sockptr_t is either userspace or kernel space, and handled as
> > > > such.
> > > > 
> > > > Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().
> > > 
> > > We probably need to also have bpf bits in the new
> > > io_uring_cmd_getsockopt?
> > 
> > It might be interesting to have the BPF hook for this function as
> > well, but I would like to do it in a following patch, so, I can
> > experiment with it better, if that is OK.

I spent smoe time looking at the problem, and I understand we want to
call something as BPF_CGROUP_RUN_PROG_{G,S}ETSOCKOPT() into
io_uring_cmd_{g,s}etsockopt().

Per the previous conversation with Williem,
io_uring_cmd_{g,s}etsockopt() should use optval as a user pointer (void __user
*optval), and optlen as a kernel integer (it comes as from the io_uring
SQE), such as:

	void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
	int optlen = READ_ONCE(cmd->sqe->optlen);

Function BPF_CGROUP_RUN_PROG_GETSOCKOPT() calls
__cgroup_bpf_run_filter_getsockopt() which expects userpointer for
optlen and optval.

At the same time BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN() expects kernel
pointers for both optlen and optval.

In this current patchset, it has user pointer for optval and kernel value
for optlen. I.e., a third combination.  So, none of the functions would
work properly, and we probably do not want to create another function.

I am wondering if it is a good idea to move
__cgroup_bpf_run_filter_getsockopt() to use sockptr_t, so, it will be
able to adapt to any combination.

Any feedback is appreciate.
Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-07-28 17:03         ` Breno Leitao
@ 2023-07-28 18:07           ` Stanislav Fomichev
  2023-07-31 10:13             ` Breno Leitao
  0 siblings, 1 reply; 17+ messages in thread
From: Stanislav Fomichev @ 2023-07-28 18:07 UTC (permalink / raw)
  To: Breno Leitao
  Cc: asml.silence, axboe, io-uring, netdev, davem, kuba, edumazet,
	pabeni, linux-kernel, leit, bpf, ast, martin.lau

On Fri, Jul 28, 2023 at 10:03 AM Breno Leitao <leitao@debian.org> wrote:
>
> Hello Stanislav,
>
> On Tue, Jul 25, 2023 at 10:02:40AM -0700, Stanislav Fomichev wrote:
> > On 07/25, Breno Leitao wrote:
> > > On Mon, Jul 24, 2023 at 10:31:28AM -0700, Stanislav Fomichev wrote:
> > > > On 07/24, Breno Leitao wrote:
> > > > > Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
> > > > > level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
> > > > > where a sockptr_t is either userspace or kernel space, and handled as
> > > > > such.
> > > > >
> > > > > Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().
> > > >
> > > > We probably need to also have bpf bits in the new
> > > > io_uring_cmd_getsockopt?
> > >
> > > It might be interesting to have the BPF hook for this function as
> > > well, but I would like to do it in a following patch, so, I can
> > > experiment with it better, if that is OK.
>
> I spent smoe time looking at the problem, and I understand we want to
> call something as BPF_CGROUP_RUN_PROG_{G,S}ETSOCKOPT() into
> io_uring_cmd_{g,s}etsockopt().
>
> Per the previous conversation with Williem,
> io_uring_cmd_{g,s}etsockopt() should use optval as a user pointer (void __user
> *optval), and optlen as a kernel integer (it comes as from the io_uring
> SQE), such as:
>
>         void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
>         int optlen = READ_ONCE(cmd->sqe->optlen);
>
> Function BPF_CGROUP_RUN_PROG_GETSOCKOPT() calls
> __cgroup_bpf_run_filter_getsockopt() which expects userpointer for
> optlen and optval.
>
> At the same time BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN() expects kernel
> pointers for both optlen and optval.
>
> In this current patchset, it has user pointer for optval and kernel value
> for optlen. I.e., a third combination.  So, none of the functions would
> work properly, and we probably do not want to create another function.
>
> I am wondering if it is a good idea to move
> __cgroup_bpf_run_filter_getsockopt() to use sockptr_t, so, it will be
> able to adapt to any combination.

Yeah, I think it makes sense. However, note that the intent of that
optlen being a __user pointer is to possibly write some (updated)
value back into the userspace.
Presumably, you'll pass that updated optlen into some io_uring
completion queue? (maybe a stupid question, not super familiar with
io_uring)

> Any feedback is appreciate.
> Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-07-28 18:07           ` Stanislav Fomichev
@ 2023-07-31 10:13             ` Breno Leitao
  0 siblings, 0 replies; 17+ messages in thread
From: Breno Leitao @ 2023-07-31 10:13 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: asml.silence, axboe, io-uring, netdev, davem, kuba, edumazet,
	pabeni, linux-kernel, leit, bpf, ast, martin.lau

On Fri, Jul 28, 2023 at 11:07:10AM -0700, Stanislav Fomichev wrote:
> On Fri, Jul 28, 2023 at 10:03 AM Breno Leitao <leitao@debian.org> wrote:
> >
> > Hello Stanislav,
> >
> > On Tue, Jul 25, 2023 at 10:02:40AM -0700, Stanislav Fomichev wrote:
> > > On 07/25, Breno Leitao wrote:
> > > > On Mon, Jul 24, 2023 at 10:31:28AM -0700, Stanislav Fomichev wrote:
> > > > > On 07/24, Breno Leitao wrote:
> > > > > > Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
> > > > > > level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
> > > > > > where a sockptr_t is either userspace or kernel space, and handled as
> > > > > > such.
> > > > > >
> > > > > > Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().
> > > > >
> > > > > We probably need to also have bpf bits in the new
> > > > > io_uring_cmd_getsockopt?
> > > >
> > > > It might be interesting to have the BPF hook for this function as
> > > > well, but I would like to do it in a following patch, so, I can
> > > > experiment with it better, if that is OK.
> >
> > I spent smoe time looking at the problem, and I understand we want to
> > call something as BPF_CGROUP_RUN_PROG_{G,S}ETSOCKOPT() into
> > io_uring_cmd_{g,s}etsockopt().
> >
> > Per the previous conversation with Williem,
> > io_uring_cmd_{g,s}etsockopt() should use optval as a user pointer (void __user
> > *optval), and optlen as a kernel integer (it comes as from the io_uring
> > SQE), such as:
> >
> >         void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
> >         int optlen = READ_ONCE(cmd->sqe->optlen);
> >
> > Function BPF_CGROUP_RUN_PROG_GETSOCKOPT() calls
> > __cgroup_bpf_run_filter_getsockopt() which expects userpointer for
> > optlen and optval.
> >
> > At the same time BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN() expects kernel
> > pointers for both optlen and optval.
> >
> > In this current patchset, it has user pointer for optval and kernel value
> > for optlen. I.e., a third combination.  So, none of the functions would
> > work properly, and we probably do not want to create another function.
> >
> > I am wondering if it is a good idea to move
> > __cgroup_bpf_run_filter_getsockopt() to use sockptr_t, so, it will be
> > able to adapt to any combination.
> 
> Yeah, I think it makes sense. However, note that the intent of that
> optlen being a __user pointer is to possibly write some (updated)
> value back into the userspace.
> Presumably, you'll pass that updated optlen into some io_uring
> completion queue? (maybe a stupid question, not super familiar with
> io_uring)

On io_uring proposal, the optlen is part of the SQE for setsockopt().
You give a  userpointer (optval) and set the optlen in the SQE->optlen.

For getsockopt(), the optlen is returned as a result of the operation,
in the CQE->res.

If you need more detail about it, I documented this behaviour in the
cover-letter (PS1):

https://lore.kernel.org/all/20230724142237.358769-1-leitao@debian.org/

Thanks for the feedback!

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-07-31 10:13 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-24 14:22 [PATCH 0/3] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
2023-07-24 14:22 ` [PATCH 1/4] net: expose sock_use_custom_sol_socket Breno Leitao
2023-07-24 14:22 ` [PATCH 2/4] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
2023-07-24 17:31   ` Stanislav Fomichev
2023-07-25  9:27     ` Breno Leitao
2023-07-25 17:02       ` Stanislav Fomichev
2023-07-25 17:56         ` Martin KaFai Lau
2023-07-26  9:26           ` Breno Leitao
2023-07-28 17:03         ` Breno Leitao
2023-07-28 18:07           ` Stanislav Fomichev
2023-07-31 10:13             ` Breno Leitao
2023-07-24 22:58   ` Willem de Bruijn
2023-07-25  9:51     ` Breno Leitao
2023-07-25 13:56       ` Willem de Bruijn
2023-07-25 15:23         ` Breno Leitao
2023-07-24 14:22 ` [PATCH 3/4] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
2023-07-24 14:22 ` [PATCH 4/4] io_uring/cmd: Extend support beyond SOL_SOCKET Breno Leitao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.