netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/2] Support reading msg errq from io_uring
@ 2020-08-20 23:49 Luke Hsiao
  2020-08-20 23:49 ` [PATCH net-next 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() Luke Hsiao
  2020-08-20 23:49 ` [PATCH net-next 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE Luke Hsiao
  0 siblings, 2 replies; 18+ messages in thread
From: Luke Hsiao @ 2020-08-20 23:49 UTC
  To: David Miller; +Cc: netdev, Jens Axboe, Luke Hsiao

From: Luke Hsiao <lukehsiao@google.com>

This patch series adds support for reading MSG_ERRQUEUE using the
io_uring interface. Support for this operation allows io_uring to serve
as an alternative to epoll + recvmsg for reading notification
completions for TCP tx zerocopy. 

The first patch allows ancillary data to be read using recvmsg from
io_uring, while the second patch provides an optimization for reading
these notification completions.
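
As a rough illustration of what a consumer of these notifications does
once recvmsg returns, the sketch below walks the control messages and
extracts a zerocopy completion range. It assumes the layout described in
the kernel's MSG_ZEROCOPY documentation (a struct sock_extended_err with
ee_origin set to SO_EE_ORIGIN_ZEROCOPY and the completed range in
ee_info..ee_data). parse_zerocopy_completion() is a hypothetical helper
name and is not part of this series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/errqueue.h>

#ifndef SO_EE_ORIGIN_ZEROCOPY
#define SO_EE_ORIGIN_ZEROCOPY 5	/* fallback for older userspace headers */
#endif

/*
 * Hypothetical helper: scan the ancillary data of an already-filled
 * msghdr for a zerocopy completion. Returns 1 and stores the inclusive
 * range [*lo, *hi] if one is found, 0 otherwise.
 */
static int parse_zerocopy_completion(struct msghdr *msg,
				     uint32_t *lo, uint32_t *hi)
{
	struct cmsghdr *cm;

	for (cm = CMSG_FIRSTHDR(msg); cm; cm = CMSG_NXTHDR(msg, cm)) {
		struct sock_extended_err serr;
		int is_v4 = cm->cmsg_level == SOL_IP &&
			    cm->cmsg_type == IP_RECVERR;
		int is_v6 = cm->cmsg_level == SOL_IPV6 &&
			    cm->cmsg_type == IPV6_RECVERR;

		if (!is_v4 && !is_v6)
			continue;
		if (cm->cmsg_len < CMSG_LEN(sizeof(serr)))
			continue;

		/* copy out to avoid unaligned access into the cmsg buffer */
		memcpy(&serr, CMSG_DATA(cm), sizeof(serr));
		if (serr.ee_errno != 0 ||
		    serr.ee_origin != SO_EE_ORIGIN_ZEROCOPY)
			continue;

		*lo = serr.ee_info;
		*hi = serr.ee_data;
		return 1;
	}
	return 0;
}
```

The same parsing applies whether the msghdr was filled by a plain
recvmsg(fd, &msg, MSG_ERRQUEUE) call or by an IORING_OP_RECVMSG request
submitted with MSG_ERRQUEUE in its flags.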

Luke Hsiao (2):
  io_uring: allow tcp ancillary data for __sys_recvmsg_sock()
  io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE

 fs/io_uring.c       | 11 +++++++++--
 include/linux/net.h |  3 +++
 net/ipv4/af_inet.c  |  1 +
 net/ipv6/af_inet6.c |  1 +
 net/socket.c        |  8 +++++---
 5 files changed, 19 insertions(+), 5 deletions(-)

-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH net-next 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock()
  2020-08-20 23:49 [PATCH net-next 0/2] Support reading msg errq from io_uring Luke Hsiao
@ 2020-08-20 23:49 ` Luke Hsiao
  2020-08-21 21:10   ` Jens Axboe
  2020-08-20 23:49 ` [PATCH net-next 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE Luke Hsiao
  1 sibling, 1 reply; 18+ messages in thread
From: Luke Hsiao @ 2020-08-20 23:49 UTC
  To: David Miller
  Cc: netdev, Jens Axboe, Luke Hsiao, Soheil Hassas Yeganeh, Arjun Roy,
	Eric Dumazet, Jann Horn

From: Luke Hsiao <lukehsiao@google.com>

For TCP tx zero-copy, the kernel notifies the process of completions by
queuing completion notifications on the socket error queue. This patch
allows reading these notifications via recvmsg to support TCP tx
zero-copy.

Ancillary data was originally disallowed due to privilege escalation
via io_uring's offloading of sendmsg() onto a kernel thread with kernel
credentials (https://crbug.com/project-zero/1975). So, we must ensure
that the socket type is one where the ancillary data types that are
delivered on recvmsg are plain data (no file descriptors or values that
are translated based on the identity of the calling process).
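
The gating rule described above can be modelled outside the kernel as
follows. This is only a sketch: struct model_proto_ops and
model_recvmsg_gate() are hypothetical stand-ins for struct proto_ops
and the check in __sys_recvmsg_sock(), and only the
PROTO_CMSG_DATA_ONLY value is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PROTO_CMSG_DATA_ONLY	0x0001	/* value as defined in the patch */

/* Hypothetical stand-in for the one field of proto_ops used here. */
struct model_proto_ops {
	unsigned int flags;
};

/*
 * Returns 0 if the request may proceed, -EINVAL if ancillary data was
 * requested on a socket type whose cmsgs are not known to be plain data.
 */
static int model_recvmsg_gate(const struct model_proto_ops *ops,
			      const void *control, size_t controllen)
{
	if (control || controllen) {
		if (!(ops->flags & PROTO_CMSG_DATA_ONLY))
			return -EINVAL;
	}
	return 0;
}
```

A request without a control buffer passes for every socket type, so
existing users of this path that never asked for ancillary data see no
change in behaviour.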

This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE
with tx zero-copy enabled. Before this patch, we received -EINVAL from
this specific code path. After this patch, we could read TCP tx
zero-copy completion notifications from the MSG_ERRQUEUE.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Arjun Roy <arjunroy@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: Luke Hsiao <lukehsiao@google.com>
---
 include/linux/net.h | 3 +++
 net/ipv4/af_inet.c  | 1 +
 net/ipv6/af_inet6.c | 1 +
 net/socket.c        | 8 +++++---
 4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index d48ff1180879..7657c6432a69 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -41,6 +41,8 @@ struct net;
 #define SOCK_PASSCRED		3
 #define SOCK_PASSSEC		4
 
+#define PROTO_CMSG_DATA_ONLY	0x0001
+
 #ifndef ARCH_HAS_SOCKET_TYPES
 /**
  * enum sock_type - Socket types
@@ -135,6 +137,7 @@ typedef int (*sk_read_actor_t)(read_descriptor_t *, struct sk_buff *,
 
 struct proto_ops {
 	int		family;
+	unsigned int	flags;
 	struct module	*owner;
 	int		(*release)   (struct socket *sock);
 	int		(*bind)	     (struct socket *sock,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 4307503a6f0b..b7260c8cef2e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1017,6 +1017,7 @@ static int inet_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned lon
 
 const struct proto_ops inet_stream_ops = {
 	.family		   = PF_INET,
+	.flags		   = PROTO_CMSG_DATA_ONLY,
 	.owner		   = THIS_MODULE,
 	.release	   = inet_release,
 	.bind		   = inet_bind,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 0306509ab063..d9a14935f402 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -661,6 +661,7 @@ int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 
 const struct proto_ops inet6_stream_ops = {
 	.family		   = PF_INET6,
+	.flags		   = PROTO_CMSG_DATA_ONLY,
 	.owner		   = THIS_MODULE,
 	.release	   = inet6_release,
 	.bind		   = inet6_bind,
diff --git a/net/socket.c b/net/socket.c
index dbbe8ea7d395..e84a8e281b4c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2628,9 +2628,11 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
 			struct user_msghdr __user *umsg,
 			struct sockaddr __user *uaddr, unsigned int flags)
 {
-	/* disallow ancillary data requests from this path */
-	if (msg->msg_control || msg->msg_controllen)
-		return -EINVAL;
+	if (msg->msg_control || msg->msg_controllen) {
+		/* disallow ancillary data reqs unless cmsg is plain data */
+		if (!(sock->ops->flags & PROTO_CMSG_DATA_ONLY))
+			return -EINVAL;
+	}
 
 	return ____sys_recvmsg(sock, msg, umsg, uaddr, flags, 0);
 }
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH net-next 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
  2020-08-20 23:49 [PATCH net-next 0/2] Support reading msg errq from io_uring Luke Hsiao
  2020-08-20 23:49 ` [PATCH net-next 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() Luke Hsiao
@ 2020-08-20 23:49 ` Luke Hsiao
  2020-08-21 20:41   ` Jakub Kicinski
  1 sibling, 1 reply; 18+ messages in thread
From: Luke Hsiao @ 2020-08-20 23:49 UTC
  To: David Miller
  Cc: netdev, Jens Axboe, Luke Hsiao, Arjun Roy, Soheil Hassas Yeganeh,
	Eric Dumazet

From: Luke Hsiao <lukehsiao@google.com>

Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
the context of TCP tx zero-copy, this is inefficient since we are only
reading the error queue and not using recvmsg to read POLLIN responses.

This patch was tested by using a simple sending program to call recvmsg
using io_uring with MSG_ERRQUEUE set and verifying with printks that the
POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.
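
For readers who want to see the resulting mask, here is a small
userspace model of the computation. It is a sketch under assumptions:
arm_mask() and its parameters are hypothetical, and it uses the
<poll.h> constants rather than the kernel's __poll_t values.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Hypothetical model of the mask built in io_arm_poll_handler().
 * pollin/pollout mirror the per-opcode def->pollin/def->pollout bits.
 */
static unsigned int arm_mask(int pollin, int pollout,
			     int is_recvmsg, unsigned int msg_flags)
{
	unsigned int mask = 0;

	if (pollin)
		mask |= POLLIN | POLLRDNORM;
	if (pollout)
		mask |= POLLOUT | POLLWRNORM;

	/* Error-queue reads are signalled via POLLERR, so drop POLLIN. */
	if (is_recvmsg && (msg_flags & MSG_ERRQUEUE))
		mask &= ~POLLIN;

	mask |= POLLERR | POLLPRI;
	return mask;
}
```

In this model only the POLLIN bit is cleared; POLLRDNORM stays set
because the statement masks out POLLIN alone.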

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Luke Hsiao <lukehsiao@google.com>
---
 fs/io_uring.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index dc506b75659c..664ce8739615 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -79,6 +79,7 @@
 #include <linux/splice.h>
 #include <linux/task_work.h>
 #include <linux/pagemap.h>
+#include <linux/socket.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -4902,7 +4903,8 @@ static __poll_t __io_arm_poll_handler(struct io_kiocb *req,
 	return mask;
 }
 
-static bool io_arm_poll_handler(struct io_kiocb *req)
+static bool io_arm_poll_handler(struct io_kiocb *req,
+				const struct io_uring_sqe *sqe)
 {
 	const struct io_op_def *def = &io_op_defs[req->opcode];
 	struct io_ring_ctx *ctx = req->ctx;
@@ -4932,6 +4934,11 @@ static bool io_arm_poll_handler(struct io_kiocb *req)
 		mask |= POLLIN | POLLRDNORM;
 	if (def->pollout)
 		mask |= POLLOUT | POLLWRNORM;
+
+	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
+	if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
+		mask &= ~(POLLIN);
+
 	mask |= POLLERR | POLLPRI;
 
 	ipt.pt._qproc = io_async_queue_proc;
@@ -6146,7 +6153,7 @@ static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	 * doesn't support non-blocking read/write attempts
 	 */
 	if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
-		if (!io_arm_poll_handler(req)) {
+		if (!io_arm_poll_handler(req, sqe)) {
 punt:
 			ret = io_prep_work_files(req);
 			if (unlikely(ret))
-- 
2.28.0.297.g1956fa8f8d-goog



* Re: [PATCH net-next 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
  2020-08-20 23:49 ` [PATCH net-next 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE Luke Hsiao
@ 2020-08-21 20:41   ` Jakub Kicinski
  2020-08-21 21:11     ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2020-08-21 20:41 UTC
  To: Luke Hsiao
  Cc: David Miller, netdev, Jens Axboe, Luke Hsiao, Arjun Roy,
	Soheil Hassas Yeganeh, Eric Dumazet

On Thu, 20 Aug 2020 16:49:54 -0700 Luke Hsiao wrote:
> +	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
> +	if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
> +		mask &= ~(POLLIN);

FWIW this adds another W=1 C=1 warning to this code:

fs/io_uring.c:4940:22: warning: invalid assignment: &=
fs/io_uring.c:4940:22:    left side has type restricted __poll_t
fs/io_uring.c:4940:22:    right side has type int


And obviously the brackets around POLLIN are not necessary.


* Re: [PATCH net-next 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock()
  2020-08-20 23:49 ` [PATCH net-next 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() Luke Hsiao
@ 2020-08-21 21:10   ` Jens Axboe
  0 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2020-08-21 21:10 UTC
  To: Luke Hsiao, David Miller
  Cc: netdev, Luke Hsiao, Soheil Hassas Yeganeh, Arjun Roy,
	Eric Dumazet, Jann Horn

On 8/20/20 5:49 PM, Luke Hsiao wrote:
> From: Luke Hsiao <lukehsiao@google.com>
> 
> For TCP tx zero-copy, the kernel notifies the process of completions by
> queuing completion notifications on the socket error queue. This patch
> allows reading these notifications via recvmsg to support TCP tx
> zero-copy.
> 
> Ancillary data was originally disallowed due to privilege escalation
> via io_uring's offloading of sendmsg() onto a kernel thread with kernel
> credentials (https://crbug.com/project-zero/1975). So, we must ensure
> that the socket type is one where the ancillary data types that are
> delivered on recvmsg are plain data (no file descriptors or values that
> are translated based on the identity of the calling process).
> 
> This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE
> with tx zero-copy enabled. Before this patch, we received -EINVAL from
> this specific code path. After this patch, we could read TCP tx
> zero-copy completion notifications from the MSG_ERRQUEUE.
> 
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Jann Horn <jannh@google.com>
> Signed-off-by: Luke Hsiao <lukehsiao@google.com>

Reviewed-by: Jens Axboe <axboe@kernel.dk>

-- 
Jens Axboe



* Re: [PATCH net-next 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
  2020-08-21 20:41   ` Jakub Kicinski
@ 2020-08-21 21:11     ` Jens Axboe
  2020-08-22  0:08       ` Luke Hsiao
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2020-08-21 21:11 UTC
  To: Jakub Kicinski, Luke Hsiao
  Cc: David Miller, netdev, Luke Hsiao, Arjun Roy,
	Soheil Hassas Yeganeh, Eric Dumazet

On 8/21/20 2:41 PM, Jakub Kicinski wrote:
> On Thu, 20 Aug 2020 16:49:54 -0700 Luke Hsiao wrote:
>> +	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
>> +	if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
>> +		mask &= ~(POLLIN);
> 
> FWIW this adds another W=1 C=1 warning to this code:
> 
> fs/io_uring.c:4940:22: warning: invalid assignment: &=
> fs/io_uring.c:4940:22:    left side has type restricted __poll_t
> fs/io_uring.c:4940:22:    right side has type int

Well, 8 or 9 of them don't really matter... This is something that should
be cleaned up separately at some point.

> And obviously the brackets around POLLIN are not necessary.

Agree, would be cleaner without!

Luke, with that:

Reviewed-by: Jens Axboe <axboe@kernel.dk>

-- 
Jens Axboe



* Re: [PATCH net-next 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
  2020-08-21 21:11     ` Jens Axboe
@ 2020-08-22  0:08       ` Luke Hsiao
  2020-08-22  1:17         ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Luke Hsiao @ 2020-08-22  0:08 UTC
  To: Jens Axboe
  Cc: Jakub Kicinski, Luke Hsiao, David Miller, netdev, Arjun Roy,
	Soheil Hassas Yeganeh, Eric Dumazet

Hi Jakub and Jens,

Thank you for both of your reviews. Some responses inline below.

On Fri, Aug 21, 2020 at 2:11 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 8/21/20 2:41 PM, Jakub Kicinski wrote:
> > On Thu, 20 Aug 2020 16:49:54 -0700 Luke Hsiao wrote:
> >> +    /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
> >> +    if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
> >> +            mask &= ~(POLLIN);
> >
> > FWIW this adds another W=1 C=1 warning to this code:
> >
> > fs/io_uring.c:4940:22: warning: invalid assignment: &=
> > fs/io_uring.c:4940:22:    left side has type restricted __poll_t
> > fs/io_uring.c:4940:22:    right side has type int
>
> Well, 8 or 9 of them don't really matter... This is something that should
> be cleaned up separately at some point.

In the spirit of not adding a warning, even if it doesn't really
matter, I'd like to fix this. But, I'm struggling to reproduce these
warnings using upstream net-next and make for x86_64. With my patches
on top of net-next/master, I'm running

$ make defconfig
$ make -j`nproc` W=1

I don't see the warning you mentioned in the logs. Could you tell me
how I can repro this warning?

> > And obviously the brackets around POLLIN are not necessary.
>
> Agree, would be cleaner without!

Thanks, I'll also remove these unnecessary parens in v2 of the patch series.

--
Luke


* Re: [PATCH net-next 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
  2020-08-22  0:08       ` Luke Hsiao
@ 2020-08-22  1:17         ` Jens Axboe
  2020-08-22  2:04           ` [PATCH net-next v2 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() Luke Hsiao
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2020-08-22  1:17 UTC
  To: Luke Hsiao
  Cc: Jakub Kicinski, Luke Hsiao, David Miller, netdev, Arjun Roy,
	Soheil Hassas Yeganeh, Eric Dumazet

On 8/21/20 6:08 PM, Luke Hsiao wrote:
> Hi Jakub and Jens,
> 
> Thank you for both of your reviews. Some responses inline below.
> 
> On Fri, Aug 21, 2020 at 2:11 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 8/21/20 2:41 PM, Jakub Kicinski wrote:
>>> On Thu, 20 Aug 2020 16:49:54 -0700 Luke Hsiao wrote:
>>>> +    /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
>>>> +    if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
>>>> +            mask &= ~(POLLIN);
>>>
>>> FWIW this adds another W=1 C=1 warning to this code:
>>>
>>> fs/io_uring.c:4940:22: warning: invalid assignment: &=
>>> fs/io_uring.c:4940:22:    left side has type restricted __poll_t
>>> fs/io_uring.c:4940:22:    right side has type int
>>
>> Well, 8 or 9 of them don't really matter... This is something that should
>> be cleaned up separately at some point.
> 
> In the spirit of not adding a warning, even if it doesn't really
> matter, I'd like to fix this. But, I'm struggling to reproduce these
> warnings using upstream net-next and make for x86_64. With my patches
> on top of net-next/master, I'm running
> 
> $ make defconfig
> $ make -j`nproc` W=1
> 
> I don't see the warning you mentioned in the logs. Could you tell me
> how I can repro this warning?

You should see them with C=1. But as I said, don't worry about it, as that's
a class of warnings and several exist already. It needs to get cleaned up
separately.

>>> And obviously the brackets around POLLIN are not necessary.
>>
>> Agree, would be cleaner without!
> 
> Thanks, I'll also remove these unnecessary parens in v2 of the patch series.

Please do, thanks!

-- 
Jens Axboe



* [PATCH net-next v2 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock()
  2020-08-22  1:17         ` Jens Axboe
@ 2020-08-22  2:04           ` Luke Hsiao
  2020-08-22  2:04             ` [PATCH net-next v2 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE Luke Hsiao
  0 siblings, 1 reply; 18+ messages in thread
From: Luke Hsiao @ 2020-08-22  2:04 UTC
  To: David Miller
  Cc: netdev, Jens Axboe, Jakub Kicinski, Luke Hsiao,
	Soheil Hassas Yeganeh, Arjun Roy, Eric Dumazet, Jann Horn

From: Luke Hsiao <lukehsiao@google.com>

For TCP tx zero-copy, the kernel notifies the process of completions by
queuing completion notifications on the socket error queue. This patch
allows reading these notifications via recvmsg to support TCP tx
zero-copy.

Ancillary data was originally disallowed due to privilege escalation
via io_uring's offloading of sendmsg() onto a kernel thread with kernel
credentials (https://crbug.com/project-zero/1975). So, we must ensure
that the socket type is one where the ancillary data types that are
delivered on recvmsg are plain data (no file descriptors or values that
are translated based on the identity of the calling process).

This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE
with tx zero-copy enabled. Before this patch, we received -EINVAL from
this specific code path. After this patch, we could read TCP tx
zero-copy completion notifications from the MSG_ERRQUEUE.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Arjun Roy <arjunroy@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Luke Hsiao <lukehsiao@google.com>
---
 include/linux/net.h | 3 +++
 net/ipv4/af_inet.c  | 1 +
 net/ipv6/af_inet6.c | 1 +
 net/socket.c        | 8 +++++---
 4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index d48ff1180879..7657c6432a69 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -41,6 +41,8 @@ struct net;
 #define SOCK_PASSCRED		3
 #define SOCK_PASSSEC		4
 
+#define PROTO_CMSG_DATA_ONLY	0x0001
+
 #ifndef ARCH_HAS_SOCKET_TYPES
 /**
  * enum sock_type - Socket types
@@ -135,6 +137,7 @@ typedef int (*sk_read_actor_t)(read_descriptor_t *, struct sk_buff *,
 
 struct proto_ops {
 	int		family;
+	unsigned int	flags;
 	struct module	*owner;
 	int		(*release)   (struct socket *sock);
 	int		(*bind)	     (struct socket *sock,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 4307503a6f0b..b7260c8cef2e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1017,6 +1017,7 @@ static int inet_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned lon
 
 const struct proto_ops inet_stream_ops = {
 	.family		   = PF_INET,
+	.flags		   = PROTO_CMSG_DATA_ONLY,
 	.owner		   = THIS_MODULE,
 	.release	   = inet_release,
 	.bind		   = inet_bind,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 0306509ab063..d9a14935f402 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -661,6 +661,7 @@ int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 
 const struct proto_ops inet6_stream_ops = {
 	.family		   = PF_INET6,
+	.flags		   = PROTO_CMSG_DATA_ONLY,
 	.owner		   = THIS_MODULE,
 	.release	   = inet6_release,
 	.bind		   = inet6_bind,
diff --git a/net/socket.c b/net/socket.c
index dbbe8ea7d395..e84a8e281b4c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2628,9 +2628,11 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
 			struct user_msghdr __user *umsg,
 			struct sockaddr __user *uaddr, unsigned int flags)
 {
-	/* disallow ancillary data requests from this path */
-	if (msg->msg_control || msg->msg_controllen)
-		return -EINVAL;
+	if (msg->msg_control || msg->msg_controllen) {
+		/* disallow ancillary data reqs unless cmsg is plain data */
+		if (!(sock->ops->flags & PROTO_CMSG_DATA_ONLY))
+			return -EINVAL;
+	}
 
 	return ____sys_recvmsg(sock, msg, umsg, uaddr, flags, 0);
 }
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH net-next v2 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
  2020-08-22  2:04           ` [PATCH net-next v2 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() Luke Hsiao
@ 2020-08-22  2:04             ` Luke Hsiao
  2020-08-22  2:08               ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Luke Hsiao @ 2020-08-22  2:04 UTC
  To: David Miller
  Cc: netdev, Jens Axboe, Jakub Kicinski, Luke Hsiao, Arjun Roy,
	Soheil Hassas Yeganeh, Eric Dumazet

From: Luke Hsiao <lukehsiao@google.com>

Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
the context of TCP tx zero-copy, this is inefficient since we are only
reading the error queue and not using recvmsg to read POLLIN responses.

This patch was tested by using a simple sending program to call recvmsg
using io_uring with MSG_ERRQUEUE set and verifying with printks that the
POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Luke Hsiao <lukehsiao@google.com>
---
 fs/io_uring.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index dc506b75659c..fd5353e31a2c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -79,6 +79,7 @@
 #include <linux/splice.h>
 #include <linux/task_work.h>
 #include <linux/pagemap.h>
+#include <linux/socket.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -4902,7 +4903,8 @@ static __poll_t __io_arm_poll_handler(struct io_kiocb *req,
 	return mask;
 }
 
-static bool io_arm_poll_handler(struct io_kiocb *req)
+static bool io_arm_poll_handler(struct io_kiocb *req,
+				const struct io_uring_sqe *sqe)
 {
 	const struct io_op_def *def = &io_op_defs[req->opcode];
 	struct io_ring_ctx *ctx = req->ctx;
@@ -4932,6 +4934,11 @@ static bool io_arm_poll_handler(struct io_kiocb *req)
 		mask |= POLLIN | POLLRDNORM;
 	if (def->pollout)
 		mask |= POLLOUT | POLLWRNORM;
+
+	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
+	if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
+		mask &= ~POLLIN;
+
 	mask |= POLLERR | POLLPRI;
 
 	ipt.pt._qproc = io_async_queue_proc;
@@ -6146,7 +6153,7 @@ static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	 * doesn't support non-blocking read/write attempts
 	 */
 	if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
-		if (!io_arm_poll_handler(req)) {
+		if (!io_arm_poll_handler(req, sqe)) {
 punt:
 			ret = io_prep_work_files(req);
 			if (unlikely(ret))
-- 
2.28.0.297.g1956fa8f8d-goog



* Re: [PATCH net-next v2 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
  2020-08-22  2:04             ` [PATCH net-next v2 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE Luke Hsiao
@ 2020-08-22  2:08               ` Jens Axboe
  2020-08-22  2:13                 ` Luke Hsiao
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2020-08-22  2:08 UTC
  To: Luke Hsiao, David Miller
  Cc: netdev, Jakub Kicinski, Luke Hsiao, Arjun Roy,
	Soheil Hassas Yeganeh, Eric Dumazet

On 8/21/20 8:04 PM, Luke Hsiao wrote:
> From: Luke Hsiao <lukehsiao@google.com>
> 
> Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
> the context of TCP tx zero-copy, this is inefficient since we are only
> reading the error queue and not using recvmsg to read POLLIN responses.
> 
> This patch was tested by using a simple sending program to call recvmsg
> using io_uring with MSG_ERRQUEUE set and verifying with printks that the
> POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.

Sorry, one more minor thing to fix up:

> @@ -4932,6 +4934,11 @@ static bool io_arm_poll_handler(struct io_kiocb *req)
>  		mask |= POLLIN | POLLRDNORM;
>  	if (def->pollout)
>  		mask |= POLLOUT | POLLWRNORM;
> +
> +	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
> +	if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
> +		mask &= ~POLLIN;
> +

Don't pass in the sqe here, but use req->sr_msg.msg_flags for this check. This
is actually really important, as you don't want to re-read anything from the
sqe.
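
One likely reason for this advice, stated here as an assumption rather
than something spelled out in the thread, is that the SQE lives in
memory shared with userspace and may change after the request has been
prepared. The sketch below models the pattern of copying the flags once
at prep time and checking only the private copy afterwards. All names
(struct fake_sqe, struct fake_req, fake_prep(), fake_wants_errqueue())
are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical stand-in for an SQE in memory userspace can rewrite. */
struct fake_sqe {
	volatile uint32_t msg_flags;
};

/* Hypothetical stand-in for the request's private, kernel-side copy. */
struct fake_req {
	uint32_t msg_flags;
};

/* Read the shared entry exactly once, at prep time. */
static void fake_prep(struct fake_req *req, const struct fake_sqe *sqe)
{
	req->msg_flags = sqe->msg_flags;
}

/* Later decisions consult only the private copy, never the SQE. */
static int fake_wants_errqueue(const struct fake_req *req)
{
	return (req->msg_flags & MSG_ERRQUEUE) != 0;
}
```

With this shape, a later write to the shared entry cannot change what
the poll-arming code decides, which is the property the check on
req->sr_msg.msg_flags provides.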

I'm actually surprised this one got past Jann :-)

> @@ -6146,7 +6153,7 @@ static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>  	 * doesn't support non-blocking read/write attempts
>  	 */
>  	if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
> -		if (!io_arm_poll_handler(req)) {
> +		if (!io_arm_poll_handler(req, sqe)) {

Also means you can drop this part.

-- 
Jens Axboe



* Re: [PATCH net-next v2 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
  2020-08-22  2:08               ` Jens Axboe
@ 2020-08-22  2:13                 ` Luke Hsiao
  2020-08-22  2:16                   ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Luke Hsiao @ 2020-08-22  2:13 UTC
  To: Jens Axboe
  Cc: Luke Hsiao, David Miller, netdev, Jakub Kicinski, Arjun Roy,
	Soheil Hassas Yeganeh, Eric Dumazet

Hi Jens,

On Fri, Aug 21, 2020 at 7:09 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 8/21/20 8:04 PM, Luke Hsiao wrote:
> >
> Sorry, one more minor thing to fix up:
>
> > @@ -4932,6 +4934,11 @@ static bool io_arm_poll_handler(struct io_kiocb *req)
> >               mask |= POLLIN | POLLRDNORM;
> >       if (def->pollout)
> >               mask |= POLLOUT | POLLWRNORM;
> > +
> > +     /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
> > +     if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
> > +             mask &= ~POLLIN;
> > +
>
> Don't pass in the sqe here, but use req->sr_msg.msg_flags for this check. This
> is actually really important, as you don't want to re-read anything from the
> sqe.
>
> I'm actually surprised this one got past Jann :-)

Got it, I will make the change and send v3. In Jann's defense, he
reviewed the previous commit, but not this one :). Thanks for your
detailed feedback.

--
Luke


* Re: [PATCH net-next v2 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
  2020-08-22  2:13                 ` Luke Hsiao
@ 2020-08-22  2:16                   ` Jens Axboe
  2020-08-22  4:41                     ` [PATCH net-next v3 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() Luke Hsiao
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2020-08-22  2:16 UTC
  To: Luke Hsiao
  Cc: Luke Hsiao, David Miller, netdev, Jakub Kicinski, Arjun Roy,
	Soheil Hassas Yeganeh, Eric Dumazet

On 8/21/20 8:13 PM, Luke Hsiao wrote:
> Hi Jens,
> 
> On Fri, Aug 21, 2020 at 7:09 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 8/21/20 8:04 PM, Luke Hsiao wrote:
>>>
>> Sorry, one more minor thing to fix up:
>>
>>> @@ -4932,6 +4934,11 @@ static bool io_arm_poll_handler(struct io_kiocb *req)
>>>               mask |= POLLIN | POLLRDNORM;
>>>       if (def->pollout)
>>>               mask |= POLLOUT | POLLWRNORM;
>>> +
>>> +     /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
>>> +     if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
>>> +             mask &= ~POLLIN;
>>> +
>>
>> Don't pass in the sqe here, but use req->sr_msg.msg_flags for this check. This
>> is actually really important, as you don't want to re-read anything from the
>> sqe.
>>
>> I'm actually surprised this one got past Jann :-)
> 
> Got it, I will make the change and send v3. In Jann's defense, he
> reviewed the previous commit, but not this one :). Thanks for your
> detailed feedback.

Ah right you are, I guess it was the previous patch that had his
review! Thanks for taking care of this.

-- 
Jens Axboe



* [PATCH net-next v3 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock()
  2020-08-22  2:16                   ` Jens Axboe
@ 2020-08-22  4:41                     ` Luke Hsiao
  2020-08-22  4:41                       ` [PATCH net-next v3 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE Luke Hsiao
  2020-08-24 23:16                       ` [PATCH net-next v3 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() David Miller
  0 siblings, 2 replies; 18+ messages in thread
From: Luke Hsiao @ 2020-08-22  4:41 UTC
  To: David Miller
  Cc: netdev, Jens Axboe, Jakub Kicinski, Luke Hsiao,
	Soheil Hassas Yeganeh, Arjun Roy, Eric Dumazet, Jann Horn

From: Luke Hsiao <lukehsiao@google.com>

For TCP tx zero-copy, the kernel notifies the process of completions by
queuing completion notifications on the socket error queue. This patch
allows reading these notifications via recvmsg to support TCP tx
zero-copy.

Ancillary data was originally disallowed due to privilege escalation
via io_uring's offloading of sendmsg() onto a kernel thread with kernel
credentials (https://crbug.com/project-zero/1975). So, we must ensure
that the socket type is one where the ancillary data types that are
delivered on recvmsg are plain data (no file descriptors or values that
are translated based on the identity of the calling process).

This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE
with tx zero-copy enabled. Before this patch, we received -EINVAL from
this specific code path. After this patch, we could read TCP tx
zero-copy completion notifications from the MSG_ERRQUEUE.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Arjun Roy <arjunroy@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Luke Hsiao <lukehsiao@google.com>
---
 include/linux/net.h | 3 +++
 net/ipv4/af_inet.c  | 1 +
 net/ipv6/af_inet6.c | 1 +
 net/socket.c        | 8 +++++---
 4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index d48ff1180879..7657c6432a69 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -41,6 +41,8 @@ struct net;
 #define SOCK_PASSCRED		3
 #define SOCK_PASSSEC		4
 
+#define PROTO_CMSG_DATA_ONLY	0x0001
+
 #ifndef ARCH_HAS_SOCKET_TYPES
 /**
  * enum sock_type - Socket types
@@ -135,6 +137,7 @@ typedef int (*sk_read_actor_t)(read_descriptor_t *, struct sk_buff *,
 
 struct proto_ops {
 	int		family;
+	unsigned int	flags;
 	struct module	*owner;
 	int		(*release)   (struct socket *sock);
 	int		(*bind)	     (struct socket *sock,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 4307503a6f0b..b7260c8cef2e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1017,6 +1017,7 @@ static int inet_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned lon
 
 const struct proto_ops inet_stream_ops = {
 	.family		   = PF_INET,
+	.flags		   = PROTO_CMSG_DATA_ONLY,
 	.owner		   = THIS_MODULE,
 	.release	   = inet_release,
 	.bind		   = inet_bind,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 0306509ab063..d9a14935f402 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -661,6 +661,7 @@ int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 
 const struct proto_ops inet6_stream_ops = {
 	.family		   = PF_INET6,
+	.flags		   = PROTO_CMSG_DATA_ONLY,
 	.owner		   = THIS_MODULE,
 	.release	   = inet6_release,
 	.bind		   = inet6_bind,
diff --git a/net/socket.c b/net/socket.c
index dbbe8ea7d395..e84a8e281b4c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2628,9 +2628,11 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
 			struct user_msghdr __user *umsg,
 			struct sockaddr __user *uaddr, unsigned int flags)
 {
-	/* disallow ancillary data requests from this path */
-	if (msg->msg_control || msg->msg_controllen)
-		return -EINVAL;
+	if (msg->msg_control || msg->msg_controllen) {
+		/* disallow ancillary data reqs unless cmsg is plain data */
+		if (!(sock->ops->flags & PROTO_CMSG_DATA_ONLY))
+			return -EINVAL;
+	}
 
 	return ____sys_recvmsg(sock, msg, umsg, uaddr, flags, 0);
 }
-- 
2.28.0.297.g1956fa8f8d-goog
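
The behaviour of the new check in __sys_recvmsg_sock() can be modelled in
userspace as follows. This is only an illustrative sketch, not kernel
code: model_proto_ops, MODEL_PROTO_CMSG_DATA_ONLY and
model_recvmsg_cmsg_check are hypothetical stand-ins for struct proto_ops,
PROTO_CMSG_DATA_ONLY and the gate added in the diff above.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for PROTO_CMSG_DATA_ONLY from the patch (same value). */
#define MODEL_PROTO_CMSG_DATA_ONLY 0x0001

/* Stand-in for the single proto_ops field the check looks at. */
struct model_proto_ops {
	unsigned int flags;
};

/*
 * Mirrors the gate in the patch: a request that carries a control
 * buffer is rejected with -EINVAL unless the socket's ops declare
 * that their ancillary data is plain data. Returns 0 when the request
 * would be passed on to the regular recvmsg path.
 */
int model_recvmsg_cmsg_check(const struct model_proto_ops *ops,
			     const void *control, size_t controllen)
{
	if (control || controllen) {
		/* disallow ancillary data reqs unless cmsg is plain data */
		if (!(ops->flags & MODEL_PROTO_CMSG_DATA_ONLY))
			return -EINVAL;
	}
	return 0;
}
```

With ops modelled on inet_stream_ops or inet6_stream_ops (flag set), a
request with a control buffer passes; with any ops that leave flags at
zero, the same request still gets -EINVAL, as it did before the patch.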



* [PATCH net-next v3 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
  2020-08-22  4:41                     ` [PATCH net-next v3 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() Luke Hsiao
@ 2020-08-22  4:41                       ` Luke Hsiao
  2020-08-22 15:49                         ` Jens Axboe
  2020-08-24 23:16                         ` David Miller
  2020-08-24 23:16                       ` [PATCH net-next v3 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() David Miller
  1 sibling, 2 replies; 18+ messages in thread
From: Luke Hsiao @ 2020-08-22  4:41 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Jens Axboe, Jakub Kicinski, Luke Hsiao, Arjun Roy,
	Soheil Hassas Yeganeh, Eric Dumazet

From: Luke Hsiao <lukehsiao@google.com>

Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
the context of TCP tx zero-copy, this is inefficient since we are only
reading the error queue and not using recvmsg to read POLLIN responses.

This patch was tested by using a simple sending program to call recvmsg
using io_uring with MSG_ERRQUEUE set and verifying with printks that the
POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Luke Hsiao <lukehsiao@google.com>
---
 fs/io_uring.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index dc506b75659c..1aa2191ea683 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4932,6 +4932,12 @@ static bool io_arm_poll_handler(struct io_kiocb *req)
 		mask |= POLLIN | POLLRDNORM;
 	if (def->pollout)
 		mask |= POLLOUT | POLLWRNORM;
+
+	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
+	if ((req->opcode == IORING_OP_RECVMSG) &&
+	    (req->sr_msg.msg_flags & MSG_ERRQUEUE))
+		mask &= ~POLLIN;
+
 	mask |= POLLERR | POLLPRI;
 
 	ipt.pt._qproc = io_async_queue_proc;
-- 
2.28.0.297.g1956fa8f8d-goog
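
The effect of the hunk above on the poll mask can be sketched as a small
userspace model. This is an assumption-laden illustration rather than
the io_uring code itself: model_poll_mask and its boolean parameters are
hypothetical, standing in for def->pollin, def->pollout, the opcode
comparison and req->sr_msg.msg_flags.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Build the poll mask the way the patched io_arm_poll_handler() does:
 * start from the per-opcode defaults, drop POLLIN for a recvmsg that
 * reads the error queue, then always add POLLERR | POLLPRI.
 */
unsigned int model_poll_mask(bool pollin, bool pollout,
			     bool is_recvmsg, unsigned int msg_flags)
{
	unsigned int mask = 0;

	if (pollin)
		mask |= POLLIN | POLLRDNORM;
	if (pollout)
		mask |= POLLOUT | POLLWRNORM;

	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
	if (is_recvmsg && (msg_flags & MSG_ERRQUEUE))
		mask &= ~POLLIN;

	mask |= POLLERR | POLLPRI;
	return mask;
}
```

Note that, as in the diff, only POLLIN is cleared; POLLRDNORM remains in
the mask for the MSG_ERRQUEUE case.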



* Re: [PATCH net-next v3 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
  2020-08-22  4:41                       ` [PATCH net-next v3 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE Luke Hsiao
@ 2020-08-22 15:49                         ` Jens Axboe
  2020-08-24 23:16                         ` David Miller
  1 sibling, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2020-08-22 15:49 UTC (permalink / raw)
  To: Luke Hsiao, David Miller
  Cc: netdev, Jakub Kicinski, Luke Hsiao, Arjun Roy,
	Soheil Hassas Yeganeh, Eric Dumazet

On 8/21/20 10:41 PM, Luke Hsiao wrote:
> From: Luke Hsiao <lukehsiao@google.com>
> 
> Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
> the context of TCP tx zero-copy, this is inefficient since we are only
> reading the error queue and not using recvmsg to read POLLIN responses.
> 
> This patch was tested by using a simple sending program to call recvmsg
> using io_uring with MSG_ERRQUEUE set and verifying with printks that the
> POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.

Perfect, and it ends up being much simpler and more straightforward too.

-- 
Jens Axboe



* Re: [PATCH net-next v3 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock()
  2020-08-22  4:41                     ` [PATCH net-next v3 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() Luke Hsiao
  2020-08-22  4:41                       ` [PATCH net-next v3 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE Luke Hsiao
@ 2020-08-24 23:16                       ` David Miller
  1 sibling, 0 replies; 18+ messages in thread
From: David Miller @ 2020-08-24 23:16 UTC (permalink / raw)
  To: luke.w.hsiao
  Cc: netdev, axboe, kuba, lukehsiao, soheil, arjunroy, edumazet, jannh

From: Luke Hsiao <luke.w.hsiao@gmail.com>
Date: Fri, 21 Aug 2020 21:41:04 -0700

> From: Luke Hsiao <lukehsiao@google.com>
> 
> For TCP tx zero-copy, the kernel notifies the process of completions by
> queuing completion notifications on the socket error queue. This patch
> allows reading these notifications via recvmsg to support TCP tx
> zero-copy.
> 
> Ancillary data was originally disallowed due to privilege escalation
> via io_uring's offloading of sendmsg() onto a kernel thread with kernel
> credentials (https://crbug.com/project-zero/1975). So, we must ensure
> that the socket type is one where the ancillary data types that are
> delivered on recvmsg are plain data (no file descriptors or values that
> are translated based on the identity of the calling process).
> 
> This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE
> with tx zero-copy enabled. Before this patch, we received -EINVAL from
> this specific code path. After this patch, we could read tcp tx
> zero-copy completion notifications from the MSG_ERRQUEUE.

Would be great to see such test programs added to selftests instead of
vaguely being described.

> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Jann Horn <jannh@google.com>
> Reviewed-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Luke Hsiao <lukehsiao@google.com>

Applied.


* Re: [PATCH net-next v3 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE
  2020-08-22  4:41                       ` [PATCH net-next v3 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE Luke Hsiao
  2020-08-22 15:49                         ` Jens Axboe
@ 2020-08-24 23:16                         ` David Miller
  1 sibling, 0 replies; 18+ messages in thread
From: David Miller @ 2020-08-24 23:16 UTC (permalink / raw)
  To: luke.w.hsiao; +Cc: netdev, axboe, kuba, lukehsiao, arjunroy, soheil, edumazet

From: Luke Hsiao <luke.w.hsiao@gmail.com>
Date: Fri, 21 Aug 2020 21:41:05 -0700

> From: Luke Hsiao <lukehsiao@google.com>
> 
> Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
> the context of TCP tx zero-copy, this is inefficient since we are only
> reading the error queue and not using recvmsg to read POLLIN responses.
> 
> This patch was tested by using a simple sending program to call recvmsg
> using io_uring with MSG_ERRQUEUE set and verifying with printks that the
> POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.

Again, selftests additions please.

> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Luke Hsiao <lukehsiao@google.com>

Applied.


end of thread, 2020-08-24 23:16 UTC

Thread overview: 18+ messages
2020-08-20 23:49 [PATCH net-next 0/2] Support reading msg errq from io_uring Luke Hsiao
2020-08-20 23:49 ` [PATCH net-next 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() Luke Hsiao
2020-08-21 21:10   ` Jens Axboe
2020-08-20 23:49 ` [PATCH net-next 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE Luke Hsiao
2020-08-21 20:41   ` Jakub Kicinski
2020-08-21 21:11     ` Jens Axboe
2020-08-22  0:08       ` Luke Hsiao
2020-08-22  1:17         ` Jens Axboe
2020-08-22  2:04           ` [PATCH net-next v2 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() Luke Hsiao
2020-08-22  2:04             ` [PATCH net-next v2 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE Luke Hsiao
2020-08-22  2:08               ` Jens Axboe
2020-08-22  2:13                 ` Luke Hsiao
2020-08-22  2:16                   ` Jens Axboe
2020-08-22  4:41                     ` [PATCH net-next v3 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() Luke Hsiao
2020-08-22  4:41                       ` [PATCH net-next v3 2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE Luke Hsiao
2020-08-22 15:49                         ` Jens Axboe
2020-08-24 23:16                         ` David Miller
2020-08-24 23:16                       ` [PATCH net-next v3 1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock() David Miller
