io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements
@ 2021-03-16 15:33 Stefan Metzmacher
  2021-03-16 15:33 ` [PATCH 1/2] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() calls Stefan Metzmacher
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Stefan Metzmacher @ 2021-03-16 15:33 UTC (permalink / raw)
  To: io-uring; +Cc: Stefan Metzmacher

Hi,

here're patches which fix linking of send[msg]()/recv[msg]() calls
and make sure io_uring_enter() never generate a SIGPIPE.

Stefan Metzmacher (2):
  io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]()
    calls
  io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls

 fs/io_uring.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() calls
  2021-03-16 15:33 [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements Stefan Metzmacher
@ 2021-03-16 15:33 ` Stefan Metzmacher
  2021-03-20 19:33   ` [PATCH v2 1/1] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL Stefan Metzmacher
  2021-03-16 15:33 ` [PATCH 2/2] io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls Stefan Metzmacher
  2021-03-17 22:36 ` [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements Jens Axboe
  2 siblings, 1 reply; 15+ messages in thread
From: Stefan Metzmacher @ 2021-03-16 15:33 UTC (permalink / raw)
  To: io-uring; +Cc: Stefan Metzmacher, stable

Without that it's not safe to use them in a linked combination with
others.

Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
should be possible.

We already handle short reads and writes for the following opcodes:

- IORING_OP_READV
- IORING_OP_READ_FIXED
- IORING_OP_READ
- IORING_OP_WRITEV
- IORING_OP_WRITE_FIXED
- IORING_OP_WRITE
- IORING_OP_SPLICE
- IORING_OP_TEE

Now we have it for these as well:

- IORING_OP_SENDMSG
- IORING_OP_SEND
- IORING_OP_RECVMSG
- IORING_OP_RECV

For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
flags in order to call req_set_fail_links().

cc: stable@vger.kernel.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
---
 fs/io_uring.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e88d9f95d0aa..f8a6a629e4db 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4354,6 +4354,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_async_msghdr iomsg, *kmsg;
 	struct socket *sock;
 	unsigned flags;
+	int expected_ret;
 	int ret;
 
 	sock = sock_from_file(req->file);
@@ -4374,6 +4375,9 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 	else if (issue_flags & IO_URING_F_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 
+	expected_ret = iov_iter_count(&kmsg->msg.msg_iter);
+	if (unlikely(expected_ret == MAX_RW_COUNT))
+		expected_ret += 1;
 	ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags);
 	if ((issue_flags & IO_URING_F_NONBLOCK) && ret == -EAGAIN)
 		return io_setup_async_msg(req, kmsg);
@@ -4384,7 +4388,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 	if (kmsg->free_iov)
 		kfree(kmsg->free_iov);
 	req->flags &= ~REQ_F_NEED_CLEANUP;
-	if (ret < 0)
+	if (ret != expected_ret)
 		req_set_fail_links(req);
 	__io_req_complete(req, issue_flags, ret, 0);
 	return 0;
@@ -4425,7 +4429,7 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	if (ret == -ERESTARTSYS)
 		ret = -EINTR;
 
-	if (ret < 0)
+	if (ret != sr->len)
 		req_set_fail_links(req);
 	__io_req_complete(req, issue_flags, ret, 0);
 	return 0;
@@ -4577,6 +4581,7 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	struct socket *sock;
 	struct io_buffer *kbuf;
 	unsigned flags;
+	int expected_ret;
 	int ret, cflags = 0;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 
@@ -4608,6 +4613,9 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	else if (force_nonblock)
 		flags |= MSG_DONTWAIT;
 
+	expected_ret = iov_iter_count(&kmsg->msg.msg_iter);
+	if (unlikely(expected_ret == MAX_RW_COUNT))
+		expected_ret += 1;
 	ret = __sys_recvmsg_sock(sock, &kmsg->msg, req->sr_msg.umsg,
 					kmsg->uaddr, flags);
 	if (force_nonblock && ret == -EAGAIN)
@@ -4621,7 +4629,7 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	if (kmsg->free_iov)
 		kfree(kmsg->free_iov);
 	req->flags &= ~REQ_F_NEED_CLEANUP;
-	if (ret < 0)
+	if (ret != expected_ret || (kmsg->msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC)))
 		req_set_fail_links(req);
 	__io_req_complete(req, issue_flags, ret, cflags);
 	return 0;
@@ -4675,7 +4683,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 out_free:
 	if (req->flags & REQ_F_BUFFER_SELECTED)
 		cflags = io_put_recv_kbuf(req);
-	if (ret < 0)
+	if (ret != sr->len)
 		req_set_fail_links(req);
 	__io_req_complete(req, issue_flags, ret, cflags);
 	return 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls
  2021-03-16 15:33 [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements Stefan Metzmacher
  2021-03-16 15:33 ` [PATCH 1/2] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() calls Stefan Metzmacher
@ 2021-03-16 15:33 ` Stefan Metzmacher
  2021-03-17 22:36 ` [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements Jens Axboe
  2 siblings, 0 replies; 15+ messages in thread
From: Stefan Metzmacher @ 2021-03-16 15:33 UTC (permalink / raw)
  To: io-uring; +Cc: Stefan Metzmacher, stable

We never want to generate any SIGPIPE, -EPIPE only is much better.

cc: stable@vger.kernel.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
---
 fs/io_uring.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f8a6a629e4db..54105c5cf9e8 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4369,7 +4369,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 		kmsg = &iomsg;
 	}
 
-	flags = req->sr_msg.msg_flags;
+	flags = req->sr_msg.msg_flags | MSG_NOSIGNAL;
 	if (flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
 	else if (issue_flags & IO_URING_F_NONBLOCK)
@@ -4416,7 +4416,7 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	msg.msg_controllen = 0;
 	msg.msg_namelen = 0;
 
-	flags = req->sr_msg.msg_flags;
+	flags = req->sr_msg.msg_flags | MSG_NOSIGNAL;
 	if (flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
 	else if (issue_flags & IO_URING_F_NONBLOCK)
@@ -4607,7 +4607,7 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 				1, req->sr_msg.len);
 	}
 
-	flags = req->sr_msg.msg_flags;
+	flags = req->sr_msg.msg_flags | MSG_NOSIGNAL;
 	if (flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
 	else if (force_nonblock)
@@ -4669,7 +4669,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	msg.msg_iocb = NULL;
 	msg.msg_flags = 0;
 
-	flags = req->sr_msg.msg_flags;
+	flags = req->sr_msg.msg_flags | MSG_NOSIGNAL;
 	if (flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
 	else if (force_nonblock)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements
  2021-03-16 15:33 [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements Stefan Metzmacher
  2021-03-16 15:33 ` [PATCH 1/2] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() calls Stefan Metzmacher
  2021-03-16 15:33 ` [PATCH 2/2] io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls Stefan Metzmacher
@ 2021-03-17 22:36 ` Jens Axboe
  2021-03-17 23:07   ` Pavel Begunkov
  2 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2021-03-17 22:36 UTC (permalink / raw)
  To: Stefan Metzmacher, io-uring

On 3/16/21 9:33 AM, Stefan Metzmacher wrote:
> Hi,
> 
> here're patches which fix linking of send[msg]()/recv[msg]() calls
> and make sure io_uring_enter() never generate a SIGPIPE.
> 
> Stefan Metzmacher (2):
>   io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]()
>     calls
>   io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls
> 
>  fs/io_uring.c | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)

Applied for 5.12, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements
  2021-03-17 22:36 ` [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements Jens Axboe
@ 2021-03-17 23:07   ` Pavel Begunkov
  2021-03-17 23:24     ` Jens Axboe
  2021-03-17 23:26     ` Stefan Metzmacher
  0 siblings, 2 replies; 15+ messages in thread
From: Pavel Begunkov @ 2021-03-17 23:07 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher, io-uring

On 17/03/2021 22:36, Jens Axboe wrote:
> On 3/16/21 9:33 AM, Stefan Metzmacher wrote:
>> Hi,
>>
>> here're patches which fix linking of send[msg]()/recv[msg]() calls
>> and make sure io_uring_enter() never generate a SIGPIPE.

1/2 breaks userspace. Sounds like 2/2 might too, does it?

>>
>> Stefan Metzmacher (2):
>>   io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]()
>>     calls
>>   io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls
>>
>>  fs/io_uring.c | 24 ++++++++++++++++--------
>>  1 file changed, 16 insertions(+), 8 deletions(-)
> 
> Applied for 5.12, thanks.
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements
  2021-03-17 23:07   ` Pavel Begunkov
@ 2021-03-17 23:24     ` Jens Axboe
  2021-03-17 23:26     ` Stefan Metzmacher
  1 sibling, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2021-03-17 23:24 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher, io-uring

On 3/17/21 5:07 PM, Pavel Begunkov wrote:
> On 17/03/2021 22:36, Jens Axboe wrote:
>> On 3/16/21 9:33 AM, Stefan Metzmacher wrote:
>>> Hi,
>>>
>>> here're patches which fix linking of send[msg]()/recv[msg]() calls
>>> and make sure io_uring_enter() never generate a SIGPIPE.
> 
> 1/2 breaks userspace. Sounds like 2/2 might too, does it?

If they don't sever on short sends/receives, they cannot have been
used reliably by applications. So I don't think that's a real risk.

2/2 should just make it a reliable delivery, SIGPIPE would get
lost for async, amongst other things.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements
  2021-03-17 23:07   ` Pavel Begunkov
  2021-03-17 23:24     ` Jens Axboe
@ 2021-03-17 23:26     ` Stefan Metzmacher
  2021-03-17 23:39       ` Pavel Begunkov
  1 sibling, 1 reply; 15+ messages in thread
From: Stefan Metzmacher @ 2021-03-17 23:26 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe, io-uring

Hi Pavel,

>>> here're patches which fix linking of send[msg]()/recv[msg]() calls
>>> and make sure io_uring_enter() never generate a SIGPIPE.
> 
> 1/2 breaks userspace.

Can you explain that a bit please, how could some application ever
have a useful use of IOSQE_IO_LINK with these socket calls?

> Sounds like 2/2 might too, does it?

Do you think any application really expects to get a SIGPIPE
when calling io_uring_enter()?

metze


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements
  2021-03-17 23:26     ` Stefan Metzmacher
@ 2021-03-17 23:39       ` Pavel Begunkov
  2021-03-18  0:15         ` Stefan Metzmacher
  0 siblings, 1 reply; 15+ messages in thread
From: Pavel Begunkov @ 2021-03-17 23:39 UTC (permalink / raw)
  To: Stefan Metzmacher, Jens Axboe, io-uring

On 17/03/2021 23:26, Stefan Metzmacher wrote:
> Hi Pavel,
> 
>>>> here're patches which fix linking of send[msg]()/recv[msg]() calls
>>>> and make sure io_uring_enter() never generate a SIGPIPE.
>>
>> 1/2 breaks userspace.
> 
> Can you explain that a bit please, how could some application ever
> have a useful use of IOSQE_IO_LINK with these socket calls?

Packet delivery of variable size, i.e. recv(max_size). Byte stream
that consumes whatever you've got and links something (e.g. notification
delivery, or poll). Not sure about netlink, but maybe. Or some
"create a file via send" crap, or some made-up custom protocols

>> Sounds like 2/2 might too, does it?
> 
> Do you think any application really expects to get a SIGPIPE
> when calling io_uring_enter()?

If it was about what I think I would remove lots of old garbage :)
I doubt it wasn't working well before, e.g. because of iowq, but
who knows

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements
  2021-03-17 23:39       ` Pavel Begunkov
@ 2021-03-18  0:15         ` Stefan Metzmacher
  2021-03-18 13:00           ` Pavel Begunkov
  2021-03-18 13:08           ` Pavel Begunkov
  0 siblings, 2 replies; 15+ messages in thread
From: Stefan Metzmacher @ 2021-03-18  0:15 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe, io-uring; +Cc: netdev

Hi Pavel,

>>>>> here're patches which fix linking of send[msg]()/recv[msg]() calls
>>>>> and make sure io_uring_enter() never generate a SIGPIPE.
>>>
>>> 1/2 breaks userspace.
>>
>> Can you explain that a bit please, how could some application ever
>> have a useful use of IOSQE_IO_LINK with these socket calls?
> 
> Packet delivery of variable size, i.e. recv(max_size). Byte stream
> that consumes whatever you've got and links something (e.g. notification
> delivery, or poll). Not sure about netlink, but maybe. Or some
> "create a file via send" crap, or some made-up custom protocols

Ok, then we need a flag or a new opcode to provide that behavior?

For recv() and recvmsg() MSG_WAITALL might be usable.

It's not defined in 'man 2 sendmsg', but should we use it anyway
for IORING_OP_SEND[MSG] in order to activate the short send check
as the low level sock_sendmsg() call seem to ignore unused flags,
which seems to be the reason for the following logic in tcp_sendmsg_locked:

if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) {

You need to set SOCK_ZEROCOPY in the socket in order to give a meaning
to MSG_ZEROCOPY.

Should I prepare an add-on patch to make the short send/recv logic depend
on MSG_WAITALL?

I'm cc'ing netdev@vger.kernel.org in order to more feedback of
MSG_WAITALL can be passed to sendmsg without fear to trigger
-EINVAL.

The example for io_sendmsg() would look like this:

--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4383,7 +4383,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
        struct io_async_msghdr iomsg, *kmsg;
        struct socket *sock;
        unsigned flags;
-       int expected_ret;
+       int min_ret = 0;
        int ret;

        sock = sock_from_file(req->file);
@@ -4404,9 +4404,11 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
        else if (issue_flags & IO_URING_F_NONBLOCK)
                flags |= MSG_DONTWAIT;

-       expected_ret = iov_iter_count(&kmsg->msg.msg_iter);
-       if (unlikely(expected_ret == MAX_RW_COUNT))
-               expected_ret += 1;
+       if (flags & MSG_WAITALL) {
+               min_ret = iov_iter_count(&kmsg->msg.msg_iter);
+               if (unlikely(min_ret == MAX_RW_COUNT))
+                       min_ret += 1;
+       }
        ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags);
        if ((issue_flags & IO_URING_F_NONBLOCK) && ret == -EAGAIN)
                return io_setup_async_msg(req, kmsg);
@@ -4417,7 +4419,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
        if (kmsg->free_iov)
                kfree(kmsg->free_iov);
        req->flags &= ~REQ_F_NEED_CLEANUP;
-       if (ret != expected_ret)
+       if (ret < min_ret)
                req_set_fail_links(req);
        __io_req_complete(req, issue_flags, ret, 0);
        return 0;

Which means the default of min_ret = 0 would result in:

        if (ret < 0)
                req_set_fail_links(req);

again...

>>> Sounds like 2/2 might too, does it?
>>
>> Do you think any application really expects to get a SIGPIPE
>> when calling io_uring_enter()?
> 
> If it was about what I think I would remove lots of old garbage :)
> I doubt it wasn't working well before, e.g. because of iowq, but
> who knows

Yes, it was inconsistent before and now it's reliable.

metze




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements
  2021-03-18  0:15         ` Stefan Metzmacher
@ 2021-03-18 13:00           ` Pavel Begunkov
  2021-03-18 13:08           ` Pavel Begunkov
  1 sibling, 0 replies; 15+ messages in thread
From: Pavel Begunkov @ 2021-03-18 13:00 UTC (permalink / raw)
  To: Stefan Metzmacher, Jens Axboe, io-uring; +Cc: netdev

On 18/03/2021 00:15, Stefan Metzmacher wrote:
> Hi Pavel,
> 
>>>>>> here're patches which fix linking of send[msg]()/recv[msg]() calls
>>>>>> and make sure io_uring_enter() never generate a SIGPIPE.
>>>>
>>>> 1/2 breaks userspace.
>>>
>>> Can you explain that a bit please, how could some application ever
>>> have a useful use of IOSQE_IO_LINK with these socket calls?
>>
>> Packet delivery of variable size, i.e. recv(max_size). Byte stream
>> that consumes whatever you've got and links something (e.g. notification
>> delivery, or poll). Not sure about netlink, but maybe. Or some
>> "create a file via send" crap, or some made-up custom protocols
> 
> Ok, then we need a flag or a new opcode to provide that behavior?
> 
> For recv() and recvmsg() MSG_WAITALL might be usable.

Hmm, unrelated, but there is a good chance MSG_WAITALL with io_uring
is broken because of our first MSG_DONTWAIT attempt. 

> It's not defined in 'man 2 sendmsg', but should we use it anyway
> for IORING_OP_SEND[MSG] in order to activate the short send check
> as the low level sock_sendmsg() call seem to ignore unused flags,
> which seems to be the reason for the following logic in tcp_sendmsg_locked:
> 
> if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) {

Yep, it maintains compatibility because of unchecked unsupported flags.
Alleviating an old design problem, IIRC.

> 
> You need to set SOCK_ZEROCOPY in the socket in order to give a meaning
> to MSG_ZEROCOPY.
> 
> Should I prepare an add-on patch to make the short send/recv logic depend
> on MSG_WAITALL?

IMHO, conceptually it would make much more sense with MSG_WAITALL.

> 
> I'm cc'ing netdev@vger.kernel.org in order to more feedback of
> MSG_WAITALL can be passed to sendmsg without fear to trigger
> -EINVAL.
> 
> The example for io_sendmsg() would look like this:
> 
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -4383,7 +4383,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>         struct io_async_msghdr iomsg, *kmsg;
>         struct socket *sock;
>         unsigned flags;
> -       int expected_ret;
> +       int min_ret = 0;
>         int ret;
> 
>         sock = sock_from_file(req->file);
> @@ -4404,9 +4404,11 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>         else if (issue_flags & IO_URING_F_NONBLOCK)
>                 flags |= MSG_DONTWAIT;
> 
> -       expected_ret = iov_iter_count(&kmsg->msg.msg_iter);
> -       if (unlikely(expected_ret == MAX_RW_COUNT))
> -               expected_ret += 1;
> +       if (flags & MSG_WAITALL) {
> +               min_ret = iov_iter_count(&kmsg->msg.msg_iter);
> +               if (unlikely(min_ret == MAX_RW_COUNT))
> +                       min_ret += 1;
> +       }
>         ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags);
>         if ((issue_flags & IO_URING_F_NONBLOCK) && ret == -EAGAIN)
>                 return io_setup_async_msg(req, kmsg);
> @@ -4417,7 +4419,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>         if (kmsg->free_iov)
>                 kfree(kmsg->free_iov);
>         req->flags &= ~REQ_F_NEED_CLEANUP;
> -       if (ret != expected_ret)
> +       if (ret < min_ret)
>                 req_set_fail_links(req);
>         __io_req_complete(req, issue_flags, ret, 0);
>         return 0;
> 
> Which means the default of min_ret = 0 would result in:
> 
>         if (ret < 0)
>                 req_set_fail_links(req);
> 
> again...
> 
>>>> Sounds like 2/2 might too, does it?
>>>
>>> Do you think any application really expects to get a SIGPIPE
>>> when calling io_uring_enter()?
>>
>> If it was about what I think I would remove lots of old garbage :)
>> I doubt it wasn't working well before, e.g. because of iowq, but
>> who knows
> 
> Yes, it was inconsistent before and now it's reliable.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements
  2021-03-18  0:15         ` Stefan Metzmacher
  2021-03-18 13:00           ` Pavel Begunkov
@ 2021-03-18 13:08           ` Pavel Begunkov
  1 sibling, 0 replies; 15+ messages in thread
From: Pavel Begunkov @ 2021-03-18 13:08 UTC (permalink / raw)
  To: Stefan Metzmacher, Jens Axboe, io-uring; +Cc: netdev

On 18/03/2021 00:15, Stefan Metzmacher wrote:
>>>> Sounds like 2/2 might too, does it?
>>>
>>> Do you think any application really expects to get a SIGPIPE
>>> when calling io_uring_enter()?
>>
>> If it was about what I think I would remove lots of old garbage :)
>> I doubt it wasn't working well before, e.g. because of iowq, but
>> who knows
> 
> Yes, it was inconsistent before and now it's reliable.

Yep, that where my hesitation was coming from, but the case I had
in mind is 

1) send() -> gone to io-wq
2) close the other end
3) send() fails, probably without SIGPIPE (because io-wq)
4) userspace retries send() and inline execution delivers SIGPIPE

But I guess we don't really care. In any case, let's drop stable tag,
maybe? I don't see a reason for it, considering that stable tries hard
to preserve ABI.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/1] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
  2021-03-16 15:33 ` [PATCH 1/2] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() calls Stefan Metzmacher
@ 2021-03-20 19:33   ` Stefan Metzmacher
  2021-03-20 22:57     ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Metzmacher @ 2021-03-20 19:33 UTC (permalink / raw)
  To: io-uring; +Cc: Stefan Metzmacher, netdev

Without that it's not safe to use them in a linked combination with
others.

Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
should be possible.

We already handle short reads and writes for the following opcodes:

- IORING_OP_READV
- IORING_OP_READ_FIXED
- IORING_OP_READ
- IORING_OP_WRITEV
- IORING_OP_WRITE_FIXED
- IORING_OP_WRITE
- IORING_OP_SPLICE
- IORING_OP_TEE

Now we have it for these as well:

- IORING_OP_SENDMSG
- IORING_OP_SEND
- IORING_OP_RECVMSG
- IORING_OP_RECV

For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
flags in order to call req_set_fail_links().

There might be applications arround depending on the behavior
that even short send[msg]()/recv[msg]() retuns continue an
IOSQE_IO_LINK chain.

It's very unlikely that such applications pass in MSG_WAITALL,
which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.

It's expected that the low level sock_sendmsg() call just ignores
MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set
SO_ZEROCOPY.

We also expect the caller to know about the implicit truncation to
MAX_RW_COUNT, which we don't detect.

cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
---
 fs/io_uring.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 75b791ff21ec..746435e3f534 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4386,6 +4386,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_async_msghdr iomsg, *kmsg;
 	struct socket *sock;
 	unsigned flags;
+	int min_ret = 0;
 	int ret;
 
 	sock = sock_from_file(req->file);
@@ -4406,6 +4407,9 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 	else if (issue_flags & IO_URING_F_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 
+	if (flags & MSG_WAITALL)
+		min_ret = iov_iter_count(&kmsg->msg.msg_iter);
+
 	ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags);
 	if ((issue_flags & IO_URING_F_NONBLOCK) && ret == -EAGAIN)
 		return io_setup_async_msg(req, kmsg);
@@ -4416,7 +4420,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 	if (kmsg->free_iov)
 		kfree(kmsg->free_iov);
 	req->flags &= ~REQ_F_NEED_CLEANUP;
-	if (ret < 0)
+	if (ret < min_ret)
 		req_set_fail_links(req);
 	__io_req_complete(req, issue_flags, ret, 0);
 	return 0;
@@ -4429,6 +4433,7 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	struct iovec iov;
 	struct socket *sock;
 	unsigned flags;
+	int min_ret = 0;
 	int ret;
 
 	sock = sock_from_file(req->file);
@@ -4450,6 +4455,9 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	else if (issue_flags & IO_URING_F_NONBLOCK)
 		flags |= MSG_DONTWAIT;
 
+	if (flags & MSG_WAITALL)
+		min_ret = iov_iter_count(&msg.msg_iter);
+
 	msg.msg_flags = flags;
 	ret = sock_sendmsg(sock, &msg);
 	if ((issue_flags & IO_URING_F_NONBLOCK) && ret == -EAGAIN)
@@ -4457,7 +4465,7 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	if (ret == -ERESTARTSYS)
 		ret = -EINTR;
 
-	if (ret < 0)
+	if (ret < min_ret)
 		req_set_fail_links(req);
 	__io_req_complete(req, issue_flags, ret, 0);
 	return 0;
@@ -4609,6 +4617,7 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	struct socket *sock;
 	struct io_buffer *kbuf;
 	unsigned flags;
+	int min_ret = 0;
 	int ret, cflags = 0;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 
@@ -4640,6 +4649,9 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	else if (force_nonblock)
 		flags |= MSG_DONTWAIT;
 
+	if (flags & MSG_WAITALL)
+		min_ret = iov_iter_count(&kmsg->msg.msg_iter);
+
 	ret = __sys_recvmsg_sock(sock, &kmsg->msg, req->sr_msg.umsg,
 					kmsg->uaddr, flags);
 	if (force_nonblock && ret == -EAGAIN)
@@ -4653,7 +4665,7 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	if (kmsg->free_iov)
 		kfree(kmsg->free_iov);
 	req->flags &= ~REQ_F_NEED_CLEANUP;
-	if (ret < 0)
+	if (ret < min_ret || ((flags & MSG_WAITALL) && (kmsg->msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))))
 		req_set_fail_links(req);
 	__io_req_complete(req, issue_flags, ret, cflags);
 	return 0;
@@ -4668,6 +4680,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	struct socket *sock;
 	struct iovec iov;
 	unsigned flags;
+	int min_ret = 0;
 	int ret, cflags = 0;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 
@@ -4699,6 +4712,9 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	else if (force_nonblock)
 		flags |= MSG_DONTWAIT;
 
+	if (flags & MSG_WAITALL)
+		min_ret = iov_iter_count(&msg.msg_iter);
+
 	ret = sock_recvmsg(sock, &msg, flags);
 	if (force_nonblock && ret == -EAGAIN)
 		return -EAGAIN;
@@ -4707,7 +4723,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 out_free:
 	if (req->flags & REQ_F_BUFFER_SELECTED)
 		cflags = io_put_recv_kbuf(req);
-	if (ret < 0)
+	if (ret < min_ret || ((flags & MSG_WAITALL) && (msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))))
 		req_set_fail_links(req);
 	__io_req_complete(req, issue_flags, ret, cflags);
 	return 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
  2021-03-20 19:33   ` [PATCH v2 1/1] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL Stefan Metzmacher
@ 2021-03-20 22:57     ` Jens Axboe
  2021-03-21 10:20       ` Stefan Metzmacher
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2021-03-20 22:57 UTC (permalink / raw)
  To: Stefan Metzmacher, io-uring; +Cc: netdev

On 3/20/21 1:33 PM, Stefan Metzmacher wrote:
> Without that it's not safe to use them in a linked combination with
> others.
> 
> Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
> should be possible.
> 
> We already handle short reads and writes for the following opcodes:
> 
> - IORING_OP_READV
> - IORING_OP_READ_FIXED
> - IORING_OP_READ
> - IORING_OP_WRITEV
> - IORING_OP_WRITE_FIXED
> - IORING_OP_WRITE
> - IORING_OP_SPLICE
> - IORING_OP_TEE
> 
> Now we have it for these as well:
> 
> - IORING_OP_SENDMSG
> - IORING_OP_SEND
> - IORING_OP_RECVMSG
> - IORING_OP_RECV
> 
> For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
> flags in order to call req_set_fail_links().
> 
> There might be applications arround depending on the behavior
> that even short send[msg]()/recv[msg]() retuns continue an
> IOSQE_IO_LINK chain.
> 
> It's very unlikely that such applications pass in MSG_WAITALL,
> which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.
> 
> It's expected that the low level sock_sendmsg() call just ignores
> MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set
> SO_ZEROCOPY.
> 
> We also expect the caller to know about the implicit truncation to
> MAX_RW_COUNT, which we don't detect.

Thanks, I do think this is much better and I feel comfortable getting
htis applied for 5.12 (and stable).

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
  2021-03-20 22:57     ` Jens Axboe
@ 2021-03-21 10:20       ` Stefan Metzmacher
  2021-03-21 13:10         ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Metzmacher @ 2021-03-21 10:20 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: netdev


Am 20.03.21 um 23:57 schrieb Jens Axboe:
> On 3/20/21 1:33 PM, Stefan Metzmacher wrote:
>> Without that it's not safe to use them in a linked combination with
>> others.
>>
>> Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
>> should be possible.
>>
>> We already handle short reads and writes for the following opcodes:
>>
>> - IORING_OP_READV
>> - IORING_OP_READ_FIXED
>> - IORING_OP_READ
>> - IORING_OP_WRITEV
>> - IORING_OP_WRITE_FIXED
>> - IORING_OP_WRITE
>> - IORING_OP_SPLICE
>> - IORING_OP_TEE
>>
>> Now we have it for these as well:
>>
>> - IORING_OP_SENDMSG
>> - IORING_OP_SEND
>> - IORING_OP_RECVMSG
>> - IORING_OP_RECV
>>
>> For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
>> flags in order to call req_set_fail_links().
>>
>> There might be applications arround depending on the behavior
>> that even short send[msg]()/recv[msg]() retuns continue an
>> IOSQE_IO_LINK chain.
>>
>> It's very unlikely that such applications pass in MSG_WAITALL,
>> which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.
>>
>> It's expected that the low level sock_sendmsg() call just ignores
>> MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set
>> SO_ZEROCOPY.
>>
>> We also expect the caller to know about the implicit truncation to
>> MAX_RW_COUNT, which we don't detect.
> 
> Thanks, I do think this is much better and I feel comfortable getting
> htis applied for 5.12 (and stable).
> 

Great thanks!

Related to that I have a questing regarding the IOSQE_IO_LINK behavior.
(Assuming I have a dedicated ring for the send-path of each socket.)

Is it possible to just set IOSQE_IO_LINK on every sqe in order to create
an endless chain of requests so that userspace can pass as much sqes as possible
which all need to be submitted in the exact correct order. And if any request
is short, then all remaining get ECANCELED, without the risk of running any later
request out of order.

Are such link chains possible also over multiple io_uring_submit() calls?
Is there still a race between, having an iothread removing the request from
from the list and fill in a cqe with ECANCELED, that userspace is not awaire
of yet, which then starts a new independed link chain with a request that
ought to be submitted after all the canceled once.

Or do I have to submit a link chain with just a single __io_uring_flush_sq()
and then strictly need to wait until I got a cqe for the last request in
the chain?

Thanks!
metze

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/1] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
  2021-03-21 10:20       ` Stefan Metzmacher
@ 2021-03-21 13:10         ` Jens Axboe
  0 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2021-03-21 13:10 UTC (permalink / raw)
  To: Stefan Metzmacher, io-uring; +Cc: netdev

On 3/21/21 4:20 AM, Stefan Metzmacher wrote:
> 
> Am 20.03.21 um 23:57 schrieb Jens Axboe:
>> On 3/20/21 1:33 PM, Stefan Metzmacher wrote:
>>> Without that it's not safe to use them in a linked combination with
>>> others.
>>>
>>> Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
>>> should be possible.
>>>
>>> We already handle short reads and writes for the following opcodes:
>>>
>>> - IORING_OP_READV
>>> - IORING_OP_READ_FIXED
>>> - IORING_OP_READ
>>> - IORING_OP_WRITEV
>>> - IORING_OP_WRITE_FIXED
>>> - IORING_OP_WRITE
>>> - IORING_OP_SPLICE
>>> - IORING_OP_TEE
>>>
>>> Now we have it for these as well:
>>>
>>> - IORING_OP_SENDMSG
>>> - IORING_OP_SEND
>>> - IORING_OP_RECVMSG
>>> - IORING_OP_RECV
>>>
>>> For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
>>> flags in order to call req_set_fail_links().
>>>
>>> There might be applications arround depending on the behavior
>>> that even short send[msg]()/recv[msg]() retuns continue an
>>> IOSQE_IO_LINK chain.
>>>
>>> It's very unlikely that such applications pass in MSG_WAITALL,
>>> which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.
>>>
>>> It's expected that the low level sock_sendmsg() call just ignores
>>> MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set
>>> SO_ZEROCOPY.
>>>
>>> We also expect the caller to know about the implicit truncation to
>>> MAX_RW_COUNT, which we don't detect.
>>
>> Thanks, I do think this is much better and I feel comfortable getting
>> htis applied for 5.12 (and stable).
>>
> 
> Great thanks!
> 
> Related to that I have a questing regarding the IOSQE_IO_LINK behavior.
> (Assuming I have a dedicated ring for the send-path of each socket.)
> 
> Is it possible to just set IOSQE_IO_LINK on every sqe in order to create
> an endless chain of requests so that userspace can pass as much sqes as possible
> which all need to be submitted in the exact correct order. And if any request
> is short, then all remaining get ECANCELED, without the risk of running any later
> request out of order.
> 
> Are such link chains possible also over multiple io_uring_submit() calls?
> Is there still a race between, having an iothread removing the request from
> from the list and fill in a cqe with ECANCELED, that userspace is not awaire
> of yet, which then starts a new independed link chain with a request that
> ought to be submitted after all the canceled once.
> 
> Or do I have to submit a link chain with just a single __io_uring_flush_sq()
> and then strictly need to wait until I got a cqe for the last request in
> the chain?

A chain can only exist within a single submit attempt, so it will not work
if you need to break it up over multiple io_uring_enter() calls.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-03-21 13:12 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-16 15:33 [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements Stefan Metzmacher
2021-03-16 15:33 ` [PATCH 1/2] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() calls Stefan Metzmacher
2021-03-20 19:33   ` [PATCH v2 1/1] io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL Stefan Metzmacher
2021-03-20 22:57     ` Jens Axboe
2021-03-21 10:20       ` Stefan Metzmacher
2021-03-21 13:10         ` Jens Axboe
2021-03-16 15:33 ` [PATCH 2/2] io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls Stefan Metzmacher
2021-03-17 22:36 ` [PATCH 0/2] send[msg]()/recv[msg]() fixes/improvements Jens Axboe
2021-03-17 23:07   ` Pavel Begunkov
2021-03-17 23:24     ` Jens Axboe
2021-03-17 23:26     ` Stefan Metzmacher
2021-03-17 23:39       ` Pavel Begunkov
2021-03-18  0:15         ` Stefan Metzmacher
2021-03-18 13:00           ` Pavel Begunkov
2021-03-18 13:08           ` Pavel Begunkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).