io-uring.vger.kernel.org archive mirror
* Feature request: Please implement IORING_OP_TEE
@ 2020-04-27 15:40 Clay Harris
  2020-04-27 15:55 ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Clay Harris @ 2020-04-27 15:40 UTC (permalink / raw)
  To: io-uring

I was excited to see IORING_OP_SPLICE go in, but disappointed that tee
didn't go in at the same time.  It would be very useful to copy pipe
buffers in an async program.
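
For illustration, a minimal userspace sketch of the use case, assuming a
liburing helper shaped like io_uring_prep_splice(); the helper name and the
4 KiB length are placeholders, not something this thread defines:

#include <liburing.h>

/* Queue an async tee: duplicate up to 4 KiB of buffers from pipe_in to
 * pipe_out without consuming them, then submit.  Error handling trimmed. */
static int queue_tee(struct io_uring *ring, int pipe_in, int pipe_out)
{
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
                return -EBUSY;
        io_uring_prep_tee(sqe, pipe_in, pipe_out, 4096, 0);
        return io_uring_submit(ring);
}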


* Re: Feature request: Please implement IORING_OP_TEE
  2020-04-27 15:40 Feature request: Please implement IORING_OP_TEE Clay Harris
@ 2020-04-27 15:55 ` Jens Axboe
  2020-04-27 18:03   ` Pavel Begunkov
  2020-04-27 18:22   ` Jann Horn
  0 siblings, 2 replies; 8+ messages in thread
From: Jens Axboe @ 2020-04-27 15:55 UTC (permalink / raw)
  To: Clay Harris; +Cc: io-uring, Pavel Begunkov

On 4/27/20 9:40 AM, Clay Harris wrote:
> I was excited to see IORING_OP_SPLICE go in, but disappointed that tee
> didn't go in at the same time.  It would be very useful to copy pipe
> buffers in an async program.

Pavel, care to wire up tee? From a quick look, looks like just exposing
do_tee() and calling that, so should be trivial.
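
For illustration, a loose sketch of what "exposing do_tee() and calling that"
could mean: drop the static from do_tee() in fs/splice.c, declare it, and call
it from an op handler.  The handler below is modelled on the existing splice
op; its internals are assumptions, not the eventual patch:

/* fs/splice.c: make do_tee() callable from io_uring */
long do_tee(struct file *in, struct file *out, size_t len, unsigned int flags);

/* fs/io_uring.c: hypothetical handler, shaped like the splice one */
static int io_tee(struct io_kiocb *req, bool force_nonblock)
{
        struct io_splice *sp = &req->splice;
        long ret = 0;

        if (force_nonblock)
                return -EAGAIN;         /* punt to the async worker */
        if (sp->len)
                ret = do_tee(sp->file_in, sp->file_out, sp->len, sp->flags);

        io_cqring_add_event(req, ret);
        io_put_req(req);
        return 0;
}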

-- 
Jens Axboe



* Re: Feature request: Please implement IORING_OP_TEE
  2020-04-27 15:55 ` Jens Axboe
@ 2020-04-27 18:03   ` Pavel Begunkov
  2020-04-27 18:11     ` Jens Axboe
  2020-04-27 18:22   ` Jann Horn
  1 sibling, 1 reply; 8+ messages in thread
From: Pavel Begunkov @ 2020-04-27 18:03 UTC (permalink / raw)
  To: Jens Axboe, Clay Harris; +Cc: io-uring

On 27/04/2020 18:55, Jens Axboe wrote:
> On 4/27/20 9:40 AM, Clay Harris wrote:
>> I was excited to see IORING_OP_SPLICE go in, but disappointed that tee
>> didn't go in at the same time.  It would be very useful to copy pipe
>> buffers in an async program.
> 
> Pavel, care to wire up tee? From a quick look, looks like just exposing
> do_tee() and calling that, so should be trivial.

Yes, should be, I'll add it

-- 
Pavel Begunkov


* Re: Feature request: Please implement IORING_OP_TEE
  2020-04-27 18:03   ` Pavel Begunkov
@ 2020-04-27 18:11     ` Jens Axboe
  0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2020-04-27 18:11 UTC (permalink / raw)
  To: Pavel Begunkov, Clay Harris; +Cc: io-uring

On 4/27/20 12:03 PM, Pavel Begunkov wrote:
> On 27/04/2020 18:55, Jens Axboe wrote:
>> On 4/27/20 9:40 AM, Clay Harris wrote:
>>> I was excited to see IORING_OP_SPLICE go in, but disappointed that tee
>>> didn't go in at the same time.  It would be very useful to copy pipe
>>> buffers in an async program.
>>
>> Pavel, care to wire up tee? From a quick look, looks like just exposing
>> do_tee() and calling that, so should be trivial.
> 
> Yes, should be, I'll add it

The only other thing I spotted is making the inode lock / double lock honor
nowait, which is separate from the SPLICE_F_NONBLOCK flag.
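
For illustration, one trylock-based way the double lock could honor nowait,
assuming the pipes are ordered by address the way pipe_double_lock() orders
them today; a sketch, not a proposed patch:

/* Try to take both pipe mutexes without sleeping; return false to punt. */
static bool pipe_double_trylock(struct pipe_inode_info *p1,
                                struct pipe_inode_info *p2)
{
        if (p1 > p2)
                swap(p1, p2);           /* lock in address order, avoiding ABBA */
        if (!mutex_trylock(&p1->mutex))
                return false;
        if (p1 != p2 && !mutex_trylock(&p2->mutex)) {
                mutex_unlock(&p1->mutex);
                return false;
        }
        return true;
}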

-- 
Jens Axboe



* Re: Feature request: Please implement IORING_OP_TEE
  2020-04-27 15:55 ` Jens Axboe
  2020-04-27 18:03   ` Pavel Begunkov
@ 2020-04-27 18:22   ` Jann Horn
  2020-04-27 20:02     ` Jens Axboe
  2020-04-27 20:17     ` Clay Harris
  1 sibling, 2 replies; 8+ messages in thread
From: Jann Horn @ 2020-04-27 18:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Clay Harris, io-uring, Pavel Begunkov

On Mon, Apr 27, 2020 at 5:56 PM Jens Axboe <axboe@kernel.dk> wrote:
> On 4/27/20 9:40 AM, Clay Harris wrote:
> > I was excited to see IORING_OP_SPLICE go in, but disappointed that tee
> > didn't go in at the same time.  It would be very useful to copy pipe
> > buffers in an async program.
>
> Pavel, care to wire up tee? From a quick look, looks like just exposing
> do_tee() and calling that, so should be trivial.

Just out of curiosity:

What's the purpose of doing that via io_uring? Non-blocking sys_tee()
just shoves around some metadata, it doesn't do any I/O, right? Is
this purely for syscall-batching reasons? (And does that mean that you
would also add syscalls like epoll_wait() and futex() to io_uring?) Or
is this because you're worried about blocking on the pipe mutex?


* Re: Feature request: Please implement IORING_OP_TEE
  2020-04-27 18:22   ` Jann Horn
@ 2020-04-27 20:02     ` Jens Axboe
  2020-04-29 15:57       ` Pavel Begunkov
  2020-04-27 20:17     ` Clay Harris
  1 sibling, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2020-04-27 20:02 UTC (permalink / raw)
  To: Jann Horn; +Cc: Clay Harris, io-uring, Pavel Begunkov

On 4/27/20 12:22 PM, Jann Horn wrote:
> On Mon, Apr 27, 2020 at 5:56 PM Jens Axboe <axboe@kernel.dk> wrote:
>> On 4/27/20 9:40 AM, Clay Harris wrote:
>>> I was excited to see IORING_OP_SPLICE go in, but disappointed that tee
>>> didn't go in at the same time.  It would be very useful to copy pipe
>>> buffers in an async program.
>>
>> Pavel, care to wire up tee? From a quick look, looks like just exposing
>> do_tee() and calling that, so should be trivial.
> 
> Just out of curiosity:
> 
> What's the purpose of doing that via io_uring? Non-blocking sys_tee()
> just shoves around some metadata, it doesn't do any I/O, right? Is
> this purely for syscall-batching reasons? (And does that mean that you
> would also add syscalls like epoll_wait() and futex() to io_uring?) Or
> is this because you're worried about blocking on the pipe mutex?

Right, it doesn't do any IO. It does potentially block on the inode
mutex, but that's about it. I think the reasons are mainly:

- Keep the interfaces the same, instead of using both sync and async
  calls.
- Bundling/batch reasons, either in same submission, or chained.

Some folks have talked about futex, and epoll_wait would also be a
natural extension as well, since we already have the ctl part.
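
For illustration, the batching/chaining argument from the userspace side,
assuming liburing and a tee prep helper analogous to the splice one; the fd
names are placeholders:

#include <liburing.h>

/* Splice from a socket into a pipe, then tee that pipe into a second pipe,
 * linked so the tee only runs if the splice succeeds -- one submit call
 * covers both requests. */
static int queue_splice_then_tee(struct io_uring *ring, int sock_fd,
                                 int pipe_wr, int pipe_rd, int copy_pipe_wr)
{
        struct io_uring_sqe *sqe;

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_splice(sqe, sock_fd, -1, pipe_wr, -1, 4096, 0);
        sqe->flags |= IOSQE_IO_LINK;    /* chain: run the tee only after this */

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_tee(sqe, pipe_rd, copy_pipe_wr, 4096, 0);

        return io_uring_submit(ring);   /* one syscall for the whole chain */
}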

-- 
Jens Axboe



* Re: Feature request: Please implement IORING_OP_TEE
  2020-04-27 18:22   ` Jann Horn
  2020-04-27 20:02     ` Jens Axboe
@ 2020-04-27 20:17     ` Clay Harris
  1 sibling, 0 replies; 8+ messages in thread
From: Clay Harris @ 2020-04-27 20:17 UTC (permalink / raw)
  To: Jann Horn; +Cc: Jens Axboe, io-uring, Pavel Begunkov

On Mon, Apr 27 2020 at 20:22:18 +0200, Jann Horn quoth thus:

> On Mon, Apr 27, 2020 at 5:56 PM Jens Axboe <axboe@kernel.dk> wrote:
> > On 4/27/20 9:40 AM, Clay Harris wrote:
> > > I was excited to see IORING_OP_SPLICE go in, but disappointed that tee
> > > didn't go in at the same time.  It would be very useful to copy pipe
> > > buffers in an async program.
> >
> > Pavel, care to wire up tee? From a quick look, looks like just exposing
> > do_tee() and calling that, so should be trivial.
> 
> Just out of curiosity:
> 
> What's the purpose of doing that via io_uring? Non-blocking sys_tee()
> just shoves around some metadata, it doesn't do any I/O, right? Is
> this purely for syscall-batching reasons? (And does that mean that you
> would also add syscalls like epoll_wait() and futex() to io_uring?) Or
> is this because you're worried about blocking on the pipe mutex?

From my perspective -- syscall-batching.

But, if you're going to be working with a very large number of file
descriptors, you'll need to have epoll().  You could do this by building
epoll_wait into io_uring and/or having a separate uring only for IO and
never waiting for completions there, but instead calling epoll() when
there are no ready CQEs.  I'd assumed that this was already being
looked at because of the definition of IORING_OP_EPOLL_CTL.
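
For illustration, what the existing IORING_OP_EPOLL_CTL side of this looks
like from userspace, assuming the liburing helper for it; fd names are
placeholders:

#include <liburing.h>
#include <sys/epoll.h>

/* Register a new fd with an existing epoll instance through io_uring, so
 * the registration can be batched with other submissions. */
static int queue_epoll_add(struct io_uring *ring, int epfd, int fd)
{
        static struct epoll_event ev;   /* kept around until the CQE comes back */
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
                return -EBUSY;
        ev.events = EPOLLIN | EPOLLET;
        ev.data.fd = fd;
        io_uring_prep_epoll_ctl(sqe, epfd, fd, EPOLL_CTL_ADD, &ev);
        return io_uring_submit(ring);
}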

----

So, I'd like to take this opportunity to bounce a related thought off
of all of you.  Even with the advent of io_uring, I think the approach
of handling a bunch of IO by marking all of the fds non-blocking and
using epoll() in edge-triggered mode is still valuable.

But, there is an impedance mismatch between splice() / tee() and using
epoll() this way.  (In fact, this applies to all requests that take
both an input and output fd.)  That is, the request works on two
fds but returns only one status.  In the IO loop, we want to do
IO until we receive an EAGAIN and mark the fd as blocked.  We then
unblock it when epoll() says we can do IO again.  This doesn't work
well when we don't know which fd the EAGAIN was for.  So, we have
to issue a separate poll() request on the involved fds to find out.

Logically, we'd like to get the status of both fds back from the
initial request, but that's not practical because once an error is
detected on one, the other is not further examined.

So, the idea is to introduce a new flag which could be passed to
any request that takes both an input and output fd.

If the flag is clear, errors are returned exactly as they are now.
If the flag is set and the error occurred on the output fd,
add 1 << 30 to the error number.

As it would be very rare for errors to occur on both fds at once,
this would be practically as good as getting the status of both fds
back simultaneously.
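
To illustrate the proposal from the consuming side, a sketch of how an event
loop might decode such a result; the flag value and the mark_fd_blocked()
helper are hypothetical:

#include <errno.h>

#define ERR_ON_OUT_FD   (1 << 30)

/* cqe_res is the CQE result: >= 0 bytes on success, -errno on failure,
 * with ERR_ON_OUT_FD added to the errno when the output fd caused it. */
static void handle_two_fd_result(int cqe_res, int fd_in, int fd_out)
{
        if (cqe_res >= 0)
                return;                 /* cqe_res bytes were duplicated */

        int err = -cqe_res;
        int bad_fd = fd_in;

        if (err & ERR_ON_OUT_FD) {
                err &= ~ERR_ON_OUT_FD;
                bad_fd = fd_out;
        }
        if (err == EAGAIN)
                mark_fd_blocked(bad_fd);        /* hypothetical event-loop helper */
}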


* Re: Feature request: Please implement IORING_OP_TEE
  2020-04-27 20:02     ` Jens Axboe
@ 2020-04-29 15:57       ` Pavel Begunkov
  0 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2020-04-29 15:57 UTC (permalink / raw)
  To: Jens Axboe, Jann Horn; +Cc: Clay Harris, io-uring

On 27/04/2020 23:02, Jens Axboe wrote:
> On 4/27/20 12:22 PM, Jann Horn wrote:
>> On Mon, Apr 27, 2020 at 5:56 PM Jens Axboe <axboe@kernel.dk> wrote:
>>> On 4/27/20 9:40 AM, Clay Harris wrote:
>>>> I was excited to see IORING_OP_SPLICE go in, but disappointed that tee
>>>> didn't go in at the same time.  It would be very useful to copy pipe
>>>> buffers in an async program.
>>>
>>> Pavel, care to wire up tee? From a quick look, looks like just exposing
>>> do_tee() and calling that, so should be trivial.
>>
>> Just out of curiosity:
>>
>> What's the purpose of doing that via io_uring? Non-blocking sys_tee()
>> just shoves around some metadata, it doesn't do any I/O, right? Is
>> this purely for syscall-batching reasons? (And does that mean that you
>> would also add syscalls like epoll_wait() and futex() to io_uring?) Or
>> is this because you're worried about blocking on the pipe mutex?
> 
> Right, it doesn't do any IO. It does potentially block on the inode
> mutex, but that's about it. I think the reasons are mainly:

Good catch, the waiting can probably happen with splice as well.
I need to read it through, but it looks strange that it just ignores
O_NONBLOCK; is there some upper bound on how long the lock is held, or something?

> 
> - Keep the interfaces the same, instead of using both sync and async
>   calls.
> - Bundling/batch reasons, either in same submission, or chained.
> 
> Some folks have talked about futex, and epoll_wait would also be a
> natural extension as well, since we already have the ctl part.

-- 
Pavel Begunkov


Thread overview: 8+ messages
2020-04-27 15:40 Feature request: Please implement IORING_OP_TEE Clay Harris
2020-04-27 15:55 ` Jens Axboe
2020-04-27 18:03   ` Pavel Begunkov
2020-04-27 18:11     ` Jens Axboe
2020-04-27 18:22   ` Jann Horn
2020-04-27 20:02     ` Jens Axboe
2020-04-29 15:57       ` Pavel Begunkov
2020-04-27 20:17     ` Clay Harris
