bpf.vger.kernel.org archive mirror
* [LSF/MM/BPF TOPIC] programmable IO control flow with io_uring and BPF
@ 2020-01-24 14:18 Pavel Begunkov
  2020-01-31 21:30 ` Jens Axboe
  0 siblings, 1 reply; 2+ messages in thread
From: Pavel Begunkov @ 2020-01-24 14:18 UTC
  To: lsf-pc; +Cc: Jens Axboe, io-uring, linux-fsdevel, bpf


Apart from concurrent IO execution, io_uring allows issuing a sequence
of operations, a.k.a. a link, where requests are executed sequentially,
one after another. If an "error" happens, the rest of the link is
cancelled.

The problem is what to consider an "error". For example, if we
read fewer bytes than were asked for, the link will be cancelled.
It's necessary to play it safe here, but this implies a lot of overhead
if that isn't the desired behaviour. The user would need to reap all
cancelled requests, analyse the state, resubmit them and suffer from
context switches and all the in-kernel preparation work. And there are
dozens of possibly desirable patterns, so it's just not viable to
hard-code them into the kernel.
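
To make the cancellation rule above concrete, here is a minimal
userspace model of it in plain C. It is only a sketch: struct model_req
and model_run_link() are hypothetical names invented for illustration,
not kernel or liburing API, and the "device" outcome is scripted by the
caller rather than produced by real IO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one linked request; not a kernel or liburing type. */
struct model_req {
    int expected; /* bytes asked for */
    int result;   /* in: scripted outcome (bytes or -errno); out: final result */
};

/*
 * Run a link in order. A request breaks the link if it returns a negative
 * errno or fewer bytes than were asked for. Every request after the break
 * is not executed and gets -ECANCELED as its result.
 * Returns the number of requests that were actually executed.
 */
static size_t model_run_link(struct model_req *reqs, size_t n)
{
    size_t executed = 0;
    int broken = 0;

    assert(reqs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (broken) {
            reqs[i].result = -ECANCELED;
            continue;
        }
        executed++;
        if (reqs[i].result < 0 || reqs[i].result < reqs[i].expected)
            broken = 1;
    }
    return executed;
}
```

In this model, a short read in the middle of a three-request link leaves
the third request with -ECANCELED, and it is then up to userspace to
inspect the results and resubmit.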

The other problem is keeping a link running even when a request depends
on the result of the previous one. It could be as simple as passing a
return code, or something fancier, like reading from userspace.

And that's where BPF will be extremely useful. It will control the flow
and do steering.

The concept is to be able to run a BPF program after a request's
completion, taking the request's state, and doing some of the following:
1. drop a link/request
2. issue new requests
3. link/unlink requests
4. do fast calculations / accumulate data
5. emit information to the userspace (e.g. via ring's CQ)
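
The list above could be modelled, very roughly, as a callback that is
consulted after each completion and returns a verdict. The sketch below
is plain userspace C, not BPF and not kernel code. Every name in it
(model_verdict, model_req, model_prog_t, model_retry_short,
model_run_link_with_prog, max_steps) is an assumption made up for
illustration, and each execution simply returns a scripted "chunk".
It covers only dropping a link and reissuing a request, plus a step
budget as one naive way to avoid live-locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* All names here are hypothetical; none is a kernel, liburing or BPF API. */
enum model_verdict {
    MODEL_CONTINUE,  /* go on to the next request in the link */
    MODEL_DROP_LINK, /* cancel the rest of the link */
    MODEL_REISSUE,   /* issue the same request again for the remaining bytes */
};

struct model_req {
    int expected; /* bytes asked for */
    int chunk;    /* scripted outcome of each execution: bytes or -errno */
    int done;     /* out: bytes completed so far, or -errno */
};

/* Stand-in for the program run after a completion. */
typedef enum model_verdict (*model_prog_t)(const struct model_req *req, int res);

/* Example policy: tolerate short transfers, give up on errors or zero bytes. */
static enum model_verdict model_retry_short(const struct model_req *req, int res)
{
    if (res <= 0)
        return MODEL_DROP_LINK;
    return req->done < req->expected ? MODEL_REISSUE : MODEL_CONTINUE;
}

/*
 * Run the link, asking prog what to do after every completion.
 * max_steps bounds the total number of executions. Returns that number.
 */
static size_t model_run_link_with_prog(struct model_req *reqs, size_t n,
                                       model_prog_t prog, size_t max_steps)
{
    size_t executions = 0;
    size_t i = 0;

    assert(prog != NULL);

    while (i < n) {
        int res = reqs[i].chunk;
        enum model_verdict v;

        executions++;
        if (res > 0) {
            int left = reqs[i].expected - reqs[i].done;
            reqs[i].done += res < left ? res : left;
        }

        v = prog(&reqs[i], res);
        if (v == MODEL_REISSUE && executions >= max_steps)
            v = MODEL_DROP_LINK; /* budget exhausted: stop instead of spinning */
        if (v == MODEL_REISSUE)
            continue;
        if (v == MODEL_DROP_LINK) {
            if (res < 0)
                reqs[i].done = res;
            for (size_t j = i + 1; j < n; j++)
                reqs[j].done = -ECANCELED;
            break;
        }
        i++;
    }
    return executions;
}
```

With model_retry_short as the policy, a 4096-byte request that completes
1024 bytes at a time is reissued until it is full and the link then
proceeds, all without a round trip through userspace in this model.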

With that, it will be possible to have almost context-switch-less IO,
and that's really tempting considering how fast current devices are.

What to discuss:
1. use cases
2. control flow for non-privileged users (e.g. allowing some popular
   pre-registered patterns)
3. what input the program needs (e.g. last request's
   io_uring_cqe) and how to pass it.
4. whether we need notification via CQ for each cancelled/requested
   request, because sometimes they only add noise
5. BPF access to user data (e.g. allow reading only registered buffers)
6. implementation details. E.g.
   - how to ask to run BPF (e.g. with a new opcode)
   - having BPF programs global, bound to an io_uring instance, or mixed
   - program state and how to register
   - rework notion of draining and sequencing
   - live-lock avoidance (e.g. double check io_uring shut-down code)


-- 
Pavel Begunkov




* Re: [LSF/MM/BPF TOPIC] programmable IO control flow with io_uring and BPF
  2020-01-24 14:18 [LSF/MM/BPF TOPIC] programmable IO control flow with io_uring and BPF Pavel Begunkov
@ 2020-01-31 21:30 ` Jens Axboe
  0 siblings, 0 replies; 2+ messages in thread
From: Jens Axboe @ 2020-01-31 21:30 UTC
  To: Pavel Begunkov, lsf-pc; +Cc: io-uring, linux-fsdevel, bpf


I think this is a key topic that we should absolutely discuss at LSFMM.

-- 
Jens Axboe

