[LSF/MM/BPF TOPIC] programmable IO control flow with io_uring and BPF

* [LSF/MM/BPF TOPIC] programmable IO control flow with io_uring and BPF
@ 2020-01-24 14:18 Pavel Begunkov
  2020-01-31 21:30 ` Jens Axboe
  0 siblings, 1 reply; 2+ messages in thread
From: Pavel Begunkov @ 2020-01-24 14:18 UTC (permalink / raw)
  To: lsf-pc; +Cc: Jens Axboe, io-uring, linux-fsdevel, bpf

Apart from concurrent IO execution, io_uring allows issuing a sequence
of operations, a.k.a. a link, where requests are executed sequentially,
one after another. If an "error" happens, the rest of the link is
cancelled.

The problem is what to consider an "error". For example, if we
read fewer bytes than were asked for, the link will be cancelled.
It's necessary to play safe here, but this implies a lot of overhead if
that isn't the desired behaviour. The user would need to reap all
cancelled requests, analyse the state, resubmit them and suffer from
context switches and all the in-kernel preparation work. And there are
dozens of possibly desirable patterns, so it's just not viable to
hard-code them into the kernel.
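
To make that cost concrete, here is a toy userspace model of the
behaviour described above. It is only a sketch: it does not touch the
real io_uring API, struct toy_req and toy_run_link() are hypothetical
names, and the one rule it models is that a short read breaks the link
and every later request completes with -ECANCELED.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one linked read; not a kernel or liburing type. */
struct toy_req {
	int want;	/* bytes asked for */
	int got;	/* bytes the pretend device returns */
	int res;	/* completion result, filled in by toy_run_link() */
};

/*
 * Run the requests in order. A short read (got < want) is reported with
 * its real byte count, but it breaks the link, so every later request
 * completes with -ECANCELED.
 *
 * Returns how many requests userspace would have to reap, analyse and
 * resubmit: the short one plus everything cancelled after it.
 */
int toy_run_link(struct toy_req *reqs, size_t nr)
{
	int broken = 0;
	int to_resubmit = 0;

	assert(reqs != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (broken) {
			reqs[i].res = -ECANCELED;
			to_resubmit++;
			continue;
		}
		reqs[i].res = reqs[i].got;
		if (reqs[i].got < reqs[i].want) {
			broken = 1;
			to_resubmit++;	/* the remainder of the short read */
		}
	}
	return to_resubmit;
}
```

For a link of three 4096-byte reads where the second one returns only
100 bytes, this model leaves res as 4096, 100 and -ECANCELED, and
returns 2: two requests that userspace has to rebuild and submit again.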

The other problem is keeping a link running even when a request depends
on the result of the previous one. It could be as simple as passing a
return code, or something fancier, like reading from userspace.

And that's where BPF will be extremely useful. It will control the flow
and do steering.

The concept is to be able to run a BPF program after a request
completes. The program takes the request's state and does some of the
following:
1. drop a link/request
2. issue new requests
3. link/unlink requests
4. do fast calculations / accumulate data
5. emit information to userspace (e.g. via the ring's CQ)
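
The shape of such a hook can be sketched in plain C. Everything here is
an assumption made for illustration: no such io_uring/BPF interface
exists, a C function pointer stands in for the BPF program, and the
flow_* names, the action set and the "chunk" device model are all
hypothetical. The sketch covers only actions 1 and 2 from the list
above, with a policy that reissues short reads instead of letting them
cancel the link.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* All names below are hypothetical; this is not a real kernel interface. */
enum flow_action {
	FLOW_CONTINUE,	/* move on to the next request in the link */
	FLOW_DROP_LINK,	/* cancel the rest of the link */
	FLOW_REISSUE,	/* issue a new request for the missing bytes */
};

struct flow_req {
	int want;	/* bytes asked for */
	int chunk;	/* most bytes the pretend device returns per attempt */
	int done;	/* bytes transferred so far */
	int res;	/* final result: bytes transferred or -ECANCELED */
};

/* Stand-in for the BPF program, called after every completion. */
typedef enum flow_action (*flow_prog_fn)(const struct flow_req *req,
					 int last_res);

/* Example policy: reissue short reads, drop the link on error or EOF. */
enum flow_action flow_retry_short(const struct flow_req *req, int last_res)
{
	if (last_res <= 0)
		return FLOW_DROP_LINK;
	return req->done < req->want ? FLOW_REISSUE : FLOW_CONTINUE;
}

/*
 * Run the link, asking prog what to do after each completion.
 * Returns the number of requests issued, all without leaving this loop,
 * which is the part that would stay inside the kernel.
 */
int flow_run_link(struct flow_req *reqs, size_t nr, flow_prog_fn prog)
{
	int issued = 0;
	int dropped = 0;

	assert(prog != NULL);

	for (size_t i = 0; i < nr; i++) {
		struct flow_req *req = &reqs[i];
		enum flow_action act = FLOW_REISSUE;

		if (dropped) {
			req->res = -ECANCELED;
			continue;
		}
		while (act == FLOW_REISSUE) {
			int left = req->want - req->done;
			int res = left < req->chunk ? left : req->chunk;

			issued++;
			if (res > 0)
				req->done += res;
			act = prog(req, res);
		}
		req->res = req->done;
		dropped = (act == FLOW_DROP_LINK);
	}
	return issued;
}
```

With three 4096-byte reads where the second device only returns 1024
bytes per attempt, flow_run_link() with flow_retry_short issues 6
requests and all three finish with res == 4096, with no trip back to
userspace in between. Note that a policy which always returned
FLOW_REISSUE would never terminate in this model, so a real design
would need some bound on what a program may do.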

With that, it will be possible to have almost context-switch-less IO,
and that's really tempting considering how fast current devices are.

What to discuss:
1. use cases
2. control flow for non-privileged users (e.g. allowing some popular
   pre-registered patterns)
3. what input the program needs (e.g. the last request's
   io_uring_cqe) and how to pass it
4. whether we need notification via CQ for each cancelled/requested
   request, because sometimes they only add noise
5. BPF access to user data (e.g. allowing reads only from registered
   buffers)
6. implementation details. E.g.
   - how to ask to run BPF (e.g. with a new opcode)
   - whether BPF programs are global, bound to an io_uring instance,
     or mixed
   - program state and how to register it
   - reworking the notion of draining and sequencing
   - live-lock avoidance (e.g. double-checking the io_uring shutdown
     code)

-- 
Pavel Begunkov
