All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: io-uring <io-uring@vger.kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	lsf-pc@lists.linux-foundation.org
Subject: Re: [LSF/MM/BPF TOPIC] io_uring: BPF controlled I/O
Date: Sat, 5 Jun 2021 10:16:34 +0100	[thread overview]
Message-ID: <77b2c502-f8a3-2ec3-0373-6a34f991ab19@gmail.com> (raw)
In-Reply-To: <23168ac0-0f05-3cd7-90dc-08855dd275b2@gmail.com>

I botched subject tags, should be [LSF/MM/BPF TOPIC].

On 6/5/21 10:08 AM, Pavel Begunkov wrote:
> One of the core ideas behind io_uring is passing requests via memory
> shared b/w the userspace and the kernel, a.k.a. queues or rings. That
> serves a purpose of reducing number of context switches or bypassing
> them, but the userspace is responsible for controlling the flow,
> reaping and processing completions (a.k.a. Completion Queue Entry, CQE),
> and submitting new requests, adding extra context switches even if there
> is not much work to do. A simple illustration is read(open()), where
> io_uring is unable to propagate the returned fd to the read, with more
> cases piling up.
> 
> The big picture idea stays the same since last year, to give out some
> of this control to BPF, allow it to check results of completed requests,
> manipulate memory if needed and submit new requests. Apart from being
> just a glue between two requests, it might even offer more flexibility
> like keeping a QD, doing reduce/broadcast and so on.
> 
> The prototype [1,2] is in a good shape but some work need to be done.
> However, the main concern is getting an understanding what features and
> functionality have to be added to be flexible enough. Various toy
> examples can be found at [3] ([1] includes an overview of cases).
> 
> Discussion points:
> - Use cases, feature requests, benchmarking
> - Userspace programming model, code reuse (e.g. liburing)
> - BPF-BPF and userspace-BPF synchronisation. There is
>   CQE based notification approach and plans (see design
>   notes), however need to discuss what else might be
>   needed.
> - Do we need more contexts passed apart from user_data?
>   e.g. specifying a BPF map/array/etc fd io_uring requests?
> - Userspace atomics and efficiency of userspace reads/writes. If
>   proved to be not performant enough there are potential ways to take
>   on it, e.g. inlining, having it in BPF ISA, and pre-verifying
>   userspace pointers.
> 
> [1] https://lore.kernel.org/io-uring/a83f147b-ea9d-e693-a2e9-c6ce16659749@gmail.com/T/#m31d0a2ac6e2213f912a200f5e8d88bd74f81406b
> [2] https://github.com/isilence/linux/tree/ebpf_v2
> [3] https://github.com/isilence/liburing/tree/ebpf_v2/examples/bpf
> 
> 
> -----------------------------------------------------------------------
> Design notes:
> 
> Instead of basing it on hooks it adds support of a new type of io_uring
> requests as it gives a better control and let's to reuse internal
> infrastructure. These requests run a new type of io_uring BPF programs
> wired with a bunch of new helpers for submitting requests and dealing
> with CQEs, are allowed to read/write userspace memory in virtue of a
> recently added sleepable BPF feature. and also provided with a token
> (generic io_uring token, aka user_data, specified at submission and
> returned in an CQE), which may be used to pass a userspace pointer used
> as a context.
> 
> Besides running BPF programs, they are able to request waiting.
> Currently it supports CQ waiting for a number of completions, but others
> might be added and/or needed, e.g. futex and/or requeueing the current
> BPF request onto an io_uring request/link being submitted. That hides
> the overhead of creating BPF requests by keeping them alive and
> invoking multiple times.
> 
> Another big chunk solved is figuring out a good way of feeding CQEs
> (potentially many) to a BPF program. The current approach
> is to enable multiple completion queues (CQ), and specify for each
> request to which one steer its CQE, so all the synchronisation
> is in control of the userspace. For instance, there may be a separate
> CQ per each in-flight BPF request, and they can work with their own
> queues and send an CQE to the main CQ so notifying the userspace.
> It also opens up a notification-like sync through CQE posting to
> neighbours' CQs.
> 
> 

-- 
Pavel Begunkov

  reply	other threads:[~2021-06-05  9:18 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-05  9:08 io_uring: BPF controlled I/O Pavel Begunkov
2021-06-05  9:16 ` Pavel Begunkov [this message]
2021-06-07 18:51 ` Victor Stewart
2021-06-10  9:09   ` Pavel Begunkov
2021-06-14  7:54   ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=77b2c502-f8a3-2ec3-0373-6a34f991ab19@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=bpf@vger.kernel.org \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lsf-pc@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.