Re: [RFC] Programming model for io_uring + eBPF

From: Christian Dietrich <stettberger@dokucode.de>
To: Pavel Begunkov <asml.silence@gmail.com>,
	io-uring <io-uring@vger.kernel.org>
Cc: Horst Schirmeier <horst.schirmeier@tu-dortmund.de>,
	"Franz-B. Tuneke" <franz-bernhard.tuneke@tu-dortmund.de>
Subject: Re: [RFC] Programming model for io_uring + eBPF
Date: Wed, 12 May 2021 13:20:27 +0200
Message-ID: <s7beeec8ah0.fsf@dokucode.de>
In-Reply-To: <c45d633e-1278-1dcb-0d59-f0886abc3e60@gmail.com>

Pavel Begunkov <asml.silence@gmail.com> [07. May 2021]:

>> The following SQE would become: Append this SQE to the SQE-link chain
>> with the name '1'. If the link chain has completed, start a new one.
>> Thereby, the user could add an SQE to an existing link chain, even other
>> SQEs are already submitted.
>> 
>>>     sqe->flags |= IOSQE_SYNCHRONIZE;
>>>     sqe->synchronize_group = 1;     // could probably be restricted to uint8_t.
>> 
>> Implementation wise, we would hold a pointer to the last element of the
>> implicitly generated link chain.
>
> It will be in the common path hurting performance for those not using
> it, and with no clear benefit that can't be implemented in userspace.
> And io_uring is thin enough for all those extra ifs to affect end
> performance.
>
> Let's consider if we run out of userspace options.

To summarize my proposal: I want io_uring to support implicit
synchronization by sequentialization at submit time. Doing this would
avoid the overheads of locking (and potentially sleeping).

The problem that I see with a userspace solution is the following:
If I want to sequentialize an SQE with another SQE that was submitted
way earlier, the usual IOSQE_IO_LINK cannot be used, as I cannot set
the link flag of that already submitted SQE. Therefore, I would have to
wait in userspace for the CQE and submit my second SQE later on.

Especially if the goal is to remain in kernel space as long as possible
via eBPF-SQEs, this is not optimal.
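
To make the userspace variant concrete, here is a minimal sketch of the
bookkeeping it would need. Everything in it is hypothetical and only
models the idea: kernel_submit() is a placeholder for "fill an SQE and
submit it", on_cqe() stands for "userspace has reaped the CQE", and
none of the names are liburing API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUPS  8
#define MAX_PENDING 16

/* Hypothetical userspace bookkeeping: at most one request per group is
 * handed to the kernel; later ones wait here until the CQE was reaped. */
struct group_state {
    int in_flight;              /* 1 while a request of this group is submitted */
    int pending[MAX_PENDING];   /* deferred request ids, FIFO ring */
    size_t head, count;
};

static struct group_state groups[MAX_GROUPS];
static int submitted_log[64];   /* stand-in for "handed to the kernel" */
static size_t submitted_count;

/* Placeholder for filling an SQE and submitting it (assumption). */
static void kernel_submit(int req_id)
{
    assert(submitted_count < 64);
    submitted_log[submitted_count++] = req_id;
}

/* Returns 1 if submitted right away, 0 if deferred, -1 if the queue is full. */
static int submit_in_group(unsigned group, int req_id)
{
    assert(group < MAX_GROUPS);
    struct group_state *g = &groups[group];

    if (g->in_flight) {
        if (g->count == MAX_PENDING)
            return -1;
        g->pending[(g->head + g->count) % MAX_PENDING] = req_id;
        g->count++;
        return 0;               /* must wait for a round trip to userspace */
    }
    g->in_flight = 1;
    kernel_submit(req_id);
    return 1;
}

/* Called once userspace has seen the CQE of the group's in-flight request. */
static void on_cqe(unsigned group)
{
    assert(group < MAX_GROUPS);
    struct group_state *g = &groups[group];

    g->in_flight = 0;
    if (g->count > 0) {
        int next = g->pending[g->head];
        g->head = (g->head + 1) % MAX_PENDING;
        g->count--;
        g->in_flight = 1;
        kernel_submit(next);    /* second SQE goes out only now */
    }
}
```

The point of the sketch is the path through on_cqe(): every deferred
request costs one extra return to userspace before it can be submitted.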

> Such things go really horribly with performant APIs as io_uring, even
> if not used. Just see IOSQE_IO_DRAIN, it maybe almost never used but
> still in the hot path.

If we extend the semantics of IOSQE_IO_LINK instead of introducing a new
flag, we should be able to limit the problem, right?

- With synchronize_group=0, the usual link-the-next-SQE semantics could
  remain.
- With synchronize_group!=0, the described synchronization semantics
  would apply.

Thereby, the overhead is at least hidden behind the existing check for
IOSQE_IO_LINK, which is there anyway. Do you consider IOSQE_IO_LINK=1
part of the hot path?
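
As a rough illustration of what I mean, here is a self-contained model
of the submit-time decision. It is not kernel code: MODEL_IO_LINK is a
stand-in for the real flag, synchronize_group is the field from my
proposal (it does not exist in struct io_uring_sqe), and struct
model_req, struct model_ctx and model_submit() are made up for this
sketch. It also ignores the case where an SQE is both linked from its
predecessor and part of a named group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_IO_LINK 0x1u      /* stand-in for the link flag (assumption) */

struct model_req {
    uint8_t flags;
    uint8_t synchronize_group;  /* proposed field, hypothetical */
    int completed;              /* set by the test once the request is done */
    struct model_req *link;     /* request to start after this one */
};

struct model_ctx {
    struct model_req *open_link;        /* previous SQE asked to link the next */
    struct model_req *group_tail[256];  /* last element of each named chain */
};

/* Returns 1 if req may start immediately, 0 if it was chained behind
 * another request. */
static int model_submit(struct model_ctx *ctx, struct model_req *req)
{
    int start_now = 1;

    /* Classic behaviour: the previous SQE had the link flag, group 0. */
    if (ctx->open_link) {
        ctx->open_link->link = req;
        ctx->open_link = NULL;
        start_now = 0;
    }

    /* Requests without the link flag never look at the group field. */
    if (req->flags & MODEL_IO_LINK) {
        uint8_t grp = req->synchronize_group;

        if (grp == 0) {
            ctx->open_link = req;       /* usual link-the-next-SQE */
        } else {
            struct model_req *tail = ctx->group_tail[grp];

            if (tail && !tail->completed) {
                tail->link = req;       /* append to the running chain */
                start_now = 0;
            }
            /* Chain finished or never existed: req starts a new one. */
            ctx->group_tail[grp] = req;
        }
    }
    return start_now;
}
```

In this model, the extra branch on synchronize_group is only reached
when the link flag is set, which is the property I am asking about.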

chris
-- 
Prof. Dr.-Ing. Christian Dietrich
Operating System Group (E-EXK4)
Technische Universität Hamburg
Am Schwarzenberg-Campus 3 (E), 4.092
21073 Hamburg

eMail:  christian.dietrich@tuhh.de
Tel:    +49 40 42878 2188
WWW:    https://osg.tuhh.de/