io-uring.vger.kernel.org archive mirror
From: Stefan Metzmacher <metze@samba.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring <io-uring@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Samba Technical <samba-technical@lists.samba.org>
Subject: Re: Problems replacing epoll with io_uring in tevent
Date: Wed, 26 Oct 2022 18:00:12 +0200
Message-ID: <949fdb8e-bd12-03dc-05c6-c972f26ec0ec@samba.org>
In-Reply-To: <c01f72ac-b2f1-0b1c-6757-26769ee071e2@samba.org>

Hi Jens,

> 9. The above works mostly, but manual testing and our massive automated regression tests
>     found the following problems:
> 
>     a) Related to https://github.com/axboe/liburing/issues/684 I was also wondering
>        about the return value of io_uring_submit_and_wait_timeout(),
>        but in addition I noticed that the timeout parameter doesn't work
>        as expected: the function waits for twice the timeout value.
>        I hacked a fix here:
>        https://git.samba.org/?p=metze/samba/wip.git;a=commitdiff;h=06fec644dd9f5748952c8b875878e0e1b0000d33

Thanks for doing an upstream fix for the problem.

>     b) The major show stopper is that IORING_OP_POLL_ADD calls fget(), while
>        it's pending. Which means that a close() on the related file descriptor
>        is not able to remove the last reference! This is a problem for points 3.d,
>        4.a and 4.b from above.
> 
>        I doubt IORING_ASYNC_CANCEL_FD could be used, as there isn't always
>        code triggered around a raw close() syscall that could do a sync cancel.
> 
>        For now I plan to use epoll_ctl() (or IORING_OP_EPOLL_CTL) and only
>        register the fd from epoll_create() with IORING_OP_POLL_ADD,
>        or to keep epoll_wait() as a blocking call and register the io_uring fd
>        with epoll.
> 
>        I looked at the related epoll code and found that it uses
>        a list in struct file->f_ep to keep the reference, which gets
>        detached also via eventpoll_release_file() called from __fput()
> 
>        Would it be possible to move IORING_OP_POLL_ADD to a similar model
>        so that close() will cause a cqe with -ECANCELED?

I'm currently prototyping an IORING_POLL_CANCEL_ON_CLOSE
flag that can be passed to POLL_ADD. With that we'll register
the request in &req->file->f_uring_poll (similar to the file->f_ep list for epoll).
Then we only take a real reference to the file during the call to
vfs_poll(); otherwise we drop the fget/fput reference and rely on
an io_uring_poll_release_file() (similar to eventpoll_release_file())
to cancel our registered poll request.
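
For illustration, here is a minimal userspace sketch of the epoll behaviour
that the proposed flag is meant to mirror: epoll holds no file reference of its
own, so close() of the last descriptor silently drops the registration. The
helper name epoll_events_after_close() is hypothetical, and the sketch does not
use the prototype IORING_POLL_CANCEL_ON_CLOSE flag at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Register a pipe's read end with epoll, make it readable, close() it,
 * and return how many events epoll still reports for it afterwards.
 * Returns -1 if any setup step fails. (Hypothetical helper name.)
 */
int epoll_events_after_close(void)
{
    int pfd[2];
    int result = -1;
    int epfd = epoll_create1(0);

    if (epfd < 0)
        return -1;
    if (pipe(pfd) < 0) {
        close(epfd);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pfd[0] };
    struct epoll_event out;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
        write(pfd[1], "x", 1) == 1 &&
        epoll_wait(epfd, &out, 1, 0) == 1) {
        /*
         * The byte is still unread, so the read end stays readable.
         * No dup() of pfd[0] exists, so this close() drops the last
         * reference to the open file description and the kernel
         * detaches it from the epoll instance.
         */
        close(pfd[0]);
        pfd[0] = -1;
        result = epoll_wait(epfd, &out, 1, 0);
    }

    if (pfd[0] >= 0)
        close(pfd[0]);
    close(pfd[1]);
    close(epfd);
    return result;
}
```

With epoll the function is expected to report no events after the close(),
because the registration went away together with the file. A pending
IORING_OP_POLL_ADD, as described above, instead keeps the file alive.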

>     c) A simple pipe based performance test shows the following numbers:
>        - 'poll':               Got 232387.31 pipe events/sec
>        - 'epoll':              Got 251125.25 pipe events/sec
>        - 'samba_io_uring_ev':  Got 210998.77 pipe events/sec
>        So the io_uring backend is even slower than the 'poll' backend.
>        I guess the reason is the constant re-submission of IORING_OP_POLL_ADD.

Added some feature autodetection today and I'm now using
IORING_SETUP_COOP_TASKRUN, IORING_SETUP_TASKRUN_FLAG,
IORING_SETUP_SINGLE_ISSUER and IORING_SETUP_DEFER_TASKRUN if supported
by the kernel.
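
A rough sketch of what such autodetection could look like follows. It is not
the Samba code: the names ring_setup_best(), ring_setup_syscall() and
ring_setup_fake() are made up for illustration, and the candidate order is an
assumption. It relies on io_uring_setup(2) failing with EINVAL for setup flags
the running kernel does not know.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/io_uring.h>

/* Fallbacks for older UAPI headers; values as in linux/io_uring.h. */
#ifndef IORING_SETUP_COOP_TASKRUN
#define IORING_SETUP_COOP_TASKRUN (1U << 8)
#endif
#ifndef IORING_SETUP_TASKRUN_FLAG
#define IORING_SETUP_TASKRUN_FLAG (1U << 9)
#endif
#ifndef IORING_SETUP_SINGLE_ISSUER
#define IORING_SETUP_SINGLE_ISSUER (1U << 12)
#endif
#ifndef IORING_SETUP_DEFER_TASKRUN
#define IORING_SETUP_DEFER_TASKRUN (1U << 13)
#endif

/* Returns a ring fd (>= 0) or a negative errno value. */
typedef int (*ring_setup_fn)(unsigned entries, struct io_uring_params *p);

/* Real probe: thin wrapper around the io_uring_setup(2) syscall. */
int ring_setup_syscall(unsigned entries, struct io_uring_params *p)
{
    long fd = syscall(__NR_io_uring_setup, entries, p);
    return fd < 0 ? -errno : (int)fd;
}

/* Test double: pretends to be a kernel that knows only some flags. */
unsigned fake_known_flags;
int ring_setup_fake(unsigned entries, struct io_uring_params *p)
{
    (void)entries;
    return (p->flags & ~fake_known_flags) ? -EINVAL : 100;
}

/*
 * Try flag sets from most to least featureful and keep the first one the
 * kernel accepts. On success *p holds the parameters of the created ring.
 */
int ring_setup_best(unsigned entries, struct io_uring_params *p,
                    ring_setup_fn setup)
{
    static const unsigned candidates[] = {
        IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN |
            IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG,
        IORING_SETUP_SINGLE_ISSUER |
            IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG,
        IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG,
        0,
    };
    int ret = -EINVAL;

    assert(p != NULL && setup != NULL);
    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        memset(p, 0, sizeof(*p));
        p->flags = candidates[i];
        ret = setup(entries, p);
        if (ret != -EINVAL)
            return ret; /* success, or an error unrelated to the flags */
    }
    return ret;
}
```

A real backend would call ring_setup_best(entries, &params, ring_setup_syscall)
and then mmap the rings using the offsets returned in params, or do the
equivalent probing through liburing.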

On a 6.1 kernel this improved the performance a lot; it's now faster
than the epoll backend.

The key flag is IORING_SETUP_DEFER_TASKRUN. On a different system than above
I'm getting the following numbers:
- epoll:                                   Got 114450.16 pipe events/sec
- poll:                                    Got 105872.52 pipe events/sec
- samba_io_uring_ev-without-defer_taskrun: Got  95564.22 pipe events/sec
- samba_io_uring_ev-with-defer_taskrun:    Got 122853.85 pipe events/sec
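
To make the "pipe events/sec" metric concrete, here is a minimal sketch of a
pipe ping loop on top of plain epoll. It is not the tevent benchmark that
produced the numbers above; the function name pipe_events_epoll() and the
one-byte write/wait/read round are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/*
 * Write one byte into a pipe, wait for the read end via epoll_wait(),
 * read the byte back, and repeat `rounds` times. Returns the number of
 * events handled (or -1 on setup failure) and stores the measured rate
 * in *events_per_sec if that pointer is non-NULL.
 */
long pipe_events_epoll(long rounds, double *events_per_sec)
{
    int pfd[2];
    long handled = -1;
    int epfd = epoll_create1(0);

    if (epfd < 0)
        return -1;
    if (pipe(pfd) < 0) {
        close(epfd);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pfd[0] };

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0) {
        struct timespec t0, t1;
        char c = 'x';

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (handled = 0; handled < rounds; handled++) {
            struct epoll_event out;

            if (write(pfd[1], &c, 1) != 1)
                break;
            if (epoll_wait(epfd, &out, 1, -1) != 1)
                break;
            if (read(pfd[0], &c, 1) != 1)
                break;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (double)(t1.tv_sec - t0.tv_sec) +
                      (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
        if (events_per_sec)
            *events_per_sec = secs > 0.0 ? (double)handled / secs : 0.0;
    }

    close(pfd[0]);
    close(pfd[1]);
    close(epfd);
    return handled;
}
```

An io_uring variant of the same loop would replace the epoll_wait() call with
submitting IORING_OP_POLL_ADD and waiting for its completion each round, which
is the constant re-submission cost mentioned in the quoted text.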

>        My hope would be that IORING_POLL_ADD_MULTI + IORING_POLL_ADD_LEVEL
>        would be able to avoid the performance problem with samba_io_uring_ev
>        compared to epoll.

I've started on an IORING_POLL_ADD_MULTI + IORING_POLL_ADD_LEVEL prototype,
but it hasn't gotten very far yet, and given the IORING_SETUP_DEFER_TASKRUN
speedup, I'll postpone working on IORING_POLL_ADD_LEVEL.

metze

