IO-Uring Archive on
From: Andres Freund <>
To: Jens Axboe <>
Subject: Re: Buffered IO async context overhead
Date: Mon, 24 Feb 2020 01:35:44 -0800
Message-ID: <> (raw)
In-Reply-To: <>


On 2020-02-14 13:49:31 -0700, Jens Axboe wrote:
> [description of buffered write workloads being slower via io_uring
> than plain writes]
> Because I'm working on other items, I didn't read carefully enough. Yes
> this won't change the situation for writes. I'll take a look at this when
> I get time, maybe there's something we can do to improve the situation.

I looked a bit into this.

I think one issue is the spinning the workers do:

static int io_wqe_worker(void *data)
{
	...
	while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
		...
		if (did_work)
			io_worker_spin_for_work(wqe);
		...
		if (io_wqe_run_queue(wqe)) {
			...

static inline void io_worker_spin_for_work(struct io_wqe *wqe)
{
	int i = 0;

	while (++i < 1000) {
		if (io_wqe_run_queue(wqe))
			break;
		if (need_resched())
			break;
		cpu_relax();
	}
}

Even with the cpu_relax(), that causes quite a lot of cross-socket
traffic, slowing down the submission side, which after all frequently
needs to take the wqe->lock just to be able to submit a queue entry.

lock, work_list, flags all reside in one cacheline, so it's pretty
likely that a single io_wqe_enqueue would get the cacheline "stolen"
several times during one enqueue - without allowing any progress in the
worker, of course.
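The layout point can be checked with a small userspace sketch. Everything here is an assumption made for illustration: the 64-byte line size, the struct and field types (they only mimic the three fields named above, not the real struct io_wqe), and the padded variant, which is shown purely for contrast and is not something proposed in this mail.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64	/* assumed cacheline size, typical for x86-64 */

/* Hypothetical stand-in for a list head; not the kernel's type. */
struct list_sketch {
	void *first;
	void *last;
};

/* Packed layout: lock, work_list and flags sit next to each other. */
struct wqe_shared_line {
	int lock;
	struct list_sketch work_list;
	unsigned long flags;
};

/* Contrast layout: flags is pushed onto a cacheline of its own. */
struct wqe_split_lines {
	_Alignas(CACHELINE) int lock;
	struct list_sketch work_list;
	_Alignas(CACHELINE) unsigned long flags;
};

/*
 * Returns nonzero if two field offsets fall into the same cacheline,
 * assuming the containing struct itself starts on a cacheline boundary.
 */
static int same_cacheline(size_t off_a, size_t off_b)
{
	return off_a / CACHELINE == off_b / CACHELINE;
}
```

With the packed layout, any store to one of the fields invalidates the line that holds the other two on every other CPU, which is the effect described above when a spinning worker keeps reading the line while the submitter tries to take the lock.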

I also wonder if we can't avoid dequeuing entries one-by-one within the
worker, at least for the IO_WQ_WORK_HASHED case. Especially when writes
are just hitting the page cache, they're pretty fast, making it
plausible to cause pretty bad contention on the spinlock (even without
the spinning above). Whereas the submission side is at least somewhat
likely to be able to submit several queue entries while the worker is
processing one job, that's pretty unlikely for workers.

In the hashed case there shouldn't be another worker processing entries
for the same hash. So it seems quite possible for the wqe to drain a few
of the entries for that hash within one spinlock acquisition, and then
process them one-by-one?
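A minimal userspace sketch of that batching idea follows. It is not io-wq code: the queue, the item type, the C11 atomic_flag spinlock and all function names (wq_enqueue, wq_drain_hash, process_batch) are hypothetical stand-ins, and it assumes, as stated above, that no other worker handles the same hash concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work item; entries with equal hash must run serially. */
struct work_item {
	struct work_item *next;
	unsigned int hash;
	int payload;
};

/* Hypothetical queue protected by a simple test-and-set spinlock. */
struct work_queue {
	atomic_flag lock;
	struct work_item *head;
	struct work_item *tail;
};

static void wq_lock(struct work_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void wq_unlock(struct work_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

static void wq_init(struct work_queue *q)
{
	atomic_flag_clear(&q->lock);
	q->head = q->tail = NULL;
}

/* Submission side: append one entry under the lock. */
static void wq_enqueue(struct work_queue *q, struct work_item *w)
{
	w->next = NULL;
	wq_lock(q);
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
	wq_unlock(q);
}

/*
 * Worker side: unlink up to max entries with the given hash in a single
 * lock acquisition. Returns them as a private list in queue order and
 * stores the number of unlinked entries in *nr.
 */
static struct work_item *wq_drain_hash(struct work_queue *q, unsigned int hash,
				       size_t max, size_t *nr)
{
	struct work_item *batch = NULL, **batch_tail = &batch;
	struct work_item **link, *w, *last = NULL;
	size_t n = 0;

	assert(max > 0);
	wq_lock(q);
	link = &q->head;
	while (n < max && (w = *link) != NULL) {
		if (w->hash == hash) {
			*link = w->next;	/* unlink from the shared list */
			if (q->tail == w)
				q->tail = last;	/* removed the last entry */
			w->next = NULL;
			*batch_tail = w;	/* append to the private batch */
			batch_tail = &w->next;
			n++;
		} else {
			last = w;
			link = &w->next;
		}
	}
	wq_unlock(q);
	*nr = n;
	return batch;
}

/* Runs the batch without holding the lock; summing stands in for real work. */
static int process_batch(struct work_item *batch)
{
	int sum = 0;

	for (; batch; batch = batch->next)
		sum += batch->payload;
	return sum;
}
```

The worker then takes the lock once per batch rather than once per entry, so the submission side has a longer window to enqueue without fighting over the lock. The cap on the batch size (max) is an assumed knob to keep one hash from starving others.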


Andres Freund


Thread overview: 10+ messages
2020-02-14 19:50 Andres Freund
2020-02-14 20:13 ` Jens Axboe
2020-02-14 20:31   ` Andres Freund
2020-02-14 20:49     ` Jens Axboe
2020-02-24  9:35       ` Andres Freund [this message]
2020-02-24 15:22         ` Jens Axboe
2020-03-09 20:03           ` Pavel Begunkov
2020-03-09 20:41             ` Jens Axboe
2020-03-09 21:02               ` Pavel Begunkov
2020-03-09 21:29                 ` Jens Axboe
