IO-Uring Archive on
 help / color / Atom feed
From: Jens Axboe <>
To: Andres Freund <>
Subject: Re: Buffered IO async context overhead
Date: Mon, 24 Feb 2020 08:22:30 -0700
Message-ID: <> (raw)
In-Reply-To: <>

On 2/24/20 2:35 AM, Andres Freund wrote:
> Hi,
> On 2020-02-14 13:49:31 -0700, Jens Axboe wrote:
>> [description of buffered write workloads being slower via io_uring
>> than plain writes]
>> Because I'm working on other items, I didn't read carefully enough. Yes
>> this won't change the situation for writes. I'll take a look at this when
>> I get time, maybe there's something we can do to improve the situation.
> I looked a bit into this.
> I think one issue is the spinning the workers do:
> static int io_wqe_worker(void *data)
> {
> 	while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
> 		set_current_state(TASK_INTERRUPTIBLE);
> loop:
> 		if (did_work)
> 			io_worker_spin_for_work(wqe);
> 		spin_lock_irq(&wqe->lock);
> 		if (io_wqe_run_queue(wqe)) {
> static inline void io_worker_spin_for_work(struct io_wqe *wqe)
> {
> 	int i = 0;
> 	while (++i < 1000) {
> 		if (io_wqe_run_queue(wqe))
> 			break;
> 		if (need_resched())
> 			break;
> 		cpu_relax();
> 	}
> }
> even with the cpu_relax(), that causes quite a lot of cross socket
> traffic, slowing down the submission side. Which after all frequently
> needs to take the wqe->lock, just to be able to submit a queue
> entry.
> lock, work_list, flags all reside in one cacheline, so it's pretty
> likely that a single io_wqe_enqueue would get the cacheline "stolen"
> several times during one enqueue - without allowing any progress in the
> worker, of course.

Since it's provably harmful for this case, and the gain was small (but
noticeable) on single issue cases, I think we should just kill it. With
the poll retry stuff for 5.7, there'll be even less of a need for it.

Care to send a patch for 5.6 to kill it?

> I also wonder if we can't avoid dequeuing entries one-by-one within the
> worker, at least for the IO_WQ_WORK_HASHED case. Especially when writes
> are just hitting the page cache, they're pretty fast, making it
> plausible to cause pretty bad contention on the spinlock (even without
> the spining above). Whereas the submission side is at least somewhat
> likely to be able to submit several queue entries while the worker is
> processing one job, that's pretty unlikely for workers.
> In the hashed case there shouldn't be another worker processing entries
> for the same hash. So it seems quite possible for the wqe to drain a few
> of the entries for that hash within one spinlock acquisition, and then
> process them one-by-one?

Yeah, I think that'd be a good optimization for hashed work. Work N+1
can't make any progress before work N is done anyway, so might as well
grab a batch at the time.

Jens Axboe

  reply index

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-14 19:50 Andres Freund
2020-02-14 20:13 ` Jens Axboe
2020-02-14 20:31   ` Andres Freund
2020-02-14 20:49     ` Jens Axboe
2020-02-24  9:35       ` Andres Freund
2020-02-24 15:22         ` Jens Axboe [this message]
2020-03-09 20:03           ` Pavel Begunkov
2020-03-09 20:41             ` Jens Axboe
2020-03-09 21:02               ` Pavel Begunkov
2020-03-09 21:29                 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

IO-Uring Archive on

Archives are clonable:
	git clone --mirror io-uring/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 io-uring io-uring/ \
	public-inbox-index io-uring

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone