[RFC 0/2] 3 cacheline io_kiocb

From: Pavel Begunkov @ 2020-07-25  8:31 UTC
  To: Jens Axboe, io-uring

This isn't final, for several reasons, but it's good enough for
discussion. It brings io_kiocb down to 192B, i.e. 3 cachelines.
I didn't benchmark it properly, but a quick nop test showed a ~5%
throughput increase: 7531 vs 7910 KIOPS with fio/t/io_uring.
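
For reference, the arithmetic behind those two numbers can be sketched
as below. This is only an illustration: it assumes 64-byte cachelines,
and the helper names are made up here, they are not part of the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cachelines, as on typical x86-64 machines. */
#define CACHELINE_BYTES 64u

/* Number of cachelines a cacheline-aligned object of `size` bytes spans. */
static unsigned cachelines_spanned(size_t size)
{
	return (unsigned)((size + CACHELINE_BYTES - 1) / CACHELINE_BYTES);
}

/*
 * Relative gain of `after` over `before`, in hundredths of a percent
 * (integer division, so the result is rounded down).
 */
static unsigned gain_hundredths_pct(unsigned before, unsigned after)
{
	return (after - before) * 10000u / before;
}
```

With the figures above, cachelines_spanned(192) gives 3, and
gain_hundredths_pct(7531, 7910) gives 503, i.e. roughly +5.03%.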

The whole situation is obviously a bunch of tradeoffs. For instance,
instead of shrinking it, we could inline apoll to speed up the apoll path.

[2/2] is just for reference; I'm thinking about other ways to shrink it,
e.g. ->link_list could be a singly linked list, with linked timeouts
storing a back-reference. That may turn out to be better, because
it would move ->fixed_file_refs to the 2nd cacheline, so we would
never touch the 3rd cacheline in the submission path.
Any other ideas?
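
To make the layout argument concrete, here is a rough sketch of the two
effects discussed: a union of two pointers costs one pointer, and
dropping the ->prev pointer of a list head pulls later fields 8 bytes
earlier. Everything in it is hypothetical: the struct names, the
padding arrays and the offsets are invented for illustration and do
not reflect the real io_kiocb layout. It assumes 64-bit pointers and
64-byte cachelines.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_BYTES 64

/* Assumption: LP64 target, so every pointer is 8 bytes. */
_Static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");

/* Hypothetical stand-ins, not the kernel's types. */
struct dlist { struct dlist *next, *prev; };	/* doubly linked: 16 bytes */
struct slist { struct slist *next; };		/* singly linked: 8 bytes */

/* Two mutually exclusive pointers share one 8-byte slot. */
union work_or_apoll {
	void *work;
	void *apoll;
};

/* Made-up layout with a doubly linked ->link_list. */
struct req_before {
	char line0[64];			/* placeholder: 1st cacheline */
	char line1_fields[48];		/* placeholder: part of 2nd cacheline */
	struct dlist link_list;		/* offsets 112..127 */
	void *fixed_file_refs;		/* offset 128: 3rd cacheline */
};

/* Same made-up layout with a singly linked ->link_list. */
struct req_after {
	char line0[64];
	char line1_fields[48];
	struct slist link_list;		/* offsets 112..119 */
	void *fixed_file_refs;		/* offset 120: 2nd cacheline */
};

/* Zero-based index of the cacheline containing byte `offset`. */
static size_t cacheline_index(size_t offset)
{
	return offset / CACHELINE_BYTES;
}
```

In this toy layout, cacheline_index(offsetof(struct req_before,
fixed_file_refs)) is 2 (the 3rd cacheline), while the same expression
for struct req_after is 1 (the 2nd). For the real structure, pahole
output is the thing to look at.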

note: on top of for-5.9/io_uring,
f56040b819998 ("io_uring: deduplicate io_grab_files() calls")

Pavel Begunkov (2):
  io_uring: allocate req->work dynamically
  io_uring: unionise ->apoll and ->work

 fs/io-wq.h    |   1 +
 fs/io_uring.c | 207 ++++++++++++++++++++++++++------------------------
 2 files changed, 110 insertions(+), 98 deletions(-)

-- 
2.24.0
