Linux-Block Archive on lore.kernel.org
* [PATCHSET v5] io_uring IO interface
@ 2019-01-16 17:49 Jens Axboe
  2019-01-16 17:49 ` [PATCH 01/15] fs: add an iopoll method to struct file_operations Jens Axboe
                   ` (14 more replies)
  0 siblings, 15 replies; 41+ messages in thread
From: Jens Axboe @ 2019-01-16 17:49 UTC
  To: linux-fsdevel, linux-aio, linux-block, linux-arch; +Cc: hch, jmoyer, avi

Here's v5 of the io_uring interface. Mostly feels like putting some
finishing touches on top of v4, though we do have a few user interface
tweaks because of that.

Arnd was kind enough to review the code with an eye towards 32-bit
compatibility, and that resulted in a few changes. See changelog below.

I also cleaned up the internal ring handling, enabling us to batch
writes to the SQ ring head and CQ ring tail. This reduces the number of
write ordering barriers we need.
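
To illustrate the batching idea, here is a minimal userspace sketch using
C11 atomics. The demo_ring type and its helpers are made up for this
example and are not the kernel's data structures. It only shows how
filling several slots against a private copy of the tail, then publishing
once, needs a single release store for the whole batch instead of one per
entry. Head tracking and full-ring checks are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_ENTRIES	8u		/* power of two, so masking works */
#define RING_MASK	(RING_ENTRIES - 1)

/* Hypothetical single-producer ring, for illustration only. */
struct demo_ring {
	_Atomic uint32_t tail;		/* shared: what the consumer sees */
	uint32_t cached_tail;		/* private: producer's local copy */
	uint64_t entries[RING_ENTRIES];
};

static void demo_ring_init(struct demo_ring *r)
{
	atomic_init(&r->tail, 0);
	r->cached_tail = 0;
}

/* Fill one slot using only the private tail. No barrier is needed here. */
static void demo_ring_fill(struct demo_ring *r, uint64_t value)
{
	r->entries[r->cached_tail & RING_MASK] = value;
	r->cached_tail++;
}

/* Publish everything filled so far with one release store. */
static void demo_ring_commit(struct demo_ring *r)
{
	atomic_store_explicit(&r->tail, r->cached_tail, memory_order_release);
}

/* Queue n values as one batch and return the tail the consumer observes. */
static uint32_t demo_batch(struct demo_ring *r, const uint64_t *vals,
			   uint32_t n)
{
	/* Sketch only: assumes the ring has room for the whole batch. */
	assert(n <= RING_ENTRIES);

	for (uint32_t i = 0; i < n; i++)
		demo_ring_fill(r, vals[i]);
	demo_ring_commit(r);

	return atomic_load_explicit(&r->tail, memory_order_acquire);
}
```

A consumer that loads tail with acquire semantics is guaranteed to see
every entry written before the commit, which is why the per-entry
barriers can go away.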

I also dumped the io_submit_state intermediate poll list handling. This
drops a patch, and also cleans up the block flush handling since we no
longer have to tie into the deep internals of plug callbacks. The win
from this just wasn't enough to warrant the complexity.

LWN did a great write-up of the API and internals, see that here:

https://lwn.net/Articles/776703/

In terms of benchmarks, I ran some numbers comparing io_uring to libaio
and spdk. The tldr is that io_uring is pretty close to spdk, in some
cases faster. io_uring latencies are generally better than spdk's. The
areas where we are still missing a bit of performance all lie in the
block layer, and I'll be working on that to close the gap some more.

Latency tests, 3d xpoint, 4k random read

Interface	QD	Polled		Latency		IOPS
--------------------------------------------------------------------------
io_uring	1	0		 9.5usec	 77K
io_uring	2	0		 8.2usec	183K
io_uring	4	0		 8.4usec	383K
io_uring	8	0		13.3usec	449K

libaio		1	0		 9.7usec	 74K
libaio		2	0		 8.5usec	181K
libaio		4	0		 8.5usec	373K
libaio		8	0		15.4usec	402K

io_uring	1	1		 6.1usec	139K
io_uring	2	1		 6.1usec	272K	
io_uring	4	1		 6.3usec	519K
io_uring	8	1		11.5usec	592K

spdk		1	1		 6.1usec	151K
spdk		2	1		 6.2usec	293K
spdk		4	1		 6.7usec	536K
spdk		8	1		12.6usec	586K

io_uring vs libaio, non-polled: io_uring has a slight lead. spdk is
slightly faster than polled io_uring, especially at lower queue depths.
At QD=8, io_uring is faster.


Peak IOPS, 512b random read

Interface	QD	Polled		Latency		IOPS
--------------------------------------------------------------------------
io_uring	4	1		 6.8usec	 513K
io_uring	8	1		 8.7usec	 829K
io_uring	16	1		13.1usec	1019K
io_uring	32	1		20.6usec	1161K
io_uring	64	1		32.4usec	1244K

spdk		4	1		 6.8usec	 549K
spdk		8	1		 8.6usec	 865K
spdk		16	1		14.0usec	1105K
spdk		32	1		25.0usec	1227K
spdk		64	1		47.3usec	1251K

io_uring lags spdk by about 7% at lower queue depths, getting to within
1% of spdk at higher queue depths.
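
For anyone who wants to check the percentages against the table, the
arithmetic can be sketched as below. The function and array names are
made up for this example. The IOPS figures are copied from the 512b
random read table, in thousands.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage by which 'ours' trails 'theirs'. Positive means slower. */
static double gap_percent(double ours, double theirs)
{
	assert(theirs > 0.0);
	return (theirs - ours) / theirs * 100.0;
}

/* Peak IOPS in thousands, 512b random read, polled, from the table. */
static const unsigned int qd[] = { 4, 8, 16, 32, 64 };
static const double io_uring_kiops[] = { 513, 829, 1019, 1161, 1244 };
static const double spdk_kiops[] = { 549, 865, 1105, 1227, 1251 };

/* Print the relative gap for every queue depth in the table. */
static void print_gaps(void)
{
	for (size_t i = 0; i < sizeof(qd) / sizeof(qd[0]); i++)
		printf("QD=%-2u io_uring trails spdk by %.1f%%\n", qd[i],
		       gap_percent(io_uring_kiops[i], spdk_kiops[i]));
}
```

With these inputs, QD=4 works out to roughly 6.6% and QD=64 to roughly
0.6%.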


Peak per-core, multiple devices, 4k random read

Interface	QD	Polled		IOPS
--------------------------------------------------------------------------
io_uring	128	1		1620K

libaio		128	0		 608K

spdk		128	1		1739K

This is using multiple devices, all running on the same core, meant to
test how much performance we can eke out of a single CPU core. spdk has
a slight edge over io_uring, with libaio not able to compete at all.

As usual, patches are against 5.0-rc2, and can also be found in my
io_uring branch here:


git://git.kernel.dk/linux-block io_uring


Since v4:
- Update some commit messages
- Update some stale comments
- Tweak polling efficiency
- Avoid multiple SQ/CQ ring inc+barriers for batches of IO
- Cache SQ head and CQ tail in the kernel
- Fix buffered rw/work union issue for punted IO
- Drop submit state request issue cache
- Rework io_uring_register() for buffers and files to be more 32-bit
  friendly
- Make sqe->addr an __u64 instead of playing padding tricks
- Add compat conditional syscall entry for io_uring_setup()
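
As a rough illustration of why a fixed-width address field is friendlier
to 32-bit userspace than a pointer, consider the sketch below. The two
structs are hypothetical and are not the real struct io_uring_sqe. They
only show that a pointer member changes size between ABIs, while a
64-bit integer plus explicit padding keeps the same layout everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only. */

/* Pointer member: 4 bytes on 32-bit, 8 on 64-bit, so the layout differs. */
struct demo_sqe_ptr {
	uint8_t opcode;
	uint8_t flags;
	uint16_t ioprio;
	int32_t fd;
	uint64_t off;
	void *addr;
	uint32_t len;
};

/* Fixed-width member: same size and offsets on every ABI. */
struct demo_sqe_u64 {
	uint8_t opcode;
	uint8_t flags;
	uint16_t ioprio;
	int32_t fd;
	uint64_t off;
	uint64_t addr;		/* user pointer carried as an integer */
	uint32_t len;
	uint32_t pad;		/* explicit, so no compiler tail padding */
};

_Static_assert(offsetof(struct demo_sqe_u64, addr) == 16,
	       "addr offset does not depend on pointer size");
_Static_assert(sizeof(struct demo_sqe_u64) == 32,
	       "struct size does not depend on pointer size");

/* Userspace side: widen a pointer into the 64-bit field. */
static uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/* Receiving side: narrow it back, checking that it fits in a pointer. */
static void *u64_to_ptr(uint64_t v)
{
	assert(v <= UINTPTR_MAX);
	return (void *)(uintptr_t)v;
}
```

With the fixed-width layout, a 32-bit application and a 64-bit kernel
agree on every offset, so the structure itself needs no translation.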


 Documentation/filesystems/vfs.txt      |    3 +
 arch/x86/entry/syscalls/syscall_32.tbl |    3 +
 arch/x86/entry/syscalls/syscall_64.tbl |    3 +
 block/bio.c                            |   59 +-
 fs/Makefile                            |    1 +
 fs/block_dev.c                         |   19 +-
 fs/file.c                              |   15 +-
 fs/file_table.c                        |    9 +-
 fs/gfs2/file.c                         |    2 +
 fs/io_uring.c                          | 2017 ++++++++++++++++++++++++
 fs/iomap.c                             |   48 +-
 fs/xfs/xfs_file.c                      |    1 +
 include/linux/bio.h                    |   14 +
 include/linux/blk_types.h              |    1 +
 include/linux/file.h                   |    2 +
 include/linux/fs.h                     |    6 +-
 include/linux/iomap.h                  |    1 +
 include/linux/sched/user.h             |    2 +-
 include/linux/syscalls.h               |    7 +
 include/uapi/linux/io_uring.h          |  136 ++
 init/Kconfig                           |    9 +
 kernel/sys_ni.c                        |    4 +
 22 files changed, 2322 insertions(+), 40 deletions(-)

-- 
Jens Axboe



* [PATCHSET v2] io_uring IO interface
@ 2019-01-10  2:43 Jens Axboe
  2019-01-10  2:43 ` [PATCH 09/15] io_uring: use fget/fput_many() for file references Jens Axboe
  0 siblings, 1 reply; 41+ messages in thread
From: Jens Axboe @ 2019-01-10  2:43 UTC
  To: linux-fsdevel, linux-aio, linux-block, linux-arch; +Cc: hch, jmoyer, avi

Here's v2 of the io_uring interface. See the v1 posting for some more info:

https://lore.kernel.org/linux-block/20190108165645.19311-1-axboe@kernel.dk/

The data structures changed, to improve the symmetry of the submission
and completion side. The io_uring_iocb is now io_uring_sqe, but it
otherwise remains the same as before. Ditto on the completion side,
where io_uring_event is now io_uring_cqe.

I've updated the fio io_uring test app, and the io_uring engine. The
liburing git repo has also been adapted to the various changes since the
v1 posting. As a reminder, the liburing git repo contains some helpers
for doing IO without having to muck with the ring directly, setting up
an io_uring context, etc. Clone that here:

git://git.kernel.dk/liburing

In terms of usage, there's also a small test app here:

http://git.kernel.dk/cgit/fio/plain/t/io_uring.c

and the liburing repo has a few test apps in test/ as well.

Patches are against 5.0-rc1, but can also be found here:

git://git.kernel.dk/linux-block io_uring

Changes since v1:

- Fail IORING_OP_{READ,WRITE}_FIXED if not configured
- Fix ctx drop ref issue on failure to close ring_fd when sq thread/wq
  are in use
- Move to separate Kconfig entry (CONFIG_IO_URING)
- Add SPDX headers
- Drop gcc-ism of zero sized arrays
- Rename io_uring_iocb -> io_uring_sqe
- Rename io_uring_event -> io_uring_cqe
- Drop needless io_event_ring and io_iocb_ring structures
- Drop ctx->max_reqs, use ->sq_entries
- Drop unused ->ring_lock
- Drop io_ring_ctx slab cache
- Fix state batched kiocb alloc failure to put ctx
- Fix missing write ordering barrier when filling in the cqe
- Drop io_req_init()
- Various renames
- Fix a few lines that were too long
- Address other minor review comments
- Fix IORING_SETUP_SQPOLL being set without IORING_SETUP_SQTHREAD
- Drop IORING_SETUP_FIXEDBUFS, iovecs being non-NULL is enough
- Fix error handling free of ctx in setup path
- Change standard read/write commands to be iov based READV/WRITEV
- Pass in struct sqe_submit instead of separate sqe/index everywhere
- Fix reap of polled events on fops->release()
- Lock uring for sq thread polling
- Don't grab ->completion_lock for polled IO cqe filling
- Fix ev_flags vs flags typo
- Consolidate parts of the io_ring_ctx alignment
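
For readers less familiar with the iovec style that the READV/WRITEV
commands follow, here is a small sketch using the plain readv(2) system
call. It does not use io_uring at all, and demo_scatter_read is a
made-up helper name. It only shows how one call can scatter data into
several buffers described by a struct iovec array.

```c
#include <assert.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Read up to len_a + len_b bytes from fd with a single readv(2) call,
 * filling buffer a first and then buffer b. Returns the number of bytes
 * read, or -1 on error.
 */
static ssize_t demo_scatter_read(int fd, void *a, size_t len_a,
				 void *b, size_t len_b)
{
	struct iovec iov[2] = {
		{ .iov_base = a, .iov_len = len_a },
		{ .iov_base = b, .iov_len = len_b },
	};

	assert(fd >= 0);
	return readv(fd, iov, 2);
}
```

A single-buffer read is just the special case of an iovec array with one
element.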

 Documentation/filesystems/vfs.txt      |    3 +
 arch/x86/entry/syscalls/syscall_64.tbl |    2 +
 block/bio.c                            |   59 +-
 fs/Makefile                            |    1 +
 fs/block_dev.c                         |   19 +-
 fs/file.c                              |   15 +-
 fs/file_table.c                        |    9 +-
 fs/gfs2/file.c                         |    2 +
 fs/io_uring.c                          | 1890 ++++++++++++++++++++++++
 fs/iomap.c                             |   48 +-
 fs/xfs/xfs_file.c                      |    1 +
 include/linux/bio.h                    |   14 +
 include/linux/blk_types.h              |    1 +
 include/linux/file.h                   |    2 +
 include/linux/fs.h                     |    6 +-
 include/linux/iomap.h                  |    1 +
 include/linux/syscalls.h               |    5 +
 include/uapi/linux/io_uring.h          |  114 ++
 init/Kconfig                           |    8 +
 kernel/sys_ni.c                        |    2 +
 20 files changed, 2163 insertions(+), 39 deletions(-)

-- 
Jens Axboe



Thread overview: 41+ messages
2019-01-16 17:49 [PATCHSET v5] io_uring IO interface Jens Axboe
2019-01-16 17:49 ` [PATCH 01/15] fs: add an iopoll method to struct file_operations Jens Axboe
2019-01-16 17:49 ` [PATCH 02/15] block: wire up block device iopoll method Jens Axboe
2019-01-16 17:49 ` [PATCH 03/15] block: add bio_set_polled() helper Jens Axboe
2019-01-16 17:49 ` [PATCH 04/15] iomap: wire up the iopoll method Jens Axboe
2019-01-16 17:49 ` [PATCH 05/15] Add io_uring IO interface Jens Axboe
2019-01-17 12:02   ` Roman Penyaev
2019-01-17 13:54     ` Jens Axboe
2019-01-17 14:34       ` Roman Penyaev
2019-01-17 14:54         ` Jens Axboe
2019-01-17 15:19           ` Roman Penyaev
2019-01-17 12:48   ` Roman Penyaev
2019-01-17 14:01     ` Jens Axboe
2019-01-17 20:03       ` Jeff Moyer
2019-01-17 20:09         ` Jens Axboe
2019-01-17 20:14           ` Jens Axboe
2019-01-17 20:50             ` Jeff Moyer
2019-01-17 20:53               ` Jens Axboe
2019-01-17 21:02                 ` Jeff Moyer
2019-01-17 21:17                   ` Jens Axboe
2019-01-17 21:21                     ` Jeff Moyer
2019-01-17 21:27                       ` Jens Axboe
2019-01-18  8:23               ` Roman Penyaev
2019-01-16 17:49 ` [PATCH 06/15] io_uring: add fsync support Jens Axboe
2019-01-16 17:49 ` [PATCH 07/15] io_uring: support for IO polling Jens Axboe
2019-01-16 17:49 ` [PATCH 08/15] fs: add fget_many() and fput_many() Jens Axboe
2019-01-16 17:49 ` [PATCH 09/15] io_uring: use fget/fput_many() for file references Jens Axboe
2019-01-16 17:49 ` [PATCH 10/15] io_uring: batch io_kiocb allocation Jens Axboe
2019-01-16 17:49 ` [PATCH 11/15] block: implement bio helper to add iter bvec pages to bio Jens Axboe
2019-01-16 17:50 ` [PATCH 12/15] io_uring: add support for pre-mapped user IO buffers Jens Axboe
2019-01-16 20:53   ` Dave Chinner
2019-01-16 21:20     ` Jens Axboe
2019-01-16 22:09       ` Dave Chinner
2019-01-16 22:21         ` Jens Axboe
2019-01-16 23:09           ` Dave Chinner
2019-01-16 23:17             ` Jens Axboe
2019-01-16 22:13       ` Jens Axboe
2019-01-16 17:50 ` [PATCH 13/15] io_uring: add submission polling Jens Axboe
2019-01-16 17:50 ` [PATCH 14/15] io_uring: add file registration Jens Axboe
2019-01-16 17:50 ` [PATCH 15/15] io_uring: add io_uring_event cache hit information Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2019-01-10  2:43 [PATCHSET v2] io_uring IO interface Jens Axboe
2019-01-10  2:43 ` [PATCH 09/15] io_uring: use fget/fput_many() for file references Jens Axboe
