linux-block.vger.kernel.org archive mirror
* [PATCHSET v3] io_uring IO interface
@ 2019-01-12 21:29 Jens Axboe
  2019-01-12 21:29 ` [PATCH 01/16] fs: add an iopoll method to struct file_operations Jens Axboe
                   ` (15 more replies)
  0 siblings, 16 replies; 19+ messages in thread
From: Jens Axboe @ 2019-01-12 21:29 UTC (permalink / raw)
  To: linux-fsdevel, linux-aio, linux-block, linux-arch; +Cc: hch, jmoyer, avi

Here's v3 of the io_uring interface. Since data structures etc have
changed since the v1 posting, here's a refresher of what io_uring
is and how it works.

io_uring is a submission queue (SQ) and completion queue (CQ) pair that
an application can use to communicate with the kernel for doing IO. This
isn't aio/libaio, but it provides a similar set of features, as well as
some new ones:

- io_uring is a lot more efficient than aio. A lot, and in many ways.

- io_uring supports buffered aio. Not just that, but efficiently as
  well. Cached data isn't punted to an async context.

- io_uring supports polled IO, it takes advantage of the blk-mq polling
  work that went into 5.0-rc.

- io_uring supports kernel side submissions for polled IO. This enables
  IO without ever having to do a system call.

- io_uring supports fixed buffers for O_DIRECT. Buffers can be
  registered after an io_uring context has been set up, which eliminates
  the need to do get_user_pages() / put_page() for each and every IO.

To use io_uring, you must first set up an io_uring context. This is done
through the first of three new system calls:

io_uring_setup(entries, params)
	Sets up a context for doing async IO. On success, returns a file
	descriptor that the application can mmap to gain access to the
	SQ ring, CQ ring, and io_uring_sqes.

Once the rings are set up, the application mmaps them to communicate
with the kernel. See a sample application I wrote that does this
natively:

http://git.kernel.dk/cgit/fio/plain/t/io_uring.c

IO is done by filling out an io_uring_sqe and updating the SQ ring. The
format of the sqe is as follows:

struct io_uring_sqe {
	__u8	opcode;		/* type of operation for this sqe */
	__u8	flags;		/* IOSQE_ flags */
	__u16	ioprio;		/* ioprio for the request */
	__s32	fd;		/* file descriptor to do IO on */
	__u64	off;		/* offset into file */
	union {
		void	*addr;	/* buffer or iovecs */
		__u64	__pad;
	};
	__u32	len;		/* buffer size or number of iovecs */
	union {
		__kernel_rwf_t	rw_flags;
		__u32		fsync_flags;
	};
	__u16	buf_index;	/* index into fixed buffers, if used */
	__u16	__pad2;
	__u32	__pad3;
	__u64	user_data;	/* data to be passed back at completion time */
};

Most of this is self-explanatory. The ->user_data field is passed back
through a completion event, so the application can track IOs
individually.
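To make the fill-and-publish sequence concrete, here is a minimal sketch in C. It is not code from this patchset: struct sketch_sqe is a user-space copy of the layout above, while struct sketch_sq_ring, sketch_queue_readv() and the SKETCH_OP_READV value are hypothetical stand-ins for what an application derives from the mmap'ed SQ ring. It also assumes the sqe array has as many entries as the ring. The ordering is the point: fill the sqe, store its index in the ring array, then publish the new tail with release semantics so the kernel never sees a half-written entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* User-space mirror of the v3 struct io_uring_sqe above, with the
 * addr/__pad union flattened to a 64-bit integer. Assumption: real code
 * would include the UAPI header instead of copying the layout. */
struct sketch_sqe {
	uint8_t  opcode;
	uint8_t  flags;
	uint16_t ioprio;
	int32_t  fd;
	uint64_t off;
	uint64_t addr;
	uint32_t len;
	uint32_t rw_flags;
	uint16_t buf_index;
	uint16_t pad2;
	uint32_t pad3;
	uint64_t user_data;
};

/* Hypothetical opcode value, for illustration only. */
#define SKETCH_OP_READV 1

/* Hypothetical view of the mmap'ed SQ ring: pointers into shared memory. */
struct sketch_sq_ring {
	_Atomic uint32_t *head;   /* advanced by the kernel as it consumes */
	_Atomic uint32_t *tail;   /* advanced by the application */
	uint32_t mask;            /* ring entries - 1 (power-of-two ring) */
	uint32_t *array;          /* ring slots hold indexes into sqes[] */
	struct sketch_sqe *sqes;  /* the sqe array itself */
};

/* Fill sqes[idx] as a readv and publish it. Returns 0, or -1 if full. */
static int sketch_queue_readv(struct sketch_sq_ring *sq, uint32_t idx,
			      int fd, const void *iovecs, uint32_t nr_vecs,
			      uint64_t off, uint64_t user_data)
{
	uint32_t head = atomic_load_explicit(sq->head, memory_order_acquire);
	uint32_t tail = atomic_load_explicit(sq->tail, memory_order_relaxed);
	struct sketch_sqe *sqe;

	assert(idx <= sq->mask);        /* idx must name a valid sqe slot */
	if (tail - head > sq->mask)     /* every entry in flight: ring full */
		return -1;

	sqe = &sq->sqes[idx];
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = SKETCH_OP_READV;
	sqe->fd = fd;
	sqe->off = off;
	sqe->addr = (uint64_t)(uintptr_t)iovecs;  /* array of struct iovec */
	sqe->len = nr_vecs;                       /* number of iovecs */
	sqe->user_data = user_data;               /* echoed back in the cqe */

	sq->array[tail & sq->mask] = idx;
	/* Release: the sqe and array slot become visible before the new tail. */
	atomic_store_explicit(sq->tail, tail + 1, memory_order_release);
	return 0;
}
```

Because the ring holds indexes rather than sqes, an application can fill sqes in any order and reuse a slot as soon as its IO has been consumed.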

Completions are posted on the CQ ring when an sqe completes. Each one is
a struct io_uring_cqe, with the following format:

struct io_uring_cqe {
	__u64	user_data;	/* sqe->user_data, passed back */
	__s32	res;		/* result code for this event */
	__u32	flags;
};
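Reaping is the mirror image. The sketch below (hypothetical names again, not the patchset's code) walks the CQ ring from head to tail, uses ->user_data to find per-IO state the application stashed at submission time, and then publishes the new head so the kernel can reuse those slots. It assumes ->res follows the usual convention of a read(2)/write(2)-style return value, with errors as negative errno values.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of struct io_uring_cqe above. */
struct sketch_cqe {
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
};

/* Hypothetical view of the mmap'ed CQ ring. */
struct sketch_cq_ring {
	_Atomic uint32_t *head;   /* advanced by the application */
	_Atomic uint32_t *tail;   /* advanced by the kernel */
	uint32_t mask;            /* ring entries - 1 */
	struct sketch_cqe *cqes;  /* completions live directly in the ring */
};

/* Hypothetical per-IO state whose address travels in sqe->user_data. */
struct sketch_request {
	int done;
	int32_t res;
};

/* Consume every available cqe; returns how many were reaped. */
static unsigned sketch_reap(struct sketch_cq_ring *cq)
{
	uint32_t head = atomic_load_explicit(cq->head, memory_order_relaxed);
	/* Acquire: cqe contents are read only after observing the new tail. */
	uint32_t tail = atomic_load_explicit(cq->tail, memory_order_acquire);
	unsigned reaped = 0;

	while (head != tail) {
		const struct sketch_cqe *cqe = &cq->cqes[head & cq->mask];
		struct sketch_request *req =
			(struct sketch_request *)(uintptr_t)cqe->user_data;

		assert(req != NULL);
		req->res = cqe->res;   /* assumed: byte count or -errno */
		req->done = 1;
		head++;
		reaped++;
	}
	/* Release: tell the kernel these slots may now be reused. */
	atomic_store_explicit(cq->head, head, memory_order_release);
	return reaped;
}
```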

To either submit IO or reap completions, there's a second new system call:

io_uring_enter(fd, to_submit, min_complete, flags)
	Initiates IO against the rings mapped to this fd, or waits for
	them to complete, or both. The behavior is controlled by the
	parameters passed in. If 'to_submit' is non-zero, then we'll
	try to submit new IO. If IORING_ENTER_GETEVENTS is set, the
	kernel will wait for 'min_complete' events, if they aren't
	already available.
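A thin wrapper shows how the three parameters combine. This is a sketch: the wrapper names are hypothetical, __NR_io_uring_enter is only available from system headers new enough to carry it (the code falls back to ENOSYS otherwise), and the IORING_ENTER_GETEVENTS value and the two trailing signal-mask arguments are taken from later mainline kernels, not from this posting.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: flag value as in later mainline <linux/io_uring.h>. */
#ifndef IORING_ENTER_GETEVENTS
#define IORING_ENTER_GETEVENTS (1U << 0)
#endif

/* Raw system call wrapper; returns -1 and sets errno on failure. */
static int sketch_io_uring_enter(int fd, unsigned to_submit,
				 unsigned min_complete, unsigned flags)
{
#ifdef __NR_io_uring_enter
	/* Later kernels take a sigset pointer and size; pass none. */
	return (int)syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
			    flags, NULL, (size_t)0);
#else
	(void)fd; (void)to_submit; (void)min_complete; (void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/* Submit 'n' queued sqes and block until at least one completion exists. */
static int sketch_submit_and_wait(int fd, unsigned n)
{
	assert(n > 0);
	return sketch_io_uring_enter(fd, n, 1, IORING_ENTER_GETEVENTS);
}
```

Queueing several sqes and passing the whole count as to_submit amortizes one system call over the batch; passing to_submit == 0 with IORING_ENTER_GETEVENTS only waits for completions.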

The sample application mentioned above uses the rings directly, but for
most use cases, I intend to have the necessary support in a liburing
library that abstracts it enough for applications to use in a performant
way, without having to deal with the intricacies of the ring. There's
already some basic support there and a few test applications, but that
side definitely needs some work. Find that repo here:

git://git.kernel.dk/liburing

io_uring is designed to be fast and scalable. I've demonstrated 1.6M 4k
IOPS from a single core on my aging test box, and on the latency front,
we're also doing extremely well. It's designed to be both async and
batching, if you wish; the application controls how to use that side.

If you want to play with io_uring, see the sample app above, the
liburing repo, or the fio io_uring engine as well.

Patches are against 5.0-rc1 (ish), and can also be found in my
'io_uring' git branch:

git://git.kernel.dk/linux-block io_uring


Since v2
- Separate fixed buffers from sqe entries. Register/unregister them
  through the new io_uring_register(2) system call
- sqe->index is now sqe->buf_index to make it clearer
- fixed buffers require sqe->flags to have IOSQE_FIXED_BUFFER set
- Add sqe field that is passed back at completion through the cqe, instead
  of passing back the original sqe index. This is more useful as it allows
  per-IO data for the lifetime of the request; ->index did not.
- Cleanup async IO punting
- Don't punt O_DIRECT writes to async handling
- Make sq thread just for polling (submissions and completions)
- Always enable sq workqueue for async offload
- Use GFP_ATOMIC for req allocation
- Fix bio_vec being an unknown type on some kconfigs
- New IORING_OP_FSYNC implementation
- Add fixed fileset support through io_uring_register(2)
- Integrate workqueue support into main patchset
- Fix io_sq_thread() logic for when to grab current->mm
- Fix io_sq_thread() off-by-one
- Improve polling performance for multiple files in an io_uring context
- Have CONFIG_IO_URING select ANON_INODES
- Don't make io_kiocb->ki_flags atomic
- Be fully consistent in naming; for some reason we used the same
  mess that aio.c is, where io_kiocb, kiocb and iocb are used interchangeably.
  'req' is now always io_kiocb, 'kiocb' is always kiocb.
- Rename KIOCB_F_* flags as they are req flags, REQ_F_*.


 Documentation/filesystems/vfs.txt      |    3 +
 arch/x86/entry/syscalls/syscall_64.tbl |    3 +
 block/bio.c                            |   59 +-
 fs/Makefile                            |    1 +
 fs/block_dev.c                         |   19 +-
 fs/file.c                              |   15 +-
 fs/file_table.c                        |    9 +-
 fs/gfs2/file.c                         |    2 +
 fs/io_uring.c                          | 2023 ++++++++++++++++++++++++
 fs/iomap.c                             |   48 +-
 fs/xfs/xfs_file.c                      |    1 +
 include/linux/bio.h                    |   14 +
 include/linux/blk_types.h              |    1 +
 include/linux/file.h                   |    2 +
 include/linux/fs.h                     |    6 +-
 include/linux/iomap.h                  |    1 +
 include/linux/syscalls.h               |    7 +
 include/uapi/linux/io_uring.h          |  147 ++
 init/Kconfig                           |    9 +
 kernel/sys_ni.c                        |    3 +
 20 files changed, 2334 insertions(+), 39 deletions(-)

-- 
Jens Axboe



* [PATCHSET v1] io_uring IO interface
@ 2019-01-08 16:56 Jens Axboe
  2019-01-08 16:56 ` [PATCH 01/16] fs: add an iopoll method to struct file_operations Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2019-01-08 16:56 UTC (permalink / raw)
  To: linux-fsdevel, linux-aio, linux-block, linux-arch; +Cc: hch, jmoyer, avi

After some arm twisting from Christoph, I finally caved and divorced the
aio-poll patches from aio/libaio itself. The io_uring interface itself
is useful and efficient, and after rebasing all the new goodies on top
of that, there was little reason to retain the aio connection.

Hence io_uring was born. This is what I previously called scqring for
aio, but now as a standalone entity. Patch #5 adds the core of this
interface, but in short, it has two main data structures:

struct io_uring_iocb {
	__u8	opcode;
	__u8	flags;
	__u16	ioprio;
	__s32	fd;
	__u64	off;
	union {
		void	*addr;
		__u64	__pad;
	};
	__u32	len;
	union {
		__kernel_rwf_t	rw_flags;
		__u32		__resv;
	};
};

struct io_uring_event {
	__u64	index;		/* what iocb this event came from */
	__s32	res;		/* result code for this event */
	__u32	flags;
};

The SQ ring is an array of indexes into an array of io_uring_iocbs,
which describe the IO to be done. The CQ ring is an array of
io_uring_events, which describe completion events. Both of these rings
are mapped into the application through mmap(2), at special magic
offsets. The application manipulates the ring directly, and then
communicates with the kernel through these two system calls:

io_uring_setup(entries, iovecs, params)
	Sets up a context for doing async IO. On success, returns a file
	descriptor that the application can mmap to gain access to the
	SQ ring, CQ ring, and io_uring_iocbs.

io_uring_enter(fd, to_submit, min_complete, flags)
	Initiates IO against the rings mapped to this fd, or waits for
	them to complete, or both. The behavior is controlled by the
	parameters passed in. If 'to_submit' is non-zero, then we'll
	try to submit new IO. If IORING_ENTER_GETEVENTS is set, the
	kernel will wait for 'min_complete' events, if they aren't
	already available.

I've started a liburing git repo for this, which contains some helpers
for doing IO without having to muck with the ring directly, setting up
an io_uring context, etc. Clone that here:

git://git.kernel.dk/liburing

In terms of usage, there's also a small test app here:

http://git.kernel.dk/cgit/fio/plain/t/io_uring.c

and if you're into fio, there's an io_uring engine included with that as
well for test purposes.

In terms of features, this has everything that the prior aio-poll
postings did. Later patches add support for polled IO, fixed buffers,
kernel side submission and polling, buffered aio, etc. This also
includes a number of bug fixes from previous postings.

Series is against 5.0-rc1, and can also be found in my io_uring branch.
For now just x86-64 has the system calls wired up, and liburing also
only supports x86-64. The latter just needs system call numbers and
reasonable read/write barrier defines to work, however.
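As an illustration of what "reasonable read/write barrier defines" can look like, here is a hedged sketch that builds them on C11 fences rather than per-architecture assembly. The macro and function names are hypothetical, not liburing's, and the sketch assumes the shared head/tail fields are accessed as C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical portable barrier defines built on C11 fences. Assumption:
 * an acquire fence is a sufficient read barrier and a release fence a
 * sufficient write barrier for this head/tail ring protocol. */
#define sketch_read_barrier()  atomic_thread_fence(memory_order_acquire)
#define sketch_write_barrier() atomic_thread_fence(memory_order_release)

/* Consumer side: load a tail the other party writes, then order the
 * subsequent ring-entry reads after that load. */
static uint32_t sketch_load_tail(_Atomic uint32_t *tail)
{
	uint32_t t = atomic_load_explicit(tail, memory_order_relaxed);

	sketch_read_barrier();
	return t;
}

/* Producer side: order all prior ring-entry stores before the new tail. */
static void sketch_store_tail(_Atomic uint32_t *tail, uint32_t t)
{
	sketch_write_barrier();
	atomic_store_explicit(tail, t, memory_order_relaxed);
}
```

A release fence before a relaxed store pairs with a relaxed load followed by an acquire fence, which is the same guarantee the explicit release/acquire accesses give in a single operation.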

 Documentation/filesystems/vfs.txt      |    3 +
 arch/x86/entry/syscalls/syscall_64.tbl |    2 +
 block/bio.c                            |   59 +-
 fs/Makefile                            |    2 +-
 fs/block_dev.c                         |   19 +-
 fs/file.c                              |   15 +-
 fs/file_table.c                        |    9 +-
 fs/gfs2/file.c                         |    2 +
 fs/io_uring.c                          | 1907 ++++++++++++++++++++++++
 fs/iomap.c                             |   48 +-
 fs/xfs/xfs_file.c                      |    1 +
 include/linux/bio.h                    |   14 +
 include/linux/blk_types.h              |    1 +
 include/linux/file.h                   |    2 +
 include/linux/fs.h                     |    6 +-
 include/linux/iomap.h                  |    1 +
 include/linux/syscalls.h               |    5 +
 include/uapi/linux/io_uring.h          |  115 ++
 kernel/sys_ni.c                        |    2 +
 19 files changed, 2173 insertions(+), 40 deletions(-)

-- 
Jens Axboe




Thread overview: 19+ messages
2019-01-12 21:29 [PATCHSET v3] io_uring IO interface Jens Axboe
2019-01-12 21:29 ` [PATCH 01/16] fs: add an iopoll method to struct file_operations Jens Axboe
2019-01-12 21:29 ` [PATCH 02/16] block: wire up block device iopoll method Jens Axboe
2019-01-12 21:29 ` [PATCH 03/16] block: add bio_set_polled() helper Jens Axboe
2019-01-12 21:29 ` [PATCH 04/16] iomap: wire up the iopoll method Jens Axboe
2019-01-12 21:30 ` [PATCH 05/16] Add io_uring IO interface Jens Axboe
2019-01-12 21:30 ` [PATCH 06/16] io_uring: add fsync support Jens Axboe
2019-01-12 21:30 ` [PATCH 07/16] io_uring: support for IO polling Jens Axboe
2019-01-12 21:30 ` [PATCH 08/16] io_uring: add submission side request cache Jens Axboe
2019-01-12 21:30 ` [PATCH 09/16] fs: add fget_many() and fput_many() Jens Axboe
2019-01-12 21:30 ` [PATCH 10/16] io_uring: use fget/fput_many() for file references Jens Axboe
2019-01-12 21:30 ` [PATCH 11/16] io_uring: batch io_kiocb allocation Jens Axboe
2019-01-12 21:30 ` [PATCH 12/16] block: implement bio helper to add iter bvec pages to bio Jens Axboe
2019-01-12 21:30 ` [PATCH 13/16] io_uring: add support for pre-mapped user IO buffers Jens Axboe
2019-01-12 21:30 ` [PATCH 14/16] io_uring: add submission polling Jens Axboe
2019-01-12 21:30 ` [PATCH 15/16] io_uring: add file registration Jens Axboe
2019-01-12 21:30 ` [PATCH 16/16] io_uring: add io_uring_event cache hit information Jens Axboe
     [not found] <20190115025531.13985-1-axboe@kernel.dk>
2019-01-15  2:55 ` [PATCH 01/16] fs: add an iopoll method to struct file_operations Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2019-01-08 16:56 [PATCHSET v1] io_uring IO interface Jens Axboe
2019-01-08 16:56 ` [PATCH 01/16] fs: add an iopoll method to struct file_operations Jens Axboe
