* [GIT PULL] io_uring changes for 5.9-rc1
@ 2020-08-02 21:41 Jens Axboe
  2020-08-03 20:48 ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread

From: Jens Axboe @ 2020-08-02 21:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: io-uring, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 9735 bytes --]

Hi Linus,

Lots of cleanups in here, hardening the code and/or making it easier to
read, and fixing bugs, but a core feature/change too: adding support for
real async buffered reads. With the latter in place, we just need async
buffered write support and we're done relying on kthreads for the fast
path. In detail:

- Cleanup how memory accounting is done on ring setup/free (Bijan)

- sq array offset calculation fixup (Dmitry)

- Consistently handle blocking off O_DIRECT submission path (me)

- Support proper async buffered reads, instead of relying on kthread
  offload for that. This uses the page waitqueue to drive retries from
  task_work, like we handle poll based retry. (me)

- IO completion optimizations (me)

- Fix race with accounting and ring fd install (me)

- Support EPOLLEXCLUSIVE (Jiufei)

- Get rid of the io_kiocb unionizing, made possible by shrinking other
  bits (Pavel)

- Completion side cleanups (Pavel)

- Cleanup REQ_F_ flags handling, and kill off many of them (Pavel)

- Request environment grabbing cleanups (Pavel)

- File and socket read/write cleanups (Pavel)

- Improve kiocb_set_rw_flags() (Pavel)

- Tons of fixes and cleanups (Pavel)

- IORING_SQ_NEED_WAKEUP clear fix (Xiaoguang)

This will throw a few merge conflicts. One is due to the IOCB_NOIO
addition that happened late in the 5.8-rc cycle, the other is due to a
change in for-5.9/block. Both are trivial to fix up; I'm attaching my
merge resolution from when I pulled it in locally.

Please pull!
The following changes since commit 4ae6dbd683860b9edc254ea8acf5e04b5ae242e5:

  io_uring: fix lockup in io_fail_links() (2020-07-24 12:51:33 -0600)

are available in the Git repository at:

  git://git.kernel.dk/linux-block.git tags/for-5.9/io_uring-20200802

for you to fetch changes up to fa15bafb71fd7a4d6018dae87cfaf890fd4ab47f:

  io_uring: flip if handling after io_setup_async_rw (2020-08-01 11:02:57 -0600)

----------------------------------------------------------------
for-5.9/io_uring-20200802
----------------------------------------------------------------
Bijan Mottahedeh (4):
      io_uring: add wrappers for memory accounting
      io_uring: rename ctx->account_mem field
      io_uring: report pinned memory usage
      io_uring: separate reporting of ring pages from registered pages

Dan Carpenter (1):
      io_uring: fix a use after free in io_async_task_func()

Dmitry Vyukov (1):
      io_uring: fix sq array offset calculation

Jens Axboe (31):
      block: provide plug based way of signaling forced no-wait semantics
      io_uring: always plug for any number of IOs
      io_uring: catch -EIO from buffered issue request failure
      io_uring: re-issue block requests that failed because of resources
      mm: allow read-ahead with IOCB_NOWAIT set
      mm: abstract out wake_page_match() from wake_page_function()
      mm: add support for async page locking
      mm: support async buffered reads in generic_file_buffered_read()
      fs: add FMODE_BUF_RASYNC
      block: flag block devices as supporting IOCB_WAITQ
      xfs: flag files as supporting buffered async reads
      btrfs: flag files as supporting buffered async reads
      mm: add kiocb_wait_page_queue_init() helper
      io_uring: support true async buffered reads, if file provides it
      Merge branch 'async-buffered.8' into for-5.9/io_uring
      io_uring: provide generic io_req_complete() helper
      io_uring: add 'io_comp_state' to struct io_submit_state
      io_uring: pass down completion state on the issue side
      io_uring: pass in completion state to appropriate issue side handlers
      io_uring: enable READ/WRITE to use deferred completions
      io_uring: use task_work for links if possible
      Merge branch 'io_uring-5.8' into for-5.9/io_uring
      io_uring: clean up io_kill_linked_timeout() locking
      Merge branch 'io_uring-5.8' into for-5.9/io_uring
      io_uring: abstract out task work running
      io_uring: use new io_req_task_work_add() helper throughout
      io_uring: only call kfree() for a non-zero pointer
      io_uring: get rid of __req_need_defer()
      io_uring: remove dead 'ctx' argument and move forward declaration
      Merge branch 'io_uring-5.8' into for-5.9/io_uring
      io_uring: don't touch 'ctx' after installing file descriptor

Jiufei Xue (2):
      io_uring: change the poll type to be 32-bits
      io_uring: use EPOLLEXCLUSIVE flag to aoid thundering herd type behavior

Pavel Begunkov (90):
      io_uring: remove setting REQ_F_MUST_PUNT in rw
      io_uring: remove REQ_F_MUST_PUNT
      io_uring: set @poll->file after @poll init
      io_uring: kill NULL checks for submit state
      io_uring: fix NULL-mm for linked reqs
      io-wq: compact io-wq flags numbers
      io-wq: return next work from ->do_work() directly
      io_uring: fix req->work corruption
      io_uring: fix punting req w/o grabbed env
      io_uring: fix feeding io-wq with uninit reqs
      io_uring: don't mark link's head for_async
      io_uring: fix missing io_grab_files()
      io_uring: fix refs underflow in io_iopoll_queue()
      io_uring: remove inflight batching in free_many()
      io_uring: dismantle req early and remove need_iter
      io_uring: batch-free linked requests as well
      io_uring: cosmetic changes for batch free
      io_uring: kill REQ_F_LINK_NEXT
      io_uring: clean up req->result setting by rw
      io_uring: do task_work_run() during iopoll
      io_uring: fix iopoll -EAGAIN handling
      io_uring: fix missing wake_up io_rw_reissue()
      io_uring: deduplicate freeing linked timeouts
      io_uring: replace find_next() out param with ret
      io_uring: kill REQ_F_TIMEOUT
      io_uring: kill REQ_F_TIMEOUT_NOSEQ
      io_uring: fix potential use after free on fallback request free
      io_uring: don't pass def into io_req_work_grab_env
      io_uring: do init work in grab_env()
      io_uring: factor out grab_env() from defer_prep()
      io_uring: do grab_env() just before punting
      io_uring: don't fail iopoll requeue without ->mm
      io_uring: fix NULL mm in io_poll_task_func()
      io_uring: simplify io_async_task_func()
      io_uring: optimise io_req_find_next() fast check
      io_uring: fix missing ->mm on exit
      io_uring: fix mis-refcounting linked timeouts
      io_uring: keep queue_sqe()'s fail path separately
      io_uring: fix lost cqe->flags
      io_uring: don't delay iopoll'ed req completion
      io_uring: fix stopping iopoll'ing too early
      io_uring: briefly loose locks while reaping events
      io_uring: partially inline io_iopoll_getevents()
      io_uring: remove nr_events arg from iopoll_check()
      io_uring: don't burn CPU for iopoll on exit
      io_uring: rename sr->msg into umsg
      io_uring: use more specific type in rcv/snd msg cp
      io_uring: extract io_sendmsg_copy_hdr()
      io_uring: replace rw->task_work with rq->task_work
      io_uring: simplify io_req_map_rw()
      io_uring: add a helper for async rw iovec prep
      io_uring: follow **iovec idiom in io_import_iovec
      io_uring: share completion list w/ per-op space
      io_uring: rename ctx->poll into ctx->iopoll
      io_uring: use inflight_entry list for iopoll'ing
      io_uring: use completion list for CQ overflow
      io_uring: add req->timeout.list
      io_uring: remove init for unused list
      io_uring: use non-intrusive list for defer
      io_uring: remove sequence from io_kiocb
      io_uring: place cflags into completion data
      io_uring: inline io_req_work_grab_env()
      io_uring: remove empty cleanup of OP_OPEN* reqs
      io_uring: alloc ->io in io_req_defer_prep()
      io_uring/io-wq: move RLIMIT_FSIZE to io-wq
      io_uring: simplify file ref tracking in submission state
      io_uring: indent left {send,recv}[msg]()
      io_uring: remove extra checks in send/recv
      io_uring: don't forget cflags in io_recv()
      io_uring: free selected-bufs if error'ed
      io_uring: move BUFFER_SELECT check into *recv[msg]
      io_uring: extract io_put_kbuf() helper
      io_uring: don't open-code recv kbuf managment
      io_uring: don't miscount pinned memory
      io_uring: return locked and pinned page accounting
      tasks: add put_task_struct_many()
      io_uring: batch put_task_struct()
      io_uring: don't do opcode prep twice
      io_uring: deduplicate io_grab_files() calls
      io_uring: mark ->work uninitialised after cleanup
      io_uring: fix missing io_queue_linked_timeout()
      io-wq: update hash bits
      io_uring: de-unionise io_kiocb
      io_uring: deduplicate __io_complete_rw()
      io_uring: fix racy overflow count reporting
      io_uring: fix stalled deferred requests
      io_uring: consolidate *_check_overflow accounting
      io_uring: get rid of atomic FAA for cq_timeouts
      fs: optimise kiocb_set_rw_flags()
      io_uring: flip if handling after io_setup_async_rw

Randy Dunlap (1):
      io_uring: fix function args for !CONFIG_NET

Xiaoguang Wang (1):
      io_uring: clear IORING_SQ_NEED_WAKEUP after executing task works

 block/blk-core.c              |    6 +
 fs/block_dev.c                |    2 +-
 fs/btrfs/file.c               |    2 +-
 fs/io-wq.c                    |   14 +-
 fs/io-wq.h                    |   11 +-
 fs/io_uring.c                 | 2588 +++++++++++++++++++++++------------------
 fs/xfs/xfs_file.c             |    2 +-
 include/linux/blkdev.h        |    1 +
 include/linux/fs.h            |   26 +-
 include/linux/pagemap.h       |   75 ++
 include/linux/sched/task.h    |    6 +
 include/uapi/linux/io_uring.h |    4 +-
 mm/filemap.c                  |  110 +-
 tools/io_uring/liburing.h     |    6 +-
 14 files changed, 1658 insertions(+), 1195 deletions(-)

-- 
Jens Axboe

[-- Attachment #2: merge.txt --]
[-- Type: text/plain, Size: 3460 bytes --]

commit 32a5169a5562db6a09a2d85164e0079913ecc227
Merge: 5fb023fb414a fa15bafb71fd
Author: Jens Axboe <axboe@kernel.dk>
Date:   Sun Aug 2 10:43:35 2020 -0600

    Merge branch 'for-5.9/io_uring' into test

    * for-5.9/io_uring: (127 commits)
      io_uring: flip if handling after io_setup_async_rw
      fs: optimise kiocb_set_rw_flags()
      io_uring: don't touch 'ctx' after installing file descriptor
      io_uring: get rid of atomic FAA for cq_timeouts
      io_uring: consolidate *_check_overflow accounting
      io_uring: fix stalled deferred requests
      io_uring: fix racy overflow count reporting
      io_uring: deduplicate __io_complete_rw()
      io_uring: de-unionise io_kiocb
      io-wq: update hash bits
      io_uring: fix missing io_queue_linked_timeout()
      io_uring: mark ->work uninitialised after cleanup
      io_uring: deduplicate io_grab_files() calls
      io_uring: don't do opcode prep twice
      io_uring: clear IORING_SQ_NEED_WAKEUP after executing task works
      io_uring: batch put_task_struct()
      tasks: add put_task_struct_many()
      io_uring: return locked and pinned page accounting
      io_uring: don't miscount pinned memory
      io_uring: don't open-code recv kbuf managment
      ...

    Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --cc block/blk-core.c
index 93104c7470e8,62a4904db921..d9d632639bd1
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@@ -956,13 -952,30 +956,18 @@@ static inline blk_status_t blk_check_zo
  	return BLK_STS_OK;
  }
  
 -static noinline_for_stack bool
 -generic_make_request_checks(struct bio *bio)
 +static noinline_for_stack bool submit_bio_checks(struct bio *bio)
  {
- 	struct request_queue *q;
- 	int nr_sectors = bio_sectors(bio);
+ 	struct request_queue *q = bio->bi_disk->queue;
  	blk_status_t status = BLK_STS_IOERR;
+ 	struct blk_plug *plug;
- 	char b[BDEVNAME_SIZE];
  
  	might_sleep();
  
 -	q = bio->bi_disk->queue;
 -	if (unlikely(!q)) {
 -		printk(KERN_ERR
 -		       "generic_make_request: Trying to access "
 -		       "nonexistent block-device %s (%Lu)\n",
 -			bio_devname(bio, b), (long long)bio->bi_iter.bi_sector);
 -		goto end_io;
 -	}
 -
+ 	plug = blk_mq_plug(q, bio);
+ 	if (plug && plug->nowait)
+ 		bio->bi_opf |= REQ_NOWAIT;
+ 
  	/*
  	 * For a REQ_NOWAIT based request, return -EOPNOTSUPP
  	 * if queue is not a request based queue.
diff --cc include/linux/fs.h
index 41cd993ec0f6,e535543d31d9..b7f1f1b7d691
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@@ -315,7 -318,8 +318,9 @@@ enum rw_hint
  #define IOCB_SYNC		(1 << 5)
  #define IOCB_WRITE		(1 << 6)
  #define IOCB_NOWAIT		(1 << 7)
+ /* iocb->ki_waitq is valid */
+ #define IOCB_WAITQ		(1 << 8)
 +#define IOCB_NOIO		(1 << 9)
  
  struct kiocb {
  	struct file *ki_filp;

diff --cc mm/filemap.c
index 385759c4ce4b,a5b1fa8f7ce4..4e39c1f4c7d9
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@@ -2028,8 -2044,6 +2044,8 @@@ find_page
  		page = find_get_page(mapping, index);
  		if (!page) {
- 			if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO))
++			if (iocb->ki_flags & IOCB_NOIO)
+ 				goto would_block;
  			page_cache_sync_readahead(mapping,
  					ra, filp,
  					index, last_index - index);
@@@ -2164,7 -2185,7 +2191,7 @@@ page_not_up_to_date_locked
  		}
  
  readpage:
- 		if (iocb->ki_flags & IOCB_NOIO) {
 -		if (iocb->ki_flags & IOCB_NOWAIT) {
++		if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) {
  			unlock_page(page);
  			put_page(page);
  			goto would_block;

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [GIT PULL] io_uring changes for 5.9-rc1
  2020-08-02 21:41 [GIT PULL] io_uring changes for 5.9-rc1 Jens Axboe
@ 2020-08-03 20:48 ` Linus Torvalds
  2020-08-03 20:53   ` Linus Torvalds
  2020-08-03 22:30   ` Jens Axboe
  0 siblings, 2 replies; 13+ messages in thread

From: Linus Torvalds @ 2020-08-03 20:48 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring, linux-kernel

On Sun, Aug 2, 2020 at 2:41 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> Lots of cleanups in here, hardening the code and/or making it easier to
> read and fixing bugs, but a core feature/change too adding support for
> real async buffered reads. With the latter in place, we just need
> buffered write async support and we're done relying on kthreads for the
> fast path. In detail:

That async buffered reads handling of the page locking flag is a mess,
and I'm really happy I committed my page locking scalability change
early, so that the conflicts there caught it.

Re-using the page bit waitqueue types and exporting them?

That part is fine, I guess, particularly since it came from the
wait_bit_key thing and has a smell of being generic.

Taking a random part of wake_page_function(), and calling it
"wake_page_match()" even though that's not at all what it does?

Not ok.

Adding random kiocb helper functions to a core header file, when they
are only used in one place, and when they only make sense in that one
place?

Not ok.

When the function is called "wake_page_match()", you'd expect it
matches the wake page information, wouldn't it?

Yeah, it did that. And then it also checked whether the bit we're
waiting on had been set again, because everybody ostensibly wanted
that. Except they don't any more, and that's not what the name really
implied anyway.

And kiocb_wait_page_queue_init() has absolutely zero business being in
<linux/filemap.h>. There are absolutely no valid uses of that thing
outside of the one place that calls it.

I tried to fix up the things I could.
That said, like a lot of io_uring code, this is some seriously opaque
code. You say you've done a lot of cleanups, but I'm not convinced
those cleanups are in any way offsetting adding yet another union (how
many bugs did the last one add?) and a magic flag of "use this part of
the union" now.

And I don't know what loads you use for testing that thing, or what
happens when the "lock_page_async()" case actually fails to lock, and
just calls back the io_async_buf_func() wakeup function when the page
has unlocked...

That function doesn't actually lock the page either, but does the task
work. I hope that work then knows to do the right thing, but it's
really opaque and hard to follow.

Anyway, I'm not entirely happy with doing these kinds of changes in
the merge resolution, but the alternative was to not do the pull at
all, and require you to do a lot of cleanups before I would pull it.
Maybe I should have done that.

So this is a slightly grumpy email about how io_uring is (a) still
making me very nervous about a very lackadaisical approach to things,
and having the codepaths so obscure that I'm not convinced it's not
horribly buggy. And (b) I fixed things up without really being able to
test them. I tested that the _normal_ paths still seem to work fine,
but..

I really think that whole thing needs a lot of comments, particularly
around the whole io_rw_should_retry() area.

A big and legible comment about how it will be caught by the
generic_file_buffered_read() page locking code, how the two cases
differ (it might get caught by the "I'm just waiting for it to be
unlocked" case, but it could *also* get caught by the "lock page now"
case), and how it continues the request.

As it is, it bounces between the generic code and very io_uring
specific code in strange and not easy to follow ways.

I've pushed out my merge of this thing, but you might also want to
take a look at commit 2a9127fcf229 ("mm: rewrite
wait_on_page_bit_common() logic").
In particular, note the comment about how there's no point in even
testing the page bit any more when you get woken up.

I left that

        if (test_bit(key->bit_nr, &key->page->flags))
                return -1;

logic in io_async_buf_func() (but it's not in "wake_page_match()" any
more), but I suspect it's bogus and pointless, for the same reason
that it isn't done for normal page waits now. Maybe it's better to
just queue the actual work regardless; it will then be caught in the
_real_ lock_page() or whatever it ends up doing - and if it only
really wants to see the "uptodate" bit being set, and was just waiting
for IO to finish, then it never really cared about the page lock bit
at all, it just wanted to be notified about IO being done.

So this was a really long email to tell you - again - that I'm not
happy with how fragile io_uring is, and how the code seems to be
almost intentionally written to *be* fragile. Complex and hard to
understand, and as a result it has had a fairly high rate of fairly
nasty bugs.

I'm hoping this isn't going to be yet another case of "nasty bugs
because of complexity and a total disregard for explaining what is
going on".

              Linus
* Re: [GIT PULL] io_uring changes for 5.9-rc1
  2020-08-03 20:48 ` Linus Torvalds
@ 2020-08-03 20:53   ` Linus Torvalds
  2020-08-03 21:10     ` Konstantin Ryabitsev
  0 siblings, 1 reply; 13+ messages in thread

From: Linus Torvalds @ 2020-08-03 20:53 UTC (permalink / raw)
To: Jens Axboe, Konstantin Ryabitsev; +Cc: io-uring, linux-kernel

On Mon, Aug 3, 2020 at 1:48 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I've pushed out my merge of this thing [..]

It seems I'm not the only one unhappy with the pull request.

For some reason I also don't see pr-tracker-bot being all happy and
excited about it. I wonder why.

              Linus
* Re: [GIT PULL] io_uring changes for 5.9-rc1
  2020-08-03 20:53 ` Linus Torvalds
@ 2020-08-03 21:10   ` Konstantin Ryabitsev
  2020-08-03 22:31     ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread

From: Konstantin Ryabitsev @ 2020-08-03 21:10 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Jens Axboe, io-uring, linux-kernel

On Mon, Aug 03, 2020 at 01:53:12PM -0700, Linus Torvalds wrote:
> On Mon, Aug 3, 2020 at 1:48 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I've pushed out my merge of this thing [..]
>
> It seems I'm not the only one unhappy with the pull request.
>
> For some reason I also don't see pr-tracker-bot being all happy and
> excited about it. I wonder why.

My guess is that it's because the body consists of two text/plain
MIME parts and Python returned the merge.txt part first, where we
didn't find what we were looking for.

I'll see if I can teach it to walk all text/plain parts looking for
magic git pull strings instead of giving up after the first one.

-K
* Re: [GIT PULL] io_uring changes for 5.9-rc1
  2020-08-03 21:10 ` Konstantin Ryabitsev
@ 2020-08-03 22:31   ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread

From: Jens Axboe @ 2020-08-03 22:31 UTC (permalink / raw)
To: Konstantin Ryabitsev, Linus Torvalds; +Cc: io-uring, linux-kernel

On 8/3/20 3:10 PM, Konstantin Ryabitsev wrote:
> On Mon, Aug 03, 2020 at 01:53:12PM -0700, Linus Torvalds wrote:
>> On Mon, Aug 3, 2020 at 1:48 PM Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>>
>>> I've pushed out my merge of this thing [..]
>>
>> It seems I'm not the only one unhappy with the pull request.
>>
>> For some reason I also don't see pr-tracker-bot being all happy and
>> excited about it. I wonder why.
>
> My guess it's because the body consists of two text/plain MIME-parts and
> Python returned the merge.txt part first, where we didn't find what we
> were looking for.
>
> I'll see if I can teach it to walk all text/plain parts looking for
> magic git pull strings instead of giving up after the first one.

Thanks, I was a bit puzzled on that one too, and this time it definitely
wasn't because the tag wasn't there.

In terms of attachments, I'm usually a fan of inlining, but it seemed
cleaner to me to attach the merge resolution, as there's already a ton
of other stuff in that email.

-- 
Jens Axboe
* Re: [GIT PULL] io_uring changes for 5.9-rc1
  2020-08-03 20:48 ` Linus Torvalds
  2020-08-03 20:53   ` Linus Torvalds
@ 2020-08-03 22:30   ` Jens Axboe
  2020-08-03 23:18     ` Jens Axboe
  1 sibling, 1 reply; 13+ messages in thread

From: Jens Axboe @ 2020-08-03 22:30 UTC (permalink / raw)
To: Linus Torvalds; +Cc: io-uring, linux-kernel

On 8/3/20 2:48 PM, Linus Torvalds wrote:
> On Sun, Aug 2, 2020 at 2:41 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> Lots of cleanups in here, hardening the code and/or making it easier to
>> read and fixing bugs, but a core feature/change too adding support for
>> real async buffered reads. With the latter in place, we just need
>> buffered write async support and we're done relying on kthreads for the
>> fast path. In detail:
>
> That async buffered reads handling of the page locking flag is a
> mess, and I'm really happy I committed my page locking scalability
> change early, so that the conflicts there caught it.
>
> Re-using the page bit waitqueue types and exporting them?
>
> That part is fine, I guess, particularly since it came from the
> wait_bit_key thing and has a smell of being generic.
>
> Taking a random part of wake_page_function(), and calling it
> "wake_page_match()" even though that's not at all what it does?
>
> Not ok.

OK, I actually thought it was kind of nice and better than having that
code duplicated in two spots.

> Adding random kiocb helper functions to a core header file, when they
> are only used in one place, and when they only make sense in that one
> place?
>
> Not ok.

I'll move that into io_uring instead.

> When the function is called "wake_page_match()", you'd expect it
> matches the wake page information, wouldn't it?
>
> Yeah, it did that. And then it also checked whether the bit we're
> waiting on had been set again, because everybody ostensibly wanted
> that. Except they don't any more, and that's not what the name really
> implied anyway.
>
> And kiocb_wait_page_queue_init() has absolutely zero business being in
> <linux/filemap.h>. There are absolutely no valid uses of that thing
> outside of the one place that calls it.
>
> I tried to fix up the things I could.

Thanks! As mentioned, I'll prep a cleanup patch that moves
kiocb_wait_page_queue_init() out of there.

> That said, like a lot of io_uring code, this is some seriously opaque
> code. You say you've done a lot of cleanups, but I'm not convinced
> those cleanups are in any way offsetting adding yet another union (how
> many bugs did the last one add?) and a magic flag of "use this part of
> the union" now.

I had to think a bit about what you are referring to here, but I guess
it's the iocb part. And yes, it's not ideal, but until we support polled
IO with buffered IO, there's no overlap in the use case. And I don't see
us ever doing that.

> And I don't know what loads you use for testing that thing, or what
> happens when the "lock_page_async()" case actually fails to lock, and
> just calls back the io_async_buf_func() wakeup function when the page
> has unlocked...
>
> That function doesn't actually lock the page either, but does the task
> work. I hope that work then knows to do the right thing, but it's
> really opaque and hard to follow.

The task work retries the whole thing, so it'll go through the normal
page cache read path again with all that that entails. We only really
ever use the callback to tell us when it's a good idea to retry again,
there's no other retained state there at all. I didn't realize that
part wasn't straightforward, so I'll add some comments as well
explaining how that code flow works.

It's seen a good amount of testing, both from myself and also from
others. The postgres IO rewrite has been putting it through its paces,
and outside of a few initial issues months ago, it's been rock solid.
> Anyway, I'm not entirely happy with doing these kinds of changes in
> the merge resolution, but the alternative was to not do the pull at
> all, and require you to do a lot of cleanups before I would pull it.
> Maybe I should have done that.
>
> So this is a slightly grumpy email about how io_uring is (a) still
> making me very nervous about a very lackadaisical approach to things,
> and having the codepaths so obscure that I'm not convinced it's not
> horribly buggy. And (b) I fixed things up without really being able to
> test them. I tested that the _normal_ paths still seem to work fine,
> but..

I need to do a better job at commenting these parts, obviously. And
while nothing is perfect, and we're definitely not perfect yet, the
general trend is definitely strongly towards getting rid of odd states
through flags and unifying more of the code, with tons of
fixes/cleanups that make things easier to read and verify...

> I really think that whole thing needs a lot of comments, particularly
> around the whole io_rw_should_retry() area.
>
> A big and legible comment about how it will be caught by the
> generic_file_buffered_read() page locking code, how the two cases
> differ (it might get caught by the "I'm just waiting for it to be
> unlocked" case, but it could *also* get caught by the "lock page now"
> case), and how it continues the request.

Noted, I'll write that up.

> As it is, it bounces between the generic code and very io_uring
> specific code in strange and not easy to follow ways.
>
> I've pushed out my merge of this thing, but you might also want to
> take a look at commit 2a9127fcf229 ("mm: rewrite
> wait_on_page_bit_common() logic"). In particular, the comment about
> how there's no point in even testing the page bit any more when you
> get woken up.
>
> I left that
>
>         if (test_bit(key->bit_nr, &key->page->flags))
>                 return -1;
>
> logic in io_async_buf_func() (but it's not in "wake_page_match()" any
> more), but I suspect it's bogus and pointless, for the same reason
> that it isn't done for normal page waits now. Maybe it's better to
> just queue the actual work regardless, it will then be caught in the
> _real_ lock_page() or whatever it ends up doing - and if it only
> really wants to see the "uptodate" bit being set, and was just waiting
> for IO to finish, then it never really cared about the page lock bit
> at all, it just wanted to be notified about IO being done.

I did just notice your rewrite commit, and I'll adjust accordingly and
test it with that too.

> So this was a really long email to tell you - again - that I'm not
> happy with how fragile io_uring is, and how the code seems to be
> almost intentionally written to *be* fragile. Complex and hard to
> understand, and as a result it has had a fairly high rate of fairly
> nasty bugs.
>
> I'm hoping this isn't going to be yet another case of "nasty bugs
> because of complexity and a total disregard for explaining what is
> going on".

Outside of the review from Johannes, lots of other people did look over
the async buffered bits, and Andrew as well said it looked good to him.
So while the task_work retry apparently isn't as obvious as I had
hoped, it's definitely not fragile or intentionally trying to be
obtuse. I'll make a few adjustments based on your feedback, and add a
patch with some comments as well. Hopefully that'll make the end result
easier to follow.

-- 
Jens Axboe
* Re: [GIT PULL] io_uring changes for 5.9-rc1
  2020-08-03 22:30 ` Jens Axboe
@ 2020-08-03 23:18   ` Jens Axboe
  2020-08-03 23:31     ` Jens Axboe
  2020-08-03 23:34     ` Linus Torvalds
  0 siblings, 2 replies; 13+ messages in thread

From: Jens Axboe @ 2020-08-03 23:18 UTC (permalink / raw)
To: Linus Torvalds; +Cc: io-uring, linux-kernel

On 8/3/20 4:30 PM, Jens Axboe wrote:
>> Adding random kiocb helper functions to a core header file, when they
>> are only used in one place, and when they only make sense in that one
>> place?
>>
>> Not ok.
>
> I'll move that into io_uring instead.

I see that you handled most of the complaints already, so thanks for
that. I've run some basic testing with master and it works for me,
and I'm running some more testing on production too.

I took a look at the rewrite you queued up, and made a matching change
on the io_uring side:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=419ebeb6f2d0d56f6b2844c0f77034d1048e37e9

and also queued a documentation patch for the retry logic and the
callback handler:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=9541a9d4791c2d31ba74b92666edd3f1efd936a8

For the latter, let me know if you're happy with the explanation, or if
you want other parts documented more thoroughly too. I'll make a pass
through the file in any case once I've flushed out the other branches
for this merge window.

-- 
Jens Axboe
* Re: [GIT PULL] io_uring changes for 5.9-rc1
  2020-08-03 23:18 ` Jens Axboe
@ 2020-08-03 23:31   ` Jens Axboe
  2020-08-03 23:49     ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread

From: Jens Axboe @ 2020-08-03 23:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: io-uring, linux-kernel

On 8/3/20 5:18 PM, Jens Axboe wrote:
> On 8/3/20 4:30 PM, Jens Axboe wrote:
>>> Adding random kiocb helper functions to a core header file, when they
>>> are only used in one place, and when they only make sense in that one
>>> place?
>>>
>>> Not ok.
>>
>> I'll move that into io_uring instead.
>
> I see that you handled most of the complaints already, so thanks for
> that. I've run some basic testing with master and it works for me,
> running some more testing on production too.
>
> I took a look at the rewrite you queued up, and made a matching change
> on the io_uring side:
>
> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=419ebeb6f2d0d56f6b2844c0f77034d1048e37e9

Updated to honor the exclusive return value as well:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=f6fd3784c9f7d3309507fdb6dcc818f54467bf3e

-- 
Jens Axboe
* Re: [GIT PULL] io_uring changes for 5.9-rc1
  2020-08-03 23:31 ` Jens Axboe
@ 2020-08-03 23:49   ` Linus Torvalds
  2020-08-03 23:56     ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread

From: Linus Torvalds @ 2020-08-03 23:49 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring, linux-kernel

On Mon, Aug 3, 2020 at 4:31 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> Updated to honor exclusive return value as well:

See my previous email. You're just adding code that makes no sense,
because your wait entry fundamentally isn't an exclusive one.

So all that code is a no-op and only makes it more confusing to read.

Your wakeup handler has _nothing_ to do with the generic
wake_page_function(). There is _zero_ overlap. Your wakeup handler
gets called only for the wait entries _you_ created.

Trying to use the wakeup logic from wake_page_function() makes no
sense, because the rules for wake_page_function() are entirely
different. Yes, they are called for the same thing (somebody unlocked
a page and is waking up waiters), but it's using a completely
different sleeping logic.

See? When wake_page_function() does that

        wait->flags |= WQ_FLAG_WOKEN;

and does something different (and returns different values) depending
on whether WQ_FLAG_EXCLUSIVE was set, that is all because the
wait_on_page_bit_common() entry set up that wait entry (on its stack)
with those exact rules in mind.

So the wakeup function is 1:1 tied to the code that registers the wait
entry. wait_on_page_bit_common() has one set of rules, that are then
honored by the wakeup function it uses. But those rules have _zero_
impact on your use. You can have - and you *do* have - different sets
of rules.

For example, none of your wakeups are ever exclusive. All you do is
make a work runnable - that doesn't mean that other people shouldn't
do other things when they get a "page was unlocked" wakeup
notification.

Also, for you "list_del_init()" is fine, because you never do the
unlocked "list_empty_careful()" on that wait entry.
All the waitqueue operations run under the queue head lock. So what I think you _should_ do is just something like this: diff --git a/fs/io_uring.c b/fs/io_uring.c index 2a3af95be4ca..1e243f99643b 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2965,10 +2965,10 @@ static int io_async_buf_func(struct wait_queue_entry *wait, unsigned mode, if (!wake_page_match(wpq, key)) return 0; - /* Stop waking things up if the page is locked again */ - if (test_bit(key->bit_nr, &key->page->flags)) - return -1; - + /* + * Somebody unlocked the page. Unqueue the wait entry + * and run the task_work + */ list_del_init(&wait->entry); init_task_work(&req->task_work, io_req_task_submit); because that matches what you're actually doing. There's no reason to stop waking up others because the page is locked, because you don't know what others want. And there's never any reason for the exclusive thing, b3ecause none of what you do guarantees that you take exclusive ownership of the page lock. Running the work *may* end up doing a "lock_page()", but you don't actually guarantee that. Linus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [GIT PULL] io_uring changes for 5.9-rc1 2020-08-03 23:49 ` Linus Torvalds @ 2020-08-03 23:56 ` Jens Axboe 2020-08-04 0:11 ` Linus Torvalds 0 siblings, 1 reply; 13+ messages in thread From: Jens Axboe @ 2020-08-03 23:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: io-uring, linux-kernel On 8/3/20 5:49 PM, Linus Torvalds wrote: > On Mon, Aug 3, 2020 at 4:31 PM Jens Axboe <axboe@kernel.dk> wrote: >> >> Updated to honor exclusive return value as well: > > See my previous email, You're just adding code that makes no sense, > because your wait entry fundamentally isn't an exclusive one. Right, I get that now, it's just dead code for my use case. It was sent out before your previous email. > So all that code is a no-op and only makes it more confusing to read. > > Your wakeup handler has _nothing_ to do with the generic > wake_page_function(). There is _zero_ overlap. Your wakeup handler > gets called only for the wait entries _you_ created. > > Trying to use the wakeup logic from wake_page_function() makes no > sense, because the rules for wake_page_function() are entirely > different. Yes, they are called for the same thing (somebody unlocked > a page and is waking up waiters), but it's using a completely > different sleeping logic. > > See? When wake_page_function() does that > > wait->flags |= WQ_FLAG_WOKEN; > > and does something different (and returns different values) depending > on whether WQ_FLAG_EXCLUSIVE was set, that is all because > wait_on_page_bit_common() entry set yo that wait entry (on its stack) > with those exact rules in mind. > > So the wakeup function is 1:1 tied to the code that registers the wait > entry. wait_on_page_bit_common() has one set of rules, that are then > honored by the wakeup function it uses. But those rules have _zero_ > impact on your use. You can have - and you *do* have - different sets > of rules. > > For example, none of your wakeups are ever exclusive. 
All you do is > make a work runnable - that doesn't mean that other people shouldn't > do other things when they get a "page was unlocked" wakeup > notification. > > Also, for you "list_del_init()" is fine, because you never do the > unlocked "list_empty_careful()" on that wait entry. All the waitqueue > operations run under the queue head lock. > > So what I think you _should_ do is just something like this:
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 2a3af95be4ca..1e243f99643b 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2965,10 +2965,10 @@ static int io_async_buf_func(struct wait_queue_entry *wait, unsigned mode,
> 	if (!wake_page_match(wpq, key))
> 		return 0;
>
> -	/* Stop waking things up if the page is locked again */
> -	if (test_bit(key->bit_nr, &key->page->flags))
> -		return -1;
> -
> +	/*
> +	 * Somebody unlocked the page. Unqueue the wait entry
> +	 * and run the task_work
> +	 */
> 	list_del_init(&wait->entry);
>
> 	init_task_work(&req->task_work, io_req_task_submit);
>
> because that matches what you're actually doing. > > There's no reason to stop waking up others because the page is locked, > because you don't know what others want. > > And there's never any reason for the exclusive thing, because none of > what you do guarantees that you take exclusive ownership of the page > lock. Running the work *may* end up doing a "lock_page()", but you > don't actually guarantee that.

What I ended up with after the last email was just removing the test bit:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=cbd287c09351f1d3a4b3cb9167a2616a11390d32

and I clarified the comments on the io_async_buf_func() to add more hints on how everything is triggered instead of just a vague "handler" reference:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=c1dd91d16246b168b80af9b64c5cc35a66410455

-- Jens Axboe
* Re: [GIT PULL] io_uring changes for 5.9-rc1 2020-08-03 23:56 ` Jens Axboe @ 2020-08-04 0:11 ` Linus Torvalds 0 siblings, 0 replies; 13+ messages in thread From: Linus Torvalds @ 2020-08-04 0:11 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, linux-kernel On Mon, Aug 3, 2020 at 4:56 PM Jens Axboe <axboe@kernel.dk> wrote: > > What I ended up with after the last email was just removing the test > bit: > > https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=cbd287c09351f1d3a4b3cb9167a2616a11390d32 > > and I clarified the comments on the io_async_buf_func() to add more > hints on how everything is triggered instead of just a vague "handler" > reference: > > https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=c1dd91d16246b168b80af9b64c5cc35a66410455 These both look sensible to me now. Linus
* Re: [GIT PULL] io_uring changes for 5.9-rc1 2020-08-03 23:18 ` Jens Axboe 2020-08-03 23:31 ` Jens Axboe @ 2020-08-03 23:34 ` Linus Torvalds 2020-08-03 23:43 ` Jens Axboe 1 sibling, 1 reply; 13+ messages in thread From: Linus Torvalds @ 2020-08-03 23:34 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, linux-kernel On Mon, Aug 3, 2020 at 4:18 PM Jens Axboe <axboe@kernel.dk> wrote: > > > I took a look at the rewrite you queued up, and made a matching change > on the io_uring side: Oh, no, you made it worse. Now you're tying your odd wakeup routine to entirely irrelevant things that can't even happen to you. That io_async_buf_func() will never be called for any entry that isn't your own, so testing

	wait->flags & WQ_FLAG_EXCLUSIVE

is completely pointless, because you never set that flag. And similarly, for you to then do

	wait->flags |= WQ_FLAG_WOKEN;

is equally pointless, because the only thing that cares and looks at that wait entry is you, and you don't care about the WOKEN flag. So that patch shows a fundamental misunderstanding of how the waitqueues actually work. Which is kind of my _point_. The io_uring code that hooked into the page wait queues really looks like complete cut-and-paste voodoo programming. It needs comments. It's hard to follow. Even somebody like me, who actually knows how the page wait queues really work, has a really hard time following how io_uring initializing a wait-queue entry and pointing to it in the io ctx then interacts with the (later) generic file reading path, and how it then calls back at unlock time to the io_uring callback _if_ the page was locked. And that patch you point to makes me 100% sure you don't quite understand the code either. So when you do

	/*
	 * Only test the bit if it's an exclusive wait, as we know the
	 * bit is cleared for non-exclusive waits. Also see mm/filemap.c
	 */
	if ((wait->flags & WQ_FLAG_EXCLUSIVE) &&
	    test_and_set_bit(key->bit_nr, &key->page->flags))
		return -1;

the first test guarantees that the second test is never done, which is good, because if it *had* been done, you'd have taken the lock and nothing you have actually expects that. So the fix is to just remove those lines entirely. If somebody unlocked the page you care about, and did a wakeup on that page and bit, then you know you should start the async worker. No amount of testing bits matters at all. And similarly, the

	wait->flags |= WQ_FLAG_WOKEN;

is a no-op because nothing tests that WQ_FLAG_WOKEN bit. That wait entry is _your_ wait entry. It's not the wait entry of some normal page locker - those use wake_page_function(). Now *if* you had workers that actually expected to be woken up with the page lock already held, and owning it, then that kind of WQ_FLAG_EXCLUSIVE and WQ_FLAG_WOKEN logic would be a good idea. But that's not what you have. > and also queued a documentation patch for the retry logic and the > callback handler: > > https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=9541a9d4791c2d31ba74b92666edd3f1efd936a8 Better. Although I find the first comment a bit misleading. You say

	/* Invoked from our "page is now unlocked" handler when someone ..

but that's not really the case. The function gets called by whoever unlocks the page after you've registered that page wait entry through lock_page_async(). So there's no "our handler" anywhere, which I find misleading and confusing in the comment. Linus
* Re: [GIT PULL] io_uring changes for 5.9-rc1 2020-08-03 23:34 ` Linus Torvalds @ 2020-08-03 23:43 ` Jens Axboe 0 siblings, 0 replies; 13+ messages in thread From: Jens Axboe @ 2020-08-03 23:43 UTC (permalink / raw) To: Linus Torvalds; +Cc: io-uring, linux-kernel On 8/3/20 5:34 PM, Linus Torvalds wrote: > On Mon, Aug 3, 2020 at 4:18 PM Jens Axboe <axboe@kernel.dk> wrote: >> >> >> I took a look at the rewrite you queued up, and made a matching change >> on the io_uring side: > > Oh, no, you made it worse. > > Now you're tying your odd wakeup routine to entirely irrelevant things > that can't even happen to you. > > That io_async_buf_func() will never be called for any entry that isn't > your own, so testing > > wait->flags & WQ_FLAG_EXCLUSIVE > > is completely pointless, because you never set that flag. And > similarly, for you to then do > > wait->flags |= WQ_FLAG_WOKEN; > > is equally pointless, because the only thing that cares and looks at > that wait entry is you, and you don't care about the WOKEN flag. > > So that patch shows a fundamental misunderstanding of how the > waitqueues actually work. > > Which is kind of my _point_. The io_uring code that hooked into the > page wait queues really looks like complete cut-and-paste voodoo > programming. > > It needs comments. It's hard to follow. Even somebody like me, who > actually knows how the page wait queues really work, have a really > hard time following how io_uring initializing a wait-queue entry and > pointing to it in the io ctx then interacts with the (later) generic > file reading path, and how it then calls back at unlock time to the > io_uring callback _if_ the page was locked. > > And that patch you point to makes me 100% sure you don't quite > understand the code either. > > So when you do > > /* > * Only test the bit if it's an exclusive wait, as we know the > * bit is cleared for non-exclusive waits. 
Also see mm/filemap.c > */ > if ((wait->flags & WQ_FLAG_EXCLUSIVE) && > test_and_set_bit(key->bit_nr, &key->page->flags)) > return -1; > > the first test guarantees that the second test is never done, which is > good, because if it *had* been done, you'd have taken the lock and > nothing you have actually expects that. > > So the fix is to just remove those lines entirely. If somebody > unlocked the page you care about, and did a wakeup on that page and > bit, then you know you should start the async worker. No amount of > testing bits matters at all. > > And similarly, the > > wait->flags |= WQ_FLAG_WOKEN; > > is a no-op because nothing tests that WQ_FLAG_WOKEN bit. That wait > entry is _your_ wait entry. It's not the wait entry of some normal > page locker - those use wake_page_function(). > > Now *if* you had workers that actually expected to be woken up with > the page lock already held, and owning it, then that kind of > WQ_FLAG_EXCLUSIVE and WQ_FLAG_WOKEN logic would be a good idea. But > that's not what you have. Yes, looks like I was a bit too trigger happy without grokking the whole thing, and got it mixed up with the broader, more generic waitqueue cases. Thanks for clueing me in; I've updated the patch so the use case is in line with only what io_uring is doing here. >> and also queued a documentation patch for the retry logic and the >> callback handler: >> >> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=9541a9d4791c2d31ba74b92666edd3f1efd936a8 > Better. Although I find the first comment a bit misleading. > > You say > > /* Invoked from our "page is now unlocked" handler when someone .. > > but that's not really the case. The function gets called by whoever > unlocks the page after you've registered that page wait entry through > lock_page_async(). > > So there's no "our handler" anywhere, which I find misleading and > confusing in the comment.
The 'handler' refers to the io_uring waitqueue callback, but I should probably spell that out. I'll adjust it. -- Jens Axboe
end of thread, other threads:[~2020-08-04 0:12 UTC | newest] Thread overview: 13+ messages -- 2020-08-02 21:41 [GIT PULL] io_uring changes for 5.9-rc1 Jens Axboe 2020-08-03 20:48 ` Linus Torvalds 2020-08-03 20:53 ` Linus Torvalds 2020-08-03 21:10 ` Konstantin Ryabitsev 2020-08-03 22:31 ` Jens Axboe 2020-08-03 22:30 ` Jens Axboe 2020-08-03 23:18 ` Jens Axboe 2020-08-03 23:31 ` Jens Axboe 2020-08-03 23:49 ` Linus Torvalds 2020-08-03 23:56 ` Jens Axboe 2020-08-04 0:11 ` Linus Torvalds 2020-08-03 23:34 ` Linus Torvalds 2020-08-03 23:43 ` Jens Axboe