linux-kernel.vger.kernel.org archive mirror
* [PATCH RFC 0/9] nxt propagation + locking optimisation
@ 2020-03-01 16:18 Pavel Begunkov
  2020-03-01 16:18 ` [PATCH 1/9] io_uring: clean up io_close Pavel Begunkov
                   ` (10 more replies)
  0 siblings, 11 replies; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 16:18 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

There are several independent parts in the patchset, but they are
bundled together to make a point.
1-2: random stuff that is implicitly used later
3-5: restore @nxt propagation
6-8: optimise locking in io_worker_handle_work()
9: optimise io_uring refcounting

The @nxt propagation bits are done similarly to how it was done before, but:
- nxt stealing is now done at the top level, not hidden in the handlers
(see the sketch below)
- requests with a racy async counterpart are skipped using the new
REQ_F_DONT_STEAL_NEXT flag

[6-8] is the reason the previous @nxt propagation approach was dismissed:
I didn't find a good way to do the same on top of it, even though the old
approach looked clearer and didn't need a new flag.
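
To make the top-level stealing idea concrete, here is a minimal user-space
sketch (plain C; the work struct and names are simplified stand-ins for
io_wq_work, not the kernel API): the handler swaps *workptr to a dependent
work item, and the top-level loop keeps running as long as the pointer
changes.

#include <stdio.h>
#include <stddef.h>

struct work {
        struct work *next;              /* dependent (linked) work, if any */
        void (*func)(struct work **);
        const char *name;
};

static void do_work(struct work **workptr)
{
        struct work *w = *workptr;

        printf("running %s\n", w->name);
        if (w->next)
                *workptr = w->next;     /* hand the dependent work back up */
}

/* top-level loop: it, not the handler, decides to run the next work */
static void handle_work(struct work *work)
{
        do {
                struct work *old_work = work;

                work->func(&work);
                /* unchanged pointer means no dependent work was produced */
                work = (work == old_work) ? NULL : work;
        } while (work);
}

int main(void)
{
        struct work b = { NULL, do_work, "b" };
        struct work a = { &b, do_work, "a" };

        handle_work(&a);        /* runs "a", then steals and runs "b" */
        return 0;
}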

Performance was tested with a link of nops + IOSQE_ASYNC:

link size: 100
orig:  501 (ns per nop)
0-8:   446
0-9:   416

link size: 10
orig:  826
0-8:   776
0-9:   756

Pavel Begunkov (9):
  io_uring: clean up io_close
  io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation
  io_uring: make submission ref putting consistent
  io_uring: remove @nxt from handlers
  io_uring: get next req on subm ref drop
  io-wq: shuffle io_worker_handle_work() code
  io-wq: io_worker_handle_work() optimise locking
  io-wq: optimise double lock for io_get_next_work()
  io_uring: pass submission ref to async

 fs/io-wq.c    | 162 ++++++++++++----------
 fs/io_uring.c | 366 ++++++++++++++++++++++----------------------------
 2 files changed, 258 insertions(+), 270 deletions(-)

-- 
2.24.0



* [PATCH 1/9] io_uring: clean up io_close
  2020-03-01 16:18 [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
@ 2020-03-01 16:18 ` Pavel Begunkov
  2020-03-01 16:18 ` [PATCH 2/9] io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation Pavel Begunkov
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 16:18 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

Don't abuse labels for plain and straightforward code.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index fb8fe0bd5e18..ff6cc05b86c7 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3030,8 +3030,16 @@ static int io_close(struct io_kiocb *req, struct io_kiocb **nxt,
 		return ret;
 
 	/* if the file has a flush method, be safe and punt to async */
-	if (req->close.put_file->f_op->flush && !io_wq_current_is_worker())
-		goto eagain;
+	if (req->close.put_file->f_op->flush && force_nonblock) {
+		req->work.func = io_close_finish;
+		/*
+		 * Do manual async queue here to avoid grabbing files - we don't
+		 * need the files, and it'll cause io_close_finish() to close
+		 * the file again and cause a double CQE entry for this request
+		 */
+		io_queue_async_work(req);
+		return 0;
+	}
 
 	/*
 	 * No ->flush(), safely close from here and just punt the
@@ -3039,15 +3047,6 @@ static int io_close(struct io_kiocb *req, struct io_kiocb **nxt,
 	 */
 	__io_close_finish(req, nxt);
 	return 0;
-eagain:
-	req->work.func = io_close_finish;
-	/*
-	 * Do manual async queue here to avoid grabbing files - we don't
-	 * need the files, and it'll cause io_close_finish() to close
-	 * the file again and cause a double CQE entry for this request
-	 */
-	io_queue_async_work(req);
-	return 0;
 }
 
 static int io_prep_sfr(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-- 
2.24.0



* [PATCH 2/9] io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation
  2020-03-01 16:18 [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
  2020-03-01 16:18 ` [PATCH 1/9] io_uring: clean up io_close Pavel Begunkov
@ 2020-03-01 16:18 ` Pavel Begunkov
  2020-03-02 14:24   ` Jens Axboe
  2020-03-01 16:18 ` [PATCH 3/9] io_uring: make submission ref putting consistent Pavel Begunkov
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 16:18 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

To cancel a work item, io-wq sets IO_WQ_WORK_CANCEL and executes the
callback. However, IO_WQ_WORK_NO_CANCEL works will just execute and may
return a next work, which will then be ignored and lost.

Cancel the whole link instead.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io-wq.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index a05c32df2046..f74a105ab968 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -742,6 +742,17 @@ static bool io_wq_can_queue(struct io_wqe *wqe, struct io_wqe_acct *acct,
 	return true;
 }
 
+static void io_run_cancel(struct io_wq_work *work)
+{
+	do {
+		struct io_wq_work *old_work = work;
+
+		work->flags |= IO_WQ_WORK_CANCEL;
+		work->func(&work);
+		work = (work == old_work) ? NULL : work;
+	} while (work);
+}
+
 static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work)
 {
 	struct io_wqe_acct *acct = io_work_get_acct(wqe, work);
@@ -755,8 +766,7 @@ static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work)
 	 * It's close enough to not be an issue, fork() has the same delay.
 	 */
 	if (unlikely(!io_wq_can_queue(wqe, acct, work))) {
-		work->flags |= IO_WQ_WORK_CANCEL;
-		work->func(&work);
+		io_run_cancel(work);
 		return;
 	}
 
@@ -895,8 +905,7 @@ static enum io_wq_cancel io_wqe_cancel_cb_work(struct io_wqe *wqe,
 	spin_unlock_irqrestore(&wqe->lock, flags);
 
 	if (found) {
-		work->flags |= IO_WQ_WORK_CANCEL;
-		work->func(&work);
+		io_run_cancel(work);
 		return IO_WQ_CANCEL_OK;
 	}
 
@@ -971,8 +980,7 @@ static enum io_wq_cancel io_wqe_cancel_work(struct io_wqe *wqe,
 	spin_unlock_irqrestore(&wqe->lock, flags);
 
 	if (found) {
-		work->flags |= IO_WQ_WORK_CANCEL;
-		work->func(&work);
+		io_run_cancel(work);
 		return IO_WQ_CANCEL_OK;
 	}
 
-- 
2.24.0



* [PATCH 3/9] io_uring: make submission ref putting consistent
  2020-03-01 16:18 [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
  2020-03-01 16:18 ` [PATCH 1/9] io_uring: clean up io_close Pavel Begunkov
  2020-03-01 16:18 ` [PATCH 2/9] io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation Pavel Begunkov
@ 2020-03-01 16:18 ` Pavel Begunkov
  2020-03-01 16:18 ` [PATCH 4/9] io_uring: remove @nxt from handlers Pavel Begunkov
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 16:18 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

The rule is simple: any async handler gets a submission ref and should
put it at the end. Make them all follow it, and so be more consistent.

This is a preparation patch, and as io_wq_assign_next() currently won't
ever work, it doesn't bother to use io_put_req_find_next() instead of
io_put_req().
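
As an illustration only (user-space C, a plain int instead of refcount_t,
invented names), the invariant this patch aims for: whoever ends up
running the request in async context owns the submission reference and
drops it exactly once, at the end.

#include <assert.h>
#include <stdio.h>

struct req {
        int refs;               /* stand-in for refcount_t */
};

static void put_req(struct req *r)
{
        if (--r->refs == 0)
                printf("request freed\n");
}

/* async handler: does the work first, drops the submission ref last */
static void async_handler(struct req *r)
{
        printf("do the actual work\n");
        put_req(r);             /* drop submission reference */
}

int main(void)
{
        /* assume a request holds a submission ref and a completion ref */
        struct req r = { .refs = 2 };

        async_handler(&r);      /* submission ref goes away here */
        put_req(&r);            /* completion side drops the last ref */
        assert(r.refs == 0);
        return 0;
}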

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ff6cc05b86c7..ad8046a9bc0f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2550,7 +2550,7 @@ static bool io_req_cancelled(struct io_kiocb *req)
 	if (req->work.flags & IO_WQ_WORK_CANCEL) {
 		req_set_fail_links(req);
 		io_cqring_add_event(req, -ECANCELED);
-		io_put_req(req);
+		io_double_put_req(req);
 		return true;
 	}
 
@@ -2600,6 +2600,7 @@ static void io_fsync_finish(struct io_wq_work **workptr)
 	if (io_req_cancelled(req))
 		return;
 	__io_fsync(req, &nxt);
+	io_put_req(req); /* drop submission reference */
 	if (nxt)
 		io_wq_assign_next(workptr, nxt);
 }
@@ -2609,7 +2610,6 @@ static int io_fsync(struct io_kiocb *req, struct io_kiocb **nxt,
 {
 	/* fsync always requires a blocking context */
 	if (force_nonblock) {
-		io_put_req(req);
 		req->work.func = io_fsync_finish;
 		return -EAGAIN;
 	}
@@ -2621,9 +2621,6 @@ static void __io_fallocate(struct io_kiocb *req, struct io_kiocb **nxt)
 {
 	int ret;
 
-	if (io_req_cancelled(req))
-		return;
-
 	ret = vfs_fallocate(req->file, req->sync.mode, req->sync.off,
 				req->sync.len);
 	if (ret < 0)
@@ -2637,7 +2634,10 @@ static void io_fallocate_finish(struct io_wq_work **workptr)
 	struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
 	struct io_kiocb *nxt = NULL;
 
+	if (io_req_cancelled(req))
+		return;
 	__io_fallocate(req, &nxt);
+	io_put_req(req); /* drop submission reference */
 	if (nxt)
 		io_wq_assign_next(workptr, nxt);
 }
@@ -2659,7 +2659,6 @@ static int io_fallocate(struct io_kiocb *req, struct io_kiocb **nxt,
 {
 	/* fallocate always requiring blocking context */
 	if (force_nonblock) {
-		io_put_req(req);
 		req->work.func = io_fallocate_finish;
 		return -EAGAIN;
 	}
@@ -3015,6 +3014,7 @@ static void io_close_finish(struct io_wq_work **workptr)
 
 	/* not cancellable, don't do io_req_cancelled() */
 	__io_close_finish(req, &nxt);
+	io_put_req(req); /* drop submission reference */
 	if (nxt)
 		io_wq_assign_next(workptr, nxt);
 }
@@ -3038,6 +3038,8 @@ static int io_close(struct io_kiocb *req, struct io_kiocb **nxt,
 		 * the file again and cause a double CQE entry for this request
 		 */
 		io_queue_async_work(req);
+		/* submission ref will be dropped, take it for async */
+		refcount_inc_not_zero(&req->refs);
 		return 0;
 	}
 
@@ -3088,6 +3090,7 @@ static void io_sync_file_range_finish(struct io_wq_work **workptr)
 	if (io_req_cancelled(req))
 		return;
 	__io_sync_file_range(req, &nxt);
+	io_put_req(req); /* put submission ref */
 	if (nxt)
 		io_wq_assign_next(workptr, nxt);
 }
@@ -3097,7 +3100,6 @@ static int io_sync_file_range(struct io_kiocb *req, struct io_kiocb **nxt,
 {
 	/* sync_file_range always requires a blocking context */
 	if (force_nonblock) {
-		io_put_req(req);
 		req->work.func = io_sync_file_range_finish;
 		return -EAGAIN;
 	}
@@ -3464,11 +3466,10 @@ static void io_accept_finish(struct io_wq_work **workptr)
 	struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
 	struct io_kiocb *nxt = NULL;
 
-	io_put_req(req);
-
 	if (io_req_cancelled(req))
 		return;
 	__io_accept(req, &nxt, false);
+	io_put_req(req); /* drop submission reference */
 	if (nxt)
 		io_wq_assign_next(workptr, nxt);
 }
@@ -4733,17 +4734,14 @@ static void io_wq_submit_work(struct io_wq_work **workptr)
 		} while (1);
 	}
 
-	/* drop submission reference */
-	io_put_req(req);
-
 	if (ret) {
 		req_set_fail_links(req);
 		io_cqring_add_event(req, ret);
 		io_put_req(req);
 	}
 
-	/* if a dependent link is ready, pass it back */
-	if (!ret && nxt)
+	io_put_req(req); /* drop submission reference */
+	if (nxt)
 		io_wq_assign_next(workptr, nxt);
 }
 
-- 
2.24.0



* [PATCH 4/9] io_uring: remove @nxt from handlers
  2020-03-01 16:18 [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
                   ` (2 preceding siblings ...)
  2020-03-01 16:18 ` [PATCH 3/9] io_uring: make submission ref putting consistent Pavel Begunkov
@ 2020-03-01 16:18 ` Pavel Begunkov
  2020-03-01 16:18 ` [PATCH 5/9] io_uring: get next req on subm ref drop Pavel Begunkov
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 16:18 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

There will be no use for @nxt in the handlers, and it doesn't work
anyway, so purge it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 204 +++++++++++++++++++++-----------------------------
 1 file changed, 86 insertions(+), 118 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ad8046a9bc0f..daf7c2095523 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1804,17 +1804,6 @@ static void io_complete_rw(struct kiocb *kiocb, long res, long res2)
 	io_put_req(req);
 }
 
-static struct io_kiocb *__io_complete_rw(struct kiocb *kiocb, long res)
-{
-	struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb);
-	struct io_kiocb *nxt = NULL;
-
-	io_complete_rw_common(kiocb, res);
-	io_put_req_find_next(req, &nxt);
-
-	return nxt;
-}
-
 static void io_complete_rw_iopoll(struct kiocb *kiocb, long res, long res2)
 {
 	struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb);
@@ -2009,14 +1998,14 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
 	}
 }
 
-static void kiocb_done(struct kiocb *kiocb, ssize_t ret, struct io_kiocb **nxt)
+static void kiocb_done(struct kiocb *kiocb, ssize_t ret)
 {
 	struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb);
 
 	if (req->flags & REQ_F_CUR_POS)
 		req->file->f_pos = kiocb->ki_pos;
 	if (ret >= 0 && kiocb->ki_complete == io_complete_rw)
-		*nxt = __io_complete_rw(kiocb, ret);
+		io_complete_rw(kiocb, ret, 0);
 	else
 		io_rw_done(kiocb, ret);
 }
@@ -2265,8 +2254,7 @@ static int io_read_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	return 0;
 }
 
-static int io_read(struct io_kiocb *req, struct io_kiocb **nxt,
-		   bool force_nonblock)
+static int io_read(struct io_kiocb *req, bool force_nonblock)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct kiocb *kiocb = &req->rw.kiocb;
@@ -2306,7 +2294,7 @@ static int io_read(struct io_kiocb *req, struct io_kiocb **nxt,
 
 		/* Catch -EAGAIN return for forced non-blocking submission */
 		if (!force_nonblock || ret2 != -EAGAIN) {
-			kiocb_done(kiocb, ret2, nxt);
+			kiocb_done(kiocb, ret2);
 		} else {
 copy_iov:
 			ret = io_setup_async_rw(req, io_size, iovec,
@@ -2355,8 +2343,7 @@ static int io_write_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	return 0;
 }
 
-static int io_write(struct io_kiocb *req, struct io_kiocb **nxt,
-		    bool force_nonblock)
+static int io_write(struct io_kiocb *req, bool force_nonblock)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct kiocb *kiocb = &req->rw.kiocb;
@@ -2420,7 +2407,7 @@ static int io_write(struct io_kiocb *req, struct io_kiocb **nxt,
 		if (ret2 == -EOPNOTSUPP && (kiocb->ki_flags & IOCB_NOWAIT))
 			ret2 = -EAGAIN;
 		if (!force_nonblock || ret2 != -EAGAIN) {
-			kiocb_done(kiocb, ret2, nxt);
+			kiocb_done(kiocb, ret2);
 		} else {
 copy_iov:
 			ret = io_setup_async_rw(req, io_size, iovec,
@@ -2477,8 +2464,7 @@ static bool io_splice_punt(struct file *file)
 	return !(file->f_mode & O_NONBLOCK);
 }
 
-static int io_splice(struct io_kiocb *req, struct io_kiocb **nxt,
-		     bool force_nonblock)
+static int io_splice(struct io_kiocb *req, bool force_nonblock)
 {
 	struct io_splice *sp = &req->splice;
 	struct file *in = sp->file_in;
@@ -2505,7 +2491,7 @@ static int io_splice(struct io_kiocb *req, struct io_kiocb **nxt,
 	io_cqring_add_event(req, ret);
 	if (ret != sp->len)
 		req_set_fail_links(req);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 	return 0;
 }
 
@@ -2578,7 +2564,7 @@ static void io_wq_assign_next(struct io_wq_work **workptr, struct io_kiocb *nxt)
 	}
 }
 
-static void __io_fsync(struct io_kiocb *req, struct io_kiocb **nxt)
+static void __io_fsync(struct io_kiocb *req)
 {
 	loff_t end = req->sync.off + req->sync.len;
 	int ret;
@@ -2589,7 +2575,7 @@ static void __io_fsync(struct io_kiocb *req, struct io_kiocb **nxt)
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_cqring_add_event(req, ret);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 }
 
 static void io_fsync_finish(struct io_wq_work **workptr)
@@ -2599,25 +2585,24 @@ static void io_fsync_finish(struct io_wq_work **workptr)
 
 	if (io_req_cancelled(req))
 		return;
-	__io_fsync(req, &nxt);
+	__io_fsync(req);
 	io_put_req(req); /* drop submission reference */
 	if (nxt)
 		io_wq_assign_next(workptr, nxt);
 }
 
-static int io_fsync(struct io_kiocb *req, struct io_kiocb **nxt,
-		    bool force_nonblock)
+static int io_fsync(struct io_kiocb *req, bool force_nonblock)
 {
 	/* fsync always requires a blocking context */
 	if (force_nonblock) {
 		req->work.func = io_fsync_finish;
 		return -EAGAIN;
 	}
-	__io_fsync(req, nxt);
+	__io_fsync(req);
 	return 0;
 }
 
-static void __io_fallocate(struct io_kiocb *req, struct io_kiocb **nxt)
+static void __io_fallocate(struct io_kiocb *req)
 {
 	int ret;
 
@@ -2626,7 +2611,7 @@ static void __io_fallocate(struct io_kiocb *req, struct io_kiocb **nxt)
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_cqring_add_event(req, ret);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 }
 
 static void io_fallocate_finish(struct io_wq_work **workptr)
@@ -2636,7 +2621,7 @@ static void io_fallocate_finish(struct io_wq_work **workptr)
 
 	if (io_req_cancelled(req))
 		return;
-	__io_fallocate(req, &nxt);
+	__io_fallocate(req);
 	io_put_req(req); /* drop submission reference */
 	if (nxt)
 		io_wq_assign_next(workptr, nxt);
@@ -2654,8 +2639,7 @@ static int io_fallocate_prep(struct io_kiocb *req,
 	return 0;
 }
 
-static int io_fallocate(struct io_kiocb *req, struct io_kiocb **nxt,
-			bool force_nonblock)
+static int io_fallocate(struct io_kiocb *req, bool force_nonblock)
 {
 	/* fallocate always requiring blocking context */
 	if (force_nonblock) {
@@ -2663,7 +2647,7 @@ static int io_fallocate(struct io_kiocb *req, struct io_kiocb **nxt,
 		return -EAGAIN;
 	}
 
-	__io_fallocate(req, nxt);
+	__io_fallocate(req);
 	return 0;
 }
 
@@ -2736,8 +2720,7 @@ static int io_openat2_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static int io_openat2(struct io_kiocb *req, struct io_kiocb **nxt,
-		      bool force_nonblock)
+static int io_openat2(struct io_kiocb *req, bool force_nonblock)
 {
 	struct open_flags op;
 	struct file *file;
@@ -2768,15 +2751,14 @@ static int io_openat2(struct io_kiocb *req, struct io_kiocb **nxt,
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_cqring_add_event(req, ret);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 	return 0;
 }
 
-static int io_openat(struct io_kiocb *req, struct io_kiocb **nxt,
-		     bool force_nonblock)
+static int io_openat(struct io_kiocb *req, bool force_nonblock)
 {
 	req->open.how = build_open_how(req->open.how.flags, req->open.how.mode);
-	return io_openat2(req, nxt, force_nonblock);
+	return io_openat2(req, force_nonblock);
 }
 
 static int io_epoll_ctl_prep(struct io_kiocb *req,
@@ -2804,8 +2786,7 @@ static int io_epoll_ctl_prep(struct io_kiocb *req,
 #endif
 }
 
-static int io_epoll_ctl(struct io_kiocb *req, struct io_kiocb **nxt,
-			bool force_nonblock)
+static int io_epoll_ctl(struct io_kiocb *req, bool force_nonblock)
 {
 #if defined(CONFIG_EPOLL)
 	struct io_epoll *ie = &req->epoll;
@@ -2818,7 +2799,7 @@ static int io_epoll_ctl(struct io_kiocb *req, struct io_kiocb **nxt,
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_cqring_add_event(req, ret);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 	return 0;
 #else
 	return -EOPNOTSUPP;
@@ -2840,8 +2821,7 @@ static int io_madvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 #endif
 }
 
-static int io_madvise(struct io_kiocb *req, struct io_kiocb **nxt,
-		      bool force_nonblock)
+static int io_madvise(struct io_kiocb *req, bool force_nonblock)
 {
 #if defined(CONFIG_ADVISE_SYSCALLS) && defined(CONFIG_MMU)
 	struct io_madvise *ma = &req->madvise;
@@ -2854,7 +2834,7 @@ static int io_madvise(struct io_kiocb *req, struct io_kiocb **nxt,
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_cqring_add_event(req, ret);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 	return 0;
 #else
 	return -EOPNOTSUPP;
@@ -2872,8 +2852,7 @@ static int io_fadvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static int io_fadvise(struct io_kiocb *req, struct io_kiocb **nxt,
-		      bool force_nonblock)
+static int io_fadvise(struct io_kiocb *req, bool force_nonblock)
 {
 	struct io_fadvise *fa = &req->fadvise;
 	int ret;
@@ -2893,7 +2872,7 @@ static int io_fadvise(struct io_kiocb *req, struct io_kiocb **nxt,
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_cqring_add_event(req, ret);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 	return 0;
 }
 
@@ -2930,8 +2909,7 @@ static int io_statx_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static int io_statx(struct io_kiocb *req, struct io_kiocb **nxt,
-		    bool force_nonblock)
+static int io_statx(struct io_kiocb *req, bool force_nonblock)
 {
 	struct io_open *ctx = &req->open;
 	unsigned lookup_flags;
@@ -2968,7 +2946,7 @@ static int io_statx(struct io_kiocb *req, struct io_kiocb **nxt,
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_cqring_add_event(req, ret);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 	return 0;
 }
 
@@ -2995,7 +2973,7 @@ static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 }
 
 /* only called when __close_fd_get_file() is done */
-static void __io_close_finish(struct io_kiocb *req, struct io_kiocb **nxt)
+static void __io_close_finish(struct io_kiocb *req)
 {
 	int ret;
 
@@ -3004,7 +2982,7 @@ static void __io_close_finish(struct io_kiocb *req, struct io_kiocb **nxt)
 		req_set_fail_links(req);
 	io_cqring_add_event(req, ret);
 	fput(req->close.put_file);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 }
 
 static void io_close_finish(struct io_wq_work **workptr)
@@ -3013,14 +2991,13 @@ static void io_close_finish(struct io_wq_work **workptr)
 	struct io_kiocb *nxt = NULL;
 
 	/* not cancellable, don't do io_req_cancelled() */
-	__io_close_finish(req, &nxt);
+	__io_close_finish(req);
 	io_put_req(req); /* drop submission reference */
 	if (nxt)
 		io_wq_assign_next(workptr, nxt);
 }
 
-static int io_close(struct io_kiocb *req, struct io_kiocb **nxt,
-		    bool force_nonblock)
+static int io_close(struct io_kiocb *req, bool force_nonblock)
 {
 	int ret;
 
@@ -3047,7 +3024,7 @@ static int io_close(struct io_kiocb *req, struct io_kiocb **nxt,
 	 * No ->flush(), safely close from here and just punt the
 	 * fput() to async context.
 	 */
-	__io_close_finish(req, nxt);
+	__io_close_finish(req);
 	return 0;
 }
 
@@ -3069,7 +3046,7 @@ static int io_prep_sfr(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static void __io_sync_file_range(struct io_kiocb *req, struct io_kiocb **nxt)
+static void __io_sync_file_range(struct io_kiocb *req)
 {
 	int ret;
 
@@ -3078,7 +3055,7 @@ static void __io_sync_file_range(struct io_kiocb *req, struct io_kiocb **nxt)
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_cqring_add_event(req, ret);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 }
 
 
@@ -3089,14 +3066,13 @@ static void io_sync_file_range_finish(struct io_wq_work **workptr)
 
 	if (io_req_cancelled(req))
 		return;
-	__io_sync_file_range(req, &nxt);
+	__io_sync_file_range(req);
 	io_put_req(req); /* put submission ref */
 	if (nxt)
 		io_wq_assign_next(workptr, nxt);
 }
 
-static int io_sync_file_range(struct io_kiocb *req, struct io_kiocb **nxt,
-			      bool force_nonblock)
+static int io_sync_file_range(struct io_kiocb *req, bool force_nonblock)
 {
 	/* sync_file_range always requires a blocking context */
 	if (force_nonblock) {
@@ -3104,7 +3080,7 @@ static int io_sync_file_range(struct io_kiocb *req, struct io_kiocb **nxt,
 		return -EAGAIN;
 	}
 
-	__io_sync_file_range(req, nxt);
+	__io_sync_file_range(req);
 	return 0;
 }
 
@@ -3156,8 +3132,7 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 #endif
 }
 
-static int io_sendmsg(struct io_kiocb *req, struct io_kiocb **nxt,
-		      bool force_nonblock)
+static int io_sendmsg(struct io_kiocb *req, bool force_nonblock)
 {
 #if defined(CONFIG_NET)
 	struct io_async_msghdr *kmsg = NULL;
@@ -3211,15 +3186,14 @@ static int io_sendmsg(struct io_kiocb *req, struct io_kiocb **nxt,
 	io_cqring_add_event(req, ret);
 	if (ret < 0)
 		req_set_fail_links(req);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 	return 0;
 #else
 	return -EOPNOTSUPP;
 #endif
 }
 
-static int io_send(struct io_kiocb *req, struct io_kiocb **nxt,
-		   bool force_nonblock)
+static int io_send(struct io_kiocb *req, bool force_nonblock)
 {
 #if defined(CONFIG_NET)
 	struct socket *sock;
@@ -3262,7 +3236,7 @@ static int io_send(struct io_kiocb *req, struct io_kiocb **nxt,
 	io_cqring_add_event(req, ret);
 	if (ret < 0)
 		req_set_fail_links(req);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 	return 0;
 #else
 	return -EOPNOTSUPP;
@@ -3303,8 +3277,7 @@ static int io_recvmsg_prep(struct io_kiocb *req,
 #endif
 }
 
-static int io_recvmsg(struct io_kiocb *req, struct io_kiocb **nxt,
-		      bool force_nonblock)
+static int io_recvmsg(struct io_kiocb *req, bool force_nonblock)
 {
 #if defined(CONFIG_NET)
 	struct io_async_msghdr *kmsg = NULL;
@@ -3360,15 +3333,14 @@ static int io_recvmsg(struct io_kiocb *req, struct io_kiocb **nxt,
 	io_cqring_add_event(req, ret);
 	if (ret < 0)
 		req_set_fail_links(req);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 	return 0;
 #else
 	return -EOPNOTSUPP;
 #endif
 }
 
-static int io_recv(struct io_kiocb *req, struct io_kiocb **nxt,
-		   bool force_nonblock)
+static int io_recv(struct io_kiocb *req, bool force_nonblock)
 {
 #if defined(CONFIG_NET)
 	struct socket *sock;
@@ -3412,7 +3384,7 @@ static int io_recv(struct io_kiocb *req, struct io_kiocb **nxt,
 	io_cqring_add_event(req, ret);
 	if (ret < 0)
 		req_set_fail_links(req);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 	return 0;
 #else
 	return -EOPNOTSUPP;
@@ -3440,8 +3412,7 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 }
 
 #if defined(CONFIG_NET)
-static int __io_accept(struct io_kiocb *req, struct io_kiocb **nxt,
-		       bool force_nonblock)
+static int __io_accept(struct io_kiocb *req, bool force_nonblock)
 {
 	struct io_accept *accept = &req->accept;
 	unsigned file_flags;
@@ -3457,7 +3428,7 @@ static int __io_accept(struct io_kiocb *req, struct io_kiocb **nxt,
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_cqring_add_event(req, ret);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 	return 0;
 }
 
@@ -3468,20 +3439,19 @@ static void io_accept_finish(struct io_wq_work **workptr)
 
 	if (io_req_cancelled(req))
 		return;
-	__io_accept(req, &nxt, false);
+	__io_accept(req, false);
 	io_put_req(req); /* drop submission reference */
 	if (nxt)
 		io_wq_assign_next(workptr, nxt);
 }
 #endif
 
-static int io_accept(struct io_kiocb *req, struct io_kiocb **nxt,
-		     bool force_nonblock)
+static int io_accept(struct io_kiocb *req, bool force_nonblock)
 {
 #if defined(CONFIG_NET)
 	int ret;
 
-	ret = __io_accept(req, nxt, force_nonblock);
+	ret = __io_accept(req, force_nonblock);
 	if (ret == -EAGAIN && force_nonblock) {
 		req->work.func = io_accept_finish;
 		return -EAGAIN;
@@ -3516,8 +3486,7 @@ static int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 #endif
 }
 
-static int io_connect(struct io_kiocb *req, struct io_kiocb **nxt,
-		      bool force_nonblock)
+static int io_connect(struct io_kiocb *req, bool force_nonblock)
 {
 #if defined(CONFIG_NET)
 	struct io_async_ctx __io, *io;
@@ -3555,7 +3524,7 @@ static int io_connect(struct io_kiocb *req, struct io_kiocb **nxt,
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_cqring_add_event(req, ret);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 	return 0;
 #else
 	return -EOPNOTSUPP;
@@ -3951,7 +3920,7 @@ static int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
 	return 0;
 }
 
-static int io_poll_add(struct io_kiocb *req, struct io_kiocb **nxt)
+static int io_poll_add(struct io_kiocb *req)
 {
 	struct io_poll_iocb *poll = &req->poll;
 	struct io_ring_ctx *ctx = req->ctx;
@@ -3973,7 +3942,7 @@ static int io_poll_add(struct io_kiocb *req, struct io_kiocb **nxt)
 
 	if (mask) {
 		io_cqring_ev_posted(ctx);
-		io_put_req_find_next(req, nxt);
+		io_put_req(req);
 	}
 	return ipt.error;
 }
@@ -4222,7 +4191,7 @@ static int io_async_cancel_one(struct io_ring_ctx *ctx, void *sqe_addr)
 
 static void io_async_find_and_cancel(struct io_ring_ctx *ctx,
 				     struct io_kiocb *req, __u64 sqe_addr,
-				     struct io_kiocb **nxt, int success_ret)
+				     int success_ret)
 {
 	unsigned long flags;
 	int ret;
@@ -4248,7 +4217,7 @@ static void io_async_find_and_cancel(struct io_ring_ctx *ctx,
 
 	if (ret < 0)
 		req_set_fail_links(req);
-	io_put_req_find_next(req, nxt);
+	io_put_req(req);
 }
 
 static int io_async_cancel_prep(struct io_kiocb *req,
@@ -4264,11 +4233,11 @@ static int io_async_cancel_prep(struct io_kiocb *req,
 	return 0;
 }
 
-static int io_async_cancel(struct io_kiocb *req, struct io_kiocb **nxt)
+static int io_async_cancel(struct io_kiocb *req)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 
-	io_async_find_and_cancel(ctx, req, req->cancel.addr, nxt, 0);
+	io_async_find_and_cancel(ctx, req, req->cancel.addr, 0);
 	return 0;
 }
 
@@ -4475,7 +4444,7 @@ static void io_cleanup_req(struct io_kiocb *req)
 }
 
 static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
-			struct io_kiocb **nxt, bool force_nonblock)
+			bool force_nonblock)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
@@ -4492,7 +4461,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret < 0)
 				break;
 		}
-		ret = io_read(req, nxt, force_nonblock);
+		ret = io_read(req, force_nonblock);
 		break;
 	case IORING_OP_WRITEV:
 	case IORING_OP_WRITE_FIXED:
@@ -4502,7 +4471,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret < 0)
 				break;
 		}
-		ret = io_write(req, nxt, force_nonblock);
+		ret = io_write(req, force_nonblock);
 		break;
 	case IORING_OP_FSYNC:
 		if (sqe) {
@@ -4510,7 +4479,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret < 0)
 				break;
 		}
-		ret = io_fsync(req, nxt, force_nonblock);
+		ret = io_fsync(req, force_nonblock);
 		break;
 	case IORING_OP_POLL_ADD:
 		if (sqe) {
@@ -4518,7 +4487,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret)
 				break;
 		}
-		ret = io_poll_add(req, nxt);
+		ret = io_poll_add(req);
 		break;
 	case IORING_OP_POLL_REMOVE:
 		if (sqe) {
@@ -4534,7 +4503,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret < 0)
 				break;
 		}
-		ret = io_sync_file_range(req, nxt, force_nonblock);
+		ret = io_sync_file_range(req, force_nonblock);
 		break;
 	case IORING_OP_SENDMSG:
 	case IORING_OP_SEND:
@@ -4544,9 +4513,9 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 				break;
 		}
 		if (req->opcode == IORING_OP_SENDMSG)
-			ret = io_sendmsg(req, nxt, force_nonblock);
+			ret = io_sendmsg(req, force_nonblock);
 		else
-			ret = io_send(req, nxt, force_nonblock);
+			ret = io_send(req, force_nonblock);
 		break;
 	case IORING_OP_RECVMSG:
 	case IORING_OP_RECV:
@@ -4556,9 +4525,9 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 				break;
 		}
 		if (req->opcode == IORING_OP_RECVMSG)
-			ret = io_recvmsg(req, nxt, force_nonblock);
+			ret = io_recvmsg(req, force_nonblock);
 		else
-			ret = io_recv(req, nxt, force_nonblock);
+			ret = io_recv(req, force_nonblock);
 		break;
 	case IORING_OP_TIMEOUT:
 		if (sqe) {
@@ -4582,7 +4551,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret)
 				break;
 		}
-		ret = io_accept(req, nxt, force_nonblock);
+		ret = io_accept(req, force_nonblock);
 		break;
 	case IORING_OP_CONNECT:
 		if (sqe) {
@@ -4590,7 +4559,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret)
 				break;
 		}
-		ret = io_connect(req, nxt, force_nonblock);
+		ret = io_connect(req, force_nonblock);
 		break;
 	case IORING_OP_ASYNC_CANCEL:
 		if (sqe) {
@@ -4598,7 +4567,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret)
 				break;
 		}
-		ret = io_async_cancel(req, nxt);
+		ret = io_async_cancel(req);
 		break;
 	case IORING_OP_FALLOCATE:
 		if (sqe) {
@@ -4606,7 +4575,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret)
 				break;
 		}
-		ret = io_fallocate(req, nxt, force_nonblock);
+		ret = io_fallocate(req, force_nonblock);
 		break;
 	case IORING_OP_OPENAT:
 		if (sqe) {
@@ -4614,7 +4583,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret)
 				break;
 		}
-		ret = io_openat(req, nxt, force_nonblock);
+		ret = io_openat(req, force_nonblock);
 		break;
 	case IORING_OP_CLOSE:
 		if (sqe) {
@@ -4622,7 +4591,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret)
 				break;
 		}
-		ret = io_close(req, nxt, force_nonblock);
+		ret = io_close(req, force_nonblock);
 		break;
 	case IORING_OP_FILES_UPDATE:
 		if (sqe) {
@@ -4638,7 +4607,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret)
 				break;
 		}
-		ret = io_statx(req, nxt, force_nonblock);
+		ret = io_statx(req, force_nonblock);
 		break;
 	case IORING_OP_FADVISE:
 		if (sqe) {
@@ -4646,7 +4615,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret)
 				break;
 		}
-		ret = io_fadvise(req, nxt, force_nonblock);
+		ret = io_fadvise(req, force_nonblock);
 		break;
 	case IORING_OP_MADVISE:
 		if (sqe) {
@@ -4654,7 +4623,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret)
 				break;
 		}
-		ret = io_madvise(req, nxt, force_nonblock);
+		ret = io_madvise(req, force_nonblock);
 		break;
 	case IORING_OP_OPENAT2:
 		if (sqe) {
@@ -4662,7 +4631,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret)
 				break;
 		}
-		ret = io_openat2(req, nxt, force_nonblock);
+		ret = io_openat2(req, force_nonblock);
 		break;
 	case IORING_OP_EPOLL_CTL:
 		if (sqe) {
@@ -4670,7 +4639,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret)
 				break;
 		}
-		ret = io_epoll_ctl(req, nxt, force_nonblock);
+		ret = io_epoll_ctl(req, force_nonblock);
 		break;
 	case IORING_OP_SPLICE:
 		if (sqe) {
@@ -4678,7 +4647,7 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			if (ret < 0)
 				break;
 		}
-		ret = io_splice(req, nxt, force_nonblock);
+		ret = io_splice(req, force_nonblock);
 		break;
 	default:
 		ret = -EINVAL;
@@ -4722,7 +4691,7 @@ static void io_wq_submit_work(struct io_wq_work **workptr)
 
 	if (!ret) {
 		do {
-			ret = io_issue_sqe(req, NULL, &nxt, false);
+			ret = io_issue_sqe(req, NULL, false);
 			/*
 			 * We can get EAGAIN for polled IO even though we're
 			 * forcing a sync submission from here, since we can't
@@ -4868,8 +4837,7 @@ static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer)
 
 	if (prev) {
 		req_set_fail_links(prev);
-		io_async_find_and_cancel(ctx, req, prev->user_data, NULL,
-						-ETIME);
+		io_async_find_and_cancel(ctx, req, prev->user_data, -ETIME);
 		io_put_req(prev);
 	} else {
 		io_cqring_add_event(req, -ETIME);
@@ -4938,7 +4906,7 @@ static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 			old_creds = override_creds(req->work.creds);
 	}
 
-	ret = io_issue_sqe(req, sqe, &nxt, true);
+	ret = io_issue_sqe(req, sqe, true);
 
 	/*
 	 * We async punt it if the file wasn't marked NOWAIT, or if the file
-- 
2.24.0



* [PATCH 5/9] io_uring: get next req on subm ref drop
  2020-03-01 16:18 [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
                   ` (3 preceding siblings ...)
  2020-03-01 16:18 ` [PATCH 4/9] io_uring: remove @nxt from handlers Pavel Begunkov
@ 2020-03-01 16:18 ` Pavel Begunkov
  2020-03-01 21:31   ` Pavel Begunkov
  2020-03-01 16:18 ` [PATCH 6/9] io-wq: shuffle io_worker_handle_work() code Pavel Begunkov
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 16:18 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

Get the next request when dropping the submission reference. However,
if there is an asynchronous counterpart (i.e. read/write, timeout,
etc.), doing that would be dangerous, so skip such requests using the
new REQ_F_DONT_STEAL_NEXT flag.
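
A rough user-space sketch of the intended logic (simplified types, plain
ints for refcount_t and the flags field; not the kernel code itself): the
next linked request is only stolen when either the flag says it's safe or
this was the last reference anyway.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define DONT_STEAL_NEXT 0x1     /* stand-in for REQ_F_DONT_STEAL_NEXT */

struct req {
        int refs;
        unsigned flags;
        struct req *link;       /* next request in the chain */
};

static struct req *put_req_submission(struct req *req)
{
        bool last_ref = (--req->refs == 0);
        struct req *nxt = NULL;

        /* only steal the next request when it can't race with async code */
        if (!(req->flags & DONT_STEAL_NEXT) || last_ref)
                nxt = req->link;
        if (last_ref)
                printf("free request\n");
        return nxt;
}

int main(void)
{
        struct req second = { .refs = 2 };
        struct req first = { .refs = 1, .link = &second };
        struct req *nxt;

        nxt = put_req_submission(&first);
        printf("next to run: %s\n", nxt ? "second" : "none");
        return 0;
}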

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 121 ++++++++++++++++++++++++++------------------------
 1 file changed, 62 insertions(+), 59 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index daf7c2095523..d456b0ff6835 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -488,6 +488,7 @@ enum {
 	REQ_F_NEED_CLEANUP_BIT,
 	REQ_F_OVERFLOW_BIT,
 	REQ_F_POLLED_BIT,
+	REQ_F_DONT_STEAL_NEXT_BIT,
 };
 
 enum {
@@ -532,6 +533,8 @@ enum {
 	REQ_F_OVERFLOW		= BIT(REQ_F_OVERFLOW_BIT),
 	/* already went through poll handler */
 	REQ_F_POLLED		= BIT(REQ_F_POLLED_BIT),
+	/* don't try to get next req, it's async and racy */
+	REQ_F_DONT_STEAL_NEXT	= BIT(REQ_F_DONT_STEAL_NEXT_BIT),
 };
 
 struct async_poll {
@@ -1218,6 +1221,27 @@ static void io_cqring_add_event(struct io_kiocb *req, long res)
 	io_cqring_ev_posted(ctx);
 }
 
+static void io_link_work_cb(struct io_wq_work **workptr)
+{
+	struct io_wq_work *work = *workptr;
+	struct io_kiocb *link = work->data;
+
+	io_queue_linked_timeout(link);
+	io_wq_submit_work(workptr);
+}
+
+static void io_wq_assign_next(struct io_wq_work **workptr, struct io_kiocb *nxt)
+{
+	struct io_kiocb *link;
+
+	*workptr = &nxt->work;
+	link = io_prep_linked_timeout(nxt);
+	if (link) {
+		nxt->work.func = io_link_work_cb;
+		nxt->work.data = link;
+	}
+}
+
 static inline bool io_is_fallback_req(struct io_kiocb *req)
 {
 	return req == (struct io_kiocb *)
@@ -1518,17 +1542,28 @@ static void io_free_req(struct io_kiocb *req)
 		io_queue_async_work(nxt);
 }
 
-/*
- * Drop reference to request, return next in chain (if there is one) if this
- * was the last reference to this request.
- */
-__attribute__((nonnull))
-static void io_put_req_find_next(struct io_kiocb *req, struct io_kiocb **nxtptr)
+__attribute__((warn_unused_result))
+static struct io_kiocb *io_put_req_submission(struct io_kiocb *req)
 {
-	if (refcount_dec_and_test(&req->refs)) {
-		io_req_find_next(req, nxtptr);
+	bool last_ref = refcount_dec_and_test(&req->refs);
+	struct io_kiocb *nxt = NULL;
+
+	if (!(req->flags & REQ_F_DONT_STEAL_NEXT) || last_ref)
+		io_req_find_next(req, &nxt);
+	if (last_ref)
 		__io_free_req(req);
-	}
+
+	return nxt;
+}
+
+static void io_put_req_async_submission(struct io_kiocb *req,
+					struct io_wq_work **workptr)
+{
+	static struct io_kiocb *nxt;
+
+	nxt = io_put_req_submission(req);
+	if (nxt)
+		io_wq_assign_next(workptr, nxt);
 }
 
 static void io_put_req(struct io_kiocb *req)
@@ -1979,8 +2014,11 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 
 static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
 {
+	struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb);
+
 	switch (ret) {
 	case -EIOCBQUEUED:
+		req->flags |= REQ_F_DONT_STEAL_NEXT;
 		break;
 	case -ERESTARTSYS:
 	case -ERESTARTNOINTR:
@@ -2526,6 +2564,7 @@ static int io_prep_fsync(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (unlikely(req->sync.flags & ~IORING_FSYNC_DATASYNC))
 		return -EINVAL;
 
+	req->flags |= REQ_F_DONT_STEAL_NEXT;
 	req->sync.off = READ_ONCE(sqe->off);
 	req->sync.len = READ_ONCE(sqe->len);
 	return 0;
@@ -2543,27 +2582,6 @@ static bool io_req_cancelled(struct io_kiocb *req)
 	return false;
 }
 
-static void io_link_work_cb(struct io_wq_work **workptr)
-{
-	struct io_wq_work *work = *workptr;
-	struct io_kiocb *link = work->data;
-
-	io_queue_linked_timeout(link);
-	io_wq_submit_work(workptr);
-}
-
-static void io_wq_assign_next(struct io_wq_work **workptr, struct io_kiocb *nxt)
-{
-	struct io_kiocb *link;
-
-	*workptr = &nxt->work;
-	link = io_prep_linked_timeout(nxt);
-	if (link) {
-		nxt->work.func = io_link_work_cb;
-		nxt->work.data = link;
-	}
-}
-
 static void __io_fsync(struct io_kiocb *req)
 {
 	loff_t end = req->sync.off + req->sync.len;
@@ -2581,14 +2599,11 @@ static void __io_fsync(struct io_kiocb *req)
 static void io_fsync_finish(struct io_wq_work **workptr)
 {
 	struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
-	struct io_kiocb *nxt = NULL;
 
 	if (io_req_cancelled(req))
 		return;
 	__io_fsync(req);
-	io_put_req(req); /* drop submission reference */
-	if (nxt)
-		io_wq_assign_next(workptr, nxt);
+	io_put_req_async_submission(req, workptr);
 }
 
 static int io_fsync(struct io_kiocb *req, bool force_nonblock)
@@ -2617,14 +2632,11 @@ static void __io_fallocate(struct io_kiocb *req)
 static void io_fallocate_finish(struct io_wq_work **workptr)
 {
 	struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
-	struct io_kiocb *nxt = NULL;
 
 	if (io_req_cancelled(req))
 		return;
 	__io_fallocate(req);
-	io_put_req(req); /* drop submission reference */
-	if (nxt)
-		io_wq_assign_next(workptr, nxt);
+	io_put_req_async_submission(req, workptr);
 }
 
 static int io_fallocate_prep(struct io_kiocb *req,
@@ -2988,13 +3000,10 @@ static void __io_close_finish(struct io_kiocb *req)
 static void io_close_finish(struct io_wq_work **workptr)
 {
 	struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
-	struct io_kiocb *nxt = NULL;
 
 	/* not cancellable, don't do io_req_cancelled() */
 	__io_close_finish(req);
-	io_put_req(req); /* drop submission reference */
-	if (nxt)
-		io_wq_assign_next(workptr, nxt);
+	io_put_req_async_submission(req, workptr);
 }
 
 static int io_close(struct io_kiocb *req, bool force_nonblock)
@@ -3016,6 +3025,7 @@ static int io_close(struct io_kiocb *req, bool force_nonblock)
 		 */
 		io_queue_async_work(req);
 		/* submission ref will be dropped, take it for async */
+		req->flags |= REQ_F_DONT_STEAL_NEXT;
 		refcount_inc_not_zero(&req->refs);
 		return 0;
 	}
@@ -3062,14 +3072,11 @@ static void __io_sync_file_range(struct io_kiocb *req)
 static void io_sync_file_range_finish(struct io_wq_work **workptr)
 {
 	struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
-	struct io_kiocb *nxt = NULL;
 
 	if (io_req_cancelled(req))
 		return;
 	__io_sync_file_range(req);
-	io_put_req(req); /* put submission ref */
-	if (nxt)
-		io_wq_assign_next(workptr, nxt);
+	io_put_req_async_submission(req, workptr);
 }
 
 static int io_sync_file_range(struct io_kiocb *req, bool force_nonblock)
@@ -3435,14 +3442,11 @@ static int __io_accept(struct io_kiocb *req, bool force_nonblock)
 static void io_accept_finish(struct io_wq_work **workptr)
 {
 	struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
-	struct io_kiocb *nxt = NULL;
 
 	if (io_req_cancelled(req))
 		return;
 	__io_accept(req, false);
-	io_put_req(req); /* drop submission reference */
-	if (nxt)
-		io_wq_assign_next(workptr, nxt);
+	io_put_req_async_submission(req, workptr);
 }
 #endif
 
@@ -3859,9 +3863,10 @@ static void io_poll_task_handler(struct io_kiocb *req, struct io_kiocb **nxt)
 	hash_del(&req->hash_node);
 	io_poll_complete(req, req->result, 0);
 	req->flags |= REQ_F_COMP_LOCKED;
-	io_put_req_find_next(req, nxt);
 	spin_unlock_irq(&ctx->completion_lock);
 
+	req->flags &= ~REQ_F_DONT_STEAL_NEXT;
+	*nxt = io_put_req_submission(req);
 	io_cqring_ev_posted(ctx);
 }
 
@@ -3944,6 +3949,8 @@ static int io_poll_add(struct io_kiocb *req)
 		io_cqring_ev_posted(ctx);
 		io_put_req(req);
 	}
+
+	req->flags |= REQ_F_DONT_STEAL_NEXT;
 	return ipt.error;
 }
 
@@ -4066,6 +4073,7 @@ static int io_timeout_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	if (flags & ~IORING_TIMEOUT_ABS)
 		return -EINVAL;
 
+	req->flags |= REQ_F_DONT_STEAL_NEXT;
 	req->timeout.count = READ_ONCE(sqe->off);
 
 	if (!req->io && io_alloc_async_ctx(req))
@@ -4680,7 +4688,6 @@ static void io_wq_submit_work(struct io_wq_work **workptr)
 {
 	struct io_wq_work *work = *workptr;
 	struct io_kiocb *req = container_of(work, struct io_kiocb, work);
-	struct io_kiocb *nxt = NULL;
 	int ret = 0;
 
 	/* if NO_CANCEL is set, we must still run the work */
@@ -4709,9 +4716,7 @@ static void io_wq_submit_work(struct io_wq_work **workptr)
 		io_put_req(req);
 	}
 
-	io_put_req(req); /* drop submission reference */
-	if (nxt)
-		io_wq_assign_next(workptr, nxt);
+	io_put_req_async_submission(req, workptr);
 }
 
 static int io_req_needs_file(struct io_kiocb *req, int fd)
@@ -4935,10 +4940,6 @@ static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	}
 
 err:
-	nxt = NULL;
-	/* drop submission reference */
-	io_put_req_find_next(req, &nxt);
-
 	if (linked_timeout) {
 		if (!ret)
 			io_queue_linked_timeout(linked_timeout);
@@ -4952,6 +4953,8 @@ static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		req_set_fail_links(req);
 		io_put_req(req);
 	}
+
+	nxt = io_put_req_submission(req);
 	if (nxt) {
 		req = nxt;
 
-- 
2.24.0



* [PATCH 6/9] io-wq: shuffle io_worker_handle_work() code
  2020-03-01 16:18 [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
                   ` (4 preceding siblings ...)
  2020-03-01 16:18 ` [PATCH 5/9] io_uring: get next req on subm ref drop Pavel Begunkov
@ 2020-03-01 16:18 ` Pavel Begunkov
  2020-03-01 16:18 ` [PATCH 7/9] io-wq: io_worker_handle_work() optimise locking Pavel Begunkov
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 16:18 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

This is a preparation patch and doesn't change much, but it makes the
next patches cleaner.

- extract io_impersonate_work() and io_assign_current_work()
- replace the @next label with a nested do-while
- move put_work() right after NULL'ing cur_work.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io-wq.c | 125 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 66 insertions(+), 59 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index f74a105ab968..3a97d35b569e 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -440,15 +440,45 @@ static void io_wq_switch_creds(struct io_worker *worker,
 		worker->saved_creds = old_creds;
 }
 
+static void io_impersonate_work(struct io_worker *worker,
+				struct io_wq_work *work)
+{
+	if (work->files && current->files != work->files) {
+		task_lock(current);
+		current->files = work->files;
+		task_unlock(current);
+	}
+	if (work->fs && current->fs != work->fs)
+		current->fs = work->fs;
+	if (work->mm != worker->mm)
+		io_wq_switch_mm(worker, work);
+	if (worker->cur_creds != work->creds)
+		io_wq_switch_creds(worker, work);
+}
+
+static void io_assign_current_work(struct io_worker *worker,
+				   struct io_wq_work *work)
+{
+	/* flush pending signals before assigning new work */
+	if (signal_pending(current))
+		flush_signals(current);
+	cond_resched();
+
+	spin_lock_irq(&worker->lock);
+	worker->cur_work = work;
+	spin_unlock_irq(&worker->lock);
+}
+
 static void io_worker_handle_work(struct io_worker *worker)
 	__releases(wqe->lock)
 {
-	struct io_wq_work *work, *old_work = NULL, *put_work = NULL;
 	struct io_wqe *wqe = worker->wqe;
 	struct io_wq *wq = wqe->wq;
 
 	do {
+		struct io_wq_work *work, *old_work;
 		unsigned hash = -1U;
+		bool is_internal;
 
 		/*
 		 * If we got some work, mark us as busy. If we didn't, but
@@ -464,69 +494,46 @@ static void io_worker_handle_work(struct io_worker *worker)
 			wqe->flags |= IO_WQE_FLAG_STALLED;
 
 		spin_unlock_irq(&wqe->lock);
-		if (put_work && wq->put_work)
-			wq->put_work(old_work);
 		if (!work)
 			break;
-next:
-		/* flush any pending signals before assigning new work */
-		if (signal_pending(current))
-			flush_signals(current);
-
-		cond_resched();
 
-		spin_lock_irq(&worker->lock);
-		worker->cur_work = work;
-		spin_unlock_irq(&worker->lock);
-
-		if (work->files && current->files != work->files) {
-			task_lock(current);
-			current->files = work->files;
-			task_unlock(current);
-		}
-		if (work->fs && current->fs != work->fs)
-			current->fs = work->fs;
-		if (work->mm != worker->mm)
-			io_wq_switch_mm(worker, work);
-		if (worker->cur_creds != work->creds)
-			io_wq_switch_creds(worker, work);
-		/*
-		 * OK to set IO_WQ_WORK_CANCEL even for uncancellable work,
-		 * the worker function will do the right thing.
-		 */
-		if (test_bit(IO_WQ_BIT_CANCEL, &wq->state))
-			work->flags |= IO_WQ_WORK_CANCEL;
-
-		if (wq->get_work && !(work->flags & IO_WQ_WORK_INTERNAL)) {
-			put_work = work;
-			wq->get_work(work);
-		}
-
-		old_work = work;
-		work->func(&work);
-
-		spin_lock_irq(&worker->lock);
-		worker->cur_work = NULL;
-		spin_unlock_irq(&worker->lock);
-
-		spin_lock_irq(&wqe->lock);
-
-		if (hash != -1U) {
-			wqe->hash_map &= ~BIT(hash);
-			wqe->flags &= ~IO_WQE_FLAG_STALLED;
-		}
-		if (work && work != old_work) {
-			spin_unlock_irq(&wqe->lock);
-
-			if (put_work && wq->put_work) {
-				wq->put_work(put_work);
-				put_work = NULL;
+		/* handle a whole dependent link */
+		do {
+			io_assign_current_work(worker, work);
+			io_impersonate_work(worker, work);
+
+			/*
+			 * OK to set IO_WQ_WORK_CANCEL even for uncancellable
+			 * work, the worker function will do the right thing.
+			 */
+			if (test_bit(IO_WQ_BIT_CANCEL, &wq->state))
+				work->flags |= IO_WQ_WORK_CANCEL;
+
+			is_internal = work->flags & IO_WQ_WORK_INTERNAL;
+			if (wq->get_work && !is_internal)
+				wq->get_work(work);
+
+			old_work = work;
+			work->func(&work);
+
+			spin_lock_irq(&worker->lock);
+			worker->cur_work = NULL;
+			spin_unlock_irq(&worker->lock);
+
+			if (wq->put_work && !is_internal)
+				wq->put_work(old_work);
+
+			if (hash != -1U) {
+				spin_lock_irq(&wqe->lock);
+				wqe->hash_map &= ~BIT_ULL(hash);
+				wqe->flags &= ~IO_WQE_FLAG_STALLED;
+				spin_unlock_irq(&wqe->lock);
+				/* dependent work is not hashed */
+				hash = -1U;
 			}
+		} while (work && work != old_work);
 
-			/* dependent work not hashed */
-			hash = -1U;
-			goto next;
-		}
+		spin_lock_irq(&wqe->lock);
 	} while (1);
 }
 
-- 
2.24.0



* [PATCH 7/9] io-wq: io_worker_handle_work() optimise locking
  2020-03-01 16:18 [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
                   ` (5 preceding siblings ...)
  2020-03-01 16:18 ` [PATCH 6/9] io-wq: shuffle io_worker_handle_work() code Pavel Begunkov
@ 2020-03-01 16:18 ` Pavel Begunkov
  2020-03-01 16:18 ` [PATCH 8/9] io-wq: optimise double lock for io_get_next_work() Pavel Begunkov
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 16:18 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

There are two points:
- dependent requests are not hashed (sketched below), so don't take
&wqe->lock for them
- remove the extra spin_lock_irq(&worker->lock) resetting worker->cur_work
to NULL, because it will be set to a dependent request in a moment.
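
For illustration, a user-space sketch of the loop shape this is heading
towards (a pthread mutex stands in for wqe->lock, the types and names are
made up, the worker->lock part is omitted): dependent works coming out of
->func() are never hashed, so the loop only takes the lock when it
actually has hash state to clear.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t wqe_lock = PTHREAD_MUTEX_INITIALIZER;

struct work {
        struct work *next;      /* dependent work, if any */
        unsigned hash;          /* -1U == not hashed */
        const char *name;
};

static struct work *run(struct work *w)
{
        printf("run %s\n", w->name);
        return w->next;
}

static void handle_link(struct work *work)
{
        unsigned hash = work->hash;

        /* handle a whole dependent link */
        do {
                work = run(work);
                if (hash != -1U) {
                        pthread_mutex_lock(&wqe_lock);
                        /* clear hash bit, unstall the list, ... */
                        hash = -1U;     /* dependent work is not hashed */
                        pthread_mutex_unlock(&wqe_lock);
                }
        } while (work);
}

int main(void)
{
        struct work b = { NULL, -1U, "b" };
        struct work a = { &b, 0, "a" };

        handle_link(&a);
        return 0;
}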

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io-wq.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 3a97d35b569e..da67c931db79 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -476,9 +476,8 @@ static void io_worker_handle_work(struct io_worker *worker)
 	struct io_wq *wq = wqe->wq;
 
 	do {
-		struct io_wq_work *work, *old_work;
+		struct io_wq_work *work;
 		unsigned hash = -1U;
-		bool is_internal;
 
 		/*
 		 * If we got some work, mark us as busy. If we didn't, but
@@ -496,12 +495,14 @@ static void io_worker_handle_work(struct io_worker *worker)
 		spin_unlock_irq(&wqe->lock);
 		if (!work)
 			break;
+		io_assign_current_work(worker, work);
 
 		/* handle a whole dependent link */
 		do {
-			io_assign_current_work(worker, work);
-			io_impersonate_work(worker, work);
+			bool is_internal;
+			struct io_wq_work *old_work;
 
+			io_impersonate_work(worker, work);
 			/*
 			 * OK to set IO_WQ_WORK_CANCEL even for uncancellable
 			 * work, the worker function will do the right thing.
@@ -515,10 +516,8 @@ static void io_worker_handle_work(struct io_worker *worker)
 
 			old_work = work;
 			work->func(&work);
-
-			spin_lock_irq(&worker->lock);
-			worker->cur_work = NULL;
-			spin_unlock_irq(&worker->lock);
+			work = (old_work == work) ? NULL : work;
+			io_assign_current_work(worker, work);
 
 			if (wq->put_work && !is_internal)
 				wq->put_work(old_work);
@@ -527,11 +526,11 @@ static void io_worker_handle_work(struct io_worker *worker)
 				spin_lock_irq(&wqe->lock);
 				wqe->hash_map &= ~BIT_ULL(hash);
 				wqe->flags &= ~IO_WQE_FLAG_STALLED;
-				spin_unlock_irq(&wqe->lock);
 				/* dependent work is not hashed */
 				hash = -1U;
+				spin_unlock_irq(&wqe->lock);
 			}
-		} while (work && work != old_work);
+		} while (work);
 
 		spin_lock_irq(&wqe->lock);
 	} while (1);
-- 
2.24.0



* [PATCH 8/9] io-wq: optimise double lock for io_get_next_work()
  2020-03-01 16:18 [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
                   ` (6 preceding siblings ...)
  2020-03-01 16:18 ` [PATCH 7/9] io-wq: io_worker_handle_work() optimise locking Pavel Begunkov
@ 2020-03-01 16:18 ` Pavel Begunkov
  2020-03-01 16:18 ` [PATCH 9/9] io_uring: pass submission ref to async Pavel Begunkov
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 16:18 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

When executing non-linked hashed work, io_worker_handle_work()
will lock-unlock wqe->lock to update the hash, and then immediately
lock-unlock it again to get the next work. Optimise this case by
taking the lock only once.
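
A hedged sketch of the control flow being aimed at (user-space C with a
pthread mutex in place of wqe->lock; get_next_work() here is a trivial
stand-in): when a finished hashed work has no dependent work, keep the
just-taken lock and jump straight back to fetching the next work instead
of unlocking and immediately re-locking.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t wqe_lock = PTHREAD_MUTEX_INITIALIZER;

struct work {
        int hashed;
        const char *name;
};

static struct work queue[] = { { 1, "hashed-a" }, { 1, "hashed-b" } };
static int head;

/* assumed to be called with wqe_lock held */
static struct work *get_next_work(void)
{
        return head < 2 ? &queue[head++] : NULL;
}

static void worker_loop(void)
{
        pthread_mutex_lock(&wqe_lock);
        do {
                struct work *work;
                int hashed;
get_next:
                work = get_next_work();
                pthread_mutex_unlock(&wqe_lock);
                if (!work)
                        break;

                printf("run %s\n", work->name);
                hashed = work->hashed;

                if (hashed) {
                        pthread_mutex_lock(&wqe_lock);
                        /* clear hash bookkeeping here */
                        /* no dependent work follows: reuse the held lock */
                        goto get_next;
                }
                pthread_mutex_lock(&wqe_lock);
        } while (1);
}

int main(void)
{
        worker_loop();
        return 0;
}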

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io-wq.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index da67c931db79..f9b18c16ebd8 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -474,11 +474,11 @@ static void io_worker_handle_work(struct io_worker *worker)
 {
 	struct io_wqe *wqe = worker->wqe;
 	struct io_wq *wq = wqe->wq;
+	unsigned hash = -1U;
 
 	do {
 		struct io_wq_work *work;
-		unsigned hash = -1U;
-
+get_next:
 		/*
 		 * If we got some work, mark us as busy. If we didn't, but
 		 * the list isn't empty, it means we stalled on hashed work.
@@ -528,6 +528,9 @@ static void io_worker_handle_work(struct io_worker *worker)
 				wqe->flags &= ~IO_WQE_FLAG_STALLED;
 				/* dependent work is not hashed */
 				hash = -1U;
+				/* skip unnecessary unlock-lock wqe->lock */
+				if (!work)
+					goto get_next;
 				spin_unlock_irq(&wqe->lock);
 			}
 		} while (work);
-- 
2.24.0



* [PATCH 9/9] io_uring: pass submission ref to async
  2020-03-01 16:18 [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
                   ` (7 preceding siblings ...)
  2020-03-01 16:18 ` [PATCH 8/9] io-wq: optimise double lock for io_get_next_work() Pavel Begunkov
@ 2020-03-01 16:18 ` Pavel Begunkov
  2020-03-01 21:39   ` Pavel Begunkov
  2020-03-01 16:23 ` [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
  2020-03-01 19:14 ` Jens Axboe
  10 siblings, 1 reply; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 16:18 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

Currently, every async work handler accepts a submission reference,
which it should put. There is also a reference grabbed in io_get_work()
and dropped in io_put_work(). This patch merges them together; a rough
sketch of the resulting flow follows the list below.

- ownership of the submission reference is passed to io-wq, and it'll
be put in io_put_work()
- io_get_work() doesn't take a ref now and so is deleted
- async handlers don't put the submission ref anymore
- make the cancellation bits of io-wq call the {get,put}_work() handlers
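
The sketch below is user-space C with a plain int for refcount_t and
invented helper names, not the actual io-wq interface: the submission
reference travels with the work into io-wq, and put_work() is what
finally drops it, whether the work completed normally or was cancelled.

#include <stdio.h>

struct req {
        int refs;
};

static void put_req(struct req *r)
{
        if (--r->refs == 0)
                printf("request freed\n");
}

/* io-wq side: called once the work item is finished or cancelled */
static void put_work(struct req *r)
{
        put_req(r);             /* drops the submission reference */
}

/* async handler: only does the work, no ref management anymore */
static void async_handler(struct req *r)
{
        (void)r;
        printf("handle request\n");
}

int main(void)
{
        struct req r = { .refs = 2 };   /* submission + completion ref */

        async_handler(&r);
        put_work(&r);           /* io-wq drops the submission ref */
        put_req(&r);            /* completion path drops the last ref */
        return 0;
}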

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io-wq.c    | 17 +++++++++++++----
 fs/io_uring.c | 32 +++++++++++++-------------------
 2 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index f9b18c16ebd8..686ad043c6ac 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -751,14 +751,23 @@ static bool io_wq_can_queue(struct io_wqe *wqe, struct io_wqe_acct *acct,
 	return true;
 }
 
-static void io_run_cancel(struct io_wq_work *work)
+static void io_run_cancel(struct io_wq_work *work, struct io_wqe *wqe)
 {
+	struct io_wq *wq = wqe->wq;
+
 	do {
 		struct io_wq_work *old_work = work;
+		bool is_internal = work->flags & IO_WQ_WORK_INTERNAL;
+
+		if (wq->get_work && !is_internal)
+			wq->get_work(work);
 
 		work->flags |= IO_WQ_WORK_CANCEL;
 		work->func(&work);
 		work = (work == old_work) ? NULL : work;
+
+		if (wq->put_work && !is_internal)
+			wq->put_work(old_work);
 	} while (work);
 }
 
@@ -775,7 +784,7 @@ static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work)
 	 * It's close enough to not be an issue, fork() has the same delay.
 	 */
 	if (unlikely(!io_wq_can_queue(wqe, acct, work))) {
-		io_run_cancel(work);
+		io_run_cancel(work, wqe);
 		return;
 	}
 
@@ -914,7 +923,7 @@ static enum io_wq_cancel io_wqe_cancel_cb_work(struct io_wqe *wqe,
 	spin_unlock_irqrestore(&wqe->lock, flags);
 
 	if (found) {
-		io_run_cancel(work);
+		io_run_cancel(work, wqe);
 		return IO_WQ_CANCEL_OK;
 	}
 
@@ -989,7 +998,7 @@ static enum io_wq_cancel io_wqe_cancel_work(struct io_wqe *wqe,
 	spin_unlock_irqrestore(&wqe->lock, flags);
 
 	if (found) {
-		io_run_cancel(work);
+		io_run_cancel(work, wqe);
 		return IO_WQ_CANCEL_OK;
 	}
 
diff --git a/fs/io_uring.c b/fs/io_uring.c
index d456b0ff6835..c6845a1e5aaa 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1556,12 +1556,13 @@ static struct io_kiocb *io_put_req_submission(struct io_kiocb *req)
 	return nxt;
 }
 
-static void io_put_req_async_submission(struct io_kiocb *req,
-					struct io_wq_work **workptr)
+static void io_steal_work(struct io_kiocb *req,
+			  struct io_wq_work **workptr)
 {
-	static struct io_kiocb *nxt;
+	struct io_kiocb *nxt = NULL;
 
-	nxt = io_put_req_submission(req);
+	if (!(req->flags & REQ_F_DONT_STEAL_NEXT))
+		io_req_find_next(req, &nxt);
 	if (nxt)
 		io_wq_assign_next(workptr, nxt);
 }
@@ -2575,7 +2576,7 @@ static bool io_req_cancelled(struct io_kiocb *req)
 	if (req->work.flags & IO_WQ_WORK_CANCEL) {
 		req_set_fail_links(req);
 		io_cqring_add_event(req, -ECANCELED);
-		io_double_put_req(req);
+		io_put_req(req);
 		return true;
 	}
 
@@ -2603,7 +2604,7 @@ static void io_fsync_finish(struct io_wq_work **workptr)
 	if (io_req_cancelled(req))
 		return;
 	__io_fsync(req);
-	io_put_req_async_submission(req, workptr);
+	io_steal_work(req, workptr);
 }
 
 static int io_fsync(struct io_kiocb *req, bool force_nonblock)
@@ -2636,7 +2637,7 @@ static void io_fallocate_finish(struct io_wq_work **workptr)
 	if (io_req_cancelled(req))
 		return;
 	__io_fallocate(req);
-	io_put_req_async_submission(req, workptr);
+	io_steal_work(req, workptr);
 }
 
 static int io_fallocate_prep(struct io_kiocb *req,
@@ -3003,7 +3004,7 @@ static void io_close_finish(struct io_wq_work **workptr)
 
 	/* not cancellable, don't do io_req_cancelled() */
 	__io_close_finish(req);
-	io_put_req_async_submission(req, workptr);
+	io_steal_work(req, workptr);
 }
 
 static int io_close(struct io_kiocb *req, bool force_nonblock)
@@ -3076,7 +3077,7 @@ static void io_sync_file_range_finish(struct io_wq_work **workptr)
 	if (io_req_cancelled(req))
 		return;
 	__io_sync_file_range(req);
-	io_put_req_async_submission(req, workptr);
+	io_steal_work(req, workptr);
 }
 
 static int io_sync_file_range(struct io_kiocb *req, bool force_nonblock)
@@ -3446,7 +3447,7 @@ static void io_accept_finish(struct io_wq_work **workptr)
 	if (io_req_cancelled(req))
 		return;
 	__io_accept(req, false);
-	io_put_req_async_submission(req, workptr);
+	io_steal_work(req, workptr);
 }
 #endif
 
@@ -4716,7 +4717,7 @@ static void io_wq_submit_work(struct io_wq_work **workptr)
 		io_put_req(req);
 	}
 
-	io_put_req_async_submission(req, workptr);
+	io_steal_work(req, workptr);
 }
 
 static int io_req_needs_file(struct io_kiocb *req, int fd)
@@ -6107,13 +6108,6 @@ static void io_put_work(struct io_wq_work *work)
 	io_put_req(req);
 }
 
-static void io_get_work(struct io_wq_work *work)
-{
-	struct io_kiocb *req = container_of(work, struct io_kiocb, work);
-
-	refcount_inc(&req->refs);
-}
-
 static int io_init_wq_offload(struct io_ring_ctx *ctx,
 			      struct io_uring_params *p)
 {
@@ -6124,7 +6118,7 @@ static int io_init_wq_offload(struct io_ring_ctx *ctx,
 	int ret = 0;
 
 	data.user = ctx->user;
-	data.get_work = io_get_work;
+	data.get_work = NULL;
 	data.put_work = io_put_work;
 
 	if (!(p->flags & IORING_SETUP_ATTACH_WQ)) {
-- 
2.24.0
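
To make the ownership change concrete, a typical async completion path ends
up looking like this with the patch applied (illustrative only, assembled
from the hunks above; the comment is explanatory and not part of the patch):

    static void io_fsync_finish(struct io_wq_work **workptr)
    {
        struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);

        if (io_req_cancelled(req))
            return;
        __io_fsync(req);
        /*
         * No io_put_req() for the submission ref here anymore: that
         * reference now travels with the work item and is dropped by
         * io-wq itself in io_put_work(), both for normally executed
         * works and for cancelled ones via io_run_cancel().
         */
        io_steal_work(req, workptr);
    }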



* Re: [PATCH RFC 0/9] nxt propagation + locking optimisation
  2020-03-01 16:18 [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
                   ` (8 preceding siblings ...)
  2020-03-01 16:18 ` [PATCH 9/9] io_uring: pass submission ref to async Pavel Begunkov
@ 2020-03-01 16:23 ` Pavel Begunkov
  2020-03-01 16:41   ` Pavel Begunkov
  2020-03-01 19:14 ` Jens Axboe
  10 siblings, 1 reply; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 16:23 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

On 01/03/2020 19:18, Pavel Begunkov wrote:
> There are several independent parts in the patchset, but bundled
> to make a point.
> 1-2: random stuff, that implicitly used later.
> 3-5: restore @nxt propagation
> 6-8: optimise locking in io_worker_handle_work()
> 9: optimise io_uring refcounting
> 
> The next propagation bits are done similarly as it was before, but
> - nxt stealing is now at top-level, but not hidden in handlers
> - ensure there is no with REQ_F_DONT_STEAL_NEXT
> 
> [6-8] is the reason to dismiss the previous @nxt propagation appoach,
> I didn't found a good way to do the same. Even though it looked
> clearer and without new flag.
> 
> Performance tested it with link-of-nops + IOSQE_ASYNC:
> 
> link size: 100
> orig:  501 (ns per nop)
> 0-8:   446
> 0-9:   416
> 
> link size: 10
> orig:  826
> 0-8:   776
> 0-9:   756

BTW, that's basically QD1, and with contention for wqe->lock the gap should be
even wider.

> 
> Pavel Begunkov (9):
>   io_uring: clean up io_close
>   io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation
>   io_uring: make submission ref putting consistent
>   io_uring: remove @nxt from handlers
>   io_uring: get next req on subm ref drop
>   io-wq: shuffle io_worker_handle_work() code
>   io-wq: io_worker_handle_work() optimise locking
>   io-wq: optimise double lock for io_get_next_work()
>   io_uring: pass submission ref to async
> 
>  fs/io-wq.c    | 162 ++++++++++++----------
>  fs/io_uring.c | 366 ++++++++++++++++++++++----------------------------
>  2 files changed, 258 insertions(+), 270 deletions(-)
> 

-- 
Pavel Begunkov


* Re: [PATCH RFC 0/9] nxt propagation + locking optimisation
  2020-03-01 16:23 ` [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
@ 2020-03-01 16:41   ` Pavel Begunkov
  0 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 16:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

On 01/03/2020 19:23, Pavel Begunkov wrote:
> On 01/03/2020 19:18, Pavel Begunkov wrote:
>> There are several independent parts in the patchset, but bundled
>> to make a point.
>> 1-2: random stuff, that implicitly used later.
>> 3-5: restore @nxt propagation
>> 6-8: optimise locking in io_worker_handle_work()
>> 9: optimise io_uring refcounting
>>
>> The next propagation bits are done similarly as it was before, but
>> - nxt stealing is now at top-level, but not hidden in handlers
>> - ensure there is no with REQ_F_DONT_STEAL_NEXT
>>
>> [6-8] is the reason to dismiss the previous @nxt propagation appoach,
>> I didn't found a good way to do the same. Even though it looked
>> clearer and without new flag.
>>
>> Performance tested it with link-of-nops + IOSQE_ASYNC:
>>
>> link size: 100
>> orig:  501 (ns per nop)
>> 0-8:   446
>> 0-9:   416
>>
>> link size: 10
>> orig:  826
>> 0-8:   776
>> 0-9:   756
> 
> BTW, that's basically QD1, and with contention for wqe->lock the gap should be
> even wider.

And another note: "orig" actually includes [1-5], so @nxt propagation was
working there.

>>
>> Pavel Begunkov (9):
>>   io_uring: clean up io_close
>>   io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation
>>   io_uring: make submission ref putting consistent
>>   io_uring: remove @nxt from handlers
>>   io_uring: get next req on subm ref drop
>>   io-wq: shuffle io_worker_handle_work() code
>>   io-wq: io_worker_handle_work() optimise locking
>>   io-wq: optimise double lock for io_get_next_work()
>>   io_uring: pass submission ref to async
>>
>>  fs/io-wq.c    | 162 ++++++++++++----------
>>  fs/io_uring.c | 366 ++++++++++++++++++++++----------------------------
>>  2 files changed, 258 insertions(+), 270 deletions(-)
>>
> 

-- 
Pavel Begunkov


* Re: [PATCH RFC 0/9] nxt propagation + locking optimisation
  2020-03-01 16:18 [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
                   ` (9 preceding siblings ...)
  2020-03-01 16:23 ` [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
@ 2020-03-01 19:14 ` Jens Axboe
  2020-03-01 20:33   ` Pavel Begunkov
  10 siblings, 1 reply; 21+ messages in thread
From: Jens Axboe @ 2020-03-01 19:14 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, linux-kernel

On 3/1/20 9:18 AM, Pavel Begunkov wrote:
> There are several independent parts in the patchset, but bundled
> to make a point.
> 1-2: random stuff, that implicitly used later.
> 3-5: restore @nxt propagation
> 6-8: optimise locking in io_worker_handle_work()
> 9: optimise io_uring refcounting
> 
> The next propagation bits are done similarly as it was before, but
> - nxt stealing is now at top-level, but not hidden in handlers
> - ensure there is no with REQ_F_DONT_STEAL_NEXT
> 
> [6-8] is the reason to dismiss the previous @nxt propagation appoach,
> I didn't found a good way to do the same. Even though it looked
> clearer and without new flag.
> 
> Performance tested it with link-of-nops + IOSQE_ASYNC:
> 
> link size: 100
> orig:  501 (ns per nop)
> 0-8:   446
> 0-9:   416
> 
> link size: 10
> orig:  826
> 0-8:   776
> 0-9:   756

This looks nice, I'll take a closer look tomorrow or later today. Seems
that at least patch 2 should go into 5.6 however, so it may make sense to
order the series like that.

BTW, Andres brought up a good point, and that's hashed file write works.
Generally they complete super fast (just copying into the page cache),
which means that that worker will be hammering the wq lock a lot. Since
work N+1 can't make any progress before N completes (since that's how
hashed work works), we should pull a bigger batch of these work items
instead of just one at a time. I think that'd potentially make a huge
difference for the performance of buffered writes.

Just throwing it out there, since you're working in that space anyway
and the rewards will be much larger.

-- 
Jens Axboe



* Re: [PATCH RFC 0/9] nxt propagation + locking optimisation
  2020-03-01 19:14 ` Jens Axboe
@ 2020-03-01 20:33   ` Pavel Begunkov
  2020-03-02 14:39     ` Jens Axboe
  0 siblings, 1 reply; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 20:33 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

On 01/03/2020 22:14, Jens Axboe wrote:
> On 3/1/20 9:18 AM, Pavel Begunkov wrote:
>> There are several independent parts in the patchset, but bundled
>> to make a point.
>> 1-2: random stuff, that implicitly used later.
>> 3-5: restore @nxt propagation
>> 6-8: optimise locking in io_worker_handle_work()
>> 9: optimise io_uring refcounting
>>
>> The next propagation bits are done similarly as it was before, but
>> - nxt stealing is now at top-level, but not hidden in handlers
>> - ensure there is no with REQ_F_DONT_STEAL_NEXT
>>
>> [6-8] is the reason to dismiss the previous @nxt propagation appoach,
>> I didn't found a good way to do the same. Even though it looked
>> clearer and without new flag.
>>
>> Performance tested it with link-of-nops + IOSQE_ASYNC:
>>
>> link size: 100
>> orig:  501 (ns per nop)
>> 0-8:   446
>> 0-9:   416
>>
>> link size: 10
>> orig:  826
>> 0-8:   776
>> 0-9:   756
> 
> This looks nice, I'll take a closer look tomorrow or later today. Seems
> that at least patch 2 should go into 5.6 however, so may make sense to
> order the series like that.

It's the first one modifying io-wq.c, so it should be fine to pick from the
middle as is.

> 
> BTW, Andres brought up a good point, and that's hashed file write works.
> Generally they complete super fast (just copying into the page cache),
> which means that that worker will be hammering the wq lock a lot. Since
> work N+1 can't make any progress before N completes (since that's how
> hashed work works), we should pull a bigger batch of these work items
> instead of just one at the time. I think that'd potentially make a huge
> difference for the performance of buffered writes.

Had the same thought. It should be easy enough for hashed requests. Though,
general batching would make us think about fairness, work stealing, etc.

BTW, what's the point of hashing only the heads of a link? Sounds like it can
lead to the write-write collisions it tries to avoid.

> 
> Just throwing it out there, since you're working in that space anyway
> and the rewards will be much larger.

I will take a look, but I'm not sure when; I still have some hunches of my own.

-- 
Pavel Begunkov


* Re: [PATCH 5/9] io_uring: get next req on subm ref drop
  2020-03-01 16:18 ` [PATCH 5/9] io_uring: get next req on subm ref drop Pavel Begunkov
@ 2020-03-01 21:31   ` Pavel Begunkov
  0 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 21:31 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

On 01/03/2020 19:18, Pavel Begunkov wrote:
> Get next request when dropping the submission reference. However, if
> there is an asynchronous counterpart (i.e. read/write, timeout, etc),
> that would be dangerous to do, so ignore them using new
> REQ_F_DONT_STEAL_NEXT flag.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  fs/io_uring.c | 121 ++++++++++++++++++++++++++------------------------
>  1 file changed, 62 insertions(+), 59 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index daf7c2095523..d456b0ff6835 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -488,6 +488,7 @@ enum {
>  	REQ_F_NEED_CLEANUP_BIT,
>  	REQ_F_OVERFLOW_BIT,
>  	REQ_F_POLLED_BIT,
> +	REQ_F_DONT_STEAL_NEXT_BIT,
>  };
>  
>  enum {
> @@ -532,6 +533,8 @@ enum {
>  	REQ_F_OVERFLOW		= BIT(REQ_F_OVERFLOW_BIT),
>  	/* already went through poll handler */
>  	REQ_F_POLLED		= BIT(REQ_F_POLLED_BIT),
> +	/* don't try to get next req, it's async and racy */
> +	REQ_F_DONT_STEAL_NEXT	= BIT(REQ_F_DONT_STEAL_NEXT_BIT),
>  };
>  
>  struct async_poll {
> @@ -1218,6 +1221,27 @@ static void io_cqring_add_event(struct io_kiocb *req, long res)
>  	io_cqring_ev_posted(ctx);
>  }
>  
> +static void io_link_work_cb(struct io_wq_work **workptr)
> +{
> +	struct io_wq_work *work = *workptr;
> +	struct io_kiocb *link = work->data;
> +
> +	io_queue_linked_timeout(link);
> +	io_wq_submit_work(workptr);
> +}
> +
> +static void io_wq_assign_next(struct io_wq_work **workptr, struct io_kiocb *nxt)
> +{
> +	struct io_kiocb *link;
> +
> +	*workptr = &nxt->work;
> +	link = io_prep_linked_timeout(nxt);
> +	if (link) {
> +		nxt->work.func = io_link_work_cb;
> +		nxt->work.data = link;
> +	}
> +}
> +
>  static inline bool io_is_fallback_req(struct io_kiocb *req)
>  {
>  	return req == (struct io_kiocb *)
> @@ -1518,17 +1542,28 @@ static void io_free_req(struct io_kiocb *req)
>  		io_queue_async_work(nxt);
>  }
>  
> -/*
> - * Drop reference to request, return next in chain (if there is one) if this
> - * was the last reference to this request.
> - */
> -__attribute__((nonnull))
> -static void io_put_req_find_next(struct io_kiocb *req, struct io_kiocb **nxtptr)
> +__attribute__((warn_unused_result))
> +static struct io_kiocb *io_put_req_submission(struct io_kiocb *req)
>  {
> -	if (refcount_dec_and_test(&req->refs)) {
> -		io_req_find_next(req, nxtptr);
> +	bool last_ref = refcount_dec_and_test(&req->refs);
> +	struct io_kiocb *nxt = NULL;
> +
> +	if (!(req->flags & REQ_F_DONT_STEAL_NEXT) || last_ref)

This is a bit racy: @nxt needs to be taken first, and only then the atomic
dec done. The fix is trivial, so it still makes sense to review the rest.
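
For illustration, one way to reorder it (a sketch only, not necessarily the
fix that will end up in v2; the helpers are the ones from the patch above):

    static struct io_kiocb *io_put_req_submission(struct io_kiocb *req)
    {
        struct io_kiocb *nxt = NULL;

        /* take @nxt while we still hold our own reference */
        if (!(req->flags & REQ_F_DONT_STEAL_NEXT))
            io_req_find_next(req, &nxt);

        if (refcount_dec_and_test(&req->refs)) {
            /* last ref: nothing can race with us now */
            if (!nxt)
                io_req_find_next(req, &nxt);
            __io_free_req(req);
        }
        return nxt;
    }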

> +		io_req_find_next(req, &nxt);
> +	if (last_ref)
>  		__io_free_req(req);
> -	}
> +
> +	return nxt;
> +}
> +
> +static void io_put_req_async_submission(struct io_kiocb *req,
> +					struct io_wq_work **workptr)
> +{
> +	static struct io_kiocb *nxt;
> +
> +	nxt = io_put_req_submission(req);
> +	if (nxt)
> +		io_wq_assign_next(workptr, nxt);
>  }
>  
>  static void io_put_req(struct io_kiocb *req)
> @@ -1979,8 +2014,11 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>  
>  static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
>  {
> +	struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb);
> +
>  	switch (ret) {
>  	case -EIOCBQUEUED:
> +		req->flags |= REQ_F_DONT_STEAL_NEXT;
>  		break;
>  	case -ERESTARTSYS:
>  	case -ERESTARTNOINTR:
> @@ -2526,6 +2564,7 @@ static int io_prep_fsync(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  	if (unlikely(req->sync.flags & ~IORING_FSYNC_DATASYNC))
>  		return -EINVAL;
>  
> +	req->flags |= REQ_F_DONT_STEAL_NEXT;
>  	req->sync.off = READ_ONCE(sqe->off);
>  	req->sync.len = READ_ONCE(sqe->len);
>  	return 0;
> @@ -2543,27 +2582,6 @@ static bool io_req_cancelled(struct io_kiocb *req)
>  	return false;
>  }
>  
> -static void io_link_work_cb(struct io_wq_work **workptr)
> -{
> -	struct io_wq_work *work = *workptr;
> -	struct io_kiocb *link = work->data;
> -
> -	io_queue_linked_timeout(link);
> -	io_wq_submit_work(workptr);
> -}
> -
> -static void io_wq_assign_next(struct io_wq_work **workptr, struct io_kiocb *nxt)
> -{
> -	struct io_kiocb *link;
> -
> -	*workptr = &nxt->work;
> -	link = io_prep_linked_timeout(nxt);
> -	if (link) {
> -		nxt->work.func = io_link_work_cb;
> -		nxt->work.data = link;
> -	}
> -}
> -
>  static void __io_fsync(struct io_kiocb *req)
>  {
>  	loff_t end = req->sync.off + req->sync.len;
> @@ -2581,14 +2599,11 @@ static void __io_fsync(struct io_kiocb *req)
>  static void io_fsync_finish(struct io_wq_work **workptr)
>  {
>  	struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
> -	struct io_kiocb *nxt = NULL;
>  
>  	if (io_req_cancelled(req))
>  		return;
>  	__io_fsync(req);
> -	io_put_req(req); /* drop submission reference */
> -	if (nxt)
> -		io_wq_assign_next(workptr, nxt);
> +	io_put_req_async_submission(req, workptr);
>  }
>  
>  static int io_fsync(struct io_kiocb *req, bool force_nonblock)
> @@ -2617,14 +2632,11 @@ static void __io_fallocate(struct io_kiocb *req)
>  static void io_fallocate_finish(struct io_wq_work **workptr)
>  {
>  	struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
> -	struct io_kiocb *nxt = NULL;
>  
>  	if (io_req_cancelled(req))
>  		return;
>  	__io_fallocate(req);
> -	io_put_req(req); /* drop submission reference */
> -	if (nxt)
> -		io_wq_assign_next(workptr, nxt);
> +	io_put_req_async_submission(req, workptr);
>  }
>  
>  static int io_fallocate_prep(struct io_kiocb *req,
> @@ -2988,13 +3000,10 @@ static void __io_close_finish(struct io_kiocb *req)
>  static void io_close_finish(struct io_wq_work **workptr)
>  {
>  	struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
> -	struct io_kiocb *nxt = NULL;
>  
>  	/* not cancellable, don't do io_req_cancelled() */
>  	__io_close_finish(req);
> -	io_put_req(req); /* drop submission reference */
> -	if (nxt)
> -		io_wq_assign_next(workptr, nxt);
> +	io_put_req_async_submission(req, workptr);
>  }
>  
>  static int io_close(struct io_kiocb *req, bool force_nonblock)
> @@ -3016,6 +3025,7 @@ static int io_close(struct io_kiocb *req, bool force_nonblock)
>  		 */
>  		io_queue_async_work(req);
>  		/* submission ref will be dropped, take it for async */
> +		req->flags |= REQ_F_DONT_STEAL_NEXT;
>  		refcount_inc_not_zero(&req->refs);
>  		return 0;
>  	}
> @@ -3062,14 +3072,11 @@ static void __io_sync_file_range(struct io_kiocb *req)
>  static void io_sync_file_range_finish(struct io_wq_work **workptr)
>  {
>  	struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
> -	struct io_kiocb *nxt = NULL;
>  
>  	if (io_req_cancelled(req))
>  		return;
>  	__io_sync_file_range(req);
> -	io_put_req(req); /* put submission ref */
> -	if (nxt)
> -		io_wq_assign_next(workptr, nxt);
> +	io_put_req_async_submission(req, workptr);
>  }
>  
>  static int io_sync_file_range(struct io_kiocb *req, bool force_nonblock)
> @@ -3435,14 +3442,11 @@ static int __io_accept(struct io_kiocb *req, bool force_nonblock)
>  static void io_accept_finish(struct io_wq_work **workptr)
>  {
>  	struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work);
> -	struct io_kiocb *nxt = NULL;
>  
>  	if (io_req_cancelled(req))
>  		return;
>  	__io_accept(req, false);
> -	io_put_req(req); /* drop submission reference */
> -	if (nxt)
> -		io_wq_assign_next(workptr, nxt);
> +	io_put_req_async_submission(req, workptr);
>  }
>  #endif
>  
> @@ -3859,9 +3863,10 @@ static void io_poll_task_handler(struct io_kiocb *req, struct io_kiocb **nxt)
>  	hash_del(&req->hash_node);
>  	io_poll_complete(req, req->result, 0);
>  	req->flags |= REQ_F_COMP_LOCKED;
> -	io_put_req_find_next(req, nxt);
>  	spin_unlock_irq(&ctx->completion_lock);
>  
> +	req->flags &= ~REQ_F_DONT_STEAL_NEXT;
> +	*nxt = io_put_req_submission(req);
>  	io_cqring_ev_posted(ctx);
>  }
>  
> @@ -3944,6 +3949,8 @@ static int io_poll_add(struct io_kiocb *req)
>  		io_cqring_ev_posted(ctx);
>  		io_put_req(req);
>  	}
> +
> +	req->flags |= REQ_F_DONT_STEAL_NEXT;
>  	return ipt.error;
>  }
>  
> @@ -4066,6 +4073,7 @@ static int io_timeout_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>  	if (flags & ~IORING_TIMEOUT_ABS)
>  		return -EINVAL;
>  
> +	req->flags |= REQ_F_DONT_STEAL_NEXT;
>  	req->timeout.count = READ_ONCE(sqe->off);
>  
>  	if (!req->io && io_alloc_async_ctx(req))
> @@ -4680,7 +4688,6 @@ static void io_wq_submit_work(struct io_wq_work **workptr)
>  {
>  	struct io_wq_work *work = *workptr;
>  	struct io_kiocb *req = container_of(work, struct io_kiocb, work);
> -	struct io_kiocb *nxt = NULL;
>  	int ret = 0;
>  
>  	/* if NO_CANCEL is set, we must still run the work */
> @@ -4709,9 +4716,7 @@ static void io_wq_submit_work(struct io_wq_work **workptr)
>  		io_put_req(req);
>  	}
>  
> -	io_put_req(req); /* drop submission reference */
> -	if (nxt)
> -		io_wq_assign_next(workptr, nxt);
> +	io_put_req_async_submission(req, workptr);
>  }
>  
>  static int io_req_needs_file(struct io_kiocb *req, int fd)
> @@ -4935,10 +4940,6 @@ static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  	}
>  
>  err:
> -	nxt = NULL;
> -	/* drop submission reference */
> -	io_put_req_find_next(req, &nxt);
> -
>  	if (linked_timeout) {
>  		if (!ret)
>  			io_queue_linked_timeout(linked_timeout);
> @@ -4952,6 +4953,8 @@ static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  		req_set_fail_links(req);
>  		io_put_req(req);
>  	}
> +
> +	nxt = io_put_req_submission(req);
>  	if (nxt) {
>  		req = nxt;
>  
> 

-- 
Pavel Begunkov


* Re: [PATCH 9/9] io_uring: pass submission ref to async
  2020-03-01 16:18 ` [PATCH 9/9] io_uring: pass submission ref to async Pavel Begunkov
@ 2020-03-01 21:39   ` Pavel Begunkov
  2020-03-02 15:08     ` Pavel Begunkov
  0 siblings, 1 reply; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-01 21:39 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

On 01/03/2020 19:18, Pavel Begunkov wrote:
> Currenlty, every async work handler accepts a submission reference,
> which it should put. Also there is a reference grabbed in io_get_work()
> and dropped in io_put_work(). This patch merge them together.
> 
> - So, ownership of the submission reference passed to io-wq, and it'll
> be put in io_put_work().
> - io_get_put() doesn't take a ref now and so deleted.
> - async handlers don't put the submission ref anymore.
> - make cancellation bits of io-wq to call {get,put}_work() handlers

Hmm, it makes them more like {init,fini}_work() and unbalanced/unpaired. Maybe
not a desirable thing.

> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  fs/io-wq.c    | 17 +++++++++++++----
>  fs/io_uring.c | 32 +++++++++++++-------------------
>  2 files changed, 26 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/io-wq.c b/fs/io-wq.c
> index f9b18c16ebd8..686ad043c6ac 100644
> --- a/fs/io-wq.c
> +++ b/fs/io-wq.c
> @@ -751,14 +751,23 @@ static bool io_wq_can_queue(struct io_wqe *wqe, struct io_wqe_acct *acct,
>  	return true;
>  }
>  
> -static void io_run_cancel(struct io_wq_work *work)
> +static void io_run_cancel(struct io_wq_work *work, struct io_wqe *wqe)
>  {
> +	struct io_wq *wq = wqe->wq;
> +
>  	do {
>  		struct io_wq_work *old_work = work;
> +		bool is_internal = work->flags & IO_WQ_WORK_INTERNAL;
> +
> +		if (wq->get_work && !is_internal)
> +			wq->get_work(work);
>  
>  		work->flags |= IO_WQ_WORK_CANCEL;
>  		work->func(&work);
>  		work = (work == old_work) ? NULL : work;
> +
> +		if (wq->put_work && !is_internal)
> +			wq->put_work(old_work);
>  	} while (work);
>  }
>  
> @@ -775,7 +784,7 @@ static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work)
>  	 * It's close enough to not be an issue, fork() has the same delay.
>  	 */
>  	if (unlikely(!io_wq_can_queue(wqe, acct, work))) {
> -		io_run_cancel(work);
> +		io_run_cancel(work, wqe);
>  		return;
>  	}
>  
> @@ -914,7 +923,7 @@ static enum io_wq_cancel io_wqe_cancel_cb_work(struct io_wqe *wqe,
>  	spin_unlock_irqrestore(&wqe->lock, flags);
>  
>  	if (found) {
> -		io_run_cancel(work);
> +		io_run_cancel(work, wqe);
>  		return IO_WQ_CANCEL_OK;
>  	}
>  
> @@ -989,7 +998,7 @@ static enum io_wq_cancel io_wqe_cancel_work(struct io_wqe *wqe,
>  	spin_unlock_irqrestore(&wqe->lock, flags);
>  
>  	if (found) {
> -		io_run_cancel(work);
> +		io_run_cancel(work, wqe);
>  		return IO_WQ_CANCEL_OK;
>  	}
>  
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index d456b0ff6835..c6845a1e5aaa 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1556,12 +1556,13 @@ static struct io_kiocb *io_put_req_submission(struct io_kiocb *req)
>  	return nxt;
>  }
>  
> -static void io_put_req_async_submission(struct io_kiocb *req,
> -					struct io_wq_work **workptr)
> +static void io_steal_work(struct io_kiocb *req,
> +			  struct io_wq_work **workptr)
>  {
> -	static struct io_kiocb *nxt;
> +	struct io_kiocb *nxt = NULL;
>  
> -	nxt = io_put_req_submission(req);
> +	if (!(req->flags & REQ_F_DONT_STEAL_NEXT))
> +		io_req_find_next(req, &nxt);
>  	if (nxt)
>  		io_wq_assign_next(workptr, nxt);
>  }
> @@ -2575,7 +2576,7 @@ static bool io_req_cancelled(struct io_kiocb *req)
>  	if (req->work.flags & IO_WQ_WORK_CANCEL) {
>  		req_set_fail_links(req);
>  		io_cqring_add_event(req, -ECANCELED);
> -		io_double_put_req(req);
> +		io_put_req(req);
>  		return true;
>  	}
>  
> @@ -2603,7 +2604,7 @@ static void io_fsync_finish(struct io_wq_work **workptr)
>  	if (io_req_cancelled(req))
>  		return;
>  	__io_fsync(req);
> -	io_put_req_async_submission(req, workptr);
> +	io_steal_work(req, workptr);
>  }
>  
>  static int io_fsync(struct io_kiocb *req, bool force_nonblock)
> @@ -2636,7 +2637,7 @@ static void io_fallocate_finish(struct io_wq_work **workptr)
>  	if (io_req_cancelled(req))
>  		return;
>  	__io_fallocate(req);
> -	io_put_req_async_submission(req, workptr);
> +	io_steal_work(req, workptr);
>  }
>  
>  static int io_fallocate_prep(struct io_kiocb *req,
> @@ -3003,7 +3004,7 @@ static void io_close_finish(struct io_wq_work **workptr)
>  
>  	/* not cancellable, don't do io_req_cancelled() */
>  	__io_close_finish(req);
> -	io_put_req_async_submission(req, workptr);
> +	io_steal_work(req, workptr);
>  }
>  
>  static int io_close(struct io_kiocb *req, bool force_nonblock)
> @@ -3076,7 +3077,7 @@ static void io_sync_file_range_finish(struct io_wq_work **workptr)
>  	if (io_req_cancelled(req))
>  		return;
>  	__io_sync_file_range(req);
> -	io_put_req_async_submission(req, workptr);
> +	io_steal_work(req, workptr);
>  }
>  
>  static int io_sync_file_range(struct io_kiocb *req, bool force_nonblock)
> @@ -3446,7 +3447,7 @@ static void io_accept_finish(struct io_wq_work **workptr)
>  	if (io_req_cancelled(req))
>  		return;
>  	__io_accept(req, false);
> -	io_put_req_async_submission(req, workptr);
> +	io_steal_work(req, workptr);
>  }
>  #endif
>  
> @@ -4716,7 +4717,7 @@ static void io_wq_submit_work(struct io_wq_work **workptr)
>  		io_put_req(req);
>  	}
>  
> -	io_put_req_async_submission(req, workptr);
> +	io_steal_work(req, workptr);
>  }
>  
>  static int io_req_needs_file(struct io_kiocb *req, int fd)
> @@ -6107,13 +6108,6 @@ static void io_put_work(struct io_wq_work *work)
>  	io_put_req(req);
>  }
>  
> -static void io_get_work(struct io_wq_work *work)
> -{
> -	struct io_kiocb *req = container_of(work, struct io_kiocb, work);
> -
> -	refcount_inc(&req->refs);
> -}
> -
>  static int io_init_wq_offload(struct io_ring_ctx *ctx,
>  			      struct io_uring_params *p)
>  {
> @@ -6124,7 +6118,7 @@ static int io_init_wq_offload(struct io_ring_ctx *ctx,
>  	int ret = 0;
>  
>  	data.user = ctx->user;
> -	data.get_work = io_get_work;
> +	data.get_work = NULL;
>  	data.put_work = io_put_work;
>  
>  	if (!(p->flags & IORING_SETUP_ATTACH_WQ)) {
> 

-- 
Pavel Begunkov


* Re: [PATCH 2/9] io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation
  2020-03-01 16:18 ` [PATCH 2/9] io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation Pavel Begunkov
@ 2020-03-02 14:24   ` Jens Axboe
  0 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2020-03-02 14:24 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, linux-kernel

On 3/1/20 9:18 AM, Pavel Begunkov wrote:
> To cancel a work, io-wq sets IO_WQ_WORK_CANCEL and executes the
> callback. However, IO_WQ_WORK_NO_CANCEL works will just execute and may
> return next work, which will be ignored and lost.

Applied this one for 5.6.

-- 
Jens Axboe



* Re: [PATCH RFC 0/9] nxt propagation + locking optimisation
  2020-03-01 20:33   ` Pavel Begunkov
@ 2020-03-02 14:39     ` Jens Axboe
  0 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2020-03-02 14:39 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, linux-kernel

On 3/1/20 1:33 PM, Pavel Begunkov wrote:
> On 01/03/2020 22:14, Jens Axboe wrote:
>> On 3/1/20 9:18 AM, Pavel Begunkov wrote:
>>> There are several independent parts in the patchset, but bundled
>>> to make a point.
>>> 1-2: random stuff, that implicitly used later.
>>> 3-5: restore @nxt propagation
>>> 6-8: optimise locking in io_worker_handle_work()
>>> 9: optimise io_uring refcounting
>>>
>>> The next propagation bits are done similarly as it was before, but
>>> - nxt stealing is now at top-level, but not hidden in handlers
>>> - ensure there is no with REQ_F_DONT_STEAL_NEXT
>>>
>>> [6-8] is the reason to dismiss the previous @nxt propagation appoach,
>>> I didn't found a good way to do the same. Even though it looked
>>> clearer and without new flag.
>>>
>>> Performance tested it with link-of-nops + IOSQE_ASYNC:
>>>
>>> link size: 100
>>> orig:  501 (ns per nop)
>>> 0-8:   446
>>> 0-9:   416
>>>
>>> link size: 10
>>> orig:  826
>>> 0-8:   776
>>> 0-9:   756
>>
>> This looks nice, I'll take a closer look tomorrow or later today. Seems
>> that at least patch 2 should go into 5.6 however, so may make sense to
>> order the series like that.
> 
> It's the first one modifying io-wq.c, so should be fine to pick from
> the middle as is.

Yep, just did.

>> BTW, Andres brought up a good point, and that's hashed file write works.
>> Generally they complete super fast (just copying into the page cache),
>> which means that that worker will be hammering the wq lock a lot. Since
>> work N+1 can't make any progress before N completes (since that's how
>> hashed work works), we should pull a bigger batch of these work items
>> instead of just one at the time. I think that'd potentially make a huge
>> difference for the performance of buffered writes.
> 
> Flashed the same thought. It should be easy enough for hashed
> requests. Though, general batching would make us to think about
> fairness, work stealing, etc.

There's only the one list anyway, so the work is going to be processed
in order to begin with. Hence I don't think there's a lot of fairness to
be worried about here, we're just going to be processing the existing
work in the same order, but more efficiently. We should be getting both
better throughput and fairness if we remove all items hashed to the same
key for that one worker, only stopping if we encounter a non-hashed work
or work hashed to a different key. Because if we do stop there, then if
any of that hashed work ever needs to sleep, the next independent work
can proceed in a different worker.

> BTW, what's the point of hashing only heads of a link? Sounds like it
> can lead to the write-write collisions, which it tries to avoid.

Yeah, the linked items should be hashed as well, not sure why that isn't
done.

>> Just throwing it out there, since you're working in that space anyway
>> and the rewards will be much larger.
> 
> I will take a look, but not sure when, I yet have some hunches myself.

Thanks!

-- 
Jens Axboe
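
A minimal sketch of the batching idea described above, as it could sit in
io_worker_handle_work() (purely illustrative: io_wqe_peek_next(),
io_work_same_hash(), io_wqe_remove_work(), io_batch_add() and the local
batch list are made-up names, not existing io-wq helpers):

    /* with wqe->lock held */
    work = io_get_next_work(wqe, &hash);
    if (work && hash != -1U) {
        struct io_wq_work *next;

        /* also steal every immediately following work with the same hash */
        while ((next = io_wqe_peek_next(wqe)) &&
               io_work_same_hash(next, hash)) {
            io_wqe_remove_work(wqe, next);
            io_batch_add(&batch, next);
        }
    }
    spin_unlock_irq(&wqe->lock);
    /* run @work and then the batched works back to back, without
     * retaking wqe->lock for each item */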



* Re: [PATCH 9/9] io_uring: pass submission ref to async
  2020-03-01 21:39   ` Pavel Begunkov
@ 2020-03-02 15:08     ` Pavel Begunkov
  2020-03-02 15:12       ` Jens Axboe
  0 siblings, 1 reply; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-02 15:08 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

On 3/2/2020 12:39 AM, Pavel Begunkov wrote:
> On 01/03/2020 19:18, Pavel Begunkov wrote:
>> Currenlty, every async work handler accepts a submission reference,
>> which it should put. Also there is a reference grabbed in io_get_work()
>> and dropped in io_put_work(). This patch merge them together.
>>
>> - So, ownership of the submission reference passed to io-wq, and it'll
>> be put in io_put_work().
>> - io_get_put() doesn't take a ref now and so deleted.
>> - async handlers don't put the submission ref anymore.
>> - make cancellation bits of io-wq to call {get,put}_work() handlers
> 
> Hmm, it makes them more like {init,fini}_work() and unbalanced/unpaired. May be
> no a desirable thing.

Any objections to replacing {get,put}_work() with
io_finilise_work()? It will be called once and only once, and a work
must not go away until that has happened. It will be enough for now, but
I'm not sure whether you have further plans for this get/put pinning.

-- 
Pavel Begunkov


* Re: [PATCH 9/9] io_uring: pass submission ref to async
  2020-03-02 15:08     ` Pavel Begunkov
@ 2020-03-02 15:12       ` Jens Axboe
  2020-03-02 15:26         ` Pavel Begunkov
  0 siblings, 1 reply; 21+ messages in thread
From: Jens Axboe @ 2020-03-02 15:12 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, linux-kernel

On 3/2/20 8:08 AM, Pavel Begunkov wrote:
> On 3/2/2020 12:39 AM, Pavel Begunkov wrote:
>> On 01/03/2020 19:18, Pavel Begunkov wrote:
>>> Currenlty, every async work handler accepts a submission reference,
>>> which it should put. Also there is a reference grabbed in io_get_work()
>>> and dropped in io_put_work(). This patch merge them together.
>>>
>>> - So, ownership of the submission reference passed to io-wq, and it'll
>>> be put in io_put_work().
>>> - io_get_put() doesn't take a ref now and so deleted.
>>> - async handlers don't put the submission ref anymore.
>>> - make cancellation bits of io-wq to call {get,put}_work() handlers
>>
>> Hmm, it makes them more like {init,fini}_work() and unbalanced/unpaired. May be
>> no a desirable thing.
> 
> Any objections against replacing {get,put}_work() with
> io_finilise_work()? It will be called once and only once, and a work
> must not go away until it happened. It will be enough for now, but not
> sure whether you have some plans for this get/put pinning.

I have no further plans there, the get/put work handlers only exist to ensure
that the work item stays valid in case of cancelation lookups.

-- 
Jens Axboe



* Re: [PATCH 9/9] io_uring: pass submission ref to async
  2020-03-02 15:12       ` Jens Axboe
@ 2020-03-02 15:26         ` Pavel Begunkov
  0 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2020-03-02 15:26 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

On 3/2/2020 6:12 PM, Jens Axboe wrote:
> On 3/2/20 8:08 AM, Pavel Begunkov wrote:
>> On 3/2/2020 12:39 AM, Pavel Begunkov wrote:
>>> On 01/03/2020 19:18, Pavel Begunkov wrote:
>>>> Currenlty, every async work handler accepts a submission reference,
>>>> which it should put. Also there is a reference grabbed in io_get_work()
>>>> and dropped in io_put_work(). This patch merge them together.
>>>>
>>>> - So, ownership of the submission reference passed to io-wq, and it'll
>>>> be put in io_put_work().
>>>> - io_get_put() doesn't take a ref now and so deleted.
>>>> - async handlers don't put the submission ref anymore.
>>>> - make cancellation bits of io-wq to call {get,put}_work() handlers
>>>
>>> Hmm, it makes them more like {init,fini}_work() and unbalanced/unpaired. May be
>>> no a desirable thing.
>>
>> Any objections against replacing {get,put}_work() with
>> io_finilise_work()? It will be called once and only once, and a work
>> must not go away until it happened. It will be enough for now, but not
>> sure whether you have some plans for this get/put pinning.
> 
> I have no further plans there, the get/put work only exist to ensure that
> the work item stays valid in case of cancelation lookups.

Great, it'll go into v2 as well.

-- 
Pavel Begunkov


end of thread

Thread overview: 21+ messages
2020-03-01 16:18 [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
2020-03-01 16:18 ` [PATCH 1/9] io_uring: clean up io_close Pavel Begunkov
2020-03-01 16:18 ` [PATCH 2/9] io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation Pavel Begunkov
2020-03-02 14:24   ` Jens Axboe
2020-03-01 16:18 ` [PATCH 3/9] io_uring: make submission ref putting consistent Pavel Begunkov
2020-03-01 16:18 ` [PATCH 4/9] io_uring: remove @nxt from handlers Pavel Begunkov
2020-03-01 16:18 ` [PATCH 5/9] io_uring: get next req on subm ref drop Pavel Begunkov
2020-03-01 21:31   ` Pavel Begunkov
2020-03-01 16:18 ` [PATCH 6/9] io-wq: shuffle io_worker_handle_work() code Pavel Begunkov
2020-03-01 16:18 ` [PATCH 7/9] io-wq: io_worker_handle_work() optimise locking Pavel Begunkov
2020-03-01 16:18 ` [PATCH 8/9] io-wq: optimise double lock for io_get_next_work() Pavel Begunkov
2020-03-01 16:18 ` [PATCH 9/9] io_uring: pass submission ref to async Pavel Begunkov
2020-03-01 21:39   ` Pavel Begunkov
2020-03-02 15:08     ` Pavel Begunkov
2020-03-02 15:12       ` Jens Axboe
2020-03-02 15:26         ` Pavel Begunkov
2020-03-01 16:23 ` [PATCH RFC 0/9] nxt propagation + locking optimisation Pavel Begunkov
2020-03-01 16:41   ` Pavel Begunkov
2020-03-01 19:14 ` Jens Axboe
2020-03-01 20:33   ` Pavel Begunkov
2020-03-02 14:39     ` Jens Axboe
