* [PATCH v2 0/5] fast poll multishot mode
@ 2022-05-06  7:00 Hao Xu
  2022-05-06  7:00 ` [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
                   ` (6 more replies)
  0 siblings, 7 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-06  7:00 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel

Let fast poll support multishot mode; currently accept is added as its
first consumer.
Theoretical analysis:
  1) when connections come in fast
    - singleshot:
              add accept sqe(userspace) --> accept inline
                              ^                 |
                              |-----------------|
    - multishot:
             add accept sqe(userspace) --> accept inline
                                              ^     |
                                              |--*--|

    we do accept repeatedly at the * point until we get EAGAIN

  2) when connections come in at low pressure
    similar to 1), we save a lot of userspace-kernel context
    switches and useless vfs_poll() calls


Tests:
I ran some tests, which go like this:

  server    client(multiple)
  accept    connect
  read      write
  write     read
  close     close

Basically, spin up a number of clients (on the same machine as the
server) to connect to the server and write some data to it. The server
writes the data back to each client after receiving it, then closes the
connection after the write returns. The client then reads the data and
closes the connection. Here I test 10000 clients connecting to one
server, with a data size of 128 bytes. Each client runs in its own
goroutine, so they all hit the server within a short time.
I ran the test 20 times before/after this patchset; time spent (unit:
cycles, as returned by clock()):
before:
  (1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075
  +1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984
  +1934226+1914385)/20.0 = 1927633.75
after:
  (1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112
  +1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506
  +1871324+1940803)/20.0 = 1894750.45

(1927633.75 - 1894750.45) / 1927633.75 = 1.71%


A liburing test is here:
https://github.com/HowHsu/liburing/blob/multishot_accept/test/accept.c
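
For illustration, a minimal userspace sketch of driving a multishot
accept (my sketch against this series' uapi; handle_conn() is a
placeholder for whatever the application does with a connection):

	#include <liburing.h>

	void handle_conn(int fd);	/* application-defined placeholder */

	static void serve(struct io_uring *ring, int listen_fd)
	{
		struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

		/* one multishot accept SQE can produce many CQEs */
		io_uring_prep_accept(sqe, listen_fd, NULL, NULL, 0);
		sqe->ioprio |= IORING_ACCEPT_MULTISHOT;
		io_uring_submit(ring);

		for (;;) {
			struct io_uring_cqe *cqe;
			unsigned flags;
			int res;

			if (io_uring_wait_cqe(ring, &cqe))
				break;
			res = cqe->res;
			flags = cqe->flags;
			io_uring_cqe_seen(ring, cqe);

			if (res >= 0)
				handle_conn(res);	/* res is the accepted fd */
			/* no IORING_CQE_F_MORE: request terminated, re-arm it */
			if (!(flags & IORING_CQE_F_MORE))
				break;
		}
	}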

v1->v2:
 - re-implement it against the reworked poll code

Hao Xu (5):
  io_uring: add IORING_ACCEPT_MULTISHOT for accept
  io_uring: add REQ_F_APOLL_MULTISHOT for requests
  io_uring: let fast poll support multishot
  io_uring: add a helper for poll clean
  io_uring: implement multishot mode for accept

 fs/io_uring.c                 | 121 ++++++++++++++++++++++++++--------
 include/uapi/linux/io_uring.h |   5 ++
 2 files changed, 100 insertions(+), 26 deletions(-)


base-commit: f2e030dd7aaea5a937a2547dc980fab418fbc5e7
-- 
2.36.0



* [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept
  2022-05-06  7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
@ 2022-05-06  7:00 ` Hao Xu
  2022-05-06 14:32   ` Jens Axboe
  2022-05-06  7:00 ` [PATCH 2/5] io_uring: add REQ_F_APOLL_MULTISHOT for requests Hao Xu
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Hao Xu @ 2022-05-06  7:00 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel

From: Hao Xu <howeyxu@tencent.com>

Add an accept_flag IORING_ACCEPT_MULTISHOT for accept, to support
multishot mode.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
---
 include/uapi/linux/io_uring.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index fad63564678a..73bc7e54ac18 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -213,6 +213,11 @@ enum {
 #define IORING_ASYNC_CANCEL_FD	(1U << 1)
 #define IORING_ASYNC_CANCEL_ANY	(1U << 2)
 
+/*
+ * accept flags stored in accept_flags
+ */
+#define IORING_ACCEPT_MULTISHOT	(1U << 15)
+
 /*
  * IO completion data structure (Completion Queue Entry)
  */
-- 
2.36.0



* [PATCH 2/5] io_uring: add REQ_F_APOLL_MULTISHOT for requests
  2022-05-06  7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
  2022-05-06  7:00 ` [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
@ 2022-05-06  7:00 ` Hao Xu
  2022-05-06  7:01 ` [PATCH 3/5] io_uring: let fast poll support multishot Hao Xu
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-06  7:00 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel

From: Hao Xu <howeyxu@tencent.com>

Add a flag to indicate multishot mode for fast poll. Currently only
accept uses it, but more operations may leverage it in the future.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
---
 fs/io_uring.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 1e7466079af7..8ebb1a794e36 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -808,6 +808,7 @@ enum {
 	REQ_F_SINGLE_POLL_BIT,
 	REQ_F_DOUBLE_POLL_BIT,
 	REQ_F_PARTIAL_IO_BIT,
+	REQ_F_APOLL_MULTISHOT_BIT,
 	/* keep async read/write and isreg together and in order */
 	REQ_F_SUPPORT_NOWAIT_BIT,
 	REQ_F_ISREG_BIT,
@@ -872,6 +873,8 @@ enum {
 	REQ_F_DOUBLE_POLL	= BIT(REQ_F_DOUBLE_POLL_BIT),
 	/* request has already done partial IO */
 	REQ_F_PARTIAL_IO	= BIT(REQ_F_PARTIAL_IO_BIT),
+	/* fast poll multishot mode */
+	REQ_F_APOLL_MULTISHOT	= BIT(REQ_F_APOLL_MULTISHOT_BIT),
 };
 
 struct async_poll {
-- 
2.36.0



* [PATCH 3/5] io_uring: let fast poll support multishot
  2022-05-06  7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
  2022-05-06  7:00 ` [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
  2022-05-06  7:00 ` [PATCH 2/5] io_uring: add REQ_F_APOLL_MULTISHOT for requests Hao Xu
@ 2022-05-06  7:01 ` Hao Xu
  2022-05-06 17:19   ` Pavel Begunkov
  2022-05-06 18:02   ` kernel test robot
  2022-05-06  7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-06  7:01 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel

From: Hao Xu <howeyxu@tencent.com>

For operations like accept, multishot is a useful feature, since we can
reduce the number of accept SQEs. Let's integrate it into fast poll; it
may be useful for other operations in the future.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
---
 fs/io_uring.c | 41 ++++++++++++++++++++++++++---------------
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8ebb1a794e36..d33777575faf 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5952,7 +5952,7 @@ static void io_poll_remove_entries(struct io_kiocb *req)
  * either spurious wakeup or multishot CQE is served. 0 when it's done with
  * the request, then the mask is stored in req->cqe.res.
  */
-static int io_poll_check_events(struct io_kiocb *req, bool locked)
+static int io_poll_check_events(struct io_kiocb *req, bool *locked)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	int v;
@@ -5981,17 +5981,26 @@ static int io_poll_check_events(struct io_kiocb *req, bool locked)
 
 		/* multishot, just fill an CQE and proceed */
 		if (req->cqe.res && !(req->apoll_events & EPOLLONESHOT)) {
-			__poll_t mask = mangle_poll(req->cqe.res & req->apoll_events);
-			bool filled;
-
-			spin_lock(&ctx->completion_lock);
-			filled = io_fill_cqe_aux(ctx, req->cqe.user_data, mask,
-						 IORING_CQE_F_MORE);
-			io_commit_cqring(ctx);
-			spin_unlock(&ctx->completion_lock);
-			if (unlikely(!filled))
-				return -ECANCELED;
-			io_cqring_ev_posted(ctx);
+			if (req->flags & REQ_F_APOLL_MULTISHOT) {
+				io_tw_lock(req->ctx, locked);
+				if (likely(!(req->task->flags & PF_EXITING)))
+					io_queue_sqe(req);
+				else
+					return -EFAULT;
+			} else {
+				__poll_t mask = mangle_poll(req->cqe.res &
+							    req->apoll_events);
+				bool filled;
+
+				spin_lock(&ctx->completion_lock);
+				filled = io_fill_cqe_aux(ctx, req->cqe.user_data,
+							 mask, IORING_CQE_F_MORE);
+				io_commit_cqring(ctx);
+				spin_unlock(&ctx->completion_lock);
+				if (unlikely(!filled))
+					return -ECANCELED;
+				io_cqring_ev_posted(ctx);
+			}
 		} else if (req->cqe.res) {
 			return 0;
 		}
@@ -6010,7 +6019,7 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
 
-	ret = io_poll_check_events(req, *locked);
+	ret = io_poll_check_events(req, locked);
 	if (ret > 0)
 		return;
 
@@ -6035,7 +6044,7 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
 	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
 
-	ret = io_poll_check_events(req, *locked);
+	ret = io_poll_check_events(req, locked);
 	if (ret > 0)
 		return;
 
@@ -6275,7 +6284,7 @@ static int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
 	struct io_ring_ctx *ctx = req->ctx;
 	struct async_poll *apoll;
 	struct io_poll_table ipt;
-	__poll_t mask = EPOLLONESHOT | POLLERR | POLLPRI;
+	__poll_t mask = POLLERR | POLLPRI;
 	int ret;
 
 	if (!def->pollin && !def->pollout)
@@ -6284,6 +6293,8 @@ static int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
 		return IO_APOLL_ABORTED;
 	if ((req->flags & (REQ_F_POLLED|REQ_F_PARTIAL_IO)) == REQ_F_POLLED)
 		return IO_APOLL_ABORTED;
+	if (!(req->flags & REQ_F_APOLL_MULTISHOT))
+		mask |= EPOLLONESHOT;
 
 	if (def->pollin) {
 		mask |= POLLIN | POLLRDNORM;
-- 
2.36.0



* [PATCH 4/5] io_uring: add a helper for poll clean
  2022-05-06  7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
                   ` (2 preceding siblings ...)
  2022-05-06  7:01 ` [PATCH 3/5] io_uring: let fast poll support multishot Hao Xu
@ 2022-05-06  7:01 ` Hao Xu
  2022-05-06 11:04   ` kernel test robot
                     ` (3 more replies)
  2022-05-06  7:01 ` [PATCH 5/5] io_uring: implement multishot mode for accept Hao Xu
                   ` (2 subsequent siblings)
  6 siblings, 4 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-06  7:01 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel

From: Hao Xu <howeyxu@tencent.com>

Add a helper for poll cleanup; it will be used by multishot accept in
later patches.

Signed-off-by: Hao Xu <howeyxu@tencent.com>
---
 fs/io_uring.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index d33777575faf..0a83ecc457d1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5711,6 +5711,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
+static inline void __io_poll_clean(struct io_kiocb *req)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+
+	io_poll_remove_entries(req);
+	spin_lock(&ctx->completion_lock);
+	hash_del(&req->hash_node);
+	spin_unlock(&ctx->completion_lock);
+}
+
+#define REQ_F_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)
+static inline void io_poll_clean(struct io_kiocb *req)
+{
+	if ((req->flags & REQ_F_APOLL_MULTI_POLLED) == REQ_F_APOLL_MULTI_POLLED)
+		__io_poll_clean(req);
+}
+
 static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_accept *accept = &req->accept;
@@ -6041,17 +6058,13 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 
 static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
 {
-	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
 
 	ret = io_poll_check_events(req, locked);
 	if (ret > 0)
 		return;
 
-	io_poll_remove_entries(req);
-	spin_lock(&ctx->completion_lock);
-	hash_del(&req->hash_node);
-	spin_unlock(&ctx->completion_lock);
+	__io_poll_clean(req);
 
 	if (!ret)
 		io_req_task_submit(req, locked);
-- 
2.36.0



* [PATCH 5/5] io_uring: implement multishot mode for accept
  2022-05-06  7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
                   ` (3 preceding siblings ...)
  2022-05-06  7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
@ 2022-05-06  7:01 ` Hao Xu
  2022-05-06 14:42   ` Jens Axboe
  2022-05-06 20:50   ` Jens Axboe
  2022-05-06  7:36 ` [PATCH v2 0/5] fast poll multishot mode Hao Xu
  2022-05-06 22:23 ` Jens Axboe
  6 siblings, 2 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-06  7:01 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel

From: Hao Xu <howeyxu@tencent.com>

Refactor io_accept() to support multishot mode.

Theoretical analysis:
  1) when connections come in fast
    - singleshot:
              add accept sqe(userspace) --> accept inline
                              ^                 |
                              |-----------------|
    - multishot:
             add accept sqe(userspace) --> accept inline
                                              ^     |
                                              |--*--|

    we do accept repeatedly at the * point until we get EAGAIN

  2) when connections come in at low pressure
    similar to 1), we save a lot of userspace-kernel context
    switches and useless vfs_poll() calls

Tests:
I ran some tests, which go like this:

  server    client(multiple)
  accept    connect
  read      write
  write     read
  close     close

Basically, spin up a number of clients (on the same machine as the
server) to connect to the server and write some data to it. The server
writes the data back to each client after receiving it, then closes the
connection after the write returns. The client then reads the data and
closes the connection. Here I test 10000 clients connecting to one
server, with a data size of 128 bytes. Each client runs in its own
goroutine, so they all hit the server within a short time.
I ran the test 20 times before/after this patchset; time spent (unit:
cycles, as returned by clock()):
before:
  (1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075
  +1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984
  +1934226+1914385)/20.0 = 1927633.75
after:
  (1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112
  +1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506
  +1871324+1940803)/20.0 = 1894750.45

(1927633.75 - 1894750.45) / 1927633.75 = 1.71%

Signed-off-by: Hao Xu <howeyxu@tencent.com>
---
 fs/io_uring.c | 54 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 48 insertions(+), 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0a83ecc457d1..9febe7774dc3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1254,6 +1254,7 @@ static int io_close_fixed(struct io_kiocb *req, unsigned int issue_flags);
 static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer);
 static void io_eventfd_signal(struct io_ring_ctx *ctx);
 static void io_req_tw_post_queue(struct io_kiocb *req, s32 res, u32 cflags);
+static void io_poll_remove_entries(struct io_kiocb *req);
 
 static struct kmem_cache *req_cachep;
 
@@ -5690,24 +5691,29 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_accept *accept = &req->accept;
+	bool multishot;
 
 	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
 		return -EINVAL;
-	if (sqe->ioprio || sqe->len || sqe->buf_index)
+	if (sqe->len || sqe->buf_index)
 		return -EINVAL;
 
 	accept->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
 	accept->addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr2));
 	accept->flags = READ_ONCE(sqe->accept_flags);
 	accept->nofile = rlimit(RLIMIT_NOFILE);
+	multishot = !!(READ_ONCE(sqe->ioprio) & IORING_ACCEPT_MULTISHOT);
 
 	accept->file_slot = READ_ONCE(sqe->file_index);
-	if (accept->file_slot && (accept->flags & SOCK_CLOEXEC))
+	if (accept->file_slot && ((accept->flags & SOCK_CLOEXEC) || multishot))
 		return -EINVAL;
 	if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK))
 		return -EINVAL;
 	if (SOCK_NONBLOCK != O_NONBLOCK && (accept->flags & SOCK_NONBLOCK))
 		accept->flags = (accept->flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
+	if (multishot)
+		req->flags |= REQ_F_APOLL_MULTISHOT;
+
 	return 0;
 }
 
@@ -5730,6 +5736,7 @@ static inline void io_poll_clean(struct io_kiocb *req)
 
 static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 {
+	struct io_ring_ctx *ctx = req->ctx;
 	struct io_accept *accept = &req->accept;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 	unsigned int file_flags = force_nonblock ? O_NONBLOCK : 0;
@@ -5737,10 +5744,13 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 	struct file *file;
 	int ret, fd;
 
+retry:
 	if (!fixed) {
 		fd = __get_unused_fd_flags(accept->flags, accept->nofile);
-		if (unlikely(fd < 0))
+		if (unlikely(fd < 0)) {
+			io_poll_clean(req);
 			return fd;
+		}
 	}
 	file = do_accept(req->file, file_flags, accept->addr, accept->addr_len,
 			 accept->flags);
@@ -5748,8 +5758,12 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 		if (!fixed)
 			put_unused_fd(fd);
 		ret = PTR_ERR(file);
-		if (ret == -EAGAIN && force_nonblock)
-			return -EAGAIN;
+		if (ret == -EAGAIN && force_nonblock) {
+			if ((req->flags & REQ_F_APOLL_MULTI_POLLED) ==
+			    REQ_F_APOLL_MULTI_POLLED)
+				ret = 0;
+			return ret;
+		}
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
 		req_set_fail(req);
@@ -5760,7 +5774,35 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 		ret = io_install_fixed_file(req, file, issue_flags,
 					    accept->file_slot - 1);
 	}
-	__io_req_complete(req, issue_flags, ret, 0);
+
+	if (req->flags & REQ_F_APOLL_MULTISHOT) {
+		if (ret >= 0) {
+			bool filled;
+
+			spin_lock(&ctx->completion_lock);
+			filled = io_fill_cqe_aux(ctx, req->cqe.user_data, ret,
+						 IORING_CQE_F_MORE);
+			io_commit_cqring(ctx);
+			spin_unlock(&ctx->completion_lock);
+			if (unlikely(!filled)) {
+				io_poll_clean(req);
+				return -ECANCELED;
+			}
+			io_cqring_ev_posted(ctx);
+			goto retry;
+		} else {
+			/*
+			 * the apoll multishot req should handle poll
+			 * cancellation by itself since the upper layer
+			 * who called io_queue_sqe() cannot get errors
+			 * happened here.
+			 */
+			io_poll_clean(req);
+			return ret;
+		}
+	} else {
+		__io_req_complete(req, issue_flags, ret, 0);
+	}
 	return 0;
 }
 
-- 
2.36.0



* Re: [PATCH v2 0/5] fast poll multishot mode
  2022-05-06  7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
                   ` (4 preceding siblings ...)
  2022-05-06  7:01 ` [PATCH 5/5] io_uring: implement multishot mode for accept Hao Xu
@ 2022-05-06  7:36 ` Hao Xu
  2022-05-06 14:18   ` Jens Axboe
  2022-05-06 22:23 ` Jens Axboe
  6 siblings, 1 reply; 36+ messages in thread
From: Hao Xu @ 2022-05-06  7:36 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel

Hi All,
I actually had a question about the current poll code, from the code it
seems when we cancel a poll-like request, it will ignore the existing
events and just raise a -ECANCELED cqe though I haven't tested it. Is
this by design or am I missing something?

Regards,
Hao


* Re: [PATCH 4/5] io_uring: add a helper for poll clean
  2022-05-06  7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
@ 2022-05-06 11:04   ` kernel test robot
  2022-05-06 12:47   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2022-05-06 11:04 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: kbuild-all, Jens Axboe, Pavel Begunkov, linux-kernel

Hi Hao,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on f2e030dd7aaea5a937a2547dc980fab418fbc5e7]

url:    https://github.com/intel-lab-lkp/linux/commits/Hao-Xu/fast-poll-multishot-mode/20220506-150750
base:   f2e030dd7aaea5a937a2547dc980fab418fbc5e7
config: m68k-randconfig-r025-20220506 (https://download.01.org/0day-ci/archive/20220506/202205061844.HxnOWhwG-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/acb232e81643bd097278ebdc17038e6f280e7212
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Hao-Xu/fast-poll-multishot-mode/20220506-150750
        git checkout acb232e81643bd097278ebdc17038e6f280e7212
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   fs/io_uring.c: In function '__io_submit_flush_completions':
   fs/io_uring.c:2785:40: warning: variable 'prev' set but not used [-Wunused-but-set-variable]
    2785 |         struct io_wq_work_node *node, *prev;
         |                                        ^~~~
   fs/io_uring.c: In function 'io_apoll_task_func':
>> fs/io_uring.c:6067:9: error: implicit declaration of function '__io_poll_clean'; did you mean '__io_fill_cqe'? [-Werror=implicit-function-declaration]
    6067 |         __io_poll_clean(req);
         |         ^~~~~~~~~~~~~~~
         |         __io_fill_cqe
   cc1: some warnings being treated as errors


vim +6067 fs/io_uring.c

  6058	
  6059	static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
  6060	{
  6061		int ret;
  6062	
  6063		ret = io_poll_check_events(req, locked);
  6064		if (ret > 0)
  6065			return;
  6066	
> 6067		__io_poll_clean(req);
  6068	
  6069		if (!ret)
  6070			io_req_task_submit(req, locked);
  6071		else
  6072			io_req_complete_failed(req, ret);
  6073	}
  6074	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH 4/5] io_uring: add a helper for poll clean
  2022-05-06  7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
  2022-05-06 11:04   ` kernel test robot
@ 2022-05-06 12:47   ` kernel test robot
  2022-05-06 14:36   ` Jens Axboe
  2022-05-06 16:22   ` Pavel Begunkov
  3 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2022-05-06 12:47 UTC (permalink / raw)
  To: Hao Xu, io-uring
  Cc: llvm, kbuild-all, Jens Axboe, Pavel Begunkov, linux-kernel

Hi Hao,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on f2e030dd7aaea5a937a2547dc980fab418fbc5e7]

url:    https://github.com/intel-lab-lkp/linux/commits/Hao-Xu/fast-poll-multishot-mode/20220506-150750
base:   f2e030dd7aaea5a937a2547dc980fab418fbc5e7
config: mips-pic32mzda_defconfig (https://download.01.org/0day-ci/archive/20220506/202205062033.yS62xm5Z-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5e004fb787698440a387750db7f8028e7cb14cfc)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install mips cross compiling tool for clang build
        # apt-get install binutils-mips-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/acb232e81643bd097278ebdc17038e6f280e7212
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Hao-Xu/fast-poll-multishot-mode/20220506-150750
        git checkout acb232e81643bd097278ebdc17038e6f280e7212
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=mips SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> fs/io_uring.c:6067:2: error: call to undeclared function '__io_poll_clean'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           __io_poll_clean(req);
           ^
   fs/io_uring.c:6067:2: note: did you mean '__io_fill_cqe'?
   fs/io_uring.c:2141:20: note: '__io_fill_cqe' declared here
   static inline bool __io_fill_cqe(struct io_ring_ctx *ctx, u64 user_data,
                      ^
   1 error generated.


vim +/__io_poll_clean +6067 fs/io_uring.c

  6058	
  6059	static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
  6060	{
  6061		int ret;
  6062	
  6063		ret = io_poll_check_events(req, locked);
  6064		if (ret > 0)
  6065			return;
  6066	
> 6067		__io_poll_clean(req);
  6068	
  6069		if (!ret)
  6070			io_req_task_submit(req, locked);
  6071		else
  6072			io_req_complete_failed(req, ret);
  6073	}
  6074	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH v2 0/5] fast poll multishot mode
  2022-05-06  7:36 ` [PATCH v2 0/5] fast poll multishot mode Hao Xu
@ 2022-05-06 14:18   ` Jens Axboe
  2022-05-06 16:01     ` Pavel Begunkov
  0 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 14:18 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 5/6/22 1:36 AM, Hao Xu wrote:
> Hi All,
> I actually had a question about the current poll code, from the code it
> seems when we cancel a poll-like request, it will ignore the existing
> events and just raise a -ECANCELED cqe though I haven't tested it. Is
> this by design or am I missing something?

That's by design, but honestly I don't think anyone considered the case
where it's being canceled but has events already. For that case, I think
we should follow the usual logic of only returning an error (canceled)
if we don't have events, if we have events just return them. For
multi-shot, obviously also terminate, but same logic there.
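
Roughly like the below in the cancelation path (just sketching the
idea, not tested):

	/* on cancelation, prefer delivering the events we gathered */
	if (req->cqe.res)
		mask = req->cqe.res;	/* complete with those events */
	else
		ret = -ECANCELED;	/* nothing happened, report the cancel */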

Care to do a separate patch for that?

-- 
Jens Axboe



* Re: [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept
  2022-05-06  7:00 ` [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
@ 2022-05-06 14:32   ` Jens Axboe
  2022-05-07  4:05     ` Hao Xu
  0 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 14:32 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 5/6/22 1:00 AM, Hao Xu wrote:
> From: Hao Xu <howeyxu@tencent.com>
> 
> Add an accept_flag IORING_ACCEPT_MULTISHOT for accept, to support
> multishot mode.
> 
> Signed-off-by: Hao Xu <howeyxu@tencent.com>
> ---
>  include/uapi/linux/io_uring.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index fad63564678a..73bc7e54ac18 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -213,6 +213,11 @@ enum {
>  #define IORING_ASYNC_CANCEL_FD	(1U << 1)
>  #define IORING_ASYNC_CANCEL_ANY	(1U << 2)
>  
> +/*
> + * accept flags stored in accept_flags
> + */
> +#define IORING_ACCEPT_MULTISHOT	(1U << 15)

It isn't stored in accept_flags, is it? This is an io_uring private
flag, and it's in ioprio. Which is honestly a good place for per-op
private flags, since nobody really uses ioprio outside of read/write
style requests. But the comment is wrong :-)

-- 
Jens Axboe



* Re: [PATCH 4/5] io_uring: add a helper for poll clean
  2022-05-06  7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
  2022-05-06 11:04   ` kernel test robot
  2022-05-06 12:47   ` kernel test robot
@ 2022-05-06 14:36   ` Jens Axboe
  2022-05-07  6:37     ` Hao Xu
  2022-05-06 16:22   ` Pavel Begunkov
  3 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 14:36 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 5/6/22 1:01 AM, Hao Xu wrote:
> From: Hao Xu <howeyxu@tencent.com>
> 
> Add a helper for poll cleanup; it will be used by multishot accept in
> later patches.

Should this just go into io_clean_op()? Didn't look at it thoroughly,
but it'd remove some cases from the next patch if it could.

-- 
Jens Axboe



* Re: [PATCH 5/5] io_uring: implement multishot mode for accept
  2022-05-06  7:01 ` [PATCH 5/5] io_uring: implement multishot mode for accept Hao Xu
@ 2022-05-06 14:42   ` Jens Axboe
  2022-05-07  9:13     ` Hao Xu
  2022-05-06 20:50   ` Jens Axboe
  1 sibling, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 14:42 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 5/6/22 1:01 AM, Hao Xu wrote:
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 0a83ecc457d1..9febe7774dc3 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1254,6 +1254,7 @@ static int io_close_fixed(struct io_kiocb *req, unsigned int issue_flags);
>  static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer);
>  static void io_eventfd_signal(struct io_ring_ctx *ctx);
>  static void io_req_tw_post_queue(struct io_kiocb *req, s32 res, u32 cflags);
> +static void io_poll_remove_entries(struct io_kiocb *req);
>  
>  static struct kmem_cache *req_cachep;
>  
> @@ -5690,24 +5691,29 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>  static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  {
>  	struct io_accept *accept = &req->accept;
> +	bool multishot;
>  
>  	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>  		return -EINVAL;
> -	if (sqe->ioprio || sqe->len || sqe->buf_index)
> +	if (sqe->len || sqe->buf_index)
>  		return -EINVAL;
>  
>  	accept->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
>  	accept->addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>  	accept->flags = READ_ONCE(sqe->accept_flags);
>  	accept->nofile = rlimit(RLIMIT_NOFILE);
> +	multishot = !!(READ_ONCE(sqe->ioprio) & IORING_ACCEPT_MULTISHOT);

I tend to like:

	multishot = (READ_ONCE(sqe->ioprio) & IORING_ACCEPT_MULTISHOT) != 0;

as I think it's more readable. But I think we really want it ala:

	u16 poll_flags;

	poll_flags = READ_ONCE(sqe->ioprio);
	if (poll_flags & ~IORING_ACCEPT_MULTISHOT)
		return -EINVAL;

	...

to ensure that we can add more flags later, hence only accepting this
single flag right now.

Do we need REQ_F_APOLL_MULTI_POLLED, or can we just store whether this
is a multishot request in struct io_accept?

> @@ -5760,7 +5774,35 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>  		ret = io_install_fixed_file(req, file, issue_flags,
>  					    accept->file_slot - 1);
>  	}
> -	__io_req_complete(req, issue_flags, ret, 0);
> +
> +	if (req->flags & REQ_F_APOLL_MULTISHOT) {
> +		if (ret >= 0) {
> +			bool filled;
> +
> +			spin_lock(&ctx->completion_lock);
> +			filled = io_fill_cqe_aux(ctx, req->cqe.user_data, ret,
> +						 IORING_CQE_F_MORE);
> +			io_commit_cqring(ctx);
> +			spin_unlock(&ctx->completion_lock);
> +			if (unlikely(!filled)) {
> +				io_poll_clean(req);
> +				return -ECANCELED;
> +			}
> +			io_cqring_ev_posted(ctx);
> +			goto retry;
> +		} else {
> +			/*
> +			 * the apoll multishot req should handle poll
> +			 * cancellation by itself since the upper layer
> +			 * who called io_queue_sqe() cannot get errors
> +			 * happened here.
> +			 */
> +			io_poll_clean(req);
> +			return ret;
> +		}
> +	} else {
> +		__io_req_complete(req, issue_flags, ret, 0);
> +	}
>  	return 0;
>  }

I'd probably just make that:

	if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
		__io_req_complete(req, issue_flags, ret, 0);
		return 0;
	}
	if (ret >= 0) {
		bool filled;

		spin_lock(&ctx->completion_lock);
		filled = io_fill_cqe_aux(ctx, req->cqe.user_data, ret,
					 IORING_CQE_F_MORE);
		io_commit_cqring(ctx);
		spin_unlock(&ctx->completion_lock);
		if (filled) {
			io_cqring_ev_posted(ctx);
			goto retry;
		}
		/* fall through to error case */
		ret = -ECANCELED;
	}

	/*
	 * the apoll multishot req should handle poll
	 * cancellation by itself since the upper layer
	 * who called io_queue_sqe() cannot get errors
	 * happened here.
	 */
	io_poll_clean(req);
	return ret;

which I think is a lot easier to read and keeps the indentation at a
manageable level and reduces duplicate code.

-- 
Jens Axboe



* Re: [PATCH v2 0/5] fast poll multishot mode
  2022-05-06 14:18   ` Jens Axboe
@ 2022-05-06 16:01     ` Pavel Begunkov
  2022-05-06 16:03       ` Jens Axboe
  0 siblings, 1 reply; 36+ messages in thread
From: Pavel Begunkov @ 2022-05-06 16:01 UTC (permalink / raw)
  To: Jens Axboe, Hao Xu, io-uring; +Cc: linux-kernel

On 5/6/22 15:18, Jens Axboe wrote:
> On 5/6/22 1:36 AM, Hao Xu wrote:
>> Hi All,
>> I actually had a question about the current poll code, from the code it
>> seems when we cancel a poll-like request, it will ignore the existing
>> events and just raise a -ECANCELED cqe though I haven't tested it. Is
>> this by design or am I missing something?
> 
> That's by design, but honestly I don't think anyone considered the case
> where it's being canceled but has events already. For that case, I think
> we should follow the usual logic of only returning an error (canceled)
> if we don't have events, if we have events just return them. For
> multi-shot, obviously also terminate, but same logic there.

Why would we care? It's inherently racy in any case and any
user not handling this is already screwed.

-- 
Pavel Begunkov


* Re: [PATCH v2 0/5] fast poll multishot mode
  2022-05-06 16:01     ` Pavel Begunkov
@ 2022-05-06 16:03       ` Jens Axboe
  0 siblings, 0 replies; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 16:03 UTC (permalink / raw)
  To: Pavel Begunkov, Hao Xu, io-uring; +Cc: linux-kernel

On 5/6/22 10:01 AM, Pavel Begunkov wrote:
> On 5/6/22 15:18, Jens Axboe wrote:
>> On 5/6/22 1:36 AM, Hao Xu wrote:
>>> Hi All,
>>> I actually had a question about the current poll code, from the code it
>>> seems when we cancel a poll-like request, it will ignore the existing
>>> events and just raise a -ECANCELED cqe though I haven't tested it. Is
>>> this by design or am I missing something?
>>
>> That's by design, but honestly I don't think anyone considered the case
>> where it's being canceled but has events already. For that case, I think
>> we should follow the usual logic of only returning an error (canceled)
>> if we don't have events, if we have events just return them. For
>> multi-shot, obviously also terminate, but same logic there.
> 
> Why would we care? It's inherently racy in any case and any
> user not handling this is already screwed.

We don't really need to care, but the normal logic is "return error if
we have no result, return result otherwise". No reason this should be
any different imho, it's not really a correctness thing.

-- 
Jens Axboe



* Re: [PATCH 4/5] io_uring: add a helper for poll clean
  2022-05-06  7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
                     ` (2 preceding siblings ...)
  2022-05-06 14:36   ` Jens Axboe
@ 2022-05-06 16:22   ` Pavel Begunkov
  2022-05-07  6:43     ` Hao Xu
  3 siblings, 1 reply; 36+ messages in thread
From: Pavel Begunkov @ 2022-05-06 16:22 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Jens Axboe, linux-kernel

On 5/6/22 08:01, Hao Xu wrote:
> From: Hao Xu <howeyxu@tencent.com>
> 
> Add a helper for poll cleanup; it will be used by multishot accept in
> later patches.
> 
> Signed-off-by: Hao Xu <howeyxu@tencent.com>
> ---
>   fs/io_uring.c | 23 ++++++++++++++++++-----
>   1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index d33777575faf..0a83ecc457d1 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5711,6 +5711,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>   	return 0;
>   }
>   
> +static inline void __io_poll_clean(struct io_kiocb *req)
> +{
> +	struct io_ring_ctx *ctx = req->ctx;
> +
> +	io_poll_remove_entries(req);
> +	spin_lock(&ctx->completion_lock);
> +	hash_del(&req->hash_node);
> +	spin_unlock(&ctx->completion_lock);
> +}
> +
> +#define REQ_F_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)
> +static inline void io_poll_clean(struct io_kiocb *req)
> +{
> +	if ((req->flags & REQ_F_APOLL_MULTI_POLLED) == REQ_F_APOLL_MULTI_POLLED)

So it triggers for apoll multishot only when REQ_F_POLLED is _not_ set,
but if it's not set it never went through arm_poll / etc. and there is
nothing to clean up. What is the catch?

btw, I don't see the function used in this patch; better to add it later
or at least mark it with the unused attribute, or some configs may get
build failures.


> +		__io_poll_clean(req);
> +}
> +
>   static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>   {
>   	struct io_accept *accept = &req->accept;
> @@ -6041,17 +6058,13 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
>   
>   static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
>   {
> -	struct io_ring_ctx *ctx = req->ctx;
>   	int ret;
>   
>   	ret = io_poll_check_events(req, locked);
>   	if (ret > 0)
>   		return;
>   
> -	io_poll_remove_entries(req);
> -	spin_lock(&ctx->completion_lock);
> -	hash_del(&req->hash_node);
> -	spin_unlock(&ctx->completion_lock);
> +	__io_poll_clean(req);
>   
>   	if (!ret)
>   		io_req_task_submit(req, locked);

-- 
Pavel Begunkov


* Re: [PATCH 3/5] io_uring: let fast poll support multishot
  2022-05-06  7:01 ` [PATCH 3/5] io_uring: let fast poll support multishot Hao Xu
@ 2022-05-06 17:19   ` Pavel Begunkov
  2022-05-06 22:02     ` Jens Axboe
  2022-05-07  7:08     ` Hao Xu
  2022-05-06 18:02   ` kernel test robot
  1 sibling, 2 replies; 36+ messages in thread
From: Pavel Begunkov @ 2022-05-06 17:19 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Jens Axboe, linux-kernel

On 5/6/22 08:01, Hao Xu wrote:
> From: Hao Xu <howeyxu@tencent.com>
> 
> For operations like accept, multishot is a useful feature, since we can
> reduce the number of accept SQEs. Let's integrate it into fast poll; it
> may be useful for other operations in the future.
> 
> Signed-off-by: Hao Xu <howeyxu@tencent.com>
> ---
>   fs/io_uring.c | 41 ++++++++++++++++++++++++++---------------
>   1 file changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 8ebb1a794e36..d33777575faf 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5952,7 +5952,7 @@ static void io_poll_remove_entries(struct io_kiocb *req)
>    * either spurious wakeup or multishot CQE is served. 0 when it's done with
>    * the request, then the mask is stored in req->cqe.res.
>    */
> -static int io_poll_check_events(struct io_kiocb *req, bool locked)
> +static int io_poll_check_events(struct io_kiocb *req, bool *locked)
>   {
>   	struct io_ring_ctx *ctx = req->ctx;
>   	int v;
> @@ -5981,17 +5981,26 @@ static int io_poll_check_events(struct io_kiocb *req, bool locked)
>   
>   		/* multishot, just fill an CQE and proceed */
>   		if (req->cqe.res && !(req->apoll_events & EPOLLONESHOT)) {
> -			__poll_t mask = mangle_poll(req->cqe.res & req->apoll_events);
> -			bool filled;
> -
> -			spin_lock(&ctx->completion_lock);
> -			filled = io_fill_cqe_aux(ctx, req->cqe.user_data, mask,
> -						 IORING_CQE_F_MORE);
> -			io_commit_cqring(ctx);
> -			spin_unlock(&ctx->completion_lock);
> -			if (unlikely(!filled))
> -				return -ECANCELED;
> -			io_cqring_ev_posted(ctx);
> +			if (req->flags & REQ_F_APOLL_MULTISHOT) {
> +				io_tw_lock(req->ctx, locked);
> +				if (likely(!(req->task->flags & PF_EXITING)))
> +					io_queue_sqe(req);

That looks dangerous: io_queue_sqe() usually takes ownership of the request
and doesn't expect that someone, i.e. io_poll_check_events(), may still be
actively using it.

E.g. io_accept() fails on fd < 0 and returns an error,
io_queue_sqe() -> io_queue_async() -> io_req_complete_failed()
kills it. Then io_poll_check_events() and polling in general
carry on using the freed request => UAF. Didn't look at it
too carefully, but there might be other similar cases.


> +				else
> +					return -EFAULT;
> +			} else {
> +				__poll_t mask = mangle_poll(req->cqe.res &
> +							    req->apoll_events);
> +				bool filled;
> +
> +				spin_lock(&ctx->completion_lock);
> +				filled = io_fill_cqe_aux(ctx, req->cqe.user_data,
> +							 mask, IORING_CQE_F_MORE);
> +				io_commit_cqring(ctx);
> +				spin_unlock(&ctx->completion_lock);
> +				if (unlikely(!filled))
> +					return -ECANCELED;
> +				io_cqring_ev_posted(ctx);
> +			}
>   		} else if (req->cqe.res) {
>   			return 0;
>   		}
> @@ -6010,7 +6019,7 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
>   	struct io_ring_ctx *ctx = req->ctx;
>   	int ret;
>   
> -	ret = io_poll_check_events(req, *locked);
> +	ret = io_poll_check_events(req, locked);
>   	if (ret > 0)
>   		return;
>   
> @@ -6035,7 +6044,7 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
>   	struct io_ring_ctx *ctx = req->ctx;
>   	int ret;
>   
> -	ret = io_poll_check_events(req, *locked);
> +	ret = io_poll_check_events(req, locked);
>   	if (ret > 0)
>   		return;
>   
> @@ -6275,7 +6284,7 @@ static int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
>   	struct io_ring_ctx *ctx = req->ctx;
>   	struct async_poll *apoll;
>   	struct io_poll_table ipt;
> -	__poll_t mask = EPOLLONESHOT | POLLERR | POLLPRI;
> +	__poll_t mask = POLLERR | POLLPRI;
>   	int ret;
>   
>   	if (!def->pollin && !def->pollout)
> @@ -6284,6 +6293,8 @@ static int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
>   		return IO_APOLL_ABORTED;
>   	if ((req->flags & (REQ_F_POLLED|REQ_F_PARTIAL_IO)) == REQ_F_POLLED)
>   		return IO_APOLL_ABORTED;
> +	if (!(req->flags & REQ_F_APOLL_MULTISHOT))
> +		mask |= EPOLLONESHOT;
>   
>   	if (def->pollin) {
>   		mask |= POLLIN | POLLRDNORM;

-- 
Pavel Begunkov


* Re: [PATCH 3/5] io_uring: let fast poll support multishot
  2022-05-06  7:01 ` [PATCH 3/5] io_uring: let fast poll support multishot Hao Xu
  2022-05-06 17:19   ` Pavel Begunkov
@ 2022-05-06 18:02   ` kernel test robot
  1 sibling, 0 replies; 36+ messages in thread
From: kernel test robot @ 2022-05-06 18:02 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: kbuild-all, Jens Axboe, Pavel Begunkov, linux-kernel

Hi Hao,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on f2e030dd7aaea5a937a2547dc980fab418fbc5e7]

url:    https://github.com/intel-lab-lkp/linux/commits/Hao-Xu/fast-poll-multishot-mode/20220506-150750
base:   f2e030dd7aaea5a937a2547dc980fab418fbc5e7
config: x86_64-randconfig-s021 (https://download.01.org/0day-ci/archive/20220507/202205070134.WJEE4sX5-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/intel-lab-lkp/linux/commit/6001c3e95550875d4328aa2ca8b342c42b0e644e
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Hao-Xu/fast-poll-multishot-mode/20220506-150750
        git checkout 6001c3e95550875d4328aa2ca8b342c42b0e644e
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   fs/io_uring.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, include/trace/events/io_uring.h):
   include/trace/events/io_uring.h:488:1: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned int [usertype] op_flags @@     got restricted __kernel_rwf_t const [usertype] rw_flags @@
   include/trace/events/io_uring.h:488:1: sparse:     expected unsigned int [usertype] op_flags
   include/trace/events/io_uring.h:488:1: sparse:     got restricted __kernel_rwf_t const [usertype] rw_flags
   fs/io_uring.c: note: in included file (through include/trace/perf.h, include/trace/define_trace.h, include/trace/events/io_uring.h):
   include/trace/events/io_uring.h:488:1: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned int [usertype] op_flags @@     got restricted __kernel_rwf_t const [usertype] rw_flags @@
   include/trace/events/io_uring.h:488:1: sparse:     expected unsigned int [usertype] op_flags
   include/trace/events/io_uring.h:488:1: sparse:     got restricted __kernel_rwf_t const [usertype] rw_flags
   fs/io_uring.c:3280:23: sparse: sparse: incorrect type in assignment (different base types) @@     expected unsigned int [usertype] flags @@     got restricted __kernel_rwf_t @@
   fs/io_uring.c:3280:23: sparse:     expected unsigned int [usertype] flags
   fs/io_uring.c:3280:23: sparse:     got restricted __kernel_rwf_t
   fs/io_uring.c:3477:24: sparse: sparse: incorrect type in return expression (different address spaces) @@     expected void [noderef] __user * @@     got struct io_buffer *[assigned] kbuf @@
   fs/io_uring.c:3477:24: sparse:     expected void [noderef] __user *
   fs/io_uring.c:3477:24: sparse:     got struct io_buffer *[assigned] kbuf
   fs/io_uring.c:3864:48: sparse: sparse: incorrect type in argument 2 (different base types) @@     expected restricted __kernel_rwf_t [usertype] flags @@     got unsigned int [usertype] flags @@
   fs/io_uring.c:3864:48: sparse:     expected restricted __kernel_rwf_t [usertype] flags
   fs/io_uring.c:3864:48: sparse:     got unsigned int [usertype] flags
   fs/io_uring.c:5187:14: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct file *file @@     got struct file [noderef] __rcu * @@
   fs/io_uring.c:5187:14: sparse:     expected struct file *file
   fs/io_uring.c:5187:14: sparse:     got struct file [noderef] __rcu *
   fs/io_uring.c:5974:68: sparse: sparse: incorrect type in initializer (different base types) @@     expected restricted __poll_t [usertype] _key @@     got int apoll_events @@
   fs/io_uring.c:5974:68: sparse:     expected restricted __poll_t [usertype] _key
   fs/io_uring.c:5974:68: sparse:     got int apoll_events
   fs/io_uring.c:5979:48: sparse: sparse: restricted __poll_t degrades to integer
   fs/io_uring.c:5983:59: sparse: sparse: restricted __poll_t degrades to integer
   fs/io_uring.c:5991:74: sparse: sparse: incorrect type in argument 1 (different base types) @@     expected restricted __poll_t [usertype] val @@     got int @@
   fs/io_uring.c:5991:74: sparse:     expected restricted __poll_t [usertype] val
   fs/io_uring.c:5991:74: sparse:     got int
   fs/io_uring.c:5991:60: sparse: sparse: incorrect type in initializer (different base types) @@     expected restricted __poll_t [usertype] mask @@     got unsigned short @@
   fs/io_uring.c:5991:60: sparse:     expected restricted __poll_t [usertype] mask
   fs/io_uring.c:5991:60: sparse:     got unsigned short
   fs/io_uring.c:5997:58: sparse: sparse: incorrect type in argument 3 (different base types) @@     expected signed int [usertype] res @@     got restricted __poll_t [usertype] mask @@
   fs/io_uring.c:5997:58: sparse:     expected signed int [usertype] res
   fs/io_uring.c:5997:58: sparse:     got restricted __poll_t [usertype] mask
   fs/io_uring.c:6027:68: sparse: sparse: restricted __poll_t degrades to integer
   fs/io_uring.c:6027:57: sparse: sparse: incorrect type in argument 1 (different base types) @@     expected restricted __poll_t [usertype] val @@     got unsigned int @@
   fs/io_uring.c:6027:57: sparse:     expected restricted __poll_t [usertype] val
   fs/io_uring.c:6027:57: sparse:     got unsigned int
   fs/io_uring.c:6108:45: sparse: sparse: incorrect type in argument 3 (different base types) @@     expected int events @@     got restricted __poll_t [usertype] events @@
   fs/io_uring.c:6108:45: sparse:     expected int events
   fs/io_uring.c:6108:45: sparse:     got restricted __poll_t [usertype] events
   fs/io_uring.c:6143:40: sparse: sparse: incorrect type in argument 2 (different base types) @@     expected int mask @@     got restricted __poll_t [usertype] mask @@
   fs/io_uring.c:6143:40: sparse:     expected int mask
   fs/io_uring.c:6143:40: sparse:     got restricted __poll_t [usertype] mask
   fs/io_uring.c:6143:50: sparse: sparse: incorrect type in argument 3 (different base types) @@     expected int events @@     got restricted __poll_t [usertype] events @@
   fs/io_uring.c:6143:50: sparse:     expected int events
   fs/io_uring.c:6143:50: sparse:     got restricted __poll_t [usertype] events
   fs/io_uring.c:6235:24: sparse: sparse: incorrect type in return expression (different base types) @@     expected int @@     got restricted __poll_t [assigned] [usertype] mask @@
   fs/io_uring.c:6235:24: sparse:     expected int
   fs/io_uring.c:6235:24: sparse:     got restricted __poll_t [assigned] [usertype] mask
   fs/io_uring.c:6252:40: sparse: sparse: incorrect type in argument 2 (different base types) @@     expected int mask @@     got restricted __poll_t [assigned] [usertype] mask @@
   fs/io_uring.c:6252:40: sparse:     expected int mask
   fs/io_uring.c:6252:40: sparse:     got restricted __poll_t [assigned] [usertype] mask
   fs/io_uring.c:6252:50: sparse: sparse: incorrect type in argument 3 (different base types) @@     expected int events @@     got restricted __poll_t [usertype] events @@
   fs/io_uring.c:6252:50: sparse:     expected int events
   fs/io_uring.c:6252:50: sparse:     got restricted __poll_t [usertype] events
   fs/io_uring.c:6262:47: sparse: sparse: incorrect type in argument 3 (different base types) @@     expected int events @@     got restricted __poll_t [usertype] events @@
   fs/io_uring.c:6262:47: sparse:     expected int events
   fs/io_uring.c:6262:47: sparse:     got restricted __poll_t [usertype] events
>> fs/io_uring.c:6287:33: sparse: sparse: incorrect type in initializer (different base types) @@     expected restricted __poll_t [usertype] mask @@     got int @@
   fs/io_uring.c:6287:33: sparse:     expected restricted __poll_t [usertype] mask
   fs/io_uring.c:6287:33: sparse:     got int
   fs/io_uring.c:6300:22: sparse: sparse: invalid assignment: |=
   fs/io_uring.c:6300:22: sparse:    left side has type restricted __poll_t
   fs/io_uring.c:6300:22: sparse:    right side has type int
   fs/io_uring.c:6305:30: sparse: sparse: invalid assignment: &=
   fs/io_uring.c:6305:30: sparse:    left side has type restricted __poll_t
   fs/io_uring.c:6305:30: sparse:    right side has type int
   fs/io_uring.c:6307:22: sparse: sparse: invalid assignment: |=
   fs/io_uring.c:6307:22: sparse:    left side has type restricted __poll_t
   fs/io_uring.c:6307:22: sparse:    right side has type int
   fs/io_uring.c:6335:33: sparse: sparse: incorrect type in argument 5 (different base types) @@     expected int mask @@     got restricted __poll_t [assigned] [usertype] mask @@
   fs/io_uring.c:6335:33: sparse:     expected int mask
   fs/io_uring.c:6335:33: sparse:     got restricted __poll_t [assigned] [usertype] mask
   fs/io_uring.c:6335:50: sparse: sparse: incorrect type in argument 6 (different base types) @@     expected int events @@     got restricted __poll_t [usertype] events @@
   fs/io_uring.c:6335:50: sparse:     expected int events
   fs/io_uring.c:6335:50: sparse:     got restricted __poll_t [usertype] events
   fs/io_uring.c:6449:24: sparse: sparse: invalid assignment: |=
   fs/io_uring.c:6449:24: sparse:    left side has type unsigned int
   fs/io_uring.c:6449:24: sparse:    right side has type restricted __poll_t
   fs/io_uring.c:6450:65: sparse: sparse: restricted __poll_t degrades to integer
   fs/io_uring.c:6450:29: sparse: sparse: restricted __poll_t degrades to integer
   fs/io_uring.c:6450:38: sparse: sparse: incorrect type in return expression (different base types) @@     expected restricted __poll_t @@     got unsigned int @@
   fs/io_uring.c:6450:38: sparse:     expected restricted __poll_t
   fs/io_uring.c:6450:38: sparse:     got unsigned int
   fs/io_uring.c:6502:27: sparse: sparse: incorrect type in assignment (different base types) @@     expected int apoll_events @@     got restricted __poll_t [usertype] events @@
   fs/io_uring.c:6502:27: sparse:     expected int apoll_events
   fs/io_uring.c:6502:27: sparse:     got restricted __poll_t [usertype] events
   fs/io_uring.c:6541:43: sparse: sparse: invalid assignment: &=
   fs/io_uring.c:6541:43: sparse:    left side has type restricted __poll_t
   fs/io_uring.c:6541:43: sparse:    right side has type int
   fs/io_uring.c:6542:62: sparse: sparse: restricted __poll_t degrades to integer
   fs/io_uring.c:6542:43: sparse: sparse: invalid assignment: |=
   fs/io_uring.c:6542:43: sparse:    left side has type restricted __poll_t
   fs/io_uring.c:6542:43: sparse:    right side has type unsigned int
   fs/io_uring.c:2536:17: sparse: sparse: context imbalance in 'handle_prev_tw_list' - different lock contexts for basic block
   fs/io_uring.c:7610:39: sparse: sparse: marked inline, but without a definition
   fs/io_uring.c:7610:39: sparse: sparse: marked inline, but without a definition
   fs/io_uring.c:7610:39: sparse: sparse: marked inline, but without a definition
   fs/io_uring.c:7610:39: sparse: sparse: marked inline, but without a definition

vim +6287 fs/io_uring.c

  6280	
  6281	static int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
  6282	{
  6283		const struct io_op_def *def = &io_op_defs[req->opcode];
  6284		struct io_ring_ctx *ctx = req->ctx;
  6285		struct async_poll *apoll;
  6286		struct io_poll_table ipt;
> 6287		__poll_t mask = POLLERR | POLLPRI;
  6288		int ret;
  6289	
  6290		if (!def->pollin && !def->pollout)
  6291			return IO_APOLL_ABORTED;
  6292		if (!file_can_poll(req->file))
  6293			return IO_APOLL_ABORTED;
  6294		if ((req->flags & (REQ_F_POLLED|REQ_F_PARTIAL_IO)) == REQ_F_POLLED)
  6295			return IO_APOLL_ABORTED;
  6296		if (!(req->flags & REQ_F_APOLL_MULTISHOT))
  6297			mask |= EPOLLONESHOT;
  6298	
  6299		if (def->pollin) {
  6300			mask |= POLLIN | POLLRDNORM;
  6301	
  6302			/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
  6303			if ((req->opcode == IORING_OP_RECVMSG) &&
  6304			    (req->sr_msg.msg_flags & MSG_ERRQUEUE))
  6305				mask &= ~POLLIN;
  6306		} else {
  6307			mask |= POLLOUT | POLLWRNORM;
  6308		}
  6309		if (def->poll_exclusive)
  6310			mask |= EPOLLEXCLUSIVE;
  6311		if (req->flags & REQ_F_POLLED) {
  6312			apoll = req->apoll;
  6313		} else if (!(issue_flags & IO_URING_F_UNLOCKED) &&
  6314			   !list_empty(&ctx->apoll_cache)) {
  6315			apoll = list_first_entry(&ctx->apoll_cache, struct async_poll,
  6316							poll.wait.entry);
  6317			list_del_init(&apoll->poll.wait.entry);
  6318		} else {
  6319			apoll = kmalloc(sizeof(*apoll), GFP_ATOMIC);
  6320			if (unlikely(!apoll))
  6321				return IO_APOLL_ABORTED;
  6322		}
  6323		apoll->double_poll = NULL;
  6324		req->apoll = apoll;
  6325		req->flags |= REQ_F_POLLED;
  6326		ipt.pt._qproc = io_async_queue_proc;
  6327	
  6328		io_kbuf_recycle(req, issue_flags);
  6329	
  6330		ret = __io_arm_poll_handler(req, &apoll->poll, &ipt, mask);
  6331		if (ret || ipt.error)
  6332			return ret ? IO_APOLL_READY : IO_APOLL_ABORTED;
  6333	
  6334		trace_io_uring_poll_arm(ctx, req, req->cqe.user_data, req->opcode,
  6335					mask, apoll->poll.events);
  6336		return IO_APOLL_OK;
  6337	}
  6338	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH 5/5] io_uring: implement multishot mode for accept
  2022-05-06  7:01 ` [PATCH 5/5] io_uring: implement multishot mode for accept Hao Xu
  2022-05-06 14:42   ` Jens Axboe
@ 2022-05-06 20:50   ` Jens Axboe
  2022-05-06 21:29     ` Jens Axboe
  1 sibling, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 20:50 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 5/6/22 1:01 AM, Hao Xu wrote:
> @@ -5748,8 +5758,12 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>  		if (!fixed)
>  			put_unused_fd(fd);
>  		ret = PTR_ERR(file);
> -		if (ret == -EAGAIN && force_nonblock)
> -			return -EAGAIN;
> +		if (ret == -EAGAIN && force_nonblock) {
> +			if ((req->flags & REQ_F_APOLL_MULTI_POLLED) ==
> +			    REQ_F_APOLL_MULTI_POLLED)
> +				ret = 0;
> +			return ret;

FWIW, this

	if ((req->flags & REQ_F_APOLL_MULTI_POLLED) == REQ_F_APOLL_MULTI_POLLED)

is identical to

	if (req->flags & REQ_F_APOLL_MULTI_POLLED)

but I suspect this used to check more flags (??), because as it stands
it seems a bit nonsensical.

-- 
Jens Axboe



* Re: [PATCH 5/5] io_uring: implement multishot mode for accept
  2022-05-06 20:50   ` Jens Axboe
@ 2022-05-06 21:29     ` Jens Axboe
  0 siblings, 0 replies; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 21:29 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 5/6/22 2:50 PM, Jens Axboe wrote:
> On 5/6/22 1:01 AM, Hao Xu wrote:
>> @@ -5748,8 +5758,12 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>>  		if (!fixed)
>>  			put_unused_fd(fd);
>>  		ret = PTR_ERR(file);
>> -		if (ret == -EAGAIN && force_nonblock)
>> -			return -EAGAIN;
>> +		if (ret == -EAGAIN && force_nonblock) {
>> +			if ((req->flags & REQ_F_APOLL_MULTI_POLLED) ==
>> +			    REQ_F_APOLL_MULTI_POLLED)
>> +				ret = 0;
>> +			return ret;
> 
> FWIW, this
> 
> 	if ((req->flags & REQ_F_APOLL_MULTI_POLLED) == REQ_F_APOLL_MULTI_POLLED)
> 
> is identical to
> 
> 	if (req->flags & REQ_F_APOLL_MULTI_POLLED)
> 
> but I suspect this used to check more flags (??), because as it stands
> it seems a bit nonsensical.

Looking deeper, it is indeed a mask and not a single flag! So the check
looks fine.
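
A standalone sketch of the difference, for anyone skimming the thread
(the bit values are made up for illustration; only the mask construction
mirrors the patch):

	#include <stdio.h>

	#define REQ_F_POLLED             (1U << 0)	/* illustrative */
	#define REQ_F_APOLL_MULTISHOT    (1U << 1)	/* illustrative */
	#define REQ_F_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)

	int main(void)
	{
		unsigned int flags = REQ_F_POLLED;	/* only one of the two bits */

		/* full-mask compare: true only if *both* bits are set -> 0 */
		printf("%d\n", (flags & REQ_F_APOLL_MULTI_POLLED) ==
			       REQ_F_APOLL_MULTI_POLLED);
		/* plain AND: true if *either* bit is set -> 1 */
		printf("%d\n", !!(flags & REQ_F_APOLL_MULTI_POLLED));
		return 0;
	}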

-- 
Jens Axboe



* Re: [PATCH 3/5] io_uring: let fast poll support multishot
  2022-05-06 17:19   ` Pavel Begunkov
@ 2022-05-06 22:02     ` Jens Axboe
  2022-05-07  6:32       ` Hao Xu
  2022-05-07  9:26       ` Pavel Begunkov
  2022-05-07  7:08     ` Hao Xu
  1 sibling, 2 replies; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 22:02 UTC (permalink / raw)
  To: Pavel Begunkov, Hao Xu, io-uring; +Cc: linux-kernel

On 5/6/22 11:19 AM, Pavel Begunkov wrote:
> On 5/6/22 08:01, Hao Xu wrote:
>> From: Hao Xu <howeyxu@tencent.com>
>>
>> For operations like accept, multishot is a useful feature, since we can
>> reduce a number of accept sqe. Let's integrate it to fast poll, it may
>> be good for other operations in the future.
>>
>> Signed-off-by: Hao Xu <howeyxu@tencent.com>
>> ---
>>   fs/io_uring.c | 41 ++++++++++++++++++++++++++---------------
>>   1 file changed, 26 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 8ebb1a794e36..d33777575faf 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5952,7 +5952,7 @@ static void io_poll_remove_entries(struct io_kiocb *req)
>>    * either spurious wakeup or multishot CQE is served. 0 when it's done with
>>    * the request, then the mask is stored in req->cqe.res.
>>    */
>> -static int io_poll_check_events(struct io_kiocb *req, bool locked)
>> +static int io_poll_check_events(struct io_kiocb *req, bool *locked)
>>   {
>>       struct io_ring_ctx *ctx = req->ctx;
>>       int v;
>> @@ -5981,17 +5981,26 @@ static int io_poll_check_events(struct io_kiocb *req, bool locked)
>>             /* multishot, just fill an CQE and proceed */
>>           if (req->cqe.res && !(req->apoll_events & EPOLLONESHOT)) {
>> -            __poll_t mask = mangle_poll(req->cqe.res & req->apoll_events);
>> -            bool filled;
>> -
>> -            spin_lock(&ctx->completion_lock);
>> -            filled = io_fill_cqe_aux(ctx, req->cqe.user_data, mask,
>> -                         IORING_CQE_F_MORE);
>> -            io_commit_cqring(ctx);
>> -            spin_unlock(&ctx->completion_lock);
>> -            if (unlikely(!filled))
>> -                return -ECANCELED;
>> -            io_cqring_ev_posted(ctx);
>> +            if (req->flags & REQ_F_APOLL_MULTISHOT) {
>> +                io_tw_lock(req->ctx, locked);
>> +                if (likely(!(req->task->flags & PF_EXITING)))
>> +                    io_queue_sqe(req);
> 
> That looks dangerous, io_queue_sqe() usually takes the request
> ownership and doesn't expect that someone, i.e.
> io_poll_check_events(), may still be actively using it.

I took a look at this, too. We do own the request at this point, but
it's still on the poll list. If io_accept() fails, then we do run the
poll_clean.

> E.g. io_accept() fails on fd < 0, return an error, io_queue_sqe() ->
> io_queue_async() -> io_req_complete_failed() kills it. Then
> io_poll_check_events() and polling in general carry on using the freed
> request => UAF. Didn't look at it too carefully, but there might other
> similar cases.

But we better have done poll_clean() before returning the error. What am
I missing here?

-- 
Jens Axboe



* Re: [PATCH v2 0/5] fast poll multishot mode
  2022-05-06  7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
                   ` (5 preceding siblings ...)
  2022-05-06  7:36 ` [PATCH v2 0/5] fast poll multishot mode Hao Xu
@ 2022-05-06 22:23 ` Jens Axboe
  2022-05-06 23:26   ` Jens Axboe
  6 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 22:23 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 5/6/22 1:00 AM, Hao Xu wrote:
[...]
> A liburing test is here:
> https://github.com/HowHsu/liburing/blob/multishot_accept/test/accept.c

Wish I had seen that, I wrote my own! But maybe that's good, you tend to
find other issues through that.

Anyway, works for me in testing, and I can see this being a nice win for
accept intensive workloads. I pushed a bunch of cleanup patches that
should just get folded in. Can you fold them into your patches and
address the other feedback, and post a v3? I pushed the test branch
here:

https://git.kernel.dk/cgit/linux-block/log/?h=fastpoll-mshot

-- 
Jens Axboe



* Re: [PATCH v2 0/5] fast poll multishot mode
  2022-05-06 22:23 ` Jens Axboe
@ 2022-05-06 23:26   ` Jens Axboe
  2022-05-07  2:33     ` Jens Axboe
  0 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 23:26 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 5/6/22 4:23 PM, Jens Axboe wrote:
> On 5/6/22 1:00 AM, Hao Xu wrote:
[...]
>> A liburing test is here:
>> https://github.com/HowHsu/liburing/blob/multishot_accept/test/accept.c
> 
> Wish I had seen that, I wrote my own! But maybe that's good, you tend to
> find other issues through that.
> 
> Anyway, works for me in testing, and I can see this being a nice win for
> accept intensive workloads. I pushed a bunch of cleanup patches that
> should just get folded in. Can you fold them into your patches and
> address the other feedback, and post a v3? I pushed the test branch
> here:
> 
> https://git.kernel.dk/cgit/linux-block/log/?h=fastpoll-mshot

Quick benchmark here, accepting 10k connections:

Stock kernel
real	0m0.728s
user	0m0.009s
sys	0m0.192s

Patched
real	0m0.684s
user	0m0.018s
sys	0m0.102s

Looks like a nice win for a highly synthetic benchmark. Nothing
scientific, was just curious.

-- 
Jens Axboe



* Re: [PATCH v2 0/5] fast poll multishot mode
  2022-05-06 23:26   ` Jens Axboe
@ 2022-05-07  2:33     ` Jens Axboe
  2022-05-07  3:08       ` Jens Axboe
  0 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-07  2:33 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 5/6/22 5:26 PM, Jens Axboe wrote:
> On 5/6/22 4:23 PM, Jens Axboe wrote:
[...]
> 
> Quick benchmark here, accepting 10k connections:
> 
> Stock kernel
> real	0m0.728s
> user	0m0.009s
> sys	0m0.192s
> 
> Patched
> real	0m0.684s
> user	0m0.018s
> sys	0m0.102s
> 
> Looks like a nice win for a highly synthetic benchmark. Nothing
> scientific, was just curious.

One more thought on this - how is it supposed to work with
accept-direct? One idea would be to make it incrementally increasing.
But we need a good story for that, if it's exclusive to non-direct
files, then it's a lot less interesting as the latter is really nice win
for lots of files. If we can combine the two, even better.

-- 
Jens Axboe



* Re: [PATCH v2 0/5] fast poll multishot mode
  2022-05-07  2:33     ` Jens Axboe
@ 2022-05-07  3:08       ` Jens Axboe
  2022-05-07 16:01         ` Hao Xu
  0 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-07  3:08 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 5/6/22 8:33 PM, Jens Axboe wrote:
> On 5/6/22 5:26 PM, Jens Axboe wrote:
[...]
> 
> One more thought on this - how is it supposed to work with
> accept-direct? One idea would be to make it incrementally increasing.
> But we need a good story for that, if it's exclusive to non-direct
> files, then it's a lot less interesting as the latter is really nice win
> for lots of files. If we can combine the two, even better.

Running some quick testing, on an actual test box (previous numbers were
from a vm on my laptop):

Testing singleshot, normal files
Did 10000 accepts

________________________________________________________
Executed in  216.10 millis    fish           external
   usr time    9.32 millis  150.00 micros    9.17 millis
   sys time  110.06 millis   67.00 micros  109.99 millis

Testing multishot, fixed files
Did 10000 accepts

________________________________________________________
Executed in  189.04 millis    fish           external
   usr time   11.86 millis  159.00 micros   11.71 millis
   sys time   93.71 millis   70.00 micros   93.64 millis

That's about ~19 usec to accept a connection, pretty decent. Using
singleshot with fixed files shaves about ~8% off and ends at around
200 msec.

I think we can get away with using fixed files and multishot, attaching
the quick patch I did below to test it. We need something better than
this, otherwise once the space fills up, we'll likely end up with a
sparse space and the naive approach of just incrementing the next slot
won't work at all.
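
Something bitmap-based is probably the direction; a rough sketch (the
ctx->file_slot_bitmap field is made up for illustration, this is not the
patch mentioned above):

	/* reuse freed slots instead of bumping a next-slot counter */
	static int io_file_slot_alloc(struct io_ring_ctx *ctx)
	{
		unsigned int nr = ctx->nr_user_files;
		unsigned int slot;

		slot = find_first_zero_bit(ctx->file_slot_bitmap, nr);
		if (slot >= nr)
			return -ENFILE;		/* fixed file table is full */
		__set_bit(slot, ctx->file_slot_bitmap);
		return slot;
	}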

-- 
Jens Axboe



* Re: [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept
  2022-05-06 14:32   ` Jens Axboe
@ 2022-05-07  4:05     ` Hao Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-07  4:05 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 5/6/22 10:32 PM, Jens Axboe wrote:
> On 5/6/22 1:00 AM, Hao Xu wrote:
>> From: Hao Xu <howeyxu@tencent.com>
>>
>> add an accept_flag IORING_ACCEPT_MULTISHOT for accept, which is to
>> support multishot.
>>
>> Signed-off-by: Hao Xu <howeyxu@tencent.com>
>> ---
>>   include/uapi/linux/io_uring.h | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>> index fad63564678a..73bc7e54ac18 100644
>> --- a/include/uapi/linux/io_uring.h
>> +++ b/include/uapi/linux/io_uring.h
>> @@ -213,6 +213,11 @@ enum {
>>   #define IORING_ASYNC_CANCEL_FD	(1U << 1)
>>   #define IORING_ASYNC_CANCEL_ANY	(1U << 2)
>>   
>> +/*
>> + * accept flags stored in accept_flags
>> + */
>> +#define IORING_ACCEPT_MULTISHOT	(1U << 15)
> 
> It isn't stored in accept_flags, is it? This is an io_uring private
> flag, and it's in ioprio. Which is honestly a good place for per-op
> private flags, since nobody really uses ioprio outside of read/write
> style requests. But the comment is wrong :-)

Ah, yes, thanks for pointing it out. I forgot to update the comment.
> 



* Re: [PATCH 3/5] io_uring: let fast poll support multishot
  2022-05-06 22:02     ` Jens Axboe
@ 2022-05-07  6:32       ` Hao Xu
  2022-05-07  9:26       ` Pavel Begunkov
  1 sibling, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-07  6:32 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: linux-kernel

On 2022/5/7 6:02 AM, Jens Axboe wrote:
> On 5/6/22 11:19 AM, Pavel Begunkov wrote:
>> On 5/6/22 08:01, Hao Xu wrote:
[...]
>>> +            if (req->flags & REQ_F_APOLL_MULTISHOT) {
>>> +                io_tw_lock(req->ctx, locked);
>>> +                if (likely(!(req->task->flags & PF_EXITING)))
>>> +                    io_queue_sqe(req);
>>
>> That looks dangerous, io_queue_sqe() usually takes the request
>> ownership and doesn't expect that someone, i.e.
>> io_poll_check_events(), may still be actively using it.
> 
> I took a look at this, too. We do own the request at this point, but
> it's still on the poll list. If io_accept() fails, then we do run the
> poll_clean.
> 
>> E.g. io_accept() fails on fd < 0, return an error, io_queue_sqe() ->
>> io_queue_async() -> io_req_complete_failed() kills it. Then
>> io_poll_check_events() and polling in general carry on using the freed
>> request => UAF. Didn't look at it too carefully, but there might other
>> similar cases.
> 
> But we better have done poll_clean() before returning the error. What am
> I missing here?

Sorry, I don't get it. I've done the poll_clean() before returning the
error in io_accept().
> 



* Re: [PATCH 4/5] io_uring: add a helper for poll clean
  2022-05-06 14:36   ` Jens Axboe
@ 2022-05-07  6:37     ` Hao Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-07  6:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 2022/5/6 10:36 PM, Jens Axboe wrote:
> On 5/6/22 1:01 AM, Hao Xu wrote:
>> From: Hao Xu <howeyxu@tencent.com>
>>
>> Add a helper for poll clean, it will be used in the multishot accept in
>> the later patches.
> 
> Should this just go into io_clean_op()? Didn't look at it thoroughly,
> but it'd remove some cases from the next patch if it could.
> 
Actually this was my first version; here I put it in io_accept() to make
it happen as early as possible. I've rethought this, and it seems OK to
put it into io_clean_op().


* Re: [PATCH 4/5] io_uring: add a helper for poll clean
  2022-05-06 16:22   ` Pavel Begunkov
@ 2022-05-07  6:43     ` Hao Xu
  2022-05-07  9:29       ` Pavel Begunkov
  0 siblings, 1 reply; 36+ messages in thread
From: Hao Xu @ 2022-05-07  6:43 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe, linux-kernel

On 2022/5/7 12:22 AM, Pavel Begunkov wrote:
> On 5/6/22 08:01, Hao Xu wrote:
>> From: Hao Xu <howeyxu@tencent.com>
>>
>> Add a helper for poll clean, it will be used in the multishot accept in
>> the later patches.
>>
>> Signed-off-by: Hao Xu <howeyxu@tencent.com>
>> ---
>>   fs/io_uring.c | 23 ++++++++++++++++++-----
>>   1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index d33777575faf..0a83ecc457d1 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5711,6 +5711,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>       return 0;
>>   }
>> +static inline void __io_poll_clean(struct io_kiocb *req)
>> +{
>> +    struct io_ring_ctx *ctx = req->ctx;
>> +
>> +    io_poll_remove_entries(req);
>> +    spin_lock(&ctx->completion_lock);
>> +    hash_del(&req->hash_node);
>> +    spin_unlock(&ctx->completion_lock);
>> +}
>> +
>> +#define REQ_F_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)
>> +static inline void io_poll_clean(struct io_kiocb *req)
>> +{
>> +    if ((req->flags & REQ_F_APOLL_MULTI_POLLED) == REQ_F_APOLL_MULTI_POLLED)
> 
> So it triggers for apoll multishot only when REQ_F_POLLED is _not_ set,
> but if it's not set it never went through arm_poll / etc. and there is
> nothing to clean up. What is the catch?

No, it is triggered for apoll multishot only when REQ_F_POLLED is set.
> 
> btw, don't see the function used in this patch, better to add it later
> or at least mark with attribute unused, or some may get build failures.
Gotcha.
> 
> 
>> +        __io_poll_clean(req);
>> +}
>> +
>>   static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>>   {
>>       struct io_accept *accept = &req->accept;
>> @@ -6041,17 +6058,13 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
>>   static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
>>   {
>> -    struct io_ring_ctx *ctx = req->ctx;
>>       int ret;
>>       ret = io_poll_check_events(req, locked);
>>       if (ret > 0)
>>           return;
>> -    io_poll_remove_entries(req);
>> -    spin_lock(&ctx->completion_lock);
>> -    hash_del(&req->hash_node);
>> -    spin_unlock(&ctx->completion_lock);
>> +    __io_poll_clean(req);
>>       if (!ret)
>>           io_req_task_submit(req, locked);
> 



* Re: [PATCH 3/5] io_uring: let fast poll support multishot
  2022-05-06 17:19   ` Pavel Begunkov
  2022-05-06 22:02     ` Jens Axboe
@ 2022-05-07  7:08     ` Hao Xu
  2022-05-07  9:47       ` Pavel Begunkov
  1 sibling, 1 reply; 36+ messages in thread
From: Hao Xu @ 2022-05-07  7:08 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe, linux-kernel

On 2022/5/7 1:19 AM, Pavel Begunkov wrote:
> On 5/6/22 08:01, Hao Xu wrote:
[...]
>> +            if (req->flags & REQ_F_APOLL_MULTISHOT) {
>> +                io_tw_lock(req->ctx, locked);
>> +                if (likely(!(req->task->flags & PF_EXITING)))
>> +                    io_queue_sqe(req);
> 
> That looks dangerous, io_queue_sqe() usually takes the request ownership
> and doesn't expect that someone, i.e. io_poll_check_events(), may still be
> actively using it.
> 
> E.g. io_accept() fails on fd < 0, return an error,
> io_queue_sqe() -> io_queue_async() -> io_req_complete_failed()
> kills it. Then io_poll_check_events() and polling in general
> carry on using the freed request => UAF. Didn't look at it
> too carefully, but there might other similar cases.
> 
I checked this when I did the coding; it seems the only case is
while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs));
which uses req again after the req is recycled in the io_queue_sqe()
path, like you pointed out above, but this case should be OK since we
haven't reused the struct req{} at that point.
In my first version, I skipped the do while {} in io_poll_check_events()
for multishot apoll and did the reaping in io_req_task_submit():

static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
{
	int ret;

	ret = io_poll_check_events(req, locked);
	if (ret > 0)
		return;

	__io_poll_clean(req);

	if (!ret)
		io_req_task_submit(req, locked);   <------here
	else
		io_req_complete_failed(req, ret);
}

But the disadvantage is that in high-frequency workload cases it may
loop in io_poll_check_events() for a long time and only generate CQEs
in the above io_req_task_submit(), which is not good in terms of
latency.

> 


* Re: [PATCH 5/5] io_uring: implement multishot mode for accept
  2022-05-06 14:42   ` Jens Axboe
@ 2022-05-07  9:13     ` Hao Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-07  9:13 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 2022/5/6 10:42 PM, Jens Axboe wrote:
> On 5/6/22 1:01 AM, Hao Xu wrote:
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 0a83ecc457d1..9febe7774dc3 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1254,6 +1254,7 @@ static int io_close_fixed(struct io_kiocb *req, unsigned int issue_flags);
>>   static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer);
>>   static void io_eventfd_signal(struct io_ring_ctx *ctx);
>>   static void io_req_tw_post_queue(struct io_kiocb *req, s32 res, u32 cflags);
>> +static void io_poll_remove_entries(struct io_kiocb *req);
>>   
>>   static struct kmem_cache *req_cachep;
>>   
>> @@ -5690,24 +5691,29 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>   static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>   {
>>   	struct io_accept *accept = &req->accept;
>> +	bool multishot;
>>   
>>   	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>   		return -EINVAL;
>> -	if (sqe->ioprio || sqe->len || sqe->buf_index)
>> +	if (sqe->len || sqe->buf_index)
>>   		return -EINVAL;
>>   
>>   	accept->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>   	accept->addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>>   	accept->flags = READ_ONCE(sqe->accept_flags);
>>   	accept->nofile = rlimit(RLIMIT_NOFILE);
>> +	multishot = !!(READ_ONCE(sqe->ioprio) & IORING_ACCEPT_MULTISHOT);
> 
> I tend to like:
> 
> 	multishot = (READ_ONCE(sqe->ioprio) & IORING_ACCEPT_MULTISHOT) != 0;
> 
> as I think it's more readable. But I think we really want it ala:
> 
> 	u16 poll_flags;
> 
> 	poll_flags = READ_ONCE(sqe->ioprio);
> 	if (poll_flags & ~IORING_ACCEPT_MULTISHOT)
> 		return -EINVAL;
> 
> 	...
> 
> to ensure that we can add more flags later, hence only accepting this
> single flag right now.
> 
> Do we need REQ_F_APOLL_MULTI_POLLED, or can we just store whether this
> is a multishot request in struct io_accept?
I think we can do it that way, but it may be a bit inconvenient if we
add other multishot opcodes. With REQ_F_APOLL_MULTI_POLLED we can just
check req->flags in the poll arming path, which keeps it opcode-agnostic.
> 
>> @@ -5760,7 +5774,35 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>>   		ret = io_install_fixed_file(req, file, issue_flags,
>>   					    accept->file_slot - 1);
>>   	}
>> -	__io_req_complete(req, issue_flags, ret, 0);
>> +
>> +	if (req->flags & REQ_F_APOLL_MULTISHOT) {
>> +		if (ret >= 0) {
>> +			bool filled;
>> +
>> +			spin_lock(&ctx->completion_lock);
>> +			filled = io_fill_cqe_aux(ctx, req->cqe.user_data, ret,
>> +						 IORING_CQE_F_MORE);
>> +			io_commit_cqring(ctx);
>> +			spin_unlock(&ctx->completion_lock);
>> +			if (unlikely(!filled)) {
>> +				io_poll_clean(req);
>> +				return -ECANCELED;
>> +			}
>> +			io_cqring_ev_posted(ctx);
>> +			goto retry;
>> +		} else {
>> +			/*
>> +			 * the apoll multishot req should handle poll
>> +			 * cancellation by itself since the upper layer
>> +			 * who called io_queue_sqe() cannot get errors
>> +			 * happened here.
>> +			 */
>> +			io_poll_clean(req);
>> +			return ret;
>> +		}
>> +	} else {
>> +		__io_req_complete(req, issue_flags, ret, 0);
>> +	}
>>   	return 0;
>>   }
> 
> I'd probably just make that:
> 
> 	if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
> 		__io_req_complete(req, issue_flags, ret, 0);
> 		return 0;
> 	}
> 	if (ret >= 0) {
> 		bool filled;
> 
> 		spin_lock(&ctx->completion_lock);
> 		filled = io_fill_cqe_aux(ctx, req->cqe.user_data, ret,
> 					 IORING_CQE_F_MORE);
> 		io_commit_cqring(ctx);
> 		spin_unlock(&ctx->completion_lock);
> 		if (filled) {
> 			io_cqring_ev_posted(ctx);
> 			goto retry;
> 		}
> 		/* fall through to error case */
> 		ret = -ECANCELED;
> 	}
> 
> 	/*
> 	 * the apoll multishot req should handle poll
> 	 * cancellation by itself since the upper layer
> 	 * who called io_queue_sqe() cannot get errors
> 	 * happened here.
> 	 */
> 	io_poll_clean(req);
> 	return ret;
> 
> which I think is a lot easier to read and keeps the indentation at a
> manageable level and reduces duplicate code.
Great, thanks, it's better.
> 
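
For completeness, roughly how userspace is expected to drive this (a
sketch against this series using plain liburing calls; handle_conn() is
a placeholder and the final helper API may differ):

	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

	io_uring_prep_accept(sqe, listen_fd, NULL, NULL, 0);
	sqe->ioprio |= IORING_ACCEPT_MULTISHOT;	/* flag lives in sqe->ioprio */
	io_uring_submit(&ring);

	for (;;) {
		struct io_uring_cqe *cqe;

		if (io_uring_wait_cqe(&ring, &cqe))
			break;
		if (cqe->res < 0) {			/* e.g. -ECANCELED */
			io_uring_cqe_seen(&ring, cqe);
			break;
		}
		handle_conn(cqe->res);			/* the accepted fd */
		if (!(cqe->flags & IORING_CQE_F_MORE)) {
			io_uring_cqe_seen(&ring, cqe);
			break;				/* multishot ended, re-arm */
		}
		io_uring_cqe_seen(&ring, cqe);
	}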



* Re: [PATCH 3/5] io_uring: let fast poll support multishot
  2022-05-06 22:02     ` Jens Axboe
  2022-05-07  6:32       ` Hao Xu
@ 2022-05-07  9:26       ` Pavel Begunkov
  1 sibling, 0 replies; 36+ messages in thread
From: Pavel Begunkov @ 2022-05-07  9:26 UTC (permalink / raw)
  To: Jens Axboe, Hao Xu, io-uring; +Cc: linux-kernel

On 5/6/22 23:02, Jens Axboe wrote:
> On 5/6/22 11:19 AM, Pavel Begunkov wrote:
>> On 5/6/22 08:01, Hao Xu wrote:
[...]
>>> +            if (req->flags & REQ_F_APOLL_MULTISHOT) {
>>> +                io_tw_lock(req->ctx, locked);
>>> +                if (likely(!(req->task->flags & PF_EXITING)))
>>> +                    io_queue_sqe(req);
>>
>> That looks dangerous, io_queue_sqe() usually takes the request
>> ownership and doesn't expect that someone, i.e.
>> io_poll_check_events(), may still be actively using it.
> 
> I took a look at this, too. We do own the request at this point, but

Right, but we don't pass the ownership into io_queue_sqe(). IOW,
it can potentially free it / use tw / etc. inside, and then we come
back to io_poll_check_events() with a broken req.

> it's still on the poll list. If io_accept() fails, then we do run the
> poll_clean.
> 
>> E.g. io_accept() fails on fd < 0, return an error, io_queue_sqe() ->
>> io_queue_async() -> io_req_complete_failed() kills it. Then
>> io_poll_check_events() and polling in general carry on using the freed
>> request => UAF. Didn't look at it too carefully, but there might other
>> similar cases.
> 
> But we better have done poll_clean() before returning the error. What am
> I missing here?

One scenario I'd be worry about is sth like:


io_apoll_task_func()                            |
-> io_poll_check_events()                       |
   // 1st iteration                              |
   -> io_queue_sqe()                             |
                                                 | poll cancel()
                                                 |   -> set IO_POLL_CANCEL_FLAG
     -> io_accept() fails                        |
       -> io_poll_clean()                        |
     -> io_req_complete_failed()                 |
   // 2nd iteration finds IO_POLL_CANCEL_FLAG    |
     return -ECANCELED                           |
-> io_req_complete_failed(req, ret)             |


The problem in this example is double io_req_complete_failed()

-- 
Pavel Begunkov


* Re: [PATCH 4/5] io_uring: add a helper for poll clean
  2022-05-07  6:43     ` Hao Xu
@ 2022-05-07  9:29       ` Pavel Begunkov
  0 siblings, 0 replies; 36+ messages in thread
From: Pavel Begunkov @ 2022-05-07  9:29 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Jens Axboe, linux-kernel

On 5/7/22 07:43, Hao Xu wrote:
> On 2022/5/7 12:22 AM, Pavel Begunkov wrote:
>> On 5/6/22 08:01, Hao Xu wrote:
[...]
>>> +#define REQ_F_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)
>>> +static inline void io_poll_clean(struct io_kiocb *req)
>>> +{
>>> +    if ((req->flags & REQ_F_APOLL_MULTI_POLLED) == REQ_F_APOLL_MULTI_POLLED)
>>
>> So it triggers for apoll multishot only when REQ_F_POLLED is _not_ set,
>> but if it's not set it never went through arm_poll / etc. and there is
>> nothing to clean up. What is the catch?
> 
> No, it is triggered for apoll multishot only when REQ_F_POLLED is set.

OK, I see it now; I probably confused REQ_F_APOLL_MULTI_POLLED on the
right-hand side with something else.


>> btw, don't see the function used in this patch, better to add it later
>> or at least mark with attribute unused, or some may get build failures.
> Gotcha.

-- 
Pavel Begunkov


* Re: [PATCH 3/5] io_uring: let fast poll support multishot
  2022-05-07  7:08     ` Hao Xu
@ 2022-05-07  9:47       ` Pavel Begunkov
  2022-05-07 11:06         ` Hao Xu
  0 siblings, 1 reply; 36+ messages in thread
From: Pavel Begunkov @ 2022-05-07  9:47 UTC (permalink / raw)
  To: Hao Xu, io-uring; +Cc: Jens Axboe, linux-kernel

On 5/7/22 08:08, Hao Xu wrote:
> On 2022/5/7 1:19 AM, Pavel Begunkov wrote:
>> On 5/6/22 08:01, Hao Xu wrote:
[...]
>> That looks dangerous, io_queue_sqe() usually takes the request ownership
>> and doesn't expect that someone, i.e. io_poll_check_events(), may still be
>> actively using it.
>>
>> E.g. io_accept() fails on fd < 0, return an error,
>> io_queue_sqe() -> io_queue_async() -> io_req_complete_failed()
>> kills it. Then io_poll_check_events() and polling in general
>> carry on using the freed request => UAF. Didn't look at it
>> too carefully, but there might other similar cases.
>>
> I checked this when I did the coding, it seems the only case is
> while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs));
> uses req again after req recycled in io_queue_sqe() path like you
> pointed out above, but this case should be ok since we haven't
> reuse the struct req{} at that point.

Replied to another message with an example that I think might
be broken, please take a look.

The issue is that io_queue_sqe() was always consuming / freeing /
redirecting / etc. requests, i.e. call it and forget about the req.
With io_accept now it may or may not free it, without even returning
any indication of that. This implicit knowledge is quite tricky to
maintain.

might make more sense to "duplicate" io_queue_sqe()

ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
// REQ_F_COMPLETE_INLINE should never happen, no check for that
// don't care about io_arm_ltimeout(), should already be armed
// ret handling here




-- 
Pavel Begunkov


* Re: [PATCH 3/5] io_uring: let fast poll support multishot
  2022-05-07  9:47       ` Pavel Begunkov
@ 2022-05-07 11:06         ` Hao Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-07 11:06 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe, linux-kernel

On 2022/5/7 5:47 PM, Pavel Begunkov wrote:
> On 5/7/22 08:08, Hao Xu wrote:
[...]
>> I checked this when I did the coding; it seems the only case is
>> while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs));
>> which uses req again after the req is recycled in the io_queue_sqe()
>> path, like you pointed out above, but this case should be OK since we
>> haven't reused the struct req{} at that point.
> 
> Replied to another message with an example that I think might
> be broken, please take a look.
I saw it just now; it looks like a valid case to me. Thanks.
> 
> The issue is that io_queue_sqe() was always consuming / freeing /
> redirecting / etc. requests, i.e. call it and forget about the req.
> With io_accept now it may or may not free it, without even returning
> any indication of that. This implicit knowledge is quite tricky to
> maintain.
> 
> might make more sense to "duplicate" io_queue_sqe()
> 
> ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
> // REQ_F_COMPLETE_INLINE should never happen, no check for that
> // don't care about io_arm_ltimeout(), should already be armed
> // ret handling here
This is what I'm doing for v3; it indeed makes more sense.
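
Fleshed out, something along these lines (a sketch of the idea only, not
the actual v3 code; the function name and error handling are assumptions):

	static void io_poll_issue_mshot(struct io_kiocb *req, bool *locked)
	{
		int ret;

		io_tw_lock(req->ctx, locked);
		/* poll is already armed; no io_arm_ltimeout() and no
		 * REQ_F_COMPLETE_INLINE handling needed here
		 */
		ret = io_issue_sqe(req, IO_URING_F_NONBLOCK |
					IO_URING_F_COMPLETE_DEFER);
		if (ret)
			io_req_complete_failed(req, ret);
	}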



* Re: [PATCH v2 0/5] fast poll multishot mode
  2022-05-07  3:08       ` Jens Axboe
@ 2022-05-07 16:01         ` Hao Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-07 16:01 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 2022/5/7 11:08 AM, Jens Axboe wrote:
[...]
> Running some quick testing, on an actual test box (previous numbers were
> from a vm on my laptop):
> 
> Testing singleshot, normal files
> Did 10000 accepts
> 
> ________________________________________________________
> Executed in  216.10 millis    fish           external
>     usr time    9.32 millis  150.00 micros    9.17 millis
>     sys time  110.06 millis   67.00 micros  109.99 millis
> 
> Testing multishot, fixed files
> Did 10000 accepts
> 
> ________________________________________________________
> Executed in  189.04 millis    fish           external
>     usr time   11.86 millis  159.00 micros   11.71 millis
>     sys time   93.71 millis   70.00 micros   93.64 millis
> 
> That's about ~19 usec to accept a connection, pretty decent. Using
> singleshot and with fixed files, it shaves about ~8% off, ends at around
> 200msec.
> 
> I think we can get away with using fixed files and multishot, attaching
I'm not following; do you mean we shouldn't do multishot + fixed files,
or that we should use multishot + fixed to make the result better?
> the quick patch I did below to test it. We need something better than
Sorry Jens, I didn't see the quick patch; did I misunderstand something?
> this, otherwise once the space fills up, we'll likely end up with a
> sparse space and the naive approach of just incrementing the next slot
> won't work at all.

> 



end of thread

Thread overview: 36+ messages
2022-05-06  7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
2022-05-06  7:00 ` [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
2022-05-06 14:32   ` Jens Axboe
2022-05-07  4:05     ` Hao Xu
2022-05-06  7:00 ` [PATCH 2/5] io_uring: add REQ_F_APOLL_MULTISHOT for requests Hao Xu
2022-05-06  7:01 ` [PATCH 3/5] io_uring: let fast poll support multishot Hao Xu
2022-05-06 17:19   ` Pavel Begunkov
2022-05-06 22:02     ` Jens Axboe
2022-05-07  6:32       ` Hao Xu
2022-05-07  9:26       ` Pavel Begunkov
2022-05-07  7:08     ` Hao Xu
2022-05-07  9:47       ` Pavel Begunkov
2022-05-07 11:06         ` Hao Xu
2022-05-06 18:02   ` kernel test robot
2022-05-06  7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
2022-05-06 11:04   ` kernel test robot
2022-05-06 12:47   ` kernel test robot
2022-05-06 14:36   ` Jens Axboe
2022-05-07  6:37     ` Hao Xu
2022-05-06 16:22   ` Pavel Begunkov
2022-05-07  6:43     ` Hao Xu
2022-05-07  9:29       ` Pavel Begunkov
2022-05-06  7:01 ` [PATCH 5/5] io_uring: implement multishot mode for accept Hao Xu
2022-05-06 14:42   ` Jens Axboe
2022-05-07  9:13     ` Hao Xu
2022-05-06 20:50   ` Jens Axboe
2022-05-06 21:29     ` Jens Axboe
2022-05-06  7:36 ` [PATCH v2 0/5] fast poll multishot mode Hao Xu
2022-05-06 14:18   ` Jens Axboe
2022-05-06 16:01     ` Pavel Begunkov
2022-05-06 16:03       ` Jens Axboe
2022-05-06 22:23 ` Jens Axboe
2022-05-06 23:26   ` Jens Axboe
2022-05-07  2:33     ` Jens Axboe
2022-05-07  3:08       ` Jens Axboe
2022-05-07 16:01         ` Hao Xu
