io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/6] fast poll multishot mode
@ 2021-09-03 11:00 Hao Xu
  2021-09-03 11:00 ` [PATCH 1/6] io_uring: enhance flush completion logic Hao Xu
                   ` (6 more replies)
  0 siblings, 7 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-03 11:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

Let multishot support multishot mode, currently only add accept as its
first comsumer.
theoretical analysis:
  1) when connections come in fast
    - singleshot:
              add accept sqe(userpsace) --> accept inline
                              ^                 |
                              |-----------------|
    - multishot:
             add accept sqe(userspace) --> accept inline
                                              ^     |
                                              |--*--|

    we do accept repeatedly in * place until get EAGAIN

  2) when connections come in at a low pressure
    similar thing like 1), we reduce a lot of userspace-kernel context
    switch and useless vfs_poll()


tests:
Did some tests, which goes in this way:

  server    client(multiple)
  accept    connect
  read      write
  write     read
  close     close

Basically, raise up a number of clients(on same machine with server) to
connect to the server, and then write some data to it, the server will
write those data back to the client after it receives them, and then
close the connection after write return. Then the client will read the
data and then close the connection. Here I test 10000 clients connect
one server, data size 128 bytes. And each client has a go routine for
it, so they come to the server in short time.
test 20 times before/after this patchset, time spent:(unit cycle, which
is the return value of clock())
before:
  1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075
  +1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984
  +1934226+1914385)/20.0 = 1927633.75
after:
  1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112
  +1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506
  +1871324+1940803)/20.0 = 1894750.45

(1927633.75 - 1894750.45) / 1927633.75 = 1.65%

Higher pressure like 10^5 connections causes problems since ports are
in use.
I'll test the cancellation path and try to lift pressure to much high
level to see if numbers get better. Sent this early for comments and
suggestions.

Hao Xu (6):
  io_uring: enhance flush completion logic
  io_uring: add IORING_ACCEPT_MULTISHOT for accept
  io_uring: add REQ_F_APOLL_MULTISHOT for requests
  io_uring: let fast poll support multishot
  io_uring: implement multishot mode for accept
  io_uring: enable multishot mode for accept

 fs/io_uring.c                 | 72 +++++++++++++++++++++++++++++------
 include/uapi/linux/io_uring.h |  4 ++
 2 files changed, 64 insertions(+), 12 deletions(-)

-- 
2.24.4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 1/6] io_uring: enhance flush completion logic
  2021-09-03 11:00 [RFC 0/6] fast poll multishot mode Hao Xu
@ 2021-09-03 11:00 ` Hao Xu
  2021-09-03 11:42   ` Pavel Begunkov
  2021-09-03 11:00 ` [PATCH 2/6] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 46+ messages in thread
From: Hao Xu @ 2021-09-03 11:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

Though currently refcount of a req is always one when we flush inline
completions, but still a chance there will be exception in the future.
Enhance the flush logic to make sure we maintain compl_nr correctly.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---

we need to either removing the if check to claim clearly that the req's
refcount is 1 or adding this patch's logic.

 fs/io_uring.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 2bde732a1183..c48d43207f57 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2291,7 +2291,7 @@ static void io_submit_flush_completions(struct io_ring_ctx *ctx)
 	__must_hold(&ctx->uring_lock)
 {
 	struct io_submit_state *state = &ctx->submit_state;
-	int i, nr = state->compl_nr;
+	int i, nr = state->compl_nr, remain = 0;
 	struct req_batch rb;
 
 	spin_lock(&ctx->completion_lock);
@@ -2311,10 +2311,12 @@ static void io_submit_flush_completions(struct io_ring_ctx *ctx)
 
 		if (req_ref_put_and_test(req))
 			io_req_free_batch(&rb, req, &ctx->submit_state);
+		else
+			state->compl_reqs[remain++] = state->compl_reqs[i];
 	}
 
 	io_req_free_batch_finish(ctx, &rb);
-	state->compl_nr = 0;
+	state->compl_nr = remain;
 }
 
 /*
-- 
2.24.4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 2/6] io_uring: add IORING_ACCEPT_MULTISHOT for accept
  2021-09-03 11:00 [RFC 0/6] fast poll multishot mode Hao Xu
  2021-09-03 11:00 ` [PATCH 1/6] io_uring: enhance flush completion logic Hao Xu
@ 2021-09-03 11:00 ` Hao Xu
  2021-09-03 11:00 ` [PATCH 3/6] io_uring: add REQ_F_APOLL_MULTISHOT for requests Hao Xu
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-03 11:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

add an accept_flag IORING_ACCEPT_MULTISHOT for accept, which is to
support multishot.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---
 include/uapi/linux/io_uring.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 3caec9199658..faa3f2e70e46 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -177,6 +177,10 @@ enum {
 #define IORING_POLL_UPDATE_EVENTS	(1U << 1)
 #define IORING_POLL_UPDATE_USER_DATA	(1U << 2)
 
+/*
+ * accept flags stored in accept_flags
+ */
+#define IORING_ACCEPT_MULTISHOT	(1U << 0)
 /*
  * IO completion data structure (Completion Queue Entry)
  */
-- 
2.24.4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 3/6] io_uring: add REQ_F_APOLL_MULTISHOT for requests
  2021-09-03 11:00 [RFC 0/6] fast poll multishot mode Hao Xu
  2021-09-03 11:00 ` [PATCH 1/6] io_uring: enhance flush completion logic Hao Xu
  2021-09-03 11:00 ` [PATCH 2/6] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
@ 2021-09-03 11:00 ` Hao Xu
  2021-09-03 11:00 ` [PATCH 4/6] io_uring: let fast poll support multishot Hao Xu
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-03 11:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

add a flag to indicate multishot mode for fast poll. Currently only
accept use it, but there may be more operations leveraging it in the
future.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---
 fs/io_uring.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index c48d43207f57..d6df60c4cdb9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -720,6 +720,7 @@ enum {
 	REQ_F_NOWAIT_READ_BIT,
 	REQ_F_NOWAIT_WRITE_BIT,
 	REQ_F_ISREG_BIT,
+	REQ_F_APOLL_MULTISHOT_BIT,
 
 	/* not a real bit, just to check we're not overflowing the space */
 	__REQ_F_LAST_BIT,
@@ -773,6 +774,8 @@ enum {
 	REQ_F_REFCOUNT		= BIT(REQ_F_REFCOUNT_BIT),
 	/* there is a linked timeout that has to be armed */
 	REQ_F_ARM_LTIMEOUT	= BIT(REQ_F_ARM_LTIMEOUT_BIT),
+	/* fast poll multishot mode */
+	REQ_F_APOLL_MULTISHOT	= BIT(REQ_F_APOLL_MULTISHOT_BIT),
 };
 
 struct async_poll {
-- 
2.24.4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-03 11:00 [RFC 0/6] fast poll multishot mode Hao Xu
                   ` (2 preceding siblings ...)
  2021-09-03 11:00 ` [PATCH 3/6] io_uring: add REQ_F_APOLL_MULTISHOT for requests Hao Xu
@ 2021-09-03 11:00 ` Hao Xu
  2021-09-06 15:56   ` Pavel Begunkov
  2021-09-06 19:04   ` Pavel Begunkov
  2021-09-03 11:00 ` [PATCH 5/6] io_uring: implement multishot mode for accept Hao Xu
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-03 11:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

For operations like accept, multishot is a useful feature, since we can
reduce a number of accept sqe. Let's integrate it to fast poll, it may
be good for other operations in the future.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---
 fs/io_uring.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index d6df60c4cdb9..dae7044e0c24 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5277,8 +5277,15 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
 		return;
 	}
 
-	hash_del(&req->hash_node);
-	io_poll_remove_double(req);
+	if (READ_ONCE(apoll->poll.canceled))
+		apoll->poll.events |= EPOLLONESHOT;
+	if (apoll->poll.events & EPOLLONESHOT) {
+		hash_del(&req->hash_node);
+		io_poll_remove_double(req);
+	} else {
+		add_wait_queue(apoll->poll.head, &apoll->poll.wait);
+	}
+
 	spin_unlock(&ctx->completion_lock);
 
 	if (!READ_ONCE(apoll->poll.canceled))
@@ -5366,7 +5373,7 @@ static int io_arm_poll_handler(struct io_kiocb *req)
 	struct io_ring_ctx *ctx = req->ctx;
 	struct async_poll *apoll;
 	struct io_poll_table ipt;
-	__poll_t ret, mask = EPOLLONESHOT | POLLERR | POLLPRI;
+	__poll_t ret, mask = POLLERR | POLLPRI;
 	int rw;
 
 	if (!req->file || !file_can_poll(req->file))
@@ -5388,6 +5395,8 @@ static int io_arm_poll_handler(struct io_kiocb *req)
 		rw = WRITE;
 		mask |= POLLOUT | POLLWRNORM;
 	}
+	if (!(req->flags & REQ_F_APOLL_MULTISHOT))
+		mask |= EPOLLONESHOT;
 
 	/* if we can't nonblock try, then no point in arming a poll handler */
 	if (!io_file_supports_nowait(req, rw))
-- 
2.24.4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 5/6] io_uring: implement multishot mode for accept
  2021-09-03 11:00 [RFC 0/6] fast poll multishot mode Hao Xu
                   ` (3 preceding siblings ...)
  2021-09-03 11:00 ` [PATCH 4/6] io_uring: let fast poll support multishot Hao Xu
@ 2021-09-03 11:00 ` Hao Xu
  2021-09-04 22:39   ` Pavel Begunkov
  2021-09-03 11:00 ` [PATCH 6/6] io_uring: enable " Hao Xu
  2021-09-03 11:02 ` [RFC 0/6] fast poll multishot mode Hao Xu
  6 siblings, 1 reply; 46+ messages in thread
From: Hao Xu @ 2021-09-03 11:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

Refactor io_accept() to support multishot mode.

theoretical analysis:
  1) when connections come in fast
    - singleshot:
              add accept sqe(userpsace) --> accept inline
                              ^                 |
                              |-----------------|
    - multishot:
             add accept sqe(userspace) --> accept inline
                                              ^     |
                                              |--*--|

    we do accept repeatedly in * place until get EAGAIN

  2) when connections come in at a low pressure
    similar thing like 1), we reduce a lot of userspace-kernel context
    switch and useless vfs_poll()

tests:
Did some tests, which goes in this way:

  server    client(multiple)
  accept    connect
  read      write
  write     read
  close     close

Basically, raise up a number of clients(on same machine with server) to
connect to the server, and then write some data to it, the server will
write those data back to the client after it receives them, and then
close the connection after write return. Then the client will read the
data and then close the connection. Here I test 10000 clients connect
one server, data size 128 bytes. And each client has a go routine for
it, so they come to the server in short time.
test 20 times before/after this patchset, time spent:(unit cycle, which
is the return value of clock())
before:
  1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075
  +1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984
  +1934226+1914385)/20.0 = 1927633.75
after:
  1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112
  +1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506
  +1871324+1940803)/20.0 = 1894750.45

(1927633.75 - 1894750.45) / 1927633.75 = 1.65%

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---

not sure if we should cancel it when io_cqring_fill_event() reurn false

 fs/io_uring.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index dae7044e0c24..eb81d37dce78 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4885,16 +4885,18 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 {
+	struct io_ring_ctx *ctx = req->ctx;
 	struct io_accept *accept = &req->accept;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 	unsigned int file_flags = force_nonblock ? O_NONBLOCK : 0;
 	bool fixed = !!accept->file_slot;
 	struct file *file;
-	int ret, fd;
+	int ret, ret2 = 0, fd;
 
 	if (req->file->f_flags & O_NONBLOCK)
 		req->flags |= REQ_F_NOWAIT;
 
+retry:
 	if (!fixed) {
 		fd = __get_unused_fd_flags(accept->flags, accept->nofile);
 		if (unlikely(fd < 0))
@@ -4906,20 +4908,42 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 		if (!fixed)
 			put_unused_fd(fd);
 		ret = PTR_ERR(file);
-		if (ret == -EAGAIN && force_nonblock)
-			return -EAGAIN;
+		if (ret == -EAGAIN && force_nonblock) {
+			if ((req->flags & (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)) ==
+			    (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED))
+				ret = 0;
+			return ret;
+		}
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
 		req_set_fail(req);
 	} else if (!fixed) {
 		fd_install(fd, file);
 		ret = fd;
+		/*
+		 * if it's in multishot mode, let's return -EAGAIN to make it go
+		 * into fast poll path
+		 */
+		if ((req->flags & REQ_F_APOLL_MULTISHOT) && force_nonblock &&
+		   !(req->flags & REQ_F_POLLED))
+			ret2 = -EAGAIN;
 	} else {
 		ret = io_install_fixed_file(req, file, issue_flags,
 					    accept->file_slot - 1);
 	}
-	__io_req_complete(req, issue_flags, ret, 0);
-	return 0;
+
+	if (req->flags & REQ_F_APOLL_MULTISHOT) {
+		spin_lock(&ctx->completion_lock);
+		if (io_cqring_fill_event(ctx, req->user_data, ret, 0)) {
+			io_commit_cqring(ctx);
+			ctx->cq_extra++;
+		}
+		spin_unlock(&ctx->completion_lock);
+		goto retry;
+	} else {
+		__io_req_complete(req, issue_flags, ret, 0);
+	}
+	return ret2;
 }
 
 static int io_connect_prep_async(struct io_kiocb *req)
-- 
2.24.4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-03 11:00 [RFC 0/6] fast poll multishot mode Hao Xu
                   ` (4 preceding siblings ...)
  2021-09-03 11:00 ` [PATCH 5/6] io_uring: implement multishot mode for accept Hao Xu
@ 2021-09-03 11:00 ` Hao Xu
  2021-09-03 16:29   ` Jens Axboe
  2021-09-04 22:43   ` Pavel Begunkov
  2021-09-03 11:02 ` [RFC 0/6] fast poll multishot mode Hao Xu
  6 siblings, 2 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-03 11:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

Update io_accept_prep() to enable multishot mode for accept operation.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
---
 fs/io_uring.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index eb81d37dce78..34612646ae3c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_accept *accept = &req->accept;
+	bool is_multishot;
 
 	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
 		return -EINVAL;
@@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	accept->flags = READ_ONCE(sqe->accept_flags);
 	accept->nofile = rlimit(RLIMIT_NOFILE);
 
+	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
+	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
+		return -EINVAL;
+
 	accept->file_slot = READ_ONCE(sqe->file_index);
 	if (accept->file_slot && ((req->open.how.flags & O_CLOEXEC) ||
-				  (accept->flags & SOCK_CLOEXEC)))
+				  (accept->flags & SOCK_CLOEXEC) || is_multishot))
 		return -EINVAL;
-	if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK))
+	if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK | IORING_ACCEPT_MULTISHOT))
 		return -EINVAL;
 	if (SOCK_NONBLOCK != O_NONBLOCK && (accept->flags & SOCK_NONBLOCK))
 		accept->flags = (accept->flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
+	if (is_multishot) {
+		req->flags |= REQ_F_APOLL_MULTISHOT;
+		accept->flags &= ~IORING_ACCEPT_MULTISHOT;
+	}
+
 	return 0;
 }
 
-- 
2.24.4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 0/6] fast poll multishot mode
  2021-09-03 11:00 [RFC 0/6] fast poll multishot mode Hao Xu
                   ` (5 preceding siblings ...)
  2021-09-03 11:00 ` [PATCH 6/6] io_uring: enable " Hao Xu
@ 2021-09-03 11:02 ` Hao Xu
  6 siblings, 0 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-03 11:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

在 2021/9/3 下午7:00, Hao Xu 写道:
> Let multishot support multishot mode, currently only add accept as its

               ^ fast poll
> first comsumer.
> theoretical analysis:
>    1) when connections come in fast
>      - singleshot:
>                add accept sqe(userpsace) --> accept inline
>                                ^                 |
>                                |-----------------|
>      - multishot:
>               add accept sqe(userspace) --> accept inline
>                                                ^     |
>                                                |--*--|
> 
>      we do accept repeatedly in * place until get EAGAIN
> 
>    2) when connections come in at a low pressure
>      similar thing like 1), we reduce a lot of userspace-kernel context
>      switch and useless vfs_poll()
> 
> 
> tests:
> Did some tests, which goes in this way:
> 
>    server    client(multiple)
>    accept    connect
>    read      write
>    write     read
>    close     close
> 
> Basically, raise up a number of clients(on same machine with server) to
> connect to the server, and then write some data to it, the server will
> write those data back to the client after it receives them, and then
> close the connection after write return. Then the client will read the
> data and then close the connection. Here I test 10000 clients connect
> one server, data size 128 bytes. And each client has a go routine for
> it, so they come to the server in short time.
> test 20 times before/after this patchset, time spent:(unit cycle, which
> is the return value of clock())
> before:
>    1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075
>    +1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984
>    +1934226+1914385)/20.0 = 1927633.75
> after:
>    1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112
>    +1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506
>    +1871324+1940803)/20.0 = 1894750.45
> 
> (1927633.75 - 1894750.45) / 1927633.75 = 1.65%
> 
> Higher pressure like 10^5 connections causes problems since ports are
> in use.
> I'll test the cancellation path and try to lift pressure to much high
> level to see if numbers get better. Sent this early for comments and
> suggestions.
> 
> Hao Xu (6):
>    io_uring: enhance flush completion logic
>    io_uring: add IORING_ACCEPT_MULTISHOT for accept
>    io_uring: add REQ_F_APOLL_MULTISHOT for requests
>    io_uring: let fast poll support multishot
>    io_uring: implement multishot mode for accept
>    io_uring: enable multishot mode for accept
> 
>   fs/io_uring.c                 | 72 +++++++++++++++++++++++++++++------
>   include/uapi/linux/io_uring.h |  4 ++
>   2 files changed, 64 insertions(+), 12 deletions(-)
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/6] io_uring: enhance flush completion logic
  2021-09-03 11:00 ` [PATCH 1/6] io_uring: enhance flush completion logic Hao Xu
@ 2021-09-03 11:42   ` Pavel Begunkov
  2021-09-03 12:08     ` Hao Xu
  0 siblings, 1 reply; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-03 11:42 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/3/21 12:00 PM, Hao Xu wrote:
> Though currently refcount of a req is always one when we flush inline

It can be refcounted and != 1. E.g. poll requests, or consider
that tw also flushes, and you may have a read that goes to apoll
and then get tw resubmitted from io_async_task_func(). And other
cases.

> completions, but still a chance there will be exception in the future.
> Enhance the flush logic to make sure we maintain compl_nr correctly.

See below

> 
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
> 
> we need to either removing the if check to claim clearly that the req's
> refcount is 1 or adding this patch's logic.
> 
>  fs/io_uring.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 2bde732a1183..c48d43207f57 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2291,7 +2291,7 @@ static void io_submit_flush_completions(struct io_ring_ctx *ctx)
>  	__must_hold(&ctx->uring_lock)
>  {
>  	struct io_submit_state *state = &ctx->submit_state;
> -	int i, nr = state->compl_nr;
> +	int i, nr = state->compl_nr, remain = 0;
>  	struct req_batch rb;
>  
>  	spin_lock(&ctx->completion_lock);
> @@ -2311,10 +2311,12 @@ static void io_submit_flush_completions(struct io_ring_ctx *ctx)
>  
>  		if (req_ref_put_and_test(req))
>  			io_req_free_batch(&rb, req, &ctx->submit_state);
> +		else
> +			state->compl_reqs[remain++] = state->compl_reqs[i];

Our ref is dropped, and something else holds another reference. That
"something else" is responsible to free the request once it put the last
reference. This chunk would make the following io_submit_flush_completions()
to underflow refcount and double free.

>  	}
>  
>  	io_req_free_batch_finish(ctx, &rb);
> -	state->compl_nr = 0;
> +	state->compl_nr = remain;
>  }
>  
>  /*
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/6] io_uring: enhance flush completion logic
  2021-09-03 11:42   ` Pavel Begunkov
@ 2021-09-03 12:08     ` Hao Xu
  2021-09-03 12:27       ` Pavel Begunkov
  0 siblings, 1 reply; 46+ messages in thread
From: Hao Xu @ 2021-09-03 12:08 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/9/3 下午7:42, Pavel Begunkov 写道:
> On 9/3/21 12:00 PM, Hao Xu wrote:
>> Though currently refcount of a req is always one when we flush inline
> 
> It can be refcounted and != 1. E.g. poll requests, or consider
It seems poll requests don't leverage comp cache, do I miss anything?
> that tw also flushes, and you may have a read that goes to apoll
> and then get tw resubmitted from io_async_task_func(). And other
when it goes to apoll, (say no double wait entry) ref is 1, then read
completes inline and then the only ref is droped by flush.
> cases.
> 
>> completions, but still a chance there will be exception in the future.
>> Enhance the flush logic to make sure we maintain compl_nr correctly.
> 
> See below
> 
>>
>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>> ---
>>
>> we need to either removing the if check to claim clearly that the req's
>> refcount is 1 or adding this patch's logic.
>>
>>   fs/io_uring.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 2bde732a1183..c48d43207f57 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -2291,7 +2291,7 @@ static void io_submit_flush_completions(struct io_ring_ctx *ctx)
>>   	__must_hold(&ctx->uring_lock)
>>   {
>>   	struct io_submit_state *state = &ctx->submit_state;
>> -	int i, nr = state->compl_nr;
>> +	int i, nr = state->compl_nr, remain = 0;
>>   	struct req_batch rb;
>>   
>>   	spin_lock(&ctx->completion_lock);
>> @@ -2311,10 +2311,12 @@ static void io_submit_flush_completions(struct io_ring_ctx *ctx)
>>   
>>   		if (req_ref_put_and_test(req))
>>   			io_req_free_batch(&rb, req, &ctx->submit_state);
>> +		else
>> +			state->compl_reqs[remain++] = state->compl_reqs[i];
> 
> Our ref is dropped, and something else holds another reference. That
> "something else" is responsible to free the request once it put the last
> reference. This chunk would make the following io_submit_flush_completions()
> to underflow refcount and double free.
True, I see. Thanks Pavel.
> 
>>   	}
>>   
>>   	io_req_free_batch_finish(ctx, &rb);
>> -	state->compl_nr = 0;
>> +	state->compl_nr = remain;
>>   }
>>   
>>   /*
>>
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/6] io_uring: enhance flush completion logic
  2021-09-03 12:08     ` Hao Xu
@ 2021-09-03 12:27       ` Pavel Begunkov
  2021-09-03 13:38         ` Hao Xu
  0 siblings, 1 reply; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-03 12:27 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/3/21 1:08 PM, Hao Xu wrote:
> 在 2021/9/3 下午7:42, Pavel Begunkov 写道:
>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>> Though currently refcount of a req is always one when we flush inline
>>
>> It can be refcounted and != 1. E.g. poll requests, or consider
> It seems poll requests don't leverage comp cache, do I miss anything?

Hmm, looks so. Not great that it doesn't, but probably it's because
of trying to submit next reqs right in io_poll_task_func().

I'll be pushing for some changes around tw, with it should be easy
to hook poll completion batching with no drawbacks. Would be great
if you will be willing to take a shot on it.

>> that tw also flushes, and you may have a read that goes to apoll
>> and then get tw resubmitted from io_async_task_func(). And other
> when it goes to apoll, (say no double wait entry) ref is 1, then read
> completes inline and then the only ref is droped by flush.

Yep, but some might have double entry. It also will have elevated
refs if there was a linked timeout. Another case is io-wq, which
takes a reference, and if iowq doesn't put it for a while (e.g. got
rescheduled) but requests goes to tw (resubmit, poll, whatever)
you have the same picture.

>> cases.
>>
>>> completions, but still a chance there will be exception in the future.
>>> Enhance the flush logic to make sure we maintain compl_nr correctly.
>>
>> See below
>>
>>>
>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>> ---
>>>
>>> we need to either removing the if check to claim clearly that the req's
>>> refcount is 1 or adding this patch's logic.
>>>
>>>   fs/io_uring.c | 6 ++++--
>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 2bde732a1183..c48d43207f57 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -2291,7 +2291,7 @@ static void io_submit_flush_completions(struct io_ring_ctx *ctx)
>>>       __must_hold(&ctx->uring_lock)
>>>   {
>>>       struct io_submit_state *state = &ctx->submit_state;
>>> -    int i, nr = state->compl_nr;
>>> +    int i, nr = state->compl_nr, remain = 0;
>>>       struct req_batch rb;
>>>         spin_lock(&ctx->completion_lock);
>>> @@ -2311,10 +2311,12 @@ static void io_submit_flush_completions(struct io_ring_ctx *ctx)
>>>             if (req_ref_put_and_test(req))
>>>               io_req_free_batch(&rb, req, &ctx->submit_state);
>>> +        else
>>> +            state->compl_reqs[remain++] = state->compl_reqs[i];
>>
>> Our ref is dropped, and something else holds another reference. That
>> "something else" is responsible to free the request once it put the last
>> reference. This chunk would make the following io_submit_flush_completions()
>> to underflow refcount and double free.
> True, I see. Thanks Pavel.
>>
>>>       }
>>>         io_req_free_batch_finish(ctx, &rb);
>>> -    state->compl_nr = 0;
>>> +    state->compl_nr = remain;
>>>   }
>>>     /*
>>>
>>
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/6] io_uring: enhance flush completion logic
  2021-09-03 12:27       ` Pavel Begunkov
@ 2021-09-03 13:38         ` Hao Xu
  2021-09-17 18:49           ` Hao Xu
  0 siblings, 1 reply; 46+ messages in thread
From: Hao Xu @ 2021-09-03 13:38 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/9/3 下午8:27, Pavel Begunkov 写道:
> On 9/3/21 1:08 PM, Hao Xu wrote:
>> 在 2021/9/3 下午7:42, Pavel Begunkov 写道:
>>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>>> Though currently refcount of a req is always one when we flush inline
>>>
>>> It can be refcounted and != 1. E.g. poll requests, or consider
>> It seems poll requests don't leverage comp cache, do I miss anything?
> 
> Hmm, looks so. Not great that it doesn't, but probably it's because
> of trying to submit next reqs right in io_poll_task_func().
> 
> I'll be pushing for some changes around tw, with it should be easy
> to hook poll completion batching with no drawbacks. Would be great
> if you will be willing to take a shot on it.
Sure, I'll take a look.
> 
>>> that tw also flushes, and you may have a read that goes to apoll
>>> and then get tw resubmitted from io_async_task_func(). And other
>> when it goes to apoll, (say no double wait entry) ref is 1, then read
>> completes inline and then the only ref is droped by flush.
> 
> Yep, but some might have double entry. It also will have elevated
double entry should be fine.
> refs if there was a linked timeout. Another case is io-wq, which
Haven't dig into linked timeout yet.
> takes a reference, and if iowq doesn't put it for a while (e.g. got
> rescheduled) but requests goes to tw (resubmit, poll, whatever)
> you have the same picture.
Gotcha, thanks.
> 
>>> cases.
>>>
>>>> completions, but still a chance there will be exception in the future.
>>>> Enhance the flush logic to make sure we maintain compl_nr correctly.
>>>
>>> See below
>>>
>>>>
>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>> ---
>>>>
>>>> we need to either removing the if check to claim clearly that the req's
>>>> refcount is 1 or adding this patch's logic.
>>>>
>>>>    fs/io_uring.c | 6 ++++--
>>>>    1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 2bde732a1183..c48d43207f57 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -2291,7 +2291,7 @@ static void io_submit_flush_completions(struct io_ring_ctx *ctx)
>>>>        __must_hold(&ctx->uring_lock)
>>>>    {
>>>>        struct io_submit_state *state = &ctx->submit_state;
>>>> -    int i, nr = state->compl_nr;
>>>> +    int i, nr = state->compl_nr, remain = 0;
>>>>        struct req_batch rb;
>>>>          spin_lock(&ctx->completion_lock);
>>>> @@ -2311,10 +2311,12 @@ static void io_submit_flush_completions(struct io_ring_ctx *ctx)
>>>>              if (req_ref_put_and_test(req))
>>>>                io_req_free_batch(&rb, req, &ctx->submit_state);
>>>> +        else
>>>> +            state->compl_reqs[remain++] = state->compl_reqs[i];
>>>
>>> Our ref is dropped, and something else holds another reference. That
>>> "something else" is responsible to free the request once it put the last
>>> reference. This chunk would make the following io_submit_flush_completions()
>>> to underflow refcount and double free.
>> True, I see. Thanks Pavel.
>>>
>>>>        }
>>>>          io_req_free_batch_finish(ctx, &rb);
>>>> -    state->compl_nr = 0;
>>>> +    state->compl_nr = remain;
>>>>    }
>>>>      /*
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-03 11:00 ` [PATCH 6/6] io_uring: enable " Hao Xu
@ 2021-09-03 16:29   ` Jens Axboe
  2021-09-04 15:34     ` Hao Xu
  2021-09-04 22:43   ` Pavel Begunkov
  1 sibling, 1 reply; 46+ messages in thread
From: Jens Axboe @ 2021-09-03 16:29 UTC (permalink / raw)
  To: Hao Xu; +Cc: io-uring, Pavel Begunkov, Joseph Qi

On 9/3/21 5:00 AM, Hao Xu wrote:
> Update io_accept_prep() to enable multishot mode for accept operation.
> 
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
>  fs/io_uring.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index eb81d37dce78..34612646ae3c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>  static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  {
>  	struct io_accept *accept = &req->accept;
> +	bool is_multishot;
>  
>  	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>  		return -EINVAL;
> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  	accept->flags = READ_ONCE(sqe->accept_flags);
>  	accept->nofile = rlimit(RLIMIT_NOFILE);
>  
> +	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
> +	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
> +		return -EINVAL;

I like the idea itself as I think it makes a lot of sense to just have
an accept sitting there and generating multiple CQEs, but I'm a bit
puzzled by how you pass it in. accept->flags is the accept4(2) flags,
which can currently be:

SOCK_NONBLOCK
SOCK_CLOEXEC

While there's not any overlap here, that is mostly by chance I think. A
cleaner separation is needed here, what happens if some other accept4(2)
flag is enabled and it just happens to be the same as
IORING_ACCEPT_MULTISHOT?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-03 16:29   ` Jens Axboe
@ 2021-09-04 15:34     ` Hao Xu
  2021-09-04 18:40       ` Jens Axboe
  0 siblings, 1 reply; 46+ messages in thread
From: Hao Xu @ 2021-09-04 15:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

在 2021/9/4 上午12:29, Jens Axboe 写道:
> On 9/3/21 5:00 AM, Hao Xu wrote:
>> Update io_accept_prep() to enable multishot mode for accept operation.
>>
>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>> ---
>>   fs/io_uring.c | 14 ++++++++++++--
>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index eb81d37dce78..34612646ae3c 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>   static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>   {
>>   	struct io_accept *accept = &req->accept;
>> +	bool is_multishot;
>>   
>>   	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>   		return -EINVAL;
>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>   	accept->flags = READ_ONCE(sqe->accept_flags);
>>   	accept->nofile = rlimit(RLIMIT_NOFILE);
>>   
>> +	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>> +	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>> +		return -EINVAL;
> 
> I like the idea itself as I think it makes a lot of sense to just have
> an accept sitting there and generating multiple CQEs, but I'm a bit
> puzzled by how you pass it in. accept->flags is the accept4(2) flags,
> which can currently be:
> 
> SOCK_NONBLOCK
> SOCK_CLOEXEC
> 
> While there's not any overlap here, that is mostly by chance I think. A
> cleaner separation is needed here, what happens if some other accept4(2)
> flag is enabled and it just happens to be the same as
> IORING_ACCEPT_MULTISHOT?
Make sense, how about a new IOSQE flag, I saw not many
entries left there.
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-04 15:34     ` Hao Xu
@ 2021-09-04 18:40       ` Jens Axboe
  2021-09-04 22:46         ` Pavel Begunkov
  0 siblings, 1 reply; 46+ messages in thread
From: Jens Axboe @ 2021-09-04 18:40 UTC (permalink / raw)
  To: Hao Xu; +Cc: io-uring, Pavel Begunkov, Joseph Qi

On 9/4/21 9:34 AM, Hao Xu wrote:
> 在 2021/9/4 上午12:29, Jens Axboe 写道:
>> On 9/3/21 5:00 AM, Hao Xu wrote:
>>> Update io_accept_prep() to enable multishot mode for accept operation.
>>>
>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>> ---
>>>   fs/io_uring.c | 14 ++++++++++++--
>>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index eb81d37dce78..34612646ae3c 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>>   static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>   {
>>>   	struct io_accept *accept = &req->accept;
>>> +	bool is_multishot;
>>>   
>>>   	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>   		return -EINVAL;
>>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>   	accept->flags = READ_ONCE(sqe->accept_flags);
>>>   	accept->nofile = rlimit(RLIMIT_NOFILE);
>>>   
>>> +	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>>> +	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>>> +		return -EINVAL;
>>
>> I like the idea itself as I think it makes a lot of sense to just have
>> an accept sitting there and generating multiple CQEs, but I'm a bit
>> puzzled by how you pass it in. accept->flags is the accept4(2) flags,
>> which can currently be:
>>
>> SOCK_NONBLOCK
>> SOCK_CLOEXEC
>>
>> While there's not any overlap here, that is mostly by chance I think. A
>> cleaner separation is needed here, what happens if some other accept4(2)
>> flag is enabled and it just happens to be the same as
>> IORING_ACCEPT_MULTISHOT?
> Make sense, how about a new IOSQE flag, I saw not many
> entries left there.

Not quite sure what the best approach would be... The mshot flag only
makes sense for a few request types, so a bit of a shame to have to
waste an IOSQE flag on it. Especially when the flags otherwise passed in
are so sparse, there's plenty of bits there.

Hence while it may not be the prettiest, perhaps using accept->flags is
ok and we just need some careful code to ensure that we never have any
overlap.

Probably best to solve that issue in include/linux/net.h, ala:

/* Flags for socket, socketpair, accept4 */
#define SOCK_CLOEXEC	O_CLOEXEC
#ifndef SOCK_NONBLOCK
#define SOCK_NONBLOCK	O_NONBLOCK
#endif

/*
 * Only used for io_uring accept4, and deliberately chosen to overlap
 * with the O_* file bits for read/write mode so we won't risk overlap
 * other flags added for socket/socketpair/accept4 use in the future.
 */
#define SOCK_URING_MULTISHOT	00000001

which should be OK, as these overlap with the O_* filespace and the
read/write bits are at the start of that space.

Should be done as a prep patch and sent out to netdev as well, so we can
get their sign-off on this "hack". If we can get that done, then we have
our flag and we can just stuff it in accept->flags as long as we clear
it before calling into accept from io_uring.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 5/6] io_uring: implement multishot mode for accept
  2021-09-03 11:00 ` [PATCH 5/6] io_uring: implement multishot mode for accept Hao Xu
@ 2021-09-04 22:39   ` Pavel Begunkov
  2021-09-04 22:40     ` Pavel Begunkov
  0 siblings, 1 reply; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-04 22:39 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/3/21 12:00 PM, Hao Xu wrote:
> Refactor io_accept() to support multishot mode.

Multishot with the fixed/direct mode sounds weird (considering that
the slot index is specified by userspace), let's forbid them.

io_accept_prep() {
	if (accept->file_slot && (flags & MULTISHOT))
		return -EINVAL;
	...
}

> theoretical analysis:
>   1) when connections come in fast
>     - singleshot:
>               add accept sqe(userpsace) --> accept inline
>                               ^                 |
>                               |-----------------|
>     - multishot:
>              add accept sqe(userspace) --> accept inline
>                                               ^     |
>                                               |--*--|
> 
>     we do accept repeatedly in * place until get EAGAIN
> 
>   2) when connections come in at a low pressure
>     similar thing like 1), we reduce a lot of userspace-kernel context
>     switch and useless vfs_poll()
> 
> tests:
> Did some tests, which goes in this way:
> 
>   server    client(multiple)
>   accept    connect
>   read      write
>   write     read
>   close     close
> 
> Basically, raise up a number of clients(on same machine with server) to
> connect to the server, and then write some data to it, the server will
> write those data back to the client after it receives them, and then
> close the connection after write return. Then the client will read the
> data and then close the connection. Here I test 10000 clients connect
> one server, data size 128 bytes. And each client has a go routine for
> it, so they come to the server in short time.
> test 20 times before/after this patchset, time spent:(unit cycle, which
> is the return value of clock())
> before:
>   1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075
>   +1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984
>   +1934226+1914385)/20.0 = 1927633.75
> after:
>   1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112
>   +1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506
>   +1871324+1940803)/20.0 = 1894750.45
> 
> (1927633.75 - 1894750.45) / 1927633.75 = 1.65%
> 
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
> 
> not sure if we should cancel it when io_cqring_fill_event() reurn false
> 
>  fs/io_uring.c | 34 +++++++++++++++++++++++++++++-----
>  1 file changed, 29 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index dae7044e0c24..eb81d37dce78 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -4885,16 +4885,18 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  
>  static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>  {
> +	struct io_ring_ctx *ctx = req->ctx;
>  	struct io_accept *accept = &req->accept;
>  	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>  	unsigned int file_flags = force_nonblock ? O_NONBLOCK : 0;
>  	bool fixed = !!accept->file_slot;
>  	struct file *file;
> -	int ret, fd;
> +	int ret, ret2 = 0, fd;
>  
>  	if (req->file->f_flags & O_NONBLOCK)
>  		req->flags |= REQ_F_NOWAIT;
>  
> +retry:
>  	if (!fixed) {
>  		fd = __get_unused_fd_flags(accept->flags, accept->nofile);
>  		if (unlikely(fd < 0))
> @@ -4906,20 +4908,42 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>  		if (!fixed)
>  			put_unused_fd(fd);
>  		ret = PTR_ERR(file);
> -		if (ret == -EAGAIN && force_nonblock)
> -			return -EAGAIN;
> +		if (ret == -EAGAIN && force_nonblock) {
> +			if ((req->flags & (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)) ==
> +			    (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED))
> +				ret = 0;
> +			return ret;
> +		}
>  		if (ret == -ERESTARTSYS)
>  			ret = -EINTR;
>  		req_set_fail(req);
>  	} else if (!fixed) {
>  		fd_install(fd, file);
>  		ret = fd;
> +		/*
> +		 * if it's in multishot mode, let's return -EAGAIN to make it go
> +		 * into fast poll path
> +		 */
> +		if ((req->flags & REQ_F_APOLL_MULTISHOT) && force_nonblock &&
> +		   !(req->flags & REQ_F_POLLED))
> +			ret2 = -EAGAIN;
>  	} else {
>  		ret = io_install_fixed_file(req, file, issue_flags,
>  					    accept->file_slot - 1);
>  	}
> -	__io_req_complete(req, issue_flags, ret, 0);
> -	return 0;
> +
> +	if (req->flags & REQ_F_APOLL_MULTISHOT) {
> +		spin_lock(&ctx->completion_lock);
> +		if (io_cqring_fill_event(ctx, req->user_data, ret, 0)) {
> +			io_commit_cqring(ctx);
> +			ctx->cq_extra++;
> +		}
> +		spin_unlock(&ctx->completion_lock);
> +		goto retry;
> +	} else {
> +		__io_req_complete(req, issue_flags, ret, 0);
> +	}
> +	return ret2;
>  }
>  
>  static int io_connect_prep_async(struct io_kiocb *req)
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 5/6] io_uring: implement multishot mode for accept
  2021-09-04 22:39   ` Pavel Begunkov
@ 2021-09-04 22:40     ` Pavel Begunkov
  2021-09-06 15:34       ` Pavel Begunkov
  0 siblings, 1 reply; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-04 22:40 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/4/21 11:39 PM, Pavel Begunkov wrote:
> On 9/3/21 12:00 PM, Hao Xu wrote:
>> Refactor io_accept() to support multishot mode.
> 
> Multishot with the fixed/direct mode sounds weird (considering that
> the slot index is specified by userspace), let's forbid them.
> 
> io_accept_prep() {
> 	if (accept->file_slot && (flags & MULTISHOT))
> 		return -EINVAL;
> 	...
> }

Ah, never mind, it's already there in 6/6

> 
>> theoretical analysis:
>>   1) when connections come in fast
>>     - singleshot:
>>               add accept sqe(userpsace) --> accept inline
>>                               ^                 |
>>                               |-----------------|
>>     - multishot:
>>              add accept sqe(userspace) --> accept inline
>>                                               ^     |
>>                                               |--*--|
>>
>>     we do accept repeatedly in * place until get EAGAIN
>>
>>   2) when connections come in at a low pressure
>>     similar thing like 1), we reduce a lot of userspace-kernel context
>>     switch and useless vfs_poll()
>>
>> tests:
>> Did some tests, which goes in this way:
>>
>>   server    client(multiple)
>>   accept    connect
>>   read      write
>>   write     read
>>   close     close
>>
>> Basically, raise up a number of clients(on same machine with server) to
>> connect to the server, and then write some data to it, the server will
>> write those data back to the client after it receives them, and then
>> close the connection after write return. Then the client will read the
>> data and then close the connection. Here I test 10000 clients connect
>> one server, data size 128 bytes. And each client has a go routine for
>> it, so they come to the server in short time.
>> test 20 times before/after this patchset, time spent:(unit cycle, which
>> is the return value of clock())
>> before:
>>   1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075
>>   +1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984
>>   +1934226+1914385)/20.0 = 1927633.75
>> after:
>>   1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112
>>   +1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506
>>   +1871324+1940803)/20.0 = 1894750.45
>>
>> (1927633.75 - 1894750.45) / 1927633.75 = 1.65%
>>
>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>> ---
>>
>> not sure if we should cancel it when io_cqring_fill_event() reurn false
>>
>>  fs/io_uring.c | 34 +++++++++++++++++++++++++++++-----
>>  1 file changed, 29 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index dae7044e0c24..eb81d37dce78 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -4885,16 +4885,18 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>  
>>  static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>>  {
>> +	struct io_ring_ctx *ctx = req->ctx;
>>  	struct io_accept *accept = &req->accept;
>>  	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>>  	unsigned int file_flags = force_nonblock ? O_NONBLOCK : 0;
>>  	bool fixed = !!accept->file_slot;
>>  	struct file *file;
>> -	int ret, fd;
>> +	int ret, ret2 = 0, fd;
>>  
>>  	if (req->file->f_flags & O_NONBLOCK)
>>  		req->flags |= REQ_F_NOWAIT;
>>  
>> +retry:
>>  	if (!fixed) {
>>  		fd = __get_unused_fd_flags(accept->flags, accept->nofile);
>>  		if (unlikely(fd < 0))
>> @@ -4906,20 +4908,42 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>>  		if (!fixed)
>>  			put_unused_fd(fd);
>>  		ret = PTR_ERR(file);
>> -		if (ret == -EAGAIN && force_nonblock)
>> -			return -EAGAIN;
>> +		if (ret == -EAGAIN && force_nonblock) {
>> +			if ((req->flags & (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)) ==
>> +			    (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED))
>> +				ret = 0;
>> +			return ret;
>> +		}
>>  		if (ret == -ERESTARTSYS)
>>  			ret = -EINTR;
>>  		req_set_fail(req);
>>  	} else if (!fixed) {
>>  		fd_install(fd, file);
>>  		ret = fd;
>> +		/*
>> +		 * if it's in multishot mode, let's return -EAGAIN to make it go
>> +		 * into fast poll path
>> +		 */
>> +		if ((req->flags & REQ_F_APOLL_MULTISHOT) && force_nonblock &&
>> +		   !(req->flags & REQ_F_POLLED))
>> +			ret2 = -EAGAIN;
>>  	} else {
>>  		ret = io_install_fixed_file(req, file, issue_flags,
>>  					    accept->file_slot - 1);
>>  	}
>> -	__io_req_complete(req, issue_flags, ret, 0);
>> -	return 0;
>> +
>> +	if (req->flags & REQ_F_APOLL_MULTISHOT) {
>> +		spin_lock(&ctx->completion_lock);
>> +		if (io_cqring_fill_event(ctx, req->user_data, ret, 0)) {
>> +			io_commit_cqring(ctx);
>> +			ctx->cq_extra++;
>> +		}
>> +		spin_unlock(&ctx->completion_lock);
>> +		goto retry;
>> +	} else {
>> +		__io_req_complete(req, issue_flags, ret, 0);
>> +	}
>> +	return ret2;
>>  }
>>  
>>  static int io_connect_prep_async(struct io_kiocb *req)
>>
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-03 11:00 ` [PATCH 6/6] io_uring: enable " Hao Xu
  2021-09-03 16:29   ` Jens Axboe
@ 2021-09-04 22:43   ` Pavel Begunkov
  2021-09-05  6:25     ` Hao Xu
  1 sibling, 1 reply; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-04 22:43 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/3/21 12:00 PM, Hao Xu wrote:
> Update io_accept_prep() to enable multishot mode for accept operation.
> 
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
>  fs/io_uring.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index eb81d37dce78..34612646ae3c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>  static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  {
>  	struct io_accept *accept = &req->accept;
> +	bool is_multishot;
>  
>  	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>  		return -EINVAL;
> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  	accept->flags = READ_ONCE(sqe->accept_flags);
>  	accept->nofile = rlimit(RLIMIT_NOFILE);
>  
> +	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
> +	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
> +		return -EINVAL;

Why REQ_F_FORCE_ASYNC is not allowed? It doesn't sound like there
should be any problem, would just eventually go looping
poll_wait + tw

> +
>  	accept->file_slot = READ_ONCE(sqe->file_index);
>  	if (accept->file_slot && ((req->open.how.flags & O_CLOEXEC) ||
> -				  (accept->flags & SOCK_CLOEXEC)))
> +				  (accept->flags & SOCK_CLOEXEC) || is_multishot))
>  		return -EINVAL;
> -	if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK))
> +	if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK | IORING_ACCEPT_MULTISHOT))
>  		return -EINVAL;
>  	if (SOCK_NONBLOCK != O_NONBLOCK && (accept->flags & SOCK_NONBLOCK))
>  		accept->flags = (accept->flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
> +	if (is_multishot) {
> +		req->flags |= REQ_F_APOLL_MULTISHOT;
> +		accept->flags &= ~IORING_ACCEPT_MULTISHOT;
> +	}
> +
>  	return 0;
>  }
>  
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-04 18:40       ` Jens Axboe
@ 2021-09-04 22:46         ` Pavel Begunkov
  2021-09-05  7:29           ` Hao Xu
  2021-09-05 19:44           ` Jens Axboe
  0 siblings, 2 replies; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-04 22:46 UTC (permalink / raw)
  To: Jens Axboe, Hao Xu; +Cc: io-uring, Joseph Qi

On 9/4/21 7:40 PM, Jens Axboe wrote:
> On 9/4/21 9:34 AM, Hao Xu wrote:
>> 在 2021/9/4 上午12:29, Jens Axboe 写道:
>>> On 9/3/21 5:00 AM, Hao Xu wrote:
>>>> Update io_accept_prep() to enable multishot mode for accept operation.
>>>>
>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>> ---
>>>>   fs/io_uring.c | 14 ++++++++++++--
>>>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index eb81d37dce78..34612646ae3c 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>>>   static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>   {
>>>>   	struct io_accept *accept = &req->accept;
>>>> +	bool is_multishot;
>>>>   
>>>>   	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>   		return -EINVAL;
>>>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>   	accept->flags = READ_ONCE(sqe->accept_flags);
>>>>   	accept->nofile = rlimit(RLIMIT_NOFILE);
>>>>   
>>>> +	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>>>> +	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>>>> +		return -EINVAL;
>>>
>>> I like the idea itself as I think it makes a lot of sense to just have
>>> an accept sitting there and generating multiple CQEs, but I'm a bit
>>> puzzled by how you pass it in. accept->flags is the accept4(2) flags,
>>> which can currently be:
>>>
>>> SOCK_NONBLOCK
>>> SOCK_CLOEXEC
>>>
>>> While there's not any overlap here, that is mostly by chance I think. A
>>> cleaner separation is needed here, what happens if some other accept4(2)
>>> flag is enabled and it just happens to be the same as
>>> IORING_ACCEPT_MULTISHOT?
>> Make sense, how about a new IOSQE flag, I saw not many
>> entries left there.
> 
> Not quite sure what the best approach would be... The mshot flag only
> makes sense for a few request types, so a bit of a shame to have to
> waste an IOSQE flag on it. Especially when the flags otherwise passed in
> are so sparse, there's plenty of bits there.
> 
> Hence while it may not be the prettiest, perhaps using accept->flags is
> ok and we just need some careful code to ensure that we never have any
> overlap.

Or we can alias with some of the almost-never-used fields like
->ioprio or ->buf_index.

> Probably best to solve that issue in include/linux/net.h, ala:
> 
> /* Flags for socket, socketpair, accept4 */
> #define SOCK_CLOEXEC	O_CLOEXEC
> #ifndef SOCK_NONBLOCK
> #define SOCK_NONBLOCK	O_NONBLOCK
> #endif
> 
> /*
>  * Only used for io_uring accept4, and deliberately chosen to overlap
>  * with the O_* file bits for read/write mode so we won't risk overlap
>  * other flags added for socket/socketpair/accept4 use in the future.
>  */
> #define SOCK_URING_MULTISHOT	00000001
> 
> which should be OK, as these overlap with the O_* filespace and the
> read/write bits are at the start of that space.
> 
> Should be done as a prep patch and sent out to netdev as well, so we can
> get their sign-off on this "hack". If we can get that done, then we have
> our flag and we can just stuff it in accept->flags as long as we clear
> it before calling into accept from io_uring.
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-04 22:43   ` Pavel Begunkov
@ 2021-09-05  6:25     ` Hao Xu
  2021-09-05  8:27       ` Pavel Begunkov
  0 siblings, 1 reply; 46+ messages in thread
From: Hao Xu @ 2021-09-05  6:25 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/9/5 上午6:43, Pavel Begunkov 写道:
> On 9/3/21 12:00 PM, Hao Xu wrote:
>> Update io_accept_prep() to enable multishot mode for accept operation.
>>
>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>> ---
>>   fs/io_uring.c | 14 ++++++++++++--
>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index eb81d37dce78..34612646ae3c 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>   static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>   {
>>   	struct io_accept *accept = &req->accept;
>> +	bool is_multishot;
>>   
>>   	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>   		return -EINVAL;
>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>   	accept->flags = READ_ONCE(sqe->accept_flags);
>>   	accept->nofile = rlimit(RLIMIT_NOFILE);
>>   
>> +	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>> +	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>> +		return -EINVAL;
> 
> Why REQ_F_FORCE_ASYNC is not allowed? It doesn't sound like there
> should be any problem, would just eventually go looping
> poll_wait + tw
Hmm..The arm_poll facility is only on the io_submit_sqes() path. If
FORCE_ASYNC is set, the req goes to io_wq_submit_work-->io_issue_sqe.
Moreover, I guess theoretically poll based retry and async iowq are
two ways for users to handle their sqes, it may not be sane to go to
poll based retry path if a user already forcely choose iowq as their
prefer way.
> 
>> +
>>   	accept->file_slot = READ_ONCE(sqe->file_index);
>>   	if (accept->file_slot && ((req->open.how.flags & O_CLOEXEC) ||
>> -				  (accept->flags & SOCK_CLOEXEC)))
>> +				  (accept->flags & SOCK_CLOEXEC) || is_multishot))
>>   		return -EINVAL;
>> -	if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK))
>> +	if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK | IORING_ACCEPT_MULTISHOT))
>>   		return -EINVAL;
>>   	if (SOCK_NONBLOCK != O_NONBLOCK && (accept->flags & SOCK_NONBLOCK))
>>   		accept->flags = (accept->flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
>> +	if (is_multishot) {
>> +		req->flags |= REQ_F_APOLL_MULTISHOT;
>> +		accept->flags &= ~IORING_ACCEPT_MULTISHOT;
>> +	}
>> +
>>   	return 0;
>>   }
>>   
>>
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-04 22:46         ` Pavel Begunkov
@ 2021-09-05  7:29           ` Hao Xu
  2021-09-05 19:44           ` Jens Axboe
  1 sibling, 0 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-05  7:29 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/9/5 上午6:46, Pavel Begunkov 写道:
> On 9/4/21 7:40 PM, Jens Axboe wrote:
>> On 9/4/21 9:34 AM, Hao Xu wrote:
>>> 在 2021/9/4 上午12:29, Jens Axboe 写道:
>>>> On 9/3/21 5:00 AM, Hao Xu wrote:
>>>>> Update io_accept_prep() to enable multishot mode for accept operation.
>>>>>
>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>> ---
>>>>>    fs/io_uring.c | 14 ++++++++++++--
>>>>>    1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index eb81d37dce78..34612646ae3c 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>>>>    static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>    {
>>>>>    	struct io_accept *accept = &req->accept;
>>>>> +	bool is_multishot;
>>>>>    
>>>>>    	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>>    		return -EINVAL;
>>>>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>    	accept->flags = READ_ONCE(sqe->accept_flags);
>>>>>    	accept->nofile = rlimit(RLIMIT_NOFILE);
>>>>>    
>>>>> +	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>>>>> +	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>>>>> +		return -EINVAL;
>>>>
>>>> I like the idea itself as I think it makes a lot of sense to just have
>>>> an accept sitting there and generating multiple CQEs, but I'm a bit
>>>> puzzled by how you pass it in. accept->flags is the accept4(2) flags,
>>>> which can currently be:
>>>>
>>>> SOCK_NONBLOCK
>>>> SOCK_CLOEXEC
>>>>
>>>> While there's not any overlap here, that is mostly by chance I think. A
>>>> cleaner separation is needed here, what happens if some other accept4(2)
>>>> flag is enabled and it just happens to be the same as
>>>> IORING_ACCEPT_MULTISHOT?
>>> Make sense, how about a new IOSQE flag, I saw not many
>>> entries left there.
>>
>> Not quite sure what the best approach would be... The mshot flag only
>> makes sense for a few request types, so a bit of a shame to have to
>> waste an IOSQE flag on it. Especially when the flags otherwise passed in
>> are so sparse, there's plenty of bits there.
>>
>> Hence while it may not be the prettiest, perhaps using accept->flags is
>> ok and we just need some careful code to ensure that we never have any
>> overlap.
> 
> Or we can alias with some of the almost-never-used fields like
> ->ioprio or ->buf_index.
I think leverage the highest bit of ioprio may be a good idea?
> 
>> Probably best to solve that issue in include/linux/net.h, ala:
>>
>> /* Flags for socket, socketpair, accept4 */
>> #define SOCK_CLOEXEC	O_CLOEXEC
>> #ifndef SOCK_NONBLOCK
>> #define SOCK_NONBLOCK	O_NONBLOCK
>> #endif
>>
>> /*
>>   * Only used for io_uring accept4, and deliberately chosen to overlap
>>   * with the O_* file bits for read/write mode so we won't risk overlap
>>   * other flags added for socket/socketpair/accept4 use in the future.
>>   */
>> #define SOCK_URING_MULTISHOT	00000001
>>
>> which should be OK, as these overlap with the O_* filespace and the
>> read/write bits are at the start of that space.
>>
>> Should be done as a prep patch and sent out to netdev as well, so we can
>> get their sign-off on this "hack". If we can get that done, then we have
>> our flag and we can just stuff it in accept->flags as long as we clear
>> it before calling into accept from io_uring.
>>
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-05  6:25     ` Hao Xu
@ 2021-09-05  8:27       ` Pavel Begunkov
  0 siblings, 0 replies; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-05  8:27 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/5/21 7:25 AM, Hao Xu wrote:
> 在 2021/9/5 上午6:43, Pavel Begunkov 写道:
>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>> Update io_accept_prep() to enable multishot mode for accept operation.
>>>
>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>> ---
>>>   fs/io_uring.c | 14 ++++++++++++--
>>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index eb81d37dce78..34612646ae3c 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>>   static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>   {
>>>       struct io_accept *accept = &req->accept;
>>> +    bool is_multishot;
>>>         if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>           return -EINVAL;
>>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>       accept->flags = READ_ONCE(sqe->accept_flags);
>>>       accept->nofile = rlimit(RLIMIT_NOFILE);
>>>   +    is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>>> +    if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>>> +        return -EINVAL;
>>
>> Why REQ_F_FORCE_ASYNC is not allowed? It doesn't sound like there
>> should be any problem, would just eventually go looping
>> poll_wait + tw
> Hmm..The arm_poll facility is only on the io_submit_sqes() path. If
> FORCE_ASYNC is set, the req goes to io_wq_submit_work-->io_issue_sqe.
> Moreover, I guess theoretically poll based retry and async iowq are
> two ways for users to handle their sqes, it may not be sane to go to
> poll based retry path if a user already forcely choose iowq as their
> prefer way.

The flag is just a hint, there are already plenty of ways to trick
requests into tw. Forbidding it would be inconsistent.


>>> +
>>>       accept->file_slot = READ_ONCE(sqe->file_index);
>>>       if (accept->file_slot && ((req->open.how.flags & O_CLOEXEC) ||
>>> -                  (accept->flags & SOCK_CLOEXEC)))
>>> +                  (accept->flags & SOCK_CLOEXEC) || is_multishot))
>>>           return -EINVAL;
>>> -    if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK))
>>> +    if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK | IORING_ACCEPT_MULTISHOT))
>>>           return -EINVAL;
>>>       if (SOCK_NONBLOCK != O_NONBLOCK && (accept->flags & SOCK_NONBLOCK))
>>>           accept->flags = (accept->flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
>>> +    if (is_multishot) {
>>> +        req->flags |= REQ_F_APOLL_MULTISHOT;
>>> +        accept->flags &= ~IORING_ACCEPT_MULTISHOT;
>>> +    }
>>> +
>>>       return 0;
>>>   }
>>>  
>>
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-04 22:46         ` Pavel Begunkov
  2021-09-05  7:29           ` Hao Xu
@ 2021-09-05 19:44           ` Jens Axboe
  2021-09-06  8:26             ` Hao Xu
  2021-09-06 12:35             ` Hao Xu
  1 sibling, 2 replies; 46+ messages in thread
From: Jens Axboe @ 2021-09-05 19:44 UTC (permalink / raw)
  To: Pavel Begunkov, Hao Xu; +Cc: io-uring, Joseph Qi

On 9/4/21 4:46 PM, Pavel Begunkov wrote:
> On 9/4/21 7:40 PM, Jens Axboe wrote:
>> On 9/4/21 9:34 AM, Hao Xu wrote:
>>> 在 2021/9/4 上午12:29, Jens Axboe 写道:
>>>> On 9/3/21 5:00 AM, Hao Xu wrote:
>>>>> Update io_accept_prep() to enable multishot mode for accept operation.
>>>>>
>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>> ---
>>>>>   fs/io_uring.c | 14 ++++++++++++--
>>>>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index eb81d37dce78..34612646ae3c 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>>>>   static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>   {
>>>>>   	struct io_accept *accept = &req->accept;
>>>>> +	bool is_multishot;
>>>>>   
>>>>>   	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>>   		return -EINVAL;
>>>>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>   	accept->flags = READ_ONCE(sqe->accept_flags);
>>>>>   	accept->nofile = rlimit(RLIMIT_NOFILE);
>>>>>   
>>>>> +	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>>>>> +	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>>>>> +		return -EINVAL;
>>>>
>>>> I like the idea itself as I think it makes a lot of sense to just have
>>>> an accept sitting there and generating multiple CQEs, but I'm a bit
>>>> puzzled by how you pass it in. accept->flags is the accept4(2) flags,
>>>> which can currently be:
>>>>
>>>> SOCK_NONBLOCK
>>>> SOCK_CLOEXEC
>>>>
>>>> While there's not any overlap here, that is mostly by chance I think. A
>>>> cleaner separation is needed here, what happens if some other accept4(2)
>>>> flag is enabled and it just happens to be the same as
>>>> IORING_ACCEPT_MULTISHOT?
>>> Make sense, how about a new IOSQE flag, I saw not many
>>> entries left there.
>>
>> Not quite sure what the best approach would be... The mshot flag only
>> makes sense for a few request types, so a bit of a shame to have to
>> waste an IOSQE flag on it. Especially when the flags otherwise passed in
>> are so sparse, there's plenty of bits there.
>>
>> Hence while it may not be the prettiest, perhaps using accept->flags is
>> ok and we just need some careful code to ensure that we never have any
>> overlap.
> 
> Or we can alias with some of the almost-never-used fields like
> ->ioprio or ->buf_index.

It's not a bad idea, as long as we can safely use flags from eg ioprio
for cases where ioprio would never be used. In that sense it's probably
safer than using buf_index.

The alternative is, as has been brougt up before, adding a flags2 and
reserving the last flag in ->flags to say "there are flags in flags2".
Not exactly super pretty either, but we'll need to extend them at some
point.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-05 19:44           ` Jens Axboe
@ 2021-09-06  8:26             ` Hao Xu
  2021-09-06  8:28               ` Hao Xu
  2021-09-06 13:24               ` Jens Axboe
  2021-09-06 12:35             ` Hao Xu
  1 sibling, 2 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-06  8:26 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov; +Cc: io-uring, Joseph Qi

在 2021/9/6 上午3:44, Jens Axboe 写道:
> On 9/4/21 4:46 PM, Pavel Begunkov wrote:
>> On 9/4/21 7:40 PM, Jens Axboe wrote:
>>> On 9/4/21 9:34 AM, Hao Xu wrote:
>>>> 在 2021/9/4 上午12:29, Jens Axboe 写道:
>>>>> On 9/3/21 5:00 AM, Hao Xu wrote:
>>>>>> Update io_accept_prep() to enable multishot mode for accept operation.
>>>>>>
>>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>>> ---
>>>>>>    fs/io_uring.c | 14 ++++++++++++--
>>>>>>    1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>> index eb81d37dce78..34612646ae3c 100644
>>>>>> --- a/fs/io_uring.c
>>>>>> +++ b/fs/io_uring.c
>>>>>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>    static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>    {
>>>>>>    	struct io_accept *accept = &req->accept;
>>>>>> +	bool is_multishot;
>>>>>>    
>>>>>>    	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>>>    		return -EINVAL;
>>>>>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>    	accept->flags = READ_ONCE(sqe->accept_flags);
>>>>>>    	accept->nofile = rlimit(RLIMIT_NOFILE);
>>>>>>    
>>>>>> +	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>>>>>> +	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>>>>>> +		return -EINVAL;
>>>>>
>>>>> I like the idea itself as I think it makes a lot of sense to just have
>>>>> an accept sitting there and generating multiple CQEs, but I'm a bit
>>>>> puzzled by how you pass it in. accept->flags is the accept4(2) flags,
>>>>> which can currently be:
>>>>>
>>>>> SOCK_NONBLOCK
>>>>> SOCK_CLOEXEC
>>>>>
>>>>> While there's not any overlap here, that is mostly by chance I think. A
>>>>> cleaner separation is needed here, what happens if some other accept4(2)
>>>>> flag is enabled and it just happens to be the same as
>>>>> IORING_ACCEPT_MULTISHOT?
>>>> Make sense, how about a new IOSQE flag, I saw not many
>>>> entries left there.
>>>
>>> Not quite sure what the best approach would be... The mshot flag only
>>> makes sense for a few request types, so a bit of a shame to have to
>>> waste an IOSQE flag on it. Especially when the flags otherwise passed in
>>> are so sparse, there's plenty of bits there.
>>>
>>> Hence while it may not be the prettiest, perhaps using accept->flags is
>>> ok and we just need some careful code to ensure that we never have any
>>> overlap.
>>
>> Or we can alias with some of the almost-never-used fields like
>> ->ioprio or ->buf_index.
> 
> It's not a bad idea, as long as we can safely use flags from eg ioprio
> for cases where ioprio would never be used. In that sense it's probably
> safer than using buf_index.
> 
> The alternative is, as has been brougt up before, adding a flags2 and
> reserving the last flag in ->flags to say "there are flags in flags2".
May I ask when do we have to use a bit in ->flags to do this? My
understanding is just leverage a __u8 in pad2[2] in io_uring_sqe to
deliver ext_flags.
> Not exactly super pretty either, but we'll need to extend them at some
> point.
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-06  8:26             ` Hao Xu
@ 2021-09-06  8:28               ` Hao Xu
  2021-09-06 13:24               ` Jens Axboe
  1 sibling, 0 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-06  8:28 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov; +Cc: io-uring, Joseph Qi

在 2021/9/6 下午4:26, Hao Xu 写道:
> 在 2021/9/6 上午3:44, Jens Axboe 写道:
>> On 9/4/21 4:46 PM, Pavel Begunkov wrote:
>>> On 9/4/21 7:40 PM, Jens Axboe wrote:
>>>> On 9/4/21 9:34 AM, Hao Xu wrote:
>>>>> 在 2021/9/4 上午12:29, Jens Axboe 写道:
>>>>>> On 9/3/21 5:00 AM, Hao Xu wrote:
>>>>>>> Update io_accept_prep() to enable multishot mode for accept 
>>>>>>> operation.
>>>>>>>
>>>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>>>> ---
>>>>>>>    fs/io_uring.c | 14 ++++++++++++--
>>>>>>>    1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>> index eb81d37dce78..34612646ae3c 100644
>>>>>>> --- a/fs/io_uring.c
>>>>>>> +++ b/fs/io_uring.c
>>>>>>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, 
>>>>>>> unsigned int issue_flags)
>>>>>>>    static int io_accept_prep(struct io_kiocb *req, const struct 
>>>>>>> io_uring_sqe *sqe)
>>>>>>>    {
>>>>>>>        struct io_accept *accept = &req->accept;
>>>>>>> +    bool is_multishot;
>>>>>>>        if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>>>>            return -EINVAL;
>>>>>>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb 
>>>>>>> *req, const struct io_uring_sqe *sqe)
>>>>>>>        accept->flags = READ_ONCE(sqe->accept_flags);
>>>>>>>        accept->nofile = rlimit(RLIMIT_NOFILE);
>>>>>>> +    is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>>>>>>> +    if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>>>>>>> +        return -EINVAL;
>>>>>>
>>>>>> I like the idea itself as I think it makes a lot of sense to just 
>>>>>> have
>>>>>> an accept sitting there and generating multiple CQEs, but I'm a bit
>>>>>> puzzled by how you pass it in. accept->flags is the accept4(2) flags,
>>>>>> which can currently be:
>>>>>>
>>>>>> SOCK_NONBLOCK
>>>>>> SOCK_CLOEXEC
>>>>>>
>>>>>> While there's not any overlap here, that is mostly by chance I 
>>>>>> think. A
>>>>>> cleaner separation is needed here, what happens if some other 
>>>>>> accept4(2)
>>>>>> flag is enabled and it just happens to be the same as
>>>>>> IORING_ACCEPT_MULTISHOT?
>>>>> Make sense, how about a new IOSQE flag, I saw not many
>>>>> entries left there.
>>>>
>>>> Not quite sure what the best approach would be... The mshot flag only
>>>> makes sense for a few request types, so a bit of a shame to have to
>>>> waste an IOSQE flag on it. Especially when the flags otherwise 
>>>> passed in
>>>> are so sparse, there's plenty of bits there.
>>>>
>>>> Hence while it may not be the prettiest, perhaps using accept->flags is
>>>> ok and we just need some careful code to ensure that we never have any
>>>> overlap.
>>>
>>> Or we can alias with some of the almost-never-used fields like
>>> ->ioprio or ->buf_index.
>>
>> It's not a bad idea, as long as we can safely use flags from eg ioprio
>> for cases where ioprio would never be used. In that sense it's probably
>> safer than using buf_index.
>>
>> The alternative is, as has been brougt up before, adding a flags2 and
>> reserving the last flag in ->flags to say "there are flags in flags2".
> May I ask when do we have to use a bit in ->flags to do this? My
               ^why
> understanding is just leverage a __u8 in pad2[2] in io_uring_sqe to
> deliver ext_flags.
>> Not exactly super pretty either, but we'll need to extend them at some
>> point.
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-05 19:44           ` Jens Axboe
  2021-09-06  8:26             ` Hao Xu
@ 2021-09-06 12:35             ` Hao Xu
  2021-09-06 13:31               ` Jens Axboe
  2021-09-06 15:32               ` Pavel Begunkov
  1 sibling, 2 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-06 12:35 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov; +Cc: io-uring, Joseph Qi

在 2021/9/6 上午3:44, Jens Axboe 写道:
> On 9/4/21 4:46 PM, Pavel Begunkov wrote:
>> On 9/4/21 7:40 PM, Jens Axboe wrote:
>>> On 9/4/21 9:34 AM, Hao Xu wrote:
>>>> 在 2021/9/4 上午12:29, Jens Axboe 写道:
>>>>> On 9/3/21 5:00 AM, Hao Xu wrote:
>>>>>> Update io_accept_prep() to enable multishot mode for accept operation.
>>>>>>
>>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>>> ---
>>>>>>    fs/io_uring.c | 14 ++++++++++++--
>>>>>>    1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>> index eb81d37dce78..34612646ae3c 100644
>>>>>> --- a/fs/io_uring.c
>>>>>> +++ b/fs/io_uring.c
>>>>>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>    static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>    {
>>>>>>    	struct io_accept *accept = &req->accept;
>>>>>> +	bool is_multishot;
>>>>>>    
>>>>>>    	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>>>    		return -EINVAL;
>>>>>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>    	accept->flags = READ_ONCE(sqe->accept_flags);
>>>>>>    	accept->nofile = rlimit(RLIMIT_NOFILE);
>>>>>>    
>>>>>> +	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>>>>>> +	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>>>>>> +		return -EINVAL;
>>>>>
>>>>> I like the idea itself as I think it makes a lot of sense to just have
>>>>> an accept sitting there and generating multiple CQEs, but I'm a bit
>>>>> puzzled by how you pass it in. accept->flags is the accept4(2) flags,
>>>>> which can currently be:
>>>>>
>>>>> SOCK_NONBLOCK
>>>>> SOCK_CLOEXEC
>>>>>
>>>>> While there's not any overlap here, that is mostly by chance I think. A
>>>>> cleaner separation is needed here, what happens if some other accept4(2)
>>>>> flag is enabled and it just happens to be the same as
>>>>> IORING_ACCEPT_MULTISHOT?
>>>> Make sense, how about a new IOSQE flag, I saw not many
>>>> entries left there.
>>>
>>> Not quite sure what the best approach would be... The mshot flag only
>>> makes sense for a few request types, so a bit of a shame to have to
>>> waste an IOSQE flag on it. Especially when the flags otherwise passed in
>>> are so sparse, there's plenty of bits there.
>>>
>>> Hence while it may not be the prettiest, perhaps using accept->flags is
>>> ok and we just need some careful code to ensure that we never have any
>>> overlap.
>>
>> Or we can alias with some of the almost-never-used fields like
>> ->ioprio or ->buf_index.
> 
> It's not a bad idea, as long as we can safely use flags from eg ioprio
> for cases where ioprio would never be used. In that sense it's probably
> safer than using buf_index.
> 
> The alternative is, as has been brougt up before, adding a flags2 and
> reserving the last flag in ->flags to say "there are flags in flags2".
> Not exactly super pretty either, but we'll need to extend them at some
> point.
I'm going to do it in this way, there is another thing we have to do:
extend req->flags too, since flags we already used > 32 if we add
sqe->ext_flags
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-06  8:26             ` Hao Xu
  2021-09-06  8:28               ` Hao Xu
@ 2021-09-06 13:24               ` Jens Axboe
  1 sibling, 0 replies; 46+ messages in thread
From: Jens Axboe @ 2021-09-06 13:24 UTC (permalink / raw)
  To: Hao Xu, Pavel Begunkov; +Cc: io-uring, Joseph Qi

On 9/6/21 2:26 AM, Hao Xu wrote:
> 在 2021/9/6 上午3:44, Jens Axboe 写道:
>> On 9/4/21 4:46 PM, Pavel Begunkov wrote:
>>> On 9/4/21 7:40 PM, Jens Axboe wrote:
>>>> On 9/4/21 9:34 AM, Hao Xu wrote:
>>>>> 在 2021/9/4 上午12:29, Jens Axboe 写道:
>>>>>> On 9/3/21 5:00 AM, Hao Xu wrote:
>>>>>>> Update io_accept_prep() to enable multishot mode for accept operation.
>>>>>>>
>>>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>>>> ---
>>>>>>>    fs/io_uring.c | 14 ++++++++++++--
>>>>>>>    1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>> index eb81d37dce78..34612646ae3c 100644
>>>>>>> --- a/fs/io_uring.c
>>>>>>> +++ b/fs/io_uring.c
>>>>>>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>>    static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>>    {
>>>>>>>    	struct io_accept *accept = &req->accept;
>>>>>>> +	bool is_multishot;
>>>>>>>    
>>>>>>>    	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>>>>    		return -EINVAL;
>>>>>>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>>    	accept->flags = READ_ONCE(sqe->accept_flags);
>>>>>>>    	accept->nofile = rlimit(RLIMIT_NOFILE);
>>>>>>>    
>>>>>>> +	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>>>>>>> +	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>>>>>>> +		return -EINVAL;
>>>>>>
>>>>>> I like the idea itself as I think it makes a lot of sense to just have
>>>>>> an accept sitting there and generating multiple CQEs, but I'm a bit
>>>>>> puzzled by how you pass it in. accept->flags is the accept4(2) flags,
>>>>>> which can currently be:
>>>>>>
>>>>>> SOCK_NONBLOCK
>>>>>> SOCK_CLOEXEC
>>>>>>
>>>>>> While there's not any overlap here, that is mostly by chance I think. A
>>>>>> cleaner separation is needed here, what happens if some other accept4(2)
>>>>>> flag is enabled and it just happens to be the same as
>>>>>> IORING_ACCEPT_MULTISHOT?
>>>>> Make sense, how about a new IOSQE flag, I saw not many
>>>>> entries left there.
>>>>
>>>> Not quite sure what the best approach would be... The mshot flag only
>>>> makes sense for a few request types, so a bit of a shame to have to
>>>> waste an IOSQE flag on it. Especially when the flags otherwise passed in
>>>> are so sparse, there's plenty of bits there.
>>>>
>>>> Hence while it may not be the prettiest, perhaps using accept->flags is
>>>> ok and we just need some careful code to ensure that we never have any
>>>> overlap.
>>>
>>> Or we can alias with some of the almost-never-used fields like
>>> ->ioprio or ->buf_index.
>>
>> It's not a bad idea, as long as we can safely use flags from eg ioprio
>> for cases where ioprio would never be used. In that sense it's probably
>> safer than using buf_index.
>>
>> The alternative is, as has been brougt up before, adding a flags2 and
>> reserving the last flag in ->flags to say "there are flags in flags2".
> May I ask when do we have to use a bit in ->flags to do this? My
> understanding is just leverage a __u8 in pad2[2] in io_uring_sqe to
> deliver ext_flags.

We may not need to, it depends solely on whether or not we currently
check that pad for zero or not, and -EINVAL if it's not zero. Otherwise
you can end up with the unfortunate side effect that applications will
set some extended flag on an older kernel and think that it works, but
the old kernel just ignores is as it isn't checked.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-06 12:35             ` Hao Xu
@ 2021-09-06 13:31               ` Jens Axboe
  2021-09-06 15:00                 ` Hao Xu
  2021-09-06 15:32               ` Pavel Begunkov
  1 sibling, 1 reply; 46+ messages in thread
From: Jens Axboe @ 2021-09-06 13:31 UTC (permalink / raw)
  To: Hao Xu, Pavel Begunkov; +Cc: io-uring, Joseph Qi

On 9/6/21 6:35 AM, Hao Xu wrote:
> 在 2021/9/6 上午3:44, Jens Axboe 写道:
>> On 9/4/21 4:46 PM, Pavel Begunkov wrote:
>>> On 9/4/21 7:40 PM, Jens Axboe wrote:
>>>> On 9/4/21 9:34 AM, Hao Xu wrote:
>>>>> 在 2021/9/4 上午12:29, Jens Axboe 写道:
>>>>>> On 9/3/21 5:00 AM, Hao Xu wrote:
>>>>>>> Update io_accept_prep() to enable multishot mode for accept operation.
>>>>>>>
>>>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>>>> ---
>>>>>>>    fs/io_uring.c | 14 ++++++++++++--
>>>>>>>    1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>> index eb81d37dce78..34612646ae3c 100644
>>>>>>> --- a/fs/io_uring.c
>>>>>>> +++ b/fs/io_uring.c
>>>>>>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>>    static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>>    {
>>>>>>>    	struct io_accept *accept = &req->accept;
>>>>>>> +	bool is_multishot;
>>>>>>>    
>>>>>>>    	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>>>>    		return -EINVAL;
>>>>>>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>>    	accept->flags = READ_ONCE(sqe->accept_flags);
>>>>>>>    	accept->nofile = rlimit(RLIMIT_NOFILE);
>>>>>>>    
>>>>>>> +	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>>>>>>> +	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>>>>>>> +		return -EINVAL;
>>>>>>
>>>>>> I like the idea itself as I think it makes a lot of sense to just have
>>>>>> an accept sitting there and generating multiple CQEs, but I'm a bit
>>>>>> puzzled by how you pass it in. accept->flags is the accept4(2) flags,
>>>>>> which can currently be:
>>>>>>
>>>>>> SOCK_NONBLOCK
>>>>>> SOCK_CLOEXEC
>>>>>>
>>>>>> While there's not any overlap here, that is mostly by chance I think. A
>>>>>> cleaner separation is needed here, what happens if some other accept4(2)
>>>>>> flag is enabled and it just happens to be the same as
>>>>>> IORING_ACCEPT_MULTISHOT?
>>>>> Make sense, how about a new IOSQE flag, I saw not many
>>>>> entries left there.
>>>>
>>>> Not quite sure what the best approach would be... The mshot flag only
>>>> makes sense for a few request types, so a bit of a shame to have to
>>>> waste an IOSQE flag on it. Especially when the flags otherwise passed in
>>>> are so sparse, there's plenty of bits there.
>>>>
>>>> Hence while it may not be the prettiest, perhaps using accept->flags is
>>>> ok and we just need some careful code to ensure that we never have any
>>>> overlap.
>>>
>>> Or we can alias with some of the almost-never-used fields like
>>> ->ioprio or ->buf_index.
>>
>> It's not a bad idea, as long as we can safely use flags from eg ioprio
>> for cases where ioprio would never be used. In that sense it's probably
>> safer than using buf_index.
>>
>> The alternative is, as has been brougt up before, adding a flags2 and
>> reserving the last flag in ->flags to say "there are flags in flags2".
>> Not exactly super pretty either, but we'll need to extend them at some
>> point.
> I'm going to do it in this way, there is another thing we have to do:
> extend req->flags too, since flags we already used > 32 if we add
> sqe->ext_flags

As far as I can tell from a quick look, there's still plenty of flags
left for REQ_F additions, about 8 of them. Don't expand req->flags if we
can avoid it, just add some safeguards to ensure we don't mess up.

Since we haven't really been tight on req->flags before, there's also
some low hanging fruit there that will allow us to reclaim some of them
if we need to.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-06 13:31               ` Jens Axboe
@ 2021-09-06 15:00                 ` Hao Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-06 15:00 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov; +Cc: io-uring, Joseph Qi

在 2021/9/6 下午9:31, Jens Axboe 写道:
> On 9/6/21 6:35 AM, Hao Xu wrote:
>> 在 2021/9/6 上午3:44, Jens Axboe 写道:
>>> On 9/4/21 4:46 PM, Pavel Begunkov wrote:
>>>> On 9/4/21 7:40 PM, Jens Axboe wrote:
>>>>> On 9/4/21 9:34 AM, Hao Xu wrote:
>>>>>> 在 2021/9/4 上午12:29, Jens Axboe 写道:
>>>>>>> On 9/3/21 5:00 AM, Hao Xu wrote:
>>>>>>>> Update io_accept_prep() to enable multishot mode for accept operation.
>>>>>>>>
>>>>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>>>>> ---
>>>>>>>>     fs/io_uring.c | 14 ++++++++++++--
>>>>>>>>     1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>>> index eb81d37dce78..34612646ae3c 100644
>>>>>>>> --- a/fs/io_uring.c
>>>>>>>> +++ b/fs/io_uring.c
>>>>>>>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>>>     static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>>>     {
>>>>>>>>     	struct io_accept *accept = &req->accept;
>>>>>>>> +	bool is_multishot;
>>>>>>>>     
>>>>>>>>     	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>>>>>     		return -EINVAL;
>>>>>>>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>>>     	accept->flags = READ_ONCE(sqe->accept_flags);
>>>>>>>>     	accept->nofile = rlimit(RLIMIT_NOFILE);
>>>>>>>>     
>>>>>>>> +	is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>>>>>>>> +	if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>>>>>>>> +		return -EINVAL;
>>>>>>>
>>>>>>> I like the idea itself as I think it makes a lot of sense to just have
>>>>>>> an accept sitting there and generating multiple CQEs, but I'm a bit
>>>>>>> puzzled by how you pass it in. accept->flags is the accept4(2) flags,
>>>>>>> which can currently be:
>>>>>>>
>>>>>>> SOCK_NONBLOCK
>>>>>>> SOCK_CLOEXEC
>>>>>>>
>>>>>>> While there's not any overlap here, that is mostly by chance I think. A
>>>>>>> cleaner separation is needed here, what happens if some other accept4(2)
>>>>>>> flag is enabled and it just happens to be the same as
>>>>>>> IORING_ACCEPT_MULTISHOT?
>>>>>> Make sense, how about a new IOSQE flag, I saw not many
>>>>>> entries left there.
>>>>>
>>>>> Not quite sure what the best approach would be... The mshot flag only
>>>>> makes sense for a few request types, so a bit of a shame to have to
>>>>> waste an IOSQE flag on it. Especially when the flags otherwise passed in
>>>>> are so sparse, there's plenty of bits there.
>>>>>
>>>>> Hence while it may not be the prettiest, perhaps using accept->flags is
>>>>> ok and we just need some careful code to ensure that we never have any
>>>>> overlap.
>>>>
>>>> Or we can alias with some of the almost-never-used fields like
>>>> ->ioprio or ->buf_index.
>>>
>>> It's not a bad idea, as long as we can safely use flags from eg ioprio
>>> for cases where ioprio would never be used. In that sense it's probably
>>> safer than using buf_index.
>>>
>>> The alternative is, as has been brougt up before, adding a flags2 and
>>> reserving the last flag in ->flags to say "there are flags in flags2".
>>> Not exactly super pretty either, but we'll need to extend them at some
>>> point.
>> I'm going to do it in this way, there is another thing we have to do:
>> extend req->flags too, since flags we already used > 32 if we add
>> sqe->ext_flags
> 
> As far as I can tell from a quick look, there's still plenty of flags
> left for REQ_F additions, about 8 of them. Don't
Ah, sorry, I realised that I reserved the 8
ext_flags bits right after the first 8 bits for
sqe->flags for convenience. And that's why I
though not enough space in REQ_F. Thanks.
  expand req->flags if we
> can avoid it, just add some safeguards to ensure we don't mess up.
> 
> Since we haven't really been tight on req->flags before, there's also
> some low hanging fruit there that will allow us to reclaim some of them
> if we need to.
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-06 12:35             ` Hao Xu
  2021-09-06 13:31               ` Jens Axboe
@ 2021-09-06 15:32               ` Pavel Begunkov
  2021-09-06 16:42                 ` Jens Axboe
  1 sibling, 1 reply; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-06 15:32 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/6/21 1:35 PM, Hao Xu wrote:
> 在 2021/9/6 上午3:44, Jens Axboe 写道:
>> On 9/4/21 4:46 PM, Pavel Begunkov wrote:
>>> On 9/4/21 7:40 PM, Jens Axboe wrote:
>>>> On 9/4/21 9:34 AM, Hao Xu wrote:
>>>>> 在 2021/9/4 上午12:29, Jens Axboe 写道:
>>>>>> On 9/3/21 5:00 AM, Hao Xu wrote:
>>>>>>> Update io_accept_prep() to enable multishot mode for accept operation.
>>>>>>>
>>>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>>>> ---
>>>>>>>    fs/io_uring.c | 14 ++++++++++++--
>>>>>>>    1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>> index eb81d37dce78..34612646ae3c 100644
>>>>>>> --- a/fs/io_uring.c
>>>>>>> +++ b/fs/io_uring.c
>>>>>>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>>    static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>>    {
>>>>>>>        struct io_accept *accept = &req->accept;
>>>>>>> +    bool is_multishot;
>>>>>>>           if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>>>>            return -EINVAL;
>>>>>>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>>        accept->flags = READ_ONCE(sqe->accept_flags);
>>>>>>>        accept->nofile = rlimit(RLIMIT_NOFILE);
>>>>>>>    +    is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>>>>>>> +    if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>>>>>>> +        return -EINVAL;
>>>>>>
>>>>>> I like the idea itself as I think it makes a lot of sense to just have
>>>>>> an accept sitting there and generating multiple CQEs, but I'm a bit
>>>>>> puzzled by how you pass it in. accept->flags is the accept4(2) flags,
>>>>>> which can currently be:
>>>>>>
>>>>>> SOCK_NONBLOCK
>>>>>> SOCK_CLOEXEC
>>>>>>
>>>>>> While there's not any overlap here, that is mostly by chance I think. A
>>>>>> cleaner separation is needed here, what happens if some other accept4(2)
>>>>>> flag is enabled and it just happens to be the same as
>>>>>> IORING_ACCEPT_MULTISHOT?
>>>>> Make sense, how about a new IOSQE flag, I saw not many
>>>>> entries left there.
>>>>
>>>> Not quite sure what the best approach would be... The mshot flag only
>>>> makes sense for a few request types, so a bit of a shame to have to
>>>> waste an IOSQE flag on it. Especially when the flags otherwise passed in
>>>> are so sparse, there's plenty of bits there.
>>>>
>>>> Hence while it may not be the prettiest, perhaps using accept->flags is
>>>> ok and we just need some careful code to ensure that we never have any
>>>> overlap.
>>>
>>> Or we can alias with some of the almost-never-used fields like
>>> ->ioprio or ->buf_index.
>>
>> It's not a bad idea, as long as we can safely use flags from eg ioprio
>> for cases where ioprio would never be used. In that sense it's probably
>> safer than using buf_index.
>>
>> The alternative is, as has been brougt up before, adding a flags2 and
>> reserving the last flag in ->flags to say "there are flags in flags2".
>> Not exactly super pretty either, but we'll need to extend them at some
>> point.
> I'm going to do it in this way, there is another thing we have to do:
> extend req->flags too, since flags we already used > 32 if we add
> sqe->ext_flags

We still have 2 bits left, and IIRC you wanted to take only 1 of them.
We don't need extending it at the moment, it sounded to me like a plan
for the future. No extra trouble for now

Anyway, I can't think of many requests working in this mode, and I think
sqe_flags should be taken only for features applicable to all (~most) of
requests. Maybe we'd better to fit it individually into accept in the
end? Sounds more plausible tbh

p.s. yes, there is IOSQE_BUFFER_SELECT, but I don't think that was the
best solution, but in any case it's history.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 5/6] io_uring: implement multishot mode for accept
  2021-09-04 22:40     ` Pavel Begunkov
@ 2021-09-06 15:34       ` Pavel Begunkov
  0 siblings, 0 replies; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-06 15:34 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/4/21 11:40 PM, Pavel Begunkov wrote:
> On 9/4/21 11:39 PM, Pavel Begunkov wrote:
>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>> Refactor io_accept() to support multishot mode.
>>
>> Multishot with the fixed/direct mode sounds weird (considering that
>> the slot index is specified by userspace), let's forbid them.
>>
>> io_accept_prep() {
>> 	if (accept->file_slot && (flags & MULTISHOT))
>> 		return -EINVAL;
>> 	...
>> }
> 
> Ah, never mind, it's already there in 6/6

btw, I think it would be less confusing if 5/6 and 6/6 are combined
into a single patch. But it's a matter of taste, I guess

>>
>>> theoretical analysis:
>>>   1) when connections come in fast
>>>     - singleshot:
>>>               add accept sqe(userpsace) --> accept inline
>>>                               ^                 |
>>>                               |-----------------|
>>>     - multishot:
>>>              add accept sqe(userspace) --> accept inline
>>>                                               ^     |
>>>                                               |--*--|
>>>
>>>     we do accept repeatedly in * place until get EAGAIN
>>>
>>>   2) when connections come in at a low pressure
>>>     similar thing like 1), we reduce a lot of userspace-kernel context
>>>     switch and useless vfs_poll()
>>>
>>> tests:
>>> Did some tests, which goes in this way:
>>>
>>>   server    client(multiple)
>>>   accept    connect
>>>   read      write
>>>   write     read
>>>   close     close
>>>
>>> Basically, raise up a number of clients(on same machine with server) to
>>> connect to the server, and then write some data to it, the server will
>>> write those data back to the client after it receives them, and then
>>> close the connection after write return. Then the client will read the
>>> data and then close the connection. Here I test 10000 clients connect
>>> one server, data size 128 bytes. And each client has a go routine for
>>> it, so they come to the server in short time.
>>> test 20 times before/after this patchset, time spent:(unit cycle, which
>>> is the return value of clock())
>>> before:
>>>   1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075
>>>   +1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984
>>>   +1934226+1914385)/20.0 = 1927633.75
>>> after:
>>>   1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112
>>>   +1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506
>>>   +1871324+1940803)/20.0 = 1894750.45
>>>
>>> (1927633.75 - 1894750.45) / 1927633.75 = 1.65%
>>>
>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>> ---
>>>
>>> not sure if we should cancel it when io_cqring_fill_event() reurn false
>>>
>>>  fs/io_uring.c | 34 +++++++++++++++++++++++++++++-----
>>>  1 file changed, 29 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index dae7044e0c24..eb81d37dce78 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -4885,16 +4885,18 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>  
>>>  static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>>>  {
>>> +	struct io_ring_ctx *ctx = req->ctx;
>>>  	struct io_accept *accept = &req->accept;
>>>  	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>>>  	unsigned int file_flags = force_nonblock ? O_NONBLOCK : 0;
>>>  	bool fixed = !!accept->file_slot;
>>>  	struct file *file;
>>> -	int ret, fd;
>>> +	int ret, ret2 = 0, fd;
>>>  
>>>  	if (req->file->f_flags & O_NONBLOCK)
>>>  		req->flags |= REQ_F_NOWAIT;
>>>  
>>> +retry:
>>>  	if (!fixed) {
>>>  		fd = __get_unused_fd_flags(accept->flags, accept->nofile);
>>>  		if (unlikely(fd < 0))
>>> @@ -4906,20 +4908,42 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>>>  		if (!fixed)
>>>  			put_unused_fd(fd);
>>>  		ret = PTR_ERR(file);
>>> -		if (ret == -EAGAIN && force_nonblock)
>>> -			return -EAGAIN;
>>> +		if (ret == -EAGAIN && force_nonblock) {
>>> +			if ((req->flags & (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)) ==
>>> +			    (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED))
>>> +				ret = 0;
>>> +			return ret;
>>> +		}
>>>  		if (ret == -ERESTARTSYS)
>>>  			ret = -EINTR;
>>>  		req_set_fail(req);
>>>  	} else if (!fixed) {
>>>  		fd_install(fd, file);
>>>  		ret = fd;
>>> +		/*
>>> +		 * if it's in multishot mode, let's return -EAGAIN to make it go
>>> +		 * into fast poll path
>>> +		 */
>>> +		if ((req->flags & REQ_F_APOLL_MULTISHOT) && force_nonblock &&
>>> +		   !(req->flags & REQ_F_POLLED))
>>> +			ret2 = -EAGAIN;
>>>  	} else {
>>>  		ret = io_install_fixed_file(req, file, issue_flags,
>>>  					    accept->file_slot - 1);
>>>  	}
>>> -	__io_req_complete(req, issue_flags, ret, 0);
>>> -	return 0;
>>> +
>>> +	if (req->flags & REQ_F_APOLL_MULTISHOT) {
>>> +		spin_lock(&ctx->completion_lock);
>>> +		if (io_cqring_fill_event(ctx, req->user_data, ret, 0)) {
>>> +			io_commit_cqring(ctx);
>>> +			ctx->cq_extra++;
>>> +		}
>>> +		spin_unlock(&ctx->completion_lock);
>>> +		goto retry;
>>> +	} else {
>>> +		__io_req_complete(req, issue_flags, ret, 0);
>>> +	}
>>> +	return ret2;
>>>  }
>>>  
>>>  static int io_connect_prep_async(struct io_kiocb *req)
>>>
>>
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-03 11:00 ` [PATCH 4/6] io_uring: let fast poll support multishot Hao Xu
@ 2021-09-06 15:56   ` Pavel Begunkov
  2021-09-06 17:40     ` Hao Xu
  2021-09-06 19:04   ` Pavel Begunkov
  1 sibling, 1 reply; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-06 15:56 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/3/21 12:00 PM, Hao Xu wrote:
> For operations like accept, multishot is a useful feature, since we can
> reduce a number of accept sqe. Let's integrate it to fast poll, it may
> be good for other operations in the future.

__io_arm_poll_handler()         |
  -> vfs_poll()                 |
                                | io_async_task_func() // post CQE
                                | ...
                                | do_apoll_rewait();
  -> continues after vfs_poll(),|
     removing poll->head of     |
     the second poll attempt.   |


One of the reasons for forbidding multiple apoll's is that it
might be racy. I haven't looked into this implementation, but
we should check if there will be problems from that.

FWIW, putting aside this patchset, the poll/apoll is not in
the best shape and can use some refactoring.


> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
>  fs/io_uring.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index d6df60c4cdb9..dae7044e0c24 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5277,8 +5277,15 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>  		return;
>  	}
>  
> -	hash_del(&req->hash_node);
> -	io_poll_remove_double(req);
> +	if (READ_ONCE(apoll->poll.canceled))
> +		apoll->poll.events |= EPOLLONESHOT;
> +	if (apoll->poll.events & EPOLLONESHOT) {
> +		hash_del(&req->hash_node);
> +		io_poll_remove_double(req);
> +	} else {
> +		add_wait_queue(apoll->poll.head, &apoll->poll.wait);
> +	}
> +
>  	spin_unlock(&ctx->completion_lock);
>  
>  	if (!READ_ONCE(apoll->poll.canceled))
> @@ -5366,7 +5373,7 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>  	struct io_ring_ctx *ctx = req->ctx;
>  	struct async_poll *apoll;
>  	struct io_poll_table ipt;
> -	__poll_t ret, mask = EPOLLONESHOT | POLLERR | POLLPRI;
> +	__poll_t ret, mask = POLLERR | POLLPRI;
>  	int rw;
>  
>  	if (!req->file || !file_can_poll(req->file))
> @@ -5388,6 +5395,8 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>  		rw = WRITE;
>  		mask |= POLLOUT | POLLWRNORM;
>  	}
> +	if (!(req->flags & REQ_F_APOLL_MULTISHOT))
> +		mask |= EPOLLONESHOT;
>  
>  	/* if we can't nonblock try, then no point in arming a poll handler */
>  	if (!io_file_supports_nowait(req, rw))
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 6/6] io_uring: enable multishot mode for accept
  2021-09-06 15:32               ` Pavel Begunkov
@ 2021-09-06 16:42                 ` Jens Axboe
  0 siblings, 0 replies; 46+ messages in thread
From: Jens Axboe @ 2021-09-06 16:42 UTC (permalink / raw)
  To: Pavel Begunkov, Hao Xu; +Cc: io-uring, Joseph Qi

On 9/6/21 9:32 AM, Pavel Begunkov wrote:
> On 9/6/21 1:35 PM, Hao Xu wrote:
>> 在 2021/9/6 上午3:44, Jens Axboe 写道:
>>> On 9/4/21 4:46 PM, Pavel Begunkov wrote:
>>>> On 9/4/21 7:40 PM, Jens Axboe wrote:
>>>>> On 9/4/21 9:34 AM, Hao Xu wrote:
>>>>>> 在 2021/9/4 上午12:29, Jens Axboe 写道:
>>>>>>> On 9/3/21 5:00 AM, Hao Xu wrote:
>>>>>>>> Update io_accept_prep() to enable multishot mode for accept operation.
>>>>>>>>
>>>>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>>>>> ---
>>>>>>>>    fs/io_uring.c | 14 ++++++++++++--
>>>>>>>>    1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>>> index eb81d37dce78..34612646ae3c 100644
>>>>>>>> --- a/fs/io_uring.c
>>>>>>>> +++ b/fs/io_uring.c
>>>>>>>> @@ -4861,6 +4861,7 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>>>    static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>>>    {
>>>>>>>>        struct io_accept *accept = &req->accept;
>>>>>>>> +    bool is_multishot;
>>>>>>>>           if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>>>>>>            return -EINVAL;
>>>>>>>> @@ -4872,14 +4873,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>>>>        accept->flags = READ_ONCE(sqe->accept_flags);
>>>>>>>>        accept->nofile = rlimit(RLIMIT_NOFILE);
>>>>>>>>    +    is_multishot = accept->flags & IORING_ACCEPT_MULTISHOT;
>>>>>>>> +    if (is_multishot && (req->flags & REQ_F_FORCE_ASYNC))
>>>>>>>> +        return -EINVAL;
>>>>>>>
>>>>>>> I like the idea itself as I think it makes a lot of sense to just have
>>>>>>> an accept sitting there and generating multiple CQEs, but I'm a bit
>>>>>>> puzzled by how you pass it in. accept->flags is the accept4(2) flags,
>>>>>>> which can currently be:
>>>>>>>
>>>>>>> SOCK_NONBLOCK
>>>>>>> SOCK_CLOEXEC
>>>>>>>
>>>>>>> While there's not any overlap here, that is mostly by chance I think. A
>>>>>>> cleaner separation is needed here, what happens if some other accept4(2)
>>>>>>> flag is enabled and it just happens to be the same as
>>>>>>> IORING_ACCEPT_MULTISHOT?
>>>>>> Make sense, how about a new IOSQE flag, I saw not many
>>>>>> entries left there.
>>>>>
>>>>> Not quite sure what the best approach would be... The mshot flag only
>>>>> makes sense for a few request types, so a bit of a shame to have to
>>>>> waste an IOSQE flag on it. Especially when the flags otherwise passed in
>>>>> are so sparse, there's plenty of bits there.
>>>>>
>>>>> Hence while it may not be the prettiest, perhaps using accept->flags is
>>>>> ok and we just need some careful code to ensure that we never have any
>>>>> overlap.
>>>>
>>>> Or we can alias with some of the almost-never-used fields like
>>>> ->ioprio or ->buf_index.
>>>
>>> It's not a bad idea, as long as we can safely use flags from eg ioprio
>>> for cases where ioprio would never be used. In that sense it's probably
>>> safer than using buf_index.
>>>
>>> The alternative is, as has been brougt up before, adding a flags2 and
>>> reserving the last flag in ->flags to say "there are flags in flags2".
>>> Not exactly super pretty either, but we'll need to extend them at some
>>> point.
>> I'm going to do it in this way, there is another thing we have to do:
>> extend req->flags too, since flags we already used > 32 if we add
>> sqe->ext_flags
> 
> We still have 2 bits left, and IIRC you wanted to take only 1 of them.
> We don't need extending it at the moment, it sounded to me like a plan
> for the future. No extra trouble for now

Right, and it should be a separate thing anyway. But as you say, there's
still 2 bits left, this is more about longer term planning than this
particular patchset.

> Anyway, I can't think of many requests working in this mode, and I think
> sqe_flags should be taken only for features applicable to all (~most) of
> requests. Maybe we'd better to fit it individually into accept in the
> end? Sounds more plausible tbh

That's why I suggested making it op private instead, I don't
particularly like having io_uring wide flags that are only applicable to
a (very) small subset of requests. And there's also precedence here
already in terms of poll supporting mshot with a private flag.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-06 15:56   ` Pavel Begunkov
@ 2021-09-06 17:40     ` Hao Xu
  2021-09-06 19:09       ` Pavel Begunkov
  0 siblings, 1 reply; 46+ messages in thread
From: Hao Xu @ 2021-09-06 17:40 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/9/6 下午11:56, Pavel Begunkov 写道:
> On 9/3/21 12:00 PM, Hao Xu wrote:
>> For operations like accept, multishot is a useful feature, since we can
>> reduce a number of accept sqe. Let's integrate it to fast poll, it may
>> be good for other operations in the future.
> 
> __io_arm_poll_handler()         |
>    -> vfs_poll()                 |
>                                  | io_async_task_func() // post CQE
>                                  | ...
>                                  | do_apoll_rewait();
>    -> continues after vfs_poll(),|
>       removing poll->head of     |
>       the second poll attempt.   |
> 
> 
Sorry.. a little bit confused by this case, would you mind explain a bit
more..is the right part a system-workqueue context? and is
do_apoll_rewait() io_poll_rewait() function?
> One of the reasons for forbidding multiple apoll's is that it
> might be racy. I haven't looked into this implementation, but
> we should check if there will be problems from that.
> 
> FWIW, putting aside this patchset, the poll/apoll is not in
> the best shape and can use some refactoring.
> 
> 
>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>> ---
>>   fs/io_uring.c | 15 ++++++++++++---
>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index d6df60c4cdb9..dae7044e0c24 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5277,8 +5277,15 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>>   		return;
>>   	}
>>   
>> -	hash_del(&req->hash_node);
>> -	io_poll_remove_double(req);
>> +	if (READ_ONCE(apoll->poll.canceled))
>> +		apoll->poll.events |= EPOLLONESHOT;
>> +	if (apoll->poll.events & EPOLLONESHOT) {
>> +		hash_del(&req->hash_node);
>> +		io_poll_remove_double(req);
>> +	} else {
>> +		add_wait_queue(apoll->poll.head, &apoll->poll.wait);
>> +	}
>> +
>>   	spin_unlock(&ctx->completion_lock);
>>   
>>   	if (!READ_ONCE(apoll->poll.canceled))
>> @@ -5366,7 +5373,7 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>   	struct io_ring_ctx *ctx = req->ctx;
>>   	struct async_poll *apoll;
>>   	struct io_poll_table ipt;
>> -	__poll_t ret, mask = EPOLLONESHOT | POLLERR | POLLPRI;
>> +	__poll_t ret, mask = POLLERR | POLLPRI;
>>   	int rw;
>>   
>>   	if (!req->file || !file_can_poll(req->file))
>> @@ -5388,6 +5395,8 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>   		rw = WRITE;
>>   		mask |= POLLOUT | POLLWRNORM;
>>   	}
>> +	if (!(req->flags & REQ_F_APOLL_MULTISHOT))
>> +		mask |= EPOLLONESHOT;
>>   
>>   	/* if we can't nonblock try, then no point in arming a poll handler */
>>   	if (!io_file_supports_nowait(req, rw))
>>
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-03 11:00 ` [PATCH 4/6] io_uring: let fast poll support multishot Hao Xu
  2021-09-06 15:56   ` Pavel Begunkov
@ 2021-09-06 19:04   ` Pavel Begunkov
  2021-09-07  6:48     ` Hao Xu
  1 sibling, 1 reply; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-06 19:04 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/3/21 12:00 PM, Hao Xu wrote:
> For operations like accept, multishot is a useful feature, since we can
> reduce a number of accept sqe. Let's integrate it to fast poll, it may
> be good for other operations in the future.
> 
> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
> ---
>  fs/io_uring.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index d6df60c4cdb9..dae7044e0c24 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5277,8 +5277,15 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>  		return;
>  	}
>  
> -	hash_del(&req->hash_node);
> -	io_poll_remove_double(req);
> +	if (READ_ONCE(apoll->poll.canceled))
> +		apoll->poll.events |= EPOLLONESHOT;
> +	if (apoll->poll.events & EPOLLONESHOT) {
> +		hash_del(&req->hash_node);
> +		io_poll_remove_double(req);
> +	} else {
> +		add_wait_queue(apoll->poll.head, &apoll->poll.wait);

It looks like it does both io_req_task_submit() and adding back
to the wq, so io_issue_sqe() may be called in parallel with
io_async_task_func(). If so, there will be tons of all kind of
races.

> +	}
> +
>  	spin_unlock(&ctx->completion_lock);
>  
>  	if (!READ_ONCE(apoll->poll.canceled))
> @@ -5366,7 +5373,7 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>  	struct io_ring_ctx *ctx = req->ctx;
>  	struct async_poll *apoll;
>  	struct io_poll_table ipt;
> -	__poll_t ret, mask = EPOLLONESHOT | POLLERR | POLLPRI;
> +	__poll_t ret, mask = POLLERR | POLLPRI;
>  	int rw;
>  
>  	if (!req->file || !file_can_poll(req->file))
> @@ -5388,6 +5395,8 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>  		rw = WRITE;
>  		mask |= POLLOUT | POLLWRNORM;
>  	}
> +	if (!(req->flags & REQ_F_APOLL_MULTISHOT))
> +		mask |= EPOLLONESHOT;
>  
>  	/* if we can't nonblock try, then no point in arming a poll handler */
>  	if (!io_file_supports_nowait(req, rw))
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-06 17:40     ` Hao Xu
@ 2021-09-06 19:09       ` Pavel Begunkov
  2021-09-07  6:38         ` Hao Xu
  0 siblings, 1 reply; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-06 19:09 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/6/21 6:40 PM, Hao Xu wrote:
> 在 2021/9/6 下午11:56, Pavel Begunkov 写道:
>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>> For operations like accept, multishot is a useful feature, since we can
>>> reduce a number of accept sqe. Let's integrate it to fast poll, it may
>>> be good for other operations in the future.
>>
>> __io_arm_poll_handler()         |
>>    -> vfs_poll()                 |
>>                                  | io_async_task_func() // post CQE
>>                                  | ...
>>                                  | do_apoll_rewait();
>>    -> continues after vfs_poll(),|
>>       removing poll->head of     |
>>       the second poll attempt.   |
>>
>>
> Sorry.. a little bit confused by this case, would you mind explain a bit
> more..is the right part a system-workqueue context? and is
> do_apoll_rewait() io_poll_rewait() function?

I meant in a broad sense. If your patches make lifetime of an accept
request to be like:

accept() -> arm_apoll() -> apoll_func() -> accept() -> ...
    -> ... (repeat many times)

then do_apoll_rewait() is the second accept in the scheme.

If not, and it's

accept() -> arm_poll() -> apoll_func() -> apoll_func() ->
 ... -> ?

Then that "do_apoll_rewait()" should have been second and
other apoll_func()s.

So, it's rather a thing to look after, but not a particular
bug.


>> One of the reasons for forbidding multiple apoll's is that it
>> might be racy. I haven't looked into this implementation, but
>> we should check if there will be problems from that.
>>
>> FWIW, putting aside this patchset, the poll/apoll is not in
>> the best shape and can use some refactoring.
>>
>>
[...]

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-06 19:09       ` Pavel Begunkov
@ 2021-09-07  6:38         ` Hao Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-07  6:38 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/9/7 上午3:09, Pavel Begunkov 写道:
> On 9/6/21 6:40 PM, Hao Xu wrote:
>> 在 2021/9/6 下午11:56, Pavel Begunkov 写道:
>>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>>> For operations like accept, multishot is a useful feature, since we can
>>>> reduce a number of accept sqe. Let's integrate it to fast poll, it may
>>>> be good for other operations in the future.
>>>
>>> __io_arm_poll_handler()         |
>>>     -> vfs_poll()                 |
>>>                                   | io_async_task_func() // post CQE
>>>                                   | ...
>>>                                   | do_apoll_rewait();
>>>     -> continues after vfs_poll(),|
>>>        removing poll->head of     |
>>>        the second poll attempt.
         this(removal) only happen when there is error or it is
EPOLLONESHOT
    |
>>>
>>>
>> Sorry.. a little bit confused by this case, would you mind explain a bit
>> more..is the right part a system-workqueue context? and is
>> do_apoll_rewait() io_poll_rewait() function?
> 
> I meant in a broad sense. If your patches make lifetime of an accept
> request to be like:
> 
> accept() -> arm_apoll() -> apoll_func() -> accept() -> ...
>      -> ... (repeat many times)
> 
> then do_apoll_rewait() is the second accept in the scheme.
> 
> If not, and it's
> 
> accept() -> arm_poll() -> apoll_func() -> apoll_func() ->
>   ... -> ?
> 
> Then that "do_apoll_rewait()" should have been second and
> other apoll_func()s.
> 
> So, it's rather a thing to look after, but not a particular
> bug.
> 
> 
>>> One of the reasons for forbidding multiple apoll's is that it
>>> might be racy. I haven't looked into this implementation, but
>>> we should check if there will be problems from that.
>>>
>>> FWIW, putting aside this patchset, the poll/apoll is not in
>>> the best shape and can use some refactoring.
>>>
>>>
> [...]
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-06 19:04   ` Pavel Begunkov
@ 2021-09-07  6:48     ` Hao Xu
  2021-09-08 11:21       ` Hao Xu
  0 siblings, 1 reply; 46+ messages in thread
From: Hao Xu @ 2021-09-07  6:48 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/9/7 上午3:04, Pavel Begunkov 写道:
> On 9/3/21 12:00 PM, Hao Xu wrote:
>> For operations like accept, multishot is a useful feature, since we can
>> reduce a number of accept sqe. Let's integrate it to fast poll, it may
>> be good for other operations in the future.
>>
>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>> ---
>>   fs/io_uring.c | 15 ++++++++++++---
>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index d6df60c4cdb9..dae7044e0c24 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5277,8 +5277,15 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>>   		return;
>>   	}
>>   
>> -	hash_del(&req->hash_node);
>> -	io_poll_remove_double(req);
>> +	if (READ_ONCE(apoll->poll.canceled))
>> +		apoll->poll.events |= EPOLLONESHOT;
>> +	if (apoll->poll.events & EPOLLONESHOT) {
>> +		hash_del(&req->hash_node);
>> +		io_poll_remove_double(req);
>> +	} else {
>> +		add_wait_queue(apoll->poll.head, &apoll->poll.wait);
> 
> It looks like it does both io_req_task_submit() and adding back
> to the wq, so io_issue_sqe() may be called in parallel with
> io_async_task_func(). If so, there will be tons of all kind of
> races.
IMHO, io_async_task_func() is called in original context one by
one(except PF_EXITING is set, it is also called in system-wq), so
shouldn't be parallel case there.
> 
>> +	}
>> +
>>   	spin_unlock(&ctx->completion_lock);
>>   
>>   	if (!READ_ONCE(apoll->poll.canceled))
>> @@ -5366,7 +5373,7 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>   	struct io_ring_ctx *ctx = req->ctx;
>>   	struct async_poll *apoll;
>>   	struct io_poll_table ipt;
>> -	__poll_t ret, mask = EPOLLONESHOT | POLLERR | POLLPRI;
>> +	__poll_t ret, mask = POLLERR | POLLPRI;
>>   	int rw;
>>   
>>   	if (!req->file || !file_can_poll(req->file))
>> @@ -5388,6 +5395,8 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>   		rw = WRITE;
>>   		mask |= POLLOUT | POLLWRNORM;
>>   	}
>> +	if (!(req->flags & REQ_F_APOLL_MULTISHOT))
>> +		mask |= EPOLLONESHOT;
>>   
>>   	/* if we can't nonblock try, then no point in arming a poll handler */
>>   	if (!io_file_supports_nowait(req, rw))
>>
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-07  6:48     ` Hao Xu
@ 2021-09-08 11:21       ` Hao Xu
  2021-09-08 12:03         ` Pavel Begunkov
  0 siblings, 1 reply; 46+ messages in thread
From: Hao Xu @ 2021-09-08 11:21 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/9/7 下午2:48, Hao Xu 写道:
> 在 2021/9/7 上午3:04, Pavel Begunkov 写道:
>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>> For operations like accept, multishot is a useful feature, since we can
>>> reduce a number of accept sqe. Let's integrate it to fast poll, it may
>>> be good for other operations in the future.
>>>
>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>> ---
>>>   fs/io_uring.c | 15 ++++++++++++---
>>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index d6df60c4cdb9..dae7044e0c24 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -5277,8 +5277,15 @@ static void io_async_task_func(struct io_kiocb 
>>> *req, bool *locked)
>>>           return;
>>>       }
>>> -    hash_del(&req->hash_node);
>>> -    io_poll_remove_double(req);
>>> +    if (READ_ONCE(apoll->poll.canceled))
>>> +        apoll->poll.events |= EPOLLONESHOT;
>>> +    if (apoll->poll.events & EPOLLONESHOT) {
>>> +        hash_del(&req->hash_node);
>>> +        io_poll_remove_double(req);
>>> +    } else {
>>> +        add_wait_queue(apoll->poll.head, &apoll->poll.wait);
>>
>> It looks like it does both io_req_task_submit() and adding back
>> to the wq, so io_issue_sqe() may be called in parallel with
>> io_async_task_func(). If so, there will be tons of all kind of
>> races.
> IMHO, io_async_task_func() is called in original context one by
> one(except PF_EXITING is set, it is also called in system-wq), so
> shouldn't be parallel case there.
ping...
>>
>>> +    }
>>> +
>>>       spin_unlock(&ctx->completion_lock);
>>>       if (!READ_ONCE(apoll->poll.canceled))
>>> @@ -5366,7 +5373,7 @@ static int io_arm_poll_handler(struct io_kiocb 
>>> *req)
>>>       struct io_ring_ctx *ctx = req->ctx;
>>>       struct async_poll *apoll;
>>>       struct io_poll_table ipt;
>>> -    __poll_t ret, mask = EPOLLONESHOT | POLLERR | POLLPRI;
>>> +    __poll_t ret, mask = POLLERR | POLLPRI;
>>>       int rw;
>>>       if (!req->file || !file_can_poll(req->file))
>>> @@ -5388,6 +5395,8 @@ static int io_arm_poll_handler(struct io_kiocb 
>>> *req)
>>>           rw = WRITE;
>>>           mask |= POLLOUT | POLLWRNORM;
>>>       }
>>> +    if (!(req->flags & REQ_F_APOLL_MULTISHOT))
>>> +        mask |= EPOLLONESHOT;
>>>       /* if we can't nonblock try, then no point in arming a poll 
>>> handler */
>>>       if (!io_file_supports_nowait(req, rw))
>>>
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-08 11:21       ` Hao Xu
@ 2021-09-08 12:03         ` Pavel Begunkov
  2021-09-08 13:13           ` Pavel Begunkov
  2021-09-09  7:01           ` Hao Xu
  0 siblings, 2 replies; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-08 12:03 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/8/21 12:21 PM, Hao Xu wrote:
> 在 2021/9/7 下午2:48, Hao Xu 写道:
>> 在 2021/9/7 上午3:04, Pavel Begunkov 写道:
>>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>>> For operations like accept, multishot is a useful feature, since we can
>>>> reduce a number of accept sqe. Let's integrate it to fast poll, it may
>>>> be good for other operations in the future.
>>>>
>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>> ---
>>>>   fs/io_uring.c | 15 ++++++++++++---
>>>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index d6df60c4cdb9..dae7044e0c24 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -5277,8 +5277,15 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>>>>           return;
>>>>       }
>>>> -    hash_del(&req->hash_node);
>>>> -    io_poll_remove_double(req);
>>>> +    if (READ_ONCE(apoll->poll.canceled))
>>>> +        apoll->poll.events |= EPOLLONESHOT;
>>>> +    if (apoll->poll.events & EPOLLONESHOT) {
>>>> +        hash_del(&req->hash_node);
>>>> +        io_poll_remove_double(req);
>>>> +    } else {
>>>> +        add_wait_queue(apoll->poll.head, &apoll->poll.wait);
>>>
>>> It looks like it does both io_req_task_submit() and adding back
>>> to the wq, so io_issue_sqe() may be called in parallel with
>>> io_async_task_func(). If so, there will be tons of all kind of
>>> races.
>> IMHO, io_async_task_func() is called in original context one by
>> one(except PF_EXITING is set, it is also called in system-wq), so
>> shouldn't be parallel case there.
> ping...

fwiw, the case we're talking about:

CPU0                            | CPU1
io_async_task_func()            |
-> add_wait_queue();            |
-> io_req_task_submit();        |
               /* no tw run happened in between */
                                | io_async_task_func()
                                | --> io_req_task_submit()

We called io_req_task_submit() twice without running tw in-between,
both of the calls use the same req->io_task_work.node field in the
request for accounting, and so the second call will screw
tctx->task_list and not only by not considering that
req->io_task_work.node is already taken/enqueued.

io_req_task_work_add() {
        wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
}

>>>
>>>> +    }
>>>> +
>>>>       spin_unlock(&ctx->completion_lock);
>>>>       if (!READ_ONCE(apoll->poll.canceled))
>>>> @@ -5366,7 +5373,7 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>>>       struct io_ring_ctx *ctx = req->ctx;
>>>>       struct async_poll *apoll;
>>>>       struct io_poll_table ipt;
>>>> -    __poll_t ret, mask = EPOLLONESHOT | POLLERR | POLLPRI;
>>>> +    __poll_t ret, mask = POLLERR | POLLPRI;
>>>>       int rw;
>>>>       if (!req->file || !file_can_poll(req->file))
>>>> @@ -5388,6 +5395,8 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>>>           rw = WRITE;
>>>>           mask |= POLLOUT | POLLWRNORM;
>>>>       }
>>>> +    if (!(req->flags & REQ_F_APOLL_MULTISHOT))
>>>> +        mask |= EPOLLONESHOT;
>>>>       /* if we can't nonblock try, then no point in arming a poll handler */
>>>>       if (!io_file_supports_nowait(req, rw))
>>>>
>>>
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-08 12:03         ` Pavel Begunkov
@ 2021-09-08 13:13           ` Pavel Begunkov
  2021-09-09  7:01           ` Hao Xu
  1 sibling, 0 replies; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-08 13:13 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/8/21 1:03 PM, Pavel Begunkov wrote:
> On 9/8/21 12:21 PM, Hao Xu wrote:
>> 在 2021/9/7 下午2:48, Hao Xu 写道:
>>> 在 2021/9/7 上午3:04, Pavel Begunkov 写道:
>>>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>>>> For operations like accept, multishot is a useful feature, since we can
>>>>> reduce a number of accept sqe. Let's integrate it to fast poll, it may
>>>>> be good for other operations in the future.
>>>>>
>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>> ---
>>>>>   fs/io_uring.c | 15 ++++++++++++---
>>>>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index d6df60c4cdb9..dae7044e0c24 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -5277,8 +5277,15 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>>>>>           return;
>>>>>       }
>>>>> -    hash_del(&req->hash_node);
>>>>> -    io_poll_remove_double(req);
>>>>> +    if (READ_ONCE(apoll->poll.canceled))
>>>>> +        apoll->poll.events |= EPOLLONESHOT;
>>>>> +    if (apoll->poll.events & EPOLLONESHOT) {
>>>>> +        hash_del(&req->hash_node);
>>>>> +        io_poll_remove_double(req);
>>>>> +    } else {
>>>>> +        add_wait_queue(apoll->poll.head, &apoll->poll.wait);
>>>>
>>>> It looks like it does both io_req_task_submit() and adding back
>>>> to the wq, so io_issue_sqe() may be called in parallel with
>>>> io_async_task_func(). If so, there will be tons of all kind of
>>>> races.
>>> IMHO, io_async_task_func() is called in original context one by
>>> one(except PF_EXITING is set, it is also called in system-wq), so
>>> shouldn't be parallel case there.
>> ping...
> 
> fwiw, the case we're talking about:
> 
> CPU0                            | CPU1
> io_async_task_func()            |
> -> add_wait_queue();            |
> -> io_req_task_submit();        |
>                /* no tw run happened in between */
>                                 | io_async_task_func()
>                                 | --> io_req_task_submit()
> 
> We called io_req_task_submit() twice without running tw in-between,
> both of the calls use the same req->io_task_work.node field in the
> request for accounting, and so the second call will screw
> tctx->task_list and not only by not considering that
> req->io_task_work.node is already taken/enqueued.
> 
> io_req_task_work_add() {
>         wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
> }

fwiw, can be just 1 CPU, just

io_req_task_work_add();
io_req_task_work_add();
task_work_run(); // first one

is buggy in current constraints.

> 
>>>>
>>>>> +    }
>>>>> +
>>>>>       spin_unlock(&ctx->completion_lock);
>>>>>       if (!READ_ONCE(apoll->poll.canceled))
>>>>> @@ -5366,7 +5373,7 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>>>>       struct io_ring_ctx *ctx = req->ctx;
>>>>>       struct async_poll *apoll;
>>>>>       struct io_poll_table ipt;
>>>>> -    __poll_t ret, mask = EPOLLONESHOT | POLLERR | POLLPRI;
>>>>> +    __poll_t ret, mask = POLLERR | POLLPRI;
>>>>>       int rw;
>>>>>       if (!req->file || !file_can_poll(req->file))
>>>>> @@ -5388,6 +5395,8 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>>>>           rw = WRITE;
>>>>>           mask |= POLLOUT | POLLWRNORM;
>>>>>       }
>>>>> +    if (!(req->flags & REQ_F_APOLL_MULTISHOT))
>>>>> +        mask |= EPOLLONESHOT;
>>>>>       /* if we can't nonblock try, then no point in arming a poll handler */
>>>>>       if (!io_file_supports_nowait(req, rw))
>>>>>
>>>>
>>
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-08 12:03         ` Pavel Begunkov
  2021-09-08 13:13           ` Pavel Begunkov
@ 2021-09-09  7:01           ` Hao Xu
  2021-09-09  8:29             ` Hao Xu
  1 sibling, 1 reply; 46+ messages in thread
From: Hao Xu @ 2021-09-09  7:01 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/9/8 下午8:03, Pavel Begunkov 写道:
> On 9/8/21 12:21 PM, Hao Xu wrote:
>> 在 2021/9/7 下午2:48, Hao Xu 写道:
>>> 在 2021/9/7 上午3:04, Pavel Begunkov 写道:
>>>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>>>> For operations like accept, multishot is a useful feature, since we can
>>>>> reduce a number of accept sqe. Let's integrate it to fast poll, it may
>>>>> be good for other operations in the future.
>>>>>
>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>> ---
>>>>>    fs/io_uring.c | 15 ++++++++++++---
>>>>>    1 file changed, 12 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index d6df60c4cdb9..dae7044e0c24 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -5277,8 +5277,15 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>>>>>            return;
>>>>>        }
>>>>> -    hash_del(&req->hash_node);
>>>>> -    io_poll_remove_double(req);
>>>>> +    if (READ_ONCE(apoll->poll.canceled))
>>>>> +        apoll->poll.events |= EPOLLONESHOT;
>>>>> +    if (apoll->poll.events & EPOLLONESHOT) {
>>>>> +        hash_del(&req->hash_node);
>>>>> +        io_poll_remove_double(req);
>>>>> +    } else {
>>>>> +        add_wait_queue(apoll->poll.head, &apoll->poll.wait);
>>>>
>>>> It looks like it does both io_req_task_submit() and adding back
>>>> to the wq, so io_issue_sqe() may be called in parallel with
>>>> io_async_task_func(). If so, there will be tons of all kind of
>>>> races.
>>> IMHO, io_async_task_func() is called in original context one by
>>> one(except PF_EXITING is set, it is also called in system-wq), so
>>> shouldn't be parallel case there.
>> ping...
> 
> fwiw, the case we're talking about:
> 
> CPU0                            | CPU1
> io_async_task_func()            |
> -> add_wait_queue();            |
> -> io_req_task_submit();        |
>                 /* no tw run happened in between */
>                                  | io_async_task_func()
>                                  | --> io_req_task_submit()
> 
> We called io_req_task_submit() twice without running tw in-between,
> both of the calls use the same req->io_task_work.node field in the
> request for accounting, and so the second call will screw
> tctx->task_list and not only by not considering that
> req->io_task_work.node is already taken/enqueued.
> 
> io_req_task_work_add() {
>          wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
> }
> 
I guess you mean io_req_task_work_add() called by async_wake() two times:
io_async_task_func()
-> add_wait_queue()
                             async_wake()
                             ->io_req_task_work_add()
                             this one mess up the running task_work list
                             since req->io_task_work.node is in use.

It seems the current poll_add + multishot logic has this issue too, I'll
give it a shot(simply clean req->io_task_work.node before running
req->io_task_work.func should work)
>>>>
>>>>> +    }
>>>>> +
>>>>>        spin_unlock(&ctx->completion_lock);
>>>>>        if (!READ_ONCE(apoll->poll.canceled))
>>>>> @@ -5366,7 +5373,7 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>>>>        struct io_ring_ctx *ctx = req->ctx;
>>>>>        struct async_poll *apoll;
>>>>>        struct io_poll_table ipt;
>>>>> -    __poll_t ret, mask = EPOLLONESHOT | POLLERR | POLLPRI;
>>>>> +    __poll_t ret, mask = POLLERR | POLLPRI;
>>>>>        int rw;
>>>>>        if (!req->file || !file_can_poll(req->file))
>>>>> @@ -5388,6 +5395,8 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>>>>            rw = WRITE;
>>>>>            mask |= POLLOUT | POLLWRNORM;
>>>>>        }
>>>>> +    if (!(req->flags & REQ_F_APOLL_MULTISHOT))
>>>>> +        mask |= EPOLLONESHOT;
>>>>>        /* if we can't nonblock try, then no point in arming a poll handler */
>>>>>        if (!io_file_supports_nowait(req, rw))
>>>>>
>>>>
>>
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-09  7:01           ` Hao Xu
@ 2021-09-09  8:29             ` Hao Xu
  2021-09-11 10:49               ` Pavel Begunkov
  0 siblings, 1 reply; 46+ messages in thread
From: Hao Xu @ 2021-09-09  8:29 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/9/9 下午3:01, Hao Xu 写道:
> 在 2021/9/8 下午8:03, Pavel Begunkov 写道:
>> On 9/8/21 12:21 PM, Hao Xu wrote:
>>> 在 2021/9/7 下午2:48, Hao Xu 写道:
>>>> 在 2021/9/7 上午3:04, Pavel Begunkov 写道:
>>>>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>>>>> For operations like accept, multishot is a useful feature, since 
>>>>>> we can
>>>>>> reduce a number of accept sqe. Let's integrate it to fast poll, it 
>>>>>> may
>>>>>> be good for other operations in the future.
>>>>>>
>>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>>> ---
>>>>>>    fs/io_uring.c | 15 ++++++++++++---
>>>>>>    1 file changed, 12 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>> index d6df60c4cdb9..dae7044e0c24 100644
>>>>>> --- a/fs/io_uring.c
>>>>>> +++ b/fs/io_uring.c
>>>>>> @@ -5277,8 +5277,15 @@ static void io_async_task_func(struct 
>>>>>> io_kiocb *req, bool *locked)
>>>>>>            return;
>>>>>>        }
>>>>>> -    hash_del(&req->hash_node);
>>>>>> -    io_poll_remove_double(req);
>>>>>> +    if (READ_ONCE(apoll->poll.canceled))
>>>>>> +        apoll->poll.events |= EPOLLONESHOT;
>>>>>> +    if (apoll->poll.events & EPOLLONESHOT) {
>>>>>> +        hash_del(&req->hash_node);
>>>>>> +        io_poll_remove_double(req);
>>>>>> +    } else {
>>>>>> +        add_wait_queue(apoll->poll.head, &apoll->poll.wait);
>>>>>
>>>>> It looks like it does both io_req_task_submit() and adding back
>>>>> to the wq, so io_issue_sqe() may be called in parallel with
>>>>> io_async_task_func(). If so, there will be tons of all kind of
>>>>> races.
>>>> IMHO, io_async_task_func() is called in original context one by
>>>> one(except PF_EXITING is set, it is also called in system-wq), so
>>>> shouldn't be parallel case there.
>>> ping...
>>
>> fwiw, the case we're talking about:
>>
>> CPU0                            | CPU1
>> io_async_task_func()            |
>> -> add_wait_queue();            |
>> -> io_req_task_submit();        |
>>                 /* no tw run happened in between */
>>                                  | io_async_task_func()
>>                                  | --> io_req_task_submit()
>>
>> We called io_req_task_submit() twice without running tw in-between,
>> both of the calls use the same req->io_task_work.node field in the
>> request for accounting, and so the second call will screw
>> tctx->task_list and not only by not considering that
>> req->io_task_work.node is already taken/enqueued.
>>
>> io_req_task_work_add() {
>>          wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
>> }
>>
> I guess you mean io_req_task_work_add() called by async_wake() two times:
> io_async_task_func()
> -> add_wait_queue()
>                              async_wake()
>                              ->io_req_task_work_add()
>                              this one mess up the running task_work list
>                              since req->io_task_work.node is in use.
> 
> It seems the current poll_add + multishot logic has this issue too, I'll
> give it a shot(simply clean req->io_task_work.node before running
> req->io_task_work.func should work)
Similar issue for double wait entry since we didn't remove double entry
in interrupt handler:
async_wake() --> io_req_task_work_add()
io_poll_double_wake()-->async_wake()-->io_req_task_work_add()

>>>>>
>>>>>> +    }
>>>>>> +
>>>>>>        spin_unlock(&ctx->completion_lock);
>>>>>>        if (!READ_ONCE(apoll->poll.canceled))
>>>>>> @@ -5366,7 +5373,7 @@ static int io_arm_poll_handler(struct 
>>>>>> io_kiocb *req)
>>>>>>        struct io_ring_ctx *ctx = req->ctx;
>>>>>>        struct async_poll *apoll;
>>>>>>        struct io_poll_table ipt;
>>>>>> -    __poll_t ret, mask = EPOLLONESHOT | POLLERR | POLLPRI;
>>>>>> +    __poll_t ret, mask = POLLERR | POLLPRI;
>>>>>>        int rw;
>>>>>>        if (!req->file || !file_can_poll(req->file))
>>>>>> @@ -5388,6 +5395,8 @@ static int io_arm_poll_handler(struct 
>>>>>> io_kiocb *req)
>>>>>>            rw = WRITE;
>>>>>>            mask |= POLLOUT | POLLWRNORM;
>>>>>>        }
>>>>>> +    if (!(req->flags & REQ_F_APOLL_MULTISHOT))
>>>>>> +        mask |= EPOLLONESHOT;
>>>>>>        /* if we can't nonblock try, then no point in arming a poll 
>>>>>> handler */
>>>>>>        if (!io_file_supports_nowait(req, rw))
>>>>>>
>>>>>
>>>
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-09  8:29             ` Hao Xu
@ 2021-09-11 10:49               ` Pavel Begunkov
  2021-09-11 20:19                 ` Hao Xu
  0 siblings, 1 reply; 46+ messages in thread
From: Pavel Begunkov @ 2021-09-11 10:49 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 9/9/21 9:29 AM, Hao Xu wrote:
> 在 2021/9/9 下午3:01, Hao Xu 写道:
>> 在 2021/9/8 下午8:03, Pavel Begunkov 写道:
>>> On 9/8/21 12:21 PM, Hao Xu wrote:
>>>> 在 2021/9/7 下午2:48, Hao Xu 写道:
>>>>> 在 2021/9/7 上午3:04, Pavel Begunkov 写道:
>>>>>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>>>>>> For operations like accept, multishot is a useful feature, since we can
>>>>>>> reduce a number of accept sqe. Let's integrate it to fast poll, it may
>>>>>>> be good for other operations in the future.
>>>>>>>
>>>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>>>> ---
>>>>>>>    fs/io_uring.c | 15 ++++++++++++---
>>>>>>>    1 file changed, 12 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>> index d6df60c4cdb9..dae7044e0c24 100644
>>>>>>> --- a/fs/io_uring.c
>>>>>>> +++ b/fs/io_uring.c
>>>>>>> @@ -5277,8 +5277,15 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>>>>>>>            return;
>>>>>>>        }
>>>>>>> -    hash_del(&req->hash_node);
>>>>>>> -    io_poll_remove_double(req);
>>>>>>> +    if (READ_ONCE(apoll->poll.canceled))
>>>>>>> +        apoll->poll.events |= EPOLLONESHOT;
>>>>>>> +    if (apoll->poll.events & EPOLLONESHOT) {
>>>>>>> +        hash_del(&req->hash_node);
>>>>>>> +        io_poll_remove_double(req);
>>>>>>> +    } else {
>>>>>>> +        add_wait_queue(apoll->poll.head, &apoll->poll.wait);
>>>>>>
>>>>>> It looks like it does both io_req_task_submit() and adding back
>>>>>> to the wq, so io_issue_sqe() may be called in parallel with
>>>>>> io_async_task_func(). If so, there will be tons of all kind of
>>>>>> races.
>>>>> IMHO, io_async_task_func() is called in original context one by
>>>>> one(except PF_EXITING is set, it is also called in system-wq), so
>>>>> shouldn't be parallel case there.
>>>> ping...
>>>
>>> fwiw, the case we're talking about:
>>>
>>> CPU0                            | CPU1
>>> io_async_task_func()            |
>>> -> add_wait_queue();            |
>>> -> io_req_task_submit();        |
>>>                 /* no tw run happened in between */
>>>                                  | io_async_task_func()
>>>                                  | --> io_req_task_submit()
>>>
>>> We called io_req_task_submit() twice without running tw in-between,
>>> both of the calls use the same req->io_task_work.node field in the
>>> request for accounting, and so the second call will screw
>>> tctx->task_list and not only by not considering that
>>> req->io_task_work.node is already taken/enqueued.
>>>
>>> io_req_task_work_add() {
>>>          wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
>>> }
>>>
>> I guess you mean io_req_task_work_add() called by async_wake() two times:
>> io_async_task_func()
>> -> add_wait_queue()
>>                              async_wake()
>>                              ->io_req_task_work_add()
>>                              this one mess up the running task_work list
>>                              since req->io_task_work.node is in use.
>>
>> It seems the current poll_add + multishot logic has this issue too, I'll
>> give it a shot(simply clean req->io_task_work.node before running
>> req->io_task_work.func should work)
> Similar issue for double wait entry since we didn't remove double entry
> in interrupt handler:

Yep, sounds like that. Polling needs reworking, and not only
because of this one.


> async_wake() --> io_req_task_work_add()
> io_poll_double_wake()-->async_wake()-->io_req_task_work_add()
> 
>>>>>>
>>>>>>> +    }
>>>>>>> +
>>>>>>>        spin_unlock(&ctx->completion_lock);
>>>>>>>        if (!READ_ONCE(apoll->poll.canceled))
>>>>>>> @@ -5366,7 +5373,7 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>>>>>>        struct io_ring_ctx *ctx = req->ctx;
>>>>>>>        struct async_poll *apoll;
>>>>>>>        struct io_poll_table ipt;
>>>>>>> -    __poll_t ret, mask = EPOLLONESHOT | POLLERR | POLLPRI;
>>>>>>> +    __poll_t ret, mask = POLLERR | POLLPRI;
>>>>>>>        int rw;
>>>>>>>        if (!req->file || !file_can_poll(req->file))
>>>>>>> @@ -5388,6 +5395,8 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>>>>>>            rw = WRITE;
>>>>>>>            mask |= POLLOUT | POLLWRNORM;
>>>>>>>        }
>>>>>>> +    if (!(req->flags & REQ_F_APOLL_MULTISHOT))
>>>>>>> +        mask |= EPOLLONESHOT;
>>>>>>>        /* if we can't nonblock try, then no point in arming a poll handler */
>>>>>>>        if (!io_file_supports_nowait(req, rw))
>>>>>>>
>>>>>>
>>>>
>>>
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 4/6] io_uring: let fast poll support multishot
  2021-09-11 10:49               ` Pavel Begunkov
@ 2021-09-11 20:19                 ` Hao Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-11 20:19 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/9/11 下午6:49, Pavel Begunkov 写道:
> On 9/9/21 9:29 AM, Hao Xu wrote:
>> 在 2021/9/9 下午3:01, Hao Xu 写道:
>>> 在 2021/9/8 下午8:03, Pavel Begunkov 写道:
>>>> On 9/8/21 12:21 PM, Hao Xu wrote:
>>>>> 在 2021/9/7 下午2:48, Hao Xu 写道:
>>>>>> 在 2021/9/7 上午3:04, Pavel Begunkov 写道:
>>>>>>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>>>>>>> For operations like accept, multishot is a useful feature, since we can
>>>>>>>> reduce a number of accept sqe. Let's integrate it to fast poll, it may
>>>>>>>> be good for other operations in the future.
>>>>>>>>
>>>>>>>> Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
>>>>>>>> ---
>>>>>>>>     fs/io_uring.c | 15 ++++++++++++---
>>>>>>>>     1 file changed, 12 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>>> index d6df60c4cdb9..dae7044e0c24 100644
>>>>>>>> --- a/fs/io_uring.c
>>>>>>>> +++ b/fs/io_uring.c
>>>>>>>> @@ -5277,8 +5277,15 @@ static void io_async_task_func(struct io_kiocb *req, bool *locked)
>>>>>>>>             return;
>>>>>>>>         }
>>>>>>>> -    hash_del(&req->hash_node);
>>>>>>>> -    io_poll_remove_double(req);
>>>>>>>> +    if (READ_ONCE(apoll->poll.canceled))
>>>>>>>> +        apoll->poll.events |= EPOLLONESHOT;
>>>>>>>> +    if (apoll->poll.events & EPOLLONESHOT) {
>>>>>>>> +        hash_del(&req->hash_node);
>>>>>>>> +        io_poll_remove_double(req);
>>>>>>>> +    } else {
>>>>>>>> +        add_wait_queue(apoll->poll.head, &apoll->poll.wait);
>>>>>>>
>>>>>>> It looks like it does both io_req_task_submit() and adding back
>>>>>>> to the wq, so io_issue_sqe() may be called in parallel with
>>>>>>> io_async_task_func(). If so, there will be tons of all kind of
>>>>>>> races.
>>>>>> IMHO, io_async_task_func() is called in original context one by
>>>>>> one(except PF_EXITING is set, it is also called in system-wq), so
>>>>>> shouldn't be parallel case there.
>>>>> ping...
>>>>
>>>> fwiw, the case we're talking about:
>>>>
>>>> CPU0                            | CPU1
>>>> io_async_task_func()            |
>>>> -> add_wait_queue();            |
>>>> -> io_req_task_submit();        |
>>>>                  /* no tw run happened in between */
>>>>                                   | io_async_task_func()
>>>>                                   | --> io_req_task_submit()
>>>>
>>>> We called io_req_task_submit() twice without running tw in-between,
>>>> both of the calls use the same req->io_task_work.node field in the
>>>> request for accounting, and so the second call will screw
>>>> tctx->task_list and not only by not considering that
>>>> req->io_task_work.node is already taken/enqueued.
>>>>
>>>> io_req_task_work_add() {
>>>>           wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
>>>> }
>>>>
>>> I guess you mean io_req_task_work_add() called by async_wake() two times:
>>> io_async_task_func()
>>> -> add_wait_queue()
>>>                               async_wake()
>>>                               ->io_req_task_work_add()
>>>                               this one mess up the running task_work list
>>>                               since req->io_task_work.node is in use.
>>>
>>> It seems the current poll_add + multishot logic has this issue too, I'll
>>> give it a shot(simply clean req->io_task_work.node before running
>>> req->io_task_work.func should work)
>> Similar issue for double wait entry since we didn't remove double entry
>> in interrupt handler:
> 
> Yep, sounds like that. Polling needs reworking, and not only
> because of this one.
Currently I'm working on these type of poll issues, will send something
soon.
> 
> 
>> async_wake() --> io_req_task_work_add()
>> io_poll_double_wake()-->async_wake()-->io_req_task_work_add()
>>
>>>>>>>
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>         spin_unlock(&ctx->completion_lock);
>>>>>>>>         if (!READ_ONCE(apoll->poll.canceled))
>>>>>>>> @@ -5366,7 +5373,7 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>>>>>>>         struct io_ring_ctx *ctx = req->ctx;
>>>>>>>>         struct async_poll *apoll;
>>>>>>>>         struct io_poll_table ipt;
>>>>>>>> -    __poll_t ret, mask = EPOLLONESHOT | POLLERR | POLLPRI;
>>>>>>>> +    __poll_t ret, mask = POLLERR | POLLPRI;
>>>>>>>>         int rw;
>>>>>>>>         if (!req->file || !file_can_poll(req->file))
>>>>>>>> @@ -5388,6 +5395,8 @@ static int io_arm_poll_handler(struct io_kiocb *req)
>>>>>>>>             rw = WRITE;
>>>>>>>>             mask |= POLLOUT | POLLWRNORM;
>>>>>>>>         }
>>>>>>>> +    if (!(req->flags & REQ_F_APOLL_MULTISHOT))
>>>>>>>> +        mask |= EPOLLONESHOT;
>>>>>>>>         /* if we can't nonblock try, then no point in arming a poll handler */
>>>>>>>>         if (!io_file_supports_nowait(req, rw))
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/6] io_uring: enhance flush completion logic
  2021-09-03 13:38         ` Hao Xu
@ 2021-09-17 18:49           ` Hao Xu
  0 siblings, 0 replies; 46+ messages in thread
From: Hao Xu @ 2021-09-17 18:49 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

在 2021/9/3 下午9:38, Hao Xu 写道:
> 在 2021/9/3 下午8:27, Pavel Begunkov 写道:
>> On 9/3/21 1:08 PM, Hao Xu wrote:
>>> 在 2021/9/3 下午7:42, Pavel Begunkov 写道:
>>>> On 9/3/21 12:00 PM, Hao Xu wrote:
>>>>> Though currently refcount of a req is always one when we flush inline
>>>>
>>>> It can be refcounted and != 1. E.g. poll requests, or consider
>>> It seems poll requests don't leverage comp cache, do I miss anything?
>>
>> Hmm, looks so. Not great that it doesn't, but probably it's because
>> of trying to submit next reqs right in io_poll_task_func().
>>
>> I'll be pushing for some changes around tw, with it should be easy
>> to hook poll completion batching with no drawbacks. Would be great
>> if you will be willing to take a shot on it.
> Sure, I'll take a look.
>>
Tried to integrate poll logic to completion cache, I did test it with an
echo-server program, both multishot and singleshot mode, not difference
on req/s.

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2021-09-17 18:49 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-03 11:00 [RFC 0/6] fast poll multishot mode Hao Xu
2021-09-03 11:00 ` [PATCH 1/6] io_uring: enhance flush completion logic Hao Xu
2021-09-03 11:42   ` Pavel Begunkov
2021-09-03 12:08     ` Hao Xu
2021-09-03 12:27       ` Pavel Begunkov
2021-09-03 13:38         ` Hao Xu
2021-09-17 18:49           ` Hao Xu
2021-09-03 11:00 ` [PATCH 2/6] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
2021-09-03 11:00 ` [PATCH 3/6] io_uring: add REQ_F_APOLL_MULTISHOT for requests Hao Xu
2021-09-03 11:00 ` [PATCH 4/6] io_uring: let fast poll support multishot Hao Xu
2021-09-06 15:56   ` Pavel Begunkov
2021-09-06 17:40     ` Hao Xu
2021-09-06 19:09       ` Pavel Begunkov
2021-09-07  6:38         ` Hao Xu
2021-09-06 19:04   ` Pavel Begunkov
2021-09-07  6:48     ` Hao Xu
2021-09-08 11:21       ` Hao Xu
2021-09-08 12:03         ` Pavel Begunkov
2021-09-08 13:13           ` Pavel Begunkov
2021-09-09  7:01           ` Hao Xu
2021-09-09  8:29             ` Hao Xu
2021-09-11 10:49               ` Pavel Begunkov
2021-09-11 20:19                 ` Hao Xu
2021-09-03 11:00 ` [PATCH 5/6] io_uring: implement multishot mode for accept Hao Xu
2021-09-04 22:39   ` Pavel Begunkov
2021-09-04 22:40     ` Pavel Begunkov
2021-09-06 15:34       ` Pavel Begunkov
2021-09-03 11:00 ` [PATCH 6/6] io_uring: enable " Hao Xu
2021-09-03 16:29   ` Jens Axboe
2021-09-04 15:34     ` Hao Xu
2021-09-04 18:40       ` Jens Axboe
2021-09-04 22:46         ` Pavel Begunkov
2021-09-05  7:29           ` Hao Xu
2021-09-05 19:44           ` Jens Axboe
2021-09-06  8:26             ` Hao Xu
2021-09-06  8:28               ` Hao Xu
2021-09-06 13:24               ` Jens Axboe
2021-09-06 12:35             ` Hao Xu
2021-09-06 13:31               ` Jens Axboe
2021-09-06 15:00                 ` Hao Xu
2021-09-06 15:32               ` Pavel Begunkov
2021-09-06 16:42                 ` Jens Axboe
2021-09-04 22:43   ` Pavel Begunkov
2021-09-05  6:25     ` Hao Xu
2021-09-05  8:27       ` Pavel Begunkov
2021-09-03 11:02 ` [RFC 0/6] fast poll multishot mode Hao Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).