* [PATCH liburing v2 00/12] Defer taskrun changes
@ 2022-09-01  9:32 Dylan Yudaken
  2022-09-01  9:32 ` [PATCH liburing v2 01/12] Copy defer task run definition from kernel Dylan Yudaken
                   ` (11 more replies)
  0 siblings, 12 replies; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01  9:32 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

This series adds support to liburing for the IORING_SETUP_DEFER_TASKRUN flag.

This flag needs a couple of new API calls so that users that are polling the
io_uring fd (or a registered eventfd) can force a call to get events.
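
As a purely illustrative sketch (epfd, ring and handle_cqe are placeholders,
not code from this series), such a polling user might drive completions like
this:

	struct epoll_event ev;
	struct io_uring_cqe *cqe;

	if (epoll_wait(epfd, &ev, 1, -1) == 1) {
		/* run deferred task work and flush completions to the CQ */
		io_uring_get_events(&ring);
		while (io_uring_peek_cqe(&ring, &cqe) == 0) {
			handle_cqe(cqe);
			io_uring_cqe_seen(&ring, cqe);
		}
	}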

The second half of the series is a bit mixed and includes some documentation
fixes, overflow cleanups and test cleanups. I sent these a couple of months
ago and forgot about them, but they now depend on the new API and so need to
be ordered after it.
I can send them separately if you like.

Patches:

1 copies the definition from the kernel include file
2 introduces new APIs required for this feature
3/4/5 add tests for IORING_SETUP_DEFER_TASKRUN

6/7/8 clean and update existing documentation to match upstream
9 exposes the overflow state to the application
10 uses this and tests overflow functionality
11 gives an explicit warning if there is a short read in file-verify
12 is an unrelated fix to a flaky test

Changes since v1:
 - update tests to require IORING_SETUP_SINGLE_ISSUER
 - add docs for IORING_SETUP_DEFER_TASKRUN
 - add shutdown test

Dylan Yudaken (12):
  Copy defer task run definition from kernel
  Add documentation for IORING_SETUP_DEFER_TASKRUN flag
  add io_uring_submit_and_get_events and io_uring_get_events
  add a t_probe_defer_taskrun helper function for tests
  update existing tests for defer taskrun
  add a defer-taskrun test
  update io_uring_enter.2 docs for IORING_FEAT_NODROP
  add docs for overflow lost errors
  expose CQ ring overflow state
  overflow: add tests
  file-verify test: log if short read
  shutdown test: bind to ephemeral port

 man/io_uring_enter.2            |  24 ++-
 man/io_uring_setup.2            |  30 ++-
 src/include/liburing.h          |  11 ++
 src/include/liburing/io_uring.h |   7 +
 src/queue.c                     |  26 ++-
 test/Makefile                   |   1 +
 test/cq-overflow.c              | 243 ++++++++++++++++++++++-
 test/defer-taskrun.c            | 333 ++++++++++++++++++++++++++++++++
 test/eventfd-disable.c          |  33 +++-
 test/file-verify.c              |   4 +
 test/helpers.c                  |  17 +-
 test/helpers.h                  |   2 +
 test/iopoll.c                   |  17 +-
 test/multicqes_drain.c          |  50 ++++-
 test/poll-mshot-overflow.c      |  40 +++-
 test/recv-multishot.c           |  33 ++--
 test/rsrc_tags.c                |  10 +-
 test/shutdown.c                 |   7 +-
 18 files changed, 836 insertions(+), 52 deletions(-)
 create mode 100644 test/defer-taskrun.c


base-commit: a71d56ef3259216739677473ddb17ad861c3a964
-- 
2.30.2


* [PATCH liburing v2 01/12] Copy defer task run definition from kernel
  2022-09-01  9:32 [PATCH liburing v2 00/12] Defer taskrun changes Dylan Yudaken
@ 2022-09-01  9:32 ` Dylan Yudaken
  2022-09-01  9:32 ` [PATCH liburing v2 02/12] Add documentation for IORING_SETUP_DEFER_TASKRUN flag Dylan Yudaken
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01  9:32 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Copy the flag from upstream

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 src/include/liburing/io_uring.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/include/liburing/io_uring.h b/src/include/liburing/io_uring.h
index 9e0b5c8d92ce..48e5c70e0baf 100644
--- a/src/include/liburing/io_uring.h
+++ b/src/include/liburing/io_uring.h
@@ -157,6 +157,13 @@ enum {
  */
 #define IORING_SETUP_SINGLE_ISSUER	(1U << 12)
 
+/*
+ * Defer running task work to get events.
+ * Rather than running bits of task work whenever the task transitions
+ * try to do it just before it is needed.
+ */
+#define IORING_SETUP_DEFER_TASKRUN	(1U << 13)
+
 enum io_uring_op {
 	IORING_OP_NOP,
 	IORING_OP_READV,
-- 
2.30.2


* [PATCH liburing v2 02/12] Add documentation for IORING_SETUP_DEFER_TASKRUN flag
  2022-09-01  9:32 [PATCH liburing v2 00/12] Defer taskrun changes Dylan Yudaken
  2022-09-01  9:32 ` [PATCH liburing v2 01/12] Copy defer task run definition from kernel Dylan Yudaken
@ 2022-09-01  9:32 ` Dylan Yudaken
  2022-09-01  9:32 ` [PATCH liburing v2 03/12] add io_uring_submit_and_get_events and io_uring_get_events Dylan Yudaken
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01  9:32 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Add man page entry to io_uring_setup.2 for the new flag

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 man/io_uring_setup.2 | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/man/io_uring_setup.2 b/man/io_uring_setup.2
index 32a9e2ee89b5..01eb70d95292 100644
--- a/man/io_uring_setup.2
+++ b/man/io_uring_setup.2
@@ -257,6 +257,25 @@ how many userspace tasks do
 .I
 io_uring_enter(2).
 Available since 5.20.
+.TP
+.B IORING_SETUP_DEFER_TASKRUN
+By default, io_uring will process all outstanding work at the end of any system
+call or thread interrupt. This can delay the application from making other progress.
+Setting this flag will hint to io_uring that it should defer work until an
+.BR io_uring_enter (2)
+call with the
+.B IORING_ENTER_GETEVENTS
+flag set. This allows the application to request work to run just before it
+wants to process completions.
+This flag requires the
+.B IORING_SETUP_SINGLE_ISSUER
+flag to be set, and also enforces that
+.BR io_uring_enter (2)
+is called from the same thread that submitted requests.
+Note that if this flag is set, it is the application's responsibility to
+periodically trigger work (for example via any of the CQE waiting functions),
+or else completions may not be delivered.
+Available since 6.1.
 .PP
 If no flags are specified, the io_uring instance is setup for
 interrupt driven I/O.  I/O may be submitted using
-- 
2.30.2


* [PATCH liburing v2 03/12] add io_uring_submit_and_get_events and io_uring_get_events
  2022-09-01  9:32 [PATCH liburing v2 00/12] Defer taskrun changes Dylan Yudaken
  2022-09-01  9:32 ` [PATCH liburing v2 01/12] Copy defer task run definition from kernel Dylan Yudaken
  2022-09-01  9:32 ` [PATCH liburing v2 02/12] Add documentation for IORING_SETUP_DEFER_TASKRUN flag Dylan Yudaken
@ 2022-09-01  9:32 ` Dylan Yudaken
  2022-09-01 18:36   ` Jens Axboe
  2022-09-01  9:32 ` [PATCH liburing v2 04/12] add a t_probe_defer_taskrun helper function for tests Dylan Yudaken
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01  9:32 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

With deferred task running, we would like to be able to combine submit
with get events (regardless of whether there are CQEs available), or if
there is nothing to submit, simply do an enter with IORING_ENTER_GETEVENTS
set in order to process any available work.

Expose these APIs.
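
As an illustrative sketch only (not part of this patch; ring and ret are
assumed to exist), a caller might use them like this:

	/* submit pending SQEs and run deferred task work in one enter */
	ret = io_uring_submit_and_get_events(&ring);
	if (ret < 0)
		return ret;

	/* later, with nothing new to submit, just process outstanding work */
	ret = io_uring_get_events(&ring);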

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 src/include/liburing.h |  2 ++
 src/queue.c            | 26 ++++++++++++++++++--------
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/src/include/liburing.h b/src/include/liburing.h
index 6e868472b77a..3c5097b255de 100644
--- a/src/include/liburing.h
+++ b/src/include/liburing.h
@@ -202,6 +202,8 @@ int io_uring_register_file_alloc_range(struct io_uring *ring,
 int io_uring_register_notifications(struct io_uring *ring, unsigned nr,
 				    struct io_uring_notification_slot *slots);
 int io_uring_unregister_notifications(struct io_uring *ring);
+int io_uring_get_events(struct io_uring *ring);
+int io_uring_submit_and_get_events(struct io_uring *ring);
 
 /*
  * io_uring syscalls.
diff --git a/src/queue.c b/src/queue.c
index a670a8ecd20d..b012a3dd950b 100644
--- a/src/queue.c
+++ b/src/queue.c
@@ -130,6 +130,15 @@ int __io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr,
 	return _io_uring_get_cqe(ring, cqe_ptr, &data);
 }
 
+int io_uring_get_events(struct io_uring *ring)
+{
+	int flags = IORING_ENTER_GETEVENTS;
+
+	if (ring->int_flags & INT_FLAG_REG_RING)
+		flags |= IORING_ENTER_REGISTERED_RING;
+	return __sys_io_uring_enter(ring->enter_ring_fd, 0, 0, flags, NULL);
+}
+
 /*
  * Fill in an array of IO completions up to count, if any are available.
  * Returns the amount of IO completions filled.
@@ -164,11 +173,7 @@ again:
 		return 0;
 
 	if (cq_ring_needs_flush(ring)) {
-		int flags = IORING_ENTER_GETEVENTS;
-
-		if (ring->int_flags & INT_FLAG_REG_RING)
-			flags |= IORING_ENTER_REGISTERED_RING;
-		__sys_io_uring_enter(ring->enter_ring_fd, 0, 0, flags, NULL);
+		io_uring_get_events(ring);
 		overflow_checked = true;
 		goto again;
 	}
@@ -340,9 +345,9 @@ int io_uring_wait_cqe_timeout(struct io_uring *ring,
  * Returns number of sqes submitted
  */
 static int __io_uring_submit(struct io_uring *ring, unsigned submitted,
-			     unsigned wait_nr)
+			     unsigned wait_nr, bool getevents)
 {
-	bool cq_needs_enter = wait_nr || cq_ring_needs_enter(ring);
+	bool cq_needs_enter = getevents || wait_nr || cq_ring_needs_enter(ring);
 	unsigned flags;
 	int ret;
 
@@ -363,7 +368,7 @@ static int __io_uring_submit(struct io_uring *ring, unsigned submitted,
 
 static int __io_uring_submit_and_wait(struct io_uring *ring, unsigned wait_nr)
 {
-	return __io_uring_submit(ring, __io_uring_flush_sq(ring), wait_nr);
+	return __io_uring_submit(ring, __io_uring_flush_sq(ring), wait_nr, false);
 }
 
 /*
@@ -386,6 +391,11 @@ int io_uring_submit_and_wait(struct io_uring *ring, unsigned wait_nr)
 	return __io_uring_submit_and_wait(ring, wait_nr);
 }
 
+int io_uring_submit_and_get_events(struct io_uring *ring)
+{
+	return __io_uring_submit(ring, __io_uring_flush_sq(ring), 0, true);
+}
+
 #ifdef LIBURING_INTERNAL
 struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
 {
-- 
2.30.2


* [PATCH liburing v2 04/12] add a t_probe_defer_taskrun helper function for tests
  2022-09-01  9:32 [PATCH liburing v2 00/12] Defer taskrun changes Dylan Yudaken
                   ` (2 preceding siblings ...)
  2022-09-01  9:32 ` [PATCH liburing v2 03/12] add io_uring_submit_and_get_events and io_uring_get_events Dylan Yudaken
@ 2022-09-01  9:32 ` Dylan Yudaken
  2022-09-01  9:32 ` [PATCH liburing v2 05/12] update existing tests for defer taskrun Dylan Yudaken
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01  9:32 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Help tests determine whether they can use IORING_SETUP_DEFER_TASKRUN.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 test/helpers.c | 17 +++++++++++++++--
 test/helpers.h |  2 ++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/test/helpers.c b/test/helpers.c
index 014653313f41..80b75f4fed99 100644
--- a/test/helpers.c
+++ b/test/helpers.c
@@ -57,7 +57,7 @@ static void __t_create_file(const char *file, size_t size, char pattern)
 {
 	ssize_t ret;
 	char *buf;
-	int fd; 
+	int fd;
 
 	buf = t_malloc(size);
 	memset(buf, pattern, size);
@@ -94,7 +94,7 @@ struct iovec *t_create_buffers(size_t buf_num, size_t buf_size)
 	vecs = t_malloc(buf_num * sizeof(struct iovec));
 	for (i = 0; i < buf_num; i++) {
 		t_posix_memalign(&vecs[i].iov_base, buf_size, buf_size);
-		vecs[i].iov_len = buf_size; 
+		vecs[i].iov_len = buf_size;
 	}
 	return vecs;
 }
@@ -235,3 +235,16 @@ errno_cleanup:
 	close(fd[1]);
 	return ret;
 }
+
+bool t_probe_defer_taskrun(void)
+{
+	struct io_uring ring;
+	int ret;
+
+	ret = io_uring_queue_init(1, &ring, IORING_SETUP_SINGLE_ISSUER |
+					    IORING_SETUP_DEFER_TASKRUN);
+	if (ret < 0)
+		return false;
+	io_uring_queue_exit(&ring);
+	return true;
+}
diff --git a/test/helpers.h b/test/helpers.h
index 6d5726c9deb6..efce4b344f87 100644
--- a/test/helpers.h
+++ b/test/helpers.h
@@ -75,6 +75,8 @@ enum t_setup_ret t_register_buffers(struct io_uring *ring,
 				    const struct iovec *iovecs,
 				    unsigned nr_iovecs);
 
+bool t_probe_defer_taskrun(void);
+
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 
 #ifdef __cplusplus
-- 
2.30.2


* [PATCH liburing v2 05/12] update existing tests for defer taskrun
  2022-09-01  9:32 [PATCH liburing v2 00/12] Defer taskrun changes Dylan Yudaken
                   ` (3 preceding siblings ...)
  2022-09-01  9:32 ` [PATCH liburing v2 04/12] add a t_probe_defer_taskrun helper function for tests Dylan Yudaken
@ 2022-09-01  9:32 ` Dylan Yudaken
  2022-09-01  9:32 ` [PATCH liburing v2 06/12] add a defer-taskrun test Dylan Yudaken
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01  9:32 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Add defer_taskrun to a few choice tests that can expose some bad
behaviour.
This requires adding some io_uring_get_events calls to make sure deferred
task work is run.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 test/eventfd-disable.c     | 33 ++++++++++++++++++++++---
 test/iopoll.c              | 17 +++++++++----
 test/multicqes_drain.c     | 50 +++++++++++++++++++++++++++++++++-----
 test/poll-mshot-overflow.c | 40 +++++++++++++++++++++++++++---
 test/recv-multishot.c      | 33 ++++++++++++++++---------
 test/rsrc_tags.c           | 10 ++++++--
 6 files changed, 152 insertions(+), 31 deletions(-)

diff --git a/test/eventfd-disable.c b/test/eventfd-disable.c
index 2c8cf6dad7c1..162f9f9bc783 100644
--- a/test/eventfd-disable.c
+++ b/test/eventfd-disable.c
@@ -15,7 +15,7 @@
 #include "liburing.h"
 #include "helpers.h"
 
-int main(int argc, char *argv[])
+static int test(bool defer)
 {
 	struct io_uring_params p = {};
 	struct io_uring_sqe *sqe;
@@ -28,8 +28,9 @@ int main(int argc, char *argv[])
 	};
 	int ret, evfd, i;
 
-	if (argc > 1)
-		return T_EXIT_SKIP;
+	if (defer)
+		p.flags |= IORING_SETUP_SINGLE_ISSUER |
+			   IORING_SETUP_DEFER_TASKRUN;
 
 	ret = io_uring_queue_init_params(64, &ring, &p);
 	if (ret) {
@@ -148,5 +149,31 @@ int main(int argc, char *argv[])
 		io_uring_cqe_seen(&ring, cqe);
 	}
 
+	io_uring_queue_exit(&ring);
+	close(evfd);
 	return T_EXIT_PASS;
 }
+
+int main(int argc, char *argv[])
+{
+	int ret;
+
+	if (argc > 1)
+		return T_EXIT_SKIP;
+
+	ret = test(false);
+	if (ret != T_EXIT_PASS) {
+		fprintf(stderr, "%s: test(false) failed\n", argv[0]);
+		return ret;
+	}
+
+	if (t_probe_defer_taskrun()) {
+		ret = test(true);
+		if (ret != T_EXIT_PASS) {
+			fprintf(stderr, "%s: test(true) failed\n", argv[0]);
+			return ret;
+		}
+	}
+
+	return ret;
+}
diff --git a/test/iopoll.c b/test/iopoll.c
index 91cb71bd2e9c..20f91c7947be 100644
--- a/test/iopoll.c
+++ b/test/iopoll.c
@@ -274,7 +274,7 @@ ok:
 }
 
 static int test_io(const char *file, int write, int sqthread, int fixed,
-		   int buf_select)
+		   int buf_select, int defer)
 {
 	struct io_uring ring;
 	int ret, ring_flags = IORING_SETUP_IOPOLL;
@@ -282,6 +282,10 @@ static int test_io(const char *file, int write, int sqthread, int fixed,
 	if (no_iopoll)
 		return 0;
 
+	if (defer)
+		ring_flags |= IORING_SETUP_SINGLE_ISSUER |
+			      IORING_SETUP_DEFER_TASKRUN;
+
 	ret = t_create_ring(64, &ring, ring_flags);
 	if (ret == T_SETUP_SKIP)
 		return 0;
@@ -337,19 +341,22 @@ int main(int argc, char *argv[])
 
 	vecs = t_create_buffers(BUFFERS, BS);
 
-	nr = 16;
+	nr = 32;
 	if (no_buf_select)
 		nr = 8;
+	else if (!t_probe_defer_taskrun())
+		nr = 16;
 	for (i = 0; i < nr; i++) {
 		int write = (i & 1) != 0;
 		int sqthread = (i & 2) != 0;
 		int fixed = (i & 4) != 0;
 		int buf_select = (i & 8) != 0;
+		int defer = (i & 16) != 0;
 
-		ret = test_io(fname, write, sqthread, fixed, buf_select);
+		ret = test_io(fname, write, sqthread, fixed, buf_select, defer);
 		if (ret) {
-			fprintf(stderr, "test_io failed %d/%d/%d/%d\n",
-				write, sqthread, fixed, buf_select);
+			fprintf(stderr, "test_io failed %d/%d/%d/%d/%d\n",
+				write, sqthread, fixed, buf_select, defer);
 			goto err;
 		}
 		if (no_iopoll)
diff --git a/test/multicqes_drain.c b/test/multicqes_drain.c
index 6cd03ba5f3f7..f95c4382b3f4 100644
--- a/test/multicqes_drain.c
+++ b/test/multicqes_drain.c
@@ -233,6 +233,8 @@ static int test_generic_drain(struct io_uring *ring)
 
 		if (trigger_event(pipes[i]))
 			goto err;
+
+		io_uring_get_events(ring);
 	}
 	sleep(1);
 	i = 0;
@@ -246,7 +248,7 @@ static int test_generic_drain(struct io_uring *ring)
 	 * compl_bits is a bit map to record completions.
 	 * eg. sqe[0], sqe[1], sqe[2] fully completed
 	 * then compl_bits is 000...00111b
-	 * 
+	 *
 	 */
 	unsigned long long compl_bits = 0;
 	for (j = 0; j < i; j++) {
@@ -295,7 +297,12 @@ static int test_simple_drain(struct io_uring *ring)
 	io_uring_prep_poll_add(sqe[1], pipe2[0], POLLIN);
 	sqe[1]->user_data = 1;
 
-	ret = io_uring_submit(ring);
+	/* This test relies on multishot poll to trigger events continually.
+	 * However, with IORING_SETUP_DEFER_TASKRUN this will only happen
+	 * when triggered with a get_events. Hence we sprinkle get_events
+	 * whenever there might be work to process, to get the same result.
+	 */
+	ret = io_uring_submit_and_get_events(ring);
 	if (ret < 0) {
 		printf("sqe submit failed\n");
 		goto err;
@@ -307,9 +314,11 @@ static int test_simple_drain(struct io_uring *ring)
 	for (i = 0; i < 2; i++) {
 		if (trigger_event(pipe1))
 			goto err;
+		io_uring_get_events(ring);
 	}
 	if (trigger_event(pipe2))
 			goto err;
+	io_uring_get_events(ring);
 
 	for (i = 0; i < 2; i++) {
 		sqe[i] = io_uring_get_sqe(ring);
@@ -355,15 +364,17 @@ err:
 	return 1;
 }
 
-int main(int argc, char *argv[])
+static int test(bool defer_taskrun)
 {
 	struct io_uring ring;
 	int i, ret;
+	unsigned int flags = 0;
 
-	if (argc > 1)
-		return T_EXIT_SKIP;
+	if (defer_taskrun)
+		flags = IORING_SETUP_SINGLE_ISSUER |
+			IORING_SETUP_DEFER_TASKRUN;
 
-	ret = io_uring_queue_init(1024, &ring, 0);
+	ret = io_uring_queue_init(1024, &ring, flags);
 	if (ret) {
 		printf("ring setup failed\n");
 		return T_EXIT_FAIL;
@@ -384,5 +395,32 @@ int main(int argc, char *argv[])
 			return T_EXIT_FAIL;
 		}
 	}
+
+	io_uring_queue_exit(&ring);
+
 	return T_EXIT_PASS;
 }
+
+int main(int argc, char *argv[])
+{
+	int ret;
+
+	if (argc > 1)
+		return T_EXIT_SKIP;
+
+	ret = test(false);
+	if (ret != T_EXIT_PASS) {
+		fprintf(stderr, "%s: test(false) failed\n", argv[0]);
+		return ret;
+	}
+
+	if (t_probe_defer_taskrun()) {
+		ret = test(true);
+		if (ret != T_EXIT_PASS) {
+			fprintf(stderr, "%s: test(true) failed\n", argv[0]);
+			return ret;
+		}
+	}
+
+	return ret;
+}
diff --git a/test/poll-mshot-overflow.c b/test/poll-mshot-overflow.c
index 360df65d2b15..431a337f19ae 100644
--- a/test/poll-mshot-overflow.c
+++ b/test/poll-mshot-overflow.c
@@ -42,7 +42,7 @@ int check_final_cqe(struct io_uring *ring)
 	return T_EXIT_PASS;
 }
 
-int main(int argc, char *argv[])
+static int test(bool defer_taskrun)
 {
 	struct io_uring_cqe *cqe;
 	struct io_uring_sqe *sqe;
@@ -50,9 +50,6 @@ int main(int argc, char *argv[])
 	int pipe1[2];
 	int ret, i;
 
-	if (argc > 1)
-		return 0;
-
 	if (pipe(pipe1) != 0) {
 		perror("pipe");
 		return T_EXIT_FAIL;
@@ -66,6 +63,10 @@ int main(int argc, char *argv[])
 		.cq_entries = 2
 	};
 
+	if (defer_taskrun)
+		params.flags |= IORING_SETUP_SINGLE_ISSUER |
+				IORING_SETUP_DEFER_TASKRUN;
+
 	ret = io_uring_queue_init_params(2, &ring, &params);
 	if (ret)
 		return T_EXIT_SKIP;
@@ -113,6 +114,9 @@ int main(int argc, char *argv[])
 		io_uring_cqe_seen(&ring, cqe);
 	}
 
+	/* make sure everything is processed */
+	io_uring_get_events(&ring);
+
 	/* now remove the poll */
 	sqe = io_uring_get_sqe(&ring);
 	io_uring_prep_poll_remove(sqe, 1);
@@ -126,5 +130,33 @@ int main(int argc, char *argv[])
 
 	ret = check_final_cqe(&ring);
 
+	close(pipe1[0]);
+	close(pipe1[1]);
+	io_uring_queue_exit(&ring);
+
+	return ret;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret;
+
+	if (argc > 1)
+		return T_EXIT_SKIP;
+
+	ret = test(false);
+	if (ret != T_EXIT_PASS) {
+		fprintf(stderr, "%s: test(false) failed\n", argv[0]);
+		return ret;
+	}
+
+	if (t_probe_defer_taskrun()) {
+		ret = test(true);
+		if (ret != T_EXIT_PASS) {
+			fprintf(stderr, "%s: test(true) failed\n", argv[0]);
+			return ret;
+		}
+	}
+
 	return ret;
 }
diff --git a/test/recv-multishot.c b/test/recv-multishot.c
index a322e4317232..1a041f8e865a 100644
--- a/test/recv-multishot.c
+++ b/test/recv-multishot.c
@@ -29,6 +29,7 @@ struct args {
 	bool wait_each;
 	bool recvmsg;
 	enum early_error_t early_error;
+	bool defer;
 };
 
 static int check_sockaddr(struct sockaddr_in *in)
@@ -76,19 +77,22 @@ static int test(struct args *args)
 		.tv_sec = 1,
 	};
 	struct msghdr msg;
+	struct io_uring_params params = { };
+	int n_sqe = 32;
 
 	memset(recv_buffs, 0, sizeof(recv_buffs));
 
-	if (args->early_error == ERROR_EARLY_OVERFLOW) {
-		struct io_uring_params params = {
-			.flags = IORING_SETUP_CQSIZE,
-			.cq_entries = N_CQE_OVERFLOW
-		};
+	if (args->defer)
+		params.flags |= IORING_SETUP_SINGLE_ISSUER |
+				IORING_SETUP_DEFER_TASKRUN;
 
-		ret = io_uring_queue_init_params(N_CQE_OVERFLOW, &ring, &params);
-	} else {
-		ret = io_uring_queue_init(32, &ring, 0);
+	if (args->early_error == ERROR_EARLY_OVERFLOW) {
+		params.flags |= IORING_SETUP_CQSIZE;
+		params.cq_entries = N_CQE_OVERFLOW;
+		n_sqe = N_CQE_OVERFLOW;
 	}
+
+	ret = io_uring_queue_init_params(n_sqe, &ring, &params);
 	if (ret) {
 		fprintf(stderr, "queue init failed: %d\n", ret);
 		return ret;
@@ -457,23 +461,30 @@ int main(int argc, char *argv[])
 	int ret;
 	int loop;
 	int early_error = 0;
+	bool has_defer;
 
 	if (argc > 1)
 		return T_EXIT_SKIP;
 
-	for (loop = 0; loop < 8; loop++) {
+	has_defer = t_probe_defer_taskrun();
+
+	for (loop = 0; loop < 16; loop++) {
 		struct args a = {
 			.stream = loop & 0x01,
 			.wait_each = loop & 0x2,
 			.recvmsg = loop & 0x04,
+			.defer = loop & 0x08,
 		};
+		if (a.defer && !has_defer)
+			continue;
 		for (early_error = 0; early_error < ERROR_EARLY_LAST; early_error++) {
 			a.early_error = (enum early_error_t)early_error;
 			ret = test(&a);
 			if (ret) {
 				fprintf(stderr,
-					"test stream=%d wait_each=%d recvmsg=%d early_error=%d failed\n",
-					a.stream, a.wait_each, a.recvmsg, a.early_error);
+					"test stream=%d wait_each=%d recvmsg=%d early_error=%d "
+					" defer=%d failed\n",
+					a.stream, a.wait_each, a.recvmsg, a.early_error, a.defer);
 				return T_EXIT_FAIL;
 			}
 			if (no_recv_mshot)
diff --git a/test/rsrc_tags.c b/test/rsrc_tags.c
index 22370644b200..047e844acfbd 100644
--- a/test/rsrc_tags.c
+++ b/test/rsrc_tags.c
@@ -401,7 +401,8 @@ static int test_notag(void)
 
 int main(int argc, char *argv[])
 {
-	int ring_flags[] = {0, IORING_SETUP_IOPOLL, IORING_SETUP_SQPOLL};
+	int ring_flags[] = {0, IORING_SETUP_IOPOLL, IORING_SETUP_SQPOLL,
+			    IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN};
 	int i, ret;
 
 	if (argc > 1)
@@ -423,7 +424,12 @@ int main(int argc, char *argv[])
 	}
 
 	for (i = 0; i < sizeof(ring_flags) / sizeof(ring_flags[0]); i++) {
-		ret = test_files(ring_flags[i]);
+		int flag = ring_flags[i];
+
+		if (flag & IORING_SETUP_DEFER_TASKRUN && !t_probe_defer_taskrun())
+			continue;
+
+		ret = test_files(flag);
 		if (ret) {
 			printf("test_tag failed, type %i\n", i);
 			return ret;
-- 
2.30.2


* [PATCH liburing v2 06/12] add a defer-taskrun test
  2022-09-01  9:32 [PATCH liburing v2 00/12] Defer taskrun changes Dylan Yudaken
                   ` (4 preceding siblings ...)
  2022-09-01  9:32 ` [PATCH liburing v2 05/12] update existing tests for defer taskrun Dylan Yudaken
@ 2022-09-01  9:32 ` Dylan Yudaken
  2022-09-01  9:32 ` [PATCH liburing v2 07/12] update io_uring_enter.2 docs for IORING_FEAT_NODROP Dylan Yudaken
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01  9:32 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Add a test specifically for IORING_SETUP_DEFER_TASKRUN

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 test/Makefile        |   1 +
 test/defer-taskrun.c | 333 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 334 insertions(+)
 create mode 100644 test/defer-taskrun.c

diff --git a/test/Makefile b/test/Makefile
index 418c11c95875..78a499a357d7 100644
--- a/test/Makefile
+++ b/test/Makefile
@@ -62,6 +62,7 @@ test_srcs := \
 	d4ae271dfaae.c \
 	d77a67ed5f27.c \
 	defer.c \
+	defer-taskrun.c \
 	double-poll-crash.c \
 	drop-submit.c \
 	eeed8b54e0df.c \
diff --git a/test/defer-taskrun.c b/test/defer-taskrun.c
new file mode 100644
index 000000000000..aec8c5d3f223
--- /dev/null
+++ b/test/defer-taskrun.c
@@ -0,0 +1,333 @@
+// SPDX-License-Identifier: MIT
+#include <errno.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <string.h>
+#include <error.h>
+#include <sys/eventfd.h>
+#include <signal.h>
+#include <poll.h>
+#include <assert.h>
+#include <pthread.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#include "liburing.h"
+#include "test.h"
+#include "helpers.h"
+
+#define EXEC_FILENAME ".defer-taskrun"
+#define EXEC_FILESIZE (1U<<20)
+
+static bool can_read_t(int fd, int time)
+{
+	int ret;
+	struct pollfd p = {
+		.fd = fd,
+		.events = POLLIN,
+	};
+
+	ret = poll(&p, 1, time);
+
+	return ret == 1;
+}
+
+static bool can_read(int fd)
+{
+	return can_read_t(fd, 0);
+}
+
+static void eventfd_clear(int fd)
+{
+	uint64_t val;
+	int ret;
+
+	assert(can_read(fd));
+	ret = read(fd, &val, 8);
+	assert(ret == 8);
+}
+
+static void eventfd_trigger(int fd)
+{
+	uint64_t val = 1;
+	int ret;
+
+	ret = write(fd, &val, sizeof(val));
+	assert(ret == sizeof(val));
+}
+
+#define CHECK(x) if (!(x)) { \
+		fprintf(stderr, "%s:%d %s failed\n", __FILE__, __LINE__, #x); \
+		return -1; }
+
+static int test_eventfd(void)
+{
+	struct io_uring ring;
+	int ret;
+	int fda, fdb;
+	struct io_uring_cqe *cqe;
+
+	ret = io_uring_queue_init(8, &ring, IORING_SETUP_SINGLE_ISSUER |
+					    IORING_SETUP_DEFER_TASKRUN);
+	if (ret)
+		return ret;
+
+	fda = eventfd(0, EFD_NONBLOCK);
+	fdb = eventfd(0, EFD_NONBLOCK);
+
+	CHECK(fda >= 0 && fdb >= 0);
+
+	ret = io_uring_register_eventfd(&ring, fda);
+	if (ret)
+		return ret;
+
+	CHECK(!can_read(fda));
+	CHECK(!can_read(fdb));
+
+	io_uring_prep_poll_add(io_uring_get_sqe(&ring), fdb, POLLIN);
+	io_uring_submit(&ring);
+	CHECK(!can_read(fda)); /* poll should not have completed */
+
+	io_uring_prep_nop(io_uring_get_sqe(&ring));
+	io_uring_submit(&ring);
+	CHECK(can_read(fda)); /* nop should have */
+
+	CHECK(io_uring_peek_cqe(&ring, &cqe) == 0);
+	CHECK(cqe->res == 0);
+	io_uring_cqe_seen(&ring, cqe);
+	eventfd_clear(fda);
+
+	eventfd_trigger(fdb);
+	/* can take time due to rcu_call */
+	CHECK(can_read_t(fda, 1000));
+
+	/* should not have processed the cqe yet */
+	CHECK(io_uring_cq_ready(&ring) == 0);
+
+	io_uring_get_events(&ring);
+	CHECK(io_uring_cq_ready(&ring) == 1);
+
+
+	io_uring_queue_exit(&ring);
+	return 0;
+}
+
+struct thread_data {
+	struct io_uring ring;
+	int efd;
+	char buff[8];
+};
+
+static void *thread(void *t)
+{
+	struct thread_data *td = t;
+
+	io_uring_prep_read(io_uring_get_sqe(&td->ring), td->efd, td->buff, sizeof(td->buff), 0);
+	io_uring_submit(&td->ring);
+
+	return NULL;
+}
+
+static int test_thread_shutdown(void)
+{
+	pthread_t t1;
+	int ret;
+	struct thread_data td;
+	struct io_uring_cqe *cqe;
+	uint64_t val = 1;
+
+	ret = io_uring_queue_init(8, &td.ring, IORING_SETUP_SINGLE_ISSUER |
+					       IORING_SETUP_DEFER_TASKRUN);
+	if (ret)
+		return ret;
+
+	/* check that even before submitting we don't get errors */
+	CHECK(io_uring_get_events(&td.ring) == 0);
+
+	td.efd = eventfd(0, 0);
+	CHECK(td.efd >= 0);
+
+	CHECK(pthread_create(&t1, NULL, thread, &td) == 0);
+	CHECK(pthread_join(t1, NULL) == 0);
+
+	CHECK(write(td.efd, &val, sizeof(val)) == sizeof(val));
+	CHECK(io_uring_wait_cqe(&td.ring, &cqe) == -EEXIST);
+
+	close(td.efd);
+	io_uring_queue_exit(&td.ring);
+	return 0;
+}
+
+static int test_exec(const char *filename)
+{
+	int ret;
+	int fd;
+	struct io_uring ring;
+	pid_t fork_pid;
+	static char * const new_argv[] = {"1", "2", "3", NULL};
+	static char * const new_env[] = {NULL};
+	char *buff;
+
+	fork_pid = fork();
+	CHECK(fork_pid >= 0);
+	if (fork_pid > 0) {
+		int wstatus;
+
+		CHECK(waitpid(fork_pid, &wstatus, 0) != (pid_t)-1);
+		if (!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) != T_EXIT_SKIP) {
+			fprintf(stderr, "child failed %i\n", WEXITSTATUS(wstatus));
+			return -1;
+		}
+		return 0;
+	}
+
+	ret = io_uring_queue_init(8, &ring, IORING_SETUP_SINGLE_ISSUER |
+					    IORING_SETUP_DEFER_TASKRUN);
+	if (ret)
+		return ret;
+
+	if (filename) {
+		fd = open(filename, O_RDONLY | O_DIRECT);
+	} else {
+		t_create_file(EXEC_FILENAME, EXEC_FILESIZE);
+		fd = open(EXEC_FILENAME, O_RDONLY | O_DIRECT);
+		unlink(EXEC_FILENAME);
+	}
+	/* posix_memalign() allocates buff; O_DIRECT needs aligned memory */
+	CHECK(posix_memalign((void **)&buff, 4096, EXEC_FILESIZE) == 0);
+	CHECK(buff);
+
+	CHECK(fd >= 0);
+	io_uring_prep_read(io_uring_get_sqe(&ring), fd, buff, EXEC_FILESIZE, 0);
+	io_uring_submit(&ring);
+	ret = execve("/proc/self/exe", new_argv, new_env);
+	/* if we get here it failed anyway */
+	fprintf(stderr, "execve failed %d\n", ret);
+	return -1;
+}
+
+static int test_flag(void)
+{
+	struct io_uring ring;
+	int ret;
+	int fd;
+	struct io_uring_cqe *cqe;
+
+	ret = io_uring_queue_init(8, &ring, IORING_SETUP_SINGLE_ISSUER |
+					    IORING_SETUP_DEFER_TASKRUN |
+					    IORING_SETUP_TASKRUN_FLAG);
+	CHECK(!ret);
+
+	fd = eventfd(0, EFD_NONBLOCK);
+	CHECK(fd >= 0);
+
+	io_uring_prep_poll_add(io_uring_get_sqe(&ring), fd, POLLIN);
+	io_uring_submit(&ring);
+	CHECK(!can_read(fd)); /* poll should not have completed */
+
+	eventfd_trigger(fd);
+	CHECK(can_read(fd));
+
+	/* should not have processed the poll cqe yet */
+	CHECK(io_uring_cq_ready(&ring) == 0);
+
+	/* flag should be set */
+	CHECK(IO_URING_READ_ONCE(*ring.sq.kflags) & IORING_SQ_TASKRUN);
+
+	/* Specifically peek, knowing we currently have no CQEs,
+	 * but because the flag is set, liburing should try and get more
+	 */
+	ret = io_uring_peek_cqe(&ring, &cqe);
+
+	CHECK(ret == 0 && cqe);
+	CHECK(!(IO_URING_READ_ONCE(*ring.sq.kflags) & IORING_SQ_TASKRUN));
+
+	close(fd);
+	io_uring_queue_exit(&ring);
+	return 0;
+}
+
+static int test_ring_shutdown(void)
+{
+	struct io_uring ring;
+	int ret;
+	int fd[2];
+	char buff = '\0';
+	char send = 'X';
+
+	ret = io_uring_queue_init(8, &ring, IORING_SETUP_SINGLE_ISSUER |
+					    IORING_SETUP_DEFER_TASKRUN |
+					    IORING_SETUP_TASKRUN_FLAG);
+	CHECK(!ret);
+
+	ret = t_create_socket_pair(fd, true);
+	CHECK(!ret);
+
+	io_uring_prep_recv(io_uring_get_sqe(&ring), fd[0], &buff, 1, 0);
+	io_uring_submit(&ring);
+
+	ret = write(fd[1], &send, 1);
+	CHECK(ret == 1);
+
+	/* should not have processed the recv cqe yet */
+	CHECK(io_uring_cq_ready(&ring) == 0);
+	io_uring_queue_exit(&ring);
+
+	/* task work should have been processed by now */
+	CHECK(buff == 'X');
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret;
+	const char *filename = NULL;
+
+	if (argc > 2)
+		return T_EXIT_SKIP;
+	if (argc == 2) {
+		/* This test exposes interesting behaviour with a null-blk
+		 * device configured like:
+		 * $ modprobe null-blk completion_nsec=100000000 irqmode=2
+		 * and then run with $ defer-taskrun.t /dev/nullb0
+		 */
+		filename = argv[1];
+	}
+
+	if (!t_probe_defer_taskrun())
+		return T_EXIT_SKIP;
+
+	ret = test_thread_shutdown();
+	if (ret) {
+		fprintf(stderr, "test_thread_shutdown failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_exec(filename);
+	if (ret) {
+		fprintf(stderr, "test_exec failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_eventfd();
+	if (ret) {
+		fprintf(stderr, "eventfd failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_flag();
+	if (ret) {
+		fprintf(stderr, "flag failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_ring_shutdown();
+	if (ret) {
+		fprintf(stderr, "test_ring_shutdown failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	return T_EXIT_PASS;
+}
-- 
2.30.2


* [PATCH liburing v2 07/12] update io_uring_enter.2 docs for IORING_FEAT_NODROP
  2022-09-01  9:32 [PATCH liburing v2 00/12] Defer taskrun changes Dylan Yudaken
                   ` (5 preceding siblings ...)
  2022-09-01  9:32 ` [PATCH liburing v2 06/12] add a defer-taskrun test Dylan Yudaken
@ 2022-09-01  9:32 ` Dylan Yudaken
  2022-09-01  9:32 ` [PATCH liburing v2 08/12] add docs for overflow lost errors Dylan Yudaken
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01  9:32 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

The EBUSY docs are out of date, so update them for the IORING_FEAT_NODROP
feature flag

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 man/io_uring_enter.2 | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/man/io_uring_enter.2 b/man/io_uring_enter.2
index 1a9311e20357..4d8d488b5c1f 100644
--- a/man/io_uring_enter.2
+++ b/man/io_uring_enter.2
@@ -1330,7 +1330,18 @@ is a valid file descriptor, but the io_uring ring is not in the right state
 for details on how to enable the ring.
 .TP
 .B EBUSY
-The application is attempting to overcommit the number of requests it can have
+If the
+.B IORING_FEAT_NODROP
+feature flag is set, then
+.B EBUSY
+will be returned if there were overflow entries, the
+.B IORING_ENTER_GETEVENTS
+flag was set, and not all of the overflow entries could be flushed to
+the CQ ring.
+
+Without
+.B IORING_FEAT_NODROP
+the application is attempting to overcommit the number of requests it can have
 pending. The application should wait for some completions and try again. May
 occur if the application tries to queue more requests than we have room for in
 the CQ ring, or if the application attempts to wait for more events without
-- 
2.30.2


* [PATCH liburing v2 08/12] add docs for overflow lost errors
  2022-09-01  9:32 [PATCH liburing v2 00/12] Defer taskrun changes Dylan Yudaken
                   ` (6 preceding siblings ...)
  2022-09-01  9:32 ` [PATCH liburing v2 07/12] update io_uring_enter.2 docs for IORING_FEAT_NODROP Dylan Yudaken
@ 2022-09-01  9:32 ` Dylan Yudaken
  2022-09-01  9:33 ` [PATCH liburing v2 09/12] expose CQ ring overflow state Dylan Yudaken
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01  9:32 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Add man docs for return values indicating a CQE was lost.
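
As a hedged sketch of how an application might react (handle_lost_cqes is a
hypothetical application callback, not liburing API):

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (ret == -EBADR) {
		/* at least one CQE was lost; the error state is now cleared */
		handle_lost_cqes();
	} else if (ret) {
		/* some other error */
	}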

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 man/io_uring_enter.2 | 11 +++++++++++
 man/io_uring_setup.2 | 11 +++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/man/io_uring_enter.2 b/man/io_uring_enter.2
index 4d8d488b5c1f..3c4aa04016cf 100644
--- a/man/io_uring_enter.2
+++ b/man/io_uring_enter.2
@@ -1329,6 +1329,17 @@ is a valid file descriptor, but the io_uring ring is not in the right state
 .BR io_uring_register (2)
 for details on how to enable the ring.
 .TP
+.B EBADR
+At least one CQE was dropped even with the
+.B IORING_FEAT_NODROP
+feature, and there are no otherwise available CQEs. This clears the error state
+and so with no other changes the next call to
+.BR io_uring_enter (2)
+will not have this error. This error should be extremely rare and indicates that
+the machine is running critically low on memory. It may be reasonable for the
+application to terminate unless it is able to safely handle any CQE
+being lost.
+.TP
 .B EBUSY
 If the
 .B IORING_FEAT_NODROP
diff --git a/man/io_uring_setup.2 b/man/io_uring_setup.2
index 01eb70d95292..41c9e5f269c5 100644
--- a/man/io_uring_setup.2
+++ b/man/io_uring_setup.2
@@ -298,7 +298,7 @@ call. The SQEs must still be allocated separately. This brings the necessary
 calls down from three to two. Available since kernel 5.4.
 .TP
 .B IORING_FEAT_NODROP
-If this flag is set, io_uring supports never dropping completion events.
+If this flag is set, io_uring supports almost never dropping completion events.
 If a completion event occurs and the CQ ring is full, the kernel stores
 the event internally until such a time that the CQ ring has room for more
 entries. If this overflow condition is entered, attempting to submit more
@@ -306,7 +306,14 @@ IO will fail with the
 .B -EBUSY
 error value, if it can't flush the overflown events to the CQ ring. If this
 happens, the application must reap events from the CQ ring and attempt the
-submit again. Available since kernel 5.5.
+submit again. If the kernel has no free memory to store the event internally,
+this will be visible as an increase in the overflow value on the CQ ring.
+Available since kernel 5.5. Additionally,
+.BR io_uring_enter (2)
+will return
+.B -EBADR
+the next time it would otherwise sleep waiting for completions (since kernel 5.19).
+
 .TP
 .B IORING_FEAT_SUBMIT_STABLE
 If this flag is set, applications can be certain that any data for
-- 
2.30.2


* [PATCH liburing v2 09/12] expose CQ ring overflow state
  2022-09-01  9:32 [PATCH liburing v2 00/12] Defer taskrun changes Dylan Yudaken
                   ` (7 preceding siblings ...)
  2022-09-01  9:32 ` [PATCH liburing v2 08/12] add docs for overflow lost errors Dylan Yudaken
@ 2022-09-01  9:33 ` Dylan Yudaken
  2022-09-01  9:33 ` [PATCH liburing v2 10/12] overflow: add tests Dylan Yudaken
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01  9:33 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Allow the application to easily check whether the CQ ring is in an overflow
state, in case it would like to explicitly flush using io_uring_get_events.

Explicit flushing can be useful for applications that prefer reduced
latency on CQEs over processing as many of them as possible.
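
For illustration only (assuming a ring set up elsewhere), an explicit flush
might look like:

	/* only enter the kernel when it reports overflowed CQEs */
	if (io_uring_cq_has_overflow(&ring))
		io_uring_get_events(&ring);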

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 src/include/liburing.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/include/liburing.h b/src/include/liburing.h
index 3c5097b255de..ca8fbd608901 100644
--- a/src/include/liburing.h
+++ b/src/include/liburing.h
@@ -1086,6 +1086,15 @@ static inline unsigned io_uring_cq_ready(const struct io_uring *ring)
 	return io_uring_smp_load_acquire(ring->cq.ktail) - *ring->cq.khead;
 }
 
+/*
+ * Returns true if there are overflow entries waiting to be flushed onto
+ * the CQ ring
+ */
+static inline bool io_uring_cq_has_overflow(const struct io_uring *ring)
+{
+	return IO_URING_READ_ONCE(*ring->sq.kflags) & IORING_SQ_CQ_OVERFLOW;
+}
+
 /*
  * Returns true if the eventfd notification is currently enabled
  */
-- 
2.30.2


* [PATCH liburing v2 10/12] overflow: add tests
  2022-09-01  9:32 [PATCH liburing v2 00/12] Defer taskrun changes Dylan Yudaken
                   ` (8 preceding siblings ...)
  2022-09-01  9:33 ` [PATCH liburing v2 09/12] expose CQ ring overflow state Dylan Yudaken
@ 2022-09-01  9:33 ` Dylan Yudaken
  2022-09-01  9:33 ` [PATCH liburing v2 11/12] file-verify test: log if short read Dylan Yudaken
  2022-09-01  9:33 ` [PATCH liburing v2 12/12] shutdown test: bind to ephemeral port Dylan Yudaken
  11 siblings, 0 replies; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01  9:33 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Add tests that verify that overflow conditions behave appropriately.
Specifically:
 * if overflow is continually flushed, then CQEs should arrive mostly in
 order, to prevent starvation of some completions
 * if CQEs are dropped due to GFP_ATOMIC allocation failures, it is
 possible to terminate cleanly. This is not tested by default as it
 requires a debug kernel config and also has system-wide effects

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 test/cq-overflow.c | 243 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 236 insertions(+), 7 deletions(-)

diff --git a/test/cq-overflow.c b/test/cq-overflow.c
index 312b414b2a79..1087eefaacf0 100644
--- a/test/cq-overflow.c
+++ b/test/cq-overflow.c
@@ -9,6 +9,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <fcntl.h>
+#include <assert.h>
 
 #include "helpers.h"
 #include "liburing.h"
@@ -21,6 +22,32 @@ static struct iovec *vecs;
 
 #define ENTRIES	8
 
+/*
+ * io_uring has rare cases where CQEs are lost.
+ * This happens when there is no space in the CQ ring, and also there is no
+ * GFP_ATOMIC memory available. In reality this probably means that the process
+ * is about to be killed as many other things might start failing, but we still
+ * want to test that liburing and the kernel deal with this properly. The fault
+ * injection framework allows us to test this scenario. Unfortunately this
+ * requires some system wide changes and so we do not enable this by default.
+ * The tests in this file should work in both cases (where overflows are queued
+ * and where they are dropped) on recent kernels.
+ *
+ * In order to test dropped CQEs you should enable fault injection in the kernel
+ * config:
+ *
+ * CONFIG_FAULT_INJECTION=y
+ * CONFIG_FAILSLAB=y
+ * CONFIG_FAULT_INJECTION_DEBUG_FS=y
+ *
+ * and then run the test as follows:
+ * echo Y > /sys/kernel/debug/failslab/task-filter
+ * echo 100 > /sys/kernel/debug/failslab/probability
+ * echo 0 > /sys/kernel/debug/failslab/verbose
+ * echo 100000 > /sys/kernel/debug/failslab/times
+ * bash -c "echo 1 > /proc/self/make-it-fail && exec ./cq-overflow.t"
+ */
+
 static int test_io(const char *file, unsigned long usecs, unsigned *drops, int fault)
 {
 	struct io_uring_sqe *sqe;
@@ -29,6 +56,7 @@ static int test_io(const char *file, unsigned long usecs, unsigned *drops, int f
 	unsigned reaped, total;
 	struct io_uring ring;
 	int nodrop, i, fd, ret;
+	bool cqe_dropped = false;
 
 	fd = open(file, O_RDONLY | O_DIRECT);
 	if (fd < 0) {
@@ -104,8 +132,8 @@ static int test_io(const char *file, unsigned long usecs, unsigned *drops, int f
 reap_it:
 	reaped = 0;
 	do {
-		if (nodrop) {
-			/* nodrop should never lose events */
+		if (nodrop && !cqe_dropped) {
+			/* nodrop should never lose events unless cqe_dropped */
 			if (reaped == total)
 				break;
 		} else {
@@ -113,7 +141,10 @@ reap_it:
 				break;
 		}
 		ret = io_uring_wait_cqe(&ring, &cqe);
-		if (ret) {
+		if (nodrop && ret == -EBADR) {
+			cqe_dropped = true;
+			continue;
+		} else if (ret) {
 			fprintf(stderr, "wait_cqe=%d\n", ret);
 			goto err;
 		}
@@ -133,7 +164,7 @@ reap_it:
 		goto err;
 	}
 
-	if (!nodrop) {
+	if (!nodrop || cqe_dropped) {
 		*drops = *ring.cq.koverflow;
 	} else if (*ring.cq.koverflow) {
 		fprintf(stderr, "Found %u overflows\n", *ring.cq.koverflow);
@@ -154,18 +185,29 @@ static int reap_events(struct io_uring *ring, unsigned nr_events, int do_wait)
 {
 	struct io_uring_cqe *cqe;
 	int i, ret = 0, seq = 0;
+	unsigned int start_overflow = *ring->cq.koverflow;
+	bool dropped = false;
 
 	for (i = 0; i < nr_events; i++) {
 		if (do_wait)
 			ret = io_uring_wait_cqe(ring, &cqe);
 		else
 			ret = io_uring_peek_cqe(ring, &cqe);
-		if (ret) {
+		if (do_wait && ret == -EBADR) {
+			unsigned int this_drop = *ring->cq.koverflow -
+				start_overflow;
+
+			dropped = true;
+			start_overflow = *ring->cq.koverflow;
+			assert(this_drop > 0);
+			i += (this_drop - 1);
+			continue;
+		} else if (ret) {
 			if (ret != -EAGAIN)
 				fprintf(stderr, "cqe peek failed: %d\n", ret);
 			break;
 		}
-		if (cqe->user_data != seq) {
+		if (!dropped && cqe->user_data != seq) {
 			fprintf(stderr, "cqe sequence out-of-order\n");
 			fprintf(stderr, "got %d, wanted %d\n", (int) cqe->user_data,
 					seq);
@@ -242,19 +284,206 @@ err:
 	return 1;
 }
 
+
+static void submit_one_nop(struct io_uring *ring, int ud)
+{
+	struct io_uring_sqe *sqe;
+	int ret;
+
+	sqe = io_uring_get_sqe(ring);
+	assert(sqe);
+	io_uring_prep_nop(sqe);
+	sqe->user_data = ud;
+	ret = io_uring_submit(ring);
+	assert(ret == 1);
+}
+
+/*
+ * Create an overflow condition and ensure that SQEs are still processed
+ */
+static int test_overflow_handling(bool batch, int cqe_multiple, bool poll,
+				  bool defer)
+{
+	struct io_uring ring;
+	struct io_uring_params p;
+	int ret, i, j, ud, cqe_count;
+	unsigned int count;
+	int const N = 8;
+	int const LOOPS = 128;
+	int const QUEUE_LENGTH = 1024;
+	int completions[N];
+	int queue[QUEUE_LENGTH];
+	int queued = 0;
+	int outstanding = 0;
+	bool cqe_dropped = false;
+
+	memset(&completions, 0, sizeof(int) * N);
+	memset(&p, 0, sizeof(p));
+	p.cq_entries = 2 * cqe_multiple;
+	p.flags |= IORING_SETUP_CQSIZE;
+
+	if (poll)
+		p.flags |= IORING_SETUP_IOPOLL;
+
+	if (defer)
+		p.flags |= IORING_SETUP_SINGLE_ISSUER |
+			   IORING_SETUP_DEFER_TASKRUN;
+
+	ret = io_uring_queue_init_params(2, &ring, &p);
+	if (ret) {
+		fprintf(stderr, "io_uring_queue_init failed %d\n", ret);
+		return 1;
+	}
+
+	assert(p.cq_entries < N);
+	/* submit N SQEs, some should overflow */
+	for (i = 0; i < N; i++) {
+		submit_one_nop(&ring, i);
+		outstanding++;
+	}
+
+	for (i = 0; i < LOOPS; i++) {
+		struct io_uring_cqe *cqes[N];
+
+		if (io_uring_cq_has_overflow(&ring)) {
+			/*
+			 * Flush any overflowed CQEs and process those. Actively
+			 * flush these to make sure CQEs arrive in vague order
+			 * of being sent.
+			 */
+			ret = io_uring_get_events(&ring);
+			if (ret != 0) {
+				fprintf(stderr,
+					"io_uring_get_events returned %d\n",
+					ret);
+				goto err;
+			}
+		} else if (!cqe_dropped) {
+			for (j = 0; j < queued; j++) {
+				submit_one_nop(&ring, queue[j]);
+				outstanding++;
+			}
+			queued = 0;
+		}
+
+		/* We have lost some random cqes, stop if no remaining. */
+		if (cqe_dropped && outstanding == *ring.cq.koverflow)
+			break;
+
+		ret = io_uring_wait_cqe(&ring, &cqes[0]);
+		if (ret == -EBADR) {
+			cqe_dropped = true;
+			fprintf(stderr, "CQE dropped\n");
+			continue;
+		} else if (ret != 0) {
+			fprintf(stderr, "io_uring_wait_cqes failed %d\n", ret);
+			goto err;
+		}
+		cqe_count = 1;
+		if (batch) {
+			ret = io_uring_peek_batch_cqe(&ring, &cqes[0], 2);
+			if (ret < 0) {
+				fprintf(stderr,
+					"io_uring_peek_batch_cqe failed %d\n",
+					ret);
+				goto err;
+			}
+			cqe_count = ret;
+		}
+		for (j = 0; j < cqe_count; j++) {
+			assert(cqes[j]->user_data < N);
+			ud = cqes[j]->user_data;
+			completions[ud]++;
+			assert(queued < QUEUE_LENGTH);
+			queue[queued++] = (int)ud;
+		}
+		io_uring_cq_advance(&ring, cqe_count);
+		outstanding -= cqe_count;
+	}
+
+	/* See if there were any drops by flushing the CQ ring *and* overflow */
+	do {
+		struct io_uring_cqe *cqe;
+
+		ret = io_uring_get_events(&ring);
+		if (ret < 0) {
+			if (ret == -EBADR) {
+				fprintf(stderr, "CQE dropped\n");
+				cqe_dropped = true;
+				break;
+			}
+			goto err;
+		}
+		if (outstanding && !io_uring_cq_ready(&ring))
+			ret = io_uring_wait_cqe_timeout(&ring, &cqe, NULL);
+
+		if (ret && ret != -ETIME) {
+			if (ret == -EBADR) {
+				fprintf(stderr, "CQE dropped\n");
+				cqe_dropped = true;
+				break;
+			}
+			fprintf(stderr, "wait_cqe_timeout = %d\n", ret);
+			goto err;
+		}
+		count = io_uring_cq_ready(&ring);
+		io_uring_cq_advance(&ring, count);
+		outstanding -= count;
+	} while (count);
+
+	io_uring_queue_exit(&ring);
+
+	/* Make sure that completions come back in the same order they were
+	 * sent. If they come back unfairly then the completion counts will
+	 * concentrate on a couple of indices.
+	 */
+	for (i = 1; !cqe_dropped && i < N; i++) {
+		if (abs(completions[i] - completions[i - 1]) > 1) {
+			fprintf(stderr, "bad completion size %d %d\n",
+				completions[i], completions[i - 1]);
+			goto err;
+		}
+	}
+	return 0;
+err:
+	io_uring_queue_exit(&ring);
+	return 1;
+}
+
 int main(int argc, char *argv[])
 {
 	const char *fname = ".cq-overflow";
 	unsigned iters, drops;
 	unsigned long usecs;
 	int ret;
+	int i;
+	bool can_defer;
 
 	if (argc > 1)
 		return T_EXIT_SKIP;
 
+	can_defer = t_probe_defer_taskrun();
+	for (i = 0; i < 16; i++) {
+		bool batch = i & 1;
+		int mult = (i & 2) ? 1 : 2;
+		bool poll = i & 4;
+		bool defer = i & 8;
+
+		if (defer && !can_defer)
+			continue;
+
+		ret = test_overflow_handling(batch, mult, poll, defer);
+		if (ret) {
+			fprintf(stderr, "test_overflow_handling("
+				"batch=%d, mult=%d, poll=%d, defer=%d) failed\n",
+				batch, mult, poll, defer);
+			goto err;
+		}
+	}
+
 	ret = test_overflow();
 	if (ret) {
-		printf("test_overflow failed\n");
+		fprintf(stderr, "test_overflow failed\n");
 		return ret;
 	}
 
-- 
2.30.2


* [PATCH liburing v2 11/12] file-verify test: log if short read
  2022-09-01  9:32 [PATCH liburing v2 00/12] Defer taskrun changes Dylan Yudaken
                   ` (9 preceding siblings ...)
  2022-09-01  9:33 ` [PATCH liburing v2 10/12] overflow: add tests Dylan Yudaken
@ 2022-09-01  9:33 ` Dylan Yudaken
  2022-09-01  9:33 ` [PATCH liburing v2 12/12] shutdown test: bind to ephemeral port Dylan Yudaken
  11 siblings, 0 replies; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01  9:33 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

This test assumes any success is a full read. If it was a short read the
test would still fail (as verification would fail) but tracking down the
reason can be annoying.
For now we can just log if it's short and fail at that point.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 test/file-verify.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/test/file-verify.c b/test/file-verify.c
index 595dafd4bd8d..3a395f9dccda 100644
--- a/test/file-verify.c
+++ b/test/file-verify.c
@@ -430,6 +430,10 @@ static int test(struct io_uring *ring, const char *fname, int buffered,
 				fprintf(stderr, "bad read %d, read %d\n", cqe->res, i);
 				goto err;
 			}
+			if (cqe->res < CHUNK_SIZE) {
+				fprintf(stderr, "short read %d, read %d\n", cqe->res, i);
+				goto err;
+			}
 			if (cqe->flags & IORING_CQE_F_BUFFER)
 				index = cqe->flags >> 16;
 			else
-- 
2.30.2


* [PATCH liburing v2 12/12] shutdown test: bind to ephemeral port
  2022-09-01  9:32 [PATCH liburing v2 00/12] Defer taskrun changes Dylan Yudaken
                   ` (10 preceding siblings ...)
  2022-09-01  9:33 ` [PATCH liburing v2 11/12] file-verify test: log if short read Dylan Yudaken
@ 2022-09-01  9:33 ` Dylan Yudaken
  2022-09-01 11:44   ` Ammar Faizi
  11 siblings, 1 reply; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01  9:33 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

This test would occasionally fail if the chosen port was in use. Instead,
bind to an ephemeral port, which is guaranteed not to be in use.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 test/shutdown.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/test/shutdown.c b/test/shutdown.c
index 14c7407b5492..c584893bdd28 100644
--- a/test/shutdown.c
+++ b/test/shutdown.c
@@ -30,6 +30,7 @@ int main(int argc, char *argv[])
 	int32_t recv_s0;
 	int32_t val = 1;
 	struct sockaddr_in addr;
+	socklen_t addrlen;
 
 	if (argc > 1)
 		return 0;
@@ -44,7 +45,7 @@ int main(int argc, char *argv[])
 	assert(ret != -1);
 
 	addr.sin_family = AF_INET;
-	addr.sin_port = htons((rand() % 61440) + 4096);
+	addr.sin_port = 0;
 	addr.sin_addr.s_addr = inet_addr("127.0.0.1");
 
 	ret = bind(recv_s0, (struct sockaddr*)&addr, sizeof(addr));
@@ -52,6 +53,10 @@ int main(int argc, char *argv[])
 	ret = listen(recv_s0, 128);
 	assert(ret != -1);
 
+	addrlen = (socklen_t)sizeof(addr);
+	assert(!getsockname(recv_s0, (struct sockaddr *)&addr,
+			    &addrlen));
+
 	p_fd[1] = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
 
 	val = 1;
-- 
2.30.2


* Re: [PATCH liburing v2 12/12] shutdown test: bind to ephemeral port
  2022-09-01  9:33 ` [PATCH liburing v2 12/12] shutdown test: bind to ephemeral port Dylan Yudaken
@ 2022-09-01 11:44   ` Ammar Faizi
  2022-09-01 12:54     ` Dylan Yudaken
  0 siblings, 1 reply; 18+ messages in thread
From: Ammar Faizi @ 2022-09-01 11:44 UTC (permalink / raw)
  To: Jens Axboe, Dylan Yudaken
  Cc: Facebook Kernel Team, io-uring Mailing List, Pavel Begunkov

On 9/1/22 4:33 PM, Dylan Yudaken wrote:
> This test would occasionally fail if the chosen port was in use. Rather
> bind to an ephemeral port which will not be in use.
> 
> Signed-off-by: Dylan Yudaken<dylany@fb.com>
> ---
>   test/shutdown.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/test/shutdown.c b/test/shutdown.c
> index 14c7407b5492..c584893bdd28 100644
> --- a/test/shutdown.c
> +++ b/test/shutdown.c
> @@ -30,6 +30,7 @@ int main(int argc, char *argv[])
>   	int32_t recv_s0;
>   	int32_t val = 1;
>   	struct sockaddr_in addr;
> +	socklen_t addrlen;
>   
>   	if (argc > 1)
>   		return 0;
> @@ -44,7 +45,7 @@ int main(int argc, char *argv[])
>   	assert(ret != -1);
>   
>   	addr.sin_family = AF_INET;
> -	addr.sin_port = htons((rand() % 61440) + 4096);
> +	addr.sin_port = 0;
>   	addr.sin_addr.s_addr = inet_addr("127.0.0.1");
>   
>   	ret = bind(recv_s0, (struct sockaddr*)&addr, sizeof(addr));
> @@ -52,6 +53,10 @@ int main(int argc, char *argv[])
>   	ret = listen(recv_s0, 128);
>   	assert(ret != -1);
>   
> +	addrlen = (socklen_t)sizeof(addr);
> +	assert(!getsockname(recv_s0, (struct sockaddr *)&addr,
> +			    &addrlen));
> +

Hi Jens,
Hi Dylan,

I like the idea of using an ephemeral port. This is the most
reliable way to get a port number that's not in use. However,
we have many places that do this rand() mechanism to generate
a port number. This patch only fixes shutdown.c.

What do you think of creating a new function helper like this?

int t_bind_ephemeral(int fd, struct sockaddr_in *addr)
{
         socklen_t addrlen;

         addr->sin_port = 0;
         if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)))
                 return -errno;

         addrlen = sizeof(*addr);
         assert(!getsockname(fd, (struct sockaddr *)addr, &addrlen));
         return 0;
}
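
A hypothetical caller (assuming the helper above lands in helpers.c) would
then look something like:

         struct sockaddr_in addr = { .sin_family = AF_INET };

         addr.sin_addr.s_addr = inet_addr("127.0.0.1");
         assert(!t_bind_ephemeral(recv_s0, &addr));
         /* addr.sin_port now holds the kernel-assigned port */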

We can avoid code duplication by doing that. I can do that.
If everybody agrees, let's drop this patch and I will wire up
t_bind_ephemeral() function.

Yes? No?

-- 
Ammar Faizi

* Re: [PATCH liburing v2 12/12] shutdown test: bind to ephemeral port
  2022-09-01 11:44   ` Ammar Faizi
@ 2022-09-01 12:54     ` Dylan Yudaken
  2022-09-01 13:03       ` Ammar Faizi
  0 siblings, 1 reply; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-01 12:54 UTC (permalink / raw)
  To: ammarfaizi2, axboe; +Cc: Kernel Team, io-uring, asml.silence

On Thu, 2022-09-01 at 18:44 +0700, Ammar Faizi wrote:
> On 9/1/22 4:33 PM, Dylan Yudaken wrote:
> > This test would occasionally fail if the chosen port was in use.
> > Rather
> > bind to an ephemeral port which will not be in use.
> > 
> > Signed-off-by: Dylan Yudaken<dylany@fb.com>
> > ---
> >   test/shutdown.c | 7 ++++++-
> >   1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/test/shutdown.c b/test/shutdown.c
> > index 14c7407b5492..c584893bdd28 100644
> > --- a/test/shutdown.c
> > +++ b/test/shutdown.c
> > @@ -30,6 +30,7 @@ int main(int argc, char *argv[])
> >         int32_t recv_s0;
> >         int32_t val = 1;
> >         struct sockaddr_in addr;
> > +       socklen_t addrlen;
> >   
> >         if (argc > 1)
> >                 return 0;
> > @@ -44,7 +45,7 @@ int main(int argc, char *argv[])
> >         assert(ret != -1);
> >   
> >         addr.sin_family = AF_INET;
> > -       addr.sin_port = htons((rand() % 61440) + 4096);
> > +       addr.sin_port = 0;
> >         addr.sin_addr.s_addr = inet_addr("127.0.0.1");
> >   
> >         ret = bind(recv_s0, (struct sockaddr*)&addr, sizeof(addr));
> > @@ -52,6 +53,10 @@ int main(int argc, char *argv[])
> >         ret = listen(recv_s0, 128);
> >         assert(ret != -1);
> >   
> > +       addrlen = (socklen_t)sizeof(addr);
> > +       assert(!getsockname(recv_s0, (struct sockaddr *)&addr,
> > +                           &addrlen));
> > +
> 
> Hi Jens,
> Hi Dylan,
> 
> I like the idea of using an ephemeral port. This is the most
> reliable way to get a port number that's not in use. However,
> we have many places that do this rand() mechanism to generate
> a port number. This patch only fixes shutdown.c.

Good point. I only fixed that test as it seemed to be breaking often.
Maybe something unlucky with the port that was selected.

> 
> What do you think of creating a new function helper like this?
> 
> int t_bind_ephemeral(int fd, struct sockaddr_in *addr)
> {
>          socklen_t addrlen;
>          int ret;
> 
>          addr->sin_port = 0;
>          if (bind(fd, (struct sockaddr*)addr, sizeof(*addr)))
>                  return -errno;
> 
>          addrlen = sizeof(*addr);
>          assert(!getsockname(fd, (struct sockaddr *)&addr,
> &addrlen));
>          return 0;
> }
> 
> We can avoid code duplication by doing that. I can do that.
> If everybody agrees, let's drop this patch and I will wire up
> t_bind_ephemeral() function.
> 
> Yes? No?
> 


I think something like that sounds sensible.

There is also some duplication with t_create_socket_pair, as I suspect
most tests could just be rewritten to use that instead - depending on
how much effort you are looking to put into this.

For now I think dropping the patch and doing it properly in some form
makes a lot of sense.

Dylan


* Re: [PATCH liburing v2 12/12] shutdown test: bind to ephemeral port
  2022-09-01 12:54     ` Dylan Yudaken
@ 2022-09-01 13:03       ` Ammar Faizi
  0 siblings, 0 replies; 18+ messages in thread
From: Ammar Faizi @ 2022-09-01 13:03 UTC (permalink / raw)
  To: Jens Axboe, Dylan Yudaken
  Cc: Facebook Kernel Team, io-uring Mailing List, Pavel Begunkov,
	Kanna Scarlet

On 9/1/22 7:54 PM, Dylan Yudaken wrote:
> I think something like that sounds sensible.
> 
> There is also some duplication with t_create_socket_pair, as I suspect
> most tests could just be rewritten to use that instead - depending on
> how much effort you are looking to put into this.
> 
> For now I think dropping the patch and doing it properly in some form
> makes a lot of sense.

OK, I will do the t_bind_ephemeral() part first for now.

-- 
Ammar Faizi

* Re: [PATCH liburing v2 03/12] add io_uring_submit_and_get_events and io_uring_get_events
  2022-09-01  9:32 ` [PATCH liburing v2 03/12] add io_uring_submit_and_get_events and io_uring_get_events Dylan Yudaken
@ 2022-09-01 18:36   ` Jens Axboe
  2022-09-05  8:31     ` Dylan Yudaken
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2022-09-01 18:36 UTC (permalink / raw)
  To: Dylan Yudaken, Pavel Begunkov, io-uring; +Cc: Kernel-team

On 9/1/22 3:32 AM, Dylan Yudaken wrote:
> With deferred task running, we would like to be able to combine submit
> with get events (regardless of if there are CQE's available), or if there
> is nothing to submit then simply do an enter with IORING_ENTER_GETEVENTS
> set, in order to process any available work.
> 
> Expose these APIs

Maybe this is added later, but man page entries are missing for these
two.

We also need get these added to the liburing.map.

-- 
Jens Axboe



* Re: [PATCH liburing v2 03/12] add io_uring_submit_and_get_events and io_uring_get_events
  2022-09-01 18:36   ` Jens Axboe
@ 2022-09-05  8:31     ` Dylan Yudaken
  0 siblings, 0 replies; 18+ messages in thread
From: Dylan Yudaken @ 2022-09-05  8:31 UTC (permalink / raw)
  To: axboe, asml.silence, io-uring; +Cc: Kernel Team

On Thu, 2022-09-01 at 12:36 -0600, Jens Axboe wrote:
> On 9/1/22 3:32 AM, Dylan Yudaken wrote:
> > With deferred task running, we would like to be able to combine
> > submit
> > with get events (regardless of if there are CQE's available), or if
> > there
> > is nothing to submit then simply do an enter with
> > IORING_ENTER_GETEVENTS
> > set, in order to process any available work.
> > 
> > Expose these APIs
> 
> Maybe this is added later, but man page entries are missing for these
> two.
> 
> We also need get these added to the liburing.map.
> 

OK I'll do both of these and do a v3
