* [PATCH liburing 00/11] Defer taskrun changes
@ 2022-08-15 13:09 Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 01/11] Copy defer task run definition from kernel Dylan Yudaken
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Dylan Yudaken @ 2022-08-15 13:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

This series adds support to liburing for the IORING_SETUP_DEFER_TASKRUN flag.

The second half of the series is a bit mixed and includes some documentation
fixes, overflow cleanups and test cleanups. I sent these a couple of months
ago and forgot about them, but they now depend on the new API so they need
to be ordered after it.
I can send them separately if you like.
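
For context, the usage pattern the new flag and APIs enable looks roughly
like the sketch below (illustrative only: setup_ring is an invented name
and the fallback policy is just one option):

#include <errno.h>
#include "liburing.h"

/* Illustrative sketch: create a ring with deferred task running, falling
 * back to a plain ring on kernels that reject the flag. With the flag
 * set, completion task work only runs when the task explicitly enters
 * the kernel to get events (see patch 2).
 */
static int setup_ring(struct io_uring *ring)
{
	int ret;

	ret = io_uring_queue_init(8, ring, IORING_SETUP_DEFER_TASKRUN);
	if (ret == -EINVAL)
		ret = io_uring_queue_init(8, ring, 0);
	return ret;
}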

Patches:

1 copies the definition from the kernel include file
2 introduces new APIs required for this feature
3/4/5 add tests for IORING_SETUP_DEFER_TASKRUN

6/7/8 clean and update existing documentation to match upstream
9 exposes the overflow state to the application
10 uses this and tests overflow functionality
11 gives an explicit warning if there is a short read in file-verify

Dylan Yudaken (11):
  Copy defer task run definition from kernel
  add io_uring_submit_and_get_events and io_uring_get_events
  add a t_probe_defer_taskrun helper function for tests
  update existing tests for defer taskrun
  add a defer-taskrun test
  fix documentation shortenings
  update io_uring_enter.2 docs for IORING_FEAT_NODROP
  add docs for overflow lost errors
  expose CQ ring overflow state
  overflow: add tests
  file-verify test: log if short read

 man/io_uring_enter.2            |  24 +++-
 man/io_uring_setup.2            |  11 +-
 man/io_uring_wait_cqe.3         |   2 +-
 man/io_uring_wait_cqe_nr.3      |   2 +-
 man/io_uring_wait_cqe_timeout.3 |   2 +-
 man/io_uring_wait_cqes.3        |   2 +-
 src/include/liburing.h          |  11 ++
 src/include/liburing/io_uring.h |   7 +
 src/queue.c                     |  28 ++--
 test/Makefile                   |   1 +
 test/cq-overflow.c              | 248 +++++++++++++++++++++++++++++++-
 test/defer-taskrun.c            | 217 ++++++++++++++++++++++++++++
 test/eventfd-disable.c          |  32 ++++-
 test/file-verify.c              |   4 +
 test/helpers.c                  |  12 ++
 test/helpers.h                  |   2 +
 test/iopoll.c                   |  16 ++-
 test/multicqes_drain.c          |  49 ++++++-
 test/poll-mshot-overflow.c      |  39 ++++-
 test/recv-multishot.c           |  32 +++--
 test/rsrc_tags.c                |  10 +-
 21 files changed, 697 insertions(+), 54 deletions(-)
 create mode 100644 test/defer-taskrun.c


base-commit: 15cc446cb8b0caa5c939423d31182eba66141539
-- 
2.30.2



* [PATCH liburing 01/11] Copy defer task run definition from kernel
  2022-08-15 13:09 [PATCH liburing 00/11] Defer taskrun changes Dylan Yudaken
@ 2022-08-15 13:09 ` Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 02/11] add io_uring_submit_and_get_events and io_uring_get_events Dylan Yudaken
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dylan Yudaken @ 2022-08-15 13:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Copy the flag from upstream

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 src/include/liburing/io_uring.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/include/liburing/io_uring.h b/src/include/liburing/io_uring.h
index c923f5c07699..b80bb0c09681 100644
--- a/src/include/liburing/io_uring.h
+++ b/src/include/liburing/io_uring.h
@@ -156,6 +156,13 @@ enum {
  */
 #define IORING_SETUP_SINGLE_ISSUER	(1U << 12)
 
+/*
+ * Defer running task work to get events.
+ * Rather than running bits of task work whenever the task transitions,
+ * try to do it just before it is needed.
+ */
+#define IORING_SETUP_DEFER_TASKRUN	(1U << 13)
+
 enum io_uring_op {
 	IORING_OP_NOP,
 	IORING_OP_READV,
-- 
2.30.2



* [PATCH liburing 02/11] add io_uring_submit_and_get_events and io_uring_get_events
  2022-08-15 13:09 [PATCH liburing 00/11] Defer taskrun changes Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 01/11] Copy defer task run definition from kernel Dylan Yudaken
@ 2022-08-15 13:09 ` Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 03/11] add a t_probe_defer_taskrun helper function for tests Dylan Yudaken
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dylan Yudaken @ 2022-08-15 13:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

With deferred task running, we would like to be able to combine submit
with get events (regardless of whether there are CQEs available), or if
there is nothing to submit then simply do an enter with
IORING_ENTER_GETEVENTS set, in order to process any available work.

Expose these APIs.
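
For illustration, an event loop might use the pair along these lines (a
rough sketch: loop_once is an invented name and CQE handling is elided):

#include "liburing.h"

/* Hedged sketch: submit pending SQEs and run deferred task work in one
 * enter (or just run task work if there is nothing to submit), then
 * drain whatever CQEs resulted.
 */
static void loop_once(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;
	unsigned head, seen = 0;

	if (io_uring_sq_ready(ring))
		io_uring_submit_and_get_events(ring);
	else
		io_uring_get_events(ring);

	io_uring_for_each_cqe(ring, head, cqe)
		seen++;		/* application-specific handling goes here */
	io_uring_cq_advance(ring, seen);
}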

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 src/include/liburing.h |  2 ++
 src/queue.c            | 28 +++++++++++++++++++---------
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/src/include/liburing.h b/src/include/liburing.h
index 06f4a50bacb1..6b25c358c63f 100644
--- a/src/include/liburing.h
+++ b/src/include/liburing.h
@@ -192,6 +192,8 @@ int io_uring_register_file_alloc_range(struct io_uring *ring,
 int io_uring_register_notifications(struct io_uring *ring, unsigned nr,
 				    struct io_uring_notification_slot *slots);
 int io_uring_unregister_notifications(struct io_uring *ring);
+int io_uring_get_events(struct io_uring *ring);
+int io_uring_submit_and_get_events(struct io_uring *ring);
 
 /*
  * Helper for the peek/wait single cqe functions. Exported because of that,
diff --git a/src/queue.c b/src/queue.c
index 72cc77b9f0d0..216f29a8afef 100644
--- a/src/queue.c
+++ b/src/queue.c
@@ -124,6 +124,15 @@ int __io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr,
 	return _io_uring_get_cqe(ring, cqe_ptr, &data);
 }
 
+int io_uring_get_events(struct io_uring *ring)
+{
+	int flags = IORING_ENTER_GETEVENTS;
+
+	if (ring->int_flags & INT_FLAG_REG_RING)
+		flags |= IORING_ENTER_REGISTERED_RING;
+	return __sys_io_uring_enter(ring->enter_ring_fd, 0, 0, flags, NULL);
+}
+
 /*
  * Fill in an array of IO completions up to count, if any are available.
  * Returns the amount of IO completions filled.
@@ -158,11 +167,7 @@ again:
 		goto done;
 
 	if (cq_ring_needs_flush(ring)) {
-		int flags = IORING_ENTER_GETEVENTS;
-
-		if (ring->int_flags & INT_FLAG_REG_RING)
-			flags |= IORING_ENTER_REGISTERED_RING;
-		__sys_io_uring_enter(ring->enter_ring_fd, 0, 0, flags, NULL);
+		io_uring_get_events(ring);
 		overflow_checked = true;
 		goto again;
 	}
@@ -348,14 +353,14 @@ int io_uring_wait_cqe_timeout(struct io_uring *ring,
  * Returns number of sqes submitted
  */
 static int __io_uring_submit(struct io_uring *ring, unsigned submitted,
-			     unsigned wait_nr)
+			     unsigned wait_nr, bool getevents)
 {
 	unsigned flags;
 	int ret;
 
 	flags = 0;
-	if (sq_ring_needs_enter(ring, &flags) || wait_nr) {
-		if (wait_nr || (ring->flags & IORING_SETUP_IOPOLL))
+	if (getevents || sq_ring_needs_enter(ring, &flags) || wait_nr) {
+		if (getevents || wait_nr || (ring->flags & IORING_SETUP_IOPOLL))
 			flags |= IORING_ENTER_GETEVENTS;
 		if (ring->int_flags & INT_FLAG_REG_RING)
 			flags |= IORING_ENTER_REGISTERED_RING;
@@ -370,7 +375,7 @@ static int __io_uring_submit(struct io_uring *ring, unsigned submitted,
 
 static int __io_uring_submit_and_wait(struct io_uring *ring, unsigned wait_nr)
 {
-	return __io_uring_submit(ring, __io_uring_flush_sq(ring), wait_nr);
+	return __io_uring_submit(ring, __io_uring_flush_sq(ring), wait_nr, false);
 }
 
 /*
@@ -393,6 +398,11 @@ int io_uring_submit_and_wait(struct io_uring *ring, unsigned wait_nr)
 	return __io_uring_submit_and_wait(ring, wait_nr);
 }
 
+int io_uring_submit_and_get_events(struct io_uring *ring)
+{
+	return __io_uring_submit(ring, __io_uring_flush_sq(ring), 0, true);
+}
+
 #ifdef LIBURING_INTERNAL
 struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
 {
-- 
2.30.2



* [PATCH liburing 03/11] add a t_probe_defer_taskrun helper function for tests
  2022-08-15 13:09 [PATCH liburing 00/11] Defer taskrun changes Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 01/11] Copy defer task run definition from kernel Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 02/11] add io_uring_submit_and_get_events and io_uring_get_events Dylan Yudaken
@ 2022-08-15 13:09 ` Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 04/11] update existing tests for defer taskrun Dylan Yudaken
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dylan Yudaken @ 2022-08-15 13:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Help tests determine whether they can use IORING_SETUP_DEFER_TASKRUN.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 test/helpers.c | 12 ++++++++++++
 test/helpers.h |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/test/helpers.c b/test/helpers.c
index 014653313f41..12ae727accef 100644
--- a/test/helpers.c
+++ b/test/helpers.c
@@ -235,3 +235,15 @@ errno_cleanup:
 	close(fd[1]);
 	return ret;
 }
+
+bool t_probe_defer_taskrun(void)
+{
+	struct io_uring ring;
+	int ret;
+
+	ret = io_uring_queue_init(1, &ring, IORING_SETUP_DEFER_TASKRUN);
+	if (ret < 0)
+		return false;
+	io_uring_queue_exit(&ring);
+	return true;
+}
diff --git a/test/helpers.h b/test/helpers.h
index 6d5726c9deb6..efce4b344f87 100644
--- a/test/helpers.h
+++ b/test/helpers.h
@@ -75,6 +75,8 @@ enum t_setup_ret t_register_buffers(struct io_uring *ring,
 				    const struct iovec *iovecs,
 				    unsigned nr_iovecs);
 
+bool t_probe_defer_taskrun(void);
+
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 
 #ifdef __cplusplus
-- 
2.30.2



* [PATCH liburing 04/11] update existing tests for defer taskrun
  2022-08-15 13:09 [PATCH liburing 00/11] Defer taskrun changes Dylan Yudaken
                   ` (2 preceding siblings ...)
  2022-08-15 13:09 ` [PATCH liburing 03/11] add a t_probe_defer_taskrun helper function for tests Dylan Yudaken
@ 2022-08-15 13:09 ` Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 05/11] add a defer-taskrun test Dylan Yudaken
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dylan Yudaken @ 2022-08-15 13:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Add defer_taskrun to a few choice tests that can expose some bad
behaviour.
This requires adding some io_uring_get_events calls to make sure deferred
task work is run.
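
The pattern is roughly the following (a sketch: trigger_and_flush and
pipe_wr_fd are invented for the example):

#include <unistd.h>
#include "liburing.h"

/* Sketch of the pattern added to the tests: with DEFER_TASKRUN, firing
 * an event (here, making a polled pipe readable) is not enough for the
 * CQE to appear; the task must enter the kernel so the deferred task
 * work runs.
 */
static void trigger_and_flush(struct io_uring *ring, int pipe_wr_fd)
{
	char c = 0;

	if (write(pipe_wr_fd, &c, 1) != 1)
		return;
	io_uring_get_events(ring);	/* run deferred task work now */
}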

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 test/eventfd-disable.c     | 32 ++++++++++++++++++++++---
 test/iopoll.c              | 16 +++++++++----
 test/multicqes_drain.c     | 49 +++++++++++++++++++++++++++++++++-----
 test/poll-mshot-overflow.c | 39 ++++++++++++++++++++++++++----
 test/recv-multishot.c      | 32 ++++++++++++++++---------
 test/rsrc_tags.c           | 10 ++++++--
 6 files changed, 147 insertions(+), 31 deletions(-)

diff --git a/test/eventfd-disable.c b/test/eventfd-disable.c
index 2c8cf6dad7c1..a8ecd6d0faee 100644
--- a/test/eventfd-disable.c
+++ b/test/eventfd-disable.c
@@ -15,7 +15,7 @@
 #include "liburing.h"
 #include "helpers.h"
 
-int main(int argc, char *argv[])
+static int test(bool defer)
 {
 	struct io_uring_params p = {};
 	struct io_uring_sqe *sqe;
@@ -28,8 +28,8 @@ int main(int argc, char *argv[])
 	};
 	int ret, evfd, i;
 
-	if (argc > 1)
-		return T_EXIT_SKIP;
+	if (defer)
+		p.flags |= IORING_SETUP_DEFER_TASKRUN;
 
 	ret = io_uring_queue_init_params(64, &ring, &p);
 	if (ret) {
@@ -148,5 +148,31 @@ int main(int argc, char *argv[])
 		io_uring_cqe_seen(&ring, cqe);
 	}
 
+	io_uring_queue_exit(&ring);
+	close(evfd);
 	return T_EXIT_PASS;
 }
+
+int main(int argc, char *argv[])
+{
+	int ret;
+
+	if (argc > 1)
+		return T_EXIT_SKIP;
+
+	ret = test(false);
+	if (ret != T_EXIT_PASS) {
+		fprintf(stderr, "%s: test(false) failed\n", argv[0]);
+		return ret;
+	}
+
+	if (t_probe_defer_taskrun()) {
+		ret = test(true);
+		if (ret != T_EXIT_PASS) {
+			fprintf(stderr, "%s: test(true) failed\n", argv[0]);
+			return ret;
+		}
+	}
+
+	return ret;
+}
diff --git a/test/iopoll.c b/test/iopoll.c
index 51b192f0a10e..32040d24bbda 100644
--- a/test/iopoll.c
+++ b/test/iopoll.c
@@ -274,7 +274,7 @@ ok:
 }
 
 static int test_io(const char *file, int write, int sqthread, int fixed,
-		   int buf_select)
+		   int buf_select, int defer)
 {
 	struct io_uring ring;
 	int ret, ring_flags = IORING_SETUP_IOPOLL;
@@ -282,6 +282,9 @@ static int test_io(const char *file, int write, int sqthread, int fixed,
 	if (no_iopoll)
 		return 0;
 
+	if (defer)
+		ring_flags |= IORING_SETUP_DEFER_TASKRUN;
+
 	ret = t_create_ring(64, &ring, ring_flags);
 	if (ret == T_SETUP_SKIP)
 		return 0;
@@ -337,19 +340,22 @@ int main(int argc, char *argv[])
 
 	vecs = t_create_buffers(BUFFERS, BS);
 
-	nr = 16;
+	nr = 32;
 	if (no_buf_select)
 		nr = 8;
+	else if (!t_probe_defer_taskrun())
+		nr = 16;
 	for (i = 0; i < nr; i++) {
 		int write = (i & 1) != 0;
 		int sqthread = (i & 2) != 0;
 		int fixed = (i & 4) != 0;
 		int buf_select = (i & 8) != 0;
+		int defer = (i & 16) != 0;
 
-		ret = test_io(fname, write, sqthread, fixed, buf_select);
+		ret = test_io(fname, write, sqthread, fixed, buf_select, defer);
 		if (ret) {
-			fprintf(stderr, "test_io failed %d/%d/%d/%d\n",
-				write, sqthread, fixed, buf_select);
+			fprintf(stderr, "test_io failed %d/%d/%d/%d/%d\n",
+				write, sqthread, fixed, buf_select, defer);
 			goto err;
 		}
 		if (no_iopoll)
diff --git a/test/multicqes_drain.c b/test/multicqes_drain.c
index 1423f9290ec7..c2bc2ae84672 100644
--- a/test/multicqes_drain.c
+++ b/test/multicqes_drain.c
@@ -233,6 +233,8 @@ static int test_generic_drain(struct io_uring *ring)
 
 		if (trigger_event(pipes[i]))
 			goto err;
+
+		io_uring_get_events(ring);
 	}
 	sleep(1);
 	i = 0;
@@ -246,7 +248,7 @@ static int test_generic_drain(struct io_uring *ring)
 	 * compl_bits is a bit map to record completions.
 	 * eg. sqe[0], sqe[1], sqe[2] fully completed
 	 * then compl_bits is 000...00111b
-	 * 
+	 *
 	 */
 	unsigned long long compl_bits = 0;
 	for (j = 0; j < i; j++) {
@@ -295,7 +297,12 @@ static int test_simple_drain(struct io_uring *ring)
 	io_uring_prep_poll_add(sqe[1], pipe2[0], POLLIN);
 	sqe[1]->user_data = 1;
 
-	ret = io_uring_submit(ring);
+	/* This test relies on multishot poll to trigger events continually.
+	 * However, with IORING_SETUP_DEFER_TASKRUN this will only happen when
+	 * triggered with a get_events. Hence we sprinkle get_events whenever
+	 * there might be work to process, in order to get the same result.
+	 */
+	ret = io_uring_submit_and_get_events(ring);
 	if (ret < 0) {
 		printf("sqe submit failed\n");
 		goto err;
@@ -307,9 +314,11 @@ static int test_simple_drain(struct io_uring *ring)
 	for (i = 0; i < 2; i++) {
 		if (trigger_event(pipe1))
 			goto err;
+		io_uring_get_events(ring);
 	}
 	if (trigger_event(pipe2))
 			goto err;
+	io_uring_get_events(ring);
 
 	for (i = 0; i < 2; i++) {
 		sqe[i] = io_uring_get_sqe(ring);
@@ -355,15 +364,16 @@ err:
 	return 1;
 }
 
-int main(int argc, char *argv[])
+static int test(bool defer_taskrun)
 {
 	struct io_uring ring;
 	int i, ret;
+	unsigned int flags = 0;
 
-	if (argc > 1)
-		return T_EXIT_SKIP;
+	if (defer_taskrun)
+		flags = IORING_SETUP_DEFER_TASKRUN;
 
-	ret = io_uring_queue_init(1024, &ring, 0);
+	ret = io_uring_queue_init(1024, &ring, flags);
 	if (ret) {
 		printf("ring setup failed\n");
 		return T_EXIT_FAIL;
@@ -384,5 +394,32 @@ int main(int argc, char *argv[])
 			return T_EXIT_FAIL;
 		}
 	}
+
+	io_uring_queue_exit(&ring);
+
 	return T_EXIT_PASS;
 }
+
+int main(int argc, char *argv[])
+{
+	int ret;
+
+	if (argc > 1)
+		return T_EXIT_SKIP;
+
+	ret = test(false);
+	if (ret != T_EXIT_PASS) {
+		fprintf(stderr, "%s: test(false) failed\n", argv[0]);
+		return ret;
+	}
+
+	if (t_probe_defer_taskrun()) {
+		ret = test(true);
+		if (ret != T_EXIT_PASS) {
+			fprintf(stderr, "%s: test(true) failed\n", argv[0]);
+			return ret;
+		}
+	}
+
+	return ret;
+}
diff --git a/test/poll-mshot-overflow.c b/test/poll-mshot-overflow.c
index 360df65d2b15..729f7751410b 100644
--- a/test/poll-mshot-overflow.c
+++ b/test/poll-mshot-overflow.c
@@ -42,7 +42,7 @@ int check_final_cqe(struct io_uring *ring)
 	return T_EXIT_PASS;
 }
 
-int main(int argc, char *argv[])
+static int test(bool defer_taskrun)
 {
 	struct io_uring_cqe *cqe;
 	struct io_uring_sqe *sqe;
@@ -50,9 +50,6 @@ int main(int argc, char *argv[])
 	int pipe1[2];
 	int ret, i;
 
-	if (argc > 1)
-		return 0;
-
 	if (pipe(pipe1) != 0) {
 		perror("pipe");
 		return T_EXIT_FAIL;
@@ -66,6 +63,9 @@ int main(int argc, char *argv[])
 		.cq_entries = 2
 	};
 
+	if (defer_taskrun)
+		params.flags |= IORING_SETUP_DEFER_TASKRUN;
+
 	ret = io_uring_queue_init_params(2, &ring, &params);
 	if (ret)
 		return T_EXIT_SKIP;
@@ -113,6 +113,9 @@ int main(int argc, char *argv[])
 		io_uring_cqe_seen(&ring, cqe);
 	}
 
+	/* make sure everything is processed */
+	io_uring_get_events(&ring);
+
 	/* now remove the poll */
 	sqe = io_uring_get_sqe(&ring);
 	io_uring_prep_poll_remove(sqe, 1);
@@ -126,5 +129,33 @@ int main(int argc, char *argv[])
 
 	ret = check_final_cqe(&ring);
 
+	close(pipe1[0]);
+	close(pipe1[1]);
+	io_uring_queue_exit(&ring);
+
+	return ret;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret;
+
+	if (argc > 1)
+		return T_EXIT_SKIP;
+
+	ret = test(false);
+	if (ret != T_EXIT_PASS) {
+		fprintf(stderr, "%s: test(false) failed\n", argv[0]);
+		return ret;
+	}
+
+	if (t_probe_defer_taskrun()) {
+		ret = test(true);
+		if (ret != T_EXIT_PASS) {
+			fprintf(stderr, "%s: test(true) failed\n", argv[0]);
+			return ret;
+		}
+	}
+
 	return ret;
 }
diff --git a/test/recv-multishot.c b/test/recv-multishot.c
index a322e4317232..71664ba3c0fe 100644
--- a/test/recv-multishot.c
+++ b/test/recv-multishot.c
@@ -29,6 +29,7 @@ struct args {
 	bool wait_each;
 	bool recvmsg;
 	enum early_error_t early_error;
+	bool defer;
 };
 
 static int check_sockaddr(struct sockaddr_in *in)
@@ -76,19 +77,21 @@ static int test(struct args *args)
 		.tv_sec = 1,
 	};
 	struct msghdr msg;
+	struct io_uring_params params = { };
+	int n_sqe = 32;
 
 	memset(recv_buffs, 0, sizeof(recv_buffs));
 
-	if (args->early_error == ERROR_EARLY_OVERFLOW) {
-		struct io_uring_params params = {
-			.flags = IORING_SETUP_CQSIZE,
-			.cq_entries = N_CQE_OVERFLOW
-		};
+	if (args->defer)
+		params.flags |= IORING_SETUP_DEFER_TASKRUN;
 
-		ret = io_uring_queue_init_params(N_CQE_OVERFLOW, &ring, &params);
-	} else {
-		ret = io_uring_queue_init(32, &ring, 0);
+	if (args->early_error == ERROR_EARLY_OVERFLOW) {
+		params.flags |= IORING_SETUP_CQSIZE;
+		params.cq_entries = N_CQE_OVERFLOW;
+		n_sqe = N_CQE_OVERFLOW;
 	}
+
+	ret = io_uring_queue_init_params(n_sqe, &ring, &params);
 	if (ret) {
 		fprintf(stderr, "queue init failed: %d\n", ret);
 		return ret;
@@ -457,23 +460,30 @@ int main(int argc, char *argv[])
 	int ret;
 	int loop;
 	int early_error = 0;
+	bool has_defer;
 
 	if (argc > 1)
 		return T_EXIT_SKIP;
 
-	for (loop = 0; loop < 8; loop++) {
+	has_defer = t_probe_defer_taskrun();
+
+	for (loop = 0; loop < 16; loop++) {
 		struct args a = {
 			.stream = loop & 0x01,
 			.wait_each = loop & 0x2,
 			.recvmsg = loop & 0x04,
+			.defer = loop & 0x08,
 		};
+		if (a.defer && !has_defer)
+			continue;
 		for (early_error = 0; early_error < ERROR_EARLY_LAST; early_error++) {
 			a.early_error = (enum early_error_t)early_error;
 			ret = test(&a);
 			if (ret) {
 				fprintf(stderr,
-					"test stream=%d wait_each=%d recvmsg=%d early_error=%d failed\n",
-					a.stream, a.wait_each, a.recvmsg, a.early_error);
+					"test stream=%d wait_each=%d recvmsg=%d early_error=%d "
+					" defer=%d failed\n",
+					a.stream, a.wait_each, a.recvmsg, a.early_error, a.defer);
 				return T_EXIT_FAIL;
 			}
 			if (no_recv_mshot)
diff --git a/test/rsrc_tags.c b/test/rsrc_tags.c
index ca380d84a42f..b69223a54b12 100644
--- a/test/rsrc_tags.c
+++ b/test/rsrc_tags.c
@@ -401,7 +401,8 @@ static int test_notag(void)
 
 int main(int argc, char *argv[])
 {
-	int ring_flags[] = {0, IORING_SETUP_IOPOLL, IORING_SETUP_SQPOLL};
+	int ring_flags[] = {0, IORING_SETUP_IOPOLL, IORING_SETUP_SQPOLL,
+			    IORING_SETUP_DEFER_TASKRUN};
 	int i, ret;
 
 	if (argc > 1)
@@ -423,7 +424,12 @@ int main(int argc, char *argv[])
 	}
 
 	for (i = 0; i < sizeof(ring_flags) / sizeof(ring_flags[0]); i++) {
-		ret = test_files(ring_flags[i]);
+		int flag = ring_flags[i];
+
+		if (flag & IORING_SETUP_DEFER_TASKRUN && !t_probe_defer_taskrun())
+			continue;
+
+		ret = test_files(flag);
 		if (ret) {
 			printf("test_tag failed, type %i\n", i);
 			return ret;
-- 
2.30.2



* [PATCH liburing 05/11] add a defer-taskrun test
  2022-08-15 13:09 [PATCH liburing 00/11] Defer taskrun changes Dylan Yudaken
                   ` (3 preceding siblings ...)
  2022-08-15 13:09 ` [PATCH liburing 04/11] update existing tests for defer taskrun Dylan Yudaken
@ 2022-08-15 13:09 ` Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 06/11] fix documentation shortenings Dylan Yudaken
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dylan Yudaken @ 2022-08-15 13:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Add a test specifically for IORING_SETUP_DEFER_TASKRUN

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 test/Makefile        |   1 +
 test/defer-taskrun.c | 217 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 218 insertions(+)
 create mode 100644 test/defer-taskrun.c

diff --git a/test/Makefile b/test/Makefile
index 418c11c95875..78a499a357d7 100644
--- a/test/Makefile
+++ b/test/Makefile
@@ -62,6 +62,7 @@ test_srcs := \
 	d4ae271dfaae.c \
 	d77a67ed5f27.c \
 	defer.c \
+	defer-taskrun.c \
 	double-poll-crash.c \
 	drop-submit.c \
 	eeed8b54e0df.c \
diff --git a/test/defer-taskrun.c b/test/defer-taskrun.c
new file mode 100644
index 000000000000..5ba044a0955d
--- /dev/null
+++ b/test/defer-taskrun.c
@@ -0,0 +1,217 @@
+// SPDX-License-Identifier: MIT
+#include <errno.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <string.h>
+#include <error.h>
+#include <sys/eventfd.h>
+#include <signal.h>
+#include <poll.h>
+#include <assert.h>
+#include <pthread.h>
+
+#include "liburing.h"
+#include "test.h"
+#include "helpers.h"
+
+static bool can_read(int fd)
+{
+	int ret;
+	struct pollfd p = {
+		.fd = fd,
+		.events = POLLIN,
+	};
+
+	ret = poll(&p, 1, 0);
+
+	return ret == 1;
+}
+
+static void eventfd_clear(int fd)
+{
+	uint64_t val;
+	int ret;
+
+	assert(can_read(fd));
+	ret = read(fd, &val, 8);
+	assert(ret == 8);
+}
+
+static void eventfd_trigger(int fd)
+{
+	uint64_t val = 1;
+	int ret;
+
+	ret = write(fd, &val, sizeof(val));
+	assert(ret == sizeof(val));
+}
+
+#define CHECK(x) if (!(x)) { \
+		fprintf(stderr, "%s:%d %s failed\n", __FILE__, __LINE__, #x); \
+		return -1; }
+
+static int test_eventfd(void)
+{
+	struct io_uring ring;
+	int ret;
+	int fda, fdb;
+	struct io_uring_cqe *cqe;
+
+	ret = io_uring_queue_init(8, &ring, IORING_SETUP_DEFER_TASKRUN);
+	if (ret)
+		return ret;
+
+	fda = eventfd(0, EFD_NONBLOCK);
+	fdb = eventfd(0, EFD_NONBLOCK);
+
+	CHECK(fda >= 0 && fdb >= 0);
+
+	ret = io_uring_register_eventfd(&ring, fda);
+	if (ret)
+		return ret;
+
+	CHECK(!can_read(fda));
+	CHECK(!can_read(fdb));
+
+	io_uring_prep_poll_add(io_uring_get_sqe(&ring), fdb, POLLIN);
+	io_uring_submit(&ring);
+	CHECK(!can_read(fda)); /* poll should not have completed */
+
+	io_uring_prep_nop(io_uring_get_sqe(&ring));
+	io_uring_submit(&ring);
+	CHECK(can_read(fda)); /* nop should have */
+
+	CHECK(io_uring_peek_cqe(&ring, &cqe) == 0);
+	CHECK(cqe->res == 0);
+	io_uring_cqe_seen(&ring, cqe);
+	eventfd_clear(fda);
+
+	eventfd_trigger(fdb);
+	CHECK(can_read(fda));
+
+	/* should not have processed the cqe yet */
+	CHECK(io_uring_cq_ready(&ring) == 0);
+
+	io_uring_get_events(&ring);
+	CHECK(io_uring_cq_ready(&ring) == 1);
+
+
+	io_uring_queue_exit(&ring);
+	return 0;
+}
+
+struct thread_data {
+	struct io_uring ring;
+	int efd;
+	char buff[8];
+};
+
+void *thread(void *t)
+{
+	struct thread_data *td = t;
+
+	io_uring_prep_read(io_uring_get_sqe(&td->ring), td->efd, td->buff, sizeof(td->buff), 0);
+	io_uring_submit(&td->ring);
+
+	return NULL;
+}
+
+static int test_thread_shutdown(void)
+{
+	pthread_t t1;
+	int ret;
+	struct thread_data td;
+	struct io_uring_cqe *cqe;
+	uint64_t val = 1;
+
+	ret = io_uring_queue_init(8, &td.ring, IORING_SETUP_DEFER_TASKRUN);
+	if (ret)
+		return ret;
+
+	td.efd = eventfd(0, 0);
+	CHECK(td.efd >= 0);
+
+	CHECK(pthread_create(&t1, NULL, thread, &td) == 0);
+	CHECK(pthread_join(t1, NULL) == 0);
+
+	CHECK(write(td.efd, &val, sizeof(val)) == sizeof(val));
+	CHECK(!io_uring_wait_cqe(&td.ring, &cqe));
+	CHECK(cqe->res == -ECANCELED);
+	io_uring_cqe_seen(&td.ring, cqe);
+
+	close(td.efd);
+	io_uring_queue_exit(&td.ring);
+	return 0;
+}
+
+static int test_flag(void)
+{
+	struct io_uring ring;
+	int ret;
+	int fd;
+	struct io_uring_cqe *cqe;
+
+	ret = io_uring_queue_init(8, &ring,
+				  IORING_SETUP_DEFER_TASKRUN | IORING_SETUP_TASKRUN_FLAG);
+	CHECK(!ret);
+
+	fd = eventfd(0, EFD_NONBLOCK);
+	CHECK(fd >= 0);
+
+	io_uring_prep_poll_add(io_uring_get_sqe(&ring), fd, POLLIN);
+	io_uring_submit(&ring);
+	CHECK(!can_read(fd)); /* poll should not have completed */
+
+	eventfd_trigger(fd);
+	CHECK(can_read(fd));
+
+	/* should not have processed the poll cqe yet */
+	CHECK(io_uring_cq_ready(&ring) == 0);
+
+	/* flag should be set */
+	CHECK(IO_URING_READ_ONCE(*ring.sq.kflags) & IORING_SQ_TASKRUN);
+
+	/* Specifically peek, knowing we have no cqe,
+	 * but because the flag is set, liburing should try and get more
+	 */
+	ret = io_uring_peek_cqe(&ring, &cqe);
+
+	CHECK(ret == 0 && cqe);
+	CHECK(!(IO_URING_READ_ONCE(*ring.sq.kflags) & IORING_SQ_TASKRUN));
+
+	close(fd);
+	io_uring_queue_exit(&ring);
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret;
+
+	if (argc > 1)
+		return T_EXIT_SKIP;
+
+	if (!t_probe_defer_taskrun())
+		return T_EXIT_SKIP;
+
+	ret = test_thread_shutdown();
+	if (ret) {
+		fprintf(stderr, "test_thread_shutdown failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_eventfd();
+	if (ret) {
+		fprintf(stderr, "eventfd failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	ret = test_flag();
+	if (ret) {
+		fprintf(stderr, "flag failed\n");
+		return T_EXIT_FAIL;
+	}
+
+	return T_EXIT_PASS;
+}
-- 
2.30.2



* [PATCH liburing 06/11] fix documentation shortenings
  2022-08-15 13:09 [PATCH liburing 00/11] Defer taskrun changes Dylan Yudaken
                   ` (4 preceding siblings ...)
  2022-08-15 13:09 ` [PATCH liburing 05/11] add a defer-taskrun test Dylan Yudaken
@ 2022-08-15 13:09 ` Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 07/11] update io_uring_enter.2 docs for IORING_FEAT_NODROP Dylan Yudaken
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dylan Yudaken @ 2022-08-15 13:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

'parm' was probably intended to be 'param', but 'parameter' is more
explicit, so change it.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 man/io_uring_wait_cqe.3         | 2 +-
 man/io_uring_wait_cqe_nr.3      | 2 +-
 man/io_uring_wait_cqe_timeout.3 | 2 +-
 man/io_uring_wait_cqes.3        | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/man/io_uring_wait_cqe.3 b/man/io_uring_wait_cqe.3
index c115f6fc3c59..d1d0c8d6c513 100644
--- a/man/io_uring_wait_cqe.3
+++ b/man/io_uring_wait_cqe.3
@@ -31,7 +31,7 @@ the application can retrieve the completion with
 .SH RETURN VALUE
 On success
 .BR io_uring_wait_cqe (3)
-returns 0 and the cqe_ptr parm is filled in. On failure it returns
+returns 0 and the cqe_ptr parameter is filled in. On failure it returns
 .BR -errno .
 The return value indicates the result of waiting for a CQE, and it has no
 relation to the CQE result itself.
diff --git a/man/io_uring_wait_cqe_nr.3 b/man/io_uring_wait_cqe_nr.3
index 5a4a5d587250..7b7c598d9e8e 100644
--- a/man/io_uring_wait_cqe_nr.3
+++ b/man/io_uring_wait_cqe_nr.3
@@ -34,7 +34,7 @@ the application can retrieve the completion with
 .SH RETURN VALUE
 On success
 .BR io_uring_wait_cqe_nr (3)
-returns 0 and the cqe_ptr parm is filled in. On failure it returns
+returns 0 and the cqe_ptr parameter is filled in. On failure it returns
 .BR -errno .
 The return value indicates the result of waiting for a CQE, and it has no
 relation to the CQE result itself.
diff --git a/man/io_uring_wait_cqe_timeout.3 b/man/io_uring_wait_cqe_timeout.3
index 965fc3254ac9..3d5cf3c06fd5 100644
--- a/man/io_uring_wait_cqe_timeout.3
+++ b/man/io_uring_wait_cqe_timeout.3
@@ -43,7 +43,7 @@ when waiting for a request.
 .SH RETURN VALUE
 On success
 .BR io_uring_wait_cqes (3)
-returns 0 and the cqe_ptr parm is filled in. On failure it returns
+returns 0 and the cqe_ptr parameter is filled in. On failure it returns
 .BR -errno .
 The return value indicates the result of waiting for a CQE, and it has no
 relation to the CQE result itself.
diff --git a/man/io_uring_wait_cqes.3 b/man/io_uring_wait_cqes.3
index b771ebe44331..6ec6ec2c3757 100644
--- a/man/io_uring_wait_cqes.3
+++ b/man/io_uring_wait_cqes.3
@@ -48,7 +48,7 @@ when waiting for a request.
 .SH RETURN VALUE
 On success
 .BR io_uring_wait_cqes (3)
-returns 0 and the cqe_ptr parm is filled in. On failure it returns
+returns 0 and the cqe_ptr parameter is filled in. On failure it returns
 .BR -errno .
 .SH SEE ALSO
 .BR io_uring_submit (3),
-- 
2.30.2



* [PATCH liburing 07/11] update io_uring_enter.2 docs for IORING_FEAT_NODROP
  2022-08-15 13:09 [PATCH liburing 00/11] Defer taskrun changes Dylan Yudaken
                   ` (5 preceding siblings ...)
  2022-08-15 13:09 ` [PATCH liburing 06/11] fix documentation shortenings Dylan Yudaken
@ 2022-08-15 13:09 ` Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 08/11] add docs for overflow lost errors Dylan Yudaken
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dylan Yudaken @ 2022-08-15 13:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

The EBUSY docs are out of date, so update them for the IORING_FEAT_NODROP
feature flag.
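
For illustration, an application might handle the documented behaviour
along the lines below (a sketch only; submit_retry is an invented helper):

#include <errno.h>
#include "liburing.h"

/* Hedged sketch: with IORING_FEAT_NODROP, submitting while overflowed
 * CQEs are pending can fail with -EBUSY, so reap a completion to make
 * room and retry.
 */
static int submit_retry(struct io_uring *ring)
{
	int ret;

	while ((ret = io_uring_submit(ring)) == -EBUSY) {
		struct io_uring_cqe *cqe;

		if (io_uring_wait_cqe(ring, &cqe) == 0)
			io_uring_cqe_seen(ring, cqe);
	}
	return ret;
}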

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 man/io_uring_enter.2 | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/man/io_uring_enter.2 b/man/io_uring_enter.2
index 6e3ffa0f5ba8..9ad6349b2c4a 100644
--- a/man/io_uring_enter.2
+++ b/man/io_uring_enter.2
@@ -1327,7 +1327,18 @@ is a valid file descriptor, but the io_uring ring is not in the right state
 for details on how to enable the ring.
 .TP
 .B EBUSY
-The application is attempting to overcommit the number of requests it can have
+If the
+.B IORING_FEAT_NODROP
+feature flag is set, then
+.B EBUSY
+will be returned if there were overflow entries, the
+.B IORING_ENTER_GETEVENTS
+flag is set, and not all of the overflow entries could be flushed to
+the CQ ring.
+
+Without
+.B IORING_FEAT_NODROP
+the application is attempting to overcommit the number of requests it can have
 pending. The application should wait for some completions and try again. May
 occur if the application tries to queue more requests than we have room for in
 the CQ ring, or if the application attempts to wait for more events without
-- 
2.30.2



* [PATCH liburing 08/11] add docs for overflow lost errors
  2022-08-15 13:09 [PATCH liburing 00/11] Defer taskrun changes Dylan Yudaken
                   ` (6 preceding siblings ...)
  2022-08-15 13:09 ` [PATCH liburing 07/11] update io_uring_enter.2 docs for IORING_FEAT_NODROP Dylan Yudaken
@ 2022-08-15 13:09 ` Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 09/11] expose CQ ring overflow state Dylan Yudaken
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Dylan Yudaken @ 2022-08-15 13:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Add man docs for return values indicating a CQE was lost.
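
As an illustration of what an application might do on this return value (a
sketch; wait_or_die is an invented helper, and exiting is just one policy):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include "liburing.h"

/* Hedged sketch: -EBADR from a wait means at least one CQE was lost.
 * The error state is cleared once reported; an application that cannot
 * tolerate lost completions might simply give up at that point.
 */
static int wait_or_die(struct io_uring *ring, struct io_uring_cqe **cqe)
{
	int ret = io_uring_wait_cqe(ring, cqe);

	if (ret == -EBADR) {
		fprintf(stderr, "lost CQEs, overflow count %u\n",
			*ring->cq.koverflow);
		exit(EXIT_FAILURE);
	}
	return ret;
}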

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 man/io_uring_enter.2 | 11 +++++++++++
 man/io_uring_setup.2 | 11 +++++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/man/io_uring_enter.2 b/man/io_uring_enter.2
index 9ad6349b2c4a..688e57984f00 100644
--- a/man/io_uring_enter.2
+++ b/man/io_uring_enter.2
@@ -1326,6 +1326,17 @@ is a valid file descriptor, but the io_uring ring is not in the right state
 .BR io_uring_register (2)
 for details on how to enable the ring.
 .TP
+.B EBADR
+At least one CQE was dropped even with the
+.B IORING_FEAT_NODROP
+feature, and there are no otherwise available CQEs. This clears the error state
+and so, with no other changes, the next call to
+.BR io_uring_enter (2)
+will not return this error. This error should be extremely rare and indicates
+the machine is running critically low on memory. It may be reasonable for the
+application to terminate unless it is able to safely handle any CQE
+being lost.
+.TP
 .B EBUSY
 If the
 .B IORING_FEAT_NODROP
diff --git a/man/io_uring_setup.2 b/man/io_uring_setup.2
index 0f57d8f06a35..9a543d68dd02 100644
--- a/man/io_uring_setup.2
+++ b/man/io_uring_setup.2
@@ -279,7 +279,7 @@ call. The SQEs must still be allocated separately. This brings the necessary
 calls down from three to two. Available since kernel 5.4.
 .TP
 .B IORING_FEAT_NODROP
-If this flag is set, io_uring supports never dropping completion events.
+If this flag is set, io_uring supports almost never dropping completion events.
 If a completion event occurs and the CQ ring is full, the kernel stores
 the event internally until such a time that the CQ ring has room for more
 entries. If this overflow condition is entered, attempting to submit more
@@ -287,7 +287,14 @@ IO will fail with the
 .B -EBUSY
 error value, if it can't flush the overflown events to the CQ ring. If this
 happens, the application must reap events from the CQ ring and attempt the
-submit again. Available since kernel 5.5.
+submit again. If the kernel has no free memory to store the event internally,
+this will be visible as an increase in the overflow value on the CQ ring.
+Available since kernel 5.5. Additionally
+.BR io_uring_enter (2)
+will return
+.B -EBADR
+the next time it would otherwise sleep waiting for completions (since kernel 5.19).
+
 .TP
 .B IORING_FEAT_SUBMIT_STABLE
 If this flag is set, applications can be certain that any data for
-- 
2.30.2



* [PATCH liburing 09/11] expose CQ ring overflow state
  2022-08-15 13:09 [PATCH liburing 00/11] Defer taskrun changes Dylan Yudaken
                   ` (7 preceding siblings ...)
  2022-08-15 13:09 ` [PATCH liburing 08/11] add docs for overflow lost errors Dylan Yudaken
@ 2022-08-15 13:09 ` Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 10/11] overflow: add tests Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 11/11] file-verify test: log if short read Dylan Yudaken
  10 siblings, 0 replies; 12+ messages in thread
From: Dylan Yudaken @ 2022-08-15 13:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Allow the application to easily check whether the CQ ring is in an overflow
state, in case it would like to explicitly flush using io_uring_get_events.

Explicit flushing can be useful for applications that prefer reduced
completion latency over processing as many CQEs as possible at once.
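
For example, a latency-sensitive loop might flush eagerly (a minimal
sketch; flush_if_overflowed is an invented name):

#include "liburing.h"

/* Hedged sketch: check the new flag and flush overflowed CQEs back onto
 * the CQ ring before more of them pile up kernel-side.
 */
static void flush_if_overflowed(struct io_uring *ring)
{
	if (io_uring_cq_has_overflow(ring))
		io_uring_get_events(ring);
}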

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 src/include/liburing.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/include/liburing.h b/src/include/liburing.h
index 6b25c358c63f..47668c3f26a7 100644
--- a/src/include/liburing.h
+++ b/src/include/liburing.h
@@ -1064,6 +1064,15 @@ static inline unsigned io_uring_cq_ready(const struct io_uring *ring)
 	return io_uring_smp_load_acquire(ring->cq.ktail) - *ring->cq.khead;
 }
 
+/*
+ * Returns true if there are overflow entries waiting to be flushed onto
+ * the CQ ring
+ */
+static inline bool io_uring_cq_has_overflow(const struct io_uring *ring)
+{
+	return IO_URING_READ_ONCE(*ring->sq.kflags) & IORING_SQ_CQ_OVERFLOW;
+}
+
 /*
  * Returns true if the eventfd notification is currently enabled
  */
-- 
2.30.2



* [PATCH liburing 10/11] overflow: add tests
  2022-08-15 13:09 [PATCH liburing 00/11] Defer taskrun changes Dylan Yudaken
                   ` (8 preceding siblings ...)
  2022-08-15 13:09 ` [PATCH liburing 09/11] expose CQ ring overflow state Dylan Yudaken
@ 2022-08-15 13:09 ` Dylan Yudaken
  2022-08-15 13:09 ` [PATCH liburing 11/11] file-verify test: log if short read Dylan Yudaken
  10 siblings, 0 replies; 12+ messages in thread
From: Dylan Yudaken @ 2022-08-15 13:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

Add tests that verify that overflow conditions behave appropriately.
Specifically:
 * if overflow is continually flushed, then CQEs should arrive mostly in
 order, which prevents starvation of some completions
 * if CQEs are dropped due to GFP_ATOMIC allocation failures, it is
 possible to terminate cleanly. This is not tested by default as it
 requires a debug kernel config and also has system-wide effects

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 test/cq-overflow.c | 248 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 241 insertions(+), 7 deletions(-)

diff --git a/test/cq-overflow.c b/test/cq-overflow.c
index 312b414b2a79..4265d48ae8af 100644
--- a/test/cq-overflow.c
+++ b/test/cq-overflow.c
@@ -9,6 +9,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <fcntl.h>
+#include <assert.h>
 
 #include "helpers.h"
 #include "liburing.h"
@@ -21,6 +22,32 @@ static struct iovec *vecs;
 
 #define ENTRIES	8
 
+/*
+ * io_uring has rare cases where CQEs are lost.
+ * This happens when there is no space in the CQ ring, and also there is no
+ * GFP_ATOMIC memory available. In reality this probably means that the process
+ * is about to be killed as many other things might start failing, but we still
+ * want to test that liburing and the kernel deal with this properly. The fault
+ * injection framework allows us to test this scenario. Unfortunately this
+ * requires some system-wide changes and so we do not enable this by default.
+ * The tests in this file should work in both cases (where overflows are queued
+ * and where they are dropped) on recent kernels.
+ *
+ * In order to test dropped CQEs you should enable fault injection in the kernel
+ * config:
+ *
+ * CONFIG_FAULT_INJECTION=y
+ * CONFIG_FAILSLAB=y
+ * CONFIG_FAULT_INJECTION_DEBUG_FS=y
+ *
+ * and then run the test as follows:
+ * echo Y > /sys/kernel/debug/failslab/task-filter
+ * echo 100 > /sys/kernel/debug/failslab/probability
+ * echo 0 > /sys/kernel/debug/failslab/verbose
+ * echo 100000 > /sys/kernel/debug/failslab/times
+ * bash -c "echo 1 > /proc/self/make-it-fail && exec ./cq-overflow.t"
+ */
+
 static int test_io(const char *file, unsigned long usecs, unsigned *drops, int fault)
 {
 	struct io_uring_sqe *sqe;
@@ -29,6 +56,7 @@ static int test_io(const char *file, unsigned long usecs, unsigned *drops, int f
 	unsigned reaped, total;
 	struct io_uring ring;
 	int nodrop, i, fd, ret;
+	bool cqe_dropped = false;
 
 	fd = open(file, O_RDONLY | O_DIRECT);
 	if (fd < 0) {
@@ -104,8 +132,8 @@ static int test_io(const char *file, unsigned long usecs, unsigned *drops, int f
 reap_it:
 	reaped = 0;
 	do {
-		if (nodrop) {
-			/* nodrop should never lose events */
+		if (nodrop && !cqe_dropped) {
+			/* nodrop should never lose events unless cqe_dropped */
 			if (reaped == total)
 				break;
 		} else {
@@ -113,7 +141,10 @@ reap_it:
 				break;
 		}
 		ret = io_uring_wait_cqe(&ring, &cqe);
-		if (ret) {
+		if (nodrop && ret == -EBADR) {
+			cqe_dropped = true;
+			continue;
+		} else if (ret) {
 			fprintf(stderr, "wait_cqe=%d\n", ret);
 			goto err;
 		}
@@ -133,7 +164,7 @@ reap_it:
 		goto err;
 	}
 
-	if (!nodrop) {
+	if (!nodrop || cqe_dropped) {
 		*drops = *ring.cq.koverflow;
 	} else if (*ring.cq.koverflow) {
 		fprintf(stderr, "Found %u overflows\n", *ring.cq.koverflow);
@@ -154,18 +185,29 @@ static int reap_events(struct io_uring *ring, unsigned nr_events, int do_wait)
 {
 	struct io_uring_cqe *cqe;
 	int i, ret = 0, seq = 0;
+	unsigned int start_overflow = *ring->cq.koverflow;
+	bool dropped = false;
 
 	for (i = 0; i < nr_events; i++) {
 		if (do_wait)
 			ret = io_uring_wait_cqe(ring, &cqe);
 		else
 			ret = io_uring_peek_cqe(ring, &cqe);
-		if (ret) {
+		if (do_wait && ret == -EBADR) {
+			unsigned int this_drop = *ring->cq.koverflow -
+				start_overflow;
+
+			dropped = true;
+			start_overflow = *ring->cq.koverflow;
+			assert(this_drop > 0);
+			i += (this_drop - 1);
+			continue;
+		} else if (ret) {
 			if (ret != -EAGAIN)
 				fprintf(stderr, "cqe peek failed: %d\n", ret);
 			break;
 		}
-		if (cqe->user_data != seq) {
+		if (!dropped && cqe->user_data != seq) {
 			fprintf(stderr, "cqe sequence out-of-order\n");
 			fprintf(stderr, "got %d, wanted %d\n", (int) cqe->user_data,
 					seq);
@@ -242,19 +284,211 @@ err:
 	return 1;
 }
 
+
+static void submit_one_nop(struct io_uring *ring, int ud)
+{
+	struct io_uring_sqe *sqe;
+	int ret;
+
+	sqe = io_uring_get_sqe(ring);
+	assert(sqe);
+	io_uring_prep_nop(sqe);
+	sqe->user_data = ud;
+	ret = io_uring_submit(ring);
+	assert(ret == 1);
+}
+
+/*
+ * Create an overflow condition and ensure that SQEs are still processed
+ */
+static int test_overflow_handling(
+	bool batch,
+	int cqe_multiple,
+	bool poll,
+	bool defer)
+{
+	struct io_uring ring;
+	struct io_uring_params p;
+	int ret, i, j, ud, cqe_count;
+	unsigned int count;
+	int const N = 8;
+	int const LOOPS = 128;
+	int const QUEUE_LENGTH = 1024;
+	int completions[N];
+	int queue[QUEUE_LENGTH];
+	int queued = 0;
+	int outstanding = 0;
+	bool cqe_dropped = false;
+
+	memset(&completions, 0, sizeof(int) * N);
+	memset(&p, 0, sizeof(p));
+	p.cq_entries = 2 * cqe_multiple;
+	p.flags |= IORING_SETUP_CQSIZE;
+
+	if (poll)
+		p.flags |= IORING_SETUP_IOPOLL;
+
+	if (defer)
+		p.flags |= IORING_SETUP_DEFER_TASKRUN;
+
+	ret = io_uring_queue_init_params(2, &ring, &p);
+	if (ret) {
+		fprintf(stderr, "io_uring_queue_init failed %d\n", ret);
+		return 1;
+	}
+
+	assert(p.cq_entries < N);
+	/* submit N SQEs, some should overflow */
+	for (i = 0; i < N; i++) {
+		submit_one_nop(&ring, i);
+		outstanding++;
+	}
+
+	for (i = 0; i < LOOPS; i++) {
+		struct io_uring_cqe *cqes[N];
+
+		if (io_uring_cq_has_overflow(&ring)) {
+			/*
+			 * Flush any overflowed CQEs and process those. Actively
+			 * flush these to make sure CQEs arrive in vague order
+			 * of being sent.
+			 */
+			ret = io_uring_get_events(&ring);
+			if (ret != 0) {
+				fprintf(stderr,
+					"io_uring_get_events returned %d\n",
+					ret);
+				goto err;
+			}
+		} else if (!cqe_dropped) {
+			for (j = 0; j < queued; j++) {
+				submit_one_nop(&ring, queue[j]);
+				outstanding++;
+			}
+			queued = 0;
+		}
+
+		/* We have lost some random cqes, stop if no remaining. */
+		if (cqe_dropped && outstanding == *ring.cq.koverflow)
+			break;
+
+		ret = io_uring_wait_cqe(&ring, &cqes[0]);
+		if (ret == -EBADR) {
+			cqe_dropped = true;
+			fprintf(stderr, "CQE dropped\n");
+			continue;
+		} else if (ret != 0) {
+			fprintf(stderr, "io_uring_wait_cqes failed %d\n", ret);
+			goto err;
+		}
+		cqe_count = 1;
+		if (batch) {
+			ret = io_uring_peek_batch_cqe(&ring, &cqes[0], 2);
+			if (ret < 0) {
+				fprintf(stderr,
+					"io_uring_peek_batch_cqe failed %d\n",
+					ret);
+				goto err;
+			}
+			cqe_count = ret;
+		}
+		for (j = 0; j < cqe_count; j++) {
+			assert(cqes[j]->user_data < N);
+			ud = cqes[j]->user_data;
+			completions[ud]++;
+			assert(queued < QUEUE_LENGTH);
+			queue[queued++] = (int)ud;
+		}
+		io_uring_cq_advance(&ring, cqe_count);
+		outstanding -= cqe_count;
+	}
+
+	/* See if there were any drops by flushing the CQ ring *and* overflow */
+	do {
+		struct io_uring_cqe *cqe;
+
+		ret = io_uring_get_events(&ring);
+		if (ret < 0) {
+			if (ret == -EBADR) {
+				fprintf(stderr, "CQE dropped\n");
+				cqe_dropped = true;
+				break;
+			}
+			goto err;
+		}
+		if (outstanding && !io_uring_cq_ready(&ring))
+			ret = io_uring_wait_cqe_timeout(&ring, &cqe, NULL);
+
+		if (ret && ret != -ETIME) {
+			if (ret == -EBADR) {
+				fprintf(stderr, "CQE dropped\n");
+				cqe_dropped = true;
+				break;
+			}
+			fprintf(stderr, "wait_cqe_timeout = %d\n", ret);
+			goto err;
+		}
+		count = io_uring_cq_ready(&ring);
+		io_uring_cq_advance(&ring, count);
+		outstanding -= count;
+	} while (count);
+
+	io_uring_queue_exit(&ring);
+
+	/* Make sure that completions come back in the same order they were
+	 * sent. If they come back unfairly then the counts will concentrate
+	 * on a couple of indices.
+	 */
+	for (i = 1; !cqe_dropped && i < N; i++) {
+		if (abs(completions[i] - completions[i - 1]) > 1) {
+			fprintf(
+				stderr,
+				"bad completion size %d %d\n",
+				completions[i],
+				completions[i - 1]);
+			goto err;
+		}
+	}
+	return 0;
+err:
+	io_uring_queue_exit(&ring);
+	return 1;
+}
+
 int main(int argc, char *argv[])
 {
 	const char *fname = ".cq-overflow";
 	unsigned iters, drops;
 	unsigned long usecs;
 	int ret;
+	int i;
+	bool can_defer;
 
 	if (argc > 1)
 		return T_EXIT_SKIP;
 
+	can_defer = t_probe_defer_taskrun();
+	for (i = 0; i < 16; i++) {
+		bool batch = i & 1;
+		int mult = (i & 2) ? 1 : 2;
+		bool poll = i & 4;
+		bool defer = i & 8;
+
+		if (defer && !can_defer)
+			continue;
+
+		ret = test_overflow_handling(batch, mult, poll, defer);
+		if (ret) {
+			fprintf(stderr, "test_overflow_handling("
+				"batch=%d, mult=%d, poll=%d, defer=%d) failed\n",
+				batch, mult, poll, defer);
+			goto err;
+		}
+	}
+
 	ret = test_overflow();
 	if (ret) {
-		printf("test_overflow failed\n");
+		fprintf(stderr, "test_overflow failed\n");
 		return ret;
 	}
 
-- 
2.30.2



* [PATCH liburing 11/11] file-verify test: log if short read
  2022-08-15 13:09 [PATCH liburing 00/11] Defer taskrun changes Dylan Yudaken
                   ` (9 preceding siblings ...)
  2022-08-15 13:09 ` [PATCH liburing 10/11] overflow: add tests Dylan Yudaken
@ 2022-08-15 13:09 ` Dylan Yudaken
  10 siblings, 0 replies; 12+ messages in thread
From: Dylan Yudaken @ 2022-08-15 13:09 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Kernel-team, Dylan Yudaken

This test assumes any success is a full read. For now we can just log and
fail if the read is short; if in future this wants to be handled, that
shouldn't be too much work.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
---
 test/file-verify.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/test/file-verify.c b/test/file-verify.c
index 595dafd4bd8d..3a395f9dccda 100644
--- a/test/file-verify.c
+++ b/test/file-verify.c
@@ -430,6 +430,10 @@ static int test(struct io_uring *ring, const char *fname, int buffered,
 				fprintf(stderr, "bad read %d, read %d\n", cqe->res, i);
 				goto err;
 			}
+			if (cqe->res < CHUNK_SIZE) {
+				fprintf(stderr, "short read %d, read %d\n", cqe->res, i);
+				goto err;
+			}
 			if (cqe->flags & IORING_CQE_F_BUFFER)
 				index = cqe->flags >> 16;
 			else
-- 
2.30.2


