* [PATCH v3 00/12] add large CQE support for io-uring
@ 2022-04-25 18:25 ` Stefan Roesch
  2022-04-25 18:25   ` [PATCH v3 01/12] io_uring: support CQE32 in io_uring_cqe Stefan Roesch
                     ` (13 more replies)
  0 siblings, 14 replies; 20+ messages in thread
From: Stefan Roesch @ 2022-04-25 18:25 UTC (permalink / raw)
  To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k

This adds large CQE support for io-uring. Large CQEs are 16 bytes longer than
the default 16-byte CQE. To support the longer CQEs, both the allocation and
the code that accesses the CQEs are changed.

Large CQEs are twice as big as default CQEs, so the allocation size is
doubled. The ring size calculation needs to take this into account.
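
As a rough illustration of the size calculation described above, here is a
userspace sketch. It is not the kernel's rings_size(): it models only the CQE
array and ignores the ring header and the SQ array, and the names demo_cqe and
demo_cq_array_size are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 16-byte default CQE layout. */
struct demo_cqe {
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
};

/*
 * Return the number of bytes needed for cq_entries CQEs, or SIZE_MAX if
 * the calculation would overflow. With large CQEs enabled, every entry
 * occupies two default-sized slots, so the byte count is doubled.
 */
static size_t demo_cq_array_size(size_t cq_entries, int cqe32)
{
	size_t bytes;

	if (cq_entries > SIZE_MAX / sizeof(struct demo_cqe))
		return SIZE_MAX;
	bytes = cq_entries * sizeof(struct demo_cqe);

	if (cqe32) {
		/* Doubling is a left shift by one; check it cannot wrap. */
		if (bytes > SIZE_MAX / 2)
			return SIZE_MAX;
		bytes <<= 1;
	}
	return bytes;
}
```

The overflow check before the shift plays the same role as the
check_shl_overflow() call in patch 03.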

For every access to a large CQE, the array index needs to be shifted left by 1
to take the bigger size of each CQE into account. The existing head and tail
index manipulation does not need to be changed and can stay the same.
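
The index shift can be sketched in userspace as follows. This is only a model
of the idea, assuming the ring is addressed as an array of 16-byte slots and
the entry count is a power of two; demo_cqe and demo_get_cqe are hypothetical
names, not kernel functions.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 16-byte CQE slot. */
struct demo_cqe {
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
};

/*
 * Map a CQ tail value to the slot that holds the entry. The tail is
 * masked exactly as for default CQEs; only the final array index is
 * shifted, because a large CQE spans two consecutive slots.
 */
static struct demo_cqe *demo_get_cqe(struct demo_cqe *cqes, unsigned int tail,
				     unsigned int ring_entries, int cqe32)
{
	unsigned int off = tail & (ring_entries - 1);
	unsigned int shift = cqe32 ? 1 : 0;

	return &cqes[off << shift];
}
```

With 8 ring entries and large CQEs, tail 5 maps to slot 10 and tail 9 wraps to
slot 2, while the tail counter itself advances by one per entry in both modes.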

The setup and the completion processing need to take the new fields into
account and initialize them. For the completion processing these fields need
to be passed through.

The flush completion processing needs to fill the additional CQE32 fields.

The code for overflows needs to be adapted accordingly: the allocation needs to
take large CQEs into account. This means that the order of the fields in the io
overflow structure needs to be changed and the allocation needs to be enlarged
for large CQEs. In addition, the two new fields need to be copied for large
CQEs.
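
A minimal userspace sketch of such a variable-sized overflow entry is shown
below. It assumes the CQE fields are flattened into the entry and uses a plain
next pointer instead of the kernel list head; demo_overflow_cqe and
demo_overflow_alloc are hypothetical names. The point is that the variable
part has to come last so that the allocation can grow for large CQEs.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical overflow entry: list linkage first, CQE data last. */
struct demo_overflow_cqe {
	struct demo_overflow_cqe *next;	/* stand-in for the list head */
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
	uint64_t extra[];		/* only allocated for large CQEs */
};

/*
 * Allocate an overflow entry. For default CQEs the size is unchanged;
 * for large CQEs room for the two extra 64-bit fields is added and
 * they are filled in. The caller releases the entry with free().
 */
static struct demo_overflow_cqe *demo_overflow_alloc(int cqe32,
		uint64_t user_data, int32_t res, uint32_t flags,
		uint64_t extra1, uint64_t extra2)
{
	size_t size = sizeof(struct demo_overflow_cqe);
	struct demo_overflow_cqe *ocqe;

	if (cqe32)
		size += 2 * sizeof(uint64_t);

	ocqe = malloc(size);
	if (!ocqe)
		return NULL;

	ocqe->next = NULL;
	ocqe->user_data = user_data;
	ocqe->res = res;
	ocqe->flags = flags;
	if (cqe32) {
		ocqe->extra[0] = extra1;
		ocqe->extra[1] = extra2;
	}
	return ocqe;
}
```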

The new fields are added to the tracing statements, so the extra1 and extra2
fields are exposed in tracing. The new fields are also exposed in the /proc
filesystem entry.

For testing purposes the extra1 and extra2 fields are used by the nop operation.


Testing:

The existing tests have been run with the following configurations and they all
pass:

- Default config
- Large SQE
- Large CQE
- Large SQE and large CQE.

In addition a new test has been added to liburing to verify that extra1 and extra2
are set as expected for the nop operation.

Note:
Using this patch series also requires the corresponding changes to the client
library liburing. A separate patch series is sent out for those.


Changes:
  V2: - added support for CQE32 in the /proc filesystem entry output function
      - the definition of the io_uring_cqe_extra field has been changed
        to avoid a warning with the /proc changes.
  V3: - use __u64 for big cqe in io_uring_cqe data structure
      - use io_req_complete_state helper in __io_req_complete32
      - support cached cqe's
      - use bool for cqe32 check in io_cqring_event_overflow
      - use bool for cqe32 check in __io_uring_show_fdinfo


Stefan Roesch (12):
  io_uring: support CQE32 in io_uring_cqe
  io_uring: wire up inline completion path for CQE32
  io_uring: change ring size calculation for CQE32
  io_uring: add CQE32 setup processing
  io_uring: add CQE32 completion processing
  io_uring: modify io_get_cqe for CQE32
  io_uring: flush completions for CQE32
  io_uring: overflow processing for CQE32
  io_uring: add tracing for additional CQE32 fields
  io_uring: support CQE32 in /proc info
  io_uring: enable CQE32
  io_uring: support CQE32 for nop operation

 fs/io_uring.c                   | 234 ++++++++++++++++++++++++++++----
 include/trace/events/io_uring.h |  18 ++-
 include/uapi/linux/io_uring.h   |   7 +
 3 files changed, 225 insertions(+), 34 deletions(-)


base-commit: fd1cf8f1947eb7b009eb79807ec8af0e920fc57b
-- 
2.30.2



* [PATCH v3 01/12] io_uring: support CQE32 in io_uring_cqe
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
@ 2022-04-25 18:25   ` Stefan Roesch
  2022-04-26  5:22     ` Kanchan Joshi
  2022-04-25 18:25   ` [PATCH v3 02/12] io_uring: wire up inline completion path for CQE32 Stefan Roesch
                     ` (12 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Stefan Roesch @ 2022-04-25 18:25 UTC (permalink / raw)
  To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe

This adds the big_cqe flexible array member to the structure io_uring_cqe
and the IORING_SETUP_CQE32 setup flag to support large CQEs.

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/uapi/linux/io_uring.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index ee677dbd6a6d..7020a434e3b1 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -111,6 +111,7 @@ enum {
 #define IORING_SETUP_R_DISABLED	(1U << 6)	/* start with ring disabled */
 #define IORING_SETUP_SUBMIT_ALL	(1U << 7)	/* continue submit on error */
 #define IORING_SETUP_SQE128	(1U << 8)	/* SQEs are 128b */
+#define IORING_SETUP_CQE32	(1U << 9)	/* CQEs are 32b */
 
 enum {
 	IORING_OP_NOP,
@@ -208,6 +209,12 @@ struct io_uring_cqe {
 	__u64	user_data;	/* sqe->data submission passed back */
 	__s32	res;		/* result code for this event */
 	__u32	flags;
+
+	/*
+	 * If the ring is initialized with IORING_SETUP_CQE32, then this field
+	 * contains 16-bytes of padding, doubling the size of the CQE.
+	 */
+	__u64 big_cqe[];
 };
 
 /*
-- 
2.30.2



* [PATCH v3 02/12] io_uring: wire up inline completion path for CQE32
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
  2022-04-25 18:25   ` [PATCH v3 01/12] io_uring: support CQE32 in io_uring_cqe Stefan Roesch
@ 2022-04-25 18:25   ` Stefan Roesch
  2022-04-26  5:45     ` Kanchan Joshi
  2022-04-25 18:25   ` [PATCH v3 03/12] io_uring: change ring size calculation " Stefan Roesch
                     ` (11 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Stefan Roesch @ 2022-04-25 18:25 UTC (permalink / raw)
  To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe

Rather than always use the slower locked path, wire up use of the
deferred completion path that normal CQEs can take. This reuses the
hash list node for the storage we need to hold the two 64-bit values
that must be passed back.
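
The storage reuse can be pictured with the following userspace sketch. The
types are simplified stand-ins (demo_hlist_node and demo_req are hypothetical
names), and the sketch only shows the overlay itself: a union like this works
only as long as a request never needs the hash node and the extra values at
the same time.

```c
#include <stdint.h>

/* Simplified stand-in for a hash list node: two pointers. */
struct demo_hlist_node {
	struct demo_hlist_node *next;
	struct demo_hlist_node **pprev;
};

/* Hypothetical request that overlays the extra values on the node. */
struct demo_req {
	union {
		struct demo_hlist_node hash_node;
		struct {
			uint64_t extra1;
			uint64_t extra2;
		};
	};
};

/* Stash the two 64-bit completion values for a deferred completion. */
static void demo_stash_extras(struct demo_req *req, uint64_t extra1,
			      uint64_t extra2)
{
	req->extra1 = extra1;
	req->extra2 = extra2;
}
```

On ABIs with 64-bit pointers the union adds no size to the request, since the
two pointers and the two 64-bit values occupy the same 16 bytes.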

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4c32cf987ef3..bf2b02518332 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -964,7 +964,13 @@ struct io_kiocb {
 	atomic_t			poll_refs;
 	struct io_task_work		io_task_work;
 	/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
-	struct hlist_node		hash_node;
+	union {
+		struct hlist_node	hash_node;
+		struct {
+			u64		extra1;
+			u64		extra2;
+		};
+	};
 	/* internal polling, see IORING_FEAT_FAST_POLL */
 	struct async_poll		*apoll;
 	/* opcode allocated if it needs to store data for async defer */
-- 
2.30.2



* [PATCH v3 03/12] io_uring: change ring size calculation for CQE32
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
  2022-04-25 18:25   ` [PATCH v3 01/12] io_uring: support CQE32 in io_uring_cqe Stefan Roesch
  2022-04-25 18:25   ` [PATCH v3 02/12] io_uring: wire up inline completion path for CQE32 Stefan Roesch
@ 2022-04-25 18:25   ` Stefan Roesch
  2022-04-25 18:25   ` [PATCH v3 04/12] io_uring: add CQE32 setup processing Stefan Roesch
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Stefan Roesch @ 2022-04-25 18:25 UTC (permalink / raw)
  To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe

This changes the function rings_size to take large CQEs into account.

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index bf2b02518332..9712483d3a17 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -9693,8 +9693,8 @@ static void *io_mem_alloc(size_t size)
 	return (void *) __get_free_pages(gfp, get_order(size));
 }
 
-static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
-				size_t *sq_offset)
+static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries,
+				unsigned int cq_entries, size_t *sq_offset)
 {
 	struct io_rings *rings;
 	size_t off, sq_array_size;
@@ -9702,6 +9702,10 @@ static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
 	off = struct_size(rings, cqes, cq_entries);
 	if (off == SIZE_MAX)
 		return SIZE_MAX;
+	if (ctx->flags & IORING_SETUP_CQE32) {
+		if (check_shl_overflow(off, 1, &off))
+			return SIZE_MAX;
+	}
 
 #ifdef CONFIG_SMP
 	off = ALIGN(off, SMP_CACHE_BYTES);
@@ -11365,7 +11369,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 	ctx->sq_entries = p->sq_entries;
 	ctx->cq_entries = p->cq_entries;
 
-	size = rings_size(p->sq_entries, p->cq_entries, &sq_array_offset);
+	size = rings_size(ctx, p->sq_entries, p->cq_entries, &sq_array_offset);
 	if (size == SIZE_MAX)
 		return -EOVERFLOW;
 
-- 
2.30.2



* [PATCH v3 04/12] io_uring: add CQE32 setup processing
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
                     ` (2 preceding siblings ...)
  2022-04-25 18:25   ` [PATCH v3 03/12] io_uring: change ring size calculation " Stefan Roesch
@ 2022-04-25 18:25   ` Stefan Roesch
  2022-04-25 18:25   ` [PATCH v3 05/12] io_uring: add CQE32 completion processing Stefan Roesch
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Stefan Roesch @ 2022-04-25 18:25 UTC (permalink / raw)
  To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k

This adds two new functions to set up and fill the CQE32 result structure.

Signed-off-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9712483d3a17..8cb51676d38d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2175,12 +2175,70 @@ static inline bool __io_fill_cqe_req_filled(struct io_ring_ctx *ctx,
 					req->cqe.res, req->cqe.flags);
 }
 
+static inline bool __io_fill_cqe32_req_filled(struct io_ring_ctx *ctx,
+					      struct io_kiocb *req)
+{
+	struct io_uring_cqe *cqe;
+	u64 extra1 = req->extra1;
+	u64 extra2 = req->extra2;
+
+	trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
+				req->cqe.res, req->cqe.flags);
+
+	/*
+	 * If we can't get a cq entry, userspace overflowed the
+	 * submission (by quite a lot). Increment the overflow count in
+	 * the ring.
+	 */
+	cqe = io_get_cqe(ctx);
+	if (likely(cqe)) {
+		memcpy(cqe, &req->cqe, sizeof(struct io_uring_cqe));
+		cqe->big_cqe[0] = extra1;
+		cqe->big_cqe[1] = extra2;
+		return true;
+	}
+
+	return io_cqring_event_overflow(ctx, req->cqe.user_data,
+					req->cqe.res, req->cqe.flags);
+}
+
 static inline bool __io_fill_cqe_req(struct io_kiocb *req, s32 res, u32 cflags)
 {
 	trace_io_uring_complete(req->ctx, req, req->cqe.user_data, res, cflags);
 	return __io_fill_cqe(req->ctx, req->cqe.user_data, res, cflags);
 }
 
+static inline void __io_fill_cqe32_req(struct io_kiocb *req, s32 res, u32 cflags,
+				u64 extra1, u64 extra2)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_uring_cqe *cqe;
+
+	if (WARN_ON_ONCE(!(ctx->flags & IORING_SETUP_CQE32)))
+		return;
+	if (req->flags & REQ_F_CQE_SKIP)
+		return;
+
+	trace_io_uring_complete(ctx, req, req->cqe.user_data, res, cflags);
+
+	/*
+	 * If we can't get a cq entry, userspace overflowed the
+	 * submission (by quite a lot). Increment the overflow count in
+	 * the ring.
+	 */
+	cqe = io_get_cqe(ctx);
+	if (likely(cqe)) {
+		WRITE_ONCE(cqe->user_data, req->cqe.user_data);
+		WRITE_ONCE(cqe->res, res);
+		WRITE_ONCE(cqe->flags, cflags);
+		WRITE_ONCE(cqe->big_cqe[0], extra1);
+		WRITE_ONCE(cqe->big_cqe[1], extra2);
+		return;
+	}
+
+	io_cqring_event_overflow(ctx, req->cqe.user_data, res, cflags);
+}
+
 static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data,
 				     s32 res, u32 cflags)
 {
-- 
2.30.2



* [PATCH v3 05/12] io_uring: add CQE32 completion processing
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
                     ` (3 preceding siblings ...)
  2022-04-25 18:25   ` [PATCH v3 04/12] io_uring: add CQE32 setup processing Stefan Roesch
@ 2022-04-25 18:25   ` Stefan Roesch
  2022-04-25 18:25   ` [PATCH v3 06/12] io_uring: modify io_get_cqe for CQE32 Stefan Roesch
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Stefan Roesch @ 2022-04-25 18:25 UTC (permalink / raw)
  To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe

This adds the completion processing for large CQEs and makes sure
that the extra1 and extra2 fields are passed through.

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 53 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 45 insertions(+), 8 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8cb51676d38d..f300130fd9f0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2247,18 +2247,15 @@ static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data,
 	return __io_fill_cqe(ctx, user_data, res, cflags);
 }
 
-static void __io_req_complete_post(struct io_kiocb *req, s32 res,
-				   u32 cflags)
+static void __io_req_complete_put(struct io_kiocb *req)
 {
-	struct io_ring_ctx *ctx = req->ctx;
-
-	if (!(req->flags & REQ_F_CQE_SKIP))
-		__io_fill_cqe_req(req, res, cflags);
 	/*
 	 * If we're the last reference to this request, add to our locked
 	 * free_list cache.
 	 */
 	if (req_ref_put_and_test(req)) {
+		struct io_ring_ctx *ctx = req->ctx;
+
 		if (req->flags & IO_REQ_LINK_FLAGS) {
 			if (req->flags & IO_DISARM_MASK)
 				io_disarm_next(req);
@@ -2281,8 +2278,23 @@ static void __io_req_complete_post(struct io_kiocb *req, s32 res,
 	}
 }
 
-static void io_req_complete_post(struct io_kiocb *req, s32 res,
-				 u32 cflags)
+static void __io_req_complete_post(struct io_kiocb *req, s32 res,
+				   u32 cflags)
+{
+	if (!(req->flags & REQ_F_CQE_SKIP))
+		__io_fill_cqe_req(req, res, cflags);
+	__io_req_complete_put(req);
+}
+
+static void __io_req_complete_post32(struct io_kiocb *req, s32 res,
+				   u32 cflags, u64 extra1, u64 extra2)
+{
+	if (!(req->flags & REQ_F_CQE_SKIP))
+		__io_fill_cqe32_req(req, res, cflags, extra1, extra2);
+	__io_req_complete_put(req);
+}
+
+static void io_req_complete_post(struct io_kiocb *req, s32 res, u32 cflags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 
@@ -2293,6 +2305,18 @@ static void io_req_complete_post(struct io_kiocb *req, s32 res,
 	io_cqring_ev_posted(ctx);
 }
 
+static void io_req_complete_post32(struct io_kiocb *req, s32 res,
+				   u32 cflags, u64 extra1, u64 extra2)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+
+	spin_lock(&ctx->completion_lock);
+	__io_req_complete_post32(req, res, cflags, extra1, extra2);
+	io_commit_cqring(ctx);
+	spin_unlock(&ctx->completion_lock);
+	io_cqring_ev_posted(ctx);
+}
+
 static inline void io_req_complete_state(struct io_kiocb *req, s32 res,
 					 u32 cflags)
 {
@@ -2310,6 +2334,19 @@ static inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags,
 		io_req_complete_post(req, res, cflags);
 }
 
+static inline void __io_req_complete32(struct io_kiocb *req,
+				       unsigned int issue_flags, s32 res,
+				       u32 cflags, u64 extra1, u64 extra2)
+{
+	if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
+		io_req_complete_state(req, res, cflags);
+		req->extra1 = extra1;
+		req->extra2 = extra2;
+	} else {
+		io_req_complete_post32(req, res, cflags, extra1, extra2);
+	}
+}
+
 static inline void io_req_complete(struct io_kiocb *req, s32 res)
 {
 	__io_req_complete(req, 0, res, 0);
-- 
2.30.2



* [PATCH v3 06/12] io_uring: modify io_get_cqe for CQE32
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
                     ` (4 preceding siblings ...)
  2022-04-25 18:25   ` [PATCH v3 05/12] io_uring: add CQE32 completion processing Stefan Roesch
@ 2022-04-25 18:25   ` Stefan Roesch
  2022-04-25 18:25   ` [PATCH v3 07/12] io_uring: flush completions " Stefan Roesch
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Stefan Roesch @ 2022-04-25 18:25 UTC (permalink / raw)
  To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k

Modify accesses to the CQE array to take large CQEs into account. The
index needs to be shifted left by one for large CQEs.

Signed-off-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f300130fd9f0..726238dc65dc 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1909,8 +1909,12 @@ static noinline struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx)
 {
 	struct io_rings *rings = ctx->rings;
 	unsigned int off = ctx->cached_cq_tail & (ctx->cq_entries - 1);
+	unsigned int shift = 0;
 	unsigned int free, queued, len;
 
+	if (ctx->flags & IORING_SETUP_CQE32)
+		shift = 1;
+
 	/* userspace may cheat modifying the tail, be safe and do min */
 	queued = min(__io_cqring_events(ctx), ctx->cq_entries);
 	free = ctx->cq_entries - queued;
@@ -1922,15 +1926,26 @@ static noinline struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx)
 	ctx->cached_cq_tail++;
 	ctx->cqe_cached = &rings->cqes[off];
 	ctx->cqe_sentinel = ctx->cqe_cached + len;
-	return ctx->cqe_cached++;
+	ctx->cqe_cached++;
+	return &rings->cqes[off << shift];
 }
 
 static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
 {
 	if (likely(ctx->cqe_cached < ctx->cqe_sentinel)) {
+		struct io_uring_cqe *cqe = ctx->cqe_cached;
+
+		if (ctx->flags & IORING_SETUP_CQE32) {
+			unsigned int off = ctx->cqe_cached - ctx->rings->cqes;
+
+			cqe += off;
+		}
+
 		ctx->cached_cq_tail++;
-		return ctx->cqe_cached++;
+		ctx->cqe_cached++;
+		return cqe;
 	}
+
 	return __io_get_cqe(ctx);
 }
 
-- 
2.30.2



* [PATCH v3 07/12] io_uring: flush completions for CQE32
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
                     ` (5 preceding siblings ...)
  2022-04-25 18:25   ` [PATCH v3 06/12] io_uring: modify io_get_cqe for CQE32 Stefan Roesch
@ 2022-04-25 18:25   ` Stefan Roesch
  2022-04-25 18:25   ` [PATCH v3 08/12] io_uring: overflow processing " Stefan Roesch
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Stefan Roesch @ 2022-04-25 18:25 UTC (permalink / raw)
  To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k

This flushes the completions according to their CQE type: the processing
is unchanged for the default CQE size, but for large CQEs the
extra1 and extra2 fields are filled in as well.

Signed-off-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 726238dc65dc..68b61d2b356d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2885,8 +2885,12 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
 			struct io_kiocb *req = container_of(node, struct io_kiocb,
 						    comp_list);
 
-			if (!(req->flags & REQ_F_CQE_SKIP))
-				__io_fill_cqe_req_filled(ctx, req);
+			if (!(req->flags & REQ_F_CQE_SKIP)) {
+				if (!(ctx->flags & IORING_SETUP_CQE32))
+					__io_fill_cqe_req_filled(ctx, req);
+				else
+					__io_fill_cqe32_req_filled(ctx, req);
+			}
 		}
 
 		io_commit_cqring(ctx);
-- 
2.30.2



* [PATCH v3 08/12] io_uring: overflow processing for CQE32
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
                     ` (6 preceding siblings ...)
  2022-04-25 18:25   ` [PATCH v3 07/12] io_uring: flush completions " Stefan Roesch
@ 2022-04-25 18:25   ` Stefan Roesch
  2022-04-26  6:28     ` Kanchan Joshi
  2022-04-25 18:25   ` [PATCH v3 09/12] io_uring: add tracing for additional CQE32 fields Stefan Roesch
                     ` (5 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Stefan Roesch @ 2022-04-25 18:25 UTC (permalink / raw)
  To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe

This adds the overflow processing for large CQEs.

This adds two parameters to the io_cqring_event_overflow function and
uses them to initialize the large CQE fields.

Allocate enough space for large CQEs in the overflow structure. If no
large CQEs are used, the size of the allocation is unchanged.

The cqe field can have a different size depending on whether it is a
large CQE or not. To be able to allocate different sizes, the two fields
in the structure are re-ordered so that the cqe field comes last.

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 68b61d2b356d..3630671325ea 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -220,8 +220,8 @@ struct io_mapped_ubuf {
 struct io_ring_ctx;
 
 struct io_overflow_cqe {
-	struct io_uring_cqe cqe;
 	struct list_head list;
+	struct io_uring_cqe cqe;
 };
 
 struct io_fixed_file {
@@ -2017,10 +2017,14 @@ static void io_cqring_ev_posted_iopoll(struct io_ring_ctx *ctx)
 static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 {
 	bool all_flushed, posted;
+	size_t cqe_size = sizeof(struct io_uring_cqe);
 
 	if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
 		return false;
 
+	if (ctx->flags & IORING_SETUP_CQE32)
+		cqe_size <<= 1;
+
 	posted = false;
 	spin_lock(&ctx->completion_lock);
 	while (!list_empty(&ctx->cq_overflow_list)) {
@@ -2032,7 +2036,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 		ocqe = list_first_entry(&ctx->cq_overflow_list,
 					struct io_overflow_cqe, list);
 		if (cqe)
-			memcpy(cqe, &ocqe->cqe, sizeof(*cqe));
+			memcpy(cqe, &ocqe->cqe, cqe_size);
 		else
 			io_account_cq_overflow(ctx);
 
@@ -2121,11 +2125,16 @@ static __cold void io_uring_drop_tctx_refs(struct task_struct *task)
 }
 
 static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data,
-				     s32 res, u32 cflags)
+				     s32 res, u32 cflags, u64 extra1, u64 extra2)
 {
 	struct io_overflow_cqe *ocqe;
+	size_t ocq_size = sizeof(struct io_overflow_cqe);
+	bool is_cqe32 = (ctx->flags & IORING_SETUP_CQE32);
+
+	if (is_cqe32)
+		ocq_size += sizeof(struct io_uring_cqe);
 
-	ocqe = kmalloc(sizeof(*ocqe), GFP_ATOMIC | __GFP_ACCOUNT);
+	ocqe = kmalloc(ocq_size, GFP_ATOMIC | __GFP_ACCOUNT);
 	if (!ocqe) {
 		/*
 		 * If we're in ring overflow flush mode, or in task cancel mode,
@@ -2144,6 +2153,10 @@ static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data,
 	ocqe->cqe.user_data = user_data;
 	ocqe->cqe.res = res;
 	ocqe->cqe.flags = cflags;
+	if (is_cqe32) {
+		ocqe->cqe.big_cqe[0] = extra1;
+		ocqe->cqe.big_cqe[1] = extra2;
+	}
 	list_add_tail(&ocqe->list, &ctx->cq_overflow_list);
 	return true;
 }
@@ -2165,7 +2178,7 @@ static inline bool __io_fill_cqe(struct io_ring_ctx *ctx, u64 user_data,
 		WRITE_ONCE(cqe->flags, cflags);
 		return true;
 	}
-	return io_cqring_event_overflow(ctx, user_data, res, cflags);
+	return io_cqring_event_overflow(ctx, user_data, res, cflags, 0, 0);
 }
 
 static inline bool __io_fill_cqe_req_filled(struct io_ring_ctx *ctx,
@@ -2187,7 +2200,7 @@ static inline bool __io_fill_cqe_req_filled(struct io_ring_ctx *ctx,
 		return true;
 	}
 	return io_cqring_event_overflow(ctx, req->cqe.user_data,
-					req->cqe.res, req->cqe.flags);
+					req->cqe.res, req->cqe.flags, 0, 0);
 }
 
 static inline bool __io_fill_cqe32_req_filled(struct io_ring_ctx *ctx,
@@ -2213,8 +2226,8 @@ static inline bool __io_fill_cqe32_req_filled(struct io_ring_ctx *ctx,
 		return true;
 	}
 
-	return io_cqring_event_overflow(ctx, req->cqe.user_data,
-					req->cqe.res, req->cqe.flags);
+	return io_cqring_event_overflow(ctx, req->cqe.user_data, req->cqe.res,
+					req->cqe.flags, extra1, extra2);
 }
 
 static inline bool __io_fill_cqe_req(struct io_kiocb *req, s32 res, u32 cflags)
@@ -2251,7 +2264,7 @@ static inline void __io_fill_cqe32_req(struct io_kiocb *req, s32 res, u32 cflags
 		return;
 	}
 
-	io_cqring_event_overflow(ctx, req->cqe.user_data, res, cflags);
+	io_cqring_event_overflow(ctx, req->cqe.user_data, res, cflags, extra1, extra2);
 }
 
 static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data,
-- 
2.30.2



* [PATCH v3 09/12] io_uring: add tracing for additional CQE32 fields
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
                     ` (7 preceding siblings ...)
  2022-04-25 18:25   ` [PATCH v3 08/12] io_uring: overflow processing " Stefan Roesch
@ 2022-04-25 18:25   ` Stefan Roesch
  2022-04-25 18:25   ` [PATCH v3 10/12] io_uring: support CQE32 in /proc info Stefan Roesch
                     ` (4 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Stefan Roesch @ 2022-04-25 18:25 UTC (permalink / raw)
  To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe

This adds tracing for the extra1 and extra2 fields.

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c                   | 11 ++++++-----
 include/trace/events/io_uring.h | 18 ++++++++++++++----
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3630671325ea..9dd075e39850 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2187,7 +2187,7 @@ static inline bool __io_fill_cqe_req_filled(struct io_ring_ctx *ctx,
 	struct io_uring_cqe *cqe;
 
 	trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
-				req->cqe.res, req->cqe.flags);
+				req->cqe.res, req->cqe.flags, 0, 0);
 
 	/*
 	 * If we can't get a cq entry, userspace overflowed the
@@ -2211,7 +2211,7 @@ static inline bool __io_fill_cqe32_req_filled(struct io_ring_ctx *ctx,
 	u64 extra2 = req->extra2;
 
 	trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
-				req->cqe.res, req->cqe.flags);
+				req->cqe.res, req->cqe.flags, extra1, extra2);
 
 	/*
 	 * If we can't get a cq entry, userspace overflowed the
@@ -2232,7 +2232,7 @@ static inline bool __io_fill_cqe32_req_filled(struct io_ring_ctx *ctx,
 
 static inline bool __io_fill_cqe_req(struct io_kiocb *req, s32 res, u32 cflags)
 {
-	trace_io_uring_complete(req->ctx, req, req->cqe.user_data, res, cflags);
+	trace_io_uring_complete(req->ctx, req, req->cqe.user_data, res, cflags, 0, 0);
 	return __io_fill_cqe(req->ctx, req->cqe.user_data, res, cflags);
 }
 
@@ -2247,7 +2247,8 @@ static inline void __io_fill_cqe32_req(struct io_kiocb *req, s32 res, u32 cflags
 	if (req->flags & REQ_F_CQE_SKIP)
 		return;
 
-	trace_io_uring_complete(ctx, req, req->cqe.user_data, res, cflags);
+	trace_io_uring_complete(ctx, req, req->cqe.user_data, res, cflags,
+				extra1, extra2);
 
 	/*
 	 * If we can't get a cq entry, userspace overflowed the
@@ -2271,7 +2272,7 @@ static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data,
 				     s32 res, u32 cflags)
 {
 	ctx->cq_extra++;
-	trace_io_uring_complete(ctx, NULL, user_data, res, cflags);
+	trace_io_uring_complete(ctx, NULL, user_data, res, cflags, 0, 0);
 	return __io_fill_cqe(ctx, user_data, res, cflags);
 }
 
diff --git a/include/trace/events/io_uring.h b/include/trace/events/io_uring.h
index 8477414d6d06..2eb4f4e47de4 100644
--- a/include/trace/events/io_uring.h
+++ b/include/trace/events/io_uring.h
@@ -318,13 +318,16 @@ TRACE_EVENT(io_uring_fail_link,
  * @user_data:		user data associated with the request
  * @res:		result of the request
  * @cflags:		completion flags
+ * @extra1:		extra 64-bit data for CQE32
+ * @extra2:		extra 64-bit data for CQE32
  *
  */
 TRACE_EVENT(io_uring_complete,
 
-	TP_PROTO(void *ctx, void *req, u64 user_data, int res, unsigned cflags),
+	TP_PROTO(void *ctx, void *req, u64 user_data, int res, unsigned cflags,
+		 u64 extra1, u64 extra2),
 
-	TP_ARGS(ctx, req, user_data, res, cflags),
+	TP_ARGS(ctx, req, user_data, res, cflags, extra1, extra2),
 
 	TP_STRUCT__entry (
 		__field(  void *,	ctx		)
@@ -332,6 +335,8 @@ TRACE_EVENT(io_uring_complete,
 		__field(  u64,		user_data	)
 		__field(  int,		res		)
 		__field(  unsigned,	cflags		)
+		__field(  u64,		extra1		)
+		__field(  u64,		extra2		)
 	),
 
 	TP_fast_assign(
@@ -340,12 +345,17 @@ TRACE_EVENT(io_uring_complete,
 		__entry->user_data	= user_data;
 		__entry->res		= res;
 		__entry->cflags		= cflags;
+		__entry->extra1		= extra1;
+		__entry->extra2		= extra2;
 	),
 
-	TP_printk("ring %p, req %p, user_data 0x%llx, result %d, cflags 0x%x",
+	TP_printk("ring %p, req %p, user_data 0x%llx, result %d, cflags 0x%x "
+		  "extra1 %llu extra2 %llu ",
 		__entry->ctx, __entry->req,
 		__entry->user_data,
-		__entry->res, __entry->cflags)
+		__entry->res, __entry->cflags,
+		(unsigned long long) __entry->extra1,
+		(unsigned long long) __entry->extra2)
 );
 
 /**
-- 
2.30.2



* [PATCH v3 10/12] io_uring: support CQE32 in /proc info
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
                     ` (8 preceding siblings ...)
  2022-04-25 18:25   ` [PATCH v3 09/12] io_uring: add tracing for additional CQE32 fields Stefan Roesch
@ 2022-04-25 18:25   ` Stefan Roesch
  2022-04-25 18:25   ` [PATCH v3 11/12] io_uring: enable CQE32 Stefan Roesch
                     ` (3 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Stefan Roesch @ 2022-04-25 18:25 UTC (permalink / raw)
  To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k

This exposes the extra1 and extra2 fields in the /proc output.

Signed-off-by: Stefan Roesch <shr@fb.com>
---
 fs/io_uring.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9dd075e39850..e1b84204b0ab 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -11354,10 +11354,15 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 	unsigned int sq_tail = READ_ONCE(r->sq.tail);
 	unsigned int cq_head = READ_ONCE(r->cq.head);
 	unsigned int cq_tail = READ_ONCE(r->cq.tail);
+	unsigned int cq_shift = 0;
 	unsigned int sq_entries, cq_entries;
 	bool has_lock;
+	bool is_cqe32 = (ctx->flags & IORING_SETUP_CQE32);
 	unsigned int i;
 
+	if (is_cqe32)
+		cq_shift = 1;
+
 	/*
 	 * we may get imprecise sqe and cqe info if uring is actively running
 	 * since we get cached_sq_head and cached_cq_tail without uring_lock
@@ -11390,11 +11395,18 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 	cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
 	for (i = 0; i < cq_entries; i++) {
 		unsigned int entry = i + cq_head;
-		struct io_uring_cqe *cqe = &r->cqes[entry & cq_mask];
+		struct io_uring_cqe *cqe = &r->cqes[(entry & cq_mask) << cq_shift];
 
-		seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x\n",
+		if (!is_cqe32) {
+			seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x\n",
 			   entry & cq_mask, cqe->user_data, cqe->res,
 			   cqe->flags);
+		} else {
+			seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x, "
+				"extra1:%llu, extra2:%llu\n",
+				entry & cq_mask, cqe->user_data, cqe->res,
+				cqe->flags, cqe->big_cqe[0], cqe->big_cqe[1]);
+		}
 	}
 
 	/*
-- 
2.30.2
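
The index arithmetic used in the fdinfo loop above can be tried outside the kernel. The following is a minimal sketch, not kernel code: demo_cqe_index is a hypothetical helper that assumes cq_shift is 0 on a normal ring and 1 on a CQE32 ring, as in the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: array index of the CQE for ring position 'entry'.
 * The ring position is masked first, then shifted, so on a CQE32 ring
 * (cq_shift == 1) every logical entry occupies two 16-byte slots.
 */
static unsigned int demo_cqe_index(unsigned int entry, unsigned int cq_mask,
				   unsigned int cq_shift)
{
	return (entry & cq_mask) << cq_shift;
}
```

With an 8-entry ring (cq_mask == 7), position 5 maps to slot 5 on a normal ring and to slot 10 on a CQE32 ring. The head and tail counters themselves are not scaled, which is why the existing index manipulation can stay unchanged.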



* [PATCH v3 11/12] io_uring: enable CQE32
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
                     ` (9 preceding siblings ...)
  2022-04-25 18:25   ` [PATCH v3 10/12] io_uring: support CQE32 in /proc info Stefan Roesch
@ 2022-04-25 18:25   ` Stefan Roesch
  2022-04-25 18:25   ` [PATCH v3 12/12] io_uring: support CQE32 for nop operation Stefan Roesch
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Stefan Roesch @ 2022-04-25 18:25 UTC (permalink / raw)
  To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe

This enables large CQEs in the io_uring setup by accepting the
IORING_SETUP_CQE32 flag.

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e1b84204b0ab..caeddcf8a61c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -11752,7 +11752,7 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 			IORING_SETUP_SQ_AFF | IORING_SETUP_CQSIZE |
 			IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
 			IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
-			IORING_SETUP_SQE128))
+			IORING_SETUP_SQE128 | IORING_SETUP_CQE32))
 		return -EINVAL;
 
 	return  io_uring_create(entries, &p, params);
-- 
2.30.2
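
The one-line change above works because setup rejects any flag bit that is not in the accepted mask. A minimal sketch of that pattern follows. The DEMO_* names and bit values are illustrative assumptions, not the values from the uapi header.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative flag bits only; the real values live in the uapi header */
#define DEMO_SETUP_SQE128	(1U << 0)
#define DEMO_SETUP_CQE32	(1U << 1)
#define DEMO_SETUP_VALID	(DEMO_SETUP_SQE128 | DEMO_SETUP_CQE32)

/* Return 0 if only known flags are set, -EINVAL otherwise */
static int demo_check_setup_flags(uint32_t flags)
{
	if (flags & ~DEMO_SETUP_VALID)
		return -EINVAL;
	return 0;
}
```

Until a flag is added to the mask, callers that pass it get -EINVAL, so the feature stays unreachable while the earlier patches in the series are applied.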



* [PATCH v3 12/12] io_uring: support CQE32 for nop operation
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
                     ` (10 preceding siblings ...)
  2022-04-25 18:25   ` [PATCH v3 11/12] io_uring: enable CQE32 Stefan Roesch
@ 2022-04-25 18:25   ` Stefan Roesch
  2022-04-25 18:38   ` [PATCH v3 00/12] add large CQE support for io-uring Jens Axboe
  2022-04-26 11:37   ` Kanchan Joshi
  13 siblings, 0 replies; 20+ messages in thread
From: Stefan Roesch @ 2022-04-25 18:25 UTC (permalink / raw)
  To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe

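This adds support for filling the extra1 and extra2 fields of large
CQEs in the nop operation. On a CQE32 ring, the addr and addr2 values
from the SQE are relayed back in these fields.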

Co-developed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index caeddcf8a61c..9e1fb8be9687 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -744,6 +744,12 @@ struct io_msg {
 	u32 len;
 };
 
+struct io_nop {
+	struct file			*file;
+	u64				extra1;
+	u64				extra2;
+};
+
 struct io_async_connect {
 	struct sockaddr_storage		address;
 };
@@ -937,6 +943,7 @@ struct io_kiocb {
 		struct io_msg		msg;
 		struct io_xattr		xattr;
 		struct io_socket	sock;
+		struct io_nop		nop;
 	};
 
 	u8				opcode;
@@ -4872,6 +4879,19 @@ static int io_splice(struct io_kiocb *req, unsigned int issue_flags)
 	return 0;
 }
 
+static int io_nop_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	/*
+	 * If the ring is set up with CQE32, relay back addr/addr2
+	 */
+	if (req->ctx->flags & IORING_SETUP_CQE32) {
+		req->nop.extra1 = READ_ONCE(sqe->addr);
+		req->nop.extra2 = READ_ONCE(sqe->addr2);
+	}
+
+	return 0;
+}
+
 /*
  * IORING_OP_NOP just posts a completion event, nothing else.
  */
@@ -4882,7 +4902,11 @@ static int io_nop(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
 		return -EINVAL;
 
-	__io_req_complete(req, issue_flags, 0, 0);
+	if (!(ctx->flags & IORING_SETUP_CQE32))
+		__io_req_complete(req, issue_flags, 0, 0);
+	else
+		__io_req_complete32(req, issue_flags, 0, 0, req->nop.extra1,
+					req->nop.extra2);
 	return 0;
 }
 
@@ -7354,7 +7378,7 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	switch (req->opcode) {
 	case IORING_OP_NOP:
-		return 0;
+		return io_nop_prep(req, sqe);
 	case IORING_OP_READV:
 	case IORING_OP_READ_FIXED:
 	case IORING_OP_READ:
-- 
2.30.2
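
The relay behaviour of the nop operation can be sketched in isolation. This is a simplified userspace model under stated assumptions: all demo_* types and functions are hypothetical stand-ins, and the locking, READ_ONCE and completion plumbing of the real code are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for the SQE, request and large CQE */
struct demo_sqe   { uint64_t addr; uint64_t addr2; };
struct demo_req   { bool cqe32; uint64_t extra1; uint64_t extra2; };
struct demo_cqe32 {
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
	uint64_t big_cqe[2];
};

/* Prep step: capture addr/addr2 only when the ring uses large CQEs */
static void demo_nop_prep(struct demo_req *req, const struct demo_sqe *sqe)
{
	if (req->cqe32) {
		req->extra1 = sqe->addr;
		req->extra2 = sqe->addr2;
	}
}

/* Completion step: a nop always completes with res == 0 */
static void demo_nop_complete(const struct demo_req *req,
			      struct demo_cqe32 *cqe)
{
	cqe->res = 0;
	cqe->flags = 0;
	if (req->cqe32) {
		cqe->big_cqe[0] = req->extra1;
		cqe->big_cqe[1] = req->extra2;
	}
}
```

A test can submit a nop with known addr and addr2 values and check that the same values come back in the two extra words, which is how the series uses nop for testing.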



* Re: [PATCH v3 00/12] add large CQE support for io-uring
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
                     ` (11 preceding siblings ...)
  2022-04-25 18:25   ` [PATCH v3 12/12] io_uring: support CQE32 for nop operation Stefan Roesch
@ 2022-04-25 18:38   ` Jens Axboe
  2022-04-26 11:37   ` Kanchan Joshi
  13 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2022-04-25 18:38 UTC (permalink / raw)
  To: io-uring, kernel-team, shr, linux-nvme; +Cc: joshi.k

On Mon, 25 Apr 2022 11:25:18 -0700, Stefan Roesch wrote:
> This adds the large CQE support for io-uring. Large CQE's are 16 bytes longer.
> To support the longer CQE's the allocation part is changed and when the CQE is
> accessed.
> 
> The allocation of the large CQE's is twice as big, so the allocation size is
> doubled. The ring size calculation needs to take this into account.
> 
> [...]

Applied, thanks!

[01/12] io_uring: support CQE32 in io_uring_cqe
        commit: fd5bd0a6ce17d29a41a05e1a90e0bf9589afcc61
[02/12] io_uring: wire up inline completion path for CQE32
        commit: f867ab4b4ff36109c62e2babcbbfb28937409d3a
[03/12] io_uring: change ring size calculation for CQE32
        commit: 279480550322febcceeecc3ca655fb04f3783c43
[04/12] io_uring: add CQE32 setup processing
        commit: 823d4b0ba7cd3c3fa3c3f2578517cf6ec1cbd932
[05/12] io_uring: add CQE32 completion processing
        commit: e9ba19e1015db1f874d15b3cc6d96a4b0420e647
[06/12] io_uring: modify io_get_cqe for CQE32
        commit: e2caab09ddfc573fd89fa77a5963577f6c7331d8
[07/12] io_uring: flush completions for CQE32
        commit: 0f5ddaf0afb7ca17d645ddba4ad866ce845028a3
[08/12] io_uring: overflow processing for CQE32
        commit: e440146360bac2740298c46e1d26802a8006d18f
[09/12] io_uring: add tracing for additional CQE32 fields
        commit: 0db691c0a5959c1e412d9237449c56b345777e57
[10/12] io_uring: support CQE32 in /proc info
        commit: 3b5a857e9998e18d970496b8989cd73c8214bb57
[11/12] io_uring: enable CQE32
        commit: 3b27f0e387239593c3074f8f9bcefea05b25ab7e
[12/12] io_uring: support CQE32 for nop operation
        commit: c5eb9a698f2a082cdfbfdc0b32ed8d855bc6040e

Best regards,
-- 
Jens Axboe




* Re: [PATCH v3 01/12] io_uring: support CQE32 in io_uring_cqe
  2022-04-25 18:25   ` [PATCH v3 01/12] io_uring: support CQE32 in io_uring_cqe Stefan Roesch
@ 2022-04-26  5:22     ` Kanchan Joshi
  0 siblings, 0 replies; 20+ messages in thread
From: Kanchan Joshi @ 2022-04-26  5:22 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: io-uring, linux-nvme, kernel-team, Jens Axboe

On Mon, Apr 25, 2022 at 11:25:19AM -0700, Stefan Roesch wrote:
>This adds the struct io_uring_cqe_extra in the structure io_uring_cqe to
>support large CQE's.
>
Since we decided to drop that and now use "__u64 big_cqe[]" instead,
this commit message can be refreshed too.


* Re: [PATCH v3 02/12] io_uring: wire up inline completion path for CQE32
  2022-04-25 18:25   ` [PATCH v3 02/12] io_uring: wire up inline completion path for CQE32 Stefan Roesch
@ 2022-04-26  5:45     ` Kanchan Joshi
  0 siblings, 0 replies; 20+ messages in thread
From: Kanchan Joshi @ 2022-04-26  5:45 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: io-uring, linux-nvme, kernel-team, Jens Axboe

On Mon, Apr 25, 2022 at 11:25:20AM -0700, Stefan Roesch wrote:
>Rather than always use the slower locked path, wire up use of the
>deferred completion path that normal CQEs can take. 
This patch does not do that; patch 5 contains what is described here.
A bit of rewording would clear up the commit message.

>This reuses the
>hash list node for the storage we need to hold the two 64-bit values
>that must be passed back.
>
>Co-developed-by: Jens Axboe <axboe@kernel.dk>
>Signed-off-by: Stefan Roesch <shr@fb.com>
>Signed-off-by: Jens Axboe <axboe@kernel.dk>
>---
> fs/io_uring.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
>diff --git a/fs/io_uring.c b/fs/io_uring.c
>index 4c32cf987ef3..bf2b02518332 100644
>--- a/fs/io_uring.c
>+++ b/fs/io_uring.c
>@@ -964,7 +964,13 @@ struct io_kiocb {
> 	atomic_t			poll_refs;
> 	struct io_task_work		io_task_work;
> 	/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
>-	struct hlist_node		hash_node;
>+	union {
>+		struct hlist_node	hash_node;
>+		struct {
>+			u64		extra1;
>+			u64		extra2;
>+		};
>+	};
> 	/* internal polling, see IORING_FEAT_FAST_POLL */
> 	struct async_poll		*apoll;
> 	/* opcode allocated if it needs to store data for async defer */
>-- 
>2.30.2
>
>
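
The storage reuse in the quoted hunk can be illustrated with a small layout sketch. The demo_* types are hypothetical stand-ins for hlist_node and the relevant part of io_kiocb. The point is only that two pointers and two 64-bit values can share the same 16 bytes on a 64-bit build.

```c
#include <stdint.h>

/* Stand-in for the kernel's hlist_node: two pointers */
struct demo_hlist_node {
	struct demo_hlist_node *next, **pprev;
};

/* Only the overlapping part of the request is modelled here */
struct demo_req {
	union {
		struct demo_hlist_node	hash_node;
		struct {
			uint64_t	extra1;
			uint64_t	extra2;
		};
	};
};

/* The union is exactly as large as the two 64-bit values */
_Static_assert(sizeof(struct demo_req) == 2 * sizeof(uint64_t),
	       "extra1/extra2 must fit in the shared storage");
```

Sharing the storage is only safe if a request never needs hash_node and extra1/extra2 at the same time. The sketch does not check that; it is an assumption carried over from the commit message.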


* Re: [PATCH v3 08/12] io_uring: overflow processing for CQE32
  2022-04-25 18:25   ` [PATCH v3 08/12] io_uring: overflow processing " Stefan Roesch
@ 2022-04-26  6:28     ` Kanchan Joshi
  2022-04-26 12:53       ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Kanchan Joshi @ 2022-04-26  6:28 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: io-uring, linux-nvme, kernel-team, Jens Axboe

On Mon, Apr 25, 2022 at 11:25:26AM -0700, Stefan Roesch wrote:
>This adds the overflow processing for large CQE's.
>
>This adds two parameters to the io_cqring_event_overflow function and
>uses these fields to initialize the large CQE fields.
>
>Allocate enough space for large CQE's in the overflow structure. If no
>large CQE's are used, the size of the allocation is unchanged.
>
>The cqe field can have a different size depending on whether it is a
>large CQE or not. To be able to allocate different sizes, the two fields
>in the structure are re-ordered.
>
>Co-developed-by: Jens Axboe <axboe@kernel.dk>
>Signed-off-by: Stefan Roesch <shr@fb.com>
>Signed-off-by: Jens Axboe <axboe@kernel.dk>
>---
> fs/io_uring.c | 31 ++++++++++++++++++++++---------
> 1 file changed, 22 insertions(+), 9 deletions(-)
>
>diff --git a/fs/io_uring.c b/fs/io_uring.c
>index 68b61d2b356d..3630671325ea 100644
>--- a/fs/io_uring.c
>+++ b/fs/io_uring.c
>@@ -220,8 +220,8 @@ struct io_mapped_ubuf {
> struct io_ring_ctx;
>
> struct io_overflow_cqe {
>-	struct io_uring_cqe cqe;
> 	struct list_head list;
>+	struct io_uring_cqe cqe;
> };
>
> struct io_fixed_file {
>@@ -2017,10 +2017,14 @@ static void io_cqring_ev_posted_iopoll(struct io_ring_ctx *ctx)
> static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
> {
> 	bool all_flushed, posted;
>+	size_t cqe_size = sizeof(struct io_uring_cqe);
>
> 	if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
> 		return false;
>
>+	if (ctx->flags & IORING_SETUP_CQE32)
>+		cqe_size <<= 1;
>+
> 	posted = false;
> 	spin_lock(&ctx->completion_lock);
> 	while (!list_empty(&ctx->cq_overflow_list)) {
>@@ -2032,7 +2036,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
> 		ocqe = list_first_entry(&ctx->cq_overflow_list,
> 					struct io_overflow_cqe, list);
> 		if (cqe)
>-			memcpy(cqe, &ocqe->cqe, sizeof(*cqe));
>+			memcpy(cqe, &ocqe->cqe, cqe_size);

Maybe a nit, but if we do it this way -
memcpy(cqe, &ocqe->cqe, 
	sizeof(*cqe) << (ctx->flags & IORING_SETUP_CQE32));

we can do away with all previous changes in this function.
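
The reason the quoted patch moves the cqe field to the end of the overflow structure can be sketched as follows. This is a userspace approximation with hypothetical demo_* names. The CQE fields are flattened into the overflow structure so that the example stays within ISO C, and calloc stands in for the kernel allocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_list_head { struct demo_list_head *next, *prev; };

/* The variable-size part must come last so extra bytes can follow it */
struct demo_overflow_cqe {
	struct demo_list_head list;
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
	uint64_t big_cqe[];	/* two words, present only for CQE32 */
};

/* Allocation size: unchanged for normal CQEs, 16 bytes more for CQE32 */
static size_t demo_overflow_size(bool is_cqe32)
{
	size_t size = sizeof(struct demo_overflow_cqe);

	if (is_cqe32)
		size += 2 * sizeof(uint64_t);
	return size;
}

/* Zeroed allocation of the right size; caller frees with free() */
static struct demo_overflow_cqe *demo_overflow_alloc(bool is_cqe32)
{
	return calloc(1, demo_overflow_size(is_cqe32));
}
```

If the list head came after the CQE, the extra 16 bytes would overlap it. With the CQE last, the same structure definition serves both ring types and only the allocation size differs.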



* Re: [PATCH v3 00/12] add large CQE support for io-uring
  2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
                     ` (12 preceding siblings ...)
  2022-04-25 18:38   ` [PATCH v3 00/12] add large CQE support for io-uring Jens Axboe
@ 2022-04-26 11:37   ` Kanchan Joshi
  2022-04-26 12:54     ` Jens Axboe
  13 siblings, 1 reply; 20+ messages in thread
From: Kanchan Joshi @ 2022-04-26 11:37 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: io-uring, linux-nvme, kernel-team

On Mon, Apr 25, 2022 at 11:25:18AM -0700, Stefan Roesch wrote:
>This adds the large CQE support for io-uring. Large CQE's are 16 bytes longer.
>To support the longer CQE's the allocation part is changed and when the CQE is
>accessed.

A few nits that I commented on, mostly on commit messages.
Regardless of that, things look good.

Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>




* Re: [PATCH v3 08/12] io_uring: overflow processing for CQE32
  2022-04-26  6:28     ` Kanchan Joshi
@ 2022-04-26 12:53       ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2022-04-26 12:53 UTC (permalink / raw)
  To: Kanchan Joshi, Stefan Roesch; +Cc: io-uring, linux-nvme, kernel-team

On 4/26/22 12:28 AM, Kanchan Joshi wrote:
> On Mon, Apr 25, 2022 at 11:25:26AM -0700, Stefan Roesch wrote:
>> This adds the overflow processing for large CQE's.
>>
>> This adds two parameters to the io_cqring_event_overflow function and
>> uses these fields to initialize the large CQE fields.
>>
>> Allocate enough space for large CQE's in the overflow structure. If no
>> large CQE's are used, the size of the allocation is unchanged.
>>
>> The cqe field can have a different size depending on whether it is a
>> large CQE or not. To be able to allocate different sizes, the two fields
>> in the structure are re-ordered.
>>
>> Co-developed-by: Jens Axboe <axboe@kernel.dk>
>> Signed-off-by: Stefan Roesch <shr@fb.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>> fs/io_uring.c | 31 ++++++++++++++++++++++---------
>> 1 file changed, 22 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 68b61d2b356d..3630671325ea 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -220,8 +220,8 @@ struct io_mapped_ubuf {
>> struct io_ring_ctx;
>>
>> struct io_overflow_cqe {
>> -    struct io_uring_cqe cqe;
>>     struct list_head list;
>> +    struct io_uring_cqe cqe;
>> };
>>
>> struct io_fixed_file {
>> @@ -2017,10 +2017,14 @@ static void io_cqring_ev_posted_iopoll(struct io_ring_ctx *ctx)
>> static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
>> {
>>     bool all_flushed, posted;
>> +    size_t cqe_size = sizeof(struct io_uring_cqe);
>>
>>     if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
>>         return false;
>>
>> +    if (ctx->flags & IORING_SETUP_CQE32)
>> +        cqe_size <<= 1;
>> +
>>     posted = false;
>>     spin_lock(&ctx->completion_lock);
>>     while (!list_empty(&ctx->cq_overflow_list)) {
>> @@ -2032,7 +2036,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
>>         ocqe = list_first_entry(&ctx->cq_overflow_list,
>>                     struct io_overflow_cqe, list);
>>         if (cqe)
>> -            memcpy(cqe, &ocqe->cqe, sizeof(*cqe));
>> +            memcpy(cqe, &ocqe->cqe, cqe_size);
> 
> Maybe a nit, but if we do it this way -
> memcpy(cqe, &ocqe->cqe,     sizeof(*cqe) << (ctx->flags & IORING_SETUP_CQE32));

Unless you make that:

memcpy(cqe, &ocqe->cqe, sizeof(*cqe) << !!(ctx->flags & IORING_SETUP_CQE32));

that will end in tears, and that just makes it less readable. So I don't
think that's a good idea at all.

-- 
Jens Axboe
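
The objection above comes down to the difference between a masked flag value and a boolean. A minimal sketch follows, assuming for illustration that the flag occupies bit 11. Any bit other than bit 0 shows the same problem. The demo_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SETUP_CQE32	(1U << 11)	/* assumed bit position */

/* Shift count as first suggested: the raw masked value (0 or 2048) */
static unsigned int demo_shift_raw(uint32_t flags)
{
	return flags & DEMO_SETUP_CQE32;
}

/* Shift count with '!!': normalised to 0 or 1 */
static unsigned int demo_shift_bool(uint32_t flags)
{
	return !!(flags & DEMO_SETUP_CQE32);
}

/* Copy size using the normalised shift: base or 2 * base */
static size_t demo_cqe_copy_size(size_t base, uint32_t flags)
{
	return base << demo_shift_bool(flags);
}
```

The sketch never shifts by the raw value, because shifting a size_t by 2048 is undefined behaviour in C. The patch as posted avoids the question by doubling cqe_size in a separate if statement.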



* Re: [PATCH v3 00/12] add large CQE support for io-uring
  2022-04-26 11:37   ` Kanchan Joshi
@ 2022-04-26 12:54     ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2022-04-26 12:54 UTC (permalink / raw)
  To: Kanchan Joshi, Stefan Roesch; +Cc: io-uring, linux-nvme, kernel-team

On 4/26/22 5:37 AM, Kanchan Joshi wrote:
> On Mon, Apr 25, 2022 at 11:25:18AM -0700, Stefan Roesch wrote:
>> This adds the large CQE support for io-uring. Large CQE's are 16 bytes longer.
>> To support the longer CQE's the allocation part is changed and when the CQE is
>> accessed.
> 
> A few nits that I commented on, mostly on commit messages.
> Regardless of that, things look good.
> 
> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>

Thanks for reviewing it.

-- 
Jens Axboe



Thread overview: 20+ messages
2022-04-25 18:25 ` [PATCH v3 00/12] add large CQE support for io-uring Stefan Roesch
2022-04-25 18:25   ` [PATCH v3 01/12] io_uring: support CQE32 in io_uring_cqe Stefan Roesch
2022-04-26  5:22     ` Kanchan Joshi
2022-04-25 18:25   ` [PATCH v3 02/12] io_uring: wire up inline completion path for CQE32 Stefan Roesch
2022-04-26  5:45     ` Kanchan Joshi
2022-04-25 18:25   ` [PATCH v3 03/12] io_uring: change ring size calculation " Stefan Roesch
2022-04-25 18:25   ` [PATCH v3 04/12] io_uring: add CQE32 setup processing Stefan Roesch
2022-04-25 18:25   ` [PATCH v3 05/12] io_uring: add CQE32 completion processing Stefan Roesch
2022-04-25 18:25   ` [PATCH v3 06/12] io_uring: modify io_get_cqe for CQE32 Stefan Roesch
2022-04-25 18:25   ` [PATCH v3 07/12] io_uring: flush completions " Stefan Roesch
2022-04-25 18:25   ` [PATCH v3 08/12] io_uring: overflow processing " Stefan Roesch
2022-04-26  6:28     ` Kanchan Joshi
2022-04-26 12:53       ` Jens Axboe
2022-04-25 18:25   ` [PATCH v3 09/12] io_uring: add tracing for additional CQE32 fields Stefan Roesch
2022-04-25 18:25   ` [PATCH v3 10/12] io_uring: support CQE32 in /proc info Stefan Roesch
2022-04-25 18:25   ` [PATCH v3 11/12] io_uring: enable CQE32 Stefan Roesch
2022-04-25 18:25   ` [PATCH v3 12/12] io_uring: support CQE32 for nop operation Stefan Roesch
2022-04-25 18:38   ` [PATCH v3 00/12] add large CQE support for io-uring Jens Axboe
2022-04-26 11:37   ` Kanchan Joshi
2022-04-26 12:54     ` Jens Axboe
