* [PATCH for-next v10 0/7] Fixed-buffer for uring-cmd/passthru
       [not found] <CGME20220927174622epcas5p1685c0f97a7ee2ee13ba25f5fb58dff00@epcas5p1.samsung.com>
@ 2022-09-27 17:36 ` Kanchan Joshi
       [not found]   ` <CGME20220927174626epcas5p4002acda6f0578ee314ee5e611b8d6662@epcas5p4.samsung.com>
                     ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Kanchan Joshi @ 2022-09-27 17:36 UTC (permalink / raw)
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, gost.dev, Kanchan Joshi

Hi

uring-cmd lacks the ability to leverage pre-registered buffers.
This series adds that support in uring-cmd, and plumbs nvme passthrough
to work with it.
Patches 3 and 4 contain a bunch of general nvme cleanups, which got added
along the iterations.

Using registered buffers showed an IOPS increase from 1.65M to 2.04M.
Without fixedbufs
*****************
# taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B0 -O0 -n1 -u1 /dev/ng0n1
submitter=0, tid=2178, file=/dev/ng0n1, node=-1
polled=1, fixedbufs=0/0, register_files=1, buffered=1, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=1.63M, BW=796MiB/s, IOS/call=32/31
IOPS=1.64M, BW=800MiB/s, IOS/call=32/32
IOPS=1.64M, BW=801MiB/s, IOS/call=32/32
IOPS=1.65M, BW=803MiB/s, IOS/call=32/31
^CExiting on signal
Maximum IOPS=1.65M

With fixedbufs
**************
# taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -O0 -n1 -u1 /dev/ng0n1
submitter=0, tid=2180, file=/dev/ng0n1, node=-1
polled=1, fixedbufs=1/0, register_files=1, buffered=1, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=2.03M, BW=991MiB/s, IOS/call=32/31
IOPS=2.04M, BW=998MiB/s, IOS/call=32/32
IOPS=2.04M, BW=997MiB/s, IOS/call=32/31
^CExiting on signal
Maximum IOPS=2.04M

Changes since v9:
- Patch 6: Make blk_rq_map_user_iov() operate on a bvec iterator
  (Christoph)
- Patch 7: Change nvme to use the above

Changes since v8:
- Split some patches further; now 7 patches rather than 5 (Christoph)
- Applied a bunch of other suggested cleanups (Christoph)

Changes since v7:
- Patch 3: added many cleanups/refactoring suggested by Christoph
- Patch 4: added copying-pages fallback for bounce-buffer/dma-alignment case
  (Christoph)

Changes since v6:
- Patch 1: fix warning for io_uring_cmd_import_fixed (robot)
Changes since v5:
- Patch 4: newly added, to split an nvme function into two
- Patch 3: folded cleanups in bio_map_user_iov (Chaitanya, Pankaj)
- Rebase to latest for-next

Changes since v4:
- Patch 1, 2: folded all review comments of Jens

Changes since v3:
- uring_cmd_flags, change from u16 to u32 (Jens)
- patch 3, add another helper to reduce code-duplication (Jens)

Changes since v2:
- Kill the new opcode, add a flag instead (Pavel)
- Fix standalone build issue with patch 1 (Pavel)

Changes since v1:
- Fix a naming issue for an exported helper

Anuj Gupta (2):
  io_uring: add io_uring_cmd_import_fixed
  io_uring: introduce fixed buffer support for io_uring_cmd

Kanchan Joshi (5):
  nvme: refactor nvme_add_user_metadata
  nvme: refactor nvme_alloc_request
  block: factor out bio_map_get helper
  block: extend functionality to map bvec iterator
  nvme: wire up fixed buffer support for nvme passthrough

 block/blk-map.c               | 108 ++++++++++++++++++++---
 drivers/nvme/host/ioctl.c     | 160 ++++++++++++++++++++--------------
 include/linux/io_uring.h      |  10 ++-
 include/uapi/linux/io_uring.h |   9 ++
 io_uring/uring_cmd.c          |  26 +++++-
 5 files changed, 237 insertions(+), 76 deletions(-)

-- 
2.25.1


* [PATCH for-next v10 1/7] io_uring: add io_uring_cmd_import_fixed
       [not found]   ` <CGME20220927174626epcas5p4002acda6f0578ee314ee5e611b8d6662@epcas5p4.samsung.com>
@ 2022-09-27 17:36     ` Kanchan Joshi
  0 siblings, 0 replies; 23+ messages in thread
From: Kanchan Joshi @ 2022-09-27 17:36 UTC (permalink / raw)
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, gost.dev, Anuj Gupta, Kanchan Joshi

From: Anuj Gupta <anuj20.g@samsung.com>

This is a new helper that callers can use to obtain a bvec iterator for
the previously mapped buffer. This is preparatory work to enable
fixed-buffer support for io_uring_cmd.
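
For illustration, a driver's ->uring_cmd() handler is expected to use this
helper roughly as sketched below, once the block-layer plumbing added later
in this series is in place (this is only a sketch; the surrounding request
setup is elided, and ubuf/len come from the driver-specific command in the
SQE):

	struct iov_iter iter;
	int ret;

	ret = io_uring_cmd_import_fixed(ubuf, len, rq_data_dir(req),
					&iter, ioucmd);
	if (ret < 0)
		return ret;
	/* iter now walks the pre-registered bvecs; map them to the request */
	ret = blk_rq_map_user_iov(req->q, req, NULL, &iter, GFP_KERNEL);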

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 include/linux/io_uring.h |  8 ++++++++
 io_uring/uring_cmd.c     | 10 ++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 58676c0a398f..1dbf51115c30 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -4,6 +4,7 @@
 
 #include <linux/sched.h>
 #include <linux/xarray.h>
+#include <uapi/linux/io_uring.h>
 
 enum io_uring_cmd_flags {
 	IO_URING_F_COMPLETE_DEFER	= 1,
@@ -32,6 +33,8 @@ struct io_uring_cmd {
 };
 
 #if defined(CONFIG_IO_URING)
+int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
+			      struct iov_iter *iter, void *ioucmd);
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2);
 void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
 			void (*task_work_cb)(struct io_uring_cmd *));
@@ -59,6 +62,11 @@ static inline void io_uring_free(struct task_struct *tsk)
 		__io_uring_free(tsk);
 }
 #else
+static int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
+			      struct iov_iter *iter, void *ioucmd)
+{
+	return -EOPNOTSUPP;
+}
 static inline void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret,
 		ssize_t ret2)
 {
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index f3ed61e9bd0f..6a6d69523d75 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -8,6 +8,7 @@
 #include <uapi/linux/io_uring.h>
 
 #include "io_uring.h"
+#include "rsrc.h"
 #include "uring_cmd.h"
 
 static void io_uring_cmd_work(struct io_kiocb *req, bool *locked)
@@ -129,3 +130,12 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 
 	return IOU_ISSUE_SKIP_COMPLETE;
 }
+
+int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
+			      struct iov_iter *iter, void *ioucmd)
+{
+	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
+
+	return io_import_fixed(rw, iter, req->imu, ubuf, len);
+}
+EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
-- 
2.25.1


* [PATCH for-next v10 2/7] io_uring: introduce fixed buffer support for io_uring_cmd
       [not found]   ` <CGME20220927174628epcas5p21beda845f26eedeb538cb67e286954d4@epcas5p2.samsung.com>
@ 2022-09-27 17:36     ` Kanchan Joshi
  0 siblings, 0 replies; 23+ messages in thread
From: Kanchan Joshi @ 2022-09-27 17:36 UTC (permalink / raw)
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, gost.dev, Anuj Gupta, Kanchan Joshi

From: Anuj Gupta <anuj20.g@samsung.com>

Add an IORING_URING_CMD_FIXED flag that is to be used for sending an
io_uring command with previously registered buffers. User-space passes
the buffer index in sqe->buf_index, the same as is done in the read/write
variants that use fixed buffers.
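
For reference, user-space usage is expected to look roughly like the sketch
below. This is illustrative only: liburing's io_uring_get_sqe() is assumed,
the ring is assumed to be set up with IORING_SETUP_SQE128 so the passthrough
command fits in sqe->cmd, the registered buffer at index 0 is used, and the
nvme opcode is just one example of a user of this flag:

	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd = fd;				/* e.g. /dev/ng0n1 */
	sqe->cmd_op = NVME_URING_CMD_IO;
	sqe->uring_cmd_flags = IORING_URING_CMD_FIXED;
	sqe->buf_index = 0;			/* index of registered buffer */
	/* driver-specific command (e.g. struct nvme_uring_cmd) goes in sqe->cmd */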

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 include/linux/io_uring.h      |  2 +-
 include/uapi/linux/io_uring.h |  9 +++++++++
 io_uring/uring_cmd.c          | 16 +++++++++++++++-
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 1dbf51115c30..e10c5cc81082 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -28,7 +28,7 @@ struct io_uring_cmd {
 		void *cookie;
 	};
 	u32		cmd_op;
-	u32		pad;
+	u32		flags;
 	u8		pdu[32]; /* available inline for free use */
 };
 
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 92f29d9505a6..ab7458033ee3 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -56,6 +56,7 @@ struct io_uring_sqe {
 		__u32		hardlink_flags;
 		__u32		xattr_flags;
 		__u32		msg_ring_flags;
+		__u32		uring_cmd_flags;
 	};
 	__u64	user_data;	/* data to be passed back at completion time */
 	/* pack this to avoid bogus arm OABI complaints */
@@ -219,6 +220,14 @@ enum io_uring_op {
 	IORING_OP_LAST,
 };
 
+/*
+ * sqe->uring_cmd_flags
+ * IORING_URING_CMD_FIXED	use registered buffer; pass thig flag
+ *				along with setting sqe->buf_index.
+ */
+#define IORING_URING_CMD_FIXED	(1U << 0)
+
+
 /*
  * sqe->fsync_flags
  */
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 6a6d69523d75..faefa9f6f259 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -4,6 +4,7 @@
 #include <linux/file.h>
 #include <linux/io_uring.h>
 #include <linux/security.h>
+#include <linux/nospec.h>
 
 #include <uapi/linux/io_uring.h>
 
@@ -77,8 +78,21 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
 
-	if (sqe->rw_flags || sqe->__pad1)
+	if (sqe->__pad1)
 		return -EINVAL;
+
+	ioucmd->flags = READ_ONCE(sqe->uring_cmd_flags);
+	if (ioucmd->flags & IORING_URING_CMD_FIXED) {
+		struct io_ring_ctx *ctx = req->ctx;
+		u16 index;
+
+		req->buf_index = READ_ONCE(sqe->buf_index);
+		if (unlikely(req->buf_index >= ctx->nr_user_bufs))
+			return -EFAULT;
+		index = array_index_nospec(req->buf_index, ctx->nr_user_bufs);
+		req->imu = ctx->user_bufs[index];
+		io_req_set_rsrc_node(req, ctx, 0);
+	}
 	ioucmd->cmd = sqe->cmd;
 	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
 	return 0;
-- 
2.25.1


* [PATCH for-next v10 3/7] nvme: refactor nvme_add_user_metadata
       [not found]   ` <CGME20220927174631epcas5p12cd6ffbd7dad819b0af75733ce6cdd2c@epcas5p1.samsung.com>
@ 2022-09-27 17:36     ` Kanchan Joshi
  2022-09-28 17:18       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Kanchan Joshi @ 2022-09-27 17:36 UTC (permalink / raw)
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, gost.dev, Kanchan Joshi

Pass struct request rather than bio. This helps to kill a parameter, and
allows some processing clean-up too.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 drivers/nvme/host/ioctl.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 548aca8b5b9f..8f8435b55b95 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -20,19 +20,20 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval)
 	return (void __user *)ptrval;
 }
 
-static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
-		unsigned len, u32 seed, bool write)
+static void *nvme_add_user_metadata(struct request *req, void __user *ubuf,
+		unsigned len, u32 seed)
 {
 	struct bio_integrity_payload *bip;
 	int ret = -ENOMEM;
 	void *buf;
+	struct bio *bio = req->bio;
 
 	buf = kmalloc(len, GFP_KERNEL);
 	if (!buf)
 		goto out;
 
 	ret = -EFAULT;
-	if (write && copy_from_user(buf, ubuf, len))
+	if ((req_op(req) == REQ_OP_DRV_OUT) && copy_from_user(buf, ubuf, len))
 		goto out_free_meta;
 
 	bip = bio_integrity_alloc(bio, GFP_KERNEL, 1);
@@ -45,9 +46,13 @@ static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
 	bip->bip_iter.bi_sector = seed;
 	ret = bio_integrity_add_page(bio, virt_to_page(buf), len,
 			offset_in_page(buf));
-	if (ret == len)
-		return buf;
-	ret = -ENOMEM;
+	if (ret != len) {
+		ret = -ENOMEM;
+		goto out_free_meta;
+	}
+
+	req->cmd_flags |= REQ_INTEGRITY;
+	return buf;
 out_free_meta:
 	kfree(buf);
 out:
@@ -70,7 +75,6 @@ static struct request *nvme_alloc_user_request(struct request_queue *q,
 		u32 meta_seed, void **metap, unsigned timeout, bool vec,
 		blk_opf_t rq_flags, blk_mq_req_flags_t blk_flags)
 {
-	bool write = nvme_is_write(cmd);
 	struct nvme_ns *ns = q->queuedata;
 	struct block_device *bdev = ns ? ns->disk->part0 : NULL;
 	struct request *req;
@@ -110,13 +114,12 @@ static struct request *nvme_alloc_user_request(struct request_queue *q,
 		if (bdev)
 			bio_set_dev(bio, bdev);
 		if (bdev && meta_buffer && meta_len) {
-			meta = nvme_add_user_metadata(bio, meta_buffer, meta_len,
-					meta_seed, write);
+			meta = nvme_add_user_metadata(req, meta_buffer, meta_len,
+					meta_seed);
 			if (IS_ERR(meta)) {
 				ret = PTR_ERR(meta);
 				goto out_unmap;
 			}
-			req->cmd_flags |= REQ_INTEGRITY;
 			*metap = meta;
 		}
 	}
-- 
2.25.1


* [PATCH for-next v10 4/7] nvme: refactor nvme_alloc_request
       [not found]   ` <CGME20220927174633epcas5p4d492bdebde981e2c019e30c47cf00869@epcas5p4.samsung.com>
@ 2022-09-27 17:36     ` Kanchan Joshi
  2022-09-28 17:19       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Kanchan Joshi @ 2022-09-27 17:36 UTC (permalink / raw)
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, gost.dev, Kanchan Joshi

nvme_alloc_user_request expects a large number of parameters.
Split it into two functions to reduce the number of parameters.
The first one retains the name nvme_alloc_user_request, while the second
one is named nvme_map_user_request.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 drivers/nvme/host/ioctl.c | 121 ++++++++++++++++++++++----------------
 1 file changed, 69 insertions(+), 52 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 8f8435b55b95..b9f17dc87de9 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -70,68 +70,69 @@ static int nvme_finish_user_metadata(struct request *req, void __user *ubuf,
 }
 
 static struct request *nvme_alloc_user_request(struct request_queue *q,
-		struct nvme_command *cmd, void __user *ubuffer,
-		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
-		u32 meta_seed, void **metap, unsigned timeout, bool vec,
-		blk_opf_t rq_flags, blk_mq_req_flags_t blk_flags)
+		struct nvme_command *cmd, blk_opf_t rq_flags,
+		blk_mq_req_flags_t blk_flags)
 {
-	struct nvme_ns *ns = q->queuedata;
-	struct block_device *bdev = ns ? ns->disk->part0 : NULL;
 	struct request *req;
-	struct bio *bio = NULL;
-	void *meta = NULL;
-	int ret;
 
 	req = blk_mq_alloc_request(q, nvme_req_op(cmd) | rq_flags, blk_flags);
 	if (IS_ERR(req))
 		return req;
 	nvme_init_request(req, cmd);
-
-	if (timeout)
-		req->timeout = timeout;
 	nvme_req(req)->flags |= NVME_REQ_USERCMD;
+	return req;
+}
 
-	if (ubuffer && bufflen) {
-		if (!vec)
-			ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
-				GFP_KERNEL);
-		else {
-			struct iovec fast_iov[UIO_FASTIOV];
-			struct iovec *iov = fast_iov;
-			struct iov_iter iter;
-
-			ret = import_iovec(rq_data_dir(req), ubuffer, bufflen,
-					UIO_FASTIOV, &iov, &iter);
-			if (ret < 0)
-				goto out;
-			ret = blk_rq_map_user_iov(q, req, NULL, &iter,
-					GFP_KERNEL);
-			kfree(iov);
-		}
-		if (ret)
+static int nvme_map_user_request(struct request *req, void __user *ubuffer,
+		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
+		u32 meta_seed, void **metap, bool vec)
+{
+	struct request_queue *q = req->q;
+	struct nvme_ns *ns = q->queuedata;
+	struct block_device *bdev = ns ? ns->disk->part0 : NULL;
+	struct bio *bio = NULL;
+	void *meta = NULL;
+	int ret;
+
+	if (!vec)
+		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
+			GFP_KERNEL);
+	else {
+		struct iovec fast_iov[UIO_FASTIOV];
+		struct iovec *iov = fast_iov;
+		struct iov_iter iter;
+
+		ret = import_iovec(rq_data_dir(req), ubuffer, bufflen,
+				UIO_FASTIOV, &iov, &iter);
+		if (ret < 0)
 			goto out;
-		bio = req->bio;
-		if (bdev)
-			bio_set_dev(bio, bdev);
-		if (bdev && meta_buffer && meta_len) {
-			meta = nvme_add_user_metadata(req, meta_buffer, meta_len,
-					meta_seed);
-			if (IS_ERR(meta)) {
-				ret = PTR_ERR(meta);
-				goto out_unmap;
-			}
-			*metap = meta;
+
+		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
+		kfree(iov);
+	}
+	if (ret)
+		goto out;
+	bio = req->bio;
+	if (bdev)
+		bio_set_dev(bio, bdev);
+
+	if (bdev && meta_buffer && meta_len) {
+		meta = nvme_add_user_metadata(req, meta_buffer, meta_len,
+				meta_seed);
+		if (IS_ERR(meta)) {
+			ret = PTR_ERR(meta);
+			goto out_unmap;
 		}
+		*metap = meta;
 	}
 
-	return req;
+	return ret;
 
 out_unmap:
 	if (bio)
 		blk_rq_unmap_user(bio);
 out:
-	blk_mq_free_request(req);
-	return ERR_PTR(ret);
+	return ret;
 }
 
 static int nvme_submit_user_cmd(struct request_queue *q,
@@ -144,13 +145,19 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 	struct bio *bio;
 	int ret;
 
-	req = nvme_alloc_user_request(q, cmd, ubuffer, bufflen, meta_buffer,
-			meta_len, meta_seed, &meta, timeout, vec, 0, 0);
+	req = nvme_alloc_user_request(q, cmd, 0, 0);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	bio = req->bio;
+	req->timeout = timeout;
+	if (ubuffer && bufflen) {
+		ret = nvme_map_user_request(req, ubuffer, bufflen, meta_buffer,
+				meta_len, meta_seed, &meta, vec);
+		if (ret)
+			goto out;
+	}
 
+	bio = req->bio;
 	ret = nvme_execute_passthru_rq(req);
 
 	if (result)
@@ -160,6 +167,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 						meta_len, ret);
 	if (bio)
 		blk_rq_unmap_user(bio);
+out:
 	blk_mq_free_request(req);
 	return ret;
 }
@@ -421,6 +429,7 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	blk_opf_t rq_flags = 0;
 	blk_mq_req_flags_t blk_flags = 0;
 	void *meta = NULL;
+	int ret;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
@@ -460,13 +469,18 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		rq_flags |= REQ_POLLED;
 
 retry:
-	req = nvme_alloc_user_request(q, &c, nvme_to_user_ptr(d.addr),
-			d.data_len, nvme_to_user_ptr(d.metadata),
-			d.metadata_len, 0, &meta, d.timeout_ms ?
-			msecs_to_jiffies(d.timeout_ms) : 0, vec, rq_flags,
-			blk_flags);
+	req = nvme_alloc_user_request(q, &c, rq_flags, blk_flags);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
+	req->timeout = d.timeout_ms ? msecs_to_jiffies(d.timeout_ms) : 0;
+
+	if (d.addr && d.data_len) {
+		ret = nvme_map_user_request(req, nvme_to_user_ptr(d.addr),
+			d.data_len, nvme_to_user_ptr(d.metadata),
+			d.metadata_len, 0, &meta, vec);
+		if (ret)
+			goto out_err;
+	}
 	req->end_io = nvme_uring_cmd_end_io;
 	req->end_io_data = ioucmd;
 
@@ -489,6 +503,9 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 
 	blk_execute_rq_nowait(req, false);
 	return -EIOCBQUEUED;
+out_err:
+	blk_mq_free_request(req);
+	return ret;
 }
 
 static bool is_ctrl_ioctl(unsigned int cmd)
-- 
2.25.1


* [PATCH for-next v10 5/7] block: factor out bio_map_get helper
       [not found]   ` <CGME20220927174636epcas5p49008baa36dcbf2f61c25ba89c4707c0c@epcas5p4.samsung.com>
@ 2022-09-27 17:36     ` Kanchan Joshi
  2022-09-28 17:31       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Kanchan Joshi @ 2022-09-27 17:36 UTC (permalink / raw)
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, gost.dev, Kanchan Joshi

Move the bio allocation logic from bio_map_user_iov to a new helper,
bio_map_get. It is named so because its functionality is the opposite of
what is done inside bio_map_put. This is a prep patch.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 block/blk-map.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index 7693f8e3c454..a7838879e28e 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -241,17 +241,10 @@ static void bio_map_put(struct bio *bio)
 	}
 }
 
-static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
+static struct bio *bio_map_get(struct request *rq, unsigned int nr_vecs,
 		gfp_t gfp_mask)
 {
-	unsigned int max_sectors = queue_max_hw_sectors(rq->q);
-	unsigned int nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS);
 	struct bio *bio;
-	int ret;
-	int j;
-
-	if (!iov_iter_count(iter))
-		return -EINVAL;
 
 	if (rq->cmd_flags & REQ_POLLED) {
 		blk_opf_t opf = rq->cmd_flags | REQ_ALLOC_CACHE;
@@ -259,13 +252,31 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		bio = bio_alloc_bioset(NULL, nr_vecs, opf, gfp_mask,
 					&fs_bio_set);
 		if (!bio)
-			return -ENOMEM;
+			return NULL;
 	} else {
 		bio = bio_kmalloc(nr_vecs, gfp_mask);
 		if (!bio)
-			return -ENOMEM;
+			return NULL;
 		bio_init(bio, NULL, bio->bi_inline_vecs, nr_vecs, req_op(rq));
 	}
+	return bio;
+}
+
+static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
+		gfp_t gfp_mask)
+{
+	unsigned int max_sectors = queue_max_hw_sectors(rq->q);
+	unsigned int nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS);
+	struct bio *bio;
+	int ret;
+	int j;
+
+	if (!iov_iter_count(iter))
+		return -EINVAL;
+
+	bio = bio_map_get(rq, nr_vecs, gfp_mask);
+	if (bio == NULL)
+		return -ENOMEM;
 
 	while (iov_iter_count(iter)) {
 		struct page **pages, *stack_pages[UIO_FASTIOV];
-- 
2.25.1


* [PATCH for-next v10 6/7] block: extend functionality to map bvec iterator
       [not found]   ` <CGME20220927174639epcas5p22b46aed144d81d82b2a9b9de586808ac@epcas5p2.samsung.com>
@ 2022-09-27 17:36     ` Kanchan Joshi
  2022-09-28 17:40       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Kanchan Joshi @ 2022-09-27 17:36 UTC (permalink / raw)
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, gost.dev, Kanchan Joshi, Anuj Gupta

Extend blk_rq_map_user_iov so that it can handle a bvec iterator.
It maps the pages from the bvec iterator into a bio and places the bio
into the request.

This helper will be used by nvme for the uring-passthrough path when IO
is done using pre-mapped buffers.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
---
 block/blk-map.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/block/blk-map.c b/block/blk-map.c
index a7838879e28e..a1aa8dacb02b 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -548,6 +548,74 @@ int blk_rq_append_bio(struct request *rq, struct bio *bio)
 }
 EXPORT_SYMBOL(blk_rq_append_bio);
 
+/* Prepare bio for passthrough IO given ITER_BVEC iter */
+static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter,
+				bool *copy)
+{
+	struct request_queue *q = rq->q;
+	size_t nr_iter, nr_segs, i;
+	struct bio *bio = NULL;
+	struct bio_vec *bv, *bvecs, *bvprvp = NULL;
+	struct queue_limits *lim = &q->limits;
+	unsigned int nsegs = 0, bytes = 0;
+	unsigned long align = q->dma_pad_mask | queue_dma_alignment(q);
+
+	/* see if we need to copy pages due to any weird situation */
+	if (blk_queue_may_bounce(q))
+		goto out_copy;
+	else if (iov_iter_alignment(iter) & align)
+		goto out_copy;
+	/* virt-alignment gap is checked anyway down, so avoid extra loop here */
+
+	nr_iter = iov_iter_count(iter);
+	nr_segs = iter->nr_segs;
+
+	if (!nr_iter || (nr_iter >> SECTOR_SHIFT) > queue_max_hw_sectors(q))
+		return -EINVAL;
+	if (nr_segs > queue_max_segments(q))
+		return -EINVAL;
+
+	/* no iovecs to alloc, as we already have a BVEC iterator */
+	bio = bio_map_get(rq, 0, GFP_KERNEL);
+	if (bio == NULL)
+		return -ENOMEM;
+
+	bio_iov_bvec_set(bio, (struct iov_iter *)iter);
+	blk_rq_bio_prep(rq, bio, nr_segs);
+
+	/* loop to perform a bunch of sanity checks */
+	bvecs = (struct bio_vec *)iter->bvec;
+	for (i = 0; i < nr_segs; i++) {
+		bv = &bvecs[i];
+		/*
+		 * If the queue doesn't support SG gaps and adding this
+		 * offset would create a gap, fallback to copy.
+		 */
+		if (bvprvp && bvec_gap_to_prev(lim, bvprvp, bv->bv_offset)) {
+			bio_map_put(bio);
+			goto out_copy;
+		}
+		/* check full condition */
+		if (nsegs >= nr_segs || bytes > UINT_MAX - bv->bv_len)
+			goto put_bio;
+		if (bytes + bv->bv_len > nr_iter)
+			goto put_bio;
+		if (bv->bv_offset + bv->bv_len > PAGE_SIZE)
+			goto put_bio;
+
+		nsegs++;
+		bytes += bv->bv_len;
+		bvprvp = bv;
+	}
+	return 0;
+put_bio:
+	bio_map_put(bio);
+	return -EINVAL;
+out_copy:
+	*copy = true;
+	return 0;
+}
+
 /**
  * blk_rq_map_user_iov - map user data to a request, for passthrough requests
  * @q:		request queue where request should be inserted
@@ -573,6 +641,14 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 	struct iov_iter i;
 	int ret = -EINVAL;
 
+	if (iov_iter_is_bvec(iter)) {
+		ret = blk_rq_map_user_bvec(rq, iter, &copy);
+		if (ret != 0)
+			goto fail;
+		if (copy)
+			goto do_copy;
+		return ret;
+	}
 	if (!iter_is_iovec(iter))
 		goto fail;
 
@@ -585,6 +661,7 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 	else if (queue_virt_boundary(q))
 		copy = queue_virt_boundary(q) & iov_iter_gap_alignment(iter);
 
+do_copy:
 	i = *iter;
 	do {
 		if (copy)
-- 
2.25.1


* [PATCH for-next v10 7/7] nvme: wire up fixed buffer support for nvme passthrough
       [not found]   ` <CGME20220927174642epcas5p1dafa31776d4eb8180e18f149ed25640c@epcas5p1.samsung.com>
@ 2022-09-27 17:36     ` Kanchan Joshi
  2022-09-28 17:59       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Kanchan Joshi @ 2022-09-27 17:36 UTC (permalink / raw)
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, gost.dev, Kanchan Joshi

If io_uring sends a passthrough command with the IORING_URING_CMD_FIXED
flag, use the pre-registered buffer for IO (non-vectored variant). Pass
the buffer/length to io_uring and get the bvec iterator for the range.
Next, pass this bvec to the block layer and obtain a bio/request for
subsequent processing.
While at it, modify nvme_submit_user_cmd to take ubuffer as a plain
integer argument, and do away with the nvme_to_user_ptr conversion in
callers.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 drivers/nvme/host/ioctl.c | 44 +++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index b9f17dc87de9..1a45246f0d7a 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -83,9 +83,10 @@ static struct request *nvme_alloc_user_request(struct request_queue *q,
 	return req;
 }
 
-static int nvme_map_user_request(struct request *req, void __user *ubuffer,
+static int nvme_map_user_request(struct request *req, u64 ubuffer,
 		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
-		u32 meta_seed, void **metap, bool vec)
+		u32 meta_seed, void **metap, struct io_uring_cmd *ioucmd,
+		bool vec)
 {
 	struct request_queue *q = req->q;
 	struct nvme_ns *ns = q->queuedata;
@@ -93,23 +94,34 @@ static int nvme_map_user_request(struct request *req, void __user *ubuffer,
 	struct bio *bio = NULL;
 	void *meta = NULL;
 	int ret;
+	bool fixedbufs = ioucmd && (ioucmd->flags & IORING_URING_CMD_FIXED);
 
-	if (!vec)
-		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
-			GFP_KERNEL);
-	else {
+	if (vec) {
 		struct iovec fast_iov[UIO_FASTIOV];
 		struct iovec *iov = fast_iov;
 		struct iov_iter iter;
 
-		ret = import_iovec(rq_data_dir(req), ubuffer, bufflen,
-				UIO_FASTIOV, &iov, &iter);
+		/* fixedbufs is only for non-vectored io */
+		WARN_ON_ONCE(fixedbufs);
+		ret = import_iovec(rq_data_dir(req), nvme_to_user_ptr(ubuffer),
+				bufflen, UIO_FASTIOV, &iov, &iter);
 		if (ret < 0)
 			goto out;
 
 		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
 		kfree(iov);
-	}
+	} else if (fixedbufs) {
+		struct iov_iter iter;
+
+		ret = io_uring_cmd_import_fixed(ubuffer, bufflen,
+				rq_data_dir(req), &iter, ioucmd);
+		if (ret < 0)
+			goto out;
+		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
+	} else
+		ret = blk_rq_map_user(q, req, NULL,
+					nvme_to_user_ptr(ubuffer), bufflen,
+					GFP_KERNEL);
 	if (ret)
 		goto out;
 	bio = req->bio;
@@ -136,7 +148,7 @@ static int nvme_map_user_request(struct request *req, void __user *ubuffer,
 }
 
 static int nvme_submit_user_cmd(struct request_queue *q,
-		struct nvme_command *cmd, void __user *ubuffer,
+		struct nvme_command *cmd, u64 ubuffer,
 		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
 		u32 meta_seed, u64 *result, unsigned timeout, bool vec)
 {
@@ -152,7 +164,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 	req->timeout = timeout;
 	if (ubuffer && bufflen) {
 		ret = nvme_map_user_request(req, ubuffer, bufflen, meta_buffer,
-				meta_len, meta_seed, &meta, vec);
+				meta_len, meta_seed, &meta, NULL, vec);
 		if (ret)
 			goto out;
 	}
@@ -231,7 +243,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 	c.rw.appmask = cpu_to_le16(io.appmask);
 
 	return nvme_submit_user_cmd(ns->queue, &c,
-			nvme_to_user_ptr(io.addr), length,
+			io.addr, length,
 			metadata, meta_len, lower_32_bits(io.slba), NULL, 0,
 			false);
 }
@@ -285,7 +297,7 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		timeout = msecs_to_jiffies(cmd.timeout_ms);
 
 	status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
-			nvme_to_user_ptr(cmd.addr), cmd.data_len,
+			cmd.addr, cmd.data_len,
 			nvme_to_user_ptr(cmd.metadata), cmd.metadata_len,
 			0, &result, timeout, false);
 
@@ -331,7 +343,7 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		timeout = msecs_to_jiffies(cmd.timeout_ms);
 
 	status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
-			nvme_to_user_ptr(cmd.addr), cmd.data_len,
+			cmd.addr, cmd.data_len,
 			nvme_to_user_ptr(cmd.metadata), cmd.metadata_len,
 			0, &cmd.result, timeout, vec);
 
@@ -475,9 +487,9 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	req->timeout = d.timeout_ms ? msecs_to_jiffies(d.timeout_ms) : 0;
 
 	if (d.addr && d.data_len) {
-		ret = nvme_map_user_request(req, nvme_to_user_ptr(d.addr),
+		ret = nvme_map_user_request(req, d.addr,
 			d.data_len, nvme_to_user_ptr(d.metadata),
-			d.metadata_len, 0, &meta, vec);
+			d.metadata_len, 0, &meta, ioucmd, vec);
 		if (ret)
 			goto out_err;
 	}
-- 
2.25.1


* Re: [PATCH for-next v10 0/7] Fixed-buffer for uring-cmd/passthru
  2022-09-27 17:36 ` [PATCH for-next v10 0/7] Fixed-buffer for uring-cmd/passthru Kanchan Joshi
                     ` (6 preceding siblings ...)
       [not found]   ` <CGME20220927174642epcas5p1dafa31776d4eb8180e18f149ed25640c@epcas5p1.samsung.com>
@ 2022-09-28 14:28   ` Jens Axboe
  2022-09-28 17:12     ` Christoph Hellwig
  7 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2022-09-28 14:28 UTC (permalink / raw)
  To: Kanchan Joshi, hch, kbusch; +Cc: io-uring, linux-nvme, linux-block, gost.dev

On 9/27/22 11:36 AM, Kanchan Joshi wrote:
> Hi
> 
> uring-cmd lacks the ability to leverage the pre-registered buffers.
> This series adds that support in uring-cmd, and plumbs nvme passthrough
> to work with it.
> Patch 3 and 4 contains a bunch of general nvme cleanups, which got added
> along the iterations.
> 
> Using registered-buffers showed IOPS hike from 1.65M to 2.04M.
> Without fixedbufs
> *****************
> # taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B0 -O0 -n1 -u1 /dev/ng0n1
> submitter=0, tid=2178, file=/dev/ng0n1, node=-1
> polled=1, fixedbufs=0/0, register_files=1, buffered=1, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> IOPS=1.63M, BW=796MiB/s, IOS/call=32/31
> IOPS=1.64M, BW=800MiB/s, IOS/call=32/32
> IOPS=1.64M, BW=801MiB/s, IOS/call=32/32
> IOPS=1.65M, BW=803MiB/s, IOS/call=32/31
> ^CExiting on signal
> Maximum IOPS=1.65M
> 
> With fixedbufs
> **************
> # taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -O0 -n1 -u1 /dev/ng0n1
> submitter=0, tid=2180, file=/dev/ng0n1, node=-1
> polled=1, fixedbufs=1/0, register_files=1, buffered=1, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> IOPS=2.03M, BW=991MiB/s, IOS/call=32/31
> IOPS=2.04M, BW=998MiB/s, IOS/call=32/32
> IOPS=2.04M, BW=997MiB/s, IOS/call=32/31
> ^CExiting on signal
> Maximum IOPS=2.04M

Christoph, are you happy with the changes at this point?

-- 
Jens Axboe



* Re: [PATCH for-next v10 0/7] Fixed-buffer for uring-cmd/passthru
  2022-09-28 14:28   ` [PATCH for-next v10 0/7] Fixed-buffer for uring-cmd/passthru Jens Axboe
@ 2022-09-28 17:12     ` Christoph Hellwig
  2022-09-28 17:13       ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2022-09-28 17:12 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Kanchan Joshi, hch, kbusch, io-uring, linux-nvme, linux-block, gost.dev

On Wed, Sep 28, 2022 at 08:28:02AM -0600, Jens Axboe wrote:
> Christoph, are you happy with the changes at this point?

I'll look at it now.  Too much on my plate for less than 24 hour
turnaround times...

* Re: [PATCH for-next v10 0/7] Fixed-buffer for uring-cmd/passthru
  2022-09-28 17:12     ` Christoph Hellwig
@ 2022-09-28 17:13       ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2022-09-28 17:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kanchan Joshi, kbusch, io-uring, linux-nvme, linux-block, gost.dev

On 9/28/22 11:12 AM, Christoph Hellwig wrote:
> On Wed, Sep 28, 2022 at 08:28:02AM -0600, Jens Axboe wrote:
>> Christoph, are you happy with the changes at this point?
> 
> I'll look at it now.  Too much on my plate for less than 24 hour turn
> around times..

Yeah I know, I believe Kanchan is OOO for a bit from today, which is
part of the reason why it got quick respins.

Thanks!

-- 
Jens Axboe



* Re: [PATCH for-next v10 3/7] nvme: refactor nvme_add_user_metadata
  2022-09-27 17:36     ` [PATCH for-next v10 3/7] nvme: refactor nvme_add_user_metadata Kanchan Joshi
@ 2022-09-28 17:18       ` Christoph Hellwig
  2022-09-29 11:28         ` Anuj Gupta
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2022-09-28 17:18 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: axboe, hch, kbusch, io-uring, linux-nvme, linux-block, gost.dev

On Tue, Sep 27, 2022 at 11:06:06PM +0530, Kanchan Joshi wrote:
>  		if (bdev && meta_buffer && meta_len) {
> -			meta = nvme_add_user_metadata(bio, meta_buffer, meta_len,
> -					meta_seed, write);
> +			meta = nvme_add_user_metadata(req, meta_buffer, meta_len,

Please avoid the overly long line here.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH for-next v10 4/7] nvme: refactor nvme_alloc_request
  2022-09-27 17:36     ` [PATCH for-next v10 4/7] nvme: refactor nvme_alloc_request Kanchan Joshi
@ 2022-09-28 17:19       ` Christoph Hellwig
  2022-09-29 11:30         ` Anuj Gupta
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2022-09-28 17:19 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: axboe, hch, kbusch, io-uring, linux-nvme, linux-block, gost.dev

> +	if (!vec)
> +		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
> +			GFP_KERNEL);
> +	else {
> +		struct iovec fast_iov[UIO_FASTIOV];
> +		struct iovec *iov = fast_iov;
> +		struct iov_iter iter;
> +
> +		ret = import_iovec(rq_data_dir(req), ubuffer, bufflen,
> +				UIO_FASTIOV, &iov, &iter);
> +		if (ret < 0)
>  			goto out;
> +
> +		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
> +		kfree(iov);
> +	}

As mentioned before, this is something that should go into blk-map.c
as a separate helper, and scsi_ioctl.c and sg.c should be switched to
use it as well.

Otherwise this looks good.

* Re: [PATCH for-next v10 5/7] block: factor out bio_map_get helper
  2022-09-27 17:36     ` [PATCH for-next v10 5/7] block: factor out bio_map_get helper Kanchan Joshi
@ 2022-09-28 17:31       ` Christoph Hellwig
  2022-09-28 17:49         ` Jens Axboe
  2022-09-29 11:34         ` Anuj Gupta
  0 siblings, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-09-28 17:31 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: axboe, hch, kbusch, io-uring, linux-nvme, linux-block, gost.dev

On Tue, Sep 27, 2022 at 11:06:08PM +0530, Kanchan Joshi wrote:
> Move bio allocation logic from bio_map_user_iov to a new helper
> bio_map_get. It is named so because functionality is opposite of what is
> done inside bio_map_put. This is a prep patch.

I'm still not a fan of using bio_sets for passthrough and would be
much happier if we could drill down what the problems with the
slab per-cpu allocator are, but it seems like I've lost that fight
against Jens..

> +static struct bio *bio_map_get(struct request *rq, unsigned int nr_vecs,
>  		gfp_t gfp_mask)

But these names just seem rather misleading.  Why not something
like blk_rq_map_bio_alloc and blk_mq_map_bio_put?

Not really new in this code, but a question to Jens:  the existing
bio_map_user_iov has no real upper bound on the number of bios
allocated; how does that fit with the very limited pool size of
fs_bio_set?


* Re: [PATCH for-next v10 6/7] block: extend functionality to map bvec iterator
  2022-09-27 17:36     ` [PATCH for-next v10 6/7] block: extend functionality to map bvec iterator Kanchan Joshi
@ 2022-09-28 17:40       ` Christoph Hellwig
  2022-09-29 11:33         ` Anuj Gupta
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2022-09-28 17:40 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: axboe, hch, kbusch, io-uring, linux-nvme, linux-block, gost.dev,
	Anuj Gupta

On Tue, Sep 27, 2022 at 11:06:09PM +0530, Kanchan Joshi wrote:
> Extend blk_rq_map_user_iov so that it can handle bvec iterator.
> It  maps the pages from bvec iterator into a bio and place the bio into
> request.
> 
> This helper will be used by nvme for uring-passthrough path when IO is
> done using pre-mapped buffers.

Can we avoid duplicating some of the checks?  Something like the below
incremental patch.  Note that this now also allows the copy path for
all kinds of iov_iters, but as the copy from/to iter code is safe
and the sanity check was just for the map path, that should be fine.
It's best split into a prep patch, though.

---
diff --git a/block/blk-map.c b/block/blk-map.c
index a1aa8dacb02bc..c51de30767403 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -549,26 +549,16 @@ int blk_rq_append_bio(struct request *rq, struct bio *bio)
 EXPORT_SYMBOL(blk_rq_append_bio);
 
 /* Prepare bio for passthrough IO given ITER_BVEC iter */
-static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter,
-				bool *copy)
+static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter)
 {
 	struct request_queue *q = rq->q;
-	size_t nr_iter, nr_segs, i;
-	struct bio *bio = NULL;
-	struct bio_vec *bv, *bvecs, *bvprvp = NULL;
+	size_t nr_iter = iov_iter_count(iter);
+	size_t nr_segs = iter->nr_segs;
+	struct bio_vec *bvecs, *bvprvp = NULL;
 	struct queue_limits *lim = &q->limits;
 	unsigned int nsegs = 0, bytes = 0;
-	unsigned long align = q->dma_pad_mask | queue_dma_alignment(q);
-
-	/* see if we need to copy pages due to any weird situation */
-	if (blk_queue_may_bounce(q))
-		goto out_copy;
-	else if (iov_iter_alignment(iter) & align)
-		goto out_copy;
-	/* virt-alignment gap is checked anyway down, so avoid extra loop here */
-
-	nr_iter = iov_iter_count(iter);
-	nr_segs = iter->nr_segs;
+	struct bio *bio;
+	size_t i;
 
 	if (!nr_iter || (nr_iter >> SECTOR_SHIFT) > queue_max_hw_sectors(q))
 		return -EINVAL;
@@ -586,14 +576,15 @@ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter,
 	/* loop to perform a bunch of sanity checks */
 	bvecs = (struct bio_vec *)iter->bvec;
 	for (i = 0; i < nr_segs; i++) {
-		bv = &bvecs[i];
+		struct bio_vec *bv = &bvecs[i];
+
 		/*
 		 * If the queue doesn't support SG gaps and adding this
 		 * offset would create a gap, fallback to copy.
 		 */
 		if (bvprvp && bvec_gap_to_prev(lim, bvprvp, bv->bv_offset)) {
 			bio_map_put(bio);
-			goto out_copy;
+			return -EREMOTEIO;
 		}
 		/* check full condition */
 		if (nsegs >= nr_segs || bytes > UINT_MAX - bv->bv_len)
@@ -611,9 +602,6 @@ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter,
 put_bio:
 	bio_map_put(bio);
 	return -EINVAL;
-out_copy:
-	*copy = true;
-	return 0;
 }
 
 /**
@@ -635,33 +623,35 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 			struct rq_map_data *map_data,
 			const struct iov_iter *iter, gfp_t gfp_mask)
 {
-	bool copy = false;
+	bool copy = false, map_bvec = false;
 	unsigned long align = q->dma_pad_mask | queue_dma_alignment(q);
 	struct bio *bio = NULL;
 	struct iov_iter i;
 	int ret = -EINVAL;
 
-	if (iov_iter_is_bvec(iter)) {
-		ret = blk_rq_map_user_bvec(rq, iter, &copy);
-		if (ret != 0)
-			goto fail;
-		if (copy)
-			goto do_copy;
-		return ret;
-	}
-	if (!iter_is_iovec(iter))
-		goto fail;
-
 	if (map_data)
 		copy = true;
 	else if (blk_queue_may_bounce(q))
 		copy = true;
 	else if (iov_iter_alignment(iter) & align)
 		copy = true;
+	else if (iov_iter_is_bvec(iter))
+		map_bvec = true;
+	else if (!iter_is_iovec(iter))
+		copy = true;
 	else if (queue_virt_boundary(q))
 		copy = queue_virt_boundary(q) & iov_iter_gap_alignment(iter);
 
-do_copy:
+	if (map_bvec) {
+		ret = blk_rq_map_user_bvec(rq, iter);
+		if (!ret)
+			return 0;
+		if (ret != -EREMOTEIO)
+			goto fail;
+		/* fall back to copying the data on limits mismatches */
+		copy = true;
+	}
+
 	i = *iter;
 	do {
 		if (copy)

* Re: [PATCH for-next v10 5/7] block: factor out bio_map_get helper
  2022-09-28 17:31       ` Christoph Hellwig
@ 2022-09-28 17:49         ` Jens Axboe
  2022-09-28 17:53           ` Jens Axboe
  2022-09-29 11:34         ` Anuj Gupta
  1 sibling, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2022-09-28 17:49 UTC (permalink / raw)
  To: Christoph Hellwig, Kanchan Joshi
  Cc: kbusch, io-uring, linux-nvme, linux-block, gost.dev

On 9/28/22 11:31 AM, Christoph Hellwig wrote:
> On Tue, Sep 27, 2022 at 11:06:08PM +0530, Kanchan Joshi wrote:
>> Move bio allocation logic from bio_map_user_iov to a new helper
>> bio_map_get. It is named so because functionality is opposite of what is
>> done inside bio_map_put. This is a prep patch.
> 
> I'm still not a fan of using bio_sets for passthrough and would be
> much happier if we could drill down what the problems with the
> slab per-cpu allocator are, but it seems like I've lost that fight
> against Jens..

I don't think there are necessarily big problems with the slab side,
it's just that the per-cpu freeing there needs to be IRQ safe. And the
double cmpxchg() used for that isn't that fast compared to being able
to cache these locally with just preempt protection.

>> +static struct bio *bio_map_get(struct request *rq, unsigned int nr_vecs,
>>  		gfp_t gfp_mask)
> 
> But these names just seems rather misleading.  Why not someting
> like blk_rq_map_bio_alloc and blk_mq_map_bio_put?
> 
> Not really new in this code but a question to Jens:  The existing
> bio_map_user_iov has no real upper bounds on the number of bios
> allocated, how does that fit with the very limited pool size of
> fs_bio_set?

Good question - I think we'd need to ensure that once we get
past the initial alloc, we clear any gfp flags that'd make
the mempool_alloc() wait for completions.

-- 
Jens Axboe



* Re: [PATCH for-next v10 5/7] block: factor out bio_map_get helper
  2022-09-28 17:49         ` Jens Axboe
@ 2022-09-28 17:53           ` Jens Axboe
  0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2022-09-28 17:53 UTC (permalink / raw)
  To: Christoph Hellwig, Kanchan Joshi
  Cc: kbusch, io-uring, linux-nvme, linux-block, gost.dev

On 9/28/22 11:49 AM, Jens Axboe wrote:
>> Not really new in this code but a question to Jens:  The existing
>> bio_map_user_iov has no real upper bounds on the number of bios
>> allocated, how does that fit with the very limited pool size of
>> fs_bio_set?
> 
> Good question - I think we'd need to ensure that once we get
> past the initial alloc that we clear any gfp flags that'd make
> the mempool_alloc() wait for completions.

Either that, or just have the passthrough code use non-waiting flags to
begin with. Not guaranteeing forward progress is a bit iffy though...
But in practice it'd be no different than getting a failure due to OOM
because the application submitted a big request and we needed to alloc
and map multiple bios.
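
A sketch of the clear-the-flags variant (nr_bios here is an assumed counter
in the mapping loop; __GFP_DIRECT_RECLAIM is what allows mempool_alloc() to
sleep, so dropping it makes later allocations fail fast instead):

	/* first bio may block; later ones must not sleep on the mempool */
	if (nr_bios++)
		gfp_mask &= ~__GFP_DIRECT_RECLAIM;
	bio = bio_alloc_bioset(NULL, nr_vecs, opf, gfp_mask, &fs_bio_set);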

-- 
Jens Axboe

* Re: [PATCH for-next v10 7/7] nvme: wire up fixed buffer support for nvme passthrough
  2022-09-27 17:36     ` [PATCH for-next v10 7/7] nvme: wire up fixed buffer support for nvme passthrough Kanchan Joshi
@ 2022-09-28 17:59       ` Christoph Hellwig
  2022-09-29 11:36         ` Anuj Gupta
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2022-09-28 17:59 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: axboe, hch, kbusch, io-uring, linux-nvme, linux-block, gost.dev

> -static int nvme_map_user_request(struct request *req, void __user *ubuffer,
> +static int nvme_map_user_request(struct request *req, u64 ubuffer,

The changes to pass ubuffer as an integer trip me up every time.
They are obviously fine as we do the pointer conversion less often,
but maybe they'd be easier to follow if split into a prep patch.

> +	bool fixedbufs = ioucmd && (ioucmd->flags & IORING_URING_CMD_FIXED);
>  
> -	if (!vec)
> -		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
> -			GFP_KERNEL);
> -	else {
> +	if (vec) {

If we check IORING_URING_CMD_FIXED first this becomes a bit simpler,
and also works better with the block helper suggested earlier:

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 1a45246f0d7a8..f46142dcb8f1e 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -94,34 +94,33 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer,
 	struct bio *bio = NULL;
 	void *meta = NULL;
 	int ret;
-	bool fixedbufs = ioucmd && (ioucmd->flags & IORING_URING_CMD_FIXED);
 
-	if (vec) {
-		struct iovec fast_iov[UIO_FASTIOV];
-		struct iovec *iov = fast_iov;
+	if (ioucmd && (ioucmd->flags & IORING_URING_CMD_FIXED)) {
 		struct iov_iter iter;
 
 		/* fixedbufs is only for non-vectored io */
-		WARN_ON_ONCE(fixedbufs);
-		ret = import_iovec(rq_data_dir(req), nvme_to_user_ptr(ubuffer),
-				bufflen, UIO_FASTIOV, &iov, &iter);
+		if (WARN_ON_ONCE(vec))
+			return -EINVAL;
+		ret = io_uring_cmd_import_fixed(ubuffer, bufflen,
+				rq_data_dir(req), &iter, ioucmd);
 		if (ret < 0)
 			goto out;
-
 		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
-		kfree(iov);
-	} else if (fixedbufs) {
+	} else if (vec) {
+		struct iovec fast_iov[UIO_FASTIOV];
+		struct iovec *iov = fast_iov;
 		struct iov_iter iter;
 
-		ret = io_uring_cmd_import_fixed(ubuffer, bufflen,
-				rq_data_dir(req), &iter, ioucmd);
+		ret = import_iovec(rq_data_dir(req), nvme_to_user_ptr(ubuffer),
+				bufflen, UIO_FASTIOV, &iov, &iter);
 		if (ret < 0)
 			goto out;
 		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
-	} else
+		kfree(iov);
+	} else {
 		ret = blk_rq_map_user(q, req, NULL,
-					nvme_to_user_ptr(ubuffer), bufflen,
-					GFP_KERNEL);
+				nvme_to_user_ptr(ubuffer), bufflen, GFP_KERNEL);
+	}
 	if (ret)
 		goto out;
 	bio = req->bio;

* Re: [PATCH for-next v10 3/7] nvme: refactor nvme_add_user_metadata
  2022-09-28 17:18       ` Christoph Hellwig
@ 2022-09-29 11:28         ` Anuj Gupta
  0 siblings, 0 replies; 23+ messages in thread
From: Anuj Gupta @ 2022-09-29 11:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kanchan Joshi, axboe, kbusch, io-uring, linux-nvme, linux-block,
	gost.dev

On Wed, Sep 28, 2022 at 07:18:05PM +0200, Christoph Hellwig wrote:
> On Tue, Sep 27, 2022 at 11:06:06PM +0530, Kanchan Joshi wrote:
> >  		if (bdev && meta_buffer && meta_len) {
> > -			meta = nvme_add_user_metadata(bio, meta_buffer, meta_len,
> > -					meta_seed, write);
> > +			meta = nvme_add_user_metadata(req, meta_buffer, meta_len,
> 
> Pleae avoid the overly long line here.
> 

sure, will fold it in, in the next iteration
> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 

--
Anuj Gupta


* Re: [PATCH for-next v10 4/7] nvme: refactor nvme_alloc_request
  2022-09-28 17:19       ` Christoph Hellwig
@ 2022-09-29 11:30         ` Anuj Gupta
  0 siblings, 0 replies; 23+ messages in thread
From: Anuj Gupta @ 2022-09-29 11:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kanchan Joshi, axboe, kbusch, io-uring, linux-nvme, linux-block,
	gost.dev

On Wed, Sep 28, 2022 at 07:19:32PM +0200, Christoph Hellwig wrote:
> > +	if (!vec)
> > +		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
> > +			GFP_KERNEL);
> > +	else {
> > +		struct iovec fast_iov[UIO_FASTIOV];
> > +		struct iovec *iov = fast_iov;
> > +		struct iov_iter iter;
> > +
> > +		ret = import_iovec(rq_data_dir(req), ubuffer, bufflen,
> > +				UIO_FASTIOV, &iov, &iter);
> > +		if (ret < 0)
> >  			goto out;
> > +
> > +		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
> > +		kfree(iov);
> > +	}
> 
> As mentioned before this is something that should got into blk-map.c
> as a separate helper, and scsi_ioctl.c and sg.c should be switched to
> use it as well.
> 

sure, this will be done in the next iteration
> Otherwise this looks good.

--
Anuj Gupta

> 

* Re: [PATCH for-next v10 6/7] block: extend functionality to map bvec iterator
  2022-09-28 17:40       ` Christoph Hellwig
@ 2022-09-29 11:33         ` Anuj Gupta
  0 siblings, 0 replies; 23+ messages in thread
From: Anuj Gupta @ 2022-09-29 11:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kanchan Joshi, axboe, kbusch, io-uring, linux-nvme, linux-block,
	gost.dev

On Wed, Sep 28, 2022 at 07:40:39PM +0200, Christoph Hellwig wrote:
> On Tue, Sep 27, 2022 at 11:06:09PM +0530, Kanchan Joshi wrote:
> > Extend blk_rq_map_user_iov so that it can handle bvec iterator.
> > It  maps the pages from bvec iterator into a bio and place the bio into
> > request.
> > 
> > This helper will be used by nvme for uring-passthrough path when IO is
> > done using pre-mapped buffers.
> 
> Can we avoid duplicating some of the checks?  Something like the below
> incremental patch.  Note that this now also allows the copy path for
> all kinds of iov_iters, but as the copy from/to iter code is safe
> and the sanity check was just or the map path that should be fine.
> It's best split into a prep patch, though.

Right, this one looks way better. Will fold this into a separate prep patch.

> 
> ---
> diff --git a/block/blk-map.c b/block/blk-map.c
> index a1aa8dacb02bc..c51de30767403 100644
> --- a/block/blk-map.c
> +++ b/block/blk-map.c
> @@ -549,26 +549,16 @@ int blk_rq_append_bio(struct request *rq, struct bio *bio)
>  EXPORT_SYMBOL(blk_rq_append_bio);
>  
>  /* Prepare bio for passthrough IO given ITER_BVEC iter */
> -static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter,
> -				bool *copy)
> +static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter)
>  {
>  	struct request_queue *q = rq->q;
> -	size_t nr_iter, nr_segs, i;
> -	struct bio *bio = NULL;
> -	struct bio_vec *bv, *bvecs, *bvprvp = NULL;
> +	size_t nr_iter = iov_iter_count(iter);
> +	size_t nr_segs = iter->nr_segs;
> +	struct bio_vec *bvecs, *bvprvp = NULL;
>  	struct queue_limits *lim = &q->limits;
>  	unsigned int nsegs = 0, bytes = 0;
> -	unsigned long align = q->dma_pad_mask | queue_dma_alignment(q);
> -
> -	/* see if we need to copy pages due to any weird situation */
> -	if (blk_queue_may_bounce(q))
> -		goto out_copy;
> -	else if (iov_iter_alignment(iter) & align)
> -		goto out_copy;
> -	/* virt-alignment gap is checked anyway down, so avoid extra loop here */
> -
> -	nr_iter = iov_iter_count(iter);
> -	nr_segs = iter->nr_segs;
> +	struct bio *bio;
> +	size_t i;
>  
>  	if (!nr_iter || (nr_iter >> SECTOR_SHIFT) > queue_max_hw_sectors(q))
>  		return -EINVAL;
> @@ -586,14 +576,15 @@ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter,
>  	/* loop to perform a bunch of sanity checks */
>  	bvecs = (struct bio_vec *)iter->bvec;
>  	for (i = 0; i < nr_segs; i++) {
> -		bv = &bvecs[i];
> +		struct bio_vec *bv = &bvecs[i];
> +
>  		/*
>  		 * If the queue doesn't support SG gaps and adding this
>  		 * offset would create a gap, fallback to copy.
>  		 */
>  		if (bvprvp && bvec_gap_to_prev(lim, bvprvp, bv->bv_offset)) {
>  			bio_map_put(bio);
> -			goto out_copy;
> +			return -EREMOTEIO;
>  		}
>  		/* check full condition */
>  		if (nsegs >= nr_segs || bytes > UINT_MAX - bv->bv_len)
> @@ -611,9 +602,6 @@ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter,
>  put_bio:
>  	bio_map_put(bio);
>  	return -EINVAL;
> -out_copy:
> -	*copy = true;
> -	return 0;
>  }
>  
>  /**
> @@ -635,33 +623,35 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
>  			struct rq_map_data *map_data,
>  			const struct iov_iter *iter, gfp_t gfp_mask)
>  {
> -	bool copy = false;
> +	bool copy = false, map_bvec = false;
>  	unsigned long align = q->dma_pad_mask | queue_dma_alignment(q);
>  	struct bio *bio = NULL;
>  	struct iov_iter i;
>  	int ret = -EINVAL;
>  
> -	if (iov_iter_is_bvec(iter)) {
> -		ret = blk_rq_map_user_bvec(rq, iter, &copy);
> -		if (ret != 0)
> -			goto fail;
> -		if (copy)
> -			goto do_copy;
> -		return ret;
> -	}
> -	if (!iter_is_iovec(iter))
> -		goto fail;
> -
>  	if (map_data)
>  		copy = true;
>  	else if (blk_queue_may_bounce(q))
>  		copy = true;
>  	else if (iov_iter_alignment(iter) & align)
>  		copy = true;
> +	else if (iov_iter_is_bvec(iter))
> +		map_bvec = true;
> +	else if (!iter_is_iovec(iter))
> +		copy = true;
>  	else if (queue_virt_boundary(q))
>  		copy = queue_virt_boundary(q) & iov_iter_gap_alignment(iter);
>  
> -do_copy:
> +	if (map_bvec) {
> +		ret = blk_rq_map_user_bvec(rq, iter);
> +		if (!ret)
> +			return 0;
> +		if (ret != -EREMOTEIO)
> +			goto fail;
> +		/* fall back to copying the data on limits mismatches */
> +		copy = true;
> +	}
> +
>  	i = *iter;
>  	do {
>  		if (copy)
> 

--
Anuj Gupta


* Re: [PATCH for-next v10 5/7] block: factor out bio_map_get helper
  2022-09-28 17:31       ` Christoph Hellwig
  2022-09-28 17:49         ` Jens Axboe
@ 2022-09-29 11:34         ` Anuj Gupta
  1 sibling, 0 replies; 23+ messages in thread
From: Anuj Gupta @ 2022-09-29 11:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kanchan Joshi, axboe, kbusch, io-uring, linux-nvme, linux-block,
	gost.dev

On Wed, Sep 28, 2022 at 07:31:21PM +0200, Christoph Hellwig wrote:
> On Tue, Sep 27, 2022 at 11:06:08PM +0530, Kanchan Joshi wrote:
> > Move bio allocation logic from bio_map_user_iov to a new helper
> > bio_map_get. It is named so because functionality is opposite of what is
> > done inside bio_map_put. This is a prep patch.
> 
> I'm still not a fan of using bio_sets for passthrough and would be
> much happier if we could drill down what the problems with the
> slab per-cpu allocator are, but it seems like I've lost that fight
> against Jens..
> 
> > +static struct bio *bio_map_get(struct request *rq, unsigned int nr_vecs,
> >  		gfp_t gfp_mask)
> 
> But these names just seems rather misleading.  Why not someting
> like blk_rq_map_bio_alloc and blk_mq_map_bio_put?

Agreed, will rename the alloc and put functions in the next iteration

> 
> Not really new in this code but a question to Jens:  The existing
> bio_map_user_iov has no real upper bounds on the number of bios
> allocated, how does that fit with the very limited pool size of
> fs_bio_set?

--
Anuj Gupta

> 
> 

* Re: [PATCH for-next v10 7/7] nvme: wire up fixed buffer support for nvme passthrough
  2022-09-28 17:59       ` Christoph Hellwig
@ 2022-09-29 11:36         ` Anuj Gupta
  0 siblings, 0 replies; 23+ messages in thread
From: Anuj Gupta @ 2022-09-29 11:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kanchan Joshi, axboe, kbusch, io-uring, linux-nvme, linux-block,
	gost.dev

On Wed, Sep 28, 2022 at 07:59:04PM +0200, Christoph Hellwig wrote:
> > -static int nvme_map_user_request(struct request *req, void __user *ubuffer,
> > +static int nvme_map_user_request(struct request *req, u64 ubuffer,
> 
> The changes to pass ubuffer as an integer trip me up every time.
> They are obviously fine as we do the pointer conversion less often,
> but maybe they'd be easier to follow if split into a prep patch.

ok, will split these changes into a separate prep patch
> 
> > +	bool fixedbufs = ioucmd && (ioucmd->flags & IORING_URING_CMD_FIXED);
> >  
> > -	if (!vec)
> > -		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
> > -			GFP_KERNEL);
> > -	else {
> > +	if (vec) {
> 
> If we check IORING_URING_CMD_FIXED first this becomes a bit simpler,
> and also works better with the block helper suggested earlier:

will create a block helper for this and the scsi counterparts in the next iteration
> 
> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> index 1a45246f0d7a8..f46142dcb8f1e 100644
> --- a/drivers/nvme/host/ioctl.c
> +++ b/drivers/nvme/host/ioctl.c
> @@ -94,34 +94,33 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer,
>  	struct bio *bio = NULL;
>  	void *meta = NULL;
>  	int ret;
> -	bool fixedbufs = ioucmd && (ioucmd->flags & IORING_URING_CMD_FIXED);
>  
> -	if (vec) {
> -		struct iovec fast_iov[UIO_FASTIOV];
> -		struct iovec *iov = fast_iov;
> +	if (ioucmd && (ioucmd->flags & IORING_URING_CMD_FIXED)) {
>  		struct iov_iter iter;
>  
>  		/* fixedbufs is only for non-vectored io */
> -		WARN_ON_ONCE(fixedbufs);
> -		ret = import_iovec(rq_data_dir(req), nvme_to_user_ptr(ubuffer),
> -				bufflen, UIO_FASTIOV, &iov, &iter);
> +		if (WARN_ON_ONCE(vec))
> +			return -EINVAL;
> +		ret = io_uring_cmd_import_fixed(ubuffer, bufflen,
> +				rq_data_dir(req), &iter, ioucmd);
>  		if (ret < 0)
>  			goto out;
> -
>  		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
> -		kfree(iov);
> -	} else if (fixedbufs) {
> +	} else if (vec) {
> +		struct iovec fast_iov[UIO_FASTIOV];
> +		struct iovec *iov = fast_iov;
>  		struct iov_iter iter;
>  
> -		ret = io_uring_cmd_import_fixed(ubuffer, bufflen,
> -				rq_data_dir(req), &iter, ioucmd);
> +		ret = import_iovec(rq_data_dir(req), nvme_to_user_ptr(ubuffer),
> +				bufflen, UIO_FASTIOV, &iov, &iter);
>  		if (ret < 0)
>  			goto out;
>  		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
> -	} else
> +		kfree(iov);
> +	} else {
>  		ret = blk_rq_map_user(q, req, NULL,
> -					nvme_to_user_ptr(ubuffer), bufflen,
> -					GFP_KERNEL);
> +				nvme_to_user_ptr(ubuffer), bufflen, GFP_KERNEL);
> +	}
>  	if (ret)
>  		goto out;
>  	bio = req->bio;
> 

--
Anuj Gupta
