* [PATCH v6 0/5] io_uring: buffer registration enhancements
@ 2021-01-22  0:22 Bijan Mottahedeh
  2021-01-22  0:22 ` [PATCH v6 1/5] io_uring: call io_get_fixed_rsrc_ref for buffers Bijan Mottahedeh
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Bijan Mottahedeh @ 2021-01-22  0:22 UTC
  To: axboe, asml.silence, io-uring

v6:

- address v5 comments
- rebase on Pavel's rsrc generalization changes
- see also TBD section below

v5:

- call io_get_fixed_rsrc_ref for buffers
- make percpu_ref_release names consistent
- rebase on for-5.12/io_uring
- see also TBD section below

v4:

- address v3 comments (TBD REGISTER_BUFFERS)
- rebase

v3:

- batch file->rsrc renames into a single patch when possible
- fix other review changes from v2
- fix checkpatch warnings

v2:

- drop readv/writev with fixed buffers patch
- handle ref_nodes for both files and buffers with a single ref_list
- make file/buffer handling more unified

This patchset implements a set of enhancements to buffer registration
consistent with existing file registration functionality:

- buffer registration updates		IORING_REGISTER_BUFFERS_UPDATE
					IORING_OP_BUFFERS_UPDATE

- buffer registration sharing		IORING_SETUP_SHARE_BUF
					IORING_SETUP_ATTACH_BUF

Patch 1 calls io_get_fixed_rsrc_ref() for buffers as well as files.

Patch 2 applies fixed_rsrc functionality for fixed buffers support.
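
As a rough illustration of the table layout that patch 2 introduces, the
following userspace sketch shows how a flat buffer index splits into a
second-level table number and a slot within that table. Only the shift
value of 7 (128 entries per table) is taken from the patch; the helper
names buf_table_of(), buf_slot_of() and buf_nr_tables() are hypothetical
and exist only for this example.

```c
#include <assert.h>

/* Values mirror the IORING_BUF_TABLE_* defines added by patch 2. */
#define BUF_TABLE_SHIFT	7
#define MAX_BUFS_TABLE	(1U << BUF_TABLE_SHIFT)	/* 128 entries per table */
#define BUF_TABLE_MASK	(MAX_BUFS_TABLE - 1)

/* Hypothetical helper: which second-level table holds this index. */
static unsigned int buf_table_of(unsigned int index)
{
	return index >> BUF_TABLE_SHIFT;
}

/* Hypothetical helper: slot of this index inside its table. */
static unsigned int buf_slot_of(unsigned int index)
{
	return index & BUF_TABLE_MASK;
}

/* Hypothetical helper: tables needed for nr_bufs entries (round up). */
static unsigned int buf_nr_tables(unsigned int nr_bufs)
{
	return (nr_bufs + MAX_BUFS_TABLE - 1) / MAX_BUFS_TABLE;
}
```

With these definitions, index 130 lands in table 1, slot 2, which is the
same split io_buf_from_index() performs in the patch.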

Patch 3 generalizes files_update functionality to rsrc_update.

Patch 4 implements buffer registration update, and introduces
IORING_REGISTER_BUFFERS_UPDATE and IORING_OP_BUFFERS_UPDATE, consistent
with file registration update.
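
The range check that an update has to pass can be modelled in userspace
as below. This is only a sketch of the check at the top of
__io_sqe_buffers_update() in patch 4, not the kernel code: the function
name buf_update_range_ok() is hypothetical, and __builtin_add_overflow()
(a GCC/Clang builtin) stands in for the kernel's check_add_overflow().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical userspace model: an update of nr_args entries starting
 * at offset must not wrap around and must stay within the currently
 * registered buffers.
 */
static int buf_update_range_ok(uint32_t offset, uint32_t nr_args,
			       uint32_t nr_user_bufs)
{
	uint32_t end;

	/* offset + nr_args must not overflow 32 bits */
	if (__builtin_add_overflow(offset, nr_args, &end))
		return -EOVERFLOW;
	/* the updated range must lie inside the registered set */
	if (end > nr_user_bufs)
		return -EINVAL;
	return 0;
}
```

For example, updating 3 entries at offset 2 of a 4-entry registration is
rejected with -EINVAL, while updating all 4 entries at offset 0 passes.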

Patch 5 implements buffer sharing among multiple rings; it works as follows:

- A new ring, A, is set up. Since no buffers have been registered, the
  registered buffer state is an empty set, Z. That's different from the
  NULL state in the current implementation.

- Ring B is set up, attaching to Ring A. It also attaches to Ring A's
  buffer registrations, so we now have two references to the same empty
  set, Z.

- Ring A registers buffers into set Z, which is no longer empty.

- Ring B sees this immediately, since it's already sharing that set.
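
The four steps above can be modelled with a small userspace toy. It
follows only the description in this cover letter, not the kernel data
structures: struct buf_set, struct ring and the ring_*() helpers are all
hypothetical names, and a plain counter stands in for the percpu
reference the patch takes on the shared data.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for the shared registration state ("set Z"). */
struct buf_set {
	unsigned int refs;	/* rings pointing at this set */
	unsigned int nr_bufs;	/* 0 means empty, which is not NULL */
};

struct ring {
	struct buf_set *bufs;	/* NULL: ring neither shares nor attaches */
};

/* Models IORING_SETUP_SHARE_BUF: start with an empty, non-NULL set. */
static int ring_setup_share(struct ring *r)
{
	r->bufs = calloc(1, sizeof(*r->bufs));
	if (!r->bufs)
		return -1;
	r->bufs->refs = 1;
	return 0;
}

/* Models IORING_SETUP_ATTACH_BUF: take a reference on the owner's set. */
static int ring_setup_attach(struct ring *r, struct ring *owner)
{
	if (!owner->bufs)
		return -1;	/* owner did not ask to share */
	r->bufs = owner->bufs;
	r->bufs->refs++;
	return 0;
}

/* The owner registers buffers into the shared set. */
static void ring_register(struct ring *r, unsigned int nr)
{
	r->bufs->nr_bufs = nr;
}

/* Drop this ring's reference; the last one out frees the set. */
static void ring_teardown(struct ring *r)
{
	if (r->bufs && --r->bufs->refs == 0)
		free(r->bufs);
	r->bufs = NULL;
}
```

Because both rings hold a pointer to the same set rather than a copy, a
registration made through ring A is visible through ring B without any
further call.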

Testing

I have used the liburing file-{register,update} tests as models for
buffer-{register,update,share} tests, and they run ok. The liburing
test/self fails, but that seems unrelated to these changes.

TBD

- Still needed: a patch from Pavel to address a race between fixed IO from
async context and buffer unregister. The alternative is to force buffer
registration ops to do a full quiesce.

Bijan Mottahedeh (5):
  io_uring: call io_get_fixed_rsrc_ref for buffers
  io_uring: implement fixed buffers registration similar to fixed files
  io_uring: generalize files_update functionlity to rsrc_update
  io_uring: support buffer registration updates
  io_uring: support buffer registration sharing

 fs/io_uring.c                 | 448 +++++++++++++++++++++++++++++++++++++-----
 include/uapi/linux/io_uring.h |   4 +
 2 files changed, 403 insertions(+), 49 deletions(-)

-- 
1.8.3.1



* [PATCH v6 1/5] io_uring: call io_get_fixed_rsrc_ref for buffers
  2021-01-22  0:22 [PATCH v6 0/5] io_uring: buffer registration enhancements Bijan Mottahedeh
@ 2021-01-22  0:22 ` Bijan Mottahedeh
  2021-01-22  0:22 ` [PATCH v6 2/5] io_uring: implement fixed buffers registration similar to fixed files Bijan Mottahedeh
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Bijan Mottahedeh @ 2021-01-22  0:22 UTC
  To: axboe, asml.silence, io-uring

io_get_fixed_rsrc_ref() must be called for both buffers and files.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
---
 fs/io_uring.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5bfcb72..416c350 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1068,12 +1068,11 @@ static inline void io_clean_op(struct io_kiocb *req)
 		__io_clean_op(req);
 }
 
-static inline void io_set_resource_node(struct io_kiocb *req)
+static inline void io_get_fixed_rsrc_ref(struct io_kiocb *req,
+					 struct fixed_rsrc_data *rsrc_data)
 {
-	struct io_ring_ctx *ctx = req->ctx;
-
 	if (!req->fixed_rsrc_refs) {
-		req->fixed_rsrc_refs = &ctx->file_data->node->refs;
+		req->fixed_rsrc_refs = &rsrc_data->node->refs;
 		percpu_ref_get(req->fixed_rsrc_refs);
 	}
 }
@@ -2940,6 +2939,9 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	req->rw.addr = READ_ONCE(sqe->addr);
 	req->rw.len = READ_ONCE(sqe->len);
 	req->buf_index = READ_ONCE(sqe->buf_index);
+	if (req->opcode == IORING_OP_READ_FIXED ||
+	    req->opcode == IORING_OP_WRITE_FIXED)
+		io_get_fixed_rsrc_ref(req, ctx->buf_data);
 	return 0;
 }
 
@@ -6439,7 +6441,7 @@ static struct file *io_file_get(struct io_submit_state *state,
 			return NULL;
 		fd = array_index_nospec(fd, ctx->nr_user_files);
 		file = io_file_from_index(ctx, fd);
-		io_set_resource_node(req);
+		io_get_fixed_rsrc_ref(req, ctx->file_data);
 	} else {
 		trace_io_uring_file_get(ctx, fd);
 		file = __io_file_get(state, fd);
-- 
1.8.3.1



* [PATCH v6 2/5] io_uring: implement fixed buffers registration similar to fixed files
  2021-01-22  0:22 [PATCH v6 0/5] io_uring: buffer registration enhancements Bijan Mottahedeh
  2021-01-22  0:22 ` [PATCH v6 1/5] io_uring: call io_get_fixed_rsrc_ref for buffers Bijan Mottahedeh
@ 2021-01-22  0:22 ` Bijan Mottahedeh
  2021-01-22  0:22 ` [PATCH v6 3/5] io_uring: generalize files_update functionlity to rsrc_update Bijan Mottahedeh
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Bijan Mottahedeh @ 2021-01-22  0:22 UTC
  To: axboe, asml.silence, io-uring

Apply fixed_rsrc functionality for fixed buffers support.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
---
 fs/io_uring.c | 221 ++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 183 insertions(+), 38 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 416c350..2f02e11 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -104,6 +104,14 @@
 #define IORING_MAX_RESTRICTIONS	(IORING_RESTRICTION_LAST + \
 				 IORING_REGISTER_LAST + IORING_OP_LAST)
 
+/*
+ * Shift of 7 is 128 entries, or exactly one page on 64-bit archs
+ */
+#define IORING_BUF_TABLE_SHIFT	7	/* struct io_mapped_ubuf */
+#define IORING_MAX_BUFS_TABLE	(1U << IORING_BUF_TABLE_SHIFT)
+#define IORING_BUF_TABLE_MASK	(IORING_MAX_BUFS_TABLE - 1)
+#define IORING_MAX_FIXED_BUFS	UIO_MAXIOV
+
 struct io_uring {
 	u32 head ____cacheline_aligned_in_smp;
 	u32 tail ____cacheline_aligned_in_smp;
@@ -202,11 +210,15 @@ struct io_rsrc_put {
 	union {
 		void *rsrc;
 		struct file *file;
+		struct io_mapped_ubuf *buf;
 	};
 };
 
 struct fixed_rsrc_table {
-	struct file		**files;
+	union {
+		struct file		**files;
+		struct io_mapped_ubuf	*bufs;
+	};
 };
 
 struct fixed_rsrc_ref_node {
@@ -333,8 +345,8 @@ struct io_ring_ctx {
 	unsigned		nr_user_files;
 
 	/* if used, fixed mapped user buffers */
+	struct fixed_rsrc_data	*buf_data;
 	unsigned		nr_user_bufs;
-	struct io_mapped_ubuf	*user_bufs;
 
 	struct user_struct	*user;
 
@@ -1015,6 +1027,8 @@ static struct fixed_rsrc_ref_node *alloc_fixed_rsrc_ref_node(
 			struct io_ring_ctx *ctx);
 static void init_fixed_file_ref_node(struct io_ring_ctx *ctx,
 				     struct fixed_rsrc_ref_node *ref_node);
+static void init_fixed_buf_ref_node(struct io_ring_ctx *ctx,
+				    struct fixed_rsrc_ref_node *ref_node);
 
 static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
 			     struct io_comp_state *cs);
@@ -2988,6 +3002,15 @@ static void kiocb_done(struct kiocb *kiocb, ssize_t ret,
 		io_rw_done(kiocb, ret);
 }
 
+static inline struct io_mapped_ubuf *io_buf_from_index(struct io_ring_ctx *ctx,
+						       int index)
+{
+	struct fixed_rsrc_table *table;
+
+	table = &ctx->buf_data->table[index >> IORING_BUF_TABLE_SHIFT];
+	return &table->bufs[index & IORING_BUF_TABLE_MASK];
+}
+
 static ssize_t io_import_fixed(struct io_kiocb *req, int rw,
 			       struct iov_iter *iter)
 {
@@ -3001,7 +3024,7 @@ static ssize_t io_import_fixed(struct io_kiocb *req, int rw,
 	if (unlikely(buf_index >= ctx->nr_user_bufs))
 		return -EFAULT;
 	index = array_index_nospec(buf_index, ctx->nr_user_bufs);
-	imu = &ctx->user_bufs[index];
+	imu = io_buf_from_index(ctx, index);
 	buf_addr = req->rw.addr;
 
 	/* overflow */
@@ -6086,7 +6109,7 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 	printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
 			req->opcode);
-	return-EINVAL;
+	return -EINVAL;
 }
 
 static int io_req_defer_prep(struct io_kiocb *req,
@@ -8391,28 +8414,66 @@ static unsigned long ring_pages(unsigned sq_entries, unsigned cq_entries)
 	return pages;
 }
 
-static int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
+static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_mapped_ubuf *imu)
 {
-	int i, j;
+	unsigned int i;
 
-	if (!ctx->user_bufs)
-		return -ENXIO;
+	for (i = 0; i < imu->nr_bvecs; i++)
+		unpin_user_page(imu->bvec[i].bv_page);
 
-	for (i = 0; i < ctx->nr_user_bufs; i++) {
-		struct io_mapped_ubuf *imu = &ctx->user_bufs[i];
+	if (imu->acct_pages)
+		io_unaccount_mem(ctx, imu->nr_bvecs, ACCT_PINNED);
+	kvfree(imu->bvec);
+	imu->nr_bvecs = 0;
+}
 
-		for (j = 0; j < imu->nr_bvecs; j++)
-			unpin_user_page(imu->bvec[j].bv_page);
+static void io_buffers_unmap(struct io_ring_ctx *ctx)
+{
+	unsigned int i;
+	struct io_mapped_ubuf *imu;
 
-		if (imu->acct_pages)
-			io_unaccount_mem(ctx, imu->acct_pages, ACCT_PINNED);
-		kvfree(imu->bvec);
-		imu->nr_bvecs = 0;
+	for (i = 0; i < ctx->nr_user_bufs; i++) {
+		imu = io_buf_from_index(ctx, i);
+		io_buffer_unmap(ctx, imu);
 	}
+}
 
-	kfree(ctx->user_bufs);
-	ctx->user_bufs = NULL;
+static void io_buffers_map_free(struct io_ring_ctx *ctx)
+{
+	struct fixed_rsrc_data *data = ctx->buf_data;
+	unsigned int nr_tables, i;
+
+	if (!data)
+		return;
+
+	nr_tables = DIV_ROUND_UP(ctx->nr_user_bufs, IORING_MAX_BUFS_TABLE);
+	for (i = 0; i < nr_tables; i++)
+		kfree(data->table[i].bufs);
+	free_fixed_rsrc_data(data);
+	ctx->buf_data = NULL;
 	ctx->nr_user_bufs = 0;
+}
+
+static int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
+{
+	struct fixed_rsrc_data *data = ctx->buf_data;
+	struct fixed_rsrc_ref_node *backup_node;
+	int ret;
+
+	if (!data)
+		return -ENXIO;
+	backup_node = alloc_fixed_rsrc_ref_node(ctx);
+	if (!backup_node)
+		return -ENOMEM;
+	init_fixed_buf_ref_node(ctx, backup_node);
+
+	ret = io_rsrc_ref_quiesce(data, ctx, backup_node);
+	if (ret)
+		return ret;
+
+	io_buffers_unmap(ctx);
+	io_buffers_map_free(ctx);
+
 	return 0;
 }
 
@@ -8465,7 +8526,9 @@ static bool headpage_already_acct(struct io_ring_ctx *ctx, struct page **pages,
 
 	/* check previously registered pages */
 	for (i = 0; i < ctx->nr_user_bufs; i++) {
-		struct io_mapped_ubuf *imu = &ctx->user_bufs[i];
+		struct io_mapped_ubuf *imu;
+
+		imu = io_buf_from_index(ctx, i);
 
 		for (j = 0; j < imu->nr_bvecs; j++) {
 			if (!PageCompound(imu->bvec[j].bv_page))
@@ -8600,19 +8663,66 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 	return ret;
 }
 
-static int io_buffers_map_alloc(struct io_ring_ctx *ctx, unsigned int nr_args)
+static void io_free_buf_tables(struct fixed_rsrc_data *buf_data,
+			       unsigned int nr_tables)
 {
-	if (ctx->user_bufs)
-		return -EBUSY;
-	if (!nr_args || nr_args > UIO_MAXIOV)
-		return -EINVAL;
+	int i;
 
-	ctx->user_bufs = kcalloc(nr_args, sizeof(struct io_mapped_ubuf),
-					GFP_KERNEL);
-	if (!ctx->user_bufs)
-		return -ENOMEM;
+	for (i = 0; i < nr_tables; i++) {
+		struct fixed_rsrc_table *table = &buf_data->table[i];
 
-	return 0;
+		kfree(table->bufs);
+	}
+}
+
+static int io_alloc_buf_tables(struct fixed_rsrc_data *buf_data,
+			       unsigned int nr_tables, unsigned int nr_bufs)
+{
+	int i;
+
+	for (i = 0; i < nr_tables; i++) {
+		struct fixed_rsrc_table *table = &buf_data->table[i];
+		unsigned int this_bufs;
+
+		this_bufs = min(nr_bufs, IORING_MAX_BUFS_TABLE);
+		table->bufs = kcalloc(this_bufs, sizeof(struct io_mapped_ubuf),
+				      GFP_KERNEL);
+		if (!table->bufs)
+			break;
+		nr_bufs -= this_bufs;
+	}
+
+	if (i == nr_tables)
+		return 0;
+
+	io_free_buf_tables(buf_data, i);
+	return 1;
+}
+
+static struct fixed_rsrc_data *io_buffers_map_alloc(struct io_ring_ctx *ctx,
+						    unsigned int nr_args)
+{
+	unsigned int nr_tables;
+	struct fixed_rsrc_data *buf_data;
+
+	buf_data = alloc_fixed_rsrc_data(ctx);
+	if (!buf_data)
+		return NULL;
+
+	nr_tables = DIV_ROUND_UP(nr_args, IORING_MAX_BUFS_TABLE);
+	buf_data->table = kcalloc(nr_tables, sizeof(*buf_data->table),
+				  GFP_KERNEL);
+	if (!buf_data->table)
+		goto out;
+
+	if (io_alloc_buf_tables(buf_data, nr_tables, nr_args))
+		goto out;
+
+	return buf_data;
+out:
+	free_fixed_rsrc_data(ctx->buf_data);
+	ctx->buf_data = NULL;
+	return NULL;
 }
 
 static int io_buffer_validate(struct iovec *iov)
@@ -8632,39 +8742,73 @@ static int io_buffer_validate(struct iovec *iov)
 	return 0;
 }
 
+static void io_ring_buf_put(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc)
+{
+	io_buffer_unmap(ctx, prsrc->buf);
+}
+
+static void init_fixed_buf_ref_node(struct io_ring_ctx *ctx,
+				    struct fixed_rsrc_ref_node *ref_node)
+{
+	ref_node->rsrc_data = ctx->buf_data;
+	ref_node->rsrc_put = io_ring_buf_put;
+}
+
 static int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 				   unsigned int nr_args)
 {
 	int i, ret;
 	struct iovec iov;
 	struct page *last_hpage = NULL;
+	struct fixed_rsrc_ref_node *ref_node;
+	struct fixed_rsrc_data *buf_data;
 
-	ret = io_buffers_map_alloc(ctx, nr_args);
-	if (ret)
-		return ret;
+	if (ctx->buf_data)
+		return -EBUSY;
+	if (!nr_args || nr_args > IORING_MAX_FIXED_BUFS)
+		return -EINVAL;
 
-	for (i = 0; i < nr_args; i++) {
-		struct io_mapped_ubuf *imu = &ctx->user_bufs[i];
+	buf_data = io_buffers_map_alloc(ctx, nr_args);
+	if (!buf_data)
+		return -ENOMEM;
+	ctx->buf_data = buf_data;
+
+	for (i = 0; i < nr_args; i++, ctx->nr_user_bufs++) {
+		struct io_mapped_ubuf *imu;
 
 		ret = io_copy_iov(ctx, &iov, arg, i);
 		if (ret)
 			break;
 
+		/* allow sparse sets */
+		if (!iov.iov_base && !iov.iov_len)
+			continue;
+
 		ret = io_buffer_validate(&iov);
 		if (ret)
 			break;
 
+		imu = io_buf_from_index(ctx, i);
+
 		ret = io_sqe_buffer_register(ctx, &iov, imu, &last_hpage);
 		if (ret)
 			break;
+	}
 
-		ctx->nr_user_bufs++;
+	if (ret) {
+		io_sqe_buffers_unregister(ctx);
+		return ret;
 	}
 
-	if (ret)
+	ref_node = alloc_fixed_rsrc_ref_node(ctx);
+	if (!ref_node) {
 		io_sqe_buffers_unregister(ctx);
+		return -ENOMEM;
+	}
+	init_fixed_buf_ref_node(ctx, ref_node);
 
-	return ret;
+	io_sqe_rsrc_set_node(ctx, buf_data, ref_node);
+	return 0;
 }
 
 static int io_eventfd_register(struct io_ring_ctx *ctx, void __user *arg)
@@ -9508,7 +9652,7 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 	}
 	seq_printf(m, "UserBufs:\t%u\n", ctx->nr_user_bufs);
 	for (i = 0; has_lock && i < ctx->nr_user_bufs; i++) {
-		struct io_mapped_ubuf *buf = &ctx->user_bufs[i];
+		struct io_mapped_ubuf *buf = io_buf_from_index(ctx, i);
 
 		seq_printf(m, "%5u: 0x%llx/%u\n", i, buf->ubuf,
 						(unsigned int) buf->len);
@@ -10025,6 +10169,7 @@ static bool io_register_op_must_quiesce(int op)
 	switch (op) {
 	case IORING_UNREGISTER_FILES:
 	case IORING_REGISTER_FILES_UPDATE:
+	case IORING_UNREGISTER_BUFFERS:
 	case IORING_REGISTER_PROBE:
 	case IORING_REGISTER_PERSONALITY:
 	case IORING_UNREGISTER_PERSONALITY:
-- 
1.8.3.1



* [PATCH v6 3/5] io_uring: generalize files_update functionlity to rsrc_update
  2021-01-22  0:22 [PATCH v6 0/5] io_uring: buffer registration enhancements Bijan Mottahedeh
  2021-01-22  0:22 ` [PATCH v6 1/5] io_uring: call io_get_fixed_rsrc_ref for buffers Bijan Mottahedeh
  2021-01-22  0:22 ` [PATCH v6 2/5] io_uring: implement fixed buffers registration similar to fixed files Bijan Mottahedeh
@ 2021-01-22  0:22 ` Bijan Mottahedeh
  2021-01-22  0:22 ` [PATCH v6 4/5] io_uring: support buffer registration updates Bijan Mottahedeh
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Bijan Mottahedeh @ 2021-01-22  0:22 UTC
  To: axboe, asml.silence, io-uring

Generalize files_update functionality to rsrc_update in order to
leverage it for buffer updates.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
---
 fs/io_uring.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 2f02e11..62e1b84 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5994,7 +5994,7 @@ static int io_async_cancel(struct io_kiocb *req)
 }
 
 static int io_rsrc_update_prep(struct io_kiocb *req,
-				const struct io_uring_sqe *sqe)
+			       const struct io_uring_sqe *sqe)
 {
 	if (unlikely(req->ctx->flags & IORING_SETUP_SQPOLL))
 		return -EINVAL;
@@ -6011,8 +6011,8 @@ static int io_rsrc_update_prep(struct io_kiocb *req,
 	return 0;
 }
 
-static int io_files_update(struct io_kiocb *req, bool force_nonblock,
-			   struct io_comp_state *cs)
+static int io_rsrc_update(struct io_kiocb *req, bool force_nonblock,
+			  struct io_comp_state *cs)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_uring_rsrc_update up;
@@ -6025,7 +6025,10 @@ static int io_files_update(struct io_kiocb *req, bool force_nonblock,
 	up.data = req->rsrc_update.arg;
 
 	mutex_lock(&ctx->uring_lock);
-	ret = __io_sqe_files_update(ctx, &up, req->rsrc_update.nr_args);
+	if (req->opcode == IORING_OP_FILES_UPDATE)
+		ret = __io_sqe_files_update(ctx, &up, req->rsrc_update.nr_args);
+	else
+		ret = -EINVAL;
 	mutex_unlock(&ctx->uring_lock);
 
 	if (ret < 0)
@@ -6326,7 +6329,7 @@ static int io_issue_sqe(struct io_kiocb *req, bool force_nonblock,
 		ret = io_close(req, force_nonblock, cs);
 		break;
 	case IORING_OP_FILES_UPDATE:
-		ret = io_files_update(req, force_nonblock, cs);
+		ret = io_rsrc_update(req, force_nonblock, cs);
 		break;
 	case IORING_OP_STATX:
 		ret = io_statx(req, force_nonblock);
-- 
1.8.3.1



* [PATCH v6 4/5] io_uring: support buffer registration updates
  2021-01-22  0:22 [PATCH v6 0/5] io_uring: buffer registration enhancements Bijan Mottahedeh
                   ` (2 preceding siblings ...)
  2021-01-22  0:22 ` [PATCH v6 3/5] io_uring: generalize files_update functionlity to rsrc_update Bijan Mottahedeh
@ 2021-01-22  0:22 ` Bijan Mottahedeh
  2021-01-22  0:22 ` [PATCH v6 5/5] io_uring: support buffer registration sharing Bijan Mottahedeh
  2021-01-26  5:31 ` [PATCH v6 0/5] io_uring: buffer registration enhancements Bijan Mottahedeh
  5 siblings, 0 replies; 7+ messages in thread
From: Bijan Mottahedeh @ 2021-01-22  0:22 UTC
  To: axboe, asml.silence, io-uring

Introduce IORING_REGISTER_BUFFERS_UPDATE and IORING_OP_BUFFERS_UPDATE,
consistent with file registration update.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
---
 fs/io_uring.c                 | 125 +++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/io_uring.h |   2 +
 2 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 62e1b84..15f0e41 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1012,6 +1012,9 @@ struct io_op_def {
 		.work_flags		= IO_WQ_WORK_MM | IO_WQ_WORK_FILES |
 						IO_WQ_WORK_FS | IO_WQ_WORK_BLKCG,
 	},
+	[IORING_OP_BUFFERS_UPDATE] = {
+		.work_flags		= IO_WQ_WORK_MM,
+	},
 };
 
 enum io_mem_account {
@@ -1042,6 +1045,9 @@ static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
 static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 				 struct io_uring_rsrc_update *ip,
 				 unsigned nr_args);
+static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
+				   struct io_uring_rsrc_update *up,
+				   unsigned int nr_args);
 static void __io_clean_op(struct io_kiocb *req);
 static struct file *io_file_get(struct io_submit_state *state,
 				struct io_kiocb *req, int fd, bool fixed);
@@ -6016,6 +6022,7 @@ static int io_rsrc_update(struct io_kiocb *req, bool force_nonblock,
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_uring_rsrc_update up;
+	u32 nr_args;
 	int ret;
 
 	if (force_nonblock)
@@ -6025,8 +6032,11 @@ static int io_rsrc_update(struct io_kiocb *req, bool force_nonblock,
 	up.data = req->rsrc_update.arg;
 
 	mutex_lock(&ctx->uring_lock);
+	nr_args = req->rsrc_update.nr_args;
 	if (req->opcode == IORING_OP_FILES_UPDATE)
-		ret = __io_sqe_files_update(ctx, &up, req->rsrc_update.nr_args);
+		ret = __io_sqe_files_update(ctx, &up, nr_args);
+	else if (req->opcode == IORING_OP_BUFFERS_UPDATE)
+		ret = __io_sqe_buffers_update(ctx, &up, nr_args);
 	else
 		ret = -EINVAL;
 	mutex_unlock(&ctx->uring_lock);
@@ -6108,6 +6118,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return io_renameat_prep(req, sqe);
 	case IORING_OP_UNLINKAT:
 		return io_unlinkat_prep(req, sqe);
+	case IORING_OP_BUFFERS_UPDATE:
+		return io_rsrc_update_prep(req, sqe);
 	}
 
 	printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
@@ -6329,6 +6341,7 @@ static int io_issue_sqe(struct io_kiocb *req, bool force_nonblock,
 		ret = io_close(req, force_nonblock, cs);
 		break;
 	case IORING_OP_FILES_UPDATE:
+	case IORING_OP_BUFFERS_UPDATE:
 		ret = io_rsrc_update(req, force_nonblock, cs);
 		break;
 	case IORING_OP_STATX:
@@ -8093,8 +8106,9 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 	if (needs_switch) {
 		percpu_ref_kill(&data->node->refs);
 		io_sqe_rsrc_set_node(ctx, data, ref_node);
-	} else
+	} else {
 		destroy_fixed_rsrc_ref_node(ref_node);
+	}
 
 	return done ? done : err;
 }
@@ -8427,6 +8441,7 @@ static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_mapped_ubuf *imu)
 	if (imu->acct_pages)
 		io_unaccount_mem(ctx, imu->nr_bvecs, ACCT_PINNED);
 	kvfree(imu->bvec);
+	imu->bvec = NULL;
 	imu->nr_bvecs = 0;
 }
 
@@ -8633,6 +8648,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 		if (pret > 0)
 			unpin_user_pages(pages, pret);
 		kvfree(imu->bvec);
+		imu->bvec = NULL;
 		goto done;
 	}
 
@@ -8748,6 +8764,8 @@ static int io_buffer_validate(struct iovec *iov)
 static void io_ring_buf_put(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc)
 {
 	io_buffer_unmap(ctx, prsrc->buf);
+	kvfree(prsrc->buf);
+	prsrc->buf = NULL;
 }
 
 static void init_fixed_buf_ref_node(struct io_ring_ctx *ctx,
@@ -8814,6 +8832,105 @@ static int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 	return 0;
 }
 
+static inline int io_queue_buffer_removal(struct fixed_rsrc_data *data,
+					  struct io_mapped_ubuf *imu)
+{
+	return io_queue_rsrc_removal(data, (void *)imu);
+}
+
+static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
+				   struct io_uring_rsrc_update *up,
+				   unsigned int nr_args)
+{
+	struct fixed_rsrc_data *data = ctx->buf_data;
+	struct fixed_rsrc_ref_node *ref_node;
+	struct io_mapped_ubuf *imu;
+	struct iovec iov;
+	struct iovec __user *iovs;
+	struct page *last_hpage = NULL;
+	__u32 done;
+	int i, err;
+	bool needs_switch = false;
+
+	if (check_add_overflow(up->offset, nr_args, &done))
+		return -EOVERFLOW;
+	if (done > ctx->nr_user_bufs)
+		return -EINVAL;
+
+	ref_node = alloc_fixed_rsrc_ref_node(ctx);
+	if (!ref_node)
+		return -ENOMEM;
+	init_fixed_buf_ref_node(ctx, ref_node);
+
+	done = 0;
+	iovs = u64_to_user_ptr(up->data);
+	while (nr_args) {
+		struct fixed_rsrc_table *table;
+		unsigned int index;
+
+		err = 0;
+		if (copy_from_user(&iov, &iovs[done], sizeof(iov))) {
+			err = -EFAULT;
+			break;
+		}
+		i = array_index_nospec(up->offset, ctx->nr_user_bufs);
+		table = &ctx->buf_data->table[i >> IORING_BUF_TABLE_SHIFT];
+		index = i & IORING_BUF_TABLE_MASK;
+		imu = &table->bufs[index];
+		if (table->bufs[index].ubuf) {
+			struct io_mapped_ubuf *dup;
+
+			dup = kmemdup(imu, sizeof(*imu), GFP_KERNEL);
+			if (!dup) {
+				err = -ENOMEM;
+				break;
+			}
+			err = io_queue_buffer_removal(data, dup);
+			if (err)
+				break;
+			memset(imu, 0, sizeof(*imu));
+			needs_switch = true;
+		}
+		if (!io_buffer_validate(&iov)) {
+			err = io_sqe_buffer_register(ctx, &iov, imu,
+						     &last_hpage);
+			if (err) {
+				memset(imu, 0, sizeof(*imu));
+				break;
+			}
+		}
+		nr_args--;
+		done++;
+		up->offset++;
+	}
+
+	if (needs_switch) {
+		percpu_ref_kill(&data->node->refs);
+		io_sqe_rsrc_set_node(ctx, data, ref_node);
+	} else {
+		destroy_fixed_rsrc_ref_node(ref_node);
+	}
+
+	return done ? done : err;
+}
+
+static int io_sqe_buffers_update(struct io_ring_ctx *ctx, void __user *arg,
+				 unsigned int nr_args)
+{
+	struct io_uring_rsrc_update up;
+
+	if (!ctx->buf_data)
+		return -ENXIO;
+	if (!nr_args)
+		return -EINVAL;
+	if (copy_from_user(&up, arg, sizeof(up)))
+		return -EFAULT;
+	if (up.resv)
+		return -EINVAL;
+
+	return __io_sqe_buffers_update(ctx, &up, nr_args);
+}
+
 static int io_eventfd_register(struct io_ring_ctx *ctx, void __user *arg)
 {
 	__s32 __user *fds = arg;
@@ -10173,6 +10290,7 @@ static bool io_register_op_must_quiesce(int op)
 	case IORING_UNREGISTER_FILES:
 	case IORING_REGISTER_FILES_UPDATE:
 	case IORING_UNREGISTER_BUFFERS:
+	case IORING_REGISTER_BUFFERS_UPDATE:
 	case IORING_REGISTER_PROBE:
 	case IORING_REGISTER_PERSONALITY:
 	case IORING_UNREGISTER_PERSONALITY:
@@ -10248,6 +10366,9 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			break;
 		ret = io_sqe_buffers_unregister(ctx);
 		break;
+	case IORING_REGISTER_BUFFERS_UPDATE:
+		ret = io_sqe_buffers_update(ctx, arg, nr_args);
+		break;
 	case IORING_REGISTER_FILES:
 		ret = io_sqe_files_register(ctx, arg, nr_args);
 		break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index f9f106c..32b3fa6 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -137,6 +137,7 @@ enum {
 	IORING_OP_SHUTDOWN,
 	IORING_OP_RENAMEAT,
 	IORING_OP_UNLINKAT,
+	IORING_OP_BUFFERS_UPDATE,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
@@ -280,6 +281,7 @@ enum {
 	IORING_UNREGISTER_PERSONALITY		= 10,
 	IORING_REGISTER_RESTRICTIONS		= 11,
 	IORING_REGISTER_ENABLE_RINGS		= 12,
+	IORING_REGISTER_BUFFERS_UPDATE		= 13,
 
 	/* this goes last */
 	IORING_REGISTER_LAST
-- 
1.8.3.1



* [PATCH v6 5/5] io_uring: support buffer registration sharing
  2021-01-22  0:22 [PATCH v6 0/5] io_uring: buffer registration enhancements Bijan Mottahedeh
                   ` (3 preceding siblings ...)
  2021-01-22  0:22 ` [PATCH v6 4/5] io_uring: support buffer registration updates Bijan Mottahedeh
@ 2021-01-22  0:22 ` Bijan Mottahedeh
  2021-01-26  5:31 ` [PATCH v6 0/5] io_uring: buffer registration enhancements Bijan Mottahedeh
  5 siblings, 0 replies; 7+ messages in thread
From: Bijan Mottahedeh @ 2021-01-22  0:22 UTC
  To: axboe, asml.silence, io-uring

Implement buffer sharing among multiple rings.

A ring shares its (future) buffer registrations at setup time with
IORING_SETUP_SHARE_BUF. A ring attaches to another ring's buffer
registration at setup time with IORING_SETUP_ATTACH_BUF, after
authenticating with the buffer registration owner's fd. Any updates to
the owner's buffer registrations become immediately available to the
attached rings.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
---
 fs/io_uring.c                 | 87 +++++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/io_uring.h |  2 +
 2 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 15f0e41..0e9da02 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8472,6 +8472,13 @@ static void io_buffers_map_free(struct io_ring_ctx *ctx)
 	ctx->nr_user_bufs = 0;
 }
 
+static void io_detach_buf_data(struct io_ring_ctx *ctx)
+{
+	percpu_ref_put(&ctx->buf_data->refs);
+	ctx->buf_data = NULL;
+	ctx->nr_user_bufs = 0;
+}
+
 static int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
 {
 	struct fixed_rsrc_data *data = ctx->buf_data;
@@ -8480,6 +8487,12 @@ static int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
 
 	if (!data)
 		return -ENXIO;
+
+	if (ctx->flags & IORING_SETUP_ATTACH_BUF) {
+		io_detach_buf_data(ctx);
+		return 0;
+	}
+
 	backup_node = alloc_fixed_rsrc_ref_node(ctx);
 	if (!backup_node)
 		return -ENOMEM;
@@ -8724,9 +8737,13 @@ static struct fixed_rsrc_data *io_buffers_map_alloc(struct io_ring_ctx *ctx,
 	unsigned int nr_tables;
 	struct fixed_rsrc_data *buf_data;
 
-	buf_data = alloc_fixed_rsrc_data(ctx);
-	if (!buf_data)
-		return NULL;
+	if (ctx->buf_data) {
+		buf_data = ctx->buf_data;
+	} else {
+		buf_data = alloc_fixed_rsrc_data(ctx);
+		if (!buf_data)
+			return buf_data;
+	}
 
 	nr_tables = DIV_ROUND_UP(nr_args, IORING_MAX_BUFS_TABLE);
 	buf_data->table = kcalloc(nr_tables, sizeof(*buf_data->table),
@@ -8784,8 +8801,16 @@ static int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 	struct fixed_rsrc_ref_node *ref_node;
 	struct fixed_rsrc_data *buf_data;
 
-	if (ctx->buf_data)
+	if (ctx->nr_user_bufs)
 		return -EBUSY;
+
+	if (ctx->flags & IORING_SETUP_ATTACH_BUF) {
+		if (!ctx->buf_data)
+			return -EFAULT;
+		ctx->nr_user_bufs = ctx->buf_data->ctx->nr_user_bufs;
+		return 0;
+	}
+
 	if (!nr_args || nr_args > IORING_MAX_FIXED_BUFS)
 		return -EINVAL;
 
@@ -9914,6 +9939,55 @@ static struct file *io_uring_get_file(struct io_ring_ctx *ctx)
 	return file;
 }
 
+static int io_attach_buf_data(struct io_ring_ctx *ctx,
+			      struct io_uring_params *p)
+{
+	struct io_ring_ctx *ctx_attach;
+	struct fd f;
+
+	f = fdget(p->wq_fd);
+	if (!f.file)
+		return -EBADF;
+	if (f.file->f_op != &io_uring_fops) {
+		fdput(f);
+		return -EINVAL;
+	}
+
+	ctx_attach = f.file->private_data;
+	if (!ctx_attach->buf_data) {
+		fdput(f);
+		return -EINVAL;
+	}
+	ctx->buf_data = ctx_attach->buf_data;
+
+	percpu_ref_get(&ctx->buf_data->refs);
+	fdput(f);
+	return 0;
+}
+
+static int io_init_buf_data(struct io_ring_ctx *ctx, struct io_uring_params *p)
+{
+	if ((p->flags & (IORING_SETUP_SHARE_BUF | IORING_SETUP_ATTACH_BUF)) ==
+	    (IORING_SETUP_SHARE_BUF | IORING_SETUP_ATTACH_BUF))
+		return -EINVAL;
+
+	if (p->flags & IORING_SETUP_SHARE_BUF) {
+		struct fixed_rsrc_data *buf_data;
+
+		buf_data = alloc_fixed_rsrc_data(ctx);
+		if (!buf_data)
+			return -ENOMEM;
+
+		ctx->buf_data = buf_data;
+		return 0;
+	}
+
+	if (p->flags & IORING_SETUP_ATTACH_BUF)
+		return io_attach_buf_data(ctx, p);
+
+	return 0;
+}
+
 static int io_uring_create(unsigned entries, struct io_uring_params *p,
 			   struct io_uring_params __user *params)
 {
@@ -10031,6 +10105,10 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
 	if (ret)
 		goto err;
 
+	ret = io_init_buf_data(ctx, p);
+	if (ret)
+		goto err;
+
 	ret = io_sq_offload_create(ctx, p);
 	if (ret)
 		goto err;
@@ -10113,6 +10191,7 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 	if (p.flags & ~(IORING_SETUP_IOPOLL | IORING_SETUP_SQPOLL |
 			IORING_SETUP_SQ_AFF | IORING_SETUP_CQSIZE |
 			IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
+			IORING_SETUP_SHARE_BUF | IORING_SETUP_ATTACH_BUF |
 			IORING_SETUP_R_DISABLED))
 		return -EINVAL;
 
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 32b3fa6..aeaf72c 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -98,6 +98,8 @@ enum {
 #define IORING_SETUP_CLAMP	(1U << 4)	/* clamp SQ/CQ ring sizes */
 #define IORING_SETUP_ATTACH_WQ	(1U << 5)	/* attach to existing wq */
 #define IORING_SETUP_R_DISABLED	(1U << 6)	/* start with ring disabled */
+#define IORING_SETUP_SHARE_BUF	(1U << 7)	/* share buffer registration */
+#define IORING_SETUP_ATTACH_BUF	(1U << 8)	/* attach buffer registration */
 
 enum {
 	IORING_OP_NOP,
-- 
1.8.3.1



* Re: [PATCH v6 0/5] io_uring: buffer registration enhancements
  2021-01-22  0:22 [PATCH v6 0/5] io_uring: buffer registration enhancements Bijan Mottahedeh
                   ` (4 preceding siblings ...)
  2021-01-22  0:22 ` [PATCH v6 5/5] io_uring: support buffer registration sharing Bijan Mottahedeh
@ 2021-01-26  5:31 ` Bijan Mottahedeh
  5 siblings, 0 replies; 7+ messages in thread
From: Bijan Mottahedeh @ 2021-01-26  5:31 UTC (permalink / raw)
  To: axboe, asml.silence, io-uring

Gentle reminder to please review this next version.

> v6:
> 
> - address v5 comments
> - rebase on Pavel's rsrc generalization changes
> - see also TBD section below
> 
> v5:
> 
> - call io_get_fixed_rsrc_ref for buffers
> - make percpu_ref_release names consistent
> - rebase on for-5.12/io_uring
> - see also TBD section below
> 
> v4:
> 
> - address v3 comments (TBD REGISTER_BUFFERS)
> - rebase
> 
> v3:
> 
> - batch file->rsrc renames into a single patch when possible
> - fix other review changes from v2
> - fix checkpatch warnings
> 
> v2:
> 
> - drop readv/writev with fixed buffers patch
> - handle ref_nodes for both files/buffers with a single ref_list
> - make file/buffer handling more unified
> 
> This patchset implements a set of enhancements to buffer registration
> consistent with existing file registration functionality:
> 
> - buffer registration updates		IORING_REGISTER_BUFFERS_UPDATE
> 					IORING_OP_BUFFERS_UPDATE
> 
> - buffer registration sharing		IORING_SETUP_SHARE_BUF
> 					IORING_SETUP_ATTACH_BUF
> 
> Patch 1 calls io_get_fixed_rsrc_ref() for buffers as well as files.
> 
> Patch 2 applies fixed_rsrc functionality for fixed buffers support.
> 
> Patch 3 generalizes files_update functionality to rsrc_update.
> 
> Patch 4 implements buffer registration update, and introduces
> IORING_REGISTER_BUFFERS_UPDATE and IORING_OP_BUFFERS_UPDATE, consistent
> with file registration update.
> 
> Patch 5 implements buffer sharing among multiple rings; it works as follows:
> 
> - A new ring, A, is set up. Since no buffers have been registered, the
>    registered buffer state is an empty set, Z. That's different from the
>    NULL state in the current implementation.
> 
> - Ring B is set up, attaching to Ring A. It also attaches to A's
>    buffer registrations, so we now have two references to the same empty
>    set, Z.
> 
> - Ring A registers buffers into set Z, which is no longer empty.
> 
> - Ring B sees this immediately, since it's already sharing that set.
> 
> Testing
> 
> I have used liburing file-{register,update} tests as models for
> buffer-{register,update,share} tests, and they run ok. Liburing test/self
> fails but seems unrelated to these changes.
> 
> TBD
> 
> - Need a patch from Pavel to address a race between fixed IO from async
> context and buffer unregister, or force buffer registration ops to do
> full quiesce.
> 
> Bijan Mottahedeh (5):
>    io_uring: call io_get_fixed_rsrc_ref for buffers
>    io_uring: implement fixed buffers registration similar to fixed files
>    io_uring: generalize files_update functionlity to rsrc_update
>    io_uring: support buffer registration updates
>    io_uring: support buffer registration sharing
> 
>   fs/io_uring.c                 | 448 +++++++++++++++++++++++++++++++++++++-----
>   include/uapi/linux/io_uring.h |   4 +
>   2 files changed, 403 insertions(+), 49 deletions(-)
> 
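
The sharing sequence described for patch 5 (ring A creates the empty set Z,
ring B attaches, A registers, B sees it) can be modelled in user space roughly
as below. This is a simplified sketch, not the kernel code: struct buf_set,
struct ring and the ring_*() helpers are hypothetical stand-ins for
fixed_rsrc_data, io_ring_ctx and the setup paths, and a plain integer replaces
the percpu_ref.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_BUFS 8	/* arbitrary limit for the model */

/* Stand-in for struct fixed_rsrc_data: the shared registration set "Z". */
struct buf_set {
	int refs;		/* plain counter standing in for percpu_ref */
	unsigned int nr_bufs;	/* number of registered buffers */
	void *bufs[MAX_BUFS];
};

/* Stand-in for struct io_ring_ctx. */
struct ring {
	struct buf_set *buf_data;	/* NULL: no share/attach requested */
};

/* Like IORING_SETUP_SHARE_BUF: allocate an empty, non-NULL set up front. */
static int ring_share(struct ring *r)
{
	r->buf_data = calloc(1, sizeof(*r->buf_data));
	if (!r->buf_data)
		return -ENOMEM;
	r->buf_data->refs = 1;
	return 0;
}

/* Like IORING_SETUP_ATTACH_BUF: point at the owner's set and take a ref. */
static int ring_attach(struct ring *r, const struct ring *owner)
{
	if (!owner->buf_data)
		return -EINVAL;	/* owner was not set up for sharing */
	r->buf_data = owner->buf_data;
	r->buf_data->refs++;
	return 0;
}

/* Register one buffer into whatever set this ring points at. */
static int ring_register(struct ring *r, void *buf)
{
	struct buf_set *z = r->buf_data;

	if (!z || z->nr_bufs == MAX_BUFS)
		return -EINVAL;
	z->bufs[z->nr_bufs++] = buf;
	return 0;
}

/* Drop this ring's reference; the last one out frees the set. */
static void ring_exit(struct ring *r)
{
	struct buf_set *z = r->buf_data;

	if (!z)
		return;
	assert(z->refs > 0);
	if (--z->refs == 0)
		free(z);
	r->buf_data = NULL;
}
```

Because both rings hold the same pointer, a registration made through ring A
is visible through ring B without any extra synchronization step in this
model; the race noted under TBD is deliberately not represented here.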



