* [PATCH 00/10] io_uring/ublk: exit notifier support
@ 2023-09-18  4:10 Ming Lei
  2023-09-18  4:10 ` [PATCH 01/10] io_uring: allocate ctx id and build map between id and ctx Ming Lei
                   ` (10 more replies)
  0 siblings, 11 replies; 16+ messages in thread
From: Ming Lei @ 2023-09-18  4:10 UTC (permalink / raw)
  To: Jens Axboe, linux-block, io-uring; +Cc: ZiyangZhang, Ming Lei

Hello,

In do_exit(), io_uring needs to wait for pending requests.

ublk is a uring_cmd driver, and its usage is a bit special: a command is
submitted in advance to wait for an incoming block IO request, so if no
IO request arrives, the command can't be completed. So far the ublk
driver has to bind each queue to one ublk daemon server, and it also
starts a monitor work to periodically check whether that daemon is
still alive. This requires a ublk queue to be bound to a single daemon
pthread, which is not flexible; meantime the monitor work runs in a 3rd
context, and the implementation is a bit tricky.

The first 3 patches add an io_uring task exit notifier, and the other
patches convert ublk over to this exit notifier. The implementation
becomes more robust & readable, and it also becomes easier to relax
the ublk queue/daemon limit in the future, such as not requiring a
ublk queue to be bound to a single daemon.
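
For reference, below is a rough sketch of the driver-side pattern this
series enables; io_uring_cmd_register_notifier()/_unregister_notifier()
are added in patch 3, while my_dev, my_dev_open/release() and
my_exit_notifier_cb() are made-up placeholders:

struct my_dev {
	struct notifier_block notif;
	/* driver specific state ... */
};

static int my_exit_notifier_cb(struct notifier_block *nb,
		unsigned long event, void *val)
{
	/* cancel inflight uring_cmds tied to the exiting ctx/task */
	return 0;
}

static int my_dev_open(struct my_dev *dev)
{
	/* get notified when an io_uring ctx or its io task goes away */
	dev->notif.notifier_call = my_exit_notifier_cb;
	return io_uring_cmd_register_notifier(&dev->notif);
}

static void my_dev_release(struct my_dev *dev)
{
	io_uring_cmd_unregister_notifier(&dev->notif);
}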

Ming Lei (10):
  io_uring: allocate ctx id and build map between id and ctx
  io_uring: pass io_uring_ctx->id to uring_cmd
  io_uring: support io_uring notifier for uring_cmd
  ublk: don't get ublk device reference in ublk_abort_queue()
  ublk: make sure ublk uring cmd handling is done in submitter task
    context
  ublk: make sure that uring cmd aiming at same queue won't cross
    io_uring contexts
  ublk: rename mm_lock as lock
  ublk: quiesce request queue when aborting queue
  ublk: replace monitor work with uring_cmd exit notifier
  ublk: simplify aborting request

 drivers/block/ublk_drv.c       | 216 +++++++++++++++++++--------------
 include/linux/io_uring.h       |  27 ++++-
 include/linux/io_uring_types.h |   3 +
 io_uring/io_uring.c            |  57 +++++++++
 io_uring/io_uring.h            |   4 +
 io_uring/uring_cmd.c           |  13 ++
 6 files changed, 230 insertions(+), 90 deletions(-)

-- 
2.40.1



* [PATCH 01/10] io_uring: allocate ctx id and build map between id and ctx
  2023-09-18  4:10 [PATCH 00/10] io_uring/ublk: exit notifier support Ming Lei
@ 2023-09-18  4:10 ` Ming Lei
  2023-09-18  4:10 ` [PATCH 02/10] io_uring: pass io_uring_ctx->id to uring_cmd Ming Lei
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-09-18  4:10 UTC (permalink / raw)
  To: Jens Axboe, linux-block, io-uring; +Cc: ZiyangZhang, Ming Lei

Prepare for notifying the uring_cmd driver when a ctx or io_uring_task
is going away.

A notifier callback will be registered by the driver to get notified,
so that the driver can cancel in-flight commands which may depend on
the io task.

For the driver to check whether a ctx matches a uring_cmd, allocate a
ctx id and provide it to the callback, so we can avoid exposing the
whole ctx instance.

A global xarray, ctx_ids, is added for holding the mapping and for
allocating a unique id for each ctx.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/io_uring.h       | 2 ++
 include/linux/io_uring_types.h | 3 +++
 io_uring/io_uring.c            | 9 +++++++++
 3 files changed, 14 insertions(+)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 106cdc55ff3b..ec9714e36477 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -41,6 +41,8 @@ static inline const void *io_uring_sqe_cmd(const struct io_uring_sqe *sqe)
 	return sqe->cmd;
 }
 
+#define IO_URING_INVALID_CTX_ID  UINT_MAX
+
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter, void *ioucmd);
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 13d19b9be9f4..d310bb073101 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -215,6 +215,9 @@ struct io_ring_ctx {
 		struct percpu_ref	refs;
 
 		enum task_work_notify_mode	notify_method;
+
+		/* for uring cmd driver to retrieve context  */
+		unsigned int		id;
 	} ____cacheline_aligned_in_smp;
 
 	/* submission data */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 783ed0fff71b..c015c070ff85 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -175,6 +175,9 @@ static struct ctl_table kernel_io_uring_disabled_table[] = {
 };
 #endif
 
+/* mapping between io_ring_ctx instance and its ctx_id */
+static DEFINE_XARRAY_FLAGS(ctx_ids, XA_FLAGS_ALLOC);
+
 struct sock *io_uring_get_socket(struct file *file)
 {
 #if defined(CONFIG_UNIX)
@@ -303,6 +306,10 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 
 	xa_init(&ctx->io_bl_xa);
 
+	ctx->id = IO_URING_INVALID_CTX_ID;
+	if (xa_alloc(&ctx_ids, &ctx->id, ctx, xa_limit_31b, GFP_KERNEL))
+		goto err;
+
 	/*
 	 * Use 5 bits less than the max cq entries, that should give us around
 	 * 32 entries per hash list if totally full and uniformly spread, but
@@ -356,6 +363,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	kfree(ctx->cancel_table_locked.hbs);
 	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
+	xa_erase(&ctx_ids, ctx->id);
 	kfree(ctx);
 	return NULL;
 }
@@ -2929,6 +2937,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	kfree(ctx->cancel_table_locked.hbs);
 	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
+	xa_erase(&ctx_ids, ctx->id);
 	kfree(ctx);
 }
 
-- 
2.40.1



* [PATCH 02/10] io_uring: pass io_uring_ctx->id to uring_cmd
  2023-09-18  4:10 [PATCH 00/10] io_uring/ublk: exit notifier support Ming Lei
  2023-09-18  4:10 ` [PATCH 01/10] io_uring: allocate ctx id and build map between id and ctx Ming Lei
@ 2023-09-18  4:10 ` Ming Lei
  2023-09-18  4:10 ` [PATCH 03/10] io_uring: support io_uring notifier for uring_cmd Ming Lei
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-09-18  4:10 UTC (permalink / raw)
  To: Jens Axboe, linux-block, io-uring; +Cc: ZiyangZhang, Ming Lei

Pass io_uring_ctx->id to the uring_cmd driver, preparing for support of
the io_uring ctx or task exit notifier.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/io_uring.h | 6 +++++-
 io_uring/uring_cmd.c     | 1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index ec9714e36477..c395807bd7cf 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -33,7 +33,11 @@ struct io_uring_cmd {
 	};
 	u32		cmd_op;
 	u32		flags;
-	u8		pdu[32]; /* available inline for free use */
+	union {
+		/* driver needs to save ctx_id */
+		u32		ctx_id;
+		u8		pdu[32]; /* available inline for free use */
+	};
 };
 
 static inline const void *io_uring_sqe_cmd(const struct io_uring_sqe *sqe)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 537795fddc87..c54c627fb6b9 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -105,6 +105,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		req->imu = ctx->user_bufs[index];
 		io_req_set_rsrc_node(req, ctx, 0);
 	}
+	ioucmd->ctx_id = req->ctx->id;
 	ioucmd->sqe = sqe;
 	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
 	return 0;
-- 
2.40.1



* [PATCH 03/10] io_uring: support io_uring notifier for uring_cmd
  2023-09-18  4:10 [PATCH 00/10] io_uring/ublk: exit notifier support Ming Lei
  2023-09-18  4:10 ` [PATCH 01/10] io_uring: allocate ctx id and build map between id and ctx Ming Lei
  2023-09-18  4:10 ` [PATCH 02/10] io_uring: pass io_uring_ctx->id to uring_cmd Ming Lei
@ 2023-09-18  4:10 ` Ming Lei
  2023-09-18  4:11 ` [PATCH 04/10] ublk: don't get ublk device reference in ublk_abort_queue() Ming Lei
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-09-18  4:10 UTC (permalink / raw)
  To: Jens Axboe, linux-block, io-uring; +Cc: ZiyangZhang, Ming Lei

A notifier callback is registered by the driver to get notified; so far
this is only for uring_cmd based drivers.

With this notifier, the driver can cancel in-flight commands when the
ctx is being released or the io task is exiting.

The main use case is ublk (or probably fuse with uring_cmd support), in
which a uring command may never complete, so the driver has to cancel
such commands when the io task is exiting or the ctx is being released,
otherwise __io_uring_cancel() may wait forever on inflight commands.
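
Below is a minimal sketch of how a driver callback registered via
io_uring_cmd_register_notifier() could consume the notifier data; the
drv_uring_exit_cb() and drv_cancel_*() names are placeholders and not
part of this patch:

static int drv_uring_exit_cb(struct notifier_block *nb,
		unsigned long event, void *val)
{
	struct io_uring_notifier_data *data = val;

	switch (event) {
	case IO_URING_NOTIFIER_CTX_EXIT:
		/* the whole ctx is going away, data->task is NULL */
		drv_cancel_cmds_from_ctx(nb, data->ctx_id);
		break;
	case IO_URING_NOTIFIER_IO_TASK_EXIT:
		/* only the submitter task (data->task) is exiting */
		drv_cancel_cmds_from_task(nb, data->ctx_id, data->task);
		break;
	}
	return 0;
}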

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/io_uring.h | 19 ++++++++++++++++
 io_uring/io_uring.c      | 48 ++++++++++++++++++++++++++++++++++++++++
 io_uring/io_uring.h      |  4 ++++
 io_uring/uring_cmd.c     | 12 ++++++++++
 4 files changed, 83 insertions(+)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index c395807bd7cf..037bff9960a1 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -47,7 +47,19 @@ static inline const void *io_uring_sqe_cmd(const struct io_uring_sqe *sqe)
 
 #define IO_URING_INVALID_CTX_ID  UINT_MAX
 
+enum io_uring_notifier {
+	IO_URING_NOTIFIER_CTX_EXIT,
+	IO_URING_NOTIFIER_IO_TASK_EXIT,
+};
+
+struct io_uring_notifier_data {
+	unsigned int ctx_id;
+	const struct task_struct *task;
+};
+
 #if defined(CONFIG_IO_URING)
+int io_uring_cmd_register_notifier(struct notifier_block *nb);
+void io_uring_cmd_unregister_notifier(struct notifier_block *nb);
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter, void *ioucmd);
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
@@ -89,6 +101,13 @@ static inline void io_uring_free(struct task_struct *tsk)
 }
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
 #else
+static inline int io_uring_cmd_register_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+static inline void io_uring_cmd_unregister_notifier(struct notifier_block *nb)
+{
+}
 static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter, void *ioucmd)
 {
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index c015c070ff85..de9b217bf5d8 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -73,6 +73,7 @@
 #include <linux/audit.h>
 #include <linux/security.h>
 #include <asm/shmparam.h>
+#include <linux/notifier.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -178,6 +179,22 @@ static struct ctl_table kernel_io_uring_disabled_table[] = {
 /* mapping between io_ring_ctx instance and its ctx_id */
 static DEFINE_XARRAY_FLAGS(ctx_ids, XA_FLAGS_ALLOC);
 
+/*
+ * Uring_cmd driver can register to be notified when ctx/io_uring_task
+ * is going away for canceling inflight commands.
+ */
+static struct srcu_notifier_head notifier_chain;
+
+int io_uring_register_notifier(struct notifier_block *nb)
+{
+	return srcu_notifier_chain_register(&notifier_chain, nb);
+}
+
+void io_uring_unregister_notifier(struct notifier_block *nb)
+{
+	srcu_notifier_chain_unregister(&notifier_chain, nb);
+}
+
 struct sock *io_uring_get_socket(struct file *file)
 {
 #if defined(CONFIG_UNIX)
@@ -191,6 +208,11 @@ struct sock *io_uring_get_socket(struct file *file)
 }
 EXPORT_SYMBOL(io_uring_get_socket);
 
+struct io_ring_ctx *io_uring_id_to_ctx(unsigned int id)
+{
+	return (struct io_ring_ctx *)xa_load(&ctx_ids, id);
+}
+
 static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
 {
 	if (!wq_list_empty(&ctx->submit_state.compl_reqs) ||
@@ -3060,6 +3082,23 @@ static __cold bool io_cancel_ctx_cb(struct io_wq_work *work, void *data)
 	return req->ctx == data;
 }
 
+static __cold void io_uring_cancel_notify(struct io_ring_ctx *ctx,
+					  struct task_struct *task)
+{
+	struct io_uring_notifier_data notifier_data = {
+		.ctx_id = ctx->id,
+		.task	= task,
+	};
+	enum io_uring_notifier notifier;
+
+	if (!task)
+		notifier = IO_URING_NOTIFIER_CTX_EXIT;
+	else
+		notifier = IO_URING_NOTIFIER_IO_TASK_EXIT;
+
+	srcu_notifier_call_chain(&notifier_chain, notifier, &notifier_data);
+}
+
 static __cold void io_ring_exit_work(struct work_struct *work)
 {
 	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
@@ -3069,6 +3108,8 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 	struct io_tctx_node *node;
 	int ret;
 
+	io_uring_cancel_notify(ctx, NULL);
+
 	/*
 	 * If we're doing polled IO and end up having requests being
 	 * submitted async (out-of-line), then completions can come in while
@@ -3346,6 +3387,11 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd)
 	if (tctx->io_wq)
 		io_wq_exit_start(tctx->io_wq);
 
+	if (!cancel_all) {
+		xa_for_each(&tctx->xa, index, node)
+			io_uring_cancel_notify(node->ctx, current);
+	}
+
 	atomic_inc(&tctx->in_cancel);
 	do {
 		bool loop = false;
@@ -4695,6 +4741,8 @@ static int __init io_uring_init(void)
 	register_sysctl_init("kernel", kernel_io_uring_disabled_table);
 #endif
 
+	srcu_init_notifier_head(&notifier_chain);
+
 	return 0;
 };
 __initcall(io_uring_init);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 547c30582fb8..1d5588d8a88a 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -38,6 +38,10 @@ enum {
 	IOU_STOP_MULTISHOT	= -ECANCELED,
 };
 
+struct io_ring_ctx *io_uring_id_to_ctx(unsigned int id);
+int io_uring_register_notifier(struct notifier_block *nb);
+void io_uring_unregister_notifier(struct notifier_block *nb);
+
 bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow);
 void io_req_cqe_overflow(struct io_kiocb *req);
 int io_run_task_work_sig(struct io_ring_ctx *ctx);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index c54c627fb6b9..03e3a8c1b712 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -192,3 +192,15 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 	}
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_sock);
+
+int io_uring_cmd_register_notifier(struct notifier_block *nb)
+{
+	return io_uring_register_notifier(nb);
+}
+EXPORT_SYMBOL_GPL(io_uring_cmd_register_notifier);
+
+void io_uring_cmd_unregister_notifier(struct notifier_block *nb)
+{
+	io_uring_unregister_notifier(nb);
+}
+EXPORT_SYMBOL_GPL(io_uring_cmd_unregister_notifier);
-- 
2.40.1



* [PATCH 04/10] ublk: don't get ublk device reference in ublk_abort_queue()
  2023-09-18  4:10 [PATCH 00/10] io_uring/ublk: exit notifier support Ming Lei
                   ` (2 preceding siblings ...)
  2023-09-18  4:10 ` [PATCH 03/10] io_uring: support io_uring notifier for uring_cmd Ming Lei
@ 2023-09-18  4:11 ` Ming Lei
  2023-09-18  4:11 ` [PATCH 05/10] ublk: make sure ublk uring cmd handling is done in submitter task context Ming Lei
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-09-18  4:11 UTC (permalink / raw)
  To: Jens Axboe, linux-block, io-uring; +Cc: ZiyangZhang, Ming Lei

ublk_abort_queue() is called from ublk_daemon_monitor_work(), where it
is guaranteed that the ublk device is live, so there is no need to take
a device reference.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 630ddfe6657b..9b3c0b3dd36e 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1419,9 +1419,6 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
 {
 	int i;
 
-	if (!ublk_get_device(ub))
-		return;
-
 	for (i = 0; i < ubq->q_depth; i++) {
 		struct ublk_io *io = &ubq->ios[i];
 
@@ -1437,7 +1434,6 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
 				__ublk_fail_req(ubq, io, rq);
 		}
 	}
-	ublk_put_device(ub);
 }
 
 static void ublk_daemon_monitor_work(struct work_struct *work)
-- 
2.40.1



* [PATCH 05/10] ublk: make sure ublk uring cmd handling is done in submitter task context
  2023-09-18  4:10 [PATCH 00/10] io_uring/ublk: exit notifier support Ming Lei
                   ` (3 preceding siblings ...)
  2023-09-18  4:11 ` [PATCH 04/10] ublk: don't get ublk device reference in ublk_abort_queue() Ming Lei
@ 2023-09-18  4:11 ` Ming Lei
  2023-09-18  4:11 ` [PATCH 06/10] ublk: make sure that uring cmd aiming at same queue won't cross io_uring contexts Ming Lei
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-09-18  4:11 UTC (permalink / raw)
  To: Jens Axboe, linux-block, io-uring; +Cc: ZiyangZhang, Ming Lei

In a well-written ublk server implementation, a ublk io command is never
linked into any link chain. The commands are also always handled in
no-wait style, so ublk uring_cmd handling is basically always done in
the submitter task context.

However, the server may set IOSQE_ASYNC, or an io command may be linked
into a chain by mistake, and then we can still run in io-wq context
where ctx->uring_lock isn't held.

So in case of IO_URING_F_UNLOCKED, schedule this command via
io_uring_cmd_complete_in_task() to run it in the submitter task. Then
ublk_ch_uring_cmd_local() is always called with the ctx uring_lock
held, and we needn't worry about synchronization along the submission
code path.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 9b3c0b3dd36e..46d499d96ca3 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1810,7 +1810,8 @@ static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
 	return NULL;
 }
 
-static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
+static inline int ublk_ch_uring_cmd_local(struct io_uring_cmd *cmd,
+		unsigned int issue_flags)
 {
 	/*
 	 * Not necessary for async retry, but let's keep it simple and always
@@ -1824,9 +1825,28 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 		.addr = READ_ONCE(ub_src->addr)
 	};
 
+	WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED);
+
 	return __ublk_ch_uring_cmd(cmd, issue_flags, &ub_cmd);
 }
 
+static void ublk_ch_uring_cmd_cb(struct io_uring_cmd *cmd,
+		unsigned int issue_flags)
+{
+	ublk_ch_uring_cmd_local(cmd, issue_flags);
+}
+
+static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
+{
+	/* well-implemented server won't run into unlocked */
+	if (unlikely(issue_flags & IO_URING_F_UNLOCKED)) {
+		io_uring_cmd_complete_in_task(cmd, ublk_ch_uring_cmd_cb);
+		return -EIOCBQUEUED;
+	}
+
+	return ublk_ch_uring_cmd_local(cmd, issue_flags);
+}
+
 static inline bool ublk_check_ubuf_dir(const struct request *req,
 		int ubuf_dir)
 {
-- 
2.40.1



* [PATCH 06/10] ublk: make sure that uring cmd aiming at same queue won't cross io_uring contexts
  2023-09-18  4:10 [PATCH 00/10] io_uring/ublk: exit notifier support Ming Lei
                   ` (4 preceding siblings ...)
  2023-09-18  4:11 ` [PATCH 05/10] ublk: make sure ublk uring cmd handling is done in submitter task context Ming Lei
@ 2023-09-18  4:11 ` Ming Lei
  2023-09-18  4:11 ` [PATCH 07/10] ublk: rename mm_lock as lock Ming Lei
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-09-18  4:11 UTC (permalink / raw)
  To: Jens Axboe, linux-block, io-uring; +Cc: ZiyangZhang, Ming Lei

Make sure that all commands aiming at the same ublk queue come from the
same io_uring context. This is a very reasonable requirement, and there
is no obvious reason for userspace to send uring_cmds to the same queue
from multiple io_uring contexts.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 46d499d96ca3..52dd53662ffb 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -131,6 +131,7 @@ struct ublk_queue {
 	unsigned long flags;
 	struct task_struct	*ubq_daemon;
 	char *io_cmd_buf;
+	unsigned int ctx_id;
 
 	struct llist_head	io_cmds;
 
@@ -1410,6 +1411,11 @@ static void ublk_commit_completion(struct ublk_device *ub,
 		ublk_put_req_ref(ubq, req);
 }
 
+static inline bool ublk_ctx_id_is_valid(unsigned int ctx_id)
+{
+	return ctx_id != IO_URING_INVALID_CTX_ID;
+}
+
 /*
  * When ->ubq_daemon is exiting, either new request is ended immediately,
  * or any queued io command is drained, so it is safe to abort queue
@@ -1609,11 +1615,13 @@ static void ublk_stop_dev(struct ublk_device *ub)
 }
 
 /* device can only be started after all IOs are ready */
-static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
+static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq,
+		unsigned int ctx_id)
 {
 	mutex_lock(&ub->mutex);
 	ubq->nr_io_ready++;
 	if (ublk_queue_ready(ubq)) {
+		ubq->ctx_id = ctx_id;
 		ubq->ubq_daemon = current;
 		get_task_struct(ubq->ubq_daemon);
 		ub->nr_queues_ready++;
@@ -1682,6 +1690,9 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 	if (ubq->ubq_daemon && ubq->ubq_daemon != current)
 		goto out;
 
+	if (ublk_ctx_id_is_valid(ubq->ctx_id) && cmd->ctx_id != ubq->ctx_id)
+		goto out;
+
 	if (tag >= ubq->q_depth)
 		goto out;
 
@@ -1734,7 +1745,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 		}
 
 		ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
-		ublk_mark_io_ready(ub, ubq);
+		ublk_mark_io_ready(ub, ubq, cmd->ctx_id);
 		break;
 	case UBLK_IO_COMMIT_AND_FETCH_REQ:
 		req = blk_mq_tag_to_rq(ub->tag_set.tags[ub_cmd->q_id], tag);
@@ -1989,6 +2000,7 @@ static int ublk_init_queue(struct ublk_device *ub, int q_id)
 
 	ubq->io_cmd_buf = ptr;
 	ubq->dev = ub;
+	ubq->ctx_id = IO_URING_INVALID_CTX_ID;
 	return 0;
 }
 
@@ -2593,6 +2605,8 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
 	ubq->ubq_daemon = NULL;
 	ubq->timeout = false;
 
+	ubq->ctx_id = IO_URING_INVALID_CTX_ID;
+
 	for (i = 0; i < ubq->q_depth; i++) {
 		struct ublk_io *io = &ubq->ios[i];
 
-- 
2.40.1



* [PATCH 07/10] ublk: rename mm_lock as lock
  2023-09-18  4:10 [PATCH 00/10] io_uring/ublk: exit notifier support Ming Lei
                   ` (5 preceding siblings ...)
  2023-09-18  4:11 ` [PATCH 06/10] ublk: make sure that uring cmd aiming at same queue won't cross io_uring contexts Ming Lei
@ 2023-09-18  4:11 ` Ming Lei
  2023-09-18  4:11 ` [PATCH 08/10] ublk: quiesce request queue when aborting queue Ming Lei
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-09-18  4:11 UTC (permalink / raw)
  To: Jens Axboe, linux-block, io-uring; +Cc: ZiyangZhang, Ming Lei

Rename the mm_lock field of ublk_device to lock, so that this lock can
be reused for protecting access to ub->ub_disk, which will be used for
simplifying ublk_abort_queue() by quiescing the queue in the next patch.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 52dd53662ffb..4bc4c4f87b36 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -167,7 +167,7 @@ struct ublk_device {
 
 	struct mutex		mutex;
 
-	spinlock_t		mm_lock;
+	spinlock_t		lock;
 	struct mm_struct	*mm;
 
 	struct ublk_params	params;
@@ -1358,12 +1358,12 @@ static int ublk_ch_mmap(struct file *filp, struct vm_area_struct *vma)
 	unsigned long pfn, end, phys_off = vma->vm_pgoff << PAGE_SHIFT;
 	int q_id, ret = 0;
 
-	spin_lock(&ub->mm_lock);
+	spin_lock(&ub->lock);
 	if (!ub->mm)
 		ub->mm = current->mm;
 	if (current->mm != ub->mm)
 		ret = -EINVAL;
-	spin_unlock(&ub->mm_lock);
+	spin_unlock(&ub->lock);
 
 	if (ret)
 		return ret;
@@ -2348,7 +2348,7 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
 	if (!ub)
 		goto out_unlock;
 	mutex_init(&ub->mutex);
-	spin_lock_init(&ub->mm_lock);
+	spin_lock_init(&ub->lock);
 	INIT_WORK(&ub->quiesce_work, ublk_quiesce_work_fn);
 	INIT_WORK(&ub->stop_work, ublk_stop_work_fn);
 	INIT_DELAYED_WORK(&ub->monitor_work, ublk_daemon_monitor_work);
-- 
2.40.1



* [PATCH 08/10] ublk: quiesce request queue when aborting queue
  2023-09-18  4:10 [PATCH 00/10] io_uring/ublk: exit notifier support Ming Lei
                   ` (6 preceding siblings ...)
  2023-09-18  4:11 ` [PATCH 07/10] ublk: rename mm_lock as lock Ming Lei
@ 2023-09-18  4:11 ` Ming Lei
  2023-09-18  4:11 ` [PATCH 09/10] ublk: replace monitor work with uring_cmd exit notifier Ming Lei
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-09-18  4:11 UTC (permalink / raw)
  To: Jens Axboe, linux-block, io-uring; +Cc: ZiyangZhang, Ming Lei

So far aborting a queue ends requests while the ubq daemon is exiting,
and this can run concurrently with ublk_queue_rq(). That is fragile,
and we depend on tricky usage of UBLK_IO_FLAG_ABORTED to avoid the
race.

Quiesce the request queue when aborting the queue, so that the two code
paths run completely exclusively of each other. It then becomes easier
to add new ublk features, such as relaxing the single-task-per-queue
limit.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 43 ++++++++++++++++++++++++++++++++++------
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 4bc4c4f87b36..3b691bf3d9ef 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1446,21 +1446,45 @@ static void ublk_daemon_monitor_work(struct work_struct *work)
 {
 	struct ublk_device *ub =
 		container_of(work, struct ublk_device, monitor_work.work);
+	struct gendisk *disk;
 	int i;
 
 	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
 		struct ublk_queue *ubq = ublk_get_queue(ub, i);
 
-		if (ubq_daemon_is_dying(ubq)) {
-			if (ublk_queue_can_use_recovery(ubq))
-				schedule_work(&ub->quiesce_work);
-			else
-				schedule_work(&ub->stop_work);
+		if (ubq_daemon_is_dying(ubq))
+			goto found;
+	}
+	return;
+
+found:
+	spin_lock(&ub->lock);
+	disk = ub->ub_disk;
+	if (disk)
+		get_device(disk_to_dev(disk));
+	spin_unlock(&ub->lock);
+
+	/* Our disk has been dead */
+	if (!disk)
+		return;
+
+	/* Now we are serialized with ublk_queue_rq() */
+	blk_mq_quiesce_queue(disk->queue);
+	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+		struct ublk_queue *ubq = ublk_get_queue(ub, i);
 
+		if (ubq_daemon_is_dying(ubq)) {
 			/* abort queue is for making forward progress */
 			ublk_abort_queue(ub, ubq);
 		}
 	}
+	blk_mq_unquiesce_queue(disk->queue);
+	put_device(disk_to_dev(disk));
+
+	if (ublk_can_use_recovery(ub))
+		schedule_work(&ub->quiesce_work);
+	else
+		schedule_work(&ub->stop_work);
 
 	/*
 	 * We can't schedule monitor work after ub's state is not UBLK_S_DEV_LIVE.
@@ -1595,6 +1619,8 @@ static void ublk_unquiesce_dev(struct ublk_device *ub)
 
 static void ublk_stop_dev(struct ublk_device *ub)
 {
+	struct gendisk *disk;
+
 	mutex_lock(&ub->mutex);
 	if (ub->dev_info.state == UBLK_S_DEV_DEAD)
 		goto unlock;
@@ -1604,10 +1630,15 @@ static void ublk_stop_dev(struct ublk_device *ub)
 		ublk_unquiesce_dev(ub);
 	}
 	del_gendisk(ub->ub_disk);
+
+	/* Sync with ublk_abort_queue() by holding the lock */
+	spin_lock(&ub->lock);
+	disk = ub->ub_disk;
 	ub->dev_info.state = UBLK_S_DEV_DEAD;
 	ub->dev_info.ublksrv_pid = -1;
-	put_disk(ub->ub_disk);
 	ub->ub_disk = NULL;
+	spin_unlock(&ub->lock);
+	put_disk(disk);
  unlock:
 	ublk_cancel_dev(ub);
 	mutex_unlock(&ub->mutex);
-- 
2.40.1



* [PATCH 09/10] ublk: replace monitor work with uring_cmd exit notifier
  2023-09-18  4:10 [PATCH 00/10] io_uring/ublk: exit notifier support Ming Lei
                   ` (7 preceding siblings ...)
  2023-09-18  4:11 ` [PATCH 08/10] ublk: quiesce request queue when aborting queue Ming Lei
@ 2023-09-18  4:11 ` Ming Lei
  2023-09-18  4:11 ` [PATCH 10/10] ublk: simplify aborting request Ming Lei
  2023-09-18 12:54 ` [PATCH 00/10] io_uring/ublk: exit notifier support Jens Axboe
  10 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-09-18  4:11 UTC (permalink / raw)
  To: Jens Axboe, linux-block, io-uring; +Cc: ZiyangZhang, Ming Lei

The monitor work introduces one extra context for handling abort, which
is prone to races and also adds extra delay when handling aborting.

Now uring_cmd provides an exit notifier, so use it instead:

1) the notifier callback is either run from the uring_cmd submission
task context or called after the io_uring context has exited, so the
callback runs exclusively with ublk_ch_uring_cmd() and
__ublk_rq_task_work().

2) the previous patch quiesces the request queue when calling
ublk_abort_queue(), which is now completely exclusive with
ublk_queue_rq() and ublk_ch_uring_cmd()/__ublk_rq_task_work().

This simplifies aborting a queue, and is helpful for adding new
features, such as relaxing the limit of using a single task for
handling one queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 89 +++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 47 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 3b691bf3d9ef..90e0137ff784 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -144,8 +144,6 @@ struct ublk_queue {
 	struct ublk_io ios[];
 };
 
-#define UBLK_DAEMON_MONITOR_PERIOD	(5 * HZ)
-
 struct ublk_device {
 	struct gendisk		*ub_disk;
 
@@ -176,11 +174,7 @@ struct ublk_device {
 	unsigned int		nr_queues_ready;
 	unsigned int		nr_privileged_daemon;
 
-	/*
-	 * Our ubq->daemon may be killed without any notification, so
-	 * monitor each queue's daemon periodically
-	 */
-	struct delayed_work	monitor_work;
+	struct notifier_block	notif;
 	struct work_struct	quiesce_work;
 	struct work_struct	stop_work;
 };
@@ -194,7 +188,6 @@ struct ublk_params_header {
 static inline unsigned int ublk_req_build_flags(struct request *req);
 static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
 						   int tag);
-
 static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
 {
 	return ub->dev_info.flags & UBLK_F_USER_COPY;
@@ -1119,8 +1112,6 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq,
 		blk_mq_requeue_request(rq, false);
 	else
 		blk_mq_end_request(rq, BLK_STS_IOERR);
-
-	mod_delayed_work(system_wq, &ubq->dev->monitor_work, 0);
 }
 
 static inline void __ublk_rq_task_work(struct request *req,
@@ -1241,12 +1232,12 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
 	io = &ubq->ios[rq->tag];
 	/*
 	 * If the check pass, we know that this is a re-issued request aborted
-	 * previously in monitor_work because the ubq_daemon(cmd's task) is
+	 * previously in exit notifier because the ubq_daemon(cmd's task) is
 	 * PF_EXITING. We cannot call io_uring_cmd_complete_in_task() anymore
 	 * because this ioucmd's io_uring context may be freed now if no inflight
 	 * ioucmd exists. Otherwise we may cause null-deref in ctx->fallback_work.
 	 *
-	 * Note: monitor_work sets UBLK_IO_FLAG_ABORTED and ends this request(releasing
+	 * Note: exit notifier sets UBLK_IO_FLAG_ABORTED and ends this request(releasing
 	 * the tag). Then the request is re-started(allocating the tag) and we are here.
 	 * Since releasing/allocating a tag implies smp_mb(), finding UBLK_IO_FLAG_ABORTED
 	 * guarantees that here is a re-issued request aborted previously.
@@ -1334,9 +1325,17 @@ static int ublk_ch_open(struct inode *inode, struct file *filp)
 {
 	struct ublk_device *ub = container_of(inode->i_cdev,
 			struct ublk_device, cdev);
+	int ret;
 
 	if (test_and_set_bit(UB_STATE_OPEN, &ub->state))
 		return -EBUSY;
+
+	ret = io_uring_cmd_register_notifier(&ub->notif);
+	if (ret) {
+		clear_bit(UB_STATE_OPEN, &ub->state);
+		return ret;
+	}
+
 	filp->private_data = ub;
 	return 0;
 }
@@ -1346,6 +1345,8 @@ static int ublk_ch_release(struct inode *inode, struct file *filp)
 	struct ublk_device *ub = filp->private_data;
 
 	clear_bit(UB_STATE_OPEN, &ub->state);
+	io_uring_cmd_unregister_notifier(&ub->notif);
+
 	return 0;
 }
 
@@ -1417,9 +1418,9 @@ static inline bool ublk_ctx_id_is_valid(unsigned int ctx_id)
 }
 
 /*
- * When ->ubq_daemon is exiting, either new request is ended immediately,
- * or any queued io command is drained, so it is safe to abort queue
- * lockless
+ * Called from ubq_daemon context via the notifier, meantime quiesce ublk
+ * blk-mq queue, so we are called exclusively with blk-mq and ubq_daemon
+ * context, so everything is serialized.
  */
 static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
 {
@@ -1442,20 +1443,34 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
 	}
 }
 
-static void ublk_daemon_monitor_work(struct work_struct *work)
+static inline bool ubq_daemon_is_exiting(const struct ublk_queue *ubq,
+		const struct io_uring_notifier_data *data)
 {
-	struct ublk_device *ub =
-		container_of(work, struct ublk_device, monitor_work.work);
+	return data->ctx_id == ubq->ctx_id && (!data->task ||
+			data->task == ubq->ubq_daemon);
+}
+
+static int ublk_notifier_cb(struct notifier_block *nb,
+		unsigned long event, void *val)
+{
+	struct ublk_device *ub = container_of(nb, struct ublk_device, notif);
+	struct io_uring_notifier_data *data = val;
 	struct gendisk *disk;
 	int i;
 
+	pr_devel("%s: event %lu ctx_id %u task %p\n", __func__, event,
+			data->ctx_id, data->task);
+
+	if (!ublk_ctx_id_is_valid(data->ctx_id))
+		return 0;
+
 	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
 		struct ublk_queue *ubq = ublk_get_queue(ub, i);
 
-		if (ubq_daemon_is_dying(ubq))
+		if (ubq_daemon_is_exiting(ubq, data))
 			goto found;
 	}
-	return;
+	return 0;
 
 found:
 	spin_lock(&ub->lock);
@@ -1466,14 +1481,14 @@ static void ublk_daemon_monitor_work(struct work_struct *work)
 
 	/* Our disk has been dead */
 	if (!disk)
-		return;
+		return 0;
 
 	/* Now we are serialized with ublk_queue_rq() */
 	blk_mq_quiesce_queue(disk->queue);
 	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
 		struct ublk_queue *ubq = ublk_get_queue(ub, i);
 
-		if (ubq_daemon_is_dying(ubq)) {
+		if (ubq_daemon_is_exiting(ubq, data)) {
 			/* abort queue is for making forward progress */
 			ublk_abort_queue(ub, ubq);
 		}
@@ -1485,17 +1500,7 @@ static void ublk_daemon_monitor_work(struct work_struct *work)
 		schedule_work(&ub->quiesce_work);
 	else
 		schedule_work(&ub->stop_work);
-
-	/*
-	 * We can't schedule monitor work after ub's state is not UBLK_S_DEV_LIVE.
-	 * after ublk_remove() or __ublk_quiesce_dev() is started.
-	 *
-	 * No need ub->mutex, monitor work are canceled after state is marked
-	 * as not LIVE, so new state is observed reliably.
-	 */
-	if (ub->dev_info.state == UBLK_S_DEV_LIVE)
-		schedule_delayed_work(&ub->monitor_work,
-				UBLK_DAEMON_MONITOR_PERIOD);
+	return 0;
 }
 
 static inline bool ublk_queue_ready(struct ublk_queue *ubq)
@@ -1512,6 +1517,9 @@ static void ublk_cancel_queue(struct ublk_queue *ubq)
 {
 	int i;
 
+	if (ublk_ctx_id_is_valid(ubq->ctx_id))
+		ubq->ctx_id = IO_URING_INVALID_CTX_ID;
+
 	if (!ublk_queue_ready(ubq))
 		return;
 
@@ -1572,15 +1580,6 @@ static void __ublk_quiesce_dev(struct ublk_device *ub)
 	ublk_wait_tagset_rqs_idle(ub);
 	ub->dev_info.state = UBLK_S_DEV_QUIESCED;
 	ublk_cancel_dev(ub);
-	/* we are going to release task_struct of ubq_daemon and resets
-	 * ->ubq_daemon to NULL. So in monitor_work, check on ubq_daemon causes UAF.
-	 * Besides, monitor_work is not necessary in QUIESCED state since we have
-	 * already scheduled quiesce_work and quiesced all ubqs.
-	 *
-	 * Do not let monitor_work schedule itself if state it QUIESCED. And we cancel
-	 * it here and re-schedule it in END_USER_RECOVERY to avoid UAF.
-	 */
-	cancel_delayed_work_sync(&ub->monitor_work);
 }
 
 static void ublk_quiesce_work_fn(struct work_struct *work)
@@ -1642,7 +1641,6 @@ static void ublk_stop_dev(struct ublk_device *ub)
  unlock:
 	ublk_cancel_dev(ub);
 	mutex_unlock(&ub->mutex);
-	cancel_delayed_work_sync(&ub->monitor_work);
 }
 
 /* device can only be started after all IOs are ready */
@@ -2210,8 +2208,6 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
 	if (wait_for_completion_interruptible(&ub->completion) != 0)
 		return -EINTR;
 
-	schedule_delayed_work(&ub->monitor_work, UBLK_DAEMON_MONITOR_PERIOD);
-
 	mutex_lock(&ub->mutex);
 	if (ub->dev_info.state == UBLK_S_DEV_LIVE ||
 	    test_bit(UB_STATE_USED, &ub->state)) {
@@ -2382,7 +2378,7 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
 	spin_lock_init(&ub->lock);
 	INIT_WORK(&ub->quiesce_work, ublk_quiesce_work_fn);
 	INIT_WORK(&ub->stop_work, ublk_stop_work_fn);
-	INIT_DELAYED_WORK(&ub->monitor_work, ublk_daemon_monitor_work);
+	ub->notif.notifier_call = ublk_notifier_cb;
 
 	ret = ublk_alloc_dev_number(ub, header->dev_id);
 	if (ret < 0)
@@ -2722,7 +2718,6 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
 			__func__, header->dev_id);
 	blk_mq_kick_requeue_list(ub->ub_disk->queue);
 	ub->dev_info.state = UBLK_S_DEV_LIVE;
-	schedule_delayed_work(&ub->monitor_work, UBLK_DAEMON_MONITOR_PERIOD);
 	ret = 0;
  out_unlock:
 	mutex_unlock(&ub->mutex);
-- 
2.40.1



* [PATCH 10/10] ublk: simplify aborting request
  2023-09-18  4:10 [PATCH 00/10] io_uring/ublk: exit notifier support Ming Lei
                   ` (8 preceding siblings ...)
  2023-09-18  4:11 ` [PATCH 09/10] ublk: replace monitor work with uring_cmd exit notifier Ming Lei
@ 2023-09-18  4:11 ` Ming Lei
  2023-09-18 12:54 ` [PATCH 00/10] io_uring/ublk: exit notifier support Jens Axboe
  10 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-09-18  4:11 UTC (permalink / raw)
  To: Jens Axboe, linux-block, io-uring; +Cc: ZiyangZhang, Ming Lei

Now ublk_abort_queue() runs exclusively with ublk_queue_rq() and the
ubq_daemon task, so simplify aborting requests:

- set UBLK_IO_FLAG_ABORTED in ublk_abort_queue()

- abort the request in ublk_queue_rq() if UBLK_IO_FLAG_ABORTED is
  observed; there is no need to check ubq_daemon any more

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 46 ++++++++++++----------------------------
 1 file changed, 14 insertions(+), 32 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 90e0137ff784..0f54f63136fd 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1077,13 +1077,10 @@ static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
 {
 	WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
 
-	if (!(io->flags & UBLK_IO_FLAG_ABORTED)) {
-		io->flags |= UBLK_IO_FLAG_ABORTED;
-		if (ublk_queue_can_use_recovery_reissue(ubq))
-			blk_mq_requeue_request(req, false);
-		else
-			ublk_put_req_ref(ubq, req);
-	}
+	if (ublk_queue_can_use_recovery_reissue(ubq))
+		blk_mq_requeue_request(req, false);
+	else
+		ublk_put_req_ref(ubq, req);
 }
 
 static void ubq_complete_io_cmd(struct ublk_io *io, int res,
@@ -1221,30 +1218,12 @@ static void ublk_rq_task_work_cb(struct io_uring_cmd *cmd, unsigned issue_flags)
 	ublk_forward_io_cmds(ubq, issue_flags);
 }
 
-static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
+static inline void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
 {
 	struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq);
-	struct ublk_io *io;
-
-	if (!llist_add(&data->node, &ubq->io_cmds))
-		return;
 
-	io = &ubq->ios[rq->tag];
-	/*
-	 * If the check pass, we know that this is a re-issued request aborted
-	 * previously in exit notifier because the ubq_daemon(cmd's task) is
-	 * PF_EXITING. We cannot call io_uring_cmd_complete_in_task() anymore
-	 * because this ioucmd's io_uring context may be freed now if no inflight
-	 * ioucmd exists. Otherwise we may cause null-deref in ctx->fallback_work.
-	 *
-	 * Note: exit notifier sets UBLK_IO_FLAG_ABORTED and ends this request(releasing
-	 * the tag). Then the request is re-started(allocating the tag) and we are here.
-	 * Since releasing/allocating a tag implies smp_mb(), finding UBLK_IO_FLAG_ABORTED
-	 * guarantees that here is a re-issued request aborted previously.
-	 */
-	if (unlikely(io->flags & UBLK_IO_FLAG_ABORTED)) {
-		ublk_abort_io_cmds(ubq);
-	} else {
+	if (llist_add(&data->node, &ubq->io_cmds)) {
+		struct ublk_io *io = &ubq->ios[rq->tag];
 		struct io_uring_cmd *cmd = io->cmd;
 		struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
 
@@ -1274,6 +1253,7 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
 {
 	struct ublk_queue *ubq = hctx->driver_data;
 	struct request *rq = bd->rq;
+	struct ublk_io *io = &ubq->ios[rq->tag];
 	blk_status_t res;
 
 	/* fill iod to slot in io cmd buffer */
@@ -1293,13 +1273,12 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (ublk_queue_can_use_recovery(ubq) && unlikely(ubq->force_abort))
 		return BLK_STS_IOERR;
 
-	blk_mq_start_request(bd->rq);
-
-	if (unlikely(ubq_daemon_is_dying(ubq))) {
+	if (unlikely(io->flags & UBLK_IO_FLAG_ABORTED)) {
 		__ublk_abort_rq(ubq, rq);
 		return BLK_STS_OK;
 	}
 
+	blk_mq_start_request(bd->rq);
 	ublk_queue_cmd(ubq, rq);
 
 	return BLK_STS_OK;
@@ -1429,6 +1408,9 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
 	for (i = 0; i < ubq->q_depth; i++) {
 		struct ublk_io *io = &ubq->ios[i];
 
+		if (!(io->flags & UBLK_IO_FLAG_ABORTED))
+			io->flags |= UBLK_IO_FLAG_ABORTED;
+
 		if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) {
 			struct request *rq;
 
@@ -1437,7 +1419,7 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
 			 * will do it
 			 */
 			rq = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], i);
-			if (rq)
+			if (rq && blk_mq_request_started(rq))
 				__ublk_fail_req(ubq, io, rq);
 		}
 	}
-- 
2.40.1



* Re: [PATCH 00/10] io_uring/ublk: exit notifier support
  2023-09-18  4:10 [PATCH 00/10] io_uring/ublk: exit notifier support Ming Lei
                   ` (9 preceding siblings ...)
  2023-09-18  4:11 ` [PATCH 10/10] ublk: simplify aborting request Ming Lei
@ 2023-09-18 12:54 ` Jens Axboe
  2023-09-18 13:24   ` Ming Lei
  10 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2023-09-18 12:54 UTC (permalink / raw)
  To: Ming Lei, linux-block, io-uring; +Cc: ZiyangZhang

On 9/17/23 10:10 PM, Ming Lei wrote:
> Hello,
> 
> In do_exit(), io_uring needs to wait pending requests.
> 
> ublk is one uring_cmd driver, and its usage is a bit special by submitting
> command for waiting incoming block IO request in advance, so if there
> isn't any IO request coming, the command can't be completed. So far ublk
> driver has to bind its queue with one ublk daemon server, meantime
> starts one monitor work to check if this daemon is live periodically.
> This way requires ublk queue to be bound one single daemon pthread, and
> not flexible, meantime the monitor work is run in 3rd context, and the
> implementation is a bit tricky.
> 
> The 1st 3 patches adds io_uring task exit notifier, and the other
> patches converts ublk into this exit notifier, and the implementation
> becomes more robust & readable, meantime it becomes easier to relax
> the ublk queue/daemon limit in future, such as not require to bind
> ublk queue with single daemon.

The normal approach for this is to ensure that each request is
cancelable, which we need for other things too (like actual cancel
support). Why can't we just do the same for ublk?

-- 
Jens Axboe



* Re: [PATCH 00/10] io_uring/ublk: exit notifier support
  2023-09-18 12:54 ` [PATCH 00/10] io_uring/ublk: exit notifier support Jens Axboe
@ 2023-09-18 13:24   ` Ming Lei
  2023-09-18 14:15     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2023-09-18 13:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, io-uring, ming.lei

On Mon, Sep 18, 2023 at 06:54:33AM -0600, Jens Axboe wrote:
> On 9/17/23 10:10 PM, Ming Lei wrote:
> > Hello,
> > 
> > In do_exit(), io_uring needs to wait pending requests.
> > 
> > ublk is one uring_cmd driver, and its usage is a bit special by submitting
> > command for waiting incoming block IO request in advance, so if there
> > isn't any IO request coming, the command can't be completed. So far ublk
> > driver has to bind its queue with one ublk daemon server, meantime
> > starts one monitor work to check if this daemon is live periodically.
> > This way requires ublk queue to be bound one single daemon pthread, and
> > not flexible, meantime the monitor work is run in 3rd context, and the
> > implementation is a bit tricky.
> > 
> > The 1st 3 patches adds io_uring task exit notifier, and the other
> > patches converts ublk into this exit notifier, and the implementation
> > becomes more robust & readable, meantime it becomes easier to relax
> > the ublk queue/daemon limit in future, such as not require to bind
> > ublk queue with single daemon.
> 
> The normal approach for this is to ensure that each request is
> cancelable, which we need for other things too (like actual cancel
> support) Why can't we just do the same for ublk?

I guess you meant IORING_OP_ASYNC_CANCEL, which needs userspace to
submit that command, but here the userspace (ublk server) may just
panic or be killed, and then there is no chance to send
IORING_OP_ASYNC_CANCEL.

And the driver doesn't have any knowledge of whether the io_uring ctx
or io task is exiting, so it can't complete the issued commands, and
then io_uring_cancel_generic() hangs when the io task/ctx is exiting.


Thanks,
Ming



* Re: [PATCH 00/10] io_uring/ublk: exit notifier support
  2023-09-18 13:24   ` Ming Lei
@ 2023-09-18 14:15     ` Jens Axboe
  2023-09-18 16:02       ` Ming Lei
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2023-09-18 14:15 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, io-uring

On 9/18/23 7:24 AM, Ming Lei wrote:
> On Mon, Sep 18, 2023 at 06:54:33AM -0600, Jens Axboe wrote:
>> On 9/17/23 10:10 PM, Ming Lei wrote:
>>> Hello,
>>>
>>> In do_exit(), io_uring needs to wait pending requests.
>>>
>>> ublk is one uring_cmd driver, and its usage is a bit special by submitting
>>> command for waiting incoming block IO request in advance, so if there
>>> isn't any IO request coming, the command can't be completed. So far ublk
>>> driver has to bind its queue with one ublk daemon server, meantime
>>> starts one monitor work to check if this daemon is live periodically.
>>> This way requires ublk queue to be bound one single daemon pthread, and
>>> not flexible, meantime the monitor work is run in 3rd context, and the
>>> implementation is a bit tricky.
>>>
>>> The 1st 3 patches adds io_uring task exit notifier, and the other
>>> patches converts ublk into this exit notifier, and the implementation
>>> becomes more robust & readable, meantime it becomes easier to relax
>>> the ublk queue/daemon limit in future, such as not require to bind
>>> ublk queue with single daemon.
>>
>> The normal approach for this is to ensure that each request is
>> cancelable, which we need for other things too (like actual cancel
>> support) Why can't we just do the same for ublk?
> 
> I guess you meant IORING_OP_ASYNC_CANCEL, which needs userspace to
> submit this command, but here the userspace(ublk server) may be just panic
> or killed, and there isn't chance to send IORING_OP_ASYNC_CANCEL.

Either that, or the cancel is done because of task exit.

> And driver doesn't have any knowledge if the io_uring ctx or io task
> is exiting, so can't complete issued commands, then hang in
> io_uring_cancel_generic() when the io task/ctx is exiting.

If you hooked into the normal cancel paths, you very much would get
notified when the task is exiting. That's how the other cancelations
work, eg if a task has pending poll requests and exits, they get
canceled and reaped.

-- 
Jens Axboe



* Re: [PATCH 00/10] io_uring/ublk: exit notifier support
  2023-09-18 14:15     ` Jens Axboe
@ 2023-09-18 16:02       ` Ming Lei
  2023-09-18 16:03         ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2023-09-18 16:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, io-uring, ming.lei

On Mon, Sep 18, 2023 at 08:15:07AM -0600, Jens Axboe wrote:
> On 9/18/23 7:24 AM, Ming Lei wrote:
> > On Mon, Sep 18, 2023 at 06:54:33AM -0600, Jens Axboe wrote:
> >> On 9/17/23 10:10 PM, Ming Lei wrote:
> >>> Hello,
> >>>
> >>> In do_exit(), io_uring needs to wait pending requests.
> >>>
> >>> ublk is one uring_cmd driver, and its usage is a bit special by submitting
> >>> command for waiting incoming block IO request in advance, so if there
> >>> isn't any IO request coming, the command can't be completed. So far ublk
> >>> driver has to bind its queue with one ublk daemon server, meantime
> >>> starts one monitor work to check if this daemon is live periodically.
> >>> This way requires ublk queue to be bound one single daemon pthread, and
> >>> not flexible, meantime the monitor work is run in 3rd context, and the
> >>> implementation is a bit tricky.
> >>>
> >>> The 1st 3 patches adds io_uring task exit notifier, and the other
> >>> patches converts ublk into this exit notifier, and the implementation
> >>> becomes more robust & readable, meantime it becomes easier to relax
> >>> the ublk queue/daemon limit in future, such as not require to bind
> >>> ublk queue with single daemon.
> >>
> >> The normal approach for this is to ensure that each request is
> >> cancelable, which we need for other things too (like actual cancel
> >> support) Why can't we just do the same for ublk?
> > 
> > I guess you meant IORING_OP_ASYNC_CANCEL, which needs userspace to
> > submit this command, but here the userspace(ublk server) may be just panic
> > or killed, and there isn't chance to send IORING_OP_ASYNC_CANCEL.
> 
> Either that, or cancel done because of task exit.
> 
> > And driver doesn't have any knowledge if the io_uring ctx or io task
> > is exiting, so can't complete issued commands, then hang in
> > io_uring_cancel_generic() when the io task/ctx is exiting.
> 
> If you hooked into the normal cancel paths, you very much would get
> notified when the task is exiting. That's how the other cancelations
> work, eg if a task has pending poll requests and exits, they get
> canceled and reaped.

Ok, got the idea, thanks for the pointer!

Turns out what is needed is a cancelable uring_cmd, and I will try to
work in this direction; I already have something in mind for the
implementation.
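
Roughly what I have in mind, just a sketch with made-up names rather
than an existing API (io_uring_cmd_mark_cancelable() and
IO_URING_F_CANCEL below are placeholders for illustration):

/* driver side: mark the command cancelable when queueing it */
static int drv_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
	io_uring_cmd_mark_cancelable(cmd, issue_flags);
	/* stash the command and wait for backend IO ... */
	return -EIOCBQUEUED;
}

/* io_uring side: on ctx/io task exit, hand each cancelable command back
 * to the driver with a cancel flag, so it completes the command with
 * -ECANCELED instead of blocking do_exit()
 */
static void io_uring_cancel_cmd(struct io_uring_cmd *cmd)
{
	cmd->file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL);
}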


Thanks, 
Ming



* Re: [PATCH 00/10] io_uring/ublk: exit notifier support
  2023-09-18 16:02       ` Ming Lei
@ 2023-09-18 16:03         ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2023-09-18 16:03 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, io-uring

On 9/18/23 10:02 AM, Ming Lei wrote:
> On Mon, Sep 18, 2023 at 08:15:07AM -0600, Jens Axboe wrote:
>> On 9/18/23 7:24 AM, Ming Lei wrote:
>>> On Mon, Sep 18, 2023 at 06:54:33AM -0600, Jens Axboe wrote:
>>>> On 9/17/23 10:10 PM, Ming Lei wrote:
>>>>> Hello,
>>>>>
>>>>> In do_exit(), io_uring needs to wait pending requests.
>>>>>
>>>>> ublk is one uring_cmd driver, and its usage is a bit special by submitting
>>>>> command for waiting incoming block IO request in advance, so if there
>>>>> isn't any IO request coming, the command can't be completed. So far ublk
>>>>> driver has to bind its queue with one ublk daemon server, meantime
>>>>> starts one monitor work to check if this daemon is live periodically.
>>>>> This way requires ublk queue to be bound one single daemon pthread, and
>>>>> not flexible, meantime the monitor work is run in 3rd context, and the
>>>>> implementation is a bit tricky.
>>>>>
>>>>> The 1st 3 patches adds io_uring task exit notifier, and the other
>>>>> patches converts ublk into this exit notifier, and the implementation
>>>>> becomes more robust & readable, meantime it becomes easier to relax
>>>>> the ublk queue/daemon limit in future, such as not require to bind
>>>>> ublk queue with single daemon.
>>>>
>>>> The normal approach for this is to ensure that each request is
>>>> cancelable, which we need for other things too (like actual cancel
>>>> support) Why can't we just do the same for ublk?
>>>
>>> I guess you meant IORING_OP_ASYNC_CANCEL, which needs userspace to
>>> submit this command, but here the userspace(ublk server) may be just panic
>>> or killed, and there isn't chance to send IORING_OP_ASYNC_CANCEL.
>>
>> Either that, or cancel done because of task exit.
>>
>>> And driver doesn't have any knowledge if the io_uring ctx or io task
>>> is exiting, so can't complete issued commands, then hang in
>>> io_uring_cancel_generic() when the io task/ctx is exiting.
>>
>> If you hooked into the normal cancel paths, you very much would get
>> notified when the task is exiting. That's how the other cancelations
>> work, eg if a task has pending poll requests and exits, they get
>> canceled and reaped.
> 
> Ok, got the idea, thanks for the point!
> 
> Turns out it is cancelable uring_cmd, and I will try to work towards
> this direction, and has got something in mind about the implementation.

Perfect, thanks Ming!

-- 
Jens Axboe

