All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 5.12 0/8] remove task file notes
@ 2021-03-06 11:02 Pavel Begunkov
  2021-03-06 11:02 ` [PATCH v4 1/8] io_uring: make del_task_file more forgiving Pavel Begunkov
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Introduce a mapping from ctx to all tctx, and using that removes
file notes, i.e. taking a io_uring file note previously stored in
task->io_uring->xa. It's needed because we don't free io_uring ctx
until all submitters die/exec, and it became worse after killing
 ->flush(). There are rough corner in a form of not behaving nicely,
I'll address in follow-up patches. Also use it to do cancellations
right.

The torture is as simple as below. It will get OOM in no time. Also,
I plan to use it to fix recently broken cancellations.

while (1) {
	assert(!io_uring_queue_init(8, &ring, 0));
	io_uring_queue_exit(&ring);
}

WARNING: hangs without reverting sq park refactoring

v2: rebase (resolve conflicts)
    drop taken 2 patches
v3: use jiffies in 6/6 (Jens)
v4: 2 patches, 7/8 fix cancellation regressions

Pavel Begunkov (8):
  io_uring: make del_task_file more forgiving
  io_uring: introduce ctx to tctx back map
  io_uring: do ctx initiated file note removal
  io_uring: don't take task ring-file notes
  io_uring: index io_uring->xa by ctx not file
  io_uring: warn when ring exit takes too long
  io_uring: cancel reqs of all iowq's on ring exit
  io-wq: warn on creating manager awhile exiting

 fs/io-wq.c               |   2 +
 fs/io_uring.c            | 173 ++++++++++++++++++++++++++++++++-------
 include/linux/io_uring.h |   2 +-
 3 files changed, 147 insertions(+), 30 deletions(-)

-- 
2.24.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v4 1/8] io_uring: make del_task_file more forgiving
  2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
  2021-03-06 11:02 ` [PATCH v4 2/8] io_uring: introduce ctx to tctx back map Pavel Begunkov
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Rework io_uring_del_task_file(), so it accepts an index to delete, and
it's not necessarily have to be in the ->xa. Infer file from xa_erase()
to maintain a single origin of truth.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 92c25b5f1349..4c6a92e5d5a3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8780,15 +8780,18 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 /*
  * Remove this io_uring_file -> task mapping.
  */
-static void io_uring_del_task_file(struct file *file)
+static void io_uring_del_task_file(unsigned long index)
 {
 	struct io_uring_task *tctx = current->io_uring;
+	struct file *file;
+
+	file = xa_erase(&tctx->xa, index);
+	if (!file)
+		return;
 
 	if (tctx->last == file)
 		tctx->last = NULL;
-	file = xa_erase(&tctx->xa, (unsigned long)file);
-	if (file)
-		fput(file);
+	fput(file);
 }
 
 static void io_uring_clean_tctx(struct io_uring_task *tctx)
@@ -8797,7 +8800,7 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
 	unsigned long index;
 
 	xa_for_each(&tctx->xa, index, file)
-		io_uring_del_task_file(file);
+		io_uring_del_task_file(index);
 	if (tctx->io_wq) {
 		io_wq_put_and_exit(tctx->io_wq);
 		tctx->io_wq = NULL;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v4 2/8] io_uring: introduce ctx to tctx back map
  2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
  2021-03-06 11:02 ` [PATCH v4 1/8] io_uring: make del_task_file more forgiving Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
  2021-03-06 11:02 ` [PATCH v4 3/8] io_uring: do ctx initiated file note removal Pavel Begunkov
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
  To: Jens Axboe, io-uring

For each pair tcxt-ctx create an object and chain it into ctx, so we
have a way to traverse all tctx that are using current ctx. Preparation
patch, will be used later.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 58 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 44 insertions(+), 14 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4c6a92e5d5a3..f26f8199e4ab 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -454,6 +454,7 @@ struct io_ring_ctx {
 
 	/* Keep this last, we don't need it for the fast path */
 	struct work_struct		exit_work;
+	struct list_head		tctx_list;
 };
 
 /*
@@ -805,6 +806,13 @@ struct io_kiocb {
 	struct io_wq_work		work;
 };
 
+struct io_tctx_node {
+	struct list_head	ctx_node;
+	struct task_struct	*task;
+	struct file		*file;
+	struct io_ring_ctx	*ctx;
+};
+
 struct io_defer_entry {
 	struct list_head	list;
 	struct io_kiocb		*req;
@@ -1144,6 +1152,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	INIT_LIST_HEAD(&ctx->rsrc_ref_list);
 	INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
 	init_llist_head(&ctx->rsrc_put_llist);
+	INIT_LIST_HEAD(&ctx->tctx_list);
 	INIT_LIST_HEAD(&ctx->submit_state.comp.free_list);
 	INIT_LIST_HEAD(&ctx->submit_state.comp.locked_free_list);
 	return ctx;
@@ -8743,6 +8752,7 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
 static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 {
 	struct io_uring_task *tctx = current->io_uring;
+	struct io_tctx_node *node;
 	int ret;
 
 	if (unlikely(!tctx)) {
@@ -8755,13 +8765,25 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 		void *old = xa_load(&tctx->xa, (unsigned long)file);
 
 		if (!old) {
+			node = kmalloc(sizeof(*node), GFP_KERNEL);
+			if (!node)
+				return -ENOMEM;
+			node->ctx = ctx;
+			node->file = file;
+			node->task = current;
+
 			get_file(file);
 			ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
-						file, GFP_KERNEL));
+						node, GFP_KERNEL));
 			if (ret) {
 				fput(file);
+				kfree(node);
 				return ret;
 			}
+
+			mutex_lock(&ctx->uring_lock);
+			list_add(&node->ctx_node, &ctx->tctx_list);
+			mutex_unlock(&ctx->uring_lock);
 		}
 		tctx->last = file;
 	}
@@ -8783,23 +8805,31 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 static void io_uring_del_task_file(unsigned long index)
 {
 	struct io_uring_task *tctx = current->io_uring;
-	struct file *file;
+	struct io_tctx_node *node;
 
-	file = xa_erase(&tctx->xa, index);
-	if (!file)
+	node = xa_erase(&tctx->xa, index);
+	if (!node)
 		return;
 
-	if (tctx->last == file)
+	WARN_ON_ONCE(current != node->task);
+	WARN_ON_ONCE(list_empty(&node->ctx_node));
+
+	mutex_lock(&node->ctx->uring_lock);
+	list_del(&node->ctx_node);
+	mutex_unlock(&node->ctx->uring_lock);
+
+	if (tctx->last == node->file)
 		tctx->last = NULL;
-	fput(file);
+	fput(node->file);
+	kfree(node);
 }
 
 static void io_uring_clean_tctx(struct io_uring_task *tctx)
 {
-	struct file *file;
+	struct io_tctx_node *node;
 	unsigned long index;
 
-	xa_for_each(&tctx->xa, index, file)
+	xa_for_each(&tctx->xa, index, node)
 		io_uring_del_task_file(index);
 	if (tctx->io_wq) {
 		io_wq_put_and_exit(tctx->io_wq);
@@ -8810,13 +8840,13 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
 void __io_uring_files_cancel(struct files_struct *files)
 {
 	struct io_uring_task *tctx = current->io_uring;
-	struct file *file;
+	struct io_tctx_node *node;
 	unsigned long index;
 
 	/* make sure overflow events are dropped */
 	atomic_inc(&tctx->in_idle);
-	xa_for_each(&tctx->xa, index, file)
-		io_uring_cancel_task_requests(file->private_data, files);
+	xa_for_each(&tctx->xa, index, node)
+		io_uring_cancel_task_requests(node->ctx, files);
 	atomic_dec(&tctx->in_idle);
 
 	if (files)
@@ -8879,11 +8909,11 @@ void __io_uring_task_cancel(void)
 	atomic_inc(&tctx->in_idle);
 
 	if (tctx->sqpoll) {
-		struct file *file;
+		struct io_tctx_node *node;
 		unsigned long index;
 
-		xa_for_each(&tctx->xa, index, file)
-			io_uring_cancel_sqpoll(file->private_data);
+		xa_for_each(&tctx->xa, index, node)
+			io_uring_cancel_sqpoll(node->ctx);
 	}
 
 	do {
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v4 3/8] io_uring: do ctx initiated file note removal
  2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
  2021-03-06 11:02 ` [PATCH v4 1/8] io_uring: make del_task_file more forgiving Pavel Begunkov
  2021-03-06 11:02 ` [PATCH v4 2/8] io_uring: introduce ctx to tctx back map Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
  2021-03-06 11:02 ` [PATCH v4 4/8] io_uring: don't take task ring-file notes Pavel Begunkov
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Another preparation patch. When full quiesce is done on ctx exit, use
task_work infra to remove corresponding to the ctx io_uring->xa entries.
For that we use the back tctx map. Also use ->in_idle to prevent
removing it while we traversing ->xa on cancellation, just ignore it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f26f8199e4ab..692096e85749 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -987,6 +987,7 @@ static const struct io_op_def io_op_defs[] = {
 	[IORING_OP_UNLINKAT] = {},
 };
 
+static void io_uring_del_task_file(unsigned long index);
 static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 					 struct task_struct *task,
 					 struct files_struct *files);
@@ -8531,10 +8532,33 @@ static bool io_run_ctx_fallback(struct io_ring_ctx *ctx)
 	return executed;
 }
 
+struct io_tctx_exit {
+	struct callback_head		task_work;
+	struct completion		completion;
+	unsigned long			index;
+};
+
+static void io_tctx_exit_cb(struct callback_head *cb)
+{
+	struct io_uring_task *tctx = current->io_uring;
+	struct io_tctx_exit *work;
+
+	work = container_of(cb, struct io_tctx_exit, task_work);
+	/*
+	 * When @in_idle, we're in cancellation and it's racy to remove the
+	 * node. It'll be removed by the end of cancellation, just ignore it.
+	 */
+	if (!atomic_read(&tctx->in_idle))
+		io_uring_del_task_file(work->index);
+	complete(&work->completion);
+}
+
 static void io_ring_exit_work(struct work_struct *work)
 {
-	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
-					       exit_work);
+	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
+	struct io_tctx_exit exit;
+	struct io_tctx_node *node;
+	int ret;
 
 	/*
 	 * If we're doing polled IO and end up having requests being
@@ -8545,6 +8569,26 @@ static void io_ring_exit_work(struct work_struct *work)
 	do {
 		io_uring_try_cancel_requests(ctx, NULL, NULL);
 	} while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20));
+
+	mutex_lock(&ctx->uring_lock);
+	while (!list_empty(&ctx->tctx_list)) {
+		node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
+					ctx_node);
+		exit.index = (unsigned long)node->file;
+		init_completion(&exit.completion);
+		init_task_work(&exit.task_work, io_tctx_exit_cb);
+		ret = task_work_add(node->task, &exit.task_work, TWA_SIGNAL);
+		if (WARN_ON_ONCE(ret))
+			continue;
+		wake_up_process(node->task);
+
+		mutex_unlock(&ctx->uring_lock);
+		wait_for_completion(&exit.completion);
+		cond_resched();
+		mutex_lock(&ctx->uring_lock);
+	}
+	mutex_unlock(&ctx->uring_lock);
+
 	io_ring_ctx_free(ctx);
 }
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v4 4/8] io_uring: don't take task ring-file notes
  2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
                   ` (2 preceding siblings ...)
  2021-03-06 11:02 ` [PATCH v4 3/8] io_uring: do ctx initiated file note removal Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
  2021-03-06 11:02 ` [PATCH v4 5/8] io_uring: index io_uring->xa by ctx not file Pavel Begunkov
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
  To: Jens Axboe, io-uring

With ->flush() gone we're now leaving all uring file notes until the
task dies/execs, so the ctx will not be freed until all tasks that have
ever submit a request die. It was nicer with flush but not much, we
could have locked as described ctx in many cases.

Now we guarantee that ctx outlives all tctx in a sense that
io_ring_exit_work() waits for all tctxs to drop their corresponding
enties in ->xa, and ctx won't go away until then. Hence, additional
io_uring file reference (a.k.a. task file notes) are not needed anymore.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 692096e85749..a4e5acb058d2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8816,11 +8816,9 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 			node->file = file;
 			node->task = current;
 
-			get_file(file);
 			ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
 						node, GFP_KERNEL));
 			if (ret) {
-				fput(file);
 				kfree(node);
 				return ret;
 			}
@@ -8851,6 +8849,8 @@ static void io_uring_del_task_file(unsigned long index)
 	struct io_uring_task *tctx = current->io_uring;
 	struct io_tctx_node *node;
 
+	if (!tctx)
+		return;
 	node = xa_erase(&tctx->xa, index);
 	if (!node)
 		return;
@@ -8864,7 +8864,6 @@ static void io_uring_del_task_file(unsigned long index)
 
 	if (tctx->last == node->file)
 		tctx->last = NULL;
-	fput(node->file);
 	kfree(node);
 }
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v4 5/8] io_uring: index io_uring->xa by ctx not file
  2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
                   ` (3 preceding siblings ...)
  2021-03-06 11:02 ` [PATCH v4 4/8] io_uring: don't take task ring-file notes Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
  2021-03-06 11:02 ` [PATCH v4 6/8] io_uring: warn when ring exit takes too long Pavel Begunkov
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
  To: Jens Axboe, io-uring

We don't use task file notes anymore, and no need left in indexing
task->io_uring->xa by file, and replace it with ctx. It's better
design-wise, especially since we keep a dangling file, and so have to
keep an eye on not dereferencing it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c            | 24 +++++++++++-------------
 include/linux/io_uring.h |  2 +-
 2 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a4e5acb058d2..c5436b24d221 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -809,7 +809,6 @@ struct io_kiocb {
 struct io_tctx_node {
 	struct list_head	ctx_node;
 	struct task_struct	*task;
-	struct file		*file;
 	struct io_ring_ctx	*ctx;
 };
 
@@ -8535,7 +8534,7 @@ static bool io_run_ctx_fallback(struct io_ring_ctx *ctx)
 struct io_tctx_exit {
 	struct callback_head		task_work;
 	struct completion		completion;
-	unsigned long			index;
+	struct io_ring_ctx		*ctx;
 };
 
 static void io_tctx_exit_cb(struct callback_head *cb)
@@ -8549,7 +8548,7 @@ static void io_tctx_exit_cb(struct callback_head *cb)
 	 * node. It'll be removed by the end of cancellation, just ignore it.
 	 */
 	if (!atomic_read(&tctx->in_idle))
-		io_uring_del_task_file(work->index);
+		io_uring_del_task_file((unsigned long)work->ctx);
 	complete(&work->completion);
 }
 
@@ -8574,7 +8573,7 @@ static void io_ring_exit_work(struct work_struct *work)
 	while (!list_empty(&ctx->tctx_list)) {
 		node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
 					ctx_node);
-		exit.index = (unsigned long)node->file;
+		exit.ctx = ctx;
 		init_completion(&exit.completion);
 		init_task_work(&exit.task_work, io_tctx_exit_cb);
 		ret = task_work_add(node->task, &exit.task_work, TWA_SIGNAL);
@@ -8793,7 +8792,7 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
 /*
  * Note that this task has used io_uring. We use it for cancelation purposes.
  */
-static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
+static int io_uring_add_task_file(struct io_ring_ctx *ctx)
 {
 	struct io_uring_task *tctx = current->io_uring;
 	struct io_tctx_node *node;
@@ -8805,18 +8804,17 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 			return ret;
 		tctx = current->io_uring;
 	}
-	if (tctx->last != file) {
-		void *old = xa_load(&tctx->xa, (unsigned long)file);
+	if (tctx->last != ctx) {
+		void *old = xa_load(&tctx->xa, (unsigned long)ctx);
 
 		if (!old) {
 			node = kmalloc(sizeof(*node), GFP_KERNEL);
 			if (!node)
 				return -ENOMEM;
 			node->ctx = ctx;
-			node->file = file;
 			node->task = current;
 
-			ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
+			ret = xa_err(xa_store(&tctx->xa, (unsigned long)ctx,
 						node, GFP_KERNEL));
 			if (ret) {
 				kfree(node);
@@ -8827,7 +8825,7 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 			list_add(&node->ctx_node, &ctx->tctx_list);
 			mutex_unlock(&ctx->uring_lock);
 		}
-		tctx->last = file;
+		tctx->last = ctx;
 	}
 
 	/*
@@ -8862,7 +8860,7 @@ static void io_uring_del_task_file(unsigned long index)
 	list_del(&node->ctx_node);
 	mutex_unlock(&node->ctx->uring_lock);
 
-	if (tctx->last == node->file)
+	if (tctx->last == node->ctx)
 		tctx->last = NULL;
 	kfree(node);
 }
@@ -9161,7 +9159,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		}
 		submitted = to_submit;
 	} else if (to_submit) {
-		ret = io_uring_add_task_file(ctx, f.file);
+		ret = io_uring_add_task_file(ctx);
 		if (unlikely(ret))
 			goto out;
 		mutex_lock(&ctx->uring_lock);
@@ -9370,7 +9368,7 @@ static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
 	if (fd < 0)
 		return fd;
 
-	ret = io_uring_add_task_file(ctx, file);
+	ret = io_uring_add_task_file(ctx);
 	if (ret) {
 		put_unused_fd(fd);
 		return ret;
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 7cb7bd0e334c..9761a0ec9f95 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -18,7 +18,7 @@ struct io_uring_task {
 	/* submission side */
 	struct xarray		xa;
 	struct wait_queue_head	wait;
-	struct file		*last;
+	void			*last;
 	void			*io_wq;
 	struct percpu_counter	inflight;
 	atomic_t		in_idle;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v4 6/8] io_uring: warn when ring exit takes too long
  2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
                   ` (4 preceding siblings ...)
  2021-03-06 11:02 ` [PATCH v4 5/8] io_uring: index io_uring->xa by ctx not file Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
  2021-03-06 11:02 ` [PATCH v4 7/8] io_uring: cancel reqs of all iowq's on ring exit Pavel Begunkov
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
  To: Jens Axboe, io-uring

We use system_unbound_wq to run io_ring_exit_work(), so it's hard to
monitor whether removal hang or not. Add WARN_ONCE to catch hangs.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index c5436b24d221..e4c771df6364 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8555,6 +8555,7 @@ static void io_tctx_exit_cb(struct callback_head *cb)
 static void io_ring_exit_work(struct work_struct *work)
 {
 	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
+	unsigned long timeout = jiffies + HZ * 60 * 5;
 	struct io_tctx_exit exit;
 	struct io_tctx_node *node;
 	int ret;
@@ -8567,10 +8568,14 @@ static void io_ring_exit_work(struct work_struct *work)
 	 */
 	do {
 		io_uring_try_cancel_requests(ctx, NULL, NULL);
+
+		WARN_ON_ONCE(time_after(jiffies, timeout));
 	} while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20));
 
 	mutex_lock(&ctx->uring_lock);
 	while (!list_empty(&ctx->tctx_list)) {
+		WARN_ON_ONCE(time_after(jiffies, timeout));
+
 		node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
 					ctx_node);
 		exit.ctx = ctx;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v4 7/8] io_uring: cancel reqs of all iowq's on ring exit
  2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
                   ` (5 preceding siblings ...)
  2021-03-06 11:02 ` [PATCH v4 6/8] io_uring: warn when ring exit takes too long Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
  2021-03-06 11:02 ` [PATCH v4 8/8] io-wq: warn on creating manager awhile exiting Pavel Begunkov
  2021-03-08 21:05 ` [PATCH v4 5.12 0/8] remove task file notes Jens Axboe
  8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
  To: Jens Axboe, io-uring

io_ring_exit_work() have to cancel all requests, including those staying
in io-wq, however it tries only cancellation of current tctx, which is
NULL. If we've got task==NULL, use the ctx-to-tctx map to go over all
tctx/io-wq and try cancellations on them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e4c771df6364..a367e65a2191 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8683,19 +8683,55 @@ static void io_cancel_defer_files(struct io_ring_ctx *ctx,
 	}
 }
 
+static bool io_cancel_ctx_cb(struct io_wq_work *work, void *data)
+{
+	struct io_kiocb *req = container_of(work, struct io_kiocb, work);
+
+	return req->ctx == data;
+}
+
+static bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
+{
+	struct io_tctx_node *node;
+	enum io_wq_cancel cret;
+	bool ret = false;
+
+	mutex_lock(&ctx->uring_lock);
+	list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
+		struct io_uring_task *tctx = node->task->io_uring;
+
+		/*
+		 * io_wq will stay alive while we hold uring_lock, because it's
+		 * killed after ctx nodes, which requires to take the lock.
+		 */
+		if (!tctx || !tctx->io_wq)
+			continue;
+		cret = io_wq_cancel_cb(tctx->io_wq, io_cancel_ctx_cb, ctx, true);
+		ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
+	}
+	mutex_unlock(&ctx->uring_lock);
+
+	return ret;
+}
+
 static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 					 struct task_struct *task,
 					 struct files_struct *files)
 {
 	struct io_task_cancel cancel = { .task = task, .files = files, };
-	struct task_struct *tctx_task = task ?: current;
-	struct io_uring_task *tctx = tctx_task->io_uring;
+	struct io_uring_task *tctx = task ? task->io_uring : NULL;
 
 	while (1) {
 		enum io_wq_cancel cret;
 		bool ret = false;
 
-		if (tctx && tctx->io_wq) {
+		if (!task) {
+			ret |= io_uring_try_cancel_iowq(ctx);
+		} else if (tctx && tctx->io_wq) {
+			/*
+			 * Cancels requests of all rings, not only @ctx, but
+			 * it's fine as the task is in exit/exec.
+			 */
 			cret = io_wq_cancel_cb(tctx->io_wq, io_cancel_task_cb,
 					       &cancel, true);
 			ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v4 8/8] io-wq: warn on creating manager awhile exiting
  2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
                   ` (6 preceding siblings ...)
  2021-03-06 11:02 ` [PATCH v4 7/8] io_uring: cancel reqs of all iowq's on ring exit Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
  2021-03-08 21:05 ` [PATCH v4 5.12 0/8] remove task file notes Jens Axboe
  8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Add a simple warning making sure that nobody tries to create a new
manager while we're under IO_WQ_BIT_EXIT. That can potentially happen
due to racy work submission after final put.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io-wq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 1bfdb86336e4..1ab9324e602f 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -774,6 +774,8 @@ static int io_wq_fork_manager(struct io_wq *wq)
 	if (wq->manager)
 		return 0;
 
+	WARN_ON_ONCE(test_bit(IO_WQ_BIT_EXIT, &wq->state));
+
 	init_completion(&wq->worker_done);
 	atomic_set(&wq->worker_refs, 1);
 	tsk = create_io_thread(io_wq_manager, wq, NUMA_NO_NODE);
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 5.12 0/8] remove task file notes
  2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
                   ` (7 preceding siblings ...)
  2021-03-06 11:02 ` [PATCH v4 8/8] io-wq: warn on creating manager awhile exiting Pavel Begunkov
@ 2021-03-08 21:05 ` Jens Axboe
  8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2021-03-08 21:05 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 3/6/21 4:02 AM, Pavel Begunkov wrote:
> Introduce a mapping from ctx to all tctx, and using that removes
> file notes, i.e. taking a io_uring file note previously stored in
> task->io_uring->xa. It's needed because we don't free io_uring ctx
> until all submitters die/exec, and it became worse after killing
>  ->flush(). There are rough corner in a form of not behaving nicely,
> I'll address in follow-up patches. Also use it to do cancellations
> right.
> 
> The torture is as simple as below. It will get OOM in no time. Also,
> I plan to use it to fix recently broken cancellations.
> 
> while (1) {
> 	assert(!io_uring_queue_init(8, &ring, 0));
> 	io_uring_queue_exit(&ring);
> }
> 
> WARNING: hangs without reverting sq park refactoring

Got fixed separately - forgot to write that this series is applied,
thanks!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-03-08 21:06 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 1/8] io_uring: make del_task_file more forgiving Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 2/8] io_uring: introduce ctx to tctx back map Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 3/8] io_uring: do ctx initiated file note removal Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 4/8] io_uring: don't take task ring-file notes Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 5/8] io_uring: index io_uring->xa by ctx not file Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 6/8] io_uring: warn when ring exit takes too long Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 7/8] io_uring: cancel reqs of all iowq's on ring exit Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 8/8] io-wq: warn on creating manager awhile exiting Pavel Begunkov
2021-03-08 21:05 ` [PATCH v4 5.12 0/8] remove task file notes Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.