* [PATCHSET 0/4] io_uring and task_work interactions
From: Jens Axboe @ 2020-04-06 19:48 UTC
  To: io-uring; +Cc: peterz

There's a case in io_uring where we want to run the task_work on ring
exit, but we can't easily do that today, as we don't know whether
task_work has pending work or whether the exit work has already been
queued.

Here are two prep patches that add a task_work_pending() helper and
make task_work_run() check it. Then we can remove these checks from
the callers, and io_uring can run any pending work on ring exit.

-- 
Jens Axboe


* [PATCH 1/4] task_work: add task_work_pending() helper
From: Jens Axboe @ 2020-04-06 19:48 UTC
  To: io-uring; +Cc: peterz, Jens Axboe

Various callsites currently check current->task_works != NULL to know
when to run work. Add a helper that we use internally for that. This is
in preparation for also not running if the exit work has been queued.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/task_work.h | 13 ++++++++++++-
 kernel/task_work.c        |  2 +-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index bd9a6a91c097..54c911bbf754 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -15,7 +15,18 @@ init_task_work(struct callback_head *twork, task_work_func_t func)
 
 int task_work_add(struct task_struct *task, struct callback_head *twork, bool);
 struct callback_head *task_work_cancel(struct task_struct *, task_work_func_t);
-void task_work_run(void);
+void __task_work_run(void);
+
+static inline bool task_work_pending(void)
+{
+	return current->task_works;
+}
+
+static inline void task_work_run(void)
+{
+	if (task_work_pending())
+		__task_work_run();
+}
 
 static inline void exit_task_work(struct task_struct *task)
 {
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 825f28259a19..9620333423a3 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -87,7 +87,7 @@ task_work_cancel(struct task_struct *task, task_work_func_t func)
  * it exits. In the latter case task_work_add() can no longer add the
  * new work after task_work_run() returns.
  */
-void task_work_run(void)
+void __task_work_run(void)
 {
 	struct task_struct *task = current;
 	struct callback_head *work, *head, *next;
-- 
2.26.0



* [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
From: Jens Axboe @ 2020-04-06 19:48 UTC
  To: io-uring; +Cc: peterz, Jens Axboe

If task_work has already been run on task exit, we don't always know
if it's safe to run it again. Check for task_work_exited in the
task_work_pending() helper. This makes it less fragile when called
from the exit files path, for example.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/task_work.h | 4 +++-
 kernel/task_work.c        | 8 ++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 54c911bbf754..24f977a8fc35 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -7,6 +7,8 @@
 
 typedef void (*task_work_func_t)(struct callback_head *);
 
+extern struct callback_head task_work_exited;
+
 static inline void
 init_task_work(struct callback_head *twork, task_work_func_t func)
 {
@@ -19,7 +21,7 @@ void __task_work_run(void);
 
 static inline bool task_work_pending(void)
 {
-	return current->task_works;
+	return current->task_works && current->task_works != &task_work_exited;
 }
 
 static inline void task_work_run(void)
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 9620333423a3..d6a8b4ab4858 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -3,7 +3,7 @@
 #include <linux/task_work.h>
 #include <linux/tracehook.h>
 
-static struct callback_head work_exited; /* all we need is ->next == NULL */
+struct callback_head task_work_exited = { };
 
 /**
  * task_work_add - ask the @task to execute @work->func()
@@ -31,7 +31,7 @@ task_work_add(struct task_struct *task, struct callback_head *work, bool notify)
 
 	do {
 		head = READ_ONCE(task->task_works);
-		if (unlikely(head == &work_exited))
+		if (unlikely(head == &task_work_exited))
 			return -ESRCH;
 		work->next = head;
 	} while (cmpxchg(&task->task_works, head, work) != head);
@@ -95,14 +95,14 @@ void __task_work_run(void)
 	for (;;) {
 		/*
 		 * work->func() can do task_work_add(), do not set
-		 * work_exited unless the list is empty.
+		 * task_work_exited unless the list is empty.
 		 */
 		do {
 			head = NULL;
 			work = READ_ONCE(task->task_works);
 			if (!work) {
 				if (task->flags & PF_EXITING)
-					head = &work_exited;
+					head = &task_work_exited;
 				else
 					break;
 			}
-- 
2.26.0



* [PATCH 3/4] task_work: kill current->task_works checking in callers
From: Jens Axboe @ 2020-04-06 19:48 UTC
  To: io-uring; +Cc: peterz, Jens Axboe

If the callsite cares, use task_work_pending(). If not, just call
task_work_run() unconditionally; the check is now inlined there, so it
doesn't add any extra overhead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io-wq.c                |  7 ++-----
 fs/io_uring.c             | 20 ++++++++------------
 include/linux/tracehook.h |  3 +--
 kernel/signal.c           |  7 +++----
 4 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 4023c9846860..5bee3f5f67e1 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -717,8 +717,7 @@ static int io_wq_manager(void *data)
 	complete(&wq->done);
 
 	while (!kthread_should_stop()) {
-		if (current->task_works)
-			task_work_run();
+		task_work_run();
 
 		for_each_node(node) {
 			struct io_wqe *wqe = wq->wqes[node];
@@ -742,9 +741,7 @@ static int io_wq_manager(void *data)
 		schedule_timeout(HZ);
 	}
 
-	if (current->task_works)
-		task_work_run();
-
+	task_work_run();
 	return 0;
 err:
 	set_bit(IO_WQ_BIT_ERROR, &wq->state);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 79bd22289d73..1579390c7c53 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5967,8 +5967,7 @@ static int io_sq_thread(void *data)
 			if (!list_empty(&ctx->poll_list) ||
 			    (!time_after(jiffies, timeout) && ret != -EBUSY &&
 			    !percpu_ref_is_dying(&ctx->refs))) {
-				if (current->task_works)
-					task_work_run();
+				task_work_run();
 				cond_resched();
 				continue;
 			}
@@ -6000,8 +5999,8 @@ static int io_sq_thread(void *data)
 					finish_wait(&ctx->sqo_wait, &wait);
 					break;
 				}
-				if (current->task_works) {
-					task_work_run();
+				if (task_work_pending()) {
+					__task_work_run();
 					finish_wait(&ctx->sqo_wait, &wait);
 					continue;
 				}
@@ -6024,8 +6023,7 @@ static int io_sq_thread(void *data)
 		timeout = jiffies + ctx->sq_thread_idle;
 	}
 
-	if (current->task_works)
-		task_work_run();
+	task_work_run();
 
 	set_fs(old_fs);
 	if (cur_mm) {
@@ -6094,9 +6092,9 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	do {
 		if (io_cqring_events(ctx, false) >= min_events)
 			return 0;
-		if (!current->task_works)
+		if (!task_work_pending())
 			break;
-		task_work_run();
+		__task_work_run();
 	} while (1);
 
 	if (sig) {
@@ -6117,8 +6115,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	do {
 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
 						TASK_INTERRUPTIBLE);
-		if (current->task_works)
-			task_work_run();
+		task_work_run();
 		if (io_should_wake(&iowq, false))
 			break;
 		schedule();
@@ -7467,8 +7464,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	int submitted = 0;
 	struct fd f;
 
-	if (current->task_works)
-		task_work_run();
+	task_work_run();
 
 	if (flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP))
 		return -EINVAL;
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 36fb3bbed6b2..608a2d12bc14 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -184,8 +184,7 @@ static inline void tracehook_notify_resume(struct pt_regs *regs)
 	 * hlist_add_head(task->task_works);
 	 */
 	smp_mb__after_atomic();
-	if (unlikely(current->task_works))
-		task_work_run();
+	task_work_run();
 
 #ifdef CONFIG_KEYS_REQUEST_CACHE
 	if (unlikely(current->cached_requested_key)) {
diff --git a/kernel/signal.c b/kernel/signal.c
index e58a6c619824..d62b7a3f2045 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2271,8 +2271,8 @@ static void ptrace_do_notify(int signr, int exit_code, int why)
 void ptrace_notify(int exit_code)
 {
 	BUG_ON((exit_code & (0x7f | ~0xffff)) != SIGTRAP);
-	if (unlikely(current->task_works))
-		task_work_run();
+
+	task_work_run();
 
 	spin_lock_irq(&current->sighand->siglock);
 	ptrace_do_notify(SIGTRAP, exit_code, CLD_TRAPPED);
@@ -2529,8 +2529,7 @@ bool get_signal(struct ksignal *ksig)
 	struct signal_struct *signal = current->signal;
 	int signr;
 
-	if (unlikely(current->task_works))
-		task_work_run();
+	task_work_run();
 
 	if (unlikely(uprobe_deny_signal()))
 		return false;
-- 
2.26.0



* [PATCH 4/4] io_uring: flush task work before waiting for ring exit
From: Jens Axboe @ 2020-04-06 19:48 UTC
  To: io-uring; +Cc: peterz, Jens Axboe

We could have triggered task_work when we removed poll completions.
It's now safe to call task_work_run() unconditionally, since it checks
for the exited task_work, so do that in case we have pending items
there. Ensure we run it before flushing the CQ ring overflow.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 1579390c7c53..183bd28761e3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7295,10 +7295,13 @@ static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 		io_wq_cancel_all(ctx->io_wq);
 
 	io_iopoll_reap_events(ctx);
+	idr_for_each(&ctx->personality_idr, io_remove_personalities, ctx);
+	task_work_run();
+
 	/* if we failed setting up the ctx, we might not have any rings */
 	if (ctx->rings)
 		io_cqring_overflow_flush(ctx, true);
-	idr_for_each(&ctx->personality_idr, io_remove_personalities, ctx);
+
 	wait_for_completion(&ctx->completions[0]);
 	io_ring_ctx_free(ctx);
 }
-- 
2.26.0



* Re: [PATCH 1/4] task_work: add task_work_pending() helper
From: Oleg Nesterov @ 2020-04-07 11:28 UTC
  To: Jens Axboe; +Cc: io-uring, peterz

On 04/06, Jens Axboe wrote:
>
> +static inline bool task_work_pending(void)
> +{
> +	return current->task_works;
> +}
> +
> +static inline void task_work_run(void)
> +{
> +	if (task_work_pending())
> +		__task_work_run();
> +}

No, this is wrong. exit_task_work() must always call __task_work_run()
to install work_exited.

This helper (and 3/4) probably makes sense but please change exit_task_work()
to use __task_work_run() then.
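
Something like this, as a minimal sketch of what I mean (untested; the
helper names are from your patch 1):

static inline void exit_task_work(struct task_struct *task)
{
	/*
	 * Bypass the task_work_pending() fast path: even when the list
	 * looks empty, __task_work_run() has to run so that it installs
	 * work_exited and further task_work_add() fails with -ESRCH.
	 */
	__task_work_run();
}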

Oleg.



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
From: Oleg Nesterov @ 2020-04-07 11:39 UTC
  To: Jens Axboe; +Cc: io-uring, peterz

On 04/06, Jens Axboe wrote:
>
> +extern struct callback_head task_work_exited;
> +
>  static inline void
>  init_task_work(struct callback_head *twork, task_work_func_t func)
>  {
> @@ -19,7 +21,7 @@ void __task_work_run(void);
>
>  static inline bool task_work_pending(void)
>  {
> -	return current->task_works;
> +	return current->task_works && current->task_works != &task_work_exited;
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Well, this penalizes all the current users; they can't hit work_exited.

IIUC, this is needed for the next change which adds task_work_run() into
io_ring_ctx_wait_and_kill(), right?

Could you explain how the exiting task can call io_ring_ctx_wait_and_kill()
after it has passed exit_task_work()?

Oleg.



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
From: Oleg Nesterov @ 2020-04-07 11:54 UTC
  To: Jens Axboe; +Cc: io-uring, peterz

On 04/07, Oleg Nesterov wrote:
>
> On 04/06, Jens Axboe wrote:
> >
> > +extern struct callback_head task_work_exited;
> > +
> >  static inline void
> >  init_task_work(struct callback_head *twork, task_work_func_t func)
> >  {
> > @@ -19,7 +21,7 @@ void __task_work_run(void);
> >
> >  static inline bool task_work_pending(void)
> >  {
> > -	return current->task_works;
> > +	return current->task_works && current->task_works != &task_work_exited;
>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Well, this penalizes all the current users; they can't hit work_exited.
>
> IIUC, this is needed for the next change which adds task_work_run() into
> io_ring_ctx_wait_and_kill(), right?
>
> Could you explain how the exiting task can call io_ring_ctx_wait_and_kill()
> after it has passed exit_task_work()?

Note also that currently we assume that task_work_run() must not be called
by the exiting process (except from exit_task_work()). If io_uring sets this
precedent, we need to change the PF_EXITING logic in task_work_run().
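
For reference, here is the logic I mean, abridged from __task_work_run()
as it looks after your patch 2 (the cmpxchg loop that installs the exit
marker):

	do {
		head = NULL;
		work = READ_ONCE(task->task_works);
		if (!work) {
			if (task->flags & PF_EXITING)
				/* empty list + exiting task: block new work */
				head = &task_work_exited;
			else
				break;
		}
	} while (cmpxchg(&task->task_works, work, head) != work);

A nested task_work_run(), called from a callback while the task is
exiting, sees PF_EXITING set too, so it can observe an empty list and
install task_work_exited while the outer run is still executing.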

Oleg.



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
From: Peter Zijlstra @ 2020-04-07 12:47 UTC
  To: Jens Axboe; +Cc: io-uring, Oleg Nesterov, viro


You seem to have lost Oleg and Al from the Cc list...

On Mon, Apr 06, 2020 at 01:48:51PM -0600, Jens Axboe wrote:
> If task_work has already been run on task exit, we don't always know
> if it's safe to run it again. Check for task_work_exited in the
> task_work_pending() helper. This makes it less fragile when called
> from the exit files path, for example.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  include/linux/task_work.h | 4 +++-
>  kernel/task_work.c        | 8 ++++----
>  2 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/task_work.h b/include/linux/task_work.h
> index 54c911bbf754..24f977a8fc35 100644
> --- a/include/linux/task_work.h
> +++ b/include/linux/task_work.h
> @@ -7,6 +7,8 @@
>  
>  typedef void (*task_work_func_t)(struct callback_head *);
>  
> +extern struct callback_head task_work_exited;
> +
>  static inline void
>  init_task_work(struct callback_head *twork, task_work_func_t func)
>  {
> @@ -19,7 +21,7 @@ void __task_work_run(void);
>  
>  static inline bool task_work_pending(void)
>  {
> -	return current->task_works;
> +	return current->task_works && current->task_works != &task_work_exited;
>  }

Hurmph... not sure I like this. It inlines that second condition into
every caller of task_work_run(), even though for pretty much all of them
this is impossible.


* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
From: Jens Axboe @ 2020-04-07 15:40 UTC
  To: Oleg Nesterov; +Cc: io-uring, peterz

On 4/7/20 4:39 AM, Oleg Nesterov wrote:
> On 04/06, Jens Axboe wrote:
>>
>> +extern struct callback_head task_work_exited;
>> +
>>  static inline void
>>  init_task_work(struct callback_head *twork, task_work_func_t func)
>>  {
>> @@ -19,7 +21,7 @@ void __task_work_run(void);
>>
>>  static inline bool task_work_pending(void)
>>  {
>> -	return current->task_works;
>> +	return current->task_works && current->task_works != &task_work_exited;
>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Well, this penalizes all the current users; they can't hit work_exited.
> 
> IIUC, this is needed for the next change which adds task_work_run() into
> io_ring_ctx_wait_and_kill(), right?

Right - so you'd rather I localize that check there instead? Can certainly
do that.

> Could you explain how the exiting task can call io_ring_ctx_wait_and_kill()
> after it has passed exit_task_work()?

Sure, here's a trace where it happens:


BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 0 P4D 0 
Oops: 0010 [#1] SMP
CPU: 51 PID: 7290 Comm: mc_worker Kdump: loaded Not tainted 5.2.9-03696-gf2db01aa1e97 #190
Hardware name: Quanta Leopard ORv2-DDR4/Leopard ORv2-DDR4, BIOS F06_3B17 03/16/2018
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0018:ffffc9002721bc78 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffffff82d10ff0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffc9002721bc60 RDI: ffffffff82d10ff0
RBP: ffff889fd220e8f0 R08: 0000000000000000 R09: ffffffff812f1000
R10: ffff88bfa5fcb100 R11: 0000000000000000 R12: ffff889fd220e200
R13: ffff889fd220e92c R14: ffffffff82d10ff0 R15: 0000000000000000
FS:  00007f03161ff700(0000) GS:ffff88bfff9c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000002409004 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __task_work_run+0x66/0xa0
 io_ring_ctx_wait_and_kill+0x14e/0x3c0
 io_uring_release+0x1c/0x20
 __fput+0xaa/0x200
 __task_work_run+0x66/0xa0
 do_exit+0x9cf/0xb40
 do_group_exit+0x3a/0xa0
 get_signal+0x152/0x800
 do_signal+0x36/0x640
 ? __audit_syscall_exit+0x23c/0x290
 exit_to_usermode_loop+0x65/0x100
 do_syscall_64+0xd4/0x100
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f77057fe8ce
Code: Bad RIP value.
RSP: 002b:00007f03161f8960 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
RAX: 000000000000002a RBX: 00007f03161f8a30 RCX: 00007f77057fe8ce
RDX: 0000000000004040 RSI: 00007f03161f8a30 RDI: 00000000000057a4
RBP: 00007f03161f8980 R08: 0000000000000000 R09: 00007f03161fb7b8
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
R13: 0000000000004040 R14: 00007f02dc12bc00 R15: 00007f02dc21b900

-- 
Jens Axboe



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
From: Jens Axboe @ 2020-04-07 15:43 UTC
  To: Peter Zijlstra; +Cc: io-uring, Oleg Nesterov, viro

On 4/7/20 5:47 AM, Peter Zijlstra wrote:
> 
> You seem to have lost Oleg and Al from the Cc list..

I'll add them for v2; I did point Oleg at it!

> On Mon, Apr 06, 2020 at 01:48:51PM -0600, Jens Axboe wrote:
>> If task_work has already been run on task exit, we don't always know
>> if it's safe to run it again. Check for task_work_exited in the
>> task_work_pending() helper. This makes it less fragile when called
>> from the exit files path, for example.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  include/linux/task_work.h | 4 +++-
>>  kernel/task_work.c        | 8 ++++----
>>  2 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/task_work.h b/include/linux/task_work.h
>> index 54c911bbf754..24f977a8fc35 100644
>> --- a/include/linux/task_work.h
>> +++ b/include/linux/task_work.h
>> @@ -7,6 +7,8 @@
>>  
>>  typedef void (*task_work_func_t)(struct callback_head *);
>>  
>> +extern struct callback_head task_work_exited;
>> +
>>  static inline void
>>  init_task_work(struct callback_head *twork, task_work_func_t func)
>>  {
>> @@ -19,7 +21,7 @@ void __task_work_run(void);
>>  
>>  static inline bool task_work_pending(void)
>>  {
>> -	return current->task_works;
>> +	return current->task_works && current->task_works != &task_work_exited;
>>  }
> 
> Hurmph... not sure I like this. It inlines that second condition into
> every caller of task_work_run(), even though for pretty much all of them
> this is impossible.

Oleg had the same concern, and I agree with both of you. Would you
prefer that we just leave task_work_run() as:

static inline void task_work_run(void)
{
	if (current->task_works)
		__task_work_run();
}

and then have the io_uring caller do:

	if (current->task_works != &task_work_exited)
		task_work_run();

instead?

-- 
Jens Axboe



* Re: [PATCH 1/4] task_work: add task_work_pending() helper
From: Jens Axboe @ 2020-04-07 15:43 UTC
  To: Oleg Nesterov; +Cc: io-uring, peterz

On 4/7/20 4:28 AM, Oleg Nesterov wrote:
> On 04/06, Jens Axboe wrote:
>>
>> +static inline bool task_work_pending(void)
>> +{
>> +	return current->task_works;
>> +}
>> +
>> +static inline void task_work_run(void)
>> +{
>> +	if (task_work_pending())
>> +		__task_work_run();
>> +}
> 
> No, this is wrong. exit_task_work() must always call __task_work_run()
> to install work_exited.
> 
> This helper (and 3/4) probably makes sense but please change exit_task_work()
> to use __task_work_run() then.

Good catch, thanks. I'll make the change.

-- 
Jens Axboe



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
From: Oleg Nesterov @ 2020-04-07 16:19 UTC
  To: Jens Axboe; +Cc: io-uring, peterz

On 04/07, Jens Axboe wrote:
>
> On 4/7/20 4:39 AM, Oleg Nesterov wrote:
> >
> > IIUC, this is needed for the next change which adds task_work_run() into
> > io_ring_ctx_wait_and_kill(), right?
>
> Right - so you'd rather I localize that check there instead? Can certainly
> do that.

I am still not sure we need this check at all... probably this is because
I don't understand the problem.

> > Could you explain how the exiting task can call io_ring_ctx_wait_and_kill()
> > after it has passed exit_task_work()?
>
> Sure, here's a trace where it happens:

but this task has not passed exit_task_work(),

>  __task_work_run+0x66/0xa0
>  io_ring_ctx_wait_and_kill+0x14e/0x3c0
>  io_uring_release+0x1c/0x20
>  __fput+0xaa/0x200
>  __task_work_run+0x66/0xa0
>  do_exit+0x9cf/0xb40

So task_work_run() is called recursively from exit_task_work()->task_work_run().
See my other email; this is wrong with or without this series. And that is
why I think task_work_run() hits work_exited.

Could you explain why io_ring_ctx_wait_and_kill() needs task_work_run()?

Oleg.



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
From: Jens Axboe @ 2020-04-07 16:59 UTC
  To: Oleg Nesterov; +Cc: io-uring, peterz

On 4/7/20 9:19 AM, Oleg Nesterov wrote:
> On 04/07, Jens Axboe wrote:
>>
>> On 4/7/20 4:39 AM, Oleg Nesterov wrote:
>>>
>>> IIUC, this is needed for the next change which adds task_work_run() into
>>> io_ring_ctx_wait_and_kill(), right?
>>
>> Right - so you'd rather I localize that check there instead? Can certainly
>> do that.
> 
> I am still not sure we need this check at all... probably this is because
> I don't understand the problem.

Probably because I'm not explaining it very well... Let me try. io_uring
uses task_work to handle completion of poll requests - either an
explicit poll, or one done implicitly because someone did a recv/send
(or whatever) on a socket that wasn't ready. When we get the poll
waitqueue callback, we queue up task_work to handle the completion of
it.

These can come in at any time, obviously, as space or data becomes
available. If the task is exiting, our task_work_add() fails, and we
queue the work with someone else. But there seems to be a case where it
does get queued locally, and then io_uring doesn't know whether it's
safe to run task_work or not. Sometimes that manifests as hitting the
RIP == 0 case that I included here. With the work_pending && work !=
exit_work check in place, it works fine.
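
Roughly this pattern, heavily simplified (names are approximate and
io_queue_fallback() is made up for illustration; the real code is in
the poll handling in fs/io_uring.c):

static void io_poll_wake_sketch(struct io_kiocb *req)
{
	struct task_struct *tsk = req->task;

	init_task_work(&req->task_work, io_poll_task_func);
	if (task_work_add(tsk, &req->task_work, true)) {
		/* task is exiting (-ESRCH), queue with someone else */
		io_queue_fallback(req);
		return;
	}
	wake_up_process(tsk);
}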

>>> Could you explain how the exiting task can call io_ring_ctx_wait_and_kill()
>>> after it has passed exit_task_work()?
>>
>> Sure, here's a trace where it happens:
> 
> but this task has not passed exit_task_work(),

But it's definitely hitting callback->func == NULL, which is the
exit_work. So if it's not coming from past exit_task_work(), where is
it from?

> 
>>  __task_work_run+0x66/0xa0
>>  io_ring_ctx_wait_and_kill+0x14e/0x3c0
>>  io_uring_release+0x1c/0x20
>>  __fput+0xaa/0x200
>>  __task_work_run+0x66/0xa0
>>  do_exit+0x9cf/0xb40
> 
> So task_work_run() is called recursively from
> exit_task_work()->task_work_run(). See my other email; this is
> wrong with or without this series. And that is why I think
> task_work_run() hits work_exited.

I see your newer email on this; I'll go read it.

> Could you explain why io_ring_ctx_wait_and_kill() needs
> task_work_run()?

Hopefully the above helped! If I'm way off somehow, cluebat away.

-- 
Jens Axboe



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
From: Oleg Nesterov @ 2020-04-07 17:43 UTC
  To: Jens Axboe; +Cc: io-uring, peterz

Jens,

I'll read your email tomorrow; just one note for now...

On 04/07, Jens Axboe wrote:
>
> On 4/7/20 9:19 AM, Oleg Nesterov wrote:
> >
> > but this task has not passed exit_task_work(),
>
> But it's definitely hitting callback->func == NULL, which is the
> exit_work. So if it's not from past exit_task_work(), where is it from?

I guess it comes from task_work_run() added by the next patch ;)

> I see your newer email on this; I'll go read it.

Please look at the "bad_work_func" example.

Oleg.



Thread overview: 15+ messages
2020-04-06 19:48 [PATCHSET 0/4] io_uring and task_work interactions Jens Axboe
2020-04-06 19:48 ` [PATCH 1/4] task_work: add task_work_pending() helper Jens Axboe
2020-04-07 11:28   ` Oleg Nesterov
2020-04-07 15:43     ` Jens Axboe
2020-04-06 19:48 ` [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued Jens Axboe
2020-04-07 11:39   ` Oleg Nesterov
2020-04-07 11:54     ` Oleg Nesterov
2020-04-07 15:40     ` Jens Axboe
2020-04-07 16:19       ` Oleg Nesterov
2020-04-07 16:59         ` Jens Axboe
2020-04-07 17:43           ` Oleg Nesterov
2020-04-07 12:47   ` Peter Zijlstra
2020-04-07 15:43     ` Jens Axboe
2020-04-06 19:48 ` [PATCH 3/4] task_work: kill current->task_works checking in callers Jens Axboe
2020-04-06 19:48 ` [PATCH 4/4] io_uring: flush task work before waiting for ring exit Jens Axboe
