* [PATCH bpf 0/2] bpf: fix bpf_send_signal()/bpf_send_signal_thread() helper in NMI mode
@ 2020-03-03 23:15 Yonghong Song
  2020-03-03 23:15 ` [PATCH bpf 1/2] " Yonghong Song
  2020-03-03 23:15 ` [PATCH bpf 2/2] bpf: avoid irq_work for bpf_send_signal() if CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG Yonghong Song
  0 siblings, 2 replies; 5+ messages in thread
From: Yonghong Song @ 2020-03-03 23:15 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Commit 8b401f9ed244 ("bpf: implement bpf_send_signal() helper")
introduced the bpf_send_signal() helper, and commit 8482941f0906
("bpf: Add bpf_send_signal_thread() helper") added the
bpf_send_signal_thread() helper. Both helpers try to send a signal
to the current process or thread.

When the bpf program, and hence the helper, is called in NMI mode,
the actual sending of the signal is deferred to an irq_work.
But this is still not always safe: an NMI can fire in the
scheduler while the scheduler lock is held, and the routine
that later sends the signal may try to acquire the same
spinlock, causing a deadlock. See patch #1 for a more
detailed description of the problem and how task_work is
used to solve it.

Patch #2 is an optimization. The task_work can be set up
directly in NMI mode if CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
is set, which it is for most modern architectures.

Patch #1 is for bpf tree. Patch #2 is intended for bpf-next tree.

Yonghong Song (2):
  bpf: fix bpf_send_signal()/bpf_send_signal_thread() helper in NMI mode
  bpf: avoid irq_work for bpf_send_signal() if
    CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG

 kernel/trace/bpf_trace.c | 82 ++++++++++++++++++++++++++++++++--------
 1 file changed, 67 insertions(+), 15 deletions(-)

-- 
2.17.1



* [PATCH bpf 1/2] bpf: fix bpf_send_signal()/bpf_send_signal_thread() helper in NMI mode
  2020-03-03 23:15 [PATCH bpf 0/2] bpf: fix bpf_send_signal()/bpf_send_signal_thread() helper in NMI mode Yonghong Song
@ 2020-03-03 23:15 ` Yonghong Song
  2020-03-04  1:08   ` Alexei Starovoitov
  2020-03-03 23:15 ` [PATCH bpf 2/2] bpf: avoid irq_work for bpf_send_signal() if CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG Yonghong Song
  1 sibling, 1 reply; 5+ messages in thread
From: Yonghong Song @ 2020-03-03 23:15 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Rik van Riel

When experimenting with the bpf_send_signal() helper in our production
environment, we hit a deadlock in NMI mode:
   #0 [fffffe000046be58] crash_nmi_callback at ffffffff8103f48b
   #1 [fffffe000046be60] nmi_handle at ffffffff8101feed
   #2 [fffffe000046beb8] default_do_nmi at ffffffff8102027e
   #3 [fffffe000046bed8] do_nmi at ffffffff81020434
   #4 [fffffe000046bef0] end_repeat_nmi at ffffffff81c01093
      [exception RIP: queued_spin_lock_slowpath+68]
      RIP: ffffffff8110be24  RSP: ffffc9002219f770  RFLAGS: 00000002
      RAX: 0000000000000101  RBX: 0000000000000046  RCX: 000000000000002a
      RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff88871c96c044
      RBP: 0000000000000000   R8: ffff88870f11f040   R9: 0000000000000000
      R10: 0000000000000008  R11: 00000000acd93e4d  R12: ffff88871c96c044
      R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000001
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  --- <NMI exception stack> ---
   #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
   #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
   #7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
   #8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
   #9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
  #10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
  #11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
  #12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
  #13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
  #14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
  #15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
  #16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
  #17 [ffffc9002219fc90] schedule at ffffffff81a3e219
  #18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
  #19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
  #20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
  #21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
  #22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
  #23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068

Basically, while task->pi_lock is held, an NMI fires and a bpf program
executes. The bpf program calls the bpf_send_signal() helper, which
ends up calling group_send_sig_info() from an irq_work, which tries
to grab task->pi_lock again and deadlocks.

To break the deadlock, the group_send_sig_info() call should be
delayed until it is safe to make.

This patch registers a task_work callback inside the irq_work so
group_send_sig_info() in the task_work can be called later safely.

This patch also fixes a potential issue where the "current"
task observed in NMI context may be gone by the time the irq_work
actually runs. Hold a reference to the task and drop the reference
inside the irq_work to ensure the task does not go away.
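The two fixes combined (defer the delivery, pin the task with a reference) can be sketched in plain userspace C. All names below are illustrative stand-ins for the kernel primitives (get_task_struct()/put_task_struct(), the irq_work callback), not real kernel API:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for a refcounted struct task_struct. */
struct task {
	int refcount;           /* kernel: task->usage */
	int signals_delivered;
};

static struct task *task_get(struct task *t)
{
	t->refcount++;          /* kernel: get_task_struct() */
	return t;
}

static void task_put(struct task *t)
{
	if (--t->refcount == 0) /* kernel: put_task_struct() */
		free(t);
}

/* Stand-in for struct send_signal_irq_work. */
struct deferred_work {
	struct task *task;
	int sig;
};

/* Enqueue path (the NMI side): pin the target task so it cannot be
 * freed before the deferred callback runs. */
static void enqueue_signal(struct deferred_work *w, struct task *current_task,
			   int sig)
{
	w->task = task_get(current_task);
	w->sig = sig;
}

/* Deferred callback (the irq_work side): only act if the target is
 * still the running task, then drop the pinning reference. */
static void run_deferred_signal(struct deferred_work *w,
				struct task *current_task)
{
	if (w->task == current_task)
		w->task->signals_delivered++;
	task_put(w->task);
}
```

If the task scheduled away before the callback ran, the signal is simply dropped, but the reference is still released, mirroring the put_task_struct() in do_bpf_send_signal().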

Fixes: 8482941f0906 ("bpf: Add bpf_send_signal_thread() helper")
Fixes: 8b401f9ed244 ("bpf: implement bpf_send_signal() helper")
Cc: Rik van Riel <riel@surriel.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/trace/bpf_trace.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 07764c761073..db7b6194e38a 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -12,6 +12,7 @@
 #include <linux/ctype.h>
 #include <linux/kprobes.h>
 #include <linux/syscalls.h>
+#include <linux/task_work.h>
 #include <linux/error-injection.h>
 
 #include <asm/tlb.h>
@@ -697,6 +698,36 @@ static const struct bpf_func_proto bpf_current_task_under_cgroup_proto = {
 	.arg2_type      = ARG_ANYTHING,
 };
 
+struct send_signal_work_cb {
+	struct callback_head twork;
+	u32 sig;
+	enum pid_type type;
+};
+
+static void do_send_signal_work(struct callback_head *twork)
+{
+	struct send_signal_work_cb *twcb = container_of(twork,
+			struct send_signal_work_cb, twork);
+
+	group_send_sig_info(twcb->sig, SEND_SIG_PRIV, current, twcb->type);
+	kfree(twcb);
+}
+
+static void add_send_signal_task_work(u32 sig, enum pid_type type)
+{
+	struct send_signal_work_cb *twcb;
+
+	twcb = kzalloc(sizeof(*twcb), GFP_ATOMIC);
+	if (!twcb)
+		return;
+
+	twcb->sig = sig;
+	twcb->type = type;
+	init_task_work(&twcb->twork, do_send_signal_work);
+	if (task_work_add(current, &twcb->twork, true))
+		kfree(twcb);
+}
+
 struct send_signal_irq_work {
 	struct irq_work irq_work;
 	struct task_struct *task;
@@ -711,7 +742,10 @@ static void do_bpf_send_signal(struct irq_work *entry)
 	struct send_signal_irq_work *work;
 
 	work = container_of(entry, struct send_signal_irq_work, irq_work);
-	group_send_sig_info(work->sig, SEND_SIG_PRIV, work->task, work->type);
+	if (work->task == current && !(current->flags & PF_EXITING))
+		add_send_signal_task_work(work->sig, work->type);
+
+	put_task_struct(work->task);
 }
 
 static int bpf_send_signal_common(u32 sig, enum pid_type type)
@@ -745,7 +779,7 @@ static int bpf_send_signal_common(u32 sig, enum pid_type type)
 		 * to the irq_work. The current task may change when queued
 		 * irq works get executed.
 		 */
-		work->task = current;
+		work->task = get_task_struct(current);
 		work->sig = sig;
 		work->type = type;
 		irq_work_queue(&work->irq_work);
-- 
2.17.1



* [PATCH bpf 2/2] bpf: avoid irq_work for bpf_send_signal() if CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
  2020-03-03 23:15 [PATCH bpf 0/2] bpf: fix bpf_send_signal()/bpf_send_signal_thread() helper in NMI mode Yonghong Song
  2020-03-03 23:15 ` [PATCH bpf 1/2] " Yonghong Song
@ 2020-03-03 23:15 ` Yonghong Song
  1 sibling, 0 replies; 5+ messages in thread
From: Yonghong Song @ 2020-03-03 23:15 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Rik van Riel

This is an optimization. In task_work_add(), we have
the following loop:
        do {
                head = READ_ONCE(task->task_works);
                if (unlikely(head == &work_exited))
                        return -ESRCH;
                work->next = head;
        } while (cmpxchg(&task->task_works, head, work) != head);

If CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG is set, calling
task_work_add() is safe even in NMI context. In that case
the irq_work can be skipped entirely, avoiding the
intermediate step of setting up the task_work from the irq_work.
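As a rough userspace sketch, the lock-free push at the heart of the task_work_add() loop quoted above can be reproduced with C11 atomics. The names mirror the kernel loop but are illustrative, not kernel API:

```c
#include <stdatomic.h>
#include <stddef.h>

struct work {
	struct work *next;
};

/* Sentinel marking a task that has already exited
 * (stand-in for the kernel's &work_exited). */
static struct work dummy_exited;
#define WORK_EXITED (&dummy_exited)

/* Lock-free LIFO push: the entire list update is a single
 * compare-and-swap, so it is safe even if the caller is preempted or
 * interrupted mid-loop -- the property CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 * guarantees also holds in NMI context. */
static int work_add(struct work *_Atomic *head, struct work *work)
{
	struct work *old;

	do {
		old = atomic_load(head);
		if (old == WORK_EXITED)
			return -1;	/* kernel returns -ESRCH */
		work->next = old;
	} while (!atomic_compare_exchange_weak(head, &old, work));

	return 0;
}
```

A failed compare-and-swap simply retries with the freshly observed head, so concurrent pushers never block each other and never need a spinlock.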

Suggested-by: Jens Axboe <axboe@kernel.dk>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/trace/bpf_trace.c | 52 +++++++++++++++++++++++++++-------------
 1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index db7b6194e38a..b7bb11c0e5b0 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -713,21 +713,26 @@ static void do_send_signal_work(struct callback_head *twork)
 	kfree(twcb);
 }
 
-static void add_send_signal_task_work(u32 sig, enum pid_type type)
+static int add_send_signal_task_work(u32 sig, enum pid_type type)
 {
 	struct send_signal_work_cb *twcb;
+	int ret;
 
 	twcb = kzalloc(sizeof(*twcb), GFP_ATOMIC);
 	if (!twcb)
-		return;
+		return -ENOMEM;
 
 	twcb->sig = sig;
 	twcb->type = type;
 	init_task_work(&twcb->twork, do_send_signal_work);
-	if (task_work_add(current, &twcb->twork, true))
+	ret = task_work_add(current, &twcb->twork, true);
+	if (ret)
 		kfree(twcb);
+
+	return ret;
 }
 
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 struct send_signal_irq_work {
 	struct irq_work irq_work;
 	struct task_struct *task;
@@ -748,10 +753,29 @@ static void do_bpf_send_signal(struct irq_work *entry)
 	put_task_struct(work->task);
 }
 
-static int bpf_send_signal_common(u32 sig, enum pid_type type)
+static int add_send_signal_irq_work(u32 sig, enum pid_type type)
 {
 	struct send_signal_irq_work *work = NULL;
 
+	work = this_cpu_ptr(&send_signal_work);
+	if (atomic_read(&work->irq_work.flags) & IRQ_WORK_BUSY)
+		return -EBUSY;
+
+	/* Add the current task, which is the target of sending signal,
+	 * to the irq_work. The current task may change when queued
+	 * irq works get executed.
+	 */
+	work->task = get_task_struct(current);
+	work->sig = sig;
+	work->type = type;
+	irq_work_queue(&work->irq_work);
+
+	return 0;
+}
+#endif
+
+static int bpf_send_signal_common(u32 sig, enum pid_type type)
+{
 	/* Similar to bpf_probe_write_user, task needs to be
 	 * in a sound condition and kernel memory access be
 	 * permitted in order to send signal to the current
@@ -771,19 +795,11 @@ static int bpf_send_signal_common(u32 sig, enum pid_type type)
 		if (unlikely(!valid_signal(sig)))
 			return -EINVAL;
 
-		work = this_cpu_ptr(&send_signal_work);
-		if (atomic_read(&work->irq_work.flags) & IRQ_WORK_BUSY)
-			return -EBUSY;
-
-		/* Add the current task, which is the target of sending signal,
-		 * to the irq_work. The current task may change when queued
-		 * irq works get executed.
-		 */
-		work->task = get_task_struct(current);
-		work->sig = sig;
-		work->type = type;
-		irq_work_queue(&work->irq_work);
-		return 0;
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+		return add_send_signal_irq_work(sig, type);
+#else
+		return add_send_signal_task_work(sig, type);
+#endif
 	}
 
 	return group_send_sig_info(sig, SEND_SIG_PRIV, current, type);
@@ -1673,6 +1689,7 @@ int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
 	return err;
 }
 
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
 static int __init send_signal_irq_work_init(void)
 {
 	int cpu;
@@ -1686,6 +1703,7 @@ static int __init send_signal_irq_work_init(void)
 }
 
 subsys_initcall(send_signal_irq_work_init);
+#endif
 
 #ifdef CONFIG_MODULES
 static int bpf_event_notify(struct notifier_block *nb, unsigned long op,
-- 
2.17.1



* Re: [PATCH bpf 1/2] bpf: fix bpf_send_signal()/bpf_send_signal_thread() helper in NMI mode
  2020-03-03 23:15 ` [PATCH bpf 1/2] " Yonghong Song
@ 2020-03-04  1:08   ` Alexei Starovoitov
  2020-03-04  3:03     ` Yonghong Song
  0 siblings, 1 reply; 5+ messages in thread
From: Alexei Starovoitov @ 2020-03-04  1:08 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, kernel-team, Rik van Riel

On Tue, Mar 03, 2020 at 03:15:54PM -0800, Yonghong Song wrote:
> When experimenting with bpf_send_signal() helper in our production environment,
> we experienced a deadlock in NMI mode:
>    #0 [fffffe000046be58] crash_nmi_callback at ffffffff8103f48b
>    #1 [fffffe000046be60] nmi_handle at ffffffff8101feed
>    #2 [fffffe000046beb8] default_do_nmi at ffffffff8102027e
>    #3 [fffffe000046bed8] do_nmi at ffffffff81020434
>    #4 [fffffe000046bef0] end_repeat_nmi at ffffffff81c01093
>       [exception RIP: queued_spin_lock_slowpath+68]
>       RIP: ffffffff8110be24  RSP: ffffc9002219f770  RFLAGS: 00000002
>       RAX: 0000000000000101  RBX: 0000000000000046  RCX: 000000000000002a
>       RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff88871c96c044
>       RBP: 0000000000000000   R8: ffff88870f11f040   R9: 0000000000000000
>       R10: 0000000000000008  R11: 00000000acd93e4d  R12: ffff88871c96c044
>       R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000001
>       ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>   --- <NMI exception stack> ---
>    #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
>    #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
>    #7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
>    #8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
>    #9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
>   #10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
>   #11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
>   #12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
>   #13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
>   #14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
>   #15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
>   #16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
>   #17 [ffffc9002219fc90] schedule at ffffffff81a3e219
>   #18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
>   #19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
>   #20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
>   #21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
>   #22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
>   #23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068
> 
> Basically, while task->pi_lock is held, an NMI fires and a bpf program
> executes. The bpf program calls the bpf_send_signal() helper, which
> ends up calling group_send_sig_info() from an irq_work, which tries
> to grab task->pi_lock again and deadlocks.
> 
> To break the deadlock, group_send_sig_info() call should be delayed
> until it is safe to do.
> 
> This patch registers a task_work callback inside the irq_work so
> group_send_sig_info() in the task_work can be called later safely.
> 
> This patch also fixed a potential issue where the "current"
> task in nmi context is gone when the actual irq_work is triggered.
> Hold a reference to the task and drop the reference inside
> the irq_work to ensure the task is not gone.
> 
> Fixes: 8482941f0906 ("bpf: Add bpf_send_signal_thread() helper")
> Fixes: 8b401f9ed244 ("bpf: implement bpf_send_signal() helper")
> Cc: Rik van Riel <riel@surriel.com>
> Suggested-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Yonghong Song <yhs@fb.com>

I don't think that fixes it.
The stack trace is not in NMI context.
It's a sw event, so 'if (in_nmi())' is false.
try_to_wake_up() is safe to do from irq_work for both current and other tasks.
I don't think task_work() is necessary here.
It's a very similar issue that was addressed by
commit eac9153f2b58 ("bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()")
Imo the same approach will work here.
Please craft a reproducer first though.
I think the one Song did for the above commit can be adapted for this case too.


* Re: [PATCH bpf 1/2] bpf: fix bpf_send_signal()/bpf_send_signal_thread() helper in NMI mode
  2020-03-04  1:08   ` Alexei Starovoitov
@ 2020-03-04  3:03     ` Yonghong Song
  0 siblings, 0 replies; 5+ messages in thread
From: Yonghong Song @ 2020-03-04  3:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, kernel-team, Rik van Riel



On 3/3/20 5:08 PM, Alexei Starovoitov wrote:
> On Tue, Mar 03, 2020 at 03:15:54PM -0800, Yonghong Song wrote:
>> When experimenting with bpf_send_signal() helper in our production environment,
>> we experienced a deadlock in NMI mode:
>>     #0 [fffffe000046be58] crash_nmi_callback at ffffffff8103f48b
>>     #1 [fffffe000046be60] nmi_handle at ffffffff8101feed
>>     #2 [fffffe000046beb8] default_do_nmi at ffffffff8102027e
>>     #3 [fffffe000046bed8] do_nmi at ffffffff81020434
>>     #4 [fffffe000046bef0] end_repeat_nmi at ffffffff81c01093
>>        [exception RIP: queued_spin_lock_slowpath+68]
>>        RIP: ffffffff8110be24  RSP: ffffc9002219f770  RFLAGS: 00000002
>>        RAX: 0000000000000101  RBX: 0000000000000046  RCX: 000000000000002a
>>        RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff88871c96c044
>>        RBP: 0000000000000000   R8: ffff88870f11f040   R9: 0000000000000000
>>        R10: 0000000000000008  R11: 00000000acd93e4d  R12: ffff88871c96c044
>>        R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000001
>>        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>    --- <NMI exception stack> ---
>>     #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
>>     #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
>>     #7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
>>     #8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
>>     #9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
>>    #10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
>>    #11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
>>    #12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
>>    #13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
>>    #14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
>>    #15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
>>    #16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
>>    #17 [ffffc9002219fc90] schedule at ffffffff81a3e219
>>    #18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
>>    #19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
>>    #20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
>>    #21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
>>    #22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
>>    #23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068
>>
>> Basically, while task->pi_lock is held, an NMI fires and a bpf program
>> executes. The bpf program calls the bpf_send_signal() helper, which
>> ends up calling group_send_sig_info() from an irq_work, which tries
>> to grab task->pi_lock again and deadlocks.
>>
>> To break the deadlock, group_send_sig_info() call should be delayed
>> until it is safe to do.
>>
>> This patch registers a task_work callback inside the irq_work so
>> group_send_sig_info() in the task_work can be called later safely.
>>
>> This patch also fixed a potential issue where the "current"
>> task in nmi context is gone when the actual irq_work is triggered.
>> Hold a reference to the task and drop the reference inside
>> the irq_work to ensure the task is not gone.
>>
>> Fixes: 8482941f0906 ("bpf: Add bpf_send_signal_thread() helper")
>> Fixes: 8b401f9ed244 ("bpf: implement bpf_send_signal() helper")
>> Cc: Rik van Riel <riel@surriel.com>
>> Suggested-by: Jens Axboe <axboe@kernel.dk>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
> 
> I don't think that fixes it.
> The stack trace is not doing nmi.
> It's a sw event and 'if (in_nmi())' is false.
> try_to_wake_up() is safe to do from irq_work for both current and other tasks.
> I don't think task_work() is necessary here.

I thought the nmi was there but gone by the time the irq_work took over ...
But clearly I am wrong; it looks like a perf_sw_event...

> It's a very similar issue that was addressed by
> commit eac9153f2b58 ("bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()")
> Imo the same approach will work here.
> Please craft a reproducer first though.

Thanks for the tip. I was not aware of this. Will try to reproduce and
then fix it properly.

> I think the one Song did for the above commit may be adopted for this case too.
> 

