linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
@ 2020-04-08 16:46 Jiri Olsa
  2020-04-09  9:02 ` Naveen N. Rao
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Jiri Olsa @ 2020-04-08 16:46 UTC (permalink / raw)
  To: Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Masami Hiramatsu, Peter Zijlstra
  Cc: lkml, bibo,mao, Ziqian SUN (Zamir)

hi,
Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
My test was also able to trigger lockdep output:

 ============================================
 WARNING: possible recursive locking detected
 5.6.0-rc6+ #6 Not tainted
 --------------------------------------------
 sched-messaging/2767 is trying to acquire lock:
 ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0

 but task is already holding lock:
 ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(kretprobe_table_locks[i].lock));
   lock(&(kretprobe_table_locks[i].lock));

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by sched-messaging/2767:
  #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

 stack backtrace:
 CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
 Call Trace:
  dump_stack+0x96/0xe0
  __lock_acquire.cold.57+0x173/0x2b7
  ? native_queued_spin_lock_slowpath+0x42b/0x9e0
  ? lockdep_hardirqs_on+0x590/0x590
  ? __lock_acquire+0xf63/0x4030
  lock_acquire+0x15a/0x3d0
  ? kretprobe_hash_lock+0x52/0xa0
  _raw_spin_lock_irqsave+0x36/0x70
  ? kretprobe_hash_lock+0x52/0xa0
  kretprobe_hash_lock+0x52/0xa0
  trampoline_handler+0xf8/0x940
  ? kprobe_fault_handler+0x380/0x380
  ? find_held_lock+0x3a/0x1c0
  kretprobe_trampoline+0x25/0x50
  ? lock_acquired+0x392/0xbc0
  ? _raw_spin_lock_irqsave+0x50/0x70
  ? __get_valid_kprobe+0x1f0/0x1f0
  ? _raw_spin_unlock_irqrestore+0x3b/0x40
  ? finish_task_switch+0x4b9/0x6d0
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70

The code within the kretprobe handler checks for probe reentrancy,
so we won't trigger any _raw_spin_lock_irqsave probe in there.

The problem is in outside kprobe_flush_task, where we call:

  kprobe_flush_task
    kretprobe_table_lock
      raw_spin_lock_irqsave
        _raw_spin_lock_irqsave

where _raw_spin_lock_irqsave triggers the kretprobe and installs
kretprobe_trampoline handler on _raw_spin_lock_irqsave return.

The kretprobe_trampoline handler is then executed with already
locked kretprobe_table_locks, and first thing it does is to
lock kretprobe_table_locks ;-) the whole lockup path like:

  kprobe_flush_task
    kretprobe_table_lock
      raw_spin_lock_irqsave
        _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed

        ---> kretprobe_table_locks locked

        kretprobe_trampoline
          trampoline_handler
            kretprobe_hash_lock(current, &head, &flags);  <--- deadlock

The change below sets current_kprobe in kprobe_flush_task, so the probe
recursion protection check is hit and the probe is never set. It seems
to fix the deadlock.

I'm not sure this is the best fix, any ideas are welcome ;-)

thanks,
jirka


---
 kernel/kprobes.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2625c241ac00..b13247cae752 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1236,6 +1236,10 @@ __releases(hlist_lock)
 }
 NOKPROBE_SYMBOL(kretprobe_table_unlock);
 
+static struct kprobe kretprobe_dummy = {
+        .addr = (void *)kretprobe_trampoline,
+};
+
 /*
  * This function is called from finish_task_switch when task tk becomes dead,
  * so that we can recycle any function-return probe instances associated
@@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
 	INIT_HLIST_HEAD(&empty_rp);
 	hash = hash_ptr(tk, KPROBE_HASH_BITS);
 	head = &kretprobe_inst_table[hash];
+	__this_cpu_write(current_kprobe, &kretprobe_dummy);
 	kretprobe_table_lock(hash, &flags);
 	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
 		if (ri->task == tk)
 			recycle_rp_inst(ri, &empty_rp);
 	}
 	kretprobe_table_unlock(hash, &flags);
+	__this_cpu_write(current_kprobe, NULL);
 	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
 		hlist_del(&ri->hlist);
 		kfree(ri);
-- 
2.25.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-08 16:46 [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task Jiri Olsa
@ 2020-04-09  9:02 ` Naveen N. Rao
  2020-04-09 18:43   ` Jiri Olsa
  2020-04-09 12:38 ` Masami Hiramatsu
  2020-04-09 14:41 ` Masami Hiramatsu
  2 siblings, 1 reply; 24+ messages in thread
From: Naveen N. Rao @ 2020-04-09  9:02 UTC (permalink / raw)
  To: Anil S Keshavamurthy, David S. Miller, Jiri Olsa,
	Masami Hiramatsu, Peter Zijlstra
  Cc: bibo,mao, lkml, Ziqian SUN (Zamir)

Hi Jiri,

Jiri Olsa wrote:
> hi,
> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
> My test was also able to trigger lockdep output:
> 
>  ============================================
>  WARNING: possible recursive locking detected
>  5.6.0-rc6+ #6 Not tainted
>  --------------------------------------------
>  sched-messaging/2767 is trying to acquire lock:
>  ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
> 
>  but task is already holding lock:
>  ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(&(kretprobe_table_locks[i].lock));
>    lock(&(kretprobe_table_locks[i].lock));
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 
>  1 lock held by sched-messaging/2767:
>   #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> 
>  stack backtrace:
>  CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
>  Call Trace:
>   dump_stack+0x96/0xe0
>   __lock_acquire.cold.57+0x173/0x2b7
>   ? native_queued_spin_lock_slowpath+0x42b/0x9e0
>   ? lockdep_hardirqs_on+0x590/0x590
>   ? __lock_acquire+0xf63/0x4030
>   lock_acquire+0x15a/0x3d0
>   ? kretprobe_hash_lock+0x52/0xa0
>   _raw_spin_lock_irqsave+0x36/0x70
>   ? kretprobe_hash_lock+0x52/0xa0
>   kretprobe_hash_lock+0x52/0xa0
>   trampoline_handler+0xf8/0x940
>   ? kprobe_fault_handler+0x380/0x380
>   ? find_held_lock+0x3a/0x1c0
>   kretprobe_trampoline+0x25/0x50
>   ? lock_acquired+0x392/0xbc0
>   ? _raw_spin_lock_irqsave+0x50/0x70
>   ? __get_valid_kprobe+0x1f0/0x1f0
>   ? _raw_spin_unlock_irqrestore+0x3b/0x40
>   ? finish_task_switch+0x4b9/0x6d0
>   ? __switch_to_asm+0x34/0x70
>   ? __switch_to_asm+0x40/0x70
> 
> The code within the kretprobe handler checks for probe reentrancy,
> so we won't trigger any _raw_spin_lock_irqsave probe in there.
> 
> The problem is in outside kprobe_flush_task, where we call:
> 
>   kprobe_flush_task
>     kretprobe_table_lock
>       raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave
> 
> where _raw_spin_lock_irqsave triggers the kretprobe and installs
> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> 
> The kretprobe_trampoline handler is then executed with already
> locked kretprobe_table_locks, and first thing it does is to
> lock kretprobe_table_locks ;-) the whole lockup path like:
> 
>   kprobe_flush_task
>     kretprobe_table_lock
>       raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> 
>         ---> kretprobe_table_locks locked
> 
>         kretprobe_trampoline
>           trampoline_handler
>             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> 
> The change below sets current_kprobe in kprobe_flush_task, so the probe
> recursion protection check is hit and the probe is never set. It seems
> to fix the deadlock.

Good analysis!

> 
> I'm not sure this is the best fix, any ideas are welcome ;-)

I think this is a good way to address this issue.

> 
> thanks,
> jirka
> 
> 
> ---
>  kernel/kprobes.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 2625c241ac00..b13247cae752 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
>  }
>  NOKPROBE_SYMBOL(kretprobe_table_unlock);
>  
> +static struct kprobe kretprobe_dummy = {
> +        .addr = (void *)kretprobe_trampoline,
> +};
> +

Perhaps a more meaningful name, say, kprobe_flush_task_protect ?

>  /*
>   * This function is called from finish_task_switch when task tk becomes dead,
>   * so that we can recycle any function-return probe instances associated
> @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
>  	INIT_HLIST_HEAD(&empty_rp);
>  	hash = hash_ptr(tk, KPROBE_HASH_BITS);
>  	head = &kretprobe_inst_table[hash];
> +	__this_cpu_write(current_kprobe, &kretprobe_dummy);
>  	kretprobe_table_lock(hash, &flags);
>  	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
>  		if (ri->task == tk)
>  			recycle_rp_inst(ri, &empty_rp);
>  	}
>  	kretprobe_table_unlock(hash, &flags);
> +	__this_cpu_write(current_kprobe, NULL);

I would move this to the end of the function to also cover the below 
code. kprobe_flush_task() is marked NOKPROBE, so it is good to prevent 
probe handling in the below code too.

>  	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
>  		hlist_del(&ri->hlist);
>  		kfree(ri);


- Naveen


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-08 16:46 [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task Jiri Olsa
  2020-04-09  9:02 ` Naveen N. Rao
@ 2020-04-09 12:38 ` Masami Hiramatsu
  2020-04-09 12:52   ` Jiri Olsa
  2020-04-09 13:16   ` Naveen N. Rao
  2020-04-09 14:41 ` Masami Hiramatsu
  2 siblings, 2 replies; 24+ messages in thread
From: Masami Hiramatsu @ 2020-04-09 12:38 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir)

Hi Jiri,

On Wed,  8 Apr 2020 18:46:41 +0200
Jiri Olsa <jolsa@kernel.org> wrote:

> hi,
> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.

Hmm, kprobe is lockless, but kretprobe involves spinlock.
Thus, eventually, I will blacklist the _raw_spin_lock_irqsave()
for kretprobe.
If you need to trace spinlock return, please consider to putting
kprobe at "ret" instruction.

> My test was also able to trigger lockdep output:
> 
>  ============================================
>  WARNING: possible recursive locking detected
>  5.6.0-rc6+ #6 Not tainted
>  --------------------------------------------
>  sched-messaging/2767 is trying to acquire lock:
>  ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
> 
>  but task is already holding lock:
>  ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(&(kretprobe_table_locks[i].lock));
>    lock(&(kretprobe_table_locks[i].lock));
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 
>  1 lock held by sched-messaging/2767:
>   #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> 
>  stack backtrace:
>  CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
>  Call Trace:
>   dump_stack+0x96/0xe0
>   __lock_acquire.cold.57+0x173/0x2b7
>   ? native_queued_spin_lock_slowpath+0x42b/0x9e0
>   ? lockdep_hardirqs_on+0x590/0x590
>   ? __lock_acquire+0xf63/0x4030
>   lock_acquire+0x15a/0x3d0
>   ? kretprobe_hash_lock+0x52/0xa0
>   _raw_spin_lock_irqsave+0x36/0x70
>   ? kretprobe_hash_lock+0x52/0xa0
>   kretprobe_hash_lock+0x52/0xa0
>   trampoline_handler+0xf8/0x940
>   ? kprobe_fault_handler+0x380/0x380
>   ? find_held_lock+0x3a/0x1c0
>   kretprobe_trampoline+0x25/0x50
>   ? lock_acquired+0x392/0xbc0
>   ? _raw_spin_lock_irqsave+0x50/0x70
>   ? __get_valid_kprobe+0x1f0/0x1f0
>   ? _raw_spin_unlock_irqrestore+0x3b/0x40
>   ? finish_task_switch+0x4b9/0x6d0
>   ? __switch_to_asm+0x34/0x70
>   ? __switch_to_asm+0x40/0x70
> 
> The code within the kretprobe handler checks for probe reentrancy,
> so we won't trigger any _raw_spin_lock_irqsave probe in there.
> 
> The problem is in outside kprobe_flush_task, where we call:
> 
>   kprobe_flush_task
>     kretprobe_table_lock
>       raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave
> 
> where _raw_spin_lock_irqsave triggers the kretprobe and installs
> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.

Hmm, OK. In this case, I think we should mark this process is
going to die and never try to kretprobe on it.

> 
> The kretprobe_trampoline handler is then executed with already
> locked kretprobe_table_locks, and first thing it does is to
> lock kretprobe_table_locks ;-) the whole lockup path like:
> 
>   kprobe_flush_task
>     kretprobe_table_lock
>       raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> 
>         ---> kretprobe_table_locks locked
> 
>         kretprobe_trampoline
>           trampoline_handler
>             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> 
> The change below sets current_kprobe in kprobe_flush_task, so the probe
> recursion protection check is hit and the probe is never set. It seems
> to fix the deadlock.
> 
> I'm not sure this is the best fix, any ideas are welcome ;-)

Hmm, this is a bit tricky to fix this issue. Of course, temporary disable
kprobes (and kretprobe) on an area by filling current_kprobe might
be a good idea, but it also involves other kprobes.

How about let kretprobe skip the task which state == TASK_DEAD ?

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 627fc1b7011a..3f207d2e0afb 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1874,9 +1874,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 	 * To avoid deadlocks, prohibit return probing in NMI contexts,
 	 * just skip the probe and increase the (inexact) 'nmissed'
 	 * statistical counter, so that the user is informed that
-	 * something happened:
+	 * something happened.
+	 * Also, if the current task is dead, we will already in the process
+	 * to reclaim kretprobe instances from hash list. To avoid memory
+	 * leak, skip to run the kretprobe on such task.
 	 */
-	if (unlikely(in_nmi())) {
+	if (unlikely(in_nmi()) || current->state == TASK_DEAD) {
 		rp->nmissed++;
 		return 0;
 	}

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-09 12:38 ` Masami Hiramatsu
@ 2020-04-09 12:52   ` Jiri Olsa
  2020-04-09 14:16     ` Masami Hiramatsu
  2020-04-09 13:16   ` Naveen N. Rao
  1 sibling, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2020-04-09 12:52 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir)

On Thu, Apr 09, 2020 at 09:38:06PM +0900, Masami Hiramatsu wrote:
> Hi Jiri,
> 
> On Wed,  8 Apr 2020 18:46:41 +0200
> Jiri Olsa <jolsa@kernel.org> wrote:
> 
> > hi,
> > Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
> 
> Hmm, kprobe is lockless, but kretprobe involves spinlock.
> Thus, eventually, I will blacklist the _raw_spin_lock_irqsave()
> for kretprobe.

I thought of blacklisting, but we were using that kretprobe
in a bcc script.. it's not overloaded by using bpf trampolines,
but still might be useful

SNIP

> > The code within the kretprobe handler checks for probe reentrancy,
> > so we won't trigger any _raw_spin_lock_irqsave probe in there.
> > 
> > The problem is in outside kprobe_flush_task, where we call:
> > 
> >   kprobe_flush_task
> >     kretprobe_table_lock
> >       raw_spin_lock_irqsave
> >         _raw_spin_lock_irqsave
> > 
> > where _raw_spin_lock_irqsave triggers the kretprobe and installs
> > kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> 
> Hmm, OK. In this case, I think we should mark this process is
> going to die and never try to kretprobe on it.
> 
> > 
> > The kretprobe_trampoline handler is then executed with already
> > locked kretprobe_table_locks, and first thing it does is to
> > lock kretprobe_table_locks ;-) the whole lockup path like:
> > 
> >   kprobe_flush_task
> >     kretprobe_table_lock
> >       raw_spin_lock_irqsave
> >         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> > 
> >         ---> kretprobe_table_locks locked
> > 
> >         kretprobe_trampoline
> >           trampoline_handler
> >             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> > 
> > The change below sets current_kprobe in kprobe_flush_task, so the probe
> > recursion protection check is hit and the probe is never set. It seems
> > to fix the deadlock.
> > 
> > I'm not sure this is the best fix, any ideas are welcome ;-)
> 
> Hmm, this is a bit tricky to fix this issue. Of course, temporary disable
> kprobes (and kretprobe) on an area by filling current_kprobe might
> be a good idea, but it also involves other kprobes.
> 
> How about let kretprobe skip the task which state == TASK_DEAD ?

hum, isn't that considerable amount of paths (with state == TASK_DEAD)
that we would kill kprobes for? someone might want to trace it

thanks,
jirka


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-09 12:38 ` Masami Hiramatsu
  2020-04-09 12:52   ` Jiri Olsa
@ 2020-04-09 13:16   ` Naveen N. Rao
  2020-04-09 14:26     ` Masami Hiramatsu
  1 sibling, 1 reply; 24+ messages in thread
From: Naveen N. Rao @ 2020-04-09 13:16 UTC (permalink / raw)
  To: Jiri Olsa, Masami Hiramatsu
  Cc: Anil S Keshavamurthy, bibo,mao, David S. Miller, lkml,
	Peter Zijlstra, Ziqian SUN (Zamir)

Hi Masami,

Masami Hiramatsu wrote:
> Hi Jiri,
> 
> On Wed,  8 Apr 2020 18:46:41 +0200
> Jiri Olsa <jolsa@kernel.org> wrote:
> 
>> hi,
>> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
> 
> Hmm, kprobe is lockless, but kretprobe involves spinlock.
> Thus, eventually, I will blacklist the _raw_spin_lock_irqsave()
> for kretprobe.

As far as I can see, this is the only place where probing 
_raw_spin_lock_irqsave() is an issue.  Should we blacklist all users for 
this case alone?

> If you need to trace spinlock return, please consider to putting
> kprobe at "ret" instruction.
> 
>> My test was also able to trigger lockdep output:
>> 
>>  ============================================
>>  WARNING: possible recursive locking detected
>>  5.6.0-rc6+ #6 Not tainted
>>  --------------------------------------------
>>  sched-messaging/2767 is trying to acquire lock:
>>  ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
>> 
>>  but task is already holding lock:
>>  ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>> 
>>  other info that might help us debug this:
>>   Possible unsafe locking scenario:
>> 
>>         CPU0
>>         ----
>>    lock(&(kretprobe_table_locks[i].lock));
>>    lock(&(kretprobe_table_locks[i].lock));
>> 
>>   *** DEADLOCK ***
>> 
>>   May be due to missing lock nesting notation
>> 
>>  1 lock held by sched-messaging/2767:
>>   #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>> 
>>  stack backtrace:
>>  CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
>>  Call Trace:
>>   dump_stack+0x96/0xe0
>>   __lock_acquire.cold.57+0x173/0x2b7
>>   ? native_queued_spin_lock_slowpath+0x42b/0x9e0
>>   ? lockdep_hardirqs_on+0x590/0x590
>>   ? __lock_acquire+0xf63/0x4030
>>   lock_acquire+0x15a/0x3d0
>>   ? kretprobe_hash_lock+0x52/0xa0
>>   _raw_spin_lock_irqsave+0x36/0x70
>>   ? kretprobe_hash_lock+0x52/0xa0
>>   kretprobe_hash_lock+0x52/0xa0
>>   trampoline_handler+0xf8/0x940
>>   ? kprobe_fault_handler+0x380/0x380
>>   ? find_held_lock+0x3a/0x1c0
>>   kretprobe_trampoline+0x25/0x50
>>   ? lock_acquired+0x392/0xbc0
>>   ? _raw_spin_lock_irqsave+0x50/0x70
>>   ? __get_valid_kprobe+0x1f0/0x1f0
>>   ? _raw_spin_unlock_irqrestore+0x3b/0x40
>>   ? finish_task_switch+0x4b9/0x6d0
>>   ? __switch_to_asm+0x34/0x70
>>   ? __switch_to_asm+0x40/0x70
>> 
>> The code within the kretprobe handler checks for probe reentrancy,
>> so we won't trigger any _raw_spin_lock_irqsave probe in there.
>> 
>> The problem is in outside kprobe_flush_task, where we call:
>> 
>>   kprobe_flush_task
>>     kretprobe_table_lock
>>       raw_spin_lock_irqsave
>>         _raw_spin_lock_irqsave
>> 
>> where _raw_spin_lock_irqsave triggers the kretprobe and installs
>> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> 
> Hmm, OK. In this case, I think we should mark this process is
> going to die and never try to kretprobe on it.
> 
>> 
>> The kretprobe_trampoline handler is then executed with already
>> locked kretprobe_table_locks, and first thing it does is to
>> lock kretprobe_table_locks ;-) the whole lockup path like:
>> 
>>   kprobe_flush_task
>>     kretprobe_table_lock
>>       raw_spin_lock_irqsave
>>         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
>> 
>>         ---> kretprobe_table_locks locked
>> 
>>         kretprobe_trampoline
>>           trampoline_handler
>>             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
>> 
>> The change below sets current_kprobe in kprobe_flush_task, so the probe
>> recursion protection check is hit and the probe is never set. It seems
>> to fix the deadlock.
>> 
>> I'm not sure this is the best fix, any ideas are welcome ;-)
> 
> Hmm, this is a bit tricky to fix this issue. Of course, temporary disable
> kprobes (and kretprobe) on an area by filling current_kprobe might
> be a good idea, but it also involves other kprobes.

Not sure how you mean that. Jiri's RFC patch would be disabling 
k[ret]probes within kprobe_flush_task(), which is only ever invoked from 
finish_task_switch(). I only see calls to spin locks and kfree() from 
here. Besides, kprobe_flush_task() itself is NOKPROBE, so we would 
ideally want to not trace/probe other functions it calls.

> 
> How about let kretprobe skip the task which state == TASK_DEAD ?
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 627fc1b7011a..3f207d2e0afb 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1874,9 +1874,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  	 * To avoid deadlocks, prohibit return probing in NMI contexts,
>  	 * just skip the probe and increase the (inexact) 'nmissed'
>  	 * statistical counter, so that the user is informed that
> -	 * something happened:
> +	 * something happened.
> +	 * Also, if the current task is dead, we will already in the process
> +	 * to reclaim kretprobe instances from hash list. To avoid memory
> +	 * leak, skip to run the kretprobe on such task.
>  	 */
> -	if (unlikely(in_nmi())) {
> +	if (unlikely(in_nmi()) || current->state == TASK_DEAD) {

I'm wondering if this actually works. kprobe_flush_task() seems to be 
called from finish_task_switch(), after the task switch is complete. So, 
current task won't actually be dead here.


- Naveen


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-09 12:52   ` Jiri Olsa
@ 2020-04-09 14:16     ` Masami Hiramatsu
  0 siblings, 0 replies; 24+ messages in thread
From: Masami Hiramatsu @ 2020-04-09 14:16 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir)

On Thu, 9 Apr 2020 14:52:13 +0200
Jiri Olsa <jolsa@redhat.com> wrote:

> On Thu, Apr 09, 2020 at 09:38:06PM +0900, Masami Hiramatsu wrote:
> > Hi Jiri,
> > 
> > On Wed,  8 Apr 2020 18:46:41 +0200
> > Jiri Olsa <jolsa@kernel.org> wrote:
> > 
> > > hi,
> > > Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
> > 
> > Hmm, kprobe is lockless, but kretprobe involves spinlock.
> > Thus, eventually, I will blacklist the _raw_spin_lock_irqsave()
> > for kretprobe.
> 
> I thought of blacklisting, but we were using that kretprobe
> in a bcc script.. it's not overloaded by using bpf trampolines,
> but still might be useful
> 
> SNIP
> 
> > > The code within the kretprobe handler checks for probe reentrancy,
> > > so we won't trigger any _raw_spin_lock_irqsave probe in there.
> > > 
> > > The problem is in outside kprobe_flush_task, where we call:
> > > 
> > >   kprobe_flush_task
> > >     kretprobe_table_lock
> > >       raw_spin_lock_irqsave
> > >         _raw_spin_lock_irqsave
> > > 
> > > where _raw_spin_lock_irqsave triggers the kretprobe and installs
> > > kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> > 
> > Hmm, OK. In this case, I think we should mark this process is
> > going to die and never try to kretprobe on it.
> > 
> > > 
> > > The kretprobe_trampoline handler is then executed with already
> > > locked kretprobe_table_locks, and first thing it does is to
> > > lock kretprobe_table_locks ;-) the whole lockup path like:
> > > 
> > >   kprobe_flush_task
> > >     kretprobe_table_lock
> > >       raw_spin_lock_irqsave
> > >         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> > > 
> > >         ---> kretprobe_table_locks locked
> > > 
> > >         kretprobe_trampoline
> > >           trampoline_handler
> > >             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> > > 
> > > The change below sets current_kprobe in kprobe_flush_task, so the probe
> > > recursion protection check is hit and the probe is never set. It seems
> > > to fix the deadlock.
> > > 
> > > I'm not sure this is the best fix, any ideas are welcome ;-)
> > 
> > Hmm, this is a bit tricky to fix this issue. Of course, temporary disable
> > kprobes (and kretprobe) on an area by filling current_kprobe might
> > be a good idea, but it also involves other kprobes.
> > 
> > How about let kretprobe skip the task which state == TASK_DEAD ?
> 
> hum, isn't that considerable amount of paths (with state == TASK_DEAD)
> that we would kill kprobes for? someone might want to trace it

OK, and I found that even after calling kprobe_flush_task(), it may be
work because the task will not be switched. kretprobe instance will be
reclaimed.

Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-09 13:16   ` Naveen N. Rao
@ 2020-04-09 14:26     ` Masami Hiramatsu
  0 siblings, 0 replies; 24+ messages in thread
From: Masami Hiramatsu @ 2020-04-09 14:26 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Jiri Olsa, Anil S Keshavamurthy, bibo,mao, David S. Miller, lkml,
	Peter Zijlstra, Ziqian SUN (Zamir)

Hi,

On Thu, 09 Apr 2020 18:46:47 +0530
"Naveen N. Rao" <naveen.n.rao@linux.ibm.com> wrote:

> Hi Masami,
> 
> Masami Hiramatsu wrote:
> > Hi Jiri,
> > 
> > On Wed,  8 Apr 2020 18:46:41 +0200
> > Jiri Olsa <jolsa@kernel.org> wrote:
> > 
> >> hi,
> >> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
> > 
> > Hmm, kprobe is lockless, but kretprobe involves spinlock.
> > Thus, eventually, I will blacklist the _raw_spin_lock_irqsave()
> > for kretprobe.
> 
> As far as I can see, this is the only place where probing 
> _raw_spin_lock_irqsave() is an issue.  Should we blacklist all users for 
> this case alone?

Hrm, right. kretprobe is different from kprobe's case.

> 
> > If you need to trace spinlock return, please consider to putting
> > kprobe at "ret" instruction.
> > 
> >> My test was also able to trigger lockdep output:
> >> 
> >>  ============================================
> >>  WARNING: possible recursive locking detected
> >>  5.6.0-rc6+ #6 Not tainted
> >>  --------------------------------------------
> >>  sched-messaging/2767 is trying to acquire lock:
> >>  ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
> >> 
> >>  but task is already holding lock:
> >>  ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> >> 
> >>  other info that might help us debug this:
> >>   Possible unsafe locking scenario:
> >> 
> >>         CPU0
> >>         ----
> >>    lock(&(kretprobe_table_locks[i].lock));
> >>    lock(&(kretprobe_table_locks[i].lock));
> >> 
> >>   *** DEADLOCK ***
> >> 
> >>   May be due to missing lock nesting notation
> >> 
> >>  1 lock held by sched-messaging/2767:
> >>   #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> >> 
> >>  stack backtrace:
> >>  CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
> >>  Call Trace:
> >>   dump_stack+0x96/0xe0
> >>   __lock_acquire.cold.57+0x173/0x2b7
> >>   ? native_queued_spin_lock_slowpath+0x42b/0x9e0
> >>   ? lockdep_hardirqs_on+0x590/0x590
> >>   ? __lock_acquire+0xf63/0x4030
> >>   lock_acquire+0x15a/0x3d0
> >>   ? kretprobe_hash_lock+0x52/0xa0
> >>   _raw_spin_lock_irqsave+0x36/0x70
> >>   ? kretprobe_hash_lock+0x52/0xa0
> >>   kretprobe_hash_lock+0x52/0xa0
> >>   trampoline_handler+0xf8/0x940
> >>   ? kprobe_fault_handler+0x380/0x380
> >>   ? find_held_lock+0x3a/0x1c0
> >>   kretprobe_trampoline+0x25/0x50
> >>   ? lock_acquired+0x392/0xbc0
> >>   ? _raw_spin_lock_irqsave+0x50/0x70
> >>   ? __get_valid_kprobe+0x1f0/0x1f0
> >>   ? _raw_spin_unlock_irqrestore+0x3b/0x40
> >>   ? finish_task_switch+0x4b9/0x6d0
> >>   ? __switch_to_asm+0x34/0x70
> >>   ? __switch_to_asm+0x40/0x70
> >> 
> >> The code within the kretprobe handler checks for probe reentrancy,
> >> so we won't trigger any _raw_spin_lock_irqsave probe in there.
> >> 
> >> The problem is in outside kprobe_flush_task, where we call:
> >> 
> >>   kprobe_flush_task
> >>     kretprobe_table_lock
> >>       raw_spin_lock_irqsave
> >>         _raw_spin_lock_irqsave
> >> 
> >> where _raw_spin_lock_irqsave triggers the kretprobe and installs
> >> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> > 
> > Hmm, OK. In this case, I think we should mark this process is
> > going to die and never try to kretprobe on it.
> > 
> >> 
> >> The kretprobe_trampoline handler is then executed with already
> >> locked kretprobe_table_locks, and first thing it does is to
> >> lock kretprobe_table_locks ;-) the whole lockup path like:
> >> 
> >>   kprobe_flush_task
> >>     kretprobe_table_lock
> >>       raw_spin_lock_irqsave
> >>         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> >> 
> >>         ---> kretprobe_table_locks locked
> >> 
> >>         kretprobe_trampoline
> >>           trampoline_handler
> >>             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> >> 
> >> The change below sets current_kprobe in kprobe_flush_task, so the probe
> >> recursion protection check is hit and the probe is never set. It seems
> >> to fix the deadlock.
> >> 
> >> I'm not sure this is the best fix, any ideas are welcome ;-)
> > 
> > Hmm, this is a bit tricky to fix this issue. Of course, temporary disable
> > kprobes (and kretprobe) on an area by filling current_kprobe might
> > be a good idea, but it also involves other kprobes.
> 
> Not sure how you mean that. Jiri's RFC patch would be disabling 
> k[ret]probes within kprobe_flush_task(), which is only ever invoked from 
> finish_task_switch(). I only see calls to spin locks and kfree() from 
> here. Besides, kprobe_flush_task() itself is NOKPROBE, so we would 
> ideally want to not trace/probe other functions it calls.

Ah, good point. I forgot that has been blacklisted. OK. then I accept
the Jiri's RFC. 

Thank you,


> 
> > 
> > How about let kretprobe skip the task which state == TASK_DEAD ?
> > 
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 627fc1b7011a..3f207d2e0afb 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1874,9 +1874,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> >  	 * To avoid deadlocks, prohibit return probing in NMI contexts,
> >  	 * just skip the probe and increase the (inexact) 'nmissed'
> >  	 * statistical counter, so that the user is informed that
> > -	 * something happened:
> > +	 * something happened.
> > +	 * Also, if the current task is dead, we will already in the process
> > +	 * to reclaim kretprobe instances from hash list. To avoid memory
> > +	 * leak, skip to run the kretprobe on such task.
> >  	 */
> > -	if (unlikely(in_nmi())) {
> > +	if (unlikely(in_nmi()) || current->state == TASK_DEAD) {
> 
> I'm wondering if this actually works. kprobe_flush_task() seems to be 
> called from finish_task_switch(), after the task switch is complete. So, 
> current task won't actually be dead here.



> 
> 
> - Naveen
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-08 16:46 [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task Jiri Olsa
  2020-04-09  9:02 ` Naveen N. Rao
  2020-04-09 12:38 ` Masami Hiramatsu
@ 2020-04-09 14:41 ` Masami Hiramatsu
  2020-04-09 18:44   ` Jiri Olsa
  2 siblings, 1 reply; 24+ messages in thread
From: Masami Hiramatsu @ 2020-04-09 14:41 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir)

On Wed,  8 Apr 2020 18:46:41 +0200
Jiri Olsa <jolsa@kernel.org> wrote:

> hi,
> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
> My test was also able to trigger lockdep output:
> 
>  ============================================
>  WARNING: possible recursive locking detected
>  5.6.0-rc6+ #6 Not tainted
>  --------------------------------------------
>  sched-messaging/2767 is trying to acquire lock:
>  ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
> 
>  but task is already holding lock:
>  ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(&(kretprobe_table_locks[i].lock));
>    lock(&(kretprobe_table_locks[i].lock));
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 
>  1 lock held by sched-messaging/2767:
>   #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> 
>  stack backtrace:
>  CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
>  Call Trace:
>   dump_stack+0x96/0xe0
>   __lock_acquire.cold.57+0x173/0x2b7
>   ? native_queued_spin_lock_slowpath+0x42b/0x9e0
>   ? lockdep_hardirqs_on+0x590/0x590
>   ? __lock_acquire+0xf63/0x4030
>   lock_acquire+0x15a/0x3d0
>   ? kretprobe_hash_lock+0x52/0xa0
>   _raw_spin_lock_irqsave+0x36/0x70
>   ? kretprobe_hash_lock+0x52/0xa0
>   kretprobe_hash_lock+0x52/0xa0
>   trampoline_handler+0xf8/0x940
>   ? kprobe_fault_handler+0x380/0x380
>   ? find_held_lock+0x3a/0x1c0
>   kretprobe_trampoline+0x25/0x50
>   ? lock_acquired+0x392/0xbc0
>   ? _raw_spin_lock_irqsave+0x50/0x70
>   ? __get_valid_kprobe+0x1f0/0x1f0
>   ? _raw_spin_unlock_irqrestore+0x3b/0x40
>   ? finish_task_switch+0x4b9/0x6d0
>   ? __switch_to_asm+0x34/0x70
>   ? __switch_to_asm+0x40/0x70
> 
> The code within the kretprobe handler checks for probe reentrancy,
> so we won't trigger any _raw_spin_lock_irqsave probe in there.
> 
> The problem is in outside kprobe_flush_task, where we call:
> 
>   kprobe_flush_task
>     kretprobe_table_lock
>       raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave
> 
> where _raw_spin_lock_irqsave triggers the kretprobe and installs
> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> 
> The kretprobe_trampoline handler is then executed with already
> locked kretprobe_table_locks, and first thing it does is to
> lock kretprobe_table_locks ;-) the whole lockup path like:
> 
>   kprobe_flush_task
>     kretprobe_table_lock
>       raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> 
>         ---> kretprobe_table_locks locked
> 
>         kretprobe_trampoline
>           trampoline_handler
>             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> 
> The change below sets current_kprobe in kprobe_flush_task, so the probe
> recursion protection check is hit and the probe is never set. It seems
> to fix the deadlock.
> 
> I'm not sure this is the best fix, any ideas are welcome ;-)

OK, I just have 1 comment. :)

> 
> thanks,
> jirka
> 
> 
> ---
>  kernel/kprobes.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 2625c241ac00..b13247cae752 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
>  }
>  NOKPROBE_SYMBOL(kretprobe_table_unlock);
>  
> +static struct kprobe kretprobe_dummy = {
> +        .addr = (void *)kretprobe_trampoline,
> +};
> +
>  /*
>   * This function is called from finish_task_switch when task tk becomes dead,
>   * so that we can recycle any function-return probe instances associated
> @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
>  	INIT_HLIST_HEAD(&empty_rp);
>  	hash = hash_ptr(tk, KPROBE_HASH_BITS);
>  	head = &kretprobe_inst_table[hash];
> +	__this_cpu_write(current_kprobe, &kretprobe_dummy);

Can you also set the kcb->kprobe_state = KPROBE_HIT_ACTIVE?

BTW, we may be better to introduce a common kprobe_reject_section_start()
and kprobe_reject_section_end() so that the user don't need to prepare
dummy kprobes.

Thank you,

>  	kretprobe_table_lock(hash, &flags);
>  	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
>  		if (ri->task == tk)
>  			recycle_rp_inst(ri, &empty_rp);
>  	}
>  	kretprobe_table_unlock(hash, &flags);
> +	__this_cpu_write(current_kprobe, NULL);
>  	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
>  		hlist_del(&ri->hlist);
>  		kfree(ri);
> -- 
> 2.25.2
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-09  9:02 ` Naveen N. Rao
@ 2020-04-09 18:43   ` Jiri Olsa
  0 siblings, 0 replies; 24+ messages in thread
From: Jiri Olsa @ 2020-04-09 18:43 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Anil S Keshavamurthy, David S. Miller, Jiri Olsa,
	Masami Hiramatsu, Peter Zijlstra, bibo,mao, lkml,
	Ziqian SUN (Zamir)

On Thu, Apr 09, 2020 at 02:32:21PM +0530, Naveen N. Rao wrote:

SNIP

> > ---
> >  kernel/kprobes.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 2625c241ac00..b13247cae752 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
> >  }
> >  NOKPROBE_SYMBOL(kretprobe_table_unlock);
> > +static struct kprobe kretprobe_dummy = {
> > +        .addr = (void *)kretprobe_trampoline,
> > +};
> > +
> 
> Perhaps a more meaningful name, say, kprobe_flush_task_protect ?

ok, will rename together with Masami's changes

> 
> >  /*
> >   * This function is called from finish_task_switch when task tk becomes dead,
> >   * so that we can recycle any function-return probe instances associated
> > @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
> >  	INIT_HLIST_HEAD(&empty_rp);
> >  	hash = hash_ptr(tk, KPROBE_HASH_BITS);
> >  	head = &kretprobe_inst_table[hash];
> > +	__this_cpu_write(current_kprobe, &kretprobe_dummy);
> >  	kretprobe_table_lock(hash, &flags);
> >  	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
> >  		if (ri->task == tk)
> >  			recycle_rp_inst(ri, &empty_rp);
> >  	}
> >  	kretprobe_table_unlock(hash, &flags);
> > +	__this_cpu_write(current_kprobe, NULL);
> 
> I would move this to the end of the function to also cover the below code.
> kprobe_flush_task() is marked NOKPROBE, so it is good to prevent probe
> handling in the below code too.

ok, will do

thanks,
jirka

> 
> >  	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
> >  		hlist_del(&ri->hlist);
> >  		kfree(ri);
> 
> 
> - Naveen
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-09 14:41 ` Masami Hiramatsu
@ 2020-04-09 18:44   ` Jiri Olsa
  2020-04-09 20:13     ` Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2020-04-09 18:44 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir)

On Thu, Apr 09, 2020 at 11:41:01PM +0900, Masami Hiramatsu wrote:

SNIP

> > ---
> >  kernel/kprobes.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 2625c241ac00..b13247cae752 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
> >  }
> >  NOKPROBE_SYMBOL(kretprobe_table_unlock);
> >  
> > +static struct kprobe kretprobe_dummy = {
> > +        .addr = (void *)kretprobe_trampoline,
> > +};
> > +
> >  /*
> >   * This function is called from finish_task_switch when task tk becomes dead,
> >   * so that we can recycle any function-return probe instances associated
> > @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
> >  	INIT_HLIST_HEAD(&empty_rp);
> >  	hash = hash_ptr(tk, KPROBE_HASH_BITS);
> >  	head = &kretprobe_inst_table[hash];
> > +	__this_cpu_write(current_kprobe, &kretprobe_dummy);
> 
> Can you also set the kcb->kprobe_state = KPROBE_HIT_ACTIVE?
> 
> BTW, we may be better to introduce a common kprobe_reject_section_start()
> and kprobe_reject_section_end() so that the user don't need to prepare
> dummy kprobes.

sure, will do

thank you both for review
jirka

> 
> Thank you,
> 
> >  	kretprobe_table_lock(hash, &flags);
> >  	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
> >  		if (ri->task == tk)
> >  			recycle_rp_inst(ri, &empty_rp);
> >  	}
> >  	kretprobe_table_unlock(hash, &flags);
> > +	__this_cpu_write(current_kprobe, NULL);
> >  	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
> >  		hlist_del(&ri->hlist);
> >  		kfree(ri);
> > -- 
> > 2.25.2
> > 
> 
> 
> -- 
> Masami Hiramatsu <mhiramat@kernel.org>
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-09 18:44   ` Jiri Olsa
@ 2020-04-09 20:13     ` Jiri Olsa
  2020-04-10  0:31       ` Masami Hiramatsu
  2020-04-10  1:31       ` [RFC] " Ziqian SUN (Zamir)
  0 siblings, 2 replies; 24+ messages in thread
From: Jiri Olsa @ 2020-04-09 20:13 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir)

On Thu, Apr 09, 2020 at 08:45:01PM +0200, Jiri Olsa wrote:
> On Thu, Apr 09, 2020 at 11:41:01PM +0900, Masami Hiramatsu wrote:
> 
> SNIP
> 
> > > ---
> > >  kernel/kprobes.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > > index 2625c241ac00..b13247cae752 100644
> > > --- a/kernel/kprobes.c
> > > +++ b/kernel/kprobes.c
> > > @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
> > >  }
> > >  NOKPROBE_SYMBOL(kretprobe_table_unlock);
> > >  
> > > +static struct kprobe kretprobe_dummy = {
> > > +        .addr = (void *)kretprobe_trampoline,
> > > +};
> > > +
> > >  /*
> > >   * This function is called from finish_task_switch when task tk becomes dead,
> > >   * so that we can recycle any function-return probe instances associated
> > > @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
> > >  	INIT_HLIST_HEAD(&empty_rp);
> > >  	hash = hash_ptr(tk, KPROBE_HASH_BITS);
> > >  	head = &kretprobe_inst_table[hash];
> > > +	__this_cpu_write(current_kprobe, &kretprobe_dummy);
> > 
> > Can you also set the kcb->kprobe_state = KPROBE_HIT_ACTIVE?
> > 
> > BTW, we may be better to introduce a common kprobe_reject_section_start()
> > and kprobe_reject_section_end() so that the user don't need to prepare
> > dummy kprobes.
> 
> sure, will do
> 
> thank you both for review

ok, found out it's actually arch code..  would you guys be ok with something like below?

jirka


---
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4d7022a740ab..081d0f366c99 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -757,12 +757,33 @@ static struct kprobe kretprobe_kprobe = {
 	.addr = (void *)kretprobe_trampoline,
 };
 
+void arch_kprobe_reject_section_start(void)
+{
+	struct kprobe_ctlblk *kcb;
+
+	preempt_disable();
+
+	/*
+	 * Set a dummy kprobe for avoiding kretprobe recursion.
+	 * Since kretprobe never run in kprobe handler, kprobe must not
+	 * be running behind this point.
+	 */
+	__this_cpu_write(current_kprobe, &kretprobe_kprobe);
+	kcb = get_kprobe_ctlblk();
+	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+}
+
+void arch_kprobe_reject_section_end(void)
+{
+	__this_cpu_write(current_kprobe, NULL);
+	preempt_enable();
+}
+
 /*
  * Called from kretprobe_trampoline
  */
 __used __visible void *trampoline_handler(struct pt_regs *regs)
 {
-	struct kprobe_ctlblk *kcb;
 	struct kretprobe_instance *ri = NULL;
 	struct hlist_head *head, empty_rp;
 	struct hlist_node *tmp;
@@ -772,16 +793,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 	void *frame_pointer;
 	bool skipped = false;
 
-	preempt_disable();
-
-	/*
-	 * Set a dummy kprobe for avoiding kretprobe recursion.
-	 * Since kretprobe never run in kprobe handler, kprobe must not
-	 * be running at this point.
-	 */
-	kcb = get_kprobe_ctlblk();
-	__this_cpu_write(current_kprobe, &kretprobe_kprobe);
-	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+	arch_kprobe_reject_section_start();
 
 	INIT_HLIST_HEAD(&empty_rp);
 	kretprobe_hash_lock(current, &head, &flags);
@@ -873,8 +885,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 
 	kretprobe_hash_unlock(current, &flags);
 
-	__this_cpu_write(current_kprobe, NULL);
-	preempt_enable();
+	arch_kprobe_reject_section_end();
 
 	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
 		hlist_del(&ri->hlist);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2625c241ac00..935dd8c87705 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1236,6 +1236,14 @@ __releases(hlist_lock)
 }
 NOKPROBE_SYMBOL(kretprobe_table_unlock);
 
+void __weak arch_kprobe_reject_section_start(void)
+{
+}
+
+void __weak arch_kprobe_reject_section_end(void)
+{
+}
+
 /*
  * This function is called from finish_task_switch when task tk becomes dead,
  * so that we can recycle any function-return probe instances associated
@@ -1253,6 +1261,8 @@ void kprobe_flush_task(struct task_struct *tk)
 		/* Early boot.  kretprobe_table_locks not yet initialized. */
 		return;
 
+	arch_kprobe_reject_section_start();
+
 	INIT_HLIST_HEAD(&empty_rp);
 	hash = hash_ptr(tk, KPROBE_HASH_BITS);
 	head = &kretprobe_inst_table[hash];
@@ -1266,6 +1276,8 @@ void kprobe_flush_task(struct task_struct *tk)
 		hlist_del(&ri->hlist);
 		kfree(ri);
 	}
+
+	arch_kprobe_reject_section_end();
 }
 NOKPROBE_SYMBOL(kprobe_flush_task);
 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-09 20:13     ` Jiri Olsa
@ 2020-04-10  0:31       ` Masami Hiramatsu
  2020-04-14 16:03         ` Jiri Olsa
  2020-04-10  1:31       ` [RFC] " Ziqian SUN (Zamir)
  1 sibling, 1 reply; 24+ messages in thread
From: Masami Hiramatsu @ 2020-04-10  0:31 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir)

Hi Jiri,

On Thu, 9 Apr 2020 22:13:36 +0200
Jiri Olsa <jolsa@redhat.com> wrote:

> On Thu, Apr 09, 2020 at 08:45:01PM +0200, Jiri Olsa wrote:
> > On Thu, Apr 09, 2020 at 11:41:01PM +0900, Masami Hiramatsu wrote:
> > 
> > SNIP
> > 
> > > > ---
> > > >  kernel/kprobes.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > > 
> > > > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > > > index 2625c241ac00..b13247cae752 100644
> > > > --- a/kernel/kprobes.c
> > > > +++ b/kernel/kprobes.c
> > > > @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
> > > >  }
> > > >  NOKPROBE_SYMBOL(kretprobe_table_unlock);
> > > >  
> > > > +static struct kprobe kretprobe_dummy = {
> > > > +        .addr = (void *)kretprobe_trampoline,
> > > > +};
> > > > +
> > > >  /*
> > > >   * This function is called from finish_task_switch when task tk becomes dead,
> > > >   * so that we can recycle any function-return probe instances associated
> > > > @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
> > > >  	INIT_HLIST_HEAD(&empty_rp);
> > > >  	hash = hash_ptr(tk, KPROBE_HASH_BITS);
> > > >  	head = &kretprobe_inst_table[hash];
> > > > +	__this_cpu_write(current_kprobe, &kretprobe_dummy);
> > > 
> > > Can you also set the kcb->kprobe_state = KPROBE_HIT_ACTIVE?
> > > 
> > > BTW, we may be better to introduce a common kprobe_reject_section_start()
> > > and kprobe_reject_section_end() so that the user don't need to prepare
> > > dummy kprobes.
> > 
> > sure, will do
> > 
> > thank you both for review
> 
> ok, found out it's actually arch code..  would you guys be ok with something like below?

Thanks for update!

> 
> jirka
> 
> 
> ---
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 4d7022a740ab..081d0f366c99 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -757,12 +757,33 @@ static struct kprobe kretprobe_kprobe = {
>  	.addr = (void *)kretprobe_trampoline,
>  };
>  
> +void arch_kprobe_reject_section_start(void)
> +{
> +	struct kprobe_ctlblk *kcb;
> +
> +	preempt_disable();
> +
> +	/*
> +	 * Set a dummy kprobe for avoiding kretprobe recursion.
> +	 * Since kretprobe never run in kprobe handler, kprobe must not
> +	 * be running behind this point.
> +	 */
> +	__this_cpu_write(current_kprobe, &kretprobe_kprobe);
> +	kcb = get_kprobe_ctlblk();
> +	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +}

Yeah, the code seems good to me.

BTW, I rather like make it arch independent function so that other
arch can use it. In this case, the dummy kprobe's addr should be
somewhere obviously blacklisted (but it must be accessible.)
I think get_kprobe() will be a candidate.

And (sorry about changing my mind), the naming, I think kprobe_busy_begin()
and kprobe_busy_end() will be better because it doesn't reject registering
kprobes, instead, just make it busy :)

Thank you,

> +
> +void arch_kprobe_reject_section_end(void)
> +{
> +	__this_cpu_write(current_kprobe, NULL);
> +	preempt_enable();
> +}
> +
>  /*
>   * Called from kretprobe_trampoline
>   */
>  __used __visible void *trampoline_handler(struct pt_regs *regs)
>  {
> -	struct kprobe_ctlblk *kcb;
>  	struct kretprobe_instance *ri = NULL;
>  	struct hlist_head *head, empty_rp;
>  	struct hlist_node *tmp;
> @@ -772,16 +793,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>  	void *frame_pointer;
>  	bool skipped = false;
>  
> -	preempt_disable();
> -
> -	/*
> -	 * Set a dummy kprobe for avoiding kretprobe recursion.
> -	 * Since kretprobe never run in kprobe handler, kprobe must not
> -	 * be running at this point.
> -	 */
> -	kcb = get_kprobe_ctlblk();
> -	__this_cpu_write(current_kprobe, &kretprobe_kprobe);
> -	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +	arch_kprobe_reject_section_start();
>  
>  	INIT_HLIST_HEAD(&empty_rp);
>  	kretprobe_hash_lock(current, &head, &flags);
> @@ -873,8 +885,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>  
>  	kretprobe_hash_unlock(current, &flags);
>  
> -	__this_cpu_write(current_kprobe, NULL);
> -	preempt_enable();
> +	arch_kprobe_reject_section_end();
>  
>  	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
>  		hlist_del(&ri->hlist);
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 2625c241ac00..935dd8c87705 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1236,6 +1236,14 @@ __releases(hlist_lock)
>  }
>  NOKPROBE_SYMBOL(kretprobe_table_unlock);
>  
> +void __weak arch_kprobe_reject_section_start(void)
> +{
> +}
> +
> +void __weak arch_kprobe_reject_section_end(void)
> +{
> +}
> +
>  /*
>   * This function is called from finish_task_switch when task tk becomes dead,
>   * so that we can recycle any function-return probe instances associated
> @@ -1253,6 +1261,8 @@ void kprobe_flush_task(struct task_struct *tk)
>  		/* Early boot.  kretprobe_table_locks not yet initialized. */
>  		return;
>  
> +	arch_kprobe_reject_section_start();
> +
>  	INIT_HLIST_HEAD(&empty_rp);
>  	hash = hash_ptr(tk, KPROBE_HASH_BITS);
>  	head = &kretprobe_inst_table[hash];
> @@ -1266,6 +1276,8 @@ void kprobe_flush_task(struct task_struct *tk)
>  		hlist_del(&ri->hlist);
>  		kfree(ri);
>  	}
> +
> +	arch_kprobe_reject_section_end();
>  }
>  NOKPROBE_SYMBOL(kprobe_flush_task);
>  
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-09 20:13     ` Jiri Olsa
  2020-04-10  0:31       ` Masami Hiramatsu
@ 2020-04-10  1:31       ` Ziqian SUN (Zamir)
  2020-04-14 16:03         ` Jiri Olsa
  1 sibling, 1 reply; 24+ messages in thread
From: Ziqian SUN (Zamir) @ 2020-04-10  1:31 UTC (permalink / raw)
  To: Jiri Olsa, Masami Hiramatsu
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, sztsian



On 4/10/20 4:13 AM, Jiri Olsa wrote:
> On Thu, Apr 09, 2020 at 08:45:01PM +0200, Jiri Olsa wrote:
>> On Thu, Apr 09, 2020 at 11:41:01PM +0900, Masami Hiramatsu wrote:
>>
>> SNIP
>>
>>>> ---
>>>>   kernel/kprobes.c | 6 ++++++
>>>>   1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
>>>> index 2625c241ac00..b13247cae752 100644
>>>> --- a/kernel/kprobes.c
>>>> +++ b/kernel/kprobes.c
>>>> @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
>>>>   }
>>>>   NOKPROBE_SYMBOL(kretprobe_table_unlock);
>>>>   
>>>> +static struct kprobe kretprobe_dummy = {
>>>> +        .addr = (void *)kretprobe_trampoline,
>>>> +};
>>>> +
>>>>   /*
>>>>    * This function is called from finish_task_switch when task tk becomes dead,
>>>>    * so that we can recycle any function-return probe instances associated
>>>> @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
>>>>   	INIT_HLIST_HEAD(&empty_rp);
>>>>   	hash = hash_ptr(tk, KPROBE_HASH_BITS);
>>>>   	head = &kretprobe_inst_table[hash];
>>>> +	__this_cpu_write(current_kprobe, &kretprobe_dummy);
>>>
>>> Can you also set the kcb->kprobe_state = KPROBE_HIT_ACTIVE?
>>>
>>> BTW, we may be better to introduce a common kprobe_reject_section_start()
>>> and kprobe_reject_section_end() so that the user don't need to prepare
>>> dummy kprobes.
>>
>> sure, will do
>>
>> thank you both for review
> 
> ok, found out it's actually arch code..  would you guys be ok with something like below?
> 
> jirka
> 

Hi Jiri,

In my origin test lockup happens on both x86_64 and ppc64le. So I would 
appreciate if you can also come up with a solution for both of the 
architectures.

> 
> ---
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 4d7022a740ab..081d0f366c99 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -757,12 +757,33 @@ static struct kprobe kretprobe_kprobe = {
>   	.addr = (void *)kretprobe_trampoline,
>   };
>   
> +void arch_kprobe_reject_section_start(void)
> +{
> +	struct kprobe_ctlblk *kcb;
> +
> +	preempt_disable();
> +
> +	/*
> +	 * Set a dummy kprobe for avoiding kretprobe recursion.
> +	 * Since kretprobe never run in kprobe handler, kprobe must not
> +	 * be running behind this point.
> +	 */
> +	__this_cpu_write(current_kprobe, &kretprobe_kprobe);
> +	kcb = get_kprobe_ctlblk();
> +	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +}
> +
> +void arch_kprobe_reject_section_end(void)
> +{
> +	__this_cpu_write(current_kprobe, NULL);
> +	preempt_enable();
> +}
> +
>   /*
>    * Called from kretprobe_trampoline
>    */
>   __used __visible void *trampoline_handler(struct pt_regs *regs)
>   {
> -	struct kprobe_ctlblk *kcb;
>   	struct kretprobe_instance *ri = NULL;
>   	struct hlist_head *head, empty_rp;
>   	struct hlist_node *tmp;
> @@ -772,16 +793,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>   	void *frame_pointer;
>   	bool skipped = false;
>   
> -	preempt_disable();
> -
> -	/*
> -	 * Set a dummy kprobe for avoiding kretprobe recursion.
> -	 * Since kretprobe never run in kprobe handler, kprobe must not
> -	 * be running at this point.
> -	 */
> -	kcb = get_kprobe_ctlblk();
> -	__this_cpu_write(current_kprobe, &kretprobe_kprobe);
> -	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +	arch_kprobe_reject_section_start();
>   
>   	INIT_HLIST_HEAD(&empty_rp);
>   	kretprobe_hash_lock(current, &head, &flags);
> @@ -873,8 +885,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>   
>   	kretprobe_hash_unlock(current, &flags);
>   
> -	__this_cpu_write(current_kprobe, NULL);
> -	preempt_enable();
> +	arch_kprobe_reject_section_end();
>   
>   	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
>   		hlist_del(&ri->hlist);
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 2625c241ac00..935dd8c87705 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1236,6 +1236,14 @@ __releases(hlist_lock)
>   }
>   NOKPROBE_SYMBOL(kretprobe_table_unlock);
>   
> +void __weak arch_kprobe_reject_section_start(void)
> +{
> +}
> +
> +void __weak arch_kprobe_reject_section_end(void)
> +{
> +}
> +
>   /*
>    * This function is called from finish_task_switch when task tk becomes dead,
>    * so that we can recycle any function-return probe instances associated
> @@ -1253,6 +1261,8 @@ void kprobe_flush_task(struct task_struct *tk)
>   		/* Early boot.  kretprobe_table_locks not yet initialized. */
>   		return;
>   
> +	arch_kprobe_reject_section_start();
> +
>   	INIT_HLIST_HEAD(&empty_rp);
>   	hash = hash_ptr(tk, KPROBE_HASH_BITS);
>   	head = &kretprobe_inst_table[hash];
> @@ -1266,6 +1276,8 @@ void kprobe_flush_task(struct task_struct *tk)
>   		hlist_del(&ri->hlist);
>   		kfree(ri);
>   	}
> +
> +	arch_kprobe_reject_section_end();
>   }
>   NOKPROBE_SYMBOL(kprobe_flush_task);
>   
> 

-- 
Ziqian SUN (Zamir)
9F Raycom office (NAY)
Red Hat Software (Beijing) Co.,Ltd
IRC: zsun (internal and freenode)
Tel: +86 10 65627458
GPG : 1D86 6D4A 49CE 4BBD 72CF FCF5 D856 6E11 F2A0 525E


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-10  1:31       ` [RFC] " Ziqian SUN (Zamir)
@ 2020-04-14 16:03         ` Jiri Olsa
  0 siblings, 0 replies; 24+ messages in thread
From: Jiri Olsa @ 2020-04-14 16:03 UTC (permalink / raw)
  To: Ziqian SUN (Zamir)
  Cc: Masami Hiramatsu, Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy,
	David S. Miller, Peter Zijlstra, lkml, bibo,mao, sztsian

On Fri, Apr 10, 2020 at 09:31:07AM +0800, Ziqian SUN (Zamir) wrote:
> 
> 
> On 4/10/20 4:13 AM, Jiri Olsa wrote:
> > On Thu, Apr 09, 2020 at 08:45:01PM +0200, Jiri Olsa wrote:
> > > On Thu, Apr 09, 2020 at 11:41:01PM +0900, Masami Hiramatsu wrote:
> > > 
> > > SNIP
> > > 
> > > > > ---
> > > > >   kernel/kprobes.c | 6 ++++++
> > > > >   1 file changed, 6 insertions(+)
> > > > > 
> > > > > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > > > > index 2625c241ac00..b13247cae752 100644
> > > > > --- a/kernel/kprobes.c
> > > > > +++ b/kernel/kprobes.c
> > > > > @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
> > > > >   }
> > > > >   NOKPROBE_SYMBOL(kretprobe_table_unlock);
> > > > > +static struct kprobe kretprobe_dummy = {
> > > > > +        .addr = (void *)kretprobe_trampoline,
> > > > > +};
> > > > > +
> > > > >   /*
> > > > >    * This function is called from finish_task_switch when task tk becomes dead,
> > > > >    * so that we can recycle any function-return probe instances associated
> > > > > @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
> > > > >   	INIT_HLIST_HEAD(&empty_rp);
> > > > >   	hash = hash_ptr(tk, KPROBE_HASH_BITS);
> > > > >   	head = &kretprobe_inst_table[hash];
> > > > > +	__this_cpu_write(current_kprobe, &kretprobe_dummy);
> > > > 
> > > > Can you also set the kcb->kprobe_state = KPROBE_HIT_ACTIVE?
> > > > 
> > > > BTW, we may be better to introduce a common kprobe_reject_section_start()
> > > > and kprobe_reject_section_end() so that the user don't need to prepare
> > > > dummy kprobes.
> > > 
> > > sure, will do
> > > 
> > > thank you both for review
> > 
> > ok, found out it's actually arch code..  would you guys be ok with something like below?
> > 
> > jirka
> > 
> 
> Hi Jiri,
> 
> In my origin test lockup happens on both x86_64 and ppc64le. So I would
> appreciate if you can also come up with a solution for both of the
> architectures.

aaaah right.. will update the fix

thanks,
jirka


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-10  0:31       ` Masami Hiramatsu
@ 2020-04-14 16:03         ` Jiri Olsa
  2020-04-15  9:05           ` [PATCH] " Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2020-04-14 16:03 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir)

On Fri, Apr 10, 2020 at 09:31:59AM +0900, Masami Hiramatsu wrote:

SNIP

> > diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> > index 4d7022a740ab..081d0f366c99 100644
> > --- a/arch/x86/kernel/kprobes/core.c
> > +++ b/arch/x86/kernel/kprobes/core.c
> > @@ -757,12 +757,33 @@ static struct kprobe kretprobe_kprobe = {
> >  	.addr = (void *)kretprobe_trampoline,
> >  };
> >  
> > +void arch_kprobe_reject_section_start(void)
> > +{
> > +	struct kprobe_ctlblk *kcb;
> > +
> > +	preempt_disable();
> > +
> > +	/*
> > +	 * Set a dummy kprobe for avoiding kretprobe recursion.
> > +	 * Since kretprobe never run in kprobe handler, kprobe must not
> > +	 * be running behind this point.
> > +	 */
> > +	__this_cpu_write(current_kprobe, &kretprobe_kprobe);
> > +	kcb = get_kprobe_ctlblk();
> > +	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> > +}
> 
> Yeah, the code seems good to me.
> 
> BTW, I rather like make it arch independent function so that other
> arch can use it. In this case, the dummy kprobe's addr should be
> somewhere obviously blacklisted (but it must be accessible.)
> I think get_kprobe() will be a candidate.

right.. as Ziqian noted we see this on other ppc as well

> 
> And (sorry about changing my mind), the naming, I think kprobe_busy_begin()
> and kprobe_busy_end() will be better because it doesn't reject registering
> kprobes, instead, just make it busy :)

ok, will change 

thanks,
jirka


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-14 16:03         ` Jiri Olsa
@ 2020-04-15  9:05           ` Jiri Olsa
  2020-04-16  1:55             ` Masami Hiramatsu
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2020-04-15  9:05 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir)

Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
My test was also able to trigger lockdep output:

 ============================================
 WARNING: possible recursive locking detected
 5.6.0-rc6+ #6 Not tainted
 --------------------------------------------
 sched-messaging/2767 is trying to acquire lock:
 ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0

 but task is already holding lock:
 ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(kretprobe_table_locks[i].lock));
   lock(&(kretprobe_table_locks[i].lock));

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by sched-messaging/2767:
  #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

 stack backtrace:
 CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
 Call Trace:
  dump_stack+0x96/0xe0
  __lock_acquire.cold.57+0x173/0x2b7
  ? native_queued_spin_lock_slowpath+0x42b/0x9e0
  ? lockdep_hardirqs_on+0x590/0x590
  ? __lock_acquire+0xf63/0x4030
  lock_acquire+0x15a/0x3d0
  ? kretprobe_hash_lock+0x52/0xa0
  _raw_spin_lock_irqsave+0x36/0x70
  ? kretprobe_hash_lock+0x52/0xa0
  kretprobe_hash_lock+0x52/0xa0
  trampoline_handler+0xf8/0x940
  ? kprobe_fault_handler+0x380/0x380
  ? find_held_lock+0x3a/0x1c0
  kretprobe_trampoline+0x25/0x50
  ? lock_acquired+0x392/0xbc0
  ? _raw_spin_lock_irqsave+0x50/0x70
  ? __get_valid_kprobe+0x1f0/0x1f0
  ? _raw_spin_unlock_irqrestore+0x3b/0x40
  ? finish_task_switch+0x4b9/0x6d0
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70

The code within the kretprobe handler checks for probe reentrancy,
so we won't trigger any _raw_spin_lock_irqsave probe in there.

The problem is in outside kprobe_flush_task, where we call:

  kprobe_flush_task
    kretprobe_table_lock
      raw_spin_lock_irqsave
        _raw_spin_lock_irqsave

where _raw_spin_lock_irqsave triggers the kretprobe and installs
kretprobe_trampoline handler on _raw_spin_lock_irqsave return.

The kretprobe_trampoline handler is then executed with already
locked kretprobe_table_locks, and first thing it does is to
lock kretprobe_table_locks ;-) the whole lockup path like:

  kprobe_flush_task
    kretprobe_table_lock
      raw_spin_lock_irqsave
        _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed

        ---> kretprobe_table_locks locked

        kretprobe_trampoline
          trampoline_handler
            kretprobe_hash_lock(current, &head, &flags);  <--- deadlock

Adding kprobe_busy_begin/end helpers that mark code with fake
probe installed to prevent triggering of another kprobe within
this code.

Using these helpers in kprobe_flush_task, so the probe recursion
protection check is hit and the probe is never set to prevent
above lockup.

Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/kernel/kprobes/core.c | 16 +++-------------
 include/linux/kprobes.h        |  4 ++++
 kernel/kprobes.c               | 24 ++++++++++++++++++++++++
 3 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4d7022a740ab..a12adbe1559d 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -753,16 +753,11 @@ asm(
 NOKPROBE_SYMBOL(kretprobe_trampoline);
 STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
 
-static struct kprobe kretprobe_kprobe = {
-	.addr = (void *)kretprobe_trampoline,
-};
-
 /*
  * Called from kretprobe_trampoline
  */
 __used __visible void *trampoline_handler(struct pt_regs *regs)
 {
-	struct kprobe_ctlblk *kcb;
 	struct kretprobe_instance *ri = NULL;
 	struct hlist_head *head, empty_rp;
 	struct hlist_node *tmp;
@@ -772,16 +767,12 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 	void *frame_pointer;
 	bool skipped = false;
 
-	preempt_disable();
-
 	/*
 	 * Set a dummy kprobe for avoiding kretprobe recursion.
 	 * Since kretprobe never run in kprobe handler, kprobe must not
 	 * be running at this point.
 	 */
-	kcb = get_kprobe_ctlblk();
-	__this_cpu_write(current_kprobe, &kretprobe_kprobe);
-	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+	kprobe_busy_begin();
 
 	INIT_HLIST_HEAD(&empty_rp);
 	kretprobe_hash_lock(current, &head, &flags);
@@ -857,7 +848,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 			__this_cpu_write(current_kprobe, &ri->rp->kp);
 			ri->ret_addr = correct_ret_addr;
 			ri->rp->handler(ri, regs);
-			__this_cpu_write(current_kprobe, &kretprobe_kprobe);
+			__this_cpu_write(current_kprobe, &kprobe_busy);
 		}
 
 		recycle_rp_inst(ri, &empty_rp);
@@ -873,8 +864,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 
 	kretprobe_hash_unlock(current, &flags);
 
-	__this_cpu_write(current_kprobe, NULL);
-	preempt_enable();
+	kprobe_busy_end();
 
 	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
 		hlist_del(&ri->hlist);
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 04bdaf01112c..645fd401c856 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -350,6 +350,10 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
 	return this_cpu_ptr(&kprobe_ctlblk);
 }
 
+extern struct kprobe kprobe_busy;
+void kprobe_busy_begin(void);
+void kprobe_busy_end(void);
+
 kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
 int register_kprobe(struct kprobe *p);
 void unregister_kprobe(struct kprobe *p);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2625c241ac00..75bb4a8458e7 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1236,6 +1236,26 @@ __releases(hlist_lock)
 }
 NOKPROBE_SYMBOL(kretprobe_table_unlock);
 
+struct kprobe kprobe_busy = {
+	.addr = (void *) get_kprobe,
+};
+
+void kprobe_busy_begin(void)
+{
+	struct kprobe_ctlblk *kcb;
+
+	preempt_disable();
+	__this_cpu_write(current_kprobe, &kprobe_busy);
+	kcb = get_kprobe_ctlblk();
+	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+}
+
+void kprobe_busy_end(void)
+{
+	__this_cpu_write(current_kprobe, NULL);
+	preempt_enable();
+}
+
 /*
  * This function is called from finish_task_switch when task tk becomes dead,
  * so that we can recycle any function-return probe instances associated
@@ -1253,6 +1273,8 @@ void kprobe_flush_task(struct task_struct *tk)
 		/* Early boot.  kretprobe_table_locks not yet initialized. */
 		return;
 
+	kprobe_busy_begin();
+
 	INIT_HLIST_HEAD(&empty_rp);
 	hash = hash_ptr(tk, KPROBE_HASH_BITS);
 	head = &kretprobe_inst_table[hash];
@@ -1266,6 +1288,8 @@ void kprobe_flush_task(struct task_struct *tk)
 		hlist_del(&ri->hlist);
 		kfree(ri);
 	}
+
+	kprobe_busy_end();
 }
 NOKPROBE_SYMBOL(kprobe_flush_task);
 
-- 
2.18.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-15  9:05           ` [PATCH] " Jiri Olsa
@ 2020-04-16  1:55             ` Masami Hiramatsu
  2020-04-16  9:13               ` Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Masami Hiramatsu @ 2020-04-16  1:55 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir),
	stable

On Wed, 15 Apr 2020 11:05:07 +0200
Jiri Olsa <jolsa@redhat.com> wrote:

> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
> My test was also able to trigger lockdep output:
> 
>  ============================================
>  WARNING: possible recursive locking detected
>  5.6.0-rc6+ #6 Not tainted
>  --------------------------------------------
>  sched-messaging/2767 is trying to acquire lock:
>  ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
> 
>  but task is already holding lock:
>  ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(&(kretprobe_table_locks[i].lock));
>    lock(&(kretprobe_table_locks[i].lock));
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 
>  1 lock held by sched-messaging/2767:
>   #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> 
>  stack backtrace:
>  CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
>  Call Trace:
>   dump_stack+0x96/0xe0
>   __lock_acquire.cold.57+0x173/0x2b7
>   ? native_queued_spin_lock_slowpath+0x42b/0x9e0
>   ? lockdep_hardirqs_on+0x590/0x590
>   ? __lock_acquire+0xf63/0x4030
>   lock_acquire+0x15a/0x3d0
>   ? kretprobe_hash_lock+0x52/0xa0
>   _raw_spin_lock_irqsave+0x36/0x70
>   ? kretprobe_hash_lock+0x52/0xa0
>   kretprobe_hash_lock+0x52/0xa0
>   trampoline_handler+0xf8/0x940
>   ? kprobe_fault_handler+0x380/0x380
>   ? find_held_lock+0x3a/0x1c0
>   kretprobe_trampoline+0x25/0x50
>   ? lock_acquired+0x392/0xbc0
>   ? _raw_spin_lock_irqsave+0x50/0x70
>   ? __get_valid_kprobe+0x1f0/0x1f0
>   ? _raw_spin_unlock_irqrestore+0x3b/0x40
>   ? finish_task_switch+0x4b9/0x6d0
>   ? __switch_to_asm+0x34/0x70
>   ? __switch_to_asm+0x40/0x70
> 
> The code within the kretprobe handler checks for probe reentrancy,
> so we won't trigger any _raw_spin_lock_irqsave probe in there.
> 
> The problem is in outside kprobe_flush_task, where we call:
> 
>   kprobe_flush_task
>     kretprobe_table_lock
>       raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave
> 
> where _raw_spin_lock_irqsave triggers the kretprobe and installs
> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> 
> The kretprobe_trampoline handler is then executed with already
> locked kretprobe_table_locks, and first thing it does is to
> lock kretprobe_table_locks ;-) the whole lockup path like:
> 
>   kprobe_flush_task
>     kretprobe_table_lock
>       raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> 
>         ---> kretprobe_table_locks locked
> 
>         kretprobe_trampoline
>           trampoline_handler
>             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> 
> Adding kprobe_busy_begin/end helpers that mark code with fake
> probe installed to prevent triggering of another kprobe within
> this code.
> 
> Using these helpers in kprobe_flush_task, so the probe recursion
> protection check is hit and the probe is never set to prevent
> above lockup.
> 
> Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

Thanks Jiri and Ziqian!

Looks good to me.

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>

BTW, this is a kind of bugfix. So should it add a Fixes tag?

Fixes: ef53d9c5e4da ('kprobes: improve kretprobe scalability with hashed locking')
Cc: stable@vger.kernel.org

Thank you,

> ---
>  arch/x86/kernel/kprobes/core.c | 16 +++-------------
>  include/linux/kprobes.h        |  4 ++++
>  kernel/kprobes.c               | 24 ++++++++++++++++++++++++
>  3 files changed, 31 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 4d7022a740ab..a12adbe1559d 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -753,16 +753,11 @@ asm(
>  NOKPROBE_SYMBOL(kretprobe_trampoline);
>  STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
>  
> -static struct kprobe kretprobe_kprobe = {
> -	.addr = (void *)kretprobe_trampoline,
> -};
> -
>  /*
>   * Called from kretprobe_trampoline
>   */
>  __used __visible void *trampoline_handler(struct pt_regs *regs)
>  {
> -	struct kprobe_ctlblk *kcb;
>  	struct kretprobe_instance *ri = NULL;
>  	struct hlist_head *head, empty_rp;
>  	struct hlist_node *tmp;
> @@ -772,16 +767,12 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>  	void *frame_pointer;
>  	bool skipped = false;
>  
> -	preempt_disable();
> -
>  	/*
>  	 * Set a dummy kprobe for avoiding kretprobe recursion.
>  	 * Since kretprobe never run in kprobe handler, kprobe must not
>  	 * be running at this point.
>  	 */
> -	kcb = get_kprobe_ctlblk();
> -	__this_cpu_write(current_kprobe, &kretprobe_kprobe);
> -	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +	kprobe_busy_begin();
>  
>  	INIT_HLIST_HEAD(&empty_rp);
>  	kretprobe_hash_lock(current, &head, &flags);
> @@ -857,7 +848,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>  			__this_cpu_write(current_kprobe, &ri->rp->kp);
>  			ri->ret_addr = correct_ret_addr;
>  			ri->rp->handler(ri, regs);
> -			__this_cpu_write(current_kprobe, &kretprobe_kprobe);
> +			__this_cpu_write(current_kprobe, &kprobe_busy);
>  		}
>  
>  		recycle_rp_inst(ri, &empty_rp);
> @@ -873,8 +864,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>  
>  	kretprobe_hash_unlock(current, &flags);
>  
> -	__this_cpu_write(current_kprobe, NULL);
> -	preempt_enable();
> +	kprobe_busy_end();
>  
>  	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
>  		hlist_del(&ri->hlist);
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 04bdaf01112c..645fd401c856 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -350,6 +350,10 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
>  	return this_cpu_ptr(&kprobe_ctlblk);
>  }
>  
> +extern struct kprobe kprobe_busy;
> +void kprobe_busy_begin(void);
> +void kprobe_busy_end(void);
> +
>  kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
>  int register_kprobe(struct kprobe *p);
>  void unregister_kprobe(struct kprobe *p);
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 2625c241ac00..75bb4a8458e7 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1236,6 +1236,26 @@ __releases(hlist_lock)
>  }
>  NOKPROBE_SYMBOL(kretprobe_table_unlock);
>  
> +struct kprobe kprobe_busy = {
> +	.addr = (void *) get_kprobe,
> +};
> +
> +void kprobe_busy_begin(void)
> +{
> +	struct kprobe_ctlblk *kcb;
> +
> +	preempt_disable();
> +	__this_cpu_write(current_kprobe, &kprobe_busy);
> +	kcb = get_kprobe_ctlblk();
> +	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +}
> +
> +void kprobe_busy_end(void)
> +{
> +	__this_cpu_write(current_kprobe, NULL);
> +	preempt_enable();
> +}
> +
>  /*
>   * This function is called from finish_task_switch when task tk becomes dead,
>   * so that we can recycle any function-return probe instances associated
> @@ -1253,6 +1273,8 @@ void kprobe_flush_task(struct task_struct *tk)
>  		/* Early boot.  kretprobe_table_locks not yet initialized. */
>  		return;
>  
> +	kprobe_busy_begin();
> +
>  	INIT_HLIST_HEAD(&empty_rp);
>  	hash = hash_ptr(tk, KPROBE_HASH_BITS);
>  	head = &kretprobe_inst_table[hash];
> @@ -1266,6 +1288,8 @@ void kprobe_flush_task(struct task_struct *tk)
>  		hlist_del(&ri->hlist);
>  		kfree(ri);
>  	}
> +
> +	kprobe_busy_end();
>  }
>  NOKPROBE_SYMBOL(kprobe_flush_task);
>  
> -- 
> 2.18.2
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-16  1:55             ` Masami Hiramatsu
@ 2020-04-16  9:13               ` Jiri Olsa
  2020-04-16 13:42                 ` Masami Hiramatsu
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2020-04-16  9:13 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir),
	stable

On Thu, Apr 16, 2020 at 10:55:06AM +0900, Masami Hiramatsu wrote:

SNIP

> >           trampoline_handler
> >             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> > 
> > Adding kprobe_busy_begin/end helpers that mark code with fake
> > probe installed to prevent triggering of another kprobe within
> > this code.
> > 
> > Using these helpers in kprobe_flush_task, so the probe recursion
> > protection check is hit and the probe is never set to prevent
> > above lockup.
> > 
> > Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com>
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> 
> Thanks Jiri and Ziqian!
> 
> Looks good to me.
> 
> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> 
> BTW, this is a kind of bugfix. So should it add a Fixes tag?
> 
> Fixes: ef53d9c5e4da ('kprobes: improve kretprobe scalability with hashed locking')
> Cc: stable@vger.kernel.org

ah right, do you want me to repost with those?

thanks,
jirka


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-16  9:13               ` Jiri Olsa
@ 2020-04-16 13:42                 ` Masami Hiramatsu
  2020-04-16 14:31                   ` [PATCHv2] " Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Masami Hiramatsu @ 2020-04-16 13:42 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir),
	stable

Hi Jiri,

On Thu, 16 Apr 2020 11:13:20 +0200
Jiri Olsa <jolsa@redhat.com> wrote:

> On Thu, Apr 16, 2020 at 10:55:06AM +0900, Masami Hiramatsu wrote:
> 
> SNIP
> 
> > >           trampoline_handler
> > >             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> > > 
> > > Adding kprobe_busy_begin/end helpers that mark code with fake
> > > probe installed to prevent triggering of another kprobe within
> > > this code.
> > > 
> > > Using these helpers in kprobe_flush_task, so the probe recursion
> > > protection check is hit and the probe is never set to prevent
> > > above lockup.
> > > 
> > > Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com>
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > 
> > Thanks Jiri and Ziqian!
> > 
> > Looks good to me.
> > 
> > Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> > 
> > BTW, this is a kind of bugfix. So should it add a Fixes tag?
> > 
> > Fixes: ef53d9c5e4da ('kprobes: improve kretprobe scalability with hashed locking')
> > Cc: stable@vger.kernel.org
> 
> ah right, do you want me to repost with those?

Yeah, if you don't mind.

Thank you,

> 
> thanks,
> jirka
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCHv2] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-16 13:42                 ` Masami Hiramatsu
@ 2020-04-16 14:31                   ` Jiri Olsa
  2020-04-17  7:38                     ` Masami Hiramatsu
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2020-04-16 14:31 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir),
	stable

Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
My test was also able to trigger lockdep output:

 ============================================
 WARNING: possible recursive locking detected
 5.6.0-rc6+ #6 Not tainted
 --------------------------------------------
 sched-messaging/2767 is trying to acquire lock:
 ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0

 but task is already holding lock:
 ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(kretprobe_table_locks[i].lock));
   lock(&(kretprobe_table_locks[i].lock));

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by sched-messaging/2767:
  #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

 stack backtrace:
 CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
 Call Trace:
  dump_stack+0x96/0xe0
  __lock_acquire.cold.57+0x173/0x2b7
  ? native_queued_spin_lock_slowpath+0x42b/0x9e0
  ? lockdep_hardirqs_on+0x590/0x590
  ? __lock_acquire+0xf63/0x4030
  lock_acquire+0x15a/0x3d0
  ? kretprobe_hash_lock+0x52/0xa0
  _raw_spin_lock_irqsave+0x36/0x70
  ? kretprobe_hash_lock+0x52/0xa0
  kretprobe_hash_lock+0x52/0xa0
  trampoline_handler+0xf8/0x940
  ? kprobe_fault_handler+0x380/0x380
  ? find_held_lock+0x3a/0x1c0
  kretprobe_trampoline+0x25/0x50
  ? lock_acquired+0x392/0xbc0
  ? _raw_spin_lock_irqsave+0x50/0x70
  ? __get_valid_kprobe+0x1f0/0x1f0
  ? _raw_spin_unlock_irqrestore+0x3b/0x40
  ? finish_task_switch+0x4b9/0x6d0
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70

The code within the kretprobe handler checks for probe reentrancy,
so we won't trigger any _raw_spin_lock_irqsave probe in there.

The problem is in outside kprobe_flush_task, where we call:

  kprobe_flush_task
    kretprobe_table_lock
      raw_spin_lock_irqsave
        _raw_spin_lock_irqsave

where _raw_spin_lock_irqsave triggers the kretprobe and installs
kretprobe_trampoline handler on _raw_spin_lock_irqsave return.

The kretprobe_trampoline handler is then executed with already
locked kretprobe_table_locks, and first thing it does is to
lock kretprobe_table_locks ;-) the whole lockup path like:

  kprobe_flush_task
    kretprobe_table_lock
      raw_spin_lock_irqsave
        _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed

        ---> kretprobe_table_locks locked

        kretprobe_trampoline
          trampoline_handler
            kretprobe_hash_lock(current, &head, &flags);  <--- deadlock

Adding kprobe_busy_begin/end helpers that mark code with fake
probe installed to prevent triggering of another kprobe within
this code.

Using these helpers in kprobe_flush_task, so the probe recursion
protection check is hit and the probe is never set to prevent
above lockup.

Fixes: ef53d9c5e4da ('kprobes: improve kretprobe scalability with hashed locking')
Cc: stable@vger.kernel.org
Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/kernel/kprobes/core.c | 16 +++-------------
 include/linux/kprobes.h        |  4 ++++
 kernel/kprobes.c               | 24 ++++++++++++++++++++++++
 3 files changed, 31 insertions(+), 13 deletions(-)

v2 changes: updated changelog with Fixes/Ack and Cc stable

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4d7022a740ab..a12adbe1559d 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -753,16 +753,11 @@ asm(
 NOKPROBE_SYMBOL(kretprobe_trampoline);
 STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
 
-static struct kprobe kretprobe_kprobe = {
-	.addr = (void *)kretprobe_trampoline,
-};
-
 /*
  * Called from kretprobe_trampoline
  */
 __used __visible void *trampoline_handler(struct pt_regs *regs)
 {
-	struct kprobe_ctlblk *kcb;
 	struct kretprobe_instance *ri = NULL;
 	struct hlist_head *head, empty_rp;
 	struct hlist_node *tmp;
@@ -772,16 +767,12 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 	void *frame_pointer;
 	bool skipped = false;
 
-	preempt_disable();
-
 	/*
 	 * Set a dummy kprobe for avoiding kretprobe recursion.
 	 * Since kretprobe never run in kprobe handler, kprobe must not
 	 * be running at this point.
 	 */
-	kcb = get_kprobe_ctlblk();
-	__this_cpu_write(current_kprobe, &kretprobe_kprobe);
-	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+	kprobe_busy_begin();
 
 	INIT_HLIST_HEAD(&empty_rp);
 	kretprobe_hash_lock(current, &head, &flags);
@@ -857,7 +848,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 			__this_cpu_write(current_kprobe, &ri->rp->kp);
 			ri->ret_addr = correct_ret_addr;
 			ri->rp->handler(ri, regs);
-			__this_cpu_write(current_kprobe, &kretprobe_kprobe);
+			__this_cpu_write(current_kprobe, &kprobe_busy);
 		}
 
 		recycle_rp_inst(ri, &empty_rp);
@@ -873,8 +864,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 
 	kretprobe_hash_unlock(current, &flags);
 
-	__this_cpu_write(current_kprobe, NULL);
-	preempt_enable();
+	kprobe_busy_end();
 
 	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
 		hlist_del(&ri->hlist);
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 04bdaf01112c..645fd401c856 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -350,6 +350,10 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
 	return this_cpu_ptr(&kprobe_ctlblk);
 }
 
+extern struct kprobe kprobe_busy;
+void kprobe_busy_begin(void);
+void kprobe_busy_end(void);
+
 kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
 int register_kprobe(struct kprobe *p);
 void unregister_kprobe(struct kprobe *p);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2625c241ac00..75bb4a8458e7 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1236,6 +1236,26 @@ __releases(hlist_lock)
 }
 NOKPROBE_SYMBOL(kretprobe_table_unlock);
 
+struct kprobe kprobe_busy = {
+	.addr = (void *) get_kprobe,
+};
+
+void kprobe_busy_begin(void)
+{
+	struct kprobe_ctlblk *kcb;
+
+	preempt_disable();
+	__this_cpu_write(current_kprobe, &kprobe_busy);
+	kcb = get_kprobe_ctlblk();
+	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+}
+
+void kprobe_busy_end(void)
+{
+	__this_cpu_write(current_kprobe, NULL);
+	preempt_enable();
+}
+
 /*
  * This function is called from finish_task_switch when task tk becomes dead,
  * so that we can recycle any function-return probe instances associated
@@ -1253,6 +1273,8 @@ void kprobe_flush_task(struct task_struct *tk)
 		/* Early boot.  kretprobe_table_locks not yet initialized. */
 		return;
 
+	kprobe_busy_begin();
+
 	INIT_HLIST_HEAD(&empty_rp);
 	hash = hash_ptr(tk, KPROBE_HASH_BITS);
 	head = &kretprobe_inst_table[hash];
@@ -1266,6 +1288,8 @@ void kprobe_flush_task(struct task_struct *tk)
 		hlist_del(&ri->hlist);
 		kfree(ri);
 	}
+
+	kprobe_busy_end();
 }
 NOKPROBE_SYMBOL(kprobe_flush_task);
 
-- 
2.18.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv2] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-16 14:31                   ` [PATCHv2] " Jiri Olsa
@ 2020-04-17  7:38                     ` Masami Hiramatsu
  2020-04-28 21:36                       ` Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Masami Hiramatsu @ 2020-04-17  7:38 UTC (permalink / raw)
  To: Jiri Olsa, Ingo Molnar
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir),
	stable

On Thu, 16 Apr 2020 16:31:04 +0200
Jiri Olsa <jolsa@redhat.com> wrote:

> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
> My test was also able to trigger lockdep output:
> 
>  ============================================
>  WARNING: possible recursive locking detected
>  5.6.0-rc6+ #6 Not tainted
>  --------------------------------------------
>  sched-messaging/2767 is trying to acquire lock:
>  ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
> 
>  but task is already holding lock:
>  ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(&(kretprobe_table_locks[i].lock));
>    lock(&(kretprobe_table_locks[i].lock));
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 
>  1 lock held by sched-messaging/2767:
>   #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> 
>  stack backtrace:
>  CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
>  Call Trace:
>   dump_stack+0x96/0xe0
>   __lock_acquire.cold.57+0x173/0x2b7
>   ? native_queued_spin_lock_slowpath+0x42b/0x9e0
>   ? lockdep_hardirqs_on+0x590/0x590
>   ? __lock_acquire+0xf63/0x4030
>   lock_acquire+0x15a/0x3d0
>   ? kretprobe_hash_lock+0x52/0xa0
>   _raw_spin_lock_irqsave+0x36/0x70
>   ? kretprobe_hash_lock+0x52/0xa0
>   kretprobe_hash_lock+0x52/0xa0
>   trampoline_handler+0xf8/0x940
>   ? kprobe_fault_handler+0x380/0x380
>   ? find_held_lock+0x3a/0x1c0
>   kretprobe_trampoline+0x25/0x50
>   ? lock_acquired+0x392/0xbc0
>   ? _raw_spin_lock_irqsave+0x50/0x70
>   ? __get_valid_kprobe+0x1f0/0x1f0
>   ? _raw_spin_unlock_irqrestore+0x3b/0x40
>   ? finish_task_switch+0x4b9/0x6d0
>   ? __switch_to_asm+0x34/0x70
>   ? __switch_to_asm+0x40/0x70
> 
> The code within the kretprobe handler checks for probe reentrancy,
> so we won't trigger any _raw_spin_lock_irqsave probe in there.
> 
> The problem is in outside kprobe_flush_task, where we call:
> 
>   kprobe_flush_task
>     kretprobe_table_lock
>       raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave
> 
> where _raw_spin_lock_irqsave triggers the kretprobe and installs
> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> 
> The kretprobe_trampoline handler is then executed with already
> locked kretprobe_table_locks, and first thing it does is to
> lock kretprobe_table_locks ;-) the whole lockup path like:
> 
>   kprobe_flush_task
>     kretprobe_table_lock
>       raw_spin_lock_irqsave
>         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> 
>         ---> kretprobe_table_locks locked
> 
>         kretprobe_trampoline
>           trampoline_handler
>             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> 
> Adding kprobe_busy_begin/end helpers that mark code with fake
> probe installed to prevent triggering of another kprobe within
> this code.
> 
> Using these helpers in kprobe_flush_task, so the probe recursion
> protection check is hit and the probe is never set to prevent
> above lockup.
> 

Thanks Jiri!

Ingo, could you pick this up?

Regards,

> Fixes: ef53d9c5e4da ('kprobes: improve kretprobe scalability with hashed locking')
> Cc: stable@vger.kernel.org
> Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com>
> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  arch/x86/kernel/kprobes/core.c | 16 +++-------------
>  include/linux/kprobes.h        |  4 ++++
>  kernel/kprobes.c               | 24 ++++++++++++++++++++++++
>  3 files changed, 31 insertions(+), 13 deletions(-)
> 
> v2 changes: updated changelog with Fixes/Ack and Cc stable
> 
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 4d7022a740ab..a12adbe1559d 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -753,16 +753,11 @@ asm(
>  NOKPROBE_SYMBOL(kretprobe_trampoline);
>  STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
>  
> -static struct kprobe kretprobe_kprobe = {
> -	.addr = (void *)kretprobe_trampoline,
> -};
> -
>  /*
>   * Called from kretprobe_trampoline
>   */
>  __used __visible void *trampoline_handler(struct pt_regs *regs)
>  {
> -	struct kprobe_ctlblk *kcb;
>  	struct kretprobe_instance *ri = NULL;
>  	struct hlist_head *head, empty_rp;
>  	struct hlist_node *tmp;
> @@ -772,16 +767,12 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>  	void *frame_pointer;
>  	bool skipped = false;
>  
> -	preempt_disable();
> -
>  	/*
>  	 * Set a dummy kprobe for avoiding kretprobe recursion.
>  	 * Since kretprobe never run in kprobe handler, kprobe must not
>  	 * be running at this point.
>  	 */
> -	kcb = get_kprobe_ctlblk();
> -	__this_cpu_write(current_kprobe, &kretprobe_kprobe);
> -	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +	kprobe_busy_begin();
>  
>  	INIT_HLIST_HEAD(&empty_rp);
>  	kretprobe_hash_lock(current, &head, &flags);
> @@ -857,7 +848,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>  			__this_cpu_write(current_kprobe, &ri->rp->kp);
>  			ri->ret_addr = correct_ret_addr;
>  			ri->rp->handler(ri, regs);
> -			__this_cpu_write(current_kprobe, &kretprobe_kprobe);
> +			__this_cpu_write(current_kprobe, &kprobe_busy);
>  		}
>  
>  		recycle_rp_inst(ri, &empty_rp);
> @@ -873,8 +864,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>  
>  	kretprobe_hash_unlock(current, &flags);
>  
> -	__this_cpu_write(current_kprobe, NULL);
> -	preempt_enable();
> +	kprobe_busy_end();
>  
>  	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
>  		hlist_del(&ri->hlist);
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 04bdaf01112c..645fd401c856 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -350,6 +350,10 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
>  	return this_cpu_ptr(&kprobe_ctlblk);
>  }
>  
> +extern struct kprobe kprobe_busy;
> +void kprobe_busy_begin(void);
> +void kprobe_busy_end(void);
> +
>  kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
>  int register_kprobe(struct kprobe *p);
>  void unregister_kprobe(struct kprobe *p);
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 2625c241ac00..75bb4a8458e7 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1236,6 +1236,26 @@ __releases(hlist_lock)
>  }
>  NOKPROBE_SYMBOL(kretprobe_table_unlock);
>  
> +struct kprobe kprobe_busy = {
> +	.addr = (void *) get_kprobe,
> +};
> +
> +void kprobe_busy_begin(void)
> +{
> +	struct kprobe_ctlblk *kcb;
> +
> +	preempt_disable();
> +	__this_cpu_write(current_kprobe, &kprobe_busy);
> +	kcb = get_kprobe_ctlblk();
> +	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +}
> +
> +void kprobe_busy_end(void)
> +{
> +	__this_cpu_write(current_kprobe, NULL);
> +	preempt_enable();
> +}
> +
>  /*
>   * This function is called from finish_task_switch when task tk becomes dead,
>   * so that we can recycle any function-return probe instances associated
> @@ -1253,6 +1273,8 @@ void kprobe_flush_task(struct task_struct *tk)
>  		/* Early boot.  kretprobe_table_locks not yet initialized. */
>  		return;
>  
> +	kprobe_busy_begin();
> +
>  	INIT_HLIST_HEAD(&empty_rp);
>  	hash = hash_ptr(tk, KPROBE_HASH_BITS);
>  	head = &kretprobe_inst_table[hash];
> @@ -1266,6 +1288,8 @@ void kprobe_flush_task(struct task_struct *tk)
>  		hlist_del(&ri->hlist);
>  		kfree(ri);
>  	}
> +
> +	kprobe_busy_end();
>  }
>  NOKPROBE_SYMBOL(kprobe_flush_task);
>  
> -- 
> 2.18.2
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv2] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-17  7:38                     ` Masami Hiramatsu
@ 2020-04-28 21:36                       ` Jiri Olsa
  2020-05-01  2:01                         ` Masami Hiramatsu
  0 siblings, 1 reply; 24+ messages in thread
From: Jiri Olsa @ 2020-04-28 21:36 UTC (permalink / raw)
  To: Masami Hiramatsu, Ingo Molnar
  Cc: Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy, David S. Miller,
	Peter Zijlstra, lkml, bibo,mao, Ziqian SUN (Zamir),
	stable

On Fri, Apr 17, 2020 at 04:38:10PM +0900, Masami Hiramatsu wrote:

SNIP

> > 
> > The code within the kretprobe handler checks for probe reentrancy,
> > so we won't trigger any _raw_spin_lock_irqsave probe in there.
> > 
> > The problem is in outside kprobe_flush_task, where we call:
> > 
> >   kprobe_flush_task
> >     kretprobe_table_lock
> >       raw_spin_lock_irqsave
> >         _raw_spin_lock_irqsave
> > 
> > where _raw_spin_lock_irqsave triggers the kretprobe and installs
> > kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> > 
> > The kretprobe_trampoline handler is then executed with already
> > locked kretprobe_table_locks, and first thing it does is to
> > lock kretprobe_table_locks ;-) the whole lockup path like:
> > 
> >   kprobe_flush_task
> >     kretprobe_table_lock
> >       raw_spin_lock_irqsave
> >         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> > 
> >         ---> kretprobe_table_locks locked
> > 
> >         kretprobe_trampoline
> >           trampoline_handler
> >             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> > 
> > Adding kprobe_busy_begin/end helpers that mark code with fake
> > probe installed to prevent triggering of another kprobe within
> > this code.
> > 
> > Using these helpers in kprobe_flush_task, so the probe recursion
> > protection check is hit and the probe is never set to prevent
> > above lockup.
> > 
> 
> Thanks Jiri!
> 
> Ingo, could you pick this up?

Ingo, any chance you could take this one?

thanks,
jirka


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv2] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-04-28 21:36                       ` Jiri Olsa
@ 2020-05-01  2:01                         ` Masami Hiramatsu
  2020-05-07 10:15                           ` Jiri Olsa
  0 siblings, 1 reply; 24+ messages in thread
From: Masami Hiramatsu @ 2020-05-01  2:01 UTC (permalink / raw)
  To: Ingo Molnar, Ingo Molnar
  Cc: Jiri Olsa, Jiri Olsa, Naveen N. Rao, Anil S Keshavamurthy,
	David S. Miller, Peter Zijlstra, lkml, bibo,mao,
	Ziqian SUN (Zamir),
	stable

On Tue, 28 Apr 2020 23:36:27 +0200
Jiri Olsa <jolsa@redhat.com> wrote:

> On Fri, Apr 17, 2020 at 04:38:10PM +0900, Masami Hiramatsu wrote:
> 
> SNIP
> 
> > > 
> > > The code within the kretprobe handler checks for probe reentrancy,
> > > so we won't trigger any _raw_spin_lock_irqsave probe in there.
> > > 
> > > The problem is in outside kprobe_flush_task, where we call:
> > > 
> > >   kprobe_flush_task
> > >     kretprobe_table_lock
> > >       raw_spin_lock_irqsave
> > >         _raw_spin_lock_irqsave
> > > 
> > > where _raw_spin_lock_irqsave triggers the kretprobe and installs
> > > kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> > > 
> > > The kretprobe_trampoline handler is then executed with already
> > > locked kretprobe_table_locks, and first thing it does is to
> > > lock kretprobe_table_locks ;-) the whole lockup path like:
> > > 
> > >   kprobe_flush_task
> > >     kretprobe_table_lock
> > >       raw_spin_lock_irqsave
> > >         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> > > 
> > >         ---> kretprobe_table_locks locked
> > > 
> > >         kretprobe_trampoline
> > >           trampoline_handler
> > >             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> > > 
> > > Adding kprobe_busy_begin/end helpers that mark code with fake
> > > probe installed to prevent triggering of another kprobe within
> > > this code.
> > > 
> > > Using these helpers in kprobe_flush_task, so the probe recursion
> > > protection check is hit and the probe is never set to prevent
> > > above lockup.
> > > 
> > 
> > Thanks Jiri!
> > 
> > Ingo, could you pick this up?
> 
> Ingo, any chance you could take this one?

Hi Ingo,

Should I make a pull request for all kprobes related patches to you?

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCHv2] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
  2020-05-01  2:01                         ` Masami Hiramatsu
@ 2020-05-07 10:15                           ` Jiri Olsa
  0 siblings, 0 replies; 24+ messages in thread
From: Jiri Olsa @ 2020-05-07 10:15 UTC (permalink / raw)
  To: Masami Hiramatsu, Thomas Gleixner
  Cc: Ingo Molnar, Ingo Molnar, Jiri Olsa, Naveen N. Rao,
	Anil S Keshavamurthy, David S. Miller, Peter Zijlstra, lkml,
	bibo,mao, Ziqian SUN (Zamir),
	stable

On Fri, May 01, 2020 at 11:01:07AM +0900, Masami Hiramatsu wrote:
> On Tue, 28 Apr 2020 23:36:27 +0200
> Jiri Olsa <jolsa@redhat.com> wrote:
> 
> > On Fri, Apr 17, 2020 at 04:38:10PM +0900, Masami Hiramatsu wrote:
> > 
> > SNIP
> > 
> > > > 
> > > > The code within the kretprobe handler checks for probe reentrancy,
> > > > so we won't trigger any _raw_spin_lock_irqsave probe in there.
> > > > 
> > > > The problem is in outside kprobe_flush_task, where we call:
> > > > 
> > > >   kprobe_flush_task
> > > >     kretprobe_table_lock
> > > >       raw_spin_lock_irqsave
> > > >         _raw_spin_lock_irqsave
> > > > 
> > > > where _raw_spin_lock_irqsave triggers the kretprobe and installs
> > > > kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> > > > 
> > > > The kretprobe_trampoline handler is then executed with already
> > > > locked kretprobe_table_locks, and first thing it does is to
> > > > lock kretprobe_table_locks ;-) the whole lockup path like:
> > > > 
> > > >   kprobe_flush_task
> > > >     kretprobe_table_lock
> > > >       raw_spin_lock_irqsave
> > > >         _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> > > > 
> > > >         ---> kretprobe_table_locks locked
> > > > 
> > > >         kretprobe_trampoline
> > > >           trampoline_handler
> > > >             kretprobe_hash_lock(current, &head, &flags);  <--- deadlock
> > > > 
> > > > Adding kprobe_busy_begin/end helpers that mark code with fake
> > > > probe installed to prevent triggering of another kprobe within
> > > > this code.
> > > > 
> > > > Using these helpers in kprobe_flush_task, so the probe recursion
> > > > protection check is hit and the probe is never set to prevent
> > > > above lockup.
> > > > 
> > > 
> > > Thanks Jiri!
> > > 
> > > Ingo, could you pick this up?
> > 
> > Ingo, any chance you could take this one?
> 
> Hi Ingo,
> 
> Should I make a pull request for all kprobes related patches to you?

looks like Ingo is offline, Thomas, could you please pull this one?

thanks,
jirka


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-05-07 10:15 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-08 16:46 [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task Jiri Olsa
2020-04-09  9:02 ` Naveen N. Rao
2020-04-09 18:43   ` Jiri Olsa
2020-04-09 12:38 ` Masami Hiramatsu
2020-04-09 12:52   ` Jiri Olsa
2020-04-09 14:16     ` Masami Hiramatsu
2020-04-09 13:16   ` Naveen N. Rao
2020-04-09 14:26     ` Masami Hiramatsu
2020-04-09 14:41 ` Masami Hiramatsu
2020-04-09 18:44   ` Jiri Olsa
2020-04-09 20:13     ` Jiri Olsa
2020-04-10  0:31       ` Masami Hiramatsu
2020-04-14 16:03         ` Jiri Olsa
2020-04-15  9:05           ` [PATCH] " Jiri Olsa
2020-04-16  1:55             ` Masami Hiramatsu
2020-04-16  9:13               ` Jiri Olsa
2020-04-16 13:42                 ` Masami Hiramatsu
2020-04-16 14:31                   ` [PATCHv2] " Jiri Olsa
2020-04-17  7:38                     ` Masami Hiramatsu
2020-04-28 21:36                       ` Jiri Olsa
2020-05-01  2:01                         ` Masami Hiramatsu
2020-05-07 10:15                           ` Jiri Olsa
2020-04-10  1:31       ` [RFC] " Ziqian SUN (Zamir)
2020-04-14 16:03         ` Jiri Olsa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).