All of lore.kernel.org
 help / color / mirror / Atom feed
* x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
@ 2020-08-24 12:02 Eddy_Wu
  2020-08-24 14:14 ` Peter Zijlstra
  2020-08-24 15:54 ` Masami Hiramatsu
  0 siblings, 2 replies; 33+ messages in thread
From: Eddy_Wu @ 2020-08-24 12:02 UTC (permalink / raw)
  To: Masami Hiramatsu, Peter Zijlstra; +Cc: linux-kernel, x86, David S. Miller

Greetings!

Starting from kernel 5.8 (x86_64), kretprobe handler will always missed if corresponding kprobe on function entry is not optimized (using break point instead).
Step to reproduce this:
1) Build the kretprobe example module (CONFIG_SAMPLE_KRETPROBES=m)
2) Disable jump optimization (`sysctl debug.kprobes-optimization=0` or register any kprobe.post_handler at same location)
3) Insert the kretprobe_example module
4) Launch some process to trigger _do_fork
5) Remove kretprobe_example module
6) dmesg shows that all probing instances are missed

Example output:
# sysctl debug.kprobes-optimization=0
debug.kprobes-optimization = 0
# insmod samples/kprobes/kretprobe_example.ko
# ls > /dev/null
# rmmod kretprobe_example
# dmesg
[48555.067295] Planted return probe at _do_fork: 0000000038ae0211
[48560.229459] kretprobe at 0000000038ae0211 unregistered
[48560.229460] Missed probing 3 instances of _do_fork

After bisecting, I found this behavior seems to introduce by this commit: (5.8-rc1)
0d00449c7a28a1514595630735df383dec606812 x86: Replace ist_enter() with nmi_enter()
This make kprobe_int3_handler() effectively running as NMI context, which pre_handler_kretprobe() explicitly checked to prevent recursion.

(in_nmi() check appears from v3.17)
f96f56780ca584930bb3a2769d73fd9a101bcbbe kprobes: Skip kretprobe hit in NMI context to avoid deadlock

To make kretprobe work again with int3 breakpoint, I think we can replace the in_nmi() check with in_nmi() == (1 << NMI_SHIFT) at kprobe_int3_handler() and skip kretprobe if nested NMI.
Did a quick test on 5.9-rc2 and it seems to be working.
I'm not sure if it is the best way to do since it may also require change to other architecture as well, any thought?


TREND MICRO EMAIL NOTICE

The information contained in this email and any attachments is confidential and may be subject to copyright or other intellectual property protection. If you are not the intended recipient, you are not authorized to use or disclose this information, and we request that you notify us by reply mail or telephone and delete the original message from your mail system.

For details about what personal information we collect and why, please see our Privacy Notice on our website at: Read privacy policy<http://www.trendmicro.com/privacy>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-24 12:02 x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint) Eddy_Wu
@ 2020-08-24 14:14 ` Peter Zijlstra
  2020-08-24 16:18   ` Eddy_Wu
  2020-08-24 18:15   ` Masami Hiramatsu
  2020-08-24 15:54 ` Masami Hiramatsu
  1 sibling, 2 replies; 33+ messages in thread
From: Peter Zijlstra @ 2020-08-24 14:14 UTC (permalink / raw)
  To: Eddy_Wu; +Cc: Masami Hiramatsu, linux-kernel, x86, David S. Miller

On Mon, Aug 24, 2020 at 12:02:58PM +0000, Eddy_Wu@trendmicro.com wrote:
> After bisecting, I found this behavior seems to introduce by this
> commit: (5.8-rc1) 0d00449c7a28a1514595630735df383dec606812 x86:
> Replace ist_enter() with nmi_enter() This make kprobe_int3_handler()
> effectively running as NMI context, which pre_handler_kretprobe()
> explicitly checked to prevent recursion.
> 
> (in_nmi() check appears from v3.17)
> f96f56780ca584930bb3a2769d73fd9a101bcbbe kprobes: Skip kretprobe hit
> in NMI context to avoid deadlock
> 
> To make kretprobe work again with int3 breakpoint, I think we can
> replace the in_nmi() check with in_nmi() == (1 << NMI_SHIFT) at
> kprobe_int3_handler() and skip kretprobe if nested NMI.  Did a quick
> test on 5.9-rc2 and it seems to be working.  I'm not sure if it is the
> best way to do since it may also require change to other architecture
> as well, any thought?

Masami, would it be possible to have a kretprobe specific recursion
count here?

I did the below, but i'm not at all sure that isn't horrible broken. I
can't really find many rp->lock sites and this might break things by
limiting contention.

---

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9be1bff4f586..0bff314cc800 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -153,6 +153,7 @@ struct kretprobe {
 	size_t data_size;
 	struct hlist_head free_instances;
 	raw_spinlock_t lock;
+	atomic_t recursion;
 };
 
 struct kretprobe_instance {
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 287b263c9cb9..27fd096bcb9a 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1934,22 +1934,17 @@ unsigned long __weak arch_deref_entry_point(void *entry)
 static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 {
 	struct kretprobe *rp = container_of(p, struct kretprobe, kp);
-	unsigned long hash, flags = 0;
 	struct kretprobe_instance *ri;
-
-	/*
-	 * To avoid deadlocks, prohibit return probing in NMI contexts,
-	 * just skip the probe and increase the (inexact) 'nmissed'
-	 * statistical counter, so that the user is informed that
-	 * something happened:
-	 */
-	if (unlikely(in_nmi())) {
-		rp->nmissed++;
-		return 0;
-	}
+	unsigned long hash, flags;
+	int rec;
 
 	/* TODO: consider to only swap the RA after the last pre_handler fired */
 	hash = hash_ptr(current, KPROBE_HASH_BITS);
+	rec = atomic_fetch_inc_acquire(&rp->recursion);
+	if (rec) {
+		rp->nmissed++;
+		goto out;
+	}
 	raw_spin_lock_irqsave(&rp->lock, flags);
 	if (!hlist_empty(&rp->free_instances)) {
 		ri = hlist_entry(rp->free_instances.first,
@@ -1964,7 +1959,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 			raw_spin_lock_irqsave(&rp->lock, flags);
 			hlist_add_head(&ri->hlist, &rp->free_instances);
 			raw_spin_unlock_irqrestore(&rp->lock, flags);
-			return 0;
+			goto out;
 		}
 
 		arch_prepare_kretprobe(ri, regs);
@@ -1978,6 +1973,8 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 		rp->nmissed++;
 		raw_spin_unlock_irqrestore(&rp->lock, flags);
 	}
+out:
+	atomic_dec(&rp->recursion);
 	return 0;
 }
 NOKPROBE_SYMBOL(pre_handler_kretprobe);


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-24 12:02 x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint) Eddy_Wu
  2020-08-24 14:14 ` Peter Zijlstra
@ 2020-08-24 15:54 ` Masami Hiramatsu
  2020-08-24 16:41   ` Eddy_Wu
  1 sibling, 1 reply; 33+ messages in thread
From: Masami Hiramatsu @ 2020-08-24 15:54 UTC (permalink / raw)
  To: Eddy_Wu; +Cc: Peter Zijlstra, linux-kernel, x86, David S. Miller

On Mon, 24 Aug 2020 12:02:58 +0000
"Eddy_Wu@trendmicro.com" <Eddy_Wu@trendmicro.com> wrote:

> Greetings!
> 
> Starting from kernel 5.8 (x86_64), kretprobe handler will always missed if corresponding kprobe on function entry is not optimized (using break point instead).

Oops, good catch. I always enabled ftrace hook for kretprobe, I didn't noticed that.

> Step to reproduce this:
> 1) Build the kretprobe example module (CONFIG_SAMPLE_KRETPROBES=m)
> 2) Disable jump optimization (`sysctl debug.kprobes-optimization=0` or register any kprobe.post_handler at same location)
> 3) Insert the kretprobe_example module
> 4) Launch some process to trigger _do_fork
> 5) Remove kretprobe_example module
> 6) dmesg shows that all probing instances are missed
> 
> Example output:
> # sysctl debug.kprobes-optimization=0
> debug.kprobes-optimization = 0
> # insmod samples/kprobes/kretprobe_example.ko
> # ls > /dev/null
> # rmmod kretprobe_example
> # dmesg
> [48555.067295] Planted return probe at _do_fork: 0000000038ae0211
> [48560.229459] kretprobe at 0000000038ae0211 unregistered
> [48560.229460] Missed probing 3 instances of _do_fork
> 
> After bisecting, I found this behavior seems to introduce by this commit: (5.8-rc1)
> 0d00449c7a28a1514595630735df383dec606812 x86: Replace ist_enter() with nmi_enter()
> This make kprobe_int3_handler() effectively running as NMI context, which pre_handler_kretprobe() explicitly checked to prevent recursion.

Thanks for the bisecting! 

> 
> (in_nmi() check appears from v3.17)
> f96f56780ca584930bb3a2769d73fd9a101bcbbe kprobes: Skip kretprobe hit in NMI context to avoid deadlock
> 
> To make kretprobe work again with int3 breakpoint, I think we can replace the in_nmi() check with in_nmi() == (1 << NMI_SHIFT) at kprobe_int3_handler() and skip kretprobe if nested NMI.

Ah, I see. Now int3 is a kind of NMI, so in the handler in_nmi() always returns !0.

> Did a quick test on 5.9-rc2 and it seems to be working.
> I'm not sure if it is the best way to do since it may also require change to other architecture as well, any thought?

Hmm, this behavior is arch-dependent. So I think we need an weak function like this.

@kernel/kprobes.c

bool __weak arch_kprobe_in_nmi(void)
{
	return in_nmi()
}

@arch/x86/kernel/kprobes/core.c

bool arch_kprobe_in_nmi(void)
{
       /*
        * Since the int3 is one of NMI, we have to check in_nmi() is
        * bigger than 1 << NMI_SHIFT instead of !0.
        */
       return in_nmi() > (1 << NMI_SHIFT);
}

And use arch_kprobe_in_nmi() instead of in_nmi() in kprobes.c.

Thanks,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-24 14:14 ` Peter Zijlstra
@ 2020-08-24 16:18   ` Eddy_Wu
  2020-08-24 18:15   ` Masami Hiramatsu
  1 sibling, 0 replies; 33+ messages in thread
From: Eddy_Wu @ 2020-08-24 16:18 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Masami Hiramatsu, linux-kernel, x86, David S. Miller

> -----Original Message-----
> From: Peter Zijlstra <peterz@infradead.org>
> Sent: Monday, August 24, 2020 10:14 PM
> To: Eddy Wu <Eddy_Wu@trendmicro.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>; linux-kernel@vger.kernel.org; x86@kernel.org; David S. Miller <davem@davemloft.net>
> Subject: Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
>
> On Mon, Aug 24, 2020 at 12:02:58PM +0000, Eddy_Wu@trendmicro.com wrote:
> > After bisecting, I found this behavior seems to introduce by this
> > commit: (5.8-rc1) 0d00449c7a28a1514595630735df383dec606812 x86:
> > Replace ist_enter() with nmi_enter() This make kprobe_int3_handler()
> > effectively running as NMI context, which pre_handler_kretprobe()
> > explicitly checked to prevent recursion.
> >
> > (in_nmi() check appears from v3.17)
> > f96f56780ca584930bb3a2769d73fd9a101bcbbe kprobes: Skip kretprobe hit
> > in NMI context to avoid deadlock
> >
> > To make kretprobe work again with int3 breakpoint, I think we can
> > replace the in_nmi() check with in_nmi() == (1 << NMI_SHIFT) at
> > kprobe_int3_handler() and skip kretprobe if nested NMI.  Did a quick
> > test on 5.9-rc2 and it seems to be working.  I'm not sure if it is the
> > best way to do since it may also require change to other architecture
> > as well, any thought?
>
> Masami, would it be possible to have a kretprobe specific recursion
> count here?
>
> I did the below, but i'm not at all sure that isn't horrible broken. I
> can't really find many rp->lock sites and this might break things by
> limiting contention.
>
> ---
>
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 9be1bff4f586..0bff314cc800 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -153,6 +153,7 @@ struct kretprobe {
>         size_t data_size;
>         struct hlist_head free_instances;
>         raw_spinlock_t lock;
> +       atomic_t recursion;
>  };
>
>  struct kretprobe_instance {
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 287b263c9cb9..27fd096bcb9a 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1934,22 +1934,17 @@ unsigned long __weak arch_deref_entry_point(void *entry)
>  static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  {
>         struct kretprobe *rp = container_of(p, struct kretprobe, kp);
> -       unsigned long hash, flags = 0;
>         struct kretprobe_instance *ri;
> -
> -       /*
> -        * To avoid deadlocks, prohibit return probing in NMI contexts,
> -        * just skip the probe and increase the (inexact) 'nmissed'
> -        * statistical counter, so that the user is informed that
> -        * something happened:
> -        */
> -       if (unlikely(in_nmi())) {
> -               rp->nmissed++;
> -               return 0;
> -       }
> +       unsigned long hash, flags;
> +       int rec;
>
>         /* TODO: consider to only swap the RA after the last pre_handler fired */
>         hash = hash_ptr(current, KPROBE_HASH_BITS);
> +       rec = atomic_fetch_inc_acquire(&rp->recursion);
> +       if (rec) {
> +               rp->nmissed++;
> +               goto out;
> +       }
>         raw_spin_lock_irqsave(&rp->lock, flags);
>         if (!hlist_empty(&rp->free_instances)) {
>                 ri = hlist_entry(rp->free_instances.first,
> @@ -1964,7 +1959,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>                         raw_spin_lock_irqsave(&rp->lock, flags);
>                         hlist_add_head(&ri->hlist, &rp->free_instances);
>                         raw_spin_unlock_irqrestore(&rp->lock, flags);
> -                       return 0;
> +                       goto out;
>                 }
>
>                 arch_prepare_kretprobe(ri, regs);
> @@ -1978,6 +1973,8 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>                 rp->nmissed++;
>                 raw_spin_unlock_irqrestore(&rp->lock, flags);
>         }
> +out:
> +       atomic_dec(&rp->recursion);
>         return 0;
>  }
>  NOKPROBE_SYMBOL(pre_handler_kretprobe);
>
I think kprobe_int3_handler() already prevented pre_handler_kretprobe() from recursing, we need to protect critical section in recycle_rp_inst() that might be interrupt by NMI.
There is another kretprobe_table_lock has other call site maybe be interrupt by NMI too
TREND MICRO EMAIL NOTICE

The information contained in this email and any attachments is confidential and may be subject to copyright or other intellectual property protection. If you are not the intended recipient, you are not authorized to use or disclose this information, and we request that you notify us by reply mail or telephone and delete the original message from your mail system.

For details about what personal information we collect and why, please see our Privacy Notice on our website at: Read privacy policy<http://www.trendmicro.com/privacy>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-24 15:54 ` Masami Hiramatsu
@ 2020-08-24 16:41   ` Eddy_Wu
  2020-08-25  6:15     ` Masami Hiramatsu
  0 siblings, 1 reply; 33+ messages in thread
From: Eddy_Wu @ 2020-08-24 16:41 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Peter Zijlstra, linux-kernel, x86, David S. Miller

> -----Original Message-----
> From: Masami Hiramatsu <mhiramat@kernel.org>
> Sent: Monday, August 24, 2020 11:54 PM
> To: Eddy Wu (RD-TW) <Eddy_Wu@trendmicro.com>
> Cc: Peter Zijlstra <peterz@infradead.org>; linux-kernel@vger.kernel.org; x86@kernel.org; David S. Miller <davem@davemloft.net>
> Subject: Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
>
>
> This message was sent from outside of Trend Micro. Please do not click links or open attachments unless you recognise the source of this
> email and know the content is safe.
>
>
> On Mon, 24 Aug 2020 12:02:58 +0000
> "Eddy_Wu@trendmicro.com" <Eddy_Wu@trendmicro.com> wrote:
>
> > Greetings!
> >
> > Starting from kernel 5.8 (x86_64), kretprobe handler will always missed if corresponding kprobe on function entry is not optimized
> (using break point instead).
>
> Oops, good catch. I always enabled ftrace hook for kretprobe, I didn't noticed that.
>
> > Step to reproduce this:
> > 1) Build the kretprobe example module (CONFIG_SAMPLE_KRETPROBES=m)
> > 2) Disable jump optimization (`sysctl debug.kprobes-optimization=0` or register any kprobe.post_handler at same location)
> > 3) Insert the kretprobe_example module
> > 4) Launch some process to trigger _do_fork
> > 5) Remove kretprobe_example module
> > 6) dmesg shows that all probing instances are missed
> >
> > Example output:
> > # sysctl debug.kprobes-optimization=0
> > debug.kprobes-optimization = 0
> > # insmod samples/kprobes/kretprobe_example.ko
> > # ls > /dev/null
> > # rmmod kretprobe_example
> > # dmesg
> > [48555.067295] Planted return probe at _do_fork: 0000000038ae0211
> > [48560.229459] kretprobe at 0000000038ae0211 unregistered
> > [48560.229460] Missed probing 3 instances of _do_fork
> >
> > After bisecting, I found this behavior seems to introduce by this commit: (5.8-rc1)
> > 0d00449c7a28a1514595630735df383dec606812 x86: Replace ist_enter() with nmi_enter()
> > This make kprobe_int3_handler() effectively running as NMI context, which pre_handler_kretprobe() explicitly checked to prevent
> recursion.
>
> Thanks for the bisecting!
>
> >
> > (in_nmi() check appears from v3.17)
> > f96f56780ca584930bb3a2769d73fd9a101bcbbe kprobes: Skip kretprobe hit in NMI context to avoid deadlock
> >
> > To make kretprobe work again with int3 breakpoint, I think we can replace the in_nmi() check with in_nmi() == (1 << NMI_SHIFT) at
> kprobe_int3_handler() and skip kretprobe if nested NMI.
>
> Ah, I see. Now int3 is a kind of NMI, so in the handler in_nmi() always returns !0.
>
> > Did a quick test on 5.9-rc2 and it seems to be working.
> > I'm not sure if it is the best way to do since it may also require change to other architecture as well, any thought?
>
> Hmm, this behavior is arch-dependent. So I think we need an weak function like this.
>
> @kernel/kprobes.c
>
> bool __weak arch_kprobe_in_nmi(void)
> {
>         return in_nmi()
> }
>
> @arch/x86/kernel/kprobes/core.c
>
> bool arch_kprobe_in_nmi(void)
> {
>        /*
>         * Since the int3 is one of NMI, we have to check in_nmi() is
>         * bigger than 1 << NMI_SHIFT instead of !0.
>         */
>        return in_nmi() > (1 << NMI_SHIFT);
> }
>
> And use arch_kprobe_in_nmi() instead of in_nmi() in kprobes.c.
>
> Thanks,
>
> --
> Masami Hiramatsu <mhiramat@kernel.org>

Kretprobe might still trigger from NMI with nmi counter == 1 (if entry kprobe is jump-optimized).
The arch- dependent weak function looks cleaner than doing this in kprobe_int3_handler() under x86/, but I don't know if there is a way to check if called by specific int3 handler or not.

My original patch below, need to change all architecture support kretprobe though

Thanks

---
 arch/x86/kernel/kprobes/core.c |  6 ++++++
 include/linux/kprobes.h        |  1 +
 kernel/kprobes.c               | 13 +------------
 3 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index fdadc37d72af..1b785aef85ef 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -699,6 +699,12 @@ int kprobe_int3_handler(struct pt_regs *regs)
 set_current_kprobe(p, regs, kcb);
 kcb->kprobe_status = KPROBE_HIT_ACTIVE;

+if (p->pre_handler == pre_handler_kretprobe && in_nmi() != (1 << NMI_SHIFT)) {
+struct kretprobe *rp = container_of(p, struct kretprobe, kp);
+rp->nmissed++;
+setup_singlestep(p, regs, kcb, 0);
+return 1;
+}
 /*
  * If we have no pre-handler or it returned 0, we
  * continue with normal processing.  If we have a
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9be1bff4f586..3ded8e46ada5 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -494,5 +494,6 @@ static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
 return false;
 return kprobe_fault_handler(regs, trap);
 }
+int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs);

 #endif /* _LINUX_KPROBES_H */
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 287b263c9cb9..0f4d61613ded 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1931,23 +1931,12 @@ unsigned long __weak arch_deref_entry_point(void *entry)
  * This kprobe pre_handler is registered with every kretprobe. When probe
  * hits it will set up the return probe.
  */
-static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
+int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 {
 struct kretprobe *rp = container_of(p, struct kretprobe, kp);
 unsigned long hash, flags = 0;
 struct kretprobe_instance *ri;

-/*
- * To avoid deadlocks, prohibit return probing in NMI contexts,
- * just skip the probe and increase the (inexact) 'nmissed'
- * statistical counter, so that the user is informed that
- * something happened:
- */
-if (unlikely(in_nmi())) {
-rp->nmissed++;
-return 0;
-}
-
 /* TODO: consider to only swap the RA after the last pre_handler fired */
 hash = hash_ptr(current, KPROBE_HASH_BITS);
 raw_spin_lock_irqsave(&rp->lock, flags);
--
2.17.1

TREND MICRO EMAIL NOTICE

The information contained in this email and any attachments is confidential and may be subject to copyright or other intellectual property protection. If you are not the intended recipient, you are not authorized to use or disclose this information, and we request that you notify us by reply mail or telephone and delete the original message from your mail system.

For details about what personal information we collect and why, please see our Privacy Notice on our website at: Read privacy policy<http://www.trendmicro.com/privacy>

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-24 14:14 ` Peter Zijlstra
  2020-08-24 16:18   ` Eddy_Wu
@ 2020-08-24 18:15   ` Masami Hiramatsu
  2020-08-25  7:36     ` peterz
  1 sibling, 1 reply; 33+ messages in thread
From: Masami Hiramatsu @ 2020-08-24 18:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Eddy_Wu, Masami Hiramatsu, linux-kernel, x86, David S. Miller

Hi Peter,

On Mon, 24 Aug 2020 16:14:29 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Aug 24, 2020 at 12:02:58PM +0000, Eddy_Wu@trendmicro.com wrote:
> > After bisecting, I found this behavior seems to introduce by this
> > commit: (5.8-rc1) 0d00449c7a28a1514595630735df383dec606812 x86:
> > Replace ist_enter() with nmi_enter() This make kprobe_int3_handler()
> > effectively running as NMI context, which pre_handler_kretprobe()
> > explicitly checked to prevent recursion.
> > 
> > (in_nmi() check appears from v3.17)
> > f96f56780ca584930bb3a2769d73fd9a101bcbbe kprobes: Skip kretprobe hit
> > in NMI context to avoid deadlock
> > 
> > To make kretprobe work again with int3 breakpoint, I think we can
> > replace the in_nmi() check with in_nmi() == (1 << NMI_SHIFT) at
> > kprobe_int3_handler() and skip kretprobe if nested NMI.  Did a quick
> > test on 5.9-rc2 and it seems to be working.  I'm not sure if it is the
> > best way to do since it may also require change to other architecture
> > as well, any thought?
> 
> Masami, would it be possible to have a kretprobe specific recursion
> count here?

Hmm, good point. As I commented in f96f56780ca5 ("kprobes: Skip kretprobe
hit in NMI context to avoid deadlock"), this check is for avoiding the
deadlock with kretprobe_table_lock which is used in pre_handler *and*
kretprobe trampoline handler.

> 
> I did the below, but i'm not at all sure that isn't horrible broken. I
> can't really find many rp->lock sites and this might break things by
> limiting contention.

This is not enough. For checking the recursion of kretprobes, we might
need kretprobe_table_trylock() or kretprobe_table_busy() (but both
can be false positive)

Note that rp->lock shouldn't matter unless we will support recursive
kprobe itself. (even though, we can use raw_spin_trylock_irqsave())

Thank you,

> 
> ---
> 
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 9be1bff4f586..0bff314cc800 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -153,6 +153,7 @@ struct kretprobe {
>  	size_t data_size;
>  	struct hlist_head free_instances;
>  	raw_spinlock_t lock;
> +	atomic_t recursion;
>  };
>  
>  struct kretprobe_instance {
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 287b263c9cb9..27fd096bcb9a 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1934,22 +1934,17 @@ unsigned long __weak arch_deref_entry_point(void *entry)
>  static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  {
>  	struct kretprobe *rp = container_of(p, struct kretprobe, kp);
> -	unsigned long hash, flags = 0;
>  	struct kretprobe_instance *ri;
> -
> -	/*
> -	 * To avoid deadlocks, prohibit return probing in NMI contexts,
> -	 * just skip the probe and increase the (inexact) 'nmissed'
> -	 * statistical counter, so that the user is informed that
> -	 * something happened:
> -	 */
> -	if (unlikely(in_nmi())) {
> -		rp->nmissed++;
> -		return 0;
> -	}
> +	unsigned long hash, flags;
> +	int rec;
>  
>  	/* TODO: consider to only swap the RA after the last pre_handler fired */
>  	hash = hash_ptr(current, KPROBE_HASH_BITS);
> +	rec = atomic_fetch_inc_acquire(&rp->recursion);
> +	if (rec) {
> +		rp->nmissed++;
> +		goto out;
> +	}
>  	raw_spin_lock_irqsave(&rp->lock, flags);
>  	if (!hlist_empty(&rp->free_instances)) {
>  		ri = hlist_entry(rp->free_instances.first,
> @@ -1964,7 +1959,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  			raw_spin_lock_irqsave(&rp->lock, flags);
>  			hlist_add_head(&ri->hlist, &rp->free_instances);
>  			raw_spin_unlock_irqrestore(&rp->lock, flags);
> -			return 0;
> +			goto out;
>  		}
>  
>  		arch_prepare_kretprobe(ri, regs);
> @@ -1978,6 +1973,8 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  		rp->nmissed++;
>  		raw_spin_unlock_irqrestore(&rp->lock, flags);
>  	}
> +out:
> +	atomic_dec(&rp->recursion);
>  	return 0;
>  }
>  NOKPROBE_SYMBOL(pre_handler_kretprobe);
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-24 16:41   ` Eddy_Wu
@ 2020-08-25  6:15     ` Masami Hiramatsu
  2020-08-25  8:33       ` Eddy_Wu
                         ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Masami Hiramatsu @ 2020-08-25  6:15 UTC (permalink / raw)
  To: Eddy_Wu; +Cc: Peter Zijlstra, linux-kernel, x86, David S. Miller

Hi Eddy,

On Mon, 24 Aug 2020 16:41:58 +0000
"Eddy_Wu@trendmicro.com" <Eddy_Wu@trendmicro.com> wrote:

> > -----Original Message-----
> > From: Masami Hiramatsu <mhiramat@kernel.org>
> > Sent: Monday, August 24, 2020 11:54 PM
> > To: Eddy Wu (RD-TW) <Eddy_Wu@trendmicro.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>; linux-kernel@vger.kernel.org; x86@kernel.org; David S. Miller <davem@davemloft.net>
> > Subject: Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
> >
> >
> > This message was sent from outside of Trend Micro. Please do not click links or open attachments unless you recognise the source of this
> > email and know the content is safe.
> >
> >
> > On Mon, 24 Aug 2020 12:02:58 +0000
> > "Eddy_Wu@trendmicro.com" <Eddy_Wu@trendmicro.com> wrote:
> >
> > > Greetings!
> > >
> > > Starting from kernel 5.8 (x86_64), kretprobe handler will always missed if corresponding kprobe on function entry is not optimized
> > (using break point instead).
> >
> > Oops, good catch. I always enabled ftrace hook for kretprobe, I didn't noticed that.
> >
> > > Step to reproduce this:
> > > 1) Build the kretprobe example module (CONFIG_SAMPLE_KRETPROBES=m)
> > > 2) Disable jump optimization (`sysctl debug.kprobes-optimization=0` or register any kprobe.post_handler at same location)
> > > 3) Insert the kretprobe_example module
> > > 4) Launch some process to trigger _do_fork
> > > 5) Remove kretprobe_example module
> > > 6) dmesg shows that all probing instances are missed
> > >
> > > Example output:
> > > # sysctl debug.kprobes-optimization=0
> > > debug.kprobes-optimization = 0
> > > # insmod samples/kprobes/kretprobe_example.ko
> > > # ls > /dev/null
> > > # rmmod kretprobe_example
> > > # dmesg
> > > [48555.067295] Planted return probe at _do_fork: 0000000038ae0211
> > > [48560.229459] kretprobe at 0000000038ae0211 unregistered
> > > [48560.229460] Missed probing 3 instances of _do_fork
> > >
> > > After bisecting, I found this behavior seems to introduce by this commit: (5.8-rc1)
> > > 0d00449c7a28a1514595630735df383dec606812 x86: Replace ist_enter() with nmi_enter()
> > > This make kprobe_int3_handler() effectively running as NMI context, which pre_handler_kretprobe() explicitly checked to prevent
> > recursion.
> >
> > Thanks for the bisecting!
> >
> > >
> > > (in_nmi() check appears from v3.17)
> > > f96f56780ca584930bb3a2769d73fd9a101bcbbe kprobes: Skip kretprobe hit in NMI context to avoid deadlock
> > >
> > > To make kretprobe work again with int3 breakpoint, I think we can replace the in_nmi() check with in_nmi() == (1 << NMI_SHIFT) at
> > kprobe_int3_handler() and skip kretprobe if nested NMI.
> >
> > Ah, I see. Now int3 is a kind of NMI, so in the handler in_nmi() always returns !0.
> >
> > > Did a quick test on 5.9-rc2 and it seems to be working.
> > > I'm not sure if it is the best way to do since it may also require change to other architecture as well, any thought?
> >
> > Hmm, this behavior is arch-dependent. So I think we need an weak function like this.
> >
> > @kernel/kprobes.c
> >
> > bool __weak arch_kprobe_in_nmi(void)
> > {
> >         return in_nmi()
> > }
> >
> > @arch/x86/kernel/kprobes/core.c
> >
> > bool arch_kprobe_in_nmi(void)
> > {
> >        /*
> >         * Since the int3 is one of NMI, we have to check in_nmi() is
> >         * bigger than 1 << NMI_SHIFT instead of !0.
> >         */
> >        return in_nmi() > (1 << NMI_SHIFT);
> > }
> >
> > And use arch_kprobe_in_nmi() instead of in_nmi() in kprobes.c.
> >
> > Thanks,
> >
> > --
> > Masami Hiramatsu <mhiramat@kernel.org>
> 
> Kretprobe might still trigger from NMI with nmi counter == 1 (if entry kprobe is jump-optimized).

Ah, right. Hmm, in that case, we can store the int3 status in 
the kprobe_ctlblk and refer it in the handler.


> The arch- dependent weak function looks cleaner than doing this in kprobe_int3_handler() under x86/, but I don't know if there is a way to check if called by specific int3 handler or not.
> 
> My original patch below, need to change all architecture support kretprobe though

OK, here is my fix. This will not change the other arches. please try it.


From 24390dffe6eb9a3e95f7d46a528a1dcfd716dc81 Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat@kernel.org>
Date: Tue, 25 Aug 2020 01:37:00 +0900
Subject: [PATCH] kprobes/x86: Fixes NMI context check on x86

Since commit 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
made int3 as one of NMI, in_nmi() in kprobe handlers always returns !0.
Thus the kretprobe handlers always skipped the execution on x86 if it
is using int3. (CONFIG_KPROBES_ON_FTRACE=n and
echo 0 > /proc/sys/debug/kprobe_optimization)

To avoid this issue, introduce arch_kprobe_in_nmi() and check the
in_nmi() count is bigger than 1 << NMI_SHIFT on x86 if the handler
has been invoked from kprobe_int3_handler. By default, the
arch_kprobe_in_nmi() will be same as in_nmi().

Fixes: 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
Reported-by: Eddy Wu <Eddy_Wu@trendmicro.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
---
 arch/x86/include/asm/kprobes.h |  1 +
 arch/x86/kernel/kprobes/core.c | 18 ++++++++++++++++++
 kernel/kprobes.c               |  8 +++++++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 143bc9abe99c..ddb24feb95ad 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -98,6 +98,7 @@ struct kprobe_ctlblk {
 	unsigned long kprobe_old_flags;
 	unsigned long kprobe_saved_flags;
 	struct prev_kprobe prev_kprobe;
+	bool 	in_int3;
 };
 
 extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 2ca10b770cff..649d467c8231 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -583,6 +583,20 @@ static nokprobe_inline void restore_btf(void)
 	}
 }
 
+bool arch_kprobe_in_nmi(void)
+{
+	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+
+	if (kcb->in_int3) {
+		/*
+		 * Since the int3 is one of NMI, we have to check in_nmi() is
+		 * bigger than 1 << NMI_SHIFT instead of !0.
+		 */
+		return in_nmi() > (1 << NMI_SHIFT);
+	} else
+		return in_nmi();
+}
+
 void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
 {
 	unsigned long *sara = stack_addr(regs);
@@ -697,6 +711,7 @@ int kprobe_int3_handler(struct pt_regs *regs)
 				return 1;
 		} else {
 			set_current_kprobe(p, regs, kcb);
+			kcb->in_int3 = true;
 			kcb->kprobe_status = KPROBE_HIT_ACTIVE;
 
 			/*
@@ -710,6 +725,7 @@ int kprobe_int3_handler(struct pt_regs *regs)
 				setup_singlestep(p, regs, kcb, 0);
 			else
 				reset_current_kprobe();
+			kcb->in_int3 = false;
 			return 1;
 		}
 	} else if (*addr != INT3_INSN_OPCODE) {
@@ -994,7 +1010,9 @@ int kprobe_debug_handler(struct pt_regs *regs)
 
 	if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
 		kcb->kprobe_status = KPROBE_HIT_SSDONE;
+		kcb->in_int3 = true;
 		cur->post_handler(cur, regs, 0);
+		kcb->in_int3 = false;
 	}
 
 	/* Restore back the original saved kprobes variables and continue. */
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 287b263c9cb9..9564928fb882 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1927,6 +1927,12 @@ unsigned long __weak arch_deref_entry_point(void *entry)
 }
 
 #ifdef CONFIG_KRETPROBES
+
+bool __weak arch_kprobe_in_nmi(void)
+{
+	return in_nmi();
+}
+
 /*
  * This kprobe pre_handler is registered with every kretprobe. When probe
  * hits it will set up the return probe.
@@ -1943,7 +1949,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 	 * statistical counter, so that the user is informed that
 	 * something happened:
 	 */
-	if (unlikely(in_nmi())) {
+	if (unlikely(arch_kprobe_in_nmi())) {
 		rp->nmissed++;
 		return 0;
 	}
-- 
2.25.1


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-24 18:15   ` Masami Hiramatsu
@ 2020-08-25  7:36     ` peterz
  0 siblings, 0 replies; 33+ messages in thread
From: peterz @ 2020-08-25  7:36 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Eddy_Wu, linux-kernel, x86, David S. Miller

On Tue, Aug 25, 2020 at 03:15:03AM +0900, Masami Hiramatsu wrote:

> > I did the below, but i'm not at all sure that isn't horrible broken. I
> > can't really find many rp->lock sites and this might break things by
> > limiting contention.
> 
> This is not enough. 

I was afraid of that..

> For checking the recursion of kretprobes, we might
> need kretprobe_table_trylock() or kretprobe_table_busy() (but both
> can be false positive)

Agreed.

> Note that rp->lock shouldn't matter unless we will support recursive
> kprobe itself. (even though, we can use raw_spin_trylock_irqsave())

If the deadlock mentioned isn't about rp->lock, then what it is about?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-25  6:15     ` Masami Hiramatsu
@ 2020-08-25  8:33       ` Eddy_Wu
  2020-08-25 11:06         ` kernel test robot
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: Eddy_Wu @ 2020-08-25  8:33 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Peter Zijlstra, linux-kernel, x86, David S. Miller


> -----Original Message-----
> From: Masami Hiramatsu <mhiramat@kernel.org>
> Sent: Tuesday, August 25, 2020 2:16 PM
> To: Eddy Wu (RD-TW) <Eddy_Wu@trendmicro.com>
> Cc: Peter Zijlstra <peterz@infradead.org>; linux-kernel@vger.kernel.org; x86@kernel.org; David S. Miller <davem@davemloft.net>
> Subject: Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
>
>
> This message was sent from outside of Trend Micro. Please do not click links or open attachments unless you recognise the source of this
> email and know the content is safe.
>
>
> Hi Eddy,
>
> On Mon, 24 Aug 2020 16:41:58 +0000
> "Eddy_Wu@trendmicro.com" <Eddy_Wu@trendmicro.com> wrote:
>
> > > -----Original Message-----
> > > From: Masami Hiramatsu <mhiramat@kernel.org>
> > > Sent: Monday, August 24, 2020 11:54 PM
> > > To: Eddy Wu (RD-TW) <Eddy_Wu@trendmicro.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>; linux-kernel@vger.kernel.org; x86@kernel.org; David S. Miller <davem@davemloft.net>
> > > Subject: Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
> > >
> > >
> > > This message was sent from outside of Trend Micro. Please do not click links or open attachments unless you recognise the source of
> this
> > > email and know the content is safe.
> > >
> > >
> > > On Mon, 24 Aug 2020 12:02:58 +0000
> > > "Eddy_Wu@trendmicro.com" <Eddy_Wu@trendmicro.com> wrote:
> > >
> > > > Greetings!
> > > >
> > > > Starting from kernel 5.8 (x86_64), kretprobe handler will always missed if corresponding kprobe on function entry is not optimized
> > > (using break point instead).
> > >
> > > Oops, good catch. I always enabled ftrace hook for kretprobe, I didn't noticed that.
> > >
> > > > Step to reproduce this:
> > > > 1) Build the kretprobe example module (CONFIG_SAMPLE_KRETPROBES=m)
> > > > 2) Disable jump optimization (`sysctl debug.kprobes-optimization=0` or register any kprobe.post_handler at same location)
> > > > 3) Insert the kretprobe_example module
> > > > 4) Launch some process to trigger _do_fork
> > > > 5) Remove kretprobe_example module
> > > > 6) dmesg shows that all probing instances are missed
> > > >
> > > > Example output:
> > > > # sysctl debug.kprobes-optimization=0
> > > > debug.kprobes-optimization = 0
> > > > # insmod samples/kprobes/kretprobe_example.ko
> > > > # ls > /dev/null
> > > > # rmmod kretprobe_example
> > > > # dmesg
> > > > [48555.067295] Planted return probe at _do_fork: 0000000038ae0211
> > > > [48560.229459] kretprobe at 0000000038ae0211 unregistered
> > > > [48560.229460] Missed probing 3 instances of _do_fork
> > > >
> > > > After bisecting, I found this behavior seems to introduce by this commit: (5.8-rc1)
> > > > 0d00449c7a28a1514595630735df383dec606812 x86: Replace ist_enter() with nmi_enter()
> > > > This make kprobe_int3_handler() effectively running as NMI context, which pre_handler_kretprobe() explicitly checked to prevent
> > > recursion.
> > >
> > > Thanks for the bisecting!
> > >
> > > >
> > > > (in_nmi() check appears from v3.17)
> > > > f96f56780ca584930bb3a2769d73fd9a101bcbbe kprobes: Skip kretprobe hit in NMI context to avoid deadlock
> > > >
> > > > To make kretprobe work again with int3 breakpoint, I think we can replace the in_nmi() check with in_nmi() == (1 << NMI_SHIFT) at
> > > kprobe_int3_handler() and skip kretprobe if nested NMI.
> > >
> > > Ah, I see. Now int3 is a kind of NMI, so in the handler in_nmi() always returns !0.
> > >
> > > > Did a quick test on 5.9-rc2 and it seems to be working.
> > > > I'm not sure if it is the best way to do since it may also require change to other architecture as well, any thought?
> > >
> > > Hmm, this behavior is arch-dependent. So I think we need an weak function like this.
> > >
> > > @kernel/kprobes.c
> > >
> > > bool __weak arch_kprobe_in_nmi(void)
> > > {
> > >         return in_nmi()
> > > }
> > >
> > > @arch/x86/kernel/kprobes/core.c
> > >
> > > bool arch_kprobe_in_nmi(void)
> > > {
> > >        /*
> > >         * Since the int3 is one of NMI, we have to check in_nmi() is
> > >         * bigger than 1 << NMI_SHIFT instead of !0.
> > >         */
> > >        return in_nmi() > (1 << NMI_SHIFT);
> > > }
> > >
> > > And use arch_kprobe_in_nmi() instead of in_nmi() in kprobes.c.
> > >
> > > Thanks,
> > >
> > > --
> > > Masami Hiramatsu <mhiramat@kernel.org>
> >
> > Kretprobe might still trigger from NMI with nmi counter == 1 (if entry kprobe is jump-optimized).
>
> Ah, right. Hmm, in that case, we can store the int3 status in
> the kprobe_ctlblk and refer it in the handler.
>
>
> > The arch- dependent weak function looks cleaner than doing this in kprobe_int3_handler() under x86/, but I don't know if there is a
> way to check if called by specific int3 handler or not.
> >
> > My original patch below, need to change all architecture support kretprobe though
>
> OK, here is my fix. This will not change the other arches. please try it.
>
>
> From 24390dffe6eb9a3e95f7d46a528a1dcfd716dc81 Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu <mhiramat@kernel.org>
> Date: Tue, 25 Aug 2020 01:37:00 +0900
> Subject: [PATCH] kprobes/x86: Fixes NMI context check on x86
>
> Since commit 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
> made int3 as one of NMI, in_nmi() in kprobe handlers always returns !0.
> Thus the kretprobe handlers always skipped the execution on x86 if it
> is using int3. (CONFIG_KPROBES_ON_FTRACE=n and
> echo 0 > /proc/sys/debug/kprobe_optimization)
>
> To avoid this issue, introduce arch_kprobe_in_nmi() and check the
> in_nmi() count is bigger than 1 << NMI_SHIFT on x86 if the handler
> has been invoked from kprobe_int3_handler. By default, the
> arch_kprobe_in_nmi() will be same as in_nmi().
>
> Fixes: 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
> Reported-by: Eddy Wu <Eddy_Wu@trendmicro.com>
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: stable@vger.kernel.org
> ---
>  arch/x86/include/asm/kprobes.h |  1 +
>  arch/x86/kernel/kprobes/core.c | 18 ++++++++++++++++++
>  kernel/kprobes.c               |  8 +++++++-
>  3 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
> index 143bc9abe99c..ddb24feb95ad 100644
> --- a/arch/x86/include/asm/kprobes.h
> +++ b/arch/x86/include/asm/kprobes.h
> @@ -98,6 +98,7 @@ struct kprobe_ctlblk {
>         unsigned long kprobe_old_flags;
>         unsigned long kprobe_saved_flags;
>         struct prev_kprobe prev_kprobe;
> +       bool    in_int3;
>  };
>
>  extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 2ca10b770cff..649d467c8231 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -583,6 +583,20 @@ static nokprobe_inline void restore_btf(void)
>         }
>  }
>
> +bool arch_kprobe_in_nmi(void)
> +{
> +       struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
> +
> +       if (kcb->in_int3) {
> +               /*
> +                * Since the int3 is one of NMI, we have to check in_nmi() is
> +                * bigger than 1 << NMI_SHIFT instead of !0.
> +                */
> +               return in_nmi() > (1 << NMI_SHIFT);
> +       } else
> +               return in_nmi();
> +}
> +
>  void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
>  {
>         unsigned long *sara = stack_addr(regs);
> @@ -697,6 +711,7 @@ int kprobe_int3_handler(struct pt_regs *regs)
>                                 return 1;
>                 } else {
>                         set_current_kprobe(p, regs, kcb);
> +                       kcb->in_int3 = true;
>                         kcb->kprobe_status = KPROBE_HIT_ACTIVE;
>
>                         /*
> @@ -710,6 +725,7 @@ int kprobe_int3_handler(struct pt_regs *regs)
>                                 setup_singlestep(p, regs, kcb, 0);
>                         else
>                                 reset_current_kprobe();
> +                       kcb->in_int3 = false;
>                         return 1;
>                 }
>         } else if (*addr != INT3_INSN_OPCODE) {
> @@ -994,7 +1010,9 @@ int kprobe_debug_handler(struct pt_regs *regs)
>
>         if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
>                 kcb->kprobe_status = KPROBE_HIT_SSDONE;
> +               kcb->in_int3 = true;
>                 cur->post_handler(cur, regs, 0);
> +               kcb->in_int3 = false;
>         }
>
>         /* Restore back the original saved kprobes variables and continue. */
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 287b263c9cb9..9564928fb882 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1927,6 +1927,12 @@ unsigned long __weak arch_deref_entry_point(void *entry)
>  }
>
>  #ifdef CONFIG_KRETPROBES
> +
> +bool __weak arch_kprobe_in_nmi(void)
> +{
> +       return in_nmi();
> +}
> +
>  /*
>   * This kprobe pre_handler is registered with every kretprobe. When probe
>   * hits it will set up the return probe.
> @@ -1943,7 +1949,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>          * statistical counter, so that the user is informed that
>          * something happened:
>          */
> -       if (unlikely(in_nmi())) {
> +       if (unlikely(arch_kprobe_in_nmi())) {
>                 rp->nmissed++;
>                 return 0;
>         }
> --
> 2.25.1
>
>
> --
> Masami Hiramatsu <mhiramat@kernel.org>

This works on my machine, thanks!

TREND MICRO EMAIL NOTICE

The information contained in this email and any attachments is confidential and may be subject to copyright or other intellectual property protection. If you are not the intended recipient, you are not authorized to use or disclose this information, and we request that you notify us by reply mail or telephone and delete the original message from your mail system.

For details about what personal information we collect and why, please see our Privacy Notice on our website at: Read privacy policy<http://www.trendmicro.com/privacy>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kprobes/x86: Fixes NMI context check on x86
  2020-08-25  6:15     ` Masami Hiramatsu
@ 2020-08-25 11:06         ` kernel test robot
  2020-08-25 11:06         ` kernel test robot
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2020-08-25 11:06 UTC (permalink / raw)
  To: Masami Hiramatsu, Eddy_Wu; +Cc: kbuild-all, Peter Zijlstra, linux-kernel, x86

[-- Attachment #1: Type: text/plain, Size: 2305 bytes --]

Hi Masami,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on tip/auto-latest linux/master linus/master v5.9-rc2 next-20200825]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Masami-Hiramatsu/kprobes-x86-Fixes-NMI-context-check-on-x86/20200825-141829
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ef2ff0f5d6008d325c9a068e20981c0d0acc4d6b
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
        # save the attached .config to linux build tree
        make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/kprobes/core.c:573:6: warning: no previous prototype for 'arch_kprobe_in_nmi' [-Wmissing-prototypes]
     573 | bool arch_kprobe_in_nmi(void)
         |      ^~~~~~~~~~~~~~~~~~
   arch/x86/kernel/kprobes/core.c:776:24: warning: no previous prototype for 'trampoline_handler' [-Wmissing-prototypes]
     776 | __used __visible void *trampoline_handler(struct pt_regs *regs)
         |                        ^~~~~~~~~~~~~~~~~~

# https://github.com/0day-ci/linux/commit/2c75bf59d2f4fcecab1f1099a04ed17049df8e0a
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Masami-Hiramatsu/kprobes-x86-Fixes-NMI-context-check-on-x86/20200825-141829
git checkout 2c75bf59d2f4fcecab1f1099a04ed17049df8e0a
vim +/arch_kprobe_in_nmi +573 arch/x86/kernel/kprobes/core.c

   572	
 > 573	bool arch_kprobe_in_nmi(void)
   574	{
   575		struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
   576	
   577		if (kcb->in_int3) {
   578			/*
   579			 * Since the int3 is one of NMI, we have to check in_nmi() is
   580			 * bigger than 1 << NMI_SHIFT instead of !0.
   581			 */
   582			return in_nmi() > (1 << NMI_SHIFT);
   583		} else
   584			return in_nmi();
   585	}
   586	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 74141 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kprobes/x86: Fixes NMI context check on x86
@ 2020-08-25 11:06         ` kernel test robot
  0 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2020-08-25 11:06 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2360 bytes --]

Hi Masami,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on tip/auto-latest linux/master linus/master v5.9-rc2 next-20200825]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Masami-Hiramatsu/kprobes-x86-Fixes-NMI-context-check-on-x86/20200825-141829
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ef2ff0f5d6008d325c9a068e20981c0d0acc4d6b
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
        # save the attached .config to linux build tree
        make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/kprobes/core.c:573:6: warning: no previous prototype for 'arch_kprobe_in_nmi' [-Wmissing-prototypes]
     573 | bool arch_kprobe_in_nmi(void)
         |      ^~~~~~~~~~~~~~~~~~
   arch/x86/kernel/kprobes/core.c:776:24: warning: no previous prototype for 'trampoline_handler' [-Wmissing-prototypes]
     776 | __used __visible void *trampoline_handler(struct pt_regs *regs)
         |                        ^~~~~~~~~~~~~~~~~~

# https://github.com/0day-ci/linux/commit/2c75bf59d2f4fcecab1f1099a04ed17049df8e0a
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Masami-Hiramatsu/kprobes-x86-Fixes-NMI-context-check-on-x86/20200825-141829
git checkout 2c75bf59d2f4fcecab1f1099a04ed17049df8e0a
vim +/arch_kprobe_in_nmi +573 arch/x86/kernel/kprobes/core.c

   572	
 > 573	bool arch_kprobe_in_nmi(void)
   574	{
   575		struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
   576	
   577		if (kcb->in_int3) {
   578			/*
   579			 * Since the int3 is one of NMI, we have to check in_nmi() is
   580			 * bigger than 1 << NMI_SHIFT instead of !0.
   581			 */
   582			return in_nmi() > (1 << NMI_SHIFT);
   583		} else
   584			return in_nmi();
   585	}
   586	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 74141 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-25  6:15     ` Masami Hiramatsu
  2020-08-25  8:33       ` Eddy_Wu
  2020-08-25 11:06         ` kernel test robot
@ 2020-08-25 12:09       ` peterz
  2020-08-25 13:15         ` Masami Hiramatsu
                           ` (2 more replies)
  2020-08-25 12:20         ` kernel test robot
  2020-08-25 12:25         ` kernel test robot
  4 siblings, 3 replies; 33+ messages in thread
From: peterz @ 2020-08-25 12:09 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Eddy_Wu, linux-kernel, x86, David S. Miller

On Tue, Aug 25, 2020 at 03:15:38PM +0900, Masami Hiramatsu wrote:

> From 24390dffe6eb9a3e95f7d46a528a1dcfd716dc81 Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu <mhiramat@kernel.org>
> Date: Tue, 25 Aug 2020 01:37:00 +0900
> Subject: [PATCH] kprobes/x86: Fixes NMI context check on x86
> 
> Since commit 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
> made int3 as one of NMI, in_nmi() in kprobe handlers always returns !0.
> Thus the kretprobe handlers always skipped the execution on x86 if it
> is using int3. (CONFIG_KPROBES_ON_FTRACE=n and
> echo 0 > /proc/sys/debug/kprobe_optimization)
> 
> To avoid this issue, introduce arch_kprobe_in_nmi() and check the
> in_nmi() count is bigger than 1 << NMI_SHIFT on x86 if the handler
> has been invoked from kprobe_int3_handler. By default, the
> arch_kprobe_in_nmi() will be same as in_nmi().

So I still hate all of this, it feels like it's making a bad hack worse.

> Fixes: 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

I think there's another problem, when you run this code with lockdep
enabled it'll complain very loudly. Lockdep doesn't like using locks
from NMI context much.

Can't we take one step back. Why are we taking locks from kprobe
context? I was under the impression that kprobes were lockless (for good
reasons), why does kretprobe need one? And can't we fix that?

Looking at the code it mucks about with with an hlist.

So on pre_handler is tries and take a kretprobe_instance from a list,
and fails when there isn't one. This looks like per-task-per-retprobe
data. Also, all of that is only returned when the task dies.

Surely we can do a lockless list for this. We have llist_add() and
llist_del_first() to make a lockless LIFO/stack.


/me frobs code....


Oooohh, another lock :-( kretprobe_table_lock

Bah bah bah, what a mess, surely we have a lockless hash-table
somewhere? /me goes rummage around. Nope we don't.

Lightbulb!

That whole hash-table is nonsense, we don't need it. All it does is
provide a per-task hlist. We can just stick a llist_head in task_struct
itself and delete all that.


/me frobs more code...

argh, arch/ code uses this


damn... one last problem is dangling instances.. so close.
We can apparently unregister a kretprobe while there's still active
kretprobe_instance's out referencing it.

Ignoring that issue for the moment, the below seems to actually work.

---
Subject: kprobe: Rewrite kretprobe to not use locks

Use llist based atomic stack for rp->free_instances, and a llist based
non-atomic stack for current->kretprobe_instances. The latter is only
ever referenced from the task context and properly nests, so even if it
gets interrupted, the interrupt should pop whatever it pushed on the
stack before returning.

XXX unregister_kretprobe*() vs active instances is broken.
XXX everything !x86 is broken

Not-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/kprobes/core.c      |  75 ++++++-------------
 drivers/gpu/drm/i915/i915_request.c |   6 --
 include/linux/kprobes.h             |   8 +-
 include/linux/llist.h               |  15 ++++
 include/linux/sched.h               |   4 +
 kernel/fork.c                       |   4 +
 kernel/kprobes.c                    | 145 +++++++++---------------------------
 7 files changed, 87 insertions(+), 170 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 2ca10b770cff..311c26ef0fc2 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -772,14 +772,12 @@ STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
  */
 __used __visible void *trampoline_handler(struct pt_regs *regs)
 {
-	struct kretprobe_instance *ri = NULL;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long flags, orig_ret_address = 0;
 	unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
 	kprobe_opcode_t *correct_ret_addr = NULL;
+	struct kretprobe_instance *ri = NULL;
+	struct llist_node *node, *first;
+	unsigned long orig_ret_address;
 	void *frame_pointer;
-	bool skipped = false;
 
 	/*
 	 * Set a dummy kprobe for avoiding kretprobe recursion.
@@ -788,8 +786,6 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 	 */
 	kprobe_busy_begin();
 
-	INIT_HLIST_HEAD(&empty_rp);
-	kretprobe_hash_lock(current, &head, &flags);
 	/* fixup registers */
 	regs->cs = __KERNEL_CS;
 #ifdef CONFIG_X86_32
@@ -813,48 +809,37 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 	 *	 will be the real return address, and all the rest will
 	 *	 point to kretprobe_trampoline.
 	 */
-	hlist_for_each_entry(ri, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-		/*
-		 * Return probes must be pushed on this hash list correct
-		 * order (same as return order) so that it can be popped
-		 * correctly. However, if we find it is pushed it incorrect
-		 * order, this means we find a function which should not be
-		 * probed, because the wrong order entry is pushed on the
-		 * path of processing other kretprobe itself.
-		 */
-		if (ri->fp != frame_pointer) {
-			if (!skipped)
-				pr_warn("kretprobe is stacked incorrectly. Trying to fixup.\n");
-			skipped = true;
-			continue;
-		}
 
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		if (skipped)
-			pr_warn("%ps must be blacklisted because of incorrect kretprobe order\n",
-				ri->rp->kp.addr);
+	first = node = current->kretprobe_instances.first;
+	while (node) {
+		ri = container_of(node, struct kretprobe_instance, llist);
+		node = node->next;
 
-		if (orig_ret_address != trampoline_address)
+		BUG_ON(ri->fp != frame_pointer);
+
+		orig_ret_address = (unsigned long)ri->ret_addr;
+		if (orig_ret_address != trampoline_address) {
 			/*
 			 * This is the real return address. Any other
 			 * instances associated with this task are for
 			 * other calls deeper on the call stack
 			 */
 			break;
+		}
 	}
 
 	kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
 	correct_ret_addr = ri->ret_addr;
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-		if (ri->fp != frame_pointer)
-			continue;
+
+	/*
+	 * Per the above BUG_ON() we're guaranteed at least one ri.
+	 *
+	 * Delete all nodes for this frame_pointer.
+	 */
+	current->kretprobe_instances.first = node;
+	while (first) {
+		ri = container_of(first, struct kretprobe_instance, llist);
+		node = first->next;
 
 		orig_ret_address = (unsigned long)ri->ret_addr;
 		if (ri->rp && ri->rp->handler) {
@@ -864,25 +849,13 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 			__this_cpu_write(current_kprobe, &kprobe_busy);
 		}
 
-		recycle_rp_inst(ri, &empty_rp);
+		recycle_rp_inst(ri, NULL);
 
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
+		first = node;
 	}
 
-	kretprobe_hash_unlock(current, &flags);
-
 	kprobe_busy_end();
 
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
 	return (void *)orig_ret_address;
 }
 NOKPROBE_SYMBOL(trampoline_handler);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 0b2fe55e6194..0e851b925c8c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -357,12 +357,6 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (i915_request_retire(tmp) && tmp != rq);
 }
 
-static void __llist_add(struct llist_node *node, struct llist_head *head)
-{
-	node->next = head->first;
-	head->first = node;
-}
-
 static struct i915_request * const *
 __engine_active(struct intel_engine_cs *engine)
 {
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9be1bff4f586..073c40ae2cdb 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -151,12 +151,14 @@ struct kretprobe {
 	int maxactive;
 	int nmissed;
 	size_t data_size;
-	struct hlist_head free_instances;
-	raw_spinlock_t lock;
+//	struct hlist_head free_instances;
+//	raw_spinlock_t lock;
+	struct llist_head free_instances;
 };
 
 struct kretprobe_instance {
-	struct hlist_node hlist;
+//	struct hlist_node hlist;
+	struct llist_node llist;
 	struct kretprobe *rp;
 	kprobe_opcode_t *ret_addr;
 	struct task_struct *task;
diff --git a/include/linux/llist.h b/include/linux/llist.h
index 2e9c7215882b..c17893dcc591 100644
--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -197,6 +197,16 @@ static inline struct llist_node *llist_next(struct llist_node *node)
 extern bool llist_add_batch(struct llist_node *new_first,
 			    struct llist_node *new_last,
 			    struct llist_head *head);
+
+static inline bool __llist_add_batch(struct llist_node *new_first,
+				     struct llist_node *new_last,
+				     struct llist_head *head)
+{
+	new_last->next = head->first;
+	head->first = new_first;
+	return new_last->next == NULL;
+}
+
 /**
  * llist_add - add a new entry
  * @new:	new entry to be added
@@ -209,6 +219,11 @@ static inline bool llist_add(struct llist_node *new, struct llist_head *head)
 	return llist_add_batch(new, new, head);
 }
 
+static inline bool __llist_add(struct llist_node *new, struct llist_head *head)
+{
+	return __llist_add_batch(new, new, head);
+}
+
 /**
  * llist_del_all - delete all entries from lock-less list
  * @head:	the head of lock-less list to delete all entries
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93ecd930efd3..e53fd653cefa 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1315,6 +1315,10 @@ struct task_struct {
 	struct callback_head		mce_kill_me;
 #endif
 
+#ifdef CONFIG_KRETPROBES
+	struct llist_head		kretprobe_instances;
+#endif
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
diff --git a/kernel/fork.c b/kernel/fork.c
index 4d32190861bd..2ff5cceb0732 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2161,6 +2161,10 @@ static __latent_entropy struct task_struct *copy_process(
 	INIT_LIST_HEAD(&p->thread_group);
 	p->task_works = NULL;
 
+#ifdef CONFIG_KRETPROBES
+	p->kretprobe_instances.first = NULL;
+#endif
+
 	/*
 	 * Ensure that the cgroup subsystem policies allow the new process to be
 	 * forked. It should be noted the the new process's css_set can be changed
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 287b263c9cb9..e208ce6fa84d 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -53,7 +53,6 @@ static int kprobes_initialized;
  * - RCU hlist traversal under disabling preempt (breakpoint handlers)
  */
 static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
-static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
 
 /* NOTE: change this value only with kprobe_mutex held */
 static bool kprobes_all_disarmed;
@@ -61,9 +60,6 @@ static bool kprobes_all_disarmed;
 /* This protects kprobe_table and optimizing_list */
 static DEFINE_MUTEX(kprobe_mutex);
 static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL;
-static struct {
-	raw_spinlock_t lock ____cacheline_aligned_in_smp;
-} kretprobe_table_locks[KPROBE_TABLE_SIZE];
 
 kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
 					unsigned int __unused)
@@ -71,11 +67,6 @@ kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
 	return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
 }
 
-static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
-{
-	return &(kretprobe_table_locks[hash].lock);
-}
-
 /* Blacklist -- list of struct kprobe_blacklist_entry */
 static LIST_HEAD(kprobe_blacklist);
 
@@ -1228,62 +1219,22 @@ void recycle_rp_inst(struct kretprobe_instance *ri,
 {
 	struct kretprobe *rp = ri->rp;
 
-	/* remove rp inst off the rprobe_inst_table */
-	hlist_del(&ri->hlist);
-	INIT_HLIST_NODE(&ri->hlist);
-	if (likely(rp)) {
-		raw_spin_lock(&rp->lock);
-		hlist_add_head(&ri->hlist, &rp->free_instances);
-		raw_spin_unlock(&rp->lock);
-	} else
-		/* Unregistering */
-		hlist_add_head(&ri->hlist, head);
+	llist_add(&ri->llist, &rp->free_instances);
 }
 NOKPROBE_SYMBOL(recycle_rp_inst);
 
 void kretprobe_hash_lock(struct task_struct *tsk,
 			 struct hlist_head **head, unsigned long *flags)
-__acquires(hlist_lock)
 {
-	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
-	raw_spinlock_t *hlist_lock;
-
-	*head = &kretprobe_inst_table[hash];
-	hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_lock_irqsave(hlist_lock, *flags);
 }
 NOKPROBE_SYMBOL(kretprobe_hash_lock);
 
-static void kretprobe_table_lock(unsigned long hash,
-				 unsigned long *flags)
-__acquires(hlist_lock)
-{
-	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_lock_irqsave(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_table_lock);
-
 void kretprobe_hash_unlock(struct task_struct *tsk,
 			   unsigned long *flags)
-__releases(hlist_lock)
 {
-	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
-	raw_spinlock_t *hlist_lock;
-
-	hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_unlock_irqrestore(hlist_lock, *flags);
 }
 NOKPROBE_SYMBOL(kretprobe_hash_unlock);
 
-static void kretprobe_table_unlock(unsigned long hash,
-				   unsigned long *flags)
-__releases(hlist_lock)
-{
-	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_unlock_irqrestore(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_table_unlock);
-
 struct kprobe kprobe_busy = {
 	.addr = (void *) get_kprobe,
 };
@@ -1312,29 +1263,21 @@ void kprobe_busy_end(void)
  */
 void kprobe_flush_task(struct task_struct *tk)
 {
+	struct llist_node *next, *node;
 	struct kretprobe_instance *ri;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long hash, flags = 0;
 
+	/* Early boot, not yet initialized. */
 	if (unlikely(!kprobes_initialized))
-		/* Early boot.  kretprobe_table_locks not yet initialized. */
 		return;
 
 	kprobe_busy_begin();
 
-	INIT_HLIST_HEAD(&empty_rp);
-	hash = hash_ptr(tk, KPROBE_HASH_BITS);
-	head = &kretprobe_inst_table[hash];
-	kretprobe_table_lock(hash, &flags);
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task == tk)
-			recycle_rp_inst(ri, &empty_rp);
-	}
-	kretprobe_table_unlock(hash, &flags);
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
+	node = llist_del_all(&current->kretprobe_instances);
+	while (node) {
+		next = node->next;
+		ri = container_of(node, struct kretprobe_instance, llist);
 		kfree(ri);
+		node = next;
 	}
 
 	kprobe_busy_end();
@@ -1343,12 +1286,14 @@ NOKPROBE_SYMBOL(kprobe_flush_task);
 
 static inline void free_rp_inst(struct kretprobe *rp)
 {
+	struct llist_node *next, *node = llist_del_all(&rp->free_instances);
 	struct kretprobe_instance *ri;
-	struct hlist_node *next;
 
-	hlist_for_each_entry_safe(ri, next, &rp->free_instances, hlist) {
-		hlist_del(&ri->hlist);
+	while (node) {
+		next = node->next;
+		ri = container_of(node, struct kretprobe_instance, llist);
 		kfree(ri);
+		node = next;
 	}
 }
 
@@ -1359,6 +1304,10 @@ static void cleanup_rp_inst(struct kretprobe *rp)
 	struct hlist_node *next;
 	struct hlist_head *head;
 
+	/*
+	 * XXX broken... 
+	 */
+#if 0
 	/* No race here */
 	for (hash = 0; hash < KPROBE_TABLE_SIZE; hash++) {
 		kretprobe_table_lock(hash, &flags);
@@ -1369,6 +1318,7 @@ static void cleanup_rp_inst(struct kretprobe *rp)
 		}
 		kretprobe_table_unlock(hash, &flags);
 	}
+#endif
 	free_rp_inst(rp);
 }
 NOKPROBE_SYMBOL(cleanup_rp_inst);
@@ -1934,50 +1884,28 @@ unsigned long __weak arch_deref_entry_point(void *entry)
 static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 {
 	struct kretprobe *rp = container_of(p, struct kretprobe, kp);
-	unsigned long hash, flags = 0;
 	struct kretprobe_instance *ri;
+	struct llist_node *llist;
 
-	/*
-	 * To avoid deadlocks, prohibit return probing in NMI contexts,
-	 * just skip the probe and increase the (inexact) 'nmissed'
-	 * statistical counter, so that the user is informed that
-	 * something happened:
-	 */
-	if (unlikely(in_nmi())) {
+	llist = llist_del_first(&rp->free_instances);
+	if (!llist) {
 		rp->nmissed++;
 		return 0;
 	}
 
-	/* TODO: consider to only swap the RA after the last pre_handler fired */
-	hash = hash_ptr(current, KPROBE_HASH_BITS);
-	raw_spin_lock_irqsave(&rp->lock, flags);
-	if (!hlist_empty(&rp->free_instances)) {
-		ri = hlist_entry(rp->free_instances.first,
-				struct kretprobe_instance, hlist);
-		hlist_del(&ri->hlist);
-		raw_spin_unlock_irqrestore(&rp->lock, flags);
+	ri = container_of(llist, struct kretprobe_instance, llist);
+	ri->rp = rp;
+	ri->task = current;
 
-		ri->rp = rp;
-		ri->task = current;
+	if (rp->entry_handler && rp->entry_handler(ri, regs)) {
+		llist_add(llist, &rp->free_instances);
+		return 0;
+	}
 
-		if (rp->entry_handler && rp->entry_handler(ri, regs)) {
-			raw_spin_lock_irqsave(&rp->lock, flags);
-			hlist_add_head(&ri->hlist, &rp->free_instances);
-			raw_spin_unlock_irqrestore(&rp->lock, flags);
-			return 0;
-		}
+	arch_prepare_kretprobe(ri, regs);
 
-		arch_prepare_kretprobe(ri, regs);
+	__llist_add(llist, &current->kretprobe_instances);
 
-		/* XXX(hch): why is there no hlist_move_head? */
-		INIT_HLIST_NODE(&ri->hlist);
-		kretprobe_table_lock(hash, &flags);
-		hlist_add_head(&ri->hlist, &kretprobe_inst_table[hash]);
-		kretprobe_table_unlock(hash, &flags);
-	} else {
-		rp->nmissed++;
-		raw_spin_unlock_irqrestore(&rp->lock, flags);
-	}
 	return 0;
 }
 NOKPROBE_SYMBOL(pre_handler_kretprobe);
@@ -2034,8 +1962,9 @@ int register_kretprobe(struct kretprobe *rp)
 		rp->maxactive = num_possible_cpus();
 #endif
 	}
-	raw_spin_lock_init(&rp->lock);
-	INIT_HLIST_HEAD(&rp->free_instances);
+
+	rp->free_instances.first = NULL;
+
 	for (i = 0; i < rp->maxactive; i++) {
 		inst = kmalloc(sizeof(struct kretprobe_instance) +
 			       rp->data_size, GFP_KERNEL);
@@ -2043,8 +1972,7 @@ int register_kretprobe(struct kretprobe *rp)
 			free_rp_inst(rp);
 			return -ENOMEM;
 		}
-		INIT_HLIST_NODE(&inst->hlist);
-		hlist_add_head(&inst->hlist, &rp->free_instances);
+		llist_add(&inst->llist, &rp->free_instances);
 	}
 
 	rp->nmissed = 0;
@@ -2458,11 +2386,8 @@ static int __init init_kprobes(void)
 
 	/* FIXME allocate the probe table, currently defined statically */
 	/* initialize all list heads */
-	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
+	for (i = 0; i < KPROBE_TABLE_SIZE; i++)
 		INIT_HLIST_HEAD(&kprobe_table[i]);
-		INIT_HLIST_HEAD(&kretprobe_inst_table[i]);
-		raw_spin_lock_init(&(kretprobe_table_locks[i].lock));
-	}
 
 	err = populate_kprobe_blacklist(__start_kprobe_blacklist,
 					__stop_kprobe_blacklist);

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] kprobes/x86: Fixes NMI context check on x86
  2020-08-25  6:15     ` Masami Hiramatsu
@ 2020-08-25 12:20         ` kernel test robot
  2020-08-25 11:06         ` kernel test robot
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2020-08-25 12:20 UTC (permalink / raw)
  To: Masami Hiramatsu, Eddy_Wu
  Cc: kbuild-all, clang-built-linux, Peter Zijlstra, linux-kernel, x86

[-- Attachment #1: Type: text/plain, Size: 3120 bytes --]

Hi Masami,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on tip/auto-latest linux/master linus/master v5.9-rc2 next-20200825]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Masami-Hiramatsu/kprobes-x86-Fixes-NMI-context-check-on-x86/20200825-141829
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ef2ff0f5d6008d325c9a068e20981c0d0acc4d6b
config: x86_64-randconfig-a015-20200825 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 77e5a195f818b9ace91f7b12ab948b21d7918238)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/kprobes/core.c:573:6: warning: no previous prototype for function 'arch_kprobe_in_nmi' [-Wmissing-prototypes]
   bool arch_kprobe_in_nmi(void)
        ^
   arch/x86/kernel/kprobes/core.c:573:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   bool arch_kprobe_in_nmi(void)
   ^
   static 
   arch/x86/kernel/kprobes/core.c:776:24: warning: no previous prototype for function 'trampoline_handler' [-Wmissing-prototypes]
   __used __visible void *trampoline_handler(struct pt_regs *regs)
                          ^
   arch/x86/kernel/kprobes/core.c:776:18: note: declare 'static' if the function is not intended to be used outside of this translation unit
   __used __visible void *trampoline_handler(struct pt_regs *regs)
                    ^
                    static 
   2 warnings generated.

# https://github.com/0day-ci/linux/commit/2c75bf59d2f4fcecab1f1099a04ed17049df8e0a
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Masami-Hiramatsu/kprobes-x86-Fixes-NMI-context-check-on-x86/20200825-141829
git checkout 2c75bf59d2f4fcecab1f1099a04ed17049df8e0a
vim +/arch_kprobe_in_nmi +573 arch/x86/kernel/kprobes/core.c

   572	
 > 573	bool arch_kprobe_in_nmi(void)
   574	{
   575		struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
   576	
   577		if (kcb->in_int3) {
   578			/*
   579			 * Since the int3 is one of NMI, we have to check in_nmi() is
   580			 * bigger than 1 << NMI_SHIFT instead of !0.
   581			 */
   582			return in_nmi() > (1 << NMI_SHIFT);
   583		} else
   584			return in_nmi();
   585	}
   586	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 36223 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kprobes/x86: Fixes NMI context check on x86
@ 2020-08-25 12:20         ` kernel test robot
  0 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2020-08-25 12:20 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3188 bytes --]

Hi Masami,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on tip/auto-latest linux/master linus/master v5.9-rc2 next-20200825]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Masami-Hiramatsu/kprobes-x86-Fixes-NMI-context-check-on-x86/20200825-141829
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ef2ff0f5d6008d325c9a068e20981c0d0acc4d6b
config: x86_64-randconfig-a015-20200825 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 77e5a195f818b9ace91f7b12ab948b21d7918238)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/kprobes/core.c:573:6: warning: no previous prototype for function 'arch_kprobe_in_nmi' [-Wmissing-prototypes]
   bool arch_kprobe_in_nmi(void)
        ^
   arch/x86/kernel/kprobes/core.c:573:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   bool arch_kprobe_in_nmi(void)
   ^
   static 
   arch/x86/kernel/kprobes/core.c:776:24: warning: no previous prototype for function 'trampoline_handler' [-Wmissing-prototypes]
   __used __visible void *trampoline_handler(struct pt_regs *regs)
                          ^
   arch/x86/kernel/kprobes/core.c:776:18: note: declare 'static' if the function is not intended to be used outside of this translation unit
   __used __visible void *trampoline_handler(struct pt_regs *regs)
                    ^
                    static 
   2 warnings generated.

# https://github.com/0day-ci/linux/commit/2c75bf59d2f4fcecab1f1099a04ed17049df8e0a
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Masami-Hiramatsu/kprobes-x86-Fixes-NMI-context-check-on-x86/20200825-141829
git checkout 2c75bf59d2f4fcecab1f1099a04ed17049df8e0a
vim +/arch_kprobe_in_nmi +573 arch/x86/kernel/kprobes/core.c

   572	
 > 573	bool arch_kprobe_in_nmi(void)
   574	{
   575		struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
   576	
   577		if (kcb->in_int3) {
   578			/*
   579			 * Since the int3 is one of NMI, we have to check in_nmi() is
   580			 * bigger than 1 << NMI_SHIFT instead of !0.
   581			 */
   582			return in_nmi() > (1 << NMI_SHIFT);
   583		} else
   584			return in_nmi();
   585	}
   586	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 36223 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kprobes/x86: Fixes NMI context check on x86
  2020-08-25  6:15     ` Masami Hiramatsu
@ 2020-08-25 12:25         ` kernel test robot
  2020-08-25 11:06         ` kernel test robot
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2020-08-25 12:25 UTC (permalink / raw)
  To: Masami Hiramatsu, Eddy_Wu
  Cc: kbuild-all, clang-built-linux, Peter Zijlstra, linux-kernel, x86

[-- Attachment #1: Type: text/plain, Size: 3120 bytes --]

Hi Masami,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on tip/auto-latest linux/master linus/master v5.9-rc2 next-20200825]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Masami-Hiramatsu/kprobes-x86-Fixes-NMI-context-check-on-x86/20200825-141829
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ef2ff0f5d6008d325c9a068e20981c0d0acc4d6b
config: x86_64-randconfig-r014-20200825 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 77e5a195f818b9ace91f7b12ab948b21d7918238)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/kprobes/core.c:573:6: warning: no previous prototype for function 'arch_kprobe_in_nmi' [-Wmissing-prototypes]
   bool arch_kprobe_in_nmi(void)
        ^
   arch/x86/kernel/kprobes/core.c:573:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   bool arch_kprobe_in_nmi(void)
   ^
   static 
   arch/x86/kernel/kprobes/core.c:776:24: warning: no previous prototype for function 'trampoline_handler' [-Wmissing-prototypes]
   __used __visible void *trampoline_handler(struct pt_regs *regs)
                          ^
   arch/x86/kernel/kprobes/core.c:776:18: note: declare 'static' if the function is not intended to be used outside of this translation unit
   __used __visible void *trampoline_handler(struct pt_regs *regs)
                    ^
                    static 
   2 warnings generated.

# https://github.com/0day-ci/linux/commit/2c75bf59d2f4fcecab1f1099a04ed17049df8e0a
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Masami-Hiramatsu/kprobes-x86-Fixes-NMI-context-check-on-x86/20200825-141829
git checkout 2c75bf59d2f4fcecab1f1099a04ed17049df8e0a
vim +/arch_kprobe_in_nmi +573 arch/x86/kernel/kprobes/core.c

   572	
 > 573	bool arch_kprobe_in_nmi(void)
   574	{
   575		struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
   576	
   577		if (kcb->in_int3) {
   578			/*
   579			 * Since the int3 is one of NMI, we have to check in_nmi() is
   580			 * bigger than 1 << NMI_SHIFT instead of !0.
   581			 */
   582			return in_nmi() > (1 << NMI_SHIFT);
   583		} else
   584			return in_nmi();
   585	}
   586	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27770 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] kprobes/x86: Fixes NMI context check on x86
@ 2020-08-25 12:25         ` kernel test robot
  0 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2020-08-25 12:25 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3188 bytes --]

Hi Masami,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on tip/auto-latest linux/master linus/master v5.9-rc2 next-20200825]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Masami-Hiramatsu/kprobes-x86-Fixes-NMI-context-check-on-x86/20200825-141829
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ef2ff0f5d6008d325c9a068e20981c0d0acc4d6b
config: x86_64-randconfig-r014-20200825 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 77e5a195f818b9ace91f7b12ab948b21d7918238)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/kprobes/core.c:573:6: warning: no previous prototype for function 'arch_kprobe_in_nmi' [-Wmissing-prototypes]
   bool arch_kprobe_in_nmi(void)
        ^
   arch/x86/kernel/kprobes/core.c:573:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   bool arch_kprobe_in_nmi(void)
   ^
   static 
   arch/x86/kernel/kprobes/core.c:776:24: warning: no previous prototype for function 'trampoline_handler' [-Wmissing-prototypes]
   __used __visible void *trampoline_handler(struct pt_regs *regs)
                          ^
   arch/x86/kernel/kprobes/core.c:776:18: note: declare 'static' if the function is not intended to be used outside of this translation unit
   __used __visible void *trampoline_handler(struct pt_regs *regs)
                    ^
                    static 
   2 warnings generated.

# https://github.com/0day-ci/linux/commit/2c75bf59d2f4fcecab1f1099a04ed17049df8e0a
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Masami-Hiramatsu/kprobes-x86-Fixes-NMI-context-check-on-x86/20200825-141829
git checkout 2c75bf59d2f4fcecab1f1099a04ed17049df8e0a
vim +/arch_kprobe_in_nmi +573 arch/x86/kernel/kprobes/core.c

   572	
 > 573	bool arch_kprobe_in_nmi(void)
   574	{
   575		struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
   576	
   577		if (kcb->in_int3) {
   578			/*
   579			 * Since the int3 is one of NMI, we have to check in_nmi() is
   580			 * bigger than 1 << NMI_SHIFT instead of !0.
   581			 */
   582			return in_nmi() > (1 << NMI_SHIFT);
   583		} else
   584			return in_nmi();
   585	}
   586	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 27770 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-25 12:09       ` x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint) peterz
@ 2020-08-25 13:15         ` Masami Hiramatsu
  2020-08-25 13:30           ` peterz
  2020-08-27  9:02           ` peterz
  2020-08-26  7:07         ` Eddy_Wu
  2020-08-26  8:31         ` Masami Hiramatsu
  2 siblings, 2 replies; 33+ messages in thread
From: Masami Hiramatsu @ 2020-08-25 13:15 UTC (permalink / raw)
  To: peterz; +Cc: Eddy_Wu, linux-kernel, x86, David S. Miller

On Tue, 25 Aug 2020 14:09:11 +0200
peterz@infradead.org wrote:

> On Tue, Aug 25, 2020 at 03:15:38PM +0900, Masami Hiramatsu wrote:
> 
> > From 24390dffe6eb9a3e95f7d46a528a1dcfd716dc81 Mon Sep 17 00:00:00 2001
> > From: Masami Hiramatsu <mhiramat@kernel.org>
> > Date: Tue, 25 Aug 2020 01:37:00 +0900
> > Subject: [PATCH] kprobes/x86: Fixes NMI context check on x86
> > 
> > Since commit 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
> > made int3 as one of NMI, in_nmi() in kprobe handlers always returns !0.
> > Thus the kretprobe handlers always skipped the execution on x86 if it
> > is using int3. (CONFIG_KPROBES_ON_FTRACE=n and
> > echo 0 > /proc/sys/debug/kprobe_optimization)
> > 
> > To avoid this issue, introduce arch_kprobe_in_nmi() and check the
> > in_nmi() count is bigger than 1 << NMI_SHIFT on x86 if the handler
> > has been invoked from kprobe_int3_handler. By default, the
> > arch_kprobe_in_nmi() will be same as in_nmi().
> 
> So I still hate all of this, it feels like it's making a bad hack worse.
> 
> > Fixes: 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
> 
> I think there's another problem, when you run this code with lockdep
> enabled it'll complain very loudly. Lockdep doesn't like using locks
> from NMI context much.
> 
> Can't we take one step back. Why are we taking locks from kprobe
> context? I was under the impression that kprobes were lockless (for good
> reasons), why does kretprobe need one? And can't we fix that?
> 
> Looking at the code it mucks about with with an hlist.
> 
> So on pre_handler is tries and take a kretprobe_instance from a list,
> and fails when there isn't one. This looks like per-task-per-retprobe
> data. Also, all of that is only returned when the task dies.
> 
> Surely we can do a lockless list for this. We have llist_add() and
> llist_del_first() to make a lockless LIFO/stack.

Ah, that's really nice!

> /me frobs code....
> 
> 
> Oooohh, another lock :-( kretprobe_table_lock
> 
> Bah bah bah, what a mess, surely we have a lockless hash-table
> somewhere? /me goes rummage around. Nope we don't.
> 
> Lightbulb!
> 
> That whole hash-table is nonsense, we don't need it. All it does is
> provide a per-task hlist. We can just stick a llist_head in task_struct
> itself and delete all that.

Yeah, at the first time kretprobe designed to not change the task struct,
but it seems odd restriction now. Per-task shadow stack should be implemented
as per-task.

> /me frobs more code...
> 
> argh, arch/ code uses this
> 
> 
> damn... one last problem is dangling instances.. so close.
> We can apparently unregister a kretprobe while there's still active
> kretprobe_instance's out referencing it.

Yeah, kretprobe already provided the per-instance data (as far as
I know, only systemtap depends on it). We need to provide it for
such users.
But if we only have one lock, we can avoid checking NMI because
we can check the recursion with trylock. It is needed only if the
kretprobe uses per-instance data. Or we can just pass a dummy
instance on the stack.

> 
> Ignoring that issue for the moment, the below seems to actually work.

OK, this looks good to me too.
I'll make a series to rewrite kretprobe based on this patch, OK?

Thanks,

> 
> ---
> Subject: kprobe: Rewrite kretprobe to not use locks
> 
> Use llist based atomic stack for rp->free_instances, and a llist based
> non-atomic stack for current->kretprobe_instances. The latter is only
> ever referenced from the task context and properly nests, so even if it
> gets interrupted, the interrupt should pop whatever it pushed on the
> stack before returning.
> 
> XXX unregister_kretprobe*() vs active instances is broken.
> XXX everything !x86 is broken
> 
> Not-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/kprobes/core.c      |  75 ++++++-------------
>  drivers/gpu/drm/i915/i915_request.c |   6 --
>  include/linux/kprobes.h             |   8 +-
>  include/linux/llist.h               |  15 ++++
>  include/linux/sched.h               |   4 +
>  kernel/fork.c                       |   4 +
>  kernel/kprobes.c                    | 145 +++++++++---------------------------
>  7 files changed, 87 insertions(+), 170 deletions(-)
> 
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 2ca10b770cff..311c26ef0fc2 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -772,14 +772,12 @@ STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
>   */
>  __used __visible void *trampoline_handler(struct pt_regs *regs)
>  {
> -	struct kretprobe_instance *ri = NULL;
> -	struct hlist_head *head, empty_rp;
> -	struct hlist_node *tmp;
> -	unsigned long flags, orig_ret_address = 0;
>  	unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
>  	kprobe_opcode_t *correct_ret_addr = NULL;
> +	struct kretprobe_instance *ri = NULL;
> +	struct llist_node *node, *first;
> +	unsigned long orig_ret_address;
>  	void *frame_pointer;
> -	bool skipped = false;
>  
>  	/*
>  	 * Set a dummy kprobe for avoiding kretprobe recursion.
> @@ -788,8 +786,6 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>  	 */
>  	kprobe_busy_begin();
>  
> -	INIT_HLIST_HEAD(&empty_rp);
> -	kretprobe_hash_lock(current, &head, &flags);
>  	/* fixup registers */
>  	regs->cs = __KERNEL_CS;
>  #ifdef CONFIG_X86_32
> @@ -813,48 +809,37 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>  	 *	 will be the real return address, and all the rest will
>  	 *	 point to kretprobe_trampoline.
>  	 */
> -	hlist_for_each_entry(ri, head, hlist) {
> -		if (ri->task != current)
> -			/* another task is sharing our hash bucket */
> -			continue;
> -		/*
> -		 * Return probes must be pushed on this hash list correct
> -		 * order (same as return order) so that it can be popped
> -		 * correctly. However, if we find it is pushed it incorrect
> -		 * order, this means we find a function which should not be
> -		 * probed, because the wrong order entry is pushed on the
> -		 * path of processing other kretprobe itself.
> -		 */
> -		if (ri->fp != frame_pointer) {
> -			if (!skipped)
> -				pr_warn("kretprobe is stacked incorrectly. Trying to fixup.\n");
> -			skipped = true;
> -			continue;
> -		}
>  
> -		orig_ret_address = (unsigned long)ri->ret_addr;
> -		if (skipped)
> -			pr_warn("%ps must be blacklisted because of incorrect kretprobe order\n",
> -				ri->rp->kp.addr);
> +	first = node = current->kretprobe_instances.first;
> +	while (node) {
> +		ri = container_of(node, struct kretprobe_instance, llist);
> +		node = node->next;
>  
> -		if (orig_ret_address != trampoline_address)
> +		BUG_ON(ri->fp != frame_pointer);
> +
> +		orig_ret_address = (unsigned long)ri->ret_addr;
> +		if (orig_ret_address != trampoline_address) {
>  			/*
>  			 * This is the real return address. Any other
>  			 * instances associated with this task are for
>  			 * other calls deeper on the call stack
>  			 */
>  			break;
> +		}
>  	}
>  
>  	kretprobe_assert(ri, orig_ret_address, trampoline_address);
> -
>  	correct_ret_addr = ri->ret_addr;
> -	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
> -		if (ri->task != current)
> -			/* another task is sharing our hash bucket */
> -			continue;
> -		if (ri->fp != frame_pointer)
> -			continue;
> +
> +	/*
> +	 * Per the above BUG_ON() we're guaranteed at least one ri.
> +	 *
> +	 * Delete all nodes for this frame_pointer.
> +	 */
> +	current->kretprobe_instances.first = node;
> +	while (first) {
> +		ri = container_of(first, struct kretprobe_instance, llist);
> +		node = first->next;
>  
>  		orig_ret_address = (unsigned long)ri->ret_addr;
>  		if (ri->rp && ri->rp->handler) {
> @@ -864,25 +849,13 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>  			__this_cpu_write(current_kprobe, &kprobe_busy);
>  		}
>  
> -		recycle_rp_inst(ri, &empty_rp);
> +		recycle_rp_inst(ri, NULL);
>  
> -		if (orig_ret_address != trampoline_address)
> -			/*
> -			 * This is the real return address. Any other
> -			 * instances associated with this task are for
> -			 * other calls deeper on the call stack
> -			 */
> -			break;
> +		first = node;
>  	}
>  
> -	kretprobe_hash_unlock(current, &flags);
> -
>  	kprobe_busy_end();
>  
> -	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
> -		hlist_del(&ri->hlist);
> -		kfree(ri);
> -	}
>  	return (void *)orig_ret_address;
>  }
>  NOKPROBE_SYMBOL(trampoline_handler);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 0b2fe55e6194..0e851b925c8c 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -357,12 +357,6 @@ void i915_request_retire_upto(struct i915_request *rq)
>  	} while (i915_request_retire(tmp) && tmp != rq);
>  }
>  
> -static void __llist_add(struct llist_node *node, struct llist_head *head)
> -{
> -	node->next = head->first;
> -	head->first = node;
> -}
> -
>  static struct i915_request * const *
>  __engine_active(struct intel_engine_cs *engine)
>  {
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 9be1bff4f586..073c40ae2cdb 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -151,12 +151,14 @@ struct kretprobe {
>  	int maxactive;
>  	int nmissed;
>  	size_t data_size;
> -	struct hlist_head free_instances;
> -	raw_spinlock_t lock;
> +//	struct hlist_head free_instances;
> +//	raw_spinlock_t lock;
> +	struct llist_head free_instances;
>  };
>  
>  struct kretprobe_instance {
> -	struct hlist_node hlist;
> +//	struct hlist_node hlist;
> +	struct llist_node llist;
>  	struct kretprobe *rp;
>  	kprobe_opcode_t *ret_addr;
>  	struct task_struct *task;
> diff --git a/include/linux/llist.h b/include/linux/llist.h
> index 2e9c7215882b..c17893dcc591 100644
> --- a/include/linux/llist.h
> +++ b/include/linux/llist.h
> @@ -197,6 +197,16 @@ static inline struct llist_node *llist_next(struct llist_node *node)
>  extern bool llist_add_batch(struct llist_node *new_first,
>  			    struct llist_node *new_last,
>  			    struct llist_head *head);
> +
> +static inline bool __llist_add_batch(struct llist_node *new_first,
> +				     struct llist_node *new_last,
> +				     struct llist_head *head)
> +{
> +	new_last->next = head->first;
> +	head->first = new_first;
> +	return new_last->next == NULL;
> +}
> +
>  /**
>   * llist_add - add a new entry
>   * @new:	new entry to be added
> @@ -209,6 +219,11 @@ static inline bool llist_add(struct llist_node *new, struct llist_head *head)
>  	return llist_add_batch(new, new, head);
>  }
>  
> +static inline bool __llist_add(struct llist_node *new, struct llist_head *head)
> +{
> +	return __llist_add_batch(new, new, head);
> +}
> +
>  /**
>   * llist_del_all - delete all entries from lock-less list
>   * @head:	the head of lock-less list to delete all entries
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 93ecd930efd3..e53fd653cefa 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1315,6 +1315,10 @@ struct task_struct {
>  	struct callback_head		mce_kill_me;
>  #endif
>  
> +#ifdef CONFIG_KRETPROBES
> +	struct llist_head		kretprobe_instances;
> +#endif
> +
>  	/*
>  	 * New fields for task_struct should be added above here, so that
>  	 * they are included in the randomized portion of task_struct.
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 4d32190861bd..2ff5cceb0732 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2161,6 +2161,10 @@ static __latent_entropy struct task_struct *copy_process(
>  	INIT_LIST_HEAD(&p->thread_group);
>  	p->task_works = NULL;
>  
> +#ifdef CONFIG_KRETPROBES
> +	p->kretprobe_instances.first = NULL;
> +#endif
> +
>  	/*
>  	 * Ensure that the cgroup subsystem policies allow the new process to be
>  	 * forked. It should be noted the the new process's css_set can be changed
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 287b263c9cb9..e208ce6fa84d 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -53,7 +53,6 @@ static int kprobes_initialized;
>   * - RCU hlist traversal under disabling preempt (breakpoint handlers)
>   */
>  static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
> -static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
>  
>  /* NOTE: change this value only with kprobe_mutex held */
>  static bool kprobes_all_disarmed;
> @@ -61,9 +60,6 @@ static bool kprobes_all_disarmed;
>  /* This protects kprobe_table and optimizing_list */
>  static DEFINE_MUTEX(kprobe_mutex);
>  static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL;
> -static struct {
> -	raw_spinlock_t lock ____cacheline_aligned_in_smp;
> -} kretprobe_table_locks[KPROBE_TABLE_SIZE];
>  
>  kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
>  					unsigned int __unused)
> @@ -71,11 +67,6 @@ kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
>  	return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
>  }
>  
> -static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
> -{
> -	return &(kretprobe_table_locks[hash].lock);
> -}
> -
>  /* Blacklist -- list of struct kprobe_blacklist_entry */
>  static LIST_HEAD(kprobe_blacklist);
>  
> @@ -1228,62 +1219,22 @@ void recycle_rp_inst(struct kretprobe_instance *ri,
>  {
>  	struct kretprobe *rp = ri->rp;
>  
> -	/* remove rp inst off the rprobe_inst_table */
> -	hlist_del(&ri->hlist);
> -	INIT_HLIST_NODE(&ri->hlist);
> -	if (likely(rp)) {
> -		raw_spin_lock(&rp->lock);
> -		hlist_add_head(&ri->hlist, &rp->free_instances);
> -		raw_spin_unlock(&rp->lock);
> -	} else
> -		/* Unregistering */
> -		hlist_add_head(&ri->hlist, head);
> +	llist_add(&ri->llist, &rp->free_instances);
>  }
>  NOKPROBE_SYMBOL(recycle_rp_inst);
>  
>  void kretprobe_hash_lock(struct task_struct *tsk,
>  			 struct hlist_head **head, unsigned long *flags)
> -__acquires(hlist_lock)
>  {
> -	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
> -	raw_spinlock_t *hlist_lock;
> -
> -	*head = &kretprobe_inst_table[hash];
> -	hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_lock_irqsave(hlist_lock, *flags);
>  }
>  NOKPROBE_SYMBOL(kretprobe_hash_lock);
>  
> -static void kretprobe_table_lock(unsigned long hash,
> -				 unsigned long *flags)
> -__acquires(hlist_lock)
> -{
> -	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_lock_irqsave(hlist_lock, *flags);
> -}
> -NOKPROBE_SYMBOL(kretprobe_table_lock);
> -
>  void kretprobe_hash_unlock(struct task_struct *tsk,
>  			   unsigned long *flags)
> -__releases(hlist_lock)
>  {
> -	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
> -	raw_spinlock_t *hlist_lock;
> -
> -	hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_unlock_irqrestore(hlist_lock, *flags);
>  }
>  NOKPROBE_SYMBOL(kretprobe_hash_unlock);
>  
> -static void kretprobe_table_unlock(unsigned long hash,
> -				   unsigned long *flags)
> -__releases(hlist_lock)
> -{
> -	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_unlock_irqrestore(hlist_lock, *flags);
> -}
> -NOKPROBE_SYMBOL(kretprobe_table_unlock);
> -
>  struct kprobe kprobe_busy = {
>  	.addr = (void *) get_kprobe,
>  };
> @@ -1312,29 +1263,21 @@ void kprobe_busy_end(void)
>   */
>  void kprobe_flush_task(struct task_struct *tk)
>  {
> +	struct llist_node *next, *node;
>  	struct kretprobe_instance *ri;
> -	struct hlist_head *head, empty_rp;
> -	struct hlist_node *tmp;
> -	unsigned long hash, flags = 0;
>  
> +	/* Early boot, not yet initialized. */
>  	if (unlikely(!kprobes_initialized))
> -		/* Early boot.  kretprobe_table_locks not yet initialized. */
>  		return;
>  
>  	kprobe_busy_begin();
>  
> -	INIT_HLIST_HEAD(&empty_rp);
> -	hash = hash_ptr(tk, KPROBE_HASH_BITS);
> -	head = &kretprobe_inst_table[hash];
> -	kretprobe_table_lock(hash, &flags);
> -	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
> -		if (ri->task == tk)
> -			recycle_rp_inst(ri, &empty_rp);
> -	}
> -	kretprobe_table_unlock(hash, &flags);
> -	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
> -		hlist_del(&ri->hlist);
> +	node = llist_del_all(&current->kretprobe_instances);
> +	while (node) {
> +		next = node->next;
> +		ri = container_of(node, struct kretprobe_instance, llist);
>  		kfree(ri);
> +		node = next;
>  	}
>  
>  	kprobe_busy_end();
> @@ -1343,12 +1286,14 @@ NOKPROBE_SYMBOL(kprobe_flush_task);
>  
>  static inline void free_rp_inst(struct kretprobe *rp)
>  {
> +	struct llist_node *next, *node = llist_del_all(&rp->free_instances);
>  	struct kretprobe_instance *ri;
> -	struct hlist_node *next;
>  
> -	hlist_for_each_entry_safe(ri, next, &rp->free_instances, hlist) {
> -		hlist_del(&ri->hlist);
> +	while (node) {
> +		next = node->next;
> +		ri = container_of(node, struct kretprobe_instance, llist);
>  		kfree(ri);
> +		node = next;
>  	}
>  }
>  
> @@ -1359,6 +1304,10 @@ static void cleanup_rp_inst(struct kretprobe *rp)
>  	struct hlist_node *next;
>  	struct hlist_head *head;
>  
> +	/*
> +	 * XXX broken... 
> +	 */
> +#if 0
>  	/* No race here */
>  	for (hash = 0; hash < KPROBE_TABLE_SIZE; hash++) {
>  		kretprobe_table_lock(hash, &flags);
> @@ -1369,6 +1318,7 @@ static void cleanup_rp_inst(struct kretprobe *rp)
>  		}
>  		kretprobe_table_unlock(hash, &flags);
>  	}
> +#endif
>  	free_rp_inst(rp);
>  }
>  NOKPROBE_SYMBOL(cleanup_rp_inst);
> @@ -1934,50 +1884,28 @@ unsigned long __weak arch_deref_entry_point(void *entry)
>  static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  {
>  	struct kretprobe *rp = container_of(p, struct kretprobe, kp);
> -	unsigned long hash, flags = 0;
>  	struct kretprobe_instance *ri;
> +	struct llist_node *llist;
>  
> -	/*
> -	 * To avoid deadlocks, prohibit return probing in NMI contexts,
> -	 * just skip the probe and increase the (inexact) 'nmissed'
> -	 * statistical counter, so that the user is informed that
> -	 * something happened:
> -	 */
> -	if (unlikely(in_nmi())) {
> +	llist = llist_del_first(&rp->free_instances);
> +	if (!llist) {
>  		rp->nmissed++;
>  		return 0;
>  	}
>  
> -	/* TODO: consider to only swap the RA after the last pre_handler fired */
> -	hash = hash_ptr(current, KPROBE_HASH_BITS);
> -	raw_spin_lock_irqsave(&rp->lock, flags);
> -	if (!hlist_empty(&rp->free_instances)) {
> -		ri = hlist_entry(rp->free_instances.first,
> -				struct kretprobe_instance, hlist);
> -		hlist_del(&ri->hlist);
> -		raw_spin_unlock_irqrestore(&rp->lock, flags);
> +	ri = container_of(llist, struct kretprobe_instance, llist);
> +	ri->rp = rp;
> +	ri->task = current;
>  
> -		ri->rp = rp;
> -		ri->task = current;
> +	if (rp->entry_handler && rp->entry_handler(ri, regs)) {
> +		llist_add(llist, &rp->free_instances);
> +		return 0;
> +	}
>  
> -		if (rp->entry_handler && rp->entry_handler(ri, regs)) {
> -			raw_spin_lock_irqsave(&rp->lock, flags);
> -			hlist_add_head(&ri->hlist, &rp->free_instances);
> -			raw_spin_unlock_irqrestore(&rp->lock, flags);
> -			return 0;
> -		}
> +	arch_prepare_kretprobe(ri, regs);
>  
> -		arch_prepare_kretprobe(ri, regs);
> +	__llist_add(llist, &current->kretprobe_instances);
>  
> -		/* XXX(hch): why is there no hlist_move_head? */
> -		INIT_HLIST_NODE(&ri->hlist);
> -		kretprobe_table_lock(hash, &flags);
> -		hlist_add_head(&ri->hlist, &kretprobe_inst_table[hash]);
> -		kretprobe_table_unlock(hash, &flags);
> -	} else {
> -		rp->nmissed++;
> -		raw_spin_unlock_irqrestore(&rp->lock, flags);
> -	}
>  	return 0;
>  }
>  NOKPROBE_SYMBOL(pre_handler_kretprobe);
> @@ -2034,8 +1962,9 @@ int register_kretprobe(struct kretprobe *rp)
>  		rp->maxactive = num_possible_cpus();
>  #endif
>  	}
> -	raw_spin_lock_init(&rp->lock);
> -	INIT_HLIST_HEAD(&rp->free_instances);
> +
> +	rp->free_instances.first = NULL;
> +
>  	for (i = 0; i < rp->maxactive; i++) {
>  		inst = kmalloc(sizeof(struct kretprobe_instance) +
>  			       rp->data_size, GFP_KERNEL);
> @@ -2043,8 +1972,7 @@ int register_kretprobe(struct kretprobe *rp)
>  			free_rp_inst(rp);
>  			return -ENOMEM;
>  		}
> -		INIT_HLIST_NODE(&inst->hlist);
> -		hlist_add_head(&inst->hlist, &rp->free_instances);
> +		llist_add(&inst->llist, &rp->free_instances);
>  	}
>  
>  	rp->nmissed = 0;
> @@ -2458,11 +2386,8 @@ static int __init init_kprobes(void)
>  
>  	/* FIXME allocate the probe table, currently defined statically */
>  	/* initialize all list heads */
> -	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
> +	for (i = 0; i < KPROBE_TABLE_SIZE; i++)
>  		INIT_HLIST_HEAD(&kprobe_table[i]);
> -		INIT_HLIST_HEAD(&kretprobe_inst_table[i]);
> -		raw_spin_lock_init(&(kretprobe_table_locks[i].lock));
> -	}
>  
>  	err = populate_kprobe_blacklist(__start_kprobe_blacklist,
>  					__stop_kprobe_blacklist);


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-25 13:15         ` Masami Hiramatsu
@ 2020-08-25 13:30           ` peterz
  2020-08-25 13:59             ` Masami Hiramatsu
  2020-08-25 14:10             ` peterz
  2020-08-27  9:02           ` peterz
  1 sibling, 2 replies; 33+ messages in thread
From: peterz @ 2020-08-25 13:30 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Eddy_Wu, linux-kernel, x86, David S. Miller

On Tue, Aug 25, 2020 at 10:15:55PM +0900, Masami Hiramatsu wrote:

> > damn... one last problem is dangling instances.. so close.
> > We can apparently unregister a kretprobe while there's still active
> > kretprobe_instance's out referencing it.
> 
> Yeah, kretprobe already provided the per-instance data (as far as
> I know, only systemtap depends on it). We need to provide it for
> such users.
> But if we only have one lock, we can avoid checking NMI because
> we can check the recursion with trylock. It is needed only if the
> kretprobe uses per-instance data. Or we can just pass a dummy
> instance on the stack.

I think it is true in general, you can unregister a rp while tasks are
preempted.

Anyway,. I think I have a solution, just need to talk to paulmck for a
bit.

> > Ignoring that issue for the moment, the below seems to actually work.
> 
> OK, this looks good to me too.
> I'll make a series to rewrite kretprobe based on this patch, OK?

Please, I'll send the fix along when I have it.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-25 13:30           ` peterz
@ 2020-08-25 13:59             ` Masami Hiramatsu
  2020-08-25 14:15               ` peterz
  2020-08-25 14:10             ` peterz
  1 sibling, 1 reply; 33+ messages in thread
From: Masami Hiramatsu @ 2020-08-25 13:59 UTC (permalink / raw)
  To: peterz; +Cc: Eddy_Wu, linux-kernel, x86, David S. Miller

On Tue, 25 Aug 2020 15:30:05 +0200
peterz@infradead.org wrote:

> On Tue, Aug 25, 2020 at 10:15:55PM +0900, Masami Hiramatsu wrote:
> 
> > > damn... one last problem is dangling instances.. so close.
> > > We can apparently unregister a kretprobe while there's still active
> > > kretprobe_instance's out referencing it.
> > 
> > Yeah, kretprobe already provided the per-instance data (as far as
> > I know, only systemtap depends on it). We need to provide it for
> > such users.
> > But if we only have one lock, we can avoid checking NMI because
> > we can check the recursion with trylock. It is needed only if the
> > kretprobe uses per-instance data. Or we can just pass a dummy
> > instance on the stack.
> 
> I think it is true in general, you can unregister a rp while tasks are
> preempted.

Would you mean the kretprobe handler (or trampoline handler) will be
preempted? All kprobes (including kretprobe) handler is running in
non-preemptive state, so it shouldn't happen...

> 
> Anyway,. I think I have a solution, just need to talk to paulmck for a
> bit.

Ah, you mentioned that the removing the kfree() from the trampline
handler? I think we can make an rcu callback which will kfree() the
given instances. (If it works in NMI)

> 
> > > Ignoring that issue for the moment, the below seems to actually work.
> > 
> > OK, this looks good to me too.
> > I'll make a series to rewrite kretprobe based on this patch, OK?
> 
> Please, I'll send the fix along when I have it.

OK, I'm planning to (1) add a generic trampoline code (2) cleanup per-arch
trampoline to use generic one, (3) rewrite the generic trampoline to use
lockless code. Then it will not break anything.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-25 13:30           ` peterz
  2020-08-25 13:59             ` Masami Hiramatsu
@ 2020-08-25 14:10             ` peterz
  2020-08-25 14:19               ` Masami Hiramatsu
  1 sibling, 1 reply; 33+ messages in thread
From: peterz @ 2020-08-25 14:10 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Eddy_Wu, linux-kernel, x86, David S. Miller

On Tue, Aug 25, 2020 at 03:30:05PM +0200, peterz@infradead.org wrote:
> On Tue, Aug 25, 2020 at 10:15:55PM +0900, Masami Hiramatsu wrote:

> > OK, this looks good to me too.
> > I'll make a series to rewrite kretprobe based on this patch, OK?
> 
> Please, I'll send the fix along when I have it.

One approach that I think might work nicely is trying to pull
trampoline_handler() into core code (with a few arch helpers). Then we
can replace that loop once, instead of having to go fix each
architectures one by one.

They're all basically the same loop after all.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-25 13:59             ` Masami Hiramatsu
@ 2020-08-25 14:15               ` peterz
  0 siblings, 0 replies; 33+ messages in thread
From: peterz @ 2020-08-25 14:15 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Eddy_Wu, linux-kernel, x86, David S. Miller, Paul McKenney

On Tue, Aug 25, 2020 at 10:59:54PM +0900, Masami Hiramatsu wrote:
> On Tue, 25 Aug 2020 15:30:05 +0200
> peterz@infradead.org wrote:
> 
> > On Tue, Aug 25, 2020 at 10:15:55PM +0900, Masami Hiramatsu wrote:
> > 
> > > > damn... one last problem is dangling instances.. so close.
> > > > We can apparently unregister a kretprobe while there's still active
> > > > kretprobe_instance's out referencing it.
> > > 
> > > Yeah, kretprobe already provided the per-instance data (as far as
> > > I know, only systemtap depends on it). We need to provide it for
> > > such users.
> > > But if we only have one lock, we can avoid checking NMI because
> > > we can check the recursion with trylock. It is needed only if the
> > > kretprobe uses per-instance data. Or we can just pass a dummy
> > > instance on the stack.
> > 
> > I think it is true in general, you can unregister a rp while tasks are
> > preempted.
> 
> Would you mean the kretprobe handler (or trampoline handler) will be
> preempted? All kprobes (including kretprobe) handler is running in
> non-preemptive state, so it shouldn't happen...

I was thinking about something like:


	for_each_process_thread(p, t) {
		if (!t->kretprobe_instances.first)
			continue;

	again:
		if (try_invoke_on_locked_down_task(t, unhook_rp_inst, tp))
			continue;

		smp_function_call(...);

		if (!done)
			goto again;
	}

So then for each task that has a kretprobe stack, we iterate the stack
and set ri->rp = NULL, remotely when the task isn't running, locally if
the task is running.

I just need to change the semantics of try_invoke_on_locked_down_task()
a bit -- they're a tad weird atm.

> > Anyway,. I think I have a solution, just need to talk to paulmck for a
> > bit.
> 
> Ah, you mentioned that the removing the kfree() from the trampline
> handler? I think we can make an rcu callback which will kfree() the
> given instances. (If it works in NMI)

Yes, calling kfree() from the trampoline seems dodgy at best. When
!ri->rp rcu_free() is a good option.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-25 14:10             ` peterz
@ 2020-08-25 14:19               ` Masami Hiramatsu
  0 siblings, 0 replies; 33+ messages in thread
From: Masami Hiramatsu @ 2020-08-25 14:19 UTC (permalink / raw)
  To: peterz; +Cc: Eddy_Wu, linux-kernel, x86, David S. Miller

On Tue, 25 Aug 2020 16:10:58 +0200
peterz@infradead.org wrote:

> On Tue, Aug 25, 2020 at 03:30:05PM +0200, peterz@infradead.org wrote:
> > On Tue, Aug 25, 2020 at 10:15:55PM +0900, Masami Hiramatsu wrote:
> 
> > > OK, this looks good to me too.
> > > I'll make a series to rewrite kretprobe based on this patch, OK?
> > 
> > Please, I'll send the fix along when I have it.
> 
> One approach that I think might work nicely is trying to pull
> trampoline_handler() into core code (with a few arch helpers). Then we
> can replace that loop once, instead of having to go fix each
> architectures one by one.

Yes, that is what I'm trying (I had done it, but lost the code... let's do it again)

Thank you,

> 
> They're all basically the same loop after all.


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-25 12:09       ` x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint) peterz
  2020-08-25 13:15         ` Masami Hiramatsu
@ 2020-08-26  7:07         ` Eddy_Wu
  2020-08-26  8:22           ` Masami Hiramatsu
  2020-08-26  9:01           ` peterz
  2020-08-26  8:31         ` Masami Hiramatsu
  2 siblings, 2 replies; 33+ messages in thread
From: Eddy_Wu @ 2020-08-26  7:07 UTC (permalink / raw)
  To: peterz; +Cc: Masami Hiramatsu, linux-kernel, x86


> -----Original Message-----
> From: peterz@infradead.org <peterz@infradead.org>
> Sent: Tuesday, August 25, 2020 8:09 PM
> To: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Eddy Wu (RD-TW) <Eddy_Wu@trendmicro.com>; linux-kernel@vger.kernel.org; x86@kernel.org; David S. Miller
> <davem@davemloft.net>
> Subject: Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
>
> Surely we can do a lockless list for this. We have llist_add() and
> llist_del_first() to make a lockless LIFO/stack.
>

llist operations require atomic cmpxchg, for some arch doesn't have CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, in_nmi() check might still needed.
(HAVE_KRETPROBES && !CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG): arc, arm, csky, mips

TREND MICRO EMAIL NOTICE

The information contained in this email and any attachments is confidential and may be subject to copyright or other intellectual property protection. If you are not the intended recipient, you are not authorized to use or disclose this information, and we request that you notify us by reply mail or telephone and delete the original message from your mail system.

For details about what personal information we collect and why, please see our Privacy Notice on our website at: Read privacy policy<http://www.trendmicro.com/privacy>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-26  7:07         ` Eddy_Wu
@ 2020-08-26  8:22           ` Masami Hiramatsu
  2020-08-26  9:06             ` Masami Hiramatsu
  2020-08-26  9:01           ` peterz
  1 sibling, 1 reply; 33+ messages in thread
From: Masami Hiramatsu @ 2020-08-26  8:22 UTC (permalink / raw)
  To: Eddy_Wu; +Cc: peterz, Masami Hiramatsu, linux-kernel, x86

On Wed, 26 Aug 2020 07:07:09 +0000
"Eddy_Wu@trendmicro.com" <Eddy_Wu@trendmicro.com> wrote:

> 
> > -----Original Message-----
> > From: peterz@infradead.org <peterz@infradead.org>
> > Sent: Tuesday, August 25, 2020 8:09 PM
> > To: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Eddy Wu (RD-TW) <Eddy_Wu@trendmicro.com>; linux-kernel@vger.kernel.org; x86@kernel.org; David S. Miller
> > <davem@davemloft.net>
> > Subject: Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
> >
> > Surely we can do a lockless list for this. We have llist_add() and
> > llist_del_first() to make a lockless LIFO/stack.
> >
> 
> llist operations require atomic cmpxchg, for some arch doesn't have CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, in_nmi() check might still needed.
> (HAVE_KRETPROBES && !CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG): arc, arm, csky, mips

Good catch. In those cases, we can add in_nmi() check at arch dependent code.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-25 12:09       ` x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint) peterz
  2020-08-25 13:15         ` Masami Hiramatsu
  2020-08-26  7:07         ` Eddy_Wu
@ 2020-08-26  8:31         ` Masami Hiramatsu
  2 siblings, 0 replies; 33+ messages in thread
From: Masami Hiramatsu @ 2020-08-26  8:31 UTC (permalink / raw)
  To: peterz; +Cc: Eddy_Wu, linux-kernel, x86, David S. Miller

Hi Peter,

On Tue, 25 Aug 2020 14:09:11 +0200
peterz@infradead.org wrote:

> 
> @@ -1934,50 +1884,28 @@ unsigned long __weak arch_deref_entry_point(void *entry)
>  static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  {
>  	struct kretprobe *rp = container_of(p, struct kretprobe, kp);
> -	unsigned long hash, flags = 0;
>  	struct kretprobe_instance *ri;
> +	struct llist_node *llist;
>  
> -	/*
> -	 * To avoid deadlocks, prohibit return probing in NMI contexts,
> -	 * just skip the probe and increase the (inexact) 'nmissed'
> -	 * statistical counter, so that the user is informed that
> -	 * something happened:
> -	 */
> -	if (unlikely(in_nmi())) {
> +	llist = llist_del_first(&rp->free_instances);
> +	if (!llist) {
>  		rp->nmissed++;
>  		return 0;
>  	}

Would we need a lock around llist_del_first(&rp->free_instance) here?

linux/llist.h said,

 * Cases where locking is not needed:
 * If there are multiple producers and multiple consumers, llist_add can be
 * used in producers and llist_del_all can be used in consumers simultaneously
 * without locking. Also a single consumer can use llist_del_first while
                               ^^^^^^^^^^^^^^^^^^^^^^^
 * multiple producers simultaneously use llist_add, without any locking.
 *
 * Cases where locking is needed:
 * If we have multiple consumers with llist_del_first used in one consumer, and
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 * llist_del_first or llist_del_all used in other consumers, then a lock is
 * needed.

pre_handler_kretprobe() can be invoked simultaneously on the different CPUs
if those are calling the same probed function.


Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-26  7:07         ` Eddy_Wu
  2020-08-26  8:22           ` Masami Hiramatsu
@ 2020-08-26  9:01           ` peterz
  2020-08-26  9:21             ` peterz
  1 sibling, 1 reply; 33+ messages in thread
From: peterz @ 2020-08-26  9:01 UTC (permalink / raw)
  To: Eddy_Wu; +Cc: Masami Hiramatsu, linux-kernel, x86

On Wed, Aug 26, 2020 at 07:07:09AM +0000, Eddy_Wu@trendmicro.com wrote:
> llist operations require atomic cmpxchg, for some arch doesn't have
> CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, in_nmi() check might still needed.
> (HAVE_KRETPROBES && !CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG): arc, arm,
> csky, mips
> 

Look at the MIPS NMI handler, it's brilliant ;-)

Anyway, I think that CONFIG could use a little help, the point was to
opt-in to some code, and it was supposed to leave out known broken
architectures.

If your architecture has NMIs (not all of them do) or SMP, and doesn't
have sane atomic ops (CAS or LL/SC), then it's broken crap and I don't
care about it, full stop.

Those architectures are known broken and limp along on pure luck, that
CONFIG flag lets them disable some code that makes them crash faster.

The same with non-coherent SMP, some archs tried to limp along, nobody
cared about them, and I think we've since deleted them. I long for the
day we get to delete the last of these broken atomic archs.

Known broken archs include: Sparc32-SMP, PARISC, ARC-v1-SMP.
There might be a few more, but I've forgotten.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-26  8:22           ` Masami Hiramatsu
@ 2020-08-26  9:06             ` Masami Hiramatsu
  2020-08-26 10:00               ` Masami Hiramatsu
  0 siblings, 1 reply; 33+ messages in thread
From: Masami Hiramatsu @ 2020-08-26  9:06 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Eddy_Wu, peterz, linux-kernel, x86

On Wed, 26 Aug 2020 17:22:39 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Wed, 26 Aug 2020 07:07:09 +0000
> "Eddy_Wu@trendmicro.com" <Eddy_Wu@trendmicro.com> wrote:
> 
> > 
> > > -----Original Message-----
> > > From: peterz@infradead.org <peterz@infradead.org>
> > > Sent: Tuesday, August 25, 2020 8:09 PM
> > > To: Masami Hiramatsu <mhiramat@kernel.org>
> > > Cc: Eddy Wu (RD-TW) <Eddy_Wu@trendmicro.com>; linux-kernel@vger.kernel.org; x86@kernel.org; David S. Miller
> > > <davem@davemloft.net>
> > > Subject: Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
> > >
> > > Surely we can do a lockless list for this. We have llist_add() and
> > > llist_del_first() to make a lockless LIFO/stack.
> > >
> > 
> > llist operations require atomic cmpxchg, for some arch doesn't have CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, in_nmi() check might still needed.
> > (HAVE_KRETPROBES && !CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG): arc, arm, csky, mips
> 
> Good catch. In those cases, we can add in_nmi() check at arch dependent code.

Oops, in_nmi() check is needed in pre_kretprobe_handler() which has no
arch dependent code. Hmm, so we still need an weak function to check it...

Thanks,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-26  9:01           ` peterz
@ 2020-08-26  9:21             ` peterz
  0 siblings, 0 replies; 33+ messages in thread
From: peterz @ 2020-08-26  9:21 UTC (permalink / raw)
  To: Eddy_Wu; +Cc: Masami Hiramatsu, linux-kernel, x86

On Wed, Aug 26, 2020 at 11:01:02AM +0200, peterz@infradead.org wrote:

> Known broken archs include: Sparc32-SMP, PARISC, ARC-v1-SMP.
> There might be a few more, but I've forgotten.

Note that none of those actually have NMIs and llist is mostly OK on
those architectures too.

The problem is when you combine cmpxchg() with regular stores and
llist() doesn't do that.

The only architectures that have NMIs are: ARM, ARM64, POWERPC, s390,
SH, SPARC64, X86 and XTENSA and all of them have sane atomic ops.

(XTENSA is tricky, because it looks like it has parts that don't have
sane atomics, but I _think_ those parts also don't have NMI)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-26  9:06             ` Masami Hiramatsu
@ 2020-08-26 10:00               ` Masami Hiramatsu
  2020-08-26 10:25                 ` peterz
  0 siblings, 1 reply; 33+ messages in thread
From: Masami Hiramatsu @ 2020-08-26 10:00 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Eddy_Wu, peterz, linux-kernel, x86

On Wed, 26 Aug 2020 18:06:45 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Wed, 26 Aug 2020 17:22:39 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > On Wed, 26 Aug 2020 07:07:09 +0000
> > "Eddy_Wu@trendmicro.com" <Eddy_Wu@trendmicro.com> wrote:
> > 
> > > 
> > > > -----Original Message-----
> > > > From: peterz@infradead.org <peterz@infradead.org>
> > > > Sent: Tuesday, August 25, 2020 8:09 PM
> > > > To: Masami Hiramatsu <mhiramat@kernel.org>
> > > > Cc: Eddy Wu (RD-TW) <Eddy_Wu@trendmicro.com>; linux-kernel@vger.kernel.org; x86@kernel.org; David S. Miller
> > > > <davem@davemloft.net>
> > > > Subject: Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
> > > >
> > > > Surely we can do a lockless list for this. We have llist_add() and
> > > > llist_del_first() to make a lockless LIFO/stack.
> > > >
> > > 
> > > llist operations require atomic cmpxchg, for some arch doesn't have CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, in_nmi() check might still needed.
> > > (HAVE_KRETPROBES && !CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG): arc, arm, csky, mips
> > 
> > Good catch. In those cases, we can add in_nmi() check at arch dependent code.
> 
> Oops, in_nmi() check is needed in pre_kretprobe_handler() which has no
> arch dependent code. Hmm, so we still need an weak function to check it...

Oops, again. Sorry I found a big misunderstand. I found the in_nmi() check is
completely unnecessary with Jiri's commit 9b38cc704e84 ("kretprobe: Prevent 
triggering kretprobe from within kprobe_flush_task").

This commit introduced the kprobe_busy_begin/end() to the kretproeb trampoline
handler, which set a dummy kprobe to the per-cpu current kprobe pointer.
This current-kprobe is checked at the kprobe pre handler to prevent kprobes
(including kretprobe) recursion.

This means, if an NMI interrupts a kretprobe operation (both pre-handler and
trampoline-handler) and it hits the same kretprobe, this nested kretprobe
handlers never be called, because there is a current kprobe is already set.
Thus, we are totally safe from double-lock issue in the kretprobe handlers.

So we can just remove the in_nmi() check from pre_kretprobe_handler() if
we introduced a generic trampoline handler, since kprobe_busy_begin/end()
must called from the trampoline handlers. Currently it is used on x86 only.

Of course, this doesn't solve the llist_del_first() contention in the
pre_kretprobe_handler(). So anyway we need a lock for per-probe llist
(if I understand llist.h comment correctly.)

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-26 10:00               ` Masami Hiramatsu
@ 2020-08-26 10:25                 ` peterz
  2020-08-26 13:36                   ` Eddy_Wu
  0 siblings, 1 reply; 33+ messages in thread
From: peterz @ 2020-08-26 10:25 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Eddy_Wu, linux-kernel, x86

On Wed, Aug 26, 2020 at 07:00:41PM +0900, Masami Hiramatsu wrote:
> Of course, this doesn't solve the llist_del_first() contention in the
> pre_kretprobe_handler(). So anyway we need a lock for per-probe llist
> (if I understand llist.h comment correctly.)

Bah, lemme think about that. Kprobes really shouldn't be using locks :/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-26 10:25                 ` peterz
@ 2020-08-26 13:36                   ` Eddy_Wu
  2020-08-26 13:51                     ` Masami Hiramatsu
  0 siblings, 1 reply; 33+ messages in thread
From: Eddy_Wu @ 2020-08-26 13:36 UTC (permalink / raw)
  To: peterz; +Cc: mhiramat, linux-kernel, x86



> -----Original Message-----
> From: peterz@infradead.org <peterz@infradead.org>
> Sent: Wednesday, August 26, 2020 6:26 PM
> To: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Eddy Wu (RD-TW) <Eddy_Wu@trendmicro.com>; linux-kernel@vger.kernel.org; x86@kernel.org
> Subject: Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
>
> On Wed, Aug 26, 2020 at 07:00:41PM +0900, Masami Hiramatsu wrote:
> > Of course, this doesn't solve the llist_del_first() contention in the
> > pre_kretprobe_handler(). So anyway we need a lock for per-probe llist
> > (if I understand llist.h comment correctly.)
>
> Bah, lemme think about that. Kprobes really shouldn't be using locks :/

Maybe we can have per-cpu free list for retprobe_instance?
This ensure we only have one user requesting free instance at a time, given that pre_kretprobe_handler() wont recursive.

We may be wasting memory if target function perfer some cpu though.


TREND MICRO EMAIL NOTICE

The information contained in this email and any attachments is confidential and may be subject to copyright or other intellectual property protection. If you are not the intended recipient, you are not authorized to use or disclose this information, and we request that you notify us by reply mail or telephone and delete the original message from your mail system.

For details about what personal information we collect and why, please see our Privacy Notice on our website at: Read privacy policy<http://www.trendmicro.com/privacy>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-26 13:36                   ` Eddy_Wu
@ 2020-08-26 13:51                     ` Masami Hiramatsu
  0 siblings, 0 replies; 33+ messages in thread
From: Masami Hiramatsu @ 2020-08-26 13:51 UTC (permalink / raw)
  To: Eddy_Wu; +Cc: peterz, mhiramat, linux-kernel, x86

On Wed, 26 Aug 2020 13:36:15 +0000
"Eddy_Wu@trendmicro.com" <Eddy_Wu@trendmicro.com> wrote:

> 
> 
> > -----Original Message-----
> > From: peterz@infradead.org <peterz@infradead.org>
> > Sent: Wednesday, August 26, 2020 6:26 PM
> > To: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Eddy Wu (RD-TW) <Eddy_Wu@trendmicro.com>; linux-kernel@vger.kernel.org; x86@kernel.org
> > Subject: Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
> >
> > On Wed, Aug 26, 2020 at 07:00:41PM +0900, Masami Hiramatsu wrote:
> > > Of course, this doesn't solve the llist_del_first() contention in the
> > > pre_kretprobe_handler(). So anyway we need a lock for per-probe llist
> > > (if I understand llist.h comment correctly.)
> >
> > Bah, lemme think about that. Kprobes really shouldn't be using locks :/
> 
> Maybe we can have per-cpu free list for retprobe_instance?
> This ensure we only have one user requesting free instance at a time, given that pre_kretprobe_handler() wont recursive.

That will restrict kretprobe not to probe the recursive call loop... 
and if a thread is scheduled, another thread will not be probed too.

I think lockless kretprobe is finally implemented by merging it's
shadow-stack with func-graph tracer.

Thank you,

> 
> We may be wasting memory if target function perfer some cpu though.
> 
> 
> TREND MICRO EMAIL NOTICE
> 
> The information contained in this email and any attachments is confidential and may be subject to copyright or other intellectual property protection. If you are not the intended recipient, you are not authorized to use or disclose this information, and we request that you notify us by reply mail or telephone and delete the original message from your mail system.
> 
> For details about what personal information we collect and why, please see our Privacy Notice on our website at: Read privacy policy<http://www.trendmicro.com/privacy>


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)
  2020-08-25 13:15         ` Masami Hiramatsu
  2020-08-25 13:30           ` peterz
@ 2020-08-27  9:02           ` peterz
  1 sibling, 0 replies; 33+ messages in thread
From: peterz @ 2020-08-27  9:02 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: Eddy_Wu, linux-kernel, x86, David S. Miller

On Tue, Aug 25, 2020 at 10:15:55PM +0900, Masami Hiramatsu wrote:

> Yeah, kretprobe already provided the per-instance data (as far as
> I know, only systemtap depends on it). We need to provide it for
> such users.

Well, systemtap is out of tree, we don't _need_ to provide anything for
them. Furthermore, the function-graph tracer's ret_stack, which you said
you wanted to integrate with, also doesn't provide this.

Ditching it makes things simpler in that all kretprobe_instance's will
be the same and we no longer need per kretprobe storage of them.

Anyway, let me try and preserve them for now...

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2020-08-27  9:02 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-24 12:02 x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint) Eddy_Wu
2020-08-24 14:14 ` Peter Zijlstra
2020-08-24 16:18   ` Eddy_Wu
2020-08-24 18:15   ` Masami Hiramatsu
2020-08-25  7:36     ` peterz
2020-08-24 15:54 ` Masami Hiramatsu
2020-08-24 16:41   ` Eddy_Wu
2020-08-25  6:15     ` Masami Hiramatsu
2020-08-25  8:33       ` Eddy_Wu
2020-08-25 11:06       ` [PATCH] kprobes/x86: Fixes NMI context check on x86 kernel test robot
2020-08-25 11:06         ` kernel test robot
2020-08-25 12:09       ` x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint) peterz
2020-08-25 13:15         ` Masami Hiramatsu
2020-08-25 13:30           ` peterz
2020-08-25 13:59             ` Masami Hiramatsu
2020-08-25 14:15               ` peterz
2020-08-25 14:10             ` peterz
2020-08-25 14:19               ` Masami Hiramatsu
2020-08-27  9:02           ` peterz
2020-08-26  7:07         ` Eddy_Wu
2020-08-26  8:22           ` Masami Hiramatsu
2020-08-26  9:06             ` Masami Hiramatsu
2020-08-26 10:00               ` Masami Hiramatsu
2020-08-26 10:25                 ` peterz
2020-08-26 13:36                   ` Eddy_Wu
2020-08-26 13:51                     ` Masami Hiramatsu
2020-08-26  9:01           ` peterz
2020-08-26  9:21             ` peterz
2020-08-26  8:31         ` Masami Hiramatsu
2020-08-25 12:20       ` [PATCH] kprobes/x86: Fixes NMI context check on x86 kernel test robot
2020-08-25 12:20         ` kernel test robot
2020-08-25 12:25       ` kernel test robot
2020-08-25 12:25         ` kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.