linux-arch.vger.kernel.org archive mirror
* [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
@ 2020-08-29 12:59 Masami Hiramatsu
  2020-08-29 13:00 ` [PATCH v5 01/21] kprobes: Add generic kretprobe trampoline handler Masami Hiramatsu
                   ` (21 more replies)
  0 siblings, 22 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 12:59 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Hi,

Here is the 5th version of the series to unify the kretprobe trampoline handlers
and to make kretprobe lockless. Thanks Peter for this work!

The previous version is here:

 https://lkml.kernel.org/r/159861759775.992023.12553306821235086809.stgit@devnote2

This version merges the remove-task-scan patch into the remove-kretprobe-hash
patch, updates the code according to the review comments, and fixes a few minor issues.

This version is also available on
git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git lockless-kretprobe-v5

I ran a smoke test and ftracetest on x86-64, and did build tests for x86-64, i386,
arm, arm64, and mips.


Thank you,

---

Masami Hiramatsu (16):
      kprobes: Add generic kretprobe trampoline handler
      x86/kprobes: Use generic kretprobe trampoline handler
      arm: kprobes: Use generic kretprobe trampoline handler
      arm64: kprobes: Use generic kretprobe trampoline handler
      arc: kprobes: Use generic kretprobe trampoline handler
      csky: kprobes: Use generic kretprobe trampoline handler
      ia64: kprobes: Use generic kretprobe trampoline handler
      mips: kprobes: Use generic kretprobe trampoline handler
      parisc: kprobes: Use generic kretprobe trampoline handler
      powerpc: kprobes: Use generic kretprobe trampoline handler
      s390: kprobes: Use generic kretprobe trampoline handler
      sh: kprobes: Use generic kretprobe trampoline handler
      sparc: kprobes: Use generic kretprobe trampoline handler
      kprobes: Remove NMI context check
      kprobes: Free kretprobe_instance with rcu callback
      kprobes: Make local used functions static

Peter Zijlstra (5):
      llist: Add nonatomic __llist_add() and __llist_del_all()
      kprobes: Remove kretprobe hash
      asm-generic/atomic: Add try_cmpxchg() fallbacks
      freelist: Lock less freelist
      kprobes: Replace rp->free_instance with freelist


 arch/arc/kernel/kprobes.c                 |   54 ------
 arch/arm/probes/kprobes/core.c            |   78 ---------
 arch/arm64/kernel/probes/kprobes.c        |   78 ---------
 arch/csky/kernel/probes/kprobes.c         |   77 --------
 arch/ia64/kernel/kprobes.c                |   77 --------
 arch/mips/kernel/kprobes.c                |   54 ------
 arch/parisc/kernel/kprobes.c              |   76 --------
 arch/powerpc/kernel/kprobes.c             |   53 ------
 arch/s390/kernel/kprobes.c                |   79 ---------
 arch/sh/kernel/kprobes.c                  |   58 ------
 arch/sparc/kernel/kprobes.c               |   51 ------
 arch/x86/include/asm/atomic.h             |    2 
 arch/x86/include/asm/atomic64_64.h        |    2 
 arch/x86/include/asm/cmpxchg.h            |    2 
 arch/x86/kernel/kprobes/core.c            |  108 ------------
 drivers/gpu/drm/i915/i915_request.c       |    6 -
 include/asm-generic/atomic-instrumented.h |  216 ++++++++++++++----------
 include/linux/atomic-arch-fallback.h      |   90 +++++++++-
 include/linux/atomic-fallback.h           |   90 +++++++++-
 include/linux/freelist.h                  |  129 ++++++++++++++
 include/linux/kprobes.h                   |   74 +++++---
 include/linux/llist.h                     |   23 +++
 include/linux/sched.h                     |    4 
 kernel/fork.c                             |    4 
 kernel/kprobes.c                          |  264 +++++++++++++----------------
 kernel/trace/trace_kprobe.c               |    3 
 scripts/atomic/gen-atomic-fallback.sh     |   63 ++++++-
 scripts/atomic/gen-atomic-instrumented.sh |   29 +++
 28 files changed, 735 insertions(+), 1109 deletions(-)
 create mode 100644 include/linux/freelist.h

--
Masami Hiramatsu (Linaro) <mhiramat@kernel.org>


* [PATCH v5 01/21] kprobes: Add generic kretprobe trampoline handler
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
@ 2020-08-29 13:00 ` Masami Hiramatsu
  2020-08-29 13:00 ` [PATCH v5 02/21] x86/kprobes: Use " Masami Hiramatsu
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:00 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Add a generic kretprobe trampoline handler to unify all of the
cloned arch/* kretprobe trampoline handlers.

The generic kretprobe trampoline handler is based on the x86
implementation, because it is the most up-to-date one. It has
frame pointer checking, kprobe_busy_begin/end, and return
address fixup for user handlers.
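
As a rough sketch of the arch-side contract this handler expects (the
register name below is a hypothetical placeholder, not any particular
architecture; see the per-arch patches later in this series for the real
conversions): arch_prepare_kretprobe() records the real return address
and, if the architecture can verify it, a frame pointer, then diverts the
return into kretprobe_trampoline; the arch trampoline then simply asks
the generic handler for the address to resume at.

	void arch_prepare_kretprobe(struct kretprobe_instance *ri,
				    struct pt_regs *regs)
	{
		/* remember where the probed function would have returned */
		ri->ret_addr = (kprobe_opcode_t *)regs->link_reg; /* hypothetical */
		ri->fp = NULL;	/* or a frame/stack pointer the arch can check later */

		/* divert the return into the arch's kretprobe_trampoline stub */
		regs->link_reg = (unsigned long)&kretprobe_trampoline;
	}

	/* Called from the arch's kretprobe_trampoline asm stub */
	static __used void *trampoline_handler(struct pt_regs *regs)
	{
		/* run user handlers, recycle instances, return the real address */
		return (void *)kretprobe_trampoline_handler(regs,
					&kretprobe_trampoline, NULL);
	}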

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Changes in v5:
   - Restore actual current_kprobe instead of kprobe_busy.
 Changes in v4:
   - Remove orig_ret_address
   - Change the type of trampoline_address to void *
---
 include/linux/kprobes.h |   32 +++++++++++++--
 kernel/kprobes.c        |   98 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+), 4 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9be1bff4f586..c6a913e608b7 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -187,10 +187,38 @@ static inline int kprobes_built_in(void)
 	return 1;
 }
 
+extern struct kprobe kprobe_busy;
+void kprobe_busy_begin(void);
+void kprobe_busy_end(void);
+
 #ifdef CONFIG_KRETPROBES
 extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
 				   struct pt_regs *regs);
 extern int arch_trampoline_kprobe(struct kprobe *p);
+
+/* If the trampoline handler called from a kprobe, use this version */
+unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
+				void *trampoline_address,
+				void *frame_pointer);
+
+static nokprobe_inline
+unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
+				void *trampoline_address,
+				void *frame_pointer)
+{
+	unsigned long ret;
+	/*
+	 * Set a dummy kprobe for avoiding kretprobe recursion.
+	 * Since kretprobe never runs in kprobe handler, any kprobe must not
+	 * be running at this point.
+	 */
+	kprobe_busy_begin();
+	ret = __kretprobe_trampoline_handler(regs, trampoline_address, frame_pointer);
+	kprobe_busy_end();
+
+	return ret;
+}
+
 #else /* CONFIG_KRETPROBES */
 static inline void arch_prepare_kretprobe(struct kretprobe *rp,
 					struct pt_regs *regs)
@@ -354,10 +382,6 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
 	return this_cpu_ptr(&kprobe_ctlblk);
 }
 
-extern struct kprobe kprobe_busy;
-void kprobe_busy_begin(void);
-void kprobe_busy_end(void);
-
 kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
 int register_kprobe(struct kprobe *p);
 void unregister_kprobe(struct kprobe *p);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 287b263c9cb9..a0afaa79024e 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1927,6 +1927,104 @@ unsigned long __weak arch_deref_entry_point(void *entry)
 }
 
 #ifdef CONFIG_KRETPROBES
+
+unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
+					     void *trampoline_address,
+					     void *frame_pointer)
+{
+	struct kretprobe_instance *ri = NULL, *last = NULL;
+	struct hlist_head *head, empty_rp;
+	struct hlist_node *tmp;
+	unsigned long flags;
+	kprobe_opcode_t *correct_ret_addr = NULL;
+	bool skipped = false;
+
+	INIT_HLIST_HEAD(&empty_rp);
+	kretprobe_hash_lock(current, &head, &flags);
+
+	/*
+	 * It is possible to have multiple instances associated with a given
+	 * task either because multiple functions in the call path have
+	 * return probes installed on them, and/or more than one
+	 * return probe was registered for a target function.
+	 *
+	 * We can handle this because:
+	 *     - instances are always pushed into the head of the list
+	 *     - when multiple return probes are registered for the same
+	 *	 function, the (chronologically) first instance's ret_addr
+	 *	 will be the real return address, and all the rest will
+	 *	 point to kretprobe_trampoline.
+	 */
+	hlist_for_each_entry(ri, head, hlist) {
+		if (ri->task != current)
+			/* another task is sharing our hash bucket */
+			continue;
+		/*
+		 * Return probes must be pushed on this hash list correct
+		 * order (same as return order) so that it can be popped
+		 * correctly. However, if we find it is pushed it incorrect
+		 * order, this means we find a function which should not be
+		 * probed, because the wrong order entry is pushed on the
+		 * path of processing other kretprobe itself.
+		 */
+		if (ri->fp != frame_pointer) {
+			if (!skipped)
+				pr_warn("kretprobe is stacked incorrectly. Trying to fixup.\n");
+			skipped = true;
+			continue;
+		}
+
+		correct_ret_addr = ri->ret_addr;
+		if (skipped)
+			pr_warn("%ps must be blacklisted because of incorrect kretprobe order\n",
+				ri->rp->kp.addr);
+
+		if (correct_ret_addr != trampoline_address)
+			/*
+			 * This is the real return address. Any other
+			 * instances associated with this task are for
+			 * other calls deeper on the call stack
+			 */
+			break;
+	}
+
+	kretprobe_assert(ri, (unsigned long)correct_ret_addr,
+			     (unsigned long)trampoline_address);
+	last = ri;
+
+	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
+		if (ri->task != current)
+			/* another task is sharing our hash bucket */
+			continue;
+		if (ri->fp != frame_pointer)
+			continue;
+
+		if (ri->rp && ri->rp->handler) {
+			struct kprobe *prev = kprobe_running();
+
+			__this_cpu_write(current_kprobe, &ri->rp->kp);
+			ri->ret_addr = correct_ret_addr;
+			ri->rp->handler(ri, regs);
+			__this_cpu_write(current_kprobe, prev);
+		}
+
+		recycle_rp_inst(ri, &empty_rp);
+
+		if (ri == last)
+			break;
+	}
+
+	kretprobe_hash_unlock(current, &flags);
+
+	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
+		hlist_del(&ri->hlist);
+		kfree(ri);
+	}
+
+	return (unsigned long)correct_ret_addr;
+}
+NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
+
 /*
  * This kprobe pre_handler is registered with every kretprobe. When probe
  * hits it will set up the return probe.



* [PATCH v5 02/21] x86/kprobes: Use generic kretprobe trampoline handler
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
  2020-08-29 13:00 ` [PATCH v5 01/21] kprobes: Add generic kretprobe trampoline handler Masami Hiramatsu
@ 2020-08-29 13:00 ` Masami Hiramatsu
  2020-08-29 13:00 ` [PATCH v5 03/21] arm: kprobes: " Masami Hiramatsu
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:00 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/x86/kernel/kprobes/core.c |  108 +---------------------------------------
 1 file changed, 3 insertions(+), 105 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index fdadc37d72af..882b95313ad6 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -767,124 +767,22 @@ asm(
 NOKPROBE_SYMBOL(kretprobe_trampoline);
 STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
 
+
 /*
  * Called from kretprobe_trampoline
  */
 __used __visible void *trampoline_handler(struct pt_regs *regs)
 {
-	struct kretprobe_instance *ri = NULL;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long flags, orig_ret_address = 0;
-	unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
-	kprobe_opcode_t *correct_ret_addr = NULL;
-	void *frame_pointer;
-	bool skipped = false;
-
-	/*
-	 * Set a dummy kprobe for avoiding kretprobe recursion.
-	 * Since kretprobe never run in kprobe handler, kprobe must not
-	 * be running at this point.
-	 */
-	kprobe_busy_begin();
-
-	INIT_HLIST_HEAD(&empty_rp);
-	kretprobe_hash_lock(current, &head, &flags);
 	/* fixup registers */
 	regs->cs = __KERNEL_CS;
 #ifdef CONFIG_X86_32
 	regs->cs |= get_kernel_rpl();
 	regs->gs = 0;
 #endif
-	/* We use pt_regs->sp for return address holder. */
-	frame_pointer = &regs->sp;
-	regs->ip = trampoline_address;
+	regs->ip = (unsigned long)&kretprobe_trampoline;
 	regs->orig_ax = ~0UL;
 
-	/*
-	 * It is possible to have multiple instances associated with a given
-	 * task either because multiple functions in the call path have
-	 * return probes installed on them, and/or more than one
-	 * return probe was registered for a target function.
-	 *
-	 * We can handle this because:
-	 *     - instances are always pushed into the head of the list
-	 *     - when multiple return probes are registered for the same
-	 *	 function, the (chronologically) first instance's ret_addr
-	 *	 will be the real return address, and all the rest will
-	 *	 point to kretprobe_trampoline.
-	 */
-	hlist_for_each_entry(ri, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-		/*
-		 * Return probes must be pushed on this hash list correct
-		 * order (same as return order) so that it can be popped
-		 * correctly. However, if we find it is pushed it incorrect
-		 * order, this means we find a function which should not be
-		 * probed, because the wrong order entry is pushed on the
-		 * path of processing other kretprobe itself.
-		 */
-		if (ri->fp != frame_pointer) {
-			if (!skipped)
-				pr_warn("kretprobe is stacked incorrectly. Trying to fixup.\n");
-			skipped = true;
-			continue;
-		}
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		if (skipped)
-			pr_warn("%ps must be blacklisted because of incorrect kretprobe order\n",
-				ri->rp->kp.addr);
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
-	correct_ret_addr = ri->ret_addr;
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-		if (ri->fp != frame_pointer)
-			continue;
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		if (ri->rp && ri->rp->handler) {
-			__this_cpu_write(current_kprobe, &ri->rp->kp);
-			ri->ret_addr = correct_ret_addr;
-			ri->rp->handler(ri, regs);
-			__this_cpu_write(current_kprobe, &kprobe_busy);
-		}
-
-		recycle_rp_inst(ri, &empty_rp);
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_hash_unlock(current, &flags);
-
-	kprobe_busy_end();
-
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
-	return (void *)orig_ret_address;
+	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, &regs->sp);
 }
 NOKPROBE_SYMBOL(trampoline_handler);
 



* [PATCH v5 03/21] arm: kprobes: Use generic kretprobe trampoline handler
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
  2020-08-29 13:00 ` [PATCH v5 01/21] kprobes: Add generic kretprobe trampoline handler Masami Hiramatsu
  2020-08-29 13:00 ` [PATCH v5 02/21] x86/kprobes: Use " Masami Hiramatsu
@ 2020-08-29 13:00 ` Masami Hiramatsu
  2020-08-29 13:00 ` [PATCH v5 04/21] arm64: " Masami Hiramatsu
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:00 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Use the generic kretprobe trampoline handler. Use regs->ARM_fp
for frame pointer verification.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
  Changes in v2:
   - Fix to cast frame_pointer type to void *.
---
 arch/arm/probes/kprobes/core.c |   78 ++--------------------------------------
 1 file changed, 3 insertions(+), 75 deletions(-)

diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c
index feefa2055eba..a9653117ca0d 100644
--- a/arch/arm/probes/kprobes/core.c
+++ b/arch/arm/probes/kprobes/core.c
@@ -413,87 +413,15 @@ void __naked __kprobes kretprobe_trampoline(void)
 /* Called from kretprobe_trampoline */
 static __used __kprobes void *trampoline_handler(struct pt_regs *regs)
 {
-	struct kretprobe_instance *ri = NULL;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long flags, orig_ret_address = 0;
-	unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
-	kprobe_opcode_t *correct_ret_addr = NULL;
-
-	INIT_HLIST_HEAD(&empty_rp);
-	kretprobe_hash_lock(current, &head, &flags);
-
-	/*
-	 * It is possible to have multiple instances associated with a given
-	 * task either because multiple functions in the call path have
-	 * a return probe installed on them, and/or more than one return
-	 * probe was registered for a target function.
-	 *
-	 * We can handle this because:
-	 *     - instances are always inserted at the head of the list
-	 *     - when multiple return probes are registered for the same
-	 *       function, the first instance's ret_addr will point to the
-	 *       real return address, and all the rest will point to
-	 *       kretprobe_trampoline
-	 */
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
-	correct_ret_addr = ri->ret_addr;
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		if (ri->rp && ri->rp->handler) {
-			__this_cpu_write(current_kprobe, &ri->rp->kp);
-			get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
-			ri->ret_addr = correct_ret_addr;
-			ri->rp->handler(ri, regs);
-			__this_cpu_write(current_kprobe, NULL);
-		}
-
-		recycle_rp_inst(ri, &empty_rp);
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_hash_unlock(current, &flags);
-
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
-
-	return (void *)orig_ret_address;
+	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
+						    (void *)regs->ARM_fp);
 }
 
 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 				      struct pt_regs *regs)
 {
 	ri->ret_addr = (kprobe_opcode_t *)regs->ARM_lr;
+	ri->fp = (void *)regs->ARM_fp;
 
 	/* Replace the return addr with trampoline addr. */
 	regs->ARM_lr = (unsigned long)&kretprobe_trampoline;



* [PATCH v5 04/21] arm64: kprobes: Use generic kretprobe trampoline handler
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (2 preceding siblings ...)
  2020-08-29 13:00 ` [PATCH v5 03/21] arm: kprobes: " Masami Hiramatsu
@ 2020-08-29 13:00 ` Masami Hiramatsu
  2020-08-29 13:00 ` [PATCH v5 05/21] arc: " Masami Hiramatsu
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:00 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Use the generic kretprobe trampoline handler, and use
kernel_stack_pointer(regs) for frame pointer verification.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/arm64/kernel/probes/kprobes.c |   78 +-----------------------------------
 1 file changed, 3 insertions(+), 75 deletions(-)

diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index 5290f17a4d80..deba738142ed 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -464,87 +464,15 @@ int __init arch_populate_kprobe_blacklist(void)
 
 void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
 {
-	struct kretprobe_instance *ri = NULL;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long flags, orig_ret_address = 0;
-	unsigned long trampoline_address =
-		(unsigned long)&kretprobe_trampoline;
-	kprobe_opcode_t *correct_ret_addr = NULL;
-
-	INIT_HLIST_HEAD(&empty_rp);
-	kretprobe_hash_lock(current, &head, &flags);
-
-	/*
-	 * It is possible to have multiple instances associated with a given
-	 * task either because multiple functions in the call path have
-	 * return probes installed on them, and/or more than one
-	 * return probe was registered for a target function.
-	 *
-	 * We can handle this because:
-	 *     - instances are always pushed into the head of the list
-	 *     - when multiple return probes are registered for the same
-	 *	 function, the (chronologically) first instance's ret_addr
-	 *	 will be the real return address, and all the rest will
-	 *	 point to kretprobe_trampoline.
-	 */
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
-	correct_ret_addr = ri->ret_addr;
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		if (ri->rp && ri->rp->handler) {
-			__this_cpu_write(current_kprobe, &ri->rp->kp);
-			get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
-			ri->ret_addr = correct_ret_addr;
-			ri->rp->handler(ri, regs);
-			__this_cpu_write(current_kprobe, NULL);
-		}
-
-		recycle_rp_inst(ri, &empty_rp);
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_hash_unlock(current, &flags);
-
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
-	return (void *)orig_ret_address;
+	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
+					(void *)kernel_stack_pointer(regs));
 }
 
 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 				      struct pt_regs *regs)
 {
 	ri->ret_addr = (kprobe_opcode_t *)regs->regs[30];
+	ri->fp = (void *)kernel_stack_pointer(regs);
 
 	/* replace return addr (x30) with trampoline */
 	regs->regs[30] = (long)&kretprobe_trampoline;



* [PATCH v5 05/21] arc: kprobes: Use generic kretprobe trampoline handler
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (3 preceding siblings ...)
  2020-08-29 13:00 ` [PATCH v5 04/21] arm64: " Masami Hiramatsu
@ 2020-08-29 13:00 ` Masami Hiramatsu
  2020-08-29 13:00 ` [PATCH v5 06/21] csky: " Masami Hiramatsu
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:00 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/arc/kernel/kprobes.c |   54 ++-------------------------------------------
 1 file changed, 2 insertions(+), 52 deletions(-)

diff --git a/arch/arc/kernel/kprobes.c b/arch/arc/kernel/kprobes.c
index 7d3efe83cba7..cabef45f11df 100644
--- a/arch/arc/kernel/kprobes.c
+++ b/arch/arc/kernel/kprobes.c
@@ -388,6 +388,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 {
 
 	ri->ret_addr = (kprobe_opcode_t *) regs->blink;
+	ri->fp = NULL;
 
 	/* Replace the return addr with trampoline addr */
 	regs->blink = (unsigned long)&kretprobe_trampoline;
@@ -396,58 +397,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 static int __kprobes trampoline_probe_handler(struct kprobe *p,
 					      struct pt_regs *regs)
 {
-	struct kretprobe_instance *ri = NULL;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long flags, orig_ret_address = 0;
-	unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
-
-	INIT_HLIST_HEAD(&empty_rp);
-	kretprobe_hash_lock(current, &head, &flags);
-
-	/*
-	 * It is possible to have multiple instances associated with a given
-	 * task either because an multiple functions in the call path
-	 * have a return probe installed on them, and/or more than one return
-	 * return probe was registered for a target function.
-	 *
-	 * We can handle this because:
-	 *     - instances are always inserted at the head of the list
-	 *     - when multiple return probes are registered for the same
-	 *       function, the first instance's ret_addr will point to the
-	 *       real return address, and all the rest will point to
-	 *       kretprobe_trampoline
-	 */
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		if (ri->rp && ri->rp->handler)
-			ri->rp->handler(ri, regs);
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		recycle_rp_inst(ri, &empty_rp);
-
-		if (orig_ret_address != trampoline_address) {
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-		}
-	}
-
-	kretprobe_assert(ri, orig_ret_address, trampoline_address);
-	regs->ret = orig_ret_address;
-
-	kretprobe_hash_unlock(current, &flags);
-
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
+	regs->ret = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
 
 	/* By returning a non zero value, we are telling the kprobe handler
 	 * that we don't want the post_handler to run



* [PATCH v5 06/21] csky: kprobes: Use generic kretprobe trampoline handler
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (4 preceding siblings ...)
  2020-08-29 13:00 ` [PATCH v5 05/21] arc: " Masami Hiramatsu
@ 2020-08-29 13:00 ` Masami Hiramatsu
  2020-08-29 13:01 ` [PATCH v5 07/21] ia64: " Masami Hiramatsu
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:00 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Guo Ren <guoren@kernel.org>
---
 arch/csky/kernel/probes/kprobes.c |   77 +------------------------------------
 1 file changed, 2 insertions(+), 75 deletions(-)

diff --git a/arch/csky/kernel/probes/kprobes.c b/arch/csky/kernel/probes/kprobes.c
index f0f733b7ac5a..589f090f48b9 100644
--- a/arch/csky/kernel/probes/kprobes.c
+++ b/arch/csky/kernel/probes/kprobes.c
@@ -404,87 +404,14 @@ int __init arch_populate_kprobe_blacklist(void)
 
 void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
 {
-	struct kretprobe_instance *ri = NULL;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long flags, orig_ret_address = 0;
-	unsigned long trampoline_address =
-		(unsigned long)&kretprobe_trampoline;
-	kprobe_opcode_t *correct_ret_addr = NULL;
-
-	INIT_HLIST_HEAD(&empty_rp);
-	kretprobe_hash_lock(current, &head, &flags);
-
-	/*
-	 * It is possible to have multiple instances associated with a given
-	 * task either because multiple functions in the call path have
-	 * return probes installed on them, and/or more than one
-	 * return probe was registered for a target function.
-	 *
-	 * We can handle this because:
-	 *     - instances are always pushed into the head of the list
-	 *     - when multiple return probes are registered for the same
-	 *	 function, the (chronologically) first instance's ret_addr
-	 *	 will be the real return address, and all the rest will
-	 *	 point to kretprobe_trampoline.
-	 */
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
-	correct_ret_addr = ri->ret_addr;
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		if (ri->rp && ri->rp->handler) {
-			__this_cpu_write(current_kprobe, &ri->rp->kp);
-			get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
-			ri->ret_addr = correct_ret_addr;
-			ri->rp->handler(ri, regs);
-			__this_cpu_write(current_kprobe, NULL);
-		}
-
-		recycle_rp_inst(ri, &empty_rp);
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_hash_unlock(current, &flags);
-
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
-	return (void *)orig_ret_address;
+	return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
 }
 
 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 				      struct pt_regs *regs)
 {
 	ri->ret_addr = (kprobe_opcode_t *)regs->lr;
+	ri->fp = NULL;
 	regs->lr = (unsigned long) &kretprobe_trampoline;
 }
 



* [PATCH v5 07/21] ia64: kprobes: Use generic kretprobe trampoline handler
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (5 preceding siblings ...)
  2020-08-29 13:00 ` [PATCH v5 06/21] csky: " Masami Hiramatsu
@ 2020-08-29 13:01 ` Masami Hiramatsu
  2020-08-29 13:01 ` [PATCH v5 08/21] mips: " Masami Hiramatsu
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:01 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/ia64/kernel/kprobes.c |   77 +-------------------------------------------
 1 file changed, 2 insertions(+), 75 deletions(-)

diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index 7a7df944d798..fc1ff8a4d7de 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -396,83 +396,9 @@ static void kretprobe_trampoline(void)
 {
 }
 
-/*
- * At this point the target function has been tricked into
- * returning into our trampoline.  Lookup the associated instance
- * and then:
- *    - call the handler function
- *    - cleanup by marking the instance as unused
- *    - long jump back to the original return address
- */
 int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
-	struct kretprobe_instance *ri = NULL;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long flags, orig_ret_address = 0;
-	unsigned long trampoline_address =
-		((struct fnptr *)kretprobe_trampoline)->ip;
-
-	INIT_HLIST_HEAD(&empty_rp);
-	kretprobe_hash_lock(current, &head, &flags);
-
-	/*
-	 * It is possible to have multiple instances associated with a given
-	 * task either because an multiple functions in the call path
-	 * have a return probe installed on them, and/or more than one return
-	 * return probe was registered for a target function.
-	 *
-	 * We can handle this because:
-	 *     - instances are always inserted at the head of the list
-	 *     - when multiple return probes are registered for the same
-	 *       function, the first instance's ret_addr will point to the
-	 *       real return address, and all the rest will point to
-	 *       kretprobe_trampoline
-	 */
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	regs->cr_iip = orig_ret_address;
-
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		if (ri->rp && ri->rp->handler)
-			ri->rp->handler(ri, regs);
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		recycle_rp_inst(ri, &empty_rp);
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-	kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
-	kretprobe_hash_unlock(current, &flags);
-
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
+	regs->cr_iip = __kretprobe_trampoline_handler(regs, kretprobe_trampoline, NULL);
 	/*
 	 * By returning a non-zero value, we are telling
 	 * kprobe_handler() that we don't want the post_handler
@@ -485,6 +411,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 				      struct pt_regs *regs)
 {
 	ri->ret_addr = (kprobe_opcode_t *)regs->b0;
+	ri->fp = NULL;
 
 	/* Replace the return addr with trampoline addr */
 	regs->b0 = ((struct fnptr *)kretprobe_trampoline)->ip;



* [PATCH v5 08/21] mips: kprobes: Use generic kretprobe trampoline handler
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (6 preceding siblings ...)
  2020-08-29 13:01 ` [PATCH v5 07/21] ia64: " Masami Hiramatsu
@ 2020-08-29 13:01 ` Masami Hiramatsu
  2020-08-29 13:01 ` [PATCH v5 09/21] parisc: " Masami Hiramatsu
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:01 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/mips/kernel/kprobes.c |   54 ++------------------------------------------
 1 file changed, 3 insertions(+), 51 deletions(-)

diff --git a/arch/mips/kernel/kprobes.c b/arch/mips/kernel/kprobes.c
index d043c2f897fc..54dfba8fa77c 100644
--- a/arch/mips/kernel/kprobes.c
+++ b/arch/mips/kernel/kprobes.c
@@ -477,6 +477,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 				      struct pt_regs *regs)
 {
 	ri->ret_addr = (kprobe_opcode_t *) regs->regs[31];
+	ri->fp = NULL;
 
 	/* Replace the return addr with trampoline addr */
 	regs->regs[31] = (unsigned long)kretprobe_trampoline;
@@ -488,57 +489,8 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 static int __kprobes trampoline_probe_handler(struct kprobe *p,
 						struct pt_regs *regs)
 {
-	struct kretprobe_instance *ri = NULL;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long flags, orig_ret_address = 0;
-	unsigned long trampoline_address = (unsigned long)kretprobe_trampoline;
-
-	INIT_HLIST_HEAD(&empty_rp);
-	kretprobe_hash_lock(current, &head, &flags);
-
-	/*
-	 * It is possible to have multiple instances associated with a given
-	 * task either because an multiple functions in the call path
-	 * have a return probe installed on them, and/or more than one return
-	 * return probe was registered for a target function.
-	 *
-	 * We can handle this because:
-	 *     - instances are always inserted at the head of the list
-	 *     - when multiple return probes are registered for the same
-	 *	 function, the first instance's ret_addr will point to the
-	 *	 real return address, and all the rest will point to
-	 *	 kretprobe_trampoline
-	 */
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		if (ri->rp && ri->rp->handler)
-			ri->rp->handler(ri, regs);
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		recycle_rp_inst(ri, &empty_rp);
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_assert(ri, orig_ret_address, trampoline_address);
-	instruction_pointer(regs) = orig_ret_address;
-
-	kretprobe_hash_unlock(current, &flags);
-
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
+	instruction_pointer(regs) = __kretprobe_trampoline_handler(regs,
+						kretprobe_trampoline, NULL);
 	/*
 	 * By returning a non-zero value, we are telling
 	 * kprobe_handler() that we don't want the post_handler



* [PATCH v5 09/21] parisc: kprobes: Use generic kretprobe trampoline handler
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (7 preceding siblings ...)
  2020-08-29 13:01 ` [PATCH v5 08/21] mips: " Masami Hiramatsu
@ 2020-08-29 13:01 ` Masami Hiramatsu
  2020-08-29 13:01 ` [PATCH v5 10/21] powerpc: " Masami Hiramatsu
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:01 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/parisc/kernel/kprobes.c |   76 ++----------------------------------------
 1 file changed, 4 insertions(+), 72 deletions(-)

diff --git a/arch/parisc/kernel/kprobes.c b/arch/parisc/kernel/kprobes.c
index 77ec51818916..6d21a515eea5 100644
--- a/arch/parisc/kernel/kprobes.c
+++ b/arch/parisc/kernel/kprobes.c
@@ -191,80 +191,11 @@ static struct kprobe trampoline_p = {
 static int __kprobes trampoline_probe_handler(struct kprobe *p,
 					      struct pt_regs *regs)
 {
-	struct kretprobe_instance *ri = NULL;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long flags, orig_ret_address = 0;
-	unsigned long trampoline_address = (unsigned long)trampoline_p.addr;
-	kprobe_opcode_t *correct_ret_addr = NULL;
-
-	INIT_HLIST_HEAD(&empty_rp);
-	kretprobe_hash_lock(current, &head, &flags);
-
-	/*
-	 * It is possible to have multiple instances associated with a given
-	 * task either because multiple functions in the call path have
-	 * a return probe installed on them, and/or more than one return
-	 * probe was registered for a target function.
-	 *
-	 * We can handle this because:
-	 *     - instances are always inserted at the head of the list
-	 *     - when multiple return probes are registered for the same
-	 *       function, the first instance's ret_addr will point to the
-	 *       real return address, and all the rest will point to
-	 *       kretprobe_trampoline
-	 */
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
-	correct_ret_addr = ri->ret_addr;
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		if (ri->rp && ri->rp->handler) {
-			__this_cpu_write(current_kprobe, &ri->rp->kp);
-			get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
-			ri->ret_addr = correct_ret_addr;
-			ri->rp->handler(ri, regs);
-			__this_cpu_write(current_kprobe, NULL);
-		}
-
-		recycle_rp_inst(ri, &empty_rp);
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
+	unsigned long orig_ret_address;
 
-	kretprobe_hash_unlock(current, &flags);
-
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
+	orig_ret_address = __kretprobe_trampoline_handler(regs, trampoline_p.addr, NULL);
 	instruction_pointer_set(regs, orig_ret_address);
+
 	return 1;
 }
 
@@ -272,6 +203,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 				      struct pt_regs *regs)
 {
 	ri->ret_addr = (kprobe_opcode_t *)regs->gr[2];
+	ri->fp = NULL;
 
 	/* Replace the return addr with trampoline addr. */
 	regs->gr[2] = (unsigned long)trampoline_p.addr;



* [PATCH v5 10/21] powerpc: kprobes: Use generic kretprobe trampoline handler
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (8 preceding siblings ...)
  2020-08-29 13:01 ` [PATCH v5 09/21] parisc: " Masami Hiramatsu
@ 2020-08-29 13:01 ` Masami Hiramatsu
  2020-08-29 13:02 ` [PATCH v5 11/21] s390: " Masami Hiramatsu
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:01 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Changes in v2:
   Fix to use correct trampoline_address.
---
 arch/powerpc/kernel/kprobes.c |   53 ++---------------------------------------
 1 file changed, 3 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 6ab9b4d037c3..01ab2163659e 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -218,6 +218,7 @@ bool arch_kprobe_on_func_entry(unsigned long offset)
 void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
 {
 	ri->ret_addr = (kprobe_opcode_t *)regs->link;
+	ri->fp = NULL;
 
 	/* Replace the return addr with trampoline addr */
 	regs->link = (unsigned long)kretprobe_trampoline;
@@ -396,50 +397,9 @@ asm(".global kretprobe_trampoline\n"
  */
 static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
-	struct kretprobe_instance *ri = NULL;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long flags, orig_ret_address = 0;
-	unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;
-
-	INIT_HLIST_HEAD(&empty_rp);
-	kretprobe_hash_lock(current, &head, &flags);
-
-	/*
-	 * It is possible to have multiple instances associated with a given
-	 * task either because an multiple functions in the call path
-	 * have a return probe installed on them, and/or more than one return
-	 * return probe was registered for a target function.
-	 *
-	 * We can handle this because:
-	 *     - instances are always inserted at the head of the list
-	 *     - when multiple return probes are registered for the same
-	 *       function, the first instance's ret_addr will point to the
-	 *       real return address, and all the rest will point to
-	 *       kretprobe_trampoline
-	 */
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		if (ri->rp && ri->rp->handler)
-			ri->rp->handler(ri, regs);
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		recycle_rp_inst(ri, &empty_rp);
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_assert(ri, orig_ret_address, trampoline_address);
+	unsigned long orig_ret_address;
 
+	orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
 	/*
 	 * We get here through one of two paths:
 	 * 1. by taking a trap -> kprobe_handler() -> here
@@ -458,13 +418,6 @@ static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 	regs->nip = orig_ret_address - 4;
 	regs->link = orig_ret_address;
 
-	kretprobe_hash_unlock(current, &flags);
-
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
-
 	return 0;
 }
 NOKPROBE_SYMBOL(trampoline_probe_handler);



* [PATCH v5 11/21] s390: kprobes: Use generic kretprobe trampoline handler
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (9 preceding siblings ...)
  2020-08-29 13:01 ` [PATCH v5 10/21] powerpc: " Masami Hiramatsu
@ 2020-08-29 13:02 ` Masami Hiramatsu
  2020-08-29 13:02 ` [PATCH v5 12/21] sh: " Masami Hiramatsu
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:02 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/s390/kernel/kprobes.c |   79 +-------------------------------------------
 1 file changed, 2 insertions(+), 77 deletions(-)

diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index d2a71d872638..fc30e799bd84 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -228,6 +228,7 @@ NOKPROBE_SYMBOL(pop_kprobe);
 void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
 {
 	ri->ret_addr = (kprobe_opcode_t *) regs->gprs[14];
+	ri->fp = NULL;
 
 	/* Replace the return addr with trampoline addr */
 	regs->gprs[14] = (unsigned long) &kretprobe_trampoline;
@@ -331,83 +332,7 @@ static void __used kretprobe_trampoline_holder(void)
  */
 static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
-	struct kretprobe_instance *ri;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long flags, orig_ret_address;
-	unsigned long trampoline_address;
-	kprobe_opcode_t *correct_ret_addr;
-
-	INIT_HLIST_HEAD(&empty_rp);
-	kretprobe_hash_lock(current, &head, &flags);
-
-	/*
-	 * It is possible to have multiple instances associated with a given
-	 * task either because an multiple functions in the call path
-	 * have a return probe installed on them, and/or more than one return
-	 * return probe was registered for a target function.
-	 *
-	 * We can handle this because:
-	 *     - instances are always inserted at the head of the list
-	 *     - when multiple return probes are registered for the same
-	 *	 function, the first instance's ret_addr will point to the
-	 *	 real return address, and all the rest will point to
-	 *	 kretprobe_trampoline
-	 */
-	ri = NULL;
-	orig_ret_address = 0;
-	correct_ret_addr = NULL;
-	trampoline_address = (unsigned long) &kretprobe_trampoline;
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		orig_ret_address = (unsigned long) ri->ret_addr;
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
-	correct_ret_addr = ri->ret_addr;
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		orig_ret_address = (unsigned long) ri->ret_addr;
-
-		if (ri->rp && ri->rp->handler) {
-			ri->ret_addr = correct_ret_addr;
-			ri->rp->handler(ri, regs);
-		}
-
-		recycle_rp_inst(ri, &empty_rp);
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	regs->psw.addr = orig_ret_address;
-
-	kretprobe_hash_unlock(current, &flags);
-
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
+	regs->psw.addr = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
 	/*
 	 * By returning a non-zero value, we are telling
 	 * kprobe_handler() that we don't want the post_handler



* [PATCH v5 12/21] sh: kprobes: Use generic kretprobe trampoline handler
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (10 preceding siblings ...)
  2020-08-29 13:02 ` [PATCH v5 11/21] s390: " Masami Hiramatsu
@ 2020-08-29 13:02 ` Masami Hiramatsu
  2020-08-29 13:02 ` [PATCH v5 13/21] sparc: " Masami Hiramatsu
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:02 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/sh/kernel/kprobes.c |   58 ++--------------------------------------------
 1 file changed, 3 insertions(+), 55 deletions(-)

diff --git a/arch/sh/kernel/kprobes.c b/arch/sh/kernel/kprobes.c
index 318296f48f1a..756100b01e84 100644
--- a/arch/sh/kernel/kprobes.c
+++ b/arch/sh/kernel/kprobes.c
@@ -204,6 +204,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 				      struct pt_regs *regs)
 {
 	ri->ret_addr = (kprobe_opcode_t *) regs->pr;
+	ri->fp = NULL;
 
 	/* Replace the return addr with trampoline addr */
 	regs->pr = (unsigned long)kretprobe_trampoline;
@@ -302,62 +303,9 @@ static void __used kretprobe_trampoline_holder(void)
  */
 int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
-	struct kretprobe_instance *ri = NULL;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long flags, orig_ret_address = 0;
-	unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
+	regs->pc = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
 
-	INIT_HLIST_HEAD(&empty_rp);
-	kretprobe_hash_lock(current, &head, &flags);
-
-	/*
-	 * It is possible to have multiple instances associated with a given
-	 * task either because an multiple functions in the call path
-	 * have a return probe installed on them, and/or more then one return
-	 * return probe was registered for a target function.
-	 *
-	 * We can handle this because:
-	 *     - instances are always inserted at the head of the list
-	 *     - when multiple return probes are registered for the same
-	 *       function, the first instance's ret_addr will point to the
-	 *       real return address, and all the rest will point to
-	 *       kretprobe_trampoline
-	 */
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		if (ri->rp && ri->rp->handler) {
-			__this_cpu_write(current_kprobe, &ri->rp->kp);
-			ri->rp->handler(ri, regs);
-			__this_cpu_write(current_kprobe, NULL);
-		}
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		recycle_rp_inst(ri, &empty_rp);
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
-	regs->pc = orig_ret_address;
-	kretprobe_hash_unlock(current, &flags);
-
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
-
-	return orig_ret_address;
+	return 1;
 }
 
 static int __kprobes post_kprobe_handler(struct pt_regs *regs)



* [PATCH v5 13/21] sparc: kprobes: Use generic kretprobe trampoline handler
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (11 preceding siblings ...)
  2020-08-29 13:02 ` [PATCH v5 12/21] sh: " Masami Hiramatsu
@ 2020-08-29 13:02 ` Masami Hiramatsu
  2020-08-29 13:02 ` [PATCH v5 14/21] kprobes: Remove NMI context check Masami Hiramatsu
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:02 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/sparc/kernel/kprobes.c |   51 +++----------------------------------------
 1 file changed, 3 insertions(+), 48 deletions(-)

diff --git a/arch/sparc/kernel/kprobes.c b/arch/sparc/kernel/kprobes.c
index dfbca2470536..217c21a6986a 100644
--- a/arch/sparc/kernel/kprobes.c
+++ b/arch/sparc/kernel/kprobes.c
@@ -453,6 +453,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 				      struct pt_regs *regs)
 {
 	ri->ret_addr = (kprobe_opcode_t *)(regs->u_regs[UREG_RETPC] + 8);
+	ri->fp = NULL;
 
 	/* Replace the return addr with trampoline addr */
 	regs->u_regs[UREG_RETPC] =
@@ -465,58 +466,12 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 static int __kprobes trampoline_probe_handler(struct kprobe *p,
 					      struct pt_regs *regs)
 {
-	struct kretprobe_instance *ri = NULL;
-	struct hlist_head *head, empty_rp;
-	struct hlist_node *tmp;
-	unsigned long flags, orig_ret_address = 0;
-	unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;
+	unsigned long orig_ret_address = 0;
 
-	INIT_HLIST_HEAD(&empty_rp);
-	kretprobe_hash_lock(current, &head, &flags);
-
-	/*
-	 * It is possible to have multiple instances associated with a given
-	 * task either because an multiple functions in the call path
-	 * have a return probe installed on them, and/or more than one return
-	 * return probe was registered for a target function.
-	 *
-	 * We can handle this because:
-	 *     - instances are always inserted at the head of the list
-	 *     - when multiple return probes are registered for the same
-	 *       function, the first instance's ret_addr will point to the
-	 *       real return address, and all the rest will point to
-	 *       kretprobe_trampoline
-	 */
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-
-		if (ri->rp && ri->rp->handler)
-			ri->rp->handler(ri, regs);
-
-		orig_ret_address = (unsigned long)ri->ret_addr;
-		recycle_rp_inst(ri, &empty_rp);
-
-		if (orig_ret_address != trampoline_address)
-			/*
-			 * This is the real return address. Any other
-			 * instances associated with this task are for
-			 * other calls deeper on the call stack
-			 */
-			break;
-	}
-
-	kretprobe_assert(ri, orig_ret_address, trampoline_address);
+	orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
 	regs->tpc = orig_ret_address;
 	regs->tnpc = orig_ret_address + 4;
 
-	kretprobe_hash_unlock(current, &flags);
-
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
 	/*
 	 * By returning a non-zero value, we are telling
 	 * kprobe_handler() that we don't want the post_handler



* [PATCH v5 14/21] kprobes: Remove NMI context check
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (12 preceding siblings ...)
  2020-08-29 13:02 ` [PATCH v5 13/21] sparc: " Masami Hiramatsu
@ 2020-08-29 13:02 ` Masami Hiramatsu
       [not found]   ` <20201030213831.04e81962@oasis.local.home>
  2020-08-29 13:02 ` [PATCH v5 15/21] kprobes: Free kretprobe_instance with rcu callback Masami Hiramatsu
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:02 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Since commit 9b38cc704e84 ("kretprobe: Prevent triggering
kretprobe from within kprobe_flush_task") sets a dummy current
kprobe in the trampoline handler via kprobe_busy_begin/end(),
it is no longer possible to run a kretprobe pre handler inside
the kretprobe trampoline handler context, even from NMI. If an
NMI interrupts kretprobe_trampoline_handler() and hits a
kretprobe, the second kretprobe correctly detects the recursion
and is skipped.
This leaves almost no double-lock issue on kretprobes from NMI.

The one remaining case is cleanup_rp_inst(), which also takes
kretprobe_table_lock without setting up a current kprobe.
Adding kprobe_busy_begin/end() there allows us to remove the
in_nmi() check.

The above commit applied kprobe_busy_begin/end() only on x86, but
now that all arch implementations are unified into the generic
handler, we can safely remove the in_nmi() check from the
arch-independent code.
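
For reference, a minimal sketch (not part of this patch) of the guard
pattern the message refers to, using the existing
kprobe_busy_begin()/kprobe_busy_end() API:

	/*
	 * While the dummy "busy" kprobe is installed as the per-CPU
	 * current kprobe, a kretprobe hit from an interrupting NMI is
	 * rejected by the normal reentrancy check, so nothing in this
	 * region can recurse into kretprobe handling and re-take its
	 * locks.
	 */
	kprobe_busy_begin();
	/* ... walk and recycle kretprobe instances under the table lock ... */
	kprobe_busy_end();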

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/kprobes.c |   16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index a0afaa79024e..211138225fa5 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1359,7 +1359,8 @@ static void cleanup_rp_inst(struct kretprobe *rp)
 	struct hlist_node *next;
 	struct hlist_head *head;
 
-	/* No race here */
+	/* To avoid recursive kretprobe by NMI, set kprobe busy here */
+	kprobe_busy_begin();
 	for (hash = 0; hash < KPROBE_TABLE_SIZE; hash++) {
 		kretprobe_table_lock(hash, &flags);
 		head = &kretprobe_inst_table[hash];
@@ -1369,6 +1370,8 @@ static void cleanup_rp_inst(struct kretprobe *rp)
 		}
 		kretprobe_table_unlock(hash, &flags);
 	}
+	kprobe_busy_end();
+
 	free_rp_inst(rp);
 }
 NOKPROBE_SYMBOL(cleanup_rp_inst);
@@ -2035,17 +2038,6 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 	unsigned long hash, flags = 0;
 	struct kretprobe_instance *ri;
 
-	/*
-	 * To avoid deadlocks, prohibit return probing in NMI contexts,
-	 * just skip the probe and increase the (inexact) 'nmissed'
-	 * statistical counter, so that the user is informed that
-	 * something happened:
-	 */
-	if (unlikely(in_nmi())) {
-		rp->nmissed++;
-		return 0;
-	}
-
 	/* TODO: consider to only swap the RA after the last pre_handler fired */
 	hash = hash_ptr(current, KPROBE_HASH_BITS);
 	raw_spin_lock_irqsave(&rp->lock, flags);



* [PATCH v5 15/21] kprobes: Free kretprobe_instance with rcu callback
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (13 preceding siblings ...)
  2020-08-29 13:02 ` [PATCH v5 14/21] kprobes: Remove NMI context check Masami Hiramatsu
@ 2020-08-29 13:02 ` Masami Hiramatsu
  2020-08-29 13:03 ` [PATCH v5 16/21] kprobes: Make local used functions static Masami Hiramatsu
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:02 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Free kretprobe_instance with an RCU callback instead of directly
freeing the object in the kretprobe handler context.

This makes kretprobes safer to run in NMI context.
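
As a sketch of the resulting pattern (illustrative only, not the full
patch): the hlist linkage is unused once the instance is handed over
to RCU, so the rcu_head can share its storage and the direct kfree()
becomes a deferred free:

	/* was: hlist_del(&ri->hlist); ... kfree(ri); */
	kfree_rcu(ri, rcu);	/* freed after a grace period */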

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Changes in v3:
   - Stick the rcu_head with hlist_node in kretprobe_instance
   - Make recycle_rp_inst() static
---
 include/linux/kprobes.h |    6 ++++--
 kernel/kprobes.c        |   25 ++++++-------------------
 2 files changed, 10 insertions(+), 21 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index c6a913e608b7..663be8debf25 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -156,7 +156,10 @@ struct kretprobe {
 };
 
 struct kretprobe_instance {
-	struct hlist_node hlist;
+	union {
+		struct hlist_node hlist;
+		struct rcu_head rcu;
+	};
 	struct kretprobe *rp;
 	kprobe_opcode_t *ret_addr;
 	struct task_struct *task;
@@ -395,7 +398,6 @@ int register_kretprobes(struct kretprobe **rps, int num);
 void unregister_kretprobes(struct kretprobe **rps, int num);
 
 void kprobe_flush_task(struct task_struct *tk);
-void recycle_rp_inst(struct kretprobe_instance *ri, struct hlist_head *head);
 
 int disable_kprobe(struct kprobe *kp);
 int enable_kprobe(struct kprobe *kp);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 211138225fa5..0676868f1ac2 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1223,8 +1223,7 @@ void kprobes_inc_nmissed_count(struct kprobe *p)
 }
 NOKPROBE_SYMBOL(kprobes_inc_nmissed_count);
 
-void recycle_rp_inst(struct kretprobe_instance *ri,
-		     struct hlist_head *head)
+static void recycle_rp_inst(struct kretprobe_instance *ri)
 {
 	struct kretprobe *rp = ri->rp;
 
@@ -1236,8 +1235,7 @@ void recycle_rp_inst(struct kretprobe_instance *ri,
 		hlist_add_head(&ri->hlist, &rp->free_instances);
 		raw_spin_unlock(&rp->lock);
 	} else
-		/* Unregistering */
-		hlist_add_head(&ri->hlist, head);
+		kfree_rcu(ri, rcu);
 }
 NOKPROBE_SYMBOL(recycle_rp_inst);
 
@@ -1313,7 +1311,7 @@ void kprobe_busy_end(void)
 void kprobe_flush_task(struct task_struct *tk)
 {
 	struct kretprobe_instance *ri;
-	struct hlist_head *head, empty_rp;
+	struct hlist_head *head;
 	struct hlist_node *tmp;
 	unsigned long hash, flags = 0;
 
@@ -1323,19 +1321,14 @@ void kprobe_flush_task(struct task_struct *tk)
 
 	kprobe_busy_begin();
 
-	INIT_HLIST_HEAD(&empty_rp);
 	hash = hash_ptr(tk, KPROBE_HASH_BITS);
 	head = &kretprobe_inst_table[hash];
 	kretprobe_table_lock(hash, &flags);
 	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
 		if (ri->task == tk)
-			recycle_rp_inst(ri, &empty_rp);
+			recycle_rp_inst(ri);
 	}
 	kretprobe_table_unlock(hash, &flags);
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
 
 	kprobe_busy_end();
 }
@@ -1936,13 +1929,12 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 					     void *frame_pointer)
 {
 	struct kretprobe_instance *ri = NULL, *last = NULL;
-	struct hlist_head *head, empty_rp;
+	struct hlist_head *head;
 	struct hlist_node *tmp;
 	unsigned long flags;
 	kprobe_opcode_t *correct_ret_addr = NULL;
 	bool skipped = false;
 
-	INIT_HLIST_HEAD(&empty_rp);
 	kretprobe_hash_lock(current, &head, &flags);
 
 	/*
@@ -2011,7 +2003,7 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 			__this_cpu_write(current_kprobe, prev);
 		}
 
-		recycle_rp_inst(ri, &empty_rp);
+		recycle_rp_inst(ri);
 
 		if (ri == last)
 			break;
@@ -2019,11 +2011,6 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 
 	kretprobe_hash_unlock(current, &flags);
 
-	hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
-		hlist_del(&ri->hlist);
-		kfree(ri);
-	}
-
 	return (unsigned long)correct_ret_addr;
 }
 NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)



* [PATCH v5 16/21] kprobes: Make local used functions static
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (14 preceding siblings ...)
  2020-08-29 13:02 ` [PATCH v5 15/21] kprobes: Free kretprobe_instance with rcu callback Masami Hiramatsu
@ 2020-08-29 13:03 ` Masami Hiramatsu
  2020-08-29 13:03 ` [PATCH v5 17/21] llist: Add nonatomic __llist_add() and __llist_dell_all() Masami Hiramatsu
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:03 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

Since the kretprobe trampoline handler has been unified out of the
arch/* code, some functions and objects no longer need to be
exported.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Changes in v4:
   - Convert kretprobe_assert() into a BUG_ON().
---
 include/linux/kprobes.h |   15 ---------------
 kernel/kprobes.c        |    9 ++++-----
 2 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 663be8debf25..9c880c8a4e80 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -190,7 +190,6 @@ static inline int kprobes_built_in(void)
 	return 1;
 }
 
-extern struct kprobe kprobe_busy;
 void kprobe_busy_begin(void);
 void kprobe_busy_end(void);
 
@@ -235,16 +234,6 @@ static inline int arch_trampoline_kprobe(struct kprobe *p)
 
 extern struct kretprobe_blackpoint kretprobe_blacklist[];
 
-static inline void kretprobe_assert(struct kretprobe_instance *ri,
-	unsigned long orig_ret_address, unsigned long trampoline_address)
-{
-	if (!orig_ret_address || (orig_ret_address == trampoline_address)) {
-		printk("kretprobe BUG!: Processing kretprobe %p @ %p\n",
-				ri->rp, ri->rp->kp.addr);
-		BUG();
-	}
-}
-
 #ifdef CONFIG_KPROBES_SANITY_TEST
 extern int init_test_probes(void);
 #else
@@ -364,10 +353,6 @@ int arch_check_ftrace_location(struct kprobe *p);
 
 /* Get the kprobe at this addr (if any) - called with preemption disabled */
 struct kprobe *get_kprobe(void *addr);
-void kretprobe_hash_lock(struct task_struct *tsk,
-			 struct hlist_head **head, unsigned long *flags);
-void kretprobe_hash_unlock(struct task_struct *tsk, unsigned long *flags);
-struct hlist_head * kretprobe_inst_table_head(struct task_struct *tsk);
 
 /* kprobe_running() will just return the current_kprobe on this CPU */
 static inline struct kprobe *kprobe_running(void)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 0676868f1ac2..732a70163584 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1239,7 +1239,7 @@ static void recycle_rp_inst(struct kretprobe_instance *ri)
 }
 NOKPROBE_SYMBOL(recycle_rp_inst);
 
-void kretprobe_hash_lock(struct task_struct *tsk,
+static void kretprobe_hash_lock(struct task_struct *tsk,
 			 struct hlist_head **head, unsigned long *flags)
 __acquires(hlist_lock)
 {
@@ -1261,7 +1261,7 @@ __acquires(hlist_lock)
 }
 NOKPROBE_SYMBOL(kretprobe_table_lock);
 
-void kretprobe_hash_unlock(struct task_struct *tsk,
+static void kretprobe_hash_unlock(struct task_struct *tsk,
 			   unsigned long *flags)
 __releases(hlist_lock)
 {
@@ -1282,7 +1282,7 @@ __releases(hlist_lock)
 }
 NOKPROBE_SYMBOL(kretprobe_table_unlock);
 
-struct kprobe kprobe_busy = {
+static struct kprobe kprobe_busy = {
 	.addr = (void *) get_kprobe,
 };
 
@@ -1983,8 +1983,7 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 			break;
 	}
 
-	kretprobe_assert(ri, (unsigned long)correct_ret_addr,
-			     (unsigned long)trampoline_address);
+	BUG_ON(!correct_ret_addr || (correct_ret_addr == trampoline_address));
 	last = ri;
 
 	hlist_for_each_entry_safe(ri, tmp, head, hlist) {



* [PATCH v5 17/21] llist: Add nonatomic __llist_add() and __llist_dell_all()
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (15 preceding siblings ...)
  2020-08-29 13:03 ` [PATCH v5 16/21] kprobes: Make local used functions static Masami Hiramatsu
@ 2020-08-29 13:03 ` Masami Hiramatsu
  2020-10-12 16:24   ` Ingo Molnar
  2020-08-29 13:03 ` [PATCH v5 18/21] kprobes: Remove kretprobe hash Masami Hiramatsu
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:03 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

From: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/gpu/drm/i915/i915_request.c |    6 ------
 include/linux/llist.h               |   23 +++++++++++++++++++++++
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 0b2fe55e6194..0e851b925c8c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -357,12 +357,6 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (i915_request_retire(tmp) && tmp != rq);
 }
 
-static void __llist_add(struct llist_node *node, struct llist_head *head)
-{
-	node->next = head->first;
-	head->first = node;
-}
-
 static struct i915_request * const *
 __engine_active(struct intel_engine_cs *engine)
 {
diff --git a/include/linux/llist.h b/include/linux/llist.h
index 2e9c7215882b..24f207b0190b 100644
--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -197,6 +197,16 @@ static inline struct llist_node *llist_next(struct llist_node *node)
 extern bool llist_add_batch(struct llist_node *new_first,
 			    struct llist_node *new_last,
 			    struct llist_head *head);
+
+static inline bool __llist_add_batch(struct llist_node *new_first,
+				     struct llist_node *new_last,
+				     struct llist_head *head)
+{
+	new_last->next = head->first;
+	head->first = new_first;
+	return new_last->next == NULL;
+}
+
 /**
  * llist_add - add a new entry
  * @new:	new entry to be added
@@ -209,6 +219,11 @@ static inline bool llist_add(struct llist_node *new, struct llist_head *head)
 	return llist_add_batch(new, new, head);
 }
 
+static inline bool __llist_add(struct llist_node *new, struct llist_head *head)
+{
+	return __llist_add_batch(new, new, head);
+}
+
 /**
  * llist_del_all - delete all entries from lock-less list
  * @head:	the head of lock-less list to delete all entries
@@ -222,6 +237,14 @@ static inline struct llist_node *llist_del_all(struct llist_head *head)
 	return xchg(&head->first, NULL);
 }
 
+static inline struct llist_node *__llist_del_all(struct llist_head *head)
+{
+	struct llist_node *first = head->first;
+
+	head->first = NULL;
+	return first;
+}
+
 extern struct llist_node *llist_del_first(struct llist_head *head);
 
 struct llist_node *llist_reverse_order(struct llist_node *head);
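
This patch carries no commit message, so as an illustrative usage
sketch only: the nonatomic variants are intended for an llist that is
owned by a single context (such as the per-task kretprobe_instances
list added later in this series), where there is no lockless
concurrency and the xchg()/cmpxchg() of the atomic variants can be
skipped:

	struct llist_head list = LLIST_HEAD_INIT(list);	/* owner-only list */
	struct llist_node item, *all;

	__llist_add(&item, &list);	/* push without atomics */
	all = __llist_del_all(&list);	/* detach the whole list at once */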



* [PATCH v5 18/21] kprobes: Remove kretprobe hash
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (16 preceding siblings ...)
  2020-08-29 13:03 ` [PATCH v5 17/21] llist: Add nonatomic __llist_add() and __llist_dell_all() Masami Hiramatsu
@ 2020-08-29 13:03 ` Masami Hiramatsu
  2020-08-29 13:03 ` [PATCH v5 19/21] asm-generic/atomic: Add try_cmpxchg() fallbacks Masami Hiramatsu
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:03 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

From: Peter Zijlstra <peterz@infradead.org>

The kretprobe hash is mostly superfluous, replace it with a per-task
variable.

This gets rid of the task hash and its related locking.

Note that this changes the kprobes module-exported API for kretprobe
handlers: any out-of-tree kretprobe user that dereferences ri->rp
should use get_kretprobe(ri) instead.
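
For illustration, a hypothetical out-of-tree return handler (not code
from this patch) converted to the new accessor:

	static int my_ret_handler(struct kretprobe_instance *ri,
				  struct pt_regs *regs)
	{
		struct kretprobe *rp = get_kretprobe(ri);	/* was: ri->rp */

		if (rp)	/* may be NULL while the kretprobe is being unregistered */
			pr_info("returning from %ps\n", rp->kp.addr);
		return 0;
	}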

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
  Changes in v5:
    - Fix not to flush current task's kretprobe_instance
    - Remove task scan by merging kretprobe_holder patch
    - Use READ_ONCE() and rcu_locked check instead of smp_r/wmb()
    - Make the failure case in __kretprobe_trampoline_handler() clearer
---
 include/linux/kprobes.h     |   19 +++
 include/linux/sched.h       |    4 +
 kernel/fork.c               |    4 +
 kernel/kprobes.c            |  236 +++++++++++++------------------------------
 kernel/trace/trace_kprobe.c |    3 -
 5 files changed, 97 insertions(+), 169 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9c880c8a4e80..f8f87a13345a 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -27,6 +27,7 @@
 #include <linux/rcupdate.h>
 #include <linux/mutex.h>
 #include <linux/ftrace.h>
+#include <linux/refcount.h>
 #include <asm/kprobes.h>
 
 #ifdef CONFIG_KPROBES
@@ -144,6 +145,11 @@ static inline int kprobe_ftrace(struct kprobe *p)
  * ignored, due to maxactive being too low.
  *
  */
+struct kretprobe_holder {
+	struct kretprobe	*rp;
+	refcount_t		ref;
+};
+
 struct kretprobe {
 	struct kprobe kp;
 	kretprobe_handler_t handler;
@@ -152,17 +158,18 @@ struct kretprobe {
 	int nmissed;
 	size_t data_size;
 	struct hlist_head free_instances;
+	struct kretprobe_holder *rph;
 	raw_spinlock_t lock;
 };
 
 struct kretprobe_instance {
 	union {
+		struct llist_node llist;
 		struct hlist_node hlist;
 		struct rcu_head rcu;
 	};
-	struct kretprobe *rp;
+	struct kretprobe_holder *rph;
 	kprobe_opcode_t *ret_addr;
-	struct task_struct *task;
 	void *fp;
 	char data[];
 };
@@ -221,6 +228,14 @@ unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
 	return ret;
 }
 
+static nokprobe_inline struct kretprobe *get_kretprobe(struct kretprobe_instance *ri)
+{
+	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
+		"Kretprobe is accessed from instance under preemptive context");
+
+	return READ_ONCE(ri->rph->rp);
+}
+
 #else /* CONFIG_KRETPROBES */
 static inline void arch_prepare_kretprobe(struct kretprobe *rp,
 					struct pt_regs *regs)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93ecd930efd3..0f2532f052a9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1315,6 +1315,10 @@ struct task_struct {
 	struct callback_head		mce_kill_me;
 #endif
 
+#ifdef CONFIG_KRETPROBES
+	struct llist_head               kretprobe_instances;
+#endif
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
diff --git a/kernel/fork.c b/kernel/fork.c
index 4d32190861bd..2ff5cceb0732 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2161,6 +2161,10 @@ static __latent_entropy struct task_struct *copy_process(
 	INIT_LIST_HEAD(&p->thread_group);
 	p->task_works = NULL;
 
+#ifdef CONFIG_KRETPROBES
+	p->kretprobe_instances.first = NULL;
+#endif
+
 	/*
 	 * Ensure that the cgroup subsystem policies allow the new process to be
 	 * forked. It should be noted the the new process's css_set can be changed
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 732a70163584..bc65603fce00 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -53,7 +53,6 @@ static int kprobes_initialized;
  * - RCU hlist traversal under disabling preempt (breakpoint handlers)
  */
 static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
-static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
 
 /* NOTE: change this value only with kprobe_mutex held */
 static bool kprobes_all_disarmed;
@@ -61,9 +60,6 @@ static bool kprobes_all_disarmed;
 /* This protects kprobe_table and optimizing_list */
 static DEFINE_MUTEX(kprobe_mutex);
 static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL;
-static struct {
-	raw_spinlock_t lock ____cacheline_aligned_in_smp;
-} kretprobe_table_locks[KPROBE_TABLE_SIZE];
 
 kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
 					unsigned int __unused)
@@ -71,11 +67,6 @@ kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
 	return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
 }
 
-static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
-{
-	return &(kretprobe_table_locks[hash].lock);
-}
-
 /* Blacklist -- list of struct kprobe_blacklist_entry */
 static LIST_HEAD(kprobe_blacklist);
 
@@ -1223,65 +1214,30 @@ void kprobes_inc_nmissed_count(struct kprobe *p)
 }
 NOKPROBE_SYMBOL(kprobes_inc_nmissed_count);
 
+static void free_rp_inst_rcu(struct rcu_head *head)
+{
+	struct kretprobe_instance *ri = container_of(head, struct kretprobe_instance, rcu);
+
+	if (refcount_dec_and_test(&ri->rph->ref))
+		kfree(ri->rph);
+	kfree(ri);
+}
+NOKPROBE_SYMBOL(free_rp_inst_rcu);
+
 static void recycle_rp_inst(struct kretprobe_instance *ri)
 {
-	struct kretprobe *rp = ri->rp;
+	struct kretprobe *rp = get_kretprobe(ri);
 
-	/* remove rp inst off the rprobe_inst_table */
-	hlist_del(&ri->hlist);
 	INIT_HLIST_NODE(&ri->hlist);
 	if (likely(rp)) {
 		raw_spin_lock(&rp->lock);
 		hlist_add_head(&ri->hlist, &rp->free_instances);
 		raw_spin_unlock(&rp->lock);
 	} else
-		kfree_rcu(ri, rcu);
+		call_rcu(&ri->rcu, free_rp_inst_rcu);
 }
 NOKPROBE_SYMBOL(recycle_rp_inst);
 
-static void kretprobe_hash_lock(struct task_struct *tsk,
-			 struct hlist_head **head, unsigned long *flags)
-__acquires(hlist_lock)
-{
-	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
-	raw_spinlock_t *hlist_lock;
-
-	*head = &kretprobe_inst_table[hash];
-	hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_lock_irqsave(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_hash_lock);
-
-static void kretprobe_table_lock(unsigned long hash,
-				 unsigned long *flags)
-__acquires(hlist_lock)
-{
-	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_lock_irqsave(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_table_lock);
-
-static void kretprobe_hash_unlock(struct task_struct *tsk,
-			   unsigned long *flags)
-__releases(hlist_lock)
-{
-	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
-	raw_spinlock_t *hlist_lock;
-
-	hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_unlock_irqrestore(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_hash_unlock);
-
-static void kretprobe_table_unlock(unsigned long hash,
-				   unsigned long *flags)
-__releases(hlist_lock)
-{
-	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_unlock_irqrestore(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_table_unlock);
-
 static struct kprobe kprobe_busy = {
 	.addr = (void *) get_kprobe,
 };
@@ -1311,24 +1267,21 @@ void kprobe_busy_end(void)
 void kprobe_flush_task(struct task_struct *tk)
 {
 	struct kretprobe_instance *ri;
-	struct hlist_head *head;
-	struct hlist_node *tmp;
-	unsigned long hash, flags = 0;
+	struct llist_node *node;
 
+	/* Early boot, not yet initialized. */
 	if (unlikely(!kprobes_initialized))
-		/* Early boot.  kretprobe_table_locks not yet initialized. */
 		return;
 
 	kprobe_busy_begin();
 
-	hash = hash_ptr(tk, KPROBE_HASH_BITS);
-	head = &kretprobe_inst_table[hash];
-	kretprobe_table_lock(hash, &flags);
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task == tk)
-			recycle_rp_inst(ri);
+	node = __llist_del_all(&tk->kretprobe_instances);
+	while (node) {
+		ri = container_of(node, struct kretprobe_instance, llist);
+		node = node->next;
+
+		recycle_rp_inst(ri);
 	}
-	kretprobe_table_unlock(hash, &flags);
 
 	kprobe_busy_end();
 }
@@ -1338,36 +1291,19 @@ static inline void free_rp_inst(struct kretprobe *rp)
 {
 	struct kretprobe_instance *ri;
 	struct hlist_node *next;
+	int count = 0;
 
 	hlist_for_each_entry_safe(ri, next, &rp->free_instances, hlist) {
 		hlist_del(&ri->hlist);
 		kfree(ri);
+		count++;
 	}
-}
-
-static void cleanup_rp_inst(struct kretprobe *rp)
-{
-	unsigned long flags, hash;
-	struct kretprobe_instance *ri;
-	struct hlist_node *next;
-	struct hlist_head *head;
 
-	/* To avoid recursive kretprobe by NMI, set kprobe busy here */
-	kprobe_busy_begin();
-	for (hash = 0; hash < KPROBE_TABLE_SIZE; hash++) {
-		kretprobe_table_lock(hash, &flags);
-		head = &kretprobe_inst_table[hash];
-		hlist_for_each_entry_safe(ri, next, head, hlist) {
-			if (ri->rp == rp)
-				ri->rp = NULL;
-		}
-		kretprobe_table_unlock(hash, &flags);
+	if (refcount_sub_and_test(count, &rp->rph->ref)) {
+		kfree(rp->rph);
+		rp->rph = NULL;
 	}
-	kprobe_busy_end();
-
-	free_rp_inst(rp);
 }
-NOKPROBE_SYMBOL(cleanup_rp_inst);
 
 /* Add the new probe to ap->list */
 static int add_new_kprobe(struct kprobe *ap, struct kprobe *p)
@@ -1928,88 +1864,56 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 					     void *trampoline_address,
 					     void *frame_pointer)
 {
-	struct kretprobe_instance *ri = NULL, *last = NULL;
-	struct hlist_head *head;
-	struct hlist_node *tmp;
-	unsigned long flags;
 	kprobe_opcode_t *correct_ret_addr = NULL;
-	bool skipped = false;
+	struct kretprobe_instance *ri = NULL;
+	struct llist_node *first, *node;
+	struct kretprobe *rp;
 
-	kretprobe_hash_lock(current, &head, &flags);
+	/* Find all nodes for this frame. */
+	first = node = current->kretprobe_instances.first;
+	while (node) {
+		ri = container_of(node, struct kretprobe_instance, llist);
 
-	/*
-	 * It is possible to have multiple instances associated with a given
-	 * task either because multiple functions in the call path have
-	 * return probes installed on them, and/or more than one
-	 * return probe was registered for a target function.
-	 *
-	 * We can handle this because:
-	 *     - instances are always pushed into the head of the list
-	 *     - when multiple return probes are registered for the same
-	 *	 function, the (chronologically) first instance's ret_addr
-	 *	 will be the real return address, and all the rest will
-	 *	 point to kretprobe_trampoline.
-	 */
-	hlist_for_each_entry(ri, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-		/*
-		 * Return probes must be pushed on this hash list correct
-		 * order (same as return order) so that it can be popped
-		 * correctly. However, if we find it is pushed it incorrect
-		 * order, this means we find a function which should not be
-		 * probed, because the wrong order entry is pushed on the
-		 * path of processing other kretprobe itself.
-		 */
-		if (ri->fp != frame_pointer) {
-			if (!skipped)
-				pr_warn("kretprobe is stacked incorrectly. Trying to fixup.\n");
-			skipped = true;
-			continue;
-		}
+		BUG_ON(ri->fp != frame_pointer);
 
-		correct_ret_addr = ri->ret_addr;
-		if (skipped)
-			pr_warn("%ps must be blacklisted because of incorrect kretprobe order\n",
-				ri->rp->kp.addr);
-
-		if (correct_ret_addr != trampoline_address)
+		if (ri->ret_addr != trampoline_address) {
+			correct_ret_addr = ri->ret_addr;
 			/*
 			 * This is the real return address. Any other
 			 * instances associated with this task are for
 			 * other calls deeper on the call stack
 			 */
-			break;
+			goto found;
+		}
+
+		node = node->next;
 	}
+	pr_err("Oops! Kretprobe fails to find correct return address.\n");
+	BUG_ON(1);
 
-	BUG_ON(!correct_ret_addr || (correct_ret_addr == trampoline_address));
-	last = ri;
+found:
+	/* Unlink all nodes for this frame. */
+	current->kretprobe_instances.first = node->next;
+	node->next = NULL;
 
-	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
-		if (ri->task != current)
-			/* another task is sharing our hash bucket */
-			continue;
-		if (ri->fp != frame_pointer)
-			continue;
+	/* Run them..  */
+	while (first) {
+		ri = container_of(first, struct kretprobe_instance, llist);
+		first = first->next;
 
-		if (ri->rp && ri->rp->handler) {
+		rp = get_kretprobe(ri);
+		if (rp && rp->handler) {
 			struct kprobe *prev = kprobe_running();
 
-			__this_cpu_write(current_kprobe, &ri->rp->kp);
+			__this_cpu_write(current_kprobe, &rp->kp);
 			ri->ret_addr = correct_ret_addr;
-			ri->rp->handler(ri, regs);
+			rp->handler(ri, regs);
 			__this_cpu_write(current_kprobe, prev);
 		}
 
 		recycle_rp_inst(ri);
-
-		if (ri == last)
-			break;
 	}
 
-	kretprobe_hash_unlock(current, &flags);
-
 	return (unsigned long)correct_ret_addr;
 }
 NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
@@ -2021,11 +1925,10 @@ NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
 static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 {
 	struct kretprobe *rp = container_of(p, struct kretprobe, kp);
-	unsigned long hash, flags = 0;
+	unsigned long flags = 0;
 	struct kretprobe_instance *ri;
 
 	/* TODO: consider to only swap the RA after the last pre_handler fired */
-	hash = hash_ptr(current, KPROBE_HASH_BITS);
 	raw_spin_lock_irqsave(&rp->lock, flags);
 	if (!hlist_empty(&rp->free_instances)) {
 		ri = hlist_entry(rp->free_instances.first,
@@ -2033,9 +1936,6 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 		hlist_del(&ri->hlist);
 		raw_spin_unlock_irqrestore(&rp->lock, flags);
 
-		ri->rp = rp;
-		ri->task = current;
-
 		if (rp->entry_handler && rp->entry_handler(ri, regs)) {
 			raw_spin_lock_irqsave(&rp->lock, flags);
 			hlist_add_head(&ri->hlist, &rp->free_instances);
@@ -2045,11 +1945,8 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 
 		arch_prepare_kretprobe(ri, regs);
 
-		/* XXX(hch): why is there no hlist_move_head? */
-		INIT_HLIST_NODE(&ri->hlist);
-		kretprobe_table_lock(hash, &flags);
-		hlist_add_head(&ri->hlist, &kretprobe_inst_table[hash]);
-		kretprobe_table_unlock(hash, &flags);
+		__llist_add(&ri->llist, &current->kretprobe_instances);
+
 	} else {
 		rp->nmissed++;
 		raw_spin_unlock_irqrestore(&rp->lock, flags);
@@ -2112,16 +2009,24 @@ int register_kretprobe(struct kretprobe *rp)
 	}
 	raw_spin_lock_init(&rp->lock);
 	INIT_HLIST_HEAD(&rp->free_instances);
+	rp->rph = kzalloc(sizeof(struct kretprobe_holder), GFP_KERNEL);
+	if (!rp->rph)
+		return -ENOMEM;
+
+	rp->rph->rp = rp;
 	for (i = 0; i < rp->maxactive; i++) {
-		inst = kmalloc(sizeof(struct kretprobe_instance) +
+		inst = kzalloc(sizeof(struct kretprobe_instance) +
 			       rp->data_size, GFP_KERNEL);
 		if (inst == NULL) {
+			refcount_set(&rp->rph->ref, i);
 			free_rp_inst(rp);
 			return -ENOMEM;
 		}
+		inst->rph = rp->rph;
 		INIT_HLIST_NODE(&inst->hlist);
 		hlist_add_head(&inst->hlist, &rp->free_instances);
 	}
+	refcount_set(&rp->rph->ref, i);
 
 	rp->nmissed = 0;
 	/* Establish function entry probe point */
@@ -2163,16 +2068,18 @@ void unregister_kretprobes(struct kretprobe **rps, int num)
 	if (num <= 0)
 		return;
 	mutex_lock(&kprobe_mutex);
-	for (i = 0; i < num; i++)
+	for (i = 0; i < num; i++) {
 		if (__unregister_kprobe_top(&rps[i]->kp) < 0)
 			rps[i]->kp.addr = NULL;
+		rps[i]->rph->rp = NULL;
+	}
 	mutex_unlock(&kprobe_mutex);
 
 	synchronize_rcu();
 	for (i = 0; i < num; i++) {
 		if (rps[i]->kp.addr) {
 			__unregister_kprobe_bottom(&rps[i]->kp);
-			cleanup_rp_inst(rps[i]);
+			free_rp_inst(rps[i]);
 		}
 	}
 }
@@ -2534,11 +2441,8 @@ static int __init init_kprobes(void)
 
 	/* FIXME allocate the probe table, currently defined statically */
 	/* initialize all list heads */
-	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
+	for (i = 0; i < KPROBE_TABLE_SIZE; i++)
 		INIT_HLIST_HEAD(&kprobe_table[i]);
-		INIT_HLIST_HEAD(&kretprobe_inst_table[i]);
-		raw_spin_lock_init(&(kretprobe_table_locks[i].lock));
-	}
 
 	err = populate_kprobe_blacklist(__start_kprobe_blacklist,
 					__stop_kprobe_blacklist);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index aefb6065b508..07baf6f6cecc 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1714,7 +1714,8 @@ NOKPROBE_SYMBOL(kprobe_dispatcher);
 static int
 kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
 {
-	struct trace_kprobe *tk = container_of(ri->rp, struct trace_kprobe, rp);
+	struct kretprobe *rp = get_kretprobe(ri);
+	struct trace_kprobe *tk = container_of(rp, struct trace_kprobe, rp);
 
 	raw_cpu_inc(*tk->nhit);
 



* [PATCH v5 19/21] asm-generic/atomic: Add try_cmpxchg() fallbacks
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (17 preceding siblings ...)
  2020-08-29 13:03 ` [PATCH v5 18/21] kprobes: Remove kretprobe hash Masami Hiramatsu
@ 2020-08-29 13:03 ` Masami Hiramatsu
  2020-10-12 16:25   ` Ingo Molnar
  2020-08-29 13:03 ` [PATCH v5 20/21] freelist: Lock less freelist Masami Hiramatsu
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:03 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

From: Peter Zijlstra <peterz@infradead.org>

Only x86 provides try_cmpxchg() outside of the atomic_t interfaces;
provide generic fallbacks to create this interface from the widely
available cmpxchg() function.
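
As a sketch of the calling convention the fallbacks implement
(illustrative only): try_cmpxchg(ptr, &old, new) attempts the CAS and,
on failure, writes the value it observed back into 'old', so a retry
loop needs no explicit reload:

	static void add_one(unsigned long *ptr)
	{
		unsigned long old = *ptr;
		unsigned long new;

		do {
			new = old + 1;	/* 'old' is refreshed on a failed attempt */
		} while (!try_cmpxchg(ptr, &old, new));
	}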

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
---
 arch/x86/include/asm/atomic.h             |    2 
 arch/x86/include/asm/atomic64_64.h        |    2 
 arch/x86/include/asm/cmpxchg.h            |    2 
 include/asm-generic/atomic-instrumented.h |  216 +++++++++++++++++------------
 include/linux/atomic-arch-fallback.h      |   90 +++++++++++-
 include/linux/atomic-fallback.h           |   90 +++++++++++-
 scripts/atomic/gen-atomic-fallback.sh     |   63 ++++++++
 scripts/atomic/gen-atomic-instrumented.sh |   29 +++-
 8 files changed, 372 insertions(+), 122 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index b6cac6e9bb70..f732741ad7c7 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -199,7 +199,7 @@ static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
 
 static __always_inline bool arch_atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 {
-	return try_cmpxchg(&v->counter, old, new);
+	return arch_try_cmpxchg(&v->counter, old, new);
 }
 #define arch_atomic_try_cmpxchg arch_atomic_try_cmpxchg
 
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 809bd010a751..7886d0578fc9 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -187,7 +187,7 @@ static inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 
 static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 {
-	return try_cmpxchg(&v->counter, old, new);
+	return arch_try_cmpxchg(&v->counter, old, new);
 }
 #define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
 
diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index a8bfac131256..4d4ec5cbdc51 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,7 +221,7 @@ extern void __add_wrong_size(void)
 #define __try_cmpxchg(ptr, pold, new, size)				\
 	__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)
 
-#define try_cmpxchg(ptr, pold, new) 					\
+#define arch_try_cmpxchg(ptr, pold, new) 				\
 	__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))
 
 /*
diff --git a/include/asm-generic/atomic-instrumented.h b/include/asm-generic/atomic-instrumented.h
index 379986e40159..2cab3fdaae3b 100644
--- a/include/asm-generic/atomic-instrumented.h
+++ b/include/asm-generic/atomic-instrumented.h
@@ -1642,148 +1642,192 @@ atomic64_dec_if_positive(atomic64_t *v)
 #endif
 
 #if !defined(arch_xchg_relaxed) || defined(arch_xchg)
-#define xchg(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_xchg(__ai_ptr, __VA_ARGS__);				\
+#define xchg(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_xchg(__ai_ptr, __VA_ARGS__); \
 })
 #endif
 
 #if defined(arch_xchg_acquire)
-#define xchg_acquire(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_xchg_acquire(__ai_ptr, __VA_ARGS__);				\
+#define xchg_acquire(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_xchg_acquire(__ai_ptr, __VA_ARGS__); \
 })
 #endif
 
 #if defined(arch_xchg_release)
-#define xchg_release(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_xchg_release(__ai_ptr, __VA_ARGS__);				\
+#define xchg_release(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_xchg_release(__ai_ptr, __VA_ARGS__); \
 })
 #endif
 
 #if defined(arch_xchg_relaxed)
-#define xchg_relaxed(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_xchg_relaxed(__ai_ptr, __VA_ARGS__);				\
+#define xchg_relaxed(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_xchg_relaxed(__ai_ptr, __VA_ARGS__); \
 })
 #endif
 
 #if !defined(arch_cmpxchg_relaxed) || defined(arch_cmpxchg)
-#define cmpxchg(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_cmpxchg(__ai_ptr, __VA_ARGS__);				\
+#define cmpxchg(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_cmpxchg(__ai_ptr, __VA_ARGS__); \
 })
 #endif
 
 #if defined(arch_cmpxchg_acquire)
-#define cmpxchg_acquire(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_cmpxchg_acquire(__ai_ptr, __VA_ARGS__);				\
+#define cmpxchg_acquire(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_cmpxchg_acquire(__ai_ptr, __VA_ARGS__); \
 })
 #endif
 
 #if defined(arch_cmpxchg_release)
-#define cmpxchg_release(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_cmpxchg_release(__ai_ptr, __VA_ARGS__);				\
+#define cmpxchg_release(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_cmpxchg_release(__ai_ptr, __VA_ARGS__); \
 })
 #endif
 
 #if defined(arch_cmpxchg_relaxed)
-#define cmpxchg_relaxed(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__);				\
+#define cmpxchg_relaxed(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__); \
 })
 #endif
 
 #if !defined(arch_cmpxchg64_relaxed) || defined(arch_cmpxchg64)
-#define cmpxchg64(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_cmpxchg64(__ai_ptr, __VA_ARGS__);				\
+#define cmpxchg64(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_cmpxchg64(__ai_ptr, __VA_ARGS__); \
 })
 #endif
 
 #if defined(arch_cmpxchg64_acquire)
-#define cmpxchg64_acquire(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__);				\
+#define cmpxchg64_acquire(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__); \
 })
 #endif
 
 #if defined(arch_cmpxchg64_release)
-#define cmpxchg64_release(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__);				\
+#define cmpxchg64_release(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__); \
 })
 #endif
 
 #if defined(arch_cmpxchg64_relaxed)
-#define cmpxchg64_relaxed(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__);				\
+#define cmpxchg64_relaxed(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__); \
 })
 #endif
 
-#define cmpxchg_local(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_cmpxchg_local(__ai_ptr, __VA_ARGS__);				\
+#if !defined(arch_try_cmpxchg_relaxed) || defined(arch_try_cmpxchg)
+#define try_cmpxchg(ptr, oldp, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	typeof(oldp) __ai_oldp = (oldp); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	arch_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+#endif
+
+#if defined(arch_try_cmpxchg_acquire)
+#define try_cmpxchg_acquire(ptr, oldp, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	typeof(oldp) __ai_oldp = (oldp); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	arch_try_cmpxchg_acquire(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+#endif
+
+#if defined(arch_try_cmpxchg_release)
+#define try_cmpxchg_release(ptr, oldp, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	typeof(oldp) __ai_oldp = (oldp); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	arch_try_cmpxchg_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+#endif
+
+#if defined(arch_try_cmpxchg_relaxed)
+#define try_cmpxchg_relaxed(ptr, oldp, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	typeof(oldp) __ai_oldp = (oldp); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+	arch_try_cmpxchg_relaxed(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+#endif
+
+#define cmpxchg_local(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_cmpxchg_local(__ai_ptr, __VA_ARGS__); \
 })
 
-#define cmpxchg64_local(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_cmpxchg64_local(__ai_ptr, __VA_ARGS__);				\
+#define cmpxchg64_local(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_cmpxchg64_local(__ai_ptr, __VA_ARGS__); \
 })
 
-#define sync_cmpxchg(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr));		\
-	arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__);				\
+#define sync_cmpxchg(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+	arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
 })
 
-#define cmpxchg_double(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
-	arch_cmpxchg_double(__ai_ptr, __VA_ARGS__);				\
+#define cmpxchg_double(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
+	arch_cmpxchg_double(__ai_ptr, __VA_ARGS__); \
 })
 
 
-#define cmpxchg_double_local(ptr, ...)						\
-({									\
-	typeof(ptr) __ai_ptr = (ptr);					\
-	instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
-	arch_cmpxchg_double_local(__ai_ptr, __VA_ARGS__);				\
+#define cmpxchg_double_local(ptr, ...) \
+({ \
+	typeof(ptr) __ai_ptr = (ptr); \
+	instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
+	arch_cmpxchg_double_local(__ai_ptr, __VA_ARGS__); \
 })
 
 #endif /* _ASM_GENERIC_ATOMIC_INSTRUMENTED_H */
-// 89bf97f3a7509b740845e51ddf31055b48a81f40
+// ff0fe7f81ee97f01f13bb78b0e3ce800bc56d9dd
diff --git a/include/linux/atomic-arch-fallback.h b/include/linux/atomic-arch-fallback.h
index bcb6aa27cfa6..a3dba31df01e 100644
--- a/include/linux/atomic-arch-fallback.h
+++ b/include/linux/atomic-arch-fallback.h
@@ -9,9 +9,9 @@
 #include <linux/compiler.h>
 
 #ifndef arch_xchg_relaxed
-#define arch_xchg_relaxed		arch_xchg
-#define arch_xchg_acquire		arch_xchg
-#define arch_xchg_release		arch_xchg
+#define arch_xchg_acquire arch_xchg
+#define arch_xchg_release arch_xchg
+#define arch_xchg_relaxed arch_xchg
 #else /* arch_xchg_relaxed */
 
 #ifndef arch_xchg_acquire
@@ -32,9 +32,9 @@
 #endif /* arch_xchg_relaxed */
 
 #ifndef arch_cmpxchg_relaxed
-#define arch_cmpxchg_relaxed		arch_cmpxchg
-#define arch_cmpxchg_acquire		arch_cmpxchg
-#define arch_cmpxchg_release		arch_cmpxchg
+#define arch_cmpxchg_acquire arch_cmpxchg
+#define arch_cmpxchg_release arch_cmpxchg
+#define arch_cmpxchg_relaxed arch_cmpxchg
 #else /* arch_cmpxchg_relaxed */
 
 #ifndef arch_cmpxchg_acquire
@@ -55,9 +55,9 @@
 #endif /* arch_cmpxchg_relaxed */
 
 #ifndef arch_cmpxchg64_relaxed
-#define arch_cmpxchg64_relaxed		arch_cmpxchg64
-#define arch_cmpxchg64_acquire		arch_cmpxchg64
-#define arch_cmpxchg64_release		arch_cmpxchg64
+#define arch_cmpxchg64_acquire arch_cmpxchg64
+#define arch_cmpxchg64_release arch_cmpxchg64
+#define arch_cmpxchg64_relaxed arch_cmpxchg64
 #else /* arch_cmpxchg64_relaxed */
 
 #ifndef arch_cmpxchg64_acquire
@@ -77,6 +77,76 @@
 
 #endif /* arch_cmpxchg64_relaxed */
 
+#ifndef arch_try_cmpxchg_relaxed
+#ifdef arch_try_cmpxchg
+#define arch_try_cmpxchg_acquire arch_try_cmpxchg
+#define arch_try_cmpxchg_release arch_try_cmpxchg
+#define arch_try_cmpxchg_relaxed arch_try_cmpxchg
+#endif /* arch_try_cmpxchg */
+
+#ifndef arch_try_cmpxchg
+#define arch_try_cmpxchg(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	___r = arch_cmpxchg((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg */
+
+#ifndef arch_try_cmpxchg_acquire
+#define arch_try_cmpxchg_acquire(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	___r = arch_cmpxchg_acquire((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_acquire */
+
+#ifndef arch_try_cmpxchg_release
+#define arch_try_cmpxchg_release(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	___r = arch_cmpxchg_release((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_release */
+
+#ifndef arch_try_cmpxchg_relaxed
+#define arch_try_cmpxchg_relaxed(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	___r = arch_cmpxchg_relaxed((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_relaxed */
+
+#else /* arch_try_cmpxchg_relaxed */
+
+#ifndef arch_try_cmpxchg_acquire
+#define arch_try_cmpxchg_acquire(...) \
+	__atomic_op_acquire(arch_try_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef arch_try_cmpxchg_release
+#define arch_try_cmpxchg_release(...) \
+	__atomic_op_release(arch_try_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef arch_try_cmpxchg
+#define arch_try_cmpxchg(...) \
+	__atomic_op_fence(arch_try_cmpxchg, __VA_ARGS__)
+#endif
+
+#endif /* arch_try_cmpxchg_relaxed */
+
 #ifndef arch_atomic_read_acquire
 static __always_inline int
 arch_atomic_read_acquire(const atomic_t *v)
@@ -2288,4 +2358,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
 #endif
 
 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// 90cd26cfd69d2250303d654955a0cc12620fb91b
+// cca554917d7ea73d5e3e7397dd70c484cad9b2c4
diff --git a/include/linux/atomic-fallback.h b/include/linux/atomic-fallback.h
index fd525c71d676..2a3f55d98be9 100644
--- a/include/linux/atomic-fallback.h
+++ b/include/linux/atomic-fallback.h
@@ -9,9 +9,9 @@
 #include <linux/compiler.h>
 
 #ifndef xchg_relaxed
-#define xchg_relaxed		xchg
-#define xchg_acquire		xchg
-#define xchg_release		xchg
+#define xchg_acquire xchg
+#define xchg_release xchg
+#define xchg_relaxed xchg
 #else /* xchg_relaxed */
 
 #ifndef xchg_acquire
@@ -32,9 +32,9 @@
 #endif /* xchg_relaxed */
 
 #ifndef cmpxchg_relaxed
-#define cmpxchg_relaxed		cmpxchg
-#define cmpxchg_acquire		cmpxchg
-#define cmpxchg_release		cmpxchg
+#define cmpxchg_acquire cmpxchg
+#define cmpxchg_release cmpxchg
+#define cmpxchg_relaxed cmpxchg
 #else /* cmpxchg_relaxed */
 
 #ifndef cmpxchg_acquire
@@ -55,9 +55,9 @@
 #endif /* cmpxchg_relaxed */
 
 #ifndef cmpxchg64_relaxed
-#define cmpxchg64_relaxed		cmpxchg64
-#define cmpxchg64_acquire		cmpxchg64
-#define cmpxchg64_release		cmpxchg64
+#define cmpxchg64_acquire cmpxchg64
+#define cmpxchg64_release cmpxchg64
+#define cmpxchg64_relaxed cmpxchg64
 #else /* cmpxchg64_relaxed */
 
 #ifndef cmpxchg64_acquire
@@ -77,6 +77,76 @@
 
 #endif /* cmpxchg64_relaxed */
 
+#ifndef try_cmpxchg_relaxed
+#ifdef try_cmpxchg
+#define try_cmpxchg_acquire try_cmpxchg
+#define try_cmpxchg_release try_cmpxchg
+#define try_cmpxchg_relaxed try_cmpxchg
+#endif /* try_cmpxchg */
+
+#ifndef try_cmpxchg
+#define try_cmpxchg(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	___r = cmpxchg((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* try_cmpxchg */
+
+#ifndef try_cmpxchg_acquire
+#define try_cmpxchg_acquire(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	___r = cmpxchg_acquire((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* try_cmpxchg_acquire */
+
+#ifndef try_cmpxchg_release
+#define try_cmpxchg_release(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	___r = cmpxchg_release((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* try_cmpxchg_release */
+
+#ifndef try_cmpxchg_relaxed
+#define try_cmpxchg_relaxed(_ptr, _oldp, _new) \
+({ \
+	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+	___r = cmpxchg_relaxed((_ptr), ___o, (_new)); \
+	if (unlikely(___r != ___o)) \
+		*___op = ___r; \
+	likely(___r == ___o); \
+})
+#endif /* try_cmpxchg_relaxed */
+
+#else /* try_cmpxchg_relaxed */
+
+#ifndef try_cmpxchg_acquire
+#define try_cmpxchg_acquire(...) \
+	__atomic_op_acquire(try_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef try_cmpxchg_release
+#define try_cmpxchg_release(...) \
+	__atomic_op_release(try_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef try_cmpxchg
+#define try_cmpxchg(...) \
+	__atomic_op_fence(try_cmpxchg, __VA_ARGS__)
+#endif
+
+#endif /* try_cmpxchg_relaxed */
+
 #define arch_atomic_read atomic_read
 #define arch_atomic_read_acquire atomic_read_acquire
 
@@ -2522,4 +2592,4 @@ atomic64_dec_if_positive(atomic64_t *v)
 #endif
 
 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// 9d95b56f98d82a2a26c7b79ccdd0c47572d50a6f
+// d78e6c293c661c15188f0ec05bce45188c8d5892
diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 693dfa1de430..317a6cec76e1 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -144,15 +144,11 @@ gen_proto_order_variants()
 	printf "#endif /* ${basename}_relaxed */\n\n"
 }
 
-gen_xchg_fallbacks()
+gen_order_fallbacks()
 {
 	local xchg="$1"; shift
+
 cat <<EOF
-#ifndef ${xchg}_relaxed
-#define ${xchg}_relaxed		${xchg}
-#define ${xchg}_acquire		${xchg}
-#define ${xchg}_release		${xchg}
-#else /* ${xchg}_relaxed */
 
 #ifndef ${xchg}_acquire
 #define ${xchg}_acquire(...) \\
@@ -169,11 +165,62 @@ cat <<EOF
 	__atomic_op_fence(${xchg}, __VA_ARGS__)
 #endif
 
-#endif /* ${xchg}_relaxed */
+EOF
+}
+
+gen_xchg_fallbacks()
+{
+	local xchg="$1"; shift
+	printf "#ifndef ${xchg}_relaxed\n"
+
+	gen_basic_fallbacks ${xchg}
+
+	printf "#else /* ${xchg}_relaxed */\n"
+
+	gen_order_fallbacks ${xchg}
+
+	printf "#endif /* ${xchg}_relaxed */\n\n"
+}
+
+gen_try_cmpxchg_fallback()
+{
+	local order="$1"; shift;
+
+cat <<EOF
+#ifndef ${ARCH}try_cmpxchg${order}
+#define ${ARCH}try_cmpxchg${order}(_ptr, _oldp, _new) \\
+({ \\
+	typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \\
+	___r = ${ARCH}cmpxchg${order}((_ptr), ___o, (_new)); \\
+	if (unlikely(___r != ___o)) \\
+		*___op = ___r; \\
+	likely(___r == ___o); \\
+})
+#endif /* ${ARCH}try_cmpxchg${order} */
 
 EOF
 }
 
+gen_try_cmpxchg_fallbacks()
+{
+	printf "#ifndef ${ARCH}try_cmpxchg_relaxed\n"
+	printf "#ifdef ${ARCH}try_cmpxchg\n"
+
+	gen_basic_fallbacks "${ARCH}try_cmpxchg"
+
+	printf "#endif /* ${ARCH}try_cmpxchg */\n\n"
+
+	for order in "" "_acquire" "_release" "_relaxed"; do
+		gen_try_cmpxchg_fallback "${order}"
+	done
+
+	printf "#else /* ${ARCH}try_cmpxchg_relaxed */\n"
+
+	gen_order_fallbacks "${ARCH}try_cmpxchg"
+
+	printf "#endif /* ${ARCH}try_cmpxchg_relaxed */\n\n"
+}
+
 cat << EOF
 // SPDX-License-Identifier: GPL-2.0
 
@@ -191,6 +238,8 @@ for xchg in "${ARCH}xchg" "${ARCH}cmpxchg" "${ARCH}cmpxchg64"; do
 	gen_xchg_fallbacks "${xchg}"
 done
 
+gen_try_cmpxchg_fallbacks
+
 grep '^[a-z]' "$1" | while read name meta args; do
 	gen_proto "${meta}" "${name}" "${ARCH}" "atomic" "int" ${args}
 done
diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index 6afadf73da17..85dc25685c0d 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -103,14 +103,31 @@ gen_xchg()
 	local xchg="$1"; shift
 	local mult="$1"; shift
 
+	if [ "${xchg%${xchg#try_cmpxchg}}" = "try_cmpxchg" ] ; then
+
+cat <<EOF
+#define ${xchg}(ptr, oldp, ...) \\
+({ \\
+	typeof(ptr) __ai_ptr = (ptr); \\
+	typeof(oldp) __ai_oldp = (oldp); \\
+	instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
+	instrument_atomic_write(__ai_oldp, ${mult}sizeof(*__ai_oldp)); \\
+	arch_${xchg}(__ai_ptr, __ai_oldp, __VA_ARGS__); \\
+})
+EOF
+
+	else
+
 cat <<EOF
-#define ${xchg}(ptr, ...)						\\
-({									\\
-	typeof(ptr) __ai_ptr = (ptr);					\\
-	instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr));		\\
-	arch_${xchg}(__ai_ptr, __VA_ARGS__);				\\
+#define ${xchg}(ptr, ...) \\
+({ \\
+	typeof(ptr) __ai_ptr = (ptr); \\
+	instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
+	arch_${xchg}(__ai_ptr, __VA_ARGS__); \\
 })
 EOF
+
+	fi
 }
 
 gen_optional_xchg()
@@ -160,7 +177,7 @@ grep '^[a-z]' "$1" | while read name meta args; do
 	gen_proto "${meta}" "${name}" "atomic64" "s64" ${args}
 done
 
-for xchg in "xchg" "cmpxchg" "cmpxchg64"; do
+for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg"; do
 	for order in "" "_acquire" "_release" "_relaxed"; do
 		gen_optional_xchg "${xchg}" "${order}"
 	done
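
For reference, the semantics the generated try_cmpxchg() fallback above
implements differ from cmpxchg() in two ways: it returns a boolean (true on
success), and on failure it writes the value it actually observed back through
the old-value pointer, so retry loops need no explicit re-read. A minimal
sketch of the usual calling pattern (function and variable names here are
illustrative only, not taken from the series):

	/* Sketch: bounded increment using the try_cmpxchg() retry idiom. */
	static void counter_add_bounded(unsigned int *counter, unsigned int max)
	{
		unsigned int old = READ_ONCE(*counter);

		do {
			if (old >= max)
				return;	/* limit reached, give up */
			/* on failure, 'old' is refreshed with the observed value */
		} while (!try_cmpxchg(counter, &old, old + 1));
	}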


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v5 20/21] freelist: Lock less freelist
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (18 preceding siblings ...)
  2020-08-29 13:03 ` [PATCH v5 19/21] asm-generic/atomic: Add try_cmpxchg() fallbacks Masami Hiramatsu
@ 2020-08-29 13:03 ` Masami Hiramatsu
  2020-08-29 13:03 ` [PATCH v5 21/21] kprobes: Replace rp->free_instance with freelist Masami Hiramatsu
  2020-09-01 19:08 ` [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Peter Zijlstra
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:03 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

From: Peter Zijlstra <peterz@infradead.org>

Cc: cameron@moodycamel.com
Cc: oleg@redhat.com
Cc: will@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/freelist.h |  129 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 129 insertions(+)
 create mode 100644 include/linux/freelist.h

diff --git a/include/linux/freelist.h b/include/linux/freelist.h
new file mode 100644
index 000000000000..fc1842b96469
--- /dev/null
+++ b/include/linux/freelist.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause */
+#ifndef FREELIST_H
+#define FREELIST_H
+
+#include <linux/atomic.h>
+
+/*
+ * Copyright: cameron@moodycamel.com
+ *
+ * A simple CAS-based lock-free free list. Not the fastest thing in the world
+ * under heavy contention, but simple and correct (assuming nodes are never
+ * freed until after the free list is destroyed), and fairly speedy under low
+ * contention.
+ *
+ * Adapted from: https://moodycamel.com/blog/2014/solving-the-aba-problem-for-lock-free-free-lists
+ */
+
+struct freelist_node {
+	atomic_t		refs;
+	struct freelist_node	*next;
+};
+
+struct freelist_head {
+	struct freelist_node	*head;
+};
+
+#define REFS_ON_FREELIST 0x80000000
+#define REFS_MASK	 0x7FFFFFFF
+
+static inline void __freelist_add(struct freelist_node *node, struct freelist_head *list)
+{
+	/*
+	 * Since the refcount is zero, and nobody can increase it once it's
+	 * zero (except us, and we run only one copy of this method per node at
+	 * a time, i.e. the single thread case), then we know we can safely
+	 * change the next pointer of the node; however, once the refcount is
+	 * back above zero, then other threads could increase it (happens under
+	 * heavy contention, when the refcount goes to zero in between a load
+	 * and a refcount increment of a node in try_get, then back up to
+	 * something non-zero, then the refcount increment is done by the other
+	 * thread) -- so if the CAS to add the node to the actual list fails,
+	 * decrease the refcount and leave the add operation to the next thread
+	 * who puts the refcount back to zero (which could be us, hence the
+	 * loop).
+	 */
+	struct freelist_node *head = READ_ONCE(list->head);
+
+	for (;;) {
+		WRITE_ONCE(node->next, head);
+		atomic_set_release(&node->refs, 1);
+
+		if (!try_cmpxchg_release(&list->head, &head, node)) {
+			/*
+			 * Hmm, the add failed, but we can only try again when
+			 * the refcount goes back to zero.
+			 */
+			if (atomic_fetch_add_release(REFS_ON_FREELIST - 1, &node->refs) == 1)
+				continue;
+		}
+		return;
+	}
+}
+
+static inline void freelist_add(struct freelist_node *node, struct freelist_head *list)
+{
+	/*
+	 * We know that the should-be-on-freelist bit is 0 at this point, so
+	 * it's safe to set it using a fetch_add.
+	 */
+	if (!atomic_fetch_add_release(REFS_ON_FREELIST, &node->refs)) {
+		/*
+		 * Oh look! We were the last ones referencing this node, and we
+		 * know we want to add it to the free list, so let's do it!
+		 */
+		__freelist_add(node, list);
+	}
+}
+
+static inline struct freelist_node *freelist_try_get(struct freelist_head *list)
+{
+	struct freelist_node *prev, *next, *head = smp_load_acquire(&list->head);
+	unsigned int refs;
+
+	while (head) {
+		prev = head;
+		refs = atomic_read(&head->refs);
+		if ((refs & REFS_MASK) == 0 ||
+		    !atomic_try_cmpxchg_acquire(&head->refs, &refs, refs+1)) {
+			head = smp_load_acquire(&list->head);
+			continue;
+		}
+
+		/*
+		 * Good, reference count has been incremented (it wasn't at
+		 * zero), which means we can read the next and not worry about
+		 * it changing between now and the time we do the CAS.
+		 */
+		next = READ_ONCE(head->next);
+		if (try_cmpxchg_acquire(&list->head, &head, next)) {
+			/*
+			 * Yay, got the node. This means it was on the list,
+			 * which means should-be-on-freelist must be false no
+			 * matter the refcount (because nobody else knows it's
+			 * been taken off yet, it can't have been put back on).
+			 */
+			WARN_ON_ONCE(atomic_read(&head->refs) & REFS_ON_FREELIST);
+
+			/*
+			 * Decrease refcount twice, once for our ref, and once
+			 * for the list's ref.
+			 */
+			atomic_fetch_add(-2, &head->refs);
+
+			return head;
+		}
+
+		/*
+		 * OK, the head must have changed on us, but we still need to decrement
+		 * the refcount we increased.
+		 */
+		refs = atomic_fetch_add(-1, &prev->refs);
+		if (refs == REFS_ON_FREELIST + 1)
+			__freelist_add(prev, list);
+	}
+
+	return NULL;
+}
+
+#endif /* FREELIST_H */
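
As a rough illustration of how the two entry points are meant to be paired by
a user of this header (a sketch only, not code from this series; the structure
and names below are made up):

	/* Sketch: embedding a freelist_node in a simple object pool. */
	struct my_obj {
		struct freelist_node	fnode;	/* refs must start at 0, e.g. via kzalloc() */
		int			payload;
	};

	static struct freelist_head my_pool;	/* .head == NULL initially */

	/* Initial population: freelist_add() works here because refs == 0. */
	static void my_pool_seed(struct my_obj *obj)
	{
		freelist_add(&obj->fnode, &my_pool);
	}

	/* Take an object; returns NULL when the pool is observed empty. */
	static struct my_obj *my_obj_get(void)
	{
		struct freelist_node *fn = freelist_try_get(&my_pool);

		return fn ? container_of(fn, struct my_obj, fnode) : NULL;
	}

	/* Return an object; safe from any context, nodes are never freed while in use. */
	static void my_obj_put(struct my_obj *obj)
	{
		freelist_add(&obj->fnode, &my_pool);
	}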


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v5 21/21] kprobes: Replace rp->free_instance with freelist
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (19 preceding siblings ...)
  2020-08-29 13:03 ` [PATCH v5 20/21] freelist: Lock less freelist Masami Hiramatsu
@ 2020-08-29 13:03 ` Masami Hiramatsu
  2020-09-01 19:08 ` [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Peter Zijlstra
  21 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-08-29 13:03 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra
  Cc: Eddy_Wu, x86, davem, rostedt, naveen.n.rao, anil.s.keshavamurthy,
	linux-arch, cameron, oleg, will, paulmck, mhiramat

From: Peter Zijlstra <peterz@infradead.org>

Gets rid of rp->lock, and as a result kretprobes are now fully
lockless.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Changes
  - [MH] expel the llist from anon union in kretprobe_instance
---
 include/linux/kprobes.h |    8 +++----
 kernel/kprobes.c        |   56 ++++++++++++++++++++---------------------------
 2 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index f8f87a13345a..89ff0fc15286 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -28,6 +28,7 @@
 #include <linux/mutex.h>
 #include <linux/ftrace.h>
 #include <linux/refcount.h>
+#include <linux/freelist.h>
 #include <asm/kprobes.h>
 
 #ifdef CONFIG_KPROBES
@@ -157,17 +158,16 @@ struct kretprobe {
 	int maxactive;
 	int nmissed;
 	size_t data_size;
-	struct hlist_head free_instances;
+	struct freelist_head freelist;
 	struct kretprobe_holder *rph;
-	raw_spinlock_t lock;
 };
 
 struct kretprobe_instance {
 	union {
-		struct llist_node llist;
-		struct hlist_node hlist;
+		struct freelist_node freelist;
 		struct rcu_head rcu;
 	};
+	struct llist_node llist;
 	struct kretprobe_holder *rph;
 	kprobe_opcode_t *ret_addr;
 	void *fp;
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index bc65603fce00..af6551992b91 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1228,11 +1228,8 @@ static void recycle_rp_inst(struct kretprobe_instance *ri)
 {
 	struct kretprobe *rp = get_kretprobe(ri);
 
-	INIT_HLIST_NODE(&ri->hlist);
 	if (likely(rp)) {
-		raw_spin_lock(&rp->lock);
-		hlist_add_head(&ri->hlist, &rp->free_instances);
-		raw_spin_unlock(&rp->lock);
+		freelist_add(&ri->freelist, &rp->freelist);
 	} else
 		call_rcu(&ri->rcu, free_rp_inst_rcu);
 }
@@ -1290,11 +1287,14 @@ NOKPROBE_SYMBOL(kprobe_flush_task);
 static inline void free_rp_inst(struct kretprobe *rp)
 {
 	struct kretprobe_instance *ri;
-	struct hlist_node *next;
+	struct freelist_node *node;
 	int count = 0;
 
-	hlist_for_each_entry_safe(ri, next, &rp->free_instances, hlist) {
-		hlist_del(&ri->hlist);
+	node = rp->freelist.head;
+	while (node) {
+		ri = container_of(node, struct kretprobe_instance, freelist);
+		node = node->next;
+
 		kfree(ri);
 		count++;
 	}
@@ -1925,32 +1925,26 @@ NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
 static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 {
 	struct kretprobe *rp = container_of(p, struct kretprobe, kp);
-	unsigned long flags = 0;
 	struct kretprobe_instance *ri;
+	struct freelist_node *fn;
 
-	/* TODO: consider to only swap the RA after the last pre_handler fired */
-	raw_spin_lock_irqsave(&rp->lock, flags);
-	if (!hlist_empty(&rp->free_instances)) {
-		ri = hlist_entry(rp->free_instances.first,
-				struct kretprobe_instance, hlist);
-		hlist_del(&ri->hlist);
-		raw_spin_unlock_irqrestore(&rp->lock, flags);
-
-		if (rp->entry_handler && rp->entry_handler(ri, regs)) {
-			raw_spin_lock_irqsave(&rp->lock, flags);
-			hlist_add_head(&ri->hlist, &rp->free_instances);
-			raw_spin_unlock_irqrestore(&rp->lock, flags);
-			return 0;
-		}
-
-		arch_prepare_kretprobe(ri, regs);
+	fn = freelist_try_get(&rp->freelist);
+	if (!fn) {
+		rp->nmissed++;
+		return 0;
+	}
 
-		__llist_add(&ri->llist, &current->kretprobe_instances);
+	ri = container_of(fn, struct kretprobe_instance, freelist);
 
-	} else {
-		rp->nmissed++;
-		raw_spin_unlock_irqrestore(&rp->lock, flags);
+	if (rp->entry_handler && rp->entry_handler(ri, regs)) {
+		freelist_add(&ri->freelist, &rp->freelist);
+		return 0;
 	}
+
+	arch_prepare_kretprobe(ri, regs);
+
+	__llist_add(&ri->llist, &current->kretprobe_instances);
+
 	return 0;
 }
 NOKPROBE_SYMBOL(pre_handler_kretprobe);
@@ -2007,8 +2001,7 @@ int register_kretprobe(struct kretprobe *rp)
 		rp->maxactive = num_possible_cpus();
 #endif
 	}
-	raw_spin_lock_init(&rp->lock);
-	INIT_HLIST_HEAD(&rp->free_instances);
+	rp->freelist.head = NULL;
 	rp->rph = kzalloc(sizeof(struct kretprobe_holder), GFP_KERNEL);
 	if (!rp->rph)
 		return -ENOMEM;
@@ -2023,8 +2016,7 @@ int register_kretprobe(struct kretprobe *rp)
 			return -ENOMEM;
 		}
 		inst->rph = rp->rph;
-		INIT_HLIST_NODE(&inst->hlist);
-		hlist_add_head(&inst->hlist, &rp->free_instances);
+		freelist_add(&inst->freelist, &rp->freelist);
 	}
 	refcount_set(&rp->rph->ref, i);
 


^ permalink raw reply related	[flat|nested] 51+ messages in thread
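
For orientation, nothing in the kretprobe user API changes with the patch
above: the freelist is entirely internal to register_kretprobe() and
pre_handler_kretprobe(). A typical consumer still looks roughly like this
sketch (the probed symbol and handler names are made up):

	#include <linux/kprobes.h>

	static int my_entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
	{
		/* Returning non-zero hands the instance straight back to the freelist. */
		return 0;
	}

	static int my_ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
	{
		pr_info("probed function returned %lx\n", regs_return_value(regs));
		return 0;
	}

	static struct kretprobe my_kretprobe = {
		.entry_handler	= my_entry_handler,
		.handler	= my_ret_handler,
		.maxactive	= 20,
		.kp.symbol_name	= "do_some_function",	/* hypothetical symbol */
	};

	/* register_kretprobe(&my_kretprobe) at init, unregister_kretprobe() at exit. */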

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
                   ` (20 preceding siblings ...)
  2020-08-29 13:03 ` [PATCH v5 21/21] kprobes: Replace rp->free_instance with freelist Masami Hiramatsu
@ 2020-09-01 19:08 ` Peter Zijlstra
  2020-09-02  0:37   ` Masami Hiramatsu
  21 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2020-09-01 19:08 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Eddy_Wu, x86, davem, rostedt, naveen.n.rao,
	anil.s.keshavamurthy, linux-arch, cameron, oleg, will, paulmck

On Sat, Aug 29, 2020 at 09:59:49PM +0900, Masami Hiramatsu wrote:
> Masami Hiramatsu (16):
>       kprobes: Add generic kretprobe trampoline handler
>       x86/kprobes: Use generic kretprobe trampoline handler
>       arm: kprobes: Use generic kretprobe trampoline handler
>       arm64: kprobes: Use generic kretprobe trampoline handler
>       arc: kprobes: Use generic kretprobe trampoline handler
>       csky: kprobes: Use generic kretprobe trampoline handler
>       ia64: kprobes: Use generic kretprobe trampoline handler
>       mips: kprobes: Use generic kretprobe trampoline handler
>       parisc: kprobes: Use generic kretprobe trampoline handler
>       powerpc: kprobes: Use generic kretprobe trampoline handler
>       s390: kprobes: Use generic kretprobe trampoline handler
>       sh: kprobes: Use generic kretprobe trampoline handler
>       sparc: kprobes: Use generic kretprobe trampoline handler
>       kprobes: Remove NMI context check
>       kprobes: Free kretprobe_instance with rcu callback
>       kprobes: Make local used functions static
> 
> Peter Zijlstra (5):
>       llist: Add nonatomic __llist_add() and __llist_dell_all()
>       kprobes: Remove kretprobe hash
>       asm-generic/atomic: Add try_cmpxchg() fallbacks
>       freelist: Lock less freelist
>       kprobes: Replace rp->free_instance with freelist

This looks good to me, do you want me to merge them through -tip? If so,
do we want to try and get them in this release still?

Ingo, opinions? This basically fixes a regression caused by

  0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-01 19:08 ` [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Peter Zijlstra
@ 2020-09-02  0:37   ` Masami Hiramatsu
  2020-09-02  7:02     ` peterz
  0 siblings, 1 reply; 51+ messages in thread
From: Masami Hiramatsu @ 2020-09-02  0:37 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Eddy_Wu, x86, davem, rostedt, naveen.n.rao,
	anil.s.keshavamurthy, linux-arch, cameron, oleg, will, paulmck

On Tue, 1 Sep 2020 21:08:08 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Sat, Aug 29, 2020 at 09:59:49PM +0900, Masami Hiramatsu wrote:
> > Masami Hiramatsu (16):
> >       kprobes: Add generic kretprobe trampoline handler
> >       x86/kprobes: Use generic kretprobe trampoline handler
> >       arm: kprobes: Use generic kretprobe trampoline handler
> >       arm64: kprobes: Use generic kretprobe trampoline handler
> >       arc: kprobes: Use generic kretprobe trampoline handler
> >       csky: kprobes: Use generic kretprobe trampoline handler
> >       ia64: kprobes: Use generic kretprobe trampoline handler
> >       mips: kprobes: Use generic kretprobe trampoline handler
> >       parisc: kprobes: Use generic kretprobe trampoline handler
> >       powerpc: kprobes: Use generic kretprobe trampoline handler
> >       s390: kprobes: Use generic kretprobe trampoline handler
> >       sh: kprobes: Use generic kretprobe trampoline handler
> >       sparc: kprobes: Use generic kretprobe trampoline handler
> >       kprobes: Remove NMI context check
> >       kprobes: Free kretprobe_instance with rcu callback
> >       kprobes: Make local used functions static
> > 
> > Peter Zijlstra (5):
> >       llist: Add nonatomic __llist_add() and __llist_dell_all()
> >       kprobes: Remove kretprobe hash
> >       asm-generic/atomic: Add try_cmpxchg() fallbacks
> >       freelist: Lock less freelist
> >       kprobes: Replace rp->free_instance with freelist
> 
> This looks good to me, do you want me to merge them through -tip? If so,
> do we want to try and get them in this release still?

Yes, thanks. For the kretprobe missing issue, we will need the first half
(up to "kprobes: Remove NMI context check"), so we can split the series
if someone thinks the lockless part is still immature.

> 
> Ingo, opinions? This basically fixes a regression caused by
> 
>   0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
> 

Oops, I missed Ingo in CC. 

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-02  0:37   ` Masami Hiramatsu
@ 2020-09-02  7:02     ` peterz
  2020-09-02  8:17       ` Masami Hiramatsu
  2020-09-11  2:32       ` Masami Hiramatsu
  0 siblings, 2 replies; 51+ messages in thread
From: peterz @ 2020-09-02  7:02 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, linux-kernel, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck

On Wed, Sep 02, 2020 at 09:37:39AM +0900, Masami Hiramatsu wrote:
> On Tue, 1 Sep 2020 21:08:08 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Sat, Aug 29, 2020 at 09:59:49PM +0900, Masami Hiramatsu wrote:
> > > Masami Hiramatsu (16):
> > >       kprobes: Add generic kretprobe trampoline handler
> > >       x86/kprobes: Use generic kretprobe trampoline handler
> > >       arm: kprobes: Use generic kretprobe trampoline handler
> > >       arm64: kprobes: Use generic kretprobe trampoline handler
> > >       arc: kprobes: Use generic kretprobe trampoline handler
> > >       csky: kprobes: Use generic kretprobe trampoline handler
> > >       ia64: kprobes: Use generic kretprobe trampoline handler
> > >       mips: kprobes: Use generic kretprobe trampoline handler
> > >       parisc: kprobes: Use generic kretprobe trampoline handler
> > >       powerpc: kprobes: Use generic kretprobe trampoline handler
> > >       s390: kprobes: Use generic kretprobe trampoline handler
> > >       sh: kprobes: Use generic kretprobe trampoline handler
> > >       sparc: kprobes: Use generic kretprobe trampoline handler
> > >       kprobes: Remove NMI context check
> > >       kprobes: Free kretprobe_instance with rcu callback
> > >       kprobes: Make local used functions static
> > > 
> > > Peter Zijlstra (5):
> > >       llist: Add nonatomic __llist_add() and __llist_dell_all()
> > >       kprobes: Remove kretprobe hash
> > >       asm-generic/atomic: Add try_cmpxchg() fallbacks
> > >       freelist: Lock less freelist
> > >       kprobes: Replace rp->free_instance with freelist
> > 
> > This looks good to me, do you want me to merge them through -tip? If so,
> > do we want to try and get them in this release still?
> 
> Yes, thanks. For the kretprobe missing issue, we will need the first half
> (up to "kprobes: Remove NMI context check"), so we can split the series
> if someone think the lockless is still immature.

Ok, but then lockdep will yell at you if you have that enabled and run
the unoptimized things.

> > Ingo, opinions? This basically fixes a regression caused by
> > 
> >   0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
> > 
> 
> Oops, I missed Ingo in CC. 

You had x86@, he'll have a copy :-)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-02  7:02     ` peterz
@ 2020-09-02  8:17       ` Masami Hiramatsu
  2020-09-02  9:36         ` peterz
  2020-09-11  2:32       ` Masami Hiramatsu
  1 sibling, 1 reply; 51+ messages in thread
From: Masami Hiramatsu @ 2020-09-02  8:17 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, linux-kernel, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck

On Wed, 2 Sep 2020 09:02:26 +0200
peterz@infradead.org wrote:

> On Wed, Sep 02, 2020 at 09:37:39AM +0900, Masami Hiramatsu wrote:
> > On Tue, 1 Sep 2020 21:08:08 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Sat, Aug 29, 2020 at 09:59:49PM +0900, Masami Hiramatsu wrote:
> > > > Masami Hiramatsu (16):
> > > >       kprobes: Add generic kretprobe trampoline handler
> > > >       x86/kprobes: Use generic kretprobe trampoline handler
> > > >       arm: kprobes: Use generic kretprobe trampoline handler
> > > >       arm64: kprobes: Use generic kretprobe trampoline handler
> > > >       arc: kprobes: Use generic kretprobe trampoline handler
> > > >       csky: kprobes: Use generic kretprobe trampoline handler
> > > >       ia64: kprobes: Use generic kretprobe trampoline handler
> > > >       mips: kprobes: Use generic kretprobe trampoline handler
> > > >       parisc: kprobes: Use generic kretprobe trampoline handler
> > > >       powerpc: kprobes: Use generic kretprobe trampoline handler
> > > >       s390: kprobes: Use generic kretprobe trampoline handler
> > > >       sh: kprobes: Use generic kretprobe trampoline handler
> > > >       sparc: kprobes: Use generic kretprobe trampoline handler
> > > >       kprobes: Remove NMI context check
> > > >       kprobes: Free kretprobe_instance with rcu callback
> > > >       kprobes: Make local used functions static
> > > > 
> > > > Peter Zijlstra (5):
> > > >       llist: Add nonatomic __llist_add() and __llist_dell_all()
> > > >       kprobes: Remove kretprobe hash
> > > >       asm-generic/atomic: Add try_cmpxchg() fallbacks
> > > >       freelist: Lock less freelist
> > > >       kprobes: Replace rp->free_instance with freelist
> > > 
> > > This looks good to me, do you want me to merge them through -tip? If so,
> > > do we want to try and get them in this release still?
> > 
> > Yes, thanks. For the kretprobe missing issue, we will need the first half
> > (up to "kprobes: Remove NMI context check"), so we can split the series
> > if someone think the lockless is still immature.
> 
> Ok, but then lockdep will yell at you if you have that enabled and run
> the unoptimized things.

Oh, does it warn for all spinlock things in kprobes if it is unoptimized?
Hmm, it has to be noted in the documentation.

Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-02  8:17       ` Masami Hiramatsu
@ 2020-09-02  9:36         ` peterz
  2020-09-02 13:19           ` Masami Hiramatsu
  0 siblings, 1 reply; 51+ messages in thread
From: peterz @ 2020-09-02  9:36 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, linux-kernel, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck

On Wed, Sep 02, 2020 at 05:17:55PM +0900, Masami Hiramatsu wrote:

> > Ok, but then lockdep will yell at you if you have that enabled and run
> > the unoptimized things.
> 
> Oh, does it warn for all spinlock things in kprobes if it is unoptimized?
> Hmm, it has to be noted in the documentation.

Lockdep will warn about spinlocks used in NMI context that are also used
outside NMI context.

Now, for the kretprobe, the kprobe_busy flag prevents the actual
recursion self-deadlock, but lockdep isn't smart enough to see that.

One way around this might be to use SINGLE_DEPTH_NESTING for locks when
we use them from INT3 context. That way they'll have a different class
and lockdep will not see the recursion.

pre_handler_kretprobe() is always called from INT3, right?

Something like the below might then work...

---
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 287b263c9cb9..b78f4fe08e86 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1255,11 +1255,11 @@ __acquires(hlist_lock)
 NOKPROBE_SYMBOL(kretprobe_hash_lock);
 
 static void kretprobe_table_lock(unsigned long hash,
-				 unsigned long *flags)
+				 unsigned long *flags, int nesting)
 __acquires(hlist_lock)
 {
 	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_lock_irqsave(hlist_lock, *flags);
+	raw_spin_lock_irqsave_nested(hlist_lock, *flags, nesting);
 }
 NOKPROBE_SYMBOL(kretprobe_table_lock);
 
@@ -1326,7 +1326,7 @@ void kprobe_flush_task(struct task_struct *tk)
 	INIT_HLIST_HEAD(&empty_rp);
 	hash = hash_ptr(tk, KPROBE_HASH_BITS);
 	head = &kretprobe_inst_table[hash];
-	kretprobe_table_lock(hash, &flags);
+	kretprobe_table_lock(hash, &flags, 0);
 	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
 		if (ri->task == tk)
 			recycle_rp_inst(ri, &empty_rp);
@@ -1361,7 +1361,7 @@ static void cleanup_rp_inst(struct kretprobe *rp)
 
 	/* No race here */
 	for (hash = 0; hash < KPROBE_TABLE_SIZE; hash++) {
-		kretprobe_table_lock(hash, &flags);
+		kretprobe_table_lock(hash, &flags, 0);
 		head = &kretprobe_inst_table[hash];
 		hlist_for_each_entry_safe(ri, next, head, hlist) {
 			if (ri->rp == rp)
@@ -1950,7 +1950,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 
 	/* TODO: consider to only swap the RA after the last pre_handler fired */
 	hash = hash_ptr(current, KPROBE_HASH_BITS);
-	raw_spin_lock_irqsave(&rp->lock, flags);
+	raw_spin_lock_irqsave_nested(&rp->lock, flags, SINGLE_DEPTH_NESTING);
 	if (!hlist_empty(&rp->free_instances)) {
 		ri = hlist_entry(rp->free_instances.first,
 				struct kretprobe_instance, hlist);
@@ -1961,7 +1961,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 		ri->task = current;
 
 		if (rp->entry_handler && rp->entry_handler(ri, regs)) {
-			raw_spin_lock_irqsave(&rp->lock, flags);
+			raw_spin_lock_irqsave_nested(&rp->lock, flags, SINGLE_DEPTH_NESTING);
 			hlist_add_head(&ri->hlist, &rp->free_instances);
 			raw_spin_unlock_irqrestore(&rp->lock, flags);
 			return 0;
@@ -1971,7 +1971,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 
 		/* XXX(hch): why is there no hlist_move_head? */
 		INIT_HLIST_NODE(&ri->hlist);
-		kretprobe_table_lock(hash, &flags);
+		kretprobe_table_lock(hash, &flags, SINGLE_DEPTH_NESTING);
 		hlist_add_head(&ri->hlist, &kretprobe_inst_table[hash]);
 		kretprobe_table_unlock(hash, &flags);
 	} else {

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-02  9:36         ` peterz
@ 2020-09-02 13:19           ` Masami Hiramatsu
  2020-09-02 13:42             ` peterz
  0 siblings, 1 reply; 51+ messages in thread
From: Masami Hiramatsu @ 2020-09-02 13:19 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, linux-kernel, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck

On Wed, 2 Sep 2020 11:36:13 +0200
peterz@infradead.org wrote:

> On Wed, Sep 02, 2020 at 05:17:55PM +0900, Masami Hiramatsu wrote:
> 
> > > Ok, but then lockdep will yell at you if you have that enabled and run
> > > the unoptimized things.
> > 
> > Oh, does it warn for all spinlock things in kprobes if it is unoptimized?
> > Hmm, it has to be noted in the documentation.
> 
> Lockdep will warn about spinlocks used in NMI context that are also used
> outside NMI context.

OK, but raw_spin_lock_irqsave() will not involve lockdep, correct?

> Now, for the kretprobe that kprobe_busy flag prevents the actual
> recursion self-deadlock, but lockdep isn't smart enough to see that.
> 
> One way around this might be to use SINGLE_DEPTH_NESTING for locks when
> we use them from INT3 context. That way they'll have a different class
> and lockdep will not see the recursion.

Hmm, so lockdep warns only when it detects the spinlock in NMI context,
and int3 is now always NMI, thus all spinlocks (except raw_spinlock?)
in kprobe handlers should get warned, right?
I have tested this series up to [16/21] with optprobe disabled, but
I haven't seen the lockdep warnings.

> 
> pre_handler_kretprobe() is always called from INT3, right?

No, not always; it can be called from an optprobe (same context as the
original code) or from an ftrace handler.
But if you write 0 to /proc/sys/debug/kprobe_optimization and compile
the kernel without the function tracer, it should always be called from
INT3.

> 
> Something like the below might then work...

OK, but I would like to reproduce the lockdep warning on this for
confirmation.

Thank you,

> 
> ---
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 287b263c9cb9..b78f4fe08e86 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1255,11 +1255,11 @@ __acquires(hlist_lock)
>  NOKPROBE_SYMBOL(kretprobe_hash_lock);
>  
>  static void kretprobe_table_lock(unsigned long hash,
> -				 unsigned long *flags)
> +				 unsigned long *flags, int nesting)
>  __acquires(hlist_lock)
>  {
>  	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_lock_irqsave(hlist_lock, *flags);
> +	raw_spin_lock_irqsave_nested(hlist_lock, *flags, nesting);
>  }
>  NOKPROBE_SYMBOL(kretprobe_table_lock);
>  
> @@ -1326,7 +1326,7 @@ void kprobe_flush_task(struct task_struct *tk)
>  	INIT_HLIST_HEAD(&empty_rp);
>  	hash = hash_ptr(tk, KPROBE_HASH_BITS);
>  	head = &kretprobe_inst_table[hash];
> -	kretprobe_table_lock(hash, &flags);
> +	kretprobe_table_lock(hash, &flags, 0);
>  	hlist_for_each_entry_safe(ri, tmp, head, hlist) {
>  		if (ri->task == tk)
>  			recycle_rp_inst(ri, &empty_rp);
> @@ -1361,7 +1361,7 @@ static void cleanup_rp_inst(struct kretprobe *rp)
>  
>  	/* No race here */
>  	for (hash = 0; hash < KPROBE_TABLE_SIZE; hash++) {
> -		kretprobe_table_lock(hash, &flags);
> +		kretprobe_table_lock(hash, &flags, 0);
>  		head = &kretprobe_inst_table[hash];
>  		hlist_for_each_entry_safe(ri, next, head, hlist) {
>  			if (ri->rp == rp)
> @@ -1950,7 +1950,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  
>  	/* TODO: consider to only swap the RA after the last pre_handler fired */
>  	hash = hash_ptr(current, KPROBE_HASH_BITS);
> -	raw_spin_lock_irqsave(&rp->lock, flags);
> +	raw_spin_lock_irqsave_nested(&rp->lock, flags, SINGLE_DEPTH_NESTING);
>  	if (!hlist_empty(&rp->free_instances)) {
>  		ri = hlist_entry(rp->free_instances.first,
>  				struct kretprobe_instance, hlist);
> @@ -1961,7 +1961,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  		ri->task = current;
>  
>  		if (rp->entry_handler && rp->entry_handler(ri, regs)) {
> -			raw_spin_lock_irqsave(&rp->lock, flags);
> +			raw_spin_lock_irqsave_nested(&rp->lock, flags, SINGLE_DEPTH_NESTING);
>  			hlist_add_head(&ri->hlist, &rp->free_instances);
>  			raw_spin_unlock_irqrestore(&rp->lock, flags);
>  			return 0;
> @@ -1971,7 +1971,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  
>  		/* XXX(hch): why is there no hlist_move_head? */
>  		INIT_HLIST_NODE(&ri->hlist);
> -		kretprobe_table_lock(hash, &flags);
> +		kretprobe_table_lock(hash, &flags, SINGLE_DEPTH_NESTING);
>  		hlist_add_head(&ri->hlist, &kretprobe_inst_table[hash]);
>  		kretprobe_table_unlock(hash, &flags);
>  	} else {


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-02 13:19           ` Masami Hiramatsu
@ 2020-09-02 13:42             ` peterz
  2020-09-03  1:39               ` Masami Hiramatsu
  0 siblings, 1 reply; 51+ messages in thread
From: peterz @ 2020-09-02 13:42 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, linux-kernel, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck

On Wed, Sep 02, 2020 at 10:19:26PM +0900, Masami Hiramatsu wrote:
> On Wed, 2 Sep 2020 11:36:13 +0200
> peterz@infradead.org wrote:
> 
> > On Wed, Sep 02, 2020 at 05:17:55PM +0900, Masami Hiramatsu wrote:
> > 
> > > > Ok, but then lockdep will yell at you if you have that enabled and run
> > > > the unoptimized things.
> > > 
> > > Oh, does it warn for all spinlock things in kprobes if it is unoptimized?
> > > Hmm, it has to be noted in the documentation.
> > 
> > Lockdep will warn about spinlocks used in NMI context that are also used
> > outside NMI context.
> 
> OK, but raw_spin_lock_irqsave() will not involve lockdep, correct?

It will. The distinction between spin_lock and raw_spin_lock is only
that raw_spin_lock stays a spinlock on PREEMPT_RT, while spin_lock will
turn into a (PI) mutex in that case.

But both will call into lockdep. Unlike local_irq_disable() and
raw_local_irq_disable(), where the latter will not. Yes your prefixes
are a mess :/

> > Now, for the kretprobe that kprobe_busy flag prevents the actual
> > recursion self-deadlock, but lockdep isn't smart enough to see that.
> > 
> > One way around this might be to use SINGLE_DEPTH_NESTING for locks when
> > we use them from INT3 context. That way they'll have a different class
> > and lockdep will not see the recursion.
> 
> Hmm, so lockdep warns only when it detects the spinlock in NMI context,
> and int3 is now always NMI, thus all spinlock (except raw_spinlock?)
> in kprobe handlers should get warned, right?
> I have tested this series up to [16/21] with optprobe disabled, but
> I haven't see the lockdep warnings.

There's a bug, that might make it miss it. I have a patch. I'll send it
shortly.

> > pre_handler_kretprobe() is always called from INT3, right?
> 
> No, not always, it can be called from optprobe (same as original code
> context) or ftrace handler.
> But if you set 0 to /proc/sys/debug/kprobe_optimization, and compile
> the kernel without function tracer, it should always be called from
> INT3.

D'oh, of course! Arguably I should make the optprobe context NMI-like
too... but that's for another day.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-02 13:42             ` peterz
@ 2020-09-03  1:39               ` Masami Hiramatsu
  2020-09-03  2:02                 ` Masami Hiramatsu
  2020-09-08 10:37                 ` peterz
  0 siblings, 2 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-09-03  1:39 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, linux-kernel, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck, systemtap

On Wed, 2 Sep 2020 15:42:52 +0200
peterz@infradead.org wrote:

> On Wed, Sep 02, 2020 at 10:19:26PM +0900, Masami Hiramatsu wrote:
> > On Wed, 2 Sep 2020 11:36:13 +0200
> > peterz@infradead.org wrote:
> > 
> > > On Wed, Sep 02, 2020 at 05:17:55PM +0900, Masami Hiramatsu wrote:
> > > 
> > > > > Ok, but then lockdep will yell at you if you have that enabled and run
> > > > > the unoptimized things.
> > > > 
> > > > Oh, does it warn for all spinlock things in kprobes if it is unoptimized?
> > > > Hmm, it has to be noted in the documentation.
> > > 
> > > Lockdep will warn about spinlocks used in NMI context that are also used
> > > outside NMI context.
> > 
> > OK, but raw_spin_lock_irqsave() will not involve lockdep, correct?
> 
> It will. The distinction between spin_lock and raw_spin_lock is only
> that raw_spin_lock stays a spinlock on PREEMPT_RT, while spin_lock will
> turn into a (PI) mutex in that case.
> 
> But both will call into lockdep. Unlike local_irq_disable() and
> raw_local_irq_disable(), where the latter will not. Yes your prefixes
> are a mess :/

Yeah, that's really confusing...

> > > Now, for the kretprobe that kprobe_busy flag prevents the actual
> > > recursion self-deadlock, but lockdep isn't smart enough to see that.
> > > 
> > > One way around this might be to use SINGLE_DEPTH_NESTING for locks when
> > > we use them from INT3 context. That way they'll have a different class
> > > and lockdep will not see the recursion.
> > 
> > Hmm, so lockdep warns only when it detects the spinlock in NMI context,
> > and int3 is now always NMI, thus all spinlock (except raw_spinlock?)
> > in kprobe handlers should get warned, right?
> > I have tested this series up to [16/21] with optprobe disabled, but
> > I haven't see the lockdep warnings.
> 
> There's a bug, that might make it miss it. I have a patch. I'll send it
> shortly.

OK, I've confirmed that lockdep warns on kretprobe from INT3
with your fix. Of course, making it lockless makes the warning go away.
But even without the lockless patch, this warning can be a false positive
because we prohibit nested kprobe calls, right?

If the kprobe user handler uses a spinlock and the spinlock is used
only in that handler (and in the context between kprobe_busy_begin/end),
it will be safe since the spinlock never nests.
But if the spinlock is shared with another context, it will be dangerous
because it can be interrupted by NMI (including INT3). This also applies
to functions called from kprobe user handlers, so the user
has to take care of it.
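
To make that distinction concrete, a rough sketch of the safe case (the lock
and handler names below are made up for illustration, not taken from the
series):

	/*
	 * Sketch: 'handler_lock' is only ever taken from kprobe/kretprobe
	 * handlers, which all run between kprobe_busy_begin()/end(), so it
	 * cannot recurse even though the handlers run in INT3 (NMI-like)
	 * context.
	 */
	static DEFINE_RAW_SPINLOCK(handler_lock);

	static int my_pre_handler(struct kprobe *p, struct pt_regs *regs)
	{
		unsigned long flags;

		raw_spin_lock_irqsave(&handler_lock, flags);
		/* ... per-probe bookkeeping only ... */
		raw_spin_unlock_irqrestore(&handler_lock, flags);
		return 0;
	}

By contrast, a lock that is also taken from ordinary process or softirq
context can deadlock if the probe interrupts that holder, which is exactly
the IN-NMI vs. normal usage that lockdep complains about.
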
BTW, what would you think about setting the NMI count between kprobe_busy_begin/end?

> 
> > > pre_handler_kretprobe() is always called from INT3, right?
> > 
> > No, not always, it can be called from optprobe (same as original code
> > context) or ftrace handler.
> > But if you set 0 to /proc/sys/debug/kprobe_optimization, and compile
> > the kernel without function tracer, it should always be called from
> > INT3.
> 
> D'oh, of course! Arguably I should make the optprobe context NMI-like
> too... but that's for another day.

Hmm, we still have kprobe-on-ftrace. Would you consider making
that NMI-like too? (and what would ftrace do for it?)

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-03  1:39               ` Masami Hiramatsu
@ 2020-09-03  2:02                 ` Masami Hiramatsu
  2020-09-07 17:44                   ` Frank Ch. Eigler
  2020-09-08 10:37                 ` peterz
  1 sibling, 1 reply; 51+ messages in thread
From: Masami Hiramatsu @ 2020-09-03  2:02 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: peterz, Ingo Molnar, linux-kernel, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck, systemtap

On Thu, 3 Sep 2020 10:39:54 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> OK, I've confirmed that the lockdep warns on kretprobe from INT3
> with your fix. Of course make it lockless then warning is gone.
> But even without the lockless patch, this warning can be false-positive
> because we prohibit nested kprobe call, right?
> 
> If the kprobe user handler uses a spinlock, the spinlock is used
> only in that handler (and in the context between kprobe_busy_begin/end),
> it will be safe since the spinlock is not nested.
> But if the spinlock is shared with other context, it will be dangerous
> because it can be interrupted by NMI (including INT3). This also applied
> to the function which is called from kprobe user handlers, thus user
> has to take care of it.

Sorry, I only just noticed this point, so I've Cc'd systemtap. Is systemtap
taking care of spinlocks too?

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-03  2:02                 ` Masami Hiramatsu
@ 2020-09-07 17:44                   ` Frank Ch. Eigler
  2020-09-08  2:55                     ` Masami Hiramatsu
  0 siblings, 1 reply; 51+ messages in thread
From: Frank Ch. Eigler @ 2020-09-07 17:44 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: peterz, Ingo Molnar, linux-kernel, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck, systemtap

Masami Hiramatsu <mhiramat@kernel.org> writes:

> Sorry, for noticing this point, I Cc'd to systemtap. Is systemtap taking
> care of spinlock too?

On PREEMPT_RT configurations, systemtap uses the raw_spinlock_t
types/functions, to keep its probe handlers as atomic as we can make them.

- FChE


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-07 17:44                   ` Frank Ch. Eigler
@ 2020-09-08  2:55                     ` Masami Hiramatsu
  0 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-09-08  2:55 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: peterz, Ingo Molnar, linux-kernel, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck, systemtap

On Mon, 07 Sep 2020 13:44:19 -0400
fche@redhat.com (Frank Ch. Eigler) wrote:

> Masami Hiramatsu <mhiramat@kernel.org> writes:
> 
> > Sorry, for noticing this point, I Cc'd to systemtap. Is systemtap taking
> > care of spinlock too?
> 
> On PREEMPT_RT configurations, systemtap uses the raw_spinlock_t
> types/functions, to keep its probe handlers as atomic as we can make them.

OK, if the lock is only used in the probe handlers, there should be
no problem. Even if a probe hits in an NMI that happens inside another
kprobe handler, the probe does not call its handler (because we don't
support nested kprobes* yet).
But maybe you'll get warnings if you enable the lockdep.

* https://lkml.kernel.org/r/158894789510.14896.13461271606820304664.stgit@devnote2
It seems that we need more work for the nested kprobes.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-03  1:39               ` Masami Hiramatsu
  2020-09-03  2:02                 ` Masami Hiramatsu
@ 2020-09-08 10:37                 ` peterz
  2020-09-08 11:15                   ` Eddy_Wu
  2020-09-08 15:09                   ` Masami Hiramatsu
  1 sibling, 2 replies; 51+ messages in thread
From: peterz @ 2020-09-08 10:37 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, linux-kernel, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck, systemtap

On Thu, Sep 03, 2020 at 10:39:54AM +0900, Masami Hiramatsu wrote:

> > There's a bug, that might make it miss it. I have a patch. I'll send it
> > shortly.
> 
> OK, I've confirmed that the lockdep warns on kretprobe from INT3
> with your fix.

I'm now trying and failing to reproduce.... I can't seem to make it use
int3 today. It seems to want to use ftrace or refuses everything. I'm
probably doing it wrong.

Could you verify the below actually works? It's on top of the first 16
patches.

> Of course make it lockless then warning is gone.
> But even without the lockless patch, this warning can be false-positive
> because we prohibit nested kprobe call, right?

Yes, because the actual nesting is avoided by kprobe_busy, but lockdep
can't tell. Lockdep sees a regular lock user and an in-nmi lock user and
figures that's a bad combination.


---
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1241,48 +1241,47 @@ void recycle_rp_inst(struct kretprobe_in
 }
 NOKPROBE_SYMBOL(recycle_rp_inst);
 
-void kretprobe_hash_lock(struct task_struct *tsk,
-			 struct hlist_head **head, unsigned long *flags)
-__acquires(hlist_lock)
-{
-	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
-	raw_spinlock_t *hlist_lock;
-
-	*head = &kretprobe_inst_table[hash];
-	hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_lock_irqsave(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_hash_lock);
-
 static void kretprobe_table_lock(unsigned long hash,
 				 unsigned long *flags)
 __acquires(hlist_lock)
 {
 	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_lock_irqsave(hlist_lock, *flags);
+	/*
+	 * HACK, due to kprobe_busy we'll never actually recurse, make NMI
+	 * context use a different lock class to avoid it reporting.
+	 */
+	raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
 }
 NOKPROBE_SYMBOL(kretprobe_table_lock);
 
-void kretprobe_hash_unlock(struct task_struct *tsk,
-			   unsigned long *flags)
+static void kretprobe_table_unlock(unsigned long hash,
+				   unsigned long *flags)
 __releases(hlist_lock)
 {
+	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
+	raw_spin_unlock_irqrestore(hlist_lock, *flags);
+}
+NOKPROBE_SYMBOL(kretprobe_table_unlock);
+
+void kretprobe_hash_lock(struct task_struct *tsk,
+			 struct hlist_head **head, unsigned long *flags)
+__acquires(hlist_lock)
+{
 	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
-	raw_spinlock_t *hlist_lock;
 
-	hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_unlock_irqrestore(hlist_lock, *flags);
+	*head = &kretprobe_inst_table[hash];
+	kretprobe_table_lock(hash, flags);
 }
-NOKPROBE_SYMBOL(kretprobe_hash_unlock);
+NOKPROBE_SYMBOL(kretprobe_hash_lock);
 
-static void kretprobe_table_unlock(unsigned long hash,
-				   unsigned long *flags)
+void kretprobe_hash_unlock(struct task_struct *tsk,
+			   unsigned long *flags)
 __releases(hlist_lock)
 {
-	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_unlock_irqrestore(hlist_lock, *flags);
+	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
+	kretprobe_table_unlock(hash, flags);
 }
-NOKPROBE_SYMBOL(kretprobe_table_unlock);
+NOKPROBE_SYMBOL(kretprobe_hash_unlock);
 
 struct kprobe kprobe_busy = {
 	.addr = (void *) get_kprobe,

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-08 10:37                 ` peterz
@ 2020-09-08 11:15                   ` Eddy_Wu
  2020-09-08 11:33                     ` peterz
  2020-09-08 15:09                   ` Masami Hiramatsu
  1 sibling, 1 reply; 51+ messages in thread
From: Eddy_Wu @ 2020-09-08 11:15 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, linux-kernel, x86, davem, rostedt, naveen.n.rao,
	anil.s.keshavamurthy, linux-arch, cameron, oleg, will, paulmck,
	systemtap

> From: peterz@infradead.org <peterz@infradead.org>
>
> I'm now trying and failing to reproduce.... I can't seem to make it use
> int3 today. It seems to want to use ftrace or refuses everything. I'm
> probably doing it wrong.
>

You can turn off CONFIG_KPROBES_ON_FTRACE (and also sysctl debug.kprobes-optimization) to make it use int3 handler



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-08 11:15                   ` Eddy_Wu
@ 2020-09-08 11:33                     ` peterz
  0 siblings, 0 replies; 51+ messages in thread
From: peterz @ 2020-09-08 11:33 UTC (permalink / raw)
  To: Eddy_Wu
  Cc: Ingo Molnar, linux-kernel, x86, davem, rostedt, naveen.n.rao,
	anil.s.keshavamurthy, linux-arch, cameron, oleg, will, paulmck,
	systemtap

On Tue, Sep 08, 2020 at 11:15:14AM +0000, Eddy_Wu@trendmicro.com wrote:
> > From: peterz@infradead.org <peterz@infradead.org>
> >
> > I'm now trying and failing to reproduce.... I can't seem to make it use
> > int3 today. It seems to want to use ftrace or refuses everything. I'm
> > probably doing it wrong.
> >
> 
> You can turn off CONFIG_KPROBES_ON_FTRACE (and also sysctl
> debug.kprobes-optimization) to make it use int3 handler

I'm fairly sure I used the sysctl like in your original reproducer.

I'll try again later.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-08 10:37                 ` peterz
  2020-09-08 11:15                   ` Eddy_Wu
@ 2020-09-08 15:09                   ` Masami Hiramatsu
  2020-09-09  5:28                     ` Masami Hiramatsu
  1 sibling, 1 reply; 51+ messages in thread
From: Masami Hiramatsu @ 2020-09-08 15:09 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, linux-kernel, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck, systemtap

On Tue, 8 Sep 2020 12:37:36 +0200
peterz@infradead.org wrote:

> On Thu, Sep 03, 2020 at 10:39:54AM +0900, Masami Hiramatsu wrote:
> 
> > > There's a bug, that might make it miss it. I have a patch. I'll send it
> > > shortly.
> > 
> > OK, I've confirmed that the lockdep warns on kretprobe from INT3
> > with your fix.
> 
> I'm now trying and failing to reproduce.... I can't seem to make it use
> int3 today. It seems to want to use ftrace or refuses everything. I'm
> probably doing it wrong.

Ah, yes. To make it use INT3, we need to disable CONFIG_FUNCTION_TRACER,
or enable CONFIG_KPROBE_EVENTS_ON_NOTRACE and put a kretprobe on a notrace
function :)

> 
> Could you verify the below actually works? It's on top of the first 16
> patches.

Sure. (BTW, you mean the first 15 patches, since kretprobe_hash_lock()
becomes static in the 16th patch?)

Here is the result. I got same warning with or without your patch.
I have built the kernel with CONFIG_FUNCTION_TRACER=n and CONFIG_KPROBES_SANITY_TEST=y. 

[    0.269235] PCI: Using configuration type 1 for base access
[    0.272171] Kprobe smoke test: started
[    0.281125] 
[    0.281126] ================================
[    0.281126] WARNING: inconsistent lock state
[    0.281126] 5.9.0-rc2+ #101 Not tainted
[    0.281126] --------------------------------
[    0.281127] inconsistent {INITIAL USE} -> {IN-NMI} usage.
[    0.281127] swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
[    0.281127] ffffffff8228c778 (&rp->lock){....}-{2:2}, at: pre_handler_kretprobe+0x2b/0x1b0
[    0.281128] {INITIAL USE} state was registered at:
[    0.281129]   lock_acquire+0x9e/0x3c0
[    0.281129]   _raw_spin_lock+0x2f/0x40
[    0.281129]   recycle_rp_inst+0x48/0x90
[    0.281129]   __kretprobe_trampoline_handler+0x15d/0x1e0
[    0.281130]   trampoline_handler+0x43/0x60
[    0.281130]   kretprobe_trampoline+0x2a/0x50
[    0.281130]   kretprobe_trampoline+0x0/0x50
[    0.281130]   init_kprobes+0x1b6/0x1d5
[    0.281130]   do_one_initcall+0x5c/0x315
[    0.281131]   kernel_init_freeable+0x21a/0x25f
[    0.281131]   kernel_init+0x9/0x104
[    0.281131]   ret_from_fork+0x22/0x30
[    0.281131] irq event stamp: 25978
[    0.281132] hardirqs last  enabled at (25977): [<ffffffff8108a0f7>] queue_delayed_work_on+0x57/0x90
[    0.281132] hardirqs last disabled at (25978): [<ffffffff818df778>] exc_int3+0x48/0x140
[    0.281132] softirqs last  enabled at (25964): [<ffffffff81c00389>] __do_softirq+0x389/0x48b
[    0.281133] softirqs last disabled at (25957): [<ffffffff81a00f42>] asm_call_on_stack+0x12/0x20
[    0.281133] 
[    0.281133] other info that might help us debug this:
[    0.281133]  Possible unsafe locking scenario:
[    0.281134] 
[    0.281134]        CPU0
[    0.281134]        ----
[    0.281134]   lock(&rp->lock);
[    0.281135]   <Interrupt>
[    0.281135]     lock(&rp->lock);
[    0.281136] 
[    0.281136]  *** DEADLOCK ***
[    0.281136] 
[    0.281136] no locks held by swapper/0/1.
[    0.281136] 
[    0.281137] stack backtrace:
[    0.281137] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc2+ #101
[    0.281137] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[    0.281137] Call Trace:
[    0.281138]  dump_stack+0x81/0xba
[    0.281138]  print_usage_bug.cold+0x195/0x19e
[    0.281138]  lock_acquire+0x314/0x3c0
[    0.281138]  ? __text_poke+0x2db/0x400
[    0.281139]  ? pre_handler_kretprobe+0x2b/0x1b0
[    0.281139]  _raw_spin_lock_irqsave+0x3a/0x50
[    0.281139]  ? pre_handler_kretprobe+0x2b/0x1b0
[    0.281139]  pre_handler_kretprobe+0x2b/0x1b0
[    0.281139]  ? stop_machine_from_inactive_cpu+0x120/0x120
[    0.281140]  aggr_pre_handler+0x5f/0xd0
[    0.281140]  kprobe_int3_handler+0x10f/0x160
[    0.281140]  exc_int3+0x52/0x140
[    0.281140]  asm_exc_int3+0x31/0x40
[    0.281141] RIP: 0010:kprobe_target+0x1/0x10
[    0.281141] Code: 65 48 33 04 25 28 00 00 00 75 10 48 81 c4 90 00 00 00 44 89 e0 41 5c 5d c3 0f 0b e8 a
[    0.281141] RSP: 0000:ffffc90000013e48 EFLAGS: 00000246
[    0.281142] RAX: ffffffff81142130 RBX: 0000000000000001 RCX: ffffc90000013cec
[    0.281142] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 00000000f3eb0b20
[    0.281142] RBP: ffffc90000013e68 R08: 0000000000000000 R09: 0000000000000001
[    0.281143] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    0.281143] R13: ffffffff8224c2b0 R14: ffffffff8255cf60 R15: 0000000000000000
[    0.281143]  ? stop_machine_from_inactive_cpu+0x120/0x120
[    0.281143]  ? kprobe_target+0x1/0x10
[    0.281144]  ? stop_machine_from_inactive_cpu+0x120/0x120
[    0.281144]  ? init_test_probes.cold+0x2e3/0x391
[    0.281144]  init_kprobes+0x1b6/0x1d5
[    0.281144]  ? debugfs_kprobe_init+0xa9/0xa9
[    0.281145]  do_one_initcall+0x5c/0x315
[    0.281145]  ? rcu_read_lock_sched_held+0x4a/0x80
[    0.281145]  kernel_init_freeable+0x21a/0x25f
[    0.281145]  ? rest_init+0x23c/0x23c
[    0.281145]  kernel_init+0x9/0x104
[    0.281146]  ret_from_fork+0x22/0x30
[    0.281282] Kprobe smoke test: passed successfully


> 
> > Of course make it lockless then warning is gone.
> > But even without the lockless patch, this warning can be false-positive
> > because we prohibit nested kprobe call, right?
> 
> Yes, because the actual nesting is avoided by kprobe_busy, but lockdep
> can't tell. Lockdep sees a regular lock user and an in-nmi lock user and
> figures that's a bad combination.

Thanks for confirmation!

> 
> 
> ---
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1241,48 +1241,47 @@ void recycle_rp_inst(struct kretprobe_in
>  }
>  NOKPROBE_SYMBOL(recycle_rp_inst);
>  
> -void kretprobe_hash_lock(struct task_struct *tsk,
> -			 struct hlist_head **head, unsigned long *flags)
> -__acquires(hlist_lock)
> -{
> -	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
> -	raw_spinlock_t *hlist_lock;
> -
> -	*head = &kretprobe_inst_table[hash];
> -	hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_lock_irqsave(hlist_lock, *flags);
> -}
> -NOKPROBE_SYMBOL(kretprobe_hash_lock);
> -
>  static void kretprobe_table_lock(unsigned long hash,
>  				 unsigned long *flags)
>  __acquires(hlist_lock)
>  {
>  	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_lock_irqsave(hlist_lock, *flags);
> +	/*
> +	 * HACK, due to kprobe_busy we'll never actually recurse, make NMI
> +	 * context use a different lock class to avoid it reporting.
> +	 */
> +	raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
>  }
>  NOKPROBE_SYMBOL(kretprobe_table_lock);
>  
> -void kretprobe_hash_unlock(struct task_struct *tsk,
> -			   unsigned long *flags)
> +static void kretprobe_table_unlock(unsigned long hash,
> +				   unsigned long *flags)
>  __releases(hlist_lock)
>  {
> +	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> +	raw_spin_unlock_irqrestore(hlist_lock, *flags);
> +}
> +NOKPROBE_SYMBOL(kretprobe_table_unlock);
> +
> +void kretprobe_hash_lock(struct task_struct *tsk,
> +			 struct hlist_head **head, unsigned long *flags)
> +__acquires(hlist_lock)
> +{
>  	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
> -	raw_spinlock_t *hlist_lock;
>  
> -	hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_unlock_irqrestore(hlist_lock, *flags);
> +	*head = &kretprobe_inst_table[hash];
> +	kretprobe_table_lock(hash, flags);
>  }
> -NOKPROBE_SYMBOL(kretprobe_hash_unlock);
> +NOKPROBE_SYMBOL(kretprobe_hash_lock);
>  
> -static void kretprobe_table_unlock(unsigned long hash,
> -				   unsigned long *flags)
> +void kretprobe_hash_unlock(struct task_struct *tsk,
> +			   unsigned long *flags)
>  __releases(hlist_lock)
>  {
> -	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_unlock_irqrestore(hlist_lock, *flags);
> +	unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
> +	kretprobe_table_unlock(hash, flags);
>  }
> -NOKPROBE_SYMBOL(kretprobe_table_unlock);
> +NOKPROBE_SYMBOL(kretprobe_hash_unlock);
>  
>  struct kprobe kprobe_busy = {
>  	.addr = (void *) get_kprobe,


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-08 15:09                   ` Masami Hiramatsu
@ 2020-09-09  5:28                     ` Masami Hiramatsu
  0 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-09-09  5:28 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: peterz, Ingo Molnar, linux-kernel, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck, systemtap

On Wed, 9 Sep 2020 00:09:23 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > > Of course make it lockless then warning is gone.
> > > But even without the lockless patch, this warning can be false-positive
> > > because we prohibit nested kprobe call, right?
> > 
> > Yes, because the actual nesting is avoided by kprobe_busy, but lockdep
> > can't tell. Lockdep sees a regular lock user and an in-nmi lock user and
> > figures that's a bad combination.

Hmm, what about introducing a new LOCK_USED_KPROBE bit, which would be set
if the lock is accessed while current_kprobe is set (including kprobe_busy)?
That would mean we are in kprobe user-handler context. If the lock is always
accessed in kprobe context, it can never be nested.
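
Very roughly, what I am thinking of is something like the following in lockdep's
usage tracking (the LOCK_USED_KPROBE bit and the mark_lock_usage() hook shown
here are hypothetical, this is only a sketch of the idea):

	/*
	 * Hypothetical lockdep hook: when a lock is taken while a kprobe
	 * handler (or kprobe_busy) is active, record a dedicated usage bit
	 * instead of the generic one, so that the kprobe-context user and
	 * the NMI user of the same lock are not treated as a bad mix.
	 */
	if (kprobe_running())		/* current_kprobe is set, incl. kprobe_busy */
		mark_lock_usage(lock, LOCK_USED_KPROBE);	/* hypothetical new bit */
	else
		mark_lock_usage(lock, LOCK_USED);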

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
  2020-09-02  7:02     ` peterz
  2020-09-02  8:17       ` Masami Hiramatsu
@ 2020-09-11  2:32       ` Masami Hiramatsu
  1 sibling, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-09-11  2:32 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, linux-kernel, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck

Hi Peter and Ingo,

On Wed, 2 Sep 2020 09:02:26 +0200
peterz@infradead.org wrote:

> On Wed, Sep 02, 2020 at 09:37:39AM +0900, Masami Hiramatsu wrote:
> > On Tue, 1 Sep 2020 21:08:08 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Sat, Aug 29, 2020 at 09:59:49PM +0900, Masami Hiramatsu wrote:
> > > > Masami Hiramatsu (16):
> > > >       kprobes: Add generic kretprobe trampoline handler
> > > >       x86/kprobes: Use generic kretprobe trampoline handler
> > > >       arm: kprobes: Use generic kretprobe trampoline handler
> > > >       arm64: kprobes: Use generic kretprobe trampoline handler
> > > >       arc: kprobes: Use generic kretprobe trampoline handler
> > > >       csky: kprobes: Use generic kretprobe trampoline handler
> > > >       ia64: kprobes: Use generic kretprobe trampoline handler
> > > >       mips: kprobes: Use generic kretprobe trampoline handler
> > > >       parisc: kprobes: Use generic kretprobe trampoline handler
> > > >       powerpc: kprobes: Use generic kretprobe trampoline handler
> > > >       s390: kprobes: Use generic kretprobe trampoline handler
> > > >       sh: kprobes: Use generic kretprobe trampoline handler
> > > >       sparc: kprobes: Use generic kretprobe trampoline handler
> > > >       kprobes: Remove NMI context check
> > > >       kprobes: Free kretprobe_instance with rcu callback
> > > >       kprobes: Make local used functions static
> > > > 
> > > > Peter Zijlstra (5):
> > > >       llist: Add nonatomic __llist_add() and __llist_dell_all()
> > > >       kprobes: Remove kretprobe hash
> > > >       asm-generic/atomic: Add try_cmpxchg() fallbacks
> > > >       freelist: Lock less freelist
> > > >       kprobes: Replace rp->free_instance with freelist
> > > 
> > > This looks good to me, do you want me to merge them through -tip? If so,
> > > do we want to try and get them in this release still?
> > 
> > Yes, thanks. For the kretprobe missing issue, we will need the first half
> > (up to "kprobes: Remove NMI context check"), so we can split the series
> > if someone think the lockless is still immature.
> 
> Ok, but then lockdep will yell at you if you have that enabled and run
> the unoptimized things.
> 
> > > Ingo, opinions? This basically fixes a regression cauesd by
> > > 
> > >   0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

So what do you think of this? I saw the unification part of this series
in tip/master, but the lockless part is not there. That might still let
lockdep warn on kretprobes if CONFIG_FUNCTION_TRACER and optprobes are
disabled.

If we make kretprobes lockless, we will remove all locks from the in-kernel
kprobe handlers, so at least upstream users will be happy.

Or should we fix the lockdep warning on the spinlocks in the kprobe handlers first?

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 17/21] llist: Add nonatomic __llist_add() and __llist_dell_all()
  2020-08-29 13:03 ` [PATCH v5 17/21] llist: Add nonatomic __llist_add() and __llist_dell_all() Masami Hiramatsu
@ 2020-10-12 16:24   ` Ingo Molnar
  2020-10-14  0:24     ` Masami Hiramatsu
  0 siblings, 1 reply; 51+ messages in thread
From: Ingo Molnar @ 2020-10-12 16:24 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Peter Zijlstra, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck


* Masami Hiramatsu <mhiramat@kernel.org> wrote:

> From: Peter Zijlstra <peterz@infradead.org>
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Because you are forwarding this patch here, I've added your SOB:

  Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>

(Let me know if that's not OK.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 19/21] asm-generic/atomic: Add try_cmpxchg() fallbacks
  2020-08-29 13:03 ` [PATCH v5 19/21] asm-generic/atomic: Add try_cmpxchg() fallbacks Masami Hiramatsu
@ 2020-10-12 16:25   ` Ingo Molnar
  0 siblings, 0 replies; 51+ messages in thread
From: Ingo Molnar @ 2020-10-12 16:25 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Peter Zijlstra, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck


* Masami Hiramatsu <mhiramat@kernel.org> wrote:

> From: Peter Zijlstra <peterz@infradead.org>
> 
> Only x86 provides try_cmpxchg() outside of the atomic_t interfaces,
> provide generic fallbacks to create this interface from the widely
> available cmpxchg() function.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Acked-by: Will Deacon <will@kernel.org>

Your SOB was missing here too.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 17/21] llist: Add nonatomic __llist_add() and __llist_dell_all()
  2020-10-12 16:24   ` Ingo Molnar
@ 2020-10-14  0:24     ` Masami Hiramatsu
  0 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-10-14  0:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Peter Zijlstra, Eddy_Wu, x86, davem, rostedt,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck

On Mon, 12 Oct 2020 18:24:54 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > From: Peter Zijlstra <peterz@infradead.org>
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> Because you are forwarding this patch here, I've added your SOB:
> 
>   Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> 
> (Let me know if that's not OK.)

OK, I will keep that in mind next time.

Thanks Ingo!

> 
> Thanks,
> 
> 	Ingo


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 14/21] kprobes: Remove NMI context check
       [not found]   ` <20201030213831.04e81962@oasis.local.home>
@ 2020-11-02  5:11     ` Masami Hiramatsu
  2020-11-02  5:53       ` Masami Hiramatsu
  0 siblings, 1 reply; 51+ messages in thread
From: Masami Hiramatsu @ 2020-11-02  5:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Peter Zijlstra, Eddy_Wu, x86, davem, naveen.n.rao,
	anil.s.keshavamurthy, linux-arch, cameron, oleg, will, paulmck

On Fri, 30 Oct 2020 21:38:31 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Sat, 29 Aug 2020 22:02:36 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > Since the commit 9b38cc704e84 ("kretprobe: Prevent triggering
> > kretprobe from within kprobe_flush_task") sets a dummy current
> > kprobe in the trampoline handler by kprobe_busy_begin/end(),
> > it is not possible to run a kretprobe pre handler in kretprobe
> > trampoline handler context even with the NMI. If the NMI interrupts
> > a kretprobe_trampoline_handler() and it hits a kretprobe, the
> > 2nd kretprobe will detect recursion correctly and it will be
> > skipped.
> > This means we have almost no double-lock issue on kretprobes by NMI.
> > 
> > The last one point is in cleanup_rp_inst() which also takes
> > kretprobe_table_lock without setting up current kprobes.
> > So adding kprobe_busy_begin/end() there allows us to remove
> > in_nmi() check.
> > 
> > The above commit applies kprobe_busy_begin/end() on x86, but
> > now all arch implementation are unified to generic one, we can
> > safely remove the in_nmi() check from arch independent code.
> >
> 
> So are you saying that lockdep is lying?
> 
> Kprobe smoke test: started
> 
> ================================
> WARNING: inconsistent lock state
> 5.10.0-rc1-test+ #29 Not tainted
> --------------------------------
> inconsistent {INITIAL USE} -> {IN-NMI} usage.
> swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
> ffffffff82b07118 (&rp->lock){....}-{2:2}, at: pre_handler_kretprobe+0x4b/0x193
> {INITIAL USE} state was registered at:
>   lock_acquire+0x280/0x325
>   _raw_spin_lock+0x30/0x3f
>   recycle_rp_inst+0x3f/0x86
>   __kretprobe_trampoline_handler+0x13a/0x177
>   trampoline_handler+0x48/0x57
>   kretprobe_trampoline+0x2a/0x4f
>   kretprobe_trampoline+0x0/0x4f
>   init_kprobes+0x193/0x19d
>   do_one_initcall+0xf9/0x27e
>   kernel_init_freeable+0x16e/0x2b6
>   kernel_init+0xe/0x109
>   ret_from_fork+0x22/0x30
> irq event stamp: 1670
> hardirqs last  enabled at (1669): [<ffffffff811cc344>] slab_free_freelist_hook+0xb4/0xfd
> hardirqs last disabled at (1670): [<ffffffff81da0887>] exc_int3+0xae/0x10a
> softirqs last  enabled at (1484): [<ffffffff82000352>] __do_softirq+0x352/0x38d
> softirqs last disabled at (1471): [<ffffffff81e00f82>] asm_call_irq_on_stack+0x12/0x20
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(&rp->lock);
>   <Interrupt>
>     lock(&rp->lock);
> 
>  *** DEADLOCK ***
> 
> no locks held by swapper/0/1.
> 
> stack backtrace:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc1-test+ #29
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> Call Trace:
>  dump_stack+0x7d/0x9f
>  print_usage_bug+0x1c0/0x1d3
>  lock_acquire+0x302/0x325
>  ? pre_handler_kretprobe+0x4b/0x193
>  ? stop_machine_from_inactive_cpu+0x120/0x120
>  _raw_spin_lock_irqsave+0x43/0x58
>  ? pre_handler_kretprobe+0x4b/0x193
>  pre_handler_kretprobe+0x4b/0x193
>  ? stop_machine_from_inactive_cpu+0x120/0x120
>  ? kprobe_target+0x1/0x16
>  kprobe_int3_handler+0xd0/0x109
>  exc_int3+0xb8/0x10a
>  asm_exc_int3+0x31/0x40
> RIP: 0010:kprobe_target+0x1/0x16
>  5d c3 cc
> RSP: 0000:ffffc90000033e00 EFLAGS: 00000246
> RAX: ffffffff8110ea77 RBX: 0000000000000001 RCX: ffffc90000033cb4
> RDX: 0000000000000231 RSI: 0000000000000000 RDI: 000000003ca57c35
> RBP: ffffc90000033e20 R08: 0000000000000000 R09: ffffffff8111d207
> R10: ffff8881002ab480 R11: ffff8881002ab480 R12: 0000000000000000
> R13: ffffffff82a52af0 R14: 0000000000000200 R15: ffff888100331130
>  ? register_kprobe+0x43c/0x492
>  ? stop_machine_from_inactive_cpu+0x120/0x120
>  ? kprobe_target+0x1/0x16
>  ? init_test_probes+0x2c6/0x38a
>  init_kprobes+0x193/0x19d
>  ? debugfs_kprobe_init+0xb8/0xb8
>  do_one_initcall+0xf9/0x27e
>  ? rcu_read_lock_sched_held+0x3e/0x75
>  ? init_mm_internals+0x27b/0x284
>  kernel_init_freeable+0x16e/0x2b6
>  ? rest_init+0x152/0x152
>  kernel_init+0xe/0x109
>  ret_from_fork+0x22/0x30
> Kprobe smoke test: passed successfully
> 
> Config attached.

Thanks for the report! Let me check what happened.

Thank you,

> 
> -- Steve


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 14/21] kprobes: Remove NMI context check
  2020-11-02  5:11     ` Masami Hiramatsu
@ 2020-11-02  5:53       ` Masami Hiramatsu
  2020-11-02  7:02         ` Masami Hiramatsu
  0 siblings, 1 reply; 51+ messages in thread
From: Masami Hiramatsu @ 2020-11-02  5:53 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, linux-kernel, Peter Zijlstra, Eddy_Wu, x86,
	davem, naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron,
	oleg, will, paulmck

On Mon, 2 Nov 2020 14:11:38 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Fri, 30 Oct 2020 21:38:31 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Sat, 29 Aug 2020 22:02:36 +0900
> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > 
> > > Since the commit 9b38cc704e84 ("kretprobe: Prevent triggering
> > > kretprobe from within kprobe_flush_task") sets a dummy current
> > > kprobe in the trampoline handler by kprobe_busy_begin/end(),
> > > it is not possible to run a kretprobe pre handler in kretprobe
> > > trampoline handler context even with the NMI. If the NMI interrupts
> > > a kretprobe_trampoline_handler() and it hits a kretprobe, the
> > > 2nd kretprobe will detect recursion correctly and it will be
> > > skipped.
> > > This means we have almost no double-lock issue on kretprobes by NMI.
> > > 
> > > The last one point is in cleanup_rp_inst() which also takes
> > > kretprobe_table_lock without setting up current kprobes.
> > > So adding kprobe_busy_begin/end() there allows us to remove
> > > in_nmi() check.
> > > 
> > > The above commit applies kprobe_busy_begin/end() on x86, but
> > > now all arch implementation are unified to generic one, we can
> > > safely remove the in_nmi() check from arch independent code.
> > >
> > 
> > So are you saying that lockdep is lying?
> > 
> > Kprobe smoke test: started
> > 
> > ================================
> > WARNING: inconsistent lock state
> > 5.10.0-rc1-test+ #29 Not tainted
> > --------------------------------
> > inconsistent {INITIAL USE} -> {IN-NMI} usage.
> > swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
> > ffffffff82b07118 (&rp->lock){....}-{2:2}, at: pre_handler_kretprobe+0x4b/0x193
> > {INITIAL USE} state was registered at:
> >   lock_acquire+0x280/0x325
> >   _raw_spin_lock+0x30/0x3f
> >   recycle_rp_inst+0x3f/0x86
> >   __kretprobe_trampoline_handler+0x13a/0x177
> >   trampoline_handler+0x48/0x57
> >   kretprobe_trampoline+0x2a/0x4f
> >   kretprobe_trampoline+0x0/0x4f
> >   init_kprobes+0x193/0x19d
> >   do_one_initcall+0xf9/0x27e
> >   kernel_init_freeable+0x16e/0x2b6
> >   kernel_init+0xe/0x109
> >   ret_from_fork+0x22/0x30
> > irq event stamp: 1670
> > hardirqs last  enabled at (1669): [<ffffffff811cc344>] slab_free_freelist_hook+0xb4/0xfd
> > hardirqs last disabled at (1670): [<ffffffff81da0887>] exc_int3+0xae/0x10a
> > softirqs last  enabled at (1484): [<ffffffff82000352>] __do_softirq+0x352/0x38d
> > softirqs last disabled at (1471): [<ffffffff81e00f82>] asm_call_irq_on_stack+0x12/0x20
> > 
> > other info that might help us debug this:
> >  Possible unsafe locking scenario:
> > 
> >        CPU0
> >        ----
> >   lock(&rp->lock);
> >   <Interrupt>
> >     lock(&rp->lock);
> > 
> >  *** DEADLOCK ***
> > 
> > no locks held by swapper/0/1.
> > 
> > stack backtrace:
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc1-test+ #29
> > Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> > Call Trace:
> >  dump_stack+0x7d/0x9f
> >  print_usage_bug+0x1c0/0x1d3
> >  lock_acquire+0x302/0x325
> >  ? pre_handler_kretprobe+0x4b/0x193
> >  ? stop_machine_from_inactive_cpu+0x120/0x120
> >  _raw_spin_lock_irqsave+0x43/0x58
> >  ? pre_handler_kretprobe+0x4b/0x193
> >  pre_handler_kretprobe+0x4b/0x193
> >  ? stop_machine_from_inactive_cpu+0x120/0x120
> >  ? kprobe_target+0x1/0x16
> >  kprobe_int3_handler+0xd0/0x109
> >  exc_int3+0xb8/0x10a
> >  asm_exc_int3+0x31/0x40
> > RIP: 0010:kprobe_target+0x1/0x16
> >  5d c3 cc
> > RSP: 0000:ffffc90000033e00 EFLAGS: 00000246
> > RAX: ffffffff8110ea77 RBX: 0000000000000001 RCX: ffffc90000033cb4
> > RDX: 0000000000000231 RSI: 0000000000000000 RDI: 000000003ca57c35
> > RBP: ffffc90000033e20 R08: 0000000000000000 R09: ffffffff8111d207
> > R10: ffff8881002ab480 R11: ffff8881002ab480 R12: 0000000000000000
> > R13: ffffffff82a52af0 R14: 0000000000000200 R15: ffff888100331130
> >  ? register_kprobe+0x43c/0x492
> >  ? stop_machine_from_inactive_cpu+0x120/0x120
> >  ? kprobe_target+0x1/0x16
> >  ? init_test_probes+0x2c6/0x38a
> >  init_kprobes+0x193/0x19d
> >  ? debugfs_kprobe_init+0xb8/0xb8
> >  do_one_initcall+0xf9/0x27e
> >  ? rcu_read_lock_sched_held+0x3e/0x75
> >  ? init_mm_internals+0x27b/0x284
> >  kernel_init_freeable+0x16e/0x2b6
> >  ? rest_init+0x152/0x152
> >  kernel_init+0xe/0x109
> >  ret_from_fork+0x22/0x30
> > Kprobe smoke test: passed successfully
> > 
> > Config attached.
> 
> Thanks for the report! Let me check what happen.

OK, confirmed. But this is actually a false-positive report.

Lockdep reports the rp->lock case between pre_handler_kretprobe()
and recycle_rp_inst() called from __kretprobe_trampoline_handler().
Since kretprobe_trampoline_handler() sets current_kprobe,
any other kprobe that hits on the same CPU is skipped. This means
pre_handler_kretprobe() is never called while
__kretprobe_trampoline_handler() is executing.

Actually, since this rp->lock is expected to be removed by the last
patch in this series ([21/21]), I left it as is, but we should probably
handle this case because the latter half of this series will only be
merged in 5.11.

Hmm, is there any way to tell lockdep this is safe?
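
To spell out the guard I mean, here is a heavily simplified sketch of the int3
kprobe dispatch path (details trimmed, only to show why the second acquisition
cannot actually happen):

	/*
	 * While current_kprobe is set (which __kretprobe_trampoline_handler()
	 * does via kprobe_busy_begin()), another kprobe hit on the same CPU
	 * never reaches its pre_handler, so pre_handler_kretprobe() cannot
	 * take rp->lock again on top of recycle_rp_inst().
	 */
	p = get_kprobe(addr);
	if (kprobe_running()) {
		kprobes_inc_nmissed_count(p);	/* nested hit: counted as missed, no pre_handler */
		return;		/* (the real code still single-steps the insn, omitted here) */
	}
	set_current_kprobe(p, regs, kcb);
	if (!p->pre_handler || !p->pre_handler(p, regs))
		setup_singlestep(p, regs, kcb, 0);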

Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 14/21] kprobes: Remove NMI context check
  2020-11-02  5:53       ` Masami Hiramatsu
@ 2020-11-02  7:02         ` Masami Hiramatsu
  2020-11-02 14:27           ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Masami Hiramatsu @ 2020-11-02  7:02 UTC (permalink / raw)
  To: Masami Hiramatsu, Ingo Molnar
  Cc: Steven Rostedt, linux-kernel, Peter Zijlstra, Eddy_Wu, x86,
	davem, naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron,
	oleg, will, paulmck

On Mon, 2 Nov 2020 14:53:34 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Mon, 2 Nov 2020 14:11:38 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > On Fri, 30 Oct 2020 21:38:31 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > On Sat, 29 Aug 2020 22:02:36 +0900
> > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > 
> > > > Since the commit 9b38cc704e84 ("kretprobe: Prevent triggering
> > > > kretprobe from within kprobe_flush_task") sets a dummy current
> > > > kprobe in the trampoline handler by kprobe_busy_begin/end(),
> > > > it is not possible to run a kretprobe pre handler in kretprobe
> > > > trampoline handler context even with the NMI. If the NMI interrupts
> > > > a kretprobe_trampoline_handler() and it hits a kretprobe, the
> > > > 2nd kretprobe will detect recursion correctly and it will be
> > > > skipped.
> > > > This means we have almost no double-lock issue on kretprobes by NMI.
> > > > 
> > > > The last one point is in cleanup_rp_inst() which also takes
> > > > kretprobe_table_lock without setting up current kprobes.
> > > > So adding kprobe_busy_begin/end() there allows us to remove
> > > > in_nmi() check.
> > > > 
> > > > The above commit applies kprobe_busy_begin/end() on x86, but
> > > > now all arch implementation are unified to generic one, we can
> > > > safely remove the in_nmi() check from arch independent code.
> > > >
> > > 
> > > So are you saying that lockdep is lying?
> > > 
> > > Kprobe smoke test: started
> > > 
> > > ================================
> > > WARNING: inconsistent lock state
> > > 5.10.0-rc1-test+ #29 Not tainted
> > > --------------------------------
> > > inconsistent {INITIAL USE} -> {IN-NMI} usage.
> > > swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
> > > ffffffff82b07118 (&rp->lock){....}-{2:2}, at: pre_handler_kretprobe+0x4b/0x193
> > > {INITIAL USE} state was registered at:
> > >   lock_acquire+0x280/0x325
> > >   _raw_spin_lock+0x30/0x3f
> > >   recycle_rp_inst+0x3f/0x86
> > >   __kretprobe_trampoline_handler+0x13a/0x177
> > >   trampoline_handler+0x48/0x57
> > >   kretprobe_trampoline+0x2a/0x4f
> > >   kretprobe_trampoline+0x0/0x4f
> > >   init_kprobes+0x193/0x19d
> > >   do_one_initcall+0xf9/0x27e
> > >   kernel_init_freeable+0x16e/0x2b6
> > >   kernel_init+0xe/0x109
> > >   ret_from_fork+0x22/0x30
> > > irq event stamp: 1670
> > > hardirqs last  enabled at (1669): [<ffffffff811cc344>] slab_free_freelist_hook+0xb4/0xfd
> > > hardirqs last disabled at (1670): [<ffffffff81da0887>] exc_int3+0xae/0x10a
> > > softirqs last  enabled at (1484): [<ffffffff82000352>] __do_softirq+0x352/0x38d
> > > softirqs last disabled at (1471): [<ffffffff81e00f82>] asm_call_irq_on_stack+0x12/0x20
> > > 
> > > other info that might help us debug this:
> > >  Possible unsafe locking scenario:
> > > 
> > >        CPU0
> > >        ----
> > >   lock(&rp->lock);
> > >   <Interrupt>
> > >     lock(&rp->lock);
> > > 
> > >  *** DEADLOCK ***
> > > 
> > > no locks held by swapper/0/1.
> > > 
> > > stack backtrace:
> > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc1-test+ #29
> > > Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> > > Call Trace:
> > >  dump_stack+0x7d/0x9f
> > >  print_usage_bug+0x1c0/0x1d3
> > >  lock_acquire+0x302/0x325
> > >  ? pre_handler_kretprobe+0x4b/0x193
> > >  ? stop_machine_from_inactive_cpu+0x120/0x120
> > >  _raw_spin_lock_irqsave+0x43/0x58
> > >  ? pre_handler_kretprobe+0x4b/0x193
> > >  pre_handler_kretprobe+0x4b/0x193
> > >  ? stop_machine_from_inactive_cpu+0x120/0x120
> > >  ? kprobe_target+0x1/0x16
> > >  kprobe_int3_handler+0xd0/0x109
> > >  exc_int3+0xb8/0x10a
> > >  asm_exc_int3+0x31/0x40
> > > RIP: 0010:kprobe_target+0x1/0x16
> > >  5d c3 cc
> > > RSP: 0000:ffffc90000033e00 EFLAGS: 00000246
> > > RAX: ffffffff8110ea77 RBX: 0000000000000001 RCX: ffffc90000033cb4
> > > RDX: 0000000000000231 RSI: 0000000000000000 RDI: 000000003ca57c35
> > > RBP: ffffc90000033e20 R08: 0000000000000000 R09: ffffffff8111d207
> > > R10: ffff8881002ab480 R11: ffff8881002ab480 R12: 0000000000000000
> > > R13: ffffffff82a52af0 R14: 0000000000000200 R15: ffff888100331130
> > >  ? register_kprobe+0x43c/0x492
> > >  ? stop_machine_from_inactive_cpu+0x120/0x120
> > >  ? kprobe_target+0x1/0x16
> > >  ? init_test_probes+0x2c6/0x38a
> > >  init_kprobes+0x193/0x19d
> > >  ? debugfs_kprobe_init+0xb8/0xb8
> > >  do_one_initcall+0xf9/0x27e
> > >  ? rcu_read_lock_sched_held+0x3e/0x75
> > >  ? init_mm_internals+0x27b/0x284
> > >  kernel_init_freeable+0x16e/0x2b6
> > >  ? rest_init+0x152/0x152
> > >  kernel_init+0xe/0x109
> > >  ret_from_fork+0x22/0x30
> > > Kprobe smoke test: passed successfully
> > > 
> > > Config attached.
> > 
> > Thanks for the report! Let me check what happen.
> 
> OK, confirmed. But this is actually false-positive report.
> 
> The lockdep reports rp->lock case between pre_handler_kretprobe()
> and recycle_rp_inst() from __kretprobe_trampoline_handler().
> Since kretprobe_trampoline_handler() sets current_kprobe,
> if other kprobes hits on same CPU, those are skipped. This means
> pre_handler_kretprobe() is not called while executing
> __kretprobe_trampoline_handler().
> 
> Actually, since this rp->lock is expected to be removed in the last
> patch in this series ([21/21]), I left this as is, but we might better
> to treat this case because the latter half of this series will be
> merged in 5.11.
> 
> Hmm, are there any way to tell lockdep this is safe?
> 

This can suppress the warnings. Once the lockless patch is introduced,
we won't need this anymore.
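
For reference, lockdep_off()/lockdep_on() simply make lockdep ignore everything
the current task does with locks in between; roughly (simplified, the exact
counter handling differs between kernel versions):

	void lockdep_off(void)
	{
		current->lockdep_recursion++;	/* lockdep skips all checking while non-zero */
	}

	void lockdep_on(void)
	{
		current->lockdep_recursion--;
	}

So this silences not only this report but anything else lockdep could have told
us while a kprobe handler is running, which is the downside of this approach.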


From 509b27efef8c7dbf56cab2e812916d6cd778c745 Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat@kernel.org>
Date: Mon, 2 Nov 2020 15:37:28 +0900
Subject: [PATCH] kprobes: Disable lockdep for kprobe busy area

Since the code area in between kprobe_busy_begin()/end() prohibits
other kprobes from calling probe handlers, we can avoid inconsistent
locks there. But lockdep doesn't know that, so it warns about rp->lock
or kretprobe_table_lock.

To suppress those false-positive errors, disable lockdep while
kprobe_busy is set.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/kprobes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 8a12a25fa40d..c7196e583600 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1295,10 +1295,12 @@ void kprobe_busy_begin(void)
 	__this_cpu_write(current_kprobe, &kprobe_busy);
 	kcb = get_kprobe_ctlblk();
 	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+	lockdep_off();
 }
 
 void kprobe_busy_end(void)
 {
+	lockdep_on();
 	__this_cpu_write(current_kprobe, NULL);
 	preempt_enable();
 }
-- 
2.25.1


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 14/21] kprobes: Remove NMI context check
  2020-11-02  7:02         ` Masami Hiramatsu
@ 2020-11-02 14:27           ` Steven Rostedt
  2020-11-03  5:39             ` Masami Hiramatsu
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2020-11-02 14:27 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Eddy_Wu, x86, davem,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck


[ Peter Z, please take a look at this ]

On Mon, 2 Nov 2020 16:02:34 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> >From 509b27efef8c7dbf56cab2e812916d6cd778c745 Mon Sep 17 00:00:00 2001  
> From: Masami Hiramatsu <mhiramat@kernel.org>
> Date: Mon, 2 Nov 2020 15:37:28 +0900
> Subject: [PATCH] kprobes: Disable lockdep for kprobe busy area
> 
> Since the code area in between kprobe_busy_begin()/end() prohibits
> other kprobs to call probe handlers, we can avoid inconsitent
> locks there. But lockdep doesn't know that, so it warns rp->lock
> or kretprobe_table_lock.
> 
> To supress those false-positive errors, disable lockdep while
> kprobe_busy is set.
> 
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> ---
>  kernel/kprobes.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 8a12a25fa40d..c7196e583600 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1295,10 +1295,12 @@ void kprobe_busy_begin(void)
>  	__this_cpu_write(current_kprobe, &kprobe_busy);
>  	kcb = get_kprobe_ctlblk();
>  	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +	lockdep_off();
>  }
>  
>  void kprobe_busy_end(void)
>  {
> +	lockdep_on();
>  	__this_cpu_write(current_kprobe, NULL);
>  	preempt_enable();
>  }
> -- 

No, this is not the correct workaround (too big of a hammer). You could do
the following:

From 4139d9c8437b0bd2262e989ca4eb0a83b7e7bb72 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Date: Mon, 2 Nov 2020 09:17:49 -0500
Subject: [PATCH] kprobes: Tell lockdep about kprobe nesting

The kprobe handlers have protection that prohibits other handlers from
executing in other contexts (for example, if an NMI comes in while processing a
kprobe and executes the same kprobe, it will fail with a "busy"
return). Lockdep is unaware of this protection. Use lockdep's nesting API to
differentiate between locks taken in NMI context and other contexts to
suppress the false warnings.

Link: https://lore.kernel.org/r/20201102160234.fa0ae70915ad9e2b21c08b85@kernel.org

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/kprobes.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 8a12a25fa40d..ccb285867059 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1249,7 +1249,12 @@ __acquires(hlist_lock)
 
 	*head = &kretprobe_inst_table[hash];
 	hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_lock_irqsave(hlist_lock, *flags);
+	/*
+	 * Nested is a workaround that will soon not be needed.
+	 * There's other protections that make sure the same lock
+	 * is not taken on the same CPU that lockdep is unaware of.
+	 */
+	raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
 }
 NOKPROBE_SYMBOL(kretprobe_hash_lock);
 
@@ -1258,7 +1263,12 @@ static void kretprobe_table_lock(unsigned long hash,
 __acquires(hlist_lock)
 {
 	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_lock_irqsave(hlist_lock, *flags);
+	/*
+	 * Nested is a workaround that will soon not be needed.
+	 * There's other protections that make sure the same lock
+	 * is not taken on the same CPU that lockdep is unaware of.
+	 */
+	raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
 }
 NOKPROBE_SYMBOL(kretprobe_table_lock);
 
@@ -2025,10 +2035,16 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 	struct kretprobe *rp = container_of(p, struct kretprobe, kp);
 	unsigned long hash, flags = 0;
 	struct kretprobe_instance *ri;
+	int nmi = !!in_nmi();
 
 	/* TODO: consider to only swap the RA after the last pre_handler fired */
 	hash = hash_ptr(current, KPROBE_HASH_BITS);
-	raw_spin_lock_irqsave(&rp->lock, flags);
+	/*
+	 * Nested is a workaround that will soon not be needed.
+	 * There's other protections that make sure the same lock
+	 * is not taken on the same CPU that lockdep is unaware of.
+	 */
+	raw_spin_lock_irqsave_nested(&rp->lock, flags, nmi);
 	if (!hlist_empty(&rp->free_instances)) {
 		ri = hlist_entry(rp->free_instances.first,
 				struct kretprobe_instance, hlist);
@@ -2039,7 +2055,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 		ri->task = current;
 
 		if (rp->entry_handler && rp->entry_handler(ri, regs)) {
-			raw_spin_lock_irqsave(&rp->lock, flags);
+			raw_spin_lock_irqsave_nested(&rp->lock, flags, nmi);
 			hlist_add_head(&ri->hlist, &rp->free_instances);
 			raw_spin_unlock_irqrestore(&rp->lock, flags);
 			return 0;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 14/21] kprobes: Remove NMI context check
  2020-11-02 14:27           ` Steven Rostedt
@ 2020-11-03  5:39             ` Masami Hiramatsu
  2020-11-03 16:09               ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Masami Hiramatsu @ 2020-11-03  5:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Eddy_Wu, x86, davem,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck

On Mon, 2 Nov 2020 09:27:26 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> [ Peter Z, please take a look a this ]
> 
> On Mon, 2 Nov 2020 16:02:34 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > >From 509b27efef8c7dbf56cab2e812916d6cd778c745 Mon Sep 17 00:00:00 2001  
> > From: Masami Hiramatsu <mhiramat@kernel.org>
> > Date: Mon, 2 Nov 2020 15:37:28 +0900
> > Subject: [PATCH] kprobes: Disable lockdep for kprobe busy area
> > 
> > Since the code area in between kprobe_busy_begin()/end() prohibits
> > other kprobs to call probe handlers, we can avoid inconsitent
> > locks there. But lockdep doesn't know that, so it warns rp->lock
> > or kretprobe_table_lock.
> > 
> > To supress those false-positive errors, disable lockdep while
> > kprobe_busy is set.
> > 
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> > ---
> >  kernel/kprobes.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 8a12a25fa40d..c7196e583600 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1295,10 +1295,12 @@ void kprobe_busy_begin(void)
> >  	__this_cpu_write(current_kprobe, &kprobe_busy);
> >  	kcb = get_kprobe_ctlblk();
> >  	kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> > +	lockdep_off();
> >  }
> >  
> >  void kprobe_busy_end(void)
> >  {
> > +	lockdep_on();
> >  	__this_cpu_write(current_kprobe, NULL);
> >  	preempt_enable();
> >  }
> > -- 
> 
> No, this is not the correct workaround (too big of a hammer). You could do
> the following:
> 
> From 4139d9c8437b0bd2262e989ca4eb0a83b7e7bb72 Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> Date: Mon, 2 Nov 2020 09:17:49 -0500
> Subject: [PATCH] kprobes: Tell lockdep about kprobe nesting
> 
> Since the kprobe handlers have protection that prohibits other handlers from
> executing in other contexts (like if an NMI comes in while processing a
> kprobe, and executes the same kprobe, it will get fail with a "busy"
> return). Lockdep is unaware of this protection. Use lockdep's nesting api to
> differentiate between locks taken in NMI context and other context to
> supress the false warnings.

Ah, OK. This looks good to me.

BTW, in_nmi() in pre_handler_kretprobe() will always be true because
int3 is now treated as an NMI. So you can always pass 1 there.
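
Just to spell out why (simplified, the macro layering may differ a bit between
versions): the x86 int3 entry now does NMI-style accounting in preempt_count()
before the kprobe handlers run, and in_nmi() only looks at those bits:

	/* roughly what include/linux/preempt.h boils down to */
	#define in_nmi()	(preempt_count() & NMI_MASK)

	/*
	 * So by the time kprobe_int3_handler() calls pre_handler_kretprobe(),
	 * the NMI bits are already set and !!in_nmi() is always 1.
	 */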

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>

Thank you,

> 
> Link: https://lore.kernel.org/r/20201102160234.fa0ae70915ad9e2b21c08b85@kernel.org
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  kernel/kprobes.c | 24 ++++++++++++++++++++----
>  1 file changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 8a12a25fa40d..ccb285867059 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1249,7 +1249,12 @@ __acquires(hlist_lock)
>  
>  	*head = &kretprobe_inst_table[hash];
>  	hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_lock_irqsave(hlist_lock, *flags);
> +	/*
> +	 * Nested is a workaround that will soon not be needed.
> +	 * There's other protections that make sure the same lock
> +	 * is not taken on the same CPU that lockdep is unaware of.
> +	 */
> +	raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
>  }
>  NOKPROBE_SYMBOL(kretprobe_hash_lock);
>  
> @@ -1258,7 +1263,12 @@ static void kretprobe_table_lock(unsigned long hash,
>  __acquires(hlist_lock)
>  {
>  	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_lock_irqsave(hlist_lock, *flags);
> +	/*
> +	 * Nested is a workaround that will soon not be needed.
> +	 * There's other protections that make sure the same lock
> +	 * is not taken on the same CPU that lockdep is unaware of.
> +	 */
> +	raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
>  }
>  NOKPROBE_SYMBOL(kretprobe_table_lock);
>  
> @@ -2025,10 +2035,16 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  	struct kretprobe *rp = container_of(p, struct kretprobe, kp);
>  	unsigned long hash, flags = 0;
>  	struct kretprobe_instance *ri;
> +	int nmi = !!in_nmi();
>  
>  	/* TODO: consider to only swap the RA after the last pre_handler fired */
>  	hash = hash_ptr(current, KPROBE_HASH_BITS);
> -	raw_spin_lock_irqsave(&rp->lock, flags);
> +	/*
> +	 * Nested is a workaround that will soon not be needed.
> +	 * There's other protections that make sure the same lock
> +	 * is not taken on the same CPU that lockdep is unaware of.
> +	 */
> +	raw_spin_lock_irqsave_nested(&rp->lock, flags, nmi);
>  	if (!hlist_empty(&rp->free_instances)) {
>  		ri = hlist_entry(rp->free_instances.first,
>  				struct kretprobe_instance, hlist);
> @@ -2039,7 +2055,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  		ri->task = current;
>  
>  		if (rp->entry_handler && rp->entry_handler(ri, regs)) {
> -			raw_spin_lock_irqsave(&rp->lock, flags);
> +			raw_spin_lock_irqsave_nested(&rp->lock, flags, nmi);
>  			hlist_add_head(&ri->hlist, &rp->free_instances);
>  			raw_spin_unlock_irqrestore(&rp->lock, flags);
>  			return 0;
> -- 
> 2.25.4
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 14/21] kprobes: Remove NMI context check
  2020-11-03  5:39             ` Masami Hiramatsu
@ 2020-11-03 16:09               ` Steven Rostedt
  2020-11-04  2:08                 ` Masami Hiramatsu
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2020-11-03 16:09 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Eddy_Wu, x86, davem,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck

On Tue, 3 Nov 2020 14:39:38 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Ah, OK. This looks good to me.
> 
> BTW, in_nmi() in pre_handler_kretprobe() always be true because
> now int3 is treated as an NMI. So you can always pass 1 there.

What about the below patch then?

> 
> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>

Thanks!

From 29ac1a5c9068df06f3196173d4325c8076759551 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Date: Mon, 2 Nov 2020 09:17:49 -0500
Subject: [PATCH] kprobes: Tell lockdep about kprobe nesting

The kprobe handlers have protection that prohibits other handlers from
executing in other contexts (for example, if an NMI comes in while processing a
kprobe and executes the same kprobe, it will fail with a "busy"
return). Lockdep is unaware of this protection. Use lockdep's nesting API to
differentiate between locks taken in INT3 context and other contexts to
suppress the false warnings.

Link: https://lore.kernel.org/r/20201102160234.fa0ae70915ad9e2b21c08b85@kernel.org

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/kprobes.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 8a12a25fa40d..30889ea5514f 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1249,7 +1249,12 @@ __acquires(hlist_lock)
 
 	*head = &kretprobe_inst_table[hash];
 	hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_lock_irqsave(hlist_lock, *flags);
+	/*
+	 * Nested is a workaround that will soon not be needed.
+	 * There's other protections that make sure the same lock
+	 * is not taken on the same CPU that lockdep is unaware of.
+	 */
+	raw_spin_lock_irqsave_nested(hlist_lock, *flags, 1);
 }
 NOKPROBE_SYMBOL(kretprobe_hash_lock);
 
@@ -1258,7 +1263,12 @@ static void kretprobe_table_lock(unsigned long hash,
 __acquires(hlist_lock)
 {
 	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_lock_irqsave(hlist_lock, *flags);
+	/*
+	 * Nested is a workaround that will soon not be needed.
+	 * There's other protections that make sure the same lock
+	 * is not taken on the same CPU that lockdep is unaware of.
+	 */
+	raw_spin_lock_irqsave_nested(hlist_lock, *flags, 1);
 }
 NOKPROBE_SYMBOL(kretprobe_table_lock);
 
@@ -2028,7 +2038,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 
 	/* TODO: consider to only swap the RA after the last pre_handler fired */
 	hash = hash_ptr(current, KPROBE_HASH_BITS);
-	raw_spin_lock_irqsave(&rp->lock, flags);
+	/*
+	 * Nested is a workaround that will soon not be needed.
+	 * There's other protections that make sure the same lock
+	 * is not taken on the same CPU that lockdep is unaware of.
+	 */
+	raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
 	if (!hlist_empty(&rp->free_instances)) {
 		ri = hlist_entry(rp->free_instances.first,
 				struct kretprobe_instance, hlist);
@@ -2039,7 +2054,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 		ri->task = current;
 
 		if (rp->entry_handler && rp->entry_handler(ri, regs)) {
-			raw_spin_lock_irqsave(&rp->lock, flags);
+			raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
 			hlist_add_head(&ri->hlist, &rp->free_instances);
 			raw_spin_unlock_irqrestore(&rp->lock, flags);
 			return 0;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 14/21] kprobes: Remove NMI context check
  2020-11-03 16:09               ` Steven Rostedt
@ 2020-11-04  2:08                 ` Masami Hiramatsu
  2020-11-04 14:47                   ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Masami Hiramatsu @ 2020-11-04  2:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Eddy_Wu, x86, davem,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck

On Tue, 3 Nov 2020 11:09:13 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 3 Nov 2020 14:39:38 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > Ah, OK. This looks good to me.
> > 
> > BTW, in_nmi() in pre_handler_kretprobe() always be true because
> > now int3 is treated as an NMI. So you can always pass 1 there.
> 
> What about the below patch then?

kretprobe_hash_lock() and kretprobe_table_lock() can also be called from
outside of the kprobe pre_handler context, so please keep in_nmi()
in those functions.
For pre_handler_kretprobe(), this looks good to me.
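
For example, kprobe_flush_task() takes the table lock from plain task context
when a task exits, so in_nmi() is legitimately 0 there. A trimmed-down sketch
from memory (the early bail-out and the actual recycling loop omitted):

	void kprobe_flush_task(struct task_struct *tk)
	{
		unsigned long hash, flags;

		kprobe_busy_begin();
		hash = hash_ptr(tk, KPROBE_HASH_BITS);
		kretprobe_table_lock(hash, &flags);	/* task context: in_nmi() == 0 */
		/* ... recycle this task's kretprobe_instances ... */
		kretprobe_table_unlock(hash, &flags);
		kprobe_busy_end();
	}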

Thank you,

> > Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> 
> Thanks!
> 
> From 29ac1a5c9068df06f3196173d4325c8076759551 Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> Date: Mon, 2 Nov 2020 09:17:49 -0500
> Subject: [PATCH] kprobes: Tell lockdep about kprobe nesting
> 
> Since the kprobe handlers have protection that prohibits other handlers from
> executing in other contexts (like if an NMI comes in while processing a
> kprobe, and executes the same kprobe, it will get fail with a "busy"
> return). Lockdep is unaware of this protection. Use lockdep's nesting api to
> differentiate between locks taken in INT3 context and other context to
> suppress the false warnings.
> 
> Link: https://lore.kernel.org/r/20201102160234.fa0ae70915ad9e2b21c08b85@kernel.org
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  kernel/kprobes.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 8a12a25fa40d..30889ea5514f 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1249,7 +1249,12 @@ __acquires(hlist_lock)
>  
>  	*head = &kretprobe_inst_table[hash];
>  	hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_lock_irqsave(hlist_lock, *flags);
> +	/*
> +	 * Nested is a workaround that will soon not be needed.
> +	 * There's other protections that make sure the same lock
> +	 * is not taken on the same CPU that lockdep is unaware of.
> +	 */
> +	raw_spin_lock_irqsave_nested(hlist_lock, *flags, 1);
>  }
>  NOKPROBE_SYMBOL(kretprobe_hash_lock);
>  
> @@ -1258,7 +1263,12 @@ static void kretprobe_table_lock(unsigned long hash,
>  __acquires(hlist_lock)
>  {
>  	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_lock_irqsave(hlist_lock, *flags);
> +	/*
> +	 * Nested is a workaround that will soon not be needed.
> +	 * There's other protections that make sure the same lock
> +	 * is not taken on the same CPU that lockdep is unaware of.
> +	 */
> +	raw_spin_lock_irqsave_nested(hlist_lock, *flags, 1);
>  }
>  NOKPROBE_SYMBOL(kretprobe_table_lock);
>  
> @@ -2028,7 +2038,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  
>  	/* TODO: consider to only swap the RA after the last pre_handler fired */
>  	hash = hash_ptr(current, KPROBE_HASH_BITS);
> -	raw_spin_lock_irqsave(&rp->lock, flags);
> +	/*
> +	 * Nested is a workaround that will soon not be needed.
> +	 * There's other protections that make sure the same lock
> +	 * is not taken on the same CPU that lockdep is unaware of.
> +	 */
> +	raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
>  	if (!hlist_empty(&rp->free_instances)) {
>  		ri = hlist_entry(rp->free_instances.first,
>  				struct kretprobe_instance, hlist);
> @@ -2039,7 +2054,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  		ri->task = current;
>  
>  		if (rp->entry_handler && rp->entry_handler(ri, regs)) {
> -			raw_spin_lock_irqsave(&rp->lock, flags);
> +			raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
>  			hlist_add_head(&ri->hlist, &rp->free_instances);
>  			raw_spin_unlock_irqrestore(&rp->lock, flags);
>  			return 0;
> -- 
> 2.25.4
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 14/21] kprobes: Remove NMI context check
  2020-11-04  2:08                 ` Masami Hiramatsu
@ 2020-11-04 14:47                   ` Steven Rostedt
  2020-11-05  5:15                     ` Masami Hiramatsu
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2020-11-04 14:47 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Eddy_Wu, x86, davem,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck

On Wed, 4 Nov 2020 11:08:52 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> kretprobe_hash_lock() and kretprobe_table_lock() will be called from
> outside of the kprobe pre_handler context. So, please keep in_nmi()
> in those functions.
> for the pre_handler_kretprobe(), this looks good to me.
> 

Final version, before sending to Linus.

-- Steve

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Subject: [PATCH] kprobes: Tell lockdep about kprobe nesting

The kprobe handlers have protection that prohibits other handlers from
executing in other contexts (for example, if an NMI comes in while processing a
kprobe and executes the same kprobe, it will fail with a "busy"
return). Lockdep is unaware of this protection. Use lockdep's nesting API to
differentiate between locks taken in INT3 context and other contexts to
suppress the false warnings.

Link: https://lore.kernel.org/r/20201102160234.fa0ae70915ad9e2b21c08b85@kernel.org

Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/kprobes.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 8a12a25fa40d..41fdbb7953c6 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1249,7 +1249,13 @@ __acquires(hlist_lock)
 
 	*head = &kretprobe_inst_table[hash];
 	hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_lock_irqsave(hlist_lock, *flags);
+	/*
+	 * Nested is a workaround that will soon not be needed.
+	 * There's other protections that make sure the same lock
+	 * is not taken on the same CPU that lockdep is unaware of.
+	 * Differentiate when it is taken in NMI context.
+	 */
+	raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
 }
 NOKPROBE_SYMBOL(kretprobe_hash_lock);
 
@@ -1258,7 +1264,13 @@ static void kretprobe_table_lock(unsigned long hash,
 __acquires(hlist_lock)
 {
 	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
-	raw_spin_lock_irqsave(hlist_lock, *flags);
+	/*
+	 * Nested is a workaround that will soon not be needed.
+	 * There's other protections that make sure the same lock
+	 * is not taken on the same CPU that lockdep is unaware of.
+	 * Differentiate when it is taken in NMI context.
+	 */
+	raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
 }
 NOKPROBE_SYMBOL(kretprobe_table_lock);
 
@@ -2028,7 +2040,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 
 	/* TODO: consider to only swap the RA after the last pre_handler fired */
 	hash = hash_ptr(current, KPROBE_HASH_BITS);
-	raw_spin_lock_irqsave(&rp->lock, flags);
+	/*
+	 * Nested is a workaround that will soon not be needed.
+	 * There's other protections that make sure the same lock
+	 * is not taken on the same CPU that lockdep is unaware of.
+	 */
+	raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
 	if (!hlist_empty(&rp->free_instances)) {
 		ri = hlist_entry(rp->free_instances.first,
 				struct kretprobe_instance, hlist);
@@ -2039,7 +2056,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 		ri->task = current;
 
 		if (rp->entry_handler && rp->entry_handler(ri, regs)) {
-			raw_spin_lock_irqsave(&rp->lock, flags);
+			raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
 			hlist_add_head(&ri->hlist, &rp->free_instances);
 			raw_spin_unlock_irqrestore(&rp->lock, flags);
 			return 0;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v5 14/21] kprobes: Remove NMI context check
  2020-11-04 14:47                   ` Steven Rostedt
@ 2020-11-05  5:15                     ` Masami Hiramatsu
  0 siblings, 0 replies; 51+ messages in thread
From: Masami Hiramatsu @ 2020-11-05  5:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Eddy_Wu, x86, davem,
	naveen.n.rao, anil.s.keshavamurthy, linux-arch, cameron, oleg,
	will, paulmck

On Wed, 4 Nov 2020 09:47:22 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 4 Nov 2020 11:08:52 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > kretprobe_hash_lock() and kretprobe_table_lock() will be called from
> > outside of the kprobe pre_handler context. So, please keep in_nmi()
> > in those functions.
> > for the pre_handler_kretprobe(), this looks good to me.
> > 
> 
> Final version, before sending to Linus.

This looks good to me :)

Thank you!

> 
> -- Steve
> 
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> Subject: [PATCH] kprobes: Tell lockdep about kprobe nesting
> 
> Since the kprobe handlers have protection that prohibits other handlers from
> executing in other contexts (like if an NMI comes in while processing a
> kprobe, and executes the same kprobe, it will get fail with a "busy"
> return). Lockdep is unaware of this protection. Use lockdep's nesting api to
> differentiate between locks taken in INT3 context and other context to
> suppress the false warnings.
> 
> Link: https://lore.kernel.org/r/20201102160234.fa0ae70915ad9e2b21c08b85@kernel.org
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  kernel/kprobes.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 8a12a25fa40d..41fdbb7953c6 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1249,7 +1249,13 @@ __acquires(hlist_lock)
>  
>  	*head = &kretprobe_inst_table[hash];
>  	hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_lock_irqsave(hlist_lock, *flags);
> +	/*
> +	 * Nested is a workaround that will soon not be needed.
> +	 * There's other protections that make sure the same lock
> +	 * is not taken on the same CPU that lockdep is unaware of.
> +	 * Differentiate when it is taken in NMI context.
> +	 */
> +	raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
>  }
>  NOKPROBE_SYMBOL(kretprobe_hash_lock);
>  
> @@ -1258,7 +1264,13 @@ static void kretprobe_table_lock(unsigned long hash,
>  __acquires(hlist_lock)
>  {
>  	raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> -	raw_spin_lock_irqsave(hlist_lock, *flags);
> +	/*
> +	 * Nested is a workaround that will soon not be needed.
> +	 * There's other protections that make sure the same lock
> +	 * is not taken on the same CPU that lockdep is unaware of.
> +	 * Differentiate when it is taken in NMI context.
> +	 */
> +	raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
>  }
>  NOKPROBE_SYMBOL(kretprobe_table_lock);
>  
> @@ -2028,7 +2040,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  
>  	/* TODO: consider to only swap the RA after the last pre_handler fired */
>  	hash = hash_ptr(current, KPROBE_HASH_BITS);
> -	raw_spin_lock_irqsave(&rp->lock, flags);
> +	/*
> +	 * Nested is a workaround that will soon not be needed.
> +	 * There's other protections that make sure the same lock
> +	 * is not taken on the same CPU that lockdep is unaware of.
> +	 */
> +	raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
>  	if (!hlist_empty(&rp->free_instances)) {
>  		ri = hlist_entry(rp->free_instances.first,
>  				struct kretprobe_instance, hlist);
> @@ -2039,7 +2056,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>  		ri->task = current;
>  
>  		if (rp->entry_handler && rp->entry_handler(ri, regs)) {
> -			raw_spin_lock_irqsave(&rp->lock, flags);
> +			raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
>  			hlist_add_head(&ri->hlist, &rp->free_instances);
>  			raw_spin_unlock_irqrestore(&rp->lock, flags);
>  			return 0;
> -- 
> 2.25.4
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2020-11-05  5:15 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-29 12:59 [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Masami Hiramatsu
2020-08-29 13:00 ` [PATCH v5 01/21] kprobes: Add generic kretprobe trampoline handler Masami Hiramatsu
2020-08-29 13:00 ` [PATCH v5 02/21] x86/kprobes: Use " Masami Hiramatsu
2020-08-29 13:00 ` [PATCH v5 03/21] arm: kprobes: " Masami Hiramatsu
2020-08-29 13:00 ` [PATCH v5 04/21] arm64: " Masami Hiramatsu
2020-08-29 13:00 ` [PATCH v5 05/21] arc: " Masami Hiramatsu
2020-08-29 13:00 ` [PATCH v5 06/21] csky: " Masami Hiramatsu
2020-08-29 13:01 ` [PATCH v5 07/21] ia64: " Masami Hiramatsu
2020-08-29 13:01 ` [PATCH v5 08/21] mips: " Masami Hiramatsu
2020-08-29 13:01 ` [PATCH v5 09/21] parisc: " Masami Hiramatsu
2020-08-29 13:01 ` [PATCH v5 10/21] powerpc: " Masami Hiramatsu
2020-08-29 13:02 ` [PATCH v5 11/21] s390: " Masami Hiramatsu
2020-08-29 13:02 ` [PATCH v5 12/21] sh: " Masami Hiramatsu
2020-08-29 13:02 ` [PATCH v5 13/21] sparc: " Masami Hiramatsu
2020-08-29 13:02 ` [PATCH v5 14/21] kprobes: Remove NMI context check Masami Hiramatsu
     [not found]   ` <20201030213831.04e81962@oasis.local.home>
2020-11-02  5:11     ` Masami Hiramatsu
2020-11-02  5:53       ` Masami Hiramatsu
2020-11-02  7:02         ` Masami Hiramatsu
2020-11-02 14:27           ` Steven Rostedt
2020-11-03  5:39             ` Masami Hiramatsu
2020-11-03 16:09               ` Steven Rostedt
2020-11-04  2:08                 ` Masami Hiramatsu
2020-11-04 14:47                   ` Steven Rostedt
2020-11-05  5:15                     ` Masami Hiramatsu
2020-08-29 13:02 ` [PATCH v5 15/21] kprobes: Free kretprobe_instance with rcu callback Masami Hiramatsu
2020-08-29 13:03 ` [PATCH v5 16/21] kprobes: Make local used functions static Masami Hiramatsu
2020-08-29 13:03 ` [PATCH v5 17/21] llist: Add nonatomic __llist_add() and __llist_dell_all() Masami Hiramatsu
2020-10-12 16:24   ` Ingo Molnar
2020-10-14  0:24     ` Masami Hiramatsu
2020-08-29 13:03 ` [PATCH v5 18/21] kprobes: Remove kretprobe hash Masami Hiramatsu
2020-08-29 13:03 ` [PATCH v5 19/21] asm-generic/atomic: Add try_cmpxchg() fallbacks Masami Hiramatsu
2020-10-12 16:25   ` Ingo Molnar
2020-08-29 13:03 ` [PATCH v5 20/21] freelist: Lock less freelist Masami Hiramatsu
2020-08-29 13:03 ` [PATCH v5 21/21] kprobes: Replace rp->free_instance with freelist Masami Hiramatsu
2020-09-01 19:08 ` [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless Peter Zijlstra
2020-09-02  0:37   ` Masami Hiramatsu
2020-09-02  7:02     ` peterz
2020-09-02  8:17       ` Masami Hiramatsu
2020-09-02  9:36         ` peterz
2020-09-02 13:19           ` Masami Hiramatsu
2020-09-02 13:42             ` peterz
2020-09-03  1:39               ` Masami Hiramatsu
2020-09-03  2:02                 ` Masami Hiramatsu
2020-09-07 17:44                   ` Frank Ch. Eigler
2020-09-08  2:55                     ` Masami Hiramatsu
2020-09-08 10:37                 ` peterz
2020-09-08 11:15                   ` Eddy_Wu
2020-09-08 11:33                     ` peterz
2020-09-08 15:09                   ` Masami Hiramatsu
2020-09-09  5:28                     ` Masami Hiramatsu
2020-09-11  2:32       ` Masami Hiramatsu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).