linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/10] Uprobes: Support SDT markers having reference count (semaphore)
@ 2018-06-28  5:21 Ravi Bangoria
  2018-06-28  5:22 ` [PATCH v5 01/10] Uprobes: Move uprobe structure to uprobe.h Ravi Bangoria
                   ` (9 more replies)
  0 siblings, 10 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-06-28  5:21 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, linux-arm-kernel, linux-mips, linux, ralf,
	paul.burton, Ravi Bangoria

Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
have these markers embedded in them. These markers are added by developer
at important places in the code. Each marker source expands to a single
nop instruction in the compiled code but there may be additional
overhead for computing the marker arguments which expands to couple of
instructions. In case the overhead is more, execution of it can be
omitted by runtime if() condition when no one is tracing on the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

Default value of reference counter is 0. Tracer has to increment the 
reference counter before tracing on a marker and decrement it when
done with the tracing.

Currently, perf tool has limited supports for SDT markers. I.e. it
can not trace markers surrounded by reference counter. Also, it's
not easy to add reference counter logic in userspace tool like perf,
so basic idea for this patchset is to add reference counter logic in
the a uprobe infrastructure. Ex,[2]

  # cat tick.c
    ... 
    for (i = 0; i < 100; i++) {
	DTRACE_PROBE1(tick, loop1, i);
        if (TICK_LOOP2_ENABLED()) {
            DTRACE_PROBE1(tick, loop2, i); 
        }
        printf("hi: %d\n", i); 
        sleep(1);
    }   
    ... 

Here tick:loop1 is marker without reference counter where as tick:loop2
is surrounded by reference counter condition.

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2

  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
             3      sdt_tick:loop1
             0      sdt_tick:loop2
     2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. Same experiment with this
patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C  
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed


v5 changes:
 - Fix build failure. 'struct uprobe' was local to uprobe.c file and I
   was using it in some arch specific code which caused a build failure.
   Added new patch [PATCH v5 1/10] to fix it.

v4 changes:
 - Previous version moved the implementation from trace_uprobe to core
   uprobe. But it had some issues. I've fixed them in this version. To
   cut a long story short, reference counter increment/decrement is
   tied to instruction patching. Whenever instruction gets patched, we
   update the reference counter. Now, what if vma holding reference
   counter is not present while patching an instruction? To overcome
   this issue, we will add such uprobe to delayed_uprobe list. Whenever
   process maps the region holding the reference counter, we increment
   it and remove the uprobe from delayed_uprobe list.
 - Until last version, we were incrementing reference counter for each
   uprobe_consumer. That isn't true now. With this implementation, we
   increment and decrement the counter only once. This is fine because
   we increment the counter when we find first valid consumer and we
   decrement it when last consumer is going away. (For a tiny binary,
   multiple vmas maps to the same file portion. In such case, we are
   patching the instruction multiple time and thus we will increment /
   decrement the reference counter multiple time as well.)
 - Fortunately, there is no need to maintain sdt_mm_list now, because
   we are sure that the increment and decrement will always happen in
   sync. This make the implementation lot more simpler compared to
   earlier versions.

v4 can be foud at: https://lkml.org/lkml/2018/6/19/1324 

Note:
 - 'reference counter' is called as 'semaphore' in original Dtrace
   (or Systemtap, bcc and even in ELF) documentation and code. But the 
   term 'semaphore' is misleading in this context. This is just a counter
   used to hold number of tracers tracing on a marker. This is not really
   used for any synchronization. So we are referring it as 'reference
   counter' in kernel / perf code.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
[2] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506


Ravi Bangoria (10):
  Uprobes: Move uprobe structure to uprobe.h
  Uprobes: Simplify uprobe_register() body
  Uprobe: Change set_swbp definition
  Uprobe: Change set_orig_insn definition
  Uprobe: Change uprobe_write_opcode definition
  Uprobes: Support SDT markers having reference count (semaphore)
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  Uprobes/sdt: Document about reference counter
  perf probe: Support SDT markers having reference counter (semaphore)

 Documentation/trace/uprobetracer.rst |  16 +-
 arch/arm/probes/uprobes/core.c       |   6 +-
 arch/mips/kernel/uprobes.c           |   6 +-
 include/linux/uprobes.h              |  37 +++-
 kernel/events/uprobes.c              | 390 ++++++++++++++++++++++++++++-------
 kernel/trace/trace.c                 |   2 +-
 kernel/trace/trace_uprobe.c          |  74 ++++++-
 tools/perf/util/probe-event.c        |  39 +++-
 tools/perf/util/probe-event.h        |   1 +
 tools/perf/util/probe-file.c         |  34 ++-
 tools/perf/util/probe-file.h         |   1 +
 tools/perf/util/symbol-elf.c         |  46 +++--
 tools/perf/util/symbol.h             |   7 +
 13 files changed, 542 insertions(+), 117 deletions(-)

-- 
2.14.4


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v5 01/10] Uprobes: Move uprobe structure to uprobe.h
  2018-06-28  5:21 [PATCH v5 00/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
@ 2018-06-28  5:22 ` Ravi Bangoria
  2018-06-28  5:22 ` [PATCH v5 02/10] Uprobes: Simplify uprobe_register() body Ravi Bangoria
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-06-28  5:22 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, linux-arm-kernel, linux-mips, linux, ralf,
	paul.burton, Ravi Bangoria

We need struct uprobe in some arch specific files. Move it to
uprobe.h from uprobe.c

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 include/linux/uprobes.h | 25 +++++++++++++++++++++++++
 kernel/events/uprobes.c | 24 ------------------------
 2 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0a294e950df8..044359083a57 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -28,6 +28,7 @@
 #include <linux/rbtree.h>
 #include <linux/types.h>
 #include <linux/wait.h>
+#include <linux/rwsem.h>
 
 struct vm_area_struct;
 struct mm_struct;
@@ -61,6 +62,30 @@ struct uprobe_consumer {
 #ifdef CONFIG_UPROBES
 #include <asm/uprobes.h>
 
+struct uprobe {
+	struct rb_node		rb_node;	/* node in the rb tree */
+	atomic_t		ref;
+	struct rw_semaphore	register_rwsem;
+	struct rw_semaphore	consumer_rwsem;
+	struct list_head	pending_list;
+	struct uprobe_consumer	*consumers;
+	struct inode		*inode;		/* Also hold a ref to inode */
+	loff_t			offset;
+	unsigned long		flags;
+
+	/*
+	 * The generic code assumes that it has two members of unknown type
+	 * owned by the arch-specific code:
+	 *
+	 *	insn -	copy_insn() saves the original instruction here for
+	 *		arch_uprobe_analyze_insn().
+	 *
+	 *	ixol -	potentially modified instruction to execute out of
+	 *		line, copied to xol_area by xol_get_insn_slot().
+	 */
+	struct arch_uprobe	arch;
+};
+
 enum uprobe_task_state {
 	UTASK_RUNNING,
 	UTASK_SSTEP,
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ccc579a7d32e..e6e812304b22 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -64,30 +64,6 @@ static struct percpu_rw_semaphore dup_mmap_sem;
 /* Have a copy of original instruction */
 #define UPROBE_COPY_INSN	0
 
-struct uprobe {
-	struct rb_node		rb_node;	/* node in the rb tree */
-	atomic_t		ref;
-	struct rw_semaphore	register_rwsem;
-	struct rw_semaphore	consumer_rwsem;
-	struct list_head	pending_list;
-	struct uprobe_consumer	*consumers;
-	struct inode		*inode;		/* Also hold a ref to inode */
-	loff_t			offset;
-	unsigned long		flags;
-
-	/*
-	 * The generic code assumes that it has two members of unknown type
-	 * owned by the arch-specific code:
-	 *
-	 * 	insn -	copy_insn() saves the original instruction here for
-	 *		arch_uprobe_analyze_insn().
-	 *
-	 *	ixol -	potentially modified instruction to execute out of
-	 *		line, copied to xol_area by xol_get_insn_slot().
-	 */
-	struct arch_uprobe	arch;
-};
-
 /*
  * Execute out of line area: anonymous executable mapping installed
  * by the probed task to execute the copy of the original instruction
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 02/10] Uprobes: Simplify uprobe_register() body
  2018-06-28  5:21 [PATCH v5 00/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
  2018-06-28  5:22 ` [PATCH v5 01/10] Uprobes: Move uprobe structure to uprobe.h Ravi Bangoria
@ 2018-06-28  5:22 ` Ravi Bangoria
  2018-06-28  5:22 ` [PATCH v5 03/10] Uprobe: Change set_swbp definition Ravi Bangoria
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-06-28  5:22 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, linux-arm-kernel, linux-mips, linux, ralf,
	paul.burton, Ravi Bangoria

Simplify uprobe_register() function body and let __uprobe_register()
handle everything. Also move dependency functions around to fix build
failures.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 kernel/events/uprobes.c | 69 ++++++++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 33 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index e6e812304b22..26c2d95c55fa 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -816,13 +816,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
 	return err;
 }
 
-static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc)
-{
-	consumer_add(uprobe, uc);
-	return register_for_each_vma(uprobe, uc);
-}
-
-static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
+static void
+__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
 	int err;
 
@@ -836,24 +831,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u
 }
 
 /*
- * uprobe_register - register a probe
+ * uprobe_unregister - unregister a already registered probe.
+ * @inode: the file in which the probe has to be removed.
+ * @offset: offset from the start of the file.
+ * @uc: identify which probe if multiple probes are colocated.
+ */
+void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+{
+	struct uprobe *uprobe;
+
+	uprobe = find_uprobe(inode, offset);
+	if (WARN_ON(!uprobe))
+		return;
+
+	down_write(&uprobe->register_rwsem);
+	__uprobe_unregister(uprobe, uc);
+	up_write(&uprobe->register_rwsem);
+	put_uprobe(uprobe);
+}
+EXPORT_SYMBOL_GPL(uprobe_unregister);
+
+/*
+ * __uprobe_register - register a probe
  * @inode: the file in which the probe has to be placed.
  * @offset: offset from the start of the file.
  * @uc: information on howto handle the probe..
  *
- * Apart from the access refcount, uprobe_register() takes a creation
+ * Apart from the access refcount, __uprobe_register() takes a creation
  * refcount (thro alloc_uprobe) if and only if this @uprobe is getting
  * inserted into the rbtree (i.e first consumer for a @inode:@offset
  * tuple).  Creation refcount stops uprobe_unregister from freeing the
  * @uprobe even before the register operation is complete. Creation
  * refcount is released when the last @uc for the @uprobe
- * unregisters. Caller of uprobe_register() is required to keep @inode
+ * unregisters. Caller of __uprobe_register() is required to keep @inode
  * (and the containing mount) referenced.
  *
  * Return errno if it cannot successully install probes
  * else return 0 (success)
  */
-int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+static int __uprobe_register(struct inode *inode, loff_t offset,
+			     struct uprobe_consumer *uc)
 {
 	struct uprobe *uprobe;
 	int ret;
@@ -880,7 +897,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
 	down_write(&uprobe->register_rwsem);
 	ret = -EAGAIN;
 	if (likely(uprobe_is_active(uprobe))) {
-		ret = __uprobe_register(uprobe, uc);
+		consumer_add(uprobe, uc);
+		ret = register_for_each_vma(uprobe, uc);
 		if (ret)
 			__uprobe_unregister(uprobe, uc);
 	}
@@ -891,6 +909,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
 		goto retry;
 	return ret;
 }
+
+int uprobe_register(struct inode *inode, loff_t offset,
+		    struct uprobe_consumer *uc)
+{
+	return __uprobe_register(inode, offset, uc);
+}
 EXPORT_SYMBOL_GPL(uprobe_register);
 
 /*
@@ -922,27 +946,6 @@ int uprobe_apply(struct inode *inode, loff_t offset,
 	return ret;
 }
 
-/*
- * uprobe_unregister - unregister a already registered probe.
- * @inode: the file in which the probe has to be removed.
- * @offset: offset from the start of the file.
- * @uc: identify which probe if multiple probes are colocated.
- */
-void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
-{
-	struct uprobe *uprobe;
-
-	uprobe = find_uprobe(inode, offset);
-	if (WARN_ON(!uprobe))
-		return;
-
-	down_write(&uprobe->register_rwsem);
-	__uprobe_unregister(uprobe, uc);
-	up_write(&uprobe->register_rwsem);
-	put_uprobe(uprobe);
-}
-EXPORT_SYMBOL_GPL(uprobe_unregister);
-
 static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 03/10] Uprobe: Change set_swbp definition
  2018-06-28  5:21 [PATCH v5 00/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
  2018-06-28  5:22 ` [PATCH v5 01/10] Uprobes: Move uprobe structure to uprobe.h Ravi Bangoria
  2018-06-28  5:22 ` [PATCH v5 02/10] Uprobes: Simplify uprobe_register() body Ravi Bangoria
@ 2018-06-28  5:22 ` Ravi Bangoria
  2018-06-28  5:22 ` [PATCH v5 04/10] Uprobe: Change set_orig_insn definition Ravi Bangoria
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-06-28  5:22 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, linux-arm-kernel, linux-mips, linux, ralf,
	paul.burton, Ravi Bangoria

Instead of arch_uprobe, pass uprobe object to set_swbp(). We need
this in later set of patches.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/arm/probes/uprobes/core.c | 4 ++--
 arch/mips/kernel/uprobes.c     | 4 ++--
 include/linux/uprobes.h        | 2 +-
 kernel/events/uprobes.c        | 6 +++---
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
index d1329f1ba4e4..c678a5c4a284 100644
--- a/arch/arm/probes/uprobes/core.c
+++ b/arch/arm/probes/uprobes/core.c
@@ -29,11 +29,11 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
 		(UPROBE_SWBP_ARM_INSN & 0x0fffffff);
 }
 
-int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
+int set_swbp(struct uprobe *uprobe, struct mm_struct *mm,
 	     unsigned long vaddr)
 {
 	return uprobe_write_opcode(mm, vaddr,
-		   __opcode_to_mem_arm(auprobe->bpinsn));
+		   __opcode_to_mem_arm(uprobe->arch.bpinsn));
 }
 
 bool arch_uprobe_ignore(struct arch_uprobe *auprobe, struct pt_regs *regs)
diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c
index f7a0645ccb82..2aaa378a46e6 100644
--- a/arch/mips/kernel/uprobes.c
+++ b/arch/mips/kernel/uprobes.c
@@ -211,7 +211,7 @@ unsigned long arch_uretprobe_hijack_return_addr(
 
 /**
  * set_swbp - store breakpoint at a given address.
- * @auprobe: arch specific probepoint information.
+ * @uprobe: uprobe object.
  * @mm: the probed process address space.
  * @vaddr: the virtual address to insert the opcode.
  *
@@ -221,7 +221,7 @@ unsigned long arch_uretprobe_hijack_return_addr(
  * This version overrides the weak version in kernel/events/uprobes.c.
  * It is required to handle MIPS16 and microMIPS.
  */
-int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
+int __weak set_swbp(struct uprobe *uprobe, struct mm_struct *mm,
 	unsigned long vaddr)
 {
 	return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 044359083a57..7a77bd57c9e7 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -140,7 +140,7 @@ struct uprobes_state {
 	struct xol_area		*xol_area;
 };
 
-extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
+extern int set_swbp(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr);
 extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern bool is_swbp_insn(uprobe_opcode_t *insn);
 extern bool is_trap_insn(uprobe_opcode_t *insn);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 26c2d95c55fa..8de52bcd1a18 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -318,14 +318,14 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
 
 /**
  * set_swbp - store breakpoint at a given address.
- * @auprobe: arch specific probepoint information.
+ * @uprobe: uprobe object.
  * @mm: the probed process address space.
  * @vaddr: the virtual address to insert the opcode.
  *
  * For mm @mm, store the breakpoint instruction at @vaddr.
  * Return 0 (success) or a negative errno.
  */
-int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
+int __weak set_swbp(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
 {
 	return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
 }
@@ -642,7 +642,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
 	if (first_uprobe)
 		set_bit(MMF_HAS_UPROBES, &mm->flags);
 
-	ret = set_swbp(&uprobe->arch, mm, vaddr);
+	ret = set_swbp(uprobe, mm, vaddr);
 	if (!ret)
 		clear_bit(MMF_RECALC_UPROBES, &mm->flags);
 	else if (first_uprobe)
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 04/10] Uprobe: Change set_orig_insn definition
  2018-06-28  5:21 [PATCH v5 00/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (2 preceding siblings ...)
  2018-06-28  5:22 ` [PATCH v5 03/10] Uprobe: Change set_swbp definition Ravi Bangoria
@ 2018-06-28  5:22 ` Ravi Bangoria
  2018-06-28  5:22 ` [PATCH v5 05/10] Uprobe: Change uprobe_write_opcode definition Ravi Bangoria
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-06-28  5:22 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, linux-arm-kernel, linux-mips, linux, ralf,
	paul.burton, Ravi Bangoria

Instead of arch_uprobe, pass uprobe object to set_orig_insn(). We
need this in later set of patches.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 include/linux/uprobes.h | 2 +-
 kernel/events/uprobes.c | 9 +++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 7a77bd57c9e7..85db5918f675 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -141,7 +141,7 @@ struct uprobes_state {
 };
 
 extern int set_swbp(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr);
-extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
+extern int set_orig_insn(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr);
 extern bool is_swbp_insn(uprobe_opcode_t *insn);
 extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8de52bcd1a18..04fe80057331 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -333,16 +333,17 @@ int __weak set_swbp(struct uprobe *uprobe, struct mm_struct *mm, unsigned long v
 /**
  * set_orig_insn - Restore the original instruction.
  * @mm: the probed process address space.
- * @auprobe: arch specific probepoint information.
+ * @uprobe: uprobe object.
  * @vaddr: the virtual address to insert the opcode.
  *
  * For mm @mm, restore the original opcode (opcode) at @vaddr.
  * Return 0 (success) or a negative errno.
  */
 int __weak
-set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
+set_orig_insn(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn);
+	return uprobe_write_opcode(mm, vaddr,
+			*(uprobe_opcode_t *)&(uprobe->arch.insn));
 }
 
 static struct uprobe *get_uprobe(struct uprobe *uprobe)
@@ -655,7 +656,7 @@ static int
 remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
 {
 	set_bit(MMF_RECALC_UPROBES, &mm->flags);
-	return set_orig_insn(&uprobe->arch, mm, vaddr);
+	return set_orig_insn(uprobe, mm, vaddr);
 }
 
 static inline bool uprobe_is_active(struct uprobe *uprobe)
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 05/10] Uprobe: Change uprobe_write_opcode definition
  2018-06-28  5:21 [PATCH v5 00/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (3 preceding siblings ...)
  2018-06-28  5:22 ` [PATCH v5 04/10] Uprobe: Change set_orig_insn definition Ravi Bangoria
@ 2018-06-28  5:22 ` Ravi Bangoria
  2018-06-28  5:22 ` [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-06-28  5:22 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, linux-arm-kernel, linux-mips, linux, ralf,
	paul.burton, Ravi Bangoria

Add addition argument 'uprobe' to uprobe_write_opcode(). We need
this in later set of patches.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/arm/probes/uprobes/core.c | 2 +-
 arch/mips/kernel/uprobes.c     | 2 +-
 include/linux/uprobes.h        | 2 +-
 kernel/events/uprobes.c        | 9 +++++----
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
index c678a5c4a284..a4856a904a0a 100644
--- a/arch/arm/probes/uprobes/core.c
+++ b/arch/arm/probes/uprobes/core.c
@@ -32,7 +32,7 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
 int set_swbp(struct uprobe *uprobe, struct mm_struct *mm,
 	     unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr,
+	return uprobe_write_opcode(uprobe, mm, vaddr,
 		   __opcode_to_mem_arm(uprobe->arch.bpinsn));
 }
 
diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c
index 2aaa378a46e6..8bdf4b54c4a7 100644
--- a/arch/mips/kernel/uprobes.c
+++ b/arch/mips/kernel/uprobes.c
@@ -224,7 +224,7 @@ unsigned long arch_uretprobe_hijack_return_addr(
 int __weak set_swbp(struct uprobe *uprobe, struct mm_struct *mm,
 	unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+	return uprobe_write_opcode(uprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }
 
 void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 85db5918f675..88942b1f0a3e 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -146,7 +146,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn);
 extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
-extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
+extern int uprobe_write_opcode(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t opcode);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 04fe80057331..3e0426c76376 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -268,6 +268,7 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
  * that have fixed length instructions.
  *
  * uprobe_write_opcode - write the opcode at a given virtual address.
+ * @uprobe: uprobe object.
  * @mm: the probed process address space.
  * @vaddr: the virtual address to store the opcode.
  * @opcode: opcode to be written at @vaddr.
@@ -275,8 +276,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
  * Called with mm->mmap_sem held for write.
  * Return 0 (success) or a negative errno.
  */
-int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
-			uprobe_opcode_t opcode)
+int uprobe_write_opcode(struct uprobe *uprobe, struct mm_struct *mm,
+			unsigned long vaddr, uprobe_opcode_t opcode)
 {
 	struct page *old_page, *new_page;
 	struct vm_area_struct *vma;
@@ -327,7 +328,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
  */
 int __weak set_swbp(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+	return uprobe_write_opcode(uprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }
 
 /**
@@ -342,7 +343,7 @@ int __weak set_swbp(struct uprobe *uprobe, struct mm_struct *mm, unsigned long v
 int __weak
 set_orig_insn(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr,
+	return uprobe_write_opcode(uprobe, mm, vaddr,
 			*(uprobe_opcode_t *)&(uprobe->arch.insn));
 }
 
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-28  5:21 [PATCH v5 00/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (4 preceding siblings ...)
  2018-06-28  5:22 ` [PATCH v5 05/10] Uprobe: Change uprobe_write_opcode definition Ravi Bangoria
@ 2018-06-28  5:22 ` Ravi Bangoria
  2018-06-28 19:51   ` Oleg Nesterov
                     ` (2 more replies)
  2018-06-28  5:22 ` [PATCH v5 07/10] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
                   ` (3 subsequent siblings)
  9 siblings, 3 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-06-28  5:22 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, linux-arm-kernel, linux-mips, linux, ralf,
	paul.burton, Ravi Bangoria

Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
have these markers embedded in them. These markers are added by developer
at important places in the code. Each marker source expands to a single
nop instruction in the compiled code but there may be additional
overhead for computing the marker arguments which expands to couple of
instructions. In case the overhead is more, execution of it can be
omitted by runtime if() condition when no one is tracing on the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

Default value of reference counter is 0. Tracer has to increment the
reference counter before tracing on a marker and decrement it when
done with the tracing.

Implement the reference counter logic in core uprobe. User will be
able to use it from trace_uprobe as well as from kernel module. New
trace_uprobe definition with reference counter will now be:

    <path>:<offset>[(ref_ctr_offset)]

where ref_ctr_offset is an optional field. For kernel module, new
variant of uprobe_register() has been introduced:

    uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)

No new variant for uprobe_unregister() because it's assumed to have
only one reference counter for one uprobe.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: 'reference counter' is called as 'semaphore' in original Dtrace
(or Systemtap, bcc and even in ELF) documentation and code. But the
term 'semaphore' is misleading in this context. This is just a counter
used to hold number of tracers tracing on a marker. This is not really
used for any synchronization. So we are referring it as 'reference
counter' in kernel / perf code.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 include/linux/uprobes.h     |   6 +
 kernel/events/uprobes.c     | 276 +++++++++++++++++++++++++++++++++++++++++---
 kernel/trace/trace_uprobe.c |  38 +++++-
 3 files changed, 302 insertions(+), 18 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 88942b1f0a3e..3ef21f546c1f 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -71,6 +71,7 @@ struct uprobe {
 	struct uprobe_consumer	*consumers;
 	struct inode		*inode;		/* Also hold a ref to inode */
 	loff_t			offset;
+	loff_t			ref_ctr_offset;
 	unsigned long		flags;
 
 	/*
@@ -148,6 +149,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t opcode);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
+extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -185,6 +187,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
 {
 	return -ENOSYS;
 }
+static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+{
+	return -ENOSYS;
+}
 static inline int
 uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
 {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 3e0426c76376..61f9b0024794 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -64,6 +64,20 @@ static struct percpu_rw_semaphore dup_mmap_sem;
 /* Have a copy of original instruction */
 #define UPROBE_COPY_INSN	0
 
+enum {
+	UPROBE_OFFSET = 0,
+	REF_CTR_OFFSET
+};
+
+struct delayed_uprobe {
+	struct list_head list;
+	struct uprobe *uprobe;
+	struct mm_struct *mm;
+};
+
+static DEFINE_MUTEX(delayed_uprobe_lock);
+static LIST_HEAD(delayed_uprobe_list);
+
 /*
  * Execute out of line area: anonymous executable mapping installed
  * by the probed task to execute the copy of the original instruction
@@ -258,6 +272,158 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
 	return 1;
 }
 
+static loff_t uprobe_get_offset(struct uprobe *u, int off_type)
+{
+	return (off_type == UPROBE_OFFSET) ? u->offset : u->ref_ctr_offset;
+}
+
+static struct delayed_uprobe *
+delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct delayed_uprobe *du;
+
+	list_for_each_entry(du, &delayed_uprobe_list, list)
+		if (du->uprobe == uprobe && du->mm == mm)
+			return du;
+	return NULL;
+}
+
+static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct delayed_uprobe *du;
+
+	if (delayed_uprobe_check(uprobe, mm))
+		return 0;
+
+	du  = kzalloc(sizeof(*du), GFP_KERNEL);
+	if (!du)
+		return -ENOMEM;
+
+	du->uprobe = uprobe;
+	du->mm = mm;
+	list_add(&du->list, &delayed_uprobe_list);
+	return 0;
+}
+
+static void delayed_uprobe_delete(struct delayed_uprobe *du)
+{
+	if (!du)
+		return;
+	list_del(&du->list);
+	kfree(du);
+}
+
+static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct list_head *pos, *q;
+	struct delayed_uprobe *du;
+
+	list_for_each_safe(pos, q, &delayed_uprobe_list) {
+		du = list_entry(pos, struct delayed_uprobe, list);
+		if (uprobe) {
+			if (du->uprobe == uprobe && du->mm == mm)
+				delayed_uprobe_delete(du);
+		} else if (du->mm == mm) {
+			delayed_uprobe_delete(du);
+		}
+	}
+}
+
+static bool valid_ref_ctr_vma(struct uprobe *uprobe,
+			      struct vm_area_struct *vma)
+{
+	unsigned long vaddr = offset_to_vaddr(vma, uprobe->ref_ctr_offset);
+
+	return uprobe->ref_ctr_offset &&
+		vma->vm_file &&
+		file_inode(vma->vm_file) == uprobe->inode &&
+		vma->vm_flags & VM_WRITE &&
+		vma->vm_start <= vaddr &&
+		vma->vm_end > vaddr;
+}
+
+static struct vm_area_struct *
+find_ref_ctr_vma(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct vm_area_struct *tmp;
+
+	for (tmp = mm->mmap; tmp; tmp = tmp->vm_next)
+		if (valid_ref_ctr_vma(uprobe, tmp))
+			return tmp;
+
+	return NULL;
+}
+
+static int
+__update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
+{
+	void *kaddr;
+	struct page *page;
+	struct vm_area_struct *vma;
+	int ret = 0;
+	short *ptr;
+
+	if (vaddr == 0 || d == 0)
+		return -EINVAL;
+
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
+		FOLL_FORCE | FOLL_WRITE, &page, &vma, NULL);
+	if (unlikely(ret <= 0)) {
+		/*
+		 * We are asking for 1 page. If get_user_pages_remote() fails,
+		 * it may return 0, in that case we have to return error.
+		 */
+		ret = (ret == 0) ? -EBUSY : ret;
+		pr_warn("Failed to %s ref_ctr. (%d)\n",
+			d > 0 ? "increment" : "decrement", ret);
+		return ret;
+	}
+
+	kaddr = kmap_atomic(page);
+	ptr = kaddr + (vaddr & ~PAGE_MASK);
+
+	if (unlikely(*ptr + d < 0)) {
+		pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
+			"curr val: %d, delta: %d\n", vaddr, *ptr, d);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	*ptr += d;
+	ret = 0;
+out:
+	kunmap_atomic(kaddr);
+	put_page(page);
+	return ret;
+}
+
+static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
+			  bool is_register)
+{
+	struct vm_area_struct *rc_vma;
+	unsigned long rc_vaddr;
+	int ret = 0;
+
+	rc_vma = find_ref_ctr_vma(uprobe, mm);
+
+	if (rc_vma) {
+		rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
+		ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1);
+
+		if (is_register)
+			return ret;
+	}
+
+	mutex_lock(&delayed_uprobe_lock);
+	if (is_register)
+		ret = delayed_uprobe_add(uprobe, mm);
+	else
+		delayed_uprobe_remove(uprobe, mm);
+	mutex_unlock(&delayed_uprobe_lock);
+
+	return ret;
+}
+
 /*
  * NOTE:
  * Expect the breakpoint instruction to be the smallest size instruction for
@@ -281,7 +447,9 @@ int uprobe_write_opcode(struct uprobe *uprobe, struct mm_struct *mm,
 {
 	struct page *old_page, *new_page;
 	struct vm_area_struct *vma;
-	int ret;
+	int ret, is_register, ref_ctr_updated = 0;
+
+	is_register = is_swbp_insn(&opcode);
 
 retry:
 	/* Read the page with vaddr into memory */
@@ -294,6 +462,15 @@ int uprobe_write_opcode(struct uprobe *uprobe, struct mm_struct *mm,
 	if (ret <= 0)
 		goto put_old;
 
+	/* Update the ref_ctr if we are going to replace instruction. */
+	if (!ref_ctr_updated) {
+		ret = update_ref_ctr(uprobe, mm, is_register);
+		if (ret)
+			goto put_old;
+
+		ref_ctr_updated = 1;
+	}
+
 	ret = anon_vma_prepare(vma);
 	if (ret)
 		goto put_old;
@@ -314,6 +491,11 @@ int uprobe_write_opcode(struct uprobe *uprobe, struct mm_struct *mm,
 
 	if (unlikely(ret == -EAGAIN))
 		goto retry;
+
+	/* Revert back reference counter if instruction update failed. */
+	if (ret && is_register && ref_ctr_updated)
+		update_ref_ctr(uprobe, mm, false);
+
 	return ret;
 }
 
@@ -461,7 +643,8 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
 	return u;
 }
 
-static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
+static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
+				   loff_t ref_ctr_offset)
 {
 	struct uprobe *uprobe, *cur_uprobe;
 
@@ -471,6 +654,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
 
 	uprobe->inode = inode;
 	uprobe->offset = offset;
+	uprobe->ref_ctr_offset = ref_ctr_offset;
 	init_rwsem(&uprobe->register_rwsem);
 	init_rwsem(&uprobe->consumer_rwsem);
 
@@ -872,7 +1056,7 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
  * else return 0 (success)
  */
 static int __uprobe_register(struct inode *inode, loff_t offset,
-			     struct uprobe_consumer *uc)
+			     loff_t ref_ctr_offset, struct uprobe_consumer *uc)
 {
 	struct uprobe *uprobe;
 	int ret;
@@ -889,7 +1073,7 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 		return -EINVAL;
 
  retry:
-	uprobe = alloc_uprobe(inode, offset);
+	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (!uprobe)
 		return -ENOMEM;
 	/*
@@ -915,10 +1099,17 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 int uprobe_register(struct inode *inode, loff_t offset,
 		    struct uprobe_consumer *uc)
 {
-	return __uprobe_register(inode, offset, uc);
+	return __uprobe_register(inode, offset, 0, uc);
 }
 EXPORT_SYMBOL_GPL(uprobe_register);
 
+int uprobe_register_refctr(struct inode *inode, loff_t offset,
+			   loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+{
+	return __uprobe_register(inode, offset, ref_ctr_offset, uc);
+}
+EXPORT_SYMBOL_GPL(uprobe_register_refctr);
+
 /*
  * uprobe_apply - unregister a already registered probe.
  * @inode: the file in which the probe has to be removed.
@@ -976,21 +1167,22 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 }
 
 static struct rb_node *
-find_node_in_range(struct inode *inode, loff_t min, loff_t max)
+find_node_in_range(struct inode *inode, int off_type, loff_t min, loff_t max)
 {
 	struct rb_node *n = uprobes_tree.rb_node;
 
 	while (n) {
 		struct uprobe *u = rb_entry(n, struct uprobe, rb_node);
+		loff_t offset = uprobe_get_offset(u, off_type);
 
 		if (inode < u->inode) {
 			n = n->rb_left;
 		} else if (inode > u->inode) {
 			n = n->rb_right;
 		} else {
-			if (max < u->offset)
+			if (max < offset)
 				n = n->rb_left;
-			else if (min > u->offset)
+			else if (min > offset)
 				n = n->rb_right;
 			else
 				break;
@@ -1004,6 +1196,7 @@ find_node_in_range(struct inode *inode, loff_t min, loff_t max)
  * For a given range in vma, build a list of probes that need to be inserted.
  */
 static void build_probe_list(struct inode *inode,
+				int off_type,
 				struct vm_area_struct *vma,
 				unsigned long start, unsigned long end,
 				struct list_head *head)
@@ -1011,24 +1204,29 @@ static void build_probe_list(struct inode *inode,
 	loff_t min, max;
 	struct rb_node *n, *t;
 	struct uprobe *u;
+	loff_t offset;
 
 	INIT_LIST_HEAD(head);
 	min = vaddr_to_offset(vma, start);
 	max = min + (end - start) - 1;
 
 	spin_lock(&uprobes_treelock);
-	n = find_node_in_range(inode, min, max);
+	n = find_node_in_range(inode, off_type, min, max);
 	if (n) {
 		for (t = n; t; t = rb_prev(t)) {
 			u = rb_entry(t, struct uprobe, rb_node);
-			if (u->inode != inode || u->offset < min)
+			offset = uprobe_get_offset(u, off_type);
+
+			if (u->inode != inode || offset < min)
 				break;
 			list_add(&u->pending_list, head);
 			get_uprobe(u);
 		}
 		for (t = n; (t = rb_next(t)); ) {
 			u = rb_entry(t, struct uprobe, rb_node);
-			if (u->inode != inode || u->offset > max)
+			offset = uprobe_get_offset(u, off_type);
+
+			if (u->inode != inode || offset > max)
 				break;
 			list_add(&u->pending_list, head);
 			get_uprobe(u);
@@ -1037,6 +1235,43 @@ static void build_probe_list(struct inode *inode,
 	spin_unlock(&uprobes_treelock);
 }
 
+static int
+delayed_uprobe_install(struct vm_area_struct *vma, struct inode *inode)
+{
+	struct list_head tmp_list;
+	struct uprobe *uprobe, *u;
+	struct delayed_uprobe *du;
+	unsigned long vaddr;
+	int ret = 0, err = 0;
+
+	build_probe_list(inode, REF_CTR_OFFSET, vma, vma->vm_start,
+			 vma->vm_end, &tmp_list);
+
+	list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
+		if (!uprobe->ref_ctr_offset ||
+		    !uprobe_is_active(uprobe) ||
+		    !valid_ref_ctr_vma(uprobe, vma))
+			goto cont;
+
+		mutex_lock(&delayed_uprobe_lock);
+		du = delayed_uprobe_check(uprobe, vma->vm_mm);
+		if (!du) {
+			mutex_unlock(&delayed_uprobe_lock);
+			goto cont;
+		}
+
+		vaddr = offset_to_vaddr(vma, uprobe->ref_ctr_offset);
+		ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
+		/* Record an error and continue. */
+		err = ret & !err ? ret : err;
+		delayed_uprobe_delete(du);
+		mutex_unlock(&delayed_uprobe_lock);
+cont:
+		put_uprobe(uprobe);
+	}
+	return err;
+}
+
 /*
  * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
  *
@@ -1049,7 +1284,7 @@ int uprobe_mmap(struct vm_area_struct *vma)
 	struct uprobe *uprobe, *u;
 	struct inode *inode;
 
-	if (no_uprobe_events() || !valid_vma(vma, true))
+	if (no_uprobe_events())
 		return 0;
 
 	inode = file_inode(vma->vm_file);
@@ -1057,7 +1292,14 @@ int uprobe_mmap(struct vm_area_struct *vma)
 		return 0;
 
 	mutex_lock(uprobes_mmap_hash(inode));
-	build_probe_list(inode, vma, vma->vm_start, vma->vm_end, &tmp_list);
+	if (vma->vm_flags & VM_WRITE)
+		delayed_uprobe_install(vma, inode);
+
+	if (!valid_vma(vma, true))
+		goto out;
+
+	build_probe_list(inode, UPROBE_OFFSET, vma, vma->vm_start,
+			 vma->vm_end, &tmp_list);
 	/*
 	 * We can race with uprobe_unregister(), this uprobe can be already
 	 * removed. But in this case filter_chain() must return false, all
@@ -1071,8 +1313,8 @@ int uprobe_mmap(struct vm_area_struct *vma)
 		}
 		put_uprobe(uprobe);
 	}
+out:
 	mutex_unlock(uprobes_mmap_hash(inode));
-
 	return 0;
 }
 
@@ -1089,7 +1331,7 @@ vma_has_uprobes(struct vm_area_struct *vma, unsigned long start, unsigned long e
 	max = min + (end - start) - 1;
 
 	spin_lock(&uprobes_treelock);
-	n = find_node_in_range(inode, min, max);
+	n = find_node_in_range(inode, UPROBE_OFFSET, min, max);
 	spin_unlock(&uprobes_treelock);
 
 	return !!n;
@@ -1223,6 +1465,10 @@ void uprobe_clear_state(struct mm_struct *mm)
 {
 	struct xol_area *area = mm->uprobes_state.xol_area;
 
+	mutex_lock(&delayed_uprobe_lock);
+	delayed_uprobe_remove(NULL, mm);
+	mutex_unlock(&delayed_uprobe_lock);
+
 	if (!area)
 		return;
 
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index bf89a51e740d..bf2be098eb08 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -59,6 +59,7 @@ struct trace_uprobe {
 	struct inode			*inode;
 	char				*filename;
 	unsigned long			offset;
+	unsigned long			ref_ctr_offset;
 	unsigned long			nhit;
 	struct trace_probe		tp;
 };
@@ -364,10 +365,10 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
 static int create_trace_uprobe(int argc, char **argv)
 {
 	struct trace_uprobe *tu;
-	char *arg, *event, *group, *filename;
+	char *arg, *event, *group, *filename, *rctr, *rctr_end;
 	char buf[MAX_EVENT_NAME_LEN];
 	struct path path;
-	unsigned long offset;
+	unsigned long offset, ref_ctr_offset;
 	bool is_delete, is_return;
 	int i, ret;
 
@@ -376,6 +377,7 @@ static int create_trace_uprobe(int argc, char **argv)
 	is_return = false;
 	event = NULL;
 	group = NULL;
+	ref_ctr_offset = 0;
 
 	/* argc must be >= 1 */
 	if (argv[0][0] == '-')
@@ -450,6 +452,26 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 
+	/* Parse reference counter offset if specified. */
+	rctr = strchr(arg, '(');
+	if (rctr) {
+		rctr_end = strchr(rctr, ')');
+		if (rctr > rctr_end || *(rctr_end + 1) != 0) {
+			ret = -EINVAL;
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+
+		*rctr++ = '\0';
+		*rctr_end = '\0';
+		ret = kstrtoul(rctr, 0, &ref_ctr_offset);
+		if (ret) {
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+	}
+
+	/* Parse uprobe offset. */
 	ret = kstrtoul(arg, 0, &offset);
 	if (ret)
 		goto fail_address_parse;
@@ -484,6 +506,7 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 	tu->offset = offset;
+	tu->ref_ctr_offset = ref_ctr_offset;
 	tu->path = path;
 	tu->filename = kstrdup(filename, GFP_KERNEL);
 
@@ -602,6 +625,9 @@ static int probes_seq_show(struct seq_file *m, void *v)
 			trace_event_name(&tu->tp.call), tu->filename,
 			(int)(sizeof(void *) * 2), tu->offset);
 
+	if (tu->ref_ctr_offset)
+		seq_printf(m, "(0x%lx)", tu->ref_ctr_offset);
+
 	for (i = 0; i < tu->tp.nr_args; i++)
 		seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
 
@@ -917,7 +943,13 @@ probe_event_enable(struct trace_uprobe *tu, struct trace_event_file *file,
 
 	tu->consumer.filter = filter;
 	tu->inode = d_real_inode(tu->path.dentry);
-	ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
+	if (tu->ref_ctr_offset) {
+		ret = uprobe_register_refctr(tu->inode, tu->offset,
+				tu->ref_ctr_offset, &tu->consumer);
+	} else {
+		ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
+	}
+
 	if (ret)
 		goto err_buffer;
 
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 07/10] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  2018-06-28  5:21 [PATCH v5 00/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (5 preceding siblings ...)
  2018-06-28  5:22 ` [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
@ 2018-06-28  5:22 ` Ravi Bangoria
  2018-06-28  5:22 ` [PATCH v5 08/10] Uprobes/sdt: " Ravi Bangoria
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-06-28  5:22 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, linux-arm-kernel, linux-mips, linux, ralf,
	paul.burton, Ravi Bangoria

We assume to have only one reference counter for one uprobe.
Don't allow user to add multiple trace_uprobe entries having
same inode+offset but different reference counter.

Ex,
  # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events
  # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0x10030)" >> uprobe_events
  bash: echo: write error: Invalid argument

  # dmesg
  trace_kprobe: Reference counter offset mismatch.

One exception to this is when user is trying to replace the old
entry with the new one. We allow this if the new entry does not
conflict with any other existing entry.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 kernel/trace/trace_uprobe.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index bf2be098eb08..7dd86f302f2a 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -324,6 +324,34 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu)
 	return 0;
 }
 
+/*
+ * Uprobe with multiple reference counter is not allowed. i.e.
+ * If inode and offset matches, reference counter offset *must*
+ * match as well. Only one exception to this is when we are
+ * replacing old trace_uprobe with new one(same group/event).
+ */
+static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new)
+{
+	struct trace_uprobe *tmp, *old = NULL;
+	struct inode *new_inode = d_real_inode(new->path.dentry);
+
+	old = find_probe_event(trace_event_name(&new->tp.call),
+				new->tp.call.class->system);
+	if (!new->ref_ctr_offset)
+		return old;
+
+	list_for_each_entry(tmp, &uprobe_list, list) {
+		if (new_inode == d_real_inode(tmp->path.dentry) &&
+		    new->offset == tmp->offset &&
+		    new->ref_ctr_offset != tmp->ref_ctr_offset &&
+		    tmp != old) {
+			pr_warn("Reference counter offset mismatch.");
+			return ERR_PTR(-EINVAL);
+		}
+	}
+	return old;
+}
+
 /* Register a trace_uprobe and probe_event */
 static int register_trace_uprobe(struct trace_uprobe *tu)
 {
@@ -333,8 +361,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
 	mutex_lock(&uprobe_lock);
 
 	/* register as an event */
-	old_tu = find_probe_event(trace_event_name(&tu->tp.call),
-			tu->tp.call.class->system);
+	old_tu = find_old_trace_uprobe(tu);
+	if (IS_ERR(old_tu)) {
+		ret = PTR_ERR(old_tu);
+		goto end;
+	}
+
 	if (old_tu) {
 		/* delete old event */
 		ret = unregister_trace_uprobe(old_tu);
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 08/10] Uprobes/sdt: Prevent multiple reference counter for same uprobe
  2018-06-28  5:21 [PATCH v5 00/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (6 preceding siblings ...)
  2018-06-28  5:22 ` [PATCH v5 07/10] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
@ 2018-06-28  5:22 ` Ravi Bangoria
  2018-06-28  5:22 ` [PATCH v5 09/10] Uprobes/sdt: Document about reference counter Ravi Bangoria
  2018-06-28  5:22 ` [PATCH v5 10/10] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
  9 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-06-28  5:22 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, linux-arm-kernel, linux-mips, linux, ralf,
	paul.burton, Ravi Bangoria

We assume to have only one reference counter for one uprobe.
Don't allow user to register multiple uprobes having same
inode+offset but different reference counter.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 kernel/events/uprobes.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 61f9b0024794..feefeeb6d2c6 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -662,6 +662,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 	cur_uprobe = insert_uprobe(uprobe);
 	/* a uprobe exists for this inode:offset combination */
 	if (cur_uprobe) {
+		if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
+			pr_warn("Err: Reference counter mismatch.\n");
+			put_uprobe(cur_uprobe);
+			kfree(uprobe);
+			return ERR_PTR(-EINVAL);
+		}
 		kfree(uprobe);
 		uprobe = cur_uprobe;
 	}
@@ -1076,6 +1082,9 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (!uprobe)
 		return -ENOMEM;
+	if (IS_ERR(uprobe))
+		return PTR_ERR(uprobe);
+
 	/*
 	 * We can race with uprobe_unregister()->delete_uprobe().
 	 * Check uprobe_is_active() and retry if it is false.
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 09/10] Uprobes/sdt: Document about reference counter
  2018-06-28  5:21 [PATCH v5 00/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (7 preceding siblings ...)
  2018-06-28  5:22 ` [PATCH v5 08/10] Uprobes/sdt: " Ravi Bangoria
@ 2018-06-28  5:22 ` Ravi Bangoria
  2018-07-02 14:54   ` Srikar Dronamraju
  2018-06-28  5:22 ` [PATCH v5 10/10] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
  9 siblings, 1 reply; 50+ messages in thread
From: Ravi Bangoria @ 2018-06-28  5:22 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, linux-arm-kernel, linux-mips, linux, ralf,
	paul.burton, Ravi Bangoria

Reference counter gate the invocation of probe. If present,
by default reference count is 0. Kernel needs to increment
it before tracing the probe and decrement it when done. This
is identical to semaphore in Userspace Statically Defined
Tracepoints (USDT).

Document usage of reference counter.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Documentation/trace/uprobetracer.rst | 16 +++++++++++++---
 kernel/trace/trace.c                 |  2 +-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/uprobetracer.rst b/Documentation/trace/uprobetracer.rst
index 98d3f692957a..0c27c654a21f 100644
--- a/Documentation/trace/uprobetracer.rst
+++ b/Documentation/trace/uprobetracer.rst
@@ -22,15 +22,25 @@ Synopsis of uprobe_tracer
 -------------------------
 ::
 
-  p[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a uprobe
-  r[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a return uprobe (uretprobe)
-  -:[GRP/]EVENT                           : Clear uprobe or uretprobe event
+  p[:[GRP/]EVENT] PATH:OFFSET[(REF_CTR_OFFSET)] [FETCHARGS]
+  r[:[GRP/]EVENT] PATH:OFFSET[(REF_CTR_OFFSET)] [FETCHARGS]
+  -:[GRP/]EVENT
+
+  p : Set a uprobe
+  r : Set a return uprobe (uretprobe)
+  - : Clear uprobe or uretprobe event
 
   GRP           : Group name. If omitted, "uprobes" is the default value.
   EVENT         : Event name. If omitted, the event name is generated based
                   on PATH+OFFSET.
   PATH          : Path to an executable or a library.
   OFFSET        : Offset where the probe is inserted.
+  REF_CTR_OFFSET: Reference counter offset. Optional field. Reference count
+                 gate the invocation of probe. If present, by default
+                 reference count is 0. Kernel needs to increment it before
+                 tracing the probe and decrement it when done. This is
+                 identical to semaphore in Userspace Statically Defined
+                 Tracepoints (USDT).
 
   FETCHARGS     : Arguments. Each probe can have up to 128 args.
    %REG         : Fetch register REG
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a0079b4c7a49..a46268e2aa6a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4609,7 +4609,7 @@ static const char readme_msg[] =
   "place (kretprobe): [<module>:]<symbol>[+<offset>]|<memaddr>\n"
 #endif
 #ifdef CONFIG_UPROBE_EVENTS
-	"\t    place: <path>:<offset>\n"
+  "   place (uprobe): <path>:<offset>[(ref_ctr_offset)]\n"
 #endif
 	"\t     args: <name>=fetcharg[:type]\n"
 	"\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v5 10/10] perf probe: Support SDT markers having reference counter (semaphore)
  2018-06-28  5:21 [PATCH v5 00/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (8 preceding siblings ...)
  2018-06-28  5:22 ` [PATCH v5 09/10] Uprobes/sdt: Document about reference counter Ravi Bangoria
@ 2018-06-28  5:22 ` Ravi Bangoria
  2018-07-02 14:45   ` Srikar Dronamraju
  2018-07-02 14:57   ` Srikar Dronamraju
  9 siblings, 2 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-06-28  5:22 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, linux-arm-kernel, linux-mips, linux, ralf,
	paul.burton, Ravi Bangoria

With this, perf buildid-cache will save SDT markers with reference
counter in probe cache. Perf probe will be able to probe markers
having reference counter. Ex,

  # readelf -n /tmp/tick | grep -A1 loop2
    Name: loop2
    ... Semaphore: 0x0000000010020036

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 tools/perf/util/probe-event.c | 39 ++++++++++++++++++++++++++++++++----
 tools/perf/util/probe-event.h |  1 +
 tools/perf/util/probe-file.c  | 34 ++++++++++++++++++++++++++------
 tools/perf/util/probe-file.h  |  1 +
 tools/perf/util/symbol-elf.c  | 46 ++++++++++++++++++++++++++++++++-----------
 tools/perf/util/symbol.h      |  7 +++++++
 6 files changed, 106 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index f119eb628dbb..e86f8be89157 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev)
 			tp->offset = strtoul(fmt2_str, NULL, 10);
 	}
 
+	if (tev->uprobes) {
+		fmt2_str = strchr(p, '(');
+		if (fmt2_str)
+			tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
+	}
+
 	tev->nargs = argc - 2;
 	tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
 	if (tev->args == NULL) {
@@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg,
 	return err;
 }
 
+static int
+synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf)
+{
+	struct probe_trace_point *tp = &tev->point;
+	int err;
+
+	err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address);
+
+	if (err >= 0 && tp->ref_ctr_offset) {
+		if (!uprobe_ref_ctr_is_supported())
+			return -1;
+		err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset);
+	}
+	return err >= 0 ? 0 : -1;
+}
+
 char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 {
 	struct probe_trace_point *tp = &tev->point;
@@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 	}
 
 	/* Use the tp->address for uprobes */
-	if (tev->uprobes)
-		err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
-	else if (!strncmp(tp->symbol, "0x", 2))
+	if (tev->uprobes) {
+		err = synthesize_uprobe_trace_def(tev, &buf);
+	} else if (!strncmp(tp->symbol, "0x", 2)) {
 		/* Absolute address. See try_to_find_absolute_address() */
 		err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
 				  tp->module ? ":" : "", tp->address);
-	else
+	} else {
 		err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
 				tp->module ? ":" : "", tp->symbol, tp->offset);
+	}
+
 	if (err)
 		goto error;
 
@@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev)
 {
 	int i;
 	char *buf = synthesize_probe_trace_command(tev);
+	struct probe_trace_point *tp = &tev->point;
+
+	if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) {
+		pr_warning("A semaphore is associated with %s:%s and "
+			   "seems your kernel doesn't support it.\n",
+			   tev->group, tev->event);
+	}
 
 	/* Old uprobe event doesn't support memory dereference */
 	if (!tev->uprobes || tev->nargs == 0 || !buf)
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 45b14f020558..15a98c3a2a2f 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -27,6 +27,7 @@ struct probe_trace_point {
 	char		*symbol;	/* Base symbol */
 	char		*module;	/* Module name */
 	unsigned long	offset;		/* Offset from symbol */
+	unsigned long	ref_ctr_offset;	/* SDT reference counter offset */
 	unsigned long	address;	/* Actual address of the trace point */
 	bool		retprobe;	/* Return probe flag */
 };
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index b76088fadf3d..aac7817d9e14 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -696,8 +696,16 @@ int probe_cache__add_entry(struct probe_cache *pcache,
 #ifdef HAVE_GELF_GETNOTE_SUPPORT
 static unsigned long long sdt_note__get_addr(struct sdt_note *note)
 {
-	return note->bit32 ? (unsigned long long)note->addr.a32[0]
-		 : (unsigned long long)note->addr.a64[0];
+	return note->bit32 ?
+		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_LOC] :
+		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_LOC];
+}
+
+static unsigned long long sdt_note__get_ref_ctr_offset(struct sdt_note *note)
+{
+	return note->bit32 ?
+		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_REFCTR] :
+		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_REFCTR];
 }
 
 static const char * const type_to_suffix[] = {
@@ -775,14 +783,21 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note,
 {
 	struct strbuf buf;
 	char *ret = NULL, **args;
-	int i, args_count;
+	int i, args_count, err;
+	unsigned long long ref_ctr_offset;
 
 	if (strbuf_init(&buf, 32) < 0)
 		return NULL;
 
-	if (strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
-				sdtgrp, note->name, pathname,
-				sdt_note__get_addr(note)) < 0)
+	err = strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
+			sdtgrp, note->name, pathname,
+			sdt_note__get_addr(note));
+
+	ref_ctr_offset = sdt_note__get_ref_ctr_offset(note);
+	if (ref_ctr_offset && err >= 0)
+		err = strbuf_addf(&buf, "(0x%llx)", ref_ctr_offset);
+
+	if (err < 0)
 		goto error;
 
 	if (!note->args)
@@ -998,6 +1013,7 @@ int probe_cache__show_all_caches(struct strfilter *filter)
 enum ftrace_readme {
 	FTRACE_README_PROBE_TYPE_X = 0,
 	FTRACE_README_KRETPROBE_OFFSET,
+	FTRACE_README_UPROBE_REF_CTR,
 	FTRACE_README_END,
 };
 
@@ -1009,6 +1025,7 @@ static struct {
 	[idx] = {.pattern = pat, .avail = false}
 	DEFINE_TYPE(FTRACE_README_PROBE_TYPE_X, "*type: * x8/16/32/64,*"),
 	DEFINE_TYPE(FTRACE_README_KRETPROBE_OFFSET, "*place (kretprobe): *"),
+	DEFINE_TYPE(FTRACE_README_UPROBE_REF_CTR, "*ref_ctr_offset*"),
 };
 
 static bool scan_ftrace_readme(enum ftrace_readme type)
@@ -1064,3 +1081,8 @@ bool kretprobe_offset_is_supported(void)
 {
 	return scan_ftrace_readme(FTRACE_README_KRETPROBE_OFFSET);
 }
+
+bool uprobe_ref_ctr_is_supported(void)
+{
+	return scan_ftrace_readme(FTRACE_README_UPROBE_REF_CTR);
+}
diff --git a/tools/perf/util/probe-file.h b/tools/perf/util/probe-file.h
index 63f29b1d22c1..2a249182f2a6 100644
--- a/tools/perf/util/probe-file.h
+++ b/tools/perf/util/probe-file.h
@@ -69,6 +69,7 @@ struct probe_cache_entry *probe_cache__find_by_name(struct probe_cache *pcache,
 int probe_cache__show_all_caches(struct strfilter *filter);
 bool probe_type_is_available(enum probe_type type);
 bool kretprobe_offset_is_supported(void);
+bool uprobe_ref_ctr_is_supported(void);
 #else	/* ! HAVE_LIBELF_SUPPORT */
 static inline struct probe_cache *probe_cache__new(const char *tgt __maybe_unused, struct nsinfo *nsi __maybe_unused)
 {
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 29770ea61768..0281d5e2cd67 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1947,6 +1947,34 @@ void kcore_extract__delete(struct kcore_extract *kce)
 }
 
 #ifdef HAVE_GELF_GETNOTE_SUPPORT
+
+static void sdt_adjust_loc(struct sdt_note *tmp, GElf_Addr base_off)
+{
+	if (!base_off)
+		return;
+
+	if (tmp->bit32)
+		tmp->addr.a32[SDT_NOTE_IDX_LOC] =
+			tmp->addr.a32[SDT_NOTE_IDX_LOC] + base_off -
+			tmp->addr.a32[SDT_NOTE_IDX_BASE];
+	else
+		tmp->addr.a64[SDT_NOTE_IDX_LOC] =
+			tmp->addr.a64[SDT_NOTE_IDX_LOC] + base_off -
+			tmp->addr.a64[SDT_NOTE_IDX_BASE];
+}
+
+static void sdt_adjust_refctr(struct sdt_note *tmp, GElf_Addr base_addr,
+			      GElf_Addr base_off)
+{
+	if (!base_off)
+		return;
+
+	if (tmp->bit32 && tmp->addr.a32[SDT_NOTE_IDX_REFCTR])
+		tmp->addr.a32[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
+	else if (tmp->addr.a64[SDT_NOTE_IDX_REFCTR])
+		tmp->addr.a64[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
+}
+
 /**
  * populate_sdt_note : Parse raw data and identify SDT note
  * @elf: elf of the opened file
@@ -1964,7 +1992,6 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
 	const char *provider, *name, *args;
 	struct sdt_note *tmp = NULL;
 	GElf_Ehdr ehdr;
-	GElf_Addr base_off = 0;
 	GElf_Shdr shdr;
 	int ret = -EINVAL;
 
@@ -2060,17 +2087,12 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
 	 * base address in the description of the SDT note. If its different,
 	 * then accordingly, adjust the note location.
 	 */
-	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL)) {
-		base_off = shdr.sh_offset;
-		if (base_off) {
-			if (tmp->bit32)
-				tmp->addr.a32[0] = tmp->addr.a32[0] + base_off -
-					tmp->addr.a32[1];
-			else
-				tmp->addr.a64[0] = tmp->addr.a64[0] + base_off -
-					tmp->addr.a64[1];
-		}
-	}
+	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL))
+		sdt_adjust_loc(tmp, shdr.sh_offset);
+
+	/* Adjust reference counter offset */
+	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_PROBES_SCN, NULL))
+		sdt_adjust_refctr(tmp, shdr.sh_addr, shdr.sh_offset);
 
 	list_add_tail(&tmp->note_list, sdt_notes);
 	return 0;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index f25fae4b5743..20f49779116b 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -379,12 +379,19 @@ int get_sdt_note_list(struct list_head *head, const char *target);
 int cleanup_sdt_note_list(struct list_head *sdt_notes);
 int sdt_notes__get_count(struct list_head *start);
 
+#define SDT_PROBES_SCN ".probes"
 #define SDT_BASE_SCN ".stapsdt.base"
 #define SDT_NOTE_SCN  ".note.stapsdt"
 #define SDT_NOTE_TYPE 3
 #define SDT_NOTE_NAME "stapsdt"
 #define NR_ADDR 3
 
+enum {
+	SDT_NOTE_IDX_LOC = 0,
+	SDT_NOTE_IDX_BASE,
+	SDT_NOTE_IDX_REFCTR,
+};
+
 struct mem_info *mem_info__new(void);
 struct mem_info *mem_info__get(struct mem_info *mi);
 void   mem_info__put(struct mem_info *mi);
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-28  5:22 ` [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
@ 2018-06-28 19:51   ` Oleg Nesterov
  2018-06-29  3:23     ` Ravi Bangoria
  2018-07-01 21:09   ` Oleg Nesterov
  2018-07-02 16:01   ` Srikar Dronamraju
  2 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2018-06-28 19:51 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

I have to admit that after a quick glance I can't understand this patch
at all... I'll try to read it again tomorrow, but could you at least explain
how find_node_in_range/build_probe_list can work if off_type==REF_CTR_OFFSET?

On 06/28, Ravi Bangoria wrote:
>
> -find_node_in_range(struct inode *inode, loff_t min, loff_t max)
> +find_node_in_range(struct inode *inode, int off_type, loff_t min, loff_t max)
>  {
>  	struct rb_node *n = uprobes_tree.rb_node;
>  
>  	while (n) {
>  		struct uprobe *u = rb_entry(n, struct uprobe, rb_node);
> +		loff_t offset = uprobe_get_offset(u, off_type);
>  
>  		if (inode < u->inode) {
>  			n = n->rb_left;
>  		} else if (inode > u->inode) {
>  			n = n->rb_right;
>  		} else {
> -			if (max < u->offset)
> +			if (max < offset)
>  				n = n->rb_left;
> -			else if (min > u->offset)
> +			else if (min > offset)
>  				n = n->rb_right;
>  			else
>  				break;

To simplify, lets forget about uprobe->inode (which acts as a key too). So uprobes_tree
is a binary tree sorted by uprobe->offset key and that is why the binary search works.

But it is not sorted by uprobe->ref_ctr_offset. So for example n->rb_left can have the
n->ref_ctr_offset key that is greater than the n's ref_ctr_offset. So how we can use the
binary search if REF_CTR_OFFSET?

I must have missed something, I assume you tested this patch and it works somehow...

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-28 19:51   ` Oleg Nesterov
@ 2018-06-29  3:23     ` Ravi Bangoria
  0 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-06-29  3:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Oleg,

On 06/29/2018 01:21 AM, Oleg Nesterov wrote:
> I have to admit that after a quick glance I can't understand this patch
> at all... I'll try to read it again tomorrow, but could you at least explain
> how find_node_in_range/build_probe_list can work if off_type==REF_CTR_OFFSET?
> 
> On 06/28, Ravi Bangoria wrote:
>>
>> -find_node_in_range(struct inode *inode, loff_t min, loff_t max)
>> +find_node_in_range(struct inode *inode, int off_type, loff_t min, loff_t max)
>>  {
>>  	struct rb_node *n = uprobes_tree.rb_node;
>>  
>>  	while (n) {
>>  		struct uprobe *u = rb_entry(n, struct uprobe, rb_node);
>> +		loff_t offset = uprobe_get_offset(u, off_type);
>>  
>>  		if (inode < u->inode) {
>>  			n = n->rb_left;
>>  		} else if (inode > u->inode) {
>>  			n = n->rb_right;
>>  		} else {
>> -			if (max < u->offset)
>> +			if (max < offset)
>>  				n = n->rb_left;
>> -			else if (min > u->offset)
>> +			else if (min > offset)
>>  				n = n->rb_right;
>>  			else
>>  				break;
> 
> To simplify, lets forget about uprobe->inode (which acts as a key too). So uprobes_tree
> is a binary tree sorted by uprobe->offset key and that is why the binary search works.
> 
> But it is not sorted by uprobe->ref_ctr_offset. So for example n->rb_left can have the
> n->ref_ctr_offset key that is greater than the n's ref_ctr_offset. So how we can use the
> binary search if REF_CTR_OFFSET?
> 
> I must have missed something, I assume you tested this patch and it works somehow...
> 

Right, I've misinterpreted that code. Will check it.

Thanks for explaining,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-28  5:22 ` [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
  2018-06-28 19:51   ` Oleg Nesterov
@ 2018-07-01 21:09   ` Oleg Nesterov
  2018-07-02  5:16     ` Ravi Bangoria
  2018-07-02 16:01   ` Srikar Dronamraju
  2 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2018-07-01 21:09 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On 06/28, Ravi Bangoria wrote:
>
> @@ -294,6 +462,15 @@ int uprobe_write_opcode(struct uprobe *uprobe, struct mm_struct *mm,
>  	if (ret <= 0)
>  		goto put_old;
>  
> +	/* Update the ref_ctr if we are going to replace instruction. */
> +	if (!ref_ctr_updated) {
> +		ret = update_ref_ctr(uprobe, mm, is_register);
> +		if (ret)
> +			goto put_old;
> +
> +		ref_ctr_updated = 1;
> +	}

Why can't this code live in install_breakpoint() and remove_breakpoint() ?
this way we do not need to export "struct uprobe" and change set_swbp/set_orig_insn,
and the logic will be more simple.

And let me ask again... May be you have already explained this, but I can't
find the previous discussion.

So why do we need a counter but not a boolean? IIRC, because the counter can
be shared, in particular 2 different uprobes can have the same >ref_ctr_offset,
right?

But who else can use this counter and how? Say, can userspace update it too?
If yes, why this can't race with __update_ref_ctr() ?

And btw, why does __update_ref_ctr() use FOLL_FORCE? This vma should be writeable
or valid_ref_ctr_vma() should nack it?

And shouldn't valid_ref_ctr_vma() check VM_SHARED? IIUC we do not want to write
to this file?

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-01 21:09   ` Oleg Nesterov
@ 2018-07-02  5:16     ` Ravi Bangoria
  2018-07-02 18:01       ` Oleg Nesterov
  0 siblings, 1 reply; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-02  5:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Oleg,

On 07/02/2018 02:39 AM, Oleg Nesterov wrote:
> On 06/28, Ravi Bangoria wrote:
>>
>> @@ -294,6 +462,15 @@ int uprobe_write_opcode(struct uprobe *uprobe, struct mm_struct *mm,
>>  	if (ret <= 0)
>>  		goto put_old;
>>  
>> +	/* Update the ref_ctr if we are going to replace instruction. */
>> +	if (!ref_ctr_updated) {
>> +		ret = update_ref_ctr(uprobe, mm, is_register);
>> +		if (ret)
>> +			goto put_old;
>> +
>> +		ref_ctr_updated = 1;
>> +	}
> 
> Why can't this code live in install_breakpoint() and remove_breakpoint() ?
> this way we do not need to export "struct uprobe" and change set_swbp/set_orig_insn,
> and the logic will be more simple.

IMO, it's more easier with current approach. Updating reference counter
inside uprobe_write_opcode() makes it tightly coupled with instruction
patching. Basically, reference counter gets incremented only when first
consumer gets activated and will get decremented only when last consumer
is going away.

Advantage is, we can get rid of sdt_mm_list patch*, because increment and
decrement are anyway happening in sync. This makes the implementation lot
more simpler. If I do it inside install_breakpoit()/ remove_breakpoint(),
I've to reintroduce sdt_mm_list which makes code more complicated and ugly.

* https://lkml.org/lkml/2018/4/17/28

BTW, is there any harm in exporting struct uprobe outside of uprobe.c?

> 
> And let me ask again... May be you have already explained this, but I can't
> find the previous discussion.
> 
> So why do we need a counter but not a boolean? IIRC, because the counter can
> be shared, in particular 2 different uprobes can have the same >ref_ctr_offset,
> right?

Actually, it's by design. This counter keeps track of current tracers
tracing on a particular SDT marker. So only boolean can't work here.
Also, yes, multiple markers can share the same reference counter.

> 
> But who else can use this counter and how? Say, can userspace update it too?

There are many different ways user can change the reference counter.
Ex, systemtap and bcc both uses uprobe to probe on a marker but reference
counter update logic is different in both of them. Systemtap records all
exec/mmap events and updates the counter when it finds interested process/
vma. bcc directly hooks into process's memory (/proc/pid/mem).

> If yes, why this can't race with __update_ref_ctr() ?

Right. But there is no synchronization mechanism provided by the SDT
infrastructure and this is all userspace so there are chances of race.
At least, this patch still tries to make sure that reference counter
does not go negative. If so, throw a warning and don't update it.

> 
> And btw, why does __update_ref_ctr() use FOLL_FORCE? This vma should be writeable
> or valid_ref_ctr_vma() should nack it?

I don't remember the exact reason but seem its unnecessary. Let me try to
recall the reason, otherwise will remove it in the next version.

> 
> And shouldn't valid_ref_ctr_vma() check VM_SHARED? IIUC we do not want to write
> to this file?

Hmm, I haven't seen reference counter shared across processes. But as this
is a generic infrastructure, I'll add a check there.

Thanks for the review,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 10/10] perf probe: Support SDT markers having reference counter (semaphore)
  2018-06-28  5:22 ` [PATCH v5 10/10] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
@ 2018-07-02 14:45   ` Srikar Dronamraju
  2018-07-02 14:57   ` Srikar Dronamraju
  1 sibling, 0 replies; 50+ messages in thread
From: Srikar Dronamraju @ 2018-07-02 14:45 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, rostedt, mhiramat, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

* Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-06-28 10:52:09]:

> With this, perf buildid-cache will save SDT markers with reference
> counter in probe cache. Perf probe will be able to probe markers
> having reference counter. Ex,
> 
>   # readelf -n /tmp/tick | grep -A1 loop2
>     Name: loop2
>     ... Semaphore: 0x0000000010020036
> 
>   # ./perf buildid-cache --add /tmp/tick
>   # ./perf probe sdt_tick:loop2
>   # ./perf stat -e sdt_tick:loop2 /tmp/tick
>     hi: 0
>     hi: 1
>     hi: 2
>     ^C
>      Performance counter stats for '/tmp/tick':
>                  3      sdt_tick:loop2
>        2.561851452 seconds time elapsed
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> 

Looks good to me.

Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 09/10] Uprobes/sdt: Document about reference counter
  2018-06-28  5:22 ` [PATCH v5 09/10] Uprobes/sdt: Document about reference counter Ravi Bangoria
@ 2018-07-02 14:54   ` Srikar Dronamraju
  2018-07-03  7:50     ` Ravi Bangoria
  0 siblings, 1 reply; 50+ messages in thread
From: Srikar Dronamraju @ 2018-07-02 14:54 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, rostedt, mhiramat, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

* Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-06-28 10:52:08]:

> Reference counter gate the invocation of probe. If present,
> by default reference count is 0. Kernel needs to increment
> it before tracing the probe and decrement it when done. This
> is identical to semaphore in Userspace Statically Defined
> Tracepoints (USDT).
> 
> Document usage of reference counter.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>

Unlike perf, this mechanism cannot detect ref count and depends on the
users data. What happens if the user mistakenly provides a wrong location?
I guess he can corrupt some other data structures?

Hence I would think twice of advertising this mechanism. i.e keep this
as an undocumented feature.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 10/10] perf probe: Support SDT markers having reference counter (semaphore)
  2018-06-28  5:22 ` [PATCH v5 10/10] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
  2018-07-02 14:45   ` Srikar Dronamraju
@ 2018-07-02 14:57   ` Srikar Dronamraju
  2018-07-03  8:00     ` Ravi Bangoria
  1 sibling, 1 reply; 50+ messages in thread
From: Srikar Dronamraju @ 2018-07-02 14:57 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, rostedt, mhiramat, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

* Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-06-28 10:52:09]:

> With this, perf buildid-cache will save SDT markers with reference
> counter in probe cache. Perf probe will be able to probe markers
> having reference counter. Ex,
> 
>   # readelf -n /tmp/tick | grep -A1 loop2
>     Name: loop2
>     ... Semaphore: 0x0000000010020036
> 
>   # ./perf buildid-cache --add /tmp/tick

Would this perf buildid-cache work if the executable is stripped of
symbols?

>   # ./perf probe sdt_tick:loop2
>   # ./perf stat -e sdt_tick:loop2 /tmp/tick
>     hi: 0
>     hi: 1
>     hi: 2
>     ^C
>      Performance counter stats for '/tmp/tick':
>                  3      sdt_tick:loop2
>        2.561851452 seconds time elapsed

Also can we document how to use SDT markers with perf under perf-probe
or perf-build-cache?


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-28  5:22 ` [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
  2018-06-28 19:51   ` Oleg Nesterov
  2018-07-01 21:09   ` Oleg Nesterov
@ 2018-07-02 16:01   ` Srikar Dronamraju
  2018-07-02 18:05     ` Oleg Nesterov
                       ` (2 more replies)
  2 siblings, 3 replies; 50+ messages in thread
From: Srikar Dronamraju @ 2018-07-02 16:01 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, rostedt, mhiramat, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

> Implement the reference counter logic in core uprobe. User will be
> able to use it from trace_uprobe as well as from kernel module. New
> trace_uprobe definition with reference counter will now be:
> 
>     <path>:<offset>[(ref_ctr_offset)]
> 
> where ref_ctr_offset is an optional field. For kernel module, new
> variant of uprobe_register() has been introduced:
> 
>     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
> 

Sorry for bringing this again, but I would actually think the ref_ctr is
a consumer property. i.e the ref_ctr_offset should be part of
uprobe_consumer.

The advantages of doing that would be
1. Dont need to expose uprobe structure and just update our
uprobe_consumer which is already an exported structure.
- Exporting uprobe structure would expose some of our internal
  implementation details, basically reduce the freedom of changing stuff
  internally.
- we came up with uprobe_arch for the parts that we wanted to expose
  to archs. exposing uprobe and uprobe_arch looks weird.

2. ref_ctr_offset is necessarily a consumer property, its not a uprobe
property at all.

3. We dont need to change/add new uprobe_register functions.

The way I look at it is.

Based on the ref_ctr_offset field in consumer, we update_ref_ctr()
around install_breakpoint/remove_breakpoint.

> +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
> +{
> +	struct delayed_uprobe *du;
> +
> +	if (delayed_uprobe_check(uprobe, mm))
> +		return 0;
> +
> +	du  = kzalloc(sizeof(*du), GFP_KERNEL);
> +	if (!du)
> +		return -ENOMEM;
> +
> +	du->uprobe = uprobe;
> +	du->mm = mm;
> +	list_add(&du->list, &delayed_uprobe_list);
> +	return 0;
> +}
> +

If I understood the delayed_uprobe stuff, its when we could insert a
breakpoint but the vma that has the ref_ctr_offset is not loaded. Is
that correct?

> 
> -- 
> 2.14.4
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-02  5:16     ` Ravi Bangoria
@ 2018-07-02 18:01       ` Oleg Nesterov
  2018-07-03  5:30         ` Ravi Bangoria
  0 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2018-07-02 18:01 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On 07/02, Ravi Bangoria wrote:
>
> Hi Oleg,
>
> On 07/02/2018 02:39 AM, Oleg Nesterov wrote:
> > On 06/28, Ravi Bangoria wrote:
> >>
> >> @@ -294,6 +462,15 @@ int uprobe_write_opcode(struct uprobe *uprobe, struct mm_struct *mm,
> >>  	if (ret <= 0)
> >>  		goto put_old;
> >>
> >> +	/* Update the ref_ctr if we are going to replace instruction. */
> >> +	if (!ref_ctr_updated) {
> >> +		ret = update_ref_ctr(uprobe, mm, is_register);
> >> +		if (ret)
> >> +			goto put_old;
> >> +
> >> +		ref_ctr_updated = 1;
> >> +	}
> >
> > Why can't this code live in install_breakpoint() and remove_breakpoint() ?
> > this way we do not need to export "struct uprobe" and change set_swbp/set_orig_insn,
> > and the logic will be more simple.
>
> IMO, it's more easier with current approach. Updating reference counter
> inside uprobe_write_opcode() makes it tightly coupled with instruction
> patching. Basically, reference counter gets incremented only when first
> consumer gets activated and will get decremented only when last consumer
> is going away.
>
> Advantage is, we can get rid of sdt_mm_list patch*, because increment and
> decrement are anyway happening in sync. This makes the implementation lot
> more simpler. If I do it inside install_breakpoit()/ remove_breakpoint(),
> I've to reintroduce sdt_mm_list which makes code more complicated and ugly.

Why? I do not understand. Afaics you can have the same logic, but the resulting
code will be simpler and more clear.

> BTW, is there any harm in exporting struct uprobe outside of uprobe.c?

I think it is always better to avoid the exporting if possible (and avoid
changing the arch-dependant set_swbp/set_orig_insn). But once again, I think
this only complicates the code for no reason.

> > So why do we need a counter but not a boolean? IIRC, because the counter can
> > be shared, in particular 2 different uprobes can have the same >ref_ctr_offset,
> > right?
>
> Actually, it's by design. This counter keeps track of current tracers
> tracing on a particular SDT marker. So only boolean can't work here.
> Also, yes, multiple markers can share the same reference counter.

markers or uprobes? OK, I'll assume that multiple uprobes can share the counter.

> > But who else can use this counter and how? Say, can userspace update it too?
>
> There are many different ways user can change the reference counter.
> Ex, systemtap and bcc both uses uprobe to probe on a marker but reference
> counter update logic is different in both of them. Systemtap records all
> exec/mmap events and updates the counter when it finds interested process/
> vma. bcc directly hooks into process's memory (/proc/pid/mem).

OK, and how exactly they update the counter? I mean, can we assume that, say,
bcc or systemtap can only increment or decrement it?

If yes, perhaps we can simplify the kernel code...

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-02 16:01   ` Srikar Dronamraju
@ 2018-07-02 18:05     ` Oleg Nesterov
  2018-07-03  6:29     ` Ravi Bangoria
  2018-07-04  6:07     ` Ravi Bangoria
  2 siblings, 0 replies; 50+ messages in thread
From: Oleg Nesterov @ 2018-07-02 18:05 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ravi Bangoria, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On 07/02, Srikar Dronamraju wrote:
>
> > Implement the reference counter logic in core uprobe. User will be
> > able to use it from trace_uprobe as well as from kernel module. New
> > trace_uprobe definition with reference counter will now be:
> >
> >     <path>:<offset>[(ref_ctr_offset)]
> >
> > where ref_ctr_offset is an optional field. For kernel module, new
> > variant of uprobe_register() has been introduced:
> >
> >     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
> >
>
> Sorry for bringing this again, but I would actually think the ref_ctr is
> a consumer property. i.e the ref_ctr_offset should be part of
> uprobe_consumer.

Damn yes ;) I was thinking about this too but decided to discuss this later.

But this probably means more complications, I am not sure about the actual
implementation.

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-02 18:01       ` Oleg Nesterov
@ 2018-07-03  5:30         ` Ravi Bangoria
  2018-07-03  6:16           ` Srikar Dronamraju
                             ` (3 more replies)
  0 siblings, 4 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-03  5:30 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Oleg,

On 07/02/2018 11:31 PM, Oleg Nesterov wrote:
> On 07/02, Ravi Bangoria wrote:
>>
>> Hi Oleg,
>>
>> On 07/02/2018 02:39 AM, Oleg Nesterov wrote:
>>> On 06/28, Ravi Bangoria wrote:
>>>>
>>>> @@ -294,6 +462,15 @@ int uprobe_write_opcode(struct uprobe *uprobe, struct mm_struct *mm,
>>>>  	if (ret <= 0)
>>>>  		goto put_old;
>>>>
>>>> +	/* Update the ref_ctr if we are going to replace instruction. */
>>>> +	if (!ref_ctr_updated) {
>>>> +		ret = update_ref_ctr(uprobe, mm, is_register);
>>>> +		if (ret)
>>>> +			goto put_old;
>>>> +
>>>> +		ref_ctr_updated = 1;
>>>> +	}
>>>
>>> Why can't this code live in install_breakpoint() and remove_breakpoint() ?
>>> this way we do not need to export "struct uprobe" and change set_swbp/set_orig_insn,
>>> and the logic will be more simple.
>>
>> IMO, it's more easier with current approach. Updating reference counter
>> inside uprobe_write_opcode() makes it tightly coupled with instruction
>> patching. Basically, reference counter gets incremented only when first
>> consumer gets activated and will get decremented only when last consumer
>> is going away.
>>
>> Advantage is, we can get rid of sdt_mm_list patch*, because increment and
>> decrement are anyway happening in sync. This makes the implementation lot
>> more simpler. If I do it inside install_breakpoit()/ remove_breakpoint(),
>> I've to reintroduce sdt_mm_list which makes code more complicated and ugly.
> 
> Why? I do not understand. Afaics you can have the same logic, but the resulting
> code will be simpler and more clear.


Ok let me explain the difference.

Current approach:

    ------------
    register_for_each_vma() / uprobe_mmap()
      install_breakpoint()
        uprobe_write_opcode() {
                if (instruction is not already patched) {
                        /* Gets called only _once_. */
                        increment the reference counter;
                        patch the instruction;
                }
        }
    ------------

Here, reference counter is updated only if we are going to patch the
instruction. No matter how many consumers are there, no matter how many
times install_breakpoint() gets called. Same case for decrement as well.
Reference counter will get decremented only when we are going to reset
the instruction.

Now, if I put it inside install_breakpoint():

    ------------
    uprobe_register()
      register_for_each_vma()
        install_breakpoint() {
                /* Called _for each consumer_ */
                increment the reference counter _once_;
                uprobe_write_opcode()
		...
        }

  OR

    uprobe_mmap()
      install_breakpoint() {
                increment the reference counter _for each consumer_;
                uprobe_write_opcode()
		...
      }
    ------------

By putting it inside install_breaopoint() / remove_breakpoint(), we
increment and decremnet the counter for each consumer.

Consider this python example:

    ------------
    # readelf -n /usr/lib64/libpython2.7.so.1.0 | grep -A2 Provider
      Provider: python
      Name: function__entry
      ... Semaphore: 0x00000000002899d8

    # sudo perf probe sdt_python:function__entry

    # strace python
      mmap(NULL, 2738968, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fff92460000
      mmap(0x7fff926a0000, 327680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x230000) = 0x7fff926a0000
      mprotect(0x7fff926a0000, 65536, PROT_READ) = 0
    ------------

The first mmap() maps the whole library into one region. Second mmap()
and third mprotect() split out the whole region into smaller vmas and
sets appropriate protection flags.

Now, in this case, install_breakpoint() will update reference counter
twice -- by second mmap() call and by third mprotect() call -- because
both regions contains the same reference counter. But, remove_brekpoint()
will decrement the reference counter only once, which will left counter
greater than 0 even when no one is tracing on that uprobe.

To solve this issue, I have to re-introduced sdt_mm_list, a list of
{consumer,mm} tuple for each uprobe. This tells us, for X consumer
we already have incremented the counter in Y mm so that we don't
increment same counter again. This additional complexity is to make
sure increment and decrement happen in sync.

Fortunately, we don't need to maintain this list with current approach,
because we know for sure that, increment and decrement will happen only
once when we patch/up-patch the instruction.

In short, if I put it inside install_breakpoint()/remove_breakpoint(),
I'll have to put the same logic there with more complication to
increment / decrement counter for each consumer. And additionally,
I've to maintain sdt_mm_list to make sure increment and decrement
happen in sync. With current approach, this all happens by default :)


> 
>> BTW, is there any harm in exporting struct uprobe outside of uprobe.c?
> 
> I think it is always better to avoid the exporting if possible (and avoid
> changing the arch-dependant set_swbp/set_orig_insn). But once again, I think
> this only complicates the code for no reason.> 
>>> So why do we need a counter but not a boolean? IIRC, because the counter can
>>> be shared, in particular 2 different uprobes can have the same >ref_ctr_offset,
>>> right?
>>
>> Actually, it's by design. This counter keeps track of current tracers
>> tracing on a particular SDT marker. So only boolean can't work here.
>> Also, yes, multiple markers can share the same reference counter.
> 
> markers or uprobes? OK, I'll assume that multiple uprobes can share the counter.


Yes, I meant uprobe.


> 
>>> But who else can use this counter and how? Say, can userspace update it too?
>>
>> There are many different ways user can change the reference counter.
>> Ex, systemtap and bcc both uses uprobe to probe on a marker but reference
>> counter update logic is different in both of them. Systemtap records all
>> exec/mmap events and updates the counter when it finds interested process/
>> vma. bcc directly hooks into process's memory (/proc/pid/mem).
> 
> OK, and how exactly they update the counter? I mean, can we assume that, say,
> bcc or systemtap can only increment or decrement it?


I don't think we can assume anything here because this is all in user's
control. User can even manually go and update the counter by directly
hooking into the memory. Ex, Brendan Gregg's blog on USDT explains how
to change counter manually in "Hacking with Ftrace: Is-Enabled" section
of his blog:
  http://www.brendangregg.com/blog/2015-07-03/hacking-linux-usdt-ftrace.html


> 
> If yes, perhaps we can simplify the kernel code...


Sure, let me know if you have any better idea.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-03  5:30         ` Ravi Bangoria
@ 2018-07-03  6:16           ` Srikar Dronamraju
  2018-07-03  7:43             ` Ravi Bangoria
  2018-07-03  8:11           ` Ravi Bangoria
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 50+ messages in thread
From: Srikar Dronamraju @ 2018-07-03  6:16 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Oleg Nesterov, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

> Current approach:
> 
>     ------------
>     register_for_each_vma() / uprobe_mmap()
>       install_breakpoint()
>         uprobe_write_opcode() {
>                 if (instruction is not already patched) {
>                         /* Gets called only _once_. */
>                         increment the reference counter;
>                         patch the instruction;
>                 }
>         }
>     ------------
> 

Lets say a user just installs a breakpoint which is part of USDT (using
either a trace or perf (or some other utility) 
Since the semaphore is not updated, it never hits the probe.
This is correct.

Now he toggles the semaphore and places a probe at the same spot using
systemtap or bcc.
The probes will now be active and we see hits.
This is also correct.

If the user toggles the semaphore or deletes the probe using
systemtap/bcc. The probes will still be active.
Since the reference count is removed on the last consumer deletion. No?
This may be wrong because, we may be unnecessarily hitting the probes.


> Thanks,
> Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-02 16:01   ` Srikar Dronamraju
  2018-07-02 18:05     ` Oleg Nesterov
@ 2018-07-03  6:29     ` Ravi Bangoria
  2018-07-03 19:26       ` Oleg Nesterov
  2018-07-04  6:07     ` Ravi Bangoria
  2 siblings, 1 reply; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-03  6:29 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: oleg, rostedt, mhiramat, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton, Ravi Bangoria

Hi Srikar,

On 07/02/2018 09:31 PM, Srikar Dronamraju wrote:
>> Implement the reference counter logic in core uprobe. User will be
>> able to use it from trace_uprobe as well as from kernel module. New
>> trace_uprobe definition with reference counter will now be:
>>
>>     <path>:<offset>[(ref_ctr_offset)]
>>
>> where ref_ctr_offset is an optional field. For kernel module, new
>> variant of uprobe_register() has been introduced:
>>
>>     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
>>
> 
> Sorry for bringing this again, but I would actually think the ref_ctr is
> a consumer property. i.e the ref_ctr_offset should be part of
> uprobe_consumer.


I agree that reference counter is a consumer property and that was the
main reason my initial draft was to change trace_uprobe. But there were
couple of issues with that approach too. I've already mentioned few of
them here: https://lkml.org/lkml/2018/6/6/129. Apart from these, if I
do it inside trace_uprobe, kernel module won't have a way to use
reference counter.

Now about adding ref_ctr_offset into uprobe_consumer. Actually, I
didn't want to change the uprobe_consumer definition because it's
already exported and tools like systemtap are using it. And thus, I
haven't explored how difficult or easy it will be to implement it that
way.


> 
> The advantages of doing that would be
> 1. Dont need to expose uprobe structure and just update our
> uprobe_consumer which is already an exported structure.
> - Exporting uprobe structure would expose some of our internal
>   implementation details, basically reduce the freedom of changing stuff
>   internally.


I agree. We will loose the freedom to change stuff by exporting uprobe.


> - we came up with uprobe_arch for the parts that we wanted to expose
>   to archs. exposing uprobe and uprobe_arch looks weird.


Hmm, how about this ...

  set_swbp(arch_uprobe, ...) {
    uprobe_write_opcode(arch_uprobe, ...) {
      uprobe = container_of(arch_uprobe);
      ...
    }
  }

Let me think on this. If this works, I won't need to export struct uprobe
outside.


> 
> 2. ref_ctr_offset is necessarily a consumer property, its not a uprobe
> property at all.


I agree.


> 
> 3. We dont need to change/add new uprobe_register functions.


Quite possible. I need to explore on that.


> 
> The way I look at it is.
> 
> Based on the ref_ctr_offset field in consumer, we update_ref_ctr()
> around install_breakpoint/remove_breakpoint.
> 
>> +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
>> +{
>> +	struct delayed_uprobe *du;
>> +
>> +	if (delayed_uprobe_check(uprobe, mm))
>> +		return 0;
>> +
>> +	du  = kzalloc(sizeof(*du), GFP_KERNEL);
>> +	if (!du)
>> +		return -ENOMEM;
>> +
>> +	du->uprobe = uprobe;
>> +	du->mm = mm;
>> +	list_add(&du->list, &delayed_uprobe_list);
>> +	return 0;
>> +}
>> +
> 
> If I understood the delayed_uprobe stuff, its when we could insert a
> breakpoint but the vma that has the ref_ctr_offset is not loaded. Is
> that correct?


That's correct.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-03  6:16           ` Srikar Dronamraju
@ 2018-07-03  7:43             ` Ravi Bangoria
  2018-07-04  9:16               ` Srikar Dronamraju
  0 siblings, 1 reply; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-03  7:43 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Oleg Nesterov, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Srikar,

On 07/03/2018 11:46 AM, Srikar Dronamraju wrote:
>> Current approach:
>>
>>     ------------
>>     register_for_each_vma() / uprobe_mmap()
>>       install_breakpoint()
>>         uprobe_write_opcode() {
>>                 if (instruction is not already patched) {
>>                         /* Gets called only _once_. */
>>                         increment the reference counter;
>>                         patch the instruction;
>>                 }
>>         }
>>     ------------
>>
> 
> Lets say a user just installs a breakpoint which is part of USDT (using
> either a trace or perf (or some other utility) 
> Since the semaphore is not updated, it never hits the probe.
> This is correct.
> 
> Now he toggles the semaphore and places a probe at the same spot using
> systemtap or bcc.
> The probes will now be active and we see hits.
> This is also correct.
> 
> If the user toggles the semaphore or deletes the probe using
> systemtap/bcc. The probes will still be active.
> Since the reference count is removed on the last consumer deletion. No?
> This may be wrong because, we may be unnecessarily hitting the probes.


I'm not sure if I get your concerns but let me try to explain what happens
in such cases. please let me know if I misunderstood your point.

1. Install a probe using perf.
  # ./perf probe sdt_tick:loop2

  This does not do anything. Just creates an entry in trace_uprobe.

2. Start record using perf:
  # ./perf record -a -e sdt_tick:loop2

  This will call uprobe_register_refctr(inode, offfset, ref_ctr_offset)
  and will create an object of struct uprobe.

3. Start target process
  # ./tick

  When we start 'tick', kernel will load the binary and start creating
  vmas, which will basically call uprobe_mmap(). First vma will be of
  text section and thus instruction will be patched. But vma holding
  reference counter is not present yet and thus the uprobe will be
  added to delayed_uprobe_list. Now kernel will map the data section
  holding reference counter and uprobe_mmap() will check for any
  delayed_uprobe. It will find the uprobe and thus it will increment
  the reference counter.

  So, Reference counter = 1, instruction is patched with trap.

4. For simplicity, I'm using kernel module instead of systemtap / bcc.
   User loads a kernel module that registers a probe on the same uprobe
   by calling uprobe_register_refctr(inode, offset, ref_ctr_offset).

  # insmod test-sdt.ko

  The sequence of function call here will be:
    uprobe_register_refctr()
      register_for_each_vma()
        install_breakpoint()
          set_swbp()
            uprobe_write_opcode()

  But uprobe_write_opcode will find instruction is already patched and
  thus return without doing anything.

  So, Reference counter = 1, instruction is patched.

5. Let's say user kills perf record started in step 2. Function call
   sequence here will be:

     uprobe_unregister()
       register_for_each_vma() {
         if (!filter_chain())
            remove_breakpoint();
       }

   Here filter_chain return true because other consumer is active on the
   same uprobe. so uprobe_unregister() will return without doing anything.

   So, Reference counter = 1, instruction is patched.

6. User unloads the kernel module.
   # rmmod test-sdt.ko

   Same function sequence as step 5 but this time filter_chain() will
   return false because there are no consumer left. And thus
   remove_breakpoint() will get called which will reset instruction and
   decrement the counter.

  So, Reference counter = 0, instruction is reset.


Does this explain your concerns?

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 09/10] Uprobes/sdt: Document about reference counter
  2018-07-02 14:54   ` Srikar Dronamraju
@ 2018-07-03  7:50     ` Ravi Bangoria
  0 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-03  7:50 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: oleg, rostedt, mhiramat, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton, Ravi Bangoria

Hi Srikar,

On 07/02/2018 08:24 PM, Srikar Dronamraju wrote:
> * Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-06-28 10:52:08]:
> 
>> Reference counter gate the invocation of probe. If present,
>> by default reference count is 0. Kernel needs to increment
>> it before tracing the probe and decrement it when done. This
>> is identical to semaphore in Userspace Statically Defined
>> Tracepoints (USDT).
>>
>> Document usage of reference counter.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
>> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> 
> Unlike perf, this mechanism cannot detect ref count and depends on the
> users data. What happens if the user mistakenly provides a wrong location?
> I guess he can corrupt some other data structures?

Yes, if user is giving wrong ref_ctr_offset, uprobe infrastructure will
corrupt some user data.

> 
> Hence I would think twice of advertising this mechanism. i.e keep this
> as an undocumented feature.
> 

I don't mind.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 10/10] perf probe: Support SDT markers having reference counter (semaphore)
  2018-07-02 14:57   ` Srikar Dronamraju
@ 2018-07-03  8:00     ` Ravi Bangoria
  0 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-03  8:00 UTC (permalink / raw)
  To: Srikar Dronamraju, mhiramat
  Cc: oleg, rostedt, peterz, mingo, acme, alexander.shishkin, jolsa,
	namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton, Ravi Bangoria

Hi Srikar,

On 07/02/2018 08:27 PM, Srikar Dronamraju wrote:
> * Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-06-28 10:52:09]:
> 
>> With this, perf buildid-cache will save SDT markers with reference
>> counter in probe cache. Perf probe will be able to probe markers
>> having reference counter. Ex,
>>
>>   # readelf -n /tmp/tick | grep -A1 loop2
>>     Name: loop2
>>     ... Semaphore: 0x0000000010020036
>>
>>   # ./perf buildid-cache --add /tmp/tick
> 
> Would this perf buildid-cache work if the executable is stripped of
> symbols?

Description of SDT markers resides in .notes section. If .notes section
is there, binary has SDT markers, if .notes section is not there, binary
does not have any SDT markers. So SDT markers does not have anything to
do with symbol table.

> 
>>   # ./perf probe sdt_tick:loop2
>>   # ./perf stat -e sdt_tick:loop2 /tmp/tick
>>     hi: 0
>>     hi: 1
>>     hi: 2
>>     ^C
>>      Performance counter stats for '/tmp/tick':
>>                  3      sdt_tick:loop2
>>        2.561851452 seconds time elapsed
> 
> Also can we document how to use SDT markers with perf under perf-probe
> or perf-build-cache?
> 

Yes, perf-buildid-cache and perf-probe man pages describes about SDT
markers.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-03  5:30         ` Ravi Bangoria
  2018-07-03  6:16           ` Srikar Dronamraju
@ 2018-07-03  8:11           ` Ravi Bangoria
  2018-07-03 16:36           ` Oleg Nesterov
  2018-07-03 17:12           ` Oleg Nesterov
  3 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-03  8:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria



On 07/03/2018 11:00 AM, Ravi Bangoria wrote:
> In short, if I put it inside install_breakpoint()/remove_breakpoint(),
> I'll have to put the same logic there with more complication to
> increment / decrement counter for each consumer. And additionally,
> I've to maintain sdt_mm_list to make sure increment and decrement
> happen in sync. With current approach, this all happens by default :)

s/this all happens by default/no need to maintain all this/

-Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-03  5:30         ` Ravi Bangoria
  2018-07-03  6:16           ` Srikar Dronamraju
  2018-07-03  8:11           ` Ravi Bangoria
@ 2018-07-03 16:36           ` Oleg Nesterov
  2018-07-03 17:25             ` Oleg Nesterov
  2018-07-04  4:49             ` Ravi Bangoria
  2018-07-03 17:12           ` Oleg Nesterov
  3 siblings, 2 replies; 50+ messages in thread
From: Oleg Nesterov @ 2018-07-03 16:36 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On 07/03, Ravi Bangoria wrote:
>
> Ok let me explain the difference.
>
> Current approach:
>
>     ------------
>     register_for_each_vma() / uprobe_mmap()
>       install_breakpoint()
>         uprobe_write_opcode() {
>                 if (instruction is not already patched) {
>                         /* Gets called only _once_. */
>                         increment the reference counter;
>                         patch the instruction;
>                 }
>         }

Yes I see. And I am not sure this all is correct. And I still hope we can do
something better, I'll write another email.

For now, let's discuss your current approach.

> Now, if I put it inside install_breakpoint():
>
>     ------------
>     uprobe_register()
>       register_for_each_vma()
>         install_breakpoint() {
>                 /* Called _for each consumer_ */

How so? it is not called for each consumer. I think you misread this code.

>                 increment the reference counter _once_;
>                 uprobe_write_opcode()
> 		...
>         }

So. I meant that you can move the _same_ logic into install_breakpoint() and
remove_breakpoint(). And note that ref_ctr_updated in uprobe_write_opcode() is
only needed because it can retry the fault.

IOW, you can simply do update_ref_ctr(is_register => 1) at the start of
install_breakpoint(), and update_ref_ctr(0) in remove_breakpoint(), there are
no other callers of uprobe_write_opcode(). To clarify, it is indirectly called
by set_swbp() and set_orig_insn(), but this doesn't matter.

Or you can kill update_ref_ctr() and (roughly) do

	rc_vma = find_ref_ctr_vma(...);
	if (rc_vma)
		__update_ref_ctr(..., 1);
	else
		delayed_uprobe_add(...);

at the start of install_breakpoint() and

	rc_vma = find_ref_ctr_vma(...);
	if (rc_vma)
		__update_ref_ctr(..., -1);
	delayed_uprobe_remove(...);

in remove_breakpoint().


>     uprobe_mmap()
>       install_breakpoint() {
>                 increment the reference counter _for each consumer_;

Again, I do not understand where do you see the "for each consumer" thing.

>                 uprobe_write_opcode()

In short. There is a 1:1 relationship between uprobe_write_opcode(is_register => 1)
and install_breakpoint(), and between uprobe_write_opcode(is_register => 0) and
remove_breakpoint(). Whatever uprobe_write_opcode() can do if is_register == 1 can be
done in install_breakpoint(), the same for is_register == 0 and remove_breakpont().

What have I missed?

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-03  5:30         ` Ravi Bangoria
                             ` (2 preceding siblings ...)
  2018-07-03 16:36           ` Oleg Nesterov
@ 2018-07-03 17:12           ` Oleg Nesterov
  2018-07-03 18:23             ` Oleg Nesterov
  3 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2018-07-03 17:12 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On 07/03, Ravi Bangoria wrote:
>
> > OK, and how exactly they update the counter? I mean, can we assume that, say,
> > bcc or systemtap can only increment or decrement it?
>
> I don't think we can assume anything here because this is all in user's
> control. User can even manually go and update the counter by directly
> hooking into the memory.

Then how this all can work? I understand that user-space can do anything with
this counter, but we do not care if it does something wrong, say nullifies the
ctr incremented by kernel.

I don't understand this. I think that if a user registers uprobe with
->ref_ctr_offset != 0 we can safely assume that this is a counter, and we do
not care if userspace corrupts it.

> > If yes, perhaps we can simplify the kernel code...
>
> Sure, let me know if you have any better idea.

Can't we (ab)use the most significant bit in this counter?

To simplify, lets suppose for the moment that 2 different uprobes can't have
the same ->ref_ctr_offset. Then we can do something like

	#define UPROBE_KERN_CTR		(SHRT_MAX + 1)	// MSB

	install_breakpoint:

		for (each valid_ref_ctr_vma which maps uprobe->ref_ctr_offset)
			*ctr_ptr |= UPROBE_KERN_CTR;

		set_swbp();

and

	remove_breakpoint:

		for (each valid_ref_ctr_vma which maps uprobe->ref_ctr_offset)
			*ctr_ptr &= ~UPROBE_KERN_CTR;

		set_orig_insn();

IOW, we increment/decrement by UPROBE_KERN_CTR, not by 1. But this way the
"increment" is idempotent, we do not care if "|=" or "&=" was applied more than
once, we do not need to record the fact that the counter was already incremented,
and inc/dec are always balanced.


Now, lets recall that multiple uprobes can share the same counter. install_breakpoint()
is still fine, and we only need to add the additional code into remove_breakpoint:

		for (each uprobe with the same inode and ref_ctr_offset)
			if (filter_chain(uprobe))
				goto keep_ctr;

		for (each valid_ref_ctr_vma which maps uprobe->ref_ctr_offset)
			*ctr_ptr &= ~UPROBE_KERN_CTR;

	keep_ctr:
		set_orig_insn();
				

Just an idea.

What do you think?

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-03 16:36           ` Oleg Nesterov
@ 2018-07-03 17:25             ` Oleg Nesterov
  2018-07-04  4:53               ` Ravi Bangoria
  2018-07-04  4:49             ` Ravi Bangoria
  1 sibling, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2018-07-03 17:25 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On 07/03, Oleg Nesterov wrote:
>
> In short. There is a 1:1 relationship between uprobe_write_opcode(is_register => 1)
> and install_breakpoint(), and between uprobe_write_opcode(is_register => 0) and
> remove_breakpoint(). Whatever uprobe_write_opcode() can do if is_register == 1 can be
> done in install_breakpoint(), the same for is_register == 0 and remove_breakpont().
>
> What have I missed?

Ah. I missed the fact that uprobe_write_opcode() doesn't do update_ref_ctr() if
verify_opcode() returns false.

Now I understand what did you mean by "for each consumer". So if we move this logic
into install/remove_breakpoint as I tried to suggest, we will also need another error
code for the case when verify_opcode() returns false.

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-03 17:12           ` Oleg Nesterov
@ 2018-07-03 18:23             ` Oleg Nesterov
  2018-07-04  5:25               ` Ravi Bangoria
  0 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2018-07-03 18:23 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

forgot to mention...

Of course, I am not sure that UPROBE_KERN_CTR can actually work, there are a lot
of details. But if it can, then we can also make ->ref_ctr_offset a consumer property.

On 07/03, Oleg Nesterov wrote:
>
> On 07/03, Ravi Bangoria wrote:
> >
> > > OK, and how exactly they update the counter? I mean, can we assume that, say,
> > > bcc or systemtap can only increment or decrement it?
> >
> > I don't think we can assume anything here because this is all in user's
> > control. User can even manually go and update the counter by directly
> > hooking into the memory.
>
> Then how this all can work? I understand that user-space can do anything with
> this counter, but we do not care if it does something wrong, say nullifies the
> ctr incremented by kernel.
>
> I don't understand this. I think that if a user registers uprobe with
> ->ref_ctr_offset != 0 we can safely assume that this is a counter, and we do
> not care if userspace corrupts it.
>
> > > If yes, perhaps we can simplify the kernel code...
> >
> > Sure, let me know if you have any better idea.
>
> Can't we (ab)use the most significant bit in this counter?
>
> To simplify, lets suppose for the moment that 2 different uprobes can't have
> the same ->ref_ctr_offset. Then we can do something like
>
> 	#define UPROBE_KERN_CTR		(SHRT_MAX + 1)	// MSB
>
> 	install_breakpoint:
>
> 		for (each valid_ref_ctr_vma which maps uprobe->ref_ctr_offset)
> 			*ctr_ptr |= UPROBE_KERN_CTR;
>
> 		set_swbp();
>
> and
>
> 	remove_breakpoint:
>
> 		for (each valid_ref_ctr_vma which maps uprobe->ref_ctr_offset)
> 			*ctr_ptr &= ~UPROBE_KERN_CTR;
>
> 		set_orig_insn();
>
> IOW, we increment/decrement by UPROBE_KERN_CTR, not by 1. But this way the
> "increment" is idempotent, we do not care if "|=" or "&=" was applied more than
> once, we do not need to record the fact that the counter was already incremented,
> and inc/dec are always balanced.
>
>
> Now, lets recall that multiple uprobes can share the same counter. install_breakpoint()
> is still fine, and we only need to add the additional code into remove_breakpoint:
>
> 		for (each uprobe with the same inode and ref_ctr_offset)
> 			if (filter_chain(uprobe))
> 				goto keep_ctr;
>
> 		for (each valid_ref_ctr_vma which maps uprobe->ref_ctr_offset)
> 			*ctr_ptr &= ~UPROBE_KERN_CTR;
>
> 	keep_ctr:
> 		set_orig_insn();
>
>
> Just an idea.
>
> What do you think?
>
> Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-03  6:29     ` Ravi Bangoria
@ 2018-07-03 19:26       ` Oleg Nesterov
  2018-07-04  5:26         ` Ravi Bangoria
  0 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2018-07-03 19:26 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Srikar Dronamraju, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On 07/03, Ravi Bangoria wrote:
>
> Now about adding ref_ctr_offset into uprobe_consumer. Actually, I
> didn't want to change the uprobe_consumer definition because it's
> already exported and tools like systemtap are using it.

So what? Yes, the out-of-tree modules should be updated, but this doesn't
mean we can add the new features.

And, speaking of systemtap, it already has to do build-time checks to verify
that uprobe_consumer->ret_handler member exists. So it will need another
STAPCONF_INODE_REFCNT and that is all.

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-03 16:36           ` Oleg Nesterov
  2018-07-03 17:25             ` Oleg Nesterov
@ 2018-07-04  4:49             ` Ravi Bangoria
  1 sibling, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-04  4:49 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Oleg,

On 07/03/2018 10:06 PM, Oleg Nesterov wrote:
> On 07/03, Ravi Bangoria wrote:
>>
>> Ok let me explain the difference.
>>
>> Current approach:
>>
>>     ------------
>>     register_for_each_vma() / uprobe_mmap()
>>       install_breakpoint()
>>         uprobe_write_opcode() {
>>                 if (instruction is not already patched) {
>>                         /* Gets called only _once_. */
>>                         increment the reference counter;
>>                         patch the instruction;
>>                 }
>>         }
> 
> Yes I see. And I am not sure this all is correct. And I still hope we can do
> something better, I'll write another email.
> 
> For now, let's discuss your current approach.
> 
>> Now, if I put it inside install_breakpoint():
>>
>>     ------------
>>     uprobe_register()
>>       register_for_each_vma()
>>         install_breakpoint() {
>>                 /* Called _for each consumer_ */
> 
> How so? it is not called for each consumer. I think you misread this code.


Actually, I meant entire sequence

  uprobe_register()
    register_for_each_vma()
      install_breakpoint()

gets called for each consumer. Not just install_breakpoint(). Sorry
for a bit of ambiguity.


> 
>>                 increment the reference counter _once_;
>>                 uprobe_write_opcode()
>> 		...
>>         }
> 
> So. I meant that you can move the _same_ logic into install_breakpoint() and
> remove_breakpoint(). And note that ref_ctr_updated in uprobe_write_opcode() is
> only needed because it can retry the fault.
> 
> IOW, you can simply do update_ref_ctr(is_register => 1) at the start of
> install_breakpoint(), and update_ref_ctr(0) in remove_breakpoint(), there are
> no other callers of uprobe_write_opcode(). To clarify, it is indirectly called
> by set_swbp() and set_orig_insn(), but this doesn't matter.
> 
> Or you can kill update_ref_ctr() and (roughly) do
> 
> 	rc_vma = find_ref_ctr_vma(...);
> 	if (rc_vma)
> 		__update_ref_ctr(..., 1);
> 	else
> 		delayed_uprobe_add(...);
> 
> at the start of install_breakpoint() and
> 
> 	rc_vma = find_ref_ctr_vma(...);
> 	if (rc_vma)
> 		__update_ref_ctr(..., -1);
> 	delayed_uprobe_remove(...);
> 
> in remove_breakpoint().
> 
> 
>>     uprobe_mmap()
>>       install_breakpoint() {
>>                 increment the reference counter _for each consumer_;
> 
> Again, I do not understand where do you see the "for each consumer" thing.
> 
>>                 uprobe_write_opcode()
> 
> In short. There is a 1:1 relationship between uprobe_write_opcode(is_register => 1)
> and install_breakpoint(), and between uprobe_write_opcode(is_register => 0) and
> remove_breakpoint(). Whatever uprobe_write_opcode() can do if is_register == 1 can be
> done in install_breakpoint(), the same for is_register == 0 and remove_breakpont().
> 
> What have I missed?


Yes, the verify_opcode() stuff as you have mentioned in another reply.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-03 17:25             ` Oleg Nesterov
@ 2018-07-04  4:53               ` Ravi Bangoria
  2018-07-10 15:25                 ` Oleg Nesterov
  0 siblings, 1 reply; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-04  4:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Oleg,

On 07/03/2018 10:55 PM, Oleg Nesterov wrote:
> On 07/03, Oleg Nesterov wrote:
>>
>> In short. There is a 1:1 relationship between uprobe_write_opcode(is_register => 1)
>> and install_breakpoint(), and between uprobe_write_opcode(is_register => 0) and
>> remove_breakpoint(). Whatever uprobe_write_opcode() can do if is_register == 1 can be
>> done in install_breakpoint(), the same for is_register == 0 and remove_breakpont().
>>
>> What have I missed?
> 
> Ah. I missed the fact that uprobe_write_opcode() doesn't do update_ref_ctr() if
> verify_opcode() returns false.
> 
> Now I understand what did you mean by "for each consumer". So if we move this logic
> into install/remove_breakpoint as I tried to suggest, we will also need another error
> code for the case when verify_opcode() returns false.

Ok so if we can use verify_opcode() inside install_breakpoint(), we can probably
move implementation logic in install/remove_breakpoint(). Let me explore that more.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-03 18:23             ` Oleg Nesterov
@ 2018-07-04  5:25               ` Ravi Bangoria
  0 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-04  5:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Oleg,

On 07/03/2018 11:53 PM, Oleg Nesterov wrote:
> forgot to mention...
> 
> Of course, I am not sure that UPROBE_KERN_CTR can actually work, there are a lot
> of details. But if it can, then we can also make ->ref_ctr_offset a consumer property.
> 
> On 07/03, Oleg Nesterov wrote:
>>
>> On 07/03, Ravi Bangoria wrote:
>>>
>>>> OK, and how exactly they update the counter? I mean, can we assume that, say,
>>>> bcc or systemtap can only increment or decrement it?
>>>
>>> I don't think we can assume anything here because this is all in user's
>>> control. User can even manually go and update the counter by directly
>>> hooking into the memory.
>>
>> Then how this all can work? I understand that user-space can do anything with
>> this counter, but we do not care if it does something wrong, say nullifies the
>> ctr incremented by kernel.
>>
>> I don't understand this. I think that if a user registers uprobe with
>> ->ref_ctr_offset != 0 we can safely assume that this is a counter, and we do
>> not care if userspace corrupts it.


Right, that's what I meant. We can't assume anything. Our(kernel) job
is just to:
 1. Increment / decrement counter in sync. No matter what is the value
    of counter before operation.
 2. Make sure that counter does not go negative by the kernel.
 3. Make sure that kernel is not harmed in any case.


>>
>>>> If yes, perhaps we can simplify the kernel code...
>>>
>>> Sure, let me know if you have any better idea.
>>
>> Can't we (ab)use the most significant bit in this counter?
>>
>> To simplify, lets suppose for the moment that 2 different uprobes can't have
>> the same ->ref_ctr_offset. Then we can do something like
>>
>> 	#define UPROBE_KERN_CTR		(SHRT_MAX + 1)	// MSB
>>
>> 	install_breakpoint:
>>
>> 		for (each valid_ref_ctr_vma which maps uprobe->ref_ctr_offset)
>> 			*ctr_ptr |= UPROBE_KERN_CTR;
>>
>> 		set_swbp();
>>
>> and
>>
>> 	remove_breakpoint:
>>
>> 		for (each valid_ref_ctr_vma which maps uprobe->ref_ctr_offset)
>> 			*ctr_ptr &= ~UPROBE_KERN_CTR;
>>
>> 		set_orig_insn();
>>
>> IOW, we increment/decrement by UPROBE_KERN_CTR, not by 1. But this way the
>> "increment" is idempotent, we do not care if "|=" or "&=" was applied more than
>> once, we do not need to record the fact that the counter was already incremented,
>> and inc/dec are always balanced.
>>
>>
>> Now, lets recall that multiple uprobes can share the same counter. install_breakpoint()
>> is still fine, and we only need to add the additional code into remove_breakpoint:
>>
>> 		for (each uprobe with the same inode and ref_ctr_offset)
>> 			if (filter_chain(uprobe))
>> 				goto keep_ctr;
>>
>> 		for (each valid_ref_ctr_vma which maps uprobe->ref_ctr_offset)
>> 			*ctr_ptr &= ~UPROBE_KERN_CTR;
>>
>> 	keep_ctr:
>> 		set_orig_insn();
>>
>>
>> Just an idea.
>>
>> What do you think?


This is really a cool idea. Let me explore that.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-03 19:26       ` Oleg Nesterov
@ 2018-07-04  5:26         ` Ravi Bangoria
  0 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-04  5:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Srikar Dronamraju, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria



On 07/04/2018 12:56 AM, Oleg Nesterov wrote:
> On 07/03, Ravi Bangoria wrote:
>>
>> Now about adding ref_ctr_offset into uprobe_consumer. Actually, I
>> didn't want to change the uprobe_consumer definition because it's
>> already exported and tools like systemtap are using it.
> 
> So what? Yes, the out-of-tree modules should be updated, but this doesn't
> mean we can add the new features.
> 
> And, speaking of systemtap, it already has to do build-time checks to verify
> that uprobe_consumer->ret_handler member exists. So it will need another
> STAPCONF_INODE_REFCNT and that is all.

Ok. let me explore that possibility as well.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-02 16:01   ` Srikar Dronamraju
  2018-07-02 18:05     ` Oleg Nesterov
  2018-07-03  6:29     ` Ravi Bangoria
@ 2018-07-04  6:07     ` Ravi Bangoria
  2 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-04  6:07 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: oleg, rostedt, mhiramat, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton, Ravi Bangoria

Hi Srikar,

On 07/02/2018 09:31 PM, Srikar Dronamraju wrote:

> 2. ref_ctr_offset is necessarily a consumer property, its not a uprobe
> property at all.


Sorry to ask this after agreeing :P

Initially, I thought this point is right. But after thinking more on this,
it seems wrong to me. Ex a python SDT probe:

    Provider: python
    Name: function__entry
    Location: 0x0000000000140de4, Base: 0x00000000001ddfb0, Semaphore: 0x0000000000268320

Here, "Semaphore" field is what we are referring to as reference counter.
Basically reference counter's job is to gate invocation of uprobe and
thus ref_ctr_offset becomes a uprobe property, not the consumer property.
No? Am I missing anything?

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-03  7:43             ` Ravi Bangoria
@ 2018-07-04  9:16               ` Srikar Dronamraju
  2018-07-04  9:24                 ` Ravi Bangoria
  0 siblings, 1 reply; 50+ messages in thread
From: Srikar Dronamraju @ 2018-07-04  9:16 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Oleg Nesterov, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

> 
> I'm not sure if I get your concerns but let me try to explain what happens
> in such cases. please let me know if I misunderstood your point.
> 
> 1. Install a probe using perf.
>   # ./perf probe sdt_tick:loop2
> 
> 
> 
> Does this explain your concerns?
> 


No, this was not my concern.
My concern is with two users on the same USDT.
1. First user enables the probe point but doesn't increment the ref_cnt.
via uprobe_register

2. Second user tries to enable the probe point and also increments the
ref_cnt via uprobe_register_refctr.

3. If the second user now removes the probe point via uprobe_unregister.

4. What is the state of the ref_cnt?

--
Thanks and Regards
Srikar


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-04  9:16               ` Srikar Dronamraju
@ 2018-07-04  9:24                 ` Ravi Bangoria
  0 siblings, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-04  9:24 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Oleg Nesterov, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Srikar,

On 07/04/2018 02:46 PM, Srikar Dronamraju wrote:
>>
>> I'm not sure if I get your concerns but let me try to explain what happens
>> in such cases. please let me know if I misunderstood your point.
>>
>> 1. Install a probe using perf.
>>   # ./perf probe sdt_tick:loop2
>>
>>
>>
>> Does this explain your concerns?
>>
> 
> 
> No, this was not my concern.
> My concern is with two users on the same USDT.
> 1. First user enables the probe point but doesn't increment the ref_cnt.
> via uprobe_register
> 
> 2. Second user tries to enable the probe point and also increments the
> ref_cnt via uprobe_register_refctr.


Ok got it. uprobe_register_refctr() will return with error because we don't
allow one uprobe(inode+offset) with multiple reference counter.

i.e. If inode+offset matches for two uprobes, ref_ctr_offset must match as
well. Patch 8/10 takes care of this.


> 
> 3. If the second user now removes the probe point via uprobe_unregister.
> 
> 4. What is the state of the ref_cnt?
> 
> --
> Thanks and Regards
> Srikar
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-04  4:53               ` Ravi Bangoria
@ 2018-07-10 15:25                 ` Oleg Nesterov
  2018-07-11  8:44                   ` Ravi Bangoria
  0 siblings, 1 reply; 50+ messages in thread
From: Oleg Nesterov @ 2018-07-10 15:25 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

Hi Ravi,

On 07/04, Ravi Bangoria wrote:
>
> > Now I understand what did you mean by "for each consumer". So if we move this logic
> > into install/remove_breakpoint as I tried to suggest, we will also need another error
> > code for the case when verify_opcode() returns false.
>
> Ok so if we can use verify_opcode() inside install_breakpoint(), we can probably
> move implementation logic in install/remove_breakpoint(). Let me explore that more.

No, sorry for confusion, I meant another thing... But please forget. If we rely on
verify_opcode() I no longer think it would be more clean to move this logic into
install/remove_breakpoint.

However, I still think it would be better to avoid uprobe exporting and modifying
set_swbp/set_orig_insn. May be we can simply kill both set_swbp() and set_orig_insn(),
I'll re-check...

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-10 15:25                 ` Oleg Nesterov
@ 2018-07-11  8:44                   ` Ravi Bangoria
  2018-07-11  9:52                     ` Ravi Bangoria
  2018-07-12 14:58                     ` Oleg Nesterov
  0 siblings, 2 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-11  8:44 UTC (permalink / raw)
  To: Oleg Nesterov, srikar
  Cc: rostedt, mhiramat, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton, Ravi Bangoria

Hi Oleg,

On 07/10/2018 08:55 PM, Oleg Nesterov wrote:
> Hi Ravi,
> 
> On 07/04, Ravi Bangoria wrote:
>>
>>> Now I understand what did you mean by "for each consumer". So if we move this logic
>>> into install/remove_breakpoint as I tried to suggest, we will also need another error
>>> code for the case when verify_opcode() returns false.
>>
>> Ok so if we can use verify_opcode() inside install_breakpoint(), we can probably
>> move implementation logic in install/remove_breakpoint(). Let me explore that more.
> 
> No, sorry for confusion, I meant another thing... But please forget. If we rely on
> verify_opcode() I no longer think it would be more clean to move this logic into
> install/remove_breakpoint.
> 
> However, I still think it would be better to avoid uprobe exporting and modifying
> set_swbp/set_orig_insn. May be we can simply kill both set_swbp() and set_orig_insn(),
> I'll re-check...

Good that you bring this up. Actually, we can implement same logic
without exporting uprobe. We can do "uprobe = container_of(arch_uprobe)"
in uprobe_write_opcode(). No need to export struct uprobe outside,
no need to change set_swbp() / set_orig_insn() syntax. Just that we
need to pass arch_uprobe object to uprobe_write_opcode().


But, I wanted to discuss about making ref_ctr_offset a uprobe property
or a consumer property, before posting v6:

If we make it a consumer property, the design becomes flexible for
user. User will have an option to either depend on kernel to handle
reference counter or he can create normal uprobe and manipulate
reference counter on his own. This will not require any changes to
existing tools. With this approach we need to increment / decrement
reference counter for each consumer. But, because of the fact that our
install_breakpoint() / remove_breakpoint() are not balanced, we have
to keep track of which reference counter have been updated in which
mm, for which uprobe and for which consumer. I.e. Maintain a list of
{uprobe, consumer, mm}. This will make kernel implementation quite
complex because there are chances of races. What we gain in return
is flexibility for users. Please let me know if there are any other
advantages of this approach.

By making it a uprobe property, we are forcing user to use
uprobe_register_refctr() for uprobes having reference counter. Because
we don't allow same uprobe with multiple reference counter and thus
user can't use both uprobe_register_refctr() and uprobe_register() in
parallel for same uprobe. Which means user does not have a flexibility
to create normal uprobe with uprobe_register() and maintain reference
counter on his own. But kernel implementation becomes simple with this
approach. Though, this will require changes in tools which already
supports SDT events, systemtap and bcc I'm aware of. But AFAICS, the
change should not be major. In fact, the change should make tool code
simpler because they don't have to worry about maintaining reference
counter.

Third options: How about allowing 0 as a special value for reference
counter? I mean allow uprobe_register() and uprobe_register_refctr()
in parallel but do not allow two uprobe_register_refctr() with two
different reference counter. If we can do this, the issue of forcing
legacy user to use uprobe_register_refctr() will be solved. I.e. no
change needed on tools side and still kernel implementation will
remain simple. I'll explore this option.

Let me know your thoughts.

Thanks,
Ravi

PS: We can't abuse MSB with first approach because any userspace tool
can also abuse MSB in parallel. Probably, we can abuse MSB in second
and third approach, though, there is no need to.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-11  8:44                   ` Ravi Bangoria
@ 2018-07-11  9:52                     ` Ravi Bangoria
  2018-07-12 14:58                     ` Oleg Nesterov
  1 sibling, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-11  9:52 UTC (permalink / raw)
  To: Oleg Nesterov, srikar
  Cc: rostedt, mhiramat, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton, Ravi Bangoria

Minor correction:

On 07/11/2018 02:14 PM, Ravi Bangoria wrote:
> 
> Third options: How about allowing 0 as a special value for reference
> counter?

s/reference counter/reference counter offset/

 I mean allow uprobe_register() and uprobe_register_refctr()
> in parallel but do not allow two uprobe_register_refctr() with two
> different reference counter.

Ditto.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-11  8:44                   ` Ravi Bangoria
  2018-07-11  9:52                     ` Ravi Bangoria
@ 2018-07-12 14:58                     ` Oleg Nesterov
  2018-07-12 19:53                       ` Song Liu
  2018-07-13  5:39                       ` Ravi Bangoria
  1 sibling, 2 replies; 50+ messages in thread
From: Oleg Nesterov @ 2018-07-12 14:58 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On 07/11, Ravi Bangoria wrote:
>
> > However, I still think it would be better to avoid uprobe exporting and modifying
> > set_swbp/set_orig_insn. May be we can simply kill both set_swbp() and set_orig_insn(),
> > I'll re-check...
>
> Good that you bring this up. Actually, we can implement same logic
> without exporting uprobe. We can do "uprobe = container_of(arch_uprobe)"
> in uprobe_write_opcode(). No need to export struct uprobe outside,
> no need to change set_swbp() / set_orig_insn() syntax. Just that we
> need to pass arch_uprobe object to uprobe_write_opcode().

Yes, but you still need to modify set_swbp/set_orig_insn to pass the new
arg to uprobe_write_opcode(). OK, this is fine.


> But, I wanted to discuss about making ref_ctr_offset a uprobe property
> or a consumer property, before posting v6:
>
> If we make it a consumer property, the design becomes flexible for
> user. User will have an option to either depend on kernel to handle
> reference counter or he can create normal uprobe and manipulate
> reference counter on his own. This will not require any changes to
> existing tools. With this approach we need to increment / decrement
> reference counter for each consumer. But, because of the fact that our
> install_breakpoint() / remove_breakpoint() are not balanced, we have
> to keep track of which reference counter have been updated in which
> mm, for which uprobe and for which consumer. I.e. Maintain a list of
> {uprobe, consumer, mm}.

Did you explore the UPROBE_KERN_CTR hack I tried to suggest?

If it can work then, again, *ctr_ptr |= UPROBE_KERN_CTR from install_breakpoint()
paths is always fine, the nontrivial part is remove_breakpoint() case, perhaps
you can do something like

		for (each uprobe in inode)
			for (each consumer)
				if (consumer_filter(consumer))
					goto keep_ctr;

		for (each vma which maps this counter)
			*ctr_ptr &= ~UPROBE_KERN_CTR;

	keep_ctr:
		set_orig_insn(...);

but again, I didn't even try to think about details, not sure this
can really work.

And in any case:

> This will make kernel implementation quite
> complex

Yes. So I personally won't insist on this feature.

> Third options: How about allowing 0 as a special value for reference
> counter? I mean allow uprobe_register() and uprobe_register_refctr()
> in parallel but do not allow two uprobe_register_refctr() with two
> different reference counter.

I am not sure I understand how you can do this, and how much complications
this needs, so I have no opinion.


Cough, just noticed the final part below...

> PS: We can't abuse MSB with first approach because any userspace tool
> can also abuse MSB in parallel.

For what?

> Probably, we can abuse MSB in second
> and third approach, though, there is no need to.

Confused... If userspace can change it, how we can use it in 2nd approach?

Oleg.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-12 14:58                     ` Oleg Nesterov
@ 2018-07-12 19:53                       ` Song Liu
  2018-07-13  7:55                         ` Ravi Bangoria
  2018-07-13  5:39                       ` Ravi Bangoria
  1 sibling, 1 reply; 50+ messages in thread
From: Song Liu @ 2018-07-12 19:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ravi Bangoria, srikar, rostedt, mhiramat, Peter Zijlstra, mingo,
	acme, alexander.shishkin, jolsa, namhyung, open list, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

I guess I got to the party late. I found this thread after I started developing
the same feature...

On Thu, Jul 12, 2018 at 7:58 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 07/11, Ravi Bangoria wrote:
>>
>> > However, I still think it would be better to avoid uprobe exporting and modifying
>> > set_swbp/set_orig_insn. May be we can simply kill both set_swbp() and set_orig_insn(),
>> > I'll re-check...
>>
>> Good that you bring this up. Actually, we can implement same logic
>> without exporting uprobe. We can do "uprobe = container_of(arch_uprobe)"
>> in uprobe_write_opcode(). No need to export struct uprobe outside,
>> no need to change set_swbp() / set_orig_insn() syntax. Just that we
>> need to pass arch_uprobe object to uprobe_write_opcode().
>
> Yes, but you still need to modify set_swbp/set_orig_insn to pass the new
> arg to uprobe_write_opcode(). OK, this is fine.
>
>
>> But, I wanted to discuss about making ref_ctr_offset a uprobe property
>> or a consumer property, before posting v6:
>>
>> If we make it a consumer property, the design becomes flexible for
>> user. User will have an option to either depend on kernel to handle
>> reference counter or he can create normal uprobe and manipulate
>> reference counter on his own. This will not require any changes to
>> existing tools. With this approach we need to increment / decrement
>> reference counter for each consumer. But, because of the fact that our
>> install_breakpoint() / remove_breakpoint() are not balanced, we have
>> to keep track of which reference counter have been updated in which
>> mm, for which uprobe and for which consumer. I.e. Maintain a list of
>> {uprobe, consumer, mm}.

Is it possible to maintain balanced refcount by modifying callers of
install_breakpoint() and remove_breakpoint()? I am actually working
toward this direction. And I found some imbalance between
     register_for_each_vma(uprobe, uc)
and
     register_for_each_vma(uprobe, NULL)

From reading the thread, I think there are other sources of imbalance.
But I think it is still possible to fix it? Please let me know if this is not
realistic...


About race conditions, I think both install_breakpoint() and remove_breakpoint()
are called with

   down_write(&mm->mmap_sem);


As long as we do the same when modifying the reference counter,
it should be fine, right?

Wait... sometimes we only down_read(). Is this by design?

Thanks,
Song

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-12 14:58                     ` Oleg Nesterov
  2018-07-12 19:53                       ` Song Liu
@ 2018-07-13  5:39                       ` Ravi Bangoria
  1 sibling, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-13  5:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Oleg,

On 07/12/2018 08:28 PM, Oleg Nesterov wrote:
> On 07/11, Ravi Bangoria wrote:
>>
>>> However, I still think it would be better to avoid uprobe exporting and modifying
>>> set_swbp/set_orig_insn. May be we can simply kill both set_swbp() and set_orig_insn(),
>>> I'll re-check...
>>
>> Good that you bring this up. Actually, we can implement same logic
>> without exporting uprobe. We can do "uprobe = container_of(arch_uprobe)"
>> in uprobe_write_opcode(). No need to export struct uprobe outside,
>> no need to change set_swbp() / set_orig_insn() syntax. Just that we
>> need to pass arch_uprobe object to uprobe_write_opcode().
> 
> Yes, but you still need to modify set_swbp/set_orig_insn to pass the new
> arg to uprobe_write_opcode(). OK, this is fine.
> 


We can probably kill set_swbp()/set_orig_insn() but with container_of()
tweak, we don't need to.


> 
>> But, I wanted to discuss about making ref_ctr_offset a uprobe property
>> or a consumer property, before posting v6:
>>
>> If we make it a consumer property, the design becomes flexible for
>> user. User will have an option to either depend on kernel to handle
>> reference counter or he can create normal uprobe and manipulate
>> reference counter on his own. This will not require any changes to
>> existing tools. With this approach we need to increment / decrement
>> reference counter for each consumer. But, because of the fact that our
>> install_breakpoint() / remove_breakpoint() are not balanced, we have
>> to keep track of which reference counter have been updated in which
>> mm, for which uprobe and for which consumer. I.e. Maintain a list of
>> {uprobe, consumer, mm}.
> 
> Did you explore the UPROBE_KERN_CTR hack I tried to suggest?


Yes I tried to implement it. Seems it's not safe to use MSB. Let me explain
this at the end.


> 
> If it can work then, again, *ctr_ptr |= UPROBE_KERN_CTR from install_breakpoint()
> paths is always fine, the nontrivial part is remove_breakpoint() case, perhaps
> you can do something like
> 
> 		for (each uprobe in inode)
> 			for (each consumer)
> 				if (consumer_filter(consumer))
> 					goto keep_ctr;
> 
> 		for (each vma which maps this counter)
> 			*ctr_ptr &= ~UPROBE_KERN_CTR;
> 
> 	keep_ctr:
> 		set_orig_insn(...);
> 
> but again, I didn't even try to think about details, not sure this
> can really work.
> 
> And in any case:
> 
>> This will make kernel implementation quite
>> complex
> 
> Yes. So I personally won't insist on this feature.


Sure, thanks for giving your opinion.


> 
>> Third options: How about allowing 0 as a special value for reference
>> counter? I mean allow uprobe_register() and uprobe_register_refctr()
>> in parallel but do not allow two uprobe_register_refctr() with two
>> different reference counter.
> 
> I am not sure I understand how you can do this, and how much complications
> this needs, so I have no opinion.


Yes, it's quite possible. So the design will remain same as current set of
patches(v5). v5 basically forces to use only one reference counter for one
uprobe, the new design will allow 0 as a special value for reference counter
_offset_. But still two non-zero reference counter offset for same uprobe
won't be allowed. So there won't be any major changes in the code. Only few
tweaks are needed. I've initial draft ready and it seems working. I'll test
it properly and send it as v6.


> 
> 
> Cough, just noticed the final part below...
> 
>> PS: We can't abuse MSB with first approach because any userspace tool
>> can also abuse MSB in parallel.
> 
> For what?


Ok. With first approach, we wants to allow multiple reference counter by
making ref_ctr_offset a consumer property. Now let's say one application
rely on kernel to maintain reference counter and other wants to maintain
on his own, and if we use MSB, reference counter value can not be
consistent. For ex,

  step1.
   app1 calls uprobe_register(inode1, off1, consumer{ref_ctr_offset = 0x123});
   kernel will use MSB and thus reference counter is 0x8000

  step2.
   app2 calls uprobe_register(inode1, off1, consumer{ref_ctr_offset = 0x0});
   and modifies reference counter by changing MSB. So reference
   counter is still 0x8000

  step3.
   app2 calls uprobe_unregister();
   and modifies reference counter by resetting MSB. So reference
   counter becomes 0x0000

  ... which is wrong.


> 
>> Probably, we can abuse MSB in second
>> and third approach, though, there is no need to.
> 
> Confused... If userspace can change it, how we can use it in 2nd approach?


Now, with second approach, we are making uprobe_register_refctr()
and uprobe_register() mutually exclusive. So, step2 in above example
is not possible. i.e. Either kernel can use MSB or user can use
MSB, but both can't at the same time. So it's quite safe to use MSB
with this approach. BUT, as this is all userspace, we can't assume
anything. If user has totally different mechanism for tracing SDT
event, he can still use uprobe infrastructure(which will use MSB)
and his own mechanism which will use MSB as well and again we are
screwed! So, IMO, it's not safe to use MSB. Better to stick with
increment/decrement ABI. Please let me know if you disagree. Anyway,
if we go by second approach, increment/decrement are always balanced.
So no need to use MSB with second approach.

Hmm, but for third approach, we have same issue as first approach so
it's not safe to use MSB with third approach as well.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-12 19:53                       ` Song Liu
@ 2018-07-13  7:55                         ` Ravi Bangoria
  2018-07-13 23:50                           ` Song Liu
  0 siblings, 1 reply; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-13  7:55 UTC (permalink / raw)
  To: Song Liu
  Cc: Oleg Nesterov, srikar, rostedt, mhiramat, Peter Zijlstra, mingo,
	acme, alexander.shishkin, jolsa, namhyung, open list, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Song,

On 07/13/2018 01:23 AM, Song Liu wrote:
> I guess I got to the party late. I found this thread after I started developing
> the same feature...
> 
> On Thu, Jul 12, 2018 at 7:58 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> On 07/11, Ravi Bangoria wrote:
>>>
>>>> However, I still think it would be better to avoid uprobe exporting and modifying
>>>> set_swbp/set_orig_insn. May be we can simply kill both set_swbp() and set_orig_insn(),
>>>> I'll re-check...
>>>
>>> Good that you bring this up. Actually, we can implement same logic
>>> without exporting uprobe. We can do "uprobe = container_of(arch_uprobe)"
>>> in uprobe_write_opcode(). No need to export struct uprobe outside,
>>> no need to change set_swbp() / set_orig_insn() syntax. Just that we
>>> need to pass arch_uprobe object to uprobe_write_opcode().
>>
>> Yes, but you still need to modify set_swbp/set_orig_insn to pass the new
>> arg to uprobe_write_opcode(). OK, this is fine.
>>
>>
>>> But, I wanted to discuss about making ref_ctr_offset a uprobe property
>>> or a consumer property, before posting v6:
>>>
>>> If we make it a consumer property, the design becomes flexible for
>>> user. User will have an option to either depend on kernel to handle
>>> reference counter or he can create normal uprobe and manipulate
>>> reference counter on his own. This will not require any changes to
>>> existing tools. With this approach we need to increment / decrement
>>> reference counter for each consumer. But, because of the fact that our
>>> install_breakpoint() / remove_breakpoint() are not balanced, we have
>>> to keep track of which reference counter have been updated in which
>>> mm, for which uprobe and for which consumer. I.e. Maintain a list of
>>> {uprobe, consumer, mm}.
> 
> Is it possible to maintain balanced refcount by modifying callers of
> install_breakpoint() and remove_breakpoint()? I am actually working
> toward this direction. And I found some imbalance between
>      register_for_each_vma(uprobe, uc)
> and
>      register_for_each_vma(uprobe, NULL)
> 
> From reading the thread, I think there are other sources of imbalance.
> But I think it is still possible to fix it? Please let me know if this is not
> realistic...


I don't think so. It all depends on memory layout of the process, the
execution sequence of tracer vs target, how binary is loaded or how mmap()s
are called. To achieve a balance you need to change current uprobe
implementation. (I haven't explored to change current implementation because
I personally think there is no need to). Let me show you a simple example on
my Ubuntu 18.04 (powerpc vm) with upstream kernel:

-------------
  $ cat loop.c
    #include <stdio.h>
    #include <unistd.h>
    
    void foo(int i)
    {
            printf("Hi: %d\n", i);
            sleep(1);
    }
    
    void main()
    {
            int i;
            for (i = 0; i < 100; i++)
                    foo(i);
    }
  
  $ sudo ./perf probe -x ~/loop foo
  $ sudo ./perf probe install_breakpoint uprobe mm vaddr
  $ sudo ./perf probe remove_breakpoint uprobe mm vaddr
  
  term1~$ ./loop
  
  term2~$ sudo ./perf record -a -e probe:* -o perf.data.kprobe
  
  term3~$ sudo ./perf record -a -e probe_loop:foo
          ^C
  
  term2~$ ...
          ^C[ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0.217 MB perf.data.probe (10 samples) ]
  
  term2~$ sudo ./perf script -i perf.data.kprobe
    probe:install_breakpoint: (c00000000032e4e8) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108e0844
    probe:install_breakpoint: (c00000000032e4e8) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108d0844
    probe:install_breakpoint: (c00000000032e4e8) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5055500 vaddr=0x7fffa2620844
    probe:install_breakpoint: (c00000000032e4e8) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5055500 vaddr=0x7fffa2620844
     probe:remove_breakpoint: (c00000000032e938) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108e0844
     probe:remove_breakpoint: (c00000000032e938) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108d0844
     probe:remove_breakpoint: (c00000000032e938) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108d0844
     probe:remove_breakpoint: (c00000000032e938) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108e0844
     probe:remove_breakpoint: (c00000000032e938) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108e0844
     probe:remove_breakpoint: (c00000000032e938) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108d0844
-------------

Here install_breakpoint() for our target (mm: 0xc0000000b5072900) was
called 2 times where as remove_breakpoint() was called 6 times.

Because, there is an imbalance, and if you make reference counter a
consumer property, you have two options. Either you have to fix
current uprobe infrastructure to solve this imbalance. Or maintain
a list of already updated counter as I've explained(in reply to Oleg).

Now,

  uprobe_register()
    register_for_each_vma()
      install_breakpoint()

gets called for each consumer, but

  uprobe_mmap()
    install_breakpoint()

gets called only once. Now, if you make ref_ctr_offset a consumer
property, you have to increment reference counter for each consumer
in case of uprobe_mmap(). Also, you have to make sure you update
reference counter only once for each consumer because install/
remove_breakpoint() are not balanced. Now, what if reference
counter increment fails for any one consumer? You have to rollback
already updated ones, which brings more complication.

Now, other complication is, generally vma holding reference counter
won't be present when install_breakpoint() gets called from
uprobe_mmap(). I've introduced delayed_uprobes for this. This is
anyway needed with any approach.

The only advantage I was seeing by making reference counter a
consumer property was a user flexibility to update reference counter
on his own. But I've already proposed a solution for that.

So, I personally don't suggest to make ref_ctr_offset a consumer
property because I, again personally, don't think it's a consumer
property.

Please feel free to say if this all looks crap to you :)


> 
> 
> About race conditions, I think both install_breakpoint() and remove_breakpoint()
> are called with
> 
>    down_write(&mm->mmap_sem);
> 


No. Not about updating in one mm. I meant, races in maintaining
all complication I've mentioned above.


> 
> As long as we do the same when modifying the reference counter,
> it should be fine, right?
> 
> Wait... sometimes we only down_read(). Is this by design?


Didn't get this.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-13  7:55                         ` Ravi Bangoria
@ 2018-07-13 23:50                           ` Song Liu
  2018-07-16  8:20                             ` Ravi Bangoria
  2018-07-16  8:51                             ` Ravi Bangoria
  0 siblings, 2 replies; 50+ messages in thread
From: Song Liu @ 2018-07-13 23:50 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Oleg Nesterov, srikar, rostedt, mhiramat, Peter Zijlstra, mingo,
	acme, alexander.shishkin, jolsa, namhyung, open list, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On Fri, Jul 13, 2018 at 12:55 AM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> Hi Song,
>
> On 07/13/2018 01:23 AM, Song Liu wrote:
>> I guess I got to the party late. I found this thread after I started developing
>> the same feature...
>>
>> On Thu, Jul 12, 2018 at 7:58 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>> On 07/11, Ravi Bangoria wrote:
>>>>
>>>>> However, I still think it would be better to avoid uprobe exporting and modifying
>>>>> set_swbp/set_orig_insn. May be we can simply kill both set_swbp() and set_orig_insn(),
>>>>> I'll re-check...
>>>>
>>>> Good that you bring this up. Actually, we can implement same logic
>>>> without exporting uprobe. We can do "uprobe = container_of(arch_uprobe)"
>>>> in uprobe_write_opcode(). No need to export struct uprobe outside,
>>>> no need to change set_swbp() / set_orig_insn() syntax. Just that we
>>>> need to pass arch_uprobe object to uprobe_write_opcode().
>>>
>>> Yes, but you still need to modify set_swbp/set_orig_insn to pass the new
>>> arg to uprobe_write_opcode(). OK, this is fine.
>>>
>>>
>>>> But, I wanted to discuss about making ref_ctr_offset a uprobe property
>>>> or a consumer property, before posting v6:
>>>>
>>>> If we make it a consumer property, the design becomes flexible for
>>>> user. User will have an option to either depend on kernel to handle
>>>> reference counter or he can create normal uprobe and manipulate
>>>> reference counter on his own. This will not require any changes to
>>>> existing tools. With this approach we need to increment / decrement
>>>> reference counter for each consumer. But, because of the fact that our
>>>> install_breakpoint() / remove_breakpoint() are not balanced, we have
>>>> to keep track of which reference counter have been updated in which
>>>> mm, for which uprobe and for which consumer. I.e. Maintain a list of
>>>> {uprobe, consumer, mm}.
>>
>> Is it possible to maintain balanced refcount by modifying callers of
>> install_breakpoint() and remove_breakpoint()? I am actually working
>> toward this direction. And I found some imbalance between
>>      register_for_each_vma(uprobe, uc)
>> and
>>      register_for_each_vma(uprobe, NULL)
>>
>> From reading the thread, I think there are other sources of imbalance.
>> But I think it is still possible to fix it? Please let me know if this is not
>> realistic...
>
>
> I don't think so. It all depends on memory layout of the process, the
> execution sequence of tracer vs target, how binary is loaded or how mmap()s
> are called. To achieve a balance you need to change current uprobe
> implementation. (I haven't explored to change current implementation because
> I personally think there is no need to). Let me show you a simple example on
> my Ubuntu 18.04 (powerpc vm) with upstream kernel:
>
> -------------
>   $ cat loop.c
>     #include <stdio.h>
>     #include <unistd.h>
>
>     void foo(int i)
>     {
>             printf("Hi: %d\n", i);
>             sleep(1);
>     }
>
>     void main()
>     {
>             int i;
>             for (i = 0; i < 100; i++)
>                     foo(i);
>     }
>
>   $ sudo ./perf probe -x ~/loop foo
>   $ sudo ./perf probe install_breakpoint uprobe mm vaddr
>   $ sudo ./perf probe remove_breakpoint uprobe mm vaddr
>
>   term1~$ ./loop
>
>   term2~$ sudo ./perf record -a -e probe:* -o perf.data.kprobe
>
>   term3~$ sudo ./perf record -a -e probe_loop:foo
>           ^C
>
>   term2~$ ...
>           ^C[ perf record: Woken up 1 times to write data ]
>           [ perf record: Captured and wrote 0.217 MB perf.data.probe (10 samples) ]
>
>   term2~$ sudo ./perf script -i perf.data.kprobe
>     probe:install_breakpoint: (c00000000032e4e8) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108e0844
>     probe:install_breakpoint: (c00000000032e4e8) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108d0844
>     probe:install_breakpoint: (c00000000032e4e8) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5055500 vaddr=0x7fffa2620844
>     probe:install_breakpoint: (c00000000032e4e8) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5055500 vaddr=0x7fffa2620844
>      probe:remove_breakpoint: (c00000000032e938) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108e0844
>      probe:remove_breakpoint: (c00000000032e938) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108d0844
>      probe:remove_breakpoint: (c00000000032e938) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108d0844
>      probe:remove_breakpoint: (c00000000032e938) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108e0844
>      probe:remove_breakpoint: (c00000000032e938) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108e0844
>      probe:remove_breakpoint: (c00000000032e938) uprobe=0xc0000000a40d0e00 mm=0xc0000000b5072900 vaddr=0x1108d0844
> -------------
>
> Here install_breakpoint() for our target (mm: 0xc0000000b5072900) was
> called 2 times where as remove_breakpoint() was called 6 times.
>
> Because, there is an imbalance, and if you make reference counter a
> consumer property, you have two options. Either you have to fix
> current uprobe infrastructure to solve this imbalance. Or maintain
> a list of already updated counter as I've explained(in reply to Oleg).
>
> Now,
>
>   uprobe_register()
>     register_for_each_vma()
>       install_breakpoint()
>
> gets called for each consumer, but
>
>   uprobe_mmap()
>     install_breakpoint()
>
> gets called only once. Now, if you make ref_ctr_offset a consumer
> property, you have to increment reference counter for each consumer
> in case of uprobe_mmap(). Also, you have to make sure you update
> reference counter only once for each consumer because install/
> remove_breakpoint() are not balanced. Now, what if reference
> counter increment fails for any one consumer? You have to rollback
> already updated ones, which brings more complication.

Hmm... what happens when we have multiple uprobes sharing the same
reference counter? It feels equally complicate to me. Or did I miss any
cases here?


>
> Now, other complication is, generally vma holding reference counter
> won't be present when install_breakpoint() gets called from
> uprobe_mmap(). I've introduced delayed_uprobes for this. This is
> anyway needed with any approach.

Yeah, I am aware of this problem. But I haven't started looking into a fix.

>
> The only advantage I was seeing by making reference counter a
> consumer property was a user flexibility to update reference counter
> on his own. But I've already proposed a solution for that.
>
> So, I personally don't suggest to make ref_ctr_offset a consumer
> property because I, again personally, don't think it's a consumer
> property.
>
> Please feel free to say if this all looks crap to you :)
>

These all make sense. Multiple consumer case does make the problem
a lot more complicated

For the example you showed above (~/loop:foo), will the following patch
fixes the imbalance? It worked in my tests.

Thanks,
Song

From 664b087cff0d458c0360a6834140a2a88dff478e Mon Sep 17 00:00:00 2001
From: Song Liu <songliubraving@fb.com>
Date: Thu, 12 Jul 2018 11:16:51 -0700
Subject: [PATCH] perf/core,uprobe: fix imbalanced install_breakpoint and
 remove_breakpoint

When uprobes are used by perf event, it is handle as follows:

Enable path:
1. perf_event_open() => TRACE_REG_PERF_REGISTER => probe_event_enable()
2. PERF_EVENT_IOC_ENABLE => TRACE_REG_PERF_OPEN => uprobe_perf_open()

Disable path:
3. PERF_EVENT_IOC_DISABLE => TRACE_REG_PERF_CLOSE => uprobe_perf_close()
4. close(fd) => TRACE_REG_PERF_UNREGISTER => probe_event_disable()

In this routine, install_breakpoint() is called once at step 2; while
remove_breakpoint is called twice at both step 3 and step 4.

This patch tries to resolve this imbalance by passing extra flag
"restore_insn" to probe_event_disable().

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/uprobes.h     |  6 ++++--
 kernel/events/uprobes.c     | 21 +++++++++++++++------
 kernel/trace/trace_uprobe.c | 14 ++++++++++----
 3 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0a294e950df8..2b7a67b64877 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -124,7 +124,8 @@ extern unsigned long uprobe_get_trap_addr(struct
pt_regs *regs);
 extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long
vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct
uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct
uprobe_consumer *uc, bool);
-extern void uprobe_unregister(struct inode *inode, loff_t offset,
struct uprobe_consumer *uc);
+extern void uprobe_unregister(struct inode *inode, loff_t offset,
+                  struct uprobe_consumer *uc, bool);
 extern int uprobe_mmap(struct vm_area_struct *vma);
 extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long
start, unsigned long end);
 extern void uprobe_start_dup_mmap(void);
@@ -166,7 +167,8 @@ uprobe_apply(struct inode *inode, loff_t offset,
struct uprobe_consumer *uc, boo
     return -ENOSYS;
 }
 static inline void
-uprobe_unregister(struct inode *inode, loff_t offset, struct
uprobe_consumer *uc)
+uprobe_unregister(struct inode *inode, loff_t offset, struct
uprobe_consumer *uc,
+          bool restore_insn)
 {
 }
 static inline int uprobe_mmap(struct vm_area_struct *vma)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ccc579a7d32e..988f5a5acaca 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -846,14 +846,16 @@ static int __uprobe_register(struct uprobe
*uprobe, struct uprobe_consumer *uc)
     return register_for_each_vma(uprobe, uc);
 }

-static void __uprobe_unregister(struct uprobe *uprobe, struct
uprobe_consumer *uc)
+static void __uprobe_unregister(struct uprobe *uprobe, struct
uprobe_consumer *uc,
+                bool restore_insn)
 {
-    int err;
+    int err = 0;

     if (WARN_ON(!consumer_del(uprobe, uc)))
         return;

-    err = register_for_each_vma(uprobe, NULL);
+    if (restore_insn)
+        err = register_for_each_vma(uprobe, NULL);
     /* TODO : cant unregister? schedule a worker thread */
     if (!uprobe->consumers && !err)
         delete_uprobe(uprobe);
@@ -906,7 +908,11 @@ int uprobe_register(struct inode *inode, loff_t
offset, struct uprobe_consumer *
     if (likely(uprobe_is_active(uprobe))) {
         ret = __uprobe_register(uprobe, uc);
         if (ret)
-            __uprobe_unregister(uprobe, uc);
+            /*
+             * only do remove_breakpoint (restore_insn)
+             * when failed in install_breakpoint (ret > 0)
+             */
+            __uprobe_unregister(uprobe, uc, ret > 0);
     }
     up_write(&uprobe->register_rwsem);
     put_uprobe(uprobe);
@@ -951,8 +957,11 @@ int uprobe_apply(struct inode *inode, loff_t offset,
  * @inode: the file in which the probe has to be removed.
  * @offset: offset from the start of the file.
  * @uc: identify which probe if multiple probes are colocated.
+ * @restore_insn: shall we restore original instruction with
+ *                register_for_each_vma(uprobe, NULL)
  */
-void uprobe_unregister(struct inode *inode, loff_t offset, struct
uprobe_consumer *uc)
+void uprobe_unregister(struct inode *inode, loff_t offset, struct
uprobe_consumer *uc,
+               bool restore_insn)
 {
     struct uprobe *uprobe;

@@ -961,7 +970,7 @@ void uprobe_unregister(struct inode *inode, loff_t
offset, struct uprobe_consume
         return;

     down_write(&uprobe->register_rwsem);
-    __uprobe_unregister(uprobe, uc);
+    __uprobe_unregister(uprobe, uc, restore_insn);
     up_write(&uprobe->register_rwsem);
     put_uprobe(uprobe);
 }
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index bf89a51e740d..fb6fb9d00cdc 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -938,7 +938,8 @@ probe_event_enable(struct trace_uprobe *tu, struct
trace_event_file *file,
 }

 static void
-probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file)
+probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file,
+            bool restore_insn)
 {
     if (!trace_probe_is_enabled(&tu->tp))
         return;
@@ -961,7 +962,8 @@ probe_event_disable(struct trace_uprobe *tu,
struct trace_event_file *file)

     WARN_ON(!uprobe_filter_is_empty(&tu->filter));

-    uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
+    uprobe_unregister(tu->inode, tu->offset, &tu->consumer,
+              restore_insn);
     tu->inode = NULL;
     tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;

@@ -1197,7 +1199,7 @@ trace_uprobe_register(struct trace_event_call
*event, enum trace_reg type,
         return probe_event_enable(tu, file, NULL);

     case TRACE_REG_UNREGISTER:
-        probe_event_disable(tu, file);
+        probe_event_disable(tu, file, true);
         return 0;

 #ifdef CONFIG_PERF_EVENTS
@@ -1205,7 +1207,11 @@ trace_uprobe_register(struct trace_event_call
*event, enum trace_reg type,
         return probe_event_enable(tu, NULL, uprobe_perf_filter);

     case TRACE_REG_PERF_UNREGISTER:
-        probe_event_disable(tu, NULL);
+        /*
+         * Don't restore instruction, as TRACE_REG_PERF_CLOSE
+         * already did that.
+         */
+        probe_event_disable(tu, NULL, false /* restore_insn */);
         return 0;

     case TRACE_REG_PERF_OPEN:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-13 23:50                           ` Song Liu
@ 2018-07-16  8:20                             ` Ravi Bangoria
  2018-07-16  8:51                             ` Ravi Bangoria
  1 sibling, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-16  8:20 UTC (permalink / raw)
  To: Song Liu
  Cc: Oleg Nesterov, srikar, rostedt, mhiramat, Peter Zijlstra, mingo,
	acme, alexander.shishkin, jolsa, namhyung, open list, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Song,

On 07/14/2018 05:20 AM, Song Liu wrote:
> 
> Hmm... what happens when we have multiple uprobes sharing the same
> reference counter? It feels equally complicate to me. Or did I miss any
> cases here?


As far as I can think of, it should be handled by default. No special
treatment needed for this.

...

> 
> This patch tries to resolve this imbalance by passing extra flag
> "restore_insn" to probe_event_disable().
> 
> Signed-off-by: Song Liu <songliubraving@fb.com>


Ah cool. Seems this will make them balanced. But is it fine to change
uprobe_unregister()? It's already an ABI. No other way around?

Also, one more source of imbalance is this condition:

          if (is_register)
                  flags |= VM_WRITE;

in valid_vma(). You need to take care of that as well.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
  2018-07-13 23:50                           ` Song Liu
  2018-07-16  8:20                             ` Ravi Bangoria
@ 2018-07-16  8:51                             ` Ravi Bangoria
  1 sibling, 0 replies; 50+ messages in thread
From: Ravi Bangoria @ 2018-07-16  8:51 UTC (permalink / raw)
  To: Song Liu
  Cc: Oleg Nesterov, srikar, rostedt, mhiramat, Peter Zijlstra, mingo,
	acme, alexander.shishkin, jolsa, namhyung, open list, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Song,

BTW, I've sent v6. Can you please review it.

https://lkml.org/lkml/2018/7/16/353

Thanks
Ravi


^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2018-07-16  8:51 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-28  5:21 [PATCH v5 00/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
2018-06-28  5:22 ` [PATCH v5 01/10] Uprobes: Move uprobe structure to uprobe.h Ravi Bangoria
2018-06-28  5:22 ` [PATCH v5 02/10] Uprobes: Simplify uprobe_register() body Ravi Bangoria
2018-06-28  5:22 ` [PATCH v5 03/10] Uprobe: Change set_swbp definition Ravi Bangoria
2018-06-28  5:22 ` [PATCH v5 04/10] Uprobe: Change set_orig_insn definition Ravi Bangoria
2018-06-28  5:22 ` [PATCH v5 05/10] Uprobe: Change uprobe_write_opcode definition Ravi Bangoria
2018-06-28  5:22 ` [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
2018-06-28 19:51   ` Oleg Nesterov
2018-06-29  3:23     ` Ravi Bangoria
2018-07-01 21:09   ` Oleg Nesterov
2018-07-02  5:16     ` Ravi Bangoria
2018-07-02 18:01       ` Oleg Nesterov
2018-07-03  5:30         ` Ravi Bangoria
2018-07-03  6:16           ` Srikar Dronamraju
2018-07-03  7:43             ` Ravi Bangoria
2018-07-04  9:16               ` Srikar Dronamraju
2018-07-04  9:24                 ` Ravi Bangoria
2018-07-03  8:11           ` Ravi Bangoria
2018-07-03 16:36           ` Oleg Nesterov
2018-07-03 17:25             ` Oleg Nesterov
2018-07-04  4:53               ` Ravi Bangoria
2018-07-10 15:25                 ` Oleg Nesterov
2018-07-11  8:44                   ` Ravi Bangoria
2018-07-11  9:52                     ` Ravi Bangoria
2018-07-12 14:58                     ` Oleg Nesterov
2018-07-12 19:53                       ` Song Liu
2018-07-13  7:55                         ` Ravi Bangoria
2018-07-13 23:50                           ` Song Liu
2018-07-16  8:20                             ` Ravi Bangoria
2018-07-16  8:51                             ` Ravi Bangoria
2018-07-13  5:39                       ` Ravi Bangoria
2018-07-04  4:49             ` Ravi Bangoria
2018-07-03 17:12           ` Oleg Nesterov
2018-07-03 18:23             ` Oleg Nesterov
2018-07-04  5:25               ` Ravi Bangoria
2018-07-02 16:01   ` Srikar Dronamraju
2018-07-02 18:05     ` Oleg Nesterov
2018-07-03  6:29     ` Ravi Bangoria
2018-07-03 19:26       ` Oleg Nesterov
2018-07-04  5:26         ` Ravi Bangoria
2018-07-04  6:07     ` Ravi Bangoria
2018-06-28  5:22 ` [PATCH v5 07/10] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
2018-06-28  5:22 ` [PATCH v5 08/10] Uprobes/sdt: " Ravi Bangoria
2018-06-28  5:22 ` [PATCH v5 09/10] Uprobes/sdt: Document about reference counter Ravi Bangoria
2018-07-02 14:54   ` Srikar Dronamraju
2018-07-03  7:50     ` Ravi Bangoria
2018-06-28  5:22 ` [PATCH v5 10/10] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
2018-07-02 14:45   ` Srikar Dronamraju
2018-07-02 14:57   ` Srikar Dronamraju
2018-07-03  8:00     ` Ravi Bangoria

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).