linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore)
@ 2018-08-09  4:18 Ravi Bangoria
  2018-08-09  4:18 ` [PATCH v8 1/6] Uprobes: Simplify uprobe_register() body Ravi Bangoria
                   ` (6 more replies)
  0 siblings, 7 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-09  4:18 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat, liu.song.a23
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

v8 changes:
 - Call delayed_uprobe_remove(uprobe, NULL) from put_uprobe() as well.
 - Other minor optimizations.

v7: https://lkml.org/lkml/2018/7/31/20


Description:
Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
have these markers embedded in them. These markers are added by developer
at important places in the code. Each marker source expands to a single
nop instruction in the compiled code but there may be additional
overhead for computing the marker arguments which expands to couple of
instructions. In case the overhead is more, execution of it can be
omitted by runtime if() condition when no one is tracing on the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

Default value of reference counter is 0. Tracer has to increment the 
reference counter before tracing on a marker and decrement it when
done with the tracing.

Currently, perf tool has limited supports for SDT markers. I.e. it
can not trace markers surrounded by reference counter. Also, it's
not easy to add reference counter logic in userspace tool like perf,
so basic idea for this patchset is to add reference counter logic in
the a uprobe infrastructure. Ex,[2]

  # cat tick.c
    ... 
    for (i = 0; i < 100; i++) {
	DTRACE_PROBE1(tick, loop1, i);
        if (TICK_LOOP2_ENABLED()) {
            DTRACE_PROBE1(tick, loop2, i); 
        }
        printf("hi: %d\n", i); 
        sleep(1);
    }   
    ... 

Here tick:loop1 is marker without reference counter where as tick:loop2
is surrounded by reference counter condition.

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2

  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
             3      sdt_tick:loop1
             0      sdt_tick:loop2
     2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. Same experiment with this
patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C  
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation


Ravi Bangoria (6):
  Uprobes: Simplify uprobe_register() body
  Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
  Uprobes: Support SDT markers having reference count (semaphore)
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  perf probe: Support SDT markers having reference counter (semaphore)

 arch/arm/probes/uprobes/core.c |   2 +-
 arch/mips/kernel/uprobes.c     |   2 +-
 include/linux/uprobes.h        |   7 +-
 kernel/events/uprobes.c        | 329 +++++++++++++++++++++++++++++++++++------
 kernel/trace/trace.c           |   2 +-
 kernel/trace/trace_uprobe.c    |  75 +++++++++-
 tools/perf/util/probe-event.c  |  39 ++++-
 tools/perf/util/probe-event.h  |   1 +
 tools/perf/util/probe-file.c   |  34 ++++-
 tools/perf/util/probe-file.h   |   1 +
 tools/perf/util/symbol-elf.c   |  46 ++++--
 tools/perf/util/symbol.h       |   7 +
 12 files changed, 472 insertions(+), 73 deletions(-)

-- 
2.14.4


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v8 1/6] Uprobes: Simplify uprobe_register() body
  2018-08-09  4:18 [PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
@ 2018-08-09  4:18 ` Ravi Bangoria
  2018-08-11  8:00   ` Song Liu
  2018-08-13  8:56   ` Srikar Dronamraju
  2018-08-09  4:18 ` [PATCH v8 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() Ravi Bangoria
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-09  4:18 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat, liu.song.a23
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Simplify uprobe_register() function body and let __uprobe_register()
handle everything. Also move dependency functions around to fix build
failures.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 kernel/events/uprobes.c | 69 ++++++++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 33 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ccc579a7d32e..471eac896635 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -840,13 +840,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
 	return err;
 }
 
-static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc)
-{
-	consumer_add(uprobe, uc);
-	return register_for_each_vma(uprobe, uc);
-}
-
-static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
+static void
+__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
 	int err;
 
@@ -860,24 +855,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u
 }
 
 /*
- * uprobe_register - register a probe
+ * uprobe_unregister - unregister a already registered probe.
+ * @inode: the file in which the probe has to be removed.
+ * @offset: offset from the start of the file.
+ * @uc: identify which probe if multiple probes are colocated.
+ */
+void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+{
+	struct uprobe *uprobe;
+
+	uprobe = find_uprobe(inode, offset);
+	if (WARN_ON(!uprobe))
+		return;
+
+	down_write(&uprobe->register_rwsem);
+	__uprobe_unregister(uprobe, uc);
+	up_write(&uprobe->register_rwsem);
+	put_uprobe(uprobe);
+}
+EXPORT_SYMBOL_GPL(uprobe_unregister);
+
+/*
+ * __uprobe_register - register a probe
  * @inode: the file in which the probe has to be placed.
  * @offset: offset from the start of the file.
  * @uc: information on howto handle the probe..
  *
- * Apart from the access refcount, uprobe_register() takes a creation
+ * Apart from the access refcount, __uprobe_register() takes a creation
  * refcount (thro alloc_uprobe) if and only if this @uprobe is getting
  * inserted into the rbtree (i.e first consumer for a @inode:@offset
  * tuple).  Creation refcount stops uprobe_unregister from freeing the
  * @uprobe even before the register operation is complete. Creation
  * refcount is released when the last @uc for the @uprobe
- * unregisters. Caller of uprobe_register() is required to keep @inode
+ * unregisters. Caller of __uprobe_register() is required to keep @inode
  * (and the containing mount) referenced.
  *
  * Return errno if it cannot successully install probes
  * else return 0 (success)
  */
-int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+static int __uprobe_register(struct inode *inode, loff_t offset,
+			     struct uprobe_consumer *uc)
 {
 	struct uprobe *uprobe;
 	int ret;
@@ -904,7 +921,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
 	down_write(&uprobe->register_rwsem);
 	ret = -EAGAIN;
 	if (likely(uprobe_is_active(uprobe))) {
-		ret = __uprobe_register(uprobe, uc);
+		consumer_add(uprobe, uc);
+		ret = register_for_each_vma(uprobe, uc);
 		if (ret)
 			__uprobe_unregister(uprobe, uc);
 	}
@@ -915,6 +933,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
 		goto retry;
 	return ret;
 }
+
+int uprobe_register(struct inode *inode, loff_t offset,
+		    struct uprobe_consumer *uc)
+{
+	return __uprobe_register(inode, offset, uc);
+}
 EXPORT_SYMBOL_GPL(uprobe_register);
 
 /*
@@ -946,27 +970,6 @@ int uprobe_apply(struct inode *inode, loff_t offset,
 	return ret;
 }
 
-/*
- * uprobe_unregister - unregister a already registered probe.
- * @inode: the file in which the probe has to be removed.
- * @offset: offset from the start of the file.
- * @uc: identify which probe if multiple probes are colocated.
- */
-void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
-{
-	struct uprobe *uprobe;
-
-	uprobe = find_uprobe(inode, offset);
-	if (WARN_ON(!uprobe))
-		return;
-
-	down_write(&uprobe->register_rwsem);
-	__uprobe_unregister(uprobe, uc);
-	up_write(&uprobe->register_rwsem);
-	put_uprobe(uprobe);
-}
-EXPORT_SYMBOL_GPL(uprobe_unregister);
-
 static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v8 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
  2018-08-09  4:18 [PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
  2018-08-09  4:18 ` [PATCH v8 1/6] Uprobes: Simplify uprobe_register() body Ravi Bangoria
@ 2018-08-09  4:18 ` Ravi Bangoria
  2018-08-11  8:01   ` Song Liu
  2018-08-13  8:51   ` Srikar Dronamraju
  2018-08-09  4:18 ` [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-09  4:18 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat, liu.song.a23
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Add addition argument 'arch_uprobe' to uprobe_write_opcode().
We need this in later set of patches.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/arm/probes/uprobes/core.c | 2 +-
 arch/mips/kernel/uprobes.c     | 2 +-
 include/linux/uprobes.h        | 2 +-
 kernel/events/uprobes.c        | 9 +++++----
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
index d1329f1ba4e4..bf992264060e 100644
--- a/arch/arm/probes/uprobes/core.c
+++ b/arch/arm/probes/uprobes/core.c
@@ -32,7 +32,7 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
 int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	     unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr,
+	return uprobe_write_opcode(auprobe, mm, vaddr,
 		   __opcode_to_mem_arm(auprobe->bpinsn));
 }
 
diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c
index f7a0645ccb82..4aaff3b3175c 100644
--- a/arch/mips/kernel/uprobes.c
+++ b/arch/mips/kernel/uprobes.c
@@ -224,7 +224,7 @@ unsigned long arch_uretprobe_hijack_return_addr(
 int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+	return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }
 
 void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0a294e950df8..bb9d2084af03 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -121,7 +121,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn);
 extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
-extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
+extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 471eac896635..c0418ba52ba8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -299,8 +299,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
  * Called with mm->mmap_sem held for write.
  * Return 0 (success) or a negative errno.
  */
-int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
-			uprobe_opcode_t opcode)
+int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
+			unsigned long vaddr, uprobe_opcode_t opcode)
 {
 	struct page *old_page, *new_page;
 	struct vm_area_struct *vma;
@@ -351,7 +351,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
  */
 int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+	return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }
 
 /**
@@ -366,7 +366,8 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
 int __weak
 set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn);
+	return uprobe_write_opcode(auprobe, mm, vaddr,
+			*(uprobe_opcode_t *)&auprobe->insn);
 }
 
 static struct uprobe *get_uprobe(struct uprobe *uprobe)
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-09  4:18 [PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
  2018-08-09  4:18 ` [PATCH v8 1/6] Uprobes: Simplify uprobe_register() body Ravi Bangoria
  2018-08-09  4:18 ` [PATCH v8 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() Ravi Bangoria
@ 2018-08-09  4:18 ` Ravi Bangoria
  2018-08-09 14:38   ` Oleg Nesterov
                     ` (3 more replies)
  2018-08-09  4:18 ` [PATCH v8 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
                   ` (3 subsequent siblings)
  6 siblings, 4 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-09  4:18 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat, liu.song.a23
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
have these markers embedded in them. These markers are added by developer
at important places in the code. Each marker source expands to a single
nop instruction in the compiled code but there may be additional
overhead for computing the marker arguments which expands to couple of
instructions. In case the overhead is more, execution of it can be
omitted by runtime if() condition when no one is tracing on the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

Default value of reference counter is 0. Tracer has to increment the
reference counter before tracing on a marker and decrement it when
done with the tracing.

Implement the reference counter logic in core uprobe. User will be
able to use it from trace_uprobe as well as from kernel module. New
trace_uprobe definition with reference counter will now be:

    <path>:<offset>[(ref_ctr_offset)]

where ref_ctr_offset is an optional field. For kernel module, new
variant of uprobe_register() has been introduced:

    uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)

No new variant for uprobe_unregister() because it's assumed to have
only one reference counter for one uprobe.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: 'reference counter' is called as 'semaphore' in original Dtrace
(or Systemtap, bcc and even in ELF) documentation and code. But the
term 'semaphore' is misleading in this context. This is just a counter
used to hold number of tracers tracing on a marker. This is not really
used for any synchronization. So we are referring it as 'reference
counter' in kernel / perf code.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
[Only trace_uprobe.c]
---
 include/linux/uprobes.h     |   5 +
 kernel/events/uprobes.c     | 246 ++++++++++++++++++++++++++++++++++++++++++--
 kernel/trace/trace.c        |   2 +-
 kernel/trace/trace_uprobe.c |  38 ++++++-
 4 files changed, 280 insertions(+), 11 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index bb9d2084af03..103a48a48872 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
+extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
 {
 	return -ENOSYS;
 }
+static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+{
+	return -ENOSYS;
+}
 static inline int
 uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
 {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c0418ba52ba8..61b0481ef417 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -73,6 +73,7 @@ struct uprobe {
 	struct uprobe_consumer	*consumers;
 	struct inode		*inode;		/* Also hold a ref to inode */
 	loff_t			offset;
+	loff_t			ref_ctr_offset;
 	unsigned long		flags;
 
 	/*
@@ -88,6 +89,15 @@ struct uprobe {
 	struct arch_uprobe	arch;
 };
 
+struct delayed_uprobe {
+	struct list_head list;
+	struct uprobe *uprobe;
+	struct mm_struct *mm;
+};
+
+static DEFINE_MUTEX(delayed_uprobe_lock);
+static LIST_HEAD(delayed_uprobe_list);
+
 /*
  * Execute out of line area: anonymous executable mapping installed
  * by the probed task to execute the copy of the original instruction
@@ -282,6 +292,157 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
 	return 1;
 }
 
+static struct delayed_uprobe *
+delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct delayed_uprobe *du;
+
+	list_for_each_entry(du, &delayed_uprobe_list, list)
+		if (du->uprobe == uprobe && du->mm == mm)
+			return du;
+	return NULL;
+}
+
+static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct delayed_uprobe *du;
+
+	if (delayed_uprobe_check(uprobe, mm))
+		return 0;
+
+	du = kzalloc(sizeof(*du), GFP_KERNEL);
+	if (!du)
+		return -ENOMEM;
+
+	du->uprobe = uprobe;
+	du->mm = mm;
+	list_add(&du->list, &delayed_uprobe_list);
+	return 0;
+}
+
+static void delayed_uprobe_delete(struct delayed_uprobe *du)
+{
+	if (!du)
+		return;
+	list_del(&du->list);
+	kfree(du);
+}
+
+static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct list_head *pos, *q;
+	struct delayed_uprobe *du;
+
+	if (!uprobe && !mm)
+		return;
+
+	list_for_each_safe(pos, q, &delayed_uprobe_list) {
+		du = list_entry(pos, struct delayed_uprobe, list);
+
+		if (uprobe && mm && du->uprobe == uprobe && du->mm == mm)
+			delayed_uprobe_delete(du);
+		else if (!uprobe && du->mm == mm)
+			delayed_uprobe_delete(du);
+		else if (!mm && du->uprobe == uprobe)
+			delayed_uprobe_delete(du);
+	}
+}
+
+static bool valid_ref_ctr_vma(struct uprobe *uprobe,
+			      struct vm_area_struct *vma)
+{
+	unsigned long vaddr = offset_to_vaddr(vma, uprobe->ref_ctr_offset);
+
+	return uprobe->ref_ctr_offset &&
+		vma->vm_file &&
+		file_inode(vma->vm_file) == uprobe->inode &&
+		(vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE &&
+		vma->vm_start <= vaddr &&
+		vma->vm_end > vaddr;
+}
+
+static struct vm_area_struct *
+find_ref_ctr_vma(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct vm_area_struct *tmp;
+
+	for (tmp = mm->mmap; tmp; tmp = tmp->vm_next)
+		if (valid_ref_ctr_vma(uprobe, tmp))
+			return tmp;
+
+	return NULL;
+}
+
+static int
+__update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
+{
+	void *kaddr;
+	struct page *page;
+	struct vm_area_struct *vma;
+	int ret = 0;
+	short *ptr;
+
+	if (vaddr == 0 || d == 0)
+		return -EINVAL;
+
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
+			FOLL_WRITE, &page, &vma, NULL);
+	if (unlikely(ret <= 0)) {
+		/*
+		 * We are asking for 1 page. If get_user_pages_remote() fails,
+		 * it may return 0, in that case we have to return error.
+		 */
+		ret = (ret == 0) ? -EBUSY : ret;
+		pr_warn("Failed to %s ref_ctr. (%d)\n",
+			d > 0 ? "increment" : "decrement", ret);
+		return ret;
+	}
+
+	kaddr = kmap_atomic(page);
+	ptr = kaddr + (vaddr & ~PAGE_MASK);
+
+	if (unlikely(*ptr + d < 0)) {
+		pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
+			"curr val: %d, delta: %d\n", vaddr, *ptr, d);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	*ptr += d;
+	ret = 0;
+out:
+	kunmap_atomic(kaddr);
+	put_page(page);
+	return ret;
+}
+
+static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
+			  bool is_register)
+{
+	struct vm_area_struct *rc_vma;
+	unsigned long rc_vaddr;
+	int ret = 0;
+
+	rc_vma = find_ref_ctr_vma(uprobe, mm);
+
+	if (rc_vma) {
+		rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
+		ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1);
+
+		if (is_register)
+			return ret;
+	}
+
+	mutex_lock(&delayed_uprobe_lock);
+	if (is_register)
+		ret = delayed_uprobe_add(uprobe, mm);
+	else
+		delayed_uprobe_remove(uprobe, mm);
+	mutex_unlock(&delayed_uprobe_lock);
+
+	return ret;
+}
+
 /*
  * NOTE:
  * Expect the breakpoint instruction to be the smallest size instruction for
@@ -302,9 +463,13 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
 int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 			unsigned long vaddr, uprobe_opcode_t opcode)
 {
+	struct uprobe *uprobe;
 	struct page *old_page, *new_page;
 	struct vm_area_struct *vma;
-	int ret;
+	int ret, is_register, ref_ctr_updated = 0;
+
+	is_register = is_swbp_insn(&opcode);
+	uprobe = container_of(auprobe, struct uprobe, arch);
 
 retry:
 	/* Read the page with vaddr into memory */
@@ -317,6 +482,15 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	if (ret <= 0)
 		goto put_old;
 
+	/* We are going to replace instruction, update ref_ctr. */
+	if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
+		ret = update_ref_ctr(uprobe, mm, is_register);
+		if (ret)
+			goto put_old;
+
+		ref_ctr_updated = 1;
+	}
+
 	ret = anon_vma_prepare(vma);
 	if (ret)
 		goto put_old;
@@ -337,6 +511,11 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 
 	if (unlikely(ret == -EAGAIN))
 		goto retry;
+
+	/* Revert back reference counter if instruction update failed. */
+	if (ret && is_register && ref_ctr_updated)
+		update_ref_ctr(uprobe, mm, false);
+
 	return ret;
 }
 
@@ -378,8 +557,15 @@ static struct uprobe *get_uprobe(struct uprobe *uprobe)
 
 static void put_uprobe(struct uprobe *uprobe)
 {
-	if (atomic_dec_and_test(&uprobe->ref))
+	if (atomic_dec_and_test(&uprobe->ref)) {
+		/*
+		 * If application munmap(exec_vma) before uprobe_unregister()
+		 * gets called, we don't get a chance to remove uprobe from
+		 * delayed_uprobe_list in remove_breakpoint(). Do it here.
+		 */
+		delayed_uprobe_remove(uprobe, NULL);
 		kfree(uprobe);
+	}
 }
 
 static int match_uprobe(struct uprobe *l, struct uprobe *r)
@@ -484,7 +670,8 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
 	return u;
 }
 
-static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
+static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
+				   loff_t ref_ctr_offset)
 {
 	struct uprobe *uprobe, *cur_uprobe;
 
@@ -494,6 +681,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
 
 	uprobe->inode = inode;
 	uprobe->offset = offset;
+	uprobe->ref_ctr_offset = ref_ctr_offset;
 	init_rwsem(&uprobe->register_rwsem);
 	init_rwsem(&uprobe->consumer_rwsem);
 
@@ -895,7 +1083,7 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
  * else return 0 (success)
  */
 static int __uprobe_register(struct inode *inode, loff_t offset,
-			     struct uprobe_consumer *uc)
+			     loff_t ref_ctr_offset, struct uprobe_consumer *uc)
 {
 	struct uprobe *uprobe;
 	int ret;
@@ -912,7 +1100,7 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 		return -EINVAL;
 
  retry:
-	uprobe = alloc_uprobe(inode, offset);
+	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (!uprobe)
 		return -ENOMEM;
 	/*
@@ -938,10 +1126,17 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 int uprobe_register(struct inode *inode, loff_t offset,
 		    struct uprobe_consumer *uc)
 {
-	return __uprobe_register(inode, offset, uc);
+	return __uprobe_register(inode, offset, 0, uc);
 }
 EXPORT_SYMBOL_GPL(uprobe_register);
 
+int uprobe_register_refctr(struct inode *inode, loff_t offset,
+			   loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+{
+	return __uprobe_register(inode, offset, ref_ctr_offset, uc);
+}
+EXPORT_SYMBOL_GPL(uprobe_register_refctr);
+
 /*
  * uprobe_apply - unregister a already registered probe.
  * @inode: the file in which the probe has to be removed.
@@ -1060,6 +1255,31 @@ static void build_probe_list(struct inode *inode,
 	spin_unlock(&uprobes_treelock);
 }
 
+static int delayed_uprobe_install(struct vm_area_struct *vma)
+{
+	struct list_head *pos, *q;
+	struct delayed_uprobe *du;
+	unsigned long vaddr;
+	int ret = 0, err = 0;
+
+	mutex_lock(&delayed_uprobe_lock);
+	list_for_each_safe(pos, q, &delayed_uprobe_list) {
+		du = list_entry(pos, struct delayed_uprobe, list);
+
+		if (!valid_ref_ctr_vma(du->uprobe, vma))
+			continue;
+
+		vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
+		ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
+		/* Record an error and continue. */
+		if (ret && !err)
+			err = ret;
+		delayed_uprobe_delete(du);
+	}
+	mutex_unlock(&delayed_uprobe_lock);
+	return err;
+}
+
 /*
  * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
  *
@@ -1072,7 +1292,15 @@ int uprobe_mmap(struct vm_area_struct *vma)
 	struct uprobe *uprobe, *u;
 	struct inode *inode;
 
-	if (no_uprobe_events() || !valid_vma(vma, true))
+	if (no_uprobe_events())
+		return 0;
+
+	if (vma->vm_file &&
+	    (vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE &&
+	    test_bit(MMF_HAS_UPROBES, &vma->vm_mm->flags))
+		delayed_uprobe_install(vma);
+
+	if (!valid_vma(vma, true))
 		return 0;
 
 	inode = file_inode(vma->vm_file);
@@ -1246,6 +1474,10 @@ void uprobe_clear_state(struct mm_struct *mm)
 {
 	struct xol_area *area = mm->uprobes_state.xol_area;
 
+	mutex_lock(&delayed_uprobe_lock);
+	delayed_uprobe_remove(NULL, mm);
+	mutex_unlock(&delayed_uprobe_lock);
+
 	if (!area)
 		return;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 823687997b01..616160b85860 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4620,7 +4620,7 @@ static const char readme_msg[] =
   "place (kretprobe): [<module>:]<symbol>[+<offset>]|<memaddr>\n"
 #endif
 #ifdef CONFIG_UPROBE_EVENTS
-	"\t    place: <path>:<offset>\n"
+  "   place (uprobe): <path>:<offset>[(ref_ctr_offset)]\n"
 #endif
 	"\t     args: <name>=fetcharg[:type]\n"
 	"\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index bf89a51e740d..bf2be098eb08 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -59,6 +59,7 @@ struct trace_uprobe {
 	struct inode			*inode;
 	char				*filename;
 	unsigned long			offset;
+	unsigned long			ref_ctr_offset;
 	unsigned long			nhit;
 	struct trace_probe		tp;
 };
@@ -364,10 +365,10 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
 static int create_trace_uprobe(int argc, char **argv)
 {
 	struct trace_uprobe *tu;
-	char *arg, *event, *group, *filename;
+	char *arg, *event, *group, *filename, *rctr, *rctr_end;
 	char buf[MAX_EVENT_NAME_LEN];
 	struct path path;
-	unsigned long offset;
+	unsigned long offset, ref_ctr_offset;
 	bool is_delete, is_return;
 	int i, ret;
 
@@ -376,6 +377,7 @@ static int create_trace_uprobe(int argc, char **argv)
 	is_return = false;
 	event = NULL;
 	group = NULL;
+	ref_ctr_offset = 0;
 
 	/* argc must be >= 1 */
 	if (argv[0][0] == '-')
@@ -450,6 +452,26 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 
+	/* Parse reference counter offset if specified. */
+	rctr = strchr(arg, '(');
+	if (rctr) {
+		rctr_end = strchr(rctr, ')');
+		if (rctr > rctr_end || *(rctr_end + 1) != 0) {
+			ret = -EINVAL;
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+
+		*rctr++ = '\0';
+		*rctr_end = '\0';
+		ret = kstrtoul(rctr, 0, &ref_ctr_offset);
+		if (ret) {
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+	}
+
+	/* Parse uprobe offset. */
 	ret = kstrtoul(arg, 0, &offset);
 	if (ret)
 		goto fail_address_parse;
@@ -484,6 +506,7 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 	tu->offset = offset;
+	tu->ref_ctr_offset = ref_ctr_offset;
 	tu->path = path;
 	tu->filename = kstrdup(filename, GFP_KERNEL);
 
@@ -602,6 +625,9 @@ static int probes_seq_show(struct seq_file *m, void *v)
 			trace_event_name(&tu->tp.call), tu->filename,
 			(int)(sizeof(void *) * 2), tu->offset);
 
+	if (tu->ref_ctr_offset)
+		seq_printf(m, "(0x%lx)", tu->ref_ctr_offset);
+
 	for (i = 0; i < tu->tp.nr_args; i++)
 		seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
 
@@ -917,7 +943,13 @@ probe_event_enable(struct trace_uprobe *tu, struct trace_event_file *file,
 
 	tu->consumer.filter = filter;
 	tu->inode = d_real_inode(tu->path.dentry);
-	ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
+	if (tu->ref_ctr_offset) {
+		ret = uprobe_register_refctr(tu->inode, tu->offset,
+				tu->ref_ctr_offset, &tu->consumer);
+	} else {
+		ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
+	}
+
 	if (ret)
 		goto err_buffer;
 
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v8 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe
  2018-08-09  4:18 [PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (2 preceding siblings ...)
  2018-08-09  4:18 ` [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
@ 2018-08-09  4:18 ` Ravi Bangoria
  2018-08-11  8:04   ` Song Liu
  2018-08-13  8:50   ` Srikar Dronamraju
  2018-08-09  4:18 ` [PATCH v8 5/6] trace_uprobe/sdt: " Ravi Bangoria
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-09  4:18 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat, liu.song.a23
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

We assume to have only one reference counter for one uprobe.
Don't allow user to register multiple uprobes having same
inode+offset but different reference counter.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 kernel/events/uprobes.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 61b0481ef417..492a0e005b4d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -689,6 +689,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 	cur_uprobe = insert_uprobe(uprobe);
 	/* a uprobe exists for this inode:offset combination */
 	if (cur_uprobe) {
+		if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
+			pr_warn("Reference counter offset mismatch.\n");
+			put_uprobe(cur_uprobe);
+			kfree(uprobe);
+			return ERR_PTR(-EINVAL);
+		}
 		kfree(uprobe);
 		uprobe = cur_uprobe;
 	}
@@ -1103,6 +1109,9 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (!uprobe)
 		return -ENOMEM;
+	if (IS_ERR(uprobe))
+		return PTR_ERR(uprobe);
+
 	/*
 	 * We can race with uprobe_unregister()->delete_uprobe().
 	 * Check uprobe_is_active() and retry if it is false.
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  2018-08-09  4:18 [PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (3 preceding siblings ...)
  2018-08-09  4:18 ` [PATCH v8 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
@ 2018-08-09  4:18 ` Ravi Bangoria
  2018-08-11  8:12   ` Song Liu
  2018-08-13  8:48   ` Srikar Dronamraju
  2018-08-09  4:18 ` [PATCH v8 6/6] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
  2018-08-13 16:55 ` [PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore) Oleg Nesterov
  6 siblings, 2 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-09  4:18 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat, liu.song.a23
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

We assume to have only one reference counter for one uprobe.
Don't allow user to add multiple trace_uprobe entries having
same inode+offset but different reference counter.

Ex,
  # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events
  # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xfffff)" >> uprobe_events
  bash: echo: write error: Invalid argument

  # dmesg
  trace_kprobe: Reference counter offset mismatch.

There is one exception though:
When user is trying to replace the old entry with the new
one, we allow this if the new entry does not conflict with
any other existing entries.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 kernel/trace/trace_uprobe.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index bf2be098eb08..be64d943d7ea 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu)
 	return 0;
 }
 
+/*
+ * Uprobe with multiple reference counter is not allowed. i.e.
+ * If inode and offset matches, reference counter offset *must*
+ * match as well. Though, there is one exception: If user is
+ * replacing old trace_uprobe with new one(same group/event),
+ * then we allow same uprobe with new reference counter as far
+ * as the new one does not conflict with any other existing
+ * ones.
+ */
+static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new)
+{
+	struct trace_uprobe *tmp, *old = NULL;
+	struct inode *new_inode = d_real_inode(new->path.dentry);
+
+	old = find_probe_event(trace_event_name(&new->tp.call),
+				new->tp.call.class->system);
+
+	list_for_each_entry(tmp, &uprobe_list, list) {
+		if ((old ? old != tmp : true) &&
+		    new_inode == d_real_inode(tmp->path.dentry) &&
+		    new->offset == tmp->offset &&
+		    new->ref_ctr_offset != tmp->ref_ctr_offset) {
+			pr_warn("Reference counter offset mismatch.");
+			return ERR_PTR(-EINVAL);
+		}
+	}
+	return old;
+}
+
 /* Register a trace_uprobe and probe_event */
 static int register_trace_uprobe(struct trace_uprobe *tu)
 {
@@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
 	mutex_lock(&uprobe_lock);
 
 	/* register as an event */
-	old_tu = find_probe_event(trace_event_name(&tu->tp.call),
-			tu->tp.call.class->system);
+	old_tu = find_old_trace_uprobe(tu);
+	if (IS_ERR(old_tu)) {
+		ret = PTR_ERR(old_tu);
+		goto end;
+	}
+
 	if (old_tu) {
 		/* delete old event */
 		ret = unregister_trace_uprobe(old_tu);
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v8 6/6] perf probe: Support SDT markers having reference counter (semaphore)
  2018-08-09  4:18 [PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (4 preceding siblings ...)
  2018-08-09  4:18 ` [PATCH v8 5/6] trace_uprobe/sdt: " Ravi Bangoria
@ 2018-08-09  4:18 ` Ravi Bangoria
  2018-08-13 16:55 ` [PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore) Oleg Nesterov
  6 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-09  4:18 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat, liu.song.a23
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

With this, perf buildid-cache will save SDT markers with reference
counter in probe cache. Perf probe will be able to probe markers
having reference counter. Ex,

  # readelf -n /tmp/tick | grep -A1 loop2
    Name: loop2
    ... Semaphore: 0x0000000010020036

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 tools/perf/util/probe-event.c | 39 ++++++++++++++++++++++++++++++++----
 tools/perf/util/probe-event.h |  1 +
 tools/perf/util/probe-file.c  | 34 ++++++++++++++++++++++++++------
 tools/perf/util/probe-file.h  |  1 +
 tools/perf/util/symbol-elf.c  | 46 ++++++++++++++++++++++++++++++++-----------
 tools/perf/util/symbol.h      |  7 +++++++
 6 files changed, 106 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index f119eb628dbb..e86f8be89157 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev)
 			tp->offset = strtoul(fmt2_str, NULL, 10);
 	}
 
+	if (tev->uprobes) {
+		fmt2_str = strchr(p, '(');
+		if (fmt2_str)
+			tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
+	}
+
 	tev->nargs = argc - 2;
 	tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
 	if (tev->args == NULL) {
@@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg,
 	return err;
 }
 
+static int
+synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf)
+{
+	struct probe_trace_point *tp = &tev->point;
+	int err;
+
+	err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address);
+
+	if (err >= 0 && tp->ref_ctr_offset) {
+		if (!uprobe_ref_ctr_is_supported())
+			return -1;
+		err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset);
+	}
+	return err >= 0 ? 0 : -1;
+}
+
 char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 {
 	struct probe_trace_point *tp = &tev->point;
@@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 	}
 
 	/* Use the tp->address for uprobes */
-	if (tev->uprobes)
-		err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
-	else if (!strncmp(tp->symbol, "0x", 2))
+	if (tev->uprobes) {
+		err = synthesize_uprobe_trace_def(tev, &buf);
+	} else if (!strncmp(tp->symbol, "0x", 2)) {
 		/* Absolute address. See try_to_find_absolute_address() */
 		err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
 				  tp->module ? ":" : "", tp->address);
-	else
+	} else {
 		err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
 				tp->module ? ":" : "", tp->symbol, tp->offset);
+	}
+
 	if (err)
 		goto error;
 
@@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev)
 {
 	int i;
 	char *buf = synthesize_probe_trace_command(tev);
+	struct probe_trace_point *tp = &tev->point;
+
+	if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) {
+		pr_warning("A semaphore is associated with %s:%s and "
+			   "seems your kernel doesn't support it.\n",
+			   tev->group, tev->event);
+	}
 
 	/* Old uprobe event doesn't support memory dereference */
 	if (!tev->uprobes || tev->nargs == 0 || !buf)
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 45b14f020558..15a98c3a2a2f 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -27,6 +27,7 @@ struct probe_trace_point {
 	char		*symbol;	/* Base symbol */
 	char		*module;	/* Module name */
 	unsigned long	offset;		/* Offset from symbol */
+	unsigned long	ref_ctr_offset;	/* SDT reference counter offset */
 	unsigned long	address;	/* Actual address of the trace point */
 	bool		retprobe;	/* Return probe flag */
 };
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index b76088fadf3d..aac7817d9e14 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -696,8 +696,16 @@ int probe_cache__add_entry(struct probe_cache *pcache,
 #ifdef HAVE_GELF_GETNOTE_SUPPORT
 static unsigned long long sdt_note__get_addr(struct sdt_note *note)
 {
-	return note->bit32 ? (unsigned long long)note->addr.a32[0]
-		 : (unsigned long long)note->addr.a64[0];
+	return note->bit32 ?
+		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_LOC] :
+		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_LOC];
+}
+
+static unsigned long long sdt_note__get_ref_ctr_offset(struct sdt_note *note)
+{
+	return note->bit32 ?
+		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_REFCTR] :
+		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_REFCTR];
 }
 
 static const char * const type_to_suffix[] = {
@@ -775,14 +783,21 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note,
 {
 	struct strbuf buf;
 	char *ret = NULL, **args;
-	int i, args_count;
+	int i, args_count, err;
+	unsigned long long ref_ctr_offset;
 
 	if (strbuf_init(&buf, 32) < 0)
 		return NULL;
 
-	if (strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
-				sdtgrp, note->name, pathname,
-				sdt_note__get_addr(note)) < 0)
+	err = strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
+			sdtgrp, note->name, pathname,
+			sdt_note__get_addr(note));
+
+	ref_ctr_offset = sdt_note__get_ref_ctr_offset(note);
+	if (ref_ctr_offset && err >= 0)
+		err = strbuf_addf(&buf, "(0x%llx)", ref_ctr_offset);
+
+	if (err < 0)
 		goto error;
 
 	if (!note->args)
@@ -998,6 +1013,7 @@ int probe_cache__show_all_caches(struct strfilter *filter)
 enum ftrace_readme {
 	FTRACE_README_PROBE_TYPE_X = 0,
 	FTRACE_README_KRETPROBE_OFFSET,
+	FTRACE_README_UPROBE_REF_CTR,
 	FTRACE_README_END,
 };
 
@@ -1009,6 +1025,7 @@ static struct {
 	[idx] = {.pattern = pat, .avail = false}
 	DEFINE_TYPE(FTRACE_README_PROBE_TYPE_X, "*type: * x8/16/32/64,*"),
 	DEFINE_TYPE(FTRACE_README_KRETPROBE_OFFSET, "*place (kretprobe): *"),
+	DEFINE_TYPE(FTRACE_README_UPROBE_REF_CTR, "*ref_ctr_offset*"),
 };
 
 static bool scan_ftrace_readme(enum ftrace_readme type)
@@ -1064,3 +1081,8 @@ bool kretprobe_offset_is_supported(void)
 {
 	return scan_ftrace_readme(FTRACE_README_KRETPROBE_OFFSET);
 }
+
+bool uprobe_ref_ctr_is_supported(void)
+{
+	return scan_ftrace_readme(FTRACE_README_UPROBE_REF_CTR);
+}
diff --git a/tools/perf/util/probe-file.h b/tools/perf/util/probe-file.h
index 63f29b1d22c1..2a249182f2a6 100644
--- a/tools/perf/util/probe-file.h
+++ b/tools/perf/util/probe-file.h
@@ -69,6 +69,7 @@ struct probe_cache_entry *probe_cache__find_by_name(struct probe_cache *pcache,
 int probe_cache__show_all_caches(struct strfilter *filter);
 bool probe_type_is_available(enum probe_type type);
 bool kretprobe_offset_is_supported(void);
+bool uprobe_ref_ctr_is_supported(void);
 #else	/* ! HAVE_LIBELF_SUPPORT */
 static inline struct probe_cache *probe_cache__new(const char *tgt __maybe_unused, struct nsinfo *nsi __maybe_unused)
 {
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 29770ea61768..0281d5e2cd67 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1947,6 +1947,34 @@ void kcore_extract__delete(struct kcore_extract *kce)
 }
 
 #ifdef HAVE_GELF_GETNOTE_SUPPORT
+
+static void sdt_adjust_loc(struct sdt_note *tmp, GElf_Addr base_off)
+{
+	if (!base_off)
+		return;
+
+	if (tmp->bit32)
+		tmp->addr.a32[SDT_NOTE_IDX_LOC] =
+			tmp->addr.a32[SDT_NOTE_IDX_LOC] + base_off -
+			tmp->addr.a32[SDT_NOTE_IDX_BASE];
+	else
+		tmp->addr.a64[SDT_NOTE_IDX_LOC] =
+			tmp->addr.a64[SDT_NOTE_IDX_LOC] + base_off -
+			tmp->addr.a64[SDT_NOTE_IDX_BASE];
+}
+
+static void sdt_adjust_refctr(struct sdt_note *tmp, GElf_Addr base_addr,
+			      GElf_Addr base_off)
+{
+	if (!base_off)
+		return;
+
+	if (tmp->bit32 && tmp->addr.a32[SDT_NOTE_IDX_REFCTR])
+		tmp->addr.a32[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
+	else if (tmp->addr.a64[SDT_NOTE_IDX_REFCTR])
+		tmp->addr.a64[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
+}
+
 /**
  * populate_sdt_note : Parse raw data and identify SDT note
  * @elf: elf of the opened file
@@ -1964,7 +1992,6 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
 	const char *provider, *name, *args;
 	struct sdt_note *tmp = NULL;
 	GElf_Ehdr ehdr;
-	GElf_Addr base_off = 0;
 	GElf_Shdr shdr;
 	int ret = -EINVAL;
 
@@ -2060,17 +2087,12 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
 	 * base address in the description of the SDT note. If its different,
 	 * then accordingly, adjust the note location.
 	 */
-	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL)) {
-		base_off = shdr.sh_offset;
-		if (base_off) {
-			if (tmp->bit32)
-				tmp->addr.a32[0] = tmp->addr.a32[0] + base_off -
-					tmp->addr.a32[1];
-			else
-				tmp->addr.a64[0] = tmp->addr.a64[0] + base_off -
-					tmp->addr.a64[1];
-		}
-	}
+	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL))
+		sdt_adjust_loc(tmp, shdr.sh_offset);
+
+	/* Adjust reference counter offset */
+	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_PROBES_SCN, NULL))
+		sdt_adjust_refctr(tmp, shdr.sh_addr, shdr.sh_offset);
 
 	list_add_tail(&tmp->note_list, sdt_notes);
 	return 0;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index f25fae4b5743..20f49779116b 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -379,12 +379,19 @@ int get_sdt_note_list(struct list_head *head, const char *target);
 int cleanup_sdt_note_list(struct list_head *sdt_notes);
 int sdt_notes__get_count(struct list_head *start);
 
+#define SDT_PROBES_SCN ".probes"
 #define SDT_BASE_SCN ".stapsdt.base"
 #define SDT_NOTE_SCN  ".note.stapsdt"
 #define SDT_NOTE_TYPE 3
 #define SDT_NOTE_NAME "stapsdt"
 #define NR_ADDR 3
 
+enum {
+	SDT_NOTE_IDX_LOC = 0,
+	SDT_NOTE_IDX_BASE,
+	SDT_NOTE_IDX_REFCTR,
+};
+
 struct mem_info *mem_info__new(void);
 struct mem_info *mem_info__get(struct mem_info *mi);
 void   mem_info__put(struct mem_info *mi);
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-09  4:18 ` [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
@ 2018-08-09 14:38   ` Oleg Nesterov
  2018-08-10 19:58     ` Steven Rostedt
  2018-08-11  7:57   ` Song Liu
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 42+ messages in thread
From: Oleg Nesterov @ 2018-08-09 14:38 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, rostedt, mhiramat, liu.song.a23, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

I need to read this (hopefully final) version carefully. I'll try to do
this before next Monday.

just one note,

On 08/09, Ravi Bangoria wrote:
>
> +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct *mm)
> +{
> +	struct list_head *pos, *q;
> +	struct delayed_uprobe *du;
> +
> +	if (!uprobe && !mm)
> +		return;
> +
> +	list_for_each_safe(pos, q, &delayed_uprobe_list) {
> +		du = list_entry(pos, struct delayed_uprobe, list);
> +
> +		if (uprobe && mm && du->uprobe == uprobe && du->mm == mm)
> +			delayed_uprobe_delete(du);
> +		else if (!uprobe && du->mm == mm)
> +			delayed_uprobe_delete(du);
> +		else if (!mm && du->uprobe == uprobe)
> +			delayed_uprobe_delete(du);
> +	}

Sorry, I can't resist... this doesn't look very nice. How about

	list_for_each_safe(pos, q, &delayed_uprobe_list) {
		du = list_entry(pos, struct delayed_uprobe, list);

		if (uprobe && du->uprobe != uprobe)
			continue;
		if (mm && du->mm != mm)
			continue;

		delayed_uprobe_delete();
	}

I won't insist, this is cosmetic after all, but please consider this change
in case you will need to send v9.

Oleg.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-09 14:38   ` Oleg Nesterov
@ 2018-08-10 19:58     ` Steven Rostedt
  2018-08-11  6:14       ` Song Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Steven Rostedt @ 2018-08-10 19:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ravi Bangoria, srikar, mhiramat, liu.song.a23, peterz, mingo,
	acme, alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

On Thu, 9 Aug 2018 16:38:28 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> I need to read this (hopefully final) version carefully. I'll try to do
> this before next Monday.
> 

Monday may be the opening of the merge window (more likely Sunday). Do
you think this is good enough for pushing it in this late in the game,
or do you think we should wait another cycle?

-- Steve

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-10 19:58     ` Steven Rostedt
@ 2018-08-11  6:14       ` Song Liu
  2018-08-14  0:01         ` Steven Rostedt
  0 siblings, 1 reply; 42+ messages in thread
From: Song Liu @ 2018-08-11  6:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Oleg Nesterov, Ravi Bangoria, srikar, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton

On Fri, Aug 10, 2018 at 12:58 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 9 Aug 2018 16:38:28 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
>> I need to read this (hopefully final) version carefully. I'll try to do
>> this before next Monday.
>>
>
> Monday may be the opening of the merge window (more likely Sunday). Do
> you think this is good enough for pushing it in this late in the game,
> or do you think we should wait another cycle?
>
> -- Steve

We (Facebook) have use cases in production that would benefit from this work, so
I would rather see this gets in sooner than later. After a brief
review (I will more
careful review before Monday), I think this set is not likely to break
existing uprobes
(those w/o ref_ctr). Therefore, I think it is safe to put it in this cycle.

Thanks,
Song

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-09  4:18 ` [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
  2018-08-09 14:38   ` Oleg Nesterov
@ 2018-08-11  7:57   ` Song Liu
  2018-08-11 15:37     ` Steven Rostedt
  2018-08-13  5:47     ` Ravi Bangoria
  2018-08-13 10:03   ` Srikar Dronamraju
  2018-08-14  8:56   ` Ravi Bangoria
  3 siblings, 2 replies; 42+ messages in thread
From: Song Liu @ 2018-08-11  7:57 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, Oleg Nesterov, Steven Rostedt, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton

On Wed, Aug 8, 2018 at 9:18 PM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> Userspace Statically Defined Tracepoints[1] are dtrace style markers
> inside userspace applications. Applications like PostgreSQL, MySQL,
> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> have these markers embedded in them. These markers are added by developer
> at important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. In case the overhead is more, execution of it can be
> omitted by runtime if() condition when no one is tracing on the marker:
>
>     if (reference_counter > 0) {
>         Execute marker instructions;
>     }
>
> Default value of reference counter is 0. Tracer has to increment the
> reference counter before tracing on a marker and decrement it when
> done with the tracing.
>
> Implement the reference counter logic in core uprobe. User will be
> able to use it from trace_uprobe as well as from kernel module. New
> trace_uprobe definition with reference counter will now be:
>
>     <path>:<offset>[(ref_ctr_offset)]
>
> where ref_ctr_offset is an optional field. For kernel module, new
> variant of uprobe_register() has been introduced:
>
>     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
>
> No new variant for uprobe_unregister() because it's assumed to have
> only one reference counter for one uprobe.
>
> [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
>
> Note: 'reference counter' is called as 'semaphore' in original Dtrace
> (or Systemtap, bcc and even in ELF) documentation and code. But the
> term 'semaphore' is misleading in this context. This is just a counter
> used to hold number of tracers tracing on a marker. This is not really
> used for any synchronization. So we are referring it as 'reference
> counter' in kernel / perf code.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
> [Only trace_uprobe.c]
> ---
>  include/linux/uprobes.h     |   5 +
>  kernel/events/uprobes.c     | 246 ++++++++++++++++++++++++++++++++++++++++++--
>  kernel/trace/trace.c        |   2 +-
>  kernel/trace/trace_uprobe.c |  38 ++++++-
>  4 files changed, 280 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index bb9d2084af03..103a48a48872 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
>  extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
>  extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
>  extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
> +extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
>  extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
>  extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
>  extern int uprobe_mmap(struct vm_area_struct *vma);
> @@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
>  {
>         return -ENOSYS;
>  }
> +static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
> +{
> +       return -ENOSYS;
> +}
>  static inline int
>  uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
>  {
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index c0418ba52ba8..61b0481ef417 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -73,6 +73,7 @@ struct uprobe {
>         struct uprobe_consumer  *consumers;
>         struct inode            *inode;         /* Also hold a ref to inode */
>         loff_t                  offset;
> +       loff_t                  ref_ctr_offset;
>         unsigned long           flags;
>
>         /*
> @@ -88,6 +89,15 @@ struct uprobe {
>         struct arch_uprobe      arch;
>  };
>
> +struct delayed_uprobe {
> +       struct list_head list;
> +       struct uprobe *uprobe;
> +       struct mm_struct *mm;
> +};
> +
> +static DEFINE_MUTEX(delayed_uprobe_lock);
> +static LIST_HEAD(delayed_uprobe_list);
> +
>  /*
>   * Execute out of line area: anonymous executable mapping installed
>   * by the probed task to execute the copy of the original instruction
> @@ -282,6 +292,157 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
>         return 1;
>  }
>
> +static struct delayed_uprobe *
> +delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm)
> +{
> +       struct delayed_uprobe *du;
> +
> +       list_for_each_entry(du, &delayed_uprobe_list, list)
> +               if (du->uprobe == uprobe && du->mm == mm)
> +                       return du;
> +       return NULL;
> +}
> +
> +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
> +{
> +       struct delayed_uprobe *du;
> +
> +       if (delayed_uprobe_check(uprobe, mm))
> +               return 0;
> +
> +       du = kzalloc(sizeof(*du), GFP_KERNEL);
> +       if (!du)
> +               return -ENOMEM;
> +
> +       du->uprobe = uprobe;
> +       du->mm = mm;
> +       list_add(&du->list, &delayed_uprobe_list);
> +       return 0;
> +}
> +
> +static void delayed_uprobe_delete(struct delayed_uprobe *du)
> +{
> +       if (!du)
> +               return;
Do we really need this check?

> +       list_del(&du->list);
> +       kfree(du);
> +}
> +
> +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct *mm)
> +{
> +       struct list_head *pos, *q;
> +       struct delayed_uprobe *du;
> +
> +       if (!uprobe && !mm)
> +               return;
And do we really need this check?

> +
> +       list_for_each_safe(pos, q, &delayed_uprobe_list) {
> +               du = list_entry(pos, struct delayed_uprobe, list);
> +
> +               if (uprobe && mm && du->uprobe == uprobe && du->mm == mm)
> +                       delayed_uprobe_delete(du);
> +               else if (!uprobe && du->mm == mm)
> +                       delayed_uprobe_delete(du);
> +               else if (!mm && du->uprobe == uprobe)
> +                       delayed_uprobe_delete(du);
> +       }
> +}
> +
> +static bool valid_ref_ctr_vma(struct uprobe *uprobe,
> +                             struct vm_area_struct *vma)
> +{
> +       unsigned long vaddr = offset_to_vaddr(vma, uprobe->ref_ctr_offset);
> +
> +       return uprobe->ref_ctr_offset &&
> +               vma->vm_file &&
> +               file_inode(vma->vm_file) == uprobe->inode &&
> +               (vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE &&
> +               vma->vm_start <= vaddr &&
> +               vma->vm_end > vaddr;
> +}
> +
> +static struct vm_area_struct *
> +find_ref_ctr_vma(struct uprobe *uprobe, struct mm_struct *mm)
> +{
> +       struct vm_area_struct *tmp;
> +
> +       for (tmp = mm->mmap; tmp; tmp = tmp->vm_next)
> +               if (valid_ref_ctr_vma(uprobe, tmp))
> +                       return tmp;
> +
> +       return NULL;
> +}
> +
> +static int
> +__update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
> +{
> +       void *kaddr;
> +       struct page *page;
> +       struct vm_area_struct *vma;
> +       int ret = 0;
It is not necessary to initialize ret here.

> +       short *ptr;
> +
> +       if (vaddr == 0 || d == 0)
> +               return -EINVAL;
> +
> +       ret = get_user_pages_remote(NULL, mm, vaddr, 1,
> +                       FOLL_WRITE, &page, &vma, NULL);
> +       if (unlikely(ret <= 0)) {
> +               /*
> +                * We are asking for 1 page. If get_user_pages_remote() fails,
> +                * it may return 0, in that case we have to return error.
> +                */
> +               ret = (ret == 0) ? -EBUSY : ret;
> +               pr_warn("Failed to %s ref_ctr. (%d)\n",
> +                       d > 0 ? "increment" : "decrement", ret);
This warning is not really useful. Seems this function has little information
about which uprobe is failing here. Maybe we only need warning in the caller
(or caller of caller).

> +               return ret;
> +       }
> +
> +       kaddr = kmap_atomic(page);
> +       ptr = kaddr + (vaddr & ~PAGE_MASK);
> +
> +       if (unlikely(*ptr + d < 0)) {
> +               pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
> +                       "curr val: %d, delta: %d\n", vaddr, *ptr, d);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       *ptr += d;
> +       ret = 0;
> +out:
> +       kunmap_atomic(kaddr);
> +       put_page(page);
> +       return ret;
> +}
> +
> +static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
> +                         bool is_register)
What's the reason of bool is_register here vs. short d in __update_ref_ctr()?
Can we use short for both?

> +{
> +       struct vm_area_struct *rc_vma;
> +       unsigned long rc_vaddr;
> +       int ret = 0;
> +
> +       rc_vma = find_ref_ctr_vma(uprobe, mm);
> +
> +       if (rc_vma) {
> +               rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
> +               ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1);
> +
> +               if (is_register)
> +                       return ret;
> +       }
Mixing __update_ref_ctr() here and delayed_uprobe_add() in the same
function is a little confusing (at least for me). How about we always use
delayed uprobe for uprobe_mmap() and use non-delayed in other case(s)?

> +
> +       mutex_lock(&delayed_uprobe_lock);
> +       if (is_register)
> +               ret = delayed_uprobe_add(uprobe, mm);
> +       else
> +               delayed_uprobe_remove(uprobe, mm);
> +       mutex_unlock(&delayed_uprobe_lock);
> +
> +       return ret;
> +}
> +
>  /*
>   * NOTE:
>   * Expect the breakpoint instruction to be the smallest size instruction for
> @@ -302,9 +463,13 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
>  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>                         unsigned long vaddr, uprobe_opcode_t opcode)
>  {
> +       struct uprobe *uprobe;
>         struct page *old_page, *new_page;
>         struct vm_area_struct *vma;
> -       int ret;
> +       int ret, is_register, ref_ctr_updated = 0;
> +
> +       is_register = is_swbp_insn(&opcode);
> +       uprobe = container_of(auprobe, struct uprobe, arch);
>
>  retry:
>         /* Read the page with vaddr into memory */
> @@ -317,6 +482,15 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>         if (ret <= 0)
>                 goto put_old;
>
> +       /* We are going to replace instruction, update ref_ctr. */
> +       if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
> +               ret = update_ref_ctr(uprobe, mm, is_register);
> +               if (ret)
> +                       goto put_old;
> +
> +               ref_ctr_updated = 1;
> +       }
> +
>         ret = anon_vma_prepare(vma);
>         if (ret)
>                 goto put_old;
> @@ -337,6 +511,11 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>
>         if (unlikely(ret == -EAGAIN))
>                 goto retry;
> +
> +       /* Revert back reference counter if instruction update failed. */
> +       if (ret && is_register && ref_ctr_updated)
> +               update_ref_ctr(uprobe, mm, false);
> +
>         return ret;
>  }
>
> @@ -378,8 +557,15 @@ static struct uprobe *get_uprobe(struct uprobe *uprobe)
>
>  static void put_uprobe(struct uprobe *uprobe)
>  {
> -       if (atomic_dec_and_test(&uprobe->ref))
> +       if (atomic_dec_and_test(&uprobe->ref)) {
> +               /*
> +                * If application munmap(exec_vma) before uprobe_unregister()
> +                * gets called, we don't get a chance to remove uprobe from
> +                * delayed_uprobe_list in remove_breakpoint(). Do it here.
> +                */
> +               delayed_uprobe_remove(uprobe, NULL);
>                 kfree(uprobe);
> +       }
>  }
>
>  static int match_uprobe(struct uprobe *l, struct uprobe *r)
> @@ -484,7 +670,8 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
>         return u;
>  }
>
> -static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
> +static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
> +                                  loff_t ref_ctr_offset)
>  {
>         struct uprobe *uprobe, *cur_uprobe;
>
> @@ -494,6 +681,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
>
>         uprobe->inode = inode;
>         uprobe->offset = offset;
> +       uprobe->ref_ctr_offset = ref_ctr_offset;
>         init_rwsem(&uprobe->register_rwsem);
>         init_rwsem(&uprobe->consumer_rwsem);
>
> @@ -895,7 +1083,7 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
>   * else return 0 (success)
>   */
>  static int __uprobe_register(struct inode *inode, loff_t offset,
> -                            struct uprobe_consumer *uc)
> +                            loff_t ref_ctr_offset, struct uprobe_consumer *uc)
>  {
>         struct uprobe *uprobe;
>         int ret;
> @@ -912,7 +1100,7 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
>                 return -EINVAL;
>
>   retry:
> -       uprobe = alloc_uprobe(inode, offset);
> +       uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
>         if (!uprobe)
>                 return -ENOMEM;
>         /*
> @@ -938,10 +1126,17 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
>  int uprobe_register(struct inode *inode, loff_t offset,
>                     struct uprobe_consumer *uc)
>  {
> -       return __uprobe_register(inode, offset, uc);
> +       return __uprobe_register(inode, offset, 0, uc);
>  }
>  EXPORT_SYMBOL_GPL(uprobe_register);
>
> +int uprobe_register_refctr(struct inode *inode, loff_t offset,
> +                          loff_t ref_ctr_offset, struct uprobe_consumer *uc)
> +{
> +       return __uprobe_register(inode, offset, ref_ctr_offset, uc);
> +}
> +EXPORT_SYMBOL_GPL(uprobe_register_refctr);
> +
>  /*
>   * uprobe_apply - unregister a already registered probe.
>   * @inode: the file in which the probe has to be removed.
> @@ -1060,6 +1255,31 @@ static void build_probe_list(struct inode *inode,
>         spin_unlock(&uprobes_treelock);
>  }
>
> +static int delayed_uprobe_install(struct vm_area_struct *vma)
This function name is confusing. How about we call it delayed_ref_ctr_incr() or
something similar? Also, we should add comments to highlight this is vma is not
the vma containing the uprobe, but the vma containing the ref_ctr.

> +{
> +       struct list_head *pos, *q;
> +       struct delayed_uprobe *du;
> +       unsigned long vaddr;
> +       int ret = 0, err = 0;
> +
> +       mutex_lock(&delayed_uprobe_lock);
> +       list_for_each_safe(pos, q, &delayed_uprobe_list) {
> +               du = list_entry(pos, struct delayed_uprobe, list);
> +
> +               if (!valid_ref_ctr_vma(du->uprobe, vma))
> +                       continue;
> +
> +               vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
> +               ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
> +               /* Record an error and continue. */
> +               if (ret && !err)
> +                       err = ret;
I think this is a good place (when ret != 0) to call pr_warn(). I guess we can
print which mm get error for which uprobe (inode+offset).

> +               delayed_uprobe_delete(du);
> +       }
> +       mutex_unlock(&delayed_uprobe_lock);
> +       return err;
> +}
> +
>  /*
>   * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
>   *
> @@ -1072,7 +1292,15 @@ int uprobe_mmap(struct vm_area_struct *vma)
>         struct uprobe *uprobe, *u;
>         struct inode *inode;
>
> -       if (no_uprobe_events() || !valid_vma(vma, true))
> +       if (no_uprobe_events())
> +               return 0;
> +
> +       if (vma->vm_file &&
> +           (vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE &&
> +           test_bit(MMF_HAS_UPROBES, &vma->vm_mm->flags))
> +               delayed_uprobe_install(vma);
> +
> +       if (!valid_vma(vma, true))
>                 return 0;
>
>         inode = file_inode(vma->vm_file);
> @@ -1246,6 +1474,10 @@ void uprobe_clear_state(struct mm_struct *mm)
>  {
>         struct xol_area *area = mm->uprobes_state.xol_area;
>
> +       mutex_lock(&delayed_uprobe_lock);
> +       delayed_uprobe_remove(NULL, mm);
> +       mutex_unlock(&delayed_uprobe_lock);
> +
>         if (!area)
>                 return;
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 823687997b01..616160b85860 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -4620,7 +4620,7 @@ static const char readme_msg[] =
>    "place (kretprobe): [<module>:]<symbol>[+<offset>]|<memaddr>\n"
>  #endif
>  #ifdef CONFIG_UPROBE_EVENTS
> -       "\t    place: <path>:<offset>\n"
> +  "   place (uprobe): <path>:<offset>[(ref_ctr_offset)]\n"
>  #endif
>         "\t     args: <name>=fetcharg[:type]\n"
>         "\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index bf89a51e740d..bf2be098eb08 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -59,6 +59,7 @@ struct trace_uprobe {
>         struct inode                    *inode;
>         char                            *filename;
>         unsigned long                   offset;
> +       unsigned long                   ref_ctr_offset;
>         unsigned long                   nhit;
>         struct trace_probe              tp;
>  };
> @@ -364,10 +365,10 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
>  static int create_trace_uprobe(int argc, char **argv)
>  {
>         struct trace_uprobe *tu;
> -       char *arg, *event, *group, *filename;
> +       char *arg, *event, *group, *filename, *rctr, *rctr_end;
>         char buf[MAX_EVENT_NAME_LEN];
>         struct path path;
> -       unsigned long offset;
> +       unsigned long offset, ref_ctr_offset;
>         bool is_delete, is_return;
>         int i, ret;
>
> @@ -376,6 +377,7 @@ static int create_trace_uprobe(int argc, char **argv)
>         is_return = false;
>         event = NULL;
>         group = NULL;
> +       ref_ctr_offset = 0;
>
>         /* argc must be >= 1 */
>         if (argv[0][0] == '-')
> @@ -450,6 +452,26 @@ static int create_trace_uprobe(int argc, char **argv)
>                 goto fail_address_parse;
>         }
>
> +       /* Parse reference counter offset if specified. */
> +       rctr = strchr(arg, '(');
> +       if (rctr) {
> +               rctr_end = strchr(rctr, ')');
> +               if (rctr > rctr_end || *(rctr_end + 1) != 0) {
> +                       ret = -EINVAL;
> +                       pr_info("Invalid reference counter offset.\n");
> +                       goto fail_address_parse;
> +               }
> +
> +               *rctr++ = '\0';
> +               *rctr_end = '\0';
> +               ret = kstrtoul(rctr, 0, &ref_ctr_offset);
> +               if (ret) {
> +                       pr_info("Invalid reference counter offset.\n");
> +                       goto fail_address_parse;
> +               }
> +       }
> +
> +       /* Parse uprobe offset. */
>         ret = kstrtoul(arg, 0, &offset);
>         if (ret)
>                 goto fail_address_parse;
> @@ -484,6 +506,7 @@ static int create_trace_uprobe(int argc, char **argv)
>                 goto fail_address_parse;
>         }
>         tu->offset = offset;
> +       tu->ref_ctr_offset = ref_ctr_offset;
>         tu->path = path;
>         tu->filename = kstrdup(filename, GFP_KERNEL);
>
> @@ -602,6 +625,9 @@ static int probes_seq_show(struct seq_file *m, void *v)
>                         trace_event_name(&tu->tp.call), tu->filename,
>                         (int)(sizeof(void *) * 2), tu->offset);
>
> +       if (tu->ref_ctr_offset)
> +               seq_printf(m, "(0x%lx)", tu->ref_ctr_offset);
> +
>         for (i = 0; i < tu->tp.nr_args; i++)
>                 seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
>
> @@ -917,7 +943,13 @@ probe_event_enable(struct trace_uprobe *tu, struct trace_event_file *file,
>
>         tu->consumer.filter = filter;
>         tu->inode = d_real_inode(tu->path.dentry);
> -       ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
> +       if (tu->ref_ctr_offset) {
> +               ret = uprobe_register_refctr(tu->inode, tu->offset,
> +                               tu->ref_ctr_offset, &tu->consumer);
> +       } else {
> +               ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
> +       }
> +
>         if (ret)
>                 goto err_buffer;
>
> --
> 2.14.4
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 1/6] Uprobes: Simplify uprobe_register() body
  2018-08-09  4:18 ` [PATCH v8 1/6] Uprobes: Simplify uprobe_register() body Ravi Bangoria
@ 2018-08-11  8:00   ` Song Liu
  2018-08-13  8:56   ` Srikar Dronamraju
  1 sibling, 0 replies; 42+ messages in thread
From: Song Liu @ 2018-08-11  8:00 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, Oleg Nesterov, Steven Rostedt, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton

On Wed, Aug 8, 2018 at 9:18 PM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> Simplify uprobe_register() function body and let __uprobe_register()
> handle everything. Also move dependency functions around to fix build
> failures.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>

Reviewed-by: Song Liu <songliubraving@fb.com>

> ---
>  kernel/events/uprobes.c | 69 ++++++++++++++++++++++++++-----------------------
>  1 file changed, 36 insertions(+), 33 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index ccc579a7d32e..471eac896635 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -840,13 +840,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
>         return err;
>  }
>
> -static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc)
> -{
> -       consumer_add(uprobe, uc);
> -       return register_for_each_vma(uprobe, uc);
> -}
> -
> -static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
> +static void
> +__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
>  {
>         int err;
>
> @@ -860,24 +855,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u
>  }
>
>  /*
> - * uprobe_register - register a probe
> + * uprobe_unregister - unregister a already registered probe.
> + * @inode: the file in which the probe has to be removed.
> + * @offset: offset from the start of the file.
> + * @uc: identify which probe if multiple probes are colocated.
> + */
> +void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
> +{
> +       struct uprobe *uprobe;
> +
> +       uprobe = find_uprobe(inode, offset);
> +       if (WARN_ON(!uprobe))
> +               return;
> +
> +       down_write(&uprobe->register_rwsem);
> +       __uprobe_unregister(uprobe, uc);
> +       up_write(&uprobe->register_rwsem);
> +       put_uprobe(uprobe);
> +}
> +EXPORT_SYMBOL_GPL(uprobe_unregister);
> +
> +/*
> + * __uprobe_register - register a probe
>   * @inode: the file in which the probe has to be placed.
>   * @offset: offset from the start of the file.
>   * @uc: information on howto handle the probe..
>   *
> - * Apart from the access refcount, uprobe_register() takes a creation
> + * Apart from the access refcount, __uprobe_register() takes a creation
>   * refcount (thro alloc_uprobe) if and only if this @uprobe is getting
>   * inserted into the rbtree (i.e first consumer for a @inode:@offset
>   * tuple).  Creation refcount stops uprobe_unregister from freeing the
>   * @uprobe even before the register operation is complete. Creation
>   * refcount is released when the last @uc for the @uprobe
> - * unregisters. Caller of uprobe_register() is required to keep @inode
> + * unregisters. Caller of __uprobe_register() is required to keep @inode
>   * (and the containing mount) referenced.
>   *
>   * Return errno if it cannot successully install probes
>   * else return 0 (success)
>   */
> -int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
> +static int __uprobe_register(struct inode *inode, loff_t offset,
> +                            struct uprobe_consumer *uc)
>  {
>         struct uprobe *uprobe;
>         int ret;
> @@ -904,7 +921,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
>         down_write(&uprobe->register_rwsem);
>         ret = -EAGAIN;
>         if (likely(uprobe_is_active(uprobe))) {
> -               ret = __uprobe_register(uprobe, uc);
> +               consumer_add(uprobe, uc);
> +               ret = register_for_each_vma(uprobe, uc);
>                 if (ret)
>                         __uprobe_unregister(uprobe, uc);
>         }
> @@ -915,6 +933,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
>                 goto retry;
>         return ret;
>  }
> +
> +int uprobe_register(struct inode *inode, loff_t offset,
> +                   struct uprobe_consumer *uc)
> +{
> +       return __uprobe_register(inode, offset, uc);
> +}
>  EXPORT_SYMBOL_GPL(uprobe_register);
>
>  /*
> @@ -946,27 +970,6 @@ int uprobe_apply(struct inode *inode, loff_t offset,
>         return ret;
>  }
>
> -/*
> - * uprobe_unregister - unregister a already registered probe.
> - * @inode: the file in which the probe has to be removed.
> - * @offset: offset from the start of the file.
> - * @uc: identify which probe if multiple probes are colocated.
> - */
> -void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
> -{
> -       struct uprobe *uprobe;
> -
> -       uprobe = find_uprobe(inode, offset);
> -       if (WARN_ON(!uprobe))
> -               return;
> -
> -       down_write(&uprobe->register_rwsem);
> -       __uprobe_unregister(uprobe, uc);
> -       up_write(&uprobe->register_rwsem);
> -       put_uprobe(uprobe);
> -}
> -EXPORT_SYMBOL_GPL(uprobe_unregister);
> -
>  static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
>  {
>         struct vm_area_struct *vma;
> --
> 2.14.4
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
  2018-08-09  4:18 ` [PATCH v8 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() Ravi Bangoria
@ 2018-08-11  8:01   ` Song Liu
  2018-08-13  8:51   ` Srikar Dronamraju
  1 sibling, 0 replies; 42+ messages in thread
From: Song Liu @ 2018-08-11  8:01 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, Oleg Nesterov, Steven Rostedt, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton

On Wed, Aug 8, 2018 at 9:18 PM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> Add addition argument 'arch_uprobe' to uprobe_write_opcode().
> We need this in later set of patches.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>

Reviewed-by: Song Liu <songliubraving@fb.com>

> ---
>  arch/arm/probes/uprobes/core.c | 2 +-
>  arch/mips/kernel/uprobes.c     | 2 +-
>  include/linux/uprobes.h        | 2 +-
>  kernel/events/uprobes.c        | 9 +++++----
>  4 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
> index d1329f1ba4e4..bf992264060e 100644
> --- a/arch/arm/probes/uprobes/core.c
> +++ b/arch/arm/probes/uprobes/core.c
> @@ -32,7 +32,7 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
>  int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
>              unsigned long vaddr)
>  {
> -       return uprobe_write_opcode(mm, vaddr,
> +       return uprobe_write_opcode(auprobe, mm, vaddr,
>                    __opcode_to_mem_arm(auprobe->bpinsn));
>  }
>
> diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c
> index f7a0645ccb82..4aaff3b3175c 100644
> --- a/arch/mips/kernel/uprobes.c
> +++ b/arch/mips/kernel/uprobes.c
> @@ -224,7 +224,7 @@ unsigned long arch_uretprobe_hijack_return_addr(
>  int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
>         unsigned long vaddr)
>  {
> -       return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
> +       return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
>  }
>
>  void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index 0a294e950df8..bb9d2084af03 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -121,7 +121,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn);
>  extern bool is_trap_insn(uprobe_opcode_t *insn);
>  extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
>  extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
> -extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
> +extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
>  extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
>  extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
>  extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 471eac896635..c0418ba52ba8 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -299,8 +299,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
>   * Called with mm->mmap_sem held for write.
>   * Return 0 (success) or a negative errno.
>   */
> -int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
> -                       uprobe_opcode_t opcode)
> +int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> +                       unsigned long vaddr, uprobe_opcode_t opcode)
>  {
>         struct page *old_page, *new_page;
>         struct vm_area_struct *vma;
> @@ -351,7 +351,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
>   */
>  int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
>  {
> -       return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
> +       return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
>  }
>
>  /**
> @@ -366,7 +366,8 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
>  int __weak
>  set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
>  {
> -       return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn);
> +       return uprobe_write_opcode(auprobe, mm, vaddr,
> +                       *(uprobe_opcode_t *)&auprobe->insn);
>  }
>
>  static struct uprobe *get_uprobe(struct uprobe *uprobe)
> --
> 2.14.4
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe
  2018-08-09  4:18 ` [PATCH v8 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
@ 2018-08-11  8:04   ` Song Liu
  2018-08-13  8:50   ` Srikar Dronamraju
  1 sibling, 0 replies; 42+ messages in thread
From: Song Liu @ 2018-08-11  8:04 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, Oleg Nesterov, Steven Rostedt, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton

On Wed, Aug 8, 2018 at 9:18 PM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> We assume to have only one reference counter for one uprobe.
> Don't allow user to register multiple uprobes having same
> inode+offset but different reference counter.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> ---
>  kernel/events/uprobes.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 61b0481ef417..492a0e005b4d 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -689,6 +689,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
>         cur_uprobe = insert_uprobe(uprobe);
>         /* a uprobe exists for this inode:offset combination */
>         if (cur_uprobe) {
> +               if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
> +                       pr_warn("Reference counter offset mismatch.\n");
We should print more information here, including the inode, the offset
and both ref_ctr_offset.

> +                       put_uprobe(cur_uprobe);
> +                       kfree(uprobe);
> +                       return ERR_PTR(-EINVAL);
> +               }
>                 kfree(uprobe);
>                 uprobe = cur_uprobe;
>         }
> @@ -1103,6 +1109,9 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
>         uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
>         if (!uprobe)
>                 return -ENOMEM;
> +       if (IS_ERR(uprobe))
> +               return PTR_ERR(uprobe);
> +
>         /*
>          * We can race with uprobe_unregister()->delete_uprobe().
>          * Check uprobe_is_active() and retry if it is false.
> --
> 2.14.4
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  2018-08-09  4:18 ` [PATCH v8 5/6] trace_uprobe/sdt: " Ravi Bangoria
@ 2018-08-11  8:12   ` Song Liu
  2018-08-13  8:19     ` Ravi Bangoria
  2018-08-13  8:48   ` Srikar Dronamraju
  1 sibling, 1 reply; 42+ messages in thread
From: Song Liu @ 2018-08-11  8:12 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, Oleg Nesterov, Steven Rostedt, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton

Do we really need this given we already have PATCH 4/6?
uprobe_regsiter() can be called
out of trace_uprobe, this patch won't catch all conflicts anyway.

Song

On Wed, Aug 8, 2018 at 9:18 PM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> We assume to have only one reference counter for one uprobe.
> Don't allow user to add multiple trace_uprobe entries having
> same inode+offset but different reference counter.
>
> Ex,
>   # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events
>   # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xfffff)" >> uprobe_events
>   bash: echo: write error: Invalid argument
>
>   # dmesg
>   trace_kprobe: Reference counter offset mismatch.
>
> There is one exception though:
> When user is trying to replace the old entry with the new
> one, we allow this if the new entry does not conflict with
> any other existing entries.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> ---
>  kernel/trace/trace_uprobe.c | 37 +++++++++++++++++++++++++++++++++++--
>  1 file changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index bf2be098eb08..be64d943d7ea 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu)
>         return 0;
>  }
>
> +/*
> + * Uprobe with multiple reference counter is not allowed. i.e.
> + * If inode and offset matches, reference counter offset *must*
> + * match as well. Though, there is one exception: If user is
> + * replacing old trace_uprobe with new one(same group/event),
> + * then we allow same uprobe with new reference counter as far
> + * as the new one does not conflict with any other existing
> + * ones.
> + */
> +static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new)
> +{
> +       struct trace_uprobe *tmp, *old = NULL;
> +       struct inode *new_inode = d_real_inode(new->path.dentry);
> +
> +       old = find_probe_event(trace_event_name(&new->tp.call),
> +                               new->tp.call.class->system);
> +
> +       list_for_each_entry(tmp, &uprobe_list, list) {
> +               if ((old ? old != tmp : true) &&
> +                   new_inode == d_real_inode(tmp->path.dentry) &&
> +                   new->offset == tmp->offset &&
> +                   new->ref_ctr_offset != tmp->ref_ctr_offset) {
> +                       pr_warn("Reference counter offset mismatch.");
> +                       return ERR_PTR(-EINVAL);
> +               }
> +       }
> +       return old;
> +}
> +
>  /* Register a trace_uprobe and probe_event */
>  static int register_trace_uprobe(struct trace_uprobe *tu)
>  {
> @@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
>         mutex_lock(&uprobe_lock);
>
>         /* register as an event */
> -       old_tu = find_probe_event(trace_event_name(&tu->tp.call),
> -                       tu->tp.call.class->system);
> +       old_tu = find_old_trace_uprobe(tu);
> +       if (IS_ERR(old_tu)) {
> +               ret = PTR_ERR(old_tu);
> +               goto end;
> +       }
> +
>         if (old_tu) {
>                 /* delete old event */
>                 ret = unregister_trace_uprobe(old_tu);
> --
> 2.14.4
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-11  7:57   ` Song Liu
@ 2018-08-11 15:37     ` Steven Rostedt
  2018-08-13  5:47     ` Ravi Bangoria
  1 sibling, 0 replies; 42+ messages in thread
From: Steven Rostedt @ 2018-08-11 15:37 UTC (permalink / raw)
  To: Song Liu
  Cc: Ravi Bangoria, srikar, Oleg Nesterov, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton

On Sat, 11 Aug 2018 00:57:12 -0700
Song Liu <liu.song.a23@gmail.com> wrote:

> > +
> > +static void delayed_uprobe_delete(struct delayed_uprobe *du)
> > +{
> > +       if (!du)
> > +               return;  
> Do we really need this check?

I'd suggest we keep it. It's not a fast path, and operations like this
should really check for NULL.


> 
> > +       list_del(&du->list);
> > +       kfree(du);
> > +}
> > +
> > +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct *mm)
> > +{
> > +       struct list_head *pos, *q;
> > +       struct delayed_uprobe *du;
> > +
> > +       if (!uprobe && !mm)
> > +               return;  
> And do we really need this check?

Same here, as it's not a fast path, and it prevents kernel oops if a
NULL is passed in.

> 
> > +
> > +       list_for_each_safe(pos, q, &delayed_uprobe_list) {
> > +               du = list_entry(pos, struct delayed_uprobe, list);
> > +
> > +               if (uprobe && mm && du->uprobe == uprobe && du->mm == mm)
> > +                       delayed_uprobe_delete(du);
> > +               else if (!uprobe && du->mm == mm)
> > +                       delayed_uprobe_delete(du);
> > +               else if (!mm && du->uprobe == uprobe)
> > +                       delayed_uprobe_delete(du);
> > +       }
> > +}
> > +
> > +static bool valid_ref_ctr_vma(struct uprobe *uprobe,
> > +                             struct vm_area_struct *vma)
> > +{
> > +       unsigned long vaddr = offset_to_vaddr(vma, uprobe->ref_ctr_offset);
> > +
> > +       return uprobe->ref_ctr_offset &&
> > +               vma->vm_file &&
> > +               file_inode(vma->vm_file) == uprobe->inode &&
> > +               (vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE &&
> > +               vma->vm_start <= vaddr &&
> > +               vma->vm_end > vaddr;
> > +}
> > +
> > +static struct vm_area_struct *
> > +find_ref_ctr_vma(struct uprobe *uprobe, struct mm_struct *mm)
> > +{
> > +       struct vm_area_struct *tmp;
> > +
> > +       for (tmp = mm->mmap; tmp; tmp = tmp->vm_next)
> > +               if (valid_ref_ctr_vma(uprobe, tmp))
> > +                       return tmp;
> > +
> > +       return NULL;
> > +}
> > +
> > +static int
> > +__update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
> > +{
> > +       void *kaddr;
> > +       struct page *page;
> > +       struct vm_area_struct *vma;
> > +       int ret = 0;  
> It is not necessary to initialize ret here.

Agreed.

> 
> > +       short *ptr;
> > +
> > +       if (vaddr == 0 || d == 0)
> > +               return -EINVAL;
> > +
> > +       ret = get_user_pages_remote(NULL, mm, vaddr, 1,
> > +                       FOLL_WRITE, &page, &vma, NULL);
> > +       if (unlikely(ret <= 0)) {
> > +               /*
> > +                * We are asking for 1 page. If get_user_pages_remote() fails,
> > +                * it may return 0, in that case we have to return error.
> > +                */
> > +               ret = (ret == 0) ? -EBUSY : ret;
> > +               pr_warn("Failed to %s ref_ctr. (%d)\n",
> > +                       d > 0 ? "increment" : "decrement", ret);  
> This warning is not really useful. Seems this function has little information
> about which uprobe is failing here. Maybe we only need warning in the caller
> (or caller of caller).

I'm fine with or without this.

> 
> > +               return ret;
> > +       }
> > +
> > +       kaddr = kmap_atomic(page);
> > +       ptr = kaddr + (vaddr & ~PAGE_MASK);
> > +
> > +       if (unlikely(*ptr + d < 0)) {
> > +               pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
> > +                       "curr val: %d, delta: %d\n", vaddr, *ptr, d);
> > +               ret = -EINVAL;
> > +               goto out;
> > +       }
> > +
> > +       *ptr += d;
> > +       ret = 0;
> > +out:
> > +       kunmap_atomic(kaddr);
> > +       put_page(page);
> > +       return ret;
> > +}
> > +
> > +static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
> > +                         bool is_register)  
> What's the reason of bool is_register here vs. short d in __update_ref_ctr()?
> Can we use short for both?

bool has more control, but I agree that this function is a bit
confusing.

-- Steve

> 
> > +{
> > +       struct vm_area_struct *rc_vma;
> > +       unsigned long rc_vaddr;
> > +       int ret = 0;
> > +
> > +       rc_vma = find_ref_ctr_vma(uprobe, mm);
> > +
> > +       if (rc_vma) {
> > +               rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
> > +               ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1);
> > +
> > +               if (is_register)
> > +                       return ret;
> > +       }  
> Mixing __update_ref_ctr() here and delayed_uprobe_add() in the same
> function is a little confusing (at least for me). How about we always use
> delayed uprobe for uprobe_mmap() and use non-delayed in other case(s)?
> 
> > +
> > +       mutex_lock(&delayed_uprobe_lock);
> > +       if (is_register)
> > +               ret = delayed_uprobe_add(uprobe, mm);
> > +       else
> > +               delayed_uprobe_remove(uprobe, mm);
> > +       mutex_unlock(&delayed_uprobe_lock);
> > +
> > +       return ret;
> > +}
> > +
> >  /*
> >   * NOTE:
> >   * Expect the breakpoint instruction to be the smallest size instruction for
> > @@ -302,9 +463,13 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
> >  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> >                         unsigned long vaddr, uprobe_opcode_t opcode)
> >  {
> > +       struct uprobe *uprobe;
> >         struct page *old_page, *new_page;
> >         struct vm_area_struct *vma;
> > -       int ret;
> > +       int ret, is_register, ref_ctr_updated = 0;
> > +
> > +       is_register = is_swbp_insn(&opcode);
> > +       uprobe = container_of(auprobe, struct uprobe, arch);
> >
> >  retry:
> >         /* Read the page with vaddr into memory */
> > @@ -317,6 +482,15 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> >         if (ret <= 0)
> >                 goto put_old;
> >
> > +       /* We are going to replace instruction, update ref_ctr. */
> > +       if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
> > +               ret = update_ref_ctr(uprobe, mm, is_register);
> > +               if (ret)
> > +                       goto put_old;
> > +
> > +               ref_ctr_updated = 1;
> > +       }
> > +
> >         ret = anon_vma_prepare(vma);
> >         if (ret)
> >                 goto put_old;
> > @@ -337,6 +511,11 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> >
> >         if (unlikely(ret == -EAGAIN))
> >                 goto retry;
> > +
> > +       /* Revert back reference counter if instruction update failed. */
> > +       if (ret && is_register && ref_ctr_updated)
> > +               update_ref_ctr(uprobe, mm, false);
> > +
> >         return ret;
> >  }
> >
> > @@ -378,8 +557,15 @@ static struct uprobe *get_uprobe(struct uprobe *uprobe)
> >
> >  static void put_uprobe(struct uprobe *uprobe)
> >  {
> > -       if (atomic_dec_and_test(&uprobe->ref))
> > +       if (atomic_dec_and_test(&uprobe->ref)) {
> > +               /*
> > +                * If application munmap(exec_vma) before uprobe_unregister()
> > +                * gets called, we don't get a chance to remove uprobe from
> > +                * delayed_uprobe_list in remove_breakpoint(). Do it here.
> > +                */
> > +               delayed_uprobe_remove(uprobe, NULL);
> >                 kfree(uprobe);
> > +       }
> >  }
> >
> >  static int match_uprobe(struct uprobe *l, struct uprobe *r)
> > @@ -484,7 +670,8 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
> >         return u;
> >  }
> >
> > -static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
> > +static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
> > +                                  loff_t ref_ctr_offset)
> >  {
> >         struct uprobe *uprobe, *cur_uprobe;
> >
> > @@ -494,6 +681,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
> >
> >         uprobe->inode = inode;
> >         uprobe->offset = offset;
> > +       uprobe->ref_ctr_offset = ref_ctr_offset;
> >         init_rwsem(&uprobe->register_rwsem);
> >         init_rwsem(&uprobe->consumer_rwsem);
> >
> > @@ -895,7 +1083,7 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
> >   * else return 0 (success)
> >   */
> >  static int __uprobe_register(struct inode *inode, loff_t offset,
> > -                            struct uprobe_consumer *uc)
> > +                            loff_t ref_ctr_offset, struct uprobe_consumer *uc)
> >  {
> >         struct uprobe *uprobe;
> >         int ret;
> > @@ -912,7 +1100,7 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
> >                 return -EINVAL;
> >
> >   retry:
> > -       uprobe = alloc_uprobe(inode, offset);
> > +       uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
> >         if (!uprobe)
> >                 return -ENOMEM;
> >         /*
> > @@ -938,10 +1126,17 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
> >  int uprobe_register(struct inode *inode, loff_t offset,
> >                     struct uprobe_consumer *uc)
> >  {
> > -       return __uprobe_register(inode, offset, uc);
> > +       return __uprobe_register(inode, offset, 0, uc);
> >  }
> >  EXPORT_SYMBOL_GPL(uprobe_register);
> >
> > +int uprobe_register_refctr(struct inode *inode, loff_t offset,
> > +                          loff_t ref_ctr_offset, struct uprobe_consumer *uc)
> > +{
> > +       return __uprobe_register(inode, offset, ref_ctr_offset, uc);
> > +}
> > +EXPORT_SYMBOL_GPL(uprobe_register_refctr);
> > +
> >  /*
> >   * uprobe_apply - unregister a already registered probe.
> >   * @inode: the file in which the probe has to be removed.
> > @@ -1060,6 +1255,31 @@ static void build_probe_list(struct inode *inode,
> >         spin_unlock(&uprobes_treelock);
> >  }
> >
> > +static int delayed_uprobe_install(struct vm_area_struct *vma)  
> This function name is confusing. How about we call it delayed_ref_ctr_incr() or
> something similar? Also, we should add comments to highlight this is vma is not
> the vma containing the uprobe, but the vma containing the ref_ctr.
> 
> > +{
> > +       struct list_head *pos, *q;
> > +       struct delayed_uprobe *du;
> > +       unsigned long vaddr;
> > +       int ret = 0, err = 0;
> > +
> > +       mutex_lock(&delayed_uprobe_lock);
> > +       list_for_each_safe(pos, q, &delayed_uprobe_list) {
> > +               du = list_entry(pos, struct delayed_uprobe, list);
> > +
> > +               if (!valid_ref_ctr_vma(du->uprobe, vma))
> > +                       continue;
> > +
> > +               vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
> > +               ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
> > +               /* Record an error and continue. */
> > +               if (ret && !err)
> > +                       err = ret;  
> I think this is a good place (when ret != 0) to call pr_warn(). I guess we can
> print which mm get error for which uprobe (inode+offset).
> 
> > +               delayed_uprobe_delete(du);
> > +       }
> > +       mutex_unlock(&delayed_uprobe_lock);
> > +       return err;
> > +}
> > +
> >  /*
> >   * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
> >   *
> > @@ -1072,7 +1292,15 @@ int uprobe_mmap(struct vm_area_struct *vma)
> >         struct uprobe *uprobe, *u;
> >         struct inode *inode;
> >
> > -       if (no_uprobe_events() || !valid_vma(vma, true))
> > +       if (no_uprobe_events())
> > +               return 0;
> > +
> > +       if (vma->vm_file &&
> > +           (vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE &&
> > +           test_bit(MMF_HAS_UPROBES, &vma->vm_mm->flags))
> > +               delayed_uprobe_install(vma);
> > +
> > +       if (!valid_vma(vma, true))
> >                 return 0;
> >
> >         inode = file_inode(vma->vm_file);
> > @@ -1246,6 +1474,10 @@ void uprobe_clear_state(struct mm_struct *mm)
> >  {
> >         struct xol_area *area = mm->uprobes_state.xol_area;
> >
> > +       mutex_lock(&delayed_uprobe_lock);
> > +       delayed_uprobe_remove(NULL, mm);
> > +       mutex_unlock(&delayed_uprobe_lock);
> > +
> >         if (!area)
> >                 return;
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 823687997b01..616160b85860 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -4620,7 +4620,7 @@ static const char readme_msg[] =
> >    "place (kretprobe): [<module>:]<symbol>[+<offset>]|<memaddr>\n"
> >  #endif
> >  #ifdef CONFIG_UPROBE_EVENTS
> > -       "\t    place: <path>:<offset>\n"
> > +  "   place (uprobe): <path>:<offset>[(ref_ctr_offset)]\n"
> >  #endif
> >         "\t     args: <name>=fetcharg[:type]\n"
> >         "\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
> > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > index bf89a51e740d..bf2be098eb08 100644
> > --- a/kernel/trace/trace_uprobe.c
> > +++ b/kernel/trace/trace_uprobe.c
> > @@ -59,6 +59,7 @@ struct trace_uprobe {
> >         struct inode                    *inode;
> >         char                            *filename;
> >         unsigned long                   offset;
> > +       unsigned long                   ref_ctr_offset;
> >         unsigned long                   nhit;
> >         struct trace_probe              tp;
> >  };
> > @@ -364,10 +365,10 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
> >  static int create_trace_uprobe(int argc, char **argv)
> >  {
> >         struct trace_uprobe *tu;
> > -       char *arg, *event, *group, *filename;
> > +       char *arg, *event, *group, *filename, *rctr, *rctr_end;
> >         char buf[MAX_EVENT_NAME_LEN];
> >         struct path path;
> > -       unsigned long offset;
> > +       unsigned long offset, ref_ctr_offset;
> >         bool is_delete, is_return;
> >         int i, ret;
> >
> > @@ -376,6 +377,7 @@ static int create_trace_uprobe(int argc, char **argv)
> >         is_return = false;
> >         event = NULL;
> >         group = NULL;
> > +       ref_ctr_offset = 0;
> >
> >         /* argc must be >= 1 */
> >         if (argv[0][0] == '-')
> > @@ -450,6 +452,26 @@ static int create_trace_uprobe(int argc, char **argv)
> >                 goto fail_address_parse;
> >         }
> >
> > +       /* Parse reference counter offset if specified. */
> > +       rctr = strchr(arg, '(');
> > +       if (rctr) {
> > +               rctr_end = strchr(rctr, ')');
> > +               if (rctr > rctr_end || *(rctr_end + 1) != 0) {
> > +                       ret = -EINVAL;
> > +                       pr_info("Invalid reference counter offset.\n");
> > +                       goto fail_address_parse;
> > +               }
> > +
> > +               *rctr++ = '\0';
> > +               *rctr_end = '\0';
> > +               ret = kstrtoul(rctr, 0, &ref_ctr_offset);
> > +               if (ret) {
> > +                       pr_info("Invalid reference counter offset.\n");
> > +                       goto fail_address_parse;
> > +               }
> > +       }
> > +
> > +       /* Parse uprobe offset. */
> >         ret = kstrtoul(arg, 0, &offset);
> >         if (ret)
> >                 goto fail_address_parse;
> > @@ -484,6 +506,7 @@ static int create_trace_uprobe(int argc, char **argv)
> >                 goto fail_address_parse;
> >         }
> >         tu->offset = offset;
> > +       tu->ref_ctr_offset = ref_ctr_offset;
> >         tu->path = path;
> >         tu->filename = kstrdup(filename, GFP_KERNEL);
> >
> > @@ -602,6 +625,9 @@ static int probes_seq_show(struct seq_file *m, void *v)
> >                         trace_event_name(&tu->tp.call), tu->filename,
> >                         (int)(sizeof(void *) * 2), tu->offset);
> >
> > +       if (tu->ref_ctr_offset)
> > +               seq_printf(m, "(0x%lx)", tu->ref_ctr_offset);
> > +
> >         for (i = 0; i < tu->tp.nr_args; i++)
> >                 seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
> >
> > @@ -917,7 +943,13 @@ probe_event_enable(struct trace_uprobe *tu, struct trace_event_file *file,
> >
> >         tu->consumer.filter = filter;
> >         tu->inode = d_real_inode(tu->path.dentry);
> > -       ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
> > +       if (tu->ref_ctr_offset) {
> > +               ret = uprobe_register_refctr(tu->inode, tu->offset,
> > +                               tu->ref_ctr_offset, &tu->consumer);
> > +       } else {
> > +               ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
> > +       }
> > +
> >         if (ret)
> >                 goto err_buffer;
> >
> > --
> > 2.14.4
> >  


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-11  7:57   ` Song Liu
  2018-08-11 15:37     ` Steven Rostedt
@ 2018-08-13  5:47     ` Ravi Bangoria
  2018-08-13  7:38       ` Ravi Bangoria
                         ` (2 more replies)
  1 sibling, 3 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-13  5:47 UTC (permalink / raw)
  To: Song Liu
  Cc: srikar, Oleg Nesterov, Steven Rostedt, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton, Ravi Bangoria

Hi Song,

On 08/11/2018 01:27 PM, Song Liu wrote:
>> +
>> +static void delayed_uprobe_delete(struct delayed_uprobe *du)
>> +{
>> +       if (!du)
>> +               return;
> Do we really need this check?


Not necessary though, but I would still like to keep it for a safety.


> 
>> +       list_del(&du->list);
>> +       kfree(du);
>> +}
>> +
>> +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct *mm)
>> +{
>> +       struct list_head *pos, *q;
>> +       struct delayed_uprobe *du;
>> +
>> +       if (!uprobe && !mm)
>> +               return;
> And do we really need this check?


Yes. delayed_uprobe_remove(uprobe=NULL, mm=NULL) is an invalid case. If I remove
this check, code below (or more accurately code suggested by Oleg) will remove
all entries from delayed_uprobe_list. So I will keep this check but put a comment
above function.


[...]
>> +
>> +       ret = get_user_pages_remote(NULL, mm, vaddr, 1,
>> +                       FOLL_WRITE, &page, &vma, NULL);
>> +       if (unlikely(ret <= 0)) {
>> +               /*
>> +                * We are asking for 1 page. If get_user_pages_remote() fails,
>> +                * it may return 0, in that case we have to return error.
>> +                */
>> +               ret = (ret == 0) ? -EBUSY : ret;
>> +               pr_warn("Failed to %s ref_ctr. (%d)\n",
>> +                       d > 0 ? "increment" : "decrement", ret);
> This warning is not really useful. Seems this function has little information
> about which uprobe is failing here. Maybe we only need warning in the caller
> (or caller of caller).


Sure, I can move this warning to caller of this function but what are the
exact fields you would like to print with warning? Something like this is
fine?

    pr_warn("ref_ctr %s failed for 0x%lx, 0x%lx, 0x%lx, 0x%p",
		d > 0 ? "increment" : "decrement", inode->i_ino,
		offset, ref_ctr_offset, mm);

More importantly, the reason I didn't print more info is because dmesg is
accessible to unprivileged users in many distros but uprobes are not. So
printing this information may be a security violation. No?


> 
>> +               return ret;
>> +       }
>> +
>> +       kaddr = kmap_atomic(page);
>> +       ptr = kaddr + (vaddr & ~PAGE_MASK);
>> +
>> +       if (unlikely(*ptr + d < 0)) {
>> +               pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
>> +                       "curr val: %d, delta: %d\n", vaddr, *ptr, d);
>> +               ret = -EINVAL;
>> +               goto out;
>> +       }
>> +
>> +       *ptr += d;
>> +       ret = 0;
>> +out:
>> +       kunmap_atomic(kaddr);
>> +       put_page(page);
>> +       return ret;
>> +}
>> +
>> +static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
>> +                         bool is_register)
> What's the reason of bool is_register here vs. short d in __update_ref_ctr()?
> Can we use short for both?


Yes, I can use short as well.


> 
>> +{
>> +       struct vm_area_struct *rc_vma;
>> +       unsigned long rc_vaddr;
>> +       int ret = 0;
>> +
>> +       rc_vma = find_ref_ctr_vma(uprobe, mm);
>> +
>> +       if (rc_vma) {
>> +               rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
>> +               ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1);
>> +
>> +               if (is_register)
>> +                       return ret;
>> +       }
> Mixing __update_ref_ctr() here and delayed_uprobe_add() in the same
> function is a little confusing (at least for me). How about we always use
> delayed uprobe for uprobe_mmap() and use non-delayed in other case(s)?


No. delayed_uprobe_add() is needed for uprobe_register() case to handle race
between uprobe_register() and process creation.


[...]
>>
>> +static int delayed_uprobe_install(struct vm_area_struct *vma)
> This function name is confusing. How about we call it delayed_ref_ctr_incr() or
> something similar? Also, we should add comments to highlight this is vma is not
> the vma containing the uprobe, but the vma containing the ref_ctr.


Sure, I'll do that.


> 
>> +{
>> +       struct list_head *pos, *q;
>> +       struct delayed_uprobe *du;
>> +       unsigned long vaddr;
>> +       int ret = 0, err = 0;
>> +
>> +       mutex_lock(&delayed_uprobe_lock);
>> +       list_for_each_safe(pos, q, &delayed_uprobe_list) {
>> +               du = list_entry(pos, struct delayed_uprobe, list);
>> +
>> +               if (!valid_ref_ctr_vma(du->uprobe, vma))
>> +                       continue;
>> +
>> +               vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
>> +               ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
>> +               /* Record an error and continue. */
>> +               if (ret && !err)
>> +                       err = ret;
> I think this is a good place (when ret != 0) to call pr_warn(). I guess we can
> print which mm get error for which uprobe (inode+offset).


__update_ref_ctr() is already printing warning, so I didn't add anything here.
In case I remove a warning from __update_ref_ctr(), a warning something like
below is fine?

    pr_warn("ref_ctr increment failed for 0x%lx, 0x%lx, 0x%lx, 0x%p",
		inode->i_ino, offset, ref_ctr_offset, vma->vm_mm);

Again, can this lead to a security violation?

Thanks for detailed review :)
-Ravi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-13  5:47     ` Ravi Bangoria
@ 2018-08-13  7:38       ` Ravi Bangoria
  2018-08-13 11:50       ` Oleg Nesterov
  2018-08-13 16:50       ` Song Liu
  2 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-13  7:38 UTC (permalink / raw)
  To: Song Liu
  Cc: srikar, Oleg Nesterov, Steven Rostedt, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton, Ravi Bangoria

Hi Song,

On 08/13/2018 11:17 AM, Ravi Bangoria wrote:
>>> +
>>> +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct *mm)
>>> +{
>>> +       struct list_head *pos, *q;
>>> +       struct delayed_uprobe *du;
>>> +
>>> +       if (!uprobe && !mm)
>>> +               return;
>> And do we really need this check?
> 
> 
> Yes. delayed_uprobe_remove(uprobe=NULL, mm=NULL) is an invalid case. If I remove
> this check, code below (or more accurately code suggested by Oleg) will remove
> all entries from delayed_uprobe_list. So I will keep this check but put a comment
> above function.
> 

Sorry, my bad. Please ignore above comment. Even though, it saves us
to unnecessary loop over entire delayed_uprobe_list when both uprobe
and mm are NULL, that case is not possible with current code. Also,
I'm not dereferencing any of them. So, IMHO it's fine to remove this
check.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  2018-08-11  8:12   ` Song Liu
@ 2018-08-13  8:19     ` Ravi Bangoria
  2018-08-13  8:49       ` Srikar Dronamraju
  0 siblings, 1 reply; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-13  8:19 UTC (permalink / raw)
  To: Song Liu, srikar, Oleg Nesterov, Steven Rostedt, mhiramat
  Cc: Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Song,

On 08/11/2018 01:42 PM, Song Liu wrote:
> Do we really need this given we already have PATCH 4/6?
> uprobe_regsiter() can be called
> out of trace_uprobe, this patch won't catch all conflicts anyway.

Right but it, at least, catch all conflicts happening via trace_uprobe.

I don't mind in removing this patch but I would like to get an opinion of
Oleg/Srikar/Steven/Masami.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  2018-08-09  4:18 ` [PATCH v8 5/6] trace_uprobe/sdt: " Ravi Bangoria
  2018-08-11  8:12   ` Song Liu
@ 2018-08-13  8:48   ` Srikar Dronamraju
  1 sibling, 0 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2018-08-13  8:48 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, rostedt, mhiramat, liu.song.a23, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

* Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-08-09 09:48:55]:

> We assume to have only one reference counter for one uprobe.
> Don't allow user to add multiple trace_uprobe entries having
> same inode+offset but different reference counter.
> 
> Ex,
>   # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events
>   # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xfffff)" >> uprobe_events
>   bash: echo: write error: Invalid argument
> 
>   # dmesg
>   trace_kprobe: Reference counter offset mismatch.
> 
> There is one exception though:
> When user is trying to replace the old entry with the new
> one, we allow this if the new entry does not conflict with
> any other existing entries.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> ---

Looks good to me.

Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  2018-08-13  8:19     ` Ravi Bangoria
@ 2018-08-13  8:49       ` Srikar Dronamraju
  2018-08-13 16:27         ` Song Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Srikar Dronamraju @ 2018-08-13  8:49 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Song Liu, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

* Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-08-13 13:49:44]:

> Hi Song,
> 
> On 08/11/2018 01:42 PM, Song Liu wrote:
> > Do we really need this given we already have PATCH 4/6?
> > uprobe_regsiter() can be called
> > out of trace_uprobe, this patch won't catch all conflicts anyway.
> 
> Right but it, at least, catch all conflicts happening via trace_uprobe.
> 
> I don't mind in removing this patch but I would like to get an opinion of
> Oleg/Srikar/Steven/Masami.
> 

I would suggest to keep it, atleast it can ctah conflicts happening via
trace_uprobe.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe
  2018-08-09  4:18 ` [PATCH v8 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
  2018-08-11  8:04   ` Song Liu
@ 2018-08-13  8:50   ` Srikar Dronamraju
  1 sibling, 0 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2018-08-13  8:50 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, rostedt, mhiramat, liu.song.a23, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

* Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-08-09 09:48:54]:

> We assume to have only one reference counter for one uprobe.
> Don't allow user to register multiple uprobes having same
> inode+offset but different reference counter.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> ---
Looks good to me.

Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
  2018-08-09  4:18 ` [PATCH v8 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() Ravi Bangoria
  2018-08-11  8:01   ` Song Liu
@ 2018-08-13  8:51   ` Srikar Dronamraju
  1 sibling, 0 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2018-08-13  8:51 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, rostedt, mhiramat, liu.song.a23, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

* Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-08-09 09:48:52]:

> Add addition argument 'arch_uprobe' to uprobe_write_opcode().
> We need this in later set of patches.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Looks good to me.

Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 1/6] Uprobes: Simplify uprobe_register() body
  2018-08-09  4:18 ` [PATCH v8 1/6] Uprobes: Simplify uprobe_register() body Ravi Bangoria
  2018-08-11  8:00   ` Song Liu
@ 2018-08-13  8:56   ` Srikar Dronamraju
  2018-08-14  0:07     ` Steven Rostedt
  1 sibling, 1 reply; 42+ messages in thread
From: Srikar Dronamraju @ 2018-08-13  8:56 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, rostedt, mhiramat, liu.song.a23, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

* Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-08-09 09:48:51]:

> Simplify uprobe_register() function body and let __uprobe_register()
> handle everything. Also move dependency functions around to fix build
> failures.
> 

One nit:
s/to fix build/failures/to avoid build failures/



> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> ---


Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-09  4:18 ` [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
  2018-08-09 14:38   ` Oleg Nesterov
  2018-08-11  7:57   ` Song Liu
@ 2018-08-13 10:03   ` Srikar Dronamraju
  2018-08-13 11:23     ` Oleg Nesterov
  2018-08-14  8:56   ` Ravi Bangoria
  3 siblings, 1 reply; 42+ messages in thread
From: Srikar Dronamraju @ 2018-08-13 10:03 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, rostedt, mhiramat, liu.song.a23, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

> +
> +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
> +{
> +	struct delayed_uprobe *du;
> +
> +	if (delayed_uprobe_check(uprobe, mm))
> +		return 0;
> +
> +	du = kzalloc(sizeof(*du), GFP_KERNEL);
> +	if (!du)
> +		return -ENOMEM;
> +
> +	du->uprobe = uprobe;
> +	du->mm = mm;
> +	list_add(&du->list, &delayed_uprobe_list);
> +	return 0;
> +}
> +

Should we keep the delayed uprobe list per mm?
That way we could avoid the global mutex lock.
And the list to traverse would also be small.


> 
> @@ -378,8 +557,15 @@ static struct uprobe *get_uprobe(struct uprobe *uprobe)
> 
>  static void put_uprobe(struct uprobe *uprobe)
>  {
> -	if (atomic_dec_and_test(&uprobe->ref))
> +	if (atomic_dec_and_test(&uprobe->ref)) {
> +		/*
> +		 * If application munmap(exec_vma) before uprobe_unregister()
> +		 * gets called, we don't get a chance to remove uprobe from
> +		 * delayed_uprobe_list in remove_breakpoint(). Do it here.
> +		 */
> +		delayed_uprobe_remove(uprobe, NULL);

I am not getting this part. If unmap happens before unregister,
why cant we use uprobe_munmap? what am I missing.

>  		kfree(uprobe);
> +	}
>  }
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-13 10:03   ` Srikar Dronamraju
@ 2018-08-13 11:23     ` Oleg Nesterov
  0 siblings, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2018-08-13 11:23 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ravi Bangoria, rostedt, mhiramat, liu.song.a23, peterz, mingo,
	acme, alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

On 08/13, Srikar Dronamraju wrote:
>
> > +
> > +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
> > +{
> > +	struct delayed_uprobe *du;
> > +
> > +	if (delayed_uprobe_check(uprobe, mm))
> > +		return 0;
> > +
> > +	du = kzalloc(sizeof(*du), GFP_KERNEL);
> > +	if (!du)
> > +		return -ENOMEM;
> > +
> > +	du->uprobe = uprobe;
> > +	du->mm = mm;
> > +	list_add(&du->list, &delayed_uprobe_list);
> > +	return 0;
> > +}
> > +
>
> Should we keep the delayed uprobe list per mm?
> That way we could avoid the global mutex lock.
> And the list to traverse would also be small.

Plus uprobe_mmap() could simply check list_empty(mm->delayed_list) before
the costly delayed_uprobe_install().

Yes, I mentioned this too. But then I suggested to not do this in the initial
version to make it more simple.

Hopefully we can do this later, but note that this conflicts with the change
in put_uprobe() you commented below.

> >  static void put_uprobe(struct uprobe *uprobe)
> >  {
> > -	if (atomic_dec_and_test(&uprobe->ref))
> > +	if (atomic_dec_and_test(&uprobe->ref)) {
> > +		/*
> > +		 * If application munmap(exec_vma) before uprobe_unregister()
> > +		 * gets called, we don't get a chance to remove uprobe from
> > +		 * delayed_uprobe_list in remove_breakpoint(). Do it here.
> > +		 */
> > +		delayed_uprobe_remove(uprobe, NULL);
>
> I am not getting this part. If unmap happens before unregister,
> why cant we use uprobe_munmap?

How? I mean what exactly can we do in uprobe_munmap() ? Firstly, we will need
to restore build_probe_list/list_for_each_entry_safe(pending_list) in
uprobe_munmap() and this is not nice performance-wise. Then what?

We don't even know if the caller is actually munmap(), it can be vma_adjust()
and in the latter case we can't do delayed_uprobe_remove(uprobe, mm).

Perhaps we can use uprobe_munmap() to cleanup the delayed_uprobe_list, but this
doesn't look simple to me.

In fact I think that we should simply kill uprobe_munmap().

Oleg.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-13  5:47     ` Ravi Bangoria
  2018-08-13  7:38       ` Ravi Bangoria
@ 2018-08-13 11:50       ` Oleg Nesterov
  2018-08-13 13:01         ` Ravi Bangoria
  2018-08-14  0:03         ` Steven Rostedt
  2018-08-13 16:50       ` Song Liu
  2 siblings, 2 replies; 42+ messages in thread
From: Oleg Nesterov @ 2018-08-13 11:50 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Song Liu, srikar, Steven Rostedt, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton

On 08/13, Ravi Bangoria wrote:
>
> On 08/11/2018 01:27 PM, Song Liu wrote:
> >> +
> >> +static void delayed_uprobe_delete(struct delayed_uprobe *du)
> >> +{
> >> +       if (!du)
> >> +               return;
> > Do we really need this check?
>
> Not necessary though, but I would still like to keep it for a safety.

Heh. I tried to ignore all minor problems in this version, but now that Song
mentioned this unnecessary check...

Personally I really dislike the checks like this one.

	- It can confuse the reader who will try to understand the purpose

	- it can hide a bug if delayed_uprobe_delete(du) is actually called
	  with du == NULL.

IMO, you should either remove it and let the kernel crash (to notice the
problem), or turn it into

	if (WARN_ON(!du))
		return;

which is self-documented and reports the problem without kernel crash.

> >> +       rc_vma = find_ref_ctr_vma(uprobe, mm);
> >> +
> >> +       if (rc_vma) {
> >> +               rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
> >> +               ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1);
> >> +
> >> +               if (is_register)
> >> +                       return ret;
> >> +       }
> > Mixing __update_ref_ctr() here and delayed_uprobe_add() in the same
> > function is a little confusing (at least for me). How about we always use
> > delayed uprobe for uprobe_mmap() and use non-delayed in other case(s)?
>
>
> No. delayed_uprobe_add() is needed for uprobe_register() case to handle race
> between uprobe_register() and process creation.

Yes.

But damn, process creation (exec) is trivial. We could add a new uprobe_exec()
hook and avoid delayed_uprobe_install() in uprobe_mmap().

Afaics, the really problematic case is dlopen() which can race with _register()
too, right?

Oleg.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-13 11:50       ` Oleg Nesterov
@ 2018-08-13 13:01         ` Ravi Bangoria
  2018-08-13 13:17           ` Oleg Nesterov
  2018-08-14  0:03         ` Steven Rostedt
  1 sibling, 1 reply; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-13 13:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Song Liu, srikar, Steven Rostedt, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton, Ravi Bangoria

Hi Oleg,

On 08/13/2018 05:20 PM, Oleg Nesterov wrote:
> On 08/13, Ravi Bangoria wrote:
>>
>> On 08/11/2018 01:27 PM, Song Liu wrote:
>>>> +
>>>> +static void delayed_uprobe_delete(struct delayed_uprobe *du)
>>>> +{
>>>> +       if (!du)
>>>> +               return;
>>> Do we really need this check?
>>
>> Not necessary though, but I would still like to keep it for a safety.
> 
> Heh. I tried to ignore all minor problems in this version, but now that Song
> mentioned this unnecessary check...
> 
> Personally I really dislike the checks like this one.
> 
> 	- It can confuse the reader who will try to understand the purpose
> 
> 	- it can hide a bug if delayed_uprobe_delete(du) is actually called
> 	  with du == NULL.
> 
> IMO, you should either remove it and let the kernel crash (to notice the
> problem), or turn it into
> 
> 	if (WARN_ON(!du))
> 		return;
> 
> which is self-documented and reports the problem without kernel crash.

Ok. I'll remove that check.

> 
>>>> +       rc_vma = find_ref_ctr_vma(uprobe, mm);
>>>> +
>>>> +       if (rc_vma) {
>>>> +               rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
>>>> +               ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1);
>>>> +
>>>> +               if (is_register)
>>>> +                       return ret;
>>>> +       }
>>> Mixing __update_ref_ctr() here and delayed_uprobe_add() in the same
>>> function is a little confusing (at least for me). How about we always use
>>> delayed uprobe for uprobe_mmap() and use non-delayed in other case(s)?
>>
>>
>> No. delayed_uprobe_add() is needed for uprobe_register() case to handle race
>> between uprobe_register() and process creation.
> 
> Yes.
> 
> But damn, process creation (exec) is trivial. We could add a new uprobe_exec()
> hook and avoid delayed_uprobe_install() in uprobe_mmap().

I'm sorry. I didn't get this.

> 
> Afaics, the really problematic case is dlopen() which can race with _register()
> too, right?

dlopen() should internally use mmap() right? So what is the problem here? Can
you please elaborate.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-13 13:01         ` Ravi Bangoria
@ 2018-08-13 13:17           ` Oleg Nesterov
  2018-08-13 14:26             ` Ravi Bangoria
                               ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Oleg Nesterov @ 2018-08-13 13:17 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Song Liu, srikar, Steven Rostedt, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton

On 08/13, Ravi Bangoria wrote:
>
> > But damn, process creation (exec) is trivial. We could add a new uprobe_exec()
> > hook and avoid delayed_uprobe_install() in uprobe_mmap().
>
> I'm sorry. I didn't get this.

Sorry for confusion...

I meant, if only exec*( could race with _register(), we could add another uprobe
hook which updates all (delayed) counters before return to user-mode.

> > Afaics, the really problematic case is dlopen() which can race with _register()
> > too, right?
>
> dlopen() should internally use mmap() right? So what is the problem here? Can
> you please elaborate.

What I tried to say is that we can't avoid uprobe_mmap()->delayed_uprobe_install()
because dlopen() can race with _register() too, just like exec.

Oleg.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-13 13:17           ` Oleg Nesterov
@ 2018-08-13 14:26             ` Ravi Bangoria
  2018-08-13 14:51             ` Oleg Nesterov
  2018-08-13 17:12             ` Song Liu
  2 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-13 14:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Song Liu, srikar, Steven Rostedt, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton, Ravi Bangoria



On 08/13/2018 06:47 PM, Oleg Nesterov wrote:
> On 08/13, Ravi Bangoria wrote:
>>
>>> But damn, process creation (exec) is trivial. We could add a new uprobe_exec()
>>> hook and avoid delayed_uprobe_install() in uprobe_mmap().
>>
>> I'm sorry. I didn't get this.
> 
> Sorry for confusion...
> 
> I meant, if only exec*( could race with _register(), we could add another uprobe
> hook which updates all (delayed) counters before return to user-mode.

Ok.

> 
>>> Afaics, the really problematic case is dlopen() which can race with _register()
>>> too, right?
>>
>> dlopen() should internally use mmap() right? So what is the problem here? Can
>> you please elaborate.
> 
> What I tried to say is that we can't avoid uprobe_mmap()->delayed_uprobe_install()
> because dlopen() can race with _register() too, just like exec.

Right :)

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-13 13:17           ` Oleg Nesterov
  2018-08-13 14:26             ` Ravi Bangoria
@ 2018-08-13 14:51             ` Oleg Nesterov
  2018-08-13 17:12             ` Song Liu
  2 siblings, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2018-08-13 14:51 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Song Liu, srikar, Steven Rostedt, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton

On 08/13, Oleg Nesterov wrote:
>
> On 08/13, Ravi Bangoria wrote:
> >
> > > But damn, process creation (exec) is trivial. We could add a new uprobe_exec()
> > > hook and avoid delayed_uprobe_install() in uprobe_mmap().
> >
> > I'm sorry. I didn't get this.
>
> Sorry for confusion...
>
> I meant, if only exec*( could race with _register(), we could add another uprobe
> hook which updates all (delayed) counters before return to user-mode.
>
> > > Afaics, the really problematic case is dlopen() which can race with _register()
> > > too, right?
> >
> > dlopen() should internally use mmap() right? So what is the problem here? Can
> > you please elaborate.
>
> What I tried to say is that we can't avoid uprobe_mmap()->delayed_uprobe_install()
> because dlopen() can race with _register() too, just like exec.

I'm afraid I added even more confusion... so let me clarify although I am sure you
understand this better than me ;)

of course, the main reason why we can't avoid uprobe_mmap()->delayed_uprobe_install()
is not the race with _register(), the main reason is that dlopen() can mmap the code
first, then mmap the counter we need to increment.

Again, just like exec, but exec is "atomic" in that it can't return to user-mode until
evrything is done, so we could add another uprobe hook and avoid delayed_uprobe_install
in uprobe_mmap().

Just in case my previous email was not clear.

Oleg.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  2018-08-13  8:49       ` Srikar Dronamraju
@ 2018-08-13 16:27         ` Song Liu
  0 siblings, 0 replies; 42+ messages in thread
From: Song Liu @ 2018-08-13 16:27 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ravi Bangoria, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On Mon, Aug 13, 2018 at 1:49 AM, Srikar Dronamraju
<srikar@linux.vnet.ibm.com> wrote:
> * Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-08-13 13:49:44]:
>
>> Hi Song,
>>
>> On 08/11/2018 01:42 PM, Song Liu wrote:
>> > Do we really need this given we already have PATCH 4/6?
>> > uprobe_regsiter() can be called
>> > out of trace_uprobe, this patch won't catch all conflicts anyway.
>>
>> Right but it, at least, catch all conflicts happening via trace_uprobe.
>>
>> I don't mind in removing this patch but I would like to get an opinion of
>> Oleg/Srikar/Steven/Masami.
>>
>
> I would suggest to keep it, atleast it can ctah conflicts happening via
> trace_uprobe.
>

Yeah, that makes sense.

Reviewed-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-13  5:47     ` Ravi Bangoria
  2018-08-13  7:38       ` Ravi Bangoria
  2018-08-13 11:50       ` Oleg Nesterov
@ 2018-08-13 16:50       ` Song Liu
  2 siblings, 0 replies; 42+ messages in thread
From: Song Liu @ 2018-08-13 16:50 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Srikar Dronamraju, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On Sun, Aug 12, 2018 at 10:47 PM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> Hi Song,
>
> On 08/11/2018 01:27 PM, Song Liu wrote:
>>> +
>>> +static void delayed_uprobe_delete(struct delayed_uprobe *du)
>>> +{
>>> +       if (!du)
>>> +               return;
>> Do we really need this check?
>
>
> Not necessary though, but I would still like to keep it for a safety.
>
>
>>
>>> +       list_del(&du->list);
>>> +       kfree(du);
>>> +}
>>> +
>>> +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct *mm)
>>> +{
>>> +       struct list_head *pos, *q;
>>> +       struct delayed_uprobe *du;
>>> +
>>> +       if (!uprobe && !mm)
>>> +               return;
>> And do we really need this check?
>
>
> Yes. delayed_uprobe_remove(uprobe=NULL, mm=NULL) is an invalid case. If I remove
> this check, code below (or more accurately code suggested by Oleg) will remove
> all entries from delayed_uprobe_list. So I will keep this check but put a comment
> above function.
>
>
> [...]
>>> +
>>> +       ret = get_user_pages_remote(NULL, mm, vaddr, 1,
>>> +                       FOLL_WRITE, &page, &vma, NULL);
>>> +       if (unlikely(ret <= 0)) {
>>> +               /*
>>> +                * We are asking for 1 page. If get_user_pages_remote() fails,
>>> +                * it may return 0, in that case we have to return error.
>>> +                */
>>> +               ret = (ret == 0) ? -EBUSY : ret;
>>> +               pr_warn("Failed to %s ref_ctr. (%d)\n",
>>> +                       d > 0 ? "increment" : "decrement", ret);
>> This warning is not really useful. Seems this function has little information
>> about which uprobe is failing here. Maybe we only need warning in the caller
>> (or caller of caller).
>
>
> Sure, I can move this warning to caller of this function but what are the
> exact fields you would like to print with warning? Something like this is
> fine?
>
>     pr_warn("ref_ctr %s failed for 0x%lx, 0x%lx, 0x%lx, 0x%p",
>                 d > 0 ? "increment" : "decrement", inode->i_ino,
>                 offset, ref_ctr_offset, mm);
>
> More importantly, the reason I didn't print more info is because dmesg is
> accessible to unprivileged users in many distros but uprobes are not. So
> printing this information may be a security violation. No?
>
>
>>
>>> +               return ret;
>>> +       }
>>> +
>>> +       kaddr = kmap_atomic(page);
>>> +       ptr = kaddr + (vaddr & ~PAGE_MASK);
>>> +
>>> +       if (unlikely(*ptr + d < 0)) {
>>> +               pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
>>> +                       "curr val: %d, delta: %d\n", vaddr, *ptr, d);
>>> +               ret = -EINVAL;
>>> +               goto out;
>>> +       }
>>> +
>>> +       *ptr += d;
>>> +       ret = 0;
>>> +out:
>>> +       kunmap_atomic(kaddr);
>>> +       put_page(page);
>>> +       return ret;
>>> +}
>>> +
>>> +static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
>>> +                         bool is_register)
>> What's the reason of bool is_register here vs. short d in __update_ref_ctr()?
>> Can we use short for both?
>
>
> Yes, I can use short as well.
>
>
>>
>>> +{
>>> +       struct vm_area_struct *rc_vma;
>>> +       unsigned long rc_vaddr;
>>> +       int ret = 0;
>>> +
>>> +       rc_vma = find_ref_ctr_vma(uprobe, mm);
>>> +
>>> +       if (rc_vma) {
>>> +               rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
>>> +               ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1);
>>> +
>>> +               if (is_register)
>>> +                       return ret;
>>> +       }
>> Mixing __update_ref_ctr() here and delayed_uprobe_add() in the same
>> function is a little confusing (at least for me). How about we always use
>> delayed uprobe for uprobe_mmap() and use non-delayed in other case(s)?
>
>
> No. delayed_uprobe_add() is needed for uprobe_register() case to handle race
> between uprobe_register() and process creation.

I see.

>
>
> [...]
>>>
>>> +static int delayed_uprobe_install(struct vm_area_struct *vma)
>> This function name is confusing. How about we call it delayed_ref_ctr_incr() or
>> something similar? Also, we should add comments to highlight this is vma is not
>> the vma containing the uprobe, but the vma containing the ref_ctr.
>
>
> Sure, I'll do that.
>
>
>>
>>> +{
>>> +       struct list_head *pos, *q;
>>> +       struct delayed_uprobe *du;
>>> +       unsigned long vaddr;
>>> +       int ret = 0, err = 0;
>>> +
>>> +       mutex_lock(&delayed_uprobe_lock);
>>> +       list_for_each_safe(pos, q, &delayed_uprobe_list) {
>>> +               du = list_entry(pos, struct delayed_uprobe, list);
>>> +
>>> +               if (!valid_ref_ctr_vma(du->uprobe, vma))
>>> +                       continue;
>>> +
>>> +               vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
>>> +               ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
>>> +               /* Record an error and continue. */
>>> +               if (ret && !err)
>>> +                       err = ret;
>> I think this is a good place (when ret != 0) to call pr_warn(). I guess we can
>> print which mm get error for which uprobe (inode+offset).
>
>
> __update_ref_ctr() is already printing warning, so I didn't add anything here.
> In case I remove a warning from __update_ref_ctr(), a warning something like
> below is fine?
>
>     pr_warn("ref_ctr increment failed for 0x%lx, 0x%lx, 0x%lx, 0x%p",
>                 inode->i_ino, offset, ref_ctr_offset, vma->vm_mm);
>

I was thinking about a message like:

ref_ctr increment failed for inode XX offset YY ref_ctr ZZ of mm 0xWWW

I didn't thought about the security part of it, but I guess it is OK.

Thanks,
Song

> Again, can this lead to a security violation?
>
> Thanks for detailed review :)
> -Ravi
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-09  4:18 [PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (5 preceding siblings ...)
  2018-08-09  4:18 ` [PATCH v8 6/6] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
@ 2018-08-13 16:55 ` Oleg Nesterov
  6 siblings, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2018-08-13 16:55 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, rostedt, mhiramat, liu.song.a23, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

On 08/09, Ravi Bangoria wrote:
>
> Ravi Bangoria (6):
>   Uprobes: Simplify uprobe_register() body
>   Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
>   Uprobes: Support SDT markers having reference count (semaphore)
>   Uprobes/sdt: Prevent multiple reference counter for same uprobe
>   trace_uprobe/sdt: Prevent multiple reference counter for same uprobe

OK, feel free to add

Reviewed-by: Oleg Nesterov <oleg@redhat.com>

>   perf probe: Support SDT markers having reference counter (semaphore)

I didn't even try to read this patch.

Oleg.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-13 13:17           ` Oleg Nesterov
  2018-08-13 14:26             ` Ravi Bangoria
  2018-08-13 14:51             ` Oleg Nesterov
@ 2018-08-13 17:12             ` Song Liu
  2018-08-14  4:37               ` Ravi Bangoria
  2 siblings, 1 reply; 42+ messages in thread
From: Song Liu @ 2018-08-13 17:12 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ravi Bangoria, Srikar Dronamraju, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On Mon, Aug 13, 2018 at 6:17 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 08/13, Ravi Bangoria wrote:
>>
>> > But damn, process creation (exec) is trivial. We could add a new uprobe_exec()
>> > hook and avoid delayed_uprobe_install() in uprobe_mmap().
>>
>> I'm sorry. I didn't get this.
>
> Sorry for confusion...
>
> I meant, if only exec*( could race with _register(), we could add another uprobe
> hook which updates all (delayed) counters before return to user-mode.
>
>> > Afaics, the really problematic case is dlopen() which can race with _register()
>> > too, right?
>>
>> dlopen() should internally use mmap() right? So what is the problem here? Can
>> you please elaborate.
>
> What I tried to say is that we can't avoid uprobe_mmap()->delayed_uprobe_install()
> because dlopen() can race with _register() too, just like exec.
>
> Oleg.
>

How about we do delayed_uprobe_install() per file? Say we keep a list
of delayed_uprobe
in load_elf_binary(). Then we can install delayed_uprobe after loading
all sections of the
file.

Song

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-11  6:14       ` Song Liu
@ 2018-08-14  0:01         ` Steven Rostedt
  2018-08-14  0:05           ` Steven Rostedt
  0 siblings, 1 reply; 42+ messages in thread
From: Steven Rostedt @ 2018-08-14  0:01 UTC (permalink / raw)
  To: Song Liu
  Cc: Oleg Nesterov, Ravi Bangoria, srikar, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton

On Fri, 10 Aug 2018 23:14:01 -0700
Song Liu <liu.song.a23@gmail.com> wrote:

> On Fri, Aug 10, 2018 at 12:58 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Thu, 9 Aug 2018 16:38:28 +0200
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >  
> >> I need to read this (hopefully final) version carefully. I'll try to do
> >> this before next Monday.
> >>  
> >
> > Monday may be the opening of the merge window (more likely Sunday). Do
> > you think this is good enough for pushing it in this late in the game,
> > or do you think we should wait another cycle?
> >
> > -- Steve  
> 
> We (Facebook) have use cases in production that would benefit from this work, so
> I would rather see this gets in sooner than later. After a brief
> review (I will more
> careful review before Monday), I think this set is not likely to break
> existing uprobes
> (those w/o ref_ctr). Therefore, I think it is safe to put it in this cycle.
> 

It's still going under review, and the merge window is now open. I'd
prefer that this waits till the next merge window.

-- Steve

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-13 11:50       ` Oleg Nesterov
  2018-08-13 13:01         ` Ravi Bangoria
@ 2018-08-14  0:03         ` Steven Rostedt
  1 sibling, 0 replies; 42+ messages in thread
From: Steven Rostedt @ 2018-08-14  0:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ravi Bangoria, Song Liu, srikar, mhiramat, Peter Zijlstra, mingo,
	acme, alexander.shishkin, jolsa, namhyung, open list, ananth,
	Alexis Berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

On Mon, 13 Aug 2018 13:50:19 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 08/13, Ravi Bangoria wrote:
> >
> > On 08/11/2018 01:27 PM, Song Liu wrote:  
> > >> +
> > >> +static void delayed_uprobe_delete(struct delayed_uprobe *du)
> > >> +{
> > >> +       if (!du)
> > >> +               return;  
> > > Do we really need this check?  
> >
> > Not necessary though, but I would still like to keep it for a safety.  
> 
> Heh. I tried to ignore all minor problems in this version, but now that Song
> mentioned this unnecessary check...
> 
> Personally I really dislike the checks like this one.
> 
> 	- It can confuse the reader who will try to understand the purpose
> 
> 	- it can hide a bug if delayed_uprobe_delete(du) is actually called
> 	  with du == NULL.
> 
> IMO, you should either remove it and let the kernel crash (to notice the
> problem), or turn it into
> 
> 	if (WARN_ON(!du))
> 		return;

I'd prefer the more robust WARN_ON(!du) above instead of removing it.

-- Steve


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-14  0:01         ` Steven Rostedt
@ 2018-08-14  0:05           ` Steven Rostedt
  0 siblings, 0 replies; 42+ messages in thread
From: Steven Rostedt @ 2018-08-14  0:05 UTC (permalink / raw)
  To: Song Liu
  Cc: Oleg Nesterov, Ravi Bangoria, srikar, mhiramat, Peter Zijlstra,
	mingo, acme, alexander.shishkin, jolsa, namhyung, open list,
	ananth, Alexis Berlemont, naveen.n.rao, linux-arm-kernel,
	linux-mips, linux, ralf, paul.burton

On Mon, 13 Aug 2018 20:01:41 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 10 Aug 2018 23:14:01 -0700
> Song Liu <liu.song.a23@gmail.com> wrote:
> 
> > On Fri, Aug 10, 2018 at 12:58 PM, Steven Rostedt <rostedt@goodmis.org> wrote:  
> > > On Thu, 9 Aug 2018 16:38:28 +0200
> > > Oleg Nesterov <oleg@redhat.com> wrote:
> > >    
> > >> I need to read this (hopefully final) version carefully. I'll try to do
> > >> this before next Monday.
> > >>    
> > >
> > > Monday may be the opening of the merge window (more likely Sunday). Do
> > > you think this is good enough for pushing it in this late in the game,
> > > or do you think we should wait another cycle?
> > >
> > > -- Steve    
> > 
> > We (Facebook) have use cases in production that would benefit from this work, so
> > I would rather see this gets in sooner than later. After a brief
> > review (I will more
> > careful review before Monday), I think this set is not likely to break
> > existing uprobes
> > (those w/o ref_ctr). Therefore, I think it is safe to put it in this cycle.
> >   
> 
> It's still going under review, and the merge window is now open. I'd
> prefer that this waits till the next merge window.
> 
> 

But this said, patch 1&2 look simple enough to include this merge
window. I'll pull them in and start testing.

-- Steve

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 1/6] Uprobes: Simplify uprobe_register() body
  2018-08-13  8:56   ` Srikar Dronamraju
@ 2018-08-14  0:07     ` Steven Rostedt
  0 siblings, 0 replies; 42+ messages in thread
From: Steven Rostedt @ 2018-08-14  0:07 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ravi Bangoria, oleg, mhiramat, liu.song.a23, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

On Mon, 13 Aug 2018 01:56:12 -0700
Srikar Dronamraju <srikar@linux.vnet.ibm.com> wrote:

> * Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-08-09 09:48:51]:
> 
> > Simplify uprobe_register() function body and let __uprobe_register()
> > handle everything. Also move dependency functions around to fix build
> > failures.
> >   
> 
> One nit:
> s/to fix build/failures/to avoid build failures/

I pulled in this patch with the above update to the change log.

> 
> 
> 
> > Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> > ---  
> 
> 
> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

Thanks, I added your ack and Song's Reviewed-by tags.

-- Steve

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-13 17:12             ` Song Liu
@ 2018-08-14  4:37               ` Ravi Bangoria
  2018-08-14 16:35                 ` Song Liu
  0 siblings, 1 reply; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-14  4:37 UTC (permalink / raw)
  To: Song Liu
  Cc: Oleg Nesterov, Srikar Dronamraju, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Song,

On 08/13/2018 10:42 PM, Song Liu wrote:
> On Mon, Aug 13, 2018 at 6:17 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> On 08/13, Ravi Bangoria wrote:
>>>
>>>> But damn, process creation (exec) is trivial. We could add a new uprobe_exec()
>>>> hook and avoid delayed_uprobe_install() in uprobe_mmap().
>>>
>>> I'm sorry. I didn't get this.
>>
>> Sorry for confusion...
>>
>> I meant, if only exec*( could race with _register(), we could add another uprobe
>> hook which updates all (delayed) counters before return to user-mode.
>>
>>>> Afaics, the really problematic case is dlopen() which can race with _register()
>>>> too, right?
>>>
>>> dlopen() should internally use mmap() right? So what is the problem here? Can
>>> you please elaborate.
>>
>> What I tried to say is that we can't avoid uprobe_mmap()->delayed_uprobe_install()
>> because dlopen() can race with _register() too, just like exec.
>>
>> Oleg.
>>
> 
> How about we do delayed_uprobe_install() per file? Say we keep a list
> of delayed_uprobe
> in load_elf_binary(). Then we can install delayed_uprobe after loading
> all sections of the
> file.

I'm not sure if I totally understood the idea. But how this approach can
solve dlopen() race with _register()?

Rather, making delayed_uprobe_list an mm field seems simple and effective
idea to me. The only overhead will be list_empty(mm->delayed_list) check.

Please let me know if I misunderstood you.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-09  4:18 ` [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                     ` (2 preceding siblings ...)
  2018-08-13 10:03   ` Srikar Dronamraju
@ 2018-08-14  8:56   ` Ravi Bangoria
  3 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-08-14  8:56 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat, liu.song.a23
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria


> +static int delayed_uprobe_install(struct vm_area_struct *vma)
> +{
> +	struct list_head *pos, *q;
> +	struct delayed_uprobe *du;
> +	unsigned long vaddr;
> +	int ret = 0, err = 0;
> +
> +	mutex_lock(&delayed_uprobe_lock);
> +	list_for_each_safe(pos, q, &delayed_uprobe_list) {
> +		du = list_entry(pos, struct delayed_uprobe, list);
> +
> +		if (!valid_ref_ctr_vma(du->uprobe, vma))
> +			continue;

I think we should compare mm here. I.e.:

    if (du->mm != vma->vm_mm || !valid_ref_ctr_vma(du->uprobe, vma))
            continue;

Otherwise things can mess up.

> +
> +		vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
> +		ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
> +		/* Record an error and continue. */
> +		if (ret && !err)
> +			err = ret;
> +		delayed_uprobe_delete(du);
> +	}
> +	mutex_unlock(&delayed_uprobe_lock);
> +	return err;
> +}


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-14  4:37               ` Ravi Bangoria
@ 2018-08-14 16:35                 ` Song Liu
  0 siblings, 0 replies; 42+ messages in thread
From: Song Liu @ 2018-08-14 16:35 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Oleg Nesterov, Srikar Dronamraju, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On Mon, Aug 13, 2018 at 9:37 PM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> Hi Song,
>
> On 08/13/2018 10:42 PM, Song Liu wrote:
>> On Mon, Aug 13, 2018 at 6:17 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>>> On 08/13, Ravi Bangoria wrote:
>>>>
>>>>> But damn, process creation (exec) is trivial. We could add a new uprobe_exec()
>>>>> hook and avoid delayed_uprobe_install() in uprobe_mmap().
>>>>
>>>> I'm sorry. I didn't get this.
>>>
>>> Sorry for confusion...
>>>
>>> I meant, if only exec*( could race with _register(), we could add another uprobe
>>> hook which updates all (delayed) counters before return to user-mode.
>>>
>>>>> Afaics, the really problematic case is dlopen() which can race with _register()
>>>>> too, right?
>>>>
>>>> dlopen() should internally use mmap() right? So what is the problem here? Can
>>>> you please elaborate.
>>>
>>> What I tried to say is that we can't avoid uprobe_mmap()->delayed_uprobe_install()
>>> because dlopen() can race with _register() too, just like exec.
>>>
>>> Oleg.
>>>
>>
>> How about we do delayed_uprobe_install() per file? Say we keep a list
>> of delayed_uprobe
>> in load_elf_binary(). Then we can install delayed_uprobe after loading
>> all sections of the
>> file.
>
> I'm not sure if I totally understood the idea. But how this approach can
> solve dlopen() race with _register()?
>
> Rather, making delayed_uprobe_list an mm field seems simple and effective
> idea to me. The only overhead will be list_empty(mm->delayed_list) check.
>
> Please let me know if I misunderstood you.
>
> Thanks,
> Ravi

I misunderstood the problem here. I guess mm->delayed_list is the
easiest solution of the race condition.

Thanks,
Song

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2018-08-14 16:35 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-09  4:18 [PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
2018-08-09  4:18 ` [PATCH v8 1/6] Uprobes: Simplify uprobe_register() body Ravi Bangoria
2018-08-11  8:00   ` Song Liu
2018-08-13  8:56   ` Srikar Dronamraju
2018-08-14  0:07     ` Steven Rostedt
2018-08-09  4:18 ` [PATCH v8 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() Ravi Bangoria
2018-08-11  8:01   ` Song Liu
2018-08-13  8:51   ` Srikar Dronamraju
2018-08-09  4:18 ` [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
2018-08-09 14:38   ` Oleg Nesterov
2018-08-10 19:58     ` Steven Rostedt
2018-08-11  6:14       ` Song Liu
2018-08-14  0:01         ` Steven Rostedt
2018-08-14  0:05           ` Steven Rostedt
2018-08-11  7:57   ` Song Liu
2018-08-11 15:37     ` Steven Rostedt
2018-08-13  5:47     ` Ravi Bangoria
2018-08-13  7:38       ` Ravi Bangoria
2018-08-13 11:50       ` Oleg Nesterov
2018-08-13 13:01         ` Ravi Bangoria
2018-08-13 13:17           ` Oleg Nesterov
2018-08-13 14:26             ` Ravi Bangoria
2018-08-13 14:51             ` Oleg Nesterov
2018-08-13 17:12             ` Song Liu
2018-08-14  4:37               ` Ravi Bangoria
2018-08-14 16:35                 ` Song Liu
2018-08-14  0:03         ` Steven Rostedt
2018-08-13 16:50       ` Song Liu
2018-08-13 10:03   ` Srikar Dronamraju
2018-08-13 11:23     ` Oleg Nesterov
2018-08-14  8:56   ` Ravi Bangoria
2018-08-09  4:18 ` [PATCH v8 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
2018-08-11  8:04   ` Song Liu
2018-08-13  8:50   ` Srikar Dronamraju
2018-08-09  4:18 ` [PATCH v8 5/6] trace_uprobe/sdt: " Ravi Bangoria
2018-08-11  8:12   ` Song Liu
2018-08-13  8:19     ` Ravi Bangoria
2018-08-13  8:49       ` Srikar Dronamraju
2018-08-13 16:27         ` Song Liu
2018-08-13  8:48   ` Srikar Dronamraju
2018-08-09  4:18 ` [PATCH v8 6/6] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
2018-08-13 16:55 ` [PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore) Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).