linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
@ 2018-08-20  4:42 Ravi Bangoria
  2018-08-20  4:42 ` [PATCH v9 1/4] " Ravi Bangoria
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Ravi Bangoria @ 2018-08-20  4:42 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat, liu.song.a23
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

v8 -> v9:
 - Rebased to rostedt/for-next (Commit bb730b5833b5 to be precise)
 - Not including first two patches now. They are already pulled by
   Steven.
 - Change delayed_uprobe_remove() function as suggested by Oleg
 - Dump inode, offset, ref_ctr_offset, mm etc if we fail to update
   reference counter.
 - Rename delayed_uprobe_install() to delayed_ref_ctr_inc()
 - Use 'short d' (delta) in update_ref_ctr() in place of 'bool
   is_register'.

v8: https://lkml.org/lkml/2018/8/9/81

Future work:
 - Optimize uprobe_mmap()->delayed_ref_ctr_inc() by making
   delayed_uprobe_list per mm.

Description:
Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
have these markers embedded in them. These markers are added by developer
at important places in the code. Each marker source expands to a single
nop instruction in the compiled code but there may be additional
overhead for computing the marker arguments which expands to couple of
instructions. In case the overhead is more, execution of it can be
omitted by runtime if() condition when no one is tracing on the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

Default value of reference counter is 0. Tracer has to increment the 
reference counter before tracing on a marker and decrement it when
done with the tracing.

Currently, perf tool has limited supports for SDT markers. I.e. it
can not trace markers surrounded by reference counter. Also, it's
not easy to add reference counter logic in userspace tool like perf,
so basic idea for this patchset is to add reference counter logic in
the a uprobe infrastructure. Ex,[2]

  # cat tick.c
    ... 
    for (i = 0; i < 100; i++) {
	DTRACE_PROBE1(tick, loop1, i);
        if (TICK_LOOP2_ENABLED()) {
            DTRACE_PROBE1(tick, loop2, i); 
        }
        printf("hi: %d\n", i); 
        sleep(1);
    }   
    ... 

Here tick:loop1 is marker without reference counter where as tick:loop2
is surrounded by reference counter condition.

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2

  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
             3      sdt_tick:loop1
             0      sdt_tick:loop2
     2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. Same experiment with this
patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C  
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
[2] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506

Ravi Bangoria (4):
  Uprobes: Support SDT markers having reference count (semaphore)
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  perf probe: Support SDT markers having reference counter (semaphore)

 include/linux/uprobes.h       |   5 +
 kernel/events/uprobes.c       | 278 ++++++++++++++++++++++++++++++++++++++++--
 kernel/trace/trace.c          |   2 +-
 kernel/trace/trace_uprobe.c   |  75 +++++++++++-
 tools/perf/util/probe-event.c |  39 +++++-
 tools/perf/util/probe-event.h |   1 +
 tools/perf/util/probe-file.c  |  34 +++++-
 tools/perf/util/probe-file.h  |   1 +
 tools/perf/util/symbol-elf.c  |  46 +++++--
 tools/perf/util/symbol.h      |   7 ++
 10 files changed, 453 insertions(+), 35 deletions(-)

-- 
2.14.4


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-20  4:42 [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
@ 2018-08-20  4:42 ` Ravi Bangoria
  2018-08-20  5:53   ` Song Liu
  2018-08-22 12:39   ` Srikar Dronamraju
  2018-08-20  4:42 ` [PATCH v9 2/4] Uprobes/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 24+ messages in thread
From: Ravi Bangoria @ 2018-08-20  4:42 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat, liu.song.a23
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
have these markers embedded in them. These markers are added by developer
at important places in the code. Each marker source expands to a single
nop instruction in the compiled code but there may be additional
overhead for computing the marker arguments which expands to couple of
instructions. In case the overhead is more, execution of it can be
omitted by runtime if() condition when no one is tracing on the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

Default value of reference counter is 0. Tracer has to increment the
reference counter before tracing on a marker and decrement it when
done with the tracing.

Implement the reference counter logic in core uprobe. User will be
able to use it from trace_uprobe as well as from kernel module. New
trace_uprobe definition with reference counter will now be:

    <path>:<offset>[(ref_ctr_offset)]

where ref_ctr_offset is an optional field. For kernel module, new
variant of uprobe_register() has been introduced:

    uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)

No new variant for uprobe_unregister() because it's assumed to have
only one reference counter for one uprobe.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: 'reference counter' is called as 'semaphore' in original Dtrace
(or Systemtap, bcc and even in ELF) documentation and code. But the
term 'semaphore' is misleading in this context. This is just a counter
used to hold number of tracers tracing on a marker. This is not really
used for any synchronization. So we are calling it a 'reference counter'
in kernel / perf code.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
[Only trace_uprobe.c]
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/uprobes.h     |   5 +
 kernel/events/uprobes.c     | 259 ++++++++++++++++++++++++++++++++++++++++++--
 kernel/trace/trace.c        |   2 +-
 kernel/trace/trace_uprobe.c |  38 ++++++-
 4 files changed, 293 insertions(+), 11 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index bb9d2084af03..103a48a48872 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
+extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
 {
 	return -ENOSYS;
 }
+static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+{
+	return -ENOSYS;
+}
 static inline int
 uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
 {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 919c1ce32beb..35065febcb6c 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -73,6 +73,7 @@ struct uprobe {
 	struct uprobe_consumer	*consumers;
 	struct inode		*inode;		/* Also hold a ref to inode */
 	loff_t			offset;
+	loff_t			ref_ctr_offset;
 	unsigned long		flags;
 
 	/*
@@ -88,6 +89,15 @@ struct uprobe {
 	struct arch_uprobe	arch;
 };
 
+struct delayed_uprobe {
+	struct list_head list;
+	struct uprobe *uprobe;
+	struct mm_struct *mm;
+};
+
+static DEFINE_MUTEX(delayed_uprobe_lock);
+static LIST_HEAD(delayed_uprobe_list);
+
 /*
  * Execute out of line area: anonymous executable mapping installed
  * by the probed task to execute the copy of the original instruction
@@ -282,6 +292,166 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
 	return 1;
 }
 
+static struct delayed_uprobe *
+delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct delayed_uprobe *du;
+
+	list_for_each_entry(du, &delayed_uprobe_list, list)
+		if (du->uprobe == uprobe && du->mm == mm)
+			return du;
+	return NULL;
+}
+
+static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct delayed_uprobe *du;
+
+	if (delayed_uprobe_check(uprobe, mm))
+		return 0;
+
+	du  = kzalloc(sizeof(*du), GFP_KERNEL);
+	if (!du)
+		return -ENOMEM;
+
+	du->uprobe = uprobe;
+	du->mm = mm;
+	list_add(&du->list, &delayed_uprobe_list);
+	return 0;
+}
+
+static void delayed_uprobe_delete(struct delayed_uprobe *du)
+{
+	if (WARN_ON(!du))
+		return;
+	list_del(&du->list);
+	kfree(du);
+}
+
+static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct list_head *pos, *q;
+	struct delayed_uprobe *du;
+
+	if (!uprobe && !mm)
+		return;
+
+	list_for_each_safe(pos, q, &delayed_uprobe_list) {
+		du = list_entry(pos, struct delayed_uprobe, list);
+
+		if (uprobe && du->uprobe != uprobe)
+			continue;
+		if (mm && du->mm != mm)
+			continue;
+
+		delayed_uprobe_delete(du);
+	}
+}
+
+static bool valid_ref_ctr_vma(struct uprobe *uprobe,
+			      struct vm_area_struct *vma)
+{
+	unsigned long vaddr = offset_to_vaddr(vma, uprobe->ref_ctr_offset);
+
+	return uprobe->ref_ctr_offset &&
+		vma->vm_file &&
+		file_inode(vma->vm_file) == uprobe->inode &&
+		(vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE &&
+		vma->vm_start <= vaddr &&
+		vma->vm_end > vaddr;
+}
+
+static struct vm_area_struct *
+find_ref_ctr_vma(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct vm_area_struct *tmp;
+
+	for (tmp = mm->mmap; tmp; tmp = tmp->vm_next)
+		if (valid_ref_ctr_vma(uprobe, tmp))
+			return tmp;
+
+	return NULL;
+}
+
+static int
+__update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
+{
+	void *kaddr;
+	struct page *page;
+	struct vm_area_struct *vma;
+	int ret;
+	short *ptr;
+
+	if (!vaddr || !d)
+		return -EINVAL;
+
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
+			FOLL_WRITE, &page, &vma, NULL);
+	if (unlikely(ret <= 0)) {
+		/*
+		 * We are asking for 1 page. If get_user_pages_remote() fails,
+		 * it may return 0, in that case we have to return error.
+		 */
+		return ret == 0 ? -EBUSY : ret;
+	}
+
+	kaddr = kmap_atomic(page);
+	ptr = kaddr + (vaddr & ~PAGE_MASK);
+
+	if (unlikely(*ptr + d < 0)) {
+		pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
+			"curr val: %d, delta: %d\n", vaddr, *ptr, d);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	*ptr += d;
+	ret = 0;
+out:
+	kunmap_atomic(kaddr);
+	put_page(page);
+	return ret;
+}
+
+static void update_ref_ctr_warn(struct uprobe *uprobe,
+				struct mm_struct *mm, short d)
+{
+	pr_warn("ref_ctr %s failed for inode: 0x%lx offset: "
+		"0x%llx ref_ctr_offset: 0x%llx of mm: 0x%pK\n",
+		d > 0 ? "increment" : "decrement", uprobe->inode->i_ino,
+		(unsigned long long) uprobe->offset,
+		(unsigned long long) uprobe->ref_ctr_offset, mm);
+}
+
+static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
+			  short d)
+{
+	struct vm_area_struct *rc_vma;
+	unsigned long rc_vaddr;
+	int ret = 0;
+
+	rc_vma = find_ref_ctr_vma(uprobe, mm);
+
+	if (rc_vma) {
+		rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
+		ret = __update_ref_ctr(mm, rc_vaddr, d);
+		if (ret)
+			update_ref_ctr_warn(uprobe, mm, d);
+
+		if (d > 0)
+			return ret;
+	}
+
+	mutex_lock(&delayed_uprobe_lock);
+	if (d > 0)
+		ret = delayed_uprobe_add(uprobe, mm);
+	else
+		delayed_uprobe_remove(uprobe, mm);
+	mutex_unlock(&delayed_uprobe_lock);
+
+	return ret;
+}
+
 /*
  * NOTE:
  * Expect the breakpoint instruction to be the smallest size instruction for
@@ -302,9 +472,13 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
 int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 			unsigned long vaddr, uprobe_opcode_t opcode)
 {
+	struct uprobe *uprobe;
 	struct page *old_page, *new_page;
 	struct vm_area_struct *vma;
-	int ret;
+	int ret, is_register, ref_ctr_updated = 0;
+
+	is_register = is_swbp_insn(&opcode);
+	uprobe = container_of(auprobe, struct uprobe, arch);
 
 retry:
 	/* Read the page with vaddr into memory */
@@ -317,6 +491,15 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	if (ret <= 0)
 		goto put_old;
 
+	/* We are going to replace instruction, update ref_ctr. */
+	if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
+		ret = update_ref_ctr(uprobe, mm, is_register ? 1 : -1);
+		if (ret)
+			goto put_old;
+
+		ref_ctr_updated = 1;
+	}
+
 	ret = anon_vma_prepare(vma);
 	if (ret)
 		goto put_old;
@@ -337,6 +520,11 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 
 	if (unlikely(ret == -EAGAIN))
 		goto retry;
+
+	/* Revert back reference counter if instruction update failed. */
+	if (ret && is_register && ref_ctr_updated)
+		update_ref_ctr(uprobe, mm, -1);
+
 	return ret;
 }
 
@@ -378,8 +566,15 @@ static struct uprobe *get_uprobe(struct uprobe *uprobe)
 
 static void put_uprobe(struct uprobe *uprobe)
 {
-	if (atomic_dec_and_test(&uprobe->ref))
+	if (atomic_dec_and_test(&uprobe->ref)) {
+		/*
+		 * If application munmap(exec_vma) before uprobe_unregister()
+		 * gets called, we don't get a chance to remove uprobe from
+		 * delayed_uprobe_list from remove_breakpoint(). Do it here.
+		 */
+		delayed_uprobe_remove(uprobe, NULL);
 		kfree(uprobe);
+	}
 }
 
 static int match_uprobe(struct uprobe *l, struct uprobe *r)
@@ -484,7 +679,8 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
 	return u;
 }
 
-static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
+static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
+				   loff_t ref_ctr_offset)
 {
 	struct uprobe *uprobe, *cur_uprobe;
 
@@ -494,6 +690,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
 
 	uprobe->inode = inode;
 	uprobe->offset = offset;
+	uprobe->ref_ctr_offset = ref_ctr_offset;
 	init_rwsem(&uprobe->register_rwsem);
 	init_rwsem(&uprobe->consumer_rwsem);
 
@@ -895,7 +1092,7 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
  * else return 0 (success)
  */
 static int __uprobe_register(struct inode *inode, loff_t offset,
-			     struct uprobe_consumer *uc)
+			     loff_t ref_ctr_offset, struct uprobe_consumer *uc)
 {
 	struct uprobe *uprobe;
 	int ret;
@@ -912,7 +1109,7 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 		return -EINVAL;
 
  retry:
-	uprobe = alloc_uprobe(inode, offset);
+	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (!uprobe)
 		return -ENOMEM;
 	/*
@@ -938,10 +1135,17 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 int uprobe_register(struct inode *inode, loff_t offset,
 		    struct uprobe_consumer *uc)
 {
-	return __uprobe_register(inode, offset, uc);
+	return __uprobe_register(inode, offset, 0, uc);
 }
 EXPORT_SYMBOL_GPL(uprobe_register);
 
+int uprobe_register_refctr(struct inode *inode, loff_t offset,
+			   loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+{
+	return __uprobe_register(inode, offset, ref_ctr_offset, uc);
+}
+EXPORT_SYMBOL_GPL(uprobe_register_refctr);
+
 /*
  * uprobe_apply - unregister a already registered probe.
  * @inode: the file in which the probe has to be removed.
@@ -1060,6 +1264,35 @@ static void build_probe_list(struct inode *inode,
 	spin_unlock(&uprobes_treelock);
 }
 
+/* @vma contains reference counter, not the probed instruction. */
+static int delayed_ref_ctr_inc(struct vm_area_struct *vma)
+{
+	struct list_head *pos, *q;
+	struct delayed_uprobe *du;
+	unsigned long vaddr;
+	int ret = 0, err = 0;
+
+	mutex_lock(&delayed_uprobe_lock);
+	list_for_each_safe(pos, q, &delayed_uprobe_list) {
+		du = list_entry(pos, struct delayed_uprobe, list);
+
+		if (du->mm != vma->vm_mm ||
+		    !valid_ref_ctr_vma(du->uprobe, vma))
+			continue;
+
+		vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
+		ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
+		if (ret) {
+			update_ref_ctr_warn(du->uprobe, vma->vm_mm, 1);
+			if (!err)
+				err = ret;
+		}
+		delayed_uprobe_delete(du);
+	}
+	mutex_unlock(&delayed_uprobe_lock);
+	return err;
+}
+
 /*
  * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
  *
@@ -1072,7 +1305,15 @@ int uprobe_mmap(struct vm_area_struct *vma)
 	struct uprobe *uprobe, *u;
 	struct inode *inode;
 
-	if (no_uprobe_events() || !valid_vma(vma, true))
+	if (no_uprobe_events())
+		return 0;
+
+	if (vma->vm_file &&
+	    (vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE &&
+	    test_bit(MMF_HAS_UPROBES, &vma->vm_mm->flags))
+		delayed_ref_ctr_inc(vma);
+
+	if (!valid_vma(vma, true))
 		return 0;
 
 	inode = file_inode(vma->vm_file);
@@ -1246,6 +1487,10 @@ void uprobe_clear_state(struct mm_struct *mm)
 {
 	struct xol_area *area = mm->uprobes_state.xol_area;
 
+	mutex_lock(&delayed_uprobe_lock);
+	delayed_uprobe_remove(NULL, mm);
+	mutex_unlock(&delayed_uprobe_lock);
+
 	if (!area)
 		return;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2dad27809794..23689831f656 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4620,7 +4620,7 @@ static const char readme_msg[] =
   "place (kretprobe): [<module>:]<symbol>[+<offset>]|<memaddr>\n"
 #endif
 #ifdef CONFIG_UPROBE_EVENTS
-	"\t    place: <path>:<offset>\n"
+  "   place (uprobe): <path>:<offset>[(ref_ctr_offset)]\n"
 #endif
 	"\t     args: <name>=fetcharg[:type]\n"
 	"\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index ac02fafc9f1b..a7ef6c4ca16e 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -59,6 +59,7 @@ struct trace_uprobe {
 	struct inode			*inode;
 	char				*filename;
 	unsigned long			offset;
+	unsigned long			ref_ctr_offset;
 	unsigned long			nhit;
 	struct trace_probe		tp;
 };
@@ -364,10 +365,10 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
 static int create_trace_uprobe(int argc, char **argv)
 {
 	struct trace_uprobe *tu;
-	char *arg, *event, *group, *filename;
+	char *arg, *event, *group, *filename, *rctr, *rctr_end;
 	char buf[MAX_EVENT_NAME_LEN];
 	struct path path;
-	unsigned long offset;
+	unsigned long offset, ref_ctr_offset;
 	bool is_delete, is_return;
 	int i, ret;
 
@@ -376,6 +377,7 @@ static int create_trace_uprobe(int argc, char **argv)
 	is_return = false;
 	event = NULL;
 	group = NULL;
+	ref_ctr_offset = 0;
 
 	/* argc must be >= 1 */
 	if (argv[0][0] == '-')
@@ -450,6 +452,26 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 
+	/* Parse reference counter offset if specified. */
+	rctr = strchr(arg, '(');
+	if (rctr) {
+		rctr_end = strchr(rctr, ')');
+		if (rctr > rctr_end || *(rctr_end + 1) != 0) {
+			ret = -EINVAL;
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+
+		*rctr++ = '\0';
+		*rctr_end = '\0';
+		ret = kstrtoul(rctr, 0, &ref_ctr_offset);
+		if (ret) {
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+	}
+
+	/* Parse uprobe offset. */
 	ret = kstrtoul(arg, 0, &offset);
 	if (ret)
 		goto fail_address_parse;
@@ -484,6 +506,7 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 	tu->offset = offset;
+	tu->ref_ctr_offset = ref_ctr_offset;
 	tu->path = path;
 	tu->filename = kstrdup(filename, GFP_KERNEL);
 
@@ -602,6 +625,9 @@ static int probes_seq_show(struct seq_file *m, void *v)
 			trace_event_name(&tu->tp.call), tu->filename,
 			(int)(sizeof(void *) * 2), tu->offset);
 
+	if (tu->ref_ctr_offset)
+		seq_printf(m, "(0x%lx)", tu->ref_ctr_offset);
+
 	for (i = 0; i < tu->tp.nr_args; i++)
 		seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
 
@@ -917,7 +943,13 @@ probe_event_enable(struct trace_uprobe *tu, struct trace_event_file *file,
 
 	tu->consumer.filter = filter;
 	tu->inode = d_real_inode(tu->path.dentry);
-	ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
+	if (tu->ref_ctr_offset) {
+		ret = uprobe_register_refctr(tu->inode, tu->offset,
+				tu->ref_ctr_offset, &tu->consumer);
+	} else {
+		ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
+	}
+
 	if (ret)
 		goto err_buffer;
 
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v9 2/4] Uprobes/sdt: Prevent multiple reference counter for same uprobe
  2018-08-20  4:42 [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
  2018-08-20  4:42 ` [PATCH v9 1/4] " Ravi Bangoria
@ 2018-08-20  4:42 ` Ravi Bangoria
  2018-08-20  5:54   ` Song Liu
  2018-08-20  4:42 ` [PATCH v9 3/4] trace_uprobe/sdt: " Ravi Bangoria
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Ravi Bangoria @ 2018-08-20  4:42 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat, liu.song.a23
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

We assume to have only one reference counter for one uprobe.
Don't allow user to register multiple uprobes having same
inode+offset but different reference counter.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/uprobes.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 35065febcb6c..ecee371a59c7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -679,6 +679,16 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
 	return u;
 }
 
+static void
+ref_ctr_mismatch_warn(struct uprobe *cur_uprobe, struct uprobe *uprobe)
+{
+	pr_warn("ref_ctr_offset mismatch. inode: 0x%lx offset: 0x%llx "
+		"ref_ctr_offset(old): 0x%llx ref_ctr_offset(new): 0x%llx\n",
+		uprobe->inode->i_ino, (unsigned long long) uprobe->offset,
+		(unsigned long long) cur_uprobe->ref_ctr_offset,
+		(unsigned long long) uprobe->ref_ctr_offset);
+}
+
 static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 				   loff_t ref_ctr_offset)
 {
@@ -698,6 +708,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 	cur_uprobe = insert_uprobe(uprobe);
 	/* a uprobe exists for this inode:offset combination */
 	if (cur_uprobe) {
+		if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
+			ref_ctr_mismatch_warn(cur_uprobe, uprobe);
+			put_uprobe(cur_uprobe);
+			kfree(uprobe);
+			return ERR_PTR(-EINVAL);
+		}
 		kfree(uprobe);
 		uprobe = cur_uprobe;
 	}
@@ -1112,6 +1128,9 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (!uprobe)
 		return -ENOMEM;
+	if (IS_ERR(uprobe))
+		return PTR_ERR(uprobe);
+
 	/*
 	 * We can race with uprobe_unregister()->delete_uprobe().
 	 * Check uprobe_is_active() and retry if it is false.
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v9 3/4] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  2018-08-20  4:42 [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
  2018-08-20  4:42 ` [PATCH v9 1/4] " Ravi Bangoria
  2018-08-20  4:42 ` [PATCH v9 2/4] Uprobes/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
@ 2018-08-20  4:42 ` Ravi Bangoria
  2018-08-21 20:46   ` Song Liu
  2018-08-20  4:42 ` [PATCH v9 4/4] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
  2018-08-20 17:38 ` [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore) Song Liu
  4 siblings, 1 reply; 24+ messages in thread
From: Ravi Bangoria @ 2018-08-20  4:42 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat, liu.song.a23
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

We assume to have only one reference counter for one uprobe.
Don't allow user to add multiple trace_uprobe entries having
same inode+offset but different reference counter.

Ex,
  # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events
  # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xfffff)" >> uprobe_events
  bash: echo: write error: Invalid argument

  # dmesg
  trace_kprobe: Reference counter offset mismatch.

There is one exception though:
When user is trying to replace the old entry with the new
one, we allow this if the new entry does not conflict with
any other existing entries.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/trace/trace_uprobe.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index a7ef6c4ca16e..21d9fffaa096 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu)
 	return 0;
 }
 
+/*
+ * Uprobe with multiple reference counter is not allowed. i.e.
+ * If inode and offset matches, reference counter offset *must*
+ * match as well. Though, there is one exception: If user is
+ * replacing old trace_uprobe with new one(same group/event),
+ * then we allow same uprobe with new reference counter as far
+ * as the new one does not conflict with any other existing
+ * ones.
+ */
+static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new)
+{
+	struct trace_uprobe *tmp, *old = NULL;
+	struct inode *new_inode = d_real_inode(new->path.dentry);
+
+	old = find_probe_event(trace_event_name(&new->tp.call),
+				new->tp.call.class->system);
+
+	list_for_each_entry(tmp, &uprobe_list, list) {
+		if ((old ? old != tmp : true) &&
+		    new_inode == d_real_inode(tmp->path.dentry) &&
+		    new->offset == tmp->offset &&
+		    new->ref_ctr_offset != tmp->ref_ctr_offset) {
+			pr_warn("Reference counter offset mismatch.");
+			return ERR_PTR(-EINVAL);
+		}
+	}
+	return old;
+}
+
 /* Register a trace_uprobe and probe_event */
 static int register_trace_uprobe(struct trace_uprobe *tu)
 {
@@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
 	mutex_lock(&uprobe_lock);
 
 	/* register as an event */
-	old_tu = find_probe_event(trace_event_name(&tu->tp.call),
-			tu->tp.call.class->system);
+	old_tu = find_old_trace_uprobe(tu);
+	if (IS_ERR(old_tu)) {
+		ret = PTR_ERR(old_tu);
+		goto end;
+	}
+
 	if (old_tu) {
 		/* delete old event */
 		ret = unregister_trace_uprobe(old_tu);
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v9 4/4] perf probe: Support SDT markers having reference counter (semaphore)
  2018-08-20  4:42 [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (2 preceding siblings ...)
  2018-08-20  4:42 ` [PATCH v9 3/4] trace_uprobe/sdt: " Ravi Bangoria
@ 2018-08-20  4:42 ` Ravi Bangoria
  2018-08-21 21:33   ` Song Liu
  2018-08-30 18:45   ` Steven Rostedt
  2018-08-20 17:38 ` [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore) Song Liu
  4 siblings, 2 replies; 24+ messages in thread
From: Ravi Bangoria @ 2018-08-20  4:42 UTC (permalink / raw)
  To: srikar, oleg, rostedt, mhiramat, liu.song.a23
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, ananth, alexis.berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

With this, perf buildid-cache will save SDT markers with reference
counter in probe cache. Perf probe will be able to probe markers
having reference counter. Ex,

  # readelf -n /tmp/tick | grep -A1 loop2
    Name: loop2
    ... Semaphore: 0x0000000010020036

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 tools/perf/util/probe-event.c | 39 ++++++++++++++++++++++++++++++++----
 tools/perf/util/probe-event.h |  1 +
 tools/perf/util/probe-file.c  | 34 ++++++++++++++++++++++++++------
 tools/perf/util/probe-file.h  |  1 +
 tools/perf/util/symbol-elf.c  | 46 ++++++++++++++++++++++++++++++++-----------
 tools/perf/util/symbol.h      |  7 +++++++
 6 files changed, 106 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index f119eb628dbb..e86f8be89157 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev)
 			tp->offset = strtoul(fmt2_str, NULL, 10);
 	}
 
+	if (tev->uprobes) {
+		fmt2_str = strchr(p, '(');
+		if (fmt2_str)
+			tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
+	}
+
 	tev->nargs = argc - 2;
 	tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
 	if (tev->args == NULL) {
@@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg,
 	return err;
 }
 
+static int
+synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf)
+{
+	struct probe_trace_point *tp = &tev->point;
+	int err;
+
+	err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address);
+
+	if (err >= 0 && tp->ref_ctr_offset) {
+		if (!uprobe_ref_ctr_is_supported())
+			return -1;
+		err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset);
+	}
+	return err >= 0 ? 0 : -1;
+}
+
 char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 {
 	struct probe_trace_point *tp = &tev->point;
@@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 	}
 
 	/* Use the tp->address for uprobes */
-	if (tev->uprobes)
-		err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
-	else if (!strncmp(tp->symbol, "0x", 2))
+	if (tev->uprobes) {
+		err = synthesize_uprobe_trace_def(tev, &buf);
+	} else if (!strncmp(tp->symbol, "0x", 2)) {
 		/* Absolute address. See try_to_find_absolute_address() */
 		err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
 				  tp->module ? ":" : "", tp->address);
-	else
+	} else {
 		err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
 				tp->module ? ":" : "", tp->symbol, tp->offset);
+	}
+
 	if (err)
 		goto error;
 
@@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev)
 {
 	int i;
 	char *buf = synthesize_probe_trace_command(tev);
+	struct probe_trace_point *tp = &tev->point;
+
+	if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) {
+		pr_warning("A semaphore is associated with %s:%s and "
+			   "seems your kernel doesn't support it.\n",
+			   tev->group, tev->event);
+	}
 
 	/* Old uprobe event doesn't support memory dereference */
 	if (!tev->uprobes || tev->nargs == 0 || !buf)
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 45b14f020558..15a98c3a2a2f 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -27,6 +27,7 @@ struct probe_trace_point {
 	char		*symbol;	/* Base symbol */
 	char		*module;	/* Module name */
 	unsigned long	offset;		/* Offset from symbol */
+	unsigned long	ref_ctr_offset;	/* SDT reference counter offset */
 	unsigned long	address;	/* Actual address of the trace point */
 	bool		retprobe;	/* Return probe flag */
 };
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index b76088fadf3d..aac7817d9e14 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -696,8 +696,16 @@ int probe_cache__add_entry(struct probe_cache *pcache,
 #ifdef HAVE_GELF_GETNOTE_SUPPORT
 static unsigned long long sdt_note__get_addr(struct sdt_note *note)
 {
-	return note->bit32 ? (unsigned long long)note->addr.a32[0]
-		 : (unsigned long long)note->addr.a64[0];
+	return note->bit32 ?
+		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_LOC] :
+		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_LOC];
+}
+
+static unsigned long long sdt_note__get_ref_ctr_offset(struct sdt_note *note)
+{
+	return note->bit32 ?
+		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_REFCTR] :
+		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_REFCTR];
 }
 
 static const char * const type_to_suffix[] = {
@@ -775,14 +783,21 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note,
 {
 	struct strbuf buf;
 	char *ret = NULL, **args;
-	int i, args_count;
+	int i, args_count, err;
+	unsigned long long ref_ctr_offset;
 
 	if (strbuf_init(&buf, 32) < 0)
 		return NULL;
 
-	if (strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
-				sdtgrp, note->name, pathname,
-				sdt_note__get_addr(note)) < 0)
+	err = strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
+			sdtgrp, note->name, pathname,
+			sdt_note__get_addr(note));
+
+	ref_ctr_offset = sdt_note__get_ref_ctr_offset(note);
+	if (ref_ctr_offset && err >= 0)
+		err = strbuf_addf(&buf, "(0x%llx)", ref_ctr_offset);
+
+	if (err < 0)
 		goto error;
 
 	if (!note->args)
@@ -998,6 +1013,7 @@ int probe_cache__show_all_caches(struct strfilter *filter)
 enum ftrace_readme {
 	FTRACE_README_PROBE_TYPE_X = 0,
 	FTRACE_README_KRETPROBE_OFFSET,
+	FTRACE_README_UPROBE_REF_CTR,
 	FTRACE_README_END,
 };
 
@@ -1009,6 +1025,7 @@ static struct {
 	[idx] = {.pattern = pat, .avail = false}
 	DEFINE_TYPE(FTRACE_README_PROBE_TYPE_X, "*type: * x8/16/32/64,*"),
 	DEFINE_TYPE(FTRACE_README_KRETPROBE_OFFSET, "*place (kretprobe): *"),
+	DEFINE_TYPE(FTRACE_README_UPROBE_REF_CTR, "*ref_ctr_offset*"),
 };
 
 static bool scan_ftrace_readme(enum ftrace_readme type)
@@ -1064,3 +1081,8 @@ bool kretprobe_offset_is_supported(void)
 {
 	return scan_ftrace_readme(FTRACE_README_KRETPROBE_OFFSET);
 }
+
+bool uprobe_ref_ctr_is_supported(void)
+{
+	return scan_ftrace_readme(FTRACE_README_UPROBE_REF_CTR);
+}
diff --git a/tools/perf/util/probe-file.h b/tools/perf/util/probe-file.h
index 63f29b1d22c1..2a249182f2a6 100644
--- a/tools/perf/util/probe-file.h
+++ b/tools/perf/util/probe-file.h
@@ -69,6 +69,7 @@ struct probe_cache_entry *probe_cache__find_by_name(struct probe_cache *pcache,
 int probe_cache__show_all_caches(struct strfilter *filter);
 bool probe_type_is_available(enum probe_type type);
 bool kretprobe_offset_is_supported(void);
+bool uprobe_ref_ctr_is_supported(void);
 #else	/* ! HAVE_LIBELF_SUPPORT */
 static inline struct probe_cache *probe_cache__new(const char *tgt __maybe_unused, struct nsinfo *nsi __maybe_unused)
 {
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 29770ea61768..0281d5e2cd67 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1947,6 +1947,34 @@ void kcore_extract__delete(struct kcore_extract *kce)
 }
 
 #ifdef HAVE_GELF_GETNOTE_SUPPORT
+
+static void sdt_adjust_loc(struct sdt_note *tmp, GElf_Addr base_off)
+{
+	if (!base_off)
+		return;
+
+	if (tmp->bit32)
+		tmp->addr.a32[SDT_NOTE_IDX_LOC] =
+			tmp->addr.a32[SDT_NOTE_IDX_LOC] + base_off -
+			tmp->addr.a32[SDT_NOTE_IDX_BASE];
+	else
+		tmp->addr.a64[SDT_NOTE_IDX_LOC] =
+			tmp->addr.a64[SDT_NOTE_IDX_LOC] + base_off -
+			tmp->addr.a64[SDT_NOTE_IDX_BASE];
+}
+
+static void sdt_adjust_refctr(struct sdt_note *tmp, GElf_Addr base_addr,
+			      GElf_Addr base_off)
+{
+	if (!base_off)
+		return;
+
+	if (tmp->bit32 && tmp->addr.a32[SDT_NOTE_IDX_REFCTR])
+		tmp->addr.a32[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
+	else if (tmp->addr.a64[SDT_NOTE_IDX_REFCTR])
+		tmp->addr.a64[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
+}
+
 /**
  * populate_sdt_note : Parse raw data and identify SDT note
  * @elf: elf of the opened file
@@ -1964,7 +1992,6 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
 	const char *provider, *name, *args;
 	struct sdt_note *tmp = NULL;
 	GElf_Ehdr ehdr;
-	GElf_Addr base_off = 0;
 	GElf_Shdr shdr;
 	int ret = -EINVAL;
 
@@ -2060,17 +2087,12 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
 	 * base address in the description of the SDT note. If its different,
 	 * then accordingly, adjust the note location.
 	 */
-	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL)) {
-		base_off = shdr.sh_offset;
-		if (base_off) {
-			if (tmp->bit32)
-				tmp->addr.a32[0] = tmp->addr.a32[0] + base_off -
-					tmp->addr.a32[1];
-			else
-				tmp->addr.a64[0] = tmp->addr.a64[0] + base_off -
-					tmp->addr.a64[1];
-		}
-	}
+	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL))
+		sdt_adjust_loc(tmp, shdr.sh_offset);
+
+	/* Adjust reference counter offset */
+	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_PROBES_SCN, NULL))
+		sdt_adjust_refctr(tmp, shdr.sh_addr, shdr.sh_offset);
 
 	list_add_tail(&tmp->note_list, sdt_notes);
 	return 0;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index f25fae4b5743..20f49779116b 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -379,12 +379,19 @@ int get_sdt_note_list(struct list_head *head, const char *target);
 int cleanup_sdt_note_list(struct list_head *sdt_notes);
 int sdt_notes__get_count(struct list_head *start);
 
+#define SDT_PROBES_SCN ".probes"
 #define SDT_BASE_SCN ".stapsdt.base"
 #define SDT_NOTE_SCN  ".note.stapsdt"
 #define SDT_NOTE_TYPE 3
 #define SDT_NOTE_NAME "stapsdt"
 #define NR_ADDR 3
 
+enum {
+	SDT_NOTE_IDX_LOC = 0,
+	SDT_NOTE_IDX_BASE,
+	SDT_NOTE_IDX_REFCTR,
+};
+
 struct mem_info *mem_info__new(void);
 struct mem_info *mem_info__get(struct mem_info *mi);
 void   mem_info__put(struct mem_info *mi);
-- 
2.14.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-20  4:42 ` [PATCH v9 1/4] " Ravi Bangoria
@ 2018-08-20  5:53   ` Song Liu
  2018-08-21 21:35     ` Song Liu
  2018-08-22 12:39   ` Srikar Dronamraju
  1 sibling, 1 reply; 24+ messages in thread
From: Song Liu @ 2018-08-20  5:53 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Srikar Dronamraju, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On Sun, Aug 19, 2018 at 9:42 PM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> Userspace Statically Defined Tracepoints[1] are dtrace style markers
> inside userspace applications. Applications like PostgreSQL, MySQL,
> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> have these markers embedded in them. These markers are added by developer
> at important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. In case the overhead is more, execution of it can be
> omitted by runtime if() condition when no one is tracing on the marker:
>
>     if (reference_counter > 0) {
>         Execute marker instructions;
>     }
>
> Default value of reference counter is 0. Tracer has to increment the
> reference counter before tracing on a marker and decrement it when
> done with the tracing.
>
> Implement the reference counter logic in core uprobe. User will be
> able to use it from trace_uprobe as well as from kernel module. New
> trace_uprobe definition with reference counter will now be:
>
>     <path>:<offset>[(ref_ctr_offset)]
>
> where ref_ctr_offset is an optional field. For kernel module, new
> variant of uprobe_register() has been introduced:
>
>     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
>
> No new variant for uprobe_unregister() because it's assumed to have
> only one reference counter for one uprobe.
>
> [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
>
> Note: 'reference counter' is called as 'semaphore' in original Dtrace
> (or Systemtap, bcc and even in ELF) documentation and code. But the
> term 'semaphore' is misleading in this context. This is just a counter
> used to hold number of tracers tracing on a marker. This is not really
> used for any synchronization. So we are calling it a 'reference counter'
> in kernel / perf code.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
> [Only trace_uprobe.c]
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>

Reviewed-by: Song Liu <songliubraving@fb.com>

> ---
>  include/linux/uprobes.h     |   5 +
>  kernel/events/uprobes.c     | 259 ++++++++++++++++++++++++++++++++++++++++++--
>  kernel/trace/trace.c        |   2 +-
>  kernel/trace/trace_uprobe.c |  38 ++++++-
>  4 files changed, 293 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index bb9d2084af03..103a48a48872 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
>  extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
>  extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
>  extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
> +extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
>  extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
>  extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
>  extern int uprobe_mmap(struct vm_area_struct *vma);
> @@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
>  {
>         return -ENOSYS;
>  }
> +static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
> +{
> +       return -ENOSYS;
> +}
>  static inline int
>  uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
>  {
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 919c1ce32beb..35065febcb6c 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -73,6 +73,7 @@ struct uprobe {
>         struct uprobe_consumer  *consumers;
>         struct inode            *inode;         /* Also hold a ref to inode */
>         loff_t                  offset;
> +       loff_t                  ref_ctr_offset;
>         unsigned long           flags;
>
>         /*
> @@ -88,6 +89,15 @@ struct uprobe {
>         struct arch_uprobe      arch;
>  };
>
> +struct delayed_uprobe {
> +       struct list_head list;
> +       struct uprobe *uprobe;
> +       struct mm_struct *mm;
> +};
> +
> +static DEFINE_MUTEX(delayed_uprobe_lock);
> +static LIST_HEAD(delayed_uprobe_list);
> +
>  /*
>   * Execute out of line area: anonymous executable mapping installed
>   * by the probed task to execute the copy of the original instruction
> @@ -282,6 +292,166 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
>         return 1;
>  }
>
> +static struct delayed_uprobe *
> +delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm)
> +{
> +       struct delayed_uprobe *du;
> +
> +       list_for_each_entry(du, &delayed_uprobe_list, list)
> +               if (du->uprobe == uprobe && du->mm == mm)
> +                       return du;
> +       return NULL;
> +}
> +
> +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
> +{
> +       struct delayed_uprobe *du;
> +
> +       if (delayed_uprobe_check(uprobe, mm))
> +               return 0;
> +
> +       du  = kzalloc(sizeof(*du), GFP_KERNEL);
> +       if (!du)
> +               return -ENOMEM;
> +
> +       du->uprobe = uprobe;
> +       du->mm = mm;
> +       list_add(&du->list, &delayed_uprobe_list);
> +       return 0;
> +}
> +
> +static void delayed_uprobe_delete(struct delayed_uprobe *du)
> +{
> +       if (WARN_ON(!du))
> +               return;
> +       list_del(&du->list);
> +       kfree(du);
> +}
> +
> +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct *mm)
> +{
> +       struct list_head *pos, *q;
> +       struct delayed_uprobe *du;
> +
> +       if (!uprobe && !mm)
> +               return;
> +
> +       list_for_each_safe(pos, q, &delayed_uprobe_list) {
> +               du = list_entry(pos, struct delayed_uprobe, list);
> +
> +               if (uprobe && du->uprobe != uprobe)
> +                       continue;
> +               if (mm && du->mm != mm)
> +                       continue;
> +
> +               delayed_uprobe_delete(du);
> +       }
> +}
> +
> +static bool valid_ref_ctr_vma(struct uprobe *uprobe,
> +                             struct vm_area_struct *vma)
> +{
> +       unsigned long vaddr = offset_to_vaddr(vma, uprobe->ref_ctr_offset);
> +
> +       return uprobe->ref_ctr_offset &&
> +               vma->vm_file &&
> +               file_inode(vma->vm_file) == uprobe->inode &&
> +               (vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE &&
> +               vma->vm_start <= vaddr &&
> +               vma->vm_end > vaddr;
> +}
> +
> +static struct vm_area_struct *
> +find_ref_ctr_vma(struct uprobe *uprobe, struct mm_struct *mm)
> +{
> +       struct vm_area_struct *tmp;
> +
> +       for (tmp = mm->mmap; tmp; tmp = tmp->vm_next)
> +               if (valid_ref_ctr_vma(uprobe, tmp))
> +                       return tmp;
> +
> +       return NULL;
> +}
> +
> +static int
> +__update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
> +{
> +       void *kaddr;
> +       struct page *page;
> +       struct vm_area_struct *vma;
> +       int ret;
> +       short *ptr;
> +
> +       if (!vaddr || !d)
> +               return -EINVAL;
> +
> +       ret = get_user_pages_remote(NULL, mm, vaddr, 1,
> +                       FOLL_WRITE, &page, &vma, NULL);
> +       if (unlikely(ret <= 0)) {
> +               /*
> +                * We are asking for 1 page. If get_user_pages_remote() fails,
> +                * it may return 0, in that case we have to return error.
> +                */
> +               return ret == 0 ? -EBUSY : ret;
> +       }
> +
> +       kaddr = kmap_atomic(page);
> +       ptr = kaddr + (vaddr & ~PAGE_MASK);
> +
> +       if (unlikely(*ptr + d < 0)) {
> +               pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
> +                       "curr val: %d, delta: %d\n", vaddr, *ptr, d);
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       *ptr += d;
> +       ret = 0;
> +out:
> +       kunmap_atomic(kaddr);
> +       put_page(page);
> +       return ret;
> +}
> +
> +static void update_ref_ctr_warn(struct uprobe *uprobe,
> +                               struct mm_struct *mm, short d)
> +{
> +       pr_warn("ref_ctr %s failed for inode: 0x%lx offset: "
> +               "0x%llx ref_ctr_offset: 0x%llx of mm: 0x%pK\n",
> +               d > 0 ? "increment" : "decrement", uprobe->inode->i_ino,
> +               (unsigned long long) uprobe->offset,
> +               (unsigned long long) uprobe->ref_ctr_offset, mm);
> +}
> +
> +static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
> +                         short d)
> +{
> +       struct vm_area_struct *rc_vma;
> +       unsigned long rc_vaddr;
> +       int ret = 0;
> +
> +       rc_vma = find_ref_ctr_vma(uprobe, mm);
> +
> +       if (rc_vma) {
> +               rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
> +               ret = __update_ref_ctr(mm, rc_vaddr, d);
> +               if (ret)
> +                       update_ref_ctr_warn(uprobe, mm, d);
> +
> +               if (d > 0)
> +                       return ret;
> +       }
> +
> +       mutex_lock(&delayed_uprobe_lock);
> +       if (d > 0)
> +               ret = delayed_uprobe_add(uprobe, mm);
> +       else
> +               delayed_uprobe_remove(uprobe, mm);
> +       mutex_unlock(&delayed_uprobe_lock);
> +
> +       return ret;
> +}
> +
>  /*
>   * NOTE:
>   * Expect the breakpoint instruction to be the smallest size instruction for
> @@ -302,9 +472,13 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
>  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>                         unsigned long vaddr, uprobe_opcode_t opcode)
>  {
> +       struct uprobe *uprobe;
>         struct page *old_page, *new_page;
>         struct vm_area_struct *vma;
> -       int ret;
> +       int ret, is_register, ref_ctr_updated = 0;
> +
> +       is_register = is_swbp_insn(&opcode);
> +       uprobe = container_of(auprobe, struct uprobe, arch);
>
>  retry:
>         /* Read the page with vaddr into memory */
> @@ -317,6 +491,15 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>         if (ret <= 0)
>                 goto put_old;
>
> +       /* We are going to replace instruction, update ref_ctr. */
> +       if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
> +               ret = update_ref_ctr(uprobe, mm, is_register ? 1 : -1);
> +               if (ret)
> +                       goto put_old;
> +
> +               ref_ctr_updated = 1;
> +       }
> +
>         ret = anon_vma_prepare(vma);
>         if (ret)
>                 goto put_old;
> @@ -337,6 +520,11 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>
>         if (unlikely(ret == -EAGAIN))
>                 goto retry;
> +
> +       /* Revert back reference counter if instruction update failed. */
> +       if (ret && is_register && ref_ctr_updated)
> +               update_ref_ctr(uprobe, mm, -1);
> +
>         return ret;
>  }
>
> @@ -378,8 +566,15 @@ static struct uprobe *get_uprobe(struct uprobe *uprobe)
>
>  static void put_uprobe(struct uprobe *uprobe)
>  {
> -       if (atomic_dec_and_test(&uprobe->ref))
> +       if (atomic_dec_and_test(&uprobe->ref)) {
> +               /*
> +                * If application munmap(exec_vma) before uprobe_unregister()
> +                * gets called, we don't get a chance to remove uprobe from
> +                * delayed_uprobe_list from remove_breakpoint(). Do it here.
> +                */
> +               delayed_uprobe_remove(uprobe, NULL);
>                 kfree(uprobe);
> +       }
>  }
>
>  static int match_uprobe(struct uprobe *l, struct uprobe *r)
> @@ -484,7 +679,8 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
>         return u;
>  }
>
> -static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
> +static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
> +                                  loff_t ref_ctr_offset)
>  {
>         struct uprobe *uprobe, *cur_uprobe;
>
> @@ -494,6 +690,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
>
>         uprobe->inode = inode;
>         uprobe->offset = offset;
> +       uprobe->ref_ctr_offset = ref_ctr_offset;
>         init_rwsem(&uprobe->register_rwsem);
>         init_rwsem(&uprobe->consumer_rwsem);
>
> @@ -895,7 +1092,7 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
>   * else return 0 (success)
>   */
>  static int __uprobe_register(struct inode *inode, loff_t offset,
> -                            struct uprobe_consumer *uc)
> +                            loff_t ref_ctr_offset, struct uprobe_consumer *uc)
>  {
>         struct uprobe *uprobe;
>         int ret;
> @@ -912,7 +1109,7 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
>                 return -EINVAL;
>
>   retry:
> -       uprobe = alloc_uprobe(inode, offset);
> +       uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
>         if (!uprobe)
>                 return -ENOMEM;
>         /*
> @@ -938,10 +1135,17 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
>  int uprobe_register(struct inode *inode, loff_t offset,
>                     struct uprobe_consumer *uc)
>  {
> -       return __uprobe_register(inode, offset, uc);
> +       return __uprobe_register(inode, offset, 0, uc);
>  }
>  EXPORT_SYMBOL_GPL(uprobe_register);
>
> +int uprobe_register_refctr(struct inode *inode, loff_t offset,
> +                          loff_t ref_ctr_offset, struct uprobe_consumer *uc)
> +{
> +       return __uprobe_register(inode, offset, ref_ctr_offset, uc);
> +}
> +EXPORT_SYMBOL_GPL(uprobe_register_refctr);
> +
>  /*
>   * uprobe_apply - unregister a already registered probe.
>   * @inode: the file in which the probe has to be removed.
> @@ -1060,6 +1264,35 @@ static void build_probe_list(struct inode *inode,
>         spin_unlock(&uprobes_treelock);
>  }
>
> +/* @vma contains reference counter, not the probed instruction. */
> +static int delayed_ref_ctr_inc(struct vm_area_struct *vma)
> +{
> +       struct list_head *pos, *q;
> +       struct delayed_uprobe *du;
> +       unsigned long vaddr;
> +       int ret = 0, err = 0;
> +
> +       mutex_lock(&delayed_uprobe_lock);
> +       list_for_each_safe(pos, q, &delayed_uprobe_list) {
> +               du = list_entry(pos, struct delayed_uprobe, list);
> +
> +               if (du->mm != vma->vm_mm ||
> +                   !valid_ref_ctr_vma(du->uprobe, vma))
> +                       continue;
> +
> +               vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
> +               ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
> +               if (ret) {
> +                       update_ref_ctr_warn(du->uprobe, vma->vm_mm, 1);
> +                       if (!err)
> +                               err = ret;
> +               }
> +               delayed_uprobe_delete(du);
> +       }
> +       mutex_unlock(&delayed_uprobe_lock);
> +       return err;
> +}
> +
>  /*
>   * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
>   *
> @@ -1072,7 +1305,15 @@ int uprobe_mmap(struct vm_area_struct *vma)
>         struct uprobe *uprobe, *u;
>         struct inode *inode;
>
> -       if (no_uprobe_events() || !valid_vma(vma, true))
> +       if (no_uprobe_events())
> +               return 0;
> +
> +       if (vma->vm_file &&
> +           (vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE &&
> +           test_bit(MMF_HAS_UPROBES, &vma->vm_mm->flags))
> +               delayed_ref_ctr_inc(vma);
> +
> +       if (!valid_vma(vma, true))
>                 return 0;
>
>         inode = file_inode(vma->vm_file);
> @@ -1246,6 +1487,10 @@ void uprobe_clear_state(struct mm_struct *mm)
>  {
>         struct xol_area *area = mm->uprobes_state.xol_area;
>
> +       mutex_lock(&delayed_uprobe_lock);
> +       delayed_uprobe_remove(NULL, mm);
> +       mutex_unlock(&delayed_uprobe_lock);
> +
>         if (!area)
>                 return;
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 2dad27809794..23689831f656 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -4620,7 +4620,7 @@ static const char readme_msg[] =
>    "place (kretprobe): [<module>:]<symbol>[+<offset>]|<memaddr>\n"
>  #endif
>  #ifdef CONFIG_UPROBE_EVENTS
> -       "\t    place: <path>:<offset>\n"
> +  "   place (uprobe): <path>:<offset>[(ref_ctr_offset)]\n"
>  #endif
>         "\t     args: <name>=fetcharg[:type]\n"
>         "\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index ac02fafc9f1b..a7ef6c4ca16e 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -59,6 +59,7 @@ struct trace_uprobe {
>         struct inode                    *inode;
>         char                            *filename;
>         unsigned long                   offset;
> +       unsigned long                   ref_ctr_offset;
>         unsigned long                   nhit;
>         struct trace_probe              tp;
>  };
> @@ -364,10 +365,10 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
>  static int create_trace_uprobe(int argc, char **argv)
>  {
>         struct trace_uprobe *tu;
> -       char *arg, *event, *group, *filename;
> +       char *arg, *event, *group, *filename, *rctr, *rctr_end;
>         char buf[MAX_EVENT_NAME_LEN];
>         struct path path;
> -       unsigned long offset;
> +       unsigned long offset, ref_ctr_offset;
>         bool is_delete, is_return;
>         int i, ret;
>
> @@ -376,6 +377,7 @@ static int create_trace_uprobe(int argc, char **argv)
>         is_return = false;
>         event = NULL;
>         group = NULL;
> +       ref_ctr_offset = 0;
>
>         /* argc must be >= 1 */
>         if (argv[0][0] == '-')
> @@ -450,6 +452,26 @@ static int create_trace_uprobe(int argc, char **argv)
>                 goto fail_address_parse;
>         }
>
> +       /* Parse reference counter offset if specified. */
> +       rctr = strchr(arg, '(');
> +       if (rctr) {
> +               rctr_end = strchr(rctr, ')');
> +               if (rctr > rctr_end || *(rctr_end + 1) != 0) {
> +                       ret = -EINVAL;
> +                       pr_info("Invalid reference counter offset.\n");
> +                       goto fail_address_parse;
> +               }
> +
> +               *rctr++ = '\0';
> +               *rctr_end = '\0';
> +               ret = kstrtoul(rctr, 0, &ref_ctr_offset);
> +               if (ret) {
> +                       pr_info("Invalid reference counter offset.\n");
> +                       goto fail_address_parse;
> +               }
> +       }
> +
> +       /* Parse uprobe offset. */
>         ret = kstrtoul(arg, 0, &offset);
>         if (ret)
>                 goto fail_address_parse;
> @@ -484,6 +506,7 @@ static int create_trace_uprobe(int argc, char **argv)
>                 goto fail_address_parse;
>         }
>         tu->offset = offset;
> +       tu->ref_ctr_offset = ref_ctr_offset;
>         tu->path = path;
>         tu->filename = kstrdup(filename, GFP_KERNEL);
>
> @@ -602,6 +625,9 @@ static int probes_seq_show(struct seq_file *m, void *v)
>                         trace_event_name(&tu->tp.call), tu->filename,
>                         (int)(sizeof(void *) * 2), tu->offset);
>
> +       if (tu->ref_ctr_offset)
> +               seq_printf(m, "(0x%lx)", tu->ref_ctr_offset);
> +
>         for (i = 0; i < tu->tp.nr_args; i++)
>                 seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
>
> @@ -917,7 +943,13 @@ probe_event_enable(struct trace_uprobe *tu, struct trace_event_file *file,
>
>         tu->consumer.filter = filter;
>         tu->inode = d_real_inode(tu->path.dentry);
> -       ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
> +       if (tu->ref_ctr_offset) {
> +               ret = uprobe_register_refctr(tu->inode, tu->offset,
> +                               tu->ref_ctr_offset, &tu->consumer);
> +       } else {
> +               ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
> +       }
> +
>         if (ret)
>                 goto err_buffer;
>
> --
> 2.14.4
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 2/4] Uprobes/sdt: Prevent multiple reference counter for same uprobe
  2018-08-20  4:42 ` [PATCH v9 2/4] Uprobes/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
@ 2018-08-20  5:54   ` Song Liu
  2018-08-21 21:35     ` Song Liu
  0 siblings, 1 reply; 24+ messages in thread
From: Song Liu @ 2018-08-20  5:54 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Srikar Dronamraju, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On Sun, Aug 19, 2018 at 9:42 PM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> We assume to have only one reference counter for one uprobe.
> Don't allow user to register multiple uprobes having same
> inode+offset but different reference counter.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>

Reviewed-by: Song Liu <songliubraving@fb.com>

> ---
>  kernel/events/uprobes.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 35065febcb6c..ecee371a59c7 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -679,6 +679,16 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
>         return u;
>  }
>
> +static void
> +ref_ctr_mismatch_warn(struct uprobe *cur_uprobe, struct uprobe *uprobe)
> +{
> +       pr_warn("ref_ctr_offset mismatch. inode: 0x%lx offset: 0x%llx "
> +               "ref_ctr_offset(old): 0x%llx ref_ctr_offset(new): 0x%llx\n",
> +               uprobe->inode->i_ino, (unsigned long long) uprobe->offset,
> +               (unsigned long long) cur_uprobe->ref_ctr_offset,
> +               (unsigned long long) uprobe->ref_ctr_offset);
> +}
> +
>  static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
>                                    loff_t ref_ctr_offset)
>  {
> @@ -698,6 +708,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
>         cur_uprobe = insert_uprobe(uprobe);
>         /* a uprobe exists for this inode:offset combination */
>         if (cur_uprobe) {
> +               if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
> +                       ref_ctr_mismatch_warn(cur_uprobe, uprobe);
> +                       put_uprobe(cur_uprobe);
> +                       kfree(uprobe);
> +                       return ERR_PTR(-EINVAL);
> +               }
>                 kfree(uprobe);
>                 uprobe = cur_uprobe;
>         }
> @@ -1112,6 +1128,9 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
>         uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
>         if (!uprobe)
>                 return -ENOMEM;
> +       if (IS_ERR(uprobe))
> +               return PTR_ERR(uprobe);
> +
>         /*
>          * We can race with uprobe_unregister()->delete_uprobe().
>          * Check uprobe_is_active() and retry if it is false.
> --
> 2.14.4
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-20  4:42 [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
                   ` (3 preceding siblings ...)
  2018-08-20  4:42 ` [PATCH v9 4/4] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
@ 2018-08-20 17:38 ` Song Liu
  2018-08-21  4:42   ` Ravi Bangoria
                     ` (2 more replies)
  4 siblings, 3 replies; 24+ messages in thread
From: Song Liu @ 2018-08-20 17:38 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Srikar Dronamraju, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

I am testing the patch set with the following code:

#include <stdio.h>
#include <unistd.h>

volatile short semaphore = 0;

int for_uprobe(int c)
{
        printf("%d\n", c + 10);
        return c + 1;
}

int main(int argc, char *argv[])
{
        for_uprobe(argc);
        while (1) {
                sleep(1);
                printf("semaphore %d\n", semaphore);
        }
}

I created a uprobe on function for_uprobe(), that uses semaphore as
reference counter:

  echo "p:uprobe_1 /root/a.out:0x49a(0x1036)" >> uprobe_events

I found that if I start a.out before enabling the uprobe, the output
of a.out is like:

root@virt-test:~# ~/a.out
11
semaphore 0
semaphore 0
semaphore 2      <<<  when the uprobe is enabled
semaphore 2
semaphore 2
semaphore 2
semaphore 2
semaphore 0     <<< when the uprobe is disabled
semaphore 0
semaphore 0

I am not quite sure whether the value should be 1 or 2, but it works.

However, if I start a.out AFTER enabling the uprobe, there is something wrong:

root@virt-test:~# ~/a.out
11
semaphore 0       <<< this should be non-zero, as the uprobe is already enabled
semaphore 0
semaphore 0
semaphore 0
semaphore 0       <<< disabling the uprobe, the value is still 0,
                            <<<  get warning "ref_ctr going negative"
at the same time
semaphore 0
semaphore 0
semaphore 1       <<< enable uprobe again, getting 1 this time
semaphore 1
semaphore 1

I haven't spent much time debugging this, but I guess there is
something wrong on the mmap path?

Thanks,
Song

On Sun, Aug 19, 2018 at 9:42 PM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> v8 -> v9:
>  - Rebased to rostedt/for-next (Commit bb730b5833b5 to be precise)
>  - Not including first two patches now. They are already pulled by
>    Steven.
>  - Change delayed_uprobe_remove() function as suggested by Oleg
>  - Dump inode, offset, ref_ctr_offset, mm etc if we fail to update
>    reference counter.
>  - Rename delayed_uprobe_install() to delayed_ref_ctr_inc()
>  - Use 'short d' (delta) in update_ref_ctr() in place of 'bool
>    is_register'.
>
> v8: https://lkml.org/lkml/2018/8/9/81
>
> Future work:
>  - Optimize uprobe_mmap()->delayed_ref_ctr_inc() by making
>    delayed_uprobe_list per mm.
>
> Description:
> Userspace Statically Defined Tracepoints[1] are dtrace style markers
> inside userspace applications. Applications like PostgreSQL, MySQL,
> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> have these markers embedded in them. These markers are added by developer
> at important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. In case the overhead is more, execution of it can be
> omitted by runtime if() condition when no one is tracing on the marker:
>
>     if (reference_counter > 0) {
>         Execute marker instructions;
>     }
>
> Default value of reference counter is 0. Tracer has to increment the
> reference counter before tracing on a marker and decrement it when
> done with the tracing.
>
> Currently, perf tool has limited supports for SDT markers. I.e. it
> can not trace markers surrounded by reference counter. Also, it's
> not easy to add reference counter logic in userspace tool like perf,
> so basic idea for this patchset is to add reference counter logic in
> the a uprobe infrastructure. Ex,[2]
>
>   # cat tick.c
>     ...
>     for (i = 0; i < 100; i++) {
>         DTRACE_PROBE1(tick, loop1, i);
>         if (TICK_LOOP2_ENABLED()) {
>             DTRACE_PROBE1(tick, loop2, i);
>         }
>         printf("hi: %d\n", i);
>         sleep(1);
>     }
>     ...
>
> Here tick:loop1 is marker without reference counter where as tick:loop2
> is surrounded by reference counter condition.
>
>   # perf buildid-cache --add /tmp/tick
>   # perf probe sdt_tick:loop1
>   # perf probe sdt_tick:loop2
>
>   # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
>   hi: 0
>   hi: 1
>   hi: 2
>   ^C
>   Performance counter stats for '/tmp/tick':
>              3      sdt_tick:loop1
>              0      sdt_tick:loop2
>      2.747086086 seconds time elapsed
>
> Perf failed to record data for tick:loop2. Same experiment with this
> patch series:
>
>   # ./perf buildid-cache --add /tmp/tick
>   # ./perf probe sdt_tick:loop2
>   # ./perf stat -e sdt_tick:loop2 /tmp/tick
>     hi: 0
>     hi: 1
>     hi: 2
>     ^C
>      Performance counter stats for '/tmp/tick':
>                  3      sdt_tick:loop2
>        2.561851452 seconds time elapsed
>
> [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> [2] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506
>
> Ravi Bangoria (4):
>   Uprobes: Support SDT markers having reference count (semaphore)
>   Uprobes/sdt: Prevent multiple reference counter for same uprobe
>   trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
>   perf probe: Support SDT markers having reference counter (semaphore)
>
>  include/linux/uprobes.h       |   5 +
>  kernel/events/uprobes.c       | 278 ++++++++++++++++++++++++++++++++++++++++--
>  kernel/trace/trace.c          |   2 +-
>  kernel/trace/trace_uprobe.c   |  75 +++++++++++-
>  tools/perf/util/probe-event.c |  39 +++++-
>  tools/perf/util/probe-event.h |   1 +
>  tools/perf/util/probe-file.c  |  34 +++++-
>  tools/perf/util/probe-file.h  |   1 +
>  tools/perf/util/symbol-elf.c  |  46 +++++--
>  tools/perf/util/symbol.h      |   7 ++
>  10 files changed, 453 insertions(+), 35 deletions(-)
>
> --
> 2.14.4
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-20 17:38 ` [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore) Song Liu
@ 2018-08-21  4:42   ` Ravi Bangoria
  2018-08-21  4:55     ` Song Liu
  2018-08-21  5:23   ` Ravi Bangoria
  2018-08-21  7:34   ` Naveen N. Rao
  2 siblings, 1 reply; 24+ messages in thread
From: Ravi Bangoria @ 2018-08-21  4:42 UTC (permalink / raw)
  To: Song Liu
  Cc: Srikar Dronamraju, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Song,

> root@virt-test:~# ~/a.out
> 11
> semaphore 0
> semaphore 0
> semaphore 2      <<<  when the uprobe is enabled

Yes, this happens when multiple vmas points to the same file portion.
Can you check /proc/`pgrep a.out`/maps.

Logic is simple. If we are going to patch an instruction, increment the
reference counter. If we are going to unpatch an instruction, decrement
the reference counter. In this case, we patched instruction twice and
thus incremented reference counter twice as well.

Ravi


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-21  4:42   ` Ravi Bangoria
@ 2018-08-21  4:55     ` Song Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Song Liu @ 2018-08-21  4:55 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Srikar Dronamraju, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On Mon, Aug 20, 2018 at 9:42 PM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> Hi Song,
>
>> root@virt-test:~# ~/a.out
>> 11
>> semaphore 0
>> semaphore 0
>> semaphore 2      <<<  when the uprobe is enabled
>
> Yes, this happens when multiple vmas points to the same file portion.
> Can you check /proc/`pgrep a.out`/maps.
>
> Logic is simple. If we are going to patch an instruction, increment the
> reference counter. If we are going to unpatch an instruction, decrement
> the reference counter. In this case, we patched instruction twice and
> thus incremented reference counter twice as well.

Yes, this makes sense.

Song

>
> Ravi
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-20 17:38 ` [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore) Song Liu
  2018-08-21  4:42   ` Ravi Bangoria
@ 2018-08-21  5:23   ` Ravi Bangoria
  2018-08-21  6:20     ` Ravi Bangoria
  2018-08-21  7:34   ` Naveen N. Rao
  2 siblings, 1 reply; 24+ messages in thread
From: Ravi Bangoria @ 2018-08-21  5:23 UTC (permalink / raw)
  To: Song Liu
  Cc: Srikar Dronamraju, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Song,

> However, if I start a.out AFTER enabling the uprobe, there is something wrong:
> 
> root@virt-test:~# ~/a.out
> 11
> semaphore 0       <<< this should be non-zero, as the uprobe is already enabled

Ok. I'm able to reproduce this. Digging deeper.

Ravi


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-21  5:23   ` Ravi Bangoria
@ 2018-08-21  6:20     ` Ravi Bangoria
  0 siblings, 0 replies; 24+ messages in thread
From: Ravi Bangoria @ 2018-08-21  6:20 UTC (permalink / raw)
  To: Song Liu
  Cc: Srikar Dronamraju, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton,
	Ravi Bangoria

Hi Song,

On 08/21/2018 10:53 AM, Ravi Bangoria wrote:
> Hi Song,
> 
>> However, if I start a.out AFTER enabling the uprobe, there is something wrong:
>>
>> root@virt-test:~# ~/a.out
>> 11
>> semaphore 0       <<< this should be non-zero, as the uprobe is already enabled

In this testcase, semaphore variable is stored into .bss:

  $ nm test | grep semaphore
  0000000010010c5e B semaphore
 
  $ readelf -SW ./test | grep "data\|bss"
    [22] .data             PROGBITS        0000000010010c58 000c58 000004 00  WA  0   0  1
    [23] .bss              NOBITS          0000000010010c5c 000c5c 000004 00  WA  0   0  2

I'm not so sure but I guess .bss data initialization happens after
calling uprobe_mmap() and thus you are seeing semaphore as 0.

To verify this, if I force to save semaphore into data section by
assigning non-zero value to it:

  volatile short semaphore = 1

 $ nm test | grep semaphore
 0000000010010c5c D semaphore

 $ readelf -SW ./test | grep "data\|bss"
    [22] .data             PROGBITS        0000000010010c58 000c58 000006 00  WA  0   0  2
    [23] .bss              NOBITS          0000000010010c5e 000c5e 000002 00  WA  0   0  1 

increment/decrement works fine.

Ravi


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-20 17:38 ` [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore) Song Liu
  2018-08-21  4:42   ` Ravi Bangoria
  2018-08-21  5:23   ` Ravi Bangoria
@ 2018-08-21  7:34   ` Naveen N. Rao
  2018-08-21 11:58     ` Ravi Bangoria
  2 siblings, 1 reply; 24+ messages in thread
From: Naveen N. Rao @ 2018-08-21  7:34 UTC (permalink / raw)
  To: Song Liu, Ravi Bangoria
  Cc: acme, alexander.shishkin, Alexis Berlemont, ananth, jolsa,
	linux-arm-kernel, linux, open list, linux-mips, mhiramat, mingo,
	namhyung, Oleg Nesterov, paul.burton, Peter Zijlstra, ralf,
	Steven Rostedt, Srikar Dronamraju

Song Liu wrote:
> I am testing the patch set with the following code:
> 
> #include <stdio.h>
> #include <unistd.h>
> 
> volatile short semaphore = 0;
> 
> int for_uprobe(int c)
> {
>         printf("%d\n", c + 10);
>         return c + 1;
> }
> 
> int main(int argc, char *argv[])
> {
>         for_uprobe(argc);
>         while (1) {
>                 sleep(1);
>                 printf("semaphore %d\n", semaphore);
>         }
> }
> 
> I created a uprobe on function for_uprobe(), that uses semaphore as
> reference counter:
> 
>   echo "p:uprobe_1 /root/a.out:0x49a(0x1036)" >> uprobe_events

Is that even valid? That _looks_ like a semaphore, but I'm not quite 
sure that it qualifies as an _SDT_ semaphore. Do you see this issue if 
you instead use the macros provided by <sys/sdt.h> to create SDT 
markers?


- Naveen



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-21  7:34   ` Naveen N. Rao
@ 2018-08-21 11:58     ` Ravi Bangoria
  0 siblings, 0 replies; 24+ messages in thread
From: Ravi Bangoria @ 2018-08-21 11:58 UTC (permalink / raw)
  To: Naveen N. Rao, Song Liu
  Cc: acme, alexander.shishkin, Alexis Berlemont, ananth, jolsa,
	linux-arm-kernel, linux, open list, linux-mips, mhiramat, mingo,
	namhyung, Oleg Nesterov, paul.burton, Peter Zijlstra, ralf,
	Steven Rostedt, Srikar Dronamraju, Ravi Bangoria


On 08/21/2018 01:04 PM, Naveen N. Rao wrote:
> Song Liu wrote:
>> I am testing the patch set with the following code:
>>
>> #include <stdio.h>
>> #include <unistd.h>
>>
>> volatile short semaphore = 0;
>>
>> int for_uprobe(int c)
>> {
>>         printf("%d\n", c + 10);
>>         return c + 1;
>> }
>>
>> int main(int argc, char *argv[])
>> {
>>         for_uprobe(argc);
>>         while (1) {
>>                 sleep(1);
>>                 printf("semaphore %d\n", semaphore);
>>         }
>> }
>>
>> I created a uprobe on function for_uprobe(), that uses semaphore as
>> reference counter:
>>
>>   echo "p:uprobe_1 /root/a.out:0x49a(0x1036)" >> uprobe_events
> 
> Is that even valid? That _looks_ like a semaphore, but I'm not quite sure that it qualifies as an _SDT_ semaphore. Do you see this issue if you instead use the macros provided by <sys/sdt.h> to create SDT markers?
> 

Right. By default SDT reference counters(semaphore) goes into .probes
section:

  [25] .probes           PROGBITS        000000000060102c 00102c 000004 00  WA  0   0  2

which has PROGBITS set. So this works fine. And there are many other
things which are coded into <sys/sdt.h>. So the official way to use
SDT markers should be through that.

Ravi


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 3/4] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  2018-08-20  4:42 ` [PATCH v9 3/4] trace_uprobe/sdt: " Ravi Bangoria
@ 2018-08-21 20:46   ` Song Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Song Liu @ 2018-08-21 20:46 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Srikar Dronamraju, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On Sun, Aug 19, 2018 at 9:42 PM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> We assume to have only one reference counter for one uprobe.
> Don't allow user to add multiple trace_uprobe entries having
> same inode+offset but different reference counter.
>
> Ex,
>   # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events
>   # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xfffff)" >> uprobe_events
>   bash: echo: write error: Invalid argument
>
>   # dmesg
>   trace_kprobe: Reference counter offset mismatch.
>
> There is one exception though:
> When user is trying to replace the old entry with the new
> one, we allow this if the new entry does not conflict with
> any other existing entries.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Reviewed-by: Song Liu <songliubraving@fb.com>

Reviewed-and-tested-by: Song Liu <songliubraving@fb.com>

> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/trace/trace_uprobe.c | 37 +++++++++++++++++++++++++++++++++++--
>  1 file changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index a7ef6c4ca16e..21d9fffaa096 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu)
>         return 0;
>  }
>
> +/*
> + * Uprobe with multiple reference counter is not allowed. i.e.
> + * If inode and offset matches, reference counter offset *must*
> + * match as well. Though, there is one exception: If user is
> + * replacing old trace_uprobe with new one(same group/event),
> + * then we allow same uprobe with new reference counter as far
> + * as the new one does not conflict with any other existing
> + * ones.
> + */
> +static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new)
> +{
> +       struct trace_uprobe *tmp, *old = NULL;
> +       struct inode *new_inode = d_real_inode(new->path.dentry);
> +
> +       old = find_probe_event(trace_event_name(&new->tp.call),
> +                               new->tp.call.class->system);
> +
> +       list_for_each_entry(tmp, &uprobe_list, list) {
> +               if ((old ? old != tmp : true) &&
> +                   new_inode == d_real_inode(tmp->path.dentry) &&
> +                   new->offset == tmp->offset &&
> +                   new->ref_ctr_offset != tmp->ref_ctr_offset) {
> +                       pr_warn("Reference counter offset mismatch.");
> +                       return ERR_PTR(-EINVAL);
> +               }
> +       }
> +       return old;
> +}
> +
>  /* Register a trace_uprobe and probe_event */
>  static int register_trace_uprobe(struct trace_uprobe *tu)
>  {
> @@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
>         mutex_lock(&uprobe_lock);
>
>         /* register as an event */
> -       old_tu = find_probe_event(trace_event_name(&tu->tp.call),
> -                       tu->tp.call.class->system);
> +       old_tu = find_old_trace_uprobe(tu);
> +       if (IS_ERR(old_tu)) {
> +               ret = PTR_ERR(old_tu);
> +               goto end;
> +       }
> +
>         if (old_tu) {
>                 /* delete old event */
>                 ret = unregister_trace_uprobe(old_tu);
> --
> 2.14.4
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 4/4] perf probe: Support SDT markers having reference counter (semaphore)
  2018-08-20  4:42 ` [PATCH v9 4/4] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
@ 2018-08-21 21:33   ` Song Liu
  2018-08-30 18:45   ` Steven Rostedt
  1 sibling, 0 replies; 24+ messages in thread
From: Song Liu @ 2018-08-21 21:33 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Srikar Dronamraju, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On Sun, Aug 19, 2018 at 9:42 PM, Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> With this, perf buildid-cache will save SDT markers with reference
> counter in probe cache. Perf probe will be able to probe markers
> having reference counter. Ex,
>
>   # readelf -n /tmp/tick | grep -A1 loop2
>     Name: loop2
>     ... Semaphore: 0x0000000010020036
>
>   # ./perf buildid-cache --add /tmp/tick
>   # ./perf probe sdt_tick:loop2
>   # ./perf stat -e sdt_tick:loop2 /tmp/tick
>     hi: 0
>     hi: 1
>     hi: 2
>     ^C
>      Performance counter stats for '/tmp/tick':
>                  3      sdt_tick:loop2
>        2.561851452 seconds time elapsed
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

Reviewed-and-tested-by: Song Liu <songliubraving@fb.com>

> ---
>  tools/perf/util/probe-event.c | 39 ++++++++++++++++++++++++++++++++----
>  tools/perf/util/probe-event.h |  1 +
>  tools/perf/util/probe-file.c  | 34 ++++++++++++++++++++++++++------
>  tools/perf/util/probe-file.h  |  1 +
>  tools/perf/util/symbol-elf.c  | 46 ++++++++++++++++++++++++++++++++-----------
>  tools/perf/util/symbol.h      |  7 +++++++
>  6 files changed, 106 insertions(+), 22 deletions(-)
>
> diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
> index f119eb628dbb..e86f8be89157 100644
> --- a/tools/perf/util/probe-event.c
> +++ b/tools/perf/util/probe-event.c
> @@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev)
>                         tp->offset = strtoul(fmt2_str, NULL, 10);
>         }
>
> +       if (tev->uprobes) {
> +               fmt2_str = strchr(p, '(');
> +               if (fmt2_str)
> +                       tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
> +       }
> +
>         tev->nargs = argc - 2;
>         tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
>         if (tev->args == NULL) {
> @@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg,
>         return err;
>  }
>
> +static int
> +synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf)
> +{
> +       struct probe_trace_point *tp = &tev->point;
> +       int err;
> +
> +       err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address);
> +
> +       if (err >= 0 && tp->ref_ctr_offset) {
> +               if (!uprobe_ref_ctr_is_supported())
> +                       return -1;
> +               err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset);
> +       }
> +       return err >= 0 ? 0 : -1;
> +}
> +
>  char *synthesize_probe_trace_command(struct probe_trace_event *tev)
>  {
>         struct probe_trace_point *tp = &tev->point;
> @@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
>         }
>
>         /* Use the tp->address for uprobes */
> -       if (tev->uprobes)
> -               err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
> -       else if (!strncmp(tp->symbol, "0x", 2))
> +       if (tev->uprobes) {
> +               err = synthesize_uprobe_trace_def(tev, &buf);
> +       } else if (!strncmp(tp->symbol, "0x", 2)) {
>                 /* Absolute address. See try_to_find_absolute_address() */
>                 err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
>                                   tp->module ? ":" : "", tp->address);
> -       else
> +       } else {
>                 err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
>                                 tp->module ? ":" : "", tp->symbol, tp->offset);
> +       }
> +
>         if (err)
>                 goto error;
>
> @@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev)
>  {
>         int i;
>         char *buf = synthesize_probe_trace_command(tev);
> +       struct probe_trace_point *tp = &tev->point;
> +
> +       if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) {
> +               pr_warning("A semaphore is associated with %s:%s and "
> +                          "seems your kernel doesn't support it.\n",
> +                          tev->group, tev->event);
> +       }
>
>         /* Old uprobe event doesn't support memory dereference */
>         if (!tev->uprobes || tev->nargs == 0 || !buf)
> diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
> index 45b14f020558..15a98c3a2a2f 100644
> --- a/tools/perf/util/probe-event.h
> +++ b/tools/perf/util/probe-event.h
> @@ -27,6 +27,7 @@ struct probe_trace_point {
>         char            *symbol;        /* Base symbol */
>         char            *module;        /* Module name */
>         unsigned long   offset;         /* Offset from symbol */
> +       unsigned long   ref_ctr_offset; /* SDT reference counter offset */
>         unsigned long   address;        /* Actual address of the trace point */
>         bool            retprobe;       /* Return probe flag */
>  };
> diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
> index b76088fadf3d..aac7817d9e14 100644
> --- a/tools/perf/util/probe-file.c
> +++ b/tools/perf/util/probe-file.c
> @@ -696,8 +696,16 @@ int probe_cache__add_entry(struct probe_cache *pcache,
>  #ifdef HAVE_GELF_GETNOTE_SUPPORT
>  static unsigned long long sdt_note__get_addr(struct sdt_note *note)
>  {
> -       return note->bit32 ? (unsigned long long)note->addr.a32[0]
> -                : (unsigned long long)note->addr.a64[0];
> +       return note->bit32 ?
> +               (unsigned long long)note->addr.a32[SDT_NOTE_IDX_LOC] :
> +               (unsigned long long)note->addr.a64[SDT_NOTE_IDX_LOC];
> +}
> +
> +static unsigned long long sdt_note__get_ref_ctr_offset(struct sdt_note *note)
> +{
> +       return note->bit32 ?
> +               (unsigned long long)note->addr.a32[SDT_NOTE_IDX_REFCTR] :
> +               (unsigned long long)note->addr.a64[SDT_NOTE_IDX_REFCTR];
>  }
>
>  static const char * const type_to_suffix[] = {
> @@ -775,14 +783,21 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note,
>  {
>         struct strbuf buf;
>         char *ret = NULL, **args;
> -       int i, args_count;
> +       int i, args_count, err;
> +       unsigned long long ref_ctr_offset;
>
>         if (strbuf_init(&buf, 32) < 0)
>                 return NULL;
>
> -       if (strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
> -                               sdtgrp, note->name, pathname,
> -                               sdt_note__get_addr(note)) < 0)
> +       err = strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
> +                       sdtgrp, note->name, pathname,
> +                       sdt_note__get_addr(note));
> +
> +       ref_ctr_offset = sdt_note__get_ref_ctr_offset(note);
> +       if (ref_ctr_offset && err >= 0)
> +               err = strbuf_addf(&buf, "(0x%llx)", ref_ctr_offset);
> +
> +       if (err < 0)
>                 goto error;
>
>         if (!note->args)
> @@ -998,6 +1013,7 @@ int probe_cache__show_all_caches(struct strfilter *filter)
>  enum ftrace_readme {
>         FTRACE_README_PROBE_TYPE_X = 0,
>         FTRACE_README_KRETPROBE_OFFSET,
> +       FTRACE_README_UPROBE_REF_CTR,
>         FTRACE_README_END,
>  };
>
> @@ -1009,6 +1025,7 @@ static struct {
>         [idx] = {.pattern = pat, .avail = false}
>         DEFINE_TYPE(FTRACE_README_PROBE_TYPE_X, "*type: * x8/16/32/64,*"),
>         DEFINE_TYPE(FTRACE_README_KRETPROBE_OFFSET, "*place (kretprobe): *"),
> +       DEFINE_TYPE(FTRACE_README_UPROBE_REF_CTR, "*ref_ctr_offset*"),
>  };
>
>  static bool scan_ftrace_readme(enum ftrace_readme type)
> @@ -1064,3 +1081,8 @@ bool kretprobe_offset_is_supported(void)
>  {
>         return scan_ftrace_readme(FTRACE_README_KRETPROBE_OFFSET);
>  }
> +
> +bool uprobe_ref_ctr_is_supported(void)
> +{
> +       return scan_ftrace_readme(FTRACE_README_UPROBE_REF_CTR);
> +}
> diff --git a/tools/perf/util/probe-file.h b/tools/perf/util/probe-file.h
> index 63f29b1d22c1..2a249182f2a6 100644
> --- a/tools/perf/util/probe-file.h
> +++ b/tools/perf/util/probe-file.h
> @@ -69,6 +69,7 @@ struct probe_cache_entry *probe_cache__find_by_name(struct probe_cache *pcache,
>  int probe_cache__show_all_caches(struct strfilter *filter);
>  bool probe_type_is_available(enum probe_type type);
>  bool kretprobe_offset_is_supported(void);
> +bool uprobe_ref_ctr_is_supported(void);
>  #else  /* ! HAVE_LIBELF_SUPPORT */
>  static inline struct probe_cache *probe_cache__new(const char *tgt __maybe_unused, struct nsinfo *nsi __maybe_unused)
>  {
> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> index 29770ea61768..0281d5e2cd67 100644
> --- a/tools/perf/util/symbol-elf.c
> +++ b/tools/perf/util/symbol-elf.c
> @@ -1947,6 +1947,34 @@ void kcore_extract__delete(struct kcore_extract *kce)
>  }
>
>  #ifdef HAVE_GELF_GETNOTE_SUPPORT
> +
> +static void sdt_adjust_loc(struct sdt_note *tmp, GElf_Addr base_off)
> +{
> +       if (!base_off)
> +               return;
> +
> +       if (tmp->bit32)
> +               tmp->addr.a32[SDT_NOTE_IDX_LOC] =
> +                       tmp->addr.a32[SDT_NOTE_IDX_LOC] + base_off -
> +                       tmp->addr.a32[SDT_NOTE_IDX_BASE];
> +       else
> +               tmp->addr.a64[SDT_NOTE_IDX_LOC] =
> +                       tmp->addr.a64[SDT_NOTE_IDX_LOC] + base_off -
> +                       tmp->addr.a64[SDT_NOTE_IDX_BASE];
> +}
> +
> +static void sdt_adjust_refctr(struct sdt_note *tmp, GElf_Addr base_addr,
> +                             GElf_Addr base_off)
> +{
> +       if (!base_off)
> +               return;
> +
> +       if (tmp->bit32 && tmp->addr.a32[SDT_NOTE_IDX_REFCTR])
> +               tmp->addr.a32[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
> +       else if (tmp->addr.a64[SDT_NOTE_IDX_REFCTR])
> +               tmp->addr.a64[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
> +}
> +
>  /**
>   * populate_sdt_note : Parse raw data and identify SDT note
>   * @elf: elf of the opened file
> @@ -1964,7 +1992,6 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
>         const char *provider, *name, *args;
>         struct sdt_note *tmp = NULL;
>         GElf_Ehdr ehdr;
> -       GElf_Addr base_off = 0;
>         GElf_Shdr shdr;
>         int ret = -EINVAL;
>
> @@ -2060,17 +2087,12 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
>          * base address in the description of the SDT note. If its different,
>          * then accordingly, adjust the note location.
>          */
> -       if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL)) {
> -               base_off = shdr.sh_offset;
> -               if (base_off) {
> -                       if (tmp->bit32)
> -                               tmp->addr.a32[0] = tmp->addr.a32[0] + base_off -
> -                                       tmp->addr.a32[1];
> -                       else
> -                               tmp->addr.a64[0] = tmp->addr.a64[0] + base_off -
> -                                       tmp->addr.a64[1];
> -               }
> -       }
> +       if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL))
> +               sdt_adjust_loc(tmp, shdr.sh_offset);
> +
> +       /* Adjust reference counter offset */
> +       if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_PROBES_SCN, NULL))
> +               sdt_adjust_refctr(tmp, shdr.sh_addr, shdr.sh_offset);
>
>         list_add_tail(&tmp->note_list, sdt_notes);
>         return 0;
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index f25fae4b5743..20f49779116b 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -379,12 +379,19 @@ int get_sdt_note_list(struct list_head *head, const char *target);
>  int cleanup_sdt_note_list(struct list_head *sdt_notes);
>  int sdt_notes__get_count(struct list_head *start);
>
> +#define SDT_PROBES_SCN ".probes"
>  #define SDT_BASE_SCN ".stapsdt.base"
>  #define SDT_NOTE_SCN  ".note.stapsdt"
>  #define SDT_NOTE_TYPE 3
>  #define SDT_NOTE_NAME "stapsdt"
>  #define NR_ADDR 3
>
> +enum {
> +       SDT_NOTE_IDX_LOC = 0,
> +       SDT_NOTE_IDX_BASE,
> +       SDT_NOTE_IDX_REFCTR,
> +};
> +
>  struct mem_info *mem_info__new(void);
>  struct mem_info *mem_info__get(struct mem_info *mi);
>  void   mem_info__put(struct mem_info *mi);
> --
> 2.14.4
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-20  5:53   ` Song Liu
@ 2018-08-21 21:35     ` Song Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Song Liu @ 2018-08-21 21:35 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Srikar Dronamraju, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On Sun, Aug 19, 2018 at 10:53 PM, Song Liu <liu.song.a23@gmail.com> wrote:
> On Sun, Aug 19, 2018 at 9:42 PM, Ravi Bangoria
> <ravi.bangoria@linux.ibm.com> wrote:
>> Userspace Statically Defined Tracepoints[1] are dtrace style markers
>> inside userspace applications. Applications like PostgreSQL, MySQL,
>> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
>> have these markers embedded in them. These markers are added by developer
>> at important places in the code. Each marker source expands to a single
>> nop instruction in the compiled code but there may be additional
>> overhead for computing the marker arguments which expands to couple of
>> instructions. In case the overhead is more, execution of it can be
>> omitted by runtime if() condition when no one is tracing on the marker:
>>
>>     if (reference_counter > 0) {
>>         Execute marker instructions;
>>     }
>>
>> Default value of reference counter is 0. Tracer has to increment the
>> reference counter before tracing on a marker and decrement it when
>> done with the tracing.
>>
>> Implement the reference counter logic in core uprobe. User will be
>> able to use it from trace_uprobe as well as from kernel module. New
>> trace_uprobe definition with reference counter will now be:
>>
>>     <path>:<offset>[(ref_ctr_offset)]
>>
>> where ref_ctr_offset is an optional field. For kernel module, new
>> variant of uprobe_register() has been introduced:
>>
>>     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
>>
>> No new variant for uprobe_unregister() because it's assumed to have
>> only one reference counter for one uprobe.
>>
>> [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
>>
>> Note: 'reference counter' is called as 'semaphore' in original Dtrace
>> (or Systemtap, bcc and even in ELF) documentation and code. But the
>> term 'semaphore' is misleading in this context. This is just a counter
>> used to hold number of tracers tracing on a marker. This is not really
>> used for any synchronization. So we are calling it a 'reference counter'
>> in kernel / perf code.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
>> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
>> [Only trace_uprobe.c]
>> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
>
> Reviewed-by: Song Liu <songliubraving@fb.com>

Reviewed-and-tested-by: Song Liu <songliubraving@fb.com>

>
>> ---
>>  include/linux/uprobes.h     |   5 +
>>  kernel/events/uprobes.c     | 259 ++++++++++++++++++++++++++++++++++++++++++--
>>  kernel/trace/trace.c        |   2 +-
>>  kernel/trace/trace_uprobe.c |  38 ++++++-
>>  4 files changed, 293 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
>> index bb9d2084af03..103a48a48872 100644
>> --- a/include/linux/uprobes.h
>> +++ b/include/linux/uprobes.h
>> @@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
>>  extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
>>  extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
>>  extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
>> +extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
>>  extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
>>  extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
>>  extern int uprobe_mmap(struct vm_area_struct *vma);
>> @@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
>>  {
>>         return -ENOSYS;
>>  }
>> +static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
>> +{
>> +       return -ENOSYS;
>> +}
>>  static inline int
>>  uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
>>  {
>> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
>> index 919c1ce32beb..35065febcb6c 100644
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -73,6 +73,7 @@ struct uprobe {
>>         struct uprobe_consumer  *consumers;
>>         struct inode            *inode;         /* Also hold a ref to inode */
>>         loff_t                  offset;
>> +       loff_t                  ref_ctr_offset;
>>         unsigned long           flags;
>>
>>         /*
>> @@ -88,6 +89,15 @@ struct uprobe {
>>         struct arch_uprobe      arch;
>>  };
>>
>> +struct delayed_uprobe {
>> +       struct list_head list;
>> +       struct uprobe *uprobe;
>> +       struct mm_struct *mm;
>> +};
>> +
>> +static DEFINE_MUTEX(delayed_uprobe_lock);
>> +static LIST_HEAD(delayed_uprobe_list);
>> +
>>  /*
>>   * Execute out of line area: anonymous executable mapping installed
>>   * by the probed task to execute the copy of the original instruction
>> @@ -282,6 +292,166 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
>>         return 1;
>>  }
>>
>> +static struct delayed_uprobe *
>> +delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm)
>> +{
>> +       struct delayed_uprobe *du;
>> +
>> +       list_for_each_entry(du, &delayed_uprobe_list, list)
>> +               if (du->uprobe == uprobe && du->mm == mm)
>> +                       return du;
>> +       return NULL;
>> +}
>> +
>> +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
>> +{
>> +       struct delayed_uprobe *du;
>> +
>> +       if (delayed_uprobe_check(uprobe, mm))
>> +               return 0;
>> +
>> +       du  = kzalloc(sizeof(*du), GFP_KERNEL);
>> +       if (!du)
>> +               return -ENOMEM;
>> +
>> +       du->uprobe = uprobe;
>> +       du->mm = mm;
>> +       list_add(&du->list, &delayed_uprobe_list);
>> +       return 0;
>> +}
>> +
>> +static void delayed_uprobe_delete(struct delayed_uprobe *du)
>> +{
>> +       if (WARN_ON(!du))
>> +               return;
>> +       list_del(&du->list);
>> +       kfree(du);
>> +}
>> +
>> +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct *mm)
>> +{
>> +       struct list_head *pos, *q;
>> +       struct delayed_uprobe *du;
>> +
>> +       if (!uprobe && !mm)
>> +               return;
>> +
>> +       list_for_each_safe(pos, q, &delayed_uprobe_list) {
>> +               du = list_entry(pos, struct delayed_uprobe, list);
>> +
>> +               if (uprobe && du->uprobe != uprobe)
>> +                       continue;
>> +               if (mm && du->mm != mm)
>> +                       continue;
>> +
>> +               delayed_uprobe_delete(du);
>> +       }
>> +}
>> +
>> +static bool valid_ref_ctr_vma(struct uprobe *uprobe,
>> +                             struct vm_area_struct *vma)
>> +{
>> +       unsigned long vaddr = offset_to_vaddr(vma, uprobe->ref_ctr_offset);
>> +
>> +       return uprobe->ref_ctr_offset &&
>> +               vma->vm_file &&
>> +               file_inode(vma->vm_file) == uprobe->inode &&
>> +               (vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE &&
>> +               vma->vm_start <= vaddr &&
>> +               vma->vm_end > vaddr;
>> +}
>> +
>> +static struct vm_area_struct *
>> +find_ref_ctr_vma(struct uprobe *uprobe, struct mm_struct *mm)
>> +{
>> +       struct vm_area_struct *tmp;
>> +
>> +       for (tmp = mm->mmap; tmp; tmp = tmp->vm_next)
>> +               if (valid_ref_ctr_vma(uprobe, tmp))
>> +                       return tmp;
>> +
>> +       return NULL;
>> +}
>> +
>> +static int
>> +__update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
>> +{
>> +       void *kaddr;
>> +       struct page *page;
>> +       struct vm_area_struct *vma;
>> +       int ret;
>> +       short *ptr;
>> +
>> +       if (!vaddr || !d)
>> +               return -EINVAL;
>> +
>> +       ret = get_user_pages_remote(NULL, mm, vaddr, 1,
>> +                       FOLL_WRITE, &page, &vma, NULL);
>> +       if (unlikely(ret <= 0)) {
>> +               /*
>> +                * We are asking for 1 page. If get_user_pages_remote() fails,
>> +                * it may return 0, in that case we have to return error.
>> +                */
>> +               return ret == 0 ? -EBUSY : ret;
>> +       }
>> +
>> +       kaddr = kmap_atomic(page);
>> +       ptr = kaddr + (vaddr & ~PAGE_MASK);
>> +
>> +       if (unlikely(*ptr + d < 0)) {
>> +               pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
>> +                       "curr val: %d, delta: %d\n", vaddr, *ptr, d);
>> +               ret = -EINVAL;
>> +               goto out;
>> +       }
>> +
>> +       *ptr += d;
>> +       ret = 0;
>> +out:
>> +       kunmap_atomic(kaddr);
>> +       put_page(page);
>> +       return ret;
>> +}
>> +
>> +static void update_ref_ctr_warn(struct uprobe *uprobe,
>> +                               struct mm_struct *mm, short d)
>> +{
>> +       pr_warn("ref_ctr %s failed for inode: 0x%lx offset: "
>> +               "0x%llx ref_ctr_offset: 0x%llx of mm: 0x%pK\n",
>> +               d > 0 ? "increment" : "decrement", uprobe->inode->i_ino,
>> +               (unsigned long long) uprobe->offset,
>> +               (unsigned long long) uprobe->ref_ctr_offset, mm);
>> +}
>> +
>> +static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
>> +                         short d)
>> +{
>> +       struct vm_area_struct *rc_vma;
>> +       unsigned long rc_vaddr;
>> +       int ret = 0;
>> +
>> +       rc_vma = find_ref_ctr_vma(uprobe, mm);
>> +
>> +       if (rc_vma) {
>> +               rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
>> +               ret = __update_ref_ctr(mm, rc_vaddr, d);
>> +               if (ret)
>> +                       update_ref_ctr_warn(uprobe, mm, d);
>> +
>> +               if (d > 0)
>> +                       return ret;
>> +       }
>> +
>> +       mutex_lock(&delayed_uprobe_lock);
>> +       if (d > 0)
>> +               ret = delayed_uprobe_add(uprobe, mm);
>> +       else
>> +               delayed_uprobe_remove(uprobe, mm);
>> +       mutex_unlock(&delayed_uprobe_lock);
>> +
>> +       return ret;
>> +}
>> +
>>  /*
>>   * NOTE:
>>   * Expect the breakpoint instruction to be the smallest size instruction for
>> @@ -302,9 +472,13 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
>>  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>>                         unsigned long vaddr, uprobe_opcode_t opcode)
>>  {
>> +       struct uprobe *uprobe;
>>         struct page *old_page, *new_page;
>>         struct vm_area_struct *vma;
>> -       int ret;
>> +       int ret, is_register, ref_ctr_updated = 0;
>> +
>> +       is_register = is_swbp_insn(&opcode);
>> +       uprobe = container_of(auprobe, struct uprobe, arch);
>>
>>  retry:
>>         /* Read the page with vaddr into memory */
>> @@ -317,6 +491,15 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>>         if (ret <= 0)
>>                 goto put_old;
>>
>> +       /* We are going to replace instruction, update ref_ctr. */
>> +       if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
>> +               ret = update_ref_ctr(uprobe, mm, is_register ? 1 : -1);
>> +               if (ret)
>> +                       goto put_old;
>> +
>> +               ref_ctr_updated = 1;
>> +       }
>> +
>>         ret = anon_vma_prepare(vma);
>>         if (ret)
>>                 goto put_old;
>> @@ -337,6 +520,11 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>>
>>         if (unlikely(ret == -EAGAIN))
>>                 goto retry;
>> +
>> +       /* Revert back reference counter if instruction update failed. */
>> +       if (ret && is_register && ref_ctr_updated)
>> +               update_ref_ctr(uprobe, mm, -1);
>> +
>>         return ret;
>>  }
>>
>> @@ -378,8 +566,15 @@ static struct uprobe *get_uprobe(struct uprobe *uprobe)
>>
>>  static void put_uprobe(struct uprobe *uprobe)
>>  {
>> -       if (atomic_dec_and_test(&uprobe->ref))
>> +       if (atomic_dec_and_test(&uprobe->ref)) {
>> +               /*
>> +                * If application munmap(exec_vma) before uprobe_unregister()
>> +                * gets called, we don't get a chance to remove uprobe from
>> +                * delayed_uprobe_list from remove_breakpoint(). Do it here.
>> +                */
>> +               delayed_uprobe_remove(uprobe, NULL);
>>                 kfree(uprobe);
>> +       }
>>  }
>>
>>  static int match_uprobe(struct uprobe *l, struct uprobe *r)
>> @@ -484,7 +679,8 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
>>         return u;
>>  }
>>
>> -static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
>> +static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
>> +                                  loff_t ref_ctr_offset)
>>  {
>>         struct uprobe *uprobe, *cur_uprobe;
>>
>> @@ -494,6 +690,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
>>
>>         uprobe->inode = inode;
>>         uprobe->offset = offset;
>> +       uprobe->ref_ctr_offset = ref_ctr_offset;
>>         init_rwsem(&uprobe->register_rwsem);
>>         init_rwsem(&uprobe->consumer_rwsem);
>>
>> @@ -895,7 +1092,7 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
>>   * else return 0 (success)
>>   */
>>  static int __uprobe_register(struct inode *inode, loff_t offset,
>> -                            struct uprobe_consumer *uc)
>> +                            loff_t ref_ctr_offset, struct uprobe_consumer *uc)
>>  {
>>         struct uprobe *uprobe;
>>         int ret;
>> @@ -912,7 +1109,7 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
>>                 return -EINVAL;
>>
>>   retry:
>> -       uprobe = alloc_uprobe(inode, offset);
>> +       uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
>>         if (!uprobe)
>>                 return -ENOMEM;
>>         /*
>> @@ -938,10 +1135,17 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
>>  int uprobe_register(struct inode *inode, loff_t offset,
>>                     struct uprobe_consumer *uc)
>>  {
>> -       return __uprobe_register(inode, offset, uc);
>> +       return __uprobe_register(inode, offset, 0, uc);
>>  }
>>  EXPORT_SYMBOL_GPL(uprobe_register);
>>
>> +int uprobe_register_refctr(struct inode *inode, loff_t offset,
>> +                          loff_t ref_ctr_offset, struct uprobe_consumer *uc)
>> +{
>> +       return __uprobe_register(inode, offset, ref_ctr_offset, uc);
>> +}
>> +EXPORT_SYMBOL_GPL(uprobe_register_refctr);
>> +
>>  /*
>>   * uprobe_apply - unregister a already registered probe.
>>   * @inode: the file in which the probe has to be removed.
>> @@ -1060,6 +1264,35 @@ static void build_probe_list(struct inode *inode,
>>         spin_unlock(&uprobes_treelock);
>>  }
>>
>> +/* @vma contains reference counter, not the probed instruction. */
>> +static int delayed_ref_ctr_inc(struct vm_area_struct *vma)
>> +{
>> +       struct list_head *pos, *q;
>> +       struct delayed_uprobe *du;
>> +       unsigned long vaddr;
>> +       int ret = 0, err = 0;
>> +
>> +       mutex_lock(&delayed_uprobe_lock);
>> +       list_for_each_safe(pos, q, &delayed_uprobe_list) {
>> +               du = list_entry(pos, struct delayed_uprobe, list);
>> +
>> +               if (du->mm != vma->vm_mm ||
>> +                   !valid_ref_ctr_vma(du->uprobe, vma))
>> +                       continue;
>> +
>> +               vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
>> +               ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
>> +               if (ret) {
>> +                       update_ref_ctr_warn(du->uprobe, vma->vm_mm, 1);
>> +                       if (!err)
>> +                               err = ret;
>> +               }
>> +               delayed_uprobe_delete(du);
>> +       }
>> +       mutex_unlock(&delayed_uprobe_lock);
>> +       return err;
>> +}
>> +
>>  /*
>>   * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
>>   *
>> @@ -1072,7 +1305,15 @@ int uprobe_mmap(struct vm_area_struct *vma)
>>         struct uprobe *uprobe, *u;
>>         struct inode *inode;
>>
>> -       if (no_uprobe_events() || !valid_vma(vma, true))
>> +       if (no_uprobe_events())
>> +               return 0;
>> +
>> +       if (vma->vm_file &&
>> +           (vma->vm_flags & (VM_WRITE|VM_SHARED)) == VM_WRITE &&
>> +           test_bit(MMF_HAS_UPROBES, &vma->vm_mm->flags))
>> +               delayed_ref_ctr_inc(vma);
>> +
>> +       if (!valid_vma(vma, true))
>>                 return 0;
>>
>>         inode = file_inode(vma->vm_file);
>> @@ -1246,6 +1487,10 @@ void uprobe_clear_state(struct mm_struct *mm)
>>  {
>>         struct xol_area *area = mm->uprobes_state.xol_area;
>>
>> +       mutex_lock(&delayed_uprobe_lock);
>> +       delayed_uprobe_remove(NULL, mm);
>> +       mutex_unlock(&delayed_uprobe_lock);
>> +
>>         if (!area)
>>                 return;
>>
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index 2dad27809794..23689831f656 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -4620,7 +4620,7 @@ static const char readme_msg[] =
>>    "place (kretprobe): [<module>:]<symbol>[+<offset>]|<memaddr>\n"
>>  #endif
>>  #ifdef CONFIG_UPROBE_EVENTS
>> -       "\t    place: <path>:<offset>\n"
>> +  "   place (uprobe): <path>:<offset>[(ref_ctr_offset)]\n"
>>  #endif
>>         "\t     args: <name>=fetcharg[:type]\n"
>>         "\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
>> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
>> index ac02fafc9f1b..a7ef6c4ca16e 100644
>> --- a/kernel/trace/trace_uprobe.c
>> +++ b/kernel/trace/trace_uprobe.c
>> @@ -59,6 +59,7 @@ struct trace_uprobe {
>>         struct inode                    *inode;
>>         char                            *filename;
>>         unsigned long                   offset;
>> +       unsigned long                   ref_ctr_offset;
>>         unsigned long                   nhit;
>>         struct trace_probe              tp;
>>  };
>> @@ -364,10 +365,10 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
>>  static int create_trace_uprobe(int argc, char **argv)
>>  {
>>         struct trace_uprobe *tu;
>> -       char *arg, *event, *group, *filename;
>> +       char *arg, *event, *group, *filename, *rctr, *rctr_end;
>>         char buf[MAX_EVENT_NAME_LEN];
>>         struct path path;
>> -       unsigned long offset;
>> +       unsigned long offset, ref_ctr_offset;
>>         bool is_delete, is_return;
>>         int i, ret;
>>
>> @@ -376,6 +377,7 @@ static int create_trace_uprobe(int argc, char **argv)
>>         is_return = false;
>>         event = NULL;
>>         group = NULL;
>> +       ref_ctr_offset = 0;
>>
>>         /* argc must be >= 1 */
>>         if (argv[0][0] == '-')
>> @@ -450,6 +452,26 @@ static int create_trace_uprobe(int argc, char **argv)
>>                 goto fail_address_parse;
>>         }
>>
>> +       /* Parse reference counter offset if specified. */
>> +       rctr = strchr(arg, '(');
>> +       if (rctr) {
>> +               rctr_end = strchr(rctr, ')');
>> +               if (rctr > rctr_end || *(rctr_end + 1) != 0) {
>> +                       ret = -EINVAL;
>> +                       pr_info("Invalid reference counter offset.\n");
>> +                       goto fail_address_parse;
>> +               }
>> +
>> +               *rctr++ = '\0';
>> +               *rctr_end = '\0';
>> +               ret = kstrtoul(rctr, 0, &ref_ctr_offset);
>> +               if (ret) {
>> +                       pr_info("Invalid reference counter offset.\n");
>> +                       goto fail_address_parse;
>> +               }
>> +       }
>> +
>> +       /* Parse uprobe offset. */
>>         ret = kstrtoul(arg, 0, &offset);
>>         if (ret)
>>                 goto fail_address_parse;
>> @@ -484,6 +506,7 @@ static int create_trace_uprobe(int argc, char **argv)
>>                 goto fail_address_parse;
>>         }
>>         tu->offset = offset;
>> +       tu->ref_ctr_offset = ref_ctr_offset;
>>         tu->path = path;
>>         tu->filename = kstrdup(filename, GFP_KERNEL);
>>
>> @@ -602,6 +625,9 @@ static int probes_seq_show(struct seq_file *m, void *v)
>>                         trace_event_name(&tu->tp.call), tu->filename,
>>                         (int)(sizeof(void *) * 2), tu->offset);
>>
>> +       if (tu->ref_ctr_offset)
>> +               seq_printf(m, "(0x%lx)", tu->ref_ctr_offset);
>> +
>>         for (i = 0; i < tu->tp.nr_args; i++)
>>                 seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
>>
>> @@ -917,7 +943,13 @@ probe_event_enable(struct trace_uprobe *tu, struct trace_event_file *file,
>>
>>         tu->consumer.filter = filter;
>>         tu->inode = d_real_inode(tu->path.dentry);
>> -       ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
>> +       if (tu->ref_ctr_offset) {
>> +               ret = uprobe_register_refctr(tu->inode, tu->offset,
>> +                               tu->ref_ctr_offset, &tu->consumer);
>> +       } else {
>> +               ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
>> +       }
>> +
>>         if (ret)
>>                 goto err_buffer;
>>
>> --
>> 2.14.4
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 2/4] Uprobes/sdt: Prevent multiple reference counter for same uprobe
  2018-08-20  5:54   ` Song Liu
@ 2018-08-21 21:35     ` Song Liu
  0 siblings, 0 replies; 24+ messages in thread
From: Song Liu @ 2018-08-21 21:35 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Srikar Dronamraju, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On Sun, Aug 19, 2018 at 10:54 PM, Song Liu <liu.song.a23@gmail.com> wrote:
> On Sun, Aug 19, 2018 at 9:42 PM, Ravi Bangoria
> <ravi.bangoria@linux.ibm.com> wrote:
>> We assume to have only one reference counter for one uprobe.
>> Don't allow user to register multiple uprobes having same
>> inode+offset but different reference counter.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
>> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
>
> Reviewed-by: Song Liu <songliubraving@fb.com>

Reviewed-and-tested-by: Song Liu <songliubraving@fb.com>

>
>> ---
>>  kernel/events/uprobes.c | 19 +++++++++++++++++++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
>> index 35065febcb6c..ecee371a59c7 100644
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -679,6 +679,16 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
>>         return u;
>>  }
>>
>> +static void
>> +ref_ctr_mismatch_warn(struct uprobe *cur_uprobe, struct uprobe *uprobe)
>> +{
>> +       pr_warn("ref_ctr_offset mismatch. inode: 0x%lx offset: 0x%llx "
>> +               "ref_ctr_offset(old): 0x%llx ref_ctr_offset(new): 0x%llx\n",
>> +               uprobe->inode->i_ino, (unsigned long long) uprobe->offset,
>> +               (unsigned long long) cur_uprobe->ref_ctr_offset,
>> +               (unsigned long long) uprobe->ref_ctr_offset);
>> +}
>> +
>>  static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
>>                                    loff_t ref_ctr_offset)
>>  {
>> @@ -698,6 +708,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
>>         cur_uprobe = insert_uprobe(uprobe);
>>         /* a uprobe exists for this inode:offset combination */
>>         if (cur_uprobe) {
>> +               if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
>> +                       ref_ctr_mismatch_warn(cur_uprobe, uprobe);
>> +                       put_uprobe(cur_uprobe);
>> +                       kfree(uprobe);
>> +                       return ERR_PTR(-EINVAL);
>> +               }
>>                 kfree(uprobe);
>>                 uprobe = cur_uprobe;
>>         }
>> @@ -1112,6 +1128,9 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
>>         uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
>>         if (!uprobe)
>>                 return -ENOMEM;
>> +       if (IS_ERR(uprobe))
>> +               return PTR_ERR(uprobe);
>> +
>>         /*
>>          * We can race with uprobe_unregister()->delete_uprobe().
>>          * Check uprobe_is_active() and retry if it is false.
>> --
>> 2.14.4
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-20  4:42 ` [PATCH v9 1/4] " Ravi Bangoria
  2018-08-20  5:53   ` Song Liu
@ 2018-08-22 12:39   ` Srikar Dronamraju
  2018-08-28 19:07     ` Song Liu
  1 sibling, 1 reply; 24+ messages in thread
From: Srikar Dronamraju @ 2018-08-22 12:39 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, rostedt, mhiramat, liu.song.a23, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

* Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-08-20 10:12:47]:

> Userspace Statically Defined Tracepoints[1] are dtrace style markers
> inside userspace applications. Applications like PostgreSQL, MySQL,
> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> have these markers embedded in them. These markers are added by developer
> at important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. In case the overhead is more, execution of it can be
> omitted by runtime if() condition when no one is tracing on the marker:
> 
>     if (reference_counter > 0) {
>         Execute marker instructions;
>     }
> 
> Default value of reference counter is 0. Tracer has to increment the
> reference counter before tracing on a marker and decrement it when
> done with the tracing.
> 
> Implement the reference counter logic in core uprobe. User will be
> able to use it from trace_uprobe as well as from kernel module. New
> trace_uprobe definition with reference counter will now be:
> 
>     <path>:<offset>[(ref_ctr_offset)]
> 
> where ref_ctr_offset is an optional field. For kernel module, new
> variant of uprobe_register() has been introduced:
> 
>     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
> 
> No new variant for uprobe_unregister() because it's assumed to have
> only one reference counter for one uprobe.
> 
> [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> 
> Note: 'reference counter' is called as 'semaphore' in original Dtrace
> (or Systemtap, bcc and even in ELF) documentation and code. But the
> term 'semaphore' is misleading in this context. This is just a counter
> used to hold number of tracers tracing on a marker. This is not really
> used for any synchronization. So we are calling it a 'reference counter'
> in kernel / perf code.
> 


Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
> [Only trace_uprobe.c]
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> ---


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-22 12:39   ` Srikar Dronamraju
@ 2018-08-28 19:07     ` Song Liu
  2018-08-28 19:34       ` Steven Rostedt
  0 siblings, 1 reply; 24+ messages in thread
From: Song Liu @ 2018-08-28 19:07 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ravi Bangoria, Oleg Nesterov, Steven Rostedt, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

Hi all,

What's our plan with this work? Will this be routed via Steven's tree?

Thanks,
Song

On Wed, Aug 22, 2018 at 5:39 AM Srikar Dronamraju
<srikar@linux.vnet.ibm.com> wrote:
>
> * Ravi Bangoria <ravi.bangoria@linux.ibm.com> [2018-08-20 10:12:47]:
>
> > Userspace Statically Defined Tracepoints[1] are dtrace style markers
> > inside userspace applications. Applications like PostgreSQL, MySQL,
> > Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> > have these markers embedded in them. These markers are added by developer
> > at important places in the code. Each marker source expands to a single
> > nop instruction in the compiled code but there may be additional
> > overhead for computing the marker arguments which expands to couple of
> > instructions. In case the overhead is more, execution of it can be
> > omitted by runtime if() condition when no one is tracing on the marker:
> >
> >     if (reference_counter > 0) {
> >         Execute marker instructions;
> >     }
> >
> > Default value of reference counter is 0. Tracer has to increment the
> > reference counter before tracing on a marker and decrement it when
> > done with the tracing.
> >
> > Implement the reference counter logic in core uprobe. User will be
> > able to use it from trace_uprobe as well as from kernel module. New
> > trace_uprobe definition with reference counter will now be:
> >
> >     <path>:<offset>[(ref_ctr_offset)]
> >
> > where ref_ctr_offset is an optional field. For kernel module, new
> > variant of uprobe_register() has been introduced:
> >
> >     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
> >
> > No new variant for uprobe_unregister() because it's assumed to have
> > only one reference counter for one uprobe.
> >
> > [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> >
> > Note: 'reference counter' is called as 'semaphore' in original Dtrace
> > (or Systemtap, bcc and even in ELF) documentation and code. But the
> > term 'semaphore' is misleading in this context. This is just a counter
> > used to hold number of tracers tracing on a marker. This is not really
> > used for any synchronization. So we are calling it a 'reference counter'
> > in kernel / perf code.
> >
>
>
> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>
> > Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> > Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
> > [Only trace_uprobe.c]
> > Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> > ---
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)
  2018-08-28 19:07     ` Song Liu
@ 2018-08-28 19:34       ` Steven Rostedt
  0 siblings, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2018-08-28 19:34 UTC (permalink / raw)
  To: Song Liu
  Cc: Srikar Dronamraju, Ravi Bangoria, Oleg Nesterov, mhiramat,
	Peter Zijlstra, mingo, acme, alexander.shishkin, jolsa, namhyung,
	open list, ananth, Alexis Berlemont, naveen.n.rao,
	linux-arm-kernel, linux-mips, linux, ralf, paul.burton

On Tue, 28 Aug 2018 12:07:30 -0700
Song Liu <liu.song.a23@gmail.com> wrote:

> Hi all,
> 
> What's our plan with this work? Will this be routed via Steven's tree?
> 
> 

I can start pulling these in and testing them.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 4/4] perf probe: Support SDT markers having reference counter (semaphore)
  2018-08-20  4:42 ` [PATCH v9 4/4] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
  2018-08-21 21:33   ` Song Liu
@ 2018-08-30 18:45   ` Steven Rostedt
  2018-08-30 18:50     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2018-08-30 18:45 UTC (permalink / raw)
  To: acme
  Cc: Ravi Bangoria, srikar, oleg, mhiramat, liu.song.a23, peterz,
	mingo, alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton


Arnaldo,

I'm going to be playing with some of the probe code which may conflict
with these patches, so I would like to pull these in my tree. But this
patch requires an Acked-by from you before I can pull it in.

You OK with this?

Thanks!

-- Steve


On Mon, 20 Aug 2018 10:12:50 +0530
Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:

> With this, perf buildid-cache will save SDT markers with reference
> counter in probe cache. Perf probe will be able to probe markers
> having reference counter. Ex,
> 
>   # readelf -n /tmp/tick | grep -A1 loop2
>     Name: loop2
>     ... Semaphore: 0x0000000010020036
> 
>   # ./perf buildid-cache --add /tmp/tick
>   # ./perf probe sdt_tick:loop2
>   # ./perf stat -e sdt_tick:loop2 /tmp/tick
>     hi: 0
>     hi: 1
>     hi: 2
>     ^C
>      Performance counter stats for '/tmp/tick':
>                  3      sdt_tick:loop2
>        2.561851452 seconds time elapsed
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 4/4] perf probe: Support SDT markers having reference counter (semaphore)
  2018-08-30 18:45   ` Steven Rostedt
@ 2018-08-30 18:50     ` Arnaldo Carvalho de Melo
  2018-08-30 19:43       ` Steven Rostedt
  0 siblings, 1 reply; 24+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-08-30 18:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ravi Bangoria, srikar, oleg, mhiramat, liu.song.a23, peterz,
	mingo, alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

Em Thu, Aug 30, 2018 at 02:45:31PM -0400, Steven Rostedt escreveu:
> 
> Arnaldo,
> 
> I'm going to be playing with some of the probe code which may conflict
> with these patches, so I would like to pull these in my tree. But this
> patch requires an Acked-by from you before I can pull it in.
> 
> You OK with this?

Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 
> Thanks!
> 
> -- Steve
> 
> 
> On Mon, 20 Aug 2018 10:12:50 +0530
> Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:
> 
> > With this, perf buildid-cache will save SDT markers with reference
> > counter in probe cache. Perf probe will be able to probe markers
> > having reference counter. Ex,
> > 
> >   # readelf -n /tmp/tick | grep -A1 loop2
> >     Name: loop2
> >     ... Semaphore: 0x0000000010020036
> > 
> >   # ./perf buildid-cache --add /tmp/tick
> >   # ./perf probe sdt_tick:loop2
> >   # ./perf stat -e sdt_tick:loop2 /tmp/tick
> >     hi: 0
> >     hi: 1
> >     hi: 2
> >     ^C
> >      Performance counter stats for '/tmp/tick':
> >                  3      sdt_tick:loop2
> >        2.561851452 seconds time elapsed
> > 
> > Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> > Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
> > Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v9 4/4] perf probe: Support SDT markers having reference counter (semaphore)
  2018-08-30 18:50     ` Arnaldo Carvalho de Melo
@ 2018-08-30 19:43       ` Steven Rostedt
  0 siblings, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2018-08-30 19:43 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ravi Bangoria, srikar, oleg, mhiramat, liu.song.a23, peterz,
	mingo, alexander.shishkin, jolsa, namhyung, linux-kernel, ananth,
	alexis.berlemont, naveen.n.rao, linux-arm-kernel, linux-mips,
	linux, ralf, paul.burton

On Thu, 30 Aug 2018 15:50:27 -0300
Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Em Thu, Aug 30, 2018 at 02:45:31PM -0400, Steven Rostedt escreveu:
> > 
> > Arnaldo,
> > 
> > I'm going to be playing with some of the probe code which may conflict
> > with these patches, so I would like to pull these in my tree. But this
> > patch requires an Acked-by from you before I can pull it in.
> > 
> > You OK with this?  
> 
> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>  

Awesome, thanks Arnaldo,

-- Steve

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-08-30 19:43 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-20  4:42 [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
2018-08-20  4:42 ` [PATCH v9 1/4] " Ravi Bangoria
2018-08-20  5:53   ` Song Liu
2018-08-21 21:35     ` Song Liu
2018-08-22 12:39   ` Srikar Dronamraju
2018-08-28 19:07     ` Song Liu
2018-08-28 19:34       ` Steven Rostedt
2018-08-20  4:42 ` [PATCH v9 2/4] Uprobes/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
2018-08-20  5:54   ` Song Liu
2018-08-21 21:35     ` Song Liu
2018-08-20  4:42 ` [PATCH v9 3/4] trace_uprobe/sdt: " Ravi Bangoria
2018-08-21 20:46   ` Song Liu
2018-08-20  4:42 ` [PATCH v9 4/4] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
2018-08-21 21:33   ` Song Liu
2018-08-30 18:45   ` Steven Rostedt
2018-08-30 18:50     ` Arnaldo Carvalho de Melo
2018-08-30 19:43       ` Steven Rostedt
2018-08-20 17:38 ` [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore) Song Liu
2018-08-21  4:42   ` Ravi Bangoria
2018-08-21  4:55     ` Song Liu
2018-08-21  5:23   ` Ravi Bangoria
2018-08-21  6:20     ` Ravi Bangoria
2018-08-21  7:34   ` Naveen N. Rao
2018-08-21 11:58     ` Ravi Bangoria

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).