All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/4] trace_uprobe: Support SDT markers having semaphore
@ 2018-02-28  7:53 Ravi Bangoria
  2018-02-28  7:53 ` [RFC 1/4] Uprobe: Rename map_info to uprobe_map_info Ravi Bangoria
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Ravi Bangoria @ 2018-02-28  7:53 UTC (permalink / raw)
  To: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, mhiramat, ananth, naveen.n.rao, srikar,
	oleg
  Cc: Ravi Bangoria

Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. These markers are added by developer at
important places in the code. Each marker source expands to a single
nop instruction in the compiled code but there may be additional
overhead for computing the marker arguments which expands to couple of
instructions. If this computaion is quite more, execution of it can be
ommited by runtime if() condition when no one is tracing on the marker:

    if (semaphore > 0) {
        Execute marker instructions;
    }

Default value of semaphore is 0. Tracer has to increment the semaphore
before recording on a marker and decrement it at the end of tracing.

Currently, perf tool has limited supports for SDT markers. I.e. it
can not trace markers surrounded by semaphore. Also, it's not easy
to add semaphore flip logic in userspace tool like perf, so basic
idea for this patchset is to add semaphore flip logic in the
trace_uprobe infrastructure. Ex,[2]

  # cat tick.c
    ... 
    for (i = 0; i < 100; i++) {
	DTRACE_PROBE1(tick, loop1, i);
        if (TICK_LOOP2_ENABLED()) {
            DTRACE_PROBE1(tick, loop2, i); 
        }
        printf("hi: %d\n", i); 
        sleep(1);
    }   
    ... 

Here tick:loop1 is marker without semaphore where as tick:loop2 is
surrounded by semaphore.


  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2

  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
             3      sdt_tick:loop1
             0      sdt_tick:loop2
     2.747086086 seconds time elapsed


Perf failed to record data for tick:loop2. Same experiment with this
patch series:


  # readelf -n ./tick
  Provider: tick
  Name: loop2
  ... Semaphore: 0x0000000010020036

  # readelf -SW ./tick | grep probes
  [25] .probes           PROGBITS        0000000010020034 010034


Semaphore offset is 0x10036. I don't have educated 'perf probe'
about semaphore. So instead of using 'perf probe' command, I'm
manually adding entry in the <tracefs>/uprobe_events file.
Special char * denotes semaphore offset.


  # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4 *0x10036" > uprobe_events

  # perf stat -e sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  hi: 3
  ^C
  Performance counter stats for '/tmp/tick':
              4      sdt_tick:loop2                                              
     3.359047827 seconds time elapsed


Feedback?

TODO:
 - Educate perf tool about semaphore.
 - perf_event_open() now suppoers {k,u}probe event creation[3]. If we
   can supply semaphore offset in perf_event_attr, perf_event_open()
   can be educated to probe SDT marker having semaphore. Though, both
   config1 and config2 are already being used for uprobe and I don't
   see any other attribute which I can use for semaphore offset. Can
   we introduce one more config there? config3?

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
[2] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506
[3] https://lkml.org/lkml/2017/12/6/976


Ravi Bangoria (4):
  Uprobe: Rename map_info to uprobe_map_info
  Uprobe: Export few functions / data structures
  trace_uprobe: Support SDT markers having semaphore
  trace_uprobe: Fix multiple update of same semaphores

 include/linux/uprobes.h     |  25 +++++
 kernel/events/uprobes.c     |  43 ++++----
 kernel/trace/trace_uprobe.c | 244 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 290 insertions(+), 22 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC 1/4] Uprobe: Rename map_info to uprobe_map_info
  2018-02-28  7:53 [RFC 0/4] trace_uprobe: Support SDT markers having semaphore Ravi Bangoria
@ 2018-02-28  7:53 ` Ravi Bangoria
  2018-02-28 12:09   ` Srikar Dronamraju
  2018-02-28  7:53 ` [RFC 2/4] Uprobe: Export few functions / data structures Ravi Bangoria
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Ravi Bangoria @ 2018-02-28  7:53 UTC (permalink / raw)
  To: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, mhiramat, ananth, naveen.n.rao, srikar,
	oleg
  Cc: Ravi Bangoria

map_info is very generic name, rename it to uprobe_map_info.
Renaming will help to export this structure outside of the
file.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 kernel/events/uprobes.c | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 267f6ef..fcce25dd 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -705,27 +705,29 @@ static void delete_uprobe(struct uprobe *uprobe)
 	put_uprobe(uprobe);
 }
 
-struct map_info {
-	struct map_info *next;
+struct uprobe_map_info {
+	struct uprobe_map_info *next;
 	struct mm_struct *mm;
 	unsigned long vaddr;
 };
 
-static inline struct map_info *free_map_info(struct map_info *info)
+static inline struct uprobe_map_info *
+free_uprobe_map_info(struct uprobe_map_info *info)
 {
-	struct map_info *next = info->next;
+	struct uprobe_map_info *next = info->next;
 	kfree(info);
 	return next;
 }
 
-static struct map_info *
-build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
+static struct uprobe_map_info *
+build_uprobe_map_info(struct address_space *mapping, loff_t offset,
+		      bool is_register)
 {
 	unsigned long pgoff = offset >> PAGE_SHIFT;
 	struct vm_area_struct *vma;
-	struct map_info *curr = NULL;
-	struct map_info *prev = NULL;
-	struct map_info *info;
+	struct uprobe_map_info *curr = NULL;
+	struct uprobe_map_info *prev = NULL;
+	struct uprobe_map_info *info;
 	int more = 0;
 
  again:
@@ -739,7 +741,7 @@ static inline struct map_info *free_map_info(struct map_info *info)
 			 * Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion through
 			 * reclaim. This is optimistic, no harm done if it fails.
 			 */
-			prev = kmalloc(sizeof(struct map_info),
+			prev = kmalloc(sizeof(struct uprobe_map_info),
 					GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
 			if (prev)
 				prev->next = NULL;
@@ -772,7 +774,7 @@ static inline struct map_info *free_map_info(struct map_info *info)
 	}
 
 	do {
-		info = kmalloc(sizeof(struct map_info), GFP_KERNEL);
+		info = kmalloc(sizeof(struct uprobe_map_info), GFP_KERNEL);
 		if (!info) {
 			curr = ERR_PTR(-ENOMEM);
 			goto out;
@@ -784,7 +786,7 @@ static inline struct map_info *free_map_info(struct map_info *info)
 	goto again;
  out:
 	while (prev)
-		prev = free_map_info(prev);
+		prev = free_uprobe_map_info(prev);
 	return curr;
 }
 
@@ -792,11 +794,11 @@ static inline struct map_info *free_map_info(struct map_info *info)
 register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
 {
 	bool is_register = !!new;
-	struct map_info *info;
+	struct uprobe_map_info *info;
 	int err = 0;
 
 	percpu_down_write(&dup_mmap_sem);
-	info = build_map_info(uprobe->inode->i_mapping,
+	info = build_uprobe_map_info(uprobe->inode->i_mapping,
 					uprobe->offset, is_register);
 	if (IS_ERR(info)) {
 		err = PTR_ERR(info);
@@ -835,7 +837,7 @@ static inline struct map_info *free_map_info(struct map_info *info)
 		up_write(&mm->mmap_sem);
  free:
 		mmput(mm);
-		info = free_map_info(info);
+		info = free_uprobe_map_info(info);
 	}
  out:
 	percpu_up_write(&dup_mmap_sem);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC 2/4] Uprobe: Export few functions / data structures
  2018-02-28  7:53 [RFC 0/4] trace_uprobe: Support SDT markers having semaphore Ravi Bangoria
  2018-02-28  7:53 ` [RFC 1/4] Uprobe: Rename map_info to uprobe_map_info Ravi Bangoria
@ 2018-02-28  7:53 ` Ravi Bangoria
  2018-02-28 12:24   ` Srikar Dronamraju
  2018-02-28  7:53 ` [RFC 3/4] trace_uprobe: Support SDT markers having semaphore Ravi Bangoria
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Ravi Bangoria @ 2018-02-28  7:53 UTC (permalink / raw)
  To: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, mhiramat, ananth, naveen.n.rao, srikar,
	oleg
  Cc: Ravi Bangoria

These functions and data structures will be used by other files
in later patches.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 include/linux/uprobes.h | 23 +++++++++++++++++++++++
 kernel/events/uprobes.c | 20 ++++++--------------
 2 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0a294e9..06c169e 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -115,6 +115,12 @@ struct uprobes_state {
 	struct xol_area		*xol_area;
 };
 
+struct uprobe_map_info {
+	struct uprobe_map_info *next;
+	struct mm_struct *mm;
+	unsigned long vaddr;
+};
+
 extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern bool is_swbp_insn(uprobe_opcode_t *insn);
@@ -149,6 +155,11 @@ struct uprobes_state {
 extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
 extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
 					 void *src, unsigned long len);
+unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset);
+void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len);
+void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len);
+struct uprobe_map_info *free_uprobe_map_info(struct uprobe_map_info *info);
+
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
@@ -203,5 +214,17 @@ static inline void uprobe_copy_process(struct task_struct *t, unsigned long flag
 static inline void uprobe_clear_state(struct mm_struct *mm)
 {
 }
+unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
+{
+}
+void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len)
+{
+}
+void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
+{
+}
+struct uprobe_map_info *free_uprobe_map_info(struct uprobe_map_info *info)
+{
+}
 #endif /* !CONFIG_UPROBES */
 #endif	/* _LINUX_UPROBES_H */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index fcce25dd..56dd7af 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -130,7 +130,7 @@ static bool valid_vma(struct vm_area_struct *vma, bool is_register)
 	return vma->vm_file && (vma->vm_flags & flags) == VM_MAYEXEC;
 }
 
-static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
+unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
 {
 	return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
 }
@@ -240,14 +240,14 @@ bool __weak is_trap_insn(uprobe_opcode_t *insn)
 	return is_swbp_insn(insn);
 }
 
-static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len)
+void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len)
 {
 	void *kaddr = kmap_atomic(page);
 	memcpy(dst, kaddr + (vaddr & ~PAGE_MASK), len);
 	kunmap_atomic(kaddr);
 }
 
-static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
+void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
 {
 	void *kaddr = kmap_atomic(page);
 	memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
@@ -705,23 +705,15 @@ static void delete_uprobe(struct uprobe *uprobe)
 	put_uprobe(uprobe);
 }
 
-struct uprobe_map_info {
-	struct uprobe_map_info *next;
-	struct mm_struct *mm;
-	unsigned long vaddr;
-};
-
-static inline struct uprobe_map_info *
-free_uprobe_map_info(struct uprobe_map_info *info)
+struct uprobe_map_info *free_uprobe_map_info(struct uprobe_map_info *info)
 {
 	struct uprobe_map_info *next = info->next;
 	kfree(info);
 	return next;
 }
 
-static struct uprobe_map_info *
-build_uprobe_map_info(struct address_space *mapping, loff_t offset,
-		      bool is_register)
+struct uprobe_map_info *build_uprobe_map_info(struct address_space *mapping,
+					      loff_t offset, bool is_register)
 {
 	unsigned long pgoff = offset >> PAGE_SHIFT;
 	struct vm_area_struct *vma;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC 3/4] trace_uprobe: Support SDT markers having semaphore
  2018-02-28  7:53 [RFC 0/4] trace_uprobe: Support SDT markers having semaphore Ravi Bangoria
  2018-02-28  7:53 ` [RFC 1/4] Uprobe: Rename map_info to uprobe_map_info Ravi Bangoria
  2018-02-28  7:53 ` [RFC 2/4] Uprobe: Export few functions / data structures Ravi Bangoria
@ 2018-02-28  7:53 ` Ravi Bangoria
  2018-03-01 14:07   ` Masami Hiramatsu
  2018-03-06 11:59   ` Peter Zijlstra
  2018-02-28  7:53 ` [RFC 4/4] trace_uprobe: Fix multiple update of same semaphores Ravi Bangoria
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Ravi Bangoria @ 2018-02-28  7:53 UTC (permalink / raw)
  To: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, mhiramat, ananth, naveen.n.rao, srikar,
	oleg
  Cc: Ravi Bangoria

Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. These markers are added by developer at
important places in the code. Each marker source expands to a single
nop instruction in the compiled code but there may be additional
overhead for computing the marker arguments which expands to couple of
instructions. If this computaion is quite more, execution of it can be
ommited by runtime if() condition when no one is tracing on the marker:

    if (semaphore > 0) {
        Execute marker instructions;
    }   

Default value of semaphore is 0. Tracer has to increment the semaphore
before recording on a marker and decrement it at the end of tracing.

Implement the semaphore flip logic in trace_uprobe, leaving core uprobe
infrastructure as is, except one new callback from uprobe_mmap() to
trace_uprobe.

There are two major scenarios while enabling the marker,
 1. Trace already running process. In this case, find all suitable
    processes and increment the semaphore value in them.
 2. Trace is already enabled when target binary is executed. In this
    case, all mmaps will get notified to trace_uprobe and trace_uprobe
    will increment the semaphore if corresponding uprobe is enabled.

At the time of disabling probes, decrement semaphore in all existing
target processes.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 include/linux/uprobes.h     |   2 +
 kernel/events/uprobes.c     |   5 ++
 kernel/trace/trace_uprobe.c | 145 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 152 insertions(+)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 06c169e..04e9d57 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -121,6 +121,8 @@ struct uprobe_map_info {
 	unsigned long vaddr;
 };
 
+extern void (*uprobe_mmap_callback)(struct vm_area_struct *vma);
+
 extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern bool is_swbp_insn(uprobe_opcode_t *insn);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 56dd7af..81d8aaf 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1051,6 +1051,8 @@ static void build_probe_list(struct inode *inode,
 	spin_unlock(&uprobes_treelock);
 }
 
+void (*uprobe_mmap_callback)(struct vm_area_struct *vma) = NULL;
+
 /*
  * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
  *
@@ -1063,6 +1065,9 @@ int uprobe_mmap(struct vm_area_struct *vma)
 	struct uprobe *uprobe, *u;
 	struct inode *inode;
 
+	if (vma->vm_flags & VM_WRITE && uprobe_mmap_callback)
+		uprobe_mmap_callback(vma);
+
 	if (no_uprobe_events() || !valid_vma(vma, true))
 		return 0;
 
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 40592e7b..d14aafc 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -25,6 +25,7 @@
 #include <linux/namei.h>
 #include <linux/string.h>
 #include <linux/rculist.h>
+#include <linux/sched/mm.h>
 
 #include "trace_probe.h"
 
@@ -58,6 +59,7 @@ struct trace_uprobe {
 	struct inode			*inode;
 	char				*filename;
 	unsigned long			offset;
+	unsigned long			sdt_offset; /* sdt semaphore offset */
 	unsigned long			nhit;
 	struct trace_probe		tp;
 };
@@ -502,6 +504,16 @@ static int create_trace_uprobe(int argc, char **argv)
 	for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
 		struct probe_arg *parg = &tu->tp.args[i];
 
+		/* This is not really an argument. */
+		if (argv[i][0] == '*') {
+			ret = kstrtoul(&(argv[i][1]), 0, &tu->sdt_offset);
+			if (ret) {
+				pr_info("Invalid semaphore address.\n");
+				goto error;
+			}
+			continue;
+		}
+
 		/* Increment count for freeing args in error case */
 		tu->tp.nr_args++;
 
@@ -894,6 +906,131 @@ static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func,
 	return trace_handle_return(s);
 }
 
+static bool sdt_valid_vma(struct trace_uprobe *tu, struct vm_area_struct *vma)
+{
+	unsigned long vaddr = offset_to_vaddr(vma, tu->sdt_offset);
+
+	return tu->sdt_offset &&
+		vma->vm_file &&
+		file_inode(vma->vm_file) == tu->inode &&
+		vma->vm_flags & VM_WRITE &&
+		vma->vm_start <= vaddr &&
+		vma->vm_end > vaddr;
+}
+
+static struct vm_area_struct *
+sdt_find_vma(struct mm_struct *mm, struct trace_uprobe *tu)
+{
+	struct vm_area_struct *tmp;
+
+	for (tmp = mm->mmap; tmp != NULL; tmp = tmp->vm_next)
+		if (sdt_valid_vma(tu, tmp))
+			return tmp;
+
+	return NULL;
+}
+
+static int
+sdt_update_sem(struct mm_struct *mm, unsigned long vaddr, short val)
+{
+	struct page *page;
+	struct vm_area_struct *vma;
+	int ret = 0;
+	unsigned short orig = 0;
+
+	if (vaddr == 0)
+		return -EINVAL;
+
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
+		FOLL_FORCE | FOLL_WRITE, &page, &vma, NULL);
+	if (ret <= 0)
+		return ret;
+
+	copy_from_page(page, vaddr, &orig, sizeof(orig));
+	orig += val;
+	copy_to_page(page, vaddr, &orig, sizeof(orig));
+	put_page(page);
+	return 0;
+}
+
+/*
+ * TODO: Adding this defination in include/linux/uprobes.h throws
+ * warnings about address_sapce. Adding it here for the time being.
+ */
+struct uprobe_map_info *build_uprobe_map_info(struct address_space *mapping, loff_t offset, bool is_register);
+
+static void sdt_increment_sem(struct trace_uprobe *tu)
+{
+	struct uprobe_map_info *info;
+	struct vm_area_struct *vma;
+	unsigned long vaddr;
+
+	uprobe_start_dup_mmap();
+	info = build_uprobe_map_info(tu->inode->i_mapping, tu->sdt_offset, false);
+	if (IS_ERR(info))
+		goto out;
+
+	while (info) {
+		down_write(&info->mm->mmap_sem);
+		vma = sdt_find_vma(info->mm, tu);
+		if (!vma)
+			goto cont;
+
+		vaddr = offset_to_vaddr(vma, tu->sdt_offset);
+		sdt_update_sem(info->mm, vaddr, 1);
+
+cont:
+		up_write(&info->mm->mmap_sem);
+		mmput(info->mm);
+		info = free_uprobe_map_info(info);
+	}
+
+out:
+	uprobe_end_dup_mmap();
+}
+
+/* Called with down_write(&vma->vm_mm->mmap_sem) */
+void trace_uprobe_mmap_callback(struct vm_area_struct *vma)
+{
+	struct trace_uprobe *tu;
+	unsigned long vaddr;
+
+	mutex_lock(&uprobe_lock);
+	list_for_each_entry(tu, &uprobe_list, list) {
+		if (!sdt_valid_vma(tu, vma) ||
+		    !trace_probe_is_enabled(&tu->tp))
+			continue;
+
+		vaddr = offset_to_vaddr(vma, tu->sdt_offset);
+		sdt_update_sem(vma->vm_mm, vaddr, 1);
+	}
+	mutex_unlock(&uprobe_lock);
+}
+
+static void sdt_decrement_sem(struct trace_uprobe *tu)
+{
+	struct vm_area_struct *vma;
+	unsigned long vaddr;
+	struct uprobe_map_info *info;
+
+	info = build_uprobe_map_info(tu->inode->i_mapping, tu->sdt_offset, false);
+	if (IS_ERR(info))
+		return;
+
+	while (info) {
+		down_write(&info->mm->mmap_sem);
+		vma = sdt_find_vma(info->mm, tu);
+		if (vma) {
+			vaddr = offset_to_vaddr(vma, tu->sdt_offset);
+			sdt_update_sem(info->mm, vaddr, -1);
+		}
+		up_write(&info->mm->mmap_sem);
+
+		mmput(info->mm);
+		info = free_uprobe_map_info(info);
+	}
+}
+
 typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 				enum uprobe_filter_ctx ctx,
 				struct mm_struct *mm);
@@ -939,6 +1076,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 	if (ret)
 		goto err_buffer;
 
+	if (tu->sdt_offset)
+		sdt_increment_sem(tu);
+
 	return 0;
 
  err_buffer:
@@ -979,6 +1119,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
+	if (tu->sdt_offset)
+		sdt_decrement_sem(tu);
+
 	uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
 	tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
 
@@ -1353,6 +1496,8 @@ static __init int init_uprobe_trace(void)
 	/* Profile interface */
 	trace_create_file("uprobe_profile", 0444, d_tracer,
 				    NULL, &uprobe_profile_ops);
+
+	uprobe_mmap_callback = trace_uprobe_mmap_callback;
 	return 0;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC 4/4] trace_uprobe: Fix multiple update of same semaphores
  2018-02-28  7:53 [RFC 0/4] trace_uprobe: Support SDT markers having semaphore Ravi Bangoria
                   ` (2 preceding siblings ...)
  2018-02-28  7:53 ` [RFC 3/4] trace_uprobe: Support SDT markers having semaphore Ravi Bangoria
@ 2018-02-28  7:53 ` Ravi Bangoria
  2018-02-28 12:06 ` [RFC 0/4] trace_uprobe: Support SDT markers having semaphore Srikar Dronamraju
  2018-02-28 14:25 ` Masami Hiramatsu
  5 siblings, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2018-02-28  7:53 UTC (permalink / raw)
  To: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, mhiramat, ananth, naveen.n.rao, srikar,
	oleg
  Cc: Ravi Bangoria

For tiny binaries/libraries, different mmap regions points to
the same file portion. In such cases, we may increment semaphore
multiple times. But while de-registration, semaphore will get
decremented only once, leaving semaphore > 0 even if no one is
tracing on that marker.

Ensure increment and decrement happens in sync by keeping list
of mms in trace_uprobe. Increment semaphore only if mm is not
present in the list and decrement only if mm is present in the
list.

Example

  # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4 *0x10036" > uprobe_events

Before patch:

  # echo 1 > events/sdt_tick/loop2/enable
  # ./Workspace/sdt_prog/tick &
  # dd if=/proc//mem bs=1 count=1 skip=268566582 2>/dev/null | xxd
    0000000: 02                                       .

  # echo 0 > events/sdt_tick/loop2/enable
  # dd if=/proc//mem bs=1 count=1 skip=268566582 2>/dev/null | xxd
    0000000: 01                                       .

After patch:

  # echo 1 > events/sdt_tick/loop2/enable
  # ./Workspace/sdt_prog/tick &
  # dd if=/proc//mem bs=1 count=1 skip=268566582 2>/dev/null | xxd
    0000000: 01                                       .

  # echo 0 > events/sdt_tick/loop2/enable
  # dd if=/proc//mem bs=1 count=1 skip=268566582 2>/dev/null | xxd
    0000000: 00                                       .

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 kernel/trace/trace_uprobe.c | 105 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 102 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index d14aafc..3f1e8bd 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -49,6 +49,11 @@ struct trace_uprobe_filter {
 	struct list_head	perf_events;
 };
 
+struct sdt_mm_list {
+	struct mm_struct *mm;
+	struct sdt_mm_list *next;
+};
+
 /*
  * uprobe event core functions
  */
@@ -60,6 +65,8 @@ struct trace_uprobe {
 	char				*filename;
 	unsigned long			offset;
 	unsigned long			sdt_offset; /* sdt semaphore offset */
+	struct sdt_mm_list		*sml;
+	struct rw_semaphore		sml_rw_sem;
 	unsigned long			nhit;
 	struct trace_probe		tp;
 };
@@ -273,6 +280,7 @@ static inline bool is_ret_probe(struct trace_uprobe *tu)
 	if (is_ret)
 		tu->consumer.ret_handler = uretprobe_dispatcher;
 	init_trace_uprobe_filter(&tu->filter);
+	init_rwsem(&tu->sml_rw_sem);
 	return tu;
 
 error:
@@ -953,6 +961,75 @@ static bool sdt_valid_vma(struct trace_uprobe *tu, struct vm_area_struct *vma)
 	return 0;
 }
 
+static bool sdt_check_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
+{
+	struct sdt_mm_list *tmp = tu->sml;
+
+	if (!tu->sml || !mm)
+		return false;
+
+	while (tmp) {
+		if (tmp->mm == mm)
+			return true;
+		tmp = tmp->next;
+	}
+
+	return false;
+}
+
+static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
+{
+	struct sdt_mm_list *tmp;
+
+	tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
+	if (!tmp) {
+		pr_info("sdt_add_mm_list failed.\n");
+		return;
+	}
+	tmp->mm = mm;
+	tmp->next = tu->sml;
+	tu->sml = tmp;
+}
+
+static void sdt_del_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
+{
+	struct sdt_mm_list *prev, *curr;
+
+	if (!tu->sml)
+		return;
+
+	if (tu->sml->mm == mm) {
+		curr = tu->sml;
+		tu->sml = tu->sml->next;
+		kfree(curr);
+		return;
+	}
+
+	prev = tu->sml;
+	curr = tu->sml->next;
+	while (curr) {
+		if (curr->mm == mm) {
+			prev->next = curr->next;
+			kfree(curr);
+			return;
+		}
+		prev = curr;
+		curr = curr->next;
+	}
+}
+
+static void sdt_flush_mm_list(struct trace_uprobe *tu)
+{
+	struct sdt_mm_list *next, *curr = tu->sml;
+
+	while (curr) {
+		next = curr->next;
+		kfree(curr);
+		curr = next;
+	}
+	tu->sml = NULL;
+}
+
 /*
  * TODO: Adding this defination in include/linux/uprobes.h throws
  * warnings about address_sapce. Adding it here for the time being.
@@ -970,20 +1047,26 @@ static void sdt_increment_sem(struct trace_uprobe *tu)
 	if (IS_ERR(info))
 		goto out;
 
+	down_write(&tu->sml_rw_sem);
 	while (info) {
 		down_write(&info->mm->mmap_sem);
 		vma = sdt_find_vma(info->mm, tu);
 		if (!vma)
 			goto cont;
 
+		if (sdt_check_mm_list(tu, info->mm))
+			goto cont;
+
 		vaddr = offset_to_vaddr(vma, tu->sdt_offset);
-		sdt_update_sem(info->mm, vaddr, 1);
+		if (!sdt_update_sem(info->mm, vaddr, 1))
+			sdt_add_mm_list(tu, info->mm);
 
 cont:
 		up_write(&info->mm->mmap_sem);
 		mmput(info->mm);
 		info = free_uprobe_map_info(info);
 	}
+	up_write(&tu->sml_rw_sem);
 
 out:
 	uprobe_end_dup_mmap();
@@ -1001,8 +1084,16 @@ void trace_uprobe_mmap_callback(struct vm_area_struct *vma)
 		    !trace_probe_is_enabled(&tu->tp))
 			continue;
 
+		down_write(&tu->sml_rw_sem);
+		if (sdt_check_mm_list(tu, vma->vm_mm))
+			goto cont;
+
 		vaddr = offset_to_vaddr(vma, tu->sdt_offset);
-		sdt_update_sem(vma->vm_mm, vaddr, 1);
+		if (!sdt_update_sem(vma->vm_mm, vaddr, 1))
+			sdt_add_mm_list(tu, vma->vm_mm);
+
+cont:
+		up_write(&tu->sml_rw_sem);
 	}
 	mutex_unlock(&uprobe_lock);
 }
@@ -1017,7 +1108,11 @@ static void sdt_decrement_sem(struct trace_uprobe *tu)
 	if (IS_ERR(info))
 		return;
 
+	down_write(&tu->sml_rw_sem);
 	while (info) {
+		if (!sdt_check_mm_list(tu, info->mm))
+			goto cont;
+
 		down_write(&info->mm->mmap_sem);
 		vma = sdt_find_vma(info->mm, tu);
 		if (vma) {
@@ -1025,10 +1120,14 @@ static void sdt_decrement_sem(struct trace_uprobe *tu)
 			sdt_update_sem(info->mm, vaddr, -1);
 		}
 		up_write(&info->mm->mmap_sem);
-
+		sdt_del_mm_list(tu, info->mm);
+
+cont:
 		mmput(info->mm);
 		info = free_uprobe_map_info(info);
 	}
+	sdt_flush_mm_list(tu);
+	up_write(&tu->sml_rw_sem);
 }
 
 typedef bool (*filter_func_t)(struct uprobe_consumer *self,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC 0/4] trace_uprobe: Support SDT markers having semaphore
  2018-02-28  7:53 [RFC 0/4] trace_uprobe: Support SDT markers having semaphore Ravi Bangoria
                   ` (3 preceding siblings ...)
  2018-02-28  7:53 ` [RFC 4/4] trace_uprobe: Fix multiple update of same semaphores Ravi Bangoria
@ 2018-02-28 12:06 ` Srikar Dronamraju
  2018-03-01  5:10   ` Ravi Bangoria
  2018-02-28 14:25 ` Masami Hiramatsu
  5 siblings, 1 reply; 19+ messages in thread
From: Srikar Dronamraju @ 2018-02-28 12:06 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, mhiramat, ananth, naveen.n.rao, oleg

* Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> [2018-02-28 13:23:41]:

>   # readelf -n ./tick
>   Provider: tick
>   Name: loop2
>   ... Semaphore: 0x0000000010020036
> 
>   # readelf -SW ./tick | grep probes
>   [25] .probes           PROGBITS        0000000010020034 010034
> 
> 
> Semaphore offset is 0x10036. I don't have educated 'perf probe'
> about semaphore. So instead of using 'perf probe' command, I'm
> manually adding entry in the <tracefs>/uprobe_events file.
> Special char * denotes semaphore offset.
> 
> 
>   # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4 *0x10036" > uprobe_events
> 
>   # perf stat -e sdt_tick:loop2 -- /tmp/tick
>   hi: 0
>   hi: 1
>   hi: 2
>   hi: 3
>   ^C
>   Performance counter stats for '/tmp/tick':
>               4      sdt_tick:loop2                                              
>      3.359047827 seconds time elapsed
> 
> 
> Feedback?
> 
> TODO:
>  - Educate perf tool about semaphore.
> 

Is it possible to extend perf buildcache with a new option to work with
semaphore?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 1/4] Uprobe: Rename map_info to uprobe_map_info
  2018-02-28  7:53 ` [RFC 1/4] Uprobe: Rename map_info to uprobe_map_info Ravi Bangoria
@ 2018-02-28 12:09   ` Srikar Dronamraju
  2018-03-01  5:11     ` Ravi Bangoria
  0 siblings, 1 reply; 19+ messages in thread
From: Srikar Dronamraju @ 2018-02-28 12:09 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, mhiramat, ananth, naveen.n.rao, oleg

> 
> -static inline struct map_info *free_map_info(struct map_info *info)
> +static inline struct uprobe_map_info *
> +free_uprobe_map_info(struct uprobe_map_info *info)
>  {
> -	struct map_info *next = info->next;
> +	struct uprobe_map_info *next = info->next;
>  	kfree(info);
>  	return next;
>  }
> 
> -static struct map_info *
> -build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
> +static struct uprobe_map_info *
> +build_uprobe_map_info(struct address_space *mapping, loff_t offset,
> +		      bool is_register)

Imho, uprobe_build_map_info may be better than build_uprobe_map_info,
similarly uprobe_free_map_info.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/4] Uprobe: Export few functions / data structures
  2018-02-28  7:53 ` [RFC 2/4] Uprobe: Export few functions / data structures Ravi Bangoria
@ 2018-02-28 12:24   ` Srikar Dronamraju
  2018-03-01  5:25       ` Ravi Bangoria
  0 siblings, 1 reply; 19+ messages in thread
From: Srikar Dronamraju @ 2018-02-28 12:24 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, mhiramat, ananth, naveen.n.rao, oleg

> @@ -149,6 +155,11 @@ struct uprobes_state {
>  extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
>  extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
>  					 void *src, unsigned long len);
> +unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset);
> +void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len);
> +void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len);
> +struct uprobe_map_info *free_uprobe_map_info(struct uprobe_map_info *info);
> +
>  #else /* !CONFIG_UPROBES */

If we have to export the above, we might have to work with mm maintainers and
see if we can move them there.

> -static inline struct uprobe_map_info *
> -free_uprobe_map_info(struct uprobe_map_info *info)
> +struct uprobe_map_info *free_uprobe_map_info(struct uprobe_map_info *info)
>  {
>  	struct uprobe_map_info *next = info->next;
>  	kfree(info);
>  	return next;
>  }
> 
> -static struct uprobe_map_info *
> -build_uprobe_map_info(struct address_space *mapping, loff_t offset,
> -		      bool is_register)
> +struct uprobe_map_info *build_uprobe_map_info(struct address_space *mapping,
> +					      loff_t offset, bool is_register)
>  {
>  	unsigned long pgoff = offset >> PAGE_SHIFT;
>  	struct vm_area_struct *vma;

Instead of exporting, did you look at extending the uprobe consumer with
ops. i.e if the consumer detects that a probe is a semaphore and exports
a set of callbacks which can them be called from uprobe
insertion/deletion time. With such a thing, incrementing/decrementing
the semaphore and the insertion/deletion of the breakpoint can be done
at one shot. No?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 0/4] trace_uprobe: Support SDT markers having semaphore
  2018-02-28  7:53 [RFC 0/4] trace_uprobe: Support SDT markers having semaphore Ravi Bangoria
                   ` (4 preceding siblings ...)
  2018-02-28 12:06 ` [RFC 0/4] trace_uprobe: Support SDT markers having semaphore Srikar Dronamraju
@ 2018-02-28 14:25 ` Masami Hiramatsu
  2018-03-01  5:32   ` Ravi Bangoria
  5 siblings, 1 reply; 19+ messages in thread
From: Masami Hiramatsu @ 2018-02-28 14:25 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, ananth, naveen.n.rao, srikar, oleg

Hi Ravi,

Thank you for your great work!

On Wed, 28 Feb 2018 13:23:41 +0530
Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> wrote:

> Userspace Statically Defined Tracepoints[1] are dtrace style markers
> inside userspace applications. These markers are added by developer at
> important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. If this computaion is quite more, execution of it can be
> ommited by runtime if() condition when no one is tracing on the marker:
> 
>     if (semaphore > 0) {
>         Execute marker instructions;
>     }
> 
> Default value of semaphore is 0. Tracer has to increment the semaphore
> before recording on a marker and decrement it at the end of tracing.
> 
> Currently, perf tool has limited supports for SDT markers. I.e. it
> can not trace markers surrounded by semaphore. Also, it's not easy
> to add semaphore flip logic in userspace tool like perf, so basic
> idea for this patchset is to add semaphore flip logic in the
> trace_uprobe infrastructure. Ex,[2]
> 
>   # cat tick.c
>     ... 
>     for (i = 0; i < 100; i++) {
> 	DTRACE_PROBE1(tick, loop1, i);
>         if (TICK_LOOP2_ENABLED()) {
>             DTRACE_PROBE1(tick, loop2, i); 
>         }
>         printf("hi: %d\n", i); 
>         sleep(1);
>     }   
>     ... 
> 
> Here tick:loop1 is marker without semaphore where as tick:loop2 is
> surrounded by semaphore.
> 
> 
>   # perf buildid-cache --add /tmp/tick
>   # perf probe sdt_tick:loop1
>   # perf probe sdt_tick:loop2
> 
>   # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
>   hi: 0
>   hi: 1
>   hi: 2
>   ^C
>   Performance counter stats for '/tmp/tick':
>              3      sdt_tick:loop1
>              0      sdt_tick:loop2
>      2.747086086 seconds time elapsed
> 
> 
> Perf failed to record data for tick:loop2. Same experiment with this
> patch series:
> 
> 
>   # readelf -n ./tick
>   Provider: tick
>   Name: loop2
>   ... Semaphore: 0x0000000010020036
> 
>   # readelf -SW ./tick | grep probes
>   [25] .probes           PROGBITS        0000000010020034 010034
> 
> 
> Semaphore offset is 0x10036. I don't have educated 'perf probe'
> about semaphore. So instead of using 'perf probe' command, I'm
> manually adding entry in the <tracefs>/uprobe_events file.

Ok, it is easy to pass semaphore address via perf probe :)

> Special char * denotes semaphore offset.
> 
> 
>   # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4 *0x10036" > uprobe_events

IMHO, this syntax is no good, separate with space is only for arguments.
Since the semaphore is per-probe-point based, that should be specified with probe point.
(there are no 2 or more semaphores on 1 event, are there?)
So something like

# echo "p:sdt_tick/loop2 /tmp/tick:0x6e4(0x10036)" > uprobe_events

would be better to me.

Thank you,

> 
>   # perf stat -e sdt_tick:loop2 -- /tmp/tick
>   hi: 0
>   hi: 1
>   hi: 2
>   hi: 3
>   ^C
>   Performance counter stats for '/tmp/tick':
>               4      sdt_tick:loop2                                              
>      3.359047827 seconds time elapsed
> 
> 
> Feedback?
> 
> TODO:
>  - Educate perf tool about semaphore.
>  - perf_event_open() now suppoers {k,u}probe event creation[3]. If we
>    can supply semaphore offset in perf_event_attr, perf_event_open()
>    can be educated to probe SDT marker having semaphore. Though, both
>    config1 and config2 are already being used for uprobe and I don't
>    see any other attribute which I can use for semaphore offset. Can
>    we introduce one more config there? config3?
> 
> [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> [2] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506
> [3] https://lkml.org/lkml/2017/12/6/976
> 
> 
> Ravi Bangoria (4):
>   Uprobe: Rename map_info to uprobe_map_info
>   Uprobe: Export few functions / data structures
>   trace_uprobe: Support SDT markers having semaphore
>   trace_uprobe: Fix multiple update of same semaphores
> 
>  include/linux/uprobes.h     |  25 +++++
>  kernel/events/uprobes.c     |  43 ++++----
>  kernel/trace/trace_uprobe.c | 244 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 290 insertions(+), 22 deletions(-)
> 
> -- 
> 1.8.3.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 0/4] trace_uprobe: Support SDT markers having semaphore
  2018-02-28 12:06 ` [RFC 0/4] trace_uprobe: Support SDT markers having semaphore Srikar Dronamraju
@ 2018-03-01  5:10   ` Ravi Bangoria
  0 siblings, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2018-03-01  5:10 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, mhiramat, ananth, naveen.n.rao, oleg,
	Ravi Bangoria



On 02/28/2018 05:36 PM, Srikar Dronamraju wrote:
> * Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> [2018-02-28 13:23:41]:
>
>>   # readelf -n ./tick
>>   Provider: tick
>>   Name: loop2
>>   ... Semaphore: 0x0000000010020036
>>
>>   # readelf -SW ./tick | grep probes
>>   [25] .probes           PROGBITS        0000000010020034 010034
>>
>>
>> Semaphore offset is 0x10036. I don't have educated 'perf probe'
>> about semaphore. So instead of using 'perf probe' command, I'm
>> manually adding entry in the <tracefs>/uprobe_events file.
>> Special char * denotes semaphore offset.
>>
>>
>>   # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4 *0x10036" > uprobe_events
>>
>>   # perf stat -e sdt_tick:loop2 -- /tmp/tick
>>   hi: 0
>>   hi: 1
>>   hi: 2
>>   hi: 3
>>   ^C
>>   Performance counter stats for '/tmp/tick':
>>               4      sdt_tick:loop2                                              
>>      3.359047827 seconds time elapsed
>>
>>
>> Feedback?
>>
>> TODO:
>>  - Educate perf tool about semaphore.
>>
> Is it possible to extend perf buildcache with a new option to work with
> semaphore?

Yes, that should be fairly easy to do. Will add a patch for that.

Thanks for the review,
Ravi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 1/4] Uprobe: Rename map_info to uprobe_map_info
  2018-02-28 12:09   ` Srikar Dronamraju
@ 2018-03-01  5:11     ` Ravi Bangoria
  0 siblings, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2018-03-01  5:11 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, mhiramat, ananth, naveen.n.rao, oleg,
	Ravi Bangoria



On 02/28/2018 05:39 PM, Srikar Dronamraju wrote:
>> -static inline struct map_info *free_map_info(struct map_info *info)
>> +static inline struct uprobe_map_info *
>> +free_uprobe_map_info(struct uprobe_map_info *info)
>>  {
>> -	struct map_info *next = info->next;
>> +	struct uprobe_map_info *next = info->next;
>>  	kfree(info);
>>  	return next;
>>  }
>>
>> -static struct map_info *
>> -build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
>> +static struct uprobe_map_info *
>> +build_uprobe_map_info(struct address_space *mapping, loff_t offset,
>> +		      bool is_register)
> Imho, uprobe_build_map_info may be better than build_uprobe_map_info,
> similarly uprobe_free_map_info.

Sure, Will change it.

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/4] Uprobe: Export few functions / data structures
  2018-02-28 12:24   ` Srikar Dronamraju
@ 2018-03-01  5:25       ` Ravi Bangoria
  0 siblings, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2018-03-01  5:25 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, mhiramat, ananth, naveen.n.rao, oleg,
	Ravi Bangoria, linux-mm, Michal Hocko, Andrew Morton



On 02/28/2018 05:54 PM, Srikar Dronamraju wrote:
>> @@ -149,6 +155,11 @@ struct uprobes_state {
>>  extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
>>  extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
>>  					 void *src, unsigned long len);
>> +unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset);
>> +void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len);
>> +void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len);
>> +struct uprobe_map_info *free_uprobe_map_info(struct uprobe_map_info *info);
>> +
>>  #else /* !CONFIG_UPROBES */
> If we have to export the above, we might have to work with mm maintainers and
> see if we can move them there.

Adding
    linux-mm@kvack.org
    Michal Hocko <mhocko@kernel.org>
    Andrew Morton <akpm@linux-foundation.org>
in the cc.

>> -static inline struct uprobe_map_info *
>> -free_uprobe_map_info(struct uprobe_map_info *info)
>> +struct uprobe_map_info *free_uprobe_map_info(struct uprobe_map_info *info)
>>  {
>>  	struct uprobe_map_info *next = info->next;
>>  	kfree(info);
>>  	return next;
>>  }
>>
>> -static struct uprobe_map_info *
>> -build_uprobe_map_info(struct address_space *mapping, loff_t offset,
>> -		      bool is_register)
>> +struct uprobe_map_info *build_uprobe_map_info(struct address_space *mapping,
>> +					      loff_t offset, bool is_register)
>>  {
>>  	unsigned long pgoff = offset >> PAGE_SHIFT;
>>  	struct vm_area_struct *vma;
> Instead of exporting, did you look at extending the uprobe consumer with
> ops. i.e if the consumer detects that a probe is a semaphore and exports
> a set of callbacks which can them be called from uprobe
> insertion/deletion time. With such a thing, incrementing/decrementing
> the semaphore and the insertion/deletion of the breakpoint can be done
> at one shot. No?

Yes, we tried that approach as well. Basically, when install_breakpoint() get called,
notify consumer about that. We can either use consumer_filter function or add a
new callback into uprobe_consumer which will get called if install_breakpoint()
succeeds. something like:

     if (install_breakpoint()) {
         /* Notify consumers right after patching instruction. */
         consumer->post_prepare();
     }

There are different problem with that approach. install_breakpoint() gets called in
very early stage of binary loading and vma that holds the semaphore won't be
present in the mm yet. I also tried to solve this by creating a task_work in
consumer callback. task_work handler will get called when process virtual memory
map is fully prepared and we are going back to userspace. But it will make design
quite complicated. Also, there is no way to know if mm_struct we got in task_work
handler is _still_ valid.

With unregister also, we first remove the "caller" consumer and then re-patch
original instruction. i.e.

     __uprobe_unregister()
     {
         if (WARN_ON(!consumer_del(uprobe, uc)))
             return;
         err = register_for_each_vma(uprobe, NULL);

We don't callback "caller" consumer at unregistration.

Our idea is to make changes in core uprobe as less as possible. And IMHO,
exporting build_map_info() helps to simplifies the implementation.

Let me know if I'm missing something.

Thanks for the review,
Ravi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/4] Uprobe: Export few functions / data structures
@ 2018-03-01  5:25       ` Ravi Bangoria
  0 siblings, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2018-03-01  5:25 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, mhiramat, ananth, naveen.n.rao, oleg,
	Ravi Bangoria, linux-mm, Michal Hocko, Andrew Morton



On 02/28/2018 05:54 PM, Srikar Dronamraju wrote:
>> @@ -149,6 +155,11 @@ struct uprobes_state {
>>  extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
>>  extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
>>  					 void *src, unsigned long len);
>> +unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset);
>> +void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len);
>> +void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len);
>> +struct uprobe_map_info *free_uprobe_map_info(struct uprobe_map_info *info);
>> +
>>  #else /* !CONFIG_UPROBES */
> If we have to export the above, we might have to work with mm maintainers and
> see if we can move them there.

Adding
A A A  linux-mm@kvack.org
A A A  Michal Hocko <mhocko@kernel.org>
A A A  Andrew Morton <akpm@linux-foundation.org>
in the cc.

>> -static inline struct uprobe_map_info *
>> -free_uprobe_map_info(struct uprobe_map_info *info)
>> +struct uprobe_map_info *free_uprobe_map_info(struct uprobe_map_info *info)
>>  {
>>  	struct uprobe_map_info *next = info->next;
>>  	kfree(info);
>>  	return next;
>>  }
>>
>> -static struct uprobe_map_info *
>> -build_uprobe_map_info(struct address_space *mapping, loff_t offset,
>> -		      bool is_register)
>> +struct uprobe_map_info *build_uprobe_map_info(struct address_space *mapping,
>> +					      loff_t offset, bool is_register)
>>  {
>>  	unsigned long pgoff = offset >> PAGE_SHIFT;
>>  	struct vm_area_struct *vma;
> Instead of exporting, did you look at extending the uprobe consumer with
> ops. i.e if the consumer detects that a probe is a semaphore and exports
> a set of callbacks which can them be called from uprobe
> insertion/deletion time. With such a thing, incrementing/decrementing
> the semaphore and the insertion/deletion of the breakpoint can be done
> at one shot. No?

Yes, we tried that approach as well. Basically, when install_breakpoint() get called,
notify consumer about that. We can either use consumer_filter function or add a
new callback into uprobe_consumer which will get called if install_breakpoint()
succeeds. something like:

A A A A  if (install_breakpoint()) {
A A A A  A A A  /* Notify consumers right after patching instruction. */
A A A A A A A A  consumer->post_prepare();
A A A A  }

There are different problem with that approach. install_breakpoint() gets called in
very early stage of binary loading and vma that holds the semaphore won't be
present in the mm yet. I also tried to solve this by creating a task_work in
consumer callback. task_work handler will get called when process virtual memory
map is fully prepared and we are going back to userspace. But it will make design
quite complicated. Also, there is no way to know if mm_struct we got in task_work
handler is _still_ valid.

With unregister also, we first remove the "caller" consumer and then re-patch
original instruction. i.e.

A A A A  __uprobe_unregister()
A A A A  {
A A A A A A A A  if (WARN_ON(!consumer_del(uprobe, uc)))
A A A A A A A A A A A A  return;
A A A A A A A A  err = register_for_each_vma(uprobe, NULL);

We don't callback "caller" consumer at unregistration.

Our idea is to make changes in core uprobe as less as possible. And IMHO,
exporting build_map_info() helps to simplifies the implementation.

Let me know if I'm missing something.

Thanks for the review,
Ravi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 0/4] trace_uprobe: Support SDT markers having semaphore
  2018-02-28 14:25 ` Masami Hiramatsu
@ 2018-03-01  5:32   ` Ravi Bangoria
  0 siblings, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2018-03-01  5:32 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, ananth, naveen.n.rao, srikar, oleg,
	Ravi Bangoria



On 02/28/2018 07:55 PM, Masami Hiramatsu wrote:
> Hi Ravi,
>
> Thank you for your great work!

Thanks Masami.

> On Wed, 28 Feb 2018 13:23:41 +0530
> Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> wrote:
>
>> Userspace Statically Defined Tracepoints[1] are dtrace style markers
>> inside userspace applications. These markers are added by developer at
>> important places in the code. Each marker source expands to a single
>> nop instruction in the compiled code but there may be additional
>> overhead for computing the marker arguments which expands to couple of
>> instructions. If this computaion is quite more, execution of it can be
>> ommited by runtime if() condition when no one is tracing on the marker:
...
>>
>> Semaphore offset is 0x10036. I don't have educated 'perf probe'
>> about semaphore. So instead of using 'perf probe' command, I'm
>> manually adding entry in the <tracefs>/uprobe_events file.
> Ok, it is easy to pass semaphore address via perf probe :)

Yes, it should be fairly easy to parse semaphore at buildid-cache time.
Will add a patch for that.

>
>> Special char * denotes semaphore offset.
>>
>>
>>   # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4 *0x10036" > uprobe_events
> IMHO, this syntax is no good, separate with space is only for arguments.
> Since the semaphore is per-probe-point based, that should be specified with probe point.
> (there are no 2 or more semaphores on 1 event, are there?)
> So something like
>
> # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4(0x10036)" > uprobe_events

This is great suggestion. Will change it.

Please review patch 3 and 4 which contains actual implementation.

Thanks for the review,
Ravi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/4] trace_uprobe: Support SDT markers having semaphore
  2018-02-28  7:53 ` [RFC 3/4] trace_uprobe: Support SDT markers having semaphore Ravi Bangoria
@ 2018-03-01 14:07   ` Masami Hiramatsu
  2018-03-02  3:54     ` Ravi Bangoria
  2018-03-06 11:59   ` Peter Zijlstra
  1 sibling, 1 reply; 19+ messages in thread
From: Masami Hiramatsu @ 2018-03-01 14:07 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, ananth, naveen.n.rao, srikar, oleg

On Wed, 28 Feb 2018 13:23:44 +0530
Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> wrote:

> Userspace Statically Defined Tracepoints[1] are dtrace style markers
> inside userspace applications. These markers are added by developer at
> important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. If this computaion is quite more, execution of it can be
> ommited by runtime if() condition when no one is tracing on the marker:
> 
>     if (semaphore > 0) {
>         Execute marker instructions;
>     }   
> 
> Default value of semaphore is 0. Tracer has to increment the semaphore
> before recording on a marker and decrement it at the end of tracing.
> 
> Implement the semaphore flip logic in trace_uprobe, leaving core uprobe
> infrastructure as is, except one new callback from uprobe_mmap() to
> trace_uprobe.
> 
> There are two major scenarios while enabling the marker,
>  1. Trace already running process. In this case, find all suitable
>     processes and increment the semaphore value in them.
>  2. Trace is already enabled when target binary is executed. In this
>     case, all mmaps will get notified to trace_uprobe and trace_uprobe
>     will increment the semaphore if corresponding uprobe is enabled.
> 
> At the time of disabling probes, decrement semaphore in all existing
> target processes.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
> ---
>  include/linux/uprobes.h     |   2 +
>  kernel/events/uprobes.c     |   5 ++
>  kernel/trace/trace_uprobe.c | 145 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 152 insertions(+)
> 
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index 06c169e..04e9d57 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -121,6 +121,8 @@ struct uprobe_map_info {
>  	unsigned long vaddr;
>  };
>  
> +extern void (*uprobe_mmap_callback)(struct vm_area_struct *vma);
> +
>  extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
>  extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
>  extern bool is_swbp_insn(uprobe_opcode_t *insn);
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 56dd7af..81d8aaf 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1051,6 +1051,8 @@ static void build_probe_list(struct inode *inode,
>  	spin_unlock(&uprobes_treelock);
>  }
>  
> +void (*uprobe_mmap_callback)(struct vm_area_struct *vma) = NULL;
> +
>  /*
>   * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
>   *
> @@ -1063,6 +1065,9 @@ int uprobe_mmap(struct vm_area_struct *vma)
>  	struct uprobe *uprobe, *u;
>  	struct inode *inode;
>  
> +	if (vma->vm_flags & VM_WRITE && uprobe_mmap_callback)
> +		uprobe_mmap_callback(vma);
> +
>  	if (no_uprobe_events() || !valid_vma(vma, true))
>  		return 0;
>  
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 40592e7b..d14aafc 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -25,6 +25,7 @@
>  #include <linux/namei.h>
>  #include <linux/string.h>
>  #include <linux/rculist.h>
> +#include <linux/sched/mm.h>
>  
>  #include "trace_probe.h"
>  
> @@ -58,6 +59,7 @@ struct trace_uprobe {
>  	struct inode			*inode;
>  	char				*filename;
>  	unsigned long			offset;
> +	unsigned long			sdt_offset; /* sdt semaphore offset */
>  	unsigned long			nhit;
>  	struct trace_probe		tp;
>  };
> @@ -502,6 +504,16 @@ static int create_trace_uprobe(int argc, char **argv)
>  	for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
>  		struct probe_arg *parg = &tu->tp.args[i];
>  
> +		/* This is not really an argument. */
> +		if (argv[i][0] == '*') {
> +			ret = kstrtoul(&(argv[i][1]), 0, &tu->sdt_offset);
> +			if (ret) {
> +				pr_info("Invalid semaphore address.\n");
> +				goto error;
> +			}
> +			continue;
> +		}

So, this part should be done with parsing probe-point as I pointed.

> +
>  		/* Increment count for freeing args in error case */
>  		tu->tp.nr_args++;
>  
> @@ -894,6 +906,131 @@ static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func,
>  	return trace_handle_return(s);
>  }
>  
> +static bool sdt_valid_vma(struct trace_uprobe *tu, struct vm_area_struct *vma)
> +{
> +	unsigned long vaddr = offset_to_vaddr(vma, tu->sdt_offset);
> +
> +	return tu->sdt_offset &&
> +		vma->vm_file &&
> +		file_inode(vma->vm_file) == tu->inode &&
> +		vma->vm_flags & VM_WRITE &&
> +		vma->vm_start <= vaddr &&
> +		vma->vm_end > vaddr;
> +}
> +
> +static struct vm_area_struct *
> +sdt_find_vma(struct mm_struct *mm, struct trace_uprobe *tu)
> +{
> +	struct vm_area_struct *tmp;
> +
> +	for (tmp = mm->mmap; tmp != NULL; tmp = tmp->vm_next)
> +		if (sdt_valid_vma(tu, tmp))
> +			return tmp;
> +
> +	return NULL;
> +}
> +
> +static int
> +sdt_update_sem(struct mm_struct *mm, unsigned long vaddr, short val)
> +{
> +	struct page *page;
> +	struct vm_area_struct *vma;
> +	int ret = 0;
> +	unsigned short orig = 0;
> +
> +	if (vaddr == 0)
> +		return -EINVAL;
> +
> +	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
> +		FOLL_FORCE | FOLL_WRITE, &page, &vma, NULL);
> +	if (ret <= 0)
> +		return ret;
> +
> +	copy_from_page(page, vaddr, &orig, sizeof(orig));
> +	orig += val;
> +	copy_to_page(page, vaddr, &orig, sizeof(orig));
> +	put_page(page);
> +	return 0;
> +}
> +
> +/*
> + * TODO: Adding this defination in include/linux/uprobes.h throws
> + * warnings about address_sapce. Adding it here for the time being.
> + */
> +struct uprobe_map_info *build_uprobe_map_info(struct address_space *mapping, loff_t offset, bool is_register);
> +
> +static void sdt_increment_sem(struct trace_uprobe *tu)
> +{
> +	struct uprobe_map_info *info;
> +	struct vm_area_struct *vma;
> +	unsigned long vaddr;
> +
> +	uprobe_start_dup_mmap();
> +	info = build_uprobe_map_info(tu->inode->i_mapping, tu->sdt_offset, false);
> +	if (IS_ERR(info))
> +		goto out;
> +
> +	while (info) {
> +		down_write(&info->mm->mmap_sem);
> +		vma = sdt_find_vma(info->mm, tu);
> +		if (!vma)
> +			goto cont;
> +
> +		vaddr = offset_to_vaddr(vma, tu->sdt_offset);
> +		sdt_update_sem(info->mm, vaddr, 1);
> +
> +cont:

Why would you use goto here?

Thank you,

> +		up_write(&info->mm->mmap_sem);
> +		mmput(info->mm);
> +		info = free_uprobe_map_info(info);
> +	}
> +
> +out:
> +	uprobe_end_dup_mmap();
> +}
> +
> +/* Called with down_write(&vma->vm_mm->mmap_sem) */
> +void trace_uprobe_mmap_callback(struct vm_area_struct *vma)
> +{
> +	struct trace_uprobe *tu;
> +	unsigned long vaddr;
> +
> +	mutex_lock(&uprobe_lock);
> +	list_for_each_entry(tu, &uprobe_list, list) {
> +		if (!sdt_valid_vma(tu, vma) ||
> +		    !trace_probe_is_enabled(&tu->tp))
> +			continue;
> +
> +		vaddr = offset_to_vaddr(vma, tu->sdt_offset);
> +		sdt_update_sem(vma->vm_mm, vaddr, 1);
> +	}
> +	mutex_unlock(&uprobe_lock);
> +}
> +
> +static void sdt_decrement_sem(struct trace_uprobe *tu)
> +{
> +	struct vm_area_struct *vma;
> +	unsigned long vaddr;
> +	struct uprobe_map_info *info;
> +
> +	info = build_uprobe_map_info(tu->inode->i_mapping, tu->sdt_offset, false);
> +	if (IS_ERR(info))
> +		return;
> +
> +	while (info) {
> +		down_write(&info->mm->mmap_sem);
> +		vma = sdt_find_vma(info->mm, tu);
> +		if (vma) {
> +			vaddr = offset_to_vaddr(vma, tu->sdt_offset);
> +			sdt_update_sem(info->mm, vaddr, -1);
> +		}
> +		up_write(&info->mm->mmap_sem);
> +
> +		mmput(info->mm);
> +		info = free_uprobe_map_info(info);
> +	}
> +}
> +
>  typedef bool (*filter_func_t)(struct uprobe_consumer *self,
>  				enum uprobe_filter_ctx ctx,
>  				struct mm_struct *mm);
> @@ -939,6 +1076,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
>  	if (ret)
>  		goto err_buffer;
>  
> +	if (tu->sdt_offset)
> +		sdt_increment_sem(tu);
> +
>  	return 0;
>  
>   err_buffer:
> @@ -979,6 +1119,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
>  
>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
>  
> +	if (tu->sdt_offset)
> +		sdt_decrement_sem(tu);
> +
>  	uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
>  	tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
>  
> @@ -1353,6 +1496,8 @@ static __init int init_uprobe_trace(void)
>  	/* Profile interface */
>  	trace_create_file("uprobe_profile", 0444, d_tracer,
>  				    NULL, &uprobe_profile_ops);
> +
> +	uprobe_mmap_callback = trace_uprobe_mmap_callback;
>  	return 0;
>  }
>  
> -- 
> 1.8.3.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/4] trace_uprobe: Support SDT markers having semaphore
  2018-03-01 14:07   ` Masami Hiramatsu
@ 2018-03-02  3:54     ` Ravi Bangoria
  0 siblings, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2018-03-02  3:54 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, rostedt, ananth, naveen.n.rao, srikar, oleg,
	Ravi Bangoria



On 03/01/2018 07:37 PM, Masami Hiramatsu wrote:
> On Wed, 28 Feb 2018 13:23:44 +0530
> Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> wrote:
>
>> @@ -502,6 +504,16 @@ static int create_trace_uprobe(int argc, char **argv)
>>  	for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
>>  		struct probe_arg *parg = &tu->tp.args[i];
>>  
>> +		/* This is not really an argument. */
>> +		if (argv[i][0] == '*') {
>> +			ret = kstrtoul(&(argv[i][1]), 0, &tu->sdt_offset);
>> +			if (ret) {
>> +				pr_info("Invalid semaphore address.\n");
>> +				goto error;
>> +			}
>> +			continue;
>> +		}
> So, this part should be done with parsing probe-point as I pointed.

Yes, will change it.

>> +static void sdt_increment_sem(struct trace_uprobe *tu)
>> +{
>> +	struct uprobe_map_info *info;
>> +	struct vm_area_struct *vma;
>> +	unsigned long vaddr;
>> +
>> +	uprobe_start_dup_mmap();
>> +	info = build_uprobe_map_info(tu->inode->i_mapping, tu->sdt_offset, false);
>> +	if (IS_ERR(info))
>> +		goto out;
>> +
>> +	while (info) {
>> +		down_write(&info->mm->mmap_sem);
>> +		vma = sdt_find_vma(info->mm, tu);
>> +		if (!vma)
>> +			goto cont;
>> +
>> +		vaddr = offset_to_vaddr(vma, tu->sdt_offset);
>> +		sdt_update_sem(info->mm, vaddr, 1);
>> +
>> +cont:
> Why would you use goto here?

Hmm.. sdt_find_vma() must return vma. Sure, will remove the goto.

Should I add a WARN_ON(!vma) ?

Thanks for the review,
Ravi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/4] trace_uprobe: Support SDT markers having semaphore
  2018-02-28  7:53 ` [RFC 3/4] trace_uprobe: Support SDT markers having semaphore Ravi Bangoria
  2018-03-01 14:07   ` Masami Hiramatsu
@ 2018-03-06 11:59   ` Peter Zijlstra
  2018-03-07  8:46     ` Ravi Bangoria
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2018-03-06 11:59 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: mingo, acme, alexander.shishkin, jolsa, namhyung, linux-kernel,
	rostedt, mhiramat, ananth, naveen.n.rao, srikar, oleg

On Wed, Feb 28, 2018 at 01:23:44PM +0530, Ravi Bangoria wrote:
> Userspace Statically Defined Tracepoints[1] are dtrace style markers
> inside userspace applications. These markers are added by developer at
> important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. If this computaion is quite more, execution of it can be
> ommited by runtime if() condition when no one is tracing on the marker:
> 
>     if (semaphore > 0) {
>         Execute marker instructions;
>     }   
> 
> Default value of semaphore is 0. Tracer has to increment the semaphore
> before recording on a marker and decrement it at the end of tracing.
> 
> Implement the semaphore flip logic in trace_uprobe, leaving core uprobe
> infrastructure as is, except one new callback from uprobe_mmap() to
> trace_uprobe.

W.T.H. is that called a semaphore? afaict its just a usage-counter.
There is no blocking, no releasing, nothing that would make it an actual
semaphore.

So please, remove all mention of semaphore from this code, because it,
most emphatically, is not one.

Also, would it not be much better to do userspace jump-labels for this?
That completely avoids the dynamic branch at the SDT site.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/4] trace_uprobe: Support SDT markers having semaphore
  2018-03-06 11:59   ` Peter Zijlstra
@ 2018-03-07  8:46     ` Ravi Bangoria
  2018-03-07  8:57       ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Ravi Bangoria @ 2018-03-07  8:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, acme, alexander.shishkin, jolsa, namhyung, linux-kernel,
	rostedt, mhiramat, ananth, naveen.n.rao, srikar, oleg,
	Ravi Bangoria



On 03/06/2018 05:29 PM, Peter Zijlstra wrote:
> On Wed, Feb 28, 2018 at 01:23:44PM +0530, Ravi Bangoria wrote:
>> Userspace Statically Defined Tracepoints[1] are dtrace style markers
>> inside userspace applications. These markers are added by developer at
>> important places in the code. Each marker source expands to a single
>> nop instruction in the compiled code but there may be additional
>> overhead for computing the marker arguments which expands to couple of
>> instructions. If this computaion is quite more, execution of it can be
>> ommited by runtime if() condition when no one is tracing on the marker:
>>
>>     if (semaphore > 0) {
>>         Execute marker instructions;
>>     }   
>>
>> Default value of semaphore is 0. Tracer has to increment the semaphore
>> before recording on a marker and decrement it at the end of tracing.
>>
>> Implement the semaphore flip logic in trace_uprobe, leaving core uprobe
>> infrastructure as is, except one new callback from uprobe_mmap() to
>> trace_uprobe.
> W.T.H. is that called a semaphore? afaict its just a usage-counter.

I totally agree with you. But it's not me who named it semaphore :)

Please refer to "Semaphore Handling" section at:
https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

We can surly name it differently in the kernel code and document it
properly in the Documents/tracing/

> There is no blocking, no releasing, nothing that would make it an actual
> semaphore.
>
> So please, remove all mention of semaphore from this code, because it,
> most emphatically, is not one.
>
> Also, would it not be much better to do userspace jump-labels for this?
> That completely avoids the dynamic branch at the SDT site.
>

Userspace jump-label is a good idea but...

Semaphore logic has already became a kinda ABI now. Tools like bcc,
gdb, systemtap etc. flip the semaphore while probing the marker.

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/4] trace_uprobe: Support SDT markers having semaphore
  2018-03-07  8:46     ` Ravi Bangoria
@ 2018-03-07  8:57       ` Peter Zijlstra
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2018-03-07  8:57 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: mingo, acme, alexander.shishkin, jolsa, namhyung, linux-kernel,
	rostedt, mhiramat, ananth, naveen.n.rao, srikar, oleg

On Wed, Mar 07, 2018 at 02:16:13PM +0530, Ravi Bangoria wrote:
> > W.T.H. is that called a semaphore? afaict its just a usage-counter.
> 
> I totally agree with you. But it's not me who named it semaphore :)
> 
> Please refer to "Semaphore Handling" section at:
> https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> 
> We can surly name it differently in the kernel code and document it
> properly in the Documents/tracing/

Yes, just because the SDT people failed their CS101 class doesn't mean
we need to perpetuate that failure. Please name it sensibly in our code.

> > Also, would it not be much better to do userspace jump-labels for this?
> > That completely avoids the dynamic branch at the SDT site.
> >
> 
> Userspace jump-label is a good idea but...
> 
> Semaphore logic has already became a kinda ABI now. Tools like bcc,
> gdb, systemtap etc. flip the semaphore while probing the marker.

*groan*.. maybe suggest it for the next version; it appears we're
already at SDTv3, so surely there will be a v4 too.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-03-07  8:57 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-28  7:53 [RFC 0/4] trace_uprobe: Support SDT markers having semaphore Ravi Bangoria
2018-02-28  7:53 ` [RFC 1/4] Uprobe: Rename map_info to uprobe_map_info Ravi Bangoria
2018-02-28 12:09   ` Srikar Dronamraju
2018-03-01  5:11     ` Ravi Bangoria
2018-02-28  7:53 ` [RFC 2/4] Uprobe: Export few functions / data structures Ravi Bangoria
2018-02-28 12:24   ` Srikar Dronamraju
2018-03-01  5:25     ` Ravi Bangoria
2018-03-01  5:25       ` Ravi Bangoria
2018-02-28  7:53 ` [RFC 3/4] trace_uprobe: Support SDT markers having semaphore Ravi Bangoria
2018-03-01 14:07   ` Masami Hiramatsu
2018-03-02  3:54     ` Ravi Bangoria
2018-03-06 11:59   ` Peter Zijlstra
2018-03-07  8:46     ` Ravi Bangoria
2018-03-07  8:57       ` Peter Zijlstra
2018-02-28  7:53 ` [RFC 4/4] trace_uprobe: Fix multiple update of same semaphores Ravi Bangoria
2018-02-28 12:06 ` [RFC 0/4] trace_uprobe: Support SDT markers having semaphore Srikar Dronamraju
2018-03-01  5:10   ` Ravi Bangoria
2018-02-28 14:25 ` Masami Hiramatsu
2018-03-01  5:32   ` Ravi Bangoria

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.