* [PATCH v2 0/9] trace_uprobe: Support SDT markers having reference count (semaphore)
From: Ravi Bangoria @ 2018-04-04  8:31 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar, rostedt
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

Userspace Statically Defined Tracepoints[1] are dtrace-style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. Developers add these markers at
important places in the code. Each marker expands to a single nop
instruction in the compiled code, but computing the marker arguments
may add an overhead of a couple of instructions. When that overhead
is significant, the application can skip it with a runtime if()
condition while no one is tracing the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }   

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing a marker and decrement
it when done tracing.
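
As a rough illustration of the gating described above, here is a minimal
userspace sketch. It is not taken from this series: loop2_ref_ctr,
marker_site(), compute_marker_arg() and run_demo() are hypothetical
stand-ins, and the tracer is simulated by plain increments in the same
process, whereas a real tracer would update the counter from outside.

```c
#include <assert.h>

/* Hypothetical per-marker counter; 0 means nobody is tracing. */
static volatile unsigned short loop2_ref_ctr;

/* Counts how often the (stand-in) marker site was reached. */
static int marker_hits;

/* Stand-in for argument computation that is worth skipping when idle. */
static int compute_marker_arg(int i)
{
	return i * 2;
}

/* Stand-in for the marker itself; a real one would compile to a nop. */
static void marker_site(int arg)
{
	(void)arg;
	marker_hits++;
}

/* One loop iteration: argument setup runs only while someone traces. */
static void iteration(int i)
{
	if (loop2_ref_ctr > 0)
		marker_site(compute_marker_arg(i));
}

/* Simulates a tracer that is attached for the middle iteration only. */
static int run_demo(void)
{
	marker_hits = 0;
	iteration(0);		/* counter is 0: marker skipped */
	loop2_ref_ctr++;	/* tracer starts */
	iteration(1);		/* marker reached */
	loop2_ref_ctr--;	/* tracer stops */
	assert(loop2_ref_ctr == 0);
	iteration(2);		/* skipped again */
	return marker_hits;
}
```

Calling run_demo() reaches the marker only once, for the iteration that
runs while the counter is non-zero. A tracer that forgets the increment
would see no hits at all for such a marker.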

Currently, the perf tool has limited support for SDT markers, i.e. it
cannot trace markers guarded by a reference counter. Also, it's
not easy to add reference counter logic to a userspace tool like perf,
so the basic idea of this patchset is to add reference counter logic to
the trace_uprobe infrastructure. Example[2]:

  # cat tick.c
    ... 
    for (i = 0; i < 100; i++) {
        DTRACE_PROBE1(tick, loop1, i);
        if (TICK_LOOP2_ENABLED()) {
            DTRACE_PROBE1(tick, loop2, i); 
        }
        printf("hi: %d\n", i); 
        sleep(1);
    }   
    ... 

Here tick:loop1 is a marker without a reference counter, whereas
tick:loop2 is guarded by a reference counter condition.

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2

  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
             3      sdt_tick:loop1
             0      sdt_tick:loop2
     2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. The same experiment with
this patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C  
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed


Note:
 - The 'reference counter' is called a 'semaphore' in the original Dtrace
   (or Systemtap, bcc and even ELF) documentation and code. But the
   term 'semaphore' is misleading in this context. It is just a counter
   that holds the number of tracers tracing a marker. It is not
   used for any synchronization. So we refer to it as a 'reference
   counter' in kernel / perf code.


v2 changes:
 - [PATCH v2 3/9] is new. build_map_info() has a side effect: one has
   to perform mmput() when done with the mm. Let free_map_info()
   take care of mmput() so that callers do not need to worry about it.
 - [PATCH v2 6/9] sdt_update_ref_ctr(). No need to use memcpy().
   Reference counter can be directly updated using normal assignment.
 - [PATCH v2 6/9] Check valid vma is returned by sdt_find_vma() before
   incrementing / decrementing a reference counter.
 - [PATCH v2 6/9] Introduce utility functions for taking write lock on
   dup_mmap_sem. Use these functions in trace_uprobe to avoid a race with
   fork / dup_mmap().
 - [PATCH v2 6/9] Don't check presence of mm in tu->sml at decrement
   time. Purpose of maintaining the list is to ensure the increment
   happens only once for each {trace_uprobe,mm} tuple.
 - [PATCH v2 7/9] v1 was not removing mm from tu->sml when a process
   exits while tracing is still on. This leads to a problem if the same
   address gets used by a new mm. Use mmu_notifier to remove such mm
   from the list. This guarantees that every mm which has been added
   to tu->sml will be removed from the list either when tracing ends or
   when the process goes away.
 - [PATCH v2 7/9] Patch description was misleading. Change it. Add
   more generic python example.
 - [PATCH v2 7/9] Convert sml_rw_sem into mutex sml_lock.
 - [PATCH v2 7/9] Use builtin linked list in sdt_mm_list instead of
   defining its own pointer chain.
 - Change the order of last two patches.
 - [PATCH v2 9/9] Check availability of ref_ctr_offset support by
   trace_uprobe infrastructure before using it. This ensures a newer
   perf tool will still work on older kernels which do not support
   trace_uprobe with reference counter.
 - Other changes as suggested by Masami, Oleg and Steve.

v1 can be found at:
  https://lkml.org/lkml/2018/3/13/432

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
[2] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506
[3] https://lkml.org/lkml/2017/12/6/976


Oleg Nesterov (1):
  Uprobe: Move mmput() into free_map_info()

Ravi Bangoria (8):
  Uprobe: Export vaddr <-> offset conversion functions
  mm: Prefix vma_ to vaddr_to_offset() and offset_to_vaddr()
  Uprobe: Rename map_info to uprobe_map_info
  Uprobe: Export uprobe_map_info along with
    uprobe_{build/free}_map_info()
  trace_uprobe: Support SDT markers having reference count (semaphore)
  trace_uprobe/sdt: Fix multiple update of same reference counter
  trace_uprobe/sdt: Document about reference counter
  perf probe: Support SDT markers having reference counter (semaphore)

 Documentation/trace/uprobetracer.txt |  16 ++-
 include/linux/mm.h                   |  12 ++
 include/linux/uprobes.h              |  19 +++
 kernel/events/uprobes.c              |  79 ++++++-----
 kernel/trace/trace.c                 |   2 +-
 kernel/trace/trace_uprobe.c          | 261 ++++++++++++++++++++++++++++++++++-
 tools/perf/util/probe-event.c        |  18 ++-
 tools/perf/util/probe-event.h        |   1 +
 tools/perf/util/probe-file.c         |  34 ++++-
 tools/perf/util/probe-file.h         |   1 +
 tools/perf/util/symbol-elf.c         |  46 ++++--
 tools/perf/util/symbol.h             |   7 +
 12 files changed, 431 insertions(+), 65 deletions(-)

-- 
1.8.3.1

* [PATCH v2 1/9] Uprobe: Export vaddr <-> offset conversion functions
From: Ravi Bangoria @ 2018-04-04  8:31 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar, rostedt
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

These are generic functions which operate on file offsets
and virtual addresses. Make them available outside of the
uprobe code so that others can use them as well.
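
For readers unfamiliar with the conversion, the arithmetic can be tried
out in userspace with the sketch below. It is only a model under stated
assumptions: struct demo_vma, DEMO_PAGE_SHIFT (4 KiB pages) and the
demo_* function names are hypothetical and mirror just the two
vm_area_struct fields the helpers read; long long stands in for loff_t.

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two vm_area_struct fields used here. */
struct demo_vma {
	unsigned long vm_start;	/* first virtual address of the mapping */
	unsigned long vm_pgoff;	/* file offset of the mapping, in pages */
};

/* File offset -> virtual address inside this mapping. */
static unsigned long
demo_offset_to_vaddr(const struct demo_vma *vma, long long offset)
{
	return vma->vm_start + offset -
	       ((long long)vma->vm_pgoff << DEMO_PAGE_SHIFT);
}

/* Virtual address inside this mapping -> file offset. */
static long long
demo_vaddr_to_offset(const struct demo_vma *vma, unsigned long vaddr)
{
	return ((long long)vma->vm_pgoff << DEMO_PAGE_SHIFT) +
	       (vaddr - vma->vm_start);
}

/* Returns 1 if converting an offset to a vaddr and back is lossless. */
static int demo_round_trip(void)
{
	/* Mapping starts at 0x400000 and covers the file from page 2 on. */
	struct demo_vma vma = { .vm_start = 0x400000UL, .vm_pgoff = 2 };
	unsigned long vaddr = demo_offset_to_vaddr(&vma, 0x2abc);

	/* 0x400000 + 0x2abc - (2 << 12) = 0x400abc */
	assert(vaddr == 0x400abcUL);
	return demo_vaddr_to_offset(&vma, vaddr) == 0x2abc;
}
```

The two helpers are inverses of each other for any address that falls
inside the mapping. Neither checks that the offset or address actually
belongs to the vma, so callers are expected to have verified that first.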

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
---
 include/linux/mm.h      | 12 ++++++++++++
 kernel/events/uprobes.c | 10 ----------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42..95909f2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2274,6 +2274,18 @@ struct vm_unmapped_area_info {
 		return unmapped_area(info);
 }
 
+static inline unsigned long
+offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
+{
+	return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+}
+
+static inline loff_t
+vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
+{
+	return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start);
+}
+
 /* truncate.c */
 extern void truncate_inode_pages(struct address_space *, loff_t);
 extern void truncate_inode_pages_range(struct address_space *,
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ce6848e..bd6f230 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -130,16 +130,6 @@ static bool valid_vma(struct vm_area_struct *vma, bool is_register)
 	return vma->vm_file && (vma->vm_flags & flags) == VM_MAYEXEC;
 }
 
-static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
-{
-	return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
-}
-
-static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
-{
-	return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start);
-}
-
 /**
  * __replace_page - replace page in vma by new page.
  * based on replace_page in mm/ksm.c
-- 
1.8.3.1

* [PATCH v2 2/9] mm: Prefix vma_ to vaddr_to_offset() and offset_to_vaddr()
From: Ravi Bangoria @ 2018-04-04  8:31 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar, rostedt
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

Make function names more meaningful by adding vma_ prefix
to them.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
---
 include/linux/mm.h      |  4 ++--
 kernel/events/uprobes.c | 14 +++++++-------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 95909f2..d7ee526 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2275,13 +2275,13 @@ struct vm_unmapped_area_info {
 }
 
 static inline unsigned long
-offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
+vma_offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
 {
 	return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
 }
 
 static inline loff_t
-vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
+vma_vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
 {
 	return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start);
 }
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index bd6f230..535fd39 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -748,7 +748,7 @@ static inline struct map_info *free_map_info(struct map_info *info)
 		curr = info;
 
 		info->mm = vma->vm_mm;
-		info->vaddr = offset_to_vaddr(vma, offset);
+		info->vaddr = vma_offset_to_vaddr(vma, offset);
 	}
 	i_mmap_unlock_read(mapping);
 
@@ -807,7 +807,7 @@ static inline struct map_info *free_map_info(struct map_info *info)
 			goto unlock;
 
 		if (vma->vm_start > info->vaddr ||
-		    vaddr_to_offset(vma, info->vaddr) != uprobe->offset)
+		    vma_vaddr_to_offset(vma, info->vaddr) != uprobe->offset)
 			goto unlock;
 
 		if (is_register) {
@@ -977,7 +977,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 		    uprobe->offset >= offset + vma->vm_end - vma->vm_start)
 			continue;
 
-		vaddr = offset_to_vaddr(vma, uprobe->offset);
+		vaddr = vma_offset_to_vaddr(vma, uprobe->offset);
 		err |= remove_breakpoint(uprobe, mm, vaddr);
 	}
 	up_read(&mm->mmap_sem);
@@ -1023,7 +1023,7 @@ static void build_probe_list(struct inode *inode,
 	struct uprobe *u;
 
 	INIT_LIST_HEAD(head);
-	min = vaddr_to_offset(vma, start);
+	min = vma_vaddr_to_offset(vma, start);
 	max = min + (end - start) - 1;
 
 	spin_lock(&uprobes_treelock);
@@ -1076,7 +1076,7 @@ int uprobe_mmap(struct vm_area_struct *vma)
 	list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
 		if (!fatal_signal_pending(current) &&
 		    filter_chain(uprobe, UPROBE_FILTER_MMAP, vma->vm_mm)) {
-			unsigned long vaddr = offset_to_vaddr(vma, uprobe->offset);
+			unsigned long vaddr = vma_offset_to_vaddr(vma, uprobe->offset);
 			install_breakpoint(uprobe, vma->vm_mm, vma, vaddr);
 		}
 		put_uprobe(uprobe);
@@ -1095,7 +1095,7 @@ int uprobe_mmap(struct vm_area_struct *vma)
 
 	inode = file_inode(vma->vm_file);
 
-	min = vaddr_to_offset(vma, start);
+	min = vma_vaddr_to_offset(vma, start);
 	max = min + (end - start) - 1;
 
 	spin_lock(&uprobes_treelock);
@@ -1730,7 +1730,7 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
 	if (vma && vma->vm_start <= bp_vaddr) {
 		if (valid_vma(vma, false)) {
 			struct inode *inode = file_inode(vma->vm_file);
-			loff_t offset = vaddr_to_offset(vma, bp_vaddr);
+			loff_t offset = vma_vaddr_to_offset(vma, bp_vaddr);
 
 			uprobe = find_uprobe(inode, offset);
 		}
-- 
1.8.3.1

* [PATCH v2 3/9] Uprobe: Move mmput() into free_map_info()
  2018-04-04  8:31 ` Ravi Bangoria
@ 2018-04-04  8:31   ` Ravi Bangoria
  -1 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-04  8:31 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar, rostedt
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

From: Oleg Nesterov <oleg@redhat.com>

build_map_info() has a side effect: the caller must call mmput()
on each mm once it is done with it. Move mmput() into
free_map_info() so that callers do not have to call it explicitly.
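
The ownership change can be sketched in userspace roughly as follows.
This is only an illustration under stated assumptions: struct mock_mm,
mock_mmput() and the other mock_* names are hypothetical stand-ins (a
plain counter replaces the kernel's mm reference counting), and the
list is built trivially rather than by walking real mappings.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's mm_struct: only a user count. */
struct mock_mm {
	int users;
};

/* Same fields as struct map_info in kernel/events/uprobes.c. */
struct mock_map_info {
	struct mock_map_info *next;
	struct mock_mm *mm;
	unsigned long vaddr;
};

/* Stand-in for mmput(): drop one reference on the mm. */
static void mock_mmput(struct mock_mm *mm)
{
	assert(mm->users > 0);
	mm->users--;
}

/* As in the patch: freeing a node also drops its mm reference. */
static struct mock_map_info *mock_free_map_info(struct mock_map_info *info)
{
	struct mock_map_info *next = info->next;

	mock_mmput(info->mm);
	free(info);
	return next;
}

/*
 * Build a list of @n nodes that each hold a reference on @mm, then walk
 * and free it with a single call per node, with no separate mmput().
 * Returns the user count left on @mm, or -1 on allocation failure.
 */
static int mock_walk_and_free(struct mock_mm *mm, int n)
{
	struct mock_map_info *info = NULL;

	while (n-- > 0) {
		struct mock_map_info *node = malloc(sizeof(*node));

		if (!node) {
			while (info)
				info = mock_free_map_info(info);
			return -1;
		}
		mm->users++;		/* reference taken while building */
		node->mm = mm;
		node->vaddr = 0;
		node->next = info;
		info = node;
	}

	while (info)
		info = mock_free_map_info(info);
	return mm->users;
}
```

With this shape, a caller that starts with one user on the mm ends
with one user again after the walk, without calling mmput() itself.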

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 kernel/events/uprobes.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 535fd39..1d439c7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -704,6 +704,7 @@ struct map_info {
 static inline struct map_info *free_map_info(struct map_info *info)
 {
 	struct map_info *next = info->next;
+	mmput(info->mm);
 	kfree(info);
 	return next;
 }
@@ -773,8 +774,11 @@ static inline struct map_info *free_map_info(struct map_info *info)
 
 	goto again;
  out:
-	while (prev)
-		prev = free_map_info(prev);
+	while (prev) {
+		info = prev;
+		prev = prev->next;
+		kfree(info);
+	}
 	return curr;
 }
 
@@ -824,7 +828,6 @@ static inline struct map_info *free_map_info(struct map_info *info)
  unlock:
 		up_write(&mm->mmap_sem);
  free:
-		mmput(mm);
 		info = free_map_info(info);
 	}
  out:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 4/9] Uprobe: Rename map_info to uprobe_map_info
  2018-04-04  8:31 ` Ravi Bangoria
  (?)
@ 2018-04-04  8:31   ` Ravi Bangoria
  -1 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-04  8:31 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar, rostedt
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

map_info is a very generic name; rename it to uprobe_map_info.
The rename makes it possible to export this structure outside of
the file.

Also rename free_map_info() to uprobe_free_map_info() and
build_map_info() to uprobe_build_map_info().

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
---
 kernel/events/uprobes.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 1d439c7..477dc42 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -695,28 +695,30 @@ static void delete_uprobe(struct uprobe *uprobe)
 	put_uprobe(uprobe);
 }
 
-struct map_info {
-	struct map_info *next;
+struct uprobe_map_info {
+	struct uprobe_map_info *next;
 	struct mm_struct *mm;
 	unsigned long vaddr;
 };
 
-static inline struct map_info *free_map_info(struct map_info *info)
+static inline struct uprobe_map_info *
+uprobe_free_map_info(struct uprobe_map_info *info)
 {
-	struct map_info *next = info->next;
+	struct uprobe_map_info *next = info->next;
 	mmput(info->mm);
 	kfree(info);
 	return next;
 }
 
-static struct map_info *
-build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
+static struct uprobe_map_info *
+uprobe_build_map_info(struct address_space *mapping, loff_t offset,
+		      bool is_register)
 {
 	unsigned long pgoff = offset >> PAGE_SHIFT;
 	struct vm_area_struct *vma;
-	struct map_info *curr = NULL;
-	struct map_info *prev = NULL;
-	struct map_info *info;
+	struct uprobe_map_info *curr = NULL;
+	struct uprobe_map_info *prev = NULL;
+	struct uprobe_map_info *info;
 	int more = 0;
 
  again:
@@ -730,7 +732,7 @@ static inline struct map_info *free_map_info(struct map_info *info)
 			 * Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion through
 			 * reclaim. This is optimistic, no harm done if it fails.
 			 */
-			prev = kmalloc(sizeof(struct map_info),
+			prev = kmalloc(sizeof(struct uprobe_map_info),
 					GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
 			if (prev)
 				prev->next = NULL;
@@ -763,7 +765,7 @@ static inline struct map_info *free_map_info(struct map_info *info)
 	}
 
 	do {
-		info = kmalloc(sizeof(struct map_info), GFP_KERNEL);
+		info = kmalloc(sizeof(struct uprobe_map_info), GFP_KERNEL);
 		if (!info) {
 			curr = ERR_PTR(-ENOMEM);
 			goto out;
@@ -786,11 +788,11 @@ static inline struct map_info *free_map_info(struct map_info *info)
 register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
 {
 	bool is_register = !!new;
-	struct map_info *info;
+	struct uprobe_map_info *info;
 	int err = 0;
 
 	percpu_down_write(&dup_mmap_sem);
-	info = build_map_info(uprobe->inode->i_mapping,
+	info = uprobe_build_map_info(uprobe->inode->i_mapping,
 					uprobe->offset, is_register);
 	if (IS_ERR(info)) {
 		err = PTR_ERR(info);
@@ -828,7 +830,7 @@ static inline struct map_info *free_map_info(struct map_info *info)
  unlock:
 		up_write(&mm->mmap_sem);
  free:
-		info = free_map_info(info);
+		info = uprobe_free_map_info(info);
 	}
  out:
 	percpu_up_write(&dup_mmap_sem);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 5/9] Uprobe: Export uprobe_map_info along with uprobe_{build/free}_map_info()
  2018-04-04  8:31 ` Ravi Bangoria
  (?)
@ 2018-04-04  8:31   ` Ravi Bangoria
  -1 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-04  8:31 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar, rostedt
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

Given a file (inode) and an offset, uprobe_build_map_info() finds all
existing mms that map the portion of the file containing that offset.

Export these functions and the data structure so that they can be
used from other files.
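
What "maps the portion of the file containing the offset" means for a
single mapping can be sketched in userspace as below. This is not the
kernel implementation (which walks the file's mapping tree); it only
reuses the range check and the address arithmetic visible elsewhere in
kernel/events/uprobes.c. struct mock_vma, the mock_* functions and the
4 KiB page size are assumptions made for the example, and long long
stands in for loff_t.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical, trimmed-down view of a file-backed VMA. */
struct mock_vma {
	unsigned long vm_start;	/* first mapped virtual address */
	unsigned long vm_end;	/* one past the last mapped address */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

/* True if the byte at file @offset is covered by @vma. */
static bool mock_vma_maps_offset(const struct mock_vma *vma, long long offset)
{
	long long start = (long long)vma->vm_pgoff << MOCK_PAGE_SHIFT;
	long long len;

	assert(vma->vm_end > vma->vm_start);
	len = (long long)(vma->vm_end - vma->vm_start);
	return offset >= start && offset < start + len;
}

/* Same arithmetic as vma_offset_to_vaddr(), for a covered offset. */
static unsigned long mock_offset_to_vaddr(const struct mock_vma *vma,
					  long long offset)
{
	long long start = (long long)vma->vm_pgoff << MOCK_PAGE_SHIFT;

	assert(mock_vma_maps_offset(vma, offset));
	return vma->vm_start + (unsigned long)(offset - start);
}
```

For a mapping at 0x400000-0x402000 with vm_pgoff 1, file offset 0x1234
is covered and translates to 0x400234, while offsets 0x500 and 0x3000
fall outside the mapped window.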

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
---
 include/linux/uprobes.h |  9 +++++++++
 kernel/events/uprobes.c | 14 +++-----------
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0a294e9..7bd2760 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -109,12 +109,19 @@ enum rp_check {
 	RP_CHECK_RET,
 };
 
+struct address_space;
 struct xol_area;
 
 struct uprobes_state {
 	struct xol_area		*xol_area;
 };
 
+struct uprobe_map_info {
+	struct uprobe_map_info *next;
+	struct mm_struct *mm;
+	unsigned long vaddr;
+};
+
 extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern bool is_swbp_insn(uprobe_opcode_t *insn);
@@ -149,6 +156,8 @@ struct uprobes_state {
 extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
 extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
 					 void *src, unsigned long len);
+extern struct uprobe_map_info *uprobe_free_map_info(struct uprobe_map_info *info);
+extern struct uprobe_map_info *uprobe_build_map_info(struct address_space *mapping, loff_t offset, bool is_register);
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 477dc42..096d1e6 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -695,14 +695,7 @@ static void delete_uprobe(struct uprobe *uprobe)
 	put_uprobe(uprobe);
 }
 
-struct uprobe_map_info {
-	struct uprobe_map_info *next;
-	struct mm_struct *mm;
-	unsigned long vaddr;
-};
-
-static inline struct uprobe_map_info *
-uprobe_free_map_info(struct uprobe_map_info *info)
+struct uprobe_map_info *uprobe_free_map_info(struct uprobe_map_info *info)
 {
 	struct uprobe_map_info *next = info->next;
 	mmput(info->mm);
@@ -710,9 +703,8 @@ struct uprobe_map_info {
 	return next;
 }
 
-static struct uprobe_map_info *
-uprobe_build_map_info(struct address_space *mapping, loff_t offset,
-		      bool is_register)
+struct uprobe_map_info *uprobe_build_map_info(struct address_space *mapping,
+					      loff_t offset, bool is_register)
 {
 	unsigned long pgoff = offset >> PAGE_SHIFT;
 	struct vm_area_struct *vma;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)
  2018-04-04  8:31 ` Ravi Bangoria
@ 2018-04-04  8:31   ` Ravi Bangoria
  -1 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-04  8:31 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar, rostedt
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

Userspace Statically Defined Tracepoints[1] are dtrace-style markers
inside userspace applications. Developers add these markers at
important places in the code. Each marker expands to a single nop
instruction in the compiled code, but computing the marker arguments
may add the overhead of a few more instructions. When that overhead
is significant, its execution can be skipped with a runtime if()
condition while no one is tracing the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing a marker and decrement
it when done with the tracing.
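
The protocol can be modelled in a few lines of userspace C. This is a
toy sketch, not what an instrumented binary contains: the counter
type, the mock_* names and the single-process setup are all
assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical per-marker counter; 0 means nobody is tracing. */
static unsigned int mock_ref_ctr;
/* Counts how often the guarded marker body actually ran. */
static unsigned int mock_marker_runs;

/* Tracer side: announce interest before tracing the marker. */
static void mock_tracer_attach(void)
{
	mock_ref_ctr++;
}

/* Tracer side: drop interest when done with the tracing. */
static void mock_tracer_detach(void)
{
	assert(mock_ref_ctr > 0);
	mock_ref_ctr--;
}

/* Application side: the marker body only runs while it is traced. */
static void mock_app_iteration(void)
{
	if (mock_ref_ctr > 0)
		mock_marker_runs++;	/* "Execute marker instructions" */
}

/* Run iterations before, during and after one tracing session. */
static unsigned int mock_run(int before, int during, int after)
{
	int i;

	mock_marker_runs = 0;
	for (i = 0; i < before; i++)
		mock_app_iteration();
	mock_tracer_attach();
	for (i = 0; i < during; i++)
		mock_app_iteration();
	mock_tracer_detach();
	for (i = 0; i < after; i++)
		mock_app_iteration();
	return mock_marker_runs;
}
```

Only the iterations that happen between attach and detach pay for the
marker body; the rest only pay for the counter check.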

Implement the reference counter logic in trace_uprobe, leaving core
uprobe infrastructure as is, except one new callback from uprobe_mmap()
to trace_uprobe.

trace_uprobe definition with reference counter will now be:

  <path>:<offset>[(ref_ctr_offset)]
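
A userspace sketch of how the part after "<path>:" could be split is
shown below. It follows the same strchr()-based approach as the
parsing added to create_trace_uprobe(), but it is an approximation:
mock_parse_offsets() is a hypothetical name, and strtoul() is used in
place of the kernel's kstrtoul(), which is stricter about its input.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "<offset>[(ref_ctr_offset)]" in place; @arg is modified.
 * *ref_ctr_offset is 0 when no "(...)" part is given.
 * Returns 0 on success, -EINVAL on malformed input.
 */
static int mock_parse_offsets(char *arg, unsigned long *offset,
			      unsigned long *ref_ctr_offset)
{
	char *rctr, *rctr_end, *end;

	assert(arg && offset && ref_ctr_offset);
	*ref_ctr_offset = 0;

	/* Parse the reference counter offset if specified. */
	rctr = strchr(arg, '(');
	if (rctr) {
		rctr_end = strchr(rctr, ')');
		/* Require a closing ')' that is the last character. */
		if (!rctr_end || rctr_end[1] != '\0')
			return -EINVAL;

		*rctr++ = '\0';		/* terminate the probe offset */
		*rctr_end = '\0';	/* terminate the counter offset */
		if (*rctr == '\0')
			return -EINVAL;
		errno = 0;
		*ref_ctr_offset = strtoul(rctr, &end, 0);
		if (errno || *end != '\0')
			return -EINVAL;
	}

	/* Parse the uprobe offset. */
	if (*arg == '\0')
		return -EINVAL;
	errno = 0;
	*offset = strtoul(arg, &end, 0);
	if (errno || *end != '\0')
		return -EINVAL;
	return 0;
}
```

For instance, with the arbitrary input "0x4004e6(0x10036)" this yields
offset 0x4004e6 and reference counter offset 0x10036, while "0x10"
alone yields offset 0x10 and a reference counter offset of 0.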

There are two cases when enabling a marker:
 1. Tracing an existing process. In this case, find all suitable
    processes and increment the reference counter in them.
 2. Enabling the trace before running the target binary. In this case,
    every mmap is notified to trace_uprobe, which increments the
    reference counter if the corresponding uprobe is enabled.

When probes are disabled, decrement the reference counter in all
existing target processes.
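
The bookkeeping for the two cases can be modelled as follows. This is
a deliberately simplified userspace sketch, not the trace_uprobe code:
struct mock_probe and the mock_* functions are hypothetical, there is
a single tracer, and each "process" is reduced to one counter slot.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_MAX_PROCS 8

/* Hypothetical model: one probe, one counter per target process. */
struct mock_probe {
	bool enabled;
	int nr_procs;
	int ref_ctr[MOCK_MAX_PROCS];
};

/*
 * Case 2: a process maps the target binary. If the probe is already
 * enabled, its counter is incremented right away. Returns the slot.
 */
static int mock_on_mmap(struct mock_probe *p)
{
	int idx;

	assert(p->nr_procs < MOCK_MAX_PROCS);
	idx = p->nr_procs++;
	p->ref_ctr[idx] = p->enabled ? 1 : 0;
	return idx;
}

/* Case 1: enable the probe and increment in all existing processes. */
static void mock_enable(struct mock_probe *p)
{
	int i;

	assert(!p->enabled);
	for (i = 0; i < p->nr_procs; i++)
		p->ref_ctr[i]++;
	p->enabled = true;
}

/* Disable: decrement in all existing target processes. */
static void mock_disable(struct mock_probe *p)
{
	int i;

	assert(p->enabled);
	for (i = 0; i < p->nr_procs; i++)
		p->ref_ctr[i]--;
	p->enabled = false;
}
```

A process started before the probe is enabled and one started after
both end up with a counter of 1 while tracing, and both return to 0
once the probe is disabled.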

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: the 'reference counter' is called a 'semaphore' in the original
DTrace (and SystemTap, bcc and even ELF) documentation and code. But
the term 'semaphore' is misleading in this context: it is just a
counter holding the number of tracers tracing a marker, and it is not
used for any synchronization. So we refer to it as the 'reference
counter' in kernel / perf code.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[Fengguang reported/fixed build failure in RFC patch]
---
 include/linux/uprobes.h     |  10 +++
 kernel/events/uprobes.c     |  16 +++++
 kernel/trace/trace_uprobe.c | 162 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 186 insertions(+), 2 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 7bd2760..2db3ed1 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -122,6 +122,8 @@ struct uprobe_map_info {
 	unsigned long vaddr;
 };
 
+extern void (*uprobe_mmap_callback)(struct vm_area_struct *vma);
+
 extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern bool is_swbp_insn(uprobe_opcode_t *insn);
@@ -136,6 +138,8 @@ struct uprobe_map_info {
 extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end);
 extern void uprobe_start_dup_mmap(void);
 extern void uprobe_end_dup_mmap(void);
+extern void uprobe_down_write_dup_mmap(void);
+extern void uprobe_up_write_dup_mmap(void);
 extern void uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm);
 extern void uprobe_free_utask(struct task_struct *t);
 extern void uprobe_copy_process(struct task_struct *t, unsigned long flags);
@@ -192,6 +196,12 @@ static inline void uprobe_start_dup_mmap(void)
 static inline void uprobe_end_dup_mmap(void)
 {
 }
+static inline void uprobe_down_write_dup_mmap(void)
+{
+}
+static inline void uprobe_up_write_dup_mmap(void)
+{
+}
 static inline void
 uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm)
 {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 096d1e6..c691334 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1044,6 +1044,9 @@ static void build_probe_list(struct inode *inode,
 	spin_unlock(&uprobes_treelock);
 }
 
+/* Right now the only user of this is trace_uprobe. */
+void (*uprobe_mmap_callback)(struct vm_area_struct *vma);
+
 /*
  * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
  *
@@ -1056,6 +1059,9 @@ int uprobe_mmap(struct vm_area_struct *vma)
 	struct uprobe *uprobe, *u;
 	struct inode *inode;
 
+	if (uprobe_mmap_callback)
+		uprobe_mmap_callback(vma);
+
 	if (no_uprobe_events() || !valid_vma(vma, true))
 		return 0;
 
@@ -1247,6 +1253,16 @@ void uprobe_end_dup_mmap(void)
 	percpu_up_read(&dup_mmap_sem);
 }
 
+void uprobe_down_write_dup_mmap(void)
+{
+	percpu_down_write(&dup_mmap_sem);
+}
+
+void uprobe_up_write_dup_mmap(void)
+{
+	percpu_up_write(&dup_mmap_sem);
+}
+
 void uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm)
 {
 	if (test_bit(MMF_HAS_UPROBES, &oldmm->flags)) {
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 2014f43..5582c2d 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -25,6 +25,8 @@
 #include <linux/namei.h>
 #include <linux/string.h>
 #include <linux/rculist.h>
+#include <linux/sched/mm.h>
+#include <linux/highmem.h>
 
 #include "trace_probe.h"
 
@@ -58,6 +60,7 @@ struct trace_uprobe {
 	struct inode			*inode;
 	char				*filename;
 	unsigned long			offset;
+	unsigned long			ref_ctr_offset;
 	unsigned long			nhit;
 	struct trace_probe		tp;
 };
@@ -362,10 +365,10 @@ static int create_trace_uprobe(int argc, char **argv)
 {
 	struct trace_uprobe *tu;
 	struct inode *inode;
-	char *arg, *event, *group, *filename;
+	char *arg, *event, *group, *filename, *rctr, *rctr_end;
 	char buf[MAX_EVENT_NAME_LEN];
 	struct path path;
-	unsigned long offset;
+	unsigned long offset, ref_ctr_offset;
 	bool is_delete, is_return;
 	int i, ret;
 
@@ -375,6 +378,7 @@ static int create_trace_uprobe(int argc, char **argv)
 	is_return = false;
 	event = NULL;
 	group = NULL;
+	ref_ctr_offset = 0;
 
 	/* argc must be >= 1 */
 	if (argv[0][0] == '-')
@@ -454,6 +458,26 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 
+	/* Parse reference counter offset if specified. */
+	rctr = strchr(arg, '(');
+	if (rctr) {
+		rctr_end = strchr(rctr, ')');
+		if (rctr > rctr_end || *(rctr_end + 1) != 0) {
+			ret = -EINVAL;
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+
+		*rctr++ = '\0';
+		*rctr_end = '\0';
+		ret = kstrtoul(rctr, 0, &ref_ctr_offset);
+		if (ret) {
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+	}
+
+	/* Parse uprobe offset. */
 	ret = kstrtoul(arg, 0, &offset);
 	if (ret)
 		goto fail_address_parse;
@@ -488,6 +512,7 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 	tu->offset = offset;
+	tu->ref_ctr_offset = ref_ctr_offset;
 	tu->inode = inode;
 	tu->filename = kstrdup(filename, GFP_KERNEL);
 
@@ -620,6 +645,8 @@ static int probes_seq_show(struct seq_file *m, void *v)
 			break;
 		}
 	}
+	if (tu->ref_ctr_offset)
+		seq_printf(m, "(0x%lx)", tu->ref_ctr_offset);
 
 	for (i = 0; i < tu->tp.nr_args; i++)
 		seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
@@ -894,6 +921,129 @@ static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func,
 	return trace_handle_return(s);
 }
 
+static bool sdt_valid_vma(struct trace_uprobe *tu,
+			  struct vm_area_struct *vma,
+			  unsigned long vaddr)
+{
+	return tu->ref_ctr_offset &&
+		vma->vm_file &&
+		file_inode(vma->vm_file) == tu->inode &&
+		vma->vm_flags & VM_WRITE &&
+		vma->vm_start <= vaddr &&
+		vma->vm_end > vaddr;
+}
+
+static struct vm_area_struct *sdt_find_vma(struct trace_uprobe *tu,
+					   struct mm_struct *mm,
+					   unsigned long vaddr)
+{
+	struct vm_area_struct *vma = find_vma(mm, vaddr);
+
+	return (vma && sdt_valid_vma(tu, vma, vaddr)) ? vma : NULL;
+}
+
+/*
+ * Reference counter gates the invocation of probe. If present,
+ * by default reference counter is 0. One needs to increment
+ * it before tracing the probe and decrement it when done.
+ */
+static int
+sdt_update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
+{
+	void *kaddr;
+	struct page *page;
+	struct vm_area_struct *vma;
+	int ret = 0;
+	unsigned short *ptr;
+
+	if (vaddr == 0)
+		return -EINVAL;
+
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
+		FOLL_FORCE | FOLL_WRITE, &page, &vma, NULL);
+	if (ret <= 0)
+		return ret;
+
+	kaddr = kmap_atomic(page);
+	ptr = kaddr + (vaddr & ~PAGE_MASK);
+	*ptr += d;
+	kunmap_atomic(kaddr);
+
+	put_page(page);
+	return 0;
+}
+
+static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
+{
+	struct uprobe_map_info *info;
+
+	uprobe_down_write_dup_mmap();
+	info = uprobe_build_map_info(tu->inode->i_mapping,
+				tu->ref_ctr_offset, false);
+	if (IS_ERR(info))
+		goto out;
+
+	while (info) {
+		down_write(&info->mm->mmap_sem);
+
+		if (sdt_find_vma(tu, info->mm, info->vaddr))
+			sdt_update_ref_ctr(info->mm, info->vaddr, 1);
+
+		up_write(&info->mm->mmap_sem);
+		info = uprobe_free_map_info(info);
+	}
+
+out:
+	uprobe_up_write_dup_mmap();
+}
+
+/* Called with down_write(&vma->vm_mm->mmap_sem) */
+void trace_uprobe_mmap(struct vm_area_struct *vma)
+{
+	struct trace_uprobe *tu;
+	unsigned long vaddr;
+
+	if (!(vma->vm_flags & VM_WRITE))
+		return;
+
+	mutex_lock(&uprobe_lock);
+	list_for_each_entry(tu, &uprobe_list, list) {
+		if (!trace_probe_is_enabled(&tu->tp))
+			continue;
+
+		vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
+		if (!sdt_valid_vma(tu, vma, vaddr))
+			continue;
+
+		sdt_update_ref_ctr(vma->vm_mm, vaddr, 1);
+	}
+	mutex_unlock(&uprobe_lock);
+}
+
+static void sdt_decrement_ref_ctr(struct trace_uprobe *tu)
+{
+	struct uprobe_map_info *info;
+
+	uprobe_down_write_dup_mmap();
+	info = uprobe_build_map_info(tu->inode->i_mapping,
+				tu->ref_ctr_offset, false);
+	if (IS_ERR(info))
+		goto out;
+
+	while (info) {
+		down_write(&info->mm->mmap_sem);
+
+		if (sdt_find_vma(tu, info->mm, info->vaddr))
+			sdt_update_ref_ctr(info->mm, info->vaddr, -1);
+
+		up_write(&info->mm->mmap_sem);
+		info = uprobe_free_map_info(info);
+	}
+
+out:
+	uprobe_up_write_dup_mmap();
+}
+
 typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 				enum uprobe_filter_ctx ctx,
 				struct mm_struct *mm);
@@ -939,6 +1089,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 	if (ret)
 		goto err_buffer;
 
+	if (tu->ref_ctr_offset)
+		sdt_increment_ref_ctr(tu);
+
 	return 0;
 
  err_buffer:
@@ -979,6 +1132,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
+	if (tu->ref_ctr_offset)
+		sdt_decrement_ref_ctr(tu);
+
 	uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
 	tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
 
@@ -1423,6 +1579,8 @@ static __init int init_uprobe_trace(void)
 	/* Profile interface */
 	trace_create_file("uprobe_profile", 0444, d_tracer,
 				    NULL, &uprobe_profile_ops);
+
+	uprobe_mmap_callback = trace_uprobe_mmap;
 	return 0;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)
@ 2018-04-04  8:31   ` Ravi Bangoria
  0 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-04  8:31 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar, rostedt
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Developers add these markers at
important places in the code. Each marker expands to a single nop
instruction in the compiled code, but computing the marker arguments
may add overhead of a few more instructions. When that overhead is
significant, its execution can be skipped with a runtime if() condition
while no one is tracing the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

Default value of reference counter is 0. Tracer has to increment the
reference counter before tracing on a marker and decrement it when
done with the tracing.
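
As a rough userspace illustration of the gate described above (not code
from this patch), the sketch below models one marker guarded by a
counter. The names reference_counter, tracer_attach(), tracer_detach()
and run_loop() are made up for the example; in a real binary the
counter is the ELF semaphore and the tracer updates it from outside the
process.

```c
#include <assert.h>

/* Hypothetical stand-in for the semaphore a marker is compiled with. */
static unsigned short reference_counter;	/* defaults to 0 */
static unsigned int marker_hits;		/* times the marker body ran */

/* What a tracer has to do before it starts tracing the marker. */
static void tracer_attach(void)
{
	reference_counter++;
}

/* What a tracer has to do once it is done tracing. */
static void tracer_detach(void)
{
	assert(reference_counter > 0);	/* must pair with an attach */
	reference_counter--;
}

/* Application side: argument setup is skipped while nobody traces. */
static void app_iteration(int i)
{
	if (reference_counter > 0) {
		/* argument computation and the nop marker would sit here */
		(void)i;
		marker_hits++;
	}
}

/* Run n iterations; return how many times the marker body executed. */
static unsigned int run_loop(int n)
{
	marker_hits = 0;
	for (int i = 0; i < n; i++)
		app_iteration(i);
	return marker_hits;
}
```

With the counter at its default of 0, run_loop() never enters the
marker body; between tracer_attach() and tracer_detach() every
iteration does.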

Implement the reference counter logic in trace_uprobe, leaving core
uprobe infrastructure as is, except one new callback from uprobe_mmap()
to trace_uprobe.

trace_uprobe definition with reference counter will now be:

  <path>:<offset>[(ref_ctr_offset)]
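
A minimal userspace sketch of how the "<offset>[(ref_ctr_offset)]" part
of such a definition can be split, loosely following the strchr() based
parsing in create_trace_uprobe() from this patch. parse_offsets() is a
hypothetical helper, and strtoul() is only an approximation of the
kernel's kstrtoul() (it is more permissive about leading whitespace and
signs).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split "<offset>[(ref_ctr_offset)]" in place.
 * Returns 0 on success and -EINVAL on malformed input.
 * *ref_ctr_offset is 0 when no reference counter offset is given.
 */
static int parse_offsets(char *arg, unsigned long *offset,
			 unsigned long *ref_ctr_offset)
{
	char *rctr, *rctr_end, *end;

	assert(arg && offset && ref_ctr_offset);

	*ref_ctr_offset = 0;
	rctr = strchr(arg, '(');
	if (rctr) {
		/* ')' must exist and must be the last character. */
		rctr_end = strchr(rctr, ')');
		if (!rctr_end || rctr_end[1] != '\0')
			return -EINVAL;

		*rctr++ = '\0';		/* terminate the uprobe offset */
		*rctr_end = '\0';	/* terminate the counter offset */

		errno = 0;
		*ref_ctr_offset = strtoul(rctr, &end, 0);
		if (errno || end == rctr || *end != '\0')
			return -EINVAL;
	}

	errno = 0;
	*offset = strtoul(arg, &end, 0);
	if (errno || end == arg || *end != '\0')
		return -EINVAL;

	return 0;
}
```

Passing a writable copy of "0x16a4d4(0x2799d8)" yields offset 0x16a4d4
and ref_ctr_offset 0x2799d8; a string without parentheses leaves
ref_ctr_offset at 0.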

There are two different cases when enabling the marker:
 1. Trace an existing process. In this case, find all suitable
    processes and increment the reference counter in them.
 2. Enable tracing before running the target binary. In this case,
    trace_uprobe is notified of every mmap and increments the
    reference counter if the corresponding uprobe is enabled.

When probes are disabled, decrement the reference counter in all
existing target processes.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: the 'reference counter' is called 'semaphore' in the original
Dtrace (and Systemtap, bcc and even ELF) documentation and code. But
the term 'semaphore' is misleading in this context. It is just a
counter holding the number of tracers tracing a marker; it is not used
for any synchronization. So we refer to it as 'reference counter' in
kernel / perf code.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[Fengguang reported/fixed build failure in RFC patch]
---
 include/linux/uprobes.h     |  10 +++
 kernel/events/uprobes.c     |  16 +++++
 kernel/trace/trace_uprobe.c | 162 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 186 insertions(+), 2 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 7bd2760..2db3ed1 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -122,6 +122,8 @@ struct uprobe_map_info {
 	unsigned long vaddr;
 };
 
+extern void (*uprobe_mmap_callback)(struct vm_area_struct *vma);
+
 extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
 extern bool is_swbp_insn(uprobe_opcode_t *insn);
@@ -136,6 +138,8 @@ struct uprobe_map_info {
 extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end);
 extern void uprobe_start_dup_mmap(void);
 extern void uprobe_end_dup_mmap(void);
+extern void uprobe_down_write_dup_mmap(void);
+extern void uprobe_up_write_dup_mmap(void);
 extern void uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm);
 extern void uprobe_free_utask(struct task_struct *t);
 extern void uprobe_copy_process(struct task_struct *t, unsigned long flags);
@@ -192,6 +196,12 @@ static inline void uprobe_start_dup_mmap(void)
 static inline void uprobe_end_dup_mmap(void)
 {
 }
+static inline void uprobe_down_write_dup_mmap(void)
+{
+}
+static inline void uprobe_up_write_dup_mmap(void)
+{
+}
 static inline void
 uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm)
 {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 096d1e6..c691334 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1044,6 +1044,9 @@ static void build_probe_list(struct inode *inode,
 	spin_unlock(&uprobes_treelock);
 }
 
+/* Right now the only user of this is trace_uprobe. */
+void (*uprobe_mmap_callback)(struct vm_area_struct *vma);
+
 /*
  * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
  *
@@ -1056,6 +1059,9 @@ int uprobe_mmap(struct vm_area_struct *vma)
 	struct uprobe *uprobe, *u;
 	struct inode *inode;
 
+	if (uprobe_mmap_callback)
+		uprobe_mmap_callback(vma);
+
 	if (no_uprobe_events() || !valid_vma(vma, true))
 		return 0;
 
@@ -1247,6 +1253,16 @@ void uprobe_end_dup_mmap(void)
 	percpu_up_read(&dup_mmap_sem);
 }
 
+void uprobe_down_write_dup_mmap(void)
+{
+	percpu_down_write(&dup_mmap_sem);
+}
+
+void uprobe_up_write_dup_mmap(void)
+{
+	percpu_up_write(&dup_mmap_sem);
+}
+
 void uprobe_dup_mmap(struct mm_struct *oldmm, struct mm_struct *newmm)
 {
 	if (test_bit(MMF_HAS_UPROBES, &oldmm->flags)) {
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 2014f43..5582c2d 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -25,6 +25,8 @@
 #include <linux/namei.h>
 #include <linux/string.h>
 #include <linux/rculist.h>
+#include <linux/sched/mm.h>
+#include <linux/highmem.h>
 
 #include "trace_probe.h"
 
@@ -58,6 +60,7 @@ struct trace_uprobe {
 	struct inode			*inode;
 	char				*filename;
 	unsigned long			offset;
+	unsigned long			ref_ctr_offset;
 	unsigned long			nhit;
 	struct trace_probe		tp;
 };
@@ -362,10 +365,10 @@ static int create_trace_uprobe(int argc, char **argv)
 {
 	struct trace_uprobe *tu;
 	struct inode *inode;
-	char *arg, *event, *group, *filename;
+	char *arg, *event, *group, *filename, *rctr, *rctr_end;
 	char buf[MAX_EVENT_NAME_LEN];
 	struct path path;
-	unsigned long offset;
+	unsigned long offset, ref_ctr_offset;
 	bool is_delete, is_return;
 	int i, ret;
 
@@ -375,6 +378,7 @@ static int create_trace_uprobe(int argc, char **argv)
 	is_return = false;
 	event = NULL;
 	group = NULL;
+	ref_ctr_offset = 0;
 
 	/* argc must be >= 1 */
 	if (argv[0][0] == '-')
@@ -454,6 +458,26 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 
+	/* Parse reference counter offset if specified. */
+	rctr = strchr(arg, '(');
+	if (rctr) {
+		rctr_end = strchr(rctr, ')');
+		if (rctr > rctr_end || *(rctr_end + 1) != 0) {
+			ret = -EINVAL;
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+
+		*rctr++ = '\0';
+		*rctr_end = '\0';
+		ret = kstrtoul(rctr, 0, &ref_ctr_offset);
+		if (ret) {
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+	}
+
+	/* Parse uprobe offset. */
 	ret = kstrtoul(arg, 0, &offset);
 	if (ret)
 		goto fail_address_parse;
@@ -488,6 +512,7 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 	tu->offset = offset;
+	tu->ref_ctr_offset = ref_ctr_offset;
 	tu->inode = inode;
 	tu->filename = kstrdup(filename, GFP_KERNEL);
 
@@ -620,6 +645,8 @@ static int probes_seq_show(struct seq_file *m, void *v)
 			break;
 		}
 	}
+	if (tu->ref_ctr_offset)
+		seq_printf(m, "(0x%lx)", tu->ref_ctr_offset);
 
 	for (i = 0; i < tu->tp.nr_args; i++)
 		seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
@@ -894,6 +921,129 @@ static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func,
 	return trace_handle_return(s);
 }
 
+static bool sdt_valid_vma(struct trace_uprobe *tu,
+			  struct vm_area_struct *vma,
+			  unsigned long vaddr)
+{
+	return tu->ref_ctr_offset &&
+		vma->vm_file &&
+		file_inode(vma->vm_file) == tu->inode &&
+		vma->vm_flags & VM_WRITE &&
+		vma->vm_start <= vaddr &&
+		vma->vm_end > vaddr;
+}
+
+static struct vm_area_struct *sdt_find_vma(struct trace_uprobe *tu,
+					   struct mm_struct *mm,
+					   unsigned long vaddr)
+{
+	struct vm_area_struct *vma = find_vma(mm, vaddr);
+
+	return (vma && sdt_valid_vma(tu, vma, vaddr)) ? vma : NULL;
+}
+
+/*
+ * Reference counter gates the invocation of probe. If present,
+ * by default reference counter is 0. One needs to increment
+ * it before tracing the probe and decrement it when done.
+ */
+static int
+sdt_update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
+{
+	void *kaddr;
+	struct page *page;
+	struct vm_area_struct *vma;
+	int ret = 0;
+	unsigned short *ptr;
+
+	if (vaddr == 0)
+		return -EINVAL;
+
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
+		FOLL_FORCE | FOLL_WRITE, &page, &vma, NULL);
+	if (ret <= 0)
+		return ret;
+
+	kaddr = kmap_atomic(page);
+	ptr = kaddr + (vaddr & ~PAGE_MASK);
+	*ptr += d;
+	kunmap_atomic(kaddr);
+
+	put_page(page);
+	return 0;
+}
+
+static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
+{
+	struct uprobe_map_info *info;
+
+	uprobe_down_write_dup_mmap();
+	info = uprobe_build_map_info(tu->inode->i_mapping,
+				tu->ref_ctr_offset, false);
+	if (IS_ERR(info))
+		goto out;
+
+	while (info) {
+		down_write(&info->mm->mmap_sem);
+
+		if (sdt_find_vma(tu, info->mm, info->vaddr))
+			sdt_update_ref_ctr(info->mm, info->vaddr, 1);
+
+		up_write(&info->mm->mmap_sem);
+		info = uprobe_free_map_info(info);
+	}
+
+out:
+	uprobe_up_write_dup_mmap();
+}
+
+/* Called with down_write(&vma->vm_mm->mmap_sem) */
+void trace_uprobe_mmap(struct vm_area_struct *vma)
+{
+	struct trace_uprobe *tu;
+	unsigned long vaddr;
+
+	if (!(vma->vm_flags & VM_WRITE))
+		return;
+
+	mutex_lock(&uprobe_lock);
+	list_for_each_entry(tu, &uprobe_list, list) {
+		if (!trace_probe_is_enabled(&tu->tp))
+			continue;
+
+		vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset);
+		if (!sdt_valid_vma(tu, vma, vaddr))
+			continue;
+
+		sdt_update_ref_ctr(vma->vm_mm, vaddr, 1);
+	}
+	mutex_unlock(&uprobe_lock);
+}
+
+static void sdt_decrement_ref_ctr(struct trace_uprobe *tu)
+{
+	struct uprobe_map_info *info;
+
+	uprobe_down_write_dup_mmap();
+	info = uprobe_build_map_info(tu->inode->i_mapping,
+				tu->ref_ctr_offset, false);
+	if (IS_ERR(info))
+		goto out;
+
+	while (info) {
+		down_write(&info->mm->mmap_sem);
+
+		if (sdt_find_vma(tu, info->mm, info->vaddr))
+			sdt_update_ref_ctr(info->mm, info->vaddr, -1);
+
+		up_write(&info->mm->mmap_sem);
+		info = uprobe_free_map_info(info);
+	}
+
+out:
+	uprobe_up_write_dup_mmap();
+}
+
 typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 				enum uprobe_filter_ctx ctx,
 				struct mm_struct *mm);
@@ -939,6 +1089,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 	if (ret)
 		goto err_buffer;
 
+	if (tu->ref_ctr_offset)
+		sdt_increment_ref_ctr(tu);
+
 	return 0;
 
  err_buffer:
@@ -979,6 +1132,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 
 	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
+	if (tu->ref_ctr_offset)
+		sdt_decrement_ref_ctr(tu);
+
 	uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
 	tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
 
@@ -1423,6 +1579,8 @@ static __init int init_uprobe_trace(void)
 	/* Profile interface */
 	trace_create_file("uprobe_profile", 0444, d_tracer,
 				    NULL, &uprobe_profile_ops);
+
+	uprobe_mmap_callback = trace_uprobe_mmap;
 	return 0;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
  2018-04-04  8:31 ` Ravi Bangoria
@ 2018-04-04  8:31   ` Ravi Bangoria
  -1 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-04  8:31 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar, rostedt
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

When the virtual memory map for a binary/library is being prepared,
there is no direct one-to-one mapping between mmap() calls and virtual
memory areas. For example, when the loader loads a library, it first
calls mmap(size = total_size), where total_size is the sum of the sizes
of all ELF sections that are going to be mapped. It then splits that
region into individual vmas with further mmap()/mprotect() calls. The
loader does this to ensure it gets a contiguous address range for the
library. load_elf_binary() uses similar tricks while preparing the
mappings of a binary.

For example, for the python library:

  # strace -o out python
    mmap(NULL, 2738968, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fff92460000
    mmap(0x7fff926a0000, 327680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x230000) = 0x7fff926a0000
    mprotect(0x7fff926a0000, 65536, PROT_READ) = 0

Here, the first mmap() maps the whole library into one region. The
second mmap() and the third call, mprotect(), split that region into
smaller vmas and set the appropriate protection flags.

In this case, trace_uprobe_mmap() updates the reference counter twice
-- once for the second mmap() call and once for the mprotect() call --
because both regions contain the reference counter.

But at de-registration, the reference counter gets decremented only
once, leaving it > 0 even if no one is tracing that marker.

Example with python library before patch:

    # readelf -n /lib64/libpython2.7.so.1.0 | grep -A1 function__entry
      Name: function__entry
      ... Semaphore: 0x00000000002899d8

  Probe on a marker:
    # echo "p:sdt_python/function__entry /usr/lib64/libpython2.7.so.1.0:0x16a4d4(0x2799d8)" > uprobe_events

  Start tracing:
    # perf record -e sdt_python:function__entry -a

  Run python workload:
    # python
    # cat /proc/`pgrep python`/maps | grep libpython
      7fffadb00000-7fffadd40000 r-xp 00000000 08:05 403934  /usr/lib64/libpython2.7.so.1.0
      7fffadd40000-7fffadd50000 r--p 00230000 08:05 403934  /usr/lib64/libpython2.7.so.1.0
      7fffadd50000-7fffadd90000 rw-p 00240000 08:05 403934  /usr/lib64/libpython2.7.so.1.0

  Reference counter value has been incremented twice:
    # dd if=/proc/`pgrep python`/mem bs=1 count=1 skip=$(( 0x7fffadd899d8 )) 2>/dev/null | xxd
      0000000: 02                                       .

  Kill perf:
    #
      ^C[ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.322 MB perf.data (1273 samples) ]

  Reference counter is still 1 even when no one is tracing on it:
    # dd if=/proc/`pgrep python`/mem bs=1 count=1 skip=$(( 0x7fffadd899d8 )) 2>/dev/null | xxd
      0000000: 01                                       .
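
The address passed to dd in the example above follows from the maps
output: the reference counter lives at file offset 0x2799d8, which
falls in the rw-p mapping that starts at file offset 0x240000. The
sketch below redoes that arithmetic in userspace. struct mapping and
file_offset_to_vaddr() are illustrative names only; the kernel side of
this series uses vma_offset_to_vaddr() on a real vma.

```c
#include <assert.h>
#include <stdint.h>

/* One line of /proc/<pid>/maps, reduced to what the sketch needs. */
struct mapping {
	uint64_t start;		/* first virtual address of the mapping */
	uint64_t end;		/* one past the last virtual address */
	uint64_t file_off;	/* file offset mapped at 'start' */
};

/* Translate a file offset into a virtual address inside 'm'. */
static uint64_t file_offset_to_vaddr(const struct mapping *m, uint64_t off)
{
	/* The offset must be covered by this mapping. */
	assert(off >= m->file_off);
	assert(off - m->file_off < m->end - m->start);

	return m->start + (off - m->file_off);
}
```

For the rw-p mapping 7fffadd50000-7fffadd90000 at offset 00240000, the
file offset 0x2799d8 translates to 0x7fffadd50000 + 0x399d8 =
0x7fffadd899d8, the skip value used with dd.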

Keep increments and decrements in sync by maintaining a list of mms in
trace_uprobe. Check for the mm in the list before incrementing the
reference counter, i.e. for each {trace_uprobe,mm} tuple the reference
counter must be incremented only once. Note that we don't check for
the mm in the list at decrement time.

We consider only two cases while incrementing the reference counter:
  1. The target binary is already running when we start tracing. In
     this case, find all mms that map the region of the target binary
     containing the reference counter. Loop over those mms and
     increment the counter if the mm is not already in the list.
  2. The tracer is already tracing before the target binary starts
     execution. In this case, trace_uprobe is notified of every
     mmap(vma) and updates the reference counter if vma->vm_mm is not
     already in the list.

  There is also a third case which we don't consider: fork(). When a
  process with markers forks, we don't explicitly increment the
  reference counter in the child process because dup_mmap() should
  take care of it. We also don't add the child mm to the list. This is
  fine because we don't check for the mm in the list at decrement
  time.
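
The bookkeeping described above can be modelled in a few lines of
userspace C. This is only a toy under stated assumptions: toy_mm,
toy_probe and the probe_*() helpers are invented for illustration, a
fixed-size array replaces the kernel linked list, and there is no
locking or mmu_notifier handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MMS 8	/* arbitrary capacity for the toy list */

struct toy_mm {
	unsigned short ref_ctr;	/* the marker's reference counter */
};

struct toy_probe {
	struct toy_mm *seen[MAX_MMS];	/* mms already incremented */
	size_t nr_seen;
};

static bool probe_has_mm(const struct toy_probe *p, const struct toy_mm *mm)
{
	for (size_t i = 0; i < p->nr_seen; i++)
		if (p->seen[i] == mm)
			return true;
	return false;
}

/* Called for every mapping notification: increment once per {probe,mm}. */
static void probe_mmap_notify(struct toy_probe *p, struct toy_mm *mm)
{
	if (probe_has_mm(p, mm))
		return;

	assert(p->nr_seen < MAX_MMS);
	mm->ref_ctr++;
	p->seen[p->nr_seen++] = mm;
}

/* Disable path: decrement without consulting the list, then forget mm. */
static void probe_disable_for(struct toy_probe *p, struct toy_mm *mm)
{
	mm->ref_ctr--;

	for (size_t i = 0; i < p->nr_seen; i++) {
		if (p->seen[i] == mm) {
			p->seen[i] = p->seen[--p->nr_seen];
			break;
		}
	}
}
```

Two notifications for the same mm leave its counter at 1. A forked
child modelled as a plain copy of the parent's toy_mm is never added to
the list, yet the unconditional decrement still brings its counter back
to 0.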

After patch:

  Start perf record and then run python...
  Reference counter value has been incremented only once:
    # dd if=/proc/`pgrep python`/mem bs=1 count=1 skip=$(( 0x7fff9cbf99d8 )) 2>/dev/null | xxd
      0000000: 01                                       .

  Kill perf:
    #
      ^C[ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.364 MB perf.data (1427 samples) ]

  Reference counter is reset to 0:
    # dd if=/proc/`pgrep python`/mem bs=1 count=1 skip=$(( 0x7fff9cbb99d8 )) 2>/dev/null | xxd
      0000000: 00                                       .

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 kernel/trace/trace_uprobe.c | 105 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 102 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 5582c2d..c045174 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -27,6 +27,7 @@
 #include <linux/rculist.h>
 #include <linux/sched/mm.h>
 #include <linux/highmem.h>
+#include <linux/mmu_notifier.h>
 
 #include "trace_probe.h"
 
@@ -50,6 +51,11 @@ struct trace_uprobe_filter {
 	struct list_head	perf_events;
 };
 
+struct sdt_mm_list {
+	struct list_head list;
+	struct mm_struct *mm;
+};
+
 /*
  * uprobe event core functions
  */
@@ -61,6 +67,8 @@ struct trace_uprobe {
 	char				*filename;
 	unsigned long			offset;
 	unsigned long			ref_ctr_offset;
+	struct sdt_mm_list		sml;
+	struct mutex			sml_lock;
 	unsigned long			nhit;
 	struct trace_probe		tp;
 };
@@ -274,6 +282,8 @@ static inline bool is_ret_probe(struct trace_uprobe *tu)
 	if (is_ret)
 		tu->consumer.ret_handler = uretprobe_dispatcher;
 	init_trace_uprobe_filter(&tu->filter);
+	mutex_init(&tu->sml_lock);
+	INIT_LIST_HEAD(&(tu->sml.list));
 	return tu;
 
 error:
@@ -921,6 +931,56 @@ static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func,
 	return trace_handle_return(s);
 }
 
+static bool sdt_check_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
+{
+	struct sdt_mm_list *sml;
+
+	list_for_each_entry(sml, &(tu->sml.list), list)
+		if (sml->mm == mm)
+			return true;
+
+	return false;
+}
+
+static void sdt_mm_release(struct mmu_notifier *mn, struct mm_struct *mm);
+
+static const struct mmu_notifier_ops sdt_mmu_notifier_ops = {
+	.release = sdt_mm_release,
+};
+
+static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
+{
+	struct mmu_notifier *mn;
+	struct sdt_mm_list *sml = kzalloc(sizeof(*sml), GFP_KERNEL);
+
+	if (!sml)
+		return;
+	sml->mm = mm;
+	list_add(&(sml->list), &(tu->sml.list));
+
+	/* Register mmu_notifier for this mm. */
+	mn = kzalloc(sizeof(*mn), GFP_KERNEL);
+	if (!mn)
+		return;
+
+	mn->ops = &sdt_mmu_notifier_ops;
+	__mmu_notifier_register(mn, mm);
+}
+
+static void sdt_del_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
+{
+	struct list_head *pos, *q;
+	struct sdt_mm_list *sml;
+
+	list_for_each_safe(pos, q, &(tu->sml.list)) {
+		sml = list_entry(pos, struct sdt_mm_list, list);
+		if (sml->mm == mm) {
+			list_del(pos);
+			kfree(sml);
+		}
+	}
+}
+
 static bool sdt_valid_vma(struct trace_uprobe *tu,
 			  struct vm_area_struct *vma,
 			  unsigned long vaddr)
@@ -983,15 +1043,22 @@ static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
 	if (IS_ERR(info))
 		goto out;
 
+	mutex_lock(&tu->sml_lock);
 	while (info) {
+		if (sdt_check_mm_list(tu, info->mm))
+			goto cont;
+
 		down_write(&info->mm->mmap_sem);
 
-		if (sdt_find_vma(tu, info->mm, info->vaddr))
-			sdt_update_ref_ctr(info->mm, info->vaddr, 1);
+		if (sdt_find_vma(tu, info->mm, info->vaddr) &&
+		    !sdt_update_ref_ctr(info->mm, info->vaddr, 1))
+			sdt_add_mm_list(tu, info->mm);
 
 		up_write(&info->mm->mmap_sem);
+cont:
 		info = uprobe_free_map_info(info);
 	}
+	mutex_unlock(&tu->sml_lock);
 
 out:
 	uprobe_up_write_dup_mmap();
@@ -1015,11 +1082,27 @@ void trace_uprobe_mmap(struct vm_area_struct *vma)
 		if (!sdt_valid_vma(tu, vma, vaddr))
 			continue;
 
-		sdt_update_ref_ctr(vma->vm_mm, vaddr, 1);
+		mutex_lock(&tu->sml_lock);
+
+		if (!sdt_check_mm_list(tu, vma->vm_mm) &&
+		    !sdt_update_ref_ctr(vma->vm_mm, vaddr, 1))
+			sdt_add_mm_list(tu, vma->vm_mm);
+
+		mutex_unlock(&tu->sml_lock);
 	}
 	mutex_unlock(&uprobe_lock);
 }
 
+/*
+ * We don't check presence of mm in tu->sml here. We just decrement
+ * the reference counter if we find vma holding the reference counter.
+ *
+ * For tiny binaries/libraries, different mmap regions point to the
+ * same file portion. In such cases, uprobe_build_map_info() returns
+ * same mm multiple times with different virtual address of one
+ * reference counter. But we don't decrement the reference counter
+ * multiple times because we check for VM_WRITE in sdt_valid_vma().
+ */
 static void sdt_decrement_ref_ctr(struct trace_uprobe *tu)
 {
 	struct uprobe_map_info *info;
@@ -1030,6 +1113,7 @@ static void sdt_decrement_ref_ctr(struct trace_uprobe *tu)
 	if (IS_ERR(info))
 		goto out;
 
+	mutex_lock(&tu->sml_lock);
 	while (info) {
 		down_write(&info->mm->mmap_sem);
 
@@ -1037,13 +1121,28 @@ static void sdt_decrement_ref_ctr(struct trace_uprobe *tu)
 			sdt_update_ref_ctr(info->mm, info->vaddr, -1);
 
 		up_write(&info->mm->mmap_sem);
+		sdt_del_mm_list(tu, info->mm);
 		info = uprobe_free_map_info(info);
 	}
+	mutex_unlock(&tu->sml_lock);
 
 out:
 	uprobe_up_write_dup_mmap();
 }
 
+static void sdt_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct trace_uprobe *tu;
+
+	mutex_lock(&uprobe_lock);
+	list_for_each_entry(tu, &uprobe_list, list) {
+		mutex_lock(&tu->sml_lock);
+		sdt_del_mm_list(tu, mm);
+		mutex_unlock(&tu->sml_lock);
+	}
+	mutex_unlock(&uprobe_lock);
+}
+
 typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 				enum uprobe_filter_ctx ctx,
 				struct mm_struct *mm);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
@ 2018-04-04  8:31   ` Ravi Bangoria
  0 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-04  8:31 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar, rostedt
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

When the virtual memory map for a binary/library is being prepared,
there is no direct one-to-one mapping between mmap() calls and virtual
memory areas. For example, when the loader loads a library, it first
calls mmap(size = total_size), where total_size is the sum of the sizes
of all ELF sections that are going to be mapped. It then splits that
region into individual vmas with further mmap()/mprotect() calls. The
loader does this to ensure it gets a contiguous address range for the
library. load_elf_binary() uses similar tricks while preparing the
mappings of a binary.

For example, for the python library:

  # strace -o out python
    mmap(NULL, 2738968, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fff92460000
    mmap(0x7fff926a0000, 327680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x230000) = 0x7fff926a0000
    mprotect(0x7fff926a0000, 65536, PROT_READ) = 0

Here, the first mmap() maps the whole library into one region. The
second mmap() and the third call, mprotect(), split that region into
smaller vmas and set the appropriate protection flags.

In this case, trace_uprobe_mmap() updates the reference counter twice
-- once for the second mmap() call and once for the mprotect() call --
because both regions contain the reference counter.

But at de-registration, the reference counter gets decremented only
once, leaving it > 0 even if no one is tracing that marker.

Example with python library before patch:

    # readelf -n /lib64/libpython2.7.so.1.0 | grep -A1 function__entry
      Name: function__entry
      ... Semaphore: 0x00000000002899d8

  Probe on a marker:
    # echo "p:sdt_python/function__entry /usr/lib64/libpython2.7.so.1.0:0x16a4d4(0x2799d8)" > uprobe_events

  Start tracing:
    # perf record -e sdt_python:function__entry -a

  Run python workload:
    # python
    # cat /proc/`pgrep python`/maps | grep libpython
      7fffadb00000-7fffadd40000 r-xp 00000000 08:05 403934  /usr/lib64/libpython2.7.so.1.0
      7fffadd40000-7fffadd50000 r--p 00230000 08:05 403934  /usr/lib64/libpython2.7.so.1.0
      7fffadd50000-7fffadd90000 rw-p 00240000 08:05 403934  /usr/lib64/libpython2.7.so.1.0

  Reference counter value has been incremented twice:
    # dd if=/proc/`pgrep python`/mem bs=1 count=1 skip=$(( 0x7fffadd899d8 )) 2>/dev/null | xxd
      0000000: 02                                       .

  Kill perf:
    #
      ^C[ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.322 MB perf.data (1273 samples) ]

  Reference counter is still 1 even when no one is tracing on it:
    # dd if=/proc/`pgrep python`/mem bs=1 count=1 skip=$(( 0x7fffadd899d8 )) 2>/dev/null | xxd
      0000000: 01                                       .

Keep increments and decrements in sync by maintaining a list of mms in
trace_uprobe. Check for the mm in the list before incrementing the
reference counter, i.e. for each {trace_uprobe,mm} tuple the reference
counter must be incremented only once. Note that we don't check for
the mm in the list at decrement time.

We consider only two cases while incrementing the reference counter:
  1. The target binary is already running when we start tracing. In
     this case, find all mms that map the region of the target binary
     containing the reference counter. Loop over those mms and
     increment the counter if the mm is not already in the list.
  2. The tracer is already tracing before the target binary starts
     execution. In this case, trace_uprobe is notified of every
     mmap(vma) and updates the reference counter if vma->vm_mm is not
     already in the list.

  There is also a third case which we don't consider: fork(). When a
  process with markers forks, we don't explicitly increment the
  reference counter in the child process because dup_mmap() should
  take care of it. We also don't add the child mm to the list. This is
  fine because we don't check for the mm in the list at decrement
  time.

After patch:

  Start perf record and then run python...
  Reference counter value has been incremented only once:
    # dd if=/proc/`pgrep python`/mem bs=1 count=1 skip=$(( 0x7fff9cbf99d8 )) 2>/dev/null | xxd
      0000000: 01                                       .

  Kill perf:
    #
      ^C[ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.364 MB perf.data (1427 samples) ]

  Reference counter is reset to 0:
    # dd if=/proc/`pgrep python`/mem bs=1 count=1 skip=$(( 0x7fff9cbb99d8 )) 2>/dev/null | xxd
      0000000: 00                                       .

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 kernel/trace/trace_uprobe.c | 105 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 102 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 5582c2d..c045174 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -27,6 +27,7 @@
 #include <linux/rculist.h>
 #include <linux/sched/mm.h>
 #include <linux/highmem.h>
+#include <linux/mmu_notifier.h>
 
 #include "trace_probe.h"
 
@@ -50,6 +51,11 @@ struct trace_uprobe_filter {
 	struct list_head	perf_events;
 };
 
+struct sdt_mm_list {
+	struct list_head list;
+	struct mm_struct *mm;
+};
+
 /*
  * uprobe event core functions
  */
@@ -61,6 +67,8 @@ struct trace_uprobe {
 	char				*filename;
 	unsigned long			offset;
 	unsigned long			ref_ctr_offset;
+	struct sdt_mm_list		sml;
+	struct mutex			sml_lock;
 	unsigned long			nhit;
 	struct trace_probe		tp;
 };
@@ -274,6 +282,8 @@ static inline bool is_ret_probe(struct trace_uprobe *tu)
 	if (is_ret)
 		tu->consumer.ret_handler = uretprobe_dispatcher;
 	init_trace_uprobe_filter(&tu->filter);
+	mutex_init(&tu->sml_lock);
+	INIT_LIST_HEAD(&(tu->sml.list));
 	return tu;
 
 error:
@@ -921,6 +931,56 @@ static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func,
 	return trace_handle_return(s);
 }
 
+static bool sdt_check_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
+{
+	struct sdt_mm_list *sml;
+
+	list_for_each_entry(sml, &(tu->sml.list), list)
+		if (sml->mm == mm)
+			return true;
+
+	return false;
+}
+
+static void sdt_mm_release(struct mmu_notifier *mn, struct mm_struct *mm);
+
+static const struct mmu_notifier_ops sdt_mmu_notifier_ops = {
+	.release = sdt_mm_release,
+};
+
+static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
+{
+	struct mmu_notifier *mn;
+	struct sdt_mm_list *sml = kzalloc(sizeof(*sml), GFP_KERNEL);
+
+	if (!sml)
+		return;
+	sml->mm = mm;
+	list_add(&(sml->list), &(tu->sml.list));
+
+	/* Register mmu_notifier for this mm. */
+	mn = kzalloc(sizeof(*mn), GFP_KERNEL);
+	if (!mn)
+		return;
+
+	mn->ops = &sdt_mmu_notifier_ops;
+	__mmu_notifier_register(mn, mm);
+}
+
+static void sdt_del_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
+{
+	struct list_head *pos, *q;
+	struct sdt_mm_list *sml;
+
+	list_for_each_safe(pos, q, &(tu->sml.list)) {
+		sml = list_entry(pos, struct sdt_mm_list, list);
+		if (sml->mm == mm) {
+			list_del(pos);
+			kfree(sml);
+		}
+	}
+}
+
 static bool sdt_valid_vma(struct trace_uprobe *tu,
 			  struct vm_area_struct *vma,
 			  unsigned long vaddr)
@@ -983,15 +1043,22 @@ static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
 	if (IS_ERR(info))
 		goto out;
 
+	mutex_lock(&tu->sml_lock);
 	while (info) {
+		if (sdt_check_mm_list(tu, info->mm))
+			goto cont;
+
 		down_write(&info->mm->mmap_sem);
 
-		if (sdt_find_vma(tu, info->mm, info->vaddr))
-			sdt_update_ref_ctr(info->mm, info->vaddr, 1);
+		if (sdt_find_vma(tu, info->mm, info->vaddr) &&
+		    !sdt_update_ref_ctr(info->mm, info->vaddr, 1))
+			sdt_add_mm_list(tu, info->mm);
 
 		up_write(&info->mm->mmap_sem);
+cont:
 		info = uprobe_free_map_info(info);
 	}
+	mutex_unlock(&tu->sml_lock);
 
 out:
 	uprobe_up_write_dup_mmap();
@@ -1015,11 +1082,27 @@ void trace_uprobe_mmap(struct vm_area_struct *vma)
 		if (!sdt_valid_vma(tu, vma, vaddr))
 			continue;
 
-		sdt_update_ref_ctr(vma->vm_mm, vaddr, 1);
+		mutex_lock(&tu->sml_lock);
+
+		if (!sdt_check_mm_list(tu, vma->vm_mm) &&
+		    !sdt_update_ref_ctr(vma->vm_mm, vaddr, 1))
+			sdt_add_mm_list(tu, vma->vm_mm);
+
+		mutex_unlock(&tu->sml_lock);
 	}
 	mutex_unlock(&uprobe_lock);
 }
 
+/*
+ * We don't check presence of mm in tu->sml here. We just decrement
+ * the reference counter if we find vma holding the reference counter.
+ *
+ * For tiny binaries/libraries, different mmap regions point to the
+ * same file portion. In such cases, uprobe_build_map_info() returns
+ * same mm multiple times with different virtual address of one
+ * reference counter. But we don't decrement the reference counter
+ * multiple time because we check for VM_WRITE in sdt_valid_vma().
+ */
 static void sdt_decrement_ref_ctr(struct trace_uprobe *tu)
 {
 	struct uprobe_map_info *info;
@@ -1030,6 +1113,7 @@ static void sdt_decrement_ref_ctr(struct trace_uprobe *tu)
 	if (IS_ERR(info))
 		goto out;
 
+	mutex_lock(&tu->sml_lock);
 	while (info) {
 		down_write(&info->mm->mmap_sem);
 
@@ -1037,13 +1121,28 @@ static void sdt_decrement_ref_ctr(struct trace_uprobe *tu)
 			sdt_update_ref_ctr(info->mm, info->vaddr, -1);
 
 		up_write(&info->mm->mmap_sem);
+		sdt_del_mm_list(tu, info->mm);
 		info = uprobe_free_map_info(info);
 	}
+	mutex_unlock(&tu->sml_lock);
 
 out:
 	uprobe_up_write_dup_mmap();
 }
 
+static void sdt_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct trace_uprobe *tu;
+
+	mutex_lock(&uprobe_lock);
+	list_for_each_entry(tu, &uprobe_list, list) {
+		mutex_lock(&tu->sml_lock);
+		sdt_del_mm_list(tu, mm);
+		mutex_unlock(&tu->sml_lock);
+	}
+	mutex_unlock(&uprobe_lock);
+}
+
 typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 				enum uprobe_filter_ctx ctx,
 				struct mm_struct *mm);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 8/9] trace_uprobe/sdt: Document about reference counter
  2018-04-04  8:31 ` Ravi Bangoria
@ 2018-04-04  8:31   ` Ravi Bangoria
  -1 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-04  8:31 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar, rostedt
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

A reference counter gates the invocation of a probe. If present,
the reference count is 0 by default. The kernel needs to increment
it before tracing the probe and decrement it when done. This
is identical to the semaphore in Userspace Statically Defined
Tracepoints (USDT).

Document the usage of the reference counter.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 Documentation/trace/uprobetracer.txt | 16 +++++++++++++---
 kernel/trace/trace.c                 |  2 +-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/uprobetracer.txt b/Documentation/trace/uprobetracer.txt
index bf526a7c..cb6751d 100644
--- a/Documentation/trace/uprobetracer.txt
+++ b/Documentation/trace/uprobetracer.txt
@@ -19,15 +19,25 @@ user to calculate the offset of the probepoint in the object.
 
 Synopsis of uprobe_tracer
 -------------------------
-  p[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a uprobe
-  r[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a return uprobe (uretprobe)
-  -:[GRP/]EVENT                           : Clear uprobe or uretprobe event
+  p[:[GRP/]EVENT] PATH:OFFSET[(REF_CTR_OFFSET)] [FETCHARGS]
+  r[:[GRP/]EVENT] PATH:OFFSET[(REF_CTR_OFFSET)] [FETCHARGS]
+  -:[GRP/]EVENT
+
+  p : Set a uprobe
+  r : Set a return uprobe (uretprobe)
+  - : Clear uprobe or uretprobe event
 
   GRP           : Group name. If omitted, "uprobes" is the default value.
   EVENT         : Event name. If omitted, the event name is generated based
                   on PATH+OFFSET.
   PATH          : Path to an executable or a library.
   OFFSET        : Offset where the probe is inserted.
+  REF_CTR_OFFSET: Reference counter offset. Optional field. Reference count
+		  gate the invocation of probe. If present, by default
+		  reference count is 0. Kernel needs to increment it before
+		  tracing the probe and decrement it when done. This is
+		  identical to semaphore in Userspace Statically Defined
+		  Tracepoints (USDT).
 
   FETCHARGS     : Arguments. Each probe can have up to 128 args.
    %REG         : Fetch register REG
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 300f4ea..d211937 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4604,7 +4604,7 @@ static int tracing_trace_options_open(struct inode *inode, struct file *file)
   "place (kretprobe): [<module>:]<symbol>[+<offset>]|<memaddr>\n"
 #endif
 #ifdef CONFIG_UPROBE_EVENTS
-	"\t    place: <path>:<offset>\n"
+  "   place (uprobe): <path>:<offset>[(ref_ctr_offset)]\n"
 #endif
 	"\t     args: <name>=fetcharg[:type]\n"
 	"\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 9/9] perf probe: Support SDT markers having reference counter (semaphore)
  2018-04-04  8:31 ` Ravi Bangoria
@ 2018-04-04  8:31   ` Ravi Bangoria
  -1 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-04  8:31 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar, rostedt
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

With this, perf buildid-cache will save SDT markers that have a reference
counter in the probe cache, and perf probe will be able to probe such
markers. For example:

  # readelf -n /tmp/tick | grep -A1 loop2
    Name: loop2
    ... Semaphore: 0x0000000010020036

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 tools/perf/util/probe-event.c | 18 ++++++++++++++---
 tools/perf/util/probe-event.h |  1 +
 tools/perf/util/probe-file.c  | 34 ++++++++++++++++++++++++++------
 tools/perf/util/probe-file.h  |  1 +
 tools/perf/util/symbol-elf.c  | 46 ++++++++++++++++++++++++++++++++-----------
 tools/perf/util/symbol.h      |  7 +++++++
 6 files changed, 86 insertions(+), 21 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index e1dbc98..b3a1330 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1832,6 +1832,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev)
 			tp->offset = strtoul(fmt2_str, NULL, 10);
 	}
 
+	if (tev->uprobes) {
+		fmt2_str = strchr(p, '(');
+		if (fmt2_str)
+			tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
+	}
+
 	tev->nargs = argc - 2;
 	tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
 	if (tev->args == NULL) {
@@ -2054,15 +2060,21 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 	}
 
 	/* Use the tp->address for uprobes */
-	if (tev->uprobes)
+	if (tev->uprobes) {
 		err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
-	else if (!strncmp(tp->symbol, "0x", 2))
+		if (uprobe_ref_ctr_is_supported() &&
+		    tp->ref_ctr_offset &&
+		    err >= 0)
+			err = strbuf_addf(&buf, "(0x%lx)", tp->ref_ctr_offset);
+	} else if (!strncmp(tp->symbol, "0x", 2)) {
 		/* Absolute address. See try_to_find_absolute_address() */
 		err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
 				  tp->module ? ":" : "", tp->address);
-	else
+	} else {
 		err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
 				tp->module ? ":" : "", tp->symbol, tp->offset);
+	}
+
 	if (err)
 		goto error;
 
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 45b14f0..15a98c3 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -27,6 +27,7 @@ struct probe_trace_point {
 	char		*symbol;	/* Base symbol */
 	char		*module;	/* Module name */
 	unsigned long	offset;		/* Offset from symbol */
+	unsigned long	ref_ctr_offset;	/* SDT reference counter offset */
 	unsigned long	address;	/* Actual address of the trace point */
 	bool		retprobe;	/* Return probe flag */
 };
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index 4ae1123..ca0e524 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -697,8 +697,16 @@ int probe_cache__add_entry(struct probe_cache *pcache,
 #ifdef HAVE_GELF_GETNOTE_SUPPORT
 static unsigned long long sdt_note__get_addr(struct sdt_note *note)
 {
-	return note->bit32 ? (unsigned long long)note->addr.a32[0]
-		 : (unsigned long long)note->addr.a64[0];
+	return note->bit32 ?
+		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_LOC] :
+		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_LOC];
+}
+
+static unsigned long long sdt_note__get_ref_ctr_offset(struct sdt_note *note)
+{
+	return note->bit32 ?
+		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_REFCTR] :
+		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_REFCTR];
 }
 
 static const char * const type_to_suffix[] = {
@@ -776,14 +784,21 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note,
 {
 	struct strbuf buf;
 	char *ret = NULL, **args;
-	int i, args_count;
+	int i, args_count, err;
+	unsigned long long ref_ctr_offset;
 
 	if (strbuf_init(&buf, 32) < 0)
 		return NULL;
 
-	if (strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
-				sdtgrp, note->name, pathname,
-				sdt_note__get_addr(note)) < 0)
+	err = strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
+			sdtgrp, note->name, pathname,
+			sdt_note__get_addr(note));
+
+	ref_ctr_offset = sdt_note__get_ref_ctr_offset(note);
+	if (uprobe_ref_ctr_is_supported() && ref_ctr_offset && err >= 0)
+		err = strbuf_addf(&buf, "(0x%llx)", ref_ctr_offset);
+
+	if (err < 0)
 		goto error;
 
 	if (!note->args)
@@ -999,6 +1014,7 @@ int probe_cache__show_all_caches(struct strfilter *filter)
 enum ftrace_readme {
 	FTRACE_README_PROBE_TYPE_X = 0,
 	FTRACE_README_KRETPROBE_OFFSET,
+	FTRACE_README_UPROBE_REF_CTR,
 	FTRACE_README_END,
 };
 
@@ -1010,6 +1026,7 @@ enum ftrace_readme {
 	[idx] = {.pattern = pat, .avail = false}
 	DEFINE_TYPE(FTRACE_README_PROBE_TYPE_X, "*type: * x8/16/32/64,*"),
 	DEFINE_TYPE(FTRACE_README_KRETPROBE_OFFSET, "*place (kretprobe): *"),
+	DEFINE_TYPE(FTRACE_README_UPROBE_REF_CTR, "*ref_ctr_offset*"),
 };
 
 static bool scan_ftrace_readme(enum ftrace_readme type)
@@ -1065,3 +1082,8 @@ bool kretprobe_offset_is_supported(void)
 {
 	return scan_ftrace_readme(FTRACE_README_KRETPROBE_OFFSET);
 }
+
+bool uprobe_ref_ctr_is_supported(void)
+{
+	return scan_ftrace_readme(FTRACE_README_UPROBE_REF_CTR);
+}
diff --git a/tools/perf/util/probe-file.h b/tools/perf/util/probe-file.h
index 63f29b1..2a24918 100644
--- a/tools/perf/util/probe-file.h
+++ b/tools/perf/util/probe-file.h
@@ -69,6 +69,7 @@ struct probe_cache_entry *probe_cache__find_by_name(struct probe_cache *pcache,
 int probe_cache__show_all_caches(struct strfilter *filter);
 bool probe_type_is_available(enum probe_type type);
 bool kretprobe_offset_is_supported(void);
+bool uprobe_ref_ctr_is_supported(void);
 #else	/* ! HAVE_LIBELF_SUPPORT */
 static inline struct probe_cache *probe_cache__new(const char *tgt __maybe_unused, struct nsinfo *nsi __maybe_unused)
 {
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 2de7705..45b7dba 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1803,6 +1803,34 @@ void kcore_extract__delete(struct kcore_extract *kce)
 }
 
 #ifdef HAVE_GELF_GETNOTE_SUPPORT
+
+static void sdt_adjust_loc(struct sdt_note *tmp, GElf_Addr base_off)
+{
+	if (!base_off)
+		return;
+
+	if (tmp->bit32)
+		tmp->addr.a32[SDT_NOTE_IDX_LOC] =
+			tmp->addr.a32[SDT_NOTE_IDX_LOC] + base_off -
+			tmp->addr.a32[SDT_NOTE_IDX_BASE];
+	else
+		tmp->addr.a64[SDT_NOTE_IDX_LOC] =
+			tmp->addr.a64[SDT_NOTE_IDX_LOC] + base_off -
+			tmp->addr.a64[SDT_NOTE_IDX_BASE];
+}
+
+static void sdt_adjust_refctr(struct sdt_note *tmp, GElf_Addr base_addr,
+			      GElf_Addr base_off)
+{
+	if (!base_off)
+		return;
+
+	if (tmp->bit32)
+		tmp->addr.a32[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
+	else
+		tmp->addr.a64[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
+}
+
 /**
  * populate_sdt_note : Parse raw data and identify SDT note
  * @elf: elf of the opened file
@@ -1820,7 +1848,6 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
 	const char *provider, *name, *args;
 	struct sdt_note *tmp = NULL;
 	GElf_Ehdr ehdr;
-	GElf_Addr base_off = 0;
 	GElf_Shdr shdr;
 	int ret = -EINVAL;
 
@@ -1916,17 +1943,12 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
 	 * base address in the description of the SDT note. If its different,
 	 * then accordingly, adjust the note location.
 	 */
-	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL)) {
-		base_off = shdr.sh_offset;
-		if (base_off) {
-			if (tmp->bit32)
-				tmp->addr.a32[0] = tmp->addr.a32[0] + base_off -
-					tmp->addr.a32[1];
-			else
-				tmp->addr.a64[0] = tmp->addr.a64[0] + base_off -
-					tmp->addr.a64[1];
-		}
-	}
+	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL))
+		sdt_adjust_loc(tmp, shdr.sh_offset);
+
+	/* Adjust reference counter offset */
+	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_PROBES_SCN, NULL))
+		sdt_adjust_refctr(tmp, shdr.sh_addr, shdr.sh_offset);
 
 	list_add_tail(&tmp->note_list, sdt_notes);
 	return 0;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 70c16741..aa095bf 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -384,12 +384,19 @@ struct sdt_note {
 int cleanup_sdt_note_list(struct list_head *sdt_notes);
 int sdt_notes__get_count(struct list_head *start);
 
+#define SDT_PROBES_SCN ".probes"
 #define SDT_BASE_SCN ".stapsdt.base"
 #define SDT_NOTE_SCN  ".note.stapsdt"
 #define SDT_NOTE_TYPE 3
 #define SDT_NOTE_NAME "stapsdt"
 #define NR_ADDR 3
 
+enum {
+	SDT_NOTE_IDX_LOC = 0,
+	SDT_NOTE_IDX_BASE,
+	SDT_NOTE_IDX_REFCTR,
+};
+
 struct mem_info *mem_info__new(void);
 struct mem_info *mem_info__get(struct mem_info *mi);
 void   mem_info__put(struct mem_info *mi);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 9/9] perf probe: Support SDT markers having reference counter (semaphore)
@ 2018-04-04  8:31   ` Ravi Bangoria
  0 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-04  8:31 UTC (permalink / raw)
  To: mhiramat, oleg, peterz, srikar, rostedt
  Cc: acme, ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

With this, perf buildid-cache will save SDT markers with reference
counter in probe cache. Perf probe will be able to probe markers
having reference counter. Ex,

  # readelf -n /tmp/tick | grep -A1 loop2
    Name: loop2
    ... Semaphore: 0x0000000010020036

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 tools/perf/util/probe-event.c | 18 ++++++++++++++---
 tools/perf/util/probe-event.h |  1 +
 tools/perf/util/probe-file.c  | 34 ++++++++++++++++++++++++++------
 tools/perf/util/probe-file.h  |  1 +
 tools/perf/util/symbol-elf.c  | 46 ++++++++++++++++++++++++++++++++-----------
 tools/perf/util/symbol.h      |  7 +++++++
 6 files changed, 86 insertions(+), 21 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index e1dbc98..b3a1330 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1832,6 +1832,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev)
 			tp->offset = strtoul(fmt2_str, NULL, 10);
 	}
 
+	if (tev->uprobes) {
+		fmt2_str = strchr(p, '(');
+		if (fmt2_str)
+			tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
+	}
+
 	tev->nargs = argc - 2;
 	tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
 	if (tev->args == NULL) {
@@ -2054,15 +2060,21 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 	}
 
 	/* Use the tp->address for uprobes */
-	if (tev->uprobes)
+	if (tev->uprobes) {
 		err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
-	else if (!strncmp(tp->symbol, "0x", 2))
+		if (uprobe_ref_ctr_is_supported() &&
+		    tp->ref_ctr_offset &&
+		    err >= 0)
+			err = strbuf_addf(&buf, "(0x%lx)", tp->ref_ctr_offset);
+	} else if (!strncmp(tp->symbol, "0x", 2)) {
 		/* Absolute address. See try_to_find_absolute_address() */
 		err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
 				  tp->module ? ":" : "", tp->address);
-	else
+	} else {
 		err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
 				tp->module ? ":" : "", tp->symbol, tp->offset);
+	}
+
 	if (err)
 		goto error;
 
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 45b14f0..15a98c3 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -27,6 +27,7 @@ struct probe_trace_point {
 	char		*symbol;	/* Base symbol */
 	char		*module;	/* Module name */
 	unsigned long	offset;		/* Offset from symbol */
+	unsigned long	ref_ctr_offset;	/* SDT reference counter offset */
 	unsigned long	address;	/* Actual address of the trace point */
 	bool		retprobe;	/* Return probe flag */
 };
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index 4ae1123..ca0e524 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -697,8 +697,16 @@ int probe_cache__add_entry(struct probe_cache *pcache,
 #ifdef HAVE_GELF_GETNOTE_SUPPORT
 static unsigned long long sdt_note__get_addr(struct sdt_note *note)
 {
-	return note->bit32 ? (unsigned long long)note->addr.a32[0]
-		 : (unsigned long long)note->addr.a64[0];
+	return note->bit32 ?
+		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_LOC] :
+		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_LOC];
+}
+
+static unsigned long long sdt_note__get_ref_ctr_offset(struct sdt_note *note)
+{
+	return note->bit32 ?
+		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_REFCTR] :
+		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_REFCTR];
 }
 
 static const char * const type_to_suffix[] = {
@@ -776,14 +784,21 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note,
 {
 	struct strbuf buf;
 	char *ret = NULL, **args;
-	int i, args_count;
+	int i, args_count, err;
+	unsigned long long ref_ctr_offset;
 
 	if (strbuf_init(&buf, 32) < 0)
 		return NULL;
 
-	if (strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
-				sdtgrp, note->name, pathname,
-				sdt_note__get_addr(note)) < 0)
+	err = strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
+			sdtgrp, note->name, pathname,
+			sdt_note__get_addr(note));
+
+	ref_ctr_offset = sdt_note__get_ref_ctr_offset(note);
+	if (uprobe_ref_ctr_is_supported() && ref_ctr_offset && err >= 0)
+		err = strbuf_addf(&buf, "(0x%llx)", ref_ctr_offset);
+
+	if (err < 0)
 		goto error;
 
 	if (!note->args)
@@ -999,6 +1014,7 @@ int probe_cache__show_all_caches(struct strfilter *filter)
 enum ftrace_readme {
 	FTRACE_README_PROBE_TYPE_X = 0,
 	FTRACE_README_KRETPROBE_OFFSET,
+	FTRACE_README_UPROBE_REF_CTR,
 	FTRACE_README_END,
 };
 
@@ -1010,6 +1026,7 @@ enum ftrace_readme {
 	[idx] = {.pattern = pat, .avail = false}
 	DEFINE_TYPE(FTRACE_README_PROBE_TYPE_X, "*type: * x8/16/32/64,*"),
 	DEFINE_TYPE(FTRACE_README_KRETPROBE_OFFSET, "*place (kretprobe): *"),
+	DEFINE_TYPE(FTRACE_README_UPROBE_REF_CTR, "*ref_ctr_offset*"),
 };
 
 static bool scan_ftrace_readme(enum ftrace_readme type)
@@ -1065,3 +1082,8 @@ bool kretprobe_offset_is_supported(void)
 {
 	return scan_ftrace_readme(FTRACE_README_KRETPROBE_OFFSET);
 }
+
+bool uprobe_ref_ctr_is_supported(void)
+{
+	return scan_ftrace_readme(FTRACE_README_UPROBE_REF_CTR);
+}
diff --git a/tools/perf/util/probe-file.h b/tools/perf/util/probe-file.h
index 63f29b1..2a24918 100644
--- a/tools/perf/util/probe-file.h
+++ b/tools/perf/util/probe-file.h
@@ -69,6 +69,7 @@ struct probe_cache_entry *probe_cache__find_by_name(struct probe_cache *pcache,
 int probe_cache__show_all_caches(struct strfilter *filter);
 bool probe_type_is_available(enum probe_type type);
 bool kretprobe_offset_is_supported(void);
+bool uprobe_ref_ctr_is_supported(void);
 #else	/* ! HAVE_LIBELF_SUPPORT */
 static inline struct probe_cache *probe_cache__new(const char *tgt __maybe_unused, struct nsinfo *nsi __maybe_unused)
 {
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 2de7705..45b7dba 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1803,6 +1803,34 @@ void kcore_extract__delete(struct kcore_extract *kce)
 }
 
 #ifdef HAVE_GELF_GETNOTE_SUPPORT
+
+static void sdt_adjust_loc(struct sdt_note *tmp, GElf_Addr base_off)
+{
+	if (!base_off)
+		return;
+
+	if (tmp->bit32)
+		tmp->addr.a32[SDT_NOTE_IDX_LOC] =
+			tmp->addr.a32[SDT_NOTE_IDX_LOC] + base_off -
+			tmp->addr.a32[SDT_NOTE_IDX_BASE];
+	else
+		tmp->addr.a64[SDT_NOTE_IDX_LOC] =
+			tmp->addr.a64[SDT_NOTE_IDX_LOC] + base_off -
+			tmp->addr.a64[SDT_NOTE_IDX_BASE];
+}
+
+static void sdt_adjust_refctr(struct sdt_note *tmp, GElf_Addr base_addr,
+			      GElf_Addr base_off)
+{
+	if (!base_off)
+		return;
+
+	if (tmp->bit32)
+		tmp->addr.a32[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
+	else
+		tmp->addr.a64[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
+}
+
 /**
  * populate_sdt_note : Parse raw data and identify SDT note
  * @elf: elf of the opened file
@@ -1820,7 +1848,6 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
 	const char *provider, *name, *args;
 	struct sdt_note *tmp = NULL;
 	GElf_Ehdr ehdr;
-	GElf_Addr base_off = 0;
 	GElf_Shdr shdr;
 	int ret = -EINVAL;
 
@@ -1916,17 +1943,12 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
 	 * base address in the description of the SDT note. If its different,
 	 * then accordingly, adjust the note location.
 	 */
-	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL)) {
-		base_off = shdr.sh_offset;
-		if (base_off) {
-			if (tmp->bit32)
-				tmp->addr.a32[0] = tmp->addr.a32[0] + base_off -
-					tmp->addr.a32[1];
-			else
-				tmp->addr.a64[0] = tmp->addr.a64[0] + base_off -
-					tmp->addr.a64[1];
-		}
-	}
+	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL))
+		sdt_adjust_loc(tmp, shdr.sh_offset);
+
+	/* Adjust reference counter offset */
+	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_PROBES_SCN, NULL))
+		sdt_adjust_refctr(tmp, shdr.sh_addr, shdr.sh_offset);
 
 	list_add_tail(&tmp->note_list, sdt_notes);
 	return 0;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 70c16741..aa095bf 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -384,12 +384,19 @@ struct sdt_note {
 int cleanup_sdt_note_list(struct list_head *sdt_notes);
 int sdt_notes__get_count(struct list_head *start);
 
+#define SDT_PROBES_SCN ".probes"
 #define SDT_BASE_SCN ".stapsdt.base"
 #define SDT_NOTE_SCN  ".note.stapsdt"
 #define SDT_NOTE_TYPE 3
 #define SDT_NOTE_NAME "stapsdt"
 #define NR_ADDR 3
 
+enum {
+	SDT_NOTE_IDX_LOC = 0,
+	SDT_NOTE_IDX_BASE,
+	SDT_NOTE_IDX_REFCTR,
+};
+
 struct mem_info *mem_info__new(void);
 struct mem_info *mem_info__get(struct mem_info *mi);
 void   mem_info__put(struct mem_info *mi);
-- 
1.8.3.1
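
The two helpers in the symbol-elf.c hunk above both turn addresses recorded in
the SDT note into file offsets. A minimal standalone sketch of that arithmetic
follows. The struct and function names are hypothetical stand-ins rather than
the perf code itself, and the zero check in adjust_loc() is carried over from
the code the patch removes.

```c
#include <stdint.h>

/* Hypothetical stand-in for the three addresses stored in one SDT note. */
struct sdt_addrs {
	uint64_t loc;		/* marker location */
	uint64_t base;		/* .stapsdt.base address recorded in the note */
	uint64_t refctr;	/* reference counter (semaphore) address */
};

/* Rebase the marker location against the file offset of .stapsdt.base. */
static void adjust_loc(struct sdt_addrs *a, uint64_t base_off)
{
	if (!base_off)
		return;

	a->loc = a->loc + base_off - a->base;
}

/*
 * Convert the reference counter's virtual address into a file offset by
 * subtracting the (address - offset) delta of the section that holds it.
 */
static void adjust_refctr(struct sdt_addrs *a, uint64_t scn_addr,
			  uint64_t scn_off)
{
	if (!scn_off)
		return;

	a->refctr -= scn_addr - scn_off;
}
```

With made-up section values, a semaphore at 0x10020036 inside a section loaded
at 0x10020034 with file offset 0x20034 ends up at file offset 0x20036.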

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
  2018-04-04  8:31   ` Ravi Bangoria
@ 2018-04-04 13:18     ` kbuild test robot
  -1 siblings, 0 replies; 56+ messages in thread
From: kbuild test robot @ 2018-04-04 13:18 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: kbuild-all, mhiramat, oleg, peterz, srikar, rostedt, acme,
	ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

[-- Attachment #1: Type: text/plain, Size: 2994 bytes --]

Hi Ravi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on v4.16 next-20180403]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ravi-Bangoria/trace_uprobe-Support-SDT-markers-having-reference-count-semaphore/20180404-201900
config: x86_64-randconfig-x019-201813 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

>> kernel/trace/trace_uprobe.c:947:21: error: variable 'sdt_mmu_notifier_ops' has initializer but incomplete type
    static const struct mmu_notifier_ops sdt_mmu_notifier_ops = {
                        ^~~~~~~~~~~~~~~~
>> kernel/trace/trace_uprobe.c:948:3: error: 'const struct mmu_notifier_ops' has no member named 'release'
     .release = sdt_mm_release,
      ^~~~~~~
>> kernel/trace/trace_uprobe.c:948:13: warning: excess elements in struct initializer
     .release = sdt_mm_release,
                ^~~~~~~~~~~~~~
   kernel/trace/trace_uprobe.c:948:13: note: (near initialization for 'sdt_mmu_notifier_ops')
   kernel/trace/trace_uprobe.c: In function 'sdt_add_mm_list':
>> kernel/trace/trace_uprobe.c:962:22: error: dereferencing pointer to incomplete type 'struct mmu_notifier'
     mn = kzalloc(sizeof(*mn), GFP_KERNEL);
                         ^~~
>> kernel/trace/trace_uprobe.c:967:2: error: implicit declaration of function '__mmu_notifier_register'; did you mean 'mmu_notifier_release'? [-Werror=implicit-function-declaration]
     __mmu_notifier_register(mn, mm);
     ^~~~~~~~~~~~~~~~~~~~~~~
     mmu_notifier_release
   kernel/trace/trace_uprobe.c: At top level:
>> kernel/trace/trace_uprobe.c:947:38: error: storage size of 'sdt_mmu_notifier_ops' isn't known
    static const struct mmu_notifier_ops sdt_mmu_notifier_ops = {
                                         ^~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/sdt_mmu_notifier_ops +947 kernel/trace/trace_uprobe.c

   946	
 > 947	static const struct mmu_notifier_ops sdt_mmu_notifier_ops = {
 > 948		.release = sdt_mm_release,
   949	};
   950	
   951	static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
   952	{
   953		struct mmu_notifier *mn;
   954		struct sdt_mm_list *sml = kzalloc(sizeof(*sml), GFP_KERNEL);
   955	
   956		if (!sml)
   957			return;
   958		sml->mm = mm;
   959		list_add(&(sml->list), &(tu->sml.list));
   960	
   961		/* Register mmu_notifier for this mm. */
 > 962		mn = kzalloc(sizeof(*mn), GFP_KERNEL);
   963		if (!mn)
   964			return;
   965	
   966		mn->ops = &sdt_mmu_notifier_ops;
 > 967		__mmu_notifier_register(mn, mm);
   968	}
   969	
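
The errors above are what one would expect if this randconfig leaves
CONFIG_MMU_NOTIFIER unset, so that only forward declarations of the notifier
structures are visible. That is an assumption; the report does not say so. A
standalone sketch of the usual guard follows, with HAVE_NOTIFIER and all other
names being hypothetical stand-ins and not kernel symbols.

```c
#include <stddef.h>

/*
 * HAVE_NOTIFIER is a hypothetical stand-in for a Kconfig symbol such as
 * CONFIG_MMU_NOTIFIER. Build with -DHAVE_NOTIFIER to get the real path.
 */

/* Forward declarations are visible in both configurations. */
struct notifier;
struct notifier_ops;

#ifdef HAVE_NOTIFIER
struct notifier_ops {
	void (*release)(struct notifier *n);
};

struct notifier {
	const struct notifier_ops *ops;
};

static void sdt_release(struct notifier *n)
{
	(void)n;	/* nothing to tear down in this sketch */
}

static const struct notifier_ops sdt_ops = {
	.release = sdt_release,
};

/* Complete types are available, so the ops table can be attached. */
static int sdt_register(struct notifier *n)
{
	if (!n)
		return -1;
	n->ops = &sdt_ops;
	return 0;
}
#else
/* Types are incomplete here: only pointers may be passed around. */
static int sdt_register(struct notifier *n)
{
	(void)n;
	return -1;	/* feature compiled out */
}
#endif
```

Either branch compiles on its own, because nothing outside the #ifdef takes
sizeof() of, or initializes, an incomplete type.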

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26443 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
  2018-04-04  8:31   ` Ravi Bangoria
@ 2018-04-04 13:24     ` kbuild test robot
  -1 siblings, 0 replies; 56+ messages in thread
From: kbuild test robot @ 2018-04-04 13:24 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: kbuild-all, mhiramat, oleg, peterz, srikar, rostedt, acme,
	ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

[-- Attachment #1: Type: text/plain, Size: 2649 bytes --]

Hi Ravi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on v4.16 next-20180403]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ravi-Bangoria/trace_uprobe-Support-SDT-markers-having-reference-count-semaphore/20180404-201900
config: i386-randconfig-a0-201813 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel//trace/trace_uprobe.c:947:21: error: variable 'sdt_mmu_notifier_ops' has initializer but incomplete type
    static const struct mmu_notifier_ops sdt_mmu_notifier_ops = {
                        ^
>> kernel//trace/trace_uprobe.c:948:2: error: unknown field 'release' specified in initializer
     .release = sdt_mm_release,
     ^
   kernel//trace/trace_uprobe.c:948:2: warning: excess elements in struct initializer
   kernel//trace/trace_uprobe.c:948:2: warning: (near initialization for 'sdt_mmu_notifier_ops')
   kernel//trace/trace_uprobe.c: In function 'sdt_add_mm_list':
>> kernel//trace/trace_uprobe.c:962:22: error: dereferencing pointer to incomplete type
     mn = kzalloc(sizeof(*mn), GFP_KERNEL);
                         ^
   kernel//trace/trace_uprobe.c:966:4: error: dereferencing pointer to incomplete type
     mn->ops = &sdt_mmu_notifier_ops;
       ^
>> kernel//trace/trace_uprobe.c:967:2: error: implicit declaration of function '__mmu_notifier_register' [-Werror=implicit-function-declaration]
     __mmu_notifier_register(mn, mm);
     ^
   cc1: some warnings being treated as errors

vim +/__mmu_notifier_register +967 kernel//trace/trace_uprobe.c

   946	
 > 947	static const struct mmu_notifier_ops sdt_mmu_notifier_ops = {
 > 948		.release = sdt_mm_release,
   949	};
   950	
   951	static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
   952	{
   953		struct mmu_notifier *mn;
   954		struct sdt_mm_list *sml = kzalloc(sizeof(*sml), GFP_KERNEL);
   955	
   956		if (!sml)
   957			return;
   958		sml->mm = mm;
   959		list_add(&(sml->list), &(tu->sml.list));
   960	
   961		/* Register mmu_notifier for this mm. */
 > 962		mn = kzalloc(sizeof(*mn), GFP_KERNEL);
   963		if (!mn)
   964			return;
   965	
   966		mn->ops = &sdt_mmu_notifier_ops;
 > 967		__mmu_notifier_register(mn, mm);
   968	}
   969	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32696 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)
  2018-04-04  8:31   ` Ravi Bangoria
  (?)
@ 2018-04-04 15:03     ` kbuild test robot
  -1 siblings, 0 replies; 56+ messages in thread
From: kbuild test robot @ 2018-04-04 15:03 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: kbuild-all, mhiramat, oleg, peterz, srikar, rostedt, acme,
	ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria

Hi Ravi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/perf/core]
[also build test WARNING on v4.16 next-20180404]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ravi-Bangoria/trace_uprobe-Support-SDT-markers-having-reference-count-semaphore/20180404-201900
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   kernel/trace/trace.h:1298:38: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct event_filter *filter @@    got struct event_filtstruct event_filter *filter @@
   kernel/trace/trace.h:1298:38:    expected struct event_filter *filter
   kernel/trace/trace.h:1298:38:    got struct event_filter [noderef] <asn:4>*filter
>> kernel/trace/trace_uprobe.c:1001:6: sparse: symbol 'trace_uprobe_mmap' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC PATCH] trace_uprobe: trace_uprobe_mmap() can be static
  2018-04-04  8:31   ` Ravi Bangoria
  (?)
@ 2018-04-04 15:03     ` kbuild test robot
  -1 siblings, 0 replies; 56+ messages in thread
From: kbuild test robot @ 2018-04-04 15:03 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: kbuild-all, mhiramat, oleg, peterz, srikar, rostedt, acme,
	ananth, akpm, alexander.shishkin, alexis.berlemont, corbet,
	dan.j.williams, jolsa, kan.liang, kjlx, kstewart, linux-doc,
	linux-kernel, linux-mm, milian.wolff, mingo, namhyung,
	naveen.n.rao, pc, tglx, yao.jin, fengguang.wu, jglisse,
	Ravi Bangoria


Fixes: d8d4d3603b92 ("trace_uprobe: Support SDT markers having reference count (semaphore)")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
 trace_uprobe.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 2502bd7..49a8673 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -998,7 +998,7 @@ static void sdt_increment_ref_ctr(struct trace_uprobe *tu)
 }
 
 /* Called with down_write(&vma->vm_mm->mmap_sem) */
-void trace_uprobe_mmap(struct vm_area_struct *vma)
+static void trace_uprobe_mmap(struct vm_area_struct *vma)
 {
 	struct trace_uprobe *tu;
 	unsigned long vaddr;

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 9/9] perf probe: Support SDT markers having reference counter (semaphore)
  2018-04-04  8:31   ` Ravi Bangoria
@ 2018-04-09  7:28     ` Masami Hiramatsu
  -1 siblings, 0 replies; 56+ messages in thread
From: Masami Hiramatsu @ 2018-04-09  7:28 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, peterz, srikar, rostedt, acme, ananth, akpm,
	alexander.shishkin, alexis.berlemont, corbet, dan.j.williams,
	jolsa, kan.liang, kjlx, kstewart, linux-doc, linux-kernel,
	linux-mm, milian.wolff, mingo, namhyung, naveen.n.rao, pc, tglx,
	yao.jin, fengguang.wu, jglisse

Hi Ravi,

On Wed,  4 Apr 2018 14:01:10 +0530
Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> wrote:

> With this, perf buildid-cache will save SDT markers with reference
> counter in probe cache. Perf probe will be able to probe markers
> having reference counter. Ex,
> 
>   # readelf -n /tmp/tick | grep -A1 loop2
>     Name: loop2
>     ... Semaphore: 0x0000000010020036
> 
>   # ./perf buildid-cache --add /tmp/tick
>   # ./perf probe sdt_tick:loop2
>   # ./perf stat -e sdt_tick:loop2 /tmp/tick
>     hi: 0
>     hi: 1
>     hi: 2
>     ^C
>      Performance counter stats for '/tmp/tick':
>                  3      sdt_tick:loop2
>        2.561851452 seconds time elapsed
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
> ---
>  tools/perf/util/probe-event.c | 18 ++++++++++++++---
>  tools/perf/util/probe-event.h |  1 +
>  tools/perf/util/probe-file.c  | 34 ++++++++++++++++++++++++++------
>  tools/perf/util/probe-file.h  |  1 +
>  tools/perf/util/symbol-elf.c  | 46 ++++++++++++++++++++++++++++++++-----------
>  tools/perf/util/symbol.h      |  7 +++++++
>  6 files changed, 86 insertions(+), 21 deletions(-)
> 
> diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
> index e1dbc98..b3a1330 100644
> --- a/tools/perf/util/probe-event.c
> +++ b/tools/perf/util/probe-event.c
> @@ -1832,6 +1832,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev)
>  			tp->offset = strtoul(fmt2_str, NULL, 10);
>  	}
>  
> +	if (tev->uprobes) {
> +		fmt2_str = strchr(p, '(');
> +		if (fmt2_str)
> +			tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
> +	}
> +
>  	tev->nargs = argc - 2;
>  	tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
>  	if (tev->args == NULL) {
> @@ -2054,15 +2060,21 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
>  	}
>  
>  	/* Use the tp->address for uprobes */
> -	if (tev->uprobes)
> +	if (tev->uprobes) {
>  		err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
> -	else if (!strncmp(tp->symbol, "0x", 2))
> +		if (uprobe_ref_ctr_is_supported() &&
> +		    tp->ref_ctr_offset &&
> +		    err >= 0)
> +			err = strbuf_addf(&buf, "(0x%lx)", tp->ref_ctr_offset);

If the kernel doesn't support uprobe_ref_ctr but the event requires
incrementing it, I think we should (at least) warn the user here.
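
One possible shape for that warning, as a standalone sketch and not the perf
code: the helper name, its parameters and the message text are all
hypothetical, and the kernel capability is passed in as a flag instead of
being probed.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "<module>:0x<address>" and append
 * "(0x<ref_ctr_offset>)" only when the kernel can handle it. *dropped is
 * set when the event needs a reference counter the kernel cannot update.
 * Returns what snprintf() returns.
 */
static int format_uprobe_addr(char *buf, size_t len, const char *module,
			      unsigned long address,
			      unsigned long ref_ctr_offset,
			      bool ref_ctr_supported, bool *dropped)
{
	*dropped = ref_ctr_offset && !ref_ctr_supported;
	if (*dropped)
		fprintf(stderr,
			"Warning: reference counter at 0x%lx will not be updated\n",
			ref_ctr_offset);

	if (ref_ctr_offset && ref_ctr_supported)
		return snprintf(buf, len, "%s:0x%lx(0x%lx)",
				module, address, ref_ctr_offset);

	return snprintf(buf, len, "%s:0x%lx", module, address);
}
```

The caller can then decide whether a dropped reference counter is only worth
a warning or should fail the probe definition.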

> +	} else if (!strncmp(tp->symbol, "0x", 2)) {
>  		/* Absolute address. See try_to_find_absolute_address() */
>  		err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
>  				  tp->module ? ":" : "", tp->address);
> -	else
> +	} else {
>  		err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
>  				tp->module ? ":" : "", tp->symbol, tp->offset);
> +	}
> +
>  	if (err)
>  		goto error;
>  
> diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
> index 45b14f0..15a98c3 100644
> --- a/tools/perf/util/probe-event.h
> +++ b/tools/perf/util/probe-event.h
> @@ -27,6 +27,7 @@ struct probe_trace_point {
>  	char		*symbol;	/* Base symbol */
>  	char		*module;	/* Module name */
>  	unsigned long	offset;		/* Offset from symbol */
> +	unsigned long	ref_ctr_offset;	/* SDT reference counter offset */
>  	unsigned long	address;	/* Actual address of the trace point */
>  	bool		retprobe;	/* Return probe flag */
>  };
> diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
> index 4ae1123..ca0e524 100644
> --- a/tools/perf/util/probe-file.c
> +++ b/tools/perf/util/probe-file.c
> @@ -697,8 +697,16 @@ int probe_cache__add_entry(struct probe_cache *pcache,
>  #ifdef HAVE_GELF_GETNOTE_SUPPORT
>  static unsigned long long sdt_note__get_addr(struct sdt_note *note)
>  {
> -	return note->bit32 ? (unsigned long long)note->addr.a32[0]
> -		 : (unsigned long long)note->addr.a64[0];
> +	return note->bit32 ?
> +		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_LOC] :
> +		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_LOC];
> +}
> +
> +static unsigned long long sdt_note__get_ref_ctr_offset(struct sdt_note *note)
> +{
> +	return note->bit32 ?
> +		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_REFCTR] :
> +		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_REFCTR];
>  }
>  
>  static const char * const type_to_suffix[] = {
> @@ -776,14 +784,21 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note,
>  {
>  	struct strbuf buf;
>  	char *ret = NULL, **args;
> -	int i, args_count;
> +	int i, args_count, err;
> +	unsigned long long ref_ctr_offset;
>  
>  	if (strbuf_init(&buf, 32) < 0)
>  		return NULL;
>  
> -	if (strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
> -				sdtgrp, note->name, pathname,
> -				sdt_note__get_addr(note)) < 0)
> +	err = strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
> +			sdtgrp, note->name, pathname,
> +			sdt_note__get_addr(note));
> +
> +	ref_ctr_offset = sdt_note__get_ref_ctr_offset(note);
> +	if (uprobe_ref_ctr_is_supported() && ref_ctr_offset && err >= 0)
> +		err = strbuf_addf(&buf, "(0x%llx)", ref_ctr_offset);

We don't have to care about uprobe_ref_ctr support here, because
this information is only cached, not written directly to
uprobe_events.
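
In other words, the cache side could format the entry unconditionally. A
minimal sketch, assuming a hypothetical helper name and taking the format
strings from the quoted hunk:

```c
#include <stdio.h>

/*
 * Hypothetical helper, not the perf function itself: build the cached
 * command for one SDT marker. The "(0x...)" suffix is appended whenever
 * the note carries a reference counter offset, with no kernel check.
 */
static int format_sdt_cache_entry(char *buf, size_t len, const char *sdtgrp,
				  const char *name, const char *pathname,
				  unsigned long long addr,
				  unsigned long long ref_ctr_offset)
{
	if (ref_ctr_offset)
		return snprintf(buf, len, "p:%s/%s %s:0x%llx(0x%llx)",
				sdtgrp, name, pathname, addr, ref_ctr_offset);

	return snprintf(buf, len, "p:%s/%s %s:0x%llx",
			sdtgrp, name, pathname, addr);
}
```

Whether the running kernel accepts the suffix then only matters later, when
the cached command is turned into a uprobe_events entry.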

Other parts look good to me.

Thanks,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 9/9] perf probe: Support SDT markers having reference counter (semaphore)
  2018-04-09  7:28     ` Masami Hiramatsu
  (?)
@ 2018-04-09  8:29       ` Ravi Bangoria
  -1 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-09  8:29 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: oleg, peterz, srikar, rostedt, acme, ananth, akpm,
	alexander.shishkin, alexis.berlemont, corbet, dan.j.williams,
	jolsa, kan.liang, kjlx, kstewart, linux-doc, linux-kernel,
	linux-mm, milian.wolff, mingo, namhyung, naveen.n.rao, pc, tglx,
	yao.jin, fengguang.wu, jglisse, Ravi Bangoria

Hi Masami,

On 04/09/2018 12:58 PM, Masami Hiramatsu wrote:
> Hi Ravi,
>
> On Wed,  4 Apr 2018 14:01:10 +0530
> Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> wrote:
>
>> @@ -2054,15 +2060,21 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
>>  	}
>>  
>>  	/* Use the tp->address for uprobes */
>> -	if (tev->uprobes)
>> +	if (tev->uprobes) {
>>  		err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
>> -	else if (!strncmp(tp->symbol, "0x", 2))
>> +		if (uprobe_ref_ctr_is_supported() &&
>> +		    tp->ref_ctr_offset &&
>> +		    err >= 0)
>> +			err = strbuf_addf(&buf, "(0x%lx)", tp->ref_ctr_offset);
> If the kernel doesn't support uprobe_ref_ctr but the event requires
> to increment uprobe_ref_ctr, I think we should (at least) warn user here.

pr_debug("A semaphore is associated with %s:%s and seems your kernel doesn't support it.\n",
         tev->group, tev->event);

Looks good?

>> @@ -776,14 +784,21 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note,
>>  {
>>  	struct strbuf buf;
>>  	char *ret = NULL, **args;
>> -	int i, args_count;
>> +	int i, args_count, err;
>> +	unsigned long long ref_ctr_offset;
>>  
>>  	if (strbuf_init(&buf, 32) < 0)
>>  		return NULL;
>>  
>> -	if (strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
>> -				sdtgrp, note->name, pathname,
>> -				sdt_note__get_addr(note)) < 0)
>> +	err = strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
>> +			sdtgrp, note->name, pathname,
>> +			sdt_note__get_addr(note));
>> +
>> +	ref_ctr_offset = sdt_note__get_ref_ctr_offset(note);
>> +	if (uprobe_ref_ctr_is_supported() && ref_ctr_offset && err >= 0)
>> +		err = strbuf_addf(&buf, "(0x%llx)", ref_ctr_offset);
> We don't have to care about uprobe_ref_ctr support here, because
> this information will be just cached, not directly written to
> uprobe_events.

Sure, will remove the check.

Thanks for the review :).
Ravi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
  2018-04-04  8:31   ` Ravi Bangoria
@ 2018-04-09 13:17     ` Oleg Nesterov
  -1 siblings, 0 replies; 56+ messages in thread
From: Oleg Nesterov @ 2018-04-09 13:17 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: mhiramat, peterz, srikar, rostedt, acme, ananth, akpm,
	alexander.shishkin, alexis.berlemont, corbet, dan.j.williams,
	jolsa, kan.liang, kjlx, kstewart, linux-doc, linux-kernel,
	linux-mm, milian.wolff, mingo, namhyung, naveen.n.rao, pc, tglx,
	yao.jin, fengguang.wu, jglisse

On 04/04, Ravi Bangoria wrote:
>
> +static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
> +{
> +	struct mmu_notifier *mn;
> +	struct sdt_mm_list *sml = kzalloc(sizeof(*sml), GFP_KERNEL);
> +
> +	if (!sml)
> +		return;
> +	sml->mm = mm;
> +	list_add(&(sml->list), &(tu->sml.list));
> +
> +	/* Register mmu_notifier for this mm. */
> +	mn = kzalloc(sizeof(*mn), GFP_KERNEL);
> +	if (!mn)
> +		return;
> +
> +	mn->ops = &sdt_mmu_notifier_ops;
> +	__mmu_notifier_register(mn, mm);
> +}

I didn't read this version yet, just one question...

So now it depends on CONFIG_MMU_NOTIFIER, yes? I do not see any changes in Kconfig
files, this doesn't look right...

Oleg.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
  2018-04-04  8:31   ` Ravi Bangoria
@ 2018-04-09 13:29     ` Oleg Nesterov
  -1 siblings, 0 replies; 56+ messages in thread
From: Oleg Nesterov @ 2018-04-09 13:29 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: mhiramat, peterz, srikar, rostedt, acme, ananth, akpm,
	alexander.shishkin, alexis.berlemont, corbet, dan.j.williams,
	jolsa, kan.liang, kjlx, kstewart, linux-doc, linux-kernel,
	linux-mm, milian.wolff, mingo, namhyung, naveen.n.rao, pc, tglx,
	yao.jin, fengguang.wu, jglisse

On 04/04, Ravi Bangoria wrote:
>
> +static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
> +{
> +	struct mmu_notifier *mn;
> +	struct sdt_mm_list *sml = kzalloc(sizeof(*sml), GFP_KERNEL);
> +
> +	if (!sml)
> +		return;
> +	sml->mm = mm;
> +	list_add(&(sml->list), &(tu->sml.list));
> +
> +	/* Register mmu_notifier for this mm. */
> +	mn = kzalloc(sizeof(*mn), GFP_KERNEL);
> +	if (!mn)
> +		return;
> +
> +	mn->ops = &sdt_mmu_notifier_ops;
> +	__mmu_notifier_register(mn, mm);
> +}

and what if __mmu_notifier_register() fails simply because signal_pending() == T?
see mm_take_all_locks().

at first glance this all looks suspicious and sub-optimal, but let me repeat that
I didn't read this version yet.

Oleg.
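
The concern above is that sdt_add_mm_list() ignores the return value of
__mmu_notifier_register(), which per the comment can fail when a signal is
pending (see mm_take_all_locks()). Below is a minimal userspace sketch of one
way to propagate such a failure. Every name ending in _sketch is a
hypothetical stand-in, not a kernel API, and the list is a plain singly
linked list rather than the kernel's list_head. It is an illustration under
those assumptions, not the fix the series should necessarily adopt.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct mm_sketch {
	int signal_pending;	/* non-zero simulates a pending signal */
};

struct sml_sketch {
	struct mm_sketch *mm;
	struct sml_sketch *next;
};

struct tu_sketch {
	struct sml_sketch *sml_head;
};

/* Mimics a registration call that fails with -EINTR on a pending signal. */
static int notifier_register_sketch(struct mm_sketch *mm)
{
	return mm->signal_pending ? -EINTR : 0;
}

/*
 * Returns 0 on success or a negative errno. On failure nothing is left
 * on the list and nothing is leaked.
 */
static int sdt_add_mm_list_sketch(struct tu_sketch *tu, struct mm_sketch *mm)
{
	struct sml_sketch *sml;
	int ret;

	sml = calloc(1, sizeof(*sml));
	if (!sml)
		return -ENOMEM;

	/* Register before linking, so a failure needs no list unwinding. */
	ret = notifier_register_sketch(mm);
	if (ret) {
		free(sml);
		return ret;
	}

	sml->mm = mm;
	sml->next = tu->sml_head;
	tu->sml_head = sml;
	return 0;
}
```

With this shape the caller sees -EINTR or -ENOMEM and can decide whether to
retry or give up, instead of silently tracking an mm that has no notifier
attached.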

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
  2018-04-09 13:17     ` Oleg Nesterov
@ 2018-04-09 13:32       ` Ravi Bangoria
  -1 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-09 13:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: mhiramat, peterz, srikar, rostedt, acme, ananth, akpm,
	alexander.shishkin, alexis.berlemont, corbet, dan.j.williams,
	jolsa, kan.liang, kjlx, kstewart, linux-doc, linux-kernel,
	linux-mm, milian.wolff, mingo, namhyung, naveen.n.rao, pc, tglx,
	yao.jin, fengguang.wu, jglisse, Ravi Bangoria

Hi Oleg,

On 04/09/2018 06:47 PM, Oleg Nesterov wrote:
> On 04/04, Ravi Bangoria wrote:
>> +static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
>> +{
>> +	struct mmu_notifier *mn;
>> +	struct sdt_mm_list *sml = kzalloc(sizeof(*sml), GFP_KERNEL);
>> +
>> +	if (!sml)
>> +		return;
>> +	sml->mm = mm;
>> +	list_add(&(sml->list), &(tu->sml.list));
>> +
>> +	/* Register mmu_notifier for this mm. */
>> +	mn = kzalloc(sizeof(*mn), GFP_KERNEL);
>> +	if (!mn)
>> +		return;
>> +
>> +	mn->ops = &sdt_mmu_notifier_ops;
>> +	__mmu_notifier_register(mn, mm);
>> +}
> I didn't read this version yet, just one question...
>
> So now it depends on CONFIG_MMU_NOTIFIER, yes? I do not see any changes in Kconfig
> files, this doesn't look right...

Yes, you are write. I'll make CONFIG_UPROBE_EVENTS dependent on
CONFIG_MMU_NOTIFIER.

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
  2018-04-09 13:32       ` Ravi Bangoria
@ 2018-04-09 13:41         ` Ravi Bangoria
  -1 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-09 13:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: mhiramat, peterz, srikar, rostedt, acme, ananth, akpm,
	alexander.shishkin, alexis.berlemont, corbet, dan.j.williams,
	jolsa, kan.liang, kjlx, kstewart, linux-doc, linux-kernel,
	linux-mm, milian.wolff, mingo, namhyung, naveen.n.rao, pc, tglx,
	yao.jin, fengguang.wu, jglisse, Ravi Bangoria



On 04/09/2018 07:02 PM, Ravi Bangoria wrote:
> Hi Oleg,
>
> On 04/09/2018 06:47 PM, Oleg Nesterov wrote:
>> I didn't read this version yet, just one question...
>>
>> So now it depends on CONFIG_MMU_NOTIFIER, yes? I do not see any changes in Kconfig
>> files, this doesn't look right...
> Yes, you are write.

s/write/right.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 9/9] perf probe: Support SDT markers having reference counter (semaphore)
  2018-04-09  8:29       ` Ravi Bangoria
  (?)
@ 2018-04-09 14:08         ` Masami Hiramatsu
  -1 siblings, 0 replies; 56+ messages in thread
From: Masami Hiramatsu @ 2018-04-09 14:08 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, peterz, srikar, rostedt, acme, ananth, akpm,
	alexander.shishkin, alexis.berlemont, corbet, dan.j.williams,
	jolsa, kan.liang, kjlx, kstewart, linux-doc, linux-kernel,
	linux-mm, milian.wolff, mingo, namhyung, naveen.n.rao, pc, tglx,
	yao.jin, fengguang.wu, jglisse

On Mon, 9 Apr 2018 13:59:16 +0530
Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> wrote:

> Hi Masami,
> 
> On 04/09/2018 12:58 PM, Masami Hiramatsu wrote:
> > Hi Ravi,
> >
> > On Wed,  4 Apr 2018 14:01:10 +0530
> > Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> wrote:
> >
> >> @@ -2054,15 +2060,21 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
> >>  	}
> >>  
> >>  	/* Use the tp->address for uprobes */
> >> -	if (tev->uprobes)
> >> +	if (tev->uprobes) {
> >>  		err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
> >> -	else if (!strncmp(tp->symbol, "0x", 2))
> >> +		if (uprobe_ref_ctr_is_supported() &&
> >> +		    tp->ref_ctr_offset &&
> >> +		    err >= 0)
> >> +			err = strbuf_addf(&buf, "(0x%lx)", tp->ref_ctr_offset);
> > If the kernel doesn't support uprobe_ref_ctr but the event requires
> > to increment uprobe_ref_ctr, I think we should (at least) warn user here.
> 
> pr_debug("A semaphore is associated with %s:%s and seems your kernel doesn't support it.\n",
>          tev->group, tev->event);
> 
> Looks good?

I think it should be pr_warning() and return NULL, since the user may not be able to
trace the event even if it is enabled.

> 
> >> @@ -776,14 +784,21 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note,
> >>  {
> >>  	struct strbuf buf;
> >>  	char *ret = NULL, **args;
> >> -	int i, args_count;
> >> +	int i, args_count, err;
> >> +	unsigned long long ref_ctr_offset;
> >>  
> >>  	if (strbuf_init(&buf, 32) < 0)
> >>  		return NULL;
> >>  
> >> -	if (strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
> >> -				sdtgrp, note->name, pathname,
> >> -				sdt_note__get_addr(note)) < 0)
> >> +	err = strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
> >> +			sdtgrp, note->name, pathname,
> >> +			sdt_note__get_addr(note));
> >> +
> >> +	ref_ctr_offset = sdt_note__get_ref_ctr_offset(note);
> >> +	if (uprobe_ref_ctr_is_supported() && ref_ctr_offset && err >= 0)
> >> +		err = strbuf_addf(&buf, "(0x%llx)", ref_ctr_offset);
> > We don't have to care about uprobe_ref_ctr support here, because
> > this information will be just cached, not directly written to
> > uprobe_events.
> 
> Sure, will remove the check.

Thanks!

> 
> Thanks for the review :).
> Ravi
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>
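
The warn-and-return-NULL behaviour suggested in this reply can be sketched in
plain userspace C as follows. This is not perf code: struct
uprobe_target_sketch and synthesize_uprobe_target() are hypothetical names,
fprintf(stderr, ...) stands in for pr_warning(), malloc()/snprintf() stand in
for strbuf, and the kernel_has_ref_ctr flag stands in for the result of
uprobe_ref_ctr_is_supported().

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical subset of perf's probe point data; not the real structs. */
struct uprobe_target_sketch {
	const char *group;		/* e.g. "sdt_tick" */
	const char *event;		/* e.g. "loop2" */
	const char *module;		/* path of the probed binary */
	unsigned long address;		/* marker address */
	unsigned long ref_ctr_offset;	/* 0 when the marker has no semaphore */
};

/*
 * Builds "module:0xADDR" or "module:0xADDR(0xREFCTR)" in a malloc()ed
 * buffer. Returns NULL, after warning, when the marker has a reference
 * counter but the kernel cannot handle it; also NULL on allocation or
 * formatting failure.
 */
static char *synthesize_uprobe_target(const struct uprobe_target_sketch *tp,
				      bool kernel_has_ref_ctr)
{
	char *buf;
	int len;

	if (tp->ref_ctr_offset && !kernel_has_ref_ctr) {
		fprintf(stderr, "A semaphore is associated with %s:%s and "
			"it seems your kernel doesn't support it.\n",
			tp->group, tp->event);
		return NULL;
	}

	/* First pass only measures the required length. */
	if (tp->ref_ctr_offset)
		len = snprintf(NULL, 0, "%s:0x%lx(0x%lx)", tp->module,
			       tp->address, tp->ref_ctr_offset);
	else
		len = snprintf(NULL, 0, "%s:0x%lx", tp->module, tp->address);
	if (len < 0)
		return NULL;

	buf = malloc((size_t)len + 1);
	if (!buf)
		return NULL;

	/* Second pass writes the string into the sized buffer. */
	if (tp->ref_ctr_offset)
		snprintf(buf, (size_t)len + 1, "%s:0x%lx(0x%lx)", tp->module,
			 tp->address, tp->ref_ctr_offset);
	else
		snprintf(buf, (size_t)len + 1, "%s:0x%lx", tp->module,
			 tp->address);
	return buf;
}
```

Returning NULL rather than silently dropping the "(0x...)" suffix means the
caller cannot create an event that would appear enabled yet never fire
because its semaphore was never incremented.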

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
  2018-04-09 13:29     ` Oleg Nesterov
@ 2018-04-10  8:19       ` Ravi Bangoria
  -1 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-10  8:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: mhiramat, peterz, srikar, rostedt, acme, ananth, akpm,
	alexander.shishkin, alexis.berlemont, corbet, dan.j.williams,
	jolsa, kan.liang, kjlx, kstewart, linux-doc, linux-kernel,
	linux-mm, milian.wolff, mingo, namhyung, naveen.n.rao, pc, tglx,
	yao.jin, fengguang.wu, jglisse, Ravi Bangoria

Hi Oleg,

On 04/09/2018 06:59 PM, Oleg Nesterov wrote:
> On 04/04, Ravi Bangoria wrote:
>> +static void sdt_add_mm_list(struct trace_uprobe *tu, struct mm_struct *mm)
>> +{
>> +	struct mmu_notifier *mn;
>> +	struct sdt_mm_list *sml = kzalloc(sizeof(*sml), GFP_KERNEL);
>> +
>> +	if (!sml)
>> +		return;
>> +	sml->mm = mm;
>> +	list_add(&(sml->list), &(tu->sml.list));
>> +
>> +	/* Register mmu_notifier for this mm. */
>> +	mn = kzalloc(sizeof(*mn), GFP_KERNEL);
>> +	if (!mn)
>> +		return;
>> +
>> +	mn->ops = &sdt_mmu_notifier_ops;
>> +	__mmu_notifier_register(mn, mm);
>> +}
> and what if __mmu_notifier_register() fails simply because signal_pending() == T?
> see mm_take_all_locks().
>
> at first glance this all looks suspicious and sub-optimal,

Yes. I should have added checks for failure cases.
Will fix them in v3.
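
For illustration only (not the v3 code): a minimal userspace sketch of what
checking the failure cases could look like. The tracking entry is published
only after the step that can fail has succeeded, so a failure leaves the
list unchanged. All names are hypothetical; calloc() and a singly linked
list stand in for kzalloc() and list_add(), and the fail_register flag
merely injects the kind of failure that __mmu_notifier_register() can
report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mm_struct. */
struct sketch_mm_ref {
	int id;
};

/* Hypothetical stand-in for struct sdt_mm_list. */
struct sketch_mm_entry {
	struct sketch_mm_ref *mm;
	struct sketch_mm_entry *next;
};

struct sketch_tracker {
	struct sketch_mm_entry *head;
};

/* Hypothetical registration step; 'fail' injects an error for testing. */
static int sketch_register_notifier(struct sketch_mm_ref *mm, bool fail)
{
	(void)mm;
	return fail ? -1 : 0;
}

/* Returns 0 on success, -1 on failure with the tracker left unchanged. */
static int sketch_add_mm(struct sketch_tracker *t, struct sketch_mm_ref *mm,
			 bool fail_register)
{
	struct sketch_mm_entry *e = calloc(1, sizeof(*e));

	if (!e)
		return -1;
	e->mm = mm;

	/* Register first; publish the entry only once nothing can fail. */
	if (sketch_register_notifier(mm, fail_register) < 0) {
		free(e);
		return -1;
	}

	e->next = t->head;
	t->head = e;
	return 0;
}
```

This only makes the failure visible to the caller; it does not answer what
the caller should do when registration fails for a benign reason such as a
pending signal.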

Thanks for the review,
Ravi

>  but let me repeat that
> I didn't read this version yet.
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
  2018-04-10  8:19       ` Ravi Bangoria
@ 2018-04-10 11:06         ` Oleg Nesterov
  -1 siblings, 0 replies; 56+ messages in thread
From: Oleg Nesterov @ 2018-04-10 11:06 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: mhiramat, peterz, srikar, rostedt, acme, ananth, akpm,
	alexander.shishkin, alexis.berlemont, corbet, dan.j.williams,
	jolsa, kan.liang, kjlx, kstewart, linux-doc, linux-kernel,
	linux-mm, milian.wolff, mingo, namhyung, naveen.n.rao, pc, tglx,
	yao.jin, fengguang.wu, jglisse

Hi Ravi,

On 04/10, Ravi Bangoria wrote:
>
> > and what if __mmu_notifier_register() fails simply because signal_pending() == T?
> > see mm_take_all_locks().
> >
> > at first glance this all looks suspicious and sub-optimal,
>
> Yes. I should have added checks for failure cases.
> Will fix them in v3.

And what can you do if it fails? Nothing except report the problem. But
signal_pending() is not an unlikely or error condition, so it should not
cause tracing errors.

Plus mm_take_all_locks() is very heavy... BTW, uprobe_mmap_callback() is
called unconditionally. Whatever it does, can we at least move it after
the no_uprobe_events() check? Can't we also check MMF_HAS_UPROBES?

Either way, I do not feel that mmu_notifier is the right tool... Did you
consider the uprobe_clear_state() hook we already have?
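
For illustration only: a minimal userspace sketch of the ordering asked for
above, where the cheap checks run first and the callback is reached only if
both pass. Every name here is a hypothetical stand-in for the kernel
symbols mentioned (no_uprobe_events(), MMF_HAS_UPROBES,
uprobe_mmap_callback()), and whether the MMF_HAS_UPROBES test is valid at
this point is exactly the open question raised in the message.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MMF_HAS_UPROBES (1UL << 0)  /* hypothetical flag bit */

/* Hypothetical stand-in for the per-process flags in struct mm_struct. */
struct sketch_mm_flags {
	unsigned long flags;
};

static int sketch_callback_calls;  /* counts how often the costly path ran */

/* Stand-in for the potentially expensive mmap callback. */
static void sketch_mmap_callback(struct sketch_mm_flags *mm)
{
	(void)mm;
	sketch_callback_calls++;
}

/* Returns true if the callback ran, false if a cheap check filtered it. */
static bool sketch_uprobe_mmap(struct sketch_mm_flags *mm,
			       bool no_uprobe_events)
{
	/* Cheapest test first: nothing is registered system-wide. */
	if (no_uprobe_events)
		return false;

	/* Next: this address space has no probes at all. */
	if (!(mm->flags & SKETCH_MMF_HAS_UPROBES))
		return false;

	sketch_mmap_callback(mm);
	return true;
}
```

The point of the ordering is that the common case, a process with no probes,
pays only for two flag tests on every mmap.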

Oleg.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter
  2018-04-10 11:06         ` Oleg Nesterov
@ 2018-04-11  4:28           ` Ravi Bangoria
  -1 siblings, 0 replies; 56+ messages in thread
From: Ravi Bangoria @ 2018-04-11  4:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: mhiramat, peterz, srikar, rostedt, acme, ananth, akpm,
	alexander.shishkin, alexis.berlemont, corbet, dan.j.williams,
	jolsa, kan.liang, kjlx, kstewart, linux-doc, linux-kernel,
	linux-mm, milian.wolff, mingo, namhyung, naveen.n.rao, pc, tglx,
	yao.jin, fengguang.wu, jglisse, Ravi Bangoria

Hi Oleg,

On 04/10/2018 04:36 PM, Oleg Nesterov wrote:
> Hi Ravi,
>
> On 04/10, Ravi Bangoria wrote:
>>> and what if __mmu_notifier_register() fails simply because signal_pending() == T?
>>> see mm_take_all_locks().
>>>
>>> at first glance this all looks suspicious and sub-optimal,
>> Yes. I should have added checks for failure cases.
>> Will fix them in v3.
> And what can you do if it fails? Nothing except report the problem. But
> signal_pending() is not an unlikely or error condition, so it should not
> cause tracing errors.

...

> Plus mm_take_all_locks() is very heavy... BTW, uprobe_mmap_callback() is
> called unconditionally. Whatever it does, can we at least move it after
> the no_uprobe_events() check? Can't we also check MMF_HAS_UPROBES?

Sure, I'll move it after these conditions.

> Either way, I do not feel that mmu_notifier is the right tool... Did you
> consider the uprobe_clear_state() hook we already have?

Ah! This is really a good idea. We don't need mmu_notifier then.

Thanks for the suggestion,
Ravi

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2018-04-11  4:28 UTC | newest]

Thread overview: 56+ messages
2018-04-04  8:31 [PATCH v2 0/9] trace_uprobe: Support SDT markers having reference count (semaphore) Ravi Bangoria
2018-04-04  8:31 ` [PATCH v2 1/9] Uprobe: Export vaddr <-> offset conversion functions Ravi Bangoria
2018-04-04  8:31 ` [PATCH v2 2/9] mm: Prefix vma_ to vaddr_to_offset() and offset_to_vaddr() Ravi Bangoria
2018-04-04  8:31 ` [PATCH v2 3/9] Uprobe: Move mmput() into free_map_info() Ravi Bangoria
2018-04-04  8:31 ` [PATCH v2 4/9] Uprobe: Rename map_info to uprobe_map_info Ravi Bangoria
2018-04-04  8:31 ` [PATCH v2 5/9] Uprobe: Export uprobe_map_info along with uprobe_{build/free}_map_info() Ravi Bangoria
2018-04-04  8:31 ` [PATCH v2 6/9] trace_uprobe: Support SDT markers having reference count (semaphore) Ravi Bangoria
2018-04-04 15:03   ` [RFC PATCH] trace_uprobe: trace_uprobe_mmap() can be static kbuild test robot
2018-04-04 15:03   ` [PATCH v2 6/9] trace_uprobe: Support SDT markers having reference count (semaphore) kbuild test robot
2018-04-04  8:31 ` [PATCH v2 7/9] trace_uprobe/sdt: Fix multiple update of same reference counter Ravi Bangoria
2018-04-04 13:18   ` kbuild test robot
2018-04-04 13:24   ` kbuild test robot
2018-04-09 13:17   ` Oleg Nesterov
2018-04-09 13:32     ` Ravi Bangoria
2018-04-09 13:41       ` Ravi Bangoria
2018-04-09 13:29   ` Oleg Nesterov
2018-04-10  8:19     ` Ravi Bangoria
2018-04-10 11:06       ` Oleg Nesterov
2018-04-11  4:28         ` Ravi Bangoria
2018-04-04  8:31 ` [PATCH v2 8/9] trace_uprobe/sdt: Document about " Ravi Bangoria
2018-04-04  8:31 ` [PATCH v2 9/9] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
2018-04-09  7:28   ` Masami Hiramatsu
2018-04-09  8:29     ` Ravi Bangoria
2018-04-09 14:08       ` Masami Hiramatsu
