* [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
@ 2018-06-06  8:33 ` Ravi Bangoria
  0 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-06  8:33 UTC (permalink / raw)
  To: oleg, srikar, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, Ravi Bangoria

Why RFC again:

This series differs from the earlier versions[1]: the earlier series
implemented this feature in trace_uprobe, whereas this one implements
the logic in core uprobes. A few reasons for this:
 1. One of the major reasons was the deadlock between uprobe_lock and
 mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
 because mm->mmap is not under the control of trace_uprobe_mmap(), which
 has to take uprobe_lock to loop over the trace_uprobe list. More
 details can be found at [2]. With this new approach, no deadlocks have
 been found so far.
 2. Many core uprobe functions and data structures would need to be
 exported to keep the earlier implementation simple. With this new
 approach, the reference counter logic is implemented in core uprobes,
 so nothing needs to be exported.

Description:

Userspace Statically Defined Tracepoints[3] are dtrace-style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. The markers are added by
developers at important places in the code. Each marker source expands
to a single nop instruction in the compiled code, but there may be
additional overhead for computing the marker arguments, which expands
to a couple of instructions. If that overhead is significant, its
execution can be skipped via a runtime if() condition when no one is
tracing the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing a marker and decrement
it when done with the tracing.

Currently, the perf tool has only limited support for SDT markers, i.e.
it cannot trace markers surrounded by a reference counter. Also, it is
not easy to add reference counter logic in a userspace tool like perf,
so the basic idea of this patchset is to add the reference counter
logic in the trace_uprobe infrastructure. For example[4]:

  # cat tick.c
    ...
    for (i = 0; i < 100; i++) {
        DTRACE_PROBE1(tick, loop1, i);
        if (TICK_LOOP2_ENABLED()) {
            DTRACE_PROBE1(tick, loop2, i);
        }
        printf("hi: %d\n", i);
        sleep(1);
    }
    ...

Here tick:loop1 is a marker without a reference counter, whereas
tick:loop2 is guarded by the reference counter condition.
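
The TICK_LOOP2_ENABLED() macro and the underlying semaphore are not
provided by <sys/sdt.h> itself; they are normally generated from a
provider description with SystemTap's dtrace(1) wrapper. A minimal
sketch of that step (file names are illustrative, and tick.c is assumed
to include the generated tick.h):

  # cat tick.d
    provider tick {
        probe loop1(int arg);
        probe loop2(int arg);
    };

  # dtrace -h -s tick.d -o tick.h
  # dtrace -G -s tick.d -o tick-sem.o
  # gcc tick.c tick-sem.o -o /tmp/tick

Here 'dtrace -h' emits a header with the TICK_*_ENABLED() macros, and
'dtrace -G' emits an object that defines the semaphores (e.g.
tick_loop2_semaphore), 16-bit counters defaulting to 0 in a writable
".probes" section. The semaphore address is recorded in the probe's
stapsdt ELF note, and that is what perf / trace_uprobe use as the
reference counter offset.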

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2

  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
             3      sdt_tick:loop1
             0      sdt_tick:loop2
     2.747086086 seconds time elapsed

Perf failed to record any data for tick:loop2. The same experiment with
this patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C  
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed


Note:
 - The 'reference counter' is called a 'semaphore' in the original DTrace
   (and also SystemTap, bcc and even ELF) documentation and code. But the
   term 'semaphore' is misleading in this context. It is just a counter
   used to hold the number of tracers tracing a marker; it is not used
   for any synchronization. So we refer to it as a 'reference counter'
   in kernel / perf code.
 - This patchset still has one issue. If there are multiple instances of
   the same application running and the user wants to trace one
   particular instance, trace_uprobe updates the reference counter in all
   instances. This is not a problem on the user side, because the
   instruction is not replaced with a trap/int3 and thus the user will
   only see samples from the process of interest. But it is still more of
   a correctness issue. I'm working on a fix for this.

[1] https://lkml.org/lkml/2018/4/17/23
[2] https://lkml.org/lkml/2018/5/25/111
[3] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
[4] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506

Ravi Bangoria (7):
  Uprobes: Simplify uprobe_register() body
  Uprobes: Support SDT markers having reference count (semaphore)
  Uprobes/sdt: Fix multiple update of same reference counter
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  Uprobes/sdt: Document about reference counter
  perf probe: Support SDT markers having reference counter (semaphore)

 Documentation/trace/uprobetracer.rst |  16 +-
 include/linux/uprobes.h              |   5 +
 kernel/events/uprobes.c              | 502 +++++++++++++++++++++++++++++++----
 kernel/trace/trace.c                 |   2 +-
 kernel/trace/trace_uprobe.c          |  74 +++++-
 tools/perf/util/probe-event.c        |  39 ++-
 tools/perf/util/probe-event.h        |   1 +
 tools/perf/util/probe-file.c         |  34 ++-
 tools/perf/util/probe-file.h         |   1 +
 tools/perf/util/symbol-elf.c         |  46 +++-
 tools/perf/util/symbol.h             |   7 +
 11 files changed, 643 insertions(+), 84 deletions(-)

-- 
1.8.3.1


* [PATCH 1/7] Uprobes: Simplify uprobe_register() body
  2018-06-06  8:33 ` Ravi Bangoria
@ 2018-06-06  8:33   ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-06  8:33 UTC (permalink / raw)
  To: oleg, srikar, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, Ravi Bangoria

Simplify the uprobe_register() function body and let __uprobe_register()
handle everything. Also move dependent functions around to avoid build
failures.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 kernel/events/uprobes.c | 69 ++++++++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 33 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 1725b90..c377a85 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -840,13 +840,8 @@ static inline struct map_info *free_map_info(struct map_info *info)
 	return err;
 }
 
-static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc)
-{
-	consumer_add(uprobe, uc);
-	return register_for_each_vma(uprobe, uc);
-}
-
-static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
+static void
+__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
 	int err;
 
@@ -860,24 +855,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u
 }
 
 /*
- * uprobe_register - register a probe
+ * uprobe_unregister - unregister a already registered probe.
+ * @inode: the file in which the probe has to be removed.
+ * @offset: offset from the start of the file.
+ * @uc: identify which probe if multiple probes are colocated.
+ */
+void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+{
+	struct uprobe *uprobe;
+
+	uprobe = find_uprobe(inode, offset);
+	if (WARN_ON(!uprobe))
+		return;
+
+	down_write(&uprobe->register_rwsem);
+	__uprobe_unregister(uprobe, uc);
+	up_write(&uprobe->register_rwsem);
+	put_uprobe(uprobe);
+}
+EXPORT_SYMBOL_GPL(uprobe_unregister);
+
+/*
+ * __uprobe_register - register a probe
  * @inode: the file in which the probe has to be placed.
  * @offset: offset from the start of the file.
  * @uc: information on howto handle the probe..
  *
- * Apart from the access refcount, uprobe_register() takes a creation
+ * Apart from the access refcount, __uprobe_register() takes a creation
  * refcount (thro alloc_uprobe) if and only if this @uprobe is getting
  * inserted into the rbtree (i.e first consumer for a @inode:@offset
  * tuple).  Creation refcount stops uprobe_unregister from freeing the
  * @uprobe even before the register operation is complete. Creation
  * refcount is released when the last @uc for the @uprobe
- * unregisters. Caller of uprobe_register() is required to keep @inode
+ * unregisters. Caller of __uprobe_register() is required to keep @inode
  * (and the containing mount) referenced.
  *
  * Return errno if it cannot successully install probes
  * else return 0 (success)
  */
-int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+static int __uprobe_register(struct inode *inode, loff_t offset,
+			     struct uprobe_consumer *uc)
 {
 	struct uprobe *uprobe;
 	int ret;
@@ -904,7 +921,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
 	down_write(&uprobe->register_rwsem);
 	ret = -EAGAIN;
 	if (likely(uprobe_is_active(uprobe))) {
-		ret = __uprobe_register(uprobe, uc);
+		consumer_add(uprobe, uc);
+		ret = register_for_each_vma(uprobe, uc);
 		if (ret)
 			__uprobe_unregister(uprobe, uc);
 	}
@@ -915,6 +933,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
 		goto retry;
 	return ret;
 }
+
+int uprobe_register(struct inode *inode, loff_t offset,
+		    struct uprobe_consumer *uc)
+{
+	return __uprobe_register(inode, offset, uc);
+}
 EXPORT_SYMBOL_GPL(uprobe_register);
 
 /*
@@ -946,27 +970,6 @@ int uprobe_apply(struct inode *inode, loff_t offset,
 	return ret;
 }
 
-/*
- * uprobe_unregister - unregister a already registered probe.
- * @inode: the file in which the probe has to be removed.
- * @offset: offset from the start of the file.
- * @uc: identify which probe if multiple probes are colocated.
- */
-void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
-{
-	struct uprobe *uprobe;
-
-	uprobe = find_uprobe(inode, offset);
-	if (WARN_ON(!uprobe))
-		return;
-
-	down_write(&uprobe->register_rwsem);
-	__uprobe_unregister(uprobe, uc);
-	up_write(&uprobe->register_rwsem);
-	put_uprobe(uprobe);
-}
-EXPORT_SYMBOL_GPL(uprobe_unregister);
-
 static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
-- 
1.8.3.1


* [PATCH 2/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-06  8:33 ` Ravi Bangoria
@ 2018-06-06  8:33   ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-06  8:33 UTC (permalink / raw)
  To: oleg, srikar, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, Ravi Bangoria

Userspace Statically Defined Tracepoints[1] are dtrace-style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. The markers are added by
developers at important places in the code. Each marker source expands
to a single nop instruction in the compiled code, but there may be
additional overhead for computing the marker arguments, which expands
to a couple of instructions. If that overhead is significant, its
execution can be skipped via a runtime if() condition when no one is
tracing the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing a marker and decrement
it when done with the tracing. Implement this reference counter logic
in core uprobes. A new function, uprobe_register_refctr(), has been
added for this. It is not exported, so for now the only interface to
the reference counter is through trace_uprobe.

A trace_uprobe definition with a reference counter will now be:

  <path>:<offset>[(ref_ctr_offset)]

where ref_ctr_offset is an optional field.
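
For example, with purely illustrative offsets (the real values come
from the probe address and the semaphore address in the target binary's
stapsdt ELF note):

  # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4(0x10036)" >> /sys/kernel/debug/tracing/uprobe_events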

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: The 'reference counter' is called a 'semaphore' in the original
DTrace (and also SystemTap, bcc and even ELF) documentation and code.
But the term 'semaphore' is misleading in this context. It is just a
counter used to hold the number of tracers tracing a marker; it is not
used for any synchronization. So we refer to it as a 'reference counter'
in kernel / perf code.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 include/linux/uprobes.h     |   5 +
 kernel/events/uprobes.c     | 298 +++++++++++++++++++++++++++++++++++++++-----
 kernel/trace/trace_uprobe.c |  38 +++++-
 3 files changed, 309 insertions(+), 32 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0a294e9..58666c6 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -123,6 +123,7 @@ struct uprobes_state {
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
+extern int uprobe_register_refctr(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, loff_t ref_ctr_offset);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -160,6 +161,10 @@ struct uprobes_state {
 {
 	return -ENOSYS;
 }
+static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, unsigned long ref_ctr_offset)
+{
+	return -ENOSYS;
+}
 static inline int
 uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
 {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c377a85..ed3c588 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -64,6 +64,11 @@
 /* Have a copy of original instruction */
 #define UPROBE_COPY_INSN	0
 
+enum {
+	UPROBE_OFFSET = 0,
+	REF_CTR_OFFSET
+};
+
 struct uprobe {
 	struct rb_node		rb_node;	/* node in the rb tree */
 	atomic_t		ref;
@@ -73,6 +78,7 @@ struct uprobe {
 	struct uprobe_consumer	*consumers;
 	struct inode		*inode;		/* Also hold a ref to inode */
 	loff_t			offset;
+	loff_t			ref_ctr_offset;
 	unsigned long		flags;
 
 	/*
@@ -483,7 +489,8 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
 	return u;
 }
 
-static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
+static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
+				   loff_t ref_ctr_offset)
 {
 	struct uprobe *uprobe, *cur_uprobe;
 
@@ -493,6 +500,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
 
 	uprobe->inode = inode;
 	uprobe->offset = offset;
+	uprobe->ref_ctr_offset = ref_ctr_offset;
 	init_rwsem(&uprobe->register_rwsem);
 	init_rwsem(&uprobe->consumer_rwsem);
 
@@ -840,11 +848,174 @@ static inline struct map_info *free_map_info(struct map_info *info)
 	return err;
 }
 
-static void
-__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
+static bool sdt_valid_vma(struct uprobe *uprobe,
+			  struct vm_area_struct *vma,
+			  unsigned long vaddr)
+{
+	return uprobe->ref_ctr_offset &&
+		vma->vm_file &&
+		file_inode(vma->vm_file) == uprobe->inode &&
+		vma->vm_flags & VM_WRITE &&
+		vma->vm_start <= vaddr &&
+		vma->vm_end > vaddr;
+}
+
+static struct vm_area_struct *sdt_find_vma(struct uprobe *uprobe,
+					   struct mm_struct *mm,
+					   unsigned long vaddr)
+{
+	struct vm_area_struct *vma = find_vma(mm, vaddr);
+
+	return (vma && sdt_valid_vma(uprobe, vma, vaddr)) ? vma : NULL;
+}
+
+/*
+ * Reference counter gate the invocation of probe. If present,
+ * by default reference counter is 0. One needs to increment
+ * it before tracing the probe and decrement it when done.
+ */
+static int
+sdt_update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
+{
+	void *kaddr;
+	struct page *page;
+	struct vm_area_struct *vma;
+	int ret = 0;
+	short *ptr;
+
+	if (vaddr == 0 || d == 0)
+		return -EINVAL;
+
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
+		FOLL_FORCE | FOLL_WRITE, &page, &vma, NULL);
+	if (unlikely(ret <= 0)) {
+		/*
+		 * We are asking for 1 page. If get_user_pages_remote() fails,
+		 * it may return 0, in that case we have to return error.
+		 */
+		ret = (ret == 0) ? -EBUSY : ret;
+		pr_warn("Failed to %s ref_ctr. (%d)\n",
+			d > 0 ? "increment" : "decrement", ret);
+		return ret;
+	}
+
+	kaddr = kmap_atomic(page);
+	ptr = kaddr + (vaddr & ~PAGE_MASK);
+
+	if (unlikely(*ptr + d < 0)) {
+		pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
+			"curr val: %d, delta: %d\n", vaddr, *ptr, d);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	*ptr += d;
+	ret = 0;
+out:
+	kunmap_atomic(kaddr);
+	put_page(page);
+	return ret;
+}
+
+static int sdt_increment_ref_ctr(struct uprobe *uprobe)
+{
+	struct map_info *info, *first = NULL;
+	int ctr = 0, ret = 0, tmp = 0;
+
+	percpu_down_write(&dup_mmap_sem);
+
+	info = build_map_info(uprobe->inode->i_mapping,
+			      uprobe->ref_ctr_offset, false);
+	if (IS_ERR(info)) {
+		percpu_up_write(&dup_mmap_sem);
+		return PTR_ERR(info);
+	}
+
+	first = info;
+	while (info) {
+		down_write(&info->mm->mmap_sem);
+		if (sdt_find_vma(uprobe, info->mm, info->vaddr)) {
+			ret = sdt_update_ref_ctr(info->mm, info->vaddr, 1);
+			if (unlikely(ret)) {
+				up_write(&info->mm->mmap_sem);
+				goto rollback;
+			}
+		}
+		up_write(&info->mm->mmap_sem);
+		info = info->next;
+		ctr++;
+	}
+	ret = 0;
+	goto out;
+
+rollback:
+	/*
+	 * We failed to update reference counter in any one of
+	 * the target mm. Rollback alredy updated mms.
+	 */
+	info = first;
+	while (ctr) {
+		down_write(&info->mm->mmap_sem);
+		if (sdt_find_vma(uprobe, info->mm, info->vaddr)) {
+			tmp = sdt_update_ref_ctr(info->mm, info->vaddr, -1);
+			if (unlikely(tmp))
+				pr_warn("ref_ctr rollback failed. (%d)\n", tmp);
+		}
+		up_write(&info->mm->mmap_sem);
+		info = info->next;
+		ctr--;
+	}
+
+out:
+	info = first;
+	while (info) {
+		mmput(info->mm);
+		info = free_map_info(info);
+	}
+
+	percpu_up_write(&dup_mmap_sem);
+	return ret;
+}
+
+static int sdt_decrement_ref_ctr(struct uprobe *uprobe)
+{
+	struct map_info *info;
+	int ret = 0, err = 0;
+
+	percpu_down_write(&dup_mmap_sem);
+	info = build_map_info(uprobe->inode->i_mapping,
+			      uprobe->ref_ctr_offset, false);
+	if (IS_ERR(info))
+		goto out;
+
+	while (info) {
+		down_write(&info->mm->mmap_sem);
+
+		if (sdt_find_vma(uprobe, info->mm, info->vaddr)) {
+			ret = sdt_update_ref_ctr(info->mm, info->vaddr, -1);
+			/* Save error and continue. */
+			err = !err && ret ? ret : err;
+		}
+
+		up_write(&info->mm->mmap_sem);
+		mmput(info->mm);
+		info = free_map_info(info);
+	}
+
+out:
+	percpu_up_write(&dup_mmap_sem);
+	return err;
+}
+
+static void __uprobe_unregister(struct uprobe *uprobe,
+				struct uprobe_consumer *uc,
+				bool ref_ctr_dec)
 {
 	int err;
 
+	if (ref_ctr_dec && uprobe->ref_ctr_offset)
+		sdt_decrement_ref_ctr(uprobe);
+
 	if (WARN_ON(!consumer_del(uprobe, uc)))
 		return;
 
@@ -869,7 +1040,7 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
 		return;
 
 	down_write(&uprobe->register_rwsem);
-	__uprobe_unregister(uprobe, uc);
+	__uprobe_unregister(uprobe, uc, true);
 	up_write(&uprobe->register_rwsem);
 	put_uprobe(uprobe);
 }
@@ -880,21 +1051,27 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
  * @inode: the file in which the probe has to be placed.
  * @offset: offset from the start of the file.
  * @uc: information on howto handle the probe..
+ * @ref_ctr_offset: Reference counter offset
  *
  * Apart from the access refcount, __uprobe_register() takes a creation
  * refcount (thro alloc_uprobe) if and only if this @uprobe is getting
- * inserted into the rbtree (i.e first consumer for a @inode:@offset
- * tuple).  Creation refcount stops uprobe_unregister from freeing the
- * @uprobe even before the register operation is complete. Creation
- * refcount is released when the last @uc for the @uprobe
- * unregisters. Caller of __uprobe_register() is required to keep @inode
- * (and the containing mount) referenced.
+ * inserted into the rbtree (i.e first consumer for a
+ * @inode:@offset:@ref_ctr_offset tuple). Creation refcount stops
+ * uprobe_unregister from freeing the @uprobe even before the register
+ * operation is complete. Creation refcount is released when the last
+ * @uc for the @uprobe unregisters. Caller of __uprobe_register() is
+ * required to keep @inode (and the containing mount) referenced.
+ *
+ * Note that, 'refcount' and 'ref_ctr_offset' are totally different
+ * entities and each has it's own purpose. 'ref_ctr_offset' is the file
+ * offset of the counter which gates the uprobe and it has nothing to
+ * do with the value of 'refcount'.
  *
- * Return errno if it cannot successully install probes
- * else return 0 (success)
+ * Return errno if it cannot successully install probes else return 0
+ * (success).
  */
 static int __uprobe_register(struct inode *inode, loff_t offset,
-			     struct uprobe_consumer *uc)
+			     struct uprobe_consumer *uc, loff_t ref_ctr_offset)
 {
 	struct uprobe *uprobe;
 	int ret;
@@ -907,11 +1084,11 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 	if (!inode->i_mapping->a_ops->readpage && !shmem_mapping(inode->i_mapping))
 		return -EIO;
 	/* Racy, just to catch the obvious mistakes */
-	if (offset > i_size_read(inode))
+	if (offset > i_size_read(inode) || ref_ctr_offset > i_size_read(inode))
 		return -EINVAL;
 
  retry:
-	uprobe = alloc_uprobe(inode, offset);
+	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (!uprobe)
 		return -ENOMEM;
 	/*
@@ -922,9 +1099,13 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 	ret = -EAGAIN;
 	if (likely(uprobe_is_active(uprobe))) {
 		consumer_add(uprobe, uc);
+
 		ret = register_for_each_vma(uprobe, uc);
+		if (!ret && ref_ctr_offset)
+			ret = sdt_increment_ref_ctr(uprobe);
+
 		if (ret)
-			__uprobe_unregister(uprobe, uc);
+			__uprobe_unregister(uprobe, uc, false);
 	}
 	up_write(&uprobe->register_rwsem);
 	put_uprobe(uprobe);
@@ -937,10 +1118,17 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 int uprobe_register(struct inode *inode, loff_t offset,
 		    struct uprobe_consumer *uc)
 {
-	return __uprobe_register(inode, offset, uc);
+	return __uprobe_register(inode, offset, uc, 0);
 }
 EXPORT_SYMBOL_GPL(uprobe_register);
 
+/* Currently, the only user of this is trace_uprobe. */
+int uprobe_register_refctr(struct inode *inode, loff_t offset,
+			   struct uprobe_consumer *uc, loff_t ref_ctr_offset)
+{
+	return __uprobe_register(inode, offset, uc, ref_ctr_offset);
+}
+
 /*
  * uprobe_apply - unregister a already registered probe.
  * @inode: the file in which the probe has to be removed.
@@ -997,22 +1185,30 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 	return err;
 }
 
+static loff_t uprobe_get_offset(struct uprobe *u, int off_type)
+{
+	return (off_type == UPROBE_OFFSET) ? u->offset : u->ref_ctr_offset;
+}
+
 static struct rb_node *
-find_node_in_range(struct inode *inode, loff_t min, loff_t max)
+find_node_in_range(struct inode *inode, int off_type, loff_t min, loff_t max)
 {
 	struct rb_node *n = uprobes_tree.rb_node;
+	struct uprobe *u;
+	loff_t offset;
 
 	while (n) {
-		struct uprobe *u = rb_entry(n, struct uprobe, rb_node);
+		u = rb_entry(n, struct uprobe, rb_node);
+		offset = uprobe_get_offset(u, off_type);
 
 		if (inode < u->inode) {
 			n = n->rb_left;
 		} else if (inode > u->inode) {
 			n = n->rb_right;
 		} else {
-			if (max < u->offset)
+			if (max < offset)
 				n = n->rb_left;
-			else if (min > u->offset)
+			else if (min > offset)
 				n = n->rb_right;
 			else
 				break;
@@ -1025,7 +1221,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 /*
  * For a given range in vma, build a list of probes that need to be inserted.
  */
-static void build_probe_list(struct inode *inode,
+static void build_probe_list(struct inode *inode, int off_type,
 				struct vm_area_struct *vma,
 				unsigned long start, unsigned long end,
 				struct list_head *head)
@@ -1033,24 +1229,27 @@ static void build_probe_list(struct inode *inode,
 	loff_t min, max;
 	struct rb_node *n, *t;
 	struct uprobe *u;
+	loff_t offset;
 
 	INIT_LIST_HEAD(head);
 	min = vaddr_to_offset(vma, start);
 	max = min + (end - start) - 1;
 
 	spin_lock(&uprobes_treelock);
-	n = find_node_in_range(inode, min, max);
+	n = find_node_in_range(inode, off_type, min, max);
 	if (n) {
 		for (t = n; t; t = rb_prev(t)) {
 			u = rb_entry(t, struct uprobe, rb_node);
-			if (u->inode != inode || u->offset < min)
+			offset = uprobe_get_offset(u, off_type);
+			if (u->inode != inode || offset < min)
 				break;
 			list_add(&u->pending_list, head);
 			get_uprobe(u);
 		}
 		for (t = n; (t = rb_next(t)); ) {
 			u = rb_entry(t, struct uprobe, rb_node);
-			if (u->inode != inode || u->offset > max)
+			offset = uprobe_get_offset(u, off_type);
+			if (u->inode != inode || offset > max)
 				break;
 			list_add(&u->pending_list, head);
 			get_uprobe(u);
@@ -1059,6 +1258,39 @@ static void build_probe_list(struct inode *inode,
 	spin_unlock(&uprobes_treelock);
 }
 
+/* Called with down_write(&vma->vm_mm->mmap_sem) */
+static int sdt_uprobe_mmap(struct vm_area_struct *vma, struct inode *inode)
+{
+	struct list_head tmp_list;
+	struct uprobe *uprobe, *u;
+	struct uprobe_consumer *uc;
+	unsigned long vaddr;
+	int ret = 0, err = 0;
+
+	build_probe_list(inode, REF_CTR_OFFSET, vma, vma->vm_start,
+			 vma->vm_end, &tmp_list);
+
+	list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
+		if (!uprobe->ref_ctr_offset || !uprobe_is_active(uprobe))
+			continue;
+
+		vaddr = offset_to_vaddr(vma, uprobe->ref_ctr_offset);
+		if (!sdt_valid_vma(uprobe, vma, vaddr))
+			continue;
+
+		/* Increment reference counter for each consumer. */
+		down_read(&uprobe->consumer_rwsem);
+		for (uc = uprobe->consumers; uc; uc = uc->next) {
+			ret = sdt_update_ref_ctr(vma->vm_mm, vaddr, 1);
+			err = !err && ret ? ret : err;
+		}
+		up_read(&uprobe->consumer_rwsem);
+		put_uprobe(uprobe);
+	}
+
+	return err;
+}
+
 /*
  * Called from mmap_region/vma_adjust with mm->mmap_sem acquired.
  *
@@ -1071,7 +1303,7 @@ int uprobe_mmap(struct vm_area_struct *vma)
 	struct uprobe *uprobe, *u;
 	struct inode *inode;
 
-	if (no_uprobe_events() || !valid_vma(vma, true))
+	if (no_uprobe_events())
 		return 0;
 
 	inode = file_inode(vma->vm_file);
@@ -1079,7 +1311,14 @@ int uprobe_mmap(struct vm_area_struct *vma)
 		return 0;
 
 	mutex_lock(uprobes_mmap_hash(inode));
-	build_probe_list(inode, vma, vma->vm_start, vma->vm_end, &tmp_list);
+	if (vma->vm_flags & VM_WRITE)
+		sdt_uprobe_mmap(vma, inode);
+
+	if (!valid_vma(vma, true))
+		goto out;
+
+	build_probe_list(inode, UPROBE_OFFSET, vma, vma->vm_start,
+			 vma->vm_end, &tmp_list);
 	/*
 	 * We can race with uprobe_unregister(), this uprobe can be already
 	 * removed. But in this case filter_chain() must return false, all
@@ -1093,8 +1332,9 @@ int uprobe_mmap(struct vm_area_struct *vma)
 		}
 		put_uprobe(uprobe);
 	}
-	mutex_unlock(uprobes_mmap_hash(inode));
 
+out:
+	mutex_unlock(uprobes_mmap_hash(inode));
 	return 0;
 }
 
@@ -1111,7 +1351,7 @@ int uprobe_mmap(struct vm_area_struct *vma)
 	max = min + (end - start) - 1;
 
 	spin_lock(&uprobes_treelock);
-	n = find_node_in_range(inode, min, max);
+	n = find_node_in_range(inode, UPROBE_OFFSET, min, max);
 	spin_unlock(&uprobes_treelock);
 
 	return !!n;
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index ac89287..d5b6ca9 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -59,6 +59,7 @@ struct trace_uprobe {
 	struct inode			*inode;
 	char				*filename;
 	unsigned long			offset;
+	unsigned long			ref_ctr_offset;
 	unsigned long			nhit;
 	struct trace_probe		tp;
 };
@@ -364,10 +365,10 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
 static int create_trace_uprobe(int argc, char **argv)
 {
 	struct trace_uprobe *tu;
-	char *arg, *event, *group, *filename;
+	char *arg, *event, *group, *filename, *rctr, *rctr_end;
 	char buf[MAX_EVENT_NAME_LEN];
 	struct path path;
-	unsigned long offset;
+	unsigned long offset, ref_ctr_offset;
 	bool is_delete, is_return;
 	int i, ret;
 
@@ -376,6 +377,7 @@ static int create_trace_uprobe(int argc, char **argv)
 	is_return = false;
 	event = NULL;
 	group = NULL;
+	ref_ctr_offset = 0;
 
 	/* argc must be >= 1 */
 	if (argv[0][0] == '-')
@@ -450,6 +452,26 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 
+	/* Parse reference counter offset if specified. */
+	rctr = strchr(arg, '(');
+	if (rctr) {
+		rctr_end = strchr(rctr, ')');
+		if (rctr > rctr_end || *(rctr_end + 1) != 0) {
+			ret = -EINVAL;
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+
+		*rctr++ = '\0';
+		*rctr_end = '\0';
+		ret = kstrtoul(rctr, 0, &ref_ctr_offset);
+		if (ret) {
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+	}
+
+	/* Parse uprobe offset. */
 	ret = kstrtoul(arg, 0, &offset);
 	if (ret)
 		goto fail_address_parse;
@@ -484,6 +506,7 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 	tu->offset = offset;
+	tu->ref_ctr_offset = ref_ctr_offset;
 	tu->path = path;
 	tu->filename = kstrdup(filename, GFP_KERNEL);
 
@@ -602,6 +625,9 @@ static int probes_seq_show(struct seq_file *m, void *v)
 			trace_event_name(&tu->tp.call), tu->filename,
 			(int)(sizeof(void *) * 2), tu->offset);
 
+	if (tu->ref_ctr_offset)
+		seq_printf(m, "(0x%lx)", tu->ref_ctr_offset);
+
 	for (i = 0; i < tu->tp.nr_args; i++)
 		seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
 
@@ -917,7 +943,13 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 
 	tu->consumer.filter = filter;
 	tu->inode = d_real_inode(tu->path.dentry);
-	ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
+	if (tu->ref_ctr_offset) {
+		ret = uprobe_register_refctr(tu->inode, tu->offset,
+				&tu->consumer, tu->ref_ctr_offset);
+	} else {
+		ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
+	}
+
 	if (ret)
 		goto err_buffer;
 
-- 
1.8.3.1

+	rctr = strchr(arg, '(');
+	if (rctr) {
+		rctr_end = strchr(rctr, ')');
+		if (rctr > rctr_end || *(rctr_end + 1) != 0) {
+			ret = -EINVAL;
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+
+		*rctr++ = '\0';
+		*rctr_end = '\0';
+		ret = kstrtoul(rctr, 0, &ref_ctr_offset);
+		if (ret) {
+			pr_info("Invalid reference counter offset.\n");
+			goto fail_address_parse;
+		}
+	}
+
+	/* Parse uprobe offset. */
 	ret = kstrtoul(arg, 0, &offset);
 	if (ret)
 		goto fail_address_parse;
@@ -484,6 +506,7 @@ static int create_trace_uprobe(int argc, char **argv)
 		goto fail_address_parse;
 	}
 	tu->offset = offset;
+	tu->ref_ctr_offset = ref_ctr_offset;
 	tu->path = path;
 	tu->filename = kstrdup(filename, GFP_KERNEL);
 
@@ -602,6 +625,9 @@ static int probes_seq_show(struct seq_file *m, void *v)
 			trace_event_name(&tu->tp.call), tu->filename,
 			(int)(sizeof(void *) * 2), tu->offset);
 
+	if (tu->ref_ctr_offset)
+		seq_printf(m, "(0x%lx)", tu->ref_ctr_offset);
+
 	for (i = 0; i < tu->tp.nr_args; i++)
 		seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
 
@@ -917,7 +943,13 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 
 	tu->consumer.filter = filter;
 	tu->inode = d_real_inode(tu->path.dentry);
-	ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
+	if (tu->ref_ctr_offset) {
+		ret = uprobe_register_refctr(tu->inode, tu->offset,
+				&tu->consumer, tu->ref_ctr_offset);
+	} else {
+		ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
+	}
+
 	if (ret)
 		goto err_buffer;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 3/7] Uprobes/sdt: Fix multiple update of same reference counter
  2018-06-06  8:33 ` Ravi Bangoria
@ 2018-06-06  8:33   ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-06  8:33 UTC (permalink / raw)
  To: oleg, srikar, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, Ravi Bangoria

When the virtual memory map for a binary/library is being prepared, there
is no direct one-to-one mapping between mmap() calls and virtual memory
areas. For example, when the loader loads a library, it first calls
mmap(size = total_size), where total_size is the sum of the sizes of all
ELF sections that are going to be mapped. It then splits out individual
vmas with further mmap()/mprotect() calls. The loader does this to ensure
it gets a contiguous address range for the library. load_elf_binary() uses
similar tricks while preparing the mappings of a binary.

For example, for the python library:

  # strace python
    mmap(NULL, 2738968, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fff92460000
    mmap(0x7fff926a0000, 327680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x230000) = 0x7fff926a0000
    mprotect(0x7fff926a0000, 65536, PROT_READ) = 0

Here, the first mmap() maps the whole library into one region. The second
mmap() and the third mprotect() then split that region into smaller vmas
and set the appropriate protection flags.

Now, in this case, uprobe_mmap() updates the reference counter twice --
once for the second mmap() call and once for the third mprotect() call --
because both regions contain the reference counter.

But during de-registration, the reference counter gets decremented only
once, leaving it > 0 even when no one is tracing the marker.

Example with the python library before the patch:

    # readelf -n /lib64/libpython2.7.so.1.0 | grep -A1 function__entry
      Name: function__entry
      ... Semaphore: 0x00000000002899d8

  Probe on a marker:
    # echo "p:sdt_python/function__entry /usr/lib64/libpython2.7.so.1.0:0x16a4d4(0x2799d8)" > uprobe_events

  Start tracing:
    # perf record -e sdt_python:function__entry -a

  Run python workload:
    # python
    # cat /proc/`pgrep python`/maps | grep libpython
      7fffadb00000-7fffadd40000 r-xp 00000000 08:05 403934  /usr/lib64/libpython2.7.so.1.0
      7fffadd40000-7fffadd50000 r--p 00230000 08:05 403934  /usr/lib64/libpython2.7.so.1.0
      7fffadd50000-7fffadd90000 rw-p 00240000 08:05 403934  /usr/lib64/libpython2.7.so.1.0

  Reference counter value has been incremented twice:
    # dd if=/proc/`pgrep python`/mem bs=1 count=1 skip=$(( 0x7fffadd899d8 )) 2>/dev/null | xxd
      0000000: 02                                       .

  Kill perf:
    #
      ^C[ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.322 MB perf.data (1273 samples) ]

  Reference counter is still 1 even though no one is tracing it:
    # dd if=/proc/`pgrep python`/mem bs=1 count=1 skip=$(( 0x7fffadd899d8 )) 2>/dev/null | xxd
      0000000: 01                                       .

Ensure that increment and decrement happen in sync by keeping a list of
{consumer, mm} tuples in struct uprobe. Check for the tuple's presence in
the list before incrementing the reference counter, i.e. for each
{consumer, mm} tuple the reference counter must be incremented only once.
Note that we don't check the presence of mm in the list at decrement time.

We consider only two cases while _incrementing_ the reference counter:
  1. The target binary is already running when we start tracing. In this
     case, find all mms which map the region of the target binary containing
     the reference counter. Loop over all mms and increment the counter if
     {consumer, mm} is not already present in the list.
  2. The tracer is already tracing before the target binary starts
     execution. In this case, update the reference counter only if
     {consumer, vma->vm_mm} is not already present in the list.

  There is also a third case which we don't consider: the fork() case.
  When a process with markers forks itself, we don't explicitly increment
  the reference counter in the child process because that should be taken
  care of by dup_mmap(). We also don't add the child mm to the list. This
  is fine because we don't check the presence of mm in the list at
  decrement time.
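
In outline, the increment side then works as below -- a simplified sketch
of sdt_increment_ref_ctr() from the diff in this patch; rollback and error
handling are omitted here:

    /*
     * 'first' is the list of mms built by build_map_info() for the page
     * holding the reference counter. For every mm that maps it, bump the
     * counter only if this {consumer, mm} pair is not accounted for yet.
     */
    for (info = first; info; info = info->next) {
        down_write(&info->mm->mmap_sem);
        mutex_lock(&uprobe->sml_lock);

        if (!sdt_check_mm_list(uprobe, info->mm, uc) &&
            sdt_find_vma(uprobe, info->mm, info->vaddr)) {
            /* New pair: update the counter and remember the pair. */
            sdt_update_ref_ctr(info->mm, info->vaddr, 1);
            sdt_add_mm_list(uprobe, info->mm, uc);
        }

        mutex_unlock(&uprobe->sml_lock);
        up_write(&info->mm->mmap_sem);
    }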

After patch:

  Start perf record and then run python...
  Reference counter value has been incremented only once:
    # dd if=/proc/`pgrep python`/mem bs=1 count=1 skip=$(( 0x7fff9cbf99d8 )) 2>/dev/null | xxd
      0000000: 01                                       .

  Kill perf:
    #
      ^C[ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.364 MB perf.data (1427 samples) ]

  Reference counter is reset to 0:
    # dd if=/proc/`pgrep python`/mem bs=1 count=1 skip=$(( 0x7fff9cbb99d8 )) 2>/dev/null | xxd
      0000000: 00                                       .

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 kernel/events/uprobes.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 150 insertions(+), 6 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ed3c588..279c8fc 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -69,6 +69,20 @@ enum {
 	REF_CTR_OFFSET
 };
 
+/*
+ * List of {consumer, mm} tuples. Unlike uprobe->consumers,
+ * uc is not a list here. It's just a single consumer. Ex,
+ * if there are 2 consumers and 3 target mms for a uprobe,
+ * total number of elements in uprobe->sml = 2 * 3 = 6. This
+ * list is used to ensure reference counter increment and
+ * decrement happen in sync.
+ */
+struct sdt_mm_list {
+	struct list_head list;
+	struct uprobe_consumer *uc;
+	struct mm_struct *mm;
+};
+
 struct uprobe {
 	struct rb_node		rb_node;	/* node in the rb tree */
 	atomic_t		ref;
@@ -79,6 +93,8 @@ struct uprobe {
 	struct inode		*inode;		/* Also hold a ref to inode */
 	loff_t			offset;
 	loff_t			ref_ctr_offset;
+	struct sdt_mm_list	sml;
+	struct mutex		sml_lock;
 	unsigned long		flags;
 
 	/*
@@ -503,6 +519,8 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 	uprobe->ref_ctr_offset = ref_ctr_offset;
 	init_rwsem(&uprobe->register_rwsem);
 	init_rwsem(&uprobe->consumer_rwsem);
+	mutex_init(&uprobe->sml_lock);
+	INIT_LIST_HEAD(&(uprobe->sml.list));
 
 	/* add to uprobes_tree, sorted on inode:offset */
 	cur_uprobe = insert_uprobe(uprobe);
@@ -848,6 +866,49 @@ static inline struct map_info *free_map_info(struct map_info *info)
 	return err;
 }
 
+static bool sdt_check_mm_list(struct uprobe *uprobe, struct mm_struct *mm,
+			      struct uprobe_consumer *uc)
+{
+	struct sdt_mm_list *sml;
+
+	list_for_each_entry(sml, &(uprobe->sml.list), list)
+		if (sml->mm == mm && sml->uc == uc)
+			return true;
+
+	return false;
+}
+
+static int sdt_add_mm_list(struct uprobe *uprobe, struct mm_struct *mm,
+			   struct uprobe_consumer *uc)
+{
+	struct sdt_mm_list *sml  = kzalloc(sizeof(*sml), GFP_KERNEL);
+
+	if (!sml) {
+		pr_info("Failed to add mm into list.\n");
+		return -ENOMEM;
+	}
+
+	sml->mm = mm;
+	sml->uc = uc;
+	list_add(&(sml->list), &(uprobe->sml.list));
+	return 0;
+}
+
+static void sdt_del_mm_list(struct uprobe *uprobe, struct mm_struct *mm,
+			    struct uprobe_consumer *uc)
+{
+	struct list_head *pos, *q;
+	struct sdt_mm_list *sml;
+
+	list_for_each_safe(pos, q, &(uprobe->sml.list)) {
+		sml = list_entry(pos, struct sdt_mm_list, list);
+		if (sml->mm == mm && sml->uc == uc) {
+			list_del(pos);
+			kfree(sml);
+		}
+	}
+}
+
 static bool sdt_valid_vma(struct uprobe *uprobe,
 			  struct vm_area_struct *vma,
 			  unsigned long vaddr)
@@ -917,7 +978,8 @@ static struct vm_area_struct *sdt_find_vma(struct uprobe *uprobe,
 	return ret;
 }
 
-static int sdt_increment_ref_ctr(struct uprobe *uprobe)
+static int
+sdt_increment_ref_ctr(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
 	struct map_info *info, *first = NULL;
 	int ctr = 0, ret = 0, tmp = 0;
@@ -934,16 +996,35 @@ static int sdt_increment_ref_ctr(struct uprobe *uprobe)
 	first = info;
 	while (info) {
 		down_write(&info->mm->mmap_sem);
+		mutex_lock(&uprobe->sml_lock);
+
+		if (sdt_check_mm_list(uprobe, info->mm, uc)) {
+			mutex_unlock(&uprobe->sml_lock);
+			up_write(&info->mm->mmap_sem);
+			info = info->next;
+			continue;
+		}
+
 		if (sdt_find_vma(uprobe, info->mm, info->vaddr)) {
 			ret = sdt_update_ref_ctr(info->mm, info->vaddr, 1);
 			if (unlikely(ret)) {
+				mutex_unlock(&uprobe->sml_lock);
+				up_write(&info->mm->mmap_sem);
+				goto rollback;
+			}
+
+			ctr++;
+
+			ret = sdt_add_mm_list(uprobe, info->mm, uc);
+			if (unlikely(ret)) {
+				mutex_unlock(&uprobe->sml_lock);
 				up_write(&info->mm->mmap_sem);
 				goto rollback;
 			}
 		}
+		mutex_unlock(&uprobe->sml_lock);
 		up_write(&info->mm->mmap_sem);
 		info = info->next;
-		ctr++;
 	}
 	ret = 0;
 	goto out;
@@ -956,11 +1037,15 @@ static int sdt_increment_ref_ctr(struct uprobe *uprobe)
 	info = first;
 	while (ctr) {
 		down_write(&info->mm->mmap_sem);
+		mutex_lock(&uprobe->sml_lock);
 		if (sdt_find_vma(uprobe, info->mm, info->vaddr)) {
 			tmp = sdt_update_ref_ctr(info->mm, info->vaddr, -1);
 			if (unlikely(tmp))
 				pr_warn("ref_ctr rollback failed. (%d)\n", tmp);
+
+			sdt_del_mm_list(uprobe, info->mm, uc);
 		}
+		mutex_unlock(&uprobe->sml_lock);
 		up_write(&info->mm->mmap_sem);
 		info = info->next;
 		ctr--;
@@ -977,7 +1062,12 @@ static int sdt_increment_ref_ctr(struct uprobe *uprobe)
 	return ret;
 }
 
-static int sdt_decrement_ref_ctr(struct uprobe *uprobe)
+/*
+ * We don't check presence of {uc,mm} in sml here. We just decrement
+ * the reference counter if we find vma holding the reference counter.
+ */
+static int
+sdt_decrement_ref_ctr(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
 	struct map_info *info;
 	int ret = 0, err = 0;
@@ -990,6 +1080,7 @@ static int sdt_decrement_ref_ctr(struct uprobe *uprobe)
 
 	while (info) {
 		down_write(&info->mm->mmap_sem);
+		mutex_lock(&uprobe->sml_lock);
 
 		if (sdt_find_vma(uprobe, info->mm, info->vaddr)) {
 			ret = sdt_update_ref_ctr(info->mm, info->vaddr, -1);
@@ -997,6 +1088,8 @@ static int sdt_decrement_ref_ctr(struct uprobe *uprobe)
 			err = !err && ret ? ret : err;
 		}
 
+		sdt_del_mm_list(uprobe, info->mm, uc);
+		mutex_unlock(&uprobe->sml_lock);
 		up_write(&info->mm->mmap_sem);
 		mmput(info->mm);
 		info = free_map_info(info);
@@ -1014,7 +1107,7 @@ static void __uprobe_unregister(struct uprobe *uprobe,
 	int err;
 
 	if (ref_ctr_dec && uprobe->ref_ctr_offset)
-		sdt_decrement_ref_ctr(uprobe);
+		sdt_decrement_ref_ctr(uprobe, uc);
 
 	if (WARN_ON(!consumer_del(uprobe, uc)))
 		return;
@@ -1102,7 +1195,7 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 
 		ret = register_for_each_vma(uprobe, uc);
 		if (!ret && ref_ctr_offset)
-			ret = sdt_increment_ref_ctr(uprobe);
+			ret = sdt_increment_ref_ctr(uprobe, uc);
 
 		if (ret)
 			__uprobe_unregister(uprobe, uc, false);
@@ -1258,6 +1351,26 @@ static void build_probe_list(struct inode *inode, int off_type,
 	spin_unlock(&uprobes_treelock);
 }
 
+static int __sdt_uprobe_mmap(struct uprobe *uprobe, struct mm_struct *mm,
+			     unsigned long vaddr, struct uprobe_consumer *uc)
+{
+	int ret;
+
+	if (sdt_check_mm_list(uprobe, mm, uc))
+		return 0;
+
+	ret = sdt_update_ref_ctr(mm, vaddr, 1);
+	if (unlikely(ret))
+		return ret;
+
+	ret = sdt_add_mm_list(uprobe, mm, uc);
+	if (likely(!ret))
+		return ret;
+
+	/* Failed to add mm into the list. Rollback ref_ctr update */
+	return sdt_update_ref_ctr(mm, vaddr, -1);
+}
+
 /* Called with down_write(&vma->vm_mm->mmap_sem) */
 static int sdt_uprobe_mmap(struct vm_area_struct *vma, struct inode *inode)
 {
@@ -1280,10 +1393,14 @@ static int sdt_uprobe_mmap(struct vm_area_struct *vma, struct inode *inode)
 
 		/* Increment reference counter for each consumer. */
 		down_read(&uprobe->consumer_rwsem);
+		mutex_lock(&uprobe->sml_lock);
+
 		for (uc = uprobe->consumers; uc; uc = uc->next) {
-			ret = sdt_update_ref_ctr(vma->vm_mm, vaddr, 1);
+			ret = __sdt_uprobe_mmap(uprobe, vma->vm_mm, vaddr, uc);
 			err = !err && ret ? ret : err;
 		}
+
+		mutex_unlock(&uprobe->sml_lock);
 		up_read(&uprobe->consumer_rwsem);
 		put_uprobe(uprobe);
 	}
@@ -1477,6 +1594,29 @@ static struct xol_area *get_xol_area(void)
 	return area;
 }
 
+static void sdt_mm_release(struct rb_node *n, struct mm_struct *mm)
+{
+	struct uprobe *uprobe;
+	struct uprobe_consumer *uc;
+
+	if (!n)
+		return;
+
+	uprobe = rb_entry(n, struct uprobe, rb_node);
+
+	down_read(&uprobe->consumer_rwsem);
+	mutex_lock(&uprobe->sml_lock);
+
+	for (uc = uprobe->consumers; uc; uc = uc->next)
+		sdt_del_mm_list(uprobe, mm, uc);
+
+	mutex_unlock(&uprobe->sml_lock);
+	up_read(&uprobe->consumer_rwsem);
+
+	sdt_mm_release(n->rb_left, mm);
+	sdt_mm_release(n->rb_right, mm);
+}
+
 /*
  * uprobe_clear_state - Free the area allocated for slots.
  */
@@ -1484,6 +1624,10 @@ void uprobe_clear_state(struct mm_struct *mm)
 {
 	struct xol_area *area = mm->uprobes_state.xol_area;
 
+	spin_lock(&uprobes_treelock);
+	sdt_mm_release(uprobes_tree.rb_node, mm);
+	spin_unlock(&uprobes_treelock);
+
 	if (!area)
 		return;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 4/7] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  2018-06-06  8:33 ` Ravi Bangoria
@ 2018-06-06  8:33   ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-06  8:33 UTC (permalink / raw)
  To: oleg, srikar, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, Ravi Bangoria

Right now, there can be multiple trace_uprobes with the same inode+offset,
but internally they all point to the same uprobe. Allowing the user to
specify multiple ref_ctr_offsets for a single inode+offset combination
would require changing the core uprobe logic. It's also unlikely to have
more than one reference counter for a single uprobe. Restrict it.

For example:
  # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events
  # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0x10030)" >> uprobe_events
  bash: echo: write error: Invalid argument

  # dmesg | tail -1
  trace_kprobe: Reference counter offset mismatch.

One exception to this is when the user is replacing an old entry with a
new one. We allow this if the new entry does not conflict with any other
existing entry.
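
For instance (offsets are illustrative), re-writing the same group/event
with a different reference counter offset is still accepted, because it
replaces the old entry rather than adding a conflicting one:

  # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" >> uprobe_events
  # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10030)" >> uprobe_events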

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 kernel/trace/trace_uprobe.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index d5b6ca9..e8914f7 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -324,6 +324,34 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu)
 	return 0;
 }
 
+/*
+ * Uprobe with multiple reference counter is not allowed. i.e.
+ * If inode and offset matches, reference counter offset *must*
+ * match as well. Only one exception to this is when we are
+ * replacing old trace_uprobe with new one(same group/event).
+ */
+static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new)
+{
+	struct trace_uprobe *tmp, *old = NULL;
+	struct inode *new_inode = d_real_inode(new->path.dentry);
+
+	old = find_probe_event(trace_event_name(&new->tp.call),
+				new->tp.call.class->system);
+	if (!new->ref_ctr_offset)
+		return old;
+
+	list_for_each_entry(tmp, &uprobe_list, list) {
+		if (new_inode == d_real_inode(tmp->path.dentry) &&
+		    new->offset == tmp->offset &&
+		    new->ref_ctr_offset != tmp->ref_ctr_offset &&
+		    tmp != old) {
+			pr_warn("Reference counter offset mismatch.");
+			return ERR_PTR(-EINVAL);
+		}
+	}
+	return old;
+}
+
 /* Register a trace_uprobe and probe_event */
 static int register_trace_uprobe(struct trace_uprobe *tu)
 {
@@ -333,8 +361,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
 	mutex_lock(&uprobe_lock);
 
 	/* register as an event */
-	old_tu = find_probe_event(trace_event_name(&tu->tp.call),
-			tu->tp.call.class->system);
+	old_tu = find_old_trace_uprobe(tu);
+	if (IS_ERR(old_tu)) {
+		ret = PTR_ERR(old_tu);
+		goto end;
+	}
+
 	if (old_tu) {
 		/* delete old event */
 		ret = unregister_trace_uprobe(old_tu);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 5/7] Uprobes/sdt: Prevent multiple reference counter for same uprobe
  2018-06-06  8:33 ` Ravi Bangoria
@ 2018-06-06  8:33   ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-06  8:33 UTC (permalink / raw)
  To: oleg, srikar, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, Ravi Bangoria

Even if we don't allow creating the same uprobe with different reference
counter offsets via trace_uprobe, a user can still create such uprobes by
creating one uprobe via trace_uprobe and another by directly calling
uprobe_register() from a kernel module. Prevent such scenarios.

Note that this restricts the user from using both ways (trace_uprobe and a
kernel module) simultaneously. For example, a user cannot trace a single
uprobe (with a reference counter) with SystemTap and perf at the same time.
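
To illustrate the conflicting case, a minimal (hypothetical) kernel-module
consumer is sketched below; the binary path and offsets are placeholders.
If a trace_uprobe with a non-zero reference counter offset already exists
for the same inode:offset, this plain uprobe_register() call now fails with
-EINVAL instead of silently sharing a uprobe with a mismatched
ref_ctr_offset:

  #include <linux/module.h>
  #include <linux/fs.h>
  #include <linux/namei.h>
  #include <linux/uprobes.h>

  static int sample_handler(struct uprobe_consumer *self,
                            struct pt_regs *regs)
  {
          pr_info("probe hit\n");
          return 0;
  }

  static struct uprobe_consumer sample_uc = {
          .handler = sample_handler,
  };

  static struct inode *sample_inode;

  static int __init sample_init(void)
  {
          struct path p;
          int ret;

          /* Placeholder binary and offset -- substitute real values. */
          ret = kern_path("/tmp/tick", LOOKUP_FOLLOW, &p);
          if (ret)
                  return ret;

          sample_inode = igrab(d_real_inode(p.dentry));
          path_put(&p);
          if (!sample_inode)
                  return -ENOENT;

          /* No ref_ctr_offset here: conflicts with an existing SDT probe. */
          ret = uprobe_register(sample_inode, 0x6e4, &sample_uc);
          if (ret)
                  iput(sample_inode);
          return ret;
  }

  static void __exit sample_exit(void)
  {
          uprobe_unregister(sample_inode, 0x6e4, &sample_uc);
          iput(sample_inode);
  }

  module_init(sample_init);
  module_exit(sample_exit);
  MODULE_LICENSE("GPL");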

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 kernel/events/uprobes.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 279c8fc..5e07f99 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -526,6 +526,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 	cur_uprobe = insert_uprobe(uprobe);
 	/* a uprobe exists for this inode:offset combination */
 	if (cur_uprobe) {
+		if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
+			pr_warn("Err: Reference counter mismatch.\n");
+			put_uprobe(cur_uprobe);
+			kfree(uprobe);
+			return ERR_PTR(-EINVAL);
+		}
 		kfree(uprobe);
 		uprobe = cur_uprobe;
 	}
@@ -1184,6 +1190,9 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (!uprobe)
 		return -ENOMEM;
+	if (IS_ERR(uprobe))
+		return PTR_ERR(uprobe);
+
 	/*
 	 * We can race with uprobe_unregister()->delete_uprobe().
 	 * Check uprobe_is_active() and retry if it is false.
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 6/7] Uprobes/sdt: Document about reference counter
  2018-06-06  8:33 ` Ravi Bangoria
@ 2018-06-06  8:33   ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-06  8:33 UTC (permalink / raw)
  To: oleg, srikar, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, Ravi Bangoria

The reference counter gates the invocation of a probe. If present, the
reference count defaults to 0. The kernel needs to increment it before
tracing the probe and decrement it when done. This is identical to the
semaphore in Userspace Statically Defined Tracepoints (USDT).

Document the usage of the reference counter.
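
With this syntax, probing an SDT marker that has a semaphore would look
like the following (binary path and offsets are illustrative; the tracefs
mount point may differ):

  # cd /sys/kernel/debug/tracing
  # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4(0x10036)" >> uprobe_events
  # echo 1 > events/sdt_tick/loop2/enable

Enabling the event makes the kernel increment the semaphore at file offset
0x10036 in every process that maps the binary, so the marker's argument
setup code actually runs; disabling the event decrements it again.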

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Documentation/trace/uprobetracer.rst | 16 +++++++++++++---
 kernel/trace/trace.c                 |  2 +-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/uprobetracer.rst b/Documentation/trace/uprobetracer.rst
index 98d3f69..0c27c65 100644
--- a/Documentation/trace/uprobetracer.rst
+++ b/Documentation/trace/uprobetracer.rst
@@ -22,15 +22,25 @@ Synopsis of uprobe_tracer
 -------------------------
 ::
 
-  p[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a uprobe
-  r[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a return uprobe (uretprobe)
-  -:[GRP/]EVENT                           : Clear uprobe or uretprobe event
+  p[:[GRP/]EVENT] PATH:OFFSET[(REF_CTR_OFFSET)] [FETCHARGS]
+  r[:[GRP/]EVENT] PATH:OFFSET[(REF_CTR_OFFSET)] [FETCHARGS]
+  -:[GRP/]EVENT
+
+  p : Set a uprobe
+  r : Set a return uprobe (uretprobe)
+  - : Clear uprobe or uretprobe event
 
   GRP           : Group name. If omitted, "uprobes" is the default value.
   EVENT         : Event name. If omitted, the event name is generated based
                   on PATH+OFFSET.
   PATH          : Path to an executable or a library.
   OFFSET        : Offset where the probe is inserted.
+  REF_CTR_OFFSET: Reference counter offset. Optional field. Reference count
+                 gate the invocation of probe. If present, by default
+                 reference count is 0. Kernel needs to increment it before
+                 tracing the probe and decrement it when done. This is
+                 identical to semaphore in Userspace Statically Defined
+                 Tracepoints (USDT).
 
   FETCHARGS     : Arguments. Each probe can have up to 128 args.
    %REG         : Fetch register REG
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index bcd9303..f85639b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4616,7 +4616,7 @@ static int tracing_trace_options_open(struct inode *inode, struct file *file)
   "place (kretprobe): [<module>:]<symbol>[+<offset>]|<memaddr>\n"
 #endif
 #ifdef CONFIG_UPROBE_EVENTS
-	"\t    place: <path>:<offset>\n"
+  "   place (uprobe): <path>:<offset>[(ref_ctr_offset)]\n"
 #endif
 	"\t     args: <name>=fetcharg[:type]\n"
 	"\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 7/7] perf probe: Support SDT markers having reference counter (semaphore)
  2018-06-06  8:33 ` Ravi Bangoria
@ 2018-06-06  8:33   ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-06  8:33 UTC (permalink / raw)
  To: oleg, srikar, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, Ravi Bangoria

With this, perf buildid-cache will save SDT markers with reference
counters in the probe cache, and perf probe will be able to probe
markers having a reference counter. For example:

  # readelf -n /tmp/tick | grep -A1 loop2
    Name: loop2
    ... Semaphore: 0x0000000010020036

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed
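
Under the hood, the probe definition that perf now synthesizes and writes
to uprobe_events carries the reference counter offset in parentheses,
matching the uprobetracer syntax (values below are illustrative for the
tick binary):

  p:sdt_tick/loop2 /tmp/tick:0x6e4(0x10036)

If the running kernel's ftrace README does not advertise ref_ctr_offset
support, synthesizing such a probe fails and perf warns that the kernel
does not seem to support the semaphore.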

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 tools/perf/util/probe-event.c | 39 ++++++++++++++++++++++++++++++++----
 tools/perf/util/probe-event.h |  1 +
 tools/perf/util/probe-file.c  | 34 ++++++++++++++++++++++++++------
 tools/perf/util/probe-file.h  |  1 +
 tools/perf/util/symbol-elf.c  | 46 ++++++++++++++++++++++++++++++++-----------
 tools/perf/util/symbol.h      |  7 +++++++
 6 files changed, 106 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 3094f11..0911c9f 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1820,6 +1820,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev)
 			tp->offset = strtoul(fmt2_str, NULL, 10);
 	}
 
+	if (tev->uprobes) {
+		fmt2_str = strchr(p, '(');
+		if (fmt2_str)
+			tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
+	}
+
 	tev->nargs = argc - 2;
 	tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
 	if (tev->args == NULL) {
@@ -2013,6 +2019,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg,
 	return err;
 }
 
+static int
+synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf)
+{
+	struct probe_trace_point *tp = &tev->point;
+	int err;
+
+	err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address);
+
+	if (err >= 0 && tp->ref_ctr_offset) {
+		if (!uprobe_ref_ctr_is_supported())
+			return -1;
+		err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset);
+	}
+	return err >= 0 ? 0 : -1;
+}
+
 char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 {
 	struct probe_trace_point *tp = &tev->point;
@@ -2042,15 +2064,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 	}
 
 	/* Use the tp->address for uprobes */
-	if (tev->uprobes)
-		err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
-	else if (!strncmp(tp->symbol, "0x", 2))
+	if (tev->uprobes) {
+		err = synthesize_uprobe_trace_def(tev, &buf);
+	} else if (!strncmp(tp->symbol, "0x", 2)) {
 		/* Absolute address. See try_to_find_absolute_address() */
 		err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
 				  tp->module ? ":" : "", tp->address);
-	else
+	} else {
 		err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
 				tp->module ? ":" : "", tp->symbol, tp->offset);
+	}
+
 	if (err)
 		goto error;
 
@@ -2634,6 +2658,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev)
 {
 	int i;
 	char *buf = synthesize_probe_trace_command(tev);
+	struct probe_trace_point *tp = &tev->point;
+
+	if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) {
+		pr_warning("A semaphore is associated with %s:%s and "
+			   "seems your kernel doesn't support it.\n",
+			   tev->group, tev->event);
+	}
 
 	/* Old uprobe event doesn't support memory dereference */
 	if (!tev->uprobes || tev->nargs == 0 || !buf)
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 45b14f0..15a98c3 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -27,6 +27,7 @@ struct probe_trace_point {
 	char		*symbol;	/* Base symbol */
 	char		*module;	/* Module name */
 	unsigned long	offset;		/* Offset from symbol */
+	unsigned long	ref_ctr_offset;	/* SDT reference counter offset */
 	unsigned long	address;	/* Actual address of the trace point */
 	bool		retprobe;	/* Return probe flag */
 };
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index b76088f..aac7817 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -696,8 +696,16 @@ int probe_cache__add_entry(struct probe_cache *pcache,
 #ifdef HAVE_GELF_GETNOTE_SUPPORT
 static unsigned long long sdt_note__get_addr(struct sdt_note *note)
 {
-	return note->bit32 ? (unsigned long long)note->addr.a32[0]
-		 : (unsigned long long)note->addr.a64[0];
+	return note->bit32 ?
+		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_LOC] :
+		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_LOC];
+}
+
+static unsigned long long sdt_note__get_ref_ctr_offset(struct sdt_note *note)
+{
+	return note->bit32 ?
+		(unsigned long long)note->addr.a32[SDT_NOTE_IDX_REFCTR] :
+		(unsigned long long)note->addr.a64[SDT_NOTE_IDX_REFCTR];
 }
 
 static const char * const type_to_suffix[] = {
@@ -775,14 +783,21 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note,
 {
 	struct strbuf buf;
 	char *ret = NULL, **args;
-	int i, args_count;
+	int i, args_count, err;
+	unsigned long long ref_ctr_offset;
 
 	if (strbuf_init(&buf, 32) < 0)
 		return NULL;
 
-	if (strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
-				sdtgrp, note->name, pathname,
-				sdt_note__get_addr(note)) < 0)
+	err = strbuf_addf(&buf, "p:%s/%s %s:0x%llx",
+			sdtgrp, note->name, pathname,
+			sdt_note__get_addr(note));
+
+	ref_ctr_offset = sdt_note__get_ref_ctr_offset(note);
+	if (ref_ctr_offset && err >= 0)
+		err = strbuf_addf(&buf, "(0x%llx)", ref_ctr_offset);
+
+	if (err < 0)
 		goto error;
 
 	if (!note->args)
@@ -998,6 +1013,7 @@ int probe_cache__show_all_caches(struct strfilter *filter)
 enum ftrace_readme {
 	FTRACE_README_PROBE_TYPE_X = 0,
 	FTRACE_README_KRETPROBE_OFFSET,
+	FTRACE_README_UPROBE_REF_CTR,
 	FTRACE_README_END,
 };
 
@@ -1009,6 +1025,7 @@ enum ftrace_readme {
 	[idx] = {.pattern = pat, .avail = false}
 	DEFINE_TYPE(FTRACE_README_PROBE_TYPE_X, "*type: * x8/16/32/64,*"),
 	DEFINE_TYPE(FTRACE_README_KRETPROBE_OFFSET, "*place (kretprobe): *"),
+	DEFINE_TYPE(FTRACE_README_UPROBE_REF_CTR, "*ref_ctr_offset*"),
 };
 
 static bool scan_ftrace_readme(enum ftrace_readme type)
@@ -1064,3 +1081,8 @@ bool kretprobe_offset_is_supported(void)
 {
 	return scan_ftrace_readme(FTRACE_README_KRETPROBE_OFFSET);
 }
+
+bool uprobe_ref_ctr_is_supported(void)
+{
+	return scan_ftrace_readme(FTRACE_README_UPROBE_REF_CTR);
+}
diff --git a/tools/perf/util/probe-file.h b/tools/perf/util/probe-file.h
index 63f29b1..2a24918 100644
--- a/tools/perf/util/probe-file.h
+++ b/tools/perf/util/probe-file.h
@@ -69,6 +69,7 @@ struct probe_cache_entry *probe_cache__find_by_name(struct probe_cache *pcache,
 int probe_cache__show_all_caches(struct strfilter *filter);
 bool probe_type_is_available(enum probe_type type);
 bool kretprobe_offset_is_supported(void);
+bool uprobe_ref_ctr_is_supported(void);
 #else	/* ! HAVE_LIBELF_SUPPORT */
 static inline struct probe_cache *probe_cache__new(const char *tgt __maybe_unused, struct nsinfo *nsi __maybe_unused)
 {
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 29770ea..0281d5e 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1947,6 +1947,34 @@ void kcore_extract__delete(struct kcore_extract *kce)
 }
 
 #ifdef HAVE_GELF_GETNOTE_SUPPORT
+
+static void sdt_adjust_loc(struct sdt_note *tmp, GElf_Addr base_off)
+{
+	if (!base_off)
+		return;
+
+	if (tmp->bit32)
+		tmp->addr.a32[SDT_NOTE_IDX_LOC] =
+			tmp->addr.a32[SDT_NOTE_IDX_LOC] + base_off -
+			tmp->addr.a32[SDT_NOTE_IDX_BASE];
+	else
+		tmp->addr.a64[SDT_NOTE_IDX_LOC] =
+			tmp->addr.a64[SDT_NOTE_IDX_LOC] + base_off -
+			tmp->addr.a64[SDT_NOTE_IDX_BASE];
+}
+
+static void sdt_adjust_refctr(struct sdt_note *tmp, GElf_Addr base_addr,
+			      GElf_Addr base_off)
+{
+	if (!base_off)
+		return;
+
+	if (tmp->bit32 && tmp->addr.a32[SDT_NOTE_IDX_REFCTR])
+		tmp->addr.a32[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
+	else if (tmp->addr.a64[SDT_NOTE_IDX_REFCTR])
+		tmp->addr.a64[SDT_NOTE_IDX_REFCTR] -= (base_addr - base_off);
+}
+
 /**
  * populate_sdt_note : Parse raw data and identify SDT note
  * @elf: elf of the opened file
@@ -1964,7 +1992,6 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
 	const char *provider, *name, *args;
 	struct sdt_note *tmp = NULL;
 	GElf_Ehdr ehdr;
-	GElf_Addr base_off = 0;
 	GElf_Shdr shdr;
 	int ret = -EINVAL;
 
@@ -2060,17 +2087,12 @@ static int populate_sdt_note(Elf **elf, const char *data, size_t len,
 	 * base address in the description of the SDT note. If its different,
 	 * then accordingly, adjust the note location.
 	 */
-	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL)) {
-		base_off = shdr.sh_offset;
-		if (base_off) {
-			if (tmp->bit32)
-				tmp->addr.a32[0] = tmp->addr.a32[0] + base_off -
-					tmp->addr.a32[1];
-			else
-				tmp->addr.a64[0] = tmp->addr.a64[0] + base_off -
-					tmp->addr.a64[1];
-		}
-	}
+	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_BASE_SCN, NULL))
+		sdt_adjust_loc(tmp, shdr.sh_offset);
+
+	/* Adjust reference counter offset */
+	if (elf_section_by_name(*elf, &ehdr, &shdr, SDT_PROBES_SCN, NULL))
+		sdt_adjust_refctr(tmp, shdr.sh_addr, shdr.sh_offset);
 
 	list_add_tail(&tmp->note_list, sdt_notes);
 	return 0;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 1a16438..61ed90c 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -382,12 +382,19 @@ struct sdt_note {
 int cleanup_sdt_note_list(struct list_head *sdt_notes);
 int sdt_notes__get_count(struct list_head *start);
 
+#define SDT_PROBES_SCN ".probes"
 #define SDT_BASE_SCN ".stapsdt.base"
 #define SDT_NOTE_SCN  ".note.stapsdt"
 #define SDT_NOTE_TYPE 3
 #define SDT_NOTE_NAME "stapsdt"
 #define NR_ADDR 3
 
+enum {
+	SDT_NOTE_IDX_LOC = 0,
+	SDT_NOTE_IDX_BASE,
+	SDT_NOTE_IDX_REFCTR,
+};
+
 struct mem_info *mem_info__new(void);
 struct mem_info *mem_info__get(struct mem_info *mi);
 void   mem_info__put(struct mem_info *mi);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-06  8:33 ` Ravi Bangoria
@ 2018-06-06  8:35   ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-06  8:35 UTC (permalink / raw)
  To: oleg, srikar, rostedt, mhiramat
  Cc: peterz, mingo, acme, alexander.shishkin, jolsa, namhyung,
	linux-kernel, corbet, linux-doc, ananth, alexis.berlemont,
	naveen.n.rao, Ravi Bangoria



On 06/06/2018 02:03 PM, Ravi Bangoria wrote:
> Why RFC again:
> 

Please consider this as [RFC]. I just forgot to tag it in the subject.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-06  8:33 ` Ravi Bangoria
@ 2018-06-08  1:10   ` Masami Hiramatsu
  -1 siblings, 0 replies; 42+ messages in thread
From: Masami Hiramatsu @ 2018-06-08  1:10 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, srikar, rostedt, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao

On Wed,  6 Jun 2018 14:03:37 +0530
Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:

> Why RFC again:
> 
> This series is different from earlier versions[1]. Earlier series
> implemented this feature in trace_uprobe while this has implemented
> the logic in core uprobe. Few reasons for this:
>  1. One of the major reason was the deadlock between uprobe_lock and
>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
>  because mm->mmap is not in control of trace_uprobe_mmap() and it has
>  to take uprobe_lock to loop over trace_uprobe list. More details can
>  be found at[2]. With this new approach, there are no deadlocks found
>  so far.
>  2. Many of the core uprobe function and data-structures needs to be
>  exported to make earlier implementation simple. With this new approach,
>  reference counter logic is been implemented in core uprobe and thus
>  no need to export anything.

I agree with you. Moreover, since uprobe_register/unregister() are
exported to modules, this enablement would be better implemented
inside the uprobe core so that all uprobe users benefit from it.

> 
> Description:
> 
> Userspace Statically Defined Tracepoints[3] are dtrace style markers
> inside userspace applications. Applications like PostgreSQL, MySQL,
> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> have these markers embedded in them. These markers are added by developer
> at important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. In case the overhead is more, execution of it can be
> omitted by runtime if() condition when no one is tracing on the marker:
> 
>     if (reference_counter > 0) {
>         Execute marker instructions;
>     }
> 
> Default value of reference counter is 0. Tracer has to increment the 
> reference counter before tracing on a marker and decrement it when
> done with the tracing.
> 
> Currently, perf tool has limited supports for SDT markers. I.e. it
> can not trace markers surrounded by reference counter. Also, it's
> not easy to add reference counter logic in userspace tool like perf,
> so basic idea for this patchset is to add reference counter logic in
> the trace_uprobe infrastructure. Ex,[4]
> 
>   # cat tick.c
>     ... 
>     for (i = 0; i < 100; i++) {
> 	DTRACE_PROBE1(tick, loop1, i);
>         if (TICK_LOOP2_ENABLED()) {
>             DTRACE_PROBE1(tick, loop2, i); 
>         }
>         printf("hi: %d\n", i); 
>         sleep(1);
>     }   
>     ... 
> 
> Here tick:loop1 is marker without reference counter where as tick:loop2
> is surrounded by reference counter condition.
> 
>   # perf buildid-cache --add /tmp/tick
>   # perf probe sdt_tick:loop1
>   # perf probe sdt_tick:loop2
> 
>   # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
>   hi: 0
>   hi: 1
>   hi: 2
>   ^C
>   Performance counter stats for '/tmp/tick':
>              3      sdt_tick:loop1
>              0      sdt_tick:loop2
>      2.747086086 seconds time elapsed
> 
> Perf failed to record data for tick:loop2. Same experiment with this
> patch series:
> 
>   # ./perf buildid-cache --add /tmp/tick
>   # ./perf probe sdt_tick:loop2
>   # ./perf stat -e sdt_tick:loop2 /tmp/tick
>     hi: 0
>     hi: 1
>     hi: 2
>     ^C  
>      Performance counter stats for '/tmp/tick':
>                  3      sdt_tick:loop2
>        2.561851452 seconds time elapsed
> 
> 
> Note:
>  - 'reference counter' is called as 'semaphore' in original Dtrace
>    (or Systemtap, bcc and even in ELF) documentation and code. But the 
>    term 'semaphore' is misleading in this context. This is just a counter
>    used to hold number of tracers tracing on a marker. This is not haeally
>    used for any synchronization. So we are referring it as 'reference
>    counter' in kernel / perf code.

+1. I like this "reference counter" term :)

>  - This patches still has one issue. If there are multiple instances of
>    same application running and user wants to trace any particular
>    instance, trace_uprobe is updating reference counter in all instances.
>    This is not a problem on user side because instruction is not replaced
>    with trap/int3 and thus user will only see samples from his interested
>    process. But still this is more of a correctness issue. I'm working on
>    a fix for this.

Hmm, it sounds like not a correctness issue, but there may be a performance
tradeoff. Tracing one particular instance, other instances will also get
a performance loss (only if the parameter preparation block is heavy,
because the heaviest part of probing - trap/int3 and recording data - isn't
executed).

BTW, why does this happen? I thought the refcounter part is just data which
is not shared among processes...

Thank you,


> 
> [1] https://lkml.org/lkml/2018/4/17/23
> [2] https://lkml.org/lkml/2018/5/25/111
> [3] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> [4] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506
> 
> Ravi Bangoria (7):
>   Uprobes: Simplify uprobe_register() body
>   Uprobes: Support SDT markers having reference count (semaphore)
>   Uprobes/sdt: Fix multiple update of same reference counter
>   trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
>   Uprobes/sdt: Prevent multiple reference counter for same uprobe
>   Uprobes/sdt: Document about reference counter
>   perf probe: Support SDT markers having reference counter (semaphore)
> 
>  Documentation/trace/uprobetracer.rst |  16 +-
>  include/linux/uprobes.h              |   5 +
>  kernel/events/uprobes.c              | 502 +++++++++++++++++++++++++++++++----
>  kernel/trace/trace.c                 |   2 +-
>  kernel/trace/trace_uprobe.c          |  74 +++++-
>  tools/perf/util/probe-event.c        |  39 ++-
>  tools/perf/util/probe-event.h        |   1 +
>  tools/perf/util/probe-file.c         |  34 ++-
>  tools/perf/util/probe-file.h         |   1 +
>  tools/perf/util/symbol-elf.c         |  46 +++-
>  tools/perf/util/symbol.h             |   7 +
>  11 files changed, 643 insertions(+), 84 deletions(-)
> 
> -- 
> 1.8.3.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-08  1:10   ` Masami Hiramatsu
@ 2018-06-08  2:29     ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-08  2:29 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: oleg, srikar, rostedt, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, Ravi Bangoria

Hi Masami,

On 06/08/2018 06:40 AM, Masami Hiramatsu wrote:
> On Wed,  6 Jun 2018 14:03:37 +0530
> Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:
> 
>> Why RFC again:
>>
>> This series is different from earlier versions[1]. Earlier series
>> implemented this feature in trace_uprobe while this has implemented
>> the logic in core uprobe. Few reasons for this:
>>  1. One of the major reason was the deadlock between uprobe_lock and
>>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
>>  because mm->mmap is not in control of trace_uprobe_mmap() and it has
>>  to take uprobe_lock to loop over trace_uprobe list. More details can
>>  be found at[2]. With this new approach, there are no deadlocks found
>>  so far.
>>  2. Many of the core uprobe function and data-structures needs to be
>>  exported to make earlier implementation simple. With this new approach,
>>  reference counter logic is been implemented in core uprobe and thus
>>  no need to export anything.
> 
> I agree with you. Moreover, since uprobe_register/unregister() are
> exported to modules, this enablement would better be implemented
> inside uprobe so that all uprobe users benefit from this.


Sorry, I think you got me wrong. I meant that I don't need to expose all core
uprobe _static_ functions to trace_uprobe.

Now, about kernel modules, basically uprobe_register() takes three parameters:
    inode, offset and consumer.
There is no scope for the reference counter there. So I've created one more
function: uprobe_register_refctr(). But this function is not exported to
kernel modules, i.e. kernel modules still do not have a way to create a
uprobe with a reference counter. So for kernel modules,

is it fine to change the current ABI from
    uprobe_register(inode, offset, consumer)
to
    uprobe_register(inode, offset, ref_ctr_offset, consumer)

Or should I introduce a new function for this:
    uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
and export it to kernel modules?
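
For illustration, the two options would look roughly like this (just a
sketch of candidate prototypes, following the types used by the existing
uprobes API; the final signatures in the series may differ):

    /* option (a): extend the existing exported API */
    int uprobe_register(struct inode *inode, loff_t offset,
                        loff_t ref_ctr_offset, struct uprobe_consumer *uc);

    /* option (b): keep uprobe_register() unchanged and export a new helper */
    int uprobe_register_refctr(struct inode *inode, loff_t offset,
                               loff_t ref_ctr_offset, struct uprobe_consumer *uc);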

What's your suggestion?

[...]

>> 
>>  - This patches still has one issue. If there are multiple instances of
>>    same application running and user wants to trace any particular
>>    instance, trace_uprobe is updating reference counter in all instances.
>>    This is not a problem on user side because instruction is not replaced
>>    with trap/int3 and thus user will only see samples from his interested
>>    process. But still this is more of a correctness issue. I'm working on
>>    a fix for this.
> 
> Hmm, it sounds like not a correctness issue, but there maybe a performace
> tradeoff. Tracing one particulear instance, other instances also will get
> a performance loss


Right, but it's temporary. I mean, putting everything into this series was making
it complex. So this is the initial one and I'll send follow-up patches which will
optimize the reference counter update.


> (Only if the parameter preparation block is heavy,
> because the heaviest part of probing - trap/int3 and recording data - isn't
> executed.)
>> BTW, why this happens? I thought the refcounter part is just a data which
> is not shared among processes...
> 

This happens because we are not calling the consumer_filter function. consumer_filter
is what decides whether or not to change the instruction to a trap in a given
mm. We also need to call it before updating the reference counter.
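
(i.e. the per-mm decision comes from the consumer's filter callback.
Roughly, and trimmed, from include/linux/uprobes.h:)

    struct uprobe_consumer {
            int  (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
            /* decides, per mm, whether this consumer wants the probe installed */
            bool (*filter)(struct uprobe_consumer *self,
                           enum uprobe_filter_ctx ctx,
                           struct mm_struct *mm);
            /* ... */
    };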

Let me know your thoughts.

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-08  2:29     ` Ravi Bangoria
@ 2018-06-08  5:14       ` Masami Hiramatsu
  -1 siblings, 0 replies; 42+ messages in thread
From: Masami Hiramatsu @ 2018-06-08  5:14 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, srikar, rostedt, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao

On Fri, 8 Jun 2018 07:59:38 +0530
Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:

> Hi Masami,
> 
> On 06/08/2018 06:40 AM, Masami Hiramatsu wrote:
> > On Wed,  6 Jun 2018 14:03:37 +0530
> > Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:
> > 
> >> Why RFC again:
> >>
> >> This series is different from earlier versions[1]. Earlier series
> >> implemented this feature in trace_uprobe while this has implemented
> >> the logic in core uprobe. Few reasons for this:
> >>  1. One of the major reason was the deadlock between uprobe_lock and
> >>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
> >>  because mm->mmap is not in control of trace_uprobe_mmap() and it has
> >>  to take uprobe_lock to loop over trace_uprobe list. More details can
> >>  be found at[2]. With this new approach, there are no deadlocks found
> >>  so far.
> >>  2. Many of the core uprobe function and data-structures needs to be
> >>  exported to make earlier implementation simple. With this new approach,
> >>  reference counter logic is been implemented in core uprobe and thus
> >>  no need to export anything.
> > 
> > I agree with you. Moreover, since uprobe_register/unregister() are
> > exported to modules, this enablement would better be implemented
> > inside uprobe so that all uprobe users benefit from this.
> 
> 
> Sorry, I think you got me wrong. I meant, I don't need to expose all core
> uprobe _static_ functions to tarce_uprobe.
> 
> Now, about kernel modules, basically uprobe_register() takes three parameters:
>     inode, offset and consumer.
> There is no scope for the reference counter there. So I've created one more
> function: uprobe_register_refctr(). But this function is not exported as ABI
> to kernel module. i.e. kernel modules still does not have a way to create
> uprobe with reference counter.

OK, I got it from your patches. :)

> So for kernel modules,
> 
> is it fine to change current ABI from
>     uprobe_register(inode, offset, consumer)
> to
>     uprobe_register(inode, offset, ref_ctr_offset, consumer)
> 
> Or I should introduce new function for this:
>     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
> and export it to kernel module?
> 
> What's your suggestion?

The latter is fine with me. Since the refctr is introduced entirely in userspace
(for SDT) and free-address userspace probing doesn't need a refctr, maybe
we should keep those separated.

> [...]
> 
> >> 
> >>  - This patches still has one issue. If there are multiple instances of
> >>    same application running and user wants to trace any particular
> >>    instance, trace_uprobe is updating reference counter in all instances.
> >>    This is not a problem on user side because instruction is not replaced
> >>    with trap/int3 and thus user will only see samples from his interested
> >>    process. But still this is more of a correctness issue. I'm working on
> >>    a fix for this.
> > 
> > Hmm, it sounds like not a correctness issue, but there maybe a performace
> > tradeoff. Tracing one particulear instance, other instances also will get
> > a performance loss
> 
> 
> Right, but it's temporary. I mean, putting everything in to this series was making
> it complex. So this is the initial one and I'll send followup patches which will
> optimize the reference counter update.

Ah, OK. If you have prepared the follow-up patches, could you also send them
with this series? Perhaps it will help us understand the issue more clearly.

> 
> > (Only if the parameter preparation block is heavy,
> > because the heaviest part of probing - trap/int3 and recording data - isn't
> > executed.)
> >> BTW, why this happens? I thought the refcounter part is just a data which
> > is not shared among processes...
> > 
> 
> This happens because we are not calling consumer_filter function. consumer_filter
> is the one who decides whether to change the instruction to trap or not in a given
> mm. We also need to call it before updating reference counter.

Hmm, it sounds simple... maybe we can increment refctr in install_breakpoint/
remove_breakpoint?

Thank you,

> 
> Let me know your thoughts.
> 
> Thanks,
> Ravi
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-08  5:14       ` Masami Hiramatsu
@ 2018-06-08  6:34         ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-08  6:34 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: oleg, srikar, rostedt, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, Ravi Bangoria

Hi Masami,

>> So for kernel modules,
>>
>> is it fine to change current ABI from
>>     uprobe_register(inode, offset, consumer)
>> to
>>     uprobe_register(inode, offset, ref_ctr_offset, consumer)
>>
>> Or I should introduce new function for this:
>>     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
>> and export it to kernel module?
>>
>> What's your suggestion?
> 
> Latter is fine to me. Since the refctr is introduced totally in userspace
> (for SDT) and free-address userspace probing doesn't need refctr, maybe
> we should keep those separated.

Sure.

> 
>> [...]
>>
>>>>
>>>>  - This patches still has one issue. If there are multiple instances of
>>>>    same application running and user wants to trace any particular
>>>>    instance, trace_uprobe is updating reference counter in all instances.
>>>>    This is not a problem on user side because instruction is not replaced
>>>>    with trap/int3 and thus user will only see samples from his interested
>>>>    process. But still this is more of a correctness issue. I'm working on
>>>>    a fix for this.
>>>
>>> Hmm, it sounds like not a correctness issue, but there maybe a performace
>>> tradeoff. Tracing one particulear instance, other instances also will get
>>> a performance loss
>>
>>
>> Right, but it's temporary. I mean, putting everything in to this series was making
>> it complex. So this is the initial one and I'll send followup patches which will
>> optimize the reference counter update.
> 
> Ah, OK. If you have prepared the followup patches, could you also send it
> with this series? Perhups it will help us to understand the issue clearer.

Not ready as such... it's making the code a bit complicated. I'm working on it
and will send the next series with those patches included.

> 
>>
>>> (Only if the parameter preparation block is heavy,
>>> because the heaviest part of probing - trap/int3 and recording data - isn't
>>> executed.)
>>>> BTW, why this happens? I thought the refcounter part is just a data which
>>> is not shared among processes...
>>>
>>
>> This happens because we are not calling consumer_filter function. consumer_filter
>> is the one who decides whether to change the instruction to trap or not in a given
>> mm. We also need to call it before updating reference counter.
> 
> Hmm, it sounds simple... maybe we can increment refctr in install_breakpoint/
> remove_breakpoint?

Not really. It would be simpler if I could put it inside install_breakpoint(), but
consider the mmap() case. The probed instruction resides in the text section whereas
the reference counter resides in the data section. These sections get mapped using
separate mmap() calls. So, when the process mmaps the text section we will change the
instruction, but the section holding the reference counter may not have been mapped
into virtual memory yet. If so, we will fail to update the reference counter.
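
To make the layout concrete, the dtrace(1)-generated header ends up emitting
something along these lines for a marker with a semaphore (an illustrative
sketch, not the exact sdt.h expansion; attribute details may differ):

    /* the nop for the marker lands in .text, while the semaphore is an
     * ordinary counter placed in a separate ".probes" data section */
    __extension__ unsigned short tick_loop2_semaphore
            __attribute__ ((unused)) __attribute__ ((section (".probes")));
    #define TICK_LOOP2_ENABLED() __builtin_expect (tick_loop2_semaphore, 0)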

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-08  6:34         ` Ravi Bangoria
@ 2018-06-08 15:45           ` Masami Hiramatsu
  -1 siblings, 0 replies; 42+ messages in thread
From: Masami Hiramatsu @ 2018-06-08 15:45 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, srikar, rostedt, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao

On Fri, 8 Jun 2018 12:04:25 +0530
Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:

> Hi Masami,
> 
> >> So for kernel modules,
> >>
> >> is it fine to change current ABI from
> >>     uprobe_register(inode, offset, consumer)
> >> to
> >>     uprobe_register(inode, offset, ref_ctr_offset, consumer)
> >>
> >> Or I should introduce new function for this:
> >>     uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)
> >> and export it to kernel module?
> >>
> >> What's your suggestion?
> > 
> > Latter is fine to me. Since the refctr is introduced totally in userspace
> > (for SDT) and free-address userspace probing doesn't need refctr, maybe
> > we should keep those separated.
> 
> Sure.
> 
> > 
> >> [...]
> >>
> >>>>
> >>>>  - This patches still has one issue. If there are multiple instances of
> >>>>    same application running and user wants to trace any particular
> >>>>    instance, trace_uprobe is updating reference counter in all instances.
> >>>>    This is not a problem on user side because instruction is not replaced
> >>>>    with trap/int3 and thus user will only see samples from his interested
> >>>>    process. But still this is more of a correctness issue. I'm working on
> >>>>    a fix for this.
> >>>
> >>> Hmm, it sounds like not a correctness issue, but there maybe a performace
> >>> tradeoff. Tracing one particulear instance, other instances also will get
> >>> a performance loss
> >>
> >>
> >> Right, but it's temporary. I mean, putting everything in to this series was making
> >> it complex. So this is the initial one and I'll send followup patches which will
> >> optimize the reference counter update.
> > 
> > Ah, OK. If you have prepared the followup patches, could you also send it
> > with this series? Perhups it will help us to understand the issue clearer.
> 
> Not ready as such.. it's making the code bit complicated. I'm working on it
> and will send the next series with those patches included.

OK, thanks!

> >>> (Only if the parameter preparation block is heavy,
> >>> because the heaviest part of probing - trap/int3 and recording data - isn't
> >>> executed.)
> >>>> BTW, why this happens? I thought the refcounter part is just a data which
> >>> is not shared among processes...
> >>>
> >>
> >> This happens because we are not calling consumer_filter function. consumer_filter
> >> is the one who decides whether to change the instruction to trap or not in a given
> >> mm. We also need to call it before updating reference counter.
> > 
> > Hmm, it sounds simple... maybe we can increment refctr in install_breakpoint/
> > remove_breakpoint?
> 
> Not really, it would be simpler if I can put it inside install_breakpoint().
> Consider an mmap() case. Probed instruction resides in the text section whereas
> reference counter resides in the data section. These sections gets mapped using
> separate mmap() calls. So, when process mmaps the text section we will change the
> instruction, but section holding the reference counter may not have been mapped
> yet in the virtual memory. If so, we will fail to update the reference counter.

Got it.
In that case, maybe we can hook the point where the target page gets mmapped and do
install_breakpoint() there. Since the instruction is protected by a refctr, the
program doesn't reach the tracepoint unless the page holding the refctr is mmapped. Is that right?

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-06  8:33 ` Ravi Bangoria
@ 2018-06-08 16:36   ` Oleg Nesterov
  -1 siblings, 0 replies; 42+ messages in thread
From: Oleg Nesterov @ 2018-06-08 16:36 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao

Hello,

I am travelling till the end of next week, so I can't read this version
until I return. Just one question,

On 06/06, Ravi Bangoria wrote:
>
>  1. One of the major reason was the deadlock between uprobe_lock and
>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix

Could you remind me what exactly was wrong?

I can't find your previous email about this problem, but IIRC you didn't
explain the deadlock in detail, just copied some traces from lockdep...

Oleg.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-08 16:36   ` Oleg Nesterov
@ 2018-06-11  4:13     ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-11  4:13 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: srikar, rostedt, mhiramat, peterz, mingo, acme,
	alexander.shishkin, jolsa, namhyung, linux-kernel, corbet,
	linux-doc, ananth, alexis.berlemont, naveen.n.rao, Ravi Bangoria

Hi Oleg,

On 06/08/2018 10:06 PM, Oleg Nesterov wrote:
> Hello,
> 
> I am travelling till the end of the next week, can't read this version
> until I return. Just one question,
> 
> On 06/06, Ravi Bangoria wrote:
>>
>>  1. One of the major reason was the deadlock between uprobe_lock and
>>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
> 
> Could you remind what exactly was wrong?
> 
> I can't find your previous email about this problem, but iirc you didn't
> explain the deadlock in details, just copied some traces from lockdep...

The deadlock is between mm->mmap_sem and uprobe_lock.

Some existing code paths take these locks in the following order:
	uprobe_lock
	  event_mutex
	    uprobe->register_rwsem
	      dup_mmap_sem
		mm->mmap_sem

I've introduced a new function, trace_uprobe_mmap(), which gets called
from mmap_region() / vma_adjust() with mm->mmap_sem already acquired,
and it has to take uprobe_lock to loop over all trace_uprobes, i.e. the
sequence is:
	mm->mmap_sem
	  uprobe_lock

This is difficult to resolve because trace_uprobe_mmap() has no control
over mm->mmap_sem.
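
As a userspace analogy only (plain pthread mutexes standing in for the kernel
locks, with the intermediate event_mutex / register_rwsem / dup_mmap_sem links
collapsed), the inverted acquisition order can be sketched like this; build with
-pthread, and note that the toy program deliberately hangs:

    /* abba_demo.c - toy illustration of the inverted lock order; this is
     * NOT kernel code.  Thread A mimics the perf_event_open path, thread B
     * the mmap path.  Each thread ends up holding one lock while waiting
     * forever for the other. */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t mmap_sem    = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t uprobe_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *register_path(void *arg)
    {
        pthread_mutex_lock(&uprobe_lock);   /* probe_event_enable() side    */
        sleep(1);                           /* widen the race window        */
        pthread_mutex_lock(&mmap_sem);      /* register_for_each_vma() side */
        pthread_mutex_unlock(&mmap_sem);
        pthread_mutex_unlock(&uprobe_lock);
        return NULL;
    }

    static void *mmap_path(void *arg)
    {
        pthread_mutex_lock(&mmap_sem);      /* taken by mmap_region()       */
        sleep(1);
        pthread_mutex_lock(&uprobe_lock);   /* trace_uprobe_mmap() needs it */
        pthread_mutex_unlock(&uprobe_lock);
        pthread_mutex_unlock(&mmap_sem);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, register_path, NULL);
        pthread_create(&b, NULL, mmap_path, NULL);
        pthread_join(a, NULL);              /* never returns: ABBA deadlock */
        pthread_join(b, NULL);
        puts("unreachable");
        return 0;
    }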

Detailed trace from lockdep:

[  499.258006] ======================================================
[  499.258205] WARNING: possible circular locking dependency detected
[  499.258409] 4.17.0-rc3+ #76 Not tainted
[  499.258528] ------------------------------------------------------
[  499.258731] perf/6744 is trying to acquire lock:
[  499.258895] 00000000e4895f49 (uprobe_lock){+.+.}, at: trace_uprobe_mmap+0x78/0x130
[  499.259147]
[  499.259147] but task is already holding lock:
[  499.259349] 000000009ec93a76 (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xe0/0x160
[  499.259597]
[  499.259597] which lock already depends on the new lock.
[  499.259597]
[  499.259848]
[  499.259848] the existing dependency chain (in reverse order) is:
[  499.260086]
[  499.260086] -> #4 (&mm->mmap_sem){++++}:
[  499.260277]        __lock_acquire+0x53c/0x910
[  499.260442]        lock_acquire+0xf4/0x2f0
[  499.260595]        down_write_killable+0x6c/0x150
[  499.260764]        copy_process.isra.34.part.35+0x1594/0x1be0
[  499.260967]        _do_fork+0xf8/0x910
[  499.261090]        ppc_clone+0x8/0xc
[  499.261209]
[  499.261209] -> #3 (&dup_mmap_sem){++++}:
[  499.261378]        __lock_acquire+0x53c/0x910
[  499.261540]        lock_acquire+0xf4/0x2f0
[  499.261669]        down_write+0x6c/0x110
[  499.261793]        percpu_down_write+0x48/0x140
[  499.261954]        register_for_each_vma+0x6c/0x2a0
[  499.262116]        uprobe_register+0x230/0x320
[  499.262277]        probe_event_enable+0x1cc/0x540
[  499.262435]        perf_trace_event_init+0x1e0/0x350
[  499.262587]        perf_trace_init+0xb0/0x110
[  499.262750]        perf_tp_event_init+0x38/0x90
[  499.262910]        perf_try_init_event+0x10c/0x150
[  499.263075]        perf_event_alloc+0xbb0/0xf10
[  499.263235]        sys_perf_event_open+0x2a8/0xdd0
[  499.263396]        system_call+0x58/0x6c
[  499.263516]
[  499.263516] -> #2 (&uprobe->register_rwsem){++++}:
[  499.263723]        __lock_acquire+0x53c/0x910
[  499.263884]        lock_acquire+0xf4/0x2f0
[  499.264002]        down_write+0x6c/0x110
[  499.264118]        uprobe_register+0x1ec/0x320
[  499.264283]        probe_event_enable+0x1cc/0x540
[  499.264442]        perf_trace_event_init+0x1e0/0x350
[  499.264603]        perf_trace_init+0xb0/0x110
[  499.264766]        perf_tp_event_init+0x38/0x90
[  499.264930]        perf_try_init_event+0x10c/0x150
[  499.265092]        perf_event_alloc+0xbb0/0xf10
[  499.265261]        sys_perf_event_open+0x2a8/0xdd0
[  499.265424]        system_call+0x58/0x6c
[  499.265542]
[  499.265542] -> #1 (event_mutex){+.+.}:
[  499.265738]        __lock_acquire+0x53c/0x910
[  499.265896]        lock_acquire+0xf4/0x2f0
[  499.266019]        __mutex_lock+0xa0/0xab0
[  499.266142]        trace_add_event_call+0x44/0x100
[  499.266310]        create_trace_uprobe+0x4a0/0x8b0
[  499.266474]        trace_run_command+0xa4/0xc0
[  499.266631]        trace_parse_run_command+0xe4/0x200
[  499.266799]        probes_write+0x20/0x40
[  499.266922]        __vfs_write+0x6c/0x240
[  499.267041]        vfs_write+0xd0/0x240
[  499.267166]        ksys_write+0x6c/0x110
[  499.267295]        system_call+0x58/0x6c
[  499.267413]
[  499.267413] -> #0 (uprobe_lock){+.+.}:
[  499.267591]        validate_chain.isra.34+0xbd0/0x1000
[  499.267747]        __lock_acquire+0x53c/0x910
[  499.267917]        lock_acquire+0xf4/0x2f0
[  499.268048]        __mutex_lock+0xa0/0xab0
[  499.268170]        trace_uprobe_mmap+0x78/0x130
[  499.268335]        uprobe_mmap+0x80/0x3b0
[  499.268464]        mmap_region+0x290/0x660
[  499.268590]        do_mmap+0x40c/0x500
[  499.268718]        vm_mmap_pgoff+0x114/0x160
[  499.268870]        ksys_mmap_pgoff+0xe8/0x2e0
[  499.269034]        sys_mmap+0x84/0xf0
[  499.269161]        system_call+0x58/0x6c
[  499.269279]
[  499.269279] other info that might help us debug this:
[  499.269279]
[  499.269524] Chain exists of:
[  499.269524]   uprobe_lock --> &dup_mmap_sem --> &mm->mmap_sem
[  499.269524]
[  499.269856]  Possible unsafe locking scenario:
[  499.269856]
[  499.270058]        CPU0                    CPU1
[  499.270223]        ----                    ----
[  499.270384]   lock(&mm->mmap_sem);
[  499.270514]                                lock(&dup_mmap_sem);
[  499.270711]                                lock(&mm->mmap_sem);
[  499.270923]   lock(uprobe_lock);
[  499.271046]
[  499.271046]  *** DEADLOCK ***
[  499.271046]
[  499.271256] 1 lock held by perf/6744:
[  499.271377]  #0: 000000009ec93a76 (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xe0/0x160
[  499.271628]
[  499.271628] stack backtrace:
[  499.271797] CPU: 25 PID: 6744 Comm: perf Not tainted 4.17.0-rc3+ #76
[  499.272003] Call Trace:
[  499.272094] [c0000000e32d74a0] [c000000000b00174] dump_stack+0xe8/0x164 (unreliable)
[  499.272349] [c0000000e32d74f0] [c0000000001a905c] print_circular_bug.isra.30+0x354/0x388
[  499.272590] [c0000000e32d7590] [c0000000001a3050] check_prev_add.constprop.38+0x8f0/0x910
[  499.272828] [c0000000e32d7690] [c0000000001a3c40] validate_chain.isra.34+0xbd0/0x1000
[  499.273070] [c0000000e32d7780] [c0000000001a57cc] __lock_acquire+0x53c/0x910
[  499.273311] [c0000000e32d7860] [c0000000001a65b4] lock_acquire+0xf4/0x2f0
[  499.273510] [c0000000e32d7930] [c000000000b1d1f0] __mutex_lock+0xa0/0xab0
[  499.273717] [c0000000e32d7a40] [c0000000002b01b8] trace_uprobe_mmap+0x78/0x130
[  499.273952] [c0000000e32d7a90] [c0000000002d7070] uprobe_mmap+0x80/0x3b0
[  499.274153] [c0000000e32d7b20] [c0000000003550a0] mmap_region+0x290/0x660
[  499.274353] [c0000000e32d7c00] [c00000000035587c] do_mmap+0x40c/0x500
[  499.274560] [c0000000e32d7c80] [c00000000031ebc4] vm_mmap_pgoff+0x114/0x160
[  499.274763] [c0000000e32d7d60] [c000000000352818] ksys_mmap_pgoff+0xe8/0x2e0
[  499.275013] [c0000000e32d7de0] [c000000000016864] sys_mmap+0x84/0xf0
[  499.275207] [c0000000e32d7e30] [c00000000000b404] system_call+0x58/0x6c


( Reference: https://lkml.org/lkml/2018/5/25/111 )

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-08 15:45           ` Masami Hiramatsu
@ 2018-06-11  4:31             ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-11  4:31 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: oleg, srikar, rostedt, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, Ravi Bangoria

Hi Masami,

>>> Hmm, it sounds simple... maybe we can increment refctr in install_breakpoint/
>>> remove_breakpoint?
>>
>> Not really, it would be simpler if I can put it inside install_breakpoint().
>> Consider an mmap() case. Probed instruction resides in the text section whereas
>> reference counter resides in the data section. These sections gets mapped using
>> separate mmap() calls. So, when process mmaps the text section we will change the
>> instruction, but section holding the reference counter may not have been mapped
>> yet in the virtual memory. If so, we will fail to update the reference counter.
> 
> Got it. 
> In such case, maybe we can hook the target page mmapped and do install_breakpoint()
> at that point. Since the instruction is protected by a refctr, unless mmap the
> page on where the refctr is, the program doesn't reach the tracepoint. Is that right?
> 

You mean, when mmap(text) happens, save the target page somewhere, and when
mmap(data) happens, update both the instruction and the ref_ctr?

This sounds feasible. Let me think about it.

Thanks for the suggestion,
Ravi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-11  4:31             ` Ravi Bangoria
@ 2018-06-16 13:50               ` Masami Hiramatsu
  -1 siblings, 0 replies; 42+ messages in thread
From: Masami Hiramatsu @ 2018-06-16 13:50 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, srikar, rostedt, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao

Hi Ravi,

Sorry for the late reply.

On Mon, 11 Jun 2018 10:01:58 +0530
Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:

> Hi Masami,
> 
> >>> Hmm, it sounds simple... maybe we can increment refctr in install_breakpoint/
> >>> remove_breakpoint?
> >>
> >> Not really, it would be simpler if I can put it inside install_breakpoint().
> >> Consider an mmap() case. Probed instruction resides in the text section whereas
> >> reference counter resides in the data section. These sections gets mapped using
> >> separate mmap() calls. So, when process mmaps the text section we will change the
> >> instruction, but section holding the reference counter may not have been mapped
> >> yet in the virtual memory. If so, we will fail to update the reference counter.
> > 
> > Got it. 
> > In such case, maybe we can hook the target page mmapped and do install_breakpoint()
> > at that point. Since the instruction is protected by a refctr, unless mmap the
> > page on where the refctr is, the program doesn't reach the tracepoint. Is that right?
> > 
> 
> You mean, when mmap(text) happens, save the target page somewhere and when
> mmap(data) happens, update both instruction and ref_ctr?

Yes. I think you can just clone the target (text) page but not install the
breakpoint, and when mmap(data) happens, update both.

> 
> This sounds feasible. Let me think on it.

Thanks!

> 
> Thanks for suggestion,
> Ravi
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-16 13:50               ` Masami Hiramatsu
@ 2018-06-16 15:07                 ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-16 15:07 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: oleg, srikar, rostedt, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, Ravi Bangoria

Hi Masami,

On 06/16/2018 07:20 PM, Masami Hiramatsu wrote:
> Hi Ravi,
> 
> Sorry for replying later.

No issues :)

> 
> On Mon, 11 Jun 2018 10:01:58 +0530
> Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:
> 
>> Hi Masami,
>>
>>>>> Hmm, it sounds simple... maybe we can increment refctr in install_breakpoint/
>>>>> remove_breakpoint?
>>>>
>>>> Not really, it would be simpler if I can put it inside install_breakpoint().
>>>> Consider an mmap() case. Probed instruction resides in the text section whereas
>>>> reference counter resides in the data section. These sections gets mapped using
>>>> separate mmap() calls. So, when process mmaps the text section we will change the
>>>> instruction, but section holding the reference counter may not have been mapped
>>>> yet in the virtual memory. If so, we will fail to update the reference counter.
>>>
>>> Got it. 
>>> In such case, maybe we can hook the target page mmapped and do install_breakpoint()
>>> at that point. Since the instruction is protected by a refctr, unless mmap the
>>> page on where the refctr is, the program doesn't reach the tracepoint. Is that right?
>>>
>>
>> You mean, when mmap(text) happens, save the target page somewhere and when
>> mmap(data) happens, update both instruction and ref_ctr?
> 
> Yes. I think you can just clone the target(text) page but not install
> breakpoint, and if mmap(data) happens, update both.

I'm preparing a prototype along these lines. The only difference in my approach is
that I'm not delaying the instruction update, i.e. the instruction update happens as
it does today and the uprobe is just marked as delayed. Whenever the vma holding the
reference counter gets mapped, the reference counter is updated.
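
Very roughly, and with made-up names (the struct, on_mmap_text() and on_mmap_data()
below are illustrations, not the actual patch code), the idea can be modelled like
this:

    /* delayed_refctr_sketch.c - toy userspace model of the idea above.
     * All names are hypothetical; the real work happens in the uprobes core
     * against the target mm, not in a struct like this. */
    #include <stdbool.h>
    #include <stdio.h>

    struct uprobe_model {
        bool insn_installed;        /* breakpoint already placed in text    */
        bool ref_ctr_pending;       /* semaphore update delayed             */
        unsigned short ref_ctr;     /* stands in for the mapped semaphore   */
    };

    /* text mapping appears: install the breakpoint as usual; if the data
     * page holding the semaphore is not mapped yet, remember the update */
    static void on_mmap_text(struct uprobe_model *u, bool data_mapped)
    {
        u->insn_installed = true;
        if (data_mapped)
            u->ref_ctr++;
        else
            u->ref_ctr_pending = true;
    }

    /* data mapping appears later: flush the delayed semaphore update */
    static void on_mmap_data(struct uprobe_model *u)
    {
        if (u->ref_ctr_pending) {
            u->ref_ctr++;
            u->ref_ctr_pending = false;
        }
    }

    int main(void)
    {
        struct uprobe_model u = { 0 };

        on_mmap_text(&u, false);    /* text mapped first, semaphore page absent  */
        on_mmap_data(&u);           /* data mapped later, delayed update applied */
        printf("insn=%d ref_ctr=%d pending=%d\n",
               u.insn_installed, u.ref_ctr, u.ref_ctr_pending);
        return 0;
    }

The real series has to do this per-mm and under the proper locks; the toy model
only shows the ordering of events.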

Will post the series soon.

Thanks for the reply,
Ravi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-06  8:33 ` Ravi Bangoria
@ 2018-06-20 16:37   ` Steven Rostedt
  -1 siblings, 0 replies; 42+ messages in thread
From: Steven Rostedt @ 2018-06-20 16:37 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: oleg, srikar, mhiramat, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao

On Wed,  6 Jun 2018 14:03:37 +0530
Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:

> Why RFC again:
> 
> This series is different from earlier versions[1]. Earlier series
> implemented this feature in trace_uprobe while this has implemented
> the logic in core uprobe. Few reasons for this:
>  1. One of the major reason was the deadlock between uprobe_lock and
>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
>  because mm->mmap is not in control of trace_uprobe_mmap() and it has
>  to take uprobe_lock to loop over trace_uprobe list. More details can
>  be found at[2]. With this new approach, there are no deadlocks found
>  so far.
>  2. Many of the core uprobe function and data-structures needs to be
>  exported to make earlier implementation simple. With this new approach,
>  reference counter logic is been implemented in core uprobe and thus
>  no need to export anything.


From a quick scan of the patches, I don't see anything controversial in
them. Unless others have any qualms, I say repost as non-RFC and we can
start doing a more thorough review.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
  2018-06-20 16:37   ` Steven Rostedt
@ 2018-06-21  2:35     ` Ravi Bangoria
  -1 siblings, 0 replies; 42+ messages in thread
From: Ravi Bangoria @ 2018-06-21  2:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: oleg, srikar, mhiramat, peterz, mingo, acme, alexander.shishkin,
	jolsa, namhyung, linux-kernel, corbet, linux-doc, ananth,
	alexis.berlemont, naveen.n.rao, Ravi Bangoria



On 06/20/2018 10:07 PM, Steven Rostedt wrote:
> On Wed,  6 Jun 2018 14:03:37 +0530
> Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:
> 
>> Why RFC again:
>>
>> This series is different from earlier versions[1]. Earlier series
>> implemented this feature in trace_uprobe while this has implemented
>> the logic in core uprobe. Few reasons for this:
>>  1. One of the major reason was the deadlock between uprobe_lock and
>>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
>>  because mm->mmap is not in control of trace_uprobe_mmap() and it has
>>  to take uprobe_lock to loop over trace_uprobe list. More details can
>>  be found at[2]. With this new approach, there are no deadlocks found
>>  so far.
>>  2. Many of the core uprobe function and data-structures needs to be
>>  exported to make earlier implementation simple. With this new approach,
>>  reference counter logic is been implemented in core uprobe and thus
>>  no need to export anything.
> 
> 
> A quick scan of the patches, I don't see anything controversial with
> them. Unless others have any qualms about it, I say repost as non RFC,
> and we can start doing a more thorough review.

Yes, I've posted it: https://lkml.org/lkml/2018/6/19/1324

I've copied you as well :) Please have a look.

Thanks,
Ravi


^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2018-06-21  2:36 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-06  8:33 [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
2018-06-06  8:33 ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 1/7] Uprobes: Simplify uprobe_register() body Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 2/7] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 3/7] Uprobes/sdt: Fix multiple update of same reference counter Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 4/7] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 5/7] Uprobes/sdt: " Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 6/7] Uprobes/sdt: Document about reference counter Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 7/7] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:35 ` [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
2018-06-06  8:35   ` Ravi Bangoria
2018-06-08  1:10 ` Masami Hiramatsu
2018-06-08  1:10   ` Masami Hiramatsu
2018-06-08  2:29   ` Ravi Bangoria
2018-06-08  2:29     ` Ravi Bangoria
2018-06-08  5:14     ` Masami Hiramatsu
2018-06-08  5:14       ` Masami Hiramatsu
2018-06-08  6:34       ` Ravi Bangoria
2018-06-08  6:34         ` Ravi Bangoria
2018-06-08 15:45         ` Masami Hiramatsu
2018-06-08 15:45           ` Masami Hiramatsu
2018-06-11  4:31           ` Ravi Bangoria
2018-06-11  4:31             ` Ravi Bangoria
2018-06-16 13:50             ` Masami Hiramatsu
2018-06-16 13:50               ` Masami Hiramatsu
2018-06-16 15:07               ` Ravi Bangoria
2018-06-16 15:07                 ` Ravi Bangoria
2018-06-08 16:36 ` Oleg Nesterov
2018-06-08 16:36   ` Oleg Nesterov
2018-06-11  4:13   ` Ravi Bangoria
2018-06-11  4:13     ` Ravi Bangoria
2018-06-20 16:37 ` Steven Rostedt
2018-06-20 16:37   ` Steven Rostedt
2018-06-21  2:35   ` Ravi Bangoria
2018-06-21  2:35     ` Ravi Bangoria
