All of lore.kernel.org
 help / color / mirror / Atom feed
From: Masami Hiramatsu <mhiramat@kernel.org>
To: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: oleg@redhat.com, srikar@linux.vnet.ibm.com, rostedt@goodmis.org,
	peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	alexander.shishkin@linux.intel.com, jolsa@redhat.com,
	namhyung@kernel.org, linux-kernel@vger.kernel.org,
	corbet@lwn.net, linux-doc@vger.kernel.org,
	ananth@linux.vnet.ibm.com, alexis.berlemont@gmail.com,
	naveen.n.rao@linux.vnet.ibm.com
Subject: Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
Date: Fri, 8 Jun 2018 10:10:23 +0900	[thread overview]
Message-ID: <20180608101023.cbf20db485026213d490cb7c@kernel.org> (raw)
In-Reply-To: <20180606083344.31320-1-ravi.bangoria@linux.ibm.com>

On Wed,  6 Jun 2018 14:03:37 +0530
Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:

> Why RFC again:
> 
> This series is different from earlier versions[1]. Earlier series
> implemented this feature in trace_uprobe while this has implemented
> the logic in core uprobe. Few reasons for this:
>  1. One of the major reason was the deadlock between uprobe_lock and
>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
>  because mm->mmap is not in control of trace_uprobe_mmap() and it has
>  to take uprobe_lock to loop over trace_uprobe list. More details can
>  be found at[2]. With this new approach, there are no deadlocks found
>  so far.
>  2. Many of the core uprobe function and data-structures needs to be
>  exported to make earlier implementation simple. With this new approach,
>  reference counter logic is been implemented in core uprobe and thus
>  no need to export anything.

I agree with you. Moreover, since uprobe_register/unregister() are
exported to modules, this enablement would better be implemented
inside uprobe so that all uprobe users benefit from this.

> 
> Description:
> 
> Userspace Statically Defined Tracepoints[3] are dtrace style markers
> inside userspace applications. Applications like PostgreSQL, MySQL,
> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> have these markers embedded in them. These markers are added by developer
> at important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. In case the overhead is more, execution of it can be
> omitted by runtime if() condition when no one is tracing on the marker:
> 
>     if (reference_counter > 0) {
>         Execute marker instructions;
>     }
> 
> Default value of reference counter is 0. Tracer has to increment the 
> reference counter before tracing on a marker and decrement it when
> done with the tracing.
> 
> Currently, perf tool has limited supports for SDT markers. I.e. it
> can not trace markers surrounded by reference counter. Also, it's
> not easy to add reference counter logic in userspace tool like perf,
> so basic idea for this patchset is to add reference counter logic in
> the trace_uprobe infrastructure. Ex,[4]
> 
>   # cat tick.c
>     ... 
>     for (i = 0; i < 100; i++) {
> 	DTRACE_PROBE1(tick, loop1, i);
>         if (TICK_LOOP2_ENABLED()) {
>             DTRACE_PROBE1(tick, loop2, i); 
>         }
>         printf("hi: %d\n", i); 
>         sleep(1);
>     }   
>     ... 
> 
> Here tick:loop1 is marker without reference counter where as tick:loop2
> is surrounded by reference counter condition.
> 
>   # perf buildid-cache --add /tmp/tick
>   # perf probe sdt_tick:loop1
>   # perf probe sdt_tick:loop2
> 
>   # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
>   hi: 0
>   hi: 1
>   hi: 2
>   ^C
>   Performance counter stats for '/tmp/tick':
>              3      sdt_tick:loop1
>              0      sdt_tick:loop2
>      2.747086086 seconds time elapsed
> 
> Perf failed to record data for tick:loop2. Same experiment with this
> patch series:
> 
>   # ./perf buildid-cache --add /tmp/tick
>   # ./perf probe sdt_tick:loop2
>   # ./perf stat -e sdt_tick:loop2 /tmp/tick
>     hi: 0
>     hi: 1
>     hi: 2
>     ^C  
>      Performance counter stats for '/tmp/tick':
>                  3      sdt_tick:loop2
>        2.561851452 seconds time elapsed
> 
> 
> Note:
>  - 'reference counter' is called as 'semaphore' in original Dtrace
>    (or Systemtap, bcc and even in ELF) documentation and code. But the 
>    term 'semaphore' is misleading in this context. This is just a counter
>    used to hold number of tracers tracing on a marker. This is not haeally
>    used for any synchronization. So we are referring it as 'reference
>    counter' in kernel / perf code.

+1. I like this "reference counter" term :)

>  - This patches still has one issue. If there are multiple instances of
>    same application running and user wants to trace any particular
>    instance, trace_uprobe is updating reference counter in all instances.
>    This is not a problem on user side because instruction is not replaced
>    with trap/int3 and thus user will only see samples from his interested
>    process. But still this is more of a correctness issue. I'm working on
>    a fix for this.

Hmm, it sounds like not a correctness issue, but there maybe a performace
tradeoff. Tracing one particulear instance, other instances also will get
a performance loss (Only if the parameter preparation block is heavy,
because the heaviest part of probing - trap/int3 and recording data - isn't
executed.)

BTW, why this happens? I thought the refcounter part is just a data which
is not shared among processes...

Thank you,


> 
> [1] https://lkml.org/lkml/2018/4/17/23
> [2] https://lkml.org/lkml/2018/5/25/111
> [3] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> [4] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506
> 
> Ravi Bangoria (7):
>   Uprobes: Simplify uprobe_register() body
>   Uprobes: Support SDT markers having reference count (semaphore)
>   Uprobes/sdt: Fix multiple update of same reference counter
>   trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
>   Uprobes/sdt: Prevent multiple reference counter for same uprobe
>   Uprobes/sdt: Document about reference counter
>   perf probe: Support SDT markers having reference counter (semaphore)
> 
>  Documentation/trace/uprobetracer.rst |  16 +-
>  include/linux/uprobes.h              |   5 +
>  kernel/events/uprobes.c              | 502 +++++++++++++++++++++++++++++++----
>  kernel/trace/trace.c                 |   2 +-
>  kernel/trace/trace_uprobe.c          |  74 +++++-
>  tools/perf/util/probe-event.c        |  39 ++-
>  tools/perf/util/probe-event.h        |   1 +
>  tools/perf/util/probe-file.c         |  34 ++-
>  tools/perf/util/probe-file.h         |   1 +
>  tools/perf/util/symbol-elf.c         |  46 +++-
>  tools/perf/util/symbol.h             |   7 +
>  11 files changed, 643 insertions(+), 84 deletions(-)
> 
> -- 
> 1.8.3.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

WARNING: multiple messages have this Message-ID (diff)
From: Masami Hiramatsu <mhiramat@kernel.org>
To: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: oleg@redhat.com, srikar@linux.vnet.ibm.com, rostedt@goodmis.org,
	peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	alexander.shishkin@linux.intel.com, jolsa@redhat.com,
	namhyung@kernel.org, linux-kernel@vger.kernel.org,
	corbet@lwn.net, linux-doc@vger.kernel.org,
	ananth@linux.vnet.ibm.com, alexis.berlemont@gmail.com,
	naveen.n.rao@linux.vnet.ibm.com
Subject: Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
Date: Fri, 8 Jun 2018 10:10:23 +0900	[thread overview]
Message-ID: <20180608101023.cbf20db485026213d490cb7c@kernel.org> (raw)
In-Reply-To: <20180606083344.31320-1-ravi.bangoria@linux.ibm.com>

On Wed,  6 Jun 2018 14:03:37 +0530
Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote:

> Why RFC again:
> 
> This series is different from earlier versions[1]. Earlier series
> implemented this feature in trace_uprobe while this has implemented
> the logic in core uprobe. Few reasons for this:
>  1. One of the major reason was the deadlock between uprobe_lock and
>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
>  because mm->mmap is not in control of trace_uprobe_mmap() and it has
>  to take uprobe_lock to loop over trace_uprobe list. More details can
>  be found at[2]. With this new approach, there are no deadlocks found
>  so far.
>  2. Many of the core uprobe function and data-structures needs to be
>  exported to make earlier implementation simple. With this new approach,
>  reference counter logic is been implemented in core uprobe and thus
>  no need to export anything.

I agree with you. Moreover, since uprobe_register/unregister() are
exported to modules, this enablement would better be implemented
inside uprobe so that all uprobe users benefit from this.

> 
> Description:
> 
> Userspace Statically Defined Tracepoints[3] are dtrace style markers
> inside userspace applications. Applications like PostgreSQL, MySQL,
> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> have these markers embedded in them. These markers are added by developer
> at important places in the code. Each marker source expands to a single
> nop instruction in the compiled code but there may be additional
> overhead for computing the marker arguments which expands to couple of
> instructions. In case the overhead is more, execution of it can be
> omitted by runtime if() condition when no one is tracing on the marker:
> 
>     if (reference_counter > 0) {
>         Execute marker instructions;
>     }
> 
> Default value of reference counter is 0. Tracer has to increment the 
> reference counter before tracing on a marker and decrement it when
> done with the tracing.
> 
> Currently, perf tool has limited supports for SDT markers. I.e. it
> can not trace markers surrounded by reference counter. Also, it's
> not easy to add reference counter logic in userspace tool like perf,
> so basic idea for this patchset is to add reference counter logic in
> the trace_uprobe infrastructure. Ex,[4]
> 
>   # cat tick.c
>     ... 
>     for (i = 0; i < 100; i++) {
> 	DTRACE_PROBE1(tick, loop1, i);
>         if (TICK_LOOP2_ENABLED()) {
>             DTRACE_PROBE1(tick, loop2, i); 
>         }
>         printf("hi: %d\n", i); 
>         sleep(1);
>     }   
>     ... 
> 
> Here tick:loop1 is marker without reference counter where as tick:loop2
> is surrounded by reference counter condition.
> 
>   # perf buildid-cache --add /tmp/tick
>   # perf probe sdt_tick:loop1
>   # perf probe sdt_tick:loop2
> 
>   # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
>   hi: 0
>   hi: 1
>   hi: 2
>   ^C
>   Performance counter stats for '/tmp/tick':
>              3      sdt_tick:loop1
>              0      sdt_tick:loop2
>      2.747086086 seconds time elapsed
> 
> Perf failed to record data for tick:loop2. Same experiment with this
> patch series:
> 
>   # ./perf buildid-cache --add /tmp/tick
>   # ./perf probe sdt_tick:loop2
>   # ./perf stat -e sdt_tick:loop2 /tmp/tick
>     hi: 0
>     hi: 1
>     hi: 2
>     ^C  
>      Performance counter stats for '/tmp/tick':
>                  3      sdt_tick:loop2
>        2.561851452 seconds time elapsed
> 
> 
> Note:
>  - 'reference counter' is called as 'semaphore' in original Dtrace
>    (or Systemtap, bcc and even in ELF) documentation and code. But the 
>    term 'semaphore' is misleading in this context. This is just a counter
>    used to hold number of tracers tracing on a marker. This is not haeally
>    used for any synchronization. So we are referring it as 'reference
>    counter' in kernel / perf code.

+1. I like this "reference counter" term :)

>  - This patches still has one issue. If there are multiple instances of
>    same application running and user wants to trace any particular
>    instance, trace_uprobe is updating reference counter in all instances.
>    This is not a problem on user side because instruction is not replaced
>    with trap/int3 and thus user will only see samples from his interested
>    process. But still this is more of a correctness issue. I'm working on
>    a fix for this.

Hmm, it sounds like not a correctness issue, but there maybe a performace
tradeoff. Tracing one particulear instance, other instances also will get
a performance loss (Only if the parameter preparation block is heavy,
because the heaviest part of probing - trap/int3 and recording data - isn't
executed.)

BTW, why this happens? I thought the refcounter part is just a data which
is not shared among processes...

Thank you,


> 
> [1] https://lkml.org/lkml/2018/4/17/23
> [2] https://lkml.org/lkml/2018/5/25/111
> [3] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
> [4] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506
> 
> Ravi Bangoria (7):
>   Uprobes: Simplify uprobe_register() body
>   Uprobes: Support SDT markers having reference count (semaphore)
>   Uprobes/sdt: Fix multiple update of same reference counter
>   trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
>   Uprobes/sdt: Prevent multiple reference counter for same uprobe
>   Uprobes/sdt: Document about reference counter
>   perf probe: Support SDT markers having reference counter (semaphore)
> 
>  Documentation/trace/uprobetracer.rst |  16 +-
>  include/linux/uprobes.h              |   5 +
>  kernel/events/uprobes.c              | 502 +++++++++++++++++++++++++++++++----
>  kernel/trace/trace.c                 |   2 +-
>  kernel/trace/trace_uprobe.c          |  74 +++++-
>  tools/perf/util/probe-event.c        |  39 ++-
>  tools/perf/util/probe-event.h        |   1 +
>  tools/perf/util/probe-file.c         |  34 ++-
>  tools/perf/util/probe-file.h         |   1 +
>  tools/perf/util/symbol-elf.c         |  46 +++-
>  tools/perf/util/symbol.h             |   7 +
>  11 files changed, 643 insertions(+), 84 deletions(-)
> 
> -- 
> 1.8.3.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2018-06-08  1:10 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-06  8:33 [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
2018-06-06  8:33 ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 1/7] Uprobes: Simplify uprobe_register() body Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 2/7] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 3/7] Uprobes/sdt: Fix multiple update of same reference counter Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 4/7] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 5/7] Uprobes/sdt: " Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 6/7] Uprobes/sdt: Document about reference counter Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:33 ` [PATCH 7/7] perf probe: Support SDT markers having reference counter (semaphore) Ravi Bangoria
2018-06-06  8:33   ` Ravi Bangoria
2018-06-06  8:35 ` [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore) Ravi Bangoria
2018-06-06  8:35   ` Ravi Bangoria
2018-06-08  1:10 ` Masami Hiramatsu [this message]
2018-06-08  1:10   ` Masami Hiramatsu
2018-06-08  2:29   ` Ravi Bangoria
2018-06-08  2:29     ` Ravi Bangoria
2018-06-08  5:14     ` Masami Hiramatsu
2018-06-08  5:14       ` Masami Hiramatsu
2018-06-08  6:34       ` Ravi Bangoria
2018-06-08  6:34         ` Ravi Bangoria
2018-06-08 15:45         ` Masami Hiramatsu
2018-06-08 15:45           ` Masami Hiramatsu
2018-06-11  4:31           ` Ravi Bangoria
2018-06-11  4:31             ` Ravi Bangoria
2018-06-16 13:50             ` Masami Hiramatsu
2018-06-16 13:50               ` Masami Hiramatsu
2018-06-16 15:07               ` Ravi Bangoria
2018-06-16 15:07                 ` Ravi Bangoria
2018-06-08 16:36 ` Oleg Nesterov
2018-06-08 16:36   ` Oleg Nesterov
2018-06-11  4:13   ` Ravi Bangoria
2018-06-11  4:13     ` Ravi Bangoria
2018-06-20 16:37 ` Steven Rostedt
2018-06-20 16:37   ` Steven Rostedt
2018-06-21  2:35   ` Ravi Bangoria
2018-06-21  2:35     ` Ravi Bangoria

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180608101023.cbf20db485026213d490cb7c@kernel.org \
    --to=mhiramat@kernel.org \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=alexis.berlemont@gmail.com \
    --cc=ananth@linux.vnet.ibm.com \
    --cc=corbet@lwn.net \
    --cc=jolsa@redhat.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=naveen.n.rao@linux.vnet.ibm.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ravi.bangoria@linux.ibm.com \
    --cc=rostedt@goodmis.org \
    --cc=srikar@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.