All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] perf/sdt: Directly record SDT events with 'perf record'
@ 2017-02-24  7:43 Ravi Bangoria
  2017-02-24  7:43 ` [PATCH v3 1/2] perf/sdt: Introduce util func is_sdt_event() Ravi Bangoria
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Ravi Bangoria @ 2017-02-24  7:43 UTC (permalink / raw)
  To: mingo, acme, masami.hiramatsu.pt
  Cc: brendan.d.gregg, peterz, alexander.shishkin, wangnan0, jolsa, ak,
	treeze.taeung, mathieu.poirier, hekuang, sukadev, ananth,
	naveen.n.rao, adrian.hunter, linux-kernel, hemant, Ravi Bangoria

All events from 'perf list', except SDT events, can be directly recorded
with 'perf record'. But, the flow is little different for SDT events.
Probe point for SDT event needs to be created using 'perf probe' before
recording it using 'perf record'.

As suggested by Ingo[1], it's better to make this process simple by
creating probe point automatically with 'perf record' for SDT events.
Same effort was done by Hemant some time ago[2]. This patch series
is based on work he has done.

Changes in v3:
  - Rebased to current acme/perf/core

  - v2 was having build failure if LIB_ELF support is missing. Solved
    it in this series.

  - Event starting with 'sdt_' will be considered as SDT event. v2
    was forcing to prepend SDT event name with '%' for 'perf record'
    but not for 'perf probe'. This change was suggested by Brendan/
    Masami in one of the review comment in v1.[3]

  - There was one more problem with v2. If internal 'perf probe' fails
    to lookup SDT event in cache because of too big list (-E2BIG),
    'perf record' starts recording cycles event instead of showing
    warning/error. This can totally confuse user. I've resolved it. ex:

    With v2:
      $ perf record -a -e sdt_qemu:*
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.246 MB perf.data (1523 samples) ]

      $ perf evlist
        cpu-clock

    With this patch:
      $ perf record -a -e sdt_qemu:*
        event syntax error: 'sdt_qemu:*'
                               \___ Cache lookup failed

  - This patch series still allows user to place a probe manually with
    'perf probe' but also shows a hint:

      $ perf probe sdt_libpthread:mutex_entry
        ...
        Hint: SDT event can be directly recorded with 'perf record'. No need to create probe manually.

  - v2 was always trying to look for sdt event in probe cache by ignoring
    entries of uprobe_events. Hence, it creates new probe points for event
    even if it already exists. This was confusing from user's point of
    view.

    With v2:
      $ perf probe sdt_libpthread:mutex_entry
        Added new events:
          sdt_libpthread:mutex_entry (on %mutex_entry in /usr/lib64/libpthread-2.24.so)
          sdt_libpthread:mutex_entry_1 (on %mutex_entry in /usr/lib64/libpthread-2.24.so)

      $ cat /sys/kernel/debug/tracing/uprobe_events
        p:sdt_libpthread/mutex_entry /usr/lib64/libpthread-2.24.so:0x0000000000009ddb
        p:sdt_libpthread/mutex_entry_1 /usr/lib64/libpthread-2.24.so:0x000000000000bcbb

      $ perf record -a -e %sdt_libpthread:mutex_entry
        Warning : Recording on 2 occurrences of sdt_libpthread:mutex_entry

      $ perf evlist
        sdt_libpthread:mutex_entry_3
        sdt_libpthread:mutex_entry_2

    There are two issues:
    1st, It confuses user about new names.
    2nd, (and more important) Perf won't allow you to record
         'sdt_libpthread:mutex_entry_1' with 'perf record' even if it
         exists in uprobe_events, because it won't find event with that
         name in probe cache.

    I've solved these issues. This patch gives first priority to existing
    entries of uprobe_events file. After that, it looks into probe cache.
    For ex,

    With this patch:
      $ perf probe sdt_libpthread:mutex_entry
        Added new events:
          sdt_libpthread:mutex_entry (on %mutex_entry in /usr/lib64/libpthread-2.24.so)
          sdt_libpthread:mutex_entry_1 (on %mutex_entry in /usr/lib64/libpthread-2.24.so)

      $ perf record -a -e sdt_libpthread:mutex_entry_1
        Matching event(s) from uprobe_events:
          sdt_libpthread:mutex_entry_1  0xbcbb@/usr/lib64/libpthread-2.24.so
        Use 'perf probe -d <event>' to delete event(s).

    It also lists matching entries as 'name addr@filename' followed by the
    hint about how to delete them with 'perf probe -d'.

    If there is no matching entry found in uprobe_events with that *name*,
    perf will find all events with that name from probe cache. Once again
    it checks whether these events exists in uprobe_events, but this time
    it uses *address* and *filename* instead of eventname. If entry found,
    it reuse that entry instead of creating new one. In continuation of
    above example,

    With this patch:
      $ perf probe -d sdt_libpthread:mutex_entry
        Removed event: sdt_libpthread:mutex_entry

      $ cat /sys/kernel/debug/tracing/uprobe_events
        p:sdt_libpthread/mutex_entry_1 /usr/lib64/libpthread-2.24.so:0x000000000000bcbb

      $ perf record -a -e sdt_libpthread:*
        Matching event(s) from uprobe_events:
          sdt_libpthread:mutex_entry_1  0xbcbb@/usr/lib64/libpthread-2.24.so
        Use 'perf probe -d <event>' to delete event(s).

        Warning: Recording on 35 occurrences of sdt_libpthread:*

      $ perf evlist
        sdt_libpthread:mutex_entry
        sdt_libpthread:pthread_create
        sdt_libpthread:mutex_entry_1
        ...

    Here, Perf has reused entry for event with 0xbcbb address, but also it
    has created new entry for event with 0x9ddb address. It also maintains
    list of entries created for particular record session, and uses that
    list to remove entries at the end of session.

    Finally, If somehow tool fails to clean events from uprobe_events at
    the end of session, user has to clean events manually with
    'perf probe -d'. But perf will give Warning in such case. For ex,

      $ perf record -a -e sdt_libpthread:mutex_entry
        Warning: Recording on 2 occurrences of sdt_libpthread:mutex_entry
        /** Fails with segfault **/

      $ cat /sys/kernel/debug/tracing/uprobe_events
        p:sdt_libpthread/mutex_entry /usr/lib64/libpthread-2.24.so:0x0000000000009ddb
        p:sdt_libpthread/mutex_entry_1 /usr/lib64/libpthread-2.24.so:0x000000000000bcbb

    When next time user tries to record, it will show a warning:

      $ perf record -a -e sdt_libpthread:mutex_entry
        Matching event(s) from uprobe_events:
           sdt_libpthread:mutex_entry  0x9ddb@/usr/lib64/libpthread-2.24.so
        Use 'perf probe -d <event>' to delete event(s).

        Warning: Found 2 events from probe-cache with name 'sdt_libpthread:mutex_entry'.
	         Since probe point already exists with this name, recording only 1 event.
        Hint: Please use 'perf probe -d sdt_libpthread:mutex_entry*' to allow record on all events.

    But no such warning for 'sdt_libpthread:mutex_entry_1'.

      $ perf record -a -e sdt_libpthread:mutex_entry_1
        Matching event(s) from uprobe_events:
          sdt_libpthread:mutex_entry_1  0xbcbb@/usr/lib64/libpthread-2.24.so
        Use 'perf probe -d <event>' to delete event(s).


[1] https://lkml.org/lkml/2017/2/7/59
[2] https://lkml.org/lkml/2016/5/3/810
[3] https://lkml.org/lkml/2016/5/2/689


Hemant Kumar (1):
  perf/sdt: Directly record SDT events with 'perf record'

Ravi Bangoria (1):
  perf/sdt: Introduce util func is_sdt_event()

 tools/lib/api/fs/tracing_path.c |  17 +-
 tools/perf/builtin-probe.c      |  21 ++-
 tools/perf/builtin-record.c     |  23 +++
 tools/perf/perf.h               |   2 +
 tools/perf/util/parse-events.c  |  56 +++++-
 tools/perf/util/parse-events.h  |   2 +
 tools/perf/util/probe-event.c   |  53 +++++-
 tools/perf/util/probe-event.h   |   4 +
 tools/perf/util/probe-file.c    | 376 ++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/probe-file.h    |  27 +++
 tools/perf/util/util.c          |  12 ++
 tools/perf/util/util.h          |   2 +
 12 files changed, 567 insertions(+), 28 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/2] perf/sdt: Introduce util func is_sdt_event()
  2017-02-24  7:43 [PATCH v3 0/2] perf/sdt: Directly record SDT events with 'perf record' Ravi Bangoria
@ 2017-02-24  7:43 ` Ravi Bangoria
  2017-02-28  1:34   ` Masami Hiramatsu
  2017-02-24  7:43 ` [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record' Ravi Bangoria
  2017-02-27  4:54 ` [PATCH v3 0/2] " Ravi Bangoria
  2 siblings, 1 reply; 11+ messages in thread
From: Ravi Bangoria @ 2017-02-24  7:43 UTC (permalink / raw)
  To: mingo, acme, masami.hiramatsu.pt
  Cc: brendan.d.gregg, peterz, alexander.shishkin, wangnan0, jolsa, ak,
	treeze.taeung, mathieu.poirier, hekuang, sukadev, ananth,
	naveen.n.rao, adrian.hunter, linux-kernel, hemant, Ravi Bangoria

No Functionality changes.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 tools/perf/util/probe-event.c |  9 +--------
 tools/perf/util/util.c        | 12 ++++++++++++
 tools/perf/util/util.h        |  2 ++
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 28fb62c..2b1409f 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1339,14 +1339,7 @@ static int parse_perf_probe_point(char *arg, struct perf_probe_event *pev)
 	if (!arg)
 		return -EINVAL;
 
-	/*
-	 * If the probe point starts with '%',
-	 * or starts with "sdt_" and has a ':' but no '=',
-	 * then it should be a SDT/cached probe point.
-	 */
-	if (arg[0] == '%' ||
-	    (!strncmp(arg, "sdt_", 4) &&
-	     !!strchr(arg, ':') && !strchr(arg, '='))) {
+	if (is_sdt_event(arg)) {
 		pev->sdt = true;
 		if (arg[0] == '%')
 			arg++;
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index d8b45ce..b827428 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -802,3 +802,15 @@ int unit_number__scnprintf(char *buf, size_t size, u64 n)
 
 	return scnprintf(buf, size, "%" PRIu64 "%c", n, unit[i]);
 }
+
+/*
+ * If the probe point starts with '%',
+ * or starts with "sdt_" and has a ':' but no '=',
+ * then it should be a SDT/cached probe point.
+ */
+bool is_sdt_event(char *str)
+{
+	return (str[0] == '%' ||
+		(!strncmp(str, "sdt_", 4) &&
+		 !!strchr(str, ':') && !strchr(str, '=')));
+}
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index c74708d..ee599dc 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -364,4 +364,6 @@ int is_printable_array(char *p, unsigned int len);
 int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz);
 
 int unit_number__scnprintf(char *buf, size_t size, u64 n);
+
+bool is_sdt_event(char *str);
 #endif /* GIT_COMPAT_UTIL_H */
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
  2017-02-24  7:43 [PATCH v3 0/2] perf/sdt: Directly record SDT events with 'perf record' Ravi Bangoria
  2017-02-24  7:43 ` [PATCH v3 1/2] perf/sdt: Introduce util func is_sdt_event() Ravi Bangoria
@ 2017-02-24  7:43 ` Ravi Bangoria
  2017-02-28  5:45   ` Masami Hiramatsu
  2017-02-28 22:31   ` Brendan Gregg
  2017-02-27  4:54 ` [PATCH v3 0/2] " Ravi Bangoria
  2 siblings, 2 replies; 11+ messages in thread
From: Ravi Bangoria @ 2017-02-24  7:43 UTC (permalink / raw)
  To: mingo, acme, masami.hiramatsu.pt
  Cc: brendan.d.gregg, peterz, alexander.shishkin, wangnan0, jolsa, ak,
	treeze.taeung, mathieu.poirier, hekuang, sukadev, ananth,
	naveen.n.rao, adrian.hunter, linux-kernel, hemant, Ravi Bangoria

From: Hemant Kumar <hemant@linux.vnet.ibm.com>

Add support for directly recording SDT events which are present in
the probe cache. Without this patch, we could probe into SDT events
using 'perf probe' and 'perf record'. With this patch, we can probe
the SDT events directly using 'perf record'.

For example :

  $ perf list sdt
    sdt_libpthread:mutex_entry                         [SDT event]
    sdt_libc:setjmp                                    [SDT event]
    ...

  $ perf record -a -e sdt_libc:setjmp
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ]

  $ perf script
         bash   803 [002]  6492.190311: sdt_libc:setjmp: (7f1d503b56a1)
        login   488 [001]  6496.791583: sdt_libc:setjmp: (7ff3013d56a1)
      fprintd 11038 [003]  6496.808032: sdt_libc:setjmp: (7fdedf5936a1)

Recording on SDT events with same provider and marker names is also
supported:

  $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
      Provider: libpthread
      Name: mutex_entry
      Location: 0x0000000000009ddb, Base: 0x00000000000139cc, ...
    --
      Provider: libpthread
      Name: mutex_entry
      Location: 0x000000000000bcbb, Base: 0x00000000000139cc, ...

  $ perf record -a -e sdt_libpthread:mutex_entry
    Info: Recording on 2 occurrences of sdt_libpthread:mutex_entry
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.197 MB perf.data (31 samples) ]

  $ perf script
      in:imjournal 442 [000]  6625.701833:   sdt_libpthread:mutex_entry: (7fb1a1940ddb)
     rs:main Q:Reg 443 [001]  6625.701889: sdt_libpthread:mutex_entry_1: (7fb1a1942cbb)


After invoking 'perf record', behind the scenes, it checks whether
the event specified is an SDT event by using the flag '%' or string
'sdt_'. After that, it first checks whether event already exists
with that *name* in uprobe_events file. If found, it records that
particular event. Otherwise, it does a lookup of the probe cache to
find out the SDT event. If its not present, it throws an error.
If found, it again tries to find existing events from uprobe_events
file, but this time it uses *address* and *filename* for comparison.
Finally it writes new events into the uprobe_events file and starts
recording. It also maintains a list of the event names that were
written to uprobe_events file for this session. After finishing the
record session, it removes the events from the uprobe_events file
using the maintained name list.

Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
---
 tools/lib/api/fs/tracing_path.c |  17 +-
 tools/perf/builtin-probe.c      |  21 ++-
 tools/perf/builtin-record.c     |  23 +++
 tools/perf/perf.h               |   2 +
 tools/perf/util/parse-events.c  |  56 +++++-
 tools/perf/util/parse-events.h  |   2 +
 tools/perf/util/probe-event.c   |  44 ++++-
 tools/perf/util/probe-event.h   |   4 +
 tools/perf/util/probe-file.c    | 376 ++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/probe-file.h    |  27 +++
 10 files changed, 552 insertions(+), 20 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 3e606b9..fa52e67 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -103,19 +103,10 @@ int tracing_path__strerror_open_tp(int err, char *buf, size_t size,
 		 * - jirka
 		 */
 		if (debugfs__configured() || tracefs__configured()) {
-			/* sdt markers */
-			if (!strncmp(filename, "sdt_", 4)) {
-				snprintf(buf, size,
-					"Error:\tFile %s/%s not found.\n"
-					"Hint:\tSDT event cannot be directly recorded on.\n"
-					"\tPlease first use 'perf probe %s:%s' before recording it.\n",
-					tracing_events_path, filename, sys, name);
-			} else {
-				snprintf(buf, size,
-					 "Error:\tFile %s/%s not found.\n"
-					 "Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n",
-					 tracing_events_path, filename);
-			}
+			snprintf(buf, size,
+				 "Error:\tFile %s/%s not found.\n"
+				 "Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n",
+				 tracing_events_path, filename);
 			break;
 		}
 		snprintf(buf, size, "%s",
diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index 1fcebc3..d255a6d 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -45,6 +45,7 @@
 #define DEFAULT_VAR_FILTER "!__k???tab_* & !__crc_*"
 #define DEFAULT_FUNC_FILTER "!_*"
 #define DEFAULT_LIST_FILTER "*"
+#define MAX_EVENT_LENGTH 512
 
 /* Session management structure */
 static struct {
@@ -312,6 +313,19 @@ static void pr_err_with_code(const char *msg, int err)
 	pr_err("\n");
 }
 
+/* Show how to use the event. */
+static void record_syntax_hint(const char *group, const char *event)
+{
+	char ge[MAX_EVENT_LENGTH];
+
+	pr_info("\nYou can now use it in all perf tools, such as:\n\n");
+	pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event);
+
+	snprintf(ge, strlen(group) + strlen(event) + 2, "%s:%s", group, event);
+	if (is_sdt_event(ge))
+		pr_info("Hint: SDT event can be directly recorded with 'perf record'. No need to create probe manually.\n");
+}
+
 static int perf_add_probe_events(struct perf_probe_event *pevs, int npevs)
 {
 	int ret;
@@ -356,11 +370,8 @@ static int perf_add_probe_events(struct perf_probe_event *pevs, int npevs)
 	}
 
 	/* Note that it is possible to skip all events because of blacklist */
-	if (event) {
-		/* Show how to use the event. */
-		pr_info("\nYou can now use it in all perf tools, such as:\n\n");
-		pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event);
-	}
+	if (event)
+		record_syntax_hint(group, event);
 
 out_cleanup:
 	cleanup_perf_probe_events(pevs, npevs);
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index bc84a37..fa9b345 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -39,6 +39,7 @@
 #include "util/trigger.h"
 #include "util/perf-hooks.h"
 #include "asm/bug.h"
+#include "util/probe-file.h"
 
 #include <unistd.h>
 #include <sched.h>
@@ -73,6 +74,7 @@ struct record {
 	bool			timestamp_filename;
 	struct switch_output	switch_output;
 	unsigned long long	samples;
+	struct list_head	sdt_event_list;
 };
 
 static volatile int auxtrace_record__snapshot_started;
@@ -1503,6 +1505,25 @@ static struct record record = {
 	},
 };
 
+void sdt_event_list__add(struct list_head *sdt_event_list)
+{
+	if (list_empty(sdt_event_list))
+		return;
+	list_splice(sdt_event_list, &record.sdt_event_list);
+}
+
+bool is_cmd_record(void)
+{
+	return (record.evlist != NULL);
+}
+
+void sdt_event_list__remove(void)
+{
+#ifdef HAVE_LIBELF_SUPPORT
+	remove_sdt_event_list(&record.sdt_event_list);
+#endif
+}
+
 const char record_callchain_help[] = CALLCHAIN_RECORD_HELP
 	"\n\t\t\t\tDefault: fp";
 
@@ -1671,6 +1692,7 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (rec->evlist == NULL)
 		return -ENOMEM;
 
+	INIT_LIST_HEAD(&rec->sdt_event_list);
 	err = perf_config(perf_record_config, rec);
 	if (err)
 		return err;
@@ -1841,6 +1863,7 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 	perf_evlist__delete(rec->evlist);
 	symbol__exit();
 	auxtrace_record__free(rec->itr);
+	sdt_event_list__remove();
 	return err;
 }
 
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 1c27d94..4ad12bb 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -76,4 +76,6 @@ struct record_opts {
 struct option;
 extern const char * const *record_usage;
 extern struct option *record_options;
+
+bool is_cmd_record(void);
 #endif
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 67a8aeb..7d1f042 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1726,12 +1726,66 @@ static void parse_events_print_error(struct parse_events_error *err,
 
 #undef MAX_WIDTH
 
+/* SDT event needs LIBELF support for creating a probe point */
+#ifdef HAVE_LIBELF_SUPPORT
+static int parse_sdt_event(struct perf_evlist *evlist, const char *str,
+			   struct parse_events_error *err)
+{
+	char *ptr = NULL;
+	int ret;
+	struct list_head *sdt_evlist;
+	struct sdt_event_list *event;
+
+	if (str[0] == '%')
+		str++;
+
+	ptr = strdup(str);
+	if (ptr == NULL)
+		return -ENOMEM;
+
+	sdt_evlist = zalloc(sizeof(*sdt_evlist));
+	if (!sdt_evlist) {
+		free(ptr);
+		pr_err("Error in sdt_evlist memory allocation\n");
+		return -ENOMEM;
+	}
+	INIT_LIST_HEAD(sdt_evlist);
+
+	/*
+	 * If there is an error in this call, no need to free
+	 * up sdt_evlist, its already free'ed up in the previous
+	 * call. Free up 'ptr' though.
+	 */
+	ret = add_sdt_event(ptr, sdt_evlist, err);
+	if (!ret) {
+		list_for_each_entry(event, sdt_evlist, list) {
+			ret = parse_events(evlist, event->event_info, err);
+			if (ret < 0)
+				goto ret;
+		}
+		/* Add it to the record struct */
+		sdt_event_list__add(sdt_evlist);
+	}
+
+ret:
+	free(ptr);
+	return ret;
+}
+#endif /* HAVE_LIBELF_SUPPORT */
+
 int parse_events_option(const struct option *opt, const char *str,
 			int unset __maybe_unused)
 {
 	struct perf_evlist *evlist = *(struct perf_evlist **)opt->value;
 	struct parse_events_error err = { .idx = 0, };
-	int ret = parse_events(evlist, str, &err);
+	int ret = 0;
+
+#ifdef HAVE_LIBELF_SUPPORT
+	if (is_sdt_event((char *)str) && is_cmd_record())
+		ret = parse_sdt_event(evlist, str, &err);
+	else
+#endif
+		ret = parse_events(evlist, str, &err);
 
 	if (ret)
 		parse_events_print_error(&err, str);
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 1af6a26..205f682 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -196,4 +196,6 @@ int is_valid_tracepoint(const char *event_string);
 int valid_event_mount(const char *eventfs);
 char *parse_events_formats_error_string(char *additional_terms);
 
+void sdt_event_list__add(struct list_head *sdt_event_list);
+void sdt_event_list__remove(void);
 #endif /* __PERF_PARSE_EVENTS_H */
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 2b1409f..a19b00d 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1293,7 +1293,7 @@ int parse_line_range_desc(const char *arg, struct line_range *lr)
 	return err;
 }
 
-static int parse_perf_probe_event_name(char **arg, struct perf_probe_event *pev)
+int parse_perf_probe_event_name(char **arg, struct perf_probe_event *pev)
 {
 	char *ptr;
 
@@ -3166,6 +3166,12 @@ static int find_cached_events_all(struct perf_probe_event *pev,
 	return ret;
 }
 
+int find_sdt_events_from_cache(struct perf_probe_event *pev,
+			       struct probe_trace_event **tevs)
+{
+	return find_cached_events_all(pev, tevs);
+}
+
 static int find_probe_trace_events_from_cache(struct perf_probe_event *pev,
 					      struct probe_trace_event **tevs)
 {
@@ -3476,3 +3482,39 @@ int copy_to_probe_trace_arg(struct probe_trace_arg *tvar,
 		tvar->name = NULL;
 	return 0;
 }
+
+/*
+ * Record session for SDT events has ended. Delete the SDT events
+ * from uprobe_events file that were created initially.
+ */
+void remove_sdt_event_list(struct list_head *sdt_events)
+{
+	struct sdt_event_list *event;
+	struct strfilter *filter = NULL;
+	const char *err = NULL;
+	int ret = 0;
+
+	if (list_empty(sdt_events))
+		return;
+
+	list_for_each_entry(event, sdt_events, list) {
+		if (event->existing_event)
+			continue;
+		if (!filter) {
+			filter = strfilter__new(event->event_info, &err);
+			if (!filter)
+				goto free_list;
+		} else {
+			ret = strfilter__or(filter, event->event_info, &err);
+		}
+	}
+
+	if (filter)
+		ret = del_perf_probe_events(filter);
+
+	if (ret)
+		pr_err("Error in deleting the SDT list\n");
+
+free_list:
+	free_sdt_list(sdt_events);
+}
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 5d4e940..5ec648e 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -182,4 +182,8 @@ struct map *get_target_map(const char *target, bool user);
 void arch__post_process_probe_trace_events(struct perf_probe_event *pev,
 					   int ntevs);
 
+int parse_perf_probe_event_name(char **arg, struct perf_probe_event *pev);
+
+int find_sdt_events_from_cache(struct perf_probe_event *pev,
+			       struct probe_trace_event **tevs);
 #endif /*_PROBE_EVENT_H */
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index 436b647..c941f38 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -27,8 +27,10 @@
 #include "probe-event.h"
 #include "probe-file.h"
 #include "session.h"
+#include "probe-finder.h"
 
 #define MAX_CMDLEN 256
+#define MAX_EVENT_LENGTH 512
 
 static void print_open_warning(int err, bool uprobe)
 {
@@ -933,3 +935,377 @@ bool probe_type_is_available(enum probe_type type)
 
 	return ret;
 }
+
+void free_sdt_list(struct list_head *sdt_events)
+{
+	struct sdt_event_list *tmp, *ptr;
+
+	if (list_empty(sdt_events))
+		return;
+	list_for_each_entry_safe(tmp, ptr, sdt_events, list) {
+		list_del(&tmp->list);
+		free(tmp->event_info);
+		free(tmp);
+	}
+}
+
+static int alloc_exst_sdt_event(struct exst_sdt_event_list **esl)
+{
+	struct exst_sdt_event_list *tmp;
+
+	tmp = zalloc(sizeof(*tmp));
+	if (!tmp)
+		return -ENOMEM;
+
+	tmp->tev = zalloc(sizeof(struct probe_trace_event));
+	if (!tmp->tev) {
+		free(tmp);
+		return -ENOMEM;
+	}
+
+	tmp->match = false;
+	*esl = tmp;
+	return 0;
+}
+
+static void free_exst_sdt_event(struct exst_sdt_event_list *esl)
+{
+	if (!esl)
+		return;
+
+	if (esl->tev) {
+		free(esl->tev->args);
+		free(esl->tev);
+	}
+
+	free(esl);
+}
+
+static void probe_file__free_exst_sdt_list(struct exst_sdt_event_list *esl)
+{
+	struct list_head *pos, *q;
+	struct exst_sdt_event_list *tmp;
+
+	list_for_each_safe(pos, q, &(esl->list)) {
+		tmp = list_entry(pos, struct exst_sdt_event_list, list);
+		free_exst_sdt_event(tmp);
+		list_del(pos);
+	}
+}
+
+/*
+ * Look into uprobe_events file and prepare list of sdt events
+ * whose probepoint is already registered.
+ */
+static int probe_file__get_exst_sdt_list(struct exst_sdt_event_list *esl)
+{
+	int fd, ret = 0;
+	struct strlist *rawlist;
+	struct str_node *ent;
+	struct exst_sdt_event_list *tmp = NULL;
+
+	fd = probe_file__open(PF_FL_RW | PF_FL_UPROBE);
+	if (fd < 0)
+		return fd;
+
+	rawlist = probe_file__get_rawlist(fd);
+
+	strlist__for_each_entry(ent, rawlist) {
+		ret = alloc_exst_sdt_event(&tmp);
+		if (ret < 0)
+			goto error;
+
+		ret = parse_probe_trace_command(ent->s, tmp->tev);
+		if (ret < 0) {
+			free_exst_sdt_event(tmp);
+			goto error;
+		}
+
+		if (!strncmp(tmp->tev->group, "sdt_", 4))
+			list_add_tail(&(tmp->list), &(esl->list));
+		else
+			free_exst_sdt_event(tmp);
+	}
+	return 0;
+
+error:
+	probe_file__free_exst_sdt_list(esl);
+	return ret;
+}
+
+/*
+ * Remove ith tev from pev->tevs list and shift remaining
+ * tevs(i+1 to pev->ntevs) one step.
+ */
+static void shift_pev(struct perf_probe_event *pev, int i)
+{
+	int j;
+
+	free(pev->tevs[i].event);
+	free(pev->tevs[i].group);
+	free(pev->tevs[i].args);
+	free(pev->tevs[i].point.realname);
+	free(pev->tevs[i].point.symbol);
+	free(pev->tevs[i].point.module);
+
+	/*
+	 * If ith element is last element, no need to shift,
+	 * just decrement pev->ntevs.
+	 */
+	if (i == pev->ntevs - 1)
+		goto ret;
+
+	for (j = i; j < pev->ntevs - 1; j++) {
+		pev->tevs[j].event          = pev->tevs[j + 1].event;
+		pev->tevs[j].group          = pev->tevs[j + 1].group;
+		pev->tevs[j].nargs          = pev->tevs[j + 1].nargs;
+		pev->tevs[j].uprobes        = pev->tevs[j + 1].uprobes;
+		pev->tevs[j].args           = pev->tevs[j + 1].args;
+		pev->tevs[j].point.realname = pev->tevs[j + 1].point.realname;
+		pev->tevs[j].point.symbol   = pev->tevs[j + 1].point.symbol;
+		pev->tevs[j].point.module   = pev->tevs[j + 1].point.module;
+		pev->tevs[j].point.offset   = pev->tevs[j + 1].point.offset;
+		pev->tevs[j].point.address  = pev->tevs[j + 1].point.address;
+		pev->tevs[j].point.retprobe = pev->tevs[j + 1].point.retprobe;
+	}
+
+ret:
+	pev->ntevs--;
+}
+
+/* Compare address and filename */
+static bool is_sdt_match(struct probe_trace_event *tev1,
+			 struct probe_trace_event *tev2)
+{
+	return (tev1->point.address == tev2->point.address &&
+		!(strcmp(tev1->point.module, tev2->point.module)));
+}
+
+/*
+ * Filter out all those pev->tevs which already exists in uprobe_events.
+ * Return 'true' if any matching entry found otherwise return 'false'.
+ */
+static bool filter_exst_sdt_events_from_pev(struct perf_probe_event *pev,
+					    struct exst_sdt_event_list *esl)
+{
+	int i;
+	bool ret = false;
+	struct exst_sdt_event_list *tmp;
+
+	list_for_each_entry(tmp, &(esl->list), list) {
+		for (i = 0; i < pev->ntevs; i++) {
+			if (is_sdt_match(&(pev->tevs[i]), tmp->tev)) {
+				shift_pev(pev, i);
+				tmp->match = true;
+				ret = true;
+			}
+		}
+	}
+
+	return ret;
+}
+
+static int list_add_sdt_event(struct list_head *sdt_events,
+			      bool existing_event,
+			      struct probe_trace_event *tev)
+{
+	struct sdt_event_list *tmp;
+
+	tmp = zalloc(sizeof(*tmp));
+	if (!tmp)
+		return -ENOMEM;
+
+	tmp->existing_event = existing_event;
+
+	INIT_LIST_HEAD(&tmp->list);
+	tmp->event_info = zalloc(MAX_EVENT_LENGTH * sizeof(char));
+	if (!tmp->event_info) {
+		free(tmp);
+		return -ENOMEM;
+	}
+
+	snprintf(tmp->event_info, strlen(tev->group) + strlen(tev->event) + 2,
+		 "%s:%s", tev->group, tev->event);
+
+	list_add(&tmp->list, sdt_events);
+
+	return 0;
+}
+
+static void print_exst_sdt_events(struct exst_sdt_event_list *tmp)
+{
+	static bool msg_head;
+
+	if (!msg_head) {
+		pr_info("Matching event(s) from uprobe_events:\n");
+		msg_head = true;
+	}
+
+	pr_info("   %s:%s  0x%" PRIx64 "@%s\n", tmp->tev->group,
+		tmp->tev->event, tmp->tev->point.address,
+		tmp->tev->point.module);
+}
+
+static void print_exst_sdt_event_footer(void)
+{
+	pr_info("Use 'perf probe -d <event>' to delete event(s).\n\n");
+}
+
+/*
+ * If there is entry with the same name in uprobe_events, record it.
+ * Return value  0: no error, not found
+ *              <0: error
+ *              >0: found
+ */
+static int probe_file__add_exst_sdt_event(struct exst_sdt_event_list *esl,
+					  struct list_head *sdt_events,
+					  struct perf_probe_event *pev)
+{
+	struct exst_sdt_event_list *tmp;
+	int ret = 0;
+
+	list_for_each_entry(tmp, &(esl->list), list) {
+		if (strcmp(tmp->tev->group, pev->group) ||
+		    strcmp(tmp->tev->event, pev->event))
+			continue;
+
+		tmp->match = true;
+
+		ret = list_add_sdt_event(sdt_events, true, tmp->tev);
+		if (ret < 0)
+			return ret;
+
+		print_exst_sdt_events(tmp);
+		print_exst_sdt_event_footer();
+		if (pev->ntevs > 1)
+			pr_warning("Warning: Found %d events from probe-cache with name '%s:%s'.\n"
+				"\t Since probe point already exists with this name, recording only 1 event.\n"
+				"Hint: Please use 'perf probe -d %s:%s*' to allow record on all events.\n\n",
+				pev->ntevs, pev->group, pev->event, pev->group, pev->event);
+		return 1;
+	}
+	return 0;
+}
+
+int add_sdt_event(char *event, struct list_head *sdt_events,
+		  struct parse_events_error *err)
+{
+	struct perf_probe_event *pev;
+	int ret = 0, i, ctr = 0, found = 0, exst_ctr = 0;
+	char *str;
+	struct exst_sdt_event_list *tmp;
+	struct exst_sdt_event_list esl;
+	bool filter;
+
+	pev = zalloc(sizeof(*pev));
+	if (!pev)
+		return -ENOMEM;
+
+	pev->sdt = true;
+	pev->uprobes = true;
+
+	str = strdup(event);
+	if (!str)
+		return -ENOMEM;
+
+	/*
+	 * Parse event to find the group name and event name of
+	 * the sdt event.
+	 */
+	ret = parse_perf_probe_event_name(&event, pev);
+	if (ret) {
+		pr_err("Error in parsing sdt event %s\n", str);
+		free(pev);
+		goto free_str;
+	}
+
+	probe_conf.max_probes = MAX_PROBES;
+	probe_conf.force_add = 1;
+
+	/*
+	 * Find the sdt event from the cache. We deliberately check failure
+	 * of this function after checking entries in uprobe_events. Because,
+	 * we may find matching entry from uprobe_events and in that case we
+	 * should continue recording that event.
+	 */
+	pev->ntevs = find_sdt_events_from_cache(pev, &pev->tevs);
+
+	/* Prepare list of existing sdt events from uprobe_events */
+	INIT_LIST_HEAD(&esl.list);
+	ret = probe_file__get_exst_sdt_list(&esl);
+	if (ret < 0)
+		goto free_str;
+
+	/* If there is entry with the same name in uprobe_events, record it. */
+	found = probe_file__add_exst_sdt_event(&esl, sdt_events, pev);
+	if (found) {
+		ret = (found > 0) ? 0 : found;
+		goto free_str;
+	}
+
+	/* Reaching here means no matching entry found in uprobe_events. */
+	filter = filter_exst_sdt_events_from_pev(pev, &esl);
+	if (!filter && pev->ntevs == 0) {
+		pr_err("%s:%s not found in the cache\n", pev->group,
+			pev->event);
+		ret = -EINVAL;
+		goto free_str;
+	} else if (pev->ntevs < 0) {
+		err->str = strdup("Cache lookup failed.\n");
+		ret = pev->ntevs;
+		goto free_str;
+	}
+
+	/* Create probe points for new events. */
+	ret = apply_perf_probe_events(pev, 1);
+	if (ret) {
+		pr_err("Error in adding SDT event : %s\n", event);
+		goto free_str;
+	}
+
+	/* Add existing event names to "sdt_events" list */
+	list_for_each_entry(tmp, &(esl.list), list) {
+		if (!tmp->match)
+			continue;
+
+		ret = list_add_sdt_event(sdt_events, true, tmp->tev);
+		if (ret < 0)
+			goto free_str;
+		print_exst_sdt_events(tmp);
+		ctr++;
+		exst_ctr++;
+	}
+	if (exst_ctr)
+		print_exst_sdt_event_footer();
+
+	/* Add new event names to "sdt_events" list */
+	for (i = 0; i < pev->ntevs; i++) {
+		ret = list_add_sdt_event(sdt_events, false, &(pev->tevs[i]));
+		if (ret < 0)
+			goto free_str;
+
+		ctr++;
+	}
+
+	/* Print warning for multiple events */
+	if (ctr > 1)
+		pr_warning("Warning: Recording on %d occurrences of %s:%s\n",
+			   ctr, pev->group, pev->event);
+
+free_str:
+	/*
+	 * User may ask for multiple events in the same record command like,
+	 *    perf record -a -e sdt_lib1:* -e sdt_a:b
+	 *
+	 * If sdt_lib1:* events are already added and there is some failure
+	 * for sdt_a:b, we need to clean sdt_lib1:* events from
+	 * record.sdt_event_list
+	 */
+	if (ret < 0)
+		sdt_event_list__remove();
+
+	free(str);
+	probe_file__free_exst_sdt_list(&esl);
+	cleanup_perf_probe_events(pev, 1);
+	return ret;
+}
diff --git a/tools/perf/util/probe-file.h b/tools/perf/util/probe-file.h
index eba44c3..e785eb6 100644
--- a/tools/perf/util/probe-file.h
+++ b/tools/perf/util/probe-file.h
@@ -4,6 +4,7 @@
 #include "strlist.h"
 #include "strfilter.h"
 #include "probe-event.h"
+#include "parse-events.h"
 
 /* Cache of probe definitions */
 struct probe_cache_entry {
@@ -19,6 +20,28 @@ struct probe_cache {
 	struct list_head entries;
 };
 
+struct sdt_event_list {
+	char *event_info;
+	struct list_head list;
+
+	/*
+	 * Flag to check whether this event already exists in urobe_events
+	 * file. This helps at a time of freeing sdt_event_list.
+	 */
+	bool existing_event;
+};
+
+/*
+ * This maintains list of sdt events which are already exists in
+ * uprobe_events file at the time of creating probe point for any
+ * sdt event with 'perf record'.
+ */
+struct exst_sdt_event_list {
+	struct probe_trace_event *tev;
+	struct list_head list;
+	bool match;	/* Does this event match with any new event */
+};
+
 enum probe_type {
 	PROBE_TYPE_U = 0,
 	PROBE_TYPE_S,
@@ -64,6 +87,10 @@ struct probe_cache_entry *probe_cache__find_by_name(struct probe_cache *pcache,
 					const char *group, const char *event);
 int probe_cache__show_all_caches(struct strfilter *filter);
 bool probe_type_is_available(enum probe_type type);
+int add_sdt_event(char *event, struct list_head *sdt_events,
+		  struct parse_events_error *err);
+void remove_sdt_event_list(struct list_head *sdt_event_list);
+void free_sdt_list(struct list_head *sdt_events);
 #else	/* ! HAVE_LIBELF_SUPPORT */
 static inline struct probe_cache *probe_cache__new(const char *tgt __maybe_unused)
 {
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 0/2] perf/sdt: Directly record SDT events with 'perf record'
  2017-02-24  7:43 [PATCH v3 0/2] perf/sdt: Directly record SDT events with 'perf record' Ravi Bangoria
  2017-02-24  7:43 ` [PATCH v3 1/2] perf/sdt: Introduce util func is_sdt_event() Ravi Bangoria
  2017-02-24  7:43 ` [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record' Ravi Bangoria
@ 2017-02-27  4:54 ` Ravi Bangoria
  2 siblings, 0 replies; 11+ messages in thread
From: Ravi Bangoria @ 2017-02-27  4:54 UTC (permalink / raw)
  To: mhiramat
  Cc: mingo, acme, brendan.d.gregg, peterz, alexander.shishkin,
	wangnan0, jolsa, ak, treeze.taeung, mathieu.poirier, hekuang,
	sukadev, ananth, naveen.n.rao, adrian.hunter, linux-kernel,
	hemant, Ravi Bangoria

Looks like I've older address of Masami(masami.hiramatsu.pt@hitachi.com) in my database.

Adding  To: mhiramat@kernel.org

-Ravi

On Friday 24 February 2017 01:13 PM, Ravi Bangoria wrote:
> All events from 'perf list', except SDT events, can be directly recorded
> with 'perf record'. But, the flow is little different for SDT events.
> Probe point for SDT event needs to be created using 'perf probe' before
> recording it using 'perf record'.
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] perf/sdt: Introduce util func is_sdt_event()
  2017-02-24  7:43 ` [PATCH v3 1/2] perf/sdt: Introduce util func is_sdt_event() Ravi Bangoria
@ 2017-02-28  1:34   ` Masami Hiramatsu
  0 siblings, 0 replies; 11+ messages in thread
From: Masami Hiramatsu @ 2017-02-28  1:34 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: mingo, acme, masami.hiramatsu.pt, brendan.d.gregg, peterz,
	alexander.shishkin, wangnan0, jolsa, ak, treeze.taeung,
	mathieu.poirier, hekuang, sukadev, ananth, naveen.n.rao,
	adrian.hunter, linux-kernel, hemant

On Fri, 24 Feb 2017 13:13:24 +0530
Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> wrote:

> No Functionality changes.

Please describe even it seems not have much info.
----
Factor out the SDT event name checking routine as is_sdt_event().
----

BTW, would we really need to move it in util.c? I think parse-event.{c,h}
is a better place.

Thank you,

> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
> ---
>  tools/perf/util/probe-event.c |  9 +--------
>  tools/perf/util/util.c        | 12 ++++++++++++
>  tools/perf/util/util.h        |  2 ++
>  3 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
> index 28fb62c..2b1409f 100644
> --- a/tools/perf/util/probe-event.c
> +++ b/tools/perf/util/probe-event.c
> @@ -1339,14 +1339,7 @@ static int parse_perf_probe_point(char *arg, struct perf_probe_event *pev)
>  	if (!arg)
>  		return -EINVAL;
>  
> -	/*
> -	 * If the probe point starts with '%',
> -	 * or starts with "sdt_" and has a ':' but no '=',
> -	 * then it should be a SDT/cached probe point.
> -	 */
> -	if (arg[0] == '%' ||
> -	    (!strncmp(arg, "sdt_", 4) &&
> -	     !!strchr(arg, ':') && !strchr(arg, '='))) {
> +	if (is_sdt_event(arg)) {
>  		pev->sdt = true;
>  		if (arg[0] == '%')
>  			arg++;
> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> index d8b45ce..b827428 100644
> --- a/tools/perf/util/util.c
> +++ b/tools/perf/util/util.c
> @@ -802,3 +802,15 @@ int unit_number__scnprintf(char *buf, size_t size, u64 n)
>  
>  	return scnprintf(buf, size, "%" PRIu64 "%c", n, unit[i]);
>  }
> +
> +/*
> + * If the probe point starts with '%',
> + * or starts with "sdt_" and has a ':' but no '=',
> + * then it should be a SDT/cached probe point.
> + */
> +bool is_sdt_event(char *str)
> +{
> +	return (str[0] == '%' ||
> +		(!strncmp(str, "sdt_", 4) &&
> +		 !!strchr(str, ':') && !strchr(str, '=')));
> +}
> diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
> index c74708d..ee599dc 100644
> --- a/tools/perf/util/util.h
> +++ b/tools/perf/util/util.h
> @@ -364,4 +364,6 @@ int is_printable_array(char *p, unsigned int len);
>  int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz);
>  
>  int unit_number__scnprintf(char *buf, size_t size, u64 n);
> +
> +bool is_sdt_event(char *str);
>  #endif /* GIT_COMPAT_UTIL_H */
> -- 
> 2.9.3
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
  2017-02-24  7:43 ` [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record' Ravi Bangoria
@ 2017-02-28  5:45   ` Masami Hiramatsu
  2017-02-28 10:56     ` Ravi Bangoria
  2017-02-28 22:31   ` Brendan Gregg
  1 sibling, 1 reply; 11+ messages in thread
From: Masami Hiramatsu @ 2017-02-28  5:45 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: mingo, acme, masami.hiramatsu.pt, brendan.d.gregg, peterz,
	alexander.shishkin, wangnan0, jolsa, ak, treeze.taeung,
	mathieu.poirier, hekuang, sukadev, ananth, naveen.n.rao,
	adrian.hunter, linux-kernel, hemant

On Fri, 24 Feb 2017 13:13:25 +0530
Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> wrote:

> From: Hemant Kumar <hemant@linux.vnet.ibm.com>
> 
> Add support for directly recording SDT events which are present in
> the probe cache. Without this patch, we could probe into SDT events
> using 'perf probe' and 'perf record'. With this patch, we can probe
> the SDT events directly using 'perf record'.
> 
> For example :
> 
>   $ perf list sdt
>     sdt_libpthread:mutex_entry                         [SDT event]
>     sdt_libc:setjmp                                    [SDT event]
>     ...
> 
>   $ perf record -a -e sdt_libc:setjmp
>     ^C[ perf record: Woken up 1 times to write data ]
>     [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ]
> 
>   $ perf script
>          bash   803 [002]  6492.190311: sdt_libc:setjmp: (7f1d503b56a1)
>         login   488 [001]  6496.791583: sdt_libc:setjmp: (7ff3013d56a1)
>       fprintd 11038 [003]  6496.808032: sdt_libc:setjmp: (7fdedf5936a1)
> 
> Recording on SDT events with same provider and marker names is also
> supported:
> 
>   $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
>       Provider: libpthread
>       Name: mutex_entry
>       Location: 0x0000000000009ddb, Base: 0x00000000000139cc, ...
>     --
>       Provider: libpthread
>       Name: mutex_entry
>       Location: 0x000000000000bcbb, Base: 0x00000000000139cc, ...
> 
>   $ perf record -a -e sdt_libpthread:mutex_entry
>     Info: Recording on 2 occurrences of sdt_libpthread:mutex_entry
>     ^C[ perf record: Woken up 1 times to write data ]
>     [ perf record: Captured and wrote 0.197 MB perf.data (31 samples) ]
> 
>   $ perf script
>       in:imjournal 442 [000]  6625.701833:   sdt_libpthread:mutex_entry: (7fb1a1940ddb)
>      rs:main Q:Reg 443 [001]  6625.701889: sdt_libpthread:mutex_entry_1: (7fb1a1942cbb)
> 
> 
> After invoking 'perf record', behind the scenes, it checks whether
> the event specified is an SDT event by using the flag '%' or string
> 'sdt_'. After that, it first checks whether event already exists
> with that *name* in uprobe_events file. If found, it records that
> particular event. Otherwise, it does a lookup of the probe cache to
> find out the SDT event. If its not present, it throws an error.
> If found, it again tries to find existing events from uprobe_events
> file, but this time it uses *address* and *filename* for comparison.
> Finally it writes new events into the uprobe_events file and starts
> recording. It also maintains a list of the event names that were
> written to uprobe_events file for this session. After finishing the
> record session, it removes the events from the uprobe_events file
> using the maintained name list.

OK, the behavior looks good. However, the implementation seems
too complex, could you make it simpler?
I have some comments on the code, see below.

[...]
> diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
> index 1fcebc3..d255a6d 100644
> --- a/tools/perf/builtin-probe.c
> +++ b/tools/perf/builtin-probe.c
> @@ -45,6 +45,7 @@
>  #define DEFAULT_VAR_FILTER "!__k???tab_* & !__crc_*"
>  #define DEFAULT_FUNC_FILTER "!_*"
>  #define DEFAULT_LIST_FILTER "*"
> +#define MAX_EVENT_LENGTH 512
>  
>  /* Session management structure */
>  static struct {
> @@ -312,6 +313,19 @@ static void pr_err_with_code(const char *msg, int err)
>  	pr_err("\n");
>  }
>  
> +/* Show how to use the event. */
> +static void record_syntax_hint(const char *group, const char *event)
> +{
> +	char ge[MAX_EVENT_LENGTH];
> +
> +	pr_info("\nYou can now use it in all perf tools, such as:\n\n");
> +	pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event);
> +
> +	snprintf(ge, strlen(group) + strlen(event) + 2, "%s:%s", group, event);
> +	if (is_sdt_event(ge))
> +		pr_info("Hint: SDT event can be directly recorded with 'perf record'. No need to create probe manually.\n");
> +}

Here, we don't need this message because anyway perf_add_probe_events already
created this probe...

> +
>  static int perf_add_probe_events(struct perf_probe_event *pevs, int npevs)
>  {
>  	int ret;
> @@ -356,11 +370,8 @@ static int perf_add_probe_events(struct perf_probe_event *pevs, int npevs)
>  	}
>  
>  	/* Note that it is possible to skip all events because of blacklist */
> -	if (event) {
> -		/* Show how to use the event. */
> -		pr_info("\nYou can now use it in all perf tools, such as:\n\n");
> -		pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event);
> -	}
> +	if (event)
> +		record_syntax_hint(group, event);
>  
>  out_cleanup:
>  	cleanup_perf_probe_events(pevs, npevs);
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index bc84a37..fa9b345 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -39,6 +39,7 @@
>  #include "util/trigger.h"
>  #include "util/perf-hooks.h"
>  #include "asm/bug.h"
> +#include "util/probe-file.h"
>  
>  #include <unistd.h>
>  #include <sched.h>
> @@ -73,6 +74,7 @@ struct record {
>  	bool			timestamp_filename;
>  	struct switch_output	switch_output;
>  	unsigned long long	samples;
> +	struct list_head	sdt_event_list;
>  };
>  
>  static volatile int auxtrace_record__snapshot_started;
> @@ -1503,6 +1505,25 @@ static struct record record = {
>  	},
>  };
>  
> +void sdt_event_list__add(struct list_head *sdt_event_list)
> +{
> +	if (list_empty(sdt_event_list))
> +		return;
> +	list_splice(sdt_event_list, &record.sdt_event_list);
> +}
> +
> +bool is_cmd_record(void)
> +{
> +	return (record.evlist != NULL);
> +}
> +
> +void sdt_event_list__remove(void)
> +{
> +#ifdef HAVE_LIBELF_SUPPORT
> +	remove_sdt_event_list(&record.sdt_event_list);
> +#endif
> +}
> +
>  const char record_callchain_help[] = CALLCHAIN_RECORD_HELP
>  	"\n\t\t\t\tDefault: fp";
>  
> @@ -1671,6 +1692,7 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
>  	if (rec->evlist == NULL)
>  		return -ENOMEM;
>  
> +	INIT_LIST_HEAD(&rec->sdt_event_list);
>  	err = perf_config(perf_record_config, rec);
>  	if (err)
>  		return err;
> @@ -1841,6 +1863,7 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
>  	perf_evlist__delete(rec->evlist);
>  	symbol__exit();
>  	auxtrace_record__free(rec->itr);
> +	sdt_event_list__remove();
>  	return err;
>  }
>  
> diff --git a/tools/perf/perf.h b/tools/perf/perf.h
> index 1c27d94..4ad12bb 100644
> --- a/tools/perf/perf.h
> +++ b/tools/perf/perf.h
> @@ -76,4 +76,6 @@ struct record_opts {
>  struct option;
>  extern const char * const *record_usage;
>  extern struct option *record_options;
> +
> +bool is_cmd_record(void);
>  #endif
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 67a8aeb..7d1f042 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -1726,12 +1726,66 @@ static void parse_events_print_error(struct parse_events_error *err,
>  
>  #undef MAX_WIDTH
>  
> +/* SDT event needs LIBELF support for creating a probe point */
> +#ifdef HAVE_LIBELF_SUPPORT
> +static int parse_sdt_event(struct perf_evlist *evlist, const char *str,
> +			   struct parse_events_error *err)
> +{
> +	char *ptr = NULL;
> +	int ret;
> +	struct list_head *sdt_evlist;
> +	struct sdt_event_list *event;
> +
> +	if (str[0] == '%')
> +		str++;
> +
> +	ptr = strdup(str);
> +	if (ptr == NULL)
> +		return -ENOMEM;
> +
> +	sdt_evlist = zalloc(sizeof(*sdt_evlist));
> +	if (!sdt_evlist) {
> +		free(ptr);
> +		pr_err("Error in sdt_evlist memory allocation\n");

Could you make this message pr_debug() and/or use the 'err' ?

> +		return -ENOMEM;
> +	}
> +	INIT_LIST_HEAD(sdt_evlist);
> +
> +	/*
> +	 * If there is an error in this call, no need to free
> +	 * up sdt_evlist, its already free'ed up in the previous
> +	 * call. Free up 'ptr' though.
> +	 */
> +	ret = add_sdt_event(ptr, sdt_evlist, err);
> +	if (!ret) {
> +		list_for_each_entry(event, sdt_evlist, list) {
> +			ret = parse_events(evlist, event->event_info, err);
> +			if (ret < 0)
> +				goto ret;
> +		}
> +		/* Add it to the record struct */
> +		sdt_event_list__add(sdt_evlist);
> +	}
> +
> +ret:
> +	free(ptr);
> +	return ret;
> +}
> +#endif /* HAVE_LIBELF_SUPPORT */
> +
>  int parse_events_option(const struct option *opt, const char *str,
>  			int unset __maybe_unused)
>  {
>  	struct perf_evlist *evlist = *(struct perf_evlist **)opt->value;
>  	struct parse_events_error err = { .idx = 0, };
> -	int ret = parse_events(evlist, str, &err);
> +	int ret = 0;
> +
> +#ifdef HAVE_LIBELF_SUPPORT
> +	if (is_sdt_event((char *)str) && is_cmd_record())
> +		ret = parse_sdt_event(evlist, str, &err);
> +	else
> +#endif
> +		ret = parse_events(evlist, str, &err);
>  
>  	if (ret)
>  		parse_events_print_error(&err, str);
> diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
> index 1af6a26..205f682 100644
> --- a/tools/perf/util/parse-events.h
> +++ b/tools/perf/util/parse-events.h
> @@ -196,4 +196,6 @@ int is_valid_tracepoint(const char *event_string);
>  int valid_event_mount(const char *eventfs);
>  char *parse_events_formats_error_string(char *additional_terms);
>  
> +void sdt_event_list__add(struct list_head *sdt_event_list);
> +void sdt_event_list__remove(void);
>  #endif /* __PERF_PARSE_EVENTS_H */
> diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
> index 2b1409f..a19b00d 100644
> --- a/tools/perf/util/probe-event.c
> +++ b/tools/perf/util/probe-event.c
> @@ -1293,7 +1293,7 @@ int parse_line_range_desc(const char *arg, struct line_range *lr)
>  	return err;
>  }
>  
> -static int parse_perf_probe_event_name(char **arg, struct perf_probe_event *pev)
> +int parse_perf_probe_event_name(char **arg, struct perf_probe_event *pev)
>  {
>  	char *ptr;
>  
> @@ -3166,6 +3166,12 @@ static int find_cached_events_all(struct perf_probe_event *pev,
>  	return ret;
>  }
>  
> +int find_sdt_events_from_cache(struct perf_probe_event *pev,
> +			       struct probe_trace_event **tevs)
> +{
> +	return find_cached_events_all(pev, tevs);
> +}

Would we really need this wrapper? (or just rename find_sdt_events_all and expose it)

> +
>  static int find_probe_trace_events_from_cache(struct perf_probe_event *pev,
>  					      struct probe_trace_event **tevs)
>  {
> @@ -3476,3 +3482,39 @@ int copy_to_probe_trace_arg(struct probe_trace_arg *tvar,
>  		tvar->name = NULL;
>  	return 0;
>  }
> +
> +/*
> + * Record session for SDT events has ended. Delete the SDT events
> + * from uprobe_events file that were created initially.
> + */
> +void remove_sdt_event_list(struct list_head *sdt_events)
> +{
> +	struct sdt_event_list *event;
> +	struct strfilter *filter = NULL;
> +	const char *err = NULL;
> +	int ret = 0;
> +
> +	if (list_empty(sdt_events))
> +		return;
> +
> +	list_for_each_entry(event, sdt_events, list) {
> +		if (event->existing_event)
> +			continue;
> +		if (!filter) {
> +			filter = strfilter__new(event->event_info, &err);
> +			if (!filter)
> +				goto free_list;
> +		} else {
> +			ret = strfilter__or(filter, event->event_info, &err);
> +		}
> +	}
> +
> +	if (filter)
> +		ret = del_perf_probe_events(filter);
> +
> +	if (ret)
> +		pr_err("Error in deleting the SDT list\n");

I think we don't have to care about errors while removing SDT events,
since we have no exclusive lock on that, someone can use it in parallel.
We can just ignore it.

> +
> +free_list:
> +	free_sdt_list(sdt_events);
> +}
> diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
> index 5d4e940..5ec648e 100644
> --- a/tools/perf/util/probe-event.h
> +++ b/tools/perf/util/probe-event.h
> @@ -182,4 +182,8 @@ struct map *get_target_map(const char *target, bool user);
>  void arch__post_process_probe_trace_events(struct perf_probe_event *pev,
>  					   int ntevs);
>  
> +int parse_perf_probe_event_name(char **arg, struct perf_probe_event *pev);
> +
> +int find_sdt_events_from_cache(struct perf_probe_event *pev,
> +			       struct probe_trace_event **tevs);
>  #endif /*_PROBE_EVENT_H */
> diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
> index 436b647..c941f38 100644
> --- a/tools/perf/util/probe-file.c
> +++ b/tools/perf/util/probe-file.c
> @@ -27,8 +27,10 @@
>  #include "probe-event.h"
>  #include "probe-file.h"
>  #include "session.h"
> +#include "probe-finder.h"
>  
>  #define MAX_CMDLEN 256
> +#define MAX_EVENT_LENGTH 512
>  
>  static void print_open_warning(int err, bool uprobe)
>  {
> @@ -933,3 +935,377 @@ bool probe_type_is_available(enum probe_type type)
>  
>  	return ret;
>  }
> +
> +void free_sdt_list(struct list_head *sdt_events)
> +{
> +	struct sdt_event_list *tmp, *ptr;
> +
> +	if (list_empty(sdt_events))
> +		return;
> +	list_for_each_entry_safe(tmp, ptr, sdt_events, list) {
> +		list_del(&tmp->list);
> +		free(tmp->event_info);
> +		free(tmp);
> +	}
> +}
> +
> +static int alloc_exst_sdt_event(struct exst_sdt_event_list **esl)
> +{
> +	struct exst_sdt_event_list *tmp;
> +
> +	tmp = zalloc(sizeof(*tmp));
> +	if (!tmp)
> +		return -ENOMEM;
> +
> +	tmp->tev = zalloc(sizeof(struct probe_trace_event));
> +	if (!tmp->tev) {
> +		free(tmp);
> +		return -ENOMEM;
> +	}
> +
> +	tmp->match = false;
> +	*esl = tmp;
> +	return 0;
> +}
> +
> +static void free_exst_sdt_event(struct exst_sdt_event_list *esl)
> +{
> +	if (!esl)
> +		return;
> +
> +	if (esl->tev) {
> +		free(esl->tev->args);
> +		free(esl->tev);
> +	}
> +
> +	free(esl);
> +}
> +
> +static void probe_file__free_exst_sdt_list(struct exst_sdt_event_list *esl)
> +{
> +	struct list_head *pos, *q;
> +	struct exst_sdt_event_list *tmp;
> +
> +	list_for_each_safe(pos, q, &(esl->list)) {
> +		tmp = list_entry(pos, struct exst_sdt_event_list, list);
> +		free_exst_sdt_event(tmp);
> +		list_del(pos);
> +	}
> +}
> +
> +/*
> + * Look into uprobe_events file and prepare list of sdt events
> + * whose probepoint is already registered.
> + */
> +static int probe_file__get_exst_sdt_list(struct exst_sdt_event_list *esl)
> +{
> +	int fd, ret = 0;
> +	struct strlist *rawlist;
> +	struct str_node *ent;
> +	struct exst_sdt_event_list *tmp = NULL;
> +
> +	fd = probe_file__open(PF_FL_RW | PF_FL_UPROBE);
> +	if (fd < 0)
> +		return fd;
> +
> +	rawlist = probe_file__get_rawlist(fd);
> +
> +	strlist__for_each_entry(ent, rawlist) {
> +		ret = alloc_exst_sdt_event(&tmp);
> +		if (ret < 0)
> +			goto error;
> +
> +		ret = parse_probe_trace_command(ent->s, tmp->tev);
> +		if (ret < 0) {
> +			free_exst_sdt_event(tmp);
> +			goto error;
> +		}
> +
> +		if (!strncmp(tmp->tev->group, "sdt_", 4))
> +			list_add_tail(&(tmp->list), &(esl->list));
> +		else
> +			free_exst_sdt_event(tmp);
> +	}
> +	return 0;
> +
> +error:
> +	probe_file__free_exst_sdt_list(esl);
> +	return ret;
> +}
> +
> +/*
> + * Remove ith tev from pev->tevs list and shift remaining
> + * tevs(i+1 to pev->ntevs) one step.
> + */
> +static void shift_pev(struct perf_probe_event *pev, int i)
> +{
> +	int j;
> +
> +	free(pev->tevs[i].event);
> +	free(pev->tevs[i].group);
> +	free(pev->tevs[i].args);
> +	free(pev->tevs[i].point.realname);
> +	free(pev->tevs[i].point.symbol);
> +	free(pev->tevs[i].point.module);
> +
> +	/*
> +	 * If ith element is last element, no need to shift,
> +	 * just decrement pev->ntevs.
> +	 */
> +	if (i == pev->ntevs - 1)
> +		goto ret;
> +
> +	for (j = i; j < pev->ntevs - 1; j++) {
> +		pev->tevs[j].event          = pev->tevs[j + 1].event;
> +		pev->tevs[j].group          = pev->tevs[j + 1].group;
> +		pev->tevs[j].nargs          = pev->tevs[j + 1].nargs;
> +		pev->tevs[j].uprobes        = pev->tevs[j + 1].uprobes;
> +		pev->tevs[j].args           = pev->tevs[j + 1].args;
> +		pev->tevs[j].point.realname = pev->tevs[j + 1].point.realname;
> +		pev->tevs[j].point.symbol   = pev->tevs[j + 1].point.symbol;
> +		pev->tevs[j].point.module   = pev->tevs[j + 1].point.module;
> +		pev->tevs[j].point.offset   = pev->tevs[j + 1].point.offset;
> +		pev->tevs[j].point.address  = pev->tevs[j + 1].point.address;
> +		pev->tevs[j].point.retprobe = pev->tevs[j + 1].point.retprobe;
> +	}
> +
> +ret:
> +	pev->ntevs--;
> +}

This code smells no good...
Why don't you make a list of newly added probes ?

> +
> +/* Compare address and filename */
> +static bool is_sdt_match(struct probe_trace_event *tev1,
> +			 struct probe_trace_event *tev2)
> +{
> +	return (tev1->point.address == tev2->point.address &&
> +		!(strcmp(tev1->point.module, tev2->point.module)));
> +}
> +
> +/*
> + * Filter out all those pev->tevs which already exists in uprobe_events.
> + * Return 'true' if any matching entry found otherwise return 'false'.
> + */
> +static bool filter_exst_sdt_events_from_pev(struct perf_probe_event *pev,
> +					    struct exst_sdt_event_list *esl)
> +{
> +	int i;
> +	bool ret = false;
> +	struct exst_sdt_event_list *tmp;
> +
> +	list_for_each_entry(tmp, &(esl->list), list) {
> +		for (i = 0; i < pev->ntevs; i++) {
> +			if (is_sdt_match(&(pev->tevs[i]), tmp->tev)) {
> +				shift_pev(pev, i);
> +				tmp->match = true;
> +				ret = true;
> +			}
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +static int list_add_sdt_event(struct list_head *sdt_events,
> +			      bool existing_event,
> +			      struct probe_trace_event *tev)
> +{
> +	struct sdt_event_list *tmp;
> +
> +	tmp = zalloc(sizeof(*tmp));
> +	if (!tmp)
> +		return -ENOMEM;
> +
> +	tmp->existing_event = existing_event;
> +
> +	INIT_LIST_HEAD(&tmp->list);
> +	tmp->event_info = zalloc(MAX_EVENT_LENGTH * sizeof(char));
> +	if (!tmp->event_info) {
> +		free(tmp);
> +		return -ENOMEM;
> +	}
> +
> +	snprintf(tmp->event_info, strlen(tev->group) + strlen(tev->event) + 2,
> +		 "%s:%s", tev->group, tev->event);
> +
> +	list_add(&tmp->list, sdt_events);
> +
> +	return 0;
> +}
> +
> +static void print_exst_sdt_events(struct exst_sdt_event_list *tmp)
> +{
> +	static bool msg_head;
> +
> +	if (!msg_head) {
> +		pr_info("Matching event(s) from uprobe_events:\n");
> +		msg_head = true;
> +	}
> +
> +	pr_info("   %s:%s  0x%" PRIx64 "@%s\n", tmp->tev->group,
> +		tmp->tev->event, tmp->tev->point.address,
> +		tmp->tev->point.module);
> +}
> +
> +static void print_exst_sdt_event_footer(void)
> +{
> +	pr_info("Use 'perf probe -d <event>' to delete event(s).\n\n");
> +}
> +
> +/*
> + * If there is entry with the same name in uprobe_events, record it.
> + * Return value  0: no error, not found
> + *              <0: error
> + *              >0: found
> + */
> +static int probe_file__add_exst_sdt_event(struct exst_sdt_event_list *esl,
> +					  struct list_head *sdt_events,
> +					  struct perf_probe_event *pev)
> +{
> +	struct exst_sdt_event_list *tmp;
> +	int ret = 0;
> +
> +	list_for_each_entry(tmp, &(esl->list), list) {
> +		if (strcmp(tmp->tev->group, pev->group) ||
> +		    strcmp(tmp->tev->event, pev->event))
> +			continue;
> +
> +		tmp->match = true;
> +
> +		ret = list_add_sdt_event(sdt_events, true, tmp->tev);
> +		if (ret < 0)
> +			return ret;
> +
> +		print_exst_sdt_events(tmp);
> +		print_exst_sdt_event_footer();
> +		if (pev->ntevs > 1)
> +			pr_warning("Warning: Found %d events from probe-cache with name '%s:%s'.\n"
> +				"\t Since probe point already exists with this name, recording only 1 event.\n"
> +				"Hint: Please use 'perf probe -d %s:%s*' to allow record on all events.\n\n",
> +				pev->ntevs, pev->group, pev->event, pev->group, pev->event);
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +int add_sdt_event(char *event, struct list_head *sdt_events,
> +		  struct parse_events_error *err)
> +{
> +	struct perf_probe_event *pev;
> +	int ret = 0, i, ctr = 0, found = 0, exst_ctr = 0;
> +	char *str;
> +	struct exst_sdt_event_list *tmp;
> +	struct exst_sdt_event_list esl;
> +	bool filter;
> +
> +	pev = zalloc(sizeof(*pev));
> +	if (!pev)
> +		return -ENOMEM;
> +
> +	pev->sdt = true;
> +	pev->uprobes = true;
> +
> +	str = strdup(event);
> +	if (!str)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Parse event to find the group name and event name of
> +	 * the sdt event.
> +	 */
> +	ret = parse_perf_probe_event_name(&event, pev);
> +	if (ret) {
> +		pr_err("Error in parsing sdt event %s\n", str);
> +		free(pev);
> +		goto free_str;
> +	}
> +
> +	probe_conf.max_probes = MAX_PROBES;
> +	probe_conf.force_add = 1;
> +
> +	/*
> +	 * Find the sdt event from the cache. We deliberately check failure
> +	 * of this function after checking entries in uprobe_events. Because,
> +	 * we may find matching entry from uprobe_events and in that case we
> +	 * should continue recording that event.
> +	 */
> +	pev->ntevs = find_sdt_events_from_cache(pev, &pev->tevs);
> +
> +	/* Prepare list of existing sdt events from uprobe_events */
> +	INIT_LIST_HEAD(&esl.list);
> +	ret = probe_file__get_exst_sdt_list(&esl);
> +	if (ret < 0)
> +		goto free_str;
> +
> +	/* If there is entry with the same name in uprobe_events, record it. */
> +	found = probe_file__add_exst_sdt_event(&esl, sdt_events, pev);
> +	if (found) {
> +		ret = (found > 0) ? 0 : found;
> +		goto free_str;
> +	}

Curious, this should be done before searching cached SDT, so that
we can just skip listing up existing probes.

> +
> +	/* Reaching here means no matching entry found in uprobe_events. */
> +	filter = filter_exst_sdt_events_from_pev(pev, &esl);
> +	if (!filter && pev->ntevs == 0) {
> +		pr_err("%s:%s not found in the cache\n", pev->group,
> +			pev->event);
> +		ret = -EINVAL;
> +		goto free_str;
> +	} else if (pev->ntevs < 0) {
> +		err->str = strdup("Cache lookup failed.\n");
> +		ret = pev->ntevs;
> +		goto free_str;
> +	}

Again, why is the esl needed here? We can remove events by using it's name.

What I expected was,

0) add "generated_sdts" in record structure.
1) search existing events by using its name from tracing/events/.
(Don't need to touch uprobe_events)
2) if it does not exist, add new probe points and store its name in "generated_sdts".
3) record events
4) remove the events by using "generated_sdts" which only has the event name.

Thank you,

> +
> +	/* Create probe points for new events. */
> +	ret = apply_perf_probe_events(pev, 1);
> +	if (ret) {
> +		pr_err("Error in adding SDT event : %s\n", event);
> +		goto free_str;
> +	}
> +
> +	/* Add existing event names to "sdt_events" list */
> +	list_for_each_entry(tmp, &(esl.list), list) {
> +		if (!tmp->match)
> +			continue;
> +
> +		ret = list_add_sdt_event(sdt_events, true, tmp->tev);
> +		if (ret < 0)
> +			goto free_str;
> +		print_exst_sdt_events(tmp);
> +		ctr++;
> +		exst_ctr++;
> +	}
> +	if (exst_ctr)
> +		print_exst_sdt_event_footer();
> +
> +	/* Add new event names to "sdt_events" list */
> +	for (i = 0; i < pev->ntevs; i++) {
> +		ret = list_add_sdt_event(sdt_events, false, &(pev->tevs[i]));
> +		if (ret < 0)
> +			goto free_str;
> +
> +		ctr++;
> +	}
> +
> +	/* Print warning for multiple events */
> +	if (ctr > 1)
> +		pr_warning("Warning: Recording on %d occurrences of %s:%s\n",
> +			   ctr, pev->group, pev->event);
> +
> +free_str:
> +	/*
> +	 * User may ask for multiple events in the same record command like,
> +	 *    perf record -a -e sdt_lib1:* -e sdt_a:b
> +	 *
> +	 * If sdt_lib1:* events are already added and there is some failure
> +	 * for sdt_a:b, we need to clean sdt_lib1:* events from
> +	 * record.sdt_event_list
> +	 */
> +	if (ret < 0)
> +		sdt_event_list__remove();
> +
> +	free(str);
> +	probe_file__free_exst_sdt_list(&esl);
> +	cleanup_perf_probe_events(pev, 1);
> +	return ret;
> +}
> diff --git a/tools/perf/util/probe-file.h b/tools/perf/util/probe-file.h
> index eba44c3..e785eb6 100644
> --- a/tools/perf/util/probe-file.h
> +++ b/tools/perf/util/probe-file.h
> @@ -4,6 +4,7 @@
>  #include "strlist.h"
>  #include "strfilter.h"
>  #include "probe-event.h"
> +#include "parse-events.h"
>  
>  /* Cache of probe definitions */
>  struct probe_cache_entry {
> @@ -19,6 +20,28 @@ struct probe_cache {
>  	struct list_head entries;
>  };
>  
> +struct sdt_event_list {
> +	char *event_info;
> +	struct list_head list;
> +
> +	/*
> +	 * Flag to check whether this event already exists in urobe_events
> +	 * file. This helps at a time of freeing sdt_event_list.
> +	 */
> +	bool existing_event;
> +};
> +
> +/*
> + * This maintains list of sdt events which are already exists in
> + * uprobe_events file at the time of creating probe point for any
> + * sdt event with 'perf record'.
> + */
> +struct exst_sdt_event_list {
> +	struct probe_trace_event *tev;
> +	struct list_head list;
> +	bool match;	/* Does this event match with any new event */
> +};
> +
>  enum probe_type {
>  	PROBE_TYPE_U = 0,
>  	PROBE_TYPE_S,
> @@ -64,6 +87,10 @@ struct probe_cache_entry *probe_cache__find_by_name(struct probe_cache *pcache,
>  					const char *group, const char *event);
>  int probe_cache__show_all_caches(struct strfilter *filter);
>  bool probe_type_is_available(enum probe_type type);
> +int add_sdt_event(char *event, struct list_head *sdt_events,
> +		  struct parse_events_error *err);
> +void remove_sdt_event_list(struct list_head *sdt_event_list);
> +void free_sdt_list(struct list_head *sdt_events);
>  #else	/* ! HAVE_LIBELF_SUPPORT */
>  static inline struct probe_cache *probe_cache__new(const char *tgt __maybe_unused)
>  {
> -- 
> 2.9.3
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
  2017-02-28  5:45   ` Masami Hiramatsu
@ 2017-02-28 10:56     ` Ravi Bangoria
  2017-03-01  5:45       ` Masami Hiramatsu
  0 siblings, 1 reply; 11+ messages in thread
From: Ravi Bangoria @ 2017-02-28 10:56 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: mingo, acme, masami.hiramatsu.pt, brendan.d.gregg, peterz,
	alexander.shishkin, wangnan0, jolsa, ak, treeze.taeung,
	mathieu.poirier, hekuang, sukadev, ananth, naveen.n.rao,
	adrian.hunter, linux-kernel, hemant, Ravi Bangoria

Thanks Masami for the review.
Please find my replies to your comments.

On Tuesday 28 February 2017 11:15 AM, Masami Hiramatsu wrote:
> On Fri, 24 Feb 2017 13:13:25 +0530
> Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> wrote:
>
>> After invoking 'perf record', behind the scenes, it checks whether
>> the event specified is an SDT event by using the flag '%' or string
>> 'sdt_'. After that, it first checks whether event already exists
>> with that *name* in uprobe_events file. If found, it records that
>> particular event. Otherwise, it does a lookup of the probe cache to
>> find out the SDT event. If its not present, it throws an error.
>> If found, it again tries to find existing events from uprobe_events
>> file, but this time it uses *address* and *filename* for comparison.
>> Finally it writes new events into the uprobe_events file and starts
>> recording. It also maintains a list of the event names that were
>> written to uprobe_events file for this session. After finishing the
>> record session, it removes the events from the uprobe_events file
>> using the maintained name list.
> OK, the behavior looks good. However, the implementation seems
> too complex, could you make it simpler?
> I have some comments on the code, see below.

Yes, it's complex code. It's all because of multiple SDT events with same name
exists and we wants to allow both -- 'perf probe' as well as 'perf record' -- which
creates complex situations that has to be taken care. See my replies below for
such examples.

>> +/*
>> + * Remove ith tev from pev->tevs list and shift remaining
>> + * tevs(i+1 to pev->ntevs) one step.
>> + */
>> +static void shift_pev(struct perf_probe_event *pev, int i)
>> +{
>> +	int j;
>> +
>> +	free(pev->tevs[i].event);
>> +	free(pev->tevs[i].group);
>> +	free(pev->tevs[i].args);
>> +	free(pev->tevs[i].point.realname);
>> +	free(pev->tevs[i].point.symbol);
>> +	free(pev->tevs[i].point.module);
>> +
>> +	/*
>> +	 * If ith element is last element, no need to shift,
>> +	 * just decrement pev->ntevs.
>> +	 */
>> +	if (i == pev->ntevs - 1)
>> +		goto ret;
>> +
>> +	for (j = i; j < pev->ntevs - 1; j++) {
>> +		pev->tevs[j].event          = pev->tevs[j + 1].event;
>> +		pev->tevs[j].group          = pev->tevs[j + 1].group;
>> +		pev->tevs[j].nargs          = pev->tevs[j + 1].nargs;
>> +		pev->tevs[j].uprobes        = pev->tevs[j + 1].uprobes;
>> +		pev->tevs[j].args           = pev->tevs[j + 1].args;
>> +		pev->tevs[j].point.realname = pev->tevs[j + 1].point.realname;
>> +		pev->tevs[j].point.symbol   = pev->tevs[j + 1].point.symbol;
>> +		pev->tevs[j].point.module   = pev->tevs[j + 1].point.module;
>> +		pev->tevs[j].point.offset   = pev->tevs[j + 1].point.offset;
>> +		pev->tevs[j].point.address  = pev->tevs[j + 1].point.address;
>> +		pev->tevs[j].point.retprobe = pev->tevs[j + 1].point.retprobe;
>> +	}
>> +
>> +ret:
>> +	pev->ntevs--;
>> +}
> This code smells no good...
> Why don't you make a list of newly added probes ?
>

Yes. This is little ugly. Consider a following scenario.

  $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
        Provider: libpthread
        Name: mutex_entry
        Location: 0x0000000000009ddb, Base: 0x00000000000139cc, ...
      --
        Provider: libpthread
        Name: mutex_entry
        Location: 0x000000000000bcbb, Base: 0x00000000000139cc, ...

  $ perf probe sdt_libpthread:mutex_entry

  $ cat /sys/kernel/debug/tracing/uprobe_events
    p:sdt_libpthread/mutex_entry /usr/lib64/libpthread-2.24.so:0x0000000000009ddb
    p:sdt_libpthread/mutex_entry_1 /usr/lib64/libpthread-2.24.so:0x000000000000bcbb

  $ perf probe -d sdt_libpthread:mutex_entry

  $ perf record -a -e sdt_libpthread:mutex_entry
     /* This should record:
           sdt_libpthread/mutex_entry          <== New entry (delete at the end of session)
           sdt_libpthread/mutex_entry_1      <== Reuse (do not delete at the end of session)
      */

First of all, perf looks into uprobe_events for entry with sdt_libpthread:mutex_entry
*name*. As it's not present, perf does a lookup in probe-cache and gets two entries.
Perf stores them in 'pev->tevs' array. Now it again compares uprobe_events entries
with entries found from probe-cache with *filename* and *address* as comparison
parameter (because this combo makes unique identifier for any SDT event in a
particular system - please correct me if that's wrong). And perf finds one entry
sdt_libpthread/mutex_entry_1.

Keeping this entry in 'pevs->tevs' and calling apply_perf_probe_events() will add
new entry for that event even if it exists. So this code removes that particular
element of an array and shift all remaining elements one step.

Please suggest if you have a better approach.

>> +
>> +/* Compare address and filename */
>> +static bool is_sdt_match(struct probe_trace_event *tev1,
>> +			 struct probe_trace_event *tev2)
>> +{
>> +	return (tev1->point.address == tev2->point.address &&
>> +		!(strcmp(tev1->point.module, tev2->point.module)));
>> +}
>> +
...
>> +	probe_conf.max_probes = MAX_PROBES;
>> +	probe_conf.force_add = 1;
>> +
>> +	/*
>> +	 * Find the sdt event from the cache. We deliberately check failure
>> +	 * of this function after checking entries in uprobe_events. Because,
>> +	 * we may find matching entry from uprobe_events and in that case we
>> +	 * should continue recording that event.
>> +	 */
>> +	pev->ntevs = find_sdt_events_from_cache(pev, &pev->tevs);
>> +
>> +	/* Prepare list of existing sdt events from uprobe_events */
>> +	INIT_LIST_HEAD(&esl.list);
>> +	ret = probe_file__get_exst_sdt_list(&esl);
>> +	if (ret < 0)
>> +		goto free_str;
>> +
>> +	/* If there is entry with the same name in uprobe_events, record it. */
>> +	found = probe_file__add_exst_sdt_event(&esl, sdt_events, pev);
>> +	if (found) {
>> +		ret = (found > 0) ? 0 : found;
>> +		goto free_str;
>> +	}
> Curious, this should be done before searching cached SDT, so that
> we can just skip listing up existing probes.

Yes that's intentional. We warn user if event with exact name found from
uprobe_events and there are multiple instances of that event found from
probe-cache. For example,

  $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
        Provider: libpthread
        Name: mutex_entry
        Location: 0x0000000000009ddb, Base: 0x00000000000139cc, ...
      --
        Provider: libpthread
        Name: mutex_entry
        Location: 0x000000000000bcbb, Base: 0x00000000000139cc, ...

  $ cat /sys/kernel/debug/tracing/uprobe_events
    p:sdt_libpthread/mutex_entry /usr/lib64/libpthread-2.24.so:0x0000000000009ddb

  $ perf record -a -e sdt_libpthread/mutex_entry
    Warning: Found 2 events from probe-cache with name 'sdt_libpthread:mutex_entry'.
             Since probe point already exists with this name, recording only 1 event.
    Hint: Please use 'perf probe -d sdt_libpthread:mutex_entry*' to allow record on all events.

>> +
>> +	/* Reaching here means no matching entry found in uprobe_events. */
>> +	filter = filter_exst_sdt_events_from_pev(pev, &esl);
>> +	if (!filter && pev->ntevs == 0) {
>> +		pr_err("%s:%s not found in the cache\n", pev->group,
>> +			pev->event);
>> +		ret = -EINVAL;
>> +		goto free_str;
>> +	} else if (pev->ntevs < 0) {
>> +		err->str = strdup("Cache lookup failed.\n");
>> +		ret = pev->ntevs;
>> +		goto free_str;
>> +	}
> Again, why is the esl needed here? We can remove events by using it's name.
>
> What I expected was,
>
> 0) add "generated_sdts" in record structure.
> 1) search existing events by using its name from tracing/events/.
> (Don't need to touch uprobe_events)

I didn't get your point to look at tracing/events/. It's easy to look into
uprobe_events, because perf has a builtin functions that easily transform
entries of uprobe_events to 'tevs'. And uprobe_events has all data like
eventname, filename, address which are crucial for this implementation.

> 2) if it does not exist, add new probe points and store its name in "generated_sdts".

This step is little difficult. As I mentioned earlier, major issue is, event
exists in uprobe_events but with different name. In such case I've to
reuse existing entries and also have to add new entries for events
which are absent.

> 3) record events
> 4) remove the events by using "generated_sdts" which only has the event name.
>

Let me know if I've misunderstood any of your point.

Thanks,
-Ravi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
  2017-02-24  7:43 ` [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record' Ravi Bangoria
  2017-02-28  5:45   ` Masami Hiramatsu
@ 2017-02-28 22:31   ` Brendan Gregg
  2017-03-01  5:04     ` Brendan Gregg
  1 sibling, 1 reply; 11+ messages in thread
From: Brendan Gregg @ 2017-02-28 22:31 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Masami Hiramatsu,
	Peter Zijlstra, Alexander Shishkin, Wang Nan, Jiri Olsa,
	Andi Kleen, treeze.taeung, Mathieu Poirier, He Kuang, sukadev,
	ananth, Naveen N. Rao, Adrian Hunter, LKML, Hemant Kumar,
	Masami Hiramatsu

G'Day Ravi,

On Thu, Feb 23, 2017 at 11:43 PM, Ravi Bangoria
<ravi.bangoria@linux.vnet.ibm.com> wrote:
>
> From: Hemant Kumar <hemant@linux.vnet.ibm.com>
>
> Add support for directly recording SDT events which are present in
> the probe cache. Without this patch, we could probe into SDT events
> using 'perf probe' and 'perf record'. With this patch, we can probe
> the SDT events directly using 'perf record'.
>
> For example :
>
>   $ perf list sdt
>     sdt_libpthread:mutex_entry                         [SDT event]
>     sdt_libc:setjmp                                    [SDT event]
>     ...
>
>   $ perf record -a -e sdt_libc:setjmp
>     ^C[ perf record: Woken up 1 times to write data ]
>     [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ]
>
>   $ perf script
>          bash   803 [002]  6492.190311: sdt_libc:setjmp: (7f1d503b56a1)
>         login   488 [001]  6496.791583: sdt_libc:setjmp: (7ff3013d56a1)
>       fprintd 11038 [003]  6496.808032: sdt_libc:setjmp: (7fdedf5936a1)
> [...]


Thanks, I like the usage. I ran into trouble testing on Node.js:

# ./perf buildid-cache --add `which node`
# ./perf list | grep sdt_node
  sdt_node:gc__done                                  [SDT event]
  sdt_node:gc__start                                 [SDT event]
  sdt_node:http__client__request                     [SDT event]
  sdt_node:http__client__response                    [SDT event]
  sdt_node:http__server__request                     [SDT event]
  sdt_node:http__server__response                    [SDT event]
  sdt_node:net__server__connection                   [SDT event]
  sdt_node:net__stream__end                          [SDT event]
# ./perf record -e sdt_node:http__server__request -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.308 MB perf.data ]
# ./perf script
#

No events. I can see it had set it up:

# cat /sys/kernel/debug/tracing/uprobe_events
p:sdt_node/http__server__request /usr/local/bin/node:0x00000000009c2e69

Ok. Am I sure my workload is working? Trying from bcc/eBPF:

# /mnt/src/bcc/tools/trace.py 'u:node:http__server__request'
failed to enable probe 'http__server__request'; a possible cause can
be that the probe requires a pid to enable
# /mnt/src/bcc/tools/trace.py -p `pgrep node` 'u:node:http__server__request'
In file included from /virtual/main.c:41:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/blkdev.h:9:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/genhd.h:64:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/device.h:24:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/pinctrl/devinfo.h:21:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/pinctrl/consumer.h:17:
In file included from
/lib/modules/4.10.0-rc8-virtual/build/include/linux/seq_file.h:10:
/lib/modules/4.10.0-rc8-virtual/build/include/linux/fs.h:2648:9:
warning: comparison of unsigned enum expression < 0 is always false
[-Wtautological-compare]
        if (id < 0 || id >= READING_MAX_ID)
            ~~ ^ ~
1 warning generated.
PID    TID    COMM         FUNC
7646   7646   node         http__server__request
7646   7646   node         http__server__request
7646   7646   node         http__server__request
^C

(ignore the warning; I just asked lkml about it). So that works. It
instrumented:

# cat /sys/kernel/debug/tracing/uprobe_events
p:uprobes/p__usr_local_bin_node_0x9c2e69_bcc_25410
/usr/local/bin/node:0x00000000009c2e69

Now retrying perf:

# ./perf record -e sdt_node:http__server__request -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ]
# ./perf script
            node  7646 [002]   361.012364:
sdt_node:http__server__request: (dc2e69)
            node  7646 [002]   361.204718:
sdt_node:http__server__request: (dc2e69)
            node  7646 [002]   361.363043:
sdt_node:http__server__request: (dc2e69)

Now perf works.

If I restart the node process, it goes back to the broken state.

This doesn't fix it either:

# ./perf probe sdt_node:http__server__request
Added new event:
  sdt_node:http__server__request (on %http__server__request in
/usr/local/bin/node)

You can now use it in all perf tools, such as:

    perf record -e sdt_node:http__server__request -aR sleep 1

Hint: SDT event can be directly recorded with 'perf record'. No need
to create probe manually.

Brendan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
  2017-02-28 22:31   ` Brendan Gregg
@ 2017-03-01  5:04     ` Brendan Gregg
  2017-03-01  5:45       ` Ravi Bangoria
  0 siblings, 1 reply; 11+ messages in thread
From: Brendan Gregg @ 2017-03-01  5:04 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Alexander Shishkin, Wang Nan, Jiri Olsa, Andi Kleen,
	treeze.taeung, Mathieu Poirier, He Kuang, sukadev, ananth,
	Naveen N. Rao, Adrian Hunter, LKML, Hemant Kumar,
	Masami Hiramatsu

On Tue, Feb 28, 2017 at 2:31 PM, Brendan Gregg
<brendan.d.gregg@gmail.com> wrote:
> G'Day Ravi,
>
[...]
> Now retrying perf:
>
> # ./perf record -e sdt_node:http__server__request -a
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ]
> # ./perf script
>             node  7646 [002]   361.012364:
> sdt_node:http__server__request: (dc2e69)
>             node  7646 [002]   361.204718:
> sdt_node:http__server__request: (dc2e69)
>             node  7646 [002]   361.363043:
> sdt_node:http__server__request: (dc2e69)
>
> Now perf works.
>
> If I restart the node process, it goes back to the broken state.
>

Oh sorry, I forgot about that these Node.js probes are behind an
is-enabled semaphore.

$ readelf -n `which node`
[...]
  stapsdt              0x00000089    NT_STAPSDT (SystemTap probe descriptors)
    Provider: node
    Name: http__server__request
    Location: 0x0000000000dc2e69, Base: 0x000000000112e064, Semaphore:
0x0000000001470954
    Arguments: 8@%r14 8@%rax 8@-4344(%rbp) -4@-4348(%rbp)
8@-4304(%rbp) 8@-4312(%rbp) -4@-4352(%rbp)
# dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x0000000001470954 ))
2>/dev/null | xxd
00000000: 00                                       .
 # printf "\x1" | dd of=/proc/31695/mem bs=1 count=1 seek=$((
0x0000000001470954 )) 2>/dev/null
# dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x0000000001470954 ))
2>/dev/null | xxd
00000000: 01                                       .
# ./perf record -e sdt_node:http__server__request -a
Matching event(s) from uprobe_events:
   sdt_node:http__server__request  0x9c2e69@/usr/local/bin/node
Use 'perf probe -d <event>' to delete event(s).
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.280 MB perf.data (3 samples) ]
# ./perf script
            node 31695 [003] 24947.168761:
sdt_node:http__server__request: (dc2e69)
            node 31695 [003] 24947.476143:
sdt_node:http__server__request: (dc2e69)
            node 31695 [003] 24947.679090:
sdt_node:http__server__request: (dc2e69)

So setting that to 1 made the probe work from perf. I guess this is
not a problem with this patch set, but rather a feature request for
the next one: is-enabled SDT support.

Were probe arguments supposed to work? I don't notice them in the perf
script output.

PS, if it's helpful, here's the commands to build node with these SDT probes:

$ sudo apt-get install systemtap-sdt-dev       # adds "dtrace", used
by node build
$ wget https://nodejs.org/dist/v4.4.1/node-v4.4.1.tar.gz
$ tar xvf node-v4.4.1.tar.gz
$ cd node-v4.4.1
$ ./configure --with-dtrace
$ make -j 8

Brendan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
  2017-02-28 10:56     ` Ravi Bangoria
@ 2017-03-01  5:45       ` Masami Hiramatsu
  0 siblings, 0 replies; 11+ messages in thread
From: Masami Hiramatsu @ 2017-03-01  5:45 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: mingo, acme, masami.hiramatsu.pt, brendan.d.gregg, peterz,
	alexander.shishkin, wangnan0, jolsa, ak, treeze.taeung,
	mathieu.poirier, hekuang, sukadev, ananth, naveen.n.rao,
	adrian.hunter, linux-kernel, hemant

On Tue, 28 Feb 2017 16:26:19 +0530
Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> wrote:

> Thanks Masami for the review.
> Please find my replies to your comments.
> 
> On Tuesday 28 February 2017 11:15 AM, Masami Hiramatsu wrote:
> > On Fri, 24 Feb 2017 13:13:25 +0530
> > Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> wrote:
> >
> >> After invoking 'perf record', behind the scenes, it checks whether
> >> the event specified is an SDT event by using the flag '%' or string
> >> 'sdt_'. After that, it first checks whether event already exists
> >> with that *name* in uprobe_events file. If found, it records that
> >> particular event. Otherwise, it does a lookup of the probe cache to
> >> find out the SDT event. If its not present, it throws an error.
> >> If found, it again tries to find existing events from uprobe_events
> >> file, but this time it uses *address* and *filename* for comparison.
> >> Finally it writes new events into the uprobe_events file and starts
> >> recording. It also maintains a list of the event names that were
> >> written to uprobe_events file for this session. After finishing the
> >> record session, it removes the events from the uprobe_events file
> >> using the maintained name list.
> > OK, the behavior looks good. However, the implementation seems
> > too complex, could you make it simpler?
> > I have some comments on the code, see below.
> 
> Yes, it's complex code. It's all because of multiple SDT events with same name
> exists and we wants to allow both -- 'perf probe' as well as 'perf record' -- which
> creates complex situations that has to be taken care. See my replies below for
> such examples.

OK, but in that case, could you split those support for review?
E.g. the first one just combine perf-probe and perf-record and
second one support multiple SDT events.


> 
> >> +/*
> >> + * Remove ith tev from pev->tevs list and shift remaining
> >> + * tevs(i+1 to pev->ntevs) one step.
> >> + */
> >> +static void shift_pev(struct perf_probe_event *pev, int i)
> >> +{
> >> +	int j;
> >> +
> >> +	free(pev->tevs[i].event);
> >> +	free(pev->tevs[i].group);
> >> +	free(pev->tevs[i].args);
> >> +	free(pev->tevs[i].point.realname);
> >> +	free(pev->tevs[i].point.symbol);
> >> +	free(pev->tevs[i].point.module);
> >> +
> >> +	/*
> >> +	 * If ith element is last element, no need to shift,
> >> +	 * just decrement pev->ntevs.
> >> +	 */
> >> +	if (i == pev->ntevs - 1)
> >> +		goto ret;
> >> +
> >> +	for (j = i; j < pev->ntevs - 1; j++) {
> >> +		pev->tevs[j].event          = pev->tevs[j + 1].event;
> >> +		pev->tevs[j].group          = pev->tevs[j + 1].group;
> >> +		pev->tevs[j].nargs          = pev->tevs[j + 1].nargs;
> >> +		pev->tevs[j].uprobes        = pev->tevs[j + 1].uprobes;
> >> +		pev->tevs[j].args           = pev->tevs[j + 1].args;
> >> +		pev->tevs[j].point.realname = pev->tevs[j + 1].point.realname;
> >> +		pev->tevs[j].point.symbol   = pev->tevs[j + 1].point.symbol;
> >> +		pev->tevs[j].point.module   = pev->tevs[j + 1].point.module;
> >> +		pev->tevs[j].point.offset   = pev->tevs[j + 1].point.offset;
> >> +		pev->tevs[j].point.address  = pev->tevs[j + 1].point.address;
> >> +		pev->tevs[j].point.retprobe = pev->tevs[j + 1].point.retprobe;
> >> +	}
> >> +
> >> +ret:
> >> +	pev->ntevs--;
> >> +}
> > This code smells no good...
> > Why don't you make a list of newly added probes ?
> >
> 
> Yes. This is little ugly. Consider a following scenario.
> 
>   $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
>         Provider: libpthread
>         Name: mutex_entry
>         Location: 0x0000000000009ddb, Base: 0x00000000000139cc, ...
>       --
>         Provider: libpthread
>         Name: mutex_entry
>         Location: 0x000000000000bcbb, Base: 0x00000000000139cc, ...
> 
>   $ perf probe sdt_libpthread:mutex_entry
> 
>   $ cat /sys/kernel/debug/tracing/uprobe_events
>     p:sdt_libpthread/mutex_entry /usr/lib64/libpthread-2.24.so:0x0000000000009ddb
>     p:sdt_libpthread/mutex_entry_1 /usr/lib64/libpthread-2.24.so:0x000000000000bcbb
> 
>   $ perf probe -d sdt_libpthread:mutex_entry
> 
>   $ perf record -a -e sdt_libpthread:mutex_entry
>      /* This should record:
>            sdt_libpthread/mutex_entry          <== New entry (delete at the end of session)
>            sdt_libpthread/mutex_entry_1      <== Reuse (do not delete at the end of session)
>       */

Hmm, OK. but this also not becomes the reason why you need 2 different
lists. After calling apply_perf_probe_events(), pev->tevs have been
updated, which have applied names. So you can pull it from the trace_event.

> 
> First of all, perf looks into uprobe_events for entry with sdt_libpthread:mutex_entry
> *name*. As it's not present, perf does a lookup in probe-cache and gets two entries.
> Perf stores them in 'pev->tevs' array. Now it again compares uprobe_events entries
> with entries found from probe-cache with *filename* and *address* as comparison
> parameter (because this combo makes unique identifier for any SDT event in a
> particular system - please correct me if that's wrong). And perf finds one entry
> sdt_libpthread/mutex_entry_1.
> 
> Keeping this entry in 'pevs->tevs' and calling apply_perf_probe_events() will add
> new entry for that event even if it exists. So this code removes that particular
> element of an array and shift all remaining elements one step.
> 
> Please suggest if you have a better approach.

As I described above, you can get actual name of newly added event after calling
apply_perf_probe_events(). (if not, please feel free to update it to do so.)


> 
> >> +
> >> +/* Compare address and filename */
> >> +static bool is_sdt_match(struct probe_trace_event *tev1,
> >> +			 struct probe_trace_event *tev2)
> >> +{
> >> +	return (tev1->point.address == tev2->point.address &&
> >> +		!(strcmp(tev1->point.module, tev2->point.module)));
> >> +}
> >> +
> ...
> >> +	probe_conf.max_probes = MAX_PROBES;
> >> +	probe_conf.force_add = 1;
> >> +
> >> +	/*
> >> +	 * Find the sdt event from the cache. We deliberately check failure
> >> +	 * of this function after checking entries in uprobe_events. Because,
> >> +	 * we may find matching entry from uprobe_events and in that case we
> >> +	 * should continue recording that event.
> >> +	 */
> >> +	pev->ntevs = find_sdt_events_from_cache(pev, &pev->tevs);
> >> +
> >> +	/* Prepare list of existing sdt events from uprobe_events */
> >> +	INIT_LIST_HEAD(&esl.list);
> >> +	ret = probe_file__get_exst_sdt_list(&esl);
> >> +	if (ret < 0)
> >> +		goto free_str;
> >> +
> >> +	/* If there is entry with the same name in uprobe_events, record it. */
> >> +	found = probe_file__add_exst_sdt_event(&esl, sdt_events, pev);
> >> +	if (found) {
> >> +		ret = (found > 0) ? 0 : found;
> >> +		goto free_str;
> >> +	}
> > Curious, this should be done before searching cached SDT, so that
> > we can just skip listing up existing probes.
> 
> Yes that's intentional. We warn user if event with exact name found from
> uprobe_events and there are multiple instances of that event found from
> probe-cache. For example,
> 
>   $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
>         Provider: libpthread
>         Name: mutex_entry
>         Location: 0x0000000000009ddb, Base: 0x00000000000139cc, ...
>       --
>         Provider: libpthread
>         Name: mutex_entry
>         Location: 0x000000000000bcbb, Base: 0x00000000000139cc, ...
> 
>   $ cat /sys/kernel/debug/tracing/uprobe_events
>     p:sdt_libpthread/mutex_entry /usr/lib64/libpthread-2.24.so:0x0000000000009ddb
> 
>   $ perf record -a -e sdt_libpthread/mutex_entry
>     Warning: Found 2 events from probe-cache with name 'sdt_libpthread:mutex_entry'.
>              Since probe point already exists with this name, recording only 1 event.
>     Hint: Please use 'perf probe -d sdt_libpthread:mutex_entry*' to allow record on all events.

OK, but could you split this out to another patch?

> 
> >> +
> >> +	/* Reaching here means no matching entry found in uprobe_events. */
> >> +	filter = filter_exst_sdt_events_from_pev(pev, &esl);
> >> +	if (!filter && pev->ntevs == 0) {
> >> +		pr_err("%s:%s not found in the cache\n", pev->group,
> >> +			pev->event);
> >> +		ret = -EINVAL;
> >> +		goto free_str;
> >> +	} else if (pev->ntevs < 0) {
> >> +		err->str = strdup("Cache lookup failed.\n");
> >> +		ret = pev->ntevs;
> >> +		goto free_str;
> >> +	}
> > Again, why is the esl needed here? We can remove events by using it's name.
> >
> > What I expected was,
> >
> > 0) add "generated_sdts" in record structure.
> > 1) search existing events by using its name from tracing/events/.
> > (Don't need to touch uprobe_events)
> 
> I didn't get your point to look at tracing/events/. It's easy to look into
> uprobe_events, because perf has a builtin functions that easily transform
> entries of uprobe_events to 'tevs'. And uprobe_events has all data like
> eventname, filename, address which are crucial for this implementation.
> 
> > 2) if it does not exist, add new probe points and store its name in "generated_sdts".
> 
> This step is little difficult. As I mentioned earlier, major issue is, event
> exists in uprobe_events but with different name. In such case I've to
> reuse existing entries and also have to add new entries for events
> which are absent.

OK, I see. But could you break this patch into a series? At first, we just
need a name-based event setting, and checking event definition should be done
in following patches in the series. One patch should solve one thing, that
help us to review it smoother.

Thank  you,

> 
> > 3) record events
> > 4) remove the events by using "generated_sdts" which only has the event name.
> >
> 
> Let me know if I've misunderstood any of your point.
> 
> Thanks,
> -Ravi
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
  2017-03-01  5:04     ` Brendan Gregg
@ 2017-03-01  5:45       ` Ravi Bangoria
  0 siblings, 0 replies; 11+ messages in thread
From: Ravi Bangoria @ 2017-03-01  5:45 UTC (permalink / raw)
  To: Brendan Gregg
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Alexander Shishkin, Wang Nan, Jiri Olsa, Andi Kleen,
	treeze.taeung, Mathieu Poirier, He Kuang, sukadev, ananth,
	Naveen N. Rao, Adrian Hunter, LKML, Hemant Kumar,
	Masami Hiramatsu, Ravi Bangoria

Thank you Brendan for reviewing,

On Wednesday 01 March 2017 10:34 AM, Brendan Gregg wrote:
> On Tue, Feb 28, 2017 at 2:31 PM, Brendan Gregg
> <brendan.d.gregg@gmail.com> wrote:
>> G'Day Ravi,
>>
> [...]
>> Now retrying perf:
>>
>> # ./perf record -e sdt_node:http__server__request -a
>> ^C[ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.446 MB perf.data (3 samples) ]
>> # ./perf script
>>             node  7646 [002]   361.012364:
>> sdt_node:http__server__request: (dc2e69)
>>             node  7646 [002]   361.204718:
>> sdt_node:http__server__request: (dc2e69)
>>             node  7646 [002]   361.363043:
>> sdt_node:http__server__request: (dc2e69)
>>
>> Now perf works.
>>
>> If I restart the node process, it goes back to the broken state.
>>
> Oh sorry, I forgot about that these Node.js probes are behind an
> is-enabled semaphore.

Yes. Perf does not support "is-enabled" markers yet.

> $ readelf -n `which node`
> [...]
>   stapsdt              0x00000089    NT_STAPSDT (SystemTap probe descriptors)
>     Provider: node
>     Name: http__server__request
>     Location: 0x0000000000dc2e69, Base: 0x000000000112e064, Semaphore:
> 0x0000000001470954
>     Arguments: 8@%r14 8@%rax 8@-4344(%rbp) -4@-4348(%rbp)
> 8@-4304(%rbp) 8@-4312(%rbp) -4@-4352(%rbp)
> # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x0000000001470954 ))
> 2>/dev/null | xxd
> 00000000: 00                                       .
>  # printf "\x1" | dd of=/proc/31695/mem bs=1 count=1 seek=$((
> 0x0000000001470954 )) 2>/dev/null
> # dd if=/proc/31695/mem bs=1 count=1 skip=$(( 0x0000000001470954 ))
> 2>/dev/null | xxd
> 00000000: 01                                       .
> # ./perf record -e sdt_node:http__server__request -a
> Matching event(s) from uprobe_events:
>    sdt_node:http__server__request  0x9c2e69@/usr/local/bin/node
> Use 'perf probe -d <event>' to delete event(s).
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.280 MB perf.data (3 samples) ]
> # ./perf script
>             node 31695 [003] 24947.168761:
> sdt_node:http__server__request: (dc2e69)
>             node 31695 [003] 24947.476143:
> sdt_node:http__server__request: (dc2e69)
>             node 31695 [003] 24947.679090:
> sdt_node:http__server__request: (dc2e69)
>
> So setting that to 1 made the probe work from perf. I guess this is
> not a problem with this patch set, but rather a feature request for
> the next one: is-enabled SDT support.

Yes. Actually I'm thinking about how this can be accomplished. Perf is a userspace
tool and, unlike systemtap, it cannot change semaphore value easily. This is what
I was thinking:

'perf record', at the start of session will increments semaphore in /proc/<pid>/mem.
and at the end of session, perf will decrement it (same as bcc). This does not require
any support from kernel infrastructure. But there are challenges with this approach:

1. What if user starts workload after starting 'perf record'. How perf will be able
    to increment semaphore value.

2. Systemwide record. We have to loop over all pids and check if any process is
   using SDT with semaphore that is being recorded.

3. Dynamic library loading. How to handle SDT probes in library that is not loaded
   at the time of 'perf record'?

Please let me know your thoughts.

> Were probe arguments supposed to work? I don't notice them in the perf
> script output.

Not yet. Alexis[1] (and followed by me[2]) had sent patches for that. Please
have a look at them.

[1] https://lkml.org/lkml/2016/12/13/784
[2] https://lkml.org/lkml/2017/2/2/145


So, why perf is able to record data after recording them with bcc?

Ideally, bcc should increment semaphore value at the start of session and
it should decrement at the end of the session. So after bcc process exits,
semaphore value should be zero. But actually it's not happening.

I've seen this when I was experimenting bcc with is-enabled markers.
See this example,

  $ readelf -n /usr/bin/node | grep -A2 Provider
      Provider: node
      Name: http__server__request
      Location: 0x0000000000e5f484, Base: 0x00000000011a0bc4, Semaphore: 0x0000000001558cf2

  $ sudo cat /proc/1426/maps
    00400000-01306000 r-xp 00000000 08:02 1083365    /usr/bin/node
    01506000-01551000 r--p 00f06000 08:02 1083365    /usr/bin/node
    01551000-01559000 rw-p 00f51000 08:02 1083365    /usr/bin/node
    ...

  [TERMINAL-1]$ gdb 1426
    (gdb) x/1 0x1558cf2
    0x1558cf2:    0

  [TERMINAL-2]$ sudo ./trace.py -p 1426 'u:node:http__server__request'
    PID    TID    COMM         FUNC
    /* Do not exit yet. */

  [TERMINAL-1]
    (gdb) x/1 0x1558cf2
    0x1558cf2:    2

  [TERMINAL-2]
     ^C         /* Exit bcc trace.py */

  [TERMINAL-1]
    (gdb) x/1 0x1558cf2
    0x1558cf2:    2

Here it's maintaining value 2 as it is. it should be 0. I suspect this is a bug in
bcc. Please let me know if I'm understanding it wrong.

>
> PS, if it's helpful, here's the commands to build node with these SDT probes:
>
> $ sudo apt-get install systemtap-sdt-dev       # adds "dtrace", used
> by node build
> $ wget https://nodejs.org/dist/v4.4.1/node-v4.4.1.tar.gz
> $ tar xvf node-v4.4.1.tar.gz
> $ cd node-v4.4.1
> $ ./configure --with-dtrace
> $ make -j 8

Thanks for this. :)
-Ravi

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-03-01 11:01 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-24  7:43 [PATCH v3 0/2] perf/sdt: Directly record SDT events with 'perf record' Ravi Bangoria
2017-02-24  7:43 ` [PATCH v3 1/2] perf/sdt: Introduce util func is_sdt_event() Ravi Bangoria
2017-02-28  1:34   ` Masami Hiramatsu
2017-02-24  7:43 ` [PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record' Ravi Bangoria
2017-02-28  5:45   ` Masami Hiramatsu
2017-02-28 10:56     ` Ravi Bangoria
2017-03-01  5:45       ` Masami Hiramatsu
2017-02-28 22:31   ` Brendan Gregg
2017-03-01  5:04     ` Brendan Gregg
2017-03-01  5:45       ` Ravi Bangoria
2017-02-27  4:54 ` [PATCH v3 0/2] " Ravi Bangoria

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.