linux-perf-users.vger.kernel.org archive mirror
* [PATCH RFC 00/11] perf script: Add API for filtering via dynamically loaded shared object
@ 2021-06-21 15:05 Adrian Hunter
  2021-06-21 15:05 ` [PATCH RFC 01/11] perf script: Move filter_cpu() earlier Adrian Hunter
                   ` (10 more replies)
  0 siblings, 11 replies; 17+ messages in thread
From: Adrian Hunter @ 2021-06-21 15:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

Hi
 
In some cases, users want to filter very large amounts of data (e.g. from
AUX area tracing like Intel PT) looking for something specific. While a
scripting language such as Python can be used, Python is 10 to 20 times
slower than C. So define a C API so that custom filters can be written in C,
built as a shared object, and loaded by perf script.

There are 3 preparation patches.  The main patch is patch 4.

The other patches add more functionality, except for patch 6 which installs
the C API header file.


Adrian Hunter (11):
      perf script: Move filter_cpu() earlier
      perf script: Move filtering before scripting
      perf script: Share addr_al between functions
      perf script: Add API for filtering via dynamically loaded shared object
      perf script: Add dlfilter__filter_event_early()
      perf build: Install perf_dlfilter.h
      perf dlfilter: Add resolve_address() to perf_dlfilter_fns
      perf dlfilter: Add insn() to perf_dlfilter_fns
      perf dlfilter: Add srcline() to perf_dlfilter_fns
      perf dlfilter: Add attr() to perf_dlfilter_fns
      perf dlfilter: Add object_code() to perf_dlfilter_fns

 tools/perf/Documentation/perf-dlfilter.txt | 235 +++++++++++++++
 tools/perf/Documentation/perf-script.txt   |   7 +-
 tools/perf/Makefile.config                 |   3 +
 tools/perf/Makefile.perf                   |   4 +-
 tools/perf/builtin-script.c                |  93 ++++--
 tools/perf/util/Build                      |   1 +
 tools/perf/util/dlfilter.c                 | 469 +++++++++++++++++++++++++++++
 tools/perf/util/dlfilter.h                 |  91 ++++++
 tools/perf/util/perf_dlfilter.h            | 139 +++++++++
 9 files changed, 1015 insertions(+), 27 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-dlfilter.txt
 create mode 100644 tools/perf/util/dlfilter.c
 create mode 100644 tools/perf/util/dlfilter.h
 create mode 100644 tools/perf/util/perf_dlfilter.h


Regards
Adrian


* [PATCH RFC 01/11] perf script: Move filter_cpu() earlier
  2021-06-21 15:05 [PATCH RFC 00/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
@ 2021-06-21 15:05 ` Adrian Hunter
  2021-06-22 18:16   ` Arnaldo Carvalho de Melo
  2021-06-21 15:05 ` [PATCH RFC 02/11] perf script: Move filtering before scripting Adrian Hunter
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2021-06-21 15:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

Generally, it should be more efficient to call filter_cpu() before
machine__resolve(), because filter_cpu() is much less code than
machine__resolve() and samples from unwanted CPUs are then dropped without
being resolved at all.
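
As a rough illustration of the ordering argument (not the perf code itself),
the sketch below runs a cheap CPU test before a stand-in for the costly
resolution step and counts how often the costly step runs. All names
prefixed with cpu_ are hypothetical and exist only for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sample; this is not the perf_sample layout. */
struct cpu_sample {
	int cpu;
	unsigned long long ip;
};

/* Counts calls to the expensive step so the effect of ordering is visible. */
static unsigned long cpu_resolve_calls;

/* Cheap test: true means the sample is from a CPU that was not requested. */
static bool cpu_filter(const struct cpu_sample *s, int wanted_cpu)
{
	return s->cpu != wanted_cpu;
}

/* Stand-in for the costly address resolution step. */
static int cpu_resolve(const struct cpu_sample *s)
{
	cpu_resolve_calls++;
	return s->ip ? 0 : -1;
}

/* Returns the number of samples kept; the cheap filter is tested first. */
static size_t cpu_process(const struct cpu_sample *samples, size_t n,
			  int wanted_cpu)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (cpu_filter(&samples[i], wanted_cpu))
			continue;	/* rejected before any resolution work */
		if (cpu_resolve(&samples[i]) < 0)
			continue;
		kept++;
	}
	return kept;
}
```

With the filter first, cpu_resolve() runs only for samples on the wanted CPU,
which is the saving the numbers below are meant to show.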

Example:

 $ perf record --sample-cpu -- make -C tools/perf >/dev/null

Before:

 $ perf stat -- perf script -C 0 >/dev/null

  Performance counter stats for 'perf script -C 0':

            116.94 msec task-clock                #    0.992 CPUs utilized
                 2      context-switches          #   17.103 /sec
                 0      cpu-migrations            #    0.000 /sec
             8,187      page-faults               #   70.011 K/sec
       478,351,812      cycles                    #    4.091 GHz
       564,785,464      instructions              #    1.18  insn per cycle
       114,341,105      branches                  #  977.789 M/sec
         2,615,495      branch-misses             #    2.29% of all branches

       0.117840576 seconds time elapsed

       0.085040000 seconds user
       0.032396000 seconds sys

After:

 $ perf stat -- perf script -C 0 >/dev/null

  Performance counter stats for 'perf script -C 0':

            107.45 msec task-clock                #    0.992 CPUs utilized
                 3      context-switches          #   27.919 /sec
                 0      cpu-migrations            #    0.000 /sec
             7,964      page-faults               #   74.117 K/sec
       438,417,260      cycles                    #    4.080 GHz
       522,571,855      instructions              #    1.19  insn per cycle
       105,187,488      branches                  #  978.921 M/sec
         2,356,261      branch-misses             #    2.24% of all branches

       0.108282546 seconds time elapsed

       0.095935000 seconds user
       0.011991000 seconds sys

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-script.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 57488d60b64a..08a2b5d51018 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2191,6 +2191,9 @@ static int process_sample_event(struct perf_tool *tool,
 		return 0;
 	}
 
+	if (filter_cpu(sample))
+		return 0;
+
 	if (machine__resolve(machine, &al, sample) < 0) {
 		pr_err("problem processing %d event, skipping it.\n",
 		       event->header.type);
@@ -2200,9 +2203,6 @@ static int process_sample_event(struct perf_tool *tool,
 	if (al.filtered)
 		goto out_put;
 
-	if (filter_cpu(sample))
-		goto out_put;
-
 	if (scripting_ops) {
 		struct addr_location *addr_al_ptr = NULL;
 		struct addr_location addr_al;
-- 
2.17.1



* [PATCH RFC 02/11] perf script: Move filtering before scripting
  2021-06-21 15:05 [PATCH RFC 00/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
  2021-06-21 15:05 ` [PATCH RFC 01/11] perf script: Move filter_cpu() earlier Adrian Hunter
@ 2021-06-21 15:05 ` Adrian Hunter
  2021-06-22 18:18   ` Arnaldo Carvalho de Melo
  2021-06-21 15:05 ` [PATCH RFC 03/11] perf script: Share addr_al between functions Adrian Hunter
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2021-06-21 15:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

To make it possible to use filtering with scripts, move filtering before
scripting.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-script.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 08a2b5d51018..ff7b43899f2e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1984,12 +1984,6 @@ static void process_event(struct perf_script *script,
 	if (output[type].fields == 0)
 		return;
 
-	if (!show_event(sample, evsel, thread, al))
-		return;
-
-	if (evswitch__discard(&script->evswitch, evsel))
-		return;
-
 	++es->samples;
 
 	perf_sample__fprintf_start(script, sample, thread, evsel,
@@ -2203,6 +2197,12 @@ static int process_sample_event(struct perf_tool *tool,
 	if (al.filtered)
 		goto out_put;
 
+	if (!show_event(sample, evsel, al.thread, &al))
+		goto out_put;
+
+	if (evswitch__discard(&scr->evswitch, evsel))
+		goto out_put;
+
 	if (scripting_ops) {
 		struct addr_location *addr_al_ptr = NULL;
 		struct addr_location addr_al;
-- 
2.17.1



* [PATCH RFC 03/11] perf script: Share addr_al between functions
  2021-06-21 15:05 [PATCH RFC 00/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
  2021-06-21 15:05 ` [PATCH RFC 01/11] perf script: Move filter_cpu() earlier Adrian Hunter
  2021-06-21 15:05 ` [PATCH RFC 02/11] perf script: Move filtering before scripting Adrian Hunter
@ 2021-06-21 15:05 ` Adrian Hunter
  2021-06-22 18:21   ` Arnaldo Carvalho de Melo
  2021-06-21 15:05 ` [PATCH RFC 04/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2021-06-21 15:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

Share the addr_location of 'addr' so that it need not be resolved more than
once.
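
The sharing relies on a "resolve on first use" pattern: the caller marks
addr_al as unresolved by setting its thread pointer to NULL, and each user
resolves it only if that is still the case. A minimal sketch of the pattern,
using hypothetical lazy_ stand-ins rather than the real perf structures:

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the real struct thread / addr_location. */
struct lazy_thread {
	int tid;
};

struct lazy_addr_location {
	struct lazy_thread *thread;	/* NULL => not resolved yet */
	const char *sym;
};

/* Counts how many times the (pretend) expensive resolution runs. */
static unsigned int lazy_resolve_count;

/* Stand-in for thread__resolve(): fills in the location and marks it done. */
static void lazy_resolve(struct lazy_thread *thread,
			 struct lazy_addr_location *addr_al)
{
	lazy_resolve_count++;
	addr_al->thread = thread;
	addr_al->sym = "bar";
}

/* Any number of callers can ask for the symbol; only the first one resolves. */
static const char *lazy_addr_sym(struct lazy_thread *thread,
				 struct lazy_addr_location *addr_al)
{
	if (!addr_al->thread)
		lazy_resolve(thread, addr_al);
	return addr_al->sym;
}
```

The caller only has to set addr_al.thread = NULL before the first use, as
process_sample_event() does in the diff below.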

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-script.c | 38 +++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index ff7b43899f2e..d2771a997e26 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1337,17 +1337,18 @@ static const char *resolve_branch_sym(struct perf_sample *sample,
 				      struct evsel *evsel,
 				      struct thread *thread,
 				      struct addr_location *al,
+				      struct addr_location *addr_al,
 				      u64 *ip)
 {
-	struct addr_location addr_al;
 	struct perf_event_attr *attr = &evsel->core.attr;
 	const char *name = NULL;
 
 	if (sample->flags & (PERF_IP_FLAG_CALL | PERF_IP_FLAG_TRACE_BEGIN)) {
 		if (sample_addr_correlates_sym(attr)) {
-			thread__resolve(thread, &addr_al, sample);
-			if (addr_al.sym)
-				name = addr_al.sym->name;
+			if (!addr_al->thread)
+				thread__resolve(thread, addr_al, sample);
+			if (addr_al->sym)
+				name = addr_al->sym->name;
 			else
 				*ip = sample->addr;
 		} else {
@@ -1365,7 +1366,9 @@ static const char *resolve_branch_sym(struct perf_sample *sample,
 static int perf_sample__fprintf_callindent(struct perf_sample *sample,
 					   struct evsel *evsel,
 					   struct thread *thread,
-					   struct addr_location *al, FILE *fp)
+					   struct addr_location *al,
+					   struct addr_location *addr_al,
+					   FILE *fp)
 {
 	struct perf_event_attr *attr = &evsel->core.attr;
 	size_t depth = thread_stack__depth(thread, sample->cpu);
@@ -1382,7 +1385,7 @@ static int perf_sample__fprintf_callindent(struct perf_sample *sample,
 	if (thread->ts && sample->flags & PERF_IP_FLAG_RETURN)
 		depth += 1;
 
-	name = resolve_branch_sym(sample, evsel, thread, al, &ip);
+	name = resolve_branch_sym(sample, evsel, thread, al, addr_al, &ip);
 
 	if (PRINT_FIELD(DSO) && !(PRINT_FIELD(IP) || PRINT_FIELD(ADDR))) {
 		dlen += fprintf(fp, "(");
@@ -1466,6 +1469,7 @@ static int perf_sample__fprintf_bts(struct perf_sample *sample,
 				    struct evsel *evsel,
 				    struct thread *thread,
 				    struct addr_location *al,
+				    struct addr_location *addr_al,
 				    struct machine *machine, FILE *fp)
 {
 	struct perf_event_attr *attr = &evsel->core.attr;
@@ -1474,7 +1478,7 @@ static int perf_sample__fprintf_bts(struct perf_sample *sample,
 	int printed = 0;
 
 	if (PRINT_FIELD(CALLINDENT))
-		printed += perf_sample__fprintf_callindent(sample, evsel, thread, al, fp);
+		printed += perf_sample__fprintf_callindent(sample, evsel, thread, al, addr_al, fp);
 
 	/* print branch_from information */
 	if (PRINT_FIELD(IP)) {
@@ -1931,7 +1935,8 @@ static void perf_sample__fprint_metric(struct perf_script *script,
 static bool show_event(struct perf_sample *sample,
 		       struct evsel *evsel,
 		       struct thread *thread,
-		       struct addr_location *al)
+		       struct addr_location *al,
+		       struct addr_location *addr_al)
 {
 	int depth = thread_stack__depth(thread, sample->cpu);
 
@@ -1947,7 +1952,7 @@ static bool show_event(struct perf_sample *sample,
 	} else {
 		const char *s = symbol_conf.graph_function;
 		u64 ip;
-		const char *name = resolve_branch_sym(sample, evsel, thread, al,
+		const char *name = resolve_branch_sym(sample, evsel, thread, al, addr_al,
 				&ip);
 		unsigned nlen;
 
@@ -1972,6 +1977,7 @@ static bool show_event(struct perf_sample *sample,
 static void process_event(struct perf_script *script,
 			  struct perf_sample *sample, struct evsel *evsel,
 			  struct addr_location *al,
+			  struct addr_location *addr_al,
 			  struct machine *machine)
 {
 	struct thread *thread = al->thread;
@@ -2005,7 +2011,7 @@ static void process_event(struct perf_script *script,
 		perf_sample__fprintf_flags(sample->flags, fp);
 
 	if (is_bts_event(attr)) {
-		perf_sample__fprintf_bts(sample, evsel, thread, al, machine, fp);
+		perf_sample__fprintf_bts(sample, evsel, thread, al, addr_al, machine, fp);
 		return;
 	}
 
@@ -2168,6 +2174,7 @@ static int process_sample_event(struct perf_tool *tool,
 {
 	struct perf_script *scr = container_of(tool, struct perf_script, tool);
 	struct addr_location al;
+	struct addr_location addr_al;
 
 	if (perf_time__ranges_skip_sample(scr->ptime_range, scr->range_num,
 					  sample->time)) {
@@ -2197,7 +2204,10 @@ static int process_sample_event(struct perf_tool *tool,
 	if (al.filtered)
 		goto out_put;
 
-	if (!show_event(sample, evsel, al.thread, &al))
+	/* Set thread to NULL to indicate addr_al is not initialized */
+	addr_al.thread = NULL;
+
+	if (!show_event(sample, evsel, al.thread, &al, &addr_al))
 		goto out_put;
 
 	if (evswitch__discard(&scr->evswitch, evsel))
@@ -2205,16 +2215,16 @@ static int process_sample_event(struct perf_tool *tool,
 
 	if (scripting_ops) {
 		struct addr_location *addr_al_ptr = NULL;
-		struct addr_location addr_al;
 
 		if ((evsel->core.attr.sample_type & PERF_SAMPLE_ADDR) &&
 		    sample_addr_correlates_sym(&evsel->core.attr)) {
-			thread__resolve(al.thread, &addr_al, sample);
+			if (!addr_al.thread)
+				thread__resolve(al.thread, &addr_al, sample);
 			addr_al_ptr = &addr_al;
 		}
 		scripting_ops->process_event(event, sample, evsel, &al, addr_al_ptr);
 	} else {
-		process_event(scr, sample, evsel, &al, machine);
+		process_event(scr, sample, evsel, &al, &addr_al, machine);
 	}
 
 out_put:
-- 
2.17.1



* [PATCH RFC 04/11] perf script: Add API for filtering via dynamically loaded shared object
  2021-06-21 15:05 [PATCH RFC 00/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
                   ` (2 preceding siblings ...)
  2021-06-21 15:05 ` [PATCH RFC 03/11] perf script: Share addr_al between functions Adrian Hunter
@ 2021-06-21 15:05 ` Adrian Hunter
  2021-06-22 18:32   ` Arnaldo Carvalho de Melo
  2021-06-21 15:05 ` [PATCH RFC 05/11] perf script: Add dlfilter__filter_event_early() Adrian Hunter
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2021-06-21 15:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

In some cases, users want to filter very large amounts of data (e.g. from
AUX area tracing like Intel PT) looking for something specific. While a
scripting language such as Python can be used, Python is 10 to 20 times
slower than C. So define a C API so that custom filters can be written in C,
built as a shared object, and loaded by perf script.
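
For orientation, the return-value convention of 'filter_event' (0 keeps the
sample, 1 filters it out, negative is an error) and the way the caller maps
it can be sketched as follows. This is only an illustration: the dl_ names
and the cut-down sample structure are hypothetical stand-ins, and the real
definitions are in perf_dlfilter.h in the diff below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct perf_dlfilter_sample. */
struct dl_sample {
	uint64_t ip;
	int32_t cpu;
};

/*
 * Example filter following the documented convention:
 * return 0 to keep the sample, 1 to filter it out, negative on error.
 * 'data' is assumed to point at the CPU number to keep.
 */
static int dl_filter_event(void *data, const struct dl_sample *sample,
			   void *ctx)
{
	const int32_t *wanted_cpu = data;

	(void)ctx;	/* unused in this sketch */
	if (!sample)
		return -1;
	return sample->cpu == *wanted_cpu ? 0 : 1;
}

/*
 * Caller side, modelled on the handling in process_sample_event():
 * a positive result means "filtered out" and is not an error, so it is
 * turned into 0; a negative result is passed on as the error code.
 * *keep is set to 1 only when the sample should be processed further.
 */
static int dl_apply_filter(void *data, const struct dl_sample *sample,
			   int *keep)
{
	int ret = dl_filter_event(data, sample, NULL);

	*keep = (ret == 0);
	return ret > 0 ? 0 : ret;
}
```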

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-dlfilter.txt | 214 +++++++++++++
 tools/perf/Documentation/perf-script.txt   |   7 +-
 tools/perf/builtin-script.c                |  25 +-
 tools/perf/util/Build                      |   1 +
 tools/perf/util/dlfilter.c                 | 330 +++++++++++++++++++++
 tools/perf/util/dlfilter.h                 |  74 +++++
 tools/perf/util/perf_dlfilter.h            | 120 ++++++++
 7 files changed, 769 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-dlfilter.txt
 create mode 100644 tools/perf/util/dlfilter.c
 create mode 100644 tools/perf/util/dlfilter.h
 create mode 100644 tools/perf/util/perf_dlfilter.h

diff --git a/tools/perf/Documentation/perf-dlfilter.txt b/tools/perf/Documentation/perf-dlfilter.txt
new file mode 100644
index 000000000000..d8f80998790f
--- /dev/null
+++ b/tools/perf/Documentation/perf-dlfilter.txt
@@ -0,0 +1,214 @@
+perf-dlfilter(1)
+================
+
+NAME
+----
+perf-dlfilter - Filter sample events using a dynamically loaded shared
+object file
+
+SYNOPSIS
+--------
+[verse]
+'perf script' [--dlfilter file.so ]
+
+DESCRIPTION
+-----------
+
+This option is used to process data through a custom filter provided by a
+dynamically loaded shared object file.
+
+API
+---
+
+The API for filtering consists of the following:
+
+[source,c]
+----
+#include <perf/perf_dlfilter.h>
+
+const struct perf_dlfilter_fns perf_dlfilter_fns;
+
+int start(void **data);
+int stop(void *data);
+int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx);
+----
+
+If implemented, 'start' will be called at the beginning, before any
+calls to 'filter_event'. Return 0 to indicate success,
+or return a negative error code. '*data' can be assigned for use by other
+functions.
+
+If implemented, 'stop' will be called at the end, after any calls to
+'filter_event'. Return 0 to indicate success, or
+return a negative error code. 'data' is set by 'start'.
+
+If implemented, 'filter_event' will be called for each sample event.
+Return 0 to keep the sample event, 1 to filter it out, or return a negative
+error code. 'data' is set by 'start'. 'ctx' is needed for calls to
+'perf_dlfilter_fns'.
+
+The perf_dlfilter_sample structure
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+'filter_event' is passed a perf_dlfilter_sample
+structure, which contains the following fields:
+[source,c]
+----
+/*
+ * perf sample event information (as per perf script and <linux/perf_event.h>)
+ */
+struct perf_dlfilter_sample {
+	__u32 size; /* Size of this structure (for compatibility checking) */
+	__u64 ip;
+	__s32 pid;
+	__s32 tid;
+	__u64 time;
+	__u64 addr;
+	__u64 id;
+	__u64 stream_id;
+	__u64 period;
+	__u64 weight;		/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
+	__u16 ins_lat;		/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
+	__u16 p_stage_cyc;	/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
+	__u64 transaction;	/* Refer PERF_SAMPLE_TRANSACTION in <linux/perf_event.h> */
+	__u64 insn_cnt;	/* For instructions-per-cycle (IPC) */
+	__u64 cyc_cnt;		/* For instructions-per-cycle (IPC) */
+	__s32 cpu;
+	__u32 flags;		/* Refer PERF_DLFILTER_FLAG_* above */
+	__u64 data_src;		/* Refer PERF_SAMPLE_DATA_SRC in <linux/perf_event.h> */
+	__u64 phys_addr;	/* Refer PERF_SAMPLE_PHYS_ADDR in <linux/perf_event.h> */
+	__u64 data_page_size;	/* Refer PERF_SAMPLE_DATA_PAGE_SIZE in <linux/perf_event.h> */
+	__u64 code_page_size;	/* Refer PERF_SAMPLE_CODE_PAGE_SIZE in <linux/perf_event.h> */
+	__u64 cgroup;		/* Refer PERF_SAMPLE_CGROUP in <linux/perf_event.h> */
+	__u8  cpumode;		/* Refer CPUMODE_MASK etc in <linux/perf_event.h> */
+	__u8  addr_correlates_sym; /* True => resolve_addr() can be called */
+	__u16 misc;		/* Refer perf_event_header in <linux/perf_event.h> */
+	__u32 raw_size;		/* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
+	const void *raw_data;	/* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
+	__u64 brstack_nr;	/* Number of brstack entries */
+	const struct perf_branch_entry *brstack; /* Refer <linux/perf_event.h> */
+	__u64 raw_callchain_nr;	/* Number of raw_callchain entries */
+	const __u64 *raw_callchain; /* Refer <linux/perf_event.h> */
+	const char *event;
+};
+----
+
+The perf_dlfilter_fns structure
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The 'perf_dlfilter_fns' structure is populated with function pointers when the
+file is loaded. The functions can be called by 'filter_event'.
+
+[source,c]
+----
+struct perf_dlfilter_fns {
+	const struct perf_dlfilter_al *(*resolve_ip)(void *ctx);
+	const struct perf_dlfilter_al *(*resolve_addr)(void *ctx);
+	void *(*reserved[126])(void *);
+};
+----
+
+'resolve_ip' returns information about ip.
+
+'resolve_addr' returns information about addr (if addr_correlates_sym).
+
+The perf_dlfilter_al structure
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The 'perf_dlfilter_al' structure contains information about an address.
+
+[source,c]
+----
+/*
+ * Address location (as per perf script)
+ */
+struct perf_dlfilter_al {
+	__u32 size; /* Size of this structure (for compatibility checking) */
+	__u32 symoff;
+	const char *sym;
+	__u64 addr; /* Mapped address (from dso) */
+	__u64 sym_start;
+	__u64 sym_end;
+	const char *dso;
+	__u8  sym_binding; /* STB_LOCAL, STB_GLOBAL or STB_WEAK, refer <elf.h> */
+	__u8  is_64_bit; /* Only valid if dso is not NULL */
+	__u8  is_kernel_ip; /* True if in kernel space */
+	__u32 buildid_size;
+	__u8 *buildid;
+	/* Below members are only populated by resolve_ip() */
+	__u8 filtered; /* true if this sample event will be filtered out */
+	const char *comm;
+};
+----
+
+perf_dlfilter_sample flags
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The 'flags' member of 'perf_dlfilter_sample' corresponds with the flags field
+of perf script. The bits of the flags are as follows:
+
+[source,c]
+----
+/* Definitions for perf_dlfilter_sample flags */
+enum {
+	PERF_DLFILTER_FLAG_BRANCH	= 1ULL << 0,
+	PERF_DLFILTER_FLAG_CALL		= 1ULL << 1,
+	PERF_DLFILTER_FLAG_RETURN	= 1ULL << 2,
+	PERF_DLFILTER_FLAG_CONDITIONAL	= 1ULL << 3,
+	PERF_DLFILTER_FLAG_SYSCALLRET	= 1ULL << 4,
+	PERF_DLFILTER_FLAG_ASYNC	= 1ULL << 5,
+	PERF_DLFILTER_FLAG_INTERRUPT	= 1ULL << 6,
+	PERF_DLFILTER_FLAG_TX_ABORT	= 1ULL << 7,
+	PERF_DLFILTER_FLAG_TRACE_BEGIN	= 1ULL << 8,
+	PERF_DLFILTER_FLAG_TRACE_END	= 1ULL << 9,
+	PERF_DLFILTER_FLAG_IN_TX	= 1ULL << 10,
+	PERF_DLFILTER_FLAG_VMENTRY	= 1ULL << 11,
+	PERF_DLFILTER_FLAG_VMEXIT	= 1ULL << 12,
+};
+----
+
+EXAMPLE
+-------
+
+Filter out everything except branches from "foo" to "bar":
+
+[source,c]
+----
+#include <perf/perf_dlfilter.h>
+#include <string.h>
+
+const struct perf_dlfilter_fns perf_dlfilter_fns;
+
+int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
+{
+	const struct perf_dlfilter_al *al;
+	const struct perf_dlfilter_al *addr_al;
+
+	if (!sample->ip || !sample->addr_correlates_sym)
+		return 1;
+
+	al = perf_dlfilter_fns.resolve_ip(ctx);
+	if (!al || !al->sym || strcmp(al->sym, "foo"))
+		return 1;
+
+	addr_al = perf_dlfilter_fns.resolve_addr(ctx);
+	if (!addr_al || !addr_al->sym || strcmp(addr_al->sym, "bar"))
+		return 1;
+
+	return 0;
+}
+----
+
+To build the shared object, assuming perf has been installed for the local user
+i.e. perf_dlfilter.h is in ~/include/perf:
+
+	gcc -c -I ~/include -fpic dlfilter-example.c
+	gcc -shared -o dlfilter-example.so dlfilter-example.o
+
+To use the filter with perf script:
+
+	perf script --dlfilter ./dlfilter-example.so
+
+SEE ALSO
+--------
+linkperf:perf-script[1]
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 48a5f5b26dd4..2306c81b606b 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -98,6 +98,10 @@ OPTIONS
         Generate perf-script.[ext] starter script for given language,
         using current perf.data.
 
+--dlfilter=<file>::
+	Filter sample events using the given shared object file.
+	Refer to linkperf:perf-dlfilter[1]
+
 -a::
         Force system-wide collection.  Scripts run without a <command>
         normally use -a by default, while scripts run with a <command>
@@ -483,4 +487,5 @@ include::itrace.txt[]
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
-linkperf:perf-script-python[1], linkperf:perf-intel-pt[1]
+linkperf:perf-script-python[1], linkperf:perf-intel-pt[1],
+linkperf:perf-dlfilter[1]
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index d2771a997e26..aaf2922643a0 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -55,6 +55,7 @@
 #include <subcmd/pager.h>
 #include <perf/evlist.h>
 #include <linux/err.h>
+#include "util/dlfilter.h"
 #include "util/record.h"
 #include "util/util.h"
 #include "perf.h"
@@ -79,6 +80,7 @@ static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 static struct perf_stat_config	stat_config;
 static int			max_blocks;
 static bool			native_arch;
+static struct dlfilter		*dlfilter;
 
 unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
 
@@ -2175,6 +2177,7 @@ static int process_sample_event(struct perf_tool *tool,
 	struct perf_script *scr = container_of(tool, struct perf_script, tool);
 	struct addr_location al;
 	struct addr_location addr_al;
+	int ret = 0;
 
 	if (perf_time__ranges_skip_sample(scr->ptime_range, scr->range_num,
 					  sample->time)) {
@@ -2213,6 +2216,13 @@ static int process_sample_event(struct perf_tool *tool,
 	if (evswitch__discard(&scr->evswitch, evsel))
 		goto out_put;
 
+	ret = dlfilter__filter_event(dlfilter, event, sample, evsel, machine, &al, &addr_al);
+	if (ret) {
+		if (ret > 0)
+			ret = 0;
+		goto out_put;
+	}
+
 	if (scripting_ops) {
 		struct addr_location *addr_al_ptr = NULL;
 
@@ -2229,7 +2239,7 @@ static int process_sample_event(struct perf_tool *tool,
 
 out_put:
 	addr_location__put(&al);
-	return 0;
+	return ret;
 }
 
 static int process_attr(struct perf_tool *tool, union perf_event *event,
@@ -3568,6 +3578,7 @@ int cmd_script(int argc, const char **argv)
 	};
 	struct utsname uts;
 	char *script_path = NULL;
+	const char *dlfilter_file = NULL;
 	const char **__argv;
 	int i, j, err = 0;
 	struct perf_script script = {
@@ -3615,6 +3626,7 @@ int cmd_script(int argc, const char **argv)
 		     parse_scriptname),
 	OPT_STRING('g', "gen-script", &generate_script_lang, "lang",
 		   "generate perf-script.xx script in specified language"),
+	OPT_STRING(0, "dlfilter", &dlfilter_file, "file", "filter .so file name"),
 	OPT_STRING('i', "input", &input_name, "file", "input file name"),
 	OPT_BOOLEAN('d', "debug-mode", &debug_mode,
 		   "do various checks like samples ordering and lost events"),
@@ -3933,6 +3945,12 @@ int cmd_script(int argc, const char **argv)
 		exit(-1);
 	}
 
+	if (dlfilter_file) {
+		dlfilter = dlfilter__new(dlfilter_file);
+		if (!dlfilter)
+			return -1;
+	}
+
 	if (!script_name) {
 		setup_pager();
 		use_browser = 0;
@@ -4032,6 +4050,10 @@ int cmd_script(int argc, const char **argv)
 		goto out_delete;
 	}
 
+	err = dlfilter__start(dlfilter, session);
+	if (err)
+		goto out_delete;
+
 	if (script_name) {
 		err = scripting_ops->start_script(script_name, argc, argv, session);
 		if (err)
@@ -4081,6 +4103,7 @@ int cmd_script(int argc, const char **argv)
 
 	if (script_started)
 		cleanup_scripting();
+	dlfilter__cleanup(dlfilter);
 out:
 	return err;
 }
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 95e15d1035ab..1a909b53dc15 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -126,6 +126,7 @@ perf-y += parse-regs-options.o
 perf-y += parse-sublevel-options.o
 perf-y += term.o
 perf-y += help-unknown-cmd.o
+perf-y += dlfilter.o
 perf-y += mem-events.o
 perf-y += vsprintf.o
 perf-y += units.o
diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
new file mode 100644
index 000000000000..15cb9de13a4b
--- /dev/null
+++ b/tools/perf/util/dlfilter.c
@@ -0,0 +1,330 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * dlfilter.c: Interface to perf script --dlfilter shared object
+ * Copyright (c) 2021, Intel Corporation.
+ */
+#include <dlfcn.h>
+#include <stdlib.h>
+#include <string.h>
+#include <linux/zalloc.h>
+#include <linux/build_bug.h>
+
+#include "debug.h"
+#include "event.h"
+#include "evsel.h"
+#include "dso.h"
+#include "map.h"
+#include "thread.h"
+#include "symbol.h"
+#include "dlfilter.h"
+#include "perf_dlfilter.h"
+
+static void al_to_d_al(struct addr_location *al, struct perf_dlfilter_al *d_al)
+{
+	struct symbol *sym = al->sym;
+
+	d_al->size = sizeof(*d_al);
+	if (al->map) {
+		struct dso *dso = al->map->dso;
+
+		if (symbol_conf.show_kernel_path && dso->long_name)
+			d_al->dso = dso->long_name;
+		else
+			d_al->dso = dso->name;
+		d_al->is_64_bit = dso->is_64_bit;
+		d_al->buildid_size = dso->bid.size;
+		d_al->buildid = dso->bid.data;
+	} else {
+		d_al->dso = NULL;
+		d_al->is_64_bit = 0;
+		d_al->buildid_size = 0;
+		d_al->buildid = NULL;
+	}
+	if (sym) {
+		d_al->sym = sym->name;
+		d_al->sym_start = sym->start;
+		d_al->sym_end = sym->end;
+		if (al->addr < sym->end)
+			d_al->symoff = al->addr - sym->start;
+		else
+			d_al->symoff = al->addr - al->map->start - sym->start;
+		d_al->sym_binding = sym->binding;
+	} else {
+		d_al->sym = NULL;
+		d_al->sym_start = 0;
+		d_al->sym_end = 0;
+		d_al->symoff = 0;
+		d_al->sym_binding = 0;
+	}
+	d_al->addr = al->addr;
+	d_al->comm = NULL;
+	d_al->filtered = 0;
+}
+
+static struct addr_location *get_al(struct dlfilter *d)
+{
+	struct addr_location *al = d->al;
+
+	if (!al->thread && machine__resolve(d->machine, al, d->sample) < 0)
+		return NULL;
+	return al;
+}
+
+static struct thread *get_thread(struct dlfilter *d)
+{
+	struct addr_location *al = get_al(d);
+
+	return al ? al->thread : NULL;
+}
+
+static const struct perf_dlfilter_al *dlfilter__resolve_ip(void *ctx)
+{
+	struct dlfilter *d = (struct dlfilter *)ctx;
+	struct perf_dlfilter_al *d_al = d->d_ip_al;
+	struct addr_location *al;
+
+	if (!d->ctx_valid)
+		return NULL;
+
+	/* 'size' is also used to indicate already initialized */
+	if (d_al->size)
+		return d_al;
+
+	al = get_al(d);
+	if (!al)
+		return NULL;
+
+	al_to_d_al(al, d_al);
+
+	d_al->is_kernel_ip = machine__kernel_ip(d->machine, d->sample->ip);
+	d_al->comm = al->thread ? thread__comm_str(al->thread) : ":-1";
+	d_al->filtered = al->filtered;
+
+	return d_al;
+}
+
+static const struct perf_dlfilter_al *dlfilter__resolve_addr(void *ctx)
+{
+	struct dlfilter *d = (struct dlfilter *)ctx;
+	struct perf_dlfilter_al *d_addr_al = d->d_addr_al;
+	struct addr_location *addr_al = d->addr_al;
+
+	if (!d->ctx_valid || !d->d_sample->addr_correlates_sym)
+		return NULL;
+
+	/* 'size' is also used to indicate already initialized */
+	if (d_addr_al->size)
+		return d_addr_al;
+
+	if (!addr_al->thread) {
+		struct thread *thread = get_thread(d);
+
+		if (!thread)
+			return NULL;
+		thread__resolve(thread, addr_al, d->sample);
+	}
+
+	al_to_d_al(addr_al, d_addr_al);
+
+	d_addr_al->is_kernel_ip = machine__kernel_ip(d->machine, d->sample->addr);
+
+	return d_addr_al;
+}
+
+static const struct perf_dlfilter_fns perf_dlfilter_fns = {
+	.resolve_ip      = dlfilter__resolve_ip,
+	.resolve_addr    = dlfilter__resolve_addr,
+};
+
+#define CHECK_FLAG(x) BUILD_BUG_ON((u64)PERF_DLFILTER_FLAG_ ## x != (u64)PERF_IP_FLAG_ ## x)
+
+static int dlfilter__init(struct dlfilter *d, const char *file)
+{
+	CHECK_FLAG(BRANCH);
+	CHECK_FLAG(CALL);
+	CHECK_FLAG(RETURN);
+	CHECK_FLAG(CONDITIONAL);
+	CHECK_FLAG(SYSCALLRET);
+	CHECK_FLAG(ASYNC);
+	CHECK_FLAG(INTERRUPT);
+	CHECK_FLAG(TX_ABORT);
+	CHECK_FLAG(TRACE_BEGIN);
+	CHECK_FLAG(TRACE_END);
+	CHECK_FLAG(IN_TX);
+	CHECK_FLAG(VMENTRY);
+	CHECK_FLAG(VMEXIT);
+
+	memset(d, 0, sizeof(*d));
+	d->file = strdup(file);
+	if (!d->file)
+		return -1;
+	return 0;
+}
+
+static void dlfilter__exit(struct dlfilter *d)
+{
+	zfree(&d->file);
+}
+
+static int dlfilter__open(struct dlfilter *d)
+{
+	d->handle = dlopen(d->file, RTLD_NOW);
+	if (!d->handle) {
+		pr_err("dlopen failed for: '%s'\n", d->file);
+		return -1;
+	}
+	d->start = dlsym(d->handle, "start");
+	d->filter_event = dlsym(d->handle, "filter_event");
+	d->stop = dlsym(d->handle, "stop");
+	d->fns = dlsym(d->handle, "perf_dlfilter_fns");
+	if (d->fns)
+		memcpy(d->fns, &perf_dlfilter_fns, sizeof(struct perf_dlfilter_fns));
+	return 0;
+}
+
+static int dlfilter__close(struct dlfilter *d)
+{
+	return dlclose(d->handle);
+}
+
+struct dlfilter *dlfilter__new(const char *file)
+{
+	struct dlfilter *d = malloc(sizeof(*d));
+
+	if (!d)
+		return NULL;
+
+	if (dlfilter__init(d, file))
+		goto err_free;
+
+	if (dlfilter__open(d))
+		goto err_exit;
+
+	return d;
+
+err_exit:
+	dlfilter__exit(d);
+err_free:
+	free(d);
+	return NULL;
+}
+
+static void dlfilter__free(struct dlfilter *d)
+{
+	if (d) {
+		dlfilter__exit(d);
+		free(d);
+	}
+}
+
+int dlfilter__start(struct dlfilter *d, struct perf_session *session)
+{
+	if (d) {
+		d->session = session;
+		if (d->start)
+			return d->start(&d->data);
+	}
+	return 0;
+}
+
+static int dlfilter__stop(struct dlfilter *d)
+{
+	if (d && d->stop)
+		return d->stop(d->data);
+	return 0;
+}
+
+void dlfilter__cleanup(struct dlfilter *d)
+{
+	if (d) {
+		dlfilter__stop(d);
+		dlfilter__close(d);
+		dlfilter__free(d);
+	}
+}
+
+#define ASSIGN(x) d_sample.x = sample->x
+
+int dlfilter__do_filter_event(struct dlfilter *d,
+			      union perf_event *event,
+			      struct perf_sample *sample,
+			      struct evsel *evsel,
+			      struct machine *machine,
+			      struct addr_location *al,
+			      struct addr_location *addr_al)
+{
+	struct perf_dlfilter_sample d_sample;
+	struct perf_dlfilter_al d_ip_al;
+	struct perf_dlfilter_al d_addr_al;
+	int ret;
+
+	d->event       = event;
+	d->sample      = sample;
+	d->evsel       = evsel;
+	d->machine     = machine;
+	d->al          = al;
+	d->addr_al     = addr_al;
+	d->d_sample    = &d_sample;
+	d->d_ip_al     = &d_ip_al;
+	d->d_addr_al   = &d_addr_al;
+
+	d_sample.size  = sizeof(d_sample);
+	d_ip_al.size   = 0; /* To indicate d_ip_al is not initialized */
+	d_addr_al.size = 0; /* To indicate d_addr_al is not initialized */
+
+	ASSIGN(ip);
+	ASSIGN(pid);
+	ASSIGN(tid);
+	ASSIGN(time);
+	ASSIGN(addr);
+	ASSIGN(id);
+	ASSIGN(stream_id);
+	ASSIGN(period);
+	ASSIGN(weight);
+	ASSIGN(ins_lat);
+	ASSIGN(p_stage_cyc);
+	ASSIGN(transaction);
+	ASSIGN(insn_cnt);
+	ASSIGN(cyc_cnt);
+	ASSIGN(cpu);
+	ASSIGN(flags);
+	ASSIGN(data_src);
+	ASSIGN(phys_addr);
+	ASSIGN(data_page_size);
+	ASSIGN(code_page_size);
+	ASSIGN(cgroup);
+	ASSIGN(cpumode);
+	ASSIGN(misc);
+	ASSIGN(raw_size);
+	ASSIGN(raw_data);
+
+	if (sample->branch_stack) {
+		d_sample.brstack_nr = sample->branch_stack->nr;
+		d_sample.brstack = (struct perf_branch_entry *)perf_sample__branch_entries(sample);
+	} else {
+		d_sample.brstack_nr = 0;
+		d_sample.brstack = NULL;
+	}
+
+	if (sample->callchain) {
+		d_sample.raw_callchain_nr = sample->callchain->nr;
+		d_sample.raw_callchain = (__u64 *)sample->callchain->ips;
+	} else {
+		d_sample.raw_callchain_nr = 0;
+		d_sample.raw_callchain = NULL;
+	}
+
+	d_sample.addr_correlates_sym =
+		(evsel->core.attr.sample_type & PERF_SAMPLE_ADDR) &&
+		sample_addr_correlates_sym(&evsel->core.attr);
+
+	d_sample.event = evsel__name(evsel);
+
+	d->ctx_valid = true;
+
+	ret = d->filter_event(d->data, &d_sample, d);
+
+	d->ctx_valid = false;
+
+	return ret;
+}
diff --git a/tools/perf/util/dlfilter.h b/tools/perf/util/dlfilter.h
new file mode 100644
index 000000000000..671e2d3d5a06
--- /dev/null
+++ b/tools/perf/util/dlfilter.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * dlfilter.h: Interface to perf script --dlfilter shared object
+ * Copyright (c) 2021, Intel Corporation.
+ */
+
+#ifndef PERF_UTIL_DLFILTER_H
+#define PERF_UTIL_DLFILTER_H
+
+struct perf_session;
+union  perf_event;
+struct perf_sample;
+struct evsel;
+struct machine;
+struct addr_location;
+struct perf_dlfilter_fns;
+struct perf_dlfilter_sample;
+struct perf_dlfilter_al;
+
+struct dlfilter {
+	char				*file;
+	void				*handle;
+	void				*data;
+	struct perf_session		*session;
+	bool				ctx_valid;
+
+	union perf_event		*event;
+	struct perf_sample		*sample;
+	struct evsel			*evsel;
+	struct machine			*machine;
+	struct addr_location		*al;
+	struct addr_location		*addr_al;
+	struct perf_dlfilter_sample	*d_sample;
+	struct perf_dlfilter_al		*d_ip_al;
+	struct perf_dlfilter_al		*d_addr_al;
+
+	int (*start)(void **data);
+	int (*stop)(void *data);
+
+	int (*filter_event)(void *data,
+			    const struct perf_dlfilter_sample *sample,
+			    void *ctx);
+
+	struct perf_dlfilter_fns *fns;
+};
+
+struct dlfilter *dlfilter__new(const char *file);
+
+int dlfilter__start(struct dlfilter *d, struct perf_session *session);
+
+int dlfilter__do_filter_event(struct dlfilter *d,
+			      union perf_event *event,
+			      struct perf_sample *sample,
+			      struct evsel *evsel,
+			      struct machine *machine,
+			      struct addr_location *al,
+			      struct addr_location *addr_al);
+
+void dlfilter__cleanup(struct dlfilter *d);
+
+static inline int dlfilter__filter_event(struct dlfilter *d,
+					 union perf_event *event,
+					 struct perf_sample *sample,
+					 struct evsel *evsel,
+					 struct machine *machine,
+					 struct addr_location *al,
+					 struct addr_location *addr_al)
+{
+	if (!d || !d->filter_event)
+		return 0;
+	return dlfilter__do_filter_event(d, event, sample, evsel, machine, al, addr_al);
+}
+
+#endif
diff --git a/tools/perf/util/perf_dlfilter.h b/tools/perf/util/perf_dlfilter.h
new file mode 100644
index 000000000000..132f833f0a0b
--- /dev/null
+++ b/tools/perf/util/perf_dlfilter.h
@@ -0,0 +1,120 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * perf_dlfilter.h: API for perf --dlfilter shared object
+ * Copyright (c) 2021, Intel Corporation.
+ */
+#ifndef _LINUX_PERF_DLFILTER_H
+#define _LINUX_PERF_DLFILTER_H
+
+#include <linux/perf_event.h>
+#include <linux/types.h>
+
+/* Definitions for perf_dlfilter_sample flags */
+enum {
+	PERF_DLFILTER_FLAG_BRANCH	= 1ULL << 0,
+	PERF_DLFILTER_FLAG_CALL		= 1ULL << 1,
+	PERF_DLFILTER_FLAG_RETURN	= 1ULL << 2,
+	PERF_DLFILTER_FLAG_CONDITIONAL	= 1ULL << 3,
+	PERF_DLFILTER_FLAG_SYSCALLRET	= 1ULL << 4,
+	PERF_DLFILTER_FLAG_ASYNC	= 1ULL << 5,
+	PERF_DLFILTER_FLAG_INTERRUPT	= 1ULL << 6,
+	PERF_DLFILTER_FLAG_TX_ABORT	= 1ULL << 7,
+	PERF_DLFILTER_FLAG_TRACE_BEGIN	= 1ULL << 8,
+	PERF_DLFILTER_FLAG_TRACE_END	= 1ULL << 9,
+	PERF_DLFILTER_FLAG_IN_TX	= 1ULL << 10,
+	PERF_DLFILTER_FLAG_VMENTRY	= 1ULL << 11,
+	PERF_DLFILTER_FLAG_VMEXIT	= 1ULL << 12,
+};
+
+/*
+ * perf sample event information (as per perf script and <linux/perf_event.h>)
+ */
+struct perf_dlfilter_sample {
+	__u32 size; /* Size of this structure (for compatibility checking) */
+	__u64 ip;
+	__s32 pid;
+	__s32 tid;
+	__u64 time;
+	__u64 addr;
+	__u64 id;
+	__u64 stream_id;
+	__u64 period;
+	__u64 weight;		/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
+	__u16 ins_lat;		/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
+	__u16 p_stage_cyc;	/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
+	__u64 transaction;	/* Refer PERF_SAMPLE_TRANSACTION in <linux/perf_event.h> */
+	__u64 insn_cnt;	/* For instructions-per-cycle (IPC) */
+	__u64 cyc_cnt;		/* For instructions-per-cycle (IPC) */
+	__s32 cpu;
+	__u32 flags;		/* Refer PERF_DLFILTER_FLAG_* above */
+	__u64 data_src;		/* Refer PERF_SAMPLE_DATA_SRC in <linux/perf_event.h> */
+	__u64 phys_addr;	/* Refer PERF_SAMPLE_PHYS_ADDR in <linux/perf_event.h> */
+	__u64 data_page_size;	/* Refer PERF_SAMPLE_DATA_PAGE_SIZE in <linux/perf_event.h> */
+	__u64 code_page_size;	/* Refer PERF_SAMPLE_CODE_PAGE_SIZE in <linux/perf_event.h> */
+	__u64 cgroup;		/* Refer PERF_SAMPLE_CGROUP in <linux/perf_event.h> */
+	__u8  cpumode;		/* Refer CPUMODE_MASK etc in <linux/perf_event.h> */
+	__u8  addr_correlates_sym; /* True => resolve_addr() can be called */
+	__u16 misc;		/* Refer perf_event_header in <linux/perf_event.h> */
+	__u32 raw_size;		/* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
+	const void *raw_data;	/* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
+	__u64 brstack_nr;	/* Number of brstack entries */
+	const struct perf_branch_entry *brstack; /* Refer <linux/perf_event.h> */
+	__u64 raw_callchain_nr;	/* Number of raw_callchain entries */
+	const __u64 *raw_callchain; /* Refer <linux/perf_event.h> */
+	const char *event;
+};
+
+/*
+ * Address location (as per perf script)
+ */
+struct perf_dlfilter_al {
+	__u32 size; /* Size of this structure (for compatibility checking) */
+	__u32 symoff;
+	const char *sym;
+	__u64 addr; /* Mapped address (from dso) */
+	__u64 sym_start;
+	__u64 sym_end;
+	const char *dso;
+	__u8  sym_binding; /* STB_LOCAL, STB_GLOBAL or STB_WEAK, refer <elf.h> */
+	__u8  is_64_bit; /* Only valid if dso is not NULL */
+	__u8  is_kernel_ip; /* True if in kernel space */
+	__u32 buildid_size;
+	__u8 *buildid;
+	/* Below members are only populated by resolve_ip() */
+	__u8 filtered; /* True if this sample event will be filtered out */
+	const char *comm;
+};
+
+struct perf_dlfilter_fns {
+	/* Return information about ip */
+	const struct perf_dlfilter_al *(*resolve_ip)(void *ctx);
+	/* Return information about addr (if addr_correlates_sym) */
+	const struct perf_dlfilter_al *(*resolve_addr)(void *ctx);
+	/* Reserved */
+	void *(*reserved[126])(void *);
+};
+
+/*
+ * If implemented, 'start' will be called at the beginning,
+ * before any calls to 'filter_event'. Return 0 to indicate success,
+ * or return a negative error code. '*data' can be assigned for use
+ * by other functions.
+ */
+int start(void **data);
+
+/*
+ * If implemented, 'stop' will be called at the end,
+ * after any calls to 'filter_event'. Return 0 to indicate success, or
+ * return a negative error code. 'data' is set by start().
+ */
+int stop(void *data);
+
+/*
+ * If implemented, 'filter_event' will be called for each sample
+ * event. Return 0 to keep the sample event, 1 to filter it out, or
+ * return a negative error code. 'data' is set by start(). 'ctx' is
+ * needed for calls to perf_dlfilter_fns.
+ */
+int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx);
+
+#endif
-- 
2.17.1



* [PATCH RFC 05/11] perf script: Add dlfilter__filter_event_early()
  2021-06-21 15:05 [PATCH RFC 00/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
                   ` (3 preceding siblings ...)
  2021-06-21 15:05 ` [PATCH RFC 04/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
@ 2021-06-21 15:05 ` Adrian Hunter
  2021-06-21 15:05 ` [PATCH RFC 06/11] perf build: Install perf_dlfilter.h Adrian Hunter
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Adrian Hunter @ 2021-06-21 15:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

filter_event_early() is called before perf script's internal filtering,
which can make it more than 30% faster than filter_event(). It is
otherwise the same as filter_event(), except that it is also passed
events that internal filtering would later discard.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
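For illustration only, a minimal sketch of the kind of cheap test that suits
filter_event_early(): reject samples by CPU and branch flag before any
address resolution happens. It follows the documented return convention
(0 keeps the sample, 1 filters it out). The struct 'sample_stub', the macro
STUB_FLAG_CALL and the function name are hypothetical stand-ins so that the
sketch builds without the installed header; a real filter would include
perf_dlfilter.h and use struct perf_dlfilter_sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two fields used here. A real filter would
 * use 'cpu' and 'flags' from struct perf_dlfilter_sample instead.
 */
struct sample_stub {
	int32_t  cpu;
	uint32_t flags;
};

/* Mirrors PERF_DLFILTER_FLAG_CALL (1ULL << 1) from perf_dlfilter.h */
#define STUB_FLAG_CALL (1u << 1)

/*
 * Shaped like filter_event_early(): 'data' is whatever start() stored,
 * assumed here to point at the CPU number of interest.
 * Returns 0 to keep the sample, 1 to filter it out.
 */
static int early_keep_calls_on_cpu(void *data, const struct sample_stub *sample,
				   void *ctx)
{
	const int32_t *wanted_cpu = data;

	(void)ctx; /* not needed: no perf_dlfilter_fns calls are made */

	if (wanted_cpu && sample->cpu != *wanted_cpu)
		return 1;

	return (sample->flags & STUB_FLAG_CALL) ? 0 : 1;
}
```

Because the test only reads fields already present in the sample, it does
not trigger symbol resolution, which is where an early filter would be
expected to save the most time.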
 tools/perf/Documentation/perf-dlfilter.txt | 13 +++++++----
 tools/perf/builtin-script.c                | 26 +++++++++++++++-------
 tools/perf/util/dlfilter.c                 |  9 ++++++--
 tools/perf/util/dlfilter.h                 | 21 +++++++++++++++--
 tools/perf/util/perf_dlfilter.h            |  6 +++++
 5 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/tools/perf/Documentation/perf-dlfilter.txt b/tools/perf/Documentation/perf-dlfilter.txt
index d8f80998790f..8eada542330a 100644
--- a/tools/perf/Documentation/perf-dlfilter.txt
+++ b/tools/perf/Documentation/perf-dlfilter.txt
@@ -31,15 +31,16 @@ const struct perf_dlfilter_fns perf_dlfilter_fns;
 int start(void **data);
 int stop(void *data);
 int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx);
+int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx);
 ----
 
 If implemented, 'start' will be called at the beginning, before any
-calls to 'filter_event' . Return 0 to indicate success,
+calls to 'filter_event' or 'filter_event_early'. Return 0 to indicate success,
 or return a negative error code. '*data' can be assigned for use by other
 functions.
 
 If implemented, 'stop' will be called at the end, after any calls to
-'filter_event'. Return 0 to indicate success, or
+'filter_event' or 'filter_event_early'. Return 0 to indicate success, or
 return a negative error code. 'data' is set by 'start'.
 
 If implemented, 'filter_event' will be called for each sample event.
@@ -47,10 +48,13 @@ Return 0 to keep the sample event, 1 to filter it out, or return a negative
 error code. 'data' is set by 'start'. 'ctx' is needed for calls to
 'perf_dlfilter_fns'.
 
+'filter_event_early' is the same as 'filter_event' except it is called before
+internal filtering.
+
 The perf_dlfilter_sample structure
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-'filter_event' is passed a perf_dlfilter_sample
+'filter_event' and 'filter_event_early' are passed a perf_dlfilter_sample
 structure, which contains the following fields:
 [source,c]
 ----
@@ -97,7 +101,8 @@ The perf_dlfilter_fns structure
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The 'perf_dlfilter_fns' structure is populated with function pointers when the
-file is loaded. The functions can be called by 'filter_event'.
+file is loaded. The functions can be called by 'filter_event' or
+'filter_event_early'.
 
 [source,c]
 ----
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index aaf2922643a0..e47affe674a5 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2179,9 +2179,20 @@ static int process_sample_event(struct perf_tool *tool,
 	struct addr_location addr_al;
 	int ret = 0;
 
+	/* Set thread to NULL to indicate addr_al and al are not initialized */
+	addr_al.thread = NULL;
+	al.thread = NULL;
+
+	ret = dlfilter__filter_event_early(dlfilter, event, sample, evsel, machine, &al, &addr_al);
+	if (ret) {
+		if (ret > 0)
+			ret = 0;
+		goto out_put;
+	}
+
 	if (perf_time__ranges_skip_sample(scr->ptime_range, scr->range_num,
 					  sample->time)) {
-		return 0;
+		goto out_put;
 	}
 
 	if (debug_mode) {
@@ -2192,24 +2203,22 @@ static int process_sample_event(struct perf_tool *tool,
 			nr_unordered++;
 		}
 		last_timestamp = sample->time;
-		return 0;
+		goto out_put;
 	}
 
 	if (filter_cpu(sample))
-		return 0;
+		goto out_put;
 
 	if (machine__resolve(machine, &al, sample) < 0) {
 		pr_err("problem processing %d event, skipping it.\n",
 		       event->header.type);
-		return -1;
+		ret = -1;
+		goto out_put;
 	}
 
 	if (al.filtered)
 		goto out_put;
 
-	/* Set thread to NULL to indicate addr_al is not initialized */
-	addr_al.thread = NULL;
-
 	if (!show_event(sample, evsel, al.thread, &al, &addr_al))
 		goto out_put;
 
@@ -2238,7 +2247,8 @@ static int process_sample_event(struct perf_tool *tool,
 	}
 
 out_put:
-	addr_location__put(&al);
+	if (al.thread)
+		addr_location__put(&al);
 	return ret;
 }
 
diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
index 15cb9de13a4b..8a2b196f07a7 100644
--- a/tools/perf/util/dlfilter.c
+++ b/tools/perf/util/dlfilter.c
@@ -175,6 +175,7 @@ static int dlfilter__open(struct dlfilter *d)
 	}
 	d->start = dlsym(d->handle, "start");
 	d->filter_event = dlsym(d->handle, "filter_event");
+	d->filter_event_early = dlsym(d->handle, "filter_event_early");
 	d->stop = dlsym(d->handle, "stop");
 	d->fns = dlsym(d->handle, "perf_dlfilter_fns");
 	if (d->fns)
@@ -251,7 +252,8 @@ int dlfilter__do_filter_event(struct dlfilter *d,
 			      struct evsel *evsel,
 			      struct machine *machine,
 			      struct addr_location *al,
-			      struct addr_location *addr_al)
+			      struct addr_location *addr_al,
+			      bool early)
 {
 	struct perf_dlfilter_sample d_sample;
 	struct perf_dlfilter_al d_ip_al;
@@ -322,7 +324,10 @@ int dlfilter__do_filter_event(struct dlfilter *d,
 
 	d->ctx_valid = true;
 
-	ret = d->filter_event(d->data, &d_sample, d);
+	if (early)
+		ret = d->filter_event_early(d->data, &d_sample, d);
+	else
+		ret = d->filter_event(d->data, &d_sample, d);
 
 	d->ctx_valid = false;
 
diff --git a/tools/perf/util/dlfilter.h b/tools/perf/util/dlfilter.h
index 671e2d3d5a06..eee799ea4382 100644
--- a/tools/perf/util/dlfilter.h
+++ b/tools/perf/util/dlfilter.h
@@ -40,6 +40,9 @@ struct dlfilter {
 	int (*filter_event)(void *data,
 			    const struct perf_dlfilter_sample *sample,
 			    void *ctx);
+	int (*filter_event_early)(void *data,
+				  const struct perf_dlfilter_sample *sample,
+				  void *ctx);
 
 	struct perf_dlfilter_fns *fns;
 };
@@ -54,7 +57,8 @@ int dlfilter__do_filter_event(struct dlfilter *d,
 			      struct evsel *evsel,
 			      struct machine *machine,
 			      struct addr_location *al,
-			      struct addr_location *addr_al);
+			      struct addr_location *addr_al,
+			      bool early);
 
 void dlfilter__cleanup(struct dlfilter *d);
 
@@ -68,7 +72,20 @@ static inline int dlfilter__filter_event(struct dlfilter *d,
 {
 	if (!d || !d->filter_event)
 		return 0;
-	return dlfilter__do_filter_event(d, event, sample, evsel, machine, al, addr_al);
+	return dlfilter__do_filter_event(d, event, sample, evsel, machine, al, addr_al, false);
+}
+
+static inline int dlfilter__filter_event_early(struct dlfilter *d,
+					       union perf_event *event,
+					       struct perf_sample *sample,
+					       struct evsel *evsel,
+					       struct machine *machine,
+					       struct addr_location *al,
+					       struct addr_location *addr_al)
+{
+	if (!d || !d->filter_event_early)
+		return 0;
+	return dlfilter__do_filter_event(d, event, sample, evsel, machine, al, addr_al, true);
 }
 
 #endif
diff --git a/tools/perf/util/perf_dlfilter.h b/tools/perf/util/perf_dlfilter.h
index 132f833f0a0b..d24c3e2a8407 100644
--- a/tools/perf/util/perf_dlfilter.h
+++ b/tools/perf/util/perf_dlfilter.h
@@ -117,4 +117,10 @@ int stop(void *data);
  */
 int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx);
 
+/*
+ * The same as 'filter_event' except it is called before internal
+ * filtering.
+ */
+int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx);
+
 #endif
-- 
2.17.1



* [PATCH RFC 06/11] perf build: Install perf_dlfilter.h
  2021-06-21 15:05 [PATCH RFC 00/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
                   ` (4 preceding siblings ...)
  2021-06-21 15:05 ` [PATCH RFC 05/11] perf script: Add dlfilter__filter_event_early() Adrian Hunter
@ 2021-06-21 15:05 ` Adrian Hunter
  2021-06-21 15:05 ` [PATCH RFC 07/11] perf dlfilter: Add resolve_address() to perf_dlfilter_fns Adrian Hunter
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Adrian Hunter @ 2021-06-21 15:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

Filters written for the --dlfilter option need to include perf_dlfilter.h.
Install it under the include path, in $(includedir)/perf.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Makefile.config | 3 +++
 tools/perf/Makefile.perf   | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index a62d09808fef..87f9a139a89c 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -1118,6 +1118,8 @@ prefix ?= $(HOME)
 endif
 bindir_relative = bin
 bindir = $(abspath $(prefix)/$(bindir_relative))
+includedir_relative = include
+includedir = $(abspath $(prefix)/$(includedir_relative))
 mandir = share/man
 infodir = share/info
 perfexecdir = libexec/perf-core
@@ -1150,6 +1152,7 @@ ETC_PERFCONFIG_SQ = $(subst ','\'',$(ETC_PERFCONFIG))
 STRACE_GROUPS_DIR_SQ = $(subst ','\'',$(STRACE_GROUPS_DIR))
 DESTDIR_SQ = $(subst ','\'',$(DESTDIR))
 bindir_SQ = $(subst ','\'',$(bindir))
+includedir_SQ = $(subst ','\'',$(includedir))
 mandir_SQ = $(subst ','\'',$(mandir))
 infodir_SQ = $(subst ','\'',$(infodir))
 perfexecdir_SQ = $(subst ','\'',$(perfexecdir))
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index e47f04e5b51e..c9e0de5b00c1 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -923,7 +923,9 @@ install-tools: all install-gtk
 	$(call QUIET_INSTALL, binaries) \
 		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(bindir_SQ)'; \
 		$(INSTALL) $(OUTPUT)perf '$(DESTDIR_SQ)$(bindir_SQ)'; \
-		$(LN) '$(DESTDIR_SQ)$(bindir_SQ)/perf' '$(DESTDIR_SQ)$(bindir_SQ)/trace'
+		$(LN) '$(DESTDIR_SQ)$(bindir_SQ)/perf' '$(DESTDIR_SQ)$(bindir_SQ)/trace'; \
+		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(includedir_SQ)/perf'; \
+		$(INSTALL) util/perf_dlfilter.h -t '$(DESTDIR_SQ)$(includedir_SQ)/perf'
 ifndef NO_PERF_READ_VDSO32
 	$(call QUIET_INSTALL, perf-read-vdso32) \
 		$(INSTALL) $(OUTPUT)perf-read-vdso32 '$(DESTDIR_SQ)$(bindir_SQ)';
-- 
2.17.1



* [PATCH RFC 07/11] perf dlfilter: Add resolve_address() to perf_dlfilter_fns
  2021-06-21 15:05 [PATCH RFC 00/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
                   ` (5 preceding siblings ...)
  2021-06-21 15:05 ` [PATCH RFC 06/11] perf build: Install perf_dlfilter.h Adrian Hunter
@ 2021-06-21 15:05 ` Adrian Hunter
  2021-06-21 15:05 ` [PATCH RFC 08/11] perf dlfilter: Add insn() " Adrian Hunter
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Adrian Hunter @ 2021-06-21 15:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

Add a function, for use by dlfilters, to resolve addresses from branch
stacks or callchains.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
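For illustration only, a sketch of the size handshake that resolve_address()
relies on: the caller sets 'size' to the size of the structure it was built
against, and the provider copies no more than that many bytes, then restores
the caller's 'size'. The two layouts 'al_v1' and 'al_v2' and the function
provider_fill() are hypothetical; they only model an older filter talking to
a newer perf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout known to a filter built against an older header */
struct al_v1 {
	uint32_t size;
	uint32_t symoff;
	uint64_t addr;
};

/* Hypothetical layout known to the provider, with one extra member */
struct al_v2 {
	uint32_t size;
	uint32_t symoff;
	uint64_t addr;
	uint64_t sym_start;
};

/*
 * Fill the caller's structure, whichever version it is. 'out' must start
 * with a uint32_t size that the caller has already set.
 * Returns 0 on success, -1 otherwise.
 */
static int32_t provider_fill(void *out, uint64_t address)
{
	struct al_v2 full = {
		.size      = sizeof(full),
		.symoff    = 0x10,
		.addr      = address,
		.sym_start = address - 0x10,
	};
	uint32_t sz;
	size_t n;

	if (!out)
		return -1;

	memcpy(&sz, out, sizeof(sz));		/* size declared by the caller */
	n = sz < sizeof(full) ? sz : sizeof(full);
	memcpy(out, &full, n);			/* never write past the caller's struct */
	memcpy(out, &sz, sizeof(sz));		/* restore the caller's size */

	return 0;
}
```

The same pattern lets members be appended to the structure later without
breaking filters compiled against the shorter layout.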
 tools/perf/Documentation/perf-dlfilter.txt |  6 ++++-
 tools/perf/util/dlfilter.c                 | 29 ++++++++++++++++++++++
 tools/perf/util/perf_dlfilter.h            |  7 +++++-
 3 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-dlfilter.txt b/tools/perf/Documentation/perf-dlfilter.txt
index 8eada542330a..c3fe3bc02819 100644
--- a/tools/perf/Documentation/perf-dlfilter.txt
+++ b/tools/perf/Documentation/perf-dlfilter.txt
@@ -109,7 +109,8 @@ file is loaded. The functions can be called by 'filter_event' or
 struct perf_dlfilter_fns {
 	const struct perf_dlfilter_al *(*resolve_ip)(void *ctx);
 	const struct perf_dlfilter_al *(*resolve_addr)(void *ctx);
-	void *(*reserved[126])(void *);
+	__s32 (*resolve_address)(void *ctx, __u64 address, struct perf_dlfilter_al *al);
+	void *(*reserved[125])(void *);
 };
 ----
 
@@ -117,6 +118,9 @@ struct perf_dlfilter_fns {
 
 'resolve_addr' returns information about addr (if addr_correlates_sym).
 
+'resolve_address' provides information about 'address'. al->size must be set
+before calling. Returns 0 on success, -1 otherwise.
+
 The perf_dlfilter_al structure
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
index 8a2b196f07a7..3b29bf9f2bd3 100644
--- a/tools/perf/util/dlfilter.c
+++ b/tools/perf/util/dlfilter.c
@@ -131,9 +131,38 @@ static const struct perf_dlfilter_al *dlfilter__resolve_addr(void *ctx)
 	return d_addr_al;
 }
 
+static __s32 dlfilter__resolve_address(void *ctx, __u64 address, struct perf_dlfilter_al *d_al_p)
+{
+	struct dlfilter *d = (struct dlfilter *)ctx;
+	struct perf_dlfilter_al d_al;
+	struct addr_location al;
+	struct thread *thread;
+	__u32 sz;
+
+	if (!d->ctx_valid || !d_al_p)
+		return -1;
+
+	thread = get_thread(d);
+	if (!thread)
+		return -1;
+
+	thread__find_symbol_fb(thread, d->sample->cpumode, address, &al);
+
+	al_to_d_al(&al, &d_al);
+
+	d_al.is_kernel_ip = machine__kernel_ip(d->machine, address);
+
+	sz = d_al_p->size;
+	memcpy(d_al_p, &d_al, min((size_t)sz, sizeof(d_al)));
+	d_al_p->size = sz;
+
+	return 0;
+}
+
 static const struct perf_dlfilter_fns perf_dlfilter_fns = {
 	.resolve_ip      = dlfilter__resolve_ip,
 	.resolve_addr    = dlfilter__resolve_addr,
+	.resolve_address = dlfilter__resolve_address,
 };
 
 #define CHECK_FLAG(x) BUILD_BUG_ON((u64)PERF_DLFILTER_FLAG_ ## x != (u64)PERF_IP_FLAG_ ## x)
diff --git a/tools/perf/util/perf_dlfilter.h b/tools/perf/util/perf_dlfilter.h
index d24c3e2a8407..97bceb625b54 100644
--- a/tools/perf/util/perf_dlfilter.h
+++ b/tools/perf/util/perf_dlfilter.h
@@ -90,8 +90,13 @@ struct perf_dlfilter_fns {
 	const struct perf_dlfilter_al *(*resolve_ip)(void *ctx);
 	/* Return information about addr (if addr_correlates_sym) */
 	const struct perf_dlfilter_al *(*resolve_addr)(void *ctx);
+	/*
+	 * Return information about address (al->size must be set before
+	 * calling). Returns 0 on success, -1 otherwise.
+	 */
+	__s32 (*resolve_address)(void *ctx, __u64 address, struct perf_dlfilter_al *al);
 	/* Reserved */
-	void *(*reserved[126])(void *);
+	void *(*reserved[125])(void *);
 };
 
 /*
-- 
2.17.1



* [PATCH RFC 08/11] perf dlfilter: Add insn() to perf_dlfilter_fns
  2021-06-21 15:05 [PATCH RFC 00/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
                   ` (6 preceding siblings ...)
  2021-06-21 15:05 ` [PATCH RFC 07/11] perf dlfilter: Add resolve_address() to perf_dlfilter_fns Adrian Hunter
@ 2021-06-21 15:05 ` Adrian Hunter
  2021-06-21 15:05 ` [PATCH RFC 09/11] perf dlfilter: Add srcline() " Adrian Hunter
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 17+ messages in thread
From: Adrian Hunter @ 2021-06-21 15:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

Add a function, for use by dlfilters, to return instruction bytes.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
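For illustration only, a sketch of how a filter might use insn(): fetch the
instruction bytes for the current sample and keep the sample only if the
first byte is 0xE8 (the x86 near CALL rel32 opcode, ignoring prefixes). The
struct 'insn_fns_stub', the test double fake_insn() and the function
keep_direct_calls() are hypothetical; in a real filter perf populates the
exported 'perf_dlfilter_fns' object and 'ctx' is the opaque pointer passed
to filter_event().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the single perf_dlfilter_fns member used here */
struct insn_fns_stub {
	const uint8_t *(*insn)(void *ctx, uint32_t *length);
};

static struct insn_fns_stub fns;

/* Hypothetical test double: 'ctx' points at a byte buffer, not at perf state */
struct fake_ctx {
	const uint8_t *bytes;
	uint32_t len;
};

static const uint8_t *fake_insn(void *ctx, uint32_t *length)
{
	const struct fake_ctx *c = ctx;

	*length = c->len;
	return c->len ? c->bytes : NULL;
}

/* Returns 0 to keep the sample, 1 to filter it out */
static int keep_direct_calls(void *ctx)
{
	uint32_t len = 0;
	const uint8_t *bytes = fns.insn ? fns.insn(ctx, &len) : NULL;

	/* insn() returns NULL when no instruction bytes are available */
	if (!bytes || !len)
		return 1;

	return bytes[0] == 0xE8 ? 0 : 1;
}
```

Checking both the returned pointer and the length guards against samples
for which the bytes could not be fetched.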
 tools/perf/Documentation/perf-dlfilter.txt |  5 +++-
 tools/perf/util/dlfilter.c                 | 32 ++++++++++++++++++++++
 tools/perf/util/perf_dlfilter.h            |  4 ++-
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-dlfilter.txt b/tools/perf/Documentation/perf-dlfilter.txt
index c3fe3bc02819..5ed64bbf084e 100644
--- a/tools/perf/Documentation/perf-dlfilter.txt
+++ b/tools/perf/Documentation/perf-dlfilter.txt
@@ -110,7 +110,8 @@ struct perf_dlfilter_fns {
 	const struct perf_dlfilter_al *(*resolve_ip)(void *ctx);
 	const struct perf_dlfilter_al *(*resolve_addr)(void *ctx);
 	__s32 (*resolve_address)(void *ctx, __u64 address, struct perf_dlfilter_al *al);
-	void *(*reserved[125])(void *);
+	const __u8 *(*insn)(void *ctx, __u32 *length);
+	void *(*reserved[124])(void *);
 };
 ----
 
@@ -121,6 +122,8 @@ struct perf_dlfilter_fns {
 'resolve_address' provides information about 'address'. al->size must be set
 before calling. Returns 0 on success, -1 otherwise.
 
+'insn' returns instruction bytes and length.
+
 The perf_dlfilter_al structure
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
index 3b29bf9f2bd3..375fb01bdeb8 100644
--- a/tools/perf/util/dlfilter.c
+++ b/tools/perf/util/dlfilter.c
@@ -15,6 +15,7 @@
 #include "dso.h"
 #include "map.h"
 #include "thread.h"
+#include "trace-event.h"
 #include "symbol.h"
 #include "dlfilter.h"
 #include "perf_dlfilter.h"
@@ -159,10 +160,41 @@ static __s32 dlfilter__resolve_address(void *ctx, __u64 address, struct perf_dlf
 	return 0;
 }
 
+static const __u8 *dlfilter__insn(void *ctx, __u32 *len)
+{
+	struct dlfilter *d = (struct dlfilter *)ctx;
+
+	if (!len)
+		return NULL;
+
+	*len = 0;
+
+	if (!d->ctx_valid)
+		return NULL;
+
+	if (d->sample->ip && !d->sample->insn_len) {
+		struct addr_location *al = d->al;
+
+		if (!al->thread && machine__resolve(d->machine, al, d->sample) < 0)
+			return NULL;
+
+		if (al->thread->maps && al->thread->maps->machine)
+			script_fetch_insn(d->sample, al->thread, al->thread->maps->machine);
+	}
+
+	if (!d->sample->insn_len)
+		return NULL;
+
+	*len = d->sample->insn_len;
+
+	return (__u8 *)d->sample->insn;
+}
+
 static const struct perf_dlfilter_fns perf_dlfilter_fns = {
 	.resolve_ip      = dlfilter__resolve_ip,
 	.resolve_addr    = dlfilter__resolve_addr,
 	.resolve_address = dlfilter__resolve_address,
+	.insn            = dlfilter__insn,
 };
 
 #define CHECK_FLAG(x) BUILD_BUG_ON((u64)PERF_DLFILTER_FLAG_ ## x != (u64)PERF_IP_FLAG_ ## x)
diff --git a/tools/perf/util/perf_dlfilter.h b/tools/perf/util/perf_dlfilter.h
index 97bceb625b54..913b773af268 100644
--- a/tools/perf/util/perf_dlfilter.h
+++ b/tools/perf/util/perf_dlfilter.h
@@ -95,8 +95,10 @@ struct perf_dlfilter_fns {
 	 * calling). Returns 0 on success, -1 otherwise.
 	 */
 	__s32 (*resolve_address)(void *ctx, __u64 address, struct perf_dlfilter_al *al);
+	/* Return instruction bytes and length */
+	const __u8 *(*insn)(void *ctx, __u32 *length);
 	/* Reserved */
-	void *(*reserved[125])(void *);
+	void *(*reserved[124])(void *);
 };
 
 /*
-- 
2.17.1



* [PATCH RFC 09/11] perf dlfilter: Add srcline() to perf_dlfilter_fns
  2021-06-21 15:05 [PATCH RFC 00/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
                   ` (7 preceding siblings ...)
  2021-06-21 15:05 ` [PATCH RFC 08/11] perf dlfilter: Add insn() " Adrian Hunter
@ 2021-06-21 15:05 ` Adrian Hunter
  2021-06-21 15:05 ` [PATCH RFC 10/11] perf dlfilter: Add attr() " Adrian Hunter
  2021-06-21 15:05 ` [PATCH RFC 11/11] perf dlfilter: Add object_code() " Adrian Hunter
  10 siblings, 0 replies; 17+ messages in thread
From: Adrian Hunter @ 2021-06-21 15:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

Add a function, for use by dlfilters, to return the source code file name
and line number.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
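For illustration only, a sketch of how a filter might use srcline(): keep
only samples that map to a given source file and line range. The struct
'srcline_fns_stub', the test double fake_srcline() and the function
keep_lines_in_file() are hypothetical; in a real filter the function pointer
comes from the 'perf_dlfilter_fns' object that perf populates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the single perf_dlfilter_fns member used here */
struct srcline_fns_stub {
	const char *(*srcline)(void *ctx, uint32_t *line_number);
};

static struct srcline_fns_stub fns;

/* Hypothetical test double: 'ctx' points at a fixed file/line pair */
struct fake_loc {
	const char *file;
	uint32_t line;
};

static const char *fake_srcline(void *ctx, uint32_t *line_number)
{
	const struct fake_loc *loc = ctx;

	*line_number = loc->line;
	return loc->file;
}

/* Returns 0 to keep the sample, 1 to filter it out */
static int keep_lines_in_file(void *ctx, const char *file,
			      uint32_t first, uint32_t last)
{
	uint32_t line = 0;
	const char *src = fns.srcline ? fns.srcline(ctx, &line) : NULL;

	/* srcline() returns NULL when no source information is available */
	if (!src || strcmp(src, file))
		return 1;

	return (line >= first && line <= last) ? 0 : 1;
}
```

An exact string comparison is used here for simplicity; whether the returned
name is a full path or a base name is not specified by this patch, so a real
filter may need a looser match.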
 tools/perf/Documentation/perf-dlfilter.txt |  5 +++-
 tools/perf/util/dlfilter.c                 | 28 ++++++++++++++++++++++
 tools/perf/util/perf_dlfilter.h            |  4 +++-
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-dlfilter.txt b/tools/perf/Documentation/perf-dlfilter.txt
index 5ed64bbf084e..b6f958983584 100644
--- a/tools/perf/Documentation/perf-dlfilter.txt
+++ b/tools/perf/Documentation/perf-dlfilter.txt
@@ -111,7 +111,8 @@ struct perf_dlfilter_fns {
 	const struct perf_dlfilter_al *(*resolve_addr)(void *ctx);
 	__s32 (*resolve_address)(void *ctx, __u64 address, struct perf_dlfilter_al *al);
 	const __u8 *(*insn)(void *ctx, __u32 *length);
-	void *(*reserved[124])(void *);
+	const char *(*srcline)(void *ctx, __u32 *line_number);
+	void *(*reserved[123])(void *);
 };
 ----
 
@@ -124,6 +125,8 @@ before calling. Returns 0 on success, -1 otherwise.
 
 'insn' returns instruction bytes and length.
 
+'srcline' returns source file name and line number.
+
 The perf_dlfilter_al structure
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
index 375fb01bdeb8..d71b0c97d1eb 100644
--- a/tools/perf/util/dlfilter.c
+++ b/tools/perf/util/dlfilter.c
@@ -17,6 +17,7 @@
 #include "thread.h"
 #include "trace-event.h"
 #include "symbol.h"
+#include "srcline.h"
 #include "dlfilter.h"
 #include "perf_dlfilter.h"
 
@@ -190,11 +191,38 @@ static const __u8 *dlfilter__insn(void *ctx, __u32 *len)
 	return (__u8 *)d->sample->insn;
 }
 
+static const char *dlfilter__srcline(void *ctx, __u32 *line_no)
+{
+	struct dlfilter *d = (struct dlfilter *)ctx;
+	struct addr_location *al;
+	unsigned int line = 0;
+	char *srcfile = NULL;
+	struct map *map;
+	u64 addr;
+
+	if (!d->ctx_valid || !line_no)
+		return NULL;
+
+	al = get_al(d);
+	if (!al)
+		return NULL;
+
+	map = al->map;
+	addr = al->addr;
+
+	if (map && map->dso)
+		srcfile = get_srcline_split(map->dso, map__rip_2objdump(map, addr), &line);
+
+	*line_no = line;
+	return srcfile;
+}
+
 static const struct perf_dlfilter_fns perf_dlfilter_fns = {
 	.resolve_ip      = dlfilter__resolve_ip,
 	.resolve_addr    = dlfilter__resolve_addr,
 	.resolve_address = dlfilter__resolve_address,
 	.insn            = dlfilter__insn,
+	.srcline         = dlfilter__srcline,
 };
 
 #define CHECK_FLAG(x) BUILD_BUG_ON((u64)PERF_DLFILTER_FLAG_ ## x != (u64)PERF_IP_FLAG_ ## x)
diff --git a/tools/perf/util/perf_dlfilter.h b/tools/perf/util/perf_dlfilter.h
index 913b773af268..a91e314ba24a 100644
--- a/tools/perf/util/perf_dlfilter.h
+++ b/tools/perf/util/perf_dlfilter.h
@@ -97,8 +97,10 @@ struct perf_dlfilter_fns {
 	__s32 (*resolve_address)(void *ctx, __u64 address, struct perf_dlfilter_al *al);
 	/* Return instruction bytes and length */
 	const __u8 *(*insn)(void *ctx, __u32 *length);
+	/* Return source file name and line number */
+	const char *(*srcline)(void *ctx, __u32 *line_number);
 	/* Reserved */
-	void *(*reserved[124])(void *);
+	void *(*reserved[123])(void *);
 };
 
 /*
-- 
2.17.1



* [PATCH RFC 10/11] perf dlfilter: Add attr() to perf_dlfilter_fns
  2021-06-21 15:05 [PATCH RFC 00/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
                   ` (8 preceding siblings ...)
  2021-06-21 15:05 ` [PATCH RFC 09/11] perf dlfilter: Add srcline() " Adrian Hunter
@ 2021-06-21 15:05 ` Adrian Hunter
  2021-06-21 15:05 ` [PATCH RFC 11/11] perf dlfilter: Add object_code() " Adrian Hunter
  10 siblings, 0 replies; 17+ messages in thread
From: Adrian Hunter @ 2021-06-21 15:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

Add a function, for use by dlfilters, to return the perf_event_attr
structure.
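
As a rough sketch only (not part of this patch), a filter could use the
returned structure to select events by type. The helper name, the
attr_fn typedef and the fake_attr() stub are assumptions made for
illustration. PERF_TYPE_HARDWARE and PERF_COUNT_HW_INSTRUCTIONS come
from <linux/perf_event.h>.

```c
#include <linux/perf_event.h>
#include <stddef.h>

/* Same shape as the 'attr' member of struct perf_dlfilter_fns */
typedef struct perf_event_attr *(*attr_fn)(void *ctx);

/*
 * Hypothetical helper: return 0 (keep the sample) only for hardware
 * instruction-count events, else 1 (filter it out).
 */
int keep_hw_instructions(attr_fn attr, void *ctx)
{
	const struct perf_event_attr *a = attr(ctx);

	if (!a)
		return 1;	/* attr() returns NULL if ctx is not valid */

	return (a->type == PERF_TYPE_HARDWARE &&
		a->config == PERF_COUNT_HW_INSTRUCTIONS) ? 0 : 1;
}

/* Stand-in for perf: treats ctx itself as the attr (testing only) */
struct perf_event_attr *fake_attr(void *ctx)
{
	return (struct perf_event_attr *)ctx;
}
```

In a real filter the first argument would presumably be
perf_dlfilter_fns.attr, called from within filter_event().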

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-dlfilter.txt |  5 ++++-
 tools/perf/util/dlfilter.c                 | 11 +++++++++++
 tools/perf/util/perf_dlfilter.h            |  4 +++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-dlfilter.txt b/tools/perf/Documentation/perf-dlfilter.txt
index b6f958983584..d37913343449 100644
--- a/tools/perf/Documentation/perf-dlfilter.txt
+++ b/tools/perf/Documentation/perf-dlfilter.txt
@@ -112,7 +112,8 @@ struct perf_dlfilter_fns {
 	__s32 (*resolve_address)(void *ctx, __u64 address, struct perf_dlfilter_al *al);
 	const __u8 *(*insn)(void *ctx, __u32 *length);
 	const char *(*srcline)(void *ctx, __u32 *line_number);
-	void *(*reserved[123])(void *);
+	struct perf_event_attr *(*attr)(void *ctx);
+	void *(*reserved[122])(void *);
 };
 ----
 
@@ -127,6 +128,8 @@ before calling. Returns 0 on success, -1 otherwise.
 
 'srcline' return source file name and line number.
 
+'attr' returns perf_event_attr, refer <linux/perf_event.h>.
+
 The perf_dlfilter_al structure
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
index d71b0c97d1eb..2e89f322ff60 100644
--- a/tools/perf/util/dlfilter.c
+++ b/tools/perf/util/dlfilter.c
@@ -217,12 +217,23 @@ static const char *dlfilter__srcline(void *ctx, __u32 *line_no)
 	return srcfile;
 }
 
+static struct perf_event_attr *dlfilter__attr(void *ctx)
+{
+	struct dlfilter *d = (struct dlfilter *)ctx;
+
+	if (!d->ctx_valid)
+		return NULL;
+
+	return &d->evsel->core.attr;
+}
+
 static const struct perf_dlfilter_fns perf_dlfilter_fns = {
 	.resolve_ip      = dlfilter__resolve_ip,
 	.resolve_addr    = dlfilter__resolve_addr,
 	.resolve_address = dlfilter__resolve_address,
 	.insn            = dlfilter__insn,
 	.srcline         = dlfilter__srcline,
+	.attr            = dlfilter__attr,
 };
 
 #define CHECK_FLAG(x) BUILD_BUG_ON((u64)PERF_DLFILTER_FLAG_ ## x != (u64)PERF_IP_FLAG_ ## x)
diff --git a/tools/perf/util/perf_dlfilter.h b/tools/perf/util/perf_dlfilter.h
index a91e314ba24a..e2cdf416b22e 100644
--- a/tools/perf/util/perf_dlfilter.h
+++ b/tools/perf/util/perf_dlfilter.h
@@ -99,8 +99,10 @@ struct perf_dlfilter_fns {
 	const __u8 *(*insn)(void *ctx, __u32 *length);
 	/* Return source file name and line number */
 	const char *(*srcline)(void *ctx, __u32 *line_number);
+	/* Return perf_event_attr, refer <linux/perf_event.h> */
+	struct perf_event_attr *(*attr)(void *ctx);
 	/* Reserved */
-	void *(*reserved[123])(void *);
+	void *(*reserved[122])(void *);
 };
 
 /*
-- 
2.17.1



* [PATCH RFC 11/11] perf dlfilter: Add object_code() to perf_dlfilter_fns
  2021-06-21 15:05 [PATCH RFC 00/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
                   ` (9 preceding siblings ...)
  2021-06-21 15:05 ` [PATCH RFC 10/11] perf dlfilter: Add attr() " Adrian Hunter
@ 2021-06-21 15:05 ` Adrian Hunter
  10 siblings, 0 replies; 17+ messages in thread
From: Adrian Hunter @ 2021-06-21 15:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

Add a function, for use by dlfilters, to read object code.
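
As a rough sketch only (not part of this patch), a filter could compare
the bytes at an address against a known pattern. The helper name, the
object_code_fn typedef, the stdint types in place of __s32/__u32/__u64
and the fake_object_code() stub are assumptions made so the fragment
builds without perf_dlfilter.h. Note that a read may be shorter than
requested when it reaches the end of the map, so the return value needs
checking against the requested length.

```c
#include <stdint.h>
#include <string.h>

/* Same shape as the 'object_code' member of struct perf_dlfilter_fns,
 * with stdint types standing in for __s32/__u64/__u32 (assumption). */
typedef int32_t (*object_code_fn)(void *ctx, uint64_t ip, void *buf, uint32_t len);

/*
 * Hypothetical helper: return 1 if the object code at 'ip' begins with
 * the 'n' bytes in 'pattern', 0 otherwise (including on read errors).
 */
int code_starts_with(object_code_fn object_code, void *ctx, uint64_t ip,
		     const uint8_t *pattern, uint32_t n)
{
	uint8_t buf[16];
	int32_t got;

	if (n > sizeof(buf))
		return 0;

	got = object_code(ctx, ip, buf, n);
	if (got < 0 || (uint32_t)got < n)
		return 0;	/* error, or short read at the end of the map */

	return memcmp(buf, pattern, n) == 0;
}

/* Stand-in for perf: a 4-byte "map" at 0x1000 (testing only) */
#define FAKE_START 0x1000ULL
static const uint8_t fake_text[] = { 0x55, 0x48, 0x89, 0xe5 };

int32_t fake_object_code(void *ctx, uint64_t ip, void *buf, uint32_t len)
{
	uint64_t end = FAKE_START + sizeof(fake_text);

	(void)ctx;
	if (ip < FAKE_START || ip >= end)
		return -1;
	if (ip + len > end)
		len = end - ip;	/* clamp the read to the end of the map */
	memcpy(buf, fake_text + (ip - FAKE_START), len);
	return (int32_t)len;
}
```

In a real filter the first argument would presumably be
perf_dlfilter_fns.object_code and 'ip' would typically be sample->ip.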

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-dlfilter.txt |  5 +++-
 tools/perf/util/dlfilter.c                 | 34 ++++++++++++++++++++++
 tools/perf/util/perf_dlfilter.h            |  4 ++-
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-dlfilter.txt b/tools/perf/Documentation/perf-dlfilter.txt
index d37913343449..8a6d21e30fc5 100644
--- a/tools/perf/Documentation/perf-dlfilter.txt
+++ b/tools/perf/Documentation/perf-dlfilter.txt
@@ -113,7 +113,8 @@ struct perf_dlfilter_fns {
 	const __u8 *(*insn)(void *ctx, __u32 *length);
 	const char *(*srcline)(void *ctx, __u32 *line_number);
 	struct perf_event_attr *(*attr)(void *ctx);
-	void *(*reserved[122])(void *);
+	__s32 (*object_code)(void *ctx, __u64 ip, void *buf, __u32 len);
+	void *(*reserved[121])(void *);
 };
 ----
 
@@ -130,6 +131,8 @@ before calling. Returns 0 on success, -1 otherwise.
 
 'attr' returns perf_event_attr, refer <linux/perf_event.h>.
 
+'object_code' reads object code and returns the number of bytes read.
+
 The perf_dlfilter_al structure
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
index 2e89f322ff60..fa087422baa1 100644
--- a/tools/perf/util/dlfilter.c
+++ b/tools/perf/util/dlfilter.c
@@ -227,6 +227,39 @@ static struct perf_event_attr *dlfilter__attr(void *ctx)
 	return &d->evsel->core.attr;
 }
 
+static __s32 dlfilter__object_code(void *ctx, __u64 ip, void *buf, __u32 len)
+{
+	struct dlfilter *d = (struct dlfilter *)ctx;
+	struct addr_location *al;
+	struct addr_location a;
+	struct map *map;
+	u64 offset;
+
+	if (!d->ctx_valid)
+		return -1;
+
+	al = get_al(d);
+	if (!al)
+		return -1;
+
+	map = al->map;
+
+	if (map && ip >= map->start && ip < map->end &&
+	    machine__kernel_ip(d->machine, ip) == machine__kernel_ip(d->machine, d->sample->ip))
+		goto have_map;
+
+	thread__find_map_fb(al->thread, d->sample->cpumode, ip, &a);
+	if (!a.map)
+		return -1;
+
+	map = a.map;
+have_map:
+	offset = map->map_ip(map, ip);
+	if (ip + len >= map->end)
+		len = map->end - ip;
+	return dso__data_read_offset(map->dso, d->machine, offset, buf, len);
+}
+
 static const struct perf_dlfilter_fns perf_dlfilter_fns = {
 	.resolve_ip      = dlfilter__resolve_ip,
 	.resolve_addr    = dlfilter__resolve_addr,
@@ -234,6 +267,7 @@ static const struct perf_dlfilter_fns perf_dlfilter_fns = {
 	.insn            = dlfilter__insn,
 	.srcline         = dlfilter__srcline,
 	.attr            = dlfilter__attr,
+	.object_code     = dlfilter__object_code,
 };
 
 #define CHECK_FLAG(x) BUILD_BUG_ON((u64)PERF_DLFILTER_FLAG_ ## x != (u64)PERF_IP_FLAG_ ## x)
diff --git a/tools/perf/util/perf_dlfilter.h b/tools/perf/util/perf_dlfilter.h
index e2cdf416b22e..4429f3836fa7 100644
--- a/tools/perf/util/perf_dlfilter.h
+++ b/tools/perf/util/perf_dlfilter.h
@@ -101,8 +101,10 @@ struct perf_dlfilter_fns {
 	const char *(*srcline)(void *ctx, __u32 *line_number);
 	/* Return perf_event_attr, refer <linux/perf_event.h> */
 	struct perf_event_attr *(*attr)(void *ctx);
+	/* Read object code, return number of bytes read */
+	__s32 (*object_code)(void *ctx, __u64 ip, void *buf, __u32 len);
 	/* Reserved */
-	void *(*reserved[122])(void *);
+	void *(*reserved[121])(void *);
 };
 
 /*
-- 
2.17.1



* Re: [PATCH RFC 01/11] perf script: Move filter_cpu() earlier
  2021-06-21 15:05 ` [PATCH RFC 01/11] perf script: Move filter_cpu() earlier Adrian Hunter
@ 2021-06-22 18:16   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-06-22 18:16 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

Em Mon, Jun 21, 2021 at 06:05:04PM +0300, Adrian Hunter escreveu:
> Generally, it should be more efficient if filter_cpu() comes before
> machine__resolve() because filter_cpu() is much less code than
> machine__resolve().

Simple patch, I like it, applied.

- Arnaldo
 
> Example:
> 
>  $ perf record --sample-cpu -- make -C tools/perf >/dev/null
> 
> Before:
> 
>  $ perf stat -- perf script -C 0 >/dev/null
> 
>   Performance counter stats for 'perf script -C 0':
> 
>             116.94 msec task-clock                #    0.992 CPUs utilized
>                  2      context-switches          #   17.103 /sec
>                  0      cpu-migrations            #    0.000 /sec
>              8,187      page-faults               #   70.011 K/sec
>        478,351,812      cycles                    #    4.091 GHz
>        564,785,464      instructions              #    1.18  insn per cycle
>        114,341,105      branches                  #  977.789 M/sec
>          2,615,495      branch-misses             #    2.29% of all branches
> 
>        0.117840576 seconds time elapsed
> 
>        0.085040000 seconds user
>        0.032396000 seconds sys
> 
> After:
> 
>  $ perf stat -- perf script -C 0 >/dev/null
> 
>   Performance counter stats for 'perf script -C 0':
> 
>             107.45 msec task-clock                #    0.992 CPUs utilized
>                  3      context-switches          #   27.919 /sec
>                  0      cpu-migrations            #    0.000 /sec
>              7,964      page-faults               #   74.117 K/sec
>        438,417,260      cycles                    #    4.080 GHz
>        522,571,855      instructions              #    1.19  insn per cycle
>        105,187,488      branches                  #  978.921 M/sec
>          2,356,261      branch-misses             #    2.24% of all branches
> 
>        0.108282546 seconds time elapsed
> 
>        0.095935000 seconds user
>        0.011991000 seconds sys
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/builtin-script.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 57488d60b64a..08a2b5d51018 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -2191,6 +2191,9 @@ static int process_sample_event(struct perf_tool *tool,
>  		return 0;
>  	}
>  
> +	if (filter_cpu(sample))
> +		return 0;
> +
>  	if (machine__resolve(machine, &al, sample) < 0) {
>  		pr_err("problem processing %d event, skipping it.\n",
>  		       event->header.type);
> @@ -2200,9 +2203,6 @@ static int process_sample_event(struct perf_tool *tool,
>  	if (al.filtered)
>  		goto out_put;
>  
> -	if (filter_cpu(sample))
> -		goto out_put;
> -
>  	if (scripting_ops) {
>  		struct addr_location *addr_al_ptr = NULL;
>  		struct addr_location addr_al;
> -- 
> 2.17.1
> 

-- 

- Arnaldo


* Re: [PATCH RFC 02/11] perf script: Move filtering before scripting
  2021-06-21 15:05 ` [PATCH RFC 02/11] perf script: Move filtering before scripting Adrian Hunter
@ 2021-06-22 18:18   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-06-22 18:18 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

Em Mon, Jun 21, 2021 at 06:05:05PM +0300, Adrian Hunter escreveu:
> To make it possible to use filtering with scripts, move filtering before
> scripting.


Thanks, applied.

- Arnaldo

 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/builtin-script.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 08a2b5d51018..ff7b43899f2e 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -1984,12 +1984,6 @@ static void process_event(struct perf_script *script,
>  	if (output[type].fields == 0)
>  		return;
>  
> -	if (!show_event(sample, evsel, thread, al))
> -		return;
> -
> -	if (evswitch__discard(&script->evswitch, evsel))
> -		return;
> -
>  	++es->samples;
>  
>  	perf_sample__fprintf_start(script, sample, thread, evsel,
> @@ -2203,6 +2197,12 @@ static int process_sample_event(struct perf_tool *tool,
>  	if (al.filtered)
>  		goto out_put;
>  
> +	if (!show_event(sample, evsel, al.thread, &al))
> +		goto out_put;
> +
> +	if (evswitch__discard(&scr->evswitch, evsel))
> +		goto out_put;
> +
>  	if (scripting_ops) {
>  		struct addr_location *addr_al_ptr = NULL;
>  		struct addr_location addr_al;
> -- 
> 2.17.1
> 

-- 

- Arnaldo


* Re: [PATCH RFC 03/11] perf script: Share addr_al between functions
  2021-06-21 15:05 ` [PATCH RFC 03/11] perf script: Share addr_al between functions Adrian Hunter
@ 2021-06-22 18:21   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-06-22 18:21 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

Em Mon, Jun 21, 2021 at 06:05:06PM +0300, Adrian Hunter escreveu:
> Share the addr_location of 'addr' so that it need not be resolved more than
> once.

Another nice patch, i.e. don't resolve it unconditionally; let the first
function in the call chain that needs it do the resolving, then reuse the
result if another one needs it.
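
Reduced to its essentials, the pattern is roughly the following. This is
a sketch only: the structs are cut-down stand-ins, and fake_resolve()
and addr_sym() are hypothetical names, with fake_resolve() playing the
part of thread__resolve(). A NULL 'thread' marks addr_al as not yet
resolved, as in the patch.

```c
#include <stddef.h>

/* Cut-down stand-ins for the perf structures (assumptions) */
struct thread {
	int tid;
};

struct addr_location {
	struct thread *thread;	/* NULL => not resolved yet */
	const char *sym;
};

/* Counts how often the (expensive) resolve step actually runs */
int resolve_calls;

/* Stand-in for thread__resolve() */
void fake_resolve(struct thread *thread, struct addr_location *al)
{
	al->thread = thread;
	al->sym = "bar";
	resolve_calls++;
}

/* Any caller that needs addr_al resolves it on first use only */
const char *addr_sym(struct thread *thread, struct addr_location *addr_al)
{
	if (!addr_al->thread)
		fake_resolve(thread, addr_al);
	return addr_al->sym;
}
```

The caller only has to set addr_al.thread = NULL once per sample before
handing addr_al down the chain.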

Looks ok, applied.

- Arnaldo
 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/builtin-script.c | 38 +++++++++++++++++++++++--------------
>  1 file changed, 24 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index ff7b43899f2e..d2771a997e26 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -1337,17 +1337,18 @@ static const char *resolve_branch_sym(struct perf_sample *sample,
>  				      struct evsel *evsel,
>  				      struct thread *thread,
>  				      struct addr_location *al,
> +				      struct addr_location *addr_al,
>  				      u64 *ip)
>  {
> -	struct addr_location addr_al;
>  	struct perf_event_attr *attr = &evsel->core.attr;
>  	const char *name = NULL;
>  
>  	if (sample->flags & (PERF_IP_FLAG_CALL | PERF_IP_FLAG_TRACE_BEGIN)) {
>  		if (sample_addr_correlates_sym(attr)) {
> -			thread__resolve(thread, &addr_al, sample);
> -			if (addr_al.sym)
> -				name = addr_al.sym->name;
> +			if (!addr_al->thread)
> +				thread__resolve(thread, addr_al, sample);
> +			if (addr_al->sym)
> +				name = addr_al->sym->name;
>  			else
>  				*ip = sample->addr;
>  		} else {
> @@ -1365,7 +1366,9 @@ static const char *resolve_branch_sym(struct perf_sample *sample,
>  static int perf_sample__fprintf_callindent(struct perf_sample *sample,
>  					   struct evsel *evsel,
>  					   struct thread *thread,
> -					   struct addr_location *al, FILE *fp)
> +					   struct addr_location *al,
> +					   struct addr_location *addr_al,
> +					   FILE *fp)
>  {
>  	struct perf_event_attr *attr = &evsel->core.attr;
>  	size_t depth = thread_stack__depth(thread, sample->cpu);
> @@ -1382,7 +1385,7 @@ static int perf_sample__fprintf_callindent(struct perf_sample *sample,
>  	if (thread->ts && sample->flags & PERF_IP_FLAG_RETURN)
>  		depth += 1;
>  
> -	name = resolve_branch_sym(sample, evsel, thread, al, &ip);
> +	name = resolve_branch_sym(sample, evsel, thread, al, addr_al, &ip);
>  
>  	if (PRINT_FIELD(DSO) && !(PRINT_FIELD(IP) || PRINT_FIELD(ADDR))) {
>  		dlen += fprintf(fp, "(");
> @@ -1466,6 +1469,7 @@ static int perf_sample__fprintf_bts(struct perf_sample *sample,
>  				    struct evsel *evsel,
>  				    struct thread *thread,
>  				    struct addr_location *al,
> +				    struct addr_location *addr_al,
>  				    struct machine *machine, FILE *fp)
>  {
>  	struct perf_event_attr *attr = &evsel->core.attr;
> @@ -1474,7 +1478,7 @@ static int perf_sample__fprintf_bts(struct perf_sample *sample,
>  	int printed = 0;
>  
>  	if (PRINT_FIELD(CALLINDENT))
> -		printed += perf_sample__fprintf_callindent(sample, evsel, thread, al, fp);
> +		printed += perf_sample__fprintf_callindent(sample, evsel, thread, al, addr_al, fp);
>  
>  	/* print branch_from information */
>  	if (PRINT_FIELD(IP)) {
> @@ -1931,7 +1935,8 @@ static void perf_sample__fprint_metric(struct perf_script *script,
>  static bool show_event(struct perf_sample *sample,
>  		       struct evsel *evsel,
>  		       struct thread *thread,
> -		       struct addr_location *al)
> +		       struct addr_location *al,
> +		       struct addr_location *addr_al)
>  {
>  	int depth = thread_stack__depth(thread, sample->cpu);
>  
> @@ -1947,7 +1952,7 @@ static bool show_event(struct perf_sample *sample,
>  	} else {
>  		const char *s = symbol_conf.graph_function;
>  		u64 ip;
> -		const char *name = resolve_branch_sym(sample, evsel, thread, al,
> +		const char *name = resolve_branch_sym(sample, evsel, thread, al, addr_al,
>  				&ip);
>  		unsigned nlen;
>  
> @@ -1972,6 +1977,7 @@ static bool show_event(struct perf_sample *sample,
>  static void process_event(struct perf_script *script,
>  			  struct perf_sample *sample, struct evsel *evsel,
>  			  struct addr_location *al,
> +			  struct addr_location *addr_al,
>  			  struct machine *machine)
>  {
>  	struct thread *thread = al->thread;
> @@ -2005,7 +2011,7 @@ static void process_event(struct perf_script *script,
>  		perf_sample__fprintf_flags(sample->flags, fp);
>  
>  	if (is_bts_event(attr)) {
> -		perf_sample__fprintf_bts(sample, evsel, thread, al, machine, fp);
> +		perf_sample__fprintf_bts(sample, evsel, thread, al, addr_al, machine, fp);
>  		return;
>  	}
>  
> @@ -2168,6 +2174,7 @@ static int process_sample_event(struct perf_tool *tool,
>  {
>  	struct perf_script *scr = container_of(tool, struct perf_script, tool);
>  	struct addr_location al;
> +	struct addr_location addr_al;
>  
>  	if (perf_time__ranges_skip_sample(scr->ptime_range, scr->range_num,
>  					  sample->time)) {
> @@ -2197,7 +2204,10 @@ static int process_sample_event(struct perf_tool *tool,
>  	if (al.filtered)
>  		goto out_put;
>  
> -	if (!show_event(sample, evsel, al.thread, &al))
> +	/* Set thread to NULL to indicate addr_al is not initialized */
> +	addr_al.thread = NULL;
> +
> +	if (!show_event(sample, evsel, al.thread, &al, &addr_al))
>  		goto out_put;
>  
>  	if (evswitch__discard(&scr->evswitch, evsel))
> @@ -2205,16 +2215,16 @@ static int process_sample_event(struct perf_tool *tool,
>  
>  	if (scripting_ops) {
>  		struct addr_location *addr_al_ptr = NULL;
> -		struct addr_location addr_al;
>  
>  		if ((evsel->core.attr.sample_type & PERF_SAMPLE_ADDR) &&
>  		    sample_addr_correlates_sym(&evsel->core.attr)) {
> -			thread__resolve(al.thread, &addr_al, sample);
> +			if (!addr_al.thread)
> +				thread__resolve(al.thread, &addr_al, sample);
>  			addr_al_ptr = &addr_al;
>  		}
>  		scripting_ops->process_event(event, sample, evsel, &al, addr_al_ptr);
>  	} else {
> -		process_event(scr, sample, evsel, &al, machine);
> +		process_event(scr, sample, evsel, &al, &addr_al, machine);
>  	}
>  
>  out_put:
> -- 
> 2.17.1
> 

-- 

- Arnaldo


* Re: [PATCH RFC 04/11] perf script: Add API for filtering via dynamically loaded shared object
  2021-06-21 15:05 ` [PATCH RFC 04/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
@ 2021-06-22 18:32   ` Arnaldo Carvalho de Melo
  2021-06-23  9:25     ` Adrian Hunter
  0 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-06-22 18:32 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

Em Mon, Jun 21, 2021 at 06:05:07PM +0300, Adrian Hunter escreveu:
> In some cases, users want to filter very large amounts of data (e.g. from
> AUX area tracing like Intel PT) looking for something specific. While
> scripting such as Python can be used, Python is 10 to 20 times slower than
> C. So define a C API so that custom filters can be written and loaded.

Statically linking with perf would be even faster 8-) But yeah, I think
it's something useful; some notes below.

A first question: can this be combined with pre-existing filters? E.g.

  perf script -C 0 --dlfilter file.so

?

Also in the docs we can state that after these custom filters get
polished and mature, it would be interesting to send them to be merged
upstream, where they could even be statically linked with perf, if a
profiling session with the filter linked statically shows a benefit.

With a collection of such filters upstream, we could then have an
interface similar to the one for the script collection to describe
filters, i.e. the .so file would need to provide a description and
perhaps a man page:

⬢[acme@toolbox perf]$ perf script -l
List of available trace scripts:
  compaction-times [-h] [-u] [-p|-pv] [-t | [-m] [-fs] [-ms]] [pid|pid-range|comm-regex] display time taken by mm compaction
  event_analyzing_sample               analyze all perf samples
  export-to-postgresql [database name] [columns] [calls] export perf data to a postgresql database
  export-to-sqlite [database name] [columns] [calls] export perf data to a sqlite3 database
  failed-syscalls-by-pid [comm]        system-wide failed syscalls, by pid
  flamegraph                           create flame graphs
  futex-contention                     futext contention measurement
  intel-pt-events                      print Intel PT Events including Power Events and PTWRITE
  mem-phys-addr                        resolve physical address samples
  net_dropmonitor                      display a table of dropped frames
  netdev-times [tx] [rx] [dev=] [debug] display a process of packet and processing time
  powerpc-hcalls
  sched-migration                      sched migration overview
  sctop [comm] [interval]              syscall top
  stackcollapse                        produce callgraphs in short form for scripting use
  syscall-counts-by-pid [comm]         system-wide syscall counts, by pid
  syscall-counts [comm]                system-wide syscall counts
  failed-syscalls [comm]               system-wide failed syscalls
  rw-by-file <comm>                    r/w activity for a program, by file
  rw-by-pid                            system-wide r/w activity
  rwtop [interval]                     system-wide r/w top
  wakeup-latency                       system-wide min/max/avg wakeup latency
⬢[acme@toolbox perf]$

I.e. something like 'perf script --list-dlfilters'.
 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/Documentation/perf-dlfilter.txt | 214 +++++++++++++
>  tools/perf/Documentation/perf-script.txt   |   7 +-
>  tools/perf/builtin-script.c                |  25 +-
>  tools/perf/util/Build                      |   1 +
>  tools/perf/util/dlfilter.c                 | 330 +++++++++++++++++++++
>  tools/perf/util/dlfilter.h                 |  74 +++++
>  tools/perf/util/perf_dlfilter.h            | 120 ++++++++
>  7 files changed, 769 insertions(+), 2 deletions(-)
>  create mode 100644 tools/perf/Documentation/perf-dlfilter.txt
>  create mode 100644 tools/perf/util/dlfilter.c
>  create mode 100644 tools/perf/util/dlfilter.h
>  create mode 100644 tools/perf/util/perf_dlfilter.h
> 
> diff --git a/tools/perf/Documentation/perf-dlfilter.txt b/tools/perf/Documentation/perf-dlfilter.txt
> new file mode 100644
> index 000000000000..d8f80998790f
> --- /dev/null
> +++ b/tools/perf/Documentation/perf-dlfilter.txt
> @@ -0,0 +1,214 @@
> +perf-dlfilter(1)
> +================
> +
> +NAME
> +----
> +perf-dlfilter - Filter sample events using a dynamically loaded shared
> +object file
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'perf script' [--dlfilter file.so ]
> +
> +DESCRIPTION
> +-----------
> +
> +This option is used to process data through a custom filter provided by a
> +dynamically loaded shared object file.
> +
> +API
> +---
> +
> +The API for filtering consists of the following:
> +
> +[source,c]
> +----
> +#include <perf/perf_dlfilter.h>
> +
> +const struct perf_dlfilter_fns perf_dlfilter_fns;
> +
> +int start(void **data);
> +int stop(void *data);
> +int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx);
> +----
> +
> +If implemented, 'start' will be called at the beginning, before any
> +calls to 'filter_event' . Return 0 to indicate success,
> +or return a negative error code. '*data' can be assigned for use by other
> +functions.
> +
> +If implemented, 'stop' will be called at the end, after any calls to
> +'filter_event'. Return 0 to indicate success, or
> +return a negative error code. 'data' is set by 'start'.
> +
> +If implemented, 'filter_event' will be called for each sample event.
> +Return 0 to keep the sample event, 1 to filter it out, or return a negative
> +error code. 'data' is set by 'start'. 'ctx' is needed for calls to
> +'perf_dlfilter_fns'.
> +
> +The perf_dlfilter_sample structure
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +'filter_event' is passed a perf_dlfilter_sample
> +structure, which contains the following fields:
> +[source,c]
> +----
> +/*
> + * perf sample event information (as per perf script and <linux/perf_event.h>)
> + */
> +struct perf_dlfilter_sample {
> +	__u32 size; /* Size of this structure (for compatibility checking) */

There is a 4-byte hole here.

> +	__u64 ip;
> +	__s32 pid;
> +	__s32 tid;
> +	__u64 time;
> +	__u64 addr;
> +	__u64 id;
> +	__u64 stream_id;
> +	__u64 period;
> +	__u64 weight;		/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
> +	__u16 ins_lat;		/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
> +	__u16 p_stage_cyc;	/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */

Another hole here.

Can you move these last two __u16 fields to right after the 'size' field?
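
To illustrate, here is a reduced sketch of the layout, using only a
handful of the members and stdint types in place of __u16/__u32/__u64.
The struct names and bytes_saved() are made up for this example. On
common 64-bit ABIs, where a 64-bit integer is 8-byte aligned, the first
layout should come to 40 bytes and the second to 32.

```c
#include <stddef.h>
#include <stdint.h>

/* Member order as posted (reduced to a few members) */
struct as_posted {
	uint32_t size;
	/* 4-byte hole here where uint64_t is 8-byte aligned */
	uint64_t ip;
	uint64_t weight;
	uint16_t ins_lat;
	uint16_t p_stage_cyc;
	/* another 4-byte hole here */
	uint64_t transaction;
};

/* Same members, with the two 16-bit fields moved up after 'size' */
struct reordered {
	uint32_t size;
	uint16_t ins_lat;
	uint16_t p_stage_cyc;
	uint64_t ip;
	uint64_t weight;
	uint64_t transaction;
};

/* Padding bytes removed by the reordering on the current ABI */
size_t bytes_saved(void)
{
	return sizeof(struct as_posted) - sizeof(struct reordered);
}
```

With the reordering, 'size', 'ins_lat' and 'p_stage_cyc' together fill
the first 8 bytes, so 'ip' follows at offset 8 with no padding.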

> +	__u64 transaction;	/* Refer PERF_SAMPLE_TRANSACTION in <linux/perf_event.h> */
> +	__u64 insn_cnt;	/* For instructions-per-cycle (IPC) */
> +	__u64 cyc_cnt;		/* For instructions-per-cycle (IPC) */
> +	__s32 cpu;
> +	__u32 flags;		/* Refer PERF_DLFILTER_FLAG_* above */
> +	__u64 data_src;		/* Refer PERF_SAMPLE_DATA_SRC in <linux/perf_event.h> */
> +	__u64 phys_addr;	/* Refer PERF_SAMPLE_PHYS_ADDR in <linux/perf_event.h> */
> +	__u64 data_page_size;	/* Refer PERF_SAMPLE_DATA_PAGE_SIZE in <linux/perf_event.h> */
> +	__u64 code_page_size;	/* Refer PERF_SAMPLE_CODE_PAGE_SIZE in <linux/perf_event.h> */
> +	__u64 cgroup;		/* Refer PERF_SAMPLE_CGROUP in <linux/perf_event.h> */
> +	__u8  cpumode;		/* Refer CPUMODE_MASK etc in <linux/perf_event.h> */
> +	__u8  addr_correlates_sym; /* True => resolve_addr() can be called */
> +	__u16 misc;		/* Refer perf_event_header in <linux/perf_event.h> */
> +	__u32 raw_size;		/* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
> +	const void *raw_data;	/* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
> +	__u64 brstack_nr;	/* Number of brstack entries */
> +	const struct perf_branch_entry *brstack; /* Refer <linux/perf_event.h> */
> +	__u64 raw_callchain_nr;	/* Number of raw_callchain entries */
> +	const __u64 *raw_callchain; /* Refer <linux/perf_event.h> */
> +	const char *event;
> +};
> +----
> +
> +The perf_dlfilter_fns structure
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The 'perf_dlfilter_fns' structure is populated with function pointers when the
> +file is loaded. The functions can be called by 'filter_event'.
> +
> +[source,c]
> +----
> +struct perf_dlfilter_fns {
> +	const struct perf_dlfilter_al *(*resolve_ip)(void *ctx);
> +	const struct perf_dlfilter_al *(*resolve_addr)(void *ctx);
> +	void *(*reserved[126])(void *);
> +};
> +----
> +
> +'resolve_ip' returns information about ip.
> +
> +'resolve_addr' returns information about addr (if addr_correlates_sym).
> +
> +The perf_dlfilter_al structure
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The 'perf_dlfilter_al' structure contains information about an address.
> +
> +[source,c]
> +----
> +/*
> + * Address location (as per perf script)
> + */
> +struct perf_dlfilter_al {
> +	__u32 size; /* Size of this structure (for compatibility checking) */
> +	__u32 symoff;
> +	const char *sym;
> +	__u64 addr; /* Mapped address (from dso) */
> +	__u64 sym_start;
> +	__u64 sym_end;
> +	const char *dso;
> +	__u8  sym_binding; /* STB_LOCAL, STB_GLOBAL or STB_WEAK, refer <elf.h> */
> +	__u8  is_64_bit; /* Only valid if dso is not NULL */
> +	__u8  is_kernel_ip; /* True if in kernel space */
> +	__u32 buildid_size;
> +	__u8 *buildid;
> +	/* Below members are only populated by resolve_ip() */
> +	__u8 filtered; /* true if this sample event will be filtered out */
> +	const char *comm;
> +};
> +----
> +
> +perf_dlfilter_sample flags
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The 'flags' member of 'perf_dlfilter_sample' corresponds with the flags field
> +of perf script. The bits of the flags are as follows:
> +
> +[source,c]
> +----
> +/* Definitions for perf_dlfilter_sample flags */
> +enum {
> +	PERF_DLFILTER_FLAG_BRANCH	= 1ULL << 0,
> +	PERF_DLFILTER_FLAG_CALL		= 1ULL << 1,
> +	PERF_DLFILTER_FLAG_RETURN	= 1ULL << 2,
> +	PERF_DLFILTER_FLAG_CONDITIONAL	= 1ULL << 3,
> +	PERF_DLFILTER_FLAG_SYSCALLRET	= 1ULL << 4,
> +	PERF_DLFILTER_FLAG_ASYNC	= 1ULL << 5,
> +	PERF_DLFILTER_FLAG_INTERRUPT	= 1ULL << 6,
> +	PERF_DLFILTER_FLAG_TX_ABORT	= 1ULL << 7,
> +	PERF_DLFILTER_FLAG_TRACE_BEGIN	= 1ULL << 8,
> +	PERF_DLFILTER_FLAG_TRACE_END	= 1ULL << 9,
> +	PERF_DLFILTER_FLAG_IN_TX	= 1ULL << 10,
> +	PERF_DLFILTER_FLAG_VMENTRY	= 1ULL << 11,
> +	PERF_DLFILTER_FLAG_VMEXIT	= 1ULL << 12,
> +};
> +----
> +
> +EXAMPLE
> +-------
> +
> +Filter out everything except branches from "foo" to "bar":
> +
> +[source,c]
> +----
> +#include <perf/perf_dlfilter.h>
> +#include <string.h>
> +
> +const struct perf_dlfilter_fns perf_dlfilter_fns;
> +
> +int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
> +{
> +	const struct perf_dlfilter_al *al;
> +	const struct perf_dlfilter_al *addr_al;
> +
> +	if (!sample->ip || !sample->addr_correlates_sym)
> +		return 1;
> +
> +	al = perf_dlfilter_fns.resolve_ip(ctx);
> +	if (!al || !al->sym || strcmp(al->sym, "foo"))
> +		return 1;
> +
> +	addr_al = perf_dlfilter_fns.resolve_addr(ctx);
> +	if (!addr_al || !addr_al->sym || strcmp(addr_al->sym, "bar"))
> +		return 1;
> +
> +	return 0;
> +}
> +----
> +
> +To build the shared object, assuming perf has been installed for the local user
> +i.e. perf_dlfilter.h is in ~/include/perf :
> +
> +	gcc -c -I ~/include -fpic dlfilter-example.c
> +	gcc -shared -o dlfilter-example.so dlfilter-example.o
> +
> +To use the filter with perf script:
> +
> +	perf script --dlfilter ./dlfilter-example.so
> +
> +SEE ALSO
> +--------
> +linkperf:perf-script[1]
> diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
> index 48a5f5b26dd4..2306c81b606b 100644
> --- a/tools/perf/Documentation/perf-script.txt
> +++ b/tools/perf/Documentation/perf-script.txt
> @@ -98,6 +98,10 @@ OPTIONS
>          Generate perf-script.[ext] starter script for given language,
>          using current perf.data.
>  
> +--dlfilter=<file>::
> +	Filter sample events using the given shared object file.
> +	Refer linkperf:perf-dlfilter[1]
> +
>  -a::
>          Force system-wide collection.  Scripts run without a <command>
>          normally use -a by default, while scripts run with a <command>
> @@ -483,4 +487,5 @@ include::itrace.txt[]
>  SEE ALSO
>  --------
>  linkperf:perf-record[1], linkperf:perf-script-perl[1],
> -linkperf:perf-script-python[1], linkperf:perf-intel-pt[1]
> +linkperf:perf-script-python[1], linkperf:perf-intel-pt[1],
> +linkperf:perf-dlfilter[1]
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index d2771a997e26..aaf2922643a0 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -55,6 +55,7 @@
>  #include <subcmd/pager.h>
>  #include <perf/evlist.h>
>  #include <linux/err.h>
> +#include "util/dlfilter.h"
>  #include "util/record.h"
>  #include "util/util.h"
>  #include "perf.h"
> @@ -79,6 +80,7 @@ static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
>  static struct perf_stat_config	stat_config;
>  static int			max_blocks;
>  static bool			native_arch;
> +static struct dlfilter		*dlfilter;
>  
>  unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
>  
> @@ -2175,6 +2177,7 @@ static int process_sample_event(struct perf_tool *tool,
>  	struct perf_script *scr = container_of(tool, struct perf_script, tool);
>  	struct addr_location al;
>  	struct addr_location addr_al;
> +	int ret = 0;
>  
>  	if (perf_time__ranges_skip_sample(scr->ptime_range, scr->range_num,
>  					  sample->time)) {
> @@ -2213,6 +2216,13 @@ static int process_sample_event(struct perf_tool *tool,
>  	if (evswitch__discard(&scr->evswitch, evsel))
>  		goto out_put;
>  
> +	ret = dlfilter__filter_event(dlfilter, event, sample, evsel, machine, &al, &addr_al);
> +	if (ret) {
> +		if (ret > 0)
> +			ret = 0;
> +		goto out_put;
> +	}
> +
>  	if (scripting_ops) {
>  		struct addr_location *addr_al_ptr = NULL;
>  
> @@ -2229,7 +2239,7 @@ static int process_sample_event(struct perf_tool *tool,
>  
>  out_put:
>  	addr_location__put(&al);
> -	return 0;
> +	return ret;
>  }
>  
>  static int process_attr(struct perf_tool *tool, union perf_event *event,
> @@ -3568,6 +3578,7 @@ int cmd_script(int argc, const char **argv)
>  	};
>  	struct utsname uts;
>  	char *script_path = NULL;
> +	const char *dlfilter_file = NULL;
>  	const char **__argv;
>  	int i, j, err = 0;
>  	struct perf_script script = {
> @@ -3615,6 +3626,7 @@ int cmd_script(int argc, const char **argv)
>  		     parse_scriptname),
>  	OPT_STRING('g', "gen-script", &generate_script_lang, "lang",
>  		   "generate perf-script.xx script in specified language"),
> +	OPT_STRING(0, "dlfilter", &dlfilter_file, "file", "filter .so file name"),
>  	OPT_STRING('i', "input", &input_name, "file", "input file name"),
>  	OPT_BOOLEAN('d', "debug-mode", &debug_mode,
>  		   "do various checks like samples ordering and lost events"),
> @@ -3933,6 +3945,12 @@ int cmd_script(int argc, const char **argv)
>  		exit(-1);
>  	}
>  
> +	if (dlfilter_file) {
> +		dlfilter = dlfilter__new(dlfilter_file);
> +		if (!dlfilter)
> +			return -1;
> +	}
> +
>  	if (!script_name) {
>  		setup_pager();
>  		use_browser = 0;
> @@ -4032,6 +4050,10 @@ int cmd_script(int argc, const char **argv)
>  		goto out_delete;
>  	}
>  
> +	err = dlfilter__start(dlfilter, session);
> +	if (err)
> +		goto out_delete;
> +
>  	if (script_name) {
>  		err = scripting_ops->start_script(script_name, argc, argv, session);
>  		if (err)
> @@ -4081,6 +4103,7 @@ int cmd_script(int argc, const char **argv)
>  
>  	if (script_started)
>  		cleanup_scripting();
> +	dlfilter__cleanup(dlfilter);
>  out:
>  	return err;
>  }
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index 95e15d1035ab..1a909b53dc15 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -126,6 +126,7 @@ perf-y += parse-regs-options.o
>  perf-y += parse-sublevel-options.o
>  perf-y += term.o
>  perf-y += help-unknown-cmd.o
> +perf-y += dlfilter.o
>  perf-y += mem-events.o
>  perf-y += vsprintf.o
>  perf-y += units.o
> diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
> new file mode 100644
> index 000000000000..15cb9de13a4b
> --- /dev/null
> +++ b/tools/perf/util/dlfilter.c
> @@ -0,0 +1,330 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * dlfilter.c: Interface to perf script --dlfilter shared object
> + * Copyright (c) 2021, Intel Corporation.
> + */
> +#include <dlfcn.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <linux/zalloc.h>
> +#include <linux/build_bug.h>
> +
> +#include "debug.h"
> +#include "event.h"
> +#include "evsel.h"
> +#include "dso.h"
> +#include "map.h"
> +#include "thread.h"
> +#include "symbol.h"
> +#include "dlfilter.h"
> +#include "perf_dlfilter.h"
> +
> +static void al_to_d_al(struct addr_location *al, struct perf_dlfilter_al *d_al)
> +{
> +	struct symbol *sym = al->sym;
> +
> +	d_al->size = sizeof(*d_al);
> +	if (al->map) {
> +		struct dso *dso = al->map->dso;
> +
> +		if (symbol_conf.show_kernel_path && dso->long_name)
> +			d_al->dso = dso->long_name;
> +		else
> +			d_al->dso = dso->name;
> +		d_al->is_64_bit = dso->is_64_bit;
> +		d_al->buildid_size = dso->bid.size;
> +		d_al->buildid = dso->bid.data;
> +	} else {
> +		d_al->dso = NULL;
> +		d_al->is_64_bit = 0;
> +		d_al->buildid_size = 0;
> +		d_al->buildid = NULL;
> +	}
> +	if (sym) {
> +		d_al->sym = sym->name;
> +		d_al->sym_start = sym->start;
> +		d_al->sym_end = sym->end;
> +		if (al->addr < sym->end)
> +			d_al->symoff = al->addr - sym->start;
> +		else
> +			d_al->symoff = al->addr - al->map->start - sym->start;
> +		d_al->sym_binding = sym->binding;
> +	} else {
> +		d_al->sym = NULL;
> +		d_al->sym_start = 0;
> +		d_al->sym_end = 0;
> +		d_al->symoff = 0;
> +		d_al->sym_binding = 0;
> +	}
> +	d_al->addr = al->addr;
> +	d_al->comm = NULL;
> +	d_al->filtered = 0;
> +}
> +
> +static struct addr_location *get_al(struct dlfilter *d)
> +{
> +	struct addr_location *al = d->al;
> +
> +	if (!al->thread && machine__resolve(d->machine, al, d->sample) < 0)
> +		return NULL;
> +	return al;
> +}
> +
> +static struct thread *get_thread(struct dlfilter *d)
> +{
> +	struct addr_location *al = get_al(d);
> +
> +	return al ? al->thread : NULL;
> +}
> +
> +static const struct perf_dlfilter_al *dlfilter__resolve_ip(void *ctx)
> +{
> +	struct dlfilter *d = (struct dlfilter *)ctx;
> +	struct perf_dlfilter_al *d_al = d->d_ip_al;
> +	struct addr_location *al;
> +
> +	if (!d->ctx_valid)
> +		return NULL;
> +
> +	/* 'size' is also used to indicate already initialized */
> +	if (d_al->size)
> +		return d_al;
> +
> +	al = get_al(d);
> +	if (!al)
> +		return NULL;
> +
> +	al_to_d_al(al, d_al);
> +
> +	d_al->is_kernel_ip = machine__kernel_ip(d->machine, d->sample->ip);
> +	d_al->comm = al->thread ? thread__comm_str(al->thread) : ":-1";
> +	d_al->filtered = al->filtered;
> +
> +	return d_al;
> +}
> +
> +static const struct perf_dlfilter_al *dlfilter__resolve_addr(void *ctx)
> +{
> +	struct dlfilter *d = (struct dlfilter *)ctx;
> +	struct perf_dlfilter_al *d_addr_al = d->d_addr_al;
> +	struct addr_location *addr_al = d->addr_al;
> +
> +	if (!d->ctx_valid || !d->d_sample->addr_correlates_sym)
> +		return NULL;
> +
> +	/* 'size' is also used to indicate already initialized */
> +	if (d_addr_al->size)
> +		return d_addr_al;
> +
> +	if (!addr_al->thread) {
> +		struct thread *thread = get_thread(d);
> +
> +		if (!thread)
> +			return NULL;
> +		thread__resolve(thread, addr_al, d->sample);
> +	}
> +
> +	al_to_d_al(addr_al, d_addr_al);
> +
> +	d_addr_al->is_kernel_ip = machine__kernel_ip(d->machine, d->sample->addr);
> +
> +	return d_addr_al;
> +}
> +
> +static const struct perf_dlfilter_fns perf_dlfilter_fns = {
> +	.resolve_ip      = dlfilter__resolve_ip,
> +	.resolve_addr    = dlfilter__resolve_addr,
> +};
> +
> +#define CHECK_FLAG(x) BUILD_BUG_ON((u64)PERF_DLFILTER_FLAG_ ## x != (u64)PERF_IP_FLAG_ ## x)
> +
> +static int dlfilter__init(struct dlfilter *d, const char *file)
> +{
> +	CHECK_FLAG(BRANCH);
> +	CHECK_FLAG(CALL);
> +	CHECK_FLAG(RETURN);
> +	CHECK_FLAG(CONDITIONAL);
> +	CHECK_FLAG(SYSCALLRET);
> +	CHECK_FLAG(ASYNC);
> +	CHECK_FLAG(INTERRUPT);
> +	CHECK_FLAG(TX_ABORT);
> +	CHECK_FLAG(TRACE_BEGIN);
> +	CHECK_FLAG(TRACE_END);
> +	CHECK_FLAG(IN_TX);
> +	CHECK_FLAG(VMENTRY);
> +	CHECK_FLAG(VMEXIT);
> +
> +	memset(d, 0, sizeof(*d));
> +	d->file = strdup(file);
> +	if (!d->file)
> +		return -1;
> +	return 0;
> +}
> +
> +static void dlfilter__exit(struct dlfilter *d)
> +{
> +	zfree(&d->file);
> +}
> +
> +static int dlfilter__open(struct dlfilter *d)
> +{
> +	d->handle = dlopen(d->file, RTLD_NOW);
> +	if (!d->handle) {
> +		pr_err("dlopen failed for: '%s'\n", d->file);
> +		return -1;
> +	}
> +	d->start = dlsym(d->handle, "start");
> +	d->filter_event = dlsym(d->handle, "filter_event");
> +	d->stop = dlsym(d->handle, "stop");
> +	d->fns = dlsym(d->handle, "perf_dlfilter_fns");
> +	if (d->fns)
> +		memcpy(d->fns, &perf_dlfilter_fns, sizeof(struct perf_dlfilter_fns));
> +	return 0;
> +}
> +
> +static int dlfilter__close(struct dlfilter *d)
> +{
> +	return dlclose(d->handle);
> +}
> +
> +struct dlfilter *dlfilter__new(const char *file)
> +{
> +	struct dlfilter *d = malloc(sizeof(*d));
> +
> +	if (!d)
> +		return NULL;
> +
> +	if (dlfilter__init(d, file))
> +		goto err_free;
> +
> +	if (dlfilter__open(d))
> +		goto err_exit;
> +
> +	return d;
> +
> +err_exit:
> +	dlfilter__exit(d);
> +err_free:
> +	free(d);
> +	return NULL;
> +}
> +
> +static void dlfilter__free(struct dlfilter *d)
> +{
> +	if (d) {
> +		dlfilter__exit(d);
> +		free(d);
> +	}
> +}
> +
> +int dlfilter__start(struct dlfilter *d, struct perf_session *session)
> +{
> +	if (d) {
> +		d->session = session;
> +		if (d->start)
> +			return d->start(&d->data);
> +	}
> +	return 0;
> +}
> +
> +static int dlfilter__stop(struct dlfilter *d)
> +{
> +	if (d && d->stop)
> +		return d->stop(d->data);
> +	return 0;
> +}
> +
> +void dlfilter__cleanup(struct dlfilter *d)
> +{
> +	if (d) {
> +		dlfilter__stop(d);
> +		dlfilter__close(d);
> +		dlfilter__free(d);
> +	}
> +}
> +
> +#define ASSIGN(x) d_sample.x = sample->x
> +
> +int dlfilter__do_filter_event(struct dlfilter *d,
> +			      union perf_event *event,
> +			      struct perf_sample *sample,
> +			      struct evsel *evsel,
> +			      struct machine *machine,
> +			      struct addr_location *al,
> +			      struct addr_location *addr_al)
> +{
> +	struct perf_dlfilter_sample d_sample;
> +	struct perf_dlfilter_al d_ip_al;
> +	struct perf_dlfilter_al d_addr_al;
> +	int ret;
> +
> +	d->event       = event;
> +	d->sample      = sample;
> +	d->evsel       = evsel;
> +	d->machine     = machine;
> +	d->al          = al;
> +	d->addr_al     = addr_al;
> +	d->d_sample    = &d_sample;
> +	d->d_ip_al     = &d_ip_al;
> +	d->d_addr_al   = &d_addr_al;
> +
> +	d_sample.size  = sizeof(d_sample);
> +	d_ip_al.size   = 0; /* To indicate d_ip_al is not initialized */
> +	d_addr_al.size = 0; /* To indicate d_addr_al is not initialized */
> +
> +	ASSIGN(ip);
> +	ASSIGN(pid);
> +	ASSIGN(tid);
> +	ASSIGN(time);
> +	ASSIGN(addr);
> +	ASSIGN(id);
> +	ASSIGN(stream_id);
> +	ASSIGN(period);
> +	ASSIGN(weight);
> +	ASSIGN(ins_lat);
> +	ASSIGN(p_stage_cyc);
> +	ASSIGN(transaction);
> +	ASSIGN(insn_cnt);
> +	ASSIGN(cyc_cnt);
> +	ASSIGN(cpu);
> +	ASSIGN(flags);
> +	ASSIGN(data_src);
> +	ASSIGN(phys_addr);
> +	ASSIGN(data_page_size);
> +	ASSIGN(code_page_size);
> +	ASSIGN(cgroup);
> +	ASSIGN(cpumode);
> +	ASSIGN(misc);
> +	ASSIGN(raw_size);
> +	ASSIGN(raw_data);
> +
> +	if (sample->branch_stack) {
> +		d_sample.brstack_nr = sample->branch_stack->nr;
> +		d_sample.brstack = (struct perf_branch_entry *)perf_sample__branch_entries(sample);
> +	} else {
> +		d_sample.brstack_nr = 0;
> +		d_sample.brstack = NULL;
> +	}
> +
> +	if (sample->callchain) {
> +		d_sample.raw_callchain_nr = sample->callchain->nr;
> +		d_sample.raw_callchain = (__u64 *)sample->callchain->ips;
> +	} else {
> +		d_sample.raw_callchain_nr = 0;
> +		d_sample.raw_callchain = NULL;
> +	}
> +
> +	d_sample.addr_correlates_sym =
> +		(evsel->core.attr.sample_type & PERF_SAMPLE_ADDR) &&
> +		sample_addr_correlates_sym(&evsel->core.attr);
> +
> +	d_sample.event = evsel__name(evsel);
> +
> +	d->ctx_valid = true;
> +
> +	ret = d->filter_event(d->data, &d_sample, d);
> +
> +	d->ctx_valid = false;
> +
> +	return ret;
> +}
> diff --git a/tools/perf/util/dlfilter.h b/tools/perf/util/dlfilter.h
> new file mode 100644
> index 000000000000..671e2d3d5a06
> --- /dev/null
> +++ b/tools/perf/util/dlfilter.h
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * dlfilter.h: Interface to perf script --dlfilter shared object
> + * Copyright (c) 2021, Intel Corporation.
> + */
> +
> +#ifndef PERF_UTIL_DLFILTER_H
> +#define PERF_UTIL_DLFILTER_H
> +
> +struct perf_session;
> +union  perf_event;
> +struct perf_sample;
> +struct evsel;
> +struct machine;
> +struct addr_location;
> +struct perf_dlfilter_fns;
> +struct perf_dlfilter_sample;
> +struct perf_dlfilter_al;
> +
> +struct dlfilter {
> +	char				*file;
> +	void				*handle;
> +	void				*data;
> +	struct perf_session		*session;
> +	bool				ctx_valid;
> +
> +	union perf_event		*event;
> +	struct perf_sample		*sample;
> +	struct evsel			*evsel;
> +	struct machine			*machine;
> +	struct addr_location		*al;
> +	struct addr_location		*addr_al;
> +	struct perf_dlfilter_sample	*d_sample;
> +	struct perf_dlfilter_al		*d_ip_al;
> +	struct perf_dlfilter_al		*d_addr_al;
> +
> +	int (*start)(void **data);
> +	int (*stop)(void *data);
> +
> +	int (*filter_event)(void *data,
> +			    const struct perf_dlfilter_sample *sample,
> +			    void *ctx);
> +
> +	struct perf_dlfilter_fns *fns;
> +};
> +
> +struct dlfilter *dlfilter__new(const char *file);
> +
> +int dlfilter__start(struct dlfilter *d, struct perf_session *session);
> +
> +int dlfilter__do_filter_event(struct dlfilter *d,
> +			      union perf_event *event,
> +			      struct perf_sample *sample,
> +			      struct evsel *evsel,
> +			      struct machine *machine,
> +			      struct addr_location *al,
> +			      struct addr_location *addr_al);
> +
> +void dlfilter__cleanup(struct dlfilter *d);
> +
> +static inline int dlfilter__filter_event(struct dlfilter *d,
> +					 union perf_event *event,
> +					 struct perf_sample *sample,
> +					 struct evsel *evsel,
> +					 struct machine *machine,
> +					 struct addr_location *al,
> +					 struct addr_location *addr_al)
> +{
> +	if (!d || !d->filter_event)
> +		return 0;
> +	return dlfilter__do_filter_event(d, event, sample, evsel, machine, al, addr_al);
> +}
> +
> +#endif
> diff --git a/tools/perf/util/perf_dlfilter.h b/tools/perf/util/perf_dlfilter.h
> new file mode 100644
> index 000000000000..132f833f0a0b
> --- /dev/null
> +++ b/tools/perf/util/perf_dlfilter.h
> @@ -0,0 +1,120 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * perf_dlfilter.h: API for perf --dlfilter shared object
> + * Copyright (c) 2021, Intel Corporation.
> + */
> +#ifndef _LINUX_PERF_DLFILTER_H
> +#define _LINUX_PERF_DLFILTER_H
> +
> +#include <linux/perf_event.h>
> +#include <linux/types.h>
> +
> +/* Definitions for perf_dlfilter_sample flags */
> +enum {
> +	PERF_DLFILTER_FLAG_BRANCH	= 1ULL << 0,
> +	PERF_DLFILTER_FLAG_CALL		= 1ULL << 1,
> +	PERF_DLFILTER_FLAG_RETURN	= 1ULL << 2,
> +	PERF_DLFILTER_FLAG_CONDITIONAL	= 1ULL << 3,
> +	PERF_DLFILTER_FLAG_SYSCALLRET	= 1ULL << 4,
> +	PERF_DLFILTER_FLAG_ASYNC	= 1ULL << 5,
> +	PERF_DLFILTER_FLAG_INTERRUPT	= 1ULL << 6,
> +	PERF_DLFILTER_FLAG_TX_ABORT	= 1ULL << 7,
> +	PERF_DLFILTER_FLAG_TRACE_BEGIN	= 1ULL << 8,
> +	PERF_DLFILTER_FLAG_TRACE_END	= 1ULL << 9,
> +	PERF_DLFILTER_FLAG_IN_TX	= 1ULL << 10,
> +	PERF_DLFILTER_FLAG_VMENTRY	= 1ULL << 11,
> +	PERF_DLFILTER_FLAG_VMEXIT	= 1ULL << 12,
> +};
> +
> +/*
> + * perf sample event information (as per perf script and <linux/perf_event.h>)
> + */
> +struct perf_dlfilter_sample {
> +	__u32 size; /* Size of this structure (for compatibility checking) */
> +	__u64 ip;
> +	__s32 pid;
> +	__s32 tid;
> +	__u64 time;
> +	__u64 addr;
> +	__u64 id;
> +	__u64 stream_id;
> +	__u64 period;
> +	__u64 weight;		/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
> +	__u16 ins_lat;		/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
> +	__u16 p_stage_cyc;	/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
> +	__u64 transaction;	/* Refer PERF_SAMPLE_TRANSACTION in <linux/perf_event.h> */
> +	__u64 insn_cnt;	/* For instructions-per-cycle (IPC) */
> +	__u64 cyc_cnt;		/* For instructions-per-cycle (IPC) */
> +	__s32 cpu;
> +	__u32 flags;		/* Refer PERF_DLFILTER_FLAG_* above */
> +	__u64 data_src;		/* Refer PERF_SAMPLE_DATA_SRC in <linux/perf_event.h> */
> +	__u64 phys_addr;	/* Refer PERF_SAMPLE_PHYS_ADDR in <linux/perf_event.h> */
> +	__u64 data_page_size;	/* Refer PERF_SAMPLE_DATA_PAGE_SIZE in <linux/perf_event.h> */
> +	__u64 code_page_size;	/* Refer PERF_SAMPLE_CODE_PAGE_SIZE in <linux/perf_event.h> */
> +	__u64 cgroup;		/* Refer PERF_SAMPLE_CGROUP in <linux/perf_event.h> */
> +	__u8  cpumode;		/* Refer CPUMODE_MASK etc in <linux/perf_event.h> */
> +	__u8  addr_correlates_sym; /* True => resolve_addr() can be called */
> +	__u16 misc;		/* Refer perf_event_header in <linux/perf_event.h> */
> +	__u32 raw_size;		/* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
> +	const void *raw_data;	/* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
> +	__u64 brstack_nr;	/* Number of brstack entries */
> +	const struct perf_branch_entry *brstack; /* Refer <linux/perf_event.h> */
> +	__u64 raw_callchain_nr;	/* Number of raw_callchain entries */
> +	const __u64 *raw_callchain; /* Refer <linux/perf_event.h> */
> +	const char *event;
> +};
> +
> +/*
> + * Address location (as per perf script)
> + */
> +struct perf_dlfilter_al {
> +	__u32 size; /* Size of this structure (for compatibility checking) */
> +	__u32 symoff;
> +	const char *sym;
> +	__u64 addr; /* Mapped address (from dso) */
> +	__u64 sym_start;
> +	__u64 sym_end;
> +	const char *dso;
> +	__u8  sym_binding; /* STB_LOCAL, STB_GLOBAL or STB_WEAK, refer <elf.h> */
> +	__u8  is_64_bit; /* Only valid if dso is not NULL */
> +	__u8  is_kernel_ip; /* True if in kernel space */
> +	__u32 buildid_size;
> +	__u8 *buildid;
> +	/* Below members are only populated by resolve_ip() */
> +	__u8 filtered; /* True if this sample event will be filtered out */
> +	const char *comm;
> +};
> +
> +struct perf_dlfilter_fns {
> +	/* Return information about ip */
> +	const struct perf_dlfilter_al *(*resolve_ip)(void *ctx);
> +	/* Return information about addr (if addr_correlates_sym) */
> +	const struct perf_dlfilter_al *(*resolve_addr)(void *ctx);
> +	/* Reserved */
> +	void *(*reserved[126])(void *);
> +};
> +
> +/*
> + * If implemented, 'start' will be called at the beginning,
> + * before any calls to 'filter_event'. Return 0 to indicate success,
> + * or return a negative error code. '*data' can be assigned for use
> + * by other functions.
> + */
> +int start(void **data);
> +
> +/*
> + * If implemented, 'stop' will be called at the end,
> + * after any calls to 'filter_event'. Return 0 to indicate success, or
> + * return a negative error code. 'data' is set by start().
> + */
> +int stop(void *data);
> +
> +/*
> + * If implemented, 'filter_event' will be called for each sample
> + * event. Return 0 to keep the sample event, 1 to filter it out, or
> + * return a negative error code. 'data' is set by start(). 'ctx' is
> + * needed for calls to perf_dlfilter_fns.
> + */
> +int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx);
> +
> +#endif
> -- 
> 2.17.1
> 

-- 

- Arnaldo


* Re: [PATCH RFC 04/11] perf script: Add API for filtering via dynamically loaded shared object
  2021-06-22 18:32   ` Arnaldo Carvalho de Melo
@ 2021-06-23  9:25     ` Adrian Hunter
  0 siblings, 0 replies; 17+ messages in thread
From: Adrian Hunter @ 2021-06-23  9:25 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Andi Kleen, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Namhyung Kim, Leo Yan, Kan Liang, linux-perf-users, linux-kernel

On 22/06/21 9:32 pm, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jun 21, 2021 at 06:05:07PM +0300, Adrian Hunter escreveu:
>> In some cases, users want to filter very large amounts of data (e.g. from
>> AUX area tracing like Intel PT) looking for something specific. While
>> scripting such as Python can be used, Python is 10 to 20 times slower than
>> C. So define a C API so that custom filters can be written and loaded.
> 
> Statically linking with perf would be even faster 8-) But yeah, I think
> it's something useful; some notes below.
> 
> A first question, can this be combined with pre-existing filters? I.e.
> 
>   perf script -C 0 --dlfilter file.so
> 
> ?

Thanks for looking at the patches.

Yes.  The filter_event() callback is called after internal filtering.
Patch 5 introduces filter_event_early() which is called before internal
filtering, but in either case all internal filtering is still done.
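
To make the ordering concrete, here is a minimal stand-alone sketch of how
-C 0 and a dlfilter combine. It is not the perf code: sample_stub,
internal_cpu_filter() and stub_filter_event() are hypothetical stand-ins for
perf's sample, its CPU filter and a filter's filter_event(). Only the
handling of the return value follows the patch (0 keeps the sample, a
positive value drops it, a negative value is an error).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few sample fields used here */
struct sample_stub {
	int32_t  cpu;
	uint32_t flags;
};

/* Same bit position as PERF_DLFILTER_FLAG_CALL in the patch */
#define STUB_FLAG_CALL (1u << 1)

/* Stand-in for internal filtering such as -C: true means discard */
static bool internal_cpu_filter(const struct sample_stub *s, int32_t wanted_cpu)
{
	return s->cpu != wanted_cpu;
}

/* Stand-in for a dlfilter's filter_event(): 0 keeps, 1 filters out */
static int stub_filter_event(const struct sample_stub *s)
{
	return (s->flags & STUB_FLAG_CALL) ? 0 : 1;
}

/*
 * Model of the ordering: internal filtering first, then the dlfilter.
 * Returns 1 if the sample would be printed, 0 if it is discarded, or a
 * negative value if the dlfilter reported an error.
 */
int process_sample_stub(const struct sample_stub *s, int32_t wanted_cpu)
{
	int ret;

	assert(s);

	if (internal_cpu_filter(s, wanted_cpu))
		return 0;	/* already dropped, dlfilter is not consulted */

	ret = stub_filter_event(s);
	if (ret < 0)
		return ret;	/* error is passed back to the caller */
	if (ret > 0)
		return 0;	/* dlfilter asked for the sample to be dropped */

	return 1;
}
```

So a sample is only output if it passes both the internal filters and the
dlfilter.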

> 
> Also in the docs we can state that after these custom filters get
> polished and mature, it would be interesting to send them to be merged
> upstream, where they could even be statically linked with perf, if a
> profile session with it linked statically proves that.
> 
> Having a collection of such filters upstream we could then have
> a similar interface as for the script collection to describe filters,
> i.e. the .so file would have to have a description and perhaps a man
> page:
> 
> ⬢[acme@toolbox perf]$ perf script -l
> List of available trace scripts:
>   compaction-times [-h] [-u] [-p|-pv] [-t | [-m] [-fs] [-ms]] [pid|pid-range|comm-regex] display time taken by mm compaction
>   event_analyzing_sample               analyze all perf samples
>   export-to-postgresql [database name] [columns] [calls] export perf data to a postgresql database
>   export-to-sqlite [database name] [columns] [calls] export perf data to a sqlite3 database
>   failed-syscalls-by-pid [comm]        system-wide failed syscalls, by pid
>   flamegraph                           create flame graphs
>   futex-contention                     futext contention measurement
>   intel-pt-events                      print Intel PT Events including Power Events and PTWRITE
>   mem-phys-addr                        resolve physical address samples
>   net_dropmonitor                      display a table of dropped frames
>   netdev-times [tx] [rx] [dev=] [debug] display a process of packet and processing time
>   powerpc-hcalls
>   sched-migration                      sched migration overview
>   sctop [comm] [interval]              syscall top
>   stackcollapse                        produce callgraphs in short form for scripting use
>   syscall-counts-by-pid [comm]         system-wide syscall counts, by pid
>   syscall-counts [comm]                system-wide syscall counts
>   failed-syscalls [comm]               system-wide failed syscalls
>   rw-by-file <comm>                    r/w activity for a program, by file
>   rw-by-pid                            system-wide r/w activity
>   rwtop [interval]                     system-wide r/w top
>   wakeup-latency                       system-wide min/max/avg wakeup latency
> ⬢[acme@toolbox perf]$
> 
> I.e. some 'perf script --list-dlfilters'

Yes, but we would also need some way to store and retrieve a
description of the filter; otherwise all we have is the file name.
I guess it could be another API function so the filter .so could
be loaded and the "const char* filter_description(const char **title)"
function called to get the description.
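
A rough sketch of what that could look like, assuming the signature above.
Nothing here exists in the patch set: the title and description strings and
the format_filter_listing() helper are made up for illustration, and on the
perf side the function pointer would come from
dlsym(handle, "filter_description").

```c
#include <assert.h>
#include <stdio.h>

typedef const char *(*filter_description_fn)(const char **title);

/* Filter side: return a long description and optionally a short title */
const char *filter_description(const char **title)
{
	if (title)
		*title = "foo-to-bar";
	return "keep only branches from foo to bar";
}

/*
 * Listing side (hypothetical helper): format one line of the list.
 * If the .so does not implement filter_description (fn is NULL),
 * fall back to printing just the file name.
 */
int format_filter_listing(filter_description_fn fn, const char *file,
			  char *buf, size_t len)
{
	const char *title = NULL;
	const char *desc;

	assert(file && buf && len);

	if (!fn)
		return snprintf(buf, len, "  %s", file);

	desc = fn(&title);
	return snprintf(buf, len, "  %-20s %s",
			title ? title : file, desc ? desc : "");
}
```

Keeping the function optional means existing filters would still load and
be listed by file name.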

>  
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/perf/Documentation/perf-dlfilter.txt | 214 +++++++++++++
>>  tools/perf/Documentation/perf-script.txt   |   7 +-
>>  tools/perf/builtin-script.c                |  25 +-
>>  tools/perf/util/Build                      |   1 +
>>  tools/perf/util/dlfilter.c                 | 330 +++++++++++++++++++++
>>  tools/perf/util/dlfilter.h                 |  74 +++++
>>  tools/perf/util/perf_dlfilter.h            | 120 ++++++++
>>  7 files changed, 769 insertions(+), 2 deletions(-)
>>  create mode 100644 tools/perf/Documentation/perf-dlfilter.txt
>>  create mode 100644 tools/perf/util/dlfilter.c
>>  create mode 100644 tools/perf/util/dlfilter.h
>>  create mode 100644 tools/perf/util/perf_dlfilter.h
>>
>> diff --git a/tools/perf/Documentation/perf-dlfilter.txt b/tools/perf/Documentation/perf-dlfilter.txt
>> new file mode 100644
>> index 000000000000..d8f80998790f
>> --- /dev/null
>> +++ b/tools/perf/Documentation/perf-dlfilter.txt
>> @@ -0,0 +1,214 @@
>> +perf-dlfilter(1)
>> +================
>> +
>> +NAME
>> +----
>> +perf-dlfilter - Filter sample events using a dynamically loaded shared
>> +object file
>> +
>> +SYNOPSIS
>> +--------
>> +[verse]
>> +'perf script' [--dlfilter file.so ]
>> +
>> +DESCRIPTION
>> +-----------
>> +
>> +This option is used to process data through a custom filter provided by a
>> +dynamically loaded shared object file.
>> +
>> +API
>> +---
>> +
>> +The API for filtering consists of the following:
>> +
>> +[source,c]
>> +----
>> +#include <perf/perf_dlfilter.h>
>> +
>> +const struct perf_dlfilter_fns perf_dlfilter_fns;
>> +
>> +int start(void **data);
>> +int stop(void *data);
>> +int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx);
>> +----
>> +
>> +If implemented, 'start' will be called at the beginning, before any
>> +calls to 'filter_event'. Return 0 to indicate success,
>> +or return a negative error code. '*data' can be assigned for use by other
>> +functions.
>> +
>> +If implemented, 'stop' will be called at the end, after any calls to
>> +'filter_event'. Return 0 to indicate success, or
>> +return a negative error code. 'data' is set by 'start'.
>> +
>> +If implemented, 'filter_event' will be called for each sample event.
>> +Return 0 to keep the sample event, 1 to filter it out, or return a negative
>> +error code. 'data' is set by 'start'. 'ctx' is needed for calls to
>> +'perf_dlfilter_fns'.
>> +
>> +The perf_dlfilter_sample structure
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +'filter_event' is passed a perf_dlfilter_sample
>> +structure, which contains the following fields:
>> +[source,c]
>> +----
>> +/*
>> + * perf sample event information (as per perf script and <linux/perf_event.h>)
>> + */
>> +struct perf_dlfilter_sample {
>> +	__u32 size; /* Size of this structure (for compatibility checking) */
> 
> There is a 4-byte hole here, 
> 
>> +	__u64 ip;
>> +	__s32 pid;
>> +	__s32 tid;
>> +	__u64 time;
>> +	__u64 addr;
>> +	__u64 id;
>> +	__u64 stream_id;
>> +	__u64 period;
>> +	__u64 weight;		/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
>> +	__u16 ins_lat;		/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
>> +	__u16 p_stage_cyc;	/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
> 
> Another
> 
> Can you move these last two __u16 to right after the 'size' field?

Sure
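
For reference, the effect of the reordering can be checked with a trimmed
copy of the structure. This is only a sketch: layout_before and layout_after
are hypothetical names, they keep just the members around the two holes, and
the 4-byte figures assume an ABI where 64-bit integers are 8-byte aligned
(e.g. x86-64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed copy of the member order in this version of the patch */
struct layout_before {
	uint32_t size;
	uint64_t ip;		/* hole before this member if 8-byte aligned */
	int32_t  pid;
	int32_t  tid;
	uint64_t weight;
	uint16_t ins_lat;
	uint16_t p_stage_cyc;
	uint64_t transaction;	/* second hole before this member */
};

/* Same members with the two 16-bit fields moved next to 'size' */
struct layout_after {
	uint32_t size;
	uint16_t ins_lat;
	uint16_t p_stage_cyc;
	uint64_t ip;
	int32_t  pid;
	int32_t  tid;
	uint64_t weight;
	uint64_t transaction;
};

/* Reordering must never make the structure larger */
static_assert(sizeof(struct layout_after) <= sizeof(struct layout_before),
	      "reordered layout should not grow");

/* Padding bytes between 'size' and 'ip' in the original order */
size_t hole_before_ip(void)
{
	return offsetof(struct layout_before, ip) -
	       (offsetof(struct layout_before, size) + sizeof(uint32_t));
}

/* Padding bytes between 'p_stage_cyc' and 'transaction' in the original order */
size_t hole_before_transaction(void)
{
	return offsetof(struct layout_before, transaction) -
	       (offsetof(struct layout_before, p_stage_cyc) + sizeof(uint16_t));
}

/* Padding bytes between 'p_stage_cyc' and 'ip' after reordering */
size_t hole_after_reorder(void)
{
	return offsetof(struct layout_after, ip) -
	       (offsetof(struct layout_after, p_stage_cyc) + sizeof(uint16_t));
}
```

pahole on the real object file gives the same information for the full
structure.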

> 
>> +	__u64 transaction;	/* Refer PERF_SAMPLE_TRANSACTION in <linux/perf_event.h> */
>> +	__u64 insn_cnt;	/* For instructions-per-cycle (IPC) */
>> +	__u64 cyc_cnt;		/* For instructions-per-cycle (IPC) */
>> +	__s32 cpu;
>> +	__u32 flags;		/* Refer PERF_DLFILTER_FLAG_* above */
>> +	__u64 data_src;		/* Refer PERF_SAMPLE_DATA_SRC in <linux/perf_event.h> */
>> +	__u64 phys_addr;	/* Refer PERF_SAMPLE_PHYS_ADDR in <linux/perf_event.h> */
>> +	__u64 data_page_size;	/* Refer PERF_SAMPLE_DATA_PAGE_SIZE in <linux/perf_event.h> */
>> +	__u64 code_page_size;	/* Refer PERF_SAMPLE_CODE_PAGE_SIZE in <linux/perf_event.h> */
>> +	__u64 cgroup;		/* Refer PERF_SAMPLE_CGROUP in <linux/perf_event.h> */
>> +	__u8  cpumode;		/* Refer CPUMODE_MASK etc in <linux/perf_event.h> */
>> +	__u8  addr_correlates_sym; /* True => resolve_addr() can be called */
>> +	__u16 misc;		/* Refer perf_event_header in <linux/perf_event.h> */
>> +	__u32 raw_size;		/* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
>> +	const void *raw_data;	/* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
>> +	__u64 brstack_nr;	/* Number of brstack entries */
>> +	const struct perf_branch_entry *brstack; /* Refer <linux/perf_event.h> */
>> +	__u64 raw_callchain_nr;	/* Number of raw_callchain entries */
>> +	const __u64 *raw_callchain; /* Refer <linux/perf_event.h> */
>> +	const char *event;
>> +};
>> +----
>> +
>> +The perf_dlfilter_fns structure
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +The 'perf_dlfilter_fns' structure is populated with function pointers when the
>> +file is loaded. The functions can be called by 'filter_event'.
>> +
>> +[source,c]
>> +----
>> +struct perf_dlfilter_fns {
>> +	const struct perf_dlfilter_al *(*resolve_ip)(void *ctx);
>> +	const struct perf_dlfilter_al *(*resolve_addr)(void *ctx);
>> +	void *(*reserved[126])(void *);
>> +};
>> +----
>> +
>> +'resolve_ip' returns information about ip.
>> +
>> +'resolve_addr' returns information about addr (if addr_correlates_sym).
>> +
>> +The perf_dlfilter_al structure
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +The 'perf_dlfilter_al' structure contains information about an address.
>> +
>> +[source,c]
>> +----
>> +/*
>> + * Address location (as per perf script)
>> + */
>> +struct perf_dlfilter_al {
>> +	__u32 size; /* Size of this structure (for compatibility checking) */
>> +	__u32 symoff;
>> +	const char *sym;
>> +	__u64 addr; /* Mapped address (from dso) */
>> +	__u64 sym_start;
>> +	__u64 sym_end;
>> +	const char *dso;
>> +	__u8  sym_binding; /* STB_LOCAL, STB_GLOBAL or STB_WEAK, refer <elf.h> */
>> +	__u8  is_64_bit; /* Only valid if dso is not NULL */
>> +	__u8  is_kernel_ip; /* True if in kernel space */
>> +	__u32 buildid_size;
>> +	__u8 *buildid;
>> +	/* Below members are only populated by resolve_ip() */
>> +	__u8 filtered; /* true if this sample event will be filtered out */
>> +	const char *comm;
>> +};
>> +----
>> +
>> +perf_dlfilter_sample flags
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +The 'flags' member of 'perf_dlfilter_sample' corresponds with the flags field
>> +of perf script. The bits of the flags are as follows:
>> +
>> +[source,c]
>> +----
>> +/* Definitions for perf_dlfilter_sample flags */
>> +enum {
>> +	PERF_DLFILTER_FLAG_BRANCH	= 1ULL << 0,
>> +	PERF_DLFILTER_FLAG_CALL		= 1ULL << 1,
>> +	PERF_DLFILTER_FLAG_RETURN	= 1ULL << 2,
>> +	PERF_DLFILTER_FLAG_CONDITIONAL	= 1ULL << 3,
>> +	PERF_DLFILTER_FLAG_SYSCALLRET	= 1ULL << 4,
>> +	PERF_DLFILTER_FLAG_ASYNC	= 1ULL << 5,
>> +	PERF_DLFILTER_FLAG_INTERRUPT	= 1ULL << 6,
>> +	PERF_DLFILTER_FLAG_TX_ABORT	= 1ULL << 7,
>> +	PERF_DLFILTER_FLAG_TRACE_BEGIN	= 1ULL << 8,
>> +	PERF_DLFILTER_FLAG_TRACE_END	= 1ULL << 9,
>> +	PERF_DLFILTER_FLAG_IN_TX	= 1ULL << 10,
>> +	PERF_DLFILTER_FLAG_VMENTRY	= 1ULL << 11,
>> +	PERF_DLFILTER_FLAG_VMEXIT	= 1ULL << 12,
>> +};
>> +----
>> +
>> +EXAMPLE
>> +-------
>> +
>> +Filter out everything except branches from "foo" to "bar":
>> +
>> +[source,c]
>> +----
>> +#include <perf/perf_dlfilter.h>
>> +#include <string.h>
>> +
>> +const struct perf_dlfilter_fns perf_dlfilter_fns;
>> +
>> +int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
>> +{
>> +	const struct perf_dlfilter_al *al;
>> +	const struct perf_dlfilter_al *addr_al;
>> +
>> +	if (!sample->ip || !sample->addr_correlates_sym)
>> +		return 1;
>> +
>> +	al = perf_dlfilter_fns.resolve_ip(ctx);
>> +	if (!al || !al->sym || strcmp(al->sym, "foo"))
>> +		return 1;
>> +
>> +	addr_al = perf_dlfilter_fns.resolve_addr(ctx);
>> +	if (!addr_al || !addr_al->sym || strcmp(addr_al->sym, "bar"))
>> +		return 1;
>> +
>> +	return 0;
>> +}
>> +----
>> +
>> +To build the shared object, assuming perf has been installed for the local user,
>> +i.e. perf_dlfilter.h is in ~/include/perf:
>> +
>> +	gcc -c -I ~/include -fpic dlfilter-example.c
>> +	gcc -shared -o dlfilter-example.so dlfilter-example.o
>> +
>> +To use the filter with perf script:
>> +
>> +	perf script --dlfilter ./dlfilter-example.so
>> +
>> +SEE ALSO
>> +--------
>> +linkperf:perf-script[1]
>> diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
>> index 48a5f5b26dd4..2306c81b606b 100644
>> --- a/tools/perf/Documentation/perf-script.txt
>> +++ b/tools/perf/Documentation/perf-script.txt
>> @@ -98,6 +98,10 @@ OPTIONS
>>          Generate perf-script.[ext] starter script for given language,
>>          using current perf.data.
>>  
>> +--dlfilter=<file>::
>> +	Filter sample events using the given shared object file.
>> +	Refer to linkperf:perf-dlfilter[1].
>> +
>>  -a::
>>          Force system-wide collection.  Scripts run without a <command>
>>          normally use -a by default, while scripts run with a <command>
>> @@ -483,4 +487,5 @@ include::itrace.txt[]
>>  SEE ALSO
>>  --------
>>  linkperf:perf-record[1], linkperf:perf-script-perl[1],
>> -linkperf:perf-script-python[1], linkperf:perf-intel-pt[1]
>> +linkperf:perf-script-python[1], linkperf:perf-intel-pt[1],
>> +linkperf:perf-dlfilter[1]
>> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
>> index d2771a997e26..aaf2922643a0 100644
>> --- a/tools/perf/builtin-script.c
>> +++ b/tools/perf/builtin-script.c
>> @@ -55,6 +55,7 @@
>>  #include <subcmd/pager.h>
>>  #include <perf/evlist.h>
>>  #include <linux/err.h>
>> +#include "util/dlfilter.h"
>>  #include "util/record.h"
>>  #include "util/util.h"
>>  #include "perf.h"
>> @@ -79,6 +80,7 @@ static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
>>  static struct perf_stat_config	stat_config;
>>  static int			max_blocks;
>>  static bool			native_arch;
>> +static struct dlfilter		*dlfilter;
>>  
>>  unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
>>  
>> @@ -2175,6 +2177,7 @@ static int process_sample_event(struct perf_tool *tool,
>>  	struct perf_script *scr = container_of(tool, struct perf_script, tool);
>>  	struct addr_location al;
>>  	struct addr_location addr_al;
>> +	int ret = 0;
>>  
>>  	if (perf_time__ranges_skip_sample(scr->ptime_range, scr->range_num,
>>  					  sample->time)) {
>> @@ -2213,6 +2216,13 @@ static int process_sample_event(struct perf_tool *tool,
>>  	if (evswitch__discard(&scr->evswitch, evsel))
>>  		goto out_put;
>>  
>> +	ret = dlfilter__filter_event(dlfilter, event, sample, evsel, machine, &al, &addr_al);
>> +	if (ret) {
>> +		if (ret > 0)
>> +			ret = 0;
>> +		goto out_put;
>> +	}
>> +
>>  	if (scripting_ops) {
>>  		struct addr_location *addr_al_ptr = NULL;
>>  
>> @@ -2229,7 +2239,7 @@ static int process_sample_event(struct perf_tool *tool,
>>  
>>  out_put:
>>  	addr_location__put(&al);
>> -	return 0;
>> +	return ret;
>>  }
>>  
>>  static int process_attr(struct perf_tool *tool, union perf_event *event,
>> @@ -3568,6 +3578,7 @@ int cmd_script(int argc, const char **argv)
>>  	};
>>  	struct utsname uts;
>>  	char *script_path = NULL;
>> +	const char *dlfilter_file = NULL;
>>  	const char **__argv;
>>  	int i, j, err = 0;
>>  	struct perf_script script = {
>> @@ -3615,6 +3626,7 @@ int cmd_script(int argc, const char **argv)
>>  		     parse_scriptname),
>>  	OPT_STRING('g', "gen-script", &generate_script_lang, "lang",
>>  		   "generate perf-script.xx script in specified language"),
>> +	OPT_STRING(0, "dlfilter", &dlfilter_file, "file", "filter .so file name"),
>>  	OPT_STRING('i', "input", &input_name, "file", "input file name"),
>>  	OPT_BOOLEAN('d', "debug-mode", &debug_mode,
>>  		   "do various checks like samples ordering and lost events"),
>> @@ -3933,6 +3945,12 @@ int cmd_script(int argc, const char **argv)
>>  		exit(-1);
>>  	}
>>  
>> +	if (dlfilter_file) {
>> +		dlfilter = dlfilter__new(dlfilter_file);
>> +		if (!dlfilter)
>> +			return -1;
>> +	}
>> +
>>  	if (!script_name) {
>>  		setup_pager();
>>  		use_browser = 0;
>> @@ -4032,6 +4050,10 @@ int cmd_script(int argc, const char **argv)
>>  		goto out_delete;
>>  	}
>>  
>> +	err = dlfilter__start(dlfilter, session);
>> +	if (err)
>> +		goto out_delete;
>> +
>>  	if (script_name) {
>>  		err = scripting_ops->start_script(script_name, argc, argv, session);
>>  		if (err)
>> @@ -4081,6 +4103,7 @@ int cmd_script(int argc, const char **argv)
>>  
>>  	if (script_started)
>>  		cleanup_scripting();
>> +	dlfilter__cleanup(dlfilter);
>>  out:
>>  	return err;
>>  }
>> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
>> index 95e15d1035ab..1a909b53dc15 100644
>> --- a/tools/perf/util/Build
>> +++ b/tools/perf/util/Build
>> @@ -126,6 +126,7 @@ perf-y += parse-regs-options.o
>>  perf-y += parse-sublevel-options.o
>>  perf-y += term.o
>>  perf-y += help-unknown-cmd.o
>> +perf-y += dlfilter.o
>>  perf-y += mem-events.o
>>  perf-y += vsprintf.o
>>  perf-y += units.o
>> diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
>> new file mode 100644
>> index 000000000000..15cb9de13a4b
>> --- /dev/null
>> +++ b/tools/perf/util/dlfilter.c
>> @@ -0,0 +1,330 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * dlfilter.c: Interface to perf script --dlfilter shared object
>> + * Copyright (c) 2021, Intel Corporation.
>> + */
>> +#include <dlfcn.h>
>> +#include <stdlib.h>
>> +#include <string.h>
>> +#include <linux/zalloc.h>
>> +#include <linux/build_bug.h>
>> +
>> +#include "debug.h"
>> +#include "event.h"
>> +#include "evsel.h"
>> +#include "dso.h"
>> +#include "map.h"
>> +#include "thread.h"
>> +#include "symbol.h"
>> +#include "dlfilter.h"
>> +#include "perf_dlfilter.h"
>> +
>> +static void al_to_d_al(struct addr_location *al, struct perf_dlfilter_al *d_al)
>> +{
>> +	struct symbol *sym = al->sym;
>> +
>> +	d_al->size = sizeof(*d_al);
>> +	if (al->map) {
>> +		struct dso *dso = al->map->dso;
>> +
>> +		if (symbol_conf.show_kernel_path && dso->long_name)
>> +			d_al->dso = dso->long_name;
>> +		else
>> +			d_al->dso = dso->name;
>> +		d_al->is_64_bit = dso->is_64_bit;
>> +		d_al->buildid_size = dso->bid.size;
>> +		d_al->buildid = dso->bid.data;
>> +	} else {
>> +		d_al->dso = NULL;
>> +		d_al->is_64_bit = 0;
>> +		d_al->buildid_size = 0;
>> +		d_al->buildid = NULL;
>> +	}
>> +	if (sym) {
>> +		d_al->sym = sym->name;
>> +		d_al->sym_start = sym->start;
>> +		d_al->sym_end = sym->end;
>> +		if (al->addr < sym->end)
>> +			d_al->symoff = al->addr - sym->start;
>> +		else
>> +			d_al->symoff = al->addr - al->map->start - sym->start;
>> +		d_al->sym_binding = sym->binding;
>> +	} else {
>> +		d_al->sym = NULL;
>> +		d_al->sym_start = 0;
>> +		d_al->sym_end = 0;
>> +		d_al->symoff = 0;
>> +		d_al->sym_binding = 0;
>> +	}
>> +	d_al->addr = al->addr;
>> +	d_al->comm = NULL;
>> +	d_al->filtered = 0;
>> +}
>> +
>> +static struct addr_location *get_al(struct dlfilter *d)
>> +{
>> +	struct addr_location *al = d->al;
>> +
>> +	if (!al->thread && machine__resolve(d->machine, al, d->sample) < 0)
>> +		return NULL;
>> +	return al;
>> +}
>> +
>> +static struct thread *get_thread(struct dlfilter *d)
>> +{
>> +	struct addr_location *al = get_al(d);
>> +
>> +	return al ? al->thread : NULL;
>> +}
>> +
>> +static const struct perf_dlfilter_al *dlfilter__resolve_ip(void *ctx)
>> +{
>> +	struct dlfilter *d = (struct dlfilter *)ctx;
>> +	struct perf_dlfilter_al *d_al = d->d_ip_al;
>> +	struct addr_location *al;
>> +
>> +	if (!d->ctx_valid)
>> +		return NULL;
>> +
>> +	/* 'size' is also used to indicate already initialized */
>> +	if (d_al->size)
>> +		return d_al;
>> +
>> +	al = get_al(d);
>> +	if (!al)
>> +		return NULL;
>> +
>> +	al_to_d_al(al, d_al);
>> +
>> +	d_al->is_kernel_ip = machine__kernel_ip(d->machine, d->sample->ip);
>> +	d_al->comm = al->thread ? thread__comm_str(al->thread) : ":-1";
>> +	d_al->filtered = al->filtered;
>> +
>> +	return d_al;
>> +}
>> +
>> +static const struct perf_dlfilter_al *dlfilter__resolve_addr(void *ctx)
>> +{
>> +	struct dlfilter *d = (struct dlfilter *)ctx;
>> +	struct perf_dlfilter_al *d_addr_al = d->d_addr_al;
>> +	struct addr_location *addr_al = d->addr_al;
>> +
>> +	if (!d->ctx_valid || !d->d_sample->addr_correlates_sym)
>> +		return NULL;
>> +
>> +	/* 'size' is also used to indicate already initialized */
>> +	if (d_addr_al->size)
>> +		return d_addr_al;
>> +
>> +	if (!addr_al->thread) {
>> +		struct thread *thread = get_thread(d);
>> +
>> +		if (!thread)
>> +			return NULL;
>> +		thread__resolve(thread, addr_al, d->sample);
>> +	}
>> +
>> +	al_to_d_al(addr_al, d_addr_al);
>> +
>> +	d_addr_al->is_kernel_ip = machine__kernel_ip(d->machine, d->sample->addr);
>> +
>> +	return d_addr_al;
>> +}
>> +
>> +static const struct perf_dlfilter_fns perf_dlfilter_fns = {
>> +	.resolve_ip      = dlfilter__resolve_ip,
>> +	.resolve_addr    = dlfilter__resolve_addr,
>> +};
>> +
>> +#define CHECK_FLAG(x) BUILD_BUG_ON((u64)PERF_DLFILTER_FLAG_ ## x != (u64)PERF_IP_FLAG_ ## x)
>> +
>> +static int dlfilter__init(struct dlfilter *d, const char *file)
>> +{
>> +	CHECK_FLAG(BRANCH);
>> +	CHECK_FLAG(CALL);
>> +	CHECK_FLAG(RETURN);
>> +	CHECK_FLAG(CONDITIONAL);
>> +	CHECK_FLAG(SYSCALLRET);
>> +	CHECK_FLAG(ASYNC);
>> +	CHECK_FLAG(INTERRUPT);
>> +	CHECK_FLAG(TX_ABORT);
>> +	CHECK_FLAG(TRACE_BEGIN);
>> +	CHECK_FLAG(TRACE_END);
>> +	CHECK_FLAG(IN_TX);
>> +	CHECK_FLAG(VMENTRY);
>> +	CHECK_FLAG(VMEXIT);
>> +
>> +	memset(d, 0, sizeof(*d));
>> +	d->file = strdup(file);
>> +	if (!d->file)
>> +		return -1;
>> +	return 0;
>> +}
>> +
>> +static void dlfilter__exit(struct dlfilter *d)
>> +{
>> +	zfree(&d->file);
>> +}
>> +
>> +static int dlfilter__open(struct dlfilter *d)
>> +{
>> +	d->handle = dlopen(d->file, RTLD_NOW);
>> +	if (!d->handle) {
>> +		pr_err("dlopen failed for: '%s'\n", d->file);
>> +		return -1;
>> +	}
>> +	d->start = dlsym(d->handle, "start");
>> +	d->filter_event = dlsym(d->handle, "filter_event");
>> +	d->stop = dlsym(d->handle, "stop");
>> +	d->fns = dlsym(d->handle, "perf_dlfilter_fns");
>> +	if (d->fns)
>> +		memcpy(d->fns, &perf_dlfilter_fns, sizeof(struct perf_dlfilter_fns));
>> +	return 0;
>> +}
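
The optional-symbol lookup above can be sketched standalone as follows.
This is not the patch's code: 'struct filter_syms' and filter_syms__open()
are hypothetical names, and the sketch additionally prints dlerror() on
failure, which the patch does not do.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical holder for the optional entry points of a filter object */
struct filter_syms {
	void *handle;
	int (*start)(void **data);
	int (*stop)(void *data);
};

/* Open 'file' and look up optional symbols; report dlerror() on failure */
static int filter_syms__open(struct filter_syms *f, const char *file)
{
	f->handle = dlopen(file, RTLD_NOW);
	if (!f->handle) {
		fprintf(stderr, "dlopen failed for '%s': %s\n", file, dlerror());
		return -1;
	}
	/* A missing symbol yields NULL, which here just means "not implemented" */
	f->start = (int (*)(void **))dlsym(f->handle, "start");
	f->stop = (int (*)(void *))dlsym(f->handle, "stop");
	return 0;
}
```

With older glibc this needs -ldl at link time. The dlerror() text usually
says whether the file was missing or a symbol could not be resolved.
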
>> +
>> +static int dlfilter__close(struct dlfilter *d)
>> +{
>> +	return dlclose(d->handle);
>> +}
>> +
>> +struct dlfilter *dlfilter__new(const char *file)
>> +{
>> +	struct dlfilter *d = malloc(sizeof(*d));
>> +
>> +	if (!d)
>> +		return NULL;
>> +
>> +	if (dlfilter__init(d, file))
>> +		goto err_free;
>> +
>> +	if (dlfilter__open(d))
>> +		goto err_exit;
>> +
>> +	return d;
>> +
>> +err_exit:
>> +	dlfilter__exit(d);
>> +err_free:
>> +	free(d);
>> +	return NULL;
>> +}
>> +
>> +static void dlfilter__free(struct dlfilter *d)
>> +{
>> +	if (d) {
>> +		dlfilter__exit(d);
>> +		free(d);
>> +	}
>> +}
>> +
>> +int dlfilter__start(struct dlfilter *d, struct perf_session *session)
>> +{
>> +	if (d) {
>> +		d->session = session;
>> +		if (d->start)
>> +			return d->start(&d->data);
>> +	}
>> +	return 0;
>> +}
>> +
>> +static int dlfilter__stop(struct dlfilter *d)
>> +{
>> +	if (d && d->stop)
>> +		return d->stop(d->data);
>> +	return 0;
>> +}
>> +
>> +void dlfilter__cleanup(struct dlfilter *d)
>> +{
>> +	if (d) {
>> +		dlfilter__stop(d);
>> +		dlfilter__close(d);
>> +		dlfilter__free(d);
>> +	}
>> +}
>> +
>> +#define ASSIGN(x) d_sample.x = sample->x
>> +
>> +int dlfilter__do_filter_event(struct dlfilter *d,
>> +			      union perf_event *event,
>> +			      struct perf_sample *sample,
>> +			      struct evsel *evsel,
>> +			      struct machine *machine,
>> +			      struct addr_location *al,
>> +			      struct addr_location *addr_al)
>> +{
>> +	struct perf_dlfilter_sample d_sample;
>> +	struct perf_dlfilter_al d_ip_al;
>> +	struct perf_dlfilter_al d_addr_al;
>> +	int ret;
>> +
>> +	d->event       = event;
>> +	d->sample      = sample;
>> +	d->evsel       = evsel;
>> +	d->machine     = machine;
>> +	d->al          = al;
>> +	d->addr_al     = addr_al;
>> +	d->d_sample    = &d_sample;
>> +	d->d_ip_al     = &d_ip_al;
>> +	d->d_addr_al   = &d_addr_al;
>> +
>> +	d_sample.size  = sizeof(d_sample);
>> +	d_ip_al.size   = 0; /* To indicate d_ip_al is not initialized */
>> +	d_addr_al.size = 0; /* To indicate d_addr_al is not initialized */
>> +
>> +	ASSIGN(ip);
>> +	ASSIGN(pid);
>> +	ASSIGN(tid);
>> +	ASSIGN(time);
>> +	ASSIGN(addr);
>> +	ASSIGN(id);
>> +	ASSIGN(stream_id);
>> +	ASSIGN(period);
>> +	ASSIGN(weight);
>> +	ASSIGN(ins_lat);
>> +	ASSIGN(p_stage_cyc);
>> +	ASSIGN(transaction);
>> +	ASSIGN(insn_cnt);
>> +	ASSIGN(cyc_cnt);
>> +	ASSIGN(cpu);
>> +	ASSIGN(flags);
>> +	ASSIGN(data_src);
>> +	ASSIGN(phys_addr);
>> +	ASSIGN(data_page_size);
>> +	ASSIGN(code_page_size);
>> +	ASSIGN(cgroup);
>> +	ASSIGN(cpumode);
>> +	ASSIGN(misc);
>> +	ASSIGN(raw_size);
>> +	ASSIGN(raw_data);
>> +
>> +	if (sample->branch_stack) {
>> +		d_sample.brstack_nr = sample->branch_stack->nr;
>> +		d_sample.brstack = (struct perf_branch_entry *)perf_sample__branch_entries(sample);
>> +	} else {
>> +		d_sample.brstack_nr = 0;
>> +		d_sample.brstack = NULL;
>> +	}
>> +
>> +	if (sample->callchain) {
>> +		d_sample.raw_callchain_nr = sample->callchain->nr;
>> +		d_sample.raw_callchain = (__u64 *)sample->callchain->ips;
>> +	} else {
>> +		d_sample.raw_callchain_nr = 0;
>> +		d_sample.raw_callchain = NULL;
>> +	}
>> +
>> +	d_sample.addr_correlates_sym =
>> +		(evsel->core.attr.sample_type & PERF_SAMPLE_ADDR) &&
>> +		sample_addr_correlates_sym(&evsel->core.attr);
>> +
>> +	d_sample.event = evsel__name(evsel);
>> +
>> +	d->ctx_valid = true;
>> +
>> +	ret = d->filter_event(d->data, &d_sample, d);
>> +
>> +	d->ctx_valid = false;
>> +
>> +	return ret;
>> +}
>> diff --git a/tools/perf/util/dlfilter.h b/tools/perf/util/dlfilter.h
>> new file mode 100644
>> index 000000000000..671e2d3d5a06
>> --- /dev/null
>> +++ b/tools/perf/util/dlfilter.h
>> @@ -0,0 +1,74 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * dlfilter.h: Interface to perf script --dlfilter shared object
>> + * Copyright (c) 2021, Intel Corporation.
>> + */
>> +
>> +#ifndef PERF_UTIL_DLFILTER_H
>> +#define PERF_UTIL_DLFILTER_H
>> +
>> +struct perf_session;
>> +union  perf_event;
>> +struct perf_sample;
>> +struct evsel;
>> +struct machine;
>> +struct addr_location;
>> +struct perf_dlfilter_fns;
>> +struct perf_dlfilter_sample;
>> +struct perf_dlfilter_al;
>> +
>> +struct dlfilter {
>> +	char				*file;
>> +	void				*handle;
>> +	void				*data;
>> +	struct perf_session		*session;
>> +	bool				ctx_valid;
>> +
>> +	union perf_event		*event;
>> +	struct perf_sample		*sample;
>> +	struct evsel			*evsel;
>> +	struct machine			*machine;
>> +	struct addr_location		*al;
>> +	struct addr_location		*addr_al;
>> +	struct perf_dlfilter_sample	*d_sample;
>> +	struct perf_dlfilter_al		*d_ip_al;
>> +	struct perf_dlfilter_al		*d_addr_al;
>> +
>> +	int (*start)(void **data);
>> +	int (*stop)(void *data);
>> +
>> +	int (*filter_event)(void *data,
>> +			    const struct perf_dlfilter_sample *sample,
>> +			    void *ctx);
>> +
>> +	struct perf_dlfilter_fns *fns;
>> +};
>> +
>> +struct dlfilter *dlfilter__new(const char *file);
>> +
>> +int dlfilter__start(struct dlfilter *d, struct perf_session *session);
>> +
>> +int dlfilter__do_filter_event(struct dlfilter *d,
>> +			      union perf_event *event,
>> +			      struct perf_sample *sample,
>> +			      struct evsel *evsel,
>> +			      struct machine *machine,
>> +			      struct addr_location *al,
>> +			      struct addr_location *addr_al);
>> +
>> +void dlfilter__cleanup(struct dlfilter *d);
>> +
>> +static inline int dlfilter__filter_event(struct dlfilter *d,
>> +					 union perf_event *event,
>> +					 struct perf_sample *sample,
>> +					 struct evsel *evsel,
>> +					 struct machine *machine,
>> +					 struct addr_location *al,
>> +					 struct addr_location *addr_al)
>> +{
>> +	if (!d || !d->filter_event)
>> +		return 0;
>> +	return dlfilter__do_filter_event(d, event, sample, evsel, machine, al, addr_al);
>> +}
>> +
>> +#endif
>> diff --git a/tools/perf/util/perf_dlfilter.h b/tools/perf/util/perf_dlfilter.h
>> new file mode 100644
>> index 000000000000..132f833f0a0b
>> --- /dev/null
>> +++ b/tools/perf/util/perf_dlfilter.h
>> @@ -0,0 +1,120 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * perf_dlfilter.h: API for perf script --dlfilter shared object
>> + * Copyright (c) 2021, Intel Corporation.
>> + */
>> +#ifndef _LINUX_PERF_DLFILTER_H
>> +#define _LINUX_PERF_DLFILTER_H
>> +
>> +#include <linux/perf_event.h>
>> +#include <linux/types.h>
>> +
>> +/* Definitions for perf_dlfilter_sample flags */
>> +enum {
>> +	PERF_DLFILTER_FLAG_BRANCH	= 1ULL << 0,
>> +	PERF_DLFILTER_FLAG_CALL		= 1ULL << 1,
>> +	PERF_DLFILTER_FLAG_RETURN	= 1ULL << 2,
>> +	PERF_DLFILTER_FLAG_CONDITIONAL	= 1ULL << 3,
>> +	PERF_DLFILTER_FLAG_SYSCALLRET	= 1ULL << 4,
>> +	PERF_DLFILTER_FLAG_ASYNC	= 1ULL << 5,
>> +	PERF_DLFILTER_FLAG_INTERRUPT	= 1ULL << 6,
>> +	PERF_DLFILTER_FLAG_TX_ABORT	= 1ULL << 7,
>> +	PERF_DLFILTER_FLAG_TRACE_BEGIN	= 1ULL << 8,
>> +	PERF_DLFILTER_FLAG_TRACE_END	= 1ULL << 9,
>> +	PERF_DLFILTER_FLAG_IN_TX	= 1ULL << 10,
>> +	PERF_DLFILTER_FLAG_VMENTRY	= 1ULL << 11,
>> +	PERF_DLFILTER_FLAG_VMEXIT	= 1ULL << 12,
>> +};
>> +
>> +/*
>> + * perf sample event information (as per perf script and <linux/perf_event.h>)
>> + */
>> +struct perf_dlfilter_sample {
>> +	__u32 size; /* Size of this structure (for compatibility checking) */
>> +	__u64 ip;
>> +	__s32 pid;
>> +	__s32 tid;
>> +	__u64 time;
>> +	__u64 addr;
>> +	__u64 id;
>> +	__u64 stream_id;
>> +	__u64 period;
>> +	__u64 weight;		/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
>> +	__u16 ins_lat;		/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
>> +	__u16 p_stage_cyc;	/* Refer PERF_SAMPLE_WEIGHT_TYPE in <linux/perf_event.h> */
>> +	__u64 transaction;	/* Refer PERF_SAMPLE_TRANSACTION in <linux/perf_event.h> */
>> +	__u64 insn_cnt;	/* For instructions-per-cycle (IPC) */
>> +	__u64 cyc_cnt;		/* For instructions-per-cycle (IPC) */
>> +	__s32 cpu;
>> +	__u32 flags;		/* Refer PERF_DLFILTER_FLAG_* above */
>> +	__u64 data_src;		/* Refer PERF_SAMPLE_DATA_SRC in <linux/perf_event.h> */
>> +	__u64 phys_addr;	/* Refer PERF_SAMPLE_PHYS_ADDR in <linux/perf_event.h> */
>> +	__u64 data_page_size;	/* Refer PERF_SAMPLE_DATA_PAGE_SIZE in <linux/perf_event.h> */
>> +	__u64 code_page_size;	/* Refer PERF_SAMPLE_CODE_PAGE_SIZE in <linux/perf_event.h> */
>> +	__u64 cgroup;		/* Refer PERF_SAMPLE_CGROUP in <linux/perf_event.h> */
>> +	__u8  cpumode;		/* Refer CPUMODE_MASK etc in <linux/perf_event.h> */
>> +	__u8  addr_correlates_sym; /* True => resolve_addr() can be called */
>> +	__u16 misc;		/* Refer perf_event_header in <linux/perf_event.h> */
>> +	__u32 raw_size;		/* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
>> +	const void *raw_data;	/* Refer PERF_SAMPLE_RAW in <linux/perf_event.h> */
>> +	__u64 brstack_nr;	/* Number of brstack entries */
>> +	const struct perf_branch_entry *brstack; /* Refer <linux/perf_event.h> */
>> +	__u64 raw_callchain_nr;	/* Number of raw_callchain entries */
>> +	const __u64 *raw_callchain; /* Refer <linux/perf_event.h> */
>> +	const char *event;
>> +};
>> +
>> +/*
>> + * Address location (as per perf script)
>> + */
>> +struct perf_dlfilter_al {
>> +	__u32 size; /* Size of this structure (for compatibility checking) */
>> +	__u32 symoff;
>> +	const char *sym;
>> +	__u64 addr; /* Mapped address (from dso) */
>> +	__u64 sym_start;
>> +	__u64 sym_end;
>> +	const char *dso;
>> +	__u8  sym_binding; /* STB_LOCAL, STB_GLOBAL or STB_WEAK, refer <elf.h> */
>> +	__u8  is_64_bit; /* Only valid if dso is not NULL */
>> +	__u8  is_kernel_ip; /* True if in kernel space */
>> +	__u32 buildid_size;
>> +	__u8 *buildid;
>> +	/* Below members are only populated by resolve_ip() */
>> +	__u8 filtered; /* True if this sample event will be filtered out */
>> +	const char *comm;
>> +};
>> +
>> +struct perf_dlfilter_fns {
>> +	/* Return information about ip */
>> +	const struct perf_dlfilter_al *(*resolve_ip)(void *ctx);
>> +	/* Return information about addr (if addr_correlates_sym) */
>> +	const struct perf_dlfilter_al *(*resolve_addr)(void *ctx);
>> +	/* Reserved */
>> +	void *(*reserved[126])(void *);
>> +};
>> +
>> +/*
>> + * If implemented, 'start' will be called at the beginning,
>> + * before any calls to 'filter_event'. Return 0 to indicate success,
>> + * or return a negative error code. '*data' can be assigned for use
>> + * by other functions.
>> + */
>> +int start(void **data);
>> +
>> +/*
>> + * If implemented, 'stop' will be called at the end,
>> + * after any calls to 'filter_event'. Return 0 to indicate success, or
>> + * return a negative error code. 'data' is set by start().
>> + */
>> +int stop(void *data);
>> +
>> +/*
>> + * If implemented, 'filter_event' will be called for each sample
>> + * event. Return 0 to keep the sample event, 1 to filter it out, or
>> + * return a negative error code. 'data' is set by start(). 'ctx' is
>> + * needed for calls to perf_dlfilter_fns.
>> + */
>> +int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx);
>> +
>> +#endif
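
To illustrate how 'data' ties start(), filter_event() and stop() together,
here is a minimal sketch of a filter that keeps only CPU 0 samples and counts
what it saw. 'struct counts' is hypothetical, and the sample structure is a
trimmed stand-in so the snippet builds without perf_dlfilter.h; a real filter
would include the installed header instead.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Trimmed stand-in for struct perf_dlfilter_sample; only what is used here */
struct perf_dlfilter_sample {
	uint32_t size;
	int32_t cpu;
};

/* Hypothetical per-run state, handed to perf through '*data' */
struct counts {
	uint64_t kept;
	uint64_t dropped;
};

int start(void **data)
{
	struct counts *c = calloc(1, sizeof(*c));

	if (!c)
		return -1;
	*data = c;
	return 0;
}

int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
{
	struct counts *c = data;

	(void)ctx; /* Unused: no perf_dlfilter_fns calls are made */

	if (sample->cpu != 0) {
		c->dropped++;
		return 1; /* Filter out */
	}
	c->kept++;
	return 0; /* Keep */
}

int stop(void *data)
{
	struct counts *c = data;

	/* Tolerate NULL in case stop() is reached after start() failed */
	if (!c)
		return 0;
	fprintf(stderr, "kept %llu, dropped %llu\n",
		(unsigned long long)c->kept, (unsigned long long)c->dropped);
	free(c);
	return 0;
}
```

The NULL check in stop() is defensive: from the cmd_script() changes it looks
as if dlfilter__cleanup() can still run after a failed start().
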
>> -- 
>> 2.17.1
>>
> 



end of thread, other threads:[~2021-06-23  9:24 UTC | newest]

Thread overview: 17+ messages
2021-06-21 15:05 [PATCH RFC 00/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
2021-06-21 15:05 ` [PATCH RFC 01/11] perf script: Move filter_cpu() earlier Adrian Hunter
2021-06-22 18:16   ` Arnaldo Carvalho de Melo
2021-06-21 15:05 ` [PATCH RFC 02/11] perf script: Move filtering before scripting Adrian Hunter
2021-06-22 18:18   ` Arnaldo Carvalho de Melo
2021-06-21 15:05 ` [PATCH RFC 03/11] perf script: Share addr_al between functions Adrian Hunter
2021-06-22 18:21   ` Arnaldo Carvalho de Melo
2021-06-21 15:05 ` [PATCH RFC 04/11] perf script: Add API for filtering via dynamically loaded shared object Adrian Hunter
2021-06-22 18:32   ` Arnaldo Carvalho de Melo
2021-06-23  9:25     ` Adrian Hunter
2021-06-21 15:05 ` [PATCH RFC 05/11] perf script: Add dlfilter__filter_event_early() Adrian Hunter
2021-06-21 15:05 ` [PATCH RFC 06/11] perf build: Install perf_dlfilter.h Adrian Hunter
2021-06-21 15:05 ` [PATCH RFC 07/11] perf dlfilter: Add resolve_address() to perf_dlfilter_fns Adrian Hunter
2021-06-21 15:05 ` [PATCH RFC 08/11] perf dlfilter: Add insn() " Adrian Hunter
2021-06-21 15:05 ` [PATCH RFC 09/11] perf dlfilter: Add srcline() " Adrian Hunter
2021-06-21 15:05 ` [PATCH RFC 10/11] perf dlfilter: Add attr() " Adrian Hunter
2021-06-21 15:05 ` [PATCH RFC 11/11] perf dlfilter: Add object_code() " Adrian Hunter
