* [RFC perf,bpf 0/5] reveal invisible bpf programs
@ 2018-11-06 20:52 Song Liu
  2018-11-06 20:52 ` [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT Song Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Song Liu @ 2018-11-06 20:52 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kernel-team, Song Liu, ast, daniel, peterz, acme

This follows up on Alexei's earlier effort to show bpf programs:

   https://www.spinics.net/lists/netdev/msg524232.html

In this version, PERF_RECORD_BPF_EVENT is introduced to send real-time BPF
load/unload events to user space. In user space, perf-record is modified
to listen for these events (through a dedicated ring buffer) and to
generate detailed information about each program (struct
bpf_prog_info_event). perf-report then translates these events into
proper symbols.
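
For example (a sketch; with this set applied, bpf events are recorded
by default, and patch 5 adds --no-bpf-event to opt out):

   perf record -g -- ./test     # also records bpf load/unload events
   perf report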

With this set, perf-report will show bpf programs as:

   18.49%     0.16%  test  [kernel.vmlinux]    [k] ksys_write
   18.01%     0.47%  test  [kernel.vmlinux]    [k] vfs_write
   17.02%     0.40%  test  bpf_prog            [k] bpf_prog_F
   16.97%     0.10%  test  [kernel.vmlinux]    [k] __vfs_write
   16.86%     0.12%  test  [kernel.vmlinux]    [k] comm_write
   16.67%     0.39%  test  [kernel.vmlinux]    [k] bpf_probe_read

Note that the program name is still a work in progress; it will be
cleaner once function types are available in BTF.

Please share your comments on this.

Thanks,
Song

Song Liu (5):
  perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  perf: sync tools/include/uapi/linux/perf_event.h
  perf util: basic handling of PERF_RECORD_BPF_EVENT
  perf util: introduce bpf_prog_info_event
  perf util: generate bpf_prog_info_event for short living bpf programs

 include/linux/perf_event.h            |   5 +
 include/uapi/linux/perf_event.h       |  27 ++-
 kernel/bpf/syscall.c                  |   4 +
 kernel/events/core.c                  |  82 +++++++-
 tools/include/uapi/linux/perf_event.h |  27 ++-
 tools/perf/builtin-record.c           |  55 +++++
 tools/perf/builtin-report.c           |   2 +
 tools/perf/util/Build                 |   2 +
 tools/perf/util/bpf-info.c            | 287 ++++++++++++++++++++++++++
 tools/perf/util/bpf-info.h            |  29 +++
 tools/perf/util/event.c               |  20 ++
 tools/perf/util/event.h               |  29 +++
 tools/perf/util/evlist.c              |  42 +++-
 tools/perf/util/evlist.h              |   4 +
 tools/perf/util/evsel.c               |   9 +
 tools/perf/util/evsel.h               |   3 +
 tools/perf/util/machine.c             |  10 +
 tools/perf/util/machine.h             |   2 +
 tools/perf/util/session.c             |   8 +
 tools/perf/util/tool.h                |   7 +-
 20 files changed, 646 insertions(+), 8 deletions(-)
 create mode 100644 tools/perf/util/bpf-info.c
 create mode 100644 tools/perf/util/bpf-info.h

--
2.17.1


* [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-11-06 20:52 [RFC perf,bpf 0/5] reveal invisible bpf programs Song Liu
@ 2018-11-06 20:52 ` Song Liu
  2018-11-07  8:40   ` Peter Zijlstra
  2018-11-06 20:52 ` [RFC perf,bpf 2/5] perf: sync tools/include/uapi/linux/perf_event.h Song Liu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Song Liu @ 2018-11-06 20:52 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kernel-team, Song Liu, ast, daniel, peterz, acme

For better performance analysis of BPF programs, this patch introduces
PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
load/unload information to user space.

        /*
         * Record different types of bpf events:
         *   enum perf_bpf_event_type {
         *      PERF_BPF_EVENT_UNKNOWN          = 0,
         *      PERF_BPF_EVENT_PROG_LOAD        = 1,
         *      PERF_BPF_EVENT_PROG_UNLOAD      = 2,
         *   };
         *
         * struct {
         *      struct perf_event_header header;
         *      u16 type;
         *      u16 flags;
         *      u32 id;  // prog_id or map_id
         * };
         */
        PERF_RECORD_BPF_EVENT                   = 17,

PERF_RECORD_BPF_EVENT carries only minimal information about the BPF
program. The perf utility (or other user space tools) should listen for
this event and fetch more details about the program via BPF syscalls
(BPF_PROG_GET_FD_BY_ID, BPF_OBJ_GET_INFO_BY_FD, etc.).
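
For example, a consumer might handle a load event roughly like this
(a sketch only, using the bpf syscall wrappers from tools/lib/bpf;
error handling trimmed, and the helper name is made up):

	#include <stdio.h>
	#include <unistd.h>
	#include <bpf/bpf.h>

	static void handle_bpf_event(__u16 type, __u32 id)
	{
		struct bpf_prog_info info = {};
		__u32 info_len = sizeof(info);
		int fd;

		if (type != PERF_BPF_EVENT_PROG_LOAD)
			return;

		fd = bpf_prog_get_fd_by_id(id);	/* needs CAP_SYS_ADMIN */
		if (fd < 0)
			return;	/* a short-lived prog may be gone already */
		if (!bpf_obj_get_info_by_fd(fd, &info, &info_len))
			printf("bpf prog loaded: id %u name %s\n",
			       info.id, info.name);
		close(fd);
	}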

Currently, PERF_RECORD_BPF_EVENT supports only two event types:
PERF_BPF_EVENT_PROG_LOAD and PERF_BPF_EVENT_PROG_UNLOAD. But it can
easily be extended to support more event types.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/perf_event.h      |  5 ++
 include/uapi/linux/perf_event.h | 27 ++++++++++-
 kernel/bpf/syscall.c            |  4 ++
 kernel/events/core.c            | 82 ++++++++++++++++++++++++++++++++-
 4 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 53c500f0ca79..a3126fd5b7f1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1113,6 +1113,9 @@ static inline void perf_event_task_sched_out(struct task_struct *prev,
 }
 
 extern void perf_event_mmap(struct vm_area_struct *vma);
+extern void perf_event_bpf_event(enum perf_bpf_event_type type,
+				 u16 flags, u32 id);
+
 extern struct perf_guest_info_callbacks *perf_guest_cbs;
 extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
 extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
@@ -1333,6 +1336,8 @@ static inline int perf_unregister_guest_info_callbacks
 (struct perf_guest_info_callbacks *callbacks)				{ return 0; }
 
 static inline void perf_event_mmap(struct vm_area_struct *vma)		{ }
+static inline void perf_event_bpf_event(enum perf_bpf_event_type type,
+					u16 flags, u32 id)		{ }
 static inline void perf_event_exec(void)				{ }
 static inline void perf_event_comm(struct task_struct *tsk, bool exec)	{ }
 static inline void perf_event_namespaces(struct task_struct *tsk)	{ }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index f35eb72739c0..d51cacb3077a 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -372,7 +372,8 @@ struct perf_event_attr {
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
 				namespaces     :  1, /* include namespaces data */
-				__reserved_1   : 35;
+				bpf_event      :  1, /* include bpf events */
+				__reserved_1   : 34;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -963,9 +964,33 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_NAMESPACES			= 16,
 
+	/*
+	 * Record different types of bpf events:
+	 *  enum perf_bpf_event_type {
+	 *     PERF_BPF_EVENT_UNKNOWN		= 0,
+	 *     PERF_BPF_EVENT_PROG_LOAD	= 1,
+	 *     PERF_BPF_EVENT_PROG_UNLOAD	= 2,
+	 *  };
+	 *
+	 * struct {
+	 *	struct perf_event_header header;
+	 *	u16 type;
+	 *	u16 flags;
+	 *	u32 id;  // prog_id or map_id
+	 * };
+	 */
+	PERF_RECORD_BPF_EVENT			= 17,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
+enum perf_bpf_event_type {
+	PERF_BPF_EVENT_UNKNOWN		= 0,
+	PERF_BPF_EVENT_PROG_LOAD	= 1,
+	PERF_BPF_EVENT_PROG_UNLOAD	= 2,
+	PERF_BPF_EVENT_MAX,		/* non-ABI */
+};
+
 #define PERF_MAX_STACK_DEPTH		127
 #define PERF_MAX_CONTEXTS_PER_STACK	  8
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 18e3be193a05..b37051a13be6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1101,9 +1101,12 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
 {
 	if (atomic_dec_and_test(&prog->aux->refcnt)) {
+		int prog_id = prog->aux->id;
+
 		/* bpf_prog_free_id() must be called first */
 		bpf_prog_free_id(prog, do_idr_lock);
 		bpf_prog_kallsyms_del_all(prog);
+		perf_event_bpf_event(PERF_BPF_EVENT_PROG_UNLOAD, 0, prog_id);
 
 		call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
 	}
@@ -1441,6 +1444,7 @@ static int bpf_prog_load(union bpf_attr *attr)
 	}
 
 	bpf_prog_kallsyms_add(prog);
+	perf_event_bpf_event(PERF_BPF_EVENT_PROG_LOAD, 0, prog->aux->id);
 	return err;
 
 free_used_maps:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5a97f34bc14c..54667be6669b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -385,6 +385,7 @@ static atomic_t nr_namespaces_events __read_mostly;
 static atomic_t nr_task_events __read_mostly;
 static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
+static atomic_t nr_bpf_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -4235,7 +4236,7 @@ static bool is_sb_event(struct perf_event *event)
 
 	if (attr->mmap || attr->mmap_data || attr->mmap2 ||
 	    attr->comm || attr->comm_exec ||
-	    attr->task ||
+	    attr->task || attr->bpf_event ||
 	    attr->context_switch)
 		return true;
 	return false;
@@ -4305,6 +4306,8 @@ static void unaccount_event(struct perf_event *event)
 		dec = true;
 	if (has_branch_stack(event))
 		dec = true;
+	if (event->attr.bpf_event)
+		atomic_dec(&nr_bpf_events);
 
 	if (dec) {
 		if (!atomic_add_unless(&perf_sched_count, -1, 1))
@@ -7650,6 +7653,81 @@ static void perf_log_throttle(struct perf_event *event, int enable)
 	perf_output_end(&handle);
 }
 
+/*
+ * bpf load/unload tracking
+ */
+
+struct perf_bpf_event {
+	struct {
+		struct perf_event_header        header;
+		u16 type;
+		u16 flags;
+		u32 id;
+	} event_id;
+};
+
+static int perf_event_bpf_match(struct perf_event *event)
+{
+	return event->attr.bpf_event;
+}
+
+static void perf_event_bpf_output(struct perf_event *event,
+				   void *data)
+{
+	struct perf_bpf_event *bpf_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	int size = bpf_event->event_id.header.size;
+	int ret;
+
+	if (!perf_event_bpf_match(event))
+		return;
+
+	perf_event_header__init_id(&bpf_event->event_id.header, &sample, event);
+	ret = perf_output_begin(&handle, event,
+				bpf_event->event_id.header.size);
+	if (ret)
+		goto out;
+
+	perf_output_put(&handle, bpf_event->event_id);
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+out:
+	bpf_event->event_id.header.size = size;
+}
+
+static void perf_event_bpf(struct perf_bpf_event *bpf_event)
+{
+	perf_iterate_sb(perf_event_bpf_output,
+		       bpf_event,
+		       NULL);
+}
+
+void perf_event_bpf_event(enum perf_bpf_event_type type, u16 flags, u32 id)
+{
+	struct perf_bpf_event bpf_event;
+
+	if (!atomic_read(&nr_bpf_events))
+		return;
+
+	if (type <= PERF_BPF_EVENT_UNKNOWN || type >= PERF_BPF_EVENT_MAX)
+		return;
+
+	bpf_event = (struct perf_bpf_event){
+		.event_id = {
+			.header = {
+				.type = PERF_RECORD_BPF_EVENT,
+				.size = sizeof(bpf_event.event_id),
+			},
+			.type = type,
+			.flags = flags,
+			.id = id,
+		},
+	};
+	perf_event_bpf(&bpf_event);
+}
+
 void perf_event_itrace_started(struct perf_event *event)
 {
 	event->attach_state |= PERF_ATTACH_ITRACE;
@@ -9871,6 +9949,8 @@ static void account_event(struct perf_event *event)
 		inc = true;
 	if (is_cgroup_event(event))
 		inc = true;
+	if (event->attr.bpf_event)
+		atomic_inc(&nr_bpf_events);
 
 	if (inc) {
 		/*
-- 
2.17.1



* [RFC perf,bpf 2/5] perf: sync tools/include/uapi/linux/perf_event.h
  2018-11-06 20:52 [RFC perf,bpf 0/5] reveal invisible bpf programs Song Liu
  2018-11-06 20:52 ` [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT Song Liu
@ 2018-11-06 20:52 ` Song Liu
  2018-11-06 20:52 ` [RFC perf,bpf 3/5] perf util: basic handling of PERF_RECORD_BPF_EVENT Song Liu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Song Liu @ 2018-11-06 20:52 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kernel-team, Song Liu, ast, daniel, peterz, acme

Sync changes for PERF_RECORD_BPF_EVENT.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 tools/include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index f35eb72739c0..d51cacb3077a 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -372,7 +372,8 @@ struct perf_event_attr {
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
 				namespaces     :  1, /* include namespaces data */
-				__reserved_1   : 35;
+				bpf_event      :  1, /* include bpf events */
+				__reserved_1   : 34;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -963,9 +964,33 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_NAMESPACES			= 16,
 
+	/*
+	 * Record different types of bpf events:
+	 *  enum perf_bpf_event_type {
+	 *     PERF_BPF_EVENT_UNKNOWN		= 0,
+	 *     PERF_BPF_EVENT_PROG_LOAD	= 1,
+	 *     PERF_BPF_EVENT_PROG_UNLOAD	= 2,
+	 *  };
+	 *
+	 * struct {
+	 *	struct perf_event_header header;
+	 *	u16 type;
+	 *	u16 flags;
+	 *	u32 id;  // prog_id or map_id
+	 * };
+	 */
+	PERF_RECORD_BPF_EVENT			= 17,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
+enum perf_bpf_event_type {
+	PERF_BPF_EVENT_UNKNOWN		= 0,
+	PERF_BPF_EVENT_PROG_LOAD	= 1,
+	PERF_BPF_EVENT_PROG_UNLOAD	= 2,
+	PERF_BPF_EVENT_MAX,		/* non-ABI */
+};
+
 #define PERF_MAX_STACK_DEPTH		127
 #define PERF_MAX_CONTEXTS_PER_STACK	  8
 
-- 
2.17.1



* [RFC perf,bpf 3/5] perf util: basic handling of PERF_RECORD_BPF_EVENT
  2018-11-06 20:52 [RFC perf,bpf 0/5] reveal invisible bpf programs Song Liu
  2018-11-06 20:52 ` [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT Song Liu
  2018-11-06 20:52 ` [RFC perf,bpf 2/5] perf: sync tools/include/uapi/linux/perf_event.h Song Liu
@ 2018-11-06 20:52 ` Song Liu
  2018-11-06 20:52 ` [RFC perf,bpf 4/5] perf util: introduce bpf_prog_info_event Song Liu
  2018-11-06 20:52 ` [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs Song Liu
  4 siblings, 0 replies; 27+ messages in thread
From: Song Liu @ 2018-11-06 20:52 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kernel-team, Song Liu, ast, daniel, peterz, acme

This patch adds basic handling of PERF_RECORD_BPF_EVENT in perf util.
The following patches add more logic that leverages these events.
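
With this patch, perf report -D dumps such a record as, e.g. (a
sketch; type 1 is PERF_BPF_EVENT_PROG_LOAD, and the id is made up):

   PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 37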

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 tools/perf/util/event.c   | 19 +++++++++++++++++++
 tools/perf/util/event.h   | 15 +++++++++++++++
 tools/perf/util/evsel.c   |  1 +
 tools/perf/util/machine.c | 10 ++++++++++
 tools/perf/util/machine.h |  2 ++
 tools/perf/util/session.c |  4 ++++
 tools/perf/util/tool.h    |  4 +++-
 7 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 0cd42150f712..601432afbfb2 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -43,6 +43,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_SWITCH]			= "SWITCH",
 	[PERF_RECORD_SWITCH_CPU_WIDE]		= "SWITCH_CPU_WIDE",
 	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
+	[PERF_RECORD_BPF_EVENT]			= "BPF_EVENT",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -1333,6 +1334,14 @@ int perf_event__process_switch(struct perf_tool *tool __maybe_unused,
 	return machine__process_switch_event(machine, event);
 }
 
+int perf_event__process_bpf_event(struct perf_tool *tool __maybe_unused,
+				  union perf_event *event,
+				  struct perf_sample *sample __maybe_unused,
+				  struct machine *machine)
+{
+	return machine__process_bpf_event(machine, event);
+}
+
 size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
 {
 	return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64 "]: %c %s\n",
@@ -1465,6 +1474,13 @@ static size_t perf_event__fprintf_lost(union perf_event *event, FILE *fp)
 	return fprintf(fp, " lost %" PRIu64 "\n", event->lost.lost);
 }
 
+size_t perf_event__fprintf_bpf_event(union perf_event *event, FILE *fp)
+{
+	return fprintf(fp, " bpf event with type %u, flags %u, id %u\n",
+		       event->bpf_event.type, event->bpf_event.flags,
+		       event->bpf_event.id);
+}
+
 size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 {
 	size_t ret = fprintf(fp, "PERF_RECORD_%s",
@@ -1500,6 +1516,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_LOST:
 		ret += perf_event__fprintf_lost(event, fp);
 		break;
+	case PERF_RECORD_BPF_EVENT:
+		ret += perf_event__fprintf_bpf_event(event, fp);
+		break;
 	default:
 		ret += fprintf(fp, "\n");
 	}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index bfa60bcafbde..13a0c64dd0ed 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -84,6 +84,15 @@ struct throttle_event {
 	u64 stream_id;
 };
 
+
+/* Record different types of bpf events, see enum perf_bpf_event_type */
+struct bpf_event {
+	struct perf_event_header header;
+	u16 type;
+	u16 flags;
+	u32 id;
+};
+
 #define PERF_SAMPLE_MASK				\
 	(PERF_SAMPLE_IP | PERF_SAMPLE_TID |		\
 	 PERF_SAMPLE_TIME | PERF_SAMPLE_ADDR |		\
@@ -651,6 +660,7 @@ union perf_event {
 	struct stat_round_event		stat_round;
 	struct time_conv_event		time_conv;
 	struct feature_event		feat;
+	struct bpf_event		bpf_event;
 };
 
 void perf_event__print_totals(void);
@@ -750,6 +760,10 @@ int perf_event__process_exit(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
 			     struct machine *machine);
+int perf_event__process_bpf_event(struct perf_tool *tool,
+				  union perf_event *event,
+				  struct perf_sample *sample,
+				  struct machine *machine);
 int perf_tool__process_synth_event(struct perf_tool *tool,
 				   union perf_event *event,
 				   struct machine *machine,
@@ -814,6 +828,7 @@ size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_bpf_event(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
 
 int kallsyms__get_function_start(const char *kallsyms_filename,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index cb7f01059940..af9d539e4b6a 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1635,6 +1635,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(context_switch, p_unsigned);
 	PRINT_ATTRf(write_backward, p_unsigned);
 	PRINT_ATTRf(namespaces, p_unsigned);
+	PRINT_ATTRf(bpf_event, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 111ae858cbcb..74c295f6fdc4 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -681,6 +681,14 @@ int machine__process_switch_event(struct machine *machine __maybe_unused,
 	return 0;
 }
 
+int machine__process_bpf_event(struct machine *machine __maybe_unused,
+			       union perf_event *event)
+{
+	if (dump_trace)
+		perf_event__fprintf_bpf_event(event, stdout);
+	return 0;
+}
+
 static void dso__adjust_kmod_long_name(struct dso *dso, const char *filename)
 {
 	const char *dup_filename;
@@ -1795,6 +1803,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 	case PERF_RECORD_SWITCH:
 	case PERF_RECORD_SWITCH_CPU_WIDE:
 		ret = machine__process_switch_event(machine, event); break;
+	case PERF_RECORD_BPF_EVENT:
+		ret = machine__process_bpf_event(machine, event); break;
 	default:
 		ret = -1;
 		break;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index d856b85862e2..0ed237d6fd0a 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -127,6 +127,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
 				 struct perf_sample *sample);
+int machine__process_bpf_event(struct machine *machine,
+			       union perf_event *event);
 int machine__process_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7d2c8ce6cfad..dffe5120d2d3 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -371,6 +371,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->itrace_start = perf_event__process_itrace_start;
 	if (tool->context_switch == NULL)
 		tool->context_switch = perf_event__process_switch;
+	if (tool->bpf_event == NULL)
+		tool->bpf_event = perf_event__process_bpf_event;
 	if (tool->read == NULL)
 		tool->read = process_event_sample_stub;
 	if (tool->throttle == NULL)
@@ -1300,6 +1302,8 @@ static int machines__deliver_event(struct machines *machines,
 	case PERF_RECORD_SWITCH:
 	case PERF_RECORD_SWITCH_CPU_WIDE:
 		return tool->context_switch(tool, event, sample, machine);
+	case PERF_RECORD_BPF_EVENT:
+		return tool->bpf_event(tool, event, sample, machine);
 	default:
 		++evlist->stats.nr_unknown_events;
 		return -1;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 56e4ca54020a..69ae898ca024 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -53,7 +53,9 @@ struct perf_tool {
 			itrace_start,
 			context_switch,
 			throttle,
-			unthrottle;
+			unthrottle,
+			bpf_event;
+
 	event_attr_op	attr;
 	event_attr_op	event_update;
 	event_op2	tracing_data;
-- 
2.17.1



* [RFC perf,bpf 4/5] perf util: introduce bpf_prog_info_event
  2018-11-06 20:52 [RFC perf,bpf 0/5] reveal invisible bpf programs Song Liu
                   ` (2 preceding siblings ...)
  2018-11-06 20:52 ` [RFC perf,bpf 3/5] perf util: basic handling of PERF_RECORD_BPF_EVENT Song Liu
@ 2018-11-06 20:52 ` Song Liu
  2018-11-06 21:11   ` Alexei Starovoitov
  2018-11-06 20:52 ` [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs Song Liu
  4 siblings, 1 reply; 27+ messages in thread
From: Song Liu @ 2018-11-06 20:52 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kernel-team, Song Liu, ast, daniel, peterz, acme

This patch introduces struct bpf_prog_info_event and adds it to union
perf_event.

struct bpf_prog_info_event {
       struct perf_event_header        header;
       u32                             prog_info_len;
       u32                             ksym_table_len;
       u64                             ksym_table;
       struct bpf_prog_info            prog_info;
       char                            data[];
};

struct bpf_prog_info_event contains information about a bpf program.
These events are written to perf.data by perf-record, and processed by
perf-report.

struct bpf_prog_info_event uses arrays for some data (ksym_table, and
the arrays in struct bpf_prog_info). To make these arrays easy to
serialize, we allocate them in one contiguous block of memory (data).
The array pointers are translated to offsets within
bpf_prog_info_event before the event is written to the file, and back
to pointers when the event is read from the file.
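
The contiguous allocation is laid out as follows (matching the order
in which perf_bpf_info__get_bpf_prog_info_event() assigns the array
pointers):

   struct bpf_prog_info_event
   +--------------------------------------+
   | header, prog_info_len,               |
   | ksym_table_len, ksym_table           |
   | struct bpf_prog_info                 | <- array fields point below
   | data[]:                              |
   |   map_ids[]                          |
   |   jited_prog_insns[]                 |
   |   xlated_prog_insns[]                |
   |   jited_ksyms[]                      |
   |   jited_func_lens[]                  |
   |   ksym names (nul-terminated)        |
   +--------------------------------------+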

This patch synthesizes these events at the beginning of a perf-record
run. The next patch handles short-lived bpf programs that are created
while perf-record is running.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 tools/perf/builtin-record.c |   5 +
 tools/perf/builtin-report.c |   2 +
 tools/perf/util/Build       |   2 +
 tools/perf/util/bpf-info.c  | 287 ++++++++++++++++++++++++++++++++++++
 tools/perf/util/bpf-info.h  |  29 ++++
 tools/perf/util/event.c     |   1 +
 tools/perf/util/event.h     |  14 ++
 tools/perf/util/session.c   |   4 +
 tools/perf/util/tool.h      |   3 +-
 9 files changed, 346 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/bpf-info.c
 create mode 100644 tools/perf/util/bpf-info.h

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0980dfe3396b..73b02bde1ebc 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -41,6 +41,7 @@
 #include "util/perf-hooks.h"
 #include "util/time-utils.h"
 #include "util/units.h"
+#include "util/bpf-info.h"
 #include "asm/bug.h"
 
 #include <errno.h>
@@ -850,6 +851,9 @@ static int record__synthesize(struct record *rec, bool tail)
 	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads,
 					    process_synthesized_event, opts->sample_address,
 					    opts->proc_map_timeout, 1);
+	if (err < 0)
+		goto out;
+
+	err = perf_event__synthesize_bpf_prog_info(
+		&rec->tool, process_synthesized_event, machine);
 out:
 	return err;
 }
@@ -1531,6 +1535,7 @@ static struct record record = {
 		.namespaces	= perf_event__process_namespaces,
 		.mmap		= perf_event__process_mmap,
 		.mmap2		= perf_event__process_mmap2,
+		.bpf_prog_info	= perf_event__process_bpf_prog_info,
 		.ordered_events	= true,
 	},
 };
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index c0703979c51d..4a9a3e8da4e0 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -41,6 +41,7 @@
 #include "util/auxtrace.h"
 #include "util/units.h"
 #include "util/branch.h"
+#include "util/bpf-info.h"
 
 #include <dlfcn.h>
 #include <errno.h>
@@ -981,6 +982,7 @@ int cmd_report(int argc, const char **argv)
 			.auxtrace_info	 = perf_event__process_auxtrace_info,
 			.auxtrace	 = perf_event__process_auxtrace,
 			.feature	 = process_feature_event,
+			.bpf_prog_info	 = perf_event__process_bpf_prog_info,
 			.ordered_events	 = true,
 			.ordering_requires_timestamps = true,
 		},
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index ecd9f9ceda77..624c7281217c 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -150,6 +150,8 @@ endif
 
 libperf-y += perf-hooks.o
 
+libperf-$(CONFIG_LIBBPF) += bpf-info.o
+
 libperf-$(CONFIG_CXX) += c++/
 
 CFLAGS_config.o   += -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
diff --git a/tools/perf/util/bpf-info.c b/tools/perf/util/bpf-info.c
new file mode 100644
index 000000000000..fa598c4328be
--- /dev/null
+++ b/tools/perf/util/bpf-info.c
@@ -0,0 +1,287 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018 Facebook
+ */
+#include <errno.h>
+#include <stdio.h>
+#include <bpf/bpf.h>
+#include "bpf-info.h"
+#include "debug.h"
+#include "session.h"
+
+#define KSYM_NAME_LEN 128
+#define BPF_PROG_INFO_MIN_SIZE 128  /* minimal size required to include jited_func_lens */
+
+static inline __u64 ptr_to_u64(const void *ptr)
+{
+	return (__u64) (unsigned long) ptr;
+}
+
+/* fetch information of the bpf program via bpf syscall. */
+struct bpf_prog_info_event *perf_bpf_info__get_bpf_prog_info_event(u32 prog_id)
+{
+	struct bpf_prog_info_event *prog_info_event = NULL;
+	struct bpf_prog_info info = {};
+	u32 info_len = sizeof(info);
+	u32 event_len, i;
+	int fd, err;
+	void *ptr;
+
+	fd = bpf_prog_get_fd_by_id(prog_id);
+	if (fd < 0) {
+		pr_debug("Failed to get fd for prog_id %u\n", prog_id);
+		return NULL;
+	}
+
+	err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
+	if (err) {
+		pr_debug("can't get prog info: %s\n", strerror(errno));
+		goto close_fd;
+	}
+	if (info_len < BPF_PROG_INFO_MIN_SIZE) {
+		pr_debug("kernel is too old to support proper prog info\n");
+		goto close_fd;
+	}
+
+	/* calculate size of bpf_prog_info_event */
+	event_len = sizeof(struct bpf_prog_info_event);
+	event_len += info_len;
+	event_len -= sizeof(info);
+	event_len += info.jited_prog_len;
+	event_len += info.xlated_prog_len;
+	event_len += info.nr_map_ids * sizeof(u32);
+	event_len += info.nr_jited_ksyms * sizeof(u64);
+	event_len += info.nr_jited_func_lens * sizeof(u32);
+	event_len += info.nr_jited_ksyms * KSYM_NAME_LEN;
+
+	prog_info_event = (struct bpf_prog_info_event *) malloc(event_len);
+	if (!prog_info_event)
+		goto close_fd;
+
+	/* assign pointers for map_ids, jited_prog_insns, etc. */
+	ptr = prog_info_event->data;
+	info.map_ids = ptr_to_u64(ptr);
+	ptr += info.nr_map_ids * sizeof(u32);
+	info.jited_prog_insns = ptr_to_u64(ptr);
+	ptr += info.jited_prog_len;
+	info.xlated_prog_insns = ptr_to_u64(ptr);
+	ptr += info.xlated_prog_len;
+	info.jited_ksyms = ptr_to_u64(ptr);
+	ptr += info.nr_jited_ksyms * sizeof(u64);
+	info.jited_func_lens = ptr_to_u64(ptr);
+	ptr += info.nr_jited_func_lens * sizeof(u32);
+
+	err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
+	if (err) {
+		pr_err("can't get prog info: %s\n", strerror(errno));
+		free(prog_info_event);
+		prog_info_event = NULL;
+		goto close_fd;
+	}
+
+	/* fill data in prog_info_event */
+	prog_info_event->header.type = PERF_RECORD_BPF_PROG_INFO;
+	prog_info_event->header.misc = 0;
+	prog_info_event->prog_info_len = info_len;
+
+	memcpy(&prog_info_event->prog_info, &info, info_len);
+
+	prog_info_event->ksym_table_len = 0;
+	prog_info_event->ksym_table = ptr_to_u64(ptr);
+
+	/* fill in fake symbol name for now, add real name after BTF */
+	if (info.nr_jited_func_lens == 1 && info.name) {  /* only main prog */
+		size_t l;
+
+		assert(info.nr_jited_ksyms == 1);
+		l = snprintf(ptr, KSYM_NAME_LEN, "bpf_prog_%s", info.name);
+		prog_info_event->ksym_table_len += l + 1;
+		ptr += l + 1;
+
+	} else {
+		assert(info.nr_jited_ksyms == info.nr_jited_func_lens);
+
+		for (i = 0; i < info.nr_jited_ksyms; i++) {
+			size_t l;
+
+			l = snprintf(ptr, KSYM_NAME_LEN, "bpf_prog_%d_%d",
+				     info.id, i);
+			prog_info_event->ksym_table_len += l + 1;
+			ptr += l + 1;
+		}
+	}
+
+	prog_info_event->header.size = ptr - (void *)prog_info_event;
+
+close_fd:
+	close(fd);
+	return prog_info_event;
+}
+
+static size_t fprintf_bpf_prog_info(
+	struct bpf_prog_info_event *prog_info_event, FILE *fp)
+{
+	struct bpf_prog_info *info = &prog_info_event->prog_info;
+	unsigned long *jited_ksyms = (unsigned long *)(info->jited_ksyms);
+	char *name_ptr = (char *) prog_info_event->ksym_table;
+	unsigned int i;
+	size_t ret;
+
+	ret = fprintf(fp, "bpf_prog: type: %u id: %u ", info->type, info->id);
+	ret += fprintf(fp, "nr_jited_ksyms: %u\n", info->nr_jited_ksyms);
+
+	for (i = 0; i < info->nr_jited_ksyms; i++) {
+		ret += fprintf(fp, "jited_ksyms[%u]: %lx %s\n",
+			       i, jited_ksyms[i], name_ptr);
+		name_ptr += strlen(name_ptr) + 1; /* skip the nul terminator */
+	}
+	return ret;
+}
+
+size_t perf_event__fprintf_bpf_prog_info(union perf_event *event, FILE *fp)
+{
+	return fprintf_bpf_prog_info(&event->bpf_prog_info, fp);
+}
+
+/*
+ * translate all array ptr to offset from base address, called before
+ * writing the event to file
+ */
+void perf_bpf_info__ptr_to_offset(
+	struct bpf_prog_info_event *prog_info_event)
+{
+	u64 base = ptr_to_u64(prog_info_event);
+
+	prog_info_event->ksym_table -= base;
+	prog_info_event->prog_info.jited_prog_insns -= base;
+	prog_info_event->prog_info.xlated_prog_insns -= base;
+	prog_info_event->prog_info.map_ids -= base;
+	prog_info_event->prog_info.jited_ksyms -= base;
+	prog_info_event->prog_info.jited_func_lens -= base;
+}
+
+/*
+ * translate offset from base address to array pointer, called after
+ * reading the event from file
+ */
+void perf_bpf_info__offset_to_ptr(
+	struct bpf_prog_info_event *prog_info_event)
+{
+	u64 base = ptr_to_u64(prog_info_event);
+
+	prog_info_event->ksym_table += base;
+	prog_info_event->prog_info.jited_prog_insns += base;
+	prog_info_event->prog_info.xlated_prog_insns += base;
+	prog_info_event->prog_info.map_ids += base;
+	prog_info_event->prog_info.jited_ksyms += base;
+	prog_info_event->prog_info.jited_func_lens += base;
+}
+
+int perf_event__synthesize_one_bpf_prog_info(struct perf_tool *tool,
+					     perf_event__handler_t process,
+					     struct machine *machine,
+					     __u32 id)
+{
+	struct bpf_prog_info_event *prog_info_event;
+
+	prog_info_event = perf_bpf_info__get_bpf_prog_info_event(id);
+
+	if (!prog_info_event) {
+		pr_err("Failed to get prog_info_event\n");
+		return -1;
+	}
+	perf_bpf_info__ptr_to_offset(prog_info_event);
+
+	if (perf_tool__process_synth_event(
+		    tool, (union perf_event *)prog_info_event,
+		    machine, process) != 0) {
+		free(prog_info_event);
+		return -1;
+	}
+
+	free(prog_info_event);
+	return 0;
+}
+
+int perf_event__synthesize_bpf_prog_info(struct perf_tool *tool,
+					 perf_event__handler_t process,
+					 struct machine *machine)
+{
+	__u32 id = 0;
+	int err = 0;
+
+	while (true) {
+		err = bpf_prog_get_next_id(id, &id);
+		if (err) {
+			if (errno == ENOENT) {
+				err = 0;
+				break;
+			}
+			fprintf(stderr, "can't get next program: %s%s\n",
+				strerror(errno),
+				errno == EINVAL ? " -- kernel too old?" : "");
+			err = -1;
+			break;
+		}
+		err = perf_event__synthesize_one_bpf_prog_info(
+			tool, process, machine, id);
+	}
+	return err;
+}
+
+int perf_event__process_bpf_prog_info(struct perf_session *session,
+				      union perf_event *event)
+{
+	struct machine *machine = &session->machines.host;
+	struct bpf_prog_info_event *prog_info_event;
+	struct bpf_prog_info *info;
+	struct symbol *sym;
+	struct map *map;
+	char *name_ptr;
+	int ret = 0;
+	u64 *addrs;
+	u32 *lens;
+	u32 i;
+
+	prog_info_event = (struct bpf_prog_info_event *)
+		malloc(event->header.size);
+	if (!prog_info_event)
+		return -ENOMEM;
+
+	/* copy the data to rw memory so we can modify it */
+	memcpy(prog_info_event, &event->bpf_prog_info, event->header.size);
+	info = &prog_info_event->prog_info;
+
+	perf_bpf_info__offset_to_ptr(prog_info_event);
+	name_ptr = (char *) prog_info_event->ksym_table;
+	addrs = (u64 *)info->jited_ksyms;
+	lens = (u32 *)info->jited_func_lens;
+	for (i = 0; i < info->nr_jited_ksyms; i++) {
+		u32 len = info->nr_jited_func_lens == 1 ?
+			info->jited_prog_len : lens[i];
+
+		map = map_groups__find(&machine->kmaps, addrs[i]);
+		if (!map) {
+			map = dso__new_map("bpf_prog");
+			if (!map) {
+				ret = -ENOMEM;
+				break;
+			}
+			map->start = addrs[i];
+			map->pgoff = map->start;
+			map->end = map->start + len;
+			map_groups__insert(&machine->kmaps, map);
+		}
+
+		sym = symbol__new(addrs[i], len, 0, 0, name_ptr);
+		if (!sym) {
+			ret = -ENOMEM;
+			break;
+		}
+		dso__insert_symbol(map->dso, sym);
+		name_ptr += strlen(name_ptr) + 1;
+	}
+
+	free(prog_info_event);
+	return ret;
+}
diff --git a/tools/perf/util/bpf-info.h b/tools/perf/util/bpf-info.h
new file mode 100644
index 000000000000..813cad07bacb
--- /dev/null
+++ b/tools/perf/util/bpf-info.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_BPF_INFO_H
+#define __PERF_BPF_INFO_H
+
+#include "event.h"
+#include "machine.h"
+#include "tool.h"
+#include "symbol.h"
+
+struct bpf_prog_info_event *perf_bpf_info__get_bpf_prog_info_event(u32 prog_id);
+
+size_t perf_event__fprintf_bpf_prog_info(union perf_event *event, FILE *fp);
+
+int perf_event__synthesize_one_bpf_prog_info(struct perf_tool *tool,
+					     perf_event__handler_t process,
+					     struct machine *machine,
+					     __u32 id);
+
+int perf_event__synthesize_bpf_prog_info(struct perf_tool *tool,
+					 perf_event__handler_t process,
+					 struct machine *machine);
+
+void perf_bpf_info__ptr_to_offset(struct bpf_prog_info_event *prog_info_event);
+void perf_bpf_info__offset_to_ptr(struct bpf_prog_info_event *prog_info_event);
+
+int perf_event__process_bpf_prog_info(struct perf_session *session,
+				      union perf_event *event);
+
+#endif /* __PERF_BPF_INFO_H */
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 601432afbfb2..33b1c168b83e 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -61,6 +61,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_EVENT_UPDATE]		= "EVENT_UPDATE",
 	[PERF_RECORD_TIME_CONV]			= "TIME_CONV",
 	[PERF_RECORD_HEADER_FEATURE]		= "FEATURE",
+	[PERF_RECORD_BPF_PROG_INFO]		= "BPF_PROG_INFO",
 };
 
 static const char *perf_ns__names[] = {
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 13a0c64dd0ed..dc64d800eaa6 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -5,6 +5,7 @@
 #include <limits.h>
 #include <stdio.h>
 #include <linux/kernel.h>
+#include <linux/bpf.h>
 
 #include "../perf.h"
 #include "build-id.h"
@@ -258,6 +259,7 @@ enum perf_user_event_type { /* above any possible kernel type */
 	PERF_RECORD_EVENT_UPDATE		= 78,
 	PERF_RECORD_TIME_CONV			= 79,
 	PERF_RECORD_HEADER_FEATURE		= 80,
+	PERF_RECORD_BPF_PROG_INFO		= 81,
 	PERF_RECORD_HEADER_MAX
 };
 
@@ -629,6 +631,17 @@ struct feature_event {
 	char				data[];
 };
 
+#define KSYM_NAME_LEN 128
+
+struct bpf_prog_info_event {
+	struct perf_event_header	header;
+	u32				prog_info_len;
+	u32				ksym_table_len;
+	u64				ksym_table;
+	struct bpf_prog_info		prog_info;
+	char				data[];
+};
+
 union perf_event {
 	struct perf_event_header	header;
 	struct mmap_event		mmap;
@@ -661,6 +674,7 @@ union perf_event {
 	struct time_conv_event		time_conv;
 	struct feature_event		feat;
 	struct bpf_event		bpf_event;
+	struct bpf_prog_info_event	bpf_prog_info;
 };
 
 void perf_event__print_totals(void);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index dffe5120d2d3..5365ee1dfbec 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -415,6 +415,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->time_conv = process_event_op2_stub;
 	if (tool->feature == NULL)
 		tool->feature = process_event_op2_stub;
+	if (tool->bpf_prog_info == NULL)
+		tool->bpf_prog_info = process_event_op2_stub;
 }
 
 static void swap_sample_id_all(union perf_event *event, void *data)
@@ -1397,6 +1399,8 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 		return tool->time_conv(session, event);
 	case PERF_RECORD_HEADER_FEATURE:
 		return tool->feature(session, event);
+	case PERF_RECORD_BPF_PROG_INFO:
+		return tool->bpf_prog_info(session, event);
 	default:
 		return -EINVAL;
 	}
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 69ae898ca024..739a4b1188f7 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -70,7 +70,8 @@ struct perf_tool {
 			stat_config,
 			stat,
 			stat_round,
-			feature;
+			feature,
+			bpf_prog_info;
 	event_op3	auxtrace;
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;
-- 
2.17.1



* [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs
  2018-11-06 20:52 [RFC perf,bpf 0/5] reveal invisible bpf programs Song Liu
                   ` (3 preceding siblings ...)
  2018-11-06 20:52 ` [RFC perf,bpf 4/5] perf util: introduce bpf_prog_info_event Song Liu
@ 2018-11-06 20:52 ` Song Liu
  2018-11-06 21:54   ` David Ahern
  4 siblings, 1 reply; 27+ messages in thread
From: Song Liu @ 2018-11-06 20:52 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: kernel-team, Song Liu, ast, daniel, peterz, acme

This patch enables perf-record to listen to bpf_event and to generate a
bpf_prog_info_event for each bpf program loaded or unloaded during the
perf-record run.

To minimize the latency between a bpf_event and the bpf syscalls that
follow it, a separate mmap with a watermark of 1 is created to process
these VIP events, and a dedicated dummy event is attached to that mmap.

By default, perf-record listens to bpf_event. The --no-bpf-event option
is added in case the user wants to opt out.
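
The tracker event is set up roughly like this (a sketch of the attr
that perf_evlist__add_bpf_tracker() below uses; a standalone tool
could pass the same attr straight to perf_event_open()):

	struct perf_event_attr attr = {
		.type             = PERF_TYPE_SOFTWARE,
		.config           = PERF_COUNT_SW_DUMMY,
		.size             = sizeof(attr),  /* capture ABI version */
		.bpf_event        = 1,  /* new bit from patch 1 */
		.watermark        = 1,
		.wakeup_watermark = 1,  /* wake up on every record */
	};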

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 tools/perf/builtin-record.c | 50 +++++++++++++++++++++++++++++++++++++
 tools/perf/util/evlist.c    | 42 ++++++++++++++++++++++++++++---
 tools/perf/util/evlist.h    |  4 +++
 tools/perf/util/evsel.c     |  8 ++++++
 tools/perf/util/evsel.h     |  3 +++
 5 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 73b02bde1ebc..1036a64eb9f7 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -80,6 +80,7 @@ struct record {
 	bool			buildid_all;
 	bool			timestamp_filename;
 	bool			timestamp_boundary;
+	bool			no_bpf_event;
 	struct switch_output	switch_output;
 	unsigned long long	samples;
 };
@@ -381,6 +382,8 @@ static int record__open(struct record *rec)
 		pos->tracking = 1;
 		pos->attr.enable_on_exec = 1;
 	}
+	if (!rec->no_bpf_event)
+		perf_evlist__add_bpf_tracker(evlist);
 
 	perf_evlist__config(evlist, opts, &callchain_param);
 
@@ -562,10 +565,55 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 	return rc;
 }
 
+static int record__mmap_process_vip_events(struct record *rec)
+{
+	int i;
+
+	for (i = 0; i < rec->evlist->nr_mmaps; i++) {
+		struct perf_mmap *map = &rec->evlist->vip_mmap[i];
+		union perf_event *event;
+
+		perf_mmap__read_init(map);
+		while ((event = perf_mmap__read_event(map)) != NULL) {
+			pr_debug("processing vip event of type %d\n",
+				 event->header.type);
+			switch (event->header.type) {
+			case PERF_RECORD_BPF_EVENT:
+				switch (event->bpf_event.type) {
+				case PERF_BPF_EVENT_PROG_LOAD:
+					perf_event__synthesize_one_bpf_prog_info(
+						&rec->tool,
+						process_synthesized_event,
+						&rec->session->machines.host,
+						event->bpf_event.id);
+					/* fall through */
+				case PERF_BPF_EVENT_PROG_UNLOAD:
+					record__write(rec, NULL, event,
+						      event->header.size);
+					break;
+				default:
+					break;
+				}
+				break;
+			default:
+				break;
+			}
+			perf_mmap__consume(map);
+		}
+		perf_mmap__read_done(map);
+	}
+
+	return 0;
+}
+
 static int record__mmap_read_all(struct record *rec)
 {
 	int err;
 
+	err = record__mmap_process_vip_events(rec);
+	if (err)
+		return err;
+
 	err = record__mmap_read_evlist(rec, rec->evlist, false);
 	if (err)
 		return err;
@@ -1686,6 +1734,8 @@ static struct option __record_options[] = {
 			  "signal"),
 	OPT_BOOLEAN(0, "dry-run", &dry_run,
 		    "Parse options then exit"),
+	OPT_BOOLEAN(0, "no-bpf-event", &record.no_bpf_event,
+		    "do not record event on bpf program load/unload"),
 	OPT_END()
 };
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index be440df29615..466a9f7b1e93 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -45,6 +45,7 @@ void perf_evlist__init(struct perf_evlist *evlist, struct cpu_map *cpus,
 	for (i = 0; i < PERF_EVLIST__HLIST_SIZE; ++i)
 		INIT_HLIST_HEAD(&evlist->heads[i]);
 	INIT_LIST_HEAD(&evlist->entries);
+	INIT_LIST_HEAD(&evlist->vip_entries);
 	perf_evlist__set_maps(evlist, cpus, threads);
 	fdarray__init(&evlist->pollfd, 64);
 	evlist->workload.pid = -1;
@@ -177,6 +178,8 @@ void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry)
 {
 	entry->evlist = evlist;
 	list_add_tail(&entry->node, &evlist->entries);
+	if (entry->vip)
+		list_add_tail(&entry->vip_node, &evlist->vip_entries);
 	entry->idx = evlist->nr_entries;
 	entry->tracking = !entry->idx;
 
@@ -267,6 +270,27 @@ int perf_evlist__add_dummy(struct perf_evlist *evlist)
 	return 0;
 }
 
+int perf_evlist__add_bpf_tracker(struct perf_evlist *evlist)
+{
+	struct perf_event_attr attr = {
+		.type	          = PERF_TYPE_SOFTWARE,
+		.config           = PERF_COUNT_SW_DUMMY,
+		.watermark        = 1,
+		.bpf_event        = 1,
+		.wakeup_watermark = 1,
+		.size             = sizeof(attr), /* to capture ABI version */
+	};
+	struct perf_evsel *evsel = perf_evsel__new_idx(&attr,
+						       evlist->nr_entries);
+
+	if (evsel == NULL)
+		return -ENOMEM;
+
+	evsel->vip = true;
+	perf_evlist__add(evlist, evsel);
+	return 0;
+}
+
 static int perf_evlist__add_attrs(struct perf_evlist *evlist,
 				  struct perf_event_attr *attrs, size_t nr_attrs)
 {
@@ -770,6 +794,7 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 	int evlist_cpu = cpu_map__cpu(evlist->cpus, cpu_idx);
 
 	evlist__for_each_entry(evlist, evsel) {
+		struct perf_mmap *vip_maps = evlist->vip_mmap;
 		struct perf_mmap *maps = evlist->mmap;
 		int *output = _output;
 		int fd;
@@ -800,7 +825,11 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 
 		fd = FD(evsel, cpu, thread);
 
-		if (*output == -1) {
+		if (evsel->vip) {
+			if (perf_mmap__mmap(&vip_maps[idx], mp,
+					    fd, evlist_cpu) < 0)
+				return -1;
+		} else if (*output == -1) {
 			*output = fd;
 
 			if (perf_mmap__mmap(&maps[idx], mp, *output, evlist_cpu) < 0)
@@ -822,8 +851,12 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 		 * Therefore don't add it for polling.
 		 */
 		if (!evsel->system_wide &&
-		    __perf_evlist__add_pollfd(evlist, fd, &maps[idx], revent) < 0) {
-			perf_mmap__put(&maps[idx]);
+		    __perf_evlist__add_pollfd(
+			    evlist, fd,
+			    evsel->vip ? &vip_maps[idx] : &maps[idx],
+			    revent) < 0) {
+			perf_mmap__put(evsel->vip ?
+				       &vip_maps[idx] : &maps[idx]);
 			return -1;
 		}
 
@@ -1035,6 +1068,9 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	if (!evlist->mmap)
 		return -ENOMEM;
 
+	if (!evlist->vip_mmap)
+		evlist->vip_mmap = perf_evlist__alloc_mmap(evlist, false);
+	if (!evlist->vip_mmap)
+		return -ENOMEM;
+
 	if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
 		return -ENOMEM;
 
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index dc66436add98..6d99e8dab570 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -26,6 +26,7 @@ struct record_opts;
 
 struct perf_evlist {
 	struct list_head entries;
+	struct list_head vip_entries;
 	struct hlist_head heads[PERF_EVLIST__HLIST_SIZE];
 	int		 nr_entries;
 	int		 nr_groups;
@@ -43,6 +44,7 @@ struct perf_evlist {
 	} workload;
 	struct fdarray	 pollfd;
 	struct perf_mmap *mmap;
+	struct perf_mmap *vip_mmap;
 	struct perf_mmap *overwrite_mmap;
 	struct thread_map *threads;
 	struct cpu_map	  *cpus;
@@ -84,6 +86,8 @@ int __perf_evlist__add_default_attrs(struct perf_evlist *evlist,
 
 int perf_evlist__add_dummy(struct perf_evlist *evlist);
 
+int perf_evlist__add_bpf_tracker(struct perf_evlist *evlist);
+
 int perf_evlist__add_newtp(struct perf_evlist *evlist,
 			   const char *sys, const char *name, void *handler);
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index af9d539e4b6a..94456a493607 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -235,6 +235,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
 	evsel->evlist	   = NULL;
 	evsel->bpf_fd	   = -1;
 	INIT_LIST_HEAD(&evsel->node);
+	INIT_LIST_HEAD(&evsel->vip_node);
 	INIT_LIST_HEAD(&evsel->config_terms);
 	perf_evsel__object.init(evsel);
 	evsel->sample_size = __perf_evsel__sample_size(attr->sample_type);
@@ -1795,6 +1796,8 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 				     PERF_SAMPLE_BRANCH_NO_CYCLES);
 	if (perf_missing_features.group_read && evsel->attr.inherit)
 		evsel->attr.read_format &= ~(PERF_FORMAT_GROUP|PERF_FORMAT_ID);
+	if (perf_missing_features.bpf_event)
+		evsel->attr.bpf_event = 0;
 retry_sample_id:
 	if (perf_missing_features.sample_id_all)
 		evsel->attr.sample_id_all = 0;
@@ -1939,6 +1942,11 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 		perf_missing_features.exclude_guest = true;
 		pr_debug2("switching off exclude_guest, exclude_host\n");
 		goto fallback_missing_features;
+	} else if (!perf_missing_features.bpf_event &&
+		   evsel->attr.bpf_event) {
+		perf_missing_features.bpf_event = true;
+		pr_debug2("switching off bpf_event\n");
+		goto fallback_missing_features;
 	} else if (!perf_missing_features.sample_id_all) {
 		perf_missing_features.sample_id_all = true;
 		pr_debug2("switching off sample_id_all\n");
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 4107c39f4a54..82b1d3e42603 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -89,6 +89,7 @@ struct perf_stat_evsel;
  */
 struct perf_evsel {
 	struct list_head	node;
+	struct list_head	vip_node;
 	struct perf_evlist	*evlist;
 	struct perf_event_attr	attr;
 	char			*filter;
@@ -128,6 +129,7 @@ struct perf_evsel {
 	bool			ignore_missing_thread;
 	bool			forced_leader;
 	bool			use_uncore_alias;
+	bool			vip;  /* vip events have their own mmap */
 	/* parse modifier helper */
 	int			exclude_GH;
 	int			nr_members;
@@ -163,6 +165,7 @@ struct perf_missing_features {
 	bool lbr_flags;
 	bool write_backward;
 	bool group_read;
+	bool bpf_event;
 };
 
 extern struct perf_missing_features perf_missing_features;
-- 
2.17.1



* Re: [RFC perf,bpf 4/5] perf util: introduce bpf_prog_info_event
  2018-11-06 20:52 ` [RFC perf,bpf 4/5] perf util: introduce bpf_prog_info_event Song Liu
@ 2018-11-06 21:11   ` Alexei Starovoitov
  0 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2018-11-06 21:11 UTC (permalink / raw)
  To: Song Liu, netdev, linux-kernel; +Cc: Kernel Team, ast, daniel, peterz, acme

On 11/6/18 12:52 PM, Song Liu wrote:
> +	/* fill in fake symbol name for now, add real name after BTF */
> +	if (info.nr_jited_func_lens == 1 && info.name) {  /* only main prog */
> +		size_t l;
> +
> +		assert(info.nr_jited_ksyms == 1);
> +		l = snprintf(ptr, KSYM_NAME_LEN, "bpf_prog_%s", info.name);

please include the prog tag here. Just like kernel kallsyms do.
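
Something like this, perhaps (untested sketch; info.tag hex-encoded
the way the kernel does for bpf kallsyms):

	char tag[BPF_TAG_SIZE * 2 + 1];
	int j;

	for (j = 0; j < BPF_TAG_SIZE; j++)
		sprintf(tag + j * 2, "%02x", info.tag[j]);
	l = snprintf(ptr, KSYM_NAME_LEN, "bpf_prog_%s_%s", tag, info.name);
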
Other than this small nit the patch set looks great to me.


* Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs
  2018-11-06 20:52 ` [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs Song Liu
@ 2018-11-06 21:54   ` David Ahern
  2018-11-06 23:17     ` Song Liu
  0 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2018-11-06 21:54 UTC (permalink / raw)
  To: Song Liu, netdev, linux-kernel; +Cc: kernel-team, ast, daniel, peterz, acme

On 11/6/18 1:52 PM, Song Liu wrote:
> +
>  static int record__mmap_read_all(struct record *rec)
>  {
>  	int err;
>  
> +	err = record__mmap_process_vip_events(rec);
> +	if (err)
> +		return err;
> +
>  	err = record__mmap_read_evlist(rec, rec->evlist, false);
>  	if (err)
>  		return err;

Seems to me that is going to increase the overhead of perf on any system
doing BPF updates. The BPF events cause a wakeup every load and unload,
and perf processes not only the VIP events but then walks all of the
other maps.

> @@ -1686,6 +1734,8 @@ static struct option __record_options[] = {
>  			  "signal"),
>  	OPT_BOOLEAN(0, "dry-run", &dry_run,
>  		    "Parse options then exit"),
> +	OPT_BOOLEAN(0, "no-bpf-event", &record.no_bpf_event,
> +		    "do not record event on bpf program load/unload"),

Why should this default to on? If I am recording FIB events, I don't care
about BPF events.



* Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs
  2018-11-06 21:54   ` David Ahern
@ 2018-11-06 23:17     ` Song Liu
  2018-11-06 23:29       ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Song Liu @ 2018-11-06 23:17 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, linux-kernel, Kernel Team, ast, daniel, peterz, acme



> On Nov 6, 2018, at 1:54 PM, David Ahern <dsahern@gmail.com> wrote:
> 
> On 11/6/18 1:52 PM, Song Liu wrote:
>> +
>> static int record__mmap_read_all(struct record *rec)
>> {
>> 	int err;
>> 
>> +	err = record__mmap_process_vip_events(rec);
>> +	if (err)
>> +		return err;
>> +
>> 	err = record__mmap_read_evlist(rec, rec->evlist, false);
>> 	if (err)
>> 		return err;
> 
> Seems to me that is going to increase the overhead of perf on any system
> doing BPF updates. The BPF events cause a wakeup every load and unload,
> and perf processes not only the VIP events but then walks all of the
> other maps.

BPF prog load/unload events should be rare events in real world use cases. 
So I think the overhead is OK. Also, I don't see an easy way to improve 
this. 

> 
>> @@ -1686,6 +1734,8 @@ static struct option __record_options[] = {
>> 			  "signal"),
>> 	OPT_BOOLEAN(0, "dry-run", &dry_run,
>> 		    "Parse options then exit"),
>> +	OPT_BOOLEAN(0, "no-bpf-event", &record.no_bpf_event,
>> +		    "do not record event on bpf program load/unload"),
> 
> Why should this default to on? If I am recording FIB events, I don't care
> about BPF events.
> 

I am OK with default off if that's the preferred way. 

Thanks,
Song



* Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs
  2018-11-06 23:17     ` Song Liu
@ 2018-11-06 23:29       ` Alexei Starovoitov
  2018-11-06 23:36         ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2018-11-06 23:29 UTC (permalink / raw)
  To: Song Liu, David Ahern
  Cc: netdev, linux-kernel, Kernel Team, ast, daniel, peterz, acme

On 11/6/18 3:17 PM, Song Liu wrote:
> 
> 
>> On Nov 6, 2018, at 1:54 PM, David Ahern <dsahern@gmail.com> wrote:
>>
>> On 11/6/18 1:52 PM, Song Liu wrote:
>>> +
>>> static int record__mmap_read_all(struct record *rec)
>>> {
>>> 	int err;
>>>
>>> +	err = record__mmap_process_vip_events(rec);
>>> +	if (err)
>>> +		return err;
>>> +
>>> 	err = record__mmap_read_evlist(rec, rec->evlist, false);
>>> 	if (err)
>>> 		return err;
>>
>> Seems to me that is going to increase the overhead of perf on any system
>> doing BPF updates. The BPF events cause a wakeup every load and unload,
>> and perf processes not only the VIP events but then walks all of the
>> other maps.
> 
> BPF prog load/unload events should be rare events in real world use cases.
> So I think the overhead is OK. Also, I don't see an easy way to improve
> this.
> 
>>
>>> @@ -1686,6 +1734,8 @@ static struct option __record_options[] = {
>>> 			  "signal"),
>>> 	OPT_BOOLEAN(0, "dry-run", &dry_run,
>>> 		    "Parse options then exit"),
>>> +	OPT_BOOLEAN(0, "no-bpf-event", &record.no_bpf_event,
>>> +		    "do not record event on bpf program load/unload"),
>>
>> Why should this default to on? If I am recording FIB events, I don't care
>> about BPF events.
>>
> 
> I am OK with default off if that's the preferred way.

I think concerns with perf overhead from collecting bpf events
are unfounded.
I would prefer for this flag to be on by default.


* Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs
  2018-11-06 23:29       ` Alexei Starovoitov
@ 2018-11-06 23:36         ` David Miller
  2018-11-07  0:13           ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2018-11-06 23:36 UTC (permalink / raw)
  To: ast
  Cc: songliubraving, dsahern, netdev, linux-kernel, Kernel-team, ast,
	daniel, peterz, acme

From: Alexei Starovoitov <ast@fb.com>
Date: Tue, 6 Nov 2018 23:29:07 +0000

> I think concerns with perf overhead from collecting bpf events
> are unfounded.
> I would prefer for this flag to be on by default.

I will sit in userspace looping over bpf load/unload and see how the
person trying to monitor something else with perf feels about that.

Really, it is inappropriate to turn this on by default, I completely
agree with David Ahern.

It's hard enough, _AS IS_, for me to fight back all of the bloat that
is in perf right now and get it back to being able to handle simple
full workloads without dropping events..

Every new event type like this sets us back.

If people want to monitor new things, or have new functionality, fine.

But not by default, please.


* Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs
  2018-11-06 23:36         ` David Miller
@ 2018-11-07  0:13           ` Alexei Starovoitov
  2018-11-07  0:23             ` David Ahern
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2018-11-07  0:13 UTC (permalink / raw)
  To: David Miller
  Cc: Song Liu, dsahern, netdev, linux-kernel, Kernel Team, ast,
	daniel, peterz, acme

On 11/6/18 3:36 PM, David Miller wrote:
> From: Alexei Starovoitov <ast@fb.com>
> Date: Tue, 6 Nov 2018 23:29:07 +0000
> 
>> I think concerns with perf overhead from collecting bpf events
>> are unfounded.
>> I would prefer for this flag to be on by default.
> 
> I will sit in userspace looping over bpf load/unload and see how the
> person trying to monitor something else with perf feels about that.
> 
> Really, it is inappropriate to turn this on by default, I completely
> agree with David Ahern.
> 
> It's hard enough, _AS IS_, for me to fight back all of the bloat that
> is in perf right now and get it back to being able to handle simple
> full workloads without dropping events..

It's a separate perf thread and a separate event with its own epoll.
I don't see how it can affect the main event collection.
Let's put it this way: if it somehow does affect it, then yes,
it should not be on. If it does not, there is no downside to keeping it on.
A typical user expects to type 'perf record' and see everything that
is happening on the system. Right now, short-lived bpf programs
will not be seen. How is the user supposed to know when to use the flag?
The only option is to always pass the flag 'just in case',
which is an unnecessary burden.
The problem of dropped events is certainly valid, but it's
a separate issue. The aio work that Alexey Budankov is doing
is supposed to address that.
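
As a rough illustration of the dedicated-thread model described above,
here is a minimal, self-contained sketch; every name in it is an
assumption for illustration, not code from this patch set:

#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int done;                 /* set by the main record loop on exit */

struct bpf_event_poller {
        int fd;                         /* fd of the dedicated bpf-event ring buffer */
        void (*drain)(int fd);          /* reads and handles pending records */
};

/* Side thread: polls only the bpf-event buffer, so synthesizing
 * program details can never stall the main sample collection. */
static void *bpf_event_thread(void *arg)
{
        struct bpf_event_poller *p = arg;
        struct pollfd pfd = { .fd = p->fd, .events = POLLIN };

        while (!atomic_load(&done)) {
                /* short timeout so the thread notices 'done' promptly */
                if (poll(&pfd, 1, 100) > 0 && (pfd.revents & POLLIN))
                        p->drain(p->fd);
        }
        return NULL;
}

A caller would start it with pthread_create(&tid, NULL,
bpf_event_thread, &poller) next to the main mmap-read loop.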

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs
  2018-11-07  0:13           ` Alexei Starovoitov
@ 2018-11-07  0:23             ` David Ahern
  2018-11-07  0:26               ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2018-11-07  0:23 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller
  Cc: Song Liu, netdev, linux-kernel, Kernel Team, ast, daniel, peterz, acme

On 11/6/18 5:13 PM, Alexei Starovoitov wrote:
> On 11/6/18 3:36 PM, David Miller wrote:
>> From: Alexei Starovoitov <ast@fb.com>
>> Date: Tue, 6 Nov 2018 23:29:07 +0000
>>
>>> I think concerns with perf overhead from collecting bpf events
>>> are unfounded.
>>> I would prefer for this flag to be on by default.
>>
>> I will sit in userspace looping over bpf load/unload and see how the
>> person trying to monitor something else with perf feels about that.
>>
>> Really, it is inappropriate to turn this on by default, I completely
>> agree with David Ahern.
>>
>> It's hard enough, _AS IS_, for me to fight back all of the bloat that
>> is in perf right now and get it back to being able to handle simple
>> full workloads without dropping events..
> 
> It's a separate perf thread and a separate event with its own epoll.
> I don't see how it can affect the main event collection.
> Let's put it this way: if it somehow does affect it, then yes,
> it should not be on. If it does not, there is no downside to keeping it on.
> A typical user expects to type 'perf record' and see everything that
> is happening on the system. Right now, short-lived bpf programs
> will not be seen. How is the user supposed to know when to use the flag?

The default is profiling, where perf record collects task events and
periodic samples. So for the default record/report, the bpf
load/unload events are not relevant.

> The only option is to always pass the flag 'just in case',
> which is an unnecessary burden.

People who care about an event enable collection of that event.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs
  2018-11-07  0:23             ` David Ahern
@ 2018-11-07  0:26               ` Alexei Starovoitov
  2018-11-07  0:44                 ` David Ahern
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2018-11-07  0:26 UTC (permalink / raw)
  To: David Ahern, David Miller
  Cc: Song Liu, netdev, linux-kernel, Kernel Team, ast, daniel, peterz, acme

On 11/6/18 4:23 PM, David Ahern wrote:
> On 11/6/18 5:13 PM, Alexei Starovoitov wrote:
>> On 11/6/18 3:36 PM, David Miller wrote:
>>> From: Alexei Starovoitov <ast@fb.com>
>>> Date: Tue, 6 Nov 2018 23:29:07 +0000
>>>
>>>> I think concerns with perf overhead from collecting bpf events
>>>> are unfounded.
>>>> I would prefer for this flag to be on by default.
>>>
>>> I will sit in userspace looping over bpf load/unload and see how the
>>> person trying to monitor something else with perf feels about that.
>>>
>>> Really, it is inappropriate to turn this on by default, I completely
>>> agree with David Ahern.
>>>
>>> It's hard enough, _AS IS_, for me to fight back all of the bloat that
>>> is in perf right now and get it back to being able to handle simple
>>> full workloads without dropping events..
>>
>> It's a separate perf thread and a separate event with its own epoll.
>> I don't see how it can affect the main event collection.
>> Let's put it this way: if it somehow does affect it, then yes,
>> it should not be on. If it does not, there is no downside to keeping it on.
>> A typical user expects to type 'perf record' and see everything that
>> is happening on the system. Right now, short-lived bpf programs
>> will not be seen. How is the user supposed to know when to use the flag?
> 
> The default is profiling, where perf record collects task events and
> periodic samples. So for the default record/report, the bpf
> load/unload events are not relevant.

Exactly the opposite.
It's for the default 'perf record' collection of periodic samples.
It can be off for -e collection. That's easy.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs
  2018-11-07  0:26               ` Alexei Starovoitov
@ 2018-11-07  0:44                 ` David Ahern
  2018-11-07  1:09                   ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2018-11-07  0:44 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller
  Cc: Song Liu, netdev, linux-kernel, Kernel Team, ast, daniel, peterz, acme

On 11/6/18 5:26 PM, Alexei Starovoitov wrote:
> On 11/6/18 4:23 PM, David Ahern wrote:
>> On 11/6/18 5:13 PM, Alexei Starovoitov wrote:
>>> On 11/6/18 3:36 PM, David Miller wrote:
>>>> From: Alexei Starovoitov <ast@fb.com>
>>>> Date: Tue, 6 Nov 2018 23:29:07 +0000
>>>>
>>>>> I think concerns with perf overhead from collecting bpf events
>>>>> are unfounded.
>>>>> I would prefer for this flag to be on by default.
>>>>
>>>> I will sit in userspace looping over bpf load/unload and see how the
>>>> person trying to monitor something else with perf feels about that.
>>>>
>>>> Really, it is inappropriate to turn this on by default, I completely
>>>> agree with David Ahern.
>>>>
>>>> It's hard enough, _AS IS_, for me to fight back all of the bloat that
>>>> is in perf right now and get it back to being able to handle simple
>>>> full workloads without dropping events..
>>>
>>> It's a separate perf thread and a separate event with its own epoll.
>>> I don't see how it can affect the main event collection.
>>> Let's put it this way: if it somehow does affect it, then yes,
>>> it should not be on. If it does not, there is no downside to keeping it on.
>>> A typical user expects to type 'perf record' and see everything that
>>> is happening on the system. Right now, short-lived bpf programs
>>> will not be seen. How is the user supposed to know when to use the flag?
>>
>> The default is profiling, where perf record collects task events and
>> periodic samples. So for the default record/report, the bpf
>> load/unload events are not relevant.
> 
> Exactly the opposite.
> It's for the default 'perf record' collection of periodic samples.
> It can be off for -e collection. That's easy.
> 

So one use case is profiling bpf programs. I was also considering the
auditing discussion from some weeks ago, which I thought these events
were also targeting.

As for the overhead, I did not see a separate thread getting spun off
for the bpf events, so the events are processed inline for this RFC set.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs
  2018-11-07  0:44                 ` David Ahern
@ 2018-11-07  1:09                   ` Alexei Starovoitov
  2018-11-07 18:12                     ` David Ahern
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2018-11-07  1:09 UTC (permalink / raw)
  To: David Ahern, David Miller
  Cc: Song Liu, netdev, linux-kernel, Kernel Team, ast, daniel, peterz, acme

On 11/6/18 4:44 PM, David Ahern wrote:
> 
> So one use case is profiling bpf programs. I was also considering the
> auditing discussion from some weeks ago, which I thought these events
> were also targeting.

Yes, there should be a separate mode for the audit use case, where
only bpf events are collected. This patch set doesn't add that to
the perf user-space side.
The kernel side is common, though. It can be used for bpf load/unload
auditing alone and for the different use case in this set, which is
making bpf programs appear in a normal 'perf report'.
Please see the link in the cover letter.
We decided to abandon my old approach in favor of this one,
but the end result is the same:
from 0.81% cpu of some magic 0x00007fffa001a660
to 18.13% of bpf_prog_1accc788e7f04c38_balancer_ingres


> As for the overhead, I did not see a separate thread getting spun off
> for the bpf events, so the events are processed inline for this RFC set.

Argh, you're right. We have to fix that.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-11-06 20:52 ` [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT Song Liu
@ 2018-11-07  8:40   ` Peter Zijlstra
  2018-11-07 18:25     ` Song Liu
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2018-11-07  8:40 UTC (permalink / raw)
  To: Song Liu; +Cc: netdev, linux-kernel, kernel-team, ast, daniel, acme

On Tue, Nov 06, 2018 at 12:52:42PM -0800, Song Liu wrote:
> For better performance analysis of BPF programs, this patch introduces
> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
> load/unload information to user space.
> 
>         /*
>          * Record different types of bpf events:
>          *   enum perf_bpf_event_type {
>          *      PERF_BPF_EVENT_UNKNOWN          = 0,
>          *      PERF_BPF_EVENT_PROG_LOAD        = 1,
>          *      PERF_BPF_EVENT_PROG_UNLOAD      = 2,
>          *   };
>          *
>          * struct {
>          *      struct perf_event_header header;
>          *      u16 type;
>          *      u16 flags;
>          *      u32 id;  // prog_id or map_id
>          * };
>          */
>         PERF_RECORD_BPF_EVENT                   = 17,
> 
> PERF_RECORD_BPF_EVENT contains minimal information about the BPF program.
> Perf utility (or other user space tools) should listen to this event and
> fetch more details about the event via BPF syscalls
> (BPF_PROG_GET_FD_BY_ID, BPF_OBJ_GET_INFO_BY_FD, etc.).

Why !? You're failing to explain why it cannot provide the full
information there.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs
  2018-11-07  1:09                   ` Alexei Starovoitov
@ 2018-11-07 18:12                     ` David Ahern
  2018-11-07 18:28                       ` Song Liu
  0 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2018-11-07 18:12 UTC (permalink / raw)
  To: Alexei Starovoitov, David Ahern, David Miller
  Cc: Song Liu, netdev, linux-kernel, Kernel Team, ast, daniel, peterz, acme

On 11/6/18 6:09 PM, Alexei Starovoitov wrote:
> On 11/6/18 4:44 PM, David Ahern wrote:
>>
>> So one use case is profiling bpf programs. I was also considering the
>> auditing discussion from some weeks ago which I thought the events are
>> also targeting.
> 
> Yes, there should be a separate mode for the audit use case, where
> only bpf events are collected. This patch set doesn't add that to
> the perf user-space side.
> The kernel side is common, though. It can be used for bpf load/unload
> auditing alone and for the different use case in this set, which is
> making bpf programs appear in a normal 'perf report'.

It would be good for perf-script to work in the next version. Users
should be able to dump the events from the kernel and the synthesized
events. It should be trivial to add and would allow a review of both
perspectives -- profiling and monitoring events.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-11-07  8:40   ` Peter Zijlstra
@ 2018-11-07 18:25     ` Song Liu
  2018-11-08 15:00       ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Song Liu @ 2018-11-07 18:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Netdev, lkml, Kernel Team, ast, daniel, acme



> On Nov 7, 2018, at 12:40 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Nov 06, 2018 at 12:52:42PM -0800, Song Liu wrote:
>> For better performance analysis of BPF programs, this patch introduces
>> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
>> load/unload information to user space.
>> 
>>        /*
>>         * Record different types of bpf events:
>>         *   enum perf_bpf_event_type {
>>         *      PERF_BPF_EVENT_UNKNOWN          = 0,
>>         *      PERF_BPF_EVENT_PROG_LOAD        = 1,
>>         *      PERF_BPF_EVENT_PROG_UNLOAD      = 2,
>>         *   };
>>         *
>>         * struct {
>>         *      struct perf_event_header header;
>>         *      u16 type;
>>         *      u16 flags;
>>         *      u32 id;  // prog_id or map_id
>>         * };
>>         */
>>        PERF_RECORD_BPF_EVENT                   = 17,
>> 
>> PERF_RECORD_BPF_EVENT contains minimal information about the BPF program.
>> Perf utility (or other user space tools) should listen to this event and
>> fetch more details about the event via BPF syscalls
>> (BPF_PROG_GET_FD_BY_ID, BPF_OBJ_GET_INFO_BY_FD, etc.).
> 
> Why !? You're failing to explain why it cannot provide the full
> information there.

Aha, I missed this part. I will add the following to the next version.
Please let me know if anything is not clear.

Thanks,
Song



This design decision was made for the following reasons. First, BPF
programs can be loaded-and-jited and/or unloaded before, during, or
after a perf-record run. Once a BPF program is unloaded, it is
impossible to recover details of the program, so the information
cannot be provided through a simple key (like a build ID). Second,
BPF program annotation is under fast development; more information
will be added to bpf_prog_info in the next few releases. Including
all the information about a BPF program in the perf ring buffer would
require frequent changes to the perf ABI, which would make it very
difficult to manage compatibility of the perf utility.

For more information on this topic, please refer to the following 
discussions: 

    https://www.spinics.net/lists/netdev/msg524230.html
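
As an illustration of that flow, a consumer-side sketch could look
roughly like this (sys_bpf and fetch_prog_info are assumed helper
names, error handling is trimmed; this is not the code in this set):

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr)
{
        return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* On PERF_RECORD_BPF_EVENT with PERF_BPF_EVENT_PROG_LOAD, resolve the
 * id carried in the record into a full bpf_prog_info snapshot. */
static int fetch_prog_info(__u32 prog_id, struct bpf_prog_info *info)
{
        union bpf_attr attr;
        int fd, err;

        memset(&attr, 0, sizeof(attr));
        attr.prog_id = prog_id;
        /* fails if the program was already unloaded */
        fd = sys_bpf(BPF_PROG_GET_FD_BY_ID, &attr);
        if (fd < 0)
                return fd;

        memset(&attr, 0, sizeof(attr));
        attr.info.bpf_fd = fd;
        attr.info.info_len = sizeof(*info);
        attr.info.info = (__u64)(unsigned long)info;
        err = sys_bpf(BPF_OBJ_GET_INFO_BY_FD, &attr);
        close(fd);
        return err;
}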


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs
  2018-11-07 18:12                     ` David Ahern
@ 2018-11-07 18:28                       ` Song Liu
  0 siblings, 0 replies; 27+ messages in thread
From: Song Liu @ 2018-11-07 18:28 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexei Starovoitov, David Miller, netdev, linux-kernel,
	Kernel Team, ast, daniel, peterz, acme



> On Nov 7, 2018, at 10:12 AM, David Ahern <dsahern@gmail.com> wrote:
> 
> On 11/6/18 6:09 PM, Alexei Starovoitov wrote:
>> On 11/6/18 4:44 PM, David Ahern wrote:
>>> 
>>> So one use case is profiling bpf programs. I was also considering the
>>> auditing discussion from some weeks ago, which I thought these events
>>> were also targeting.
>> 
>> Yes, there should be a separate mode for the audit use case, where
>> only bpf events are collected. This patch set doesn't add that to
>> the perf user-space side.
>> The kernel side is common, though. It can be used for bpf load/unload
>> auditing alone and for the different use case in this set, which is
>> making bpf programs appear in a normal 'perf report'.
> 
> It would be good for perf-script to work in the next version. Users
> should be able to dump the events from the kernel and the synthesized
> events. It should be trivial to add and would allow a review of both
> perspectives -- profiling and monitoring events.

Yes, we do plan to include these information in other perf commands, 
including perf-script, perf-top, perf-annotate, etc. 

Thanks,
Song

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-11-07 18:25     ` Song Liu
@ 2018-11-08 15:00       ` Peter Zijlstra
  2018-11-08 18:04         ` Song Liu
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2018-11-08 15:00 UTC (permalink / raw)
  To: Song Liu; +Cc: Netdev, lkml, Kernel Team, ast, daniel, acme

On Wed, Nov 07, 2018 at 06:25:04PM +0000, Song Liu wrote:
> 
> 
> > On Nov 7, 2018, at 12:40 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > On Tue, Nov 06, 2018 at 12:52:42PM -0800, Song Liu wrote:
> >> For better performance analysis of BPF programs, this patch introduces
> >> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
> >> load/unload information to user space.
> >> 
> >>        /*
> >>         * Record different types of bpf events:
> >>         *   enum perf_bpf_event_type {
> >>         *      PERF_BPF_EVENT_UNKNOWN          = 0,
> >>         *      PERF_BPF_EVENT_PROG_LOAD        = 1,
> >>         *      PERF_BPF_EVENT_PROG_UNLOAD      = 2,
> >>         *   };
> >>         *
> >>         * struct {
> >>         *      struct perf_event_header header;
> >>         *      u16 type;
> >>         *      u16 flags;
> >>         *      u32 id;  // prog_id or map_id
> >>         * };
> >>         */
> >>        PERF_RECORD_BPF_EVENT                   = 17,
> >> 
> >> PERF_RECORD_BPF_EVENT contains minimal information about the BPF program.
> >> Perf utility (or other user space tools) should listen to this event and
> >> fetch more details about the event via BPF syscalls
> >> (BPF_PROG_GET_FD_BY_ID, BPF_OBJ_GET_INFO_BY_FD, etc.).
> > 
> > Why !? You're failing to explain why it cannot provide the full
> > information there.
> 
> Aha, I missed this part. I will add the following to the next version.
> Please let me know if anything is not clear.

> 
> This design decision was made for the following reasons. First, BPF
> programs can be loaded-and-jited and/or unloaded before, during, or
> after a perf-record run. Once a BPF program is unloaded, it is
> impossible to recover details of the program, so the information
> cannot be provided through a simple key (like a build ID). Second,
> BPF program annotation is under fast development; more information
> will be added to bpf_prog_info in the next few releases. Including
> all the information about a BPF program in the perf ring buffer would
> require frequent changes to the perf ABI, which would make it very
> difficult to manage compatibility of the perf utility.

So I don't agree with that reasoning. If you want symbol information
you'll just have to commit to some form of ABI. That bpf_prog_info is an
ABI too.

And relying on userspace to synchronously consume perf output to
directly call into the kernel again to get more info (through another
ABI) is a pretty terrible design.

So please try harder. NAK on this.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-11-08 15:00       ` Peter Zijlstra
@ 2018-11-08 18:04         ` Song Liu
  2018-11-08 18:26           ` David Ahern
  0 siblings, 1 reply; 27+ messages in thread
From: Song Liu @ 2018-11-08 18:04 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Netdev, lkml, Kernel Team, ast, daniel, acme

Hi Peter,

> On Nov 8, 2018, at 7:00 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Wed, Nov 07, 2018 at 06:25:04PM +0000, Song Liu wrote:
>> 
>> 
>>> On Nov 7, 2018, at 12:40 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> 
>>> On Tue, Nov 06, 2018 at 12:52:42PM -0800, Song Liu wrote:
>>>> For better performance analysis of BPF programs, this patch introduces
>>>> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
>>>> load/unload information to user space.
>>>> 
>>>>       /*
>>>>        * Record different types of bpf events:
>>>>        *   enum perf_bpf_event_type {
>>>>        *      PERF_BPF_EVENT_UNKNOWN          = 0,
>>>>        *      PERF_BPF_EVENT_PROG_LOAD        = 1,
>>>>        *      PERF_BPF_EVENT_PROG_UNLOAD      = 2,
>>>>        *   };
>>>>        *
>>>>        * struct {
>>>>        *      struct perf_event_header header;
>>>>        *      u16 type;
>>>>        *      u16 flags;
>>>>        *      u32 id;  // prog_id or map_id
>>>>        * };
>>>>        */
>>>>       PERF_RECORD_BPF_EVENT                   = 17,
>>>> 
>>>> PERF_RECORD_BPF_EVENT contains minimal information about the BPF program.
>>>> Perf utility (or other user space tools) should listen to this event and
>>>> fetch more details about the event via BPF syscalls
>>>> (BPF_PROG_GET_FD_BY_ID, BPF_OBJ_GET_INFO_BY_FD, etc.).
>>> 
>>> Why !? You're failing to explain why it cannot provide the full
>>> information there.
>> 
>> Aha, I missed this part. I will add the following to the next version.
>> Please let me know if anything is not clear.
> 
>> 
>> This design decision was made for the following reasons. First, BPF
>> programs can be loaded-and-jited and/or unloaded before, during, or
>> after a perf-record run. Once a BPF program is unloaded, it is
>> impossible to recover details of the program, so the information
>> cannot be provided through a simple key (like a build ID). Second,
>> BPF program annotation is under fast development; more information
>> will be added to bpf_prog_info in the next few releases. Including
>> all the information about a BPF program in the perf ring buffer would
>> require frequent changes to the perf ABI, which would make it very
>> difficult to manage compatibility of the perf utility.
> 
> So I don't agree with that reasoning. If you want symbol information
> you'll just have to commit to some form of ABI. That bpf_prog_info is an
> ABI too.

At the beginning of a perf-record run, perf needs to query the
bpf_prog_info of already loaded BPF programs, so we need to commit to
the bpf_prog_info ABI anyway. If we also include the full information
about a BPF program in the perf ring buffer, we will be committing to
TWO ABIs.

Also, perf-record writes the event to the perf.data file, so the data
needs to be serialized. This is implemented in patch 4/5. To include
the data in the ring buffer, we would need another piece of code in
the kernel to do the same serialization work.
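
To illustrate the duplication, pushing the full details through the
ring buffer would need a second serialized form, along the lines of
this purely hypothetical layout (not the actual patch):

#include <linux/types.h>
#include <linux/perf_event.h>

struct hypothetical_bpf_prog_info_record {
        struct perf_event_header header;
        __u32 info_len;         /* bytes of serialized bpf_prog_info below */
        __u32 reserved;
        char  info[];           /* version-dependent, variable-length payload */
};

Every field later added to bpf_prog_info would have to be reflected in
how this payload is packed and parsed.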

On the other hand, processing BPF load/unload events synchronously
should not introduce too much overhead for meaningful use cases. If
many BPF progs are being loaded/unloaded within a short period of
time, that is not the steady state that profiling work cares about.

Would this resolve your concerns?

Thanks,
Song


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-11-08 18:04         ` Song Liu
@ 2018-11-08 18:26           ` David Ahern
  2018-11-08 18:49             ` Song Liu
  0 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2018-11-08 18:26 UTC (permalink / raw)
  To: Song Liu, Peter Zijlstra; +Cc: Netdev, lkml, Kernel Team, ast, daniel, acme

On 11/8/18 11:04 AM, Song Liu wrote:
> On the other hand, processing BPF load/unload events synchronously
> should not introduce too much overhead for meaningful use cases. If
> many BPF progs are being loaded/unloaded within a short period of
> time, that is not the steady state that profiling work cares about.

But profiling is not the only use case, and perf-record is common to
those other use cases.

I think that answers why your RFC set does not fork a thread for the
bpf events. You are focused on profiling, either of already loaded
programs or of a small number of programs loaded by a specific
workload started by perf.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-11-08 18:26           ` David Ahern
@ 2018-11-08 18:49             ` Song Liu
  2018-11-09 17:08               ` David Ahern
  0 siblings, 1 reply; 27+ messages in thread
From: Song Liu @ 2018-11-08 18:49 UTC (permalink / raw)
  To: David Ahern; +Cc: Peter Zijlstra, Netdev, lkml, Kernel Team, ast, daniel, acme

Hi David,

> On Nov 8, 2018, at 10:26 AM, David Ahern <dsahern@gmail.com> wrote:
> 
> On 11/8/18 11:04 AM, Song Liu wrote:
>> On the other hand, processing BPF load/unload events synchronously
>> should not introduce too much overhead for meaningful use cases. If
>> many BPF progs are being loaded/unloaded within a short period of
>> time, that is not the steady state that profiling work cares about.
> 
> But profiling is not the only use case, and perf-record is common to
> those other use cases.
> 
> I think that answers why your RFC set does not fork a thread for the
> bpf events. You are focused on profiling, either of already loaded
> programs or of a small number of programs loaded by a specific
> workload started by perf.

We can certainly fork a thread for the BPF events. But I guess that's
not Peter's main concern here...

Could you please point me to more information about the use cases you worry 
about? I am more than happy to optimize the logic for those use cases. 

Thanks,
Song


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-11-08 18:49             ` Song Liu
@ 2018-11-09 17:08               ` David Ahern
  2018-11-09 18:49                 ` Song Liu
  0 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2018-11-09 17:08 UTC (permalink / raw)
  To: Song Liu; +Cc: Peter Zijlstra, Netdev, lkml, Kernel Team, ast, daniel, acme

On 11/8/18 11:49 AM, Song Liu wrote:
> Could you please point me to more information about the use cases you worry 
> about? I am more than happy to optimize the logic for those use cases. 

Think of bpf load and unload as just another tracepoint to throw into
a set of events that are monitored. As mentioned before, auditing the
loads and unloads is one example.

And that brings up another comment: why are you adding a PERF_RECORD_*
rather than a tracepoint? From what I can see, the PERF_RECORD_BPF_EVENT
definition does not include who is loading / unloading a bpf
program. That is important information as well.
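
For illustration, the kind of tracepoint being suggested might look
roughly like the hypothetical sketch below (the usual TRACE_SYSTEM /
CREATE_TRACE_POINTS boilerplate is omitted, and all names are assumed):

#include <linux/tracepoint.h>

TRACE_EVENT(bpf_prog_load_unload,

        TP_PROTO(u32 prog_id, u16 op, uid_t uid, pid_t pid),

        TP_ARGS(prog_id, op, uid, pid),

        TP_STRUCT__entry(
                __field(u32,   prog_id)
                __field(u16,   op)      /* load vs. unload */
                __field(uid_t, uid)     /* who is loading/unloading */
                __field(pid_t, pid)
        ),

        TP_fast_assign(
                __entry->prog_id = prog_id;
                __entry->op      = op;
                __entry->uid     = uid;
                __entry->pid     = pid;
        ),

        TP_printk("prog_id=%u op=%u uid=%u pid=%d",
                  __entry->prog_id, __entry->op,
                  __entry->uid, __entry->pid)
);

An audit consumer could then enable just this event without any of the
perf sampling machinery.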

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-11-09 17:08               ` David Ahern
@ 2018-11-09 18:49                 ` Song Liu
  2018-11-09 19:14                   ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Song Liu @ 2018-11-09 18:49 UTC (permalink / raw)
  To: David Ahern; +Cc: Peter Zijlstra, Netdev, lkml, Kernel Team, ast, daniel, acme



> On Nov 9, 2018, at 9:08 AM, David Ahern <dsahern@gmail.com> wrote:
> 
> On 11/8/18 11:49 AM, Song Liu wrote:
>> Could you please point me to more information about the use cases you worry 
>> about? I am more than happy to optimize the logic for those use cases. 
> 
> Think of bpf load and unload as just another tracepoint to throw into
> a set of events that are monitored. As mentioned before, auditing the
> loads and unloads is one example.

For the tracepoint approach, we would need similar synchronous logic in
perf to process BPF load/unload events. If we agree that this is the
right direction, I will rework the set to use tracepoints.

> 
> And that brings up another comment: why are you adding a PERF_RECORD_*
> rather than a tracepoint? From what I can see, the PERF_RECORD_BPF_EVENT
> definition does not include who is loading / unloading a bpf
> program. That is important information as well.

bpf_prog_info has "__u32 created_by_uid;" in it, so that is not really
needed in the current version.

Thanks,
Song

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-11-09 18:49                 ` Song Liu
@ 2018-11-09 19:14                   ` Alexei Starovoitov
  0 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2018-11-09 19:14 UTC (permalink / raw)
  To: Song Liu, David Ahern
  Cc: Peter Zijlstra, Netdev, lkml, Kernel Team, ast, daniel, acme

On 11/9/18 10:49 AM, Song Liu wrote:
> 
> 
>> On Nov 9, 2018, at 9:08 AM, David Ahern <dsahern@gmail.com> wrote:
>>
>> On 11/8/18 11:49 AM, Song Liu wrote:
>>> Could you please point me to more information about the use cases you worry
>>> about? I am more than happy to optimize the logic for those use cases.
>>
>> Think of bpf load and unload as just another tracepoint to throw into
>> a set of events that are monitored. As mentioned before, auditing the
>> loads and unloads is one example.
> 
> For the tracepoint approach, we would need similar synchronous logic in
> perf to process BPF load/unload events. If we agree that this is the
> right direction, I will rework the set to use tracepoints.

We had tracepoints in the bpf core. We removed them.
I don't think we'll be going back.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2018-11-09 19:15 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-06 20:52 [RFC perf,bpf 0/5] reveal invisible bpf programs Song Liu
2018-11-06 20:52 ` [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT Song Liu
2018-11-07  8:40   ` Peter Zijlstra
2018-11-07 18:25     ` Song Liu
2018-11-08 15:00       ` Peter Zijlstra
2018-11-08 18:04         ` Song Liu
2018-11-08 18:26           ` David Ahern
2018-11-08 18:49             ` Song Liu
2018-11-09 17:08               ` David Ahern
2018-11-09 18:49                 ` Song Liu
2018-11-09 19:14                   ` Alexei Starovoitov
2018-11-06 20:52 ` [RFC perf,bpf 2/5] perf: sync tools/include/uapi/linux/perf_event.h Song Liu
2018-11-06 20:52 ` [RFC perf,bpf 3/5] perf util: basic handling of PERF_RECORD_BPF_EVENT Song Liu
2018-11-06 20:52 ` [RFC perf,bpf 4/5] perf util: introduce bpf_prog_info_event Song Liu
2018-11-06 21:11   ` Alexei Starovoitov
2018-11-06 20:52 ` [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs Song Liu
2018-11-06 21:54   ` David Ahern
2018-11-06 23:17     ` Song Liu
2018-11-06 23:29       ` Alexei Starovoitov
2018-11-06 23:36         ` David Miller
2018-11-07  0:13           ` Alexei Starovoitov
2018-11-07  0:23             ` David Ahern
2018-11-07  0:26               ` Alexei Starovoitov
2018-11-07  0:44                 ` David Ahern
2018-11-07  1:09                   ` Alexei Starovoitov
2018-11-07 18:12                     ` David Ahern
2018-11-07 18:28                       ` Song Liu
