linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 perf, bpf-next 0/4] reveal invisible bpf programs
@ 2018-12-11 23:33 Song Liu
  2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT Song Liu
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Song Liu @ 2018-12-11 23:33 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: Song Liu, ast, daniel, kernel-team, peterz, acme

This set captures symbols for all BPF programs loaded/unloaded
before/during/after a perf-record run, via a new mmap-like event,
PERF_RECORD_BPF_EVENT.

PERF_RECORD_BPF_EVENT carries key information about a BPF program load
or unload. It is sent through the perf ring buffer and stored in perf.data.
Without this set, perf-report cannot recover symbols of BPF programs
once the programs are unloaded.

Besides basic mmap-like information (name, addr, size),
PERF_RECORD_BPF_EVENT includes the BPF program id and tag. This
information can be used for detailed performance analysis such as
annotation.

This follows up on Alexei's earlier effort [2] to expose bpf programs
via mmap events.

Thanks,
Song


Changes v2 -> PATCH v3:
1. Rebase on bpf-next tree, and on top of BPF sub program tag patches [1]
   for latest information in bpf_prog_info.
2. Complete handling and synthesizing PERF_RECORD_BPF_EVENT in perf.

Changes v1 -> PATCH v2:
1. Only 3 of the 5 patches in v1, to focus on ABI first;
2. Generate PERF_RECORD_BPF_EVENT per bpf sub program instead of per prog;
3. Modify PERF_RECORD_BPF_EVENT with more details (addr, len, name),
   so that it can be used for basic profiling without calling sys_bpf.

Changes RFC -> PATCH v1:
1. In perf-record, poll vip events in a separate thread;
2. Add tag to bpf prog name;
3. Small refactorings.

[1] https://patchwork.ozlabs.org/project/netdev/list/?series=81037
[2] https://www.spinics.net/lists/netdev/msg524232.html

Song Liu (4):
  perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  perf: sync tools/include/uapi/linux/perf_event.h
  perf util: handle PERF_RECORD_BPF_EVENT
  perf tools: synthesize bpf event for loaded BPF programs

 include/linux/filter.h                |   7 +
 include/linux/perf_event.h            |  10 +
 include/uapi/linux/perf_event.h       |  35 +++-
 kernel/bpf/core.c                     |   2 +-
 kernel/bpf/syscall.c                  |   3 +
 kernel/events/core.c                  | 123 ++++++++++++-
 tools/include/uapi/linux/perf_event.h |  35 +++-
 tools/perf/builtin-record.c           |   6 +
 tools/perf/util/Build                 |   2 +
 tools/perf/util/bpf-event.c           | 251 ++++++++++++++++++++++++++
 tools/perf/util/bpf-event.h           |  15 ++
 tools/perf/util/event.c               |  23 +++
 tools/perf/util/event.h               |  26 +++
 tools/perf/util/evsel.c               |   9 +
 tools/perf/util/evsel.h               |   1 +
 tools/perf/util/machine.c             |   3 +
 tools/perf/util/session.c             |   4 +
 tools/perf/util/tool.h                |   4 +-
 18 files changed, 554 insertions(+), 5 deletions(-)
 create mode 100644 tools/perf/util/bpf-event.c
 create mode 100644 tools/perf/util/bpf-event.h

--
2.17.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-11 23:33 [PATCH v3 perf, bpf-next 0/4] reveal invisible bpf programs Song Liu
@ 2018-12-11 23:33 ` Song Liu
  2018-12-12 12:06   ` Peter Zijlstra
  2018-12-12 13:15   ` Peter Zijlstra
  2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 2/4] perf: sync tools/include/uapi/linux/perf_event.h Song Liu
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 22+ messages in thread
From: Song Liu @ 2018-12-11 23:33 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: Song Liu, ast, daniel, kernel-team, peterz, acme

For better performance analysis of BPF programs, this patch introduces
PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
load/unload information to user space.

Each BPF program may contain up to BPF_MAX_SUBPROGS (256) sub programs.
The following example shows kernel symbols for a BPF program with 7
sub programs:

    ffffffffa0257cf9 t bpf_prog_b07ccb89267cf242_F
    ffffffffa02592e1 t bpf_prog_2dcecc18072623fc_F
    ffffffffa025b0e9 t bpf_prog_bb7a405ebaec5d5c_F
    ffffffffa025dd2c t bpf_prog_a7540d4a39ec1fc7_F
    ffffffffa025fcca t bpf_prog_05762d4ade0e3737_F
    ffffffffa026108f t bpf_prog_db4bd11e35df90d4_F
    ffffffffa0263f00 t bpf_prog_89d64e4abf0f0126_F
    ffffffffa0257cf9 t bpf_prog_ae31629322c4b018__dummy_tracepoi

Note that these sub programs are not allocated in contiguous memory
ranges. Instead, each of them occupies separate page(s). The starting
addresses of these sub programs are randomized within the page(s) for
security reasons.

The following data structure is used for PERF_RECORD_BPF_EVENT. It is
generated for each _sub program_:

        /*
         * Record different types of bpf events:
         *  enum perf_bpf_event_type {
         *     PERF_BPF_EVENT_UNKNOWN           = 0,
         *     PERF_BPF_EVENT_PROG_LOAD         = 1,
         *     PERF_BPF_EVENT_PROG_UNLOAD       = 2,
         *  };
         *
         * struct {
         *      struct perf_event_header header;
         *      u32                             type;
         *      u32                             flags;
         *      u32                             id; // prog_id or other id
         *      u32                             sub_id; // subprog id
         *
         *      // for bpf_prog types, bpf prog or subprog
         *      u8                              tag[BPF_TAG_SIZE];
         *      u64                             addr;
         *      u64                             len;
         *      char                            name[];
         *      struct sample_id                sample_id;
         * };
         */

This data is designed for different use cases:

  1. For simple perf profiling, addr, len, and name[] work similarly to
     PERF_RECORD_MMAP. These raw records are stored in the perf.data file.
  2. For perf annotation and other cases that need more detail about the
     BPF program, id and sub_id are used to extract detailed information
     about the prog through sys_bpf(BPF_OBJ_GET_INFO_BY_FD). User space
     tools are responsible for saving the detailed information properly,
     as this information will not be available after the bpf program is
     unloaded.

This follows the existing perf model of keeping ordered records with
enough information for profiling, while keeping keys for reliably
finding extra, more voluminous information for further analysis, such
as raw jitted binaries augmented with line numbers that can be used for
disassembly, annotation, etc.

Currently, PERF_RECORD_BPF_EVENT supports only two events:
PERF_BPF_EVENT_PROG_LOAD and PERF_BPF_EVENT_PROG_UNLOAD. But it can be
easily extended to support more events.

Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 include/linux/filter.h          |   7 ++
 include/linux/perf_event.h      |  10 +++
 include/uapi/linux/perf_event.h |  35 ++++++++-
 kernel/bpf/core.c               |   2 +-
 kernel/bpf/syscall.c            |   3 +
 kernel/events/core.c            | 123 +++++++++++++++++++++++++++++++-
 6 files changed, 177 insertions(+), 3 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 537e9e7c6e6f..45d23560f90b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -955,6 +955,7 @@ bpf_address_lookup(unsigned long addr, unsigned long *size,
 
 void bpf_prog_kallsyms_add(struct bpf_prog *fp);
 void bpf_prog_kallsyms_del(struct bpf_prog *fp);
+void bpf_get_prog_name(const struct bpf_prog *prog, char *sym);
 
 #else /* CONFIG_BPF_JIT */
 
@@ -1010,6 +1011,12 @@ static inline void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 static inline void bpf_prog_kallsyms_del(struct bpf_prog *fp)
 {
 }
+
+static inline void bpf_get_prog_name(const struct bpf_prog *prog, char *sym)
+{
+	sym[0] = '\0';
+}
+
 #endif /* CONFIG_BPF_JIT */
 
 void bpf_prog_kallsyms_del_subprogs(struct bpf_prog *fp);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 53c500f0ca79..02217bab64d0 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1113,6 +1113,12 @@ static inline void perf_event_task_sched_out(struct task_struct *prev,
 }
 
 extern void perf_event_mmap(struct vm_area_struct *vma);
+
+#ifdef CONFIG_BPF_SYSCALL
+extern void perf_event_bpf_event_prog(enum perf_bpf_event_type type,
+				      struct bpf_prog *prog);
+#endif
+
 extern struct perf_guest_info_callbacks *perf_guest_cbs;
 extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
 extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
@@ -1333,6 +1339,10 @@ static inline int perf_unregister_guest_info_callbacks
 (struct perf_guest_info_callbacks *callbacks)				{ return 0; }
 
 static inline void perf_event_mmap(struct vm_area_struct *vma)		{ }
+#ifdef CONFIG_BPF_SYSCALL
+static inline void perf_event_bpf_event_prog(enum perf_bpf_event_type type,
+					     struct bpf_prog *prog)	{ }
+#endif
 static inline void perf_event_exec(void)				{ }
 static inline void perf_event_comm(struct task_struct *tsk, bool exec)	{ }
 static inline void perf_event_namespaces(struct task_struct *tsk)	{ }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9de8780ac8d9..0ae3dae55fa8 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -372,7 +372,8 @@ struct perf_event_attr {
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
 				namespaces     :  1, /* include namespaces data */
-				__reserved_1   : 35;
+				bpf_event      :  1, /* include bpf events */
+				__reserved_1   : 34;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -965,9 +966,41 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_NAMESPACES			= 16,
 
+	/*
+	 * Record different types of bpf events:
+	 *  enum perf_bpf_event_type {
+	 *     PERF_BPF_EVENT_UNKNOWN		= 0,
+	 *     PERF_BPF_EVENT_PROG_LOAD		= 1,
+	 *     PERF_BPF_EVENT_PROG_UNLOAD	= 2,
+	 *  };
+	 *
+	 * struct {
+	 *	struct perf_event_header header;
+	 *	u32				type;
+	 *	u32				flags;
+	 *	u32				id; // prog_id or other id
+	 *	u32				sub_id; // subprog id
+	 *
+	 *	// for bpf_prog types, bpf prog or subprog
+	 *	u8				tag[BPF_TAG_SIZE];
+	 *	u64				addr;
+	 *	u64				len;
+	 *	char				name[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_BPF_EVENT			= 17,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
+enum perf_bpf_event_type {
+	PERF_BPF_EVENT_UNKNOWN		= 0,
+	PERF_BPF_EVENT_PROG_LOAD	= 1,
+	PERF_BPF_EVENT_PROG_UNLOAD	= 2,
+	PERF_BPF_EVENT_MAX,		/* non-ABI */
+};
+
 #define PERF_MAX_STACK_DEPTH		127
 #define PERF_MAX_CONTEXTS_PER_STACK	  8
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 5cdd8da0e7f2..2a8364294f11 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -496,7 +496,7 @@ bpf_get_prog_addr_region(const struct bpf_prog *prog,
 	*symbol_end   = addr + hdr->pages * PAGE_SIZE;
 }
 
-static void bpf_get_prog_name(const struct bpf_prog *prog, char *sym)
+void bpf_get_prog_name(const struct bpf_prog *prog, char *sym)
 {
 	const char *end = sym + KSYM_NAME_LEN;
 	const struct btf_type *type;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 496c2f695f2d..a97a3698b7a4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1210,6 +1210,8 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
 {
 	if (atomic_dec_and_test(&prog->aux->refcnt)) {
+		perf_event_bpf_event_prog(PERF_BPF_EVENT_PROG_UNLOAD, prog);
+
 		/* bpf_prog_free_id() must be called first */
 		bpf_prog_free_id(prog, do_idr_lock);
 		bpf_prog_kallsyms_del_all(prog);
@@ -1558,6 +1560,7 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	}
 
 	bpf_prog_kallsyms_add(prog);
+	perf_event_bpf_event_prog(PERF_BPF_EVENT_PROG_LOAD, prog);
 	return err;
 
 free_used_maps:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 84530ab358c3..4b2d1a58d3fe 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -385,6 +385,7 @@ static atomic_t nr_namespaces_events __read_mostly;
 static atomic_t nr_task_events __read_mostly;
 static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
+static atomic_t nr_bpf_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -4235,7 +4236,7 @@ static bool is_sb_event(struct perf_event *event)
 
 	if (attr->mmap || attr->mmap_data || attr->mmap2 ||
 	    attr->comm || attr->comm_exec ||
-	    attr->task ||
+	    attr->task || attr->bpf_event ||
 	    attr->context_switch)
 		return true;
 	return false;
@@ -4305,6 +4306,8 @@ static void unaccount_event(struct perf_event *event)
 		dec = true;
 	if (has_branch_stack(event))
 		dec = true;
+	if (event->attr.bpf_event)
+		atomic_dec(&nr_bpf_events);
 
 	if (dec) {
 		if (!atomic_add_unless(&perf_sched_count, -1, 1))
@@ -7650,6 +7653,122 @@ static void perf_log_throttle(struct perf_event *event, int enable)
 	perf_output_end(&handle);
 }
 
+/*
+ * bpf load/unload tracking
+ */
+
+struct perf_bpf_event {
+	struct bpf_prog *prog;
+
+	struct {
+		struct perf_event_header        header;
+		u32 type;
+		u32 flags;
+		u32 id;
+		u32 sub_id;
+		u8 tag[BPF_TAG_SIZE];
+		u64 addr;
+		u64 len;
+	} event_id;
+};
+
+static int perf_event_bpf_match(struct perf_event *event)
+{
+	return event->attr.bpf_event;
+}
+
+static void perf_event_bpf_output(struct perf_event *event,
+				   void *data)
+{
+	struct perf_bpf_event *bpf_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	char name[KSYM_NAME_LEN];
+	int name_len;
+	int ret;
+
+	if (!perf_event_bpf_match(event))
+		return;
+
+	/* get prog name and round up to 64 bit aligned */
+	bpf_get_prog_name(bpf_event->prog, name);
+	name_len = strlen(name) + 1;
+	while (!IS_ALIGNED(name_len, sizeof(u64)))
+		name[name_len++] = '\0';
+	bpf_event->event_id.header.size += name_len;
+
+	perf_event_header__init_id(&bpf_event->event_id.header, &sample, event);
+	ret = perf_output_begin(&handle, event,
+				bpf_event->event_id.header.size);
+	if (ret)
+		return;
+
+	perf_output_put(&handle, bpf_event->event_id);
+
+	__output_copy(&handle, name, name_len);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+}
+
+static void perf_event_bpf(struct perf_bpf_event *bpf_event)
+{
+	perf_iterate_sb(perf_event_bpf_output,
+		       bpf_event,
+		       NULL);
+}
+
+static void perf_event_bpf_event_subprog(
+	enum perf_bpf_event_type type,
+	struct bpf_prog *prog, u32 id, u32 sub_id)
+{
+	struct perf_bpf_event bpf_event = (struct perf_bpf_event){
+		.prog = prog,
+		.event_id = {
+			.header = {
+				.type = PERF_RECORD_BPF_EVENT,
+				.size = sizeof(bpf_event.event_id),
+			},
+			.type = type,
+			/* .flags = 0 */
+			.id = id,
+			.sub_id = sub_id,
+			.addr = (u64)(unsigned long)prog->bpf_func,
+			.len = prog->jited_len,
+		},
+	};
+
+	memcpy(bpf_event.event_id.tag, prog->tag, BPF_TAG_SIZE);
+	perf_event_bpf(&bpf_event);
+}
+
+/*
+ * This is call per bpf_prog. In case of multiple sub programs,
+ * this function calls perf_event_bpf_event_subprog() multiple times
+ */
+void perf_event_bpf_event_prog(enum perf_bpf_event_type type,
+			       struct bpf_prog *prog)
+{
+	if (!atomic_read(&nr_bpf_events))
+		return;
+
+	if (type != PERF_BPF_EVENT_PROG_LOAD &&
+	    type != PERF_BPF_EVENT_PROG_UNLOAD)
+		return;
+
+	if (prog->aux->func_cnt == 0) {
+		perf_event_bpf_event_subprog(type, prog,
+					     prog->aux->id, 0);
+	} else {
+		int i;
+
+		for (i = 0; i < prog->aux->func_cnt; i++)
+			perf_event_bpf_event_subprog(type, prog->aux->func[i],
+						     prog->aux->id, i);
+	}
+}
+
 void perf_event_itrace_started(struct perf_event *event)
 {
 	event->attach_state |= PERF_ATTACH_ITRACE;
@@ -9900,6 +10019,8 @@ static void account_event(struct perf_event *event)
 		inc = true;
 	if (is_cgroup_event(event))
 		inc = true;
+	if (event->attr.bpf_event)
+		atomic_inc(&nr_bpf_events);
 
 	if (inc) {
 		/*
-- 
2.17.1



* [PATCH v3 perf, bpf-next 2/4] perf: sync tools/include/uapi/linux/perf_event.h
  2018-12-11 23:33 [PATCH v3 perf, bpf-next 0/4] reveal invisible bpf programs Song Liu
  2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT Song Liu
@ 2018-12-11 23:33 ` Song Liu
  2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 3/4] perf util: handle PERF_RECORD_BPF_EVENT Song Liu
  2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 4/4] perf tools: synthesize bpf event for loaded BPF programs Song Liu
  3 siblings, 0 replies; 22+ messages in thread
From: Song Liu @ 2018-12-11 23:33 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: Song Liu, ast, daniel, kernel-team, peterz, acme

Sync changes for PERF_RECORD_BPF_EVENT.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 tools/include/uapi/linux/perf_event.h | 35 ++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 9de8780ac8d9..0ae3dae55fa8 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -372,7 +372,8 @@ struct perf_event_attr {
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
 				namespaces     :  1, /* include namespaces data */
-				__reserved_1   : 35;
+				bpf_event      :  1, /* include bpf events */
+				__reserved_1   : 34;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -965,9 +966,41 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_NAMESPACES			= 16,
 
+	/*
+	 * Record different types of bpf events:
+	 *  enum perf_bpf_event_type {
+	 *     PERF_BPF_EVENT_UNKNOWN		= 0,
+	 *     PERF_BPF_EVENT_PROG_LOAD		= 1,
+	 *     PERF_BPF_EVENT_PROG_UNLOAD	= 2,
+	 *  };
+	 *
+	 * struct {
+	 *	struct perf_event_header header;
+	 *	u32				type;
+	 *	u32				flags;
+	 *	u32				id; // prog_id or other id
+	 *	u32				sub_id; // subprog id
+	 *
+	 *	// for bpf_prog types, bpf prog or subprog
+	 *	u8				tag[BPF_TAG_SIZE];
+	 *	u64				addr;
+	 *	u64				len;
+	 *	char				name[];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_BPF_EVENT			= 17,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
+enum perf_bpf_event_type {
+	PERF_BPF_EVENT_UNKNOWN		= 0,
+	PERF_BPF_EVENT_PROG_LOAD	= 1,
+	PERF_BPF_EVENT_PROG_UNLOAD	= 2,
+	PERF_BPF_EVENT_MAX,		/* non-ABI */
+};
+
 #define PERF_MAX_STACK_DEPTH		127
 #define PERF_MAX_CONTEXTS_PER_STACK	  8
 
-- 
2.17.1



* [PATCH v3 perf, bpf-next 3/4] perf util: handle PERF_RECORD_BPF_EVENT
  2018-12-11 23:33 [PATCH v3 perf, bpf-next 0/4] reveal invisible bpf programs Song Liu
  2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT Song Liu
  2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 2/4] perf: sync tools/include/uapi/linux/perf_event.h Song Liu
@ 2018-12-11 23:33 ` Song Liu
  2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 4/4] perf tools: synthesize bpf event for loaded BPF programs Song Liu
  3 siblings, 0 replies; 22+ messages in thread
From: Song Liu @ 2018-12-11 23:33 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: Song Liu, ast, daniel, kernel-team, peterz, acme

This patch handles PERF_RECORD_BPF_EVENT in perf record/report.
Specifically, a map and symbol are created for PERF_BPF_EVENT_PROG_LOAD,
and removed for PERF_BPF_EVENT_PROG_UNLOAD.

This patch also sets perf_event_attr.bpf_event properly. The flag is
on by default.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 tools/perf/util/Build       |  2 ++
 tools/perf/util/bpf-event.c | 66 +++++++++++++++++++++++++++++++++++++
 tools/perf/util/bpf-event.h | 11 +++++++
 tools/perf/util/event.c     | 23 +++++++++++++
 tools/perf/util/event.h     | 26 +++++++++++++++
 tools/perf/util/evsel.c     |  9 +++++
 tools/perf/util/evsel.h     |  1 +
 tools/perf/util/machine.c   |  3 ++
 tools/perf/util/session.c   |  4 +++
 tools/perf/util/tool.h      |  4 ++-
 10 files changed, 148 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/bpf-event.c
 create mode 100644 tools/perf/util/bpf-event.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index b7bf201fe8a8..99455f30c264 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -151,6 +151,8 @@ endif
 
 libperf-y += perf-hooks.o
 
+libperf-$(CONFIG_LIBBPF) += bpf-event.o
+
 libperf-$(CONFIG_CXX) += c++/
 
 CFLAGS_config.o   += -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c
new file mode 100644
index 000000000000..2b0a45d8f1d7
--- /dev/null
+++ b/tools/perf/util/bpf-event.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <errno.h>
+#include <bpf/bpf.h>
+#include "bpf-event.h"
+#include "debug.h"
+#include "symbol.h"
+
+static int machine__process_bpf_prog_load(
+	struct machine *machine,
+	union perf_event *event,
+	struct perf_sample *sample __maybe_unused)
+{
+	struct symbol *sym;
+	struct map *map;
+
+	map = map_groups__find(&machine->kmaps, event->bpf_event.addr);
+	if (!map) {
+		map = dso__new_map("bpf_prog");
+		if (!map)
+			return -ENOMEM;
+
+		map->start = event->bpf_event.addr;
+		map->pgoff = map->start;
+		map->end = map->start + event->bpf_event.len;
+		map_groups__insert(&machine->kmaps, map);
+	}
+
+	sym = symbol__new(event->bpf_event.addr, event->bpf_event.len,
+			  0, 0, event->bpf_event.name);
+	if (!sym)
+		return -ENOMEM;
+	dso__insert_symbol(map->dso, sym);
+	return 0;
+}
+
+static int machine__process_bpf_prog_unload(
+	struct machine *machine,
+	union perf_event *event,
+	struct perf_sample *sample __maybe_unused)
+{
+	struct map *map;
+
+	map = map_groups__find(&machine->kmaps, event->bpf_event.addr);
+	if (map)
+		map_groups__remove(&machine->kmaps, map);
+
+	return 0;
+}
+
+int machine__process_bpf_event(struct machine *machine __maybe_unused,
+			       union perf_event *event,
+			       struct perf_sample *sample)
+{
+	if (dump_trace)
+		perf_event__fprintf_bpf_event(event, stdout);
+
+	switch (event->bpf_event.type) {
+	case PERF_BPF_EVENT_PROG_LOAD:
+		return machine__process_bpf_prog_load(machine, event, sample);
+	case PERF_BPF_EVENT_PROG_UNLOAD:
+		return machine__process_bpf_prog_unload(machine, event, sample);
+	default:
+		break;
+	}
+	return 0;
+}
diff --git a/tools/perf/util/bpf-event.h b/tools/perf/util/bpf-event.h
new file mode 100644
index 000000000000..d5ca355dd298
--- /dev/null
+++ b/tools/perf/util/bpf-event.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_BPF_EVENT_H
+#define __PERF_BPF_EVENT_H
+
+#include "machine.h"
+
+int machine__process_bpf_event(struct machine *machine,
+			       union perf_event *event,
+			       struct perf_sample *sample);
+
+#endif
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index e9c108a6b1c3..de8bfbfa6860 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -24,6 +24,8 @@
 #include "symbol/kallsyms.h"
 #include "asm/bug.h"
 #include "stat.h"
+#include "session.h"
+#include "bpf-event.h"
 
 static const char *perf_event__names[] = {
 	[0]					= "TOTAL",
@@ -43,6 +45,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_SWITCH]			= "SWITCH",
 	[PERF_RECORD_SWITCH_CPU_WIDE]		= "SWITCH_CPU_WIDE",
 	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
+	[PERF_RECORD_BPF_EVENT]			= "BPF_EVENT",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -1335,6 +1338,14 @@ int perf_event__process_switch(struct perf_tool *tool __maybe_unused,
 	return machine__process_switch_event(machine, event);
 }
 
+int perf_event__process_bpf_event(struct perf_tool *tool __maybe_unused,
+				  union perf_event *event,
+				  struct perf_sample *sample __maybe_unused,
+				  struct machine *machine)
+{
+	return machine__process_bpf_event(machine, event, sample);
+}
+
 size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
 {
 	return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64 "]: %c %s\n",
@@ -1467,6 +1478,15 @@ static size_t perf_event__fprintf_lost(union perf_event *event, FILE *fp)
 	return fprintf(fp, " lost %" PRIu64 "\n", event->lost.lost);
 }
 
+size_t perf_event__fprintf_bpf_event(union perf_event *event, FILE *fp)
+{
+	return fprintf(fp, " bpf event with type %u, flags %u, id %u sub_id %u addr %lx len %lu name %s\n",
+		       event->bpf_event.type, event->bpf_event.flags,
+		       event->bpf_event.id, event->bpf_event.sub_id,
+		       event->bpf_event.addr, event->bpf_event.len,
+		       event->bpf_event.name);
+}
+
 size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 {
 	size_t ret = fprintf(fp, "PERF_RECORD_%s",
@@ -1502,6 +1522,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_LOST:
 		ret += perf_event__fprintf_lost(event, fp);
 		break;
+	case PERF_RECORD_BPF_EVENT:
+		ret += perf_event__fprintf_bpf_event(event, fp);
+		break;
 	default:
 		ret += fprintf(fp, "\n");
 	}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index bfa60bcafbde..cea19b1b128a 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -5,6 +5,7 @@
 #include <limits.h>
 #include <stdio.h>
 #include <linux/kernel.h>
+#include <linux/bpf.h>
 
 #include "../perf.h"
 #include "build-id.h"
@@ -84,6 +85,25 @@ struct throttle_event {
 	u64 stream_id;
 };
 
+#ifndef KSYM_NAME_LEN
+#define KSYM_NAME_LEN 256
+#endif
+
+/* Record different types of bpf events, see enum perf_bpf_event_type */
+struct bpf_event {
+	struct perf_event_header header;
+	u32 type;
+	u32 flags;
+	u32 id;
+	u32 sub_id;
+
+	/* for bpf_prog types */
+	u8 tag[BPF_TAG_SIZE];  // prog tag
+	u64 addr;
+	u64 len;
+	char name[KSYM_NAME_LEN];
+};
+
 #define PERF_SAMPLE_MASK				\
 	(PERF_SAMPLE_IP | PERF_SAMPLE_TID |		\
 	 PERF_SAMPLE_TIME | PERF_SAMPLE_ADDR |		\
@@ -651,6 +671,7 @@ union perf_event {
 	struct stat_round_event		stat_round;
 	struct time_conv_event		time_conv;
 	struct feature_event		feat;
+	struct bpf_event		bpf_event;
 };
 
 void perf_event__print_totals(void);
@@ -750,6 +771,10 @@ int perf_event__process_exit(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
 			     struct machine *machine);
+int perf_event__process_bpf_event(struct perf_tool *tool,
+				  union perf_event *event,
+				  struct perf_sample *sample,
+				  struct machine *machine);
 int perf_tool__process_synth_event(struct perf_tool *tool,
 				   union perf_event *event,
 				   struct machine *machine,
@@ -814,6 +839,7 @@ size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_bpf_event(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
 
 int kallsyms__get_function_start(const char *kallsyms_filename,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index dbc0466db368..6e13e28c0bf0 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1035,6 +1035,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
 	attr->mmap  = track;
 	attr->mmap2 = track && !perf_missing_features.mmap2;
 	attr->comm  = track;
+	attr->bpf_event = track && !perf_missing_features.bpf_event;
 
 	if (opts->record_namespaces)
 		attr->namespaces  = track;
@@ -1652,6 +1653,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(context_switch, p_unsigned);
 	PRINT_ATTRf(write_backward, p_unsigned);
 	PRINT_ATTRf(namespaces, p_unsigned);
+	PRINT_ATTRf(bpf_event, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
@@ -1811,6 +1813,8 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 				     PERF_SAMPLE_BRANCH_NO_CYCLES);
 	if (perf_missing_features.group_read && evsel->attr.inherit)
 		evsel->attr.read_format &= ~(PERF_FORMAT_GROUP|PERF_FORMAT_ID);
+	if (perf_missing_features.bpf_event)
+		evsel->attr.bpf_event = 0;
 retry_sample_id:
 	if (perf_missing_features.sample_id_all)
 		evsel->attr.sample_id_all = 0;
@@ -1955,6 +1959,11 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 		perf_missing_features.exclude_guest = true;
 		pr_debug2("switching off exclude_guest, exclude_host\n");
 		goto fallback_missing_features;
+	} else if (!perf_missing_features.bpf_event &&
+		   evsel->attr.bpf_event) {
+		perf_missing_features.bpf_event = true;
+		pr_debug2("switching off bpf_event\n");
+		goto fallback_missing_features;
 	} else if (!perf_missing_features.sample_id_all) {
 		perf_missing_features.sample_id_all = true;
 		pr_debug2("switching off sample_id_all\n");
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 3147ca76c6fc..ee7e7cc23c6c 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -168,6 +168,7 @@ struct perf_missing_features {
 	bool lbr_flags;
 	bool write_backward;
 	bool group_read;
+	bool bpf_event;
 };
 
 extern struct perf_missing_features perf_missing_features;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 8f36ce813bc5..29c18989ba4f 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -21,6 +21,7 @@
 #include "unwind.h"
 #include "linux/hash.h"
 #include "asm/bug.h"
+#include "bpf-event.h"
 
 #include "sane_ctype.h"
 #include <symbol/kallsyms.h>
@@ -1812,6 +1813,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 	case PERF_RECORD_SWITCH:
 	case PERF_RECORD_SWITCH_CPU_WIDE:
 		ret = machine__process_switch_event(machine, event); break;
+	case PERF_RECORD_BPF_EVENT:
+		ret = machine__process_bpf_event(machine, event, sample); break;
 	default:
 		ret = -1;
 		break;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7d2c8ce6cfad..dffe5120d2d3 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -371,6 +371,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->itrace_start = perf_event__process_itrace_start;
 	if (tool->context_switch == NULL)
 		tool->context_switch = perf_event__process_switch;
+	if (tool->bpf_event == NULL)
+		tool->bpf_event = perf_event__process_bpf_event;
 	if (tool->read == NULL)
 		tool->read = process_event_sample_stub;
 	if (tool->throttle == NULL)
@@ -1300,6 +1302,8 @@ static int machines__deliver_event(struct machines *machines,
 	case PERF_RECORD_SWITCH:
 	case PERF_RECORD_SWITCH_CPU_WIDE:
 		return tool->context_switch(tool, event, sample, machine);
+	case PERF_RECORD_BPF_EVENT:
+		return tool->bpf_event(tool, event, sample, machine);
 	default:
 		++evlist->stats.nr_unknown_events;
 		return -1;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 56e4ca54020a..69ae898ca024 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -53,7 +53,9 @@ struct perf_tool {
 			itrace_start,
 			context_switch,
 			throttle,
-			unthrottle;
+			unthrottle,
+			bpf_event;
+
 	event_attr_op	attr;
 	event_attr_op	event_update;
 	event_op2	tracing_data;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 perf, bpf-next 4/4] perf tools: synthesize bpf event for loaded BPF programs
  2018-12-11 23:33 [PATCH v3 perf, bpf-next 0/4] reveal invisible bpf programs Song Liu
                   ` (2 preceding siblings ...)
  2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 3/4] perf util: handle PERF_RECORD_BPF_EVENT Song Liu
@ 2018-12-11 23:33 ` Song Liu
  3 siblings, 0 replies; 22+ messages in thread
From: Song Liu @ 2018-12-11 23:33 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: Song Liu, ast, daniel, kernel-team, peterz, acme

This patch synthesizes PERF_RECORD_BPF_EVENT for BPF programs loaded before
perf-record starts. This is achieved by gathering information about all
loaded BPF programs via sys_bpf.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 tools/perf/builtin-record.c |   6 ++
 tools/perf/util/bpf-event.c | 185 ++++++++++++++++++++++++++++++++++++
 tools/perf/util/bpf-event.h |   4 +
 3 files changed, 195 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 488779bc4c8d..81928952d13b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -41,6 +41,7 @@
 #include "util/perf-hooks.h"
 #include "util/time-utils.h"
 #include "util/units.h"
+#include "util/bpf-event.h"
 #include "asm/bug.h"
 
 #include <errno.h>
@@ -855,6 +856,11 @@ static int record__synthesize(struct record *rec, bool tail)
 		return err;
 	}
 
+	err = perf_event__synthesize_bpf_events(tool, process_synthesized_event,
+						machine);
+	if (err < 0)
+		pr_warning("Couldn't synthesize bpf events.\n");
+
 	err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads,
 					    process_synthesized_event, opts->sample_address,
 					    opts->proc_map_timeout, 1);
diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c
index 2b0a45d8f1d7..ae1b89c5b73d 100644
--- a/tools/perf/util/bpf-event.c
+++ b/tools/perf/util/bpf-event.c
@@ -1,10 +1,24 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <errno.h>
 #include <bpf/bpf.h>
+#include <bpf/btf.h>
+#include <linux/btf.h>
 #include "bpf-event.h"
 #include "debug.h"
 #include "symbol.h"
 
+#define ptr_to_u64(ptr)    ((__u64)(unsigned long)(ptr))
+
+static int snprintf_hex(char *buf, size_t size, unsigned char *data, size_t len)
+{
+	int ret = 0;
+	size_t i;
+
+	for (i = 0; i < len; i++)
+		ret += snprintf(buf + ret, size - ret, "%02x", data[i]);
+	return ret;
+}
+
 static int machine__process_bpf_prog_load(
 	struct machine *machine,
 	union perf_event *event,
@@ -64,3 +78,174 @@ int machine__process_bpf_event(struct machine *machine __maybe_unused,
 	}
 	return 0;
 }
+
+/*
+ * Synthesize bpf_event for one bpf program. One bpf_event is generated for
+ * each sub program.
+ */
+static int perf_event__synthesize_one_bpf_prog(struct perf_tool *tool,
+					       perf_event__handler_t process,
+					       struct machine *machine,
+					       int fd,
+					       union perf_event *event)
+{
+	struct bpf_event *bpf_event = &event->bpf_event;
+	u32 sub_prog_cnt, i, func_info_rec_size;
+	u8 (*prog_tags)[BPF_TAG_SIZE] = NULL;
+	struct bpf_prog_info info = {};
+	u32 info_len = sizeof(info);
+	void *func_infos = NULL;
+	u64 *prog_addrs = NULL;
+	struct btf *btf = NULL;
+	u32 *prog_lens = NULL;
+	bool has_btf = false;
+	int err = 0;
+
+	/* Call bpf_obj_get_info_by_fd() to get sizes of arrays */
+	err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
+
+	if (err || info_len < 192 /* need field prog_tags */)
+		return -1;
+
+	/* number of ksyms, func_lengths, and tags should match */
+	sub_prog_cnt = info.nr_jited_ksyms;
+	if (sub_prog_cnt != info.nr_prog_tags ||
+	    sub_prog_cnt != info.nr_jited_func_lens)
+		return -1;
+
+	/* check BTF func info support */
+	if (info.btf_id && info.nr_func_info && info.func_info_rec_size) {
+		/* btf func info number should be same as sub_prog_cnt */
+		if (sub_prog_cnt != info.nr_func_info)
+			return -1;
+		if (btf__get_from_id(info.btf_id, &btf))
+			return -1;
+		func_info_rec_size = info.func_info_rec_size;
+		func_infos = malloc(sub_prog_cnt * func_info_rec_size);
+		if (!func_infos)
+			return -1;
+		has_btf = true;
+	}
+
+	/*
+	 * We need address, length, and tag for each sub program.
+	 * Allocate memory and call bpf_obj_get_info_by_fd() again
+	 */
+	prog_addrs = (u64 *)malloc(sizeof(u64) * sub_prog_cnt);
+	prog_lens = (u32 *)malloc(sizeof(u32) * sub_prog_cnt);
+	prog_tags = malloc(sizeof(u8) * BPF_TAG_SIZE * sub_prog_cnt);
+
+	err = !prog_addrs || !prog_lens || !prog_tags;
+	if (err)
+		goto out;
+
+	memset(&info, 0, sizeof(info));
+	info.nr_jited_ksyms = sub_prog_cnt;
+	info.nr_jited_func_lens = sub_prog_cnt;
+	info.nr_prog_tags = sub_prog_cnt;
+	info.jited_ksyms = ptr_to_u64(prog_addrs);
+	info.jited_func_lens = ptr_to_u64(prog_lens);
+	info.prog_tags = ptr_to_u64(prog_tags);
+	info_len = sizeof(info);
+	if (has_btf) {
+		info.nr_func_info = sub_prog_cnt;
+		info.func_info_rec_size = func_info_rec_size;
+		info.func_info = ptr_to_u64(func_infos);
+	}
+
+	err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
+	if (err)
+		goto out;
+
+	/* Synthesize bpf_events */
+	for (i = 0; i < sub_prog_cnt; i++) {
+		const struct bpf_func_info *finfo;
+		const char *short_name = NULL;
+		const struct btf_type *t;
+		int name_len;
+
+		bpf_event->header.type = PERF_RECORD_BPF_EVENT;
+		bpf_event->header.misc = 0;
+		bpf_event->header.size = sizeof(struct bpf_event);
+		bpf_event->type = PERF_BPF_EVENT_PROG_LOAD;
+		bpf_event->flags = 0;
+		bpf_event->id = info.id;
+		bpf_event->sub_id = i;
+		bpf_event->addr = prog_addrs[i];
+		bpf_event->len = prog_lens[i];
+		memcpy(bpf_event->tag, prog_tags[i], BPF_TAG_SIZE);
+
+		name_len = snprintf(bpf_event->name, KSYM_NAME_LEN,
+				    "bpf_prog_");
+		name_len += snprintf_hex(bpf_event->name + name_len,
+					 KSYM_NAME_LEN - name_len,
+					 bpf_event->tag, BPF_TAG_SIZE);
+		if (has_btf) {
+			finfo = func_infos + i * info.func_info_rec_size;
+			t = btf__type_by_id(btf, finfo->type_id);
+			short_name = btf__name_by_offset(btf, t->name_off);
+		} else if (i == 0 && sub_prog_cnt == 1) {
+			/* no subprog */
+			if (info.name[0])
+				short_name = info.name;
+		} else
+			short_name = "F";
+		if (short_name)
+			name_len += snprintf(bpf_event->name + name_len,
+					     KSYM_NAME_LEN - name_len,
+					     "_%s", short_name);
+
+		bpf_event->header.size += PERF_ALIGN(name_len + 1, sizeof(u64));
+		err = perf_tool__process_synth_event(tool, event,
+						     machine, process);
+	}
+
+out:
+	free(prog_tags);
+	free(prog_lens);
+	free(prog_addrs);
+	free(func_infos);
+	btf__free(btf);
+	return err ? -1 : 0;
+}
+
+int perf_event__synthesize_bpf_events(struct perf_tool *tool,
+				      perf_event__handler_t process,
+				      struct machine *machine)
+{
+	union perf_event *event;
+	__u32 id = 0;
+	int err;
+	int fd;
+
+	event = malloc(sizeof(event->bpf_event) + KSYM_NAME_LEN);
+	if (!event)
+		return -1;
+	while (true) {
+		err = bpf_prog_get_next_id(id, &id);
+		if (err) {
+			if (errno == ENOENT) {
+				err = 0;
+				break;
+			}
+			pr_err("can't get next program: %s%s\n",
+			       strerror(errno),
+			       errno == EINVAL ? " -- kernel too old?" : "");
+			err = -1;
+			break;
+		}
+		fd = bpf_prog_get_fd_by_id(id);
+		if (fd < 0) {
+			pr_debug("Failed to get fd for prog_id %u\n", id);
+			continue;
+		}
+
+		err = perf_event__synthesize_one_bpf_prog(
+			tool, process, machine, fd, event);
+		close(fd);
+		if (err)
+			break;
+	}
+	free(event);
+	return err;
+}
diff --git a/tools/perf/util/bpf-event.h b/tools/perf/util/bpf-event.h
index d5ca355dd298..be6372a44aba 100644
--- a/tools/perf/util/bpf-event.h
+++ b/tools/perf/util/bpf-event.h
@@ -8,4 +8,8 @@ int machine__process_bpf_event(struct machine *machine,
 			       union perf_event *event,
 			       struct perf_sample *sample);
 
+int perf_event__synthesize_bpf_events(struct perf_tool *tool,
+				      perf_event__handler_t process,
+				      struct machine *machine);
+
 #endif
-- 
2.17.1



* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT Song Liu
@ 2018-12-12 12:06   ` Peter Zijlstra
  2018-12-12 16:49     ` Song Liu
  2018-12-12 13:15   ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2018-12-12 12:06 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-kernel, netdev, ast, daniel, kernel-team, acme

On Tue, Dec 11, 2018 at 03:33:47PM -0800, Song Liu wrote:
> +static void perf_event_bpf_output(struct perf_event *event,
> +				   void *data)
> +{
> +	struct perf_bpf_event *bpf_event = data;
> +	struct perf_output_handle handle;
> +	struct perf_sample_data sample;
> +	char name[KSYM_NAME_LEN];
> +	int name_len;
> +	int ret;
> +
> +	if (!perf_event_bpf_match(event))
> +		return;
> +
> +	/* get prog name and round up to 64 bit aligned */
> +	bpf_get_prog_name(bpf_event->prog, name);
> +	name_len = strlen(name) + 1;
> +	while (!IS_ALIGNED(name_len, sizeof(u64)))
> +		name[name_len++] = '\0';

Does this not require something like:

	BUILD_BUG_ON(KSYM_NAME_LEN % sizeof(u64));


* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT Song Liu
  2018-12-12 12:06   ` Peter Zijlstra
@ 2018-12-12 13:15   ` Peter Zijlstra
  2018-12-12 13:27     ` Arnaldo Carvalho de Melo
  2018-12-12 17:09     ` Song Liu
  1 sibling, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2018-12-12 13:15 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, netdev, ast, daniel, kernel-team, acme, Steven Rostedt

On Tue, Dec 11, 2018 at 03:33:47PM -0800, Song Liu wrote:
> For better performance analysis of BPF programs, this patch introduces
> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
> load/unload information to user space.
> 
> Each BPF program may contain up to BPF_MAX_SUBPROGS (256) sub programs.
> The following example shows kernel symbols for a BPF program with 7
> sub programs:
> 
>     ffffffffa0257cf9 t bpf_prog_b07ccb89267cf242_F
>     ffffffffa02592e1 t bpf_prog_2dcecc18072623fc_F
>     ffffffffa025b0e9 t bpf_prog_bb7a405ebaec5d5c_F
>     ffffffffa025dd2c t bpf_prog_a7540d4a39ec1fc7_F
>     ffffffffa025fcca t bpf_prog_05762d4ade0e3737_F
>     ffffffffa026108f t bpf_prog_db4bd11e35df90d4_F
>     ffffffffa0263f00 t bpf_prog_89d64e4abf0f0126_F
>     ffffffffa0257cf9 t bpf_prog_ae31629322c4b018__dummy_tracepoi

Doesn't BPF have enough information to generate 'saner' names? Going by
the thing below, these sub-progs are actually functions, right?

>         /*
>          * Record different types of bpf events:
>          *  enum perf_bpf_event_type {
>          *     PERF_BPF_EVENT_UNKNOWN           = 0,
>          *     PERF_BPF_EVENT_PROG_LOAD         = 1,
>          *     PERF_BPF_EVENT_PROG_UNLOAD       = 2,
>          *  };
>          *
>          * struct {
>          *      struct perf_event_header header;
>          *      u32                             type;
>          *      u32                             flags;
>          *      u32                             id; // prog_id or other id
>          *      u32                             sub_id; // subprog id
>          *
>          *      // for bpf_prog types, bpf prog or subprog
>          *      u8                              tag[BPF_TAG_SIZE];
>          *      u64                             addr;
>          *      u64                             len;
>          *      char                            name[];
>          *      struct sample_id                sample_id;
>          * };
>          */

Isn't this mixing two different things (poorly)? The kallsym update and
the BPF load/unload ?

And while this tracks the bpf kallsyms, it does not do all kallsyms.

.... Oooh, I see the problem, everybody is doing their own custom
kallsym_{add,del}() thing, instead of having that in generic code :-(

This, for example, doesn't track module load/unload nor ftrace
trampolines, even though both affect kallsyms.

> +void perf_event_bpf_event_prog(enum perf_bpf_event_type type,
> +			       struct bpf_prog *prog)
> +{
> +	if (!atomic_read(&nr_bpf_events))
> +		return;
> +
> +	if (type != PERF_BPF_EVENT_PROG_LOAD &&
> +	    type != PERF_BPF_EVENT_PROG_UNLOAD)
> +		return;
> +
> +	if (prog->aux->func_cnt == 0) {
> +		perf_event_bpf_event_subprog(type, prog,
> +					     prog->aux->id, 0);
> +	} else {
> +		int i;
> +
> +		for (i = 0; i < prog->aux->func_cnt; i++)
> +			perf_event_bpf_event_subprog(type, prog->aux->func[i],
> +						     prog->aux->id, i);
> +	}
> +}




* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-12 13:15   ` Peter Zijlstra
@ 2018-12-12 13:27     ` Arnaldo Carvalho de Melo
  2018-12-12 17:33       ` Song Liu
  2018-12-12 17:09     ` Song Liu
  1 sibling, 1 reply; 22+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-12-12 13:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Song Liu, linux-kernel, netdev, ast, daniel, kernel-team, Steven Rostedt

Em Wed, Dec 12, 2018 at 02:15:49PM +0100, Peter Zijlstra escreveu:
> On Tue, Dec 11, 2018 at 03:33:47PM -0800, Song Liu wrote:
> > For better performance analysis of BPF programs, this patch introduces
> > PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
> > load/unload information to user space.
> > 
> > Each BPF program may contain up to BPF_MAX_SUBPROGS (256) sub programs.
> > The following example shows kernel symbols for a BPF program with 7
> > sub programs:
> > 
> >     ffffffffa0257cf9 t bpf_prog_b07ccb89267cf242_F
> >     ffffffffa02592e1 t bpf_prog_2dcecc18072623fc_F
> >     ffffffffa025b0e9 t bpf_prog_bb7a405ebaec5d5c_F
> >     ffffffffa025dd2c t bpf_prog_a7540d4a39ec1fc7_F
> >     ffffffffa025fcca t bpf_prog_05762d4ade0e3737_F
> >     ffffffffa026108f t bpf_prog_db4bd11e35df90d4_F
> >     ffffffffa0263f00 t bpf_prog_89d64e4abf0f0126_F
> >     ffffffffa0257cf9 t bpf_prog_ae31629322c4b018__dummy_tracepoi
> 
> Doesn't BPF have enough information to generate 'saner' names? Going by
> the thing below, these sub-progs are actually functions, right?

Yeah, this looks just like a symbol table, albeit just with functions,
so far.
 
> >         /*
> >          * Record different types of bpf events:
> >          *  enum perf_bpf_event_type {
> >          *     PERF_BPF_EVENT_UNKNOWN           = 0,
> >          *     PERF_BPF_EVENT_PROG_LOAD         = 1,
> >          *     PERF_BPF_EVENT_PROG_UNLOAD       = 2,
> >          *  };
> >          *
> >          * struct {
> >          *      struct perf_event_header header;
> >          *      u32                             type;
> >          *      u32                             flags;
> >          *      u32                             id; // prog_id or other id
> >          *      u32                             sub_id; // subprog id
> >          *
> >          *      // for bpf_prog types, bpf prog or subprog
> >          *      u8                              tag[BPF_TAG_SIZE];
> >          *      u64                             addr;
> >          *      u64                             len;
> >          *      char                            name[];
> >          *      struct sample_id                sample_id;
> >          * };
> >          */
> 
> Isn't this mixing two different things (poorly)? The kallsym update and
> the BPF load/unload ?
> 
> And while this tracks the bpf kallsyms, it does not do all kallsyms.
> 
> .... Oooh, I see the problem, everybody is doing their own custom
> kallsym_{add,del}() thing, instead of having that in generic code :-(
 
> This, for example, doesn't track module load/unload nor ftrace
> trampolines, even though both affect kallsyms.

So you think the best would have to be a PERF_RECORD_ that just states
that code has been loaded at range (addr, len). I.e. much like
PERF_RECORD_MMAP does, just for userspace? Then it could be used by BPF
and any other kernel facility like the ones you described?

There would be an area that would be used by each of these facilities to
figure out further info, like we use the mmap->name to go and get the
symbol table from ELF files, etc, but BPF could use for their specific
stuff?

The above is almost like PERF_RECORD_MMAP (PERF_BPF_EVENT_PROG_LOAD) +
PERF_RECORD_MUNMAP(PERF_BPF_EVENT_PROG_UNLOAD) in one event, with this
'type' thing allowing for more stuff to be put in later, I guess.

- Arnaldo
 
> > +void perf_event_bpf_event_prog(enum perf_bpf_event_type type,
> > +			       struct bpf_prog *prog)
> > +{
> > +	if (!atomic_read(&nr_bpf_events))
> > +		return;
> > +
> > +	if (type != PERF_BPF_EVENT_PROG_LOAD &&
> > +	    type != PERF_BPF_EVENT_PROG_UNLOAD)
> > +		return;
> > +
> > +	if (prog->aux->func_cnt == 0) {
> > +		perf_event_bpf_event_subprog(type, prog,
> > +					     prog->aux->id, 0);
> > +	} else {
> > +		int i;
> > +
> > +		for (i = 0; i < prog->aux->func_cnt; i++)
> > +			perf_event_bpf_event_subprog(type, prog->aux->func[i],
> > +						     prog->aux->id, i);
> > +	}
> > +}
> 

-- 

- Arnaldo


* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-12 12:06   ` Peter Zijlstra
@ 2018-12-12 16:49     ` Song Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Song Liu @ 2018-12-12 16:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: lkml, netdev, ast, daniel, Kernel Team, acme



> On Dec 12, 2018, at 4:06 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Dec 11, 2018 at 03:33:47PM -0800, Song Liu wrote:
>> +static void perf_event_bpf_output(struct perf_event *event,
>> +				   void *data)
>> +{
>> +	struct perf_bpf_event *bpf_event = data;
>> +	struct perf_output_handle handle;
>> +	struct perf_sample_data sample;
>> +	char name[KSYM_NAME_LEN];
>> +	int name_len;
>> +	int ret;
>> +
>> +	if (!perf_event_bpf_match(event))
>> +		return;
>> +
>> +	/* get prog name and round up to 64 bit aligned */
>> +	bpf_get_prog_name(bpf_event->prog, name);
>> +	name_len = strlen(name) + 1;
>> +	while (!IS_ALIGNED(name_len, sizeof(u64)))
>> +		name[name_len++] = '\0';
> 
> Does this not require something like:
> 
> 	BUILD_BUG_ON(KSYM_NAME_LEN % sizeof(u64));

Yeah, this makes sense. I will add this in next version. 

Thanks,
Song


* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-12 13:15   ` Peter Zijlstra
  2018-12-12 13:27     ` Arnaldo Carvalho de Melo
@ 2018-12-12 17:09     ` Song Liu
  2018-12-12 18:05       ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Song Liu @ 2018-12-12 17:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, netdev, ast, daniel, Kernel Team, acme, Steven Rostedt



> On Dec 12, 2018, at 5:15 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Dec 11, 2018 at 03:33:47PM -0800, Song Liu wrote:
>> For better performance analysis of BPF programs, this patch introduces
>> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
>> load/unload information to user space.
>> 
>> Each BPF program may contain up to BPF_MAX_SUBPROGS (256) sub programs.
>> The following example shows kernel symbols for a BPF program with 7
>> sub programs:
>> 
>>    ffffffffa0257cf9 t bpf_prog_b07ccb89267cf242_F
>>    ffffffffa02592e1 t bpf_prog_2dcecc18072623fc_F
>>    ffffffffa025b0e9 t bpf_prog_bb7a405ebaec5d5c_F
>>    ffffffffa025dd2c t bpf_prog_a7540d4a39ec1fc7_F
>>    ffffffffa025fcca t bpf_prog_05762d4ade0e3737_F
>>    ffffffffa026108f t bpf_prog_db4bd11e35df90d4_F
>>    ffffffffa0263f00 t bpf_prog_89d64e4abf0f0126_F
>>    ffffffffa0257cf9 t bpf_prog_ae31629322c4b018__dummy_tracepoi
> 
> Doesn't BPF have enough information to generate 'saner' names? Going by
> the thing below, these sub-progs are actually functions, right?

These sub programs/functions get their descriptive names from BTF function
types (coming in 4.21). However, BTF is optional; when BTF is missing, they
are named bpf_prog_<tag>_F. The main BPF program has a name of up to 16
bytes. In the example above, the last program has the name
_dummy_tracepoint. 

I think these sub programs are more like "programs" than "functions", 
because each sub program occupies its own page(s). 

> 
>>        /*
>>         * Record different types of bpf events:
>>         *  enum perf_bpf_event_type {
>>         *     PERF_BPF_EVENT_UNKNOWN           = 0,
>>         *     PERF_BPF_EVENT_PROG_LOAD         = 1,
>>         *     PERF_BPF_EVENT_PROG_UNLOAD       = 2,
>>         *  };
>>         *
>>         * struct {
>>         *      struct perf_event_header header;
>>         *      u32                             type;
>>         *      u32                             flags;
>>         *      u32                             id; // prog_id or other id
>>         *      u32                             sub_id; // subprog id
>>         *
>>         *      // for bpf_prog types, bpf prog or subprog
>>         *      u8                              tag[BPF_TAG_SIZE];
>>         *      u64                             addr;
>>         *      u64                             len;
>>         *      char                            name[];
>>         *      struct sample_id                sample_id;
>>         * };
>>         */
> 
> Isn't this mixing two different things (poorly)? The kallsym update and
> the BPF load/unload ?

I would say these two things are actually two parts of the same event. 
Fields id, sub_id, and tag provide information about which program is
mapped to this ksym. They are equivalent to "pgoff + filename" in 
PERF_RECORD_MMAP, or "maj, min, ino, and ino_generation" in 
PERF_RECORD_MMAP2. 

> 
> And while this tracks the bpf kallsyms, it does not do all kallsyms.
> 
> .... Oooh, I see the problem, everybody is doing their own custom
> kallsym_{add,del}() thing, instead of having that in generic code :-(
> 
> This, for example, doesn't track module load/unload nor ftrace
> trampolines, even though both affect kallsyms.

I think we can use PERF_RECORD_MMAP(or MMAP2) for module load/unload. 
That could be separate sets of patches. 

Thanks,
Song

> 
>> +void perf_event_bpf_event_prog(enum perf_bpf_event_type type,
>> +			       struct bpf_prog *prog)
>> +{
>> +	if (!atomic_read(&nr_bpf_events))
>> +		return;
>> +
>> +	if (type != PERF_BPF_EVENT_PROG_LOAD &&
>> +	    type != PERF_BPF_EVENT_PROG_UNLOAD)
>> +		return;
>> +
>> +	if (prog->aux->func_cnt == 0) {
>> +		perf_event_bpf_event_subprog(type, prog,
>> +					     prog->aux->id, 0);
>> +	} else {
>> +		int i;
>> +
>> +		for (i = 0; i < prog->aux->func_cnt; i++)
>> +			perf_event_bpf_event_subprog(type, prog->aux->func[i],
>> +						     prog->aux->id, i);
>> +	}
>> +}
> 
> 



* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-12 13:27     ` Arnaldo Carvalho de Melo
@ 2018-12-12 17:33       ` Song Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Song Liu @ 2018-12-12 17:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, linux-kernel, netdev, ast, daniel, Kernel Team,
	Steven Rostedt



> On Dec 12, 2018, at 5:27 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> Em Wed, Dec 12, 2018 at 02:15:49PM +0100, Peter Zijlstra escreveu:
>> On Tue, Dec 11, 2018 at 03:33:47PM -0800, Song Liu wrote:
>>> For better performance analysis of BPF programs, this patch introduces
>>> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
>>> load/unload information to user space.
>>> 
>>> Each BPF program may contain up to BPF_MAX_SUBPROGS (256) sub programs.
>>> The following example shows kernel symbols for a BPF program with 7
>>> sub programs:
>>> 
>>>    ffffffffa0257cf9 t bpf_prog_b07ccb89267cf242_F
>>>    ffffffffa02592e1 t bpf_prog_2dcecc18072623fc_F
>>>    ffffffffa025b0e9 t bpf_prog_bb7a405ebaec5d5c_F
>>>    ffffffffa025dd2c t bpf_prog_a7540d4a39ec1fc7_F
>>>    ffffffffa025fcca t bpf_prog_05762d4ade0e3737_F
>>>    ffffffffa026108f t bpf_prog_db4bd11e35df90d4_F
>>>    ffffffffa0263f00 t bpf_prog_89d64e4abf0f0126_F
>>>    ffffffffa0257cf9 t bpf_prog_ae31629322c4b018__dummy_tracepoi
>> 
>> Doesn't BPF have enough information to generate 'saner' names? Going by
>> the thing below, these sub-progs are actually functions, right?
> 
> Yeah, this looks just like a symbol table, albeit just with functions,
> so far.
>>>        /*
>>>         * Record different types of bpf events:
>>>         *  enum perf_bpf_event_type {
>>>         *     PERF_BPF_EVENT_UNKNOWN           = 0,
>>>         *     PERF_BPF_EVENT_PROG_LOAD         = 1,
>>>         *     PERF_BPF_EVENT_PROG_UNLOAD       = 2,
>>>         *  };
>>>         *
>>>         * struct {
>>>         *      struct perf_event_header header;
>>>         *      u32                             type;
>>>         *      u32                             flags;
>>>         *      u32                             id; // prog_id or other id
>>>         *      u32                             sub_id; // subprog id
>>>         *
>>>         *      // for bpf_prog types, bpf prog or subprog
>>>         *      u8                              tag[BPF_TAG_SIZE];
>>>         *      u64                             addr;
>>>         *      u64                             len;
>>>         *      char                            name[];
>>>         *      struct sample_id                sample_id;
>>>         * };
>>>         */
>> 
>> Isn't this mixing two different things (poorly)? The kallsym update and
>> the BPF load/unload ?
>> 
>> And while this tracks the bpf kallsyms, it does not do all kallsyms.
>> 
>> .... Oooh, I see the problem, everybody is doing their own custom
>> kallsym_{add,del}() thing, instead of having that in generic code :-(
> 
>> This, for example, doesn't track module load/unload nor ftrace
>> trampolines, even though both affect kallsyms.
> 
> So you think the best would have to be a PERF_RECORD_ that just states
> that code has been loaded at range (addr, len). I.e. much like
> PERF_RECORD_MMAP does, just for userspace? Then it could be used by BPF
> and any other kernel facility like the ones you described?
> 
> There would be an area that would be used by each of these facilities to
> figure out further info, like we use the mmap->name to go and get the
> symbol table from ELF files, etc, but BPF could use for their specific
> stuff?
> 
> The above is almost like PERF_RECORD_MMAP (PERF_BPF_EVENT_PROG_LOAD) +
> PERF_RECORD_MUNMAP(PERF_BPF_EVENT_PROG_UNLOAD) in one event, with this
> 'type' thing allowing for more stuff to be put in later, I guess.
> 
> - Arnaldo

PERF_RECORD_MMAP and PERF_RECORD_BPF_EVENT are issued at different 
granularities: 

   One PERF_RECORD_MMAP is issued for each mmap, which may contain 
   many symbols;
   PERF_RECORD_BPF_EVENT is issued for each symbol (sub program). 

Unlike mmap to ELF file, where all symbols have static offsets within 
the file, different sub programs of a BPF program have their own 
page(s) and random starting addresses within the page(s). Therefore, we 
cannot have one PERF_RECORD_BPF_EVENT to cover multiple sub programs. 

The current version has these mmap-like PERF_RECORD_BPF_EVENT records in
perf.data, which is sufficient for basic profiling cases. For annotation and
source line matching, user space needs to process PERF_RECORD_BPF_EVENT
(both from the kernel and synthesized). 

Thanks,
Song


>>> +void perf_event_bpf_event_prog(enum perf_bpf_event_type type,
>>> +			       struct bpf_prog *prog)
>>> +{
>>> +	if (!atomic_read(&nr_bpf_events))
>>> +		return;
>>> +
>>> +	if (type != PERF_BPF_EVENT_PROG_LOAD &&
>>> +	    type != PERF_BPF_EVENT_PROG_UNLOAD)
>>> +		return;
>>> +
>>> +	if (prog->aux->func_cnt == 0) {
>>> +		perf_event_bpf_event_subprog(type, prog,
>>> +					     prog->aux->id, 0);
>>> +	} else {
>>> +		int i;
>>> +
>>> +		for (i = 0; i < prog->aux->func_cnt; i++)
>>> +			perf_event_bpf_event_subprog(type, prog->aux->func[i],
>>> +						     prog->aux->id, i);
>>> +	}
>>> +}
>> 
> 
> -- 
> 
> - Arnaldo



* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-12 17:09     ` Song Liu
@ 2018-12-12 18:05       ` Peter Zijlstra
  2018-12-12 18:33         ` Steven Rostedt
  2018-12-12 18:56         ` Song Liu
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2018-12-12 18:05 UTC (permalink / raw)
  To: Song Liu; +Cc: lkml, netdev, ast, daniel, Kernel Team, acme, Steven Rostedt

On Wed, Dec 12, 2018 at 05:09:17PM +0000, Song Liu wrote:
> > And while this tracks the bpf kallsyms, it does not do all kallsyms.
> > 
> > .... Oooh, I see the problem, everybody is doing their own custom
> > kallsym_{add,del}() thing, instead of having that in generic code :-(
> > 
> > This, for example, doesn't track module load/unload nor ftrace
> > trampolines, even though both affect kallsyms.
> 
> I think we can use PERF_RECORD_MMAP(or MMAP2) for module load/unload. 
> That could be separate sets of patches. 

So I would actually like to move bpf_lock/bpf_kallsyms/bpf_tree +
bpf_prog_kallsyms_*() + __bpf_address_lookup() into kernel/kallsyms.c
and also have ftrace use that.

Because currently the ftrace stuff is otherwise invisible.

A generic kallsym register/unregister for any JIT.


* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-12 18:05       ` Peter Zijlstra
@ 2018-12-12 18:33         ` Steven Rostedt
  2018-12-12 18:58           ` Song Liu
  2018-12-13 18:45           ` Peter Zijlstra
  2018-12-12 18:56         ` Song Liu
  1 sibling, 2 replies; 22+ messages in thread
From: Steven Rostedt @ 2018-12-12 18:33 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Song Liu, lkml, netdev, ast, daniel, Kernel Team, acme

On Wed, 12 Dec 2018 19:05:53 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Dec 12, 2018 at 05:09:17PM +0000, Song Liu wrote:
> > > And while this tracks the bpf kallsyms, it does not do all kallsyms.
> > > 
> > > .... Oooh, I see the problem, everybody is doing their own custom
> > > kallsym_{add,del}() thing, instead of having that in generic code :-(
> > > 
> > > This, for example, doesn't track module load/unload nor ftrace
> > > trampolines, even though both affect kallsyms.  
> > 
> > I think we can use PERF_RECORD_MMAP(or MMAP2) for module load/unload. 
> > That could be separate sets of patches.   
> 
> So I would actually like to move bpf_lock/bpf_kallsyms/bpf_tree +
> bpf_prog_kallsyms_*() + __bpf_address_lookup() into kernel/kallsyms.c
> and also have ftrace use that.
> 
> Because currently the ftrace stuff is otherwise invisible.
> 
> A generic kallsym register/unregister for any JIT.

That's if it needs to look up the symbols that were recorded when init
was unloaded.

The ftrace kallsyms is used to save the function names of init code
that was freed, but may have been recorded. Without the ftrace
kallsyms the functions traced at init time would just show up as hex
addresses (not very useful).

I'm not sure how BPF would need those symbols unless they were executed
during init (module or core) and needed to see what the symbols used to
be.

-- Steve


* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-12 18:05       ` Peter Zijlstra
  2018-12-12 18:33         ` Steven Rostedt
@ 2018-12-12 18:56         ` Song Liu
  2018-12-13 15:25           ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Song Liu @ 2018-12-12 18:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, netdev, ast, daniel, Kernel Team, acme, Steven Rostedt



> On Dec 12, 2018, at 10:05 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Wed, Dec 12, 2018 at 05:09:17PM +0000, Song Liu wrote:
>>> And while this tracks the bpf kallsyms, it does not do all kallsyms.
>>> 
>>> .... Oooh, I see the problem, everybody is doing their own custom
>>> kallsym_{add,del}() thing, instead of having that in generic code :-(
>>> 
>>> This, for example, doesn't track module load/unload nor ftrace
>>> trampolines, even though both affect kallsyms.
>> 
>> I think we can use PERF_RECORD_MMAP(or MMAP2) for module load/unload. 
>> That could be separate sets of patches. 
> 
> So I would actually like to move bpf_lock/bpf_kallsyms/bpf_tree +
> bpf_prog_kallsyms_*() + __bpf_address_lookup() into kernel/kallsyms.c
> and also have ftrace use that.
> 
> Because currently the ftrace stuff is otherwise invisible.
> 
> A generic kallsym register/unregister for any JIT.

I guess this is _not_ a requirement for this patchset? BPF programs have
special data (id, sub_id, tag), which is why we need PERF_RECORD_BPF_EVENT.
So this patchset should be orthogonal to a generic kallsym framework?

Thanks,
Song

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-12 18:33         ` Steven Rostedt
@ 2018-12-12 18:58           ` Song Liu
  2018-12-13 18:45           ` Peter Zijlstra
  1 sibling, 0 replies; 22+ messages in thread
From: Song Liu @ 2018-12-12 18:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, lkml, netdev, ast, daniel, Kernel Team, acme



> On Dec 12, 2018, at 10:33 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> On Wed, 12 Dec 2018 19:05:53 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Wed, Dec 12, 2018 at 05:09:17PM +0000, Song Liu wrote:
>>>> And while this tracks the bpf kallsyms, it does not do all kallsyms.
>>>> 
>>>> .... Oooh, I see the problem, everybody is doing their own custom
>>>> kallsym_{add,del}() thing, instead of having that in generic code :-(
>>>> 
>>>> This, for example, doesn't track module load/unload nor ftrace
>>>> trampolines, even though both affect kallsyms.  
>>> 
>>> I think we can use PERF_RECORD_MMAP(or MMAP2) for module load/unload. 
>>> That could be separate sets of patches.   
>> 
>> So I would actually like to move bpf_lock/bpf_kallsyms/bpf_tree +
>> bpf_prog_kallsyms_*() + __bpf_address_lookup() into kernel/kallsyms.c
>> and also have ftrace use that.
>> 
>> Because currently the ftrace stuff is otherwise invisible.
>> 
>> A generic kallsym register/unregister for any JIT.
> 
> That's if it needs to look up the symbols that were recorded when init
> was unloaded.
> 
> The ftrace kallsyms is used to save the function names of init code
> that was freed, but may have been recorded. With out the ftrace
> kallsyms the functions traced at init time would just show up as hex
> addresses (not very useful).
> 
> I'm not sure how BPF would need those symbols unless they were executed
> during init (module or core) and needed to see what the symbols use to
> be).
> 
> -- Steve

Currently, BPF programs are not executed during init. 

Thanks,
Song

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-12 18:56         ` Song Liu
@ 2018-12-13 15:25           ` Peter Zijlstra
  2018-12-13 16:07             ` Song Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2018-12-13 15:25 UTC (permalink / raw)
  To: Song Liu; +Cc: lkml, netdev, ast, daniel, Kernel Team, acme, Steven Rostedt

On Wed, Dec 12, 2018 at 06:56:11PM +0000, Song Liu wrote:
> 
> 
> > On Dec 12, 2018, at 10:05 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > On Wed, Dec 12, 2018 at 05:09:17PM +0000, Song Liu wrote:
> >>> And while this tracks the bpf kallsyms, it does not do all kallsyms.
> >>> 
> >>> .... Oooh, I see the problem, everybody is doing their own custom
> >>> kallsym_{add,del}() thing, instead of having that in generic code :-(
> >>> 
> >>> This, for example, doesn't track module load/unload nor ftrace
> >>> trampolines, even though both affect kallsyms.
> >> 
> >> I think we can use PERF_RECORD_MMAP(or MMAP2) for module load/unload. 
> >> That could be separate sets of patches. 
> > 
> > So I would actually like to move bpf_lock/bpf_kallsyms/bpf_tree +
> > bpf_prog_kallsyms_*() + __bpf_address_lookup() into kernel/kallsyms.c
> > and also have ftrace use that.
> > 
> > Because currently the ftrace stuff is otherwise invisible.
> > 
> > A generic kallsym register/unregister for any JIT.
> 
> I guess this is _not_ a requirement for this patchset? BPF program has
> special data (id, sub_id, tag) that we need PERF_RECORD_BPF_EVENT. So 
> this patchset should be orthogonal to the generic kallsym framework?

Well, it is a question of ABI. I don't like mixing the kallsym updates
with the BPF updates.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-13 15:25           ` Peter Zijlstra
@ 2018-12-13 16:07             ` Song Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Song Liu @ 2018-12-13 16:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, netdev, ast, daniel, Kernel Team, acme, Steven Rostedt



> On Dec 13, 2018, at 7:25 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Wed, Dec 12, 2018 at 06:56:11PM +0000, Song Liu wrote:
>> 
>> 
>>> On Dec 12, 2018, at 10:05 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> 
>>> On Wed, Dec 12, 2018 at 05:09:17PM +0000, Song Liu wrote:
>>>>> And while this tracks the bpf kallsyms, it does not do all kallsyms.
>>>>> 
>>>>> .... Oooh, I see the problem, everybody is doing their own custom
>>>>> kallsym_{add,del}() thing, instead of having that in generic code :-(
>>>>> 
>>>>> This, for example, doesn't track module load/unload nor ftrace
>>>>> trampolines, even though both affect kallsyms.
>>>> 
>>>> I think we can use PERF_RECORD_MMAP(or MMAP2) for module load/unload. 
>>>> That could be separate sets of patches. 
>>> 
>>> So I would actually like to move bpf_lock/bpf_kallsyms/bpf_tree +
>>> bpf_prog_kallsyms_*() + __bpf_address_lookup() into kernel/kallsyms.c
>>> and also have ftrace use that.
>>> 
>>> Because currently the ftrace stuff is otherwise invisible.
>>> 
>>> A generic kallsym register/unregister for any JIT.
>> 
>> I guess this is _not_ a requirement for this patchset? BPF program has
>> special data (id, sub_id, tag) that we need PERF_RECORD_BPF_EVENT. So 
>> this patchset should be orthogonal to the generic kallsym framework?
> 
> Well, it is a question of ABI. I don't like mixing the kallsym updates
> with the BPF updates.

I have always been thinking of the two as one update: "mapping this BPF
program to this ksym".

On the other hand, if we really want to separate the two, I guess we
need two PERF_RECORD_* types:

       /*
        * PERF_RECORD_KSYM_ADD/DEL or MMAP3/MUNMAP3
        *
        * struct {
        *      struct perf_event_header header;
        *      u64                             addr;
        *      u64                             len;
        *      char                            name[];
        *      struct sample_id                sample_id;
        * };
        */

and

       /*
        * PERF_RECORD_BPF_EVENT
        *
        * struct {
        *      struct perf_event_header header;
        *      u32                             type;
        *      u32                             flags;
        *      u32                             id; // prog_id or other id
        *      u32                             sub_id; // subprog id
        *
        *      // for bpf_prog types, bpf prog or subprog
        *      u8                              tag[BPF_TAG_SIZE];
        *      struct sample_id                sample_id;
        * };
        */

In this case, PERF_RECORD_BPF_EVENT is only needed when the user wants
annotation. When annotation is needed, the kernel will generate both
records for each BPF program load/unload. Then, user space will do
some work to match the two.

Personally, I think this is not as clean as the current version, but
it would work.

Would you recommend we go in this direction?

Thanks,
Song




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-12 18:33         ` Steven Rostedt
  2018-12-12 18:58           ` Song Liu
@ 2018-12-13 18:45           ` Peter Zijlstra
  2018-12-13 21:48             ` Song Liu
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2018-12-13 18:45 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Song Liu, lkml, netdev, ast, daniel, Kernel Team, acme

On Wed, Dec 12, 2018 at 01:33:20PM -0500, Steven Rostedt wrote:
> On Wed, 12 Dec 2018 19:05:53 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, Dec 12, 2018 at 05:09:17PM +0000, Song Liu wrote:
> > > > And while this tracks the bpf kallsyms, it does not do all kallsyms.
> > > > 
> > > > .... Oooh, I see the problem, everybody is doing their own custom
> > > > kallsym_{add,del}() thing, instead of having that in generic code :-(
> > > > 
> > > > This, for example, doesn't track module load/unload nor ftrace
> > > > trampolines, even though both affect kallsyms.  
> > > 
> > > I think we can use PERF_RECORD_MMAP(or MMAP2) for module load/unload. 
> > > That could be separate sets of patches.   
> > 
> > So I would actually like to move bpf_lock/bpf_kallsyms/bpf_tree +
> > bpf_prog_kallsyms_*() + __bpf_address_lookup() into kernel/kallsyms.c
> > and also have ftrace use that.
> > 
> > Because currently the ftrace stuff is otherwise invisible.
> > 
> > A generic kallsym register/unregister for any JIT.
> 
> That's if it needs to look up the symbols that were recorded when init
> was unloaded.
> 
> The ftrace kallsyms is used to save the function names of init code
> that was freed, but may have been recorded. With out the ftrace
> kallsyms the functions traced at init time would just show up as hex
> addresses (not very useful).
> 
> I'm not sure how BPF would need those symbols unless they were executed
> during init (module or core) and needed to see what the symbols use to
> be).

Aah, that sounds entirely dodgy and possibly quite broken. We freed that
init code, so BPF or your trampolines (or a tiny module) could actually
fit in there and insert their own kallsyms, and then we have overlapping
symbols, which would be pretty bad.

I thought the ftrace kallsym stuff was for the trampolines, which would
be fairly similar to what BPF is doing. That's why I'm trying to get a
generic dynamic kallsym thing sorted. There's bound to be other
code-gen things at some point.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-13 18:45           ` Peter Zijlstra
@ 2018-12-13 21:48             ` Song Liu
  2018-12-14 13:48               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 22+ messages in thread
From: Song Liu @ 2018-12-13 21:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, lkml, netdev, ast, daniel, Kernel Team, acme



> On Dec 13, 2018, at 10:45 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Wed, Dec 12, 2018 at 01:33:20PM -0500, Steven Rostedt wrote:
>> On Wed, 12 Dec 2018 19:05:53 +0100
>> Peter Zijlstra <peterz@infradead.org> wrote:
>> 
>>> On Wed, Dec 12, 2018 at 05:09:17PM +0000, Song Liu wrote:
>>>>> And while this tracks the bpf kallsyms, it does not do all kallsyms.
>>>>> 
>>>>> .... Oooh, I see the problem, everybody is doing their own custom
>>>>> kallsym_{add,del}() thing, instead of having that in generic code :-(
>>>>> 
>>>>> This, for example, doesn't track module load/unload nor ftrace
>>>>> trampolines, even though both affect kallsyms.  
>>>> 
>>>> I think we can use PERF_RECORD_MMAP(or MMAP2) for module load/unload. 
>>>> That could be separate sets of patches.   
>>> 
>>> So I would actually like to move bpf_lock/bpf_kallsyms/bpf_tree +
>>> bpf_prog_kallsyms_*() + __bpf_address_lookup() into kernel/kallsyms.c
>>> and also have ftrace use that.
>>> 
>>> Because currently the ftrace stuff is otherwise invisible.
>>> 
>>> A generic kallsym register/unregister for any JIT.
>> 
>> That's if it needs to look up the symbols that were recorded when init
>> was unloaded.
>> 
>> The ftrace kallsyms is used to save the function names of init code
>> that was freed, but may have been recorded. With out the ftrace
>> kallsyms the functions traced at init time would just show up as hex
>> addresses (not very useful).
>> 
>> I'm not sure how BPF would need those symbols unless they were executed
>> during init (module or core) and needed to see what the symbols use to
>> be).
> 
> Aah, that sounds entirely dodgy and possibly quite broken. We freed that
> init code, so BPF or your trampolines (or a tiny module) could actually
> fit in there and insert their own kallsyms, and then we have overlapping
> symbols, which would be pretty bad.
> 
> I thought the ftrace kallsym stuff was for the trampolines, which would
> be fairly similar to what BPF is doing. And why I'm trying to get a
> generic dynamic kallsym thing sorted. There's bound the be other
> code-gen things at some point.

Hi Peter, 

I guess you are looking for something for all ksym add/delete events, like:

      /*
       * PERF_RECORD_KSYMBOL
       *
       * struct {
       *      struct perf_event_header header;
       *      u64                             addr;
       *      u32                             len;
       *      u16                             ksym_type;
       *      u16                             flags;
       *      char                            name[];
       *      struct sample_id                sample_id;
       * };
       */

We can use ksym_type to encode BPF_EVENT, trampolines, or other types of ksyms.
We can use flags or header.misc to encode ksym add/delete. Is this right?

If we go this direction, shall we reserve a few more bytes in it for different
types to use, like:

      /*
       * PERF_RECORD_KSYMBOL
       *
       * struct {
       *      struct perf_event_header header;
       *      u64                             addr;
       *      u32                             len;
       *      u16                             ksym_type;
       *      u16                             flags;
       *      u64                             data[2];
       *      char                            name[];
       *      struct sample_id                sample_id;
       * };
       */

Thanks,
Song



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-13 21:48             ` Song Liu
@ 2018-12-14 13:48               ` Arnaldo Carvalho de Melo
  2018-12-14 17:10                 ` Song Liu
  2018-12-17 15:48                 ` Peter Zijlstra
  0 siblings, 2 replies; 22+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-12-14 13:48 UTC (permalink / raw)
  To: Song Liu
  Cc: Peter Zijlstra, Steven Rostedt, lkml, netdev, ast, daniel, Kernel Team

Em Thu, Dec 13, 2018 at 09:48:57PM +0000, Song Liu escreveu:
> 
> 
> > On Dec 13, 2018, at 10:45 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > On Wed, Dec 12, 2018 at 01:33:20PM -0500, Steven Rostedt wrote:
> >> On Wed, 12 Dec 2018 19:05:53 +0100
> >> Peter Zijlstra <peterz@infradead.org> wrote:
> >> 
> >>> On Wed, Dec 12, 2018 at 05:09:17PM +0000, Song Liu wrote:
> >>>>> And while this tracks the bpf kallsyms, it does not do all kallsyms.
> >>>>> 
> >>>>> .... Oooh, I see the problem, everybody is doing their own custom
> >>>>> kallsym_{add,del}() thing, instead of having that in generic code :-(
> >>>>> 
> >>>>> This, for example, doesn't track module load/unload nor ftrace
> >>>>> trampolines, even though both affect kallsyms.  
> >>>> 
> >>>> I think we can use PERF_RECORD_MMAP(or MMAP2) for module load/unload. 
> >>>> That could be separate sets of patches.   
> >>> 
> >>> So I would actually like to move bpf_lock/bpf_kallsyms/bpf_tree +
> >>> bpf_prog_kallsyms_*() + __bpf_address_lookup() into kernel/kallsyms.c
> >>> and also have ftrace use that.
> >>> 
> >>> Because currently the ftrace stuff is otherwise invisible.
> >>> 
> >>> A generic kallsym register/unregister for any JIT.
> >> 
> >> That's if it needs to look up the symbols that were recorded when init
> >> was unloaded.
> >> 
> >> The ftrace kallsyms is used to save the function names of init code
> >> that was freed, but may have been recorded. With out the ftrace
> >> kallsyms the functions traced at init time would just show up as hex
> >> addresses (not very useful).
> >> 
> >> I'm not sure how BPF would need those symbols unless they were executed
> >> during init (module or core) and needed to see what the symbols use to
> >> be).
> > 
> > Aah, that sounds entirely dodgy and possibly quite broken. We freed that
> > init code, so BPF or your trampolines (or a tiny module) could actually
> > fit in there and insert their own kallsyms, and then we have overlapping
> > symbols, which would be pretty bad.
> > 
> > I thought the ftrace kallsym stuff was for the trampolines, which would
> > be fairly similar to what BPF is doing. And why I'm trying to get a
> > generic dynamic kallsym thing sorted. There's bound the be other
> > code-gen things at some point.
> 
> Hi Peter, 
> 
> I guess you are looking for something for all ksym add/delete events, like;
> 
>       /*
>        * PERF_RECORD_KSYMBOL
>        *
>        * struct {
>        *      struct perf_event_header header;
>        *      u64                             addr;
>        *      u32                             len;
>        *      u16                             ksym_type;
>        *      u16                             flags;
>        *      char                            name[];
>        *      struct sample_id                sample_id;
>        * };
>        */

Can't this reuse PERF_RECORD_MMAP2 with some bit in the header to mean
that the name is the symbol name, not a path to some ELF/whatever? The
ksym type could be encoded in the prot field, PROT_EXEC for functions,
PROT_READ for read only data, PROT_WRITE for rw data.

If we do it that way older tools will show the DSO name and an
unresolved symbol, and even an indication if its a function or data,
which is better than not showing anything when processing a new
PERF_RECORD_KSYMBOL.

New tools, seeing the perf_event_attr.header bit will know that this is
a "map" with just one symbol and will show that for both DSO name and
symbol.
 
> We can use ksym_type to encode BPF_EVENT, trampolines, or other type of ksym.
> We can use flags or header.misc to encode ksym add/delete. Is this right?
> 
> If we go this direction, shall we reserve a few more bytes in it for different
> types to use, like:
> 
>       /*
>        * PERF_RECORD_KSYMBOL
>        *
>        * struct {
>        *      struct perf_event_header header;
>        *      u64                             addr;
>        *      u32                             len;
>        *      u16                             ksym_type;
>        *      u16                             flags;
>        *      u64                             data[2];
>        *      char                            name[];
>        *      struct sample_id                sample_id;
>        * };
>        */
> 
> Thanks,
> Song
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-14 13:48               ` Arnaldo Carvalho de Melo
@ 2018-12-14 17:10                 ` Song Liu
  2018-12-17 15:48                 ` Peter Zijlstra
  1 sibling, 0 replies; 22+ messages in thread
From: Song Liu @ 2018-12-14 17:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Steven Rostedt, lkml, netdev, ast, daniel, Kernel Team



> On Dec 14, 2018, at 5:48 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> Em Thu, Dec 13, 2018 at 09:48:57PM +0000, Song Liu escreveu:
>> 
>> 
>>> On Dec 13, 2018, at 10:45 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> 
>>> On Wed, Dec 12, 2018 at 01:33:20PM -0500, Steven Rostedt wrote:
>>>> On Wed, 12 Dec 2018 19:05:53 +0100
>>>> Peter Zijlstra <peterz@infradead.org> wrote:
>>>> 
>>>>> On Wed, Dec 12, 2018 at 05:09:17PM +0000, Song Liu wrote:
>>>>>>> And while this tracks the bpf kallsyms, it does not do all kallsyms.
>>>>>>> 
>>>>>>> .... Oooh, I see the problem, everybody is doing their own custom
>>>>>>> kallsym_{add,del}() thing, instead of having that in generic code :-(
>>>>>>> 
>>>>>>> This, for example, doesn't track module load/unload nor ftrace
>>>>>>> trampolines, even though both affect kallsyms.  
>>>>>> 
>>>>>> I think we can use PERF_RECORD_MMAP(or MMAP2) for module load/unload. 
>>>>>> That could be separate sets of patches.   
>>>>> 
>>>>> So I would actually like to move bpf_lock/bpf_kallsyms/bpf_tree +
>>>>> bpf_prog_kallsyms_*() + __bpf_address_lookup() into kernel/kallsyms.c
>>>>> and also have ftrace use that.
>>>>> 
>>>>> Because currently the ftrace stuff is otherwise invisible.
>>>>> 
>>>>> A generic kallsym register/unregister for any JIT.
>>>> 
>>>> That's if it needs to look up the symbols that were recorded when init
>>>> was unloaded.
>>>> 
>>>> The ftrace kallsyms is used to save the function names of init code
>>>> that was freed, but may have been recorded. With out the ftrace
>>>> kallsyms the functions traced at init time would just show up as hex
>>>> addresses (not very useful).
>>>> 
>>>> I'm not sure how BPF would need those symbols unless they were executed
>>>> during init (module or core) and needed to see what the symbols use to
>>>> be).
>>> 
>>> Aah, that sounds entirely dodgy and possibly quite broken. We freed that
>>> init code, so BPF or your trampolines (or a tiny module) could actually
>>> fit in there and insert their own kallsyms, and then we have overlapping
>>> symbols, which would be pretty bad.
>>> 
>>> I thought the ftrace kallsym stuff was for the trampolines, which would
>>> be fairly similar to what BPF is doing. And why I'm trying to get a
>>> generic dynamic kallsym thing sorted. There's bound the be other
>>> code-gen things at some point.
>> 
>> Hi Peter, 
>> 
>> I guess you are looking for something for all ksym add/delete events, like;
>> 
>>      /*
>>       * PERF_RECORD_KSYMBOL
>>       *
>>       * struct {
>>       *      struct perf_event_header header;
>>       *      u64                             addr;
>>       *      u32                             len;
>>       *      u16                             ksym_type;
>>       *      u16                             flags;
>>       *      char                            name[];
>>       *      struct sample_id                sample_id;
>>       * };
>>       */
> 
> Can't this reuse PERF_RECORD_MMAP2 with some bit in the header to mean
> that the name is the symbol name, not a path to some ELF/whatever? The
> ksym type could be encoded in the prot field, PROT_EXEC for functions,
> PROT_READ for read only data, PROT_WRITE for rw data.

Thanks Arnaldo!

I think this works. PERF_RECORD_MMAP2 has many bits in it, so we can
encode a lot of details. We can even have a bit to differentiate map/unmap. 

> 
> If we do it that way older tools will show the DSO name and an
> unresolved symbol, and even an indication if its a function or data,
> which is better than not showing anything when processing a new
> PERF_RECORD_KSYMBOL.

For compatibility, we can use an attr.bpf_event bit (or attr.mmap2_plus)
to turn on/off new variations of PERF_RECORD_MMAP2. Unless the user runs
perf-record and perf-report with different versions of the perf tools, we
should not see weird events. 

> 
> New tools, seeing the perf_event_attr.header bit will know that this is
> a "map" with just one symbol and will show that for both DSO name and
> symbol.
> 

Hi Peter, 

Could you please share your comments/suggestions on Arnaldo's proposal?

Thanks,
Song




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT
  2018-12-14 13:48               ` Arnaldo Carvalho de Melo
  2018-12-14 17:10                 ` Song Liu
@ 2018-12-17 15:48                 ` Peter Zijlstra
  1 sibling, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2018-12-17 15:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Song Liu, Steven Rostedt, lkml, netdev, ast, daniel, Kernel Team

On Fri, Dec 14, 2018 at 10:48:57AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Dec 13, 2018 at 09:48:57PM +0000, Song Liu escreveu:

> > I guess you are looking for something for all ksym add/delete events, like;
> > 
> >       /*
> >        * PERF_RECORD_KSYMBOL
> >        *
> >        * struct {
> >        *      struct perf_event_header header;
> >        *      u64                             addr;
> >        *      u32                             len;
> >        *      u16                             ksym_type;
> >        *      u16                             flags;
> >        *      char                            name[];
> >        *      struct sample_id                sample_id;
> >        * };
> >        */

Yes, something like that.

> Can't this reuse PERF_RECORD_MMAP2 with some bit in the header to mean
> that the name is the symbol name, not a path to some ELF/whatever? The
> ksym type could be encoded in the prot field, PROT_EXEC for functions,
> PROT_READ for read only data, PROT_WRITE for rw data.
> 
> If we do it that way older tools will show the DSO name and an
> unresolved symbol, and even an indication if its a function or data,
> which is better than not showing anything when processing a new
> PERF_RECORD_KSYMBOL.
> 
> New tools, seeing the perf_event_attr.header bit will know that this is
> a "map" with just one symbol and will show that for both DSO name and
> symbol.

That confuses me; the DSO for ksyms is [kernel|$modname] after all. And
BPF would like to have multiple symbols per 'program', so I can imagine
it would want to do something like:

	[bpf-progname1] function1
	[bpf-progname1] function2
	[bpf-progname2] progname2

The first being a bpf proglet with multiple functions, the second a
'legacy' bpf proglet with only a single function.

Trouble is; both PERF_RECORD_KSYM and MMAP* only have a single name[]
field. Now, I suppose we could add:

	char modname[MODULE_NAME_LEN]

or:

	u16 modlen;
	char modname[modlen];

or something along those lines.

Similarly; I would not expect the ftrace trampolines to all have a
different module name.

> > We can use ksym_type to encode BPF_EVENT, trampolines, or other type of ksym.
> > We can use flags or header.misc to encode ksym add/delete. Is this right?
> > 
> > If we go this direction, shall we reserve a few more bytes in it for different
> > types to use, like:
> > 
> >       /*
> >        * PERF_RECORD_KSYMBOL
> >        *
> >        * struct {
> >        *      struct perf_event_header header;
> >        *      u64                             addr;
> >        *      u32                             len;
> >        *      u16                             ksym_type;
> >        *      u16                             flags;
> >        *      u64                             data[2];
> >        *      char                            name[];
> >        *      struct sample_id                sample_id;
> >        * };
> >        */

Right; elsewhere you proposed keeping PERF_RECORD_BPF_EVENT for that,
which I think is clearer.

I think you can keep much of the current patches for that in fact.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-12-17 15:49 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-11 23:33 [PATCH v3 perf, bpf-next 0/4] reveal invisible bpf programs Song Liu
2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 1/4] perf, bpf: Introduce PERF_RECORD_BPF_EVENT Song Liu
2018-12-12 12:06   ` Peter Zijlstra
2018-12-12 16:49     ` Song Liu
2018-12-12 13:15   ` Peter Zijlstra
2018-12-12 13:27     ` Arnaldo Carvalho de Melo
2018-12-12 17:33       ` Song Liu
2018-12-12 17:09     ` Song Liu
2018-12-12 18:05       ` Peter Zijlstra
2018-12-12 18:33         ` Steven Rostedt
2018-12-12 18:58           ` Song Liu
2018-12-13 18:45           ` Peter Zijlstra
2018-12-13 21:48             ` Song Liu
2018-12-14 13:48               ` Arnaldo Carvalho de Melo
2018-12-14 17:10                 ` Song Liu
2018-12-17 15:48                 ` Peter Zijlstra
2018-12-12 18:56         ` Song Liu
2018-12-13 15:25           ` Peter Zijlstra
2018-12-13 16:07             ` Song Liu
2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 2/4] perf: sync tools/include/uapi/linux/perf_event.h Song Liu
2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 3/4] perf util: handle PERF_RECORD_BPF_EVENT Song Liu
2018-12-11 23:33 ` [PATCH v3 perf, bpf-next 4/4] perf tools: synthesize bpf event for loaded BPF programs Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).