linux-kernel.vger.kernel.org archive mirror
* [PATCH v7 0/3] Introduce perf-stat -b for BPF programs
@ 2020-12-29 21:42 Song Liu
  2020-12-29 21:42 ` [PATCH v7 1/3] bpftool: add Makefile target bootstrap Song Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Song Liu @ 2020-12-29 21:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: acme, peterz, mingo, alexander.shishkin, namhyung, mark.rutland,
	jolsa, kernel-team, Song Liu

This set introduces the perf-stat -b option to count events for BPF
programs. It is similar to bpftool-prog-profile, but perf-stat makes it
much more flexible.

Changes v6 => v7
  1. Merge documentation change with feature change. (Arnaldo)
  2. Fix python import perf. (Arnaldo)
  3. Other small fixes. (Namhyung)
  4. Improve allocation/free of the bpf skeleton. Also add assert()s to
     help debug a segfault.

Changes v5 => v6
  1. Update the name for bootstrap bpftool. (Jiri)

Changes v4 => v5:
  1. Add documentation. (Jiri)
  2. Silence make output when removing the .bpf.o file. (Jiri)

Changes v3 => v4:
  1. Split changes in bpftool/Makefile to a separate patch
  2. Various small fixes. (Jiri)

Changes v2 => v3:
  1. Small fixes in Makefile.perf and bpf_counter.c (Jiri)
  2. Rebased on top of bpf-next. This is because 1/2 conflicts with some
     patches in bpftool/Makefile.

Changes PATCH v1 => PATCH v2:
  1. Various fixes in Makefiles. (Jiri)
  2. Fix a build warning/error with gcc-10. (Jiri)

Changes RFC v2 => PATCH v1:
  1. Support counting on multiple BPF programs.
  2. Add BPF handling to target__validate().
  3. Improve Makefile. (Jiri)

Changes RFC v1 => RFC v2:
  1. Use bootstrap version of bpftool. (Jiri)
  2. Set default to not building bpf skeletons. (Jiri)
  3. Remove util/bpf_skel/Makefile, keep all the logic in Makefile.perf.
     (Jiri)
  4. Remove the dependency on vmlinux.h in the two skeletons. The goal here
     is to enable building perf without building the kernel (vmlinux) first.
     Note: I also removed the logic that builds vmlinux.h. We can add that
     back when we have to use it (to access big kernel structures).

Song Liu (3):
  bpftool: add Makefile target bootstrap
  perf: support build BPF skeletons with perf
  perf-stat: enable counting events for BPF programs

 tools/bpf/bpftool/Makefile                    |   2 +
 tools/build/Makefile.feature                  |   4 +-
 tools/perf/Documentation/perf-stat.txt        |  18 ++
 tools/perf/Makefile.config                    |   9 +
 tools/perf/Makefile.perf                      |  49 ++-
 tools/perf/builtin-stat.c                     |  82 ++++-
 tools/perf/util/Build                         |   1 +
 tools/perf/util/bpf_counter.c                 | 297 ++++++++++++++++++
 tools/perf/util/bpf_counter.h                 |  72 +++++
 tools/perf/util/bpf_skel/.gitignore           |   3 +
 .../util/bpf_skel/bpf_prog_profiler.bpf.c     |  93 ++++++
 tools/perf/util/evsel.c                       |   9 +
 tools/perf/util/evsel.h                       |   5 +
 tools/perf/util/python.c                      |  21 ++
 tools/perf/util/stat-display.c                |   4 +-
 tools/perf/util/stat.c                        |   2 +-
 tools/perf/util/target.c                      |  34 +-
 tools/perf/util/target.h                      |  10 +
 tools/scripts/Makefile.include                |   1 +
 19 files changed, 696 insertions(+), 20 deletions(-)
 create mode 100644 tools/perf/util/bpf_counter.c
 create mode 100644 tools/perf/util/bpf_counter.h
 create mode 100644 tools/perf/util/bpf_skel/.gitignore
 create mode 100644 tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c

--
2.24.1

* [PATCH v7 1/3] bpftool: add Makefile target bootstrap
  2020-12-29 21:42 [PATCH v7 0/3] Introduce perf-stat -b for BPF programs Song Liu
@ 2020-12-29 21:42 ` Song Liu
  2020-12-29 21:42 ` [PATCH v7 2/3] perf: support build BPF skeletons with perf Song Liu
  2020-12-29 21:42 ` [PATCH v7 3/3] perf-stat: enable counting events for BPF programs Song Liu
  2 siblings, 0 replies; 19+ messages in thread
From: Song Liu @ 2020-12-29 21:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: acme, peterz, mingo, alexander.shishkin, namhyung, mark.rutland,
	jolsa, kernel-team, Song Liu

This target builds only the bootstrap bpftool, which will be used to
generate BPF skeletons for other tools, such as perf.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 tools/bpf/bpftool/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index f897cb5fb12d0..e3292a3a0c461 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -148,6 +148,8 @@ VMLINUX_BTF_PATHS ?= $(if $(O),$(O)/vmlinux)				\
 		     /boot/vmlinux-$(shell uname -r)
 VMLINUX_BTF ?= $(abspath $(firstword $(wildcard $(VMLINUX_BTF_PATHS))))
 
+bootstrap: $(BPFTOOL_BOOTSTRAP)
+
 ifneq ($(VMLINUX_BTF)$(VMLINUX_H),)
 ifeq ($(feature-clang-bpf-co-re),1)
 
-- 
2.24.1


* [PATCH v7 2/3] perf: support build BPF skeletons with perf
  2020-12-29 21:42 [PATCH v7 0/3] Introduce perf-stat -b for BPF programs Song Liu
  2020-12-29 21:42 ` [PATCH v7 1/3] bpftool: add Makefile target bootstrap Song Liu
@ 2020-12-29 21:42 ` Song Liu
  2020-12-29 21:42 ` [PATCH v7 3/3] perf-stat: enable counting events for BPF programs Song Liu
  2 siblings, 0 replies; 19+ messages in thread
From: Song Liu @ 2020-12-29 21:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: acme, peterz, mingo, alexander.shishkin, namhyung, mark.rutland,
	jolsa, kernel-team, Song Liu

BPF programs are useful in perf for profiling other BPF programs, and a
BPF skeleton is by far the easiest way to write such BPF tools. Enable
building BPF skeletons in util/bpf_skel. A dummy BPF skeleton is added
here; more skeletons will be added for different use cases.

Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 tools/build/Makefile.feature        |  4 ++-
 tools/perf/Makefile.config          |  9 ++++++
 tools/perf/Makefile.perf            | 49 +++++++++++++++++++++++++++--
 tools/perf/util/bpf_skel/.gitignore |  3 ++
 tools/scripts/Makefile.include      |  1 +
 5 files changed, 63 insertions(+), 3 deletions(-)
 create mode 100644 tools/perf/util/bpf_skel/.gitignore

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 97cbfb31b7625..74e255d58d8d0 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -99,7 +99,9 @@ FEATURE_TESTS_EXTRA :=                  \
          clang                          \
          libbpf                         \
          libpfm4                        \
-         libdebuginfod
+         libdebuginfod			\
+         clang-bpf-co-re
+
 
 FEATURE_TESTS ?= $(FEATURE_TESTS_BASIC)
 
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index ce8516e4de34f..d8e59d31399a5 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -621,6 +621,15 @@ ifndef NO_LIBBPF
   endif
 endif
 
+ifdef BUILD_BPF_SKEL
+  $(call feature_check,clang-bpf-co-re)
+  ifeq ($(feature-clang-bpf-co-re), 0)
+    dummy := $(error Error: clang too old. Please install recent clang)
+  endif
+  $(call detected,CONFIG_PERF_BPF_SKEL)
+  CFLAGS += -DHAVE_BPF_SKEL
+endif
+
 dwarf-post-unwind := 1
 dwarf-post-unwind-text := BUG
 
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 62f3deb1d3a8b..d182a2dbb9bbd 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -126,6 +126,8 @@ include ../scripts/utilities.mak
 #
 # Define NO_LIBDEBUGINFOD if you do not want support debuginfod
 #
+# Define BUILD_BPF_SKEL to enable BPF skeletons
+#
 
 # As per kernel Makefile, avoid funny character set dependencies
 unexport LC_ALL
@@ -175,6 +177,12 @@ endef
 
 LD += $(EXTRA_LDFLAGS)
 
+HOSTCC  ?= gcc
+HOSTLD  ?= ld
+HOSTAR  ?= ar
+CLANG   ?= clang
+LLVM_STRIP ?= llvm-strip
+
 PKG_CONFIG = $(CROSS_COMPILE)pkg-config
 LLVM_CONFIG ?= llvm-config
 
@@ -731,7 +739,8 @@ prepare: $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h archheaders $(drm_ioc
 	$(x86_arch_prctl_code_array) \
 	$(rename_flags_array) \
 	$(arch_errno_name_array) \
-	$(sync_file_range_arrays)
+	$(sync_file_range_arrays) \
+	bpf-skel
 
 $(OUTPUT)%.o: %.c prepare FORCE
 	$(Q)$(MAKE) -f $(srctree)/tools/build/Makefile.build dir=$(build-dir) $@
@@ -1004,7 +1013,43 @@ config-clean:
 python-clean:
 	$(python-clean)
 
-clean:: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBPERF)-clean config-clean fixdep-clean python-clean
+SKEL_OUT := $(abspath $(OUTPUT)util/bpf_skel)
+SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp)
+SKELETONS :=
+
+ifdef BUILD_BPF_SKEL
+BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
+LIBBPF_SRC := $(abspath ../lib/bpf)
+BPF_INCLUDE := -I$(SKEL_TMP_OUT)/.. -I$(BPF_PATH) -I$(LIBBPF_SRC)/..
+
+$(SKEL_TMP_OUT):
+	$(Q)$(MKDIR) -p $@
+
+$(BPFTOOL): | $(SKEL_TMP_OUT)
+	CFLAGS= $(MAKE) -C ../bpf/bpftool \
+		OUTPUT=$(SKEL_TMP_OUT)/ bootstrap
+
+$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) | $(SKEL_TMP_OUT)
+	$(QUIET_CLANG)$(CLANG) -g -O2 -target bpf $(BPF_INCLUDE) \
+	  -c $(filter util/bpf_skel/%.bpf.c,$^) -o $@ && $(LLVM_STRIP) -g $@
+
+$(SKEL_OUT)/%.skel.h: $(SKEL_TMP_OUT)/%.bpf.o | $(BPFTOOL)
+	$(QUIET_GENSKEL)$(BPFTOOL) gen skeleton $< > $@
+
+bpf-skel: $(SKELETONS)
+
+.PRECIOUS: $(SKEL_TMP_OUT)/%.bpf.o
+
+else # BUILD_BPF_SKEL
+
+bpf-skel:
+
+endif # BUILD_BPF_SKEL
+
+bpf-skel-clean:
+	$(call QUIET_CLEAN, bpf-skel) $(RM) -r $(SKEL_TMP_OUT) $(SKELETONS)
+
+clean:: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBPERF)-clean config-clean fixdep-clean python-clean bpf-skel-clean
 	$(call QUIET_CLEAN, core-objs)  $(RM) $(LIBPERF_A) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(LANG_BINDINGS)
 	$(Q)find $(if $(OUTPUT),$(OUTPUT),.) -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
 	$(Q)$(RM) $(OUTPUT).config-detected
diff --git a/tools/perf/util/bpf_skel/.gitignore b/tools/perf/util/bpf_skel/.gitignore
new file mode 100644
index 0000000000000..5263e9e6c5d83
--- /dev/null
+++ b/tools/perf/util/bpf_skel/.gitignore
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+.tmp
+*.skel.h
\ No newline at end of file
diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include
index 1358e89cdf7d6..62119ce69ad9a 100644
--- a/tools/scripts/Makefile.include
+++ b/tools/scripts/Makefile.include
@@ -127,6 +127,7 @@ ifneq ($(silent),1)
 			 $(MAKE) $(PRINT_DIR) -C $$subdir
 	QUIET_FLEX     = @echo '  FLEX     '$@;
 	QUIET_BISON    = @echo '  BISON    '$@;
+	QUIET_GENSKEL  = @echo '  GEN-SKEL '$@;
 
 	descend = \
 		+@echo	       '  DESCEND  '$(1); \
-- 
2.24.1


* [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2020-12-29 21:42 [PATCH v7 0/3] Introduce perf-stat -b for BPF programs Song Liu
  2020-12-29 21:42 ` [PATCH v7 1/3] bpftool: add Makefile target bootstrap Song Liu
  2020-12-29 21:42 ` [PATCH v7 2/3] perf: support build BPF skeletons with perf Song Liu
@ 2020-12-29 21:42 ` Song Liu
  2021-01-12  7:35   ` Namhyung Kim
  2021-01-18 19:38   ` Arnaldo Carvalho de Melo
  2 siblings, 2 replies; 19+ messages in thread
From: Song Liu @ 2020-12-29 21:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: acme, peterz, mingo, alexander.shishkin, namhyung, mark.rutland,
	jolsa, kernel-team, Song Liu

Introduce perf-stat -b option, which counts events for BPF programs, like:

[root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
     1.487903822            115,200      ref-cycles
     1.487903822             86,012      cycles
     2.489147029             80,560      ref-cycles
     2.489147029             73,784      cycles
     3.490341825             60,720      ref-cycles
     3.490341825             37,797      cycles
     4.491540887             37,120      ref-cycles
     4.491540887             31,963      cycles

The example above counts the cycles and ref-cycles of the BPF program
with id 254. This is similar to the bpftool-prog-profile command, but
more flexible.

perf-stat -b creates per-cpu perf_events and loads fentry/fexit BPF
programs (monitor-progs) onto the target BPF program (target-prog). The
monitor-progs read the perf_event before and after the target-prog runs,
and aggregate the difference in a BPF map. User space then reads the
data from these maps.

A new struct bpf_counter is introduced to provide a common interface
that uses BPF programs/maps to count perf events.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 tools/perf/Documentation/perf-stat.txt        |  18 +
 tools/perf/Makefile.perf                      |   2 +-
 tools/perf/builtin-stat.c                     |  82 ++++-
 tools/perf/util/Build                         |   1 +
 tools/perf/util/bpf_counter.c                 | 309 ++++++++++++++++++
 tools/perf/util/bpf_counter.h                 |  72 ++++
 .../util/bpf_skel/bpf_prog_profiler.bpf.c     |  93 ++++++
 tools/perf/util/evsel.c                       |   9 +
 tools/perf/util/evsel.h                       |   5 +
 tools/perf/util/python.c                      |  21 ++
 tools/perf/util/stat-display.c                |   4 +-
 tools/perf/util/stat.c                        |   2 +-
 tools/perf/util/target.c                      |  34 +-
 tools/perf/util/target.h                      |  10 +
 14 files changed, 644 insertions(+), 18 deletions(-)
 create mode 100644 tools/perf/util/bpf_counter.c
 create mode 100644 tools/perf/util/bpf_counter.h
 create mode 100644 tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 5d4a673d7621a..98a4dfd3b2dc9 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -75,6 +75,24 @@ report::
 --tid=<tid>::
         stat events on existing thread id (comma separated list)
 
+-b::
+--bpf-prog::
+        stat events on existing bpf program id (comma separated list),
+        requiring root rights. bpftool-prog can be used to list the
+        program ids of all bpf programs in the system. For example:
+
+  # bpftool prog | head -n 1
+  17247: tracepoint  name sys_enter  tag 192d548b9d754067  gpl
+
+  # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000
+
+   Performance counter stats for 'BPF program(s) 17247':
+
+             85,967      cycles
+             28,982      instructions              #    0.34  insn per cycle
+
+        1.102235068 seconds time elapsed
+
 ifdef::HAVE_LIBPFM[]
 --pfm-events events::
 Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net)
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index d182a2dbb9bbd..8c4e039c3b813 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -1015,7 +1015,7 @@ python-clean:
 
 SKEL_OUT := $(abspath $(OUTPUT)util/bpf_skel)
 SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp)
-SKELETONS :=
+SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
 
 ifdef BUILD_BPF_SKEL
 BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 8cc24967bc273..3c054b8d4677c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -67,6 +67,7 @@
 #include "util/top.h"
 #include "util/affinity.h"
 #include "util/pfm.h"
+#include "util/bpf_counter.h"
 #include "asm/bug.h"
 
 #include <linux/time64.h>
@@ -409,12 +410,32 @@ static int read_affinity_counters(struct timespec *rs)
 	return 0;
 }
 
+static int read_bpf_map_counters(void)
+{
+	struct evsel *counter;
+	int err;
+
+	evlist__for_each_entry(evsel_list, counter) {
+		err = bpf_counter__read(counter);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
 static void read_counters(struct timespec *rs)
 {
 	struct evsel *counter;
+	int err;
 
-	if (!stat_config.stop_read_counter && (read_affinity_counters(rs) < 0))
-		return;
+	if (!stat_config.stop_read_counter) {
+		if (target__has_bpf(&target))
+			err = read_bpf_map_counters();
+		else
+			err = read_affinity_counters(rs);
+		if (err < 0)
+			return;
+	}
 
 	evlist__for_each_entry(evsel_list, counter) {
 		if (counter->err)
@@ -496,11 +517,22 @@ static bool handle_interval(unsigned int interval, int *times)
 	return false;
 }
 
-static void enable_counters(void)
+static int enable_counters(void)
 {
+	struct evsel *evsel;
+	int err;
+
+	if (target__has_bpf(&target)) {
+		evlist__for_each_entry(evsel_list, evsel) {
+			err = bpf_counter__enable(evsel);
+			if (err)
+				return err;
+		}
+	}
+
 	if (stat_config.initial_delay < 0) {
 		pr_info(EVLIST_DISABLED_MSG);
-		return;
+		return 0;
 	}
 
 	if (stat_config.initial_delay > 0) {
@@ -518,6 +550,7 @@ static void enable_counters(void)
 		if (stat_config.initial_delay > 0)
 			pr_info(EVLIST_ENABLED_MSG);
 	}
+	return 0;
 }
 
 static void disable_counters(void)
@@ -720,7 +753,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	const bool forks = (argc > 0);
 	bool is_pipe = STAT_RECORD ? perf_stat.data.is_pipe : false;
 	struct affinity affinity;
-	int i, cpu;
+	int i, cpu, err;
 	bool second_pass = false;
 
 	if (forks) {
@@ -737,6 +770,13 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	if (affinity__setup(&affinity) < 0)
 		return -1;
 
+	if (target__has_bpf(&target)) {
+		evlist__for_each_entry(evsel_list, counter) {
+			if (bpf_counter__load(counter, &target))
+				return -1;
+		}
+	}
+
 	evlist__for_each_cpu (evsel_list, i, cpu) {
 		affinity__set(&affinity, cpu);
 
@@ -850,7 +890,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	}
 
 	if (STAT_RECORD) {
-		int err, fd = perf_data__fd(&perf_stat.data);
+		int fd = perf_data__fd(&perf_stat.data);
 
 		if (is_pipe) {
 			err = perf_header__write_pipe(perf_data__fd(&perf_stat.data));
@@ -876,7 +916,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 
 	if (forks) {
 		evlist__start_workload(evsel_list);
-		enable_counters();
+		err = enable_counters();
+		if (err)
+			return -1;
 
 		if (interval || timeout || evlist__ctlfd_initialized(evsel_list))
 			status = dispatch_events(forks, timeout, interval, &times);
@@ -895,7 +937,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 		if (WIFSIGNALED(status))
 			psignal(WTERMSIG(status), argv[0]);
 	} else {
-		enable_counters();
+		err = enable_counters();
+		if (err)
+			return -1;
 		status = dispatch_events(forks, timeout, interval, &times);
 	}
 
@@ -1085,6 +1129,10 @@ static struct option stat_options[] = {
 		   "stat events on existing process id"),
 	OPT_STRING('t', "tid", &target.tid, "tid",
 		   "stat events on existing thread id"),
+#ifdef HAVE_BPF_SKEL
+	OPT_STRING('b', "bpf-prog", &target.bpf_str, "bpf-prog-id",
+		   "stat events on existing bpf program id"),
+#endif
 	OPT_BOOLEAN('a', "all-cpus", &target.system_wide,
 		    "system-wide collection from all CPUs"),
 	OPT_BOOLEAN('g', "group", &group,
@@ -2064,11 +2112,12 @@ int cmd_stat(int argc, const char **argv)
 		"perf stat [<options>] [<command>]",
 		NULL
 	};
-	int status = -EINVAL, run_idx;
+	int status = -EINVAL, run_idx, err;
 	const char *mode;
 	FILE *output = stderr;
 	unsigned int interval, timeout;
 	const char * const stat_subcommands[] = { "record", "report" };
+	char errbuf[BUFSIZ];
 
 	setlocale(LC_ALL, "");
 
@@ -2179,6 +2228,12 @@ int cmd_stat(int argc, const char **argv)
 	} else if (big_num_opt == 0) /* User passed --no-big-num */
 		stat_config.big_num = false;
 
+	err = target__validate(&target);
+	if (err) {
+		target__strerror(&target, err, errbuf, BUFSIZ);
+		pr_warning("%s\n", errbuf);
+	}
+
 	setup_system_wide(argc);
 
 	/*
@@ -2252,8 +2307,6 @@ int cmd_stat(int argc, const char **argv)
 		}
 	}
 
-	target__validate(&target);
-
 	if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide))
 		target.per_thread = true;
 
@@ -2384,9 +2437,10 @@ int cmd_stat(int argc, const char **argv)
 		 * tools remain  -acme
 		 */
 		int fd = perf_data__fd(&perf_stat.data);
-		int err = perf_event__synthesize_kernel_mmap((void *)&perf_stat,
-							     process_synthesized_event,
-							     &perf_stat.session->machines.host);
+
+		err = perf_event__synthesize_kernel_mmap((void *)&perf_stat,
+							 process_synthesized_event,
+							 &perf_stat.session->machines.host);
 		if (err) {
 			pr_warning("Couldn't synthesize the kernel mmap record, harmless, "
 				   "older tools may produce warnings about this file\n.");
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index e2563d0154eb6..188521f343470 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -135,6 +135,7 @@ perf-y += clockid.o
 
 perf-$(CONFIG_LIBBPF) += bpf-loader.o
 perf-$(CONFIG_LIBBPF) += bpf_map.o
+perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter.o
 perf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o
 perf-$(CONFIG_LIBELF) += symbol-elf.o
 perf-$(CONFIG_LIBELF) += probe-file.o
diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
new file mode 100644
index 0000000000000..8c977f038f497
--- /dev/null
+++ b/tools/perf/util/bpf_counter.c
@@ -0,0 +1,309 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Copyright (c) 2019 Facebook */
+
+#include <assert.h>
+#include <limits.h>
+#include <unistd.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <linux/err.h>
+#include <linux/zalloc.h>
+#include <bpf/bpf.h>
+#include <bpf/btf.h>
+#include <bpf/libbpf.h>
+
+#include "bpf_counter.h"
+#include "counts.h"
+#include "debug.h"
+#include "evsel.h"
+#include "target.h"
+
+#include "bpf_skel/bpf_prog_profiler.skel.h"
+
+static inline void *u64_to_ptr(__u64 ptr)
+{
+	return (void *)(unsigned long)ptr;
+}
+
+static void set_max_rlimit(void)
+{
+	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
+
+	setrlimit(RLIMIT_MEMLOCK, &rinf);
+}
+
+static struct bpf_counter *bpf_counter_alloc(void)
+{
+	struct bpf_counter *counter;
+
+	counter = zalloc(sizeof(*counter));
+	if (counter)
+		INIT_LIST_HEAD(&counter->list);
+	return counter;
+}
+
+static int bpf_program_profiler__destroy(struct evsel *evsel)
+{
+	struct bpf_counter *counter, *tmp;
+
+	list_for_each_entry_safe(counter, tmp,
+				 &evsel->bpf_counter_list, list) {
+		list_del_init(&counter->list);
+		bpf_prog_profiler_bpf__destroy(counter->skel);
+		free(counter);
+	}
+	assert(list_empty(&evsel->bpf_counter_list));
+
+	return 0;
+}
+
+static char *bpf_target_prog_name(int tgt_fd)
+{
+	struct bpf_prog_info_linear *info_linear;
+	struct bpf_func_info *func_info;
+	const struct btf_type *t;
+	char *name = NULL;
+	struct btf *btf;
+
+	info_linear = bpf_program__get_prog_info_linear(
+		tgt_fd, 1UL << BPF_PROG_INFO_FUNC_INFO);
+	if (IS_ERR_OR_NULL(info_linear)) {
+		pr_debug("failed to get info_linear for prog FD %d\n", tgt_fd);
+		return NULL;
+	}
+
+	if (info_linear->info.btf_id == 0 ||
+	    btf__get_from_id(info_linear->info.btf_id, &btf)) {
+		pr_debug("prog FD %d doesn't have valid btf\n", tgt_fd);
+		goto out;
+	}
+
+	func_info = u64_to_ptr(info_linear->info.func_info);
+	t = btf__type_by_id(btf, func_info[0].type_id);
+	if (!t) {
+		pr_debug("btf %d doesn't have type %d\n",
+			 info_linear->info.btf_id, func_info[0].type_id);
+		goto out;
+	}
+	name = strdup(btf__name_by_offset(btf, t->name_off));
+out:
+	free(info_linear);
+	return name;
+}
+
+static int bpf_program_profiler_load_one(struct evsel *evsel, u32 prog_id)
+{
+	struct bpf_prog_profiler_bpf *skel;
+	struct bpf_counter *counter;
+	struct bpf_program *prog;
+	char *prog_name;
+	int prog_fd;
+	int err;
+
+	prog_fd = bpf_prog_get_fd_by_id(prog_id);
+	if (prog_fd < 0) {
+		pr_err("Failed to open fd for bpf prog %u\n", prog_id);
+		return -1;
+	}
+	counter = bpf_counter_alloc();
+	if (!counter) {
+		close(prog_fd);
+		return -1;
+	}
+
+	skel = bpf_prog_profiler_bpf__open();
+	if (!skel) {
+		pr_err("Failed to open bpf skeleton\n");
+		goto err_out;
+	}
+
+	skel->rodata->num_cpu = evsel__nr_cpus(evsel);
+
+	bpf_map__resize(skel->maps.events, evsel__nr_cpus(evsel));
+	bpf_map__resize(skel->maps.fentry_readings, 1);
+	bpf_map__resize(skel->maps.accum_readings, 1);
+
+	prog_name = bpf_target_prog_name(prog_fd);
+	if (!prog_name) {
+		pr_err("Failed to get program name for bpf prog %u. Does it have BTF?\n", prog_id);
+		goto err_out;
+	}
+
+	bpf_object__for_each_program(prog, skel->obj) {
+		err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
+		if (err) {
+			pr_err("bpf_program__set_attach_target failed.\n"
+			       "Does bpf prog %u have BTF?\n", prog_id);
+			goto err_out;
+		}
+	}
+	set_max_rlimit();
+	err = bpf_prog_profiler_bpf__load(skel);
+	if (err) {
+		pr_err("bpf_prog_profiler_bpf__load failed\n");
+		goto err_out;
+	}
+
+	assert(skel != NULL);
+	counter->skel = skel;
+	list_add(&counter->list, &evsel->bpf_counter_list);
+	close(prog_fd);
+	return 0;
+err_out:
+	bpf_prog_profiler_bpf__destroy(skel);
+	free(counter);
+	close(prog_fd);
+	return -1;
+}
+
+static int bpf_program_profiler__load(struct evsel *evsel, struct target *target)
+{
+	char *bpf_str, *bpf_str_, *tok, *saveptr = NULL, *p;
+	u32 prog_id;
+	int ret;
+
+	bpf_str_ = bpf_str = strdup(target->bpf_str);
+	if (!bpf_str)
+		return -1;
+
+	while ((tok = strtok_r(bpf_str, ",", &saveptr)) != NULL) {
+		prog_id = strtoul(tok, &p, 10);
+		if (prog_id == 0 || prog_id == UINT_MAX ||
+		    (*p != '\0' && *p != ',')) {
+			pr_err("Failed to parse bpf prog ids %s\n",
+			       target->bpf_str);
+			return -1;
+		}
+
+		ret = bpf_program_profiler_load_one(evsel, prog_id);
+		if (ret) {
+			bpf_program_profiler__destroy(evsel);
+			free(bpf_str_);
+			return -1;
+		}
+		bpf_str = NULL;
+	}
+	free(bpf_str_);
+	return 0;
+}
+
+static int bpf_program_profiler__enable(struct evsel *evsel)
+{
+	struct bpf_counter *counter;
+	int ret;
+
+	list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
+		assert(counter->skel != NULL);
+		ret = bpf_prog_profiler_bpf__attach(counter->skel);
+		if (ret) {
+			bpf_program_profiler__destroy(evsel);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int bpf_program_profiler__read(struct evsel *evsel)
+{
+	int num_cpu = evsel__nr_cpus(evsel);
+	struct bpf_perf_event_value values[num_cpu];
+	struct bpf_counter *counter;
+	int reading_map_fd;
+	__u32 key = 0;
+	int err, cpu;
+
+	if (list_empty(&evsel->bpf_counter_list))
+		return -EAGAIN;
+
+	for (cpu = 0; cpu < num_cpu; cpu++) {
+		perf_counts(evsel->counts, cpu, 0)->val = 0;
+		perf_counts(evsel->counts, cpu, 0)->ena = 0;
+		perf_counts(evsel->counts, cpu, 0)->run = 0;
+	}
+	list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
+		struct bpf_prog_profiler_bpf *skel = counter->skel;
+
+		assert(skel != NULL);
+		reading_map_fd = bpf_map__fd(skel->maps.accum_readings);
+
+		err = bpf_map_lookup_elem(reading_map_fd, &key, values);
+		if (err) {
+			pr_err("failed to read value\n");
+			return err;
+		}
+
+		for (cpu = 0; cpu < num_cpu; cpu++) {
+			perf_counts(evsel->counts, cpu, 0)->val += values[cpu].counter;
+			perf_counts(evsel->counts, cpu, 0)->ena += values[cpu].enabled;
+			perf_counts(evsel->counts, cpu, 0)->run += values[cpu].running;
+		}
+	}
+	return 0;
+}
+
+static int bpf_program_profiler__install_pe(struct evsel *evsel, int cpu,
+					    int fd)
+{
+	struct bpf_prog_profiler_bpf *skel;
+	struct bpf_counter *counter;
+	int ret;
+
+	list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
+		skel = counter->skel;
+		assert(skel != NULL);
+
+		ret = bpf_map_update_elem(bpf_map__fd(skel->maps.events),
+					  &cpu, &fd, BPF_ANY);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+struct bpf_counter_ops bpf_program_profiler_ops = {
+	.load       = bpf_program_profiler__load,
+	.enable	    = bpf_program_profiler__enable,
+	.read       = bpf_program_profiler__read,
+	.destroy    = bpf_program_profiler__destroy,
+	.install_pe = bpf_program_profiler__install_pe,
+};
+
+int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd)
+{
+	if (list_empty(&evsel->bpf_counter_list))
+		return 0;
+	return evsel->bpf_counter_ops->install_pe(evsel, cpu, fd);
+}
+
+int bpf_counter__load(struct evsel *evsel, struct target *target)
+{
+	if (target__has_bpf(target))
+		evsel->bpf_counter_ops = &bpf_program_profiler_ops;
+
+	if (evsel->bpf_counter_ops)
+		return evsel->bpf_counter_ops->load(evsel, target);
+	return 0;
+}
+
+int bpf_counter__enable(struct evsel *evsel)
+{
+	if (list_empty(&evsel->bpf_counter_list))
+		return 0;
+	return evsel->bpf_counter_ops->enable(evsel);
+}
+
+int bpf_counter__read(struct evsel *evsel)
+{
+	if (list_empty(&evsel->bpf_counter_list))
+		return -EAGAIN;
+	return evsel->bpf_counter_ops->read(evsel);
+}
+
+void bpf_counter__destroy(struct evsel *evsel)
+{
+	if (list_empty(&evsel->bpf_counter_list))
+		return;
+	evsel->bpf_counter_ops->destroy(evsel);
+	evsel->bpf_counter_ops = NULL;
+}
diff --git a/tools/perf/util/bpf_counter.h b/tools/perf/util/bpf_counter.h
new file mode 100644
index 0000000000000..2eca210e5dc16
--- /dev/null
+++ b/tools/perf/util/bpf_counter.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_BPF_COUNTER_H
+#define __PERF_BPF_COUNTER_H 1
+
+#include <linux/list.h>
+
+struct evsel;
+struct target;
+struct bpf_counter;
+
+typedef int (*bpf_counter_evsel_op)(struct evsel *evsel);
+typedef int (*bpf_counter_evsel_target_op)(struct evsel *evsel,
+					   struct target *target);
+typedef int (*bpf_counter_evsel_install_pe_op)(struct evsel *evsel,
+					       int cpu,
+					       int fd);
+
+struct bpf_counter_ops {
+	bpf_counter_evsel_target_op load;
+	bpf_counter_evsel_op enable;
+	bpf_counter_evsel_op read;
+	bpf_counter_evsel_op destroy;
+	bpf_counter_evsel_install_pe_op install_pe;
+};
+
+struct bpf_counter {
+	void *skel;
+	struct list_head list;
+};
+
+#ifdef HAVE_BPF_SKEL
+
+int bpf_counter__load(struct evsel *evsel, struct target *target);
+int bpf_counter__enable(struct evsel *evsel);
+int bpf_counter__read(struct evsel *evsel);
+void bpf_counter__destroy(struct evsel *evsel);
+int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd);
+
+#else /* HAVE_BPF_SKEL */
+
+#include <linux/err.h>
+
+static inline int bpf_counter__load(struct evsel *evsel __maybe_unused,
+				    struct target *target __maybe_unused)
+{
+	return 0;
+}
+
+static inline int bpf_counter__enable(struct evsel *evsel __maybe_unused)
+{
+	return 0;
+}
+
+static inline int bpf_counter__read(struct evsel *evsel __maybe_unused)
+{
+	return -EAGAIN;
+}
+
+static inline void bpf_counter__destroy(struct evsel *evsel __maybe_unused)
+{
+}
+
+static inline int bpf_counter__install_pe(struct evsel *evsel __maybe_unused,
+					  int cpu __maybe_unused,
+					  int fd __maybe_unused)
+{
+	return 0;
+}
+
+#endif /* HAVE_BPF_SKEL */
+
+#endif /* __PERF_BPF_COUNTER_H */
diff --git a/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c b/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c
new file mode 100644
index 0000000000000..c7cec92d02360
--- /dev/null
+++ b/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+// Copyright (c) 2020 Facebook
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+/* map of perf event fds, num_cpu * num_metric entries */
+struct {
+	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(int));
+} events SEC(".maps");
+
+/* readings at fentry */
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct bpf_perf_event_value));
+	__uint(max_entries, 1);
+} fentry_readings SEC(".maps");
+
+/* accumulated readings */
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct bpf_perf_event_value));
+	__uint(max_entries, 1);
+} accum_readings SEC(".maps");
+
+const volatile __u32 num_cpu = 1;
+
+SEC("fentry/XXX")
+int BPF_PROG(fentry_XXX)
+{
+	__u32 key = bpf_get_smp_processor_id();
+	struct bpf_perf_event_value *ptr;
+	__u32 zero = 0;
+	long err;
+
+	/* look up before reading, to reduce error */
+	ptr = bpf_map_lookup_elem(&fentry_readings, &zero);
+	if (!ptr)
+		return 0;
+
+	err = bpf_perf_event_read_value(&events, key, ptr, sizeof(*ptr));
+	if (err)
+		return 0;
+
+	return 0;
+}
+
+static inline void
+fexit_update_maps(struct bpf_perf_event_value *after)
+{
+	struct bpf_perf_event_value *before, diff;
+	__u32 zero = 0;
+
+	before = bpf_map_lookup_elem(&fentry_readings, &zero);
+	/* only account samples with a valid fentry_reading */
+	if (before && before->counter) {
+		struct bpf_perf_event_value *accum;
+
+		diff.counter = after->counter - before->counter;
+		diff.enabled = after->enabled - before->enabled;
+		diff.running = after->running - before->running;
+
+		accum = bpf_map_lookup_elem(&accum_readings, &zero);
+		if (accum) {
+			accum->counter += diff.counter;
+			accum->enabled += diff.enabled;
+			accum->running += diff.running;
+		}
+	}
+}
+
+SEC("fexit/XXX")
+int BPF_PROG(fexit_XXX)
+{
+	struct bpf_perf_event_value reading;
+	__u32 cpu = bpf_get_smp_processor_id();
+	int err;
+
+	/* read all events before updating the maps, to reduce error */
+	err = bpf_perf_event_read_value(&events, cpu, &reading, sizeof(reading));
+	if (err)
+		return 0;
+
+	fexit_update_maps(&reading);
+	return 0;
+}
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index dc0cfa5f26101..7a5ca00496000 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -25,6 +25,7 @@
 #include <stdlib.h>
 #include <perf/evsel.h>
 #include "asm/bug.h"
+#include "bpf_counter.h"
 #include "callchain.h"
 #include "cgroup.h"
 #include "counts.h"
@@ -51,6 +52,10 @@
 #include <internal/lib.h>
 
 #include <linux/ctype.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#include <bpf/btf.h>
+#include "rlimit.h"
 
 struct perf_missing_features perf_missing_features;
 
@@ -247,6 +252,7 @@ void evsel__init(struct evsel *evsel,
 	evsel->bpf_obj	   = NULL;
 	evsel->bpf_fd	   = -1;
 	INIT_LIST_HEAD(&evsel->config_terms);
+	INIT_LIST_HEAD(&evsel->bpf_counter_list);
 	perf_evsel__object.init(evsel);
 	evsel->sample_size = __evsel__sample_size(attr->sample_type);
 	evsel__calc_id_pos(evsel);
@@ -1368,6 +1374,7 @@ void evsel__exit(struct evsel *evsel)
 {
 	assert(list_empty(&evsel->core.node));
 	assert(evsel->evlist == NULL);
+	bpf_counter__destroy(evsel);
 	evsel__free_counts(evsel);
 	perf_evsel__free_fd(&evsel->core);
 	perf_evsel__free_id(&evsel->core);
@@ -1783,6 +1790,8 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
 
 			FD(evsel, cpu, thread) = fd;
 
+			bpf_counter__install_pe(evsel, cpu, fd);
+
 			if (unlikely(test_attr__enabled)) {
 				test_attr__open(&evsel->core.attr, pid, cpus->map[cpu],
 						fd, group_fd, flags);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index cd1d8dd431997..8226b1fefa8cf 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -17,6 +17,8 @@ struct cgroup;
 struct perf_counts;
 struct perf_stat_evsel;
 union perf_event;
+struct bpf_counter_ops;
+struct target;
 
 typedef int (evsel__sb_cb_t)(union perf_event *event, void *data);
 
@@ -127,6 +129,8 @@ struct evsel {
 	 * See also evsel__has_callchain().
 	 */
 	__u64			synth_sample_type;
+	struct list_head	bpf_counter_list;
+	struct bpf_counter_ops	*bpf_counter_ops;
 };
 
 struct perf_missing_features {
@@ -424,4 +428,5 @@ static inline bool evsel__is_dummy_event(struct evsel *evsel)
 struct perf_env *evsel__env(struct evsel *evsel);
 
 int evsel__store_ids(struct evsel *evsel, struct evlist *evlist);
+
 #endif /* __PERF_EVSEL_H */
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index cc5ade85a33fc..278abecb5bdfc 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -79,6 +79,27 @@ int metricgroup__copy_metric_events(struct evlist *evlist, struct cgroup *cgrp,
 	return 0;
 }
 
+/*
+ * XXX: All these evsel destructors need some better mechanism, like a linked
+ * list of destructors registered when the relevant code indeed is used instead
+ * of having more and more calls in perf_evsel__delete(). -- acme
+ *
+ * For now, add some more:
+ *
+ * Not to drag the BPF bandwagon...
+ */
+void bpf_counter__destroy(struct evsel *evsel);
+int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd);
+
+void bpf_counter__destroy(struct evsel *evsel __maybe_unused)
+{
+}
+
+int bpf_counter__install_pe(struct evsel *evsel __maybe_unused, int cpu __maybe_unused, int fd __maybe_unused)
+{
+	return 0;
+}
+
 /*
  * Support debug printing even though util/debug.c is not linked.  That means
  * implementing 'verbose' and 'eprintf'.
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 583ae4f09c5d1..cce7a76d6473c 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -1045,7 +1045,9 @@ static void print_header(struct perf_stat_config *config,
 	if (!config->csv_output) {
 		fprintf(output, "\n");
 		fprintf(output, " Performance counter stats for ");
-		if (_target->system_wide)
+		if (_target->bpf_str)
+			fprintf(output, "\'BPF program(s) %s", _target->bpf_str);
+		else if (_target->system_wide)
 			fprintf(output, "\'system wide");
 		else if (_target->cpu_list)
 			fprintf(output, "\'CPU(s) %s", _target->cpu_list);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 8ce1479c98f03..0b3957323f668 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -527,7 +527,7 @@ int create_perf_stat_counter(struct evsel *evsel,
 	if (leader->core.nr_members > 1)
 		attr->read_format |= PERF_FORMAT_ID|PERF_FORMAT_GROUP;
 
-	attr->inherit = !config->no_inherit;
+	attr->inherit = !config->no_inherit && list_empty(&evsel->bpf_counter_list);
 
 	/*
 	 * Some events get initialized with sample_(period/type) set,
diff --git a/tools/perf/util/target.c b/tools/perf/util/target.c
index a3db13dea937c..0f383418e3df5 100644
--- a/tools/perf/util/target.c
+++ b/tools/perf/util/target.c
@@ -56,6 +56,34 @@ enum target_errno target__validate(struct target *target)
 			ret = TARGET_ERRNO__UID_OVERRIDE_SYSTEM;
 	}
 
+	/* BPF and CPU are mutually exclusive */
+	if (target->bpf_str && target->cpu_list) {
+		target->cpu_list = NULL;
+		if (ret == TARGET_ERRNO__SUCCESS)
+			ret = TARGET_ERRNO__BPF_OVERRIDE_CPU;
+	}
+
+	/* BPF and PID/TID are mutually exclusive */
+	if (target->bpf_str && target->tid) {
+		target->tid = NULL;
+		if (ret == TARGET_ERRNO__SUCCESS)
+			ret = TARGET_ERRNO__BPF_OVERRIDE_PID;
+	}
+
+	/* BPF and UID are mutually exclusive */
+	if (target->bpf_str && target->uid_str) {
+		target->uid_str = NULL;
+		if (ret == TARGET_ERRNO__SUCCESS)
+			ret = TARGET_ERRNO__BPF_OVERRIDE_UID;
+	}
+
+	/* BPF and THREADS are mutually exclusive */
+	if (target->bpf_str && target->per_thread) {
+		target->per_thread = false;
+		if (ret == TARGET_ERRNO__SUCCESS)
+			ret = TARGET_ERRNO__BPF_OVERRIDE_THREAD;
+	}
+
 	/* THREAD and SYSTEM/CPU are mutually exclusive */
 	if (target->per_thread && (target->system_wide || target->cpu_list)) {
 		target->per_thread = false;
@@ -109,6 +137,10 @@ static const char *target__error_str[] = {
 	"PID/TID switch overriding SYSTEM",
 	"UID switch overriding SYSTEM",
 	"SYSTEM/CPU switch overriding PER-THREAD",
+	"BPF switch overriding CPU",
+	"BPF switch overriding PID/TID",
+	"BPF switch overriding UID",
+	"BPF switch overriding THREAD",
 	"Invalid User: %s",
 	"Problems obtaining information for user %s",
 };
@@ -134,7 +166,7 @@ int target__strerror(struct target *target, int errnum,
 
 	switch (errnum) {
 	case TARGET_ERRNO__PID_OVERRIDE_CPU ...
-	     TARGET_ERRNO__SYSTEM_OVERRIDE_THREAD:
+	     TARGET_ERRNO__BPF_OVERRIDE_THREAD:
 		snprintf(buf, buflen, "%s", msg);
 		break;
 
diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
index 6ef01a83b24e9..f132c6c2eef81 100644
--- a/tools/perf/util/target.h
+++ b/tools/perf/util/target.h
@@ -10,6 +10,7 @@ struct target {
 	const char   *tid;
 	const char   *cpu_list;
 	const char   *uid_str;
+	const char   *bpf_str;
 	uid_t	     uid;
 	bool	     system_wide;
 	bool	     uses_mmap;
@@ -36,6 +37,10 @@ enum target_errno {
 	TARGET_ERRNO__PID_OVERRIDE_SYSTEM,
 	TARGET_ERRNO__UID_OVERRIDE_SYSTEM,
 	TARGET_ERRNO__SYSTEM_OVERRIDE_THREAD,
+	TARGET_ERRNO__BPF_OVERRIDE_CPU,
+	TARGET_ERRNO__BPF_OVERRIDE_PID,
+	TARGET_ERRNO__BPF_OVERRIDE_UID,
+	TARGET_ERRNO__BPF_OVERRIDE_THREAD,
 
 	/* for target__parse_uid() */
 	TARGET_ERRNO__INVALID_UID,
@@ -59,6 +64,11 @@ static inline bool target__has_cpu(struct target *target)
 	return target->system_wide || target->cpu_list;
 }
 
+static inline bool target__has_bpf(struct target *target)
+{
+	return target->bpf_str;
+}
+
 static inline bool target__none(struct target *target)
 {
 	return !target__has_task(target) && !target__has_cpu(target);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2020-12-29 21:42 ` [PATCH v7 3/3] perf-stat: enable counting events for BPF programs Song Liu
@ 2021-01-12  7:35   ` Namhyung Kim
  2021-01-15 18:53     ` Arnaldo Carvalho de Melo
  2021-01-18 19:38   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 19+ messages in thread
From: Namhyung Kim @ 2021-01-12  7:35 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Mark Rutland, Jiri Olsa,
	kernel-team

Hi,

On Wed, Dec 30, 2020 at 6:42 AM Song Liu <songliubraving@fb.com> wrote:
>
> Introduce perf-stat -b option, which counts events for BPF programs, like:
>
> [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
>      1.487903822            115,200      ref-cycles
>      1.487903822             86,012      cycles
>      2.489147029             80,560      ref-cycles
>      2.489147029             73,784      cycles
>      3.490341825             60,720      ref-cycles
>      3.490341825             37,797      cycles
>      4.491540887             37,120      ref-cycles
>      4.491540887             31,963      cycles
>
> The example above counts cycles and ref-cycles of the BPF program with id 254.
> This is similar to the bpftool-prog-profile command, but more flexible.
>
> perf-stat -b creates per-cpu perf_event and loads fentry/fexit BPF
> programs (monitor-progs) to the target BPF program (target-prog). The
> monitor-progs read perf_event before and after the target-prog, and
> aggregate the difference in a BPF map. Then the user space reads data
> from these maps.
>
> A new struct bpf_counter is introduced to provide a common interface that
> uses BPF programs/maps to count perf events.

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2021-01-12  7:35   ` Namhyung Kim
@ 2021-01-15 18:53     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-01-15 18:53 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Song Liu, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Alexander Shishkin, Mark Rutland, Jiri Olsa, kernel-team

Em Tue, Jan 12, 2021 at 04:35:14PM +0900, Namhyung Kim escreveu:
> Hi,
> 
> On Wed, Dec 30, 2020 at 6:42 AM Song Liu <songliubraving@fb.com> wrote:
> >
> > Introduce perf-stat -b option, which counts events for BPF programs, like:
> >
> > [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
> >      1.487903822            115,200      ref-cycles
> >      1.487903822             86,012      cycles
> >      2.489147029             80,560      ref-cycles
> >      2.489147029             73,784      cycles
> >      3.490341825             60,720      ref-cycles
> >      3.490341825             37,797      cycles
> >      4.491540887             37,120      ref-cycles
> >      4.491540887             31,963      cycles
> >
> > The example above counts cycles and ref-cycles of the BPF program with id 254.
> > This is similar to the bpftool-prog-profile command, but more flexible.
> >
> > perf-stat -b creates per-cpu perf_event and loads fentry/fexit BPF
> > programs (monitor-progs) to the target BPF program (target-prog). The
> > monitor-progs read perf_event before and after the target-prog, and
> > aggregate the difference in a BPF map. Then the user space reads data
> > from these maps.
> >
> > A new struct bpf_counter is introduced to provide a common interface that
> > uses BPF programs/maps to count perf events.
> 
> Acked-by: Namhyung Kim <namhyung@kernel.org>

Applied locally, testing now.

- Arnaldo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2020-12-29 21:42 ` [PATCH v7 3/3] perf-stat: enable counting events for BPF programs Song Liu
  2021-01-12  7:35   ` Namhyung Kim
@ 2021-01-18 19:38   ` Arnaldo Carvalho de Melo
  2021-01-19  0:48     ` Song Liu
  1 sibling, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-01-18 19:38 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, peterz, mingo, alexander.shishkin, namhyung,
	mark.rutland, jolsa, kernel-team

Em Tue, Dec 29, 2020 at 01:42:14PM -0800, Song Liu escreveu:
> Introduce perf-stat -b option, which counts events for BPF programs, like:
> 
> [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
>      1.487903822            115,200      ref-cycles
>      1.487903822             86,012      cycles
>      2.489147029             80,560      ref-cycles
>      2.489147029             73,784      cycles
>      3.490341825             60,720      ref-cycles
>      3.490341825             37,797      cycles
>      4.491540887             37,120      ref-cycles
>      4.491540887             31,963      cycles
> 
> The example above counts cycles and ref-cycles of the BPF program with id 254.
> This is similar to the bpftool-prog-profile command, but more flexible.

So trying to get back to this after vacation:

[root@five ~]# perf stat -vvv -e cycles -b 103
Control descriptor is not initialized
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: load bpf program failed: Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Cannot recursively attach
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

libbpf: -- END LOG --
libbpf: failed to load program 'fentry_XXX'
libbpf: failed to load object 'bpf_prog_profiler_bpf'
libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
bpf_prog_profiler_bpf__load failed
[root@five ~]#

[root@five perf]# tools/bpf/runqslower/.output/
bpftool/    libbpf/     runqslower
[root@five perf]# tools/bpf/runqslower/.output/runqslower
libbpf: elf: skipping unrecognized data section(10) .eh_frame
libbpf: elf: skipping relo section(17) .rel.eh_frame for section(10) .eh_frame
Tracing run queue latency higher than 10000 us
TIME     COMM             PID           LAT(us)

[root@five ~]# bpftool prog
6: tracing  name dump_bpf_map  tag e54c922dfa54f65f  gpl
	loaded_at 2021-01-15T11:14:38-0300  uid 0
	xlated 256B  jited 154B  memlock 4096B  map_ids 3
	btf_id 6
8: tracing  name dump_bpf_prog  tag 4cb5558b902ee92e  gpl
	loaded_at 2021-01-15T11:14:38-0300  uid 0
	xlated 520B  jited 301B  memlock 4096B  map_ids 3
	btf_id 6
80: cgroup_device  tag 8b9c33f36f812014  gpl
	loaded_at 2021-01-17T18:35:22-0300  uid 0
	xlated 744B  jited 447B  memlock 4096B
81: cgroup_skb  tag 6deef7357e7b4530  gpl
	loaded_at 2021-01-17T18:35:22-0300  uid 0
	xlated 64B  jited 54B  memlock 4096B
82: cgroup_skb  tag 6deef7357e7b4530  gpl
	loaded_at 2021-01-17T18:35:22-0300  uid 0
	xlated 64B  jited 54B  memlock 4096B
83: cgroup_device  tag ee0e253c78993a24  gpl
	loaded_at 2021-01-17T18:35:22-0300  uid 0
	xlated 416B  jited 255B  memlock 4096B
84: cgroup_device  tag e3dbd137be8d6168  gpl
	loaded_at 2021-01-17T18:35:22-0300  uid 0
	xlated 504B  jited 309B  memlock 4096B
85: cgroup_skb  tag 6deef7357e7b4530  gpl
	loaded_at 2021-01-17T18:35:22-0300  uid 0
	xlated 64B  jited 54B  memlock 4096B
86: cgroup_skb  tag 6deef7357e7b4530  gpl
	loaded_at 2021-01-17T18:35:22-0300  uid 0
	xlated 64B  jited 54B  memlock 4096B
87: cgroup_device  tag c8b47a902f1cc68b  gpl
	loaded_at 2021-01-17T18:35:22-0300  uid 0
	xlated 464B  jited 288B  memlock 4096B
88: cgroup_device  tag 0ecd07b7b633809f  gpl
	loaded_at 2021-01-17T18:35:22-0300  uid 0
	xlated 496B  jited 307B  memlock 4096B
89: cgroup_skb  tag 6deef7357e7b4530  gpl
	loaded_at 2021-01-17T18:35:22-0300  uid 0
	xlated 64B  jited 54B  memlock 4096B
90: cgroup_skb  tag 6deef7357e7b4530  gpl
	loaded_at 2021-01-17T18:35:22-0300  uid 0
	xlated 64B  jited 54B  memlock 4096B
91: cgroup_skb  tag 6deef7357e7b4530  gpl
	loaded_at 2021-01-17T18:35:22-0300  uid 0
	xlated 64B  jited 54B  memlock 4096B
92: cgroup_skb  tag 6deef7357e7b4530  gpl
	loaded_at 2021-01-17T18:35:22-0300  uid 0
	xlated 64B  jited 54B  memlock 4096B
100: tracing  name handle__sched_w  tag 00ba1f1d332f5a57  gpl
	loaded_at 2021-01-18T16:34:00-0300  uid 0
	xlated 152B  jited 93B  memlock 4096B  map_ids 11,9
	btf_id 248
102: tracing  name handle__sched_w  tag 00ba1f1d332f5a57  gpl
	loaded_at 2021-01-18T16:34:00-0300  uid 0
	xlated 152B  jited 93B  memlock 4096B  map_ids 11,9
	btf_id 248
103: tracing  name handle__sched_s  tag a4ac42153fd49be2  gpl
	loaded_at 2021-01-18T16:34:00-0300  uid 0
	xlated 560B  jited 332B  memlock 4096B  map_ids 11,9,10
	btf_id 248
[root@five ~]# perf stat -vvv -e cycles -b 92
Control descriptor is not initialized
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
prog FD 3 doesn't have valid btf
Failed to get program name for bpf prog 92. Does it have BTF?
[root@five ~]# perf stat -vvv -e cycles -b 103
Control descriptor is not initialized
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: load bpf program failed: Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Cannot recursively attach
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

libbpf: -- END LOG --
libbpf: failed to load program 'fentry_XXX'
libbpf: failed to load object 'bpf_prog_profiler_bpf'
libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
bpf_prog_profiler_bpf__load failed
[root@five ~]# perf stat -vvv -e cycles -b 102
Control descriptor is not initialized
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: load bpf program failed: Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Cannot recursively attach
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

libbpf: -- END LOG --
libbpf: failed to load program 'fentry_XXX'
libbpf: failed to load object 'bpf_prog_profiler_bpf'
libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
bpf_prog_profiler_bpf__load failed
[root@five ~]#
[root@five ~]# perf stat -vvv -e cycles -b 6
Control descriptor is not initialized
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: load bpf program failed: Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Cannot recursively attach
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

libbpf: -- END LOG --
libbpf: failed to load program 'fentry_XXX'
libbpf: failed to load object 'bpf_prog_profiler_bpf'
libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
bpf_prog_profiler_bpf__load failed
[root@five ~]#

So no more segfaults, but still something to look at, I'm pushing my
local perf/core to tmp.perf/core, please take a look.

- Arnaldo
 
> perf-stat -b creates per-cpu perf_event and loads fentry/fexit BPF
> programs (monitor-progs) to the target BPF program (target-prog). The
> monitor-progs read perf_event before and after the target-prog, and
> aggregate the difference in a BPF map. Then the user space reads data
> from these maps.
> 
> A new struct bpf_counter is introduced to provide a common interface that
> uses BPF programs/maps to count perf events.
> 
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  tools/perf/Documentation/perf-stat.txt        |  18 +
>  tools/perf/Makefile.perf                      |   2 +-
>  tools/perf/builtin-stat.c                     |  82 ++++-
>  tools/perf/util/Build                         |   1 +
>  tools/perf/util/bpf_counter.c                 | 309 ++++++++++++++++++
>  tools/perf/util/bpf_counter.h                 |  72 ++++
>  .../util/bpf_skel/bpf_prog_profiler.bpf.c     |  93 ++++++
>  tools/perf/util/evsel.c                       |   9 +
>  tools/perf/util/evsel.h                       |   5 +
>  tools/perf/util/python.c                      |  21 ++
>  tools/perf/util/stat-display.c                |   4 +-
>  tools/perf/util/stat.c                        |   2 +-
>  tools/perf/util/target.c                      |  34 +-
>  tools/perf/util/target.h                      |  10 +
>  14 files changed, 644 insertions(+), 18 deletions(-)
>  create mode 100644 tools/perf/util/bpf_counter.c
>  create mode 100644 tools/perf/util/bpf_counter.h
>  create mode 100644 tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c
> 
> diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
> index 5d4a673d7621a..98a4dfd3b2dc9 100644
> --- a/tools/perf/Documentation/perf-stat.txt
> +++ b/tools/perf/Documentation/perf-stat.txt
> @@ -75,6 +75,24 @@ report::
>  --tid=<tid>::
>          stat events on existing thread id (comma separated list)
>  
> +-b::
> +--bpf-prog::
> +        stat events on existing bpf program id (comma separated list),
> +        requiring root rights. bpftool-prog could be used to find the
> +        ids of all bpf programs in the system. For example:
> +
> +  # bpftool prog | head -n 1
> +  17247: tracepoint  name sys_enter  tag 192d548b9d754067  gpl
> +
> +  # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000
> +
> +   Performance counter stats for 'BPF program(s) 17247':
> +
> +             85,967      cycles
> +             28,982      instructions              #    0.34  insn per cycle
> +
> +        1.102235068 seconds time elapsed
> +
>  ifdef::HAVE_LIBPFM[]
>  --pfm-events events::
>  Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net)
> diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> index d182a2dbb9bbd..8c4e039c3b813 100644
> --- a/tools/perf/Makefile.perf
> +++ b/tools/perf/Makefile.perf
> @@ -1015,7 +1015,7 @@ python-clean:
>  
>  SKEL_OUT := $(abspath $(OUTPUT)util/bpf_skel)
>  SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp)
> -SKELETONS :=
> +SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
>  
>  ifdef BUILD_BPF_SKEL
>  BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 8cc24967bc273..3c054b8d4677c 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -67,6 +67,7 @@
>  #include "util/top.h"
>  #include "util/affinity.h"
>  #include "util/pfm.h"
> +#include "util/bpf_counter.h"
>  #include "asm/bug.h"
>  
>  #include <linux/time64.h>
> @@ -409,12 +410,32 @@ static int read_affinity_counters(struct timespec *rs)
>  	return 0;
>  }
>  
> +static int read_bpf_map_counters(void)
> +{
> +	struct evsel *counter;
> +	int err;
> +
> +	evlist__for_each_entry(evsel_list, counter) {
> +		err = bpf_counter__read(counter);
> +		if (err)
> +			return err;
> +	}
> +	return 0;
> +}
> +
>  static void read_counters(struct timespec *rs)
>  {
>  	struct evsel *counter;
> +	int err;
>  
> -	if (!stat_config.stop_read_counter && (read_affinity_counters(rs) < 0))
> -		return;
> +	if (!stat_config.stop_read_counter) {
> +		if (target__has_bpf(&target))
> +			err = read_bpf_map_counters();
> +		else
> +			err = read_affinity_counters(rs);
> +		if (err < 0)
> +			return;
> +	}
>  
>  	evlist__for_each_entry(evsel_list, counter) {
>  		if (counter->err)
> @@ -496,11 +517,22 @@ static bool handle_interval(unsigned int interval, int *times)
>  	return false;
>  }
>  
> -static void enable_counters(void)
> +static int enable_counters(void)
>  {
> +	struct evsel *evsel;
> +	int err;
> +
> +	if (target__has_bpf(&target)) {
> +		evlist__for_each_entry(evsel_list, evsel) {
> +			err = bpf_counter__enable(evsel);
> +			if (err)
> +				return err;
> +		}
> +	}
> +
>  	if (stat_config.initial_delay < 0) {
>  		pr_info(EVLIST_DISABLED_MSG);
> -		return;
> +		return 0;
>  	}
>  
>  	if (stat_config.initial_delay > 0) {
> @@ -518,6 +550,7 @@ static void enable_counters(void)
>  		if (stat_config.initial_delay > 0)
>  			pr_info(EVLIST_ENABLED_MSG);
>  	}
> +	return 0;
>  }
>  
>  static void disable_counters(void)
> @@ -720,7 +753,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  	const bool forks = (argc > 0);
>  	bool is_pipe = STAT_RECORD ? perf_stat.data.is_pipe : false;
>  	struct affinity affinity;
> -	int i, cpu;
> +	int i, cpu, err;
>  	bool second_pass = false;
>  
>  	if (forks) {
> @@ -737,6 +770,13 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  	if (affinity__setup(&affinity) < 0)
>  		return -1;
>  
> +	if (target__has_bpf(&target)) {
> +		evlist__for_each_entry(evsel_list, counter) {
> +			if (bpf_counter__load(counter, &target))
> +				return -1;
> +		}
> +	}
> +
>  	evlist__for_each_cpu (evsel_list, i, cpu) {
>  		affinity__set(&affinity, cpu);
>  
> @@ -850,7 +890,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  	}
>  
>  	if (STAT_RECORD) {
> -		int err, fd = perf_data__fd(&perf_stat.data);
> +		int fd = perf_data__fd(&perf_stat.data);
>  
>  		if (is_pipe) {
>  			err = perf_header__write_pipe(perf_data__fd(&perf_stat.data));
> @@ -876,7 +916,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  
>  	if (forks) {
>  		evlist__start_workload(evsel_list);
> -		enable_counters();
> +		err = enable_counters();
> +		if (err)
> +			return -1;
>  
>  		if (interval || timeout || evlist__ctlfd_initialized(evsel_list))
>  			status = dispatch_events(forks, timeout, interval, &times);
> @@ -895,7 +937,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  		if (WIFSIGNALED(status))
>  			psignal(WTERMSIG(status), argv[0]);
>  	} else {
> -		enable_counters();
> +		err = enable_counters();
> +		if (err)
> +			return -1;
>  		status = dispatch_events(forks, timeout, interval, &times);
>  	}
>  
> @@ -1085,6 +1129,10 @@ static struct option stat_options[] = {
>  		   "stat events on existing process id"),
>  	OPT_STRING('t', "tid", &target.tid, "tid",
>  		   "stat events on existing thread id"),
> +#ifdef HAVE_BPF_SKEL
> +	OPT_STRING('b', "bpf-prog", &target.bpf_str, "bpf-prog-id",
> +		   "stat events on existing bpf program id"),
> +#endif
>  	OPT_BOOLEAN('a', "all-cpus", &target.system_wide,
>  		    "system-wide collection from all CPUs"),
>  	OPT_BOOLEAN('g', "group", &group,
> @@ -2064,11 +2112,12 @@ int cmd_stat(int argc, const char **argv)
>  		"perf stat [<options>] [<command>]",
>  		NULL
>  	};
> -	int status = -EINVAL, run_idx;
> +	int status = -EINVAL, run_idx, err;
>  	const char *mode;
>  	FILE *output = stderr;
>  	unsigned int interval, timeout;
>  	const char * const stat_subcommands[] = { "record", "report" };
> +	char errbuf[BUFSIZ];
>  
>  	setlocale(LC_ALL, "");
>  
> @@ -2179,6 +2228,12 @@ int cmd_stat(int argc, const char **argv)
>  	} else if (big_num_opt == 0) /* User passed --no-big-num */
>  		stat_config.big_num = false;
>  
> +	err = target__validate(&target);
> +	if (err) {
> +		target__strerror(&target, err, errbuf, BUFSIZ);
> +		pr_warning("%s\n", errbuf);
> +	}
> +
>  	setup_system_wide(argc);
>  
>  	/*
> @@ -2252,8 +2307,6 @@ int cmd_stat(int argc, const char **argv)
>  		}
>  	}
>  
> -	target__validate(&target);
> -
>  	if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide))
>  		target.per_thread = true;
>  
> @@ -2384,9 +2437,10 @@ int cmd_stat(int argc, const char **argv)
>  		 * tools remain  -acme
>  		 */
>  		int fd = perf_data__fd(&perf_stat.data);
> -		int err = perf_event__synthesize_kernel_mmap((void *)&perf_stat,
> -							     process_synthesized_event,
> -							     &perf_stat.session->machines.host);
> +
> +		err = perf_event__synthesize_kernel_mmap((void *)&perf_stat,
> +							 process_synthesized_event,
> +							 &perf_stat.session->machines.host);
>  		if (err) {
>  			pr_warning("Couldn't synthesize the kernel mmap record, harmless, "
>  				   "older tools may produce warnings about this file\n.");
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index e2563d0154eb6..188521f343470 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -135,6 +135,7 @@ perf-y += clockid.o
>  
>  perf-$(CONFIG_LIBBPF) += bpf-loader.o
>  perf-$(CONFIG_LIBBPF) += bpf_map.o
> +perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter.o
>  perf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o
>  perf-$(CONFIG_LIBELF) += symbol-elf.o
>  perf-$(CONFIG_LIBELF) += probe-file.o
> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
> new file mode 100644
> index 0000000000000..8c977f038f497
> --- /dev/null
> +++ b/tools/perf/util/bpf_counter.c
> @@ -0,0 +1,309 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/* Copyright (c) 2019 Facebook */
> +
> +#include <assert.h>
> +#include <limits.h>
> +#include <unistd.h>
> +#include <sys/time.h>
> +#include <sys/resource.h>
> +#include <linux/err.h>
> +#include <linux/zalloc.h>
> +#include <bpf/bpf.h>
> +#include <bpf/btf.h>
> +#include <bpf/libbpf.h>
> +
> +#include "bpf_counter.h"
> +#include "counts.h"
> +#include "debug.h"
> +#include "evsel.h"
> +#include "target.h"
> +
> +#include "bpf_skel/bpf_prog_profiler.skel.h"
> +
> +static inline void *u64_to_ptr(__u64 ptr)
> +{
> +	return (void *)(unsigned long)ptr;
> +}
> +
> +static void set_max_rlimit(void)
> +{
> +	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
> +
> +	setrlimit(RLIMIT_MEMLOCK, &rinf);
> +}
> +
> +static struct bpf_counter *bpf_counter_alloc(void)
> +{
> +	struct bpf_counter *counter;
> +
> +	counter = zalloc(sizeof(*counter));
> +	if (counter)
> +		INIT_LIST_HEAD(&counter->list);
> +	return counter;
> +}
> +
> +static int bpf_program_profiler__destroy(struct evsel *evsel)
> +{
> +	struct bpf_counter *counter, *tmp;
> +
> +	list_for_each_entry_safe(counter, tmp,
> +				 &evsel->bpf_counter_list, list) {
> +		list_del_init(&counter->list);
> +		bpf_prog_profiler_bpf__destroy(counter->skel);
> +		free(counter);
> +	}
> +	assert(list_empty(&evsel->bpf_counter_list));
> +
> +	return 0;
> +}
> +
> +static char *bpf_target_prog_name(int tgt_fd)
> +{
> +	struct bpf_prog_info_linear *info_linear;
> +	struct bpf_func_info *func_info;
> +	const struct btf_type *t;
> +	char *name = NULL;
> +	struct btf *btf;
> +
> +	info_linear = bpf_program__get_prog_info_linear(
> +		tgt_fd, 1UL << BPF_PROG_INFO_FUNC_INFO);
> +	if (IS_ERR_OR_NULL(info_linear)) {
> +		pr_debug("failed to get info_linear for prog FD %d\n", tgt_fd);
> +		return NULL;
> +	}
> +
> +	if (info_linear->info.btf_id == 0 ||
> +	    btf__get_from_id(info_linear->info.btf_id, &btf)) {
> +		pr_debug("prog FD %d doesn't have valid btf\n", tgt_fd);
> +		goto out;
> +	}
> +
> +	func_info = u64_to_ptr(info_linear->info.func_info);
> +	t = btf__type_by_id(btf, func_info[0].type_id);
> +	if (!t) {
> +		pr_debug("btf %d doesn't have type %d\n",
> +			 info_linear->info.btf_id, func_info[0].type_id);
> +		goto out;
> +	}
> +	name = strdup(btf__name_by_offset(btf, t->name_off));
> +out:
> +	free(info_linear);
> +	return name;
> +}
> +
> +static int bpf_program_profiler_load_one(struct evsel *evsel, u32 prog_id)
> +{
> +	struct bpf_prog_profiler_bpf *skel;
> +	struct bpf_counter *counter;
> +	struct bpf_program *prog;
> +	char *prog_name;
> +	int prog_fd;
> +	int err;
> +
> +	prog_fd = bpf_prog_get_fd_by_id(prog_id);
> +	if (prog_fd < 0) {
> +		pr_err("Failed to open fd for bpf prog %u\n", prog_id);
> +		return -1;
> +	}
> +	counter = bpf_counter_alloc();
> +	if (!counter) {
> +		close(prog_fd);
> +		return -1;
> +	}
> +
> +	skel = bpf_prog_profiler_bpf__open();
> +	if (!skel) {
> +		pr_err("Failed to open bpf skeleton\n");
> +		goto err_out;
> +	}
> +
> +	skel->rodata->num_cpu = evsel__nr_cpus(evsel);
> +
> +	bpf_map__resize(skel->maps.events, evsel__nr_cpus(evsel));
> +	bpf_map__resize(skel->maps.fentry_readings, 1);
> +	bpf_map__resize(skel->maps.accum_readings, 1);
> +
> +	prog_name = bpf_target_prog_name(prog_fd);
> +	if (!prog_name) {
> +		pr_err("Failed to get program name for bpf prog %u. Does it have BTF?\n", prog_id);
> +		goto err_out;
> +	}
> +
> +	bpf_object__for_each_program(prog, skel->obj) {
> +		err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> +		if (err) {
> +			pr_err("bpf_program__set_attach_target failed.\n"
> +			       "Does bpf prog %u have BTF?\n", prog_id);
> +			goto err_out;
> +		}
> +	}
> +	set_max_rlimit();
> +	err = bpf_prog_profiler_bpf__load(skel);
> +	if (err) {
> +		pr_err("bpf_prog_profiler_bpf__load failed\n");
> +		goto err_out;
> +	}
> +
> +	assert(skel != NULL);
> +	counter->skel = skel;
> +	list_add(&counter->list, &evsel->bpf_counter_list);
> +	close(prog_fd);
> +	return 0;
> +err_out:
> +	bpf_prog_profiler_bpf__destroy(skel);
> +	free(counter);
> +	close(prog_fd);
> +	return -1;
> +}
> +
> +static int bpf_program_profiler__load(struct evsel *evsel, struct target *target)
> +{
> +	char *bpf_str, *bpf_str_, *tok, *saveptr = NULL, *p;
> +	u32 prog_id;
> +	int ret;
> +
> +	bpf_str_ = bpf_str = strdup(target->bpf_str);
> +	if (!bpf_str)
> +		return -1;
> +
> +	while ((tok = strtok_r(bpf_str, ",", &saveptr)) != NULL) {
> +		prog_id = strtoul(tok, &p, 10);
> +		if (prog_id == 0 || prog_id == UINT_MAX ||
> +		    (*p != '\0' && *p != ',')) {
> +			pr_err("Failed to parse bpf prog ids %s\n",
> +			       target->bpf_str);
> +			free(bpf_str_);
> +			return -1;
> +		}
> +
> +		ret = bpf_program_profiler_load_one(evsel, prog_id);
> +		if (ret) {
> +			bpf_program_profiler__destroy(evsel);
> +			free(bpf_str_);
> +			return -1;
> +		}
> +		bpf_str = NULL;
> +	}
> +	free(bpf_str_);
> +	return 0;
> +}
> +
> +static int bpf_program_profiler__enable(struct evsel *evsel)
> +{
> +	struct bpf_counter *counter;
> +	int ret;
> +
> +	list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> +		assert(counter->skel != NULL);
> +		ret = bpf_prog_profiler_bpf__attach(counter->skel);
> +		if (ret) {
> +			bpf_program_profiler__destroy(evsel);
> +			return ret;
> +		}
> +	}
> +	return 0;
> +}
> +
> +static int bpf_program_profiler__read(struct evsel *evsel)
> +{
> +	int num_cpu = evsel__nr_cpus(evsel);
> +	struct bpf_perf_event_value values[num_cpu];
> +	struct bpf_counter *counter;
> +	int reading_map_fd;
> +	__u32 key = 0;
> +	int err, cpu;
> +
> +	if (list_empty(&evsel->bpf_counter_list))
> +		return -EAGAIN;
> +
> +	for (cpu = 0; cpu < num_cpu; cpu++) {
> +		perf_counts(evsel->counts, cpu, 0)->val = 0;
> +		perf_counts(evsel->counts, cpu, 0)->ena = 0;
> +		perf_counts(evsel->counts, cpu, 0)->run = 0;
> +	}
> +	list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> +		struct bpf_prog_profiler_bpf *skel = counter->skel;
> +
> +		assert(skel != NULL);
> +		reading_map_fd = bpf_map__fd(skel->maps.accum_readings);
> +
> +		err = bpf_map_lookup_elem(reading_map_fd, &key, values);
> +		if (err) {
> +			pr_err("failed to read value\n");
> +			return err;
> +		}
> +
> +		for (cpu = 0; cpu < num_cpu; cpu++) {
> +			perf_counts(evsel->counts, cpu, 0)->val += values[cpu].counter;
> +			perf_counts(evsel->counts, cpu, 0)->ena += values[cpu].enabled;
> +			perf_counts(evsel->counts, cpu, 0)->run += values[cpu].running;
> +		}
> +	}
> +	return 0;
> +}
> +
> +static int bpf_program_profiler__install_pe(struct evsel *evsel, int cpu,
> +					    int fd)
> +{
> +	struct bpf_prog_profiler_bpf *skel;
> +	struct bpf_counter *counter;
> +	int ret;
> +
> +	list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> +		skel = counter->skel;
> +		assert(skel != NULL);
> +
> +		ret = bpf_map_update_elem(bpf_map__fd(skel->maps.events),
> +					  &cpu, &fd, BPF_ANY);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +struct bpf_counter_ops bpf_program_profiler_ops = {
> +	.load       = bpf_program_profiler__load,
> +	.enable	    = bpf_program_profiler__enable,
> +	.read       = bpf_program_profiler__read,
> +	.destroy    = bpf_program_profiler__destroy,
> +	.install_pe = bpf_program_profiler__install_pe,
> +};
> +
> +int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd)
> +{
> +	if (list_empty(&evsel->bpf_counter_list))
> +		return 0;
> +	return evsel->bpf_counter_ops->install_pe(evsel, cpu, fd);
> +}
> +
> +int bpf_counter__load(struct evsel *evsel, struct target *target)
> +{
> +	if (target__has_bpf(target))
> +		evsel->bpf_counter_ops = &bpf_program_profiler_ops;
> +
> +	if (evsel->bpf_counter_ops)
> +		return evsel->bpf_counter_ops->load(evsel, target);
> +	return 0;
> +}
> +
> +int bpf_counter__enable(struct evsel *evsel)
> +{
> +	if (list_empty(&evsel->bpf_counter_list))
> +		return 0;
> +	return evsel->bpf_counter_ops->enable(evsel);
> +}
> +
> +int bpf_counter__read(struct evsel *evsel)
> +{
> +	if (list_empty(&evsel->bpf_counter_list))
> +		return -EAGAIN;
> +	return evsel->bpf_counter_ops->read(evsel);
> +}
> +
> +void bpf_counter__destroy(struct evsel *evsel)
> +{
> +	if (list_empty(&evsel->bpf_counter_list))
> +		return;
> +	evsel->bpf_counter_ops->destroy(evsel);
> +	evsel->bpf_counter_ops = NULL;
> +}
> diff --git a/tools/perf/util/bpf_counter.h b/tools/perf/util/bpf_counter.h
> new file mode 100644
> index 0000000000000..2eca210e5dc16
> --- /dev/null
> +++ b/tools/perf/util/bpf_counter.h
> @@ -0,0 +1,72 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __PERF_BPF_COUNTER_H
> +#define __PERF_BPF_COUNTER_H 1
> +
> +#include <linux/list.h>
> +
> +struct evsel;
> +struct target;
> +struct bpf_counter;
> +
> +typedef int (*bpf_counter_evsel_op)(struct evsel *evsel);
> +typedef int (*bpf_counter_evsel_target_op)(struct evsel *evsel,
> +					   struct target *target);
> +typedef int (*bpf_counter_evsel_install_pe_op)(struct evsel *evsel,
> +					       int cpu,
> +					       int fd);
> +
> +struct bpf_counter_ops {
> +	bpf_counter_evsel_target_op load;
> +	bpf_counter_evsel_op enable;
> +	bpf_counter_evsel_op read;
> +	bpf_counter_evsel_op destroy;
> +	bpf_counter_evsel_install_pe_op install_pe;
> +};
> +
> +struct bpf_counter {
> +	void *skel;
> +	struct list_head list;
> +};
> +
> +#ifdef HAVE_BPF_SKEL
> +
> +int bpf_counter__load(struct evsel *evsel, struct target *target);
> +int bpf_counter__enable(struct evsel *evsel);
> +int bpf_counter__read(struct evsel *evsel);
> +void bpf_counter__destroy(struct evsel *evsel);
> +int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd);
> +
> +#else /* HAVE_BPF_SKEL */
> +
> +#include <linux/err.h>
> +
> +static inline int bpf_counter__load(struct evsel *evsel __maybe_unused,
> +				    struct target *target __maybe_unused)
> +{
> +	return 0;
> +}
> +
> +static inline int bpf_counter__enable(struct evsel *evsel __maybe_unused)
> +{
> +	return 0;
> +}
> +
> +static inline int bpf_counter__read(struct evsel *evsel __maybe_unused)
> +{
> +	return -EAGAIN;
> +}
> +
> +static inline void bpf_counter__destroy(struct evsel *evsel __maybe_unused)
> +{
> +}
> +
> +static inline int bpf_counter__install_pe(struct evsel *evsel __maybe_unused,
> +					  int cpu __maybe_unused,
> +					  int fd __maybe_unused)
> +{
> +	return 0;
> +}
> +
> +#endif /* HAVE_BPF_SKEL */
> +
> +#endif /* __PERF_BPF_COUNTER_H */
> diff --git a/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c b/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c
> new file mode 100644
> index 0000000000000..c7cec92d02360
> --- /dev/null
> +++ b/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c
> @@ -0,0 +1,93 @@
> +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +// Copyright (c) 2020 Facebook
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +/* map of perf event fds, num_cpu entries */
> +struct {
> +	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
> +	__uint(key_size, sizeof(__u32));
> +	__uint(value_size, sizeof(int));
> +} events SEC(".maps");
> +
> +/* readings at fentry */
> +struct {
> +	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
> +	__uint(key_size, sizeof(__u32));
> +	__uint(value_size, sizeof(struct bpf_perf_event_value));
> +	__uint(max_entries, 1);
> +} fentry_readings SEC(".maps");
> +
> +/* accumulated readings */
> +struct {
> +	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
> +	__uint(key_size, sizeof(__u32));
> +	__uint(value_size, sizeof(struct bpf_perf_event_value));
> +	__uint(max_entries, 1);
> +} accum_readings SEC(".maps");
> +
> +const volatile __u32 num_cpu = 1;
> +
> +SEC("fentry/XXX")
> +int BPF_PROG(fentry_XXX)
> +{
> +	__u32 key = bpf_get_smp_processor_id();
> +	struct bpf_perf_event_value *ptr;
> +	__u32 zero = 0;
> +	long err;
> +
> +	/* look up the slot before reading, to reduce measurement error */
> +	ptr = bpf_map_lookup_elem(&fentry_readings, &zero);
> +	if (!ptr)
> +		return 0;
> +
> +	err = bpf_perf_event_read_value(&events, key, ptr, sizeof(*ptr));
> +	if (err)
> +		return 0;
> +
> +	return 0;
> +}
> +
> +static inline void
> +fexit_update_maps(struct bpf_perf_event_value *after)
> +{
> +	struct bpf_perf_event_value *before, diff;
> +	__u32 zero = 0;
> +
> +	before = bpf_map_lookup_elem(&fentry_readings, &zero);
> +	/* only account samples with a valid fentry_reading */
> +	if (before && before->counter) {
> +		struct bpf_perf_event_value *accum;
> +
> +		diff.counter = after->counter - before->counter;
> +		diff.enabled = after->enabled - before->enabled;
> +		diff.running = after->running - before->running;
> +
> +		accum = bpf_map_lookup_elem(&accum_readings, &zero);
> +		if (accum) {
> +			accum->counter += diff.counter;
> +			accum->enabled += diff.enabled;
> +			accum->running += diff.running;
> +		}
> +	}
> +}
> +
> +SEC("fexit/XXX")
> +int BPF_PROG(fexit_XXX)
> +{
> +	struct bpf_perf_event_value reading;
> +	__u32 cpu = bpf_get_smp_processor_id();
> +	int err;
> +
> +	/* read all events before updating the maps, to reduce measurement error */
> +	err = bpf_perf_event_read_value(&events, cpu, &reading, sizeof(reading));
> +	if (err)
> +		return 0;
> +
> +	fexit_update_maps(&reading);
> +	return 0;
> +}
> +
> +char LICENSE[] SEC("license") = "Dual BSD/GPL";
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index dc0cfa5f26101..7a5ca00496000 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -25,6 +25,7 @@
>  #include <stdlib.h>
>  #include <perf/evsel.h>
>  #include "asm/bug.h"
> +#include "bpf_counter.h"
>  #include "callchain.h"
>  #include "cgroup.h"
>  #include "counts.h"
> @@ -51,6 +52,10 @@
>  #include <internal/lib.h>
>  
>  #include <linux/ctype.h>
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +#include <bpf/btf.h>
> +#include "rlimit.h"
>  
>  struct perf_missing_features perf_missing_features;
>  
> @@ -247,6 +252,7 @@ void evsel__init(struct evsel *evsel,
>  	evsel->bpf_obj	   = NULL;
>  	evsel->bpf_fd	   = -1;
>  	INIT_LIST_HEAD(&evsel->config_terms);
> +	INIT_LIST_HEAD(&evsel->bpf_counter_list);
>  	perf_evsel__object.init(evsel);
>  	evsel->sample_size = __evsel__sample_size(attr->sample_type);
>  	evsel__calc_id_pos(evsel);
> @@ -1368,6 +1374,7 @@ void evsel__exit(struct evsel *evsel)
>  {
>  	assert(list_empty(&evsel->core.node));
>  	assert(evsel->evlist == NULL);
> +	bpf_counter__destroy(evsel);
>  	evsel__free_counts(evsel);
>  	perf_evsel__free_fd(&evsel->core);
>  	perf_evsel__free_id(&evsel->core);
> @@ -1783,6 +1790,8 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
>  
>  			FD(evsel, cpu, thread) = fd;
>  
> +			bpf_counter__install_pe(evsel, cpu, fd);
> +
>  			if (unlikely(test_attr__enabled)) {
>  				test_attr__open(&evsel->core.attr, pid, cpus->map[cpu],
>  						fd, group_fd, flags);
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index cd1d8dd431997..8226b1fefa8cf 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -17,6 +17,8 @@ struct cgroup;
>  struct perf_counts;
>  struct perf_stat_evsel;
>  union perf_event;
> +struct bpf_counter_ops;
> +struct target;
>  
>  typedef int (evsel__sb_cb_t)(union perf_event *event, void *data);
>  
> @@ -127,6 +129,8 @@ struct evsel {
>  	 * See also evsel__has_callchain().
>  	 */
>  	__u64			synth_sample_type;
> +	struct list_head	bpf_counter_list;
> +	struct bpf_counter_ops	*bpf_counter_ops;
>  };
>  
>  struct perf_missing_features {
> @@ -424,4 +428,5 @@ static inline bool evsel__is_dummy_event(struct evsel *evsel)
>  struct perf_env *evsel__env(struct evsel *evsel);
>  
>  int evsel__store_ids(struct evsel *evsel, struct evlist *evlist);
> +
>  #endif /* __PERF_EVSEL_H */
> diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
> index cc5ade85a33fc..278abecb5bdfc 100644
> --- a/tools/perf/util/python.c
> +++ b/tools/perf/util/python.c
> @@ -79,6 +79,27 @@ int metricgroup__copy_metric_events(struct evlist *evlist, struct cgroup *cgrp,
>  	return 0;
>  }
>  
> +/*
> + * XXX: All these evsel destructors need some better mechanism, like a linked
> + * list of destructors registered when the relevant code is actually used,
> + * instead of having more and more calls in perf_evsel__delete(). -- acme
> + *
> + * For now, add some more:
> + *
> + * Stub these out so as not to drag in the whole BPF machinery:
> + */
> +void bpf_counter__destroy(struct evsel *evsel);
> +int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd);
> +
> +void bpf_counter__destroy(struct evsel *evsel __maybe_unused)
> +{
> +}
> +
> +int bpf_counter__install_pe(struct evsel *evsel __maybe_unused, int cpu __maybe_unused, int fd __maybe_unused)
> +{
> +	return 0;
> +}
> +
>  /*
>   * Support debug printing even though util/debug.c is not linked.  That means
>   * implementing 'verbose' and 'eprintf'.
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 583ae4f09c5d1..cce7a76d6473c 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -1045,7 +1045,9 @@ static void print_header(struct perf_stat_config *config,
>  	if (!config->csv_output) {
>  		fprintf(output, "\n");
>  		fprintf(output, " Performance counter stats for ");
> -		if (_target->system_wide)
> +		if (_target->bpf_str)
> +			fprintf(output, "\'BPF program(s) %s", _target->bpf_str);
> +		else if (_target->system_wide)
>  			fprintf(output, "\'system wide");
>  		else if (_target->cpu_list)
>  			fprintf(output, "\'CPU(s) %s", _target->cpu_list);
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 8ce1479c98f03..0b3957323f668 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -527,7 +527,7 @@ int create_perf_stat_counter(struct evsel *evsel,
>  	if (leader->core.nr_members > 1)
>  		attr->read_format |= PERF_FORMAT_ID|PERF_FORMAT_GROUP;
>  
> -	attr->inherit = !config->no_inherit;
> +	attr->inherit = !config->no_inherit && list_empty(&evsel->bpf_counter_list);
>  
>  	/*
>  	 * Some events get initialized with sample_(period/type) set,
> diff --git a/tools/perf/util/target.c b/tools/perf/util/target.c
> index a3db13dea937c..0f383418e3df5 100644
> --- a/tools/perf/util/target.c
> +++ b/tools/perf/util/target.c
> @@ -56,6 +56,34 @@ enum target_errno target__validate(struct target *target)
>  			ret = TARGET_ERRNO__UID_OVERRIDE_SYSTEM;
>  	}
>  
> +	/* BPF and CPU are mutually exclusive */
> +	if (target->bpf_str && target->cpu_list) {
> +		target->cpu_list = NULL;
> +		if (ret == TARGET_ERRNO__SUCCESS)
> +			ret = TARGET_ERRNO__BPF_OVERRIDE_CPU;
> +	}
> +
> +	/* BPF and PID/TID are mutually exclusive */
> +	if (target->bpf_str && target->tid) {
> +		target->tid = NULL;
> +		if (ret == TARGET_ERRNO__SUCCESS)
> +			ret = TARGET_ERRNO__BPF_OVERRIDE_PID;
> +	}
> +
> +	/* BPF and UID are mutually exclusive */
> +	if (target->bpf_str && target->uid_str) {
> +		target->uid_str = NULL;
> +		if (ret == TARGET_ERRNO__SUCCESS)
> +			ret = TARGET_ERRNO__BPF_OVERRIDE_UID;
> +	}
> +
> +	/* BPF and THREADS are mutually exclusive */
> +	if (target->bpf_str && target->per_thread) {
> +		target->per_thread = false;
> +		if (ret == TARGET_ERRNO__SUCCESS)
> +			ret = TARGET_ERRNO__BPF_OVERRIDE_THREAD;
> +	}
> +
>  	/* THREAD and SYSTEM/CPU are mutually exclusive */
>  	if (target->per_thread && (target->system_wide || target->cpu_list)) {
>  		target->per_thread = false;
> @@ -109,6 +137,10 @@ static const char *target__error_str[] = {
>  	"PID/TID switch overriding SYSTEM",
>  	"UID switch overriding SYSTEM",
>  	"SYSTEM/CPU switch overriding PER-THREAD",
> +	"BPF switch overriding CPU",
> +	"BPF switch overriding PID/TID",
> +	"BPF switch overriding UID",
> +	"BPF switch overriding THREAD",
>  	"Invalid User: %s",
>  	"Problems obtaining information for user %s",
>  };
> @@ -134,7 +166,7 @@ int target__strerror(struct target *target, int errnum,
>  
>  	switch (errnum) {
>  	case TARGET_ERRNO__PID_OVERRIDE_CPU ...
> -	     TARGET_ERRNO__SYSTEM_OVERRIDE_THREAD:
> +	     TARGET_ERRNO__BPF_OVERRIDE_THREAD:
>  		snprintf(buf, buflen, "%s", msg);
>  		break;
>  
> diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
> index 6ef01a83b24e9..f132c6c2eef81 100644
> --- a/tools/perf/util/target.h
> +++ b/tools/perf/util/target.h
> @@ -10,6 +10,7 @@ struct target {
>  	const char   *tid;
>  	const char   *cpu_list;
>  	const char   *uid_str;
> +	const char   *bpf_str;
>  	uid_t	     uid;
>  	bool	     system_wide;
>  	bool	     uses_mmap;
> @@ -36,6 +37,10 @@ enum target_errno {
>  	TARGET_ERRNO__PID_OVERRIDE_SYSTEM,
>  	TARGET_ERRNO__UID_OVERRIDE_SYSTEM,
>  	TARGET_ERRNO__SYSTEM_OVERRIDE_THREAD,
> +	TARGET_ERRNO__BPF_OVERRIDE_CPU,
> +	TARGET_ERRNO__BPF_OVERRIDE_PID,
> +	TARGET_ERRNO__BPF_OVERRIDE_UID,
> +	TARGET_ERRNO__BPF_OVERRIDE_THREAD,
>  
>  	/* for target__parse_uid() */
>  	TARGET_ERRNO__INVALID_UID,
> @@ -59,6 +64,11 @@ static inline bool target__has_cpu(struct target *target)
>  	return target->system_wide || target->cpu_list;
>  }
>  
> +static inline bool target__has_bpf(struct target *target)
> +{
> +	return target->bpf_str;
> +}
> +
>  static inline bool target__none(struct target *target)
>  {
>  	return !target__has_task(target) && !target__has_cpu(target);
> -- 
> 2.24.1
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2021-01-18 19:38   ` Arnaldo Carvalho de Melo
@ 2021-01-19  0:48     ` Song Liu
  2021-01-19 14:31       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 19+ messages in thread
From: Song Liu @ 2021-01-19  0:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: open list, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Kernel Team

Hi Arnaldo,

> On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> Em Tue, Dec 29, 2020 at 01:42:14PM -0800, Song Liu escreveu:
>> Introduce perf-stat -b option, which counts events for BPF programs, like:
>> 
>> [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
>>     1.487903822            115,200      ref-cycles
>>     1.487903822             86,012      cycles
>>     2.489147029             80,560      ref-cycles
>>     2.489147029             73,784      cycles
>>     3.490341825             60,720      ref-cycles
>>     3.490341825             37,797      cycles
>>     4.491540887             37,120      ref-cycles
>>     4.491540887             31,963      cycles
>> 
>> The example above counts cycles and ref-cycles of BPF program of id 254.
>> This is similar to bpftool-prog-profile command, but more flexible.
> 
> So trying to get back to this after vacation:
> 
> [root@five ~]# perf stat -vvv -e cycles -b 103
> Control descriptor is not initialized
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: load bpf program failed: Invalid argument
> libbpf: -- BEGIN DUMP LOG ---
> libbpf:
> Cannot recursively attach
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> 
> libbpf: -- END LOG --
> libbpf: failed to load program 'fentry_XXX'
> libbpf: failed to load object 'bpf_prog_profiler_bpf'
> libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
> bpf_prog_profiler_bpf__load failed
> [root@five ~]#
> 
> [root@five perf]# tools/bpf/runqslower/.output/
> bpftool/    libbpf/     runqslower
> [root@five perf]# tools/bpf/runqslower/.output/runqslower
> libbpf: elf: skipping unrecognized data section(10) .eh_frame
> libbpf: elf: skipping relo section(17) .rel.eh_frame for section(10) .eh_frame
> Tracing run queue latency higher than 10000 us
> TIME     COMM             PID           LAT(us)
> 
> [root@five ~]# bpftool prog
> 6: tracing  name dump_bpf_map  tag e54c922dfa54f65f  gpl
> 	loaded_at 2021-01-15T11:14:38-0300  uid 0
> 	xlated 256B  jited 154B  memlock 4096B  map_ids 3
> 	btf_id 6
> 8: tracing  name dump_bpf_prog  tag 4cb5558b902ee92e  gpl
> 	loaded_at 2021-01-15T11:14:38-0300  uid 0
> 	xlated 520B  jited 301B  memlock 4096B  map_ids 3
> 	btf_id 6
> 80: cgroup_device  tag 8b9c33f36f812014  gpl
> 	loaded_at 2021-01-17T18:35:22-0300  uid 0
> 	xlated 744B  jited 447B  memlock 4096B
> 81: cgroup_skb  tag 6deef7357e7b4530  gpl
> 	loaded_at 2021-01-17T18:35:22-0300  uid 0
> 	xlated 64B  jited 54B  memlock 4096B
> 82: cgroup_skb  tag 6deef7357e7b4530  gpl
> 	loaded_at 2021-01-17T18:35:22-0300  uid 0
> 	xlated 64B  jited 54B  memlock 4096B
> 83: cgroup_device  tag ee0e253c78993a24  gpl
> 	loaded_at 2021-01-17T18:35:22-0300  uid 0
> 	xlated 416B  jited 255B  memlock 4096B
> 84: cgroup_device  tag e3dbd137be8d6168  gpl
> 	loaded_at 2021-01-17T18:35:22-0300  uid 0
> 	xlated 504B  jited 309B  memlock 4096B
> 85: cgroup_skb  tag 6deef7357e7b4530  gpl
> 	loaded_at 2021-01-17T18:35:22-0300  uid 0
> 	xlated 64B  jited 54B  memlock 4096B
> 86: cgroup_skb  tag 6deef7357e7b4530  gpl
> 	loaded_at 2021-01-17T18:35:22-0300  uid 0
> 	xlated 64B  jited 54B  memlock 4096B
> 87: cgroup_device  tag c8b47a902f1cc68b  gpl
> 	loaded_at 2021-01-17T18:35:22-0300  uid 0
> 	xlated 464B  jited 288B  memlock 4096B
> 88: cgroup_device  tag 0ecd07b7b633809f  gpl
> 	loaded_at 2021-01-17T18:35:22-0300  uid 0
> 	xlated 496B  jited 307B  memlock 4096B
> 89: cgroup_skb  tag 6deef7357e7b4530  gpl
> 	loaded_at 2021-01-17T18:35:22-0300  uid 0
> 	xlated 64B  jited 54B  memlock 4096B
> 90: cgroup_skb  tag 6deef7357e7b4530  gpl
> 	loaded_at 2021-01-17T18:35:22-0300  uid 0
> 	xlated 64B  jited 54B  memlock 4096B
> 91: cgroup_skb  tag 6deef7357e7b4530  gpl
> 	loaded_at 2021-01-17T18:35:22-0300  uid 0
> 	xlated 64B  jited 54B  memlock 4096B
> 92: cgroup_skb  tag 6deef7357e7b4530  gpl
> 	loaded_at 2021-01-17T18:35:22-0300  uid 0
> 	xlated 64B  jited 54B  memlock 4096B
> 100: tracing  name handle__sched_w  tag 00ba1f1d332f5a57  gpl
> 	loaded_at 2021-01-18T16:34:00-0300  uid 0
> 	xlated 152B  jited 93B  memlock 4096B  map_ids 11,9
> 	btf_id 248
> 102: tracing  name handle__sched_w  tag 00ba1f1d332f5a57  gpl
> 	loaded_at 2021-01-18T16:34:00-0300  uid 0
> 	xlated 152B  jited 93B  memlock 4096B  map_ids 11,9
> 	btf_id 248
> 103: tracing  name handle__sched_s  tag a4ac42153fd49be2  gpl
> 	loaded_at 2021-01-18T16:34:00-0300  uid 0
> 	xlated 560B  jited 332B  memlock 4096B  map_ids 11,9,10
> 	btf_id 248
> [root@five ~]# perf stat -vvv -e cycles -b 92
> Control descriptor is not initialized
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> prog FD 3 doesn't have valid btf
> Failed to get program name for bpf prog 92. Does it have BTF?
> [root@five ~]# perf stat -vvv -e cycles -b 103
> Control descriptor is not initialized
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: load bpf program failed: Invalid argument
> libbpf: -- BEGIN DUMP LOG ---
> libbpf:
> Cannot recursively attach
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> 
> libbpf: -- END LOG --
> libbpf: failed to load program 'fentry_XXX'
> libbpf: failed to load object 'bpf_prog_profiler_bpf'
> libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
> bpf_prog_profiler_bpf__load failed
> [root@five ~]# perf stat -vvv -e cycles -b 102
> Control descriptor is not initialized
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: load bpf program failed: Invalid argument
> libbpf: -- BEGIN DUMP LOG ---
> libbpf:
> Cannot recursively attach
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> 
> libbpf: -- END LOG --
> libbpf: failed to load program 'fentry_XXX'
> libbpf: failed to load object 'bpf_prog_profiler_bpf'
> libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
> bpf_prog_profiler_bpf__load failed
> [root@five ~]#
> [root@five ~]# perf stat -vvv -e cycles -b 6
> Control descriptor is not initialized
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: load bpf program failed: Invalid argument
> libbpf: -- BEGIN DUMP LOG ---
> libbpf:
> Cannot recursively attach
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> 
> libbpf: -- END LOG --
> libbpf: failed to load program 'fentry_XXX'
> libbpf: failed to load object 'bpf_prog_profiler_bpf'
> libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
> bpf_prog_profiler_bpf__load failed
> [root@five ~]#
> 
> So no more segfaults, but still something to look at, I'm pushing my
> local perf/core to tmp.perf/core, please take a look.
> 
> - Arnaldo

We are looking at two issues:
1. Cannot recursively attach;
2. prog FD 3 doesn't have valid btf. 

#1 was caused by the verifier disallowing attaching fentry/fexit program 
to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target). 
This constraint was added when we only had fentry/fexit in the TRACING
type. We have extended the TRACING type to many other use cases, like 
"tp_btf/", "fmod_ret" and "iter/". Therefore, it is a good time to revisit
this constraint. I will work on this.

For #2, we require the target program to have BTF. I guess we won't remove
this requirement.

While I work on improving #1, could you please test with some kprobe 
programs? For example, we can use fileslower.py from bcc. 

Thanks,
Song

[...]



* Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2021-01-19  0:48     ` Song Liu
@ 2021-01-19 14:31       ` Arnaldo Carvalho de Melo
  2021-01-19 14:42         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-01-19 14:31 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Kernel Team

Em Tue, Jan 19, 2021 at 12:48:19AM +0000, Song Liu escreveu:
> > On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> We are looking at two issues:
> 1. Cannot recursively attach;
> 2. prog FD 3 doesn't have valid btf. 
 
> #1 was caused by the verifier disallowing attaching fentry/fexit program 
> to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target). 
> This constraint was added when we only had fentry/fexit in the TRACING
> type. We have extended the TRACING type to many other use cases, like 
> > "tp_btf/", "fmod_ret" and "iter/". Therefore, it is a good time to revisit 
> > this constraint. I will work on this. 
 
> For #2, we require the target program to have BTF. I guess we won't remove
> this requirement.
 
> While I work on improving #1, could you please test with some kprobe 
> programs? For example, we can use fileslower.py from bcc. 

Sure, and please consider improving the error messages to state what you
described above.

- Arnaldo


* Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2021-01-19 14:31       ` Arnaldo Carvalho de Melo
@ 2021-01-19 14:42         ` Arnaldo Carvalho de Melo
  2021-01-19 16:31           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-01-19 14:42 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Kernel Team

Em Tue, Jan 19, 2021 at 11:31:44AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Jan 19, 2021 at 12:48:19AM +0000, Song Liu escreveu:
> > > On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > We are looking at two issues:
> > 1. Cannot recursively attach;
> > 2. prog FD 3 doesn't have valid btf. 
  
> > #1 was caused by the verifier disallowing attaching fentry/fexit program 
> > to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target). 
> > This constraint was added when we only had fentry/fexit in the TRACING
> > type. We have extended the TRACING type to many other use cases, like 
> > "tp_btf/", "fmod_ret" and "iter/". Therefore, it is a good time to revisit 
> > this constraint. I will work on this. 
  
> > For #2, we require the target program to have BTF. I guess we won't remove
> > this requirement.
  
> > While I work on improving #1, could you please test with some kprobe 
> > programs? For example, we can use fileslower.py from bcc. 
 
> Sure, and please consider improving the error messages to state what you
> described above.

Terminal 1:

[root@five perf]# perf trace -e 5sec.c
^C
[root@five perf]# cat 5sec.c
#include <bpf.h>

#define NSEC_PER_SEC	1000000000L

int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
{
	return sec / NSEC_PER_SEC == 5;
}

license(GPL);
[root@five perf]# perf trace -e 5sec.c/max-stack=16/
     0.000 sleep/3739435 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1743337312, rqtp: 5000000000)
                                       hrtimer_nanosleep ([kernel.kallsyms])
                                       common_nsleep ([kernel.kallsyms])
                                       __x64_sys_clock_nanosleep ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                       __clock_nanosleep_2 (/usr/lib64/libc-2.32.so)


Terminal 2:

[root@five ~]# perf stat -e cycles -b 180 -I 1000
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
perf: util/bpf_counter.c:227: bpf_program_profiler__read: Assertion `skel != NULL' failed.
Aborted (core dumped)
[root@five ~]# gdb perf
GNU gdb (GDB) Fedora 10.1-2.fc33
<SNIP>
Reading symbols from perf...
(gdb) run stat -e cycles -b 180 -I 1000
Starting program: /root/bin/perf stat -e cycles -b 180 -I 1000
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.32-2.fc33.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
perf: util/bpf_counter.c:227: bpf_program_profiler__read: Assertion `skel != NULL' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff75149d5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-4.fc33.x86_64 cyrus-sasl-lib-2.1.27-6.fc33.x86_64 elfutils-debuginfod-client-0.182-1.fc33.x86_64 elfutils-libelf-0.182-1.fc33.x86_64 elfutils-libs-0.182-1.fc33.x86_64 keyutils-libs-1.6.1-1.fc33.x86_64 krb5-libs-1.18.2-29.fc33.x86_64 libbabeltrace-1.5.8-3.fc33.x86_64 libbrotli-1.0.9-3.fc33.x86_64 libcap-2.26-8.fc33.x86_64 libcom_err-1.45.6-4.fc33.x86_64 libcurl-7.71.1-8.fc33.x86_64 libgcc-10.2.1-9.fc33.x86_64 libidn2-2.3.0-4.fc33.x86_64 libnghttp2-1.41.0-3.fc33.x86_64 libpsl-0.21.1-2.fc33.x86_64 libselinux-3.1-2.fc33.x86_64 libssh-0.9.5-1.fc33.x86_64 libunistring-0.9.10-9.fc33.x86_64 libunwind-1.4.0-4.fc33.x86_64 libuuid-2.36.1-1.fc33.x86_64 libxcrypt-4.4.17-1.fc33.x86_64 libzstd-1.4.7-1.fc33.x86_64 numactl-libs-2.0.14-1.fc33.x86_64 openldap-2.4.50-5.fc33.x86_64 openssl-libs-1.1.1i-1.fc33.x86_64 pcre-8.44-2.fc33.x86_64 pcre2-10.36-1.fc33.x86_64 perl-libs-5.32.0-465.fc33.x86_64 popt-1.18-2.fc33.x86_64 python3-libs-3.9.1-1.fc33.x86_64 slang-2.3.2-8.fc33.x86_64 xz-libs-5.2.5-4.fc33.x86_64 zlib-1.2.11-23.fc33.x86_64
(gdb) bt
#0  0x00007ffff75149d5 in raise () from /lib64/libc.so.6
#1  0x00007ffff74fd8a4 in abort () from /lib64/libc.so.6
#2  0x00007ffff74fd789 in __assert_fail_base.cold () from /lib64/libc.so.6
#3  0x00007ffff750d026 in __assert_fail () from /lib64/libc.so.6
#4  0x00000000005661c3 in bpf_program_profiler__read (evsel=0xc3a2c0) at util/bpf_counter.c:227
#5  0x0000000000000000 in ?? ()
(gdb)

207 static int bpf_program_profiler__read(struct evsel *evsel)
208 {
209         int num_cpu = evsel__nr_cpus(evsel);
210         struct bpf_perf_event_value values[num_cpu];
211         struct bpf_counter *counter;
212         int reading_map_fd;
213         __u32 key = 0;
214         int err, cpu;
215 
216         if (list_empty(&evsel->bpf_counter_list))
217                 return -EAGAIN;
218 
219         for (cpu = 0; cpu < num_cpu; cpu++) {
220                 perf_counts(evsel->counts, cpu, 0)->val = 0;
221                 perf_counts(evsel->counts, cpu, 0)->ena = 0;
222                 perf_counts(evsel->counts, cpu, 0)->run = 0;
223         }
224         list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
225                 struct bpf_prog_profiler_bpf *skel = counter->skel;
226 
227                 assert(skel != NULL);
228                 reading_map_fd = bpf_map__fd(skel->maps.accum_readings);
229 
230                 err = bpf_map_lookup_elem(reading_map_fd, &key, values);
231                 if (err) {
232                         pr_err("failed to read value\n");
233                         return err;
234                 }
235 
236                 for (cpu = 0; cpu < num_cpu; cpu++) {
237                         perf_counts(evsel->counts, cpu, 0)->val += values[cpu].counter;
238                         perf_counts(evsel->counts, cpu, 0)->ena += values[cpu].enabled;
239                         perf_counts(evsel->counts, cpu, 0)->run += values[cpu].running;
240                 }
241         }
242         return 0;
243 }

- Arnaldo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2021-01-19 14:42         ` Arnaldo Carvalho de Melo
@ 2021-01-19 16:31           ` Arnaldo Carvalho de Melo
  2021-01-19 21:54             ` Song Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-01-19 16:31 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Kernel Team

Em Tue, Jan 19, 2021 at 11:42:49AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Jan 19, 2021 at 11:31:44AM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Tue, Jan 19, 2021 at 12:48:19AM +0000, Song Liu escreveu:
> > > > On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > We are looking at two issues:
> > > 1. Cannot recursively attach;
> > > 2. prog FD 3 doesn't have valid btf. 
>   
> > > #1 was caused by the verifier disallowing attaching fentry/fexit program 
> > > to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target). 
> > > This constraint was added when we only had fentry/fexit in the TRACING
> > > type. We have extended the TRACING type to many other use cases, like 
> > >> "tp_btf/", "fmod_ret" and "iter/". Therefore, it is a good time to revisit 
> > > this constraint. I will work on this. 
>   
> > > For #2, we require the target program to have BTF. I guess we won't remove
> > > this requirement.
>   
> > > While I work on improving #1, could you please test with some kprobe 
> > > programs? For example, we can use fileslower.py from bcc. 
>  
> > Sure, and please consider improving the error messages to state what you
> > described above.
> 
> Terminal 1:
> 
> [root@five perf]# perf trace -e 5sec.c
> ^C
> [root@five perf]# cat 5sec.c
> #include <bpf.h>
> 
> #define NSEC_PER_SEC	1000000000L
> 
> int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
> {
> 	return sec / NSEC_PER_SEC == 5;
> }
> 
> license(GPL);
> [root@five perf]# perf trace -e 5sec.c/max-stack=16/
>      0.000 sleep/3739435 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1743337312, rqtp: 5000000000)
>                                        hrtimer_nanosleep ([kernel.kallsyms])
>                                        common_nsleep ([kernel.kallsyms])
>                                        __x64_sys_clock_nanosleep ([kernel.kallsyms])
>                                        do_syscall_64 ([kernel.kallsyms])
>                                        entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
>                                        __clock_nanosleep_2 (/usr/lib64/libc-2.32.so)
> 
> 
> Terminal 2:
> 
> [root@five ~]# perf stat -e cycles -b 180 -I 1000
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> perf: util/bpf_counter.c:227: bpf_program_profiler__read: Assertion `skel != NULL' failed.
> Aborted (core dumped)

Out to lunch, will continue later, but this may help you figure this out
till then :)

Starting program: /root/bin/perf stat -e cycles -b 244 -I 1000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, bpf_program_profiler_load_one (evsel=0xce02c0, prog_id=244) at util/bpf_counter.c:96
96	{
(gdb) n
104		prog_fd = bpf_prog_get_fd_by_id(prog_id);
(gdb) 
105		if (prog_fd < 0) {
(gdb) 
109		counter = bpf_counter_alloc();
(gdb) 
110		if (!counter) {
(gdb) n
115		skel = bpf_prog_profiler_bpf__open();
(gdb) p counter
$9 = (struct bpf_counter *) 0xce09e0
(gdb) p *counter
$10 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
(gdb) p *counter
$11 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
(gdb) n
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
116		if (!skel) {
(gdb) 
121		skel->rodata->num_cpu = evsel__nr_cpus(evsel);
(gdb) 
123		bpf_map__resize(skel->maps.events, evsel__nr_cpus(evsel));
(gdb) 
124		bpf_map__resize(skel->maps.fentry_readings, 1);
(gdb) 
125		bpf_map__resize(skel->maps.accum_readings, 1);
(gdb) 
127		prog_name = bpf_target_prog_name(prog_fd);
(gdb) 
128		if (!prog_name) {
(gdb) 
133		bpf_object__for_each_program(prog, skel->obj) {
(gdb) 
134			err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
(gdb) 
135			if (err) {
(gdb) 
133		bpf_object__for_each_program(prog, skel->obj) {
(gdb) p evsel
$12 = (struct evsel *) 0xce02c0
(gdb) p evsel->name
$13 = 0xce04e0 "cycles"
(gdb) n
134			err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
(gdb) 
135			if (err) {
(gdb) 
133		bpf_object__for_each_program(prog, skel->obj) {
(gdb) 
141		set_max_rlimit();
(gdb) 
142		err = bpf_prog_profiler_bpf__load(skel);
(gdb) 
143		if (err) {
(gdb) 
148		assert(skel != NULL);
(gdb) 
149		counter->skel = skel;
(gdb) 
150		list_add(&counter->list, &evsel->bpf_counter_list);
(gdb) c
Continuing.

Breakpoint 4, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:247
247	{
(gdb) n
252		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
(gdb) 
253			skel = counter->skel;
(gdb) watch counter->skel
Hardware watchpoint 6: counter->skel
(gdb) p counter->skel
$14 = (void *) 0xce0a00
(gdb) n
254			assert(skel != NULL);
(gdb) p skel
$15 = (struct bpf_prog_profiler_bpf *) 0xce0a00
(gdb) c
Continuing.

Hardware watchpoint 6: counter->skel

Old value = (void *) 0xce0a00
New value = (void *) 0x0
0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
252		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000005ceb44 in bpf_program_profiler_load_one at util/bpf_counter.c:96
	breakpoint already hit 1 time
2       breakpoint     keep y   0x00000000005cef2e in bpf_program_profiler__enable at util/bpf_counter.c:192
3       breakpoint     keep y   0x00000000005cf00a in bpf_program_profiler__read at util/bpf_counter.c:208
4       breakpoint     keep y   0x00000000005cf3ba in bpf_program_profiler__install_pe at util/bpf_counter.c:247
	breakpoint already hit 1 time
5       breakpoint     keep y   0x00000000005ce8d4 in bpf_program_profiler__destroy at util/bpf_counter.c:47
6       hw watchpoint  keep y                      counter->skel
	breakpoint already hit 1 time
(gdb) p evsel
$16 = (struct evsel *) 0xce02c0
(gdb) bt
#0  0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
#1  0x00000000005cf4f3 in bpf_counter__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:276
#2  0x00000000004f3ef1 in evsel__open_cpu (evsel=0xce02c0, cpus=0xce0500, threads=0xce75c0, start_cpu=0, end_cpu=1) at util/evsel.c:1792
#3  0x00000000004f4aa7 in evsel__open_per_cpu (evsel=0xce02c0, cpus=0xce0500, cpu=0) at util/evsel.c:1996
#4  0x000000000057f6f2 in create_perf_stat_counter (evsel=0xce02c0, config=0xa97860 <stat_config>, target=0xa97660 <target>, cpu=0) at util/stat.c:568
#5  0x0000000000435d96 in __run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:790
#6  0x00000000004367e3 in run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:1001
#7  0x0000000000439e35 in cmd_stat (argc=0, argv=0x7fffffffdb60) at builtin-stat.c:2415
#8  0x00000000004cf058 in run_builtin (p=0xaabe00 <commands+288>, argc=7, argv=0x7fffffffdb60) at perf.c:312
#9  0x00000000004cf2c5 in handle_internal_command (argc=7, argv=0x7fffffffdb60) at perf.c:364
#10 0x00000000004cf40c in run_argv (argcp=0x7fffffffd9ac, argv=0x7fffffffd9a0) at perf.c:408
#11 0x00000000004cf7d8 in main (argc=7, argv=0x7fffffffdb60) at perf.c:538
(gdb) n
261		return 0;



* Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2021-01-19 16:31           ` Arnaldo Carvalho de Melo
@ 2021-01-19 21:54             ` Song Liu
  2021-01-19 22:30               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 19+ messages in thread
From: Song Liu @ 2021-01-19 21:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: open list, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Kernel Team



> On Jan 19, 2021, at 8:31 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> Em Tue, Jan 19, 2021 at 11:42:49AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Tue, Jan 19, 2021 at 11:31:44AM -0300, Arnaldo Carvalho de Melo escreveu:
>>> Em Tue, Jan 19, 2021 at 12:48:19AM +0000, Song Liu escreveu:
>>>>> On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>>>> We are looking at two issues:
>>>> 1. Cannot recursively attach;
>>>> 2. prog FD 3 doesn't have valid btf. 
>> 
>>>> #1 was caused by the verifier disallowing attaching fentry/fexit program 
>>>> to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target). 
>>>> This constraint was added when we only had fentry/fexit in the TRACING
>>>> type. We have extended the TRACING type to many other use cases, like 
>>>> "tp_btf/", "fmod_ret" and "iter/". Therefore, it is a good time to revisit 
>>>> this constraint. I will work on this. 
>> 
>>>> For #2, we require the target program to have BTF. I guess we won't remove
>>>> this requirement.
>> 
>>>> While I work on improving #1, could you please test with some kprobe 
>>>> programs? For example, we can use fileslower.py from bcc. 
>> 
>>> Sure, and please consider improving the error messages to state what you
>>> described above.
>> 
>> Terminal 1:
>> 
>> [root@five perf]# perf trace -e 5sec.c
>> ^C
>> [root@five perf]# cat 5sec.c
>> #include <bpf.h>
>> 
>> #define NSEC_PER_SEC	1000000000L
>> 
>> int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
>> {
>> 	return sec / NSEC_PER_SEC == 5;
>> }
>> 
>> license(GPL);
>> [root@five perf]# perf trace -e 5sec.c/max-stack=16/
>>     0.000 sleep/3739435 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1743337312, rqtp: 5000000000)
>>                                       hrtimer_nanosleep ([kernel.kallsyms])
>>                                       common_nsleep ([kernel.kallsyms])
>>                                       __x64_sys_clock_nanosleep ([kernel.kallsyms])
>>                                       do_syscall_64 ([kernel.kallsyms])
>>                                       entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
>>                                       __clock_nanosleep_2 (/usr/lib64/libc-2.32.so)
>> 
>> 
>> Terminal 2:
>> 
>> [root@five ~]# perf stat -e cycles -b 180 -I 1000
>> libbpf: elf: skipping unrecognized data section(9) .eh_frame
>> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
>> perf: util/bpf_counter.c:227: bpf_program_profiler__read: Assertion `skel != NULL' failed.
>> Aborted (core dumped)
> 
> Out to lunch, will continue later, but this may help you figure this out
> till then :)
> 
> Starting program: /root/bin/perf stat -e cycles -b 244 -I 1000
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> 
> Breakpoint 1, bpf_program_profiler_load_one (evsel=0xce02c0, prog_id=244) at util/bpf_counter.c:96
> 96	{
> (gdb) n
> 104		prog_fd = bpf_prog_get_fd_by_id(prog_id);
> (gdb) 
> 105		if (prog_fd < 0) {
> (gdb) 
> 109		counter = bpf_counter_alloc();
> (gdb) 
> 110		if (!counter) {
> (gdb) n
> 115		skel = bpf_prog_profiler_bpf__open();
> (gdb) p counter
> $9 = (struct bpf_counter *) 0xce09e0
> (gdb) p *counter
> $10 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
> (gdb) p *counter
> $11 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
> (gdb) n
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> 116		if (!skel) {
> (gdb) 
> 121		skel->rodata->num_cpu = evsel__nr_cpus(evsel);
> (gdb) 
> 123		bpf_map__resize(skel->maps.events, evsel__nr_cpus(evsel));
> (gdb) 
> 124		bpf_map__resize(skel->maps.fentry_readings, 1);
> (gdb) 
> 125		bpf_map__resize(skel->maps.accum_readings, 1);
> (gdb) 
> 127		prog_name = bpf_target_prog_name(prog_fd);
> (gdb) 
> 128		if (!prog_name) {
> (gdb) 
> 133		bpf_object__for_each_program(prog, skel->obj) {
> (gdb) 
> 134			err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> (gdb) 
> 135			if (err) {
> (gdb) 
> 133		bpf_object__for_each_program(prog, skel->obj) {
> (gdb) p evsel
> $12 = (struct evsel *) 0xce02c0
> (gdb) p evsel->name
> $13 = 0xce04e0 "cycles"
> (gdb) n
> 134			err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> (gdb) 
> 135			if (err) {
> (gdb) 
> 133		bpf_object__for_each_program(prog, skel->obj) {
> (gdb) 
> 141		set_max_rlimit();
> (gdb) 
> 142		err = bpf_prog_profiler_bpf__load(skel);
> (gdb) 
> 143		if (err) {
> (gdb) 
> 148		assert(skel != NULL);
> (gdb) 
> 149		counter->skel = skel;
> (gdb) 
> 150		list_add(&counter->list, &evsel->bpf_counter_list);
> (gdb) c
> Continuing.
> 
> Breakpoint 4, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:247
> 247	{
> (gdb) n
> 252		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> (gdb) 
> 253			skel = counter->skel;
> (gdb) watch counter->skel
> Hardware watchpoint 6: counter->skel
> (gdb) p counter->skel
> $14 = (void *) 0xce0a00
> (gdb) n
> 254			assert(skel != NULL);
> (gdb) p skel
> $15 = (struct bpf_prog_profiler_bpf *) 0xce0a00
> (gdb) c
> Continuing.
> 
> Hardware watchpoint 6: counter->skel
> 
> Old value = (void *) 0xce0a00
> New value = (void *) 0x0
> 0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
> 252		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {

So it was the list operation that set counter->skel to NULL? I am really confused...

Song

> (gdb) info b
> Num     Type           Disp Enb Address            What
> 1       breakpoint     keep y   0x00000000005ceb44 in bpf_program_profiler_load_one at util/bpf_counter.c:96
> 	breakpoint already hit 1 time
> 2       breakpoint     keep y   0x00000000005cef2e in bpf_program_profiler__enable at util/bpf_counter.c:192
> 3       breakpoint     keep y   0x00000000005cf00a in bpf_program_profiler__read at util/bpf_counter.c:208
> 4       breakpoint     keep y   0x00000000005cf3ba in bpf_program_profiler__install_pe at util/bpf_counter.c:247
> 	breakpoint already hit 1 time
> 5       breakpoint     keep y   0x00000000005ce8d4 in bpf_program_profiler__destroy at util/bpf_counter.c:47
> 6       hw watchpoint  keep y                      counter->skel
> 	breakpoint already hit 1 time
> (gdb) p evsel
> $16 = (struct evsel *) 0xce02c0
> (gdb) bt
> #0  0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
> #1  0x00000000005cf4f3 in bpf_counter__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:276
> #2  0x00000000004f3ef1 in evsel__open_cpu (evsel=0xce02c0, cpus=0xce0500, threads=0xce75c0, start_cpu=0, end_cpu=1) at util/evsel.c:1792
> #3  0x00000000004f4aa7 in evsel__open_per_cpu (evsel=0xce02c0, cpus=0xce0500, cpu=0) at util/evsel.c:1996
> #4  0x000000000057f6f2 in create_perf_stat_counter (evsel=0xce02c0, config=0xa97860 <stat_config>, target=0xa97660 <target>, cpu=0) at util/stat.c:568
> #5  0x0000000000435d96 in __run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:790
> #6  0x00000000004367e3 in run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:1001
> #7  0x0000000000439e35 in cmd_stat (argc=0, argv=0x7fffffffdb60) at builtin-stat.c:2415
> #8  0x00000000004cf058 in run_builtin (p=0xaabe00 <commands+288>, argc=7, argv=0x7fffffffdb60) at perf.c:312
> #9  0x00000000004cf2c5 in handle_internal_command (argc=7, argv=0x7fffffffdb60) at perf.c:364
> #10 0x00000000004cf40c in run_argv (argcp=0x7fffffffd9ac, argv=0x7fffffffd9a0) at perf.c:408
> #11 0x00000000004cf7d8 in main (argc=7, argv=0x7fffffffdb60) at perf.c:538
> (gdb) n
> 261		return 0;
> 



* Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2021-01-19 21:54             ` Song Liu
@ 2021-01-19 22:30               ` Arnaldo Carvalho de Melo
  2021-01-20 12:37                 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-01-19 22:30 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Kernel Team

Em Tue, Jan 19, 2021 at 09:54:50PM +0000, Song Liu escreveu:
> 
> 
> > On Jan 19, 2021, at 8:31 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > 
> > Em Tue, Jan 19, 2021 at 11:42:49AM -0300, Arnaldo Carvalho de Melo escreveu:
> >> Em Tue, Jan 19, 2021 at 11:31:44AM -0300, Arnaldo Carvalho de Melo escreveu:
> >>> Em Tue, Jan 19, 2021 at 12:48:19AM +0000, Song Liu escreveu:
> >>>>> On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> >>>> We are looking at two issues:
> >>>> 1. Cannot recursively attach;
> >>>> 2. prog FD 3 doesn't have valid btf. 
> >> 
> >>>> #1 was caused by the verifier disallowing attaching fentry/fexit program 
> >>>> to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target). 
> >>>> This constraint was added when we only had fentry/fexit in the TRACING
> >>>> type. We have extended the TRACING type to many other use cases, like 
> >>>> "tp_btf/", "fmod_ret" and "iter/". Therefore, it is a good time to revisit 
> >>>> this constraint. I will work on this. 
> >> 
> >>>> For #2, we require the target program to have BTF. I guess we won't remove
> >>>> this requirement.
> >> 
> >>>> While I work on improving #1, could you please test with some kprobe 
> >>>> programs? For example, we can use fileslower.py from bcc. 
> >> 
> >>> Sure, and please consider improving the error messages to state what you
> >>> described above.
> >> 
> >> Terminal 1:
> >> 
> >> [root@five perf]# perf trace -e 5sec.c
> >> ^C
> >> [root@five perf]# cat 5sec.c
> >> #include <bpf.h>
> >> 
> >> #define NSEC_PER_SEC	1000000000L
> >> 
> >> int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
> >> {
> >> 	return sec / NSEC_PER_SEC == 5;
> >> }
> >> 
> >> license(GPL);
> >> [root@five perf]# perf trace -e 5sec.c/max-stack=16/
> >>     0.000 sleep/3739435 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1743337312, rqtp: 5000000000)
> >>                                       hrtimer_nanosleep ([kernel.kallsyms])
> >>                                       common_nsleep ([kernel.kallsyms])
> >>                                       __x64_sys_clock_nanosleep ([kernel.kallsyms])
> >>                                       do_syscall_64 ([kernel.kallsyms])
> >>                                       entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
> >>                                       __clock_nanosleep_2 (/usr/lib64/libc-2.32.so)
> >> 
> >> 
> >> Terminal 2:
> >> 
> >> [root@five ~]# perf stat -e cycles -b 180 -I 1000
> >> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> >> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> >> perf: util/bpf_counter.c:227: bpf_program_profiler__read: Assertion `skel != NULL' failed.
> >> Aborted (core dumped)
> > 
> > Out to lunch, will continue later, but this may help you figure this out
> > till then :)
> > 
> > Starting program: /root/bin/perf stat -e cycles -b 244 -I 1000
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > 
> > Breakpoint 1, bpf_program_profiler_load_one (evsel=0xce02c0, prog_id=244) at util/bpf_counter.c:96
> > 96	{
> > (gdb) n
> > 104		prog_fd = bpf_prog_get_fd_by_id(prog_id);
> > (gdb) 
> > 105		if (prog_fd < 0) {
> > (gdb) 
> > 109		counter = bpf_counter_alloc();
> > (gdb) 
> > 110		if (!counter) {
> > (gdb) n
> > 115		skel = bpf_prog_profiler_bpf__open();
> > (gdb) p counter
> > $9 = (struct bpf_counter *) 0xce09e0
> > (gdb) p *counter
> > $10 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
> > (gdb) p *counter
> > $11 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
> > (gdb) n
> > libbpf: elf: skipping unrecognized data section(9) .eh_frame
> > libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> > 116		if (!skel) {
> > (gdb) 
> > 121		skel->rodata->num_cpu = evsel__nr_cpus(evsel);
> > (gdb) 
> > 123		bpf_map__resize(skel->maps.events, evsel__nr_cpus(evsel));
> > (gdb) 
> > 124		bpf_map__resize(skel->maps.fentry_readings, 1);
> > (gdb) 
> > 125		bpf_map__resize(skel->maps.accum_readings, 1);
> > (gdb) 
> > 127		prog_name = bpf_target_prog_name(prog_fd);
> > (gdb) 
> > 128		if (!prog_name) {
> > (gdb) 
> > 133		bpf_object__for_each_program(prog, skel->obj) {
> > (gdb) 
> > 134			err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> > (gdb) 
> > 135			if (err) {
> > (gdb) 
> > 133		bpf_object__for_each_program(prog, skel->obj) {
> > (gdb) p evsel
> > $12 = (struct evsel *) 0xce02c0
> > (gdb) p evsel->name
> > $13 = 0xce04e0 "cycles"
> > (gdb) n
> > 134			err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> > (gdb) 
> > 135			if (err) {
> > (gdb) 
> > 133		bpf_object__for_each_program(prog, skel->obj) {
> > (gdb) 
> > 141		set_max_rlimit();
> > (gdb) 
> > 142		err = bpf_prog_profiler_bpf__load(skel);
> > (gdb) 
> > 143		if (err) {
> > (gdb) 
> > 148		assert(skel != NULL);
> > (gdb) 
> > 149		counter->skel = skel;
> > (gdb) 
> > 150		list_add(&counter->list, &evsel->bpf_counter_list);
> > (gdb) c
> > Continuing.
> > 
> > Breakpoint 4, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:247
> > 247	{
> > (gdb) n
> > 252		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> > (gdb) 
> > 253			skel = counter->skel;
> > (gdb) watch counter->skel
> > Hardware watchpoint 6: counter->skel
> > (gdb) p counter->skel
> > $14 = (void *) 0xce0a00
> > (gdb) n
> > 254			assert(skel != NULL);
> > (gdb) p skel
> > $15 = (struct bpf_prog_profiler_bpf *) 0xce0a00
> > (gdb) c
> > Continuing.
> > 
> > Hardware watchpoint 6: counter->skel
> > 
> > Old value = (void *) 0xce0a00
> > New value = (void *) 0x0
> > 0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
> > 252		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> 
> So it was the list operation that set counter->skel to NULL? I am really confused...

Yep, I'm confused as well and trying to reproduce this, but got
sidetracked...

- Arnaldo
 
> Song
> 
> > (gdb) info b
> > Num     Type           Disp Enb Address            What
> > 1       breakpoint     keep y   0x00000000005ceb44 in bpf_program_profiler_load_one at util/bpf_counter.c:96
> > 	breakpoint already hit 1 time
> > 2       breakpoint     keep y   0x00000000005cef2e in bpf_program_profiler__enable at util/bpf_counter.c:192
> > 3       breakpoint     keep y   0x00000000005cf00a in bpf_program_profiler__read at util/bpf_counter.c:208
> > 4       breakpoint     keep y   0x00000000005cf3ba in bpf_program_profiler__install_pe at util/bpf_counter.c:247
> > 	breakpoint already hit 1 time
> > 5       breakpoint     keep y   0x00000000005ce8d4 in bpf_program_profiler__destroy at util/bpf_counter.c:47
> > 6       hw watchpoint  keep y                      counter->skel
> > 	breakpoint already hit 1 time
> > (gdb) p evsel
> > $16 = (struct evsel *) 0xce02c0
> > (gdb) bt
> > #0  0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
> > #1  0x00000000005cf4f3 in bpf_counter__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:276
> > #2  0x00000000004f3ef1 in evsel__open_cpu (evsel=0xce02c0, cpus=0xce0500, threads=0xce75c0, start_cpu=0, end_cpu=1) at util/evsel.c:1792
> > #3  0x00000000004f4aa7 in evsel__open_per_cpu (evsel=0xce02c0, cpus=0xce0500, cpu=0) at util/evsel.c:1996
> > #4  0x000000000057f6f2 in create_perf_stat_counter (evsel=0xce02c0, config=0xa97860 <stat_config>, target=0xa97660 <target>, cpu=0) at util/stat.c:568
> > #5  0x0000000000435d96 in __run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:790
> > #6  0x00000000004367e3 in run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:1001
> > #7  0x0000000000439e35 in cmd_stat (argc=0, argv=0x7fffffffdb60) at builtin-stat.c:2415
> > #8  0x00000000004cf058 in run_builtin (p=0xaabe00 <commands+288>, argc=7, argv=0x7fffffffdb60) at perf.c:312
> > #9  0x00000000004cf2c5 in handle_internal_command (argc=7, argv=0x7fffffffdb60) at perf.c:364
> > #10 0x00000000004cf40c in run_argv (argcp=0x7fffffffd9ac, argv=0x7fffffffd9a0) at perf.c:408
> > #11 0x00000000004cf7d8 in main (argc=7, argv=0x7fffffffdb60) at perf.c:538
> > (gdb) n
> > 261		return 0;
> > 
> 

-- 

- Arnaldo


* Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2021-01-19 22:30               ` Arnaldo Carvalho de Melo
@ 2021-01-20 12:37                 ` Arnaldo Carvalho de Melo
  2021-01-20 13:01                   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-01-20 12:37 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Kernel Team

Em Tue, Jan 19, 2021 at 07:30:21PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Jan 19, 2021 at 09:54:50PM +0000, Song Liu escreveu:
> > 
> > 
> > > On Jan 19, 2021, at 8:31 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > 
> > > Em Tue, Jan 19, 2021 at 11:42:49AM -0300, Arnaldo Carvalho de Melo escreveu:
> > >> Em Tue, Jan 19, 2021 at 11:31:44AM -0300, Arnaldo Carvalho de Melo escreveu:
> > >>> Em Tue, Jan 19, 2021 at 12:48:19AM +0000, Song Liu escreveu:
> > >>>>> On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > >>>> We are looking at two issues:
> > >>>> 1. Cannot recursively attach;
> > >>>> 2. prog FD 3 doesn't have valid btf. 
> > >> 
> > >>>> #1 was caused by the verifier disallowing attaching fentry/fexit program 
> > >>>> to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target). 
> > >>>> This constraint was added when we only had fentry/fexit in the TRACING
> > >>>> type. We have extended the TRACING type to many other use cases, like 
> > >>>> "tp_btf/", "fmod_ret" and "iter/". Therefore, it is a good time to revisit 
> > >>>> this constraint. I will work on this. 
> > >> 
> > >>>> For #2, we require the target program to have BTF. I guess we won't remove
> > >>>> this requirement.
> > >> 
> > >>>> While I work on improving #1, could you please test with some kprobe 
> > >>>> programs? For example, we can use fileslower.py from bcc. 
> > >> 
> > >>> Sure, and please consider improving the error messages to state what you
> > >>> described above.
> > >> 
> > >> Terminal 1:
> > >> 
> > >> [root@five perf]# perf trace -e 5sec.c
> > >> ^C
> > >> [root@five perf]# cat 5sec.c
> > >> #include <bpf.h>
> > >> 
> > >> #define NSEC_PER_SEC	1000000000L
> > >> 
> > >> int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
> > >> {
> > >> 	return sec / NSEC_PER_SEC == 5;
> > >> }
> > >> 
> > >> license(GPL);
> > >> [root@five perf]# perf trace -e 5sec.c/max-stack=16/
> > >>     0.000 sleep/3739435 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1743337312, rqtp: 5000000000)
> > >>                                       hrtimer_nanosleep ([kernel.kallsyms])
> > >>                                       common_nsleep ([kernel.kallsyms])
> > >>                                       __x64_sys_clock_nanosleep ([kernel.kallsyms])
> > >>                                       do_syscall_64 ([kernel.kallsyms])
> > >>                                       entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
> > >>                                       __clock_nanosleep_2 (/usr/lib64/libc-2.32.so)
> > >> 
> > >> 
> > >> Terminal 2:
> > >> 
> > >> [root@five ~]# perf stat -e cycles -b 180 -I 1000
> > >> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> > >> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> > >> perf: util/bpf_counter.c:227: bpf_program_profiler__read: Assertion `skel != NULL' failed.
> > >> Aborted (core dumped)
> > > 
> > > Out to lunch, will continue later, but this may help you figure this out
> > > till then :)
> > > 
> > > Starting program: /root/bin/perf stat -e cycles -b 244 -I 1000
> > > [Thread debugging using libthread_db enabled]
> > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > 
> > > Breakpoint 1, bpf_program_profiler_load_one (evsel=0xce02c0, prog_id=244) at util/bpf_counter.c:96
> > > 96	{
> > > (gdb) n
> > > 104		prog_fd = bpf_prog_get_fd_by_id(prog_id);
> > > (gdb) 
> > > 105		if (prog_fd < 0) {
> > > (gdb) 
> > > 109		counter = bpf_counter_alloc();
> > > (gdb) 
> > > 110		if (!counter) {
> > > (gdb) n
> > > 115		skel = bpf_prog_profiler_bpf__open();
> > > (gdb) p counter
> > > $9 = (struct bpf_counter *) 0xce09e0
> > > (gdb) p *counter
> > > $10 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
> > > (gdb) p *counter
> > > $11 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
> > > (gdb) n
> > > libbpf: elf: skipping unrecognized data section(9) .eh_frame
> > > libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> > > 116		if (!skel) {
> > > (gdb) 
> > > 121		skel->rodata->num_cpu = evsel__nr_cpus(evsel);
> > > (gdb) 
> > > 123		bpf_map__resize(skel->maps.events, evsel__nr_cpus(evsel));
> > > (gdb) 
> > > 124		bpf_map__resize(skel->maps.fentry_readings, 1);
> > > (gdb) 
> > > 125		bpf_map__resize(skel->maps.accum_readings, 1);
> > > (gdb) 
> > > 127		prog_name = bpf_target_prog_name(prog_fd);
> > > (gdb) 
> > > 128		if (!prog_name) {
> > > (gdb) 
> > > 133		bpf_object__for_each_program(prog, skel->obj) {
> > > (gdb) 
> > > 134			err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> > > (gdb) 
> > > 135			if (err) {
> > > (gdb) 
> > > 133		bpf_object__for_each_program(prog, skel->obj) {
> > > (gdb) p evsel
> > > $12 = (struct evsel *) 0xce02c0
> > > (gdb) p evsel->name
> > > $13 = 0xce04e0 "cycles"
> > > (gdb) n
> > > 134			err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> > > (gdb) 
> > > 135			if (err) {
> > > (gdb) 
> > > 133		bpf_object__for_each_program(prog, skel->obj) {
> > > (gdb) 
> > > 141		set_max_rlimit();
> > > (gdb) 
> > > 142		err = bpf_prog_profiler_bpf__load(skel);
> > > (gdb) 
> > > 143		if (err) {
> > > (gdb) 
> > > 148		assert(skel != NULL);
> > > (gdb) 
> > > 149		counter->skel = skel;
> > > (gdb) 
> > > 150		list_add(&counter->list, &evsel->bpf_counter_list);
> > > (gdb) c
> > > Continuing.
> > > 
> > > Breakpoint 4, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:247
> > > 247	{
> > > (gdb) n
> > > 252		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> > > (gdb) 
> > > 253			skel = counter->skel;
> > > (gdb) watch counter->skel
> > > Hardware watchpoint 6: counter->skel
> > > (gdb) p counter->skel
> > > $14 = (void *) 0xce0a00
> > > (gdb) n
> > > 254			assert(skel != NULL);
> > > (gdb) p skel
> > > $15 = (struct bpf_prog_profiler_bpf *) 0xce0a00
> > > (gdb) c
> > > Continuing.
> > > 
> > > Hardware watchpoint 6: counter->skel
> > > 
> > > Old value = (void *) 0xce0a00
> > > New value = (void *) 0x0
> > > 0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
> > > 252		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> > 
> > So it was the list operation that set counter->skel to NULL? I am really confused...
> 
> Yep, I'm confused as well and trying to reproduce this, but got
> sidetracked...

Coming back to this, now it is exploding later:

Breakpoint 8, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=22, fd=32) at util/bpf_counter.c:254
254			assert(skel != NULL);
(gdb) p skel
$52 = (struct bpf_prog_profiler_bpf *) 0xce0a00
(gdb) c
Continuing.

Breakpoint 4, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=23, fd=33) at util/bpf_counter.c:247
247	{
(gdb) p skel
$53 = (struct bpf_prog_profiler_bpf *) 0xce04c0
(gdb) c
Continuing.

Breakpoint 8, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=23, fd=33) at util/bpf_counter.c:254
254			assert(skel != NULL);
(gdb) p skel
$54 = (struct bpf_prog_profiler_bpf *) 0xce0a00
(gdb) c
Continuing.

Breakpoint 2, bpf_program_profiler__enable (evsel=0xce02c0) at util/bpf_counter.c:192
192	{
(gdb) n
196		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
(gdb)
197			assert(counter->skel != NULL);
(gdb)
198			ret = bpf_prog_profiler_bpf__attach(counter->skel);
(gdb) c
Continuing.

Breakpoint 3, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:208
208	{
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
224		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
(gdb) p evsel
$55 = (struct evsel *) 0x0
(gdb) bt
#0  0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
#1  0x0000000000000000 in ?? ()
(gdb) bt
#0  0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
#1  0x0000000000000000 in ?? ()
(gdb)
(gdb) info threads
  Id   Target Id                                  Frame
* 1    Thread 0x7ffff647f900 (LWP 1725711) "perf" 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
(gdb)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2021-01-20 12:37                 ` Arnaldo Carvalho de Melo
@ 2021-01-20 13:01                   ` Arnaldo Carvalho de Melo
  2021-01-20 13:50                     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-01-20 13:01 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Kernel Team

Em Wed, Jan 20, 2021 at 09:37:27AM -0300, Arnaldo Carvalho de Melo escreveu:
> Breakpoint 3, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:208
> 208	{
> (gdb) c
> Continuing.
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> 224		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> (gdb) p evsel
> $55 = (struct evsel *) 0x0
> (gdb) bt
> #0  0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> #1  0x0000000000000000 in ?? ()
> (gdb) bt
> #0  0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> #1  0x0000000000000000 in ?? ()
> (gdb)
> (gdb) info threads
>   Id   Target Id                                  Frame
> * 1    Thread 0x7ffff647f900 (LWP 1725711) "perf" 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> (gdb)

(gdb) run stat -e cycles -b 244 -I 1000
Starting program: /root/bin/perf stat -e cycles -b 244 -I 1000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame

Breakpoint 1, read_bpf_map_counters () at builtin-stat.c:414
414	{
(gdb) c
Continuing.

Breakpoint 2, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:224
224		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
(gdb) bt
#0  bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:224
#1  0x00000000005cf648 in bpf_counter__read (evsel=0xce02c0) at util/bpf_counter.c:300
#2  0x0000000000434e46 in read_bpf_map_counters () at builtin-stat.c:419
#3  0x0000000000434ecb in read_counters (rs=0x7fffffff9530) at builtin-stat.c:433
#4  0x00000000004351f3 in process_interval () at builtin-stat.c:498
#5  0x00000000004352b9 in handle_interval (interval=1000, times=0x7fffffff9618) at builtin-stat.c:513
#6  0x0000000000435859 in dispatch_events (forks=false, timeout=0, interval=1000, times=0x7fffffff9618) at builtin-stat.c:676
#7  0x00000000004365f1 in __run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:944
#8  0x00000000004367e3 in run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:1001
#9  0x0000000000439e35 in cmd_stat (argc=0, argv=0x7fffffffdb60) at builtin-stat.c:2415
#10 0x00000000004cf058 in run_builtin (p=0xaabe00 <commands+288>, argc=7, argv=0x7fffffffdb60) at perf.c:312
#11 0x00000000004cf2c5 in handle_internal_command (argc=7, argv=0x7fffffffdb60) at perf.c:364
#12 0x00000000004cf40c in run_argv (argcp=0x7fffffffd9ac, argv=0x7fffffffd9a0) at perf.c:408
#13 0x00000000004cf7d8 in main (argc=7, argv=0x7fffffffdb60) at perf.c:538
(gdb) n
225			struct bpf_prog_profiler_bpf *skel = counter->skel;
(gdb) n
227			assert(skel != NULL);
(gdb) n
228			reading_map_fd = bpf_map__fd(skel->maps.accum_readings);
(gdb) p skel->maps.accum_readings
$16 = (struct bpf_map *) 0xce7400
(gdb) p *skel->maps.accum_readings
$17 = {name = 0xce7580 "accum_readings", fd = 7, sec_idx = 6, sec_offset = 56, map_ifindex = 0, inner_map_fd = -1, def = {type = 6, key_size = 4, value_size = 24, max_entries = 1, map_flags = 0}, numa_node = 0, btf_var_idx = 2, btf_key_type_id = 0, btf_value_type_id = 0,
  btf_vmlinux_value_type_id = 0, priv = 0x0, clear_priv = 0x0, libbpf_type = LIBBPF_MAP_UNSPEC, mmaped = 0x0, st_ops = 0x0, inner_map = 0x0, init_slots = 0x0, init_slots_sz = 0, pin_path = 0x0, pinned = false, reused = false}
(gdb) n
230			err = bpf_map_lookup_elem(reading_map_fd, &key, values);
(gdb) n
231			if (err) {
(gdb) p evsel
$18 = (struct evsel *) 0x0
(gdb)

So at this point there is stack corruption when doing the bpf lookup on this
Ryzen system with 12 cores/24 threads:

[root@five ~]# bpftool prog | tail -4
244: kprobe  name hrtimer_nanosle  tag 0e77bacaf4555f83  gpl
	loaded_at 2021-01-19T13:04:03-0300  uid 0
	xlated 80B  jited 49B  memlock 4096B
	btf_id 360
[root@five ~]# perf stat -e cycles -b 244 -I 1000
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
#           time             counts unit events
     1.246867591             37,371      cycles
     2.247955619             33,652      cycles
     3.249017879             38,983      cycles
     4.250083954             49,546      cycles
     5.251150241             57,319      cycles
     6.252221407             44,170      cycles
     7.253279040             34,691      cycles
     8.254333356             48,195      cycles
^C     9.009242074             41,320      cycles

[root@five ~]#

This is with this hack:

diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
index 8c977f038f497fc1..f227fd09d33794b5 100644
--- a/tools/perf/util/bpf_counter.c
+++ b/tools/perf/util/bpf_counter.c
@@ -207,7 +207,7 @@ static int bpf_program_profiler__enable(struct evsel *evsel)
 static int bpf_program_profiler__read(struct evsel *evsel)
 {
 	int num_cpu = evsel__nr_cpus(evsel);
-	struct bpf_perf_event_value values[num_cpu];
+	struct bpf_perf_event_value values[num_cpu * 2];
 	struct bpf_counter *counter;
 	int reading_map_fd;
 	__u32 key = 0;


--------------------------

Now to check the proper fix...

It works with all the events, etc. We need to remove that annoying warning
about .eh_frame (is this solved in the bpf tree?)

[root@five ~]# perf stat -b 244 -I 1000
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
#           time             counts unit events
     3.624452622               0.12 msec cpu-clock                 #    0.000 CPUs utilized          
     3.624452622                  0      context-switches          #    0.000 K/sec                  
     3.624452622                  0      cpu-migrations            #    0.000 K/sec                  
     3.624452622                  0      page-faults               #    0.000 K/sec                  
     3.624452622            302,518      cycles                    #    2.468 GHz                      (83.41%)
     3.624452622             26,101      stalled-cycles-frontend   #    8.63% frontend cycles idle     (81.05%)
     3.624452622             96,327      stalled-cycles-backend    #   31.84% backend cycles idle      (80.58%)
     3.624452622            269,500      instructions              #    0.89  insn per cycle         
     3.624452622                                                   #    0.36  stalled cycles per insn  (81.81%)
     3.624452622             55,003      branches                  #  448.667 M/sec                    (86.94%)
     3.624452622                467      branch-misses             #    0.85% of all branches          (77.18%)
     4.625690606               0.11 msec cpu-clock                 #    0.000 CPUs utilized          
     4.625690606                  0      context-switches          #    0.000 K/sec                  
     4.625690606                  0      cpu-migrations            #    0.000 K/sec                  
     4.625690606                  0      page-faults               #    0.000 K/sec                  
     4.625690606            290,093      cycles                    #    2.535 GHz                      (79.46%)
     4.625690606             28,274      stalled-cycles-frontend   #    9.75% frontend cycles idle     (79.47%)
     4.625690606            101,772      stalled-cycles-backend    #   35.08% backend cycles idle      (84.05%)
     4.625690606            261,867      instructions              #    0.90  insn per cycle         
     4.625690606                                                   #    0.39  stalled cycles per insn  (86.38%)
     4.625690606             55,334      branches                  #  483.473 M/sec                    (82.58%)
     4.625690606                373      branch-misses             #    0.67% of all branches          (88.59%)
     5.626686730               0.14 msec cpu-clock                 #    0.000 CPUs utilized          
     5.626686730                  0      context-switches          #    0.000 K/sec                  
     5.626686730                  0      cpu-migrations            #    0.000 K/sec                  
     5.626686730                  0      page-faults               #    0.000 K/sec                  
     5.626686730            369,810      cycles                    #    2.574 GHz                      (79.77%)
     5.626686730             32,212      stalled-cycles-frontend   #    8.71% frontend cycles idle     (76.00%)
     5.626686730            122,778      stalled-cycles-backend    #   33.20% backend cycles idle      (81.49%)
     5.626686730            275,699      instructions              #    0.75  insn per cycle         
     5.626686730                                                   #    0.45  stalled cycles per insn  (78.39%)
     5.626686730             58,135      branches                  #  404.716 M/sec                    (73.61%)
     5.626686730                588      branch-misses             #    1.01% of all branches          (71.01%)
     6.627688626               0.14 msec cpu-clock                 #    0.000 CPUs utilized          
     6.627688626                  0      context-switches          #    0.000 K/sec                  
     6.627688626                  0      cpu-migrations            #    0.000 K/sec                  
     6.627688626                  0      page-faults               #    0.000 K/sec                  
     6.627688626            358,906      cycles                    #    2.543 GHz                      (81.90%)
     6.627688626             26,411      stalled-cycles-frontend   #    7.36% frontend cycles idle     (80.87%)
     6.627688626            129,526      stalled-cycles-backend    #   36.09% backend cycles idle      (77.72%)
     6.627688626            278,756      instructions              #    0.78  insn per cycle         
     6.627688626                                                   #    0.46  stalled cycles per insn  (70.31%)
     6.627688626             56,514      branches                  #  400.448 M/sec                    (80.53%)
     6.627688626                687      branch-misses             #    1.22% of all branches          (75.74%)
     7.628688818               0.15 msec cpu-clock                 #    0.000 CPUs utilized          
     7.628688818                  0      context-switches          #    0.000 K/sec                  
     7.628688818                  0      cpu-migrations            #    0.000 K/sec                  
     7.628688818                  0      page-faults               #    0.000 K/sec                  
     7.628688818            384,382      cycles                    #    2.574 GHz                      (84.08%)
     7.628688818             27,148      stalled-cycles-frontend   #    7.06% frontend cycles idle     (81.82%)
     7.628688818            128,434      stalled-cycles-backend    #   33.41% backend cycles idle      (82.29%)
     7.628688818            270,693      instructions              #    0.70  insn per cycle         
     7.628688818                                                   #    0.47  stalled cycles per insn  (83.58%)
     7.628688818             57,277      branches                  #  383.614 M/sec                    (81.68%)
     7.628688818                756      branch-misses             #    1.32% of all branches          (85.39%)
^C     7.934955676               0.05 msec cpu-clock                 #    0.000 CPUs utilized          
     7.934955676                  0      context-switches          #    0.000 K/sec                  
     7.934955676                  0      cpu-migrations            #    0.000 K/sec                  
     7.934955676                  0      page-faults               #    0.000 K/sec                  
     7.934955676            126,813      cycles                    #    2.626 GHz                      (77.14%)
     7.934955676              9,424      stalled-cycles-frontend   #    7.43% frontend cycles idle     (87.96%)
     7.934955676             44,506      stalled-cycles-backend    #   35.10% backend cycles idle      (82.43%)
     7.934955676             86,383      instructions              #    0.68  insn per cycle         
     7.934955676                                                   #    0.52  stalled cycles per insn  (86.26%)
     7.934955676             17,491      branches                  #  362.222 M/sec                    (87.10%)
     7.934955676                247      branch-misses             #    1.41% of all branches          (76.62%)

[root@five ~]#

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2021-01-20 13:01                   ` Arnaldo Carvalho de Melo
@ 2021-01-20 13:50                     ` Arnaldo Carvalho de Melo
  2021-01-20 16:30                       ` FIX " Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-01-20 13:50 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Kernel Team

Em Wed, Jan 20, 2021 at 10:01:37AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Jan 20, 2021 at 09:37:27AM -0300, Arnaldo Carvalho de Melo escreveu:
> > Breakpoint 3, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:208
> > 208	{
> > (gdb) c
> > Continuing.
> > 
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> > 224		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> > (gdb) p evsel
> > $55 = (struct evsel *) 0x0
> > (gdb) bt
> > #0  0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> > #1  0x0000000000000000 in ?? ()
> > (gdb) bt
> > #0  0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> > #1  0x0000000000000000 in ?? ()
> > (gdb)
> > (gdb) info threads
> >   Id   Target Id                                  Frame
> > * 1    Thread 0x7ffff647f900 (LWP 1725711) "perf" 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> > (gdb)

[root@five perf]# bpftool prog | grep ^244: -A100
244: kprobe  name hrtimer_nanosle  tag 0e77bacaf4555f83  gpl
	loaded_at 2021-01-19T13:04:03-0300  uid 0
	xlated 80B  jited 49B  memlock 4096B
	btf_id 360
486: tracing  name fentry_XXX  tag 51a406116b7e2f01  gpl
	loaded_at 2021-01-20T10:36:29-0300  uid 0
	xlated 160B  jited 98B  memlock 4096B  map_ids 367,366,369
	btf_id 489
488: tracing  name fexit_XXX  tag a0cbac3880048978  gpl
	loaded_at 2021-01-20T10:36:29-0300  uid 0
	xlated 416B  jited 242B  memlock 4096B  map_ids 366,367,368,369
	btf_id 489
[root@five perf]#

[root@five perf]# bpftool map
3: array  name iterator.rodata  flags 0x480
	key 4B  value 98B  max_entries 1  memlock 4096B
	btf_id 6  frozen
366: perf_event_array  name events  flags 0x0
	key 4B  value 4B  max_entries 24  memlock 4096B
367: percpu_array  name fentry_readings  flags 0x0
	key 4B  value 24B  max_entries 1  memlock 4096B
368: percpu_array  name accum_readings  flags 0x0
	key 4B  value 24B  max_entries 1  memlock 4096B
369: array  name bpf_prog.rodata  flags 0x480
	key 4B  value 4B  max_entries 1  memlock 4096B
	btf_id 489  frozen
[root@five perf]#

So sizeof(struct bpf_perf_event_value) == 24 and it is a per-cpu array; the
machine has 24 CPUs, so why does the kernel think it has more and end up
zeroing entries after the 24th core? Some percpu map subtlety (or obvious
thing ;-\) I'm missing?

Checking lookups into per cpu maps in sample code now...

Breakpoint 2, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:230
230			values[num_cpu].counter = 111111;
(gdb) del 2
(gdb) b 233
Breakpoint 3 at 0x5cf270: file util/bpf_counter.c, line 233.
(gdb) c
Continuing.

Breakpoint 3, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:233
233			err = bpf_map_lookup_elem(reading_map_fd, &key, values);
(gdb) p values[num_cpu].running
$4 = 333333
(gdb) c
Continuing.

Breakpoint 1, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:239
239			for (cpu = 0; cpu < num_cpu; cpu++) {
(gdb) p values[num_cpu].running
$5 = 0
(gdb) list -10
224		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
225			struct bpf_prog_profiler_bpf *skel = counter->skel;
226	
227			assert(skel != NULL);
228			reading_map_fd = bpf_map__fd(skel->maps.accum_readings);
229	
230			values[num_cpu].counter = 111111;
231			values[num_cpu].enabled = 222222;
232			values[num_cpu].running = 333333;
233			err = bpf_map_lookup_elem(reading_map_fd, &key, values);
(gdb) 
234			if (err) {
235				pr_err("failed to read value\n");
236				return err;
237			}
238	
239			for (cpu = 0; cpu < num_cpu; cpu++) {
240				perf_counts(evsel->counts, cpu, 0)->val += values[cpu].counter;
241				perf_counts(evsel->counts, cpu, 0)->ena += values[cpu].enabled;
242				perf_counts(evsel->counts, cpu, 0)->run += values[cpu].running;
243			}
(gdb) 
 
> (gdb) run stat -e cycles -b 244 -I 1000
> Starting program: /root/bin/perf stat -e cycles -b 244 -I 1000
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> 
> Breakpoint 1, read_bpf_map_counters () at builtin-stat.c:414
> 414	{
> (gdb) c
> Continuing.
> 
> Breakpoint 2, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:224
> 224		list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> (gdb) bt
> #0  bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:224
> #1  0x00000000005cf648 in bpf_counter__read (evsel=0xce02c0) at util/bpf_counter.c:300
> #2  0x0000000000434e46 in read_bpf_map_counters () at builtin-stat.c:419
> #3  0x0000000000434ecb in read_counters (rs=0x7fffffff9530) at builtin-stat.c:433
> #4  0x00000000004351f3 in process_interval () at builtin-stat.c:498
> #5  0x00000000004352b9 in handle_interval (interval=1000, times=0x7fffffff9618) at builtin-stat.c:513
> #6  0x0000000000435859 in dispatch_events (forks=false, timeout=0, interval=1000, times=0x7fffffff9618) at builtin-stat.c:676
> #7  0x00000000004365f1 in __run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:944
> #8  0x00000000004367e3 in run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:1001
> #9  0x0000000000439e35 in cmd_stat (argc=0, argv=0x7fffffffdb60) at builtin-stat.c:2415
> #10 0x00000000004cf058 in run_builtin (p=0xaabe00 <commands+288>, argc=7, argv=0x7fffffffdb60) at perf.c:312
> #11 0x00000000004cf2c5 in handle_internal_command (argc=7, argv=0x7fffffffdb60) at perf.c:364
> #12 0x00000000004cf40c in run_argv (argcp=0x7fffffffd9ac, argv=0x7fffffffd9a0) at perf.c:408
> #13 0x00000000004cf7d8 in main (argc=7, argv=0x7fffffffdb60) at perf.c:538
> (gdb) n
> 225			struct bpf_prog_profiler_bpf *skel = counter->skel;
> (gdb) n
> 227			assert(skel != NULL);
> (gdb) n
> 228			reading_map_fd = bpf_map__fd(skel->maps.accum_readings);
> (gdb) p skel->maps.accum_readings
> $16 = (struct bpf_map *) 0xce7400
> (gdb) p *skel->maps.accum_readings
> $17 = {name = 0xce7580 "accum_readings", fd = 7, sec_idx = 6, sec_offset = 56, map_ifindex = 0, inner_map_fd = -1, def = {type = 6, key_size = 4, value_size = 24, max_entries = 1, map_flags = 0}, numa_node = 0, btf_var_idx = 2, btf_key_type_id = 0, btf_value_type_id = 0,
>   btf_vmlinux_value_type_id = 0, priv = 0x0, clear_priv = 0x0, libbpf_type = LIBBPF_MAP_UNSPEC, mmaped = 0x0, st_ops = 0x0, inner_map = 0x0, init_slots = 0x0, init_slots_sz = 0, pin_path = 0x0, pinned = false, reused = false}
> (gdb) n
> 230			err = bpf_map_lookup_elem(reading_map_fd, &key, values);
> (gdb) n
> 231			if (err) {
> (gdb) p evsel
> $18 = (struct evsel *) 0x0
> (gdb)
> 
> So at this point there is stack corruption when doing the bpf lookup on this
> Ryzen system with 12 cores/24 threads:
> 
> [root@five ~]# bpftool prog | tail -4
> 244: kprobe  name hrtimer_nanosle  tag 0e77bacaf4555f83  gpl
> 	loaded_at 2021-01-19T13:04:03-0300  uid 0
> 	xlated 80B  jited 49B  memlock 4096B
> 	btf_id 360
> [root@five ~]# perf stat -e cycles -b 244 -I 1000
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> #           time             counts unit events
>      1.246867591             37,371      cycles
>      2.247955619             33,652      cycles
>      3.249017879             38,983      cycles
>      4.250083954             49,546      cycles
>      5.251150241             57,319      cycles
>      6.252221407             44,170      cycles
>      7.253279040             34,691      cycles
>      8.254333356             48,195      cycles
> ^C     9.009242074             41,320      cycles
> 
> [root@five ~]#
> 
> This is with this hack:
> 
> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
> index 8c977f038f497fc1..f227fd09d33794b5 100644
> --- a/tools/perf/util/bpf_counter.c
> +++ b/tools/perf/util/bpf_counter.c
> @@ -207,7 +207,7 @@ static int bpf_program_profiler__enable(struct evsel *evsel)
>  static int bpf_program_profiler__read(struct evsel *evsel)
>  {
>  	int num_cpu = evsel__nr_cpus(evsel);
> -	struct bpf_perf_event_value values[num_cpu];
> +	struct bpf_perf_event_value values[num_cpu * 2];
>  	struct bpf_counter *counter;
>  	int reading_map_fd;
>  	__u32 key = 0;
> 
> 
> --------------------------
> 
> Now to check the proper fix...
> 
> It works with all the events, etc. We need to remove that annoying warning
> about .eh_frame (is this solved in the bpf tree?)
> 
> [root@five ~]# perf stat -b 244 -I 1000
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> #           time             counts unit events
>      3.624452622               0.12 msec cpu-clock                 #    0.000 CPUs utilized          
>      3.624452622                  0      context-switches          #    0.000 K/sec                  
>      3.624452622                  0      cpu-migrations            #    0.000 K/sec                  
>      3.624452622                  0      page-faults               #    0.000 K/sec                  
>      3.624452622            302,518      cycles                    #    2.468 GHz                      (83.41%)
>      3.624452622             26,101      stalled-cycles-frontend   #    8.63% frontend cycles idle     (81.05%)
>      3.624452622             96,327      stalled-cycles-backend    #   31.84% backend cycles idle      (80.58%)
>      3.624452622            269,500      instructions              #    0.89  insn per cycle         
>      3.624452622                                                   #    0.36  stalled cycles per insn  (81.81%)
>      3.624452622             55,003      branches                  #  448.667 M/sec                    (86.94%)
>      3.624452622                467      branch-misses             #    0.85% of all branches          (77.18%)
>      4.625690606               0.11 msec cpu-clock                 #    0.000 CPUs utilized          
>      4.625690606                  0      context-switches          #    0.000 K/sec                  
>      4.625690606                  0      cpu-migrations            #    0.000 K/sec                  
>      4.625690606                  0      page-faults               #    0.000 K/sec                  
>      4.625690606            290,093      cycles                    #    2.535 GHz                      (79.46%)
>      4.625690606             28,274      stalled-cycles-frontend   #    9.75% frontend cycles idle     (79.47%)
>      4.625690606            101,772      stalled-cycles-backend    #   35.08% backend cycles idle      (84.05%)
>      4.625690606            261,867      instructions              #    0.90  insn per cycle         
>      4.625690606                                                   #    0.39  stalled cycles per insn  (86.38%)
>      4.625690606             55,334      branches                  #  483.473 M/sec                    (82.58%)
>      4.625690606                373      branch-misses             #    0.67% of all branches          (88.59%)
>      5.626686730               0.14 msec cpu-clock                 #    0.000 CPUs utilized          
>      5.626686730                  0      context-switches          #    0.000 K/sec                  
>      5.626686730                  0      cpu-migrations            #    0.000 K/sec                  
>      5.626686730                  0      page-faults               #    0.000 K/sec                  
>      5.626686730            369,810      cycles                    #    2.574 GHz                      (79.77%)
>      5.626686730             32,212      stalled-cycles-frontend   #    8.71% frontend cycles idle     (76.00%)
>      5.626686730            122,778      stalled-cycles-backend    #   33.20% backend cycles idle      (81.49%)
>      5.626686730            275,699      instructions              #    0.75  insn per cycle         
>      5.626686730                                                   #    0.45  stalled cycles per insn  (78.39%)
>      5.626686730             58,135      branches                  #  404.716 M/sec                    (73.61%)
>      5.626686730                588      branch-misses             #    1.01% of all branches          (71.01%)
>      6.627688626               0.14 msec cpu-clock                 #    0.000 CPUs utilized          
>      6.627688626                  0      context-switches          #    0.000 K/sec                  
>      6.627688626                  0      cpu-migrations            #    0.000 K/sec                  
>      6.627688626                  0      page-faults               #    0.000 K/sec                  
>      6.627688626            358,906      cycles                    #    2.543 GHz                      (81.90%)
>      6.627688626             26,411      stalled-cycles-frontend   #    7.36% frontend cycles idle     (80.87%)
>      6.627688626            129,526      stalled-cycles-backend    #   36.09% backend cycles idle      (77.72%)
>      6.627688626            278,756      instructions              #    0.78  insn per cycle         
>      6.627688626                                                   #    0.46  stalled cycles per insn  (70.31%)
>      6.627688626             56,514      branches                  #  400.448 M/sec                    (80.53%)
>      6.627688626                687      branch-misses             #    1.22% of all branches          (75.74%)
>      7.628688818               0.15 msec cpu-clock                 #    0.000 CPUs utilized          
>      7.628688818                  0      context-switches          #    0.000 K/sec                  
>      7.628688818                  0      cpu-migrations            #    0.000 K/sec                  
>      7.628688818                  0      page-faults               #    0.000 K/sec                  
>      7.628688818            384,382      cycles                    #    2.574 GHz                      (84.08%)
>      7.628688818             27,148      stalled-cycles-frontend   #    7.06% frontend cycles idle     (81.82%)
>      7.628688818            128,434      stalled-cycles-backend    #   33.41% backend cycles idle      (82.29%)
>      7.628688818            270,693      instructions              #    0.70  insn per cycle         
>      7.628688818                                                   #    0.47  stalled cycles per insn  (83.58%)
>      7.628688818             57,277      branches                  #  383.614 M/sec                    (81.68%)
>      7.628688818                756      branch-misses             #    1.32% of all branches          (85.39%)
> ^C     7.934955676               0.05 msec cpu-clock                 #    0.000 CPUs utilized          
>      7.934955676                  0      context-switches          #    0.000 K/sec                  
>      7.934955676                  0      cpu-migrations            #    0.000 K/sec                  
>      7.934955676                  0      page-faults               #    0.000 K/sec                  
>      7.934955676            126,813      cycles                    #    2.626 GHz                      (77.14%)
>      7.934955676              9,424      stalled-cycles-frontend   #    7.43% frontend cycles idle     (87.96%)
>      7.934955676             44,506      stalled-cycles-backend    #   35.10% backend cycles idle      (82.43%)
>      7.934955676             86,383      instructions              #    0.68  insn per cycle         
>      7.934955676                                                   #    0.52  stalled cycles per insn  (86.26%)
>      7.934955676             17,491      branches                  #  362.222 M/sec                    (87.10%)
>      7.934955676                247      branch-misses             #    1.41% of all branches          (76.62%)
> 
> [root@five ~]#

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* FIX Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2021-01-20 13:50                     ` Arnaldo Carvalho de Melo
@ 2021-01-20 16:30                       ` Arnaldo Carvalho de Melo
  2021-01-20 16:40                         ` Song Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-01-20 16:30 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Kernel Team

Em Wed, Jan 20, 2021 at 10:50:13AM -0300, Arnaldo Carvalho de Melo escreveu:
> So sizeof(struct bpf_perf_event_value) == 24 and it is a per-cpu array, the
> machine has 24 CPUs, so why does the kernel think it has more and end up
> zeroing entries after the 24 cores? Some percpu map subtlety (or obvious
> thing ;-\) I'm missing?
> 
> Checking lookups into per cpu maps in sample code now...
 
(gdb) run stat -b 244 -I 1000 -e cycles
Starting program: /root/bin/perf stat -b 244 -I 1000 -e cycles
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame

Breakpoint 1, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:217
217		if (list_empty(&evsel->bpf_counter_list))
(gdb) p num_
num_cpu              num_groups           num_leaps            num_print_iv         num_stmts            num_transitions      num_warnings_issued
num_cpu_bpf          num_ifs              num_print_interval   num_srcfiles         num_to_str           num_types
(gdb) p num_cpu
$1 = 24
(gdb) p num_cpu_bpf
$2 = 32
(gdb)

Humm, why?

But then libbpf and the sample/bpf/ code use it this way:


diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
index 8c977f038f497fc1..7dd3d57aba4f620c 100644
--- a/tools/perf/util/bpf_counter.c
+++ b/tools/perf/util/bpf_counter.c
@@ -207,7 +207,8 @@ static int bpf_program_profiler__enable(struct evsel *evsel)
 static int bpf_program_profiler__read(struct evsel *evsel)
 {
 	int num_cpu = evsel__nr_cpus(evsel);
-	struct bpf_perf_event_value values[num_cpu];
+	int num_cpu_bpf = libbpf_num_possible_cpus();
+	struct bpf_perf_event_value values[num_cpu > num_cpu_bpf ? num_cpu : num_cpu_bpf];
 	struct bpf_counter *counter;
 	int reading_map_fd;
 	__u32 key = 0;

-------------------------------------------------------------

[root@five ~]# cat /sys/devices/system/cpu/possible
0-31
[root@five ~]#

I bet that in your test systems evsel__nr_cpus(evsel) matches
/sys/devices/system/cpu/possible and thus you don't see the problem.

evsel__nr_cpus(evsel) uses what is in:

[acme@five perf]$ cat /sys/devices/system/cpu/online
0-23
[acme@five perf]$
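The mismatch above comes from two different sysfs files: evsel__nr_cpus() ends
up reflecting "online" (0-23, i.e. 24 CPUs) while libbpf counts "possible"
(0-31, i.e. 32 CPUs). The count is derived by parsing a range list such as
"0-31" or "0,2-4". A minimal, self-contained sketch of such a parser (a
hypothetical helper for illustration only — libbpf has its own implementation
in parse_cpu_mask_str()):

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Count CPUs in a sysfs range list such as "0-23" or "0,2-4".
 * Illustrative only; not the libbpf or perf implementation. */
static int count_cpus(const char *list)
{
	int count = 0, lo, hi;
	const char *p = list;

	while (*p) {
		if (sscanf(p, "%d-%d", &lo, &hi) == 2)
			count += hi - lo + 1;	/* a "lo-hi" range */
		else if (sscanf(p, "%d", &lo) == 1)
			count += 1;		/* a single CPU number */
		p = strchr(p, ',');
		if (!p)
			break;
		p++;			/* step past the comma */
	}
	return count;
}
```

With this, "0-23" yields 24 and "0-31" yields 32 — exactly the num_cpu vs.
num_cpu_bpf values seen in the gdb session above.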

So that is the reason for the problem and the fix is to use
libbpf_num_possible_cpus(), I'll bolt that into the patch that
introduced that code.
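The underlying rule: a lookup on a per-cpu map copies one value slot per
*possible* CPU, zero-filling slots for CPUs that are possible but offline, so
the caller's buffer must be sized by the possible count. A self-contained
sketch of the aggregation step (the struct layout follows the uapi
bpf_perf_event_value; the values array here is illustrative, not read from a
real map):

```c
#include <stdint.h>
#include <assert.h>

struct bpf_perf_event_value {
	uint64_t counter;
	uint64_t enabled;
	uint64_t running;
};

/* Sum the per-CPU readings a percpu-map lookup would return.
 * nr_possible must match libbpf_num_possible_cpus(); slots for
 * offline-but-possible CPUs arrive zeroed and add nothing. */
static struct bpf_perf_event_value
sum_readings(const struct bpf_perf_event_value *vals, int nr_possible)
{
	struct bpf_perf_event_value sum = { 0, 0, 0 };

	for (int i = 0; i < nr_possible; i++) {
		sum.counter += vals[i].counter;
		sum.enabled += vals[i].enabled;
		sum.running += vals[i].running;
	}
	return sum;
}
```

Sizing the buffer with the online count (24) while the kernel writes 32 slots
is exactly the overflow the patch above avoids.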

- Arnaldo

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: FIX Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2021-01-20 16:30                       ` FIX " Arnaldo Carvalho de Melo
@ 2021-01-20 16:40                         ` Song Liu
  2021-01-20 17:04                           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 19+ messages in thread
From: Song Liu @ 2021-01-20 16:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: open list, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Kernel Team



> On Jan 20, 2021, at 8:30 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> [...]
> But then libbpf and the sample/bpf/ code use it this way:
> 
> 
> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
> index 8c977f038f497fc1..7dd3d57aba4f620c 100644
> --- a/tools/perf/util/bpf_counter.c
> +++ b/tools/perf/util/bpf_counter.c
> @@ -207,7 +207,8 @@ static int bpf_program_profiler__enable(struct evsel *evsel)
> static int bpf_program_profiler__read(struct evsel *evsel)
> {
> 	int num_cpu = evsel__nr_cpus(evsel);
> -	struct bpf_perf_event_value values[num_cpu];
> +	int num_cpu_bpf = libbpf_num_possible_cpus();
> +	struct bpf_perf_event_value values[num_cpu > num_cpu_bpf ? num_cpu : num_cpu_bpf];
> 	struct bpf_counter *counter;
> 	int reading_map_fd;
> 	__u32 key = 0;
> 
> -------------------------------------------------------------
> 
> [root@five ~]# cat /sys/devices/system/cpu/possible
> 0-31
> [root@five ~]#
> 
> I bet that in your test systems evsel__nr_cpus(evsel) matches
> /sys/devices/system/cpu/possible and thus you don't see the problem.

Thanks Arnaldo!

Yes, my system has the same online and possible CPUs.

Since possible_cpu >= online_cpu, maybe we can use num_cpu_bpf in
bpf_program_profiler__read() without the extra check?

Song

[...]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: FIX Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs
  2021-01-20 16:40                         ` Song Liu
@ 2021-01-20 17:04                           ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-01-20 17:04 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Kernel Team

Em Wed, Jan 20, 2021 at 04:40:46PM +0000, Song Liu escreveu:
> 
> 
> > On Jan 20, 2021, at 8:30 AM, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > 
> > [...]
> 
> Thanks Arnaldo!
> 
> Yes, my system has the same online and possible CPUs.
> 
> Since possible_cpu >= online_cpu, maybe we can use num_cpu_bpf in 
> bpf_program_profiler__read() without the extra check? 

That is what I'll do, no need to resubmit, I'll audit the other bits to
see if something else needs changing.

- Arnaldo

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-01-20 21:09 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-29 21:42 [PATCH v7 0/3] Introduce perf-stat -b for BPF programs Song Liu
2020-12-29 21:42 ` [PATCH v7 1/3] bpftool: add Makefile target bootstrap Song Liu
2020-12-29 21:42 ` [PATCH v7 2/3] perf: support build BPF skeletons with perf Song Liu
2020-12-29 21:42 ` [PATCH v7 3/3] perf-stat: enable counting events for BPF programs Song Liu
2021-01-12  7:35   ` Namhyung Kim
2021-01-15 18:53     ` Arnaldo Carvalho de Melo
2021-01-18 19:38   ` Arnaldo Carvalho de Melo
2021-01-19  0:48     ` Song Liu
2021-01-19 14:31       ` Arnaldo Carvalho de Melo
2021-01-19 14:42         ` Arnaldo Carvalho de Melo
2021-01-19 16:31           ` Arnaldo Carvalho de Melo
2021-01-19 21:54             ` Song Liu
2021-01-19 22:30               ` Arnaldo Carvalho de Melo
2021-01-20 12:37                 ` Arnaldo Carvalho de Melo
2021-01-20 13:01                   ` Arnaldo Carvalho de Melo
2021-01-20 13:50                     ` Arnaldo Carvalho de Melo
2021-01-20 16:30                       ` FIX " Arnaldo Carvalho de Melo
2021-01-20 16:40                         ` Song Liu
2021-01-20 17:04                           ` Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).