linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 bpf-next 0/3] bpf: introduce bpf_get_branch_snapshot
@ 2021-08-26 22:13 Song Liu
  2021-08-26 22:13 ` [PATCH v2 bpf-next 1/3] perf: enable branch record for software events Song Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Song Liu @ 2021-08-26 22:13 UTC (permalink / raw)
  To: bpf, linux-kernel; +Cc: acme, peterz, mingo, kjain, kernel-team, Song Liu

Changes v1 => v2:
1. Rename the helper as bpf_get_branch_snapshot;
2. Fix/simplify the use of stack_call;
3. Instead of percpu variables, let intel_pmu_snapshot_branch_stack output
   branch records to an output argument of type perf_branch_snapshot.

Branch stack can be very useful in understanding software events. For
example, when a long function, e.g. sys_perf_event_open, returns an errno,
it is not obvious why the function failed. Branch stack could provide very
helpful information in this type of scenarios.

This set adds support to read branch stack with a new BPF helper
bpf_get_branch_trace(). Currently, this is only supported in Intel systems.
It is also possible to support the same feaure for PowerPC.

The hardware that records the branch stace is not stopped automatically on
software events. Therefore, it is necessary to stop it in software soon.
Otherwise, the hardware buffers/registers will be flushed. One of the key
design consideration in this set is to minimize the number of branch record
entries between the event triggers and the hardware recorder is stopped.
Based on this goal, current design is different from the discussions in
original RFC [1]:
 1) Static call is used when supported, to save function pointer
    dereference;
 2) intel_pmu_lbr_disable_all is used instead of perf_pmu_disable(),
    because the latter uses about 10 entries before stopping LBR.

With current code, on Intel CPU, LBR is stopped after 6 branch entries
after fexit triggers:

ID: 0 from intel_pmu_lbr_disable_all.part.10+37 to intel_pmu_lbr_disable_all.part.10+72
ID: 1 from intel_pmu_lbr_disable_all.part.10+33 to intel_pmu_lbr_disable_all.part.10+37
ID: 2 from intel_pmu_snapshot_branch_stack+46 to intel_pmu_lbr_disable_all.part.10+0
ID: 3 from __bpf_prog_enter+38 to intel_pmu_snapshot_branch_stack+0
ID: 4 from __bpf_prog_enter+8 to __bpf_prog_enter+38
ID: 5 from __brk_limit+477020214 to __bpf_prog_enter+0
ID: 6 from bpf_fexit_loop_test1+22 to __brk_limit+477020195
ID: 7 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
ID: 8 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
...

[1] https://lore.kernel.org/bpf/20210818012937.2522409-1-songliubraving@fb.com/

Song Liu (3):
  perf: enable branch record for software events
  bpf: introduce helper bpf_get_branch_snapshot
  selftests/bpf: add test for bpf_get_branch_snapshot

 arch/x86/events/intel/core.c                  |   5 +-
 arch/x86/events/intel/lbr.c                   |  13 +++
 arch/x86/events/perf_event.h                  |   2 +
 include/linux/bpf.h                           |   2 +
 include/linux/filter.h                        |   3 +-
 include/linux/perf_event.h                    |  29 +++++
 include/uapi/linux/bpf.h                      |  16 +++
 kernel/bpf/trampoline.c                       |  11 ++
 kernel/bpf/verifier.c                         |   7 ++
 kernel/events/core.c                          |   3 +
 kernel/trace/bpf_trace.c                      |  38 +++++++
 net/bpf/test_run.c                            |  15 ++-
 tools/include/uapi/linux/bpf.h                |  16 +++
 .../bpf/prog_tests/get_branch_snapshot.c      | 106 ++++++++++++++++++
 .../selftests/bpf/progs/get_branch_snapshot.c |  41 +++++++
 tools/testing/selftests/bpf/trace_helpers.c   |  30 +++++
 tools/testing/selftests/bpf/trace_helpers.h   |   5 +
 17 files changed, 339 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/get_branch_snapshot.c
 create mode 100644 tools/testing/selftests/bpf/progs/get_branch_snapshot.c

--
2.30.2

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 bpf-next 1/3] perf: enable branch record for software events
  2021-08-26 22:13 [PATCH v2 bpf-next 0/3] bpf: introduce bpf_get_branch_snapshot Song Liu
@ 2021-08-26 22:13 ` Song Liu
  2021-08-30 10:22   ` Peter Zijlstra
  2021-08-30 10:43   ` Peter Zijlstra
  2021-08-26 22:13 ` [PATCH v2 bpf-next 2/3] bpf: introduce helper bpf_get_branch_snapshot Song Liu
  2021-08-26 22:13 ` [PATCH v2 bpf-next 3/3] selftests/bpf: add test for bpf_get_branch_snapshot Song Liu
  2 siblings, 2 replies; 18+ messages in thread
From: Song Liu @ 2021-08-26 22:13 UTC (permalink / raw)
  To: bpf, linux-kernel; +Cc: acme, peterz, mingo, kjain, kernel-team, Song Liu

The typical way to access branch record (e.g. Intel LBR) is via hardware
perf_event. For CPUs with FREEZE_LBRS_ON_PMI support, PMI could capture
reliable LBR. On the other hand, LBR could also be useful in non-PMI
scenario. For example, in kretprobe or bpf fexit program, LBR could
provide a lot of information on what happened with the function. Add API
to use branch record for software use.

Note that, when the software event triggers, it is necessary to stop the
branch record hardware asap. Therefore, static_call is used to remove some
branch instructions in this process.

Signed-off-by: Song Liu <songliubraving@fb.com>

---
Some data on intel_pmu_lbr_disable_all() and perf_pmu_disable().

With this patch, when fexit program triggers, intel_pmu_lbr_disable_all is
used to stop the LBR, and the LBR is stopped after 6 extra branch records
(see the full trace below). If we replace intel_pmu_lbr_disable_all in
intel_pmu_snapshot_branch_stack() with perf_pmu_disable, the LBR is stopped
after 19 extra branch records. This is still acceptable for systems with 32
LBR entries. But for systems with fewer entries, all the entries before
fexit are flushed. Therefore, I suggest we take the short cut and stop LBR
asap.


LBR snapshot captured when we use intel_pmu_lbr_disable_all():

ID: 0 from intel_pmu_lbr_disable_all.part.10+37 to intel_pmu_lbr_disable_all.part.10+72
ID: 1 from intel_pmu_lbr_disable_all.part.10+33 to intel_pmu_lbr_disable_all.part.10+37
ID: 2 from intel_pmu_snapshot_branch_stack+51 to intel_pmu_lbr_disable_all.part.10+0
ID: 3 from __bpf_prog_enter+53 to intel_pmu_snapshot_branch_stack+0
ID: 4 from __bpf_prog_enter+8 to __bpf_prog_enter+38
ID: 5 from __brk_limit+473903158 to __bpf_prog_enter+0
ID: 6 from bpf_fexit_loop_test1+22 to __brk_limit+473903139
ID: 7 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
ID: 8 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
ID: 9 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13


LBR snapshot captured when we use perf_pmu_disable():

ID: 0 from intel_pmu_lbr_disable_all+58 to intel_pmu_lbr_disable_all+93
ID: 1 from intel_pmu_lbr_disable_all+54 to intel_pmu_lbr_disable_all+58
ID: 2 from intel_pmu_disable_all+15 to intel_pmu_lbr_disable_all+0
ID: 3 from intel_pmu_pebs_disable_all+30 to intel_pmu_disable_all+15
ID: 4 from intel_pmu_disable_all+10 to intel_pmu_pebs_disable_all+0
ID: 5 from __intel_pmu_disable_all+49 to intel_pmu_disable_all+10
ID: 6 from intel_pmu_disable_all+5 to __intel_pmu_disable_all+0
ID: 7 from x86_pmu_disable+61 to intel_pmu_disable_all+0
ID: 8 from x86_pmu_disable+38 to x86_pmu_disable+41
ID: 9 from __x86_indirect_thunk_rax+16 to x86_pmu_disable+0
ID: 10 from __x86_indirect_thunk_rax+0 to __x86_indirect_thunk_rax+12
ID: 11 from perf_pmu_disable.part.122+4 to __x86_indirect_thunk_rax+0
ID: 12 from perf_pmu_disable+23 to perf_pmu_disable.part.122+0
ID: 13 from intel_pmu_snapshot_branch_stack+45 to perf_pmu_disable+0
ID: 14 from x86_get_pmu+35 to intel_pmu_snapshot_branch_stack+39
ID: 15 from intel_pmu_snapshot_branch_stack+34 to x86_get_pmu+0
ID: 16 from __bpf_prog_enter+53 to intel_pmu_snapshot_branch_stack+0
ID: 17 from __bpf_prog_enter+8 to __bpf_prog_enter+38
ID: 18 from __brk_limit+478056502 to __bpf_prog_enter+0
ID: 19 from bpf_fexit_loop_test1+22 to __brk_limit+478056483
ID: 20 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
ID: 21 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13


---
 arch/x86/events/intel/core.c |  5 ++++-
 arch/x86/events/intel/lbr.c  | 13 +++++++++++++
 arch/x86/events/perf_event.h |  2 ++
 include/linux/perf_event.h   | 29 +++++++++++++++++++++++++++++
 kernel/events/core.c         |  3 +++
 5 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ac6fd2dabf6a2..a29649e7241cc 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6283,8 +6283,11 @@ __init int intel_pmu_init(void)
 			x86_pmu.lbr_nr = 0;
 	}

-	if (x86_pmu.lbr_nr)
+	if (x86_pmu.lbr_nr) {
 		pr_cont("%d-deep LBR, ", x86_pmu.lbr_nr);
+		static_call_update(perf_snapshot_branch_stack,
+				   intel_pmu_snapshot_branch_stack);
+	}

 	intel_pmu_check_extra_regs(x86_pmu.extra_regs);

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 9e6d6eaeb4cb6..7d4fe1d6e79ff 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1862,3 +1862,16 @@ EXPORT_SYMBOL_GPL(x86_perf_get_lbr);
 struct event_constraint vlbr_constraint =
 	__EVENT_CONSTRAINT(INTEL_FIXED_VLBR_EVENT, (1ULL << INTEL_PMC_IDX_FIXED_VLBR),
 			  FIXED_EVENT_FLAGS, 1, 0, PERF_X86_EVENT_LBR_SELECT);
+
+int intel_pmu_snapshot_branch_stack(struct perf_branch_snapshot *br_snapshot)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	intel_pmu_lbr_disable_all();
+	intel_pmu_lbr_read();
+	memcpy(br_snapshot->entries, cpuc->lbr_entries,
+	       sizeof(struct perf_branch_entry) * x86_pmu.lbr_nr);
+	br_snapshot->nr = x86_pmu.lbr_nr;
+	intel_pmu_lbr_enable_all(false);
+	return 0;
+}
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index e3ac05c97b5e5..0f4ca25d10bf1 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1379,6 +1379,8 @@ void intel_pmu_pebs_data_source_skl(bool pmem);

 int intel_pmu_setup_lbr_filter(struct perf_event *event);

+int intel_pmu_snapshot_branch_stack(struct perf_branch_snapshot *br_snapshot);
+
 void intel_pt_interrupt(void);

 int intel_bts_interrupt(void);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index fe156a8170aa3..f029eb4b2ce40 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -57,6 +57,7 @@ struct perf_guest_info_callbacks {
 #include <linux/cgroup.h>
 #include <linux/refcount.h>
 #include <linux/security.h>
+#include <linux/static_call.h>
 #include <asm/local.h>

 struct perf_callchain_entry {
@@ -1612,4 +1613,32 @@ extern void __weak arch_perf_update_userpage(struct perf_event *event,
 extern __weak u64 arch_perf_get_page_size(struct mm_struct *mm, unsigned long addr);
 #endif

+/*
+ * Snapshot branch stack on software events.
+ *
+ * Branch stack can be very useful in understanding software events. For
+ * example, when a long function, e.g. sys_perf_event_open, returns an
+ * errno, it is not obvious why the function failed. Branch stack could
+ * provide very helpful information in this type of scenarios.
+ *
+ * On software event, it is necessary to stop the hardware branch recorder
+ * fast. Otherwise, the hardware register/buffer will be flushed with
+ * entries af the triggering event. Therefore, static call is used to
+ * stop the hardware recorder.
+ *
+ * To use the snapshot:
+ * 1) After the event triggers, call perf_snapshot_branch_stack asap;
+ * 2) On the same cpu, access the snapshot with perf_read_branch_snapshot;
+ */
+#define MAX_BRANCH_SNAPSHOT 32
+
+struct perf_branch_snapshot {
+	unsigned int nr;
+	struct perf_branch_entry entries[MAX_BRANCH_SNAPSHOT];
+};
+
+int dummy_perf_snapshot_branch_stack(struct perf_branch_snapshot *br_snapshot);
+
+DECLARE_STATIC_CALL(perf_snapshot_branch_stack, dummy_perf_snapshot_branch_stack);
+
 #endif /* _LINUX_PERF_EVENT_H */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 011cc5069b7ba..c53fe90e630ac 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -13437,3 +13437,6 @@ struct cgroup_subsys perf_event_cgrp_subsys = {
 	.threaded	= true,
 };
 #endif /* CONFIG_CGROUP_PERF */
+
+DEFINE_STATIC_CALL_NULL(perf_snapshot_branch_stack,
+			dummy_perf_snapshot_branch_stack);
--
2.30.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 bpf-next 2/3] bpf: introduce helper bpf_get_branch_snapshot
  2021-08-26 22:13 [PATCH v2 bpf-next 0/3] bpf: introduce bpf_get_branch_snapshot Song Liu
  2021-08-26 22:13 ` [PATCH v2 bpf-next 1/3] perf: enable branch record for software events Song Liu
@ 2021-08-26 22:13 ` Song Liu
  2021-08-27  9:28   ` kernel test robot
  2021-08-27 15:10   ` kernel test robot
  2021-08-26 22:13 ` [PATCH v2 bpf-next 3/3] selftests/bpf: add test for bpf_get_branch_snapshot Song Liu
  2 siblings, 2 replies; 18+ messages in thread
From: Song Liu @ 2021-08-26 22:13 UTC (permalink / raw)
  To: bpf, linux-kernel; +Cc: acme, peterz, mingo, kjain, kernel-team, Song Liu

Introduce bpf_get_branch_snapshot(), which allows tracing pogram to get
branch trace from hardware (e.g. Intel LBR). To use the feature, the
user need to create perf_event with proper branch_record filtering
on each cpu, and then calls bpf_get_branch_snapshot in the bpf function.
On Intel CPUs, VLBR event (raw event 0x1b00) can be use for this.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/bpf.h            |  2 ++
 include/linux/filter.h         |  3 ++-
 include/uapi/linux/bpf.h       | 16 ++++++++++++++
 kernel/bpf/trampoline.c        | 11 ++++++++++
 kernel/bpf/verifier.c          |  7 +++++++
 kernel/trace/bpf_trace.c       | 38 ++++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h | 16 ++++++++++++++
 7 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f4c16f19f83e3..1868434dc519a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2220,4 +2220,6 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
 			u32 **bin_buf, u32 num_args);
 void bpf_bprintf_cleanup(void);
 
+DECLARE_PER_CPU(struct perf_branch_snapshot, bpf_perf_branch_snapshot);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 7d248941ecea3..8c30712f56ab2 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -575,7 +575,8 @@ struct bpf_prog {
 				has_callchain_buf:1, /* callchain buffer allocated? */
 				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
 				call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
-				call_get_func_ip:1; /* Do we call get_func_ip() */
+				call_get_func_ip:1, /* Do we call get_func_ip() */
+				call_get_branch:1; /* Do we call get_branch_trace() */
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
 	u32			len;		/* Number of filter blocks */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 791f31dd0abee..e113879d4f882 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4877,6 +4877,21 @@ union bpf_attr {
  *		Get the struct pt_regs associated with **task**.
  *	Return
  *		A pointer to struct pt_regs.
+ *
+ * long bpf_get_branch_snapshot(void *entries, u32 size)
+ *	Description
+ *		Get branch trace from hardware engines like Intel LBR. The
+ *		branch trace is taken soon after the trigger point of the
+ *		BPF program, so it may contain some entries after the
+ *		trigger point. The user need to filter these entries
+ *		accordingly.
+ *
+ *		The data is stored as struct perf_branch_entry into output
+ *		buffer *entries*. *size* is the size of *entries* in bytes.
+ *
+ *	Return
+ *		> 0, number of valid output entries.
+ *		**-EOPNOTSUP**, the hardware/kernel does not support this function
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5055,6 +5070,7 @@ union bpf_attr {
 	FN(get_func_ip),		\
 	FN(get_attach_cookie),		\
 	FN(task_pt_regs),		\
+	FN(get_branch_snapshot),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index fe1e857324e66..4e21982aec1c6 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -10,6 +10,7 @@
 #include <linux/rcupdate_trace.h>
 #include <linux/rcupdate_wait.h>
 #include <linux/module.h>
+#include <linux/static_call.h>
 
 /* dummy _ops. The verifier will operate on target program's ops. */
 const struct bpf_verifier_ops bpf_extension_verifier_ops = {
@@ -564,6 +565,16 @@ static void notrace inc_misses_counter(struct bpf_prog *prog)
 u64 notrace __bpf_prog_enter(struct bpf_prog *prog)
 	__acquires(RCU)
 {
+	/* Calling migrate_disable costs two entries in the LBR. To save
+	 * some entries, we call perf_snapshot_branch_stack before
+	 * migrate_disable to save some entries. This is OK because we
+	 * care about the branch trace before entering the BPF program.
+	 * If migrate happens exactly here, there isn't much we can do to
+	 * preserve the data.
+	 */
+	if (prog->call_get_branch)
+		static_call_cond(perf_snapshot_branch_stack)(
+			this_cpu_ptr(&bpf_perf_branch_snapshot));
 	rcu_read_lock();
 	migrate_disable();
 	if (unlikely(__this_cpu_inc_return(*(prog->active)) != 1)) {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 206c221453cfa..ba91ec0e204c0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6446,6 +6446,13 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		env->prog->call_get_func_ip = true;
 	}
 
+	if (func_id == BPF_FUNC_get_branch_snapshot) {
+		if (env->prog->aux->sleepable) {
+			verbose(env, "sleepable progs cannot call get_branch_snapshot\n");
+			return -ENOTSUPP;
+		}
+		env->prog->call_get_branch = true;
+	}
 	if (changes_data)
 		clear_all_pkt_pointers(env);
 	return 0;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 8e2eb950aa829..4ecade09369e6 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1017,6 +1017,29 @@ static const struct bpf_func_proto bpf_get_attach_cookie_proto_pe = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 };
 
+BPF_CALL_2(bpf_get_branch_snapshot, void *, buf, u32, size)
+{
+	u32 max_size;
+
+	if (this_cpu_ptr(&bpf_perf_branch_snapshot)->nr == 0)
+		return -EOPNOTSUPP;
+
+	max_size = this_cpu_ptr(&bpf_perf_branch_snapshot)->nr *
+		sizeof(struct perf_branch_entry);
+	memcpy(buf, this_cpu_ptr(&bpf_perf_branch_snapshot)->entries,
+	       min_t(u32, size, max_size));
+
+	return this_cpu_ptr(&bpf_perf_branch_snapshot)->nr;
+}
+
+static const struct bpf_func_proto bpf_get_branch_snapshot_proto = {
+	.func		= bpf_get_branch_snapshot,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+};
+
 static const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1132,6 +1155,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_snprintf_proto;
 	case BPF_FUNC_get_func_ip:
 		return &bpf_get_func_ip_proto_tracing;
+	case BPF_FUNC_get_branch_snapshot:
+		return &bpf_get_branch_snapshot_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
@@ -1863,9 +1888,22 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
 	preempt_enable();
 }
 
+DEFINE_PER_CPU(struct perf_branch_snapshot, bpf_perf_branch_snapshot);
+
 static __always_inline
 void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
 {
+	/* Calling migrate_disable costs two entries in the LBR. To save
+	 * some entries, we call perf_snapshot_branch_stack before
+	 * migrate_disable to save some entries. This is OK because we
+	 * care about the branch trace before entering the BPF program.
+	 * If migrate happens exactly here, there isn't much we can do to
+	 * preserve the data.
+	 */
+	if (prog->call_get_branch)
+		static_call_cond(perf_snapshot_branch_stack)(
+			this_cpu_ptr(&bpf_perf_branch_snapshot));
+
 	cant_sleep();
 	rcu_read_lock();
 	(void) bpf_prog_run(prog, args);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 791f31dd0abee..e113879d4f882 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4877,6 +4877,21 @@ union bpf_attr {
  *		Get the struct pt_regs associated with **task**.
  *	Return
  *		A pointer to struct pt_regs.
+ *
+ * long bpf_get_branch_snapshot(void *entries, u32 size)
+ *	Description
+ *		Get branch trace from hardware engines like Intel LBR. The
+ *		branch trace is taken soon after the trigger point of the
+ *		BPF program, so it may contain some entries after the
+ *		trigger point. The user need to filter these entries
+ *		accordingly.
+ *
+ *		The data is stored as struct perf_branch_entry into output
+ *		buffer *entries*. *size* is the size of *entries* in bytes.
+ *
+ *	Return
+ *		> 0, number of valid output entries.
+ *		**-EOPNOTSUP**, the hardware/kernel does not support this function
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5055,6 +5070,7 @@ union bpf_attr {
 	FN(get_func_ip),		\
 	FN(get_attach_cookie),		\
 	FN(task_pt_regs),		\
+	FN(get_branch_snapshot),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 bpf-next 3/3] selftests/bpf: add test for bpf_get_branch_snapshot
  2021-08-26 22:13 [PATCH v2 bpf-next 0/3] bpf: introduce bpf_get_branch_snapshot Song Liu
  2021-08-26 22:13 ` [PATCH v2 bpf-next 1/3] perf: enable branch record for software events Song Liu
  2021-08-26 22:13 ` [PATCH v2 bpf-next 2/3] bpf: introduce helper bpf_get_branch_snapshot Song Liu
@ 2021-08-26 22:13 ` Song Liu
  2 siblings, 0 replies; 18+ messages in thread
From: Song Liu @ 2021-08-26 22:13 UTC (permalink / raw)
  To: bpf, linux-kernel; +Cc: acme, peterz, mingo, kjain, kernel-team, Song Liu

This test uses bpf_get_branch_snapshot from a fexit program. The test uses
a target kernel function (bpf_fexit_loop_test1) and compares the record
against kallsyms. If there isn't enough record matching kallsyms, the
test fails.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 net/bpf/test_run.c                            |  15 ++-
 .../bpf/prog_tests/get_branch_snapshot.c      | 106 ++++++++++++++++++
 .../selftests/bpf/progs/get_branch_snapshot.c |  41 +++++++
 tools/testing/selftests/bpf/trace_helpers.c   |  30 +++++
 tools/testing/selftests/bpf/trace_helpers.h   |   5 +
 5 files changed, 196 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/get_branch_snapshot.c
 create mode 100644 tools/testing/selftests/bpf/progs/get_branch_snapshot.c

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 2eb0e55ef54d2..6cc179a532c9c 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -231,6 +231,18 @@ struct sock * noinline bpf_kfunc_call_test3(struct sock *sk)
 	return sk;
 }
 
+noinline int bpf_fexit_loop_test1(int n)
+{
+	int i, sum = 0;
+
+	/* the primary goal of this test is to test LBR. Create a lot of
+	 * branches in the function, so we can catch it easily.
+	 */
+	for (i = 0; i < n; i++)
+		sum += i;
+	return sum;
+}
+
 __diag_pop();
 
 ALLOW_ERROR_INJECTION(bpf_modify_return_test, ERRNO);
@@ -293,7 +305,8 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog,
 		    bpf_fentry_test5(11, (void *)12, 13, 14, 15) != 65 ||
 		    bpf_fentry_test6(16, (void *)17, 18, 19, (void *)20, 21) != 111 ||
 		    bpf_fentry_test7((struct bpf_fentry_test_t *)0) != 0 ||
-		    bpf_fentry_test8(&arg) != 0)
+		    bpf_fentry_test8(&arg) != 0 ||
+		    bpf_fexit_loop_test1(101) != 5050)
 			goto out;
 		break;
 	case BPF_MODIFY_RETURN:
diff --git a/tools/testing/selftests/bpf/prog_tests/get_branch_snapshot.c b/tools/testing/selftests/bpf/prog_tests/get_branch_snapshot.c
new file mode 100644
index 0000000000000..9bb16826418fb
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/get_branch_snapshot.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include <test_progs.h>
+#include "get_branch_snapshot.skel.h"
+
+static int *pfd_array;
+static int cpu_cnt;
+
+static int create_perf_events(void)
+{
+	struct perf_event_attr attr = {0};
+	int cpu;
+
+	/* create perf event */
+	attr.size = sizeof(attr);
+	attr.type = PERF_TYPE_RAW;
+	attr.config = 0x1b00;
+	attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
+	attr.branch_sample_type = PERF_SAMPLE_BRANCH_KERNEL |
+		PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_ANY;
+
+	cpu_cnt = libbpf_num_possible_cpus();
+	pfd_array = malloc(sizeof(int) * cpu_cnt);
+	if (!pfd_array) {
+		cpu_cnt = 0;
+		return 1;
+	}
+
+	for (cpu = 0; cpu < libbpf_num_possible_cpus(); cpu++) {
+		pfd_array[cpu] = syscall(__NR_perf_event_open, &attr,
+					 -1, cpu, -1, PERF_FLAG_FD_CLOEXEC);
+		if (pfd_array[cpu] < 0)
+			break;
+	}
+
+	return cpu == 0;
+}
+
+static void close_perf_events(void)
+{
+	int cpu = 0;
+	int fd;
+
+	while (cpu++ < cpu_cnt) {
+		fd = pfd_array[cpu];
+		if (fd < 0)
+			break;
+		close(fd);
+	}
+	free(pfd_array);
+}
+
+void test_get_branch_snapshot(void)
+{
+	struct get_branch_snapshot *skel;
+	int err, prog_fd;
+	__u32 retval;
+
+	if (create_perf_events()) {
+		test__skip();  /* system doesn't support LBR */
+		goto cleanup;
+	}
+
+	skel = get_branch_snapshot__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "get_branch_snapshot__open_and_load"))
+		goto cleanup;
+
+	err = kallsyms_find("bpf_fexit_loop_test1", &skel->bss->address_low);
+	if (!ASSERT_OK(err, "kallsyms_find"))
+		goto cleanup;
+
+	err = kallsyms_find_next("bpf_fexit_loop_test1", &skel->bss->address_high);
+	if (!ASSERT_OK(err, "kallsyms_find_next"))
+		goto cleanup;
+
+	err = get_branch_snapshot__attach(skel);
+	if (!ASSERT_OK(err, "get_branch_snapshot__attach"))
+		goto cleanup;
+
+	prog_fd = bpf_program__fd(skel->progs.test1);
+	err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+				NULL, 0, &retval, NULL);
+
+	if (!ASSERT_OK(err, "bpf_prog_test_run"))
+		goto cleanup;
+
+	if (skel->bss->total_entries < 16) {
+		/* too few entries for the hit/waste test */
+		test__skip();
+		goto cleanup;
+	}
+
+	ASSERT_GT(skel->bss->test1_hits, 5, "find_test1_in_lbr");
+
+	/* Given we stop LBR in software, we will waste a few entries.
+	 * But we should try to waste as few as possibleentries. We are at
+	 * about 7 on x86_64 systems.
+	 * Add a check for < 10 so that we get heads-up when something
+	 * changes and wastes too many entries.
+	 */
+	ASSERT_LT(skel->bss->wasted_entries, 10, "check_wasted_entries");
+
+cleanup:
+	get_branch_snapshot__destroy(skel);
+	close_perf_events();
+}
diff --git a/tools/testing/selftests/bpf/progs/get_branch_snapshot.c b/tools/testing/selftests/bpf/progs/get_branch_snapshot.c
new file mode 100644
index 0000000000000..9c944e7480b95
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/get_branch_snapshot.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+__u64 test1_hits = 0;
+__u64 address_low = 0;
+__u64 address_high = 0;
+int wasted_entries = 0;
+long total_entries = 0;
+
+#define MAX_LBR_ENTRIES 32
+
+struct perf_branch_entry entries[MAX_LBR_ENTRIES] = {};
+
+
+static inline bool in_range(__u64 val)
+{
+	return (val >= address_low) && (val < address_high);
+}
+
+SEC("fexit/bpf_fexit_loop_test1")
+int BPF_PROG(test1, int n, int ret)
+{
+	long i;
+
+	total_entries = bpf_get_branch_snapshot(entries, sizeof(entries));
+
+	for (i = 0; i < MAX_LBR_ENTRIES; i++) {
+		if (i >= total_entries)
+			break;
+		if (in_range(entries[i].from) && in_range(entries[i].to))
+			test1_hits++;
+		else if (!test1_hits)
+			wasted_entries++;
+	}
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index e7a19b04d4eaf..2926a3b626821 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -117,6 +117,36 @@ int kallsyms_find(const char *sym, unsigned long long *addr)
 	return err;
 }
 
+/* find the address of the next symbol, this can be used to determine the
+ * end of a function
+ */
+int kallsyms_find_next(const char *sym, unsigned long long *addr)
+{
+	char type, name[500];
+	unsigned long long value;
+	bool found = false;
+	int err = 0;
+	FILE *f;
+
+	f = fopen("/proc/kallsyms", "r");
+	if (!f)
+		return -EINVAL;
+
+	while (fscanf(f, "%llx %c %499s%*[^\n]\n", &value, &type, name) > 0) {
+		if (found) {
+			*addr = value;
+			goto out;
+		}
+		if (strcmp(name, sym) == 0)
+			found = true;
+	}
+	err = -ENOENT;
+
+out:
+	fclose(f);
+	return err;
+}
+
 void read_trace_pipe(void)
 {
 	int trace_fd;
diff --git a/tools/testing/selftests/bpf/trace_helpers.h b/tools/testing/selftests/bpf/trace_helpers.h
index d907b445524d5..bc8ed86105d94 100644
--- a/tools/testing/selftests/bpf/trace_helpers.h
+++ b/tools/testing/selftests/bpf/trace_helpers.h
@@ -16,6 +16,11 @@ long ksym_get_addr(const char *name);
 /* open kallsyms and find addresses on the fly, faster than load + search. */
 int kallsyms_find(const char *sym, unsigned long long *addr);
 
+/* find the address of the next symbol, this can be used to determine the
+ * end of a function
+ */
+int kallsyms_find_next(const char *sym, unsigned long long *addr);
+
 void read_trace_pipe(void);
 
 ssize_t get_uprobe_offset(const void *addr, ssize_t base);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 2/3] bpf: introduce helper bpf_get_branch_snapshot
  2021-08-26 22:13 ` [PATCH v2 bpf-next 2/3] bpf: introduce helper bpf_get_branch_snapshot Song Liu
@ 2021-08-27  9:28   ` kernel test robot
  2021-08-27 15:10   ` kernel test robot
  1 sibling, 0 replies; 18+ messages in thread
From: kernel test robot @ 2021-08-27  9:28 UTC (permalink / raw)
  To: Song Liu, bpf, linux-kernel
  Cc: kbuild-all, acme, peterz, mingo, kjain, kernel-team, Song Liu

[-- Attachment #1: Type: text/plain, Size: 1885 bytes --]

Hi Song,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Song-Liu/bpf-introduce-bpf_get_branch_snapshot/20210827-061705
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ad355675cf08d95b3f5171b681420cfc8e2f1b7e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Song-Liu/bpf-introduce-bpf_get_branch_snapshot/20210827-061705
        git checkout ad355675cf08d95b3f5171b681420cfc8e2f1b7e
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arm64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   aarch64-linux-ld: kernel/bpf/trampoline.o: in function `__bpf_prog_enter':
>> trampoline.c:(.text+0x61c): undefined reference to `bpf_perf_branch_snapshot'
   aarch64-linux-ld: kernel/bpf/trampoline.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `bpf_perf_branch_snapshot' which may bind externally can not be used when making a shared object; recompile with -fPIC
   trampoline.c:(.text+0x61c): dangerous relocation: unsupported relocation
>> aarch64-linux-ld: trampoline.c:(.text+0x624): undefined reference to `bpf_perf_branch_snapshot'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 55207 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 2/3] bpf: introduce helper bpf_get_branch_snapshot
  2021-08-26 22:13 ` [PATCH v2 bpf-next 2/3] bpf: introduce helper bpf_get_branch_snapshot Song Liu
  2021-08-27  9:28   ` kernel test robot
@ 2021-08-27 15:10   ` kernel test robot
  2021-08-30 10:47     ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: kernel test robot @ 2021-08-27 15:10 UTC (permalink / raw)
  To: Song Liu, bpf, linux-kernel
  Cc: kbuild-all, acme, peterz, mingo, kjain, kernel-team, Song Liu

[-- Attachment #1: Type: text/plain, Size: 1651 bytes --]

Hi Song,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Song-Liu/bpf-introduce-bpf_get_branch_snapshot/20210827-061705
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: riscv-randconfig-r023-20210827 (attached as .config)
compiler: riscv32-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ad355675cf08d95b3f5171b681420cfc8e2f1b7e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Song-Liu/bpf-introduce-bpf_get_branch_snapshot/20210827-061705
        git checkout ad355675cf08d95b3f5171b681420cfc8e2f1b7e
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=riscv SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   riscv32-linux-ld: kernel/bpf/trampoline.o: in function `.L57':
>> trampoline.c:(.text+0x34c): undefined reference to `__SCK__perf_snapshot_branch_stack'
   riscv32-linux-ld: kernel/bpf/trampoline.o: in function `.L61':
   trampoline.c:(.text+0x360): undefined reference to `bpf_perf_branch_snapshot'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37464 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] perf: enable branch record for software events
  2021-08-26 22:13 ` [PATCH v2 bpf-next 1/3] perf: enable branch record for software events Song Liu
@ 2021-08-30 10:22   ` Peter Zijlstra
  2021-08-30 15:25     ` Song Liu
  2021-08-30 17:41     ` Song Liu
  2021-08-30 10:43   ` Peter Zijlstra
  1 sibling, 2 replies; 18+ messages in thread
From: Peter Zijlstra @ 2021-08-30 10:22 UTC (permalink / raw)
  To: Song Liu; +Cc: bpf, linux-kernel, acme, mingo, kjain, kernel-team

On Thu, Aug 26, 2021 at 03:13:04PM -0700, Song Liu wrote:
> +int dummy_perf_snapshot_branch_stack(struct perf_branch_snapshot *br_snapshot);
> +
> +DECLARE_STATIC_CALL(perf_snapshot_branch_stack, dummy_perf_snapshot_branch_stack);
> +
>  #endif /* _LINUX_PERF_EVENT_H */
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 011cc5069b7ba..c53fe90e630ac 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -13437,3 +13437,6 @@ struct cgroup_subsys perf_event_cgrp_subsys = {
>  	.threaded	= true,
>  };
>  #endif /* CONFIG_CGROUP_PERF */
> +
> +DEFINE_STATIC_CALL_NULL(perf_snapshot_branch_stack,
> +			dummy_perf_snapshot_branch_stack);

This isn't right...

The whole dummy_perf_snapshot_branch_stack() thing is a declaration only
and used as a typedef. Also, DEFINE_STATIC_CALL_NULL() and
static_call_cond() rely on a void return value, which it doesn't have.

Did you want:

  DECLARE_STATIC_CALL(perf_snapshot_branch_stack, void (*)(struct perf_branch_snapshot *));

  DEFINE_STATIC_CALL_NULL(perf_snapshot_branch_stack, void (*)(struct perf_branch_snapshot *));

  static_call_cond(perf_snapshot_branch_stack)(...);

*OR*, do you actually need that return value, in which case you're
probably looking for:

  DECLARE_STATIC_CALL(perf_snapshot_branch_stack, int (*)(struct perf_branch_snapshot *));

  DEFINE_STATIC_CALL_RET0(perf_snapshot_branch_stack, int (*)(struct perf_branch_snapshot *));

  ret = static_call(perf_snapshot_branch_stack)(...);

?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] perf: enable branch record for software events
  2021-08-26 22:13 ` [PATCH v2 bpf-next 1/3] perf: enable branch record for software events Song Liu
  2021-08-30 10:22   ` Peter Zijlstra
@ 2021-08-30 10:43   ` Peter Zijlstra
  2021-08-30 16:06     ` Song Liu
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2021-08-30 10:43 UTC (permalink / raw)
  To: Song Liu; +Cc: bpf, linux-kernel, acme, mingo, kjain, kernel-team

On Thu, Aug 26, 2021 at 03:13:04PM -0700, Song Liu wrote:

> Some data on intel_pmu_lbr_disable_all() and perf_pmu_disable().
> 
> With this patch, when fexit program triggers, intel_pmu_lbr_disable_all is
> used to stop the LBR, and the LBR is stopped after 6 extra branch records
> (see the full trace below). If we replace intel_pmu_lbr_disable_all in
> intel_pmu_snapshot_branch_stack() with perf_pmu_disable, the LBR is stopped
> after 19 extra branch records. This is still acceptable for systems with 32
> LBR entries. But for systems with fewer entries, all the entries before
> fexit are flushed. Therefore, I suggest we take the short cut and stop LBR
> asap.
> 
> 
> LBR snapshot captured when we use intel_pmu_lbr_disable_all():
> 
> ID: 0 from intel_pmu_lbr_disable_all.part.10+37 to intel_pmu_lbr_disable_all.part.10+72
> ID: 1 from intel_pmu_lbr_disable_all.part.10+33 to intel_pmu_lbr_disable_all.part.10+37
> ID: 2 from intel_pmu_snapshot_branch_stack+51 to intel_pmu_lbr_disable_all.part.10+0
> ID: 3 from __bpf_prog_enter+53 to intel_pmu_snapshot_branch_stack+0
> ID: 4 from __bpf_prog_enter+8 to __bpf_prog_enter+38
> ID: 5 from __brk_limit+473903158 to __bpf_prog_enter+0
> ID: 6 from bpf_fexit_loop_test1+22 to __brk_limit+473903139
> ID: 7 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
> ID: 8 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
> ID: 9 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
> 
> 
> LBR snapshot captured when we use perf_pmu_disable():
> 
> ID: 0 from intel_pmu_lbr_disable_all+58 to intel_pmu_lbr_disable_all+93
> ID: 1 from intel_pmu_lbr_disable_all+54 to intel_pmu_lbr_disable_all+58
> ID: 2 from intel_pmu_disable_all+15 to intel_pmu_lbr_disable_all+0
> ID: 3 from intel_pmu_pebs_disable_all+30 to intel_pmu_disable_all+15
> ID: 4 from intel_pmu_disable_all+10 to intel_pmu_pebs_disable_all+0
> ID: 5 from __intel_pmu_disable_all+49 to intel_pmu_disable_all+10
> ID: 6 from intel_pmu_disable_all+5 to __intel_pmu_disable_all+0
> ID: 7 from x86_pmu_disable+61 to intel_pmu_disable_all+0
> ID: 8 from x86_pmu_disable+38 to x86_pmu_disable+41
> ID: 9 from __x86_indirect_thunk_rax+16 to x86_pmu_disable+0
> ID: 10 from __x86_indirect_thunk_rax+0 to __x86_indirect_thunk_rax+12
> ID: 11 from perf_pmu_disable.part.122+4 to __x86_indirect_thunk_rax+0
> ID: 12 from perf_pmu_disable+23 to perf_pmu_disable.part.122+0
> ID: 13 from intel_pmu_snapshot_branch_stack+45 to perf_pmu_disable+0
> ID: 14 from x86_get_pmu+35 to intel_pmu_snapshot_branch_stack+39
> ID: 15 from intel_pmu_snapshot_branch_stack+34 to x86_get_pmu+0
> ID: 16 from __bpf_prog_enter+53 to intel_pmu_snapshot_branch_stack+0
> ID: 17 from __bpf_prog_enter+8 to __bpf_prog_enter+38
> ID: 18 from __brk_limit+478056502 to __bpf_prog_enter+0
> ID: 19 from bpf_fexit_loop_test1+22 to __brk_limit+478056483
> ID: 20 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
> ID: 21 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13

Well, if you're willing to do something like:

> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index ac6fd2dabf6a2..a29649e7241cc 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -6283,8 +6283,11 @@ __init int intel_pmu_init(void)
>  			x86_pmu.lbr_nr = 0;
>  	}
> 
> -	if (x86_pmu.lbr_nr)
> +	if (x86_pmu.lbr_nr) {
>  		pr_cont("%d-deep LBR, ", x86_pmu.lbr_nr);

		if (x86_pmu.disable_all == intel_pmu_disable_all)

> +		static_call_update(perf_snapshot_branch_stack,
> +				   intel_pmu_snapshot_branch_stack);
> +	}
> 
>  	intel_pmu_check_extra_regs(x86_pmu.extra_regs);
> 
> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
> index 9e6d6eaeb4cb6..7d4fe1d6e79ff 100644
> --- a/arch/x86/events/intel/lbr.c
> +++ b/arch/x86/events/intel/lbr.c
> @@ -1862,3 +1862,16 @@ EXPORT_SYMBOL_GPL(x86_perf_get_lbr);
>  struct event_constraint vlbr_constraint =
>  	__EVENT_CONSTRAINT(INTEL_FIXED_VLBR_EVENT, (1ULL << INTEL_PMC_IDX_FIXED_VLBR),
>  			  FIXED_EVENT_FLAGS, 1, 0, PERF_X86_EVENT_LBR_SELECT);
> +
> +int intel_pmu_snapshot_branch_stack(struct perf_branch_snapshot *br_snapshot)
> +{
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +
> +	intel_pmu_lbr_disable_all();
> +	intel_pmu_lbr_read();
> +	memcpy(br_snapshot->entries, cpuc->lbr_entries,
> +	       sizeof(struct perf_branch_entry) * x86_pmu.lbr_nr);
> +	br_snapshot->nr = x86_pmu.lbr_nr;
> +	intel_pmu_lbr_enable_all(false);
> +	return 0;
> +}

Then the above can assume perfmon > v2 and we can either inline
__intel_pmu_disable_all() or simply do the
wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL).

One thing that needs checking, intel_pmu_disable_all() also clears
MSR_IA32_PEBS_ENABLE, is that really needed if we just want to inhibit
PMIs ? That is, will the PEBS machinery still trigger PMI if GLOBAL_CTRL
== 0 ?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 2/3] bpf: introduce helper bpf_get_branch_snapshot
  2021-08-27 15:10   ` kernel test robot
@ 2021-08-30 10:47     ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2021-08-30 10:47 UTC (permalink / raw)
  To: kernel test robot
  Cc: Song Liu, bpf, linux-kernel, kbuild-all, acme, mingo, kjain, kernel-team

On Fri, Aug 27, 2021 at 11:10:39PM +0800, kernel test robot wrote:

> All errors (new ones prefixed by >>):
> 
>    riscv32-linux-ld: kernel/bpf/trampoline.o: in function `.L57':
> >> trampoline.c:(.text+0x34c): undefined reference to `__SCK__perf_snapshot_branch_stack'
>    riscv32-linux-ld: kernel/bpf/trampoline.o: in function `.L61':
>    trampoline.c:(.text+0x360): undefined reference to `bpf_perf_branch_snapshot'
> 

This a build with PERF_EVENTS=n, I suppose you'd better make calling
perf_snapshot_branch_stack() dependent on having that :-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] perf: enable branch record for software events
  2021-08-30 10:22   ` Peter Zijlstra
@ 2021-08-30 15:25     ` Song Liu
  2021-08-30 16:06       ` Peter Zijlstra
  2021-08-30 17:41     ` Song Liu
  1 sibling, 1 reply; 18+ messages in thread
From: Song Liu @ 2021-08-30 15:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: open list:BPF (Safe dynamic programs and tools),
	LKML, Arnaldo Carvalho de Melo, Ingo Molnar, kajoljain,
	Kernel Team



> On Aug 30, 2021, at 3:22 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Thu, Aug 26, 2021 at 03:13:04PM -0700, Song Liu wrote:
>> +int dummy_perf_snapshot_branch_stack(struct perf_branch_snapshot *br_snapshot);
>> +
>> +DECLARE_STATIC_CALL(perf_snapshot_branch_stack, dummy_perf_snapshot_branch_stack);
>> +
>> #endif /* _LINUX_PERF_EVENT_H */
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 011cc5069b7ba..c53fe90e630ac 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -13437,3 +13437,6 @@ struct cgroup_subsys perf_event_cgrp_subsys = {
>> 	.threaded	= true,
>> };
>> #endif /* CONFIG_CGROUP_PERF */
>> +
>> +DEFINE_STATIC_CALL_NULL(perf_snapshot_branch_stack,
>> +			dummy_perf_snapshot_branch_stack);
> 
> This isn't right...
> 
> The whole dummy_perf_snapshot_branch_stack() thing is a declaration only
> and used as a typedef. Also, DEFINE_STATIC_CALL_NULL() and
> static_call_cond() rely on a void return value, which it doesn't have.
> 
> Did you want:
> 
>  DECLARE_STATIC_CALL(perf_snapshot_branch_stack, void (*)(struct perf_branch_snapshot *));
> 
>  DEFINE_STATIC_CALL_NULL(perf_snapshot_branch_stack, void (*)(struct perf_branch_snapshot *));
> 
>  static_call_cond(perf_snapshot_branch_stack)(...);
> 
> *OR*, do you actually need that return value, in which case you're
> probably looking for:
> 
>  DECLARE_STATIC_CALL(perf_snapshot_branch_stack, int (*)(struct perf_branch_snapshot *));
> 
>  DEFINE_STATIC_CALL_RET0(perf_snapshot_branch_stack, int (*)(struct perf_branch_snapshot *));
> 
>  ret = static_call(perf_snapshot_branch_stack)(...);
> 
> ?

Thanks for these information! I did get confused these macros for quite a 
while. Let me try with the _RET0 version.

Song  


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] perf: enable branch record for software events
  2021-08-30 10:43   ` Peter Zijlstra
@ 2021-08-30 16:06     ` Song Liu
  0 siblings, 0 replies; 18+ messages in thread
From: Song Liu @ 2021-08-30 16:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: open list:BPF (Safe dynamic programs and tools),
	LKML, Arnaldo Carvalho de Melo, Ingo Molnar, kajoljain,
	Kernel Team



> On Aug 30, 2021, at 3:43 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Thu, Aug 26, 2021 at 03:13:04PM -0700, Song Liu wrote:
> 
>> Some data on intel_pmu_lbr_disable_all() and perf_pmu_disable().
>> 
>> With this patch, when fexit program triggers, intel_pmu_lbr_disable_all is
>> used to stop the LBR, and the LBR is stopped after 6 extra branch records
>> (see the full trace below). If we replace intel_pmu_lbr_disable_all in
>> intel_pmu_snapshot_branch_stack() with perf_pmu_disable, the LBR is stopped
>> after 19 extra branch records. This is still acceptable for systems with 32
>> LBR entries. But for systems with fewer entries, all the entries before
>> fexit are flushed. Therefore, I suggest we take the short cut and stop LBR
>> asap.
>> 
>> 
>> LBR snapshot captured when we use intel_pmu_lbr_disable_all():
>> 
>> ID: 0 from intel_pmu_lbr_disable_all.part.10+37 to intel_pmu_lbr_disable_all.part.10+72
>> ID: 1 from intel_pmu_lbr_disable_all.part.10+33 to intel_pmu_lbr_disable_all.part.10+37
>> ID: 2 from intel_pmu_snapshot_branch_stack+51 to intel_pmu_lbr_disable_all.part.10+0
>> ID: 3 from __bpf_prog_enter+53 to intel_pmu_snapshot_branch_stack+0
>> ID: 4 from __bpf_prog_enter+8 to __bpf_prog_enter+38
>> ID: 5 from __brk_limit+473903158 to __bpf_prog_enter+0
>> ID: 6 from bpf_fexit_loop_test1+22 to __brk_limit+473903139
>> ID: 7 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
>> ID: 8 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
>> ID: 9 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
>> 
>> 
>> LBR snapshot captured when we use perf_pmu_disable():
>> 
>> ID: 0 from intel_pmu_lbr_disable_all+58 to intel_pmu_lbr_disable_all+93
>> ID: 1 from intel_pmu_lbr_disable_all+54 to intel_pmu_lbr_disable_all+58
>> ID: 2 from intel_pmu_disable_all+15 to intel_pmu_lbr_disable_all+0
>> ID: 3 from intel_pmu_pebs_disable_all+30 to intel_pmu_disable_all+15
>> ID: 4 from intel_pmu_disable_all+10 to intel_pmu_pebs_disable_all+0
>> ID: 5 from __intel_pmu_disable_all+49 to intel_pmu_disable_all+10
>> ID: 6 from intel_pmu_disable_all+5 to __intel_pmu_disable_all+0
>> ID: 7 from x86_pmu_disable+61 to intel_pmu_disable_all+0
>> ID: 8 from x86_pmu_disable+38 to x86_pmu_disable+41
>> ID: 9 from __x86_indirect_thunk_rax+16 to x86_pmu_disable+0
>> ID: 10 from __x86_indirect_thunk_rax+0 to __x86_indirect_thunk_rax+12
>> ID: 11 from perf_pmu_disable.part.122+4 to __x86_indirect_thunk_rax+0
>> ID: 12 from perf_pmu_disable+23 to perf_pmu_disable.part.122+0
>> ID: 13 from intel_pmu_snapshot_branch_stack+45 to perf_pmu_disable+0
>> ID: 14 from x86_get_pmu+35 to intel_pmu_snapshot_branch_stack+39
>> ID: 15 from intel_pmu_snapshot_branch_stack+34 to x86_get_pmu+0
>> ID: 16 from __bpf_prog_enter+53 to intel_pmu_snapshot_branch_stack+0
>> ID: 17 from __bpf_prog_enter+8 to __bpf_prog_enter+38
>> ID: 18 from __brk_limit+478056502 to __bpf_prog_enter+0
>> ID: 19 from bpf_fexit_loop_test1+22 to __brk_limit+478056483
>> ID: 20 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
>> ID: 21 from bpf_fexit_loop_test1+20 to bpf_fexit_loop_test1+13
> 
> Well, if you're willing to do something like:
> 
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index ac6fd2dabf6a2..a29649e7241cc 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -6283,8 +6283,11 @@ __init int intel_pmu_init(void)
>> 			x86_pmu.lbr_nr = 0;
>> 	}
>> 
>> -	if (x86_pmu.lbr_nr)
>> +	if (x86_pmu.lbr_nr) {
>> 		pr_cont("%d-deep LBR, ", x86_pmu.lbr_nr);
> 
> 		if (x86_pmu.disable_all == intel_pmu_disable_all)
> 
>> +		static_call_update(perf_snapshot_branch_stack,
>> +				   intel_pmu_snapshot_branch_stack);
>> +	}
>> 
>> 	intel_pmu_check_extra_regs(x86_pmu.extra_regs);
>> 
>> diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
>> index 9e6d6eaeb4cb6..7d4fe1d6e79ff 100644
>> --- a/arch/x86/events/intel/lbr.c
>> +++ b/arch/x86/events/intel/lbr.c
>> @@ -1862,3 +1862,16 @@ EXPORT_SYMBOL_GPL(x86_perf_get_lbr);
>> struct event_constraint vlbr_constraint =
>> 	__EVENT_CONSTRAINT(INTEL_FIXED_VLBR_EVENT, (1ULL << INTEL_PMC_IDX_FIXED_VLBR),
>> 			  FIXED_EVENT_FLAGS, 1, 0, PERF_X86_EVENT_LBR_SELECT);
>> +
>> +int intel_pmu_snapshot_branch_stack(struct perf_branch_snapshot *br_snapshot)
>> +{
>> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> +
>> +	intel_pmu_lbr_disable_all();
>> +	intel_pmu_lbr_read();
>> +	memcpy(br_snapshot->entries, cpuc->lbr_entries,
>> +	       sizeof(struct perf_branch_entry) * x86_pmu.lbr_nr);
>> +	br_snapshot->nr = x86_pmu.lbr_nr;
>> +	intel_pmu_lbr_enable_all(false);
>> +	return 0;
>> +}
> 
> Then the above can assume perfmon > v2 and we can either inline
> __intel_pmu_disable_all() or simply do the
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL).

I think can do perfmon > v2 only. 

> 
> One thing that needs checking, intel_pmu_disable_all() also clears
> MSR_IA32_PEBS_ENABLE, is that really needed if we just want to inhibit
> PMIs ? That is, will the PEBS machinery still trigger PMI if GLOBAL_CTRL
> == 0 ?

Actually, can we do something like:

static void intel_pmu_disable_all(void)
{
        intel_pmu_lbr_disable_all();   /* moved to the beginning */
        __intel_pmu_disable_all();
        intel_pmu_pebs_disable_all();
}

int intel_pmu_snapshot_branch_stack(struct perf_branch_snapshot *br_snapshot)
{
        struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);

	
        intel_pmu_disable_all();   /* call full pmu_disable */
        intel_pmu_lbr_read();
        memcpy(br_snapshot->entries, cpuc->lbr_entries,
               sizeof(struct perf_branch_entry) * x86_pmu.lbr_nr);
        br_snapshot->nr = x86_pmu.lbr_nr;

        intel_pmu_enable_all(false);
        return 0;
}

In this way, we still call intel_pmu_disable_all(), but since LBR is disabled 
at the beginning of it, we would not flush too many LBR entries. 

Thanks,
Song

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] perf: enable branch record for software events
  2021-08-30 15:25     ` Song Liu
@ 2021-08-30 16:06       ` Peter Zijlstra
  2021-08-30 16:36         ` Song Liu
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2021-08-30 16:06 UTC (permalink / raw)
  To: Song Liu
  Cc: open list:BPF (Safe dynamic programs and tools),
	LKML, Arnaldo Carvalho de Melo, Ingo Molnar, kajoljain,
	Kernel Team

On Mon, Aug 30, 2021 at 03:25:44PM +0000, Song Liu wrote:
> Thanks for these information! I did get confused these macros for quite a 
> while. Let me try with the _RET0 version.

Does you kernel have:

  9ae6ab27f44e ("static_call: Update API documentation")

?

With that included, the comment at the top of static_call.h reads like
the below. Please let me know where you think this can be improved.


/*
 * Static call support
 *
 * Static calls use code patching to hard-code function pointers into direct
 * branch instructions. They give the flexibility of function pointers, but
 * with improved performance. This is especially important for cases where
 * retpolines would otherwise be used, as retpolines can significantly impact
 * performance.
 *
 *
 * API overview:
 *
 *   DECLARE_STATIC_CALL(name, func);
 *   DEFINE_STATIC_CALL(name, func);
 *   DEFINE_STATIC_CALL_NULL(name, typename);
 *   DEFINE_STATIC_CALL_RET0(name, typename);
 *
 *   __static_call_return0;
 *
 *   static_call(name)(args...);
 *   static_call_cond(name)(args...);
 *   static_call_update(name, func);
 *   static_call_query(name);
 *
 *   EXPORT_STATIC_CALL{,_TRAMP}{,_GPL}()
 *
 * Usage example:
 *
 *   # Start with the following functions (with identical prototypes):
 *   int func_a(int arg1, int arg2);
 *   int func_b(int arg1, int arg2);
 *
 *   # Define a 'my_name' reference, associated with func_a() by default
 *   DEFINE_STATIC_CALL(my_name, func_a);
 *
 *   # Call func_a()
 *   static_call(my_name)(arg1, arg2);
 *
 *   # Update 'my_name' to point to func_b()
 *   static_call_update(my_name, &func_b);
 *
 *   # Call func_b()
 *   static_call(my_name)(arg1, arg2);
 *
 *
 * Implementation details:
 *
 *   This requires some arch-specific code (CONFIG_HAVE_STATIC_CALL).
 *   Otherwise basic indirect calls are used (with function pointers).
 *
 *   Each static_call() site calls into a trampoline associated with the name.
 *   The trampoline has a direct branch to the default function.  Updates to a
 *   name will modify the trampoline's branch destination.
 *
 *   If the arch has CONFIG_HAVE_STATIC_CALL_INLINE, then the call sites
 *   themselves will be patched at runtime to call the functions directly,
 *   rather than calling through the trampoline.  This requires objtool or a
 *   compiler plugin to detect all the static_call() sites and annotate them
 *   in the .static_call_sites section.
 *
 *
 * Notes on NULL function pointers:
 *
 *   Static_call()s support NULL functions, with many of the caveats that
 *   regular function pointers have.
 *
 *   Clearly calling a NULL function pointer is 'BAD', so too for
 *   static_call()s (although when HAVE_STATIC_CALL it might not be immediately
 *   fatal). A NULL static_call can be the result of:
 *
 *     DECLARE_STATIC_CALL_NULL(my_static_call, void (*)(int));
 *
 *   which is equivalent to declaring a NULL function pointer with just a
 *   typename:
 *
 *     void (*my_func_ptr)(int arg1) = NULL;
 *
 *   or using static_call_update() with a NULL function. In both cases the
 *   HAVE_STATIC_CALL implementation will patch the trampoline with a RET
 *   instruction, instead of an immediate tail-call JMP. HAVE_STATIC_CALL_INLINE
 *   architectures can patch the trampoline call to a NOP.
 *
 *   In all cases, any argument evaluation is unconditional. Unlike a regular
 *   conditional function pointer call:
 *
 *     if (my_func_ptr)
 *         my_func_ptr(arg1)
 *
 *   where the argument evaludation also depends on the pointer value.
 *
 *   When calling a static_call that can be NULL, use:
 *
 *     static_call_cond(name)(arg1);
 *
 *   which will include the required value tests to avoid NULL-pointer
 *   dereferences.
 *
 *   To query which function is currently set to be called, use:
 *
 *   func = static_call_query(name);
 *
 *
 * DEFINE_STATIC_CALL_RET0 / __static_call_return0:
 *
 *   Just like how DEFINE_STATIC_CALL_NULL() / static_call_cond() optimize the
 *   conditional void function call, DEFINE_STATIC_CALL_RET0 /
 *   __static_call_return0 optimize the do nothing return 0 function.
 *
 *   This feature is strictly UB per the C standard (since it casts a function
 *   pointer to a different signature) and relies on the architecture ABI to
 *   make things work. In particular it relies on Caller Stack-cleanup and the
 *   whole return register being clobbered for short return values. All normal
 *   CDECL style ABIs conform.
 *
 *   In particular the x86_64 implementation replaces the 5 byte CALL
 *   instruction at the callsite with a 5 byte clear of the RAX register,
 *   completely eliding any function call overhead.
 *
 *   Notably argument setup is unconditional.
 *
 *
 * EXPORT_STATIC_CALL() vs EXPORT_STATIC_CALL_TRAMP():
 *
 *   The difference is that the _TRAMP variant tries to only export the
 *   trampoline with the result that a module can use static_call{,_cond}() but
 *   not static_call_update().
 *
 */

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] perf: enable branch record for software events
  2021-08-30 16:06       ` Peter Zijlstra
@ 2021-08-30 16:36         ` Song Liu
  2021-09-01 17:09           ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Song Liu @ 2021-08-30 16:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: open list:BPF (Safe dynamic programs and tools),
	LKML, Arnaldo Carvalho de Melo, Ingo Molnar, kajoljain,
	Kernel Team



> On Aug 30, 2021, at 9:06 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Mon, Aug 30, 2021 at 03:25:44PM +0000, Song Liu wrote:
>> Thanks for these information! I did get confused these macros for quite a 
>> while. Let me try with the _RET0 version.
> 
> Does you kernel have:
> 
>  9ae6ab27f44e ("static_call: Update API documentation")
> 
> ?
> 
> With that included, the comment at the top of static_call.h reads like
> the below. Please let me know where you think this can be improved.

Aha, my kernel has the code for _RET0 part, but not the documentation. 

> 
> /*
> * Static call support
> *
> * Static calls use code patching to hard-code function pointers into direct
> * branch instructions. They give the flexibility of function pointers, but
> * with improved performance. This is especially important for cases where
> * retpolines would otherwise be used, as retpolines can significantly impact
> * performance.
> *

[...]

> *
> * Notes on NULL function pointers:
> *
> *   Static_call()s support NULL functions, with many of the caveats that
> *   regular function pointers have.
> *
> *   Clearly calling a NULL function pointer is 'BAD', so too for
> *   static_call()s (although when HAVE_STATIC_CALL it might not be immediately
> *   fatal). A NULL static_call can be the result of:
> *

Probably add:

 *     /* for function that returns NULL */
> *     DECLARE_STATIC_CALL_NULL(my_static_call, void (*)(int));


 *   or 
 *     /* for function that returns int */
 *     DECLARE_STATIC_CALL_RET0(my_static_call, int (*)(int));
 * 

So it is clear that we need two different macros. IIUC, the number and 
type of arguments doesn't matter? 

Also, the default return int function has to return 0, right? Can we let 
it return -EOPNOSUPP? 


> *
> *   which is equivalent to declaring a NULL function pointer with just a
> *   typename:
> *
> *     void (*my_func_ptr)(int arg1) = NULL;
> *

[...]

> *   which will include the required value tests to avoid NULL-pointer
> *   dereferences.
> *


> *   To query which function is currently set to be called, use:
> *
> *   func = static_call_query(name);

Maybe move above two lines to "Usage example:" section? 

Thanks,
Song

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] perf: enable branch record for software events
  2021-08-30 10:22   ` Peter Zijlstra
  2021-08-30 15:25     ` Song Liu
@ 2021-08-30 17:41     ` Song Liu
  2021-08-30 18:07       ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: Song Liu @ 2021-08-30 17:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: open list:BPF (Safe dynamic programs and tools),
	LKML, acme, mingo, kjain, Kernel Team



> On Aug 30, 2021, at 3:22 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Thu, Aug 26, 2021 at 03:13:04PM -0700, Song Liu wrote:
>> +int dummy_perf_snapshot_branch_stack(struct perf_branch_snapshot *br_snapshot);
>> +
>> +DECLARE_STATIC_CALL(perf_snapshot_branch_stack, dummy_perf_snapshot_branch_stack);
>> +
>> #endif /* _LINUX_PERF_EVENT_H */
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 011cc5069b7ba..c53fe90e630ac 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -13437,3 +13437,6 @@ struct cgroup_subsys perf_event_cgrp_subsys = {
>> 	.threaded	= true,
>> };
>> #endif /* CONFIG_CGROUP_PERF */
>> +
>> +DEFINE_STATIC_CALL_NULL(perf_snapshot_branch_stack,
>> +			dummy_perf_snapshot_branch_stack);
> 
> This isn't right...
> 
> The whole dummy_perf_snapshot_branch_stack() thing is a declaration only
> and used as a typedef. Also, DEFINE_STATIC_CALL_NULL() and
> static_call_cond() rely on a void return value, which it doesn't have.
> 
> Did you want:
> 
>  DECLARE_STATIC_CALL(perf_snapshot_branch_stack, void (*)(struct perf_branch_snapshot *));
> 
>  DEFINE_STATIC_CALL_NULL(perf_snapshot_branch_stack, void (*)(struct perf_branch_snapshot *));
> 
>  static_call_cond(perf_snapshot_branch_stack)(...);
> 
> *OR*, do you actually need that return value, in which case you're
> probably looking for:
> 
>  DECLARE_STATIC_CALL(perf_snapshot_branch_stack, int (*)(struct perf_branch_snapshot *));
> 
>  DEFINE_STATIC_CALL_RET0(perf_snapshot_branch_stack, int (*)(struct perf_branch_snapshot *));
> 
>  ret = static_call(perf_snapshot_branch_stack)(...);
> 
> ?

Hmmm... something doesn't work here. I have:

/* include/linux/perf_event.h */
DECLARE_STATIC_CALL(perf_snapshot_branch_stack,
                   int (*)(struct perf_branch_snapshot *));


/* kernel/events/core.c */
DEFINE_STATIC_CALL_RET0(perf_snapshot_branch_stack,
                       int (*)(struct perf_branch_snapshot *));

/* kernel/bpf/trampoline.c */
       if (prog->call_get_branch)
               static_call(perf_snapshot_branch_stack)(
                       this_cpu_ptr(&bpf_perf_branch_snapshot));

/* arch/x86/events/intel/core.c */
       if (x86_pmu.disable_all == intel_pmu_disable_all)
               static_call_update(perf_snapshot_branch_stack,
                                  intel_pmu_snapshot_branch_stack);

And the compiler keeps complain with:

arch/x86/events/intel/core.c: In function ‘intel_pmu_init’:
./include/linux/static_call.h:121:41: error: initialization of ‘int (**)(struct perf_branch_snapshot *)’ from incompatible pointer type ‘int (*)(struct perf_branch_snapshot *)’ [-Werror=incompatible-pointer-type ]
  typeof(&STATIC_CALL_TRAMP(name)) __F = (func);   \
                                         ^
arch/x86/events/intel/core.c:6305:4: note: in expansion of macro ‘static_call_update’
    static_call_update(perf_snapshot_branch_stack,
    ^~~~~~~~~~~~~~~~~~


Something like 

typedef int (perf_snapshot_branch_stack_t)(struct perf_branch_snapshot *);
DECLARE_STATIC_CALL(perf_snapshot_branch_stack, perf_snapshot_branch_stack_t);

seems to work fine. 

Thanks,
Song


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] perf: enable branch record for software events
  2021-08-30 17:41     ` Song Liu
@ 2021-08-30 18:07       ` Peter Zijlstra
  2021-09-01 17:12         ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2021-08-30 18:07 UTC (permalink / raw)
  To: Song Liu
  Cc: open list:BPF (Safe dynamic programs and tools),
	LKML, acme, mingo, kjain, Kernel Team

On Mon, Aug 30, 2021 at 05:41:46PM +0000, Song Liu wrote:
> DECLARE_STATIC_CALL(perf_snapshot_branch_stack,
>                    int (*)(struct perf_branch_snapshot *));

> Something like 
> 
> typedef int (perf_snapshot_branch_stack_t)(struct perf_branch_snapshot *);
> DECLARE_STATIC_CALL(perf_snapshot_branch_stack, perf_snapshot_branch_stack_t);
> 
> seems to work fine. 

Oh urg, indeed. It wants a function type, not a function pointer type.
I've been bitten by that before. Go with the typedef, that's the sanest.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] perf: enable branch record for software events
  2021-08-30 16:36         ` Song Liu
@ 2021-09-01 17:09           ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2021-09-01 17:09 UTC (permalink / raw)
  To: Song Liu
  Cc: open list:BPF (Safe dynamic programs and tools),
	LKML, Arnaldo Carvalho de Melo, Ingo Molnar, kajoljain,
	Kernel Team

On Mon, Aug 30, 2021 at 04:36:44PM +0000, Song Liu wrote:

> > /*
> > * Static call support
> > *
> > * Static calls use code patching to hard-code function pointers into direct
> > * branch instructions. They give the flexibility of function pointers, but
> > * with improved performance. This is especially important for cases where
> > * retpolines would otherwise be used, as retpolines can significantly impact
> > * performance.
> > *
> 
> [...]
> 
> > *
> > * Notes on NULL function pointers:
> > *
> > *   Static_call()s support NULL functions, with many of the caveats that
> > *   regular function pointers have.
> > *
> > *   Clearly calling a NULL function pointer is 'BAD', so too for
> > *   static_call()s (although when HAVE_STATIC_CALL it might not be immediately
> > *   fatal). A NULL static_call can be the result of:
> > *
> 
> Probably add:
> 
>  *     /* for function that returns NULL */
> > *     DECLARE_STATIC_CALL_NULL(my_static_call, void (*)(int));
> 
> 
>  *   or 
>  *     /* for function that returns int */
>  *     DECLARE_STATIC_CALL_RET0(my_static_call, int (*)(int));
>  * 

No, anything that uses static_call_cond() must have void return. Note
that static_call_cond() replaces:

	if (func_ptr)
		func_ptr(args);

which is a statement, not an expression, and as such can't function as
an rvalue.

> So it is clear that we need two different macros. IIUC, the number and 
> type of arguments doesn't matter? 

Right, arguments are irrelevant, provided CDECL ABI, which mandates that
arguments that are pushed on the stack are cleaned up by the caller, not
the callee.

> Also, the default return int function has to return 0, right? Can we let 
> it return -EOPNOSUPP? 

Difficult, in principle we can patch any value that fits in a single
instruction, but the more variants we have, the harder it gets.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] perf: enable branch record for software events
  2021-08-30 18:07       ` Peter Zijlstra
@ 2021-09-01 17:12         ` Peter Zijlstra
  2021-09-04 23:01           ` Josh Poimboeuf
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2021-09-01 17:12 UTC (permalink / raw)
  To: Song Liu
  Cc: open list:BPF (Safe dynamic programs and tools),
	LKML, acme, mingo, kjain, Kernel Team, Josh Poimboeuf, jbaron,
	rostedt, ardb

On Mon, Aug 30, 2021 at 08:07:24PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 30, 2021 at 05:41:46PM +0000, Song Liu wrote:
> > DECLARE_STATIC_CALL(perf_snapshot_branch_stack,
> >                    int (*)(struct perf_branch_snapshot *));
> 
> > Something like 
> > 
> > typedef int (perf_snapshot_branch_stack_t)(struct perf_branch_snapshot *);
> > DECLARE_STATIC_CALL(perf_snapshot_branch_stack, perf_snapshot_branch_stack_t);
> > 
> > seems to work fine. 
> 
> Oh urg, indeed. It wants a function type, not a function pointer type.
> I've been bitten by that before. Go with the typedef, that's the sanest.

The below is the best I can make of it... it's a little inconsistent and
somewhat tricky, but at least the compiler yells hard if you get it
wrong.

I can *almost* get to: DEFINE_STATIC_CALL(foo, &func), except for
ARCH_DEFINE_STATIC_CALL_TRAMP() which needs the actual function name
string for the ASM :-(

The rest can do with a function pointer type and have it work.


---

diff --git a/arch/arm/include/asm/paravirt.h b/arch/arm/include/asm/paravirt.h
index 95d5b0d625cd..1094b3abd910 100644
--- a/arch/arm/include/asm/paravirt.h
+++ b/arch/arm/include/asm/paravirt.h
@@ -9,9 +9,7 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-u64 dummy_steal_clock(int cpu);
-
-DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+DECLARE_STATIC_CALL(pv_steal_clock, u64 (*)(int));
 
 static inline u64 paravirt_steal_clock(int cpu)
 {
diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h
index 9aa193e0e8f2..26539c1c277a 100644
--- a/arch/arm64/include/asm/paravirt.h
+++ b/arch/arm64/include/asm/paravirt.h
@@ -9,9 +9,7 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-u64 dummy_steal_clock(int cpu);
-
-DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+DECLARE_STATIC_CALL(pv_steal_clock, u64 (*)(int));
 
 static inline u64 paravirt_steal_clock(int cpu)
 {
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 2a57dbed4894..0c3b302026dc 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -60,35 +60,35 @@ DEFINE_STATIC_KEY_FALSE(perf_is_hybrid);
  * This here uses DEFINE_STATIC_CALL_NULL() to get a static_call defined
  * from just a typename, as opposed to an actual function.
  */
-DEFINE_STATIC_CALL_NULL(x86_pmu_handle_irq,  *x86_pmu.handle_irq);
-DEFINE_STATIC_CALL_NULL(x86_pmu_disable_all, *x86_pmu.disable_all);
-DEFINE_STATIC_CALL_NULL(x86_pmu_enable_all,  *x86_pmu.enable_all);
-DEFINE_STATIC_CALL_NULL(x86_pmu_enable,	     *x86_pmu.enable);
-DEFINE_STATIC_CALL_NULL(x86_pmu_disable,     *x86_pmu.disable);
+DEFINE_STATIC_CALL_NULL(x86_pmu_handle_irq,  x86_pmu.handle_irq);
+DEFINE_STATIC_CALL_NULL(x86_pmu_disable_all, x86_pmu.disable_all);
+DEFINE_STATIC_CALL_NULL(x86_pmu_enable_all,  x86_pmu.enable_all);
+DEFINE_STATIC_CALL_NULL(x86_pmu_enable,	     x86_pmu.enable);
+DEFINE_STATIC_CALL_NULL(x86_pmu_disable,     x86_pmu.disable);
 
-DEFINE_STATIC_CALL_NULL(x86_pmu_add,  *x86_pmu.add);
-DEFINE_STATIC_CALL_NULL(x86_pmu_del,  *x86_pmu.del);
-DEFINE_STATIC_CALL_NULL(x86_pmu_read, *x86_pmu.read);
+DEFINE_STATIC_CALL_NULL(x86_pmu_add,  x86_pmu.add);
+DEFINE_STATIC_CALL_NULL(x86_pmu_del,  x86_pmu.del);
+DEFINE_STATIC_CALL_NULL(x86_pmu_read, x86_pmu.read);
 
-DEFINE_STATIC_CALL_NULL(x86_pmu_schedule_events,       *x86_pmu.schedule_events);
-DEFINE_STATIC_CALL_NULL(x86_pmu_get_event_constraints, *x86_pmu.get_event_constraints);
-DEFINE_STATIC_CALL_NULL(x86_pmu_put_event_constraints, *x86_pmu.put_event_constraints);
+DEFINE_STATIC_CALL_NULL(x86_pmu_schedule_events,       x86_pmu.schedule_events);
+DEFINE_STATIC_CALL_NULL(x86_pmu_get_event_constraints, x86_pmu.get_event_constraints);
+DEFINE_STATIC_CALL_NULL(x86_pmu_put_event_constraints, x86_pmu.put_event_constraints);
 
-DEFINE_STATIC_CALL_NULL(x86_pmu_start_scheduling,  *x86_pmu.start_scheduling);
-DEFINE_STATIC_CALL_NULL(x86_pmu_commit_scheduling, *x86_pmu.commit_scheduling);
-DEFINE_STATIC_CALL_NULL(x86_pmu_stop_scheduling,   *x86_pmu.stop_scheduling);
+DEFINE_STATIC_CALL_NULL(x86_pmu_start_scheduling,  x86_pmu.start_scheduling);
+DEFINE_STATIC_CALL_NULL(x86_pmu_commit_scheduling, x86_pmu.commit_scheduling);
+DEFINE_STATIC_CALL_NULL(x86_pmu_stop_scheduling,   x86_pmu.stop_scheduling);
 
-DEFINE_STATIC_CALL_NULL(x86_pmu_sched_task,    *x86_pmu.sched_task);
-DEFINE_STATIC_CALL_NULL(x86_pmu_swap_task_ctx, *x86_pmu.swap_task_ctx);
+DEFINE_STATIC_CALL_NULL(x86_pmu_sched_task,    x86_pmu.sched_task);
+DEFINE_STATIC_CALL_NULL(x86_pmu_swap_task_ctx, x86_pmu.swap_task_ctx);
 
-DEFINE_STATIC_CALL_NULL(x86_pmu_drain_pebs,   *x86_pmu.drain_pebs);
-DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_aliases, *x86_pmu.pebs_aliases);
+DEFINE_STATIC_CALL_NULL(x86_pmu_drain_pebs,   x86_pmu.drain_pebs);
+DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_aliases, x86_pmu.pebs_aliases);
 
 /*
  * This one is magic, it will get called even when PMU init fails (because
  * there is no PMU), in which case it should simply return NULL.
  */
-DEFINE_STATIC_CALL_RET0(x86_pmu_guest_get_msrs, *x86_pmu.guest_get_msrs);
+DEFINE_STATIC_CALL_RET0(x86_pmu_guest_get_msrs, x86_pmu.guest_get_msrs);
 
 u64 __read_mostly hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af6ce8d4c86a..d383bda4316e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1503,7 +1503,7 @@ extern bool __read_mostly enable_apicv;
 extern struct kvm_x86_ops kvm_x86_ops;
 
 #define KVM_X86_OP(func) \
-	DECLARE_STATIC_CALL(kvm_x86_##func, *(((struct kvm_x86_ops *)0)->func));
+	DECLARE_STATIC_CALL(kvm_x86_##func, (((struct kvm_x86_ops *)0)->func));
 #define KVM_X86_OP_NULL KVM_X86_OP
 #include <asm/kvm-x86-ops.h>
 
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index da3a1ac82be5..3db2237e9a8d 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -18,11 +18,8 @@
 #include <linux/static_call_types.h>
 #include <asm/frame.h>
 
-u64 dummy_steal_clock(int cpu);
-u64 dummy_sched_clock(void);
-
-DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
-DECLARE_STATIC_CALL(pv_sched_clock, dummy_sched_clock);
+DECLARE_STATIC_CALL(pv_steal_clock, u64 (*)(int));
+DECLARE_STATIC_CALL(pv_sched_clock, u64 (*)(void));
 
 void paravirt_set_sched_clock(u64 (*func)(void));
 
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index fe5efbcba824..65b2e6ec87a7 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -117,7 +117,7 @@ extern asmlinkage void preempt_schedule_notrace_thunk(void);
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
 
-DECLARE_STATIC_CALL(preempt_schedule, __preempt_schedule_func);
+DECLARE_STATIC_CALL(preempt_schedule, &__preempt_schedule_func);
 
 #define __preempt_schedule() \
 do { \
@@ -125,7 +125,7 @@ do { \
 	asm volatile ("call " STATIC_CALL_TRAMP_STR(preempt_schedule) : ASM_CALL_CONSTRAINT); \
 } while (0)
 
-DECLARE_STATIC_CALL(preempt_schedule_notrace, __preempt_schedule_notrace_func);
+DECLARE_STATIC_CALL(preempt_schedule_notrace, &__preempt_schedule_notrace_func);
 
 #define __preempt_schedule_notrace() \
 do { \
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5d5c5ed7dd4..940c17099fcf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -125,7 +125,7 @@ EXPORT_SYMBOL_GPL(kvm_x86_ops);
 
 #define KVM_X86_OP(func)					     \
 	DEFINE_STATIC_CALL_NULL(kvm_x86_##func,			     \
-				*(((struct kvm_x86_ops *)0)->func));
+				(((struct kvm_x86_ops *)0)->func));
 #define KVM_X86_OP_NULL KVM_X86_OP
 #include <asm/kvm-x86-ops.h>
 EXPORT_STATIC_CALL_GPL(kvm_x86_get_cs_db_l_bits);
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 2e2b8d6140ed..d4106a3f3243 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -456,7 +456,7 @@ irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
  */
 void irqentry_exit_cond_resched(void);
 #ifdef CONFIG_PREEMPT_DYNAMIC
-DECLARE_STATIC_CALL(irqentry_exit_cond_resched, irqentry_exit_cond_resched);
+DECLARE_STATIC_CALL(irqentry_exit_cond_resched, &irqentry_exit_cond_resched);
 #endif
 
 /**
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 1b2f0a7e00d6..58aa80015db6 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -97,7 +97,7 @@ extern int __cond_resched(void);
 
 extern int __cond_resched(void);
 
-DECLARE_STATIC_CALL(might_resched, __cond_resched);
+DECLARE_STATIC_CALL(might_resched, &__cond_resched);
 
 static __always_inline void might_resched(void)
 {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1780260f237b..93aad9c6fad1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2012,7 +2012,7 @@ extern int __cond_resched(void);
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
 
-DECLARE_STATIC_CALL(cond_resched, __cond_resched);
+DECLARE_STATIC_CALL(cond_resched, &__cond_resched);
 
 static __always_inline int _cond_resched(void)
 {
diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index 3e56a9751c06..18ee529b4937 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -14,16 +14,16 @@
  *
  * API overview:
  *
- *   DECLARE_STATIC_CALL(name, func);
- *   DEFINE_STATIC_CALL(name, func);
- *   DEFINE_STATIC_CALL_NULL(name, typename);
- *   DEFINE_STATIC_CALL_RET0(name, typename);
+ *   DECLARE_STATIC_CALL(name, &func);
+ *   DEFINE_STATIC_CALL(name, func);		// needs an actual function
+ *   DEFINE_STATIC_CALL_NULL(name, &func);
+ *   DEFINE_STATIC_CALL_RET0(name, &func);
  *
  *   __static_call_return0;
  *
  *   static_call(name)(args...);
  *   static_call_cond(name)(args...);
- *   static_call_update(name, func);
+ *   static_call_update(name, &func);
  *   static_call_query(name);
  *
  *   EXPORT_STATIC_CALL{,_TRAMP}{,_GPL}()
@@ -46,6 +46,9 @@
  *   # Call func_b()
  *   static_call(my_name)(arg1, arg2);
  *
+ *   @ To query which function is currently set to be called, use:
+ *   func = static_call_query(name);
+ *
  *
  * Implementation details:
  *
@@ -66,7 +69,8 @@
  * Notes on NULL function pointers:
  *
  *   Static_call()s support NULL functions, with many of the caveats that
- *   regular function pointers have.
+ *   regular function pointers have and a few extra. In particular they rely on
+ *   the function return type being void.
  *
  *   Clearly calling a NULL function pointer is 'BAD', so too for
  *   static_call()s (although when HAVE_STATIC_CALL it might not be immediately
@@ -79,10 +83,11 @@
  *
  *     void (*my_func_ptr)(int arg1) = NULL;
  *
- *   or using static_call_update() with a NULL function. In both cases the
- *   HAVE_STATIC_CALL implementation will patch the trampoline with a RET
- *   instruction, instead of an immediate tail-call JMP. HAVE_STATIC_CALL_INLINE
- *   architectures can patch the trampoline call to a NOP.
+ *   or using static_call_update() with a NULL function pointer. In both cases
+ *   the HAVE_STATIC_CALL implementation will patch the trampoline with a RET
+ *   instruction, instead of an immediate tail-call JMP.
+ *   HAVE_STATIC_CALL_INLINE architectures can patch the trampoline call to a
+ *   NOP.
  *
  *   In all cases, any argument evaluation is unconditional. Unlike a regular
  *   conditional function pointer call:
@@ -97,11 +102,8 @@
  *     static_call_cond(name)(arg1);
  *
  *   which will include the required value tests to avoid NULL-pointer
- *   dereferences.
- *
- *   To query which function is currently set to be called, use:
- *
- *   func = static_call_query(name);
+ *   dereferences. Note that this is a statement, not an expression, hence the
+ *   requirement for a void return value.
  *
  *
  * DEFINE_STATIC_CALL_RET0 / __static_call_return0:
@@ -122,6 +124,14 @@
  *
  *   Notably argument setup is unconditional.
  *
+ *   For example:
+ *
+ *     DEFINE_STATIC_CALL_RET0(my_ret_func, int (*)(int));
+ *
+ *     ret = static_call(my_ret_func)(5);
+ *
+ *   will, unless static_call_update() is used, return 0.
+ *
  *
  * EXPORT_STATIC_CALL() vs EXPORT_STATIC_CALL_TRAMP():
  *
@@ -180,16 +190,16 @@ extern int static_call_text_reserved(void *start, void *end);
 
 extern long __static_call_return0(void);
 
-#define __DEFINE_STATIC_CALL(name, _func, _func_init)			\
-	DECLARE_STATIC_CALL(name, _func);				\
+#define __DEFINE_STATIC_CALL(name, func_ptr, func_init)			\
+	DECLARE_STATIC_CALL(name, func_ptr);				\
 	struct static_call_key STATIC_CALL_KEY(name) = {		\
-		.func = _func_init,					\
+		.func = func_init,					\
 		.type = 1,						\
 	};								\
-	ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func_init)
+	ARCH_DEFINE_STATIC_CALL_TRAMP(name, func_init)
 
-#define DEFINE_STATIC_CALL_NULL(name, _func)				\
-	DECLARE_STATIC_CALL(name, _func);				\
+#define DEFINE_STATIC_CALL_NULL(name, func_ptr)				\
+	DECLARE_STATIC_CALL(name, func_ptr);				\
 	struct static_call_key STATIC_CALL_KEY(name) = {		\
 		.func = NULL,						\
 		.type = 1,						\
@@ -217,15 +227,15 @@ extern long __static_call_return0(void);
 
 static inline int static_call_init(void) { return 0; }
 
-#define __DEFINE_STATIC_CALL(name, _func, _func_init)			\
-	DECLARE_STATIC_CALL(name, _func);				\
+#define __DEFINE_STATIC_CALL(name, func_ptr, func_init)			\
+	DECLARE_STATIC_CALL(name, func_ptr);				\
 	struct static_call_key STATIC_CALL_KEY(name) = {		\
-		.func = _func_init,					\
+		.func = func_init,					\
 	};								\
-	ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func_init)
+	ARCH_DEFINE_STATIC_CALL_TRAMP(name, func_init)
 
-#define DEFINE_STATIC_CALL_NULL(name, _func)				\
-	DECLARE_STATIC_CALL(name, _func);				\
+#define DEFINE_STATIC_CALL_NULL(name, func_ptr)				\
+	DECLARE_STATIC_CALL(name, func_ptr);				\
 	struct static_call_key STATIC_CALL_KEY(name) = {		\
 		.func = NULL,						\
 	};								\
@@ -275,14 +285,14 @@ static inline long __static_call_return0(void)
 	return 0;
 }
 
-#define __DEFINE_STATIC_CALL(name, _func, _func_init)			\
-	DECLARE_STATIC_CALL(name, _func);				\
+#define __DEFINE_STATIC_CALL(name, func_ptr, func_init)			\
+	DECLARE_STATIC_CALL(name, func_ptr);				\
 	struct static_call_key STATIC_CALL_KEY(name) = {		\
-		.func = _func_init,					\
+		.func = func_init,					\
 	}
 
-#define DEFINE_STATIC_CALL_NULL(name, _func)				\
-	DECLARE_STATIC_CALL(name, _func);				\
+#define DEFINE_STATIC_CALL_NULL(name, func_ptr)				\
+	DECLARE_STATIC_CALL(name, func_ptr);				\
 	struct static_call_key STATIC_CALL_KEY(name) = {		\
 		.func = NULL,						\
 	}
@@ -327,10 +337,10 @@ static inline int static_call_text_reserved(void *start, void *end)
 
 #endif /* CONFIG_HAVE_STATIC_CALL */
 
-#define DEFINE_STATIC_CALL(name, _func)					\
-	__DEFINE_STATIC_CALL(name, _func, _func)
+#define DEFINE_STATIC_CALL(name, func)					\
+	__DEFINE_STATIC_CALL(name, &func, func)
 
-#define DEFINE_STATIC_CALL_RET0(name, _func)				\
-	__DEFINE_STATIC_CALL(name, _func, __static_call_return0)
+#define DEFINE_STATIC_CALL_RET0(name, func_ptr)				\
+	__DEFINE_STATIC_CALL(name, func_ptr, __static_call_return0)
 
 #endif /* _LINUX_STATIC_CALL_H */
diff --git a/include/linux/static_call_types.h b/include/linux/static_call_types.h
index 5a00b8b2cf9f..3f87aa682e14 100644
--- a/include/linux/static_call_types.h
+++ b/include/linux/static_call_types.h
@@ -34,9 +34,15 @@ struct static_call_site {
 	s32 key;
 };
 
-#define DECLARE_STATIC_CALL(name, func)					\
+/*
+ * Type mismatch on __SCP__ functions is due to mismatched function vs function
+ * pointer arguments between DECLARE_STATIC_CALL() and DEFINE_STATIC_CALL*()
+ * variants.
+ */
+#define DECLARE_STATIC_CALL(name, func_ptr)				\
 	extern struct static_call_key STATIC_CALL_KEY(name);		\
-	extern typeof(func) STATIC_CALL_TRAMP(name);
+	extern typeof(func_ptr) __SCP__##name;				\
+	extern typeof(*__SCP__##name) STATIC_CALL_TRAMP(name);
 
 #ifdef CONFIG_HAVE_STATIC_CALL
 
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index ab58696d0ddd..118b78fffc9d 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -240,7 +240,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
  */
 #define __DECLARE_TRACE(name, proto, args, cond, data_proto)		\
 	extern int __traceiter_##name(data_proto);			\
-	DECLARE_STATIC_CALL(tp_func_##name, __traceiter_##name);	\
+	DECLARE_STATIC_CALL(tp_func_##name, &__traceiter_##name);	\
 	extern struct tracepoint __tracepoint_##name;			\
 	static inline void trace_##name(proto)				\
 	{								\
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c4462c454ab9..def3aa224ae5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8171,10 +8171,10 @@ EXPORT_SYMBOL(__cond_resched);
 #endif
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
-DEFINE_STATIC_CALL_RET0(cond_resched, __cond_resched);
+DEFINE_STATIC_CALL_RET0(cond_resched, &__cond_resched);
 EXPORT_STATIC_CALL_TRAMP(cond_resched);
 
-DEFINE_STATIC_CALL_RET0(might_resched, __cond_resched);
+DEFINE_STATIC_CALL_RET0(might_resched, &__cond_resched);
 EXPORT_STATIC_CALL_TRAMP(might_resched);
 #endif
 
diff --git a/kernel/static_call.c b/kernel/static_call.c
index 43ba0b1e0edb..b652c32a7250 100644
--- a/kernel/static_call.c
+++ b/kernel/static_call.c
@@ -517,6 +517,12 @@ static int func_b(int x)
 }
 
 DEFINE_STATIC_CALL(sc_selftest, func_a);
+DEFINE_STATIC_CALL_NULL(sc_null, void (*)(int *));
+
+static void null_a(int *arg)
+{
+	*arg = 1;
+}
 
 static struct static_call_data {
       int (*func)(int);
@@ -530,18 +536,26 @@ static struct static_call_data {
 
 static int __init test_static_call_init(void)
 {
-      int i;
+	int i;
 
-      for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) {
-	      struct static_call_data *scd = &static_call_data[i];
+	for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) {
+		struct static_call_data *scd = &static_call_data[i];
 
-              if (scd->func)
-                      static_call_update(sc_selftest, scd->func);
+		if (scd->func)
+			static_call_update(sc_selftest, scd->func);
 
-              WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect);
-      }
+		WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect);
+	}
 
-      return 0;
+	i = 5;
+	static_call_cond(sc_null)(&i);
+	WARN_ON(i != 5);
+
+	static_call_update(sc_null, &null_a);
+	static_call_cond(sc_null)(&i);
+	WARN_ON(i != 1);
+
+	return 0;
 }
 early_initcall(test_static_call_init);
 
diff --git a/security/keys/trusted-keys/trusted_core.c b/security/keys/trusted-keys/trusted_core.c
index d5c891d8d353..1d91eb6b242e 100644
--- a/security/keys/trusted-keys/trusted_core.c
+++ b/security/keys/trusted-keys/trusted_core.c
@@ -35,13 +35,13 @@ static const struct trusted_key_source trusted_key_sources[] = {
 #endif
 };
 
-DEFINE_STATIC_CALL_NULL(trusted_key_init, *trusted_key_sources[0].ops->init);
-DEFINE_STATIC_CALL_NULL(trusted_key_seal, *trusted_key_sources[0].ops->seal);
+DEFINE_STATIC_CALL_NULL(trusted_key_init, trusted_key_sources[0].ops->init);
+DEFINE_STATIC_CALL_NULL(trusted_key_seal, trusted_key_sources[0].ops->seal);
 DEFINE_STATIC_CALL_NULL(trusted_key_unseal,
-			*trusted_key_sources[0].ops->unseal);
+			trusted_key_sources[0].ops->unseal);
 DEFINE_STATIC_CALL_NULL(trusted_key_get_random,
-			*trusted_key_sources[0].ops->get_random);
-DEFINE_STATIC_CALL_NULL(trusted_key_exit, *trusted_key_sources[0].ops->exit);
+			trusted_key_sources[0].ops->get_random);
+DEFINE_STATIC_CALL_NULL(trusted_key_exit, trusted_key_sources[0].ops->exit);
 static unsigned char migratable;
 
 enum {
diff --git a/tools/include/linux/static_call_types.h b/tools/include/linux/static_call_types.h
index 5a00b8b2cf9f..3f87aa682e14 100644
--- a/tools/include/linux/static_call_types.h
+++ b/tools/include/linux/static_call_types.h
@@ -34,9 +34,15 @@ struct static_call_site {
 	s32 key;
 };
 
-#define DECLARE_STATIC_CALL(name, func)					\
+/*
+ * Type mismatch on __SCP__ functions is due to mismatched function vs function
+ * pointer arguments between DECLARE_STATIC_CALL() and DEFINE_STATIC_CALL*()
+ * variants.
+ */
+#define DECLARE_STATIC_CALL(name, func_ptr)				\
 	extern struct static_call_key STATIC_CALL_KEY(name);		\
-	extern typeof(func) STATIC_CALL_TRAMP(name);
+	extern typeof(func_ptr) __SCP__##name;				\
+	extern typeof(*__SCP__##name) STATIC_CALL_TRAMP(name);
 
 #ifdef CONFIG_HAVE_STATIC_CALL
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] perf: enable branch record for software events
  2021-09-01 17:12         ` Peter Zijlstra
@ 2021-09-04 23:01           ` Josh Poimboeuf
  0 siblings, 0 replies; 18+ messages in thread
From: Josh Poimboeuf @ 2021-09-04 23:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Song Liu, open list:BPF (Safe dynamic programs and tools),
	LKML, acme, mingo, kjain, Kernel Team, jbaron, rostedt, ardb

On Wed, Sep 01, 2021 at 07:12:24PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 30, 2021 at 08:07:24PM +0200, Peter Zijlstra wrote:
> > On Mon, Aug 30, 2021 at 05:41:46PM +0000, Song Liu wrote:
> > > DECLARE_STATIC_CALL(perf_snapshot_branch_stack,
> > >                    int (*)(struct perf_branch_snapshot *));
> > 
> > > Something like 
> > > 
> > > typedef int (perf_snapshot_branch_stack_t)(struct perf_branch_snapshot *);
> > > DECLARE_STATIC_CALL(perf_snapshot_branch_stack, perf_snapshot_branch_stack_t);
> > > 
> > > seems to work fine. 
> > 
> > Oh urg, indeed. It wants a function type, not a function pointer type.
> > I've been bitten by that before. Go with the typedef, that's the sanest.
> 
> The below is the best I can make of it... it's a little inconsistent and
> somewhat tricky, but at least the compiler yells hard if you get it
> wrong.
> 
> I can *almost* get to: DEFINE_STATIC_CALL(foo, &func), except for
> ARCH_DEFINE_STATIC_CALL_TRAMP() which needs the actual function name
> string for the ASM :-(
> 
> The rest can do with a function pointer type and have it work.

Seems reasonable to me.

-- 
Josh


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-09-04 23:01 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-26 22:13 [PATCH v2 bpf-next 0/3] bpf: introduce bpf_get_branch_snapshot Song Liu
2021-08-26 22:13 ` [PATCH v2 bpf-next 1/3] perf: enable branch record for software events Song Liu
2021-08-30 10:22   ` Peter Zijlstra
2021-08-30 15:25     ` Song Liu
2021-08-30 16:06       ` Peter Zijlstra
2021-08-30 16:36         ` Song Liu
2021-09-01 17:09           ` Peter Zijlstra
2021-08-30 17:41     ` Song Liu
2021-08-30 18:07       ` Peter Zijlstra
2021-09-01 17:12         ` Peter Zijlstra
2021-09-04 23:01           ` Josh Poimboeuf
2021-08-30 10:43   ` Peter Zijlstra
2021-08-30 16:06     ` Song Liu
2021-08-26 22:13 ` [PATCH v2 bpf-next 2/3] bpf: introduce helper bpf_get_branch_snapshot Song Liu
2021-08-27  9:28   ` kernel test robot
2021-08-27 15:10   ` kernel test robot
2021-08-30 10:47     ` Peter Zijlstra
2021-08-26 22:13 ` [PATCH v2 bpf-next 3/3] selftests/bpf: add test for bpf_get_branch_snapshot Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).