[PATCH net-next v5 0/4] bpf: add two helpers to read perf event enabled/running time

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH net-next v5 0/4] bpf: add two helpers to read perf event enabled/running time
@ 2017-09-20  6:09 Yonghong Song
  2017-09-20  6:09 ` [PATCH net-next v5 1/4] bpf: add helper bpf_perf_event_read_value for perf event array map Yonghong Song
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Yonghong Song @ 2017-09-20  6:09 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team

Hardware pmu counters are limited resources. When there are more
pmu based perf events opened than available counters, kernel will
multiplex these events so each event gets certain percentage
(but not 100%) of the pmu time. In case that multiplexing happens,
the number of samples or counter value will not reflect the
case compared to no multiplexing. This makes comparison between
different runs difficult.

Typically, the number of samples or counter value should be
normalized before comparing to other experiments. The typical
normalization is done like:
  normalized_num_samples = num_samples * time_enabled / time_running
  normalized_counter_value = counter_value * time_enabled / time_running
where time_enabled is the time enabled for event and time_running is
the time running for event since last normalization.

This patch set implements two helper functions.
The helper bpf_perf_event_read_value reads counter/time_enabled/time_running
for perf event array map. The helper bpf_perf_prog_read_value read
counter/time_enabled/time_running for bpf prog with type BPF_PROG_TYPE_PERF_EVENT.

Changelogs:
v4->v5:
  . fix some coding style issues
  . memset the input buffer in case of error for ARG_PTR_TO_UNINIT_MEM
    type of argument.
v3->v4:
  . fix a build failure
v2->v3:
  . counters should be read in order to read enabled/running time. This is to
    prevent that counters and enabled/running time may be read separately.
v1->v2:
  . reading enabled/running time should be together with reading counters
    which contains the logic to ensure the result is valid.

Yonghong Song (4):
  bpf: add helper bpf_perf_event_read_value for perf event array map
  bpf: add a test case for helper bpf_perf_event_read_value
  bpf: add helper bpf_perf_prog_read_value
  bpf: add a test case for helper bpf_perf_prog_read_value

 include/linux/perf_event.h                |  7 ++-
 include/uapi/linux/bpf.h                  | 27 ++++++++++-
 kernel/bpf/arraymap.c                     |  2 +-
 kernel/bpf/verifier.c                     |  4 +-
 kernel/events/core.c                      | 16 +++++--
 kernel/trace/bpf_trace.c                  | 74 ++++++++++++++++++++++++++++---
 samples/bpf/trace_event_kern.c            | 10 +++++
 samples/bpf/trace_event_user.c            | 13 +++---
 samples/bpf/tracex6_kern.c                | 26 +++++++++++
 samples/bpf/tracex6_user.c                | 13 +++++-
 tools/include/uapi/linux/bpf.h            |  4 +-
 tools/testing/selftests/bpf/bpf_helpers.h |  6 +++
 12 files changed, 182 insertions(+), 20 deletions(-)

-- 
2.9.5

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net-next v5 1/4] bpf: add helper bpf_perf_event_read_value for perf event array map
  2017-09-20  6:09 [PATCH net-next v5 0/4] bpf: add two helpers to read perf event enabled/running time Yonghong Song
@ 2017-09-20  6:09 ` Yonghong Song
  2017-09-20 17:26   ` Peter Zijlstra
  2017-09-20  6:09 ` [PATCH net-next v5 2/4] bpf: add a test case for helper bpf_perf_event_read_value Yonghong Song
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Yonghong Song @ 2017-09-20  6:09 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team

Hardware pmu counters are limited resources. When there are more
pmu based perf events opened than available counters, kernel will
multiplex these events so each event gets certain percentage
(but not 100%) of the pmu time. In case that multiplexing happens,
the number of samples or counter value will not reflect the
case compared to no multiplexing. This makes comparison between
different runs difficult.

Typically, the number of samples or counter value should be
normalized before comparing to other experiments. The typical
normalization is done like:
  normalized_num_samples = num_samples * time_enabled / time_running
  normalized_counter_value = counter_value * time_enabled / time_running
where time_enabled is the time enabled for event and time_running is
the time running for event since last normalization.

This patch adds helper bpf_perf_event_read_value for kprobed based perf
event array map, to read perf counter and enabled/running time.
The enabled/running time is accumulated since the perf event open.
To achieve scaling factor between two bpf invocations, users
can can use cpu_id as the key (which is typical for perf array usage model)
to remember the previous value and do the calculation inside the
bpf program.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/perf_event.h |  6 ++++--
 include/uapi/linux/bpf.h   | 19 ++++++++++++++++++-
 kernel/bpf/arraymap.c      |  2 +-
 kernel/bpf/verifier.c      |  4 +++-
 kernel/events/core.c       | 15 ++++++++++++---
 kernel/trace/bpf_trace.c   | 46 +++++++++++++++++++++++++++++++++++++++++-----
 6 files changed, 79 insertions(+), 13 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8e22f24..21d8c12 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -884,7 +884,8 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr,
 				void *context);
 extern void perf_pmu_migrate_context(struct pmu *pmu,
 				int src_cpu, int dst_cpu);
-int perf_event_read_local(struct perf_event *event, u64 *value);
+int perf_event_read_local(struct perf_event *event, u64 *value,
+			  u64 *enabled, u64 *running);
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);
 
@@ -1286,7 +1287,8 @@ static inline const struct perf_event_attr *perf_event_attrs(struct perf_event *
 {
 	return ERR_PTR(-EINVAL);
 }
-static inline int perf_event_read_local(struct perf_event *event, u64 *value)
+static inline int perf_event_read_local(struct perf_event *event, u64 *value,
+					u64 *enabled, u64 *running)
 {
 	return -EINVAL;
 }
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 43ab5c4..ccfe1b1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -582,6 +582,14 @@ union bpf_attr {
  *	@map: pointer to sockmap to update
  *	@key: key to insert/update sock in map
  *	@flags: same flags as map update elem
+ *
+ * int bpf_perf_event_read_value(map, flags, buf, buf_size)
+ *     read perf event counter value and perf event enabled/running time
+ *     @map: pointer to perf_event_array map
+ *     @flags: index of event in the map or bitmask flags
+ *     @buf: buf to fill
+ *     @buf_size: size of the buf
+ *     Return: 0 on success or negative error code
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -638,6 +646,7 @@ union bpf_attr {
 	FN(redirect_map),		\
 	FN(sk_redirect_map),		\
 	FN(sock_map_update),		\
+	FN(perf_event_read_value),	\
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -681,7 +690,9 @@ enum bpf_func_id {
 #define BPF_F_ZERO_CSUM_TX		(1ULL << 1)
 #define BPF_F_DONT_FRAGMENT		(1ULL << 2)
 
-/* BPF_FUNC_perf_event_output and BPF_FUNC_perf_event_read flags. */
+/* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and
+ * BPF_FUNC_perf_event_read_value flags.
+ */
 #define BPF_F_INDEX_MASK		0xffffffffULL
 #define BPF_F_CURRENT_CPU		BPF_F_INDEX_MASK
 /* BPF_FUNC_perf_event_output for sk_buff input context. */
@@ -864,4 +875,10 @@ enum {
 #define TCP_BPF_IW		1001	/* Set TCP initial congestion window */
 #define TCP_BPF_SNDCWND_CLAMP	1002	/* Set sndcwnd_clamp */
 
+struct bpf_perf_event_value {
+	__u64 counter;
+	__u64 enabled;
+	__u64 running;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 98c0f00..68d8666 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -492,7 +492,7 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map *map,
 
 	ee = ERR_PTR(-EOPNOTSUPP);
 	event = perf_file->private_data;
-	if (perf_event_read_local(event, &value) == -EOPNOTSUPP)
+	if (perf_event_read_local(event, &value, NULL, NULL) == -EOPNOTSUPP)
 		goto err_out;
 
 	ee = bpf_event_entry_gen(perf_file, map_file);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 799b245..1bf9d7b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1494,7 +1494,8 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
 		break;
 	case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
 		if (func_id != BPF_FUNC_perf_event_read &&
-		    func_id != BPF_FUNC_perf_event_output)
+		    func_id != BPF_FUNC_perf_event_output &&
+		    func_id != BPF_FUNC_perf_event_read_value)
 			goto error;
 		break;
 	case BPF_MAP_TYPE_STACK_TRACE:
@@ -1537,6 +1538,7 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
 		break;
 	case BPF_FUNC_perf_event_read:
 	case BPF_FUNC_perf_event_output:
+	case BPF_FUNC_perf_event_read_value:
 		if (map->map_type != BPF_MAP_TYPE_PERF_EVENT_ARRAY)
 			goto error;
 		break;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3e691b7..2d5bbe5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3684,10 +3684,12 @@ static inline u64 perf_event_count(struct perf_event *event)
  *     will not be local and we cannot read them atomically
  *   - must not have a pmu::count method
  */
-int perf_event_read_local(struct perf_event *event, u64 *value)
+int perf_event_read_local(struct perf_event *event, u64 *value,
+			  u64 *enabled, u64 *running)
 {
 	unsigned long flags;
 	int ret = 0;
+	u64 now;
 
 	/*
 	 * Disabling interrupts avoids all counter scheduling (context
@@ -3718,14 +3720,21 @@ int perf_event_read_local(struct perf_event *event, u64 *value)
 		goto out;
 	}
 
+	now = event->shadow_ctx_time + perf_clock();
+	if (enabled)
+		*enabled = now - event->tstamp_enabled;
 	/*
 	 * If the event is currently on this CPU, its either a per-task event,
 	 * or local to this CPU. Furthermore it means its ACTIVE (otherwise
 	 * oncpu == -1).
 	 */
-	if (event->oncpu == smp_processor_id())
+	if (event->oncpu == smp_processor_id()) {
 		event->pmu->read(event);
-
+		if (running)
+			*running = now - event->tstamp_running;
+	} else if (running) {
+		*running = event->total_time_running;
+	}
 	*value = local64_read(&event->count);
 out:
 	local_irq_restore(flags);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index dc498b6..686dfa1 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -255,14 +255,13 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
 	return &bpf_trace_printk_proto;
 }
 
-BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)
-{
+static __always_inline int
+get_map_perf_counter(struct bpf_map *map, u64 flags,
+		     u64 *value, u64 *enabled, u64 *running) {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	unsigned int cpu = smp_processor_id();
 	u64 index = flags & BPF_F_INDEX_MASK;
 	struct bpf_event_entry *ee;
-	u64 value = 0;
-	int err;
 
 	if (unlikely(flags & ~(BPF_F_INDEX_MASK)))
 		return -EINVAL;
@@ -275,7 +274,15 @@ BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)
 	if (!ee)
 		return -ENOENT;
 
-	err = perf_event_read_local(ee->event, &value);
+	return perf_event_read_local(ee->event, value, enabled, running);
+}
+
+BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)
+{
+	u64 value = 0;
+	int err;
+
+	err = get_map_perf_counter(map, flags, &value, NULL, NULL);
 	/*
 	 * this api is ugly since we miss [-22..-2] range of valid
 	 * counter values, but that's uapi
@@ -293,6 +300,33 @@ static const struct bpf_func_proto bpf_perf_event_read_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_perf_event_read_value, struct bpf_map *, map, u64, flags,
+	   struct bpf_perf_event_value *, buf, u32, size)
+{
+	int err;
+
+	if (unlikely(size != sizeof(struct bpf_perf_event_value)))
+		return -EINVAL;
+
+	err = get_map_perf_counter(map, flags, &buf->counter, &buf->enabled,
+				   &buf->running);
+	if (unlikely(err)) {
+		memset(buf, 0, size);
+		return err;
+	}
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_perf_event_read_value_proto = {
+	.func		= bpf_perf_event_read_value,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg4_type	= ARG_CONST_SIZE,
+};
+
 static DEFINE_PER_CPU(struct perf_sample_data, bpf_sd);
 
 static __always_inline u64
@@ -499,6 +533,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 		return &bpf_perf_event_output_proto;
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto;
+	case BPF_FUNC_perf_event_read_value:
+		return &bpf_perf_event_read_value_proto;
 	default:
 		return tracing_func_proto(func_id);
 	}
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next v5 2/4] bpf: add a test case for helper bpf_perf_event_read_value
  2017-09-20  6:09 [PATCH net-next v5 0/4] bpf: add two helpers to read perf event enabled/running time Yonghong Song
  2017-09-20  6:09 ` [PATCH net-next v5 1/4] bpf: add helper bpf_perf_event_read_value for perf event array map Yonghong Song
@ 2017-09-20  6:09 ` Yonghong Song
  2017-09-20  6:09 ` [PATCH net-next v5 3/4] bpf: add helper bpf_perf_prog_read_value Yonghong Song
  2017-09-20  6:09 ` [PATCH net-next v5 4/4] bpf: add a test case for " Yonghong Song
  3 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2017-09-20  6:09 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team

The bpf sample program tracex6 is enhanced to use the new
helper to read enabled/running time as well.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/tracex6_kern.c                | 26 ++++++++++++++++++++++++++
 samples/bpf/tracex6_user.c                | 13 ++++++++++++-
 tools/include/uapi/linux/bpf.h            |  3 ++-
 tools/testing/selftests/bpf/bpf_helpers.h |  3 +++
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/tracex6_kern.c b/samples/bpf/tracex6_kern.c
index e7d1803..46c557a 100644
--- a/samples/bpf/tracex6_kern.c
+++ b/samples/bpf/tracex6_kern.c
@@ -15,6 +15,12 @@ struct bpf_map_def SEC("maps") values = {
 	.value_size = sizeof(u64),
 	.max_entries = 64,
 };
+struct bpf_map_def SEC("maps") values2 = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(int),
+	.value_size = sizeof(struct bpf_perf_event_value),
+	.max_entries = 64,
+};
 
 SEC("kprobe/htab_map_get_next_key")
 int bpf_prog1(struct pt_regs *ctx)
@@ -37,5 +43,25 @@ int bpf_prog1(struct pt_regs *ctx)
 	return 0;
 }
 
+SEC("kprobe/htab_map_lookup_elem")
+int bpf_prog2(struct pt_regs *ctx)
+{
+	u32 key = bpf_get_smp_processor_id();
+	struct bpf_perf_event_value *val, buf;
+	int error;
+
+	error = bpf_perf_event_read_value(&counters, key, &buf, sizeof(buf));
+	if (error)
+		return 0;
+
+	val = bpf_map_lookup_elem(&values2, &key);
+	if (val)
+		*val = buf;
+	else
+		bpf_map_update_elem(&values2, &key, &buf, BPF_NOEXIST);
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
 u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/tracex6_user.c b/samples/bpf/tracex6_user.c
index a05a99a..3341a96 100644
--- a/samples/bpf/tracex6_user.c
+++ b/samples/bpf/tracex6_user.c
@@ -22,6 +22,7 @@
 
 static void check_on_cpu(int cpu, struct perf_event_attr *attr)
 {
+	struct bpf_perf_event_value value2;
 	int pmu_fd, error = 0;
 	cpu_set_t set;
 	__u64 value;
@@ -46,8 +47,18 @@ static void check_on_cpu(int cpu, struct perf_event_attr *attr)
 		fprintf(stderr, "Value missing for CPU %d\n", cpu);
 		error = 1;
 		goto on_exit;
+	} else {
+		fprintf(stderr, "CPU %d: %llu\n", cpu, value);
+	}
+	/* The above bpf_map_lookup_elem should trigger the second kprobe */
+	if (bpf_map_lookup_elem(map_fd[2], &cpu, &value2)) {
+		fprintf(stderr, "Value2 missing for CPU %d\n", cpu);
+		error = 1;
+		goto on_exit;
+	} else {
+		fprintf(stderr, "CPU %d: counter: %llu, enabled: %llu, running: %llu\n", cpu,
+			value2.counter, value2.enabled, value2.running);
 	}
-	fprintf(stderr, "CPU %d: %llu\n", cpu, value);
 
 on_exit:
 	assert(bpf_map_delete_elem(map_fd[0], &cpu) == 0 || error);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 461811e..79eb529 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -632,7 +632,8 @@ union bpf_attr {
 	FN(skb_adjust_room),		\
 	FN(redirect_map),		\
 	FN(sk_redirect_map),		\
-	FN(sock_map_update),
+	FN(sock_map_update),		\
+	FN(perf_event_read_value),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 36fb916..08e6f8c 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -70,6 +70,9 @@ static int (*bpf_sk_redirect_map)(void *map, int key, int flags) =
 static int (*bpf_sock_map_update)(void *map, void *key, void *value,
 				  unsigned long long flags) =
 	(void *) BPF_FUNC_sock_map_update;
+static int (*bpf_perf_event_read_value)(void *map, unsigned long long flags,
+					void *buf, unsigned int buf_size) =
+	(void *) BPF_FUNC_perf_event_read_value;
 
 
 /* llvm builtin functions that eBPF C program may use to
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next v5 3/4] bpf: add helper bpf_perf_prog_read_value
  2017-09-20  6:09 [PATCH net-next v5 0/4] bpf: add two helpers to read perf event enabled/running time Yonghong Song
  2017-09-20  6:09 ` [PATCH net-next v5 1/4] bpf: add helper bpf_perf_event_read_value for perf event array map Yonghong Song
  2017-09-20  6:09 ` [PATCH net-next v5 2/4] bpf: add a test case for helper bpf_perf_event_read_value Yonghong Song
@ 2017-09-20  6:09 ` Yonghong Song
  2017-09-20  6:09 ` [PATCH net-next v5 4/4] bpf: add a test case for " Yonghong Song
  3 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2017-09-20  6:09 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team

This patch adds helper bpf_perf_prog_read_cvalue for perf event based bpf
programs, to read event counter and enabled/running time.
The enabled/running time is accumulated since the perf event open.

The typical use case for perf event based bpf program is to attach itself
to a single event. In such cases, if it is desirable to get scaling factor
between two bpf invocations, users can can save the time values in a map,
and use the value from the map and the current value to calculate
the scaling factor.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/perf_event.h |  1 +
 include/uapi/linux/bpf.h   |  8 ++++++++
 kernel/events/core.c       |  1 +
 kernel/trace/bpf_trace.c   | 28 ++++++++++++++++++++++++++++
 4 files changed, 38 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 21d8c12..79b18a2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -806,6 +806,7 @@ struct perf_output_handle {
 struct bpf_perf_event_data_kern {
 	struct pt_regs *regs;
 	struct perf_sample_data *data;
+	struct perf_event *event;
 };
 
 #ifdef CONFIG_CGROUP_PERF
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ccfe1b1..f3eeae2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -590,6 +590,13 @@ union bpf_attr {
  *     @buf: buf to fill
  *     @buf_size: size of the buf
  *     Return: 0 on success or negative error code
+ *
+ * int bpf_perf_prog_read_value(ctx, buf, buf_size)
+ *     read perf prog attached perf event counter and enabled/running time
+ *     @ctx: pointer to ctx
+ *     @buf: buf to fill
+ *     @buf_size: size of the buf
+ *     Return : 0 on success or negative error code
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -647,6 +654,7 @@ union bpf_attr {
 	FN(sk_redirect_map),		\
 	FN(sock_map_update),		\
 	FN(perf_event_read_value),	\
+	FN(perf_prog_read_value),	\
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2d5bbe5..d039086 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8081,6 +8081,7 @@ static void bpf_overflow_handler(struct perf_event *event,
 	struct bpf_perf_event_data_kern ctx = {
 		.data = data,
 		.regs = regs,
+		.event = event,
 	};
 	int ret = 0;
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 686dfa1..c4d617a 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -612,6 +612,32 @@ static const struct bpf_func_proto bpf_get_stackid_proto_tp = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_3(bpf_perf_prog_read_value_tp, struct bpf_perf_event_data_kern *, ctx,
+	   struct bpf_perf_event_value *, buf, u32, size)
+{
+	int err;
+
+	if (unlikely(size != sizeof(struct bpf_perf_event_value)))
+		return -EINVAL;
+
+	err = perf_event_read_local(ctx->event, &buf->counter, &buf->enabled,
+				    &buf->running);
+	if (unlikely(err)) {
+		memset(buf, 0, size);
+		return err;
+	}
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_perf_prog_read_value_proto_tp = {
+         .func           = bpf_perf_prog_read_value_tp,
+         .gpl_only       = true,
+         .ret_type       = RET_INTEGER,
+         .arg1_type      = ARG_PTR_TO_CTX,
+         .arg2_type      = ARG_PTR_TO_UNINIT_MEM,
+         .arg3_type      = ARG_CONST_SIZE,
+};
+
 static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
@@ -619,6 +645,8 @@ static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
 		return &bpf_perf_event_output_proto_tp;
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto_tp;
+	case BPF_FUNC_perf_prog_read_value:
+		return &bpf_perf_prog_read_value_proto_tp;
 	default:
 		return tracing_func_proto(func_id);
 	}
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next v5 4/4] bpf: add a test case for helper bpf_perf_prog_read_value
  2017-09-20  6:09 [PATCH net-next v5 0/4] bpf: add two helpers to read perf event enabled/running time Yonghong Song
                   ` (2 preceding siblings ...)
  2017-09-20  6:09 ` [PATCH net-next v5 3/4] bpf: add helper bpf_perf_prog_read_value Yonghong Song
@ 2017-09-20  6:09 ` Yonghong Song
  3 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2017-09-20  6:09 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team

The bpf sample program trace_event is enhanced to use the new
helper to print out enabled/running time.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/trace_event_kern.c            | 10 ++++++++++
 samples/bpf/trace_event_user.c            | 13 ++++++++-----
 tools/include/uapi/linux/bpf.h            |  3 ++-
 tools/testing/selftests/bpf/bpf_helpers.h |  3 +++
 4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/samples/bpf/trace_event_kern.c b/samples/bpf/trace_event_kern.c
index 41b6115..a77a583d 100644
--- a/samples/bpf/trace_event_kern.c
+++ b/samples/bpf/trace_event_kern.c
@@ -37,10 +37,14 @@ struct bpf_map_def SEC("maps") stackmap = {
 SEC("perf_event")
 int bpf_prog1(struct bpf_perf_event_data *ctx)
 {
+	char time_fmt1[] = "Time Enabled: %llu, Time Running: %llu";
+	char time_fmt2[] = "Get Time Failed, ErrCode: %d";
 	char fmt[] = "CPU-%d period %lld ip %llx";
 	u32 cpu = bpf_get_smp_processor_id();
+	struct bpf_perf_event_value value_buf;
 	struct key_t key;
 	u64 *val, one = 1;
+	int ret;
 
 	if (ctx->sample_period < 10000)
 		/* ignore warmup */
@@ -54,6 +58,12 @@ int bpf_prog1(struct bpf_perf_event_data *ctx)
 		return 0;
 	}
 
+	ret = bpf_perf_prog_read_value(ctx, (void *)&value_buf, sizeof(struct bpf_perf_event_value));
+	if (!ret)
+	  bpf_trace_printk(time_fmt1, sizeof(time_fmt1), value_buf.enabled, value_buf.running);
+	else
+	  bpf_trace_printk(time_fmt2, sizeof(time_fmt2), ret);
+
 	val = bpf_map_lookup_elem(&counts, &key);
 	if (val)
 		(*val)++;
diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
index 7bd827b..bf4f1b6 100644
--- a/samples/bpf/trace_event_user.c
+++ b/samples/bpf/trace_event_user.c
@@ -127,6 +127,9 @@ static void test_perf_event_all_cpu(struct perf_event_attr *attr)
 	int *pmu_fd = malloc(nr_cpus * sizeof(int));
 	int i, error = 0;
 
+	/* system wide perf event, no need to inherit */
+	attr->inherit = 0;
+
 	/* open perf_event on all cpus */
 	for (i = 0; i < nr_cpus; i++) {
 		pmu_fd[i] = sys_perf_event_open(attr, -1, i, -1, 0);
@@ -154,6 +157,11 @@ static void test_perf_event_task(struct perf_event_attr *attr)
 {
 	int pmu_fd;
 
+	/* per task perf event, enable inherit so the "dd ..." command can be traced properly.
+	 * Enabling inherit will cause bpf_perf_prog_read_time helper failure.
+	 */
+	attr->inherit = 1;
+
 	/* open task bound event */
 	pmu_fd = sys_perf_event_open(attr, 0, -1, -1, 0);
 	if (pmu_fd < 0) {
@@ -175,14 +183,12 @@ static void test_bpf_perf_event(void)
 		.freq = 1,
 		.type = PERF_TYPE_HARDWARE,
 		.config = PERF_COUNT_HW_CPU_CYCLES,
-		.inherit = 1,
 	};
 	struct perf_event_attr attr_type_sw = {
 		.sample_freq = SAMPLE_FREQ,
 		.freq = 1,
 		.type = PERF_TYPE_SOFTWARE,
 		.config = PERF_COUNT_SW_CPU_CLOCK,
-		.inherit = 1,
 	};
 	struct perf_event_attr attr_hw_cache_l1d = {
 		.sample_freq = SAMPLE_FREQ,
@@ -192,7 +198,6 @@ static void test_bpf_perf_event(void)
 			PERF_COUNT_HW_CACHE_L1D |
 			(PERF_COUNT_HW_CACHE_OP_READ << 8) |
 			(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16),
-		.inherit = 1,
 	};
 	struct perf_event_attr attr_hw_cache_branch_miss = {
 		.sample_freq = SAMPLE_FREQ,
@@ -202,7 +207,6 @@ static void test_bpf_perf_event(void)
 			PERF_COUNT_HW_CACHE_BPU |
 			(PERF_COUNT_HW_CACHE_OP_READ << 8) |
 			(PERF_COUNT_HW_CACHE_RESULT_MISS << 16),
-		.inherit = 1,
 	};
 	struct perf_event_attr attr_type_raw = {
 		.sample_freq = SAMPLE_FREQ,
@@ -210,7 +214,6 @@ static void test_bpf_perf_event(void)
 		.type = PERF_TYPE_RAW,
 		/* Intel Instruction Retired */
 		.config = 0xc0,
-		.inherit = 1,
 	};
 
 	printf("Test HW_CPU_CYCLES\n");
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 79eb529..50d2bcd 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -633,7 +633,8 @@ union bpf_attr {
 	FN(redirect_map),		\
 	FN(sk_redirect_map),		\
 	FN(sock_map_update),		\
-	FN(perf_event_read_value),
+	FN(perf_event_read_value),	\
+	FN(perf_prog_read_value),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 08e6f8c..1d3dcd4 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -73,6 +73,9 @@ static int (*bpf_sock_map_update)(void *map, void *key, void *value,
 static int (*bpf_perf_event_read_value)(void *map, unsigned long long flags,
 					void *buf, unsigned int buf_size) =
 	(void *) BPF_FUNC_perf_event_read_value;
+static int (*bpf_perf_prog_read_value)(void *ctx, void *buf,
+				       unsigned int buf_size) =
+	(void *) BPF_FUNC_perf_prog_read_value;
 
 
 /* llvm builtin functions that eBPF C program may use to
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v5 1/4] bpf: add helper bpf_perf_event_read_value for perf event array map
  2017-09-20  6:09 ` [PATCH net-next v5 1/4] bpf: add helper bpf_perf_event_read_value for perf event array map Yonghong Song
@ 2017-09-20 17:26   ` Peter Zijlstra
  2017-09-20 23:07     ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2017-09-20 17:26 UTC (permalink / raw)
  To: Yonghong Song; +Cc: rostedt, ast, daniel, netdev, kernel-team, David Miller

On Tue, Sep 19, 2017 at 11:09:32PM -0700, Yonghong Song wrote:
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 3e691b7..2d5bbe5 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3684,10 +3684,12 @@ static inline u64 perf_event_count(struct perf_event *event)
>   *     will not be local and we cannot read them atomically
>   *   - must not have a pmu::count method
>   */
> -int perf_event_read_local(struct perf_event *event, u64 *value)
> +int perf_event_read_local(struct perf_event *event, u64 *value,
> +			  u64 *enabled, u64 *running)
>  {
>  	unsigned long flags;
>  	int ret = 0;
> +	u64 now;
>  
>  	/*
>  	 * Disabling interrupts avoids all counter scheduling (context
> @@ -3718,14 +3720,21 @@ int perf_event_read_local(struct perf_event *event, u64 *value)
>  		goto out;
>  	}
>  
> +	now = event->shadow_ctx_time + perf_clock();
> +	if (enabled)
> +		*enabled = now - event->tstamp_enabled;
>  	/*
>  	 * If the event is currently on this CPU, its either a per-task event,
>  	 * or local to this CPU. Furthermore it means its ACTIVE (otherwise
>  	 * oncpu == -1).
>  	 */
> -	if (event->oncpu == smp_processor_id())
> +	if (event->oncpu == smp_processor_id()) {
>  		event->pmu->read(event);
> -
> +		if (running)
> +			*running = now - event->tstamp_running;
> +	} else if (running) {
> +		*running = event->total_time_running;
> +	}
>  	*value = local64_read(&event->count);
>  out:
>  	local_irq_restore(flags);

Yeah, this looks about right.

Dave, could we have this in a topic tree of sorts, because I have a
pending series to rework all the timekeeping and it might be nice to not
have sfr run into all sorts of conflicts.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v5 1/4] bpf: add helper bpf_perf_event_read_value for perf event array map
  2017-09-20 17:26   ` Peter Zijlstra
@ 2017-09-20 23:07     ` David Miller
  2017-09-21 15:15       ` Alexei Starovoitov
  0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2017-09-20 23:07 UTC (permalink / raw)
  To: peterz; +Cc: yhs, rostedt, ast, daniel, netdev, kernel-team

From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 20 Sep 2017 19:26:51 +0200

> Dave, could we have this in a topic tree of sorts, because I have a
> pending series to rework all the timekeeping and it might be nice to not
> have sfr run into all sorts of conflicts.

If you want to merge it into your tree that's fine.

But it means that any further development done on top of this
work will need to go via you as well.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v5 1/4] bpf: add helper bpf_perf_event_read_value for perf event array map
  2017-09-20 23:07     ` David Miller
@ 2017-09-21 15:15       ` Alexei Starovoitov
  2017-09-25 15:41         ` Yonghong Song
  0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2017-09-21 15:15 UTC (permalink / raw)
  To: David Miller, peterz; +Cc: yhs, rostedt, daniel, netdev, kernel-team

On 9/20/17 4:07 PM, David Miller wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed, 20 Sep 2017 19:26:51 +0200
>
>> Dave, could we have this in a topic tree of sorts, because I have a
>> pending series to rework all the timekeeping and it might be nice to not
>> have sfr run into all sorts of conflicts.
>
> If you want to merge it into your tree that's fine.
>
> But it means that any further development done on top of this
> work will need to go via you as well.

can we merge this set of patches into both net-next and tip ?
We did such tricks in the past to avoid merge conflicts and
everything went fine during merge window.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v5 1/4] bpf: add helper bpf_perf_event_read_value for perf event array map
  2017-09-21 15:15       ` Alexei Starovoitov
@ 2017-09-25 15:41         ` Yonghong Song
  0 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2017-09-25 15:41 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller, peterz
  Cc: rostedt, daniel, netdev, kernel-team



On 9/21/17 8:15 AM, Alexei Starovoitov wrote:
> On 9/20/17 4:07 PM, David Miller wrote:
>> From: Peter Zijlstra <peterz@infradead.org>
>> Date: Wed, 20 Sep 2017 19:26:51 +0200
>>
>>> Dave, could we have this in a topic tree of sorts, because I have a
>>> pending series to rework all the timekeeping and it might be nice to not
>>> have sfr run into all sorts of conflicts.
>>
>> If you want to merge it into your tree that's fine.
>>
>> But it means that any further development done on top of this
>> work will need to go via you as well.
> 
> can we merge this set of patches into both net-next and tip ?
> We did such tricks in the past to avoid merge conflicts and
> everything went fine during merge window.

Hi, Peter,

Any suggestion for merging this patch? Alexei's idea sounds
reasonable?

Thanks,

Yonghong

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-09-25 15:42 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-20  6:09 [PATCH net-next v5 0/4] bpf: add two helpers to read perf event enabled/running time Yonghong Song
2017-09-20  6:09 ` [PATCH net-next v5 1/4] bpf: add helper bpf_perf_event_read_value for perf event array map Yonghong Song
2017-09-20 17:26   ` Peter Zijlstra
2017-09-20 23:07     ` David Miller
2017-09-21 15:15       ` Alexei Starovoitov
2017-09-25 15:41         ` Yonghong Song
2017-09-20  6:09 ` [PATCH net-next v5 2/4] bpf: add a test case for helper bpf_perf_event_read_value Yonghong Song
2017-09-20  6:09 ` [PATCH net-next v5 3/4] bpf: add helper bpf_perf_prog_read_value Yonghong Song
2017-09-20  6:09 ` [PATCH net-next v5 4/4] bpf: add a test case for " Yonghong Song

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.