* [PATCH V3 0/2] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
@ 2015-10-16  7:42 Kaixu Xia
  2015-10-16  7:42 ` [PATCH V3 1/2] bpf: control the trace data output on current cpu " Kaixu Xia
  2015-10-16  7:42 ` [PATCH V3 2/2] bpf: control all the perf events stored in PERF_EVENT_ARRAY maps Kaixu Xia
  0 siblings, 2 replies; 7+ messages in thread
From: Kaixu Xia @ 2015-10-16  7:42 UTC (permalink / raw)
  To: ast, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt,
	jolsa, daniel
  Cc: xiakaixu, wangnan0, linux-kernel, pi3orama, hekuang, netdev

Previous patch V2 url:
https://lkml.org/lkml/2015/10/14/347

This patchset introduces a new perf_event_attr attribute,
'dump_enable'. The existing 'disabled' flag does not meet our
requirements: cpu_function_call() is too heavyweight to invoke from
a bpf program, and we control the perf events stored in maps through
a soft_disable-style mechanism instead. If the 'disabled' flag is set
to true, bpf programs cannot enable or disable the perf event at all.

changes in V3:
 - make the flag name and condition check consistent;
 - check only bit 0 of the bpf helper flag and verify that all other
   bits are reserved;
 - use atomic_dec_if_positive() and atomic_inc_unless_negative();
 - make bpf_perf_event_dump_control_proto be static;
 - remove the ioctl PERF_EVENT_IOC_SET_ENABLER and 'enabler' event;
 - implement the function that controls all the perf events stored
   in PERF_EVENT_ARRAY maps by setting the parameter 'index' to the
   map's max_entries;

changes in V2:
 - rebase the whole patch set onto the net-next tree (4b418bf);
 - remove the added flag perf_sample_disable in bpf_map;
 - move the fields added to struct perf_event to a proper place to
   avoid cacheline misses;
 - use a counter-based flag instead of a 0/1 switch to handle
   re-entering events;
 - use a single helper bpf_perf_event_sample_control() to enable/
   disable events;
 - implement a light-weight solution to control the trace data
   output on current cpu;
 - create a new ioctl PERF_EVENT_IOC_SET_ENABLER to enable/disable
   a set of events;

Before this patch,
   $ ./perf record -e cycles -a sleep 1
   $ ./perf report --stdio
	# To display the perf.data header info, please use --header/--header-only option
	#
	#
	# Total Lost Samples: 0
	#
	# Samples: 643  of event 'cycles'
	# Event count (approx.): 128313904
	...

After this patch,
   $ ./perf record -e pmux=cycles --event perf-bpf.o/my_cycles_map=pmux/ -a sleep 1
   $ ./perf report --stdio
	# To display the perf.data header info, please use --header/--header-only option
	#
	#
	# Total Lost Samples: 0
	#
	# Samples: 25  of event 'cycles'
	# Event count (approx.): 5788400
	...

The bpf program example:

  struct bpf_map_def SEC("maps") my_cycles_map = {
          .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
          .key_size = sizeof(int),
          .value_size = sizeof(u32),
          .max_entries = 32, 
  };

  SEC("enter=sys_write")
  int bpf_prog_1(struct pt_regs *ctx)
  {
          /* index == max_entries: enable output for all events in the map */
          bpf_perf_event_dump_control(&my_cycles_map, 32, 0);
          return 0;
  }

  SEC("exit=sys_write%return")
  int bpf_prog_2(struct pt_regs *ctx)
  {
          /* flag bit 0 set: disable output again on return from sys_write */
          bpf_perf_event_dump_control(&my_cycles_map, 32, 1);
          return 0;
  }
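
With these two programs attached, trace data is dumped only while
sys_write is executing.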

To control sampling at the function level, we have to set a high
sample frequency, so that trace data is actually dumped during the
short window in which the perf event is enabled on the current cpu.
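
For instance, on top of the command shown above (the -F option is
standard perf record; the exact frequency here is only illustrative):

   $ ./perf record -F 99999 -e pmux=cycles --event perf-bpf.o/my_cycles_map=pmux/ -a sleep 1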

Kaixu Xia (2):
  bpf: control the trace data output on current cpu when perf sampling
  bpf: control all the perf events stored in PERF_EVENT_ARRAY maps

 include/linux/perf_event.h      |  1 +
 include/uapi/linux/bpf.h        |  5 ++++
 include/uapi/linux/perf_event.h |  3 ++-
 kernel/bpf/verifier.c           |  3 ++-
 kernel/events/core.c            | 13 +++++++++
 kernel/trace/bpf_trace.c        | 60 +++++++++++++++++++++++++++++++++++++++++
 6 files changed, 83 insertions(+), 2 deletions(-)

-- 
1.8.3.4


* [PATCH V3 1/2] bpf: control the trace data output on current cpu when perf sampling
  2015-10-16  7:42 [PATCH V3 0/2] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling Kaixu Xia
@ 2015-10-16  7:42 ` Kaixu Xia
  2015-10-16 22:06   ` Alexei Starovoitov
  2015-10-16  7:42 ` [PATCH V3 2/2] bpf: control all the perf events stored in PERF_EVENT_ARRAY maps Kaixu Xia
  1 sibling, 1 reply; 7+ messages in thread
From: Kaixu Xia @ 2015-10-16  7:42 UTC (permalink / raw)
  To: ast, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt,
	jolsa, daniel
  Cc: xiakaixu, wangnan0, linux-kernel, pi3orama, hekuang, netdev

This patch adds the flag dump_enable to control the trace data
output process during perf sampling. By setting this flag and
integrating it with ebpf, we can control the data output process and
get only the samples we are most interested in.

The bpf helper bpf_perf_event_dump_control() can control the
perf_event on the current cpu.
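
For example, a bpf program can soft-disable and re-enable the event
slot of the current cpu roughly as follows (an illustrative sketch:
the map follows the cover letter's example, and using
bpf_get_smp_processor_id() as the index is an assumption about the
per-cpu layout of the map):

	u32 cpu = bpf_get_smp_processor_id();

	/* flag bit 0 set: stop dumping trace data for this event */
	bpf_perf_event_dump_control(&my_cycles_map, cpu, 1);

	/* flag bit 0 clear: resume dumping trace data */
	bpf_perf_event_dump_control(&my_cycles_map, cpu, 0);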

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
---
 include/linux/perf_event.h      |  1 +
 include/uapi/linux/bpf.h        |  5 +++++
 include/uapi/linux/perf_event.h |  3 ++-
 kernel/bpf/verifier.c           |  3 ++-
 kernel/events/core.c            | 13 ++++++++++++
 kernel/trace/bpf_trace.c        | 44 +++++++++++++++++++++++++++++++++++++++++
 6 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 092a0e8..2af527e 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -472,6 +472,7 @@ struct perf_event {
 	struct irq_work			pending;
 
 	atomic_t			event_limit;
+	atomic_t			dump_enable;
 
 	void (*destroy)(struct perf_event *);
 	struct rcu_head			rcu_head;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 564f1f0..ba08034 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -287,6 +287,11 @@ enum bpf_func_id {
 	 * Return: realm if != 0
 	 */
 	BPF_FUNC_get_route_realm,
+
+	/**
+	 * u64 bpf_perf_event_dump_control(&map, index, flag)
+	 */
+	BPF_FUNC_perf_event_dump_control,
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 2881145..f4b8f08 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -331,7 +331,8 @@ struct perf_event_attr {
 				comm_exec      :  1, /* flag comm events that are due to an exec */
 				use_clockid    :  1, /* use @clockid for time fields */
 				context_switch :  1, /* context switch data */
-				__reserved_1   : 37;
+				dump_enable    :  1, /* don't output data on samples */
+				__reserved_1   : 36;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1d6b97b..26b55f2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -245,6 +245,7 @@ static const struct {
 } func_limit[] = {
 	{BPF_MAP_TYPE_PROG_ARRAY, BPF_FUNC_tail_call},
 	{BPF_MAP_TYPE_PERF_EVENT_ARRAY, BPF_FUNC_perf_event_read},
+	{BPF_MAP_TYPE_PERF_EVENT_ARRAY, BPF_FUNC_perf_event_dump_control},
 };
 
 static void print_verifier_state(struct verifier_env *env)
@@ -910,7 +911,7 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
 		 * don't allow any other map type to be passed into
 		 * the special func;
 		 */
-		if (bool_map != bool_func)
+		if (bool_func && bool_map != bool_func)
 			return -EINVAL;
 	}
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b11756f..74a16af 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6337,6 +6337,9 @@ static int __perf_event_overflow(struct perf_event *event,
 		irq_work_queue(&event->pending);
 	}
 
+	if (!atomic_read(&event->dump_enable))
+		return ret;
+
 	if (event->overflow_handler)
 		event->overflow_handler(event, data, regs);
 	else
@@ -7709,6 +7712,14 @@ static void account_event(struct perf_event *event)
 	account_event_cpu(event, event->cpu);
 }
 
+static void perf_event_check_dump_flag(struct perf_event *event)
+{
+	if (event->attr.dump_enable == 1)
+		atomic_set(&event->dump_enable, 1);
+	else
+		atomic_set(&event->dump_enable, 0);
+}
+
 /*
  * Allocate and initialize a event structure
  */
@@ -7840,6 +7851,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		}
 	}
 
+	perf_event_check_dump_flag(event);
+
 	return event;
 
 err_per_task:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0fe96c7..3175600 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -215,6 +215,48 @@ const struct bpf_func_proto bpf_perf_event_read_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+/* flags for PERF_EVENT_ARRAY maps */
+enum {
+	PERF_EVENT_CTL_BIT_DUMP = 0,
+	_NR_PERF_EVENT_CTL_BITS,
+};
+
+#define	BIT_FLAG_CHECK	GENMASK_ULL(63, _NR_PERF_EVENT_CTL_BITS)
+#define	BIT_DUMP_CTL	BIT_ULL(PERF_EVENT_CTL_BIT_DUMP)
+
+static u64 bpf_perf_event_dump_control(u64 r1, u64 index, u64 flag, u64 r4, u64 r5)
+{
+	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	struct perf_event *event;
+
+	if (unlikely(index >= array->map.max_entries))
+		return -E2BIG;
+
+	if (flag & BIT_FLAG_CHECK)
+		return -EINVAL;
+
+	event = (struct perf_event *)array->ptrs[index];
+	if (!event)
+		return -ENOENT;
+
+	if (flag & BIT_DUMP_CTL)
+		atomic_dec_if_positive(&event->dump_enable);
+	else
+		atomic_inc_unless_negative(&event->dump_enable);
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_perf_event_dump_control_proto = {
+	.func		= bpf_perf_event_dump_control,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
@@ -242,6 +284,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 		return &bpf_get_smp_processor_id_proto;
 	case BPF_FUNC_perf_event_read:
 		return &bpf_perf_event_read_proto;
+	case BPF_FUNC_perf_event_dump_control:
+		return &bpf_perf_event_dump_control_proto;
 	default:
 		return NULL;
 	}
-- 
1.8.3.4


* [PATCH V3 2/2] bpf: control all the perf events stored in PERF_EVENT_ARRAY maps
  2015-10-16  7:42 [PATCH V3 0/2] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling Kaixu Xia
  2015-10-16  7:42 ` [PATCH V3 1/2] bpf: control the trace data output on current cpu " Kaixu Xia
@ 2015-10-16  7:42 ` Kaixu Xia
  2015-10-16 22:12   ` Alexei Starovoitov
  1 sibling, 1 reply; 7+ messages in thread
From: Kaixu Xia @ 2015-10-16  7:42 UTC (permalink / raw)
  To: ast, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt,
	jolsa, daniel
  Cc: xiakaixu, wangnan0, linux-kernel, pi3orama, hekuang, netdev

This patch implements the function that controls all the perf
events stored in PERF_EVENT_ARRAY maps by setting the parameter
'index' to the map's max_entries.
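
For example, with the 32-entry map from the cover letter (a sketch;
the flag values follow patch 1):

	/* index == max_entries (32): operate on every event in the map */
	bpf_perf_event_dump_control(&my_cycles_map, 32, 1);	/* disable all */
	bpf_perf_event_dump_control(&my_cycles_map, 32, 0);	/* enable all */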

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
---
 kernel/trace/bpf_trace.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3175600..4b385863 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -229,13 +229,30 @@ static u64 bpf_perf_event_dump_control(u64 r1, u64 index, u64 flag, u64 r4, u64
 	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	struct perf_event *event;
+	int i;
 
-	if (unlikely(index >= array->map.max_entries))
+	if (unlikely(index > array->map.max_entries))
 		return -E2BIG;
 
 	if (flag & BIT_FLAG_CHECK)
 		return -EINVAL;
 
+	if (index == array->map.max_entries) {
+		bool dump_control = flag & BIT_DUMP_CTL;
+
+		for (i = 0; i < array->map.max_entries; i++) {
+			event = (struct perf_event *)array->ptrs[i];
+			if (!event)
+				continue;
+
+			if (dump_control)
+				atomic_dec_if_positive(&event->dump_enable);
+			else
+				atomic_inc_unless_negative(&event->dump_enable);
+		}
+		return 0;
+	}
+
 	event = (struct perf_event *)array->ptrs[index];
 	if (!event)
 		return -ENOENT;
@@ -244,7 +261,6 @@ static u64 bpf_perf_event_dump_control(u64 r1, u64 index, u64 flag, u64 r4, u64
 		atomic_dec_if_positive(&event->dump_enable);
 	else
 		atomic_inc_unless_negative(&event->dump_enable);
-
 	return 0;
 }
 
-- 
1.8.3.4


* Re: [PATCH V3 1/2] bpf: control the trace data output on current cpu when perf sampling
  2015-10-16  7:42 ` [PATCH V3 1/2] bpf: control the trace data output on current cpu " Kaixu Xia
@ 2015-10-16 22:06   ` Alexei Starovoitov
  2015-10-19  2:48     ` xiakaixu
  2015-10-19  4:03     ` xiakaixu
  0 siblings, 2 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2015-10-16 22:06 UTC (permalink / raw)
  To: Kaixu Xia, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt,
	jolsa, daniel
  Cc: wangnan0, linux-kernel, pi3orama, hekuang, netdev

On 10/16/15 12:42 AM, Kaixu Xia wrote:
> This patch adds the flag dump_enable to control the trace data
> output process during perf sampling. By setting this flag and
> integrating it with ebpf, we can control the data output process and
> get only the samples we are most interested in.
>
> The bpf helper bpf_perf_event_dump_control() can control the
> perf_event on the current cpu.
>
> Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
> ---
>   include/linux/perf_event.h      |  1 +
>   include/uapi/linux/bpf.h        |  5 +++++
>   include/uapi/linux/perf_event.h |  3 ++-
>   kernel/bpf/verifier.c           |  3 ++-
>   kernel/events/core.c            | 13 ++++++++++++
>   kernel/trace/bpf_trace.c        | 44 +++++++++++++++++++++++++++++++++++++++++
>   6 files changed, 67 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 092a0e8..2af527e 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -472,6 +472,7 @@ struct perf_event {
>   	struct irq_work			pending;
>
>   	atomic_t			event_limit;
> +	atomic_t			dump_enable;

The naming is the hardest...
How about calling it 'soft_enable' instead?

> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -287,6 +287,11 @@ enum bpf_func_id {
>   	 * Return: realm if != 0
>   	 */
>   	BPF_FUNC_get_route_realm,
> +
> +	/**
> +	 * u64 bpf_perf_event_dump_control(&map, index, flag)
> +	 */
> +	BPF_FUNC_perf_event_dump_control,

and this one is too long.
Maybe bpf_perf_event_control()?

Daniel, any thoughts on naming?

> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -331,7 +331,8 @@ struct perf_event_attr {
>   				comm_exec      :  1, /* flag comm events that are due to an exec */
>   				use_clockid    :  1, /* use @clockid for time fields */
>   				context_switch :  1, /* context switch data */
> -				__reserved_1   : 37;
> +				dump_enable    :  1, /* don't output data on samples */

Either the comment or the name is wrong.
How about calling this one 'soft_disable',
since you want zero to be the default and the event to be on?

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index b11756f..74a16af 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6337,6 +6337,9 @@ static int __perf_event_overflow(struct perf_event *event,
>   		irq_work_queue(&event->pending);
>   	}
>
> +	if (!atomic_read(&event->dump_enable))
> +		return ret;

I'm not an expert in this piece of perf, but should it be 'return 0'
instead? And maybe this should be moved into the is_sampling_event()
check? Also please add unlikely().
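
Something like (untested):

	if (unlikely(!atomic_read(&event->dump_enable)))
		return ret; /* or 0? */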

> +static void perf_event_check_dump_flag(struct perf_event *event)
> +{
> +	if (event->attr.dump_enable == 1)
> +		atomic_set(&event->dump_enable, 1);
> +	else
> +		atomic_set(&event->dump_enable, 0);

That looks like it breaks perf: since the default for attr bits is
zero, all events will be soft-disabled?
How did you test it?
Please add a test to samples/bpf/ for this feature.


* Re: [PATCH V3 2/2] bpf: control all the perf events stored in PERF_EVENT_ARRAY maps
  2015-10-16  7:42 ` [PATCH V3 2/2] bpf: control all the perf events stored in PERF_EVENT_ARRAY maps Kaixu Xia
@ 2015-10-16 22:12   ` Alexei Starovoitov
  0 siblings, 0 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2015-10-16 22:12 UTC (permalink / raw)
  To: Kaixu Xia, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt,
	jolsa, daniel
  Cc: wangnan0, linux-kernel, pi3orama, hekuang, netdev

On 10/16/15 12:42 AM, Kaixu Xia wrote:
> This patch implements the function that controlling all the perf
> events stored in PERF_EVENT_ARRAY maps by setting the parameter
> 'index' to maps max_entries.
>
> Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
> ---
>   kernel/trace/bpf_trace.c | 20 ++++++++++++++++++--
>   1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 3175600..4b385863 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -229,13 +229,30 @@ static u64 bpf_perf_event_dump_control(u64 r1, u64 index, u64 flag, u64 r4, u64
>   	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
>   	struct bpf_array *array = container_of(map, struct bpf_array, map);
>   	struct perf_event *event;
> +	int i;
>
> -	if (unlikely(index >= array->map.max_entries))
> +	if (unlikely(index > array->map.max_entries))
>   		return -E2BIG;
>
>   	if (flag & BIT_FLAG_CHECK)
>   		return -EINVAL;
>
> +	if (index == array->map.max_entries) {

I don't like in-band signaling like this, since it's easy to make a
mistake on the bpf program side.
Please use the 2nd bit of 'flags' for that instead.
Also squash this patch into the 1st.
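
Something like this, perhaps (untested sketch, names made up):

	enum {
		PERF_EVENT_CTL_BIT_DUMP = 0,
		PERF_EVENT_CTL_BIT_ALL,		/* operate on the whole map */
		_NR_PERF_EVENT_CTL_BITS,
	};

	#define BIT_CTL_ALL	BIT_ULL(PERF_EVENT_CTL_BIT_ALL)

	if (flag & BIT_CTL_ALL) {
		/* iterate over all map entries */
	} else {
		/* operate on array->ptrs[index] only */
	}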


* Re: [PATCH V3 1/2] bpf: control the trace data output on current cpu when perf sampling
  2015-10-16 22:06   ` Alexei Starovoitov
@ 2015-10-19  2:48     ` xiakaixu
  2015-10-19  4:03     ` xiakaixu
  1 sibling, 0 replies; 7+ messages in thread
From: xiakaixu @ 2015-10-19  2:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa,
	daniel, wangnan0, linux-kernel, pi3orama, hekuang, netdev

On 2015/10/17 6:06, Alexei Starovoitov wrote:
> On 10/16/15 12:42 AM, Kaixu Xia wrote:
>> This patch adds the flag dump_enable to control the trace data
>> output process during perf sampling. By setting this flag and
>> integrating it with ebpf, we can control the data output process and
>> get only the samples we are most interested in.
>>
>> The bpf helper bpf_perf_event_dump_control() can control the
>> perf_event on the current cpu.
>>
>> Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
>> ---
>>   include/linux/perf_event.h      |  1 +
>>   include/uapi/linux/bpf.h        |  5 +++++
>>   include/uapi/linux/perf_event.h |  3 ++-
>>   kernel/bpf/verifier.c           |  3 ++-
>>   kernel/events/core.c            | 13 ++++++++++++
>>   kernel/trace/bpf_trace.c        | 44 +++++++++++++++++++++++++++++++++++++++++
>>   6 files changed, 67 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 092a0e8..2af527e 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -472,6 +472,7 @@ struct perf_event {
>>       struct irq_work            pending;
>>
>>       atomic_t            event_limit;
>> +    atomic_t            dump_enable;
> 
> The naming is the hardest...
> How about calling it 'soft_enable' instead?
> 
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -287,6 +287,11 @@ enum bpf_func_id {
>>        * Return: realm if != 0
>>        */
>>       BPF_FUNC_get_route_realm,
>> +
>> +    /**
>> +     * u64 bpf_perf_event_dump_control(&map, index, flag)
>> +     */
>> +    BPF_FUNC_perf_event_dump_control,
> 
> and this one is too long.
> Maybe bpf_perf_event_control()?
> 
> Daniel, any thoughts on naming?
> 
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -331,7 +331,8 @@ struct perf_event_attr {
>>                   comm_exec      :  1, /* flag comm events that are due to an exec */
>>                   use_clockid    :  1, /* use @clockid for time fields */
>>                   context_switch :  1, /* context switch data */
>> -                __reserved_1   : 37;
>> +                dump_enable    :  1, /* don't output data on samples */
> 
> Either the comment or the name is wrong.
> How about calling this one 'soft_disable',
> since you want zero to be the default and the event to be on?
> 
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index b11756f..74a16af 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -6337,6 +6337,9 @@ static int __perf_event_overflow(struct perf_event *event,
>>           irq_work_queue(&event->pending);
>>       }
>>
>> +    if (!atomic_read(&event->dump_enable))
>> +        return ret;
> 
> I'm not an expert in this piece of perf, but should it be 'return 0'
> instead? And maybe this should be moved into the is_sampling_event()
> check? Also please add unlikely().
> 
>> +static void perf_event_check_dump_flag(struct perf_event *event)
>> +{
>> +    if (event->attr.dump_enable == 1)
>> +        atomic_set(&event->dump_enable, 1);
>> +    else
>> +        atomic_set(&event->dump_enable, 0);
> 
> That looks like it breaks perf: since the default for attr bits is
> zero, all events will be soft-disabled?
> How did you test it?
> Please add a test to samples/bpf/ for this feature.

It is really hard to add a test for this to samples/bpf/. We would need to
implement most of the 'perf record/report' commands from tools/perf/, like
mmap(), dumping the trace, etc. The perf_event_open syscall alone is really
not enough.

Actually, this patch set is only the kernel-space side; it still needs the
perf user-space side. You can find the necessary patches in Wang Nan's git
tree [1]. Based on Wang Nan's tree, we can configure BPF maps through the
perf cmdline. We also need to configure attr->soft_disable on the perf
user-space side based on tree [1], so that part is not included in this
patchset. I will send out the perf user-space part after this patch set is
applied.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/pi3orama/linux.git perf/ebpf


* Re: [PATCH V3 1/2] bpf: control the trace data output on current cpu when perf sampling
  2015-10-16 22:06   ` Alexei Starovoitov
  2015-10-19  2:48     ` xiakaixu
@ 2015-10-19  4:03     ` xiakaixu
  1 sibling, 0 replies; 7+ messages in thread
From: xiakaixu @ 2015-10-19  4:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa,
	daniel, wangnan0, linux-kernel, pi3orama, hekuang, netdev

On 2015/10/17 6:06, Alexei Starovoitov wrote:
> On 10/16/15 12:42 AM, Kaixu Xia wrote:
>> This patch adds the flag dump_enable to control the trace data
>> output process during perf sampling. By setting this flag and
>> integrating it with ebpf, we can control the data output process and
>> get only the samples we are most interested in.
>>
>> The bpf helper bpf_perf_event_dump_control() can control the
>> perf_event on the current cpu.
>>
>> Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
>> ---
>>   include/linux/perf_event.h      |  1 +
>>   include/uapi/linux/bpf.h        |  5 +++++
>>   include/uapi/linux/perf_event.h |  3 ++-
>>   kernel/bpf/verifier.c           |  3 ++-
>>   kernel/events/core.c            | 13 ++++++++++++
>>   kernel/trace/bpf_trace.c        | 44 +++++++++++++++++++++++++++++++++++++++++
>>   6 files changed, 67 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 092a0e8..2af527e 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -472,6 +472,7 @@ struct perf_event {
>>       struct irq_work            pending;
>>
>>       atomic_t            event_limit;
>> +    atomic_t            dump_enable;
> 
> The naming is the hardest...
> How about calling it 'soft_enable' instead?
> 
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -287,6 +287,11 @@ enum bpf_func_id {
>>        * Return: realm if != 0
>>        */
>>       BPF_FUNC_get_route_realm,
>> +
>> +    /**
>> +     * u64 bpf_perf_event_dump_control(&map, index, flag)
>> +     */
>> +    BPF_FUNC_perf_event_dump_control,
> 
> and this one is too long.
> Maybe bpf_perf_event_control()?
> 
> Daniel, any thoughts on naming?
> 
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -331,7 +331,8 @@ struct perf_event_attr {
>>                   comm_exec      :  1, /* flag comm events that are due to an exec */
>>                   use_clockid    :  1, /* use @clockid for time fields */
>>                   context_switch :  1, /* context switch data */
>> -                __reserved_1   : 37;
>> +                dump_enable    :  1, /* don't output data on samples */
> 
> Either the comment or the name is wrong.
> How about calling this one 'soft_disable',
> since you want zero to be the default and the event to be on?
> 
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index b11756f..74a16af 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -6337,6 +6337,9 @@ static int __perf_event_overflow(struct perf_event *event,
>>           irq_work_queue(&event->pending);
>>       }
>>
>> +    if (!atomic_read(&event->dump_enable))
>> +        return ret;
> 
> I'm not an expert in this piece of perf, but should it be 'return 0'
> instead? And maybe this should be moved into the is_sampling_event()
> check? Also please add unlikely().

is_sampling_event() is checked in many other places in the kernel, not
only in the perf event interrupt overflow handler, so I'm not sure it is
fine to move the check there. In addition, I think hwc->interrupts++
should be done in __perf_event_overflow() before event->soft_enable is
checked.
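
That is, with this patch applied the relevant ordering in
__perf_event_overflow() is roughly (abridged, using the proposed
soft_enable name):

	hwc->interrupts++;			/* throttle accounting still runs */
	...
	if (!atomic_read(&event->soft_enable))	/* only the output is gated */
		return ret;

	if (event->overflow_handler)
		event->overflow_handler(event, data, regs);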
> 
>> +static void perf_event_check_dump_flag(struct perf_event *event)
>> +{
>> +    if (event->attr.dump_enable == 1)
>> +        atomic_set(&event->dump_enable, 1);
>> +    else
>> +        atomic_set(&event->dump_enable, 0);
> 
> That looks like it breaks perf: since the default for attr bits is
> zero, all events will be soft-disabled?
> How did you test it?
> Please add a test to samples/bpf/ for this feature.

