bpf.vger.kernel.org archive mirror
* Re: [PATCH] staging: tracing/kprobe: filter kprobe based perf event
       [not found] <20190918052406.21385-1-jinshan.xiong@gmail.com>
@ 2019-09-18  5:51 ` Yonghong Song
  2019-09-18 14:32   ` Alexei Starovoitov
  0 siblings, 1 reply; 4+ messages in thread
From: Yonghong Song @ 2019-09-18  5:51 UTC (permalink / raw)
  To: jinshan.xiong, linux-kernel, rostedt, mingo; +Cc: jinshan.xiong, bpf


Adding cc to bpf@vger.kernel.org mailing list since this is really
bpf related.

On 9/17/19 10:24 PM, jinshan.xiong@gmail.com wrote:
> From: Jinshan Xiong <jinshan.xiong@gmail.com>
> 
> Invoke the bpf program only if a kprobe-based perf_event has been added
> to the percpu list. This is essential to make event tracing for cgroups
> work properly.

The issue actually exists for bpf programs attached to kprobe, uprobe,
tracepoint and trace_syscall_enter/exit.

In all these places, the bpf program is called regardless of
whether there are perf events on this cpu.
This gives bpf programs more opportunities to see
the events. I guess this is by design.
Alexei/Daniel, could you clarify?

This, unfortunately, has consequences for cgroups.
Currently, perf event cgroup-based filtering
(PERF_FLAG_PID_CGROUP) won't work for bpf programs
attached to kprobe/uprobe/tracepoint/trace_syscall.
The reason is the same: bpf programs see
more events than what perf has configured.

The overflow-interrupt (NMI) based perf_event bpf programs
are not impacted.

Any suggestions on the proper way to move forward?

One way is to start honoring only the events permitted by perf,
as this patch does. But I am not sure whether this
contradicts the original motivation for bpf programs
to see all events regardless.

Another way is to do the filtering inside the bpf program.
We already have the bpf_get_current_cgroup_id() helper.
We may need another helper to check whether the current task
is in (a descendant of) a given cgroup, since it is often the case
that one wants to trace all processes under one parent cgroup,
which may have many child cgroups.
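
For illustration, a minimal sketch (not part of this patch; the
program section, function name and target_cgroup_id constant are
made up) of what such in-program filtering could look like with the
existing helper:

#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>

/* Illustrative only: userspace would fill in the cgroup v2 id to
 * trace, e.g. the inode number of the cgroup directory. */
const volatile __u64 target_cgroup_id;

SEC("kprobe/do_sys_open")
int trace_open(struct pt_regs *ctx)
{
	/* bpf_get_current_cgroup_id() returns the cgroup v2 id of
	 * the current task. It matches only the exact cgroup, not
	 * descendants -- the gap described above. */
	if (bpf_get_current_cgroup_id() != target_cgroup_id)
		return 0;

	/* ... record the event for tasks in the target cgroup ... */
	return 0;
}

char LICENSE[] SEC("license") = "GPL";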

> 
> Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
> ---
>   kernel/trace/trace_kprobe.c | 13 ++++++++-----
>   1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 9d483ad9bb6c..40ef0f1945f7 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -1171,11 +1171,18 @@ static int
>   kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
>   {
>   	struct trace_event_call *call = trace_probe_event_call(&tk->tp);
> +	struct hlist_head *head = this_cpu_ptr(call->perf_events);
>   	struct kprobe_trace_entry_head *entry;
> -	struct hlist_head *head;
>   	int size, __size, dsize;
>   	int rctx;
>   
> +	/*
> +	 * If head is empty, the process currently running on this cpu is not
> +	 * interested in the kprobe perf PMU.
> +	 */
> +	if (hlist_empty(head))
> +		return 0;
> +
>   	if (bpf_prog_array_valid(call)) {
>   		unsigned long orig_ip = instruction_pointer(regs);
>   		int ret;
> @@ -1193,10 +1200,6 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
>   			return 0;
>   	}
>   
> -	head = this_cpu_ptr(call->perf_events);
> -	if (hlist_empty(head))
> -		return 0;
> -
>   	dsize = __get_data_size(&tk->tp, regs);
>   	__size = sizeof(*entry) + tk->tp.size + dsize;
>   	size = ALIGN(__size + sizeof(u32), sizeof(u64));
> 


* Re: [PATCH] staging: tracing/kprobe: filter kprobe based perf event
  2019-09-18  5:51 ` [PATCH] staging: tracing/kprobe: filter kprobe based perf event Yonghong Song
@ 2019-09-18 14:32   ` Alexei Starovoitov
       [not found]     ` <CA+-8EHRk6aAuDQ=S9O7h6T2fhyz2z+zQduN2yiDNWMOWt2-t_A@mail.gmail.com>
  0 siblings, 1 reply; 4+ messages in thread
From: Alexei Starovoitov @ 2019-09-18 14:32 UTC (permalink / raw)
  To: Yonghong Song
  Cc: jinshan.xiong, linux-kernel, rostedt, mingo, jinshan.xiong, bpf

On Wed, Sep 18, 2019 at 05:51:10AM +0000, Yonghong Song wrote:
> 
> Adding cc to bpf@vger.kernel.org mailing list since this is really
> bpf related.
> 
> On 9/17/19 10:24 PM, jinshan.xiong@gmail.com wrote:
> > From: Jinshan Xiong <jinshan.xiong@gmail.com>
> > 
> > Invoke the bpf program only if a kprobe-based perf_event has been added
> > to the percpu list. This is essential to make event tracing for cgroups
> > work properly.
> 
> The issue actually exists for bpf programs attached to kprobe, uprobe,
> tracepoint and trace_syscall_enter/exit.
> 
> In all these places, the bpf program is called regardless of
> whether there are perf events on this cpu.
> This gives bpf programs more opportunities to see
> the events. I guess this is by design.
> Alexei/Daniel, could you clarify?

Yes. It is by design.
When bpf is attached to a kprobe it will fire on all cpus.
Per-cpu, per-task or per-cgroup filtering is already done
inside bpf programs.
We cannot make this change now; it would break all users.



* Re: [PATCH] staging: tracing/kprobe: filter kprobe based perf event
       [not found]     ` <CA+-8EHRk6aAuDQ=S9O7h6T2fhyz2z+zQduN2yiDNWMOWt2-t_A@mail.gmail.com>
@ 2019-09-18 20:22       ` Alexei Starovoitov
  2019-09-19  2:41         ` Jinshan Xiong
  0 siblings, 1 reply; 4+ messages in thread
From: Alexei Starovoitov @ 2019-09-18 20:22 UTC (permalink / raw)
  To: Jinshan Xiong
  Cc: Yonghong Song, jinshan.xiong, linux-kernel, rostedt, mingo, bpf

On Wed, Sep 18, 2019 at 8:13 AM Jinshan Xiong <jinshan.xiong@uber.com> wrote:
>
> The problem with the current approach is that it would be difficult to filter by cgroup, especially when the cgroup in question has descendants and may spawn new descendants after the BPF program is installed. It's hard to do that filtering inside a BPF program.

Why is that?
bpf_current_task_under_cgroup() fits exactly that purpose.
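
For reference, bpf_current_task_under_cgroup() tests whether the
current task is in the cgroup held in a BPF_MAP_TYPE_CGROUP_ARRAY
map, or in any of its descendants. A rough sketch of a program using
it (map and program names are made up):

#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>

/* Userspace opens the parent cgroup v2 directory and stores its fd
 * at index 0 with bpf_map_update_elem(). */
struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u32);
} cgroup_map SEC(".maps");

SEC("kprobe/do_sys_open")
int trace_open(struct pt_regs *ctx)
{
	/* Returns 1 when the current task is in the cgroup at index
	 * 0 or any of its descendants, so child cgroups created
	 * after attach time are covered automatically. */
	if (bpf_current_task_under_cgroup(&cgroup_map, 0) != 1)
		return 0;

	/* ... event handling ... */
	return 0;
}

char LICENSE[] SEC("license") = "GPL";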


* Re: [PATCH] staging: tracing/kprobe: filter kprobe based perf event
  2019-09-18 20:22       ` Alexei Starovoitov
@ 2019-09-19  2:41         ` Jinshan Xiong
  0 siblings, 0 replies; 4+ messages in thread
From: Jinshan Xiong @ 2019-09-19  2:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yonghong Song, jinshan.xiong, linux-kernel, rostedt, mingo, bpf

That's bloody true. Thanks for your insights.

I will write an example program and commit it to the bcc repository.
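
For completeness, a rough sketch of the userspace side that would
pin the parent cgroup into such an array map (path and function name
are illustrative):

#include <fcntl.h>
#include <stdio.h>
#include <bpf/bpf.h>

/* map_fd is the fd of the BPF_MAP_TYPE_CGROUP_ARRAY map from the
 * loaded object; cgroup_path names the parent cgroup whose whole
 * subtree should be traced. */
int populate_cgroup_map(int map_fd, const char *cgroup_path)
{
	__u32 key = 0;
	int cg_fd = open(cgroup_path, O_RDONLY);

	if (cg_fd < 0) {
		perror("open cgroup");
		return -1;
	}
	if (bpf_map_update_elem(map_fd, &key, &cg_fd, BPF_ANY)) {
		perror("bpf_map_update_elem");
		return -1;
	}
	return 0;
}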

Jinshan


On Wed, Sep 18, 2019 at 1:22 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Sep 18, 2019 at 8:13 AM Jinshan Xiong <jinshan.xiong@uber.com> wrote:
> >
> > The problem with the current approach is that it would be difficult to filter by cgroup, especially when the cgroup in question has descendants and may spawn new descendants after the BPF program is installed. It's hard to do that filtering inside a BPF program.
>
> Why is that?
> bpf_current_task_under_cgroup() fits exactly that purpose.

