From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752436AbbGMOcq (ORCPT <rfc822;w@1wt.eu>);
	Mon, 13 Jul 2015 10:32:46 -0400
Received: from m12-14.163.com ([220.181.12.14]:55236 "EHLO m12-14.163.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751980AbbGMOcm convert rfc822-to-8bit (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Mon, 13 Jul 2015 10:32:42 -0400
Content-Type: text/plain;
	charset=gb2312
Mime-Version: 1.0 (1.0)
Subject: Re: [RFC PATCH v4 3/3] bpf: Introduce function for outputing data to perf event
From: pi3orama <pi3orama@163.com>
X-Mailer: iPhone Mail (12H143)
In-Reply-To: <20150713140915.GD9917@danjae.kornet>
Date: Mon, 13 Jul 2015 22:29:14 +0800
Cc: He Kuang <hekuang@huawei.com>, Alexei Starovoitov <ast@plumgrid.com>,
        "rostedt@goodmis.org" <rostedt@goodmis.org>,
        "masami.hiramatsu.pt@hitachi.com" <masami.hiramatsu.pt@hitachi.com>,
        "acme@kernel.org" <acme@kernel.org>,
        "a.p.zijlstra@chello.nl" <a.p.zijlstra@chello.nl>,
        "mingo@redhat.com" <mingo@redhat.com>,
        "jolsa@kernel.org" <jolsa@kernel.org>,
        "wangnan0@huawei.com" <wangnan0@huawei.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Content-Transfer-Encoding: 8BIT
Message-Id: <B5145998-CD5C-4B1B-9A42-CD19691EF80B@163.com>
References: <1436522587-136825-1-git-send-email-hekuang@huawei.com> <1436522587-136825-4-git-send-email-hekuang@huawei.com> <55A042DC.6030809@plumgrid.com> <55A3404B.6020904@huawei.com> <20150713135223.GB9917@danjae.kornet> <4D441676-21A7-46EE-AAB0-EB529D408082@163.com> <20150713140915.GD9917@danjae.kornet>
To: Namhyung Kim <namhyung@kernel.org>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


Sent from my iPhone

> On Jul 13, 2015, at 10:09 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
>> On Mon, Jul 13, 2015 at 10:01:26PM +0800, pi3orama wrote:
>> 
>> 
>> Sent from my iPhone
>> 
>>> On Jul 13, 2015, at 9:52 PM, Namhyung Kim <namhyung@kernel.org> wrote:
>>> 
>>> Hi,
>>> 
>>>> On Mon, Jul 13, 2015 at 12:36:27PM +0800, He Kuang wrote:
>>>> hi, Alexei
>>>> 
>>>>>> On 2015/7/11 6:10, Alexei Starovoitov wrote:
>>>>>> On 7/10/15 3:03 AM, He Kuang wrote:
>>>>>> There are scenarios where we need an eBPF program to record not only
>>>>>> kprobe point args, but also the PMU counters, time latencies or the
>>>>>> number of cache misses between two probe points and other information
>>>>>> when the probe point is entered.
>>>>>> 
>>>>>> This patch adds a new trace event to establish infrastructure for bpf to
>>>>>> output data to perf. Userspace perf tools can detect and use this event
>>>>>> in the same way as the existing tracepoint events.
>>>>>> 
>>>>>> New bpf trace event entry in debugfs:
>>>>>> 
>>>>>>    /sys/kernel/debug/tracing/events/bpf/bpf_output_data
>>>>>> 
>>>>>> Userspace perf tools detect the new tracepoint event as:
>>>>>> 
>>>>>>    bpf:bpf_output_data                          [Tracepoint event]
>>>>> 
>>>>> Nice! This approach looks cleanest so far.
>>>>> 
>>>>>> +TRACE_EVENT(bpf_output_data,
>>>>>> +
>>>>>> +    TP_PROTO(u64 *src, int len),
>>>>>> +
>>>>>> +    TP_ARGS(src, len),
>>>>>> +
>>>>>> +    TP_STRUCT__entry(
>>>>>> +        __dynamic_array(u64,        buf,        len)
>>>>>> +    ),
>>>>>> +
>>>>>> +    TP_fast_assign(
>>>>>> +        memcpy(__get_dynamic_array(buf), src, len * sizeof(u64));
>>>>> 
>>>>> Maybe make it a 'u8' array? The extra multiply and...
>>>> 
>>>> OK
>>>> 
>>>> So the output of three u64 integers (e.g. 0x2060572485, 0x20667b0ff2,
>>>> 0x623eb6d) will be this:
>>>> 
>>>> dd 994 [000] 139.158180: bpf:bpf_output_data: 85 24 57 60 20 00 00 00
>>>> f2 0f 7b 66 20 00 00 00 6d eb 23 06 00 00 00 00
>>>> 
>>>> And users are not restricted to u64 type elements. I'll change that.
>>> 
>>> While this general event format works well, I think it might be hard
>>> to know which output came from which program when more than one bpf
>>> program is used.
>>> 
>>> I was thinking about providing custom event formats for each bpf
>>> program (if needed).  The event format definitions might be in a
>>> specific directory or a bpf object itself.  Then perf can read those
>>> formats and print the output data according to the formats.  Maybe we
>>> need to add some dynamic event id to match format and data.
>> 
>> I think we can do it on the perf side. Let BPF programs themselves
>> encode format information into the array and have perf read and
>> decode it. On the kernel side, simply supporting raw data should be
>> enough, so we can keep the kernel code as simple as possible.
> 
> Yes, of course, I also meant doing all of that work on the perf side. :)
> 

I have an idea for this:

struct x {
  int a;
  int b;
};
struct x __x;  /* global used only to identify the type */

SEC(...)
int func(void *ctx)
{
  struct x myx;
  ...
  myx.a = 1;
  myx.b = 2;
  OUTPUT(&myx, &__x);  /* &__x is what emits the relocation */
  ...
}

In the above program, taking the address of __x with the '&' operator
will emit a relocation, which gives libbpf a chance to learn the exact
type of the output data. libbpf can then translate that type into a
unique number, and the OUTPUT macro should pass the number along in the
raw data. When decoding, perf reads the first word of the raw data to
identify the format, and can then retrieve the structure layout from
the DWARF information. We can add more macros to make the above code
simpler.
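
To make the record layout concrete, here is a rough userspace sketch of
the "type number first, payload after" idea. It is only an illustration
under assumptions: encode_record(), decode_record(), fmt_table and
TYPE_ID_X are hypothetical names, the tag is assumed to be a 32-bit
word, and the small table stands in for the DWARF lookup perf would
really do. In the actual design the number would come from libbpf's
handling of the relocation, not from a hand-written enum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct x in the example above. */
struct x {
	int a;
	int b;
};

/* Hypothetical type number; assumed to be assigned by libbpf in the real design. */
enum { TYPE_ID_X = 1 };

/* Hypothetical stand-in for the format information perf would get from DWARF. */
struct fmt_entry {
	uint32_t id;
	const char *name;
	size_t size;
};

static const struct fmt_entry fmt_table[] = {
	{ TYPE_ID_X, "struct x", sizeof(struct x) },
};

/*
 * Build a raw record: [u32 type number][payload bytes].
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t encode_record(uint8_t *buf, size_t cap, uint32_t id,
		     const void *data, size_t len)
{
	assert(buf != NULL && data != NULL);
	if (cap < sizeof(id) + len)
		return 0;
	memcpy(buf, &id, sizeof(id));
	memcpy(buf + sizeof(id), data, len);
	return sizeof(id) + len;
}

/*
 * Read the first word of a raw record and look up its format.
 * On success returns the table entry and points *payload at the data;
 * returns NULL for an unknown number or a size mismatch.
 */
const struct fmt_entry *decode_record(const uint8_t *buf, size_t len,
				      const void **payload)
{
	uint32_t id;
	size_t i;

	if (len < sizeof(id))
		return NULL;
	memcpy(&id, buf, sizeof(id));

	for (i = 0; i < sizeof(fmt_table) / sizeof(fmt_table[0]); i++) {
		if (fmt_table[i].id == id &&
		    len - sizeof(id) == fmt_table[i].size) {
			*payload = buf + sizeof(id);
			return &fmt_table[i];
		}
	}
	return NULL;
}
```

With something like this, the kernel side still only sees an opaque
byte array, and all knowledge of the format stays on the perf side.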

We will start working on it after this patch gets accepted.

Thank you.


> Thanks,
> Namhyung