From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932405AbbIUNzd (ORCPT <rfc822;w@1wt.eu>);
	Mon, 21 Sep 2015 09:55:33 -0400
Received: from szxga03-in.huawei.com ([119.145.14.66]:21360 "EHLO
	szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756464AbbIUNz3 (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Mon, 21 Sep 2015 09:55:29 -0400
Message-ID: <56000C1F.9060709@huawei.com>
Date: Mon, 21 Sep 2015 21:54:39 +0800
From: Yunlong Song <yunlong.song@huawei.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: <a.p.zijlstra@chello.nl>, <paulus@samba.org>, <mingo@redhat.com>,
        <acme@kernel.org>, <rostedt@goodmis.org>
CC: <linux-kernel@vger.kernel.org>, <wangnan0@huawei.com>
Subject: [RFC] Perf: Trigger and dump sample info to perf.data from user space
 ring buffer
References: <1427809596-29559-1-git-send-email-yunlong.song@huawei.com> <1427809596-29559-9-git-send-email-yunlong.song@huawei.com>
In-Reply-To: <1427809596-29559-9-git-send-email-yunlong.song@huawei.com>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.111.74.205]
X-CFilter-Loop: Reflected
X-Mirapoint-Virus-RAPID-Raw: score=unknown(0),
	refid=str=0001.0A020206.56000C32.00A2,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0,
	ip=0.0.0.0,
	so=2013-05-26 15:14:31,
	dmn=2013-03-21 17:37:32
X-Mirapoint-Loop-Id: 8ea256f342690fa579be44d0ee41a6f4
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

[Problem Background]

We want to run perf in daemon mode and collect traces when an exception
(e.g., a machine crash or a drop in application performance) occurs. Perf may
have to run for a long time (days, weeks, or even months), since we do not know
when the exception will occur, only that it will occur at some point
(especially for a beta product). If we simply use "perf record" as usual, two
problems arise as time goes by: 1) writing perf.data generates a large amount
of I/O, which may hurt performance considerably; 2) perf.data keeps growing.
Although eBPF can reduce the number of traces in the normal case, in our case
perf runs in daemon mode for a long time, so the traces still accumulate.


[One Solution]

In fact, we only need the sample info generated during a short period just
before the exception occurs. We do not care about sample info from any other
time. So perhaps we should change the way perf produces its perf.data, as
follows:
 1 Let perf allocate a user space ring buffer of a reasonable size, big enough
 to store all the tracing info we care about (covering that short period) before
 the exception occurs;
 2 Write the sample info to the user space ring buffer. The size of the ring
 buffer is fixed, so newer sample info overwrites older sample info;
 3 On some kind of trigger (maybe an eBPF event, a signal, or socket
 communication) caused by the exception, dump the entire contents of the user
 space ring buffer to perf.data.sample.TIME#
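
The three steps above can be sketched roughly as follows. This is only an
illustration in Python, not perf code: the class name SampleRing, its methods,
and the record format (opaque byte strings) are all hypothetical. Counting the
capacity in bytes and emptying the buffer after a dump are assumptions that the
proposal does not specify.

```python
from collections import deque


class SampleRing:
    """Fixed-capacity buffer that drops the oldest records when full.

    Hypothetical illustration only; perf has no such class.
    """

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.records = deque()

    def push(self, record):
        """Store one record (bytes), evicting older ones to make room."""
        if len(record) > self.capacity:
            return  # a record larger than the whole buffer cannot be kept
        while self.used + len(record) > self.capacity:
            self.used -= len(self.records.popleft())
        self.records.append(record)
        self.used += len(record)

    def dump(self, out):
        """Write all buffered records to out, oldest first.

        Assumption: the buffer is emptied afterwards, so that two dumps
        in a row do not contain the same samples.
        """
        count = len(self.records)
        for record in self.records:
            out.write(record)
        self.records.clear()
        self.used = 0
        return count
```

With a capacity of 8 bytes, pushing three 4-byte records leaves only the last
two in the buffer, and dump() writes those two to the given file object.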


[Use Style]
	
We can add an option (such as "-M size" or "--memory size") to define the
size of the user space ring buffer and activate the user space ring buffer mode
described above. For convenience, we can also add --daemon to make perf run as
a daemon.
# perf record -M size -e bpf.o -e cycles -g -F 100 -a sleep 1000000
Or
# perf record -M size -e bpf.o -e cycles -g -F 100 -a --daemon

When the exception occurs, a signal is sent to perf (an eBPF event or socket
communication could be used instead)
# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1

When the 2nd exception appears
# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2

......

When the nth exception appears
# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2 … perf.data.sample.TIMEn

We can use perf report or perf script to analyze each perf.data.sample.TIME#

Or finally, we can kill perf and combine perf.data.auxiliary with all the
perf.data.sample.TIME# files to create an all-in-one perf.data
# kill -SIGUSR2 1234
# ls
perf.data
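
The signal-based trigger used above could be wired up along these lines. This
is a Python sketch under assumptions, not perf code: the use of SIGUSR1 for
"dump" and SIGUSR2 for "merge and exit" follows the example above, while the
Triggers class and its method names are hypothetical. The handler only sets a
flag, and the actual file writing is left to the main loop, which keeps the
signal handler short.

```python
import signal


class Triggers:
    """Remembers which trigger signals have arrived (hypothetical helper)."""

    def __init__(self):
        self.dump_requested = False  # set by SIGUSR1
        self.stop_requested = False  # set by SIGUSR2

    def handle(self, signum, frame):
        # Only note the request here; the main loop does the file I/O.
        if signum == signal.SIGUSR1:
            self.dump_requested = True
        elif signum == signal.SIGUSR2:
            self.stop_requested = True

    def install(self):
        """Register handle() for both trigger signals."""
        signal.signal(signal.SIGUSR1, self.handle)
        signal.signal(signal.SIGUSR2, self.handle)

    def take_dump_request(self):
        """Return True once per pending dump request, then clear it."""
        pending = self.dump_requested
        self.dump_requested = False
        return pending
```

A main loop would call install() once, then check take_dump_request() and
stop_requested between reads of the sample data.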


[To Do]

If the idea above is acceptable, we plan to implement it in the following steps:
1 Develop perf's user space ring buffer, in which newer sample info replaces
older sample info.
2 Classify the tracing info into two kinds. The first kind is plain sample
events, of which we only need those generated during a short period just before
the exception occurs; we call this Optional tracing info, and perf should write
it to the user space ring buffer. The second kind is the tracing info required
to analyze the sample events, such as mmap_event, which carries the dso-related
info; we call this Auxiliary tracing info, and perf should write it to
perf.data.auxiliary or directly to perf.data as before.
3 Develop a trigger for perf that makes perf dump its user space ring buffer to
perf.data.sample.TIME#, or append it to perf.data. The trigger may have three
interfaces: eBPF event, signal, and socket communication.
4 Make perf report, perf script, etc. able to analyze perf.data.auxiliary,
perf.data.sample.TIME#, and the final perf.data synthesized from
perf.data.auxiliary and all the perf.data.sample.TIME# files.
5 For daemon mode, perf should also support running in the background
indefinitely and being stopped by a trigger.
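
As a rough illustration of steps 2 and 4, the sketch below routes records by
kind and later merges the pieces back into one time-ordered stream. It is
Python and entirely hypothetical: records are modelled as (timestamp, kind,
payload) tuples, the kind names are stand-ins rather than perf's real record
types, and a real implementation would work on perf's binary record format.

```python
import heapq

# Assumption: which record kinds count as "auxiliary" (needed to interpret
# samples). Only mmap is named in the proposal; "comm" is illustrative.
AUXILIARY_KINDS = {"mmap", "comm"}


def is_auxiliary(record):
    """record is a (timestamp, kind, payload) tuple."""
    return record[1] in AUXILIARY_KINDS


def route(records):
    """Split records into (auxiliary, optional) lists, preserving order."""
    auxiliary, optional = [], []
    for record in records:
        (auxiliary if is_auxiliary(record) else optional).append(record)
    return auxiliary, optional


def merge(auxiliary, *sample_dumps):
    """Combine the auxiliary stream and any number of sample dumps into
    one list ordered by timestamp. Each input must already be sorted."""
    return list(heapq.merge(auxiliary, *sample_dumps, key=lambda r: r[0]))
```

Here route() stands in for the decision of what goes to the ring buffer versus
perf.data.auxiliary, and merge() for building the all-in-one perf.data.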


[Conclusion]

In short, we propose a mechanism to make perf's tracing more refined and
efficient. We regard the size of perf.data and the cost of writing it as
expensive resources, which should be spent carefully and only around the
exception. The mechanism can be used in both daemon and non-daemon mode. It can
also serve as another way to filter tracing events, in addition to eBPF.

Thanks,
------
Yunlong Song