From: Yunlong Song
Date: Mon, 21 Sep 2015 21:54:39 +0800
Subject: [RFC] Perf: Trigger and dump sample info to perf.data from user space ring buffer
Message-ID: <56000C1F.9060709@huawei.com>
In-Reply-To: <1427809596-29559-9-git-send-email-yunlong.song@huawei.com>

[Problem Background]

We want to run perf in daemon mode and collect traces when an exception (e.g., a machine crash or a drop in application performance) occurs. Perf may have to run for a long time (days, weeks, or even months), because we do not know when the exception will occur, only that it will occur at some point (especially for a beta product). If we simply use "perf record" as usual, two problems arise as time goes by:

1 a large amount of I/O is generated to write perf.data, which may hurt performance considerably;
2 perf.data keeps growing.
Although eBPF can reduce the number of traces in the normal case, in our case perf runs in daemon mode for a long time, so traces still accumulate as time goes by.

[One Solution]

In fact, we only need the sample info generated during a short period just before the exception occurs. We do not care about sample info from other times. So perhaps we should change the way perf produces its perf.data as follows:

1 Let perf allocate a user space ring buffer of a reasonable size, big enough to store all the tracing info we care about (for a while) before the exception occurs;
2 Dump the sample info to the user space ring buffer. The size of the ring buffer is constant, so newer sample info replaces older sample info;
3 After some kind of trigger (maybe an eBPF event, a signal, or socket communication) caused by the exception, the user space ring buffer dumps all its tracing info to perf.data.sample.TIME#

[Use Style]

We can add an option (such as "-M size" or "--memory size") to define the size of the user space ring buffer and activate the user space ring buffer mode described above. For convenience, we can add --daemon to make perf run as a daemon.

# perf record -M size -e bpf.o -e cycles -g -F 100 -a sleep 1000000

Or

# perf record -M size -e bpf.o -e cycles -g -F 100 -a --daemon

When the exception occurs, a signal is sent to perf (an eBPF event or socket communication may also be used):

# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1

When the 2nd exception occurs:

# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2
......
When the nth exception occurs:

# kill -SIGUSR1 1234
# ls
perf.data.auxiliary perf.data.sample.TIME1 perf.data.sample.TIME2 ... perf.data.sample.TIMEn

We can use perf report or perf script to analyze each perf.data.sample.TIME#

Or finally, we can kill perf and combine perf.data.auxiliary with all the perf.data.sample.TIME# to create an all-in-one perf.data

# kill -SIGUSR2 1234
# ls
perf.data

[To Do]

If the idea above is OK, we want to implement it in the following steps:

1 Develop perf's user space ring buffer, in which newer sample info replaces older sample info.
2 Classify the tracing info into two kinds. The first kind is plain sample events, of which we only need those generated (for a while) just before the exception occurs; we can call this Optional tracing info, and perf should dump it to the user space ring buffer. The second kind is the tracing info required to analyze the sample events, such as mmap_event, which shows the dso-related info; we can call this Auxiliary tracing info, and perf should dump it into perf.data.auxiliary or directly into perf.data as before.
3 Develop a trigger for perf, which makes perf dump its user space ring buffer to perf.data.sample.TIME#, or just append it to perf.data. The trigger may have three interfaces: eBPF event, signal, and socket communication.
4 Make perf report, perf script, etc. able to analyze perf.data.auxiliary, perf.data.sample.TIME#, or the final synthetic perf.data combined from perf.data.auxiliary and all the perf.data.sample.TIME#
5 For daemon mode, we should also let perf run in the background all the time and exit on a trigger.

[Conclusion]

In short, this mechanism makes perf's tracing more refined and efficient.
We regard the size of perf.data and the cost of writing perf.data as expensive resources, which should be used carefully and only for the exceptions we target. This mechanism can be used in both daemon and non-daemon mode. Compared to eBPF, this idea offers another way to filter tracing events.

Thanks,
------
Yunlong Song
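
For illustration only, the overwrite-and-dump behaviour from [One Solution] can be sketched in a few lines of Python. This is not perf code: SampleRing, install_trigger, demo, and the perf.data.sample.<stamp> file naming are hypothetical names chosen for the sketch, samples are plain strings rather than binary records, and a real implementation would live in perf's C sources. The sketch also assumes the buffer is left intact after a dump, which the proposal above does not specify.

```python
import collections
import os
import signal
import tempfile
import time


class SampleRing:
    """Fixed-capacity buffer: once full, each new sample evicts the oldest."""

    def __init__(self, capacity):
        # deque(maxlen=N) silently drops the oldest item when item N+1 arrives.
        self._samples = collections.deque(maxlen=capacity)

    def push(self, sample):
        self._samples.append(sample)

    def dump(self, directory, stamp=None):
        """Write the buffered samples, oldest first, to a new file.

        The file name mimics perf.data.sample.TIME# (assumed naming).
        The buffer is not cleared afterwards (an assumption of this sketch).
        """
        stamp = int(time.time()) if stamp is None else stamp
        path = os.path.join(directory, "perf.data.sample.%d" % stamp)
        with open(path, "w") as out:
            for sample in self._samples:
                out.write(sample + "\n")
        return path


def install_trigger(ring, directory):
    """Dump the ring whenever the process receives SIGUSR1 (Unix only).

    Must be called from the main thread, as Python requires for
    signal.signal().
    """
    def on_sigusr1(signum, frame):
        ring.dump(directory)

    signal.signal(signal.SIGUSR1, on_sigusr1)


def demo():
    """Push 10 samples into a 3-slot ring and dump it: only the last 3 remain."""
    ring = SampleRing(capacity=3)
    for i in range(10):
        ring.push("sample-%d" % i)
    with tempfile.TemporaryDirectory() as tmp:
        path = ring.dump(tmp, stamp=1)
        with open(path) as dumped:
            return dumped.read().split()
```

Calling demo() returns the three newest samples, which is the property the user space ring buffer needs: memory use and dump size stay bounded no matter how long the recorder runs. With install_trigger() in place, "kill -SIGUSR1 <pid>" would play the role of the trigger described above.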