From: Chris Mason
Date: Thu, 21 Jul 2016 09:54:53 -0400
Subject: Re: [Ksummit-discuss] [TECH TOPIC] Kernel tracing and end-to-end performance breakdown

On 07/21/2016 06:00 AM, Jan Kara wrote:
>
> So I think improvements in performance analysis are always welcome, but the
> current proposal seems somewhat handwavy, so I'm not sure what outcome
> you'd like to get from the discussion... If you have a more concrete
> proposal for how you'd like to achieve what you need, then it may be worth
> discussing.
>
> As a side note, I know that Google (and maybe Facebook, not sure here) have
> out-of-tree patches which provide really neat performance analysis
> capabilities. I have heard they are not really upstreamable because they
> are horrible hacks, but maybe they can be a good inspiration for this work.
> If we could get someone from these companies to explain what capabilities
> they have and how they achieve this (regardless of how hacky the
> implementation may be), that may be an interesting topic.

At least for Facebook, we're moving most things to BPF. The most
interesting part of our analysis isn't so much the tool used to record
it; it's being able to aggregate over the fleet and make comparisons at
scale. For example, Josef set up the off-CPU flame graphs so that we can
record stack traces for any latency higher than N, and then sum up the
most expensive stack traces over a large number of machines. It makes it
much easier to find those happens-once-a-day problems.

-chris
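
A rough sketch of that aggregation step (this is not the actual tooling;
it assumes each host dumps one line per recorded off-CPU event in the
folded flame-graph format "frame1;frame2;... <latency_us>", and the
threshold, file names, and script name below are made up):

#!/usr/bin/env python3
# merge_offcpu.py -- hypothetical sketch: sum per-host off-CPU stack samples
# and print the most expensive stacks across the whole fleet.
#
# Assumed input: one file per host, one line per recorded event, in the
# folded flame-graph format "frame1;frame2;... <latency_us>".  The recording
# side (e.g. a BPF off-CPU profiler) is expected to have already applied the
# "latency higher than N" cutoff; the threshold here is only a second guard.
import sys
from collections import Counter
from pathlib import Path

LATENCY_THRESHOLD_US = 10_000   # assumed cutoff; the "N" in the description
TOP_N = 20                      # how many aggregate stacks to report

def merge_folded(paths):
    """Sum folded off-CPU samples from many hosts into one Counter."""
    totals = Counter()
    for path in paths:
        for line in Path(path).read_text().splitlines():
            stack, _, value = line.rpartition(" ")
            if not stack or not value.isdigit():
                continue                      # skip malformed lines
            latency_us = int(value)
            if latency_us >= LATENCY_THRESHOLD_US:
                totals[stack] += latency_us   # total off-CPU time per stack
    return totals

if __name__ == "__main__":
    # Usage: merge_offcpu.py host1.folded host2.folded ...
    totals = merge_folded(sys.argv[1:])
    for stack, total_us in totals.most_common(TOP_N):
        print(f"{total_us:>12} us  {stack}")

The merged counter can also be re-emitted one "stack total" per line and
fed to flamegraph.pl to get a fleet-wide off-CPU flame graph.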