From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5A90FA3728 for ; Wed, 16 Oct 2019 10:51:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7C8442168B for ; Wed, 16 Oct 2019 10:51:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729365AbfJPKvL (ORCPT ); Wed, 16 Oct 2019 06:51:11 -0400 Received: from mga04.intel.com ([192.55.52.120]:13732 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726697AbfJPKvL (ORCPT ); Wed, 16 Oct 2019 06:51:11 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Oct 2019 03:51:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.67,303,1566889200"; d="scan'208";a="189643009" Received: from yjin15-mobl.ccr.corp.intel.com (HELO [10.254.210.124]) ([10.254.210.124]) by orsmga008.jf.intel.com with ESMTP; 16 Oct 2019 03:51:08 -0700 Subject: Re: [PATCH v2 3/5] perf report: Sort by sampled cycles percent per block for stdio To: Jiri Olsa Cc: acme@kernel.org, jolsa@kernel.org, peterz@infradead.org, mingo@redhat.com, alexander.shishkin@linux.intel.com, Linux-kernel@vger.kernel.org, ak@linux.intel.com, kan.liang@intel.com, yao.jin@intel.com References: <20191015053350.13909-1-yao.jin@linux.intel.com> <20191015053350.13909-4-yao.jin@linux.intel.com> <20191015084102.GA10951@krava> <6882f3ae-0f8d-5a01-7fd5-5b9f9c93f9ac@linux.intel.com> <20191016101543.GC15580@krava> From: "Jin, Yao" Message-ID: <456b8e97-dc50-449c-9999-0bddef0e9c4c@linux.intel.com> Date: Wed, 16 Oct 2019 18:51:07 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: <20191016101543.GC15580@krava> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/16/2019 6:15 PM, Jiri Olsa wrote: > On Tue, Oct 15, 2019 at 10:53:18PM +0800, Jin, Yao wrote: > > SNIP > >>>> +static struct block_header_column{ >>>> + const char *name; >>>> + int width; >>>> +} block_columns[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = { >>>> + [PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_COV] = { >>>> + .name = "Sampled Cycles%", >>>> + .width = 15, >>>> + }, >>>> + [PERF_HPP_REPORT__BLOCK_LBR_CYCLES] = { >>>> + .name = "Sampled Cycles", >>>> + .width = 14, >>>> + }, >>>> + [PERF_HPP_REPORT__BLOCK_CYCLES_PCT] = { >>>> + .name = "Avg Cycles%", >>>> + .width = 11, >>>> + }, >>>> + [PERF_HPP_REPORT__BLOCK_AVG_CYCLES] = { >>>> + .name = "Avg Cycles", >>>> + .width = 10, >>>> + }, >>>> + [PERF_HPP_REPORT__BLOCK_RANGE] = { >>>> + .name = "[Program Block Range]", >>>> + .width = 70, >>>> + }, >>>> + [PERF_HPP_REPORT__BLOCK_DSO] = { >>>> + .name = "Shared Object", >>>> + .width = 20, >>>> + } >>>> }; >>> >>> so we already have support for multiple columns, >>> why don't you add those as 'struct sort_entry' objects? >>> >> >> For 'struct sort_entry' objects, do you mean I should reuse the "sort_dso" >> which has been implemented yet in util/sort.c? >> >> For other columns, it looks we can't reuse the existing sort_entry objects. > > I did not mean reuse, just add new sort entries > to current sort framework > Does it seem like what the c2c does? For example, my new update is like: struct block_dimension { const char *header; int idx; int width; struct sort_entry *se; int64_t (*cmp)(struct perf_hpp_fmt *, struct hist_entry *, struct hist_entry *); int64_t (*sort)(struct perf_hpp_fmt *, struct hist_entry *, struct hist_entry *); int (*entry)(struct perf_hpp_fmt *, struct perf_hpp *, struct hist_entry *); }; struct block_fmt { struct perf_hpp_fmt fmt; struct report *rep; struct block_dimension *dim; }; static struct block_dimension dim_total_cycles_pct { .header = "Sampled Cycles%", .idx = PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_PCT, .width = 15, .cmp = block_info__cmp; .sort = block_cycles_cov_sort; .entry = block_cycles_cov_entry; }; ...... Is above new update correct? >> >>> SNIP >>> >>>> +{ >>>> + struct block_hist *bh = &rep->block_hist; >>>> + >>>> + get_block_hists(hists, bh, rep); >>>> + symbol_conf.report_individual_block = true; >>>> + hists__fprintf(&bh->block_hists, true, 0, 0, 0, >>>> + stdout, true); >>>> + hists__delete_entries(&bh->block_hists); >>>> + return 0; >>>> +} >>>> + >>>> static int perf_evlist__tty_browse_hists(struct evlist *evlist, >>>> struct report *rep, >>>> const char *help) >>>> @@ -500,6 +900,12 @@ static int perf_evlist__tty_browse_hists(struct evlist *evlist, >>>> continue; >>>> hists__fprintf_nr_sample_events(hists, rep, evname, stdout); >>>> + >>>> + if (rep->total_cycles) { >>>> + hists__fprintf_all_blocks(hists, rep); >>> >>> so this call kicks all the block info setup/count/print, right? >>> >> >> Yes, all in this call. >> >>> I thingk it shouldn't be in the output code, but in the code before.. >>> from what I see you could count block_info counts during the sample >>> processing, no? >>> >> >> In sample processing, we just get all symbols and account the cycles per >> symbol. We need to create/count the block_info at some points after the >> sample processing. > > understand, but it needs to be outside display function > OK, I will move the code for block_info collection outside of display function. > also, can't you gather the block_info data gradually > during the sample processing? > Looks we have to gather the block_info after the sample processing. That's because in each sample processing, we will call hist__account_cycles(). Then finally __symbol__account_cycles() gets called for accounting one basic block in this symbol. ch[offset].num_aggr++; ch[offset].cycles_aggr += cycles; So actually, after the sample processing, we will get the num_aggr and cycles_aggr for this basic block and compute the average cycles for this block (cycles_aggr / num_aggr). That's why I think we should gather the block_info after the sample processing. Thanks Jin Yao > jirka > >> >> Maybe it's not very good to put block info setup/count/print in a call, but >> it's really not easy to process the block_info during the sample processing. >> >> Thanks >> Jin Yao >> >>> jirka >>>