From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 28 Apr 2022 23:15:42 +0300
Subject: Re: [PATCH v3 4/5] perf evlist: Respect all_cpus when setting user_requested_cpus
To: Ian Rogers
Cc: Stephane Eranian, Peter Zijlstra, Ingo Molnar,
 Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa,
 Namhyung Kim, Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
 John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
 Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
 John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
 Riccardo Mancini, Andi Kleen, Alexey Bayduraev, Alexander Antonov,
 linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
 coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org,
 netdev@vger.kernel.org, bpf@vger.kernel.org
References: <20220408035616.1356953-1-irogers@google.com>
 <20220408035616.1356953-5-irogers@google.com>
From: Adrian Hunter
Organization: Intel Finland Oy, Registered Address: PL 281, 00181 Helsinki,
 Business Identity Code: 0357606 - 4, Domiciled in Helsinki
In-Reply-To: <20220408035616.1356953-5-irogers@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 8/04/22 06:56, Ian Rogers wrote:
> If all_cpus is calculated it represents the merge/union of all
> evsel cpu maps. By default user_requested_cpus is computed to be
> the online CPUs. For uncore events, it is often the case currently
> that all_cpus is a subset of user_requested_cpus. Metrics printed
> without aggregation and with metric-only, in print_no_aggr_metric,
> iterate over user_requested_cpus assuming every CPU has a metric to
> print.
> For each CPU the prefix is printed, but then if the
> evsel's cpus doesn't contain anything you get an empty line like
> the following on a 2 socket 36 core SkylakeX:
>
> ```
> $ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000
> 1.000453137 CPU0 0.00
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137 CPU18 0.00
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 2.003717143 CPU0 0.00
> ...
> ```
>
> While it is possible to be lazier in printing the prefix and
> trailing newline, having user_requested_cpus not be a subset of
> all_cpus is preferential so that wasted work isn't done elsewhere
> user_requested_cpus is used. The change modifies user_requested_cpus
> to be the intersection of user specified CPUs, or default all online
> CPUs, with the CPUs computed through the merge of all evsel cpu maps.
>
> New behavior:
> ```
> $ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000
> 1.001086325 CPU0 0.00
> 1.001086325 CPU18 0.00
> 2.003671291 CPU0 0.00
> 2.003671291 CPU18 0.00
> ...
> ```
>
> Signed-off-by: Ian Rogers
> ---
>  tools/perf/util/evlist.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 52ea004ba01e..196d57b905a0 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1036,6 +1036,13 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
>  	if (!cpus)
>  		goto out_delete_threads;
>
> +	if (evlist->core.all_cpus) {
> +		struct perf_cpu_map *tmp;
> +
> +		tmp = perf_cpu_map__intersect(cpus, evlist->core.all_cpus);

Isn't an uncore PMU that is represented as being on CPU0 actually
collecting data that can be due to any CPU? Or, for an uncore PMU
represented as being on CPU0-CPU4 on a 4 core 8 hyperthread processor,
isn't that actually 1 PMU per core? So I am not sure intersection
makes sense.

Also it is not obvious what happens with hybrid CPUs or per thread
recording.

> +		perf_cpu_map__put(cpus);
> +		cpus = tmp;
> +	}
>  	evlist->core.has_user_cpus = !!target->cpu_list && !target->hybrid;
>
>  	perf_evlist__set_maps(&evlist->core, cpus, threads);
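
For readers following the thread, the intersection described in the
commit message can be sketched as a two-pointer walk over two sorted
arrays of CPU ids. This is only an illustration under stated
assumptions: cpu_ids_intersect() is a hypothetical helper, not
libperf's perf_cpu_map__intersect(), and it assumes both inputs are
sorted and free of duplicates.

```c
#include <stddef.h>

/*
 * Hypothetical helper for illustration only (not the libperf API).
 * Intersects two sorted, duplicate-free arrays of CPU ids.
 * Writes the common ids to out, which must have room for
 * min(na, nb) entries, and returns how many ids were written.
 */
static size_t cpu_ids_intersect(const int *a, size_t na,
				const int *b, size_t nb, int *out)
{
	size_t i = 0, j = 0, n = 0;

	while (i < na && j < nb) {
		if (a[i] < b[j]) {
			i++;		/* a[i] cannot appear in b */
		} else if (a[i] > b[j]) {
			j++;		/* b[j] cannot appear in a */
		} else {
			out[n++] = a[i];	/* id present in both maps */
			i++;
			j++;
		}
	}
	return n;
}
```

With the requested CPUs being 0-35 and an evsel map of {0, 18}, as in
the SkylakeX example, such a walk keeps only CPU0 and CPU18. That is
the effect questioned in the reply: the other CPUs drop out of
user_requested_cpus even though the uncore counter shown on CPU0 may
be counting activity caused by them.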