From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 7 May 2020 22:36:15 -0700
Message-Id: <20200508053629.210324-1-irogers@google.com>
Subject: [RFC PATCH v3 00/14] Share events between metrics
From: Ian Rogers
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Namhyung Kim, Alexei Starovoitov,
 Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
 Andrii Nakryiko, John Fastabend, KP Singh, Kajol Jain, Andi Kleen,
 John Garry, Jin Yao, Kan Liang, Cong Wang, Kim Phillips,
 linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
 linux-perf-users@vger.kernel.org, Vince Weaver, Stephane Eranian,
 Ian Rogers
X-Mailing-List: linux-kernel@vger.kernel.org

Metric groups contain metrics.
Metrics create groups of events that should ideally be scheduled
together. Metrics often refer to the same events, for example, a cache
hit rate and a cache miss rate. Using separate event groups means these
metrics are multiplexed at different times and the counts don't sum to
100%. More multiplexing also decreases the accuracy of the measurement.

This change orders the metrics from metric groups or the command line
so that the ones with the most events are set up first. Later metrics
then check whether already-created groups provide their events, and
reuse them if possible. Unnecessary events and groups are eliminated.

The --metric-no-group option is added so that a metric's events aren't
placed in groups. This affects multiplexing and may increase sharing.
The --metric-no-merge option is added to preserve the existing grouping
behavior.

RFC because:
 - without this change the events within a metric may get scheduled
   together; after it they may appear as part of a larger group and be
   multiplexed at different times, lowering accuracy. However, less
   multiplexing may compensate for this.
 - libbpf's hashmap is used, but libbpf is an optional requirement for
   building perf.
 - other things I'm not thinking of.

Thanks!
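The setup-largest-first and reuse idea can be sketched roughly as
follows. This is not the tools/perf/util/metricgroup.c implementation,
just a hypothetical Python illustration (the names `assign_groups` and
`demo` are invented for the example): metrics are visited in decreasing
order of event count, and a metric whose events are all provided by an
existing group reuses that group instead of creating a new one.

```python
def assign_groups(metrics):
    """Sketch of group sharing. metrics: dict of metric name -> set of
    event names. Returns the list of event groups actually created."""
    groups = []  # each entry is the event set of one created group
    # Set up metrics with the most events first, so later (smaller)
    # metrics have a chance to find their events already provided.
    for name, events in sorted(metrics.items(),
                               key=lambda kv: len(kv[1]), reverse=True):
        for group in groups:
            if events <= group:
                break  # all events already provided; reuse this group
        else:
            groups.append(set(events))  # no match: create a new group
    return groups

# Hypothetical cache example: the hit-rate metric's events are a subset
# of the miss-rate metric's events, so only one group is created and
# both metrics read the same counters, multiplexed at the same time.
demo = {
    "miss_rate": {"cache_misses", "cache_refs", "cycles"},
    "hit_rate": {"cache_misses", "cache_refs"},
}
print(len(assign_groups(demo)))  # 1
```

With separate groups the two metrics would be multiplexed at different
times; sharing one group keeps their counts consistent with each other.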
Example on Sandybridge:

$ perf stat -a --metric-no-merge -M TopDownL1_SMT sleep 1

 Performance counter stats for 'system wide':

      14931177   cpu_clk_unhalted.one_thread_active #     0.47 Backend_Bound_SMT     (12.45%)
      32314653   int_misc.recovery_cycles_any                                        (16.23%)
     555020905   uops_issued.any                                                     (18.85%)
    1038651176   idq_uops_not_delivered.core                                         (24.95%)
      43003170   cpu_clk_unhalted.ref_xclk                                           (25.20%)
    1154926272   cpu_clk_unhalted.thread                                             (31.50%)
     656873544   uops_retired.retire_slots                                           (31.11%)
      16491988   cpu_clk_unhalted.one_thread_active #     0.06 Bad_Speculation_SMT   (31.10%)
      32064061   int_misc.recovery_cycles_any                                        (31.04%)
     648394934   uops_issued.any                                                     (31.14%)
      42107506   cpu_clk_unhalted.ref_xclk                                           (24.94%)
    1124565282   cpu_clk_unhalted.thread                                             (31.14%)
     523430886   uops_retired.retire_slots                                           (31.05%)
      12328380   cpu_clk_unhalted.one_thread_active #     0.35 Frontend_Bound_SMT    (10.08%)
      42651836   cpu_clk_unhalted.ref_xclk                                           (10.08%)
    1006287722   idq_uops_not_delivered.core                                         (10.08%)
    1130593027   cpu_clk_unhalted.thread                                             (10.08%)
      14209258   cpu_clk_unhalted.one_thread_active #     0.18 Retiring_SMT          (6.39%)
      41904474   cpu_clk_unhalted.ref_xclk                                           (6.39%)
     522251584   uops_retired.retire_slots                                           (6.39%)
    1111257754   cpu_clk_unhalted.thread                                             (6.39%)
      12930094   cpu_clk_unhalted.one_thread_active # 2865823806.05 SLOTS_SMT        (11.06%)
      40975376   cpu_clk_unhalted.ref_xclk                                           (11.06%)
    1089204936   cpu_clk_unhalted.thread                                             (11.06%)

   1.002165509 seconds time elapsed

$ perf stat -a -M TopDownL1_SMT sleep 1

 Performance counter stats for 'system wide':

      11893411   cpu_clk_unhalted.one_thread_active # 2715516883.49 SLOTS_SMT
                                                    #     0.19 Retiring_SMT
                                                    #     0.33 Frontend_Bound_SMT
                                                    #     0.04 Bad_Speculation_SMT
                                                    #     0.44 Backend_Bound_SMT     (71.46%)
      28458253   int_misc.recovery_cycles_any                                        (71.44%)
     562710994   uops_issued.any                                                     (71.42%)
     907105260   idq_uops_not_delivered.core                                         (57.12%)
      39797715   cpu_clk_unhalted.ref_xclk                                           (57.12%)
    1045357060   cpu_clk_unhalted.thread                                             (71.41%)
     504809283   uops_retired.retire_slots                                           (71.44%)

   1.001939294 seconds time elapsed

Note that without merging the metrics sum to 1.06, but with
merging the sum is 1.

Example on Cascadelake:

$ perf stat -a --metric-no-merge -M TopDownL1_SMT sleep 1

 Performance counter stats for 'system wide':

      13678949   cpu_clk_unhalted.one_thread_active #     0.59 Backend_Bound_SMT     (13.35%)
     121286613   int_misc.recovery_cycles_any                                        (18.58%)
    4041490966   uops_issued.any                                                     (18.81%)
    2665605457   idq_uops_not_delivered.core                                         (24.81%)
     111757608   cpu_clk_unhalted.ref_xclk                                           (25.03%)
    7579026491   cpu_clk_unhalted.thread                                             (31.27%)
    3848429110   uops_retired.retire_slots                                           (31.23%)
      15554046   cpu_clk_unhalted.one_thread_active #     0.02 Bad_Speculation_SMT   (31.19%)
     119582342   int_misc.recovery_cycles_any                                        (31.16%)
    3813943706   uops_issued.any                                                     (31.14%)
     113151605   cpu_clk_unhalted.ref_xclk                                           (24.89%)
    7621196102   cpu_clk_unhalted.thread                                             (31.12%)
    3735690253   uops_retired.retire_slots                                           (31.12%)
      13727352   cpu_clk_unhalted.one_thread_active #     0.16 Frontend_Bound_SMT    (12.50%)
     115441454   cpu_clk_unhalted.ref_xclk                                           (12.50%)
    2824946246   idq_uops_not_delivered.core                                         (12.50%)
    7817227775   cpu_clk_unhalted.thread                                             (12.50%)
      13267908   cpu_clk_unhalted.one_thread_active #     0.21 Retiring_SMT          (6.31%)
     114015605   cpu_clk_unhalted.ref_xclk                                           (6.31%)
    3722498773   uops_retired.retire_slots                                           (6.31%)
    7771438396   cpu_clk_unhalted.thread                                             (6.31%)
      14948307   cpu_clk_unhalted.one_thread_active # 18085611559.36 SLOTS_SMT       (6.30%)
     115632797   cpu_clk_unhalted.ref_xclk                                           (6.30%)
    8007628156   cpu_clk_unhalted.thread                                             (6.30%)

   1.006256703 seconds time elapsed

$ perf stat -a -M TopDownL1_SMT sleep 1

 Performance counter stats for 'system wide':

      35999534   cpu_clk_unhalted.one_thread_active # 25969550384.66 SLOTS_SMT
                                                    #     0.40 Retiring_SMT
                                                    #     0.14 Frontend_Bound_SMT
                                                    #     0.02 Bad_Speculation_SMT
                                                    #     0.44 Backend_Bound_SMT     (71.35%)
     133499018   int_misc.recovery_cycles_any                                        (71.36%)
   10736468874   uops_issued.any                                                     (71.40%)
    3518076530   idq_uops_not_delivered.core                                         (57.24%)
      78296616   cpu_clk_unhalted.ref_xclk                                           (57.25%)
    8894997400   cpu_clk_unhalted.thread                                             (71.50%)
   10409738753   uops_retired.retire_slots                                           (71.40%)

   1.011611791 seconds time elapsed

Note that without
merging the metrics sum to 0.98, but with merging the sum is 1.

v3 is a rebase following the merging of patches in v2. It also adds the
--metric-no-group and --metric-no-merge flags.

v2 was the entire patch set based on acme's perf/core tree and included
a cherry-pick. Patch 13 was sent for review to the bpf maintainers here:
https://lore.kernel.org/lkml/20200506205257.8964-2-irogers@google.com/

v1 was based on the perf metrics fixes and test sent here:
https://lore.kernel.org/lkml/20200501173333.227162-1-irogers@google.com/

Andrii Nakryiko (1):
  libbpf: Fix memory leak and possible double-free in hashmap__clear

Ian Rogers (13):
  perf parse-events: expand add PMU error/verbose messages
  perf test: improve pmu event metric testing
  lib/bpf hashmap: increase portability
  perf expr: fix memory leaks in bison
  perf evsel: fix 2 memory leaks
  perf expr: migrate expr ids table to libbpf's hashmap
  perf metricgroup: change evlist_used to a bitmap
  perf metricgroup: free metric_events on error
  perf metricgroup: always place duration_time last
  perf metricgroup: delay events string creation
  perf metricgroup: order event groups by size
  perf metricgroup: remove duped metric group events
  perf metricgroup: add options to not group or merge

 tools/lib/bpf/hashmap.c                |   7 +
 tools/lib/bpf/hashmap.h                |   3 +-
 tools/perf/Documentation/perf-stat.txt |  19 ++
 tools/perf/arch/x86/util/intel-pt.c    |  32 +--
 tools/perf/builtin-stat.c              |  11 +-
 tools/perf/tests/builtin-test.c        |   5 +
 tools/perf/tests/expr.c                |  41 ++--
 tools/perf/tests/pmu-events.c          | 159 +++++++++++++-
 tools/perf/tests/pmu.c                 |   4 +-
 tools/perf/tests/tests.h               |   2 +
 tools/perf/util/evsel.c                |   2 +
 tools/perf/util/expr.c                 | 129 +++++++-----
 tools/perf/util/expr.h                 |  22 +-
 tools/perf/util/expr.y                 |  25 +--
 tools/perf/util/metricgroup.c          | 277 ++++++++++++++++---------
 tools/perf/util/metricgroup.h          |   6 +-
 tools/perf/util/parse-events.c         |  29 ++-
 tools/perf/util/pmu.c                  |  33 +--
 tools/perf/util/pmu.h                  |   2 +-
 tools/perf/util/stat-shadow.c          |  49 +++--
 tools/perf/util/stat.h                 |   2 +
 21 files changed, 592 insertions(+), 267 deletions(-)

-- 
2.26.2.645.ge9eca65c58-goog