From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752295AbcD2VK3 (ORCPT );
        Fri, 29 Apr 2016 17:10:29 -0400
Received: from mail-vk0-f53.google.com ([209.85.213.53]:34312 "EHLO
        mail-vk0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750875AbcD2VK1 (ORCPT );
        Fri, 29 Apr 2016 17:10:27 -0400
MIME-Version: 1.0
In-Reply-To:
References: <1461905018-86355-1-git-send-email-davidcc@google.com>
Date: Fri, 29 Apr 2016 14:10:25 -0700
Message-ID:
Subject: Re: [PATCH 00/32] 2nd Iteration of Cache QoS Monitoring support.
From: David Carrillo-Cisneros
To: Vikas Shivappa
Cc: Peter Zijlstra, Alexander Shishkin, Arnaldo Carvalho de Melo,
        Ingo Molnar, Tony Luck, Stephane Eranian, Paul Turner,
        x86@kernel.org, linux-kernel@vger.kernel.org
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

peterz/queue perf/core

On Fri, Apr 29, 2016 at 2:06 PM Vikas Shivappa wrote:
>
> On Thu, 28 Apr 2016, David Carrillo-Cisneros wrote:
>
> > This series introduces the next iteration of kernel support for the
> > Cache QoS Monitoring (CQM) technology available in Intel Xeon processors.
>
> Wondering what is the kernel version this compiles on ?
>
> Thanks,
> Vikas
>
> >
> > One of the main limitations of the previous version is the inability
> > to simultaneously monitor:
> >   1) cpu event and any other event in that cpu.
> >   2) cgroup events for cgroups in same descendancy line.
> >   3) cgroup events and any thread event of a cgroup in the same
> >      descendancy line.
> >
> > Another limitation is that monitoring for a cgroup was enabled/disabled by
> > the existence of a perf event for that cgroup. Since the event
> > llc_occupancy measures changes in occupancy rather than total occupancy,
> > in order to read meaningful llc_occupancy values, an event should be
> > enabled for a long enough period of time.
> > The overhead in context switches caused by the perf events is undesired
> > in some sensitive scenarios.
> >
> > This series of patches addresses the shortcomings mentioned above and
> > adds some other improvements. The main changes are:
> >   - No more potential conflicts between different events. The new
> >     version builds a hierarchy of RMIDs that captures the dependency
> >     between monitored cgroups. llc_occupancy for a cgroup is the sum of
> >     the llc_occupancies for that cgroup's RMID and all other RMIDs in
> >     the cgroup's subtree (both monitored cgroups and threads).
> >
> >   - A cgroup integration that allows monitoring a cgroup without
> >     creating a perf event, decreasing the context switch overhead.
> >     Monitoring is controlled by a boolean cgroup subsystem attribute
> >     in each perf cgroup, that is:
> >
> >         echo 1 > cgroup_path/perf_event.cqm_cont_monitoring
> >
> >     starts CQM monitoring whether or not there is a perf_event
> >     attached to the cgroup. Setting the attribute to 0 makes
> >     monitoring dependent on the existence of a perf_event.
> >     A perf_event is always required in order to read llc_occupancy.
> >     This cgroup integration uses Intel's PQR code and is intended to
> >     be used by upcoming versions of Intel's CAT.
> >
> >   - A more stable rotation algorithm: The new algorithm uses SLOs that
> >     guarantee:
> >       - A minimum of enabled time for monitored cgroups and
> >         threads.
> >       - A maximum time disabled before error is introduced by
> >         reusing dirty RMIDs.
> >       - A minimum rate at which RMID recycling must progress.
> >
> >   - Reduced impact of stealing/rotation of RMIDs: The new algorithm
> >     accounts the residual occupancy held by limbo RMIDs towards the
> >     former owner of the limbo RMID, decreasing the error introduced
> >     by RMID rotation.
> >     It also allows a limbo RMID to be reused by its former owner when
> >     appropriate, decreasing the potential error of reusing dirty RMIDs
> >     and allowing progress even if most limbo RMIDs do not drop
> >     occupancy fast enough.
> >
> >   - Elimination of pmu::count: perf generic's perf_event_count()
> >     performs a quick add of atomic types. The introduction of
> >     pmu::count in the previous CQM series to read occupancy for thread
> >     events changed the behavior of perf_event_count() by performing a
> >     potentially slow IPI and write/read to MSR. It also made pmu::read
> >     behave differently depending on whether the event was a
> >     cpu/cgroup event or a thread. This patch series removes the custom
> >     pmu::count from CQM and provides a consistent behavior for all
> >     calls of perf_event_read.
> >
> >   - Added error return for pmu::read: Reads of CQM events may fail
> >     due to stealing of RMIDs, even after successfully adding an event
> >     to a PMU. This patch series expands pmu::read with an int return
> >     value and propagates the error to callers that can fail
> >     (i.e. perf_read).
> >     The ability of pmu::read to fail is consistent with the recent
> >     changes that allow perf_event_read to fail for transactional
> >     reading of event groups.
> >
> >   - Introduces the field pmu_event_flags, which contains flags set by
> >     the PMU to signal variations on the default behavior to perf's
> >     generic code. In this series, three flags are introduced:
> >       - PERF_CGROUP_NO_RECURSION: Signals generic code to add
> >         events of the cgroup ancestors of a cgroup.
> >       - PERF_INACTIVE_CPU_READ_PKG: Signals generic code that
> >         this CPU event can be read in any CPU in its event::cpu's
> >         package, even if the event is not active.
> >       - PERF_INACTIVE_EV_READ_ANY_CPU: Signals generic code that
> >         this event can be read in any CPU in any package in the
> >         system even if the event is not active.
> >     Using the above flags takes advantage of the CQM's hw ability to
> >     read llc_occupancy even when the associated perf event is not
> >     running in a CPU.
> >
> > This patch series also updates the perf tool to fix error handling and to
> > better handle the idiosyncrasies of snapshot and per-pkg events.
> >
> > David Carrillo-Cisneros (31):
> >   perf/x86/intel/cqm: temporarily remove MBM from CQM and cleanup
> >   perf/x86/intel/cqm: remove check for conflicting events
> >   perf/x86/intel/cqm: remove all code for rotation of RMIDs
> >   perf/x86/intel/cqm: make read of RMIDs per package (Temporal)
> >   perf/core: remove unused pmu->count
> >   x86/intel,cqm: add CONFIG_INTEL_RDT configuration flag and refactor
> >     PQR
> >   perf/x86/intel/cqm: separate CQM PMU's attributes from x86 PMU
> >   perf/x86/intel/cqm: prepare for next patches
> >   perf/x86/intel/cqm: add per-package RMIDs, data and locks
> >   perf/x86/intel/cqm: basic RMID hierarchy with per package rmids
> >   perf/x86/intel/cqm: (I)state and limbo prmids
> >   perf/x86/intel/cqm: add per-package RMID rotation
> >   perf/x86/intel/cqm: add polled update of RMID's llc_occupancy
> >   perf/x86/intel/cqm: add preallocation of anodes
> >   perf/core: add hooks to expose architecture specific features in
> >     perf_cgroup
> >   perf/x86/intel/cqm: add cgroup support
> >   perf/core: adding pmu::event_terminate
> >   perf/x86/intel/cqm: use pmu::event_terminate
> >   perf/core: introduce PMU event flag PERF_CGROUP_NO_RECURSION
> >   x86/intel/cqm: use PERF_CGROUP_NO_RECURSION in CQM
> >   perf/x86/intel/cqm: handle inherit event and inherit_stat flag
> >   perf/x86/intel/cqm: introduce read_subtree
> >   perf/core: introduce PERF_INACTIVE_*_READ_* flags
> >   perf/x86/intel/cqm: use PERF_INACTIVE_*_READ_* flags in CQM
> >   sched: introduce the finish_arch_pre_lock_switch() scheduler hook
> >   perf/x86/intel/cqm: integrate CQM cgroups with scheduler
> >   perf/core: add perf_event cgroup hooks for subsystem attributes
> >   perf/x86/intel/cqm: add CQM attributes to perf_event cgroup
> >   perf,perf/x86,perf/powerpc,perf/arm,perf/*: add int error return to
> >     pmu::read
> >   perf,perf/x86: add hook perf_event_arch_exec
> >   perf/stat: revamp error handling for snapshot and per_pkg events
> >
> > Stephane Eranian (1):
> >   perf/stat: fix bug in handling events in error state
> >
> >  arch/alpha/kernel/perf_event.c           |    3 +-
> >  arch/arc/kernel/perf_event.c             |    3 +-
> >  arch/arm64/include/asm/hw_breakpoint.h   |    2 +-
> >  arch/arm64/kernel/hw_breakpoint.c        |    3 +-
> >  arch/metag/kernel/perf/perf_event.c      |    5 +-
> >  arch/mips/kernel/perf_event_mipsxx.c     |    3 +-
> >  arch/powerpc/include/asm/hw_breakpoint.h |    2 +-
> >  arch/powerpc/kernel/hw_breakpoint.c      |    3 +-
> >  arch/powerpc/perf/core-book3s.c          |   11 +-
> >  arch/powerpc/perf/core-fsl-emb.c         |    5 +-
> >  arch/powerpc/perf/hv-24x7.c              |    5 +-
> >  arch/powerpc/perf/hv-gpci.c              |    3 +-
> >  arch/s390/kernel/perf_cpum_cf.c          |    5 +-
> >  arch/s390/kernel/perf_cpum_sf.c          |    3 +-
> >  arch/sh/include/asm/hw_breakpoint.h      |    2 +-
> >  arch/sh/kernel/hw_breakpoint.c           |    3 +-
> >  arch/sparc/kernel/perf_event.c           |    2 +-
> >  arch/tile/kernel/perf_event.c            |    3 +-
> >  arch/x86/Kconfig                         |    6 +
> >  arch/x86/events/amd/ibs.c                |    2 +-
> >  arch/x86/events/amd/iommu.c              |    5 +-
> >  arch/x86/events/amd/uncore.c             |    3 +-
> >  arch/x86/events/core.c                   |    3 +-
> >  arch/x86/events/intel/Makefile           |    3 +-
> >  arch/x86/events/intel/bts.c              |    3 +-
> >  arch/x86/events/intel/cqm.c              | 3847 +++++++++++++++++++++---------
> >  arch/x86/events/intel/cqm.h              |  519 ++++
> >  arch/x86/events/intel/cstate.c           |    3 +-
> >  arch/x86/events/intel/pt.c               |    3 +-
> >  arch/x86/events/intel/rapl.c             |    3 +-
> >  arch/x86/events/intel/uncore.c           |    3 +-
> >  arch/x86/events/intel/uncore.h           |    2 +-
> >  arch/x86/events/msr.c                    |    3 +-
> >  arch/x86/include/asm/hw_breakpoint.h     |    2 +-
> >  arch/x86/include/asm/perf_event.h        |   41 +
> >  arch/x86/include/asm/pqr_common.h        |   74 +
> >  arch/x86/include/asm/processor.h         |    4 +
> >  arch/x86/kernel/cpu/Makefile             |    4 +
> >  arch/x86/kernel/cpu/pqr_common.c         |   43 +
> >  arch/x86/kernel/hw_breakpoint.c          |    3 +-
> >  arch/x86/kvm/pmu.h                       |   10 +-
> >  drivers/bus/arm-cci.c                    |    3 +-
> >  drivers/bus/arm-ccn.c                    |    3 +-
> >  drivers/perf/arm_pmu.c                   |    3 +-
> >  include/linux/perf_event.h               |   91 +-
> >  kernel/events/core.c                     |  170 +-
> >  kernel/sched/core.c                      |    1 +
> >  kernel/sched/sched.h                     |    3 +
> >  kernel/trace/bpf_trace.c                 |    5 +-
> >  tools/perf/builtin-stat.c                |   43 +-
> >  tools/perf/util/counts.h                 |   19 +
> >  tools/perf/util/evsel.c                  |   44 +-
> >  tools/perf/util/evsel.h                  |    8 +-
> >  tools/perf/util/stat.c                   |   35 +-
> >  54 files changed, 3746 insertions(+), 1337 deletions(-)
> >  create mode 100644 arch/x86/events/intel/cqm.h
> >  create mode 100644 arch/x86/include/asm/pqr_common.h
> >  create mode 100644 arch/x86/kernel/cpu/pqr_common.c
> >
> > --
> > 2.8.0.rc3.226.g39d4020
> >
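
For readers following the quoted cover letter, here is a minimal sketch of
the subtree rule it describes: llc_occupancy for a cgroup is the occupancy
of that cgroup's own RMID plus the occupancy of every RMID below it in the
hierarchy. The names struct rmid_node and subtree_llc_occupancy() are
hypothetical and are not taken from the series; this is plain userspace C
meant to illustrate the arithmetic, not the kernel data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical node in an RMID hierarchy: one node per monitored
 * cgroup or thread. Not the layout used by the patch series.
 */
struct rmid_node {
	uint64_t own_occupancy;          /* occupancy read for this node's RMID */
	struct rmid_node *first_child;   /* first monitored descendant, or NULL */
	struct rmid_node *next_sibling;  /* next node under the same parent, or NULL */
};

/*
 * Sum the occupancy of a node's own RMID and of every RMID in its
 * subtree, as the cover letter describes for cgroup llc_occupancy.
 */
static uint64_t subtree_llc_occupancy(const struct rmid_node *node)
{
	const struct rmid_node *child;
	uint64_t total;

	assert(node != NULL);

	total = node->own_occupancy;
	for (child = node->first_child; child != NULL; child = child->next_sibling)
		total += subtree_llc_occupancy(child);

	return total;
}
```

With this shape, reading a parent cgroup never needs a second RMID for the
same task: a monitored child contributes to its ancestors only through the
sum, which is one way to see why the previous restriction on monitoring
cgroups in the same descendancy line can be dropped.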