From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1761021AbdEVSPY (ORCPT ); Mon, 22 May 2017 14:15:24 -0400
Received: from mail-it0-f41.google.com ([209.85.214.41]:33502 "EHLO
        mail-it0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1759936AbdEVSPU (ORCPT );
        Mon, 22 May 2017 14:15:20 -0400
MIME-Version: 1.0
In-Reply-To: <20170522083008.fuofuwlgq6muomjn@hirez.programming.kicks-ass.net>
References: <1495213582-3635-1-git-send-email-kan.liang@intel.com>
        <20170522083008.fuofuwlgq6muomjn@hirez.programming.kicks-ass.net>
From: Stephane Eranian
Date: Mon, 22 May 2017 11:15:18 -0700
Message-ID:
Subject: Re: [PATCH 1/2] perf/x86/intel: enable CPU ref_cycles for GP counter
To: Peter Zijlstra
Cc: "Liang, Kan" , "mingo@redhat.com" , LKML , Alexander Shishkin ,
        Arnaldo Carvalho de Melo , Jiri Olsa , Linus Torvalds ,
        Thomas Gleixner , Vince Weaver , "ak@linux.intel.com"
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On Mon, May 22, 2017 at 1:30 AM, Peter Zijlstra wrote:
> On Fri, May 19, 2017 at 10:06:21AM -0700, kan.liang@intel.com wrote:
>> From: Kan Liang
>>
>> The CPU ref_cycles can only be used by one user at the same time,
>> otherwise a "not counted" error will be displaced.
>> [kan]$ sudo perf stat -x, -e ref-cycles,ref-cycles -- sleep 1
>> 1203264,,ref-cycles,513112,100.00,,,,
>> ,,ref-cycles,0,0.00,,,,
>>
>> CPU ref_cycles can only be counted by fixed counter 2. It uses
>> pseudo-encoding. The GP counter doesn't recognize.
>>
>> BUS_CYCLES (0x013c) is another event which is not affected by core
>> frequency changes. It has a constant ratio with the CPU ref_cycles.
>> BUS_CYCLES could be used as an alternative event for ref_cycles on GP
>> counter.
>> A hook is implemented in x86_schedule_events.
>> If the fixed counter 2 is occupied and a GP counter is assigned,
>> BUS_CYCLES is used to replace ref_cycles. A new flag
>> PERF_X86_EVENT_REF_CYCLES_REP in hw_perf_event is introduced to
>> indicate the replacement.
>> To make the switch transparent, counting and sampling are also specially
>> handled.
>> - For counting, it multiplies the result with the constant ratio after
>> reading it.
>> - For sampling with fixed period, the BUS_CYCLES period = ref_cycles
>> period / the constant ratio.
>> - For sampling with fixed frequency, the adaptive frequency algorithm
>> will figure it out on its own. Do nothing.
>>
>> The constant ratio is model specific.
>> For the model after NEHALEM but before Skylake, the ratio is defined in
>> MSR_PLATFORM_INFO.
>> For the model after Skylake, it can be get from CPUID.15H.
>> For Knights Landing, Goldmont and later, the ratio is always 1.
>>
>> The old Silvermont/Airmont, Core2 and Atom machines are not covered by
>> the patch. The behavior on those machines will not change.
>
> Maybe I missed it, but *why* are we doing this?

Yes, I would like to understand the motivation for this added
complexity as well.

My guess is that you have a situation where ref-cycles is used
constantly, i.e., pinned, and therefore you lose the ability to count
it for any other user. This is the case when you switch the hard lockup
detector (NMI watchdog) to using ref-cycles instead of core cycles.
This is what you are doing in patch 2/2 actually.

Another scenario could be with virtual machines. KVM makes all guest
events pinned events on the host. So if the guest is measuring
ref-cycles, then the host cannot. Well, I am hoping this is not the
case, because as far as I remember system-wide pinned has higher
priority than per-process pinned.

You cannot make your change transparent in sampling mode. You are
adjusting the period with the ratio. If the user asks for the period to
be recorded in each sample, the modified period will be captured.
If I say I want to sample every 1M ref-cycles and I set event_attr.sample_type = PERF_SAMPLE_PERIOD, then I expect to see 1M in each sample and not some scaled value. So you need to address this problem, including in frequency mode.
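
To make the sampling concern concrete, here is a small arithmetic sketch
of the fixed-period case. It is not code from the patch: the function
names and the ratio of 24 are assumptions picked for illustration, and
the clamping to 1 is my own guard, not something the patch description
mentions.

```python
# Illustration only: hw_period() and scaled_count() are hypothetical
# helpers, and RATIO = 24 is an assumed ref_cycles/BUS_CYCLES ratio.

def hw_period(ref_period: int, ratio: int) -> int:
    """BUS_CYCLES period programmed for a requested ref-cycles period.

    Follows the patch description: BUS_CYCLES period =
    ref_cycles period / constant ratio (integer division).
    """
    return max(1, ref_period // ratio)


def scaled_count(bus_count: int, ratio: int) -> int:
    """ref-cycles value reported for a raw BUS_CYCLES count (count * ratio)."""
    return bus_count * ratio


RATIO = 24               # assumed, model specific in reality
requested = 1_000_000    # user asks for a sample every 1M ref-cycles

programmed = hw_period(requested, RATIO)        # what the GP counter gets
reconstructed = scaled_count(programmed, RATIO)  # scaling it back

if __name__ == "__main__":
    print("requested period:    ", requested)
    print("programmed period:   ", programmed)
    print("scaled back by ratio:", reconstructed)
```

With these assumed numbers the programmed period is 41666, and scaling
it back gives 999984, so neither value matches the 1000000 the user
asked for. That suggests the sample would have to carry the originally
requested period rather than anything derived from the hardware period.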