From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753295AbbCWMfX (ORCPT );
	Mon, 23 Mar 2015 08:35:23 -0400
Received: from bombadil.infradead.org ([198.137.202.9]:58687 "EHLO
	bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753121AbbCWMfS (ORCPT );
	Mon, 23 Mar 2015 08:35:18 -0400
Date: Mon, 23 Mar 2015 13:35:07 +0100
From: Peter Zijlstra 
To: Ingo Molnar 
Cc: Andi Kleen , linux-kernel@vger.kernel.org, Andi Kleen 
Subject: Re: [PATCH 3/3] perf, x86: Add INST_RETIRED.ALL workarounds
Message-ID: <20150323123507.GE23123@twins.programming.kicks-ass.net>
References: <1424225886-18652-1-git-send-email-andi@firstfloor.org>
 <1424225886-18652-3-git-send-email-andi@firstfloor.org>
 <20150323093854.GA24993@gmail.com>
 <20150323101935.GD23123@twins.programming.kicks-ass.net>
 <20150323103900.GB12213@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150323103900.GB12213@gmail.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 23, 2015 at 11:39:00AM +0100, Ingo Molnar wrote:
> > http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf
> >
> > BDM11 and BDM55 (not 57) tell us that the PMU will generate crap output
> > if you don't do this. Non-fatal but gibberish.
>
> Should be part of the changelog?

Sure, lemme go make that happen.

> So I did not say rounding up, I meant this sentence:
>
> > > > +	 * [...] We combine the two to enforce
> > > > +	 * a min-period of 128.
>
> IMO ambiguously suggests that the result of the combination of the two
> is to enforce a min-period of 128. Would something like this:
>
>    We combine the two to enforce
>    a min-period of 128, rounded (down) to multiples of 64.
>    The original period is still kept by the core code and is
>    approximated in the long run via these slightly fuzzed
>    hardware-periods.

Like so then?

---
Subject: perf, x86: Add INST_RETIRED.ALL workarounds
From: Andi Kleen 
Date: Tue, 17 Feb 2015 18:18:06 -0800

On Broadwell, INST_RETIRED.ALL cannot be used with any period that does
not have the lowest 6 bits cleared, and the period must not be smaller
than 128.

These are errata BDM11 and BDM55:
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf

BDM11: When using a period < 100, we may get incorrect PEBS/PMI
interrupts and/or an invalid counter state.

BDM55: When bits 0-5 of the period are not 0, we may get redundant PEBS
records on overflow.

Add a new callback to enforce this, and set it for Broadwell.

How does this handle the case when an application requests a specific
period with some of the bottom bits set?

Short answer:

Any useful instruction sampling period needs to be 4-6 orders of
magnitude larger than 128, as a PMI every 128 instructions would
instantly overwhelm the system and be throttled. So the +-64 error
from this is really small compared to the period, much smaller than
normal system jitter.

Long answer (by Peterz):

IFF we guarantee perf_event_attr::sample_period >= 128.

Suppose we start out with sample_period=192; then we'll set period_left
to 192, we'll end up with left = 128 (we truncate the lower bits). We
get an interrupt, find that period_left = 64 (>0 so we return 0 and
don't get an overflow handler), up that to 128. Then we trigger again,
at n=256.
Then we find period_left = -64 (<=0 so we return 1 and do get an
overflow). We increment with sample_period so we get left = 128. We
fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
overflow). And on and on.

So while the individual interrupts are 'wrong' we get them with
interval=256,128 in exactly the right ratio to average out at 192. And
this works for everything >=128.

So the num_samples*fixed_period thing is still entirely correct +-127,
which is good enough I'd say, as you already have that error anyhow.

So no need to 'fix' the tools, all we need to do is refuse to create
INST_RETIRED:ALL events with sample_period < 128.

Signed-off-by: Andi Kleen 
Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/kernel/cpu/perf_event.c       |    9 +++++++++
 arch/x86/kernel/cpu/perf_event.h       |    1 +
 arch/x86/kernel/cpu/perf_event_intel.c |   27 +++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -451,6 +451,12 @@ int x86_pmu_hw_config(struct perf_event
 	if (event->attr.type == PERF_TYPE_RAW)
 		event->hw.config |= event->attr.config & X86_RAW_EVENT_MASK;
 
+	if (event->attr.sample_period && x86_pmu.limit_period) {
+		if (x86_pmu.limit_period(event, event->attr.sample_period) >
+				event->attr.sample_period)
+			return -EINVAL;
+	}
+
 	return x86_setup_perfctr(event);
 }
 
@@ -988,6 +994,9 @@ int x86_perf_event_set_period(struct per
 	if (left > x86_pmu.max_period)
 		left = x86_pmu.max_period;
 
+	if (x86_pmu.limit_period)
+		left = x86_pmu.limit_period(event, left);
+
 	per_cpu(pmc_prev_left[idx], smp_processor_id()) = left;
 
 	/*
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -451,6 +451,7 @@ struct x86_pmu {
 	struct x86_pmu_quirk *quirks;
 	int		perfctr_second_write;
 	bool		late_ack;
+	unsigned	(*limit_period)(struct perf_event *event, unsigned l);
 
 	/*
 	 * sysfs attrs
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2097,6 +2097,32 @@ hsw_get_event_constraints(struct cpu_hw_
 	return c;
 }
 
+/*
+ * Broadwell:
+ *
+ * The INST_RETIRED.ALL period always needs to have lowest 6 bits cleared
+ * (BDM55) and it must not use a period smaller than 100 (BDM11). We combine
+ * the two to enforce a minimum period of 128 (the smallest value that has bits
+ * 0-5 cleared and >= 100).
+ *
+ * Because of how the code in x86_perf_event_set_period() works, the truncation
+ * of the lower 6 bits is 'harmless' as we'll occasionally add a longer period
+ * to make up for the 'lost' events due to carrying the 'error' in period_left.
+ *
+ * Therefore the effective (average) period matches the requested period,
+ * despite coarser hardware granularity.
+ */
+static unsigned bdw_limit_period(struct perf_event *event, unsigned left)
+{
+	if ((event->hw.config & INTEL_ARCH_EVENT_MASK) ==
+			X86_CONFIG(.event=0xc0, .umask=0x01)) {
+		if (left < 128)
+			left = 128;
+		left &= ~0x3fu;
+	}
+	return left;
+}
+
 PMU_FORMAT_ATTR(event,	"config:0-7"	);
 PMU_FORMAT_ATTR(umask,	"config:8-15"	);
 PMU_FORMAT_ATTR(edge,	"config:18"	);
@@ -2775,6 +2801,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.hw_config = hsw_hw_config;
 		x86_pmu.get_event_constraints = hsw_get_event_constraints;
 		x86_pmu.cpu_events = hsw_events_attrs;
+		x86_pmu.limit_period = bdw_limit_period;
 		pr_cont("Broadwell events, ");
 		break;
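
As an aside, the averaging argument from the "Long answer" above can be
replayed in user space. Below is a minimal stand-alone sketch, not part of
the patch or the thread; the file name period_sim.c, the local helper
limit_period() and the requested period of 200 are illustrative only, with
the rounding rule copied from bdw_limit_period() in the patch. A period of
200 is used (rather than the 192 in the example) so that the 6-bit
truncation is actually visible.

/*
 * period_sim.c -- user-space sketch (illustrative; not kernel code) of the
 * period_left bookkeeping described in the long answer above.
 *
 * Build: gcc -O2 -o period_sim period_sim.c
 */
#include <stdio.h>

/* Same rounding rule as bdw_limit_period(): at least 128, low 6 bits clear. */
static unsigned int limit_period(unsigned int left)
{
	if (left < 128)
		left = 128;
	return left & ~0x3fu;
}

int main(void)
{
	const long sample_period = 200;	/* requested period (illustrative) */
	long period_left = sample_period;
	long retired = 0;		/* total instructions retired */
	long samples = 0;		/* overflows actually reported */

	for (int pmi = 0; pmi < 1000; pmi++) {
		/* Program the counter with the fuzzed hardware period. */
		unsigned int hw_period = limit_period(period_left);

		/* ... hw_period instructions retire, then the PMI fires. */
		retired += hw_period;
		period_left -= hw_period;

		if (period_left <= 0) {
			/* Real overflow: emit a sample and re-arm. */
			samples++;
			period_left += sample_period;
			if (samples <= 8)
				printf("sample %2ld at n=%5ld, period_left=%ld\n",
				       samples, retired, period_left);
		}
	}

	/* samples * sample_period tracks 'retired' to within +-127. */
	printf("%ld samples over %ld instructions, average interval %ld\n",
	       samples, retired, retired / samples);
	return 0;
}

The individual sample intervals are all multiples of 64 (128, 256 and 320
in this run), yet the average reported at the end comes out at the
requested 200, which is the num_samples*fixed_period argument from the
changelog in action.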