From: "Liang, Kan"
To: Stephane Eranian
CC: Peter Zijlstra, "mingo@redhat.com", LKML, Alexander Shishkin,
	"ak@linux.intel.com", "Odzioba, Lukasz"
Subject: RE: [PATCH] perf/x86: fix event counter update issue
Date: Mon, 28 Nov 2016 20:23:19 +0000
Message-ID: <37D7C6CF3E00A74B8858931C1DB2F07750CA3A32@SHSMSX103.ccr.corp.intel.com>
References: <1480361206-1702-1-git-send-email-kan.liang@intel.com>
	<37D7C6CF3E00A74B8858931C1DB2F07750CA39C9@SHSMSX103.ccr.corp.intel.com>

> >> > Case B: In PMI. new > prev. delta is positive.
> >> > That's the failure case of Test 3.
> >> > The PMI is triggered by overflow, but there is latency between the
> >> > overflow and the PMI handler, so new holds only a small value.
> >> > The current calculation loses the max count value.
> >> >
> >> > Case C: In PMI. new < prev. delta is negative.
> >> > The PMU counter may start from a big value, e.g. when the fixed
> >> > period is small.
> >> > It can be fixed by adding the max count value.
> >> >
> >> I am not sure I understand why PMI should matter here. What matters
> >> is prev vs. current and the wrap-around.
> >> Please explain.
> >> Thanks.
> >
> > Right, PMI shouldn't matter here.
> > We should add the max count value if delta is negative, no matter
> > whether it's in PMI.
> >
> That sounds right, but then why do you have that boolean (bool pmi) in
> your patch?

For case B. It needs to add the max count value if new > prev in PMI.
(A standalone sketch of the arithmetic is appended at the end of this
mail.)

Thanks,
Kan

> > To fix this issue, I once had a list which included all the possible
> > cases. "non-PMI, new < prev, delta is negative" is one of them, but
> > it rarely happens.
> > So I removed it, but forgot to modify the description of Case C.
> >
> > Thanks,
> > Kan
> >
> >>
> >> >
> >> > There is still a case in which delta could be wrong.
> >> > That is when the event update happens in the gap between the
> >> > overflow and the PMI: not in PMI, new > prev, and delta is positive.
> >> > The current calculation may lose the max count value.
> >> > But this case rarely happens, so it is not handled here.
> >> >
> >> > Reported-by: Lukasz Odzioba
> >> > Signed-off-by: Kan Liang
> >> > Tested-by: Lukasz Odzioba
> >> > ---
> >> >  arch/x86/events/core.c       | 23 +++++++++++++++++++----
> >> >  arch/x86/events/intel/core.c |  4 ++--
> >> >  arch/x86/events/intel/p4.c   |  2 +-
> >> >  arch/x86/events/perf_event.h |  2 +-
> >> >  4 files changed, 23 insertions(+), 8 deletions(-)
> >> >
> >> > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> >> > index 6c3b0ef..c351ac4 100644
> >> > --- a/arch/x86/events/core.c
> >> > +++ b/arch/x86/events/core.c
> >> > @@ -63,7 +63,7 @@ u64 __read_mostly hw_cache_extra_regs
> >> >   * Can only be executed on the CPU where the event is active.
> >> >   * Returns the delta events processed.
> >> >   */
> >> > -u64 x86_perf_event_update(struct perf_event *event)
> >> > +u64 x86_perf_event_update(struct perf_event *event, bool pmi)
> >> >  {
> >> >  	struct hw_perf_event *hwc = &event->hw;
> >> >  	int shift = 64 - x86_pmu.cntval_bits;
> >> > @@ -100,6 +100,21 @@ u64 x86_perf_event_update(struct perf_event *event)
> >> >  	delta = (new_raw_count << shift) - (prev_raw_count << shift);
> >> >  	delta >>= shift;
> >> >
> >> > +	/*
> >> > +	 * Correct the delta for special cases caused by late PMI handling.
> >> > +	 * Case1: The PMU counter may start from a big value.
> >> > +	 *	  In PMI, new < prev. delta is negative.
> >> > +	 * Case2: new is big enough to impact the sign flag.
> >> > +	 *	  The delta will be negative after the arithmetic right shift.
> >> > +	 * Case3: In PMI, new > prev.
> >> > +	 *	  new - prev loses the max count value.
> >> > +	 *
> >> > +	 * The event may also be updated in the gap between overflow
> >> > +	 * and PMI, but that rarely happens and is not handled here.
> >> > +	 */
> >> > +	if (((delta > 0) && pmi) || (delta < 0))
> >> > +		delta += x86_pmu.cntval_mask + 1;
> >> > +
> >> >  	local64_add(delta, &event->count);
> >> >  	local64_sub(delta, &hwc->period_left);
> >> >
> >> > @@ -1342,7 +1357,7 @@ void x86_pmu_stop(struct perf_event *event, int flags)
> >> >  		 * Drain the remaining delta count out of a event
> >> >  		 * that we are disabling:
> >> >  		 */
> >> > -		x86_perf_event_update(event);
> >> > +		x86_perf_event_update(event, (flags == 0));
> >> >  		hwc->state |= PERF_HES_UPTODATE;
> >> >  	}
> >> >  }
> >> > @@ -1446,7 +1461,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
> >> >
> >> >  		event = cpuc->events[idx];
> >> >
> >> > -		val = x86_perf_event_update(event);
> >> > +		val = x86_perf_event_update(event, true);
> >> >  		if (val & (1ULL << (x86_pmu.cntval_bits - 1)))
> >> >  			continue;
> >> >
> >> > @@ -1867,7 +1882,7 @@ early_initcall(init_hw_perf_events);
> >> >
> >> >  static inline void x86_pmu_read(struct perf_event *event)
> >> >  {
> >> > -	x86_perf_event_update(event);
> >> > +	x86_perf_event_update(event, false);
> >> >  }
> >> >
> >> >  /*
> >> > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> >> > index a74a2db..69d65e6 100644
> >> > --- a/arch/x86/events/intel/core.c
> >> > +++ b/arch/x86/events/intel/core.c
> >> > @@ -1830,7 +1830,7 @@ static void intel_pmu_nhm_workaround(void)
> >> >  	for (i = 0; i < 4; i++) {
> >> >  		event = cpuc->events[i];
> >> >  		if (event)
> >> > -			x86_perf_event_update(event);
> >> > +			x86_perf_event_update(event, false);
> >> >  	}
> >> >
> >> >  	for (i = 0; i < 4; i++) {
> >> > @@ -2002,7 +2002,7 @@ static void intel_pmu_add_event(struct perf_event *event)
> >> >   */
> >> >  int intel_pmu_save_and_restart(struct perf_event *event)
> >> >  {
> >> > -	x86_perf_event_update(event);
> >> > +	x86_perf_event_update(event, true);
> >> >  	/*
> >> >  	 * For a checkpointed counter always reset back to 0. This
> >> >  	 * avoids a situation where the counter overflows, aborts the
> >> >
> >> > diff --git a/arch/x86/events/intel/p4.c b/arch/x86/events/intel/p4.c
> >> > index eb05335..8117de8 100644
> >> > --- a/arch/x86/events/intel/p4.c
> >> > +++ b/arch/x86/events/intel/p4.c
> >> > @@ -1024,7 +1024,7 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
> >> >  		/* it might be unflagged overflow */
> >> >  		overflow = p4_pmu_clear_cccr_ovf(hwc);
> >> >
> >> > -		val = x86_perf_event_update(event);
> >> > +		val = x86_perf_event_update(event, true);
> >> >  		if (!overflow && (val & (1ULL << (x86_pmu.cntval_bits - 1))))
> >> >  			continue;
> >> >
> >> > diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> >> > index c6b25ac..09c9db2 100644
> >> > --- a/arch/x86/events/perf_event.h
> >> > +++ b/arch/x86/events/perf_event.h
> >> > @@ -712,7 +712,7 @@ extern u64 __read_mostly hw_cache_extra_regs
> >> >  			[PERF_COUNT_HW_CACHE_OP_MAX]
> >> >  			[PERF_COUNT_HW_CACHE_RESULT_MAX];
> >> >
> >> > -u64 x86_perf_event_update(struct perf_event *event);
> >> > +u64 x86_perf_event_update(struct perf_event *event, bool pmi);
> >> >
> >> >  static inline unsigned int x86_pmu_config_addr(int index)
> >> >  {
> >> > --
> >> > 2.4.3
> >> >
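
To make the arithmetic above concrete, here is a minimal standalone
sketch of the delta calculation and the proposed correction. This is
plain userspace C, not the kernel code itself: the counter width is
hard-coded to an assumed 48 bits, the counter values are made up, and
compute_delta() is just a demo name.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define CNTVAL_BITS	48	/* assumed counter width */
#define CNTVAL_MASK	((1ULL << CNTVAL_BITS) - 1)

/*
 * Mirror of the delta computation in x86_perf_event_update(),
 * including the correction proposed by the patch.
 */
static int64_t compute_delta(uint64_t prev, uint64_t new, bool pmi)
{
	int shift = 64 - CNTVAL_BITS;
	int64_t delta;

	/*
	 * Unsigned subtract of the shifted raw values, then an
	 * arithmetic right shift to sign-extend the 48-bit difference
	 * (implementation-defined in ISO C, but it is what the kernel
	 * relies on as well).
	 */
	delta = (int64_t)((new << shift) - (prev << shift));
	delta >>= shift;

	/*
	 * The proposed correction: add one full counter period (2^48)
	 * when the raw delta came out negative, or when it is positive
	 * but we are in the PMI handler (the overflow that raised the
	 * PMI is otherwise lost).
	 */
	if ((delta > 0 && pmi) || delta < 0)
		delta += CNTVAL_MASK + 1;

	return delta;
}

int main(void)
{
	/*
	 * Case B: in PMI, new > prev. The counter overflowed and, due
	 * to PMI latency, was read again at a small value. Prints
	 * 0x100000000000e; without the fix it would be just 0xe.
	 */
	printf("case B:   %#llx\n",
	       (unsigned long long)compute_delta(0x2, 0x10, true));

	/*
	 * Negative delta: the counter advanced past half its range, so
	 * new sign-extends negative while prev stays positive. Prints
	 * 0x8ffffffffff0, the true distance from prev to new.
	 */
	printf("negative: %#llx\n",
	       (unsigned long long)compute_delta(0x10, 0x900000000000ULL, false));

	return 0;
}

In both examples the corrected delta equals the true number of elapsed
events; without the correction, case B would under-count by exactly one
counter period.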