From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751617Ab2BQFYJ (ORCPT ); Fri, 17 Feb 2012 00:24:09 -0500 Received: from youngberry.canonical.com ([91.189.89.112]:55966 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750915Ab2BQFYH convert rfc822-to-8bit (ORCPT ); Fri, 17 Feb 2012 00:24:07 -0500 MIME-Version: 1.0 In-Reply-To: <20120216180841.GC31977@mudshark.cambridge.arm.com> References: <1329323900.2293.150.camel@twins> <20120216150004.GE2641@mudshark.cambridge.arm.com> <1329409183.2293.245.camel@twins> <20120216180841.GC31977@mudshark.cambridge.arm.com> Date: Fri, 17 Feb 2012 13:24:02 +0800 Message-ID: Subject: Re: oprofile and ARM A9 hardware counter From: Ming Lei To: Will Deacon Cc: Peter Zijlstra , "eranian@gmail.com" , "Shilimkar, Santosh" , David Long , "b-cousson@ti.com" , "mans@mansr.com" , linux-arm , Ingo Molnar , Linux Kernel Mailing List Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Feb 17, 2012 at 2:08 AM, Will Deacon wrote: > > The more I think about this, the more I think that the overflow parameter to > armpmu_event_update needs to go. It was introduced to prevent massive event > loss in non-sampling mode, but I think we can get around that by changing > the default sample_period to be half of the max_period, therefore giving > ourselves a much better chance of handling the interrupt before new wraps > around past prev. > > Ming Lei - can you try the following please? If it works for you, then I'll > do it properly and kill the overflow parameter altogether. Of course, it does work for the problem reported by Stephane since it changes the delta computation basically as I did, but I am afraid that it may be not good enough for the issue fixed in a737823d ("ARM: 6835/1: perf: ensure overflows aren't missed due to IRQ latency"). > > Thanks, > > Will > > git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c > index 5bb91bf..ef597a3 100644 > --- a/arch/arm/kernel/perf_event.c > +++ b/arch/arm/kernel/perf_event.c > @@ -193,13 +193,7 @@ again: >                             new_raw_count) != prev_raw_count) >                goto again; > > -       new_raw_count &= armpmu->max_period; > -       prev_raw_count &= armpmu->max_period; > - > -       if (overflow) > -               delta = armpmu->max_period - prev_raw_count + new_raw_count + 1; > -       else > -               delta = new_raw_count - prev_raw_count; > +       delta = (new_raw_count - prev_raw_count) & armpmu->max_period; > >        local64_add(delta, &event->count); >        local64_sub(delta, &hwc->period_left); > @@ -518,7 +512,7 @@ __hw_perf_event_init(struct perf_event *event) >        hwc->config_base            |= (unsigned long)mapping; > >        if (!hwc->sample_period) { > -               hwc->sample_period  = armpmu->max_period; > +               hwc->sample_period  = armpmu->max_period >> 1; If you assume that the issue addressed by a737823d can only happen in non-sample situation, Peter's idea of u32 cast is OK and maybe simpler. But I am afraid that the issue still can be triggered in sample-based situation, especially in very high frequency case: suppose the sample freq is 10000, 100us IRQ delay may trigger the issue. So we may use the overflow information to make perf more robust, IMO. thanks -- Ming Lei From mboxrd@z Thu Jan 1 00:00:00 1970 From: ming.lei@canonical.com (Ming Lei) Date: Fri, 17 Feb 2012 13:24:02 +0800 Subject: oprofile and ARM A9 hardware counter In-Reply-To: <20120216180841.GC31977@mudshark.cambridge.arm.com> References: <1329323900.2293.150.camel@twins> <20120216150004.GE2641@mudshark.cambridge.arm.com> <1329409183.2293.245.camel@twins> <20120216180841.GC31977@mudshark.cambridge.arm.com> Message-ID: To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Fri, Feb 17, 2012 at 2:08 AM, Will Deacon wrote: > > The more I think about this, the more I think that the overflow parameter to > armpmu_event_update needs to go. It was introduced to prevent massive event > loss in non-sampling mode, but I think we can get around that by changing > the default sample_period to be half of the max_period, therefore giving > ourselves a much better chance of handling the interrupt before new wraps > around past prev. > > Ming Lei - can you try the following please? If it works for you, then I'll > do it properly and kill the overflow parameter altogether. Of course, it does work for the problem reported by Stephane since it changes the delta computation basically as I did, but I am afraid that it may be not good enough for the issue fixed in a737823d ("ARM: 6835/1: perf: ensure overflows aren't missed due to IRQ latency"). > > Thanks, > > Will > > git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c > index 5bb91bf..ef597a3 100644 > --- a/arch/arm/kernel/perf_event.c > +++ b/arch/arm/kernel/perf_event.c > @@ -193,13 +193,7 @@ again: > ? ? ? ? ? ? ? ? ? ? ? ? ? ? new_raw_count) != prev_raw_count) > ? ? ? ? ? ? ? ?goto again; > > - ? ? ? new_raw_count &= armpmu->max_period; > - ? ? ? prev_raw_count &= armpmu->max_period; > - > - ? ? ? if (overflow) > - ? ? ? ? ? ? ? delta = armpmu->max_period - prev_raw_count + new_raw_count + 1; > - ? ? ? else > - ? ? ? ? ? ? ? delta = new_raw_count - prev_raw_count; > + ? ? ? delta = (new_raw_count - prev_raw_count) & armpmu->max_period; > > ? ? ? ?local64_add(delta, &event->count); > ? ? ? ?local64_sub(delta, &hwc->period_left); > @@ -518,7 +512,7 @@ __hw_perf_event_init(struct perf_event *event) > ? ? ? ?hwc->config_base ? ? ? ? ? ?|= (unsigned long)mapping; > > ? ? ? ?if (!hwc->sample_period) { > - ? ? ? ? ? ? ? hwc->sample_period ?= armpmu->max_period; > + ? ? ? ? ? ? ? hwc->sample_period ?= armpmu->max_period >> 1; If you assume that the issue addressed by a737823d can only happen in non-sample situation, Peter's idea of u32 cast is OK and maybe simpler. But I am afraid that the issue still can be triggered in sample-based situation, especially in very high frequency case: suppose the sample freq is 10000, 100us IRQ delay may trigger the issue. So we may use the overflow information to make perf more robust, IMO. thanks -- Ming Lei