From mboxrd@z Thu Jan 1 00:00:00 1970
From: stephane eranian
Subject: Re: oprofile and ARM A9 hardware counter
Date: Fri, 27 Jan 2012 16:57:25 +0100
References: <20120127121311.GB2347@mudshark.cambridge.arm.com> <20120127132826.GD2347@mudshark.cambridge.arm.com> <20120127155454.GH2347@mudshark.cambridge.arm.com>
In-Reply-To: <20120127155454.GH2347@mudshark.cambridge.arm.com>
Reply-To: eranian@gmail.com
To: Will Deacon
Cc: Ming Lei, Måns Rullgård, "Cousson, Benoit", "oprofile-list@lists.sourceforge.net", "linux-omap@vger.kernel.org", "linux-arm-kernel@lists.infradead.org", santosh.shilimkar@ti.com
List-Id: linux-omap@vger.kernel.org

On Fri, Jan 27, 2012 at 4:54 PM, Will Deacon wrote:
> On Fri, Jan 27, 2012 at 03:45:53PM +0000, stephane eranian wrote:
>> Hi,
>
> Hi Stephane,
>
>> Ok, with the one-line patch [1], this works much better now.
>> No more wrap around at 4 billion cycles.
>
> Hurrah! Thanks Mans and Ming Lei for helping with this. Unfortunately, I
> remember Santosh had objections to this patch so that needs to be resolved.
>
Yes, this needs to be resolved ASAP.

>> Sampling is okay, though I noticed it tends to not get the
>> correct number of samples for a controlled run:
>>
>> $ perf record -e cycles -c 1009213 noploop 10
>> noploop for 10 seconds
>>
>> $ perf report -D | tail -20
>> cycles stats:
>>            TOTAL events:       9938
>>             MMAP events:         13
>>             COMM events:          2
>>             EXIT events:          2
>>         THROTTLE events:         12
>>       UNTHROTTLE events:         12
>>           SAMPLE events:       9897
>>
>> Should not get throttled samples. Should get about 10k samples
>> but only seeing 9897. The max_rate limit is way higher
>> than the rate implied by the period I set (1000 samples/sec). But then,
>> throttling is broken in 3.2.0. I posted a patch to fix that
>> yesterday. I will try with my patch applied as well.
>
> Ok. Note that on ARM the PMU generates a standard IRQ (i.e. not an NMI) so
> you may miss samples if they occur during critical kernel sections (and if
> you look at a profile, spin_unlock_irqrestore will be quite high).
>
But I am only running a user space noploop. So it spends 99% in user space,
no critical section.

> A7 and A15 have the ability to filter counters based on privilege level, so
> you can get more accurate userspace counts there.

Ok, that's better. Need to update libpfm4 for A15 with priv levels then!
>
> Will
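
For reference, the sample arithmetic in the message above can be checked with a small Python sketch. It is only an illustration, not part of perf: the event counts, the period, the 10 second run time and the 1000 samples/sec rate are copied from the quoted output, while the variable names and the derived "implied clock" figure are assumptions made here.

```python
# Counts copied from the `perf report -D` output quoted in the message.
events = {
    "MMAP": 13,
    "COMM": 2,
    "EXIT": 2,
    "THROTTLE": 12,
    "UNTHROTTLE": 12,
    "SAMPLE": 9897,
}
TOTAL_REPORTED = 9938        # "TOTAL events" line in the report
PERIOD = 1009213             # value passed to `perf record -c`
RUN_SECONDS = 10             # `noploop 10`
RATE_PER_SEC = 1000          # sampling rate stated in the message

# The per-type counts should add up to the reported total.
total = sum(events.values())

# Expected number of samples for the controlled run, and the shortfall.
expected = RATE_PER_SEC * RUN_SECONDS
shortfall = expected - events["SAMPLE"]
shortfall_pct = 100.0 * shortfall / expected

# Assumption: cycle rate implied by the stated rate and period.
implied_clock_hz = RATE_PER_SEC * PERIOD

print(f"sum of event types : {total} (reported {TOTAL_REPORTED})")
print(f"expected samples   : {expected}")
print(f"missing samples    : {shortfall} ({shortfall_pct:.2f}%)")
print(f"implied clock      : {implied_clock_hz / 1e9:.3f} GHz")
```

The shortfall is measured against the round 1000 samples/sec figure from the message; if the actual CPU clock differs from the implied value, the expected count shifts accordingly.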