From mboxrd@z Thu Jan 1 00:00:00 1970
From: Will Deacon
Subject: Re: oprofile and ARM A9 hardware counter
Date: Mon, 30 Jan 2012 19:14:43 +0000
Message-ID: <20120130191443.GB27764@mudshark.cambridge.arm.com>
References: <20120130172438.GC17814@mudshark.cambridge.arm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Received: from cam-admin0.cambridge.arm.com ([217.140.96.50]:51616 "EHLO cam-admin0.cambridge.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752404Ab2A3TPC (ORCPT ); Mon, 30 Jan 2012 14:15:02 -0500
Content-Disposition: inline
In-Reply-To:
Sender: linux-omap-owner@vger.kernel.org
List-Id: linux-omap@vger.kernel.org
To: "eranian@gmail.com"
Cc: Måns Rullgård, Ming Lei, "Cousson, Benoit", "oprofile-list@lists.sourceforge.net", "linux-omap@vger.kernel.org", "linux-arm-kernel@lists.infradead.org", "santosh.shilimkar@ti.com"

On Mon, Jan 30, 2012 at 05:45:19PM +0000, stephane eranian wrote:
> There you go, no attachment, not sure the omap list
> supports this.

Cheers Stephane.

> There is something quite interesting to observe.
>
> While I run perf record -e cycles -F 100 noploop 10, I watch
> /proc/interrupts. The number of interrupts is way lower than
> expected. Therefore the number of samples is way too low:
>
> $ perf record -e cycles -F 100 noploop 10
> $ perf report -D | tail -20
> cycles stats:
>   TOTAL events: 535
>   MMAP events: 11
>   COMM events: 2
>   EXIT events: 2
>   SAMPLE events: 520
>
> The delta in /proc/interrupts on CPU1 is 520 interrupts.

Yes, that is about half of what you'd expect.
Running on my A9 platform (vexpress) I get:

$ perf record -e cycles -F 100 noploop 10
$ perf report -D | tail -20
cycles stats:
  TOTAL events: 1007
  MMAP events: 18
  COMM events: 2
  EXIT events: 2
  SAMPLE events: 985

> So looks like the frequency adjustment which is hooked off of the
> timer tick is either not called at each timer tick, the timer ticks are
> not at regular interval, or the math is wrong.

My hunch is that the interval is probably varying, but I don't know much
about OMAP4 and its clocks.

> If I go with the fixed period mode:
> $ perf stat -e cycles noploop 10
> noploop for 10 seconds
> Performance counter stats for 'noploop 10':
> 10079156960 cycles # 0.000 GHz
> 10.004547117 seconds time elapsed
>
> That means, if I want 100 samples/sec: = 10079156960/(10*100)=10079157
> $ perf record -e cycles -c 10079157 noploop 10
> $ perf report -D | tail -20
> cycles stats:
>   TOTAL events: 1003
>   MMAP events: 11
>   COMM events: 2
>   EXIT events: 2
>   THROTTLE events: 1
>   UNTHROTTLE events: 1
>   SAMPLE events: 986
>
> Now, we're getting the right answer!

Just to confirm, for me:

$ perf stat -e cycles ./noploop 10
noploop for 10 seconds

 Performance counter stats for './noploop 10':

     4001163930 cycles                    #    0.000 GHz

   10.006534024 seconds time elapsed

$ perf record -e cycles -c 4001163 ./noploop 10
$ perf report -D | tail -20
Aggregated stats:
  TOTAL events: 1020
  MMAP events: 18
  COMM events: 2
  EXIT events: 2
  SAMPLE events: 998
cycles stats:
  TOTAL events: 1020
  MMAP events: 18
  COMM events: 2
  EXIT events: 2
  SAMPLE events: 998

which is close enough :)

> We need to elucidate what's going on in perf_event_task_tick().
> I have tried with my throttling fix and it did not help. We are
> not subject to throttling with such a low rate.

Ok. I would start by looking at the clock ticks if I were you, since this
seems to be alright on my board.
Will