linux-kernel.vger.kernel.org archive mirror
* Re: 2.6.17.1 new perfmon code base, libpfm, pfmon available
@ 2006-06-26 17:33 Chuck Ebbert
  2006-06-26 19:32 ` Grant Grundler
  2006-06-26 22:37 ` Stephane Eranian
  0 siblings, 2 replies; 7+ messages in thread
From: Chuck Ebbert @ 2006-06-26 17:33 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: oprofile-list, perfmon, linux-ia64, perfctr-devel, linux-kernel

> Also a new version of pfmon, pfmon-3.2-060621, to take advantage of the update in libpfm:
> 
>       - support for 32-bit mode AMD64 processors
>       - updated event name parsing to prepare for separate
>         event unit mask management (Kevin Corry)
>       - fix the detection of unavailable PMC registers. it was causing crashes
>         when used with sampling.
> 
> Note that I have tested 32-bit compiled libpfm,pfmon running on an 64-bit AMD
> perfmon kernel. I have not tested on a 32-bit AMD linux kernel because I don't
> have such setup. I would appreciate any feedback on this.

32-bit works great.  Unfortunately, pfmon is far too limited for serious kernel
monitoring AFAICT.  E.g. you can't select edge counting instead of cycle
counting.  So you can count how many clock cycles were spent with interrupts
disabled but you can't count how many times they were disabled.  That's too bad
because using pfmon is so easy compared to writing a program.
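
To make the distinction between cycle counting and edge counting concrete, here is a minimal sketch in plain C. It models a condition such as "interrupts masked" as a per-cycle trace and is not pfmon or perfmon2 code; the struct and function names are invented for illustration. Cycle counting adds one for every cycle in which the condition holds, while edge counting adds one only when the condition goes from false to true.

```c
#include <stddef.h>

/* Hypothetical model: result of observing a per-cycle condition signal. */
struct signal_counts {
    unsigned long cycles; /* cycles during which the condition was true */
    unsigned long edges;  /* number of false -> true transitions */
};

/*
 * Count both quantities over a trace in which trace[i] != 0 means the
 * condition (e.g. "interrupts masked") held during cycle i.
 */
struct signal_counts count_signal(const unsigned char *trace, size_t n)
{
    struct signal_counts c = { 0, 0 };
    int prev = 0; /* assumption: the condition was false before the trace */

    for (size_t i = 0; i < n; i++) {
        int cur = trace[i] != 0;

        if (cur) {
            c.cycles++;        /* cycle counting: every asserted cycle */
            if (!prev)
                c.edges++;     /* edge counting: only the rising edge */
        }
        prev = cur;
    }
    return c;
}
```

For the trace 0,1,1,1,0,0,1,1,0,1 the condition holds for 6 cycles but becomes true only 3 times, which is the difference between "how long" and "how often".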

And is someone working on kernel profiling tools that use the perfmon2
infrastructure on i386?  I'd like to see kernel-based profiling that lets
you use something like the existing 'readprofile' to retrieve results.  This
would be a lot better than the current timer-based profiling.

-- 
Chuck
 "You can't read a newspaper if you can't read."  --George W. Bush


* Re: 2.6.17.1 new perfmon code base, libpfm, pfmon available
  2006-06-26 17:33 2.6.17.1 new perfmon code base, libpfm, pfmon available Chuck Ebbert
@ 2006-06-26 19:32 ` Grant Grundler
  2006-06-26 22:37 ` Stephane Eranian
  1 sibling, 0 replies; 7+ messages in thread
From: Grant Grundler @ 2006-06-26 19:32 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Stephane Eranian, oprofile-list, perfmon, linux-ia64,
	perfctr-devel, linux-kernel

On Mon, Jun 26, 2006 at 01:33:03PM -0400, Chuck Ebbert wrote:
> 32-bit works great.  Unfortunately, pfmon is far too limited for serious
> kernel monitoring AFAICT.

I think "far too limited for serious kernel monitoring" is not a fair
statement. One can do some very interesting things as I presented
two years ago at OLS:
	http://iou.parisc-linux.org/ols_2004/pfmon_for_iodorks.pdf

It's just a _very_ complex subsystem and has a steep learning curve
to do some of the more complex things that one might like.

> E.g. you can't select edge counting instead
> of cycle counting.  So you can count how many clock cycles were spent
> with interrupts disabled but you can't count how many times they were
> disabled.

At first glance, this example sounds more like a limitation of the HW
and not the SW.

> And is someone working on kernel profiling tools that use the perfmon2
> infrastructure on i386?  I'd like to see kernel-based profiling that lets
> you use something like the existing 'readprofile' to retrieve results.  This
> would be a lot better than the current timer-based profiling.

Both are useful. I wouldn't say one is necessarily better.
FWIW, the "CPU_CYCLES" counts from pfmon aren't timer based on ia64.
AFAIK, the HW counters are sampled to gather those counts.

thanks,
grant


* Re: 2.6.17.1 new perfmon code base, libpfm, pfmon available
  2006-06-26 17:33 2.6.17.1 new perfmon code base, libpfm, pfmon available Chuck Ebbert
  2006-06-26 19:32 ` Grant Grundler
@ 2006-06-26 22:37 ` Stephane Eranian
  1 sibling, 0 replies; 7+ messages in thread
From: Stephane Eranian @ 2006-06-26 22:37 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: oprofile-list, perfmon, linux-ia64, perfctr-devel, linux-kernel

Chuck,

On Mon, Jun 26, 2006 at 01:33:03PM -0400, Chuck Ebbert wrote:
> > Also a new version of pfmon, pfmon-3.2-060621, to take advantage of the update in libpfm:
> > 
> >       - support for 32-bit mode AMD64 processors
> >       - updated event name parsing to prepare for separate
> >         event unit mask management (Kevin Corry)
> >       - fix the detection of unavailable PMC registers. it was causing crashes
> >         when used with sampling.
> > 
> > Note that I have tested 32-bit compiled libpfm,pfmon running on an 64-bit AMD
> > perfmon kernel. I have not tested on a 32-bit AMD linux kernel because I don't
> > have such setup. I would appreciate any feedback on this.
> 
> 32-bit works great.  Unfortunately, pfmon is far too limited for serious kernel
> monitoring AFAICT.  E.g. you can't select edge counting instead of cycle
> counting.  So you can count how many clock cycles were spent with interrupts

I put in an option to enable this mode; run pfmon --help. I think it's called
edge-mask.

> disabled but you can't count how many times they were disabled.  That's too bad
> because using pfmon is so easy compared to writing a program.
> 
Try the option, and let me know if it does not work for you.

> And is someone working on kernel profiling tools that use the perfmon2
> infrastructure on i386?  I'd like to see kernel-based profiling that lets
> you use something like the existing 'readprofile' to retrieve results.  This
> would be a lot better than the current timer-based profiling.
> 
You can already do this on your Athlon using pfmon; you just need to enable a
different sampling module. Here is an example:

$ pfmon --smpl-module=inst-hist -ecpu_clk_unhalted -k --long-smpl-period=100000 \
     --resolve-addr --system-wide --session-timeout=10

This will sample every 100,000 cpu_clk_unhalted events, in the kernel ONLY, for 10s and print a flat
profile sorted by number of samples per instruction address. You can choose any event you want. Note
that you can also use this output format in per-thread mode.
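
As an illustration of what a flat profile sorted by samples per instruction address involves, here is a minimal sketch in plain C. It is not the inst-hist module's implementation; the names are invented for this example, and it assumes the sampled addresses have already been collected into an array.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical profile row: one code address and how often it was sampled. */
struct profile_entry {
    unsigned long addr;
    unsigned long count;
};

static int cmp_addr(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;

    return (x > y) - (x < y);
}

/* Highest count first; ties are broken by ascending address. */
static int cmp_entry_desc(const void *a, const void *b)
{
    const struct profile_entry *x = a;
    const struct profile_entry *y = b;

    if (x->count != y->count)
        return (x->count < y->count) - (x->count > y->count);
    return (x->addr > y->addr) - (x->addr < y->addr);
}

/*
 * Build a flat profile from n sampled addresses. The samples array is
 * sorted in place; out must have room for n entries. Returns the number
 * of distinct addresses written to out.
 */
size_t build_flat_profile(unsigned long *samples, size_t n,
                          struct profile_entry *out)
{
    size_t m = 0;

    if (n == 0)
        return 0;
    qsort(samples, n, sizeof samples[0], cmp_addr);
    for (size_t i = 0; i < n; i++) {
        if (m > 0 && out[m - 1].addr == samples[i]) {
            out[m - 1].count++;      /* same address as previous sample */
        } else {
            out[m].addr = samples[i];
            out[m].count = 1;
            m++;
        }
    }
    qsort(out, m, sizeof out[0], cmp_entry_desc);
    return m;
}
```

With a sampling period of 100,000, each row's count multiplied by 100,000 gives a rough estimate of how many events were attributed to that address.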

Hope this helps.
-- 
-Stephane


* Re: 2.6.17.1 new perfmon code base, libpfm, pfmon available
  2006-06-27 14:32 ` Stephane Eranian
@ 2006-06-27 16:51   ` Grant Grundler
  0 siblings, 0 replies; 7+ messages in thread
From: Grant Grundler @ 2006-06-27 16:51 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Chuck Ebbert, linux-kernel, perfctr-devel, linux-ia64, perfmon,
	oprofile-list

On Tue, Jun 27, 2006 at 07:32:04AM -0700, Stephane Eranian wrote:
...
> > 5006 hardware interrupts in 10 seconds, 16359 interrupt-disable events ==>
> > the kernel disabled interrupts 11353 times for critical sections.  To get
> > useful results it looks like booting with idle=poll and disabling cpufreq
> > is needed, though, since interrupts_masked_cycles (non-edge mode) counts
> > even when the CPU is halted:
> 
> Yes, I think you need to be careful with the idle thread, some events may or
> may not count when going low-power. I think it is best to avoid going
> low-power for measurements.

For any benchmarking that involves the IA64 idle thread, using the "nohalt"
option is strongly recommended. It makes about a 15-20% performance difference
on some interrupt-intensive benchmarks (e.g. netperf TCP_RR).

If someone has measured the delta for other architectures that
go into a "low power" state in idle thread, I'd be grateful if
they posted the results or mailed them to me.

thanks,
grant


* Re: 2.6.17.1 new perfmon code base, libpfm, pfmon available
  2006-06-27  5:57 Chuck Ebbert
@ 2006-06-27 14:32 ` Stephane Eranian
  2006-06-27 16:51   ` Grant Grundler
  0 siblings, 1 reply; 7+ messages in thread
From: Stephane Eranian @ 2006-06-27 14:32 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: linux-kernel, perfctr-devel, linux-ia64, perfmon, oprofile-list

Chuck,

On Tue, Jun 27, 2006 at 01:57:39AM -0400, Chuck Ebbert wrote:
> It works:
> 
> $ pfmon --system-wide -0 -e interrupts_masked_cycles,interrupts_taken --edge-mask 0,1 -t 10
> <session to end in 10 seconds>
> CPU0    16359 INTERRUPTS_MASKED_CYCLES
> CPU0     5006 INTERRUPTS_TAKEN
> 
> 5006 hardware interrupts in 10 seconds, 16359 interrupt-disable events ==>
> the kernel disabled interrupts 11353 times for critical sections.  To get
> useful results it looks like booting with idle=poll and disabling cpufreq
> is needed, though, since interrupts_masked_cycles (non-edge mode) counts
> even when the CPU is halted:

Yes, I think you need to be careful with the idle thread, some events may or
may not count when going low-power. I think it is best to avoid going
low-power for measurements. It is also useful for some measurements to
exclude the idle task, i.e., to get useful kernel execution. For that
you can use the --excl-idle option of pfmon.

> 
> $ pfmon --system-wide -0 -e interrupts_masked_cycles,cpu_clk_unhalted -t 10
> <session to end in 10 seconds>
> CPU0    352020255 INTERRUPTS_MASKED_CYCLES
> CPU0     65351172 CPU_CLK_UNHALTED
> 
> > > And is someone working on kernel profiling tools that use the perfmon2
> > > infrastructure on i386?  I'd like to see kernel-based profiling that lets
> > > you use something like the existing 'readprofile' to retrieve results.  This
> > > would be a lot better than the current timer-based profiling.
> > > 
> > You can do this on your athlon using pfmon already, you need to enable a
> > different sampling module. Here is an example:
> > 
> > $ pfmon --smpl-module=inst-hist -ecpu_clk_unhalted -k --long-smpl-period=100000 \
> >      --resolve-addr --system-wide --session-timeout=10
> 
> That produces no output except for column headings.  Thinking it was a problem with
> x86_64 32-bit support, I built a p6 version.  I tried both short and long
> periods on both systems with the same result:

I think this is an issue with the NMI setup. I have looked at the code and found
some problems. They will be fixed in the next patch. I suspect that if you pass
nmi_watchdog=2 on the kernel cmdline, it will work.

I have added the following 3 patches.
Thanks.

> 
> perfmon: add Pentium II support (family 6 model 3 only.)
> 
> --- 2.6.17.1-d4-pfmon.orig/arch/i386/perfmon/perfmon_p6.c
> +++ 2.6.17.1-d4-pfmon/arch/i386/perfmon/perfmon_p6.c
> @@ -76,6 +76,9 @@ static int pfm_p6_probe_pmu(void)
>  	}
>  
>  	switch(cpu_data->x86_model) {
> +		case 3:
> +			PFM_INFO("Pentium II PMU detected");
> +			break;
>  		case 7 ... 11:
>  			PFM_INFO("P6 core PMU detected");
>  			break;
> _
> 
> libpfm: Add Pentium II support (family 6 model 3 only.)
> 
> --- libpfm-3.2-060621.orig/lib/pfmlib_i386_p6.c
> +++ libpfm-3.2-060621/lib/pfmlib_i386_p6.c
> @@ -136,6 +136,7 @@ pfm_i386_p6_detect(void)
>  		return PFMLIB_ERR_NOTSUPP;
>  
>  	switch(model) {
> +		case 3: /* Pentium II */
>  		case 7: /* Pentium III Katmai */
>  		case 8: /* Pentium III Coppermine */
>  		case 9: /* Mobile Pentium III */
> _
> 
> pfmon: don't build gen_ia32 sample module if not configured.
> 
> --- pfmon-3.2-060621.orig/pfmon/pfmon_smpl.c
> +++ pfmon-3.2-060621/pfmon/pfmon_smpl.c
> @@ -61,6 +61,8 @@ static pfmon_smpl_module_t *smpl_modules
>  #endif
>  #ifdef CONFIG_PFMON_I386_P6
>  	&detailed_i386_p6_smpl_module, /* must be first for P6 */
> +#endif
> +#ifdef CONFIG_PFMON_GEN_IA32
>  	&detailed_gen_ia32_smpl_module, /* must be last for I386 */
>  #endif
>  	&inst_hist_smpl_module,		/* works for any PMU model */

-- 

-Stephane


* Re: 2.6.17.1 new perfmon code base, libpfm, pfmon available
@ 2006-06-27  5:57 Chuck Ebbert
  2006-06-27 14:32 ` Stephane Eranian
  0 siblings, 1 reply; 7+ messages in thread
From: Chuck Ebbert @ 2006-06-27  5:57 UTC (permalink / raw)
  To: eranian; +Cc: linux-kernel, perfctr-devel, linux-ia64, perfmon, oprofile-list

In-Reply-To: <20060626223716.GA16082@frankl.hpl.hp.com>

On Mon, 26 Jun 2006 15:37:17 -0700, Stephane Eranian wrote:

> > 32-bit works great.  Unfortunately, pfmon is far too limited for serious kernel
> > monitoring AFAICT.  E.g. you can't select edge counting instead of cycle
> > counting.  So you can count how many clock cycles were spent with interrupts
> 
> I put in an option to enable this mode, do pfmon --help. I think it's called
> edge-mask.

Silly me, I was reading the documentation, which doesn't cover this. :)
It works:

$ pfmon --system-wide -0 -e interrupts_masked_cycles,interrupts_taken --edge-mask 0,1 -t 10
<session to end in 10 seconds>
CPU0    16359 INTERRUPTS_MASKED_CYCLES
CPU0     5006 INTERRUPTS_TAKEN

5006 hardware interrupts in 10 seconds, 16359 interrupt-disable events ==>
the kernel disabled interrupts 11353 times for critical sections.  To get
useful results it looks like booting with idle=poll and disabling cpufreq
is needed, though, since interrupts_masked_cycles (non-edge mode) counts
even when the CPU is halted:

$ pfmon --system-wide -0 -e interrupts_masked_cycles,cpu_clk_unhalted -t 10
<session to end in 10 seconds>
CPU0    352020255 INTERRUPTS_MASKED_CYCLES
CPU0     65351172 CPU_CLK_UNHALTED
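
The arithmetic behind the two runs above can be written out as a small C sketch. The function names are invented for illustration, and the subtraction relies on the assumption made in the text that each hardware interrupt taken accounts for exactly one interrupt-masking edge.

```c
/*
 * Edge-mode run: estimate how many times software disabled interrupts,
 * assuming every hardware interrupt taken contributes one masking edge.
 */
unsigned long software_disable_events(unsigned long masked_edges,
                                      unsigned long interrupts_taken)
{
    if (masked_edges < interrupts_taken)
        return 0; /* inconsistent inputs; avoid unsigned wrap-around */
    return masked_edges - interrupts_taken;
}

/*
 * Cycle-mode run: ratio of masked cycles to unhalted cycles. A value
 * above 1.0 means the masked-cycles counter advanced during cycles that
 * the unhalted-clock counter did not see. Returns 0.0 if unhalted is 0.
 */
double masked_to_unhalted_ratio(unsigned long long masked_cycles,
                                unsigned long long unhalted_cycles)
{
    if (unhalted_cycles == 0)
        return 0.0;
    return (double)masked_cycles / (double)unhalted_cycles;
}
```

With the numbers above, software_disable_events(16359, 5006) gives 11353, and masked_to_unhalted_ratio(352020255, 65351172) is a little under 5.4, i.e. far more masked cycles than unhalted cycles in the same 10 seconds.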

> > And is someone working on kernel profiling tools that use the perfmon2
> > infrastructure on i386?  I'd like to see kernel-based profiling that lets
> > you use something like the existing 'readprofile' to retrieve results.  This
> > would be a lot better than the current timer-based profiling.
> > 
> You can do this on your athlon using pfmon already, you need to enable a
> different sampling module. Here is an example:
> 
> $ pfmon --smpl-module=inst-hist -ecpu_clk_unhalted -k --long-smpl-period=100000 \
>      --resolve-addr --system-wide --session-timeout=10

That produces no output except for column headings.  Thinking it was a problem with
x86_64 32-bit support, I built a p6 version.  I tried both short and long
periods on both systems with the same result:

$ pfmon --smpl-module=inst-hist -ecpu_clk_unhalted -k --short-smpl-period=100000 --resolve-addr --system-wide -t 10
only kernel symbols are resolved in system-wide mode
<session to end in 10 seconds>
# counts   %self    %cum code address
# counts   %self    %cum code address

And here's what it took to get everything working on Pentium II (seems OK, not
thoroughly tested):
_

perfmon: add Pentium II support (family 6 model 3 only.)

--- 2.6.17.1-d4-pfmon.orig/arch/i386/perfmon/perfmon_p6.c
+++ 2.6.17.1-d4-pfmon/arch/i386/perfmon/perfmon_p6.c
@@ -76,6 +76,9 @@ static int pfm_p6_probe_pmu(void)
 	}
 
 	switch(cpu_data->x86_model) {
+		case 3:
+			PFM_INFO("Pentium II PMU detected");
+			break;
 		case 7 ... 11:
 			PFM_INFO("P6 core PMU detected");
 			break;
_

libpfm: Add Pentium II support (family 6 model 3 only.)

--- libpfm-3.2-060621.orig/lib/pfmlib_i386_p6.c
+++ libpfm-3.2-060621/lib/pfmlib_i386_p6.c
@@ -136,6 +136,7 @@ pfm_i386_p6_detect(void)
 		return PFMLIB_ERR_NOTSUPP;
 
 	switch(model) {
+		case 3: /* Pentium II */
 		case 7: /* Pentium III Katmai */
 		case 8: /* Pentium III Coppermine */
 		case 9: /* Mobile Pentium III */
_

pfmon: don't build gen_ia32 sample module if not configured.

--- pfmon-3.2-060621.orig/pfmon/pfmon_smpl.c
+++ pfmon-3.2-060621/pfmon/pfmon_smpl.c
@@ -61,6 +61,8 @@ static pfmon_smpl_module_t *smpl_modules
 #endif
 #ifdef CONFIG_PFMON_I386_P6
 	&detailed_i386_p6_smpl_module, /* must be first for P6 */
+#endif
+#ifdef CONFIG_PFMON_GEN_IA32
 	&detailed_gen_ia32_smpl_module, /* must be last for I386 */
 #endif
 	&inst_hist_smpl_module,		/* works for any PMU model */
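
For readers who want to see the model check from the first two patches in isolation, here is a standalone sketch in portable C. It is not the kernel or libpfm code: the function names are invented, the CPUID decoding follows the usual Intel layout of the leaf-1 EAX signature, and the accepted models (3, plus 7 through 11) are taken from the kernel hunk above.

```c
/* Decode the display family from a CPUID leaf-1 EAX signature. */
unsigned int cpuid_family(unsigned int eax)
{
    unsigned int family = (eax >> 8) & 0xf;

    if (family == 0xf)
        family += (eax >> 20) & 0xff; /* add extended family */
    return family;
}

/* Decode the display model from a CPUID leaf-1 EAX signature. */
unsigned int cpuid_model(unsigned int eax)
{
    unsigned int base_family = (eax >> 8) & 0xf;
    unsigned int model = (eax >> 4) & 0xf;

    /* Intel convention: extended model applies to base family 6 or 15. */
    if (base_family == 0x6 || base_family == 0xf)
        model |= ((eax >> 16) & 0xf) << 4;
    return model;
}

/*
 * Return 1 if the family/model pair would be accepted by the patched
 * switch statement, 0 otherwise. Spelled out case by case because the
 * "7 ... 11" range syntax in the kernel hunk is a GCC extension.
 */
int p6_model_supported(unsigned int family, unsigned int model)
{
    if (family != 6)
        return 0;
    switch (model) {
    case 3:  /* Pentium II, added by the patch */
    case 7:
    case 8:
    case 9:
    case 10:
    case 11:
        return 1;
    default:
        return 0;
    }
}
```

Note that, as in the patches, only model 3 is added; other family 6 models below 7 are still rejected.
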
-- 
Chuck
 "You can't read a newspaper if you can't read."  --George W. Bush


* 2.6.17.1 new perfmon code base, libpfm, pfmon available
@ 2006-06-21 14:24 Stephane Eranian
  0 siblings, 0 replies; 7+ messages in thread
From: Stephane Eranian @ 2006-06-21 14:24 UTC (permalink / raw)
  To: perfmon; +Cc: perfctr-devel, linux-ia64, linux-kernel, oprofile-list

Hello,

I have released another version of the perfmon new code base package.
This version of the kernel patch is relative to 2.6.17.1.

The patch includes:
	- support for 32-bit mode AMD64 processors (Chuck Ebbert)
	- mini-argument buffers on stack optimization for read/write of PMU registers
	- fix user group permission checks, which were being ignored
	- fix a missing irqsave in perfmon_kapi.c

For the stack buffers there are per-arch constants that can be adjusted based
on stack size limitations. Look for PFM_ARCH_PM*_ARG.
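
The general idea of a mini-argument buffer on the stack can be sketched in userspace C as follows. This is only an illustration of the pattern, not the perfmon code: it assumes the optimization means that small requests are copied into a fixed-size stack array while larger ones fall back to a heap allocation, and MINI_ARG_WORDS is a made-up stand-in for the per-arch PFM_ARCH_PM*_ARG constants.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit; stands in for a tunable per-arch constant. */
#define MINI_ARG_WORDS 8

/*
 * Copy count values into a working buffer and sum them. Small requests
 * use the on-stack buffer; larger ones fall back to malloc(). On return,
 * *used_heap reports which path was taken. Returns 0 on success, -1 on
 * allocation failure or size overflow.
 */
int sum_register_values(const unsigned long *args, size_t count,
                        unsigned long *sum, int *used_heap)
{
    unsigned long stack_buf[MINI_ARG_WORDS];
    unsigned long *buf = stack_buf;
    unsigned long total = 0;

    *used_heap = 0;
    if (count > MINI_ARG_WORDS) {
        if (count > (size_t)-1 / sizeof(*buf))
            return -1; /* count * sizeof would overflow */
        buf = malloc(count * sizeof(*buf));
        if (buf == NULL)
            return -1;
        *used_heap = 1;
    }
    if (count > 0)
        memcpy(buf, args, count * sizeof(*buf));
    for (size_t i = 0; i < count; i++)
        total += buf[i];
    if (buf != stack_buf)
        free(buf);
    *sum = total;
    return 0;
}
```

Raising the constant avoids more allocations at the cost of a larger stack frame, which is presumably the trade-off the per-arch constants are meant to tune.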

I have also released a new libpfm, libpfm-3.2-060621, which includes:

	- support for 32-bit mode AMD64 processors
	- fix an opcode matching/range restriction limitation for Itanium2 PMC13
 	  and Montecito PMC41 registers.

This version of the library works with 2.6.17-rc6 and 2.6.17.1

Also a new version of pfmon, pfmon-3.2-060621, to take advantage of the update in libpfm:

	- support for 32-bit mode AMD64 processors
	- updated event name parsing to prepare for separate
	  event unit mask management (Kevin Corry)
	- fix the detection of unavailable PMC registers. it was causing crashes
	  when used with sampling.

Note that I have tested 32-bit compiled libpfm and pfmon running on a 64-bit AMD
perfmon kernel. I have not tested on a 32-bit AMD Linux kernel because I don't
have such a setup. I would appreciate any feedback on this.

You can grab the new packages at our web site:

	 http://perfmon2.sf.net

PS: I will post an incremental kernel patch and a diffstat on the perfmon mailing list.

-- 
-Stephane

