linux-kernel.vger.kernel.org archive mirror
* 2.6.17.1 new perfmon code base, libpfm, pfmon available
@ 2006-06-21 14:24 Stephane Eranian
  2006-06-23 21:13 ` [perfmon] " William Cohen
  0 siblings, 1 reply; 10+ messages in thread
From: Stephane Eranian @ 2006-06-21 14:24 UTC (permalink / raw)
  To: perfmon; +Cc: perfctr-devel, linux-ia64, linux-kernel, oprofile-list

Hello,

I have released another version of the perfmon new code base package.
This version of the kernel patch is relative to 2.6.17.1.

The patch includes:
	- support for 32-bit mode AMD64 processors (Chuck Ebbert)
	- mini-argument buffers on stack optimization for read/write of PMU registers
	- fix user group permission checks, which were being ignored
	- fix a missing irqsave in perfmon_kapi.c

For the stack buffers there are per-arch constants that can be adjusted based
on stack size limitations. Look for PFM_ARCH_PM*_ARG.

I have also released a new libpfm, libpfm-3.2-060621, which includes:

	- support for 32-bit mode AMD64 processors
	- fix an opcode matching/range restriction limitation for Itanium2 PMC13
 	  and Montecito PMC41 registers.

This version of the library works with 2.6.17-rc6 and 2.6.17.1

Also a new version of pfmon, pfmon-3.2-060621, to take advantage of the update in libpfm:

	- support for 32-bit mode AMD64 processors
	- updated event name parsing to prepare for separate
	  event unit mask management (Kevin Corry)
	- fix the detection of unavailable PMC registers, which was causing
	  crashes when used with sampling.

Note that I have tested 32-bit compiled libpfm and pfmon running on a 64-bit AMD
perfmon kernel. I have not tested on a 32-bit AMD Linux kernel because I don't
have such a setup. I would appreciate any feedback on this.

You can grab the new packages at our web site:

	 http://perfmon2.sf.net

PS: I will post an incremental kernel patch and a diffstat on the perfmon mailing list.

-- 
-Stephane


* Re: [perfmon] 2.6.17.1 new perfmon code base, libpfm, pfmon available
  2006-06-21 14:24 2.6.17.1 new perfmon code base, libpfm, pfmon available Stephane Eranian
@ 2006-06-23 21:13 ` William Cohen
  2006-06-23 21:23   ` [Perfctr-devel] " Stephane Eranian
  0 siblings, 1 reply; 10+ messages in thread
From: William Cohen @ 2006-06-23 21:13 UTC (permalink / raw)
  To: eranian; +Cc: perfmon, oprofile-list, linux-ia64, linux-kernel, perfctr-devel

Hi Stephane,

Some quick questions about the current perfmon code.


The athlon has very similar hw to the amd64 and there is now 32-bit
x86-64 support. Wouldn't it make sense to move perfmon_amd.c to i386
and have it work in the same way as perfmon_p4.c does currently for p4
and em64t?

Could the 32-bit and 64-bit code be combined in a manner similar to
oprofile and avoid duplication between perfmon_em64t_pebs.c and
perfmon_p4_pebs.c?  pfm_{p4|em64}_ds_area and
pfm_{p4|em64t}_pebs_sample_entry have differences due to the upgrade
from 32 to 64 bit values.

Why isn't Intel family 0xf model 3 supported?
	Models 1, 2, 4, and 5 are supported.
	The model 3 Pentium 4 isn't that different, is it?

Why was the following patch added to the code, and why are arrays sized with
this constant in sys_pfm_write_pmcs and sys_pfm_write_pmds? The p4/em64t
certainly has more registers than that.

--- linux-2.6.17.1.old/include/asm-i386/perfmon.h	2006-06-21 05:19:04.000000000 -0700
+++ linux-2.6.17.1/include/asm-i386/perfmon.h	2006-06-21 04:22:51.000000000 -0700
@@ -18,6 +18,14 @@

  #ifdef __KERNEL__

+#ifdef CONFIG_4KSTACKS
+#define PFM_ARCH_PMD_ARG	2
+#define PFM_ARCH_PMC_ARG	2
+#else
+#define PFM_ARCH_PMD_ARG	4
+#define PFM_ARCH_PMC_ARG	4
+#endif
+
  #include <asm/desc.h>
  #include <asm/apic.h>


What is the purpose of PFM_MAX_XTRA_PMCS and PFM_MAX_XTRA_PMDS? Are
they used for anything other than increasing the size of PFM_MAX_PMCS
and PFM_MAX_PMDS?


-Will


* Re: [Perfctr-devel] [perfmon] 2.6.17.1 new perfmon code base, libpfm, pfmon available
  2006-06-23 21:13 ` [perfmon] " William Cohen
@ 2006-06-23 21:23   ` Stephane Eranian
  2006-06-26 17:10     ` William Cohen
  0 siblings, 1 reply; 10+ messages in thread
From: Stephane Eranian @ 2006-06-23 21:23 UTC (permalink / raw)
  To: William Cohen
  Cc: linux-ia64, perfctr-devel, perfmon, linux-kernel, oprofile-list

Will,

On Fri, Jun 23, 2006 at 05:13:47PM -0400, William Cohen wrote:
> Hi Stephane,
> 
> Some quick questions about the current perfmon code.
> 
> 
> The athlon has very similar hw to the amd64 and there is now 32-bit
> x86-64 support. Wouldn't it make sense to move perfmon_amd.c to i386
> and have it work in the same way as perfmon_p4.c does currently for p4
> and em64t?
> 
Does Athlon have 4 counters as well? I don't have the HW so I cannot really
test. I suspect they are similar. If you have HW and you can test, I don't
have a problem.

> Could the 32-bit and 64-bit code be combined in a manner similar to
> oprofile and avoid duplication between perfmon_em64t_pebs.c and
> perfmon_p4_pebs.c?  pfm_{p4|em64}_ds_area and
> pfm_{p4|em64t}_pebs_sample_entry have differences due to the upgrade
> from 32 to 64 bit values.
> 
You have several issues here:
	- the 64-bit version has 8 more registers in the PEBS entry
	- the PEBS entry uses 32- or 64-bit fields depending on the data model
	- the ds_area uses 32 or 64 bits depending on the data model, except for the threshold value

Now remember that on EM64T we also support 32-bit (i386) binaries.
With an EM64T kernel you would have the 64-bit PEBS format. With the same UUID it would satisfy
an i386 binary, and this is wrong because the binary would not match the definition of the PEBS entry.
We need to keep the PEBS 32-bit and 64-bit format UUIDs different. At the source code level, you would
need to ifdef __x86_64__ and __i386__ to switch the struct definition and UUID. That's doable, but is
this clean?
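
To make the ifdef approach concrete, here is a minimal sketch of what a shared
header might look like. The struct names, field names, field order and the
format identifier strings are all made up for illustration; they are not the
actual perfmon definitions or UUIDs. The only facts taken from the discussion
are that the 64-bit entry carries 8 more registers and uses 64-bit fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit data model: flags, instruction pointer, 8 general registers. */
struct pebs_entry_32 {
	uint32_t eflags, eip;
	uint32_t eax, ebx, ecx, edx, esi, edi, ebp, esp;
};

/* 64-bit data model: same fields widened, plus the 8 extra registers. */
struct pebs_entry_64 {
	uint64_t rflags, rip;
	uint64_t rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp;
	uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
};

/* Placeholder identifiers, NOT the real perfmon format UUIDs. */
#define PEBS_FMT_ID_32 "pebs-fmt-32 (placeholder)"
#define PEBS_FMT_ID_64 "pebs-fmt-64 (placeholder)"

/*
 * Each binary picks the layout and the identifier that match its own
 * data model. Anything that is not x86_64 takes the 32-bit branch here
 * only so that the sketch compiles everywhere.
 */
#if defined(__x86_64__)
typedef struct pebs_entry_64 pebs_entry_t;
#define PEBS_FMT_ID PEBS_FMT_ID_64
#else
typedef struct pebs_entry_32 pebs_entry_t;
#define PEBS_FMT_ID PEBS_FMT_ID_32
#endif

/* Size of one sample entry as seen by the binary being compiled. */
size_t pebs_entry_size(void)
{
	return sizeof(pebs_entry_t);
}
```

With distinct identifiers, an i386 binary asking for the 32-bit format on an
EM64T kernel would presumably fail the lookup instead of silently parsing
64-bit entries with the wrong layout.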
	
> Why isn't Intel family 0xf model 3 supported?
> 	Models 1, 2, 4, and 5 are supported.
> 	The model 3 Pentium 4 isn't that different, is it?

I have not looked at this. I don't have a lot of P4 HW. I think that
all family 15 models use the same PMU. Could someone confirm this?


> 
> Why was the following patch added to the code, and why are arrays sized with
> this constant in sys_pfm_write_pmcs and sys_pfm_write_pmds? The p4/em64t
> certainly has more registers than that.
> 

The constants are not directly related to the number of registers. There is
an issue with stack space consumption. You need to strike the right balance
between the most common number of elements passed to read/write calls and
the stack size. On i386 (and x86_64, I think) the page size is 4kB and
the default stack is 2 pages, so you have to be careful, especially
when the call chain is very deep.
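
As a rough illustration of the trade-off, here is a user-space sketch of the
pattern. It is not the actual perfmon code: the struct, the function name and
the heap fallback for larger requests are assumptions made for the example.
Only the constant name and its value of 4 (2 with CONFIG_4KSTACKS) come from
the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Per-arch constant from the patch; 4 is the value without CONFIG_4KSTACKS. */
#define PFM_ARCH_PMD_ARG 4

/* Hypothetical request record; the real argument structure is larger. */
struct pmd_arg {
	unsigned int reg_num;
	unsigned long long reg_value;
};

/*
 * Copy `count` requests into a working buffer and sum their values
 * (a stand-in for programming the registers). Requests of up to
 * PFM_ARCH_PMD_ARG elements use the on-stack array; larger ones fall
 * back to the heap. Returns 0 on success, -1 if allocation fails.
 * *used_heap reports which path was taken.
 */
int write_pmds(const struct pmd_arg *req, size_t count,
	       unsigned long long *sum, int *used_heap)
{
	struct pmd_arg stack_buf[PFM_ARCH_PMD_ARG];
	struct pmd_arg *buf = stack_buf;
	size_t i;

	assert(req != NULL);

	*used_heap = 0;
	if (count > PFM_ARCH_PMD_ARG) {
		buf = malloc(count * sizeof(*buf));
		if (buf == NULL)
			return -1;
		*used_heap = 1;
	}

	/* In the kernel this step would be a copy_from_user(). */
	memcpy(buf, req, count * sizeof(*buf));

	*sum = 0;
	for (i = 0; i < count; i++)
		*sum += buf[i].reg_value;

	if (buf != stack_buf)
		free(buf);
	return 0;
}
```

The constant only decides how many elements are handled without an
allocation, so it can stay small even on a PMU with many registers.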

> --- linux-2.6.17.1.old/include/asm-i386/perfmon.h	2006-06-21 05:19:04.000000000 -0700
> +++ linux-2.6.17.1/include/asm-i386/perfmon.h	2006-06-21 04:22:51.000000000 -0700
> @@ -18,6 +18,14 @@
> 
>   #ifdef __KERNEL__
> 
> +#ifdef CONFIG_4KSTACKS
> +#define PFM_ARCH_PMD_ARG	2
> +#define PFM_ARCH_PMC_ARG	2
> +#else
> +#define PFM_ARCH_PMD_ARG	4
> +#define PFM_ARCH_PMC_ARG	4
> +#endif
> +
>   #include <asm/desc.h>
>   #include <asm/apic.h>
> 
> 
> What is the purpose of PFM_MAX_XTRA_PMCS and PFM_MAX_XTRA_PMDS? Are
> they used for anything other than increasing the size of PFM_MAX_PMCS
> and PFM_MAX_PMDS?
> 
> 
> -Will
> 

-- 

-Stephane


* Re: [Perfctr-devel] [perfmon] 2.6.17.1 new perfmon code base, libpfm, pfmon available
  2006-06-23 21:23   ` [Perfctr-devel] " Stephane Eranian
@ 2006-06-26 17:10     ` William Cohen
  0 siblings, 0 replies; 10+ messages in thread
From: William Cohen @ 2006-06-26 17:10 UTC (permalink / raw)
  To: eranian; +Cc: linux-ia64, perfctr-devel, perfmon, linux-kernel, oprofile-list

Stephane Eranian wrote:
> Will,
> 
> On Fri, Jun 23, 2006 at 05:13:47PM -0400, William Cohen wrote:
> 
>>Hi Stephane,
>>
>>Some quick questions about the current perfmon code.
>>
>>
>>The athlon has very similar hw to the amd64 and there is now 32-bit
>>x86-64 support. Wouldn't it make sense to move perfmon_amd.c to i386
>>and have it work in the same way as perfmon_p4.c does currently for p4
>>and em64t?
>>
> 
> Does Athlon have 4 counters as well? I don't have the HW so I cannot really
> test. I suspect they are similar. If you have HW and you can test, I don't
> have a problem.

I have an Athlon machine in the office that I can test this change out 
on and send you a diff.

>>Could the 32-bit and 64-bit code be combined in a manner similar to
>>oprofile and avoid duplication between perfmon_em64t_pebs.c and
>>perfmon_p4_pebs.c?  pfm_{p4|em64}_ds_area and
>>pfm_{p4|em64t}_pebs_sample_entry have differences due to the upgrade
>>from 32 to 64 bit values.
>>
> 
> You have several issues here:
> 	- the 64-bit version has 8 more registers in the PEBS entry
> 	- the PEBS entry uses 32- or 64-bit fields depending on the data model
> 	- the ds_area uses 32 or 64 bits depending on the data model, except for the threshold value
> 
> Now remember that on EM64T we also support 32-bit (i386) binaries.
> With an EM64T kernel you would have the 64-bit PEBS format. With the same UUID it would satisfy
> an i386 binary, and this is wrong because the binary would not match the definition of the PEBS entry.
> We need to keep the PEBS 32-bit and 64-bit format UUIDs different. At the source code level, you would
> need to ifdef __x86_64__ and __i386__ to switch the struct definition and UUID. That's doable, but is
> this clean?

Certainly, given the differences in the PEBS elements, there will need to
be unique names for each. It was just a thought to factor out the similar code.

There is support for amd64 hardware running a 32-bit kernel. Has
someone verified that the em64t processor generates 32-bit compatible
entries when running in 32-bit mode? Or does it always write out 64-bit
style PEBS entries?

>>Why isn't Intel family 0xf model 3 supported?
>>	Models 1, 2, 4, and 5 are supported.
>>	The model 3 Pentium 4 isn't that different, is it?
> 
> 
> I have not looked at this. I don't have a lot of P4 HW. I think that
> all family 15 uses the same PMU. Could someone confirm this?

An NC State University professor mentioned that the omission of model 3
was a problem because his machines were model 3. I suggested adding
a case for model 3 processors to get him going on that.

-Will


* Re: 2.6.17.1 new perfmon code base, libpfm, pfmon available
  2006-06-27 14:32 ` Stephane Eranian
@ 2006-06-27 16:51   ` Grant Grundler
  0 siblings, 0 replies; 10+ messages in thread
From: Grant Grundler @ 2006-06-27 16:51 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Chuck Ebbert, linux-kernel, perfctr-devel, linux-ia64, perfmon,
	oprofile-list

On Tue, Jun 27, 2006 at 07:32:04AM -0700, Stephane Eranian wrote:
...
> > 5006 hardware interrupts in 10 seconds, 16359 interrupt-disable events ==>
> > the kernel disabled interrupts 11353 times for critical sections.  To get
> > useful results it looks like booting with idle=poll and disabling cpufreq
> > is needed, though, since interrupts_masked_cycles (non-edge mode) counts
> > even when the CPU is halted:
> 
> Yes, I think you need to be careful with the idle thread, some events may or
> may not count when going low-power. I think it is best to avoid going
> low-power for measurements.

For any benchmarking that involves the IA64 idle thread, using the "nohalt"
option is strongly recommended. It's about a 15-20% performance difference
on some interrupt intensive benchmarks (e.g. netperf TCP_RR).

If someone has measured the delta for other architectures that
go into a "low power" state in idle thread, I'd be grateful if
they posted the results or mailed them to me.

thanks,
grant


* Re: 2.6.17.1 new perfmon code base, libpfm, pfmon available
  2006-06-27  5:57 Chuck Ebbert
@ 2006-06-27 14:32 ` Stephane Eranian
  2006-06-27 16:51   ` Grant Grundler
  0 siblings, 1 reply; 10+ messages in thread
From: Stephane Eranian @ 2006-06-27 14:32 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: linux-kernel, perfctr-devel, linux-ia64, perfmon, oprofile-list

Chuck,

On Tue, Jun 27, 2006 at 01:57:39AM -0400, Chuck Ebbert wrote:
> It works:
> 
> $ pfmon --system-wide -0 -e interrupts_masked_cycles,interrupts_taken --edge-mask 0,1 -t 10
> <session to end in 10 seconds>
> CPU0    16359 INTERRUPTS_MASKED_CYCLES
> CPU0     5006 INTERRUPTS_TAKEN
> 
> 5006 hardware interrupts in 10 seconds, 16359 interrupt-disable events ==>
> the kernel disabled interrupts 11353 times for critical sections.  To get
> useful results it looks like booting with idle=poll and disabling cpufreq
> is needed, though, since interrupts_masked_cycles (non-edge mode) counts
> even when the CPU is halted:

Yes, I think you need to be careful with the idle thread, some events may or
may not count when going low-power. I think it is best to avoid going
low-power for measurements. It is also useful for some measurements to
exclude the idle task, i.e., to get useful kernel execution. For that
you can use the --excl-idle option of pfmon.

> 
> $ pfmon --system-wide -0 -e interrupts_masked_cycles,cpu_clk_unhalted -t 10
> <session to end in 10 seconds>
> CPU0    352020255 INTERRUPTS_MASKED_CYCLES
> CPU0     65351172 CPU_CLK_UNHALTED
> 
> > > And is someone working on kernel profiling tools that use the perfmon2
> > > infrastructure on i386?  I'd like to see kernel-based profiling that lets
> > > you use something like the existing 'readprofile' to retrieve results.  This
> > > would be a lot better than the current timer-based profiling.
> > > 
> > You can do this on your athlon using pfmon already, you need to enable a
> > different sampling module. Here is an example:
> > 
> > $ pfmon --smpl-module=inst-hist -ecpu_clk_unhalted -k --long-smpl-period=100000 \
> >      --resolve-addr --system-wide --session-timeout=10
> 
> That produces no output except for column headings.  Thinking it was a problem with
> x86_64 32-bit support, I built a p6 version.  I tried both short and long
> periods on both systems with the same result:

I think this is an issue with the NMI setup. I have looked at the code and found
some problems. They will be fixed in the next patch. I suspect that if you say nmi_watchdog=2
on the kernel cmdline, it will work.

I have added the following 3 patches.
Thanks.

> 
> perfmon: add Pentium II support (family 6 model 3 only.)
> 
> --- 2.6.17.1-d4-pfmon.orig/arch/i386/perfmon/perfmon_p6.c
> +++ 2.6.17.1-d4-pfmon/arch/i386/perfmon/perfmon_p6.c
> @@ -76,6 +76,9 @@ static int pfm_p6_probe_pmu(void)
>  	}
>  
>  	switch(cpu_data->x86_model) {
> +		case 3:
> +			PFM_INFO("Pentium II PMU detected");
> +			break;
>  		case 7 ... 11:
>  			PFM_INFO("P6 core PMU detected");
>  			break;
> _
> 
> libpfm: Add Pentium II support (family 6 model 3 only.)
> 
> --- libpfm-3.2-060621.orig/lib/pfmlib_i386_p6.c
> +++ libpfm-3.2-060621/lib/pfmlib_i386_p6.c
> @@ -136,6 +136,7 @@ pfm_i386_p6_detect(void)
>  		return PFMLIB_ERR_NOTSUPP;
>  
>  	switch(model) {
> +		case 3: /* Pentium II */
>  		case 7: /* Pentium III Katmai */
>  		case 8: /* Pentium III Coppermine */
>  		case 9: /* Mobile Pentium III */
> _
> 
> pfmon: don't build gen_ia32 sample module if not configured.
> 
> --- pfmon-3.2-060621.orig/pfmon/pfmon_smpl.c
> +++ pfmon-3.2-060621/pfmon/pfmon_smpl.c
> @@ -61,6 +61,8 @@ static pfmon_smpl_module_t *smpl_modules
>  #endif
>  #ifdef CONFIG_PFMON_I386_P6
>  	&detailed_i386_p6_smpl_module, /* must be first for P6 */
> +#endif
> +#ifdef CONFIG_PFMON_GEN_IA32
>  	&detailed_gen_ia32_smpl_module, /* must be last for I386 */
>  #endif
>  	&inst_hist_smpl_module,		/* works for any PMU model */

-- 

-Stephane


* Re: 2.6.17.1 new perfmon code base, libpfm, pfmon available
@ 2006-06-27  5:57 Chuck Ebbert
  2006-06-27 14:32 ` Stephane Eranian
  0 siblings, 1 reply; 10+ messages in thread
From: Chuck Ebbert @ 2006-06-27  5:57 UTC (permalink / raw)
  To: eranian; +Cc: linux-kernel, perfctr-devel, linux-ia64, perfmon, oprofile-list

In-Reply-To: <20060626223716.GA16082@frankl.hpl.hp.com>

On Mon, 26 Jun 2006 15:37:17 -0700, Stephane Eranian wrote:

> > 32-bit works great.  Unfortunately, pfmon is far too limited for serious kernel
> > monitoring AFAICT.  E.g. you can't select edge counting instead of cycle
> > counting.  So you can count how many clock cycles were spent with interrupts
> 
> I put in an option to enable this mode, do pfmon --help. I think it's called
> edge-mask.

Silly me, I was reading the documentation, which doesn't cover this. :)
It works:

$ pfmon --system-wide -0 -e interrupts_masked_cycles,interrupts_taken --edge-mask 0,1 -t 10
<session to end in 10 seconds>
CPU0    16359 INTERRUPTS_MASKED_CYCLES
CPU0     5006 INTERRUPTS_TAKEN

5006 hardware interrupts in 10 seconds, 16359 interrupt-disable events ==>
the kernel disabled interrupts 11353 times for critical sections.  To get
useful results it looks like booting with idle=poll and disabling cpufreq
is needed, though, since interrupts_masked_cycles (non-edge mode) counts
even when the CPU is halted:

$ pfmon --system-wide -0 -e interrupts_masked_cycles,cpu_clk_unhalted -t 10
<session to end in 10 seconds>
CPU0    352020255 INTERRUPTS_MASKED_CYCLES
CPU0     65351172 CPU_CLK_UNHALTED
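
The arithmetic behind the two runs above can be written down as a small
sketch. The function names are made up for illustration. The subtraction
assumes that every interrupt taken also produces one masked-edge event, which
is how the 11353 figure is derived.

```c
#include <assert.h>

/*
 * Edge-mode count of "interrupts masked" minus interrupts actually taken
 * gives the number of times the kernel disabled interrupts on its own.
 */
unsigned long explicit_disables(unsigned long masked_edges,
				unsigned long irqs_taken)
{
	assert(masked_edges >= irqs_taken);
	return masked_edges - irqs_taken;
}

/*
 * Returns 1 when more masked cycles than unhalted cycles were counted,
 * which can only happen if the masked-cycles event keeps counting while
 * the CPU is halted.
 */
int counts_while_halted(unsigned long long masked_cycles,
			unsigned long long unhalted_cycles)
{
	return masked_cycles > unhalted_cycles;
}
```

For the numbers above, explicit_disables(16359, 5006) gives 11353, and
counts_while_halted(352020255, 65351172) returns 1.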

> > And is someone working on kernel profiling tools that use the perfmon2
> > infrastructure on i386?  I'd like to see kernel-based profiling that lets
> > you use something like the existing 'readprofile' to retrieve results.  This
> > would be a lot better than the current timer-based profiling.
> > 
> You can do this on your athlon using pfmon already, you need to enable a
> different sampling module. Here is an example:
> 
> $ pfmon --smpl-module=inst-hist -ecpu_clk_unhalted -k --long-smpl-period=100000 \
>      --resolve-addr --system-wide --session-timeout=10

That produces no output except for column headings.  Thinking it was a problem with
x86_64 32-bit support, I built a p6 version.  I tried both short and long
periods on both systems with the same result:

$ pfmon --smpl-module=inst-hist -ecpu_clk_unhalted -k --short-smpl-period=100000 --resolve-addr --system-wide -t 10
only kernel symbols are resolved in system-wide mode
<session to end in 10 seconds>
# counts   %self    %cum code address
# counts   %self    %cum code address

And here's what it took to get everything working on Pentium II (seems OK, not
thoroughly tested:)
_

perfmon: add Pentium II support (family 6 model 3 only.)

--- 2.6.17.1-d4-pfmon.orig/arch/i386/perfmon/perfmon_p6.c
+++ 2.6.17.1-d4-pfmon/arch/i386/perfmon/perfmon_p6.c
@@ -76,6 +76,9 @@ static int pfm_p6_probe_pmu(void)
 	}
 
 	switch(cpu_data->x86_model) {
+		case 3:
+			PFM_INFO("Pentium II PMU detected");
+			break;
 		case 7 ... 11:
 			PFM_INFO("P6 core PMU detected");
 			break;
_

libpfm: Add Pentium II support (family 6 model 3 only.)

--- libpfm-3.2-060621.orig/lib/pfmlib_i386_p6.c
+++ libpfm-3.2-060621/lib/pfmlib_i386_p6.c
@@ -136,6 +136,7 @@ pfm_i386_p6_detect(void)
 		return PFMLIB_ERR_NOTSUPP;
 
 	switch(model) {
+		case 3: /* Pentium II */
 		case 7: /* Pentium III Katmai */
 		case 8: /* Pentium III Coppermine */
 		case 9: /* Mobile Pentium III */
_

pfmon: don't build gen_ia32 sample module if not configured.

--- pfmon-3.2-060621.orig/pfmon/pfmon_smpl.c
+++ pfmon-3.2-060621/pfmon/pfmon_smpl.c
@@ -61,6 +61,8 @@ static pfmon_smpl_module_t *smpl_modules
 #endif
 #ifdef CONFIG_PFMON_I386_P6
 	&detailed_i386_p6_smpl_module, /* must be first for P6 */
+#endif
+#ifdef CONFIG_PFMON_GEN_IA32
 	&detailed_gen_ia32_smpl_module, /* must be last for I386 */
 #endif
 	&inst_hist_smpl_module,		/* works for any PMU model */
-- 
Chuck
 "You can't read a newspaper if you can't read."  --George W. Bush


* Re: 2.6.17.1 new perfmon code base, libpfm, pfmon available
  2006-06-26 17:33 Chuck Ebbert
  2006-06-26 19:32 ` Grant Grundler
@ 2006-06-26 22:37 ` Stephane Eranian
  1 sibling, 0 replies; 10+ messages in thread
From: Stephane Eranian @ 2006-06-26 22:37 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: oprofile-list, perfmon, linux-ia64, perfctr-devel, linux-kernel

Chuck,

On Mon, Jun 26, 2006 at 01:33:03PM -0400, Chuck Ebbert wrote:
> > Also a new version of pfmon, pfmon-3.2-060621, to take advantage of the update in libpfm:
> > 
> >       - support for 32-bit mode AMD64 processors
> >       - updated event name parsing to prepare for separate
> >         event unit mask management (Kevin Corry)
> >       - fix the detection of unavailable PMC registers. it was causing crashes
> >         when used with sampling.
> > 
> > Note that I have tested 32-bit compiled libpfm and pfmon running on a 64-bit AMD
> > perfmon kernel. I have not tested on a 32-bit AMD Linux kernel because I don't
> > have such a setup. I would appreciate any feedback on this.
> 
> 32-bit works great.  Unfortunately, pfmon is far too limited for serious kernel
> monitoring AFAICT.  E.g. you can't select edge counting instead of cycle
> counting.  So you can count how many clock cycles were spent with interrupts

I put in an option to enable this mode, do pfmon --help. I think it's called
edge-mask.

> disabled but you can't count how many times they were disabled.  That's too bad
> because using pfmon is so easy compared to writing a program.
> 
Try the option, and let me know if it does not work for you.

> And is someone working on kernel profiling tools that use the perfmon2
> infrastructure on i386?  I'd like to see kernel-based profiling that lets
> you use something like the existing 'readprofile' to retrieve results.  This
> would be a lot better than the current timer-based profiling.
> 
You can do this on your athlon using pfmon already, you need to enable a
different sampling module. Here is an example:

$ pfmon --smpl-module=inst-hist -ecpu_clk_unhalted -k --long-smpl-period=100000 \
     --resolve-addr --system-wide --session-timeout=10

This will sample (period of 100,000 cpu_clk_unhalted) in the kernel ONLY for 10s and print a flat
profile sorted by number of samples per instruction address. You can choose any event you want. Note
that you can also use this output format in per-thread mode.
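
For readers unfamiliar with flat profiles, the following sketch shows the
general idea of turning sampled instruction addresses into a table sorted by
sample count. It is not pfmon's inst-hist implementation; the function names
and the row formatting are made up, and only the column header mirrors what
pfmon prints.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct hist_entry {
	unsigned long addr;	/* sampled instruction address */
	unsigned long count;	/* number of samples at that address */
};

static int cmp_addr(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;
	return (x > y) - (x < y);
}

/* Highest count first; ties are broken by ascending address. */
static int cmp_count_desc(const void *a, const void *b)
{
	const struct hist_entry *x = a, *y = b;
	if (x->count != y->count)
		return (x->count < y->count) - (x->count > y->count);
	return (x->addr > y->addr) - (x->addr < y->addr);
}

/*
 * Collapse `n` sampled addresses into (address, count) pairs sorted by
 * descending count. `samples` is sorted in place and `out` must have
 * room for n entries. Returns the number of distinct addresses.
 */
size_t build_flat_profile(unsigned long *samples, size_t n,
			  struct hist_entry *out)
{
	size_t i, m = 0;

	qsort(samples, n, sizeof(*samples), cmp_addr);
	for (i = 0; i < n; i++) {
		if (m > 0 && out[m - 1].addr == samples[i]) {
			out[m - 1].count++;
		} else {
			out[m].addr = samples[i];
			out[m].count = 1;
			m++;
		}
	}
	qsort(out, m, sizeof(*out), cmp_count_desc);
	return m;
}

/* Print counts, share of all samples, running share, and address. */
void print_flat_profile(const struct hist_entry *h, size_t m, size_t total)
{
	double cum = 0.0;
	size_t i;

	printf("# counts   %%self    %%cum code address\n");
	for (i = 0; i < m; i++) {
		double self = 100.0 * (double)h[i].count / (double)total;
		cum += self;
		printf("%8lu %6.2f%% %6.2f%% 0x%lx\n",
		       h[i].count, self, cum, h[i].addr);
	}
}
```

Resolving each address to a kernel symbol, which --resolve-addr does, would
be a separate lookup step on top of this.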

Hope this helps.
-- 
-Stephane


* Re: 2.6.17.1 new perfmon code base, libpfm, pfmon available
  2006-06-26 17:33 Chuck Ebbert
@ 2006-06-26 19:32 ` Grant Grundler
  2006-06-26 22:37 ` Stephane Eranian
  1 sibling, 0 replies; 10+ messages in thread
From: Grant Grundler @ 2006-06-26 19:32 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Stephane Eranian, oprofile-list, perfmon, linux-ia64,
	perfctr-devel, linux-kernel

On Mon, Jun 26, 2006 at 01:33:03PM -0400, Chuck Ebbert wrote:
> 32-bit works great.  Unfortunately, pfmon is far too limited for serious
> kernel monitoring AFAICT.

I think "far too limited for serious kernel monitoring" is not a fair
statement. One can do some very interesting things as I presented
two years ago at OLS:
	http://iou.parisc-linux.org/ols_2004/pfmon_for_iodorks.pdf

It's just a _very_ complex subsystem and has a steep learning curve
to do some of the more complex things that one might like.

> E.g. you can't select edge counting instead
> of cycle counting.  So you can count how many clock cycles were spent
> with interrupts disabled but you can't count how many times they were
> disabled.

At first glance, this example sounds more like a limitation of the HW
and not the SW.

> And is someone working on kernel profiling tools that use the perfmon2
> infrastructure on i386?  I'd like to see kernel-based profiling that lets
> you use something like the existing 'readprofile' to retrieve results.  This
> would be a lot better than the current timer-based profiling.

Both are useful. I wouldn't say one is necessarily better.
FWIW, the "CPU_CYCLES" counts from pfmon aren't timer based on ia64.
AFAIK, the HW counters are sampled to gather those counts.

thanks,
grant


* Re: 2.6.17.1 new perfmon code base, libpfm, pfmon available
@ 2006-06-26 17:33 Chuck Ebbert
  2006-06-26 19:32 ` Grant Grundler
  2006-06-26 22:37 ` Stephane Eranian
  0 siblings, 2 replies; 10+ messages in thread
From: Chuck Ebbert @ 2006-06-26 17:33 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: oprofile-list, perfmon, linux-ia64, perfctr-devel, linux-kernel

> Also a new version of pfmon, pfmon-3.2-060621, to take advantage of the update in libpfm:
> 
>       - support for 32-bit mode AMD64 processors
>       - updated event name parsing to prepare for separate
>         event unit mask management (Kevin Corry)
>       - fix the detection of unavailable PMC registers. it was causing crashes
>         when used with sampling.
> 
> Note that I have tested 32-bit compiled libpfm and pfmon running on a 64-bit AMD
> perfmon kernel. I have not tested on a 32-bit AMD Linux kernel because I don't
> have such a setup. I would appreciate any feedback on this.

32-bit works great.  Unfortunately, pfmon is far too limited for serious kernel
monitoring AFAICT.  E.g. you can't select edge counting instead of cycle
counting.  So you can count how many clock cycles were spent with interrupts
disabled but you can't count how many times they were disabled.  That's too bad
because using pfmon is so easy compared to writing a program.

And is someone working on kernel profiling tools that use the perfmon2
infrastructure on i386?  I'd like to see kernel-based profiling that lets
you use something like the existing 'readprofile' to retrieve results.  This
would be a lot better than the current timer-based profiling.

-- 
Chuck
 "You can't read a newspaper if you can't read."  --George W. Bush


Thread overview: 10+ messages
2006-06-21 14:24 2.6.17.1 new perfmon code base, libpfm, pfmon available Stephane Eranian
2006-06-23 21:13 ` [perfmon] " William Cohen
2006-06-23 21:23   ` [Perfctr-devel] " Stephane Eranian
2006-06-26 17:10     ` William Cohen
2006-06-26 17:33 Chuck Ebbert
2006-06-26 19:32 ` Grant Grundler
2006-06-26 22:37 ` Stephane Eranian
2006-06-27  5:57 Chuck Ebbert
2006-06-27 14:32 ` Stephane Eranian
2006-06-27 16:51   ` Grant Grundler
