* re-enable Nehalem raw Offcore-Events support
@ 2011-04-29 15:04 Vince Weaver
  2011-04-29 15:27 ` Andi Kleen
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Vince Weaver @ 2011-04-29 15:04 UTC (permalink / raw)
  To: torvalds
  Cc: Ingo Molnar, linux-kernel, Peter Zijlstra, Stephane Eranian, Andi Kleen

Hello Linus

can you revert the commit b52c55c6a25e4515b5e075a989ff346fc251ed09

This removed functionality from perf_events that allowed raw event access 
for OFFCORE_EVENTS type events on Nehalem and Westmere cpus.
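
For readers unfamiliar with the interface under discussion, here is a minimal sketch of what a raw offcore request looks like from userspace. It assumes the kernel honors `config1` as the value for the extra offcore-response register, which is exactly the behavior disputed in this thread. Event code 0xB7 with unit mask 0x01 is OFFCORE_RESPONSE_0 as listed in the Intel manuals. The helper names `raw_event_config` and `offcore_raw_attr` are hypothetical, and the response mask is a caller-supplied placeholder, not a recommended value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Pack an x86 event-select code and unit mask into the low 16 bits
 * of a raw perf config word: event in bits 0-7, umask in bits 8-15. */
static uint64_t raw_event_config(uint8_t event_select, uint8_t umask)
{
    return (uint64_t)event_select | ((uint64_t)umask << 8);
}

/* Build (but do not open) an attr describing a raw OFFCORE_RESPONSE_0
 * event. response_mask is hypothetical input; its bit layout is
 * CPU-specific and must come from the Intel documentation. */
static struct perf_event_attr offcore_raw_attr(uint64_t response_mask)
{
    struct perf_event_attr attr;

    /* Assumption: a 16-bit response mask, as on Nehalem/Westmere. */
    assert(response_mask <= 0xFFFF);

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_RAW;
    attr.size = sizeof(attr);
    attr.config = raw_event_config(0xB7, 0x01); /* OFFCORE_RESPONSE_0 */
    attr.config1 = response_mask;               /* extra-register value */
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    return attr;
}
```

To actually count, the attr would be passed to the perf_event_open system call through syscall(2), since glibc provides no wrapper. Whether that call succeeds depends on the kernel version and the CPU.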

To be fair, this is not technically a regression as the feature was only 
(finally!) added in the 2.6.39 merge window.  However this is a useful 
feature and many tools (including the PAPI performance counter library 
that I work on) had added support for it in anticipation of the 2.6.39 
release.

Ingo's reasons for removing the feature seem to boil down to
  1.  "perf" doesn't use the functionality, and any other userspace
      programs that use the perf_events syscalls don't matter
  2.  Users are too stupid to use the raw functionality properly;
      we should only allow a kernel-developer-approved small subset
      of the features provided by the CPU as described in the Intel
      developers manuals.

#2 seems like a gross misinterpretation of the whole "Linux gives you 
enough rope to shoot yourself in the foot" policy from days past, but 
maybe things have moved on.

Thanks,

Vince
vweaver1@eecs.utk.edu

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 15:04 re-enable Nehalem raw Offcore-Events support Vince Weaver
@ 2011-04-29 15:27 ` Andi Kleen
  2011-04-29 16:49   ` Ingo Molnar
  2011-04-29 16:42 ` Ingo Molnar
  2011-04-29 17:17 ` Pekka Enberg
  2 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2011-04-29 15:27 UTC (permalink / raw)
  To: Vince Weaver
  Cc: torvalds, Ingo Molnar, linux-kernel, Peter Zijlstra, Stephane Eranian

On Fri, Apr 29, 2011 at 11:04:46AM -0400, Vince Weaver wrote:
> Hello Linus
> 
> can you revert the commit b52c55c6a25e4515b5e075a989ff346fc251ed09

Acked-by: Andi Kleen <ak@linux.intel.com>

(I wrote the original patch)

> (finally!) added in the 2.6.39 merge window.  However this is a useful 
> feature and many tools (including the PAPI performance counter library 
> that I work on) had added support for it in anticipation of the 2.6.39 

I also use some tools which benefit from this functionality. The
extended raw events are very useful to analyze NUMA problems, for one.

-Andi

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 15:04 re-enable Nehalem raw Offcore-Events support Vince Weaver
  2011-04-29 15:27 ` Andi Kleen
@ 2011-04-29 16:42 ` Ingo Molnar
  2011-04-29 18:01   ` Vince Weaver
                     ` (2 more replies)
  2011-04-29 17:17 ` Pekka Enberg
  2 siblings, 3 replies; 30+ messages in thread
From: Ingo Molnar @ 2011-04-29 16:42 UTC (permalink / raw)
  To: Vince Weaver
  Cc: torvalds, linux-kernel, Peter Zijlstra, Stephane Eranian,
	Andi Kleen, Thomas Gleixner


* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> Hello Linus
> 
> can you revert the commit b52c55c6a25e4515b5e075a989ff346fc251ed09
> 
> This removed functionality from perf_events that allowed raw event access 
> for OFFCORE_EVENTS type events on Nehalem and Westmere cpus.

I have three major objections/concerns.

Firstly, one technical problem i have with the raw events ABI method is that it 
was added in commit e994d7d23a0b ("perf: Fix LLC-* events on Intel 
Nehalem/Westmere"). The raw ABI bit was done 'under the radar', it was not the 
declared title of the commit, it was not declared in the changelog either and 
it was not my intention to offer such an ABI prematurely either - and i noticed 
those two lines too late - but still in time to not let this slip into v2.6.39.

Secondly, Peter posted a patch that might resolve this issue in v2.6.40 - but 
that patch is not cooked yet and you guys have not helped finish it. I'd like 
to see that process play out first - maybe we discover some detail that will 
force us to modify the config1/config2 ABI approach - which we cannot do if 
this is released into v2.6.39 prematurely.

Thirdly, and this is my most fundamental objection, i also object to the timing 
of this offcore raw access ABI, because past experience is that we *really* do 
not want to allow raw PMU details without *first* having generic abstractions 
and generic events.

The discussion in the "[PATCH 1/1] perf tools: Add missing user space support 
for config1/config2" thread on lkml has demonstrated it pretty well: people 
only started giving serious thought to proper structure and abstractions 
and easy tooling once they were forced to think about that ...

The thing is, as far as i can see you and Andi are *still* pushing the failed 
perfmon and Oprofile ABI and tooling models.

My job as a maintainer is to notice such things and to say 'no' to incomplete 
bits.

Basically without proper generalization people get sloppy and go the fast path 
and export very low level, opaque, unstructured PMU interfaces to user-space 
and repeat the Oprofile and perfmon tooling mistakes again and again.

 "Thinking is hard, let's go shopping^W exporting raw ABIs."

So the perf events policy has always been that while we tolerate raw events 
(there's nothing bad with offering them once generic events have crystallized 
out), we only accept them if the *useful* events are first abstracted and 
generalized out.

We put structure, proper abstractions and easy tooling *ahead* of the interests 
of a small group of people who'd rather prefer a lowlevel, opaque hardware 
channel so that they do not have to *think* about generalization and also 
perhaps so they do not have to share their selection of events and analysis 
methods with others ...

For the offcore patches this concept of 'abstraction first' has been ignored 
entirely, and commit e994d7d23a0b ("perf: Fix LLC-* events on Intel 
Nehalem/Westmere") has (without declaring it in the changelog) added a raw ABI 
hack to the offcore PMU features without bothering to factor out the useful 
events first. This slipped through and i only noticed it when Andi's patch got 
to me:

   https://lkml.org/lkml/2011/4/22/14

Generalization of offcore, NUMA memory events is very much possible and 
desirable, and Peter has posted an RFC patch that implements one form of it:

   https://lkml.org/lkml/2011/4/22/281

And with that done raw events can be offered as well.
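
As background for the "generic events" side of the argument, this sketch shows how the existing generalized cache events are encoded for `PERF_TYPE_HW_CACHE`: cache id in bits 0-7, operation in bits 8-15, result in bits 16-23, as documented for perf_event_open. The helper name `hw_cache_config` is hypothetical. The NUMA level proposed in Peter's RFC is not shown, because its encoding is not given in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/perf_event.h>

/* Encode a generic cache event for attr.type == PERF_TYPE_HW_CACHE.
 * The arguments are values from the perf_hw_cache_id,
 * perf_hw_cache_op_id and perf_hw_cache_op_result_id enums. */
static uint64_t hw_cache_config(unsigned cache, unsigned op, unsigned result)
{
    /* Reject values outside the ranges the kernel header defines. */
    assert(cache < PERF_COUNT_HW_CACHE_MAX);
    assert(op < PERF_COUNT_HW_CACHE_OP_MAX);
    assert(result < PERF_COUNT_HW_CACHE_RESULT_MAX);

    return (uint64_t)cache | ((uint64_t)op << 8) | ((uint64_t)result << 16);
}
```

With this encoding, a last-level-cache read miss is requested by name instead of by a CPU-specific hex code. The kernel maps it to whatever raw event the running CPU provides, or reports the event as unsupported.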

But it's still work in progress - it might be mergeable in v2.6.40. 
Unfortunately neither you nor Andi has actually bothered testing (and 
improving) Peter's patch. If we do the raw ABI now i fear you guys will 
disappear and won't ever bother with proper generalization.

We want generalization like Peter's patch first - that is what users really 
need in the end, and that is the price of us supporting/maintaining this PMU 
functionality in the kernel. Once we feel good about it, we can expose the raw 
bits as well.

Not the other way around.

> To be fair, this is not technically a regression as the feature was only 
> (finally!) added in the 2.6.39 merge window.  However this is a useful 
> feature and many tools (including the PAPI performance counter library that I 
> work on) had added support for it in anticipation of the 2.6.39 release.
> 
> Ingo's reasons for removing the feature seem to boil down to
>   1.  "perf" doesn't use the functionality, and any other userspace
>       programs that use the perf_events syscalls don't matter
>   2.  Users are too stupid to use the raw functionality properly;
>       we should only allow a kernel-developer-approved small subset
>       of the features provided by the CPU as described in the Intel
>       developers manuals.
>
> #2 seems like a gross misinterpretation of the whole "Linux gives you
> enough rope to shoot yourself in the foot" policy from days past, but maybe 
> things have moved on.

That is a very unfair and misleading summary that grossly misrepresents my 
position. I've made my position very clear to you, multiple times - and Peter 
and others have made their similar position on this issue clear as well.

I detailed my concerns in the commit you want reverted and i also repeated it 
in the lkml discussion, multiple times, as replies to you. You can also see it 
outlined in detail in my reply above.

In light of all that, how you could possibly misrepresent my position in such 
an unfair, distorted and manipulative way is beyond me ...

Thanks,

	Ingo

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 15:27 ` Andi Kleen
@ 2011-04-29 16:49   ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2011-04-29 16:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Vince Weaver, torvalds, linux-kernel, Peter Zijlstra, Stephane Eranian


* Andi Kleen <ak@linux.intel.com> wrote:

> On Fri, Apr 29, 2011 at 11:04:46AM -0400, Vince Weaver wrote:
> > Hello Linus
> > 
> > can you revert the commit b52c55c6a25e4515b5e075a989ff346fc251ed09
> 
> Acked-by: Andi Kleen <ak@linux.intel.com>

I outlined my objections in my reply to Vince.

> (I wrote the original patch)
> 
> > (finally!) added in the 2.6.39 merge window.  However this is a useful 
> > feature and many tools (including the PAPI performance counter library 
> > that I work on) had added support for it in anticipation of the 2.6.39 
> 
> I also use some tools which benefit from this functionality. The extended raw 
> events are very useful to analyze NUMA problems, for one.

Mind sharing those methods, and helping to generalize them and make them 
useful to non-experts? Peter's patch which adds a 'NUMA' level to the cache 
event abstractions could be a good start.

Only once generalization has been covered sufficiently, once we are sure we can 
stick with the raw ABI, can we push that upstream.

Thanks,

	Ingo

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 15:04 re-enable Nehalem raw Offcore-Events support Vince Weaver
  2011-04-29 15:27 ` Andi Kleen
  2011-04-29 16:42 ` Ingo Molnar
@ 2011-04-29 17:17 ` Pekka Enberg
  2011-04-29 17:25   ` Andi Kleen
  2 siblings, 1 reply; 30+ messages in thread
From: Pekka Enberg @ 2011-04-29 17:17 UTC (permalink / raw)
  To: Vince Weaver
  Cc: torvalds, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Stephane Eranian, Andi Kleen

Hi Vince,

On Fri, Apr 29, 2011 at 6:04 PM, Vince Weaver <vweaver1@eecs.utk.edu> wrote:
> Hello Linus
>
> can you revert the commit b52c55c6a25e4515b5e075a989ff346fc251ed09
>
> This removed functionality from perf_events that allowed raw event access
> for OFFCORE_EVENTS type events on Nehalem and Westmere cpus.
>
> To be fair, this is not technically a regression as the feature was only
> (finally!) added in the 2.6.39 merge window.  However this is a useful
> feature and many tools (including the PAPI performance counter library
> that I work on) had added support for it in anticipation of the 2.6.39
> release.
>
> Ingo's reasons for removing the feature seem to boil down to
>  1.  "perf" doesn't use the functionality, and any other userspace
>      programs that use the perf_events syscalls don't matter
>  2.  Users are too stupid to use the raw functionality properly;
>      we should only allow a kernel-developer-approved small subset
>      of the features provided by the CPU as described in the Intel
>      developers manuals.
>
> #2 seems like a gross misinterpretation of the whole "Linux gives you
> enough rope to shoot yourself in the foot" policy from days past, but
> maybe things have moved on.

That's a gross misrepresentation of what Ingo has been saying on LKML.
Really, learn to work with relevant maintainers before you ask Linus
to revert something.

                                Pekka

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 17:17 ` Pekka Enberg
@ 2011-04-29 17:25   ` Andi Kleen
  2011-04-29 17:37     ` Pekka Enberg
  2011-04-29 17:42     ` Thomas Gleixner
  0 siblings, 2 replies; 30+ messages in thread
From: Andi Kleen @ 2011-04-29 17:25 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Vince Weaver, torvalds, Ingo Molnar, linux-kernel,
	Peter Zijlstra, Stephane Eranian

> >  2.  Users are too stupid to use the raw functionality properly;
> >      we should only allow a kernel-developer-approved small subset
> >      of the features provided by the CPU as described in the intel
> >      developers manuals.
> >
> > #2 seems like a gross misinterpretation of the whole "Linux gives you
> > enough rope to shoot yourself in the foot" policy from days past, but
> > maybe things have moved on.
> 
> That's a gross misrepresentation of what Ingo has been saying on LKML.
> Really, learn to work with relevant maintainers before you ask Linus
> to revert something.

Ingo may not have explicitly said (2), but at least his revert (disabling
the raw interface users are asking for) is practically implementing (2).

Actions speak louder than words.

That is, either you have a raw interface, or you only have the cooked
interface, or you have both. Since he reverted raw, only cooked
is left, which is (2).

I agree with Vince it's a bad policy.

-Andi


* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 17:25   ` Andi Kleen
@ 2011-04-29 17:37     ` Pekka Enberg
  2011-04-29 17:46       ` Vince Weaver
  2011-04-29 17:42     ` Thomas Gleixner
  1 sibling, 1 reply; 30+ messages in thread
From: Pekka Enberg @ 2011-04-29 17:37 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Vince Weaver, torvalds, Ingo Molnar, linux-kernel,
	Peter Zijlstra, Stephane Eranian

On Fri, Apr 29, 2011 at 8:25 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> >  2.  Users are too stupid to use the raw functionality properly;
>> >      we should only allow a kernel-developer-approved small subset
>> >      of the features provided by the CPU as described in the intel
>> >      developers manuals.
>> >
>> > #2 seems like a gross misinterpretation of the whole "Linux gives you
>> > enough rope to shoot yourself in the foot" policy from days past, but
>> > maybe things have moved on.
>>
>> That's a gross misrepresentation of what Ingo has been saying on LKML.
>> Really, learn to work with relevant maintainers before you ask Linus
>> to revert something.
>
> Ingo may not have explicitly said (2), but at least his revert (disabling
> the raw interface users are asking for) is practically implementing (2).
>
> Actions speak louder than words.
>
> That is, either you have a raw interface, or you only have the cooked
> interface, or you have both. Since he reverted raw, only cooked
> is left, which is (2).
>
> I agree with Vince it's a bad policy.

So a maintainer reverts an ABI that he thinks needs more thought/work
before it's too late and we're stuck with it forever. Can you please
explain what's the problem here?

Asking Linus to revert the commit is short-sighted and doesn't solve
the problem. Learn to work with the maintainer and save yourself a lot
of trouble.

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 17:25   ` Andi Kleen
  2011-04-29 17:37     ` Pekka Enberg
@ 2011-04-29 17:42     ` Thomas Gleixner
  2011-04-30 20:06       ` Corey Ashford
  1 sibling, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2011-04-29 17:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Pekka Enberg, Vince Weaver, torvalds, Ingo Molnar, linux-kernel,
	Peter Zijlstra, Stephane Eranian

On Fri, 29 Apr 2011, Andi Kleen wrote:

> > >  2.  Users are too stupid to use the raw functionality properly;
> > >      we should only allow a kernel-developer-approved small subset
> > >      of the features provided by the CPU as described in the intel
> > >      developers manuals.
> > >
> > > #2 seems like a gross misinterpretation of the whole "Linux gives you
> > > enough rope to shoot yourself in the foot" policy from days past, but
> > > maybe things have moved on.
> > 
> > That's a gross misrepresentation of what Ingo has been saying on LKML.
> > Really, learn to work with relevant maintainers before you ask Linus
> > to revert something.
> 
> Ingo may not have explicitly said (2), but at least his revert (disabling
> the raw interface users are asking for) is practically implementing (2).
> 
> Actions speak louder than words.
> 
> That is, either you have a raw interface, or you only have the cooked
> interface, or you have both. Since he reverted raw, only cooked
> is left, which is (2).
> 
> I agree with Vince it's a bad policy.

No, it's not. The raw interface will be made available when the proper
set of abstracted functionality has been added and settled down,
simply because it might change the way the raw event is exposed. As
long as there are open questions which might have an influence on the
exposure of the raw event, it's completely correct to keep it
disabled.

Though you and Vince ignored Peter's patches and the questions he
raised and just kept harping on your own interests.

That's a bad attitude, but we've been there before.

Thanks,

	tglx

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 17:37     ` Pekka Enberg
@ 2011-04-29 17:46       ` Vince Weaver
  2011-04-29 17:59         ` Pekka Enberg
  0 siblings, 1 reply; 30+ messages in thread
From: Vince Weaver @ 2011-04-29 17:46 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andi Kleen, torvalds, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Stephane Eranian

On Fri, 29 Apr 2011, Pekka Enberg wrote:

> Asking Linus to revert the commit is short-sighted and doesn't solve
> the problem. Learn to work with the maintainer and save yourself a lot
> of trouble.

Work "with" Ingo?  That's turned out well so far.  I'm sure certain 
scheduler people could comment here too on where that gets you.

The kernel I run is "Linux" not "Ingoix".  So I await a comment from Linus 
on this issue.  If it turns out that he's happy with Ingo's work, fine.
It just means I'll have to start maintaining some perf counter related 
patches out of tree for those of us who actually like having control on 
what we're measuring.

Vince

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 17:46       ` Vince Weaver
@ 2011-04-29 17:59         ` Pekka Enberg
  0 siblings, 0 replies; 30+ messages in thread
From: Pekka Enberg @ 2011-04-29 17:59 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Andi Kleen, torvalds, Ingo Molnar, linux-kernel, Peter Zijlstra,
	Stephane Eranian

Hi Vince!

On Fri, 29 Apr 2011, Pekka Enberg wrote:
>> Asking Linus to revert the commit is short-sighted and doesn't solve
>> the problem. Learn to work with the maintainer and save yourself a lot
>> of trouble.

On Fri, Apr 29, 2011 at 8:46 PM, Vince Weaver <vweaver1@eecs.utk.edu> wrote:
> Work "with" Ingo?  That's turned out well so far.  I'm sure certain
> scheduler people could comment here too on where that gets you.

Yeah, that Ingo dude is really impossible to work with (as are most
kernel maintainers)! I've personally been so unfortunate that I've
never had any problems but it must be my bad attitude to working with
other people and actually listening to them. :-(

On Fri, Apr 29, 2011 at 8:46 PM, Vince Weaver <vweaver1@eecs.utk.edu> wrote:
> The kernel I run is "Linux" not "Ingoix".  So I await a comment from Linus
> on this issue.  If it turns out that he's happy with Ingo's work, fine.
> It just means I'll have to start maintaining some perf counter related
> patches out of tree for those of us who actually like having control on
> what we're measuring.

Well, it's not Ingoix but Ingo gets to maintain your ABI long after
you're gone while Linus can just sit back, relax, and have a drink. So
I think it'd be fair to at least _pretend_ you care what Ingo thinks
about perf ABIs, no?

                         Pekka

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 16:42 ` Ingo Molnar
@ 2011-04-29 18:01   ` Vince Weaver
  2011-04-29 18:57     ` Ingo Molnar
  2011-04-29 22:16   ` Borislav Petkov
  2011-04-30  1:53   ` Vince Weaver
  2 siblings, 1 reply; 30+ messages in thread
From: Vince Weaver @ 2011-04-29 18:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: torvalds, linux-kernel, Peter Zijlstra, Stephane Eranian,
	Andi Kleen, Thomas Gleixner

On Fri, 29 Apr 2011, Ingo Molnar wrote:

> Firstly, one technical problem i have with the raw events ABI method is that it 
> was added in commit e994d7d23a0b ("perf: Fix LLC-* events on Intel 
> Nehalem/Westmere"). The raw ABI bit was done 'under the radar', it was not the 
> declared title of the commit, it was not declared in the changelog either and 
> it was not my intention to offer such an ABI prematurely either - and i noticed 
> those two lines too late - but still in time to not let this slip into v2.6.39.

The initial patches from November seem to make it clear what is being done 
here.  I thought it was pretty obvious to those reviewing those patches 
what was involved.  How would I have known that OFFCORE_RESPONSE support 
was coming if I didn't see the patches obviously float by on linux-kernel?

> Thirdly, and this is my most fundamental objection, i also object to the timing 
> of this offcore raw access ABI, because past experience is that we *really* do 
> not want to allow raw PMU details without *first* having generic abstractions 
> and generic events.

why?  Can you explain this better?

> The thing is, as far as i can see you and Andi are *still* pushing the failed 
> perfmon and Oprofile ABI and tooling models.

what ABI?  by the way, I hate oprofile and never use it.

perfmon2 and perfctr are very similar to perf_events in that they provide 
lightly massaged access to the MSRs so you can program whatever raw event 
that you like.

It's true that the *userspace* tools (pfmon, iperfex, PAPI) handle things 
differently than perf, but that's a *userspace* API, not a kernel ABI.  
You seem to keep confusing this.

> We put structure, proper abstractions and easy tooling *ahead* of the interests 
> of a small group of people who'd rather prefer a lowlevel, opaque hardware 
> channel so that they do not have to *think* about generalization and also 
> perhaps so they do not have to share their selection of events and analysis 
> methods with others ...

And generalization across platforms (and even across minor chip revisions) 
*doesn't work*.  It lasted maybe a year in PAPI before it was realized to 
be unworkable.  Talk to some people from AMD or Intel if you want.  It's 
not possible to sanely generalize perf counters.  They are too tied to 
hardware quirks.

Vince
vweaver1@eecs.utk.edu

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 18:01   ` Vince Weaver
@ 2011-04-29 18:57     ` Ingo Molnar
  2011-04-30  2:17       ` Vince Weaver
  2011-05-09 11:01       ` stephane eranian
  0 siblings, 2 replies; 30+ messages in thread
From: Ingo Molnar @ 2011-04-29 18:57 UTC (permalink / raw)
  To: Vince Weaver
  Cc: torvalds, linux-kernel, Peter Zijlstra, Stephane Eranian,
	Andi Kleen, Thomas Gleixner


* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> On Fri, 29 Apr 2011, Ingo Molnar wrote:
> 
> > Firstly, one technical problem i have with the raw events ABI method is that it 
> > was added in commit e994d7d23a0b ("perf: Fix LLC-* events on Intel 
> > Nehalem/Westmere"). The raw ABI bit was done 'under the radar', it was not the 
> > declared title of the commit, it was not declared in the changelog either and 
> > it was not my intention to offer such an ABI prematurely either - and i noticed 
> > those two lines too late - but still in time to not let this slip into v2.6.39.
> 
> The initial patches from November seem to make it clear what is being done 
> here.  I thought it was pretty obvious to those reviewing those patches what 
> was involved.  How would I have known that OFFCORE_RESPONSE support was 
> coming if I didn't see the patches obviously float by on linux-kernel?

Not really, Peter did a lot of review of those patches and they were changed 
beyond recognition from their original form - i think Peter wrote a fair 
portion of the supporting cleanups, as Andi seemed uninterested in acting 
quickly on review feedback.

> > Thirdly, and this is my most fundamental objection, i also object to the 
> > timing of this offcore raw access ABI, because past experience is that we 
> > *really* do not want to allow raw PMU details without *first* having 
> > generic abstractions and generic events.
> 
> why?  Can you explain this better?

Didn't i do that in the rest of my reply? You even quote some of it below.

> > The thing is, as far as i can see you and Andi are *still* pushing the 
> > failed perfmon and Oprofile ABI and tooling models.
> 
> what ABI? 

Well, the raw events ABI reminds me of the perfmon2/perfmon3 ABI: get the raw 
PMU to user-space as quickly as possible and leave all the details to 
user-space. I do not agree with that model of exposing performance measurement 
hardware features.

> [...] by the way, I hate oprofile and never use it.

I don't 'hate' oprofile per se (hey, i still keep pulling and pushing oprofile 
bits from Robert), i just find it very unintuitive and cumbersome to use, and i 
think it was misdesigned in several ways.

> perfmon2 and perfctr are very similar to perf_events in that they provide 
> lightly massaged access to the MSRs so you can program whatever raw event 
> that you like.

perf events (the kernel side) has a very, very different design from perfmon2 
and perfctr - but judging by your past replies such design aspects you do not 
seem to recognize, let alone appreciate.

> It's true that the *userspace* tools (pfmon, iperfex, PAPI) handle things 
> differently than perf, but that's a *userspace* API, not a kernel ABI.  You 
> seem to keep confusing this.

No, i do not think i am confused, i just disagree with you.

> > We put structure, proper abstractions and easy tooling *ahead* of the 
> > interests of a small group of people who'd rather prefer a lowlevel, opaque 
> > hardware channel so that they do not have to *think* about generalization 
> > and also perhaps so they do not have to share their selection of events and 
> > analysis methods with others ...
> 
> And generalization across platforms (and even across minor chip revisions) 
> *doesn't work*.

Why not? We cannot generalize everything, but generalizing the major CPU 
concepts works quite well for perf. The thing is, the laws of physics are the 
same for all CPUs so they all seem to employ very similar concepts and measure 
those concepts in similar ways, with similar events.

But it's more than that, generalization works even on the *hardware* level:

AMD managed to keep a large chunk of their events stable even across very 
radical changes of the underlying hardware. I have two AMD systems produced 
*10* years apart and they even use the same event encodings for the major 
events.

Intel started introducing stable event definitions a couple of years ago as 
well.

So i think i can say with fairly high confidence that you simply 
do not know what you are talking about.

> [...]  It lasted maybe a year in PAPI before it was realized to be 
> unworkable.  Talk to some people from AMD or Intel if you want.  It's not 
> possible to sanely generalize perf counters.  They are too tied to hardware 
> quirks.

I have the exact opposite experience: chip designers we talked to were clearly 
supportive of the generalizations perf events offers and clearly both AMD and 
Intel chips are moving *towards* more stable, more generic and more flexible 
performance event measurement methods.

We are getting more counters and with fewer constraints. Even the hardware is 
slowly but surely abstracting things out.

It is in the interest of PMU designers as well that their stuff moves one level 
higher within OSs and does not stay at the weird hardware-specific level. 
Hardware is getting more complex, measuring it becomes more complex, so making 
things more generic certainly helps.

Thanks,

	Ingo

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 16:42 ` Ingo Molnar
  2011-04-29 18:01   ` Vince Weaver
@ 2011-04-29 22:16   ` Borislav Petkov
  2011-04-30  1:49     ` Vince Weaver
  2011-04-30  1:53   ` Vince Weaver
  2 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2011-04-29 22:16 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, torvalds, linux-kernel, Peter Zijlstra,
	Stephane Eranian, Andi Kleen, Thomas Gleixner

On Fri, Apr 29, 2011 at 06:42:27PM +0200, Ingo Molnar wrote:

[..]

> Basically without proper generalization people get sloppy and go the fast path 
> and export very low level, opaque, unstructured PMU interfaces to user-space 
> and repeat the Oprofile and perfmon tooling mistakes again and again.
> 
>  "Thinking is hard, let's go shopping^W exporting raw ABIs."
> 
> So the perf events policy has always been that while we tolerate raw events 
> (there's nothing bad with offering them once generic events have crystallized 
> out), we only accept them if the *useful* events are first abstracted and 
> generalized out.
> 
> We put structure, proper abstractions and easy tooling *ahead* of the interests 
> of a small group of people who'd rather prefer a lowlevel, opaque hardware 
> channel so that they do not have to *think* about generalization and also 
> perhaps so they do not have to share their selection of events and analysis 
> methods with others ...

Yep, absolutely. Excuse my French but even kernel developers who
can understand perf code don't need to know f*cking magical hex
constants in order to trace a little. And yes, we talk about perf
and say how cool it is but users want to see more examples like on
http://perf.wiki.kernel.org - they want to get to use it first _and_
_then_ maybe look at code/more involved scenarios. Other kernel
developers don't give a rat's ass about the possibility for shooting
themselves in the foot - they want to use this thing without reading
code and CPU documentation for a day first. And I believe I speak for
the majority when I say so.

We're always bitching about Linux usability and now when it comes down
to yet another case where this can be done right for a change, and perf
people are trying to do something productive, you come waving hands
loudly at Linus with revert requests instead of helping. This is as
productive as trying to shoot yourself in the foot.

-- 
Regards/Gruss,
    Boris.

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 22:16   ` Borislav Petkov
@ 2011-04-30  1:49     ` Vince Weaver
  0 siblings, 0 replies; 30+ messages in thread
From: Vince Weaver @ 2011-04-30  1:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, torvalds, linux-kernel, Peter Zijlstra,
	Stephane Eranian, Andi Kleen, Thomas Gleixner

On Fri, 29 Apr 2011, Borislav Petkov wrote:

> On Fri, Apr 29, 2011 at 06:42:27PM +0200, Ingo Molnar wrote:
> >  "Thinking is hard, let's go shopping^W exporting raw ABIs."

> We're always bitching about Linux usability and now when it comes down
> to yet another case where this can be done right for a change, and perf
> people are trying to do something productive, you come waving hands
> loudly at Linus with revert requests instead of helping. This is as
> productive as trying to shoot yourself in the foot.

Have I proposed that the "perf" tool be changed at all?

No.  Never.

I proposed that the interface to allow raw access to offcore events _not_ 
be disabled so that advanced tools can access it directly.

I don't care how perf works.  Nor do I care how many pointless generic 
events get added to the kernel (other than being annoyed about it taking 
up extra bytes in my kernel image).

Reverting this patch would have absolutely no bearing on "perf", the 
usability of perf, or anything that any normal user sees.  I'm not sure 
how the argument is even getting framed that way.

Vince

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 16:42 ` Ingo Molnar
  2011-04-29 18:01   ` Vince Weaver
  2011-04-29 22:16   ` Borislav Petkov
@ 2011-04-30  1:53   ` Vince Weaver
  2011-04-30 20:58     ` Vince Weaver
  2 siblings, 1 reply; 30+ messages in thread
From: Vince Weaver @ 2011-04-30  1:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: torvalds, linux-kernel, Peter Zijlstra, Stephane Eranian,
	Andi Kleen, Thomas Gleixner

On Fri, 29 Apr 2011, Ingo Molnar wrote:

> Generalization of offcore, NUMA memory events is very much possible and 
> desirable, and Peter has posted an RFC patch that implements one form of it:
> 
>    https://lkml.org/lkml/2011/4/22/281
> 

OK, so I "reviewed" this patch. 

It creates a "generalized" new event, that is only actually available on 
Nehalem and Westmere.  It's listed as unavailable for all other known 
architectures.

How is this any better than just using the event by its actual name if you 
happen to have a Nehalem-esque chip?

This is just pointless kernel bloat.

So here's my review:
 NACK

Vince

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 18:57     ` Ingo Molnar
@ 2011-04-30  2:17       ` Vince Weaver
  2011-04-30  7:14         ` Pekka Enberg
  2011-04-30  8:11         ` Borislav Petkov
  2011-05-09 11:01       ` stephane eranian
  1 sibling, 2 replies; 30+ messages in thread
From: Vince Weaver @ 2011-04-30  2:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: torvalds, linux-kernel, Peter Zijlstra, Stephane Eranian,
	Andi Kleen, Thomas Gleixner

On Fri, 29 Apr 2011, Ingo Molnar wrote:

> > why?  Can you explain this better?
> 
> Didn't i do that in the rest of my reply? You even quote some of it below.

No.

You have not explained why having "generalized" counter definitions has 
anything to do with raw event access.

If your argument was you thought that the values being written to 
the config1 and config2 fields of the perf_attr structure might need to be 
better defined, well that's a better argument and I'd buy that.  That's a 
valid technical argument for blocking raw event access (though you 
probably shouldn't have the fields there at all if you are unsure, they 
become ABI pretty quickly).
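
For readers unfamiliar with the fields in question, here is a minimal sketch 
of how a raw offcore request is split between config and config1.  It assumes 
the usual x86 layout (event select in bits 0-7, unit mask in bits 8-15) and 
the Nehalem OFFCORE_RESPONSE_0 encoding (event 0xB7, umask 0x01) as listed in 
the Intel manuals; the helper names and the example response mask are 
hypothetical and only illustrate the packing, they are not part of any 
kernel or library API.

```python
# Sketch only: shows how a raw offcore event could be packed into the
# config/config1 values of a perf_event_attr. Nothing here talks to the kernel.

UMASK_SHIFT = 8  # assumption: umask occupies bits 8-15 of attr.config on x86

# Assumption: Nehalem/Westmere OFFCORE_RESPONSE_0 encoding from the Intel SDM.
OFFCORE_RESPONSE_0_EVENT = 0xB7
OFFCORE_RESPONSE_0_UMASK = 0x01


def raw_config(event: int, umask: int) -> int:
    """Pack an 8-bit event select and 8-bit unit mask into a raw config value."""
    if not 0 <= event <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("event and umask must each fit in 8 bits")
    return event | (umask << UMASK_SHIFT)


def offcore_request(response_mask: int) -> tuple[int, int]:
    """Return (config, config1): the event goes in config, and the
    request/response bit mask for the extra MSR goes in config1."""
    config = raw_config(OFFCORE_RESPONSE_0_EVENT, OFFCORE_RESPONSE_0_UMASK)
    return config, response_mask


# Hypothetical mask value, chosen only to show where the bits end up.
EXAMPLE_RESPONSE_MASK = 0x4001

config, config1 = offcore_request(EXAMPLE_RESPONSE_MASK)
print(f"config=0x{config:04x} config1=0x{config1:x}")
```

The point of the split is that config keeps its existing meaning while the 
extra register value travels separately, which is why the exact meaning of 
config1 becomes ABI once it ships.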

But your argument isn't that.  Your argument is that you're blocking raw 
event access as some sort of punishment because us HPC people aren't 
providing patches for "generalized" events that we never plan to use.
That's not a technical argument, that's some sort of weird power play.

> Well, the raw events ABI reminds me of the perfmon2/perfmon3 ABI: get the raw 
> PMU to user-space as quickly as possible and leave all the details to 
> user-space. I do not agree with that model of exposing performance measurement 
> hardware features.

well you probably should have thought of that before you enabled raw 
events at all then.  It's a bit too late now.

> > perfmon2 and perfctr are very similar to perf_events in that they provide 
> > lightly massaged access to the MSRs so you can program whatever raw event 
> > that you like.
> 
> perf events (the kernel side) has a very, very different design from perfmon2 
> and perfctr - but judging by your past replies such design aspects you do not 
> seem to recognize, let alone appreciate.

I didn't mean the internal designs were similar.  There's only so many 
sane ways to provide access to perf counters at the kernel level, and all 
of them look a lot alike from a high level.

> > It's true that the *userspace* tools (pfmon, iperfex, PAPI) handle things 
> > differently than perf, but that's a *userspace* API, not a kernel ABI.  You 
> > seem to keep confusing this.
> 
> No, i do not think i am confused, i just disagree with you.

Why does it matter?  Why should you as a kernel devel have any say in what 
my userspace tool looks like, as long as it is using a published ABI in a 
documented manner?

> Why not? We cannot generalize everything, but generalizing the major CPU 
> concepts works quite well for perf. The thing is, the laws of physics are the 
> same for all CPUs so they all seem to employ very similar concepts and measure 
> those concepts in similar ways, with similar events.

Fine.  Can we have a document saying what the events measure?

Also can you provide some way to query from userspace what event is being
used so that if someone reports a problem with an event we can figure
out which one it is in the relevant manual?

For cache events:
  + Do they count prefetches?  (SW, HW?)
  + Do they count coherency misses or just standard CCC ones?
  + Do they count speculative accesses or only retired accesses?
  + Do they count HW pagetable walks?  

For branch events:
  + Are they deterministic?
  + Are they speculative?

For retired instructions:
  + Deterministic?
  + Does it include HW interrupt counts?
  + Are there any errata?
  + Are any counted twice?

> AMD managed to keep a large chunk of their events stable even across very 
> radical changes of the underlying hardware. I have two AMD systems produced 
> *10* years apart and they even use the same event encodings for the major 
> events.

Well guess what, AMD family 15h changes all of that.

And you're not going to like LWP.  They got tired of waiting for a 
workable kernel perf counter interface and moved it completely to 
userspace, and there's nothing you can do about it unless you start 
blocking the xsave patches from getting in.


> Intel started introducing stable event definitions a couple of years ago as 
> well.

Yes. And just how compatible are they?  You might want to discuss that 
with some people from Intel.

> So i think i can tell it with a fairly high confidence factor that you simply 
> do not know what you are talking about.

Really.

> I have the exact opposite experience: chip designers we talked to were clearly 
> supportive of the generalizations perf events offers and clearly both AMD and 
> Intel chips are moving *towards* more stable, more generic and more flexible 
> performance event measurement methods.

You must be talking to different people than I have.  Have you looked at 
Power6/Power7 or ARM counters?

> We are getting more counters and with less constraints. Even the hardware is 
> slowly but surely abstracting things out.

Again... Sandy Bridge?  Interlagos?  You might want to check that out.


In any case I wish you'd get on the ball with uncore, offcore, etc. 

One of the promises made when perf_events was merged was that the kernel 
was the place to do all this stuff because it would allow such quick 
turnaround on new features.

As it is by the time Nehalem Offcore/Uncore support gets into a kernel 
that is picked up by a distro the chips are going to be 3+ years old and 
headed to the recycle bin.

Vince

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-30  2:17       ` Vince Weaver
@ 2011-04-30  7:14         ` Pekka Enberg
  2011-04-30 20:47           ` Vince Weaver
  2011-04-30  8:11         ` Borislav Petkov
  1 sibling, 1 reply; 30+ messages in thread
From: Pekka Enberg @ 2011-04-30  7:14 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, torvalds, linux-kernel, Peter Zijlstra,
	Stephane Eranian, Andi Kleen, Thomas Gleixner

Hi Vince,

On Sat, Apr 30, 2011 at 5:17 AM, Vince Weaver <vweaver1@eecs.utk.edu> wrote:
> But your argument isn't that.  Your argument is that you're blocking raw
> event access as some sort of punishment because us HPC people aren't
> providing patches for "generalized" events that we never plan to use.
> That's not a technical argument, that's some sort of weird power play.

That's not his argument at all and if you fail to see that you really
have no idea what the concept "working with the maintainer" means.

Yes, raw event access was reverted from 2.6.39 but that doesn't mean
it's blocked forever. If you want to keep pushing your feature, please
tone down your crazy-talk and start acting like a developer who's
genuinely interested in Linux, not in your own narrow, selfish goals.

I mean really, I haven't even had the pleasure of interacting a lot
with you and while I personally don't see the problem with raw event
access (if done in a well-thought-out manner from ABI pov), you've
already managed to convince me that applying _any_ patch from you is a
bad idea because the baggage that comes with it is simply not worth
it.

If you want to alienate other developers, keep doing what you're doing
- otherwise consider changing your tactics. It's boring to watch you
repeat the same mistakes over and over again.

                        Pekka

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-30  2:17       ` Vince Weaver
  2011-04-30  7:14         ` Pekka Enberg
@ 2011-04-30  8:11         ` Borislav Petkov
  2011-04-30 21:03           ` Vince Weaver
  1 sibling, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2011-04-30  8:11 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, torvalds, linux-kernel, Peter Zijlstra,
	Stephane Eranian, Andi Kleen, Thomas Gleixner, Robert Richter

On Fri, Apr 29, 2011 at 10:17:04PM -0400, Vince Weaver wrote:
> > AMD managed to keep a large chunk of their events stable even across very 
> > radical changes of the underlying hardware. I have two AMD systems produced 
> > *10* years apart and they even use the same event encodings for the major 
> > events.
> 
> Well guess what, AMD family 15h changes all of that.

I don't see a big problem here, Robert has a patch that takes care of
counter constraints. It probably needs a bit more work but we'll get
where we need to be.

> And you're not going to like LWP. They got tired of waiting for a
> workable kernel perf counter interface and moved it completely to
> userspace,

I don't know where you get your information but that's absolutely and
completely not nearly even beginning to smell the truth.

> and there's nothing you can do about it unless you start blocking the
> xsave patches from getting in.

Look at tip/x86/xsave, looks like LWP support will most likely be in
2.6.40.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 17:42     ` Thomas Gleixner
@ 2011-04-30 20:06       ` Corey Ashford
  2011-05-01  4:45         ` Andi Kleen
  2011-05-01 17:55         ` Ingo Molnar
  0 siblings, 2 replies; 30+ messages in thread
From: Corey Ashford @ 2011-04-30 20:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andi Kleen, Pekka Enberg, Vince Weaver, torvalds, Ingo Molnar,
	linux-kernel, Peter Zijlstra, Stephane Eranian, Carl Love

On 04/29/2011 10:42 AM, Thomas Gleixner wrote:
> On Fri, 29 Apr 2011, Andi Kleen wrote:
>
>>>>   2.  Users are too stupid to use the raw functionality properly;
>>>>       we should only allow a kernel-developer-approved small subset
>>>>       of the features provided by the CPU as described in the intel
>>>>       developers manuals.
>>>>
>>>> #2 seems like a gross misinterpretation of the whole "Linux gives you
>>>> enough rope to shoot yourself in the foot" policy from days passed, but
>>>> maybe things have moved on.
>>>
>>> That's a gross misrepresentation of what Ingo has been saying on LKML.
>>> Really, learn to work with relevant maintainers before you ask Linus
>>> to revert something.
>>
>> Ingo may not have explicitly said (2), but at least his revert (disabling
>> the raw interface users are asking for) is practically implementing (2).
>>
>> Actions speak louder than words.
>>
>> That is either you have a raw interface or you only have the cooked
>> interface or you have both. Since he reverted raw only cooked
>> is left, which is (2)
>>
>> I agree with Vince it's a bad policy.
>
> No, it's not. The raw interface will be made available when the proper
> set of abstracted functionality has been added and settled down,
> simply because it might change the way the raw event is exposed. As
> long as there are open questions which might have an influence on the
> exposure of the raw event, it's completely correct to keep it
> disabled.

Carl Love and I recently completed some work to add perf_events support 
for the IBM Blue Waters machine's "CPU networking" chip, called the 
Torrent chip.  We did all of this work based on a RHEL 6 kernel 
(2.6.32ish), which doesn't have Peter's more recent multi-PMU support.

I would say that most if not all of the events are not generalizable in 
the sense that you are talking about; the events are very specific to 
the Torrent chip.  For example, the Torrent chip communicates with four 
POWER7 chips via a high-speed serial interconnect, called the W, X, Y, 
and Z links, and it also has similar links which connect to other 
Torrent chips, and to other nodes.  The events measure certain types of 
activity on these various links, for example "X link receive idle".

So if I'm understanding what you have said correctly, we would not be 
able to get a forward port of this code committed without abstracting 
these events in a way that's acceptable to the kernel community.  Is 
that right?  If so, this is important for us to know so that we can 
correctly size the work effort involved in the forward port.

Thanks for your consideration,

- Corey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-30  7:14         ` Pekka Enberg
@ 2011-04-30 20:47           ` Vince Weaver
  2011-05-01 18:31             ` Ingo Molnar
  0 siblings, 1 reply; 30+ messages in thread
From: Vince Weaver @ 2011-04-30 20:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Ingo Molnar, torvalds, linux-kernel, Peter Zijlstra,
	Stephane Eranian, Andi Kleen, Thomas Gleixner

On Sat, 30 Apr 2011, Pekka Enberg wrote:

> you've
> already managed to convince me that applying _any_ patch from you is a
> bad idea because the baggage that comes with it is simply not worth
> it.

well if it makes you feel better you can do a "git log" and search for my 
name and back out all the included perf related patches that have my name 
associated with them.  Then you can have a "vince-free" kernel without
all the "baggage".

> If you want to alienate other developers, keep doing what you're doing
> - otherwise consider changing your tactics. It's boring to watch you
> repeat the same mistakes over and over again.

I spend a lot of time dealing with developers who use perf-counter related 
interfaces all the time.  They complain to me *constantly* about the 
drawbacks of perf_events, because PAPI is one step up from the kernel.

I try to get them to interact with the kernel people, but they won't.  Do 
you know why?  Because they feel like the perf_events developers are rude 
at best, unhelpful in general, and actively anti-anyone-not-using-perf.

Most of them simply think it's not worth dealing with the perf_events 
people, even if it means hardship down the road.  I keep trying because I 
am foolishly idealistic at times.  So anyway all of my vitriol is the 
combined power of scores of disenfranchised developers, who were happy to 
work on kernel problems when the perfmon2 developers were running things, 
but now won't touch it with a 10-foot pole.

So make of that what you will, but things go both ways.  You can be as 
obnoxious as you want as a maintainer but don't expect people to send you 
patches if you are.

Vince


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-30  1:53   ` Vince Weaver
@ 2011-04-30 20:58     ` Vince Weaver
  2011-04-30 21:09       ` Alan Cox
  0 siblings, 1 reply; 30+ messages in thread
From: Vince Weaver @ 2011-04-30 20:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: torvalds, linux-kernel, Peter Zijlstra, Stephane Eranian,
	Andi Kleen, Thomas Gleixner

On Fri, 29 Apr 2011, Vince Weaver wrote:

> >    https://lkml.org/lkml/2011/4/22/281
> > 
> 
> So here's my review:
>  NACK

so a slightly more useful review on slightly more sleep.

You are doing things backwards with your "generalization first" policy.

The right way to do things is enable raw event support first.

Then you can have users experiment with the feature.  Try various events 
using their favorite userspace utility (be it libpfm4, PAPI, perf).  This 
is easy, as choosing a new event is a simple matter of changing the 
command line option for your measurement.  Once a good event is found for 
generalization, *THEN* you add a generalized event that is well tested.

Your way is difficult.  Fine, Peter picks some arbitrary event he thinks 
works well.  I have to download a git kernel and reboot my machine (a 
process that takes an hour at best assuming I have root access).  Then if 
I want to try a new event, since RAW access is blocked, I have to patch 
the kernel, recompile, reboot.  So at least an hour between tests.

This assumes I can even do that.  My only Nehalem machine is at work and 
has only a fragile wireless network connection that requires manual 
intervention to get going.  So I *can't* review a change in general events 
with remote access when it lives in the kernel, yet if it was in user 
space like it *should be* I could test away all day no problem.

See the problem here?  Going general event first makes it seriously 
inconvenient to test and so no one is going to do it for you because it's 
such a pain.  RAW first is the way to go.

Vince

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-30  8:11         ` Borislav Petkov
@ 2011-04-30 21:03           ` Vince Weaver
  0 siblings, 0 replies; 30+ messages in thread
From: Vince Weaver @ 2011-04-30 21:03 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, torvalds, linux-kernel, Peter Zijlstra,
	Stephane Eranian, Andi Kleen, Thomas Gleixner, Robert Richter

On Sat, 30 Apr 2011, Borislav Petkov wrote:
> On Fri, Apr 29, 2011 at 10:17:04PM -0400, Vince Weaver wrote:

> > Well guess what, AMD family 15h changes all of that.
> 
> I don't see a big problem here, Robert has a patch that takes care of
> counter constraints. It probably needs a bit more work but we'll get
> where we need to be.

yes, but it's a bit of a change from the PMU of previous AMD chips,
going against Ingo's argument that the featureset of all modern CPUs is 
somehow converging.

> > And you're not going to like LWP. They got tired of waiting for a
> > workable kernel perf counter interface and moved it completely to
> > userspace,
> 
> I don't know where you get your information but that's absolutely and
> completely not nearly even beginning to smell the truth.

I talked with someone fairly involved in the development of LWP who 
implied as much in an off-the-record discussion.  You have to admit that back
5-6 years ago, when LWP was being planned, it wasn't certain that kernel 
support for perf events was *ever* going to make it into Linux.
Though maybe AMD is more concerned about the even worse support in other 
OSes.  It's true you'd probably know better.

> Look at tip/x86/xsave, looks like LWP support will most likely be in
> 2.6.40.

Really?  Does Ingo know yet?  I get the impression he doesn't like 
perf-event features slipping in under the radar like that.

Vince


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-30 20:58     ` Vince Weaver
@ 2011-04-30 21:09       ` Alan Cox
  0 siblings, 0 replies; 30+ messages in thread
From: Alan Cox @ 2011-04-30 21:09 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, torvalds, linux-kernel, Peter Zijlstra,
	Stephane Eranian, Andi Kleen, Thomas Gleixner

> See the problem here?  Going general event first makes it seriously 
> inconvenient to test and so no one is going to do it for you because it's 
> such a pain.  RAW first is the way to go.

Or you build your patch back in each time.

Lots of us don't run a Linus kernel. Mine gets several patches each
update which mean the disk performance is typically a few percent faster
than the upstream one etc.

Alan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-30 20:06       ` Corey Ashford
@ 2011-05-01  4:45         ` Andi Kleen
  2011-05-01 18:00           ` Ingo Molnar
  2011-05-01 17:55         ` Ingo Molnar
  1 sibling, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2011-05-01  4:45 UTC (permalink / raw)
  To: Corey Ashford
  Cc: Thomas Gleixner, Pekka Enberg, Vince Weaver, torvalds,
	Ingo Molnar, linux-kernel, Peter Zijlstra, Stephane Eranian,
	Carl Love

> I would say that most if not all of the events are not generalizable
> in the sense that you are talking about; the events are very
> specific to the Torrent chip.  For example, the Torrent chip

It's similar also on Intel chips. There are lots of events 
which are useful, but are unlikely to have any equivalents
on other designs (or sometimes not even in later/earlier chip
generations). So such a requirement would make it impossible
to support them.

Granted, a lot of them are obscure, but a lot of others are not
and they can be very useful for specific analyses.
Computers are getting more and more complex and we need all
the help we can get to understand their behaviour.

For example we've been recently using various Nehalem+ events for NUMA
tuning (memory latency and offcore) and it is very useful and 
fruitful.  But there are a lot of specialities there which do not extend to 
other chips.

I've been working around that now by programming the special registers 
in user space from special wrapper scripts, but clearly that's not a good 
solution and also doesn't work in all cases.

-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-30 20:06       ` Corey Ashford
  2011-05-01  4:45         ` Andi Kleen
@ 2011-05-01 17:55         ` Ingo Molnar
  2011-05-02 18:32           ` Corey Ashford
  1 sibling, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2011-05-01 17:55 UTC (permalink / raw)
  To: Corey Ashford
  Cc: Thomas Gleixner, Andi Kleen, Pekka Enberg, Vince Weaver,
	torvalds, linux-kernel, Peter Zijlstra, Stephane Eranian,
	Carl Love


* Corey Ashford <cjashfor@linux.vnet.ibm.com> wrote:

> Carl Love and I recently completed some work to add perf_events support for 
> the IBM Blue Waters machine's "CPU networking" chip, called the Torrent chip.  
> We did all of this work based on a RHEL 6 kernel (2.6.32ish), which doesn't 
> have Peter's more recent multi-PMU support.
> 
> I would say that most if not all of the events are not generalizable in the 
> sense that you are talking about; the events are very specific to the Torrent 
> chip. [...]

That's ok and not a problem.

The issue here is events that *are* generalizable.

> So if I'm understanding what you have said correctly, we would not be able to 
> get a forward port of this code committed without abstracting these events in 
> a way that's acceptable to the kernel community. [...]

If the number of events worth generalizing is the empty set that's ok.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-05-01  4:45         ` Andi Kleen
@ 2011-05-01 18:00           ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2011-05-01 18:00 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Corey Ashford, Thomas Gleixner, Pekka Enberg, Vince Weaver,
	torvalds, linux-kernel, Peter Zijlstra, Stephane Eranian,
	Carl Love


* Andi Kleen <ak@linux.intel.com> wrote:

> > I would say that most if not all of the events are not generalizable
> > in the sense that you are talking about; the events are very
> > specific to the Torrent chip.  For example, the Torrent chip
> 
> It's similar also on Intel chips. [...]

You seem to be seriously misinformed about Intel CPUs.

There are a fair number of events on Intel CPUs that can be generalized and 
which we have already generalized. Here's a selection:

 Performance counter stats for './fill_1b':

       2829.562519 task-clock               #    0.994 CPUs utilized          
                27 context-switches         #    0.000 M/sec                  
                52 CPU-migrations           #    0.000 M/sec                  
                99 page-faults              #    0.000 M/sec                  
     8,559,062,611 cycles                   #    3.025 GHz                      (20.02%)
     2,530,761,381 stalled-cycles-frontend  #   29.57% frontend cycles idle     (30.03%)
       423,070,037 stalled-cycles-backend   #    4.94% backend  cycles idle     (40.04%)
    18,043,436,126 instructions             #    2.11  insns per cycle        
                                            #    0.14  stalled cycles per insn  (50.04%)
     1,007,704,770 branches                 #  356.134 M/sec                    (60.04%)
           521,894 branch-misses            #    0.05% of all branches          (60.02%)
         9,424,849 L1-dcache-loads          #    3.331 M/sec                    (50.03%)
         1,028,884 L1-dcache-load-misses    #   10.92% of all L1-dcache hits    (50.02%)
           490,266 LLC-loads                #    0.173 M/sec                    (39.99%)
           133,226 LLC-load-misses          #    0.047 M/sec                    (10.01%)

        2.846836822  seconds time elapsed
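
The derived figures in the right-hand column follow directly from the raw 
counts. As a rough sketch (the variable and function names are illustrative 
only; the numbers are copied from the output above), the ratios can be 
recomputed like this:

```python
# Sketch only: recompute the derived metrics from the raw counts in the
# perf stat output above. Names are chosen for illustration.

counts = {
    "task_clock_ms": 2829.562519,
    "elapsed_s": 2.846836822,
    "cycles": 8_559_062_611,
    "stalled_cycles_frontend": 2_530_761_381,
    "stalled_cycles_backend": 423_070_037,
    "instructions": 18_043_436_126,
    "branches": 1_007_704_770,
    "branch_misses": 521_894,
    "l1_dcache_loads": 9_424_849,
    "l1_dcache_load_misses": 1_028_884,
}


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Instructions retired per cycle.
ipc = counts["instructions"] / counts["cycles"]
# Share of cycles in which the frontend / backend was stalled.
frontend_idle = percent(counts["stalled_cycles_frontend"], counts["cycles"])
backend_idle = percent(counts["stalled_cycles_backend"], counts["cycles"])
# Miss rates.
branch_miss_rate = percent(counts["branch_misses"], counts["branches"])
l1_miss_rate = percent(counts["l1_dcache_load_misses"], counts["l1_dcache_loads"])
# CPU time divided by wall-clock time.
cpus_utilized = (counts["task_clock_ms"] / 1000.0) / counts["elapsed_s"]

print(f"insns per cycle:      {ipc:.2f}")
print(f"frontend cycles idle: {frontend_idle:.2f}%")
print(f"backend cycles idle:  {backend_idle:.2f}%")
print(f"branch miss rate:     {branch_miss_rate:.2f}%")
print(f"L1-dcache miss rate:  {l1_miss_rate:.2f}%")
print(f"CPUs utilized:        {cpus_utilized:.3f}")
```

Because several of these events are multiplexed (the percentages in 
parentheses show how long each one was actually counted), the ratios are 
estimates rather than exact values.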

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-30 20:47           ` Vince Weaver
@ 2011-05-01 18:31             ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2011-05-01 18:31 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Pekka Enberg, torvalds, linux-kernel, Peter Zijlstra,
	Stephane Eranian, Andi Kleen, Thomas Gleixner


* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> I spend a lot of time dealing with developers who use perf-counter related 
> interfaces all the time.  They complain to me *constantly* about the 
> drawbacks of perf_events, because PAPI is one step up from the kernel.
> 
> I try to get them to interact with the kernel people, but they won't.  Do you 
> know why?  Because they feel like the perf_events developers are rude at 
> best, unhelpful in general, and actively anti-anyone-not-using-perf.

Arnaldo, the maintainer of perf tooling (and with whom most users complaining 
about perf would be interacting) is one of the most responsive maintainers and 
developers i've ever seen. I have not seen him brush off a single user 
bugreport or complaint, ever - let alone be 'unhelpful' or be anti-anyone. 
Ditto for Peter.

They didn't even brush *you* off, ever.

Let me guess, you just made that argument up, right?

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-05-01 17:55         ` Ingo Molnar
@ 2011-05-02 18:32           ` Corey Ashford
  0 siblings, 0 replies; 30+ messages in thread
From: Corey Ashford @ 2011-05-02 18:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Andi Kleen, Pekka Enberg, Vince Weaver,
	torvalds, linux-kernel, Peter Zijlstra, Stephane Eranian,
	Carl Love

On 05/01/2011 10:55 AM, Ingo Molnar wrote:
> 
> * Corey Ashford <cjashfor@linux.vnet.ibm.com> wrote:
> 
>> Carl Love and I recently completed some work to add perf_events support for 
>> the IBM Blue Waters machine's "CPU networking" chip, called the Torrent chip.  
>> We did all of this work based on a RHEL 6 kernel (2.6.32ish), which doesn't 
>> have Peter's more recent multi-PMU support.
>>
>> I would say that most if not all of the events are not generalizable in the 
>> sense that you are talking about; the events are very specific to the Torrent 
>> chip. [...]
> 
> That's ok and not a problem.
> 
> The issue here is events that *are* generalizable.
> 
>> So if I'm understanding what you have said correctly, we would not be able to 
>> get a forward port of this code committed without abstracting these events in 
>> a way that's acceptable to the kernel community. [...]
> 
> If the number of events worth generalizing is the empty set that's ok.
> 
> Thanks,
> 
> 	Ingo

Great, that's good to hear.

Thanks,

- Corey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: re-enable Nehalem raw Offcore-Events support
  2011-04-29 18:57     ` Ingo Molnar
  2011-04-30  2:17       ` Vince Weaver
@ 2011-05-09 11:01       ` stephane eranian
  2011-05-10  9:35         ` Ingo Molnar
  1 sibling, 1 reply; 30+ messages in thread
From: stephane eranian @ 2011-05-09 11:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Vince Weaver, torvalds, linux-kernel, Peter Zijlstra, Andi Kleen,
	Thomas Gleixner, eranian, Arun Sharma, Corey Ashford

On Fri, Apr 29, 2011 at 8:57 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Vince Weaver <vweaver1@eecs.utk.edu> wrote:
>
>> On Fri, 29 Apr 2011, Ingo Molnar wrote:
>>
>> > Firstly, one technical problem i have with the raw events ABI method is that it
>> > was added in commit e994d7d23a0b ("perf: Fix LLC-* events on Intel
>> > Nehalem/Westmere"). The raw ABI bit was done 'under the radar', it was not the
>> > declared title of the commit, it was not declared in the changelog either and
>> > it was not my intention to offer such an ABI prematurely either - and i noticed
>> > those two lines too late - but still in time to not let this slip into v2.6.39.
>>
>> The initial patches from November seem to make it clear what is being done
>> here.  I thought it was pretty obvious to those reviewing those patches what
>> was involved.  How would I have known that OFFCORE_RESPONSE support was
>> coming if I didn't see the patches obviously float by on linux-kernel?
>
> Not really, Peter did a lot of review of those patches and they were changed
> beyond recognition from their original form - i think Peter wrote a fair
> portion of the supporting cleanups, as Andi seemed uninterested in acting
> quickly on review feedback.
>

I did spend quite some time looking at the patch, testing it, debugging it
with Lin Ming. It was all done in the open. We even discussed with Peter the
config1/config2 approach instead of stashing the extra bits in config due to
SandyBridge. During those months, nobody, absolutely nobody, including YOU,
objected to the fact that the patch did not provide a generic abstraction for
the offcore_response events. I find it hard to believe you overlooked that
until the last minute. There was no 'under the radar' behavior. So please,
stick to the facts.

> Secondly, Peter posted a patch that might resolve this issue in v2.6.40 - but
> that patch is not cooked yet and you guys have not helped finish it. I'd like
> to see that process play out first - maybe we discover some detail that will
> force us to modify the config1/config2 ABI approach - which we cannot do if
> this is released into v2.6.39 prematurely.
>

I would think the opposite would happen. The config1 field is pretty much
all you need to pass the extra config for this event. The hardware is not
going to change from under us on those processors. Keep in mind that
offcore_response is not an architected event and never will be. The
situation I would rather expect is that you devise mappings to generic
events for v2.6.40 and later realize they are wrong. At that point you have
changed the behavior of the kernel: it does not count the same thing
anymore. This has already happened with the existing generic events and
will continue to happen, based on my limited understanding of what they're
supposed to count.
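
To make the config/config1 split concrete, here is a minimal user-space
sketch, not code from any patch in this thread. It assumes kernel headers
that already carry the config1 field. The helper names make_offcore_attr()
and open_counter() are made up for illustration, and the response mask is
whatever value the caller takes from the Intel manuals for the
request/response bits of interest.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* OFFCORE_RESPONSE_0 on Nehalem/Westmere: event select 0xb7, umask 0x01. */
#define OFFCORE_RESPONSE_0_RAW 0x01b7ULL

/*
 * Hypothetical helper: build a raw-event attr. The event code goes in
 * config; the value destined for the extra offcore response register
 * goes in config1.
 */
struct perf_event_attr make_offcore_attr(uint64_t response_mask)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_RAW;
	attr.config = OFFCORE_RESPONSE_0_RAW;
	attr.config1 = response_mask;
	attr.disabled = 1;        /* start stopped; enable via ioctl later */
	attr.exclude_kernel = 1;  /* count user-level activity only */
	return attr;
}

/*
 * Hypothetical helper: glibc has no wrapper for perf_event_open(), so
 * go through syscall(2). Returns a file descriptor, or -1 on failure.
 */
long open_counter(struct perf_event_attr *attr, pid_t pid, int cpu)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, -1, 0UL);
}
```

A caller would build the attr with a mask of its choosing and pass it to
open_counter() with pid 0 and cpu -1 to count the calling thread. On a
kernel or processor without offcore support the open is expected to fail,
so the return value needs checking.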

> Thirdly, and this is my most fundamental objection, i also object to the timing
> of this offcore raw access ABI, because past experience is that we *really* do
> not want to allow raw PMU details without *first* having generic abstractions
> and generic events first.

I am not opposed to generic events. But I don't think they're the ultimate
solution to all your performance problems: the crystal ball you're trying to
sell.

I don't think users are sloppy either. Saying so does not show much
consideration for end-users. I also don't quite follow the reasoning here:
"Users are sloppy, therefore push all the complexity into the 'smart'
kernel". What's wrong with having smarter tools to help users? The kernel is
not necessarily the solution to all users' problems. Tool developers are as
talented and innovative as kernel developers.

Performance monitoring is not and never will be a 5-minute thing you do at
the end of the day. The same goes for tools: the fact that you can write a
performance tool in half a day is not necessarily a sign that the tool, or
the kernel API it sits on, is very good. What matters is the quality of the
data it returns, the quality of the interpretation of the data, and how it
can be translated into program changes that may eventually lead to
performance improvements. So when I can do a quick:

 $  perf stat -e l1-load-misses foo

I want to be sure:
 - I understand what I am actually measuring
 - I am measuring the same thing on different processors
 - what I am measuring does not change at each kernel version

Sure, it spares me the time to read the manual, but I'd like to be sure
I understand what's going on. It is easy to be misled by counts (see below).
As we've discussed earlier, what matters is the ability to associate costs to
events. I think it would be quite hard to associate costs to generic events when
many are just too broad.

Generic events could be a first approximation BUT they need to be very carefully
defined. You need to clearly state what they count. That's really a minimum.
And if they are just approximations, then I need to know to what extent. Those
rules would have to be set across the board. If you start saying that on Intel
these restrictions apply and on AMD another set of restrictions applies, then
what's the point of all of this?  "Sloppy" users should not be expected to
sift through the kernel changelog to realize that some generic events have
restrictions or are just vast approximations. Ultimately, the tool has to be
aware of this to warn users. This is the problem with the model: it creates
the illusion of uniformity and stability, when the reality is quite different.

You also need to be more careful in how you map generic events. This goes
back to your "thinking is hard, ..." argument. You do need to think hard
before you come up with an event you think would be valuable as a generic
event. Such an event becomes valuable only if it can be mapped on MORE than
one processor AND measure the SAME thing. Failure to do so means the model
is useless.

A quick reading of the Intel event table to find approximate mappings is not
enough. Given that generic events are a centerpiece of your design, you need
to be extra cautious when adding mappings. I would expect you'd write
micro-benchmarks to validate that the event counts what its generic mapping
is defined for.
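
As a rough sketch of that kind of validation (an illustration only, with
made-up helper names and an assumed 64-byte cache line): run a workload
whose event count is known by construction, count the generic event around
it, and compare. Here the workload issues exactly one load per cache line,
and the generic event is the L1D read-miss cache event. Hardware prefetchers
and a warm cache can make the measured count differ from the number of
loads, which is precisely the kind of discrepancy such a benchmark is meant
to expose.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#define LINE_SIZE 64	/* assumed cache-line size in bytes */

/* Generic "L1D read miss" config, as encoded for PERF_TYPE_HW_CACHE. */
uint64_t l1d_read_miss_config(void)
{
	return PERF_COUNT_HW_CACHE_L1D |
	       (PERF_COUNT_HW_CACHE_OP_READ << 8) |
	       (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
}

/* Known workload: one load per cache line; returns the number of loads. */
uint64_t touch_lines(const volatile unsigned char *buf, size_t size)
{
	uint64_t loads = 0;

	for (size_t off = 0; off < size; off += LINE_SIZE) {
		(void)buf[off];	/* volatile read, not optimized away */
		loads++;
	}
	return loads;
}

/* Is the measured count within pct percent of the expected count? */
int count_within(uint64_t expected, uint64_t measured, unsigned pct)
{
	uint64_t diff = expected > measured ? expected - measured
					    : measured - expected;

	return diff * 100 <= expected * pct;
}

/* Count the generic event around the workload; -1 if unavailable. */
long long measure_l1d_misses(const volatile unsigned char *buf, size_t size)
{
	struct perf_event_attr attr;
	long long count = -1;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HW_CACHE;
	attr.config = l1d_read_miss_config();
	attr.disabled = 1;
	attr.exclude_kernel = 1;

	/* pid 0, cpu -1: count the calling thread on any CPU. */
	fd = (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL);
	if (fd < 0)
		return -1;

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	touch_lines(buf, size);
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		count = -1;
	close(fd);
	return count;
}
```

A validation run would allocate a buffer well beyond the L1D size, call
measure_l1d_misses() on it, and feed the result, together with the value
returned by touch_lines(), to count_within() with a tolerance chosen for
the processor at hand. Repeating this on each processor the generic event
claims to support is what shows whether the mapping measures the same thing
everywhere.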

I am afraid your recent series of stall events is not a perfect illustration
of that. Here is an example:

 /* UOPS_EXECUTED.CORE_ACTIVE_CYCLES,c=1,i=1 */
 intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x1803fb1;

There is a reason this event is called CORE. When HT is on, it counts what's
going on for both threads. You're measuring your CPU and the sibling CPU. If
you are stalled and the other thread is not, you will vastly undercount.
This is regardless of the setting of the ANY bit. The count is wrong when
running in per-thread mode. At the user level, you think you're measuring
stalls in your thread when the reality is very different. This example just
illustrates the danger of generic events.
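
For readers who want to check the "c=1,i=1" annotation against the hex
value, here is a small decoder sketch. It assumes the standard x86 event
select register layout (event select in bits 0-7, unit mask in bits 8-15,
ANY in bit 21, invert in bit 23, counter mask in bits 24-31). The struct
and function names are made up for illustration.

```c
#include <stdint.h>

/* Fields of a raw x86 event config relevant to the mapping above. */
struct raw_event {
	uint8_t event;	/* bits 0-7: event select */
	uint8_t umask;	/* bits 8-15: unit mask */
	uint8_t any;	/* bit 21: AnyThread */
	uint8_t inv;	/* bit 23: invert the counter-mask comparison */
	uint8_t cmask;	/* bits 24-31: counter mask */
};

/* Hypothetical helper: split a raw config value into its fields. */
struct raw_event decode_raw_event(uint64_t config)
{
	struct raw_event ev;

	ev.event = config & 0xff;
	ev.umask = (config >> 8) & 0xff;
	ev.any = (config >> 21) & 1;
	ev.inv = (config >> 23) & 1;
	ev.cmask = (config >> 24) & 0xff;
	return ev;
}
```

Applied to 0x1803fb1 this gives event 0xb1, umask 0x3f, inv 1, cmask 1 and
ANY 0. With inv=1 and cmask=1 the counter increments on cycles in which
fewer than one uop executed, and since the unit mask selects a core-level
sub-event, a busy sibling thread hides the stalls of the measured one.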

Going back to offcore-response, generic events become valuable if you can
map them onto more than one processor. I'd like to understand their mappings
on AMD processors.

As you said, most processors have common micro-architectural components
these days. But that does NOT mean you can measure them the same way. The
Intel and AMD event tables are full of examples of that (LLC misses is one).
I am not necessarily happy about that, but I can understand why this
happens. Many times, it is not possible to compensate in SW for the HW
differences in how an event counts, even when the concept appears simple, as
with a cache miss.

> But it's more than that, generalization works even on the *hardware* level:
>
> AMD managed to keep a large chunk of their events stable even across very
> radical changes of the underlying hardware. I have two AMD systems produced
> *10* years apart and they even use the same event encodings for the major
> events.
>
> Intel started introducing stable event definitions a couple of years ago as
> well.
>

I don't agree with this statement. It's not happening. The proof is that
Intel came out with the architected events with the Core micro-architecture.
Since then, we've had Nehalem, Westmere, and Sandy Bridge, and the list of
architected events has NOT been extended. I bet you it won't be with
follow-on processors either. It does not make sense. The micro-architecture
keeps changing. Take the uncore component. It varies between a single-socket
and dual-socket WSM and is totally different on the EX part. You think you
can ever get an architected last level cache miss event that works across
the board? The event definition does matter and it's not a marginal issue.

As for AMD, yes, it has not changed in 10 years, but that does not mean the
problem is solved and that all events are useful. Furthermore, I am sure
you've seen the AMD patches for Fam15h processors (Bulldozer), they've added
a bunch of event constraints.

> Basically without proper generalization people get sloppy and go the fast path
> and export very low level, opaque, unstructured PMU interfaces to user-space
> and repeat the Oprofile and perfmon tooling mistakes again and again.
>
>  "Thinking is hard, lets go shopping^W exporting raw ABIs."
>

What is your proposal for the proper abstraction for AMD IBS, then?


> We put structure, proper abstractions and easy tooling *ahead* of the interests
> of a small group of people who'd rather prefer a lowlevel, opaque hardware
> channel so that they do not have to *think* about generalization and also
> perhaps so they do not have to share their selection of events and analysis
> methods with others ...
>

Now what? A conspiracy theory? You really think that's the goal of those
people (who, I bet, include me)? The reality is quite different. Those
people want to help. They have been looking at this for years. They know
where the pitfalls are and they are trying to raise awareness. They also
want to make sure Linux provides them with an infrastructure on which they
can build better tools for advanced analysis.

Don't go claiming those people will run away once they have raw event access.
Have I not contributed patches to perf_events to make it better and that
despite what happened two years ago?

Nobody is trying to conceal events or analysis techniques (see the presentation
below). People are trying to get what they need based on past experience dealing
with PMU hardware and applications.

Related to that, the following statement on Vince:

> So i think i can tell it with a fairly high confidence factor that you simply
> do not know what you are talking about.

I think this is a gratuitous and unfounded statement. I have known Vince for
years. He has been studying the PMU events for years, writing
micro-benchmarks to really understand what they actually count and their
differences across processors. So I think he is fully qualified to comment
on events.


As described above, there are lots of pitfalls when using PMU events. I'd
like to have access to the events as described in the processor specs. There
is no harm in doing so. This is a way of validating measurements and also a
way of doing finer-grained analysis. The extra 1% of performance does matter
for a lot of applications and for those you need a lot more than the generic
events.

Analysis techniques have been published (not concealed). The following
presentation given at CERN a few months back is a good example:

    https://openlab-mu-internal.web.cern.ch/openlab-mu-internal/03_Documents/4_Presentations/Slides/2010-list/HPC_Perf_analysis_Xeon_5500_5600_intro.pdf

We believe we can build tools to create that decomposition tree. Such
decomposition needs access to many raw events. Some people have already
prototyped tools based on those analysis techniques:

    http://mkortela.web.cern.ch/mkortela/ptuview/

If perf_events does not allow such tools to be built because it is
artificially restricting access to certain hardware features, then people,
incl. myself, may legitimately question its usefulness.

In summary, I am not a believer in generic events, at least not at the
kernel level. That does not mean I am against them. However, I am against
the ideas that there should only be generic events and that generic events
should come first.


* Re: re-enable Nehalem raw Offcore-Events support
  2011-05-09 11:01       ` stephane eranian
@ 2011-05-10  9:35         ` Ingo Molnar
  0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2011-05-10  9:35 UTC (permalink / raw)
  To: eranian
  Cc: Vince Weaver, torvalds, linux-kernel, Peter Zijlstra, Andi Kleen,
	Thomas Gleixner, eranian, Arun Sharma, Corey Ashford


* stephane eranian <eranian@googlemail.com> wrote:

> > Thirdly, and this is my most fundamental objection, i also object to the 
> > timing of this offcore raw access ABI, because past experience is that we 
> > *really* do not want to allow raw PMU details without *first* having 
> > generic abstractions and generic events first.
> 
> I am not opposed to generic events. [...]

Ok - and that's the most important point really.

> [...] But I don't think they're the ultimate solution to all your performance 
> problems: the crystal ball you're trying to sell.

I do not claim that and i'm not selling a crystal ball either.

I just see that 90%+ of our users use generic events (most in fact just use 
whatever comes as a default, which is cycles) and only a tiny niche uses raw 
events. I'm responding to that demand.

[ We saw that with Oprofile already: only an exceedingly small minority *ever* 
  made use of any event but the default Oprofile came with.

  So even with our current generalizations we have more than the typical 
  developer would use for profiling and we try to not define everything and the 
  kitchen sink but respond to demand in a common sense way as we see it. ]

And note that i have no problems with and no prejudices against crazy niches 
(-rt, anyone?), as long as they *know* that they are crazy and as long as they 
help the advancement of the common case!

Really, as a Linux kernel maintainer i'm very easily corrupted by niches: if 
you want me to care about your niche you only need to bribe me with 
improvements to the more common case! :-)

Note that time is running out to get the offcore bits activated even in 
v2.6.40: we are at -rc7 and the merge window is getting closer.

So if you guys care about this code please have a look at Peter's patch and 
help test/finish it (or provide a detailed and convincing technical review of 
his patch to prove why his approach to provide node level events is impossible 
to meet).

Arguing in this thread some more won't help get the code changed i'm afraid!

Thanks,

	Ingo


Thread overview: 30+ messages
2011-04-29 15:04 re-enable Nehalem raw Offcore-Events support Vince Weaver
2011-04-29 15:27 ` Andi Kleen
2011-04-29 16:49   ` Ingo Molnar
2011-04-29 16:42 ` Ingo Molnar
2011-04-29 18:01   ` Vince Weaver
2011-04-29 18:57     ` Ingo Molnar
2011-04-30  2:17       ` Vince Weaver
2011-04-30  7:14         ` Pekka Enberg
2011-04-30 20:47           ` Vince Weaver
2011-05-01 18:31             ` Ingo Molnar
2011-04-30  8:11         ` Borislav Petkov
2011-04-30 21:03           ` Vince Weaver
2011-05-09 11:01       ` stephane eranian
2011-05-10  9:35         ` Ingo Molnar
2011-04-29 22:16   ` Borislav Petkov
2011-04-30  1:49     ` Vince Weaver
2011-04-30  1:53   ` Vince Weaver
2011-04-30 20:58     ` Vince Weaver
2011-04-30 21:09       ` Alan Cox
2011-04-29 17:17 ` Pekka Enberg
2011-04-29 17:25   ` Andi Kleen
2011-04-29 17:37     ` Pekka Enberg
2011-04-29 17:46       ` Vince Weaver
2011-04-29 17:59         ` Pekka Enberg
2011-04-29 17:42     ` Thomas Gleixner
2011-04-30 20:06       ` Corey Ashford
2011-05-01  4:45         ` Andi Kleen
2011-05-01 18:00           ` Ingo Molnar
2011-05-01 17:55         ` Ingo Molnar
2011-05-02 18:32           ` Corey Ashford
