* Question about Perf's handling of in-use performance counters
@ 2016-10-21 21:59 Taylor Andrews
  2016-10-27 18:11 ` Andi Kleen
  0 siblings, 1 reply; 12+ messages in thread
From: Taylor Andrews @ 2016-10-21 21:59 UTC
  To: linux-perf-users

Hi all,

From what I have seen, this list has many perf experts, so I am hoping someone here can help me understand some perf behavior we are seeing.

First some background:

VMware's virtual x86 performance counter implementation aims to expose in-use (unavailable) performance counters to the guest operating system in the hope that software agents will recognize them as in-use resources and follow the PMU sharing guidelines outlined in Intel's Performance Monitoring Unit Sharing Guide (https://software.intel.com/en-us/articles/performance-monitoring-unit-guidelines/).  There is also a VMware-based mechanism to force virtual performance counters to be exposed to the guest operating system as in-use.  The sharing guidelines define "in-use" as having the enable bits set, either in a general-purpose PMC's event select MSR or, for fixed-function counters, in the Fixed Counter Control MSR.
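
The "in-use" definition above comes down to two bit tests. The following is a minimal C sketch, assuming the MSR values have already been read (no real RDMSR is performed, and the helper names are hypothetical). It relies on the architectural layout: bit 22 of IA32_PERFEVTSELx is the enable (EN) bit, and IA32_FIXED_CTR_CTRL holds one 4-bit field per fixed counter whose low two bits enable counting at ring 0 and ring > 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 22 of IA32_PERFEVTSELx: counter enable (EN). */
#define EVTSEL_ENABLE (1ULL << 22)

/* IA32_FIXED_CTR_CTRL: one 4-bit field per fixed counter.
 * Bits 1:0 of each field are the enable bits (OS, USR);
 * bits 2 (AnyThread) and 3 (PMI) do not make a counter "in use". */
#define FIXED_FIELD_BITS  4
#define FIXED_ENABLE_MASK 0x3ULL

/* Hypothetical helper: a general-purpose counter is in use if EN is set. */
static bool gp_counter_in_use(uint64_t evtsel)
{
    return (evtsel & EVTSEL_ENABLE) != 0;
}

/* Hypothetical helper: fixed counter idx is in use if either enable bit
 * in its 4-bit field is set. */
static bool fixed_counter_in_use(uint64_t fixed_ctr_ctrl, unsigned idx)
{
    assert(idx < 64 / FIXED_FIELD_BITS);
    return ((fixed_ctr_ctrl >> (idx * FIXED_FIELD_BITS)) & FIXED_ENABLE_MASK) != 0;
}
```

For example, an event select value of 0x43003c (unhalted core cycles, USR+OS, EN) is in use, while 0x03003c (same event, EN clear) is not.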

The Linux PMU driver currently appears to complain that the BIOS is "broken" if it finds any counters in use, but it still initializes successfully: https://github.com/torvalds/linux/blob/a5ebe0ba3dff658c5286e8d5f20e4328f719d5a3/arch/x86/kernel/cpu/perf_event.c#L181


The odd behavior:

When running perf in a VMware VM with virtualized x86 performance counters, we sometimes see messages like the following in the log file:

2016-10-10T15:22:46.434-04:00| vcpu-1| I125: VPMC: The guest wrote to the event selector of in-use virtual performance counter 0, which is disallowed.
2016-10-10T15:22:46.435-04:00| vcpu-1| I125: VPMC: The guest wrote to the event selector of in-use virtual performance counter 0, which is disallowed.
2016-10-10T15:22:46.436-04:00| vcpu-1| I125: VPMC: The guest wrote to the event selector of in-use virtual performance counter 0, which is disallowed.

The VMware virtual hardware platform software drops writes to virtual counters it exposes as in-use, and logs warnings like those above.
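
The effect of dropped writes on perf's counts can be illustrated with a toy model. This is a hypothetical C sketch, not VMware's implementation: it assumes the host owns counters marked in `host_in_use`, that guest writes to their event selects are discarded with a log line, and that the guest-visible count of such a counter does not advance (consistent with the zero counts reported below).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_GP        4
#define EVTSEL_ENABLE (1ULL << 22)

/* Hypothetical guest-visible state of a virtual PMU. */
struct vpmu {
    uint32_t host_in_use;      /* bit i set: counter i is owned by the host */
    uint64_t evtsel[NUM_GP];   /* guest view of IA32_PERFEVTSELi */
    uint64_t count[NUM_GP];    /* guest view of IA32_PMCi */
};

/* Guest WRMSR to an event select: dropped and logged if the counter is in use. */
static bool vpmu_write_evtsel(struct vpmu *v, unsigned idx, uint64_t val)
{
    assert(idx < NUM_GP);
    if (v->host_in_use & (1u << idx)) {
        fprintf(stderr, "VPMC: write to in-use counter %u dropped\n", idx);
        return false;
    }
    v->evtsel[idx] = val;
    return true;
}

/* Advance only the counters the guest actually managed to enable. */
static void vpmu_tick(struct vpmu *v, uint64_t events)
{
    for (unsigned i = 0; i < NUM_GP; i++)
        if (!(v->host_in_use & (1u << i)) && (v->evtsel[i] & EVTSEL_ENABLE))
            v->count[i] += events;
}
```

A guest driver that ignores the in-use state programs counter 0 anyway, the write is silently dropped, and reading the counter later yields 0 rather than an error.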

We have tried perf 3.13.11-ckt39 on Linux kernel 3.13.0-92-generic, and perf 4.8.1 on Linux kernel 4.8.1.  In both cases, when we forcibly mark counters as in-use, we see warnings like those above. When this happens, perf also sometimes reports counts of 0 for the events it was attempting to count.

For example, rotate1 is a simple C++ binary that loops and should advance the cycles and instructions counts.  The following perf 3.13.11-ckt39 run was performed on Linux kernel 3.13.0-92-generic in an x86 VMware VM with VPMC enabled, with all of the generic counters forcibly marked as in-use.

$ perf stat -e cycles -e instructions ./rotate1 10000
Took 9.519990 seconds

 Performance counter stats for './rotate1 10000':

                 0 cycles
                 0 instructions

      17.213091320 seconds time elapsed

Naively, these warnings suggest that perf is trying to use counters that are marked as in-use.  I would like to find out whether this is expected or unexpected perf behavior.  I know nothing about perf's approach to handling in-use counters, but I am hoping someone here does.  For example, if perf finds that the Linux PMU driver has initialized successfully, does it assume it has authoritative access to all the counters?  If anyone has specific knowledge about how perf treats x86 performance counters that it finds to be "in-use" (such as counters being used by the BIOS or a host operating system), could you please reach out to me?

Thanks,
Taylor 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Question about Perf's handling of in-use performance counters
  2016-10-21 21:59 Question about Perf's handling of in-use performance counters Taylor Andrews
@ 2016-10-27 18:11 ` Andi Kleen
  2016-10-27 21:00   ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2016-10-27 18:11 UTC
  To: Taylor Andrews; +Cc: linux-perf-users, peterz


Taylor Andrews <andrewst@vmware.com> writes:

> First some background:
>
> VMware's virtual x86 performance counter implementation aims to expose
> in-use (unavailable) performance counters to the guest operating
> system in the hopes that software agents will recognize it as an
> "in-use" resource and follow the PMU sharing guidelines outlined in
> Intel's Performance Monitoring Unit Sharing Guide
> (https://software.intel.com/en-us/articles/performance-monitoring-unit-guidelines/).
> There is also a VMware-based mechanism to force virtual performance
> counters to be exposed to the guest operating system as in-use.
> "In-use" is defined in the sharing guidelines as the enable bits being
> found to be set, either in the general purpose PMC's event select MSR,
> or in the case of fixed function counters, in the Fixed Counter
> Control MSR.
>
> The Linux PMU driver looks like it currently complains about the BIOS
> being "broken" if it finds any counters are in-use by it, but it still
> successfully initializes:
> https://github.com/torvalds/linux/blob/a5ebe0ba3dff658c5286e8d5f20e4328f719d5a3/arch/x86/kernel/cpu/perf_event.c#L181
>

>
> By looking at these warnings, naively, it would appear perf is trying
> to use counters that are marked as in-use.

That's right. For generic counters perf doesn't really follow the
exclusion protocol. It just checks and warns, but then later
the "in use" information is not used in the scheduler.
It works for fixed counters.

> I would like to investigate if this is expected perf behavior, or
> unexpected perf behavior.

It's kind of expected, but one could argue that it probably should be fixed.

For the VM of course you could implement it by taking counters from
the top of the range and limiting CPUID.
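
Limiting CPUID means shrinking the general-purpose counter count reported in CPUID leaf 0AH, EAX bits 15:8. A hypothetical C sketch of that adjustment (the helper names are illustrative, not from any hypervisor):

```c
#include <assert.h>
#include <stdint.h>

/* CPUID.0AH:EAX (architectural performance monitoring leaf):
 *   bits  7:0  version ID
 *   bits 15:8  number of general-purpose counters per logical processor
 *   bits 23:16 bit width of the general-purpose counters
 *   bits 31:24 length of the EBX event-availability bit vector */
static unsigned cpuid0a_num_gp(uint32_t eax)
{
    return (eax >> 8) & 0xff;
}

/* Hypothetical helper: report `reserved` fewer general-purpose counters so
 * the guest only programs counters 0 .. n - reserved - 1, leaving the
 * counters at the top of the range to the hypervisor. */
static uint32_t cpuid0a_hide_top_counters(uint32_t eax, unsigned reserved)
{
    unsigned n = cpuid0a_num_gp(eax);
    assert(reserved <= n);
    return (eax & ~(uint32_t)0xff00) | ((uint32_t)(n - reserved) << 8);
}
```

The counters have to come from the top because guests enumerate general-purpose counters as a contiguous range starting at 0; there is no CPUID field for "counter 1 is missing". With EAX = 0x07300404 (version 4, four 48-bit counters), hiding one counter yields 0x07300304.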

There are some other use cases (e.g. with user space users of perfmon)
where it would be helpful, so at some point it would be useful to fix.

I think it could be done by simply making the BIOS test update the
active_mask of the scheduler. But it would be somewhat expensive
to reread the enable registers all the time to do full exclusion
with runtime users.
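
The idea of feeding the BIOS test's result into the scheduler can be sketched as a reserved mask that counter allocation must skip. This is a hypothetical C sketch; the struct and function names do not mirror the kernel's cpu_hw_events layout or its real event scheduler.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_GP 4

/* Hypothetical scheduler state. */
struct pmu_sched {
    uint32_t reserved_mask;  /* counters found in use at init; never assigned */
    uint32_t used_mask;      /* counters currently assigned to perf events */
};

/* Assign the lowest counter that is neither reserved nor used.
 * Returns its index, or -1 when none is free, so the caller can
 * multiplex or fail instead of programming an in-use counter. */
static int sched_get_counter(struct pmu_sched *s)
{
    for (int i = 0; i < NUM_GP; i++) {
        uint32_t bit = 1u << i;
        if (!((s->reserved_mask | s->used_mask) & bit)) {
            s->used_mask |= bit;
            return i;
        }
    }
    return -1;
}

static void sched_put_counter(struct pmu_sched *s, int idx)
{
    assert(idx >= 0 && idx < NUM_GP);
    s->used_mask &= ~(1u << idx);
}
```

With counters 0 and 2 reserved, events land on counters 1 and 3, and a third request gets -1. The mask here is computed once, which is exactly the cost trade-off mentioned above: full exclusion against runtime users would require rereading the enable registers.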

It's also hard to test unfortunately.

-Andi

* Re: Question about Perf's handling of in-use performance counters
  2016-10-27 18:11 ` Andi Kleen
@ 2016-10-27 21:00   ` Peter Zijlstra
  2016-10-28 13:53     ` Andi Kleen
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2016-10-27 21:00 UTC
  To: Andi Kleen; +Cc: Taylor Andrews, linux-perf-users

On Thu, Oct 27, 2016 at 11:11:24AM -0700, Andi Kleen wrote:
> 
> Taylor Andrews <andrewst@vmware.com> writes:
> 
> > First some background:
> >
> > VMware's virtual x86 performance counter implementation aims to expose
> > in-use (unavailable) performance counters to the guest operating
> > system in the hopes that software agents will recognize it as an
> > "in-use" resource and follow the PMU sharing guidelines outlined in
> > Intel's Performance Monitoring Unit Sharing Guide
> > (https://software.intel.com/en-us/articles/performance-monitoring-unit-guidelines/).

I would have to dig out the thread, but that wasn't followed on purpose
and is unlikely to ever be followed.

* Re: Question about Perf's handling of in-use performance counters
  2016-10-27 21:00   ` Peter Zijlstra
@ 2016-10-28 13:53     ` Andi Kleen
  2016-10-28 14:03       ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2016-10-28 13:53 UTC
  To: Peter Zijlstra; +Cc: Andi Kleen, Taylor Andrews, linux-perf-users

On Thu, Oct 27, 2016 at 11:00:12PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 27, 2016 at 11:11:24AM -0700, Andi Kleen wrote:
> > 
> > Taylor Andrews <andrewst@vmware.com> writes:
> > 
> > > First some background:
> > >
> > > VMware's virtual x86 performance counter implementation aims to expose
> > > in-use (unavailable) performance counters to the guest operating
> > > system in the hopes that software agents will recognize it as an
> > > "in-use" resource and follow the PMU sharing guidelines outlined in
> > > Intel's Performance Monitoring Unit Sharing Guide
> > > (https://software.intel.com/en-us/articles/performance-monitoring-unit-guidelines/).
> 
> I would have to dig out the thread, but that wasn't followed on purpose
> and is unlikely to ever be followed.

If I remember correctly it was about difficulty of implementation.

An already used generic counter could just be handled like a pinned
counter. The only change needed would be a hook into the scheduler to
allocate a specific counter. I don't think it would be that difficult
frankly.

The only open question is how often this would need to be rechecked. But
we already have some code to reserve the PMU when it is first used after
being idle (for the old oprofile code). This would probably be the right
place to (re-)do such a check.
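
Re-checking at the point where the PMU is reserved after being idle can be sketched as a reference count that rescans only on the idle-to-busy transition. This is a hypothetical C sketch; the scan callback stands in for reading the event-select enable bits and is not a kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hook: returns a bitmask of counters whose enable bits are set. */
typedef uint32_t (*scan_fn)(void *ctx);

/* Example scan for testing: reads a simulated snapshot of the enable bits. */
static uint32_t scan_snapshot(void *ctx)
{
    return *(const uint32_t *)ctx;
}

struct pmu_reservation {
    unsigned active_events;  /* perf events currently using the PMU */
    uint32_t reserved_mask;  /* counters found in use at the last rescan */
    scan_fn  scan;
    void    *scan_ctx;
};

/* Called when a perf event is created. Only the first user after an idle
 * period triggers a rescan, so the MSR reads are not repeated constantly. */
static void pmu_reserve(struct pmu_reservation *r)
{
    if (r->active_events++ == 0)
        r->reserved_mask = r->scan(r->scan_ctx);
}

/* Called when a perf event is destroyed. */
static void pmu_release(struct pmu_reservation *r)
{
    assert(r->active_events > 0);
    r->active_events--;
}
```

The trade-off is visible in the model: a counter claimed by someone else while perf is already busy is not noticed until the PMU goes idle and is reserved again.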

-Andi

* Re: Question about Perf's handling of in-use performance counters
  2016-10-28 13:53     ` Andi Kleen
@ 2016-10-28 14:03       ` Peter Zijlstra
  2016-10-28 15:40         ` Andi Kleen
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2016-10-28 14:03 UTC
  To: Andi Kleen; +Cc: Taylor Andrews, linux-perf-users

On Fri, Oct 28, 2016 at 06:53:25AM -0700, Andi Kleen wrote:
> On Thu, Oct 27, 2016 at 11:00:12PM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 27, 2016 at 11:11:24AM -0700, Andi Kleen wrote:
> > > 
> > > Taylor Andrews <andrewst@vmware.com> writes:
> > > 
> > > > First some background:
> > > >
> > > > VMware's virtual x86 performance counter implementation aims to expose
> > > > in-use (unavailable) performance counters to the guest operating
> > > > system in the hopes that software agents will recognize it as an
> > > > "in-use" resource and follow the PMU sharing guidelines outlined in
> > > > Intel's Performance Monitoring Unit Sharing Guide
> > > > (https://software.intel.com/en-us/articles/performance-monitoring-unit-guidelines/).
> > 
> > I would have to dig out the thread, but that wasn't followed on purpose
> > and is unlikely to ever be followed.
> 
> If I remember correctly it was about difficulty of implementation.
> 
> An already used generic counter could just be handled like a pinned
> counter. The only change needed would be a hook into the scheduler to
> allocate a specific counter. I don't think it would be that difficult
> frankly.
> 
> The only open question is how often this would need to be rechecked. But
> we already have some code to reserve the PMU when it is first used after
> being idle (for the old oprofile code). This would probably be the right
> place to (re-)do such a check.

No, it was about the (mis)guide-line having fundamental races and the
belief that the BIOS has no business whatsoever using these resources
to begin with.

* Re: Question about Perf's handling of in-use performance counters
  2016-10-28 14:03       ` Peter Zijlstra
@ 2016-10-28 15:40         ` Andi Kleen
  2016-10-28 16:28           ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2016-10-28 15:40 UTC
  To: Peter Zijlstra; +Cc: Andi Kleen, Taylor Andrews, linux-perf-users

> No, it was about the (mis)guide-line having fundamental races and the
> belief that the BIOS has no business what so ever using these resources
> to begin with.

In this case it's not the BIOS, but a hypervisor that allocates the
counter. I believe there are valid use cases here. 

The same issue can also happen when people use user space perfmon
tools, like PCM or likwid, and potentially with other non-BIOS users.

-Andi

* Re: Question about Perf's handling of in-use performance counters
  2016-10-28 15:40         ` Andi Kleen
@ 2016-10-28 16:28           ` Peter Zijlstra
  2016-10-28 16:33             ` Andi Kleen
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2016-10-28 16:28 UTC
  To: Andi Kleen; +Cc: Taylor Andrews, linux-perf-users

On Fri, Oct 28, 2016 at 08:40:12AM -0700, Andi Kleen wrote:
> > No, it was about the (mis)guide-line having fundamental races and the
> > belief that the BIOS has no business what so ever using these resources
> > to begin with.
> 
> In this case it's not the BIOS, but a hypervisor that allocates the
> counter. I believe there are valid use cases here. 
> 
> The same issue can also happen when people use user space perfmon
> tools, like PCM or likwid, and potentially with other non-BIOS users.

User space writing of the MSRs is not supported and people who do this
get to keep all pieces.

But the thing is still racy, and therefore useless.

* Re: Question about Perf's handling of in-use performance counters
  2016-10-28 16:28           ` Peter Zijlstra
@ 2016-10-28 16:33             ` Andi Kleen
  2016-10-28 18:28               ` Taylor Andrews
  2016-11-07 22:25               ` Taylor Andrews
  0 siblings, 2 replies; 12+ messages in thread
From: Andi Kleen @ 2016-10-28 16:33 UTC
  To: Peter Zijlstra; +Cc: Andi Kleen, Taylor Andrews, linux-perf-users

> But the thing is still racy, and therefore useless.

At least the Hypervisor case is not racy. The Hypervisor always
uses that counter.

-Andi

* Re: Question about Perf's handling of in-use performance counters
  2016-10-28 16:33             ` Andi Kleen
@ 2016-10-28 18:28               ` Taylor Andrews
  2016-11-30 15:44                 ` Taylor Andrews
  2016-11-07 22:25               ` Taylor Andrews
  1 sibling, 1 reply; 12+ messages in thread
From: Taylor Andrews @ 2016-10-28 18:28 UTC
  To: Andi Kleen, Peter Zijlstra; +Cc: linux-perf-users

Hi Andi, Peter,

Thanks so much for the responses. 

Hi Peter, 

> No, it was about the (mis)guide-line having fundamental races and the
> belief that the BIOS has no business what so ever using these resources
> to begin with.

Can you please describe what fundamental races you are talking about?

Why do you believe the BIOS should never be using performance counters?

Even if you discount the BIOS use case (despite Intel considering it legitimate), we still have the scenario of a hypervisor exposing some virtual performance counters as in-use and unavailable.  What was the rationale for intentionally ignoring the cooperative PMU sharing guidelines outlined by Intel, and for having perf attempt to take over all generic performance counters regardless of whether they are marked in-use?

It should be noted that this design decision degrades the experience of using perf in x86 VMware VMs and has confused perf users.  If one or more virtual generic counters are not available, the hypervisor drops attempted writes to them so as not to corrupt other entities that are using the real PMU hardware.  Using these in-use counters anyway results in perf reporting event counts of zero, with no errors or warnings that would inform the user of the real issue: there are not enough available PMU resources to complete the requested profiling.

Thanks,
Taylor

* Re: Question about Perf's handling of in-use performance counters
  2016-10-28 16:33             ` Andi Kleen
  2016-10-28 18:28               ` Taylor Andrews
@ 2016-11-07 22:25               ` Taylor Andrews
  1 sibling, 0 replies; 12+ messages in thread
From: Taylor Andrews @ 2016-11-07 22:25 UTC
  To: Andi Kleen; +Cc: linux-perf-users

Hi Andi,

Do you know of any perf development-oriented mailing lists that may hold more information about this behavior of attempting to use in-use general purpose performance counters?  

Thanks,
Taylor

* Re: Question about Perf's handling of in-use performance counters
  2016-10-28 18:28               ` Taylor Andrews
@ 2016-11-30 15:44                 ` Taylor Andrews
  2016-11-30 18:17                   ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Taylor Andrews @ 2016-11-30 15:44 UTC
  To: Andi Kleen, Peter Zijlstra; +Cc: linux-perf-users, linux-kernel, Alok Kataria

Hi,

Friendly ping as this discussion seems to have stalled.

For the full discussion please see http://www.spinics.net/lists/linux-perf-users/msg03168.html

Thanks,
Taylor

* Re: Question about Perf's handling of in-use performance counters
  2016-11-30 15:44                 ` Taylor Andrews
@ 2016-11-30 18:17                   ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2016-11-30 18:17 UTC
  To: Taylor Andrews
  Cc: Andi Kleen, linux-perf-users, linux-kernel, Alok Kataria, Ingo Molnar

On Wed, Nov 30, 2016 at 03:44:57PM +0000, Taylor Andrews wrote:

> > No, it was about the (mis)guide-line having fundamental races and the
> > belief that the BIOS has no business whatsoever using these resources
> > to begin with.
> 
> Can you please describe what fundamental races you are talking about?

The protocol does an RDMSR to see whether the counter is configured; if
not, it can proceed with the WRMSR.

Since these are two separate instructions, nothing prevents a vCPU
preemption or an SMI from coming in between and invalidating the
state we just read.
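
The race described above is a classic check-then-act window. A deterministic, single-threaded C simulation of it follows; the MSR is modeled as a plain variable, and `interrupt_fn` is a hypothetical stand-in for whatever runs between the two instructions (vCPU preemption, SMI).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVTSEL_ENABLE (1ULL << 22)

/* Simulated event-select MSR shared by the OS and another agent. */
struct msr_sim {
    uint64_t evtsel;     /* current MSR contents */
    uint64_t owner_val;  /* what the other agent last programmed */
};

/* Hypothetical code that runs between the RDMSR and the WRMSR. */
typedef void (*interrupt_fn)(struct msr_sim *m);

/* Example interruption: an SMI handler claims the counter. */
static void smi_claims_counter(struct msr_sim *m)
{
    m->owner_val = 0x4300c0;   /* instructions retired, USR+OS, EN */
    m->evtsel = m->owner_val;
}

/* The guideline's sequence: read, test EN, then write.
 * Returns true if the OS believed the counter was free and wrote it. */
static bool claim_counter(struct msr_sim *m, uint64_t val, interrupt_fn between)
{
    uint64_t seen = m->evtsel;         /* "RDMSR" */
    if (seen & EVTSEL_ENABLE)
        return false;                  /* honour the other user */
    if (between != NULL)
        between(m);                    /* the window the protocol cannot close */
    m->evtsel = val;                   /* "WRMSR": may clobber the other user */
    return true;
}
```

With the interruption, the OS still "wins" and silently overwrites the SMI handler's setting. If the other agent holds the counter permanently, as Andi argues for the hypervisor case, the initial read already sees EN set and the window does not matter.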

> Why do you believe the BIOS should never be using performance counters?

The BIOS's job is to boot the system and then stay the heck away.
Runtime services are not what it's for.

Now I know some hardware vendors like to (ab)use SMM for
fail^Wfeature-add, and that is exactly the kind of crap we don't need.
BIOS monkeys always get it wrong and BIOS code is _never_ updated, so
then you're stuck with wreckage.

> Even if you discount the BIOS use case (despite Intel considering it
> legitimate), we still have the scenario of a Hypervisor exposing some
> virtual performance counters as in-use and unavailable.  What was the
> rationale that the PMU cooperative sharing guidelines outlined by
> Intel should be intentionally ignored, and that perf should attempt to
> take over all generic performance counters, regardless of whether they
> are marked in-use?

See the thread here:

  https://lkml.org/lkml/2011/3/24/608

Thread overview: 12+ messages
2016-10-21 21:59 Question about Perf's handling of in-use performance counters Taylor Andrews
2016-10-27 18:11 ` Andi Kleen
2016-10-27 21:00   ` Peter Zijlstra
2016-10-28 13:53     ` Andi Kleen
2016-10-28 14:03       ` Peter Zijlstra
2016-10-28 15:40         ` Andi Kleen
2016-10-28 16:28           ` Peter Zijlstra
2016-10-28 16:33             ` Andi Kleen
2016-10-28 18:28               ` Taylor Andrews
2016-11-30 15:44                 ` Taylor Andrews
2016-11-30 18:17                   ` Peter Zijlstra
2016-11-07 22:25               ` Taylor Andrews
