RE: [PATCH v1 1/8] perf/x86: add support to mask counters from host

From: "Wang, Wei W" <wei.w.wang@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"ak@linux.intel.com" <ak@linux.intel.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"rkrcmar@redhat.com" <rkrcmar@redhat.com>,
	"Xu, Like" <like.xu@intel.com>
Subject: RE: [PATCH v1 1/8] perf/x86: add support to mask counters from host
Date: Mon, 5 Nov 2018 15:37:24 +0000
Message-ID: <286AC319A985734F985F78AFA26841F73DE3AC8B@shsmsx102.ccr.corp.intel.com>
In-Reply-To: <20181105121413.GC22431@hirez.programming.kicks-ass.net>

On Monday, November 5, 2018 8:14 PM, Peter Zijlstra wrote:
> The answer for PEBS is really simple; PEBS does not virtualize (Andi tried and
> can tell you why; IIRC it has something to do with how the hardware asks for
> a Linear Address instead of a Physical Address). So the problem will not arise.
> 
> But there are certainly constrained events that will result in the same
> problem.

Yes. I was just following the PEBS assumption to discuss the counter contention you mentioned.

> 
> The traditional approach of perf on resource contention is to share it; you
> get only partial runtime and can scale up the events given the runtime
> metrics provided.
> 
> We also have perf_event_attr::pinned, which is normally only available to
> root, in which case we'll end up marking any contending event to an error
> state.

Yes, that's one of the limitations of the existing host event emulation approach: in that case (i.e. when both events require counter 0), the existing approach cannot make guest perf work, even if the pinned event has ".exclude_guest" set.
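For reference, the sharing approach Peter describes works because perf reports how long each event was enabled and how long it actually ran on a counter, so a multiplexed count can be extrapolated as value * time_enabled / time_running. The sketch below assumes a single event read with PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING (no PERF_FORMAT_ID or PERF_FORMAT_GROUP); struct sample_read and scale_count() are illustrative names, not taken from the perf tool.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative mirror of the read(2) layout for a single event opened with
 * read_format = PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING.
 */
struct sample_read {
	uint64_t value;        /* raw count accumulated while on a counter */
	uint64_t time_enabled; /* ns the event was enabled */
	uint64_t time_running; /* ns the event was actually scheduled on a counter */
};

/* Extrapolate the raw count to the whole enabled period. */
static uint64_t scale_count(const struct sample_read *r)
{
	assert(r->time_running <= r->time_enabled);
	if (r->time_running == 0)
		return 0;        /* never got a counter: nothing to scale */
	if (r->time_running == r->time_enabled)
		return r->value; /* ran the whole time: no scaling needed */
	return (uint64_t)((double)r->value * r->time_enabled / r->time_running);
}
```

For an event that keeps its counter the whole time (as a successfully scheduled pinned event does), time_running equals time_enabled and the raw value is returned unchanged.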

> 
> Neither are ideal for MSR level emulation.
> 
> That can only work if the host counter has perf_event_attr::exclude_guest=1,
> any counter without that must also count when the guest is running.
> 
> (and, IIRC, normal perf tool events do not have that set by default)

Probably not. Please see line 81 of
https://github.com/torvalds/linux/blob/master/tools/perf/util/util.c
perf_guest is false by default, which causes "attr->exclude_guest = 1" to be set for normal perf tool events.

> The thing is; you cannot do blind pass-through of the PMU, some of its
> features simply do not work in a guest. Also, the host perf driver expects
> certain functionality that must be respected.

Actually, we are not blindly assigning the perf counters. The guest runs its own complete perf stack (like the one on the host), which applies its own event constraints. The counter that the guest requests from the hypervisor is the one selected by those constraints (i.e. the one that will work for that feature in the guest).

The counter is also not passed through to the guest: guest accesses to the assigned counter still exit to the hypervisor, which then updates the counter on the guest's behalf.
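A hypothetical model of this flow, which also shows where the contention Peter raised would surface in such a scheme: the guest asks for the counter its own constraints selected, the hypervisor masks that counter from the host only if the host is not already using it, and trapped guest reads and writes are then applied to that counter by the hypervisor. pmu_model, assign_counter_to_guest() and the emulate_guest_*() helpers are illustrative names, not the patch's or KVM's actual code, and the hw[] array merely stands in for the physical counter registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_COUNTERS 8

/* Hypothetical host PMU state; not the patch's or KVM's real structures. */
struct pmu_model {
	uint32_t host_used;        /* counters in use by host perf events */
	uint32_t guest_mask;       /* counters masked from the host, assigned to the guest */
	uint64_t hw[NR_COUNTERS];  /* stand-in for the physical counter registers */
};

/* Guest requests the counter its own event constraints selected. */
static bool assign_counter_to_guest(struct pmu_model *p, unsigned int idx)
{
	assert(idx < NR_COUNTERS);
	if (p->host_used & (1u << idx))
		return false; /* contention: the host needs the same counter */
	p->guest_mask |= 1u << idx;
	return true;
}

/* Runs in the hypervisor after a guest read of counter idx exits. */
static uint64_t emulate_guest_read(const struct pmu_model *p, unsigned int idx)
{
	assert(idx < NR_COUNTERS);
	return (p->guest_mask & (1u << idx)) ? p->hw[idx] : 0;
}

/* Runs in the hypervisor after a guest write of counter idx exits. */
static void emulate_guest_write(struct pmu_model *p, unsigned int idx, uint64_t val)
{
	assert(idx < NR_COUNTERS);
	if (p->guest_mask & (1u << idx))
		p->hw[idx] = val; /* only counters assigned to the guest are touched */
}
```

In this model, a guest request for a counter the host already uses fails up front, and a trapped access to a counter that was never assigned leaves the hardware state untouched.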

Please correct me if I misunderstood your point.

> 
> Those are the constraints you have to work with.
> 
> Back when we all started down this virt rathole, I proposed people do
> paravirt perf, where events would be handed to the host kernel and let the
> host kernel do its normal thing. But people wanted to do the MSR based
> thing because of !linux guests.

IMHO, it is worth focusing more on the real use case. When a user gets a virtual machine from a vendor, all the user can do is run perf inside the guest. The contention concerns above would not arise there, because the user has no access to the host to run perf on the virtualization software (e.g. ./perf qemu..) while also running perf in the guest to cause the contention.

On the other hand, improving the experience of running perf inside the guest by reducing the virtualization overhead would bring real benefits to that use case.

Best,
Wei