linux-kernel.vger.kernel.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: Wei Wang <wei.w.wang@intel.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	pbonzini@redhat.com, ak@linux.intel.com, mingo@redhat.com,
	rkrcmar@redhat.com, like.xu@intel.com
Subject: Re: [PATCH v1 1/8] perf/x86: add support to mask counters from host
Date: Mon, 5 Nov 2018 13:14:13 +0100
Message-ID: <20181105121413.GC22431@hirez.programming.kicks-ass.net>
In-Reply-To: <5BE02725.3010707@intel.com>

On Mon, Nov 05, 2018 at 07:19:01PM +0800, Wei Wang wrote:
> On 11/05/2018 05:34 PM, Peter Zijlstra wrote:
> > On Fri, Nov 02, 2018 at 05:08:31PM +0800, Wei Wang wrote:
> > > On 11/01/2018 10:52 PM, Peter Zijlstra wrote:
> > > > > @@ -723,6 +724,9 @@ static void perf_sched_init(struct perf_sched *sched, struct event_constraint **
> > > > >    	sched->max_weight	= wmax;
> > > > >    	sched->max_gp		= gpmax;
> > > > >    	sched->constraints	= constraints;
> > > > > +#ifdef CONFIG_CPU_SUP_INTEL
> > > > > +	sched->state.used[0]	= cpuc->intel_ctrl_guest_mask;
> > > > > +#endif
> > > > NAK.  This completely undermines the whole purpose of event scheduling.
> > > > 
> > > Hi Peter,
> > > 
> > > Could you share more details how it would affect the host side event
> > > scheduling?
> > Not all counters are equal; suppose you have one of those chips that can
> > only do PEBS on counter 0, and then hand out 0 to the guest for some
> > silly event. That means nobody can use PEBS anymore.
> 
> Thanks for sharing your point.
> 
> In this example (assume PEBS can only work with counter 0), how would the
> existing approach (i.e. using host event to emulate) work?
> For example, guest wants to use PEBS, host also wants to use PEBS or other
> features that only counter 0 fits, I think either guest or host will not
> work then.

The answer for PEBS is really simple; PEBS does not virtualize (Andi
tried and can tell you why; IIRC it has something to do with how the
hardware asks for a Linear Address instead of a Physical Address). So
the problem will not arise.

But there are certainly constrained events that will result in the same
problem.
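
As a toy illustration of that problem (a sketch only, not the kernel's
perf_sched code; toy_assign() and the masks are hypothetical names made
up for this example): each event carries a bitmask of the counters it
may use, and a counter that is pre-marked as used, as the quoted hunk
does with intel_ctrl_guest_mask, is gone for every host event.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, NOT the kernel scheduler: allowed[i] is the set of counters
 * event i may run on (its constraint). Events are assumed to be ordered
 * most-constrained first; each one takes its lowest free allowed counter.
 * 'used' is the set of counters taken before scheduling starts.
 * Returns the number of events that got a counter.
 */
static int toy_assign(const uint64_t *allowed, int n, uint64_t used, int *out)
{
	int placed = 0;

	assert(allowed && out);
	for (int i = 0; i < n; i++) {
		uint64_t avail = allowed[i] & ~used;

		out[i] = -1;			/* -1: could not be scheduled */
		if (!avail)
			continue;
		out[i] = __builtin_ctzll(avail);	/* lowest free counter */
		used |= 1ULL << out[i];
		placed++;
	}
	return placed;
}
```

With one event restricted to counter 0 (mask 0x1) and one that may use
counters 0-3 (mask 0xf), starting from used = 0 places both events;
starting from used = 0x1 (counter 0 handed to the guest) leaves the
constrained event without a counter, no matter what else is free.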

The traditional approach of perf on resource contention is to share it;
you get only partial runtime and can scale up the events given the
runtime metrics provided.
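
A minimal sketch of that scaling, assuming the count was read together
with the enabled and running times (what PERF_FORMAT_TOTAL_TIME_ENABLED
and PERF_FORMAT_TOTAL_TIME_RUNNING report); scale_count() is a
hypothetical helper name, and the result is an estimate, not an exact
count:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the count an event would have reached had it been on a counter
 * for the whole time it was enabled. 'enabled' and 'running' are the
 * time_enabled / time_running values reported next to the raw count.
 */
static uint64_t scale_count(uint64_t raw, uint64_t enabled, uint64_t running)
{
	assert(running <= enabled);
	if (running == 0)
		return 0;	/* never scheduled: no basis for an estimate */
	/* go through double so raw * enabled cannot overflow 64 bits */
	return (uint64_t)((double)raw * enabled / running);
}
```

An event that counted 1000 while running for half of its enabled time
would be scaled to 2000.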

We also have perf_event_attr::pinned, which is normally only available
to root, in which case we'll end up marking any contending event to an
error state.

Neither is ideal for MSR-level emulation.

> With the register level virtualization approach, we could further support
> that case: if guest requests to use a counter which host happens to be
> using, we can let host and guest both be satisfied by supporting counter
> context switching on guest/host switching. In this case, both guest and host
> can use counter 0. (I think this is actually a policy selection, the current
> series chooses to be guest first, we can further change it if necessary)

That can only work if the host counter has
perf_event_attr::exclude_guest=1, any counter without that must also
count when the guest is running.

(and, IIRC, normal perf tool events do not have that set by default)
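
For reference, a sketch of a host event that opts out of counting while
a guest runs. Only the attribute setup is shown; the call to
perf_event_open(2) is left out, and init_host_only_cycles() is a
hypothetical helper name. The cycles event is just an example choice.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Fill in a CPU-cycles event that does not count in guest mode.
 * Without exclude_guest the event must keep counting across VM entry,
 * so its counter cannot simply be handed over to the guest.
 */
static void init_host_only_cycles(struct perf_event_attr *attr)
{
	assert(attr);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->exclude_guest = 1;	/* do not count while a guest runs */
}
```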

> > > Would you have any suggestions?
> > I would suggest not to use virt in the first place of course ;-)
> > 
> > But whatever you do; you have to keep using host events to emulate the
> > guest PMU. That doesn't mean you can't improve things; that code is
> > quite insane from what you told earlier.
> 
> I agree that the host event emulation is a functional approach, but it may
> not be an effective one (also got complaints from people about today's perf
> in the guest).
> We actually have similar problems when doing network virtualization. The
> more effective approach tends to be the one that bypasses the host network
> stack. Both the network stack and perf stack seem to be too heavy to be used
> as part of the emulation.

The thing is; you cannot do blind pass-through of the PMU, some of its
features simply do not work in a guest. Also, the host perf driver
expects certain functionality that must be respected.

Those are the constraints you have to work with.

Back when we all started down this virt rathole, I proposed people do
paravirt perf, where events would be handed to the host kernel and let
the host kernel do its normal thing. But people wanted to do the MSR
based thing because of !linux guests.

Now I don't care about virt much, but I care about !linux guests even
less.
