From: Wei Wang <wei.w.wang@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	pbonzini@redhat.com, ak@linux.intel.com, mingo@redhat.com,
	rkrcmar@redhat.com, like.xu@intel.com
Subject: Re: [PATCH v1] KVM/x86/vPMU: Guest PMI Optimization
Date: Sun, 14 Oct 2018 20:53:55 +0800
Message-ID: <5BC33C63.7010904@intel.com>
In-Reply-To: <20181013133013.GA15612@worktop.programming.kicks-ass.net>

On 10/13/2018 09:30 PM, Peter Zijlstra wrote:
> On Fri, Oct 12, 2018 at 08:20:17PM +0800, Wei Wang wrote:
>> Guest changing MSR_CORE_PERF_GLOBAL_CTRL causes KVM to reprogram pmc
>> counters, which re-allocates a host perf event. This process is
> Yea gawds, that's horrific. Why does it do that? We have
> PERF_EVENT_IOC_PERIOD which does that much better. Still, what you're
> proposing is faster still -- if it is correct.

I'm not sure about the backstory; it was probably just the initial
functional implementation.
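
If I understand the suggestion correctly, PERF_EVENT_IOC_PERIOD
reprograms the period of a live event in place, with no event
reallocation. A minimal userspace sketch (fd assumed to come from
perf_event_open()):

#include <sys/ioctl.h>
#include <linux/perf_event.h>

/*
 * Sketch: update the sampling period of an existing perf event.
 * The kernel reprograms the counter in place; the event is not
 * torn down and recreated.
 */
static int update_period(int fd, __u64 new_period)
{
	return ioctl(fd, PERF_EVENT_IOC_PERIOD, &new_period);
}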

>> This patch implements a fast path to handle the guest change of
>> MSR_CORE_PERF_GLOBAL_CTRL for the guest PMI case. The guest's change
>> of the MSR is applied to the hardware when entering the guest, and
>> the old perf event continues to be used. The guest's setting of the
>> perf counter for the next irq period in the PMI is also written
>> directly to the hardware counter when entering the guest.
> What you're failing to explain here is why exactly it is ok to write to
> the MSR directly without updating the perf_event state. I didn't take
> the time to go through all that, but it certainly needs documenting.

OK. The guest itself has the perf event (the one that is actually using 
the hardware counter), and that event's state is managed by the guest's 
perf core. The host-side perf event isn't the one using the hardware 
counter; essentially, it is there on the host just to reserve the 
counter (via the host perf core) for the guest. The MSR write here is 
performed on behalf of the guest's perf event.
So, for the host-side perf event, I think its state should stay active 
as long as the guest is using the counter. The state will be changed to 
inactive (as usual) when the vCPU is scheduled out.
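
For context, the host-side placeholder event is created roughly like
this (a sketch modeled on pmc_reprogram_counter() in arch/x86/kvm/pmu.c;
fields and error handling simplified):

/*
 * The event exists only to reserve the hardware counter for the
 * guest: it counts in guest mode only, and its overflow callback
 * is what forwards a PMI to the guest.
 */
static void pmc_setup_host_event(struct kvm_pmc *pmc, u32 type,
				 u64 config, bool intr)
{
	struct perf_event_attr attr = {
		.type		= type,
		.size		= sizeof(attr),
		.pinned		= true,	/* keep the counter reserved */
		.exclude_host	= 1,	/* count in guest mode only */
		.config		= config,
	};
	struct perf_event *event;

	event = perf_event_create_kernel_counter(&attr, -1, current,
						 intr ? kvm_perf_overflow_intr
						      : kvm_perf_overflow,
						 pmc);
	if (!IS_ERR(event))
		pmc->perf_event = event;
}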

> This is something that can certainly get broken by accident.
>
> Is there any documentation/comment that explains how this virtual PMU
> crud works in general?

I haven't found any useful documentation so far.


>> +u64 intel_pmu_disable_guest_counters(void)
>> +{
>> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> +	u64 mask = cpuc->intel_ctrl_host_mask;
>> +
>> +	cpuc->intel_ctrl_host_mask = ULONG_MAX;
>> +
>> +	return mask;
>> +}
>> +EXPORT_SYMBOL_GPL(intel_pmu_disable_guest_counters);
> OK, this then gets the MSR written when we re-enter the guest, after
> the WRMSR trap, right?

Yes, the guest value will be loaded to the MSR.
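
For reference, those masks feed the guest/host values that the atomic
MSR switch loads around VM-entry/exit, roughly like this (abridged from
intel_guest_get_msrs() in arch/x86/events/intel/core.c; the PEBS
handling is elided):

static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
{
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
	struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;

	/*
	 * With intel_ctrl_host_mask set to ULONG_MAX above, the guest
	 * value becomes 0, i.e. all counters stay off while the guest
	 * runs, until the saved mask is restored on the enable path.
	 */
	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
	arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
	arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
	*nr = 1;

	return arr;
}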

>
>> +		/*
>> +		 * The guest PMI handler is asking for enabling the perf
>> +		 * counters. This happens at the end of the guest PMI handler,
>> +		 * so clear in_pmi.
>> +		 */
>> +		intel_pmu_enable_guest_counters(pmu->counter_mask);
>> +		pmu->in_pmi = false;
>> +	}
>> +}
> The v4 PMI handler does not in fact do that, I think.
>
>> @@ -237,9 +267,23 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   	default:
>>   		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
>>   		    (pmc = get_fixed_pmc(pmu, msr))) {
>> -			if (!msr_info->host_initiated)
>> -				data = (s64)(s32)data;
>> -			pmc->counter += data - pmc_read_counter(pmc);
>> +			if (pmu->in_pmi) {
>> +				/*
>> +				 * Since we are not re-allocating a perf event
>> +				 * to reconfigure the sampling time when the
>> +				 * guest pmu is in PMI, just set the value to
>> +				 * the hardware perf counter. Counting will
>> +				 * continue after the guest enables the
>> +				 * counter bit in MSR_CORE_PERF_GLOBAL_CTRL.
>> +				 */
>> +				struct hw_perf_event *hwc =
>> +						&pmc->perf_event->hw;
>> +				wrmsrl(hwc->event_base, data);
> But all this relies on the event calling the overflow handler; how does
> this not corrupt the event state such that x86_perf_event_set_period()
> might decide that the generated PMI is a spurious one?
>

We will make the optimization more general in the next version, instead 
of relying on the PMI, so the two questions above won't apply then.
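
(The concern is understood, by the way: perf keeps its own cached view
of the counter, so a raw write can desynchronize it. Abridged from
x86_perf_event_update() in arch/x86/events/core.c, with the cmpxchg
retry loop elided:)

u64 x86_perf_event_update(struct perf_event *event)
{
	struct hw_perf_event *hwc = &event->hw;
	int shift = 64 - x86_pmu.cntval_bits;
	u64 prev_raw_count, new_raw_count;
	s64 delta;

	/*
	 * The event count is maintained as a delta against
	 * hwc->prev_count. A raw wrmsrl() to the counter does not
	 * refresh prev_count (or period_left), so the next update
	 * would compute a bogus delta unless both are kept in sync.
	 */
	prev_raw_count = local64_read(&hwc->prev_count);
	rdpmcl(hwc->event_base_rdpmc, new_raw_count);
	local64_set(&hwc->prev_count, new_raw_count);

	delta = (new_raw_count << shift) - (prev_raw_count << shift);
	delta >>= shift;

	local64_add(delta, &event->count);
	local64_sub(delta, &hwc->period_left);

	return new_raw_count;
}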

Best,
Wei



