Re: [PATCH v2 05/15] KVM: vmx/pmu: Emulate MSR_ARCH_LBR_DEPTH for guest Arch LBR

From: "Yang, Weijiang" <weijiang.yang@intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: <pbonzini@redhat.com>, <jmattson@google.com>,
	<kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<like.xu.linux@gmail.com>, <kan.liang@linux.intel.com>,
	<wei.w.wang@intel.com>
Subject: Re: [PATCH v2 05/15] KVM: vmx/pmu: Emulate MSR_ARCH_LBR_DEPTH for guest Arch LBR
Date: Mon, 30 Jan 2023 19:46:37 +0800	[thread overview]
Message-ID: <b34bff98-9f2d-539f-7ee9-7bba09a8269a@intel.com> (raw)
In-Reply-To: <Y9QzLHNxS4K81SfU@google.com>

On 1/28/2023 4:25 AM, Sean Christopherson wrote:
> On Thu, Nov 24, 2022, Yang Weijiang wrote:
>> [...]
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -571,6 +571,9 @@ struct kvm_pmu {
>>   	 * redundant check before cleanup if guest don't use vPMU at all.
>>   	 */
>>   	u8 event_count;
>> +
>> +	/* Guest arch lbr depth supported by KVM. */
>> +	u64 kvm_arch_lbr_depth;
> There is zero reason to store this separately.  KVM already records the allowed
> depth in kvm_vcpu.lbr_desc.records.nr.

kvm_vcpu.lbr_desc.records.nr alone cannot tell whether it's legacy lbr or arch-lbr unless

binding host arch-lbr checking.

>
>>   };
>>   
>>   struct kvm_pmu_ops;
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index 905673228932..0c78cb4b72be 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -178,6 +178,10 @@ static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index)
>>   	    (index == MSR_LBR_SELECT || index == MSR_LBR_TOS))
>>   		return true;
>>   
>> +	if (index == MSR_ARCH_LBR_DEPTH)
>> +		return kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR) &&
> Like the previous patch, since intel_pmu_lbr_is_enabled() effectively serves as
> a generic kvm_cpu_cap_has(LBRS) check, this can be distilled to:
>
> 	if (cpu_feature_enabled(X86_FEATURE_ARCH_LBR)) {
> 		if (index == MSR_ARCH_LBR_DEPTH || index == MSR_ARCH_LBR_CTL)
> 			return true;
> 	} else {
> 		if (index == MSR_LBR_SELECT || index == MSR_LBR_TOS))
> 			return true;
> 	}

yes, exactly, thanks!

>> +		       guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR);
>> +
>>   	if ((index >= records->from && index < records->from + records->nr) ||
>>   	    (index >= records->to && index < records->to + records->nr))
>>   		return true;
>> @@ -345,6 +349,7 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   {
>>   	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>>   	struct kvm_pmc *pmc;
>> +	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
>>   	u32 msr = msr_info->index;
>>   
>>   	switch (msr) {
>> @@ -369,6 +374,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   	case MSR_PEBS_DATA_CFG:
>>   		msr_info->data = pmu->pebs_data_cfg;
>>   		return 0;
>> +	case MSR_ARCH_LBR_DEPTH:
>> +		msr_info->data = lbr_desc->records.nr;
>> +		return 0;
>>   	default:
>>   		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
>>   		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
>> @@ -395,6 +403,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   {
>>   	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>>   	struct kvm_pmc *pmc;
>> +	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
>>   	u32 msr = msr_info->index;
>>   	u64 data = msr_info->data;
>>   	u64 reserved_bits, diff;
>> @@ -456,6 +465,24 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   			return 0;
>>   		}
>>   		break;
>> +	case MSR_ARCH_LBR_DEPTH:
>> +		if (!pmu->kvm_arch_lbr_depth && !msr_info->host_initiated)
> Don't invent a new check, just prevent KVM from reaching this path via the
> existing intel_pmu_lbr_is_enabled().

intel_pmu_lbr_is_enabled() only indicates LBR is on(either legacy or 
arch-lbr), but

MSR_ARCH_LBR_DEPTH is only for arch-lbr.

>
>> +			return 1;
>> +		/*
>> +		 * When guest/host depth are different, the handling would be tricky,
>> +		 * so only max depth is supported for both host and guest.
>> +		 */
> This semi-arbitrary restriction is fine because Intel's architecture allows KVM
> to enumerate support for a single depth, but somewhere in the changelog and/or
> code that actually needs to be state.  This blurb
>
>    In the first generation of Arch LBR, max entry size is 32,
>    host configures the max size and guest always honors the setting.
>
> makes it sound like KVM is relying on the guest to do the right thing, and this
> code looks like KVM is making up it's own behavior.

Will modify the change log.

>
>> +		if (data != pmu->kvm_arch_lbr_depth)
>> +			return 1;
>> +
>> +		lbr_desc->records.nr = data;
>> +		/*
>> +		 * Writing depth MSR from guest could either setting the
>> +		 * MSR or resetting the LBR records with the side-effect.
>> +		 */
>> +		if (kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR))
> Another check, really?  KVM shouldn't reach this point if KVM doesn't support
> Arch LBRs.  And if that isn't guarantee (honestly forgot what this series actually
> proposed at this point), then that's a bug, full stop.

Right, this check is unnecessary.

>
>> +			wrmsrl(MSR_ARCH_LBR_DEPTH, lbr_desc->records.nr);
> IIUC, this is subtly broken.  Piecing together all of the undocumented bits, my
> understanding is that arch LBRs piggyback KVM's existing LBR support, i.e. use a
> "virtual" perf event.

Yes.

> And like traditional LBR support, the host can steal control
> of the LBRs in IRQ context by disabling the perf event via IPI.  And since writes
> to MSR_ARCH_LBR_DEPTH purge LBR records, this needs to be treated as if it were a
> write to an LBR record, i.e. belongs in the IRQs disabled section of
> intel_pmu_handle_lbr_msrs_access().

I assume you're referring to host events preempt guest events. In that 
case, it's possible

guest operations interfere host events/data. But this series 
implementation focus on

"guest only" mode, i.e., it sets {Load|Clear}_LBR_CTL at VM entry/exit, 
that way, we don't

need to care about host preempt, the event data is saved/restored at 
event sched_{out|in}.

>
> If for some magical reason it's safe to access arch LBR MSRs without disabling IRQs
> and confirming perf event ownership, I want to see a very detailed changelog
> explaining exactly how that magic works.

Will change the commit log to explain more.

>
>> +		return 0;
>>   	default:
>>   		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
>>   		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
>> @@ -506,6 +533,32 @@ static void setup_fixed_pmc_eventsel(struct kvm_pmu *pmu)
>>   	}
>>   }
>>   
>> +static bool cpuid_enable_lbr(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> +	struct kvm_cpuid_entry2 *entry;
>> +	int depth_bit;
>> +
>> +	if (!kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR))
>> +		return !static_cpu_has(X86_FEATURE_ARCH_LBR) &&
>> +			cpuid_model_is_consistent(vcpu);
>> +
>> +	pmu->kvm_arch_lbr_depth = 0;
>> +	if (!guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
>> +		return false;
>> +
>> +	entry = kvm_find_cpuid_entry(vcpu, 0x1C);
>> +	if (!entry)
>> +		return false;
>> +
>> +	depth_bit = fls(cpuid_eax(0x1C) & 0xff);
> This is unnecessarily fragile.  Get the LBR depth from perf, don't read CPUID and
> assume perf will always configured the max depth.,

Make sense, will refactor the function in next version.

>
> This enabling also belongs at the tail end of the series, i.e. KVM shouldn't let
> userspace enable LBRs until all the support pieces are in place.

OK.

>
>> +	if ((entry->eax & 0xff) != (1 << (depth_bit - 1)))
>> +		return false;
>> +
>> +	pmu->kvm_arch_lbr_depth = depth_bit * 8;
>> +	return true;
>> +}
>> +
[...]