All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/3] KVM: x86: Allow HWCR.McStatusWrEn to be cleared once set
@ 2023-09-22 16:42 Jim Mattson
  2023-09-22 16:42 ` [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24] Jim Mattson
  2023-09-22 16:42 ` [PATCH 3/3] KVM: selftests: Test behavior of HWCR Jim Mattson
  0 siblings, 2 replies; 10+ messages in thread
From: Jim Mattson @ 2023-09-22 16:42 UTC (permalink / raw)
  To: kvm, 'Sean Christopherson ', 'Paolo Bonzini '; +Cc: Jim Mattson

When HWCR is set to 0, store 0 in vcpu->arch.msr_hwcr.

Fixes: 191c8137a939 ("x86/kvm: Implement HWCR support")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/x86.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6c9c81e82e65..3421ed7fcee0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3701,12 +3701,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		data &= ~(u64)0x8;	/* ignore TLB cache disable */
 
 		/* Handle McStatusWrEn */
-		if (data == BIT_ULL(18)) {
-			vcpu->arch.msr_hwcr = data;
-		} else if (data != 0) {
+		if (data & ~BIT_ULL(18)) {
 			kvm_pr_unimpl_wrmsr(vcpu, msr, data);
 			return 1;
 		}
+		vcpu->arch.msr_hwcr = data;
 		break;
 	case MSR_FAM10H_MMIO_CONF_BASE:
 		if (data != 0) {
-- 
2.42.0.515.g380fc7ccd1-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24]
  2023-09-22 16:42 [PATCH 1/3] KVM: x86: Allow HWCR.McStatusWrEn to be cleared once set Jim Mattson
@ 2023-09-22 16:42 ` Jim Mattson
  2023-09-22 17:21   ` Sean Christopherson
  2023-09-22 16:42 ` [PATCH 3/3] KVM: selftests: Test behavior of HWCR Jim Mattson
  1 sibling, 1 reply; 10+ messages in thread
From: Jim Mattson @ 2023-09-22 16:42 UTC (permalink / raw)
  To: kvm, 'Sean Christopherson ', 'Paolo Bonzini '; +Cc: Jim Mattson

On certain CPUs, Linux guests expect HWCR.TscFreqSel[bit 24] to be
set. If it isn't set, they complain:
	[Firmware Bug]: TSC doesn't count with P0 frequency!

Eliminate this complaint by setting the bit on virtual processors for
which Linux guests expect it to be set.

Note that this bit is read-only on said processors.

Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/cpuid.c | 10 ++++++++++
 arch/x86/kvm/x86.c   |  7 +++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0544e30b4946..2d7dcd13dcc3 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -373,6 +373,16 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
 	vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
 
+	/*
+	 * HWCR.TscFreqSel[bit 24] has a reset value of 1 on some processors.
+	 */
+	if (guest_cpuid_is_amd_or_hygon(vcpu) &&
+	    guest_cpuid_has(vcpu, X86_FEATURE_CONSTANT_TSC) &&
+	    (guest_cpuid_family(vcpu) > 0x10 ||
+	     (guest_cpuid_family(vcpu) == 0x10 &&
+	      guest_cpuid_model(vcpu) >= 2)))
+		vcpu->arch.msr_hwcr |= BIT(24);
+
 	kvm_pmu_refresh(vcpu);
 	vcpu->arch.cr4_guest_rsvd_bits =
 	    __cr4_reserved_bits(guest_cpuid_has, vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3421ed7fcee0..cb02a7c2938b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3699,12 +3699,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		data &= ~(u64)0x40;	/* ignore flush filter disable */
 		data &= ~(u64)0x100;	/* ignore ignne emulation enable */
 		data &= ~(u64)0x8;	/* ignore TLB cache disable */
+		data &= ~(u64)0x1000000;/* ignore TscFreqSel */
 
 		/* Handle McStatusWrEn */
 		if (data & ~BIT_ULL(18)) {
 			kvm_pr_unimpl_wrmsr(vcpu, msr, data);
 			return 1;
 		}
+
+		/*
+		 * When set, TscFreqSel is read-only. Attempts to
+		 * clear it are ignored.
+		 */
+		data |= vcpu->arch.msr_hwcr & BIT_ULL(24);
 		vcpu->arch.msr_hwcr = data;
 		break;
 	case MSR_FAM10H_MMIO_CONF_BASE:
-- 
2.42.0.515.g380fc7ccd1-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] KVM: selftests: Test behavior of HWCR
  2023-09-22 16:42 [PATCH 1/3] KVM: x86: Allow HWCR.McStatusWrEn to be cleared once set Jim Mattson
  2023-09-22 16:42 ` [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24] Jim Mattson
@ 2023-09-22 16:42 ` Jim Mattson
  1 sibling, 0 replies; 10+ messages in thread
From: Jim Mattson @ 2023-09-22 16:42 UTC (permalink / raw)
  To: kvm, 'Sean Christopherson ', 'Paolo Bonzini '; +Cc: Jim Mattson

Verify the following:
* Any bits that read as one cannot be cleared
* Attempts to set bits 3, 6, 8, or 24 are ignored
* Bit 18 is the only bit that can be set
* Any bit that can be set can also be cleared

Signed-off-by: Jim Mattson <jmattson@google.com>
---
 tools/testing/selftests/kvm/Makefile          |  1 +
 .../selftests/kvm/x86_64/hwcr_msr_test.c      | 57 +++++++++++++++++++
 2 files changed, 58 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/hwcr_msr_test.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index a3bb36fb3cfc..6b0219ca65eb 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -135,6 +135,7 @@ TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
 TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
 TEST_GEN_PROGS_x86_64 += system_counter_offset_test
+TEST_GEN_PROGS_x86_64 += x86_64/hwcr_msr_test
 
 # Compiled outputs used by test targets
 TEST_GEN_PROGS_EXTENDED_x86_64 += x86_64/nx_huge_pages_test
diff --git a/tools/testing/selftests/kvm/x86_64/hwcr_msr_test.c b/tools/testing/selftests/kvm/x86_64/hwcr_msr_test.c
new file mode 100644
index 000000000000..123267b44daf
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/hwcr_msr_test.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023, Google LLC.
+ *
+ * Tests for the K7_HWCR MSR.
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <sys/ioctl.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "vmx.h"
+
+void test_hwcr_bit(struct kvm_vcpu *vcpu, unsigned int bit)
+{
+	const unsigned long long ignored =
+		BIT_ULL(3) | BIT_ULL(6) | BIT_ULL(8) | BIT_ULL(24);
+	const unsigned long long legal = ignored | BIT_ULL(18);
+	uint64_t orig = vcpu_get_msr(vcpu, MSR_K7_HWCR);
+	uint64_t val = BIT_ULL(bit);
+	uint64_t check;
+	int r;
+
+	r = _vcpu_set_msr(vcpu, MSR_K7_HWCR, val);
+	TEST_ASSERT((r == 1 && (val & legal)) || (r == 0 && !(val & legal)),
+		    "Unexpected result (%d) when setting HWCR[bit %u]", r, bit);
+	check =	vcpu_get_msr(vcpu, MSR_K7_HWCR);
+	if (val & (legal & ~ignored)) {
+		TEST_ASSERT(check == (orig | val),
+			    "Bit %u: unexpected HWCR %lx; expected %lx", bit,
+			    check, orig | val);
+		_vcpu_set_msr(vcpu, MSR_K7_HWCR, 0);
+		check =	vcpu_get_msr(vcpu, MSR_K7_HWCR);
+		TEST_ASSERT(check == orig,
+			    "Bit %u: unexpected HWCR %lx; expected %lx", bit,
+			    check, orig);
+	} else {
+		TEST_ASSERT(check == orig,
+			    "Bit %u: unexpected HWCR %lx; expected %lx", bit,
+			    check, orig);
+	}
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	unsigned int bit;
+
+	vm = vm_create_with_one_vcpu(&vcpu, NULL);
+
+	for (bit = 0; bit < BITS_PER_LONG; bit++)
+		test_hwcr_bit(vcpu, bit);
+
+	kvm_vm_free(vm);
+}
-- 
2.42.0.515.g380fc7ccd1-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24]
  2023-09-22 16:42 ` [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24] Jim Mattson
@ 2023-09-22 17:21   ` Sean Christopherson
  2023-09-22 17:48     ` Jim Mattson
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2023-09-22 17:21 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm, 'Paolo Bonzini '

On Fri, Sep 22, 2023, Jim Mattson wrote:
> On certain CPUs, Linux guests expect HWCR.TscFreqSel[bit 24] to be
> set. If it isn't set, they complain:
> 	[Firmware Bug]: TSC doesn't count with P0 frequency!
> 
> Eliminate this complaint by setting the bit on virtual processors for
> which Linux guests expect it to be set.
> 
> Note that this bit is read-only on said processors.
> 
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 10 ++++++++++
>  arch/x86/kvm/x86.c   |  7 +++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0544e30b4946..2d7dcd13dcc3 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -373,6 +373,16 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
>  	vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
>  
> +	/*
> +	 * HWCR.TscFreqSel[bit 24] has a reset value of 1 on some processors.
> +	 */
> +	if (guest_cpuid_is_amd_or_hygon(vcpu) &&
> +	    guest_cpuid_has(vcpu, X86_FEATURE_CONSTANT_TSC) &&
> +	    (guest_cpuid_family(vcpu) > 0x10 ||
> +	     (guest_cpuid_family(vcpu) == 0x10 &&
> +	      guest_cpuid_model(vcpu) >= 2)))
> +		vcpu->arch.msr_hwcr |= BIT(24);

Oh hell no.  It's bad enough that KVM _allows_ setting uarch specific bits, but
actively setting bits is a step too far.

IMO, we should delete the offending kernel code.  I don't see how it provides any
value these days.

And *if* we want to change something in KVM so that we stop getting coustomer
complaints about a useless bit, just let userspace stuff the bit.

I think we should also raise the issue with AMD (Borislav maybe?) and ask/demand
that bits in HWCR that KVM allows to be set are architecturally defined.  It's
totally fine if the value of bit 24 is uarch specific, but the behavior needs to
be something that won't change from processor to processor.

>  	kvm_pmu_refresh(vcpu);
>  	vcpu->arch.cr4_guest_rsvd_bits =
>  	    __cr4_reserved_bits(guest_cpuid_has, vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3421ed7fcee0..cb02a7c2938b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3699,12 +3699,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		data &= ~(u64)0x40;	/* ignore flush filter disable */
>  		data &= ~(u64)0x100;	/* ignore ignne emulation enable */
>  		data &= ~(u64)0x8;	/* ignore TLB cache disable */
> +		data &= ~(u64)0x1000000;/* ignore TscFreqSel */
>  
>  		/* Handle McStatusWrEn */
>  		if (data & ~BIT_ULL(18)) {
>  			kvm_pr_unimpl_wrmsr(vcpu, msr, data);
>  			return 1;
>  		}
> +
> +		/*
> +		 * When set, TscFreqSel is read-only. Attempts to
> +		 * clear it are ignored.
> +		 */
> +		data |= vcpu->arch.msr_hwcr & BIT_ULL(24);


The bit is read-only from the guest, but KVM needs to let userspace clear the
bit.

>  		vcpu->arch.msr_hwcr = data;
>  		break;
>  	case MSR_FAM10H_MMIO_CONF_BASE:
> -- 
> 2.42.0.515.g380fc7ccd1-goog
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24]
  2023-09-22 17:21   ` Sean Christopherson
@ 2023-09-22 17:48     ` Jim Mattson
  2023-09-22 18:15       ` Sean Christopherson
  0 siblings, 1 reply; 10+ messages in thread
From: Jim Mattson @ 2023-09-22 17:48 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, Paolo Bonzini

On Fri, Sep 22, 2023 at 10:21 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Sep 22, 2023, Jim Mattson wrote:
> > On certain CPUs, Linux guests expect HWCR.TscFreqSel[bit 24] to be
> > set. If it isn't set, they complain:
> >       [Firmware Bug]: TSC doesn't count with P0 frequency!
> >
> > Eliminate this complaint by setting the bit on virtual processors for
> > which Linux guests expect it to be set.
> >
> > Note that this bit is read-only on said processors.
> >
> > Signed-off-by: Jim Mattson <jmattson@google.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 10 ++++++++++
> >  arch/x86/kvm/x86.c   |  7 +++++++
> >  2 files changed, 17 insertions(+)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 0544e30b4946..2d7dcd13dcc3 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -373,6 +373,16 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >       vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
> >       vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
> >
> > +     /*
> > +      * HWCR.TscFreqSel[bit 24] has a reset value of 1 on some processors.
> > +      */
> > +     if (guest_cpuid_is_amd_or_hygon(vcpu) &&
> > +         guest_cpuid_has(vcpu, X86_FEATURE_CONSTANT_TSC) &&
> > +         (guest_cpuid_family(vcpu) > 0x10 ||
> > +          (guest_cpuid_family(vcpu) == 0x10 &&
> > +           guest_cpuid_model(vcpu) >= 2)))
> > +             vcpu->arch.msr_hwcr |= BIT(24);
>
> Oh hell no.  It's bad enough that KVM _allows_ setting uarch specific bits, but
> actively setting bits is a step too far.

The bit should be set on power on. From the PPR for AMD Family 17h
Model 01h, Revision B1 Processors,

> TscFreqSel: TSC frequency select. Read-only. Reset: 1.

Leaving it clear is a violation of the CPU vendor's hardware specification.

> IMO, we should delete the offending kernel code.  I don't see how it provides any
> value these days.

Sure, but that doesn't help legacy guests.

> And *if* we want to change something in KVM so that we stop getting coustomer
> complaints about a useless bit, just let userspace stuff the bit.

We want to make customers happy. That should not even be a question.

> I think we should also raise the issue with AMD (Borislav maybe?) and ask/demand
> that bits in HWCR that KVM allows to be set are architecturally defined.  It's
> totally fine if the value of bit 24 is uarch specific, but the behavior needs to
> be something that won't change from processor to processor.
>
> >       kvm_pmu_refresh(vcpu);
> >       vcpu->arch.cr4_guest_rsvd_bits =
> >           __cr4_reserved_bits(guest_cpuid_has, vcpu);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 3421ed7fcee0..cb02a7c2938b 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3699,12 +3699,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >               data &= ~(u64)0x40;     /* ignore flush filter disable */
> >               data &= ~(u64)0x100;    /* ignore ignne emulation enable */
> >               data &= ~(u64)0x8;      /* ignore TLB cache disable */
> > +             data &= ~(u64)0x1000000;/* ignore TscFreqSel */
> >
> >               /* Handle McStatusWrEn */
> >               if (data & ~BIT_ULL(18)) {
> >                       kvm_pr_unimpl_wrmsr(vcpu, msr, data);
> >                       return 1;
> >               }
> > +
> > +             /*
> > +              * When set, TscFreqSel is read-only. Attempts to
> > +              * clear it are ignored.
> > +              */
> > +             data |= vcpu->arch.msr_hwcr & BIT_ULL(24);
>
>
> The bit is read-only from the guest, but KVM needs to let userspace clear the
> bit.

Why? We don't let userspace clear bit 1 of EFLAGS, which is also a
"reads as one" bit.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24]
  2023-09-22 17:48     ` Jim Mattson
@ 2023-09-22 18:15       ` Sean Christopherson
  2023-09-22 18:27         ` Jim Mattson
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2023-09-22 18:15 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm, Paolo Bonzini

On Fri, Sep 22, 2023, Jim Mattson wrote:
> On Fri, Sep 22, 2023 at 10:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > IMO, we should delete the offending kernel code.  I don't see how it provides any
> > value these days.
> 
> Sure, but that doesn't help legacy guests.

Heh, IMO they don't need help, their owners just need to be placated ;-)

> > And *if* we want to change something in KVM so that we stop getting coustomer
> > complaints about a useless bit, just let userspace stuff the bit.
> 
> We want to make customers happy. That should not even be a question.

Can we really not tell them "this is a benign guest bug, ignore it"?

> > I think we should also raise the issue with AMD (Borislav maybe?) and ask/demand
> > that bits in HWCR that KVM allows to be set are architecturally defined.  It's
> > totally fine if the value of bit 24 is uarch specific, but the behavior needs to
> > be something that won't change from processor to processor.
> >
> > >       kvm_pmu_refresh(vcpu);
> > >       vcpu->arch.cr4_guest_rsvd_bits =
> > >           __cr4_reserved_bits(guest_cpuid_has, vcpu);
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 3421ed7fcee0..cb02a7c2938b 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -3699,12 +3699,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > >               data &= ~(u64)0x40;     /* ignore flush filter disable */
> > >               data &= ~(u64)0x100;    /* ignore ignne emulation enable */
> > >               data &= ~(u64)0x8;      /* ignore TLB cache disable */
> > > +             data &= ~(u64)0x1000000;/* ignore TscFreqSel */
> > >
> > >               /* Handle McStatusWrEn */
> > >               if (data & ~BIT_ULL(18)) {
> > >                       kvm_pr_unimpl_wrmsr(vcpu, msr, data);
> > >                       return 1;
> > >               }
> > > +
> > > +             /*
> > > +              * When set, TscFreqSel is read-only. Attempts to
> > > +              * clear it are ignored.
> > > +              */
> > > +             data |= vcpu->arch.msr_hwcr & BIT_ULL(24);
> >
> >
> > The bit is read-only from the guest, but KVM needs to let userspace clear the
> > bit.
> 
> Why? We don't let userspace clear bit 1 of EFLAGS, which is also a
> "reads as one" bit.

Because that's architectural behavior, not dependent on FMS, and KVM needs to
write EFLAGS to have any hope of being useful, i.e. giving ownership of EFLAGS
to userspace is not a realistic option.

As proposed, if userspace sets CPUID to a magic FMS, and then changes the FMS to
something else, kvm_vcpu_after_set_cpuid() will not clear the bit and KVM will
end up wrongly enumerating the bit.  I doubt userspace would ever do that, but
it's at least possible.

That could be fixed by actively clearing vcpu->arch.msr_hwcr for other FMS values,
but then KVM would have to be 100% precise on the FMS matching, which would be a
maintenance nightmare.

In other words, userspace owns the vCPU model, and for good reasons.  KVM needs
to allow userspace to define a sane model, but with *very* few exceptions, KVM
should not try to "help" userspace by stuffing bits.

Pretty much everytime KVM tries to help, it causes problems.  E.g. initializing
perf_capabilities to kvm_caps.supported_perf_cap seems like a good thing, except
it presents a bogus model if userspace decides to not enumerate a vPMU to the
guest (Aaron was allegedly going to send a patch for this...).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24]
  2023-09-22 18:15       ` Sean Christopherson
@ 2023-09-22 18:27         ` Jim Mattson
  2023-09-22 19:40           ` Sean Christopherson
  0 siblings, 1 reply; 10+ messages in thread
From: Jim Mattson @ 2023-09-22 18:27 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, Paolo Bonzini

On Fri, Sep 22, 2023 at 11:15 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Sep 22, 2023, Jim Mattson wrote:
> > On Fri, Sep 22, 2023 at 10:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > > IMO, we should delete the offending kernel code.  I don't see how it provides any
> > > value these days.
> >
> > Sure, but that doesn't help legacy guests.
>
> Heh, IMO they don't need help, their owners just need to be placated ;-)
>
> > > And *if* we want to change something in KVM so that we stop getting coustomer
> > > complaints about a useless bit, just let userspace stuff the bit.
> >
> > We want to make customers happy. That should not even be a question.
>
> Can we really not tell them "this is a benign guest bug, ignore it"?

What is the mechanism for doing that?

> > > I think we should also raise the issue with AMD (Borislav maybe?) and ask/demand
> > > that bits in HWCR that KVM allows to be set are architecturally defined.  It's
> > > totally fine if the value of bit 24 is uarch specific, but the behavior needs to
> > > be something that won't change from processor to processor.
> > >
> > > >       kvm_pmu_refresh(vcpu);
> > > >       vcpu->arch.cr4_guest_rsvd_bits =
> > > >           __cr4_reserved_bits(guest_cpuid_has, vcpu);
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 3421ed7fcee0..cb02a7c2938b 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -3699,12 +3699,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > >               data &= ~(u64)0x40;     /* ignore flush filter disable */
> > > >               data &= ~(u64)0x100;    /* ignore ignne emulation enable */
> > > >               data &= ~(u64)0x8;      /* ignore TLB cache disable */
> > > > +             data &= ~(u64)0x1000000;/* ignore TscFreqSel */
> > > >
> > > >               /* Handle McStatusWrEn */
> > > >               if (data & ~BIT_ULL(18)) {
> > > >                       kvm_pr_unimpl_wrmsr(vcpu, msr, data);
> > > >                       return 1;
> > > >               }
> > > > +
> > > > +             /*
> > > > +              * When set, TscFreqSel is read-only. Attempts to
> > > > +              * clear it are ignored.
> > > > +              */
> > > > +             data |= vcpu->arch.msr_hwcr & BIT_ULL(24);
> > >
> > >
> > > The bit is read-only from the guest, but KVM needs to let userspace clear the
> > > bit.
> >
> > Why? We don't let userspace clear bit 1 of EFLAGS, which is also a
> > "reads as one" bit.
>
> Because that's architectural behavior, not dependent on FMS, and KVM needs to
> write EFLAGS to have any hope of being useful, i.e. giving ownership of EFLAGS
> to userspace is not a realistic option.

Remind me what "MSR" stands for. :)

> As proposed, if userspace sets CPUID to a magic FMS, and then changes the FMS to
> something else, kvm_vcpu_after_set_cpuid() will not clear the bit and KVM will
> end up wrongly enumerating the bit.  I doubt userspace would ever do that, but
> it's at least possible.
>
> That could be fixed by actively clearing vcpu->arch.msr_hwcr for other FMS values,
> but then KVM would have to be 100% precise on the FMS matching, which would be a
> maintenance nightmare.

What if I did something crude like we do for MSR_IA32_MISC_ENABLE, and
just set the bit at reset regardless of FMS:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cb02a7c2938b..4d7d0de42a9d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12086,6 +12086,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu,
bool init_event)
                vcpu->arch.msr_misc_features_enables = 0;
                vcpu->arch.ia32_misc_enable_msr =
MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |

MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
+               vcpu_arch.msr_hwcr = BIT_ULL(24);

                __kvm_set_xcr(vcpu, 0, XFEATURE_MASK_FP);
                __kvm_set_msr(vcpu, MSR_IA32_XSS, 0, true);

> In other words, userspace owns the vCPU model, and for good reasons.  KVM needs
> to allow userspace to define a sane model, but with *very* few exceptions, KVM
> should not try to "help" userspace by stuffing bits.

Okay. What about the IA32_MISC_ENABLE bits above?

> Pretty much everytime KVM tries to help, it causes problems.  E.g. initializing
> perf_capabilities to kvm_caps.supported_perf_cap seems like a good thing, except
> it presents a bogus model if userspace decides to not enumerate a vPMU to the
> guest (Aaron was allegedly going to send a patch for this...).

KVM is nothing if not inconsistent.

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24]
  2023-09-22 18:27         ` Jim Mattson
@ 2023-09-22 19:40           ` Sean Christopherson
  2023-09-22 20:16             ` Jim Mattson
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2023-09-22 19:40 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm, Paolo Bonzini

On Fri, Sep 22, 2023, Jim Mattson wrote:
> On Fri, Sep 22, 2023 at 11:15 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Sep 22, 2023, Jim Mattson wrote:
> > > On Fri, Sep 22, 2023 at 10:21 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > IMO, we should delete the offending kernel code.  I don't see how it provides any
> > > > value these days.
> > >
> > > Sure, but that doesn't help legacy guests.
> >
> > Heh, IMO they don't need help, their owners just need to be placated ;-)
> >
> > > > And *if* we want to change something in KVM so that we stop getting coustomer
> > > > complaints about a useless bit, just let userspace stuff the bit.
> > >
> > > We want to make customers happy. That should not even be a question.
> >
> > Can we really not tell them "this is a benign guest bug, ignore it"?
> 
> What is the mechanism for doing that?

Not my problem?  :-D

> > Because that's architectural behavior, not dependent on FMS, and KVM needs to
> > write EFLAGS to have any hope of being useful, i.e. giving ownership of EFLAGS
> > to userspace is not a realistic option.
> 
> Remind me what "MSR" stands for. :)

Heh, that acronym has long, long since lost all meaning.

Joking aside, I think KVM needs to set a very, very high bare for emulating any
part of any MSR that is truly model specific.  IMO, it's far too likely that KVM
will be the one left holding the bag in such situations.

> > As proposed, if userspace sets CPUID to a magic FMS, and then changes the FMS to
> > something else, kvm_vcpu_after_set_cpuid() will not clear the bit and KVM will
> > end up wrongly enumerating the bit.  I doubt userspace would ever do that, but
> > it's at least possible.
> >
> > That could be fixed by actively clearing vcpu->arch.msr_hwcr for other FMS values,
> > but then KVM would have to be 100% precise on the FMS matching, which would be a
> > maintenance nightmare.
> 
> What if I did something crude like we do for MSR_IA32_MISC_ENABLE, and
> just set the bit at reset regardless of FMS:

I'd prefer that over playing games with FMS.  Though my first chioce would still
be to punt the decision to userspace.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cb02a7c2938b..4d7d0de42a9d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12086,6 +12086,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu,
> bool init_event)
>                 vcpu->arch.msr_misc_features_enables = 0;
>                 vcpu->arch.ia32_misc_enable_msr =
> MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |
> 
> MSR_IA32_MISC_ENABLE_BTS_UNAVAIL;
> +               vcpu_arch.msr_hwcr = BIT_ULL(24);
> 
>                 __kvm_set_xcr(vcpu, 0, XFEATURE_MASK_FP);
>                 __kvm_set_msr(vcpu, MSR_IA32_XSS, 0, true);
> 
> > In other words, userspace owns the vCPU model, and for good reasons.  KVM needs
> > to allow userspace to define a sane model, but with *very* few exceptions, KVM
> > should not try to "help" userspace by stuffing bits.
> 
> Okay. What about the IA32_MISC_ENABLE bits above?

One of the exceptions where I don't see a better option, and hopefully something
that Intel won't repeat in the future.  Though I'm not exactly brimming with
confidence that Intel won't retroactively add more "gotcha! unsupported!" bits
in the future when they realize they forgot add a useful CPUID feature bit.

> > Pretty much everytime KVM tries to help, it causes problems.  E.g. initializing
> > perf_capabilities to kvm_caps.supported_perf_cap seems like a good thing, except
> > it presents a bogus model if userspace decides to not enumerate a vPMU to the
> > guest (Aaron was allegedly going to send a patch for this...).
> 
> KVM is nothing if not inconsistent.

Yeah, the existing inconsistencies are painful, but that's not a good reason to
continue what I see as bad behavior.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24]
  2023-09-22 19:40           ` Sean Christopherson
@ 2023-09-22 20:16             ` Jim Mattson
  2023-09-22 20:51               ` Sean Christopherson
  0 siblings, 1 reply; 10+ messages in thread
From: Jim Mattson @ 2023-09-22 20:16 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, Paolo Bonzini

On Fri, Sep 22, 2023 at 12:40 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Sep 22, 2023, Jim Mattson wrote:
> > Okay. What about the IA32_MISC_ENABLE bits above?
>
> One of the exceptions where I don't see a better option, and hopefully something
> that Intel won't repeat in the future.  Though I'm not exactly brimming with
> confidence that Intel won't retroactively add more "gotcha! unsupported!" bits
> in the future when they realize they forgot add a useful CPUID feature bit.

I don't understand the difference here. Why not make userspace
responsible for setting these bits as well?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24]
  2023-09-22 20:16             ` Jim Mattson
@ 2023-09-22 20:51               ` Sean Christopherson
  0 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2023-09-22 20:51 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm, Paolo Bonzini

On Fri, Sep 22, 2023, Jim Mattson wrote:
> On Fri, Sep 22, 2023 at 12:40 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Sep 22, 2023, Jim Mattson wrote:
> > > Okay. What about the IA32_MISC_ENABLE bits above?
> >
> > One of the exceptions where I don't see a better option, and hopefully something
> > that Intel won't repeat in the future.  Though I'm not exactly brimming with
> > confidence that Intel won't retroactively add more "gotcha! unsupported!" bits
> > in the future when they realize they forgot add a useful CPUID feature bit.
> 
> I don't understand the difference here. Why not make userspace
> responsible for setting these bits as well?

That probably would have been the ideal approach.  I'm not entirely sure it would
have actually been feasible though, as I suspect enumerting X86_FEATURE_DS without
any kind of guard would break userspace that reflects KVM_GET_SUPPORTED_CPUID
back into KVM_SET_CPUID(2).

Even better would have been to never merge PEBS support in KVM in its current
form.  The whole thing is a house of cards, e.g. if counters are "cross-mapped"
then the guest counters simply stop working.  And those warts aside, the entire
enabling was a chaotic mess.  See commit 9fc222967a39 ("KVM: x86: Give host
userspace full control of MSR_IA32_MISC_ENABLES").

In other words, setting the UNAVAILABLE bits was the least awful way to salvage
the mess.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-09-22 20:52 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-22 16:42 [PATCH 1/3] KVM: x86: Allow HWCR.McStatusWrEn to be cleared once set Jim Mattson
2023-09-22 16:42 ` [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24] Jim Mattson
2023-09-22 17:21   ` Sean Christopherson
2023-09-22 17:48     ` Jim Mattson
2023-09-22 18:15       ` Sean Christopherson
2023-09-22 18:27         ` Jim Mattson
2023-09-22 19:40           ` Sean Christopherson
2023-09-22 20:16             ` Jim Mattson
2023-09-22 20:51               ` Sean Christopherson
2023-09-22 16:42 ` [PATCH 3/3] KVM: selftests: Test behavior of HWCR Jim Mattson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.