stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
@ 2022-10-27  9:20 Paolo Bonzini
  2023-01-24 23:16 ` Jim Mattson
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2022-10-27  9:20 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc, stable

Passing the host topology to the guest is almost certainly wrong
and will confuse the scheduler.  In addition, several fields of
these CPUID leaves vary on each processor; it is simply impossible to
return the right values from KVM_GET_SUPPORTED_CPUID in such a way that
they can be passed to KVM_SET_CPUID2.

The values that will most likely prevent confusion are all zeroes.
Userspace will have to override it anyway if it wishes to present a
specific topology to the guest.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/api.rst | 14 ++++++++++++++
 arch/x86/kvm/cpuid.c           | 32 ++++++++++++++++----------------
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index eee9f857a986..20f4f6b302ff 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8249,6 +8249,20 @@ CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``
 It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel
 has enabled in-kernel emulation of the local APIC.
 
+CPU topology
+~~~~~~~~~~~~
+
+Several CPUID values include topology information for the host CPU:
+0x0b and 0x1f for Intel systems, 0x8000001e for AMD systems.  Different
+versions of KVM return different values for this information and userspace
+should not rely on it.  Currently they return all zeroes.
+
+If userspace wishes to set up a guest topology, it should be careful that
+the values of these three leaves differ for each CPU.  In particular,
+the APIC ID is found in EDX for all subleaves of 0x0b and 0x1f, and in EAX
+for 0x8000001e; the latter also encodes the core id and node id in bits
+7:0 of EBX and ECX respectively.
+
 Obsolete ioctls and capabilities
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0810e93cbedc..164bfb7e7a16 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -759,16 +759,22 @@ struct kvm_cpuid_array {
 	int nent;
 };
 
+static struct kvm_cpuid_entry2 *get_next_cpuid(struct kvm_cpuid_array *array)
+{
+	if (array->nent >= array->maxnent)
+		return NULL;
+
+	return &array->entries[array->nent++];
+}
+
 static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
 					      u32 function, u32 index)
 {
-	struct kvm_cpuid_entry2 *entry;
+	struct kvm_cpuid_entry2 *entry = get_next_cpuid(array);
 
-	if (array->nent >= array->maxnent)
+	if (!entry)
 		return NULL;
 
-	entry = &array->entries[array->nent++];
-
 	memset(entry, 0, sizeof(*entry));
 	entry->function = function;
 	entry->index = index;
@@ -945,22 +951,13 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->edx = edx.full;
 		break;
 	}
-	/*
-	 * Per Intel's SDM, the 0x1f is a superset of 0xb,
-	 * thus they can be handled by common code.
-	 */
 	case 0x1f:
 	case 0xb:
 		/*
-		 * Populate entries until the level type (ECX[15:8]) of the
-		 * previous entry is zero.  Note, CPUID EAX.{0x1f,0xb}.0 is
-		 * the starting entry, filled by the primary do_host_cpuid().
+		 * No topology; a valid topology is indicated by the presence
+		 * of subleaf 1.
 		 */
-		for (i = 1; entry->ecx & 0xff00; ++i) {
-			entry = do_host_cpuid(array, function, i);
-			if (!entry)
-				goto out;
-		}
+		entry->eax = entry->ebx = entry->ecx = 0;
 		break;
 	case 0xd: {
 		u64 permitted_xcr0 = kvm_caps.supported_xcr0 & xstate_get_guest_group_perm();
@@ -1193,6 +1190,9 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->ebx = entry->ecx = entry->edx = 0;
 		break;
 	case 0x8000001e:
+		/* Do not return host topology information.  */
+		entry->eax = entry->ebx = entry->ecx = 0;
+		entry->edx = 0; /* reserved */
 		break;
 	case 0x8000001F:
 		if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  2022-10-27  9:20 [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID Paolo Bonzini
@ 2023-01-24 23:16 ` Jim Mattson
  2023-01-25 14:17   ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Jim Mattson @ 2023-01-24 23:16 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc, stable

On Thu, Oct 27, 2022 at 2:21 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Passing the host topology to the guest is almost certainly wrong
> and will confuse the scheduler.  In addition, several fields of
> these CPUID leaves vary on each processor; it is simply impossible to
> return the right values from KVM_GET_SUPPORTED_CPUID in such a way that
> they can be passed to KVM_SET_CPUID2.
>
> The values that will most likely prevent confusion are all zeroes.
> Userspace will have to override it anyway if it wishes to present a
> specific topology to the guest.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Documentation/virt/kvm/api.rst | 14 ++++++++++++++
>  arch/x86/kvm/cpuid.c           | 32 ++++++++++++++++----------------
>  2 files changed, 30 insertions(+), 16 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index eee9f857a986..20f4f6b302ff 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8249,6 +8249,20 @@ CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``
>  It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel
>  has enabled in-kernel emulation of the local APIC.
>
> +CPU topology
> +~~~~~~~~~~~~
> +
> +Several CPUID values include topology information for the host CPU:
> +0x0b and 0x1f for Intel systems, 0x8000001e for AMD systems.  Different
> +versions of KVM return different values for this information and userspace
> +should not rely on it.  Currently they return all zeroes.
> +
> +If userspace wishes to set up a guest topology, it should be careful that
> +the values of these three leaves differ for each CPU.  In particular,
> +the APIC ID is found in EDX for all subleaves of 0x0b and 0x1f, and in EAX
> +for 0x8000001e; the latter also encodes the core id and node id in bits
> +7:0 of EBX and ECX respectively.
> +
>  Obsolete ioctls and capabilities
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0810e93cbedc..164bfb7e7a16 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -759,16 +759,22 @@ struct kvm_cpuid_array {
>         int nent;
>  };
>
> +static struct kvm_cpuid_entry2 *get_next_cpuid(struct kvm_cpuid_array *array)
> +{
> +       if (array->nent >= array->maxnent)
> +               return NULL;
> +
> +       return &array->entries[array->nent++];
> +}
> +
>  static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
>                                               u32 function, u32 index)
>  {
> -       struct kvm_cpuid_entry2 *entry;
> +       struct kvm_cpuid_entry2 *entry = get_next_cpuid(array);
>
> -       if (array->nent >= array->maxnent)
> +       if (!entry)
>                 return NULL;
>
> -       entry = &array->entries[array->nent++];
> -
>         memset(entry, 0, sizeof(*entry));
>         entry->function = function;
>         entry->index = index;
> @@ -945,22 +951,13 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>                 entry->edx = edx.full;
>                 break;
>         }
> -       /*
> -        * Per Intel's SDM, the 0x1f is a superset of 0xb,
> -        * thus they can be handled by common code.
> -        */
>         case 0x1f:
>         case 0xb:
>                 /*
> -                * Populate entries until the level type (ECX[15:8]) of the
> -                * previous entry is zero.  Note, CPUID EAX.{0x1f,0xb}.0 is
> -                * the starting entry, filled by the primary do_host_cpuid().
> +                * No topology; a valid topology is indicated by the presence
> +                * of subleaf 1.
>                  */
> -               for (i = 1; entry->ecx & 0xff00; ++i) {
> -                       entry = do_host_cpuid(array, function, i);
> -                       if (!entry)
> -                               goto out;
> -               }
> +               entry->eax = entry->ebx = entry->ecx = 0;
>                 break;
>         case 0xd: {
>                 u64 permitted_xcr0 = kvm_caps.supported_xcr0 & xstate_get_guest_group_perm();
> @@ -1193,6 +1190,9 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>                 entry->ebx = entry->ecx = entry->edx = 0;
>                 break;
>         case 0x8000001e:
> +               /* Do not return host topology information.  */
> +               entry->eax = entry->ebx = entry->ecx = 0;
> +               entry->edx = 0; /* reserved */
>                 break;
>         case 0x8000001F:
>                 if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
> --
> 2.31.1
>

This is a userspace ABI change that breaks existing hypervisors.
Please don't do this. Userspace ABIs are supposed to be inviolate.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  2023-01-24 23:16 ` Jim Mattson
@ 2023-01-25 14:17   ` Paolo Bonzini
  2023-01-25 16:47     ` Jim Mattson
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2023-01-25 14:17 UTC (permalink / raw)
  To: Jim Mattson; +Cc: linux-kernel, kvm, seanjc, stable

On 1/25/23 00:16, Jim Mattson wrote:
> This is a userspace ABI change that breaks existing hypervisors.
> Please don't do this. Userspace ABIs are supposed to be inviolate.

What exactly is broken?

Part of the definition of the API is that you can take 
KVM_GET_SUPPORTED_CPUID and pass it to KVM_SET_CPUID2 for all vCPUs. 
Returning host topology information for a random host vCPU definitely 
violates the contract.

Furthermore, any existing userspace should be prepared to deal with 
nonexistent host topology leaves.  If you mean that said userspace would 
now pass no topology to the guest, that's not an API change either.

(Now, there are certainly other parts of the KVM_GET_SUPPORTED_CPUID 
contract that should be specified better.  But that should be done for 
each leaf one by one, which I intend to do, and would not extend to 
these three host topology leaves).

Paolo


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  2023-01-25 14:17   ` Paolo Bonzini
@ 2023-01-25 16:47     ` Jim Mattson
  2023-01-25 21:46       ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Jim Mattson @ 2023-01-25 16:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc, stable

On Wed, Jan 25, 2023 at 6:17 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 1/25/23 00:16, Jim Mattson wrote:
> > This is a userspace ABI change that breaks existing hypervisors.
> > Please don't do this. Userspace ABIs are supposed to be inviolate.
>
> What exactly is broken?

KVM_GET_SUPPORTED_CPUID no longer returns the host topology in leaf 0xB.

> Part of the definition of the API is that you can take
> KVM_GET_SUPPORTED_CPUID and pass it to KVM_SET_CPUID2 for all vCPUs.
> Returning host topology information for a random host vCPU definitely
> violates the contract.

You are attempting to rewrite history. Leaf 0xB was added to
KVM_GET_SUPPORTED_CPUID in commit 0771671749b5 ("KVM: Enhance guest
cpuid management"), and the only documentation of the
KVM_GET_SUPPORTED_CPUID ioctl at that time was in the commit message:

     - KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
       supports

There is nothing in the commit message or the official documentation
at the time that the ioctl was added that says anything about passing
the result to KVM_SET_CPUID2 for all vCPUs. Operationally, it is quite
clear from the committed code that the intention was to return the
host topology information for the current logical processor.

Any future changes to either the operational behavior or the
documented behavior of the ABI surely demand a version bump.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  2023-01-25 16:47     ` Jim Mattson
@ 2023-01-25 21:46       ` Paolo Bonzini
  2023-01-25 22:09         ` Jim Mattson
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2023-01-25 21:46 UTC (permalink / raw)
  To: Jim Mattson; +Cc: linux-kernel, kvm, seanjc, stable

On 1/25/23 17:47, Jim Mattson wrote:
>> Part of the definition of the API is that you can take
>> KVM_GET_SUPPORTED_CPUID and pass it to KVM_SET_CPUID2 for all vCPUs.
>> Returning host topology information for a random host vCPU definitely
>> violates the contract.
> 
> You are attempting to rewrite history.  Leaf 0xB was added to > KVM_GET_SUPPORTED_CPUID in commit 0771671749b5 ("KVM: Enhance guest
> cpuid management"), and the only documentation of the
> KVM_GET_SUPPORTED_CPUID ioctl at that time was in the commit message:
> 
>       - KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
>         supports
>
> [...] the intention was to return the
> host topology information for the current logical processor.

The handling of unknown features is so naive in that commit, that I 
don't think it is possible to read anything from the implementation; and 
it certainly should not be a paragon for a future-proof implementation 
of KVM_GET_SUPPORTED_CPUID.

For example, it only hid _known_ CPUID leaves or features and passed the 
unknown ones through, which you'll agree is completely broken.  It also 
didn't try to handle all leaves for which ECX might turn out to be 
significant---which happened for EAX=7 so the commit returns a wrong 
output for CPUID[EAX=7,ECX=0].EAX.

In other words, the only reason it handles 0xB is because it was known 
to have subleaves.

We can get more information about how userspace was intended to use it 
from the qemu-kvm fork, which at the time was practically the only KVM 
userspace.  As of 2009 it was only checking a handful of leaves:

https://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git/tree/target-i386/kvm.c?h=kvm-88#n133

so shall we say that userspace is supposed to build each CPUID leaf one 
by one and use KVM_GET_SUPPORTED_CPUID2 for validation only?  I think 
the first committed documentation agrees: "Userspace can use the 
information returned by this ioctl to construct cpuid information (for 
KVM_SET_CPUID2) that is consistent with hardware, kernel, and userspace 
capabilities, and with user requirements".

However, that's the theory.  "Do not break userspace" also involves 
looking at how userspace *really* uses the API, and make compromises to 
cater to those uses; which is different from rewriting history.

And in practice, people basically stopped reading after "(for 
KVM_SET_CPUID2)".

For example in kvmtool:

	kvm_cpuid->nent = MAX_KVM_CPUID_ENTRIES;
	if (ioctl(vcpu->kvm->sys_fd, KVM_GET_SUPPORTED_CPUID, kvm_cpuid) < 0)
		die_perror("KVM_GET_SUPPORTED_CPUID failed");

	filter_cpuid(kvm_cpuid, vcpu->cpu_id);

	if (ioctl(vcpu->vcpu_fd, KVM_SET_CPUID2, kvm_cpuid) < 0)
		die_perror("KVM_SET_CPUID2 failed");

where filter_cpuid only does minor adjustments that do not include 0xB, 
0x1F and 0x8000001E.  The result is a topology that makes no sense if 
host #vCPUs != guest #vCPUs, and which doesn't include the correct APIC 
id in EDX.

https://github.com/kvmtool/kvmtool/blob/5657dd3e48b41bc6db38fa657994bc0e030fd31f/x86/cpuid.c


crosvm does optionally attempt to pass through leaves 0xB and 0x1F, but 
it fails to adjust the APIC id in EDX.  On the other hand it also passes 
through 0x8000001E if ctx.cpu_config.host_cpu_topology is false, 
incorrectly.  So on one hand this patch breaks host_cpu_topology == 
true, on the other hand it fixes host_cpu_topology == false on AMD 
processors.

https://github.com/google/crosvm/blob/cc79897fc0813ee8412e6395648593898962ec82/x86_64/src/cpuid.rs#L121


The rust-vmm reference hypervisor adjusts the APIC id in EDX for 0xB but 
not for 0x1F.  Apart from that it passes through the host topology 
leaves, again resulting in nonsensical topology for host #vCPUs != guest 
#vCPUs.

https://github.com/rust-vmm/vmm-reference/blob/5cde58bc955afca8a180585a9f01c82d6277a755/src/vm-vcpu-ref/src/x86_64/cpuid.rs


Firecracker, finally, ignores KVM_GET_SUPPORTED_CPUID's output for 0xb 
and 0x8000001E (good!) but fails to do the same for 0x1F, so this patch 
is again a fix of sorts---having all zeroes in 0x1F is better than 
having a value that is valid but inconsistent with 0xB.

https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f178bdf21ececb87caa16c1/src/cpuid/src/transformer/intel.rs#L120
https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f178bdf21ececb87caa16c1/src/cpuid/src/transformer/amd.rs#L88


So basically the only open source userspace that is penalized (but not 
broken, and also partly fixed) by the patch is crosvm.  QEMU doesn't 
care, while firecracker/kvmtool/vmm-reference are a net positive.

Paolo

> Any future changes to either the operational behavior or the
> documented behavior of the ABI surely demand a version bump.
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  2023-01-25 21:46       ` Paolo Bonzini
@ 2023-01-25 22:09         ` Jim Mattson
  2023-01-25 22:43           ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Jim Mattson @ 2023-01-25 22:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc, stable

On Wed, Jan 25, 2023 at 1:46 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 1/25/23 17:47, Jim Mattson wrote:
> >> Part of the definition of the API is that you can take
> >> KVM_GET_SUPPORTED_CPUID and pass it to KVM_SET_CPUID2 for all vCPUs.
> >> Returning host topology information for a random host vCPU definitely
> >> violates the contract.
> >
> > You are attempting to rewrite history.  Leaf 0xB was added to > KVM_GET_SUPPORTED_CPUID in commit 0771671749b5 ("KVM: Enhance guest
> > cpuid management"), and the only documentation of the
> > KVM_GET_SUPPORTED_CPUID ioctl at that time was in the commit message:
> >
> >       - KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
> >         supports
> >
> > [...] the intention was to return the
> > host topology information for the current logical processor.
>
> The handling of unknown features is so naive in that commit, that I
> don't think it is possible to read anything from the implementation; and
> it certainly should not be a paragon for a future-proof implementation
> of KVM_GET_SUPPORTED_CPUID.
>
> For example, it only hid _known_ CPUID leaves or features and passed the
> unknown ones through, which you'll agree is completely broken.  It also
> didn't try to handle all leaves for which ECX might turn out to be
> significant---which happened for EAX=7 so the commit returns a wrong
> output for CPUID[EAX=7,ECX=0].EAX.
>
> In other words, the only reason it handles 0xB is because it was known
> to have subleaves.
>
> We can get more information about how userspace was intended to use it
> from the qemu-kvm fork, which at the time was practically the only KVM
> userspace.  As of 2009 it was only checking a handful of leaves:
>
> https://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git/tree/target-i386/kvm.c?h=kvm-88#n133
>
> so shall we say that userspace is supposed to build each CPUID leaf one
> by one and use KVM_GET_SUPPORTED_CPUID2 for validation only?  I think
> the first committed documentation agrees: "Userspace can use the
> information returned by this ioctl to construct cpuid information (for
> KVM_SET_CPUID2) that is consistent with hardware, kernel, and userspace
> capabilities, and with user requirements".
>
> However, that's the theory.  "Do not break userspace" also involves
> looking at how userspace *really* uses the API, and make compromises to
> cater to those uses; which is different from rewriting history.
>
> And in practice, people basically stopped reading after "(for
> KVM_SET_CPUID2)".
>
> For example in kvmtool:
>
>         kvm_cpuid->nent = MAX_KVM_CPUID_ENTRIES;
>         if (ioctl(vcpu->kvm->sys_fd, KVM_GET_SUPPORTED_CPUID, kvm_cpuid) < 0)
>                 die_perror("KVM_GET_SUPPORTED_CPUID failed");
>
>         filter_cpuid(kvm_cpuid, vcpu->cpu_id);
>
>         if (ioctl(vcpu->vcpu_fd, KVM_SET_CPUID2, kvm_cpuid) < 0)
>                 die_perror("KVM_SET_CPUID2 failed");
>
> where filter_cpuid only does minor adjustments that do not include 0xB,
> 0x1F and 0x8000001E.  The result is a topology that makes no sense if
> host #vCPUs != guest #vCPUs, and which doesn't include the correct APIC
> id in EDX.
>
> https://github.com/kvmtool/kvmtool/blob/5657dd3e48b41bc6db38fa657994bc0e030fd31f/x86/cpuid.c
>
>
> crosvm does optionally attempt to pass through leaves 0xB and 0x1F, but
> it fails to adjust the APIC id in EDX.  On the other hand it also passes
> through 0x8000001E if ctx.cpu_config.host_cpu_topology is false,
> incorrectly.  So on one hand this patch breaks host_cpu_topology ==
> true, on the other hand it fixes host_cpu_topology == false on AMD
> processors.
>
> https://github.com/google/crosvm/blob/cc79897fc0813ee8412e6395648593898962ec82/x86_64/src/cpuid.rs#L121
>
>
> The rust-vmm reference hypervisor adjusts the APIC id in EDX for 0xB but
> not for 0x1F.  Apart from that it passes through the host topology
> leaves, again resulting in nonsensical topology for host #vCPUs != guest
> #vCPUs.
>
> https://github.com/rust-vmm/vmm-reference/blob/5cde58bc955afca8a180585a9f01c82d6277a755/src/vm-vcpu-ref/src/x86_64/cpuid.rs
>
>
> Firecracker, finally, ignores KVM_GET_SUPPORTED_CPUID's output for 0xb
> and 0x8000001E (good!) but fails to do the same for 0x1F, so this patch
> is again a fix of sorts---having all zeroes in 0x1F is better than
> having a value that is valid but inconsistent with 0xB.
>
> https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f178bdf21ececb87caa16c1/src/cpuid/src/transformer/intel.rs#L120
> https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f178bdf21ececb87caa16c1/src/cpuid/src/transformer/amd.rs#L88
>
>
> So basically the only open source userspace that is penalized (but not
> broken, and also partly fixed) by the patch is crosvm.  QEMU doesn't
> care, while firecracker/kvmtool/vmm-reference are a net positive.
>
> Paolo

The topology leaves returned by KVM_GET_SUPPORTED_CPUID *for over a
decade* have been passed through unmodified from the host. They have
never made sense for KVM_SET_CPUID2, with the unlikely exception of a
whole-host VM.

Our VMM populates the topology of the guest CPUID table on its own, as
any VMM must. However, it uses the host topology (which
KVM_GET_SUPPORTED_CPUID has been providing pass-through *for over a
decade*) to see if the requested guest topology is possible.

Changing a long-established ABI in a way that breaks userspace
applications is a bad practice. I didn't think we, as a community, did
that. I didn't realize that we were only catering to open source
implementations here.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  2023-01-25 22:09         ` Jim Mattson
@ 2023-01-25 22:43           ` Paolo Bonzini
  2023-01-26  0:58             ` Jim Mattson
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2023-01-25 22:43 UTC (permalink / raw)
  To: Jim Mattson; +Cc: linux-kernel, kvm, seanjc, stable

On 1/25/23 23:09, Jim Mattson wrote:
> The topology leaves returned by KVM_GET_SUPPORTED_CPUID *for over a
> decade* have been passed through unmodified from the host. They have
> never made sense for KVM_SET_CPUID2, with the unlikely exception of a
> whole-host VM.

True, unfortunately people have not read the nonexistent documentation 
and they are:

1) rarely adjusting correctly all of 0xB, 0x1F and 0x8000001E;

2) never bounding CPUID[EAX=0].EAX to a known CPUID leaf, resulting for 
example in inconsistencies between 0xB and 0x1F.

*But* (2) should not be needed unless you care about maintaining 
homogeneous CPUID within a VM migration pool.  For something like 
kvmtool, having to do (2) would be a workaround for the bug that this 
patch fixes.

> Our VMM populates the topology of the guest CPUID table on its own, as
> any VMM must. However, it uses the host topology (which
> KVM_GET_SUPPORTED_CPUID has been providing pass-through *for over a
> decade*) to see if the requested guest topology is possible.

Ok, thanks; this is useful to know.

> Changing a long-established ABI in a way that breaks userspace
> applications is a bad practice. I didn't think we, as a community, did
> that. I didn't realize that we were only catering to open source
> implementations here.

We aren't.  But the open source implementations provide some guidance as 
to how the API is being used in the wild, and what the pitfalls are.

You wrote it yourself: any VMM must either populate the topology on its 
own, or possibly fill it with zeros.  Returning a value that is 
extremely unlikely to be used is worse in pretty much every way (apart 
from not breaking your VMM, of course).

With a total of six known users (QEMU, crosvm, kvmtool, firecracker, 
rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and 
damned if it doesn't.  There is a tension between fixing the one VMM 
that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly, 
and fixing 3-4 that were silently broken and are now fixed.  I will 
probably send a patch to crosvm, though.

The VMM being _proprietary_ doesn't really matter, however it does 
matter to me that it is not _public_: it is only used within Google, and 
the breakage is neither hard to fix in the VMM nor hard to temporarily 
avoid by reverting the patch in the Google kernel.

Paolo


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  2023-01-25 22:43           ` Paolo Bonzini
@ 2023-01-26  0:58             ` Jim Mattson
  2023-01-26  9:40               ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Jim Mattson @ 2023-01-26  0:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc, stable

On Wed, Jan 25, 2023 at 2:44 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 1/25/23 23:09, Jim Mattson wrote:
> > The topology leaves returned by KVM_GET_SUPPORTED_CPUID *for over a
> > decade* have been passed through unmodified from the host. They have
> > never made sense for KVM_SET_CPUID2, with the unlikely exception of a
> > whole-host VM.
>
> True, unfortunately people have not read the nonexistent documentation
> and they are:
>
> 1) rarely adjusting correctly all of 0xB, 0x1F and 0x8000001E;
>
> 2) never bounding CPUID[EAX=0].EAX to a known CPUID leaf, resulting for
> example in inconsistencies between 0xB and 0x1F.
>
> *But* (2) should not be needed unless you care about maintaining
> homogeneous CPUID within a VM migration pool.  For something like
> kvmtool, having to do (2) would be a workaround for the bug that this
> patch fixes.

Maybe we should just populate up to leaf 3. :-)

> > Our VMM populates the topology of the guest CPUID table on its own, as
> > any VMM must. However, it uses the host topology (which
> > KVM_GET_SUPPORTED_CPUID has been providing pass-through *for over a
> > decade*) to see if the requested guest topology is possible.
>
> Ok, thanks; this is useful to know.
>
> > Changing a long-established ABI in a way that breaks userspace
> > applications is a bad practice. I didn't think we, as a community, did
> > that. I didn't realize that we were only catering to open source
> > implementations here.
>
> We aren't.  But the open source implementations provide some guidance as
> to how the API is being used in the wild, and what the pitfalls are.
>
> You wrote it yourself: any VMM must either populate the topology on its
> own, or possibly fill it with zeros.  Returning a value that is
> extremely unlikely to be used is worse in pretty much every way (apart
> from not breaking your VMM, of course).

I've complained about this particular ioctl more than I can remember.
This is just one of its many problems.

> With a total of six known users (QEMU, crosvm, kvmtool, firecracker,
> rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and
> damned if it doesn't.  There is a tension between fixing the one VMM
> that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly,
> and fixing 3-4 that were silently broken and are now fixed.  I will
> probably send a patch to crosvm, though.
>
> The VMM being _proprietary_ doesn't really matter, however it does
> matter to me that it is not _public_: it is only used within Google, and
> the breakage is neither hard to fix in the VMM nor hard to temporarily
> avoid by reverting the patch in the Google kernel.

Sadly, there isn't a single kernel involved. People running our VMM on
their desktops are going to be impacted as soon as this patch hits
that distro. (I don't know if I can say which distro that is.) So, now
we have to get the VMM folks to urgently accommodate this change and
get a new distribution out.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  2023-01-26  0:58             ` Jim Mattson
@ 2023-01-26  9:40               ` Paolo Bonzini
  2023-01-26 16:06                 ` Jim Mattson
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2023-01-26  9:40 UTC (permalink / raw)
  To: Jim Mattson; +Cc: linux-kernel, kvm, seanjc, stable

On 1/26/23 01:58, Jim Mattson wrote:
>> You wrote it yourself: any VMM must either populate the topology on its
>> own, or possibly fill it with zeros.  Returning a value that is
>> extremely unlikely to be used is worse in pretty much every way (apart
>> from not breaking your VMM, of course).
> 
> I've complained about this particular ioctl more than I can remember.
> This is just one of its many problems.

I agree.  At the very least it should have been a VM ioctl.

>> With a total of six known users (QEMU, crosvm, kvmtool, firecracker,
>> rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and
>> damned if it doesn't.  There is a tension between fixing the one VMM
>> that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly,
>> and fixing 3-4 that were silently broken and are now fixed.  I will
>> probably send a patch to crosvm, though.
>>
>> The VMM being _proprietary_ doesn't really matter, however it does
>> matter to me that it is not _public_: it is only used within Google, and
>> the breakage is neither hard to fix in the VMM nor hard to temporarily
>> avoid by reverting the patch in the Google kernel.
> 
> Sadly, there isn't a single kernel involved. People running our VMM on
> their desktops are going to be impacted as soon as this patch hits
> that distro. (I don't know if I can say which distro that is.) So, now
> we have to get the VMM folks to urgently accommodate this change and
> get a new distribution out.

Ok, this is what is needed to make a more informed choice.  To be clear, 
this is _still_ not public (for example it's not ChromeOS), so there is 
at least some control on what version of the VMM they use?  Would it 
make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  2023-01-26  9:40               ` Paolo Bonzini
@ 2023-01-26 16:06                 ` Jim Mattson
  2023-01-26 17:47                   ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Jim Mattson @ 2023-01-26 16:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc, stable

On Thu, Jan 26, 2023 at 1:40 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 1/26/23 01:58, Jim Mattson wrote:
> >> You wrote it yourself: any VMM must either populate the topology on its
> >> own, or possibly fill it with zeros.  Returning a value that is
> >> extremely unlikely to be used is worse in pretty much every way (apart
> >> from not breaking your VMM, of course).
> >
> > I've complained about this particular ioctl more than I can remember.
> > This is just one of its many problems.
>
> I agree.  At the very least it should have been a VM ioctl.
>
> >> With a total of six known users (QEMU, crosvm, kvmtool, firecracker,
> >> rust-vmm, and the Google VMM), KVM is damned if it reverts the patch and
> >> damned if it doesn't.  There is a tension between fixing the one VMM
> >> that was using KVM_GET_SUPPORTED_CPUID correctly and now breaks loudly,
> >> and fixing 3-4 that were silently broken and are now fixed.  I will
> >> probably send a patch to crosvm, though.
> >>
> >> The VMM being _proprietary_ doesn't really matter, however it does
> >> matter to me that it is not _public_: it is only used within Google, and
> >> the breakage is neither hard to fix in the VMM nor hard to temporarily
> >> avoid by reverting the patch in the Google kernel.
> >
> > Sadly, there isn't a single kernel involved. People running our VMM on
> > their desktops are going to be impacted as soon as this patch hits
> > that distro. (I don't know if I can say which distro that is.) So, now
> > we have to get the VMM folks to urgently accommodate this change and
> > get a new distribution out.
>
> Ok, this is what is needed to make a more informed choice.  To be clear,
> this is _still_ not public (for example it's not ChromeOS), so there is
> at least some control on what version of the VMM they use?  Would it
> make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?

Mainline isn't a problem. I'm more worried about 5.19 LTS.

Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  2023-01-26 16:06                 ` Jim Mattson
@ 2023-01-26 17:47                   ` Paolo Bonzini
  2023-01-26 20:45                     ` Jim Mattson
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2023-01-26 17:47 UTC (permalink / raw)
  To: Jim Mattson; +Cc: linux-kernel, kvm, seanjc, stable

On 1/26/23 17:06, Jim Mattson wrote:
>>> Sadly, there isn't a single kernel involved. People running our VMM on
>>> their desktops are going to be impacted as soon as this patch hits
>>> that distro. (I don't know if I can say which distro that is.) So, now
>>> we have to get the VMM folks to urgently accommodate this change and
>>> get a new distribution out.
>>
>> Ok, this is what is needed to make a more informed choice.  To be clear,
>> this is _still_ not public (for example it's not ChromeOS), so there is
>> at least some control on what version of the VMM they use?  Would it
>> make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?
> 
> Mainline isn't a problem. I'm more worried about 5.19 LTS.

5.19 is not LTS, is it?  This patch is only in 6.1.7 and 6.1.8 as far as 
stable kernels is concerned, should I ask Greg to revert it there?

Paolo


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  2023-01-26 17:47                   ` Paolo Bonzini
@ 2023-01-26 20:45                     ` Jim Mattson
  2023-01-27  7:23                       ` Greg KH
  0 siblings, 1 reply; 13+ messages in thread
From: Jim Mattson @ 2023-01-26 20:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc, stable

On Thu, Jan 26, 2023 at 9:47 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 1/26/23 17:06, Jim Mattson wrote:
> >>> Sadly, there isn't a single kernel involved. People running our VMM on
> >>> their desktops are going to be impacted as soon as this patch hits
> >>> that distro. (I don't know if I can say which distro that is.) So, now
> >>> we have to get the VMM folks to urgently accommodate this change and
> >>> get a new distribution out.
> >>
> >> Ok, this is what is needed to make a more informed choice.  To be clear,
> >> this is _still_ not public (for example it's not ChromeOS), so there is
> >> at least some control on what version of the VMM they use?  Would it
> >> make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?
> >
> > Mainline isn't a problem. I'm more worried about 5.19 LTS.
>
> 5.19 is not LTS, is it?  This patch is only in 6.1.7 and 6.1.8 as far as
> stable kernels is concerned, should I ask Greg to revert it there?

It came to my attention when commit 196c6f0c3e21 ("KVM: x86: Do not
return host topology information from KVM_GET_SUPPORTED_CPUID") broke
some of our tests under 5.10 LTS.

If it isn't bound for linux-5.19-y, then we have some breathing room.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
  2023-01-26 20:45                     ` Jim Mattson
@ 2023-01-27  7:23                       ` Greg KH
  0 siblings, 0 replies; 13+ messages in thread
From: Greg KH @ 2023-01-27  7:23 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Paolo Bonzini, linux-kernel, kvm, seanjc, stable

On Thu, Jan 26, 2023 at 12:45:58PM -0800, Jim Mattson wrote:
> On Thu, Jan 26, 2023 at 9:47 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 1/26/23 17:06, Jim Mattson wrote:
> > >>> Sadly, there isn't a single kernel involved. People running our VMM on
> > >>> their desktops are going to be impacted as soon as this patch hits
> > >>> that distro. (I don't know if I can say which distro that is.) So, now
> > >>> we have to get the VMM folks to urgently accommodate this change and
> > >>> get a new distribution out.
> > >>
> > >> Ok, this is what is needed to make a more informed choice.  To be clear,
> > >> this is _still_ not public (for example it's not ChromeOS), so there is
> > >> at least some control on what version of the VMM they use?  Would it
> > >> make sense to buy you a few months by deferring this patch to Linux 6.3-6.5?
> > >
> > > Mainline isn't a problem. I'm more worried about 5.19 LTS.
> >
> > 5.19 is not LTS, is it?  This patch is only in 6.1.7 and 6.1.8 as far as
> > stable kernels is concerned, should I ask Greg to revert it there?
> 
> It came to my attention when commit 196c6f0c3e21 ("KVM: x86: Do not
> return host topology information from KVM_GET_SUPPORTED_CPUID") broke
> some of our tests under 5.10 LTS.
> 
> If it isn't bound for linux-5.19-y, then we have some breathing room.

5.19 is long end-of-life, it dropped off of being maintained back in
October of last year.

You can always use the front page of kernel.org to determine what is
still being maintained.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-01-27  7:23 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-27  9:20 [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID Paolo Bonzini
2023-01-24 23:16 ` Jim Mattson
2023-01-25 14:17   ` Paolo Bonzini
2023-01-25 16:47     ` Jim Mattson
2023-01-25 21:46       ` Paolo Bonzini
2023-01-25 22:09         ` Jim Mattson
2023-01-25 22:43           ` Paolo Bonzini
2023-01-26  0:58             ` Jim Mattson
2023-01-26  9:40               ` Paolo Bonzini
2023-01-26 16:06                 ` Jim Mattson
2023-01-26 17:47                   ` Paolo Bonzini
2023-01-26 20:45                     ` Jim Mattson
2023-01-27  7:23                       ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).