stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Jim Mattson <jmattson@google.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	seanjc@google.com, stable@vger.kernel.org
Subject: Re: [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
Date: Wed, 25 Jan 2023 22:46:30 +0100	[thread overview]
Message-ID: <8bdf22c8-9ef1-e526-df36-9073a150669d@redhat.com> (raw)
In-Reply-To: <CALMp9eQ3wZ4dkq_8ErcUdQAs2F96Gvr-g=7-iBteJeuN5aX00A@mail.gmail.com>

On 1/25/23 17:47, Jim Mattson wrote:
>> Part of the definition of the API is that you can take
>> KVM_GET_SUPPORTED_CPUID and pass it to KVM_SET_CPUID2 for all vCPUs.
>> Returning host topology information for a random host vCPU definitely
>> violates the contract.
> 
> You are attempting to rewrite history.  Leaf 0xB was added to > KVM_GET_SUPPORTED_CPUID in commit 0771671749b5 ("KVM: Enhance guest
> cpuid management"), and the only documentation of the
> KVM_GET_SUPPORTED_CPUID ioctl at that time was in the commit message:
> 
>       - KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
>         supports
>
> [...] the intention was to return the
> host topology information for the current logical processor.

The handling of unknown features is so naive in that commit, that I 
don't think it is possible to read anything from the implementation; and 
it certainly should not be a paragon for a future-proof implementation 
of KVM_GET_SUPPORTED_CPUID.

For example, it only hid _known_ CPUID leaves or features and passed the 
unknown ones through, which you'll agree is completely broken.  It also 
didn't try to handle all leaves for which ECX might turn out to be 
significant---which happened for EAX=7 so the commit returns a wrong 
output for CPUID[EAX=7,ECX=0].EAX.

In other words, the only reason it handles 0xB is because it was known 
to have subleaves.

We can get more information about how userspace was intended to use it 
from the qemu-kvm fork, which at the time was practically the only KVM 
userspace.  As of 2009 it was only checking a handful of leaves:

https://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git/tree/target-i386/kvm.c?h=kvm-88#n133

so shall we say that userspace is supposed to build each CPUID leaf one 
by one and use KVM_GET_SUPPORTED_CPUID2 for validation only?  I think 
the first committed documentation agrees: "Userspace can use the 
information returned by this ioctl to construct cpuid information (for 
KVM_SET_CPUID2) that is consistent with hardware, kernel, and userspace 
capabilities, and with user requirements".

However, that's the theory.  "Do not break userspace" also involves 
looking at how userspace *really* uses the API, and make compromises to 
cater to those uses; which is different from rewriting history.

And in practice, people basically stopped reading after "(for 
KVM_SET_CPUID2)".

For example in kvmtool:

	kvm_cpuid->nent = MAX_KVM_CPUID_ENTRIES;
	if (ioctl(vcpu->kvm->sys_fd, KVM_GET_SUPPORTED_CPUID, kvm_cpuid) < 0)
		die_perror("KVM_GET_SUPPORTED_CPUID failed");

	filter_cpuid(kvm_cpuid, vcpu->cpu_id);

	if (ioctl(vcpu->vcpu_fd, KVM_SET_CPUID2, kvm_cpuid) < 0)
		die_perror("KVM_SET_CPUID2 failed");

where filter_cpuid only does minor adjustments that do not include 0xB, 
0x1F and 0x8000001E.  The result is a topology that makes no sense if 
host #vCPUs != guest #vCPUs, and which doesn't include the correct APIC 
id in EDX.

https://github.com/kvmtool/kvmtool/blob/5657dd3e48b41bc6db38fa657994bc0e030fd31f/x86/cpuid.c


crosvm does optionally attempt to pass through leaves 0xB and 0x1F, but 
it fails to adjust the APIC id in EDX.  On the other hand it also passes 
through 0x8000001E if ctx.cpu_config.host_cpu_topology is false, 
incorrectly.  So on one hand this patch breaks host_cpu_topology == 
true, on the other hand it fixes host_cpu_topology == false on AMD 
processors.

https://github.com/google/crosvm/blob/cc79897fc0813ee8412e6395648593898962ec82/x86_64/src/cpuid.rs#L121


The rust-vmm reference hypervisor adjusts the APIC id in EDX for 0xB but 
not for 0x1F.  Apart from that it passes through the host topology 
leaves, again resulting in nonsensical topology for host #vCPUs != guest 
#vCPUs.

https://github.com/rust-vmm/vmm-reference/blob/5cde58bc955afca8a180585a9f01c82d6277a755/src/vm-vcpu-ref/src/x86_64/cpuid.rs


Firecracker, finally, ignores KVM_GET_SUPPORTED_CPUID's output for 0xb 
and 0x8000001E (good!) but fails to do the same for 0x1F, so this patch 
is again a fix of sorts---having all zeroes in 0x1F is better than 
having a value that is valid but inconsistent with 0xB.

https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f178bdf21ececb87caa16c1/src/cpuid/src/transformer/intel.rs#L120
https://github.com/firecracker-microvm/firecracker/blob/cdf4fef3011c51206f178bdf21ececb87caa16c1/src/cpuid/src/transformer/amd.rs#L88


So basically the only open source userspace that is penalized (but not 
broken, and also partly fixed) by the patch is crosvm.  QEMU doesn't 
care, while firecracker/kvmtool/vmm-reference are a net positive.

Paolo

> Any future changes to either the operational behavior or the
> documented behavior of the ABI surely demand a version bump.
> 


  reply	other threads:[~2023-01-25 21:49 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-27  9:20 [PATCH v2] KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID Paolo Bonzini
2023-01-24 23:16 ` Jim Mattson
2023-01-25 14:17   ` Paolo Bonzini
2023-01-25 16:47     ` Jim Mattson
2023-01-25 21:46       ` Paolo Bonzini [this message]
2023-01-25 22:09         ` Jim Mattson
2023-01-25 22:43           ` Paolo Bonzini
2023-01-26  0:58             ` Jim Mattson
2023-01-26  9:40               ` Paolo Bonzini
2023-01-26 16:06                 ` Jim Mattson
2023-01-26 17:47                   ` Paolo Bonzini
2023-01-26 20:45                     ` Jim Mattson
2023-01-27  7:23                       ` Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8bdf22c8-9ef1-e526-df36-9073a150669d@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=seanjc@google.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).