* [PATCH v2] kvm: x86: Add logical CPU to KVM_EXIT_FAIL_ENTRY info
From: Jim Mattson @ 2019-12-14 0:20 UTC
To: kvm; +Cc: Jim Mattson, Oliver Upton, Liran Alon, Paolo Bonzini
More often than not, a failed VM-entry in a production environment is
the result of a defective CPU (at least, insofar as Intel x86 is
concerned). To aid in identifying the bad hardware, add the logical
CPU to the information provided to userspace on a KVM exit with reason
KVM_EXIT_FAIL_ENTRY. The presence of this additional information is
indicated by a new capability, KVM_CAP_FAILED_ENTRY_CPU.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
---
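For illustration only, here is a minimal sketch of how userspace might
consume the new field. It assumes the capability number and struct layout
proposed in this patch; the local struct fail_entry_info and the helper
format_fail_entry() are hypothetical names, not part of any header. Real
code would read kvm_run->fail_entry after KVM_RUN returns and obtain
has_cpu_cap from ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_FAILED_ENTRY_CPU).

```c
#include <stdint.h>
#include <stdio.h>

/* Value proposed by this patch (assumption: not in any released header). */
#ifndef KVM_CAP_FAILED_ENTRY_CPU
#define KVM_CAP_FAILED_ENTRY_CPU 179
#endif

/* Local mirror of the proposed fail_entry layout, for illustration only. */
struct fail_entry_info {
	uint64_t hardware_entry_failure_reason;
	uint32_t cpu;	/* meaningful only if the capability is reported */
};

/*
 * Hypothetical helper: format a report for a KVM_EXIT_FAIL_ENTRY exit.
 * has_cpu_cap is the KVM_CHECK_EXTENSION result for the new capability
 * (> 0 means the cpu field is valid). Returns the snprintf() result.
 */
static int format_fail_entry(char *buf, size_t len,
			     const struct fail_entry_info *fe, int has_cpu_cap)
{
	if (has_cpu_cap > 0)
		return snprintf(buf, len,
				"VM-entry failed: reason 0x%llx on logical CPU %u",
				(unsigned long long)fe->hardware_entry_failure_reason,
				(unsigned int)fe->cpu);

	/* Older kernel: the cpu field must not be trusted. */
	return snprintf(buf, len,
			"VM-entry failed: reason 0x%llx (CPU unknown)",
			(unsigned long long)fe->hardware_entry_failure_reason);
}
```

Gating on the capability keeps the same binary usable on kernels without
this patch, where the bytes following hardware_entry_failure_reason are
not defined to hold a CPU number.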
Documentation/virt/kvm/api.txt | 1 +
arch/x86/kvm/svm.c | 1 +
arch/x86/kvm/vmx/vmx.c | 2 ++
arch/x86/kvm/x86.c | 1 +
include/uapi/linux/kvm.h | 2 ++
5 files changed, 7 insertions(+)
diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
index ebb37b34dcfc..6e5d92406b65 100644
--- a/Documentation/virt/kvm/api.txt
+++ b/Documentation/virt/kvm/api.txt
@@ -4245,6 +4245,7 @@ hardware_exit_reason.
/* KVM_EXIT_FAIL_ENTRY */
struct {
__u64 hardware_entry_failure_reason;
+ __u32 cpu; /* if KVM_CAP_FAILED_ENTRY_CPU */
} fail_entry;
If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 122d4ce3b1ab..e07c5ce3ac93 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4980,6 +4980,7 @@ static int handle_exit(struct kvm_vcpu *vcpu)
kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY;
kvm_run->fail_entry.hardware_entry_failure_reason
= svm->vmcb->control.exit_code;
+ kvm_run->fail_entry.cpu = vcpu->cpu;
dump_vmcb(vcpu);
return 0;
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e3394c839dea..17d1a1676fc0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5846,6 +5846,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
vcpu->run->fail_entry.hardware_entry_failure_reason
= exit_reason;
+ vcpu->run->fail_entry.cpu = vcpu->cpu;
return 0;
}
@@ -5854,6 +5855,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
vcpu->run->fail_entry.hardware_entry_failure_reason
= vmcs_read32(VM_INSTRUCTION_ERROR);
+ vcpu->run->fail_entry.cpu = vcpu->cpu;
return 0;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf917139de6b..9e89a32056d1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3273,6 +3273,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_GET_MSR_FEATURES:
case KVM_CAP_MSR_PLATFORM_INFO:
case KVM_CAP_EXCEPTION_PAYLOAD:
+ case KVM_CAP_FAILED_ENTRY_CPU:
r = 1;
break;
case KVM_CAP_SYNC_REGS:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f0a16b4adbbd..09ba7174456d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -277,6 +277,7 @@ struct kvm_run {
/* KVM_EXIT_FAIL_ENTRY */
struct {
__u64 hardware_entry_failure_reason;
+ __u32 cpu;
} fail_entry;
/* KVM_EXIT_EXCEPTION */
struct {
@@ -1009,6 +1010,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 176
#define KVM_CAP_ARM_NISV_TO_USER 177
#define KVM_CAP_ARM_INJECT_EXT_DABT 178
+#define KVM_CAP_FAILED_ENTRY_CPU 179
#ifdef KVM_CAP_IRQ_ROUTING
--
2.24.1.735.g03f4e72817-goog
* Re: [PATCH v2] kvm: x86: Add logical CPU to KVM_EXIT_FAIL_ENTRY info
From: Liran Alon @ 2019-12-14 1:10 UTC
To: Jim Mattson; +Cc: kvm, Oliver Upton, Paolo Bonzini
> On 14 Dec 2019, at 2:20, Jim Mattson <jmattson@google.com> wrote:
>
> More often than not, a failed VM-entry in a production environment is
> the result of a defective CPU (at least, insofar as Intel x86 is
> concerned). To aid in identifying the bad hardware, add the logical
> CPU to the information provided to userspace on a KVM exit with reason
> KVM_EXIT_FAIL_ENTRY. The presence of this additional information is
> indicated by a new capability, KVM_CAP_FAILED_ENTRY_CPU.
>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> Reviewed-by: Oliver Upton <oupton@google.com>
> Cc: Liran Alon <liran.alon@oracle.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>

BTW, one could argue that an unexpected exit reason (i.e. KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON)
in production can likewise only result from either a KVM bug or a defective CPU, similar to a failed VM-entry.
Should we add similar behaviour for that case as well?

-Liran
* Re: [PATCH v2] kvm: x86: Add logical CPU to KVM_EXIT_FAIL_ENTRY info
From: Jim Mattson @ 2019-12-16 17:43 UTC
To: Liran Alon; +Cc: kvm list, Oliver Upton, Paolo Bonzini
On Fri, Dec 13, 2019 at 5:10 PM Liran Alon <liran.alon@oracle.com> wrote:
> BTW, one could argue that an unexpected exit reason (i.e. KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON)
> in production can likewise only result from either a KVM bug or a defective CPU, similar to a failed VM-entry.
> Should we add similar behaviour for that case as well?
>
> -Liran
That's a good point. We had one case of numerous VM-exits for INIT,
and I'm pretty sure that was a defective CPU too.
* Re: [PATCH v2] kvm: x86: Add logical CPU to KVM_EXIT_FAIL_ENTRY info
From: Paolo Bonzini @ 2019-12-16 17:51 UTC
To: Jim Mattson, Liran Alon; +Cc: kvm list, Oliver Upton
On 16/12/19 18:43, Jim Mattson wrote:
> That's a good point. We had one case of numerous VM-exits for INIT,
> and I'm pretty sure that was a defective CPU too.
Same here, and that's the only conclusion we could reach. The other
case I can remember was KVM_INTERNAL_ERROR_DELIVERY_EV with
EPT_VIOLATION exits, so I would add the CPU to all KVM_EXIT_INTERNAL_ERROR
exits.

Nowadays KVM_EXIT_FAIL_ENTRY would probably also be an internal error;
however, it was somewhat more frequent back before Intel CPUs had
unrestricted guest support.
Paolo