* Broadwell server reboot with vmx: unexpected exit reason 0x3
@ 2019-09-30 8:43 Jinpu Wang
From: Jinpu Wang @ 2019-09-30 8:43 UTC (permalink / raw)
To: kvm
Dear KVM experts,
We recently had a Broadwell server reboot itself. Just before the reboot,
netconsole captured these error messages from KVM:
[5599380.317055] kvm [9046]: vcpu1, guest rIP: 0xffffffff816ad716 vmx: unexpected exit reason 0x3
[5599380.317060] kvm [49626]: vcpu0, guest rIP: 0xffffffff81060fe6 vmx: unexpected exit reason 0x3
[5599380.317062] kvm [36632]: vcpu0, guest rIP: 0xffffffff8103970d vmx: unexpected exit reason 0x3
[5599380.317064] kvm [9620]: vcpu1, guest rIP: 0xffffffffb6c1b08e vmx: unexpected exit reason 0x3
[5599380.317067] kvm [49925]: vcpu5, guest rIP: 0xffffffff9b406ea2 vmx: unexpected exit reason 0x3
[5599380.317068] kvm [49925]: vcpu3, guest rIP: 0xffffffff9b406ea2 vmx: unexpected exit reason 0x3
[5599380.317070] kvm [33871]: vcpu2, guest rIP: 0xffffffff81060fe6 vmx: unexpected exit reason 0x3
[5599380.317072] kvm [49925]: vcpu4, guest rIP: 0xffffffff9b406ea2 vmx: unexpected exit reason 0x3
[5599380.317074] kvm [48505]: vcpu1, guest rIP: 0xffffffffaf36bf9b vmx: unexpected exit reason 0x3
[5599380.317076] kvm [21880]: vcpu1, guest rIP: 0xffffffff8103970d vmx: unexpected exit reason 0x3
Kernel version is: 4.14.129
CPU is Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
No crash dump was generated; the messages above are the only output we got right before the server rebooted.
Does anyone have an idea what could cause the reboot? Is there a known
problem in this regard?
I noticed that EXIT_REASON_INIT_SIGNAL (3) was introduced recently. Is it related?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kvm?id=4b9852f4f38909a9ca74e71afb35aafba0871aa1
Regards,
Jinpu
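One way to read the log above is to group the lines and see how widespread the event was. The following C sketch parses the ten messages and counts the distinct `kvm [N]` values (as far as I can tell, the host process ID of the VMM, so typically one per VM), the reason-3 exits, and the time span of the burst. The struct, `parse_exit_msg()` and `summarize()` are made-up names for illustration, not kernel or tool APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The ten netconsole lines from the report, each rejoined onto one line. */
static const char *const log_lines[] = {
	"[5599380.317055] kvm [9046]: vcpu1, guest rIP: 0xffffffff816ad716 vmx: unexpected exit reason 0x3",
	"[5599380.317060] kvm [49626]: vcpu0, guest rIP: 0xffffffff81060fe6 vmx: unexpected exit reason 0x3",
	"[5599380.317062] kvm [36632]: vcpu0, guest rIP: 0xffffffff8103970d vmx: unexpected exit reason 0x3",
	"[5599380.317064] kvm [9620]: vcpu1, guest rIP: 0xffffffffb6c1b08e vmx: unexpected exit reason 0x3",
	"[5599380.317067] kvm [49925]: vcpu5, guest rIP: 0xffffffff9b406ea2 vmx: unexpected exit reason 0x3",
	"[5599380.317068] kvm [49925]: vcpu3, guest rIP: 0xffffffff9b406ea2 vmx: unexpected exit reason 0x3",
	"[5599380.317070] kvm [33871]: vcpu2, guest rIP: 0xffffffff81060fe6 vmx: unexpected exit reason 0x3",
	"[5599380.317072] kvm [49925]: vcpu4, guest rIP: 0xffffffff9b406ea2 vmx: unexpected exit reason 0x3",
	"[5599380.317074] kvm [48505]: vcpu1, guest rIP: 0xffffffffaf36bf9b vmx: unexpected exit reason 0x3",
	"[5599380.317076] kvm [21880]: vcpu1, guest rIP: 0xffffffff8103970d vmx: unexpected exit reason 0x3",
};
#define NLINES (sizeof(log_lines) / sizeof(log_lines[0]))

/* One parsed "unexpected exit reason" message. */
struct exit_msg {
	double ts;              /* printk timestamp in seconds */
	int pid;                /* the number printed as "kvm [N]" */
	int vcpu;               /* vCPU index within that VM */
	unsigned long long rip; /* guest rIP at the time of the exit */
	unsigned int reason;    /* exit reason as printed */
};

/* Returns 1 if the line matches the expected format, 0 otherwise. */
static int parse_exit_msg(const char *line, struct exit_msg *m)
{
	return sscanf(line,
		      "[%lf] kvm [%d]: vcpu%d, guest rIP: %llx "
		      "vmx: unexpected exit reason %x",
		      &m->ts, &m->pid, &m->vcpu, &m->rip, &m->reason) == 5;
}

/* Counts distinct "kvm [N]" values and reason-3 exits, and computes the
 * time span between the first and last message in microseconds. */
static void summarize(int *distinct_pids, int *init_exits, double *span_us)
{
	int pids[NLINES];
	int npids = 0, have = 0;
	double first = 0.0, last = 0.0;

	*init_exits = 0;
	for (size_t i = 0; i < NLINES; i++) {
		struct exit_msg m;
		int seen = 0;

		if (!parse_exit_msg(log_lines[i], &m))
			continue; /* skip lines in an unexpected format */
		if (!have || m.ts < first)
			first = m.ts;
		if (!have || m.ts > last)
			last = m.ts;
		have = 1;
		if (m.reason == 0x3)
			(*init_exits)++;
		for (int j = 0; j < npids; j++)
			if (pids[j] == m.pid)
				seen = 1;
		if (!seen)
			pids[npids++] = m.pid;
	}
	assert(have && "no log line matched the expected format");
	*distinct_pids = npids;
	*span_us = (last - first) * 1e6;
}
```

Called from a `main()`, `summarize()` reports 8 distinct `kvm [N]` values, 10 reason-3 exits, and a span of about 21 microseconds. That is at least consistent with INITs reaching many CPUs at almost the same moment, rather than with a problem inside a single guest.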
* Re: Broadwell server reboot with vmx: unexpected exit reason 0x3
From: Liran Alon @ 2019-09-30 10:48 UTC (permalink / raw)
To: Jinpu Wang; +Cc: kvm
> On 30 Sep 2019, at 11:43, Jinpu Wang <jinpu.wang@cloud.ionos.com> wrote:
>
> Dear KVM experts,
>
> We recently had a Broadwell server reboot itself. Just before the reboot,
> netconsole captured these error messages from KVM:
> [5599380.317055] kvm [9046]: vcpu1, guest rIP: 0xffffffff816ad716 vmx: unexpected exit reason 0x3
> [...]
The only way a CPU raises this exit reason (3 == EXIT_REASON_INIT_SIGNAL)
is if it is in VMX non-root mode while its LAPIC has a pending INIT signal.
In simple terms, one CPU was running a guest while
another CPU sent it a signal to reset itself.
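A simplified C model of why the message says "unexpected": as I understand the 4.14 code, KVM dispatches each VM exit through a table of handlers indexed by exit reason and prints this message when no handler exists, so the log implies that kernel had none for reason 3. The handler bodies, the table contents, and the names `dispatch_exit()` and `basic_exit_reason()` are hypothetical; only the exit-reason numbers (Intel SDM Vol. 3, Appendix C) and the message text come from real sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A few basic exit reasons, numbered as in the Intel SDM, Vol. 3, Appendix C. */
enum {
	EXIT_REASON_EXCEPTION_NMI      = 0,
	EXIT_REASON_EXTERNAL_INTERRUPT = 1,
	EXIT_REASON_TRIPLE_FAULT       = 2,
	EXIT_REASON_INIT_SIGNAL        = 3,
	EXIT_REASON_CPUID              = 10,
	EXIT_REASON_HLT                = 12,
};

/* Hypothetical handlers: 1 means "resume the guest", 0 "exit to userspace". */
static int handle_exception_nmi(void)      { return 1; }
static int handle_external_interrupt(void) { return 1; }
static int handle_triple_fault(void)       { return 0; }
static int handle_cpuid(void)              { return 1; }
static int handle_hlt(void)                { return 1; }

/* Simplified dispatch table: like the 4.14 kernel in the report, it has
 * no entry for EXIT_REASON_INIT_SIGNAL, so that slot is NULL. */
static int (*const exit_handlers[])(void) = {
	[EXIT_REASON_EXCEPTION_NMI]      = handle_exception_nmi,
	[EXIT_REASON_EXTERNAL_INTERRUPT] = handle_external_interrupt,
	[EXIT_REASON_TRIPLE_FAULT]       = handle_triple_fault,
	[EXIT_REASON_CPUID]              = handle_cpuid,
	[EXIT_REASON_HLT]                = handle_hlt,
};
#define NR_HANDLERS (sizeof(exit_handlers) / sizeof(exit_handlers[0]))

/* Bits 15:0 of the VM-exit reason field hold the basic exit reason;
 * bit 31 flags a failed VM entry and is masked off here. */
static uint16_t basic_exit_reason(uint32_t raw)
{
	return (uint16_t)(raw & 0xffff);
}

/* Runs the handler if one exists and returns 1; otherwise writes a
 * message in the same style as the report into msg and returns 0. */
static int dispatch_exit(uint32_t raw, char *msg, size_t len)
{
	uint16_t reason = basic_exit_reason(raw);

	assert(msg != NULL && len > 0);
	msg[0] = '\0';
	if (reason < NR_HANDLERS && exit_handlers[reason]) {
		exit_handlers[reason]();
		return 1;
	}
	snprintf(msg, len, "vmx: unexpected exit reason 0x%x", (unsigned int)reason);
	return 0;
}
```

With this table an HLT exit is handled normally, while reason 3 falls through to "vmx: unexpected exit reason 0x3", the same text as in the report.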
In the code, kvm_init() calls register_reboot_notifier(&kvm_reboot_notifier),
and kvm_reboot() runs hardware_disable_nolock() on each CPU before reboot.
This should result in every CPU running VMX's hardware_disable(), which
exits VMX operation (VMXOFF) and disables VMX (clears CR4.VMXE).
Therefore, I'm quite puzzled as to how a server reboot triggers the scenario you describe.
Can you send your full kernel log?
>
> Kernel version is: 4.14.129
> CPU is Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
> No crash dump was generated; the messages above are the only output we got right before the server rebooted.
>
> Does anyone have an idea what could cause the reboot? Is there a known
> problem in this regard?
>
> I noticed that EXIT_REASON_INIT_SIGNAL (3) was introduced recently. Is it related?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kvm?id=4b9852f4f38909a9ca74e71afb35aafba0871aa1
I'm the author of that commit, and it shouldn't be related, i.e., applying it to your kernel won't help.
That commit changes the handling of *virtual* INIT signals inside a guest.
What you are seeing are exits that result from a *physical* INIT signal arriving while the CPU was running a guest.
-Liran
* Re: Broadwell server reboot with vmx: unexpected exit reason 0x3
From: Sean Christopherson @ 2019-10-02 17:29 UTC (permalink / raw)
To: Liran Alon; +Cc: Jinpu Wang, kvm
On Mon, Sep 30, 2019 at 01:48:15PM +0300, Liran Alon wrote:
> The only way a CPU raises this exit reason (3 == EXIT_REASON_INIT_SIGNAL)
> is if it is in VMX non-root mode while its LAPIC has a pending INIT signal.
>
> In simple terms, one CPU was running a guest while
> another CPU sent it a signal to reset itself.
>
> In the code, kvm_init() calls register_reboot_notifier(&kvm_reboot_notifier),
> and kvm_reboot() runs hardware_disable_nolock() on each CPU before reboot.
> This should result in every CPU running VMX's hardware_disable(), which
> exits VMX operation (VMXOFF) and disables VMX (clears CR4.VMXE).
>
> Therefore, I'm quite puzzled as to how a server reboot triggers the scenario
> you describe. Can you send your full kernel log?
My guess is that the system triggered an emergency reboot and then either
was unable to force the other CPUs out of VMX non-root mode with NMIs, hit a
triple-fault shutdown that auto-generated INITs before it could shoot down
the other CPUs, or didn't even attempt the NMI shootdown because VMX wasn't
enabled on the CPU that triggered the reboot.
In arch/x86/kernel/reboot.c:
/* Use NMIs as IPIs to tell all CPUs to disable virtualization */
static void emergency_vmx_disable_all(void)
{
	/* Just make sure we won't change CPUs while doing this */
	local_irq_disable();

	/*
	 * We need to disable VMX on all CPUs before rebooting, otherwise
	 * we risk hanging up the machine, because the CPU ignore INIT
	 * signals when VMX is enabled.
	 *
	 * We can't take any locks and we may be on an inconsistent
	 * state, so we use NMIs as IPIs to tell the other CPUs to disable
	 * VMX and halt.
	 *
	 * For safety, we will avoid running the nmi_shootdown_cpus()
	 * stuff unnecessarily, but we don't have a way to check
	 * if other CPUs have VMX enabled. So we will call it only if the
	 * CPU we are running on has VMX enabled.
	 *
	 * We will miss cases where VMX is not enabled on all CPUs. This
	 * shouldn't do much harm because KVM always enable VMX on all
	 * CPUs anyway. But we can miss it on the small window where KVM
	 * is still enabling VMX.
	 */
	if (cpu_has_vmx() && cpu_vmx_enabled()) {
		/* Disable VMX on this CPU. */
		cpu_vmxoff();

		/* Halt and disable VMX on the other CPUs */
		nmi_shootdown_cpus(vmxoff_nmi);
	}
}

static void native_machine_emergency_restart(void)
{
	...

	if (reboot_emergency)
		emergency_vmx_disable_all();
}
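The hypotheses above differ in which CPUs are still in VMX non-root operation when the reset arrives. A toy C model of two of them (no shootdown because the rebooting CPU has VMX disabled, and NMIs that never take effect), following the structure of emergency_vmx_disable_all() quoted above. It assumes, as the kernel comment implies, that the reset reaches the other CPUs as INIT; the flags, `setup()`, and the `*_model` names are hypothetical, and the triple-fault case is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical host size */

struct cpu_state {
	bool vmx_on;   /* CR4.VMXE set, in VMX operation */
	bool in_guest; /* in VMX non-root operation */
	bool nmi_ok;   /* the shootdown NMI reaches this CPU and its callback runs */
};

static struct cpu_state cpus[NR_CPUS];

/* Mirrors the structure of emergency_vmx_disable_all(): the NMI
 * shootdown happens only if the rebooting CPU has VMX enabled. */
static void emergency_vmx_disable_all_model(int self)
{
	assert(self >= 0 && self < NR_CPUS);
	if (!cpus[self].vmx_on)
		return; /* other CPUs are never told to leave VMX */

	cpus[self].in_guest = false; /* cpu_vmxoff() on this CPU */
	cpus[self].vmx_on = false;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == self || !cpus[cpu].nmi_ok)
			continue;
		cpus[cpu].in_guest = false; /* the NMI forces a VM exit */
		cpus[cpu].vmx_on = false;   /* the vmxoff_nmi() callback */
	}
}

/* CPUs that would log "unexpected exit reason 0x3" when INIT arrives:
 * those still in VMX non-root operation. */
static int count_init_exits(void)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpus[cpu].vmx_on && cpus[cpu].in_guest)
			n++;
	return n;
}

/* Resets the model: CPU 0 triggers the reboot, the others run guests. */
static void setup(bool self_vmx_on, bool nmis_land)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		cpus[cpu].vmx_on = cpu == 0 ? self_vmx_on : true;
		cpus[cpu].in_guest = cpu != 0;
		cpus[cpu].nmi_ok = nmis_land;
	}
}
```

In this model only the case where the rebooting CPU has VMX enabled and every NMI lands avoids reason-3 exits; either failure leaves every guest-running CPU to log one, which would match the shape of the report if the guess is right.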
* Re: Broadwell server reboot with vmx: unexpected exit reason 0x3
From: Jinpu Wang @ 2019-10-04 8:53 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Liran Alon, kvm
On Wed, Oct 2, 2019 at 7:29 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
> My guess is that the system triggered an emergency reboot and then either
> was unable to force the other CPUs out of VMX non-root mode with NMIs, hit a
> triple-fault shutdown that auto-generated INITs before it could shoot down
> the other CPUs, or didn't even attempt the NMI shootdown because VMX wasn't
> enabled on the CPU that triggered the reboot.
> [...]
Thanks for the information, Sean. I checked the call paths to
emergency_restart(), and I would expect a kernel message indicating
why an emergency restart was needed, but nothing was logged in
netconsole or the kernel log. I don't understand why.
Do you have a guess as to what could cause the system to trigger an emergency reboot?
Regards,
Jinpu
* Re: Broadwell server reboot with vmx: unexpected exit reason 0x3
From: Sean Christopherson @ 2019-10-17 18:52 UTC (permalink / raw)
To: Jinpu Wang; +Cc: Liran Alon, kvm
On Fri, Oct 04, 2019 at 10:53:40AM +0200, Jinpu Wang wrote:
> Thanks for the information, Sean. I checked the call paths to
> emergency_restart(), and I would expect a kernel message indicating
> why an emergency restart was needed, but nothing was logged in
> netconsole or the kernel log. I don't understand why.
>
> Do you have a guess as to what could cause the system to trigger an emergency reboot?
Not really. The emergency reboot theory is itself just a guess. Sorry :-(