* [4.1.y] vmwrite error: reg 401e value a9 (err 1)
@ 2016-11-09  0:17 Greg Edwards
  2016-11-09  3:10 ` Huang, Kai
  0 siblings, 1 reply; 4+ messages in thread
From: Greg Edwards @ 2016-11-09  0:17 UTC (permalink / raw)
  To: kvm
  Cc: Sasha Levin, Paolo Bonzini, Radim Krčmář,
	Jim Mattson, Kai Huang

On the current 4.1.y stable kernel (4.1.35) on a Broadwell-EP system, I see the
following when shutting down a multi-vcpu VM:

[  758.387722] vmwrite error: reg 401e value a9 (err 1)
[  758.392860] CPU: 33 PID: 14969 Comm: qemu-system-x86 Not tainted 4.1.35 #1
[  758.399897] Hardware name: DDN 14000x/14000, BIOS 0229 09/23/2016
[  758.406156]  0000000000000286 0000000028b15def ffff88202f3fbb38 ffffffff8159de63
[  758.413942]  ffff88402a938000 0000000000000001 ffff88202f3fbb48 ffffffffa060fa1c
[  758.421736]  ffff88202f3fbb58 ffffffffa060fa49 ffff88202f3fbb78 ffffffffa0618fab
[  758.429534] Call Trace:
[  758.432147]  [<ffffffff8159de63>] dump_stack+0x4d/0x63
[  758.437449]  [<ffffffffa060fa1c>] vmwrite_error+0x2c/0x30 [kvm_intel]
[  758.444059]  [<ffffffffa060fa49>] vmcs_writel+0x29/0x30 [kvm_intel]
[  758.450493]  [<ffffffffa0618fab>] vmx_free_vcpu+0xdb/0xf0 [kvm_intel]
[  758.457111]  [<ffffffffa059ddb8>] kvm_arch_vcpu_free+0x48/0x50 [kvm]
[  758.463637]  [<ffffffffa059eb8a>] kvm_arch_destroy_vm+0x10a/0x200 [kvm]
[  758.470418]  [<ffffffff810caff8>] ? synchronize_srcu+0x28/0x30
[  758.476419]  [<ffffffffa05850c5>] kvm_put_kvm+0x105/0x220 [kvm]
[  758.482505]  [<ffffffffa0585218>] kvm_vcpu_release+0x18/0x20 [kvm]
[  758.488853]  [<ffffffff811a143b>] __fput+0xcb/0x1d0
[  758.493899]  [<ffffffff811a158e>] ____fput+0xe/0x10
[  758.498939]  [<ffffffff81098ec4>] task_work_run+0xd4/0xf0
[  758.504497]  [<ffffffff8107d811>] do_exit+0x2a1/0xb40
[  758.509708]  [<ffffffff8107eef7>] do_group_exit+0x47/0xc0
[  758.515269]  [<ffffffff8108adc3>] get_signal+0x1f3/0x6c0
[  758.520743]  [<ffffffff81003517>] do_signal+0x37/0x800
[  758.526042]  [<ffffffff810e9c35>] ? SyS_futex+0x85/0x1a0
[  758.531513]  [<ffffffff81003d50>] do_notify_resume+0x70/0x80
[  758.537334]  [<ffffffff815a4882>] int_signal+0x12/0x17

This started with the inclusion of 6c2ca21665b99ce2f76389c353b985d8195387cc
("KVM: nVMX: Fix memory corruption when using VMCS shadowing") in 4.1.31.

The error is coming out of vmx_disable_pml() when freeing the 2nd and
subsequent vcpus, as SECONDARY_EXEC_ENABLE_PML was already cleared from the
SECONDARY_VM_EXEC_CONTROL when the first vcpu was freed.
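For readers decoding the log line, here is a minimal sketch in Python. It
assumes the message format "reg <hex> value <hex> (err <decimal>)" seen in the
trace above, and that the field encoding and bit positions match the
definitions in arch/x86/include/asm/vmx.h (0x401e for
SECONDARY_VM_EXEC_CONTROL, bit 17 for SECONDARY_EXEC_ENABLE_PML). The helper
name decode_vmwrite_error and the shortened bit names are hypothetical, and
only a subset of the control bits is listed.

```python
import re

# Subset of VMCS field encodings and secondary-control bits, assumed to match
# arch/x86/include/asm/vmx.h. Names are shortened for readability.
VMCS_FIELDS = {0x401E: "SECONDARY_VM_EXEC_CONTROL"}
SECONDARY_EXEC_BITS = {
    0: "VIRTUALIZE_APIC_ACCESSES",
    1: "ENABLE_EPT",
    3: "RDTSCP",
    5: "ENABLE_VPID",
    7: "UNRESTRICTED_GUEST",
    17: "ENABLE_PML",
}

# Assumed log format: register and value in hex, error code in decimal.
LINE_RE = re.compile(
    r"vmwrite error: reg ([0-9a-f]+) value ([0-9a-f]+) \(err (-?\d+)\)"
)


def decode_vmwrite_error(line):
    """Parse one 'vmwrite error' log line into field name, value and set bits."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("not a vmwrite error line")
    reg = int(match.group(1), 16)
    value = int(match.group(2), 16)
    err = int(match.group(3))
    # Name every set bit; bits missing from the table are reported as bitN.
    set_bits = [
        SECONDARY_EXEC_BITS.get(bit, f"bit{bit}")
        for bit in range(32)
        if (value >> bit) & 1
    ]
    return {
        "field": VMCS_FIELDS.get(reg, hex(reg)),
        "value": value,
        "err": err,
        "set_bits": set_bits,
    }


if __name__ == "__main__":
    sample = "[  758.387722] vmwrite error: reg 401e value a9 (err 1)"
    print(decode_vmwrite_error(sample))
```

Applied to the line above, the sketch reports SECONDARY_VM_EXEC_CONTROL as the
target field, with ENABLE_PML absent from the written value, which is what one
would expect from a write that clears that bit.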

Additionally pulling back a3eaa8649e4c6a6afdafaa04b9114fb230617bb1 ("KVM: VMX:
Fix commit which broke PML") from 4.4 resolves it for me, as it fixes
the above condition.

Is this the correct fix for 4.1.y?

Greg


* Re: [4.1.y] vmwrite error: reg 401e value a9 (err 1)
  2016-11-09  0:17 [4.1.y] vmwrite error: reg 401e value a9 (err 1) Greg Edwards
@ 2016-11-09  3:10 ` Huang, Kai
  2016-11-17 12:41   ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Huang, Kai @ 2016-11-09  3:10 UTC (permalink / raw)
  To: Greg Edwards, kvm
  Cc: Sasha Levin, Paolo Bonzini, Radim Krčmář,
	Jim Mattson, pfeiner

Hi Greg,

Thanks for reporting this issue.

I don't have the 4.1.y source tree at hand, but at a glance it looks like
commit a3eaa8649e4c6a6afdafaa04b9114fb230617bb1 ("KVM: VMX: Fix commit
which broke PML") fixes this by removing the vmwrite to
SECONDARY_VM_EXEC_CONTROL in vmx_disable_pml(), so yes, I think this commit
can fix this issue.
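
A toy model of that idea, written in Python rather than kernel C, is sketched
below. It rests on an assumption not confirmed in this thread: that with the
vmwrite removed from the teardown path, the PML bit is decided once, when the
secondary controls are computed, from a global enable_pml setting. The
function name is hypothetical; the bit value is assumed to match
SECONDARY_EXEC_ENABLE_PML in arch/x86/include/asm/vmx.h.

```python
SECONDARY_EXEC_ENABLE_PML = 0x00020000  # bit 17 of the secondary controls


def compute_secondary_exec_control(supported_controls, enable_pml):
    """Toy model: choose the PML bit once, at control setup time.

    supported_controls is the set of control bits the CPU offers;
    enable_pml stands in for a global on/off setting (an assumption).
    """
    control = supported_controls
    if not enable_pml:
        # PML is switched off globally, so never set the bit.
        control &= ~SECONDARY_EXEC_ENABLE_PML
    return control


if __name__ == "__main__":
    supported = 0x200A9  # hypothetical capability mask that includes PML
    print(hex(compute_secondary_exec_control(supported, enable_pml=True)))
    print(hex(compute_secondary_exec_control(supported, enable_pml=False)))
```

In this model nothing on the vcpu free path has to read or write the control
field, so teardown cannot trigger a vmwrite failure on it.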

But I think you probably also need another commit to fix a potential vmwrite
error when creating a vcpu: 4e59516a12a6ef6dcb660cb3a3f70c64bd60cfec ("kvm:
vmx: ensure VMCS is current while enabling PML"). Peter found and fixed
this issue, so I have also added him to the cc list.

Paolo/Radim, please comment if I have made a mistake here.

Thanks,
-Kai



* Re: [4.1.y] vmwrite error: reg 401e value a9 (err 1)
  2016-11-09  3:10 ` Huang, Kai
@ 2016-11-17 12:41   ` Paolo Bonzini
  2016-11-17 16:12     ` Greg Edwards
  0 siblings, 1 reply; 4+ messages in thread
From: Paolo Bonzini @ 2016-11-17 12:41 UTC (permalink / raw)
  To: Huang, Kai, Greg Edwards, kvm
  Cc: Sasha Levin, Radim Krčmář, Jim Mattson, pfeiner



On 09/11/2016 04:10, Huang, Kai wrote:
> Hi Greg,
> 
> Thanks for reporting this issue.
> 
> I don't have the 4.1.y source tree at hand, but at a glance it looks like
> commit a3eaa8649e4c6a6afdafaa04b9114fb230617bb1 ("KVM: VMX: Fix commit
> which broke PML") fixes this by removing the vmwrite to
> SECONDARY_VM_EXEC_CONTROL in vmx_disable_pml(), so yes, I think this commit
> can fix this issue.
>
> But I think you probably also need another commit to fix a potential vmwrite
> error when creating a vcpu: 4e59516a12a6ef6dcb660cb3a3f70c64bd60cfec ("kvm:
> vmx: ensure VMCS is current while enabling PML"). Peter found and fixed
> this issue, so I have also added him to the cc list.
>
> Paolo/Radim, please comment if I have made a mistake here.

Yes, we should backport both of them.  Greg, can you test it?  Then I'll
send the two patches to linux-stable.

Paolo



* Re: [4.1.y] vmwrite error: reg 401e value a9 (err 1)
  2016-11-17 12:41   ` Paolo Bonzini
@ 2016-11-17 16:12     ` Greg Edwards
  0 siblings, 0 replies; 4+ messages in thread
From: Greg Edwards @ 2016-11-17 16:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Huang, Kai, kvm, Sasha Levin, Radim Krčmář,
	Jim Mattson, pfeiner

On Thu, Nov 17, 2016 at 01:41:12PM +0100, Paolo Bonzini wrote:
> Yes, we should backport both of them.  Greg, can you test it?  Then I'll
> send the two patches to linux-stable.

Yes, we've been running with both patches internally without problems
for a week or so.

Greg
