All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/kvm: disable fast MMIO when running nested
@ 2018-01-24 15:12 Vitaly Kuznetsov
  2018-01-25  7:55 ` Wanpeng Li
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Vitaly Kuznetsov @ 2018-01-24 15:12 UTC (permalink / raw)
  To: kvm; +Cc: x86, linux-kernel, Paolo Bonzini, Radim Krčmář

I was investigating an issue with seabios >= 1.10 which stopped working
for nested KVM on Hyper-V. The problem appears to be in
handle_ept_violation() function: when we do fast mmio we need to skip
the instruction so we do kvm_skip_emulated_instruction(). This, however,
depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
However, this is not the case.

Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
EPT MISCONFIG occurs. While on real hardware it was observed to be set,
some hypervisors follow the spec and don't set it; we end up advancing
IP with some random value.

I checked with Microsoft and they confirmed they don't fill
VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.

Fix the issue by disabling fast mmio when running nested.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/vmx.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c829d89e2e63..54afb446f38e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 	/*
 	 * A nested guest cannot optimize MMIO vmexits, because we have an
 	 * nGPA here instead of the required GPA.
+	 * Skipping instruction below depends on undefined behavior: Intel's
+	 * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
+	 * when EPT MISCONFIG occurs and while on real hardware it was observed
+	 * to be set, other hypervisors (namely Hyper-V) don't set it, we end
+	 * up advancing IP with some random value. Disable fast mmio when
+	 * running nested and keep it for real hardware in hope that
+	 * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
 	 */
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-	if (!is_guest_mode(vcpu) &&
+	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) && !is_guest_mode(vcpu) &&
 	    !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
 		trace_kvm_fast_mmio(gpa);
 		return kvm_skip_emulated_instruction(vcpu);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/kvm: disable fast MMIO when running nested
  2018-01-24 15:12 [PATCH] x86/kvm: disable fast MMIO when running nested Vitaly Kuznetsov
@ 2018-01-25  7:55 ` Wanpeng Li
  2018-01-25 14:34 ` Radim Krčmář
  2018-01-25 14:34 ` Paolo Bonzini
  2 siblings, 0 replies; 13+ messages in thread
From: Wanpeng Li @ 2018-01-25  7:55 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, the arch/x86 maintainers, linux-kernel, Paolo Bonzini,
	Radim Krčmář

2018-01-24 23:12 GMT+08:00 Vitaly Kuznetsov <vkuznets@redhat.com>:
> I was investigating an issue with seabios >= 1.10 which stopped working
> for nested KVM on Hyper-V. The problem appears to be in
> handle_ept_violation() function: when we do fast mmio we need to skip
> the instruction so we do kvm_skip_emulated_instruction(). This, however,
> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> However, this is not the case.
>
> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> some hypervisors follow the spec and don't set it; we end up advancing
> IP with some random value.
>
> I checked with Microsoft and they confirmed they don't fill
> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
>
> Fix the issue by disabling fast mmio when running nested.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>

> ---
>  arch/x86/kvm/vmx.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c829d89e2e63..54afb446f38e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
>         /*
>          * A nested guest cannot optimize MMIO vmexits, because we have an
>          * nGPA here instead of the required GPA.
> +        * Skipping instruction below depends on undefined behavior: Intel's
> +        * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> +        * when EPT MISCONFIG occurs and while on real hardware it was observed
> +        * to be set, other hypervisors (namely Hyper-V) don't set it, we end
> +        * up advancing IP with some random value. Disable fast mmio when
> +        * running nested and keep it for real hardware in hope that
> +        * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
>          */
>         gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> -       if (!is_guest_mode(vcpu) &&
> +       if (!static_cpu_has(X86_FEATURE_HYPERVISOR) && !is_guest_mode(vcpu) &&
>             !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
>                 trace_kvm_fast_mmio(gpa);
>                 return kvm_skip_emulated_instruction(vcpu);
> --
> 2.14.3
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/kvm: disable fast MMIO when running nested
  2018-01-24 15:12 [PATCH] x86/kvm: disable fast MMIO when running nested Vitaly Kuznetsov
  2018-01-25  7:55 ` Wanpeng Li
@ 2018-01-25 14:34 ` Radim Krčmář
  2018-01-25 14:34 ` Paolo Bonzini
  2 siblings, 0 replies; 13+ messages in thread
From: Radim Krčmář @ 2018-01-25 14:34 UTC (permalink / raw)
  To: Vitaly Kuznetsov; +Cc: kvm, x86, linux-kernel, Paolo Bonzini

2018-01-24 16:12+0100, Vitaly Kuznetsov:
> I was investigating an issue with seabios >= 1.10 which stopped working
> for nested KVM on Hyper-V. The problem appears to be in
> handle_ept_violation() function: when we do fast mmio we need to skip
> the instruction so we do kvm_skip_emulated_instruction(). This, however,
> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> However, this is not the case.
> 
> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> some hypervisors follow the spec and don't set it; we end up advancing
> IP with some random value.
> 
> I checked with Microsoft and they confirmed they don't fill
> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
> 
> Fix the issue by disabling fast mmio when running nested.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c829d89e2e63..54afb446f38e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
>  	/*
>  	 * A nested guest cannot optimize MMIO vmexits, because we have an
>  	 * nGPA here instead of the required GPA.
> +	 * Skipping instruction below depends on undefined behavior: Intel's
> +	 * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> +	 * when EPT MISCONFIG occurs and while on real hardware it was observed
> +	 * to be set, other hypervisors (namely Hyper-V) don't set it, we end
> +	 * up advancing IP with some random value. Disable fast mmio when
> +	 * running nested and keep it for real hardware in hope that
> +	 * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
>  	 */
>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> -	if (!is_guest_mode(vcpu) &&
> +	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) && !is_guest_mode(vcpu) &&

I realized that Paolo kept a minor optimization while getting rid of the
undefined behavior (https://patchwork.kernel.org/patch/9903811/).
Please do the same trick that signals kvm_io_bus_write() before going to
x86_emulate_instruction(... EMULTYPE_SKIP ...), but add a branch to use
kvm_skip_emulated_instruction() for bare-metal,

thanks.

>  	    !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
>  		trace_kvm_fast_mmio(gpa);
>  		return kvm_skip_emulated_instruction(vcpu);
> -- 
> 2.14.3
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/kvm: disable fast MMIO when running nested
  2018-01-24 15:12 [PATCH] x86/kvm: disable fast MMIO when running nested Vitaly Kuznetsov
  2018-01-25  7:55 ` Wanpeng Li
  2018-01-25 14:34 ` Radim Krčmář
@ 2018-01-25 14:34 ` Paolo Bonzini
  2018-01-25 14:49   ` Vitaly Kuznetsov
  2 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2018-01-25 14:34 UTC (permalink / raw)
  To: Vitaly Kuznetsov; +Cc: kvm, x86, linux-kernel, Radim Krčmář



----- Original Message -----
> From: "Vitaly Kuznetsov" <vkuznets@redhat.com>
> To: kvm@vger.kernel.org
> Cc: x86@kernel.org, linux-kernel@vger.kernel.org, "Paolo Bonzini" <pbonzini@redhat.com>, "Radim Krčmář"
> <rkrcmar@redhat.com>
> Sent: Wednesday, January 24, 2018 4:12:34 PM
> Subject: [PATCH] x86/kvm: disable fast MMIO when running nested
> 
> I was investigating an issue with seabios >= 1.10 which stopped working
> for nested KVM on Hyper-V. The problem appears to be in
> handle_ept_violation() function: when we do fast mmio we need to skip
> the instruction so we do kvm_skip_emulated_instruction(). This, however,
> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> However, this is not the case.
> 
> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> some hypervisors follow the spec and don't set it; we end up advancing
> IP with some random value.
> 
> I checked with Microsoft and they confirmed they don't fill
> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
> 
> Fix the issue by disabling fast mmio when running nested.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c829d89e2e63..54afb446f38e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
>  	/*
>  	 * A nested guest cannot optimize MMIO vmexits, because we have an
>  	 * nGPA here instead of the required GPA.
> +	 * Skipping instruction below depends on undefined behavior: Intel's
> +	 * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> +	 * when EPT MISCONFIG occurs and while on real hardware it was observed
> +	 * to be set, other hypervisors (namely Hyper-V) don't set it, we end
> +	 * up advancing IP with some random value. Disable fast mmio when
> +	 * running nested and keep it for real hardware in hope that
> +	 * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
>  	 */
>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> -	if (!is_guest_mode(vcpu) &&
> +	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) && !is_guest_mode(vcpu) &&
>  	    !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
>  		trace_kvm_fast_mmio(gpa);
>  		return kvm_skip_emulated_instruction(vcpu);
> --
> 2.14.3

Vitaly,

could you base the X86_FEATURE_HYPERVISOR case on the patch at
https://patchwork.kernel.org/patch/9903811/?

By using EMULTYPE_SKIP, the eventfd is signaled before entering the
emulator and the impact on latency is small.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/kvm: disable fast MMIO when running nested
  2018-01-25 14:34 ` Paolo Bonzini
@ 2018-01-25 14:49   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 13+ messages in thread
From: Vitaly Kuznetsov @ 2018-01-25 14:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, x86, linux-kernel, Radim Krčmář

Paolo Bonzini <pbonzini@redhat.com> writes:

> ----- Original Message -----
>> From: "Vitaly Kuznetsov" <vkuznets@redhat.com>
>> To: kvm@vger.kernel.org
>> Cc: x86@kernel.org, linux-kernel@vger.kernel.org, "Paolo Bonzini" <pbonzini@redhat.com>, "Radim Krčmář"
>> <rkrcmar@redhat.com>
>> Sent: Wednesday, January 24, 2018 4:12:34 PM
>> Subject: [PATCH] x86/kvm: disable fast MMIO when running nested
>> 
>> I was investigating an issue with seabios >= 1.10 which stopped working
>> for nested KVM on Hyper-V. The problem appears to be in
>> handle_ept_violation() function: when we do fast mmio we need to skip
>> the instruction so we do kvm_skip_emulated_instruction(). This, however,
>> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
>> However, this is not the case.
>> 
>> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
>> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
>> some hypervisors follow the spec and don't set it; we end up advancing
>> IP with some random value.
>> 
>> I checked with Microsoft and they confirmed they don't fill
>> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
>> 
>> Fix the issue by disabling fast mmio when running nested.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/kvm/vmx.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index c829d89e2e63..54afb446f38e 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
>>  	/*
>>  	 * A nested guest cannot optimize MMIO vmexits, because we have an
>>  	 * nGPA here instead of the required GPA.
>> +	 * Skipping instruction below depends on undefined behavior: Intel's
>> +	 * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
>> +	 * when EPT MISCONFIG occurs and while on real hardware it was observed
>> +	 * to be set, other hypervisors (namely Hyper-V) don't set it, we end
>> +	 * up advancing IP with some random value. Disable fast mmio when
>> +	 * running nested and keep it for real hardware in hope that
>> +	 * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
>>  	 */
>>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>> -	if (!is_guest_mode(vcpu) &&
>> +	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) && !is_guest_mode(vcpu) &&
>>  	    !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
>>  		trace_kvm_fast_mmio(gpa);
>>  		return kvm_skip_emulated_instruction(vcpu);
>> --
>> 2.14.3
>
> Vitaly,
>
> could you base the X86_FEATURE_HYPERVISOR case on the patch at
> https://patchwork.kernel.org/patch/9903811/?
>
> By using EMULTYPE_SKIP, the eventfd is signaled before entering the
> emulator and the impact on latency is small.
>

Oh, I didn't know there was a story! I'll try EMULTYPE_SKIP, thanks!

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/kvm: disable fast MMIO when running nested
  2018-01-26  2:49           ` Michael S. Tsirkin
@ 2018-01-26  3:21             ` Jason Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2018-01-26  3:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, Radim Krčmář,
	Liran Alon, vkuznets, x86, linux-kernel, kvm



On 2018年01月26日 10:49, Michael S. Tsirkin wrote:
> On Fri, Jan 26, 2018 at 10:41:58AM +0800, Jason Wang wrote:
>>
>> On 2018年01月26日 01:11, Michael S. Tsirkin wrote:
>>> On Thu, Jan 25, 2018 at 09:49:22AM -0500, Paolo Bonzini wrote:
>>>>>> Michael and Jason, any progress on implementing a fast virtio mechanism
>>>>>> that doesn't rely on undefined behavior?
>>>>>>
>>>>>> (Encode writing instruction length into last 4 bits of MMIO address,
>>>>>>     side-channel say that accesses to the MMIO area always use certain
>>>>>>     instruction length, use hypercall, ...)
>>>>>>
>>>>>> Thanks.
>>>>> No progress from my side. But we can use PIO for virtio 1.0 and it's
>>>>> faster than fast MMIO (qemu supports modern pio notification bar, we can
>>>>> make it as default). It looks to me that neither encoding nor hypercall
>>>>> will work for real hardware virtio device.
>>>> Encoding the instruction length would work, the h/w virtio devices would
>>>> just ignore it.  But... it is really ugly.
>>>>
>>>> Using PIO would be a small step backwards for PCIe.  As long as the device
>>>> only needs *one* notification register (either MMIO or PIO) to initialize
>>>> successfully, it's okay.  Then if there is no PIO space you'd just fall back
>>>> to the slower MMIO notification.
>>>>
>>>> Paolo
>>> A bigger issue for PIO is it's causing exits for hw devices.
>>>
>>>
>> Just to make sure I understand. For exits you mean vmexit? I believe MMIO
>> will cause vmexit too.
>>
>> Thanks
> Not with an assigned device where the PTE is marked as present, it
> won't.
>

So in this case, assigned device can just provide MMIO bar.

Thanks

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/kvm: disable fast MMIO when running nested
  2018-01-26  2:41         ` Jason Wang
@ 2018-01-26  2:49           ` Michael S. Tsirkin
  2018-01-26  3:21             ` Jason Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2018-01-26  2:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: Paolo Bonzini, Radim Krčmář,
	Liran Alon, vkuznets, x86, linux-kernel, kvm

On Fri, Jan 26, 2018 at 10:41:58AM +0800, Jason Wang wrote:
> 
> 
> On 2018年01月26日 01:11, Michael S. Tsirkin wrote:
> > On Thu, Jan 25, 2018 at 09:49:22AM -0500, Paolo Bonzini wrote:
> > > > > Michael and Jason, any progress on implementing a fast virtio mechanism
> > > > > that doesn't rely on undefined behavior?
> > > > > 
> > > > > (Encode writing instruction length into last 4 bits of MMIO address,
> > > > >    side-channel say that accesses to the MMIO area always use certain
> > > > >    instruction length, use hypercall, ...)
> > > > > 
> > > > > Thanks.
> > > > No progress from my side. But we can use PIO for virtio 1.0 and it's
> > > > faster than fast MMIO (qemu supports modern pio notification bar, we can
> > > > make it as default). It looks to me that neither encoding nor hypercall
> > > > will work for real hardware virtio device.
> > > Encoding the instruction length would work, the h/w virtio devices would
> > > just ignore it.  But... it is really ugly.
> > > 
> > > Using PIO would be a small step backwards for PCIe.  As long as the device
> > > only needs *one* notification register (either MMIO or PIO) to initialize
> > > successfully, it's okay.  Then if there is no PIO space you'd just fall back
> > > to the slower MMIO notification.
> > > 
> > > Paolo
> > A bigger issue for PIO is it's causing exits for hw devices.
> > 
> > 
> 
> Just to make sure I understand. For exits you mean vmexit? I believe MMIO
> will cause vmexit too.
> 
> Thanks

Not with an assigned device where the PTE is marked as present, it
won't.

-- 
MST

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/kvm: disable fast MMIO when running nested
  2018-01-25 17:11       ` Michael S. Tsirkin
@ 2018-01-26  2:41         ` Jason Wang
  2018-01-26  2:49           ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2018-01-26  2:41 UTC (permalink / raw)
  To: Michael S. Tsirkin, Paolo Bonzini
  Cc: Radim Krčmář,
	Liran Alon, vkuznets, x86, linux-kernel, kvm



On 2018年01月26日 01:11, Michael S. Tsirkin wrote:
> On Thu, Jan 25, 2018 at 09:49:22AM -0500, Paolo Bonzini wrote:
>>>> Michael and Jason, any progress on implementing a fast virtio mechanism
>>>> that doesn't rely on undefined behavior?
>>>>
>>>> (Encode writing instruction length into last 4 bits of MMIO address,
>>>>    side-channel say that accesses to the MMIO area always use certain
>>>>    instruction length, use hypercall, ...)
>>>>
>>>> Thanks.
>>> No progress from my side. But we can use PIO for virtio 1.0 and it's
>>> faster than fast MMIO (qemu supports modern pio notification bar, we can
>>> make it as default). It looks to me that neither encoding nor hypercall
>>> will work for real hardware virtio device.
>> Encoding the instruction length would work, the h/w virtio devices would
>> just ignore it.  But... it is really ugly.
>>
>> Using PIO would be a small step backwards for PCIe.  As long as the device
>> only needs *one* notification register (either MMIO or PIO) to initialize
>> successfully, it's okay.  Then if there is no PIO space you'd just fall back
>> to the slower MMIO notification.
>>
>> Paolo
> A bigger issue for PIO is it's causing exits for hw devices.
>
>

Just to make sure I understand. For exits you mean vmexit? I believe 
MMIO will cause vmexit too.

Thanks

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/kvm: disable fast MMIO when running nested
  2018-01-25 14:49     ` Paolo Bonzini
@ 2018-01-25 17:11       ` Michael S. Tsirkin
  2018-01-26  2:41         ` Jason Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2018-01-25 17:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jason Wang, Radim Krčmář,
	Liran Alon, vkuznets, x86, linux-kernel, kvm

On Thu, Jan 25, 2018 at 09:49:22AM -0500, Paolo Bonzini wrote:
> > > Michael and Jason, any progress on implementing a fast virtio mechanism
> > > that doesn't rely on undefined behavior?
> > >
> > > (Encode writing instruction length into last 4 bits of MMIO address,
> > >   side-channel say that accesses to the MMIO area always use certain
> > >   instruction length, use hypercall, ...)
> > >
> > > Thanks.
> > 
> > No progress from my side. But we can use PIO for virtio 1.0 and it's
> > faster than fast MMIO (qemu supports modern pio notification bar, we can
> > make it as default). It looks to me that neither encoding nor hypercall
> > will work for real hardware virtio device.
> 
> Encoding the instruction length would work, the h/w virtio devices would
> just ignore it.  But... it is really ugly.
> 
> Using PIO would be a small step backwards for PCIe.  As long as the device
> only needs *one* notification register (either MMIO or PIO) to initialize
> successfully, it's okay.  Then if there is no PIO space you'd just fall back
> to the slower MMIO notification.
> 
> Paolo

A bigger issue for PIO is it's causing exits for hw devices.


-- 
MST

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/kvm: disable fast MMIO when running nested
  2018-01-25 14:39   ` Jason Wang
@ 2018-01-25 14:49     ` Paolo Bonzini
  2018-01-25 17:11       ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2018-01-25 14:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: Radim Krčmář,
	Liran Alon, vkuznets, x86, linux-kernel, kvm, Michael S. Tsirkin

> > Michael and Jason, any progress on implementing a fast virtio mechanism
> > that doesn't rely on undefined behavior?
> >
> > (Encode writing instruction length into last 4 bits of MMIO address,
> >   side-channel say that accesses to the MMIO area always use certain
> >   instruction length, use hypercall, ...)
> >
> > Thanks.
> 
> No progress from my side. But we can use PIO for virtio 1.0 and it's
> faster than fast MMIO (qemu supports modern pio notification bar, we can
> make it as default). It looks to me that neither encoding nor hypercall
> will work for real hardware virtio device.

Encoding the instruction length would work, the h/w virtio devices would
just ignore it.  But... it is really ugly.

Using PIO would be a small step backwards for PCIe.  As long as the device
only needs *one* notification register (either MMIO or PIO) to initialize
successfully, it's okay.  Then if there is no PIO space you'd just fall back
to the slower MMIO notification.

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/kvm: disable fast MMIO when running nested
  2018-01-25 14:16 ` Radim Krčmář
@ 2018-01-25 14:39   ` Jason Wang
  2018-01-25 14:49     ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2018-01-25 14:39 UTC (permalink / raw)
  To: Radim Krčmář, Liran Alon
  Cc: vkuznets, x86, pbonzini, linux-kernel, kvm, Michael S. Tsirkin



On 2018年01月25日 22:16, Radim Krčmář wrote:
> 2018-01-25 01:55-0800, Liran Alon:
>> ----- vkuznets@redhat.com wrote:
>>> I was investigating an issue with seabios >= 1.10 which stopped
>>> working
>>> for nested KVM on Hyper-V. The problem appears to be in
>>> handle_ept_violation() function: when we do fast mmio we need to skip
>>> the instruction so we do kvm_skip_emulated_instruction(). This,
>>> however,
>>> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
>>> However, this is not the case.
>>>
>>> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
>>> EPT MISCONFIG occurs. While on real hardware it was observed to be
>>> set,
>>> some hypervisors follow the spec and don't set it; we end up
>>> advancing
>>> IP with some random value.
>>>
>>> I checked with Microsoft and they confirmed they don't fill
>>> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
>>>
>>> Fix the issue by disabling fast mmio when running nested.
>>>
>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>> ---
>>>   arch/x86/kvm/vmx.c | 9 ++++++++-
>>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index c829d89e2e63..54afb446f38e 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu
>>> *vcpu)
>>>   	/*
>>>   	 * A nested guest cannot optimize MMIO vmexits, because we have an
>>>   	 * nGPA here instead of the required GPA.
>>> +	 * Skipping instruction below depends on undefined behavior:
>>> Intel's
>>> +	 * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
>>> +	 * when EPT MISCONFIG occurs and while on real hardware it was
>>> observed
>>> +	 * to be set, other hypervisors (namely Hyper-V) don't set it, we
>>> end
>>> +	 * up advancing IP with some random value. Disable fast mmio when
>>> +	 * running nested and keep it for real hardware in hope that
>>> +	 * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
>> If Intel manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS on EPT_MISCONFIG,
>> I don't think we should do this on real-hardware as-well.
> Neither do I, but you can see the last discussion on this topic,
> https://patchwork.kernel.org/patch/9903811/.  In short, we've agreed to
> limit the hack to real hardware and wait for Intel or virtio changes.
>
> Michael and Jason, any progress on implementing a fast virtio mechanism
> that doesn't rely on undefined behavior?
>
> (Encode writing instruction length into last 4 bits of MMIO address,
>   side-channel say that accesses to the MMIO area always use certain
>   instruction length, use hypercall, ...)
>
> Thanks.

No progress from my side. But we can use PIO for virtio 1.0 and it's 
faster than fast MMIO (qemu supports modern pio notification bar, we can 
make it as default). It looks to me that neither encoding nor hypercall 
will work for real hardware virtio device.

Thanks

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/kvm: disable fast MMIO when running nested
  2018-01-25  9:55 Liran Alon
@ 2018-01-25 14:16 ` Radim Krčmář
  2018-01-25 14:39   ` Jason Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Radim Krčmář @ 2018-01-25 14:16 UTC (permalink / raw)
  To: Liran Alon
  Cc: vkuznets, x86, pbonzini, linux-kernel, kvm, Michael S. Tsirkin,
	Jason Wang

2018-01-25 01:55-0800, Liran Alon:
> ----- vkuznets@redhat.com wrote:
> > I was investigating an issue with seabios >= 1.10 which stopped
> > working
> > for nested KVM on Hyper-V. The problem appears to be in
> > handle_ept_violation() function: when we do fast mmio we need to skip
> > the instruction so we do kvm_skip_emulated_instruction(). This,
> > however,
> > depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> > However, this is not the case.
> > 
> > Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> > EPT MISCONFIG occurs. While on real hardware it was observed to be
> > set,
> > some hypervisors follow the spec and don't set it; we end up
> > advancing
> > IP with some random value.
> > 
> > I checked with Microsoft and they confirmed they don't fill
> > VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
> > 
> > Fix the issue by disabling fast mmio when running nested.
> > 
> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > ---
> >  arch/x86/kvm/vmx.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index c829d89e2e63..54afb446f38e 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu
> > *vcpu)
> >  	/*
> >  	 * A nested guest cannot optimize MMIO vmexits, because we have an
> >  	 * nGPA here instead of the required GPA.
> > +	 * Skipping instruction below depends on undefined behavior:
> > Intel's
> > +	 * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> > +	 * when EPT MISCONFIG occurs and while on real hardware it was
> > observed
> > +	 * to be set, other hypervisors (namely Hyper-V) don't set it, we
> > end
> > +	 * up advancing IP with some random value. Disable fast mmio when
> > +	 * running nested and keep it for real hardware in hope that
> > +	 * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
> 
> If Intel manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS on EPT_MISCONFIG,
> I don't think we should do this on real-hardware as-well.

Neither do I, but you can see the last discussion on this topic,
https://patchwork.kernel.org/patch/9903811/.  In short, we've agreed to
limit the hack to real hardware and wait for Intel or virtio changes.

Michael and Jason, any progress on implementing a fast virtio mechanism
that doesn't rely on undefined behavior?

(Encode writing instruction length into last 4 bits of MMIO address,
 side-channel say that accesses to the MMIO area always use certain
 instruction length, use hypercall, ...)

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/kvm: disable fast MMIO when running nested
@ 2018-01-25  9:55 Liran Alon
  2018-01-25 14:16 ` Radim Krčmář
  0 siblings, 1 reply; 13+ messages in thread
From: Liran Alon @ 2018-01-25  9:55 UTC (permalink / raw)
  To: vkuznets; +Cc: x86, rkrcmar, pbonzini, linux-kernel, kvm


----- vkuznets@redhat.com wrote:

> I was investigating an issue with seabios >= 1.10 which stopped
> working
> for nested KVM on Hyper-V. The problem appears to be in
> handle_ept_violation() function: when we do fast mmio we need to skip
> the instruction so we do kvm_skip_emulated_instruction(). This,
> however,
> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> However, this is not the case.
> 
> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> EPT MISCONFIG occurs. While on real hardware it was observed to be
> set,
> some hypervisors follow the spec and don't set it; we end up
> advancing
> IP with some random value.
> 
> I checked with Microsoft and they confirmed they don't fill
> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
> 
> Fix the issue by disabling fast mmio when running nested.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c829d89e2e63..54afb446f38e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu
> *vcpu)
>  	/*
>  	 * A nested guest cannot optimize MMIO vmexits, because we have an
>  	 * nGPA here instead of the required GPA.
> +	 * Skipping instruction below depends on undefined behavior:
> Intel's
> +	 * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> +	 * when EPT MISCONFIG occurs and while on real hardware it was
> observed
> +	 * to be set, other hypervisors (namely Hyper-V) don't set it, we
> end
> +	 * up advancing IP with some random value. Disable fast mmio when
> +	 * running nested and keep it for real hardware in hope that
> +	 * VM_EXIT_INSTRUCTION_LEN will always be set correctly.

If Intel manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS on EPT_MISCONFIG,
I don't think we should do this on real-hardware as-well.

-Liran

>  	 */
>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> -	if (!is_guest_mode(vcpu) &&
> +	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) && !is_guest_mode(vcpu)
> &&
>  	    !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
>  		trace_kvm_fast_mmio(gpa);
>  		return kvm_skip_emulated_instruction(vcpu);
> -- 
> 2.14.3

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-01-26  3:21 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-24 15:12 [PATCH] x86/kvm: disable fast MMIO when running nested Vitaly Kuznetsov
2018-01-25  7:55 ` Wanpeng Li
2018-01-25 14:34 ` Radim Krčmář
2018-01-25 14:34 ` Paolo Bonzini
2018-01-25 14:49   ` Vitaly Kuznetsov
2018-01-25  9:55 Liran Alon
2018-01-25 14:16 ` Radim Krčmář
2018-01-25 14:39   ` Jason Wang
2018-01-25 14:49     ` Paolo Bonzini
2018-01-25 17:11       ` Michael S. Tsirkin
2018-01-26  2:41         ` Jason Wang
2018-01-26  2:49           ` Michael S. Tsirkin
2018-01-26  3:21             ` Jason Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.