linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
@ 2017-11-14  5:12 Rik van Riel
  2017-11-14 16:57 ` David Hildenbrand
  2017-12-04  2:15 ` Wanpeng Li
  0 siblings, 2 replies; 13+ messages in thread
From: Rik van Riel @ 2017-11-14  5:12 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, linux-kernel, tglx, rkrcmar, borntraeger

Currently, every time a VCPU is scheduled out, the host kernel will
first save the guest FPU/xstate context, then load the qemu userspace
FPU context, only to then immediately save the qemu userspace FPU
context back to memory. When scheduling in a VCPU, the same extraneous
FPU loads and saves are done.

This could be avoided by moving from a model where the guest FPU is
loaded and stored with preemption disabled, to a model where the
qemu userspace FPU is swapped out for the guest FPU context for
the duration of the KVM_RUN ioctl.

This is done under the VCPU mutex, which is also taken when other
tasks inspect the VCPU FPU context, so the code should already be
safe for this change. That should come as no surprise, given that
s390 already has this optimization.

No performance changes were detected in quick ping-pong tests on
my 4 socket system, which is expected since an FPU+xstate load is
on the order of 0.1us, while ping-ponging between CPUs is on the
order of 20us, and somewhat noisy. 

There may be other tests where performance changes are noticeable.

Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/x86/include/asm/kvm_host.h | 13 +++++++++++++
 arch/x86/kvm/x86.c              | 29 ++++++++++++-----------------
 2 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c73e493adf07..92e66685249e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -536,7 +536,20 @@ struct kvm_vcpu_arch {
 	struct kvm_mmu_memory_cache mmu_page_cache;
 	struct kvm_mmu_memory_cache mmu_page_header_cache;
 
+	/*
+	 * QEMU userspace and the guest each have their own FPU state.
+	 * In vcpu_run, we switch between the user and guest FPU contexts.
+	 * While running a VCPU, the VCPU thread will have the guest FPU
+	 * context.
+	 *
+	 * Note that while the PKRU state lives inside the fpu registers,
+	 * it is switched out separately at VMENTER and VMEXIT time. The
+	 * "guest_fpu" state here contains the guest FPU context, with the
+	 * host PKRU bits.
+	 */
+	struct fpu user_fpu;
 	struct fpu guest_fpu;
+
 	u64 xcr0;
 	u64 guest_supported_xcr0;
 	u32 guest_xstate_size;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 03869eb7fcd6..59912b20a830 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2917,7 +2917,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 	pagefault_enable();
 	kvm_x86_ops->vcpu_put(vcpu);
-	kvm_put_guest_fpu(vcpu);
 	vcpu->arch.last_host_tsc = rdtsc();
 }
 
@@ -6908,7 +6907,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	preempt_disable();
 
 	kvm_x86_ops->prepare_guest_switch(vcpu);
-	kvm_load_guest_fpu(vcpu);
 
 	/*
 	 * Disable IRQs before setting IN_GUEST_MODE.  Posted interrupt
@@ -7095,6 +7093,8 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 
 	vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
 
+	kvm_load_guest_fpu(vcpu);
+
 	for (;;) {
 		if (kvm_vcpu_running(vcpu)) {
 			r = vcpu_enter_guest(vcpu);
@@ -7132,6 +7132,8 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	kvm_put_guest_fpu(vcpu);
+
 	srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
 
 	return r;
@@ -7663,32 +7665,25 @@ static void fx_init(struct kvm_vcpu *vcpu)
 	vcpu->arch.cr0 |= X86_CR0_ET;
 }
 
+/* Swap (qemu) user FPU context for the guest FPU context. */
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	if (vcpu->guest_fpu_loaded)
-		return;
-
-	/*
-	 * Restore all possible states in the guest,
-	 * and assume host would use all available bits.
-	 * Guest xcr0 would be loaded later.
-	 */
-	vcpu->guest_fpu_loaded = 1;
-	__kernel_fpu_begin();
+	preempt_disable();
+	copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
 	/* PKRU is separately restored in kvm_x86_ops->run.  */
 	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
 				~XFEATURE_MASK_PKRU);
+	preempt_enable();
 	trace_kvm_fpu(1);
 }
 
+/* When vcpu_run ends, restore user space FPU context. */
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	if (!vcpu->guest_fpu_loaded)
-		return;
-
-	vcpu->guest_fpu_loaded = 0;
+	preempt_disable();
 	copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
-	__kernel_fpu_end();
+	copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
+	preempt_enable();
 	++vcpu->stat.fpu_reload;
 	trace_kvm_fpu(0);
 }

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
  2017-11-14  5:12 [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run Rik van Riel
@ 2017-11-14 16:57 ` David Hildenbrand
  2017-11-14 18:07   ` Rik van Riel
  2017-12-04  2:15 ` Wanpeng Li
  1 sibling, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2017-11-14 16:57 UTC (permalink / raw)
  To: Rik van Riel, pbonzini; +Cc: kvm, linux-kernel, tglx, rkrcmar, borntraeger

On 14.11.2017 06:12, Rik van Riel wrote:
> Currently, every time a VCPU is scheduled out, the host kernel will
> first save the guest FPU/xstate context, then load the qemu userspace
> FPU context, only to then immediately save the qemu userspace FPU
> context back to memory. When scheduling in a VCPU, the same extraneous
> FPU loads and saves are done.
> 
> This could be avoided by moving from a model where the guest FPU is
> loaded and stored with preemption disabled, to a model where the
> qemu userspace FPU is swapped out for the guest FPU context for
> the duration of the KVM_RUN ioctl.
> 
> This is done under the VCPU mutex, which is also taken when other
> tasks inspect the VCPU FPU context, so the code should already be
> safe for this change. That should come as no surprise, given that
> s390 already has this optimization.
> 
> No performance changes were detected in quick ping-pong tests on
> my 4 socket system, which is expected since an FPU+xstate load is
> on the order of 0.1us, while ping-ponging between CPUs is on the
> order of 20us, and somewhat noisy. 
> 
> There may be other tests where performance changes are noticeable.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 13 +++++++++++++
>  arch/x86/kvm/x86.c              | 29 ++++++++++++-----------------
>  2 files changed, 25 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c73e493adf07..92e66685249e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h

We should also get rid of guest_fpu_loaded now, right?


> @@ -536,7 +536,20 @@ struct kvm_vcpu_arch {
>  	struct kvm_mmu_memory_cache mmu_page_cache;
>  	struct kvm_mmu_memory_cache mmu_page_header_cache;
>  
> +	/*
> +	 * QEMU userspace and the guest each have their own FPU state.
> +	 * In vcpu_run, we switch between the user and guest FPU contexts.
> +	 * While running a VCPU, the VCPU thread will have the guest FPU
> +	 * context.
> +	 *
> +	 * Note that while the PKRU state lives inside the fpu registers,
> +	 * it is switched out separately at VMENTER and VMEXIT time. The
> +	 * "guest_fpu" state here contains the guest FPU context, with the
> +	 * host PKRU bits.
> +	 */
> +	struct fpu user_fpu;
>  	struct fpu guest_fpu;
> +
>  	u64 xcr0;
>  	u64 guest_supported_xcr0;
>  	u32 guest_xstate_size;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 03869eb7fcd6..59912b20a830 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2917,7 +2917,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>  	pagefault_enable();
>  	kvm_x86_ops->vcpu_put(vcpu);
> -	kvm_put_guest_fpu(vcpu);
>  	vcpu->arch.last_host_tsc = rdtsc();
>  }
>  
> @@ -6908,7 +6907,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	preempt_disable();
>  
>  	kvm_x86_ops->prepare_guest_switch(vcpu);
> -	kvm_load_guest_fpu(vcpu);
>  
>  	/*
>  	 * Disable IRQs before setting IN_GUEST_MODE.  Posted interrupt
> @@ -7095,6 +7093,8 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>  
>  	vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>  
> +	kvm_load_guest_fpu(vcpu);
> +
>  	for (;;) {
>  		if (kvm_vcpu_running(vcpu)) {
>  			r = vcpu_enter_guest(vcpu);
> @@ -7132,6 +7132,8 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>  		}
>  	}
>  
> +	kvm_put_guest_fpu(vcpu);
> +
>  	srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>  
>  	return r;
> @@ -7663,32 +7665,25 @@ static void fx_init(struct kvm_vcpu *vcpu)
>  	vcpu->arch.cr0 |= X86_CR0_ET;
>  }
>  
> +/* Swap (qemu) user FPU context for the guest FPU context. */
>  void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>  {
> -	if (vcpu->guest_fpu_loaded)
> -		return;
> -
> -	/*
> -	 * Restore all possible states in the guest,
> -	 * and assume host would use all available bits.
> -	 * Guest xcr0 would be loaded later.
> -	 */
> -	vcpu->guest_fpu_loaded = 1;
> -	__kernel_fpu_begin();
> +	preempt_disable();
> +	copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
>  	/* PKRU is separately restored in kvm_x86_ops->run.  */
>  	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
>  				~XFEATURE_MASK_PKRU);
> +	preempt_enable();
>  	trace_kvm_fpu(1);
>  }
>  
> +/* When vcpu_run ends, restore user space FPU context. */
>  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
>  {
> -	if (!vcpu->guest_fpu_loaded)
> -		return;
> -
> -	vcpu->guest_fpu_loaded = 0;
> +	preempt_disable();
>  	copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
> -	__kernel_fpu_end();
> +	copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
> +	preempt_enable();
>  	++vcpu->stat.fpu_reload;
>  	trace_kvm_fpu(0);
>  }
> 

emulator_get_fpu() does a kvm_load_guest_fpu(). Doesn't that mean that
this is now not needed anymore? (at least when emulator code is called
from inside the loop?)

Also, what about preempt_disable() at that point, still needed?


-- 

Thanks,

David / dhildenb

* Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
  2017-11-14 16:57 ` David Hildenbrand
@ 2017-11-14 18:07   ` Rik van Riel
  2017-11-14 18:09     ` Paolo Bonzini
  2017-11-14 19:40     ` David Hildenbrand
  0 siblings, 2 replies; 13+ messages in thread
From: Rik van Riel @ 2017-11-14 18:07 UTC (permalink / raw)
  To: David Hildenbrand, pbonzini; +Cc: kvm, linux-kernel, tglx, rkrcmar, borntraeger

On Tue, 2017-11-14 at 17:57 +0100, David Hildenbrand wrote:
> 
> > diff --git a/arch/x86/include/asm/kvm_host.h
> > b/arch/x86/include/asm/kvm_host.h
> > index c73e493adf07..92e66685249e 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> 
> We should also get rid of guest_fpu_loaded now, right?

Indeed, we no longer need that member. I'll get rid of it.

> emulator_get_fpu() does a kvm_load_guest_fpu(). Doesn't that mean
> that
> this is now not needed anymore? (at least when emulator code is
> called
> from inside the loop?)

Now that is a very good question!

When called from inside the loop, it is indeed not
needed.

My question is, can the in-kernel emulator code ever
be called from OUTSIDE the KVM_RUN ioctl loop?

If so, we need to restore the user FPU context before
returning from the emulator code. Given that the current
emulator code does not do that, I suspect this is not
the case. I also see no path from the kvm ioctl into
the emulator code, other than via KVM_RUN.

The FPU and XSAVE ioctls all work on the saved
vcpu->arch.guest_fpu data, and never directly on the
registers.

Looks like we can completely get rid of .get_fpu and
.put_fpu...

Unless Paolo has any objection, I'll go do that :)

> Also, what about preempt_disable() at that point, still needed?

If the guest FPU context is the only FPU context loaded
for the task at that point in time, we should not need
to run with preemption disabled.

After all, if we were to get preempted, the context switch
code would automatically save and restore the guest FPU
context for us.

-- 
All rights reversed


* Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
  2017-11-14 18:07   ` Rik van Riel
@ 2017-11-14 18:09     ` Paolo Bonzini
  2017-11-14 19:40     ` David Hildenbrand
  1 sibling, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2017-11-14 18:09 UTC (permalink / raw)
  To: Rik van Riel, David Hildenbrand
  Cc: kvm, linux-kernel, tglx, rkrcmar, borntraeger


On 14/11/2017 19:07, Rik van Riel wrote:
> My question is, can the in-kernel emulator code ever
> be called from OUTSIDE the KVM_RUN ioctl loop?

No, it can't.  This makes the patch much more appealing...

Paolo

> If so, we need to restore the user FPU context before
> returning from the emulator code. Given that the current
> emulator code does not do that, I suspect this is not
> the case. I also see no path from the kvm ioctl into
> the emulator code, other than via KVM_RUN.
> 
> The FPU and XSAVE ioctls all work on the saved
> vcpu->arch.guest_fpu data, and never directly on the
> registers.




* Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
  2017-11-14 18:07   ` Rik van Riel
  2017-11-14 18:09     ` Paolo Bonzini
@ 2017-11-14 19:40     ` David Hildenbrand
  2017-11-14 21:11       ` Rik van Riel
  2017-11-15  8:34       ` Paolo Bonzini
  1 sibling, 2 replies; 13+ messages in thread
From: David Hildenbrand @ 2017-11-14 19:40 UTC (permalink / raw)
  To: Rik van Riel, pbonzini; +Cc: kvm, linux-kernel, tglx, rkrcmar, borntraeger

On 14.11.2017 19:07, Rik van Riel wrote:
> On Tue, 2017-11-14 at 17:57 +0100, David Hildenbrand wrote:
>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h
>>> b/arch/x86/include/asm/kvm_host.h
>>> index c73e493adf07..92e66685249e 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>
>> We should also get rid of guest_fpu_loaded now, right?
> 
> Indeed, we no longer need that member. I'll get rid of it.
> 
>> emulator_get_fpu() does a kvm_load_guest_fpu(). Doesn't that mean
>> that
>> this is now not needed anymore? (at least when emulator code is
>> called
>> from inside the loop?)
> 
> Now that is a very good question!
> 
> When called from inside the loop, it is indeed not
> needed.
> 
> My question is, can the in-kernel emulator code ever
> be called from OUTSIDE the KVM_RUN ioctl loop?
> 
> If so, we need to restore the user FPU context before
> returning from the emulator code. Given that the current
> emulator code does not do that, I suspect this is not
> the case. I also see no path from the kvm ioctl into
> the emulator code, other than via KVM_RUN.
> 
> The FPU and XSAVE ioctls all work on the saved
> vcpu->arch.guest_fpu data, and never directly on the
> registers.
> 
> Looks like we can completely get rid of .get_fpu and
> .put_fpu...
> 
> Unless Paolo has any objection, I'll go do that :)


I think we should check all get/put_fpu callers if they need
preempt_disable().

E.g. em_fxrstor() needs disabled preemption as we temporarily
save + restore some host register (via fxsave + fxrstor) under some
circumstances that are not saved/restored when switching to/back from
another process. We should double check.

@Paolo what about complete_userspace_io? It can end up calling
emulate_instruction(). So maybe we have to move load/put fpu further out
or add special handling.

-- 

Thanks,

David / dhildenb

* Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
  2017-11-14 19:40     ` David Hildenbrand
@ 2017-11-14 21:11       ` Rik van Riel
  2017-11-15  8:34       ` Paolo Bonzini
  1 sibling, 0 replies; 13+ messages in thread
From: Rik van Riel @ 2017-11-14 21:11 UTC (permalink / raw)
  To: David Hildenbrand, pbonzini; +Cc: kvm, linux-kernel, tglx, rkrcmar, borntraeger

On Tue, 2017-11-14 at 20:40 +0100, David Hildenbrand wrote:
> On 14.11.2017 19:07, Rik van Riel wrote:
> > On Tue, 2017-11-14 at 17:57 +0100, David Hildenbrand wrote:
> > > 
> > > > diff --git a/arch/x86/include/asm/kvm_host.h
> > > > b/arch/x86/include/asm/kvm_host.h
> > > > index c73e493adf07..92e66685249e 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > 
> > > We should also get rid of guest_fpu_loaded now, right?
> > 
> > Indeed, we no longer need that member. I'll get rid of it.
> > 
> > > emulator_get_fpu() does a kvm_load_guest_fpu(). Doesn't that mean
> > > that
> > > this is now not needed anymore? (at least when emulator code is
> > > called
> > > from inside the loop?)
> > 
> > Now that is a very good question!
> > 
> > When called from inside the loop, it is indeed not
> > needed.
> > 
> > My question is, can the in-kernel emulator code ever
> > be called from OUTSIDE the KVM_RUN ioctl loop?
> > 
> > If so, we need to restore the user FPU context before
> > returning from the emulator code. Given that the current
> > emulator code does not do that, I suspect this is not
> > the case. I also see no path from the kvm ioctl into
> > the emulator code, other than via KVM_RUN.
> > 
> > The FPU and XSAVE ioctls all work on the saved
> > vcpu->arch.guest_fpu data, and never directly on the
> > registers.
> > 
> > Looks like we can completely get rid of .get_fpu and
> > .put_fpu...
> > 
> > Unless Paolo has any objection, I'll go do that :)
> 
> 
> I think we should check all get/put_fpu callers if they need
> preempt_disable().
> 
> E.g. em_fxrstor() needs disabled preemption as we temporarily
> save + restore some host register (via fxsave + fxrstor) under some
> circumstances that are not saved/restored when switching to/back from
> another process. We should double check.
> 
> @Paolo what about complete_userspace_io? It can end up calling
> emulate_instruction(). So maybe we have to move load/put fpu further
> out
> or add special handling.

It looks like all complete_userspace_io does is make
the vcpu_run loop exit and return to userspace from
the KVM_RUN ioctl code.

In other words, the userspace qemu FPU context should
be restored before we return to userspace, even with
my patch (v2 on the way).

-- 
All rights reversed


* Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
  2017-11-14 19:40     ` David Hildenbrand
  2017-11-14 21:11       ` Rik van Riel
@ 2017-11-15  8:34       ` Paolo Bonzini
  2017-11-15  9:23         ` David Hildenbrand
  2017-11-15 14:50         ` Rik van Riel
  1 sibling, 2 replies; 13+ messages in thread
From: Paolo Bonzini @ 2017-11-15  8:34 UTC (permalink / raw)
  To: David Hildenbrand, Rik van Riel
  Cc: kvm, linux-kernel, tglx, rkrcmar, borntraeger

On 14/11/2017 20:40, David Hildenbrand wrote:
> I think we should check all get/put_fpu callers if they need
> preempt_disable().
> 
> E.g. em_fxrstor() needs disabled preemption as we temporarily
> save + restore some host register (via fxsave + fxrstor) under some
> circumstances that are not saved/restored when switching to/back from
> another process. We should double check.

Rik may correct me, but I believe that you don't need
preempt_disable/enable because preempt notifiers do this for you.

Paolo

* Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
  2017-11-15  8:34       ` Paolo Bonzini
@ 2017-11-15  9:23         ` David Hildenbrand
  2017-11-15 14:50         ` Rik van Riel
  1 sibling, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2017-11-15  9:23 UTC (permalink / raw)
  To: Paolo Bonzini, Rik van Riel; +Cc: kvm, linux-kernel, tglx, rkrcmar, borntraeger

On 15.11.2017 09:34, Paolo Bonzini wrote:
> On 14/11/2017 20:40, David Hildenbrand wrote:
>> I think we should check all get/put_fpu callers if they need
>> preempt_disable().
>>
>> E.g. em_fxrstor() needs disabled preemption as we temporarily
>> save + restore some host register (via fxsave + fxrstor) under some
>> circumstances that are not saved/restored when switching to/back from
>> another process. We should double check.
> 
> Rik may correct me, but I believe that you don't need
> preempt_disable/enable because preempt notifiers do this for you.
> 

Seems to boil down to what Wanpeng Li asked about preempt notifiers in
response to v2.

-- 

Thanks,

David / dhildenb

* Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
  2017-11-15  8:34       ` Paolo Bonzini
  2017-11-15  9:23         ` David Hildenbrand
@ 2017-11-15 14:50         ` Rik van Riel
  2017-11-15 15:20           ` David Hildenbrand
  1 sibling, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2017-11-15 14:50 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand
  Cc: kvm, linux-kernel, tglx, rkrcmar, borntraeger

On Wed, 2017-11-15 at 09:34 +0100, Paolo Bonzini wrote:
> On 14/11/2017 20:40, David Hildenbrand wrote:
> > I think we should check all get/put_fpu callers if they need
> > preempt_disable().
> > 
> > E.g. em_fxrstor() needs disabled preemption as we temporarily
> > save + restore some host register (via fxsave + fxrstor) under some
> > circumstances that are not saved/restored when switching to/back
> > from
> > another process. We should double check.
> 
> Rik may correct me, but I believe that you don't need
> preempt_disable/enable because preempt notifiers do this for you.

We no longer even need the preempt notifiers to save and
restore the guest FPU state.

The context switch code itself will save the FPU state
from the registers, into current->thread.fpu.state, when
the VCPU thread gets scheduled out.

When the VCPU thread gets scheduled in, the scheduler
will restore the guest FPU state from current->thread.fpu.state.

At this point, vcpu->arch.guest_fpu may be OUT OF DATE.

However, this is just fine, because we will save the guest
FPU state into vcpu->arch.guest_fpu in kvm_put_guest_fpu,
before we leave the KVM_RUN ioctl, and before we release
the vcpu->mutex.

In other words, by the time anybody else can examine the
VCPU FPU state (after they obtain the vcpu->mutex), the
vcpu->arch.guest_fpu area will contain the correct FPU
state.

* Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
  2017-11-15 14:50         ` Rik van Riel
@ 2017-11-15 15:20           ` David Hildenbrand
  0 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2017-11-15 15:20 UTC (permalink / raw)
  To: Rik van Riel, Paolo Bonzini; +Cc: kvm, linux-kernel, tglx, rkrcmar, borntraeger

On 15.11.2017 15:50, Rik van Riel wrote:
> On Wed, 2017-11-15 at 09:34 +0100, Paolo Bonzini wrote:
>> On 14/11/2017 20:40, David Hildenbrand wrote:
>>> I think we should check all get/put_fpu callers if they need
>>> preempt_disable().
>>>
>>> E.g. em_fxrstor() needs disabled preemption as we temporarily
>>> save + restore some host register (via fxsave + fxrstor) under some
>>> circumstances that are not saved/restored when switching to/back
>>> from
>>> another process. We should double check.
>>
>> Rik may correct me, but I believe that you don't need
>> preempt_disable/enable because preempt notifiers do this for you.
> 
> We no longer even need the preempt notifiers to save and
> restore the guest FPU state.
> 
> The context switch code itself will save the FPU state
> from the registers, into current->thread.fpu.state, when
> the VCPU thread gets scheduled out.
> 
> When the VCPU thread gets scheduled in, the scheduler
> will restore the guest FPU state from current->thread.fpu.state.
> 
> At this point, vcpu->arch.guest_fpu may be OUT OF DATE.
> 
> However, this is just fine, because we will save the guest
> FPU state into vcpu->arch.guest_fpu in kvm_put_guest_fpu,
> before we leave the KVM_RUN ioctl, and before we release
> the vcpu->mutex.
> 
> In other words, by the time anybody else can examine the
> VCPU FPU state (after they obtain the vcpu->mutex), the
> vcpu->arch.guest_fpu area will contain the correct FPU
> state.
> 

Okay, that answers my question, thanks!

-- 

Thanks,

David / dhildenb

* Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
  2017-11-14  5:12 [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run Rik van Riel
  2017-11-14 16:57 ` David Hildenbrand
@ 2017-12-04  2:15 ` Wanpeng Li
  2017-12-05 17:09   ` Radim Krcmar
  1 sibling, 1 reply; 13+ messages in thread
From: Wanpeng Li @ 2017-12-04  2:15 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Paolo Bonzini, kvm, linux-kernel, Thomas Gleixner, Radim Krcmar,
	Christian Borntraeger

2017-11-14 13:12 GMT+08:00 Rik van Riel <riel@redhat.com>:
> Currently, every time a VCPU is scheduled out, the host kernel will
> first save the guest FPU/xstate context, then load the qemu userspace
> FPU context, only to then immediately save the qemu userspace FPU
> context back to memory. When scheduling in a VCPU, the same extraneous
> FPU loads and saves are done.
>
> This could be avoided by moving from a model where the guest FPU is
> loaded and stored with preemption disabled, to a model where the
> qemu userspace FPU is swapped out for the guest FPU context for
> the duration of the KVM_RUN ioctl.
>
> This is done under the VCPU mutex, which is also taken when other
> tasks inspect the VCPU FPU context, so the code should already be
> safe for this change. That should come as no surprise, given that
> s390 already has this optimization.
>
> No performance changes were detected in quick ping-pong tests on
> my 4 socket system, which is expected since an FPU+xstate load is
> on the order of 0.1us, while ping-ponging between CPUs is on the
> order of 20us, and somewhat noisy.
>
> There may be other tests where performance changes are noticeable.

The kvm/queue branch produces the splat below:

[  650.866212] Bad FPU state detected at kvm_put_guest_fpu+0x7d/0x210
[kvm], reinitializing FPU registers.
[  650.866232] ------------[ cut here ]------------
[  650.866241] WARNING: CPU: 2 PID: 2583 at arch/x86/mm/extable.c:103
ex_handler_fprestore+0x5f/0x70
[  650.866473]  libahci wmi hid pinctrl_sunrisepoint video pinctrl_intel
[  650.866496] CPU: 2 PID: 2583 Comm: qemu-system-x86 Not tainted 4.14.0+ #7
[  650.866500] Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS
1.4.9 09/12/2016
[  650.866503] task: ffff97a095a28000 task.stack: ffffa71c8585c000
[  650.866509] RIP: 0010:ex_handler_fprestore+0x5f/0x70
[  650.866512] RSP: 0018:ffffa71c8585fc28 EFLAGS: 00010282
[  650.866519] RAX: 000000000000005b RBX: ffffa71c8585fc68 RCX: 0000000000000006
[  650.866522] RDX: 0000000000000000 RSI: ffffffffb4d35333 RDI: 0000000000000282
[  650.866526] RBP: 000000000000000d R08: 00000000fddae359 R09: 0000000000000000
[  650.866529] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  650.866532] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
[  650.866536] FS:  00007f6f8f22c700(0000) GS:ffff97a09ca00000(0000)
knlGS:0000000000000000
[  650.866540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  650.866543] CR2: 00007f6f993f3000 CR3: 00000003d4aae005 CR4: 00000000003626e0
[  650.866547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  650.866550] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  650.866554] Call Trace:
[  650.866559]  fixup_exception+0x32/0x40
[  650.866567]  do_general_protection+0xa0/0x1b0
[  650.866574]  general_protection+0x22/0x30
[  650.866595] RIP: 0010:kvm_put_guest_fpu+0x7d/0x210 [kvm]
[  650.866599] RSP: 0018:ffffa71c8585fd18 EFLAGS: 00010246
[  650.866605] RAX: 00000000ffffffff RBX: ffff97a095a30000 RCX: 0000000000000001
[  650.866608] RDX: 00000000ffffffff RSI: 00000000f7d5d46a RDI: ffff97a095a30b80
[  650.866611] RBP: 0000000000000000 R08: 00000000fddae359 R09: ffff97a095a28968
[  650.866615] R10: 0000000000000000 R11: 00000000e8d39b88 R12: ffff97a095a31bc0
[  650.866618] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
[  650.866650]  ? kvm_put_guest_fpu+0x27/0x210 [kvm]
[  650.866670]  kvm_vcpu_reset+0x1be/0x250 [kvm]
[  650.866689]  kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
[  650.866707]  kvm_vm_ioctl+0x31a/0x820 [kvm]
[  650.866712]  ? __lock_acquire+0x809/0x1410
[  650.866721]  ? __lock_acquire+0x809/0x1410
[  650.866734]  do_vfs_ioctl+0x9f/0x6c0
[  650.866743]  ? __fget+0x108/0x1f0
[  650.866752]  SyS_ioctl+0x74/0x80
[  650.866757]  ? do_syscall_64+0xc4/0x3d0
[  650.866764]  do_syscall_64+0x8a/0x3d0
[  650.866769]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  650.866781]  entry_SYSCALL64_slow_path+0x25/0x25
[  650.866785] RIP: 0033:0x7f6f973a0f07
[  650.866788] RSP: 002b:00007f6f8f22b968 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  650.866795] RAX: ffffffffffffffda RBX: 000000000000ae41 RCX: 00007f6f973a0f07
[  650.866798] RDX: 0000000000000000 RSI: 000000000000ae41 RDI: 000000000000000b
[  650.866802] RBP: 0000000000000000 R08: 0000558248e26a40 R09: 000055824b58e280
[  650.866805] R10: 0000558249593f40 R11: 0000000000000246 R12: 000055824b55ec90
[  650.866808] R13: 00007ffd274d79ff R14: 00007f6f8f22c9c0 R15: 000055824b58e280
[  650.867014] ---[ end trace 2c5d6cfaba0ee1b3 ]---

Regards,
Wanpeng Li

>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 13 +++++++++++++
>  arch/x86/kvm/x86.c              | 29 ++++++++++++-----------------
>  2 files changed, 25 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c73e493adf07..92e66685249e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -536,7 +536,20 @@ struct kvm_vcpu_arch {
>         struct kvm_mmu_memory_cache mmu_page_cache;
>         struct kvm_mmu_memory_cache mmu_page_header_cache;
>
> +       /*
> +        * QEMU userspace and the guest each have their own FPU state.
> +        * In vcpu_run, we switch between the user and guest FPU contexts.
> +        * While running a VCPU, the VCPU thread will have the guest FPU
> +        * context.
> +        *
> +        * Note that while the PKRU state lives inside the fpu registers,
> +        * it is switched out separately at VMENTER and VMEXIT time. The
> +        * "guest_fpu" state here contains the guest FPU context, with the
> +        * host PKRU bits.
> +        */
> +       struct fpu user_fpu;
>         struct fpu guest_fpu;
> +
>         u64 xcr0;
>         u64 guest_supported_xcr0;
>         u32 guest_xstate_size;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 03869eb7fcd6..59912b20a830 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2917,7 +2917,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>         srcu_read_unlock(&vcpu->kvm->srcu, idx);
>         pagefault_enable();
>         kvm_x86_ops->vcpu_put(vcpu);
> -       kvm_put_guest_fpu(vcpu);
>         vcpu->arch.last_host_tsc = rdtsc();
>  }
>
> @@ -6908,7 +6907,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>         preempt_disable();
>
>         kvm_x86_ops->prepare_guest_switch(vcpu);
> -       kvm_load_guest_fpu(vcpu);
>
>         /*
>          * Disable IRQs before setting IN_GUEST_MODE.  Posted interrupt
> @@ -7095,6 +7093,8 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>
>         vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>
> +       kvm_load_guest_fpu(vcpu);
> +
>         for (;;) {
>                 if (kvm_vcpu_running(vcpu)) {
>                         r = vcpu_enter_guest(vcpu);
> @@ -7132,6 +7132,8 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>                 }
>         }
>
> +       kvm_put_guest_fpu(vcpu);
> +
>         srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>
>         return r;
> @@ -7663,32 +7665,25 @@ static void fx_init(struct kvm_vcpu *vcpu)
>         vcpu->arch.cr0 |= X86_CR0_ET;
>  }
>
> +/* Swap (qemu) user FPU context for the guest FPU context. */
>  void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
>  {
> -       if (vcpu->guest_fpu_loaded)
> -               return;
> -
> -       /*
> -        * Restore all possible states in the guest,
> -        * and assume host would use all available bits.
> -        * Guest xcr0 would be loaded later.
> -        */
> -       vcpu->guest_fpu_loaded = 1;
> -       __kernel_fpu_begin();
> +       preempt_disable();
> +       copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
>         /* PKRU is separately restored in kvm_x86_ops->run.  */
>         __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
>                                 ~XFEATURE_MASK_PKRU);
> +       preempt_enable();
>         trace_kvm_fpu(1);
>  }
>
> +/* When vcpu_run ends, restore user space FPU context. */
>  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
>  {
> -       if (!vcpu->guest_fpu_loaded)
> -               return;
> -
> -       vcpu->guest_fpu_loaded = 0;
> +       preempt_disable();
>         copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
> -       __kernel_fpu_end();
> +       copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
> +       preempt_enable();
>         ++vcpu->stat.fpu_reload;
>         trace_kvm_fpu(0);
>  }
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
  2017-12-04  2:15 ` Wanpeng Li
@ 2017-12-05 17:09   ` Radim Krcmar
  2017-12-06  2:48     ` Wanpeng Li
  0 siblings, 1 reply; 13+ messages in thread
From: Radim Krcmar @ 2017-12-05 17:09 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Rik van Riel, Paolo Bonzini, kvm, linux-kernel, Thomas Gleixner,
	Christian Borntraeger

2017-12-04 10:15+0800, Wanpeng Li:
> 2017-11-14 13:12 GMT+08:00 Rik van Riel <riel@redhat.com>:
> > Currently, every time a VCPU is scheduled out, the host kernel will
> > first save the guest FPU/xstate context, then load the qemu userspace
> > FPU context, only to then immediately save the qemu userspace FPU
> > context back to memory. When scheduling in a VCPU, the same extraneous
> > FPU loads and saves are done.
> >
> > This could be avoided by moving from a model where the guest FPU is
> > loaded and stored with preemption disabled, to a model where the
> > qemu userspace FPU is swapped out for the guest FPU context for
> > the duration of the KVM_RUN ioctl.
> >
> > This is done under the VCPU mutex, which is also taken when other
> > tasks inspect the VCPU FPU context, so the code should already be
> > safe for this change. That should come as no surprise, given that
> > s390 already has this optimization.
> >
> > No performance changes were detected in quick ping-pong tests on
> > my 4 socket system, which is expected since an FPU+xstate load is
> > on the order of 0.1us, while ping-ponging between CPUs is on the
> > order of 20us, and somewhat noisy.
> >
> > There may be other tests where performance changes are noticeable.
> 
> The kvm/queue has the below splatting:
> 
> [  650.866212] Bad FPU state detected at kvm_put_guest_fpu+0x7d/0x210
> [kvm], reinitializing FPU registers.
> [  650.866232] ------------[ cut here ]------------
> [  650.866241] WARNING: CPU: 2 PID: 2583 at arch/x86/mm/extable.c:103
> ex_handler_fprestore+0x5f/0x70
> [  650.866473]  libahci wmi hid pinctrl_sunrisepoint video pinctrl_intel
> [  650.866496] CPU: 2 PID: 2583 Comm: qemu-system-x86 Not tainted 4.14.0+ #7
> [  650.866500] Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS
> 1.4.9 09/12/2016
> [  650.866503] task: ffff97a095a28000 task.stack: ffffa71c8585c000
> [  650.866509] RIP: 0010:ex_handler_fprestore+0x5f/0x70
> [  650.866512] RSP: 0018:ffffa71c8585fc28 EFLAGS: 00010282
> [  650.866519] RAX: 000000000000005b RBX: ffffa71c8585fc68 RCX: 0000000000000006
> [  650.866522] RDX: 0000000000000000 RSI: ffffffffb4d35333 RDI: 0000000000000282
> [  650.866526] RBP: 000000000000000d R08: 00000000fddae359 R09: 0000000000000000
> [  650.866529] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [  650.866532] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
> [  650.866536] FS:  00007f6f8f22c700(0000) GS:ffff97a09ca00000(0000)
> knlGS:0000000000000000
> [  650.866540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  650.866543] CR2: 00007f6f993f3000 CR3: 00000003d4aae005 CR4: 00000000003626e0
> [  650.866547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  650.866550] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  650.866554] Call Trace:
> [  650.866559]  fixup_exception+0x32/0x40
> [  650.866567]  do_general_protection+0xa0/0x1b0
> [  650.866574]  general_protection+0x22/0x30
> [  650.866595] RIP: 0010:kvm_put_guest_fpu+0x7d/0x210 [kvm]
> [  650.866599] RSP: 0018:ffffa71c8585fd18 EFLAGS: 00010246
> [  650.866605] RAX: 00000000ffffffff RBX: ffff97a095a30000 RCX: 0000000000000001
> [  650.866608] RDX: 00000000ffffffff RSI: 00000000f7d5d46a RDI: ffff97a095a30b80
> [  650.866611] RBP: 0000000000000000 R08: 00000000fddae359 R09: ffff97a095a28968
> [  650.866615] R10: 0000000000000000 R11: 00000000e8d39b88 R12: ffff97a095a31bc0
> [  650.866618] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
> [  650.866650]  ? kvm_put_guest_fpu+0x27/0x210 [kvm]

Looks like we're calling put when the fpu was not loaded.  The simplest
fix would be:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e0367f688547..064eba25c215 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7842,7 +7842,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
		 * To avoid having the INIT path from kvm_apic_has_events() be
		 * called with the FPU loaded, which would not let userspace fix
		 * the state.
 		 */
-		kvm_put_guest_fpu(vcpu);
+		if (init_event)
+			kvm_put_guest_fpu(vcpu);
 		mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
 					XFEATURE_MASK_BNDREGS);
 		if (mpx_state_buffer)
@@ -7851,6 +7852,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 					XFEATURE_MASK_BNDCSR);
 		if (mpx_state_buffer)
 			memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
+		if (init_event)
+			kvm_load_guest_fpu(vcpu);
 	}
 
 	if (!init_event) {

I'll carry that until there is a nicer solution, thanks for the report.

> [  650.866670]  kvm_vcpu_reset+0x1be/0x250 [kvm]
> [  650.866689]  kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
> [  650.866707]  kvm_vm_ioctl+0x31a/0x820 [kvm]
> [  650.866712]  ? __lock_acquire+0x809/0x1410
> [  650.866721]  ? __lock_acquire+0x809/0x1410
> [  650.866734]  do_vfs_ioctl+0x9f/0x6c0
> [  650.866743]  ? __fget+0x108/0x1f0
> [  650.866752]  SyS_ioctl+0x74/0x80
> [  650.866757]  ? do_syscall_64+0xc4/0x3d0
> [  650.866764]  do_syscall_64+0x8a/0x3d0
> [  650.866769]  ? trace_hardirqs_on_thunk+0x1a/0x1c
> [  650.866781]  entry_SYSCALL64_slow_path+0x25/0x25
> [  650.866785] RIP: 0033:0x7f6f973a0f07
> [  650.866788] RSP: 002b:00007f6f8f22b968 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  650.866795] RAX: ffffffffffffffda RBX: 000000000000ae41 RCX: 00007f6f973a0f07
> [  650.866798] RDX: 0000000000000000 RSI: 000000000000ae41 RDI: 000000000000000b
> [  650.866802] RBP: 0000000000000000 R08: 0000558248e26a40 R09: 000055824b58e280
> [  650.866805] R10: 0000558249593f40 R11: 0000000000000246 R12: 000055824b55ec90
> [  650.866808] R13: 00007ffd274d79ff R14: 00007f6f8f22c9c0 R15: 000055824b58e280
> [  650.867014] ---[ end trace 2c5d6cfaba0ee1b3 ]---


* Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
  2017-12-05 17:09   ` Radim Krcmar
@ 2017-12-06  2:48     ` Wanpeng Li
  0 siblings, 0 replies; 13+ messages in thread
From: Wanpeng Li @ 2017-12-06  2:48 UTC (permalink / raw)
  To: Radim Krcmar
  Cc: Rik van Riel, Paolo Bonzini, kvm, linux-kernel, Thomas Gleixner,
	Christian Borntraeger

2017-12-06 1:09 GMT+08:00 Radim Krcmar <rkrcmar@redhat.com>:
> 2017-12-04 10:15+0800, Wanpeng Li:
>> 2017-11-14 13:12 GMT+08:00 Rik van Riel <riel@redhat.com>:
>> > Currently, every time a VCPU is scheduled out, the host kernel will
>> > first save the guest FPU/xstate context, then load the qemu userspace
>> > FPU context, only to then immediately save the qemu userspace FPU
>> > context back to memory. When scheduling in a VCPU, the same extraneous
>> > FPU loads and saves are done.
>> >
>> > This could be avoided by moving from a model where the guest FPU is
>> > loaded and stored with preemption disabled, to a model where the
>> > qemu userspace FPU is swapped out for the guest FPU context for
>> > the duration of the KVM_RUN ioctl.
>> >
>> > This is done under the VCPU mutex, which is also taken when other
>> > tasks inspect the VCPU FPU context, so the code should already be
>> > safe for this change. That should come as no surprise, given that
>> > s390 already has this optimization.
>> >
>> > No performance changes were detected in quick ping-pong tests on
>> > my 4 socket system, which is expected since an FPU+xstate load is
>> > on the order of 0.1us, while ping-ponging between CPUs is on the
>> > order of 20us, and somewhat noisy.
>> >
>> > There may be other tests where performance changes are noticeable.
>>
>> The kvm/queue has the below splatting:
>>
>> [  650.866212] Bad FPU state detected at kvm_put_guest_fpu+0x7d/0x210
>> [kvm], reinitializing FPU registers.
>> [  650.866232] ------------[ cut here ]------------
>> [  650.866241] WARNING: CPU: 2 PID: 2583 at arch/x86/mm/extable.c:103
>> ex_handler_fprestore+0x5f/0x70
>> [  650.866473]  libahci wmi hid pinctrl_sunrisepoint video pinctrl_intel
>> [  650.866496] CPU: 2 PID: 2583 Comm: qemu-system-x86 Not tainted 4.14.0+ #7
>> [  650.866500] Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS
>> 1.4.9 09/12/2016
>> [  650.866503] task: ffff97a095a28000 task.stack: ffffa71c8585c000
>> [  650.866509] RIP: 0010:ex_handler_fprestore+0x5f/0x70
>> [  650.866512] RSP: 0018:ffffa71c8585fc28 EFLAGS: 00010282
>> [  650.866519] RAX: 000000000000005b RBX: ffffa71c8585fc68 RCX: 0000000000000006
>> [  650.866522] RDX: 0000000000000000 RSI: ffffffffb4d35333 RDI: 0000000000000282
>> [  650.866526] RBP: 000000000000000d R08: 00000000fddae359 R09: 0000000000000000
>> [  650.866529] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>> [  650.866532] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
>> [  650.866536] FS:  00007f6f8f22c700(0000) GS:ffff97a09ca00000(0000)
>> knlGS:0000000000000000
>> [  650.866540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  650.866543] CR2: 00007f6f993f3000 CR3: 00000003d4aae005 CR4: 00000000003626e0
>> [  650.866547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  650.866550] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [  650.866554] Call Trace:
>> [  650.866559]  fixup_exception+0x32/0x40
>> [  650.866567]  do_general_protection+0xa0/0x1b0
>> [  650.866574]  general_protection+0x22/0x30
>> [  650.866595] RIP: 0010:kvm_put_guest_fpu+0x7d/0x210 [kvm]
>> [  650.866599] RSP: 0018:ffffa71c8585fd18 EFLAGS: 00010246
>> [  650.866605] RAX: 00000000ffffffff RBX: ffff97a095a30000 RCX: 0000000000000001
>> [  650.866608] RDX: 00000000ffffffff RSI: 00000000f7d5d46a RDI: ffff97a095a30b80
>> [  650.866611] RBP: 0000000000000000 R08: 00000000fddae359 R09: ffff97a095a28968
>> [  650.866615] R10: 0000000000000000 R11: 00000000e8d39b88 R12: ffff97a095a31bc0
>> [  650.866618] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
>> [  650.866650]  ? kvm_put_guest_fpu+0x27/0x210 [kvm]
>
> Looks like we're calling put when the fpu was not loaded.  The simplest
> fix would be:
>
> [...]
>
> I'll carry that until there is a nicer solution, thanks for the report.

NP, and thanks for the fix. :)

Regards,
Wanpeng Li


end of thread, other threads:[~2017-12-06  2:48 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-14  5:12 [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run Rik van Riel
2017-11-14 16:57 ` David Hildenbrand
2017-11-14 18:07   ` Rik van Riel
2017-11-14 18:09     ` Paolo Bonzini
2017-11-14 19:40     ` David Hildenbrand
2017-11-14 21:11       ` Rik van Riel
2017-11-15  8:34       ` Paolo Bonzini
2017-11-15  9:23         ` David Hildenbrand
2017-11-15 14:50         ` Rik van Riel
2017-11-15 15:20           ` David Hildenbrand
2017-12-04  2:15 ` Wanpeng Li
2017-12-05 17:09   ` Radim Krcmar
2017-12-06  2:48     ` Wanpeng Li
