All of lore.kernel.org
 help / color / mirror / Atom feed
From: Radim Krcmar <rkrcmar@redhat.com>
To: Wanpeng Li <kernellwp@gmail.com>
Cc: Rik van Riel <riel@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, kvm <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christian Borntraeger <borntraeger@de.ibm.com>
Subject: Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
Date: Tue, 5 Dec 2017 18:09:55 +0100	[thread overview]
Message-ID: <20171205170951.GA829@flask> (raw)
In-Reply-To: <CANRm+CznfORBCuap5nCoZij-hnOaGmUV42R4i8Hm3G8v3f-YKQ@mail.gmail.com>

2017-12-04 10:15+0800, Wanpeng Li:
> 2017-11-14 13:12 GMT+08:00 Rik van Riel <riel@redhat.com>:
> > Currently, every time a VCPU is scheduled out, the host kernel will
> > first save the guest FPU/xstate context, then load the qemu userspace
> > FPU context, only to then immediately save the qemu userspace FPU
> > context back to memory. When scheduling in a VCPU, the same extraneous
> > FPU loads and saves are done.
> >
> > This could be avoided by moving from a model where the guest FPU is
> > loaded and stored with preemption disabled, to a model where the
> > qemu userspace FPU is swapped out for the guest FPU context for
> > the duration of the KVM_RUN ioctl.
> >
> > This is done under the VCPU mutex, which is also taken when other
> > tasks inspect the VCPU FPU context, so the code should already be
> > safe for this change. That should come as no surprise, given that
> > s390 already has this optimization.
> >
> > No performance changes were detected in quick ping-pong tests on
> > my 4 socket system, which is expected since an FPU+xstate load is
> > on the order of 0.1us, while ping-ponging between CPUs is on the
> > order of 20us, and somewhat noisy.
> >
> > There may be other tests where performance changes are noticeable.
> 
> The kvm/queue has the below splatting:
> 
> [  650.866212] Bad FPU state detected at kvm_put_guest_fpu+0x7d/0x210
> [kvm], reinitializing FPU registers.
> [  650.866232] ------------[ cut here ]------------
> [  650.866241] WARNING: CPU: 2 PID: 2583 at arch/x86/mm/extable.c:103
> ex_handler_fprestore+0x5f/0x70
> [  650.866473]  libahci wmi hid pinctrl_sunrisepoint video pinctrl_intel
> [  650.866496] CPU: 2 PID: 2583 Comm: qemu-system-x86 Not tainted 4.14.0+ #7
> [  650.866500] Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS
> 1.4.9 09/12/2016
> [  650.866503] task: ffff97a095a28000 task.stack: ffffa71c8585c000
> [  650.866509] RIP: 0010:ex_handler_fprestore+0x5f/0x70
> [  650.866512] RSP: 0018:ffffa71c8585fc28 EFLAGS: 00010282
> [  650.866519] RAX: 000000000000005b RBX: ffffa71c8585fc68 RCX: 0000000000000006
> [  650.866522] RDX: 0000000000000000 RSI: ffffffffb4d35333 RDI: 0000000000000282
> [  650.866526] RBP: 000000000000000d R08: 00000000fddae359 R09: 0000000000000000
> [  650.866529] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [  650.866532] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
> [  650.866536] FS:  00007f6f8f22c700(0000) GS:ffff97a09ca00000(0000)
> knlGS:0000000000000000
> [  650.866540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  650.866543] CR2: 00007f6f993f3000 CR3: 00000003d4aae005 CR4: 00000000003626e0
> [  650.866547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  650.866550] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  650.866554] Call Trace:
> [  650.866559]  fixup_exception+0x32/0x40
> [  650.866567]  do_general_protection+0xa0/0x1b0
> [  650.866574]  general_protection+0x22/0x30
> [  650.866595] RIP: 0010:kvm_put_guest_fpu+0x7d/0x210 [kvm]
> [  650.866599] RSP: 0018:ffffa71c8585fd18 EFLAGS: 00010246
> [  650.866605] RAX: 00000000ffffffff RBX: ffff97a095a30000 RCX: 0000000000000001
> [  650.866608] RDX: 00000000ffffffff RSI: 00000000f7d5d46a RDI: ffff97a095a30b80
> [  650.866611] RBP: 0000000000000000 R08: 00000000fddae359 R09: ffff97a095a28968
> [  650.866615] R10: 0000000000000000 R11: 00000000e8d39b88 R12: ffff97a095a31bc0
> [  650.866618] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
> [  650.866650]  ? kvm_put_guest_fpu+0x27/0x210 [kvm]

Looks like we're calling put when the fpu was not loaded.  The simplest
fix would be:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e0367f688547..064eba25c215 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7842,7 +7842,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 		 * To avoid have the INIT path from kvm_apic_has_events() that be
 		 * called with loaded FPU and does not let userspace fix the state.
 		 */
-		kvm_put_guest_fpu(vcpu);
+		if (init_event)
+			kvm_put_guest_fpu(vcpu);
 		mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
 					XFEATURE_MASK_BNDREGS);
 		if (mpx_state_buffer)
@@ -7851,6 +7852,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 					XFEATURE_MASK_BNDCSR);
 		if (mpx_state_buffer)
 			memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
+		if (init_event)
+			kvm_load_guest_fpu(vcpu);
 	}
 
 	if (!init_event) {

I'll carry that until there is a nicer solution, thanks for the report.

> [  650.866670]  kvm_vcpu_reset+0x1be/0x250 [kvm]
> [  650.866689]  kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
> [  650.866707]  kvm_vm_ioctl+0x31a/0x820 [kvm]
> [  650.866712]  ? __lock_acquire+0x809/0x1410
> [  650.866721]  ? __lock_acquire+0x809/0x1410
> [  650.866734]  do_vfs_ioctl+0x9f/0x6c0
> [  650.866743]  ? __fget+0x108/0x1f0
> [  650.866752]  SyS_ioctl+0x74/0x80
> [  650.866757]  ? do_syscall_64+0xc4/0x3d0
> [  650.866764]  do_syscall_64+0x8a/0x3d0
> [  650.866769]  ? trace_hardirqs_on_thunk+0x1a/0x1c
> [  650.866781]  entry_SYSCALL64_slow_path+0x25/0x25
> [  650.866785] RIP: 0033:0x7f6f973a0f07
> [  650.866788] RSP: 002b:00007f6f8f22b968 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  650.866795] RAX: ffffffffffffffda RBX: 000000000000ae41 RCX: 00007f6f973a0f07
> [  650.866798] RDX: 0000000000000000 RSI: 000000000000ae41 RDI: 000000000000000b
> [  650.866802] RBP: 0000000000000000 R08: 0000558248e26a40 R09: 000055824b58e280
> [  650.866805] R10: 0000558249593f40 R11: 0000000000000246 R12: 000055824b55ec90
> [  650.866808] R13: 00007ffd274d79ff R14: 00007f6f8f22c9c0 R15: 000055824b58e280
> [  650.867014] ---[ end trace 2c5d6cfaba0ee1b3 ]---

  reply	other threads:[~2017-12-05 17:10 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-14  5:12 Rik van Riel
2017-11-14 16:57 ` David Hildenbrand
2017-11-14 18:07   ` Rik van Riel
2017-11-14 18:09     ` Paolo Bonzini
2017-11-14 19:40     ` David Hildenbrand
2017-11-14 21:11       ` Rik van Riel
2017-11-15  8:34       ` Paolo Bonzini
2017-11-15  9:23         ` David Hildenbrand
2017-11-15 14:50         ` Rik van Riel
2017-11-15 15:20           ` David Hildenbrand
2017-12-04  2:15 ` Wanpeng Li
2017-12-05 17:09   ` Radim Krcmar [this message]
2017-12-06  2:48     ` Wanpeng Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171205170951.GA829@flask \
    --to=rkrcmar@redhat.com \
    --cc=borntraeger@de.ibm.com \
    --cc=kernellwp@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --subject='Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.