From: Radim Krcmar <rkrcmar@redhat.com>
To: Wanpeng Li <kernellwp@gmail.com>
Cc: Rik van Riel <riel@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, kvm <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christian Borntraeger <borntraeger@de.ibm.com>
Subject: Re: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
Date: Tue, 5 Dec 2017 18:09:55 +0100	[thread overview]
Message-ID: <20171205170951.GA829@flask> (raw)
In-Reply-To: <CANRm+CznfORBCuap5nCoZij-hnOaGmUV42R4i8Hm3G8v3f-YKQ@mail.gmail.com>

2017-12-04 10:15+0800, Wanpeng Li:
> 2017-11-14 13:12 GMT+08:00 Rik van Riel <riel@redhat.com>:
> > Currently, every time a VCPU is scheduled out, the host kernel will
> > first save the guest FPU/xstate context, then load the qemu userspace
> > FPU context, only to then immediately save the qemu userspace FPU
> > context back to memory. When scheduling in a VCPU, the same extraneous
> > FPU loads and saves are done.
> >
> > This could be avoided by moving from a model where the guest FPU is
> > loaded and stored with preemption disabled, to a model where the
> > qemu userspace FPU is swapped out for the guest FPU context for
> > the duration of the KVM_RUN ioctl.
> >
> > This is done under the VCPU mutex, which is also taken when other
> > tasks inspect the VCPU FPU context, so the code should already be
> > safe for this change. That should come as no surprise, given that
> > s390 already has this optimization.
> >
> > No performance changes were detected in quick ping-pong tests on
> > my 4-socket system, which is expected since an FPU+xstate load is
> > on the order of 0.1us, while ping-ponging between CPUs is on the
> > order of 20us, and somewhat noisy.
> >
> > There may be other tests where performance changes are noticeable.
> 
> The kvm/queue branch has the below splat:
> 
> [  650.866212] Bad FPU state detected at kvm_put_guest_fpu+0x7d/0x210
> [kvm], reinitializing FPU registers.
> [  650.866232] ------------[ cut here ]------------
> [  650.866241] WARNING: CPU: 2 PID: 2583 at arch/x86/mm/extable.c:103
> ex_handler_fprestore+0x5f/0x70
> [  650.866473]  libahci wmi hid pinctrl_sunrisepoint video pinctrl_intel
> [  650.866496] CPU: 2 PID: 2583 Comm: qemu-system-x86 Not tainted 4.14.0+ #7
> [  650.866500] Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS
> 1.4.9 09/12/2016
> [  650.866503] task: ffff97a095a28000 task.stack: ffffa71c8585c000
> [  650.866509] RIP: 0010:ex_handler_fprestore+0x5f/0x70
> [  650.866512] RSP: 0018:ffffa71c8585fc28 EFLAGS: 00010282
> [  650.866519] RAX: 000000000000005b RBX: ffffa71c8585fc68 RCX: 0000000000000006
> [  650.866522] RDX: 0000000000000000 RSI: ffffffffb4d35333 RDI: 0000000000000282
> [  650.866526] RBP: 000000000000000d R08: 00000000fddae359 R09: 0000000000000000
> [  650.866529] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [  650.866532] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
> [  650.866536] FS:  00007f6f8f22c700(0000) GS:ffff97a09ca00000(0000)
> knlGS:0000000000000000
> [  650.866540] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  650.866543] CR2: 00007f6f993f3000 CR3: 00000003d4aae005 CR4: 00000000003626e0
> [  650.866547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  650.866550] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  650.866554] Call Trace:
> [  650.866559]  fixup_exception+0x32/0x40
> [  650.866567]  do_general_protection+0xa0/0x1b0
> [  650.866574]  general_protection+0x22/0x30
> [  650.866595] RIP: 0010:kvm_put_guest_fpu+0x7d/0x210 [kvm]
> [  650.866599] RSP: 0018:ffffa71c8585fd18 EFLAGS: 00010246
> [  650.866605] RAX: 00000000ffffffff RBX: ffff97a095a30000 RCX: 0000000000000001
> [  650.866608] RDX: 00000000ffffffff RSI: 00000000f7d5d46a RDI: ffff97a095a30b80
> [  650.866611] RBP: 0000000000000000 R08: 00000000fddae359 R09: ffff97a095a28968
> [  650.866615] R10: 0000000000000000 R11: 00000000e8d39b88 R12: ffff97a095a31bc0
> [  650.866618] R13: 0000000000000000 R14: ffff97a095a30000 R15: 000055824b58e280
> [  650.866650]  ? kvm_put_guest_fpu+0x27/0x210 [kvm]

Looks like we're calling kvm_put_guest_fpu() when the FPU was not
loaded.  The simplest fix would be:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e0367f688547..064eba25c215 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7842,7 +7842,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 		 * Avoid having the INIT path from kvm_apic_has_events() be called
 		 * with the FPU loaded, which would not let userspace fix the state.
 		 */
-		kvm_put_guest_fpu(vcpu);
+		if (init_event)
+			kvm_put_guest_fpu(vcpu);
 		mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
 					XFEATURE_MASK_BNDREGS);
 		if (mpx_state_buffer)
@@ -7851,6 +7852,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 					XFEATURE_MASK_BNDCSR);
 		if (mpx_state_buffer)
 			memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
+		if (init_event)
+			kvm_load_guest_fpu(vcpu);
 	}
 
 	if (!init_event) {

I'll carry that until there is a nicer solution, thanks for the report.

> [  650.866670]  kvm_vcpu_reset+0x1be/0x250 [kvm]
> [  650.866689]  kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
> [  650.866707]  kvm_vm_ioctl+0x31a/0x820 [kvm]
> [  650.866712]  ? __lock_acquire+0x809/0x1410
> [  650.866721]  ? __lock_acquire+0x809/0x1410
> [  650.866734]  do_vfs_ioctl+0x9f/0x6c0
> [  650.866743]  ? __fget+0x108/0x1f0
> [  650.866752]  SyS_ioctl+0x74/0x80
> [  650.866757]  ? do_syscall_64+0xc4/0x3d0
> [  650.866764]  do_syscall_64+0x8a/0x3d0
> [  650.866769]  ? trace_hardirqs_on_thunk+0x1a/0x1c
> [  650.866781]  entry_SYSCALL64_slow_path+0x25/0x25
> [  650.866785] RIP: 0033:0x7f6f973a0f07
> [  650.866788] RSP: 002b:00007f6f8f22b968 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  650.866795] RAX: ffffffffffffffda RBX: 000000000000ae41 RCX: 00007f6f973a0f07
> [  650.866798] RDX: 0000000000000000 RSI: 000000000000ae41 RDI: 000000000000000b
> [  650.866802] RBP: 0000000000000000 R08: 0000558248e26a40 R09: 000055824b58e280
> [  650.866805] R10: 0000558249593f40 R11: 0000000000000246 R12: 000055824b55ec90
> [  650.866808] R13: 00007ffd274d79ff R14: 00007f6f8f22c9c0 R15: 000055824b58e280
> [  650.867014] ---[ end trace 2c5d6cfaba0ee1b3 ]---


Thread overview: 13+ messages
2017-11-14  5:12 [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run Rik van Riel
2017-11-14 16:57 ` David Hildenbrand
2017-11-14 18:07   ` Rik van Riel
2017-11-14 18:09     ` Paolo Bonzini
2017-11-14 19:40     ` David Hildenbrand
2017-11-14 21:11       ` Rik van Riel
2017-11-15  8:34       ` Paolo Bonzini
2017-11-15  9:23         ` David Hildenbrand
2017-11-15 14:50         ` Rik van Riel
2017-11-15 15:20           ` David Hildenbrand
2017-12-04  2:15 ` Wanpeng Li
2017-12-05 17:09   ` Radim Krcmar [this message]
2017-12-06  2:48     ` Wanpeng Li
