From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757012AbdEUUTj (ORCPT ); Sun, 21 May 2017 16:19:39 -0400 Received: from mail.kernel.org ([198.145.29.99]:38178 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756760AbdEUUTi (ORCPT ); Sun, 21 May 2017 16:19:38 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C8EFA23961 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org Subject: Re: [RFC] KVM: SVM: do not drop VMCB CPL to 0 if SS is not present To: Roman Penyaev , Andy Lutomirski Cc: Mikhail Sennikovskii , Paolo Bonzini , Gleb Natapov , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Borislav Petkov , Paolo Bonzini References: <1b02a87e-3dd9-24eb-7960-2a2aa5a18515@kernel.org> From: Andy Lutomirski Message-ID: Date: Sun, 21 May 2017 13:19:32 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-MW Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05/21/2017 12:53 AM, Roman Penyaev wrote: > On Sun, May 21, 2017 at 5:31 AM, Andy Lutomirski wrote: >> On 05/19/2017 09:14 AM, Roman Penyaev wrote: >>> >>> Hi folks, >>> >>> After experiencing guest double faults (sometimes triple faults) on >>> 3.16 guest kernels with the following common pattern: >>> > > [cut] > >>> >>> Further tracking of VMCB states before and after VMRUN showed, that >>> CPL becomes 0 when VMEXIT happens with the following SS segment: >>> >>> ss = { >>> selector = 0x2b, >>> attrib = 0x400, >>> limit = 0xffffffff, >>> base = 0x0 >>> }, >>> >>> cpl = 0x3 >>> >>> Then on next VMRUN VMCB looks as the following: >>> >>> ss = { >>> selector = 0x2b, >>> attrib = 0x0, <<< dropped to 0 >>> limit = 0xffffffff, >>> base = 0x0 >>> }, >>> >>> cpl = 0x0, <<< dropped to 0 >>> >>> Obviously it was changed between VMRUN calls. The following backtrace >>> shows that VMCB.SAVE.CPL was set to 0 by QEMU itself: >>> >>> CPU: 55 PID: 59531 Comm: kvm >>> [] kvm_arch_vcpu_ioctl_set_sregs+0x2e0/0x480 [kvm] >>> [] kvm_write_guest_cached+0x540/0xc00 [kvm] >>> [] ? finish_task_switch+0x185/0x240 >>> [] ? __schedule+0x28c/0xa10 >>> [] do_vfs_ioctl+0x2cd/0x4a0 >>> >>> SS segment which came from QEMU had the following struct members: >>> >>> SS->base = 0 >>> SS->limit = ffffffff >>> SS->selector = 2b >>> SS->type = 0 >>> SS->present = 0 >>> SS->dpl = 0 >>> SS->db = 0 >>> SS->s = 0 >>> SS->l = 0 >>> SS->g = 0 >>> SS->avl = 0 >>> SS->unusable = 1 >>> >>> Indeed, on last VMEXIT SS segment does not have (P) present bit set in >>> segment attributes: >>> >>> (gdb) p 0x400 & (1 << SVM_SELECTOR_P_SHIFT) >>> $1 = 0 >> >> >> Huh? How is that even possible? It should not be possible to actually run >> the vCPU with a non-NULL SS that isn't present. > > That is utterly good question :) I do not know. According to my shallow > understanding (P) bit is only a hint for CPU that corresponding segment was > read from gdt and now is cached in private CPU registers (attributes). > Am I right? > > At least what I see that it is quite often the case when we exit from VMRUN > with segment not present then VMRUN is resumed and on next vmexit segment has > correct attributes. > >> How would you cause it to happen? > > We run fio and iperf tests in guests for a couple of days. Nothing more, > nothing special. Guests are bare debians with 3.16 kernels. > >> >> Unless... is this the sysret_ss_attrs issue? > > What is the issue? This one > > https://lkml.org/lkml/2015/4/24/770 Yes. But I was thinking about it wrong, since this is probably 64-bit userspace, not 32-bit userspace. Here's my theory: 1. User task A does a syscall. It's not in kernel mode with SS != 0. 2. The scheduler runs and switches to task B. SS != 0. 2. Kernel enters user mode for task B. 3. User task B gets interrupted. Kernel ends up running with SS = 0. 4. Kernel switches back to task A. SS == 0. 5. Kernel does SYSRET. SS == __USER_DS, but SS's attributes are messed up. 6. QEMU does whatever it does that inspires it to zap SS's attributes. 7. Boom. If task B were 32-bit, then the vDSO would fix up SS, so there would only be a 1-instruction window for problems. To check this theory, you could try backporting this to the guest and seeing if the problem goes away: commit 61f01dd941ba9e06d2bf05994450ecc3d61b6b8b Author: Andy Lutomirski Date: Sun Apr 26 16:47:59 2015 -0700 x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue >>> >>> So when on VMEXIT we have such SS state (P bit is not set) and QEMU >>> just decides to synchronize registers the following happens on QEMU >>> side: >>> >>> kvm_cpu_synchronize_state(): >>> kvm_arch_get_registers(): >>> ... >>> get_seg(): >>> if (rhs->unusable) { >>> lhs->flags = 0; <<< SS is unusable [(P) is not set) >>> <<< all attributes are dropped to 0. >>> } >>> cpu->kvm_vcpu_dirty = true; <<< Mark VCPU state as dirty >>> >> >> Looks like the bug is in QEMU, then, right? > > KVM SVM restores CPL from unusable selector, obviously this is not nice. I would imagine that QEMU shouldn't be feeding KVM such a selector. Also, there's an invariant that SS.DPL == CPL, at least most of the time, although this SYSRET issue may be the exception. Paolo, what's the intended behavior here? Is the bug in KVM or in QEMU? > > arch/x86/kvm/svm.c:svm_set_segment(): > > if (var->unusable) > s->attrib = 0; > ... > if (seg == VCPU_SREG_SS) > svm->vmcb->save.cpl = (s->attrib >> SVM_SELECTOR_DPL_SHIFT) & 3; > > > Meanwhile QEMU resets attributes, despite the fact that DPL (which is passed > from KVM) is correct. > > So it is not clear what is the proper way to fix that. What is clear is > that CPL is set to 0 because of this game with registers on both sides. > Now the question is what side to fix or probably both. > > >> Couldn't you just fix this code >> in QEMU by, say, deleting it? > > Certainly, but would be nice to listen to KVM maintainers. At least the issue > is clear and what is left is a proper one-line fix :) --Andy