From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752384AbdCPOGt (ORCPT <rfc822;w@1wt.eu>);
        Thu, 16 Mar 2017 10:06:49 -0400
Received: from mx1.redhat.com ([209.132.183.28]:34176 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751842AbdCPOGr (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 16 Mar 2017 10:06:47 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com D06A181F03
Authentication-Results: ext-mx03.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx03.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=mst@redhat.com
DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com D06A181F03
Date: Thu, 16 Mar 2017 16:04:12 +0200
From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Gabriel L. Somlo" <gsomlo@gmail.com>
Cc: linux-kernel@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
        Radim =?utf-8?B?S3LEjW3DocWZ?= <rkrcmar@redhat.com>,
        Jonathan Corbet <corbet@lwn.net>, Thomas Gleixner <tglx@linutronix.de>,
        Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
        x86@kernel.org, Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org,
        linux-doc@vger.kernel.org
Subject: Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests
Message-ID: <20170316155550-mutt-send-email-mst@kernel.org>
References: <1489612895-12799-1-git-send-email-mst@redhat.com>
 <20170315233534.GG2239@HEDWIG.INI.CMU.EDU>
 <20170316013903-mutt-send-email-mst@kernel.org>
 <20170316132426.GB4085@HEDWIG.INI.CMU.EDU>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20170316132426.GB4085@HEDWIG.INI.CMU.EDU>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Thu, 16 Mar 2017 14:04:17 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 16, 2017 at 09:24:27AM -0400, Gabriel L. Somlo wrote:
> After studying your patch a bit more carefully (sorry, it's crazy
> around here right now :) ) I realized you're simply trying to
> (selectively) decide when to exit L1 and emulate as NOP vs. when to
> just allow L1 to execute MONITOR & MWAIT natively.
> 
> Is that right ? Because if so, the issues I saw on my MacPro1,1 are
> weird and inexplicable, given that allowing L>=1 to run MONITOR/MWAIT
> natively was one of the options Alex Graf and Rene Rebe used back in
> the very early days of OS X on QEMU, at the time I got involved with
> that project. Here's part of an out of tree patch against 3.4 which did
> just that, and worked as far as I remember on *any* MWAIT capable
> intel chip I had access to back in 2010:
> 
> ##############################################################################
> # 99-mwait.patch.kvm-kmod (Rene Rebe <rene@exactcode.de>) 2010-04-27
> ##############################################################################
> diff -pNarU5 linux-3.4/arch/x86/kvm/cpuid.c linux-3.4-mac/arch/x86/kvm/cpuid.c
> --- linux-3.4/arch/x86/kvm/cpuid.c	2012-05-20 18:29:13.000000000 -0400
> +++ linux-3.4-mac/arch/x86/kvm/cpuid.c	2012-10-09 11:42:59.921215750 -0400
> @@ -222,11 +222,11 @@ static int do_cpuid_ent(struct kvm_cpuid
>  		f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
>  		F(FXSR) | F(FXSR_OPT) | f_gbpages | f_rdtscp |
>  		0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW);
>  	/* cpuid 1.ecx */
>  	const u32 kvm_supported_word4_x86_features =
> -		F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
> +		F(XMM3) | F(PCLMULQDQ) | F(MWAIT) /* DTES64, MONITOR */ |
>  		0 /* DS-CPL, VMX, SMX, EST */ |
>  		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
>  		F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
>  		0 /* Reserved, DCA */ | F(XMM4_1) |
>  		F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
> diff -pNarU5 linux-3.4/arch/x86/kvm/svm.c linux-3.4-mac/arch/x86/kvm/svm.c
> --- linux-3.4/arch/x86/kvm/svm.c	2012-05-20 18:29:13.000000000 -0400
> +++ linux-3.4-mac/arch/x86/kvm/svm.c	2012-10-09 11:44:41.598997481 -0400
> @@ -1102,12 +1102,10 @@ static void init_vmcb(struct vcpu_svm *s
>  	set_intercept(svm, INTERCEPT_VMSAVE);
>  	set_intercept(svm, INTERCEPT_STGI);
>  	set_intercept(svm, INTERCEPT_CLGI);
>  	set_intercept(svm, INTERCEPT_SKINIT);
>  	set_intercept(svm, INTERCEPT_WBINVD);
> -	set_intercept(svm, INTERCEPT_MONITOR);
> -	set_intercept(svm, INTERCEPT_MWAIT);
>  	set_intercept(svm, INTERCEPT_XSETBV);
>  
>  	control->iopm_base_pa = iopm_base;
>  	control->msrpm_base_pa = __pa(svm->msrpm);
>  	control->int_ctl = V_INTR_MASKING_MASK;
> diff -pNarU5 linux-3.4/arch/x86/kvm/vmx.c linux-3.4-mac/arch/x86/kvm/vmx.c
> --- linux-3.4/arch/x86/kvm/vmx.c	2012-05-20 18:29:13.000000000 -0400
> +++ linux-3.4-mac/arch/x86/kvm/vmx.c	2012-10-09 11:42:59.925215977 -0400
> @@ -1938,11 +1938,11 @@ static __init void nested_vmx_setup_ctls
>  		nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high);
>  	nested_vmx_procbased_ctls_low = 0;
>  	nested_vmx_procbased_ctls_high &=
>  		CPU_BASED_VIRTUAL_INTR_PENDING | CPU_BASED_USE_TSC_OFFSETING |
>  		CPU_BASED_HLT_EXITING | CPU_BASED_INVLPG_EXITING |
> -		CPU_BASED_MWAIT_EXITING | CPU_BASED_CR3_LOAD_EXITING |
> +		CPU_BASED_CR3_LOAD_EXITING |
>  		CPU_BASED_CR3_STORE_EXITING |
>  #ifdef CONFIG_X86_64
>  		CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING |
>  #endif
>  		CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
> @@ -2404,12 +2404,10 @@ static __init int setup_vmcs_config(stru
>  	      CPU_BASED_CR3_LOAD_EXITING |
>  	      CPU_BASED_CR3_STORE_EXITING |
>  	      CPU_BASED_USE_IO_BITMAPS |
>  	      CPU_BASED_MOV_DR_EXITING |
>  	      CPU_BASED_USE_TSC_OFFSETING |
> -	      CPU_BASED_MWAIT_EXITING |
> -	      CPU_BASED_MONITOR_EXITING |
>  	      CPU_BASED_INVLPG_EXITING |
>  	      CPU_BASED_RDPMC_EXITING;
>  
>  	opt = CPU_BASED_TPR_SHADOW |
>  	      CPU_BASED_USE_MSR_BITMAPS |
> 
> If all you're trying to do is (selectively) revert to this behavior,
> that "shouldn't" mess it up for the MacPro either, so I'm thoroughly
> confused at this point :)

Yes.  Me too. Want to try that other patch and see what happens?

> Back in 2010, running MWAIT in L>=1  behaved 100% exactly like a NOP,
> didn't power down the physical CPU, just immediately moved on to the
> next instruction. As such, there was no power saving and no
> opportunity to yield to another L0 thread either, unlike with NOP
> emulation at L0.
> 
> Did that change on newer Intel chips (i.e., is guest-mode MWAIT now
> doing something smarter than just acting as a guest-mode NOP) ?
> 
> Thanks,
> --Gabriel

Interesting.  What it seems to say is this:

MWAIT. Behavior of the MWAIT instruction (which always causes an invalid-
opcode exception—#UD—if CPL > 0) is determined by the setting of the “MWAIT
exiting” VM-execution control:
— If the “MWAIT exiting” VM-execution control is 1, MWAIT causes a VM exit
(see Section 22.1.3).
— If the “MWAIT exiting” VM-execution control is 0, MWAIT operates normally if
any of the following is true: (1) the “interrupt-window exiting” VM-execution
control is 0; (2) ECX[0] is 0; or (3) RFLAGS.IF = 1.
— If the “MWAIT exiting” VM-execution control is 0, the “interrupt-window
exiting” VM-execution control is 1, ECX[0] = 1, and RFLAGS.IF = 0, MWAIT
does not cause the processor to enter an implementation-dependent
optimized state; instead, control passes to the instruction following the
MWAIT instruction.


And since interrupt-window exiting is 0 most of the time for KVM,
I would expect MWAIT to behave normally.


-- 
MST