From: "Gabriel L. Somlo"
Subject: Re: [PATCH v2] kvm: x86: emulate monitor and mwait instructions as nop
Date: Mon, 2 Jun 2014 21:55:47 -0400
Message-ID: <20140603015518.GA22023@foober.ini.cmu.edu>
References: <20140507205210.GA30030@ERROL.INI.CMU.EDU> <20140602192530.GC1653@ERROL.INI.CMU.EDU> <20140602202044.GA5676@redhat.com> <20140602204128.GA5791@redhat.com> <08D3B919-3B46-4464-AB97-7EBFDA8AB232@suse.de>
In-Reply-To: <08D3B919-3B46-4464-AB97-7EBFDA8AB232@suse.de>
To: Alexander Graf
Cc: "Michael S. Tsirkin", kvm@vger.kernel.org, pbonzini@redhat.com, afaerber@suse.de, gsomlo@gmail.com

On Mon, Jun 02, 2014 at 11:01:07PM +0200, Alexander Graf wrote:
>
> > On 02.06.2014 at 22:41, "Michael S. Tsirkin" wrote:
> >
> >> On Mon, Jun 02, 2014 at 10:35:56PM +0200, Alexander Graf wrote:
> >>
> >>>> On 02.06.2014 at 22:20, "Michael S. Tsirkin" wrote:
> >>>>
> >>>> On Mon, Jun 02, 2014 at 09:48:19PM +0200, Alexander Graf wrote:
> >>>>
> >>>>>> On 02.06.2014 at 21:25, "Gabriel L. Somlo" wrote:
> >>>>>>
> >>>>>> On Wed, May 07, 2014 at 04:52:13PM -0400, Gabriel L. Somlo wrote:
> >>>>>> Treat monitor and mwait instructions as nop, which is architecturally
> >>>>>> correct (but inefficient) behavior. We do this to prevent misbehaving
> >>>>>> guests (e.g. OS X <= 10.7) from crashing after they fail to check for
> >>>>>> monitor/mwait availability via cpuid.
> >>>>>>
> >>>>>> Since mwait-based idle loops relying on these nop-emulated instructions
> >>>>>> would keep the host CPU pegged at 100%, do NOT advertise their presence
> >>>>>> via cpuid, to prevent compliant guests from using them inadvertently.
> >>>>>>
> >>>>>> Signed-off-by: Gabriel L. Somlo
> >>>>>> ---
> >>>>>>
> >>>>>> New in v2: remove invalid_op handler functions which were only used to
> >>>>>> handle exits caused by monitor and mwait
> >>>>>>
> >>>>>>>> On Wed, May 07, 2014 at 08:31:27PM +0200, Alexander Graf wrote:
> >>>>>>>> On 05/07/2014 08:15 PM, Michael S. Tsirkin wrote:
> >>>>>>>> If we really want to be paranoid and worry about guests
> >>>>>>>> that use this strange way to trigger invalid opcode,
> >>>>>>>> we can make it possible for userspace to enable/disable
> >>>>>>>> this hack, and teach qemu to set it.
> >>>>>>>>
> >>>>>>>> That would make it even safer than it was.
> >>>>>>>>
> >>>>>>>> Not sure it's worth it, just a thought.
> >>>>>>>
> >>>>>>> Since we don't trap on non-exposed other instructions (new SSE and
> >>>>>>> whatdoiknow) I don't think it's really bad to just expose
> >>>>>>> MONITOR/MWAIT as nops.
> >>>>>
> >>>>> Would it make sense to make this a module parameter,
> >>>>> (e.g., "int emulate_mwait") ?
> >>>>>
> >>>>> Default would be 0 (no emulation). 1 would mean "emulate as nop", and
> >>>>> if anyone ever figures out how to do proper page-locking based
> >>>>> emulation we could use 2 to enable that, etc. ?
> >>>>>
> >>>>> Not sure we'd want qemu to enable/disable it automatically, though...
> >>>>>
> >>>>> What do you all think ?
> >>>>
> >>>> I don't like module parameters - they're system global and there's a
> >>>> good chance you want to run non-osx in parallel ;).
> >>>>
> >>>> I'd either link this to the cpuid bits or enable it forcefully through
> >>>> ENABLE_CAP per vcpu.
> >>>>
> >>>> Alex
> >>>
> >>> Point is that.
> >>> Paolo here thinks it's safe to just make it a NOP unconditionally.
> >>> so module parameter would be there as a debugging tool:
> >>> as a means for users to test with old kvm behaviour if they see breakage.
> >>> Which we don't expect, so no need to waste cycles creating a pretty
> >>> interface for it.
> >>
> >> Both interfaces already exist, so where's the problem?
> >
> > Hmm sorry which interfaces for enabling mwait nop emulation exist?
>
> User space can force cpuid bits that kvm doesn't return as supported,
> so we do have a negative-by-default switch.
>
> We also have an ENABLE_CAP ioctl. Enabling the monitor/mwait nop ability
> explicitly by that is a 5 line patch.
>
> Either way is very flexible and not system wide.

W.r.t. monitor/mwait, a guest can do one of the following:

1. Never check CPUID, and never use monitor/mwait
   - This is great, we don't have to do anything about these

2. Check CPUID for mwait, use it to idle in preference over hlt
   - Linux, Windows, and Mavericks (10.9) do this
   - we never want to have CPUID say "yes" to these, since
     monitor/mwait support will be clunky in the best case,
     and hlt is overwhelmingly preferable! [*]

3. Never check CPUID, use monitor/mwait with abandon
   - OS X 10.6 .. 10.8 does this
   - emulating monitor/mwait here allows us to boot the guest and
     use it, and perform sysadmin surgery to force a hlt based idle

4. Check CPUID, panic if unavailable
   - OS X 10.5 did this, IIRC.
   - whether I can do kext surgery and get it to stop checking
     CPUID *in addition to* falling back to hlt-based idle is TBD.
   - emulating monitor/mwait allows us to boot this type of guest,
     BUT WE ALSO HAVE TO ADVERTISE IT VIA CPUID !!!

I like telling qemu on the command line "do monitor = mwait = nop; for
this guest only", and having qemu pass that on to KVM for only the
VCPUs associated with this guest, optionally, for cases 3 and 4 only
(everyone else gets the invalid opcode fault behavior as before).

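
To make the four cases above concrete, here is a rough sketch of the
per-guest decision they imply. It is only an illustration, not KVM or
qemu code: struct guest_profile, struct mwait_policy, choose_policy()
and apply_policy_to_ecx() are all hypothetical names. The one hard fact
it relies on is that MONITOR/MWAIT support is reported in
CPUID.01H:ECX bit 3.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 3 reports MONITOR/MWAIT support. */
#define CPUID_01_ECX_MONITOR (1u << 3)

/* Hypothetical description of how a guest OS treats monitor/mwait. */
struct guest_profile {
    const char *name;
    bool checks_cpuid;         /* tests the MONITOR bit before using mwait */
    bool uses_mwait;           /* may execute monitor/mwait at all */
    bool panics_without_mwait; /* refuses to run if the bit is clear */
};

/* Hypothetical per-guest policy that qemu would pass on to KVM. */
struct mwait_policy {
    bool emulate_nop;     /* treat monitor/mwait as nop instead of #UD */
    bool advertise_cpuid; /* set the MONITOR bit in the guest's CPUID */
};

struct mwait_policy choose_policy(const struct guest_profile *g)
{
    struct mwait_policy p = { false, false };

    if (g->checks_cpuid && g->panics_without_mwait) {
        /* Case 4: must both emulate and advertise. */
        p.emulate_nop = true;
        p.advertise_cpuid = true;
    } else if (!g->checks_cpuid && g->uses_mwait) {
        /* Case 3: emulate, but keep the CPUID bit clear. */
        p.emulate_nop = true;
    }
    /* Cases 1 and 2: leave the default (#UD, bit clear, guest uses hlt). */
    return p;
}

/* Force the MONITOR bit in a CPUID.01H:ECX value to match the policy. */
uint32_t apply_policy_to_ecx(uint32_t ecx, struct mwait_policy p)
{
    if (p.advertise_cpuid)
        return ecx | CPUID_01_ECX_MONITOR;
    return ecx & ~CPUID_01_ECX_MONITOR;
}
```

With this shape, a Linux-like profile (case 2) keeps the old behavior,
while only profiles matching cases 3 and 4 turn emulation on.
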
[*] I think we've been over this a few times already, but here's a
quick recap:

- monitor == mwait == NOP is correct (albeit wasteful) behavior
  - code using mwait MUST expect and deal with spurious wakeups
    (per the Intel manual)
  - mwait == nop is an INSTANT spurious wakeup (hence works OK
    with any correctly written program) !
  - monitor == nop won't "arm" anything, but that doesn't matter
    if mwait always immediately wakes up !

- this pegs the host CPU at 100%, so MUCH worse than hlt, shouldn't
  do it unless we ABSOLUTELY HAVE TO !!!

- guest-mode mwait should NEVER be allowed to stop the host CPU
  (and, according to the Intel manual, it's HARD to try and make it
  do so, which I think is on purpose !)

- instead, guest-mode mwait should map to a host-side condition-wait
  (where a write to a monitor-ed area acts as condition-signal).
  - the most likely way to implement something like that would
    be to write-protect pages and handle write faults
    - and I never got it working *properly* (but I'm a n00b,
      so that ain't saying much :)
    - but the granularity would be all wrong compared to any
      real CPU (1 page >> typical monitored area size)
  - but I still don't see it being any better than hlt-based idle,
    even if we *did* get it to work correctly !!!

I'll look into ENABLE_CAP, and how to expose that on the qemu command
line (I think I might need both methods mentioned by Alex in tandem,
but I'll have to study existing examples before I can say anything
useful here).

Any extra words of wisdom on how to do that, what examples might be
best to study for inspiration, etc, much appreciated !!!

Thanks,
--Gabriel
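
For reference, the "instant spurious wakeup" point from the recap can
be sketched as a small user-space model. This is an assumption-laden
illustration, not guest or KVM code: monitor_nop(), mwait_nop() and
idle_until_set() are hypothetical stand-ins, and the write that would
normally come from another CPU is simulated inside the loop.

```c
#include <stddef.h>

/* Stand-ins for the two instructions when both are emulated as nops. */
void monitor_nop(const volatile void *addr) { (void)addr; }
void mwait_nop(void) { }

/*
 * Hypothetical mwait-style idle loop. It waits until *flag becomes
 * nonzero and returns how many times mwait "woke up" spuriously.
 * To keep the model single-threaded, the loop itself sets the flag
 * after set_after wakeups (set_after must be >= 1 when the flag
 * starts out clear, otherwise the loop never terminates).
 */
unsigned long idle_until_set(volatile int *flag, unsigned long set_after)
{
    unsigned long wakeups = 0;

    while (!*flag) {
        monitor_nop(flag);  /* "arms" nothing */
        if (*flag)          /* re-check after arming, as real code does */
            break;
        mwait_nop();        /* returns at once: a spurious wakeup */
        wakeups++;
        if (wakeups == set_after)
            *flag = 1;      /* simulated write from another CPU */
    }
    return wakeups;
}
```

The loop still terminates with the right result, because it re-checks
the flag after every wakeup. It just degenerates into a busy-wait, which
is where the 100% host CPU usage comes from.
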