All of lore.kernel.org
 help / color / mirror / Atom feed
* expose MWAIT to dom0
@ 2011-08-15  5:35 Tian, Kevin
  2011-08-15  6:54 ` Keir Fraser
  2011-08-15  8:02 ` Jan Beulich
  0 siblings, 2 replies; 37+ messages in thread
From: Tian, Kevin @ 2011-08-15  5:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Zhang, Yang Z, 'Keir Fraser (keir@xen.org)', Wei, Gang

There're basically two methods to enter a given C-state: legacy (hlt + I/O read),
and native(using mwait). MWAIT is always preferred when both underlying CPU
and OS support, which is a more efficient way to conduct C-state transition.

Xen PM relies on Dom0 to parse ACPI Cx/Px information, which involves one
step to notify BIOS about a set of capabilities supported by OSPM. One capability
is about mwait support, which if true ACPI Cx entry contains entry parameters 
for mwait, or else I/O port information is provided. Xen PM later decides entry
method (i/o or mwait) based on parsed ACPI information from dom0.

However Xen doesn't expose MWAIT capability to dom0 due to changeset 17573:

# HG changeset patch
# User Keir Fraser <keir.fraser@citrix.com>
# Date 1210065934 -3600
# Node ID 777f294e3be81a4d0825e3a9b633a8d81c37f613
# Parent  d5589865bfce91bf65c34bd56e466bf26ca4176f
x86, Intel: Make only EST feature visible to dom0 to enable Cx-state
logic. There should be no need to make MWAIT visible.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
diff -r d5589865bfce -r 777f294e3be8 xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c      Tue May 06 10:19:10 2008 +0100
+++ b/xen/arch/x86/traps.c      Tue May 06 10:25:34 2008 +0100
@@ -713,8 +713,7 @@ static int emulate_forced_invalid_op(str
         __clear_bit(X86_FEATURE_PBE, &d);
 
         __clear_bit(X86_FEATURE_DTES64 % 32, &c);
-        if ( !IS_PRIV(current->domain) )
-            __clear_bit(X86_FEATURE_MWAIT % 32, &c);
+        __clear_bit(X86_FEATURE_MWAIT % 32, &c);
         __clear_bit(X86_FEATURE_DSCPL % 32, &c);
         __clear_bit(X86_FEATURE_VMXE % 32, &c);
         __clear_bit(X86_FEATURE_SMXE % 32, &c);
[...]

This then brings a problem to Dom0 which thinks underlying CPU
doesn't report mwait, and thus notify BIOS to use old I/O based method.

Later a trick is integrated in Jeremy's pvops tree:

commit 4151815a222723002d727ef0a89f0756d4e7d3d2
Author: Wei Gang <gang.wei@intel.com>
Date:   Mon Apr 5 14:21:18 2010 -0700

    xen/acpi: re-enable mwait for xen cpuidle
    
    Xen hypervisor doesn't export mwait feature to dom0, but latest Linux
    kernel start to check this feature while initializing _PDC
    object. Bypass such check in pv-ops case to re-enable mwait for xen
    cpuidle.
    
    Signed-off-by: Wei Gang <gang.wei@intel.com>
    Signed-off-by: Yu Ke <ke.yu@intel.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

diff --git a/arch/x86/kernel/acpi/processor.c b/arch/x86/kernel/acpi/processor.c
index 8c9526d..d88866c 100644
--- a/arch/x86/kernel/acpi/processor.c
+++ b/arch/x86/kernel/acpi/processor.c
@@ -60,7 +60,7 @@ static void init_intel_pdc(struct acpi_processor *pr, struct cpuinfo_x86 *c)
        /*
         * If mwait/monitor is unsupported, C2/C3_FFH will be disabled
         */
-       if (!cpu_has(c, X86_FEATURE_MWAIT))
+       if (!cpu_has(c, X86_FEATURE_MWAIT) && !xen_initial_domain())
                buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
 
        obj->type = ACPI_TYPE_BUFFER;

Above trick is ugly and error-prone, since it always enable mwait regardless of
actual CPU capability. It's unlikely to make into upstream, and also get lost in
into some distro such as SLES11.

Instead of enhancing current approach (e.g. add a separate channel to reveal
mwait capability instead of using cpuid), I'm curious why we don't expose mwait
in cpuid to dom0 directly. At least original comment in 17573 looks obscure, 
since EIST has nothing to do with Cx...

Thanks
Kevin

^ permalink raw reply related	[flat|nested] 37+ messages in thread
* RE: expose MWAIT to dom0
@ 2011-08-18 12:35 Jan Beulich
  2011-08-19  1:31 ` Tian, Kevin
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2011-08-18 12:35 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 16.08.11 at 10:29, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> Sent: Tuesday, August 16, 2011 4:13 PM
>> 
>> >>> On 16.08.11 at 08:53, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> >>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> >> Sent: Tuesday, August 16, 2011 2:42 PM
>> >>
>> >> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> >> >>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> >> >> Sent: Monday, August 15, 2011 6:29 PM
>> >> >>
>> >> >> that while improving the situation on CPUs that support the break-on-
>> >> >> interrupt extension to mwait, it would result in C2/C3 not being usable
>> >> >> at all on CPUs that don't (but support mwait in its simpler form and
>> >> >> have ACPI tables specifying FFH as address space id). Is that only a
>> >> >> theoretical concern (i.e. is there an implicit guarantee that for other
>> >> >> than C1 FFH wouldn't be specified without that extension being
>> >> >> available)? I thinks it's a practical one, or otherwise there wouldn't
>> >> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC
>> >> >> evaluation.
>> >> >
>> >> > Yes, this is a practical one, though I don't know any box doing that. In
>> > all
>> >> > the boxes I've been using so far, all the extensions are available. But
>> >> > since
>> >> > BIOS vendor may also impact the availability of CPUID bits, I think we
>> >> > should do it right by strictly conforming to theSDM. I.e. we need check
>> >> > CPUID leafs and then verify all Cx states propagated from dom0, instead
>> >> > of blindly following its info. Will work a patch for that.
>> >>
>> >> You're getting it sort of wrong way round: What I don't want to do (but
>> >> seemingly being necessary) is mimic the decision logic the hypervisor
>> >> uses (i.e. require the break-on-interrupt extension for C2/C3 entering
>> >> through MWAIT) in Dom0 when deciding about the bits to pass to
>> >
>> > break-on-interrupt is not a hard requirement to use MWAIT. Even when
>> > that extension is not available, MWAIT can be still used to enter C2/C3,
>> > just with interrupt enabled.
>> 
>> And that's why this implementation detail should be confined to the
>> hypervisor - Dom0 should not care about this if at all possible.
>> 
>> >> _PDC. That ought to be an implementation detail (subject to change)
>> >> in the hypervisor alone. The hypervisor itself, otoh, already properly
>> >> checks CPUID leaf 5 (and that's what might cause it to not use mwait
>> >> despite the bit in CPUID leaf 1 being set, which should be all Dom0
>> >> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH).
>> >>
>> >
>> > I made a mistake, that currently CPUID leaf 5 is already checked in
>> > check_cx in hypervisor, so it should be sane. However I still fail to catch
>> > your real concern here. :/
>> 
>> If Dom0 finds (real) CPUID leaf 1 report MWAIT to be available, it
>> will (with the logic outlined above) call _PDC without clearing
>> ACPI_PDC_C_C2C3_FFH. If now the break-on-interrupt extension
>> is not present, but the address space ID for C2 or C3 is set to FFH,
>> then Xen (in acpi_processor_ffh_cstate_probe()) will reject the
>> Cx entry (and hence refrain from using the respective C-state),
>> whereas if Dom0 cleared ACPI_PDC_C_C2C3_FFH in that case,
>> firmware would (normally) have converted the address space ID to
>> SYSTEM_IO, and hence Xen would have decided to use C2/C3 with
>> the SYSIO entry method.
>> 
>> So this is only acceptable if there are *no* production CPUs of any
>> vendor that would support MWAIT without the break-on-interrupt
>> extension.
>> 
> 
> yes, that's also the way that native Linux code currently uses:
> 	- notify BIOS ACPI_PDC_C_C2C3_FFH if cpu has mwait
> 	- reject Cx entry if break-on-interrupt extension is not present later
> in acpi processor driver when parsing Cx entries.
> 
> From BIOS ACPI p.o.v, OSPM can notify BIOS about FFH style if following
> conditions are true:
> 	a) cpu supports mwait
> 	b) OSPM itself supports mwait
> 
> a) is architectural, but b) is implementation specific regarding to what
> can be called "support". Obviously both Xen and Linux here use an
> inconsistent way between the place notifying BIOS and the point parsing
> ACPI Cx entry. So your conclusion is correct that C2/C3 would be rejected
> on the CPU which doesn't support MWAIT with break-on-interrupt 
> extension. But it should be fine in the real world, and we may consider
> whether to do something when a real case is encountered in the future.

Actually, I just checked, and I have two systems that have MWAIT but
no extensions to it. While one is an old SDV of yours, the other is a
production Dell system (which I only run Windows on, so I can't really
check how Xen would behave on it prior to and with the discussed
adjustment).

Jan

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2011-09-05  8:14 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-15  5:35 expose MWAIT to dom0 Tian, Kevin
2011-08-15  6:54 ` Keir Fraser
2011-08-15  7:13   ` Tian, Kevin
2011-08-15  7:44     ` Keir Fraser
2011-08-15  7:57       ` Tian, Kevin
2011-08-15  8:10         ` Keir Fraser
2011-08-15  8:14           ` Tian, Kevin
2011-08-15  8:41             ` Jan Beulich
2011-08-15  9:02             ` Keir Fraser
2011-08-16  5:34               ` Tian, Kevin
2011-08-16  6:31                 ` Keir Fraser
2011-08-15  8:12         ` Jan Beulich
2011-08-15  8:25           ` Tian, Kevin
2011-08-15  8:02 ` Jan Beulich
2011-08-15  8:09   ` Tian, Kevin
2011-08-15  8:18     ` Jan Beulich
2011-08-15 10:29     ` Jan Beulich
2011-08-16  6:03       ` Tian, Kevin
2011-08-16  6:42         ` Jan Beulich
2011-08-16  6:53           ` Tian, Kevin
2011-08-16  8:13             ` Jan Beulich
2011-08-16  8:29               ` Tian, Kevin
2011-08-16  8:45                 ` Jan Beulich
2011-08-16  8:50                   ` Tian, Kevin
2011-08-15 12:44   ` Konrad Rzeszutek Wilk
2011-08-18 12:35 Jan Beulich
2011-08-19  1:31 ` Tian, Kevin
2011-08-19  8:10   ` Jan Beulich
2011-08-19  8:34     ` Tian, Kevin
2011-08-19  9:32       ` Jan Beulich
2011-08-19  9:39         ` Tian, Kevin
2011-08-19 12:55           ` Jan Beulich
2011-08-19 15:01           ` Jan Beulich
2011-08-21  5:26             ` Tian, Kevin
2011-08-25 12:37               ` Jan Beulich
2011-08-26  2:18                 ` Tian, Kevin
2011-09-05  8:14                   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.