All of lore.kernel.org
 help / color / mirror / Atom feed
* RE: expose MWAIT to dom0
@ 2011-08-18 12:35 Jan Beulich
  2011-08-19  1:31 ` Tian, Kevin
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2011-08-18 12:35 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 16.08.11 at 10:29, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> Sent: Tuesday, August 16, 2011 4:13 PM
>> 
>> >>> On 16.08.11 at 08:53, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> >>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> >> Sent: Tuesday, August 16, 2011 2:42 PM
>> >>
>> >> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> >> >>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> >> >> Sent: Monday, August 15, 2011 6:29 PM
>> >> >>
>> >> >> that while improving the situation on CPUs that support the break-on-
>> >> >> interrupt extension to mwait, it would result in C2/C3 not being usable
>> >> >> at all on CPUs that don't (but support mwait in its simpler form and
>> >> >> have ACPI tables specifying FFH as address space id). Is that only a
>> >> >> theoretical concern (i.e. is there an implicit guarantee that for other
>> >> >> than C1 FFH wouldn't be specified without that extension being
>> >> >> available)? I thinks it's a practical one, or otherwise there wouldn't
>> >> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC
>> >> >> evaluation.
>> >> >
>> >> > Yes, this is a practical one, though I don't know any box doing that. In
>> > all
>> >> > the boxes I've been using so far, all the extensions are available. But
>> >> > since
>> >> > BIOS vendor may also impact the availability of CPUID bits, I think we
>> >> > should do it right by strictly conforming to theSDM. I.e. we need check
>> >> > CPUID leafs and then verify all Cx states propagated from dom0, instead
>> >> > of blindly following its info. Will work a patch for that.
>> >>
>> >> You're getting it sort of wrong way round: What I don't want to do (but
>> >> seemingly being necessary) is mimic the decision logic the hypervisor
>> >> uses (i.e. require the break-on-interrupt extension for C2/C3 entering
>> >> through MWAIT) in Dom0 when deciding about the bits to pass to
>> >
>> > break-on-interrupt is not a hard requirement to use MWAIT. Even when
>> > that extension is not available, MWAIT can be still used to enter C2/C3,
>> > just with interrupt enabled.
>> 
>> And that's why this implementation detail should be confined to the
>> hypervisor - Dom0 should not care about this if at all possible.
>> 
>> >> _PDC. That ought to be an implementation detail (subject to change)
>> >> in the hypervisor alone. The hypervisor itself, otoh, already properly
>> >> checks CPUID leaf 5 (and that's what might cause it to not use mwait
>> >> despite the bit in CPUID leaf 1 being set, which should be all Dom0
>> >> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH).
>> >>
>> >
>> > I made a mistake, that currently CPUID leaf 5 is already checked in
>> > check_cx in hypervisor, so it should be sane. However I still fail to catch
>> > your real concern here. :/
>> 
>> If Dom0 finds (real) CPUID leaf 1 report MWAIT to be available, it
>> will (with the logic outlined above) call _PDC without clearing
>> ACPI_PDC_C_C2C3_FFH. If now the break-on-interrupt extension
>> is not present, but the address space ID for C2 or C3 is set to FFH,
>> then Xen (in acpi_processor_ffh_cstate_probe()) will reject the
>> Cx entry (and hence refrain from using the respective C-state),
>> whereas if Dom0 cleared ACPI_PDC_C_C2C3_FFH in that case,
>> firmware would (normally) have converted the address space ID to
>> SYSTEM_IO, and hence Xen would have decided to use C2/C3 with
>> the SYSIO entry method.
>> 
>> So this is only acceptable if there are *no* production CPUs of any
>> vendor that would support MWAIT without the break-on-interrupt
>> extension.
>> 
> 
> yes, that's also the way that native Linux code currently uses:
> 	- notify BIOS ACPI_PDC_C_C2C3_FFH if cpu has mwait
> 	- reject Cx entry if break-on-interrupt extension is not present later
> in acpi processor driver when parsing Cx entries.
> 
> From BIOS ACPI p.o.v, OSPM can notify BIOS about FFH style if following
> conditions are true:
> 	a) cpu supports mwait
> 	b) OSPM itself supports mwait
> 
> a) is architectural, but b) is implementation specific regarding to what
> can be called "support". Obviously both Xen and Linux here use an
> inconsistent way between the place notifying BIOS and the point parsing
> ACPI Cx entry. So your conclusion is correct that C2/C3 would be rejected
> on the CPU which doesn't support MWAIT with break-on-interrupt 
> extension. But it should be fine in the real world, and we may consider
> whether to do something when a real case is encountered in the future.

Actually, I just checked, and I have two systems that have MWAIT but
no extensions to it. While one is an old SDV of yours, the other is a
production Dell system (which I only run Windows on, so I can't really
check how Xen would behave on it prior to and with the discussed
adjustment).

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-18 12:35 expose MWAIT to dom0 Jan Beulich
@ 2011-08-19  1:31 ` Tian, Kevin
  2011-08-19  8:10   ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Tian, Kevin @ 2011-08-19  1:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Wei, Gang

> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Thursday, August 18, 2011 8:36 PM
> 
> >>> On 16.08.11 at 10:29, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@novell.com]
> >> Sent: Tuesday, August 16, 2011 4:13 PM
> >>
> >> >>> On 16.08.11 at 08:53, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >> >>  From: Jan Beulich [mailto:JBeulich@novell.com]
> >> >> Sent: Tuesday, August 16, 2011 2:42 PM
> >> >>
> >> >> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >> >> >>  From: Jan Beulich [mailto:JBeulich@novell.com]
> >> >> >> Sent: Monday, August 15, 2011 6:29 PM
> >> >> >>
> >> >> >> that while improving the situation on CPUs that support the break-on-
> >> >> >> interrupt extension to mwait, it would result in C2/C3 not being usable
> >> >> >> at all on CPUs that don't (but support mwait in its simpler form and
> >> >> >> have ACPI tables specifying FFH as address space id). Is that only a
> >> >> >> theoretical concern (i.e. is there an implicit guarantee that for other
> >> >> >> than C1 FFH wouldn't be specified without that extension being
> >> >> >> available)? I thinks it's a practical one, or otherwise there wouldn't
> >> >> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC
> >> >> >> evaluation.
> >> >> >
> >> >> > Yes, this is a practical one, though I don't know any box doing that. In
> >> > all
> >> >> > the boxes I've been using so far, all the extensions are available. But
> >> >> > since
> >> >> > BIOS vendor may also impact the availability of CPUID bits, I think we
> >> >> > should do it right by strictly conforming to theSDM. I.e. we need check
> >> >> > CPUID leafs and then verify all Cx states propagated from dom0,
> instead
> >> >> > of blindly following its info. Will work a patch for that.
> >> >>
> >> >> You're getting it sort of wrong way round: What I don't want to do (but
> >> >> seemingly being necessary) is mimic the decision logic the hypervisor
> >> >> uses (i.e. require the break-on-interrupt extension for C2/C3 entering
> >> >> through MWAIT) in Dom0 when deciding about the bits to pass to
> >> >
> >> > break-on-interrupt is not a hard requirement to use MWAIT. Even when
> >> > that extension is not available, MWAIT can be still used to enter C2/C3,
> >> > just with interrupt enabled.
> >>
> >> And that's why this implementation detail should be confined to the
> >> hypervisor - Dom0 should not care about this if at all possible.
> >>
> >> >> _PDC. That ought to be an implementation detail (subject to change)
> >> >> in the hypervisor alone. The hypervisor itself, otoh, already properly
> >> >> checks CPUID leaf 5 (and that's what might cause it to not use mwait
> >> >> despite the bit in CPUID leaf 1 being set, which should be all Dom0
> >> >> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH).
> >> >>
> >> >
> >> > I made a mistake, that currently CPUID leaf 5 is already checked in
> >> > check_cx in hypervisor, so it should be sane. However I still fail to catch
> >> > your real concern here. :/
> >>
> >> If Dom0 finds (real) CPUID leaf 1 report MWAIT to be available, it
> >> will (with the logic outlined above) call _PDC without clearing
> >> ACPI_PDC_C_C2C3_FFH. If now the break-on-interrupt extension
> >> is not present, but the address space ID for C2 or C3 is set to FFH,
> >> then Xen (in acpi_processor_ffh_cstate_probe()) will reject the
> >> Cx entry (and hence refrain from using the respective C-state),
> >> whereas if Dom0 cleared ACPI_PDC_C_C2C3_FFH in that case,
> >> firmware would (normally) have converted the address space ID to
> >> SYSTEM_IO, and hence Xen would have decided to use C2/C3 with
> >> the SYSIO entry method.
> >>
> >> So this is only acceptable if there are *no* production CPUs of any
> >> vendor that would support MWAIT without the break-on-interrupt
> >> extension.
> >>
> >
> > yes, that's also the way that native Linux code currently uses:
> > 	- notify BIOS ACPI_PDC_C_C2C3_FFH if cpu has mwait
> > 	- reject Cx entry if break-on-interrupt extension is not present later
> > in acpi processor driver when parsing Cx entries.
> >
> > From BIOS ACPI p.o.v, OSPM can notify BIOS about FFH style if following
> > conditions are true:
> > 	a) cpu supports mwait
> > 	b) OSPM itself supports mwait
> >
> > a) is architectural, but b) is implementation specific regarding to what
> > can be called "support". Obviously both Xen and Linux here use an
> > inconsistent way between the place notifying BIOS and the point parsing
> > ACPI Cx entry. So your conclusion is correct that C2/C3 would be rejected
> > on the CPU which doesn't support MWAIT with break-on-interrupt
> > extension. But it should be fine in the real world, and we may consider
> > whether to do something when a real case is encountered in the future.
> 
> Actually, I just checked, and I have two systems that have MWAIT but
> no extensions to it. While one is an old SDV of yours, the other is a
> production Dell system (which I only run Windows on, so I can't really
> check how Xen would behave on it prior to and with the discussed
> adjustment).

OK, to avoid the regression, if it's really cared, then we may change Xen 
to support entering C-state by mwait with interrupt enabled.

But the next question is whether it's really worthy of enabling Xen PM 
such way:
	- I think native Linux only supports mwait with break-on-interrupt
extension too. You may confirm on such machines which I think should 
have no C2/C3 available. It's less likely for a customer to try PM on a 
platform where native Linux fails to do that
	- using mwait with interrupt enabled lacks the trace capability, 
while w/o trace I don't think any customer would enable Xen PM w/o a
verification process.

Another approach, if we really want to keep original I/O style, is to
reveal Xen's mwait related conditions in shared info page, say a simple
flag to indicate whether mwait bit should be set by pvops cpuid hook.
Xen will check mwait extension earlier before dom0 is launched, instead
of current point where dom0 registers cx info. This way there's no Xen
implementation detail encoded in dom0, while concerned regression 
could be removed.

You're OSV guys. So let's draw a conclusion from your expertise. If you
think such regression is important and should be avoided, and the 2nd
approach about share info is reasonable, I'll go for it. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-19  1:31 ` Tian, Kevin
@ 2011-08-19  8:10   ` Jan Beulich
  2011-08-19  8:34     ` Tian, Kevin
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2011-08-19  8:10 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 19.08.11 at 03:31, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> OK, to avoid the regression, if it's really cared, then we may change Xen 
> to support entering C-state by mwait with interrupt enabled.

I don't think that's worth it.

> But the next question is whether it's really worthy of enabling Xen PM 
> such way:
> 	- I think native Linux only supports mwait with break-on-interrupt
> extension too. You may confirm on such machines which I think should 
> have no C2/C3 available. It's less likely for a customer to try PM on a 
> platform where native Linux fails to do that

Looking at the code, I can't see why Linux wouldn't use the I/O method
in this case instead.

> 	- using mwait with interrupt enabled lacks the trace capability, 
> while w/o trace I don't think any customer would enable Xen PM w/o a
> verification process.
> 
> Another approach, if we really want to keep original I/O style, is to
> reveal Xen's mwait related conditions in shared info page, say a simple
> flag to indicate whether mwait bit should be set by pvops cpuid hook.
> Xen will check mwait extension earlier before dom0 is launched, instead
> of current point where dom0 registers cx info. This way there's no Xen
> implementation detail encoded in dom0, while concerned regression 
> could be removed.

The concept sounds reasonable, just that the shared info page probably
isn't the right mechanism (after all this is Dom0-only information that
we want to expose). A new platform sub-hypercall would probably be
the better route.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-19  8:10   ` Jan Beulich
@ 2011-08-19  8:34     ` Tian, Kevin
  2011-08-19  9:32       ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Tian, Kevin @ 2011-08-19  8:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Wei, Gang

> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Friday, August 19, 2011 4:11 PM
> 
> >>> On 19.08.11 at 03:31, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > OK, to avoid the regression, if it's really cared, then we may change Xen
> > to support entering C-state by mwait with interrupt enabled.
> 
> I don't think that's worth it.
> 
> > But the next question is whether it's really worthy of enabling Xen PM
> > such way:
> > 	- I think native Linux only supports mwait with break-on-interrupt
> > extension too. You may confirm on such machines which I think should
> > have no C2/C3 available. It's less likely for a customer to try PM on a
> > platform where native Linux fails to do that
> 
> Looking at the code, I can't see why Linux wouldn't use the I/O method
> in this case instead.

acpi_processor_ffh_cstate_probe_cpu:
        /* mwait ecx extensions INTERRUPT_BREAK should be supported for C2/C3 */
        if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
            !(ecx & CPUID5_ECX_INTERRUPT_BREAK)) {
                retval = -1;
                goto out;
        }

arch_acpi_set_pdc_bits:
        /*
         * If mwait/monitor is unsupported, C2/C3_FFH will be disabled
         */
        if (!cpu_has(c, X86_FEATURE_MWAIT))
                buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);

> 
> > 	- using mwait with interrupt enabled lacks the trace capability,
> > while w/o trace I don't think any customer would enable Xen PM w/o a
> > verification process.
> >
> > Another approach, if we really want to keep original I/O style, is to
> > reveal Xen's mwait related conditions in shared info page, say a simple
> > flag to indicate whether mwait bit should be set by pvops cpuid hook.
> > Xen will check mwait extension earlier before dom0 is launched, instead
> > of current point where dom0 registers cx info. This way there's no Xen
> > implementation detail encoded in dom0, while concerned regression
> > could be removed.
> 
> The concept sounds reasonable, just that the shared info page probably
> isn't the right mechanism (after all this is Dom0-only information that
> we want to expose). A new platform sub-hypercall would probably be
> the better route.
> 

yes, that's a better choice.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-19  8:34     ` Tian, Kevin
@ 2011-08-19  9:32       ` Jan Beulich
  2011-08-19  9:39         ` Tian, Kevin
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2011-08-19  9:32 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 19.08.11 at 10:34, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> Sent: Friday, August 19, 2011 4:11 PM
>> 
>> >>> On 19.08.11 at 03:31, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> > OK, to avoid the regression, if it's really cared, then we may change Xen
>> > to support entering C-state by mwait with interrupt enabled.
>> 
>> I don't think that's worth it.
>> 
>> > But the next question is whether it's really worthy of enabling Xen PM
>> > such way:
>> > 	- I think native Linux only supports mwait with break-on-interrupt
>> > extension too. You may confirm on such machines which I think should
>> > have no C2/C3 available. It's less likely for a customer to try PM on a
>> > platform where native Linux fails to do that
>> 
>> Looking at the code, I can't see why Linux wouldn't use the I/O method
>> in this case instead.
> 
> acpi_processor_ffh_cstate_probe_cpu:
>         /* mwait ecx extensions INTERRUPT_BREAK should be supported for 
> C2/C3 */
>         if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
>             !(ecx & CPUID5_ECX_INTERRUPT_BREAK)) {
>                 retval = -1;
>                 goto out;
>         }
> 
> arch_acpi_set_pdc_bits:
>         /*
>          * If mwait/monitor is unsupported, C2/C3_FFH will be disabled
>          */
>         if (!cpu_has(c, X86_FEATURE_MWAIT))
>                 buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
> 

Right, this precludes the use of MWAIT, but it doesn't preclude using
the I/O method.

>> 
>> > 	- using mwait with interrupt enabled lacks the trace capability,
>> > while w/o trace I don't think any customer would enable Xen PM w/o a
>> > verification process.
>> >
>> > Another approach, if we really want to keep original I/O style, is to
>> > reveal Xen's mwait related conditions in shared info page, say a simple
>> > flag to indicate whether mwait bit should be set by pvops cpuid hook.
>> > Xen will check mwait extension earlier before dom0 is launched, instead
>> > of current point where dom0 registers cx info. This way there's no Xen
>> > implementation detail encoded in dom0, while concerned regression
>> > could be removed.
>> 
>> The concept sounds reasonable, just that the shared info page probably
>> isn't the right mechanism (after all this is Dom0-only information that
>> we want to expose). A new platform sub-hypercall would probably be
>> the better route.
>> 
> 
> yes, that's a better choice.

Yet another idea - why don't we simply pass the buffer passed to
arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the
bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH
(which I don't think Xen really supports at present).

Or really, depending on who controls what, the P, C, and T bits should
be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently
does, and then let Xen override the bits it ought to control).

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-19  9:32       ` Jan Beulich
@ 2011-08-19  9:39         ` Tian, Kevin
  2011-08-19 12:55           ` Jan Beulich
  2011-08-19 15:01           ` Jan Beulich
  0 siblings, 2 replies; 36+ messages in thread
From: Tian, Kevin @ 2011-08-19  9:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Wei, Gang

> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Friday, August 19, 2011 5:32 PM
> 
> >>> On 19.08.11 at 10:34, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@novell.com]
> >> Sent: Friday, August 19, 2011 4:11 PM
> >>
> >> >>> On 19.08.11 at 03:31, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >> > OK, to avoid the regression, if it's really cared, then we may change Xen
> >> > to support entering C-state by mwait with interrupt enabled.
> >>
> >> I don't think that's worth it.
> >>
> >> > But the next question is whether it's really worthy of enabling Xen PM
> >> > such way:
> >> > 	- I think native Linux only supports mwait with break-on-interrupt
> >> > extension too. You may confirm on such machines which I think should
> >> > have no C2/C3 available. It's less likely for a customer to try PM on a
> >> > platform where native Linux fails to do that
> >>
> >> Looking at the code, I can't see why Linux wouldn't use the I/O method
> >> in this case instead.
> >
> > acpi_processor_ffh_cstate_probe_cpu:
> >         /* mwait ecx extensions INTERRUPT_BREAK should be supported
> for
> > C2/C3 */
> >         if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
> >             !(ecx & CPUID5_ECX_INTERRUPT_BREAK)) {
> >                 retval = -1;
> >                 goto out;
> >         }
> >
> > arch_acpi_set_pdc_bits:
> >         /*
> >          * If mwait/monitor is unsupported, C2/C3_FFH will be disabled
> >          */
> >         if (!cpu_has(c, X86_FEATURE_MWAIT))
> >                 buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
> >
> 
> Right, this precludes the use of MWAIT, but it doesn't preclude using
> the I/O method.

In your special machine which has mwait but no break-on-interrupt extension:

arch_acpi_set_pdc_bits tells BIOS that mwait should be used.

BIOS then report _CST table with each entry filled with mwait style info

later when Linux verifies _CST entry by acpi_processor_ffh_cstate_probe_cpu,
it simply fails. No fallback to i/o method as BIOS is only notified in ACPI init time.

> 
> >>
> >> > 	- using mwait with interrupt enabled lacks the trace capability,
> >> > while w/o trace I don't think any customer would enable Xen PM w/o a
> >> > verification process.
> >> >
> >> > Another approach, if we really want to keep original I/O style, is to
> >> > reveal Xen's mwait related conditions in shared info page, say a simple
> >> > flag to indicate whether mwait bit should be set by pvops cpuid hook.
> >> > Xen will check mwait extension earlier before dom0 is launched, instead
> >> > of current point where dom0 registers cx info. This way there's no Xen
> >> > implementation detail encoded in dom0, while concerned regression
> >> > could be removed.
> >>
> >> The concept sounds reasonable, just that the shared info page probably
> >> isn't the right mechanism (after all this is Dom0-only information that
> >> we want to expose). A new platform sub-hypercall would probably be
> >> the better route.
> >>
> >
> > yes, that's a better choice.
> 
> Yet another idea - why don't we simply pass the buffer passed to
> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the
> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH
> (which I don't think Xen really supports at present).
> 
> Or really, depending on who controls what, the P, C, and T bits should
> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently
> does, and then let Xen override the bits it ought to control).

_PDC is encoded in AML language, and requires an ACPI parser which
is one thing we avoid in Xen. If Xen want to override those bits, then
whole ACPI component needs move down to Xen too.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-19  9:39         ` Tian, Kevin
@ 2011-08-19 12:55           ` Jan Beulich
  2011-08-19 15:01           ` Jan Beulich
  1 sibling, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2011-08-19 12:55 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 19.08.11 at 11:39, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> Sent: Friday, August 19, 2011 5:32 PM
>> 
>> >>> On 19.08.11 at 10:34, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> >>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> >> Sent: Friday, August 19, 2011 4:11 PM
>> >>
>> >> >>> On 19.08.11 at 03:31, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> >> > OK, to avoid the regression, if it's really cared, then we may change Xen
>> >> > to support entering C-state by mwait with interrupt enabled.
>> >>
>> >> I don't think that's worth it.
>> >>
>> >> > But the next question is whether it's really worthy of enabling Xen PM
>> >> > such way:
>> >> > 	- I think native Linux only supports mwait with break-on-interrupt
>> >> > extension too. You may confirm on such machines which I think should
>> >> > have no C2/C3 available. It's less likely for a customer to try PM on a
>> >> > platform where native Linux fails to do that
>> >>
>> >> Looking at the code, I can't see why Linux wouldn't use the I/O method
>> >> in this case instead.
>> >
>> > acpi_processor_ffh_cstate_probe_cpu:
>> >         /* mwait ecx extensions INTERRUPT_BREAK should be supported
>> for
>> > C2/C3 */
>> >         if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
>> >             !(ecx & CPUID5_ECX_INTERRUPT_BREAK)) {
>> >                 retval = -1;
>> >                 goto out;
>> >         }
>> >
>> > arch_acpi_set_pdc_bits:
>> >         /*
>> >          * If mwait/monitor is unsupported, C2/C3_FFH will be disabled
>> >          */
>> >         if (!cpu_has(c, X86_FEATURE_MWAIT))
>> >                 buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
>> >
>> 
>> Right, this precludes the use of MWAIT, but it doesn't preclude using
>> the I/O method.
> 
> In your special machine which has mwait but no break-on-interrupt extension:
> 
> arch_acpi_set_pdc_bits tells BIOS that mwait should be used.
> 
> BIOS then report _CST table with each entry filled with mwait style info
> 
> later when Linux verifies _CST entry by acpi_processor_ffh_cstate_probe_cpu,
> it simply fails. No fallback to i/o method as BIOS is only notified in ACPI 
> init time.

Oh, right, I just stumbled across that too (and didn't pay close enough
attention before). The function really should be checking for the
extension bits.

>> 
>> >>
>> >> > 	- using mwait with interrupt enabled lacks the trace capability,
>> >> > while w/o trace I don't think any customer would enable Xen PM w/o a
>> >> > verification process.
>> >> >
>> >> > Another approach, if we really want to keep original I/O style, is to
>> >> > reveal Xen's mwait related conditions in shared info page, say a simple
>> >> > flag to indicate whether mwait bit should be set by pvops cpuid hook.
>> >> > Xen will check mwait extension earlier before dom0 is launched, instead
>> >> > of current point where dom0 registers cx info. This way there's no Xen
>> >> > implementation detail encoded in dom0, while concerned regression
>> >> > could be removed.
>> >>
>> >> The concept sounds reasonable, just that the shared info page probably
>> >> isn't the right mechanism (after all this is Dom0-only information that
>> >> we want to expose). A new platform sub-hypercall would probably be
>> >> the better route.
>> >>
>> >
>> > yes, that's a better choice.
>> 
>> Yet another idea - why don't we simply pass the buffer passed to
>> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the
>> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH
>> (which I don't think Xen really supports at present).
>> 
>> Or really, depending on who controls what, the P, C, and T bits should
>> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently
>> does, and then let Xen override the bits it ought to control).
> 
> _PDC is encoded in AML language, and requires an ACPI parser which
> is one thing we avoid in Xen. If Xen want to override those bits, then
> whole ACPI component needs move down to Xen too.

No, I'm not saying the evaluation should be happening there. Below is
a draft hypervisor patch (only compile tested so far).

Jan

--- a/xen/arch/ia64/linux-xen/acpi.c
+++ b/xen/arch/ia64/linux-xen/acpi.c
@@ -241,6 +241,12 @@ int get_cpu_id(u32 acpi_id)
 
 	return -1;
 }
+
+int arch_acpi_set_pdc_bits(u32 *pdc, u32 mask)
+{
+	pdc[2] |= ACPI_PDC_EST_CAPABILITY_SMP & mask;
+}
+
 #endif
 
 static int __init
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -643,12 +643,6 @@ static int cpuidle_init_cpu(int cpu)
     return 0;
 }
 
-#define CPUID_MWAIT_LEAF (5)
-#define CPUID5_ECX_EXTENSIONS_SUPPORTED (0x1)
-#define CPUID5_ECX_INTERRUPT_BREAK      (0x2)
-
-#define MWAIT_ECX_INTERRUPT_BREAK       (0x1)
-
 #define MWAIT_SUBSTATE_MASK (0xf)
 #define MWAIT_SUBSTATE_SIZE (4)
 
--- a/xen/arch/x86/acpi/lib.c
+++ b/xen/arch/x86/acpi/lib.c
@@ -81,3 +81,31 @@ unsigned int acpi_get_processor_id(unsig
 
 	return INVALID_ACPIID;
 }
+
+int arch_acpi_set_pdc_bits(u32 *pdc, u32 mask)
+{
+	u32 ecx;
+
+	pdc[2] |= ACPI_PDC_C_CAPABILITY_SMP & mask;
+
+	if (boot_cpu_has(X86_FEATURE_EST))
+		pdc[2] |= ACPI_PDC_EST_CAPABILITY_SWSMP & mask;
+
+	if (boot_cpu_has(X86_FEATURE_ACPI))
+		pdc[2] |= ACPI_PDC_T_FFH & mask;
+
+	/*
+	 * If mwait/monitor or its break-on-interrupt extension are
+	 * unsupported, Cx_FFH will be disabled.
+	 */
+	if (boot_cpu_has(X86_FEATURE_MWAIT) &&
+	    boot_cpu_data.cpuid_level >= CPUID_MWAIT_LEAF)
+		ecx = cpuid_ecx(CPUID_MWAIT_LEAF);
+	else
+		ecx = 0;
+	if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
+	    !(ecx & CPUID5_ECX_INTERRUPT_BREAK))
+		pdc[2] &= ~(ACPI_PDC_C_C1_FFH | ACPI_PDC_C_C2C3_FFH);
+
+	return 0;
+}
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -416,6 +416,18 @@ ret_t do_platform_op(XEN_GUEST_HANDLE(xe
             ret = -EINVAL;
             break;
 
+        case XEN_PM_PDC:
+            if ( op->u.set_pminfo.id + 1 )
+                ret = -EINVAL;
+            else
+            {
+                XEN_GUEST_HANDLE(uint32) pdc;
+
+                guest_from_compat_handle(pdc, op->u.set_pminfo.u.pdc);
+                ret = acpi_set_pdc_bits(pdc);
+            }
+            break;
+
         default:
             ret = -EINVAL;
             break;
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -517,3 +517,37 @@ int do_pm_op(struct xen_sysctl_pm_op *op
 
     return ret;
 }
+
+int acpi_set_pdc_bits(XEN_GUEST_HANDLE(uint32) pdc)
+{
+    u32 bits[3];
+    int ret;
+
+    if ( copy_from_guest(bits, pdc, 2) )
+        ret = -EFAULT;
+    else if ( bits[0] != ACPI_PDC_REVISION_ID || !bits[1] )
+        ret = -EINVAL;
+    else if ( copy_from_guest_offset(bits + 2, pdc, 2, 1) )
+        ret = -EFAULT;
+    else
+    {
+        u32 mask = 0;
+
+        if ( xen_processor_pmbits & XEN_PROCESSOR_PM_CX )
+            mask |= ACPI_PDC_C_BITS | ACPI_PDC_SMP_C1PT;
+        if ( xen_processor_pmbits & XEN_PROCESSOR_PM_PX )
+            mask |= ACPI_PDC_P_BITS | ACPI_PDC_SMP_C1PT;
+        if ( xen_processor_pmbits & XEN_PROCESSOR_PM_TX )
+            mask |= ACPI_PDC_T_BITS | ACPI_PDC_SMP_C1PT;
+printk("PDC: %04x (mask %04x)", bits[2], mask);//temp
+        bits[2] &= ACPI_PDC_C_BITS & ACPI_PDC_P_BITS & ACPI_PDC_T_BITS &
+                   ACPI_PDC_SMP_C1PT & ~mask;
+printk(" -> %04x", bits[2]);//temp
+        ret = arch_acpi_set_pdc_bits(bits, mask);
+printk(" -> %04x (%d)\n", bits[2], ret);//temp
+    }
+    if ( !ret )
+        ret = copy_to_guest_offset(pdc, 2, bits + 2, 1);
+
+    return ret;
+}
--- a/xen/include/acpi/pdc_intel.h
+++ b/xen/include/acpi/pdc_intel.h
@@ -4,6 +4,8 @@
 #ifndef __PDC_INTEL_H__
 #define __PDC_INTEL_H__
 
+#define ACPI_PDC_REVISION_ID		1
+
 #define ACPI_PDC_P_FFH			(0x0001)
 #define ACPI_PDC_C_C1_HALT		(0x0002)
 #define ACPI_PDC_T_FFH			(0x0004)
@@ -14,6 +16,7 @@
 #define ACPI_PDC_SMP_T_SWCOORD		(0x0080)
 #define ACPI_PDC_C_C1_FFH		(0x0100)
 #define ACPI_PDC_C_C2C3_FFH		(0x0200)
+#define ACPI_PDC_SMP_P_HWCOORD		(0x0800)
 
 #define ACPI_PDC_EST_CAPABILITY_SMP	(ACPI_PDC_SMP_C1PT | \
 					 ACPI_PDC_C_C1_HALT | \
@@ -22,6 +25,7 @@
 #define ACPI_PDC_EST_CAPABILITY_SWSMP	(ACPI_PDC_SMP_C1PT | \
 					 ACPI_PDC_C_C1_HALT | \
 					 ACPI_PDC_SMP_P_SWCOORD | \
+					 ACPI_PDC_SMP_P_HWCOORD | \
 					 ACPI_PDC_P_FFH)
 
 #define ACPI_PDC_C_CAPABILITY_SMP	(ACPI_PDC_SMP_C2C3  | \
@@ -30,4 +34,17 @@
 					 ACPI_PDC_C_C1_FFH  | \
 					 ACPI_PDC_C_C2C3_FFH)
 
+#define ACPI_PDC_C_BITS			(ACPI_PDC_C_C1_HALT | \
+					 ACPI_PDC_C_C1_FFH | \
+					 ACPI_PDC_SMP_C2C3 | \
+					 ACPI_PDC_SMP_C_SWCOORD | \
+					 ACPI_PDC_C_C2C3_FFH)
+
+#define ACPI_PDC_P_BITS			(ACPI_PDC_P_FFH | \
+					 ACPI_PDC_SMP_P_SWCOORD | \
+					 ACPI_PDC_SMP_P_HWCOORD)
+
+#define ACPI_PDC_T_BITS			(ACPI_PDC_T_FFH | \
+					 ACPI_PDC_SMP_T_SWCOORD)
+
 #endif				/* __PDC_INTEL_H__ */
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -152,6 +152,10 @@
 #define boot_cpu_has(bit)	test_bit(bit, boot_cpu_data.x86_capability)
 #define cpufeat_mask(idx)       (1u << ((idx) & 31))
 
+#define CPUID_MWAIT_LEAF                5
+#define CPUID5_ECX_EXTENSIONS_SUPPORTED 0x1
+#define CPUID5_ECX_INTERRUPT_BREAK      0x2
+
 #ifdef __i386__
 #define cpu_has_vme		boot_cpu_has(X86_FEATURE_VME)
 #define cpu_has_de		boot_cpu_has(X86_FEATURE_DE)
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -304,6 +304,7 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_getidletim
 #define XEN_PM_CX   0
 #define XEN_PM_PX   1
 #define XEN_PM_TX   2
+#define XEN_PM_PDC  3
 
 /* Px sub info type */
 #define XEN_PX_PCT   1
@@ -401,6 +402,7 @@ struct xenpf_set_processor_pminfo {
     union {
         struct xen_processor_power          power;/* Cx: _CST/_CSD */
         struct xen_processor_performance    perf; /* Px: _PPC/_PCT/_PSS/_PSD */
+        XEN_GUEST_HANDLE(uint32)            pdc;  /* _PDC */
     } u;
 };
 typedef struct xenpf_set_processor_pminfo xenpf_set_processor_pminfo_t;
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -358,6 +358,9 @@ static inline unsigned int acpi_get_csta
 static inline void acpi_set_cstate_limit(unsigned int new_limit) { return; }
 #endif
 
+int acpi_set_pdc_bits(XEN_GUEST_HANDLE(uint32));
+int arch_acpi_set_pdc_bits(u32 *, u32 mask);
+
 #ifdef CONFIG_ACPI_NUMA
 int acpi_get_pxm(acpi_handle handle);
 #else

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-19  9:39         ` Tian, Kevin
  2011-08-19 12:55           ` Jan Beulich
@ 2011-08-19 15:01           ` Jan Beulich
  2011-08-21  5:26             ` Tian, Kevin
  1 sibling, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2011-08-19 15:01 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

[-- Attachment #1: Type: text/plain, Size: 4572 bytes --]

>>> On 19.08.11 at 14:55, Jan Beulich wrote:
> >>> On 19.08.11 at 11:39, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@novell.com] 
> >> Sent: Friday, August 19, 2011 5:32 PM
> >> 
> >> >>> On 19.08.11 at 10:34, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >> >>  From: Jan Beulich [mailto:JBeulich@novell.com] 
> >> >> Sent: Friday, August 19, 2011 4:11 PM
> >> >>
> >> >> >>> On 19.08.11 at 03:31, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >> >> > OK, to avoid the regression, if it's really cared, then we may change Xen
> >> >> > to support entering C-state by mwait with interrupt enabled.
> >> >>
> >> >> I don't think that's worth it.
> >> >>
> >> >> > But the next question is whether it's really worthy of enabling Xen PM
> >> >> > such way:
> >> >> > 	- I think native Linux only supports mwait with break-on-interrupt
> >> >> > extension too. You may confirm on such machines which I think should
> >> >> > have no C2/C3 available. It's less likely for a customer to try PM on a
> >> >> > platform where native Linux fails to do that
> >> >>
> >> >> Looking at the code, I can't see why Linux wouldn't use the I/O method
> >> >> in this case instead.
> >> >
> >> > acpi_processor_ffh_cstate_probe_cpu:
> >> >         /* mwait ecx extensions INTERRUPT_BREAK should be supported
> >> for
> >> > C2/C3 */
> >> >         if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
> >> >             !(ecx & CPUID5_ECX_INTERRUPT_BREAK)) {
> >> >                 retval = -1;
> >> >                 goto out;
> >> >         }
> >> >
> >> > arch_acpi_set_pdc_bits:
> >> >         /*
> >> >          * If mwait/monitor is unsupported, C2/C3_FFH will be disabled
> >> >          */
> >> >         if (!cpu_has(c, X86_FEATURE_MWAIT))
> >> >                 buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
> >> >
> >> 
> >> Right, this precludes the use of MWAIT, but it doesn't preclude using
> >> the I/O method.
> > 
> > In your special machine which has mwait but no break-on-interrupt extension:
> > 
> > arch_acpi_set_pdc_bits tells BIOS that mwait should be used.
> > 
> > BIOS then report _CST table with each entry filled with mwait style info
> > 
> > later when Linux verifies _CST entry by acpi_processor_ffh_cstate_probe_cpu,
> > it simply fails. No fallback to i/o method as BIOS is only notified in ACPI 
> > init time.
> 
> Oh, right, I just stumbled across that too (and didn't pay close enough
> attention before). The function really should be checking for the
> extension bits.
> 
> >> 
> >> >>
> >> >> > 	- using mwait with interrupt enabled lacks the trace capability,
> >> >> > while w/o trace I don't think any customer would enable Xen PM w/o a
> >> >> > verification process.
> >> >> >
> >> >> > Another approach, if we really want to keep original I/O style, is to
> >> >> > reveal Xen's mwait related conditions in shared info page, say a simple
> >> >> > flag to indicate whether mwait bit should be set by pvops cpuid hook.
> >> >> > Xen will check mwait extension earlier before dom0 is launched, instead
> >> >> > of current point where dom0 registers cx info. This way there's no Xen
> >> >> > implementation detail encoded in dom0, while concerned regression
> >> >> > could be removed.
> >> >>
> >> >> The concept sounds reasonable, just that the shared info page probably
> >> >> isn't the right mechanism (after all this is Dom0-only information that
> >> >> we want to expose). A new platform sub-hypercall would probably be
> >> >> the better route.
> >> >>
> >> >
> >> > yes, that's a better choice.
> >> 
> >> Yet another idea - why don't we simply pass the buffer passed to
> >> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the
> >> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH
> >> (which I don't think Xen really supports at present).
> >> 
> >> Or really, depending on who controls what, the P, C, and T bits should
> >> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently
> >> does, and then let Xen override the bits it ought to control).
> > 
> > _PDC is encoded in AML language, and requires an ACPI parser which
> > is one thing we avoid in Xen. If Xen want to override those bits, then
> > whole ACPI component needs move down to Xen too.
> 
> No, I'm not saying the evaluation should be happening there. Below is
> a draft hypervisor patch (only compile tested so far).

Attached a patch that actually works (with a minimal Dom0 addition).

Jan

[-- Attachment #2: acpi-set-pdc-bits.patch --]
[-- Type: text/plain, Size: 7380 bytes --]

--- a/xen/arch/ia64/linux-xen/acpi.c
+++ b/xen/arch/ia64/linux-xen/acpi.c
@@ -241,6 +241,13 @@ int get_cpu_id(u8 acpi_id)
 
 	return -1;
 }
+
+int arch_acpi_set_pdc_bits(u32 acpi_id, u32 *pdc, u32 mask)
+{
+	pdc[2] |= ACPI_PDC_EST_CAPABILITY_SMP & mask;
+	return 0;
+}
+
 #endif
 
 static int __init
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -643,12 +643,6 @@ static int cpuidle_init_cpu(int cpu)
     return 0;
 }
 
-#define CPUID_MWAIT_LEAF (5)
-#define CPUID5_ECX_EXTENSIONS_SUPPORTED (0x1)
-#define CPUID5_ECX_INTERRUPT_BREAK      (0x2)
-
-#define MWAIT_ECX_INTERRUPT_BREAK       (0x1)
-
 #define MWAIT_SUBSTATE_MASK (0xf)
 #define MWAIT_SUBSTATE_SIZE (4)
 
--- a/xen/arch/x86/acpi/lib.c
+++ b/xen/arch/x86/acpi/lib.c
@@ -81,3 +81,47 @@ unsigned int acpi_get_processor_id(unsig
 
 	return INVALID_ACPIID;
 }
+
+static void get_mwait_ecx(void *info)
+{
+	*(u32 *)info = cpuid_ecx(CPUID_MWAIT_LEAF);
+}
+
+int arch_acpi_set_pdc_bits(u32 acpi_id, u32 *pdc, u32 mask)
+{
+	unsigned int cpu = get_cpu_id(acpi_id);
+	struct cpuinfo_x86 *c;
+	u32 ecx;
+
+	if (!(acpi_id + 1))
+		c = &boot_cpu_data;
+	else if (cpu >= NR_CPUS || !cpu_online(cpu))
+		return -EINVAL;
+	else
+		c = cpu_data + cpu;
+
+	pdc[2] |= ACPI_PDC_C_CAPABILITY_SMP & mask;
+
+	if (cpu_has(c, X86_FEATURE_EST))
+		pdc[2] |= ACPI_PDC_EST_CAPABILITY_SWSMP & mask;
+
+	if (cpu_has(c, X86_FEATURE_ACPI))
+		pdc[2] |= ACPI_PDC_T_FFH & mask;
+
+	/*
+	 * If mwait/monitor or its break-on-interrupt extension are
+	 * unsupported, Cx_FFH will be disabled.
+	 */
+	if (!cpu_has(c, X86_FEATURE_MWAIT) ||
+	    c->cpuid_level < CPUID_MWAIT_LEAF)
+		ecx = 0;
+	else if (c == &boot_cpu_data || cpu == smp_processor_id())
+		ecx = cpuid_ecx(CPUID_MWAIT_LEAF);
+	else
+		on_selected_cpus(cpumask_of(cpu), get_mwait_ecx, &ecx, 1);
+	if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
+	    !(ecx & CPUID5_ECX_INTERRUPT_BREAK))
+		pdc[2] &= ~(ACPI_PDC_C_C1_FFH | ACPI_PDC_C_C2C3_FFH);
+
+	return 0;
+}
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -416,6 +416,15 @@ ret_t do_platform_op(XEN_GUEST_HANDLE(xe
             ret = -EINVAL;
             break;
 
+        case XEN_PM_PDC:
+        {
+            XEN_GUEST_HANDLE(uint32) pdc;
+
+            guest_from_compat_handle(pdc, op->u.set_pminfo.u.pdc);
+            ret = acpi_set_pdc_bits(op->u.set_pminfo.id, pdc);
+        }
+        break;
+
         default:
             ret = -EINVAL;
             break;
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -517,3 +517,34 @@ int do_pm_op(struct xen_sysctl_pm_op *op
 
     return ret;
 }
+
+int acpi_set_pdc_bits(u32 acpi_id, XEN_GUEST_HANDLE(uint32) pdc)
+{
+    u32 bits[3];
+    int ret;
+
+    if ( copy_from_guest(bits, pdc, 2) )
+        ret = -EFAULT;
+    else if ( bits[0] != ACPI_PDC_REVISION_ID || !bits[1] )
+        ret = -EINVAL;
+    else if ( copy_from_guest_offset(bits + 2, pdc, 2, 1) )
+        ret = -EFAULT;
+    else
+    {
+        u32 mask = 0;
+
+        if ( xen_processor_pmbits & XEN_PROCESSOR_PM_CX )
+            mask |= ACPI_PDC_C_MASK | ACPI_PDC_SMP_C1PT;
+        if ( xen_processor_pmbits & XEN_PROCESSOR_PM_PX )
+            mask |= ACPI_PDC_P_MASK | ACPI_PDC_SMP_C1PT;
+        if ( xen_processor_pmbits & XEN_PROCESSOR_PM_TX )
+            mask |= ACPI_PDC_T_MASK | ACPI_PDC_SMP_C1PT;
+        bits[2] &= (ACPI_PDC_C_MASK | ACPI_PDC_P_MASK | ACPI_PDC_T_MASK |
+                    ACPI_PDC_SMP_C1PT) & ~mask;
+        ret = arch_acpi_set_pdc_bits(acpi_id, bits, mask);
+    }
+    if ( !ret )
+        ret = copy_to_guest_offset(pdc, 2, bits + 2, 1);
+
+    return ret;
+}
--- a/xen/include/acpi/cpufreq/processor_perf.h
+++ b/xen/include/acpi/cpufreq/processor_perf.h
@@ -3,10 +3,10 @@
 
 #include <public/platform.h>
 #include <public/sysctl.h>
+#include <xen/acpi.h>
 
 #define XEN_PX_INIT 0x80000000
 
-int get_cpu_id(u8);
 int powernow_cpufreq_init(void);
 unsigned int powernow_register_driver(void);
 unsigned int get_measured_perf(unsigned int cpu, unsigned int flag);
--- a/xen/include/acpi/pdc_intel.h
+++ b/xen/include/acpi/pdc_intel.h
@@ -4,6 +4,8 @@
 #ifndef __PDC_INTEL_H__
 #define __PDC_INTEL_H__
 
+#define ACPI_PDC_REVISION_ID		1
+
 #define ACPI_PDC_P_FFH			(0x0001)
 #define ACPI_PDC_C_C1_HALT		(0x0002)
 #define ACPI_PDC_T_FFH			(0x0004)
@@ -14,6 +16,7 @@
 #define ACPI_PDC_SMP_T_SWCOORD		(0x0080)
 #define ACPI_PDC_C_C1_FFH		(0x0100)
 #define ACPI_PDC_C_C2C3_FFH		(0x0200)
+#define ACPI_PDC_SMP_P_HWCOORD		(0x0800)
 
 #define ACPI_PDC_EST_CAPABILITY_SMP	(ACPI_PDC_SMP_C1PT | \
 					 ACPI_PDC_C_C1_HALT | \
@@ -22,6 +25,7 @@
 #define ACPI_PDC_EST_CAPABILITY_SWSMP	(ACPI_PDC_SMP_C1PT | \
 					 ACPI_PDC_C_C1_HALT | \
 					 ACPI_PDC_SMP_P_SWCOORD | \
+					 ACPI_PDC_SMP_P_HWCOORD | \
 					 ACPI_PDC_P_FFH)
 
 #define ACPI_PDC_C_CAPABILITY_SMP	(ACPI_PDC_SMP_C2C3  | \
@@ -30,4 +34,17 @@
 					 ACPI_PDC_C_C1_FFH  | \
 					 ACPI_PDC_C_C2C3_FFH)
 
+#define ACPI_PDC_C_MASK			(ACPI_PDC_C_C1_HALT | \
+					 ACPI_PDC_C_C1_FFH | \
+					 ACPI_PDC_SMP_C2C3 | \
+					 ACPI_PDC_SMP_C_SWCOORD | \
+					 ACPI_PDC_C_C2C3_FFH)
+
+#define ACPI_PDC_P_MASK			(ACPI_PDC_P_FFH | \
+					 ACPI_PDC_SMP_P_SWCOORD | \
+					 ACPI_PDC_SMP_P_HWCOORD)
+
+#define ACPI_PDC_T_MASK			(ACPI_PDC_T_FFH | \
+					 ACPI_PDC_SMP_T_SWCOORD)
+
 #endif				/* __PDC_INTEL_H__ */
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -152,6 +152,10 @@
 #define boot_cpu_has(bit)	test_bit(bit, boot_cpu_data.x86_capability)
 #define cpufeat_mask(idx)       (1u << ((idx) & 31))
 
+#define CPUID_MWAIT_LEAF                5
+#define CPUID5_ECX_EXTENSIONS_SUPPORTED 0x1
+#define CPUID5_ECX_INTERRUPT_BREAK      0x2
+
 #ifdef __i386__
 #define cpu_has_vme		boot_cpu_has(X86_FEATURE_VME)
 #define cpu_has_de		boot_cpu_has(X86_FEATURE_DE)
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -304,6 +304,7 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_getidletim
 #define XEN_PM_CX   0
 #define XEN_PM_PX   1
 #define XEN_PM_TX   2
+#define XEN_PM_PDC  3
 
 /* Px sub info type */
 #define XEN_PX_PCT   1
@@ -401,6 +402,7 @@ struct xenpf_set_processor_pminfo {
     union {
         struct xen_processor_power          power;/* Cx: _CST/_CSD */
         struct xen_processor_performance    perf; /* Px: _PPC/_PCT/_PSS/_PSD */
+        XEN_GUEST_HANDLE(uint32)            pdc;  /* _PDC */
     } u;
 };
 typedef struct xenpf_set_processor_pminfo xenpf_set_processor_pminfo_t;
--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -322,6 +322,8 @@ static inline int acpi_boot_table_init(v
 
 #endif 	/*!CONFIG_ACPI_BOOT*/
 
+int get_cpu_id(u8 acpi_id);
+
 unsigned int acpi_register_gsi (u32 gsi, int edge_level, int active_high_low);
 int acpi_gsi_to_irq (u32 gsi, unsigned int *irq);
 
@@ -358,6 +360,9 @@ static inline unsigned int acpi_get_csta
 static inline void acpi_set_cstate_limit(unsigned int new_limit) { return; }
 #endif
 
+int acpi_set_pdc_bits(u32 acpi_id, XEN_GUEST_HANDLE(uint32));
+int arch_acpi_set_pdc_bits(u32 acpi_id, u32 *, u32 mask);
+
 #ifdef CONFIG_ACPI_NUMA
 int acpi_get_pxm(acpi_handle handle);
 #else

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-19 15:01           ` Jan Beulich
@ 2011-08-21  5:26             ` Tian, Kevin
  2011-08-25 12:37               ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Tian, Kevin @ 2011-08-21  5:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Wei, Gang

> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Friday, August 19, 2011 11:02 PM
> > >> Yet another idea - why don't we simply pass the buffer passed to
> > >> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the
> > >> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH
> > >> (which I don't think Xen really supports at present).
> > >>
> > >> Or really, depending on who controls what, the P, C, and T bits should
> > >> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently
> > >> does, and then let Xen override the bits it ought to control).
> > >
> > > _PDC is encoded in AML language, and requires an ACPI parser which
> > > is one thing we avoid in Xen. If Xen want to override those bits, then
> > > whole ACPI component needs move down to Xen too.
> >
> > No, I'm not saying the evaluation should be happening there. Below is
> > a draft hypervisor patch (only compile tested so far).
> 
> Attached a patch that actually works (with a minimal Dom0 addition).
> 

yes, this change looks more straightforward. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-21  5:26             ` Tian, Kevin
@ 2011-08-25 12:37               ` Jan Beulich
  2011-08-26  2:18                 ` Tian, Kevin
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2011-08-25 12:37 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 21.08.11 at 07:26, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> Sent: Friday, August 19, 2011 11:02 PM
>> > >> Yet another idea - why don't we simply pass the buffer passed to
>> > >> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the
>> > >> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH
>> > >> (which I don't think Xen really supports at present).
>> > >>
>> > >> Or really, depending on who controls what, the P, C, and T bits should
>> > >> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently
>> > >> does, and then let Xen override the bits it ought to control).
>> > >
>> > > _PDC is encoded in AML language, and requires an ACPI parser which
>> > > is one thing we avoid in Xen. If Xen want to override those bits, then
>> > > whole ACPI component needs move down to Xen too.
>> >
>> > No, I'm not saying the evaluation should be happening there. Below is
>> > a draft hypervisor patch (only compile tested so far).
>> 
>> Attached a patch that actually works (with a minimal Dom0 addition).
>> 
> 
> yes, this change looks more straightforward. :-)

With that in, we still have more deficiencies compared to native Linux.

For one, we don't use mwait when ACPI doesn't tell us to, while Linux
does (in the intel_idle driver for deeper C-states, and for C1 also via
mwait_idle()). This is likely a bit more work, but it should be possible to
construct C-state information from CPUID leaf 5 (and, if valid, ignore
information passed down from Dom0), which would match intel_idle's
taking precedence over acpi_idle in Linux.

Second, if only C1 gets announced by ACPI, we end up not using it
because Dom0 simply neglects to let the hypervisor know. This is
because acpi_processor_get_power_info_cst() (back to at least
2.6.16) returns -EFAULT if less than two C-states were found. Simply
prefixing the check with "!processor_pm_external() && " fixes this
(but I don't know whether something similar could be done in Jeremy's
tree).

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-25 12:37               ` Jan Beulich
@ 2011-08-26  2:18                 ` Tian, Kevin
  2011-09-05  8:14                   ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Tian, Kevin @ 2011-08-26  2:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Wei, Gang,
	'Konrad Rzeszutek Wilk (konrad.wilk@oracle.com)'

> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Thursday, August 25, 2011 8:37 PM
> 
> >>> On 21.08.11 at 07:26, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@novell.com]
> >> Sent: Friday, August 19, 2011 11:02 PM
> >> > >> Yet another idea - why don't we simply pass the buffer passed to
> >> > >> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the
> >> > >> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH
> >> > >> (which I don't think Xen really supports at present).
> >> > >>
> >> > >> Or really, depending on who controls what, the P, C, and T bits should
> >> > >> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently
> >> > >> does, and then let Xen override the bits it ought to control).
> >> > >
> >> > > _PDC is encoded in AML language, and requires an ACPI parser which
> >> > > is one thing we avoid in Xen. If Xen want to override those bits, then
> >> > > whole ACPI component needs move down to Xen too.
> >> >
> >> > No, I'm not saying the evaluation should be happening there. Below is
> >> > a draft hypervisor patch (only compile tested so far).
> >>
> >> Attached a patch that actually works (with a minimal Dom0 addition).
> >>
> >
> > yes, this change looks more straightforward. :-)
> 
> With that in, we still have more deficiencies compared to native Linux.

definitely there'll be even more than what's revealed today, due to the
way that dom0 ACPI processor driver is tightly bound. there're lots of
factors in dom0 itself which may impact the verification/filtering on
Cx entries provide by BIOS, while some of which should be avoided from
Xen p.o.v, such as the 2nd example you just found. The more severe is
that to work around those factors adds intrusive Xen awareness into
generic ACPI processor driver, e.g. 

@@ -780,7 +780,7 @@ static int acpi_processor_get_power_info
 			  current_count));
 
 	/* Validate number of power states discovered */
-	if (current_count < 2)
+	if (current_count < 1 + !processor_pm_external())
 		status = -EFAULT;
 
       end:

More changes like above are added, less possibilities for Xen PM
changes to be accepted into upstream. Also such specific changes
made on one dom0 version may be invalid in a new version quickly.
Above change is one example which doesn't hold true in newer
kernel. 

When working with Konrad on rebasing xen PM patches to latest
Linux 3.0.0. we tried hard to avoid intrusive changes in generic
ACPI processor driver, by trying to invoke existing interfaces in
higher level as possible. The end result is that we skip handling
those corner cases like above example for now, by at least making
Xen PM working on majority boxes. Later after Xen PM is accepted
upstream with more Xen awareness in Linux ACPI people, those
corner cases handling may be improved gradually.
 
Another option Yang currently is working on is to port native intel-idle
driver to Xen, which should avoid nasty dependency on dom0 ACPI
bits and immune to various BIOS bugs.

> 
> For one, we don't use mwait when ACPI doesn't tell us to, while Linux
> does (in the intel_idle driver for deeper C-states, and for C1 also via
> mwait_idle()). This is likely a bit more work, but it should be possible to
> construct C-state information from CPUID leaf 5 (and, if valid, ignore
> information passed down from Dom0), which would match intel_idle's
> taking precedence over acpi_idle in Linux.

yes. This should be a desired feature in Xen, with some limitations:
	- not work with CPU hotplug
	- not work with old boxes (starting from Nehalem)
	- not work with Px/Cx state changes (_PPC, _CST e.g. from Node Manager)

So this will be a supplemented option to existing acpi_idle, and should
work on most cases when above 3 factors are not concerned.

> 
> Second, if only C1 gets announced by ACPI, we end up not using it
> because Dom0 simply neglects to let the hypervisor know. This is
> because acpi_processor_get_power_info_cst() (back to at least
> 2.6.16) returns -EFAULT if less than two C-states were found. Simply
> prefixing the check with "!processor_pm_external() && " fixes this
> (but I don't know whether something similar could be done in Jeremy's
> tree).

this is a very temporary problem which disappears quickly in subsequent
versions. But if just taking 2.6.18-xen, it's a right fix.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-26  2:18                 ` Tian, Kevin
@ 2011-09-05  8:14                   ` Jan Beulich
  0 siblings, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2011-09-05  8:14 UTC (permalink / raw)
  To: Kevin Tian
  Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei,
	'KonradRzeszutek Wilk (konrad.wilk@oracle.com)'

>>> On 26.08.11 at 04:18, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> Sent: Thursday, August 25, 2011 8:37 PM
>> 
>> >>> On 21.08.11 at 07:26, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> >>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> >> Sent: Friday, August 19, 2011 11:02 PM
>> >> > >> Yet another idea - why don't we simply pass the buffer passed to
>> >> > >> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the
>> >> > >> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH
>> >> > >> (which I don't think Xen really supports at present).
>> >> > >>
>> >> > >> Or really, depending on who controls what, the P, C, and T bits should
>> >> > >> be set by either Dom0 or Xen (so e.g. let Dom0 do what it currently
>> >> > >> does, and then let Xen override the bits it ought to control).
>> >> > >
>> >> > > _PDC is encoded in AML language, and requires an ACPI parser which
>> >> > > is one thing we avoid in Xen. If Xen want to override those bits, then
>> >> > > whole ACPI component needs move down to Xen too.
>> >> >
>> >> > No, I'm not saying the evaluation should be happening there. Below is
>> >> > a draft hypervisor patch (only compile tested so far).
>> >>
>> >> Attached a patch that actually works (with a minimal Dom0 addition).
>> >>
>> >
>> > yes, this change looks more straightforward. :-)
>> 
>> With that in, we still have more deficiencies compared to native Linux.
> 
> definitely there'll be even more than what's revealed today, due to the
> way that dom0 ACPI processor driver is tightly bound. there're lots of
> factors in dom0 itself which may impact the verification/filtering on
> Cx entries provide by BIOS, while some of which should be avoided from
> Xen p.o.v, such as the 2nd example you just found. The more severe is
> that to work around those factors adds intrusive Xen awareness into
> generic ACPI processor driver, e.g. 
> 
> @@ -780,7 +780,7 @@ static int acpi_processor_get_power_info
>  			  current_count));
>  
>  	/* Validate number of power states discovered */
> -	if (current_count < 2)
> +	if (current_count < 1 + !processor_pm_external())
>  		status = -EFAULT;
>  
>        end:
> 
> More changes like above are added, less possibilities for Xen PM
> changes to be accepted into upstream. Also such specific changes
> made on one dom0 version may be invalid in a new version quickly.
> Above change is one example which doesn't hold true in newer
> kernel. 

Afaict, the code is unchanged up to at least 3.0, and requires
the same adjustment (at least for the non-pvops case; the pvops
one clearly can't be reasonably viewed from any post-2.6.32
perspective).

> When working with Konrad on rebasing xen PM patches to latest
> Linux 3.0.0. we tried hard to avoid intrusive changes in generic
> ACPI processor driver, by trying to invoke existing interfaces in
> higher level as possible. The end result is that we skip handling
> those corner cases like above example for now, by at least making
> Xen PM working on majority boxes. Later after Xen PM is accepted
> upstream with more Xen awareness in Linux ACPI people, those
> corner cases handling may be improved gradually.
>  
> Another option Yang currently is working on is to port native intel-idle
> driver to Xen, which should avoid nasty dependency on dom0 ACPI
> bits and immune to various BIOS bugs.

That's good to hear.

>> For one, we don't use mwait when ACPI doesn't tell us to, while Linux
>> does (in the intel_idle driver for deeper C-states, and for C1 also via
>> mwait_idle()). This is likely a bit more work, but it should be possible to
>> construct C-state information from CPUID leaf 5 (and, if valid, ignore
>> information passed down from Dom0), which would match intel_idle's
>> taking precedence over acpi_idle in Linux.
> 
> yes. This should be a desired feature in Xen, with some limitations:
> 	- not work with CPU hotplug
> 	- not work with old boxes (starting from Nehalem)
> 	- not work with Px/Cx state changes (_PPC, _CST e.g. from Node Manager)
> 
> So this will be a supplemented option to existing acpi_idle, and should
> work on most cases when above 3 factors are not concerned.
> 
>> 
>> Second, if only C1 gets announced by ACPI, we end up not using it
>> because Dom0 simply neglects to let the hypervisor know. This is
>> because acpi_processor_get_power_info_cst() (back to at least
>> 2.6.16) returns -EFAULT if less than two C-states were found. Simply
>> prefixing the check with "!processor_pm_external() && " fixes this
>> (but I don't know whether something similar could be done in Jeremy's
>> tree).
> 
> this is a very temporary problem which disappears quickly in subsequent
> versions. But if just taking 2.6.18-xen, it's a right fix.

Again - when did you see this disappear?

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-16  8:45                 ` Jan Beulich
@ 2011-08-16  8:50                   ` Tian, Kevin
  0 siblings, 0 replies; 36+ messages in thread
From: Tian, Kevin @ 2011-08-16  8:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Wei, Gang

> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Tuesday, August 16, 2011 4:45 PM
> 
> >>> On 16.08.11 at 10:29, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@novell.com]
> >> Sent: Tuesday, August 16, 2011 4:13 PM
> >>
> >> >>> On 16.08.11 at 08:53, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >> >>  From: Jan Beulich [mailto:JBeulich@novell.com]
> >> >> Sent: Tuesday, August 16, 2011 2:42 PM
> >> >>
> >> >> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >> >> >>  From: Jan Beulich [mailto:JBeulich@novell.com]
> >> >> >> Sent: Monday, August 15, 2011 6:29 PM
> >> >> >>
> >> >> >> that while improving the situation on CPUs that support the break-on-
> >> >> >> interrupt extension to mwait, it would result in C2/C3 not being usable
> >> >> >> at all on CPUs that don't (but support mwait in its simpler form and
> >> >> >> have ACPI tables specifying FFH as address space id). Is that only a
> >> >> >> theoretical concern (i.e. is there an implicit guarantee that for other
> >> >> >> than C1 FFH wouldn't be specified without that extension being
> >> >> >> available)? I thinks it's a practical one, or otherwise there wouldn't
> >> >> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC
> >> >> >> evaluation.
> >> >> >
> >> >> > Yes, this is a practical one, though I don't know any box doing that. In
> >> > all
> >> >> > the boxes I've been using so far, all the extensions are available. But
> >> >> > since
> >> >> > BIOS vendor may also impact the availability of CPUID bits, I think we
> >> >> > should do it right by strictly conforming to theSDM. I.e. we need check
> >> >> > CPUID leafs and then verify all Cx states propagated from dom0,
> instead
> >> >> > of blindly following its info. Will work a patch for that.
> >> >>
> >> >> You're getting it sort of wrong way round: What I don't want to do (but
> >> >> seemingly being necessary) is mimic the decision logic the hypervisor
> >> >> uses (i.e. require the break-on-interrupt extension for C2/C3 entering
> >> >> through MWAIT) in Dom0 when deciding about the bits to pass to
> >> >
> >> > break-on-interrupt is not a hard requirement to use MWAIT. Even when
> >> > that extension is not available, MWAIT can be still used to enter C2/C3,
> >> > just with interrupt enabled.
> >>
> >> And that's why this implementation detail should be confined to the
> >> hypervisor - Dom0 should not care about this if at all possible.
> >>
> >> >> _PDC. That ought to be an implementation detail (subject to change)
> >> >> in the hypervisor alone. The hypervisor itself, otoh, already properly
> >> >> checks CPUID leaf 5 (and that's what might cause it to not use mwait
> >> >> despite the bit in CPUID leaf 1 being set, which should be all Dom0
> >> >> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH).
> >> >>
> >> >
> >> > I made a mistake, that currently CPUID leaf 5 is already checked in
> >> > check_cx in hypervisor, so it should be sane. However I still fail to catch
> >> > your real concern here. :/
> >>
> >> If Dom0 finds (real) CPUID leaf 1 report MWAIT to be available, it
> >> will (with the logic outlined above) call _PDC without clearing
> >> ACPI_PDC_C_C2C3_FFH. If now the break-on-interrupt extension
> >> is not present, but the address space ID for C2 or C3 is set to FFH,
> >> then Xen (in acpi_processor_ffh_cstate_probe()) will reject the
> >> Cx entry (and hence refrain from using the respective C-state),
> >> whereas if Dom0 cleared ACPI_PDC_C_C2C3_FFH in that case,
> >> firmware would (normally) have converted the address space ID to
> >> SYSTEM_IO, and hence Xen would have decided to use C2/C3 with
> >> the SYSIO entry method.
> >>
> >> So this is only acceptable if there are *no* production CPUs of any
> >> vendor that would support MWAIT without the break-on-interrupt
> >> extension.
> >>
> >
> > yes, that's also the way that native Linux code currently uses:
> > 	- notify BIOS ACPI_PDC_C_C2C3_FFH if cpu has mwait
> > 	- reject Cx entry if break-on-interrupt extension is not present later
> > in acpi processor driver when parsing Cx entries.
> >
> > From BIOS ACPI p.o.v, OSPM can notify BIOS about FFH style if following
> > conditions are true:
> > 	a) cpu supports mwait
> > 	b) OSPM itself supports mwait
> >
> > a) is architectural, but b) is implementation specific regarding to what
> > can be called "support". Obviously both Xen and Linux here use an
> > inconsistent way between the place notifying BIOS and the point parsing
> > ACPI Cx entry. So your conclusion is correct that C2/C3 would be rejected
> > on the CPU which doesn't support MWAIT with break-on-interrupt
> > extension. But it should be fine in the real world, and we may consider
> > whether to do something when a real case is encountered in the future.
> >
> > On the other hand, you can think it as the decision from Xen that it
> > doesn't want to use legacy I/O method for C2/C3 when such situation exists.
> > :-)
> 
> Yeah, but customers could validly view this as regression (because on
> such a system Xen would use C2/C3 currently).

Well, if such system -does- exists and such customers -do-exist. Anyway legacy I/O 
method should really be avoided. If some customers happen to do some power
business on such system, I'd suggest moving to a typical system since that
environment won't ensure a good power efficiency. It's not a good base for power
tuning.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-16  8:29               ` Tian, Kevin
@ 2011-08-16  8:45                 ` Jan Beulich
  2011-08-16  8:50                   ` Tian, Kevin
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2011-08-16  8:45 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 16.08.11 at 10:29, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> Sent: Tuesday, August 16, 2011 4:13 PM
>> 
>> >>> On 16.08.11 at 08:53, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> >>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> >> Sent: Tuesday, August 16, 2011 2:42 PM
>> >>
>> >> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> >> >>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> >> >> Sent: Monday, August 15, 2011 6:29 PM
>> >> >>
>> >> >> that while improving the situation on CPUs that support the break-on-
>> >> >> interrupt extension to mwait, it would result in C2/C3 not being usable
>> >> >> at all on CPUs that don't (but support mwait in its simpler form and
>> >> >> have ACPI tables specifying FFH as address space id). Is that only a
>> >> >> theoretical concern (i.e. is there an implicit guarantee that for other
>> >> >> than C1 FFH wouldn't be specified without that extension being
>> >> >> available)? I thinks it's a practical one, or otherwise there wouldn't
>> >> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC
>> >> >> evaluation.
>> >> >
>> >> > Yes, this is a practical one, though I don't know any box doing that. In
>> > all
>> >> > the boxes I've been using so far, all the extensions are available. But
>> >> > since
>> >> > BIOS vendor may also impact the availability of CPUID bits, I think we
>> >> > should do it right by strictly conforming to theSDM. I.e. we need check
>> >> > CPUID leafs and then verify all Cx states propagated from dom0, instead
>> >> > of blindly following its info. Will work a patch for that.
>> >>
>> >> You're getting it sort of wrong way round: What I don't want to do (but
>> >> seemingly being necessary) is mimic the decision logic the hypervisor
>> >> uses (i.e. require the break-on-interrupt extension for C2/C3 entering
>> >> through MWAIT) in Dom0 when deciding about the bits to pass to
>> >
>> > break-on-interrupt is not a hard requirement to use MWAIT. Even when
>> > that extension is not available, MWAIT can be still used to enter C2/C3,
>> > just with interrupt enabled.
>> 
>> And that's why this implementation detail should be confined to the
>> hypervisor - Dom0 should not care about this if at all possible.
>> 
>> >> _PDC. That ought to be an implementation detail (subject to change)
>> >> in the hypervisor alone. The hypervisor itself, otoh, already properly
>> >> checks CPUID leaf 5 (and that's what might cause it to not use mwait
>> >> despite the bit in CPUID leaf 1 being set, which should be all Dom0
>> >> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH).
>> >>
>> >
>> > I made a mistake, that currently CPUID leaf 5 is already checked in
>> > check_cx in hypervisor, so it should be sane. However I still fail to catch
>> > your real concern here. :/
>> 
>> If Dom0 finds (real) CPUID leaf 1 report MWAIT to be available, it
>> will (with the logic outlined above) call _PDC without clearing
>> ACPI_PDC_C_C2C3_FFH. If now the break-on-interrupt extension
>> is not present, but the address space ID for C2 or C3 is set to FFH,
>> then Xen (in acpi_processor_ffh_cstate_probe()) will reject the
>> Cx entry (and hence refrain from using the respective C-state),
>> whereas if Dom0 cleared ACPI_PDC_C_C2C3_FFH in that case,
>> firmware would (normally) have converted the address space ID to
>> SYSTEM_IO, and hence Xen would have decided to use C2/C3 with
>> the SYSIO entry method.
>> 
>> So this is only acceptable if there are *no* production CPUs of any
>> vendor that would support MWAIT without the break-on-interrupt
>> extension.
>> 
> 
> yes, that's also the way that native Linux code currently uses:
> 	- notify BIOS ACPI_PDC_C_C2C3_FFH if cpu has mwait
> 	- reject Cx entry if break-on-interrupt extension is not present later
> in acpi processor driver when parsing Cx entries.
> 
> From BIOS ACPI p.o.v, OSPM can notify BIOS about FFH style if following
> conditions are true:
> 	a) cpu supports mwait
> 	b) OSPM itself supports mwait
> 
> a) is architectural, but b) is implementation specific regarding to what
> can be called "support". Obviously both Xen and Linux here use an
> inconsistent way between the place notifying BIOS and the point parsing
> ACPI Cx entry. So your conclusion is correct that C2/C3 would be rejected
> on the CPU which doesn't support MWAIT with break-on-interrupt 
> extension. But it should be fine in the real world, and we may consider
> whether to do something when a real case is encountered in the future.
> 
> On the other hand, you can think it as the decision from Xen that it
> doesn't want to use legacy I/O method for C2/C3 when such situation exists. 
> :-)

Yeah, but customers could validly view this as regression (because on
such a system Xen would use C2/C3 currently).

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-16  8:13             ` Jan Beulich
@ 2011-08-16  8:29               ` Tian, Kevin
  2011-08-16  8:45                 ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Tian, Kevin @ 2011-08-16  8:29 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Wei, Gang

> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Tuesday, August 16, 2011 4:13 PM
> 
> >>> On 16.08.11 at 08:53, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@novell.com]
> >> Sent: Tuesday, August 16, 2011 2:42 PM
> >>
> >> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >> >>  From: Jan Beulich [mailto:JBeulich@novell.com]
> >> >> Sent: Monday, August 15, 2011 6:29 PM
> >> >>
> >> >> that while improving the situation on CPUs that support the break-on-
> >> >> interrupt extension to mwait, it would result in C2/C3 not being usable
> >> >> at all on CPUs that don't (but support mwait in its simpler form and
> >> >> have ACPI tables specifying FFH as address space id). Is that only a
> >> >> theoretical concern (i.e. is there an implicit guarantee that for other
> >> >> than C1 FFH wouldn't be specified without that extension being
> >> >> available)? I thinks it's a practical one, or otherwise there wouldn't
> >> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC
> >> >> evaluation.
> >> >
> >> > Yes, this is a practical one, though I don't know any box doing that. In
> > all
> >> > the boxes I've been using so far, all the extensions are available. But
> >> > since
> >> > BIOS vendor may also impact the availability of CPUID bits, I think we
> >> > should do it right by strictly conforming to theSDM. I.e. we need check
> >> > CPUID leafs and then verify all Cx states propagated from dom0, instead
> >> > of blindly following its info. Will work a patch for that.
> >>
> >> You're getting it sort of wrong way round: What I don't want to do (but
> >> seemingly being necessary) is mimic the decision logic the hypervisor
> >> uses (i.e. require the break-on-interrupt extension for C2/C3 entering
> >> through MWAIT) in Dom0 when deciding about the bits to pass to
> >
> > break-on-interrupt is not a hard requirement to use MWAIT. Even when
> > that extension is not available, MWAIT can be still used to enter C2/C3,
> > just with interrupt enabled.
> 
> And that's why this implementation detail should be confined to the
> hypervisor - Dom0 should not care about this if at all possible.
> 
> >> _PDC. That ought to be an implementation detail (subject to change)
> >> in the hypervisor alone. The hypervisor itself, otoh, already properly
> >> checks CPUID leaf 5 (and that's what might cause it to not use mwait
> >> despite the bit in CPUID leaf 1 being set, which should be all Dom0
> >> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH).
> >>
> >
> > I made a mistake, that currently CPUID leaf 5 is already checked in
> > check_cx in hypervisor, so it should be sane. However I still fail to catch
> > your real concern here. :/
> 
> If Dom0 finds (real) CPUID leaf 1 report MWAIT to be available, it
> will (with the logic outlined above) call _PDC without clearing
> ACPI_PDC_C_C2C3_FFH. If now the break-on-interrupt extension
> is not present, but the address space ID for C2 or C3 is set to FFH,
> then Xen (in acpi_processor_ffh_cstate_probe()) will reject the
> Cx entry (and hence refrain from using the respective C-state),
> whereas if Dom0 cleared ACPI_PDC_C_C2C3_FFH in that case,
> firmware would (normally) have converted the address space ID to
> SYSTEM_IO, and hence Xen would have decided to use C2/C3 with
> the SYSIO entry method.
> 
> So this is only acceptable if there are *no* production CPUs of any
> vendor that would support MWAIT without the break-on-interrupt
> extension.
> 

yes, that's also the way that native Linux code currently uses:
	- notify BIOS ACPI_PDC_C_C2C3_FFH if cpu has mwait
	- reject Cx entry if break-on-interrupt extension is not present later
in acpi processor driver when parsing Cx entries.

>From BIOS ACPI p.o.v, OSPM can notify BIOS about FFH style if following
conditions are true:
	a) cpu supports mwait
	b) OSPM itself supports mwait

a) is architectural, but b) is implementation specific regarding to what
can be called "support". Obviously both Xen and Linux here use an
inconsistent way between the place notifying BIOS and the point parsing
ACPI Cx entry. So your conclusion is correct that C2/C3 would be rejected
on the CPU which doesn't support MWAIT with break-on-interrupt 
extension. But it should be fine in the real world, and we may consider
whether to do something when a real case is encountered in the future.

On the other hand, you can think it as the decision from Xen that it
doesn't want to use legacy I/O method for C2/C3 when such situation exists. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-16  6:53           ` Tian, Kevin
@ 2011-08-16  8:13             ` Jan Beulich
  2011-08-16  8:29               ` Tian, Kevin
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2011-08-16  8:13 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 16.08.11 at 08:53, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> Sent: Tuesday, August 16, 2011 2:42 PM
>> 
>> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> >>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> >> Sent: Monday, August 15, 2011 6:29 PM
>> >>
>> >> that while improving the situation on CPUs that support the break-on-
>> >> interrupt extension to mwait, it would result in C2/C3 not being usable
>> >> at all on CPUs that don't (but support mwait in its simpler form and
>> >> have ACPI tables specifying FFH as address space id). Is that only a
>> >> theoretical concern (i.e. is there an implicit guarantee that for other
>> >> than C1 FFH wouldn't be specified without that extension being
>> >> available)? I thinks it's a practical one, or otherwise there wouldn't
>> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC
>> >> evaluation.
>> >
>> > Yes, this is a practical one, though I don't know any box doing that. In 
> all
>> > the boxes I've been using so far, all the extensions are available. But
>> > since
>> > BIOS vendor may also impact the availability of CPUID bits, I think we
>> > should do it right by strictly conforming to theSDM. I.e. we need check
>> > CPUID leafs and then verify all Cx states propagated from dom0, instead
>> > of blindly following its info. Will work a patch for that.
>> 
>> You're getting it sort of wrong way round: What I don't want to do (but
>> seemingly being necessary) is mimic the decision logic the hypervisor
>> uses (i.e. require the break-on-interrupt extension for C2/C3 entering
>> through MWAIT) in Dom0 when deciding about the bits to pass to
> 
> break-on-interrupt is not a hard requirement to use MWAIT. Even when
> that extension is not available, MWAIT can be still used to enter C2/C3,
> just with interrupt enabled.

And that's why this implementation detail should be confined to the
hypervisor - Dom0 should not care about this if at all possible.

>> _PDC. That ought to be an implementation detail (subject to change)
>> in the hypervisor alone. The hypervisor itself, otoh, already properly
>> checks CPUID leaf 5 (and that's what might cause it to not use mwait
>> despite the bit in CPUID leaf 1 being set, which should be all Dom0
>> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH).
>> 
> 
> I made a mistake, that currently CPUID leaf 5 is already checked in
> check_cx in hypervisor, so it should be sane. However I still fail to catch
> your real concern here. :/

If Dom0 finds (real) CPUID leaf 1 report MWAIT to be available, it
will (with the logic outlined above) call _PDC without clearing
ACPI_PDC_C_C2C3_FFH. If now the break-on-interrupt extension
is not present, but the address space ID for C2 or C3 is set to FFH,
then Xen (in acpi_processor_ffh_cstate_probe()) will reject the
Cx entry (and hence refrain from using the respective C-state),
whereas if Dom0 cleared ACPI_PDC_C_C2C3_FFH in that case,
firmware would (normally) have converted the address space ID to
SYSTEM_IO, and hence Xen would have decided to use C2/C3 with
the SYSIO entry method.

So this is only acceptable if there are *no* production CPUs of any
vendor that would support MWAIT without the break-on-interrupt
extension.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-16  6:42         ` Jan Beulich
@ 2011-08-16  6:53           ` Tian, Kevin
  2011-08-16  8:13             ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Tian, Kevin @ 2011-08-16  6:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Wei, Gang

> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Tuesday, August 16, 2011 2:42 PM
> 
> >>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@novell.com]
> >> Sent: Monday, August 15, 2011 6:29 PM
> >>
> >> >>> On 15.08.11 at 10:09, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >> >>  From: Jan Beulich [mailto:JBeulich@novell.com]
> >> >> >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >> >> > It's unlikely to make into upstream, and also get lost in
> >> >> > into some distro such as SLES11.
> >> >>
> >> >> We can certainly fix it there.
> >> >>
> >> >
> >> > that'd be great. I/O method has observable impact on power efficiency,
> >> > and the fix would be very welcomed. :-)
> >>
> >> While the change is simple to do and works, I'm somewhat concerned
> >
> > Current change mentioned above is not safe, which always enables mwait
> > support w/o knowledge about underlying presence. But new processors
> 
> Not following you here: If I execute a "real" cpuid instruction, I will know
> whether mwait is "present".

Sorry. I thought you want to integrate current patch in 2.6.32 pvops tree:
diff --git a/arch/x86/kernel/acpi/processor.c b/arch/x86/kernel/acpi/processor.c
index 8c9526d..d88866c 100644
--- a/arch/x86/kernel/acpi/processor.c
+++ b/arch/x86/kernel/acpi/processor.c
@@ -60,7 +60,7 @@ static void init_intel_pdc(struct acpi_processor *pr, struct cpuinfo_x86 *c)
        /*
         * If mwait/monitor is unsupported, C2/C3_FFH will be disabled
         */
-       if (!cpu_has(c, X86_FEATURE_MWAIT))
+       if (!cpu_has(c, X86_FEATURE_MWAIT) && !xen_initial_domain())
                buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
 
        obj->type = ACPI_TYPE_BUFFER;

that's definitely wrong.

> 
> > all have mwait support, so this may not be big problem.
> >
> >> that while improving the situation on CPUs that support the break-on-
> >> interrupt extension to mwait, it would result in C2/C3 not being usable
> >> at all on CPUs that don't (but support mwait in its simpler form and
> >> have ACPI tables specifying FFH as address space id). Is that only a
> >> theoretical concern (i.e. is there an implicit guarantee that for other
> >> than C1 FFH wouldn't be specified without that extension being
> >> available)? I thinks it's a practical one, or otherwise there wouldn't
> >> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC
> >> evaluation.
> >
> > Yes, this is a practical one, though I don't know any box doing that. In all
> > the boxes I've been using so far, all the extensions are available. But
> > since
> > BIOS vendor may also impact the availability of CPUID bits, I think we
> > should do it right by strictly conforming to theSDM. I.e. we need check
> > CPUID leafs and then verify all Cx states propagated from dom0, instead
> > of blindly following its info. Will work a patch for that.
> 
> You're getting it sort of wrong way round: What I don't want to do (but
> seemingly being necessary) is mimic the decision logic the hypervisor
> uses (i.e. require the break-on-interrupt extension for C2/C3 entering
> through MWAIT) in Dom0 when deciding about the bits to pass to

break-on-interrupt is not a hard requirement to use MWAIT. Even when
that extension is not available, MWAIT can be still used to enter C2/C3,
just with interrupt enabled.

> _PDC. That ought to be an implementation detail (subject to change)
> in the hypervisor alone. The hypervisor itself, otoh, already properly
> checks CPUID leaf 5 (and that's what might cause it to not use mwait
> despite the bit in CPUID leaf 1 being set, which should be all Dom0
> ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH).
> 

I made a mistake, that currently CPUID leaf 5 is already checked in
check_cx in hypervisor, so it should be sane. However I still fail to catch
your real concern here. :/

Thanks,
Kevin

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-16  6:03       ` Tian, Kevin
@ 2011-08-16  6:42         ` Jan Beulich
  2011-08-16  6:53           ` Tian, Kevin
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2011-08-16  6:42 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 16.08.11 at 08:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> Sent: Monday, August 15, 2011 6:29 PM
>> 
>> >>> On 15.08.11 at 10:09, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> >>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> >> >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> >> > It's unlikely to make into upstream, and also get lost in
>> >> > into some distro such as SLES11.
>> >>
>> >> We can certainly fix it there.
>> >>
>> >
>> > that'd be great. I/O method has observable impact on power efficiency,
>> > and the fix would be very welcomed. :-)
>> 
>> While the change is simple to do and works, I'm somewhat concerned
> 
> Current change mentioned above is not safe, which always enables mwait
> support w/o knowledge about underlying presence. But new processors

Not following you here: If I execute a "real" cpuid instruction, I will know
whether mwait is "present".

> all have mwait support, so this may not be big problem.
> 
>> that while improving the situation on CPUs that support the break-on-
>> interrupt extension to mwait, it would result in C2/C3 not being usable
>> at all on CPUs that don't (but support mwait in its simpler form and
>> have ACPI tables specifying FFH as address space id). Is that only a
>> theoretical concern (i.e. is there an implicit guarantee that for other
>> than C1 FFH wouldn't be specified without that extension being
>> available)? I thinks it's a practical one, or otherwise there wouldn't
>> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC
>> evaluation.
> 
> Yes, this is a practical one, though I don't know any box doing that. In all
> the boxes I've been using so far, all the extensions are available. But 
> since 
> BIOS vendor may also impact the availability of CPUID bits, I think we 
> should do it right by strictly conforming to theSDM. I.e. we need check 
> CPUID leafs and then verify all Cx states propagated from dom0, instead
> of blindly following its info. Will work a patch for that.

You're getting it sort of wrong way round: What I don't want to do (but
seemingly being necessary) is mimic the decision logic the hypervisor
uses (i.e. require the break-on-interrupt extension for C2/C3 entering
through MWAIT) in Dom0 when deciding about the bits to pass to
_PDC. That ought to be an implementation detail (subject to change)
in the hypervisor alone. The hypervisor itself, otoh, already properly
checks CPUID leaf 5 (and that's what might cause it to not use mwait
despite the bit in CPUID leaf 1 being set, which should be all Dom0
ought to look at for deciding whether to clear ACPI_PDC_C_C2C3_FFH).

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: expose MWAIT to dom0
  2011-08-16  5:34               ` Tian, Kevin
@ 2011-08-16  6:31                 ` Keir Fraser
  0 siblings, 0 replies; 36+ messages in thread
From: Keir Fraser @ 2011-08-16  6:31 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Zhang, Yang Z, Wei, Gang, Jan Beulich




On 16/08/2011 06:34, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> From: Keir Fraser [mailto:keir.xen@gmail.com]
>> Sent: Monday, August 15, 2011 5:02 PM
>> 
>> On 15/08/2011 09:14, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> 
>>>> cpu_has() accesses a pre-filled capabilities bitmask, filled from cpuid,
>>>> right? And cpuid goes through a pv_ops hook?
>>>> 
>>> 
>>> I'm not quite catching you here. Do you want to prefill mwait bit from
>>> pv_ops
>>> hook unconditionally even in current situation where Xen doesn't expose
>>> mwait, or selectively under some dom0's boot option even when Xen is
>>> changed to expose mwait? The first case is not sane, while for the 2nd case
>>> I don't think any pv_ops hook is required as long as Xen can expose it,
>>> isn't
>>> it?
>> 
>> But there is a pv_ops hook, and Xen isn't going to expose it because it
>> breaks old dom0 kernels.
>> 
> 
> yes, now I understand your suggestion. Basically we discussed two approaches:
> 
> a) in current pv_ops hook (xen_cpuid):
> - use unconditional cpuid to query mwait capability on physical cpu
> - if the bit is set, and SIF_PM_MASK indicates xen pm is enabled:
> then set the bit in the ECX when leaf 1 is queried
>   this should effectively has X86_FEATURE_MWAIT set, and then _PDC method
> will notify BIOS using mwait entry method.
>   This method doesn't require Xen change, but it would only work for future
> releases which incorporates this change
> 
> b) in Xen we selectively clear MWAIT capability in pv_cpuid, based on whether
> xen_cpuidle is enabled. If there's no MWAIT available on physical CPU, or
> xen_cpuidle is disabled, MWAIT is not exposed to the guest. This approach
> doesn't break old dom0 kernel, as it's controlled by xen_cpuidle cmdline
> option.

It does require xen_cpuidle=0 to be added to the Xen command line. That's
not great.

> Basically it's not suggested to activate Cx transition by using legacy I/O
> method,
> so it's fine to disable xen cpuidle on those old dom0 kernel.
> 
> approach a) is better from the angle that we don't want non-ring0 code to
> execute MWAIT which causes invalid opcode exception, while for PM purpose
> MWAIT is purely informational.
> 
> Approach b) is better if we want existing Xen deployments to use efficient
> C-state benefit, since Xen is backward compatible and thus upgrade to a
> newer Xen release is lighter than upgrading to a newer dom0 kernel.
> 
> What's your opinion about this? If b) is not a valid concern from your side,
> we
> will go to a) as you suggested.

I think (a) is the right way to go.

 -- Keir

> Thanks
> Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-15 10:29     ` Jan Beulich
@ 2011-08-16  6:03       ` Tian, Kevin
  2011-08-16  6:42         ` Jan Beulich
  0 siblings, 1 reply; 36+ messages in thread
From: Tian, Kevin @ 2011-08-16  6:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Wei, Gang

> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Monday, August 15, 2011 6:29 PM
> 
> >>> On 15.08.11 at 10:09, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >>  From: Jan Beulich [mailto:JBeulich@novell.com]
> >> >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >> > It's unlikely to make into upstream, and also get lost in
> >> > into some distro such as SLES11.
> >>
> >> We can certainly fix it there.
> >>
> >
> > that'd be great. I/O method has observable impact on power efficiency,
> > and the fix would be very welcomed. :-)
> 
> While the change is simple to do and works, I'm somewhat concerned

Current change mentioned above is not safe, which always enables mwait
support w/o knowledge about underlying presence. But new processors
all have mwait support, so this may not be big problem.

> that while improving the situation on CPUs that support the break-on-
> interrupt extension to mwait, it would result in C2/C3 not being usable
> at all on CPUs that don't (but support mwait in its simpler form and
> have ACPI tables specifying FFH as address space id). Is that only a
> theoretical concern (i.e. is there an implicit guarantee that for other
> than C1 FFH wouldn't be specified without that extension being
> available)? I thinks it's a practical one, or otherwise there wouldn't
> be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC
> evaluation.

Yes, this is a practical one, though I don't know any box doing that. In all
the boxes I've been using so far, all the extensions are available. But since 
BIOS vendor may also impact the availability of CPUID bits, I think we 
should do it right by strictly conforming to theSDM. I.e. we need check 
CPUID leafs and then verify all Cx states propagated from dom0, instead
of blindly following its info. Will work a patch for that.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-15  9:02             ` Keir Fraser
@ 2011-08-16  5:34               ` Tian, Kevin
  2011-08-16  6:31                 ` Keir Fraser
  0 siblings, 1 reply; 36+ messages in thread
From: Tian, Kevin @ 2011-08-16  5:34 UTC (permalink / raw)
  To: Keir Fraser, xen-devel; +Cc: Zhang, Yang Z, Wei, Gang, Jan Beulich

> From: Keir Fraser [mailto:keir.xen@gmail.com]
> Sent: Monday, August 15, 2011 5:02 PM
> 
> On 15/08/2011 09:14, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> >> cpu_has() accesses a pre-filled capabilities bitmask, filled from cpuid,
> >> right? And cpuid goes through a pv_ops hook?
> >>
> >
> > I'm not quite catching you here. Do you want to prefill mwait bit from pv_ops
> > hook unconditionally even in current situation where Xen doesn't expose
> > mwait, or selectively under some dom0's boot option even when Xen is
> > changed to expose mwait? The first case is not sane, while for the 2nd case
> > I don't think any pv_ops hook is required as long as Xen can expose it, isn't
> > it?
> 
> But there is a pv_ops hook, and Xen isn't going to expose it because it
> breaks old dom0 kernels.
> 

yes, now I understand your suggestion. Basically we discussed two approaches:

a) in current pv_ops hook (xen_cpuid):
	- use unconditional cpuid to query mwait capability on physical cpu
	- if the bit is set, and SIF_PM_MASK indicates xen pm is enabled:
		then set the bit in the ECX when leaf 1 is queried
  this should effectively has X86_FEATURE_MWAIT set, and then _PDC method
will notify BIOS using mwait entry method. 
  This method doesn't require Xen change, but it would only work for future
releases which incorporates this change

b) in Xen we selectively clear MWAIT capability in pv_cpuid, based on whether
xen_cpuidle is enabled. If there's no MWAIT available on physical CPU, or
xen_cpuidle is disabled, MWAIT is not exposed to the guest. This approach
doesn't break old dom0 kernel, as it's controlled by xen_cpuidle cmdline option.
Basically it's not suggested to activate Cx transition by using legacy I/O method,
so it's fine to disable xen cpuidle on those old dom0 kernel.

approach a) is better from the angle that we don't want non-ring0 code to
execute MWAIT which causes invalid opcode exception, while for PM purpose
MWAIT is purely informational. 

Approach b) is better if we want existing Xen deployments to use efficient 
C-state benefit, since Xen is backward compatible and thus upgrade to a 
newer Xen release is lighter than upgrading to a newer dom0 kernel.

What's your opinion about this? If b) is not a valid concern from your side, we
will go to a) as you suggested.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: expose MWAIT to dom0
  2011-08-15  8:02 ` Jan Beulich
  2011-08-15  8:09   ` Tian, Kevin
@ 2011-08-15 12:44   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-08-15 12:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Yang Z Zhang, Kevin Tian, xen-devel, Keir Fraser, Gang Wei

On Mon, Aug 15, 2011 at 09:02:29AM +0100, Jan Beulich wrote:
> >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > There're basically two methods to enter a given C-state: legacy (hlt + I/O 
> > read),
> > and native(using mwait). MWAIT is always preferred when both underlying CPU
> > and OS support, which is a more efficient way to conduct C-state transition.
> > 
> > Xen PM relies on Dom0 to parse ACPI Cx/Px information, which involves one
> > step to notify BIOS about a set of capabilities supported by OSPM. One 
> > capability
> > is about mwait support, which if true ACPI Cx entry contains entry 
> > parameters 
> > for mwait, or else I/O port information is provided. Xen PM later decides 
> > entry
> > method (i/o or mwait) based on parsed ACPI information from dom0.
> > 
> > However Xen doesn't expose MWAIT capability to dom0 due to changeset 17573:
> > 
> > This then brings a problem to Dom0 which thinks underlying CPU
> > doesn't report mwait, and thus notify BIOS to use old I/O based method.
> > 
> > Later a trick is integrated in Jeremy's pvops tree:
> > 
> > --- a/arch/x86/kernel/acpi/processor.c
> > +++ b/arch/x86/kernel/acpi/processor.c
> > @@ -60,7 +60,7 @@ static void init_intel_pdc(struct acpi_processor *pr, 
> > struct cpuinfo_x86 *c)
> >         /*
> >          * If mwait/monitor is unsupported, C2/C3_FFH will be disabled
> >          */
> > -       if (!cpu_has(c, X86_FEATURE_MWAIT))
> > +       if (!cpu_has(c, X86_FEATURE_MWAIT) && !xen_initial_domain())
> >                 buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
> >  
> >         obj->type = ACPI_TYPE_BUFFER;
> > 
> > Above trick is ugly and error-prone, since it always enable mwait regardless 
> > of actual CPU capability.
> 
> 3.x (and later 2.6.3x) don't look at the CPUID flag anymore, they just
> check boot_option_idle_override, which is being controlled from the
> command line or enforced for some particular systems based on DMI
> data.

Or in case when running under Xen (dom0):
(arch/x86/xen/setup.c)

428         disable_cpuidle();
429         boot_option_idle_override = IDLE_HALT;

Which ends up calling the xen_safe_halt which does the yield hypercall.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-15  8:09   ` Tian, Kevin
  2011-08-15  8:18     ` Jan Beulich
@ 2011-08-15 10:29     ` Jan Beulich
  2011-08-16  6:03       ` Tian, Kevin
  1 sibling, 1 reply; 36+ messages in thread
From: Jan Beulich @ 2011-08-15 10:29 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 15.08.11 at 10:09, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> > It's unlikely to make into upstream, and also get lost in
>> > into some distro such as SLES11.
>> 
>> We can certainly fix it there.
>> 
> 
> that'd be great. I/O method has observable impact on power efficiency,
> and the fix would be very welcomed. :-)

While the change is simple to do and works, I'm somewhat concerned
that while improving the situation on CPUs that support the break-on-
interrupt extension to mwait, it would result in C2/C3 not being usable
at all on CPUs that don't (but support mwait in its simpler form and
have ACPI tables specifying FFH as address space id). Is that only a
theoretical concern (i.e. is there an implicit guarantee that for other
than C1 FFH wouldn't be specified without that extension being
available)? I thinks it's a practical one, or otherwise there wouldn't
be a point in removing the ACPI_PDC_C_C2C3_FFH bit prior to _PDC
evaluation.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: expose MWAIT to dom0
  2011-08-15  8:14           ` Tian, Kevin
  2011-08-15  8:41             ` Jan Beulich
@ 2011-08-15  9:02             ` Keir Fraser
  2011-08-16  5:34               ` Tian, Kevin
  1 sibling, 1 reply; 36+ messages in thread
From: Keir Fraser @ 2011-08-15  9:02 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Zhang, Yang Z, Wei, Gang

On 15/08/2011 09:14, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> cpu_has() accesses a pre-filled capabilities bitmask, filled from cpuid,
>> right? And cpuid goes through a pv_ops hook?
>> 
> 
> I'm not quite catching you here. Do you want to prefill mwait bit from pv_ops
> hook unconditionally even in current situation where Xen doesn't expose
> mwait, or selectively under some dom0's boot option even when Xen is
> changed to expose mwait? The first case is not sane, while for the 2nd case
> I don't think any pv_ops hook is required as long as Xen can expose it, isn't
> it?

But there is a pv_ops hook, and Xen isn't going to expose it because it
breaks old dom0 kernels.

 -- Keir

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-15  8:14           ` Tian, Kevin
@ 2011-08-15  8:41             ` Jan Beulich
  2011-08-15  9:02             ` Keir Fraser
  1 sibling, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2011-08-15  8:41 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 15.08.11 at 10:14, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>  From: Keir Fraser [mailto:keir.xen@gmail.com] 
>> Sent: Monday, August 15, 2011 4:11 PM
>> 
>> On 15/08/2011 08:57, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> 
>> >> Else the kernel could get the flag from the real non paravirtualised CPUID
>> >> instruction.
>> >
>> > linux uses cpu_has to extract mwait capability. To use real cpuid 
> instruction,
>> > then
>> > we need change Linux code which is not worthy though, like below:
>> >        if (!cpu_has(c, X86_FEATURE_MWAIT))
>> >                buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
>> > If we make it into cpu_has bits, then it lacks of original guarding effect.
>> 
>> cpu_has() accesses a pre-filled capabilities bitmask, filled from cpuid,
>> right? And cpuid goes through a pv_ops hook?
>> 
> 
> I'm not quite catching you here. Do you want to prefill mwait bit from 
> pv_ops
> hook unconditionally even in current situation where Xen doesn't expose 
> mwait, or selectively under some dom0's boot option even when Xen is
> changed to expose mwait? The first case is not sane, while for the 2nd case
> I don't think any pv_ops hook is required as long as Xen can expose it, 
> isn't it?

Didn't you want to make Xen's exposing of the pv bit conditional upon
xen_cpuidle? Dom0 can indirectly see this variable already through
XEN_PROCESSOR_PM_CX being set in SIF_PM_MASK, so it could use it for
its own decision. Whether that's really safe is another matter, as
xen_cpuidle set doesn't mean mwait is actually being used.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-15  8:09   ` Tian, Kevin
@ 2011-08-15  8:18     ` Jan Beulich
  2011-08-15 10:29     ` Jan Beulich
  1 sibling, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2011-08-15  8:18 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 15.08.11 at 10:09, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@novell.com] 
>> Sent: Monday, August 15, 2011 4:02 PM
>> 
>> >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> > There're basically two methods to enter a given C-state: legacy (hlt + I/O
>> > read),
>> > and native(using mwait). MWAIT is always preferred when both underlying
>> CPU
>> > and OS support, which is a more efficient way to conduct C-state transition.
>> >
>> > Xen PM relies on Dom0 to parse ACPI Cx/Px information, which involves one
>> > step to notify BIOS about a set of capabilities supported by OSPM. One
>> > capability
>> > is about mwait support, which if true ACPI Cx entry contains entry
>> > parameters
>> > for mwait, or else I/O port information is provided. Xen PM later decides
>> > entry
>> > method (i/o or mwait) based on parsed ACPI information from dom0.
>> >
>> > However Xen doesn't expose MWAIT capability to dom0 due to changeset
>> 17573:
>> >
>> > This then brings a problem to Dom0 which thinks underlying CPU
>> > doesn't report mwait, and thus notify BIOS to use old I/O based method.
>> >
>> > Later a trick is integrated in Jeremy's pvops tree:
>> >
>> > --- a/arch/x86/kernel/acpi/processor.c
>> > +++ b/arch/x86/kernel/acpi/processor.c
>> > @@ -60,7 +60,7 @@ static void init_intel_pdc(struct acpi_processor *pr,
>> > struct cpuinfo_x86 *c)
>> >         /*
>> >          * If mwait/monitor is unsupported, C2/C3_FFH will be disabled
>> >          */
>> > -       if (!cpu_has(c, X86_FEATURE_MWAIT))
>> > +       if (!cpu_has(c, X86_FEATURE_MWAIT) && !xen_initial_domain())
>> >                 buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
>> >
>> >         obj->type = ACPI_TYPE_BUFFER;
>> >
>> > Above trick is ugly and error-prone, since it always enable mwait regardless
>> > of actual CPU capability.
>> 
>> 3.x (and later 2.6.3x) don't look at the CPUID flag anymore, they just
>> check boot_option_idle_override, which is being controlled from the
>> command line or enforced for some particular systems based on DMI
>> data.
> 
> I don't think so. "boot_option_idle_override" controls the way how idle loop
> is implemented, which has the side effect to disable MWAIT if cpuid says it
> but "boot=nomwait" is specified. But it has no effect to enable MWAIT if
> Xen doesn't tell dom0 about it. 
> 
> Check arch_acpi_set_pdc_bits under x86:

Indeed - I didn't look at the header files.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-15  8:10         ` Keir Fraser
@ 2011-08-15  8:14           ` Tian, Kevin
  2011-08-15  8:41             ` Jan Beulich
  2011-08-15  9:02             ` Keir Fraser
  0 siblings, 2 replies; 36+ messages in thread
From: Tian, Kevin @ 2011-08-15  8:14 UTC (permalink / raw)
  To: Keir Fraser, xen-devel; +Cc: Zhang, Yang Z, Wei, Gang

> From: Keir Fraser [mailto:keir.xen@gmail.com]
> Sent: Monday, August 15, 2011 4:11 PM
> 
> On 15/08/2011 08:57, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> >> Else the kernel could get the flag from the real non paravirtualised CPUID
> >> instruction.
> >
> > linux uses cpu_has to extract mwait capability. To use real cpuid instruction,
> > then
> > we need change Linux code which is not worthy though, like below:
> >        if (!cpu_has(c, X86_FEATURE_MWAIT))
> >                buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
> > If we make it into cpu_has bits, then it lacks of original guarding effect.
> 
> cpu_has() accesses a pre-filled capabilities bitmask, filled from cpuid,
> right? And cpuid goes through a pv_ops hook?
> 

I'm not quite catching you here. Do you want to prefill mwait bit from pv_ops
hook unconditionally even in current situation where Xen doesn't expose 
mwait, or selectively under some dom0's boot option even when Xen is
changed to expose mwait? The first case is not sane, while for the 2nd case
I don't think any pv_ops hook is required as long as Xen can expose it, isn't it?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-15  7:57       ` Tian, Kevin
  2011-08-15  8:10         ` Keir Fraser
@ 2011-08-15  8:12         ` Jan Beulich
  1 sibling, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2011-08-15  8:12 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 15.08.11 at 09:57, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> So how about the change like below?
> 
> emulate_forced_invalid_op:
> -        __clear_bit(X86_FEATURE_MWAIT % 32, &c);
> +        if ( !IS_PRIV(current->domain) || !xen_cpuidle )
> +            __clear_bit(X86_FEATURE_MWAIT % 32, &c);

You'd break any Dom0 kernel that uses the flag to determine whether it
can actually use the mwait/monitor instructions. I agree with Keir that
for this particular purpose the code ought to look at the real CPUID
flag.

It should be possible to do this without changing non-Xen code, since
the use of the instructions is additionally gated on CPUID leaf 5
producing non-zero output. But again, the kernel has to do this for
itself, the hypervisor shouldn't expose the feature flag.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: expose MWAIT to dom0
  2011-08-15  7:57       ` Tian, Kevin
@ 2011-08-15  8:10         ` Keir Fraser
  2011-08-15  8:14           ` Tian, Kevin
  2011-08-15  8:12         ` Jan Beulich
  1 sibling, 1 reply; 36+ messages in thread
From: Keir Fraser @ 2011-08-15  8:10 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel; +Cc: Zhang, Yang Z, Wei, Gang

On 15/08/2011 08:57, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> Else the kernel could get the flag from the real non paravirtualised CPUID
>> instruction.
> 
> linux uses cpu_has to extract mwait capability. To use real cpuid instruction,
> then
> we need change Linux code which is not worthy though, like below:
>        if (!cpu_has(c, X86_FEATURE_MWAIT))
>                buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
> If we make it into cpu_has bits, then it lacks of original guarding effect.

cpu_has() accesses a pre-filled capabilities bitmask, filled from cpuid,
right? And cpuid goes through a pv_ops hook?

 -- Keir

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-15  8:02 ` Jan Beulich
@ 2011-08-15  8:09   ` Tian, Kevin
  2011-08-15  8:18     ` Jan Beulich
  2011-08-15 10:29     ` Jan Beulich
  2011-08-15 12:44   ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 36+ messages in thread
From: Tian, Kevin @ 2011-08-15  8:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Wei, Gang

> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Monday, August 15, 2011 4:02 PM
> 
> >>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > There're basically two methods to enter a given C-state: legacy (hlt + I/O
> > read),
> > and native(using mwait). MWAIT is always preferred when both underlying
> CPU
> > and OS support, which is a more efficient way to conduct C-state transition.
> >
> > Xen PM relies on Dom0 to parse ACPI Cx/Px information, which involves one
> > step to notify BIOS about a set of capabilities supported by OSPM. One
> > capability
> > is about mwait support, which if true ACPI Cx entry contains entry
> > parameters
> > for mwait, or else I/O port information is provided. Xen PM later decides
> > entry
> > method (i/o or mwait) based on parsed ACPI information from dom0.
> >
> > However Xen doesn't expose MWAIT capability to dom0 due to changeset
> 17573:
> >
> > This then brings a problem to Dom0 which thinks underlying CPU
> > doesn't report mwait, and thus notify BIOS to use old I/O based method.
> >
> > Later a trick is integrated in Jeremy's pvops tree:
> >
> > --- a/arch/x86/kernel/acpi/processor.c
> > +++ b/arch/x86/kernel/acpi/processor.c
> > @@ -60,7 +60,7 @@ static void init_intel_pdc(struct acpi_processor *pr,
> > struct cpuinfo_x86 *c)
> >         /*
> >          * If mwait/monitor is unsupported, C2/C3_FFH will be disabled
> >          */
> > -       if (!cpu_has(c, X86_FEATURE_MWAIT))
> > +       if (!cpu_has(c, X86_FEATURE_MWAIT) && !xen_initial_domain())
> >                 buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
> >
> >         obj->type = ACPI_TYPE_BUFFER;
> >
> > Above trick is ugly and error-prone, since it always enable mwait regardless
> > of actual CPU capability.
> 
> 3.x (and later 2.6.3x) don't look at the CPUID flag anymore, they just
> check boot_option_idle_override, which is being controlled from the
> command line or enforced for some particular systems based on DMI
> data.

I don't think so. "boot_option_idle_override" controls the way how idle loop
is implemented, which has the side effect to disable MWAIT if cpuid says it
but "boot=nomwait" is specified. But it has no effect to enable MWAIT if
Xen doesn't tell dom0 about it. 

Check arch_acpi_set_pdc_bits under x86:
static inline void arch_acpi_set_pdc_bits(u32 *buf)
{
        struct cpuinfo_x86 *c = &cpu_data(0);

        buf[2] |= ACPI_PDC_C_CAPABILITY_SMP;

        if (cpu_has(c, X86_FEATURE_EST))
                buf[2] |= ACPI_PDC_EST_CAPABILITY_SWSMP;
 
        if (cpu_has(c, X86_FEATURE_ACPI))
                buf[2] |= ACPI_PDC_T_FFH;
        
        /*
         * If mwait/monitor is unsupported, C2/C3_FFH will be disabled
         */
        if (!cpu_has(c, X86_FEATURE_MWAIT))
                buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
}       

> 
> > It's unlikely to make into upstream, and also get lost in
> > into some distro such as SLES11.
> 
> We can certainly fix it there.
> 

that'd be great. I/O method has observable impact on power efficiency,
and the fix would be very welcomed. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: expose MWAIT to dom0
  2011-08-15  5:35 Tian, Kevin
  2011-08-15  6:54 ` Keir Fraser
@ 2011-08-15  8:02 ` Jan Beulich
  2011-08-15  8:09   ` Tian, Kevin
  2011-08-15 12:44   ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 36+ messages in thread
From: Jan Beulich @ 2011-08-15  8:02 UTC (permalink / raw)
  To: Kevin Tian; +Cc: Yang Z Zhang, xen-devel, Keir Fraser, Gang Wei

>>> On 15.08.11 at 07:35, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> There're basically two methods to enter a given C-state: legacy (hlt + I/O 
> read),
> and native(using mwait). MWAIT is always preferred when both underlying CPU
> and OS support, which is a more efficient way to conduct C-state transition.
> 
> Xen PM relies on Dom0 to parse ACPI Cx/Px information, which involves one
> step to notify BIOS about a set of capabilities supported by OSPM. One 
> capability
> is about mwait support, which if true ACPI Cx entry contains entry 
> parameters 
> for mwait, or else I/O port information is provided. Xen PM later decides 
> entry
> method (i/o or mwait) based on parsed ACPI information from dom0.
> 
> However Xen doesn't expose MWAIT capability to dom0 due to changeset 17573:
> 
> This then brings a problem to Dom0 which thinks underlying CPU
> doesn't report mwait, and thus notify BIOS to use old I/O based method.
> 
> Later a trick is integrated in Jeremy's pvops tree:
> 
> --- a/arch/x86/kernel/acpi/processor.c
> +++ b/arch/x86/kernel/acpi/processor.c
> @@ -60,7 +60,7 @@ static void init_intel_pdc(struct acpi_processor *pr, 
> struct cpuinfo_x86 *c)
>         /*
>          * If mwait/monitor is unsupported, C2/C3_FFH will be disabled
>          */
> -       if (!cpu_has(c, X86_FEATURE_MWAIT))
> +       if (!cpu_has(c, X86_FEATURE_MWAIT) && !xen_initial_domain())
>                 buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
>  
>         obj->type = ACPI_TYPE_BUFFER;
> 
> Above trick is ugly and error-prone, since it always enable mwait regardless 
> of actual CPU capability.

3.x (and later 2.6.3x) don't look at the CPUID flag anymore, they just
check boot_option_idle_override, which is being controlled from the
command line or enforced for some particular systems based on DMI
data.

> It's unlikely to make into upstream, and also get lost in
> into some distro such as SLES11.

We can certainly fix it there.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-15  7:44     ` Keir Fraser
@ 2011-08-15  7:57       ` Tian, Kevin
  2011-08-15  8:10         ` Keir Fraser
  2011-08-15  8:12         ` Jan Beulich
  0 siblings, 2 replies; 36+ messages in thread
From: Tian, Kevin @ 2011-08-15  7:57 UTC (permalink / raw)
  To: Keir Fraser, xen-devel
  Cc: Zhang, Yang Z, 'Keir Fraser (keir@xen.org)', Wei, Gang

> From: Keir Fraser [mailto:keir.xen@gmail.com]
> Sent: Monday, August 15, 2011 3:45 PM
> 
> On 15/08/2011 08:13, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> >> Well the problem is some older kernels will try to use MWAIT if they see the
> >> feature in CPUID. Of course that doesn't work outside ring 0.
> >>
> >
> > if those old kernels are still of interest, then possibly a boot option in Xen
> > would
> > be applausive. Or can we just allow selectively exposing MWAIT when xen
> > cpuidle is enabled?
> 
> The kernel could unconditionally advertise MWAIT from its cpuid pv_ops hook?
> If all that's doing is causing relevant parts of BIOS tables to be parsed,
> would that be safe when MWAIT is not in fact available?

It's not safe to unconditionally advertise MWAIT, since if underlying CPU doesn't
support it you'll get invalid op in Xen when attempting to use it. This is why I
want to see MWAIT expose to guest base on real cpu capability, even when I
say selectively expose it under xen cpuidle. :-)

> 
> Else the kernel could get the flag from the real non paravirtualised CPUID
> instruction.

linux uses cpu_has to extract mwait capability. To use real cpuid instruction, then
we need change Linux code which is not worthy though, like below:
       if (!cpu_has(c, X86_FEATURE_MWAIT))
               buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
If we make it into cpu_has bits, then it lacks of original guarding effect.

So how about the change like below?

emulate_forced_invalid_op:
-        __clear_bit(X86_FEATURE_MWAIT % 32, &c);
+        if ( !IS_PRIV(current->domain) || !xen_cpuidle )
+            __clear_bit(X86_FEATURE_MWAIT % 32, &c);


Thanks
Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: expose MWAIT to dom0
  2011-08-15  7:13   ` Tian, Kevin
@ 2011-08-15  7:44     ` Keir Fraser
  2011-08-15  7:57       ` Tian, Kevin
  0 siblings, 1 reply; 36+ messages in thread
From: Keir Fraser @ 2011-08-15  7:44 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: Zhang, Yang Z, 'Keir Fraser (keir@xen.org)', Wei, Gang

On 15/08/2011 08:13, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> Well the problem is some older kernels will try to use MWAIT if they see the
>> feature in CPUID. Of course that doesn't work outside ring 0.
>> 
> 
> if those old kernels are still of interest, then possibly a boot option in Xen
> would
> be applausive. Or can we just allow selectively exposing MWAIT when xen
> cpuidle is enabled?

The kernel could unconditionally advertise MWAIT from its cpuid pv_ops hook?
If all that's doing is causing relevant parts of BIOS tables to be parsed,
would that be safe when MWAIT is not in fact available?

Else the kernel could get the flag from the real non paravirtualised CPUID
instruction.

 -- Keir

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: expose MWAIT to dom0
  2011-08-15  6:54 ` Keir Fraser
@ 2011-08-15  7:13   ` Tian, Kevin
  2011-08-15  7:44     ` Keir Fraser
  0 siblings, 1 reply; 36+ messages in thread
From: Tian, Kevin @ 2011-08-15  7:13 UTC (permalink / raw)
  To: Keir Fraser, xen-devel
  Cc: Zhang, Yang Z, 'Keir Fraser (keir@xen.org)', Wei, Gang

> From: Keir Fraser [mailto:keir.xen@gmail.com]
> Sent: Monday, August 15, 2011 2:55 PM
> 
> On 15/08/2011 06:35, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > Above trick is ugly and error-prone, since it always enable mwait regardless
> > of
> > actual CPU capability. It's unlikely to make into upstream, and also get lost
> > in
> > into some distro such as SLES11.
> >
> > Instead of enhancing current approach (e.g. add a separate channel to reveal
> > mwait capability instead of using cpuid), I'm curious why we don't expose
> > mwait
> > in cpuid to dom0 directly. At least original comment in 17573 looks obscure,
> > since EIST has nothing to do with Cx...
> 
> Well the problem is some older kernels will try to use MWAIT if they see the
> feature in CPUID. Of course that doesn't work outside ring 0.
> 

if those old kernels are still of interest, then possibly a boot option in Xen would
be applausive. Or can we just allow selectively exposing MWAIT when xen 
cpuidle is enabled?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: expose MWAIT to dom0
  2011-08-15  5:35 Tian, Kevin
@ 2011-08-15  6:54 ` Keir Fraser
  2011-08-15  7:13   ` Tian, Kevin
  2011-08-15  8:02 ` Jan Beulich
  1 sibling, 1 reply; 36+ messages in thread
From: Keir Fraser @ 2011-08-15  6:54 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: Zhang, Yang Z, 'Keir Fraser (keir@xen.org)', Wei, Gang

On 15/08/2011 06:35, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Above trick is ugly and error-prone, since it always enable mwait regardless
> of
> actual CPU capability. It's unlikely to make into upstream, and also get lost
> in
> into some distro such as SLES11.
> 
> Instead of enhancing current approach (e.g. add a separate channel to reveal
> mwait capability instead of using cpuid), I'm curious why we don't expose
> mwait
> in cpuid to dom0 directly. At least original comment in 17573 looks obscure,
> since EIST has nothing to do with Cx...

Well the problem is some older kernels will try to use MWAIT if they see the
feature in CPUID. Of course that doesn't work outside ring 0.

 -- Keir

^ permalink raw reply	[flat|nested] 36+ messages in thread

* expose MWAIT to dom0
@ 2011-08-15  5:35 Tian, Kevin
  2011-08-15  6:54 ` Keir Fraser
  2011-08-15  8:02 ` Jan Beulich
  0 siblings, 2 replies; 36+ messages in thread
From: Tian, Kevin @ 2011-08-15  5:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Zhang, Yang Z, 'Keir Fraser (keir@xen.org)', Wei, Gang

There're basically two methods to enter a given C-state: legacy (hlt + I/O read),
and native(using mwait). MWAIT is always preferred when both underlying CPU
and OS support, which is a more efficient way to conduct C-state transition.

Xen PM relies on Dom0 to parse ACPI Cx/Px information, which involves one
step to notify BIOS about a set of capabilities supported by OSPM. One capability
is about mwait support, which if true ACPI Cx entry contains entry parameters 
for mwait, or else I/O port information is provided. Xen PM later decides entry
method (i/o or mwait) based on parsed ACPI information from dom0.

However Xen doesn't expose MWAIT capability to dom0 due to changeset 17573:

# HG changeset patch
# User Keir Fraser <keir.fraser@citrix.com>
# Date 1210065934 -3600
# Node ID 777f294e3be81a4d0825e3a9b633a8d81c37f613
# Parent  d5589865bfce91bf65c34bd56e466bf26ca4176f
x86, Intel: Make only EST feature visible to dom0 to enable Cx-state
logic. There should be no need to make MWAIT visible.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
diff -r d5589865bfce -r 777f294e3be8 xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c      Tue May 06 10:19:10 2008 +0100
+++ b/xen/arch/x86/traps.c      Tue May 06 10:25:34 2008 +0100
@@ -713,8 +713,7 @@ static int emulate_forced_invalid_op(str
         __clear_bit(X86_FEATURE_PBE, &d);
 
         __clear_bit(X86_FEATURE_DTES64 % 32, &c);
-        if ( !IS_PRIV(current->domain) )
-            __clear_bit(X86_FEATURE_MWAIT % 32, &c);
+        __clear_bit(X86_FEATURE_MWAIT % 32, &c);
         __clear_bit(X86_FEATURE_DSCPL % 32, &c);
         __clear_bit(X86_FEATURE_VMXE % 32, &c);
         __clear_bit(X86_FEATURE_SMXE % 32, &c);
[...]

This then brings a problem to Dom0 which thinks underlying CPU
doesn't report mwait, and thus notify BIOS to use old I/O based method.

Later a trick is integrated in Jeremy's pvops tree:

commit 4151815a222723002d727ef0a89f0756d4e7d3d2
Author: Wei Gang <gang.wei@intel.com>
Date:   Mon Apr 5 14:21:18 2010 -0700

    xen/acpi: re-enable mwait for xen cpuidle
    
    Xen hypervisor doesn't export mwait feature to dom0, but latest Linux
    kernel start to check this feature while initializing _PDC
    object. Bypass such check in pv-ops case to re-enable mwait for xen
    cpuidle.
    
    Signed-off-by: Wei Gang <gang.wei@intel.com>
    Signed-off-by: Yu Ke <ke.yu@intel.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

diff --git a/arch/x86/kernel/acpi/processor.c b/arch/x86/kernel/acpi/processor.c
index 8c9526d..d88866c 100644
--- a/arch/x86/kernel/acpi/processor.c
+++ b/arch/x86/kernel/acpi/processor.c
@@ -60,7 +60,7 @@ static void init_intel_pdc(struct acpi_processor *pr, struct cpuinfo_x86 *c)
        /*
         * If mwait/monitor is unsupported, C2/C3_FFH will be disabled
         */
-       if (!cpu_has(c, X86_FEATURE_MWAIT))
+       if (!cpu_has(c, X86_FEATURE_MWAIT) && !xen_initial_domain())
                buf[2] &= ~(ACPI_PDC_C_C2C3_FFH);
 
        obj->type = ACPI_TYPE_BUFFER;

Above trick is ugly and error-prone, since it always enable mwait regardless of
actual CPU capability. It's unlikely to make into upstream, and also get lost in
into some distro such as SLES11.

Instead of enhancing current approach (e.g. add a separate channel to reveal
mwait capability instead of using cpuid), I'm curious why we don't expose mwait
in cpuid to dom0 directly. At least original comment in 17573 looks obscure, 
since EIST has nothing to do with Cx...

Thanks
Kevin

^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2011-09-05  8:14 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-18 12:35 expose MWAIT to dom0 Jan Beulich
2011-08-19  1:31 ` Tian, Kevin
2011-08-19  8:10   ` Jan Beulich
2011-08-19  8:34     ` Tian, Kevin
2011-08-19  9:32       ` Jan Beulich
2011-08-19  9:39         ` Tian, Kevin
2011-08-19 12:55           ` Jan Beulich
2011-08-19 15:01           ` Jan Beulich
2011-08-21  5:26             ` Tian, Kevin
2011-08-25 12:37               ` Jan Beulich
2011-08-26  2:18                 ` Tian, Kevin
2011-09-05  8:14                   ` Jan Beulich
  -- strict thread matches above, loose matches on Subject: below --
2011-08-15  5:35 Tian, Kevin
2011-08-15  6:54 ` Keir Fraser
2011-08-15  7:13   ` Tian, Kevin
2011-08-15  7:44     ` Keir Fraser
2011-08-15  7:57       ` Tian, Kevin
2011-08-15  8:10         ` Keir Fraser
2011-08-15  8:14           ` Tian, Kevin
2011-08-15  8:41             ` Jan Beulich
2011-08-15  9:02             ` Keir Fraser
2011-08-16  5:34               ` Tian, Kevin
2011-08-16  6:31                 ` Keir Fraser
2011-08-15  8:12         ` Jan Beulich
2011-08-15  8:02 ` Jan Beulich
2011-08-15  8:09   ` Tian, Kevin
2011-08-15  8:18     ` Jan Beulich
2011-08-15 10:29     ` Jan Beulich
2011-08-16  6:03       ` Tian, Kevin
2011-08-16  6:42         ` Jan Beulich
2011-08-16  6:53           ` Tian, Kevin
2011-08-16  8:13             ` Jan Beulich
2011-08-16  8:29               ` Tian, Kevin
2011-08-16  8:45                 ` Jan Beulich
2011-08-16  8:50                   ` Tian, Kevin
2011-08-15 12:44   ` Konrad Rzeszutek Wilk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.