From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Tian, Kevin"
Subject: RE: expose MWAIT to dom0
Date: Fri, 26 Aug 2011 10:18:37 +0800
Message-ID: <625BA99ED14B2D499DC4E29D8138F15063007E01DB@shsmsx502.ccr.corp.intel.com>
References: <4E4D23370200007800051D2C@nat28.tlf.novell.com>
 <625BA99ED14B2D499DC4E29D8138F15062EB8CC157@shsmsx502.ccr.corp.intel.com>
 <4E4E369F0200007800051F2A@nat28.tlf.novell.com>
 <625BA99ED14B2D499DC4E29D8138F15062EB8CC533@shsmsx502.ccr.corp.intel.com>
 <4E4E49C40200007800051F7B@nat28.tlf.novell.com>
 <625BA99ED14B2D499DC4E29D8138F15062EB8CC5AC@shsmsx502.ccr.corp.intel.com>
 <4E4E9707020000780005206C@nat28.tlf.novell.com>
 <625BA99ED14B2D499DC4E29D8138F15062F09DB9A7@shsmsx502.ccr.corp.intel.com>
 <4E565E0D0200007800053391@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <4E565E0D0200007800053391@nat28.tlf.novell.com>
Content-Language: en-US
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Jan Beulich
Cc: "Zhang, Yang Z", "xen-devel@lists.xensource.com", Keir Fraser, "Wei, Gang", "'Konrad Rzeszutek Wilk (konrad.wilk@oracle.com)'"
List-Id: xen-devel@lists.xenproject.org

> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Thursday, August 25, 2011 8:37 PM
>
> >>> On 21.08.11 at 07:26, "Tian, Kevin" wrote:
> >> From: Jan Beulich [mailto:JBeulich@novell.com]
> >> Sent: Friday, August 19, 2011 11:02 PM
> >> > >> Yet another idea - why don't we simply pass the buffer passed to
> >> > >> arch_acpi_set_pdc_bits() down to Xen, rather than fiddling with the
> >> > >> bits in Dom0? That would at once allow to not set ACPI_PDC_T_FFH
> >> > >> (which I don't think Xen really supports at present).
> >> > >>
> >> > >> Or really, depending on who controls what, the P, C, and T bits should
> >> > >> be set by either Dom0 or Xen (so e.g.
let Dom0 do what it currently
> >> > >> does, and then let Xen override the bits it ought to control).
> >> > >
> >> > > _PDC is encoded in AML language, and requires an ACPI parser which
> >> > > is one thing we avoid in Xen. If Xen want to override those bits, then
> >> > > whole ACPI component needs move down to Xen too.
> >> >
> >> > No, I'm not saying the evaluation should be happening there. Below is
> >> > a draft hypervisor patch (only compile tested so far).
> >>
> >> Attached a patch that actually works (with a minimal Dom0 addition).
> >>
> >
> > yes, this change looks more straightforward. :-)
>
> With that in, we still have more deficiencies compared to native Linux.

Definitely there will be even more than what has been revealed today, due to
the way the dom0 ACPI processor driver is tightly bound. There are lots of
factors in dom0 itself which may affect the verification/filtering of Cx
entries provided by the BIOS, some of which should be avoided from Xen's
point of view, such as the second example you just found. More severe is
that working around those factors adds intrusive Xen awareness to the
generic ACPI processor driver, e.g.

@@ -780,7 +780,7 @@ static int acpi_processor_get_power_info
 			  current_count));

 	/* Validate number of power states discovered */
-	if (current_count < 2)
+	if (current_count < 1 + !processor_pm_external())
 		status = -EFAULT;

       end:

The more changes like the above are added, the less likely the Xen PM
changes are to be accepted upstream. Also, such specific changes made
against one dom0 version may quickly become invalid in a newer version. The
change above is one example that no longer holds in newer kernels.

When working with Konrad on rebasing the Xen PM patches to the latest Linux
3.0.0, we tried hard to avoid intrusive changes in the generic ACPI
processor driver, by invoking existing interfaces at as high a level as
possible.
The end result is that we skip handling corner cases like the example
above for now, while at least making Xen PM work on the majority of boxes.
Later, once Xen PM is accepted upstream and the Linux ACPI people are more
aware of Xen, the handling of those corner cases may be improved gradually.

Another option, which Yang is currently working on, is to port the native
intel-idle driver to Xen, which should avoid the nasty dependency on dom0
ACPI bits and be immune to various BIOS bugs.

>
> For one, we don't use mwait when ACPI doesn't tell us to, while Linux
> does (in the intel_idle driver for deeper C-states, and for C1 also via
> mwait_idle()). This is likely a bit more work, but it should be possible to
> construct C-state information from CPUID leaf 5 (and, if valid, ignore
> information passed down from Dom0), which would match intel_idle's
> taking precedence over acpi_idle in Linux.

Yes. This would be a desirable feature in Xen, with some limitations:
- does not work with CPU hotplug
- does not work with old boxes (supported starting from Nehalem)
- does not work with Px/Cx state changes (_PPC, _CST, e.g. from Node Manager)

So this will be a supplemental option to the existing acpi_idle, and should
work in most cases where the above three factors are not a concern.

>
> Second, if only C1 gets announced by ACPI, we end up not using it
> because Dom0 simply neglects to let the hypervisor know. This is
> because acpi_processor_get_power_info_cst() (back to at least
> 2.6.16) returns -EFAULT if less than two C-states were found. Simply
> prefixing the check with "!processor_pm_external() && " fixes this
> (but I don't know whether something similar could be done in Jeremy's
> tree).

This is a very temporary problem which quickly disappears in subsequent
versions. But if we just take 2.6.18-xen, it is the right fix.

Thanks
Kevin
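
For reference, the idea quoted above of constructing C-state information
from CPUID leaf 5 can be sketched roughly as below. This is only an
illustration, not code from Xen or from any patch in this thread: the
function and macro names are made up, and it assumes the usual leaf 5
layout (ECX bit 0 says the EDX enumeration is valid, and EDX holds one
4-bit MWAIT sub-state count per C-state, starting with C0 in bits 3:0).
The indices are MWAIT C-states, which need not map one-to-one onto ACPI
C-states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define MWAIT_SUBSTATE_BITS 4      /* width of each per-C-state field in EDX */
#define MWAIT_SUBSTATE_MASK 0xfu
#define MWAIT_MAX_CSTATES   8      /* 32 bits / 4 bits per field */

/* CPUID.05H:ECX bit 0: the sub-state enumeration in EDX is supported. */
#define CPUID5_ECX_EXTENSIONS_SUPPORTED 0x1u

/* Number of MWAIT sub-states reported in EDX for the given C-state index. */
static unsigned int mwait_substates(uint32_t edx, unsigned int cstate)
{
    assert(cstate < MWAIT_MAX_CSTATES);
    return (edx >> (cstate * MWAIT_SUBSTATE_BITS)) & MWAIT_SUBSTATE_MASK;
}

/*
 * Deepest MWAIT C-state index (1 = C1, 2 = C2, ...) with at least one
 * sub-state. Returns 0 if the enumeration is not valid or nothing deeper
 * than C0 is reported, in which case a caller would presumably fall back
 * to whatever Dom0 passed down from ACPI.
 */
static unsigned int mwait_deepest_cstate(uint32_t ecx, uint32_t edx)
{
    unsigned int c, deepest = 0;

    if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED))
        return 0;

    for (c = 1; c < MWAIT_MAX_CSTATES; c++)
        if (mwait_substates(edx, c))
            deepest = c;

    return deepest;
}
```

The ECX/EDX values would come from executing CPUID with EAX=5 on the
target CPU. The decoding is kept separate from the instruction itself so
that it can be checked with fixed, made-up register values.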