linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] acpi processor and cpufreq harester - aka pipe all of that up to the hypervisor (v3)
@ 2012-02-14  5:06 Konrad Rzeszutek Wilk
  2012-02-14  5:06 ` [PATCH 1/3] xen/setup/pm/acpi: Remove the call to boot_option_idle_override Konrad Rzeszutek Wilk
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-02-14  5:06 UTC (permalink / raw)
  To: ke.yu, kevin.tian, JBeulich, linux-kernel, linux-acpi
  Cc: xen-devel, Konrad Rzeszutek Wilk


Changelog [v3:]
 - new name and decided to put it in drivers/xen since it uses APIs from both cpufreq and acpi.
 - updated to expose MWAIT capability
 - cleaned up the code a bit.
[since v2 - not posted]:
 - change the name to processor_passthrough_xen and move it to drivers/acpi
 - make it launch a thread, support CPU hotplug
[since v1: http://comments.gmane.org/gmane.linux.acpi.devel/51862]
 - initial posting.

The problem these three patches try to solve is to provide ACPI power management
information to the hypervisor. The hypervisor lacks the ACPI DSDT parser so it can't
get that data without some help - and the initial domain can provide that. One
approach (https://lkml.org/lkml/2011/11/30/245) augments the ACPI code to call
an external PM code - but there were no comments about it so I decided to see
if another approach could solve it.

This "harvester" (I am horrible with names, if you have any suggestions please
tell me them) collects the information that the cpufreq drivers and the
ACPI processor code save in the 'struct acpi_processor' and then sends it to
the hypervisor.

The driver can be either an module or compiled in. In either mode the driver
launches a thread that checks whether an cpufreq driver is registered. If so
it reads all the 'struct acpi_processor' data for all online CPUs and sends
it to hypervisor. The driver also register a CPU hotplug component - so if a new
CPU shows up - it would send the data to the hypervisor for it as well.

I've tested this with success on a variety of Intel and AMD hardware (need
a patch to the hypervisor to allow the rdmsr to be passed through). The one
caveat is that dom0_max_vcpus inhibits the driver from reading the vCPUs
that are not present in dom0. One solution is to boot without dom0_max_vcpus
and utilize the 'xl vcpu-set' command to offline the vCPUs. Other one that
Nakajima Jun suggested was to hotplug vCPUS in - so bootup dom0 and hotplug
the vCPUs in - but I am running in difficulties on how to do this in the hypervisor.

Konrad Rzeszutek Wilk (3):
      xen/setup/pm/acpi: Remove the call to boot_option_idle_override.
      xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it.
      xen/acpi/cpufreq: Provide an driver that passes struct acpi_processor data to the hypervisor.

 arch/x86/xen/enlighten.c         |   92 +++++++++-
 arch/x86/xen/setup.c             |    1 -
 drivers/xen/Kconfig              |   14 ++
 drivers/xen/Makefile             |    2 +-
 drivers/xen/processor-harvest.c  |  397 ++++++++++++++++++++++++++++++++++++++
 include/xen/interface/platform.h |    4 +-
 6 files changed, 506 insertions(+), 4 deletions(-)


Oh, and the hypervisor patch to make this work under AMD:
# HG changeset patch
# Parent 9ad1e42c341bc78463b6f6610a6300f75b535fbb
traps: AMD PM MSRs (MSR_K8_PSTATE_CTRL, etc)

The restriction to read and write the AMD power management MSRs is gated if the
domain 0 is the PM domain (so FREQCTL_dom0_kernel is set). But we can
relax this restriction and allow the privileged domain to read the MSRs
(but not write). This allows the priviliged domain to harvest the power
management information (ACPI _PSS states) and send it to the hypervisor.

TODO: Have not tested on K7 machines.
TODO: Have not tested this with XenOLinux 2.6.32 dom0 on AMD machines.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff -r 9ad1e42c341b xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c	Fri Feb 10 17:24:50 2012 +0000
+++ b/xen/arch/x86/traps.c	Mon Feb 13 23:11:59 2012 -0500
@@ -2457,7 +2457,7 @@ static int emulate_privileged_op(struct 
         case MSR_K8_HWCR:
             if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
                 goto fail;
-            if ( !is_cpufreq_controller(v->domain) )
+            if ( !is_cpufreq_controller(v->domain) && !IS_PRIV(v->domain) )
                 break;
             if ( wrmsr_safe(regs->ecx, msr_content) != 0 )
                 goto fail;

^ permalink raw reply	[flat|nested] 14+ messages in thread
* [PATCH] processor passthru - upload _Cx and _Pxx data to hypervisor (v5).
@ 2012-02-23 22:31 Konrad Rzeszutek Wilk
  2012-02-23 22:31 ` [PATCH 2/3] xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-02-23 22:31 UTC (permalink / raw)
  To: ke.yu, kevin.tian, JBeulich, linux-kernel, linux-acpi, lenb
  Cc: xen-devel, Konrad Rzeszutek Wilk

The problem these three patches try to solve is to provide ACPI power management
information to the Xen hypervisor. The hypervisor lacks the ACPI DSDT parser so it can't
get that data without some help - and the initial domain can provide that. One
approach (https://lkml.org/lkml/2011/11/30/245) augments the ACPI code to call
an external PM code - but there were no comments about it so I decided to see
if another approach could solve it.

This module (processor-passthru)  collects the information that the cpufreq
drivers and the ACPI processor code save in the 'struct acpi_processor' and
then uploads it to the hypervisor.

The driver can be either an module or compiled in. In either mode the driver
launches a thread that checks whether an cpufreq driver is registered. If so
it reads all the 'struct acpi_processor' data for all online CPUs and sends
it to hypervisor. The driver also register a CPU hotplug component - so if a new
CPU shows up - it would send the data to the hypervisor for it as well.
Furthermore it also verifies whether the ACPI ID count is different than what
the kernel sees (which is possible with dom0_max_vcpus) and if so uploads
the data for the other ACPI IDs.

The patches are available in this git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/processor-passthru.v5


Konrad Rzeszutek Wilk (3):
      xen/setup/pm/acpi: Remove the call to boot_option_idle_override.
      xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it.
      xen/processor-passthru: Provide an driver that passes struct acpi_processor data to the hypervisor.

 arch/x86/xen/enlighten.c         |   92 +++++++-
 arch/x86/xen/setup.c             |    1 -
 drivers/xen/Kconfig              |   14 +
 drivers/xen/Makefile             |    2 +-
 drivers/xen/processor-passthru.c |  485 ++++++++++++++++++++++++++++++++++++++
 include/xen/interface/platform.h |    4 +-
 6 files changed, 594 insertions(+), 4 deletions(-)


P.S.
On the hypervisor side, it requires this patch on AMD:
# HG changeset patch
# Parent aea8cfac8cf1afe397f2e1d422a852008d8a83fe
traps: AMD PM RDMSRs (MSR_K8_PSTATE_CTRL, etc)

The restriction to read and write the AMD power management MSRs is gated if the
domain 0 is the PM domain (so FREQCTL_dom0_kernel is set). But we can
relax this restriction and allow the privileged domain to read the MSRs
(but not write). This allows the priviliged domain to harvest the power
management information (ACPI _PSS states) and send it to the hypervisor.

This patch works fine with older classic dom0 (2.6.32) and with
AMD K7 and K8 boxes.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
diff -r aea8cfac8cf1 xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c	Thu Feb 23 13:23:02 2012 -0500
+++ b/xen/arch/x86/traps.c	Thu Feb 23 13:29:00 2012 -0500
@@ -2484,7 +2484,7 @@ static int emulate_privileged_op(struct 
         case MSR_K8_PSTATE7:
             if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
                 goto fail;
-            if ( !is_cpufreq_controller(v->domain) )
+            if ( !is_cpufreq_controller(v->domain) && !IS_PRIV(v->domain) )
             {
                 regs->eax = regs->edx = 0;
                 break;



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-02-27  8:14 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-14  5:06 [RFC] acpi processor and cpufreq harester - aka pipe all of that up to the hypervisor (v3) Konrad Rzeszutek Wilk
2012-02-14  5:06 ` [PATCH 1/3] xen/setup/pm/acpi: Remove the call to boot_option_idle_override Konrad Rzeszutek Wilk
2012-02-14  5:06 ` [PATCH 2/3] xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it Konrad Rzeszutek Wilk
2012-02-14  5:06 ` [PATCH 3/3] xen/acpi/cpufreq: Provide an driver that passes struct acpi_processor data to the hypervisor Konrad Rzeszutek Wilk
2012-02-14 18:30 ` [Xen-devel] [RFC] acpi processor and cpufreq harester - aka pipe all of that up to the hypervisor (v3) Pasi Kärkkäinen
2012-02-15 16:33   ` Konrad Rzeszutek Wilk
2012-02-21  0:07   ` [RFC] follow-on patches to acpi processor and cpufreq harvester^H^H^Hpassthru (v4) Konrad Rzeszutek Wilk
2012-02-21  0:07     ` [PATCH 1/3] xen/processor-passthru: Change the name to processor-passthru Konrad Rzeszutek Wilk
2012-02-21  0:07     ` [PATCH 2/3] xen/processor-passthru: Support vCPU != pCPU - aka dom0_max_vcpus Konrad Rzeszutek Wilk
2012-02-21  0:07     ` [PATCH 3/3] xen/processor-passthru: Remove the print_hex_dump - as it is difficult to decipher it Konrad Rzeszutek Wilk
2012-02-23 22:31 [PATCH] processor passthru - upload _Cx and _Pxx data to hypervisor (v5) Konrad Rzeszutek Wilk
2012-02-23 22:31 ` [PATCH 2/3] xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it Konrad Rzeszutek Wilk
2012-02-24 10:32   ` Jan Beulich
2012-02-24 23:52     ` Konrad Rzeszutek Wilk
2012-02-27  8:14       ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).