From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Borislav Petkov <bp@alien8.de>,
	Stefan Bader <stefan.bader@canonical.com>,
	Andre Przywara <andre@andrep.de>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Matthew Garrett <mjg@redhat.com>
Subject: Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
Date: Fri, 18 Jan 2013 14:00:15 -0500
Message-ID: <20130118190015.GC11351@phenom.dumpdata.com>
In-Reply-To: <20130115181839.GC8101@liondog.tnic>

On Tue, Jan 15, 2013 at 07:18:39PM +0100, Borislav Petkov wrote:
> On Tue, Jan 15, 2013 at 12:53:05PM -0500, Konrad Rzeszutek Wilk wrote:
> > > I don't think that's the right change - this is fixing baremetal so that
> > > it works on xen. And besides, this code was in powernow-k8 before so I'm
> > > wondering why did it work then.
> > 
> > Powernow-k8 only populated the cpufreq policy information. This library
> > (processor_perflib) is the generic library used for ACPI P-states parsing.
> > This specific function (acpi_processor_get_performance_states) is just
> > used to fetch and parse the P-states.
> > 
> > Xen-acpi-processor (which we use to upload the P and C-states to the
> > hypervisor) ends up calling this library to parse the P-states
> > and this unfortunate quirk clamps the P-states based on the MSRS.
> 
> Huh? This is a fix for _PSS frequency values which are rounded and thus
> imprecise. The _PSS objects are the unfortunate ones, as most of the
> other crap BIOS produces.

I did not explain myself well. The fix is OK - it is just that the hypervisor
causes the quirk to not work correctly. Hmm, I wonder if there are BIOSes
that do the same thing (cause the MSR to return 0). Per your estimation
of BIOS quality, it seems that this could happen.

> 
> > It is odd that this CPU specific quirk got added in this generic
> > library. Is there no ACPI quirk system similar to how DMI quirks are
> > handled?
> 
> Even if there were, do you know all the boards and BIOS revisions which
> have those rounded values? The fix addresses the hardware which has
> those 50MHz multiples and simply ignores the _PSS data but reads out the
> P-states directly from the hardware.

Oh, I was not thinking of DMI per se. I was thinking of something similar to
the DMI-quirk API, but for the ACPI subsystem, so it would be:

	if (ARM)
		... these quirks necessary
	if (AMD)
		... these quirks

and then the ACPI code can make the calls to this ACPI-quirk API to
figure out whether it needs to modulate values. But this is all
hand-waving at this point.
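
The ACPI-quirk idea above could be sketched roughly as a vendor-keyed table
of fixup callbacks that the generic P-state parser walks. This is only an
illustration: every name here (cpu_vendor, px_state, acpi_px_quirk,
acpi_apply_px_quirks) is hypothetical and does not exist in the ACPI code,
and the AMD callback just marks the entry instead of reading an MSR.

```c
#include <stddef.h>

/* All names below are hypothetical; none of them exist in the ACPI code. */
enum cpu_vendor { VENDOR_AMD, VENDOR_ARM, VENDOR_OTHER };

struct px_state {
    unsigned int core_frequency; /* MHz, as reported by _PSS */
    unsigned int control;
    int fixed_up;                /* set once a quirk has touched the entry */
};

typedef void (*px_quirk_fn)(struct px_state *px);

struct acpi_px_quirk {
    enum cpu_vendor vendor;
    px_quirk_fn fixup;
};

static void amd_px_fixup(struct px_state *px)
{
    /* A real quirk would re-derive core_frequency from the P-state MSR. */
    px->fixed_up = 1;
}

/* One entry per vendor-specific quirk; the generic code only sees the table. */
static const struct acpi_px_quirk px_quirks[] = {
    { VENDOR_AMD, amd_px_fixup },
};

/* Run every quirk registered for this vendor; return how many were applied. */
static int acpi_apply_px_quirks(enum cpu_vendor vendor, struct px_state *px)
{
    size_t i;
    int applied = 0;

    for (i = 0; i < sizeof(px_quirks) / sizeof(px_quirks[0]); i++) {
        if (px_quirks[i].vendor == vendor) {
            px_quirks[i].fixup(px);
            applied++;
        }
    }
    return applied;
}
```

With a table like this, the generic library would make one call per parsed
state and the vendor-specific logic would live next to the vendor code.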
> 
> > Anyhow, I think this patch makes sense - it makes sure that the MSR
> > value is sane.
> 
> I agree to a certain degree. Testing the Valid bit is something we
> should do for P-state MSRs - and for all MSRs containing a Valid bit,
> for that matter - and the original code didn't do it.

OK.
> 
> However, you need to push down the *correct* frequencies *after* the
> quirk to the hypervisor (I'm looking at push_pxx_to_hypervisor()) so
> that it is aware of the exact P-state frequencies this CPU supports and
> not some rounded values.

Yes! It could be done in the hypervisor (it reads the MSRs and figures out
that the P-states need tweaking).

> 
> AFAICT for the xen part, of course. But the baseline stands: you need to
> tell the thing that switches P-states the exact P-state frequencies of
> the CPU. :-).

Right, that information is gathered from the MSRs. I think Xen would
need to do this since it can read the MSRs correctly and modify the P-states.

So something like this in the hypervisor maybe (not even tested):

diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c
index a9b7792..54e7808 100644
--- a/xen/arch/x86/acpi/cpufreq/powernow.c
+++ b/xen/arch/x86/acpi/cpufreq/powernow.c
@@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy,
 
     return 0;
 }
+#define MSR_AMD_PSTATE_DEF_BASE     0xc0010064
+static void amd_fixup_frequency(struct xen_processor_px *px, int i)
+{
+	u32 hi, lo, fid, did;
+	int index = px->control & 0x00000007;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+		return;
+
+	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
+	    || boot_cpu_data.x86 == 0x11) {
+		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
+		/* Bit 63 indicates whether contents are valid */
+		if (!(hi & 0x80000000))
+			return;
+
+		fid = lo & 0x3f;
+		did = (lo >> 6) & 7;
+		if (boot_cpu_data.x86 == 0x10)
+			px->core_frequency = (100 * (fid + 0x10)) >> did;
+		else
+			px->core_frequency = (100 * (fid + 8)) >> did;
+	}
+}
+
+static void amd_fixup_freq(struct processor_performance *perf)
+{
 
+    int i;
+
+    for (i = 0; i < perf->state_count; i++)
+        amd_fixup_frequency(&perf->states[i], i);
+
+}
 static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
 {
     struct acpi_cpufreq_data *data;
@@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
 
     perf = &processor_pminfo[policy->cpu]->perf;
 
+    amd_fixup_freq(perf);
+
     cpufreq_verify_within_limits(policy, 0, 
         perf->states[perf->platform_limit].core_frequency * 1000);
 
