All of lore.kernel.org
 help / color / mirror / Atom feed
* kernel 3.7+ cpufreq regression on AMD system running as dom0
@ 2013-01-14 15:58 Stefan Bader
  2013-01-14 16:34 ` Borislav Petkov
  2013-01-15 13:04 ` Matt Wilson
  0 siblings, 2 replies; 28+ messages in thread
From: Stefan Bader @ 2013-01-14 15:58 UTC (permalink / raw)
  To: xen-devel, Linux Kernel Mailing List, Rafael J. Wysocki,
	Konrad Rzeszutek Wilk
  Cc: Matthew Garrett, Andre Przywara

[-- Attachment #1: Type: text/plain, Size: 2799 bytes --]

Starting with kernel v3.7 the following commit added a quirk
to obtain the real frequencies of certain AMD systems:

commit f594065faf4f9067c2283a34619fc0714e79a98d
Author: Matthew Garrett <mjg@redhat.com>
Date:   Tue Sep 4 08:28:06 2012 +0000

    ACPI: Add fixups for AMD P-state figures

When running bare-metal, on my Opteron 6128 test box results
in the frequencies remaining effectively unchanged:
[    5.475735] P0: MSR(hi,lo): 8000015c-50004004
[    5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000
[    5.484001] P1: MSR(hi,lo): 8000014c-50004a4e
[    5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500
[    5.492272] P2: MSR(hi,lo): 80000141-50005048
[    5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200
[    5.500540] P3: MSR(hi,lo): 80000138-50005844
[    5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000
[    5.508812] P4: MSR(hi,lo): 80000131-50005c40
[    5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800

However running as dom0 under Xen 4.2, reading this MSR returns
null:
[   11.613068] P0: MSR(hi,lo): 00000000-00000000
[   11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600
[   11.613078] P1: MSR(hi,lo): 00000000-00000000
[   11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600
[   11.613085] P2: MSR(hi,lo): 00000000-00000000
[   11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600
[   11.613091] P3: MSR(hi,lo): 00000000-00000000
[   11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600
[   11.613098] P4: MSR(hi,lo): 00000000-00000000
[   11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600

And this results in Xen failing to change the governor:
  "(XEN) Fail change to ondemand governor"

I suppose this ultimately requires some support in the hypervisor
to pass through the real values. But since this is at least on my
combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a
regression compared to older kernels, I wonder whether the following
change might be something that should go into mainline:

--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px
        if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
            || boot_cpu_data.x86 == 0x11) {
                rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
+               /* Bit 63 indicates whether contents are valid */
+               if (!(hi & 0x8000000))
+                       return;
                fid = lo & 0x3f;
                did = (lo >> 6) & 7;
                if (boot_cpu_data.x86 == 0x10)

I tested something similar (so hopefully I have not failed on slapping
together a cleaned up version), which did resolve the problem.

-Stefan

Reference: https://bugs.launchpad.net/bugs/1078619


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-14 15:58 kernel 3.7+ cpufreq regression on AMD system running as dom0 Stefan Bader
@ 2013-01-14 16:34 ` Borislav Petkov
  2013-01-14 16:55   ` [Xen-devel] " Jan Beulich
                     ` (2 more replies)
  2013-01-15 13:04 ` Matt Wilson
  1 sibling, 3 replies; 28+ messages in thread
From: Borislav Petkov @ 2013-01-14 16:34 UTC (permalink / raw)
  To: Stefan Bader, Andre Przywara
  Cc: xen-devel, Linux Kernel Mailing List, Rafael J. Wysocki,
	Konrad Rzeszutek Wilk, Matthew Garrett

On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:
> Starting with kernel v3.7 the following commit added a quirk
> to obtain the real frequencies of certain AMD systems:
> 
> commit f594065faf4f9067c2283a34619fc0714e79a98d
> Author: Matthew Garrett <mjg@redhat.com>
> Date:   Tue Sep 4 08:28:06 2012 +0000
> 
>     ACPI: Add fixups for AMD P-state figures
> 
> When running bare-metal, on my Opteron 6128 test box results
> in the frequencies remaining effectively unchanged:
> [    5.475735] P0: MSR(hi,lo): 8000015c-50004004
> [    5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000
> [    5.484001] P1: MSR(hi,lo): 8000014c-50004a4e
> [    5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500
> [    5.492272] P2: MSR(hi,lo): 80000141-50005048
> [    5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200
> [    5.500540] P3: MSR(hi,lo): 80000138-50005844
> [    5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000
> [    5.508812] P4: MSR(hi,lo): 80000131-50005c40
> [    5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800
> 
> However running as dom0 under Xen 4.2, reading this MSR returns
> null:
> [   11.613068] P0: MSR(hi,lo): 00000000-00000000
> [   11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600
> [   11.613078] P1: MSR(hi,lo): 00000000-00000000
> [   11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600
> [   11.613085] P2: MSR(hi,lo): 00000000-00000000
> [   11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600
> [   11.613091] P3: MSR(hi,lo): 00000000-00000000
> [   11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600
> [   11.613098] P4: MSR(hi,lo): 00000000-00000000
> [   11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600
> 
> And this results in Xen failing to change the governor:
>   "(XEN) Fail change to ondemand governor"
> 
> I suppose this ultimately requires some support in the hypervisor
> to pass through the real values. But since this is at least on my
> combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a
> regression compared to older kernels, I wonder whether the following
> change might be something that should go into mainline:
> 
> --- a/drivers/acpi/processor_perflib.c
> +++ b/drivers/acpi/processor_perflib.c
> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px
>         if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
>             || boot_cpu_data.x86 == 0x11) {
>                 rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> +               /* Bit 63 indicates whether contents are valid */
> +               if (!(hi & 0x8000000))
> +                       return;

I don't think that's the right change - this is fixing baremetal so that
it works on xen. And besides, this code was in powernow-k8 before so I'm
wondering why did it work then.

>                 fid = lo & 0x3f;
>                 did = (lo >> 6) & 7;
>                 if (boot_cpu_data.x86 == 0x10)
> 
> I tested something similar (so hopefully I have not failed on slapping
> together a cleaned up version), which did resolve the problem.
> 
> -Stefan
> 
> Reference: https://bugs.launchpad.net/bugs/1078619

Adding the new Andre. :-) Andre, what did we do for powernow-k8 on xen
so that the F10h 50MHz steps quirk would work?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-14 16:34 ` Borislav Petkov
@ 2013-01-14 16:55   ` Jan Beulich
  2013-01-14 17:08   ` Stefan Bader
  2013-01-15 17:53   ` Konrad Rzeszutek Wilk
  2 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2013-01-14 16:55 UTC (permalink / raw)
  To: Borislav Petkov, Andre Przywara, Stefan Bader
  Cc: xen-devel, Konrad Rzeszutek Wilk, Matthew Garrett,
	Rafael J. Wysocki, Linux Kernel Mailing List

>>> On 14.01.13 at 17:34, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:
>> --- a/drivers/acpi/processor_perflib.c
>> +++ b/drivers/acpi/processor_perflib.c
>> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px 
> *px
>>         if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
>>             || boot_cpu_data.x86 == 0x11) {
>>                 rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
>> +               /* Bit 63 indicates whether contents are valid */
>> +               if (!(hi & 0x8000000))
>> +                       return;
> 
> I don't think that's the right change - this is fixing baremetal so that
> it works on xen. And besides, this code was in powernow-k8 before so I'm
> wondering why did it work then.

Because the driver doesn't get loaded in that case?

Jan


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-14 16:34 ` Borislav Petkov
  2013-01-14 16:55   ` [Xen-devel] " Jan Beulich
@ 2013-01-14 17:08   ` Stefan Bader
  2013-01-14 17:40       ` André Przywara
  2013-01-15 17:53   ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 28+ messages in thread
From: Stefan Bader @ 2013-01-14 17:08 UTC (permalink / raw)
  To: Borislav Petkov, Andre Przywara, xen-devel,
	Linux Kernel Mailing List, Rafael J. Wysocki,
	Konrad Rzeszutek Wilk, Matthew Garrett

[-- Attachment #1: Type: text/plain, Size: 3865 bytes --]

On 14.01.2013 17:34, Borislav Petkov wrote:
> On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:
>> Starting with kernel v3.7 the following commit added a quirk
>> to obtain the real frequencies of certain AMD systems:
>>
>> commit f594065faf4f9067c2283a34619fc0714e79a98d
>> Author: Matthew Garrett <mjg@redhat.com>
>> Date:   Tue Sep 4 08:28:06 2012 +0000
>>
>>     ACPI: Add fixups for AMD P-state figures
>>
>> When running bare-metal, on my Opteron 6128 test box results
>> in the frequencies remaining effectively unchanged:
>> [    5.475735] P0: MSR(hi,lo): 8000015c-50004004
>> [    5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000
>> [    5.484001] P1: MSR(hi,lo): 8000014c-50004a4e
>> [    5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500
>> [    5.492272] P2: MSR(hi,lo): 80000141-50005048
>> [    5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200
>> [    5.500540] P3: MSR(hi,lo): 80000138-50005844
>> [    5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000
>> [    5.508812] P4: MSR(hi,lo): 80000131-50005c40
>> [    5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800
>>
>> However running as dom0 under Xen 4.2, reading this MSR returns
>> null:
>> [   11.613068] P0: MSR(hi,lo): 00000000-00000000
>> [   11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600
>> [   11.613078] P1: MSR(hi,lo): 00000000-00000000
>> [   11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600
>> [   11.613085] P2: MSR(hi,lo): 00000000-00000000
>> [   11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600
>> [   11.613091] P3: MSR(hi,lo): 00000000-00000000
>> [   11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600
>> [   11.613098] P4: MSR(hi,lo): 00000000-00000000
>> [   11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600
>>
>> And this results in Xen failing to change the governor:
>>   "(XEN) Fail change to ondemand governor"
>>
>> I suppose this ultimately requires some support in the hypervisor
>> to pass through the real values. But since this is at least on my
>> combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a
>> regression compared to older kernels, I wonder whether the following
>> change might be something that should go into mainline:
>>
>> --- a/drivers/acpi/processor_perflib.c
>> +++ b/drivers/acpi/processor_perflib.c
>> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px
>>         if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
>>             || boot_cpu_data.x86 == 0x11) {
>>                 rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
>> +               /* Bit 63 indicates whether contents are valid */
>> +               if (!(hi & 0x8000000))
>> +                       return;
> 
> I don't think that's the right change - this is fixing baremetal so that
> it works on xen. And besides, this code was in powernow-k8 before so I'm
> wondering why did it work then.

This actually only started to work when the xen-processor module got introduced
to provide acpi information to the hypervisor. If I remember correctly
powernow-k8 did fail.
For the way I did the fix: the AMD BIOS docs seemed to indicate that even for
bare metal bit 63 would say whether the values are valid. So I thought this is a
nice coincidence that under Xen with all 0 this matches that special case... ;)

-Stefan

> 
>>                 fid = lo & 0x3f;
>>                 did = (lo >> 6) & 7;
>>                 if (boot_cpu_data.x86 == 0x10)
>>
>> I tested something similar (so hopefully I have not failed on slapping
>> together a cleaned up version), which did resolve the problem.
>>
>> -Stefan
>>
>> Reference: https://bugs.launchpad.net/bugs/1078619
> 
> Adding the new Andre. :-) Andre, what did we do for powernow-k8 on xen
> so that the F10h 50MHz steps quirk would work?
> 
> Thanks.
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-14 17:08   ` Stefan Bader
@ 2013-01-14 17:40       ` André Przywara
  0 siblings, 0 replies; 28+ messages in thread
From: André Przywara @ 2013-01-14 17:40 UTC (permalink / raw)
  To: Stefan Bader
  Cc: Borislav Petkov, xen-devel, Linux Kernel Mailing List,
	Rafael J. Wysocki, Konrad Rzeszutek Wilk, Matthew Garrett

On Mon, 14 Jan 2013 18:08:45 +0100
Stefan Bader <stefan.bader@canonical.com> wrote:

> On 14.01.2013 17:34, Borislav Petkov wrote:
> > On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:
> >> Starting with kernel v3.7 the following commit added a quirk
> >> to obtain the real frequencies of certain AMD systems:
> >>
> >> commit f594065faf4f9067c2283a34619fc0714e79a98d
> >> Author: Matthew Garrett <mjg@redhat.com>
> >> Date:   Tue Sep 4 08:28:06 2012 +0000
> >>
> >>     ACPI: Add fixups for AMD P-state figures
> >>
> >> When running bare-metal, on my Opteron 6128 test box results
> >> in the frequencies remaining effectively unchanged:
> >> [    5.475735] P0: MSR(hi,lo): 8000015c-50004004
> >> [    5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000
> >> [    5.484001] P1: MSR(hi,lo): 8000014c-50004a4e
> >> [    5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500
> >> [    5.492272] P2: MSR(hi,lo): 80000141-50005048
> >> [    5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200
> >> [    5.500540] P3: MSR(hi,lo): 80000138-50005844
> >> [    5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000
> >> [    5.508812] P4: MSR(hi,lo): 80000131-50005c40
> >> [    5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800
> >>
> >> However running as dom0 under Xen 4.2, reading this MSR returns
> >> null:
> >> [   11.613068] P0: MSR(hi,lo): 00000000-00000000
> >> [   11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600
> >> [   11.613078] P1: MSR(hi,lo): 00000000-00000000
> >> [   11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600
> >> [   11.613085] P2: MSR(hi,lo): 00000000-00000000
> >> [   11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600
> >> [   11.613091] P3: MSR(hi,lo): 00000000-00000000
> >> [   11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600
> >> [   11.613098] P4: MSR(hi,lo): 00000000-00000000
> >> [   11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600
> >>
> >> And this results in Xen failing to change the governor:
> >>   "(XEN) Fail change to ondemand governor"
> >>
> >> I suppose this ultimately requires some support in the hypervisor
> >> to pass through the real values. But since this is at least on my
> >> combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a
> >> regression compared to older kernels, I wonder whether the
> >> following change might be something that should go into mainline:
> >>
> >> --- a/drivers/acpi/processor_perflib.c
> >> +++ b/drivers/acpi/processor_perflib.c
> >> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct
> >> acpi_processor_px *px if ((boot_cpu_data.x86 == 0x10 &&
> >> boot_cpu_data.x86_model < 10) || boot_cpu_data.x86 == 0x11) {
> >>                 rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> >> +               /* Bit 63 indicates whether contents are valid */
> >> +               if (!(hi & 0x8000000))
> >> +                       return;
> > 
> > I don't think that's the right change - this is fixing baremetal so
> > that it works on xen. And besides, this code was in powernow-k8
> > before so I'm wondering why did it work then.
> 
> This actually only started to work when the xen-processor module got
> introduced to provide acpi information to the hypervisor. If I
> remember correctly powernow-k8 did fail.
> For the way I did the fix: the AMD BIOS docs seemed to indicate that
> even for bare metal bit 63 would say whether the values are valid. So
> I thought this is a nice coincidence that under Xen with all 0 this
> matches that special case... ;)

>From a first glance I think this fix is a valid approach. There are
BIOSes which disable P-states via this bit, so we have to observe this
for bare-metal, too. Let me think a bit more about this, however, and
see whether there is a better solution to do the right thing (tm) under
Xen. Getting back to you then.

Thanks,
Andre.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
@ 2013-01-14 17:40       ` André Przywara
  0 siblings, 0 replies; 28+ messages in thread
From: André Przywara @ 2013-01-14 17:40 UTC (permalink / raw)
  To: Stefan Bader
  Cc: xen-devel, Konrad Rzeszutek Wilk, Linux Kernel Mailing List,
	Rafael J. Wysocki, Borislav Petkov, Matthew Garrett

On Mon, 14 Jan 2013 18:08:45 +0100
Stefan Bader <stefan.bader@canonical.com> wrote:

> On 14.01.2013 17:34, Borislav Petkov wrote:
> > On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:
> >> Starting with kernel v3.7 the following commit added a quirk
> >> to obtain the real frequencies of certain AMD systems:
> >>
> >> commit f594065faf4f9067c2283a34619fc0714e79a98d
> >> Author: Matthew Garrett <mjg@redhat.com>
> >> Date:   Tue Sep 4 08:28:06 2012 +0000
> >>
> >>     ACPI: Add fixups for AMD P-state figures
> >>
> >> When running bare-metal, on my Opteron 6128 test box results
> >> in the frequencies remaining effectively unchanged:
> >> [    5.475735] P0: MSR(hi,lo): 8000015c-50004004
> >> [    5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000
> >> [    5.484001] P1: MSR(hi,lo): 8000014c-50004a4e
> >> [    5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500
> >> [    5.492272] P2: MSR(hi,lo): 80000141-50005048
> >> [    5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200
> >> [    5.500540] P3: MSR(hi,lo): 80000138-50005844
> >> [    5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000
> >> [    5.508812] P4: MSR(hi,lo): 80000131-50005c40
> >> [    5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800
> >>
> >> However running as dom0 under Xen 4.2, reading this MSR returns
> >> null:
> >> [   11.613068] P0: MSR(hi,lo): 00000000-00000000
> >> [   11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600
> >> [   11.613078] P1: MSR(hi,lo): 00000000-00000000
> >> [   11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600
> >> [   11.613085] P2: MSR(hi,lo): 00000000-00000000
> >> [   11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600
> >> [   11.613091] P3: MSR(hi,lo): 00000000-00000000
> >> [   11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600
> >> [   11.613098] P4: MSR(hi,lo): 00000000-00000000
> >> [   11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600
> >>
> >> And this results in Xen failing to change the governor:
> >>   "(XEN) Fail change to ondemand governor"
> >>
> >> I suppose this ultimately requires some support in the hypervisor
> >> to pass through the real values. But since this is at least on my
> >> combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a
> >> regression compared to older kernels, I wonder whether the
> >> following change might be something that should go into mainline:
> >>
> >> --- a/drivers/acpi/processor_perflib.c
> >> +++ b/drivers/acpi/processor_perflib.c
> >> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct
> >> acpi_processor_px *px if ((boot_cpu_data.x86 == 0x10 &&
> >> boot_cpu_data.x86_model < 10) || boot_cpu_data.x86 == 0x11) {
> >>                 rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> >> +               /* Bit 63 indicates whether contents are valid */
> >> +               if (!(hi & 0x8000000))
> >> +                       return;
> > 
> > I don't think that's the right change - this is fixing baremetal so
> > that it works on xen. And besides, this code was in powernow-k8
> > before so I'm wondering why did it work then.
> 
> This actually only started to work when the xen-processor module got
> introduced to provide acpi information to the hypervisor. If I
> remember correctly powernow-k8 did fail.
> For the way I did the fix: the AMD BIOS docs seemed to indicate that
> even for bare metal bit 63 would say whether the values are valid. So
> I thought this is a nice coincidence that under Xen with all 0 this
> matches that special case... ;)

>From a first glance I think this fix is a valid approach. There are
BIOSes which disable P-states via this bit, so we have to observe this
for bare-metal, too. Let me think a bit more about this, however, and
see whether there is a better solution to do the right thing (tm) under
Xen. Getting back to you then.

Thanks,
Andre.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-14 15:58 kernel 3.7+ cpufreq regression on AMD system running as dom0 Stefan Bader
  2013-01-14 16:34 ` Borislav Petkov
@ 2013-01-15 13:04 ` Matt Wilson
  2013-01-15 17:59   ` [Xen-devel] " Matt Wilson
  1 sibling, 1 reply; 28+ messages in thread
From: Matt Wilson @ 2013-01-15 13:04 UTC (permalink / raw)
  To: Stefan Bader
  Cc: xen-devel, Linux Kernel Mailing List, Rafael J. Wysocki,
	Konrad Rzeszutek Wilk, Matthew Garrett, André Przywara

On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:
> Starting with kernel v3.7 the following commit added a quirk
> to obtain the real frequencies of certain AMD systems:
> 
> commit f594065faf4f9067c2283a34619fc0714e79a98d
> Author: Matthew Garrett <mjg@redhat.com>
> Date:   Tue Sep 4 08:28:06 2012 +0000
> 
>     ACPI: Add fixups for AMD P-state figures
> 
> When running bare-metal, on my Opteron 6128 test box results
> in the frequencies remaining effectively unchanged:
> [    5.475735] P0: MSR(hi,lo): 8000015c-50004004
> [    5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000
> [    5.484001] P1: MSR(hi,lo): 8000014c-50004a4e
> [    5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500
> [    5.492272] P2: MSR(hi,lo): 80000141-50005048
> [    5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200
> [    5.500540] P3: MSR(hi,lo): 80000138-50005844
> [    5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000
> [    5.508812] P4: MSR(hi,lo): 80000131-50005c40
> [    5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800
> 
> However running as dom0 under Xen 4.2, reading this MSR returns
> null:
> [   11.613068] P0: MSR(hi,lo): 00000000-00000000
> [   11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600
> [   11.613078] P1: MSR(hi,lo): 00000000-00000000
> [   11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600
> [   11.613085] P2: MSR(hi,lo): 00000000-00000000
> [   11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600
> [   11.613091] P3: MSR(hi,lo): 00000000-00000000
> [   11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600
> [   11.613098] P4: MSR(hi,lo): 00000000-00000000
> [   11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600
> 
> And this results in Xen failing to change the governor:
>   "(XEN) Fail change to ondemand governor"
> 
> I suppose this ultimately requires some support in the hypervisor
> to pass through the real values. But since this is at least on my
> combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a
> regression compared to older kernels, I wonder whether the following
> change might be something that should go into mainline:

The correct values should be returned already via rdmsr if
"cpureq=dom0-kernel" is specified on the Xen command line. Looking at
the LP report, it doesn't seem that this option was used.

Likely you will also need to use "dom0_vcpu_pin=1" if you want dom0 to
be the cpufreq controller on AMD.

Matt

> --- a/drivers/acpi/processor_perflib.c
> +++ b/drivers/acpi/processor_perflib.c
> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px
>         if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
>             || boot_cpu_data.x86 == 0x11) {
>                 rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> +               /* Bit 63 indicates whether contents are valid */
> +               if (!(hi & 0x8000000))
> +                       return;
>                 fid = lo & 0x3f;
>                 did = (lo >> 6) & 7;
>                 if (boot_cpu_data.x86 == 0x10)
> 
> I tested something similar (so hopefully I have not failed on slapping
> together a cleaned up version), which did resolve the problem.
> 
> -Stefan
> 
> Reference: https://bugs.launchpad.net/bugs/1078619

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-14 16:34 ` Borislav Petkov
  2013-01-14 16:55   ` [Xen-devel] " Jan Beulich
  2013-01-14 17:08   ` Stefan Bader
@ 2013-01-15 17:53   ` Konrad Rzeszutek Wilk
  2013-01-15 18:18     ` Borislav Petkov
                       ` (2 more replies)
  2 siblings, 3 replies; 28+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-15 17:53 UTC (permalink / raw)
  To: Borislav Petkov, Stefan Bader, Andre Przywara, xen-devel,
	Linux Kernel Mailing List, Rafael J. Wysocki, Matthew Garrett

On Mon, Jan 14, 2013 at 05:34:45PM +0100, Borislav Petkov wrote:
> On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:
> > Starting with kernel v3.7 the following commit added a quirk
> > to obtain the real frequencies of certain AMD systems:
> > 
> > commit f594065faf4f9067c2283a34619fc0714e79a98d
> > Author: Matthew Garrett <mjg@redhat.com>
> > Date:   Tue Sep 4 08:28:06 2012 +0000
> > 
> >     ACPI: Add fixups for AMD P-state figures
> > 
> > When running bare-metal, on my Opteron 6128 test box results
> > in the frequencies remaining effectively unchanged:
> > [    5.475735] P0: MSR(hi,lo): 8000015c-50004004
> > [    5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000
> > [    5.484001] P1: MSR(hi,lo): 8000014c-50004a4e
> > [    5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500
> > [    5.492272] P2: MSR(hi,lo): 80000141-50005048
> > [    5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200
> > [    5.500540] P3: MSR(hi,lo): 80000138-50005844
> > [    5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000
> > [    5.508812] P4: MSR(hi,lo): 80000131-50005c40
> > [    5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800
> > 
> > However running as dom0 under Xen 4.2, reading this MSR returns
> > null:
> > [   11.613068] P0: MSR(hi,lo): 00000000-00000000
> > [   11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600
> > [   11.613078] P1: MSR(hi,lo): 00000000-00000000
> > [   11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600
> > [   11.613085] P2: MSR(hi,lo): 00000000-00000000
> > [   11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600
> > [   11.613091] P3: MSR(hi,lo): 00000000-00000000
> > [   11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600
> > [   11.613098] P4: MSR(hi,lo): 00000000-00000000
> > [   11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600
> > 
> > And this results in Xen failing to change the governor:
> >   "(XEN) Fail change to ondemand governor"
> > 
> > I suppose this ultimately requires some support in the hypervisor
> > to pass through the real values. But since this is at least on my
> > combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a
> > regression compared to older kernels, I wonder whether the following
> > change might be something that should go into mainline:
> > 
> > --- a/drivers/acpi/processor_perflib.c
> > +++ b/drivers/acpi/processor_perflib.c
> > @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px
> >         if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
> >             || boot_cpu_data.x86 == 0x11) {
> >                 rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> > +               /* Bit 63 indicates whether contents are valid */
> > +               if (!(hi & 0x8000000))
> > +                       return;
> 
> I don't think that's the right change - this is fixing baremetal so that
> it works on xen. And besides, this code was in powernow-k8 before so I'm
> wondering why did it work then.

Powernow-k8 only populated the cpufreq policy information. This library
(processor_perflib) is the generic library used for ACPI P-states parsing.
This specific function (acpi_processor_get_performance_states) is just
used to fetch and parse the P-states.

Xen-acpi-processor (which we use to upload the P and C-states to the
hypervisor) ends up calling this library to parse the P-states
and this unfortunate quirk clamps the P-states based on the MSRS.

It is odd that this CPU specific quirk got added in this generic
library. Is there no ACPI quirk system similar to how DMI quirks
are handled?

Anyhow, I think this patch makes sense - it makes sure that the
MSR value is sane.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

> 
> >                 fid = lo & 0x3f;
> >                 did = (lo >> 6) & 7;
> >                 if (boot_cpu_data.x86 == 0x10)
> > 
> > I tested something similar (so hopefully I have not failed on slapping
> > together a cleaned up version), which did resolve the problem.
> > 
> > -Stefan
> > 
> > Reference: https://bugs.launchpad.net/bugs/1078619
> 
> Adding the new Andre. :-) Andre, what did we do for powernow-k8 on xen
> so that the F10h 50MHz steps quirk would work?
> 
> Thanks.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.
> --

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-15 13:04 ` Matt Wilson
@ 2013-01-15 17:59   ` Matt Wilson
  0 siblings, 0 replies; 28+ messages in thread
From: Matt Wilson @ 2013-01-15 17:59 UTC (permalink / raw)
  To: Stefan Bader
  Cc: André Przywara, xen-devel, Konrad Rzeszutek Wilk,
	Linux Kernel Mailing List, Rafael J. Wysocki, Matthew Garrett

On Tue, Jan 15, 2013 at 02:04:30PM +0100, Matt Wilson wrote:
> On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:
> > Starting with kernel v3.7 the following commit added a quirk
> > to obtain the real frequencies of certain AMD systems:
> > 
> > commit f594065faf4f9067c2283a34619fc0714e79a98d
> > Author: Matthew Garrett <mjg@redhat.com>
> > Date:   Tue Sep 4 08:28:06 2012 +0000
> > 
> >     ACPI: Add fixups for AMD P-state figures
> > 
> > When running bare-metal, on my Opteron 6128 test box results
> > in the frequencies remaining effectively unchanged:
> > [    5.475735] P0: MSR(hi,lo): 8000015c-50004004
> > [    5.479049] P0: fid=0x4, did=0x0, freq: 2000 -> 2000
> > [    5.484001] P1: MSR(hi,lo): 8000014c-50004a4e
> > [    5.487314] P1: fid=0xe, did=0x1, freq: 1500 -> 1500
> > [    5.492272] P2: MSR(hi,lo): 80000141-50005048
> > [    5.495584] P2: fid=0x8, did=0x1, freq: 1200 -> 1200
> > [    5.500540] P3: MSR(hi,lo): 80000138-50005844
> > [    5.503853] P3: fid=0x4, did=0x1, freq: 1000 -> 1000
> > [    5.508812] P4: MSR(hi,lo): 80000131-50005c40
> > [    5.512125] P4: fid=0x0, did=0x1, freq: 800 -> 800
> > 
> > However running as dom0 under Xen 4.2, reading this MSR returns
> > null:
> > [   11.613068] P0: MSR(hi,lo): 00000000-00000000
> > [   11.613074] P0: fid=0x0, did=0x0, freq: 2000 -> 1600
> > [   11.613078] P1: MSR(hi,lo): 00000000-00000000
> > [   11.613081] P1: fid=0x0, did=0x0, freq: 1500 -> 1600
> > [   11.613085] P2: MSR(hi,lo): 00000000-00000000
> > [   11.613088] P2: fid=0x0, did=0x0, freq: 1200 -> 1600
> > [   11.613091] P3: MSR(hi,lo): 00000000-00000000
> > [   11.613094] P3: fid=0x0, did=0x0, freq: 1000 -> 1600
> > [   11.613098] P4: MSR(hi,lo): 00000000-00000000
> > [   11.613101] P4: fid=0x0, did=0x0, freq: 800 -> 1600
> > 
> > And this results in Xen failing to change the governor:
> >   "(XEN) Fail change to ondemand governor"
> > 
> > I suppose this ultimately requires some support in the hypervisor
> > to pass through the real values. But since this is at least on my
> > combination of Xen 4.2 + kernel v3.7+ and AMD family 0x10 CPU a
> > regression compared to older kernels, I wonder whether the following
> > change might be something that should go into mainline:
> 
> The correct values should be returned already via rdmsr if
> "cpureq=dom0-kernel" is specified on the Xen command line. Looking at
> the LP report, it doesn't seem that this option was used.
> 
> Likely you will also need to use "dom0_vcpu_pin=1" if you want dom0 to
> be the cpufreq controller on AMD.

Aah, after chatting with Konrad I understand the problem better. This
broke using Xen as the cpufreq controller - sorry for failing to
notice that. :-)

Checking for a valid result seems to make sense.

Matt

> > --- a/drivers/acpi/processor_perflib.c
> > +++ b/drivers/acpi/processor_perflib.c
> > @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px
> >         if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
> >             || boot_cpu_data.x86 == 0x11) {
> >                 rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> > +               /* Bit 63 indicates whether contents are valid */
> > +               if (!(hi & 0x8000000))
> > +                       return;
> >                 fid = lo & 0x3f;
> >                 did = (lo >> 6) & 7;
> >                 if (boot_cpu_data.x86 == 0x10)
> > 
> > I tested something similar (so hopefully I have not failed on slapping
> > together a cleaned up version), which did resolve the problem.
> > 
> > -Stefan
> > 
> > Reference: https://bugs.launchpad.net/bugs/1078619
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-15 17:53   ` Konrad Rzeszutek Wilk
@ 2013-01-15 18:18     ` Borislav Petkov
  2013-01-18 19:00       ` Konrad Rzeszutek Wilk
  2013-01-16 10:26     ` Jan Beulich
  2013-01-16 10:26     ` Jan Beulich
  2 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2013-01-15 18:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefan Bader, Andre Przywara, xen-devel,
	Linux Kernel Mailing List, Rafael J. Wysocki, Matthew Garrett

On Tue, Jan 15, 2013 at 12:53:05PM -0500, Konrad Rzeszutek Wilk wrote:
> > I don't think that's the right change - this is fixing baremetal so that
> > it works on xen. And besides, this code was in powernow-k8 before so I'm
> > wondering why did it work then.
> 
> Powernow-k8 only populated the cpufreq policy information. This library
> (processor_perflib) is the generic library used for ACPI P-states parsing.
> This specific function (acpi_processor_get_performance_states) is just
> used to fetch and parse the P-states.
> 
> Xen-acpi-processor (which we use to upload the P and C-states to the
> hypervisor) ends up calling this library to parse the P-states
> and this unfortunate quirk clamps the P-states based on the MSRS.

Huh? This is a fix for _PSS frequency values which are rounded and thus
imprecise. The _PSS objects are the unfortunate ones, as most of the
other crap BIOS produces.

> It is odd that this CPU specific quirk got added in this generic
> library. Is there no ACPI quirk system similar to how DMI quirks are
> handled?

Even if there were, do you know all the boards and BIOS revisions which
have those rounded values? The fix addresses the hardware which has
those 50MHz multiples and simply ignores the _PSS data but reads out the
P-states directly from the hardware.

> Anyhow, I think this patch makes sense - it makes sure that the MSR
> value is sane.

I agree to a certain degree. Testing the Valid bit is something we
should do for P-state MSRs - and for all MSRs containing a Valid bit,
for that matter - and the original code didn't do it.

However, you need to push down the *correct* frequencies *after* the
quirk to the hypervisor (I'm looking at push_pxx_to_hypervisor()) so
that it is aware of the exact P-state frequencies this CPU supports and
not some rounded values.

AFAICT for the xen part, of course. But the baseline stands: you need to
tell the thing that switches P-states the exact P-state frequencies of
the CPU. :-).

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-15 17:53   ` Konrad Rzeszutek Wilk
  2013-01-15 18:18     ` Borislav Petkov
@ 2013-01-16 10:26     ` Jan Beulich
  2013-01-16 14:34       ` Stefan Bader
  2013-01-16 14:34       ` [Xen-devel] " Stefan Bader
  2013-01-16 10:26     ` Jan Beulich
  2 siblings, 2 replies; 28+ messages in thread
From: Jan Beulich @ 2013-01-16 10:26 UTC (permalink / raw)
  To: Stefan Bader, Konrad Rzeszutek Wilk
  Cc: Borislav Petkov, Andre Przywara, xen-devel, Matthew Garrett,
	Rafael J. Wysocki, Linux Kernel Mailing List

>>> On 15.01.13 at 18:53, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Mon, Jan 14, 2013 at 05:34:45PM +0100, Borislav Petkov wrote:
>> On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:
>> > --- a/drivers/acpi/processor_perflib.c
>> > +++ b/drivers/acpi/processor_perflib.c
>> > @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px
>> >         if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
>> >             || boot_cpu_data.x86 == 0x11) {
>> >                 rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
>> > +               /* Bit 63 indicates whether contents are valid */
>> > +               if (!(hi & 0x8000000))
>> > +                       return;
>> 
>> I don't think that's the right change - this is fixing baremetal so that
>> it works on xen. And besides, this code was in powernow-k8 before so I'm
>> wondering why did it work then.
> 
> Powernow-k8 only populated the cpufreq policy information. This library
> (processor_perflib) is the generic library used for ACPI P-states parsing.
> This specific function (acpi_processor_get_performance_states) is just
> used to fetch and parse the P-states.
> 
> Xen-acpi-processor (which we use to upload the P and C-states to the
> hypervisor) ends up calling this library to parse the P-states
> and this unfortunate quirk clamps the P-states based on the MSRS.
> 
> It is odd that this CPU specific quirk got added in this generic
> library. Is there no ACPI quirk system similar to how DMI quirks
> are handled?
> 
> Anyhow, I think this patch makes sense - it makes sure that the
> MSR value is sane.
> 
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Did someone actually _test_ that patch? I ask because the mask
used (0x8000000) doesn't check bit 63 as the comment says, but
bit 59 instead...

Jan


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-15 17:53   ` Konrad Rzeszutek Wilk
  2013-01-15 18:18     ` Borislav Petkov
  2013-01-16 10:26     ` Jan Beulich
@ 2013-01-16 10:26     ` Jan Beulich
  2 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2013-01-16 10:26 UTC (permalink / raw)
  To: Stefan Bader, Konrad Rzeszutek Wilk
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, xen-devel,
	Andre Przywara, Borislav Petkov, Matthew Garrett

>>> On 15.01.13 at 18:53, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Mon, Jan 14, 2013 at 05:34:45PM +0100, Borislav Petkov wrote:
>> On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:
>> > --- a/drivers/acpi/processor_perflib.c
>> > +++ b/drivers/acpi/processor_perflib.c
>> > @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px
>> >         if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
>> >             || boot_cpu_data.x86 == 0x11) {
>> >                 rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
>> > +               /* Bit 63 indicates whether contents are valid */
>> > +               if (!(hi & 0x8000000))
>> > +                       return;
>> 
>> I don't think that's the right change - this is fixing baremetal so that
>> it works on xen. And besides, this code was in powernow-k8 before so I'm
>> wondering why did it work then.
> 
> Powernow-k8 only populated the cpufreq policy information. This library
> (processor_perflib) is the generic library used for ACPI P-states parsing.
> This specific function (acpi_processor_get_performance_states) is just
> used to fetch and parse the P-states.
> 
> Xen-acpi-processor (which we use to upload the P and C-states to the
> hypervisor) ends up calling this library to parse the P-states
> and this unfortunate quirk clamps the P-states based on the MSRS.
> 
> It is odd that this CPU specific quirk got added in this generic
> library. Is there no ACPI quirk system similar to how DMI quirks
> are handled?
> 
> Anyhow, I think this patch makes sense - it makes sure that the
> MSR value is sane.
> 
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Did someone actually _test_ that patch? I ask because the mask
used (0x8000000) doesn't check bit 63 as the comment says, but
bit 59 instead...

Jan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-16 10:26     ` Jan Beulich
  2013-01-16 14:34       ` Stefan Bader
@ 2013-01-16 14:34       ` Stefan Bader
  1 sibling, 0 replies; 28+ messages in thread
From: Stefan Bader @ 2013-01-16 14:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Konrad Rzeszutek Wilk, Borislav Petkov, Andre Przywara,
	xen-devel, Matthew Garrett, Rafael J. Wysocki,
	Linux Kernel Mailing List

On 01/16/2013 11:26 AM, Jan Beulich wrote:
>>>> On 15.01.13 at 18:53, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> On Mon, Jan 14, 2013 at 05:34:45PM +0100, Borislav Petkov wrote:
>>> On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:
>>>> --- a/drivers/acpi/processor_perflib.c
>>>> +++ b/drivers/acpi/processor_perflib.c
>>>> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px
>>>>          if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
>>>>              || boot_cpu_data.x86 == 0x11) {
>>>>                  rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
>>>> +               /* Bit 63 indicates whether contents are valid */
>>>> +               if (!(hi & 0x8000000))
>>>> +                       return;
>>>
>>> I don't think that's the right change - this is fixing baremetal so that
>>> it works on xen. And besides, this code was in powernow-k8 before so I'm
>>> wondering why did it work then.
>>
>> Powernow-k8 only populated the cpufreq policy information. This library
>> (processor_perflib) is the generic library used for ACPI P-states parsing.
>> This specific function (acpi_processor_get_performance_states) is just
>> used to fetch and parse the P-states.
>>
>> Xen-acpi-processor (which we use to upload the P and C-states to the
>> hypervisor) ends up calling this library to parse the P-states
>> and this unfortunate quirk clamps the P-states based on the MSRS.
>>
>> It is odd that this CPU specific quirk got added in this generic
>> library. Is there no ACPI quirk system similar to how DMI quirks
>> are handled?
>>
>> Anyhow, I think this patch makes sense - it makes sure that the
>> MSR value is sane.
>>
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> Did someone actually _test_ that patch? I ask because the mask
> used (0x8000000) doesn't check bit 63 as the comment says, but
> bit 59 instead...
>
> Jan
>
Not this version which I did at the end of a day to have a cleaned up version to 
discuss. And obviously I managed to get the number of zeros wrong. :(

The comment is right and it should be 0x80000000

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-16 10:26     ` Jan Beulich
@ 2013-01-16 14:34       ` Stefan Bader
  2013-01-16 14:34       ` [Xen-devel] " Stefan Bader
  1 sibling, 0 replies; 28+ messages in thread
From: Stefan Bader @ 2013-01-16 14:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Rafael J. Wysocki, Konrad Rzeszutek Wilk,
	Linux Kernel Mailing List, xen-devel, Andre Przywara,
	Borislav Petkov, Matthew Garrett

On 01/16/2013 11:26 AM, Jan Beulich wrote:
>>>> On 15.01.13 at 18:53, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> On Mon, Jan 14, 2013 at 05:34:45PM +0100, Borislav Petkov wrote:
>>> On Mon, Jan 14, 2013 at 04:58:54PM +0100, Stefan Bader wrote:
>>>> --- a/drivers/acpi/processor_perflib.c
>>>> +++ b/drivers/acpi/processor_perflib.c
>>>> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px
>>>>          if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
>>>>              || boot_cpu_data.x86 == 0x11) {
>>>>                  rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
>>>> +               /* Bit 63 indicates whether contents are valid */
>>>> +               if (!(hi & 0x8000000))
>>>> +                       return;
>>>
>>> I don't think that's the right change - this is fixing baremetal so that
>>> it works on xen. And besides, this code was in powernow-k8 before so I'm
>>> wondering why did it work then.
>>
>> Powernow-k8 only populated the cpufreq policy information. This library
>> (processor_perflib) is the generic library used for ACPI P-states parsing.
>> This specific function (acpi_processor_get_performance_states) is just
>> used to fetch and parse the P-states.
>>
>> Xen-acpi-processor (which we use to upload the P and C-states to the
>> hypervisor) ends up calling this library to parse the P-states
>> and this unfortunate quirk clamps the P-states based on the MSRS.
>>
>> It is odd that this CPU specific quirk got added in this generic
>> library. Is there no ACPI quirk system similar to how DMI quirks
>> are handled?
>>
>> Anyhow, I think this patch makes sense - it makes sure that the
>> MSR value is sane.
>>
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> Did someone actually _test_ that patch? I ask because the mask
> used (0x8000000) doesn't check bit 63 as the comment says, but
> bit 59 instead...
>
> Jan
>
Not this version which I did at the end of a day to have a cleaned up version to 
discuss. And obviously I managed to get the number of zeros wrong. :(

The comment is right and it should be 0x80000000

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-15 18:18     ` Borislav Petkov
@ 2013-01-18 19:00       ` Konrad Rzeszutek Wilk
  2013-01-18 19:38         ` [Xen-devel] " Boris Ostrovsky
                           ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-18 19:00 UTC (permalink / raw)
  To: Borislav Petkov, Stefan Bader, Andre Przywara, xen-devel,
	Linux Kernel Mailing List, Rafael J. Wysocki, Matthew Garrett

On Tue, Jan 15, 2013 at 07:18:39PM +0100, Borislav Petkov wrote:
> On Tue, Jan 15, 2013 at 12:53:05PM -0500, Konrad Rzeszutek Wilk wrote:
> > > I don't think that's the right change - this is fixing baremetal so that
> > > it works on xen. And besides, this code was in powernow-k8 before so I'm
> > > wondering why did it work then.
> > 
> > Powernow-k8 only populated the cpufreq policy information. This library
> > (processor_perflib) is the generic library used for ACPI P-states parsing.
> > This specific function (acpi_processor_get_performance_states) is just
> > used to fetch and parse the P-states.
> > 
> > Xen-acpi-processor (which we use to upload the P and C-states to the
> > hypervisor) ends up calling this library to parse the P-states
> > and this unfortunate quirk clamps the P-states based on the MSRS.
> 
> Huh? This is a fix for _PSS frequency values which are rounded and thus
> imprecise. The _PSS objects are the unfortunate ones, as most of the
> other crap BIOS produces.

I did not explain myself well. The fix is OK - it just that the hypervisor
causes the quirk to not work correctly. Hmm, I wonder if there BIOSes
that do the same thing (cause the MSR to return 0). Per you estimation
of BIOS quality, it seems that this could happen.

> 
> > It is odd that this CPU specific quirk got added in this generic
> > library. Is there no ACPI quirk system similar to how DMI quirks are
> > handled?
> 
> Even if there were, do you know all the boards and BIOS revisions which
> have those rounded values? The fix addresses the hardware which has
> those 50MHz multiples and simply ignores the _PSS data but reads out the
> P-states directly from the hardware.

Oh, I was not thinking DMI per-say. I was thinking something similar to
DMI-quirk API. But for the ACPI subsystem, so it would be:

	if (ARM)
		... these quirks neccessary
	if (AMD)
		.. these quirks

and then the ACPI code can make the calls to this ACPI-quirk API to
figure out whether it needs to modulate values. But this is all
hand-waving at this point.
> 
> > Anyhow, I think this patch makes sense - it makes sure that the MSR
> > value is sane.
> 
> I agree to a certain degree. Testing the Valid bit is something we
> should do for P-state MSRs - and for all MSRs containing a Valid bit,
> for that matter - and the original code didn't do it.

OK.
> 
> However, you need to push down the *correct* frequencies *after* the
> quirk to the hypervisor (I'm looking at push_pxx_to_hypervisor()) so
> that it is aware of the exact P-state frequencies this CPU supports and
> not some rounded values.

Yes!  It could be done in the hypervisor (it does the MSRs and figures out
that the P-states need tweaking).

> 
> AFAICT for the xen part, of course. But the baseline stands: you need to
> tell the thing that switches P-states the exact P-state frequencies of
> the CPU. :-).

Right, that information is gathered from the MSRs. I think the Xen would
need to do this since it can do the MSRs correctly and modify the P-states.

So something like this in the hypervisor maybe (not even tested):

diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c
index a9b7792..54e7808 100644
--- a/xen/arch/x86/acpi/cpufreq/powernow.c
+++ b/xen/arch/x86/acpi/cpufreq/powernow.c
@@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy,
 
     return 0;
 }
+#define MSR_AMD_PSTATE_DEF_BASE     0xc0010064
+static void amd_fixup_frequency(struct xen_processor_px *px, int i)
+{
+	u32 hi, lo, fid, did;
+	int index = px->control & 0x00000007;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+		return;
+
+	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
+	    || boot_cpu_data.x86 == 0x11) {
+		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
+        /* Bit 63 indicates whether contents are valid */
+        if (!(hi & 0x80000000))
+            return;
+
+		fid = lo & 0x3f;
+		did = (lo >> 6) & 7;
+		if (boot_cpu_data.x86 == 0x10)
+			px->core_frequency = (100 * (fid + 0x10)) >> did;
+		else
+			px->core_frequency = (100 * (fid + 8)) >> did;
+	}
+}
+
+static void amd_fixup_freq(struct processor_performance *perf)
+{
 
+    int i;
+
+    for (i = 0; i < perf->state_count; i++)
+        amd_fixup_frequency(perf->states, i);
+
+}
 static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
 {
     struct acpi_cpufreq_data *data;
@@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
 
     perf = &processor_pminfo[policy->cpu]->perf;
 
+    amd_fixup_freq(perf);
+
     cpufreq_verify_within_limits(policy, 0, 
         perf->states[perf->platform_limit].core_frequency * 1000);
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-18 19:00       ` Konrad Rzeszutek Wilk
@ 2013-01-18 19:38         ` Boris Ostrovsky
  2013-01-18 19:44           ` Andrew Cooper
  2013-01-18 20:03         ` Borislav Petkov
  2013-01-22  0:01         ` [Xen-devel] " Boris Ostrovsky
  2 siblings, 1 reply; 28+ messages in thread
From: Boris Ostrovsky @ 2013-01-18 19:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Borislav Petkov, Stefan Bader, Andre Przywara, xen-devel,
	Linux Kernel Mailing List, Rafael J. Wysocki, Matthew Garrett

On 01/18/2013 02:00 PM, Konrad Rzeszutek Wilk wrote:
>
> Right, that information is gathered from the MSRs. I think the Xen would
> need to do this since it can do the MSRs correctly and modify the P-states.
>
> So something like this in the hypervisor maybe (not even tested):

Is there any harm in allowing dom0 read P-state registers?

Something along these lines:

diff -r 40881d58e991 xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c	Thu Jan 17 14:47:04 2013 -0500
+++ b/xen/arch/x86/traps.c	Fri Jan 18 09:32:51 2013 -0500
@@ -2535,7 +2535,7 @@ static int emulate_privileged_op(struct
          case MSR_K8_PSTATE7:
              if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
                  goto fail;
-            if ( !is_cpufreq_controller(v->domain) )
+            if ( d->domain_id != 0 )
              {
                  regs->eax = regs->edx = 0;
                  break;


(It does seem to fix the bug too)

-boris


>
> diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c
> index a9b7792..54e7808 100644
> --- a/xen/arch/x86/acpi/cpufreq/powernow.c
> +++ b/xen/arch/x86/acpi/cpufreq/powernow.c
> @@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy,
>
>       return 0;
>   }
> +#define MSR_AMD_PSTATE_DEF_BASE     0xc0010064
> +static void amd_fixup_frequency(struct xen_processor_px *px, int i)
> +{
> +	u32 hi, lo, fid, did;
> +	int index = px->control & 0x00000007;
> +
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> +		return;
> +
> +	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
> +	    || boot_cpu_data.x86 == 0x11) {
> +		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> +        /* Bit 63 indicates whether contents are valid */
> +        if (!(hi & 0x80000000))
> +            return;
> +
> +		fid = lo & 0x3f;
> +		did = (lo >> 6) & 7;
> +		if (boot_cpu_data.x86 == 0x10)
> +			px->core_frequency = (100 * (fid + 0x10)) >> did;
> +		else
> +			px->core_frequency = (100 * (fid + 8)) >> did;
> +	}
> +}
> +
> +static void amd_fixup_freq(struct processor_performance *perf)
> +{
>
> +    int i;
> +
> +    for (i = 0; i < perf->state_count; i++)
> +        amd_fixup_frequency(perf->states, i);
> +
> +}
>   static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
>   {
>       struct acpi_cpufreq_data *data;
> @@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
>
>       perf = &processor_pminfo[policy->cpu]->perf;
>
> +    amd_fixup_freq(perf);
> +
>       cpufreq_verify_within_limits(policy, 0,
>           perf->states[perf->platform_limit].core_frequency * 1000);
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-18 19:38         ` [Xen-devel] " Boris Ostrovsky
@ 2013-01-18 19:44           ` Andrew Cooper
  0 siblings, 0 replies; 28+ messages in thread
From: Andrew Cooper @ 2013-01-18 19:44 UTC (permalink / raw)
  To: xen-devel

On 18/01/13 19:38, Boris Ostrovsky wrote:
> On 01/18/2013 02:00 PM, Konrad Rzeszutek Wilk wrote:
>> Right, that information is gathered from the MSRs. I think the Xen would
>> need to do this since it can do the MSRs correctly and modify the P-states.
>>
>> So something like this in the hypervisor maybe (not even tested):
> Is there any harm in allowing dom0 read P-state registers?

Yes - the dom0 vcpu could be moved across pcpus between MSR accesses.

There is currently some hacky code for pinning the dom0 cpus right at
boot time, after which dom0 is permitted to access a few more MSRs,
which appear to be power related.

~Andrew

>
> Something along these lines:
>
> diff -r 40881d58e991 xen/arch/x86/traps.c
> --- a/xen/arch/x86/traps.c	Thu Jan 17 14:47:04 2013 -0500
> +++ b/xen/arch/x86/traps.c	Fri Jan 18 09:32:51 2013 -0500
> @@ -2535,7 +2535,7 @@ static int emulate_privileged_op(struct
>           case MSR_K8_PSTATE7:
>               if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
>                   goto fail;
> -            if ( !is_cpufreq_controller(v->domain) )
> +            if ( d->domain_id != 0 )
>               {
>                   regs->eax = regs->edx = 0;
>                   break;
>
>
> (It does seem to fix the bug too)
>
> -boris
>
>
>> diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c
>> index a9b7792..54e7808 100644
>> --- a/xen/arch/x86/acpi/cpufreq/powernow.c
>> +++ b/xen/arch/x86/acpi/cpufreq/powernow.c
>> @@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy,
>>
>>       return 0;
>>   }
>> +#define MSR_AMD_PSTATE_DEF_BASE     0xc0010064
>> +static void amd_fixup_frequency(struct xen_processor_px *px, int i)
>> +{
>> +	u32 hi, lo, fid, did;
>> +	int index = px->control & 0x00000007;
>> +
>> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
>> +		return;
>> +
>> +	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
>> +	    || boot_cpu_data.x86 == 0x11) {
>> +		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
>> +        /* Bit 63 indicates whether contents are valid */
>> +        if (!(hi & 0x80000000))
>> +            return;
>> +
>> +		fid = lo & 0x3f;
>> +		did = (lo >> 6) & 7;
>> +		if (boot_cpu_data.x86 == 0x10)
>> +			px->core_frequency = (100 * (fid + 0x10)) >> did;
>> +		else
>> +			px->core_frequency = (100 * (fid + 8)) >> did;
>> +	}
>> +}
>> +
>> +static void amd_fixup_freq(struct processor_performance *perf)
>> +{
>>
>> +    int i;
>> +
>> +    for (i = 0; i < perf->state_count; i++)
>> +        amd_fixup_frequency(perf->states, i);
>> +
>> +}
>>   static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
>>   {
>>       struct acpi_cpufreq_data *data;
>> @@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
>>
>>       perf = &processor_pminfo[policy->cpu]->perf;
>>
>> +    amd_fixup_freq(perf);
>> +
>>       cpufreq_verify_within_limits(policy, 0,
>>           perf->states[perf->platform_limit].core_frequency * 1000);
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-18 19:00       ` Konrad Rzeszutek Wilk
  2013-01-18 19:38         ` [Xen-devel] " Boris Ostrovsky
@ 2013-01-18 20:03         ` Borislav Petkov
  2013-01-18 22:00           ` Konrad Rzeszutek Wilk
  2013-01-21 12:22           ` Stefan Bader
  2013-01-22  0:01         ` [Xen-devel] " Boris Ostrovsky
  2 siblings, 2 replies; 28+ messages in thread
From: Borislav Petkov @ 2013-01-18 20:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefan Bader, Andre Przywara, xen-devel,
	Linux Kernel Mailing List, Rafael J. Wysocki, Matthew Garrett

On Fri, Jan 18, 2013 at 02:00:15PM -0500, Konrad Rzeszutek Wilk wrote:
> I did not explain myself well. The fix is OK - it just that the
> hypervisor causes the quirk to not work correctly. Hmm, I wonder if
> there BIOSes that do the same thing (cause the MSR to return 0). Per
> you estimation of BIOS quality, it seems that this could happen.

Yeah, I don't think there's a limit to the amount of SNAFU a BIOS can
cause :-).

> Oh, I was not thinking DMI per-say. I was thinking something similar to
> DMI-quirk API. But for the ACPI subsystem, so it would be:
> 
> 	if (ARM)
> 		... these quirks neccessary
> 	if (AMD)
> 		.. these quirks
> 
> and then the ACPI code can make the calls to this ACPI-quirk API to
> figure out whether it needs to modulate values. But this is all
> hand-waving at this point.

Yeah, those CPUs are just a very small set to even warrant a quirk API.

[ … ]

> Right, that information is gathered from the MSRs. I think the Xen would
> need to do this since it can do the MSRs correctly and modify the P-states.
> 
> So something like this in the hypervisor maybe (not even tested):

Yeah, something like that. Basically you can copy the quirk down to the
hypervisor.

But, Andre was explaining to me the other day that those P-states
frequencies are not that important.

Let me explain: the ondemand governor, for example, computes idle time
and each time it needs to increase, it switches straight up to the
highest frequency. When it decreases the freq. though, it goes down in a
staircase manner, going over all P-states, AFAICT.

So we use them but not for all decisions. The question is, what does the
xen governor(s) do?

If it only uses the frequencies for reporting, then it is not that big
of a deal. If it uses their values for switching decisions, then it
probably needs the correct ones.

> diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c
> index a9b7792..54e7808 100644
> --- a/xen/arch/x86/acpi/cpufreq/powernow.c
> +++ b/xen/arch/x86/acpi/cpufreq/powernow.c
> @@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy,
>  
>      return 0;
>  }
> +#define MSR_AMD_PSTATE_DEF_BASE     0xc0010064
> +static void amd_fixup_frequency(struct xen_processor_px *px, int i)
> +{
> +	u32 hi, lo, fid, did;
> +	int index = px->control & 0x00000007;
> +
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> +		return;
> +
> +	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
> +	    || boot_cpu_data.x86 == 0x11) {
> +		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> +        /* Bit 63 indicates whether contents are valid */
> +        if (!(hi & 0x80000000))
> +            return;

Something's funny with this indentation.

> +
> +		fid = lo & 0x3f;
> +		did = (lo >> 6) & 7;
> +		if (boot_cpu_data.x86 == 0x10)
> +			px->core_frequency = (100 * (fid + 0x10)) >> did;
> +		else
> +			px->core_frequency = (100 * (fid + 8)) >> did;
> +	}
> +}
> +
> +static void amd_fixup_freq(struct processor_performance *perf)
> +{
>  
> +    int i;
> +
> +    for (i = 0; i < perf->state_count; i++)
> +        amd_fixup_frequency(perf->states, i);
> +
> +}
>  static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
>  {
>      struct acpi_cpufreq_data *data;
> @@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
>  
>      perf = &processor_pminfo[policy->cpu]->perf;
>  
> +    amd_fixup_freq(perf);
> +
>      cpufreq_verify_within_limits(policy, 0, 
>          perf->states[perf->platform_limit].core_frequency * 1000);

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-18 20:03         ` Borislav Petkov
@ 2013-01-18 22:00           ` Konrad Rzeszutek Wilk
  2013-01-21 12:22           ` Stefan Bader
  1 sibling, 0 replies; 28+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-01-18 22:00 UTC (permalink / raw)
  To: Borislav Petkov, Stefan Bader, Andre Przywara, xen-devel,
	Linux Kernel Mailing List, Rafael J. Wysocki, Matthew Garrett

> 
> > Right, that information is gathered from the MSRs. I think the Xen would
> > need to do this since it can do the MSRs correctly and modify the P-states.
> > 
> > So something like this in the hypervisor maybe (not even tested):
> 
> Yeah, something like that. Basically you can copy the quirk down to the
> hypervisor.

<nods> Need also to include the comments from Matthew in it.
> 
> But, Andre was explaining to me the other day that those P-states
> frequencies are not that important.
> 
> Let me explain: the ondemand governor, for example, computes idle time
> and each time it needs to increase, it switches straight up to the
> highest frequency. When it decreases the freq. though, it goes down in a
> staircase manner, going over all P-states, AFAICT.
> 
> So we use them but not for all decisions. The question is, what does the
> xen governor(s) do?

 should look in the code. I know it borrowed from the Linux code - but
I don't know from which era - 2.6.18 maybe?

> 
> If it only uses the frequencies for reporting, then it is not that big
> of a deal. If it uses their values for switching decisions, then it
> probably needs the correct ones.

OK. 
> 
> > diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c
> > index a9b7792..54e7808 100644
> > --- a/xen/arch/x86/acpi/cpufreq/powernow.c
> > +++ b/xen/arch/x86/acpi/cpufreq/powernow.c
> > @@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy,
> >  
> >      return 0;
> >  }
> > +#define MSR_AMD_PSTATE_DEF_BASE     0xc0010064
> > +static void amd_fixup_frequency(struct xen_processor_px *px, int i)
> > +{
> > +	u32 hi, lo, fid, did;
> > +	int index = px->control & 0x00000007;
> > +
> > +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> > +		return;
> > +
> > +	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
> > +	    || boot_cpu_data.x86 == 0x11) {
> > +		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> > +        /* Bit 63 indicates whether contents are valid */
> > +        if (!(hi & 0x80000000))
> > +            return;
> 
> Something's funny with this indentation.

That was copy-n-paste, so I must have done something incorrectly. Anyhow
I still need to actually test this code.

> 
> > +
> > +		fid = lo & 0x3f;
> > +		did = (lo >> 6) & 7;
> > +		if (boot_cpu_data.x86 == 0x10)
> > +			px->core_frequency = (100 * (fid + 0x10)) >> did;
> > +		else
> > +			px->core_frequency = (100 * (fid + 8)) >> did;
> > +	}
> > +}
> > +
> > +static void amd_fixup_freq(struct processor_performance *perf)
> > +{
> >  
> > +    int i;
> > +
> > +    for (i = 0; i < perf->state_count; i++)
> > +        amd_fixup_frequency(perf->states, i);
> > +
> > +}
> >  static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
> >  {
> >      struct acpi_cpufreq_data *data;
> > @@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
> >  
> >      perf = &processor_pminfo[policy->cpu]->perf;
> >  
> > +    amd_fixup_freq(perf);
> > +
> >      cpufreq_verify_within_limits(policy, 0, 
> >          perf->states[perf->platform_limit].core_frequency * 1000);
> 
> Thanks.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.
> --

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-18 20:03         ` Borislav Petkov
  2013-01-18 22:00           ` Konrad Rzeszutek Wilk
@ 2013-01-21 12:22           ` Stefan Bader
  2013-01-21 12:42             ` Borislav Petkov
  1 sibling, 1 reply; 28+ messages in thread
From: Stefan Bader @ 2013-01-21 12:22 UTC (permalink / raw)
  To: Borislav Petkov, Konrad Rzeszutek Wilk, Andre Przywara,
	xen-devel, Linux Kernel Mailing List, Rafael J. Wysocki,
	Matthew Garrett

[-- Attachment #1: Type: text/plain, Size: 2722 bytes --]

On 01/18/2013 08:03 PM, Borislav Petkov wrote:
> On Fri, Jan 18, 2013 at 02:00:15PM -0500, Konrad Rzeszutek Wilk wrote:
>> I did not explain myself well. The fix is OK - it just that the
>> hypervisor causes the quirk to not work correctly. Hmm, I wonder if
>> there BIOSes that do the same thing (cause the MSR to return 0). Per
>> you estimation of BIOS quality, it seems that this could happen.
>
> Yeah, I don't think there's a limit to the amount of SNAFU a BIOS can
> cause :-).
>

So for having the "check for sensible BIOS" in mainline I refreshed the patch 
(fixed the bit test, and actually tested it this time) and also added some 
hopefully sensible explanation to it (attached below).

Should I send it to acpi lists or would that have to go via an Andre?

-Stefan

 From 6e2fc8291c91339123a37162382d8b08b50867ba Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Mon, 14 Jan 2013 16:17:00 +0100
Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies

To fix incorrect P-state frequencies which can happen on
some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d
   "ACPI: Add fixups for AMD P-state figures"
introduced a quirk to obtain the correct values by reading
from AMD specific MSRs.

This did cause a regression when running a kernel using that
quirk under Xen which does (currently) not pass on the contents
of the HW but 0. And this seems to cause a failure to initialize
the ondemand governour (hard to say for sure as all P-states
appear to run at the same frequency).

While this should also be fixed in the hypervisor (to allow
a guest to read that MSR), this patch is intended to work
around the issue in the meantime. In discussion it turned out
that indeed real HW/BIOSes may choose to not set the valid bit
and thus mark the P-state as invalid. So this could be considered
a fix for broken BIOSes that also works around the issue on Xen.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@vger.kernel.org # v3.7..
---
  drivers/acpi/processor_perflib.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index 836bfe0..41f4bdac 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px 
*px, int i)
  	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
  	    || boot_cpu_data.x86 == 0x11) {
  		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
+		/* Bit 63 indicates whether contents are valid */
+		if (!(hi & 0x80000000))
+			return;
  		fid = lo & 0x3f;
  		did = (lo >> 6) & 7;
  		if (boot_cpu_data.x86 == 0x10)
-- 
1.8.0



[-- Attachment #2: 0001-ACPI-Check-MSR-valid-bit-before-using-P-state-freque.patch --]
[-- Type: text/x-diff, Size: 1932 bytes --]

>From 6e2fc8291c91339123a37162382d8b08b50867ba Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Mon, 14 Jan 2013 16:17:00 +0100
Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies

To fix incorrect P-state frequencies which can happen on
some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d
  "ACPI: Add fixups for AMD P-state figures"
introduced a quirk to obtain the correct values by reading
from AMD specific MSRs.

This did cause a regression when running a kernel using that
quirk under Xen which does (currently) not pass on the contents
of the HW but 0. And this seems to cause a failure to initialize
the ondemand governour (hard to say for sure as all P-states
appear to run at the same frequency).

While this should also be fixed in the hypervisor (to allow
a guest to read that MSR), this patch is intended to work
around the issue in the meantime. In discussion it turned out
that indeed real HW/BIOSes may choose to not set the valid bit
and thus mark the P-state as invalid. So this could be considered
a fix for broken BIOSes that also works around the issue on Xen.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@vger.kernel.org # v3.7..
---
 drivers/acpi/processor_perflib.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index 836bfe0..41f4bdac 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct acpi_processor_px *px, int i)
 	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
 	    || boot_cpu_data.x86 == 0x11) {
 		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
+		/* Bit 63 indicates whether contents are valid */
+		if (!(hi & 0x80000000))
+			return;
 		fid = lo & 0x3f;
 		did = (lo >> 6) & 7;
 		if (boot_cpu_data.x86 == 0x10)
-- 
1.8.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-21 12:22           ` Stefan Bader
@ 2013-01-21 12:42             ` Borislav Petkov
  2013-01-21 12:53               ` Rafael J. Wysocki
                                 ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Borislav Petkov @ 2013-01-21 12:42 UTC (permalink / raw)
  To: Stefan Bader
  Cc: Konrad Rzeszutek Wilk, Andre Przywara, xen-devel,
	Linux Kernel Mailing List, Rafael J. Wysocki, Matthew Garrett

On Mon, Jan 21, 2013 at 12:22:18PM +0000, Stefan Bader wrote:
> So for having the "check for sensible BIOS" in mainline I refreshed
> the patch (fixed the bit test, and actually tested it this time) and
> also added some hopefully sensible explanation to it (attached
> below).
> 
> Should I send it to acpi lists or would that have to go via an Andre?

Maybe Rafael could pick it up?

> 
> -Stefan
> 
> From 6e2fc8291c91339123a37162382d8b08b50867ba Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@canonical.com>
> Date: Mon, 14 Jan 2013 16:17:00 +0100
> Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies
> 
> To fix incorrect P-state frequencies which can happen on
> some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d
>   "ACPI: Add fixups for AMD P-state figures"
> introduced a quirk to obtain the correct values by reading
> from AMD specific MSRs.
> 
> This did cause a regression when running a kernel using that
> quirk under Xen which does (currently) not pass on the contents
> of the HW but 0.

Actually this should say "does not currently pass through MSR accesses
to baremetal" or similar.

And this bit you mean is actually bit 63:

"63: PstateEn. Read-write. 1=The P-state specified by this MSR is valid.
0=The P-state specified by this MSR is not valid. The purpose of this
register is to indicate if the rest of the P-state information in the
register is valid after a reset; it controls no hardware."

in the MSRC001_00[68:64] P-State [4:0] Registers.

> And this seems to cause a failure to initialize
> the ondemand governour (hard to say for sure as all P-states
> appear to run at the same frequency).
> 
> While this should also be fixed in the hypervisor (to allow
> a guest to read that MSR), this patch is intended to work
> around the issue in the meantime. In discussion it turned out
> that indeed real HW/BIOSes may choose to not set the valid bit
> and thus mark the P-state as invalid. So this could be considered
> a fix for broken BIOSes that also works around the issue on Xen.
> 
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> Cc: stable@vger.kernel.org # v3.7..
> ---
>  drivers/acpi/processor_perflib.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
> index 836bfe0..41f4bdac 100644
> --- a/drivers/acpi/processor_perflib.c
> +++ b/drivers/acpi/processor_perflib.c
> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct
> acpi_processor_px *px, int i)
>  	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
>  	    || boot_cpu_data.x86 == 0x11) {
>  		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> +		/* Bit 63 indicates whether contents are valid */
> +		if (!(hi & 0x80000000))

You can make this a lot more explicit:

		if (!(hi & BIT(31)))
			return;

This way

1) you're sure you're testing the correct bit and
2) any reviewer can know on the spot which bit it is about.

> +			return;
>  		fid = lo & 0x3f;
>  		did = (lo >> 6) & 7;
>  		if (boot_cpu_data.x86 == 0x10)

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-21 12:42             ` Borislav Petkov
@ 2013-01-21 12:53               ` Rafael J. Wysocki
  2013-01-21 13:08                 ` Borislav Petkov
  2013-01-21 13:11               ` Stefan Bader
  2013-01-21 15:03               ` Stefan Bader
  2 siblings, 1 reply; 28+ messages in thread
From: Rafael J. Wysocki @ 2013-01-21 12:53 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Stefan Bader, Konrad Rzeszutek Wilk, Andre Przywara, xen-devel,
	Linux Kernel Mailing List, Matthew Garrett

On Monday, January 21, 2013 01:42:55 PM Borislav Petkov wrote:
> On Mon, Jan 21, 2013 at 12:22:18PM +0000, Stefan Bader wrote:
> > So for having the "check for sensible BIOS" in mainline I refreshed
> > the patch (fixed the bit test, and actually tested it this time) and
> > also added some hopefully sensible explanation to it (attached
> > below).
> > 
> > Should I send it to acpi lists or would that have to go via an Andre?
> 
> Maybe Rafael could pick it up?

I can, if you ACK it for me. :-)

Thanks,
Rafael

 
> > From 6e2fc8291c91339123a37162382d8b08b50867ba Mon Sep 17 00:00:00 2001
> > From: Stefan Bader <stefan.bader@canonical.com>
> > Date: Mon, 14 Jan 2013 16:17:00 +0100
> > Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies
> > 
> > To fix incorrect P-state frequencies which can happen on
> > some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d
> >   "ACPI: Add fixups for AMD P-state figures"
> > introduced a quirk to obtain the correct values by reading
> > from AMD specific MSRs.
> > 
> > This did cause a regression when running a kernel using that
> > quirk under Xen which does (currently) not pass on the contents
> > of the HW but 0.
> 
> Actually this should say "does not currently pass through MSR accesses
> to baremetal" or similar.
> 
> And this bit you mean is actually bit 63:
> 
> "63: PstateEn. Read-write. 1=The P-state specified by this MSR is valid.
> 0=The P-state specified by this MSR is not valid. The purpose of this
> register is to indicate if the rest of the P-state information in the
> register is valid after a reset; it controls no hardware."
> 
> in the MSRC001_00[68:64] P-State [4:0] Registers.
> 
> > And this seems to cause a failure to initialize
> > the ondemand governour (hard to say for sure as all P-states
> > appear to run at the same frequency).
> > 
> > While this should also be fixed in the hypervisor (to allow
> > a guest to read that MSR), this patch is intended to work
> > around the issue in the meantime. In discussion it turned out
> > that indeed real HW/BIOSes may choose to not set the valid bit
> > and thus mark the P-state as invalid. So this could be considered
> > a fix for broken BIOSes that also works around the issue on Xen.
> > 
> > Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> > Cc: stable@vger.kernel.org # v3.7..
> > ---
> >  drivers/acpi/processor_perflib.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
> > index 836bfe0..41f4bdac 100644
> > --- a/drivers/acpi/processor_perflib.c
> > +++ b/drivers/acpi/processor_perflib.c
> > @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct
> > acpi_processor_px *px, int i)
> >  	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
> >  	    || boot_cpu_data.x86 == 0x11) {
> >  		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> > +		/* Bit 63 indicates whether contents are valid */
> > +		if (!(hi & 0x80000000))
> 
> You can make this a lot more explicit:
> 
> 		if (!(hi & BIT(31)))
> 			return;
> 
> This way
> 
> 1) you're sure you're testing the correct bit and
> 2) any reviewer can know on the spot which bit it is about.
> 
> > +			return;
> >  		fid = lo & 0x3f;
> >  		did = (lo >> 6) & 7;
> >  		if (boot_cpu_data.x86 == 0x10)
> 
> Thanks.
> 
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-21 12:53               ` Rafael J. Wysocki
@ 2013-01-21 13:08                 ` Borislav Petkov
  0 siblings, 0 replies; 28+ messages in thread
From: Borislav Petkov @ 2013-01-21 13:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stefan Bader, Konrad Rzeszutek Wilk, Andre Przywara, xen-devel,
	Linux Kernel Mailing List, Matthew Garrett

On Mon, Jan 21, 2013 at 01:53:38PM +0100, Rafael J. Wysocki wrote:

> I can, if you ACK it for me. :-)

Will do, after Stefan corrects some cosmetic issues :-)

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-21 12:42             ` Borislav Petkov
  2013-01-21 12:53               ` Rafael J. Wysocki
@ 2013-01-21 13:11               ` Stefan Bader
  2013-01-21 15:03               ` Stefan Bader
  2 siblings, 0 replies; 28+ messages in thread
From: Stefan Bader @ 2013-01-21 13:11 UTC (permalink / raw)
  To: Borislav Petkov, Konrad Rzeszutek Wilk, Andre Przywara,
	xen-devel, Linux Kernel Mailing List, Rafael J. Wysocki,
	Matthew Garrett

On 01/21/2013 12:42 PM, Borislav Petkov wrote:
> On Mon, Jan 21, 2013 at 12:22:18PM +0000, Stefan Bader wrote:
>> So for having the "check for sensible BIOS" in mainline I refreshed
>> the patch (fixed the bit test, and actually tested it this time) and
>> also added some hopefully sensible explanation to it (attached
>> below).
>>
>> Should I send it to acpi lists or would that have to go via an Andre?
>
> Maybe Rafael could pick it up?
>
>>
>> -Stefan
>>
>>  From 6e2fc8291c91339123a37162382d8b08b50867ba Mon Sep 17 00:00:00 2001
>> From: Stefan Bader <stefan.bader@canonical.com>
>> Date: Mon, 14 Jan 2013 16:17:00 +0100
>> Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies
>>
>> To fix incorrect P-state frequencies which can happen on
>> some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d
>>    "ACPI: Add fixups for AMD P-state figures"
>> introduced a quirk to obtain the correct values by reading
>> from AMD specific MSRs.
>>
>> This did cause a regression when running a kernel using that
>> quirk under Xen which does (currently) not pass on the contents
>> of the HW but 0.
>
> Actually this should say "does not currently pass through MSR accesses
> to baremetal" or similar.

Ok, that sounds much better.

>
> And this bit you mean is actually bit 63:
>
> "63: PstateEn. Read-write. 1=The P-state specified by this MSR is valid.
> 0=The P-state specified by this MSR is not valid. The purpose of this
> register is to indicate if the rest of the P-state information in the
> register is valid after a reset; it controls no hardware."
>
> in the MSRC001_00[68:64] P-State [4:0] Registers.

Darn, yes.

>
>> And this seems to cause a failure to initialize
>> the ondemand governour (hard to say for sure as all P-states
>> appear to run at the same frequency).
>>
>> While this should also be fixed in the hypervisor (to allow
>> a guest to read that MSR), this patch is intended to work
>> around the issue in the meantime. In discussion it turned out
>> that indeed real HW/BIOSes may choose to not set the valid bit
>> and thus mark the P-state as invalid. So this could be considered
>> a fix for broken BIOSes that also works around the issue on Xen.
>>
>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>> Cc: stable@vger.kernel.org # v3.7..
>> ---
>>   drivers/acpi/processor_perflib.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
>> index 836bfe0..41f4bdac 100644
>> --- a/drivers/acpi/processor_perflib.c
>> +++ b/drivers/acpi/processor_perflib.c
>> @@ -340,6 +340,9 @@ static void amd_fixup_frequency(struct
>> acpi_processor_px *px, int i)
>>   	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
>>   	    || boot_cpu_data.x86 == 0x11) {
>>   		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
>> +		/* Bit 63 indicates whether contents are valid */
>> +		if (!(hi & 0x80000000))
>
> You can make this a lot more explicit:
>
> 		if (!(hi & BIT(31)))
> 			return;
>

True, ok, so let me respin the whole thing and re-send it.

-Stefan
> This way
>
> 1) you're sure you're testing the correct bit and
> 2) any reviewer can know on the spot which bit it is about.
>
>> +			return;
>>   		fid = lo & 0x3f;
>>   		did = (lo >> 6) & 7;
>>   		if (boot_cpu_data.x86 == 0x10)
>
> Thanks.
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-21 12:42             ` Borislav Petkov
  2013-01-21 12:53               ` Rafael J. Wysocki
  2013-01-21 13:11               ` Stefan Bader
@ 2013-01-21 15:03               ` Stefan Bader
  2013-01-21 15:31                 ` Borislav Petkov
  2 siblings, 1 reply; 28+ messages in thread
From: Stefan Bader @ 2013-01-21 15:03 UTC (permalink / raw)
  To: Borislav Petkov, Konrad Rzeszutek Wilk, Andre Przywara,
	xen-devel, Linux Kernel Mailing List, Rafael J. Wysocki,
	Matthew Garrett

[-- Attachment #1: Type: text/plain, Size: 2363 bytes --]

Ok, so how about this?

-Stefan

 From 9870926d4a847e36c0f61921762fd50f1c92f75d Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Mon, 14 Jan 2013 16:17:00 +0100
Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies

To fix incorrect P-state frequencies which can happen on
some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d
   "ACPI: Add fixups for AMD P-state figures"
introduced a quirk to obtain the correct values by reading
from AMD specific MSRs.

This did cause a regression when running a kernel using that
quirk under Xen which does (currently) not pass through MSR
reads to the HW. Instead the guest gets a 0 in return.
And this seems to cause a failure to initialize the ondemand
governour (hard to say for sure as all P-states appear to run
at the same frequency).

While this should also be fixed in the hypervisor (to allow
a guest to read that MSR), this patch is intended to work
around the issue in the meantime. In discussion it turned out
that indeed real HW/BIOSes may choose to not set the valid bit
and thus mark the P-state as invalid. So this could be considered
a fix for broken BIOSes that also works around the issue on Xen.

[v2] Reword description text and use helper for bit index.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@vger.kernel.org # v3.7..
---
  drivers/acpi/processor_perflib.c | 11 +++++++++++
  1 file changed, 11 insertions(+)

diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index 836bfe0..caa042e 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -340,6 +340,17 @@ static void amd_fixup_frequency(struct acpi_processor_px 
*px, int i)
  	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
  	    || boot_cpu_data.x86 == 0x11) {
  		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
+		/*
+		 * MSR C001_0064+:
+		 * Bit 63: PstateEn. Read-write. 1=The P-state specified by
+		 * this MSR is valid. 0=The P-state specified by this MSR is
+		 * not valid. The purpose of this register is to indicate if
+		 * the rest of the P-state information in the register is
+		 * valid after a reset; it controls no hardware.
+		 */
+		if (!(hi & BIT(31)))
+			return;
+
  		fid = lo & 0x3f;
  		did = (lo >> 6) & 7;
  		if (boot_cpu_data.x86 == 0x10)
-- 
1.8.0


[-- Attachment #2: 0001-ACPI-Check-MSR-valid-bit-before-using-P-state-freque.patch --]
[-- Type: text/x-diff, Size: 2320 bytes --]

>From 9870926d4a847e36c0f61921762fd50f1c92f75d Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Mon, 14 Jan 2013 16:17:00 +0100
Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies

To fix incorrect P-state frequencies which can happen on
some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d
  "ACPI: Add fixups for AMD P-state figures"
introduced a quirk to obtain the correct values by reading
from AMD specific MSRs.

This did cause a regression when running a kernel using that
quirk under Xen which does (currently) not pass through MSR
reads to the HW. Instead the guest gets a 0 in return.
And this seems to cause a failure to initialize the ondemand
governour (hard to say for sure as all P-states appear to run
at the same frequency).

While this should also be fixed in the hypervisor (to allow
a guest to read that MSR), this patch is intended to work
around the issue in the meantime. In discussion it turned out
that indeed real HW/BIOSes may choose to not set the valid bit
and thus mark the P-state as invalid. So this could be considered
a fix for broken BIOSes that also works around the issue on Xen.

[v2] Reword description text and use helper for bit index.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@vger.kernel.org # v3.7..
---
 drivers/acpi/processor_perflib.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index 836bfe0..caa042e 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -340,6 +340,17 @@ static void amd_fixup_frequency(struct acpi_processor_px *px, int i)
 	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
 	    || boot_cpu_data.x86 == 0x11) {
 		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
+		/*
+		 * MSR C001_0064+:
+		 * Bit 63: PstateEn. Read-write. 1=The P-state specified by
+		 * this MSR is valid. 0=The P-state specified by this MSR is
+		 * not valid. The purpose of this register is to indicate if
+		 * the rest of the P-state information in the register is
+		 * valid after a reset; it controls no hardware.
+		 */
+		if (!(hi & BIT(31)))
+			return;
+
 		fid = lo & 0x3f;
 		did = (lo >> 6) & 7;
 		if (boot_cpu_data.x86 == 0x10)
-- 
1.8.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-21 15:03               ` Stefan Bader
@ 2013-01-21 15:31                 ` Borislav Petkov
  2013-01-22 13:54                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2013-01-21 15:31 UTC (permalink / raw)
  To: Stefan Bader
  Cc: Konrad Rzeszutek Wilk, Andre Przywara, xen-devel,
	Linux Kernel Mailing List, Rafael J. Wysocki, Matthew Garrett

On Mon, Jan 21, 2013 at 03:03:43PM +0000, Stefan Bader wrote:
> From 9870926d4a847e36c0f61921762fd50f1c92f75d Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@canonical.com>
> Date: Mon, 14 Jan 2013 16:17:00 +0100
> Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies
> 
> To fix incorrect P-state frequencies which can happen on
> some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d
>   "ACPI: Add fixups for AMD P-state figures"
> introduced a quirk to obtain the correct values by reading
> from AMD specific MSRs.
> 
> This did cause a regression when running a kernel using that
> quirk under Xen which does (currently) not pass through MSR
> reads to the HW. Instead the guest gets a 0 in return.
> And this seems to cause a failure to initialize the ondemand
> governour (hard to say for sure as all P-states appear to run
> at the same frequency).
> 
> While this should also be fixed in the hypervisor (to allow
> a guest to read that MSR), this patch is intended to work
> around the issue in the meantime. In discussion it turned out
> that indeed real HW/BIOSes may choose to not set the valid bit
> and thus mark the P-state as invalid. So this could be considered
> a fix for broken BIOSes that also works around the issue on Xen.
> 
> [v2] Reword description text and use helper for bit index.
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> Cc: stable@vger.kernel.org # v3.7..
> ---
>  drivers/acpi/processor_perflib.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
> index 836bfe0..caa042e 100644
> --- a/drivers/acpi/processor_perflib.c
> +++ b/drivers/acpi/processor_perflib.c
> @@ -340,6 +340,17 @@ static void amd_fixup_frequency(struct
> acpi_processor_px *px, int i)
>  	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
>  	    || boot_cpu_data.x86 == 0x11) {
>  		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> +		/*
> +		 * MSR C001_0064+:
> +		 * Bit 63: PstateEn. Read-write. 1=The P-state specified by
> +		 * this MSR is valid. 0=The P-state specified by this MSR is
> +		 * not valid. The purpose of this register is to indicate if
> +		 * the rest of the P-state information in the register is
> +		 * valid after a reset; it controls no hardware.
> +		 */

Maybe this comment is a but too long and it contains that idiotic
processor manual speak :-). It should've contained only the first
sentence: "PstateEn. If set, the P-state is valid."

But maybe Rafael could correct it while committing, no reason to resend
for that only.

Acked-by: Borislav Petkov <bp@suse.de>


> +		if (!(hi & BIT(31)))
> +			return;
> +
>  		fid = lo & 0x3f;
>  		did = (lo >> 6) & 7;
>  		if (boot_cpu_data.x86 == 0x10)
> -- 
> 1.8.0

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xen-devel] kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-18 19:00       ` Konrad Rzeszutek Wilk
  2013-01-18 19:38         ` [Xen-devel] " Boris Ostrovsky
  2013-01-18 20:03         ` Borislav Petkov
@ 2013-01-22  0:01         ` Boris Ostrovsky
  2 siblings, 0 replies; 28+ messages in thread
From: Boris Ostrovsky @ 2013-01-22  0:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Borislav Petkov, Stefan Bader, Andre Przywara, xen-devel,
	Linux Kernel Mailing List, Rafael J. Wysocki, Matthew Garrett



On 01/18/2013 02:00 PM, Konrad Rzeszutek Wilk wrote:
>
> So something like this in the hypervisor maybe (not even tested):
>
> diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c
> index a9b7792..54e7808 100644
> --- a/xen/arch/x86/acpi/cpufreq/powernow.c
> +++ b/xen/arch/x86/acpi/cpufreq/powernow.c
> @@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy,
>
>       return 0;
>   }
> +#define MSR_AMD_PSTATE_DEF_BASE     0xc0010064
> +static void amd_fixup_frequency(struct xen_processor_px *px, int i)
> +{
> +	u32 hi, lo, fid, did;
> +	int index = px->control & 0x00000007;
> +
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> +		return;
> +
> +	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
> +	    || boot_cpu_data.x86 == 0x11) {
> +		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> +        /* Bit 63 indicates whether contents are valid */
> +        if (!(hi & 0x80000000))
> +            return;
> +
> +		fid = lo & 0x3f;
> +		did = (lo >> 6) & 7;
> +		if (boot_cpu_data.x86 == 0x10)
> +			px->core_frequency = (100 * (fid + 0x10)) >> did;
> +		else
> +			px->core_frequency = (100 * (fid + 8)) >> did;
> +	}
> +}
> +
> +static void amd_fixup_freq(struct processor_performance *perf)
> +{
>
> +    int i;
> +
> +    for (i = 0; i < perf->state_count; i++)
> +        amd_fixup_frequency(perf->states, i);

amd_fixup_frequency(&perf->states[i]) will walk P-states.

> +
> +}
>   static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
>   {
>       struct acpi_cpufreq_data *data;
> @@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
>
>       perf = &processor_pminfo[policy->cpu]->perf;
>
> +    amd_fixup_freq(perf);

Maybe move to powernow_cpufreq_cpu_init(), although this works too.


-boris


> +
>       cpufreq_verify_within_limits(policy, 0,
>           perf->states[perf->platform_limit].core_frequency * 1000);
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
  2013-01-21 15:31                 ` Borislav Petkov
@ 2013-01-22 13:54                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 28+ messages in thread
From: Rafael J. Wysocki @ 2013-01-22 13:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Stefan Bader, Konrad Rzeszutek Wilk, Andre Przywara, xen-devel,
	Linux Kernel Mailing List, Matthew Garrett

On Monday, January 21, 2013 04:31:05 PM Borislav Petkov wrote:
> On Mon, Jan 21, 2013 at 03:03:43PM +0000, Stefan Bader wrote:
> > From 9870926d4a847e36c0f61921762fd50f1c92f75d Mon Sep 17 00:00:00 2001
> > From: Stefan Bader <stefan.bader@canonical.com>
> > Date: Mon, 14 Jan 2013 16:17:00 +0100
> > Subject: [PATCH] ACPI: Check MSR valid bit before using P-state frequencies
> > 
> > To fix incorrect P-state frequencies which can happen on
> > some AMD systems f594065faf4f9067c2283a34619fc0714e79a98d
> >   "ACPI: Add fixups for AMD P-state figures"
> > introduced a quirk to obtain the correct values by reading
> > from AMD specific MSRs.
> > 
> > This did cause a regression when running a kernel using that
> > quirk under Xen which does (currently) not pass through MSR
> > reads to the HW. Instead the guest gets a 0 in return.
> > And this seems to cause a failure to initialize the ondemand
> > governour (hard to say for sure as all P-states appear to run
> > at the same frequency).
> > 
> > While this should also be fixed in the hypervisor (to allow
> > a guest to read that MSR), this patch is intended to work
> > around the issue in the meantime. In discussion it turned out
> > that indeed real HW/BIOSes may choose to not set the valid bit
> > and thus mark the P-state as invalid. So this could be considered
> > a fix for broken BIOSes that also works around the issue on Xen.
> > 
> > [v2] Reword description text and use helper for bit index.
> > Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> > Cc: stable@vger.kernel.org # v3.7..
> > ---
> >  drivers/acpi/processor_perflib.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
> > index 836bfe0..caa042e 100644
> > --- a/drivers/acpi/processor_perflib.c
> > +++ b/drivers/acpi/processor_perflib.c
> > @@ -340,6 +340,17 @@ static void amd_fixup_frequency(struct
> > acpi_processor_px *px, int i)
> >  	if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
> >  	    || boot_cpu_data.x86 == 0x11) {
> >  		rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
> > +		/*
> > +		 * MSR C001_0064+:
> > +		 * Bit 63: PstateEn. Read-write. 1=The P-state specified by
> > +		 * this MSR is valid. 0=The P-state specified by this MSR is
> > +		 * not valid. The purpose of this register is to indicate if
> > +		 * the rest of the P-state information in the register is
> > +		 * valid after a reset; it controls no hardware.
> > +		 */
> 
> Maybe this comment is a but too long and it contains that idiotic
> processor manual speak :-). It should've contained only the first
> sentence: "PstateEn. If set, the P-state is valid."
> 
> But maybe Rafael could correct it while committing, no reason to resend
> for that only.
> 
> Acked-by: Borislav Petkov <bp@suse.de>

Applied, the comment fixed up.

It's in my bleeding-edge branch for now and I'll move it to the linux-next
branch after build testing.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2013-01-22 13:48 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-14 15:58 kernel 3.7+ cpufreq regression on AMD system running as dom0 Stefan Bader
2013-01-14 16:34 ` Borislav Petkov
2013-01-14 16:55   ` [Xen-devel] " Jan Beulich
2013-01-14 17:08   ` Stefan Bader
2013-01-14 17:40     ` André Przywara
2013-01-14 17:40       ` André Przywara
2013-01-15 17:53   ` Konrad Rzeszutek Wilk
2013-01-15 18:18     ` Borislav Petkov
2013-01-18 19:00       ` Konrad Rzeszutek Wilk
2013-01-18 19:38         ` [Xen-devel] " Boris Ostrovsky
2013-01-18 19:44           ` Andrew Cooper
2013-01-18 20:03         ` Borislav Petkov
2013-01-18 22:00           ` Konrad Rzeszutek Wilk
2013-01-21 12:22           ` Stefan Bader
2013-01-21 12:42             ` Borislav Petkov
2013-01-21 12:53               ` Rafael J. Wysocki
2013-01-21 13:08                 ` Borislav Petkov
2013-01-21 13:11               ` Stefan Bader
2013-01-21 15:03               ` Stefan Bader
2013-01-21 15:31                 ` Borislav Petkov
2013-01-22 13:54                   ` Rafael J. Wysocki
2013-01-22  0:01         ` [Xen-devel] " Boris Ostrovsky
2013-01-16 10:26     ` Jan Beulich
2013-01-16 14:34       ` Stefan Bader
2013-01-16 14:34       ` [Xen-devel] " Stefan Bader
2013-01-16 10:26     ` Jan Beulich
2013-01-15 13:04 ` Matt Wilson
2013-01-15 17:59   ` [Xen-devel] " Matt Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.