* [PATCH] x86: Use deep C states for off-lined CPUs
@ 2012-02-28 22:08 Boris Ostrovsky
  2012-02-29  1:37 ` Zhang, Yang Z
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Boris Ostrovsky @ 2012-02-28 22:08 UTC (permalink / raw)
  To: xen-devel; +Cc: boris.ostrovsky

# HG changeset patch
# User Boris Ostrovsky <boris.ostrovsky@amd.com>
# Date 1330466573 -3600
# Node ID 9e5991ad9c85b5176ce269001e7957e8805dd93c
# Parent  a7bacdc5449a2f7bb9c35b2a1334b463fe9f29a9
x86: Use deep C states for off-lined CPUs

Currently when a core is taken off-line it is placed in C1 state (unless MONITOR/MWAIT
is used). This patch allows a core to go to deeper C states resulting in significantly
higher power savings.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>

diff -r a7bacdc5449a -r 9e5991ad9c85 xen/arch/x86/acpi/cpu_idle.c
--- a/xen/arch/x86/acpi/cpu_idle.c	Mon Feb 27 17:05:18 2012 +0000
+++ b/xen/arch/x86/acpi/cpu_idle.c	Tue Feb 28 23:02:53 2012 +0100
@@ -573,10 +573,10 @@ static void acpi_dead_idle(void)
     if ( (cx = &power->states[power->count-1]) == NULL )
         goto default_halt;
 
-    mwait_ptr = (void *)&mwait_wakeup(smp_processor_id());
-
     if ( cx->entry_method == ACPI_CSTATE_EM_FFH )
     {
+        mwait_ptr = (void *)&mwait_wakeup(smp_processor_id());
+
         /*
          * Cache must be flushed as the last operation before sleeping.
          * Otherwise, CPU may still hold dirty data, breaking cache coherency,
@@ -601,6 +601,20 @@ static void acpi_dead_idle(void)
             mb();
             __mwait(cx->address, 0);
         }
+    } 
+    else if ( cx->entry_method == ACPI_CSTATE_EM_SYSIO )
+    {
+        /* Avoid references to shared data after the cache flush */
+        u32 address = cx->address;
+        u32 pmtmr_ioport_local = pmtmr_ioport;
+
+        wbinvd();	
+
+        while ( 1 )
+        {
+            inb(address);
+            inl(pmtmr_ioport_local);
+        }
     }
 
 default_halt:
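
For readers following the thread, here is a minimal user-space sketch of the ordering used in the SYSIO branch above: snapshot the shared values into locals, flush the cache once, then loop on the C-state port read followed by a PM timer read. The PM timer read is presumably the usual dummy wait that follows a P_LVLx port read. All sim_* names, the port numbers, and the bounded loop are assumptions made for illustration; the real wbinvd(), inb() and inl() are privileged and are only recorded into a trace here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace of simulated hardware operations (illustration only). */
enum sim_op { SIM_WBINVD, SIM_INB, SIM_INL };

struct sim_event {
    enum sim_op op;
    uint32_t port;              /* 0 for SIM_WBINVD */
};

struct sim_trace {
    struct sim_event ev[16];
    size_t n;
};

static void sim_record(struct sim_trace *t, enum sim_op op, uint32_t port)
{
    if ( t->n < sizeof(t->ev) / sizeof(t->ev[0]) )
    {
        t->ev[t->n].op = op;
        t->ev[t->n].port = port;
        t->n++;
    }
}

/* Stand-ins for wbinvd(), inb() and inl(): they only log the call. */
static void sim_wbinvd(struct sim_trace *t) { sim_record(t, SIM_WBINVD, 0); }
static void sim_inb(struct sim_trace *t, uint32_t port) { sim_record(t, SIM_INB, port); }
static void sim_inl(struct sim_trace *t, uint32_t port) { sim_record(t, SIM_INL, port); }

/* Shared data that should not be referenced after the flush. */
struct sim_cx { uint32_t address; };
static uint32_t sim_pmtmr_ioport = 0x808;   /* assumed value */

/*
 * Same order as the patch: copy shared values to locals, flush, then loop.
 * 'wakeups' bounds the loop so the sketch terminates; the patch uses
 * while ( 1 ).
 */
static void sim_sysio_dead_idle(struct sim_trace *t, const struct sim_cx *cx,
                                unsigned int wakeups)
{
    uint32_t address = cx->address;
    uint32_t pmtmr_ioport_local = sim_pmtmr_ioport;
    unsigned int i;

    sim_wbinvd(t);

    for ( i = 0; i < wakeups; i++ )
    {
        sim_inb(t, address);            /* C-state entry via port read */
        sim_inl(t, pmtmr_ioport_local); /* dummy read after the entry */
    }
}
```

Only the two locals are used inside the loop, which is the point of the "avoid references to shared data" comment in the patch.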


* Re: [PATCH] x86: Use deep C states for off-lined CPUs
  2012-02-28 22:08 [PATCH] x86: Use deep C states for off-lined CPUs Boris Ostrovsky
@ 2012-02-29  1:37 ` Zhang, Yang Z
  2012-02-29  4:03   ` Ostrovsky, Boris
  2012-02-29  4:58 ` Liu, Jinsong
  2012-02-29  9:12 ` Jan Beulich
  2 siblings, 1 reply; 11+ messages in thread
From: Zhang, Yang Z @ 2012-02-29  1:37 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel

I noticed the following comments when using mwait based idle:
-------------------------------------------------------------------------
        while ( 1 )
        {
            /*
             * 1. The CLFLUSH is a workaround for erratum AAI65 for
             * the Xeon 7400 series.
             * 2. The WBINVD is insufficient due to the spurious-wakeup
             * case where we return around the loop.
             * 3. Unlike wbinvd, clflush is a light weight but not serializing
             * instruction, hence memory fence is necessary to make sure all
             * load/store visible before flush cache line.
             */
            mb();
            clflush(mwait_ptr);
            __monitor(mwait_ptr, 0, 0);
            mb();
            __mwait(cx->address, 0);
        }
    }
-------------------------------------------------------------------------
Your patch should follow it too.

best regards
yang



* Re: [PATCH] x86: Use deep C states for off-lined CPUs
  2012-02-29  1:37 ` Zhang, Yang Z
@ 2012-02-29  4:03   ` Ostrovsky, Boris
  2012-02-29  5:21     ` Liu, Jinsong
  0 siblings, 1 reply; 11+ messages in thread
From: Ostrovsky, Boris @ 2012-02-29  4:03 UTC (permalink / raw)
  To: Zhang, Yang Z, xen-devel

The patch is adding IO-based C-states. My understanding is that CLFLUSH was to work around a MONITOR-related erratum.

Or are you referring to something else?

-boris



* Re: [PATCH] x86: Use deep C states for off-lined CPUs
  2012-02-28 22:08 [PATCH] x86: Use deep C states for off-lined CPUs Boris Ostrovsky
  2012-02-29  1:37 ` Zhang, Yang Z
@ 2012-02-29  4:58 ` Liu, Jinsong
  2012-02-29 13:48   ` Boris Ostrovsky
  2012-02-29  9:12 ` Jan Beulich
  2 siblings, 1 reply; 11+ messages in thread
From: Liu, Jinsong @ 2012-02-29  4:58 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel

I don't think we should go back to the old SYSIO method. The history here is:

Xen originally had a SYSIO method for offlining CPUs, but at c/s 23022 we removed it, for the reasons below:
======================
x86: Fix cpu offline bug: cancel SYSIO method when play dead

Play dead is a fragile and tricky point of cpu offline logic.  For how
to play cpu dead, linux kernel changed several times: Very old kernel
support 3 ways to play cpu dead: mwait, SYSIO, and halt, just like
what cpuidle did when enter C3; Later, it cancel mwait and SYSIO
support, only use halt to play dead; Latest linux 2.6.38 add mwait
support when cpu dead.

This patch cancel SYSIO method when cpu dead, keep same with latest
kernel.

SYSIO is an obsoleted method to enter deep C, with some tricky
hardware behavior, and seldom supported in new platform.  Xen
experiment indicate that when cpu dead, SYSIO method would trigger
unknown issue which would bring strange error.  We now cancel SYSIO
method when cpu dead, after all, correctness is more important than
power save, and btw new platform use mwait.
======================

Thanks,
Jinsong


* Re: [PATCH] x86: Use deep C states for off-lined CPUs
  2012-02-29  4:03   ` Ostrovsky, Boris
@ 2012-02-29  5:21     ` Liu, Jinsong
  2012-02-29 14:55       ` Boris Ostrovsky
  0 siblings, 1 reply; 11+ messages in thread
From: Liu, Jinsong @ 2012-02-29  5:21 UTC (permalink / raw)
  To: Ostrovsky, Boris, Zhang, Yang Z, xen-devel; +Cc: Wei, Gang

Hmm, no.

The cache needs to be flushed whenever a *deep Cx* state can be spuriously woken up.
CLFLUSH is used here because it is a light-weight flush; WBINVD could be used as well if performance were not a concern.

For halt this is not needed, since the CPU keeps snooping while asleep.

Thanks,
Jinsong


* Re: [PATCH] x86: Use deep C states for off-lined CPUs
  2012-02-28 22:08 [PATCH] x86: Use deep C states for off-lined CPUs Boris Ostrovsky
  2012-02-29  1:37 ` Zhang, Yang Z
  2012-02-29  4:58 ` Liu, Jinsong
@ 2012-02-29  9:12 ` Jan Beulich
  2 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2012-02-29  9:12 UTC (permalink / raw)
  To: boris.ostrovsky; +Cc: xen-devel

>>> On 28.02.12 at 23:08, Boris Ostrovsky <boris.ostrovsky@amd.com> wrote:
> --- a/xen/arch/x86/acpi/cpu_idle.c	Mon Feb 27 17:05:18 2012 +0000
> +++ b/xen/arch/x86/acpi/cpu_idle.c	Tue Feb 28 23:02:53 2012 +0100
> @@ -573,10 +573,10 @@ static void acpi_dead_idle(void)
>      if ( (cx = &power->states[power->count-1]) == NULL )
>          goto default_halt;
>  
> -    mwait_ptr = (void *)&mwait_wakeup(smp_processor_id());
> -
>      if ( cx->entry_method == ACPI_CSTATE_EM_FFH )
>      {
> +        mwait_ptr = (void *)&mwait_wakeup(smp_processor_id());
> +

If you're concerned about the placement of this (the change being
unrelated to what your patch is aiming at anyway), then you should
- explain why
- move the declaration of mwait_ptr also into the if() scope

>          /*
>           * Cache must be flushed as the last operation before sleeping.
>           * Otherwise, CPU may still hold dirty data, breaking cache coherency,
> @@ -601,6 +601,20 @@ static void acpi_dead_idle(void)
>              mb();
>              __mwait(cx->address, 0);
>          }
> +    } 
> +    else if ( cx->entry_method == ACPI_CSTATE_EM_SYSIO )
> +    {
> +        /* Avoid references to shared data after the cache flush */
> +        u32 address = cx->address;
> +        u32 pmtmr_ioport_local = pmtmr_ioport;
> +
> +        wbinvd();	
> +
> +        while ( 1 )
> +        {
> +            inb(address);
> +            inl(pmtmr_ioport_local);
> +        }

You will need to eliminate the reservations of the Intel folks for this
to be accepted, I'm afraid, or make this AMD specific (provided the
issues pointed out by them don't affect AMD systems).

Jan

>      }
>  
>  default_halt:
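
A minimal sketch of what "make this AMD specific" could look like as a selection step, assuming the SYSIO path is only taken on AMD and everything else falls back to halt. The enum names below are hypothetical stand-ins for Xen's ACPI_CSTATE_EM_* and vendor constants, not the actual definitions, and this is not the code that was eventually committed.

```c
/* Hypothetical stand-ins for the entry-method and vendor constants. */
enum entry_method { EM_NONE, EM_SYSIO, EM_FFH, EM_HALT };
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* What the dead-idle path would do for an off-lined CPU. */
enum dead_idle_action { DEAD_MWAIT, DEAD_SYSIO, DEAD_HALT };

static enum dead_idle_action pick_dead_idle(enum cpu_vendor vendor,
                                            enum entry_method em)
{
    /* MWAIT-based entry stays available regardless of vendor. */
    if ( em == EM_FFH )
        return DEAD_MWAIT;

    /* Port-I/O entry only where it is reported to work well (assumed: AMD). */
    if ( em == EM_SYSIO && vendor == VENDOR_AMD )
        return DEAD_SYSIO;

    /* Everything else keeps the existing behaviour: plain halt. */
    return DEAD_HALT;
}
```

With this shape, an Intel system reporting a SYSIO entry method still ends up in halt, which matches the behaviour after c/s 23022.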


* Re: [PATCH] x86: Use deep C states for off-lined CPUs
  2012-02-29  4:58 ` Liu, Jinsong
@ 2012-02-29 13:48   ` Boris Ostrovsky
  2012-03-01  7:29     ` Liu, Jinsong
  0 siblings, 1 reply; 11+ messages in thread
From: Boris Ostrovsky @ 2012-02-29 13:48 UTC (permalink / raw)
  To: Liu, Jinsong; +Cc: xen-devel

As far as I can tell the most relevant change in Linux was this:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ea53069231f9317062910d6e772cca4ce93de8c8
and it sounds like it was made mostly because MWAIT-based idle is more
efficient on Intel processors. That's not the case on AMD, where IO-based
idle is preferred (and I am not aware of any issues, at least so far).

I can make the patch AMD-specific, but since for the most part the
logic is the same as in acpi_idle_do_entry(), won't we have to modify
that function as well?

-boris



* Re: [PATCH] x86: Use deep C states for off-lined CPUs
  2012-02-29  5:21     ` Liu, Jinsong
@ 2012-02-29 14:55       ` Boris Ostrovsky
  2012-02-29 15:03         ` Jan Beulich
  2012-03-01  8:10         ` Liu, Jinsong
  0 siblings, 2 replies; 11+ messages in thread
From: Boris Ostrovsky @ 2012-02-29 14:55 UTC (permalink / raw)
  To: Liu, Jinsong; +Cc: Zhang, Yang Z, xen-devel, Wei, Gang

On 02/29/12 00:21, Liu, Jinsong wrote:
> Hmm, no.
>
> It need flush cache, as long as *deep Cx* would be spurious-wokenup.
> The reason clflush here is, it's a light-weight flush, in fact it also could use wbinvd if not consider performance.

What address would need to be CLFLUSH'd? Both "address" and
"pmtmr_ioport_local"?

>
> For halt, it don't need to do so since cpu still keep snoop when sleep.

If the CPU does not snoop when in deeper C-states, wouldn't we have a problem
with acpi_idle_do_entry()? There is a code path (at least for C2)
where the cache is not flushed.

Incidentally, if CLFLUSH is required for MONITOR then perhaps
mwait_idle_with_hints() needs to have it as well?

-boris


* Re: [PATCH] x86: Use deep C states for off-lined CPUs
  2012-02-29 14:55       ` Boris Ostrovsky
@ 2012-02-29 15:03         ` Jan Beulich
  2012-03-01  8:10         ` Liu, Jinsong
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2012-02-29 15:03 UTC (permalink / raw)
  To: Boris Ostrovsky, Jinsong Liu; +Cc: Yang Z Zhang, xen-devel, Gang Wei

>>> On 29.02.12 at 15:55, Boris Ostrovsky <boris.ostrovsky@amd.com> wrote:
> On 02/29/12 00:21, Liu, Jinsong wrote:
>> Hmm, no.
>>
>> It need flush cache, as long as *deep Cx* would be spurious-wokenup.
>> The reason clflush here is, it's a light-weight flush, in fact it also could 
> use wbinvd if not consider performance.
> 
> What address would need to be CFLUSH'd ? Both "address" and 
> "pmtmr_ioport_local"?

Hardly - these are both I/O ports.

Jan


* Re: [PATCH] x86: Use deep C states for off-lined CPUs
  2012-02-29 13:48   ` Boris Ostrovsky
@ 2012-03-01  7:29     ` Liu, Jinsong
  0 siblings, 0 replies; 11+ messages in thread
From: Liu, Jinsong @ 2012-03-01  7:29 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: xen-devel

Boris Ostrovsky wrote:
> As far as I can tell the most relevant change in Linux was this:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ea53069231f9317062910d6e772cca4ce93de8c8
> and it sounds that it was made mostly because MWAIT-based idle is more
> efficient on Intel processors. That's not the case on AMD where
> IO-based idle is preferred (and I am not aware of any issues, at
> least so far). 
> 
> I can make the patch to be AMD_specific but since for the most parts
> the logic is the same as in acpi_idle_do_entry() won't we have to
> modify that function as well?
> 

An AMD-specific approach is OK with me.

Thanks,
Jinsong

> 
> On 02/28/12 23:58, Liu, Jinsong wrote:
>> I don't think we should go back to the old SYSIO method. The history
>> here is:
>>
>> Xen originally had the SYSIO method for offlined cpus, but at c/s 23022
>> we cancelled it for the reason below:
>> ======================
>> x86: Fix cpu offline bug: cancel SYSIO method when play dead
>>
>> Play dead is a fragile and tricky point of the cpu offline logic.  The
>> Linux kernel has changed how it plays a cpu dead several times: very old
>> kernels supported 3 ways to play cpu dead (mwait, SYSIO, and halt), just
>> like what cpuidle did when entering C3; later, it dropped mwait and SYSIO
>> support and only used halt to play dead; the latest Linux, 2.6.38, adds
>> mwait support for dead cpus.
>>
>> This patch cancels the SYSIO method when a cpu is dead, to stay in line
>> with the latest kernel.
>>
>> SYSIO is an obsolete method of entering deep C states, with some tricky
>> hardware behavior, and is seldom supported on new platforms.  Xen
>> experiments indicate that when a cpu is dead, the SYSIO method triggers
>> an unknown issue that leads to strange errors.  We now cancel the SYSIO
>> method when a cpu is dead; after all, correctness is more important than
>> power saving, and new platforms use mwait anyway.
>> ======================
>> 
>> Thanks,
>> Jinsong
>> 
>> Boris Ostrovsky wrote:
>>> # HG changeset patch
>>> # User Boris Ostrovsky <boris.ostrovsky@amd.com>
>>> # Date 1330466573 -3600
>>> # Node ID 9e5991ad9c85b5176ce269001e7957e8805dd93c
>>> # Parent  a7bacdc5449a2f7bb9c35b2a1334b463fe9f29a9
>>> x86: Use deep C states for off-lined CPUs
>>> 
>>> Currently when a core is taken off-line it is placed in C1 state
>>> (unless MONITOR/MWAIT is used). This patch allows a core to go to
>>> deeper C states resulting in significantly higher power savings.
>>> 
>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
>>> 
>>> diff -r a7bacdc5449a -r 9e5991ad9c85 xen/arch/x86/acpi/cpu_idle.c
>>> --- a/xen/arch/x86/acpi/cpu_idle.c	Mon Feb 27 17:05:18 2012 +0000
>>> +++ b/xen/arch/x86/acpi/cpu_idle.c	Tue Feb 28 23:02:53 2012 +0100
>>> @@ -573,10 +573,10 @@ static void acpi_dead_idle(void)
>>>       if ( (cx = &power->states[power->count-1]) == NULL )
>>>           goto default_halt;
>>>
>>> -    mwait_ptr = (void *)&mwait_wakeup(smp_processor_id());
>>> -
>>>       if ( cx->entry_method == ACPI_CSTATE_EM_FFH )
>>>       {
>>> +        mwait_ptr = (void *)&mwait_wakeup(smp_processor_id());
>>> +
>>>           /*
>>>            * Cache must be flushed as the last operation before sleeping.
>>>            * Otherwise, CPU may still hold dirty data, breaking cache coherency,
>>> @@ -601,6 +601,20 @@ static void acpi_dead_idle(void)
>>>               mb();
>>>               __mwait(cx->address, 0);
>>>           }
>>> +    }
>>> +    else if ( cx->entry_method == ACPI_CSTATE_EM_SYSIO )
>>> +    {
>>> +        /* Avoid references to shared data after the cache flush */
>>> +        u32 address = cx->address;
>>> +        u32 pmtmr_ioport_local = pmtmr_ioport;
>>> +
>>> +        wbinvd();
>>> +
>>> +        while ( 1 )
>>> +        {
>>> +            inb(address);
>>> +            inl(pmtmr_ioport_local);
>>> +        }
>>>       }
>>> 
>>>   default_halt:
>>> 
>>> 

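The AMD-specific variant agreed on in the message above could be sketched roughly as below. This is only an illustration in plain C, not the follow-up patch: the enums and the function pick_dead_action() are hypothetical names invented here, and the rule they encode (use MWAIT when the deepest state is FFH, allow SYSIO only on AMD, otherwise fall back to halt) is an assumption drawn from the discussion.

```c
/* Hypothetical stand-ins; none of these names exist in Xen. */
enum cx_entry { CX_HALT, CX_SYSIO, CX_FFH /* MONITOR/MWAIT */ };
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };
enum dead_action { DEAD_MWAIT, DEAD_SYSIO, DEAD_HALT };

/*
 * Choose how an off-lined CPU should idle, given its vendor and the
 * entry method of its deepest C state.
 */
static enum dead_action pick_dead_action(enum cpu_vendor vendor,
                                         enum cx_entry deepest)
{
    /* MWAIT-based entry is used wherever the deepest state offers it. */
    if (deepest == CX_FFH)
        return DEAD_MWAIT;

    /* Assumption: I/O-port entry is re-enabled for AMD only. */
    if (deepest == CX_SYSIO && vendor == VENDOR_AMD)
        return DEAD_SYSIO;

    /* Everything else keeps the existing behaviour: plain halt (C1). */
    return DEAD_HALT;
}
```

With this shape, an Intel CPU whose deepest state is SYSIO-based still halts, which keeps the behaviour introduced at c/s 23022 for the platforms where the problem was seen.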

* Re: [PATCH] x86: Use deep C states for off-lined CPUs
  2012-02-29 14:55       ` Boris Ostrovsky
  2012-02-29 15:03         ` Jan Beulich
@ 2012-03-01  8:10         ` Liu, Jinsong
  1 sibling, 0 replies; 11+ messages in thread
From: Liu, Jinsong @ 2012-03-01  8:10 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: Zhang, Yang Z, xen-devel, Wei, Gang

Boris Ostrovsky wrote:
> On 02/29/12 00:21, Liu, Jinsong wrote:
>> Hmm, no.
>> 
>> It needs to flush the cache, as long as a *deep Cx* state can be
>> spuriously woken up. The reason for clflush here is that it's a
>> light-weight flush; in fact it could also use wbinvd if performance
>> were not a concern.
> 
> What address would need to be CLFLUSH'd? Both "address" and
> "pmtmr_ioport_local"?

If the while loop only involves inb/inl port accesses, there is no need to flush.

> 
>> 
>> For halt, it doesn't need to do so, since the cpu keeps snooping
>> while it sleeps.
> 
> If the cpu does not snoop in deeper C-states, wouldn't we have a problem
> with acpi_idle_do_entry()? There is a code path (at least for C2)
> where the cache is not flushed.

No problem for C1/C2; only C3 and deeper stop snooping.

> 
> Incidentally, if CLFLUSH is required for MONITOR then perhaps
> mwait_idle_with_hints() needs to have it as well?
> 

No need to do so: wbinvd has already been done before mwait_idle_with_hints() enters C3, and that is a different scenario from acpi_dead_idle(), which is a while(1) loop.

Thanks,
Jinsong
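
The flush rules stated in this reply can be summarized as a small decision table. The following is only a sketch in plain C: the enum and function names are hypothetical stand-ins, not Xen code, and the rules are the ones claimed in this thread (C1/C2 keep snooping, C3 and deeper do not, and a dead loop that only reads I/O ports needs no flush inside the loop, while the MWAIT-based dead loop flushes on every pass).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the entry methods discussed in the thread. */
enum entry_method { EM_HALT, EM_SYSIO, EM_FFH /* MONITOR/MWAIT */ };

/* Per the thread: C1 and C2 keep snooping; C3 and deeper stop. */
static bool cstate_snoops(int cstate_type)
{
    return cstate_type < 3;
}

/* A cache flush is needed before entering a state that stops snooping. */
static bool needs_flush_before_entry(int cstate_type)
{
    return !cstate_snoops(cstate_type);
}

/*
 * Inside a dead loop, a flush is repeated on each pass only when the loop
 * itself touches memory. Assumption from the thread: the MWAIT loop does
 * (it monitors a memory line), the SYSIO loop only reads I/O ports.
 */
static bool needs_flush_each_iteration(enum entry_method method)
{
    return method == EM_FFH;
}
```

Read this way, the SYSIO branch of the patch needs its single wbinvd() before the loop and nothing more, which matches the answer given above.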


Thread overview: 11+ messages
2012-02-28 22:08 [PATCH] x86: Use deep C states for off-lined CPUs Boris Ostrovsky
2012-02-29  1:37 ` Zhang, Yang Z
2012-02-29  4:03   ` Ostrovsky, Boris
2012-02-29  5:21     ` Liu, Jinsong
2012-02-29 14:55       ` Boris Ostrovsky
2012-02-29 15:03         ` Jan Beulich
2012-03-01  8:10         ` Liu, Jinsong
2012-02-29  4:58 ` Liu, Jinsong
2012-02-29 13:48   ` Boris Ostrovsky
2012-03-01  7:29     ` Liu, Jinsong
2012-02-29  9:12 ` Jan Beulich
