All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1] Misc fix to hypervisor.
@ 2014-06-04 13:37 Konrad Rzeszutek Wilk
  2014-06-04 13:37 ` [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2) Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 5+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-06-04 13:37 UTC (permalink / raw)
  To: xen-devel, chegger, jinsong.liu, keir, jbeulich

This patch came about as I was working on a different bug and crashed
dom0 quite often. That caused an infinite loop which made the CPUs hot
and the firmware decided to tell me about that. In fact it told me so
often that I got fed up with it. As such this patch adds a state machine
to this so it will only print this when the state has changed.

 xen/arch/x86/cpu/mcheck/mce_intel.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2)
  2014-06-04 13:37 [PATCH v1] Misc fix to hypervisor Konrad Rzeszutek Wilk
@ 2014-06-04 13:37 ` Konrad Rzeszutek Wilk
  2014-06-04 13:49   ` Andrew Cooper
  2014-06-04 14:49   ` Jan Beulich
  0 siblings, 2 replies; 5+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-06-04 13:37 UTC (permalink / raw)
  To: xen-devel, chegger, jinsong.liu, keir, jbeulich; +Cc: Konrad Rzeszutek Wilk

If the machine has been quite busy it ends up with these
messages printed on the hypervisor console:

(XEN) CPU3: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature above threshold
(XEN) CPU0: Running in modulated clock mode
(XEN) CPU1: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal

While the state changes are important, the non-altered
state information is not needed. As such add a latch
mechanism to only print the information if it has
changed since the last update.

This was observed on Intel DQ67SW,
BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012

CC: Jan Beulich <jbeulich@suse.com>
CC: Keir Fraser <keir@xen.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Redo per Daniel and Boris's review]
---
 xen/arch/x86/cpu/mcheck/mce_intel.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c
index b32fdb2..a5caf47 100644
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -49,11 +49,13 @@ static int __read_mostly nr_intel_ext_msrs;
 #define INTEL_SRAR_INSTR_FETCH	0x150
 
 #ifdef CONFIG_X86_MCE_THERMAL
+#define MCE_RING                0x1
 static void intel_thermal_interrupt(struct cpu_user_regs *regs)
 {
     uint64_t msr_content;
     unsigned int cpu = smp_processor_id();
     static DEFINE_PER_CPU(s_time_t, next);
+    static DEFINE_PER_CPU(int, last_state) = -1;
 
     ack_APIC_irq();
 
@@ -62,10 +64,13 @@ static void intel_thermal_interrupt(struct cpu_user_regs *regs)
 
     per_cpu(next, cpu) = NOW() + MILLISECS(5000);
     rdmsrl(MSR_IA32_THERM_STATUS, msr_content);
-    if (msr_content & 0x1) {
+    if ( __get_cpu_var(last_state) == (msr_content & MCE_RING) )
+        return;
+    per_cpu(last_state, cpu) = msr_content & MCE_RING;
+    if ( msr_content & MCE_RING )
+    {
         printk(KERN_EMERG "CPU%d: Temperature above threshold\n", cpu);
-        printk(KERN_EMERG "CPU%d: Running in modulated clock mode\n",
-                cpu);
+        printk(KERN_EMERG "CPU%d: Running in modulated clock mode\n", cpu);
         add_taint(TAINT_MACHINE_CHECK);
     } else {
         printk(KERN_INFO "CPU%d: Temperature/speed normal\n", cpu);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2)
  2014-06-04 13:37 ` [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2) Konrad Rzeszutek Wilk
@ 2014-06-04 13:49   ` Andrew Cooper
  2014-06-04 14:49   ` Jan Beulich
  1 sibling, 0 replies; 5+ messages in thread
From: Andrew Cooper @ 2014-06-04 13:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, chegger, keir, jbeulich, jinsong.liu

On 04/06/14 14:37, Konrad Rzeszutek Wilk wrote:
> If the machine has been quite busy it ends up with these
> messages printed on the hypervisor console:
>
> (XEN) CPU3: Temperature/speed normal
> (XEN) CPU1: Temperature/speed normal
> (XEN) CPU0: Temperature/speed normal
> (XEN) CPU1: Temperature/speed normal
> (XEN) CPU0: Temperature/speed normal
> (XEN) CPU2: Temperature/speed normal
> (XEN) CPU3: Temperature/speed normal
> (XEN) CPU0: Temperature/speed normal
> (XEN) CPU2: Temperature/speed normal
> (XEN) CPU3: Temperature/speed normal
> (XEN) CPU1: Temperature/speed normal
> (XEN) CPU0: Temperature above threshold
> (XEN) CPU0: Running in modulated clock mode
> (XEN) CPU1: Temperature/speed normal
> (XEN) CPU2: Temperature/speed normal
> (XEN) CPU3: Temperature/speed normal
>
> While the state changes are important, the non-altered
> state information is not needed. As such add a latch
> mechanism to only print the information if it has
> changed since the last update.
>
> This was observed on Intel DQ67SW,
> BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012
>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Keir Fraser <keir@xen.org>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> [v2: Redo per Daniel and Boris's review]
> ---
>  xen/arch/x86/cpu/mcheck/mce_intel.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c
> index b32fdb2..a5caf47 100644
> --- a/xen/arch/x86/cpu/mcheck/mce_intel.c
> +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
> @@ -49,11 +49,13 @@ static int __read_mostly nr_intel_ext_msrs;
>  #define INTEL_SRAR_INSTR_FETCH	0x150
>  
>  #ifdef CONFIG_X86_MCE_THERMAL
> +#define MCE_RING                0x1
>  static void intel_thermal_interrupt(struct cpu_user_regs *regs)
>  {
>      uint64_t msr_content;
>      unsigned int cpu = smp_processor_id();
>      static DEFINE_PER_CPU(s_time_t, next);
> +    static DEFINE_PER_CPU(int, last_state) = -1;

Percpu areas are all 0'd during init.  This default value won't stick on
anything other than the BSP, and that is only because of the current
implementation.

>  
>      ack_APIC_irq();
>  
> @@ -62,10 +64,13 @@ static void intel_thermal_interrupt(struct cpu_user_regs *regs)
>  
>      per_cpu(next, cpu) = NOW() + MILLISECS(5000);
>      rdmsrl(MSR_IA32_THERM_STATUS, msr_content);
> -    if (msr_content & 0x1) {
> +    if ( __get_cpu_var(last_state) == (msr_content & MCE_RING) )
> +        return;
> +    per_cpu(last_state, cpu) = msr_content & MCE_RING;

You have an asymmetry in accessing last_state.

As both of them end up using RELOC_HIDE(), it is more efficient to use
int *this_last_state = &per_cpu(last_state, cpu);

> +    if ( msr_content & MCE_RING )
> +    {
>          printk(KERN_EMERG "CPU%d: Temperature above threshold\n", cpu);
> -        printk(KERN_EMERG "CPU%d: Running in modulated clock mode\n",
> -                cpu);
> +        printk(KERN_EMERG "CPU%d: Running in modulated clock mode\n", cpu);

As you are fixing this up, cpu is unsigned so should use %u rather than %d

~Andrew

>          add_taint(TAINT_MACHINE_CHECK);
>      } else {
>          printk(KERN_INFO "CPU%d: Temperature/speed normal\n", cpu);

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2)
  2014-06-04 13:37 ` [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2) Konrad Rzeszutek Wilk
  2014-06-04 13:49   ` Andrew Cooper
@ 2014-06-04 14:49   ` Jan Beulich
  2014-06-04 18:50     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2014-06-04 14:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: jinsong.liu, chegger, keir, xen-devel

>>> On 04.06.14 at 15:37, <konrad.wilk@oracle.com> wrote:
> If the machine has been quite busy it ends up with these
> messages printed on the hypervisor console:
> 
> (XEN) CPU3: Temperature/speed normal
> (XEN) CPU1: Temperature/speed normal
> (XEN) CPU0: Temperature/speed normal
> (XEN) CPU1: Temperature/speed normal
> (XEN) CPU0: Temperature/speed normal
> (XEN) CPU2: Temperature/speed normal
> (XEN) CPU3: Temperature/speed normal
> (XEN) CPU0: Temperature/speed normal
> (XEN) CPU2: Temperature/speed normal
> (XEN) CPU3: Temperature/speed normal
> (XEN) CPU1: Temperature/speed normal
> (XEN) CPU0: Temperature above threshold
> (XEN) CPU0: Running in modulated clock mode
> (XEN) CPU1: Temperature/speed normal
> (XEN) CPU2: Temperature/speed normal
> (XEN) CPU3: Temperature/speed normal
> 
> While the state changes are important, the non-altered
> state information is not needed. As such add a latch
> mechanism to only print the information if it has
> changed since the last update.

But isn't the interrupt supposed to happen only when state changes
in the first place?

Jan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2)
  2014-06-04 14:49   ` Jan Beulich
@ 2014-06-04 18:50     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 5+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-06-04 18:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: jinsong.liu, chegger, keir, xen-devel

On Wed, Jun 04, 2014 at 03:49:52PM +0100, Jan Beulich wrote:
> >>> On 04.06.14 at 15:37, <konrad.wilk@oracle.com> wrote:
> > If the machine has been quite busy it ends up with these
> > messages printed on the hypervisor console:
> > 
> > (XEN) CPU3: Temperature/speed normal
> > (XEN) CPU1: Temperature/speed normal
> > (XEN) CPU0: Temperature/speed normal
> > (XEN) CPU1: Temperature/speed normal
> > (XEN) CPU0: Temperature/speed normal
> > (XEN) CPU2: Temperature/speed normal
> > (XEN) CPU3: Temperature/speed normal
> > (XEN) CPU0: Temperature/speed normal
> > (XEN) CPU2: Temperature/speed normal
> > (XEN) CPU3: Temperature/speed normal
> > (XEN) CPU1: Temperature/speed normal
> > (XEN) CPU0: Temperature above threshold
> > (XEN) CPU0: Running in modulated clock mode
> > (XEN) CPU1: Temperature/speed normal
> > (XEN) CPU2: Temperature/speed normal
> > (XEN) CPU3: Temperature/speed normal
> > 
> > While the state changes are important, the non-altered
> > state information is not needed. As such add a latch
> > mechanism to only print the information if it has
> > changed since the last update.
> 
> But isn't the interrupt supposed to happen only when state changes
> in the first place?


HA!
That is what I thought too, but this machine keeps on triggering
this interrupt.
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-06-04 18:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-04 13:37 [PATCH v1] Misc fix to hypervisor Konrad Rzeszutek Wilk
2014-06-04 13:37 ` [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2) Konrad Rzeszutek Wilk
2014-06-04 13:49   ` Andrew Cooper
2014-06-04 14:49   ` Jan Beulich
2014-06-04 18:50     ` Konrad Rzeszutek Wilk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.