* G5 Xserve rackmeter broken?
@ 2015-05-10 18:32 Aaro Koskinen
  2015-05-10 22:13 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 7+ messages in thread
From: Aaro Koskinen @ 2015-05-10 18:32 UTC
  To: Benjamin Herrenschmidt, linuxppc-dev

Hi,

With 4.1-rc2 the rackmeter driver for G5 Xserve is giving bogus
LED patterns. So far I have seen at least the following:

a) With static load the LEDs seem to be sane and report CPU
usage properly, but after a few minutes they go completely OFF,
even if the CPU load remains high.

b) On a completely idle system, the LEDs alternate between all OFF and
all ON roughly once a second.

Unfortunately I cannot say which was the last kernel where this worked
properly... These servers were away from normal use for a while due
to PSU issues.

A.

* Re: G5 Xserve rackmeter broken?
  2015-05-10 18:32 G5 Xserve rackmeter broken? Aaro Koskinen
@ 2015-05-10 22:13 ` Benjamin Herrenschmidt
  2015-05-12 17:55   ` Aaro Koskinen
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2015-05-10 22:13 UTC
  To: Aaro Koskinen; +Cc: linuxppc-dev

On Sun, 2015-05-10 at 21:32 +0300, Aaro Koskinen wrote:
> Hi,
> 
> With 4.1-rc2 the rackmeter driver for G5 Xserve is giving bogus
> LED patterns. So far I have seen at least the following:
> 
> a) With static load the LEDs seem to be sane and report CPU
> usage properly, but after a few minutes they go completely OFF,
> even if the CPU load remains high.
> 
> b) On a completely idle system, the LEDs alternate between all OFF and
> all ON roughly once a second.
> 
> Unfortunately I cannot say which was the last kernel where this worked
> properly... These servers were away from normal use for a while due
> to PSU issues.

And mine is dead due to ... PSU issue :-(

It could be that what we use to get the "idle time" isn't correct
anymore...

Ben.

* Re: G5 Xserve rackmeter broken?
  2015-05-10 22:13 ` Benjamin Herrenschmidt
@ 2015-05-12 17:55   ` Aaro Koskinen
  2015-05-12 20:39     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 7+ messages in thread
From: Aaro Koskinen @ 2015-05-12 17:55 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

Hi,

On Mon, May 11, 2015 at 08:13:35AM +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2015-05-10 at 21:32 +0300, Aaro Koskinen wrote:
> > Hi,
> > 
> > With 4.1-rc2 the rackmeter driver for G5 Xserve is giving bogus
> > LED patterns. So far I have seen at least the following:
> > 
> > a) With static load the LEDs seem to be sane and report CPU
> > usage properly, but after a few minutes they go completely OFF,
> > even if the CPU load remains high.
> > 
> > b) On a completely idle system, the LEDs alternate between all OFF and
> > all ON roughly once a second.
> > 
> > Unfortunately I cannot say which was the last kernel where this worked
> > properly... These servers were away from normal use for a while due
> > to PSU issues.
> 
> And mine is dead due to ... PSU issue :-(

I had 4 dead servers, of which I have now managed to get 2 running
again by recapping the PSU.

> It could be that what we use to get the "idle time" isn't correct
> anymore...

It seems the idle ticks sometimes exceed the total ticks and mess up
the load calculation. This would explain at least the behaviour in case b).
I added the following debug patch:

@@ -234,6 +234,10 @@ static void rackmeter_do_timer(struct work_struct *work)
         */
        load = (9 * (total_ticks - idle_ticks)) / total_ticks;

+       if (load > 10)
+               pr_err("load: %d total: %u idle: %u\n", load,
+                       total_ticks, idle_ticks);
+
        offset = cpu << 3;
        cumm = 0;
        for (i = 0; i < 8; i++) {

Which shows:

[  795.832701] load: 515 total: 8333333 idle: 8333661
[  796.792767] load: 515 total: 8333333 idle: 8333551
[  796.832770] load: 515 total: 8333333 idle: 8333656
[  797.292799] load: 515 total: 8333334 idle: 8333532
[  798.082856] load: 515 total: 8333334 idle: 8333591
[  798.792903] load: 515 total: 8333333 idle: 8333424
[  798.832909] load: 515 total: 8333333 idle: 8333571
[  799.292937] load: 515 total: 8333334 idle: 8333459
[  799.832973] load: 515 total: 8333333 idle: 8333551
[  800.793038] load: 515 total: 8333333 idle: 8333414
[  800.833045] load: 515 total: 8333333 idle: 8333583
[  801.293071] load: 515 total: 8333334 idle: 8333455
[  801.833107] load: 515 total: 8333333 idle: 8333564

I'm running with HZ=100 so the values are still probably within
jiffy resolution, so perhaps the calculation should first do
idle = min(idle, total)?
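
As a sanity check on the numbers above, the following standalone
userspace sketch reproduces the arithmetic. It assumes the tick counters
are 32-bit unsigned values, as the %u format in the debug patch suggests;
buggy_load() is a hypothetical name used only for illustration, not a
function in the driver.

```c
#include <stdint.h>

/*
 * Same expression as in rackmeter_do_timer(), evaluated in 32-bit
 * unsigned arithmetic (an assumption based on the %u format above).
 * When idle_ticks > total_ticks, the subtraction wraps around modulo
 * 2^32 and the quotient ends up far outside the expected 0..9 range.
 */
static inline int buggy_load(uint32_t total_ticks, uint32_t idle_ticks)
{
	uint32_t busy = total_ticks - idle_ticks;	/* may wrap */

	return (int)((UINT32_C(9) * busy) / total_ticks);
}
```

With total = 8333333 and idle = 8333661 from the first log line, the
wrapped product is 4294964344, and dividing by 8333333 gives 515, which
matches the logged value.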

A.

* Re: G5 Xserve rackmeter broken?
  2015-05-12 17:55   ` Aaro Koskinen
@ 2015-05-12 20:39     ` Benjamin Herrenschmidt
  2015-05-14 10:06       ` Aaro Koskinen
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2015-05-12 20:39 UTC
  To: Aaro Koskinen; +Cc: linuxppc-dev

On Tue, 2015-05-12 at 20:55 +0300, Aaro Koskinen wrote:
> I'm running with HZ=100 so the values are still probably within
> jiffy resolution, so perhaps the calculation should first do
> idle = min(idle, total)?

Does it give you reasonable output if you do that?

Cheers,
Ben.

* Re: G5 Xserve rackmeter broken?
  2015-05-12 20:39     ` Benjamin Herrenschmidt
@ 2015-05-14 10:06       ` Aaro Koskinen
  2015-05-14 10:14         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 7+ messages in thread
From: Aaro Koskinen @ 2015-05-14 10:06 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

Hi,

On Wed, May 13, 2015 at 06:39:40AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2015-05-12 at 20:55 +0300, Aaro Koskinen wrote:
> > I'm running with HZ=100 so the values are still probably within
> > jiffy resolution, so perhaps the calculation should first do
> > idle = min(idle, total)?
> 
> Does it give you reasonable output if you do that?

The below change fixes the idle system blinking behaviour.

I'm also able to reproduce the LEDs going off in the full CPU load case.
It seems there is some DMA error. Normally, reading rm->dma_regs->status
in the IRQ handler gives 0x8400. In the failure cases I've seen values
0x8880 and 0x8980 - the IRQ will stop after this and it will need a
paused -> started cycle before it gets going again (but it sometimes fails
again soon after).
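
For reading such status values, a small userspace decoder can help. This
is only a sketch: the bit masks below are assumed to match the DBDMA
channel status definitions in the kernel's asm/dbdma.h (RUN, PAUSE,
FLUSH, WAKE, DEAD, ACTIVE, BT, plus an 8-bit device status field), and
dbdma_status_str() is a hypothetical helper, not something the driver
provides.

```c
#include <stdint.h>
#include <stdio.h>

/* Channel status bits; assumed to mirror asm/dbdma.h. */
#define ST_RUN		0x8000u
#define ST_PAUSE	0x4000u
#define ST_FLUSH	0x2000u
#define ST_WAKE		0x1000u
#define ST_DEAD		0x0800u
#define ST_ACTIVE	0x0400u
#define ST_BT		0x0100u
#define ST_DEVSTAT	0x00ffu

/* Format the set flags and the device status byte into buf. */
static inline int dbdma_status_str(uint32_t st, char *buf, size_t len)
{
	return snprintf(buf, len, "%s%s%s%s%s%s%sdev=0x%02x",
			(st & ST_RUN)    ? "RUN "    : "",
			(st & ST_PAUSE)  ? "PAUSE "  : "",
			(st & ST_FLUSH)  ? "FLUSH "  : "",
			(st & ST_WAKE)   ? "WAKE "   : "",
			(st & ST_DEAD)   ? "DEAD "   : "",
			(st & ST_ACTIVE) ? "ACTIVE " : "",
			(st & ST_BT)     ? "BT "     : "",
			(unsigned int)(st & ST_DEVSTAT));
}
```

If that bit layout applies here, 0x8400 decodes as RUN and ACTIVE, while
0x8880 and 0x8980 both have DEAD set and ACTIVE cleared, which would be
consistent with the channel stopping until it is restarted.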

A.

diff --git a/drivers/macintosh/rack-meter.c b/drivers/macintosh/rack-meter.c
index 048901a..3381fa59 100644
--- a/drivers/macintosh/rack-meter.c
+++ b/drivers/macintosh/rack-meter.c
@@ -227,6 +227,7 @@ static void rackmeter_do_timer(struct work_struct *work)
 
 	total_idle_ticks = get_cpu_idle_time(cpu);
 	idle_ticks = (unsigned int) (total_idle_ticks - rcpu->prev_idle);
+	idle_ticks = min(idle_ticks, total_ticks);
 	rcpu->prev_idle = total_idle_ticks;
 
 	/* We do a very dumb calculation to update the LEDs for now,
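
The effect of the clamp can be checked outside the kernel with a small
sketch like the one below. It assumes 32-bit unsigned tick counters;
clamped_load() is a hypothetical name, and the guard against a zero
total is an extra precaution added for the standalone version, not part
of the patch above.

```c
#include <stdint.h>

/*
 * Load on the driver's 0..9 scale, with idle_ticks clamped to
 * total_ticks first so that the subtraction cannot wrap around.
 */
static inline uint32_t clamped_load(uint32_t total_ticks, uint32_t idle_ticks)
{
	if (total_ticks == 0)		/* not in the patch; avoids div by 0 */
		return 0;
	if (idle_ticks > total_ticks)	/* equivalent of min(idle, total) */
		idle_ticks = total_ticks;

	return (UINT32_C(9) * (total_ticks - idle_ticks)) / total_ticks;
}
```

With the clamp, the idle-system samples that previously produced a load
of 515 now yield 0, and a fully busy interval still yields 9.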

* Re: G5 Xserve rackmeter broken?
  2015-05-14 10:06       ` Aaro Koskinen
@ 2015-05-14 10:14         ` Benjamin Herrenschmidt
  2015-05-14 11:48           ` Aaro Koskinen
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2015-05-14 10:14 UTC
  To: Aaro Koskinen; +Cc: linuxppc-dev

On Thu, 2015-05-14 at 13:06 +0300, Aaro Koskinen wrote:
> Hi,
> 
> On Wed, May 13, 2015 at 06:39:40AM +1000, Benjamin Herrenschmidt wrote:
> > On Tue, 2015-05-12 at 20:55 +0300, Aaro Koskinen wrote:
> > > I'm running with HZ=100 so the values are still probably within
> > > jiffy resolution, so perhaps the calculation should first do
> > > idle = min(idle, total)?
> > 
> > Does it give you reasonable output if you do that?
> 
> The below change fixes the idle system blinking behaviour.
> 
> I'm also able to reproduce the LEDs going off in the full CPU load case.
> It seems there is some DMA error. Normally, reading rm->dma_regs->status
> in the IRQ handler gives 0x8400. In the failure cases I've seen values
> 0x8880 and 0x8980 - the IRQ will stop after this and it will need a
> paused -> started cycle before it gets going again (but it sometimes fails
> again soon after).

That's a bit worrisome. Is that new? Smells like faulty HW ...

Ben.

> A.
> 
> diff --git a/drivers/macintosh/rack-meter.c b/drivers/macintosh/rack-meter.c
> index 048901a..3381fa59 100644
> --- a/drivers/macintosh/rack-meter.c
> +++ b/drivers/macintosh/rack-meter.c
> @@ -227,6 +227,7 @@ static void rackmeter_do_timer(struct work_struct *work)
>  
>  	total_idle_ticks = get_cpu_idle_time(cpu);
>  	idle_ticks = (unsigned int) (total_idle_ticks - rcpu->prev_idle);
> +	idle_ticks = min(idle_ticks, total_ticks);
>  	rcpu->prev_idle = total_idle_ticks;
>  
>  	/* We do a very dumb calculation to update the LEDs for now,

* Re: G5 Xserve rackmeter broken?
  2015-05-14 10:14         ` Benjamin Herrenschmidt
@ 2015-05-14 11:48           ` Aaro Koskinen
  0 siblings, 0 replies; 7+ messages in thread
From: Aaro Koskinen @ 2015-05-14 11:48 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

Hi,

On Thu, May 14, 2015 at 08:14:57PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-05-14 at 13:06 +0300, Aaro Koskinen wrote:
> > On Wed, May 13, 2015 at 06:39:40AM +1000, Benjamin Herrenschmidt wrote:
> > > On Tue, 2015-05-12 at 20:55 +0300, Aaro Koskinen wrote:
> > > > I'm running with HZ=100 so the values are still probably within
> > > > jiffy resolution, so perhaps the calculation should first do
> > > > idle = min(idle, total)?
> > > 
> > > Does it give you reasonable output if you do that?
> > 
> > The below change fixes the idle system blinking behaviour.
> > 
> > I'm also able to reproduce the LEDs going off in the full CPU load case.
> > It seems there is some DMA error. Normally, reading rm->dma_regs->status
> > in the IRQ handler gives 0x8400. In the failure cases I've seen values
> > 0x8880 and 0x8980 - the IRQ will stop after this and it will need a
> > paused -> started cycle before it gets going again (but it sometimes fails
> > again soon after).
> 
> That's a bit worrisome. Is that new? Smells like faulty HW ...

Ok, right... I swapped the PSU and HD into a different box, and now it
seems to work as expected! (At least the first hour into GCC bootstrap
is still going fine...)

A.
