* Power measurement wrong when idle
@ 2022-10-29 12:33 Marc SCHAEFER
  2022-10-29 13:52 ` Guenter Roeck
  0 siblings, 1 reply; 6+ messages in thread
From: Marc SCHAEFER @ 2022-10-29 12:33 UTC (permalink / raw)
  To: linux-hwmon

Hello,

I am using the apu2 embedded platform, which uses an amd64 AMD GX-412TC SOC,
stepping        : 1
microcode       : 0x7030105

With Debian bullseye, the power measurement when idle is far too high and clearly
wrong (roughly 80 to 100 W). We have observed this behaviour on multiple systems.

The problem did not occur with Debian buster, does not occur with the
temperature sensor, and the power measurement goes back to apparently correct
values when the system is no longer idle.

It does not seem to be linked to amd64-specific firmware.

The problem lies in /sys/class/hwmon/hwmon0/power1_average itself, not in the
lm-sensors package (reading the /sys file directly gives the same issue).

So it appears to be within the kernel: 4.19.0-22-amd64 seems ok and
5.10.0-18-amd64 is not.

Oddly, there do not seem to be any relevant changes in the specific kernel
driver (fam15h_power).

Any idea what could lead to this strange behaviour?

Thank you for any ideas or pointers.

Examples:

When bullseye is idle, the value is completely wrong (the ' separators are mine):

cat /sys/class/hwmon/hwmon0/power1_average
94'019'396

When bullseye has 100% CPU used (one core):
cat /sys/class/hwmon/hwmon0/power1_average
10'917'309
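
For reference, the hwmon sysfs interface reports power1_average in microwatts,
so the two raw readings above correspond to about 94.02 W and 10.92 W. A minimal
sketch of that conversion follows; the helper name microwatts_to_watts is made
up for illustration and is not part of any tool mentioned in this thread.

```python
def microwatts_to_watts(raw: str) -> float:
    """Convert a power1_average reading (microwatts) to watts.

    Accepts the apostrophe-grouped form used above for readability.
    """
    return int(raw.replace("'", "").strip()) / 1_000_000


# The two readings quoted in this message.
for label, raw in [("idle", "94'019'396"), ("one core busy", "10'917'309")]:
    print(f"{label}: {microwatts_to_watts(raw):.2f} W")
```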

The only visible change is that hwmon1 and hwmon0 are interchanged:

bullseye:
   fam15h_power-pci-00c4
   Adapter: PCI adapter
   power1:       88.61 W  (interval =   0.01 s, crit =   6.00 W)
   
   k10temp-pci-00c3
   Adapter: PCI adapter
   temp1:        +54.5 C  (high = +70.0 C)
                          (crit = +105.0 C, hyst = +104.0 C)
   
buster:
   k10temp-pci-00c3
   Adapter: PCI adapter
   temp1:        +59.6°C  (high = +70.0°C)
                          (crit = +105.0°C, hyst = +104.0°C)
   
   fam15h_power-pci-00c4
   Adapter: PCI adapter
   power1:        8.00 W  (interval =   0.01 s, crit =   6.00 W)
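
Since the hwmonN numbering differs between the two releases, a script that
hard-codes hwmon0 may end up reading the wrong device. The sketch below looks
the device up through its "name" attribute instead. It assumes the driver
registers under the name "fam15h_power", which matches the prefix shown by
sensors above; the function find_hwmon is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def find_hwmon(driver: str, root: str = "/sys/class/hwmon") -> Optional[Path]:
    """Return the hwmonN directory whose 'name' file matches driver, or None."""
    for entry in sorted(Path(root).glob("hwmon*")):
        try:
            if (entry / "name").read_text().strip() == driver:
                return entry
        except OSError:
            continue  # entry vanished or is unreadable; try the next one
    return None


if __name__ == "__main__":
    node = find_hwmon("fam15h_power")
    if node is None:
        print("fam15h_power not found")
    else:
        # power1_average is reported in microwatts.
        microwatts = int((node / "power1_average").read_text())
        print(f"{node.name}: {microwatts / 1_000_000:.2f} W")
```

This only removes the dependency on the numbering; it does not change the
value the driver reports.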
   


* Re: Power measurement wrong when idle
  2022-10-29 12:33 Power measurement wrong when idle Marc SCHAEFER
@ 2022-10-29 13:52 ` Guenter Roeck
  2022-10-30 14:15   ` Marc SCHAEFER
  2022-11-04 10:52   ` Marc SCHAEFER
  0 siblings, 2 replies; 6+ messages in thread
From: Guenter Roeck @ 2022-10-29 13:52 UTC (permalink / raw)
  To: Marc SCHAEFER, linux-hwmon

On 10/29/22 05:33, Marc SCHAEFER wrote:
> Hello,
> 
> I am using the apu2 embedded platform, which uses an amd64 AMD GX-412TC SOC,
> stepping        : 1
> microcode       : 0x7030105
> 
> With Debian bullseye, the power measurement when idle is far too high and clearly
> wrong (roughly 80 to 100 W). We have observed this behaviour on multiple systems.
> 
> The problem did not occur with Debian buster, does not occur with the
> temperature sensor, and the power measurement goes back to apparently correct
> values when the system is no longer idle.
> 
> It does not seem to be linked to amd64-specific firmware.
> 
> The problem lies in /sys/class/hwmon/hwmon0/power1_average itself, not in the
> lm-sensors package (reading the /sys file directly gives the same issue).
> 
> So it appears to be within the kernel: 4.19.0-22-amd64 seems ok and
> 5.10.0-18-amd64 is not.
> 
> Oddly, there do not seem to be any relevant changes in the specific kernel
> driver (fam15h_power).
> 
> Any idea what could lead to this strange behaviour?
> 

A few, but they are all more or less unlikely.

- Debian might carry some non-upstream driver patches causing the problem
   (or fixing it in the older kernel, and the patch was not applied to the
   new kernel).
- Debian installs its own version of the CPU firmware, and the version
   installed with the newer kernel introduces the problem.
   Normally the BIOS would update the CPU firmware, but that may not be
   the case for older systems.
- The problem is caused by some change in the kernel outside the
   fam15h_power driver. I cannot imagine what that might be, but it is
   a possibility.

You should be able to check the first two possibilities. For the last one,
the only means I could think of would be to bisect between the good and
the bad version.

Guenter

> Thank you for any ideas or pointers.
> 
> Examples:
> 
> When bullseye is idle, the value is completely wrong (the ' separators are mine):
> 
> cat /sys/class/hwmon/hwmon0/power1_average
> 94'019'396
> 
> When bullseye has 100% CPU used (one core):
> cat /sys/class/hwmon/hwmon0/power1_average
> 10'917'309
> 
> The only visible change is that hwmon1 and hwmon0 are interchanged:
> 
> bullseye:
>     fam15h_power-pci-00c4
>     Adapter: PCI adapter
>     power1:       88.61 W  (interval =   0.01 s, crit =   6.00 W)
>     
>     k10temp-pci-00c3
>     Adapter: PCI adapter
>     temp1:        +54.5 C  (high = +70.0 C)
>                            (crit = +105.0 C, hyst = +104.0 C)
>     
> buster:
>     k10temp-pci-00c3
>     Adapter: PCI adapter
>     temp1:        +59.6°C  (high = +70.0°C)
>                            (crit = +105.0°C, hyst = +104.0°C)
>     
>     fam15h_power-pci-00c4
>     Adapter: PCI adapter
>     power1:        8.00 W  (interval =   0.01 s, crit =   6.00 W)
>     



* Re: Power measurement wrong when idle
  2022-10-29 13:52 ` Guenter Roeck
@ 2022-10-30 14:15   ` Marc SCHAEFER
  2022-11-04 10:52   ` Marc SCHAEFER
  1 sibling, 0 replies; 6+ messages in thread
From: Marc SCHAEFER @ 2022-10-30 14:15 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linux-hwmon

Hello,

On Sat, Oct 29, 2022 at 06:52:41AM -0700, Guenter Roeck wrote:
> - Debian might carry some non-upstream driver patches causing the problem
>   (or fixing it in the older kernel, and the patch was not applied to the
>   new kernel).

I think I diffed the two kernel sources properly.

> - Debian installs its own version of the CPU firmware, and the version
>   installed with the newer kernel introduces the problem.

Possible, although at least the CPU firmware version is the same on both
according to /proc/cpuinfo.
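
One way to make that comparison repeatable is to extract the "microcode" field
from /proc/cpuinfo on each installation and compare the results. This is only
a sketch: microcode_revisions is a hypothetical helper, and the sample text
simply mirrors the two cpuinfo lines quoted at the start of this thread.

```python
import re
from typing import Set


def microcode_revisions(cpuinfo_text: str) -> Set[str]:
    """Collect the distinct 'microcode' values from /proc/cpuinfo content."""
    return set(re.findall(r"^microcode\s*:\s*(\S+)", cpuinfo_text, flags=re.MULTILINE))


# Sample input shaped like the cpuinfo excerpt from the first message.
sample = "stepping        : 1\nmicrocode       : 0x7030105\n"
print(sorted(microcode_revisions(sample)))
```

On a live system the input would be the content of /proc/cpuinfo; a set with
more than one entry would mean the cores do not all report the same revision.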

> For the last one, the only means I could think of would be to bisect between
> the good and the bad version.

Yes, I will try that as soon as I have hardware that I can remove from
production.

Thank you.


* Re: Power measurement wrong when idle
  2022-10-29 13:52 ` Guenter Roeck
  2022-10-30 14:15   ` Marc SCHAEFER
@ 2022-11-04 10:52   ` Marc SCHAEFER
  2022-11-05 14:31     ` Marc SCHAEFER
  1 sibling, 1 reply; 6+ messages in thread
From: Marc SCHAEFER @ 2022-11-04 10:52 UTC (permalink / raw)
  To: linux-hwmon

Hello Guenter,

So far I have been able to install a 5.10 kernel on buster (that kernel package
contains the drivers and some firmware, in the form of .ko files).

Nothing else was installed, and the bug is now present (it was not with 4.19).

Next, I will try the first stock 5.x kernel release manually.


* Re: Power measurement wrong when idle
  2022-11-04 10:52   ` Marc SCHAEFER
@ 2022-11-05 14:31     ` Marc SCHAEFER
  2022-11-07  2:41       ` Guenter Roeck
  0 siblings, 1 reply; 6+ messages in thread
From: Marc SCHAEFER @ 2022-11-05 14:31 UTC (permalink / raw)
  To: linux-hwmon

Hello,

On Fri, Nov 04, 2022 at 11:52:57AM +0100, Marc SCHAEFER wrote:
> So far I have been able to install a 5.10 kernel on buster (that kernel package
> contains the drivers and some firmware, in the form of .ko files).

I just compiled some stock kernels from kernel.org using make bindeb-pkg

The results:

4.19.260 does NOT have the bug
4.19.264 does NOT have the bug
5.0.1    HAS the bug
5.1      HAS the bug

If I understand correctly, 5.0.1 is the first ever 5.x kernel and
4.19.264 the latest 4.x kernel.


* Re: Power measurement wrong when idle
  2022-11-05 14:31     ` Marc SCHAEFER
@ 2022-11-07  2:41       ` Guenter Roeck
  0 siblings, 0 replies; 6+ messages in thread
From: Guenter Roeck @ 2022-11-07  2:41 UTC (permalink / raw)
  To: Marc SCHAEFER; +Cc: linux-hwmon

On Sat, Nov 05, 2022 at 03:31:19PM +0100, Marc SCHAEFER wrote:
> Hello,
> 
> On Fri, Nov 04, 2022 at 11:52:57AM +0100, Marc SCHAEFER wrote:
> > So far I have been able to install a 5.10 kernel on buster (that kernel package
> > contains the drivers and some firmware, in the form of .ko files).
> 
> I just compiled some stock kernels from kernel.org using make bindeb-pkg
> 
> The results:
> 
> 4.19.260 does NOT have the bug
> 4.19.264 does NOT have the bug
> 5.0.1    HAS the bug
> 5.1      HAS the bug
> 
> If I understand correctly, 5.0.1 is the first ever 5.x kernel and
> 4.19.264 the latest 4.x kernel.

Unfortunately I can't find a relevant change between v4.19 and v5.0.

The only option I can see would be to bisect between those kernel versions
to try to find the responsible commit.
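
A bisection would start with "git bisect start", "git bisect good v4.19" and
"git bisect bad v5.0" in a kernel tree; each step then needs a build, a reboot,
and a verdict. The sketch below only covers the verdict, so that every step is
judged the same way. The 20 W threshold is an assumption chosen to sit between
the roughly 8 W and 88 to 94 W idle readings reported in this thread, and
bisect_verdict is a hypothetical helper.

```python
from statistics import median
from typing import Iterable

# Assumption: an idle reading above this many watts counts as "bad".
# The thread reports about 8 W on good kernels and 88 to 94 W on bad ones.
IDLE_THRESHOLD_W = 20.0


def bisect_verdict(samples_uw: Iterable[int], threshold_w: float = IDLE_THRESHOLD_W) -> str:
    """Return 'good' or 'bad' from idle power1_average samples in microwatts."""
    watts = median(samples_uw) / 1_000_000
    return "bad" if watts > threshold_w else "good"


# Idle values taken from the bullseye and buster examples in this thread.
print(bisect_verdict([94_019_396, 88_610_000]))  # bullseye idle readings
print(bisect_verdict([8_000_000]))               # buster idle reading
```

The samples must be taken while the system is idle, since the reading goes
back to plausible values under load. The result is then passed on by hand as
"git bisect good" or "git bisect bad".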

Guenter

